22 August 2023, by Yelle Lieder
Testing the sustainability of software – how do you test sustainability as a quality feature of software?
‘Non-functional requirements are constraints imposed on a system to define quality characteristics such as reliability, maintainability or scalability.’ This is probably what many computer scientists were taught during their first semester at university. Many of the ‘-ilities’ (usability, scalability and so on) are defined as measurable quality requirements for software in ISO standard 25010 on ‘Software Quality Requirements’. What is still missing, however, is sustainability. The question is, how do you operationalise sustainability as a testable quality requirement for software?
Tools to check the sustainability of software
The process for testing the sustainability of software often starts with the CPU load. This measures how much a software program utilises the processor. Since processors account for a large share of power consumption, measuring their load can give a rough estimate of how much power they consume. The advantage here is that almost every language has libraries to measure the CPU load at runtime.
The CPU load alone says nothing about the exact amount of power a piece of software consumes, but if two versions of the same software – on the same hardware – generate different CPU loads, then this information is enough to track rough trends. Since measuring the CPU load is an easy way to get started and does not require extensive modifications to the codebase, it is a good place to gain some initial experience with indirect sustainability testing – even if it has certain weaknesses in terms of reproducibility and comparability, as this case study by Green Coding Berlin shows.
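As a rough illustration of comparing two versions on the same hardware, here is a minimal sketch that uses only the Python standard library. It measures the CPU time a workload consumes, which is used here as a stand-in for CPU load. The functions `version_a` and `version_b` are hypothetical placeholders for two versions of the same feature, and the repeat count is an arbitrary assumption.

```python
import time


def cpu_seconds(workload, repeats=5):
    """Return the median CPU time (user + system) in seconds used by `workload`."""
    samples = []
    for _ in range(repeats):
        start = time.process_time()  # CPU time of this process; sleeping is not counted
        workload()
        samples.append(time.process_time() - start)
    samples.sort()
    return samples[len(samples) // 2]


# Hypothetical version A: sums the squares below n in a loop.
def version_a(n=200_000):
    total = 0
    for i in range(n):
        total += i * i
    return total


# Hypothetical version B: same result via the closed-form sum of squares.
def version_b(n=200_000):
    return (n - 1) * n * (2 * n - 1) // 6


if __name__ == "__main__":
    a = cpu_seconds(version_a)
    b = cpu_seconds(version_b)
    print(f"version A: {a:.4f} s CPU, version B: {b:.4f} s CPU")
```

The absolute numbers depend on the machine and on whatever else is running, so only the trend between the two versions on the same hardware is meaningful.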
Measuring sustainability based on source code
Technically speaking, we are not ‘measuring’ anything here if we do not have any specialised hardware or electricity meters installed. Digitally observing CPU, disk, bandwidth and I/O only allows us to estimate the actual environmental impact. There are, however, some source code-based tools out there that produce sufficiently good estimates for software sustainability. For example, JoularJX for Java, co2.js for JavaScript and Codecarbon for Python are libraries that can be easily integrated into code and thus allow you to get roughly accurate data about the runtime behaviour. Sophisticated solutions such as Scaphandre, PowerJoular or TRACARBON sometimes require special hardware or RAPL, meaning you can obtain actual measurements rather than ‘just estimates’.
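To make the difference between estimating and measuring concrete, the following sketch turns an observed CPU time into an energy and emissions estimate. It is a back-of-the-envelope model and not how any of the libraries named above work internally. The average CPU power draw and the grid carbon intensity are assumed placeholder values that would have to be replaced with figures for your own hardware and region.

```python
# Both defaults are assumptions for illustration only, not measured values.
ASSUMED_CPU_WATTS = 15.0        # hypothetical average power draw of the CPU
ASSUMED_GRID_G_PER_KWH = 400.0  # hypothetical grid carbon intensity in gCO2e/kWh

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def estimate_energy_kwh(cpu_seconds, cpu_watts=ASSUMED_CPU_WATTS):
    """Estimate energy in kWh: watts * seconds gives joules, then convert."""
    return cpu_seconds * cpu_watts / JOULES_PER_KWH


def estimate_emissions_g(energy_kwh, grid_g_per_kwh=ASSUMED_GRID_G_PER_KWH):
    """Estimate emissions in grams of CO2e from energy and grid intensity."""
    return energy_kwh * grid_g_per_kwh


if __name__ == "__main__":
    energy = estimate_energy_kwh(cpu_seconds=3600)  # one hour of CPU time
    print(f"{energy:.3f} kWh, {estimate_emissions_g(energy):.1f} g CO2e")
```

Because the power draw is a constant here, the result scales linearly with CPU time. Real hardware does not behave that way, which is one reason such figures remain estimates.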
Monitoring the sustainability of software
The tools above are part of the design quality assurance. Monitoring at runtime is a good idea when software is fully developed and source code-based measurements are going to be put to the test externally using analytical quality assurance. This article by .eco introduces common online tools such as greenframe.io or digitalbeacon.co – which are designed to help measure the footprint of websites – as well as their advantages and disadvantages. However, since it is rather difficult to establish a correlation between data transfer and environmental impact – as explained in this blog post – these tools are not included here as quantitative software quality assurance tools.
If you are looking to perform a quick round of local tests (especially for desktop applications), then softwarefootprint by Jens Gröger of the Öko-Institut (German Institute for Applied Ecology) is a good option. If it is browser tests you want to perform, then the Firefox Profiler has included a toolkit for testing the sustainability of applications in the browser since version 104. This offers the advantage of also being able to test sites and applications that are not publicly accessible, unlike online tools in which you enter the website URL.
Kepler and Cloud Carbon Footprint (CCF) are runtime monitoring tools for the cloud. Kepler allows you to monitor individual pods in Kubernetes clusters. CCF allows you to connect different cloud providers via billing APIs and, based on the billed resources, can offer an alternative view to the often ‘nicely calculated’ dashboards of the providers themselves.
Sustainability testing in CI/CD pipelines
In conventional testing, many quality criteria for software are checked automatically in development pipelines. Once code is committed, these pipelines check rules before the code is integrated and made available to others or put into operation. A common tool used in such pipelines is SonarQube, for which the EcoCode plug-ins check many best practices for sustainable code. This is static code analysis, which has the advantage that code does not need to be executed to assess its sustainability. In addition, unit tests can measure how much energy software consumes while it is still in the pipeline and stop code from being integrated if defined limits are exceeded.
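A test that blocks integration when a limit is exceeded could look roughly like the sketch below. It is written so that a test runner such as pytest would collect the `test_` function and fail the pipeline step if the assertion does not hold. The energy figure is only an estimate derived from CPU time, and the power draw, the budget and the `build_report` workload are all hypothetical assumptions.

```python
import time

ASSUMED_CPU_WATTS = 15.0    # assumption: average CPU power draw in watts
ENERGY_BUDGET_JOULES = 5.0  # assumption: limit agreed on by the team


def estimated_joules(workload):
    """Estimate the energy of one run as CPU seconds times assumed watts."""
    start = time.process_time()
    workload()
    return (time.process_time() - start) * ASSUMED_CPU_WATTS


def build_report():
    """Hypothetical unit under test: sorts 50,000 strings."""
    return sorted(str(i) for i in range(50_000))


def test_build_report_stays_within_energy_budget():
    # Fails the test run, and with it the pipeline step, if the budget is exceeded.
    assert estimated_joules(build_report) <= ENERGY_BUDGET_JOULES
```

Since pipeline runners often share hardware, a budget of this kind needs a generous margin or a dedicated runner to avoid flaky failures.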
The German Environment Agency’s (Umweltbundesamt) SoftAware research project, which it is running in cooperation with the Sustainable Digital Infrastructure Alliance, also focuses on pipelines. Since most modern software consists to a large extent of external components and libraries, the project aims to analyse these reused components within pipelines in order to get closer to a holistic picture of software sustainability. If you are also interested in improving the energy consumption of your own pipelines, then you should take a look at the Eco-CI Actions for GitHub and GitLab CI.
Inspections and reviews are also among the more static procedures. If you are serious about wanting to integrate sustainability into your software development and quality assurance process, then you need to make sure that you also pay attention to code quality in terms of sustainability during merge requests and in code reviews. It makes sense here to define project and organisation-specific guidelines and checklists for best practices and anti-patterns. Take a look at the VU Amsterdam and the European Institute for Sustainable IT if you are looking for some inspiration for the right patterns.
Methodology for sustainable software testing
Once you have decided on a toolset, you then need to decide what you are going to measure and how you are going to measure it. There is no point in testing software once with one user. You have to define a standard use scenario and measure it regularly under a realistic load, using the kind of hardware you would use in live production. You also need to factor in aspects of scaling behaviour in elastic infrastructures.
In addition to measuring under load, you should also consider measuring when the system is idle, as many systems run without any significant load some of the time. The danger is that systems generate a disproportionately high IT load when idle, while optimisations are often only considered for operation under load.
Finally, you need to classify and evaluate your measurement results, as a measurement by itself will not help you make a decision. You can contextualise your measurement results by defining target KPIs and critical values. You must also give consideration to whether the environmental impacts are in reasonable proportion to the application. Just as with a life cycle assessment, it is useful here to define functional units – such as users, sessions or transactions – to which the KPIs are related.
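As a small illustration of relating a measurement to a functional unit and classifying it against a target and a critical value, consider the sketch below. The KPI name, the thresholds and the measured figures are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    name: str
    target: float    # value the team aims to stay at or below
    critical: float  # value above which action is required


def per_functional_unit(total_wh, units):
    """Relate a total energy figure to a functional unit such as sessions."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_wh / units


def classify(value, kpi):
    """Place a measured value relative to the target and critical thresholds."""
    if value <= kpi.target:
        return "on target"
    if value <= kpi.critical:
        return "needs attention"
    return "critical"


if __name__ == "__main__":
    # Hypothetical KPI and measurement: 120 Wh consumed across 60 sessions.
    kpi = Kpi(name="Wh per session", target=1.5, critical=3.0)
    value = per_functional_unit(total_wh=120.0, units=60)
    print(f"{kpi.name}: {value:.2f} -> {classify(value, kpi)}")
```

Normalising by sessions rather than reporting a total makes results comparable between a quiet week and a busy one.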
Summary
Data-based improvement strategies contribute more to achieving sustainability goals than blindly optimising code where you cannot objectively assess how effective it is. People are now highly familiar with the technical solutions for sustainable software. The challenges getting in the way of sustainable IT in organisations today tend to be culture and data. As long as sustainability is not seen and measured as a quality feature across the board, it will be impossible to assess which decisions will improve sustainability. As a final summary, here are the steps you should consider for sustainable testing:
- 1. Sustainability KPIs: Define KPIs, targets and functional units.
- 2. Source code-based measurement: Do this during the development phase to incorporate early insights into design decisions.
- 3. Static code analysis: Do this before integrating code to avoid common anti-patterns.
- 4. Reviews: Perform these in teams to discuss complex dependencies and conflicting goals transparently.
- 5. Monitoring: Do this continuously at runtime to check the source code-based measurements in operation.
Outlook
A major limitation of the tools and methods in this blog post is that they are restricted to greenhouse gas emissions and electricity consumption. Data centres, hardware, minerals, cooling systems, water consumption, submarine cables and e-waste cause measurable environmental impacts that go beyond pure electricity consumption. What is more, few tools take into account the power source at runtime, although this can be influenced during the development process, as shown in this blog post.
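One possible way of taking the power source into account is to shift deferrable work, such as batch jobs, to the hours in which the grid is cleanest. The sketch below is an assumption about how this could be approached and is not taken from the post referenced above. The forecast values are invented, and in practice they would come from a grid carbon intensity data source.

```python
def greenest_start_hour(forecast_g_per_kwh, duration_hours):
    """Return (start index, average intensity) of the cleanest contiguous window."""
    if not 1 <= duration_hours <= len(forecast_g_per_kwh):
        raise ValueError("duration must fit inside the forecast")
    best_start, best_avg = 0, float("inf")
    for start in range(len(forecast_g_per_kwh) - duration_hours + 1):
        window = forecast_g_per_kwh[start:start + duration_hours]
        avg = sum(window) / duration_hours
        if avg < best_avg:  # strict comparison keeps the earliest window on ties
            best_start, best_avg = start, avg
    return best_start, best_avg


if __name__ == "__main__":
    # Hypothetical hourly forecast in gCO2e/kWh for the next eight hours.
    forecast = [420, 390, 310, 180, 150, 210, 380, 450]
    start, avg = greenest_start_hour(forecast, duration_hours=2)
    print(f"start in {start} h, average {avg:.0f} gCO2e/kWh")
```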
At adesso, we are meeting this challenge together with our partners in the ECO:DIGIT project, which is funded by the German Federal Ministry for Economic Affairs and Climate Action (Bundesministerium für Wirtschaft und Klimaschutz). The overarching goal of the project is to establish transparency, objectivity and standardisation for assessing resource consumption in the use of software. Indicators will be the power consumption of the working environment, the use of hardware resources, but also other metrics such as the amount of raw materials and chemicals used in hardware production.
The tools and methods considered in this post are a good place to start, and you should not wait for the perfect methodology to be developed before getting the ball rolling with measuring sustainability. However, as long as the full extent of the environmental impact that systems have is not systematically recorded and evaluated, the actual quality of software, and thus of work results, cannot be conclusively assessed.