
💸 Attributing costs to test execution

How much does it actually cost to execute your tests, and what are the hidden costs we aren't considering?

The problem

On the face of it you may say: my test execution doesn't cost anything. I run my tests on a build agent or in a Docker container, and there are no costs incurred.

If you're using Selenium Grid or other services, there may be some direct costs in terms of infrastructure (either self-hosted or paid cloud).

The costs which aren't usually considered are:

  • How long does it take for your tests to execute?
  • How long are you waiting for the results?

And...

  • If a test or multiple tests fail, what is your process?
    • Do you re-run the build?
    • Do you have a way to only re-run failed tests?
    • Or do you manually execute the failed tests somewhere else?

All of this time adds up very quickly: a test suite that took 20 minutes to execute but had 5 flaky tests can easily turn into almost an hour of investigation and re-running of tests.


How can we put a value on time?

You could just directly add up the amount of time tests take to execute and monitor how long any extra investigation takes.

Another idea is to associate a cost directly with time. This might be easier if you are paying for your infrastructure, as you can see how much any execution and re-running is costing you.

And then there's the human element to it: do you consider the hours people spend as a direct cost that you calculate?


Let's look at an example

I like to use Microsoft's pricing model for their Playwright service as a baseline, as the pricing is fairly reasonable and easy to calculate with.

Hosted on Linux OS: $0.01 per test minute

Scenario

So you've got the following test run:

  • 100 tests running for 30 minutes - +$0.30
  • 3 tests fail, so you re-run the whole suite - +$0.30
  • 1 test still fails, so you manually test it - +$xx.xx?
  • That 1 failing test delays delivery of a new feature - +$xxx.xx?

The manual check could vary in cost, but based on an average mid-level salary I estimated about $10-$20 of time spent for a 30-minute investigation or manual check.

The cost of this flaky test becomes significantly more expensive when a human has to get involved.

Add in the impact of delaying a feature being released into production, and the costs multiply even further.
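
To make the maths concrete, here's a minimal sketch of how you might tally these costs. The rate, run counts and human-time figures are just the assumptions from the scenario above, not real billing data.

```typescript
// Rough cost model for the scenario above (all figures are assumptions).
const ratePerTestMinute = 0.01; // Playwright service on Linux, $ per test minute
const suiteMinutes = 30;        // test minutes billed per full run of the suite
const fullRuns = 2;             // initial run + one full re-run after the failures
const humanMinutes = 30;        // manual investigation of the remaining failure
const humanRatePerHour = 30;    // assumed mid-level cost (roughly $10-$20 per 30 minutes)

const infraCost = ratePerTestMinute * suiteMinutes * fullRuns;
const humanCost = (humanMinutes / 60) * humanRatePerHour;

console.log(`Infrastructure: $${infraCost.toFixed(2)}`); // $0.60
console.log(`Human time:     $${humanCost.toFixed(2)}`); // $15.00
console.log(`Total:          $${(infraCost + humanCost).toFixed(2)}`);
```

Even with these conservative assumptions, the human time dwarfs the infrastructure bill.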


What can we do?

There are many issues with the previous scenario; let's just highlight a few of them.

  • The time it takes to get feedback on the test suite is far too long
  • A small selection of flaky tests is having a large impact on time and costs
  • There should be a way to run only failing tests quickly
  • A decision needs to be made on what to do with flaky tests

Tackling flaky tests

Running only the failing tests is fairly simple depending on the framework you use; some have flags or parameters you can pass. Or you could create your own parameter that lets you specify which tests run, as in the sketch below.
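
For example, here's a minimal Playwright config sketch of that second approach; the TEST_GREP environment variable name is just an assumption for this example.

```typescript
// playwright.config.ts - a sketch of a custom "run only these tests" parameter.
// TEST_GREP is a hypothetical environment variable name chosen for this example.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Only run tests whose titles match TEST_GREP, e.g. TEST_GREP="checkout|login".
  // When TEST_GREP is unset, the whole suite runs as normal.
  grep: process.env.TEST_GREP ? new RegExp(process.env.TEST_GREP) : undefined,
});
```

A re-run of just the failures then becomes something like `TEST_GREP="name of failing test" npx playwright test`, rather than re-running the whole suite.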

Flaky tests, however, are a hot topic for debate: what do you do with them?

  • Ignore them?
  • Try to fix them?
  • Add extra retries?
  • Delete them?

I think whichever route you take with flaky tests, you need to be consistent and you need to be in agreement with your team. The uncertainty that comes with flaky tests is the most costly aspect.

I've found that trying to hack away at an existing flaky test never works; sometimes starting from scratch and executing the test incrementally as you make changes is much more effective.


Monitoring and SLAs

Most teams have visualisations or outputs of how long their tests take to execute, but the actual proactive monitoring is sometimes missing.

There are a few common SLAs that should be considered:

  • How long individual tests take to execute
    • Do you have a few tests in your suite taking significantly longer than the others?
  • How long the overall test suite takes to execute
    • And having a strict SLA that you make sure isn't breached, e.g. 5 minutes

And some less common ones:

  • How many retries/re-runs are acceptable
    • Retries are an easy option to use, but they sometimes mask underlying issues with your tests or application
    • Setting a strict limit of zero or one retry could benefit you more than multiple retries (see the config sketch after this list)
  • How many times does a test have to fail before it's considered flaky?
    • As we mentioned before, ignoring a 'known' failing test does more harm than good - if the test no longer adds value to your suite, make a decision on what action to take
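
As a rough sketch of how these SLAs could be encoded in a Playwright setup (the specific budgets below - 30 seconds per test, 5 minutes per run, one retry - are just the assumptions discussed above):

```typescript
// playwright.config.ts - a sketch of encoding the SLAs above (all budgets are assumptions).
import { defineConfig } from '@playwright/test';

export default defineConfig({
  timeout: 30_000,           // per-test SLA: fail any single test that runs longer than 30 seconds
  globalTimeout: 5 * 60_000, // suite SLA: abort the whole run if it exceeds 5 minutes
  retries: 1,                // strict retry budget: one retry at most, so flakiness stays visible
});
```

Playwright also labels a test that fails and then passes on retry as "flaky" in its report, which gives you a concrete signal for the last point above.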

Conclusion

Test execution is expensive, and most of the time it goes unnoticed because there isn't a physical invoice at the end of the month.

Taking extra care and attention with your test execution as a whole, and not just whether tests pass or fail, will have many benefits for your team in both the short and long term.