Automation is a great way to run tests faster and more frequently than manual execution. There is a plethora of tools and frameworks available for automating every kind of test imaginable. However, automation is not a free lunch. Creating it requires full software development skills, and keeping it running requires perpetual maintenance.
Ongoing maintenance costs are inevitable. Products evolve. Features change. Better tools come out. And things don’t always turn out right the first time. Teams can feel trapped if the burden of their test projects becomes too great.
So, when is test automation maintenance too much? At what point does it take too much time away from actually testing product behaviors and finding bugs? Let’s look at three ways to answer this question.
The Hard Math
One way to determine if automation maintenance is too burdensome is through the “hard math.” This is the path for those who like metrics and measurements. All things equal, the calculations come down to time.
Assume an automated test case is already written. Now, suppose that this automated test stops working for whatever reason. The team must make a choice: either fix the test or abandon it. If abandoned, the team must run the test manually to maintain the same level of coverage. Both choices – to fix it or to run it manually – take time.
Clearly, if the fix takes too much time to implement, then the team should just run it manually. However, “too much time” is not merely a comparison of the time to fix versus the time to run manually. The frequency of test execution matters. If the test must be run very frequently, then the team can invest more time for the fix. Otherwise, if the test runs rarely, then the fix might not be worthwhile, especially if it takes lots of time to make.
Let’s break down the math. Ultimately, to justify automation:
Time spent on automation < Time spent manually running the test
Factoring in fixes and test execution frequency:
Time to fix + (Automated execution time x Frequency) < (Manual execution time x Frequency)
Isolating the time spent on the fix:
Time to fix < (Manual execution time – Automated execution time) x Frequency
There’s a famous xkcd comic entitled “Is It Worth the Time?” that illustrates this tradeoff perfectly.
Let’s walk through an example with this chart. Suppose the test in question runs daily, and suppose that the automated version of the test runs 1 minute faster than the manually-executed version. Over 5 years, this test will run 1825 times. If the test is automated, the team will save 1825 minutes in total, which is a little more than 1 day. Thus, if the team can fix the test in less than a day, then fixing the test is worth it.
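To make the arithmetic concrete, here is a minimal sketch of that break-even calculation in Python. Only the daily frequency, the 1-minute savings per run, and the 5-year lifespan come from the example above; the function name and the specific 2-minute/1-minute execution times are illustrative placeholders (only their difference matters).

```python
def max_fix_time(manual_minutes: float, automated_minutes: float, total_runs: int) -> float:
    """Return the longest a fix can take (in minutes) and still beat manual execution."""
    return (manual_minutes - automated_minutes) * total_runs

# The example above: a daily test, a 1-minute savings per run, a 5-year lifespan.
total_runs = 1 * 365 * 5  # 1825 runs
limit = max_fix_time(manual_minutes=2.0, automated_minutes=1.0, total_runs=total_runs)
print(f"Fixing is worthwhile if it takes under {limit:.0f} minutes (~{limit / 60 / 24:.1f} days)")
# -> Fixing is worthwhile if it takes under 1825 minutes (~1.3 days)
```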
In theory, the hard math seems great: just pop in the numbers, and the answer becomes obvious. In practice, however, it’s rather flawed. The inputs are based on estimations. Manual test execution time depends on the tester, and automated execution time depends on the system. Test frequency can change over time, such as moving a test from a continuous suite to a nightly one. Frequency also depends on the total lifespan of the test, which is difficult to predict. From my experience, most tests have a longevity of only a few years, but I’ve also seen tests keep going for a decade.
Furthermore, this calculation considers only the cost of time. It does not consider the cost of money. Automated testing typically costs more than manual testing, both in labor and in computing resources. For example, if automation were twice the price, then perhaps the team could hire two manual testers instead of one automation engineer. The two testers could grind through manual testing in half the time, which would alter the aforementioned calculations.
Overall, the hard math is helpful but not perfect. Use it for estimations.
The Gut Check
Another way to determine if automation maintenance is too burdensome is through the “gut check.” Teams intuitively know when they are spending too much time fixing broken scripts. It’s an unpleasant chore. Folks naturally begin questioning if the effort is worthwhile.
My gut check triggers when I find underlying issues in a test project that affect multiple tests, if not the whole suite. These issues run deeper than increasing a timeout value or rewriting a troublesome assertion. A quick fix may get tests running again temporarily, but inevitably, more tests will break in the future. Since underlying problems are systemic in nature, they can require a lot of effort to fix properly. I size up the cost as best as I can and do a little hard math to determine if the real fixes are worthwhile.
Here are six underlying issues that always trigger my gut check:
Feature Churn
Development is dynamic. The product under test is always changing. Sometimes, a team makes changes iteratively by adding new behaviors onto the existing system. Other times, a team decides to rewrite the whole thing, changing everything. (Hey, they need to try that newfangled frontend framework somehow!)
Changes to the product inevitably require changes to the tests. This is normal and expected, and teams should expect this kind of maintenance cost. Excessive changes, however, become problematic. If features keep churning, then the team should consider running tests manually until the product becomes more stable.
Unstable Environments
An “unstable” environment is one in which the product under test cannot run reliably. The app itself may become cripplingly slow. Its services may intermittently go down. The configuration or the data may unexpectedly change. I once faced this issue at a large company that required us to do all our testing in a “staging” environment shared by dozens of teams.
Unstable environments can ruin a test suite that is otherwise perfectly fine. Humans can overcome problems with instability when running their tests, but automated scripts cannot. Teams should strive to eliminate instability in test environments as a prerequisite for all testing, but if that isn’t possible, then they should build safety checks into automation or consider scaling back their automation efforts.
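What a “safety check” looks like varies by team, but one common pattern is to verify environment health before the suite runs and to retry interactions that fail for environmental reasons. Here is a minimal sketch in pytest; the health-check URL and the retry policy are illustrative assumptions, not any particular framework’s features.

```python
import time

import pytest
import requests

# Hypothetical health endpoint for the environment under test.
HEALTH_URL = "https://staging.example.com/health"


@pytest.fixture(scope="session", autouse=True)
def environment_must_be_healthy():
    """Skip the whole session instead of failing every test against a downed environment."""
    try:
        response = requests.get(HEALTH_URL, timeout=5)
        response.raise_for_status()
    except requests.RequestException as error:
        pytest.skip(f"Test environment is unavailable: {error}")


def retry(action, attempts=3, delay_seconds=2):
    """Retry a flaky interaction a few times before letting the test fail for real."""
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as error:  # narrow this to expected errors in real code
            last_error = error
            time.sleep(delay_seconds)
    raise last_error
```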
Unimportant Tests
Let’s face it: some tests just aren’t as valuable as others. In a perfect world, teams would achieve 100% coverage. In the real world, teams need to prioritize the tests they automate. If an unimportant test breaks, a team should not just consider if the test is worth fixing: they should consider if the test is worth keeping.
When considering if a test is important, ask this question: If this test fails, will anyone care? Will developers stop what they are doing and pivot to fixing the bug? If the answer is “no,” then perhaps the test is not important.
Interdependent Tests
Automated test cases should be completely independent of each other. They should be able to run individually, in parallel, or in any order. Unfortunately, not all teams build their tests this way. Some teams build interdependencies between their tests. For example, test A might set up some data that test B requires.
Interdependencies between tests create a mess. Attempting to change one test could inadvertently affect other tests. That’s why test case independence is vital for any scalable test suite.
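For instance, instead of test A creating data that test B consumes, each test can create and clean up its own data through a setup fixture. Here is a minimal pytest sketch; the `orders_api` client and its methods are hypothetical stand-ins for the product’s real API.

```python
import pytest

from myapp.api import orders_api  # hypothetical API client for the product under test


@pytest.fixture
def order():
    """Give each test its own freshly created order instead of reusing another test's data."""
    new_order = orders_api.create_order(item="widget", quantity=1)
    yield new_order
    orders_api.delete_order(new_order.id)  # clean up so no other test ever sees this data


def test_order_can_be_cancelled(order):
    orders_api.cancel_order(order.id)
    assert orders_api.get_order(order.id).status == "cancelled"


def test_order_can_be_shipped(order):
    orders_api.ship_order(order.id)
    assert orders_api.get_order(order.id).status == "shipped"
```

Because neither test depends on the other, they can run individually, in parallel, or in any order.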
Complicated Tests
Simple is better than complex. A test case should be plainly understandable to anyone who reads it. When I need to read a complicated test that has dozens of steps and checks a myriad of behaviors, I feel overwhelmed: it takes extra time just to figure out what the test intends to do. The automation code itself should also follow standard patterns and conventions so that anyone on the team can maintain it.
Simple tests are easy to maintain. Complicated tests are harder. ‘Nuff said.
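As a rough sketch of what “simple” looks like in practice, the test below declares one intention, uses a descriptive name, and keeps its steps easy to follow; the page object and the `browser` fixture are hypothetical.

```python
from myapp.pages import LoginPage  # hypothetical page object for the product under test


def test_user_can_update_display_name(browser):  # `browser` assumed to be a framework-provided fixture
    # One behavior, a handful of steps, and an assertion anyone can follow.
    profile = LoginPage(browser).log_in("testuser", "s3cret").go_to_profile()
    profile.set_display_name("New Name")
    assert profile.display_name == "New Name"
```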
Persistent Flaky Failures
Flakiness is the bane of black-box test automation. It results from improperly handled race conditions between the automation process and the system under test. Automation must wait for the system to be ready before interacting with it. If the automation tries to, say, click a button before the page has finished loading it, then the script will crash and the test will fail.
A robust test project should handle waiting automatically at the framework level. Expecting testers to add explicit waits to every interaction is untenable. Tests that don’t perform any waiting are flatly unmaintainable because they could intermittently fail at any time.
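In a Selenium-based project, for example, framework-level waiting usually means wrapping every interaction so that it waits for its target element automatically. Here is a minimal sketch built on Selenium’s `WebDriverWait`; the wrapper class is illustrative, not any particular framework’s API.

```python
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


class Browser:
    """Thin wrapper so individual tests never need to write explicit waits."""

    def __init__(self, driver, timeout_seconds=10):
        self.driver = driver
        self.wait = WebDriverWait(driver, timeout_seconds)

    def click(self, locator):
        # Wait until the element is clickable before interacting with it.
        self.wait.until(EC.element_to_be_clickable(locator)).click()

    def text_of(self, locator):
        # Wait until the element is visible before reading its text.
        return self.wait.until(EC.visibility_of_element_located(locator)).text

# Usage: browser.click((By.ID, "submit")) waits for the button instead of crashing
# when the page is still loading.
```

With a wrapper like this, the waiting logic lives in one place instead of being scattered across every test.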
The Opportunity Cost
A third way to determine if automation maintenance is too burdensome is its opportunity cost. What could a team do with its time instead of fixing broken scripts? The “hard math” from before presented a dichotomy between manual execution and automation. However, those aren’t the only two options to consider.
For example, suppose a team spends all their testing time running existing automated suites, triaging the results, and fixing any problems that arise. There would be no capacity left for exploratory testing – and that’s a major risk. Exploratory testing forces real humans to actually try using the product, revealing user experience issues that automation cannot catch. There would also be no time for automating new tests, which might be more important than updating old, legacy tests.
Even if a team’s gut check leans toward fixing broken automation, and even if the hard math validates that intuition, a team may still choose to abandon those tests in favor of other, more valuable activities. It’s a tough call to make. Justify it with opportunity cost.
The Final Answer
Whichever way you determine if test maintenance is taking too much time away from finding defects, be sure to avoid one major pitfall: the sunk cost fallacy. Endlessly fixing broken scripts that keep flailing and failing is counterproductive. Test automation should be a force multiplier. If it becomes a force divider, then we’re doing it wrong. We should stop and reconsider our return on investment. Use these three ways – the hard math, the gut check, and the opportunity cost – to avoid the sunk cost fallacy.