In the first part of this series, I described how we worked with a risk-averse organization to achieve their test automation targets while leading them to a culture more accepting of change. By now we were working with a more confident project management organization. They felt they could allow a few more significant changes, ones that might have been too unnerving for the customer at the beginning of the revamp project. The leadership of the development team were keen to grow the separate test automation framework, since they felt it allowed them to produce application features faster. Feedback from test automation into development was generally positive, and our recommended changes were always sensible. Oversight and reporting constraints were relaxed because shared tools and wikis contained most of the information that internal stakeholders needed. We were now in a position to add new goals to the project. The product was already undergoing continuous improvement: we wanted to see the processes and people improving continuously too.

Improving Specification

The project was using a form of scrum, tailored based on the experience of the few of us who had worked in agile delivery before. We adopted sprint planning, planning poker and retrospective ceremonies to complement the project’s long-standing daily stand-up. Our product owner and business analysts, although new to agile delivery, did great work preparing for each ceremony, and work flowed through our sprints quickly.

We still had separate development and test automation teams. The teams worked off the same product backlog, with features appearing on the test automation board when moved into the “testable” state on the development board. By the time tests were designed, the initial implementation was already complete. I was concerned that this exposed us to the risk of (test) confirmation bias: we would test that the product did things right, rather than testing that it did the right things. I pushed for earlier feature specification, so that tests could be designed against a shared understanding instead of a working (but potentially wrong) product. We experimented with example mapping. At first this was well received, but as development velocity increased and the number of stories completed per sprint approached sixty, its use waned in favour of feature expositions by the product owner or business analyst. These brief meetings gave the attending developer, tester and user representative the opportunity to raise concerns, without the back-and-forth process of collaborative specification creation. This worked for us because the domain was so well understood by the product owner and analysts. The fact that every user of the product sat within fifty metres of the development team also helped!

Improving Test Automation

At this stage we were still dealing with automated service tests written in PowerShell. The minimal user interface was exercised with just light exploratory testing. Initially this was acceptable because the user interface was experimental and continuously changing: the only requirement for testing it was that it was good enough for managed demonstrations. As we prepared to start work on potentially releasable user interface stories, we needed to decide how to automate UI-level regression checks. We used our technical successes to convince the development and management teams that we were able to drive Selenium tests through C#, and that we would not need much assistance from the developers. If anything, writing the tests in C# would allow the developers to leverage or adopt them later, should the project staffing model ever change.

The key lesson for the test team as they learned to design automated user interface tests was to limit the scope of each test to the bare minimum. Testing a new user interface feature as a manual tester usually involves understanding how the feature is likely to be used in production, how it fits into the business process, what other actions are likely to happen before and after the feature under test, and so forth. Designing UI-level automated tests that tolerate UI changes and live as long as the feature under test requires doing as little as possible through the user interface. Ideally, the only interaction an automated test would have with the user interface would be to check a single UI element’s value, to make a single click, or something like that. Everything else should be done via calls to robust and well-specified APIs, including preparing all entities required by the test and asserting on any values changed by the action. In practice this ideal is usually out of reach, but we did develop patterns that applied to most pages of the web application and resulted in good, short tests that gave us more than enough coverage to satisfy everyone on the project. Each form submission was tested with two simple UI-level tests: one that populated and submitted a form through the UI and verified that the correct data was persisted, and one that used the equivalent API call to persist the data and asserted that the UI rendered the resulting entities correctly. Cross-cutting functionality such as logging in and field validity checks was not retested on every page. These were considered features in their own right, and automated UI checks were developed for them only when those features were initially developed.
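To make the pattern concrete, here is a minimal sketch of the pair of tests we would write for a form. The helper names are hypothetical (`ApiClient`, `TestConfig` and the customer entity are illustrative, not the project’s actual code); the point is how little the UI is asked to do.

```csharp
using NUnit.Framework;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

[TestFixture]
public class CustomerFormTests
{
    private IWebDriver _driver;
    private ApiClient _api;   // hypothetical thin wrapper over the application's services

    [SetUp]
    public void SetUp()
    {
        _driver = new ChromeDriver();
        _api = new ApiClient(TestConfig.BaseUrl);   // TestConfig is illustrative
    }

    [TearDown]
    public void TearDown()
    {
        _driver.Quit();
    }

    [Test]
    public void SubmittingCustomerForm_PersistsCustomer()
    {
        // The UI does the bare minimum: fill in the form and submit it.
        _driver.Navigate().GoToUrl(TestConfig.BaseUrl + "/customers/new");
        _driver.FindElement(By.Id("Name")).SendKeys("Acme Ltd");
        _driver.FindElement(By.Id("Save")).Click();

        // Verification happens through the API, not by scraping more pages.
        Assert.IsNotNull(_api.GetCustomerByName("Acme Ltd"));
    }

    [Test]
    public void CustomerCreatedViaApi_IsRenderedInList()
    {
        // Setup happens through the API; the UI is only asked to render the result.
        var customer = _api.CreateCustomer("Globex Corp");

        _driver.Navigate().GoToUrl(TestConfig.BaseUrl + "/customers");
        var row = _driver.FindElement(By.Id("customer-" + customer.Id));
        StringAssert.Contains("Globex Corp", row.Text);
    }
}
```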

To populate the system with the test data that the UI tests needed, we leveraged the previously developed services directly. In a few cases we had to develop additional features to support this; for example, the application would never allow creation of reference data through its services, but we needed that data in the empty test environments. Fortunately, adding new test-only services was easy and did not pollute the application assemblies, because we had a shared test infrastructure assembly into which we could put low-level services available only to the test suites. This would have been hard in a fully siloed environment, but having a foot in both the development and test automation teams made this work possible.
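For example, a seeding helper in that shared assembly might look roughly like this; the type names are hypothetical (`AppDbContext` and `Country` stand in for whatever the real entities were):

```csharp
using System.Linq;

// Lives in the shared test infrastructure assembly, referenced only by the test
// suites, so it never ships with (or pollutes) the application assemblies.
public class ReferenceDataSeeder
{
    private readonly string _connectionString;

    public ReferenceDataSeeder(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Insert reference data the application deliberately exposes no service for,
    // so that a freshly deployed, empty test environment has what the tests need.
    public void EnsureCountries(params string[] countryNames)
    {
        using (var context = new AppDbContext(_connectionString))   // hypothetical EF context
        {
            foreach (var name in countryNames)
            {
                if (!context.Countries.Any(c => c.Name == name))
                {
                    context.Countries.Add(new Country { Name = name });
                }
            }
            context.SaveChanges();
        }
    }
}
```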

The new Selenium tests were driven through NUnit, which the development team was already using. This meant that the new reports Jenkins built from these tests cost nothing to integrate everywhere the existing unit test reports already appeared: HipChat, Confluence, aggregated test reports, executive summaries and more. The old PowerShell tests, though better than the really old Selenium IDE tests, were not integrated in the same ways. This gave us another reason to pay down some of the technical debt we had accrued: replacing the PowerShell service tests with simpler C# service tests made everything more maintainable, reduced test execution time and allowed us to deliver all test results in a single report created by NUnit and Jenkins. Fortunately, the delivery of UI stories through the development team’s sprints was a little slower than the delivery of the earlier service stories, so we were able to have some people eliminating debt while others worked on new stories.
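The replacement service tests were deliberately plain. A minimal sketch, assuming an illustrative endpoint and a hypothetical `TestConfig` holding each environment’s base URL:

```csharp
using System;
using System.Net;
using System.Net.Http;
using NUnit.Framework;

[TestFixture]
public class CustomerServiceTests
{
    private HttpClient _client;

    [SetUp]
    public void SetUp()
    {
        // TestConfig is illustrative; each test environment supplied its own base URL.
        _client = new HttpClient { BaseAddress = new Uri(TestConfig.BaseUrl) };
    }

    [TearDown]
    public void TearDown()
    {
        _client.Dispose();
    }

    [Test]
    public void GetCustomers_ReturnsOk()
    {
        // The same kind of check the PowerShell scripts made, now reported by NUnit.
        var response = _client.GetAsync("/api/customers").Result;
        Assert.AreEqual(HttpStatusCode.OK, response.StatusCode);
    }
}
```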

Improving Build Automation

As more UI stories were prepared for test automation, and fewer services were being added to the system, we found that the time required to complete the full build started growing very quickly. We were also committing more frequently because we were often making small changes to existing tests, and more frequent commits mean more frequent continuous integration builds. When a CI build takes longer than it takes to change the code and commit it, CI becomes more of an impediment than the valuable resource it was built to be.

At this point in the project, every feature was still new enough to require its original test coverage. As features mature and stabilize, the value in a comprehensive suite of automated tests goes down, and the cost of maintaining a large number of tests goes up. But we were not yet at the point where we could start merging or deleting tests. Instead, I made three changes that kept the build time within our team’s tolerances.

First, I added more Jenkins nodes and associated test environments. There was already a simple deployment procedure, kicked off from within the Jenkins pipeline, which leveraged Entity Framework’s Code First approach and Octopus Deploy. Parameterizing it meant that a single Jenkins job could be used in different ways by different pipelines, which allowed me to create multiple test pipelines that ran specific sets of tests on specific environments. For the test stages in the pipeline, this gave us very close to the theoretical limit in performance improvement: when we doubled the number of test environments available to the pipeline, we (almost) halved the time taken to complete those stages.
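As an illustration of the parameterization (the job, script and parameter names are hypothetical, and our real setup was assembled from chained Jenkins jobs rather than this exact declarative pipeline), the shared deploy-and-test job looked conceptually like this:

```groovy
pipeline {
    agent { label 'test-node' }
    parameters {
        string(name: 'TARGET_ENVIRONMENT', defaultValue: 'test-01',
               description: 'Test environment to deploy to and run against')
        string(name: 'TEST_CATEGORY', defaultValue: 'Services',
               description: 'Which slice of the NUnit suite to run')
    }
    stages {
        stage('Deploy') {
            steps {
                // deploy.cmd (hypothetical) wraps the Octopus Deploy call and the
                // Entity Framework Code First database update for the chosen environment.
                bat "deploy.cmd ${params.TARGET_ENVIRONMENT}"
            }
        }
        stage('Test') {
            steps {
                bat "nunit3-console.exe Tests.dll --where \"cat == ${params.TEST_CATEGORY}\" --result=TestResult.xml"
            }
        }
    }
    post {
        always {
            // Publish results (Jenkins NUnit plugin) so Jenkins, HipChat and
            // Confluence all see the same report.
            nunit testResultsPattern: 'TestResult.xml'
        }
    }
}
```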

Next, I upgraded NUnit, because more recent versions (3 and up) have much improved support for multi-threaded tests. This was the easiest way to take full advantage of the multi-core CPUs available to us. It was not as simple as adding more Jenkins nodes, because nothing in the test framework had been designed to be thread safe. But once we had fixed the framework and the tests, we got a reasonable performance improvement: going from one thread to six threads (on an eight-core machine) halved the test time, and we found no measurable improvement going to seven or eight threads. Multi-threading the tests also found a few problems in the web application under test. Later in the project, we were able to massively reduce the scope of performance testing thanks to the nature of the multi-threaded automated tests: we had confidence that the thousands of parallel tests we were running every day were hammering the system far harder than thirty real users would ever be able to.
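The NUnit changes themselves were small; the real work was making the framework and tests thread safe. A minimal sketch of the attributes involved (the thread count matches what worked for us, and the fixture is illustrative):

```csharp
using NUnit.Framework;

// In the test assembly: cap NUnit's worker thread count. On our eight-core
// agents six threads was the sweet spot; seven or eight gave no measurable gain.
[assembly: LevelOfParallelism(6)]

// Opt individual fixtures in to parallel execution. Once this attribute is
// applied, anything the tests share (WebDriver instances, HTTP clients, test
// data builders) has to be created per test or made thread safe.
[TestFixture]
[Parallelizable]   // this fixture may run alongside other parallelizable fixtures
public class CustomerApiTests
{
    // ...
}
```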

Finally, I introduced Selenium Grid. Grid introduces an extra element into the user interface testing infrastructure: rather than a test driving a web browser directly, it drives a grid which delegates to one of many web browsers it controls. We had not needed this previously because Selenium Grid is most often used to facilitate multi-browser testing and the system was officially supporting only a single version of a single browser. I added it so we could take advantage of several under-utilized machines available to us. Web browsers are able to consume all the memory and CPU cycles you throw at them! Although this improvement trimmed just ten or fifteen percent off the time of the user interface tests, it did greatly reduce the occurrence of timeouts that we attributed to resource starvation on the machines running the browsers.
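The change to the tests themselves was tiny: instead of constructing a local driver, each test asks the grid for a browser. A sketch, assuming an illustrative hub address and Chrome as the single supported browser:

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Remote;

public static class Browser
{
    // Hypothetical hub URL; ours pointed at the grid sitting in front of the
    // under-utilized machines that hosted the browsers.
    private static readonly Uri GridHub = new Uri("http://grid-hub:4444/wd/hub");

    public static IWebDriver Create()
    {
        // The project officially supported a single browser version, so a single
        // options type is enough; the grid chooses which registered node runs it.
        var options = new ChromeOptions();
        return new RemoteWebDriver(GridHub, options.ToCapabilities());
    }
}
```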

Improvement As A Goal

Throughout the first year of the rebooted project, we continually reviewed how we worked, what we delivered and what was required of us. Although most of the team were new to test automation and to programming in general, we were able to deliver excellent automation coverage for all core areas and most of the ancillary areas of the product. All the other teams on the project had confidence in our regression suites. We achieved this because everyone on the team was keen to experiment, to learn new ways of doing things and to put effort into automating anything repetitive and boring. During this phase of the project, each tool we added brought new capabilities that changed how the teams worked together and with the tools, and in turn those new ways of working changed the demands the teams put on the infrastructure and tooling. Small, frequent improvements became the norm: we had reached my goal of a culture of continuous improvement.

Not all changes were in the direction of increased automation. At one point mid-way through the project, features were being delivered to our team so fast that we had to drastically cut the number of new automated checks we created. Most of the team had time for nothing but exploratory testing. But because the new features were variations or extensions of existing features that did have good automation coverage, the development and management teams accepted the extra risk with confidence. None of those features ever caused any problems: we accidentally learned that not every feature requires even initial automation coverage. It’s more important to test the basics well than to get full test coverage (whatever that means).

Before the customer knew us and our capabilities, we weren’t able to ask them to simply trust us to get on with whatever was best for the project. And there was no way we could have predicted all the changes we would make in order to improve the product and our work. So what we described in our initial plans and strategy documents barely scratches the surface of what we were able to deliver or how we delivered it. In the final part of this series, I’ll compare what we said we’d do with what we actually did.