At IntegrationQA (iQA), we are advocates of building quality in, rather than testing defects out. But working within constraints can sometimes lead to the optimum solution.

If a client tells us that testing before pre-production is impossible, then we must usually accept the situation and work within the boundaries provided. This does not mean we cannot be smart about testing.

The Constrained Problem

For one client, the problem was a slow file processor used to transform very large files. Although the client did their own development, no testing was possible before an official acceptance phase.

Manual testing involved test analysts obfuscating old production files and dumping them into the processor. While waiting hours for processing to finish, the testers would prepare variations of the test files for the next run. Feedback times were abysmal, both when running a test and when waiting for fixes from the developers.

iQA were asked to come up with an automated doodad that would make this problem go away. The brief was something along the lines of “make testing faster, and don’t charge us much”.

The Proposed Solution

Since we weren’t able to shift left or influence practices before acceptance testing, we had to test smarter with what we had, and the only thing over which we had control was the input file. We could provide a generator that builds files according to specification. While this didn’t resolve our concerns about their approach, it did satisfy their requirements within their constraints.

I admit that I’m not keen on generating random test data (see box-out, Why generate random test data?), but as long as the issues with automated generation are understood and the risks accepted, it can be useful. You can test what happens when thousands of multi-terabyte files appear in the input directory at the same time. You can build valid empty files, invalid files, corrupt files and out-of-sequence files. Tests that were previously expensive are now easy.

Practical, Agile Design

We considered several options and design patterns for the generator. Each had pros and cons, but our priority was to keep costs low while improving testing performance. Grand solutions could wait.

For flexibility, though, I made a clear distinction between the file-level design and the format of each generated record. This allowed new record types to be added easily: record-data generators would be defined in one place, and test-file definitions would reference those records to generate test data.

We’d get something along the lines of:

  <record type="customerRecord"/>

Not abstract, not domain-specific, but fairly easy to read.
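
A test-file definition is then little more than a list of such references. As a rough sketch, with element and attribute names invented for illustration rather than taken from the actual tool:

  <file name="smoke-test.dat">
    <record type="headerRecord"/>
    <record type="customerRecord" count="1000"/>
    <record type="trailerRecord"/>
  </file>

New record types slot in without touching the file design, and vice versa.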

Decoupling the record format from the file-design files was simple, thanks to the ready availability of marshallers and serializers for common formats.
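
As a small illustration (in Ruby, to which we return below; the field names are invented), the same record data can be rendered in whatever layout a file design asks for, without the record definition knowing anything about it:

  require 'json'

  # A generated record is just data; the field names here are invented.
  record = { type: 'customerRecord', id: 42, name: 'A. Customer' }

  # One file design might want a JSON line per record...
  json_line = JSON.generate(record)

  # ...while another wants a pipe-delimited flat-file row.
  flat_line = [record[:id], record[:name]].join('|')

  puts json_line  # {"type":"customerRecord","id":42,"name":"A. Customer"}
  puts flat_line  # 42|A. Customer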

Implementation Matches Capability

The users of the generator tool were technical testers, familiar with several interpreted languages and with no fear of getting stuck into XML or JSON files. They were happy to use JSON files for the majority of the configuration.

Given this capability, the time and cost savings of developing in a language whose libraries provided everything needed to create the generator quickly and cheaply were compelling, even if it meant the testers would have to learn a new language. We chose Ruby and the excellent factory-girl library, which together allow rapid creation of text-based files in different formats.
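
A minimal sketch of the idea follows; the record class, field names and file layout are invented for illustration, and the real tool drove its factories from the testers’ definition files rather than hard-coding them:

  require 'factory_girl'

  # A plain value object standing in for one record type (invented fields).
  class CustomerRecord
    attr_accessor :id, :name, :balance
  end

  # Record-data generators are defined in one place...
  FactoryGirl.define do
    factory :customer_record, class: CustomerRecord do
      sequence(:id) { |n| n }   # 1, 2, 3, ... across generated records
      name    { 'A. Customer' }
      balance { 0 }
    end
  end

  # ...and a test-file definition uses them to emit records in some layout.
  File.open('smoke-test.dat', 'w') do |out|
    3.times do
      r = FactoryGirl.build(:customer_record)
      out.puts [r.id, r.name, r.balance].join('|')
    end
  end

Because the factory knows nothing about the output layout, swapping the pipe-delimited line for a JSON or XML serializer touches only the file-writing code.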

Using the generator required the tester to concentrate on three types of description file, each targeting a different outcome and together covering the bulk of input-data cases (a sketch of one such file follows the list):

  • Record variations: used both for examples and as a quick post-deployment smoke test.
  • Basic stress test: a huge test file with thousands of the most common records.
  • Comprehensive fields test: to test overflows and formatting glitches.
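
A record-variations description might look something like this hypothetical sketch (the schema is invented, not the client’s actual format):

  {
    "file": "record-variations.dat",
    "records": [
      { "type": "customerRecord" },
      { "type": "customerRecord", "overrides": { "name": "" } },
      { "type": "customerRecord", "overrides": { "balance": -1 } }
    ]
  }

A stress-test description would be the same shape, with counts in the thousands against the most common record types.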

Testing smarter can be smarter still

While we failed to shift the testing left or to build quality in, we can at least be happy that we’ve helped the client to test smarter. The testers could contribute earlier than before by creating examples that developers and customers can agree meet the business requirements. A robust, shared understanding of the file formats has the potential to significantly reduce the cost of finding and fixing defects.

We continue to have reservations about generating test data. When valid and invalid test data is available early in the development life cycle, developers may feel they can skip unit and integration testing in favour of system testing. We’re strongly against this tactic: deferring testing increases the cost of fixing defects due to the delay between development and detection.

But if you are constrained… think smart.

Why generate random test data?

It is common to use generators to produce random test data. Are you sure you don’t have these problems?

  • It’s hard to distinguish one field from the next when all fields contain random data.
  • The important edge cases and bug-finding values will almost never be randomly generated when you need them.
  • The files are usually generated once and used many times (especially the very large files, which might take an hour or two to generate), so they’re only random the first time.
  • It can be hard to figure out which random value is the one at fault.

It is easy to use a data generator as a crutch that perpetuates systemic shortcomings, proving only what you want it to prove and removing the involvement of a test-data designer.

What confidence can you take from a data generator that “proves” a process is working when nothing proves that the generator is working correctly? And when the format of the data changes, who updates and tests the data generator?