Developing quality: Towards an integration test data strategy

For me, integration tests are what really add that extra feeling of accomplishment to a piece of software I deliver. Achieving a decent design and a bunch of unit tests also adds to that feeling, but the integration test is that final touch (not that they are written last, but that's a different discussion...).

This post deals with system-level integration tests, where we test many components of the system in a deployed environment. We test the system like the user would, using a GUI, a web service or other interfaces. These tests should be portable to other environments so that we can use them as regression tests during the application's life cycle.

Cheating the data pain

For almost any integration test, data is something we have to consider. Our integration test commonly depends on some amount of data being set up prior to the test. It might be data that your code uses, valid parameters it needs or data it produces. Selecting and managing this data is often hard and has been a frequent pain point for projects I have been part of.

So why is test data painful? Often the models our software is built on are complex, so understanding them requires hard work. It might be easy enough to understand a model well enough for one test case to work, but it is a completely different thing to gain a general understanding in order to create dozens of test cases. Another painful attribute is portability. You might own and know the development environment pretty well and you may have some "dummy data" set up, but what if you are testing in the UAT environment? Customers will have access and as we all know - they won't handle it gently...

So. Things are hard and painful. What happens? I have a few options, pick one...

We skip it. Integration tests take too much time, are too expensive and have no value.
We skip it. We have unit tests.
We kind of skip it. We create tests only in our local environment, that will have to do!
We think we don't skip it, but we really do. We create smaller smoke tests in the environments outside of our control.
We do it. We test all the environments, since we want our bases covered. We know that stuff happens and that any environment is a new one and that if we don't find the bugs - customers will.

Okay, that cheesy list might not be true or resemble any reality that you know - but for anything difficult we tend to cheat. We do it on different levels and for different reasons, but we do it. We cheat.

Enduring the data pain

Since I have cheated the data pain many times I wanted to explore how I could bring some order to this mess. That's what we developers do, we organize messy things into stuff that at least we can understand.

I think there are ways to cheat that actually don't impact the quality of your tests.
So, let's get to it. Basically, you have four approaches for any data in your tests.

1. Hard-coded
This is the "quick and dirty" approach. Here we assume that the same value will be available no matter what. Even if this may be true, it tends to be a quite naive approach. Moving from one environment to the other, data will change. But this approach is acceptable in certain cases:

When you are just trying to get something together for the first time
When you are creating throw-away tests (Why would you? Even the simplest test adds value to your regression test suite!)
When data really IS that stable (Countries, Languages etc)

2. Find any

This approach is a bit more ambitious, but still requires low effort. Let's assume that you need to use a Country for some reason. Your environment is not set up for every single country in the world, nor are countries static - approach 1 is out of the question. For a database scenario, we'll create a simple "SELECT TOP 1 ... FROM xxx" query to retrieve the value to use. We don't care what country we get, as long as it's valid. Selecting only the columns you need is a sound approach for many reasons; one is improved resilience against schema changes.

Note: My examples assume that your data can only be retrieved from a database but, depending on the system, you might be able to collect data via web services, REST services etc.
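
To make "find any" concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for a deployed environment, so it says LIMIT 1 where the T-SQL query above would say TOP 1. The country table, its columns and the find_any_country_code helper are assumptions made up for the illustration, not part of any real system.

```python
import sqlite3


def find_any_country_code(conn):
    """Return the code of an arbitrary country, or None if the table is empty."""
    # Select only the column the test needs. LIMIT 1 is SQLite's
    # counterpart to T-SQL's TOP 1.
    row = conn.execute("SELECT code FROM country LIMIT 1").fetchone()
    return row[0] if row else None


# Hypothetical schema and rows, standing in for whatever the environment holds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO country (code, name) VALUES (?, ?)",
    [("SE", "Sweden"), ("NO", "Norway")],
)

# The test does not care which country it gets, only that it is a valid one.
country_code = find_any_country_code(conn)
```

Returning None for an empty table lets the test fail with a clear message about missing data rather than somewhere deep inside the system under test.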

3. Find with predicate

Here's the more ambitious cousin of option 2. This time we make the same "SELECT TOP 1..." query, but we add some WHERE conditions, since exactly which entity we get is important. In the simplest scenario we might just want to make sure that the entity we use has not been "soft-deleted". Another example (sticking to the country scenario) would be that we want a country that has defined states. Again, only query against columns that you use. When these predicates become very advanced and start to grow hair, consider this:

Will the predicate always produce a match, is the data stable enough? In all environments?
Should you consider creating a matching entity instead, using option 4?
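
A sketch of the country-with-states example, again in Python against an in-memory SQLite database (LIMIT 1 instead of TOP 1). The tables, the is_deleted soft-delete flag and the find_country_with_states helper are all hypothetical names chosen for the illustration.

```python
import sqlite3


def find_country_with_states(conn):
    """Return the code of a non-deleted country that has states, or None."""
    row = conn.execute(
        """
        SELECT c.code
        FROM country AS c
        WHERE c.is_deleted = 0
          AND EXISTS (SELECT 1 FROM state AS s WHERE s.country_code = c.code)
        LIMIT 1
        """
    ).fetchone()
    return row[0] if row else None


# Hypothetical schema: SE has no states, XX has states but is soft-deleted.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (code TEXT PRIMARY KEY, is_deleted INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE state (country_code TEXT NOT NULL, name TEXT NOT NULL);
    INSERT INTO country (code, is_deleted) VALUES ('SE', 0), ('US', 0), ('XX', 1);
    INSERT INTO state (country_code, name) VALUES ('US', 'Oregon'), ('XX', 'Nowhere');
    """
)

match = find_country_with_states(conn)
if match is None:
    # No match in this environment: the predicate is not portable here.
    raise LookupError("no active country with states found")
```

The explicit None check is the place where the two questions above get answered for a given environment: if the lookup comes back empty, the data was not stable enough for this predicate.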

Beware: Some might think that updating an existing record is a good idea. Then you might produce a match, but you will also leave a footprint - that has to be removed. Updated entries are a lot harder to keep track of than inserted ones, since you need to select and remember the previous values.

4. Create your own

This is the hardcore solution. If you want it done right, do it yourself! Our selects now become inserts and we create that entity that we need. This requires the deepest level of model knowledge since you need to know every column and table relation in order to make a valid insert.
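
As a rough sketch of what "create your own" can look like, the Python snippet below inserts one row, remembers its id and deletes exactly that row afterwards. It runs against an in-memory SQLite database, and the customer table, its columns and the temporary_customer helper are assumptions for the illustration. A real model would typically need far more columns and related tables.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def temporary_customer(conn, name, customer_type):
    """Insert a customer for the duration of a test and delete it afterwards."""
    cursor = conn.execute(
        "INSERT INTO customer (name, customer_type) VALUES (?, ?)",
        (name, customer_type),
    )
    customer_id = cursor.lastrowid  # remember exactly which row is ours
    conn.commit()
    try:
        yield customer_id
    finally:
        # Delete only the row we inserted, even if the test body raised.
        conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))
        conn.commit()


# Hypothetical schema with one pre-existing row, since deployed
# environments are rarely empty.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, customer_type TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer (name, customer_type) VALUES ('existing', 'standard')")
conn.commit()

with temporary_customer(conn, "integration-test-customer", "gold") as customer_id:
    rows_during = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
rows_after = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Keeping the insert and the delete in one helper means the inserted id never has to be tracked anywhere else.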

So, if this is so great - why not use it everywhere and take the power back!? Well, there are a couple of reasons why such an approach has problems.

Vulnerable tests
When you stick a large number of INSERT statements in your test setup, you are depending heavily on a stable database schema. Any new non-nullable column, renamed column or removed column will shred your test to pieces. And it will probably not fail in a pretty way, but in a time-consuming way that ultimately will make people question your ambitious effort.
Non-portable tests
I am targeting system-level integration tests, that use the entire system - or at least as much of it as possible. Inserting data will assume that no duplicate data already exists, which is no problem in your empty developer sandbox database. However, I am guessing that empty databases are not that common in your deployed environments... Therefore moving your test suite closer to the production environment will be impossible. There's just no way that those environments will be empty.
Time
Simply, this approach just takes too long. Figuring all the references out, understanding every piece of the database model even if many of them are irrelevant to what you are testing. Time can be spent more wisely.
Footprint
Many inserts, large footprint. Cleaning it up is a large part of that data pain.

Selecting the right approach

So, I have these four options, how do I select what to use when? I'll give you the architect's answer: it depends. It depends on several things but I've started to think that there are two axes for categorizing test data.

[Figure: A model for categorizing test data and selecting a data approach]

The Y axis represents the stability of your data. As for the example of countries, a system where all countries are present everywhere - that's pretty stable data. On the other end of that axis is purely transactional data, data produced by transactions - such as orders, customers etc.

The X axis is of a more subjective nature. For any test case there is data that you'd wish you didn't need. It's required in order to create or find more important data; it is reference data - supporting data. On the other end we have data that is what you are actually testing. Maybe I am currently testing how different customer types affect the order processing; then the customers and orders are what I am focused on. The focal data drives your tests and your software's logic.

Making the distinction between what data is focal and what is supportive is crucial to avoid test data pain. It also drives us to understand what we are testing and what we are not. The "not" part is in my experience the most important, as it gives us boundaries that divide our problem into manageable chunks.

Summary

In projects I try to defend the quality of both the software and the tests. Schedules and pressure might lead us to think that cutting corners is a smart move, but it rarely is. That's why I wanted to bring some order to an area where I spend too much time arguing about stuff taking too long.

For some time I have advocated an approach where all data is created before the test and removed after, a strict "create your own" way. This is not only stupid but scares co-workers away from testing. Considering other options and seeing data from a focal/supportive and dynamic/stable perspective enables me to make new decisions for each situation and not try to fit every integration test into the same mold. It gives me the capability to put the effort where it is needed and put the slack where it is acceptable.

In the end, I just want higher quality tests and more of them. This might be one piece of the puzzle.

Tuesday, March 27, 2012