Tuesday, March 27, 2012

Towards an integration test data strategy

For me, integration tests are what really add that extra feeling of accomplishment to a piece of software I deliver. Achieving a decent design and a good set of unit tests also adds to that feeling, but the integration tests are the final touch (not that they are written last, but that's a different discussion...).

This post deals with system-level integration tests, where we test many components of the system in a deployed environment. We test the system like a user would, through a GUI, a web service or some other interface. These tests should be portable to other environments so that we can use them as regression tests throughout the application's life cycle.

Cheating the data pain

For almost any integration test, data is something we have to consider. An integration test commonly depends on some amount of data being set up prior to the test: data that your code uses, valid parameters it needs or data it produces. Selecting and managing this data is often hard and has been a frequent pain point in projects I have been part of.

So why is test data painful? Often the models our software is built on are complex, so understanding them requires hard work. It might be easy enough to understand the model well enough for one test case to work, but it is a completely different thing to gain the general understanding needed to create dozens of test cases. Another painful attribute is portability. You might own and know the development environment pretty well and have some "dummy data" set up, but what if you are testing in the UAT environment? Customers will have access and, as we all know, they won't handle it gently...

So. Things are hard and painful. What happens? I have a few options, pick one...

  1. We skip it. Integration tests take too much time, are too expensive and have no value.
  2. We skip it. We have unit tests.
  3. We kind of skip it. We create tests only in our local environment; that will have to do!
  4. We think we don't skip it, but we really do. We create smaller smoke tests in the environments outside of our control.
  5. We do it. We test in all the environments, since we want our bases covered. We know that stuff happens, that any environment is a new one, and that if we don't find the bugs - customers will.
Okay, that cheesy list might not be true or resemble any reality that you know - but for anything difficult we tend to cheat. We do it on different levels and for different reasons, but we do it. We cheat.

Enduring the data pain

Since I have cheated the data pain many times, I wanted to explore how I could bring some order to this mess. That's what we developers do: we organize messy things into stuff that at least we can understand.

I think there are ways to cheat that don't actually impact the quality of your tests.
So, let's get to it. Basically, you have four approaches for any piece of data in your tests.

1. Hard-coded

This is the "quick and dirty" approach. Here we assume that the same value will be available no matter what. Even if this may be true, it tends to be a quite naive approach. Moving from one environment to the other, data will change. But this approach is acceptable in certain cases:
  • When you are just trying to get something together for the first time
  • When you are creating throw-away tests (why would you? Even the simplest test adds value to your regression test suite!)
  • When the data really IS that stable (countries, languages, etc.)
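
As a trivial Groovy sketch (the country code is made up), this is all there is to it - note how the assumption that the value exists travels silently with the test:

    // Hard-coded: we simply assume this value exists in every environment
    // the test will ever run in. Fine for truly stable data, naive otherwise.
    def countryCode = "SE"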

2. Find any

This approach is a bit more ambitious, but still requires little effort. Let's assume that you need a country for some reason. Your environment is not set up with every single country in the world, nor are countries static, so approach 1 is out of the question. For a database scenario, we create a simple "SELECT TOP 1 ... FROM xxx" query to retrieve the value to use. We don't care which country we get, as long as it's valid. Only selecting the columns you need is a sound approach for many reasons; one is improved resilience against schema changes.
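
In Groovy, such a lookup could look something like this (the connection details, driver and table names are placeholders, not something from my actual system):

    import groovy.sql.Sql

    // Placeholder connection details - use whatever your environment defines.
    def sql = Sql.newInstance("jdbc:jtds:sqlserver://myserver/MyTestDb",
                              "net.sourceforge.jtds.jdbc.Driver")

    // Any country will do, as long as it is valid.
    // Select only the columns you actually use.
    def country = sql.firstRow("SELECT TOP 1 CountryCode, Name FROM Country")
    assert country != null : "No country found in this environment"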

Note: My examples assume that your data can only be retrieved from a database but, depending on the system, you might also be able to collect data via web services, REST services, etc.

3. Find with predicate

Here's the more ambitious cousin of option 2. This time we make the same "SELECT TOP 1..." query, but we add some WHERE clauses, since exactly which entity we get is important. In the simplest scenario we might just want to make sure that the entity we use has not been "soft-deleted". Another example (sticking to the country scenario) would be that we want a country that has states defined. Again, only query against columns that you use. When these predicates become very advanced and start to grow hair, consider this (a sketch of such a query comes after the list):
  • Will the predicate always produce a match? Is the data stable enough? In all environments?
  • Should you consider creating a matching entity instead, using option 4?
Beware: Some might think that updating an existing record is a good idea. That might produce a match, but it will also leave a footprint that has to be removed. Updated entries are a lot harder to keep track of than inserted ones, since you need to select and remember the previous values.
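
Sticking to the made-up country schema and reusing the sql handle from the previous sketch, a predicate query could look something like this:

    // A country that is not soft-deleted and that has at least one state defined.
    def country = sql.firstRow("""
        SELECT TOP 1 c.CountryCode
        FROM Country c
        WHERE c.IsDeleted = 0
          AND EXISTS (SELECT 1 FROM State s WHERE s.CountryCode = c.CountryCode)
    """)
    assert country != null : "No matching country - is the data stable enough here?"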

4. Create your own 

This is the hardcore solution. If you want it done right, do it yourself! Our selects now become inserts and we create the entity that we need. This requires the deepest level of model knowledge, since you need to know every column and table relation in order to make a valid insert.
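
Again with the made-up schema and the sql handle from before, a sketch could look like this - the important part is remembering the key, so that the footprint can be removed afterwards:

    // Create our own entity; pick a key unlikely to collide with real data.
    def code = "ZZ"
    sql.execute("INSERT INTO Country (CountryCode, Name) VALUES (?, ?)",
                [code, "Testland"])
    try {
        // ... run the actual test against our known entity ...
    } finally {
        // Remove the footprint, even if the test fails.
        sql.execute("DELETE FROM Country WHERE CountryCode = ?", [code])
    }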

So, if this is so great - why not use it everywhere and take the power back!? Well, there are a couple of reasons why such an approach has problems.

  • Vulnerable tests
    When you stick a large number of INSERT statements in your test setup, you are depending heavily on a stable database schema. Any new non-nullable column, renamed column or removed column will shred your test to pieces. And it will probably not fail in a pretty way, but in a time-consuming way that ultimately will make people question your ambitious effort.
  • Non-portable tests
    I am targeting system-level integration tests that use the entire system, or at least as much of it as possible. Inserting data assumes that no duplicate data already exists, which is no problem in your empty developer sandbox database. However, I am guessing that empty databases are not that common in your deployed environments... Moving your test suite closer to the production environment will therefore be impossible; there's just no way that those environments will be empty.
  • Time
    Simply put, this approach just takes too long: figuring out all the references, understanding every piece of the database model even though many of them are irrelevant to what you are testing. Time can be spent more wisely.
  • Footprint
    Many inserts, large footprint. Cleaning it up is a large part of that data pain. 

Selecting the right approach

So, I have these four options - how do I select which to use when? I'll give you the architect's answer: it depends. It depends on several things, but I've started to think that there are two axes for categorizing test data.
[Figure: A model for categorizing test data and selecting a data approach]
The Y axis represents the stability of your data. As in the example with countries, a system where all countries are present everywhere - that's pretty stable data. At the other end of that axis is purely transactional data, data produced by transactions - such as orders, customers, etc.

The X axis is of a more subjective nature. For any test case there is data that you wish you didn't need. It's required in order to create or find more important data; it is reference data - supporting data. At the other end we have the data that you are actually testing. Say I am currently testing how different customer types affect order processing; then the customers and orders are what I am focused on. The focal data drives your tests and your software's logic.

Making the distinction between which data is focal and which is supportive is crucial to avoid test data pain. It also drives us to understand what we are testing and what we are not. The "not" part is, in my experience, the most important, as it gives us boundaries that divide our problem into manageable chunks.

Summary

In projects I try to defend the quality of both the software and the tests. Schedules and pressure might lead us to think that cutting corners is a smart move, but it rarely is. That's why I wanted to bring some order to an area where I spend too much time arguing about stuff taking too long.

For some time I advocated an approach where all data is created before the test and removed after - a strict "create your own" way. This is not only stupid but also scares co-workers away from testing. Considering the other options, and seeing data from a focal/supportive and dynamic/stable perspective, enables me to make a fresh decision for each situation instead of trying to fit every integration test into the same mold. It gives me the ability to put the effort where it is needed and the slack where it is acceptable.

In the end, I just want higher quality tests and more of them. This might be one piece of the puzzle.

Wednesday, March 21, 2012

soapUI 4.5 Beta 2

The 4.5 Beta 2 of soapUI has just been released and it certainly has some cool new features. This post is in no way a complete review, but two of the new features came surprisingly close to what I had at the top of my wish list.

1. Environment handling
I had a discussion with a co-worker some time ago about how we could port our test projects from one environment to another. "Why can't we just have a drop-down where we select the current environment?!" - that was our wish. Since the 4.0.1 version contained no such thing, I decided to work something out for myself, resulting in this blog post. Even if that did the trick, it was messy, manipulative and certainly no drop-down...

In the 4.5 version we have a new tab on the project level, called "Environments".

This tab contains all of the environments in which you will run your tests. Going from an old project, with no environment handling, to a new one is dead simple.

When you click the "+" to create a new environment, you can choose to "copy endpoints and credentials from the project". This means that all of your current endpoints will be saved in that environment.

An environment is a set of endpoints, REST services, properties and database connections. Each of these artifacts has a name which you can then use in your tests. For instance, if you are setting up a JDBC test step you will be able to select from the defined database connections by name. When you switch between environments, all the JDBC steps using that name will automatically be targeted against the JDBC connection defined under that name in the selected environment. Awesome! Using environments is now completely transparent.

And best of all, I got my drop-down:

This drop-down is available on all levels: project, test suite and test case. Switching the environment on any of these levels has a global impact, meaning that all subsequent requests (SOAP requests, JDBC requests) will target the selected environment.

For me this feature is a huge improvement. I have not yet tried the custom properties on the environment level, but I think that feature has some potential as well. It is the whole transparency aspect that appeals to me.

2. Assertion test step
Now this is a feature that many might be excited about for completely different reasons than me. It means that we now have a step type that is focused solely on asserting. With it, you can make very complex assertions using the interface we are used to, with all the guidance for XPath expressions, etc.

For me, this just makes my argument for BDD using soapUI even stronger. In this blog post I described how I implement the Given-When-Then syntax in soapUI. The only quirk was that I had to do some Groovy-script-ninja-tricks to get the syntax to be clean. Let's do a short recap:

  • The GIVEN steps constitute the background of the test, the circumstances in which the test is executing. Normally this is a bunch of SOAP and JDBC requests setting up data. Many times we'll need several steps to set everything up.
  • The WHEN step is the actual execution of my web service. This is normally a single step, the web service call.
  • The THEN steps are the assertions. If I am asserting something from my WHEN request, the plain approach is to put the assertions inside that step. But then we'll have no THEN step. Previously I solved this by adding a "virtual" assertion step using Groovy, but that caused some frowning among not-so-Groovy-familiar co-workers...
Now, this last issue with the THEN steps is history. The new assertion test step is a perfect match for my THEN step, so there really is no reason not to go BDD with soapUI.
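
As an illustration (the step names are made up), a test case could then be laid out something like this:

    GIVEN a customer of type 'Gold' exists      (JDBC Request step)
    GIVEN the customer has a pending order      (SOAP Request step)
    WHEN  the order is processed                (SOAP Request step)
    THEN  the order is given express priority   (Assertion test step)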


In summary 
These two features really strengthen soapUI's position as my tool of choice for web service testing. The environment handling improves maintainability; the assertion step improves readability by enabling BDD. Both features reduce the need for "ninja stuff" in Groovy. Don't get me wrong - I enjoy a good ninja-coding-spree as much as the next developer, but for me testing is all about understanding. It is about making as many stakeholders as possible understand what is happening, what the requirements are and whether we are meeting them. I think we are on the right track with this tool.

Thursday, February 9, 2012

Handling test environments with soapUI


Dealing with multiple environments for a piece of software is something most of us do. At the very least, you'll have a testing environment that is separate from your production environment. In many cases there will be several testing environments, representing the different stages of quality assurance.
Some examples might be:

  • Developer sandbox
  • Internal test
  • External test
  • Acceptance test
  • Pre-production
  • Production
When using soapUI for testing, you want to be able to run your tests in all of your environments. Previously we kind of struggled with this, since we had to have separate projects for each environment. Maintaining tests through these stages was a real pain...

Each environment had its own endpoint for the web service under test. But since we also test things in the database, each environment also had its own connection string. The endpoint problem was quite easy to handle manually through the Assign menu option in the WSDL interface screen, but reassigning the database connection was something else.

I recently managed to create a way that removed the need for those separate projects completely. This post will try to explain my way of doing it, and hopefully someone out there has an even better solution...

1. Open the "overview" tab of the project by double-clicking it
The project view has some really good features worth exploring. Properties are really powerful, but there is more - wait and see...

2. Create a property under your project, called "Environment"

This property will hold the name of your environment, such as "Dev", "Acceptance" or "Pre-production".
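
A property like this can then be referenced anywhere in the project using soapUI's property expansion syntax:

    ${#Project#Environment}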

3. Create one property for each of your service endpoint addresses

You'll need to create one property for each environment, which is tiresome - but on the other hand it creates a good place to find those hard-to-remember addresses!

4. Create one property for each of your database connections

I am using the jtds driver, enabling Windows authentication for SQL Server from soapUI.
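
For reference, a jtds connection string for SQL Server looks something like this (server, port, database and domain are placeholders); if you leave out the user name and password and have jtds' ntlmauth.dll on the path, the currently logged-on Windows account is used:

    jdbc:jtds:sqlserver://myserver:1433/MyTestDb;domain=MYDOMAIN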

5. Hook into the event model
Now this was something new to me. soapUI has a pretty extensive event model that enables you to execute code on certain events.

Still in the project view, open the "Events" tab. Click on the + to add a new event handler. 
I selected the "TestSuiteRunListener.beforeRun" event, which fires just before an entire test suite is run. This way, my environment configuration runs only when I execute the entire suite. Executing single test cases is something I'll do more in the development stage of things.

There are many events to select from and I have not examined them all, but most names are pretty self-explanatory.

Now you'll end up with an empty Groovy script code window. I'll break my script up into pieces to make it easier to read. Sorry about the images, but I couldn't get syntax coloring otherwise...
The text version is here.

First we need to import some things, and then we collect all of the property values we created earlier.

Then we do an if-statement that checks which environment is selected, keeping in mind that someone may have entered the name incorrectly.

Finally, the code that actually does something. We loop through all the test steps of all test cases in all the suites to find the steps that use the web service, and replace the endpoint address with the selected one. Then we repeat the procedure for our connection string.
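
Sketched out in one piece, the script could look something like this. The testSuiteRunner variable, the property names and the JDBC config access are assumptions from memory - treat it as an outline rather than the exact script (the text version linked above is authoritative):

    import com.eviware.soapui.impl.wsdl.teststeps.WsdlTestRequestStep
    import com.eviware.soapui.impl.wsdl.teststeps.JdbcRequestTestStep

    // Assumed to be available in the TestSuiteRunListener.beforeRun context.
    def project = testSuiteRunner.testSuite.project

    // Collect the property values we created earlier (the naming is an example).
    def environment = project.getPropertyValue("Environment")
    def endpoint = project.getPropertyValue(environment + "_Endpoint")
    def connectionString = project.getPropertyValue(environment + "_ConnectionString")

    // Guard against a mistyped environment name.
    if (endpoint == null || connectionString == null)
        throw new IllegalStateException("Unknown environment: " + environment)

    // Loop through every test step in every test case of every suite.
    project.testSuiteList.each { suite ->
        suite.testCaseList.each { testCase ->
            testCase.testStepList.each { step ->
                if (step instanceof WsdlTestRequestStep) {
                    // Re-point the web service call.
                    step.testRequest.endpoint = endpoint
                } else if (step instanceof JdbcRequestTestStep) {
                    // Re-point the JDBC step (config access may differ per version).
                    step.jdbcRequestTestStepConfig.connectionString = connectionString
                }
            }
        }
    }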


In summary
So, what does this do?
Whenever I run a test suite, this event handler will reset all requests to use the WSDL endpoint of my choice and the connection string that we want. The code isn't pretty or optimized, but that's not my concern just now. I wanted to see if it could be done. And it could.

Any better ideas (which there must be?!) are appreciated!!