
Tuesday, March 27, 2012

Towards an integration test data strategy

For me, integration tests are what really add that extra feeling of accomplishment to a piece of software I deliver. A decent design and a bunch of unit tests also add to that feeling, but the integration tests are the final touch (not that they are written last, but that's a different discussion...)

This post deals with system-level integration tests, where we test many components of the system in a deployed environment. We test the system like a user would, through a GUI, a web service or other interfaces. These tests should be portable to other environments so that we can use them as regression tests during the application's life cycle.

Cheating the data pain

For almost any integration test, data is something we have to consider. Our integration test commonly depends on some amount of data being set up prior to the test. It might be data that your code uses, valid parameters it needs or data it produces. Selecting and managing this data is often hard and has been a frequent pain point for projects I have been part of.

So why is test data painful? Often the models our software is built on are complex, so understanding them requires hard work. It might be easy enough to understand one test case well enough to make it work, but it is a completely different thing to gain the general understanding needed to create dozens of test cases. Another painful attribute is portability. You might own and know the development environment pretty well, and you may have some "dummy data" set up, but what if you are testing in the UAT environment? Customers will have access and, as we all know, they won't handle it gently...

So. Things are hard and painful. What happens? I have a few options, pick one...

  1. We skip it. Integration tests take too much time, are too expensive and have no value.
  2. We skip it. We have unit tests.
  3. We kind of skip it. We create tests only in our local environment, that will have to do!
  4. We think we don't skip it, but we really do. We create smaller, smoke-tests, in the environments outside of our control.
  5. We do it. We test all the environments, since we want our bases covered. We know that stuff happens and that any environment is a new one and that if we don't find the bugs - customers will.
Okay, that cheesy list might not be true or resemble any reality that you know - but for anything difficult we tend to cheat. We do it on different levels and for different reasons, but we do it. We cheat.

Enduring the data pain

Since I have cheated the data pain many times I wanted to explore how I could bring some order to this mess. That's what we developers do, we organize messy things into stuff that at least we can understand.

I think there are ways to cheat that actually don't impact the quality of your tests.
So, let's get to it. Basically, you have four approaches for any data in your tests.

1. Hard-coded

This is the "quick and dirty" approach. Here we assume that the same value will be available no matter what. Even if this may be true, it tends to be a quite naive approach. Moving from one environment to the other, data will change. But this approach is acceptable in certain cases:
  • When you are just trying to get something together for the first time
  • When you are creating throw-away tests (Why would you? Even the simplest test adds value to your regression test suite!)
  • When data really IS that stable (Countries, Languages etc)

2. Find any

This approach is a bit more ambitious, but still requires low effort. Let's assume that you need to use a Country for some reason. Your environment is not set up with every single country in the world, nor are countries static - approach 1 is out of the question. For a database scenario, we'll create a simple "SELECT TOP 1 ... FROM xxx" query to retrieve the value to use. We don't care which country we get, as long as it's valid. Only selecting the columns you need is a sound approach for many reasons, one being improved resilience against schema changes.

Note: My examples assume that data is retrieved from a database, but depending on the system you might be able to collect data via web services, REST services etc.
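
A minimal sketch of the "find any" approach in Groovy (in Soap UI this would typically be a JDBC test step instead). The table, columns, connection string and credentials are all made up for illustration:

  import groovy.sql.Sql

  // Hypothetical connection details - in Soap UI the JDBC test step would hold these instead.
  def sql = Sql.newInstance(
      'jdbc:jtds:sqlserver://dbserver:1433/MyTestDb;domain=MYDOMAIN',
      'testuser', 'secret', 'net.sourceforge.jtds.jdbc.Driver')

  // Only select the columns the test actually needs - any valid row will do.
  def row = sql.firstRow('SELECT TOP 1 CountryCode, CountryName FROM Country')
  assert row != null : 'Expected at least one country in this environment'

  def countryCode = row.CountryCode   // hand this value to the actual test
  sql.close()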

3. Find with predicate

Here's the more ambitious cousin of option 2. This time we make the same "SELECT TOP 1..." query, but we add a WHERE clause, since exactly which entity we get matters. In the simplest scenario we might just want to make sure that the entity we use has not been "soft-deleted". Another example (sticking to the country scenario) would be that we want a country that has defined states. Again, only query against columns that you use. When these predicates become very advanced and start to grow hair, consider this:
  • Will the predicate always produce a match, is the data stable enough? In all environments?
  • Should you consider creating a matching entity instead, using option 4?
Beware: Some might think that updating an existing record is a good idea. That way you might produce a match, but you will also leave a footprint that has to be removed. Updated entries are a lot harder to keep track of than inserted ones, since you need to select and remember the previous values.
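
As a rough sketch, option 3 only differs from option 2 by the predicate (again, all table and column names are hypothetical):

  import groovy.sql.Sql

  def sql = Sql.newInstance(
      'jdbc:jtds:sqlserver://dbserver:1433/MyTestDb;domain=MYDOMAIN',
      'testuser', 'secret', 'net.sourceforge.jtds.jdbc.Driver')

  // Find a country that is not soft-deleted and that has at least one state defined.
  def row = sql.firstRow('''
      SELECT TOP 1 c.CountryCode
      FROM Country c
      WHERE c.IsDeleted = 0
        AND EXISTS (SELECT 1 FROM State s WHERE s.CountryCode = c.CountryCode)
  ''')
  assert row != null : 'No matching country - is the data stable enough in this environment?'
  sql.close()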

4. Create your own 

This is the hardcore solution. If you want it done right, do it yourself! Our selects now become inserts and we create that entity that we need. This requires the deepest level of model knowledge since you need to know every column and table relation in order to make a valid insert.
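
A sketch of what that can look like, again with made-up tables and columns. Generating the key yourself makes the cleanup trivial, which matters for the footprint problem below:

  import groovy.sql.Sql

  def sql = Sql.newInstance(
      'jdbc:jtds:sqlserver://dbserver:1433/MyTestDb;domain=MYDOMAIN',
      'testuser', 'secret', 'net.sourceforge.jtds.jdbc.Driver')

  // Generate the key ourselves so the cleanup step knows exactly what to delete.
  def customerId = UUID.randomUUID().toString()
  sql.execute(
      'INSERT INTO Customer (CustomerId, Name, CustomerType) VALUES (?, ?, ?)',
      [customerId, 'Integration test customer', 'B2B'])

  // ... run the actual test here ...

  // Remove the footprint afterwards.
  sql.execute('DELETE FROM Customer WHERE CustomerId = ?', [customerId])
  sql.close()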

So, if this is so great - why not use it everywhere and take the power back!? Well, there are a couple of reasons why such an approach has problems.

  • Vulnerable tests
    When you stick a large number of INSERT statements in your test setup, you are depending heavily on a stable database schema. Any new non-nullable column, renamed column or removed column will shred your test to pieces. And it will probably not fail in a pretty way, but in a time-consuming way that will ultimately make people question your ambitious effort.
  • Non-portable tests
    I am targeting system-level integration tests, that use the entire system - or at least as much of it as possible. Inserting data will assume that no duplicate data already exists, which is no problem in your empty developer sandbox database. However, I am guessing that empty databases are not that common in your deployed environments... Therefore moving your test suite closer to the production environment will be impossible. There's just no way that those environments will be empty.
  • Time
    Simply put, this approach takes too long: figuring out all the references and understanding every piece of the database model, even the parts that are irrelevant to what you are testing. Time can be spent more wisely.
  • Footprint
    Many inserts, large footprint. Cleaning it up is a large part of that data pain. 

Selecting the right approach

So, I have these four options - how do I select what to use when? I'll give you the architect's answer: it depends. It depends on several things, but I've started to think that there are two axes for categorizing test data.
A model for categorizing test data and selecting data approach
The Y axis represents the stability of your data. As for the example of countries, a system where all countries are present everywhere - that's pretty stable data. On the other end of that axis is purely transactional data, data produced by transactions - such as orders, customers etc. 

The X axis is of a more subjective nature. For any test case there is data that you wish you didn't need. It's required in order to create or find more important data; it is reference data - supporting data. On the other end we have the data that we are actually testing. Maybe I am currently testing how different customer types affect order processing; then the customers and orders are what I am focused on. The focal data drives your tests and your software's logic.

Making the distinction between which data is focal and which is supportive is crucial to avoiding test data pain. It also drives us to understand what we are testing and what we are not. The "not" part is in my experience the most important, as it gives us boundaries that divide our problem into manageable chunks.

Summary

In projects I try to defend the quality of both the software and the tests. Schedules and pressure might lead us to think that cutting corners is a smart move, but it rarely is. That's why I wanted to bring some order to an area where I spend too much time arguing about stuff taking too long.

For some time I have advocated an approach where all data is created before the test and removed after, a strict "create your own" way. This is not only stupid but scares co-workers away from testing. Considering other options and seeing data from a focal/supportive and dynamic/stable perspective enables me to make new decisions for each situation and not try to fit every integration test into the same mold. It gives me the capability to put the effort where it is needed and put the slack where it is acceptable.

In the end, I just want higher quality tests and more of them. This might be one piece of the puzzle.

Thursday, February 9, 2012

Handling test environments with Soap UI


Dealing with multiple environments for a piece of software is something most of us do. At the very least, you'll have a testing environment that is separate from your production environment. In many cases there will be a lot of testing environments, representing the different stages of quality assurance.
Some examples might be:

  • Developer sandbox
  • Internal test
  • External test
  • Acceptance test
  • Pre-production
  • Production
When using Soap UI for testing, you want to be able to perform tests in all of your environments. Previously we kind of struggled with this, since we had to have separate projects for each environment. Maintaining tests through these stages was a real pain...

Each environment had its own endpoint for the web service under test. But since we also test stuff against the database, each environment would also have its own connection string. The endpoint problem was quite easy to handle manually through the Assign menu option in the WSDL interface screen, but reassigning the database connection was something else.

I recently managed to create a way that took away the need for those separate projects completely. This post will try to explain my way of doing it and hopefully there is someone that has an even better solution...

1. Open the "overview" tab of the project, by double-clicking it
The project view has some really good features worth exploring. Properties are really powerful, but there is more - wait and see...

2. Create a property under your project, called "Environment"

This property will hold the name of your environment, such as "Dev", "Acceptance" or "Pre-production".

3. Create one property for each of your service endpoint addresses

You'll need to create one property for each environment, which is tiresome - but on the other hand it creates a good place to find those hard-to-remember addresses!

4. Create one property for each of your database connections

I am using the jTDS driver, which enables Windows authentication for SQL Server from Soap UI.
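
To give an idea of what the properties from steps 3 and 4 can look like, here are some made-up example values (servers, databases and the domain are pure placeholders; jTDS uses the domain parameter for Windows authentication):

  Endpoint_Dev        = http://devserver:8080/MyService.svc
  Endpoint_Acceptance = http://accserver:8080/MyService.svc
  ConnStr_Dev         = jdbc:jtds:sqlserver://devdb:1433/MyDb;domain=MYDOMAIN
  ConnStr_Acceptance  = jdbc:jtds:sqlserver://accdb:1433/MyDb;domain=MYDOMAIN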

5. Hook in to the event model
Now this was something new for me. Soap UI has a pretty extensive event model that enables you to execute code on certain events. 

Still in the project view, open the "Events" tab. Click on the + to add a new event handler. 
I selected the "TestSuiteRunListener.beforeRun" event which fires just before an entire test suite is run. This way, my environment configuration will fire only when I run the entire suite. Executing single test cases is something I'll do more in the development stage of things. 

There are many events to select from and I have not examined them all, but most names are pretty self-explanatory.

Now you'll end up with an empty Groovy script code window. I'll break my script up into pieces to make it easier to read. Sorry about the images, but I couldn't get syntax coloring otherwise...
The text version is here.

First we need to import some stuff. Then I collect all of the property values we created earlier.
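
Since the original screenshots are gone, what follows is a rough reconstruction. The property names match my examples above, and the testSuiteRunner variable is an assumption about what the event handler exposes - it may be named differently in your Soap UI version:

  import com.eviware.soapui.impl.wsdl.teststeps.WsdlTestRequestStep
  import com.eviware.soapui.impl.wsdl.teststeps.JdbcRequestTestStep

  // Assumption: the TestSuiteRunListener.beforeRun handler exposes a testSuiteRunner variable.
  def project = testSuiteRunner.testSuite.project

  def environment = project.getPropertyValue("Environment")
  def endpointDev = project.getPropertyValue("Endpoint_Dev")
  def endpointAcc = project.getPropertyValue("Endpoint_Acceptance")
  def connStrDev  = project.getPropertyValue("ConnStr_Dev")
  def connStrAcc  = project.getPropertyValue("ConnStr_Acceptance")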

Then let's just do an if-statement checking which environment is selected, keeping in mind that someone may have entered something incorrectly.
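
Roughly like this (failing fast seems better than silently running against the wrong environment):

  def endpoint
  def connectionString

  if (environment == "Dev") {
      endpoint = endpointDev
      connectionString = connStrDev
  } else if (environment == "Acceptance") {
      endpoint = endpointAcc
      connectionString = connStrAcc
  } else {
      // Fail fast if someone typed the environment name wrong.
      throw new IllegalArgumentException("Unknown environment: " + environment)
  }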

Finally, the code that actually does something. First we loop through all the test steps of all test cases in all the suites to find steps that use the web service, and replace the endpoint address with the selected one. Then we repeat the procedure for the JDBC steps with our connection string.
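
A sketch of that last piece; the step classes are real Soap UI classes, but the setConnectionString call is my best recollection of the JDBC step API and may differ between versions:

  project.testSuiteList.each { suite ->
      suite.testCaseList.each { testCase ->
          testCase.testStepList.each { step ->
              if (step instanceof WsdlTestRequestStep) {
                  // Point the SOAP request at the endpoint of the chosen environment.
                  step.testRequest.endpoint = endpoint
              } else if (step instanceof JdbcRequestTestStep) {
                  // Assumed setter - check your Soap UI version if this does not work.
                  step.setConnectionString(connectionString)
              }
          }
      }
  }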


In summary
So, what does this do?
Whenever I run a test suite, this event handler will reset all requests to use the WSDL endpoint of my choice and the connection string that we want. The code isn't pretty or optimized, but that's not my concern just now. I wanted to see if it could be done. And it could.

Any better ideas (which there must be?!) are appreciated!!

Thursday, December 15, 2011

Database integration with SOAP UI - in way over my head...

Today I embarked on a journey to create that real database integration test I always wanted.
Many times in the past I have created tests for applications that use a complex database structure to produce some results. Every time I have resorted to setting up a static data structure and then making my tests against that collection of data.

The problem with my old approach is that it is not very stable. What if someone tampers with your data? What if you want to move the tests to a UAT environment, or worse yet - production?

The ideal scenario, I think, is to have the test
1. Set all the necessary data up
2. Do the test
3. Clean up the data  

But that sounds a lot simpler than it is.

Setting the data up
Using Soap UI I created a bunch of JDBC test steps to create every tiny bit of data that I needed. I put all of these steps in a disabled test case that I trigger manually from the actual cases.
I divided the data into two categories

  1. Stable data, or "master data"
    This data will most probably exist in all environments, but not necessarily with the same ids.
    Therefore I needed a bunch of "find XXX" test steps. I then transferred all those values into properties I could access later.
  2. Dynamic data
    This is the data I will perform my actual tests against. It's important that I have full control over this data to create valid tests and to be able to trust the results. For this I created a bunch of JDBC steps that insert data into different tables.
One thing I discovered is that having GUIDs as primary keys, as opposed to having auto-ids, makes testing easier. I created some Groovy test steps that produced new GUIDs for all new entities. I stored these values as properties for later access.
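
Such a Groovy test step can be as small as this (the property name is just an example):

  // Groovy Script test step: generate a GUID and store it as a test case property
  // so later JDBC and request steps can expand it.
  def orderId = UUID.randomUUID().toString()
  testRunner.testCase.setPropertyValue("NewOrderId", orderId)
  log.info("Generated NewOrderId: " + orderId)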

As you can see on the right - there were a whole bunch of steps to make this happen...

Testing against the data
This is the easy part. Since all values I needed were transferred into properties, I could use them in my actual tests. I actually put the setup data test case in a disabled test suite, and that is quite useful. In my tests I don't want to care about where the data comes from or how it is created. I only want to be provided with some values that will make my tests do what I want them to. Properties in Soap UI are your friends here.

Cleaning up
To clean up I created another test case in my disabled suite that took all of the generated Ids and deleted them from the database. No trace whatsoever! But... the infrastructure to do this is quite tiresome. Read on...

Maintaining tests
I wanted to create my data once and then do many tests against it. In fact, I wanted to create some base data that would be available for all tests and then some per-test data. The per-test data could be set up using a test step in my test. It fits nicely under a GIVEN step. (See my post on BDD in Soap UI) But the clean-up fits nowhere in the test. Soap UI gives you Setup and TearDown possibilities on many levels, so for this I put a TearDown script on my test case.

I created a disabled test case that took care of the cleaning and I made a Groovy script call to execute it.
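
The TearDown script boils down to something like this (the suite and case names are made up; running a disabled case programmatically is an assumption that has worked for me):

  import com.eviware.soapui.support.types.StringToObjectMap

  // Look up the disabled cleanup case and run it synchronously, so the data is
  // gone before the next test case starts.
  def cleanup = testRunner.testCase.testSuite.project
          .getTestSuiteByName("Test data")
          .getTestCaseByName("Clean up data")
  cleanup.run(new StringToObjectMap(), false)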

While this is nice and dandy for the per-test data, the global data was still an issue. I could put a setup and teardown script on the test suite level, but that requires anyone who uses these tests to never run any case in isolation. If they did, the global data would either not be available or would linger on long after the test was finished.

So, I put all of the data creation and cleaning on all the test cases. That did nothing good for my performance... What I am considering now is to have some sort of flag for the setup, so that a test case will not set up the global data if it has already been set up.

BUT... I think flags like "IsSetup" are kind of a smell. So, here I am - I have stable tests with crappy performance. Do I care? Well, if the trade-off is between performance and stability, I choose stability any day. But I would really like to find a better way of doing this. Maybe it is not supposed to be done at all? Maybe I am grasping for test Utopia?

With those indecisive words, I bid you good night. 

Ps. Any other suggestions on how to do these kind of tests, or motivations on why I shouldn't at all, are appreciated. Ds.