
Friday, April 04, 2008

How I justified the move to unit testing

Update: This memo was written while I was at a previous job. The IT director had previously claimed there was no value in unit testing and that we shouldn't do it. I submitted this memo to all developers at one of the fortnightly update meetings. It did lead to unit testing being adopted by other developers as part of the development process.

The following is from a memo I wrote:



Why we must use unit testing in the future.

or
How I saved myself almost three days work and met a deadline.

Plus saved countless days work in the future.

or

SLA calculation and bank holiday identification was so broken it's ridiculous!

I have a confession to make. I have been creating unit tests for some of the code I have been writing recently.
The purpose of writing these tests has been to make my task easier and to try and ensure that the code I have written does not contain any preventable errors.
I think it has worked:
  • Of the code I have written unit tests for, none has failed testing. (So far.)
  • I have saved time by not having to repeatedly perform manual checks.
  • I have removed a lot of the opportunity for human error in reviewing my manual testing.
  • I have created a resource which will save time in the future, if and when any code in this part of the system is changed and the tests reused.
To show how unit testing has helped me, I'll use CCP15852 (errors in SLA calculations) as an example.

The calculation of dates and times as part of an SLA is non-trivial.
(In the last 9 months there have been 6 changes (by 3 other developers) to this code, to try and get it to work correctly. That's a pretty good indication that this isn't simple to get right, or to test that the changes made have fully corrected the problem.)

Please note that this is not a criticism of those who have previously written or tested code in this area. I am simply trying to highlight that this is a complicated area which is difficult to test and our current practices have proved inadequate.

There is a lot to consider in SLA calculation:
  • Non working days
  • 24 hour working days
  • Working days for specific hours
  • Not working on bank holidays
  • Working on bank holidays
  • Starting on a non working day
  • Starting at midnight on a 24 hour day (beginning and end)
  • Starting on a bank holiday
  • Starting before the start time on a day with specific working hours
  • Starting on the time a specific working period begins
  • Starting on the time a specific working period ends
  • Starting after a specific working period ends
  • Calculations limited to the same day
  • Calculations spanning multiple days
  • Combinations of the above (E.g. Starting at midnight on a 24 hour working day, adding 112 working hours, but not working the next two days, then having a bank holiday and then working between 9 and 5 on the two days after that, before going back to a 24 hour working day!)
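To make the list above concrete, here is a minimal sketch of how a handful of those situations could be written down as automated checks. It is an illustration only and not the code from the memo: the function `add_working_time`, the working-week dictionaries and the chosen dates are all assumptions invented for this example, and the real SLA code will certainly differ.

```python
import unittest
from datetime import date, datetime, time, timedelta

# Hypothetical working-week model: weekday (0 = Monday) -> (open, close) in
# minutes after midnight. A weekday missing from the dict is a non-working day.
NINE_TO_FIVE = {d: (9 * 60, 17 * 60) for d in range(5)}  # Monday to Friday
ALWAYS_OPEN = {d: (0, 24 * 60) for d in range(7)}        # 24 hour working days


def add_working_time(start, duration, week, holidays=frozenset()):
    """Return the moment when `duration` of working time has elapsed after `start`.

    Dates in `holidays` are treated as non-working days.
    """
    if not any(week.get(d) for d in range(7)):
        raise ValueError("working week has no working days")
    remaining, day, cursor = duration, start.date(), start
    while True:
        window = None if day in holidays else week.get(day.weekday())
        if window is not None:
            midnight = datetime.combine(day, time.min)
            opens = midnight + timedelta(minutes=window[0])
            closes = midnight + timedelta(minutes=window[1])
            begin = max(cursor, opens)
            if begin < closes:
                if remaining <= closes - begin:
                    return begin + remaining
                remaining -= closes - begin
        # Nothing (more) can be worked today: move to midnight of the next day.
        day += timedelta(days=1)
        cursor = datetime.combine(day, time.min)


NONE = frozenset()
MAY_DAY = frozenset({date(2008, 5, 5)})  # example bank holiday (a Monday)

# (description, start, working hours to add, week, holidays, expected result)
CASES = [
    ("same day", datetime(2008, 4, 7, 10), 2, NINE_TO_FIVE, NONE, datetime(2008, 4, 7, 12)),
    ("before opening", datetime(2008, 4, 7, 7), 1, NINE_TO_FIVE, NONE, datetime(2008, 4, 7, 10)),
    ("exactly at closing", datetime(2008, 4, 7, 17), 1, NINE_TO_FIVE, NONE, datetime(2008, 4, 8, 10)),
    ("after closing", datetime(2008, 4, 7, 18), 1, NINE_TO_FIVE, NONE, datetime(2008, 4, 8, 10)),
    ("non working day", datetime(2008, 4, 5, 12), 1, NINE_TO_FIVE, NONE, datetime(2008, 4, 7, 10)),
    ("across a weekend", datetime(2008, 4, 4, 16), 2, NINE_TO_FIVE, NONE, datetime(2008, 4, 7, 10)),
    ("across a bank holiday", datetime(2008, 5, 2, 16), 2, NINE_TO_FIVE, MAY_DAY, datetime(2008, 5, 6, 10)),
    ("24 hour days", datetime(2008, 4, 7, 0), 30, ALWAYS_OPEN, NONE, datetime(2008, 4, 8, 6)),
]


class SlaCalculationTests(unittest.TestCase):
    def test_cases(self):
        for name, start, hours, week, holidays, expected in CASES:
            with self.subTest(name):
                actual = add_working_time(start, timedelta(hours=hours), week, holidays)
                self.assertEqual(actual, expected)
```

Saved as a module, this could be run with `python -m unittest`. Keeping the situations in a table means that covering another combination is one more line, which is how a list like the one above can grow into well over a hundred tests.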

All this adds up to a very large number of possible situations to test.

In making the changes for this CCP, I started out by creating some tests to find out where the errors in the code were, which calculations were affected, and so on.

In total I ended up creating 155 tests which performed a total of 580 checks.
(All these tests could be performed in around 10 seconds.)

When running these tests against the original code 94 tests failed. A pass rate of just 39%.

The originally reported problem was with using the '24 Hour day' setting in a 'Working week'.
It was assumed that calculations based on days with a start and end time were being performed correctly.
I identified 56 tests (of the 155) which covered SLA calculations using such days.
Of these 56, 36 failed. (A pass rate of just 35%)


Admittedly, many of these tests are edge cases so it is unlikely that end users would see only 39% of calculations being performed correctly. It is concerning, however, that of all the different situations that need to be accounted for, more than 3 in every 5 will be done incorrectly.

As soon as I started adding tests for dates which are affected by bank holidays it became apparent that the code to determine if a date was a bank holiday was also broken.
How broken? Well...
If there was only 1 bank holiday in the system, it would never be found.
If there were 2, only 1 would ever be found. (The date in position 1)
If there were 3, only 1 would ever be found. (The date in position 2)
If there were 4, only 1 would ever be found. (The date in position 2)
If there were 5, only 1 would ever be found. (The date in position 5)
If there were 6, only 3 would ever be found. (The dates in positions 1, 3, 5)
If there were 7, only 3 would ever be found. (The dates in positions 2, 4, 6)
If there were 8, only 3 would ever be found. (The dates in positions 2, 4, 6)
If there were 9, only 3 would ever be found. (The dates in positions 2, 4, 6)
If there were 10, only 2 would ever be found. (The dates in positions 3, 5)
If there were 11, only 6 would ever be found. (The dates in positions 1, 3, 5, 6, 9, 11)
If there were 12, only 6 would ever be found. (The dates in positions 1, 3, 5, 6, 9, 11)
If there were 13, only 6 would ever be found. (The dates in positions 1, 3, 5, 6, 9, 11)
If there were 14, only 6 would ever be found. (The dates in positions 1, 3, 5, 9, 11, 13)
If there were 15, only 7 would ever be found. (The dates in positions 2, 4, 6, 8, 10, 12, 14)
If there were 16, only 7 would ever be found. (The dates in positions 2, 4, 6, 8, 10, 12, 14)
If there were 17, only 7 would ever be found. (The dates in positions 2, 4, 6, 8, 10, 12, 14)
If there were 18, only 5 would ever be found. (The dates in positions 3, 5, 9, 11, 13)

(I only checked with up to 18 bank holidays in the system, but there is no way that it would magically start being able to find all dates if there were more.)

Clearly, both the number of records in the system and the position in that list of the record being searched for affected whether it would be found.

Obviously the behaviour with small numbers of bank holidays is less likely to be an issue, as we ship with 2 years' worth of values in the database. But even as the number of records gets bigger, large numbers of records are still being missed.

To get the above results (on the IsBankHoliday function) I created 172 tests (those listed above plus no bank holidays in the system at all) with each test performing 1 check.
All these tests could be performed in around 20 seconds. (It takes longer than the above tests because of all the database changes made in setting up and restoring the database.)
Of these tests, originally 103 failed. Only a 40% pass rate.
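The shape of that test set (every position, for every number of bank holidays from 0 to 18) can be sketched as a simple sweep. This is an illustration under stated assumptions: `is_bank_holiday` is a hypothetical in-memory stand-in for the database-backed IsBankHoliday function, and `broken_lookup` is a fault invented purely for this sketch. It is not a reconstruction of the actual bug, which the memo does not describe.

```python
from datetime import date, timedelta


def is_bank_holiday(day, holidays):
    """Hypothetical stand-in for IsBankHoliday: a plain membership test."""
    return day in holidays


def broken_lookup(day, holidays):
    """Deliberately wrong (invented for this sketch): ignores every other entry."""
    return day in holidays[1::2]


def sweep(lookup, max_count=18):
    """Check every position for every holiday count.

    Returns (number of checks, list of (count, position) failures).
    """
    first = date(2008, 1, 1)
    checks, failures = 1, []
    # Zero holidays in the system: nothing should be reported as a bank holiday.
    if lookup(first, []):
        failures.append((0, None))
    for count in range(1, max_count + 1):
        # Arbitrary distinct dates, one week apart, in ascending order.
        holidays = [first + timedelta(days=7 * i) for i in range(count)]
        for position, day in enumerate(holidays, start=1):
            checks += 1
            if not lookup(day, holidays):
                failures.append((count, position))
    return checks, failures


if __name__ == "__main__":
    for lookup in (is_bank_holiday, broken_lookup):
        checks, failures = sweep(lookup)
        print(f"{lookup.__name__}: {checks} checks, {len(failures)} failures")
```

The sweep performs 1 + (1 + 2 + ... + 18) = 172 checks, matching the number of tests quoted above. Because each failure is recorded as a (count, position) pair, a table like the one earlier in this memo falls straight out of the results.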

Of all 327 tests created, 197 failed when using the original code. A pass rate of only 39%.
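For anyone wanting to check the percentages, the arithmetic is reproduced below as a short sketch. The counts are taken from this memo; the helper name `pass_rate` is just for illustration. The whole-number figures quoted in the text appear to be truncated rather than rounded.

```python
def pass_rate(total, failed):
    """Percentage of tests that passed."""
    return 100 * (total - failed) / total


# (total tests, failures against the original code), as quoted in the memo
sla = (155, 94)
sla_start_end_days = (56, 36)   # subset of the 155 SLA tests
bank_holiday = (172, 103)
combined = (sla[0] + bank_holiday[0], sla[1] + bank_holiday[1])  # (327, 197)

for label, (total, failed) in [
    ("SLA calculation", sla),
    ("SLA, days with start and end times", sla_start_end_days),
    ("IsBankHoliday", bank_holiday),
    ("All tests", combined),
]:
    print(f"{label}: {total - failed} of {total} passed ({pass_rate(total, failed):.1f}%)")
```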

I didn't avoid manual testing altogether though. I did still use the UI to test that the results reported by the tests matched what was displayed in the program. But I only had to do this at the end once I was confident all the calculations were correct.

So how much did writing these tests help?

Well, the process of sitting down and listing all the things (and combinations of things) to test caused me to identify a large number of situations to test.
Writing some of these tests prompted me to think of other situations the code had to account for, which I might not have originally considered.
When some of the tests failed, it highlighted other situations which should also be tested.
The act of thinking about writing tests helps identify more tests. This leads to more bugs being found, which leads to fewer bugs being shipped.

(Some) opportunity for human error was removed.
Doing lots of SLA calculations manually can be very mentally taxing. As more are done, the opportunity for error increases.
Creating tests meant that the calculations only had to be performed once and the computer could check that what was returned was what was expected.
Manually checking that lots of similar dates and times are the same, and performing lots of similar but non-trivial calculations, are tasks in which it is easy to make mistakes.
Time was saved.

How much time was saved?
As I ran all the tests many times, I would say that I easily performed over 1000 tests. (I actually think this is a very conservative estimate.)

To perform these tests manually would involve:
To test SLA calculation:
  • Log a call entering all details as needed, including setting the SLA start time.
  • Save the call.
  • Load the form to view the calculated times.
  • Check the times are as expected.

To test bank holiday identification:
  • Set the right number of bank holiday entries in the database (deleting and adding as required)
  • Perform the test (as above) to test the SLA calculation, but over the required bank holiday.

I estimate it would take an average of 1 minute to do each of the above tests.

That adds up to nearly three days of manual testing.

Even if it was only necessary to do half as many tests manually and they could be done twice as fast, it would still take the best part of a day.
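The estimate can be worked through as follows. This is a rough sketch: the test count and time per test come from the figures above, but the memo does not say how long a working day is, so the 6 productive testing hours per day used here is an assumption, chosen because it is consistent with "nearly three days".

```python
def manual_hours(tests, minutes_each):
    """Total hours needed to perform `tests` manual tests."""
    return tests * minutes_each / 60


PRODUCTIVE_HOURS_PER_DAY = 6  # assumption, not stated in the memo

full = manual_hours(1000, 1.0)     # 1000 tests at 1 minute each
reduced = manual_hours(500, 0.5)   # half as many tests, twice as fast

print(f"Full estimate:    {full:.1f} hours, "
      f"about {full / PRODUCTIVE_HOURS_PER_DAY:.1f} days")
print(f"Reduced estimate: {reduced:.1f} hours, "
      f"about {reduced / PRODUCTIVE_HOURS_PER_DAY:.1f} days")
```

That is roughly 16.7 hours for the full estimate and a little over 4 hours for the reduced one.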

Or look at it this way. Let's say someone spotted something in the code when doing a code review. It's just a minor change but we want to be sure that in making the change nothing else has been broken.

There are now 300+ tests to do to make sure the code runs as intended. If you are going to manually test them (at 30 seconds each) it would take at least 2.5 hours.
Or I can run my unit tests and be done in 30 seconds.

I have an opinion on which I think is best for the speed and quality of product development. Not to mention tester sanity.

What can we learn from this?

Based on previously shipped software, it is not possible (or practical) to verify that SLA calculations are performed correctly, in all circumstances, when just using manual testing. (This will inevitably also apply to other parts of the system.)
Performing unit testing is faster than manual testing.
Having unit tests makes regression testing much faster than relying entirely on manual (re)testing.
Unit testing makes it easier to ensure the accuracy of the code written, leading to fewer bugs being included in released software.
In the short term, unit testing does not greatly add to the developer’s workload.
In the long term, unit testing saves a great deal of time. (For developers and testers)
Unit testing is a tool which (when used appropriately) can help us improve the quality of the shipped product and help us with the issues of increasing workloads and software with increasing quantities of existing code.
Obviously, it is not appropriate to write unit tests for all code, and it is not intended to replace manual testing. It is simply an available tool and I think we are making things harder for ourselves by not using it.

I am aware that I have raised the issue of unit testing before.
I do so again to make sure everyone is clear on the benefits of its use.
If it is decided that we still have no desire to incorporate unit testing as part of our development process I will not raise the subject again.

Thoughts?
Comments?
