Monday, December 27, 2010

Lies, damned lies, and statistics

Mark Twain once said, "There are three kinds of lies: lies, damned lies, and statistics." Some websites take this idea further and show how statistics are used to mislead people. There is even a book named How to Lie with Statistics (

The basic idea is that anyone can use statistics to present a picture different from what actually is. Check this video ( where mathematician Peter Donnelly explains how statistics are used to fool juries in criminal cases.

How is that related to software or testing? A lot of what we do is related to numbers. Let me give you some examples.

First, code-level numbers such as bugs per KLOC. Let us say the product averages 10 bugs per KLOC, i.e., 10 bugs in every thousand lines of code. However, if you look at the data module by module, branch by branch, or loop by loop, this average won't fit. There will also be differences between code written by an experienced developer and code written by a fresher. So 10 bugs per KLOC may suggest that we need to put the same testing effort into all modules, but in reality some modules need more testing effort than others. If you can categorize modules based on the effort they need, testing becomes easier to plan.
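To make the point concrete, here is a minimal sketch in Python. The module names and all the numbers are made up for illustration; the only thing it tries to show is that an overall figure of 10 bugs per KLOC can hide very different per-module figures.

```python
# Hypothetical data: module name -> (lines of code, bugs found).
# These numbers are invented for illustration only.
modules = {
    "parser": (2000, 50),
    "ui": (5000, 20),
    "reports": (3000, 30),
}


def bugs_per_kloc(loc, bugs):
    """Return the number of bugs per thousand lines of code."""
    return bugs / (loc / 1000)


# Overall average across the whole code base.
total_loc = sum(loc for loc, _ in modules.values())
total_bugs = sum(bugs for _, bugs in modules.values())
overall = bugs_per_kloc(total_loc, total_bugs)

# The same metric, computed per module.
per_module = {
    name: bugs_per_kloc(loc, bugs) for name, (loc, bugs) in modules.items()
}

# Modules ordered from highest to lowest defect density.
ranked = sorted(per_module, key=per_module.get, reverse=True)

print(f"overall: {overall:.1f} bugs/KLOC")
for name in ranked:
    print(f"{name}: {per_module[name]:.1f} bugs/KLOC")
```

With this made-up data the overall number is exactly 10 bugs per KLOC, yet the per-module values range from 4 to 25. A ranking like this is one rough way to decide where the extra testing effort should go.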

Another example is the number of test cases executed. Say the testing team executed 10k cases and found 1k bugs. Does that mean 1 bug per 10 cases? Maybe. But can we conclude that most, if not all, of the modules were tested by these cases? Nope.

That data needs to be correlated with test coverage numbers. If 100 test cases check the same code logic and differ only in the data they use, much of that is wasted time and effort: the chance of finding a new bug drops sharply after the first test. So once these unnecessary test cases are removed, we get an estimate of how many test cases are actually needed to test the product.
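One rough way to spot such overlapping test cases is to group them by the code they exercise. The sketch below assumes you already have, for each test case, the set of branches it covers; the test names, branch names, and the `group_by_coverage` helper are all hypothetical. Identical coverage is only a hint, since different data can still hit different boundary conditions, so treat the result as a first estimate and not as a list of tests to delete.

```python
from collections import defaultdict

# Hypothetical data: test case -> set of branches it exercises.
# In practice this would come from a coverage tool; here it is invented.
coverage = {
    "tc_login_01": {"auth.check", "auth.ok"},
    "tc_login_02": {"auth.check", "auth.ok"},    # same logic, different data
    "tc_login_03": {"auth.check", "auth.fail"},
    "tc_report_01": {"report.build"},
    "tc_report_02": {"report.build"},            # same logic, different data
}


def group_by_coverage(coverage):
    """Group test cases that exercise exactly the same set of branches."""
    groups = defaultdict(list)
    for test, branches in coverage.items():
        groups[frozenset(branches)].append(test)
    return list(groups.values())


groups = group_by_coverage(coverage)

# Keep one representative per group; the rest are candidates for review.
representatives = sorted(sorted(group)[0] for group in groups)
redundant = len(coverage) - len(groups)

print(f"{len(coverage)} test cases, {len(groups)} distinct coverage groups")
print(f"{redundant} candidates for removal")
print("representatives:", representatives)
```

Here five test cases collapse into three distinct groups, so the raw count of executed test cases overstates how much of the code was actually checked.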

Beware of this kind of averaging, because most of the time it is misleading.

