Delta debugging is a technique which narrows down the set of changes which could have led to a fault, given a base version of a program which worked correctly and a new version which did not. It first breaks down the difference between the two versions into a set of changes. Delta debugging is able to construct new programs by applying a subset of these changes to the original base version. It does a binary search through the 2^n possible programs to find the smallest set of changes which resulted in a failure. The choice of binary search over linear search is significant. Firstly, it allows the tool to construct the new program using any combination, not just from the first n changes. Secondly, it improves efficiency, effectively finding the one change set out of a 2^n space in logarithmic time. There is another variation of delta debugging which narrows down the subset of inputs which caused the fault, given one set which worked correctly and one which did not [1], which is not discussed here.
Delta debugging mimics manual debugging in many ways. Its main contribution is providing a systematic and mechanizable technique, saving programmer time and effort of doing repetitive chores, and is also much faster. I would like to see an experiment to quantify its effectiveness in debugging. I will measure effectiveness by 1) how often does it return a set containing the error-inducing line? 2) what is the average size of the set returned? and 3) does the returned set contain an error-inducing line? It is possible that delta debugging returns a set containing no error line if the program is ambiguous (Definition 7). Also, the size of the change set returned in each case study is a single line, but that may not be true in general.
I think a shortcoming of delta debugging is its black box nature. Even if the programmer locates the failure-inducing delta, he is given no intuition on why it caused the failure. This means he may not know how to correct it such that it will both not cause the failure as well as implement the new functionality he wants. Very often, the correct fix is not in the failure-inducing delta. He will still need to manually trace through the program, using the failure inducing delta as a starting point, to understand why the failure occured and how to fix it.
One of the sub-problems in delta debugging is how to break down the difference between two versions into changes. If the granularity is too coarse, delta debugging may be unable to isolate the precise source of the error. On the other hand, the finer the granularity, the slower the algorithm runs. Also, the possibility that the fault was caused by multiple changes increases. Also, dividing something the programmer intended as a single logical change into two can cause ambiguity, where each individual change fails but when combined the program succeeds.
This paper breaks down the difference between the two versions according to deltas from diff. Such a division makes no attempt to maximize compilable configurations or find the best granularity. An improvement similar to in Crisp [2] is to group changes at method granularity, connected by their structural dependencies. Grouping changes guarantees that we only generate compilable configurations. This improvement subsumes both the 'grouping by semantic criterion' and 'predicting test outcome' heuristics suggested in the paper. Another possible improvement is to have multiple levels of granularity. We can then start at a very coarse granularity, eg all changes in the same class are considered one change, then progressively switch to finer granularity when we are stuck, eg switch to method granularity, or statement block granularity, when we are left with a set of potentially failure-inducing classes. This may helps to keep the number of changes we deal with at one time small.
[1] Zeller, A., Simplifying and Isolating Failure-Inducing Input, IEEE Transactions on Software Engineering Vol 28 Pg 183-200, 2002.
[2] Chesley, O.C. et al, Crisp—A Debugging Tool for Java Programs, 21st ICSM, 2005
[3] Bouillon, P. et al, Automated Debugging in Eclipse (at the touch of not even a button), OOPSLA 2003.