Concept-based failure clustering


When attempting to determine the number and set of execution failures that are caused by particular faults, developers must perform an arduous task of investigating and diagnosing each individual failure. Researchers proposed failure-clustering techniques to automatically categorize failures, with the intention of isolating each culpable fault. The current techniques utilize dynamic control flow to characterize each failure to then cluster them. These existing techniques, however, are blind to the intent or purpose of each execution, other than what can be inferred by the control-flow profile. We hypothesize that semantically rich execution information can aid clustering effectiveness by categorizing failures according to which functionality they exhibit in the software. This paper presents a novel clustering method that utilizes latent-semantic-analysis techniques to categorize each failure by the semantic concepts that are expressed in the executed source code. We present an experiment comparing this new technique to traditional control-flow-based clustering. The results of the experiment showed that the semantic-concept clustering was more precise in the number of clusters produced than the traditional approach, without sacrificing cluster accuracy.

Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering