Fuzzy Fine-Grained Code-History Analysis

Abstract

Existing software-history techniques represent source-code evolution as an absolute and unambiguous mapping of lines of code in prior revisions to lines of code in subsequent revisions. However, the true evolutionary lineage of a line of code is often complex, subjective, and ambiguous. As such, existing techniques are predisposed to, both, overestimate and underestimate true evolution lineage. In this paper, we seek to address these issues by providing a more expressive model of code evolution, the fuzzy history graph, by representing code lineage as a continuous (i.e., fuzzy) metric rather than a discrete (i.e., absolute) one. Using this more descriptive model, we additionally provide a novel multi-revision code-history analysis - fuzzy history slicing. In our experiments over three real-world software systems, we found that the fuzzy history graph provides a tunable balance of precision and recall, and an overall improved accuracy over existing code-evolution models. Furthermore, we found that the use of such a fuzzy model of history provided improved accuracy for code-history analysis tasks.

Publication
2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE)