A comprehensive analysis of fault localization techniques and strategies
In Handbook of Software Fault Localization: Foundations and Advances, distinguished computer scientists Prof. W. Eric Wong and Prof. T.H. Tse deliver a robust treatment of up-to-date techniques, tools, and essential issues in software fault localization. The authors discuss fault localization strategies side by side, with an emphasis on the most important features of each approach.
The book also explores critical aspects of software fault localization, such as multiple bugs, successful and failed test cases, coincidental correctness, faults introduced by missing code, the combination of several fault localization techniques, ties within fault localization rankings, concurrency bugs, spreadsheet fault localization, and theoretical studies on fault localization.
Readers will benefit from the authors’ straightforward discussions of how to apply cost-effective techniques to a variety of specific environments common in the real world. They will also enjoy the in-depth explorations of recent research directions on this topic.
Handbook of Software Fault Localization also includes:
- A thorough introduction to the concepts of software testing and debugging, their importance, typical challenges, and the consequences of poor efforts
- Comprehensive explorations of traditional fault localization techniques, including program logging, assertions, and breakpoints
- Practical discussions of slicing-based, program spectrum-based, and statistics-based techniques
- In-depth examinations of machine learning-, data mining-, and model-based techniques for software fault localization
Perfect for researchers, professors, and students studying and working in the field, Handbook of Software Fault Localization: Foundations and Advances is also an indispensable resource for software engineers, managers, and software project decision makers responsible for schedule and budget control.
Table of Contents
Editor Biographies xv
List of Contributors xvii
1 Software Fault Localization: an Overview of Research, Techniques, and Tools 1
W. Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, Franz Wotawa, and Dongcheng Li
1.1 Introduction 1
1.2 Traditional Fault Localization Techniques 14
1.2.1 Program Logging 14
1.2.2 Assertions 14
1.2.3 Breakpoints 14
1.2.4 Profiling 15
1.3 Advanced Fault Localization Techniques 15
1.3.1 Slicing-Based Techniques 15
1.3.2 Program Spectrum-Based Techniques 20
1.3.2.1 Notation 20
1.3.2.2 Techniques 21
1.3.2.3 Issues and Concerns 27
1.3.3 Statistics-Based Techniques 30
1.3.4 Program State-Based Techniques 32
1.3.5 Machine Learning-Based Techniques 34
1.3.6 Data Mining-Based Techniques 36
1.3.7 Model-Based Techniques 37
1.3.8 Additional Techniques 41
1.3.9 Distribution of Papers in Our Repository 45
1.4 Subject Programs 47
1.5 Evaluation Metrics 50
1.6 Software Fault Localization Tools 53
1.7 Critical Aspects 58
1.7.1 Fault Localization with Multiple Bugs 58
1.7.2 Inputs, Outputs, and Impact of Test Cases 60
1.7.3 Coincidental Correctness 63
1.7.4 Faults Introduced by Missing Code 64
1.7.5 Combination of Multiple Fault Localization Techniques 65
1.7.6 Ties Within Fault Localization Rankings 67
1.7.7 Fault Localization for Concurrency Bugs 67
1.7.8 Spreadsheet Fault Localization 68
1.7.9 Theoretical Studies 70
1.8 Conclusion 71
Notes 73
References 73
2 Traditional Techniques for Software Fault Localization 119
Yihao Li, Linghuan Hu, W. Eric Wong, Vidroha Debroy, and Dongcheng Li
2.1 Program Logging 119
2.2 Assertions 121
2.3 Breakpoints 124
2.4 Profiling 125
2.5 Discussion 128
2.6 Conclusion 130
References 131
3 Slicing-Based Techniques for Software Fault Localization 135
W. Eric Wong, Hira Agrawal, and Xiangyu Zhang
3.1 Introduction 135
3.2 Static Slicing-Based Fault Localization 136
3.2.1 Introduction 136
3.2.2 Program Slicing Combined with Equivalence Analysis 137
3.2.3 Further Application 138
3.3 Dynamic Slicing-Based Fault Localization 138
3.3.1 Dynamic Slicing and Backtracking Techniques 144
3.3.2 Dynamic Slicing and Model-Based Techniques 145
3.3.3 Critical Slicing 148
3.3.3.1 Relationships Between Critical Slices (CS) and Exact Dynamic Program Slices (DPS) 149
3.3.3.2 Relationship Between Critical Slices and Executed Static Program Slices 150
3.3.3.3 Construction Cost 150
3.3.4 Multiple-Points Dynamic Slicing 151
3.3.4.1 BwS of an Erroneous Computed Value 152
3.3.4.2 FwS of Failure-Inducing Input Difference 152
3.3.4.3 BiS of a Critical Predicate 154
3.3.4.4 MPSs: Dynamic Chops 157
3.3.5 Execution Indexing 158
3.3.5.1 Concepts 159
3.3.5.2 Structural Indexing 161
3.3.6 Dual Slicing to Locate Concurrency Bugs 165
3.3.6.1 Trace Comparison 165
3.3.6.2 Dual Slicing 168
3.3.7 Comparative Causality: a Causal Inference Model Based on Dual Slicing 173
3.3.7.1 Property One: Relevance 174
3.3.7.2 Property Two: Sufficiency 175
3.3.8 Implicit Dependences to Locate Execution Omission Errors 177
3.3.9 Other Dynamic Slicing-Based Techniques 179
3.4 Execution Slicing-Based Fault Localization 179
3.4.1 Fault Localization Using Execution Dice 179
3.4.2 A Family of Fault Localization Heuristics Based on Execution Slicing 181
3.4.2.1 Heuristic I 182
3.4.2.2 Heuristic II 183
3.4.2.3 Heuristic III 185
3.4.3 Effective Fault Localization Based on Execution Slices and Inter-block Data Dependence 188
3.4.3.1 Augmenting a Bad D(1) 189
3.4.3.2 Refining a Good D(1) 190
3.4.3.3 An Incremental Debugging Strategy 191
3.4.4 Other Execution Slicing-Based Techniques in Software Fault Localization 193
3.5 Discussions 193
3.6 Conclusion 194
Notes 195
References 195
4 Spectrum-Based Techniques for Software Fault Localization 201
W. Eric Wong, Hua Jie Lee, Ruizhi Gao, and Lee Naish
4.1 Introduction 201
4.2 Background and Notation 203
4.2.1 Similarity Coefficient-Based Fault Localization 204
4.2.2 An Example of Using Similarity Coefficient to Compute Suspiciousness 205
4.3 Insights of Some Spectra-Based Metrics 210
4.4 Equivalence Metrics 212
4.4.1 Applicability of the Equivalence Relation to Other Fault Localization Techniques 217
4.4.2 Applicability Beyond Fault Localization 218
4.5 Selecting a Good Suspiciousness Function (Metric) 219
4.5.1 Cost of Using a Metric 219
4.5.2 Optimality for Programs with a Single Bug 220
4.5.3 Optimality for Programs with Deterministic Bugs 221
4.6 Using Spectrum-Based Metrics for Fault Localization 222
4.6.1 Spectrum-Based Metrics for Fault Localization 222
4.6.2 Refinement of Spectra-Based Metrics 227
4.7 Empirical Evaluation Studies of SBFL Metrics 232
4.7.1 The Construction of D* 234
4.7.2 An Illustrative Example 235
4.7.3 A Case Study Using D* 237
4.7.3.1 Subject Programs 237
4.7.3.2 Fault Localization Techniques Used in Comparisons 238
4.7.3.3 Evaluation Metrics and Criteria 239
4.7.3.4 Statement with Same Suspiciousness Values 240
4.7.3.5 Results 241
4.7.3.6 Effectiveness of D* with Different Values of * 247
4.7.3.7 D* Versus Other Fault Localization Techniques 248
4.7.3.8 Programs with Multiple Bugs 251
4.7.3.9 Discussion 255
4.8 Conclusion 261
Notes 262
References 263
5 Statistics-Based Techniques for Software Fault Localization 271
Zhenyu Zhang and W. Eric Wong
5.1 Introduction 271
5.1.1 Tarantula 272
5.1.2 How It Works 272
5.2 Working with Statements 274
5.2.1 Techniques Under the Same Problem Settings 275
5.2.2 Statistical Variances 275
5.3 Working with Non-statements 283
5.3.1 Predicate: a Popular Trend 283
5.3.2 BPEL: a Sample Application 285
5.4 Purifying the Input 286
5.4.1 Coincidental Correctness Issue 286
5.4.2 Class Balance Consideration 287
5.5 Reinterpreting the Output 288
5.5.1 Revealing Fault Number 288
5.5.2 Noise Reduction 291
Notes 292
References 293
6 Machine Learning-Based Techniques for Software Fault Localization 297
W. Eric Wong
6.1 Introduction 297
6.2 BP Neural Network-Based Fault Localization 298
6.2.1 Fault Localization with a BP Neural Network 298
6.2.2 Reduce the Number of Candidate Suspicious Statements 302
6.3 RBF Neural Network-Based Fault Localization 304
6.3.1 RBF Neural Networks 304
6.3.2 Methodology 305
6.3.2.1 Fault Localization Using an RBF Neural Network 306
6.3.2.2 Training of the RBF Neural Network 307
6.3.2.3 Definition of a Weighted Bit-Comparison-Based Dissimilarity 309
6.4 C4.5 Decision Tree-Based Fault Localization 309
6.4.1 Category-Partition for Rule Induction 309
6.4.2 Rule Induction Algorithms 310
6.4.3 Statement Ranking Strategies 310
6.4.3.1 Revisiting Tarantula 310
6.4.3.2 Ranking Statements Based on C4.5 Rules 312
6.5 Applying Simulated Annealing with Statement Pruning for an SBFL Formula 314
6.6 Conclusion 317
Notes 317
References 317
7 Data Mining-Based Techniques for Software Fault Localization 321
Peggy Cellier, Mireille Ducassé, Sébastien Ferré, Olivier Ridoux, and W. Eric Wong
7.1 Introduction 321
7.2 Formal Concept Analysis and Association Rules 324
7.2.1 Formal Concept Analysis 325
7.2.2 Association Rules 327
7.3 Data Mining for Fault Localization 329
7.3.1 Failure Rules 329
7.3.2 Failure Lattice 331
7.4 The Failure Lattice for Multiple Faults 336
7.4.1 Dependencies Between Faults 336
7.4.2 Example 341
7.5 Discussion 342
7.5.1 The Structure of the Execution Traces 342
7.5.2 Union Model 343
7.5.3 Intersection Model 343
7.5.4 Nearest Neighbor 343
7.5.5 Delta Debugging 344
7.5.6 From the Trace Context to the Failure Context 344
7.5.7 The Structure of Association Rules 345
7.5.8 Multiple Faults 345
7.6 Fault Localization Using N-gram Analysis 346
7.6.1 Background 347
7.6.1.1 Execution Sequence 347
7.6.1.2 N-gram Analysis 347
7.6.1.3 Linear Execution Blocks 349
7.6.1.4 Association Rule Mining 349
7.6.2 Methodology 350
7.6.3 Conclusion 353
7.7 Fault Localization for GUI Software Using N-gram Analysis 353
7.7.1 Background 354
7.7.1.1 Representation of the GUI and Its Operations 354
7.7.1.2 Event Handler 356
7.7.1.3 N-gram 356
7.7.2 Association Rule Mining 357
7.7.3 Methodology 357
7.7.3.1 General Approach 358
7.7.3.2 N-gram Fault Localization Algorithm 358
7.8 Conclusion 360
Notes 361
References 361
8 Information Retrieval-Based Techniques for Software Fault Localization 365
Xin Xia and David Lo
8.1 Introduction 365
8.2 General IR-Based Fault Localization Process 368
8.3 Fundamental Information Retrieval Techniques for Software Fault Localization 369
8.3.1 Vector Space Model 369
8.3.2 Topic Modeling 370
8.3.3 Word Embedding 371
8.4 Evaluation Metrics 372
8.4.1 Top-k Prediction Accuracy 372
8.4.2 Mean Reciprocal Rank (MRR) 373
8.4.3 Mean Average Precision (MAP) 373
8.5 Techniques for Different Scenarios 374
8.5.1 Text of Current Bug Report Only 374
8.5.1.1 VSM Variants 374
8.5.1.2 Topic Modeling 375
8.5.2 Text and History 376
8.5.2.1 VSM Variants 376
8.5.2.2 Topic Modeling 378
8.5.2.3 Deep Learning 378
8.5.3 Text and Stack/Execution Traces 379
8.6 Empirical Studies 380
8.7 Miscellaneous 383
8.8 Conclusion 385
Notes 385
References 386
9 Model-Based Techniques for Software Fault Localization 393
Birgit Hofer, Franz Wotawa, Wolfgang Mayer, and Markus Stumptner
9.1 Introduction 393
9.2 Basic Definitions and Algorithms 395
9.2.1 Algorithms for MBD 401
9.3 Modeling for MBD 404
9.3.1 The Value-Based Model 405
9.3.2 The Dependency-Based Model 409
9.3.3 Approximation Models for Debugging 413
9.3.4 Other Modeling Approaches 416
9.4 Application Areas 417
9.5 Hybrid Approaches 418
9.6 Conclusions 419
Notes 420
References 420
10 Software Fault Localization in Spreadsheets 425
Birgit Hofer and Franz Wotawa
10.1 Motivation 425
10.2 Definition of the Spreadsheet Language 427
10.3 Cones 430
10.4 Spectrum-Based Fault Localization 431
10.5 Model-Based Spreadsheet Debugging 435
10.6 Repair Approaches 440
10.7 Checking Approaches 443
10.8 Testing 445
10.9 Conclusion 446
Notes 446
References 447
11 Theoretical Aspects of Software Fault Localization 451
Xiaoyuan Xie and W. Eric Wong
11.1 Introduction 451
11.2 A Model-Based Hybrid Analysis 452
11.2.1 The Model Program Segment 452
11.2.2 Important Findings 454
11.2.3 Discussion 454
11.3 A Set-Based Pure Theoretical Framework 455
11.3.1 Definitions and Theorems 455
11.3.2 Evaluation 457
11.3.3 The Maximality Among All Investigated Formulas 461
11.4 A Generalized Study 462
11.4.1 Spectral Coordinate for SBFL 462
11.4.2 Generalized Maximal and Greatest Formula in F 464
11.5 About the Assumptions 465
11.5.1 Omission Fault and 100% Coverage 465
11.5.2 Tie-Breaking Scheme 467
11.5.3 Multiple Faults 467
11.5.4 Some Plausible Causes for the Inconsistence Between Empirical and Theoretical Analyses 468
Notes 469
References 470
12 Software Fault Localization for Programs with Multiple Bugs 473
Ruizhi Gao, W. Eric Wong, and Rui Abreu
12.1 Introduction 473
12.2 One-Bug-at-a-Time 474
12.3 Two Techniques Proposed by Jones et al. 475
12.3.1 J1: Clustering Based on Profiles and Fault Localization Results 476
12.3.1.1 Clustering Profile-Based Behavior Models 476
12.3.1.2 Using Fault Localization to Stop Clustering 478
12.3.1.3 Using Fault Localization Clustering to Refine Clusters 479
12.3.2 J2: Clustering Based on Fault Localization Results 480
12.4 Localization of Multiple Bugs Using Algorithms from Integer Linear Programming 481
12.5 MSeer: an Advanced Fault Localization Technique for Locating Multiple Bugs in Parallel 483
12.5.1 MSeer 485
12.5.1.1 Representation of Failed Test Cases 485
12.5.1.2 Revised Kendall tau Distance 486
12.5.1.3 Clustering 488
12.5.1.4 MSeer: a Technique for Locating Multiple Bugs in Parallel 494
12.5.2 A Running Example 496
12.5.3 Case Studies 499
12.5.3.1 Subject Programs and Data Collections 499
12.5.3.2 Evaluation of Effectiveness and Efficiency 501
12.5.3.3 Results 503
12.5.4 Discussions 510
12.5.4.1 Using Different Fault Localization Techniques 510
12.5.4.2 Apply MSeer to Programs with a Single Bug 510
12.5.4.3 Distance Metrics 512
12.5.4.4 The Importance of Estimating the Number of Clusters and Assigning Initial Medoids 514
12.6 Spectrum-Based Reasoning for Fault Localization 514
12.6.1 Barinel 515
12.6.2 Results 517
12.7 Other Studies 518
12.8 Conclusion 520
Notes 521
References 522
13 Emerging Aspects of Software Fault Localization 529
T.H. Tse, David Lo, Alex Groce, Michael Perscheid, Robert Hirschfeld, and W. Eric Wong
13.1 Introduction 529
13.2 Application of the Scientific Method to Fault Localization 530
13.2.1 Scientific Debugging 531
13.2.2 Identifying and Assigning Bug Reports to Developers 532
13.2.3 Using Debuggers in Fault Localization 534
13.2.4 Conclusion 538
13.3 Fault Localization in the Absence of Test Oracles by Semi-proving of Metamorphic Relations 538
13.3.1 Metamorphic Testing and Metamorphic Relations 539
13.3.2 The Semi-proving Methodology 541
13.3.2.1 Semi-proving by Symbolic Evaluation 541
13.3.2.2 Semi-proving as a Fault Localization Technique 542
13.3.3 The Need to Go Beyond Symbolic Evaluation 543
13.3.4 Initial Empirical Study 543
13.3.5 Detailed Illustrative Examples 544
13.3.5.1 Fault Localization Example Related to Predicate Statement 544
13.3.5.2 Fault Localization Example Related to Faulty Statement 548
13.3.5.3 Fault Localization Example Related to Missing Path 552
13.3.5.4 Fault Localization Example Related to Loop 556
13.3.6 Comparisons with Related Work 558
13.3.7 Conclusion 560
13.4 Automated Prediction of Fault Localization Effectiveness 560
13.4.1 Overview of PEFA 561
13.4.2 Model Learning 564
13.4.3 Effectiveness Prediction 564
13.4.4 Conclusion 564
13.5 Integrating Fault Localization into Automated Test Generation Tools 565
13.5.1 Localization in the Context of Automated Test Generation 566
13.5.2 Automated Test Generation Tools Supporting Localization 567
13.5.3 Antifragile Tests and Localization 568
13.5.4 Conclusion 568
Notes 569
References 569
Index 581