Handbook of Software Fault Localization. Foundations and Advances. Edition No. 1


Book
608 Pages
April 2023
John Wiley and Sons Ltd
ID: 5840909

Handbook of Software Fault Localization

A comprehensive analysis of fault localization techniques and strategies

In Handbook of Software Fault Localization: Foundations and Advances, distinguished computer scientists Prof. W. Eric Wong and Prof. T.H. Tse deliver a robust treatment of up-to-date techniques, tools, and essential issues in software fault localization. The authors offer collective discussions of fault localization strategies with an emphasis on the most important features of each approach.

The book also explores critical aspects of software fault localization, like multiple bugs, successful and failed test cases, coincidental correctness, faults introduced by missing code, the combination of several fault localization techniques, ties within fault localization rankings, concurrency bugs, spreadsheet fault localization, and theoretical studies on fault localization.

Readers will benefit from the authors’ straightforward discussions of how to apply cost-effective techniques to a variety of specific environments common in the real world. They will also enjoy the in-depth explorations of recent research directions on this topic.

Handbook of Software Fault Localization also includes:

- A thorough introduction to the concepts of software testing and debugging, their importance, typical challenges, and the consequences of poor efforts
- Comprehensive explorations of traditional fault localization techniques, including program logging, assertions, and breakpoints
- Practical discussions of slicing-based, program spectrum-based, and statistics-based techniques
- In-depth examinations of machine learning-, data mining-, and model-based techniques for software fault localization

Perfect for researchers, professors, and students studying and working in the field, Handbook of Software Fault Localization: Foundations and Advances is also an indispensable resource for software engineers, managers, and software project decision makers responsible for schedule and budget control.

Editor Biographies xv

List of Contributors xvii

1 Software Fault Localization: an Overview of Research, Techniques, and Tools 1
W. Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, Franz Wotawa, and Dongcheng Li

1.1 Introduction 1

1.2 Traditional Fault Localization Techniques 14

1.2.1 Program Logging 14

1.2.2 Assertions 14

1.2.3 Breakpoints 14

1.2.4 Profiling 15

1.3 Advanced Fault Localization Techniques 15

1.3.1 Slicing-Based Techniques 15

1.3.2 Program Spectrum-Based Techniques 20

1.3.2.1 Notation 20

1.3.2.2 Techniques 21

1.3.2.3 Issues and Concerns 27

1.3.3 Statistics-Based Techniques 30

1.3.4 Program State-Based Techniques 32

1.3.5 Machine Learning-Based Techniques 34

1.3.6 Data Mining-Based Techniques 36

1.3.7 Model-Based Techniques 37

1.3.8 Additional Techniques 41

1.3.9 Distribution of Papers in Our Repository 45

1.4 Subject Programs 47

1.5 Evaluation Metrics 50

1.6 Software Fault Localization Tools 53

1.7 Critical Aspects 58

1.7.1 Fault Localization with Multiple Bugs 58

1.7.2 Inputs, Outputs, and Impact of Test Cases 60

1.7.3 Coincidental Correctness 63

1.7.4 Faults Introduced by Missing Code 64

1.7.5 Combination of Multiple Fault Localization Techniques 65

1.7.6 Ties Within Fault Localization Rankings 67

1.7.7 Fault Localization for Concurrency Bugs 67

1.7.8 Spreadsheet Fault Localization 68

1.7.9 Theoretical Studies 70

1.8 Conclusion 71

Notes 73

References 73

2 Traditional Techniques for Software Fault Localization 119
Yihao Li, Linghuan Hu, W. Eric Wong, Vidroha Debroy, and Dongcheng Li

2.1 Program Logging 119

2.2 Assertions 121

2.3 Breakpoints 124

2.4 Profiling 125

2.5 Discussion 128

2.6 Conclusion 130

References 131

3 Slicing-Based Techniques for Software Fault Localization 135
W. Eric Wong, Hira Agrawal, and Xiangyu Zhang

3.1 Introduction 135

3.2 Static Slicing-Based Fault Localization 136

3.2.1 Introduction 136

3.2.2 Program Slicing Combined with Equivalence Analysis 137

3.2.3 Further Application 138

3.3 Dynamic Slicing-Based Fault Localization 138

3.3.1 Dynamic Slicing and Backtracking Techniques 144

3.3.2 Dynamic Slicing and Model-Based Techniques 145

3.3.3 Critical Slicing 148

3.3.3.1 Relationships Between Critical Slices (CS) and Exact Dynamic Program Slices (DPS) 149

3.3.3.2 Relationship Between Critical Slices and Executed Static Program Slices 150

3.3.3.3 Construction Cost 150

3.3.4 Multiple-Points Dynamic Slicing 151

3.3.4.1 BwS of an Erroneous Computed Value 152

3.3.4.2 FwS of Failure-Inducing Input Difference 152

3.3.4.3 BiS of a Critical Predicate 154

3.3.4.4 MPSs: Dynamic Chops 157

3.3.5 Execution Indexing 158

3.3.5.1 Concepts 159

3.3.5.2 Structural Indexing 161

3.3.6 Dual Slicing to Locate Concurrency Bugs 165

3.3.6.1 Trace Comparison 165

3.3.6.2 Dual Slicing 168

3.3.7 Comparative Causality: a Causal Inference Model Based on Dual Slicing 173

3.3.7.1 Property One: Relevance 174

3.3.7.2 Property Two: Sufficiency 175

3.3.8 Implicit Dependences to Locate Execution Omission Errors 177

3.3.9 Other Dynamic Slicing-Based Techniques 179

3.4 Execution Slicing-Based Fault Localization 179

3.4.1 Fault Localization Using Execution Dice 179

3.4.2 A Family of Fault Localization Heuristics Based on Execution Slicing 181

3.4.2.1 Heuristic I 182

3.4.2.2 Heuristic II 183

3.4.2.3 Heuristic III 185

3.4.3 Effective Fault Localization Based on Execution Slices and Inter-block Data Dependence 188

3.4.3.1 Augmenting a Bad D(1) 189

3.4.3.2 Refining a Good D(1) 190

3.4.3.3 An Incremental Debugging Strategy 191

3.4.4 Other Execution Slicing-Based Techniques in Software Fault Localization 193

3.5 Discussions 193

3.6 Conclusion 194

Notes 195

References 195

4 Spectrum-Based Techniques for Software Fault Localization 201
W. Eric Wong, Hua Jie Lee, Ruizhi Gao, and Lee Naish

4.1 Introduction 201

4.2 Background and Notation 203

4.2.1 Similarity Coefficient-Based Fault Localization 204

4.2.2 An Example of Using Similarity Coefficient to Compute Suspiciousness 205

4.3 Insights of Some Spectra-Based Metrics 210

4.4 Equivalence Metrics 212

4.4.1 Applicability of the Equivalence Relation to Other Fault Localization Techniques 217

4.4.2 Applicability Beyond Fault Localization 218

4.5 Selecting a Good Suspiciousness Function (Metric) 219

4.5.1 Cost of Using a Metric 219

4.5.2 Optimality for Programs with a Single Bug 220

4.5.3 Optimality for Programs with Deterministic Bugs 221

4.6 Using Spectrum-Based Metrics for Fault Localization 222

4.6.1 Spectrum-Based Metrics for Fault Localization 222

4.6.2 Refinement of Spectra-Based Metrics 227

4.7 Empirical Evaluation Studies of SBFL Metrics 232

4.7.1 The Construction of D* 234

4.7.2 An Illustrative Example 235

4.7.3 A Case Study Using D* 237

4.7.3.1 Subject Programs 237

4.7.3.2 Fault Localization Techniques Used in Comparisons 238

4.7.3.3 Evaluation Metrics and Criteria 239

4.7.3.4 Statement with Same Suspiciousness Values 240

4.7.3.5 Results 241

4.7.3.6 Effectiveness of D* with Different Values of * 247

4.7.3.7 D* Versus Other Fault Localization Techniques 248

4.7.3.8 Programs with Multiple Bugs 251

4.7.3.9 Discussion 255

4.8 Conclusion 261

Notes 262

References 263

5 Statistics-Based Techniques for Software Fault Localization 271
Zhenyu Zhang and W. Eric Wong

5.1 Introduction 271

5.1.1 Tarantula 272

5.1.2 How It Works 272

5.2 Working with Statements 274

5.2.1 Techniques Under the Same Problem Settings 275

5.2.2 Statistical Variances 275

5.3 Working with Non-statements 283

5.3.1 Predicate: a Popular Trend 283

5.3.2 BPEL: a Sample Application 285

5.4 Purifying the Input 286

5.4.1 Coincidental Correctness Issue 286

5.4.2 Class Balance Consideration 287

5.5 Reinterpreting the Output 288

5.5.1 Revealing Fault Number 288

5.5.2 Noise Reduction 291

Notes 292

References 293

6 Machine Learning-Based Techniques for Software Fault Localization 297
W. Eric Wong

6.1 Introduction 297

6.2 BP Neural Network-Based Fault Localization 298

6.2.1 Fault Localization with a BP Neural Network 298

6.2.2 Reduce the Number of Candidate Suspicious Statements 302

6.3 RBF Neural Network-Based Fault Localization 304

6.3.1 RBF Neural Networks 304

6.3.2 Methodology 305

6.3.2.1 Fault Localization Using an RBF Neural Network 306

6.3.2.2 Training of the RBF Neural Network 307

6.3.2.3 Definition of a Weighted Bit-Comparison-Based Dissimilarity 309

6.4 C4.5 Decision Tree-Based Fault Localization 309

6.4.1 Category-Partition for Rule Induction 309

6.4.2 Rule Induction Algorithms 310

6.4.3 Statement Ranking Strategies 310

6.4.3.1 Revisiting Tarantula 310

6.4.3.2 Ranking Statements Based on C4.5 Rules 312

6.5 Applying Simulated Annealing with Statement Pruning for an SBFL Formula 314

6.6 Conclusion 317

Notes 317

References 317

7 Data Mining-Based Techniques for Software Fault Localization 321
Peggy Cellier, Mireille Ducassé, Sébastien Ferré, Olivier Ridoux, and W. Eric Wong

7.1 Introduction 321

7.2 Formal Concept Analysis and Association Rules 324

7.2.1 Formal Concept Analysis 325

7.2.2 Association Rules 327

7.3 Data Mining for Fault Localization 329

7.3.1 Failure Rules 329

7.3.2 Failure Lattice 331

7.4 The Failure Lattice for Multiple Faults 336

7.4.1 Dependencies Between Faults 336

7.4.2 Example 341

7.5 Discussion 342

7.5.1 The Structure of the Execution Traces 342

7.5.2 Union Model 343

7.5.3 Intersection Model 343

7.5.4 Nearest Neighbor 343

7.5.5 Delta Debugging 344

7.5.6 From the Trace Context to the Failure Context 344

7.5.7 The Structure of Association Rules 345

7.5.8 Multiple Faults 345

7.6 Fault Localization Using N-gram Analysis 346

7.6.1 Background 347

7.6.1.1 Execution Sequence 347

7.6.1.2 N-gram Analysis 347

7.6.1.3 Linear Execution Blocks 349

7.6.1.4 Association Rule Mining 349

7.6.2 Methodology 350

7.6.3 Conclusion 353

7.7 Fault Localization for GUI Software Using N-gram Analysis 353

7.7.1 Background 354

7.7.1.1 Representation of the GUI and Its Operations 354

7.7.1.2 Event Handler 356

7.7.1.3 N-gram 356

7.7.2 Association Rule Mining 357

7.7.3 Methodology 357

7.7.3.1 General Approach 358

7.7.3.2 N-gram Fault Localization Algorithm 358

7.8 Conclusion 360

Notes 361

References 361

8 Information Retrieval-Based Techniques for Software Fault Localization 365
Xin Xia and David Lo

8.1 Introduction 365

8.2 General IR-Based Fault Localization Process 368

8.3 Fundamental Information Retrieval Techniques for Software Fault Localization 369

8.3.1 Vector Space Model 369

8.3.2 Topic Modeling 370

8.3.3 Word Embedding 371

8.4 Evaluation Metrics 372

8.4.1 Top-k Prediction Accuracy 372

8.4.2 Mean Reciprocal Rank (MRR) 373

8.4.3 Mean Average Precision (MAP) 373

8.5 Techniques for Different Scenarios 374

8.5.1 Text of Current Bug Report Only 374

8.5.1.1 VSM Variants 374

8.5.1.2 Topic Modeling 375

8.5.2 Text and History 376

8.5.2.1 VSM Variants 376

8.5.2.2 Topic Modeling 378

8.5.2.3 Deep Learning 378

8.5.3 Text and Stack/Execution Traces 379

8.6 Empirical Studies 380

8.7 Miscellaneous 383

8.8 Conclusion 385

Notes 385

References 386

9 Model-Based Techniques for Software Fault Localization 393
Birgit Hofer, Franz Wotawa, Wolfgang Mayer, and Markus Stumptner

9.1 Introduction 393

9.2 Basic Definitions and Algorithms 395

9.2.1 Algorithms for MBD 401

9.3 Modeling for MBD 404

9.3.1 The Value-Based Model 405

9.3.2 The Dependency-Based Model 409

9.3.3 Approximation Models for Debugging 413

9.3.4 Other Modeling Approaches 416

9.4 Application Areas 417

9.5 Hybrid Approaches 418

9.6 Conclusions 419

Notes 420

References 420

10 Software Fault Localization in Spreadsheets 425
Birgit Hofer and Franz Wotawa

10.1 Motivation 425

10.2 Definition of the Spreadsheet Language 427

10.3 Cones 430

10.4 Spectrum-Based Fault Localization 431

10.5 Model-Based Spreadsheet Debugging 435

10.6 Repair Approaches 440

10.7 Checking Approaches 443

10.8 Testing 445

10.9 Conclusion 446

Notes 446

References 447

11 Theoretical Aspects of Software Fault Localization 451
Xiaoyuan Xie and W. Eric Wong

11.1 Introduction 451

11.2 A Model-Based Hybrid Analysis 452

11.2.1 The Model Program Segment 452

11.2.2 Important Findings 454

11.2.3 Discussion 454

11.3 A Set-Based Pure Theoretical Framework 455

11.3.1 Definitions and Theorems 455

11.3.2 Evaluation 457

11.3.3 The Maximality Among All Investigated Formulas 461

11.4 A Generalized Study 462

11.4.1 Spectral Coordinate for SBFL 462

11.4.2 Generalized Maximal and Greatest Formula in F 464

11.5 About the Assumptions 465

11.5.1 Omission Fault and 100% Coverage 465

11.5.2 Tie-Breaking Scheme 467

11.5.3 Multiple Faults 467

11.5.4 Some Plausible Causes for the Inconsistence Between Empirical and Theoretical Analyses 468

Notes 469

References 470

12 Software Fault Localization for Programs with Multiple Bugs 473
Ruizhi Gao, W. Eric Wong, and Rui Abreu

12.1 Introduction 473

12.2 One-Bug-at-a-Time 474

12.3 Two Techniques Proposed by Jones et al. 475

12.3.1 J1: Clustering Based on Profiles and Fault Localization Results 476

12.3.1.1 Clustering Profile-Based Behavior Models 476

12.3.1.2 Using Fault Localization to Stop Clustering 478

12.3.1.3 Using Fault Localization Clustering to Refine Clusters 479

12.3.2 J2: Clustering Based on Fault Localization Results 480

12.4 Localization of Multiple Bugs Using Algorithms from Integer Linear Programming 481

12.5 MSeer: an Advanced Fault Localization Technique for Locating Multiple Bugs in Parallel 483

12.5.1 MSeer 485

12.5.1.1 Representation of Failed Test Cases 485

12.5.1.2 Revised Kendall tau Distance 486

12.5.1.3 Clustering 488

12.5.1.4 MSeer: a Technique for Locating Multiple Bugs in Parallel 494

12.5.2 A Running Example 496

12.5.3 Case Studies 499

12.5.3.1 Subject Programs and Data Collections 499

12.5.3.2 Evaluation of Effectiveness and Efficiency 501

12.5.3.3 Results 503

12.5.4 Discussions 510

12.5.4.1 Using Different Fault Localization Techniques 510

12.5.4.2 Apply MSeer to Programs with a Single Bug 510

12.5.4.3 Distance Metrics 512

12.5.4.4 The Importance of Estimating the Number of Clusters and Assigning Initial Medoids 514

12.6 Spectrum-Based Reasoning for Fault Localization 514

12.6.1 Barinel 515

12.6.2 Results 517

12.7 Other Studies 518

12.8 Conclusion 520

Notes 521

References 522

13 Emerging Aspects of Software Fault Localization 529
T.H. Tse, David Lo, Alex Groce, Michael Perscheid, Robert Hirschfeld, and W. Eric Wong

13.1 Introduction 529

13.2 Application of the Scientific Method to Fault Localization 530

13.2.1 Scientific Debugging 531

13.2.2 Identifying and Assigning Bug Reports to Developers 532

13.2.3 Using Debuggers in Fault Localization 534

13.2.4 Conclusion 538

13.3 Fault Localization in the Absence of Test Oracles by Semi-proving of Metamorphic Relations 538

13.3.1 Metamorphic Testing and Metamorphic Relations 539

13.3.2 The Semi-proving Methodology 541

13.3.2.1 Semi-proving by Symbolic Evaluation 541

13.3.2.2 Semi-proving as a Fault Localization Technique 542

13.3.3 The Need to Go Beyond Symbolic Evaluation 543

13.3.4 Initial Empirical Study 543

13.3.5 Detailed Illustrative Examples 544

13.3.5.1 Fault Localization Example Related to Predicate Statement 544

13.3.5.2 Fault Localization Example Related to Faulty Statement 548

13.3.5.3 Fault Localization Example Related to Missing Path 552

13.3.5.4 Fault Localization Example Related to Loop 556

13.3.6 Comparisons with Related Work 558

13.3.7 Conclusion 560

13.4 Automated Prediction of Fault Localization Effectiveness 560

13.4.1 Overview of PEFA 561

13.4.2 Model Learning 564

13.4.3 Effectiveness Prediction 564

13.4.4 Conclusion 564

13.5 Integrating Fault Localization into Automated Test Generation Tools 565

13.5.1 Localization in the Context of Automated Test Generation 566

13.5.2 Automated Test Generation Tools Supporting Localization 567

13.5.3 Antifragile Tests and Localization 568

13.5.4 Conclusion 568

Notes 569

References 569

Index 581

Authors

W. Eric Wong, University of Texas at Dallas, TX.
T.H. Tse, The University of Hong Kong, Pokfulam, Hong Kong.
