Handbook and reference for industrial statisticians and system reliability engineers
System Reliability Theory: Models, Statistical Methods, and Applications, Third Edition presents an updated and revised look at system reliability theory, modeling, and analytical methods. The new edition is based on feedback on the second edition from numerous students, professors, researchers, and industries around the world. New sections and chapters have been added, along with new real-world industry examples, and the standards and problems have been revised and updated.
System Reliability Theory covers a broad and deep array of system reliability topics, including:
· In-depth discussion of failures and failure modes
· The main system reliability assessment methods
· Common-cause failure modeling
· Deterioration modeling
· Maintenance modeling and assessment using Python code
· Bayesian probability and methods
· Life data analysis using R
Perfect for undergraduate and graduate students taking courses in reliability engineering, this book also serves as a reference and resource for practicing statisticians and engineers.
Throughout, the book has a practical focus, incorporating industry feedback and real-world industry problems and examples.
Table of Contents
Preface xxiii
About the Companion Website xxix
1 Introduction 1
1.1 What is Reliability? 1
1.1.1 Service Reliability 2
1.1.2 Past and Future Reliability 3
1.2 The Importance of Reliability 3
1.2.1 Related Applications 4
1.3 Basic Reliability Concepts 6
1.3.1 Reliability 6
1.3.2 Maintainability and Maintenance 8
1.3.3 Availability 8
1.3.4 Quality 9
1.3.5 Dependability 9
1.3.6 Safety and Security 10
1.3.7 RAM and RAMS 10
1.4 Reliability Metrics 11
1.4.1 Reliability Metrics for a Technical Item 11
1.4.2 Reliability Metrics for a Service 12
1.5 Approaches to Reliability Analysis 12
1.5.1 The Physical Approach to Reliability 13
1.5.2 Systems Approach to Reliability 13
1.6 Reliability Engineering 15
1.6.1 Roles of the Reliability Engineer 16
1.6.2 Timing of Reliability Studies 17
1.7 Objectives, Scope, and Delimitations of the Book 17
1.8 Trends and Challenges 19
1.9 Standards and Guidelines 20
1.10 History of System Reliability 20
1.11 Problems 26
References 27
2 The Study Object and its Functions 31
2.1 Introduction 31
2.2 System and System Elements 31
2.2.1 Item 32
2.2.2 Embedded Item 33
2.3 Boundary Conditions 33
2.3.1 Closed and Open Systems 34
2.4 Operating Context 35
2.5 Functions and Performance Requirements 35
2.5.1 Functions 35
2.5.2 Performance Requirements 36
2.5.3 Classification of Functions 37
2.5.4 Functional Modeling and Analysis 38
2.5.5 Function Trees 38
2.5.6 SADT and IDEF0 39
2.6 System Analysis 41
2.6.1 Synthesis 41
2.7 Simple, Complicated, and Complex Systems 42
2.8 System Structure Modeling 44
2.8.1 Reliability Block Diagram 44
2.8.2 Series Structure 46
2.8.3 Parallel Structure 46
2.8.4 Redundancy 47
2.8.5 Voted Structure 47
2.8.6 Standby Structure 48
2.8.7 More Complicated Structures 48
2.8.8 Two Different System Functions 49
2.8.9 Practical Construction of RBDs 50
2.9 Problems 51
References 52
3 Failures and Faults 55
3.1 Introduction 55
3.1.1 States and Transitions 56
3.1.2 Operational Modes 56
3.2 Failures 57
3.2.1 Failures in a State 58
3.2.2 Failures During Transition 59
3.3 Faults 60
3.4 Failure Modes 60
3.5 Failure Causes and Effects 62
3.5.1 Failure Causes 62
3.5.2 Proximate Causes and Root Causes 63
3.5.3 Hierarchy of Causes 64
3.6 Classification of Failures and Failure Modes 64
3.6.1 Classification According to Local Consequence 65
3.6.2 Classification According to Cause 65
3.6.3 Failure Mechanisms 70
3.6.4 Software Faults 71
3.6.5 Failure Effects 71
3.7 Failure/Fault Analysis 72
3.7.1 Cause and Effect Analysis 73
3.7.2 Root Cause Analysis 74
3.8 Problems 76
References 77
4 Qualitative System Reliability Analysis 79
4.1 Introduction 79
4.1.1 Deductive Versus Inductive Analysis 80
4.2 FMEA/FMECA 80
4.2.1 Types of FMECA 81
4.2.2 Objectives of FMECA 82
4.2.3 FMECA Procedure 83
4.2.4 Applications 87
4.3 Fault Tree Analysis 88
4.3.1 Fault Tree Symbols and Elements 88
4.3.2 Definition of the Problem and the Boundary Conditions 91
4.3.3 Constructing the Fault Tree 92
4.3.4 Identification of Minimal Cut and Path Sets 95
4.3.5 MOCUS 96
4.3.6 Qualitative Evaluation of the Fault Tree 98
4.3.7 Dynamic Fault Trees 101
4.4 Event Tree Analysis 103
4.4.1 Initiating Event 104
4.4.2 Safety Functions 105
4.4.3 Event Tree Construction 106
4.4.4 Description of Resulting Event Sequences 106
4.5 Fault Trees Versus Reliability Block Diagrams 109
4.5.1 Recommendation 111
4.6 Structure Function 111
4.6.1 Series Structure 112
4.6.2 Parallel Structure 112
4.6.3 koon:G Structure 113
4.6.4 Truth Tables 114
4.7 System Structure Analysis 114
4.7.1 Single Points of Failure 115
4.7.2 Coherent Structures 115
4.7.3 General Properties of Coherent Structures 117
4.7.4 Structures Represented by Paths and Cuts 119
4.7.5 Pivotal Decomposition 123
4.7.6 Modules of Coherent Structures 124
4.8 Bayesian Networks 127
4.8.1 Illustrative Examples 128
4.9 Problems 131
References 138
5 Probability Distributions in Reliability Analysis 141
5.1 Introduction 141
5.1.1 State Variable 142
5.1.2 Time-to-Failure 142
5.2 A Dataset 143
5.2.1 Relative Frequency Distribution 143
5.2.2 Empirical Distribution and Survivor Function 144
5.3 General Characteristics of Time-to-Failure Distributions 145
5.3.1 Survivor Function 147
5.3.2 Failure Rate Function 148
5.3.3 Conditional Survivor Function 153
5.3.4 Mean Time-to-Failure 154
5.3.5 Additional Probability Metrics 155
5.3.6 Mean Residual Lifetime 157
5.3.7 Mixture of Time-to-Failure Distributions 160
5.4 Some Time-to-Failure Distributions 161
5.4.1 The Exponential Distribution 161
5.4.2 The Gamma Distribution 168
5.4.3 The Weibull Distribution 173
5.4.4 The Normal Distribution 180
5.4.5 The Lognormal Distribution 183
5.4.6 Additional Time-to-Failure Distributions 188
5.5 Extreme Value Distributions 188
5.5.1 The Gumbel Distribution of the Smallest Extreme 190
5.5.2 The Gumbel Distribution of the Largest Extreme 191
5.5.3 The Weibull Distribution of the Smallest Extreme 191
5.6 Time-to-Failure Models With Covariates 193
5.6.1 Accelerated Failure Time Models 194
5.6.2 The Arrhenius Model 195
5.6.3 Proportional Hazards Models 198
5.7 Additional Continuous Distributions 198
5.7.1 The Uniform Distribution 198
5.7.2 The Beta Distribution 199
5.8 Discrete Distributions 200
5.8.1 Binomial Situation 200
5.8.2 The Binomial Distribution 201
5.8.3 The Geometric Distribution 201
5.8.4 The Negative Binomial Distribution 202
5.8.5 The Homogeneous Poisson Process 203
5.9 Classes of Time-to-Failure Distributions 205
5.9.1 IFR and DFR Distributions 206
5.9.2 IFRA and DFRA Distributions 208
5.9.3 NBU and NWU Distributions 208
5.9.4 NBUE and NWUE Distributions 209
5.9.5 Some Implications 209
5.10 Summary of Time-to-Failure Distributions 210
5.11 Problems 210
References 218
6 System Reliability Analysis 221
6.1 Introduction 221
6.1.1 Assumptions 222
6.2 System Reliability 222
6.2.1 Reliability of Series Structures 223
6.2.2 Reliability of Parallel Structures 224
6.2.3 Reliability of koon Structures 225
6.2.4 Pivotal Decomposition 226
6.2.5 Critical Component 227
6.3 Nonrepairable Systems 228
6.3.1 Nonrepairable Series Structures 228
6.3.2 Nonrepairable Parallel Structures 230
6.3.3 Nonrepairable 2oo3 Structures 234
6.3.4 A Brief Comparison 235
6.3.5 Nonrepairable koon Structures 236
6.4 Standby Redundancy 237
6.4.1 Passive Redundancy, Perfect Switching, No Repairs 238
6.4.2 Cold Standby, Imperfect Switch, No Repairs 240
6.4.3 Partly Loaded Redundancy, Imperfect Switch, No Repairs 241
6.5 Single Repairable Items 242
6.5.1 Availability 243
6.5.2 Average Availability with Perfect Repair 244
6.5.3 Availability of a Single Item with Constant Failure and Repair Rates 246
6.5.4 Operational Availability 247
6.5.5 Production Availability 248
6.5.6 Punctuality 249
6.5.7 Failure Rate of Repairable Items 249
6.6 Availability of Repairable Systems 252
6.6.1 The MUT and MDT of Repairable Systems 253
6.6.2 Computation Based on Minimal Cut Sets 258
6.6.3 Uptimes and Downtimes for Repairable Systems 260
6.7 Quantitative Fault Tree Analysis 262
6.7.1 Terminology and Symbols 263
6.7.2 Delimitations and Assumptions 263
6.7.3 Fault Trees with a Single AND-Gate 264
6.7.4 Fault Tree with a Single OR-Gate 265
6.7.5 The Upper Bound Approximation Formula for Q0(t) 265
6.7.6 The Inclusion-Exclusion Principle 267
6.7.7 ROCOF of a Minimal Cut Parallel Structure 271
6.7.8 Frequency of the TOP Event 271
6.7.9 Binary Decision Diagrams 273
6.8 Event Tree Analysis 275
6.9 Bayesian Networks 277
6.9.1 Influence and Cause 278
6.9.2 Independence Assumptions 278
6.9.3 Conditional Probability Table 279
6.9.4 Conditional Independence 280
6.9.5 Inference and Learning 282
6.9.6 BN and Fault Tree Analysis 282
6.10 Monte Carlo Simulation 284
6.10.1 Random Number Generation 285
6.10.2 Monte Carlo Next Event Simulation 287
6.10.3 Simulation of Multicomponent Systems 289
6.11 Problems 291
References 296
7 Reliability Importance Metrics 299
7.1 Introduction 299
7.1.1 Objectives of Reliability Importance Metrics 300
7.1.2 Reliability Importance Metrics Considered 300
7.1.3 Assumptions and Notation 301
7.2 Critical Components 302
7.3 Birnbaum’s Metric for Structural Importance 304
7.4 Birnbaum’s Metric of Reliability Importance 305
7.4.1 Birnbaum’s Metric in Fault Tree Analysis 307
7.4.2 A Second Definition of Birnbaum’s Metric 308
7.4.3 A Third Definition of Birnbaum’s Metric 310
7.4.4 Computation of Birnbaum’s Metric for Structural Importance 312
7.4.5 Variants of Birnbaum’s Metric 312
7.5 Improvement Potential 313
7.5.1 Relation to Birnbaum’s Metric 314
7.5.2 A Variant of the Improvement Potential 314
7.6 Criticality Importance 315
7.7 Fussell-Vesely’s Metric 317
7.7.1 Derivation of Formulas for Fussell-Vesely’s Metric 317
7.7.2 Relationship to Other Metrics for Importance 320
7.8 Differential Importance Metric 323
7.8.1 Option 1 323
7.8.2 Option 2 324
7.9 Importance Metrics for Safety Features 326
7.9.1 Risk Achievement Worth 327
7.9.2 Risk Reduction Worth 329
7.9.3 Relationship with the Improvement Potential 330
7.10 Barlow-Proschan’s Metric 331
7.11 Problems 333
References 335
8 Dependent Failures 337
8.1 Introduction 337
8.1.1 Dependent Events and Variables 337
8.1.2 Correlated Variables 338
8.2 Types of Dependence 340
8.3 Cascading Failures 340
8.3.1 Tight Coupling 342
8.4 Common-Cause Failures 342
8.4.1 Multiple Failures that Are Not a CCF 344
8.4.2 Causes of CCF 344
8.4.3 Defenses Against CCF 345
8.5 CCF Models and Analysis 346
8.5.1 Explicit Modeling 347
8.5.2 Implicit Modeling 348
8.5.3 Modeling Approach 348
8.5.4 Model Assumptions 349
8.6 Basic Parameter Model 349
8.6.1 Probability of a Specific Multiplicity 350
8.6.2 Conditional Probability of a Specific Multiplicity 351
8.7 Beta-Factor Model 352
8.7.1 Relation to the BPM 354
8.7.2 Beta-Factor Model in System Analysis 354
8.7.3 Beta-Factor Model for Nonidentical Components 358
8.7.4 C-Factor Model 360
8.8 Multi-parameter Models 360
8.8.1 Binomial Failure Rate Model 360
8.8.2 Multiple Greek Letter Model 362
8.8.3 Alpha-Factor Model 364
8.8.4 Multiple Beta-Factor Model 365
8.9 Problems 366
References 368
9 Maintenance and Maintenance Strategies 371
9.1 Introduction 371
9.1.1 What is Maintenance? 372
9.2 Maintainability 372
9.3 Maintenance Categories 374
9.3.1 Completeness of a Repair Task 377
9.3.2 Condition Monitoring 377
9.4 Maintenance Downtime 378
9.4.1 Downtime Caused by Failures 379
9.4.2 Downtime of a Series Structure 381
9.4.3 Downtime of a Parallel Structure 381
9.4.4 Downtime of a General Structure 382
9.5 Reliability Centered Maintenance 382
9.5.1 What is RCM? 383
9.5.2 Main Steps of an RCM Analysis 384
9.6 Total Productive Maintenance 396
9.7 Problems 398
References 399
10 Counting Processes 401
10.1 Introduction 401
10.1.1 Counting Processes 401
10.1.2 Basic Concepts 406
10.1.3 Martingale Theory 408
10.1.4 Four Types of Counting Processes 409
10.2 Homogeneous Poisson Processes 410
10.2.1 Main Features of the HPP 411
10.2.2 Asymptotic Properties 412
10.2.3 Estimate and Confidence Interval 412
10.2.4 Sum and Decomposition of HPPs 413
10.2.5 Conditional Distribution of Failure Time 414
10.2.6 Compound HPPs 415
10.3 Renewal Processes 417
10.3.1 Basic Concepts 417
10.3.2 The Distribution of Sn 418
10.3.3 The Distribution of N(t) 420
10.3.4 The Renewal Function 421
10.3.5 The Renewal Density 423
10.3.6 Age and Remaining Lifetime 427
10.3.7 Bounds for the Renewal Function 431
10.3.8 Superimposed Renewal Processes 433
10.3.9 Renewal Reward Processes 434
10.3.10 Delayed Renewal Processes 436
10.3.11 Alternating Renewal Processes 438
10.4 Nonhomogeneous Poisson Processes 447
10.4.1 Introduction and Definitions 447
10.4.2 Some Results 449
10.4.3 Parametric NHPP Models 452
10.4.4 Statistical Tests of Trend 454
10.5 Imperfect Repair Processes 455
10.5.1 Brown and Proschan’s Model 456
10.5.2 Failure Rate Reduction Models 458
10.5.3 Age Reduction Models 461
10.5.4 Trend Renewal Process 462
10.6 Model Selection 464
10.7 Problems 466
References 470
11 Markov Analysis 473
11.1 Introduction 473
11.1.1 Markov Property 475
11.2 Markov Processes 476
11.2.1 Procedure to Establish the Transition Rate Matrix 479
11.2.2 Chapman-Kolmogorov Equations 482
11.2.3 Kolmogorov Differential Equations 483
11.2.4 State Equations 484
11.3 Asymptotic Solution 487
11.3.1 System Performance Metrics 492
11.4 Parallel and Series Structures 495
11.4.1 Parallel Structures of Independent Components 495
11.4.2 Series Structures of Independent Components 497
11.4.3 Series Structure of Components Where Failure of One Component Prevents Failure of the Other 499
11.5 Mean Time to First System Failure 501
11.5.1 Absorbing States 501
11.5.2 Survivor Function 504
11.5.3 Mean Time to the First System Failure 505
11.6 Systems with Dependent Components 507
11.6.1 Common Cause Failures 508
11.6.2 Load-Sharing Systems 510
11.7 Standby Systems 512
11.7.1 Parallel System with Cold Standby and Perfect Switching 513
11.7.2 Parallel System with Cold Standby and Perfect Switching (Item A is the Main Operating Item) 515
11.7.3 Parallel System with Cold Standby and Imperfect Switching (Item A is the Main Operating Item) 517
11.7.4 Parallel System with Partly Loaded Standby and Perfect Switching (Item A is the Main Operating Item) 518
11.8 Markov Analysis in Fault Tree Analysis 519
11.8.1 Cut Set Information 520
11.8.2 System Information 521
11.9 Time-Dependent Solution 521
11.9.1 Laplace Transforms 522
11.10 Semi-Markov Processes 524
11.11 Multiphase Markov Processes 526
11.11.1 Changing the Transition Rates 526
11.11.2 Changing the Initial State 527
11.12 Piecewise Deterministic Markov Processes 528
11.12.1 Definition of PDMP 529
11.12.2 State Probabilities 529
11.12.3 A Specific Case 530
11.13 Simulation of a Markov Process 532
11.14 Problems 536
References 543
12 Preventive Maintenance 545
12.1 Introduction 545
12.2 Terminology and Cost Function 546
12.3 Time-Based Preventive Maintenance 548
12.3.1 Age Replacement 549
12.3.2 Block Replacement 553
12.3.3 P-F Intervals 557
12.4 Degradation Models 564
12.4.1 Remaining Useful Lifetime 565
12.4.2 Trend Models; Regression-Based Models 567
12.4.3 Models with Increments 569
12.4.4 Shock Models 571
12.4.5 Stochastic Processes with Discrete States 573
12.4.6 Failure Rate Models 574
12.5 Condition-Based Maintenance 574
12.5.1 CBM Strategy 575
12.5.2 Continuous Monitoring and Finite Discrete State Space 576
12.5.3 Continuous Monitoring and Continuous State Space 581
12.5.4 Inspection-Based Monitoring and Finite Discrete State Space 583
12.5.5 Inspection-Based Monitoring and Continuous State Space 586
12.6 Maintenance of Multi-Item Systems 587
12.6.1 System Model 587
12.6.2 Maintenance Models 589
12.6.3 An Illustrative Example 591
12.7 Problems 595
References 601
13 Reliability of Safety Systems 605
13.1 Introduction 605
13.2 Safety-Instrumented Systems 606
13.2.1 Main SIS Functions 607
13.2.2 Testing of SIS Functions 608
13.2.3 Failure Classification 609
13.3 Probability of Failure on Demand 611
13.3.1 Probability of Failure on Demand 612
13.3.2 Approximation Formulas 617
13.3.3 Mean Downtime in a Test Interval 618
13.3.4 Mean Number of Test Intervals Until First Failure 619
13.3.5 Staggered Testing 620
13.3.6 Nonnegligible Repair Time 621
13.4 Safety Unavailability 622
13.4.1 Probability of Critical Situation 623
13.4.2 Spurious Trips 623
13.4.3 Failures Detected by Diagnostic Self-Testing 625
13.5 Common Cause Failures 627
13.5.1 Diagnostic Self-Testing and CCFs 629
13.6 CCFs Between Groups and Subsystems 631
13.6.1 CCFs Between Voted Groups 632
13.6.2 CCFs Between Subsystems 632
13.7 IEC 61508 632
13.7.1 Safety Lifecycle 633
13.7.2 Safety Integrity Level 634
13.7.3 Compliance with IEC 61508 635
13.8 The PDS Method 638
13.9 Markov Approach 639
13.9.1 All Failures Are Repaired After Each Test 643
13.9.2 All Critical Failures Are Repaired After Each Test 644
13.9.3 Imperfect Repair After Each Test 644
13.10 Problems 644
References 652
14 Reliability Data Analysis 655
14.1 Introduction 655
14.1.1 Purpose of the Chapter 656
14.2 Some Basic Concepts 656
14.2.1 Datasets 657
14.2.2 Survival Times 658
14.2.3 Categories of Censored Datasets 660
14.2.4 Field Data Collection Exercises 662
14.2.5 At-Risk-Set 663
14.3 Exploratory Data Analysis 663
14.3.1 A Complete Dataset 664
14.3.2 Sample Metrics 665
14.3.3 Histogram 669
14.3.4 Density Plot 670
14.3.5 Empirical Survivor Function 671
14.3.6 Q-Q Plot 673
14.4 Parameter Estimation 674
14.4.1 Estimators and Estimates 675
14.4.2 Properties of Estimators 675
14.4.3 Method of Moments Estimation 677
14.4.4 Maximum Likelihood Estimation 680
14.4.5 Exponentially Distributed Lifetimes 686
14.4.6 Weibull Distributed Lifetimes 692
14.5 The Kaplan-Meier Estimate 696
14.5.1 Motivation for the Kaplan-Meier Estimate Based on a Complete Dataset 696
14.5.2 The Kaplan-Meier Estimator for a Censored Dataset 697
14.6 Cumulative Failure Rate Plots 701
14.6.1 The Nelson-Aalen Estimate of the Cumulative Failure Rate 703
14.7 Total-Time-on-Test Plotting 708
14.7.1 Total-Time-on-Test Plot for Complete Datasets 708
14.7.2 Total-Time-on-Test Plot for Censored Datasets 721
14.7.3 A Brief Comparison 722
14.8 Survival Analysis with Covariates 723
14.8.1 Proportional Hazards Model 723
14.8.2 Cox Models 726
14.8.3 Estimating the Parameters of the Cox Model 727
14.9 Problems 730
References 736
15 Bayesian Reliability Analysis 739
15.1 Introduction 739
15.1.1 Three Interpretations of Probability 739
15.1.2 Bayes’ Formula 741
15.2 Bayesian Data Analysis 742
15.2.1 Frequentist Data Analysis 743
15.2.2 Bayesian Data Analysis 743
15.2.3 Model for Observed Data 745
15.2.4 Prior Distribution 745
15.2.5 Observed Data 746
15.2.6 Likelihood Function 746
15.2.7 Posterior Distribution 747
15.3 Selection of Prior Distribution 749
15.3.1 Binomial Model 749
15.3.2 Exponential Model - Single Observation 752
15.3.3 Exponential Model - Multiple Observations 753
15.3.4 Homogeneous Poisson Process 755
15.3.5 Noninformative Prior Distributions 757
15.4 Bayesian Estimation 758
15.4.1 Bayesian Point Estimation 758
15.4.2 Credible Intervals 760
15.5 Predictive Distribution 761
15.6 Models with Multiple Parameters 762
15.7 Bayesian Analysis with R 762
15.8 Problems 764
References 766
16 Reliability Data: Sources and Quality 767
16.1 Introduction 767
16.1.1 Categories of Input Data 767
16.1.2 Parameter Estimates 768
16.2 Generic Reliability Databases 769
16.2.1 OREDA 770
16.2.2 PDS Data Handbook 772
16.2.3 PERD 773
16.2.4 SERH 773
16.2.5 NPRD, EPRD, and FMD 773
16.2.6 GADS 774
16.2.7 GIDEP 774
16.2.8 FMEDA Approach 775
16.2.9 Failure Event Databases 775
16.3 Reliability Prediction 775
16.3.1 MIL-HDBK-217 Approach 776
16.3.2 Similar Methods 778
16.4 Common Cause Failure Data 778
16.4.1 ICDE 779
16.4.2 IEC 61508 Method 779
16.5 Data Analysis and Data Quality 780
16.5.1 Outdated Technology 780
16.5.2 Inventory Data 781
16.5.3 Constant Failure Rates 781
16.5.4 Multiple Samples 783
16.5.5 Data From Manufacturers 785
16.5.6 Questioning the Data Quality 785
16.6 Data Dossier 785
16.6.1 Final Remarks 785
References 787
Appendix A Acronyms 789
Appendix B Laplace Transforms 793
B.1 Important Properties of Laplace Transforms 794
B.2 Laplace Transforms of Some Selected Functions 794
Author Index 797
Subject Index 803