Introduces basic concepts in probability and statistics to data science students, engineers, and scientists.
Aimed at undergraduate- and graduate-level engineering and natural science students, this timely, fully updated edition of a popular book on statistics and probability shows how real-world problems can be solved using statistical concepts. It replaces the Excel exhibits with R throughout and updates the MINITAB and JMP instructions and content. A new chapter on data mining covers big data, classification, machine learning, and visualization. Another new chapter covers cluster analysis methodologies, including hierarchical, nonhierarchical, and model-based clustering. The book also includes a chapter on response surfaces that previously appeared on the book’s companion website.
Statistics and Probability with Applications for Engineers and Scientists using MINITAB, R and JMP, Second Edition is divided into two parts. Part I covers topics such as describing data graphically and numerically, elements of probability, discrete and continuous random variables and their probability distributions, distribution functions of random variables, sampling distributions, estimation of population parameters, and hypothesis testing. Part II covers elements of reliability theory, data mining, cluster analysis, analysis of categorical data, nonparametric tests, simple and multiple linear regression analysis, analysis of variance, factorial designs, response surfaces, and statistical quality control (SQC), including Phase I and Phase II control charts. The appendices contain statistical tables and charts and answers to selected problems.
- Features two new chapters: one on Data Mining and another on Cluster Analysis
- Now contains R exhibits including code, graphical display, and some results
- Updates MINITAB and JMP content to their latest versions
- Emphasizes the p-value approach and includes related practical interpretations
- Offers a more applied statistical focus, and features modified examples to better exhibit statistical concepts
- Supplemented with an instructor-only solutions manual on the book’s companion website
Statistics and Probability with Applications for Engineers and Scientists using MINITAB, R and JMP is an excellent text for graduate-level data science students, engineers, and scientists. It is also an ideal introduction to applied statistics and probability for undergraduate students in engineering and the natural sciences.
Table of Contents
Preface
Acknowledgments
About The Companion Site
1 Introduction
1.1 Designed Experiment
1.1.1 Motivation for the Study
1.1.2 Investigation
1.1.3 Changing Criteria
1.1.4 A Summary of the Various Phases of the Investigation
1.2 A Survey
1.3 An Observational Study
1.4 A Set of Historical Data
1.5 A Brief Description of What is Covered in this Book
Part I Fundamentals of Probability and Statistics
2 Describing Data Graphically and Numerically
2.1 Getting Started with Statistics
2.1.1 What is Statistics?
2.1.2 Population and Sample in a Statistical Study
2.2 Classification of Various Types of Data
2.2.1 Nominal Data
2.2.2 Ordinal Data
2.2.3 Interval Data
2.2.4 Ratio Data
2.3 Frequency Distribution Tables for Qualitative and Quantitative Data
2.3.1 Qualitative Data
2.3.2 Quantitative Data
2.4 Graphical Description of Qualitative and Quantitative Data
2.4.1 Dot Plot
2.4.2 Pie Chart
2.4.3 Bar Chart
2.4.4 Histograms
2.4.5 Line Graph
2.4.6 Stem-and-Leaf Plot
2.5 Numerical Measures of Quantitative Data
2.5.1 Measures of Centrality
2.5.2 Measures of Dispersion
2.6 Numerical Measures of Grouped Data
2.6.1 Mean of a Grouped Data
2.6.2 Median of a Grouped Data
2.6.3 Mode of a Grouped Data
2.6.4 Variance of a Grouped Data
2.7 Measures of Relative Position
2.7.1 Percentiles
2.7.2 Quartiles
2.7.3 Interquartile Range (IQR)
2.7.4 Coefficient of Variation
2.8 Box-Whisker Plot
2.8.1 Construction of a Box Plot
2.8.2 How to Use the Box Plot
2.9 Measures of Association
2.10 Case Studies
2.10.1 About St. Luke’s Hospital
2.11 Using JMP
Review Practice Problems
3 Elements of Probability
3.1 Introduction
3.2 Random Experiments, Sample Spaces, and Events
3.2.1 Random Experiments and Sample Spaces
3.2.2 Events
3.3 Concepts of Probability
3.4 Techniques of Counting Sample Points
3.4.1 Tree Diagram
3.4.2 Permutations
3.4.3 Combinations
3.4.4 Arrangements of n Objects Involving Several Kinds of Objects
3.5 Conditional Probability
3.6 Bayes’s Theorem
3.7 Introducing Random Variables
Review Practice Problems
4 Discrete Random Variables and Some Important Discrete Probability Distributions
4.1 Graphical Descriptions of Discrete Distributions
4.2 Mean and Variance of a Discrete Random Variable
4.2.1 Expected Value of Discrete Random Variables and Their Functions
4.2.2 The Moment-Generating Function-Expected Value of a Special Function of X
4.3 The Discrete Uniform Distribution
4.4 The Hypergeometric Distribution
4.5 The Bernoulli Distribution
4.6 The Binomial Distribution
4.7 The Multinomial Distribution
4.8 The Poisson Distribution
4.8.1 Definition and Properties of the Poisson Distribution
4.8.2 Poisson Process
4.8.3 Poisson Distribution as a Limiting Form of the Binomial
4.9 The Negative Binomial Distribution
4.10 Some Derivations and Proofs (Optional)
4.11 A Case Study
4.12 Using JMP
Review Practice Problems
5 Continuous Random Variables and Some Important Continuous Probability Distributions
5.1 Continuous Random Variables
5.2 Mean and Variance of Continuous Random Variables
5.2.1 Expected Value of Continuous Random Variables and Their Functions
5.2.2 The Moment-Generating Function and Expected Value of a Special Function of X
5.3 Chebyshev’s Inequality
5.4 The Uniform Distribution
5.4.1 Definition and Properties
5.4.2 Mean and Standard Deviation of the Uniform Distribution
5.5 The Normal Distribution
5.5.1 Definition and Properties
5.5.2 The Standard Normal Distribution
5.5.3 The Moment-Generating Function of the Normal Distribution
5.6 Distribution of Linear Combination of Independent Normal Variables
5.7 Approximation of the Binomial and Poisson Distributions by the Normal Distribution
5.7.1 Approximation of the Binomial Distribution by the Normal Distribution
5.7.2 Approximation of the Poisson Distribution by the Normal Distribution
5.8 A Test of Normality
5.9 Probability Models Commonly used in Reliability Theory
5.9.1 The Lognormal Distribution
5.9.2 The Exponential Distribution
5.9.3 The Gamma Distribution
5.9.4 The Weibull Distribution
5.10 A Case Study
5.11 Using JMP
Review Practice Problems
6 Distribution of Functions Of Random Variables
6.1 Introduction
6.2 Distribution Functions of Two Random Variables
6.2.1 Case of Two Discrete Random Variables
6.2.2 Case of Two Continuous Random Variables
6.2.3 The Mean Value and Variance of Functions of Two Random Variables
6.2.4 Conditional Distributions
6.2.5 Correlation between Two Random Variables
6.2.6 Bivariate Normal Distribution
6.3 Extension to Several Random Variables
6.4 The Moment-Generating Function Revisited
Review Practice Problems
7 Sampling Distributions
7.1 Random Sampling
7.1.1 Random Sampling from an Infinite Population
7.1.2 Random Sampling from a Finite Population
7.2 The Sampling Distribution of the Sample Mean
7.2.1 Normal Sampled Population
7.2.2 Nonnormal Sampled Population
7.2.3 The Central Limit Theorem
7.3 Sampling from a Normal Population
7.3.1 The Chi-Square Distribution
7.3.2 The Student t-Distribution
7.3.3 Snedecor’s F-Distribution
7.4 Order Statistics
7.4.1 Distribution of the Largest Element in a Sample
7.4.2 Distribution of the Smallest Element in a Sample
7.4.3 Distribution of the Median of a Sample and of the kth Order Statistic
7.4.4 Other Uses of Order Statistics
7.5 Using JMP
Review Practice Problems
8 Estimation of Population Parameters
8.1 Introduction
8.2 Point Estimators for the Population Mean and Variance
8.2.1 Properties of Point Estimators
8.2.2 Methods of Finding Point Estimators
8.3 Interval Estimators for the Mean μ of a Normal Population
8.3.1 σ² Known
8.3.2 σ² Unknown
8.3.3 Sample Size is Large
8.4 Interval Estimators for The Difference of Means of Two Normal Populations
8.4.1 Variances are Known
8.4.2 Variances are Unknown
8.5 Interval Estimators for the Variance of a Normal Population
8.6 Interval Estimator for the Ratio of Variances of Two Normal Populations
8.7 Point and Interval Estimators for the Parameters of Binomial Populations
8.7.1 One Binomial Population
8.7.2 Two Binomial Populations
8.8 Determination of Sample Size
8.8.1 One Population Mean
8.8.2 Difference of Two Population Means
8.8.3 One Population Proportion
8.8.4 Difference of Two Population Proportions
8.9 Some Supplemental Information
8.10 A Case Study
8.11 Using JMP
Review Practice Problems
9 Hypothesis Testing
9.1 Introduction
9.2 Basic Concepts of Testing a Statistical Hypothesis
9.2.1 Hypothesis Formulation
9.2.2 Risk Assessment
9.3 Tests Concerning the Mean of a Normal Population Having Known Variance
9.3.1 Case of a One-Tail (Left-Sided) Test
9.3.2 Case of a One-Tail (Right-Sided) Test
9.3.3 Case of a Two-Tail Test
9.4 Tests Concerning the Mean of a Normal Population Having Unknown Variance
9.4.1 Case of a Left-Tail Test
9.4.2 Case of a Right-Tail Test
9.4.3 The Two-Tail Case
9.5 Large Sample Theory
9.6 Tests Concerning the Difference of Means of Two Populations Having Distributions with Known Variances
9.6.1 The Left-Tail Test
9.6.2 The Right-Tail Test
9.6.3 The Two-Tail Test
9.7 Tests Concerning the Difference of Means of Two Populations Having Normal Distributions with Unknown Variances
9.7.1 Two Population Variances are Equal
9.7.2 Two Population Variances are Unequal
9.7.3 The Paired t-Test
9.8 Testing Population Proportions
9.8.1 Test Concerning One Population Proportion
9.8.2 Test Concerning the Difference Between Two Population Proportions
9.9 Tests Concerning the Variance of a Normal Population
9.10 Tests Concerning the Ratio of Variances of Two Normal Populations
9.11 Testing of Statistical Hypotheses using Confidence Intervals
9.12 Sequential Tests of Hypotheses
9.12.1 A One-Tail Sequential Testing Procedure
9.12.2 A Two-Tail Sequential Testing Procedure
9.13 Case Studies
9.14 Using JMP
Review Practice Problems
Part II Statistics in Actions
10 Elements of Reliability Theory
10.1 The Reliability Function
10.1.1 The Hazard Rate Function
10.1.2 Employing the Hazard Function
10.2 Estimation: Exponential Distribution
10.3 Hypothesis Testing: Exponential Distribution
10.4 Estimation: Weibull Distribution
10.5 Case Studies
10.6 Using JMP
Review Practice Problems
11 On Data Mining
11.1 Introduction
11.2 What is Data Mining?
11.2.1 Big Data
11.3 Data Reduction
11.4 Data Visualization
11.5 Data Preparation
11.5.1 Missing Data
11.5.2 Outlier Detection and Remedial Measures
11.6 Classification
11.6.1 Evaluating a Classification Model
11.7 Decision Trees
11.7.1 Classification and Regression Trees (CART)
11.7.2 Further Reading
11.8 Case Studies
11.9 Using JMP
Review Practice Problems
12 Cluster Analysis
12.1 Introduction
12.2 Similarity Measures
12.2.1 Common Similarity Coefficients
12.3 Hierarchical Clustering Methods
12.3.1 Single Linkage
12.3.2 Complete Linkage
12.3.3 Average Linkage
12.3.4 Ward’s Hierarchical Clustering
12.4 Nonhierarchical Clustering Methods
12.4.1 K-Means Method
12.5 Density-Based Clustering
12.6 Model-Based Clustering
12.7 A Case Study
12.8 Using JMP
Review Practice Problems
13 Analysis of Categorical Data
13.1 Introduction
13.2 The Chi-Square Goodness-of-Fit Test
13.3 Contingency Tables
13.3.1 The 2 × 2 Case with Known Parameters
13.3.2 The 2 × 2 Case with Unknown Parameters
13.3.3 The r × s Contingency Table
13.4 Chi-Square Test for Homogeneity
13.5 Comments on the Distribution of the Lack-of-Fit Statistics
13.6 Case Studies
13.7 Using JMP
Review Practice Problems
14 Nonparametric Tests
14.1 Introduction
14.2 The Sign Test
14.2.1 One-Sample Test
14.2.2 The Wilcoxon Signed-Rank Test
14.2.3 Two-Sample Test
14.3 Mann-Whitney (Wilcoxon) W Test for Two Samples
14.4 Runs Test
14.4.1 Runs above and below the Median
14.4.2 The Wald-Wolfowitz Run Test
14.5 Spearman Rank Correlation
14.6 Using JMP
Review Practice Problems
15 Simple Linear Regression Analysis
15.1 Introduction
15.2 Fitting the Simple Linear Regression Model
15.2.1 Simple Linear Regression Model
15.2.2 Fitting a Straight Line by Least Squares
15.2.3 Sampling Distribution of the Estimators of Regression Coefficients
15.3 Unbiased Estimator of σ²
15.4 Further Inferences Concerning Regression Coefficients (β0, β1), E(Y), and Y
15.4.1 Confidence Interval for β1 with Confidence Coefficient (1 - α)
15.4.2 Confidence Interval for β0 with Confidence Coefficient (1 - α)
15.4.3 Confidence Interval for E(Y | X) with Confidence Coefficient (1 - α)
15.4.4 Prediction Interval for a Future Observation Y with Confidence Coefficient (1 - α)
15.5 Tests of Hypotheses for β0 and β1
15.5.1 Test of Hypotheses for β1
15.5.2 Test of Hypotheses for β0
15.6 Analysis of Variance Approach to Simple Linear Regression Analysis
15.7 Residual Analysis
15.8 Transformations
15.9 Inference About ρ
15.10 A Case Study
15.11 Using JMP
Review Practice Problems
16 Multiple Linear Regression Analysis
16.1 Introduction
16.2 Multiple Linear Regression Models
16.3 Estimation of Regression Coefficients
16.3.1 Estimation of Regression Coefficients Using Matrix Notation
16.3.2 Properties of the Least-Squares Estimators
16.3.3 The Analysis of Variance Table
16.3.4 More Inferences about Regression Coefficients
16.4 Multiple Linear Regression Model Using Quantitative and Qualitative Predictor Variables
16.4.1 Single Qualitative Variable with Two Categories
16.4.2 Single Qualitative Variable with Three or More Categories
16.5 Standardized Regression Coefficients
16.5.1 Multicollinearity
16.5.2 Consequences of Multicollinearity
16.6 Building Regression Type Prediction Models
16.6.1 First Variable to Enter into the Model
16.7 Residual Analysis and Certain Criteria for Model Selection
16.7.1 Residual Analysis
16.7.2 Certain Criteria for Model Selection
16.8 Logistic Regression
16.9 Case Studies
16.10 Using JMP
Review Practice Problems
17 Analysis of Variance
17.1 Introduction
17.2 The Design Models
17.2.1 Estimable Parameters
17.2.2 Estimable Functions
17.3 One-Way Experimental Layouts
17.3.1 The Model and Its Analysis
17.3.2 Confidence Intervals for Treatment Means
17.3.3 Multiple Comparisons
17.3.4 Determination of Sample Size
17.3.5 The Kruskal-Wallis Test for One-Way Layouts (Nonparametric Method)
17.4 Randomized Complete Block (RCB) Designs
17.4.1 The Friedman Fr-Test for Randomized Complete Block Design (Nonparametric Method)
17.4.2 Experiments with One Missing Observation in an RCB-Design Experiment
17.4.3 Experiments with Several Missing Observations in an RCB-Design Experiment
17.5 Two-Way Experimental Layouts
17.5.1 Two-Way Experimental Layouts with One Observation per Cell
17.5.2 Two-Way Experimental Layouts with r > 1 Observations per Cell
17.5.3 Blocking in Two-Way Experimental Layouts
17.5.4 Extending Two-Way Experimental Designs to n-Way Experimental Layouts
17.6 Latin Square Designs
17.7 Random-Effects and Mixed-Effects Models
17.7.1 Random-Effects Model
17.7.2 Mixed-Effects Model
17.7.3 Nested (Hierarchical) Designs
17.8 A Case Study
17.9 Using JMP
Review Practice Problems
18 The 2^k Factorial Designs
18.1 Introduction
18.2 The Factorial Designs
18.3 The 2^k Factorial Designs
18.4 Unreplicated 2^k Factorial Designs
18.5 Blocking in the 2^k Factorial Design
18.5.1 Confounding in the 2^k Factorial Design
18.5.2 Yates’s Algorithm for the 2^k Factorial Designs
18.6 The 2^k Fractional Factorial Designs
18.6.1 One-half Replicate of a 2^k Factorial Design
18.6.2 One-quarter Replicate of a 2^k Factorial Design
18.7 Case Studies
18.8 Using JMP
Review Practice Problems
19 Response Surfaces
19.1 Introduction
19.1.1 Basic Concepts of Response Surface Methodology
19.2 First-Order Designs
19.3 Second-Order Designs
19.3.1 Central Composite Designs (CCDs)
19.3.2 Some Other First-Order and Second-Order Designs
19.4 Determination of Optimum or Near-Optimum Point
19.4.1 The Method of Steepest Ascent
19.4.2 Analysis of a Fitted Second-Order Response Surface
19.5 Anova Table for a Second-Order Model
19.6 Case Studies
19.7 Using JMP
Review Practice Problems
20 Statistical Quality Control - Phase I Control Charts
21 Statistical Quality Control - Phase II Control Charts
Appendices
Appendix A Statistical Tables
Appendix B Answers to Selected Problems
Appendix C Bibliography
Index