Statistics: A Concise Mathematical Introduction for Students and Scientists offers a one-term text that prepares students to broaden their skills in statistics, probability, and inference before selecting follow-on courses in their chosen fields, whether engineering, computer science, programming, data science, business, or economics.
The book focuses early on continuous measurements as well as discrete random variables. By invoking simple, intuitive models and geometric probability, it discusses discrete and continuous experiments and probabilities in a natural way throughout. Classical probability, random variables, and inference are covered, along with material on understanding data and topics of special interest.
Topics discussed include:
• Classical equally likely outcomes
• Variety of models of discrete and continuous probability laws
• Likelihood function and ratio
• Inference
• Bayesian statistics
With the growing volume of data generated in many disciplines driving the growth of data science, companies now demand statistically literate scientists, and this textbook answers that demand. It is suited to undergraduates studying science or engineering, including computer science, economics, the life sciences, environmental science, and business, among many others. Basic knowledge of bivariate calculus, the R language, Mathematica, and JMP is useful; an accompanying website includes sample R and Mathematica code to help instructors and students.
Table of Contents
Preface
1 Data Analysis and Understanding
1.1 Exploring the Distribution of Data
1.1.1 Pearson’s Father-Son Height Data
1.1.2 Lord Rayleigh’s Data
1.1.3 Discussion
1.2 Exploring Prediction Using Data
1.2.1 Body and Brain Weights of Land Mammals
1.2.2 Space Shuttle Flight 25
1.2.3 Pearson’s Father-Son Height Data Revisited
1.2.4 Discussion
Problems
2 Classical Probability
2.1 Experiments with Equally Likely Outcomes
2.1.1 Simple Outcomes
2.1.2 Compound Events and Set Operations
2.2 Probability Laws
2.2.1 Union and Intersection of Events A and B
2.2.1.1 Case (i)
2.2.1.2 Cases (ii) and (iii)
2.2.1.3 Case (iv)
2.2.2 Conditional Probability
2.2.2.1 Definition of Conditional Probability
2.2.2.2 Conditional Probability With More Than Two Events
2.2.3 Independent Events
2.2.4 Bayes Theorem
2.2.5 Partitions and Total Probability
2.3 Counting Methods
2.3.1 With Replacement
2.3.2 Without Replacement (Permutations)
2.3.3 Without Replacement or Order (Combinations)
2.3.4 Examples
2.3.5 Extended Combinations (Multinomial)
2.4 Countable Sets: Implications as n → ∞
2.4.1 Selecting Even or Odd Integers
2.4.2 Selecting Rational Versus Irrational Numbers
2.5 Kolmogorov’s Axioms
2.6 Reliability: Series Versus Parallel Networks
2.6.1 Series Network
2.6.2 Parallel Network
Problems
3 Random Variables and Models Derived From Classical Probability and Postulates
3.1 Random Variables and Probability Distributions: Discrete Uniform Example
3.1.1 Toss of a Single Die
3.1.2 Toss of a Pair of Dice
3.2 The Univariate Probability Density Function: Continuous Uniform Example
3.2.1 Using the PDF to Compute Probabilities
3.2.2 Using the PDF to Compute Relative Odds
3.3 Summary Statistics: Central and Non-Central Moments
3.3.1 Expectation, Average, and Mean
3.3.2 Expectation as a Linear Operator
3.3.3 The Variance of a Random Variable
3.3.4 Standardized Random Variables
3.3.5 Higher Order Moments
3.3.6 Moment Generating Function
3.3.7 Measurement Scales and Units of Measurement
3.3.7.1 The Four Measurement Scales
3.3.7.2 Units of Measurement
3.4 Binomial Experiments
3.5 Waiting Time for a Success: Geometric PMF
3.6 Waiting Time for r Successes: Negative Binomial
3.7 Poisson Process and Distribution
3.7.1 Moments of the Poisson PMF
3.7.2 Examples
3.8 Waiting Time for Poisson Events: Negative Exponential PDF
3.9 The Normal Distribution (Also Known as the Gaussian Distribution)
3.9.1 Standard Normal Distribution
3.9.2 Sums of Independent Normal Random Variables
3.9.3 Normal Approximation to the Poisson Distribution
Problems
4 Bivariate Random Variables, Transformations, and Simulations
4.1 Bivariate Continuous Random Variables
4.1.1 Joint CDF and PDF Functions
4.1.2 Marginal PDF
4.1.3 Conditional Probability Density Function
4.1.4 Independence of Two Random Variables
4.1.5 Expectation, Correlation, and Regression
4.1.5.1 Covariance and Correlation
4.1.5.2 Regression Function
4.1.6 Independence of n Random Variables
4.1.7 Bivariate Normal PDF
4.1.8 Correlation, Independence, and Confounding Variables
4.2 Change of Variables
4.2.1 Examples: Two Uniform Transformations
4.2.2 One-Dimensional Transformations
4.2.2.1 Example 1: Negative exponential PDF
4.2.2.2 Example 2: Cauchy PDF
4.2.2.3 Example 3: Chi-squared PDF with one degree of freedom
4.2.3 Two-Dimensional Transformations
4.3 Simulations
4.3.1 Generating Uniform Pseudo-Random Numbers
4.3.1.1 Reproducibility
4.3.1.2 RANDU
4.3.2 Probability Integral Transformation
4.3.3 Event-driven Simulation
Problems
5 Approximations and Asymptotics
5.1 Why Do We Like Random Samples?
5.1.1 When u(X) Takes a Product Form
5.1.2 When u(X) Takes a Summation Form
5.2 Useful Inequalities
5.2.1 Markov’s Inequality
5.2.2 Chebyshev’s Inequality
5.2.3 Jensen’s Inequality
5.2.4 Cauchy-Schwarz Inequality
5.3 Sequences of Random Variables
5.3.1 Weak Law of Large Numbers
5.3.2 Consistency of the Sample Variance
5.3.3 Relationships Among the Modes of Convergence
5.3.3.1 Proof of Result (5.21)
5.3.3.2 Proof of Result (5.22)
5.4 Central Limit Theorem
5.4.1 Moment Generating Function for Sums
5.4.2 Standardizing the Sum Sₙ
5.4.3 Proof of Central Limit Theorem
5.5 Delta Method and Variance-stabilizing Transformations
Problems
6 Parameter Estimation
6.1 Desirable Properties of an Estimator
6.2 Moments of the Sample Mean and Variance
6.2.1 Theoretical Mean and Variance of the Sample Mean
6.2.2 Theoretical Mean of the Sample Variance
6.2.3 Theoretical Variance of the Sample Variance
6.3 Method of Moments (MoM)
6.4 Sufficient Statistics and Data Compression
6.5 Bayesian Parameter Estimation
6.6 Maximum Likelihood Parameter Estimation
6.6.1 Relationship to Bayesian Parameter Estimation
6.6.2 Poisson MLE Example
6.6.3 Normal MLE Example
6.6.4 Uniform MLE Example
6.7 Information Inequalities and the Cramér-Rao Lower Bound
6.7.1 Score Function
6.7.2 Asymptotics of the MLE
6.7.3 Minimum Variance of Unbiased Estimators
6.7.4 Examples
Problems
7 Hypothesis Testing
7.1 Setting up a Hypothesis Test
7.1.1 Example of a Critical Region
7.1.2 Accuracy and Errors in Hypothesis Testing
7.2 Best Critical Region for Simple Hypotheses
7.2.1 Simple Example Continued
7.2.2 Normal Shift Model with Common Variance
7.3 Best Critical Region for a Composite Alternative Hypothesis
7.3.1 Negative Exponential Composite Hypothesis Test
7.3.1.1 Example
7.3.1.2 Alternative Critical Regions
7.3.1.3 Mount St. Helens Example
7.3.2 Normal Shift Model with Common But Unknown Variance: The T-test
7.3.3 The Random Variable Tₙ₋₁
7.3.3.1 Where We Show X̄ and S² are Independent
7.3.3.2 Where We Show That S² Scaled is χ²(n − 1)
7.3.3.3 Where We Finally Derive the T PDF
7.3.4 The One-Sample T-test
7.3.5 Example
7.3.6 Other T-tests
7.3.6.1 Paired T-test
7.3.6.2 Two-Sample T-test
7.3.6.3 Example Two-Sample T-test: Lord Rayleigh’s Data
7.4 Reporting Results: p-values and Power
7.4.1 Example When the Null Hypothesis Is Rejected
7.4.2 When the Null Hypothesis is Not Rejected
7.4.3 The Power Function
7.5 Multiple Testing and the Bonferroni Correction
Problems
8 Confidence Intervals and Other Hypothesis Tests
8.1 Confidence Intervals
8.1.1 Confidence Interval for μ: Normal Data, σ² Known
8.1.2 Confidence Interval for μ: σ² Unknown
8.1.3 Confidence Intervals and p-values
8.2 Hypotheses About the Variance and the F-Distribution
8.2.1 The F-Distribution
8.2.2 Hypotheses About the Value of the Variance
8.2.3 Confidence Interval for the Variance
8.2.4 Two-Sided Alternative for Testing σ² = σ₀²
8.3 Pearson’s Chi-Squared Tests
8.3.1 The Multinomial PMF
8.3.2 Goodness-of-Fit (GoF) Tests
8.3.3 Two-Category Binomial Case
8.3.4 m-Category Multinomial Case
8.3.5 Goodness-of-Fit Test for a Parametric Model
8.3.6 Tests for Independence in Contingency Tables
8.4 Correlation Coefficient Tests and CIs
8.4.1 How to Test if the Correlation ρ = 0
8.4.2 Confidence Intervals and Tests for a General Correlation Coefficient
8.5 Linear Regression
8.5.1 Least Squares Regression
8.5.2 Distribution of the Least-Squares Parameters
8.5.3 A Confidence Interval for the Slope
8.5.4 A Two-Sided Hypothesis Test for the Slope
8.5.5 Predictions at a New Value
8.5.6 Population Interval at a New Value
8.6 Analysis of Variance
Problems
9 Topics in Statistics
9.1 MSE and Histogram Bin Width Selection
9.1.1 MSE Criterion for Biased Estimators
9.1.2 Case Study: Optimal Histogram Bin Widths
9.1.3 Examples with Normal Data
9.1.4 Normal Reference Rules for the Histogram Bin Width
9.1.4.1 Scott’s Rule
9.1.4.2 Freedman-Diaconis Rule
9.1.4.3 Sturges’ Rule
9.1.4.4 Comparison of the Three Rules
9.2 An Optimal Stopping Time Problem
9.3 Compound Random Variables
9.3.1 Computing Expectations with Conditioning
9.3.2 Sum of a Random Number of Random Variables
9.4 Simulation and the Bootstrap
9.5 Multiple Linear Regression
9.6 Experimental Design
9.7 Logistic Regression, Poisson Regression, and the Generalized Linear Model
9.8 Robustness
9.9 Conclusions
Appendices
A Notation Used in This Book
B Common Distributions
C Using R and Mathematica For This Text
C.1 R Language - The Very Basics
C.2 Mathematica - The Basics
Bibliography
Index