An up-to-date exploration of foundational concepts in statistics and probability for medical students and researchers
Medical journals and researchers are increasingly recognizing the need for improved statistical rigor in medical science. In Applied Medical Statistics, renowned statistician and researcher Dr. Jingmei Jiang delivers a clear, coherent, and accessible introduction to basic statistical concepts, ideal for medical students and medical research practitioners. The book will help readers master foundational concepts in statistical analysis and assist in the development of a critical understanding of the basic rationale of statistical analysis techniques.
The distinguished author presents information without assuming the reader has a background in specialized mathematics, statistics, or probability. All of the described methods are illustrated with up-to-date examples based on real-world medical research, supplemented by exercises and case discussions to help solidify the concepts and give readers an opportunity to critically evaluate different research scenarios.
Readers will also benefit from the inclusion of:
- A thorough introduction to basic concepts in statistics, including foundational terms and definitions, location and spread of data distributions, population parameter estimation, and statistical hypothesis tests
- Explorations of commonly used statistical methods, including t-tests, analysis of variance, and linear regression
- Discussions of advanced analysis topics, including multiple linear regression and correlation, logistic regression, and survival analysis
- Substantive exercises and case discussions at the end of each chapter
Perfect for postgraduate medical students, clinicians, and medical and biomedical researchers, Applied Medical Statistics will also earn a place on the shelf of any researcher with an interest in biostatistics or applying statistical methods to their own field of research.
Table of Contents
Preface xiii
Acknowledgments xv
About the Companion Website xvii
1 What is Biostatistics 1
1.1 Overview 1
1.2 Some Statistical Terminology 2
1.2.1 Population and Sample 2
1.2.2 Homogeneity and Variation 3
1.2.3 Parameter and Statistic 4
1.2.4 Types of Data 4
1.2.5 Error 5
1.3 Workflow of Applied Statistics 6
1.4 Statistics and Its Related Disciplines 6
1.5 Statistical Thinking 7
1.6 Summary 7
1.7 Exercises 8
2 Descriptive Statistics 11
2.1 Frequency Tables and Graphs 12
2.1.1 Frequency Distribution of Numerical Data 12
2.1.2 Frequency Distribution of Categorical Data 16
2.2 Descriptive Statistics of Numerical Data 17
2.2.1 Measures of Central Tendency 17
2.2.2 Measures of Dispersion 26
2.3 Descriptive Statistics of Categorical Data 31
2.3.1 Relative Numbers 31
2.3.2 Standardization of Rates 34
2.4 Constructing Statistical Tables and Graphs 38
2.4.1 Statistical Tables 38
2.4.2 Statistical Graphs 40
2.5 Summary 47
2.6 Exercises 48
3 Fundamentals of Probability 53
3.1 Sample Space and Random Events 54
3.1.1 Definitions of Sample Space and Random Events 54
3.1.2 Operation of Events 55
3.2 Relative Frequency and Probability 58
3.2.1 Definition of Probability 59
3.2.2 Basic Properties of Probability 59
3.3 Conditional Probability and Independence of Events 60
3.3.1 Conditional Probability 60
3.3.2 Independence of Events 60
3.4 Multiplication Law of Probability 61
3.5 Addition Law of Probability 62
3.5.1 General Addition Law 62
3.5.2 Addition Law of Mutually Exclusive Events 62
3.6 Total Probability Formula and Bayes’ Rule 63
3.6.1 Total Probability Formula 63
3.6.2 Bayes’ Rule 64
3.7 Summary 65
3.8 Exercises 65
4 Discrete Random Variable 69
4.1 Concept of the Random Variable 69
4.2 Probability Distribution of the Discrete Random Variable 70
4.2.1 Probability Mass Function 70
4.2.2 Cumulative Distribution Function 71
4.2.3 Association Between the Probability Distribution and Relative Frequency Distribution 72
4.3 Numerical Characteristics 73
4.3.1 Expected Value 73
4.3.2 Variance and Standard Deviation 74
4.4 Commonly Used Discrete Probability Distributions 75
4.4.1 Binomial Distribution 75
4.4.2 Multinomial Distribution 80
4.4.3 Poisson Distribution 82
4.5 Summary 87
4.6 Exercises 87
5 Continuous Random Variable 91
5.1 Concept of Continuous Random Variable 92
5.2 Numerical Characteristics 93
5.3 Normal Distribution 94
5.3.1 Concept of the Normal Distribution 94
5.3.2 Standard Normal Distribution 96
5.3.3 Descriptive Methods for Assessing Normality 99
5.4 Application of the Normal Distribution 102
5.4.1 Normal Approximation to the Binomial Distribution 102
5.4.2 Normal Approximation to the Poisson Distribution 105
5.4.3 Determining the Medical Reference Interval 108
5.5 Summary 109
5.6 Exercises 110
6 Sampling Distribution and Parameter Estimation 113
6.1 Samples and Statistics 114
6.2 Sampling Distribution of a Statistic 114
6.2.1 Sampling Distribution of the Mean 115
6.2.2 Sampling Distribution of the Variance 120
6.2.3 Sampling Distribution of the Rate (Normal Approximation) 122
6.3 Estimation of One Population Parameter 124
6.3.1 Point Estimation and Its Quality Evaluation 124
6.3.2 Interval Estimation for the Mean 126
6.3.3 Interval Estimation for the Variance 130
6.3.4 Interval Estimation for the Rate (Normal Approximation Method) 131
6.4 Estimation of Two Population Parameters 132
6.4.1 Estimation of the Difference in Means 132
6.4.2 Estimation of the Ratio of Variances 136
6.4.3 Estimation of the Difference Between Rates (Normal Approximation Method) 139
6.5 Summary 141
6.6 Exercises 141
7 Hypothesis Testing for One Parameter 145
7.1 Overview 145
7.1.1 Concepts and Procedures 146
7.1.2 Type I and Type II Errors 150
7.1.3 One-sided and Two-sided Hypothesis 152
7.1.4 Association Between Hypothesis Testing and Interval Estimation 153
7.2 Hypothesis Testing for One Parameter 155
7.2.1 Hypothesis Tests for the Mean 155
7.2.1.1 Power of the Test 156
7.2.1.2 Sample Size Determination 160
7.2.2 Hypothesis Tests for the Rate (Normal Approximation Methods) 162
7.2.2.1 Power of the Test 163
7.2.2.2 Sample Size Determination 164
7.3 Further Considerations on Hypothesis Testing 164
7.3.1 About the Significance Level 164
7.3.2 Statistical Significance and Clinical Significance 165
7.4 Summary 165
7.5 Exercises 166
8 Hypothesis Testing for Two Population Parameters 169
8.1 Testing the Difference Between Two Population Means: Paired Samples 170
8.2 Testing the Difference Between Two Population Means: Independent Samples 173
8.2.1 t-Test for Means with Equal Variances 173
8.2.2 F-Test for the Equality of Two Variances 176
8.2.3 Approximation t-Test for Means with Unequal Variances 178
8.2.4 Z-Test for Means with Large-Sample Sizes 181
8.2.5 Power for Comparing Two Means 182
8.2.6 Sample Size Determination 183
8.3 Testing the Difference Between Two Population Rates (Normal Approximation Method) 185
8.3.1 Power for Comparing Two Rates 186
8.3.2 Sample Size Determination 187
8.4 Summary 188
8.5 Exercises 189
9 One-way Analysis of Variance 193
9.1 Overview 193
9.1.1 Concept of ANOVA 194
9.1.2 Data Layout and Modeling Assumption 195
9.2 Procedures of ANOVA 196
9.3 Multiple Comparisons of Means 204
9.3.1 Tukey’s Test 204
9.3.2 Dunnett’s Test 206
9.3.3 Least Significant Difference (LSD) Test 209
9.4 Checking ANOVA Assumptions 211
9.4.1 Check for Normality 211
9.4.2 Test for Homogeneity of Variances 213
9.4.2.1 Bartlett’s Test 213
9.4.2.2 Levene’s Test 215
9.5 Data Transformations 217
9.6 Summary 218
9.7 Exercises 218
10 Analysis of Variance in Different Experimental Designs 221
10.1 ANOVA for Randomized Block Design 221
10.1.1 Data Layout and Model Assumptions 223
10.1.2 Procedure of ANOVA 224
10.2 ANOVA for Two-factor Factorial Design 229
10.2.1 Concept of Factorial Design 230
10.2.2 Data Layout and Model Assumptions 233
10.2.3 Procedure of ANOVA 234
10.3 ANOVA for Repeated Measures Design 240
10.3.1 Characteristics of Repeated Measures Data 240
10.3.2 Data Layout and Model Assumptions 242
10.3.3 Procedure of ANOVA 243
10.3.4 Sphericity Test of Covariance Matrix 245
10.3.5 Multiple Comparisons of Means 248
10.4 ANOVA for 2 × 2 Crossover Design 251
10.4.1 Concept of a 2 × 2 Crossover Design 251
10.4.2 Data Layout and Model Assumptions 252
10.4.3 Procedure of ANOVA 254
10.5 Summary 256
10.6 Exercises 257
11 χ² Test 261
11.1 Contingency Table 262
11.1.1 General Form of Contingency Table 263
11.1.2 Independence of Two Categorical Variables 264
11.1.3 Significance Testing Using the Contingency Table 265
11.2 χ² Test for a 2 × 2 Contingency Table 266
11.2.1 Test of Independence 266
11.2.2 Yates’ Corrected χ² Test for a 2 × 2 Contingency Table 269
11.2.3 Paired Samples Design χ2 Test 269
11.2.4 Fisher’s Exact Tests for Completely Randomized Design 272
11.2.5 Exact McNemar’s Test for Paired Samples Design 275
11.3 χ² Test for R × C Contingency Tables 276
11.3.1 Comparison of Multiple Independent Proportions 276
11.3.2 Multiple Comparisons of Proportions 278
11.4 χ² Goodness-of-Fit Test 280
11.4.1 Normal Distribution Goodness-of-Fit Test 281
11.4.2 Poisson Distribution Goodness-of-Fit Test 283
11.5 Summary 284
11.6 Exercises 285
12 Nonparametric Tests Based on Rank 289
12.1 Concept of Order Statistics 289
12.2 Wilcoxon’s Signed-Rank Test for Paired Samples 290
12.3 Wilcoxon’s Rank-Sum Test for Two Independent Samples 295
12.4 Kruskal-Wallis Test for Multiple Independent Samples 299
12.4.1 Kruskal-Wallis Test 299
12.4.2 Multiple Comparisons 301
12.5 Friedman’s Test for Randomized Block Design 303
12.6 Further Considerations About Nonparametric Tests 306
12.7 Summary 306
12.8 Exercises 306
13 Simple Linear Regression 311
13.1 Concept of Simple Linear Regression 311
13.2 Establishment of Regression Model 314
13.2.1 Least Squares Estimation of a Regression Coefficient 314
13.2.2 Basic Properties of the Regression Model 316
13.2.3 Hypothesis Testing of Regression Model 317
13.3 Application of Regression Model 321
13.3.1 Confidence Interval Estimation of a Regression Coefficient 321
13.3.2 Confidence Band Estimation of Regression Model 322
13.3.3 Prediction Band Estimation of Individual Response Values 323
13.4 Evaluation of Model Fitting 325
13.4.1 Coefficient of Determination 325
13.4.2 Residual Analysis 326
13.5 Summary 327
13.6 Exercises 328
14 Simple Linear Correlation 331
14.1 Concept of Simple Linear Correlation 331
14.1.1 Definition of Correlation Coefficient 331
14.1.2 Interpretation of Correlation Coefficient 334
14.2 Hypothesis Testing of Correlation Coefficient 336
14.3 Confidence Interval Estimation for Correlation Coefficient 338
14.4 Spearman’s Rank Correlation 340
14.4.1 Concept of Spearman’s Rank Correlation Coefficient 340
14.4.2 Hypothesis Testing of Spearman’s Rank Correlation Coefficient 342
14.5 Summary 342
14.6 Exercises 343
15 Multiple Linear Regression 345
15.1 Multiple Linear Regression Model 346
15.1.1 Concept of the Multiple Linear Regression 346
15.1.2 Least Squares Estimation of Regression Coefficient 349
15.1.3 Properties of the Least Squares Estimators 351
15.1.4 Standardized Partial-Regression Coefficient 351
15.2 Hypothesis Testing 352
15.2.1 F-Test for Overall Regression Model 352
15.2.2 t-Test for Partial-Regression Coefficients 354
15.3 Evaluation of Model Fitting 356
15.3.1 Coefficient of Determination and Adjusted Coefficient of Determination 356
15.3.2 Residual Analysis and Outliers 357
15.4 Other Aspects of Regression 359
15.4.1 Multicollinearity 359
15.4.2 Selection of Independent Variables 361
15.4.3 Sample Size 364
15.5 Summary 364
15.6 Exercises 364
16 Logistic Regression 369
16.1 Logistic Regression Model 370
16.1.1 Linear Probability Model 371
16.1.2 Probability, Odds, and Logit Transformation 371
16.1.3 Definition of Logistic Regression 373
16.1.4 Inference for Logistic Regression 375
16.1.4.1 Estimation of Model Coefficient 375
16.1.4.2 Interpretation of Model Coefficient 378
16.1.4.3 Hypothesis Testing of Model Coefficient 380
16.1.4.4 Interval Estimation of Model Coefficient 382
16.1.5 Evaluation of Model Fitting 385
16.2 Conditional Logistic Regression Model 388
16.2.1 Characteristics of Conditional Logistic Regression Model 390
16.2.2 Estimation of Regression Coefficient 390
16.2.3 Hypothesis Testing of Regression Coefficient 393
16.3 Additional Remarks 394
16.3.1 Sample Size 394
16.3.2 Types of Independent Variables 394
16.3.3 Selection of Independent Variables 395
16.3.4 Missing Data 395
16.4 Summary 395
16.5 Exercises 396
17 Survival Analysis 399
17.1 Overview 400
17.1.1 Concept of Survival Analysis 400
17.1.2 Basic Functions of Survival Time 402
17.2 Description of the Survival Process 405
17.2.1 Product Limit Method 405
17.2.2 Life Table Method 408
17.3 Comparison of Survival Processes 410
17.3.1 Log-Rank Test 410
17.3.2 Other Methods for Comparing Survival Processes 413
17.4 Cox’s Proportional Hazards Model 414
17.4.1 Concept and Model Assumptions 415
17.4.2 Estimation of Model Coefficient 417
17.4.3 Hypothesis Testing of Model Coefficient 419
17.4.4 Evaluation of Model Fitting 420
17.5 Other Aspects of Cox’s Proportional Hazards Model 421
17.5.1 Hazard Index 421
17.5.2 Sample Size 421
17.6 Summary 422
17.7 Exercises 423
18 Evaluation of Diagnostic Tests 431
18.1 Basic Characteristics of Diagnostic Tests 431
18.1.1 Sensitivity and Specificity 433
18.1.2 Composite Measures of Sensitivity and Specificity 435
18.1.3 Predictive Values 438
18.1.4 Sensitivity and Specificity Comparison of Two Diagnostic Tests 440
18.2 Agreement Between Diagnostic Tests 443
18.2.1 Agreement of Categorical Data 444
18.2.2 Agreement of Numerical Data 447
18.3 Receiver Operating Characteristic Curve Analysis 448
18.3.1 Concept of an ROC Curve 449
18.3.2 Area Under the ROC Curve 450
18.3.3 Comparison of Areas Under ROC Curves 453
18.4 Summary 456
18.5 Exercises 457
19 Observational Study Design 461
19.1 Cross-Sectional Studies 462
19.1.1 Types of Cross-Sectional Studies 462
19.1.2 Probability Sampling Methods 462
19.1.3 Sample Size for Surveys 466
19.1.4 Cross-Sectional Studies for Clues of Etiology 468
19.2 Cohort Studies 469
19.2.1 Measures of Association in Cohort Studies 469
19.2.2 Sample Size for Cohort Studies 470
19.3 Case-Control Studies 472
19.3.1 Measures of Association in Case-Control Studies 472
19.3.2 Sample Size for Case-Control Studies 473
19.4 Summary 474
19.5 Exercises 475
20 Experimental Study Design 477
20.1 Overview 478
20.1.1 Basic Components of an Experimental Study 478
20.1.2 Principles of Experimental Study Design 480
20.1.3 Blinding Procedures in Clinical Trials 482
20.2 Completely Randomized Design 483
20.2.1 Concept of Completely Randomized Design 483
20.2.2 Sample Size for Completely Randomized Design 485
20.3 Randomized Block Design 486
20.3.1 Concepts of Randomized Block Design 486
20.3.2 Sample Size for Randomized Block Design 488
20.4 Factorial Design 489
20.5 Crossover Design 491
20.5.1 Concepts of Crossover Design 491
20.5.2 Sample Size for 2 × 2 Crossover Design 492
20.6 Summary 493
20.7 Exercises 493
Appendix 495
References 549
Index 557