This book is intended to complement first-year statistics and biostatistics textbooks. The main focus here is on ideas rather than on methodological details. Basic concepts are illustrated with examples from history, followed by technical discussions of what different statistical methods really mean. Graphics are used extensively throughout the book to introduce mathematical formulae in an accessible way.
Key features:
- Discusses confidence intervals and p-values in terms of confidence functions.
- Explains basic statistical methodology through graphics rather than mathematical formulae, whilst still highlighting the mathematical basis of biostatistics.
- Examines the problem of estimating parameters in statistical models and the similarities between different models.
- Provides an extensive discussion of the position of statistics within the medical scientific process.
- Discusses distribution functions, including the Gaussian distribution and its importance in biostatistics.
This book will be useful for biostatisticians with little mathematical background, as well as for those who want to understand the connections between biostatistics and its underlying mathematics.
Table of Contents
Preface
1 Statistics and medical science
1.1 Introduction
1.2 On the nature of science
1.3 How the scientific method uses statistics
1.4 Finding an outcome variable to assess your hypothesis
1.5 How we draw medical conclusions from statistical results
1.6 A few words about probabilities
1.7 The need for honesty: the multiplicity issue
1.8 Prespecification and p-value history
1.9 Adaptive designs: controlling the risks in an experiment
1.10 The elusive concept of probability
1.11 Comments and further reading
References
2 Observational studies and the need for clinical trials
2.1 Introduction
2.2 Investigations of medical interventions and risk factors
2.3 Observational studies and confounders
2.4 The experimental study
2.5 Population risks and individual risks
2.6 Confounders, Simpson’s paradox and stratification
2.7 On incidence and prevalence in epidemiology
2.8 Comments and further reading
References
3 Study design and the bias issue
3.1 Introduction
3.2 What bias is all about
3.3 The need for a representative sample: on selection bias
3.4 Group comparability and randomization
3.5 Information bias in a cohort study
3.6 The study, or placebo, effect
3.7 The curse of missing values
3.8 Approaches to data analysis: avoiding self-inflicted bias
3.9 On meta-analysis and publication bias
3.10 Comments and further reading
References
4 The anatomy of a statistical test
4.1 Introduction
4.2 Statistical tests, medical diagnosis and Roman law
4.3 The risks with medical diagnosis
4.3.1 Medical diagnosis based on a single test
4.3.2 Bayes’ theorem and the use and misuse of screening tests
4.4 The law: a non-quantitative analogue
4.5 Risks in statistical testing
4.5.1 Does tonsillectomy increase the risk of Hodgkin’s lymphoma?
4.5.2 General discussion about statistical tests
4.6 Making statements about a binomial parameter
4.6.1 The frequentist approach
4.6.2 The Bayesian approach
4.7 The bell-shaped error distribution
4.8 Comments and further reading
References
4.A Appendix: The evolution of the central limit theorem
5 Learning about parameters, and some notes on planning
5.1 Introduction
5.2 Test statistics described by parameters
5.3 How we describe our knowledge about a parameter from an experiment
5.4 Statistical analysis of two proportions
5.4.1 Some ways to compare two proportions
5.4.2 Analysis of the group difference
5.5 Adjusting for confounders in the analysis
5.6 The power curve of an experiment
5.7 Some confusing aspects of power calculations
5.8 Comments and further reading
References
5.A Appendix: Some technical comments
5.A.1 The non-central hypergeometric distribution and 2 × 2 tables
5.A.2 The gamma and χ² distributions 147
6 Empirical distribution functions
6.1 Introduction
6.2 How to describe the distribution of a sample
6.3 Describing the sample: descriptive statistics
6.4 Population distribution parameters
6.5 Confidence in the CDF and its parameters
6.6 Analysis of paired data
6.7 Bootstrapping
6.8 Meta-analysis and heterogeneity
6.9 Comments and further reading
References
6.A Appendix: Some technical comments
6.A.1 The extended family of the univariate Gaussian distributions
6.A.2 The Wiener process and its bridge
6.A.3 Confidence regions for the CDF and the Kolmogorov–Smirnov test
7 Correlation and regression in bivariate distributions
7.1 Introduction
7.2 Bivariate distributions and correlation
7.3 On baseline corrections and other covariates
7.4 Bivariate Gaussian distributions
7.5 Regression to the mean
7.6 Statistical analysis of bivariate Gaussian data
7.7 Simultaneous analysis of two binomial proportions
7.8 Comments and further reading
References
7.A Appendix: Some technical comments
7.A.1 The regression to the mode equation
7.A.2 Analysis of data from the multivariate Gaussian distribution
7.A.3 On the geometric approach to univariate confidence limits
8 How to compare the outcome in two groups
8.1 Introduction
8.2 Simple models that compare two distributions
8.3 Comparison done the horizontal way
8.4 Analysis done the vertical way
8.5 Some ways to compute p-values
8.6 The discrete Wilcoxon test
8.7 The two-period crossover trial
8.8 Multivariate analysis and analysis of covariance
8.9 Comments and further reading
References
8.A Appendix: About U-statistics
9 Least squares, linear models and beyond
9.1 Introduction
9.2 The purpose of mathematical models
9.3 Different ways to do least squares
9.4 Logistic regression, with variations
9.5 The two-step modeling approach
9.6 The effect of missing covariates
9.7 The exponential family of distributions
9.8 Generalized linear models
9.9 Comments and further reading
References
10 Analysis of dose response
10.1 Introduction
10.2 Dose–response relationship
10.3 Relative dose potency and therapeutic ratio
10.4 Subject-specific and population averaged dose response
10.5 Estimation of the population averaged dose–response relationship
10.6 Estimating subject-specific dose responses
10.7 Comments and further reading
References
11 Hazards and censored data
11.1 Introduction
11.2 Censored observations: incomplete knowledge
11.3 Hazard models from a population perspective
11.4 The impact of competing risks
11.5 Heterogeneity in survival analysis
11.6 Recurrent events and frailty
11.7 The principles behind the analysis of censored data
11.8 The Kaplan–Meier estimator of the CDF
11.9 Comments and further reading
References
11.A Appendix: On the large-sample approximations of counting processes
12 From the log-rank test to the Cox proportional hazards model
12.1 Introduction
12.2 Comparing hazards between two groups
12.3 Nonparametric tests for hazards
12.4 Parameter estimation in hazard models
12.5 The accelerated failure time model
12.6 The Cox proportional hazards model
12.7 On omitted covariates and stratification in the log-rank test
12.8 Comments and further reading
References
12.A Appendix: Comments on interval-censored data
13 Remarks on some estimation methods
13.1 Introduction
13.2 Estimating equations and the robust variance estimate
13.3 From maximum likelihood theory to generalized estimating equations
13.4 The analysis of recurrent events
13.5 Defining and estimating mixed effects models
13.6 Comments and further reading
References
13.A Appendix: Formulas for first-order bias
Index