Makes mathematical and statistical analysis understandable to even the least math-minded biology student
This unique textbook aims to demystify statistical formulae for the average biology student. Written in a lively and engaging style, Statistics for Terrified Biologists, 2nd Edition draws on the author’s 30 years of lecturing experience to teach statistical methods to even the most wary of biology students. It presents basic methods in straightforward, jargon-free language. Students are taught to use simple formulae and to interpret what each test and statistic measures, while at the same time learning to recognize overall patterns and guiding principles. Complemented by simple examples and useful case studies, this is an ideal statistics resource for undergraduate biology and environmental science students who lack confidence in their mathematical abilities.
Statistics for Terrified Biologists presents readers with the foundations of parametric statistics (the t-test, analysis of variance, linear regression and chi-square) and guides them to important extensions of these techniques. It introduces them to non-parametric tests and includes a checklist of non-parametric methods linked to their parametric counterparts. The book also provides many end-of-chapter summaries and additional exercises to help readers understand and practise what they’ve learned.
- Presented in a clear and easy-to-understand style
- Makes statistics tangible and enjoyable for even the most hesitant student
- Features multiple formulas to facilitate comprehension
- Written by one of the foremost entomologists of his generation
This second edition of Statistics for Terrified Biologists is an invaluable guide that will be of great benefit to pre-health and biology undergraduate students.
Table of Contents
Preface to the second edition xv
Preface to the first edition xvii
1 How to use this book 1
Introduction 1
The text of the chapters 1
What should you do if you run into trouble? 2
Elephants 3
The numerical examples in the text 3
Boxes 4
Spare-time activities 4
Executive summaries 5
Why go to all that bother? 5
The bibliography 7
2 Introduction 9
What are statistics? 9
Notation 10
Notation for calculating the mean 12
3 Summarising variation 13
Introduction 13
Different summaries of variation 14
Range 14
Total deviation 14
Mean deviation 15
Variance 16
Why n−1? 17
Why are the deviations squared? 18
The standard deviation 19
The next chapter 21
Spare-time activities 21
4 When are sums of squares NOT sums of squares? 23
Introduction 23
Calculating machines offer a quicker method of calculating the sum of squares 24
Added squares 24
The correction factor 24
Avoid being confused by the term sum of squares 24
Summary of the calculator method for calculations as far as the standard deviation 25
Spare-time activities 26
5 The normal distribution 27
Introduction 27
Frequency distributions 27
The normal distribution 28
What percentage is a standard deviation worth? 30
Are the percentages always the same as these? 30
Other similar scales in everyday life 33
The standard deviation as an estimate of the frequency of a number occurring in a sample 33
From percentage to probability 34
Executive Summary 1 - The standard deviation 36
6 The relevance of the normal distribution to biological data 39
To recap 39
Is our observed distribution normal? 41
Checking for normality 42
What can we do about a distribution that clearly is not normal? 42
Transformation 42
Grouping samples 47
Doing nothing! 47
How many samples are needed? 47
Type 1 and Type 2 errors 48
Calculating how many samples are needed 49
7 Further calculations from the normal distribution 51
Introduction 51
Is A bigger than B? 52
The yardstick for deciding 52
The standard error of a difference between two means of three eggs 53
Derivation of the standard error of a difference between two means 53
Step 1: from variance of single data to variance of means 55
Step 2: From variance of single data to variance of differences 57
Step 3: The combination of Steps 1 and 2: the standard error of difference between means (s.e.d.m.) 58
Recap of the calculation of s.e.d.m. from the variance calculated from the individual values 61
The importance of the standard error of differences between means 61
Summary of this chapter 62
Executive Summary 2 - Standard error of a difference between two means 66
Spare-time activities 67
8 The t-test 69
Introduction 69
The principle of the t-test 70
The t-test in statistical terms 71
Why t? 71
Tables of the t-distribution 72
The standard t-test 75
The procedure 76
The actual t-test 81
t-test for means associated with unequal variances 81
The s.e.d.m. when variances are unequal 82
A worked example of the t-test for means associated with unequal variances 85
The paired t-test 87
Pair when possible 90
Executive Summary 3 - The t-test 92
Spare-time activities 94
9 One tail or two? 95
Introduction 95
Why is the analysis of variance F-test one-tailed? 95
The two-tailed F-test 96
How many tails has the t-test? 98
The final conclusion on number of tails 99
10 Analysis of variance (ANOVA): what is it? How does it work? 101
Introduction 101
Sums of squares in ANOVA 102
Some ‘made-up’ variation to analyse by ANOVA 102
The sum of squares table 104
Using ANOVA to sort out the variation in Table C 104
Phase 1 104
Phase 2 105
SqADS: an important acronym 107
Back to the sum of squares table 108
How well does the analysis reflect the input? 109
End phase 109
Degrees of freedom in ANOVA 110
The completion of the end phase 112
The variance ratio 113
The relationship between t and F 114
Constraints on ANOVA 115
Adequate size of experiment 115
Equality of variance between treatments 117
Testing the homogeneity of variance 117
The element of chance: randomisation 118
Comparison between treatment means in ANOVA 119
The least significant difference 121
A caveat about using the LSD 123
Executive Summary 4 - The principle of ANOVA 124
11 Experimental designs for analysis of variance (ANOVA) 129
Introduction 129
Fully randomised 130
Data for analysis of a fully randomised experiment 131
Prelims 132
Phase 1 132
Phase 2 133
End phase 133
Randomised blocks 135
Data for analysis of a randomised block experiment 137
Prelims 138
Phase 1 139
Phase 2 140
End phase 141
Incomplete blocks 142
Latin square 145
Data for the analysis of a Latin square 145
Prelims 146
Phase 1 150
Phase 2 150
End phase 151
Further comments on the Latin square design 152
Split plot 154
Types of analysis of variance 154
One- and two-way analysis of variance 155
Fixed-, random-, and mixed-effects analysis of variance 156
Executive Summary 5 - Analysis of a one-way randomised block experiment 158
Spare-time activities 159
12 Introduction to factorial experiments 163
What is a factorial experiment? 163
Interaction: what does it mean biologically? 165
If there is no interaction 167
What if there IS interaction? 167
How about a biological example? 168
Measuring any interaction between factors is often the main/only purpose of an experiment 170
How does a factorial experiment change the form of the analysis of variance? 171
Degrees of freedom for interactions 171
The similarity between the residual in Phase 2 and the interaction in Phase 3 172
Sums of squares for interactions 172
13 2-Factor factorial experiments 175
Introduction 175
An example of a 2-factor experiment 175
Analysis of the 2-factor experiment 176
Prelims 176
Phase 1 177
Phase 2 177
End phase (of Phase 2) 178
Phase 3 179
End phase (of Phase 3) 183
Two important things to remember about factorials before tackling the next chapter 185
Analysis of factorial experiments with unequal replication 185
Executive Summary 6 - Analysis of a 2-factor randomised block experiment 188
Spare-time activity 190
14 Factorial experiments with more than two factors - leave this out if you wish! 191
Introduction 191
Different ‘orders’ of interaction 191
Example of a 4-factor experiment 192
Prelims 194
Phase 1 196
Phase 2 196
Phase 3 197
To the end phase 205
Spare-time activity 214
15 Factorial experiments with split plots 217
Introduction 217
Deriving the split plot design from the randomised block design 218
Degrees of freedom in a split plot analysis 221
Main plots 221
Sub-plots 222
Numerical example of a split plot experiment and its analysis 224
Calculating the sums of squares 225
End phase 229
Comparison of split plot and randomised block experiments 229
Uses of split plot designs 233
Spare-time activity 235
16 The t-test in the analysis of variance 237
Introduction 237
Brief recap of relevant earlier sections of this book 238
Least significant difference test 239
Multiple range tests 240
Operating the multiple range test 242
Testing differences between means 246
My rules for testing differences between means 246
Presentation of the results of tests of differences between means 247
The results of the experiments analysed by analysis of variance in Chapters 11-15 249
Fully randomised design (p. 131) 250
Randomised block experiment (p. 137) 251
Latin square design (p. 146) 253
2-Factor experiment (p. 176) 255
4-Factor experiment (p. 195) 257
Split plot experiment (p. 224) 259
Some final advice 261
Spare-time activities 261
17 Linear regression and correlation 263
Introduction 263
Cause and effect 264
Other traps waiting for you to fall into 264
Extrapolating beyond the range of your data 264
Is a straight line appropriate? 265
The distribution of variability 268
Regression 268
Independent and dependent variables 272
The regression coefficient (b) 272
Calculating the regression coefficient (b) 275
The regression equation 281
A worked example on some real data 282
The data 282
Calculating the regression coefficient (b), i.e. the slope of the regression line 282
Calculating the intercept (a) 284
Drawing the regression line 285
Testing the significance of the slope (b) of the regression 286
How well do the points fit the line? The coefficient of determination (r²) 290
Correlation 291
Derivation of the correlation coefficient (r) 291
An example of correlation 292
Is there a correlation line? 293
Extensions of regression analysis 296
Nonlinear regression 297
Multiple linear regression 298
Multiple nonlinear regression 300
Executive Summary 7 - Linear regression 301
Spare-time activities 303
18 Analysis of covariance (ANCOVA) 305
Introduction 305
A worked example of ANCOVA 307
Data: cholesterol levels of subjects given different diets 307
Data: ages of subjects in experiment 308
Regression of cholesterol level on age 309
The structure of the ANCOVA table 312
Total sum of squares 313
Residual sum of squares 314
Corrected means 316
Test for significant difference between means 316
Executive Summary 8 - Analysis of covariance (ANCOVA) 319
Spare-time activity 320
19 Chi-square tests 323
Introduction 323
When not and where not to use χ² 324
The problem of low frequencies 325
Yates’ correction for continuity 325
The χ² test for goodness of fit 326
The case of more than two classes 328
χ² with heterogeneity 331
Heterogeneity χ² analysis with ‘covariance’ 333
Association (or contingency) χ² 335
2 × 2 contingency table 336
Fisher’s exact test for a 2 × 2 table 338
Larger contingency tables 340
Interpretation of contingency tables 341
Spare-time activities 343
20 Nonparametric methods (what are they?) 345
Disclaimer 345
Introduction 346
Advantages and disadvantages of parametric and nonparametric methods 347
Where nonparametric methods score 347
Where parametric methods score 349
Some ways data are organised for nonparametric tests 349
The sign test 350
The Kruskal-Wallis analysis of ranks 350
Kendall’s rank correlation coefficient 352
The main nonparametric methods that are available 353
Analysis of two replicated treatments as in the t-test (Chapter 8) 353
Analysis of more than two replicated treatments as in the analysis of variance (Chapter 11) 354
Correlation of two variables (Chapter 17) 354
Appendix A How many replicates? 355
Appendix B Statistical tables 365
Appendix C Solutions to spare-time activities 373
Appendix D Bibliography 393
Index 397