Data Analysis for the Geosciences. Essentials of Uncertainty, Comparison, and Visualization. Edition No. 1. AGU Advanced Textbooks

  • Book

  • 448 Pages
  • November 2023
  • John Wiley and Sons Ltd
  • ID: 5842750

An initial course in scientific data analysis and hypothesis testing designed for students in all science, technology, engineering, and mathematics disciplines

Data Analysis for the Geosciences: Essentials of Uncertainty, Comparison, and Visualization is a textbook for upper-level undergraduate STEM students, designed to serve as the statistics course in their degree program.

This volume provides a comprehensive introduction to data analysis, visualization, and data-model comparisons and metrics, within the framework of the uncertainty around the values. It offers a learning experience based on real data from the Earth, ocean, atmospheric, space, and planetary sciences.

About this volume:

  • Serves as an initial course in scientific data analysis and hypothesis testing
  • Focuses on the methods of data processing
  • Introduces a wide range of analysis techniques
  • Describes the many ways to compare data with models
  • Centers on applications rather than derivations
  • Explains how to select appropriate statistics for meaningful decisions
  • Explores the importance of the concept of uncertainty
  • Uses examples from real geoscience observations
  • Includes homework problems at the end of chapters

The American Geophysical Union promotes discovery in Earth and space science for the benefit of humanity. Its publications disseminate scientific knowledge and provide resources for researchers, students, and professionals.

Table of Contents

Preface xv

Acknowledgments xxi

About the Companion Website xxiii

1 Assessment and Uncertainty: Examples and Introductory Concepts 1

1.1 Chicken Little, Amateur Meteorologist 2

1.2 Uncertainty Ascribes Meaning to Values 3

1.3 Significant Figures 3

1.4 Types of Uncertainty 7

1.5 Example: Finding Saturn’s Moons 9

1.6 Comparing Two Numbers: Are They Measuring the Same Value? 11

1.6.1 Distributions of Number Sets 12

1.6.2 The Gaussian Distribution 13

1.6.3 Testing a Specific Value within a Data Set: The z Test 14

1.6.4 Comparing Two Values Revisited 18

1.7 Use and Misuse of Statistics 19

1.8 Example: Solar Wind Density and Space Weather 20

1.9 Uncertainty and the Scientific Method 22

1.10 Further Reading 24

1.11 Exercises in the Geosciences 26

2 Plotting Data: Visualizing Sets of Numbers 27

2.1 Plotting One-Dimensional Data 27

2.1.1 What Makes a Good Plot? 29

2.1.2 Exploratory Versus Explanatory Plot Styles 31

2.2 Example: Earth’s Magnetic Field Strength 33

2.3 Probability Distributions - The Histogram 35

2.4 Plotting Two Data Sets Against Each Other 39

2.4.1 Overlaid Histograms 39

2.4.2 The Scatterplot 40

2.4.3 The Box Plot 42

2.4.4 The Box-and-Whisker Scatterplot 43

2.4.5 The Running Average Plot 44

2.5 Example: Temperature and Carbon Dioxide 48

2.6 Scientific Visualization: A Sampling from the Literature 50

2.6.1 A Very Brief History of Visualization 51

2.6.2 Good Modern-Day Example Visualizations 53

2.7 Visualization Best Practices 58

2.7.1 Levels of Abstraction 58

2.7.2 A Process for a Good Graphic 61

2.7.3 Types of Colorblindness 63

2.7.4 Color Scales 63

2.8 Further Reading 65

2.9 Exercises in the Geosciences 67

3 Uncertainty Analysis: Techniques for Propagating Uncertainty 69

3.1 Propagating Uncertainty 69

3.1.1 Calculating Uncertainty with One Independent Variable 69

3.1.2 Calculating Uncertainty with Two Independent Variables 70

3.1.3 Calculating Uncertainty with Many Independent Variables 72

3.2 Example: Atmospheric Density 72

3.2.1 The Hydrostatic Equilibrium Approximation 72

3.2.2 One Independent Variable 73

3.2.3 Two Independent Variables 74

3.2.4 Many Independent Variables 74

3.3 Fractional and Percentage Uncertainties 75

3.4 Special Cases of Uncertainty Propagation 77

3.4.1 Addition and Subtraction 77

3.4.2 Multiplication and Division 78

3.4.2.1 Multiplication of Two Parameters 78

3.4.2.2 Uncertainty of Air Pressure 79

3.4.2.3 Division with Correlated Variables 80

3.4.2.4 Multiplication and Division with Independent Variables 81

3.4.3 Power Laws 82

3.4.4 Exponentials and Logarithms 82

3.4.4.1 Exponential Functions 83

3.4.4.2 Logarithmic Functions 84

3.4.5 Trigonometric Functions 84

3.5 Stepwise Uncertainty Propagation 85

3.6 Example: Planetary Equilibrium Temperature 87

3.7 Multistep Processing 90

3.8 Final Advice on Uncertainty Propagation 91

3.9 Further Reading 93

3.10 Exercises in the Geosciences 93

4 Centroids and Spreads: Analyzing a Set of Numbers 95

4.1 Quantitatively Describing a Data Set: The Centroid 95

4.1.1 Three Versions of Mean 96

4.1.2 More Centroids: Median and Mode 98

4.1.3 Histograms and the Arithmetic Mean 99

4.2 Quantitatively Describing a Data Set: Spread 100

4.2.1 Measures of Spread: Standard Deviation and Mean Absolute Difference 100

4.2.2 Another Measure of Spread: Quantiles 102

4.2.3 Spread Via Full Width at Half Maximum 106

4.2.4 Spread as an L-p Norm 107

4.2.5 Sample Versus Population 108

4.3 Random and Systematic Error of a Data Set 109

4.4 Which Centroid and Spread to Use and Other Tidbits of Advice 111

4.5 Standard Deviation of the Mean 112

4.6 Counting Statistics 113

4.7 Example: Galactic Cosmic Rays 116

4.8 Further Reading 119

4.9 Exercises in the Geosciences 120

5 Assessing Normality: Tests for Assessing the Gaussian Nature of a Distribution 123

5.1 Histogram Check 124

5.2 Comparing Centroid and Spread Measures 126

5.3 Skew 128

5.4 Kurtosis 130

5.5 The Chi-Squared Test 132

5.6 The Kolmogorov-Smirnov Test 137

5.7 Example: pH in a Lake 139

5.8 Asymmetric Uncertainties 142

5.9 Outliers - Tests for a Single Data Value 144

5.10 Combining Centroid and Spread: The Weighted Average 146

5.11 Example: pH in a Lake Redux 148

5.12 Further Reading 149

5.13 Exercises in the Geosciences 150

6 Correlating Two Data Sets: Analyzing Two Sets of Numbers Together 153

6.1 Comparing Two Number Sets 153

6.1.1 Chi-Squared and Kolmogorov-Smirnov Tests 154

6.1.2 The Student’s t Test 155

6.1.3 The Welch’s t Test 156

6.2 Linear Correlation 157

6.2.1 Covariance of Two Data Sets 158

6.2.2 Pearson Linear Correlation Coefficient 161

6.2.3 Spearman Rank-Order Correlation 163

6.2.4 Correlation with Logarithms 167

6.3 Example: Atmospheric Ozone and Temperature 168

6.4 Uncertainty of R 172

6.4.1 The Jackknife Method 172

6.4.2 The Bootstrap Method 173

6.4.3 Uncertainty of R for the Ozone-Temperature Example 175

6.5 Correlation and Causation 177

6.6 Further Reading 178

6.7 Exercises in the Geosciences 179

7 Curve Fitting: Fitting a Line between Two Sets of Numbers 181

7.1 Linear Regression 181

7.1.1 Obtaining A and B 181

7.1.2 Uncertainties on A and B 185

7.1.3 The Zero-Intercept Special Case 186

7.1.4 Weighted Linear Fitting 187

7.2 Testing a Linear Fit 188

7.3 Example: Human-Induced Seismicity 191

7.4 Nonlinear Fitting 194

7.4.1 Polynomial Fitting 194

7.4.2 Generalized “Linear Coefficient” Fitting 196

7.4.3 Exponential Fitting: Linearizing the Dependence on Coefficients 197

7.4.4 Piecewise Linear Fitting 198

7.4.5 Advice about Curve Fitting 199

7.5 Example: The Ozone Hole 200

7.6 Iterative Curve Fitting 203

7.6.1 One-Dimensional Iterative Curve Fitting 203

7.6.2 Multidimensional Iterative Curve Fitting 205

7.6.3 Gradient Descent Curve Fitting 208

7.7 Final Thoughts on Curve Fitting 209

7.8 Further Reading 210

7.9 Exercises in the Geosciences 210

8 Data-Model Comparison Basics: Philosophies of Calculating and Categorizing Metrics 213

8.1 Example Model: River Flow Rate 213

8.2 What Is a Model? 214

8.3 Visualizing Observed and Modeled Values Together 217

8.3.1 Scatterplots of Data and Model Values 217

8.3.2 The 2D Histogram Plot 219

8.3.3 Overlaid Histogram Plots 221

8.3.4 Cumulative Probability Distribution Plots 222

8.3.5 Quantile-Quantile Plots 224

8.4 Example: Total Solar Irradiance 226

8.5 A Diverse Zoo of Metrics 229

8.5.1 The Primary Categories of Metrics 230

8.5.2 Skill 231

8.5.3 Metrics Categories Based on Subsetting 234

8.6 The Concept of Model “Goodness of Fit” 235

8.7 Application Usability Levels 236

8.8 Designing a Meaningful Data-Model Comparison 237

8.9 Further Reading 239

8.10 Exercises in the Geosciences 240

9 Fit Performance Metrics: Data-Model Comparisons Based on Exact Observed and Modeled Values 243

9.1 What Is Fit Performance? 244

9.2 Running Example: Dst and the O’Brien Model 245

9.3 Accuracy 250

9.3.1 The Big Three of Accuracy: MSE, RMSE, and MAE 251

9.3.2 Neglecting Degrees of Freedom 253

9.3.3 Normalizing the Accuracy Measure 256

9.3.4 Percentage Accuracy Metrics 257

9.3.5 Choosing the Right Accuracy Metric 261

9.4 Bias 262

9.4.1 Mean Error 262

9.4.2 Percentage Bias 265

9.5 Precision 266

9.5.1 Modeling Yield 266

9.5.2 Definitions of Precision Using Standard Deviation 268

9.6 Association 268

9.6.1 Correlation Coefficient 269

9.6.2 Nonlinear Association Metrics 270

9.7 Extremes 272

9.7.1 Extremes of the Cumulative Probability Distribution 272

9.7.2 Using Skew and Kurtosis for an Extremes Assessment 276

9.8 Skill 278

9.8.1 Prediction Efficiency 278

9.8.2 Other Options for Fit Performance Skill 279

9.9 Discrimination 281

9.10 Reliability 283

9.11 Summarizing the Running Example 286

9.12 Summary of Fit Performance Metrics 287

9.13 Further Reading 291

9.14 Exercises in the Geosciences 292

10 Event Detection Metrics: Comparing Observed and Modeled Number Sets When Only Event Status Matters 295

10.1 Defining an Event 296

10.2 Contingency Tables 299

10.3 Data-Model Comparisons with Events 301

10.4 Running Example: Will It Rain? 303

10.5 Significance of a Contingency Table 307

10.6 Accuracy 310

10.7 Bias 311

10.8 Precision 313

10.9 Association 314

10.9.1 Odds Ratio 315

10.9.2 Odds Ratio Skill Score 316

10.9.3 Matthews Correlation Coefficient 317

10.10 Extremes 317

10.11 Skill 321

10.11.1 Heidke Skill Score 321

10.11.2 Peirce and Clayton Skill Scores 323

10.11.3 Gilbert Skill Score 324

10.12 Discrimination 325

10.13 Reliability 326

10.14 Summarizing the Running Example 327

10.15 Summary of Event Detection Metrics 328

10.16 Further Reading 330

10.17 Exercises in the Geosciences 331

11 Sliding Thresholds: Event Detection Metrics with a Variable Event Identification 333

11.1 Sliding the Event Identification Thresholds 334

11.2 Sweeping the Modeled Threshold 337

11.3 Sweeping the Data Threshold 340

11.4 Sweeping Both Thresholds Simultaneously 342

11.5 Metric-Versus-Metric Curves 344

11.5.1 ROC Curves 344

11.5.2 Alt-ROC Curves 346

11.5.3 STONE Curves 347

11.6 Application of Sliding Thresholds to the Geophysical Running Examples 349

11.6.1 Event Definitions for the Running Examples 349

11.6.2 Metric-Versus-Modeled Threshold Curves for the Running Examples 352

11.6.3 Metric-Versus-Observed Threshold Curves for the Running Examples 355

11.6.4 Metric-Versus-Simultaneous Threshold Sweep Curves for the Running Examples 357

11.6.5 Metric-Versus-Metric Analysis for the Running Examples 359

11.7 The Power of Sliding Thresholds 362

11.8 Further Reading 364

11.9 Exercises in the Geosciences 365

12 Applications of Metrics and Uncertainty: Final Advice and Introductions to Advanced Topics 367

12.1 Choosing the Right Set of Metrics 367

12.1.1 Metrics for Fit Performance Assessment on Gaussian Distributions 368

12.1.2 Metrics for Fit Performance Assessment on Non-Gaussian Distributions 369

12.1.3 Metrics for Event Detection Assessment 372

12.2 Combining Metrics for Robust Data-Model Comparisons 374

12.2.1 The Accuracy-Bias-Precision Trifecta 374

12.2.2 The Accuracy-Association Connection 376

12.2.3 The Association-Extremes Linkage 377

12.2.4 Expanding Our Understanding of Skill 378

12.2.5 Using Discrimination and Reliability Together 379

12.3 Uncertainty on Metrics 380

12.4 Uncertainty on Fit Performance Metrics for the Dst Running Example 381

12.5 A Recipe for Robust Comparisons 385

12.6 Metrics and Decision-Making 387

12.6.1 Choice Combination Statistics 388

12.6.2 Example: Spacecraft-Charging Model 390

12.7 Additional Advanced Topics 392

12.7.1 Periodicity Analysis 392

12.7.2 Time-Lagged Analysis 393

12.7.3 Additional Tests 394

12.7.4 Multidimensional Data Analysis 394

12.7.5 Multidimensional Data-Model Comparisons 396

12.7.6 Uncertainty Quantification 397

12.7.7 Design of Experiments 398

12.7.8 Geographical Information System (GIS) Analysis 398

12.7.9 Machine Learning 399

12.8 Uncertainty and the Scientist 400

12.9 Further Reading 402

12.10 Exercises in Geoscience 406

Index 407

Authors

Michael W. Liemohn, University of Michigan, USA.