Introduces readers to the principles of managerial statistics and data science, with an emphasis on the statistical literacy of business students.
Through a statistical perspective, this book introduces readers to the topic of data science, including Big Data, data analytics, and data wrangling. Chapters include multiple examples showing the application of the theoretical aspects presented. It features practice problems designed to ensure that readers understand the concepts and can apply them using real data. Over 100 open data sets used for examples and problems come from regions throughout the world, allowing the instructor to adapt the application to local data with which students can identify. Applications with these data sets include:
- Assessing whether searches during police stops in San Diego depend on the driver’s race
- Visualizing the association between fat percentage and moisture percentage in Canadian cheese
- Modeling taxi fares in Chicago using data from millions of rides
- Analyzing mean sales per unit of legal marijuana products in Washington state
Topics covered in Principles of Managerial Statistics and Data Science include: data visualization; descriptive measures; probability; probability distributions; mathematical expectation; confidence intervals; and hypothesis testing. Analysis of variance, simple linear regression, and multiple linear regression are also included. In addition, the book covers contingency tables, Chi-square tests, non-parametric methods, and time series methods. The textbook:
- Includes academic material usually covered in introductory Statistics courses, but with a data science twist and less emphasis on theory
- Relies on Minitab to present how to perform tasks with a computer
- Presents and motivates the use of data that come from open portals
- Focuses on developing an intuition for how the procedures work
- Exposes readers to the potential of Big Data and to current failures in its use
- Offers supplementary material: a companion website that houses PowerPoint slides; an Instructor's Manual with tips, a model syllabus, and project ideas; R code to reproduce examples and case studies; and information about the open portal data
- Features an appendix with solutions to some practice problems
Principles of Managerial Statistics and Data Science is a textbook for undergraduate and graduate students taking managerial Statistics courses, and a reference book for working business professionals.
Table of Contents
Preface xv
Acknowledgments xvii
Acronyms xix
About the Companion Site xxi
Principles of Managerial Statistics and Data Science xxiii
1 Statistics Suck; So Why Do I Need to Learn About It? 1
1.1 Introduction 1
Practice Problems 4
1.2 Data-Based Decision Making: Some Applications 5
1.3 Statistics Defined 9
1.4 Use of Technology and the New Buzzwords: Data Science, Data Analytics, and Big Data 11
1.4.1 A Quick Look at Data Science: Some Definitions 11
Chapter Problems 14
Further Reading 14
2 Concepts in Statistics 15
2.1 Introduction 15
Practice Problems 17
2.2 Type of Data 19
Practice Problems 20
2.3 Four Important Notions in Statistics 22
Practice Problems 24
2.4 Sampling Methods 25
2.4.1 Probability Sampling 25
2.4.2 Nonprobability Sampling 27
Practice Problems 30
2.5 Data Management 31
2.5.1 A Quick Look at Data Science: Data Wrangling Baltimore Housing Variables 34
2.6 Proposing a Statistical Study 36
Chapter Problems 37
Further Reading 39
3 Data Visualization 41
3.1 Introduction 41
3.2 Visualization Methods for Categorical Variables 41
Practice Problems 46
3.3 Visualization Methods for Numerical Variables 50
Practice Problems 56
3.4 Visualizing Summaries of More than Two Variables Simultaneously 59
3.4.1 A Quick Look at Data Science: Does Race Affect the Chances of a Driver Being Searched During a Vehicle Stop in San Diego? 66
Practice Problems 69
3.5 Novel Data Visualization 75
3.5.1 A Quick Look at Data Science: Visualizing Association Between Baltimore Housing Variables Over 14 Years 78
Chapter Problems 81
Further Reading 96
4 Descriptive Statistics 97
4.1 Introduction 97
4.2 Measures of Centrality 99
Practice Problems 108
4.3 Measures of Dispersion 111
Practice Problems 115
4.4 Percentiles 116
4.4.1 Quartiles 117
Practice Problems 122
4.5 Measuring the Association Between Two Variables 124
Practice Problems 128
4.6 Sample Proportion and Other Numerical Statistics 130
4.6.1 A Quick Look at Data Science: Murder Rates in Los Angeles 131
4.7 How to Use Descriptive Statistics 132
Chapter Problems 133
Further Reading 139
5 Introduction to Probability 141
5.1 Introduction 141
5.2 Preliminaries 142
Practice Problems 144
5.3 The Probability of an Event 145
Practice Problems 148
5.4 Rules and Properties of Probabilities 149
Practice Problems 152
5.5 Conditional Probability and Independent Events 154
Practice Problems 159
5.6 Empirical Probabilities 161
5.6.1 A Quick Look at Data Science: Missing People Reports in Boston by Day of Week 164
Practice Problems 165
5.7 Counting Outcomes 168
Practice Problems 171
Chapter Problems 171
Further Reading 175
6 Discrete Random Variables 177
6.1 Introduction 177
6.2 General Properties 178
6.2.1 A Quick Look at Data Science: Number of Stroke Emergency Calls in Manhattan 183
Practice Problems 184
6.3 Properties of Expected Value and Variance 186
Practice Problems 189
6.4 Bernoulli and Binomial Random Variables 190
Practice Problems 197
6.5 Poisson Distribution 198
Practice Problems 201
6.6 Optional: Other Useful Probability Distributions 203
Chapter Problems 205
Further Reading 208
7 Continuous Random Variables 209
7.1 Introduction 209
Practice Problems 211
7.2 The Uniform Probability Distribution 211
Practice Problems 215
7.3 The Normal Distribution 216
Practice Problems 225
7.4 Probabilities for Any Normally Distributed Random Variable 227
7.4.1 A Quick Look at Data Science: Normal Distribution, A Good Match for University of Puerto Rico SATs? 229
Practice Problems 231
7.5 Approximating the Binomial Distribution 234
Practice Problems 236
7.6 Exponential Distribution 236
Practice Problems 238
Chapter Problems 239
Further Reading 242
8 Properties of Sample Statistics 243
8.1 Introduction 243
8.2 Expected Value and Standard Deviation of x̄ 244
Practice Problems 246
8.3 Sampling Distribution of x̄ When Sample Comes From a Normal Distribution 247
Practice Problems 251
8.4 Central Limit Theorem 252
8.4.1 A Quick Look at Data Science: Bacteria at New York City Beaches 257
Practice Problems 259
8.5 Other Properties of Estimators 261
Chapter Problems 264
Further Reading 267
9 Interval Estimation for One Population Parameter 269
9.1 Introduction 269
9.2 Intuition of a Two-Sided Confidence Interval 270
9.3 Confidence Interval for the Population Mean: 𝜎 Known 271
Practice Problems 276
9.4 Determining Sample Size for a Confidence Interval for 𝜇 278
Practice Problems 279
9.5 Confidence Interval for the Population Mean: 𝜎 Unknown 279
Practice Problems 284
9.6 Confidence Interval for 𝜋 286
Practice Problems 287
9.7 Determining Sample Size for 𝜋 Confidence Interval 288
Practice Problems 290
9.8 Optional: Confidence Interval for 𝜎 290
9.8.1 A Quick Look at Data Science: A Confidence Interval for the Standard Deviation of Walking Scores in Baltimore 292
Chapter Problems 293
Further Reading 296
10 Hypothesis Testing for One Population 297
10.1 Introduction 297
10.2 Basics of Hypothesis Testing 299
10.3 Steps to Perform a Hypothesis Test 304
Practice Problems 305
10.4 Inference on the Population Mean: Known Standard Deviation 306
Practice Problems 318
10.5 Hypothesis Testing for the Mean (𝜎 Unknown) 323
Practice Problems 327
10.6 Hypothesis Testing for the Population Proportion 329
10.6.1 A Quick Look at Data Science: Proportion of New York City High Schools with a Mean SAT Score of 1498 or More 333
Practice Problems 334
10.7 Hypothesis Testing for the Population Variance 337
10.8 More on the p-Value and Final Remarks 338
10.8.1 Misunderstanding the p-Value 339
Chapter Problems 343
Further Reading 347
11 Statistical Inference to Compare Parameters from Two Populations 349
11.1 Introduction 349
11.2 Inference on Two Population Means 350
11.3 Inference on Two Population Means - Independent Samples, Variances Known 351
Practice Problems 357
11.4 Inference on Two Population Means When Two Independent Samples are Used - Unknown Variances 360
11.4.1 A Quick Look at Data Science: Suicide Rates Among Asian Men and Women in New York City 364
Practice Problems 366
11.5 Inference on Two Means Using Two Dependent Samples 368
Practice Problems 370
11.6 Inference on Two Population Proportions 371
Practice Problems 374
Chapter Problems 375
References 378
Further Reading 378
12 Analysis of Variance (ANOVA) 379
12.1 Introduction 379
Practice Problems 382
12.2 ANOVA for One Factor 383
Practice Problems 390
12.3 Multiple Comparisons 391
Practice Problems 395
12.4 Diagnostics of ANOVA Assumptions 395
12.4.1 A Quick Look at Data Science: Emergency Response Time for Cardiac Arrest in New York City 399
Practice Problems 403
12.5 ANOVA with Two Factors 404
Practice Problems 409
12.6 Extensions to ANOVA 413
Chapter Problems 416
Further Reading 419
13 Simple Linear Regression 421
13.1 Introduction 421
13.2 Basics of Simple Linear Regression 423
Practice Problems 425
13.3 Fitting the Simple Linear Regression Parameters 426
Practice Problems 429
13.4 Inference for Simple Linear Regression 431
Practice Problems 440
13.5 Estimating and Predicting the Response Variable 443
Practice Problems 446
13.6 A Binary X 448
Practice Problems 449
13.7 Model Diagnostics (Residual Analysis) 450
Practice Problems 456
13.8 What Correlation Doesn’t Mean 458
13.8.1 A Quick Look at Data Science: Can Rate of College Educated People Help Predict the Rate of Narcotic Problems in Baltimore? 461
Chapter Problems 466
Further Reading 472
14 Multiple Linear Regression 473
14.1 Introduction 473
14.2 The Multiple Linear Regression Model 474
Practice Problems 477
14.3 Inference for Multiple Linear Regression 478
Practice Problems 483
14.4 Multicollinearity and Other Modeling Aspects 486
Practice Problems 490
14.5 Variability Around the Regression Line: Residuals and Intervals 492
Practice Problems 494
14.6 Modifying Predictors 494
Practice Problems 495
14.7 General Linear Model 496
Practice Problems 502
14.8 Steps to Fit a Multiple Linear Regression Model 505
14.9 Other Regression Topics 507
14.9.1 A Quick Look at Data Science: Modeling Taxi Fares in Chicago 510
Chapter Problems 513
Further Reading 517
15 Inference on Association of Categorical Variables 519
15.1 Introduction 519
15.2 Association Between Two Categorical Variables 520
15.2.1 A Quick Look at Data Science: Affordability and Business Environment in Chattanooga 525
Practice Problems 529
Chapter Problems 532
Further Reading 532
16 Nonparametric Testing 533
16.1 Introduction 533
16.2 Sign Tests and Wilcoxon Sign-Rank Tests: One Sample and Matched Pairs Scenarios 533
Practice Problems 537
16.3 Wilcoxon Rank-Sum Test: Two Independent Samples 539
16.3.1 A Quick Look at Data Science: Austin, Texas, as a Place to Live; Do Men Rate It Higher Than Women? 540
Practice Problems 543
16.4 Kruskal-Wallis Test: More Than Two Samples 544
Practice Problems 546
16.5 Nonparametric Tests Versus Their Parametric Counterparts 547
Chapter Problems 548
Further Reading 549
17 Forecasting 551
17.1 Introduction 551
17.2 Time Series Components 552
Practice Problems 557
17.3 Simple Forecasting Models 558
Practice Problems 562
17.4 Forecasting When Data Has Trend, Seasonality 563
Practice Problems 569
17.5 Assessing Forecasts 572
17.5.1 A Quick Look at Data Science: Forecasting Tourism Jobs in Canada 575
17.5.2 A Quick Look at Data Science: Forecasting Retail Gross Sales of Marijuana in Denver 577
Chapter Problems 580
Further Reading 581
Appendix A Math Notation and Symbols 583
A.1 Summation 583
A.2 pth Power 583
A.3 Inequalities 584
A.4 Factorials 584
A.5 Exponential Function 585
A.6 Greek and Statistics Symbols 585
Appendix B Standard Normal Cumulative Distribution Function 587
Appendix C t Distribution Critical Values 591
Appendix D Solutions to Odd-Numbered Problems 593
Index 643