Standardizes the definition and framework of analytics
#2 on Book Authority’s list of the Best New Analytics Books to Read in 2019 (January 2019)
We live in a world of pervasive data and ubiquitous, powerful computation. This convergence has inspired and accelerated the development of both analytic techniques and tools, and this potential for analytics to have an impact has been a huge call to action for organizations, universities, and governments.
This title from the Institute for Operations Research and the Management Sciences (INFORMS) represents the perspectives of some of the most respected experts on analytics.
Readers with various backgrounds in analytics – from novices to experienced professionals – will benefit from reading about and implementing the concepts and methods covered here.
Peer-reviewed chapters provide readers with in-depth insights and a better understanding of the dynamic field of analytics
The INFORMS Analytics Body of Knowledge documents the core concepts and skills with which an analytics professional should be familiar; establishes a dynamic resource that practitioners can use to increase their understanding of analytics; and presents instructors with a framework for developing academic courses and programs in analytics.
Table of Contents
Preface xv
List of Contributors xix
1 Introduction to Analytics 1
Philip T. Keenan, Jonathan H. Owen, and Kathryn Schumacher
1.1 Introduction 1
1.2 Conceptual Framework 3
1.2.1 Data-Centric Analytics 3
1.2.2 Decision-Centric Analytics 4
1.2.3 Combining Data- and Decision-Centric Approaches 5
1.3 Categories of Analytics 6
1.3.1 Descriptive Analytics 7
Data Modeling 7
Reporting 10
Visualization 10
Software 10
1.3.2 Predictive Analytics 10
Data Mining and Pattern Recognition 11
Predictive Modeling, Simulation, and Forecasting 11
Leveraging Expertise 12
1.3.3 Prescriptive Analytics 14
1.4 Analytics Within Organizations 16
1.4.1 Projects 17
1.4.2 Communicating Analytics 21
1.4.3 Organizational Capability 21
1.5 Ethical Implications 23
1.6 The Changing World of Analytics 25
1.7 Conclusion 28
References 28
2 Getting Started with Analytics 31
Karl G. Kempf
2.1 Introduction 31
2.2 Five Manageable Tasks 32
2.2.1 Task 1: Selecting the Target Problem 33
2.2.2 Task 2: Assemble the Team 34
Executive Sponsor 35
Project Manager 35
Domain Expert 35
IT Expert 35
Data Scientist 36
Stakeholders 36
2.2.3 Task 3: Prepare the Data 36
2.2.4 Task 4: Selecting Analytics Tools 39
Analytical Specificity or Breadth 39
Access to Data 40
Execution Performance 40
Visualization Capability 40
Data Scientist Skillset 40
Vendor Pricing 41
Team Budget 41
Sharing and Collaboration 41
2.2.5 Task 5: Execute 42
2.3 Real Examples 43
Case 1: Sensor Data and High-Velocity Analytics to Save Operating Costs 43
Case 2: Social Media and High-Velocity Analytics for Quick Response to Customers 44
Case 3: Sensor Data and High-Velocity Analytics to Save Maintenance Costs 44
Case 4: Using Old Data and Analytics to Detect New Fraudulent Claims 45
Case 5: Using Old and New Data Plus Analytics to Decrease Crime 45
Case 6: Collecting the Data and Applying the Analytics Is the Business 45
References 46
Further Reading: Papers 47
Further Reading: Books 48
3 The Analytics Team 49
Thomas H. Davenport
3.1 Introduction 49
3.2 Skills Necessary for Analytics 50
3.2.1 More Advanced or Recent Analytical and Data Science Skills 51
3.2.2 The Larger Team 53
3.3 Managing Analytical Talent 57
3.3.1 Developing Talent 58
3.3.2 Working with the HR Organization 59
3.4 Organizing Analytics 61
3.4.1 Goals of a Particular Analytics Organization 62
3.4.2 Basic Models for Organizing Analytics 63
3.4.3 Coordination Approaches 65
Program Management Office 66
Federation 67
Community 67
Matrix 67
Rotation 67
Assigned Customers 67
What Model Fits Your Business? 68
3.4.4 Organizational Structures for Specific Analytics Strategies and Scenarios 70
3.4.5 Analytical Leadership and the Chief Analytics Officer 70
3.5 To Where Should Analytical Functions Report? 72
Information Technology 72
Strategy 72
Shared Services 72
Finance 73
Marketing or Other Specific Function 73
Product Development 73
3.5.1 Building an Analytical Ecosystem 73
3.5.2 Developing the Analytical Organization over Time 74
References 75
4 The Data 77
Brian T. Downs
4.1 Introduction 77
4.2 Data Collection 77
4.2.1 Data Types 77
4.2.2 Data Discovery 80
4.3 Data Preparation 86
4.4 Data Modeling 93
4.4.1 Relational Databases 93
4.4.2 Nonrelational Databases 95
4.5 Data Management 97
5 Solution Methodologies 99
Mary E. Helander
5.1 Introduction 99
5.1.1 What Exactly Do We Mean by “Solution,” “Problem,” and “Methodology?” 99
5.1.2 It’s All About the Problem 101
5.1.3 Solutions versus Products 101
5.1.4 How This Chapter Is Organized 103
5.1.5 The “Descriptive–Predictive–Prescriptive” Analytics Paradigm 105
5.1.6 The Goals of This Chapter 105
5.2 Macro-Solution Methodologies for the Analytics Practitioner 106
5.2.1 The Scientific Research Methodology 106
5.2.2 The Operations Research Project Methodology 109
5.2.3 The Cross-Industry Standard Process for Data Mining (CRISP-DM) Methodology 112
5.2.4 Software Engineering-Related Solution Methodologies 114
5.2.5 Summary of Macro-Methodologies 114
5.3 Micro-Solution Methodologies for the Analytics Practitioner 116
5.3.1 Micro-Solution Methodology Preliminaries 116
5.3.2 Micro-Solution Methodology Description Framework 117
5.3.3 Group I: Micro-Solution Methodologies for Exploration and Discovery 119
Group I: Problems of Interest 119
Group I: Relevant Models 119
Group I: Data Considerations 120
Group I: Solution Techniques 120
Group I: Relationship to Macro-Methodologies 126
Group I: Takeaways 126
5.3.4 Group II: Micro-Solution Methodologies Using Models Where Techniques to Find Solutions Are Independent of Data 127
Group II: Problems of Interest 127
Group II: Relevant Models 127
Group II: Data Considerations 128
Group II: Solution Techniques 128
Group II: Relationship to Macro-Methodologies 135
Group II: Takeaways 137
5.3.5 Group III: Micro-Solution Methodologies Using Models Where Techniques to Find Solutions Are Dependent on Data 137
Group III: Problems of Interest 137
Group III: Relevant Models 138
Group III: Data Considerations 138
Group III: Solution Techniques 139
Group III: Relationship to Macro-Methodologies 140
Group III: Takeaways 141
5.3.6 Micro-Methodology Summary 141
5.4 General Methodology-Related Considerations 142
5.4.1 Planning an Analytics Project 142
5.4.2 Software and Tool Selection 142
5.4.3 Visualization 143
5.4.4 Fields with Related Methodologies 144
5.5 Summary and Conclusions 144
5.5.1 “Ding Dong, the Scientific Method Is Dead!” 145
5.5.2 “Methodology Cramps My Analytics Style” 145
5.5.3 “There Is Only One Way to Solve This” 146
5.5.4 Perceived Success Is More Important Than the Right Answer 148
5.6 Acknowledgments 149
References 149
6 Modeling 155
Gerald G. Brown
6.1 Introduction 155
6.2 When Are Models Appropriate? 155
6.2.1 What Is the Problem with This System? 159
6.2.2 Is This Problem Important? 159
6.2.3 How Will This Problem Be Solved Without a New Model? 159
6.2.4 What Modeling Technique Will Be Used? 159
6.2.5 How Will We Know When We Have Succeeded? 160
Who Are the System Operator Stakeholders? 160
6.3 Types of Models 161
6.3.1 Descriptive Models 161
6.3.2 Predictive Models 161
6.3.3 Prescriptive Models 161
6.4 Models Can Also Be Characterized by Whether They Are Deterministic or Stochastic (Random) 161
6.5 Counting 162
6.6 Probability 163
6.7 Probability Perspectives and Subject Matter Experts 165
6.8 Subject Matter Experts 165
6.9 Statistics 166
6.9.1 A Random Sample 166
6.9.2 Descriptive Statistics 166
6.9.3 Parameter Estimation with a Confidence Interval 166
6.9.4 Regression 167
6.10 Inferential Statistics 169
6.11 A Stochastic Process 170
6.12 Digital Simulation 173
6.12.1 Static versus Dynamic Simulations 174
6.13 Mathematical Optimization 174
6.14 Measurement Units 175
6.15 Critical Path Method 176
6.16 Portfolio Optimization Case Study Solved By a Variety of Methods 178
6.16.1 Linear Program 178
6.16.2 Heuristic 179
6.16.3 Assessing Our Progress 179
6.16.4 Relaxations and Bounds 179
6.16.5 Are We Finished Yet? 180
6.17 Game Theory 181
6.18 Decision Theory 184
6.19 Susceptible, Exposed, Infected, Recovered (SEIR) Epidemiology 187
6.20 Search Theory 189
6.21 Lanchester Models of Warfare 189
6.22 Hughes’ Salvo Model of Combat 192
6.23 Single-Use Models 193
6.24 The Principle of Optimality and Dynamic Programming 195
6.25 Stack-Based Enumeration 197
6.25.1 Data Structures 197
6.25.2 Discussion 199
6.25.3 Generating Permutations and Combinations 199
6.26 Traveling Salesman Problem: Another Case Study in Alternate Solution Methods 200
6.27 Model Documentation, Management, and Performance 206
6.27.1 Model Formulation 206
6.27.2 Choice of Implementation Language 207
6.27.3 Supervised versus Automated Models 207
6.27.4 Model Fidelity 208
6.27.5 Sensitivity Analysis 210
6.27.6 With Different Methods 211
6.27.7 With Different Variables 212
6.27.8 Stability 213
6.27.9 Reliability 213
6.27.10 Scalability 213
6.27.11 Extensibility 214
6.28 Rules for Data Use 215
6.28.1 Proprietary Data 215
6.28.2 Licensed Data 215
6.28.3 Personally Identifiable Information 216
6.28.4 Protected Critical Infrastructure Information Management System (PCIIMS) 216
6.28.5 Institutional Review Board (IRB) 216
6.28.6 Department of Defense and Department of Energy Classification 216
6.28.7 Law Enforcement Data 216
6.28.8 Copyright and Trademark 216
6.28.9 Paraphrased and Plagiarized 217
6.28.10 Displays of Model Outputs 217
6.28.11 Data Integrity 217
6.28.12 Multiple Data Evolutions 217
6.29 Data Interpolation and Extrapolation 217
6.30 Model Verification and Validation 218
6.30.1 Verifying 219
6.30.2 Validating 219
6.30.3 Comparing Models 219
6.30.4 Sample Data 220
6.30.5 Data Diagnostics 220
6.30.6 Data Vintage and Provenance 220
6.31 Communicate with Stakeholders 220
6.31.1 Training 221
6.31.2 Report Writers 221
6.31.3 Standard Form Model Statement 222
6.31.4 Persistence and Monotonicity: Examples of Realistic Model Restrictions 223
6.31.5 Model Solutions Require a Lot of Polish and Refinement Before They Can Directly Influence Policy 224
6.31.6 Model Obsolescence and Model-Advised Thumb Rules 226
6.32 Software 227
6.33 Where to Go from Here 228
6.34 Acknowledgments 228
References 229
7 Machine Learning 231
Samuel H. Huddleston and Gerald G. Brown
7.1 Introduction 231
7.2 Supervised, Unsupervised, and Reinforcement Learning 232
7.3 Model Development, Selection, and Deployment for Supervised Learning 235
7.3.1 Goals and Guiding Principles in Machine Learning 235
7.3.2 Algorithmic Modeling Overview 236
7.3.3 Data Acquisition and Cleaning 236
7.3.4 Feature Engineering 237
7.3.5 Modeling Overview 238
7.3.6 Model Fitting (Training) and Feature Selection 240
7.3.7 Model (Algorithm) Selection 241
7.3.8 Model Performance Assessment 242
7.3.9 Model Implementation 242
7.4 Model Fitting, Model Error, and the Bias-Variance Trade-Off 243
7.4.1 Components of (Regression) Model Error 243
7.4.2 Model Fitting: Balancing Bias and Variance 245
7.5 Predictive Performance Evaluation 247
7.5.1 Regression Performance Evaluation 248
7.5.2 Classification Performance Evaluation 249
7.5.3 Performance Evaluation for Time-Dependent Data 253
7.6 An Overview of Supervised Learning Algorithms 254
7.6.1 k-Nearest Neighbors (KNN) 255
7.6.2 Extensions to Regression 256
7.6.3 Classification and Regression Trees 257
7.6.4 Time Series Forecasting 259
7.6.5 Support Vector Machines 261
7.6.6 Artificial Neural Networks 262
7.6.7 Ensemble Methods 265
7.7 Unsupervised Learning Algorithms 267
7.7.1 Kernel Density Estimation 267
7.7.2 Association Rule Mining 268
7.7.3 Clustering Methods 269
7.7.4 Principal Components Analysis (PCA) 270
7.7.5 Bag-of-Words and Vector Space Models 271
7.8 Conclusion 272
7.9 Acknowledgments 272
References 273
8 Deployment and Life Cycle Management 275
Arnie Greenland
8.1 Introduction 275
8.2 The Analytics Methodology: Understanding the Critical Steps in Deployment and Life Cycle Management 276
8.2.1 CRISP-DM Phase 1: Business Understanding 278
8.2.2 JTA Domain I, Task 1: Obtain or Receive Problem Statement and Usability 278
8.2.3 JTA Domain I, Task 2: Identify Stakeholders 279
8.2.4 JTA Domain I, Task 3: Determine if the Problem Is Amenable to an Analytics Solution 281
8.2.5 JTA Domain I, Task 4: Refine the Problem Statement and Delineate Constraints 281
8.2.6 JTA Domain I, Task 5: Define an Initial Set of Business Benefits 281
8.2.7 JTA Domain I, Task 6: Obtain Stakeholder Agreement on the Business Statement 282
8.2.8 JTA Domain II, Task 1: Reformulate the Problem Statement as an Analytics Problem 283
8.2.9 JTA Domain II, Task 2: Develop a Proposed Set of Drivers and Relationships to Outputs 285
8.2.10 JTA Domain II, Task 3: State the Set of Assumptions Related to the Problem 286
8.2.11 JTA Domain II, Task 4: Define the Key Metrics of Success 287
8.2.12 JTA Domain II, Task 5: Obtain Stakeholder Agreement 287
8.2.13 CRISP-DM Phases 2 and 3: Data Understanding and Data Preparation 288
8.2.14 JTA Domain III, Task 1: Identify and Prioritize Data Needs and Sources 290
8.2.15 JTA Domain III, Task 2: Acquire Data 290
8.2.16 JTA Domain III, Task 3: Harmonize, Rescale, Clean, and Share Data 291
8.2.17 JTA Domain III, Task 4: Identify Relationships in the Data 292
8.2.18 JTA Domain III, Task 5: Document and Report Findings 293
8.2.19 JTA Domain III, Task 6: Refine the Business and Analytics Problem Statements 293
8.2.20 CRISP-DM Phase 4: Modeling 293
8.2.21 CRISP-DM Phase 5: Evaluation 294
8.2.22 CRISP-DM Phase 6: Deployment 297
8.2.23 Deployment of the Analytics Model (Up to Delivery) 298
8.2.24 Post-deployment Activities (Domain VI: Model Life Cycle Management) 301
8.3 Overarching Issues of Life Cycle Management 303
8.3.1 Documentation 303
8.3.2 Communication 305
8.3.3 Testing 307
8.3.4 Metrics 308
9 The Blossoming Analytics Talent Pool: An Overview of the Analytics Ecosystem 311
Ramesh Sharda and Pankush Kalgotra
9.1 Introduction 311
9.2 Analytics Industry Ecosystem 312
9.2.1 Data Generation Infrastructure Providers 314
9.2.2 Data Management Infrastructure Providers 315
9.2.3 Data Warehouse Providers 316
9.2.4 Middleware Providers 316
9.2.5 Data Service Providers 316
9.2.6 Analytics-Focused Software Developers 317
Reporting/Descriptive Analytics 317
Predictive Analytics 318
Prescriptive Analytics 318
9.2.7 Application Developers: Industry-Specific or General 319
9.2.8 Analytics Industry Analysts and Influencers 321
9.2.9 Academic Institutions and Certification Agencies 322
9.2.10 Regulators and Policy Makers 323
9.2.11 Analytics User Organizations 323
9.3 Conclusions 325
References 326
Appendix: Writing and Teaching Analytics with Cases 327
James J. Cochran
Index 355