MACHINE LEARNING FOR BUSINESS ANALYTICS
Machine learning, also known as data mining or data analytics, is a fundamental part of data science. Organizations in a wide variety of arenas use it to turn raw data into actionable information.
Machine Learning for Business Analytics: Concepts, Techniques, and Applications in R provides a comprehensive introduction to, and overview of, this methodology. This best-selling textbook covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, rule mining, recommendations, clustering, text mining, experimentation, and network analytics. Along with hands-on exercises and real-life case studies, it also discusses managerial and ethical issues in the responsible use of machine learning techniques.
This is the second R edition of Machine Learning for Business Analytics. This edition also includes:
- A new co-author, Peter Gedeck, who brings over 20 years of experience in machine learning using R
- An expanded chapter focused on discussion of deep learning techniques
- A new chapter on experimental feedback techniques including A/B testing, uplift modeling, and reinforcement learning
- A new chapter on responsible data science
- Updates and new material based on feedback from instructors teaching MBA, Master's in Business Analytics and related programs, undergraduate, diploma, and executive courses, and from their students
- A full chapter devoted to relevant case studies with more than a dozen cases demonstrating applications for the machine learning techniques
- End-of-chapter exercises that help readers gauge and expand their comprehension of, and competency with, the material presented
- A companion website with more than two dozen data sets, and instructor materials including exercise solutions, slides, and case solutions
This textbook is an ideal resource for upper-level undergraduate and graduate-level courses in data science, predictive analytics, and business analytics. It is also an excellent reference for analysts, researchers, and data science practitioners working with quantitative data in management, finance, marketing, operations management, information systems, computer science, and information technology.
Table of Contents
Foreword by Ravi Bapna xix
Foreword by Gareth James xxi
Preface to the Second R Edition xxiii
Acknowledgments xxvi
Part I Preliminaries
Chapter 1 Introduction 3
1.1 What Is Business Analytics? 3
1.2 What Is Machine Learning? 5
1.3 Machine Learning, AI, and Related Terms 5
1.4 Big Data 7
1.5 Data Science 8
1.6 Why Are There So Many Different Methods? 8
1.7 Terminology and Notation 9
1.8 Road Maps to This Book 11
Order of Topics 13
Chapter 2 Overview of the Machine Learning Process 17
2.1 Introduction 17
2.2 Core Ideas in Machine Learning 18
Classification 18
Prediction 18
Association Rules and Recommendation Systems 18
Predictive Analytics 19
Data Reduction and Dimension Reduction 19
Data Exploration and Visualization 19
Supervised and Unsupervised Learning 20
2.3 The Steps in a Machine Learning Project 21
2.4 Preliminary Steps 23
Organization of Data 23
Predicting Home Values in the West Roxbury Neighborhood 23
Loading and Looking at the Data in R 24
Sampling from a Database 26
Oversampling Rare Events in Classification Tasks 27
Preprocessing and Cleaning the Data 28
2.5 Predictive Power and Overfitting 35
Overfitting 36
Creating and Using Data Partitions 38
2.6 Building a Predictive Model 41
Modeling Process 41
2.7 Using R for Machine Learning on a Local Machine 46
2.8 Automating Machine Learning Solutions 47
Predicting Power Generator Failure 48
Uber’s Michelangelo 50
2.9 Ethical Practice in Machine Learning 52
Machine Learning Software: The State of the Market (by Herb Edelstein) 53
Problems 57
Part II Data Exploration and Dimension Reduction
Chapter 3 Data Visualization 63
3.1 Uses of Data Visualization 63
Base R or ggplot? 65
3.2 Data Examples 65
Example 1: Boston Housing Data 65
Example 2: Ridership on Amtrak Trains 67
3.3 Basic Charts: Bar Charts, Line Charts, and Scatter Plots 67
Distribution Plots: Boxplots and Histograms 70
Heatmaps: Visualizing Correlations and Missing Values 73
3.4 Multidimensional Visualization 75
Adding Variables: Color, Size, Shape, Multiple Panels, and Animation 76
Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering 79
Reference: Trend Lines and Labels 83
Scaling Up to Large Datasets 85
Multivariate Plot: Parallel Coordinates Plot 85
Interactive Visualization 88
3.5 Specialized Visualizations 91
Visualizing Networked Data 91
Visualizing Hierarchical Data: Treemaps 93
Visualizing Geographical Data: Map Charts 95
3.6 Major Visualizations and Operations, by Machine Learning Goal 97
Prediction 97
Classification 97
Time Series Forecasting 97
Unsupervised Learning 98
Problems 99
Chapter 4 Dimension Reduction 101
4.1 Introduction 101
4.2 Curse of Dimensionality 102
4.3 Practical Considerations 102
Example 1: House Prices in Boston 103
4.4 Data Summaries 103
Summary Statistics 104
Aggregation and Pivot Tables 104
4.5 Correlation Analysis 107
4.6 Reducing the Number of Categories in Categorical Variables 109
4.7 Converting a Categorical Variable to a Numerical Variable 111
4.8 Principal Component Analysis 111
Example 2: Breakfast Cereals 111
Principal Components 116
Normalizing the Data 117
Using Principal Components for Classification and Prediction 120
4.9 Dimension Reduction Using Regression Models 121
4.10 Dimension Reduction Using Classification and Regression Trees 121
Problems 123
Part III Performance Evaluation
Chapter 5 Evaluating Predictive Performance 129
5.1 Introduction 130
5.2 Evaluating Predictive Performance 130
Naive Benchmark: The Average 131
Prediction Accuracy Measures 131
Comparing Training and Holdout Performance 133
Cumulative Gains and Lift Charts 133
5.3 Judging Classifier Performance 136
Benchmark: The Naive Rule 136
Class Separation 136
The Confusion (Classification) Matrix 137
Using the Holdout Data 138
Accuracy Measures 139
Propensities and Threshold for Classification 139
Performance in Case of Unequal Importance of Classes 143
Asymmetric Misclassification Costs 146
Generalization to More Than Two Classes 149
5.4 Judging Ranking Performance 150
Cumulative Gains and Lift Charts for Binary Data 150
Decile-wise Lift Charts 153
Beyond Two Classes 154
Gains and Lift Charts Incorporating Costs and Benefits 154
Cumulative Gains as a Function of Threshold 155
5.5 Oversampling 156
Creating an Over-sampled Training Set 158
Evaluating Model Performance Using a Non-oversampled Holdout Set 159
Evaluating Model Performance If Only Oversampled Holdout Set Exists 159
Problems 162
Part IV Prediction and Classification Methods
Chapter 6 Multiple Linear Regression 167
6.1 Introduction 167
6.2 Explanatory vs. Predictive Modeling 168
6.3 Estimating the Regression Equation and Prediction 170
Example: Predicting the Price of Used Toyota Corolla Cars 171
Cross-validation and caret 175
6.4 Variable Selection in Linear Regression 176
Reducing the Number of Predictors 176
How to Reduce the Number of Predictors 178
Regularization (Shrinkage Models) 183
Problems 188
Chapter 7 k-Nearest Neighbors (k-NN) 193
7.1 The k-NN Classifier (Categorical Outcome) 193
Determining Neighbors 194
Classification Rule 194
Example: Riding Mowers 195
Choosing k 196
Weighted k-NN 199
Setting the Cutoff Value 200
k-NN with More Than Two Classes 201
Converting Categorical Variables to Binary Dummies 201
7.2 k-NN for a Numerical Outcome 201
7.3 Advantages and Shortcomings of k-NN Algorithms 204
Problems 205
Chapter 8 The Naive Bayes Classifier 207
8.1 Introduction 207
Threshold Probability Method 208
Conditional Probability 208
Example 1: Predicting Fraudulent Financial Reporting 208
8.2 Applying the Full (Exact) Bayesian Classifier 209
Using the “Assign to the Most Probable Class” Method 210
Using the Threshold Probability Method 210
Practical Difficulty with the Complete (Exact) Bayes Procedure 210
8.3 Solution: Naive Bayes 211
The Naive Bayes Assumption of Conditional Independence 212
Using the Threshold Probability Method 212
Example 2: Predicting Fraudulent Financial Reports, Two Predictors 213
Example 3: Predicting Delayed Flights 214
Working with Continuous Predictors 218
8.4 Advantages and Shortcomings of the Naive Bayes Classifier 220
Problems 223
Chapter 9 Classification and Regression Trees 225
9.1 Introduction 226
Tree Structure 227
Decision Rules 227
Classifying a New Record 227
9.2 Classification Trees 228
Recursive Partitioning 228
Example 1: Riding Mowers 228
Measures of Impurity 231
9.3 Evaluating the Performance of a Classification Tree 235
Example 2: Acceptance of Personal Loan 236
9.4 Avoiding Overfitting 239
Stopping Tree Growth 242
Pruning the Tree 243
Best-Pruned Tree 245
9.5 Classification Rules from Trees 247
9.6 Classification Trees for More Than Two Classes 248
9.7 Regression Trees 249
Prediction 250
Measuring Impurity 250
Evaluating Performance 250
9.8 Advantages and Weaknesses of a Tree 250
9.9 Improving Prediction: Random Forests and Boosted Trees 252
Random Forests 252
Boosted Trees 254
Problems 257
Chapter 10 Logistic Regression 261
10.1 Introduction 261
10.2 The Logistic Regression Model 263
10.3 Example: Acceptance of Personal Loan 264
Model with a Single Predictor 265
Estimating the Logistic Model from Data: Computing Parameter Estimates 267
Interpreting Results in Terms of Odds (for a Profiling Goal) 270
10.4 Evaluating Classification Performance 271
10.5 Variable Selection 273
10.6 Logistic Regression for Multi-Class Classification 274
Ordinal Classes 275
Nominal Classes 276
10.7 Example of Complete Analysis: Predicting Delayed Flights 277
Data Preprocessing 282
Model-Fitting and Estimation 282
Model Interpretation 282
Model Performance 284
Variable Selection 285
Problems 289
Chapter 11 Neural Nets 293
11.1 Introduction 293
11.2 Concept and Structure of a Neural Network 294
11.3 Fitting a Network to Data 295
Example 1: Tiny Dataset 295
Computing Output of Nodes 296
Preprocessing the Data 299
Training the Model 300
Example 2: Classifying Accident Severity 304
Avoiding Overfitting 305
Using the Output for Prediction and Classification 305
11.4 Required User Input 307
11.5 Exploring the Relationship Between Predictors and Outcome 308
11.6 Deep Learning 309
Convolutional Neural Networks (CNNs) 310
Local Feature Map 311
A Hierarchy of Features 311
The Learning Process 312
Unsupervised Learning 312
Example: Classification of Fashion Images 313
Conclusion 320
11.7 Advantages and Weaknesses of Neural Networks 320
Problems 322
Chapter 12 Discriminant Analysis 325
12.1 Introduction 325
Example 1: Riding Mowers 326
Example 2: Personal Loan Acceptance 327
12.2 Distance of a Record from a Class 327
12.3 Fisher’s Linear Classification Functions 329
12.4 Classification Performance of Discriminant Analysis 333
12.5 Prior Probabilities 334
12.6 Unequal Misclassification Costs 334
12.7 Classifying More Than Two Classes 336
Example 3: Medical Dispatch to Accident Scenes 336
12.8 Advantages and Weaknesses 339
Problems 341
Chapter 13 Generating, Comparing, and Combining Multiple Models 345
13.1 Ensembles 346
Why Ensembles Can Improve Predictive Power 346
Simple Averaging or Voting 348
Bagging 349
Boosting 349
Bagging and Boosting in R 349
Stacking 350
Advantages and Weaknesses of Ensembles 351
13.2 Automated Machine Learning (AutoML) 352
AutoML: Explore and Clean Data 352
AutoML: Determine Machine Learning Task 353
AutoML: Choose Features and Machine Learning Methods 354
AutoML: Evaluate Model Performance 354
AutoML: Model Deployment 356
Advantages and Weaknesses of Automated Machine Learning 357
13.3 Explaining Model Predictions 358
13.4 Summary 360
Problems 362
Part V Intervention and User Feedback
Chapter 14 Interventions: Experiments, Uplift Models, and Reinforcement Learning 367
14.1 A/B Testing 368
Example: Testing a New Feature in a Photo Sharing App 369
The Statistical Test for Comparing Two Groups (T-Test) 370
Multiple Treatment Groups: A/B/n Tests 372
Multiple A/B Tests and the Danger of Multiple Testing 372
14.2 Uplift (Persuasion) Modeling 373
Gathering the Data 374
A Simple Model 376
Modeling Individual Uplift 376
Computing Uplift with R 378
Using the Results of an Uplift Model 378
14.3 Reinforcement Learning 380
Explore-Exploit: Multi-armed Bandits 380
Example of Using a Contextual Multi-Arm Bandit for Movie Recommendations 382
Markov Decision Process (MDP) 383
14.4 Summary 388
Problems 390
Part VI Mining Relationships Among Records
Chapter 15 Association Rules and Collaborative Filtering 393
15.1 Association Rules 394
Discovering Association Rules in Transaction Databases 394
Example 1: Synthetic Data on Purchases of Phone Faceplates 394
Generating Candidate Rules 395
The Apriori Algorithm 397
Selecting Strong Rules 397
Data Format 399
The Process of Rule Selection 400
Interpreting the Results 401
Rules and Chance 403
Example 2: Rules for Similar Book Purchases 405
15.2 Collaborative Filtering 407
Data Type and Format 407
Example 3: Netflix Prize Contest 408
User-Based Collaborative Filtering: “People Like You” 409
Item-Based Collaborative Filtering 411
Evaluating Performance 412
Example 4: Predicting Movie Ratings with MovieLens Data 413
Advantages and Weaknesses of Collaborative Filtering 416
Collaborative Filtering vs. Association Rules 417
15.3 Summary 419
Problems 421
Chapter 16 Cluster Analysis 425
16.1 Introduction 426
Example: Public Utilities 427
16.2 Measuring Distance Between Two Records 429
Euclidean Distance 429
Normalizing Numerical Variables 430
Other Distance Measures for Numerical Data 432
Distance Measures for Categorical Data 433
Distance Measures for Mixed Data 434
16.3 Measuring Distance Between Two Clusters 434
Minimum Distance 434
Maximum Distance 435
Average Distance 435
Centroid Distance 435
16.4 Hierarchical (Agglomerative) Clustering 437
Single Linkage 437
Complete Linkage 438
Average Linkage 438
Centroid Linkage 438
Ward’s Method 438
Dendrograms: Displaying Clustering Process and Results 439
Validating Clusters 441
Limitations of Hierarchical Clustering 443
16.5 Non-Hierarchical Clustering: The k-Means Algorithm 444
Choosing the Number of Clusters (k) 445
Problems 450
Part VII Forecasting Time Series
Chapter 17 Handling Time Series 455
17.1 Introduction 455
17.2 Descriptive vs. Predictive Modeling 457
17.3 Popular Forecasting Methods in Business 457
Problems 466
Chapter 18 Regression-Based Forecasting 469
18.1 A Model with Trend 469
Linear Trend 469
Exponential Trend 473
Polynomial Trend 474
Problems 489
Chapter 19 Smoothing and Deep Learning Methods for Forecasting 499
19.1 Smoothing Methods: Introduction 500
19.2 Moving Average 500
Centered Moving Average for Visualization 500
Trailing Moving Average for Forecasting 501
Choosing Window Width (w) 504
Problems 516
Part VIII Data Analytics
Chapter 20 Social Network Analytics 527
20.1 Introduction 527
20.2 Directed vs. Undirected Networks 529
20.3 Visualizing and Analyzing Networks 530
Plot Layout 530
Edge List 533
Adjacency Matrix 533
Using Network Data in Classification and Prediction 534
Problems 548
Chapter 21 Text Mining 549
21.1 Introduction 549
21.2 The Tabular Representation of Text 550
21.3 Bag-of-Words vs. Meaning Extraction at Document Level 551
Problems 570
Chapter 22 Responsible Data Science 573
22.1 Introduction 573
22.2 Unintentional Harm 574
22.3 Legal Considerations 576
22.4 Principles of Responsible Data Science 577
Non-maleficence 578
Fairness 578
Transparency 579
Accountability 580
Data Privacy and Security 580
Problems 599
Part IX Cases
Chapter 23 Cases 603
23.1 Charles Book Club 603
The Book Industry 603
Database Marketing at Charles 604
Machine Learning Techniques 606
Assignment 608
23.2 German Credit 610
Background 610
Data 610
Assignment 614
Index 647