An up-to-date introduction to a market-leading platform for data analysis and machine learning
Machine Learning for Business Analytics: Concepts, Techniques, and Applications with JMP Pro®, 2nd ed. offers an accessible and engaging introduction to machine learning. It provides concrete examples and case studies to educate new users and deepen existing users’ understanding of their data and their business. Fully updated to incorporate new topics and instructional material, this remains the only comprehensive introduction to this crucial set of analytical tools specifically tailored to the needs of businesses.
Machine Learning for Business Analytics: Concepts, Techniques, and Applications with JMP Pro®, 2nd ed. is ideal for students and instructors of business analytics and data mining classes, as well as data science practitioners and professionals in data-driven industries.
Table of Contents
Foreword xix
Preface xx
Acknowledgments xxiii
Part I Preliminaries
1 Introduction 3
1.1 What Is Business Analytics? 3
1.2 What Is Machine Learning? 5
1.3 Machine Learning, AI, and Related Terms 5
Statistical Modeling vs. Machine Learning 6
1.4 Big Data 6
1.5 Data Science 7
1.6 Why Are There So Many Different Methods? 8
1.7 Terminology and Notation 8
1.8 Road Maps to This Book 10
Order of Topics 12
2 Overview of the Machine Learning Process 17
2.1 Introduction 17
2.2 Core Ideas in Machine Learning 18
Classification 18
Prediction 18
Association Rules and Recommendation Systems 18
Predictive Analytics 19
Data Reduction and Dimension Reduction 19
Data Exploration and Visualization 19
Supervised and Unsupervised Learning 19
2.3 The Steps in a Machine Learning Project 21
2.4 Preliminary Steps 22
Organization of Data 22
Sampling from a Database 22
Oversampling Rare Events in Classification Tasks 23
Preprocessing and Cleaning the Data 23
2.5 Predictive Power and Overfitting 29
Overfitting 29
Creation and Use of Data Partitions 31
2.6 Building a Predictive Model with JMP Pro 34
Predicting Home Values in a Boston Neighborhood 34
Modeling Process 36
2.7 Using JMP Pro for Machine Learning 42
2.8 Automating Machine Learning Solutions 43
Predicting Power Generator Failure 44
Uber’s Michelangelo 45
2.9 Ethical Practice in Machine Learning 47
Machine Learning Software: The State of the Market by Herb Edelstein 47
Problems 52
Part II Data Exploration and Dimension Reduction
3 Data Visualization 59
3.1 Introduction 59
3.2 Data Examples 61
Example 1: Boston Housing Data 61
Example 2: Ridership on Amtrak Trains 62
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots 62
Distribution Plots: Boxplots and Histograms 64
Heatmaps 67
3.4 Multidimensional Visualization 70
Adding Variables: Color, Hue, Size, Shape, Multiple Panels, Animation 70
Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering 73
Reference: Trend Line and Labels 77
Scaling Up: Large Datasets 79
Multivariate Plot: Parallel Coordinates Plot 80
Interactive Visualization 80
3.5 Specialized Visualizations 82
Visualizing Networked Data 82
Visualizing Hierarchical Data: More on Treemaps 83
Visualizing Geographical Data: Maps 84
3.6 Summary: Major Visualizations and Operations, According to Machine Learning Goal 87
Prediction 87
Classification 87
Time Series Forecasting 87
Unsupervised Learning 88
Problems 89
4 Dimension Reduction 91
4.1 Introduction 91
4.2 Curse of Dimensionality 92
4.3 Practical Considerations 92
Problems 112
Part III Performance Evaluation
5 Evaluating Predictive Performance 117
5.1 Introduction 118
5.2 Evaluating Predictive Performance 118
Problems 142
Part IV Prediction and Classification Methods
6 Multiple Linear Regression 147
6.1 Introduction 147
6.2 Explanatory vs. Predictive Modeling 148
6.3 Estimating the Regression Equation and Prediction 149
Example: Predicting the Price of Used Toyota Corolla Automobiles 150
6.4 Variable Selection in Linear Regression 155
Reducing the Number of Predictors 155
How to Reduce the Number of Predictors 156
Manual Variable Selection 156
Automated Variable Selection 157
Regularization (Shrinkage Models) 164
Problems 170
7 k-Nearest Neighbors (k-NN) 175
7.1 The 𝑘-NN Classifier (Categorical Outcome) 175
Problems 186
8 The Naive Bayes Classifier 189
8.1 Introduction 189
Threshold Probability Method 190
Conditional Probability 190
Problems 203
9 Classification and Regression Trees 205
9.1 Introduction 206
Tree Structure 206
Decision Rules 207
Classifying a New Record 207
9.2 Classification Trees 207
Recursive Partitioning 207
Example 1: Riding Mowers 208
Categorical Predictors 210
Standardization 210
9.3 Growing a Tree for Riding Mowers Example 210
Choice of First Split 211
Choice of Second Split 212
Final Tree 212
Using a Tree to Classify New Records 213
9.4 Evaluating the Performance of a Classification Tree 215
Example 2: Acceptance of Personal Loan 215
9.5 Avoiding Overfitting 219
Stopping Tree Growth: CHAID 220
Growing a Full Tree and Pruning It Back 220
How JMP Pro Limits Tree Size 221
9.6 Classification Rules from Trees 222
9.7 Classification Trees for More Than Two Classes 224
9.8 Regression Trees 224
Prediction 224
Evaluating Performance 225
9.9 Advantages and Weaknesses of a Single Tree 227
9.10 Improving Prediction: Random Forests and Boosted Trees 229
Random Forests 229
Boosted Trees 230
Problems 233
10 Logistic Regression 237
10.1 Introduction 237
10.2 The Logistic Regression Model 239
10.3 Example: Acceptance of Personal Loan 240
Model with a Single Predictor 241
Estimating the Logistic Model from Data: Multiple Predictors 243
Interpreting Results in Terms of Odds (for a Profiling Goal) 246
10.4 Evaluating Classification Performance 247
10.5 Variable Selection 249
10.6 Logistic Regression for Multi-class Classification 250
Logistic Regression for Nominal Classes 250
Logistic Regression for Ordinal Classes 251
Example: Accident Data 252
10.7 Example of Complete Analysis: Predicting Delayed Flights 253
Data Preprocessing 255
Model Fitting, Estimation, and Interpretation: A Simple Model 256
Model Fitting, Estimation, and Interpretation: The Full Model 257
Model Performance 257
Problems 264
11 Neural Nets 267
11.1 Introduction 267
11.2 Concept and Structure of a Neural Network 268
11.3 Fitting a Network to Data 269
Example 1: Tiny Dataset 269
Computing Output of Nodes 269
Preprocessing the Data 272
Training the Model 273
Using the Output for Prediction and Classification 279
Example 2: Classifying Accident Severity 279
Avoiding Overfitting 281
11.4 User Input in JMP Pro 282
11.5 Exploring the Relationship Between Predictors and Outcome 284
11.6 Deep Learning 285
Convolutional Neural Networks (CNNs) 285
Local Feature Map 287
A Hierarchy of Features 287
The Learning Process 287
Unsupervised Learning 288
Conclusion 289
11.7 Advantages and Weaknesses of Neural Networks 289
Problems 290
12 Discriminant Analysis 293
12.1 Introduction 293
Example 1: Riding Mowers 294
Example 2: Personal Loan Acceptance 294
12.2 Distance of an Observation from a Class 295
12.3 From Distances to Propensities and Classifications 297
12.4 Classification Performance of Discriminant Analysis 300
12.5 Prior Probabilities 301
12.6 Classifying More Than Two Classes 303
Example 3: Medical Dispatch to Accident Scenes 303
12.7 Advantages and Weaknesses 306
Problems 307
13 Generating, Comparing, and Combining Multiple Models 311
13.1 Ensembles 311
Why Ensembles Can Improve Predictive Power 312
Simple Averaging or Voting 313
Bagging 314
Boosting 315
Stacking 316
Advantages and Weaknesses of Ensembles 317
13.2 Automated Machine Learning (AutoML) 317
AutoML: Explore and Clean Data 317
AutoML: Determine Machine Learning Task 318
AutoML: Choose Features and Machine Learning Methods 318
AutoML: Evaluate Model Performance 320
AutoML: Model Deployment 321
Advantages and Weaknesses of Automated Machine Learning 322
13.3 Summary 322
Problems 323
Part V Intervention and User Feedback
14 Interventions: Experiments, Uplift Models, and Reinforcement Learning 327
14.1 Introduction 327
14.2 A/B Testing 328
Example: Testing a New Feature in a Photo Sharing App 329
The Statistical Test for Comparing Two Groups (𝑇-Test) 329
Multiple Treatment Groups: A/B/n Tests 333
Multiple A/B Tests and the Danger of Multiple Testing 333
14.3 Uplift (Persuasion) Modeling 333
Getting the Data 334
A Simple Model 336
Modeling Individual Uplift 336
Creating Uplift Models in JMP Pro 337
Using the Results of an Uplift Model 338
14.4 Reinforcement Learning 340
Explore-Exploit: Multi-armed Bandits 340
Markov Decision Process (MDP) 341
14.5 Summary 344
Problems 345
Part VI Mining Relationships Among Records
15 Association Rules and Collaborative Filtering 349
15.1 Association Rules 349
Discovering Association Rules in Transaction Databases 350
Example 1: Synthetic Data on Purchases of Phone Faceplates 350
Data Format 350
Generating Candidate Rules 352
The Apriori Algorithm 353
Selecting Strong Rules 353
The Process of Rule Selection 356
Interpreting the Results 358
Rules and Chance 359
Example 2: Rules for Similar Book Purchases 361
15.2 Collaborative Filtering 362
Data Type and Format 363
Example 3: Netflix Prize Contest 363
User-Based Collaborative Filtering: “People Like You” 365
Item-Based Collaborative Filtering 366
Evaluating Performance 367
Advantages and Weaknesses of Collaborative Filtering 368
Collaborative Filtering vs. Association Rules 369
15.3 Summary 370
Problems 372
16 Cluster Analysis 375
16.1 Introduction 375
Example: Public Utilities 377
16.2 Measuring Distance Between Two Records 378
Euclidean Distance 379
Standardizing Numerical Measurements 379
Other Distance Measures for Numerical Data 379
Distance Measures for Categorical Data 382
Distance Measures for Mixed Data 382
16.3 Measuring Distance Between Two Clusters 383
Minimum Distance 383
Maximum Distance 383
Average Distance 383
Centroid Distance 383
16.4 Hierarchical (Agglomerative) Clustering 385
Single Linkage 385
Complete Linkage 386
Average Linkage 386
Centroid Linkage 386
Ward’s Method 387
Dendrograms: Displaying Clustering Process and Results 387
Validating Clusters 391
Two-Way Clustering 393
Limitations of Hierarchical Clustering 393
16.5 Nonhierarchical Clustering: The 𝐾-Means Algorithm 394
Choosing the Number of Clusters (𝑘) 396
Problems 403
Part VII Forecasting Time Series
17 Handling Time Series 409
17.1 Introduction 409
17.2 Descriptive vs. Predictive Modeling 410
17.3 Popular Forecasting Methods in Business 411
Combining Methods 411
17.4 Time Series Components 411
Example: Ridership on Amtrak Trains 412
17.5 Data Partitioning and Performance Evaluation 415
Benchmark Performance: Naive Forecasts 417
Generating Future Forecasts 417
Problems 419
18 Regression-Based Forecasting 423
18.1 A Model with Trend 424
Linear Trend 424
Exponential Trend 427
Polynomial Trend 429
18.2 A Model with Seasonality 430
Additive vs. Multiplicative Seasonality 432
18.3 A Model with Trend and Seasonality 433
18.4 Autocorrelation and ARIMA Models 433
Computing Autocorrelation 433
Improving Forecasts by Integrating Autocorrelation Information 437
Fitting AR Models to Residuals 439
Evaluating Predictability 441
Problems 444
19 Smoothing and Deep Learning Methods for Forecasting 455
19.1 Introduction 455
19.2 Moving Average 456
Centered Moving Average for Visualization 456
Trailing Moving Average for Forecasting 457
Choosing Window Width (𝑤) 460
19.3 Simple Exponential Smoothing 461
Choosing Smoothing Parameter 𝛼 462
Relation Between Moving Average and Simple Exponential Smoothing 465
19.4 Advanced Exponential Smoothing 465
Series With a Trend 465
Series With a Trend and Seasonality 466
19.5 Deep Learning for Forecasting 470
Problems 472
Part VIII Data Analytics
20 Text Mining 483
20.1 Introduction 483
20.2 The Tabular Representation of Text: Document-Term Matrix and “Bag-of-Words” 484
20.3 Bag-of-Words vs. Meaning Extraction at Document Level 486
20.4 Preprocessing the Text 486
Tokenization 487
Text Reduction 488
Presence/Absence vs. Frequency (Occurrences) 489
Term Frequency-Inverse Document Frequency (TF-IDF) 489
From Terms to Topics: Latent Semantic Analysis and Topic Analysis 490
Extracting Meaning 491
From Terms to High Dimensional Word Vectors: Word2Vec 491
20.5 Implementing Machine Learning Methods 492
20.6 Example: Online Discussions on Autos and Electronics 492
Importing the Records 493
Text Preprocessing in JMP 494
Using Latent Semantic Analysis and Topic Analysis 496
Fitting a Predictive Model 499
Prediction 499
20.7 Example: Sentiment Analysis of Movie Reviews 500
Data Preparation 500
Latent Semantic Analysis and Fitting a Predictive Model 500
20.8 Summary 502
Problems 503
21 Responsible Data Science 505
21.1 Introduction 505
Example: Predicting Recidivism 506
21.2 Unintentional Harm 506
21.3 Legal Considerations 508
The General Data Protection Regulation (GDPR) 508
Protected Groups 508
21.4 Principles of Responsible Data Science 508
Non-maleficence 509
Fairness 509
Transparency 510
Accountability 511
Data Privacy and Security 511
21.5 A Responsible Data Science Framework 511
Justification 511
Assembly 512
Data Preparation 513
Modeling 513
Auditing 513
21.6 Documentation Tools 514
Impact Statements 514
Model Cards 515
Datasheets 516
Audit Reports 516
21.7 Example: Applying the RDS Framework to the COMPAS Example 517
Unanticipated Uses 518
Ethical Concerns 518
Protected Groups 518
Data Issues 518
Fitting the Model 519
Auditing the Model 520
Bias Mitigation 526
21.8 Summary 526
Problems 528
Part IX Cases
22 Cases 533
22.1 Charles Book Club 533
The Book Industry 533
Database Marketing at Charles 534
Machine Learning Techniques 535
Assignment 537
22.2 German Credit 541
Background 541
Data 541
Assignment 544
22.3 Tayko Software Cataloger 545
Background 545
The Mailing Experiment 545
Data 545
Assignment 546
22.4 Political Persuasion 548
Background 548
Predictive Analytics Arrives in US Politics 548
Political Targeting 548
Uplift 549
Data 549
Assignment 550
22.5 Taxi Cancellations 552
Business Situation 552
Assignment 552
22.6 Segmenting Consumers of Bath Soap 554
Business Situation 554
Key Problems 554
Data 555
Measuring Brand Loyalty 556
Assignment 556
22.7 Catalog Cross-Selling 557
Background 557
Assignment 557
22.8 Direct-Mail Fundraising 559
Background 559
Data 559
Assignment 559
22.9 Time Series Case: Forecasting Public Transportation Demand 562
Background 562
Problem Description 562
Available Data 562
Assignment Goal 562
Assignment 563
Tips and Suggested Steps 563
22.10 Loan Approval 564
Background 564
Regulatory Requirements 564
Getting Started 564
Assignment 564
References 567
Data Files Used in the Book 571
Index 573