Machine Learning for Business Analytics. Concepts, Techniques and Applications with JMP Pro. Edition No. 2


Book
608 Pages
April 2023
John Wiley and Sons Ltd
ID: 5840806

MACHINE LEARNING FOR BUSINESS ANALYTICS

An up-to-date introduction to a market-leading platform for data analysis and machine learning

Machine Learning for Business Analytics: Concepts, Techniques, and Applications with JMP Pro®, 2nd ed. offers an accessible and engaging introduction to machine learning. It provides concrete examples and case studies to educate new users and deepen existing users’ understanding of their data and their business. Fully updated to incorporate new topics and instructional material, this remains the only comprehensive introduction to this crucial set of analytical tools specifically tailored to the needs of businesses.

Machine Learning for Business Analytics: Concepts, Techniques, and Applications with JMP Pro®, 2nd ed. is ideal for students and instructors of business analytics and data mining classes, as well as data science practitioners and professionals in data-driven industries.

Foreword xix

Preface xx

Acknowledgments xxiii

Part I Preliminaries

1 Introduction 3

1.1 What Is Business Analytics? 3

1.2 What Is Machine Learning? 5

1.3 Machine Learning, AI, and Related Terms 5

Statistical Modeling vs. Machine Learning 6

1.4 Big Data 6

1.5 Data Science 7

1.6 Why Are There So Many Different Methods? 8

1.7 Terminology and Notation 8

1.8 Road Maps to This Book 10

Order of Topics 12

2 Overview of the Machine Learning Process 17

2.1 Introduction 17

2.2 Core Ideas in Machine Learning 18

Classification 18

Prediction 18

Association Rules and Recommendation Systems 18

Predictive Analytics 19

Data Reduction and Dimension Reduction 19

Data Exploration and Visualization 19

Supervised and Unsupervised Learning 19

2.3 The Steps in a Machine Learning Project 21

2.4 Preliminary Steps 22

Organization of Data 22

Sampling from a Database 22

Oversampling Rare Events in Classification Tasks 23

Preprocessing and Cleaning the Data 23

2.5 Predictive Power and Overfitting 29

Overfitting 29

Creation and Use of Data Partitions 31

2.6 Building a Predictive Model with JMP Pro 34

Predicting Home Values in a Boston Neighborhood 34

Modeling Process 36

2.7 Using JMP Pro for Machine Learning 42

2.8 Automating Machine Learning Solutions 43

Predicting Power Generator Failure 44

Uber’s Michelangelo 45

2.9 Ethical Practice in Machine Learning 47

Machine Learning Software: The State of the Market by Herb Edelstein 47

Problems 52

Part II Data Exploration and Dimension Reduction

3 Data Visualization 59

3.1 Introduction 59

3.2 Data Examples 61

Example 1: Boston Housing Data 61

Example 2: Ridership on Amtrak Trains 62

3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots 62

Distribution Plots: Boxplots and Histograms 64

Heatmaps 67

3.4 Multidimensional Visualization 70

Adding Variables: Color, Hue, Size, Shape, Multiple Panels, Animation 70

Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering 73

Reference: Trend Line and Labels 77

Scaling Up: Large Datasets 79

Multivariate Plot: Parallel Coordinates Plot 80

Interactive Visualization 80

3.5 Specialized Visualizations 82

Visualizing Networked Data 82

Visualizing Hierarchical Data: More on Treemaps 83

Visualizing Geographical Data: Maps 84

3.6 Summary: Major Visualizations and Operations, According to Machine Learning Goal 87

Prediction 87

Classification 87

Time Series Forecasting 87

Unsupervised Learning 88

Problems 89

4 Dimension Reduction 91

4.1 Introduction 91

4.2 Curse of Dimensionality 92

4.3 Practical Considerations 92

Problems 112

Part III Performance Evaluation

5 Evaluating Predictive Performance 117

5.1 Introduction 118

5.2 Evaluating Predictive Performance 118

Problems 142

Part IV Prediction and Classification Methods

6 Multiple Linear Regression 147

6.1 Introduction 147

6.2 Explanatory vs. Predictive Modeling 148

6.3 Estimating the Regression Equation and Prediction 149

Example: Predicting the Price of Used Toyota Corolla Automobiles 150

6.4 Variable Selection in Linear Regression 155

Reducing the Number of Predictors 155

How to Reduce the Number of Predictors 156

Manual Variable Selection 156

Automated Variable Selection 157

Regularization (Shrinkage Models) 164

Problems 170

7 k-Nearest Neighbors (k-NN) 175

7.1 The 𝑘-NN Classifier (Categorical Outcome) 175

Problems 186

8 The Naive Bayes Classifier 189

8.1 Introduction 189

Threshold Probability Method 190

Conditional Probability 190

Problems 203

9 Classification and Regression Trees 205

9.1 Introduction 206

Tree Structure 206

Decision Rules 207

Classifying a New Record 207

9.2 Classification Trees 207

Recursive Partitioning 207

Example 1: Riding Mowers 208

Categorical Predictors 210

Standardization 210

9.3 Growing a Tree for Riding Mowers Example 210

Choice of First Split 211

Choice of Second Split 212

Final Tree 212

Using a Tree to Classify New Records 213

9.4 Evaluating the Performance of a Classification Tree 215

Example 2: Acceptance of Personal Loan 215

9.5 Avoiding Overfitting 219

Stopping Tree Growth: CHAID 220

Growing a Full Tree and Pruning It Back 220

How JMP Pro Limits Tree Size 221

9.6 Classification Rules from Trees 222

9.7 Classification Trees for More Than Two Classes 224

9.8 Regression Trees 224

Prediction 224

Evaluating Performance 225

9.9 Advantages and Weaknesses of a Single Tree 227

9.10 Improving Prediction: Random Forests and Boosted Trees 229

Random Forests 229

Boosted Trees 230

Problems 233

10 Logistic Regression 237

10.1 Introduction 237

10.2 The Logistic Regression Model 239

10.3 Example: Acceptance of Personal Loan 240

Model with a Single Predictor 241

Estimating the Logistic Model from Data: Multiple Predictors 243

Interpreting Results in Terms of Odds (for a Profiling Goal) 246

10.4 Evaluating Classification Performance 247

10.5 Variable Selection 249

10.6 Logistic Regression for Multi-class Classification 250

Logistic Regression for Nominal Classes 250

Logistic Regression for Ordinal Classes 251

Example: Accident Data 252

10.7 Example of Complete Analysis: Predicting Delayed Flights 253

Data Preprocessing 255

Model Fitting, Estimation, and Interpretation: A Simple Model 256

Model Fitting, Estimation, and Interpretation: The Full Model 257

Model Performance 257

Problems 264

11 Neural Nets 267

11.1 Introduction 267

11.2 Concept and Structure of a Neural Network 268

11.3 Fitting a Network to Data 269

Example 1: Tiny Dataset 269

Computing Output of Nodes 269

Preprocessing the Data 272

Training the Model 273

Using the Output for Prediction and Classification 279

Example 2: Classifying Accident Severity 279

Avoiding Overfitting 281

11.4 User Input in JMP Pro 282

11.5 Exploring the Relationship Between Predictors and Outcome 284

11.6 Deep Learning 285

Convolutional Neural Networks (CNNs) 285

Local Feature Map 287

A Hierarchy of Features 287

The Learning Process 287

Unsupervised Learning 288

Conclusion 289

11.7 Advantages and Weaknesses of Neural Networks 289

Problems 290

12 Discriminant Analysis 293

12.1 Introduction 293

Example 1: Riding Mowers 294

Example 2: Personal Loan Acceptance 294

12.2 Distance of an Observation from a Class 295

12.3 From Distances to Propensities and Classifications 297

12.4 Classification Performance of Discriminant Analysis 300

12.5 Prior Probabilities 301

12.6 Classifying More Than Two Classes 303

Example 3: Medical Dispatch to Accident Scenes 303

12.7 Advantages and Weaknesses 306

Problems 307

13 Generating, Comparing, and Combining Multiple Models 311

13.1 Ensembles 311

Why Ensembles Can Improve Predictive Power 312

Simple Averaging or Voting 313

Bagging 314

Boosting 315

Stacking 316

Advantages and Weaknesses of Ensembles 317

13.2 Automated Machine Learning (AutoML) 317

AutoML: Explore and Clean Data 317

AutoML: Determine Machine Learning Task 318

AutoML: Choose Features and Machine Learning Methods 318

AutoML: Evaluate Model Performance 320

AutoML: Model Deployment 321

Advantages and Weaknesses of Automated Machine Learning 322

13.3 Summary 322

Problems 323

Part V Intervention and User Feedback

14 Interventions: Experiments, Uplift Models, and Reinforcement Learning 327

14.1 Introduction 327

14.2 A/B Testing 328

Example: Testing a New Feature in a Photo Sharing App 329

The Statistical Test for Comparing Two Groups (𝑇-Test) 329

Multiple Treatment Groups: A/B/n Tests 333

Multiple A/B Tests and the Danger of Multiple Testing 333

14.3 Uplift (Persuasion) Modeling 333

Getting the Data 334

A Simple Model 336

Modeling Individual Uplift 336

Creating Uplift Models in JMP Pro 337

Using the Results of an Uplift Model 338

14.4 Reinforcement Learning 340

Explore-Exploit: Multi-armed Bandits 340

Markov Decision Process (MDP) 341

14.5 Summary 344

Problems 345

Part VI Mining Relationships Among Records

15 Association Rules and Collaborative Filtering 349

15.1 Association Rules 349

Discovering Association Rules in Transaction Databases 350

Example 1: Synthetic Data on Purchases of Phone Faceplates 350

Data Format 350

Generating Candidate Rules 352

The Apriori Algorithm 353

Selecting Strong Rules 353

The Process of Rule Selection 356

Interpreting the Results 358

Rules and Chance 359

Example 2: Rules for Similar Book Purchases 361

15.2 Collaborative Filtering 362

Data Type and Format 363

Example 3: Netflix Prize Contest 363

User-Based Collaborative Filtering: “People Like You” 365

Item-Based Collaborative Filtering 366

Evaluating Performance 367

Advantages and Weaknesses of Collaborative Filtering 368

Collaborative Filtering vs. Association Rules 369

15.3 Summary 370

Problems 372

16 Cluster Analysis 375

16.1 Introduction 375

Example: Public Utilities 377

16.2 Measuring Distance Between Two Records 378

Euclidean Distance 379

Standardizing Numerical Measurements 379

Other Distance Measures for Numerical Data 379

Distance Measures for Categorical Data 382

Distance Measures for Mixed Data 382

16.3 Measuring Distance Between Two Clusters 383

Minimum Distance 383

Maximum Distance 383

Average Distance 383

Centroid Distance 383

16.4 Hierarchical (Agglomerative) Clustering 385

Single Linkage 385

Complete Linkage 386

Average Linkage 386

Centroid Linkage 386

Ward’s Method 387

Dendrograms: Displaying Clustering Process and Results 387

Validating Clusters 391

Two-Way Clustering 393

Limitations of Hierarchical Clustering 393

16.5 Nonhierarchical Clustering: The 𝐾-Means Algorithm 394

Choosing the Number of Clusters (𝑘) 396

Problems 403

Part VII Forecasting Time Series

17 Handling Time Series 409

17.1 Introduction 409

17.2 Descriptive vs. Predictive Modeling 410

17.3 Popular Forecasting Methods in Business 411

Combining Methods 411

17.4 Time Series Components 411

Example: Ridership on Amtrak Trains 412

17.5 Data Partitioning and Performance Evaluation 415

Benchmark Performance: Naive Forecasts 417

Generating Future Forecasts 417

Problems 419

18 Regression-Based Forecasting 423

18.1 A Model with Trend 424

Linear Trend 424

Exponential Trend 427

Polynomial Trend 429

18.2 A Model with Seasonality 430

Additive vs. Multiplicative Seasonality 432

18.3 A Model with Trend and Seasonality 433

18.4 Autocorrelation and ARIMA Models 433

Computing Autocorrelation 433

Improving Forecasts by Integrating Autocorrelation Information 437

Fitting AR Models to Residuals 439

Evaluating Predictability 441

Problems 444

19 Smoothing and Deep Learning Methods for Forecasting 455

19.1 Introduction 455

19.2 Moving Average 456

Centered Moving Average for Visualization 456

Trailing Moving Average for Forecasting 457

Choosing Window Width (𝑤) 460

19.3 Simple Exponential Smoothing 461

Choosing Smoothing Parameter 𝛼 462

Relation Between Moving Average and Simple Exponential Smoothing 465

19.4 Advanced Exponential Smoothing 465

Series With a Trend 465

Series With a Trend and Seasonality 466

19.5 Deep Learning for Forecasting 470

Problems 472

Part VIII Data Analytics

20 Text Mining 483

20.1 Introduction 483

20.2 The Tabular Representation of Text: Document-Term Matrix and “Bag-of-Words” 484

20.3 Bag-of-Words vs. Meaning Extraction at Document Level 486

20.4 Preprocessing the Text 486

Tokenization 487

Text Reduction 488

Presence/Absence vs. Frequency (Occurrences) 489

Term Frequency-Inverse Document Frequency (TF-IDF) 489

From Terms to Topics: Latent Semantic Analysis and Topic Analysis 490

Extracting Meaning 491

From Terms to High Dimensional Word Vectors: Word2Vec 491

20.5 Implementing Machine Learning Methods 492

20.6 Example: Online Discussions on Autos and Electronics 492

Importing the Records 493

Text Preprocessing in JMP 494

Using Latent Semantic Analysis and Topic Analysis 496

Fitting a Predictive Model 499

Prediction 499

20.7 Example: Sentiment Analysis of Movie Reviews 500

Data Preparation 500

Latent Semantic Analysis and Fitting a Predictive Model 500

20.8 Summary 502

Problems 503

21 Responsible Data Science 505

21.1 Introduction 505

Example: Predicting Recidivism 506

21.2 Unintentional Harm 506

21.3 Legal Considerations 508

The General Data Protection Regulation (GDPR) 508

Protected Groups 508

21.4 Principles of Responsible Data Science 508

Non-maleficence 509

Fairness 509

Transparency 510

Accountability 511

Data Privacy and Security 511

21.5 A Responsible Data Science Framework 511

Justification 511

Assembly 512

Data Preparation 513

Modeling 513

Auditing 513

21.6 Documentation Tools 514

Impact Statements 514

Model Cards 515

Datasheets 516

Audit Reports 516

21.7 Example: Applying the RDS Framework to the COMPAS Example 517

Unanticipated Uses 518

Ethical Concerns 518

Protected Groups 518

Data Issues 518

Fitting the Model 519

Auditing the Model 520

Bias Mitigation 526

21.8 Summary 526

Problems 528

Part IX Cases

22 Cases 533

22.1 Charles Book Club 533

The Book Industry 533

Database Marketing at Charles 534

Machine Learning Techniques 535

Assignment 537

22.2 German Credit 541

Background 541

Data 541

Assignment 544

22.3 Tayko Software Cataloger 545

Background 545

The Mailing Experiment 545

Data 545

Assignment 546

22.4 Political Persuasion 548

Background 548

Predictive Analytics Arrives in US Politics 548

Political Targeting 548

Uplift 549

Data 549

Assignment 550

22.5 Taxi Cancellations 552

Business Situation 552

Assignment 552

22.6 Segmenting Consumers of Bath Soap 554

Business Situation 554

Key Problems 554

Data 555

Measuring Brand Loyalty 556

Assignment 556

22.7 Catalog Cross-Selling 557

Background 557

Assignment 557

22.8 Direct-Mail Fundraising 559

Background 559

Data 559

Assignment 559

22.9 Time Series Case: Forecasting Public Transportation Demand 562

Background 562

Problem Description 562

Available Data 562

Assignment Goal 562

Assignment 563

Tips and Suggested Steps 563

22.10 Loan Approval 564

Background 564

Regulatory Requirements 564

Getting Started 564

Assignment 564

References 567

Data Files Used in the Book 571

Index 573

Authors

Galit Shmueli, University of Maryland, College Park

Peter C. Bruce, Massachusetts Institute of Technology

Mia L. Stephens, JMP

Muralidhara Anandamurthy, JMP

Nitin R. Patel, Cytel Inc; Massachusetts Institute of Technology; Harvard University, USA
