Machine Learning. Hands-On for Developers and Technical Professionals. Edition No. 2

  • Book

  • 432 Pages
  • April 2020
  • John Wiley and Sons Ltd
  • ID: 5838276
Dig deep into the data with a hands-on guide to machine learning with updated examples and more!

Machine Learning: Hands-On for Developers and Technical Professionals provides hands-on instruction and fully coded working examples for the most common machine learning techniques used by developers and technical professionals. The book contains a breakdown of each ML variant, explaining how it works and how it is used within certain industries, allowing readers to incorporate the presented techniques into their own work as they follow along. A core tenet of machine learning is a strong focus on data preparation, and a full exploration of the various types of learning algorithms illustrates how the proper tools can help any developer extract information and insights from existing data. The book includes a full complement of Instructor's Materials to facilitate use in the classroom, making this resource useful for students and as a professional reference.

At its core, machine learning is a mathematical, algorithm-based technology that forms the basis of historical data mining and modern big data science. Scientific analysis of big data requires a working knowledge of machine learning, which forms predictions based on known properties learned from training data. Machine Learning is an accessible, comprehensive guide for the non-mathematician, providing clear guidance that allows readers to:

  • Learn the languages and tools of machine learning, including Hadoop, Mahout, and Weka
  • Understand decision trees, Bayesian networks, and artificial neural networks
  • Implement Association Rule, Real Time, and Batch learning
  • Develop a strategic plan for safe, effective, and efficient machine learning

By learning to construct a system that can learn from data, readers can increase their utility across industries. Machine learning sits at the core of deep-dive data analysis and visualization, which is increasingly in demand as companies discover the goldmine hiding in their existing data. For the tech professional involved in data science, Machine Learning: Hands-On for Developers and Technical Professionals provides the skills and techniques required to dig deeper.

Table of Contents

Introduction xxvii

Chapter 1 What is Machine Learning? 1

History of Machine Learning 1

Alan Turing 1

Arthur Samuel 2

Tom M. Mitchell 2

Summary Definition 3

Algorithm Types for Machine Learning 3

Supervised Learning 3

Unsupervised Learning 4

The Human Touch 4

Uses for Machine Learning 4

Software 4

Stock Trading 5

Robotics 6

Medicine and Healthcare 6

Advertising 7

Retail and E-commerce 7

Gaming Analytics 9

The Internet of Things 10

Languages for Machine Learning 10

Python 10

R 11

Matlab 11

Scala 11

Ruby 11

Software Used in This Book 11

Checking the Java Version 12

Weka Toolkit 12

DeepLearning4J 13

Kafka 13

Spark and Hadoop 13

Text Editors and IDEs 13

Data Repositories 14

UC Irvine Machine Learning Repository 14

Kaggle 14

Summary 14

Chapter 2 Planning for Machine Learning 15

The Machine Learning Cycle 15

It All Starts with a Question 16

I Don’t Have Data! 16

Starting Local 17

Transfer Learning 17

Competitions 17

One Solution Fits All? 18

Defining the Process 18

Planning 18

Developing 19

Testing 19

Reporting 19

Refining 19

Production 20

Avoiding Bias 20

Building a Data Team 20

Mathematics and Statistics 20

Programming 21

Graphic Design 21

Domain Knowledge 21

Data Processing 22

Using Your Computer 22

A Cluster of Machines 22

Cloud-Based Services 22

Data Storage 23

Physical Discs 23

Cloud-Based Storage 23

Data Privacy 23

Cultural Norms 24

Generational Expectations 24

The Anonymity of User Data 25

Don’t Cross the “Creepy Line” 25

Data Quality and Cleaning 26

Presence Checks 26

Type Checks 27

Length Checks 27

Range Checks 28

Format Checks 28

The Britney Dilemma 28

What’s in a Country Name? 31

Dates and Times 33

Final Thoughts on Data Cleaning 33

Thinking About Input Data 34

Raw Text 34

Comma-Separated Variables 34

JSON 35

YAML 37

XML 37

Spreadsheets 38

Databases 39

Thinking About Output Data 39

Don’t Be Afraid to Experiment 40

Summary 40

Chapter 3 Data Acquisition Techniques 43

Scraping Data 43

Copy and Paste 44

Google Sheets 46

Using an API 47

Acquiring Weather Data 48

Migrating Data 50

Installing Embulk 51

Using the Quick Run 51

Installing Plugins 52

Migrating Files to Database 53

Bulk Converting CSV to JSON 55

Summary 56

Chapter 4 Statistics, Linear Regression, and Randomness 57

Working with a Basic Dataset 57

Loading and Converting the Dataset 58

Introducing Basic Statistics 59

Minimum and Maximum Values 60

Sum 61

Mean 62

Arithmetic Mean 62

Harmonic Mean 62

Geometric Mean 63

The Relationship Between the Three Averages 63

Mode 65

Median 66

Range 67

Interquartile Ranges 67

Variance 68

Standard Deviation 69

Using Simple Linear Regression 70

Using Your Spreadsheet 70

Writing a Program 73

Embracing Randomness 75

Finding Pi with Random Numbers 76

Using Monte Carlo Pi in Clojure 77

Summary 80

Chapter 5 Working with Decision Trees 81

The Basics of Decision Trees 81

Uses for Decision Trees 81

Advantages of Decision Trees 82

Limitations of Decision Trees 82

Different Algorithm Types 82

How Decision Trees Work 84

Decision Trees in Weka 88

The Requirement 88

Training Data 89

Using Weka to Create a Decision Tree 90

Creating Java Code from the Classification 94

Testing the Classifier Code 99

Thinking About Future Iterations 101

Summary 101

Chapter 6 Clustering 103

What is Clustering? 103

Where is Clustering Used? 104

The Internet 104

Business and Retail 104

Law Enforcement 105

Computing 105

Clustering Models 105

How the K-Means Works 106

Calculating the Number of Clusters in a Dataset 108

K-Means Clustering with Weka 110

Preparing the Data 110

The Workbench Method 111

The Command-Line Method 116

Converting CSV File to ARFF 116

The Coded Method 120

Summary 128

Chapter 7 Association Rules Learning 129

Where is Association Rules Learning Used? 129

Web Usage Mining 130

Beer and Diapers 130

How Association Rules Learning Works 131

Support 133

Confidence 133

Lift 134

Conviction 134

Defining the Process 134

Algorithms 135

Apriori 135

FP-Growth 136

Mining the Baskets - A Walk-Through 136

The Raw Basket Data 136

Using the Weka Application 137

Inspecting the Results 141

Summary 142

Chapter 8 Support Vector Machines 143

What is a Support Vector Machine? 143

Where are Support Vector Machines Used? 144

The Basic Classification Principles 144

Binary and Multiclass Classification 144

Linear Classifiers 146

Confidence 147

Maximizing and Minimizing to Find the Line 147

How Support Vector Machines Approach Classification 148

Using Linear Classification 148

Using Non-Linear Classification 150

Using Support Vector Machines in Weka 151

Installing LibSVM 151

A Classification Walk-Through 152

Implementing LibSVM with Java 158

Summary 164

Chapter 9 Artificial Neural Networks 165

What is a Neural Network? 165

Artificial Neural Network Uses 166

High-Frequency Trading 166

Credit Applications 167

Data Center Management 167

Robotics 167

Medical Monitoring 168

Trusting the Black Box 168

Breaking Down the Artificial Neural Network 169

Perceptrons 169

Activation Functions 170

Multilayer Perceptrons 171

Back Propagation 173

Data Preparation for Artificial Neural Networks 174

Artificial Neural Networks with Weka 175

Generating a Dataset 175

Loading the Data into Weka 177

Configuring the Multilayer Perceptron 178

Training the Network 180

Altering the Network 182

Increasing the Test Data Size 183

Implementing a Neural Network in Java 183

Creating the Project 183

Writing the Code 185

Converting from CSV to Arff 188

Running the Neural Network 188

Developing Neural Networks with DeepLearning4J 189

Modifying the Data 189

Viewing Maven Dependencies 190

Handling the Training Data 191

Normalizing Data 191

Building the Model 192

Evaluating the Model 193

Saving the Model 193

Building and Executing the Program 194

Summary 195

Chapter 10 Machine Learning with Text Documents 197

Preparing Text for Analysis 198

Apache Tika 198

Cleaning the Text Data 203

Stopwords 205

Stemming 206

N-grams 206

TF/IDF 207

Loading the Documents 207

Calculating the Term Frequency 208

Calculating the Inverse Document Frequency 208

Computing the TF/IDF Score 209

Reviewing the Final Code Listing 209

Word2Vec 211

Loading the Raw Text Data 212

Tokenizing the Strings 212

Creating the Model 212

Evaluating the Model 213

Reviewing the Final Code 214

Basic Sentiment Analysis 216

Loading Positive and Negative Words 216

Loading Sentences 217

Calculating the Sentiment Score 217

Reviewing the Final Code 218

Performing a Test Run 220

Further Development 220

Summary 221

Chapter 11 Machine Learning with Images 223

What is an Image? 223

Introducing Color Depth 224

Images in Machine Learning 225

Basic Classification with Neural Networks 226

Basic Settings 226

Loading the MNIST Images 226

Model Configuration 227

Model Training 228

Model Evaluation 228

Convolutional Neural Networks 228

How CNNs Work 228

CNN Demonstration 231

Downloading the Image Data 231

Basic Setup 232

Handling the Training and Test Data 233

Image Preparation 233

CNN Model Configuration 234

Model Training 236

Model Evaluation 236

Saving the Model 237

Transfer Learning 237

Summary 238

Chapter 12 Machine Learning Streaming with Kafka 239

What You Will Learn in This Chapter 239

From Machine Learning to Machine Learning Engineer 240

From Batch Processing to Streaming Data Processing 241

What is Kafka? 241

How Does It Work? 241

Fault Tolerance 243

Further Reading 243

Installing Kafka 243

Kafka as a Single-Node Cluster 244

Kafka as a Multinode Cluster 245

Topics Management 247

Creating Topics 248

Finding Out Information About Existing Topics 248

Deleting Topics 249

Sending Messages from the Command Line 249

Receiving Messages from the Command Line 250

Kafka Tool UI 250

Writing Your Own Producers and Consumers 251

Producers in Java 251

Consumers in Java 255

Building and Running the Applications 258

The Streaming API 260

Building a Streaming Machine Learning System 262

Planning the System 263

Continuous Training 265

Determining Which Models to Use for Predictions 266

Determining Which Algorithms to Use 268

Simple Linear Regression 271

Neural Network 274

Kafka Topics 281

Creating the Topics 281

Kafka Connect 283

Why Persist the Event Data? 283

The REST API Microservice 285

Processing Commands and Events 287

Finding Kafka Brokers 288

A Command or an Event? 289

Making Predictions 293

Prediction Streaming API 293

Prediction Functions 296

Predicting Linear Regression 298

Predicting the Neural Network Model 299

Running the Project 301

Run MySQL 301

Run Zookeeper 301

Run Kafka 301

Create the Topics 301

Run Kafka Connect 301

Model Builds 302

Run Events Streaming Application 302

Run Prediction Streaming Application 302

Start the API 302

Send JSON Training Data 302

Train a Model 302

Make a Prediction 303

Summary 303

Chapter 13 Apache Spark 305

Spark: A Hadoop Replacement? 305

Java, Scala, or Python? 306

Downloading and Installing Spark 306

A Quick Intro to Spark 306

Starting the Shell 307

Data Sources 307

Testing Spark 308

Spark Monitor 309

Comparing Hadoop MapReduce to Spark 310

Writing Stand-Alone Programs with Spark 313

Spark Programs in Java 313

Spark Program Summary 318

Spark SQL 318

Basic Concepts 318

Wrapping Up SparkSQL 323

Spark Streaming 323

Basic Concepts 323

Creating Your First Spark Stream 324

Spark Streams from Kafka 326

MLlib: The Machine Learning Library 327

Dependencies 328

Decision Trees 328

Clustering 330

Association Rules with FP-Growth 332

Summary 335

Chapter 14 Machine Learning with R 337

Installing R 337

macOS 337

Windows 338

Linux 338

Your First Run 338

Installing R-Studio 339

The R Basics 340

Variables and Vectors 340

Matrices 341

Lists 342

Data Frames 343

Installing Packages 344

Loading in Data 345

Plotting Data 347

Simple Statistics 350

Simple Linear Regression 350

Creating the Data 351

The Initial Graph 351

Regression with the Linear Model 351

Making a Prediction 352

Basic Sentiment Analysis 353

Using Functions to Load in Word Lists 353

Writing a Function to Score Sentiment 354

Testing the Function 354

Apriori Association Rules 355

Installing the arules Package 355

Gathering the Training Data 356

Importing the Transaction Data 356

Running the Apriori Algorithm 357

Inspecting the Results 358

Accessing R from Java 358

Installing the rJava Package 358

Creating Your First Java Code in R 359

Calling R from Java Programs 359

Setting Up an Eclipse Project 360

Creating the Java/R Class 361

Running the Example 361

Extending Your R Implementations 363

Connecting to Social Media with R 364

Summary 366

Appendix A Kafka Quick Start 367

Installing Kafka 367

Starting Zookeeper 367

Starting Kafka 368

Creating Topics 368

Listing Topics 369

Describing a Topic 369

Deleting Topics 369

Running a Console Producer 370

Running a Console Consumer 370

Appendix B The Twitter API Developer Application Configuration 371

Appendix C Useful Unix Commands 375

Using Sample Data 375

Showing the Contents: cat, more, and less 376

Example Command 376

Expected Output 376

Filtering Content: grep 377

Example Command for Finding Text 377

Example Output 377

Sorting Data: sort 378

Example Command for Basic Sorting 378

Example Output 378

Finding Unique Occurrences: uniq 380

Showing the Top of a File: head 381

Counting Words: wc 381

Locating Anything: find 382

Combining Commands and Redirecting Output 383

Picking a Text Editor 383

Colon Frenzy: Vi and Vim 383

Nano 384

Emacs 384

Appendix D Further Reading 385

Machine Learning 385

Statistics 386

Big Data and Data Science 386

Visualization 387

Making Decisions 387

Datasets 388

Blogs 388

Useful Websites 389

The Tools of the Trade 389

Index 391

Authors

Jason Bell