In Google Cloud Certified Professional Machine Learning Study Guide, a team of accomplished artificial intelligence (AI) and machine learning (ML) specialists delivers an expert roadmap to AI and ML on the Google Cloud Platform based on the new exam curriculum. With Sybex, you’ll prepare faster and smarter for the Google Cloud Certified Professional Machine Learning Engineer exam and get ready to hit the ground running on your first day at your new job as an ML engineer.
The book walks readers through the machine learning process from start to finish: data, feature engineering, model training, and deployment on Google Cloud. It also discusses best practices for choosing between a custom model, AutoML, and pretrained models on the Vertex AI platform. Technologies such as TensorFlow, Kubeflow, and Vertex AI are presented by way of real-world scenarios to help you apply the theory to practical examples and show you how IT professionals design, build, and operate secure ML cloud environments.
The book also shows you how to:
- Frame ML problems and architect ML solutions from scratch
- Banish test anxiety by verifying and checking your progress with built-in self-assessments and other practical tools
- Use the Sybex online practice environment, complete with practice questions and explanations, a glossary, objective maps, and flash cards
A can’t-miss resource for everyone preparing for the Google Cloud Certified Professional Machine Learning certification exam, or for a new career in ML powered by the Google Cloud Platform, this Sybex Study Guide has everything you need to take the next step in your career.
Table of Contents
Introduction xxi
Assessment Test xxxii
Chapter 1 Framing ML Problems 1
Translating Business Use Cases 3
Machine Learning Approaches 5
Supervised, Unsupervised, and Semi-supervised Learning 5
Classification, Regression, Forecasting, and Clustering 7
ML Success Metrics 8
Regression 12
Responsible AI Practices 13
Summary 14
Exam Essentials 14
Review Questions 15
Chapter 2 Exploring Data and Building Data Pipelines 19
Visualization 20
Box Plot 20
Line Plot 21
Bar Plot 21
Scatterplot 22
Statistics Fundamentals 22
Mean 22
Median 22
Mode 23
Outlier Detection 23
Standard Deviation 23
Correlation 24
Data Quality and Reliability 24
Data Skew 25
Data Cleaning 25
Scaling 25
Log Scaling 26
Z-score 26
Clipping 26
Handling Outliers 26
Establishing Data Constraints 27
Exploration and Validation at Big-Data Scale 27
Running TFDV on Google Cloud Platform 28
Organizing and Optimizing Training Datasets 29
Imbalanced Data 29
Data Splitting 31
Data Splitting Strategy for Online Systems 31
Handling Missing Data 32
Data Leakage 33
Summary 34
Exam Essentials 34
Review Questions 36
Chapter 3 Feature Engineering 39
Consistent Data Preprocessing 40
Encoding Structured Data Types 41
Mapping Numeric Values 42
Mapping Categorical Values 42
Feature Selection 44
Class Imbalance 44
Classification Threshold with Precision and Recall 45
Area under the Curve (AUC) 46
Feature Crosses 46
TensorFlow Transform 49
TensorFlow Data API (tf.data) 49
TensorFlow Transform 49
GCP Data and ETL Tools 51
Summary 51
Exam Essentials 52
Review Questions 53
Chapter 4 Choosing the Right ML Infrastructure 57
Pretrained vs. AutoML vs. Custom Models 58
Pretrained Models 60
Vision AI 61
Video AI 62
Natural Language AI 62
Translation AI 63
Speech-to-Text 63
Text-to-Speech 64
AutoML 64
AutoML for Tables or Structured Data 64
AutoML for Images and Video 66
AutoML for Text 67
Recommendations AI/Retail AI 68
Document AI 69
Dialogflow and Contact Center AI 69
Custom Training 70
How a CPU Works 71
GPU 71
TPU 72
Provisioning for Predictions 74
Scaling Behavior 75
Finding the Ideal Machine Type 75
Edge TPU 76
Deploy to Android or iOS Device 76
Summary 77
Exam Essentials 77
Review Questions 78
Chapter 5 Architecting ML Solutions 83
Designing Reliable, Scalable, and Highly Available ML Solutions 84
Choosing an Appropriate ML Service 86
Data Collection and Data Management 87
Google Cloud Storage (GCS) 88
BigQuery 88
Vertex AI Managed Datasets 89
Vertex AI Feature Store 89
NoSQL Data Store 90
Automation and Orchestration 91
Use Vertex AI Pipelines to Orchestrate the ML Workflow 92
Use Kubeflow Pipelines for Flexible Pipeline Construction 92
Use TensorFlow Extended SDK to Leverage Pre-built Components for Common Steps 93
When to Use Which Pipeline 93
Serving 94
Offline or Batch Prediction 94
Online Prediction 95
Summary 97
Exam Essentials 97
Review Questions 98
Chapter 6 Building Secure ML Pipelines 103
Building Secure ML Systems 104
Encryption at Rest 104
Encryption in Transit 105
Encryption in Use 105
Identity and Access Management 105
IAM Permissions for Vertex AI Workbench 106
Securing a Network with Vertex AI 109
Privacy Implications of Data Usage and Collection 113
Google Cloud Data Loss Prevention 114
Google Cloud Healthcare API for PHI Identification 115
Best Practices for Removing Sensitive Data 116
Summary 117
Exam Essentials 118
Review Questions 119
Chapter 7 Model Building 121
Choice of Framework and Model Parallelism 122
Data Parallelism 122
Model Parallelism 123
Modeling Techniques 125
Artificial Neural Network 126
Deep Neural Network (DNN) 126
Convolutional Neural Network 126
Recurrent Neural Network 127
What Loss Function to Use 127
Gradient Descent 128
Learning Rate 129
Batch 129
Batch Size 129
Epoch 129
Hyperparameters 129
Transfer Learning 130
Semi-supervised Learning 131
When You Need Semi-supervised Learning 131
Limitations of SSL 131
Data Augmentation 132
Offline Augmentation 132
Online Augmentation 132
Model Generalization and Strategies to Handle Overfitting and Underfitting 133
Bias Variance Trade-Off 133
Underfitting 133
Overfitting 134
Regularization 134
Summary 136
Exam Essentials 137
Review Questions 138
Chapter 8 Model Training and Hyperparameter Tuning 143
Ingestion of Various File Types into Training 145
Collect 146
Process 147
Store and Analyze 150
Developing Models in Vertex AI Workbench by Using Common Frameworks 151
Creating a Managed Notebook 153
Exploring Managed JupyterLab Features 154
Data Integration 155
BigQuery Integration 155
Ability to Scale the Compute Up or Down 156
Git Integration for Team Collaboration 156
Schedule or Execute a Notebook Code 158
Creating a User-Managed Notebook 159
Training a Model as a Job in Different Environments 161
Training Workflow with Vertex AI 162
Training Dataset Options in Vertex AI 163
Pre-built Containers 163
Custom Containers 166
Distributed Training 168
Hyperparameter Tuning 169
Why Hyperparameters Are Important 170
Techniques to Speed Up Hyperparameter Optimization 171
How Vertex AI Hyperparameter Tuning Works 171
Vertex AI Vizier 174
Tracking Metrics During Training 175
Interactive Shell 175
TensorFlow Profiler 177
What-If Tool 177
Retraining/Redeployment Evaluation 178
Data Drift 178
Concept Drift 178
When Should a Model Be Retrained? 178
Unit Testing for Model Training and Serving 179
Testing for Updates in API Calls 180
Testing for Algorithmic Correctness 180
Summary 180
Exam Essentials 181
Review Questions 182
Chapter 9 Model Explainability on Vertex AI 187
Model Explainability on Vertex AI 188
Explainable AI 188
Interpretability and Explainability 189
Feature Importance 189
Vertex Explainable AI 189
Data Bias and Fairness 193
ML Solution Readiness 194
How to Set Up Explanations in the Vertex AI 195
Summary 196
Exam Essentials 196
Review Questions 197
Chapter 10 Scaling Models in Production 199
Scaling Prediction Service 200
TensorFlow Serving 201
Serving (Online, Batch, and Caching) 203
Real-Time Static and Dynamic Reference Features 203
Pre-computing and Caching Prediction 206
Google Cloud Serving Options 207
Online Predictions 207
Batch Predictions 212
Hosting Third-Party Pipelines (MLflow) on Google Cloud 213
Testing for Target Performance 214
Configuring Triggers and Pipeline Schedules 215
Summary 216
Exam Essentials 217
Review Questions 218
Chapter 11 Designing ML Training Pipelines 221
Orchestration Frameworks 223
Kubeflow Pipelines 224
Vertex AI Pipelines 225
Apache Airflow 228
Cloud Composer 229
Comparison of Tools 229
Identification of Components, Parameters, Triggers, and Compute Needs 230
Schedule the Workflows with Kubeflow Pipelines 230
Schedule Vertex AI Pipelines 232
System Design with Kubeflow/TFX 232
System Design with Kubeflow DSL 232
System Design with TFX 234
Hybrid or Multicloud Strategies 235
Summary 236
Exam Essentials 237
Review Questions 238
Chapter 12 Model Monitoring, Tracking, and Auditing Metadata 241
Model Monitoring 242
Concept Drift 242
Data Drift 243
Model Monitoring on Vertex AI 243
Drift and Skew Calculation 244
Input Schemas 245
Logging Strategy 247
Types of Prediction Logs 247
Log Settings 248
Model Monitoring and Logging 248
Model and Dataset Lineage 249
Vertex ML Metadata 249
Vertex AI Experiments 252
Vertex AI Debugging 253
Summary 253
Exam Essentials 254
Review Questions 255
Chapter 13 Maintaining ML Solutions 259
MLOps Maturity 260
MLOps Level 0: Manual/Tactical Phase 261
MLOps Level 1: Strategic Automation Phase 263
MLOps Level 2: CI/CD Automation, Transformational Phase 264
Retraining and Versioning Models 266
Triggers for Retraining 267
Versioning Models 267
Feature Store 268
Solution 268
Data Model 269
Ingestion and Serving 269
Vertex AI Permissions Model 270
Custom Service Account 270
Access Transparency in Vertex AI 271
Common Training and Serving Errors 271
Training Time Errors 271
Serving Time Errors 271
TensorFlow Data Validation 272
Vertex AI Debugging Shell 272
Summary 272
Exam Essentials 273
Review Questions 274
Chapter 14 BigQuery ML 279
BigQuery - Data Access 280
BigQuery ML Algorithms 282
Model Training 282
Model Evaluation 284
Prediction 285
Explainability in BigQuery ML 286
BigQuery ML vs. Vertex AI Tables 289
Interoperability with Vertex AI 289
Access BigQuery Public Dataset 289
Import BigQuery Data into Vertex AI 290
Access BigQuery Data from Vertex AI Workbench Notebooks 290
Analyze Test Prediction Data in BigQuery 290
Export Vertex AI Batch Prediction Results 290
Export BigQuery Models into Vertex AI 291
BigQuery Design Patterns 291
Hashed Feature 291
Transforms 291
Summary 292
Exam Essentials 293
Review Questions 294
Appendix Answers to Review Questions 299
Chapter 1: Framing ML Problems 300
Chapter 2: Exploring Data and Building Data Pipelines 301
Chapter 3: Feature Engineering 302
Chapter 4: Choosing the Right ML Infrastructure 302
Chapter 5: Architecting ML Solutions 304
Chapter 6: Building Secure ML Pipelines 305
Chapter 7: Model Building 306
Chapter 8: Model Training and Hyperparameter Tuning 307
Chapter 9: Model Explainability on Vertex AI 308
Chapter 10: Scaling Models in Production 308
Chapter 11: Designing ML Training Pipelines 309
Chapter 12: Model Monitoring, Tracking, and Auditing Metadata 310
Chapter 13: Maintaining ML Solutions 311
Chapter 14: BigQuery ML 313
Index 315