
Demystifying Deep Learning. An Introduction to the Mathematics of Neural Networks. Edition No. 1

  • Book

  • 256 Pages
  • November 2023
  • John Wiley and Sons Ltd
  • ID: 5864028
DEMYSTIFYING DEEP LEARNING

Discover how to train Deep Learning models by learning how to build real Deep Learning software libraries and verification software!

The study of Deep Learning and Artificial Neural Networks (ANNs) is a significant subfield of artificial intelligence (AI) with applications in numerous fields: medicine, law, financial services, and science, for example. Just as the robot revolution threatened blue-collar jobs in the 1970s, so now the AI revolution promises a new era of productivity for white-collar work. ANNs have begun to take over important tasks, from detecting and preventing disease, to reading and reviewing legal contracts, to interpreting experimental data, modeling protein folding, and forecasting hurricanes. AI is everywhere - on the news, in think tanks, and on the agendas of government policy makers all over the world - and ANNs often provide its backbone.

Relying on an informal and succinct approach, Demystifying Deep Learning is a useful tool for learning the steps necessary to implement ANN algorithms, building both a software library for neural network training and the software to verify it. The volume explains how real ANNs work and includes 6 practical examples that demonstrate, in real code, how to build ANNs and the datasets they need; the code is available as open source to ensure practical usage. This approachable book covers ANN techniques that are used every day, as they are adapted to natural language processing, image recognition, problem solving, and generative applications. It also offers an accessible motivation and elucidation of how transformers, the basis of large language models (LLMs) such as ChatGPT, work. This volume is an important introduction to the field, equipping the reader for more advanced study.

Demystifying Deep Learning is ideal for engineers and professionals who need to learn and understand ANNs in their work. It is also a helpful text for advanced undergraduates seeking a solid grounding in the topic.

Table of Contents

About the Author ix

Acronyms x

1 Introduction 1

1.1 AI/ML - Deep Learning? 5

1.2 A Brief History 6

1.3 The Genesis of Models 9

1.3.1 Rise of the Empirical Functions 9

1.3.2 The Biological Phenomenon and the Analogue 13

1.4 Numerical Computation - Computer Numbers Are Not Real 14

1.4.1 The IEEE 754 Floating Point System 15

1.4.2 Numerical Coding Tip: Think in Floating Point 18

1.5 Summary 20

1.6 Projects 21

2 Deep Learning and Neural Networks 23

2.1 Feed-Forward and Fully-Connected Artificial Neural Networks 24

2.2 Computing Neuron State 29

2.2.1 Activation Functions 29

2.3 The Feed-Forward ANN Expressed with Matrices 31

2.3.1 Neural Matrices: A Convenient Notation 32

2.4 Classification 33

2.4.1 Binary Classification 34

2.4.2 One-Hot Encoding 36

2.4.3 The Softmax Layer 38

2.5 Summary 39

2.6 Projects 40

3 Training Neural Networks 41

3.1 Preparing the Training Set: Data Preprocessing 42

3.2 Weight Initialization 45

3.3 Training Outline 47

3.4 Least Squares: A Trivial Example 49

3.5 Backpropagation of Error for Regression 51

3.5.1 The Terminal Layer (Output) 54

3.5.2 Backpropagation: The Shallower Layers 57

3.5.3 The Complete Backpropagation Algorithm 61

3.5.4 A Word on the Rectified Linear Unit (ReLU) 62

3.6 Stochastic Sine 64

3.7 Verification of a Software Implementation 66

3.8 Summary 70

3.9 Projects 71

4 Training Classifiers 73

4.1 Backpropagation for Classifiers 73

4.1.1 Likelihood 74

4.1.2 Categorical Loss Functions 75

4.2 Computing the Derivative of the Loss 77

4.2.1 Initiate Backpropagation 80

4.3 Multilabel Classification 81

4.3.1 Binary Classification 82

4.3.2 Training a Multilabel Classifier ANN 82

4.4 Summary 84

4.5 Projects 85

5 Weight Update Strategies 87

5.1 Stochastic Gradient Descent 87

5.2 Weight Updates as Iteration and Convex Optimization 92

5.2.1 Newton's Method for Optimization 93

5.3 RPROP+ 96

5.4 Momentum Methods 99

5.4.1 AdaGrad and RMSProp 100

5.4.2 ADAM 101

5.5 Levenberg-Marquardt Optimization for Neural Networks 103

5.6 Summary 108

5.7 Projects 109

6 Convolutional Neural Networks 111

6.1 Motivation 112

6.2 Convolutions and Features 113

6.3 Filters 117

6.4 Pooling 119

6.5 Feature Layers 120

6.6 Training a CNN 123

6.6.1 Flatten and the Gradient 123

6.6.2 Pooling and the Gradient 124

6.6.3 Filters and the Gradient 125

6.7 Applications 129

6.8 Summary 130

6.9 Projects 130

7 Fixing the Fit 133

7.1 Quality of the Solution 133

7.2 Generalization Error 134

7.2.1 Bias 134

7.2.2 Variance 135

7.2.3 The Bias-Variance Trade-off 136

7.2.4 The Bias-Variance Trade-off in Context 138

7.2.5 The Test Set 138

7.3 Classification Performance 140

7.4 Regularization 143

7.4.1 Forward Pass During Training 143

7.4.2 Forward Pass During Normal Inference 145

7.4.3 Backpropagation of Error 146

7.5 Advanced Normalization 148

7.5.1 Batch Normalization 149

7.5.2 Layer Normalization 154

7.6 Summary 156

7.7 Projects 157

8 Design Principles for a Deep Learning Training Library 159

8.1 Computer Languages 160

8.2 The Matrix: Crux of a Library Implementation 164

8.2.1 Memory Access and Modern CPU Architectures 165

8.2.2 Designing Matrix Computations 168

8.2.2.1 Convolutions as Matrices 170

8.3 The Framework 171

8.4 Summary 173

8.5 Projects 173

9 Vistas 175

9.1 The Limits of ANN Learning Capacity 175

9.2 Generative Adversarial Networks 177

9.2.1 GAN Architecture 178

9.2.2 The GAN Loss Function 180

9.3 Reinforcement Learning 183

9.3.1 The Elements of Reinforcement Learning 185

9.3.2 A Trivial RL Training Algorithm 187

9.4 Natural Language Processing Transformed 193

9.4.1 The Challenges of Natural Language 195

9.4.2 Word Embeddings 195

9.4.3 Attention 198

9.4.4 Transformer Blocks 200

9.4.5 Multi-Head Attention 204

9.4.6 Transformer Applications 205

9.5 Neural Turing Machines 207

9.6 Summary 210

9.7 Projects 210

Appendix A Mathematical Review 211

A.1 Linear Algebra 211

A.1.1 Vectors 211

A.1.2 Matrices 212

A.1.3 Matrix Properties 214

A.1.4 Linear Independence 215

A.1.5 The QR Decomposition 215

A.1.6 Least Squares 215

A.1.7 Eigenvalues and Eigenvectors 216

A.1.8 Hadamard Operations 216

A.2 Basic Calculus 217

A.2.1 The Product Rule 217

A.2.2 The Chain Rule 218

A.2.3 Multivariable Functions 218

A.2.4 Taylor Series 218

A.3 Advanced Matrices 219

A.4 Probability 219

Glossary 221

References 229

Index 243

Authors

Douglas J. Santry, University of Kent, UK.