Provides a comprehensive introduction to probability with an emphasis on computing-related applications
This self-contained new and extended edition outlines a first course in probability applied to computer-related disciplines. As in the first edition, experimentation and simulation are favoured over mathematical proofs. The freely downloadable statistical programming language R is used throughout the text, not only as a tool for calculation and data analysis, but also to illustrate concepts of probability and to simulate distributions. The examples in Probability with R: An Introduction with Computer Science Applications, Second Edition cover a wide range of computer science applications, including: testing program performance; measuring response time and CPU time; estimating the reliability of components and systems; evaluating algorithms and queuing systems.
Chapters cover: The R language; summarizing statistical data; graphical displays; the fundamentals of probability; reliability; discrete and continuous distributions; and more.
This second edition includes:
- improved R code throughout the text, as well as new procedures, packages and interfaces;
- updated and additional examples, exercises and projects covering recent developments in computing;
- an introduction to bivariate discrete distributions together with the R functions used to handle large matrices of conditional probabilities, which are often needed in machine translation;
- an introduction to linear regression with particular emphasis on its application to machine learning using testing and training data;
- a new section on spam filtering using Bayes' theorem to develop the filters;
- an extended range of Poisson applications such as network failures, website hits, virus attacks and accessing the cloud;
- use of new allocation functions in R to deal with hash table collisions, server overload and the general allocation problem.
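To give a flavour of the allocation problem mentioned in the last item, the following minimal sketch computes the chance that at least two keys land in the same hash table slot. It is not taken from the book, which works in R; it is written in Python, the function name `collision_probability` is hypothetical, and it assumes every key is placed uniformly and independently at random.

```python
from math import prod


def collision_probability(keys: int, slots: int) -> float:
    """Probability that at least two of `keys` items share a slot.

    Assumes each item is assigned to one of `slots` locations
    uniformly and independently (an idealised hash function).
    """
    if keys > slots:
        return 1.0  # pigeonhole principle: a collision is certain
    # Chance that every item lands in a slot not already occupied.
    no_collision = prod((slots - i) / slots for i in range(keys))
    return 1.0 - no_collision


if __name__ == "__main__":
    # The classic birthday problem is the case of 365 slots.
    print(round(collision_probability(23, 365), 4))
```

With 365 slots this is the birthday problem listed in the table of contents: 23 keys are already enough to make a collision slightly more likely than not.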
The book is supplemented with a Wiley Book Companion Site featuring data and solutions to exercises within the book.
Primarily addressed to students of computer science and related areas, Probability with R: An Introduction with Computer Science Applications, Second Edition is also an excellent text for students of engineering and the general sciences. Computing professionals who need to understand the relevance of probability in their areas of practice will find it useful.
Table of Contents
Preface to the Second Edition xiii
Preface to the First Edition xvii
Acknowledgments xxi
About the Companion Website xxiii
I The R Language 1
1 Basics of R 3
1.1 What is R? 3
1.2 Installing R 4
1.3 R Documentation 4
1.4 Basics 5
1.5 Getting Help 6
1.6 Data Entry 7
1.7 Missing Values 11
1.8 Editing 12
1.9 Tidying Up 12
1.10 Saving and Retrieving 13
1.11 Packages 13
1.12 Interfaces 14
1.13 Project 16
2 Summarizing Statistical Data 17
2.1 Measures of Central Tendency 17
2.2 Measures of Dispersion 21
2.3 Overall Summary Statistics 24
2.4 Programming in R 25
2.5 Project 30
3 Graphical Displays 31
3.1 Boxplots 31
3.2 Histograms 36
3.3 Stem and Leaf 40
3.4 Scatter Plots 40
3.5 The Line of Best Fit 43
3.6 Machine Learning and the Line of Best Fit 44
3.7 Graphical Displays Versus Summary Statistics 49
3.8 Projects 53
II Fundamentals of Probability 55
4 Probability Basics 57
4.1 Experiments, Sample Spaces, and Events 58
4.2 Classical Approach to Probability 61
4.3 Permutations and Combinations 64
4.4 The Birthday Problem 71
4.5 Balls and Bins 76
4.6 R Functions for Allocation 79
4.7 Allocation Overload 81
4.8 Relative Frequency Approach to Probability 83
4.9 Simulating Probabilities 84
4.10 Projects 89
5 Rules of Probability 91
5.1 Probability and Sets 91
5.2 Mutually Exclusive Events 92
5.3 Complementary Events 93
5.4 Axioms of Probability 94
5.5 Properties of Probability 96
6 Conditional Probability 104
6.1 Multiplication Law of Probability 107
6.2 Independent Events 108
6.3 Independence of More than Two Events 110
6.4 The Intel Fiasco 113
6.5 Law of Total Probability 115
6.6 Trees 118
6.7 Project 123
7 Posterior Probability and Bayes 124
7.1 Bayes’ Rule 124
7.2 Hardware Fault Diagnosis 131
7.3 Machine Learning and Classification 132
7.4 Spam Filtering 135
7.5 Machine Translation 137
8 Reliability 142
8.1 Series Systems 142
8.2 Parallel Systems 143
8.3 Reliability of a System 143
8.4 Series-Parallel Systems 150
8.5 The Design of Systems 153
8.6 The General System 158
III Discrete Distributions 161
9 Introduction to Discrete Distributions 163
9.1 Discrete Random Variables 163
9.2 Cumulative Distribution Function 168
9.3 Some Simple Discrete Distributions 170
9.4 Benford’s Law 174
9.5 Summarizing Random Variables: Expectation 175
9.6 Properties of Expectations 180
9.7 Simulating Discrete Random Variables and Expectations 183
9.8 Bivariate Distributions 187
9.9 Marginal Distributions 189
9.10 Conditional Distributions 190
9.11 Project 194
10 The Geometric Distribution 196
10.1 Geometric Random Variables 198
10.2 Cumulative Distribution Function 203
10.3 The Quantile Function 207
10.4 Geometric Expectations 209
10.5 Simulating Geometric Probabilities and Expectations 210
10.6 Amnesia 217
10.7 Simulating Markov 219
10.8 Projects 224
11 The Binomial Distribution 226
11.1 Binomial Probabilities 227
11.2 Binomial Random Variables 229
11.3 Cumulative Distribution Function 233
11.4 The Quantile Function 235
11.5 Reliability: The General System 238
11.6 Machine Learning 241
11.7 Binomial Expectations 245
11.8 Simulating Binomial Probabilities and Expectations 248
11.9 Projects 254
12 The Hypergeometric Distribution 255
12.1 Hypergeometric Random Variables 257
12.2 Cumulative Distribution Function 260
12.3 The Lottery 262
12.4 Hypergeometric or Binomial? 266
12.5 Projects 273
13 The Poisson Distribution 274
13.1 Death by Horse Kick 274
13.2 Limiting Binomial Distribution 275
13.3 Random Events in Time and Space 281
13.4 Probability Density Function 283
13.5 Cumulative Distribution Function 287
13.6 The Quantile Function 289
13.7 Estimating Software Reliability 290
13.8 Modeling Defects in Integrated Circuits 292
13.9 Simulating Poisson Probabilities 293
13.10 Projects 298
14 Sampling Inspection Schemes 299
14.1 Introduction 299
14.2 Single Sampling Inspection Schemes 300
14.3 Acceptance Probabilities 301
14.4 Simulating Sampling Inspection Schemes 303
14.5 Operating Characteristic Curve 308
14.6 Producer’s and Consumer’s Risks 310
14.7 Design of Sampling Schemes 311
14.8 Rectifying Sampling Inspection Schemes 315
14.9 Average Outgoing Quality 316
14.10 Double Sampling Inspection Schemes 318
14.11 Average Sample Size 319
14.12 Single Versus Double Schemes 320
14.13 Projects 324
IV Continuous Distributions 325
15 Introduction to Continuous Distributions 327
15.1 Introduction to Continuous Random Variables 328
15.2 Probability Density Function 328
15.3 Cumulative Distribution Function 331
15.4 The Uniform Distribution 332
15.5 Expectation of a Continuous Random Variable 336
15.6 Simulating Continuous Variables 338
16 The Exponential Distribution 341
16.1 Modeling Waiting Times 341
16.2 Probability Density Function of Waiting Times 342
16.3 Cumulative Distribution Function 344
16.4 Modeling Lifetimes 347
16.5 Quantiles 349
16.6 Exponential Expectations 351
16.7 Simulating Exponential Probabilities and Expectations 353
16.8 Amnesia 356
16.9 Simulating Markov 360
16.10 Project 369
17 Queues 370
17.1 The Single Server Queue 370
17.2 Traffic Intensity 371
17.3 Queue Length 372
17.4 Average Response Time 376
17.5 Extensions of the M/M/1 Queue 378
17.6 Project 382
18 The Normal Distribution 383
18.1 The Normal Probability Density Function 385
18.2 The Cumulative Distribution Function 387
18.3 Quantiles 389
18.4 The Standard Normal Distribution 391
18.5 Achieving Normality: Limiting Distributions 394
18.6 Projects 405
19 Process Control 407
19.1 Control Charts 407
19.2 Cusum Charts 411
19.3 Charts for Defective Rates 412
19.4 Project 416
V Tailing Off 417
20 The Inequalities of Markov and Chebyshev 419
20.1 Markov’s Inequality 420
20.2 Algorithm Runtime 426
20.3 Chebyshev’s Inequality 427
Appendix A: Data: Examination Results 433
Appendix B: The Line of Best Fit: Coefficient Derivations 437
Appendix C: Variance Derivations 440
Appendix D: Binomial Approximation to the Hypergeometric 446
Appendix E: Normal Tables 448
Appendix F: The Inequalities of Markov and Chebyshev 450
Index to R Commands 453
Index 457
Postface