Comprehensive Reference Work on Multivariate Analysis and its Applications
The first edition of this book, by Mardia, Kent and Bibby, has been used globally for over 40 years. This second edition brings many topics up to date, with a special emphasis on recent developments.
A wide range of material in multivariate analysis is covered, including the classical themes of multivariate normal theory, multivariate regression, inference, multidimensional scaling, factor analysis, cluster analysis and principal component analysis. The book also now covers modern developments such as graphical models, robust estimation, statistical learning, and high-dimensional methods. The book expertly blends theory and application, providing numerous worked examples and exercises at the end of each chapter. The reader is assumed to have a basic knowledge of mathematical statistics at an undergraduate level together with an elementary understanding of linear algebra. There are appendices which provide a background in matrix algebra, a summary of univariate statistics, a collection of statistical tables and a discussion of computational aspects.
The work includes coverage of:
- Basic properties of random vectors, copulas, normal distribution theory, and estimation
- Hypothesis testing, multivariate regression, and analysis of variance
- Principal component analysis, factor analysis, and canonical correlation analysis
- Discriminant analysis, cluster analysis, and multidimensional scaling
- New advances and techniques, including supervised and unsupervised statistical learning, graphical models and regularization methods for high-dimensional data
Although primarily designed as a textbook for final year undergraduates and postgraduate students in mathematics and statistics, the book will also be of interest to research workers and applied scientists.
Table of Contents
Epigraph xvii
Preface to the Second Edition xix
Preface to the First Edition xxi
Acknowledgments from First Edition xxv
Notation, Abbreviations, and Key Ideas xxvii
1 Introduction 1
1.1 Objects and Variables 1
1.2 Some Multivariate Problems and Techniques 1
1.3 The Data Matrix 7
1.4 Summary Statistics 8
1.5 Linear Combinations 12
1.6 Geometrical Ideas 14
1.7 Graphical Representation 15
1.8 Measures of Multivariate Skewness and Kurtosis 20
Exercises and Complements 22
2 Basic Properties of Random Vectors 25
Introduction 25
2.1 Cumulative Distribution Functions and Probability Density Functions 25
2.2 Population Moments 27
2.3 Characteristic Functions 31
2.4 Transformations 32
2.5 The Multivariate Normal Distribution 34
2.6 Random Samples 41
2.7 Limit Theorems 42
Exercises and Complements 44
3 Nonnormal Distributions 49
3.1 Introduction 49
3.2 Some Multivariate Generalizations of Univariate Distributions 49
3.3 Families of Distributions 52
3.4 Insights into Skewness and Kurtosis 57
3.5 Copulas 60
Exercises and Complements 65
4 Normal Distribution Theory 71
4.1 Introduction and Characterization 71
4.2 Linear Forms 73
4.3 Transformations of Normal Data Matrices 75
4.4 The Wishart Distribution 77
4.5 The Hotelling T² Distribution 83
4.6 Mahalanobis Distance 85
4.7 Statistics Based on the Wishart Distribution 88
4.8 Other Distributions Related to the Multivariate Normal 92
Exercises and Complements 93
5 Estimation 101
Introduction 101
5.1 Likelihood and Sufficiency 101
5.2 Maximum-likelihood Estimation 106
5.3 Robust Estimation of Location and Dispersion for Multivariate Distributions 112
5.4 Bayesian Inference 117
Exercises and Complements 119
6 Hypothesis Testing 125
6.1 Introduction 125
6.2 The Techniques Introduced 127
6.3 The Techniques Further Illustrated 134
6.4 Simultaneous Confidence Intervals 142
6.5 The Behrens-Fisher Problem 144
6.6 Multivariate Hypothesis Testing: Some General Points 145
6.7 Nonnormal Data 146
6.8 Mardia’s Nonparametric Test for the Bivariate Two-sample Problem 149
Exercises and Complements 151
7 Multivariate Regression Analysis 159
7.1 Introduction 159
7.2 Maximum-likelihood Estimation 160
7.3 The General Linear Hypothesis 162
7.4 Design Matrices of Degenerate Rank 165
7.5 Multiple Correlation 167
7.6 Least-squares Estimation 171
7.7 Discarding of Variables 174
Exercises and Complements 178
8 Graphical Models 183
8.1 Introduction 183
8.2 Graphs and Conditional Independence 184
8.3 Gaussian Graphical Models 188
8.4 Log-linear Graphical Models 195
8.5 Directed and Mixed Graphs 202
Exercises and Complements 204
9 Principal Component Analysis 207
9.1 Introduction 207
9.2 Definition and Properties of Principal Components 207
9.3 Sampling Properties of Principal Components 221
9.4 Testing Hypotheses About Principal Components 227
9.5 Correspondence Analysis 230
9.6 Allometry - Measurement of Size and Shape 237
9.7 Discarding of Variables 240
9.8 Principal Component Regression 241
9.9 Projection Pursuit and Independent Component Analysis 244
9.10 PCA in High Dimensions 247
Exercises and Complements 249
10 Factor Analysis 259
10.1 Introduction 259
10.2 The Factor Model 260
10.3 Principal Factor Analysis 264
10.4 Maximum-likelihood Factor Analysis 266
10.5 Goodness-of-fit Test 269
10.6 Rotation of Factors 270
10.7 Factor Scores 275
10.8 Relationships Between Factor Analysis and Principal Component Analysis 276
10.9 Analysis of Covariance Structures 277
Exercises and Complements 277
11 Canonical Correlation Analysis 281
11.1 Introduction 281
11.2 Mathematical Development 282
11.3 Qualitative Data and Dummy Variables 288
11.4 Qualitative and Quantitative Data 290
Exercises and Complements 293
12 Discriminant Analysis and Statistical Learning 297
12.1 Introduction 297
12.2 Bayes’ Discriminant Rule 299
12.3 The Error Rate 300
12.4 Discrimination Using the Normal Distribution 304
12.5 Discarding of Variables 312
12.6 Fisher’s Linear Discriminant Function 314
12.7 Nonparametric Distance-based Methods 319
12.8 Classification Trees 323
12.9 Logistic Discrimination 332
12.10 Neural Networks 336
Exercises and Complements 342
13 Multivariate Analysis of Variance 355
13.1 Introduction 355
13.2 Formulation of Multivariate One-way Classification 355
13.3 The Likelihood Ratio Principle 356
13.4 Testing Fixed Contrasts 358
13.5 Canonical Variables and A Test of Dimensionality 359
13.6 The Union Intersection Approach 369
13.7 Two-way Classification 370
Exercises and Complements 375
14 Cluster Analysis and Unsupervised Learning 379
14.1 Introduction 379
14.2 Probabilistic Membership Models 380
14.3 Parametric Mixture Models 384
14.4 Partitioning Methods 386
14.5 Hierarchical Methods 391
14.6 Distances and Similarities 397
14.7 Grouped Data 404
14.8 Mode Seeking 406
14.9 Measures of Agreement 408
Exercises and Complements 412
15 Multidimensional Scaling 419
15.1 Introduction 419
15.2 Classical Solution 421
15.3 Duality Between Principal Coordinate Analysis and Principal Component Analysis 428
15.4 Optimal Properties of the Classical Solution and Goodness of Fit 429
15.5 Seriation 436
15.6 Nonmetric Methods 438
15.7 Goodness of Fit Measure: Procrustes Rotation 440
15.8 Multisample Problem and Canonical Variates 443
Exercises and Complements 444
16 High-dimensional Data 449
16.1 Introduction 449
16.2 Shrinkage Methods in Regression 451
16.3 Principal Component Regression 455
16.4 Partial Least Squares Regression 457
16.5 Functional Data 465
Exercises and Complements 473
A Matrix Algebra 475
A.1 Introduction 475
A.2 Matrix Operations 478
A.3 Further Particular Matrices and Types of Matrices 483
A.4 Vector Spaces, Rank, and Linear Equations 485
A.5 Linear Transformations 488
A.6 Eigenvalues and Eigenvectors 488
A.7 Quadratic Forms and Definiteness 495
A.8 Generalized Inverse 497
A.9 Matrix Differentiation and Maximization Problems 499
A.10 Geometrical Ideas 501
B Univariate Statistics 505
B.1 Introduction 505
B.2 Normal Distribution 505
B.3 Chi-squared Distribution 506
B.4 F and Beta Variables 506
B.5 t Distribution 507
B.6 Poisson Distribution 507
C R Commands and Data 509
C.1 Basic R Commands Related to Matrices 509
C.2 R Libraries and Commands Used in Exercises and Figures 510
C.3 Data Availability 511
D Tables 513
References and Author Index 523
Index 543