A complete discussion of fundamental and advanced topics in Item Response Theory written by pioneers in the field
In Item Response Theory, accomplished psychometricians Darrell Bock and Robert Gibbons deliver a comprehensive and up-to-date exploration of the theoretical foundations and applications of Item Response Theory (IRT). Covering both unidimensional and multidimensional IRT, as well as related adaptive test administration of previously calibrated item banks, the book addresses the growing need for understanding of this topic as the use of IRT spreads to other fields.
The first book on the topic that offers a complete and unified treatment of its subject, Item Response Theory prepares researchers and students to understand and apply IRT and multidimensional IRT to fields like education, mental health, and marketing. Accessible to first-year graduate students with a foundation in the behavioral or social sciences, basic statistics, and generalized linear models, the book walks readers through everything from the logic of IRT to cutting-edge applications of the technique.
Readers will also benefit from the inclusion of:
• A thorough introduction to the foundations of Item Response Theory, including its logic and origins, model-based measurement, psychological scaling, and classical test theory
• An exploration of selected mathematical and statistical results, including points, point sets, and set operations, probability, sampling, and joint, conditional, and marginal probability
• Discussions of unidimensional and multidimensional IRT models, including item parameter estimation with binary and polytomous data
• Analysis of dimensionality, differential item functioning, and multiple group IRT
Perfect for graduate students and researchers studying and working with psychometrics in psychology, quantitative psychology, educational measurement, marketing, and statistics, Item Response Theory will also benefit researchers interested in patient-reported outcomes in health research.
Table of Contents
Preface xvii
Acknowledgments xix
1 Foundations 1
1.1 The Logic of Item Response Theory 3
1.2 Model-based Data Analysis 4
1.3 Origins 5
1.3.1 Psychometric Scaling 6
1.3.2 Classical Test Theory 9
1.3.3 Contributions from Statistics 10
1.4 The Population Concept in IRT 11
1.5 Generalizability Theory 14
2 Selected Mathematical and Statistical Results 21
2.1 Points, Point Sets, and Set Operations 21
2.2 Probability 24
2.3 Sampling 25
2.4 Joint, Conditional, and Marginal Probability 26
2.5 Probability Distributions and Densities 28
2.6 Describing Distributions 32
2.7 Functions of Random Variables 34
2.7.1 Linear Functions 34
2.7.2 Nonlinear Functions 37
2.8 Elements of Matrix Algebra 37
2.8.1 Partitioned Matrices 41
2.8.2 The Kronecker Product 42
2.8.3 Row and Column Matrices 43
2.8.4 Matrix Inversion 43
2.9 Determinants 45
2.10 Matrix Differentiation 45
2.10.1 Scalar Functions of Vector Variables 46
2.10.2 Vector Functions of a Vector Variable 47
2.10.3 Scalar Functions of a Matrix Variable 48
2.10.4 Chain Rule for Scalar Functions of a Matrix Variable 49
2.10.5 Matrix Functions of a Matrix Variable 49
2.10.6 Derivatives of a Scalar Function with Respect to a Symmetric Matrix 50
2.10.7 Second-order Differentiation 52
2.11 Theory of Estimation 53
2.11.1 Analysis of Variance 56
2.11.2 Estimating Variance Components 57
2.12 Maximum Likelihood Estimation (MLE) 59
2.12.1 Likelihood Functions 59
2.12.2 The Likelihood Equations 60
2.12.3 Examples of Maximum Likelihood Estimation 60
2.12.4 Sampling Distribution of the Estimator 62
2.12.5 The Fisher-scoring Solution of the Likelihood Equations 63
2.12.6 Properties of the Maximum Likelihood Estimator (MLE) 63
2.12.7 Constrained Estimation 64
2.12.8 Admissibility 64
2.13 Bayes Estimation 65
2.14 The Maximum A Posteriori (MAP) Estimator 68
2.15 Marginal Maximum Likelihood Estimation (MMLE) 69
2.15.1 The Marginal Likelihood Equations 70
2.15.2 Application in the “Normal-Normal” Case 72
2.15.3 The EM Solution 75
2.15.4 The Fisher-scoring Solution 75
2.16 Probit and Logit Analysis 77
2.16.1 Probit Analysis 77
2.16.2 Logit Analysis 79
2.16.3 Logit-linear Analysis 80
2.16.4 Extension of Logit-linear Analysis to Multinomial Data 82
2.16.4.1 Graded Categories 83
2.16.4.2 Nominal Categories 85
2.17 Some Results from Classical Test Theory 88
2.17.1 Test Reliability 90
2.17.2 Estimating Reliability 91
2.17.2.1 Bayes Estimation of True Scores 96
2.17.3 When are the Assumptions of Classical Test Theory Reasonable? 97
3 Unidimensional IRT Models 101
3.1 The General IRT Framework 103
3.2 Item Response Models 104
3.2.1 Dichotomous Categories 105
3.2.1.1 Normal Ogive Model 105
3.2.1.2 2-PL Model 109
3.2.1.3 3-PL Model 111
3.2.1.4 1-PL Model 113
3.2.1.5 Illustration 114
3.2.2 Polytomous Categories 115
3.2.2.1 Graded Categories Model 118
3.2.2.2 Illustration 120
3.2.2.3 The Nominal Categories Model 122
3.2.2.4 Nominal Multiple-Choice Model 130
3.2.2.5 Illustration 132
3.2.2.6 Partial Credit Model 135
3.2.2.7 Generalized Partial Credit Model 136
3.2.2.8 Illustration 136
3.2.2.9 Rating Scale Models 136
3.2.3 Ranking Model 139
4 Item Parameter Estimation - Binary Data 141
4.1 Estimation of Item Parameters Assuming Known Attribute Values of the Respondents 142
4.1.1 Estimation 143
4.1.1.1 The 1-parameter Model 143
4.1.1.2 The 2-parameter Model 144
4.1.1.3 The 3-parameter Model 145
4.2 Estimation of Item Parameters Assuming Unknown Attribute Values of the Respondents 146
4.2.1 Joint Maximum Likelihood Estimation (JML) 147
4.2.1.1 The 1-parameter Logistic Model 147
4.2.1.2 Logit-linear Analysis 148
4.2.1.3 Proportional Marginal Adjustments 153
4.2.2 Marginal Maximum Likelihood Estimation (MML) 158
4.2.2.1 The 2-parameter Model 162
5 Item Parameter Estimation - Polytomous Data 177
5.1 General Results 177
5.2 The Normal Ogive Model 182
5.3 The Nominal Categories Model 183
5.4 The Graded Categories Model 185
5.5 The Generalized Partial Credit Model 188
5.5.1 The Unrestricted Version 189
5.5.2 The EM Solution 190
5.5.2.1 The GPCM Newton-Gauss Joint Solution 191
5.5.3 Rating Scale Models 191
5.5.3.1 The EM Solution for the RSM 192
5.5.3.2 The Newton-Gauss Solution for the RSM 193
5.6 Boundary Problems 194
5.7 Multiple Group Models 196
5.8 Discussion 197
5.9 Conclusions 200
6 Multidimensional IRT Models 201
6.1 Classical Multiple Factor Analysis of Test Scores 202
6.2 Classical Item Factor Analysis 203
6.3 Item Factor Analysis Based on Item Response Theory 205
6.4 Maximum Likelihood Estimation of Item Slopes and Intercepts 206
6.4.1 Estimating Parameters of the Item Response Model 208
6.5 Indeterminacies of Item Factor Analysis 212
6.5.1 Direction of Response 212
6.5.2 Indeterminacy of Location and Scale 212
6.5.3 Rotational Indeterminacy of Factor Loadings in Exploratory Factor Analysis 213
6.5.3.1 Varimax Factor Pattern 214
6.5.3.2 Promax Factor Pattern 214
6.5.3.3 General and Group Factors 215
6.5.3.4 Confirmatory Item Factor Analysis and the Bifactor Pattern 215
6.6 Estimation of Item Parameters and Respondent Scores in Item Bifactor Analysis 218
6.7 Estimating Factor Scores 219
6.8 Example 220
6.8.1 Exploratory Item Factor Analysis 221
6.8.2 Confirmatory Item Bifactor Analysis 223
6.9 Two-tier Model 227
6.10 Summary 230
7 Analysis of Dimensionality 233
7.1 Unidimensional Models and Multidimensional Data 234
7.2 Limited-Information Goodness of Fit Tests 237
7.3 Example 240
7.3.1 Exploratory Item Factor Analysis 240
7.3.2 Confirmatory Item Bifactor Analysis 241
7.4 Discussion 242
8 Computerized Adaptive Testing 243
8.1 What is Computerized Adaptive Testing? 243
8.2 Computerized Adaptive Testing - An Overview 244
8.3 Item Selection 245
8.3.1 Unidimensional Computerized Adaptive Testing (UCAT) 246
8.3.1.1 Fisher Information in IRT Model 246
8.3.1.2 Maximizing Fisher Information (MFI) and Its Limitations 248
8.3.1.3 Modifications to MFI 249
8.3.2 Multidimensional Computerized Adaptive Testing (MCAT) 251
8.3.2.1 Two Conceptualizations of the Information Function in Multidimensional Space 252
8.3.2.2 Selection Methods in MCAT 253
8.3.3 Bifactor IRT 256
8.4 Terminating an Adaptive Test 257
8.5 Additional Considerations 258
8.6 An Example from Mental Health Measurement 260
8.6.1 The CAT-Mental Health 261
8.6.2 Discussion 264
9 Differential Item Functioning 267
9.1 Introduction 267
9.2 Types of DIF 268
9.3 The Mantel-Haenszel Procedure 270
9.4 Lord’s Wald Test 271
9.5 Lagrange Multiplier Test 272
9.6 Logistic Regression 273
9.7 Assessing DIF for the Bifactor Model 275
9.8 Assessing DIF from CAT Data 276
10 Estimating Respondent Attributes 279
10.1 Introduction 279
10.2 Ability Estimation 279
10.2.1 Maximum Likelihood 280
10.2.2 Bayes MAP 281
10.2.3 Bayes EAP 281
10.2.4 Ability Estimation for Polytomous Data 282
10.2.5 Ability Estimation for Multidimensional IRT Models 283
10.2.6 Ability Estimation for the Bifactor Model 284
10.2.7 Estimation of the Ability Distribution 284
10.2.8 Domain Scores 285
11 Multiple Group Item Response Models 287
11.1 Introduction 287
11.2 IRT Estimation when the Grouping Structure is Known: Traditional Multiple Group IRT 288
11.2.1 Example 291
11.3 IRT Estimation when the Grouping Structure is Unknown: Mixtures of Gaussian Components 292
11.3.1 The Mixture Distribution 293
11.3.2 The Likelihood Component 295
11.3.3 Algorithm 296
11.3.4 Unequal Variances 297
11.4 Multivariate Probit Analysis 297
11.4.1 The Model 299
11.4.2 Identification 300
11.4.3 Estimation 300
11.4.4 Tests of Fit 301
11.4.5 Illustration 302
11.5 Multilevel IRT Models 306
11.5.1 The Rasch Model 306
11.5.2 The Two-parameter Logistic Model 308
11.5.3 Estimation 308
11.5.4 Illustration 309
12 Test and Scale Development and Maintenance 311
12.1 Introduction 311
12.2 Item Banking 311
12.3 Item Calibration 314
12.3.1 The OEM Method 315
12.3.2 The MEM Method 315
12.3.3 Stocking’s Method A 315
12.3.4 Stocking’s Method B 316
12.4 IRT Equating 318
12.4.1 Linking, Scale Aligning and Equating 318
12.4.2 Experimental Designs for Equating 319
12.4.2.1 Single Group (SG) Design 319
12.4.2.2 Equivalent Groups (EG) Design 319
12.4.2.3 Counterbalanced (CB) Design 319
12.4.2.4 The Anchor Test or Nonequivalent Groups with Anchor Test (NEAT) Design 319
12.5 Harmonization 320
12.6 Item Parameter Drift 322
12.7 Summary 323
13 Some Interesting Applications 325
13.1 Introduction 325
13.2 Bio-behavioral Synthesis 325
13.3 Mental Health Measurement 328
13.3.1 The CAT-Depression Inventory 328
13.3.2 The CAT-Anxiety Scale 330
13.3.3 The Measurement of Suicidality and the Prediction of Future Suicidal Attempt 331
13.3.4 Clinician and Self-rated Psychosis Measurement 332
13.3.5 Substance Use Disorder 334
13.3.6 Special Populations and Differential Item Functioning 335
13.3.6.1 Perinatal 335
13.3.6.2 Emergency Medicine 336
13.3.6.3 Latinos Taking Tests in Spanish 336
13.3.6.4 Criminal Justice 338
13.3.7 Intensive Longitudinal Data 339
13.4 IRT in Machine Learning 340
Bibliography 343
Index 361