Addresses the international use of administrative records for large-scale surveys, censuses, and other statistical purposes
Administrative Records for Survey Methodology is a comprehensive guide to improving the quality, cost-efficiency, and interpretability of surveys and censuses using administrative data research. Contributions from a team of internationally recognized experts provide practical approaches for integrating administrative data in statistical surveys, and discuss the methodological issues, including concerns of privacy, confidentiality, and legality, involved in collecting and analyzing administrative records. Numerous real-world examples highlight technological and statistical innovations, helping readers gain a better understanding of both fundamental methods and advanced techniques for controlling data quality and reducing total survey error.
The book is divided into four sections. The first describes the basics of administrative records research and addresses disclosure limitation and confidentiality protection in linked data. Section two focuses on data quality and linking methodology, covering topics such as quality evaluation, measuring and controlling for non-consent bias, and cleaning and using administrative lists. The third section examines the use of administrative records in surveys and includes case studies of the Swedish register-based census and the administrative records applications used for the US 2020 Census. The book’s final section discusses combining administrative and survey data to improve income measurement, enhancing health surveys with data linkage, and other uses of administrative data in evidence-based policymaking. This state-of-the-art resource:
- Discusses important administrative data issues and suggests how administrative data can be integrated with more traditional surveys
- Describes practical uses of administrative records for evidence-driven decisions in both public and private sectors
- Emphasizes using interdisciplinary methodology and linking administrative records with other data sources
- Explores techniques to leverage administrative data to improve the survey frame, reduce nonresponse follow-up, assess coverage error, measure linkage non-consent bias, and perform small area estimation
Administrative Records for Survey Methodology is an indispensable reference and guide for statistical researchers and methodologists in academia, industry, and government, particularly census bureaus and national statistical offices, and an ideal supplemental text for undergraduate and graduate courses in data science, survey methodology, data collection, and data analysis methods.
Table of Contents
Preface xv
Acknowledgments xxi
List of Contributors xxiii
Part I Fundamentals of Administrative Records Research and Applications 1
1 On the Use of Proxy Variables in Combining Register and Survey Data 3
Li-Chun Zhang
1.1 Introduction 3
1.1.1 A Multisource Data Perspective 3
1.1.2 Concept of Proxy Variable 5
1.2 Instances of Proxy Variable 7
1.2.1 Representation 7
1.2.2 Measurement 10
1.3 Estimation Using Multiple Proxy Variables 12
1.3.1 Asymmetric Setting 13
1.3.2 Uncertainty Evaluation: A Case of Two-Way Data 15
1.3.3 Symmetric Setting 17
1.4 Summary 20
References 20
2 Disclosure Limitation and Confidentiality Protection in Linked Data 25
John M. Abowd, Ian M. Schmutte, and Lars Vilhuber
2.1 Introduction 25
2.2 Paradigms of Protection 27
2.2.1 Input Noise Infusion 29
2.2.2 Formal Privacy Models 30
2.3 Confidentiality Protection in Linked Data: Examples 32
2.3.1 HRS-SSA 32
2.3.1.1 Data Description 32
2.3.1.2 Linkages to Other Data 32
2.3.1.3 Disclosure Avoidance Methods 33
2.3.2 SIPP-SSA-IRS (SSB) 34
2.3.2.1 Data Description 34
2.3.2.2 Disclosure Avoidance Methods 35
2.3.2.3 Disclosure Avoidance Assessment 35
2.3.2.4 Analytical Validity Assessment 37
2.3.3 LEHD: Linked Establishment and Employee Records 38
2.3.3.1 Data Description 38
2.3.3.2 Disclosure Avoidance Methods 39
2.3.3.3 Disclosure Avoidance Assessment for QWI 41
2.3.3.4 Analytical Validity Assessment for QWI 42
2.4 Physical and Legal Protections 43
2.4.1 Statistical Data Enclaves 44
2.4.2 Remote Processing 46
2.4.3 Licensing 46
2.4.4 Disclosure Avoidance Methods 47
2.4.5 Data Silos 48
2.5 Conclusions 49
2.A.1 Other Abbreviations 51
2.A.2 Concepts 52
Acknowledgments 54
References 54
Part II Data Quality of Administrative Records and Linking Methodology 61
3 Evaluation of the Quality of Administrative Data Used in the Dutch Virtual Census 63
Piet Daas, Eric S. Nordholt, Martijn Tennekes, and Saskia Ossen
3.1 Introduction 63
3.2 Data Sources and Variables 64
3.3 Quality Framework 66
3.3.1 Source and Metadata Hyper Dimensions 66
3.3.2 Data Hyper Dimension 68
3.4 Quality Evaluation Results for the Dutch 2011 Census 69
3.4.1 Source and Metadata: Application of Checklist 69
3.4.2 Data Hyper Dimension: Completeness and Accuracy Results 72
3.4.2.1 Completeness Dimension 73
3.4.2.2 Accuracy Dimension 75
3.4.2.3 Visualizing with a Tableplot 78
3.4.3 Discussion of the Quality Findings 80
3.5 Summary 81
3.6 Practical Implications for Implementation with Surveys and Censuses 81
3.7 Exercises 82
References 82
4 Improving Input Data Quality in Register-Based Statistics: The Norwegian Experience 85
Coen Hendriks
4.1 Introduction 85
4.2 The Use of Administrative Sources in Statistics Norway 86
4.3 Managing Statistical Populations 89
4.4 Experiences from the First Norwegian Purely Register-Based Population and Housing Census of 2011 91
4.5 The Contact with the Owners of Administrative Registers Was Put into System 93
4.5.1 Agreements on Data Processing 93
4.5.2 Agreements of Cooperation on Data Quality in Administrative Data Systems 95
4.5.3 The Forums for Cooperation 96
4.6 Measuring and Documenting Input Data Quality 96
4.6.1 Quality Indicators 96
4.6.2 Operationalizing the Quality Checks 97
4.6.3 Quality Reports 99
4.6.4 The Approach is Being Adopted by the Owners of Administrative Data 99
4.7 Summary 100
4.8 Exercises 101
References 104
5 Cleaning and Using Administrative Lists: Enhanced Practices and Computational Algorithms for Record Linkage and Modeling/Editing/Imputation 105
William E. Winkler
5.1 Introductory Comments 105
5.1.1 Example 1 105
5.1.2 Example 2 106
5.1.3 Example 3 107
5.2 Edit/Imputation 108
5.2.1 Background 108
5.2.2 Fellegi-Holt Model 110
5.2.3 Imputation Generalizing Little-Rubin 110
5.2.4 Connecting Edit with Imputation 111
5.2.5 Achieving Extreme Computational Speed 112
5.3 Record Linkage 113
5.3.1 Fellegi-Sunter Model 113
5.3.2 Estimating Parameters 116
5.3.3 Estimating False Match Rates 118
5.3.3.1 The Data Files 118
5.3.4 Achieving Extreme Computational Speed 123
5.4 Models for Adjusting Statistical Analyses for Linkage Error 124
5.4.1 Scheuren-Winkler 124
5.4.2 Lahiri-Larsen 125
5.4.3 Chambers and Kim 127
5.4.4 Chipperfield, Bishop, and Campbell 128
5.4.4.1 Empirical Data 130
5.4.5 Goldstein, Harron, and Wade 132
5.4.6 Hof and Zwinderman 133
5.4.7 Tancredi and Liseo 133
5.5 Concluding Remarks 133
5.6 Issues and Some Related Questions 134
References 134
6 Assessing Uncertainty When Using Linked Administrative Records 139
Jerome P. Reiter
6.1 Introduction 139
6.2 General Sources of Uncertainty 140
6.2.1 Imperfect Matching 140
6.2.2 Incomplete Matching 141
6.3 Approaches to Accounting for Uncertainty 142
6.3.1 Modeling Matching Matrix as Parameter 143
6.3.2 Direct Modeling 146
6.3.3 Imputation of Entire Concatenated File 148
6.4 Concluding Remarks 149
6.4.1 Problems to Be Solved 149
6.4.2 Practical Implications 150
6.5 Exercises 150
Acknowledgment 151
References 151
7 Measuring and Controlling for Non-Consent Bias in Linked Survey and Administrative Data 155
Joseph W. Sakshaug
7.1 Introduction 155
7.1.1 What is Linkage Consent? Why is Linkage Consent Needed? 155
7.1.2 Linkage Consent Rates in Large-Scale Surveys 156
7.1.3 The Impact of Linkage Non-Consent Bias on Survey Inference 158
7.1.4 The Challenge of Measuring and Controlling for Linkage Non-Consent Bias 158
7.2 Strategies for Measuring Linkage Non-Consent Bias 159
7.2.1 Formulation of Linkage Non-Consent Bias 159
7.2.2 Modeling Non-Consent Using Survey Information 160
7.2.3 Analyzing Non-Consent Bias for Administrative Variables 162
7.3 Methods for Minimizing Non-Consent Bias at the Survey Design Stage 163
7.3.1 Optimizing Linkage Consent Rates 163
7.3.2 Placement of the Consent Request 163
7.3.3 Wording of the Consent Request 165
7.3.4 Active and Passive Consent Procedures 166
7.3.5 Linkage Consent in Panel Studies 167
7.4 Methods for Minimizing Non-Consent Bias at the Survey Analysis Stage 168
7.4.1 Controlling for Linkage Non-Consent Bias via Statistical Adjustment 169
7.4.2 Weighting Adjustments 169
7.4.3 Imputation 170
7.5 Summary 172
7.5.1 Key Points for Measuring Linkage Non-Consent Bias 172
7.5.2 Key Points for Controlling for Linkage Non-Consent Bias 172
7.6 Practical Implications for Implementation with Surveys and Censuses 173
7.7 Exercises 174
References 174
Part III Use of Administrative Records in Surveys 179
8 A Register-Based Census: The Swedish Experience 181
Martin Axelson, Anders Holmberg, Ingegerd Jansson, and Sara Westling
8.1 Introduction 181
8.2 Background 182
8.3 Census 2011 183
8.4 A Register-Based Census 185
8.4.1 Registers at Statistics Sweden 185
8.4.2 Facilitating a System of Registers 186
8.4.3 Introducing a Dwelling Identification Key 187
8.4.4 The Census Household and Dwelling Populations 188
8.5 Evaluation of the Census 190
8.5.1 Introduction 190
8.5.2 Evaluating Household Size and Type 192
8.5.2.1 Sampling Design 192
8.5.2.2 Data Collection 193
8.5.2.3 Reconciliation 194
8.5.2.4 Results 194
8.5.3 Evaluating Ownership 195
8.5.4 Lessons Learned 198
8.6 Impact on Population and Housing Statistics 199
8.7 Summary and Final Remarks 201
References 203
9 Administrative Records Applications for the 2020 Census 205
Vincent T. Mule Jr. and Andrew Keller
9.1 Introduction 205
9.2 Administrative Record Usage in the U.S. Census 206
9.3 Administrative Record Integration in 2020 Census Research 207
9.3.1 Administrative Record Usage Determinations 207
9.3.2 NRFU Design Incorporating Administrative Records 208
9.3.3 Administrative Records Sources and Data Preparation 210
9.3.4 Approach to Determine Administrative Record Vacant Addresses 212
9.3.5 Extension of Vacant Methodology to Nonexistent Cases 214
9.3.6 Approach to Determine Occupied Addresses 215
9.3.7 Other Aspects and Alternatives of Administrative Record Enumeration 217
9.4 Quality Assessment 219
9.4.1 Microlevel Evaluations of Quality 219
9.4.2 Macrolevel Evaluations of Quality 221
9.5 Other Applications of Administrative Record Usage 224
9.5.1 Register-Based Census 224
9.5.2 Supplement Traditional Enumeration with Adjustments for Estimated Error for Official Census Counts 224
9.5.3 Coverage Evaluation 225
9.6 Summary 226
9.7 Exercises 227
References 228
10 Use of Administrative Records in Small Area Estimation 231
Andreea L. Erciulescu, Carolina Franco, and Partha Lahiri
10.1 Introduction 231
10.2 Data Preparation 233
10.3 Small Area Estimation Models for Combining Information 238
10.3.1 Area-level Models 238
10.3.2 Unit-level Models 247
10.4 An Application 252
10.5 Concluding Remarks 259
10.6 Exercises 259
Acknowledgments 261
References 261
Part IV Use of Administrative Data in Evidence-Based Policymaking 269
11 Enhancement of Health Surveys with Data Linkage 271
Cordell Golden and Lisa B. Mirel
11.1 Introduction 271
11.1.1 The National Center for Health Statistics (NCHS) 271
11.1.2 The NCHS Data Linkage Program 272
11.1.3 Initial Linkages with NCHS Surveys 272
11.2 Examples of NCHS Health Surveys that Were Enhanced Through Linkage 273
11.2.1 National Health Interview Survey (NHIS) 273
11.2.2 National Health and Nutrition Examination Survey (NHANES) 274
11.2.3 National Health Care Surveys 274
11.3 NCHS Health Surveys Linked with Vital Records and Administrative Data 275
11.3.1 National Death Index (NDI) 276
11.3.2 Centers for Medicare and Medicaid Services (CMS) 276
11.3.3 Social Security Administration (SSA) 277
11.3.4 Department of Housing and Urban Development (HUD) 277
11.3.5 United States Renal Data System and the Florida Cancer Data System 278
11.4 NCHS Data Linkage Program: Linkage Methodology and Processing Issues 278
11.4.1 Informed Consent in Health Surveys 278
11.4.2 Informed Consent for Child Survey Participants 279
11.4.3 Adaptive Approaches to Linking Health Surveys with Administrative Data 280
11.4.4 Use of Alternate Records 281
11.4.5 Protecting the Privacy of Health Survey Participants and Maintaining Data Confidentiality 282
11.4.6 Updates Over Time 283
11.5 Enhancements to Health Survey Data Through Linkage 284
11.6 Analytic Considerations and Limitations of Administrative Data 286
11.6.1 Adjusting Sample Weights for Linkage-Eligibility 287
11.6.2 Residential Mobility and Linkages to State Programs and Registries 288
11.7 Future of the NCHS Data Linkage Program 289
11.8 Exercises 291
Acknowledgments 292
Disclaimer 292
References 292
12 Combining Administrative and Survey Data to Improve Income Measurement 297
Bruce D. Meyer and Nikolas Mittag
12.1 Introduction 297
12.2 Measuring and Decomposing Total Survey Error 299
12.3 Generalized Coverage Error 302
12.4 Item Nonresponse and Imputation Error 305
12.5 Measurement Error 307
12.6 Illustration: Using Data Linkage to Better Measure Income and Poverty 311
12.7 Accuracy of Links and the Administrative Data 312
12.8 Conclusions 315
12.9 Exercises 316
Acknowledgments 317
References 317
13 Combining Data from Multiple Sources to Define a Respondent: The Case of Education Data 323
Peter Siegel, Darryl Creel, and James Chromy
13.1 Introduction 323
13.1.1 Options for Defining a Unit Respondent When Data Exist from Sources Instead of or in Addition to an Interview 324
13.1.2 Concerns with Defining a Unit Respondent Without Having an Interview 325
13.2 Literature Review 326
13.3 Methodology 327
13.3.1 Computing Weights for Interview Respondents and for Unit Respondents Who May Not Have Interview Data (Usable Case Respondents) 327
13.3.1.1 How Many Weights Are Necessary? 328
13.3.2 Imputing Data When All or Some Interview Data Are Missing 328
13.3.3 Conducting Nonresponse Bias Analyses to Appropriately Consider Interview and Study Nonresponse 329
13.4 Example of Defining a Unit Respondent for the National Postsecondary Student Aid Study (NPSAS) 330
13.4.1 Overview of NPSAS 330
13.4.2 Usable Case Respondent Approach 333
13.4.2.1 Results 333
13.4.3 Interview Respondent Approach 335
13.4.3.1 Results 336
13.4.4 Comparison of Estimates, Variances, and Nonresponse Bias Using Two Approaches to Define a Unit Respondent 338
13.5 Discussion: Advantages and Disadvantages of Two Approaches to Defining a Unit Respondent 340
13.5.1 Interview Respondents 340
13.5.2 Usable Case Respondents 341
13.6 Practical Implications for Implementation with Surveys and Censuses 342
13.A Appendix 343
13.A.1 NPSAS:08 Study Respondent Definition 343
13.B Appendix 343
References 348
Index 349