Should the Missing-Indicator Method be Preferred to Complete Case Analysis When Handling Missingness in a Categorical Regressor?
Keywords: categorical data, missing data, missingness at random, missingness not at random, complete case analysis, missing indicator method, regression analysis, simulated data, statistical experiment, Monte Carlo technique, bias, coverage
When a categorical regressor has missing values, which approach is preferable: complete case analysis or the missing-indicator method? Complete case analysis includes in the analysis (linear regression, in our study) only the cases with no missing values on any of the analyzed variables. It is the default in many statistical applications, and although its applicability is often considered rather restricted, recent studies provide evidence that it is widely applicable, even when data are missing not at random. Under the missing-indicator method, missing values are replaced with a single valid value and a new missing-indicator variable is created; the method is presented as an alternative that keeps the full sample available for analysis and, hypothetically, does not degrade the parameter estimates. Using simulated data and a statistical experiment that controls the missingness mechanism, the proportion of missing values, and the specification of the regression model, we compare the parameter estimates produced by each approach in terms of bias and inefficiency. According to the results, neither approach leads to critically biased estimates, but the missing-indicator method produces inefficient estimates.
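The two approaches compared above can be illustrated with a minimal sketch. It is not the authors' simulation design: the sample size, the three-level regressor, the true coefficients, the missingness proportion, and the function names (`simulate`, `complete_case`, `missing_indicator`) are all assumptions chosen for illustration, and missingness is generated completely at random for simplicity.

```python
import numpy as np

def simulate(n=2000, miss_prop=0.3, seed=0):
    """Hypothetical data: y depends on a 3-level categorical regressor x."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 3, size=n)            # categories 0, 1, 2 (0 is the reference)
    y = 1.0 + 0.5 * (x == 1) + 1.0 * (x == 2) + rng.normal(size=n)
    missing = rng.random(n) < miss_prop       # assumed mechanism: completely at random
    return x, y, missing

def ols(X, y):
    """Ordinary least squares coefficients via a least-squares solve."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def complete_case(x, y, missing):
    """Drop every case whose x is missing, then fit the regression."""
    keep = ~missing
    X = np.column_stack([np.ones(keep.sum()), x[keep] == 1, x[keep] == 2]).astype(float)
    return ols(X, y[keep])                    # [intercept, b1, b2]

def missing_indicator(x, y, missing):
    """Fill missing x with one valid value and add a missingness dummy."""
    x_filled = np.where(missing, 0, x)        # single valid value: the reference category
    X = np.column_stack(
        [np.ones(len(y)), x_filled == 1, x_filled == 2, missing]
    ).astype(float)
    return ols(X, y)                          # [intercept, b1, b2, indicator]

x, y, missing = simulate()
print("complete case:    ", complete_case(x, y, missing))
print("missing indicator:", missing_indicator(x, y, missing))
```

With a single categorical regressor and no other covariates, both designs are saturated in the observed categories, so the intercept and category coefficients coincide. The approaches can differ once further regressors enter the model, which is why the model specification is one of the controlled factors. Repeating the simulation over many seeds and missingness mechanisms would give a Monte Carlo comparison of bias and efficiency.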
Acknowledgements. The publication was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2020 (grant No. 20-04-016) and within the framework of the Russian Academic Excellence Project «5–100».
Copyright (c) 2021 Monitoring of Public Opinion: Economic and Social Changes Journal (Public Opinion Monitoring) ISSN 2219-5467
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.