Comparison of Imputation Methods for Handling Missing Categorical Data with Univariate Pattern

Revista de métodos cuantitativos para la economía y la empresa

imputation methods
missing data
Mexico
Author

Juan Armando Torres Munguía

Published

June 2, 2014

DOI

10.46661/revmetodoscuanteconempresa.2196

Abstract

This paper examines sample proportion estimates in the presence of univariate missing categorical data. A database on smoking habits (the 2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets with missingness rates of 5% and 15%, each under the MCAR, MAR and MNAR mechanisms (missing completely at random, missing at random and missing not at random). The performance of six methods for addressing missingness was then evaluated: listwise deletion, mode imputation, random imputation, hot-deck imputation, imputation by polytomous regression and random forests. The results showed that, in most of the scenarios assessed, the most effective methods for dealing with missing categorical data were the hot-deck and polytomous regression approaches.
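
As a rough illustration of the kind of experiment the abstract describes, the following Python sketch deletes values from a single categorical variable under each of the three mechanisms and compares the resulting sample proportions for listwise deletion, mode imputation and random imputation. It is a minimal sketch and not the author's code: the synthetic data, the category labels ("never", "former", "daily"), the binary covariate, the 3:1 deletion weights and all function names are assumptions made for illustration, and the hot-deck, polytomous regression and random forest methods are not covered.

```python
import numpy as np

# Hypothetical category labels; not taken from the survey.
CATEGORIES = np.array(["never", "former", "daily"])


def missing_mask(target, covariate, rate, mechanism, rng):
    """Return a boolean mask marking which entries of `target` to delete."""
    n = len(target)
    k = int(round(rate * n))  # number of values to delete
    if mechanism == "MCAR":
        # Every unit is equally likely to be missing.
        weights = np.ones(n)
    elif mechanism == "MAR":
        # Missingness depends only on a fully observed covariate.
        weights = np.where(covariate == 1, 3.0, 1.0)
    elif mechanism == "MNAR":
        # Missingness depends on the unobserved value itself.
        weights = np.where(target == "daily", 3.0, 1.0)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    idx = rng.choice(n, size=k, replace=False, p=weights / weights.sum())
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask


def proportions(values, categories=CATEGORIES):
    """Sample proportion of each category."""
    return {str(c): float(np.mean(values == c)) for c in categories}


def impute_mode(values, mask):
    """Fill deleted entries with the most frequent observed category."""
    cats, counts = np.unique(values[~mask], return_counts=True)
    filled = values.copy()
    filled[mask] = cats[np.argmax(counts)]
    return filled


def impute_random(values, mask, rng):
    """Fill deleted entries with draws from the observed values."""
    filled = values.copy()
    filled[mask] = rng.choice(values[~mask], size=int(mask.sum()))
    return filled


# Synthetic stand-in for the survey variable (assumed proportions).
rng = np.random.default_rng(2014)
target = rng.choice(CATEGORIES, size=2000, p=[0.60, 0.25, 0.15])
covariate = rng.integers(0, 2, size=2000)

print("complete data:", proportions(target))
for mechanism in ("MCAR", "MAR", "MNAR"):
    for rate in (0.05, 0.15):
        mask = missing_mask(target, covariate, rate, mechanism, rng)
        print(mechanism, rate)
        print("  listwise:", proportions(target[~mask]))
        print("  mode:    ", proportions(impute_mode(target, mask)))
        print("  random:  ", proportions(impute_random(target, mask, rng)))
```

Comparing each printed set of proportions against the complete-data proportions gives a simple view of the bias each method introduces under a given mechanism and rate; a fuller study would repeat the deletion many times and average the errors.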

Citation

Torres Munguía, Juan Armando (2014). Comparison of Imputation Methods for Handling Missing Categorical Data with Univariate Pattern. Revista de métodos cuantitativos para la economía y la empresa (17). DOI: 10.46661/revmetodoscuanteconempresa.2196. Retrieved from: https://www.upo.es/revistas/index.php/RevMetCuant/article/view/2196