The National Survey of Children with Special Health Care Needs (NSCSHCN) provides a unique opportunity to investigate issues related to children with special health care needs (CSHCN) and their families. Among many substantively important uses of this survey, our particular problem pertains to drawing inferences on the correlates of unmet health care and family service needs for all CSHCN. Following a broad definition released by the US Maternal and Child Health Bureau [1], CSHCN are those children who require services beyond that required by children generally because of a chronic physical, developmental, behavioral, or emotional condition. Using this definition, approximately 12.8% of children in the USA had a special health care need in 2001 [2]. Previous studies [3,4] show that CSHCN use more health care services and that their families experience a variety of consequences of caring for a child with SHCN such as lost employment and increased financial burden.
Our proposed method is largely motivated by the need to make full use of the observed data in the NSCSHCN. A complete-case analysis would use only 68% of the sample. As documented by many researchers, such analyses have potentially undesirable inferential properties, including bias and distorted estimates of uncertainty. An increasingly popular inferential method that makes full use of the observed data is multiple imputation (MI) [5]. Briefly, MI is a simulation-based inferential tool operating on M > 1 ‘completed’ data sets, in which the missing values are replaced by random draws from their respective predictive distributions (e.g., the posterior predictive distribution of the missing data). These M versions of the completed data are then analyzed by standard complete-data methods, and the results are combined into a single inferential statement using combining rules that yield estimates, standard errors, and p-values that formally incorporate the missing-data uncertainty into the modeling process [5]. The key ideas and the advantages of MI are given by [5,6].
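The combining step described above can be sketched for a single scalar quantity as follows. This is a minimal illustration, assuming the standard MI combining rules: the pooled estimate is the mean of the M completed-data estimates, and the total variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance. The function name `pool_rubin` and its arguments are hypothetical and are not taken from any MI software.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M completed-data results for one scalar parameter.

    estimates: the M point estimates, one per completed data set.
    variances: the M squared standard errors, one per completed data set.
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need M > 1 estimates and one variance per estimate")

    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor M - 1)
    total = u_bar + (1 + 1 / m) * b

    # Degrees of freedom of the reference t distribution; infinite when the
    # M estimates coincide, i.e. there is no between-imputation variability.
    if b > 0:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    else:
        df = float("inf")
    return q_bar, sqrt(total), df
```

The between-imputation term is what separates the pooled standard error from a naive average of the M standard errors: it carries the extra uncertainty due to the missing data.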
From a practitioner’s perspective, MI is a flexible and convenient solution. Growing availability of MI software (e.g., SAS PROC MI) [9] has contributed greatly to its popularity. The underlying normality assumption, however, is probably the single most violated assumption of this procedure. In most health surveys, for example, most items subject to incomplete data are the items measured categorically using either nominal or ordinal scales. As demonstrated in the forthcoming sections, utilizing incorrect distributional assumptions can pose a great inferential threat. Our methods aim to remove this threat with minimal adjustments to the current methods.
Our imputation methods are specifically developed for incompletely observed categorical variables. These methods extend the calibration techniques developed by [7] for binary imputation to ordinal and nominal variable imputation. Our methods are largely motivated by practical purposes and allow practitioners to adopt them with minimal programming. The unique contribution of these techniques is the ability to employ imputation techniques based on a Gaussian (or any other working) distribution, which avoid the common computational problems due to dimensionality and/or scarcity [6,7].
The substantive focal point of our methodology is the study of racial disparities within the context of unmet needs for all CSHCN and for CSHCN with severe conditions. Understanding the disparities in these outcomes is an important topic in health services research, and incomplete values on the key race variable introduce analytical difficulties and create a potential source of statistical bias if no sound action is taken. Descriptive summaries of the selected survey items from the NSCSHCN can be seen in Table I.
Table I. Summary of selected characteristics for all CSHCN with severe conditions among those with complete race values versus those with missing race values. Children with more severe conditions were selected if screened positive for either of the following items: ...
One of the most frequently asked questions in the analysis of incomplete data pertains to the implications of conducting analyses with only complete cases. To gain empirical insight into such implications, practitioners can analyze the missingness patterns. In our application, for example, the distributions of key variables differ between cases with observed and missing values (e.g., the rate of severe conditions among CSHCN with observed versus missing race), which raises a valid concern about misleading conclusions from an analysis that ignores incomplete cases. Biased inferences and under-powered conclusions are the usual adverse outcomes of such analyses.
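A simple way to inspect a missingness pattern of this kind is to compare the rate of a binary characteristic between cases with an observed value and cases with a missing value on the incomplete variable. The sketch below is illustrative only: the function name `rate_by_missingness`, the field names, and the toy records are hypothetical and are not drawn from the NSCSHCN.

```python
def rate_by_missingness(records, incomplete_var, indicator):
    """Compare the mean of a 0/1 indicator across missingness groups.

    records: list of dicts; a value of None on `incomplete_var` marks a
        missing value.
    Returns (rate among observed cases, rate among missing cases).
    """
    observed = [r[indicator] for r in records if r[incomplete_var] is not None]
    missing = [r[indicator] for r in records if r[incomplete_var] is None]
    if not observed or not missing:
        raise ValueError("both missingness groups must be non-empty")
    return sum(observed) / len(observed), sum(missing) / len(missing)


# Toy records, invented for illustration (not survey data).
toy_records = [
    {"race": "A", "unmet_need": 0},
    {"race": "B", "unmet_need": 0},
    {"race": "A", "unmet_need": 1},
    {"race": "B", "unmet_need": 0},
    {"race": None, "unmet_need": 1},
    {"race": None, "unmet_need": 0},
]

rate_observed, rate_missing = rate_by_missingness(toy_records, "race", "unmet_need")
```

A large gap between the two rates suggests that dropping the incomplete cases would change the composition of the analyzed sample.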
Although missing data exist in most of the NSCSHCN variables, we illustrate our calibration technique using a single variable, race, which is measured on a nominal scale. Although the level of missing data on the race variable is relatively low (around 4.5%), cases with missing race had significantly higher rates of unmet health needs and of severe conditions. Table I summarizes descriptive statistics and shows that cases with missing race values do not follow the same distribution as cases with complete race values. Among the most notable differences, we see that cases with missing values have lower levels of maternal education, higher poverty levels, and higher rates of public/private and only private insurance. A significantly higher rate of Hispanics is also seen in the cases with missing race information. These issues pose real analytical problems for health services researchers who face incomplete race data in disparity studies. In addition, as Section 5 elaborates, our methods can easily accommodate arbitrary missing values in other types of variables.
The remainder of this article is organized as follows. In the next section, we summarize the software and previous methods for rounding under the multivariate normal (MVN) distribution. Section 3 then describes our rounding strategy, which is based on calibration of the marginal distribution, and states how it can be implemented for nominal and ordinal variables. Section 4 summarizes the results from a simulation study assessing the performance of our method. Section 5 demonstrates an application using the NSCSHCN to explore the correlates of self-reported CSHCN with severe conditions. Finally, Section 6 discusses the strengths and weaknesses of this approach.