These methods are demonstrated using the Healthcare Cost and Utilization Project (HCUP) Kids' Inpatient Database (KID) for the year 2000 (HCUP Kids' Inpatient Database (KID) 2000). This dataset, which is publicly available for a fee from the Agency for Healthcare Research and Quality, collects data from states on child hospitalizations to improve the quality of health care. We investigated which factors predicted whether a pediatric subject with a psychiatric or substance abuse diagnosis had a routine discharge from the hospital.
More specifically, we included all 10–20 year-old subjects with a Clinical Classifications Software (CCS) category for primary, secondary or tertiary diagnosis equal to (66) alcohol-related mental disorders, (67) substance-related mental disorders, (68) senility and organic mental disorders, (69) affective disorders, (70) schizophrenia and related disorders, (71) other psychoses, (72) anxiety; somatoform; dissociative; and personality disorders; (73) pre-adult disorders, (74) other mental conditions, or (75) personal history of mental disorder; mental and behavioral problems; observation and screening for mental condition.
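The inclusion rule above can be sketched in a few lines of Python. This is only an illustration of the stated criteria, not the code used for the analysis: the field names `DXCCS1` to `DXCCS3` for the primary, secondary and tertiary CCS categories are assumed, and the records are made up.

```python
# CCS categories 66 through 75 are the diagnoses listed in the inclusion criteria.
MENTAL_HEALTH_CCS = frozenset(range(66, 76))

# Assumed (hypothetical) field names for the primary, secondary and tertiary CCS categories.
CCS_FIELDS = ("DXCCS1", "DXCCS2", "DXCCS3")


def in_cohort(record):
    """Return True for a 10-20 year-old subject with a qualifying CCS
    category among the first three diagnoses."""
    age = record.get("AGE")
    if age is None or not 10 <= age <= 20:
        return False
    return any(record.get(field) in MENTAL_HEALTH_CCS for field in CCS_FIELDS)


# Made-up records, for illustration only.
records = [
    {"AGE": 16, "DXCCS1": 69, "DXCCS2": None, "DXCCS3": None},  # included
    {"AGE": 9, "DXCCS1": 69, "DXCCS2": None, "DXCCS3": None},   # too young
    {"AGE": 18, "DXCCS1": 122, "DXCCS2": 66, "DXCCS3": None},   # secondary diagnosis qualifies
    {"AGE": 15, "DXCCS1": 122, "DXCCS2": 125, "DXCCS3": None},  # no qualifying diagnosis
]
cohort = [r for r in records if in_cohort(r)]
```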
The outcome in our model was routine discharge vs. non-routine discharge (including transfer to a short-term hospital or other facility, release to home health care, dying in hospital or leaving against medical advice).
Predictors in the logistic regression included an indicator of gender (FEMALE, 1=female, 0=male), AGE (in years), length of stay (LOS, in days), admission type (ATYPE, 1=emergency, 2=urgent, 3=elective), admission month (AMONTH, used to derive season of admission, NSEASON), admission on weekend (WEEKEND, 1=Saturday or Sunday, 0=otherwise), number of diagnoses on original record (NDX), race/ethnicity (RACE, 1=white, 2=black, 3=Hispanic, 4=other), and total charges (TOTCHG, in dollars).
4.1 Descriptive statistics
The accompanying table provides descriptive statistics for the observed data from the KID dataset. More than four-fifths of the sample were discharged in a routine fashion and one-fifth were admitted on a weekend; more than half were female and two-thirds were white/Caucasian. The average age was 16 years, and the distributions of length of stay, total charges and number of diagnoses were all skewed to the right.
Descriptive statistics for KID dataset
4.2 Missing data
A total of 133,774 observations were recorded. Data were complete for the ROUTINE, FEMALE, AGE, LOS, WEEKEND and NDX variables.
There were missing values for TOTCHG (4% of the dataset), ATYPE (11%), RACE (16%) and NSEASON (12%). AMONTH and ATYPE were missing by design, since some states restrict the availability of information to minimize the possibility of inadvertent reidentification of subjects in smaller hospitals, while some states prohibit reporting data on RACE. A total of 79,574 observations (59%) had complete data.
Because LogXact requires variables with missing values to have no more than 5 levels (coded 0, 1, ..., 4), the variable AMONTH was recoded into a variable NSEASON, where Winter was defined as December, January or February, Spring as March, April or May, etc.
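As a minimal sketch of this recoding, the function below maps a month number to a season code. The numeric codes (0=Winter, 1=Spring, 2=Summer, 3=Fall) and the Summer and Fall month groupings are assumptions that follow the pattern described above; the text does not give them explicitly.

```python
def month_to_season(month):
    """Map an admission month (1-12) to an assumed season code:
    0=Winter (Dec-Feb), 1=Spring (Mar-May), 2=Summer (Jun-Aug), 3=Fall (Sep-Nov).
    A missing month (None) stays missing."""
    if month is None:
        return None
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month!r}")
    # month % 12 sends December to 0 so that it groups with January and February.
    return (month % 12) // 3


# Season code for each calendar month, January through December.
season_by_month = {m: month_to_season(m) for m in range(1, 13)}
```

Keeping missing months as missing, rather than assigning them a season, preserves the missingness that the later analyses are meant to address.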
The accompanying Stata output displays the pattern of missing data; a similar presentation can be created with SAS, R or S-Plus.
Description of missing data (using Stata misschk function)
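For readers without access to Stata, the idea behind such a display can be sketched in plain Python: count how many observations share each combination of missing variables. This is an illustrative sketch only, not a reimplementation of misschk; the helper names are hypothetical and the records are made up rather than taken from the KID data.

```python
from collections import Counter

# Variables reported above as having missing values.
INCOMPLETE_VARS = ("TOTCHG", "ATYPE", "RACE", "NSEASON")


def missing_pattern(record, variables=INCOMPLETE_VARS):
    """Return the tuple of variable names that are missing (None) in a record."""
    return tuple(v for v in variables if record.get(v) is None)


def tabulate_patterns(records, variables=INCOMPLETE_VARS):
    """Count how many records share each missing-data pattern."""
    return Counter(missing_pattern(r, variables) for r in records)


# Made-up records, for illustration only.
toy_records = [
    {"TOTCHG": 5200.0, "ATYPE": 1, "RACE": 1, "NSEASON": 0},
    {"TOTCHG": 8100.0, "ATYPE": 1, "RACE": None, "NSEASON": 2},
    {"TOTCHG": None, "ATYPE": 3, "RACE": 2, "NSEASON": 1},
    {"TOTCHG": 3000.0, "ATYPE": None, "RACE": None, "NSEASON": None},
    {"TOTCHG": 4100.0, "ATYPE": 2, "RACE": None, "NSEASON": 3},
]

patterns = tabulate_patterns(toy_records)
n_complete = patterns[()]  # the empty pattern corresponds to complete cases

# Print patterns from most to least common.
for pattern, count in patterns.most_common():
    label = ", ".join(pattern) if pattern else "(complete)"
    print(f"{count:3d}  {label}")
```

The count for the empty pattern is the number of observations that a complete case analysis would retain.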
The accompanying output displays the results from the complete case estimator (n=79,017). However, the complete case estimator excludes incomplete observations from the analysis, even though almost all of these subjects have complete data on the outcome and on all but one or two predictors. We now review software implementations that incorporate these incomplete observations.
Results from complete case estimator (Stata)