Our results using the full censored likelihood indicate that the ecological association between total mercury emissions and 2000-2001 autism in Texas school districts may be much smaller than previously reported. Published associations between precipitation and autism are also dependent on censored data and could be affected by similar biases. RR estimates for the ecological association using the censored likelihood were fairly similar to the results using the ad hoc substitution approach when values near the middle of the censoring interval were chosen. However, confidence intervals were (appropriately) wider for the Bayesian likelihood-based approach than for the substitution approaches. Exclusion of all censored values produced similar central RR estimates to the Bayesian approach but with artificially narrow confidence intervals.
However, all results should also be viewed as ecologic effect estimates that may not be representative of individual level effects due to aggregation bias. Another consequence of the group-level analysis is that uncertainty regarding individual-level associations between mercury exposure and autism in Texas is much greater than suggested by the group-level confidence intervals reported here and in previous publications [3
These results were surprising to us given that the cutoff for censoring was so low (5 students), but they may be explained by an apparent association between mercury emissions and the presence of censoring, whereby those districts with small autism counts tended to have lower mercury emissions. This effect may be better understood through inspection of Figure 1, which shows the data and model fit using a variety of fixed value substitution approaches. Because many of the censored values occur in counties with low mercury emissions, the overall intercept from the mixed effects Poisson regression is highly sensitive to the choice of fixed value substituted for the censored observations. In contrast, there are relatively few censored observations in counties with higher mercury releases, leading to a more stable position of the right-hand side of the prediction curve. The result appears to be a forcing of the slope (the log relative risk) to higher values when the intercept is lower.
Figure 1. Autism prevalence versus Toxic Release Inventory reported air mercury emissions, with threes substituted for censored values (c = 3) for all data points. Lines show the covariate-adjusted random effects Poisson model predictions using five different fixed ...
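The sensitivity of the estimated association to the substituted value can be illustrated with a toy calculation. The sketch below is not the covariate-adjusted random effects model used in the analysis: it compares pooled crude prevalence in two groups of hypothetical districts, and all populations, counts, and function names are invented for illustration. Censored counts are concentrated in the low-emission group, so a smaller substituted value c lowers the baseline rate and raises the crude rate ratio.

```python
# Hypothetical toy data (NOT the Texas data): (student population, autism count).
# A count of None marks an administratively censored value.
low_emission = [(1000, None), (1200, None), (900, None), (5000, 12), (800, None)]
high_emission = [(4000, 11), (6000, 15), (1100, None), (7000, 20)]


def crude_rate(districts, c):
    """Pooled prevalence after replacing each censored count with the fixed value c."""
    cases = sum(c if count is None else count for _, count in districts)
    students = sum(pop for pop, _ in districts)
    return cases / students


def crude_rate_ratio(c):
    """Crude high-versus-low emission rate ratio under substitution value c."""
    return crude_rate(high_emission, c) / crude_rate(low_emission, c)


# Print the crude rate ratio for each candidate substitution value.
for c in range(0, 5):
    print(c, round(crude_rate_ratio(c), 3))
```

In this toy setting the rate ratio falls steadily as c increases, because the substituted value mostly changes the low-emission (baseline) group.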
Although both total mercury releases and air mercury emissions are poor surrogates for individual mercury exposures, air mercury emissions may be a better proxy for concurrent exposures than total mercury releases. Total mercury releases include landfill disposals and other releases that may not lead to widespread or immediate public exposure, especially if compliant with the 1976 Resource Conservation and Recovery Act (P.L. 94–580) and its 1984 amendment. In contrast, air mercury emissions are more widely and rapidly dispersed with wind and rainfall and are, therefore, of the two, more plausible sources of human exposure in the short term. Surface water mercury emissions may also be quickly dispersed, but these emissions were relatively small in Texas in 2001: only 25% of Texas counties reported any surface water mercury emissions, and the largest annual release was 16 lbs. Air mercury releases occurred in much larger quantities, with reported releases in most Texas counties, a mean annual release of 288 lbs., and a maximum annual release of 1579 lbs. Although these analyses are limited by the ecological nature of the data and the crude measure of exposure, we believe that the air mercury emissions are the more relevant measure within these limitations. Researchers considering similar analyses should also consider time-lagged comparisons of autism counts with emissions/exposure estimates from previous years to allow enough time for environmental transport, autism development, and diagnosis to have occurred [12
The three substitution and exclusion approaches happen to provide a rough approximation to the central RR estimate for these particular data, but likelihood-based approaches for handling censored data produce more realistic confidence intervals and have a stronger theoretical basis. All of the previous and current approaches, however, depend on common model assumptions such as Poisson distributed counts, log linearity of the predictors, and adequate control of confounding. The third assumption is of most concern in the present analysis, given the limited understanding of autism risk factors and a lack of any individual-level data on mercury exposure or confounders.
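To make the contrast with substitution concrete, the following is a minimal sketch of an interval-censored Poisson log-likelihood. It omits the random effects, covariates, and priors of the full Bayesian analysis. The censoring interval (counts from `lower` to `upper`, set here to 1 through 4) and all function names are assumptions made for illustration. An observed count contributes its Poisson probability, whereas a censored count contributes the total probability of the whole censoring interval, so no single value is treated as if it had been observed.

```python
import math


def poisson_logpmf(y, mu):
    """Log of the Poisson probability mass function at count y with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def censored_poisson_loglik(counts, means, lower=1, upper=4):
    """Log-likelihood for counts in which None marks a censored value.

    counts: observed count per district, or None if censored.
    means:  expected count per district under the model.
    lower, upper: assumed (hypothetical) bounds of the censoring interval.
    """
    total = 0.0
    for y, mu in zip(counts, means):
        if y is None:
            # Censored: sum the probability over every count in the interval.
            p = sum(math.exp(poisson_logpmf(k, mu)) for k in range(lower, upper + 1))
            total += math.log(p)
        else:
            total += poisson_logpmf(y, mu)
    return total


# Hypothetical example: two observed districts and one censored district.
print(censored_poisson_loglik([12, None, 7], [10.0, 2.5, 8.0]))
```

Maximizing or sampling from this likelihood over the regression parameters that determine the means is what allows the extra uncertainty from censoring to reach the interval estimates.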
Ecological studies of autism are not the only studies where censored disease counts are a threat to validity. All special education category counts reported by Texas are administratively censored, and the same practice appears to be followed by some other states. Other count data reported by states may be affected by similar issues. Moreover, efforts to protect privacy under the Health Insurance Portability and Accountability Act of 1996 (P.L. 104–191) may lead to administrative censoring of medical surveillance data when counts are extremely low. With the widespread computerization of large data sets, ecological analyses are becoming ever easier to conduct and will likely appear more frequently in the scientific literature despite their limitations.
Censoring is also a common issue in environmental health when chemical concentrations are measured, because low concentrations are reported as below the “limit of detection” (LOD). If censored values are excluded, calculated means will generally be biased upwards in this setting. Environmental health researchers have a long tradition of substituting zeros, LOD/2, LOD/√2, or LOD for these censored values, treating the substituted values as if they were actually observed. The substitution approach is easy to implement but is inferior to formal likelihood-based censored data analysis in that it may also produce biased estimates and always fails to capture the uncertainty associated with measurements below the LOD, for example when calculating confidence intervals [14]. However, the adverse effects of substitution or exclusion may be negligible when both of the following conditions are met: (1) few samples are below the cutoff for censoring and (2) the cutoff for censoring is small relative to most of the measurements.
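The behavior of exclusion and fixed-value substitution for LOD-censored measurements can be sketched with a small example. The concentrations, the LOD of 1.0, and the function names below are hypothetical and chosen only for illustration. The sketch shows point estimates of the mean only, and it does not address the understated uncertainty that substitution also produces.

```python
import math


def mean_with_substitution(values, lod, substitute):
    """Mean after replacing every value below the LOD with a fixed substitute."""
    filled = [substitute if v < lod else v for v in values]
    return sum(filled) / len(filled)


def mean_excluding_censored(values, lod):
    """Mean of only those values at or above the LOD."""
    kept = [v for v in values if v >= lod]
    return sum(kept) / len(kept)


# Hypothetical true concentrations; in practice values below the LOD are unknown.
true_values = [0.2, 0.4, 0.7, 0.9, 1.5, 2.0, 3.2, 4.8, 6.1, 9.0]
lod = 1.0

true_mean = sum(true_values) / len(true_values)
print("true mean:", round(true_mean, 3))
print("exclusion:", round(mean_excluding_censored(true_values, lod), 3))
for label, sub in [("zero", 0.0), ("LOD/2", lod / 2),
                   ("LOD/sqrt(2)", lod / math.sqrt(2)), ("LOD", lod)]:
    print(label, round(mean_with_substitution(true_values, lod, sub), 3))
```

With four of ten values censored, exclusion overstates the mean considerably, while the substitution estimates bracket the true mean but depend directly on the arbitrary choice of substitute.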