We obtained daily counts of hospital admissions for the period 2000–2006 from billing claims of enrollees in the U.S. Medicare system. Because the Medicare data analyzed for this study did not include individual identifiers, we did not obtain consent from individuals. This study was reviewed and exempted by the Institutional Review Board at the Johns Hopkins Bloomberg School of Public Health.
Each billing claim contains the date of service, disease classification using International Classification of Diseases, 9th Revision (ICD-9) codes (Centers for Disease Control and Prevention 2008), age, and county of residence. We considered two broad classes of outcomes based on ICD-9 codes: urgent or emergency cardiovascular admissions and urgent or emergency respiratory admissions. The classification of “urgent” and “emergency” is designated directly on each Medicare hospital admissions record. We excluded other classifications, such as “elective.” A recent study (Dominici et al. 2006) considered a number of different cardiovascular and respiratory outcomes. Because of the sparser sampling of the PM2.5 component data compared with the PM2.5 total mass data, to obtain sufficient statistical power we collapsed the data into two broad categories of hospital admissions: a) cardiovascular disease (CVD), which includes heart failure (ICD-9 code 428), heart rhythm disturbances (426–427), cerebrovascular events (430–438), ischemic heart disease (410–414, 429), and peripheral vascular disease (440–448); and b) respiratory diseases, which include chronic obstructive pulmonary disease (490–492) and respiratory infection (464–466, 480–487). We excluded admissions for injuries and for external causes (800–849). By collapsing these health outcomes, we increased statistical power and obtained more stable estimates of risk at the cost of some specificity of the outcome.
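The outcome definition above can be sketched as a small lookup. This is only an illustration: the function name, the lowercase admission-type labels, and the assumption that codes arrive as strings with an optional decimal part (such as "428.0") are ours, not details of the Medicare files.

```python
# Three-digit ICD-9 category ranges taken from the outcome definitions in the text.
CVD_RANGES = {
    "heart failure": [(428, 428)],
    "heart rhythm disturbances": [(426, 427)],
    "cerebrovascular events": [(430, 438)],
    "ischemic heart disease": [(410, 414), (429, 429)],
    "peripheral vascular disease": [(440, 448)],
}
RESPIRATORY_RANGES = {
    "chronic obstructive pulmonary disease": [(490, 492)],
    "respiratory infection": [(464, 466), (480, 487)],
}
OUTCOMES = (("cardiovascular", CVD_RANGES), ("respiratory", RESPIRATORY_RANGES))


def classify_admission(icd9_code, admission_type):
    """Return 'cardiovascular', 'respiratory', or None for one claim.

    Hypothetical helper: the input formats are assumptions for illustration.
    """
    # Keep only urgent or emergency admissions (drops e.g. "elective").
    if admission_type not in ("urgent", "emergency"):
        return None
    # Use the three-digit category, so "428.0" is treated as 428.
    try:
        category = int(str(icd9_code).split(".")[0])
    except ValueError:
        return None  # non-numeric codes fall outside both outcome groups
    for outcome, groups in OUTCOMES:
        for ranges in groups.values():
            if any(low <= category <= high for low, high in ranges):
                return outcome
    return None
```

Codes in the 800–849 range, like any code outside the listed ranges, return None and so drop out of both outcome series.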
We analyzed each outcome (respiratory or cardiovascular admissions) separately. We calculated the daily counts of hospitalizations by summing the hospital admissions for each disease of interest recorded as a primary diagnosis. To calculate daily hospitalization rates, we constructed a parallel time series of the numbers of individuals enrolled in Medicare that were at risk in each county on each day. We based the location of each hospital admission on the county of residence of the enrollee.
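As a rough sketch of the count and rate construction described above, the following pairs daily admission counts with the parallel at-risk series. The data layout, the function name, and the per-100,000 scaling are assumptions made for illustration.

```python
from collections import Counter


def daily_rates(admissions, at_risk):
    """Compute daily counts and rates per county.

    admissions: iterable of (county, date) pairs, one per admission with a
        qualifying primary diagnosis, located by the enrollee's county of residence.
    at_risk: dict mapping (county, date) -> number of enrollees at risk.
    Returns a dict mapping (county, date) -> (count, rate per 100,000 enrollees).
    """
    counts = Counter(admissions)
    rates = {}
    for key, n_at_risk in at_risk.items():
        if n_at_risk <= 0:
            continue  # no rate is defined without a population at risk
        count = counts.get(key, 0)  # days without admissions count as zero
        rates[key] = (count, 1e5 * count / n_at_risk)
    return rates
```

Iterating over the at-risk series rather than over the admissions keeps zero-admission days in the time series.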
The U.S. EPA established the PM Speciation Trends Network (STN) to measure more than 50 PM2.5 chemical components, in addition to total mass. The STN includes > 50 national air monitoring stations (NAMS) and > 200 state and local air monitoring stations (SLAMS) (U.S. EPA 1999). Air pollution concentrations were typically measured on a 1-in-3–day schedule in the NAMS and on a 1-in-6–day schedule in the SLAMS. We removed suspect data and extreme values from the original monitor records; monitors with very little data were omitted altogether. Full details of the construction of the database can be found elsewhere (Bell et al. 2007). We also used PM2.5 total mass measurements from the U.S. EPA’s Air Quality System as in our previous analyses (Dominici et al. 2006). Of the 187 counties described in the Bell et al. (2007) analysis, we restricted the present analysis to counties with general populations larger than 150,000 and with at least 100 observations on components of PM2.5. These requirements ensured that we would have enough data in a particular location to estimate an association between PM2.5 components and hospital admissions. The study population consisted of 12 million Medicare enrollees living in 119 urban counties in the United States ().
U.S. counties with populations larger than 150,000 for which sufficient hospital admissions and PM2.5 chemical component data were available, 2000–2006 (119 total).
We limited our analysis to the components making up a large fraction of the total PM2.5 mass or covarying with total mass (Bell et al. 2007): sulfate, nitrate, silicon, elemental carbon (EC), organic carbon matter (OCM), sodium ion, and ammonium ion. These seven components, in aggregate, constituted 83% of the total PM2.5 mass, whereas all other components individually contributed < 1%. We computed countywide averages for each of these components and for PM2.5 total mass by averaging the daily values from all monitors in a county. We adjusted organic carbon measurements for field blanks to estimate OCM. We used a standard approach such that OCM = k(OCm − OCb), where OCM represents organic carbon matter, OCm represents measured organic carbon, OCb represents organic carbon for blank filters, and k is the adjustment factor to account for non-carbon organic matter. We applied a k value of 1.4, as in a previous analysis (Bell et al. 2007). We obtained temperature and dew-point temperature data from the National Climatic Data Center on the Earth-Info CD database (EarthInfo 2006).
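A minimal sketch of the blank adjustment and the countywide averaging follows. It reads the adjustment as OCM = k(OCm − OCb), which is what the definitions above imply; the function names and the dictionary layout are hypothetical.

```python
from statistics import mean

K_ADJUSTMENT = 1.4  # adjustment factor stated in the text


def organic_carbon_matter(oc_measured, oc_blank, k=K_ADJUSTMENT):
    """Blank-corrected organic carbon, scaled up for non-carbon organic matter."""
    return k * (oc_measured - oc_blank)


def county_daily_average(monitor_values):
    """Average daily values across all monitors in one county.

    monitor_values: dict monitor_id -> dict date -> value.
    Assumes a day without a measurement is simply absent from a monitor's
    dict, so each daily mean uses whichever monitors reported that day.
    """
    by_date = {}
    for series in monitor_values.values():
        for date, value in series.items():
            by_date.setdefault(date, []).append(value)
    return {date: mean(values) for date, values in by_date.items()}
```

For example, a measured value of 5.0 with a blank of 1.0 gives an OCM of about 5.6 in the same units.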
As a check on the consistency of the chemical component data, we first assessed whether three different PM2.5 indicators (four scenarios total) provided comparable estimates of the short-term associations of PM2.5 with cardiovascular and respiratory admissions: PM2.5 (1), PM2.5 measured by the national PM2.5 monitoring network for the period 1999–2006; PM2.5 (1a), PM2.5 (1) for the period 2000–2006 and including only days with available measurements for all the seven PM2.5 components from the STN; PM2.5 (2), PM2.5 measured by the STN for the period 2000–2006 and including only days with available measurements for all the seven PM2.5 components from the STN; and PM2.5 (3), PM2.5 estimated as the sum of the seven largest components of PM2.5 mass for the period 2000–2006. Significant differences between these estimates would raise uncertainty as to the recorded values of PM2.5 total mass and its components. The estimates obtained under the scenarios 1a, 2, and 3 use data on the same subset of days. Each of these measures of PM2.5 was available in all 119 counties.
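The PM2.5 (3) indicator, the sum of the seven components on days when all seven were measured, could be computed along these lines. The component keys and the data layout are illustrative assumptions.

```python
# Illustrative keys for the seven components named in the text.
COMPONENTS = (
    "sulfate", "nitrate", "silicon", "EC", "OCM", "sodium_ion", "ammonium_ion",
)


def summed_pm25(component_series):
    """Sum the seven components on days where every one was measured.

    component_series: dict component -> dict date -> concentration (µg/m3).
    Assumes days without a measurement are absent from a component's dict.
    """
    # Dates on which all seven components have a value.
    common_dates = set.intersection(
        *(set(component_series[name]) for name in COMPONENTS)
    )
    return {
        day: sum(component_series[name][day] for name in COMPONENTS)
        for day in sorted(common_dates)
    }
```

The same set of common dates is what restricts scenarios 1a and 2 to days with complete component data.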
We estimated the within-county monitor-to-monitor correlation for each of the seven PM2.5 components to obtain a measure of the spatial homogeneity of each component. For this calculation we used a subset of 12 counties that had more than one monitor (27 monitors total): Jefferson, Alabama; Washington, DC; Cook, Illinois; Jefferson, Kentucky; Wayne, Michigan; Bronx, New York; Cuyahoga, Ohio; Allegheny, Pennsylvania; Philadelphia, Pennsylvania; Providence, Rhode Island; King, Washington; and Kanawha, West Virginia. We computed correlations only if at least 90 paired observations were available between two monitors. We also estimated the median within-county correlations between the seven PM2.5 components, and the three measures of PM2.5 total mass by a) estimating the correlations between time series data for each pair of air pollutants within each county and b) taking the median of the estimated correlations across the 119 counties. As a separate measure of spatial homogeneity, we calculated, for each of the seven components and using all monitors, the distance at which the correlation between pairs of monitors was 0.5 on average.
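The monitor-pair calculation with its 90-observation minimum, and the median across counties, might look as follows. The text does not say which correlation coefficient was used; Pearson correlation is assumed here, and all names are hypothetical.

```python
from math import sqrt
from statistics import median

MIN_PAIRS = 90  # minimum number of paired observations stated in the text


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed measure)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either series is constant


def monitor_correlation(series_a, series_b, min_pairs=MIN_PAIRS):
    """Correlate two monitors on shared dates; None if too few paired days."""
    shared = sorted(set(series_a) & set(series_b))
    if len(shared) < min_pairs:
        return None
    return pearson([series_a[d] for d in shared], [series_b[d] for d in shared])


def median_correlation(correlations):
    """Median of the available correlations, skipping pairs that were dropped."""
    usable = [r for r in correlations if r is not None]
    return median(usable) if usable else None
```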
We applied Bayesian hierarchical statistical models to estimate county-specific and national average associations between daily variation in the seven PM2.5 chemical components and daily variation in hospital admissions rates. This approach was originally developed for the National Morbidity, Mortality, and Air Pollution Study (Bell et al. 2004; Samet et al. 2000) and subsequently extended (Dominici et al. 2006) to provide a consistent and unified methodology for analyzing data from multiple locations. We fit log-linear Poisson regression models with overdispersion to county-specific time-series data on hospital admissions and chemical components, accounting for potential confounders such as weather, day of the week, unobserved seasonal factors, and long-term trends. In each county-specific regression model, we included an indicator for the day of the week, a smooth function of time with 8 degrees of freedom (df) per calendar year to control for seasonality and long-term trends, a smooth function of current-day temperature (6 df), a smooth function of the 3-day running mean temperature (6 df), a smooth function of current-day dew-point temperature (3 df), and a smooth function of the 3-day running mean dew-point temperature (3 df). For all of the smooth functions we used a natural spline basis. We conducted a sensitivity analysis with respect to the smooth function of time to determine the degree to which risk estimates changed with varying levels of adjustment for smooth unmeasured confounders. Although other information about Medicare enrollees is available, such as sex and race, we excluded these factors from all models because they do not vary over time and should not play a role in our time series analysis.
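In symbols, the county-specific model described above might be written as follows. The notation is ours and is only a sketch: c indexes county, t day, and ℓ the lag. The offset term log N, which turns counts into rates using the number of enrollees at risk, is an assumption consistent with modeling admission rates and is not stated explicitly in the text.

```latex
\begin{aligned}
Y_t^c &\sim \text{overdispersed Poisson}\left(\mu_t^c\right), \\
\log \mu_t^c &= \log N_t^c
  + \beta^c\, x_{t-\ell}^c
  + \boldsymbol{\alpha}^{c\top} \mathrm{DOW}_t
  + \mathrm{ns}\!\left(t;\ 8\ \text{df per year}\right) \\
 &\quad + \mathrm{ns}\!\left(T_t;\ 6\ \text{df}\right)
  + \mathrm{ns}\!\left(\bar{T}_t;\ 6\ \text{df}\right)
  + \mathrm{ns}\!\left(D_t;\ 3\ \text{df}\right)
  + \mathrm{ns}\!\left(\bar{D}_t;\ 3\ \text{df}\right)
\end{aligned}
```

Here Y is the daily admission count, x the component concentration, DOW the day-of-week indicators, T and D the current-day temperature and dew-point temperature, the barred terms their 3-day running means, and ns(·) a natural spline with the stated degrees of freedom. The coefficient β is the county-specific log relative risk per unit increase in the component.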
For the exposure concentrations, we examined 0-, 1-, and 2-day lag concentrations because our previous work with PM2.5 total mass and hospital admissions showed little evidence of a strong association with admissions at a lag of ≥ 3 days (Dominici et al. 2006). We examined each lag separately because the 1-in-6–day sampling of the chemical component data from the STN prohibited the use of distributed lag models where all lags can be examined simultaneously.
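The single-lag alignment can be illustrated with a short sketch. The function name and the date-keyed dictionaries are hypothetical.

```python
from datetime import timedelta


def lagged_pairs(admission_rates, exposure, lag_days):
    """Pair each day's outcome with the concentration measured lag_days earlier.

    admission_rates: dict date -> daily admission rate (available every day).
    exposure: dict date -> concentration (available only on sampling days).
    Outcome days whose lagged date has no measurement are dropped.
    """
    pairs = []
    for day in sorted(admission_rates):
        measured_on = day - timedelta(days=lag_days)
        if measured_on in exposure:
            pairs.append((day, admission_rates[day], exposure[measured_on]))
    return pairs
```

With 1-in-6–day sampling, the outcome days retained for lags 0, 1, and 2 do not overlap. No outcome day has all three lagged concentrations available, which is why each lag is fitted in its own model.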
For estimating the health effects of the PM2.5 components, we employed single-pollutant and multipollutant models. In single-pollutant models, we included each PM2.5 component in the regression model individually, without adjusting for any other chemical component (the model does adjust for other time-varying factors). In multipollutant models, we included PM2.5 components simultaneously to obtain estimates of the regression coefficients for each component adjusted for the other components. Ammonium was excluded from models that included sulfate and nitrate because of the high correlation among these three components. We included ammonium in a separate multipollutant model that did not include sulfate or nitrate but included the remaining four components.
We combined the county-specific risk estimates to form a national average using a Bayesian hierarchical model. In the single-pollutant models, we combined the log-relative risks separately for each pollutant using TLNise (two-level normal independent sampling estimation) software (Everson and Morris 2000). For the multiple-pollutant models, the risks were treated as a vector for each county and combined using a multivariate normal hierarchical model. We used Markov chain Monte Carlo methods to obtain the posterior distribution of the national average component effects. We assessed statistical significance by the posterior probability that the national average log relative risk for a component was greater than zero (i.e., a relative risk greater than 1). Values of the posterior probability > 0.95 were considered statistically significant (Dominici et al. 2006; Peng et al. 2008).
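To show the idea behind the second-stage pooling, the sketch below combines county-specific log relative risks under a two-level normal model with the between-county variance treated as known. This is a deliberate simplification and not the TLNise procedure, which also accounts for uncertainty in that variance; the function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pooled_effect(estimates, variances, tau2):
    """Precision-weighted national average of county log relative risks.

    estimates: county-specific log relative risks.
    variances: their within-county sampling variances.
    tau2: between-county variance, treated here as known (simplifying
        assumption; a full Bayesian analysis would integrate over it).
    With a flat prior on the national mean, its conditional posterior is
    normal with the mean and standard deviation returned below.
    """
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    mean = sum(w * b for w, b in zip(weights, estimates)) / total
    sd = sqrt(1.0 / total)
    # Posterior probability that the national average effect is positive.
    prob_positive = 1.0 - NormalDist(mean, sd).cdf(0.0)
    return mean, sd, prob_positive
```

Under the criterion in the text, a component would be flagged when the returned probability exceeds 0.95.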
We evaluated whether the relative risks of each PM2.5 component in a multipollutant model were equal. In this analysis, the risks represent the percent increase in admissions associated with a 1-μg/m3 increase in each PM2.5 component in a multipollutant model. We assessed the evidence against equal component risks using a chi-square statistic applied to the national average estimates. We also estimated the posterior probability that the coefficient for a particular component was greater than the mean of the coefficients for the other components.
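Two small calculations are implied here: converting a log relative risk into a percent increase, and a chi-square-type statistic for equality of the component coefficients. The text does not give the exact form of the statistic. The version below is a simplified stand-in that treats the national average estimates as independent, whereas estimates from a multipollutant model are correlated and a full version would use their covariance matrix. All names are hypothetical.

```python
from math import exp


def percent_increase(beta, increment=1.0):
    """Percent increase in admissions per `increment` µg/m3 for log relative risk beta."""
    return 100.0 * (exp(beta * increment) - 1.0)


def equal_risk_statistic(betas, variances):
    """Chi-square-type statistic for the hypothesis that all coefficients are equal.

    Simplifying assumption: the estimates are independent, so only their
    variances are used. Returns (statistic, degrees of freedom).
    """
    weights = [1.0 / v for v in variances]
    # Precision-weighted common value under the null hypothesis.
    common = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    statistic = sum(w * (b - common) ** 2 for w, b in zip(weights, betas))
    return statistic, len(betas) - 1
```

The statistic would be compared with a chi-square distribution on the returned degrees of freedom, with large values counting as evidence against equal component risks.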
For statistical calculations we used R statistical software, version 2.7.0 (R Foundation for Statistical Computing, Vienna, Austria).