Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2856093

Formats

Article sections

- Abstract
- 1 Introduction
- 2 Data Description
- 3 Statistical Models
- 4 Spatial-Temporal Model for PM2.5
- 5 Application
- 6 Conclusion
- References

Authors

Related links

J Stat Theory Pract. Author manuscript; available in PMC 2010 April 19.

Published in final edited form as:

J Stat Theory Pract. 2009 June 1; 3(2): 407–418.

PMCID: PMC2856093

NIHMSID: NIHMS148545

See other articles in PMC that cite the published article.

Fine particulate matter (PM_{2.5}) is an atmospheric pollutant that has been linked to serious health problems, including mortality. PM_{2.5} is a mixture of pollutants, and it has five main components: sulfate, nitrate, total carbonaceous mass, ammonium, and crustal material. These components have complex spatial-temporal dependency and cross dependency structures. It is important to gain insight and better understanding about the spatial-temporal distribution of each component of the total PM_{2.5} mass, and also to estimate how the composition of PM_{2.5} might change with space and time, by spatially interpolating speciated PM_{2.5}. This type of analysis is needed to conduct spatial-temporal epidemiological studies of the association of these pollutants and adverse health effect.

We introduce a multivariate spatial-temporal model for speciated PM_{2.5}. We propose a Bayesian hierarchical framework with spatiotemporally varying coefficients. In addition, a linear model of coregionalization is developed to account for spatial and temporal dependency structures for each component as well as the associations among the components. We also introduce a statistical framework to combine different sources of data, which accounts for bias and measurement error. We apply our framework to speciated PM_{2.5} data in the United States for the year 2004. Our study shows that sulfate concentrations are the highest during the summer while nitrate concentrations are the highest during the winter. The results also show total carbonaceous mass

The study of the association between ambient particulate matter (PM) and human health has received much attention in epidemiological studies over the past few years. Özkaynak and Thurston (1987) conducted an analysis of the association between several particle measures and mortality. Their results showed the importance of considering particle size, composition, and source information when modeling particle pollution health effect. In particular, fine particle matter, PM_{2.5} (< 2.5*μm* in diameter), is an atmospheric pollutant that has been linked to numerous adverse health effect (e.g. respiratory and cardiovascular diseases). PM_{2.5} is a mixture of pollutants, and is classified into five main components (U.S. EPA, 2003): sulfate, nitrate, total carbonaceous mass (TCM), ammonium, and crustal material (that includes calcium, iron, silicon, aluminum, and titanium). These components have complex spatial-temporal dependency and cross dependency structures. Fuentes et al. (2006) studied the health effect of PM_{2.5} and its components using monthly data across the United States. In order to investigate the health effect associated to speciated fine PM across space and time, we need to have first the speciated PM_{2.5} information at the locations and times of interest. However, daily speciated PM_{2.5} measurements are available at limited monitoring sites, and missing values may occur at given time point. Rao et al. (2003) and Malm et al. (2004) showed the spatial and temporal patterns of speciated PM_{2.5}, but they only conducted an exploratory analysis of speciated PM_{2.5}. The research presented here is part of a larger project to study the association between speciated PM_{2.5} and adverse health outcomes across the entire U.S. Our goal here is to develop a statistical framework using all available sources of data about speciated PM_{2.5} to investigate the spatial-temporal patterns of speciated PM_{2.5} and then be able to predict speciated PM_{2.5} at all locations and times of interest. We also study the spatial-temporal patterns of the so called “unknown components” which are not the main speciated PM_{2.5} components (as defined by EPA).

In this article we introduce a new statistical framework to combine different sources of information of PM_{2.5} taking into account potential bias and measurement error, and we develop a multivariate spatial-temporal model for speciated PM_{2.5} accounting for the complex dependency structures of the speciated PM_{2.5}. We develop a spatial-temporal linear model of coregionalization (STLMC) to account for multivariate spatial-temporal dependency structures. In addition, a hierarchical framework is proposed to investigate the relative contribution of each component to the total PM_{2.5} mass, which changes over space and time.

We use a new speciated PM_{2.5} data set. To our knowledge, this is the first time that a statistical framework has been proposed and used to analyze speciated PM_{2.5} across the entire United States. The proposed framework captures the cross dependency structure among the PM_{2.5} components and explains their spatial-temporal dependency structures. A Bayesian hierarchical framework is used to study different random effect of interest. The STLMC proposed here allows for very general multivariate spatial-temporal covariance structures, and offer some computational advantages. We can accurately predict speciated PM_{2.5} at any location or time point of interest. In addition, we present a new approach to combine different sources of information about PM_{2.5}, that improve the prediction of the total PM_{2.5} mass. Wikle et al. (2001) presented a similar approach to combine different sources of information of ocean surface winds, but they treated one of the data sources as a prior process. In our approach, all of the data sources are simultaneously represented in terms of the underlying truth, and we also model the potential bias of the different sources of information as spatial temporal processes.

This article is organized as follows. In Section 2 we describe the data used in this study. In Section 3, we present a Bayesian hierarchical multivariate spatial-temporal model for speciated PM_{2.5}, and we also introduce the STLMC. A statistical framework to combine PM_{2.5} data is described in Section 4. In Section 5 we present the results, and in Section 6 we offer a general discussion.

PM_{2.5} data from two monitoring networks and meteorological data in the conterminous United States for year 2004 were used in this study. The first source of PM_{2.5} data is the Speciated Trends Network (STN) established by the U.S. Environmental Protection Agency (EPA) in 1999. The STN measures the speciated PM_{2.5} either every day, every third day or every sixth day. It included about 200 monitoring stations in 2004, which were mostly in urban areas. Even though the STN collects numerous trace elements, elemental carbon, organic carbon, and ions (sulfate, nitrate, sodium, potassium, ammonium), we only consider the five main PM_{2.5} components presented in the previous section. In the STN, sulfate, nitrate, and ammonium are measured independently. Total carbonaceous mass is sum of elemental carbon mass and estimated organic carbon mass which is 1.4 × ([*OC*] − 1.53), where [*OC*] is the measured organic carbon value, 1.4 is the factor to correct organic carbon mass for other elements (Rao et al., 2003), and 1.53 is the blank correction factor to adjust for sampling artifacts (Flanagan et al., 2003). Elemental carbon mass is also measured at STN monitoring stations. Crustal material is computed using the IMPROVE equation (Malm et al., 2004) for the five most prevalent trace elements.

Since the PM_{2.5} data at the STN monitoring stations provide sparse spatial coverage, using only the STN monitoring data might be insufficient for modeling the speciated fine PM over the entire United States. Therefore, we use the total PM_{2.5} data from the Federal Reference Method (FRM) monitoring network which includes rural and urban sites, and measures PM_{2.5} samples either every day, every third day or every sixth day. While the STN is a smaller network, the FRM network is a large national network, which consisted of about 1000 monitoring stations in 2004.

Meteorological data for 2004 have been provided from the U.S. National Climate Data Center. We use five daily meteorological variables: minimum temperature (°C), maximum temperature (°C), dew point temperature (°C), wind speed (m/s), and pressure (hPa).

While speciated PM_{2.5} is only available at about 200 STN stations, total PM_{2.5} mass is available at about 1000 FRM network stations. In addition, we can compute an estimate of the total PM_{2.5} mass from the PM_{2.5} components. Thus, we model the total PM_{2.5} mass using the PM_{2.5} data from both networks, and then express the mean of each PM_{2.5} component as a proportion of the total PM_{2.5} mass. The proportion of each component to the total PM_{2.5} mass varies over space and time, and we use a hierarchical framework to account for the spatial-temporal associations of the proportion. To ensure that at each site and time the sum of the proportions of the components to the total PM_{2.5} mass is one, we use a multinomial logit function (McFadden, 1974) for the proportion parameters. Even though the spatial-temporal dependency structures of the proportions are considered, it could be insufficient to capture the spatial-temporal dependency and the cross dependency structures of speciated PM_{2.5}. We thus include a mean-zero spatial-temporal process in the model, which explains the dependency structures. This approach allows us to estimate both the speciated PM_{2.5} in terms of the total PM_{2.5} mass and the cross-covariance between the PM_{2.5} components, which is not captured by the mean function of the speciated PM_{2.5}. Figure 1 shows the framework of the speciated fine PM.

We assume there is an underlying (unobserved) field *Z*(**s***, t*), where *Z*(**s***, t*) represents the true total PM_{2.5} value at location **s** *D*_{1} at time *t* *D*_{2}, where {**s : s**_{1}…,**s**_{Ns}} *D*_{1} ^{2}and {*t : t*_{1}, …, *t _{Nt}*}

$$\begin{array}{l}\mathbf{V}(\mathbf{s},t)=\mathit{\theta}(\mathbf{s},t)Z(\mathbf{s},t)+\mathit{\epsilon}(\mathbf{s},t)+{\mathit{\epsilon}}^{w}(\mathbf{s},t),\\ \text{where}\phantom{\rule{0.16667em}{0ex}}{\mathit{\theta}}^{T}(\mathbf{s},t)=({\theta}_{1}(\mathbf{s},t),\dots ,{\theta}_{5}(\mathbf{s},t))\\ \text{logit}[{\theta}_{i}(\mathbf{s},t)]={\eta}_{i}(t)+{\gamma}_{i}(\mathbf{s},t),\end{array}$$

(1)

$$\begin{array}{l}{\eta}_{i}(t)={f}_{i}(t)+{e}_{{\eta}_{i}}(t),\\ {\gamma}_{i}(\mathbf{s},t)={\gamma}_{i}(\mathbf{s},t-1)+{e}_{{\gamma}_{i}}(\mathbf{s},t),\\ Z(\mathbf{s},t)={\mathbf{M}}^{T}(\mathbf{s},t)\mathit{\beta}(\mathbf{s},t)+{e}_{z}(\mathbf{s},t),\end{array}$$

(2)

$$\mathit{\epsilon}(\mathbf{s},t)=\mathbf{Aw}(\mathbf{s},t),$$

(3)

where logit [*θ _{i}*(

The true total PM_{2.5}, *Z*(**s***, t*), is affected by a vector of meteorological weather variables **M**(**s***, t*) (minimum temperature, maximum temperature, dew point temperature, wind speed, and pressure). The meteorological influence could vary in space and time, and we define *β*(**s***, t*) to be meteorological parameters at location **s** and time *t*. The meteorological data might not exist at all the sites of interest. We thus interpolate the weather data at those locations using a spatial model for each time point (as part of the hierarchical frame-work). The residual process denoted by *e _{z}*(

The spatial-temporal process ** ε**(

The basic idea of the STLMC is that dependent spatial-temporal processes are expressed using the linear combination of uncorrelated spatial-temporal processes in the spatial-temporal modeling. The STLMC provides a very rich class of multivariate spatial-temporal processes with simple specification and interpretation. The STLMC like the linear model of coregionalization use in multivariate spatial analysis (Grzebyk and Wackernagel, 1994; Wackernagel, 1998) could be used as a dimension reduction method, which means that the given multivariate processes are represented as lower dimensional processes. In this study, we use the STLMC to construct a valid cross-covariance function of multivariate spatial-temporal processes. The STLMC used here is:

$$\mathit{\epsilon}(\mathbf{s},t)=\mathbf{Aw}(\mathbf{s},t),$$

(4)

where **w*** ^{T}*(

$$\mathit{Cov}(\mathit{\epsilon}({\mathbf{s}}_{l},{t}_{j}),\mathit{\epsilon}({\mathbf{s}}_{{l}^{\prime}},{t}_{{j}^{\prime}}))=\sum _{i=1}^{5}{C}_{i}^{(1)}({\mathbf{s}}_{l},{\mathbf{s}}_{{l}^{\prime}};{\mathit{\phi}}_{i}){C}_{i}^{(2)}({t}_{j},{t}_{{j}^{\prime}};{\mathit{\psi}}_{i}){\mathbf{T}}_{i},$$

(5)

Where
${\mathbf{T}}_{i}={\mathbf{a}}_{i}{\mathbf{a}}_{i}^{T}$ and **a*** _{i}* is the

We form
$\mathit{\epsilon}={({\mathit{\epsilon}}_{1}^{T},\dots ,{\mathit{\epsilon}}_{5}^{T})}^{T}$ and
${\mathit{\epsilon}}_{i}^{T}=({\mathit{\epsilon}}_{i}^{T}({t}_{1}),\dots ,{\mathit{\epsilon}}_{i}^{T}({t}_{{N}_{t}}))$ for *i* = 1, …. 5, where
${\mathit{\epsilon}}_{i}^{T}({t}_{j})=({\epsilon}_{i}({s}_{1},{t}_{j}),\dots ,{\epsilon}_{i}({s}_{{N}_{s}},{t}_{j}))$ for *j*=1 …*N _{t}* then the covariance matrix of

$${\mathbf{\sum}}^{\mathit{\epsilon}}=\sum _{i=1}^{5}{\mathbf{T}}_{i}\otimes {\mathbf{U}}_{i}\otimes {\mathbf{R}}_{i},$$

(6)

where denotes the Kronecker product. Each **R*** _{i}* is a

We now discuss estimation and prediction of the speciated PM_{2.5} using a Bayesian approach. In order to predict the speciated PM_{2.5} at location **s**_{0} and time *t*_{0} given the data **V** and **Z** (all available speciated PM_{2.5} data and total PM_{2.5} mass data, respectively) and a value *Z*(**s**_{0}, *t*_{0}), we need the posterior predictive distribution of **V**(**s**_{0}, *t*_{0}):

$$p(\mathbf{V}({\mathbf{s}}_{0},{t}_{0})\mid \mathbf{V},\mathbf{Z},Z({\mathbf{s}}_{0},{t}_{0}))\propto \int p(\mathbf{V}({\mathbf{s}}_{0},{t}_{0})\mid \mathbf{V},\mathbf{Z},\mathbf{\Theta},Z({\mathbf{s}}_{0},{t}_{0}))p(\mathbf{\Theta}\mid \mathbf{V},\mathbf{Z})d\mathbf{\Theta},$$

(7)

where **Θ**= (**Θ**_{1}, **Θ**_{2}) is a collection of all of the unknown parameters in our statistical framework. The vector **Θ**_{1} includes parameters used to model the proportions of the speciated PM_{2.5} to the total PM_{2.5} mass, ** θ**, and the vector

The conditional distribution of **V** at location **s**_{0} and time *t*_{0} is:

$$p(\mathbf{V}({\mathbf{s}}_{0},{t}_{0})\mid \mathbf{V},\mathbf{Z},\mathbf{\Theta},Z({\mathbf{s}}_{0},{t}_{0}))\sim N(\mathit{\mu}(\mathbf{V}({\mathbf{s}}_{0},{t}_{0})),\mathbf{T}-{\mathrm{\sum}}^{12}{\mathrm{\sum}}^{\mathbf{V}}{\mathrm{\sum}}^{21})$$

(8)

where ** μ**(

$$p(\mathbf{V}({\mathbf{s}}_{0},{t}_{0})\mid \mathbf{V},\mathbf{Z},Z({\mathbf{s}}_{0},{t}_{0}))=\frac{1}{{N}_{1}}\sum _{{n}_{1}=1}^{{N}_{1}}p(\mathbf{V}({\mathbf{s}}_{0},{t}_{0})\mid \mathbf{V},\mathbf{Z},{\mathbf{\Theta}}^{({n}_{1})},Z({\mathbf{s}}_{0},{t}_{0})),$$

(9)

where **Θ**^{(n1)} is the
${n}_{1}^{th}$ draw from the posterior distribution for the parameters.

This section introduces a spatial-temporal model for total PM_{2.5} mass. We do not consider the PM_{2.5} measurements at the FRM monitoring stations to be the “true” values because of measurement error. We thus denote the observed total PM_{2.5} mass at location **s** at time *t* from the FRM network by * _{F}* (

$${\widehat{Z}}_{F}(\mathbf{s},t)=Z(\mathbf{s},t)+{e}_{F}(\mathbf{s},t),$$

(10)

where
${e}_{F}(\mathbf{s},t)\sim N(0,{\sigma}_{F}^{2})$ is the measurement error at location **s** and time *t*, which is independent of the true underlying process *Z*.

A second estimate of the total PM_{2.5} mass is the reconstructed fine mass (RCFM), * _{R}*(

$${\widehat{Z}}_{R}(\mathbf{s},t)=a(\mathbf{s},t)+Z(\mathbf{s},t)+{e}_{R}(\mathbf{s},t),$$

(11)

where
${e}_{R}(\mathbf{s},t)\sim N(0,{\sigma}_{R}^{2})$ is the measurement error at location **s** and time *t* and is also independent of processes *e _{F}* and

$$a(\mathbf{s},t)={a}_{1}(t)+{a}_{2}(\mathbf{s},t),$$

(12)

$${a}_{1}(t)=h(t)+{e}_{1}(t),$$

(13)

$${a}_{2}(\mathbf{s},t)={a}_{2}(\mathbf{s},t-1)+{e}_{2}(\mathbf{s},t),$$

(14)

where *a*_{1}(*t*) represents the overall temporal trend in the bias of the RCFM data, and *h*(*t*) is a smoothing function of time to explain seasonality in the additive bias term. The process *a*_{2}(**s**, *t*) accounts for the spatial-temporal structure which is not captured by the overall temporal trend. We assume the process *a*_{2}(**s**, *t*) is a spatial-temporal first-order Markovian process, and *e*_{1} and *e*_{2} are independent white noise processes and are independent of the process *Z*.

The true underlying PM_{2.5} process, *Z*(**s**, *t*), is modeled in terms of meteorological variables,

$$Z(\mathbf{s},t)={\mathbf{M}}^{T}(\mathbf{s},t)\mathit{\beta}(\mathbf{s},t)+{e}_{z}(\mathbf{s},t),$$

(15)

where the residual process *e _{z}* is assumed to be normal with a spatial-temporal covariance function.

We seek to predict values of *Z* at location **s**_{0} and time *t*_{0} given the data, * _{F}*,

$$p(Z({\mathbf{s}}_{0},{t}_{0})\mid \widehat{Z},\mathbf{M})\propto \int p(Z({\mathbf{s}}_{0},{t}_{0})\mid \widehat{Z},\mathbf{M},{\mathbf{\Theta}}_{Z})p({\mathbf{\Theta}}_{Z}\mid \widehat{Z},\mathbf{M})d{\mathbf{\Theta}}_{Z},$$

(16)

where **Θ*** _{Z}* is a collection of all parameters considered in the PM

Our statistical framework is applied to the daily speciated PM_{2.5} data in the United States for the year 2004. In the PM_{2.5} data framework, we study the spatial-temporal patterns of the additive bias term of the RCFM process. In the speciated PM_{2.5} model, we study the spatial-temporal associations for each component as well as the associations among the components, and we also study the spatial-temporal pattern of the proportion of each component of the total PM_{2.5} mass. We work with our spatial-temporal framework using only California data because of the computational costs, but we work with our framework over the entire United States at a fixed time or at a fixed location for year 2004.

In the PM_{2.5} data framework, we observed that the coefficients of the weather covariates are different in different regions from the results of the preliminary exploratory analysis. We thus implement our framework for the nine geographic regions as defined by the United States Census: Region (1): Northeast (New England); (2): Northeast (Middle Atlantic); (3): Midwest (East North Central); (4): Midwest (West North Central); (5): South (South Atlantic); (6): South (East South Central); (7): South (West South Central); (8): West (Mountain); (9): West (Pacific). We assume that ** β**(

We now describe the prior distributions used here. We use gamma priors, G(0.01,0.01), for the precision of the error terms, *e*_{1}(*t*) and *e*_{2}(**s**, *t*). Following the guidance and general approach used by U.S. EPA (1997 U.S. EPA (2000), we use informative uniform priors, Unif(1.778,1.814) and Unif(1.362,1.390) for σ* _{F}* and σ

Figure 2 shows boxplots of the estimated additive bias parameter, *a*(**s***, t*), for July 2004 and December 2004 in nine geographic regions as defined by the U.S. Census Bureau. The estimated daily bias values are the medians of the posterior distribution. Here, we used monthly averaged values at each location. In July 2004, the estimated bias values were the lowest in the South Atlantic region and the highest in the Pacific region. In all regions except the Pacific region, the total PM_{2.5} mass measured at the FRM network was larger than the RCFM in July 2004. The difference results in the Pacific region are not surprising because California during summer season is known to have the losses of about 60 – 90% of nitrate due to evaporation when the total PM_{2.5} mass at the FRM network is measured (Frank, 2006). In December 2004, all regions had negative values for the estimated additive bias. Overall, the estimated additive bias values in July was lower than in December because summer season has high sulfate and ammonium concentrations and the FRM total PM_{2.5} mass includes high amount of water during the summer season (Frank, 2006).

Boxplots of the additive bias of the RCFM process for (a) July 2004 and (b) December 2004 by geographic region (as defined by the U.S. Census). Region (1): Northeast (New England); (2): Northeast (Middle Atlantic); (3): Midwest (East North Central); (4): **...**

Figure 3 presents maps of the estimated concentrations of sulfate and nitrate on June 14 and December 14. Sulfate concentrations were higher than nitrate concentrations across the United States on June 14. On average, sulfate concentration was 3.22*μg/m*^{3} on June 14, while nitrate concentration was 0.38*μg/m*^{3}. On June 14, sulfate concentrations in the eastern United States were highest over the United States. Nitrate concentrations seemed to be high in Southern California. However, on December 14, sulfate concentrations decreased and nitrate concentrations increased. The northwestern United States had high sulfate concentrations on December 14, while Southern California and the Mountain region had high nitrate concentrations.

Maps of the posterior median of sulfate concentration (*μg/m*^{3}) and nitrate concentration (*μg/m*^{3}) on June 14, 2004 and on December 14, 2004, respectively.

The time series plots of the estimated concentrations of speciated PM_{2.5} in Los Angeles, Phoenix, and New York City in 2004 are presented in Figure 4. The estimated sulfate and ammonium concentrations in Los Angeles and New York City were higher than those in Phoenix. Sulfate concentrations tended to be highest during the summer in Los Angeles and New York City. For these three cities, nitrate concentrations tended to be highest during the winter. In particular, it is interesting that nitrate concentrations in Phoenix peak during the winter. It appears that Los Angeles and Phoenix have high TCM concentrations during the winter. In Los Angeles, all of the components had high concentrations during March, and total PM_{2.5} mass was also high during March.

Time series plots of the estimated speciated PM_{2.5} (*μg/m*^{3}) for three cities (Los Angeles, Phoenix, and New York City) in 2004.

Figure 5 shows maps of the estimated speciated PM_{2.5} composition by region and by season in 2004. In the maps, circle size corresponds to total PM_{2.5} mass, and we can clearly see the spatial-temporal pattern of the total PM_{2.5} mass. During the summer season (July-September), the total PM_{2.5} mass was highest in the eastern United States. TCM had the highest proportion of the total PM_{2.5} mass among the components over the entire United States. Sulfate concentrations were highest during the summer season in most of the eastern United States and the Pacific region because increased photochemical reactions in the atmosphere increases sulfate formation (Baumgardner et al., 1999). Nitrate concentrations were highest during the winter season (January-March) because higher ammonia availability, the cooler winter temperatures, and higher relative humidities favor ammonium nitrate condensation. On average, nitrate concentration during the winter season was 1.92*μg/m*^{3} over the United States (vs. 1.09*μg/m*^{3} for the year 2004). We also see the seasonal pattern of ammonium concentration. On average, during the winter and spring seasons (January-June), ammonium concentrations were about 3.2 time higher than during the summer and fall seasons. During the summer and fall seasons, TCM concentrations were high because of secondary organic aerosol formation. Crustal material concentrations were higher in the eastern United States during the spring season because of low soil moisture and high wind speeds. Also, these regions are impacted by North African dust during the spring (Malm et al., 2004).

We present in Table 1 the posterior estimates of the elements (*T _{ij}*) of the

Summary of the posterior distributions for the **T** matrix in California in 2004: The labelling is 1, sulfate, 2, nitrate, 3, ammonium, 4, total carbonaceous mass, 5, crustal material.

Finally, we present some model diagnostics using the deviance information criterion (DIC) of Speigelhalter et al. (2002), as well as the calibration. We compare three different models using only data from California in 2004. Model 1 is our statistical framework proposed in this article. Model 2 ignores the spatial-temporal process **ε** in Model 1. Model 3 removes both the spatial-temporal process **ε** and the hierarchical framework of the proportion parameters. The DIC for Model 1 was −1149. For Model 2 it was 15575, and for Model 3 it was 17604. We also use the root mean squared prediction error (RMSPE). The RMSPE value for Model 1 was 0.4631, for Model 2 it was 2.092, and for Model 3 it was 3.726. Thus, our framework has the lowest DIC and RMSPE values among three models. In addition, we did calibration analysis for the speciated PM_{2.5} in Phoenix to see the performance of our framework. We selected randomly 30 observations in 2004, and we obtained 95% prediction intervals for the *i*^{th} time given the data, not using data from the *i*^{th} time we are predicting. In Figure 6, we present the calibration plots for sulfate and nitrate in Phoenix. The percentages of the observed values that are outside the interval are 3.3%. We also did calibration analyses for the other three components in Phoenix. The percentages of the observed values lying outside the interval are between 0% and 6.7%. We obtained very good calibration results.

In this article we present a flexible hierarchical framework to study speciated PM_{2.5}. The multivariate spatial-temporal model proposed here allows for spatial-temporal dependency for each component and cross dependency structures among the components. A hierarchical framework provides a natural way to investigate the spatiotemporally varying contribution of each component to the total PM_{2.5} mass. Using our framework, we can estimate speciated PM_{2.5} at unobserved locations of interest in the United States. We also introduce a new statistical framework to incorporate PM_{2.5} data from different sources, which takes into account bias and measurement error over space and time.

We found that the additive bias term of the RCFM process overall seems to be negative and the RCFM is less than the total PM_{2.5} mass observed at the FRM network. However, in the Pacific region, we can clearly see the different results during the summer season because of the losses of nitrate. In the eastern United States, the contribution of sulfate to the total PM_{2.5} mass tends to be higher during the summer. In almost all regions, sulfate concentrations are higher during the summer. Also, the spatial differences in the sulfate concentrations are the largest during the summer. On average, the sulfate proportions and concentrations in the East South Central region are highest where sulfur is emitted from coal-fired sources (Malm et al., 2002). In general, nitrate concentrations are higher during the winter, and they are also higher in urban areas because of high nitrogen oxide (NO_{x}) emissions from automobiles. During the summer, nitrate and ammonium concentrations in the western United States are low. TCM concentrations explain most of total PM_{2.5} mass. It is found that TCM has high concentrations in the summer and fall seasons because of high fire-related activity. During the spring season, crustal material concentrations are high in the eastern United States. Our results for the speciated PM_{2.5} are consistent with previous analyses (Malm et al., 2004).

The diagnostics show the adequate performance of our model. Some extensions could be considered in our statistical framework. For example, the total PM_{2.5} process, *Z*(**s**, *t*), could be modeled as a nonstationary spatial-temporal Gaussian process with the meteorological variables. In the STLMC, the separable spatial-temporal process for **w**(**s***, t*) we assumed could be extended to a non-separable process. However, computational burden is exacerbated in these cases and we used simple spatial-temporal models.

We are currently working on association between speciated PM_{2.5} and daily mortality. The framework and results presented here will be essential for the health analysis.

The authors would like to thank Drs. Holland, Tesh, and Prakash at EPA for providing the data as well as helpful and insightful discussions.

- Baumgardner RE, Isil SS, Bowser JJ, Fitzgerald KM. Measurements of rural sulfur dioxide and particle sulfate: Analysis of CASTNet data, 1987 through 1996. Journal of Air Waste Management Association. 1999;49:1266–1279.
- Flanagan JB, Peterson MR, Jayanty RKM, Rickman EE. Analysis of Speciation Network Carbon Blank Data. RTP, NC: RTI International; 2003.
- Frank NH. Retained nitrate, hydrated sulfates, and carbonaceous mass in federal reference method fine particulate matter for six eastern U.S. cities. Journal of Air Waste Management Association. 2006;56:500–511. [PubMed]
- Fuentes M, Song H, Ghosh SK, Holland DM, Davis JM. Spatial association between speciated fine particles and mortality. Biometrics. 2006;62:855–863. [PubMed]
- Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association. 1990;85:398–409.
- Gelfand AE, Banerjee S, Gamerman D. Spatial process modelling for univariate and multivariate dynamic spatial data. Environmetrics. 2005;16:465–479.
- Grzebyk M, Wackernagel H. Multivariate analysis and spatial/temporal scales: real and complex models. Proceedings of the XVIIth International Biometrics Conference; 1994. pp. 19–33.
- Malm WC, Schichtel BA, Ames RB, Gebhart KA. A 10-year spatial and temporal trend of sulfate across the United States. Journal of Geophysical Research. 2002;107:D224627. doi: 10.1029/2002JD002107. [Cross Ref]
- Malm WC, Schichtel BA, Pitchford ML, Ashbaugh LL, Eldred RA. Spatial and monthly trends in speciated fine particle concentration in the United States. Journal of Geophysical Research. 2004;109:D03306. doi: 10.1029/2003JD003739. [Cross Ref]
- McFadden D. Conditional logit analysis of qualitative choice behavior. In: Zarembka P, editor. Fron-tiers of Econometrics. New York: Academic Press; 1974. pp. 105–142.
- Özkaynak H, Thurston GD. Associations between 1980 U.S. mortality rates and alternative measures of airborne particle concentration. Risk Analysis. 1987;7:449–461. [PubMed]
- Rao V, Frank N, Rush A, Dimmick F. Chemical Speciation of PM
_{2.5}in Urban and Rural Areas. National Air Quality and Emissions Trends Report. 2003:13–23. - Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit (with discussion) Journal of the Royal Statistical Society B. 2002;64:583–639.
- U.S. Environmental Protection Agency. National Ambient Air Quality Standards for Particulate Matter; Final Rule, Part II. Federal Register. 1997;40 CFR Part 50.
- U.S. Environmental Protection Agency. Quality Assurance Guidance Document, Quality assurance Project Plan: PM
_{2.5}Speciation Trends Network Field Sampling. 2000. EPA 454/R-01-001, RTP, NC, Available at: http://www.epa.gov/ttn/amtic/files/ambient/pm25/spec/1025sqap.pdf. - U.S. Environmental Protection Agency. National Air Quality and Emissions Trends Report, 2003. Special Studies Edition. 2003. EPA 454/R-03-005, RTP, NC, Available at: http://www.epa.gov/air/airtrends/aqtrnd03/
- Wackernagel H. Multivariate Geostatistics-An Introduction with applications. 2. New York: Springer-Verlag; 1998.
- Wikle CK, Milliff RF, Nychka D, Berliner M. Spatiotemporal hierarchical Bayesian modeling: Tropical ocean surface winds. Journal of the American Statistical Association. 2001;95:1076–1987.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |