PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
J Stat Theory Pract. Author manuscript; available in PMC 2010 April 19.
Published in final edited form as:
J Stat Theory Pract. 2009 June 1; 3(2): 407–418.
PMCID: PMC2856093
NIHMSID: NIHMS148545

Multivariate spatial-temporal modeling and prediction of speciated fine particles 1

Abstract

Fine particulate matter (PM2.5) is an atmospheric pollutant that has been linked to serious health problems, including mortality. PM2.5 is a mixture of pollutants, and it has five main components: sulfate, nitrate, total carbonaceous mass, ammonium, and crustal material. These components have complex spatial-temporal dependency and cross dependency structures. It is important to gain insight and better understanding about the spatial-temporal distribution of each component of the total PM2.5 mass, and also to estimate how the composition of PM2.5 might change with space and time, by spatially interpolating speciated PM2.5. This type of analysis is needed to conduct spatial-temporal epidemiological studies of the association of these pollutants and adverse health effect.

We introduce a multivariate spatial-temporal model for speciated PM2.5. We propose a Bayesian hierarchical framework with spatiotemporally varying coefficients. In addition, a linear model of coregionalization is developed to account for spatial and temporal dependency structures for each component as well as the associations among the components. We also introduce a statistical framework to combine different sources of data, which accounts for bias and measurement error. We apply our framework to speciated PM2.5 data in the United States for the year 2004. Our study shows that sulfate concentrations are the highest during the summer while nitrate concentrations are the highest during the winter. The results also show total carbonaceous mass

Keywords: multivariate spatiotemporal processes, Bayesian inference, linear coregionalization model, air pollution, environmental statistics

1 Introduction

The study of the association between ambient particulate matter (PM) and human health has received much attention in epidemiological studies over the past few years. Özkaynak and Thurston (1987) conducted an analysis of the association between several particle measures and mortality. Their results showed the importance of considering particle size, composition, and source information when modeling particle pollution health effect. In particular, fine particle matter, PM2.5 (< 2.5μm in diameter), is an atmospheric pollutant that has been linked to numerous adverse health effect (e.g. respiratory and cardiovascular diseases). PM2.5 is a mixture of pollutants, and is classified into five main components (U.S. EPA, 2003): sulfate, nitrate, total carbonaceous mass (TCM), ammonium, and crustal material (that includes calcium, iron, silicon, aluminum, and titanium). These components have complex spatial-temporal dependency and cross dependency structures. Fuentes et al. (2006) studied the health effect of PM2.5 and its components using monthly data across the United States. In order to investigate the health effect associated to speciated fine PM across space and time, we need to have first the speciated PM2.5 information at the locations and times of interest. However, daily speciated PM2.5 measurements are available at limited monitoring sites, and missing values may occur at given time point. Rao et al. (2003) and Malm et al. (2004) showed the spatial and temporal patterns of speciated PM2.5, but they only conducted an exploratory analysis of speciated PM2.5. The research presented here is part of a larger project to study the association between speciated PM2.5 and adverse health outcomes across the entire U.S. Our goal here is to develop a statistical framework using all available sources of data about speciated PM2.5 to investigate the spatial-temporal patterns of speciated PM2.5 and then be able to predict speciated PM2.5 at all locations and times of interest. We also study the spatial-temporal patterns of the so called “unknown components” which are not the main speciated PM2.5 components (as defined by EPA).

In this article we introduce a new statistical framework to combine different sources of information of PM2.5 taking into account potential bias and measurement error, and we develop a multivariate spatial-temporal model for speciated PM2.5 accounting for the complex dependency structures of the speciated PM2.5. We develop a spatial-temporal linear model of coregionalization (STLMC) to account for multivariate spatial-temporal dependency structures. In addition, a hierarchical framework is proposed to investigate the relative contribution of each component to the total PM2.5 mass, which changes over space and time.

We use a new speciated PM2.5 data set. To our knowledge, this is the first time that a statistical framework has been proposed and used to analyze speciated PM2.5 across the entire United States. The proposed framework captures the cross dependency structure among the PM2.5 components and explains their spatial-temporal dependency structures. A Bayesian hierarchical framework is used to study different random effect of interest. The STLMC proposed here allows for very general multivariate spatial-temporal covariance structures, and offer some computational advantages. We can accurately predict speciated PM2.5 at any location or time point of interest. In addition, we present a new approach to combine different sources of information about PM2.5, that improve the prediction of the total PM2.5 mass. Wikle et al. (2001) presented a similar approach to combine different sources of information of ocean surface winds, but they treated one of the data sources as a prior process. In our approach, all of the data sources are simultaneously represented in terms of the underlying truth, and we also model the potential bias of the different sources of information as spatial temporal processes.

This article is organized as follows. In Section 2 we describe the data used in this study. In Section 3, we present a Bayesian hierarchical multivariate spatial-temporal model for speciated PM2.5, and we also introduce the STLMC. A statistical framework to combine PM2.5 data is described in Section 4. In Section 5 we present the results, and in Section 6 we offer a general discussion.

2 Data Description

PM2.5 data from two monitoring networks and meteorological data in the conterminous United States for year 2004 were used in this study. The first source of PM2.5 data is the Speciated Trends Network (STN) established by the U.S. Environmental Protection Agency (EPA) in 1999. The STN measures the speciated PM2.5 either every day, every third day or every sixth day. It included about 200 monitoring stations in 2004, which were mostly in urban areas. Even though the STN collects numerous trace elements, elemental carbon, organic carbon, and ions (sulfate, nitrate, sodium, potassium, ammonium), we only consider the five main PM2.5 components presented in the previous section. In the STN, sulfate, nitrate, and ammonium are measured independently. Total carbonaceous mass is sum of elemental carbon mass and estimated organic carbon mass which is 1.4 × ([OC] − 1.53), where [OC] is the measured organic carbon value, 1.4 is the factor to correct organic carbon mass for other elements (Rao et al., 2003), and 1.53 is the blank correction factor to adjust for sampling artifacts (Flanagan et al., 2003). Elemental carbon mass is also measured at STN monitoring stations. Crustal material is computed using the IMPROVE equation (Malm et al., 2004) for the five most prevalent trace elements.

Since the PM2.5 data at the STN monitoring stations provide sparse spatial coverage, using only the STN monitoring data might be insufficient for modeling the speciated fine PM over the entire United States. Therefore, we use the total PM2.5 data from the Federal Reference Method (FRM) monitoring network which includes rural and urban sites, and measures PM2.5 samples either every day, every third day or every sixth day. While the STN is a smaller network, the FRM network is a large national network, which consisted of about 1000 monitoring stations in 2004.

Meteorological data for 2004 have been provided from the U.S. National Climate Data Center. We use five daily meteorological variables: minimum temperature (°C), maximum temperature (°C), dew point temperature (°C), wind speed (m/s), and pressure (hPa).

3 Statistical Models

While speciated PM2.5 is only available at about 200 STN stations, total PM2.5 mass is available at about 1000 FRM network stations. In addition, we can compute an estimate of the total PM2.5 mass from the PM2.5 components. Thus, we model the total PM2.5 mass using the PM2.5 data from both networks, and then express the mean of each PM2.5 component as a proportion of the total PM2.5 mass. The proportion of each component to the total PM2.5 mass varies over space and time, and we use a hierarchical framework to account for the spatial-temporal associations of the proportion. To ensure that at each site and time the sum of the proportions of the components to the total PM2.5 mass is one, we use a multinomial logit function (McFadden, 1974) for the proportion parameters. Even though the spatial-temporal dependency structures of the proportions are considered, it could be insufficient to capture the spatial-temporal dependency and the cross dependency structures of speciated PM2.5. We thus include a mean-zero spatial-temporal process in the model, which explains the dependency structures. This approach allows us to estimate both the speciated PM2.5 in terms of the total PM2.5 mass and the cross-covariance between the PM2.5 components, which is not captured by the mean function of the speciated PM2.5. Figure 1 shows the framework of the speciated fine PM.

Figure 1
Model framework for speciated PM2.5.

We assume there is an underlying (unobserved) field Z(s, t), where Z(s, t) represents the true total PM2.5 value at location s [set membership] D1 at time t [set membership] D2, where {s : s1…,sNs} [set membership] D1 [subset or is implied by] R2and {t : t1, …, tNt} [set membership] D2 [subset or is implied by]R. Let V(s, t) = (V1(s, t), V2(s, t), … V5(s, t))T be a vector of the speciated PM2.5 at location s and at time t. The parameter θ (s, t) is a vector of the proportions of the speciated PM2.5 to the total PM2.5 mass at location s and time t. Each parameter θi(s, t) denotes a proportion of the total PM2.5 mass attributed to the component i, for i = 1, … 5. The model of the speciated PM2.5 is defined as:

V(s,t)=θ(s,t)Z(s,t)+ε(s,t)+εw(s,t),whereθT(s,t)=(θ1(s,t),,θ5(s,t))logit[θi(s,t)]=ηi(t)+γi(s,t),
(1)

ηi(t)=fi(t)+eηi(t),γi(s,t)=γi(s,t1)+eγi(s,t),Z(s,t)=MT(s,t)β(s,t)+ez(s,t),
(2)

ε(s,t)=Aw(s,t),
(3)

where logit [θi(s, t)] = log [θi(s, t)/θ5(s, t)] is modeled using a dynamic hierarchical model (Gelfand et al., 2005). The function ηi(t) denotes the overall temporal trend of the ith logit component, which is expressed by the function fi(t) to explain seasonality of the ith logit component. The process eηi is assumed to be a white noise Gaussian process. We model the process γi(s, t) using a spatial-temporal first-order Markovian process, which accounts for the spatial-temporal structure of the ith logit component not explained by the overall temporal trend. We assume the process eγi, t) is a Gaussian process with mean zero and a spatial covariance function to explain spatial dependency structure. To guarantee that the multinomial logit model is identifiable, we assume η5(t) = 0 for all t, and γ5(s, t) = 0, for all s and t. In our study, crustal material is assumed to be the 5th component because it is the most stable component.

The true total PM2.5, Z(s, t), is affected by a vector of meteorological weather variables M(s, t) (minimum temperature, maximum temperature, dew point temperature, wind speed, and pressure). The meteorological influence could vary in space and time, and we define β(s, t) to be meteorological parameters at location s and time t. The meteorological data might not exist at all the sites of interest. We thus interpolate the weather data at those locations using a spatial model for each time point (as part of the hierarchical frame-work). The residual process denoted by ez(s, t) is assumed to be normal. We explain the spatial-temporal modeling of the Z(·, ·) in Section 4.

The spatial-temporal process ε(s, t) = (ε1(s, t), … ε5(s, t))T is assumed to be a Gaussian process with mean zero and a covariance matrix, which changes with space and time. The covariance matrix of the process ε(s, t) represents the dependency structures of the speciated PM2.5 not captured by the mean function of the speciated fine PM. The STLMC is developed to specify the covariance function of the process ε(s, t). The process ε(s, t) is expressed by the weight matrix A and the vector w(s, t) having independent Gaussian spatial-temporal processes. We discuss the STLMC in detail in the following subsection. The pure measurement error process, εw(s, t), is assumed to be normal and be independent of ε(s, t).

3.1 The Spatial-Temporal Linear Model of Coregionalization

The basic idea of the STLMC is that dependent spatial-temporal processes are expressed using the linear combination of uncorrelated spatial-temporal processes in the spatial-temporal modeling. The STLMC provides a very rich class of multivariate spatial-temporal processes with simple specification and interpretation. The STLMC like the linear model of coregionalization use in multivariate spatial analysis (Grzebyk and Wackernagel, 1994; Wackernagel, 1998) could be used as a dimension reduction method, which means that the given multivariate processes are represented as lower dimensional processes. In this study, we use the STLMC to construct a valid cross-covariance function of multivariate spatial-temporal processes. The STLMC used here is:

ε(s,t)=Aw(s,t),
(4)

where wT(s, t) = (w1(s, t), … w5(s, t)), and A is a 5 × 5 weight matrix explaining the association among the five variables. Without loss of generality, we assume A is a full rank lower triangular matrix. For computational convenience, we adopt a simple approach to model the spatial-temporal process w(s, t). We assume that wi(s, t), i = 1, … 5, are independent Gaussian spatial-temporal processes with mean zero and separable spatial-temporal covariance, Cov(wi(sl,tj),wi(sl,tj))=Ci(1)(sl,sl;φi)Ci(2)(tj,tj;ψi), where Ci(1) is a spatial covariance with the parameter vector [var phi]i and Ci(2) is a temporal autocovariance with the parameter vector ψi. The STLMC in (4) implies E(ε (s, t)) = 0 and

Cov(ε(sl,tj),ε(sl,tj))=i=15Ci(1)(sl,sl;φi)Ci(2)(tj,tj;ψi)Ti,
(5)

Where Ti=aiaiT and ai is the ith column vector of A, and i=15Ti=T becomes the covariance matrix of ε at any site s and time t.

We form ε=(ε1T,,ε5T)T and εiT=(εiT(t1),,εiT(tNt)) for i = 1, …. 5, where εiT(tj)=(εi(s1,tj),,εi(sNs,tj)) for j=1 …Nt then the covariance matrix of ε is

ε=i=15TiUiRi,
(6)

where [multiply sign in circle] denotes the Kronecker product. Each Ri is a Ns × Ns matrix with (Ri)ll=Ci(1)(sl,sl;φi), which accounts for spatial associations. Each Ui is a Nt × Nt matrix with (Ui)jj=Ci(2)(tj,tj;ψi), which explains temporal associations. This covariance matrix, Σε, is nonseparable, except in the special case of the STLMC where Ci(1)=Ci(1)=C(1) and Ci(2)=Ci(2)=C(2) for all i, i′ = 1. … 5. In this case, Σε = T [multiply sign in circle] U [multiply sign in circle] R for (R)ll = C(1)(sl, sl; [var phi]) and (U)jj = C(2)(tj, tj; ψ).

3.2 Algorithm for Estimation and Prediction

We now discuss estimation and prediction of the speciated PM2.5 using a Bayesian approach. In order to predict the speciated PM2.5 at location s0 and time t0 given the data V and Z (all available speciated PM2.5 data and total PM2.5 mass data, respectively) and a value Z(s0, t0), we need the posterior predictive distribution of V(s0, t0):

p(V(s0,t0)V,Z,Z(s0,t0))p(V(s0,t0)V,Z,Θ,Z(s0,t0))p(ΘV,Z)dΘ,
(7)

where Θ= (Θ1, Θ2) is a collection of all of the unknown parameters in our statistical framework. The vector Θ1 includes parameters used to model the proportions of the speciated PM2.5 to the total PM2.5 mass, θ, and the vector Θ2 includes covariance parameters of the speciated PM2.5. We use a Markov chain Monte Carlo (MCMC) sampling algorithm to sample N1 values from the posterior distribution of the parameter Θ (within the software WinBUGS). Our MCMC sampling algorithm has three stages. We alternate between the proportion parameters Θ1 given the data (Stage 1), the covariance parameters Θ2 given the data and the values of Θ1 updated (Stage 2), and the unobserved true values of V at all sites and time points (Stage 3). We obtain the conditional posterior distribution of the parameters Θgiven the data updated in Stage 3.

The conditional distribution of V at location s0 and time t0 is:

p(V(s0,t0)V,Z,Θ,Z(s0,t0))N(μ(V(s0,t0)),T12V21)
(8)

where μ(V(s0, t0)) = θ (s0, t0)Z(s0, t0) + Σ12ΣV(VθZ), (Σ21)T = Σ12 = Cov(V(s0, t0), V) is a 5 × (5NtNs) matrix and ΣV = Cov(V, V). Using the Rao-Blackwellized estimator (Gelfand and Smith, 1990), the predictive distribution is approximated by

p(V(s0,t0)V,Z,Z(s0,t0))=1N1n1=1N1p(V(s0,t0)V,Z,Θ(n1),Z(s0,t0)),
(9)

where Θ(n1) is the n1th draw from the posterior distribution for the parameters.

4 Spatial-Temporal Model for PM2.5

This section introduces a spatial-temporal model for total PM2.5 mass. We do not consider the PM2.5 measurements at the FRM monitoring stations to be the “true” values because of measurement error. We thus denote the observed total PM2.5 mass at location s at time t from the FRM network by ZF (s, t), and we assume that

Z^F(s,t)=Z(s,t)+eF(s,t),
(10)

where eF(s,t)N(0,σF2) is the measurement error at location s and time t, which is independent of the true underlying process Z.

A second estimate of the total PM2.5 mass is the reconstructed fine mass (RCFM), ZR(s, t), defined as a sum of five main PM2.5 components measured at the STN monitoring stations. We model ZR(s, t) as

Z^R(s,t)=a(s,t)+Z(s,t)+eR(s,t),
(11)

where eR(s,t)N(0,σR2) is the measurement error at location s and time t and is also independent of processes eF and Z. Since the “true” total PM2.5 mass consists of more pollutants than the five main components, the additive bias a(s, t) is needed. The bias can be represented as the bias of the RCFM data, ZR, with respect to the FRM data, ZF. Exploratory analysis suggests the additive bias varies over space and time, and we model a(s, t) using a hierarchical framework,

a(s,t)=a1(t)+a2(s,t),
(12)

a1(t)=h(t)+e1(t),
(13)

a2(s,t)=a2(s,t1)+e2(s,t),
(14)

where a1(t) represents the overall temporal trend in the bias of the RCFM data, and h(t) is a smoothing function of time to explain seasonality in the additive bias term. The process a2(s, t) accounts for the spatial-temporal structure which is not captured by the overall temporal trend. We assume the process a2(s, t) is a spatial-temporal first-order Markovian process, and e1 and e2 are independent white noise processes and are independent of the process Z.

The true underlying PM2.5 process, Z(s, t), is modeled in terms of meteorological variables,

Z(s,t)=MT(s,t)β(s,t)+ez(s,t),
(15)

where the residual process ez is assumed to be normal with a spatial-temporal covariance function.

We seek to predict values of Z at location s0 and time t0 given the data, ZF, ZR, and M. Therefore, the posterior predictive distribution of Z(s0, t0) given the observations Z = (ZF, ZR) and M is

p(Z(s0,t0)Z^,M)p(Z(s0,t0)Z^,M,ΘZ)p(ΘZZ^,M)dΘZ,
(16)

where ΘZ is a collection of all parameters considered in the PM2.5 model. After simulating N2 values from the posterior distribution of the parameters ΘZ, the estimator for the predictive distribution is p(Z(s0,t0)Z^,M)=1N2n2=1N2p(Z(s0,t0)Z^,M,ΘZ(n2)), where ΘZ(n2) is the n2th draw from the posterior distribution.

5 Application

Our statistical framework is applied to the daily speciated PM2.5 data in the United States for the year 2004. In the PM2.5 data framework, we study the spatial-temporal patterns of the additive bias term of the RCFM process. In the speciated PM2.5 model, we study the spatial-temporal associations for each component as well as the associations among the components, and we also study the spatial-temporal pattern of the proportion of each component of the total PM2.5 mass. We work with our spatial-temporal framework using only California data because of the computational costs, but we work with our framework over the entire United States at a fixed time or at a fixed location for year 2004.

In the PM2.5 data framework, we observed that the coefficients of the weather covariates are different in different regions from the results of the preliminary exploratory analysis. We thus implement our framework for the nine geographic regions as defined by the United States Census: Region (1): Northeast (New England); (2): Northeast (Middle Atlantic); (3): Midwest (East North Central); (4): Midwest (West North Central); (5): South (South Atlantic); (6): South (East South Central); (7): South (West South Central); (8): West (Mountain); (9): West (Pacific). We assume that β(s, t) varies across regions but is constant over space and time within each region. For the error term ez, we use a separable spatial-temporal covariance, a stationary exponential covariance for space and autocovariance of the first-order autoregressive function AR(1) for time. From the exploratory analysis, it appears that the overall temporal pattern of bias is a sine or cosine function. We assume that the overall temporal trend function for the bias, h(t), in (13) is a linear combination of one sine and one cosine function with respect to a 12-month period (1/frequency). Similarly, we conducted a preliminary data analysis to choose the temporal function, fi(t), in the multinomial logit model of the proportion parameters. We fitted possible smoothing functions, and we found the third-order autoregressive function AR(3) seemed appropriate using AIC and BIC. We use AR(3) for the function fi(t). For the process w, we use a stationary exponential covariance function Ci(1)(sl,sl;σi2,φi)=σi2exp(h1/φi), i = 1, …, 5, for h1 = ||slsl|| (in km) with the sill parameter σi2 and the range parameter ϕi and the autocovariance function of the AR(1) Ci(2)(tj,tj;ψi)=ψih2/(1ψi2) for h2 = |tjtj|. For the spatial covariance function of the process eγi (s, t), we also use an exponential covariance function with unit variance.

We now describe the prior distributions used here. We use gamma priors, G(0.01,0.01), for the precision of the error terms, e1(t) and e2(s, t). Following the guidance and general approach used by U.S. EPA (1997 U.S. EPA (2000), we use informative uniform priors, Unif(1.778,1.814) and Unif(1.362,1.390) for σF and σR, respectively. For the coefficients of the weather covariates and the sine and cosine functions, we use vague normal priors, N(0, 0.01) (0.01 is the precision). For the error term ez, we use normal hyperprior, N (0, 0.1), for the parameter in the temporal covariance, and we use uniform hyperpriors, Unif(1,100) and Unif(0,100), for the range and sill parameters, respectively. We assume the measurement error term εw(s,t)N(0,σw2I5) where I5 is an identity matrix. We use a uniform prior, Unif(0,100), for σw2. In terms of the priors for the coefficients of the function fi(t), we use normal priors, N(0, 0.01). For the error term eni, we use a normal prior, N(0,σηi2),σηi2 being a uniform hyperprior, Unif(0,100). For the process eγi, we use a uniform hyperprior, Unif(1,100), for the range parameter. For the process wi, we use uniform hyperpriors Unif(1,100) and Unif(0,100), for the range and sill parameters, respectively. The parameter ψi has normal hyperprior, N(0, 0.1). Since A is a lower triangular matrix, we need to assign priors for elements Aii, i, i′ = 1, …, 5 and i ≥i′. We set Aii2=1, and we use normal priors, N (0, 0.1), for off-diagonal elements.

Figure 2 shows boxplots of the estimated additive bias parameter, a(s, t), for July 2004 and December 2004 in nine geographic regions as defined by the U.S. Census Bureau. The estimated daily bias values are the medians of the posterior distribution. Here, we used monthly averaged values at each location. In July 2004, the estimated bias values were the lowest in the South Atlantic region and the highest in the Pacific region. In all regions except the Pacific region, the total PM2.5 mass measured at the FRM network was larger than the RCFM in July 2004. The difference results in the Pacific region are not surprising because California during summer season is known to have the losses of about 60 – 90% of nitrate due to evaporation when the total PM2.5 mass at the FRM network is measured (Frank, 2006). In December 2004, all regions had negative values for the estimated additive bias. Overall, the estimated additive bias values in July was lower than in December because summer season has high sulfate and ammonium concentrations and the FRM total PM2.5 mass includes high amount of water during the summer season (Frank, 2006).

Figure 2
Boxplots of the additive bias of the RCFM process for (a) July 2004 and (b) December 2004 by geographic region (as defined by the U.S. Census). Region (1): Northeast (New England); (2): Northeast (Middle Atlantic); (3): Midwest (East North Central); (4): ...

Figure 3 presents maps of the estimated concentrations of sulfate and nitrate on June 14 and December 14. Sulfate concentrations were higher than nitrate concentrations across the United States on June 14. On average, sulfate concentration was 3.22μg/m3 on June 14, while nitrate concentration was 0.38μg/m3. On June 14, sulfate concentrations in the eastern United States were highest over the United States. Nitrate concentrations seemed to be high in Southern California. However, on December 14, sulfate concentrations decreased and nitrate concentrations increased. The northwestern United States had high sulfate concentrations on December 14, while Southern California and the Mountain region had high nitrate concentrations.

Figure 3
Maps of the posterior median of sulfate concentration (μg/m3) and nitrate concentration (μg/m3) on June 14, 2004 and on December 14, 2004, respectively.

The time series plots of the estimated concentrations of speciated PM2.5 in Los Angeles, Phoenix, and New York City in 2004 are presented in Figure 4. The estimated sulfate and ammonium concentrations in Los Angeles and New York City were higher than those in Phoenix. Sulfate concentrations tended to be highest during the summer in Los Angeles and New York City. For these three cities, nitrate concentrations tended to be highest during the winter. In particular, it is interesting that nitrate concentrations in Phoenix peak during the winter. It appears that Los Angeles and Phoenix have high TCM concentrations during the winter. In Los Angeles, all of the components had high concentrations during March, and total PM2.5 mass was also high during March.

Figure 4
Time series plots of the estimated speciated PM2.5 (μg/m3) for three cities (Los Angeles, Phoenix, and New York City) in 2004.

Figure 5 shows maps of the estimated speciated PM2.5 composition by region and by season in 2004. In the maps, circle size corresponds to total PM2.5 mass, and we can clearly see the spatial-temporal pattern of the total PM2.5 mass. During the summer season (July-September), the total PM2.5 mass was highest in the eastern United States. TCM had the highest proportion of the total PM2.5 mass among the components over the entire United States. Sulfate concentrations were highest during the summer season in most of the eastern United States and the Pacific region because increased photochemical reactions in the atmosphere increases sulfate formation (Baumgardner et al., 1999). Nitrate concentrations were highest during the winter season (January-March) because higher ammonia availability, the cooler winter temperatures, and higher relative humidities favor ammonium nitrate condensation. On average, nitrate concentration during the winter season was 1.92μg/m3 over the United States (vs. 1.09μg/m3 for the year 2004). We also see the seasonal pattern of ammonium concentration. On average, during the winter and spring seasons (January-June), ammonium concentrations were about 3.2 time higher than during the summer and fall seasons. During the summer and fall seasons, TCM concentrations were high because of secondary organic aerosol formation. Crustal material concentrations were higher in the eastern United States during the spring season because of low soil moisture and high wind speeds. Also, these regions are impacted by North African dust during the spring (Malm et al., 2004).

Figure 5
Maps of estimated speciated PM2.5 composition by region and by season in 2004.

We present in Table 1 the posterior estimates of the elements (Tij) of the T matrix using the data from California in 2004. The results show the correlations between a pair of speciated PM2.5 not explained by the mean function of the speciated PM2.5.

Table 1
Summary of the posterior distributions for the T matrix in California in 2004: The labelling is 1, sulfate, 2, nitrate, 3, ammonium, 4, total carbonaceous mass, 5, crustal material.

Finally, we present some model diagnostics using the deviance information criterion (DIC) of Speigelhalter et al. (2002), as well as the calibration. We compare three different models using only data from California in 2004. Model 1 is our statistical framework proposed in this article. Model 2 ignores the spatial-temporal process ε in Model 1. Model 3 removes both the spatial-temporal process ε and the hierarchical framework of the proportion parameters. The DIC for Model 1 was −1149. For Model 2 it was 15575, and for Model 3 it was 17604. We also use the root mean squared prediction error (RMSPE). The RMSPE value for Model 1 was 0.4631, for Model 2 it was 2.092, and for Model 3 it was 3.726. Thus, our framework has the lowest DIC and RMSPE values among three models. In addition, we did calibration analysis for the speciated PM2.5 in Phoenix to see the performance of our framework. We selected randomly 30 observations in 2004, and we obtained 95% prediction intervals for the ith time given the data, not using data from the ith time we are predicting. In Figure 6, we present the calibration plots for sulfate and nitrate in Phoenix. The percentages of the observed values that are outside the interval are 3.3%. We also did calibration analyses for the other three components in Phoenix. The percentages of the observed values lying outside the interval are between 0% and 6.7%. We obtained very good calibration results.

Figure 6
Model diagnostics for sulfate and nitrate in Phoenix: STN values of each component versus the median of the predictive posterior distribution of the component values at time j eliminating the ith observation. The dotted lines show the 95% prediction intervals. ...

6 Conclusion

In this article we present a flexible hierarchical framework to study speciated PM2.5. The multivariate spatial-temporal model proposed here allows for spatial-temporal dependency for each component and cross dependency structures among the components. A hierarchical framework provides a natural way to investigate the spatiotemporally varying contribution of each component to the total PM2.5 mass. Using our framework, we can estimate speciated PM2.5 at unobserved locations of interest in the United States. We also introduce a new statistical framework to incorporate PM2.5 data from different sources, which takes into account bias and measurement error over space and time.

We found that the additive bias term of the RCFM process overall seems to be negative and the RCFM is less than the total PM2.5 mass observed at the FRM network. However, in the Pacific region, we can clearly see the different results during the summer season because of the losses of nitrate. In the eastern United States, the contribution of sulfate to the total PM2.5 mass tends to be higher during the summer. In almost all regions, sulfate concentrations are higher during the summer. Also, the spatial differences in the sulfate concentrations are the largest during the summer. On average, the sulfate proportions and concentrations in the East South Central region are highest where sulfur is emitted from coal-fired sources (Malm et al., 2002). In general, nitrate concentrations are higher during the winter, and they are also higher in urban areas because of high nitrogen oxide (NOx) emissions from automobiles. During the summer, nitrate and ammonium concentrations in the western United States are low. TCM concentrations explain most of total PM2.5 mass. It is found that TCM has high concentrations in the summer and fall seasons because of high fire-related activity. During the spring season, crustal material concentrations are high in the eastern United States. Our results for the speciated PM2.5 are consistent with previous analyses (Malm et al., 2004).

The diagnostics show the adequate performance of our model. Some extensions could be considered in our statistical framework. For example, the total PM2.5 process, Z(s, t), could be modeled as a nonstationary spatial-temporal Gaussian process with the meteorological variables. In the STLMC, the separable spatial-temporal process for w(s, t) we assumed could be extended to a non-separable process. However, computational burden is exacerbated in these cases and we used simple spatial-temporal models.

We are currently working on association between speciated PM2.5 and daily mortality. The framework and results presented here will be essential for the health analysis.

Acknowledgments

The authors would like to thank Drs. Holland, Tesh, and Prakash at EPA for providing the data as well as helpful and insightful discussions.

References

  • Baumgardner RE, Isil SS, Bowser JJ, Fitzgerald KM. Measurements of rural sulfur dioxide and particle sulfate: Analysis of CASTNet data, 1987 through 1996. Journal of Air Waste Management Association. 1999;49:1266–1279.
  • Flanagan JB, Peterson MR, Jayanty RKM, Rickman EE. Analysis of Speciation Network Carbon Blank Data. RTP, NC: RTI International; 2003.
  • Frank NH. Retained nitrate, hydrated sulfates, and carbonaceous mass in federal reference method fine particulate matter for six eastern U.S. cities. Journal of Air Waste Management Association. 2006;56:500–511. [PubMed]
  • Fuentes M, Song H, Ghosh SK, Holland DM, Davis JM. Spatial association between speciated fine particles and mortality. Biometrics. 2006;62:855–863. [PubMed]
  • Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association. 1990;85:398–409.
  • Gelfand AE, Banerjee S, Gamerman D. Spatial process modelling for univariate and multivariate dynamic spatial data. Environmetrics. 2005;16:465–479.
  • Grzebyk M, Wackernagel H. Multivariate analysis and spatial/temporal scales: real and complex models. Proceedings of the XVIIth International Biometrics Conference; 1994. pp. 19–33.
  • Malm WC, Schichtel BA, Ames RB, Gebhart KA. A 10-year spatial and temporal trend of sulfate across the United States. Journal of Geophysical Research. 2002;107:D224627. doi: 10.1029/2002JD002107. [Cross Ref]
  • Malm WC, Schichtel BA, Pitchford ML, Ashbaugh LL, Eldred RA. Spatial and monthly trends in speciated fine particle concentration in the United States. Journal of Geophysical Research. 2004;109:D03306. doi: 10.1029/2003JD003739. [Cross Ref]
  • McFadden D. Conditional logit analysis of qualitative choice behavior. In: Zarembka P, editor. Fron-tiers of Econometrics. New York: Academic Press; 1974. pp. 105–142.
  • Özkaynak H, Thurston GD. Associations between 1980 U.S. mortality rates and alternative measures of airborne particle concentration. Risk Analysis. 1987;7:449–461. [PubMed]
  • Rao V, Frank N, Rush A, Dimmick F. Chemical Speciation of PM2.5 in Urban and Rural Areas. National Air Quality and Emissions Trends Report. 2003:13–23.
  • Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit (with discussion) Journal of the Royal Statistical Society B. 2002;64:583–639.
  • U.S. Environmental Protection Agency. National Ambient Air Quality Standards for Particulate Matter; Final Rule, Part II. Federal Register. 1997;40 CFR Part 50.
  • U.S. Environmental Protection Agency. Quality Assurance Guidance Document, Quality assurance Project Plan: PM2.5Speciation Trends Network Field Sampling. 2000. EPA 454/R-01-001, RTP, NC, Available at: http://www.epa.gov/ttn/amtic/files/ambient/pm25/spec/1025sqap.pdf.
  • U.S. Environmental Protection Agency. National Air Quality and Emissions Trends Report, 2003. Special Studies Edition. 2003. EPA 454/R-03-005, RTP, NC, Available at: http://www.epa.gov/air/airtrends/aqtrnd03/
  • Wackernagel H. Multivariate Geostatistics-An Introduction with applications. 2. New York: Springer-Verlag; 1998.
  • Wikle CK, Milliff RF, Nychka D, Berliner M. Spatiotemporal hierarchical Bayesian modeling: Tropical ocean surface winds. Journal of the American Statistical Association. 2001;95:1076–1987.