Summary statistics for all covariates used to develop the step-function, dispersion model of NO2 concentration in the state of Connecticut are shown in . The geographic distribution of average daily traffic (ADT) counts is displayed in . The unadjusted, isotropic dispersion function relating NO2 to traffic volume in concentric buffers out to 10 km from a residence (unadjusted R2 = 0.30) is shown in .
| Table 1. Summary statistics for model covariates. |
An examination of the wind rose representing the composite wind speed and direction for the study period reveals prevailing north/south winds (). Therefore, an anisotropic model was created by dividing buffers into four quadrants (north, east, south and west of the residence as indicated by the labeled quadrants in the compass inset in each panel of ) and calculating traffic volume for each buffer/quadrant combination.
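The buffer/quadrant calculation can be illustrated with a minimal sketch. It assumes that road locations are reduced to points with planar coordinates in km, that each quadrant is a 90° sector centred on a cardinal direction, and that the ring edges in `RING_EDGES_KM` are placeholders; none of these details are specified in the text above, and the function names are hypothetical.

```python
import math

# Hypothetical ring edges in km; the actual buffer radii are not given in this section.
RING_EDGES_KM = [0.0, 1.0, 2.0, 4.0, 6.0]


def quadrant(dx, dy):
    """Return 'N', 'E', 'S' or 'W' for an offset (dx east, dy north) from the residence.

    Assumes each quadrant is a 90-degree sector centred on a cardinal direction.
    """
    # Compass bearing in degrees, measured clockwise from north.
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    if bearing >= 315.0 or bearing < 45.0:
        return "N"
    if bearing < 135.0:
        return "E"
    if bearing < 225.0:
        return "S"
    return "W"


def traffic_by_ring_quadrant(residence, road_points, ring_edges=RING_EDGES_KM):
    """Sum traffic volume in every (ring index, quadrant) cell around a residence.

    residence: (x_km, y_km); road_points: iterable of (x_km, y_km, adt).
    Points beyond the outermost ring edge are ignored.
    """
    n_rings = len(ring_edges) - 1
    totals = {(i, q): 0.0 for i in range(n_rings) for q in "NESW"}
    rx, ry = residence
    for x, y, adt in road_points:
        dx, dy = x - rx, y - ry
        dist = math.hypot(dx, dy)
        for i in range(n_rings):
            if ring_edges[i] <= dist < ring_edges[i + 1]:
                totals[(i, quadrant(dx, dy))] += adt
                break
    return totals
```

Each of the resulting cell totals would then enter the regression as a separate covariate, which is what allows the fitted traffic effect to differ by direction from the residence.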
Prior to inclusion in the final model, categories of land use were collapsed from 12 to 3 according to their effect on NO2 level in a model that included only land use variables: increased NO2 levels were significantly associated with only the “developed” land category; decreased NO2 levels were significantly associated with “forest/grass” categories (including deciduous forest, coniferous forest, turf and grass and other grasses); and no significant associations were found between NO2 level and any other category (including agricultural field, water, non-forested wetland, forested wetland, tidal wetland, barren, utility rights-of-way). Geographic distributions of land use (“developed,” “forest/grass,” “other”), population density and elevation are displayed in , and , respectively. Land use variables for “developed” and “forest/grass” land were calculated for each residential buffer prior to inclusion in the model. Model covariates for elevation and population density were calculated once for each residence.
Initial analyses using traffic volume variables calculated out to 10 km from the residence showed no significant improvement in model fit compared with variables calculated within 6 km; covariates for the most distant residential buffers were therefore dropped from further consideration. After variable selection following the rules described in section 3 above, the final model incorporated the significant traffic buffer/quadrant combinations, the significant buffers for “developed” and “forest/grass” land, population density, elevation and variables representing season ().
| Table 2. Final modelᵃ of NO2 concentration adjusted for variables shown (Connecticut, 2006–2009). |
displays NO2 concentrations measured over the course of the study as well as NO2 as predicted by two different functions of date: a third-degree B-spline with six knots (red line) and a trigonometric and linear function of date (green line), which were used to represent seasonality in model development and model prediction, respectively. The overall adjusted R2 for the final model was 0.6728 using the B-spline function of date to represent seasonality (, Model 1) and 0.6430 using the trigonometric and linear function of date (, Model 2). The relationship between traffic volume in the final adjusted model (, Model 1) and NO2 level is shown for each of the four quadrants surrounding a residence ().
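One way a trigonometric and linear function of date can be encoded as regression covariates is sketched below. The single annual harmonic, the 365.25-day period, the reference date of 1 January 2006 and the function name `season_terms` are all assumptions made for illustration; the exact form used in the model is not given in this section.

```python
import math
from datetime import date


def season_terms(d, origin=date(2006, 1, 1), period_days=365.25):
    """Return (linear, sine, cosine) seasonal covariates for a sampling date.

    linear: elapsed time since `origin`, in units of one period (years by default)
    sine, cosine: one harmonic with the given period
    The origin and the single-harmonic form are assumptions, not taken from the paper.
    """
    t = (d - origin).days
    angle = 2.0 * math.pi * t / period_days
    return (t / period_days, math.sin(angle), math.cos(angle))


def seasonal_effect(d, b_linear, b_sin, b_cos):
    """Combine the three terms with regression coefficients (hypothetical values)."""
    lin, s, c = season_terms(d)
    return b_linear * lin + b_sin * s + b_cos * c
```

Unlike a B-spline fitted to the study period, these terms are defined for any calendar date, which is consistent with using this form for prediction outside the sampling window.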
Data used for validating the model comprised 120 NO2 samples taken outside Connecticut residences in 1994; values ranged from 4.39 to 33.10 ppb (mean [SD] of 13.75 [5.35]). Using the trigonometric and linear function of date to adjust for seasonality, together with the final-model covariates (traffic, land use, population density and elevation), and adjusting the intercept to reflect the 3.75 ppb difference in means between the two sample sets, the correlation between observed and predicted NO2 was 0.68. The additional cross-validation analysis using the leave-one-residence-out strategy produced an RMSE of 2.40, compared with 2.38 for the final model (Model 2, ) fitted using all observations. The similarity of the two RMSEs indicates that the model performs well in estimating exposure given relevant data on traffic, land use and time.
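The leave-one-residence-out strategy can be sketched as follows. For brevity the sketch substitutes a one-covariate least-squares line for the full model, which also accounts for correlation among repeated measurements; `fit_line`, `rmse` and `leave_one_residence_out_rmse` are hypothetical names, and the record layout is an assumption.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return (mean_y - slope * mean_x, slope)


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def leave_one_residence_out_rmse(records):
    """records: list of (residence_id, x, y) with repeated samples per residence.

    All samples from one residence are held out together, the stand-in model is
    refitted on the remaining residences, and the held-out samples are predicted.
    """
    observed, predicted = [], []
    for held_out in sorted({r[0] for r in records}):
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        a, b = fit_line([r[1] for r in train], [r[2] for r in train])
        for _, x, y in test:
            observed.append(y)
            predicted.append(a + b * x)
    return rmse(observed, predicted)
```

Holding out every sample from a residence at once, rather than single samples, keeps repeated measurements from the same location out of the training set, so the resulting RMSE reflects prediction at an unsampled residence.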
A variogram was constructed (not shown) and revealed no evidence of spatial correlation for these data. There was, however, evidence of correlation among the repeated temporal measurements from each residence and this was incorporated into our final models (Models 1 and 2, ).
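A check for spatial correlation of this kind is commonly based on an empirical semivariogram. The sketch below is a generic version, assuming planar coordinates and equal-width distance bins; the quantity analysed (for example model residuals), the binning and the function name are assumptions, since the source does not describe how its variogram was built.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, bin_width, n_bins):
    """Empirical semivariance per distance bin.

    points: list of (x, y, z), where z is the value examined at location (x, y).
    Returns a list of length n_bins; entry k is the semivariance for pairs whose
    separation lies in [k*bin_width, (k+1)*bin_width), or None if the bin is empty.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        k = int(math.hypot(x2 - x1, y2 - y1) // bin_width)
        if k < n_bins:
            sums[k] += (z1 - z2) ** 2
            counts[k] += 1
    # Semivariance is half the mean squared difference within each bin.
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]
```

A semivariance that stays roughly flat across distance bins, rather than rising from short to long separations, is the usual indication that no spatial correlation remains.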
The final model (with the trigonometric and linear function of date for seasonality) was used to predict NO2 levels in a 616 square km area around Hartford, Connecticut at two times in a year (). Orange areas indicate the highest predicted concentrations of NO2 and blue areas the lowest. Predicted values for February 1 ranged from 5.9 to 19.8 ppb (mean [SD] 10.4 [2.3]) (, left panel), and for August 1 ranged from 0.5 to 14.4 ppb (mean [SD] 5.0 [2.3]) (, right panel).