Although our models are physically interpretable and explain significant variability, our analysis has some limitations. Our small sample size may be a limitation in model-building, though sensitivity analyses show our findings to be robust. Similarly, our multi-pollutant approach restricted our sampling design to only a few homes at a time, with sampling sessions distributed over the course of a season, such that samples incorporate both temporal and spatial variability. Measuring at a large number of homes simultaneously, however, is generally infeasible for equipment-intensive multi-pollutant sampling designs, especially given interest in both the indoor and outdoor environment. In addition, long-term residential exposure estimation can benefit from within-season temporal variability, which cannot be obtained from short-term simultaneous sampling campaigns. Further, the influence of meteorological covariates, which are needed for long-term exposure estimation models, may be underestimated using such approaches. The central site monitor used to account for temporal heterogeneity may capture some local-source component as well; for this reason we opted to use only the one central site monitor with the most complete data coverage for all three pollutants, to avoid confounding spatial and temporal variability during periods when some sites were unavailable. In addition, we chose to retain the central site term in our final models, rather than predicting the temporally-corrected concentration residual, and included other temporal terms such as season and meteorological parameters.
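The distinction between retaining a central site term and modeling a temporally-corrected residual can be illustrated with a minimal sketch. Everything below is hypothetical: the numbers are synthetic, the variable names (`central`, `road_length`, `home`) are invented for illustration, and the "corrected" form is assumed to be a simple subtraction of the central site value, which may differ from corrections used in practice. It is not the model fitted in this study.

```python
# Hypothetical illustration with synthetic numbers; not data from the study.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes the system is non-singular (no zero pivots)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def sum_sq_resid(rows, y, beta):
    """Sum of squared residuals of a fitted linear model."""
    return sum(
        (yi - sum(b * x for b, x in zip(beta, row))) ** 2
        for row, yi in zip(rows, y)
    )


# Synthetic data: the home concentration responds to the central site
# (temporal signal) with slope 0.5 and to roadway length (spatial signal).
central = [10.0, 20.0, 15.0, 30.0, 25.0, 12.0]
road_length = [100.0, 300.0, 200.0, 150.0, 400.0, 250.0]
home = [2.0 + 0.5 * c + 0.01 * r for c, r in zip(central, road_length)]

# Option A: keep the central site term as a covariate; its slope is estimated.
rows_a = [[1.0, c, r] for c, r in zip(central, road_length)]
beta_a = ols(rows_a, home)
sse_a = sum_sq_resid(rows_a, home, beta_a)

# Option B (assumed form): subtract the central site value first, which
# implicitly fixes its slope at 1, then model the remainder spatially.
corrected = [h - c for h, c in zip(home, central)]
rows_b = [[1.0, r] for r in road_length]
beta_b = ols(rows_b, corrected)
sse_b = sum_sq_resid(rows_b, corrected, beta_b)
```

In this synthetic case, Option A recovers the generating coefficients, whereas Option B leaves temporal signal in the residuals because the assumed correction forces a slope of 1 on the central site term. Real data would include noise and would not separate this cleanly.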
Ultimately, most GIS-based residential exposure models are intended to allow exposure estimation across large cohorts, and thus we rely on readily-created GIS-based traffic indicators generally available across urban neighborhoods, such as total roadway length measures. Several predictors in our models, however, such as obstruction between the home and nearest major road, are effectively correction factors for the restrictions associated with residential monitoring; i.e., samplers often need to be set up behind buildings, wherever power sources are available, on a porch where smoking or grilling also occurs, or in some other non-ideal location. These parameters may not be appropriate for extrapolation, as they may not reflect mean concentrations near the home, but they are important for correctly interpreting residential data.
A number of issues related to LUR studies limit model generalizability. First, LUR model results are highly dependent upon the quality of spatial data available. Here, for example, total roadway length produced the strongest concentration estimates in our urban neighborhoods. In areas with better traffic data, however, indicators incorporating traffic density may fare better. Second, spatial variables can have different meanings in different settings; in rural areas, for example, proximity to major roads may be correlated with proximity to parking lots, industrial areas, and other sources, which is less likely in urban areas. Similarly, overall traffic counts in Europe can provide better estimates for EC modeling due to the higher prevalence of diesel vehicles. Higher emissions of NO2 and total particle mass from diesel engines suggest that we may expect higher R² values for European LUR models of intra-urban variability in traffic-related pollutants than for those in the US.
These issues of generalizability continue to challenge the search for "causal agents" in the association between traffic density and respiratory and cardiovascular illness. Because different traffic indicators have been shown to predict concentrations (and illness) in different regions, it is difficult to identify the specific spatial characteristics of unidentified causal agents. We maintain, however, that because the chemical and physical properties of various pollutants lead to differing rates of decay and deposition near roadways [15, 30], predictive models should be built and compared for multiple pollutants in epidemiological studies of the effect of traffic exposure on health. Further, because pollutants more refined than PM2.5 are still complex (e.g., EC, a subset of PM2.5, may have VOCs and metals bound to its surface), there remains a need for spatial models investigating distributions of specific PM2.5 constituents to evaluate their relationship with health outcomes. In addition, most urban residents of North American cities spend the majority of their time indoors, further complicating efforts to define causal pollutants in the traffic-health relationship. The models presented here do not address infiltration or indoor residential environments directly, but do facilitate estimation of indoor exposures when combined with home characteristics such as building type [19]. Finally, measurement error is differential across the three pollutants modeled here, as evidenced by their varying R² values, which complicates comparisons across predictive models for different pollutants in epidemiological studies. This issue deserves greater attention, as such comparisons across pollutant-specific models will be important in identifying causal agents.
Our results provide some insight for researchers working to elucidate intra-urban residential exposures for long-term epidemiological analyses. First, distinguishing temporal from spatial variability is a significant difficulty for multi-pollutant sampling designs; when only a small number of homes can be sampled simultaneously, too little variability is observed within any time period to distinguish temporal from spatial effects. To this end, we suggest that, in cities lacking a year-round rural background monitor for all pollutants of interest, a fixed-site monitor (using methods identical to those at the homes) in a less heavily-trafficked area would be useful to capture background concentrations and long-range transport. We also suggest exploration of sampling designs which maximize temporal overlap; for example, systematically-staggered sampling with a predictable amount of overlap at different homes might allow for a simplified temporal correction term. This design would also retain intra-season variability, which is lost in shorter-term simultaneous sampling campaigns. However, such theoretically optimal sampling designs should be approached realistically, with the recognition that residential sampling will require some amount of visit rescheduling (especially for a cohort such as ours, with women who are pregnant or have young children). Finally, in selecting sampling locations, site characteristics that could impede accurate sample collection (e.g., construction, lack of secure outdoor space or power outlets) should be considered and incorporated into previous methods for optimizing concentration variability, such as the one outlined in [12].
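The idea of systematically-staggered sampling with a predictable amount of overlap can be sketched as follows. This is only an illustration under stated assumptions: the function names, the 7-day session length, and the 3-day offset are hypothetical choices, not parameters from this study, and a real schedule would need to accommodate visit rescheduling.

```python
def staggered_schedule(n_homes, session_days, offset_days):
    """Return (home_index, start_day, end_day) tuples, with end_day exclusive.

    Each home starts `offset_days` after the previous one, so consecutive
    sessions share `session_days - offset_days` days by construction.
    """
    if not 0 < offset_days <= session_days:
        raise ValueError("offset_days must be in the range 1..session_days")
    return [
        (h, h * offset_days, h * offset_days + session_days)
        for h in range(n_homes)
    ]


def overlap_days(a, b):
    """Number of days two sessions share (0 if they do not overlap)."""
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


# Hypothetical design: 4 homes, 7-day sessions, each starting 3 days apart.
schedule = staggered_schedule(n_homes=4, session_days=7, offset_days=3)
consecutive_overlaps = [
    overlap_days(schedule[i], schedule[i + 1])
    for i in range(len(schedule) - 1)
]
```

Because the overlap between consecutive homes is constant in such a design, each session shares a known window with its neighbors, which is the property that might support a simplified temporal correction term.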