Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2670402

Formats

Article sections

Authors

Related links

Water Res. Author manuscript; available in PMC 2010 April 1.

Published in final edited form as:

Published online 2009 February 21. doi: 10.1016/j.watres.2009.01.034

PMCID: PMC2670402

NIHMSID: NIHMS97501

Understanding surface water quality is a critical step towards protecting human health and ecological stability. Because of resource deficiencies and the large number of river miles needing assessment, there is a need for a methodology that can accurately depict river water quality where data do not exist. The objective of this research is to implement a methodology that incorporates a river metric into the space/time analysis of dissolved oxygen data for two impaired river basins. An efficient algorithm is developed to calculate river distances within the BME*lib* statistical package for space/time geostatistics. We find that using a river distance in a space/time context leads to an appreciable 10% reduction in the overall estimation error, and results in maps of DO that are more realistic than those obtained using a Euclidean distance. As a result river distance is used in the subsequent non-attainment assessment of DO for two impaired river basins in New Jersey.

The identification of impaired river segments is a significant requirement of the federally implemented Clean Water Act (CWA) of 1972. The CWA requires states to assess water quality and identify and report those segments that are impaired for particular uses. Dissolved Oxygen (DO) content is one of the easiest and most basic water quality parameters to measure and is a good indicator of overall stream health. Because of resource deficiencies, budget constraints, and the sheer number of river miles to be assessed, there is a need for cost efficient and effective methods that can estimate DO for a large number of river miles using limited monitoring data.

One way to do this is with geostatistics based methods that use the principle of correlation among like data points to derive values where data do not exist. There have been several studies that characterize surface water quality using geostatistics. These studies involve linear kriging methods, or regression based models based on the spatial variability in the data (Rasmussen *et al.* 2005, Tortorelli and Pickup 2006). They also use a Euclidean, or ‘across land’ metric to calculate distances between data points.

Recent developments in geostatistics have moved beyond the purely spatial domain to include temporal variability as well (Stein 1986, Christakos 1992, Cressie 1993, Bogaert 1996, Kyriakidis and Journel 1999, Fuentes 2004, Kolovos *et al.* 2004, Akita *et al.* 2007). Akita *et al.* (2007) use spatiotemporal methods to assess tetrachloroethene (PCE) in the rivers of New Jersey, and claims a 56% improvement in estimation accuracy when comparing a space/time to a purely spatial approach. This is a substantial improvement and is most likely due to the irregularity of the spatial and temporal sampling of PCE. DO data in New Jersey are characterized by the same irregular space/time sampling; therefore a spatiotemporal framework will be used in this work.

The framework implemented in this research is the spatiotemporal Bayesian Maximum Entropy (BME) method (Christakos 1990, 2000; Serre *et al.* 1998, Serre and Christakos, 1999). This method has been successfully applied to a variety of environmental issues, including air quality (Christakos *et al.* 2004; Wilson and Serre 2007), and epidemiology (Law *et al.* 2004, 2006), as well as water quality (Serre *et al.* 2004, Akita *et al.* 2007). As demonstrated in these studies, BME presents the flexibility of providing the space/time kriging method as its linear limiting case, as needed for this work on DO, while it can be expanded to a non-linear estimator if other non-linear knowledge bases (e.g. soft data, non Gaussian distribution, etc.) need to be considered.

There have been several recent studies regarding the use of non-Euclidean distances and stream flow in water quality estimation, and the development of corresponding permissible covariance models (Ver Hoef, 2006; Cressie *et al.*, 2006; Peterson and Urquhart, 2006; Curriero, 2006; Bailly *et al.*, 2006; Bernard-Michel and Fouquet, 2006; Peterson *et al.*, 2007). Ver Hoef (2006), Cressie *et al.* (2006), and Peterson *et al.* (2006) demonstrate the use of flow-weighted covariance models using nitrates, change in DO, and dissolved organic carbon (DOC), respectively.

A summary of the most recent studies that compare Euclidean and river covariance models is presented in Table 1. Cressie *et al.* (2006) and Peterson and Urquhart (2006) compared Euclidean and flow-weighted covariance models, and found that the Euclidean model performed better. Ver Hoef *et al.* (2006) is the only study that found a flow-weighted covariance model is more accurate, though they compare that model with an isotropic covariance model using a river metric instead of an Euclidean metric.

The methods proposed in this work are based on geostatistical principles and spatial autocorrelation between data points. They are not meant to take the place of mechanistic and process-based models such as the traditional Streeter-Phelps or the Qual2 models developed by EPA. Geostatistical models can complement these existing methods by taking the outputs of these models and using them as inputs into a geostatistical framework to create larger spatial and temporal coverages of the parameter of interest, possibly leading to more accurate maps (LoBuglio *et al.* 2007). Alternatively, geostatistical models can also complement mechanistic models by providing evaluation data reconstructed over a basin. This study attempts to look at only geostatistical models in order to gain an understanding of the influences that distance measures have on our ability to assess rivers for DO impairments. Future work will examine the use of these models in combination with other mechanistic modeling approaches.

While the majority of studies have focused on purely spatial estimation methods, this research will examine the use of a river metric in a composite space/time analysis. Since very few studies have used a river metric to examine DO in a spatial context, and even fewer have done such analysis in a space/time context, the two objectives of this study are (1) to determine whether the use of a river metric provides a better model for estimation of DO along a river network in a space/time context, and (2) to apply the most appropriate space/time model to estimate DO non-attainment for two impaired river basins.

The two study watersheds are shown in Fig. 1 together with arithmetic averages of measured DO over the study time period. Both areas are high priority basins for the state and have impairments related to nutrients, sediments, micro-organisms, and DO. The state of New Jersey is divided into 20 watershed management areas (WMA). The Raritan consists of three WMAs, the North and South Branch, Millstone, and Lower Raritan. The land uses in both basins are primarily urban or agricultural. Overall, the Raritan is 36% urban, 19% agriculture, with the remaining divided between forest, wetland, and water. The Lower Delaware is 46% urban and 21% agricultural. These classifications are based on the 1995/97 Land Use/Land Cover designations by the State of New Jersey. New Jersey has a moderate climate with cold winters and warm, humid summers. These fluctuations in temperature play an important role in determining the amount of available DO found in these basins. Additionally, both the Lower Delaware and Raritan basins are geologically structured such that highlands situated to the west (Raritan) and east (Lower Delaware) feed into flat, highly developed areas near the basin outlets, where impervious surfaces exceed 50% in many places (NJDEP, 2002). Urban development taking place in both of these basins over the last two decades coupled with relatively little change in agricultural uses produces a wide array of point and non-point sources in both regions. This leads to increased nutrient levels from waste water discharge, urban runoff, and agricultural runoff, and the potential for higher biological oxygen demand (BOD) and reduced DO levels. According to the 2006 integrated water quality report, only 5% of statewide impairments were due to dissolved oxygen, however, greater than 30% of river miles went un-assessed due to insufficient data (NJDEP, 2006b). This is where methods such as the one employed in this study become increasingly important.

DO data were obtained from two sources for the period beginning January 1, 1990 through August 1, 2005. The first source is the U.S. Geological Survey (USGS) National Water Information System (NWIS). The second source is the USEPA storage and retrieval (STORET) database. Often times these databases report values with clarifying symbols accompanying them to signify uncertainty in the measurement. Therefore, in order to use these values in the analysis, any value reported as ’less than’ a particular value (i.e. containing a ‘<’ in the database) were treated as equal to 50% of that value, and values reported as estimated (i.e. containing an ‘E’ in the database) were treated as actual values. A summary of the data is given in Table 2.

Consider the case of a river network that can be represented by a directed tree of river reaches with zero width. This representation is highly adequate for downstream combining stream networks with somewhat narrow reaches; however it is not highly adequate for wider water bodies such as connected estuaries or lakes (Curriero, 2006). The river network is made up of reaches connected at confluence nodes. Each river reach is identified by a unique index *i* (Fig. 2), and let V be the set of all river reach indexes; V={1,2,…n}, where n is the total number of individual reaches. An *i*=1 will denote by convention the downstream-most river reach. The downstream end of the downstream-most reach is the outlet of the river network. The longitudinal coordinate *l* of a point on the river network is defined as the length of the continuous line connecting the outlet to that point along the river network (by convention, negative *l* values represent fictitious locations downstream of the outlet). A point ** r**=(

(Left) Directed tree river network with 5 stream reaches (numbered in circles), and showing point (*l,i*) on reach 4, and point (*l*′*,i*′) on reach 3. (Right) Range of the exponential-power river covariance parameters (α,β) **...**

A non-negative real-valued function *d*(**r**,**r**′) is a metric if it verifies the following three properties

$$\begin{array}{l}d(\mathbf{r},{\mathbf{r}}^{\prime})=0\phantom{\rule{0.38889em}{0ex}}\text{if}\phantom{\rule{0.16667em}{0ex}}\text{and}\phantom{\rule{0.16667em}{0ex}}\text{only}\phantom{\rule{0.16667em}{0ex}}\text{if}\phantom{\rule{0.16667em}{0ex}}\mathbf{r}={\mathbf{r}}^{\prime},\\ d(\mathbf{r},{\mathbf{r}}^{\prime})=d({\mathbf{r}}^{\prime},\mathbf{r}),\phantom{\rule{0.16667em}{0ex}}\text{and}\\ d(\mathbf{r},{\mathbf{r}}^{\prime})\le d(\mathbf{r},{\mathbf{r}}^{\u2033})+d({\mathbf{r}}^{\u2033},{\mathbf{r}}^{\prime})\end{array}$$

(1)

for all **r**, **r**′, **r**″. We denote *d _{E}*(

We let *X*(** r**) be a random field representing the value taken by a water quality parameter

$$\sum _{k=1}^{n}\sum _{{k}^{\prime}=1}^{n}{q}_{k}{q}_{{k}^{\prime}}c(d({\mathit{r}}_{k}{\mathit{r}}_{{k}^{\prime}}))\ge 0$$

(2)

for all choices of *n* river points *r** _{k}* and real numbers

$$cov(\mathit{r},{\mathit{r}}^{\prime})=exp(-{({d}_{E}(\mathbf{r},{\mathbf{r}}^{\prime})/{a}_{r})}^{\beta}),\phantom{\rule{0.38889em}{0ex}}0<\beta \le 2$$

(3)

where *a _{r}* is the covariance range. This model corresponds to the usual exponential and Gaussian models when

Ver Hoef *et al.* (2006) propose an appealing method to construct permissible covariance functions for river networks. Using their approach, we define the random variable X(*l*,*i*) at longitudinal coordinate *l* along reach *i* as the moving-average of a white noise random process W(*u*,*j*) defined at longitudinal coordinate *u*<*l* along reach *j* downstream of reach *i*. Let V* _{i}*(

$$\text{X}(l,i)=\underset{-\infty}{\overset{l}{\int}}du\phantom{\rule{0.16667em}{0ex}}g(u-l)\phantom{\rule{0.16667em}{0ex}}W(u,{V}_{i}(u))$$

(4)

where *g*(*u*−*l*) is a moving average function defined on *R*^{1}. As indicated in Ver Hoef *et al.* (2006), by choosing a moving average function that is exponentially decaying away from 0, i.e.
$g(h)=\sqrt{2}exp(-\mid h\mid )$, the moving average construction leads to a valid covariance function of exponential type that is a function of the river distance, i.e.

$$cov(\mathit{r},{\mathit{r}}^{\prime})=exp(-{d}_{R}(\mathit{r},{\mathit{r}}^{\prime}))$$

(5)

An overview of how to obtain this result has already been provided by Ver Hoef *et al.* (2006) and Ver Hoef and Peterson (2008), therefore we only provide the detailed proof of this result in the supplementary material for this paper. We note that while the exponential power model is valid for 0<*β*≤ 2 for the Euclidean distance, that model has only be shown to be valid for the river distance when *β*=1.

The most appropriate distance for a given water quality parameter may be a combination of the Euclidean and river distances. We may therefore define a composite Euclidean-river distance as

$${d}_{\alpha}(\mathit{r},{\mathit{r}}^{\prime})=\alpha {d}_{R}(\mathit{r},{\mathit{r}}^{\prime})+(\alpha -1){d}_{E}(\mathit{r},{\mathit{r}}^{\prime}),\phantom{\rule{0.38889em}{0ex}}0\le \alpha \le 1$$

(6)

which can easily be shown to verify the properties of a metric. Using *d _{α}* (

$$cov(\mathit{r},{\mathit{r}}^{\prime})=\mathit{exp}(-{({d}_{\alpha}(\mathit{r},{\mathit{r}}^{\prime})/{a}_{r})}^{\beta}),\phantom{\rule{0.38889em}{0ex}}0\le \alpha \le 1\phantom{\rule{0.16667em}{0ex}}\text{and}\phantom{\rule{0.16667em}{0ex}}0<\beta \le 2$$

(7)

which to the best of our knowledge, has not been proposed in this form in earlier works. This covariance model is permissible for *any* directed tree river network for (*α*=0,*β* ]0,2]) and (*α*=1,*β*=1). Additionally, for a particular river of interest, this covariance model may be valid for other values of *α* [0,1] and *β*ε ]0,2], which can be verified numerically by checking that the lowest eigenvalue λ of any covariance matrix used in the estimation of water quality is non-negative. Fig. 2 depicts the range of (*α,β*) values for which the lowest eigenvalue is positive, i.e. min(λ)>0, for 20 points randomly selected in the Raritan river in New Jersey. As can be seen from this figure, there is a large range of permissible (*α,β*) values for this particular river and selected points.

Hence a composite Euclidean-river distance has been developed that can be used for a variety of water quality parameters. Using an isotropic exponential-power river covariance model, it is shown that this model is permissible for any directed tree river network for (*α*=0,*β* ε ]0,2]) and (*α*=1,*β*=1), and provides a river-specific numerical test to check whether the model is permissible using other choices of *α*ε [0,1] and *β*ε ]0,2].

Another important class of permissible covariance models for directed tree river networks are covariance functions that use flow and river distance (Ver Hoef *et al.*, 2006; Cressie *et al.* 2006; Peterson *et al.*, 2006; Peterson *et al.*, 2007; Bernard-Michel and Fouquet, 2006, see supplemental materials for mathematical details of their work using a unified mathematical notation), which we refer to as flow-weighted covariance models, and which can be written as

$$cov(\mathit{r},{\mathit{r}}^{\prime})=\sqrt{\mathrm{\Omega}(i,{i}^{\prime})}\phantom{\rule{0.16667em}{0ex}}{c}_{1}({d}_{R}(\mathbf{r},{\mathbf{r}}^{\prime}))$$

(8)

where the real valued function *c*_{1}(.) can be any permissible covariance function in *R*^{1} (e.g. such that it is the Fourier transform of a non-negative bounded function in *R*^{1}, Christakos, 1992), and Ω(*i*,*i*′) is a real number between 0 and 1 expressing the amount of flow connection between reach *i* and *i*′ such that Ω(*i*,*i*′)=0 if they are not flow-connected, Ω(*i*,*i*′)=1 if they are on the same reach, and
$\sum _{{i}^{\prime}\in {V}_{l}(u)}}\mathrm{\Omega}(i,{i}^{\prime})=1\phantom{\rule{0.38889em}{0ex}}\text{for}\phantom{\rule{0.38889em}{0ex}}u>l$. The above flow-connected covariance model was first derived by Ver Hoef *et al.* (2006). Cressie et al. (2006) subsequently proposed that the flow connection between reach *i* and an upstream reach *i*′ can be defined as Ω(*i*,*i*′)= Ω(*i*′)/Ω(*i*) where Ω(*i*) is a function that increases in the direction of flow. In that case, the property
$\sum _{{i}^{\prime}\in {V}_{l}(u)}}\mathrm{\Omega}(i,{i}^{\prime})=1\phantom{\rule{0.38889em}{0ex}}\forall \phantom{\rule{0.38889em}{0ex}}u>l$ is verified if and only if Ω(*i*) is a flow additive function, i.e. such that if two reaches *i*′ and *i*″ combine into reach *i*, then Ω(*i*′)+ Ω(*i*″)=Ω(*i*). As shown in the supplementary material various additive functions can be used to obtain Ω(*i*), including flow discharges if these are available, watershed areas (Ver Hoef et al. 2006; Peterson and Urquhart, 2006; Peterson *et al.*, 2007; Bernard-Michel and Fouquet, 2006), or simply an additive stream-order number (Cressie *et al.*, 2006).

Flow-weighted covariance models do not belong to the class of isotropic river covariance models because the flow connection term cannot be reduced to a function of the distance between points. Their obvious advantage is that they incorporate flow-connectivity in the model of autocorrelation. However, as noted by Peterson and Urquhart (2006), setting the covariance to zero when points are not flow-connected may be a hindrance if very few monitoring sites are flow-connected, because in that case the number of data points in the estimation neighborhood is drastically reduced, leading to less informed estimation maps than those produced using an isotropic river covariance model. Unfortunately there are very few monitoring points that are flow-connected on any given sampling day in our DO dataset. Hence flow-connected covariance models could not be used in this work. However; these models should be used when a large fraction of the monitoring samples are flow-connected. Recent exciting work by Bailly *et al.* (2006) may allow us to extend the class of flow-connected covariance models to include models allowing some autocorrelation between points that are not flow-connected (with conditional independence to common downstream points).

This analysis uses a space/time random field (S/TRF) *X*(** p**), where

$$c(r,\tau )={c}_{1}\delta (r)\delta (\tau )+{c}_{2}exp\left(\frac{-3r}{{a}_{r2}}\right)exp\left(\frac{-3\tau}{{a}_{t2}}\right)+{c}_{3}exp\left(\frac{-3r}{{a}_{r3}}\right)cos\left(\frac{2\pi \tau}{{a}_{t3}}\right)+{c}_{4}exp\left(\frac{-3r}{{a}_{r4}}\right)exp\left(\frac{-3\tau}{{a}_{t4}}\right)$$

(9)

where *r* is chosen to be either the Euclidean or river distance and *δ* is the nugget coefficient in space or time. This model consists of 4 structures where c1…c4 are calculated portions of the total variance and correspond to the coefficients of each structure (i.e. c_{1} for structure 1, c_{2} for structure 2…). The first term of each structure is the spatial component, while the second term relates to the temporal component of the covariance. The variables a_{r} and a_{t} are the spatial and temporal ranges for each structure. Other than the initial nugget, the spatial component of the remaining structures is exponential, which as shown above is permissible for any directed tree river network for the Euclidean and river distances, and therefore the overall model is permissible because it corresponds to nested space/time separable permissible covariance functions (Kolovos *et al.*, 2004). The temporal component is exponential for structures 2 and 4, while structure 3 is a cosinusoidal function related to the seasonal fluctuations often associated with DO. Further covariance details are found in section 3.1.

The BME method was used to estimate DO at unsampled river locations. BME provides a rigorous mathematical framework to process a wide variety of knowledge bases characterizing the space/time distribution and monitoring data available for DO, and obtain a complete stochastic description of DO at any unmonitored space/time point in terms of its posterior Probability Distribution Function (PDF). The BME method was introduced by Christakos (1990), and a detailed description of the conceptual underpinnings of the BME framework are provided in Christakos (1992,2000), while its *BMElib* numerical implementation is described in Serre *et al.* (1998), Serre and Christakos (1999) and Christakos *et al.*(2002).

*BMElib* version 2.0b was used in this analysis. It was written using the MATLAB R2000a programming platform. *BMElib* 2.0b does not have functions to calculate river distances. We therefore added new MATLAB functions to *BMElib* 2.0b that efficiently calculate river distances. Details about the implementation of river distance calculations in *BMElib* are provided in the supplemental materials for this paper.

The distribution of DO across space and time is modeled as the sum of a non-random function *m*_{DO}(** p**) and an isotropic/stationary residual S/TRF

The site specific knowledge includes both hard data (e.g. measured value) and soft data (i.e. data with associated measurement error). By way of summary, BME uses the maximization of a Shannon measure of information entropy and an operational Bayesian updating rule to process the general and site specific knowledge bases, and obtain the posterior PDF describing the DO concentration at any unsampled point of the river network (Christakos *et al*., 2002).

This research uses the special case where only hard data are considered (i.e. the measurement errors are small or unidentified). In this case the BME method yields the estimators of linear Geostatistics known as the simple, ordinary and universal kriging methods. This research, therefore, is based on a form of space/time linear kriging. The *BMElib* package (BMElib, 2008) implements concepts of composite space/time analysis (i.e. composite space/time metrics and neighborhood search, non separable space/time covariance models, etc.) that result in better geostatistical functions for linear space/time kriging than that provided by classical geostatistics software where time is included as merely another spatial dimension.

In order to determine which of the Euclidean or river metrics was more accurate for the assessment of DO in the study basins, a cross-validation procedure was used. Each data point was removed sequentially and re-estimated using the remaining space/time data points. The Mean Square Error (MSE) is calculated as the sum of the squared differences between re-estimated and measured values. The method with the lowest MSE is then used in the assessment of DO along unmonitored rivers.

Using the selected distance metric within the BME framework we estimate DO at equidistant estimation points (i.e. distributed at a fixed interval of 0.1 miles) along the Raritan and Lower Delaware river networks. Monitoring data are treated as hard data because all measurements met the USGS or EPA data quality standards. For each estimation point the hard data situated in its local space/time neighborhood is selected, and the corresponding BME posterior PDF is calculated to describe DO at that estimation point. The BME posterior PDF obtained at equidistant points along the river network are then used to obtain estimated DO values, which are used to produce maps of DO concentration, and delineate river miles that may be impaired.

In order to better understand the seasonal pattern of DO impairment and better quantify the probability of these impairments, a criterion-based space/time assessment framework is employed to categorize the fraction of river miles meeting certain probability thresholds, as discussed in Akita *et al.* (2007). These thresholds give us the ability to classify the probability of violation of a standard for any space/time estimation point based on its BME posterior PDF. The standard for DO concentration was set at 7 mg/L, which is the standard used by NJDEP for FW-TP streams (NJDEP, 2006a). Using this standard, the probability of violation at space/time point ** p** is then defined as the probability that the BME mean estimate is < 7 mg/L, i.e.

$$\text{Prob}[\text{Violation},\mathit{p}]=\text{Prob}[\text{DO}(\mathit{p})<7.0\text{mg}/\text{L}]$$

(10)

The fraction of river miles impaired during any given time period is calculated using the fraction of equidistant estimation points for which the probability of violation is in excess of some pre-selected probability threshold.

Fig. 3 shows the experimental covariance values (squares) obtained using the mean trend removed DO data for the Raritan and Lower Delaware River Basins. These estimates were then used to fit the non separable space/time covariance model (Eq. 9). The sills *c*_{1}, …, *c*_{4}, spatial ranges a_{r2}, a_{r3}, a_{r4}, and temporal ranges a_{t2}, a_{t3}, a_{t4} obtained are listed in Table 3, and the resulting model is shown as a solid line in Fig. 3.

Space/time covariance of mean-trend removed DO in New Jersey’s Raritan and Lower Delaware River Basins shown as a function of distance *r* along the river network for a temporal lag of τ=0 (top plot) and as a function of τ for *r* **...**

The variance of the first structure of the covariance model, or the nugget effect, is about 25% of the overall variance. The nugget effect typically consists of the variance due to inherent variability of DO over very short distances plus the measurement-error variance. In our case, we assessed that for our dataset, the measurement-error variance for DO contributes at most 20% of the total variation in the data, which is within the upper bound indicated by the nugget effect. The second structure of the covariance model contains a short range exponential spatial component and a short range exponential temporal component. Fluctuations of DO over this combination of short spatial ranges (5 km) and short temporal ranges (25 days) may be due to local sources of pollution acting over short spatial distances (such as point pollution discharges leading to local increase of BOD and subsequent reductions in DO over short distances) that either have intermittent pollution discharge loading lasting just a few days, or are persistent but have an effect that is altered intermittently by meteorology events lasting from a few days to a month (e.g. rainfall events, or changes in temperature which significantly effects the oxygen saturation of water). This accounts for nearly 50% of the overall variation. The third structure of the covariance also contains a very short range exponential spatial component but coupled with a medium range cosinusoidal hole temporal component with a periodicity corresponding exactly to a calendar year. This covariance structure contributes approximately 5% of the total variation in DO and corresponds to processes acting seasonally. These processes are very localized geographically as they act over distances of about 2.2 km, which may again include localized spikes in BOD and subsequent DO depletion, as well as the natural variability in river morphology and processes acting on DO over distances ranging from of 1 to 3 km. The final covariance structure consists of a long range exponential component in both space and time. The long spatial range of 88.9 km can be attributed to characteristics and impacts from non-point source pollution from suburban development and agricultural runoff that can affect long stretches of rivers at once. What is interesting to note is that these fluctuations have a temporal range of about 10,000 days or 27.4 calendar years, which captures time scales corresponding to long term effects of human activities and impact on the environment, as well as climatic changes that may alter the air/water interface and oxygen equilibrium. It should be recognized that there is a wider confidence interval for this temporal range than for any of the other spatial or temporal ranges of our covariance model because this temporal range of 27.4 years exceeds the duration of the time period for which data are available (15 years). Nonetheless it is interesting to note that it is a very large temporal range, which suggests that non-point source pollution over large geographical areas may have an impact on DO that is lasting much longer than the impact of point source pollutions. This may have the serious policy implication that, while pollution prevention strategies may have quick responses in abating the effect of point sources pollution, these strategies may face a much greater challenge in abating rapidly the effect of non-point source pollution on the DO in the surface waters of New Jersey.

A cross-validation was performed to examine the differences in estimation of DO using a Euclidean versus a river distance. Table 4 summarizes the cross-validation MSE obtained for each river basin using both distances. The use of a river metric resulted in 11.3% (Raritan) and 10.3% (Lower Delaware) decrease in MSE. We note that the cross-validation points were at a distance from their neighboring training data points corresponding to several times the average spatial and temporal ranges. In this situation there isn’t as much contrast between the Euclidean and river metrics as would be the case if the points were closer across space. Hence, it is possible that the true gain in mapping accuracy is higher than the 10%–11% found. This is supported by other cross-validation analysis we conducted using synthetic datasets (results not shown here). The approximately 10% reduction in estimation error is appreciable because previous studies using river distance in an estimation context found little difference between a Euclidean and river based model and in some cases found a river distance to increase the prediction error (Ver Hoef *et al.*, 2006; Cressie *et al.* 2006; Peterson *et al.*, 2006; Peterson *et al.*, 2007).

Change in cross validation mean square error (MSE) for each basin. A negative change indicates a reduction in overall MSE (i.e. improvement) when using a river metric.

The improvement in mapping accuracy is supported by our covariance analysis. The variance weighted average of the Euclidean and river spatial ranges were 9.7km and 20.4km, respectively. This means that DO levels are correlated over much longer distances along the river network than across land. This is due in part to the fact that a river meanders, so that the distance along two points is longer along the river reach than across land. The ratio of river distance to straight-line distance between two reach endpoints is known as the meandering ratio (MR). However, even when we account for the meandering of the network, the range of correlation between points along a river network is significantly higher than when using a Euclidean metric. For example, the average (MR) for both the Lower Delaware and Raritan basins is approximately 1.2. Factoring out this effect by dividing the ratio of river range to Euclidean range (2.1) by the average MR (1.2) gives us an adjusted ratio of river vs. Euclidean range of 1.8. This means that, in practice, even after adjusting for meandering, the correlation along the river is still 1.8 times longer than across land.

While use of the river metric produces maps of DO that are more accurate than those obtained using a Euclidean metric, one might ask whether these maps are visually different. The visual difference can best be shown by comparing the DO estimated in two areas of the Raritan Basin, as shown in Fig. 4. The maps obtained using a Euclidean metric are shown on the left, while the maps obtained using the river metric are shown on the right. The subfigure contains the zoomed in portion of the northwestern Raritan basin corresponding to the North and South Branch WMA to highlight two major differences when comparing metrics. Fig. 4(a) depicts the zonal differences while Fig. 4(b) depicts the parallel reach effect.

Zonal (a) and Parallel Reach effect (b) on the BME Estimation of DO Residual in the Upper & Lower Branch Raritan Basin on Dec 16, 2002 using a Euclidean metric (left) or a river metric (right). Squares are locations of monitoring stations for **...**

From Fig. 4(a) the differences in zonal influence that points have when using a Euclidean vs. a river metric are apparent. This is directly connected to the differences in covariance ranges. The river covariance has a longer variance weighted spatial range, resulting in a larger zone of influence of data points along the river. For the Euclidean metric this zone is circular in nature with a smaller range than the zone of influence observed with the river metric, as can be seen by comparing the right and left maps of Fig. 4(a). Fig. 4(b) depicts another phenomenon along parallel reaches. When estimating the DO level at a point along an unmonitored reach, a higher relative weight is assigned to a sample collected at a point that is at a short distance along the river network, than at a point that is at a short distance across land. So when considering the case shown in Fig. 4(b) where two clearly different river branches are running in parallel of one another, we see that the Euclidean map on the left tends to propagate information from the monitoring data points across land, while the river map on the right constrains the propagation of that information to the river branch where the sample was collected, leading to a more realistic map where parallel branches have distinct water quality.

Given the monitoring data available in this study, the results support our hypothesis that the river metric provides more accurate and realistic maps of DO across a river network than maps obtained using a Euclidean metric. Based on this conclusion, river distance was incorporated into the estimation of DO in the Raritan and Lower Delaware River Basins for a subset of the study period (2000–2005) to improve our assessment of the fraction of river miles not attaining the FW-TP standard for DO in New Jersey.

Using the river metric the BME posterior PDF was calculated describing DO at estimation points distributed uniformly along all river miles of in the Raritan and Lower Delaware Basins. Fig. 5 depicts the BME mean estimate of DO on June 12, 2002. This date is representative of a typical summer month where DO is at its lowest in both basins. The darker areas highlight river miles where DO has fallen below the New Jersey FW-TP standard of 7 mg/L. The inset highlights an area in the southeast quadrant of the Raritan Basin, corresponding to the Millstone WMA where a majority of river miles are impaired for DO during this time period. Additional movies provided in the supplementary data for this paper show DO for every 30 days of the 2000 through August 2005 time period, for both the Lower Delaware and Raritan Basins. These maps are used to calculate the fraction of river miles impaired.

For illustration purpose we use the New Jersey FW-TP standard of 7 mg/L for waters designated for freshwater trout-production because the Lower Delaware and Raritan Basins contain a significant number of trout producing and trout maintaining streams. The data were examined to see if a temporal or seasonal trend existed as the temporal covariance would suggest.

The average fraction of river miles not meeting the assessment criteria for more likely than not (MLTN) in non-attainment (i.e. probability of violation > 50%, Akita *et al.*; 2007) increases from 0.00% to 6.61% between winter and spring of 2002 in the Raritan Basin, and from 0% to 19.70% of river miles in the Lower Delaware Basin (Table 5). In the summer of 2002, this fraction of impaired river miles increases even further, to about 23% in the Raritan and about 58% in the Lower Delaware. Much of this phenomenon can be related to temperature changes depending on season. In the warmer months, water temperature is at its highest and therefore does not hold as much oxygen as the colder water in the winter months. Because New Jersey, and particularly the Raritan sit at higher latitudes, the water temperature drops drastically in winter, leading to near 0% of river miles being impaired. Alternatively, the warmer waters of the summer and spring can foster biological growth that consumes large amounts of DO. This increase in BOD compounds the affects of higher temperatures and leads to more river miles in non-attainment.

Seasonal Average Variation in Fraction (%) of River Miles More Likely than Not (MLTN) in Non-Attainment (probability of Violation > 50%) for 2002

From 2000 to 2005, between about 8% to 58% of river miles were found to be MLTN in non-attainment in the warmer summer months (Table 6). The fraction of impaired river miles was highest in 2002, that fraction decreased in 2003 and 2004, but it increased again in the last year of our study period, 2005, indicating that low DO may be an on-going problem in these basins. 2005 was also the warmest summer on record in New Jersey and coupled with a drought, could have contributed to the increase in impaired river miles. An analysis was also conducted to determine the fraction of river miles highly likely in non-attainment (i.e. probability of violation > 90%, Akita *et al.*; 2007). Based on this criterion, we found that the Lower Delaware had a much higher fraction of river miles ascertained as impaired than can be found in the Raritan. Over the study period, the Lower Delaware had as much as 19% of river miles highly likely in non-attainment, while the Raritan remained around 1.8%. DO is affected by a number of environmental factors, including temperature, salinity, nutrient levels and biological oxygen demand. One reason for explaining the larger percentage of impaired miles in the Lower Delaware is the fact that not only is the percentage of urban area larger, but the overall amount of agricultural land is also larger than that in the Raritan, possibly contributing non-point source nutrient loading into adjacent streams. Land cover and land use have shown to greatly impact the quality of streams and rivers, especially in areas undergoing rapid conversion to more urban development patterns (King et al., 2005; Chang, H. 2008).

Average Summer Fraction (%) of River Miles More Likely than Not (MLTN) in Non- Attainment (probability of Violation > 50%) for the period 2000–2005 (Summer = Jul- Sep).

One of the limitations of this approach is the exclusion of other parameters that can be used to predict DO levels. DO is affected by the geochemistry of the water, and therefore process-based models may provide additional information to refine our geostatistical models. Additionally, this approach looks at only one class of potential covariance functions, the exponential power model, other functions need to be tested for permissibility when using river distance (isotropic or flow-weighted). Finally, we use a partial cross-validation procedure to examine the predictive capability of our model. Full model validation using measured DO will be useful for further model refinement in future work.

These results suggest that continued DO monitoring is particularly critical in the Lower Delaware basin to evaluate future trends in DO during the summer months. More work is needed to identify the specific causes of low DO. The DO maps generated here provide a general basis to help identify these causes.

Several conclusions can be drawn from the results of this study:

- Our implementation of a river distance metric in the BMElib package provides an efficient and flexible tool for the space/time analysis of water quality along river networks.
- Application of the river metric to analyze DO in two river basins in New Jersey leads to maps that are 10–11% more accurate and visually more realistic than maps obtained using the classical Euclidean distance.
- After adjusting for river meandering, the correlation of DO along the river is about 1.8 times longer than across land.
- DO non-attainment was worse in the Lower Delaware, over more river miles, and over a longer period of time than in the Raritan.
- Additional parameters, such as BOD, temperature, salinity, and nutrients should be factored in to improve estimation accuracy at unmonitored locations.
- Future work should examine other water quality parameters in a space/time context as they may behave differently than DO.

We are grateful for the assistance of staff for the state of New Jersey. This work was supported by the New Jersey Dep. of Environmental Protection (contracts SR03-046, SR04-062 and SR05-050) and grants from the National Inst. of Environmental Health Sciences (grants no. 5 P42 ES05948, P30ES10126 and T32 ES007018).

**Publisher's Disclaimer: **This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

- Akita Y, Carter G, Serre ML. Spatiotemporal non-attainment assessment of surface water tetrachloroethene in New Jersey. Journal of Environmental Quality. 2007;36(2) [PubMed]
- Bailly J-S, Monestiez P, Lagacherie P. Modelling spatial variability along drainage networks with geostatistics. Math Geology. 2006;38(5):515–539.
- Bernard-Michel C, de Fouquet C. In: Pirard E, Dassargues A, Havenith H-B, editors. Construction of valid covariances along a hydrographic network. Application to specific water discharge on the Moselle Basin; Proceedings of IAMG’2006 - XIth International Congress for Mathematical Geology; Liège. Sep 3 – 8; 2006. CD S13_03, 4p.
- BMElib: Bayesian Maximum Entropy Library for Space/Time Geostatistics. Version 2.0b for MATLAB. http://www.unc.edu/depts/case/BMELIB/
- Bogaert P. Comparison of kriging techniques in a space-time context. Math Geology. 1996;28(1):73–86.
- Chang H. Spatial analysis of water quality trends in the Han River basin, South Korea. Water Research. 2008;42:3285–3304. [PubMed]
- Christakos G. A Bayesian/maximum-entropy view on the spatial estimation problem. Math Geology. 1990;22(7):763–776.
- Christakos G. Random field models in Earth Sciences. Academic Press; San Diego: 1992. p. 474.
- Christakos G. Modern spatiotemporal geostatistics. 2. Oxford Univ. press; New York: 2000. 2001.
- Christakos G, Bogaert P, Serre ML. Temporal GIS: advanced functions for field-based applications. Springer; New York: 2002. p. 217.
- Christakos G, Kolovos A, Serre ML, Vukovich F. Total ozone mapping by integrating data bases from remote sensing instruments and empirical models. IEEE Trans on Geosc and Rem Sensing. 2004;42(5):991–1008.
- Cressie NAC. Statistics for spatial data. rev. John Wiley & Sons; New York: 1993. p. 900.
- Cressie N, Frey J, Harch B, Smith M. Spatial Prediction on a River Network. Journal of Agricultural Biological And Environmental Statistics. 2006;11(2):127–150.
- Curriero FC. On the use of non-Euclidean distance measures in geostatistics. Math Geology. 2006;38(8):907–925.
- Fuentes M. Testing for separability of spatiotemporal covariance functions. Journal of Statistical Planning and Inference. 2004;136(2):447–466.
- King RS, Baker ME, Whigham DF, Weller DE, Jordan TE, Kazyak PF, Hurd MK. Spatial considerations for linking watershed land cover to ecological indicators in streams. Ecological Applications. 2005;15:137–153.
- Kolovos A, Christakos G, Hristopulos DT, Serre ML. Methods for generating non-separable spatiotemporal covariance models with potential environmental applications. Advances in Water Resources. 2004;27(8):815–830.
- Kyriakidis PC, Journel AG. Geostatistical space-time models: a review. Math Geology. 1999;31(6):651–684.
- Law DCG, Serre ML, Christakos G, Leone PA, Miller WC. Spatial analysis and mapping of sexually transmitted diseases to optimize intervention and prevention strategies. Sexually Transmitted Infections. 2004;80:294–299. [PMC free article] [PubMed]
- Law DCG, Bernstein K, Serre ML, Schumacher CM, Leone PA, Zenilman JM, Miller WC, Rompalo AM. Modeling an early syphilis outbreak through space and time using the bayesian maximum entropy approach. Annals of Epidemiology. 2006;16(11):797–804. [PubMed]
- LoBuglio JN, Characklis GW, Serre ML. Cost-effective water quality assessment through the integration of monitoring data and modeling results. Water Resources Research. 2007;43:W03435. doi: 10.1029/2006WR005020. [Cross Ref]
- NJDEP. Portrait of a Watershed. Division of Water Quality; New Jersey Dept. of Env. Protection: 2002.
- NJDEP. Surface Water Quality Standards. Title 7, New Jersey Administrative Code. N.J.A.C. 7:9B. 2006a.
- NJDEP. New Jersey 2006 integrated water quality monitoring and assessment report. Water Monitoring and Standards; New Jersey Dept. Of Env. Protection: 2006b.
- Peterson EE, Merton AA, Theobald DM, Urquhart NS. Patterns in spatial autocorrelation in stream water chemistry. Environmental Monitoring and Assessment. 2006;121:569–594. [PubMed]
- Peterson EE, Urquhart NS. Predicting water quality impaired stream segments using landscape-scale data and a regional geostatistical model: a case study in Maryland. Environmental Monitoring and Assessment. 2006;121:613–636. [PubMed]
- Peterson EE, Theobold D, Ver Hoef JM. Geostatistical modeling on stream networks: developing valid covariance matrices based on hydrologic distance and stream flow. Freshwater Biology. 2007;52:267–279.
- Rasmussen, T.J., A.C. Ziegler, P.P. Rasmussen. 2005. Estimation of constituent concentrations, densities, loads, and yields in lower Kansas River, northeast Kansas, using regression models and continuous water- quality monitoring, January 2000 through December 2003. USGS Scientific Investigation Reports: 2005–5165. 126p.
- Serre ML, Bogaert P, Christakos G. Computational investigations of Bayesian maximum entropy spatiotemporal mapping. In: Buccianti A, Nardi G, Potenza R, editors. IAMG98, Proceed. Of 4th Annual Conference of the International Association for Mathematical Geology. Vol. 1. De Frede Editore; Naples, Italy: 1998. pp. 117–122.
- Serre ML, Christakos G. Modern geostatistics: computational BME in the light of uncertain physical knowledge--the equus beds study. Stochastic Environmental Research and Risk Assessment. 1999;13(1):1–26.
- Serre ML, Carter G, Money E. Geostatistical space/time estimation of water quality along the raritan river basin in New Jersey. In: Miller CT, Farthing MW, Gray WG, Pinder GF, editors. Computational Methods in Water Resources 2004 International Conference, Elsevier. Vol. 2. 2004. pp. 1839–1852.
- Stein M. A simple model for spatial-temporal processes. Water Resources Research. 1986;22(13):2107–2110.
- Stein ML. Interpolation of Spatial Data: Some Theory for Kriging. Springer; New York: 1999. p. 247.
- Tortorelli RL, Pickup BE. USGS Scientific Investigation Reports: 2006–5175. 2006. Phosphorus concentrations, loads, and yields in the Illinois River basin, Arkansas and Oklahoma, 2000–2004; p. 38.
- Ver Hoef JM, Peterson EE, Theobold D. Spatial statistical models that use flow and stream distance. Environmental and Ecological Statistics. 2006;13:449–464.
- Ver Hoef JM, Peterson EE. A moving average approach for spatial statistical models of stream networks. Journal American Statistical Association. Submitted in 2008. Accepted subject to minor revisions.
- Water on the Web. Understanding Water Quality Parameters: Dissolved Oxygen. 2004. [accessed on Feb. 24, 2007]. http://waterontheweb.org/under/waterquality/oxygen.html.
- Wilson S, Serre ML. Examination of atmospheric ammonia levels near hog CAFOs, homes, and schools in eastern NC. Atmospheric Environment. 2007;41(23):4977–4987.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |