Ann Occup Hyg. 2009 June; 53(4): 409–424.

Published online 2009 April 29. doi: 10.1093/annhyg/mep017

PMCID: PMC2732913

Yufen Zhang,^{1} Sudipto Banerjee,^{1} Rui Yang,^{2} Claudiu Lungu,^{3} and Gurumurthy Ramachandran^{4,}^{*}

Received 2008 October 3; Accepted 2009 February 26.

Copyright © The Author 2009. Published by Oxford University Press on behalf of the British Occupational Hygiene Society

Mathematical modeling is increasingly used as a means of assessing occupational exposures. However, predicting exposure in real settings is constrained by a lack of quantitative knowledge of exposure determinants. Validation of models in occupational settings is, therefore, a challenge. Not only do the model parameters need to be known, but the models also need to predict the output with some degree of accuracy. In this paper, a Bayesian statistical framework is used for estimating model parameters and exposure concentrations for a two-zone model. The model predicts concentrations in a zone near the source and far away from the source as functions of the toluene generation rate, the air ventilation rate through the chamber, and the airflow between the near and far fields. The framework combines prior or expert information on the physical model with the observed data. The framework is applied to simulated data as well as data obtained from experiments conducted in a chamber. Toluene vapors are generated from a source under different conditions of airflow direction, the presence of a mannequin, and simulated body heat of the mannequin. The Bayesian framework accounts for uncertainty in measurement as well as in the unknown rate of airflow between the near and far fields. The results show that estimates of the interzonal airflow are always close to the estimated equilibrium solutions, which implies that the method works efficiently. The predictions of near-field concentration for both the simulated and real data show close concordance with the true values, indicating that the two-zone model assumptions agree with reality to a large extent and that the model is suitable for predicting the contaminant concentration. Comparison of the estimated model and its margin of error with the experimental data thus enables validation of the physical model assumptions.
The approach illustrates how exposure models and information on model parameters, together with knowledge of the uncertainty and variability in these quantities, can be used to provide better estimates not only of model outputs but also of model parameters.

A primary issue in industrial hygiene is the estimation of a worker's exposure to chemical, physical, and biological agents. Exposure assessment usually proceeds from three basic methodologies: (i) subjective estimation using professional judgment, (ii) direct measurement of environmental exposure, and (iii) prediction of exposure through mathematical modeling. Traditionally, subjective judgments made with little transparency have driven most exposure assessments, with direct measurements playing a smaller role. Exposure modeling has been a neglected area, with very little emphasis within industry and negligible support for research from governmental funding agencies. However, this situation is expected to change dramatically with the advent of the REACH regulations in the European Union, which require assessing exposures in a variety of exposure scenarios where monitoring may not be feasible (Ramachandran, 2008).

Statistical and mathematical modeling have some advantages over direct air monitoring in certain situations: systematically evaluating retrospective exposure when past monitoring data are poor or nonexistent; predicting current and future exposure in the absence of the working process or operation; and estimating exposure when only a small number of air samples, possibly with high variability, are available. Indeed, Nicas and Jayjock (2002) have argued that with only a few monitoring data points, modeling may provide more precise estimates of exposure than monitoring. With advances in computational methods and inexpensive software implementations, formal modeling is set to become an indispensable tool in the industrial hygienist's armory.

Formal modeling includes a deterministic component describing the physical laws that underlie the relationship between contaminant generation rate, pollutant transportation characteristics, and contaminant concentrations. The precise nature of the models varies in their scope and complexity according to the different experimental settings and certain assumptions that might be made. For example, assumptions on pollutant transportation patterns range from complete instantaneous mixing, to two well-mixed zones within a room, to diffusion resulting in continuous concentration gradients in time and space.

However, predicting exposure in real settings is constrained by lack of quantitative knowledge of exposure determinants and mathematical exposure models that are appropriate for the scenario. Even though the profession has long paid lip service to the importance of a thorough knowledge of the determinants of exposure on the part of the hygienist, most companies do not collect such information routinely. Even basic data such as ventilation rates, pollutant generation rates, and worker time activity patterns are hard to come by in most situations.

The models employed today in industrial hygiene are typically based upon some simple assumptions about airflow and contaminant transport patterns. Hemeon (1963) and Nicas (1996) have described a two-zone model for concentrations in a ‘near field’ in the proximity of a source and a ‘far field’ that is some distance away from the source. While these models are increasingly employed in the industrial hygiene community (e.g. Nicas, 2003; Nicas *et al.*, 2006), they have not been validated empirically in occupational settings. Little experimental research has been conducted to evaluate the parameters used in the models and assess model performance. For example, the parameter denoting the airflow rate between the near and the far field is poorly understood in the two-zone model. Existing methods for estimating this parameter (Melikov and Zhou, 1996; Nicas and Miller, 1999) ignore possible influencing factors such as the presence of a human body, body movement, and body temperature that could affect the average air speed inside the near-field zone. Cherrie (1999) investigated the effect of a wide range of general ventilation conditions and room sizes in a simulation study. Given the uncertainties in model parameters in most settings, validating these models becomes a challenging exercise. Not only do the model parameters need to be known, but the models also need to predict the output (i.e. the air concentrations in the two zones) with some degree of accuracy.

The present article offers a Bayesian statistical framework for estimating model parameters and exposure concentrations from experimental data. This estimation will provide the margins of statistical confidence, thereby accounting for uncertainty. Finally, comparing the estimated model and its margin of error with the experimental data will enable validation of physical model assumptions. The approach illustrates how exposure models and information on model parameters together with the knowledge of uncertainty and variability in these quantities can be used to not only provide better estimates of model outputs but also model parameters.

Deterministic Equations of Two-Zones Modeling provides the physical background of the two-zone exposure problem, leading to the formulation of the differential equations and their solutions. Statistical Modeling describes the statistical modeling framework, highlighting the different approaches we investigate. Bayesian Modeling briefly outlines the numerical algorithms for implementing and formally comparing our models. Simulation Study of Two-Zone Data and Experimental Two-Zone Study present data analyses from a simulated data set and an actual experimental setting, respectively, and Conclusions and Future Work concludes the paper with some discussion.

The one-compartment model that assumes a uniform concentration at all points in the compartment irrespective of distance from the source can underestimate worker exposures, especially for workers near the contaminant source (AIHA, 2000). To account for this deficiency, a slightly more complicated two-compartment model can be used (Nicas, 1996; Cherrie, 1999; Nicas and Miller, 1999) as in Fig. 1. Conceptually, it is a small step from a one-compartment to a two-compartment model. The region very near and around the source is modeled as one well-mixed box—the so-called near field, and the rest of the room is another well-mixed box that completely encloses the near-field box. This box is called the far field, and there is some amount of air exchange between the two boxes. Figure 1 shows the two zones schematically.

The volumes of the far- and near-field zones are *V*_{F} and *V*_{N}, respectively. The supply and exhaust flow rates are the same and equal to *Q* (e.g. in units of m^{3} min^{−1}). The airflow rate between the two zones is β (in units of m^{3} min^{−1}). While the determination of *Q* is straightforward, the determination of β is not. It depends, to some extent, on the dimensions of the near-field zone. One approach (Nicas, 1996) is to determine it as the product of the random air speed (*v*) at the boundary of the near field (reported by Baldwin and Maynard, 1998) and one half of the free surface area (SA) of the near field (β = ½ × SA × *v*). The factor of one half arises because the flow rate β occurs by air flowing into the near field through one half of the free surface area and air flowing out through the other half.
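As a quick numerical sketch of this construction (the radius and air speed below are illustrative assumptions, not values from the article), β is one half of the free surface area times the random air speed:

```python
import math

def interzonal_flow(surface_area_m2, air_speed_m_min):
    """beta = (1/2) x free surface area x random air speed, in m^3/min."""
    return 0.5 * surface_area_m2 * air_speed_m_min

# Hypothetical hemispherical near field of radius 0.25 m and a random
# air speed of 3 m/min (both illustrative, not from the article).
r = 0.25
sa = 2 * math.pi * r ** 2            # curved surface area of a hemisphere
beta = interzonal_flow(sa, 3.0)      # ~0.59 m^3/min
```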

Certain quantities accounted for in the modeling are measured externally and are treated as ‘known’. In this article, we will assume that the volumes of the far-field and near-field zones, *V*_{F} and *V*_{N}, respectively, are among these known quantities. Here, we concentrate on a Bayesian statistical estimation procedure that combines information from experimental data with prior information (quantified in terms of probability distributions) for the unknown parameters and provides statistical predictions of the near- and far-field concentrations. We will primarily focus upon estimating the airflow rate between the far and near fields, denoted by β. Two other parameters governing the dynamics of the system are the contaminant's total mass emission rate, denoted by *G*, and the rate of airflow into and out of the workplace, denoted by *Q*. In practical settings, *G* and *Q* may also not be known precisely. To illustrate our statistical approach, however, we will first fix *G* and *Q* and regard β as the only parameter requiring statistical estimation. We do so to keep the notation simpler and because the essential concepts of the approach do not change when additional parameters are assumed unknown. Subsequently, we will extend the analysis to unknown *G* and *Q*, placing priors on *G*, *Q*, and β.

For the sake of simplicity, we assume that the initial concentration in both zones is zero, the supply air is free of contaminant, and the only removal mechanism for the contaminant is ventilation. Denoting the concentrations in the near and far fields by *C*_{N}(β;*t*) and *C*_{F}(β;*t*), the dynamics of the total contaminant mass in a two-zone field are described as

$$V_N\,\frac{dC_N(\beta;t)}{dt} = G + \beta\,C_F(\beta;t) - \beta\,C_N(\beta;t), \qquad V_F\,\frac{dC_F(\beta;t)}{dt} = \beta\,C_N(\beta;t) - (\beta + Q)\,C_F(\beta;t). \qquad (1)$$

This is concisely expressed as $\frac{d\mathbf{C}(\beta;t)}{dt} = \mathbf{A}\,\mathbf{C}(\beta;t) + \mathbf{g}$, where **C**(β;*t*) = (*C*_{N}(β;*t*), *C*_{F}(β;*t*))^{T}, and

$$\mathbf{A} = \begin{pmatrix} -\beta/V_N & \beta/V_N \\ \beta/V_F & -(\beta + Q)/V_F \end{pmatrix}, \qquad \mathbf{g} = \begin{pmatrix} G/V_N \\ 0 \end{pmatrix}. \qquad (2)$$

Matrix representations such as in equation (2) are especially useful in solving linear systems of differential equations (see, e.g. Laub, 2005) in computer packages such as MATLAB© and R (http://www.r-project.org/). In the following discussion, variables in bold represent matrices or vectors while the other variables are scalars.
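The article solves this system with R's odesolve; as a rough equivalent sketch (an assumption of this illustration, not the authors' code), the same system can be integrated in Python with SciPy's `odeint`, using the parameter values quoted later for the simulation study:

```python
import numpy as np
from scipy.integrate import odeint

# Parameter values from the simulation study in this article.
V_N, V_F = np.pi * 1e-3, 3.8          # near/far-field volumes (m^3)
Q, G, beta = 13.8, 351.5, 5.0         # supply rate, emission rate, interzonal flow

def two_zone(C, t):
    """Right-hand side of the two-zone mass-balance equations."""
    C_N, C_F = C
    dC_N = (G + beta * (C_F - C_N)) / V_N
    dC_F = (beta * (C_N - C_F) - Q * C_F) / V_F
    return [dC_N, dC_F]

t = np.arange(0.0, 4.01, 0.01)        # 401 time points, as in the article
C = odeint(two_zone, [0.0, 0.0], t)   # zero initial concentrations
# At large t the system approaches C_F = G/Q and C_N = G/Q + G/beta.
```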

Statistical modeling focuses upon optimal estimation of the parameter β from the experimental data. Since perfect experimental conditions are impossible, observed data may contain significant deviations from the physical two-zone model. Therefore, one seeks a nonlinear statistical equation that will account for noise in the measurements and uncertainties in model estimation and prediction. This statistically estimated model can subsequently help in validating the physical model for the experimental data.

At each time point *t*, let us consider the response **Y**(*t*) = (*Y*_{N}(*t*), *Y*_{F}(*t*))^{T} as a 2 × 1 vector corresponding to the natural logarithm of the concentration measurements from the near and far fields. The observed value of **Y**(*t*) is modeled as

$$\mathbf{Y}(t) = \log \mathbf{C}(\beta;t) + \boldsymbol{\epsilon}(t), \qquad (3)$$

where $\boldsymbol{\epsilon}(t) = (\epsilon_N(t), \epsilon_F(t))^{T}$ is the residual error.

The error ϵ(*t*) requires a probability distribution, which we take as normal or Gaussian. In other words, we are modeling two components in equation (3): the mean function log **C**(β;*t*) captures the trend or large scale variation in **Y**(*t*), expressed as a function of time and some experimental parameters (β, *G*, *Q*). The second component, ϵ(*t*), captures measurement error as well as discrepancies in the response due to imperfect physical conditions and, perhaps, misspecification of the physical model. The parameters in the distribution of ϵ(*t*) help us estimate and predict the uncertainty.

A customary specification will have $\epsilon_N(t) \overset{\mathrm{iid}}{\sim} N(0, \tau_N)$ and $\epsilon_F(t) \overset{\mathrm{iid}}{\sim} N(0, \tau_F)$, implying that the measurement errors for **Y**(*t*) for both the near and far fields are independently and identically distributed as normal distributions with zero means and variances denoted by τ_{N} and τ_{F}, respectively. Equation (3) implies that the concentration measurements in their true scale [i.e. exp(*Y*_{N}(*t*)) and exp(*Y*_{F}(*t*))] are assumed to follow a log-normal distribution. We call $\sqrt{\tau_N}$ and $\sqrt{\tau_F}$ the residual standard deviations. The geometric standard deviations (GSDs) are given by $\exp(\sqrt{\tau_N})$ and $\exp(\sqrt{\tau_F})$. This one-to-one correspondence between the residual standard deviation and the GSD is appealing from a computational and inferential perspective. It is computationally convenient to place prior distributions on the residual variances. Once inference on the residual variances is obtained, we can easily obtain inference on the GSDs.
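The variance-to-GSD correspondence can be sketched in a few lines (illustrative Python; the τ values are the ones used later in the simulation study):

```python
import numpy as np

def gsd_from_variance(tau):
    """Geometric standard deviation: exp of the residual standard deviation."""
    return np.exp(np.sqrt(tau))

gsd_N = gsd_from_variance(1.0)    # exp(sqrt(1.0)) = exp(1.0)
gsd_F = gsd_from_variance(0.64)   # exp(sqrt(0.64)) = exp(0.8)
```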

Another feature that can influence the observed values of the response is the presence of correlation between the near- and far-field measurement error processes. For example, assuming that ϵ_{N}(*t*) and ϵ_{F}(*t*) are independent of each other (and across time), we can write the error distribution as $\boldsymbol{\epsilon}(t) \sim N(\mathbf{0}, \Sigma)$ with $\Sigma = \mathrm{diag}(\tau_N, \tau_F)$. This implies that the two components of ϵ(*t*) follow a bivariate normal distribution with zero mean and a 2 × 2 variance–covariance matrix that is simply a diagonal matrix with the two variances. The off-diagonal element that represents the covariance between the fields is 0. More generally, however, we may want to relax this assumption of independence between the fields, in which case we write $\boldsymbol{\epsilon}(t) \sim N(\mathbf{0}, \Sigma)$, where the matrix Σ is symmetric and positive definite (i.e. all its eigenvalues are real and positive) and is no longer restricted to be diagonal. Positive definiteness of Σ is equivalent to $\tau_N > 0$ and $\tau_N \tau_F - \tau_{NF}^2 > 0$, where the off-diagonal element τ_{NF} represents the covariance between the two fields, and $\rho = \tau_{NF}/\sqrt{\tau_N \tau_F}$ represents the statistical ‘correlation coefficient’ between the near and far fields.
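A minimal sketch of the positive-definiteness condition and the implied correlation (illustrative Python; the numeric values are those used later in the article's dependent-error simulation):

```python
import numpy as np

def is_positive_definite(tau_N, tau_F, tau_NF):
    """A 2x2 symmetric Sigma is positive definite iff tau_N > 0 and
    tau_N * tau_F - tau_NF**2 > 0."""
    return tau_N > 0 and tau_N * tau_F - tau_NF ** 2 > 0

def correlation(tau_N, tau_F, tau_NF):
    """rho = tau_NF / sqrt(tau_N * tau_F)."""
    return tau_NF / np.sqrt(tau_N * tau_F)

rho = correlation(1.0, 0.64, 0.5)   # 0.625, as reported in the article
```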

Nonlinear models, as in equation (2), often cause estimation problems since the parameters in the physical model (e.g. the parameter β) are often not easily estimable from the data. Some expert or ‘prior’ knowledge regarding the plausible ranges for these parameters is often required. One approach is to fit the model for a number of different fixed values of β to gather an idea about what values yield the closest fit. This approach is fairly *ad hoc* and subjective and might deliver spurious estimates that subsequently generate inaccurate predictions for concentrations.

A more principled statistical approach would quantify the plausible range of β through a prior probability distribution, allowing us to quantify the strength of this belief. This information is then combined with the likelihood using Bayes Theorem [see equation (5) in Bayesian Modeling]. Statistical models that combine such prior knowledge with experimental data are known as Bayesian hierarchical models and have become extremely popular over the last two decades (e.g. Gelman *et al.*, 2004; Carlin and Louis, 2008).

In Bayesian statistics, one constructs hierarchical (or multilevel) schemes by assigning probability distributions to parameters *a priori* and inference is based upon the distribution of the parameters conditional upon the data *a posteriori*. By modeling both the observed data and any unknown regressor or covariate effects as random variables, the hierarchical Bayesian approach to statistical analysis offers a cohesive framework for combining complex data models and external knowledge or expert opinion.

More specifically, let **Y** = (**Y**(*t*_{1})^{T}, …, **Y**(*t*_{n})^{T})^{T} collect the observations over the *n* time points and let θ denote the collection of unknown parameters (here β and the elements of Σ). The probability distribution of the data given the parameters is

$$p(\mathbf{Y}\,|\,\boldsymbol{\theta}) = \prod_{i=1}^{n} N\big(\mathbf{Y}(t_i)\,\big|\,\log \mathbf{C}(\beta;t_i),\, \Sigma\big), \qquad (4)$$

which is also called the ‘data likelihood’. The prior distribution for θ is specified by some probability distribution $p\left(\mathbf{\theta}\right)$ and the ‘posterior’ distribution is given by

$$p(\boldsymbol{\theta}\,|\,\mathbf{Y}) = \frac{p(\boldsymbol{\theta})\,p(\mathbf{Y}\,|\,\boldsymbol{\theta})}{\int p(\boldsymbol{\theta})\,p(\mathbf{Y}\,|\,\boldsymbol{\theta})\,d\boldsymbol{\theta}}, \qquad (5)$$

which incorporates both the data and the possible external (prior or expert) knowledge. In general, posterior distributions are analytically intractable due to the potentially complicated integral in the denominator and are evaluated by drawing samples from the posterior distribution. A suite of methods known as Markov chain Monte Carlo (MCMC) algorithms, such as the Gibbs sampler and Metropolis–Hastings algorithms (see, e.g. Gilks *et al.*, 1996; Gelman *et al.*, 2004; Marin and Robert, 2007; Carlin and Louis, 2008), have gained enormous popularity in Bayesian statistics. The power of these methods lies in completely avoiding the numerical integration in equation (5), which, in general, can be high dimensional, by absorbing this integral into a ‘proportionality constant’ and generating samples from the distribution $p(\boldsymbol{\theta}\,|\,\mathbf{Y}) \propto p(\boldsymbol{\theta})\,p(\mathbf{Y}\,|\,\boldsymbol{\theta})$. In Appendix B, we briefly describe the Gibbs sampler and Metropolis–Hastings algorithm in the current context; details are available in the aforementioned references. Since there is no *a priori* justification for why β and the elements of Σ should be correlated, we assume they are independent *a priori*, i.e. $p(\boldsymbol{\theta}) = p(\beta)\,p(\Sigma)$.
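As a toy illustration of the random-walk Metropolis–Hastings idea (not the authors' sampler: the ‘likelihood’ below is a stand-in Gaussian centered at 5, since the real data likelihood would require the ODE solution; the prior is the log-normal prior with log-variance 2 used later for β):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(beta):
    """Un-normalized log posterior for a toy 1-D problem."""
    if beta <= 0:
        return -np.inf                                 # beta must be positive
    log_prior = -np.log(beta) - np.log(beta) ** 2 / (2 * 2.0)  # log-normal(0, 2)
    log_lik = -(beta - 5.0) ** 2 / (2 * 0.25)          # stand-in Gaussian likelihood
    return log_prior + log_lik

beta, samples = 1.0, []
for _ in range(5000):
    prop = beta + rng.normal(0.0, 0.5)                 # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(beta):
        beta = prop                                    # accept
    samples.append(beta)                               # else keep current value

post = np.array(samples[1000:])                        # discard burn-in
```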

MCMC yields samples from a Markov chain that require an initial ‘burn-in’ time to converge to its stationary distribution, namely the posterior distribution. Customarily, convergence is diagnosed by starting a few different chains (say two or three) from different starting values and noting where they start mixing (e.g. Gelman *et al.*, 2004; Chapter 11). On the basis of these plots, we discard some initial samples as burn-in and retain the rest to draw inference. Furthermore, samples obtained from MCMC are not independent, but autocorrelated. This effect is usually small for models with relatively fewer parameters and drawing a large enough posterior sample ensures consistency in estimation. MCMC algorithms have now been automated in free software projects such as WinBUGS (http://www.mrc-bsu.cam.ac.uk/) and several packages within the R statistical environment (http://www.r-project.org/). These programs not only execute sampling from the posterior distribution but also provide tools for gauging convergence and suggesting effective Monte Carlo sample sizes for posterior inference.

An important advantage of carrying out inference using posterior samples is that they immediately yield inference for functions of the parameters we have sampled. For example, once we have obtained the posterior samples of the elements of Σ from the sampler in Appendix B, we can immediately obtain the posterior samples of the GSDs by exponentiating each sampled value of the square root of the diagonal elements of Σ. More precisely, suppose we have obtained *L* posterior samples $\{\Sigma_{(i)}\}_{i=1}^{L}$. For each sample Σ_{(i)}, we extract its diagonal elements τ_{N(i)} and τ_{F(i)} and compute $\mathrm{GSD}_{N(i)} = \exp(\sqrt{\tau_{N(i)}})$ and $\mathrm{GSD}_{F(i)} = \exp(\sqrt{\tau_{F(i)}})$. The samples $\{\mathrm{GSD}_{N(i)}\}_{i=1}^{L}$ and $\{\mathrm{GSD}_{F(i)}\}_{i=1}^{L}$ are then the posterior samples of the GSDs.
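In code, this transformation is one line per draw (illustrative Python sketch; the Σ ‘samples’ here are synthetic stand-ins, since real draws would come from the MCMC output):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for L posterior draws of Sigma; real draws would
# come from the Gibbs/Metropolis sampler.
L = 1000
Sigma_samples = [np.diag([1.0, 0.64]) + np.diag(rng.uniform(0.0, 0.05, 2))
                 for _ in range(L)]

# GSD draws: exponentiate the square root of each diagonal element.
gsd_N = np.array([np.exp(np.sqrt(S[0, 0])) for S in Sigma_samples])
gsd_F = np.array([np.exp(np.sqrt(S[1, 1])) for S in Sigma_samples])
```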

Another feature of Bayesian inference is that prediction and model assessments can be carried out from the ‘posterior predictive’ distribution. To be precise, predicting the response **Y**(*t*_{0}) at a new time point amounts to evaluating $p(\mathbf{Y}(t_0)\,|\,\mathbf{Y}) = \int p(\mathbf{Y}(t_0)\,|\,\boldsymbol{\theta}, \mathbf{Y})\,p(\boldsymbol{\theta}\,|\,\mathbf{Y})\,d\boldsymbol{\theta}$, where $p(\mathbf{Y}(t_0)\,|\,\boldsymbol{\theta}, \mathbf{Y})$ further simplifies to $p(\mathbf{Y}(t_0)\,|\,\boldsymbol{\theta})$, which is the model likelihood at *t*_{0}. Again, sampling from *p*(**Y**(*t*_{0})|**Y**) is preferred: having obtained posterior samples $\{\boldsymbol{\theta}_{(i)}\}_{i=1}^{M}$, for each $\boldsymbol{\theta}_{(i)}$ we draw **Y**_{(i)}(*t*_{0}) from the distribution $p(\mathbf{Y}(t_0)\,|\,\boldsymbol{\theta}_{(i)}, \mathbf{Y})$. The resulting $\{\mathbf{Y}_{(i)}(t_0)\}_{i=1}^{M}$ are the desired samples from the posterior predictive distribution. Their mean and standard deviation offer predictive estimates and uncertainty. All uncertainty pertaining to estimation and prediction is accounted for in these samples. Bayesian model assessment can now proceed from this predictive distribution by using the estimated model to predict at the observed time points and comparing these predictions with the observed data.
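The composition-sampling recipe looks like this in outline (Python; the posterior draws of the mean and variance below are synthetic stand-ins, since the real ones would come from the fitted two-zone model):

```python
import numpy as np

rng = np.random.default_rng(2)

M = 2000
# Stand-in posterior draws of the predictive mean (log C(beta; t0)) and
# residual variance at a single time point t0.
mu_draws = rng.normal(3.2, 0.05, M)
tau_draws = rng.uniform(0.9, 1.1, M)

# One predictive draw per posterior draw: Y(t0) | theta ~ N(mu, tau).
Y_pred = rng.normal(mu_draws, np.sqrt(tau_draws))

pred_mean = Y_pred.mean()                       # predictive estimate
lo, hi = np.percentile(Y_pred, [2.5, 97.5])     # 95% predictive interval
```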

While the highly nonlinear nature of equation (1) precludes effective prediction using simple linear regression models, a least-squares approach that maximizes the likelihood in equation (4), either using nonlinear regression or some locally linear approximation, offers a feasible alternative to our Bayesian method for obtaining point estimates and predictions. However, quantifying the uncertainty of these estimates relies upon asymptotic normality for large sample sizes. Furthermore, classical nonlinear optimization algorithms can perform poorly with several parameters in equation (1) (we discuss such a setting in Estimates for *G* and *Q*, where we estimate not just β but also *G* and *Q*, and in Results with dependent near- and far-field measurements, where we consider dependent error structures). The sampling-based Bayesian approach absolves us of these problems: the posterior simulations offer ‘exact’ inference based upon whatever sample size we have, in a numerically stable manner.

We will first illustrate our method using a simulated data set that helps assess the effectiveness of the proposed algorithms. We generate data with all model parameters fixed and known and then estimate them from the generated data in a Bayesian setting with priors on these parameters. We then check whether the differences between the estimated and true values are within the range of statistical error. This entire procedure is repeated with not one but several, say 100, data sets generated with different parameter values. To be more precise, here we simulate data from the statistical model in equation (3) with the following factors: (i) two different settings for the covariance structure Σ: the first assumes independent near- and far-field measurements, while the second imposes a fairly strong association (a correlation coefficient of 0.625, to be precise); (ii) β is set to 0.1, 0.2, …, 10 m^{3} min^{−1}, yielding a total of 100 simulated data sets for each fixed covariance structure in (i); (iii) a sequence of time points is taken as 0, 0.01, …, 4 min, so that each data set has 401 pairs of (*C*_{N}, *C*_{F}) for a fixed Σ and β. The values of β chosen are in the low to mid-range used by Cherrie (1999) in his simulation study.

The remaining physical parameters are treated as fixed constants. They are assigned values that simulate conditions similar to our real data set (see the next section) with near-field volume *V*_{N} = π × 10^{−3} m^{3}, far-field volume *V*_{F} = 3.8 m^{3}, room supply air rate *Q* = 13.8 m^{3} min^{−1}, and the constant mass emission rate *G* = 351.5 mg min^{−1}. The data sets are generated using an ordinary differential equation solver from the package odesolve in R. Given the parameters in equation (1), odesolve produces values of *C*_{N}(*t*) and *C*_{F}(*t*) at different time points for fixed β.
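A sketch of this data-generation step in Python (using SciPy's `odeint` in place of R's odesolve; the tiny positive initial concentration is an assumption of this sketch so that the logarithm is defined at *t* = 0):

```python
import numpy as np
from scipy.integrate import odeint

rng = np.random.default_rng(3)
V_N, V_F, Q, G = np.pi * 1e-3, 3.8, 13.8, 351.5   # values from the article

def simulate_dataset(beta, tau_N=1.0, tau_F=0.64):
    """One simulated data set: 401 noisy log-concentration pairs."""
    t = np.arange(0.0, 4.01, 0.01)
    def rhs(C, t):
        dN = (G + beta * (C[1] - C[0])) / V_N
        dF = (beta * (C[0] - C[1]) - Q * C[1]) / V_F
        return [dN, dF]
    C = odeint(rhs, [1e-6, 1e-6], t)   # tiny positive start so log is finite
    noise = rng.normal(0.0, [np.sqrt(tau_N), np.sqrt(tau_F)], C.shape)
    return t, np.log(C) + noise        # Y(t) = log C(beta; t) + eps(t)

t, Y = simulate_dataset(beta=5.0)
```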

For statistical estimation, we worked with the log concentration as our dependent variable and assumed Gaussian error distributions on the log scale. In fact, *C*_{N} shows serious violation of the normality assumption, which the log transformation ameliorates. The closed-form expressions we derived for **C**(β;*t*) in equation (2) [also equation (A1) in Appendix A] facilitate computations, but even when closed forms were unavailable, we could numerically solve the system in equation (1) using a linear differential equation package (e.g. odesolve in R). This numerical solution must be carried out in every iteration of the MCMC algorithm using the current value of β.

The parameter β represents the airflow rate between the near and far fields and is, hence, positive. The prior for β can be log-normal with log-scale mean 0 and a large log-scale variance. A large prior variance represents vagueness about our prior beliefs and allows the data to drive the inference. In our simulation, we set the prior log-scale variance to 2.0, i.e. geometric mean 1 and geometric variance exp(2.0). Based upon practical experience and physical principles, we know β cannot be very large in most real situations. Therefore, we can also assume that β is distributed on a positive interval; another sufficiently vague prior would be the uniform prior *U*(0, 50). Both these priors gave stable and similar estimates for the testing data set, and we chose to present the results from the log-normal prior in the final analysis.
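The two candidate priors can be compared by simulation (illustrative Python; the draw count is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Log-normal prior with log-mean 0 and log-variance 2 (geometric mean 1).
lognormal_draws = rng.lognormal(mean=0.0, sigma=np.sqrt(2.0), size=n)

# Alternative vague prior: uniform on (0, 50).
uniform_draws = rng.uniform(0.0, 50.0, size=n)

median_ln = np.median(lognormal_draws)   # close to the geometric mean, 1
mean_unif = uniform_draws.mean()         # close to 25
```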

For the error term in equation (3), we first considered the independence error model, where Σ = diag(τ_{N}, τ_{F}). The priors for τ_{N} and τ_{F} are set to be vague inverse-Gamma distributions with large variances. Next, instead of assuming independence between the fields, we also considered the inverse-Wishart prior (see Carlin and Louis, 2008, p. 426, for the Wishart/inverse-Wishart density) for a general Σ matrix. In the simulation, we let Σ have an inverse-Wishart (*S*, *ν*) prior with a scale matrix *S* whose diagonal elements are each 10 and degrees of freedom *ν* = 4.

For each simulated data set, we ran three chains for 10000 iterations each. Convergence was diagnosed within 2000 iterations using the coda package in R. We discarded the first 6000 (2000 × 3) samples as burn-in and retained the remaining 24000 (8000 × 3) for posterior inference. Using equation (2) directly or numerically solving equation (1) with the odesolve package (which even applies to analytically intractable differential systems) produced practically indistinguishable results. Hence, we only present results from the latter.

Table 1 gives the posterior summary for one simulated data set generated with the true β equal to 5 and the true Σ = diag(1, 0.64). The table reveals that the values that generated the data are all included in the Bayesian 95% equal-tailed posterior credible intervals (BCIs). Estimates of the error terms τ_{N} and τ_{F} are also quite good for the independent inverse-Gamma priors, with the posterior 95% credible interval for τ_{N} being a little wider than that for τ_{F}. Figure 2 shows the posterior distribution of the coefficient of variation (CV) at different time points for the near and far fields. These are computed using the posterior predictive intervals for *Y*_{N}(*t*) and *Y*_{F}(*t*). With time, the posterior concentrations in the near and far fields become steady, so their CVs quickly stabilize, as expected.

Parameter estimates for our simulated data example from the model assuming independence between the far and near fields (i.e. τ_{NF} = 0) with *G* and *Q* held fixed

Posterior means of CVs for a simulated data with true *β* = 5, τ_{N} = 1.0, τ_{F} = 0.64 and τ_{NF} = 0: upper panel is the near field; lower panel is the far field. X-axis: time (minutes); y-axis: CV(*C*_{N}) and CV(*C*_{F}).

Figure 3 presents the 95% equal-tailed Bayesian credible intervals (BCIs) for β computed from each of the 100 simulated data sets, where true β = 0.1, 0.2, …, 10 m^{3} min^{−1} as described above and true Σ = diag(1, 0.64). The middle points represent the posterior medians, and the line segments represent the (2.5%, 97.5%) percentiles of the posterior distribution. The overall coverage for the 100 data sets is ~92%, only slightly lower than the nominal coverage of 95%.

The 95% equal-tailed posterior credible intervals for 100 simulated data sets, where the true values are τ_{N} = 1, τ_{F} = 0.64, τ_{NF} = 0, and β = 0.1, 0.2, …, 10. X-axis: true β (m^{3} min^{−1}); y-axis: estimated β (m^{3} min^{−1}).

Figure 3 also exhibits an interesting trend: the larger the β, the wider the credible interval. Additional insight is obtained from the differential system's properties. The 3D plots in Fig. 4 reveal that both *C*_{N} and *C*_{F} reach steady concentrations. For any data set, the smaller the β, the larger its ‘steady *C*_{N}’ will be (Fig. 4a). On the other hand, data sets with different βs always show similar ‘steady *C*_{F}’ values (Fig. 4b), i.e. *C*_{F} contains little information about β. Therefore, *C*_{N} contains more information than *C*_{F} to restrict β to a small interval when the true β is small, yielding narrow Bayesian credible intervals. Conversely, even for *C*_{N}, the steady concentrations become more similar for large βs. Consequently, a data set with a large β will have wider credible intervals.
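This behavior follows directly from the steady state of equation (1): setting the derivatives to zero gives C_F = G/Q, which does not involve β, and C_N = G/Q + G/β, which decreases toward C_F as β grows. A minimal check:

```python
G, Q = 351.5, 13.8   # emission rate (mg/min) and supply rate (m^3/min)

def steady_state(beta):
    """Steady-state concentrations of the two-zone model (mg/m^3)."""
    C_F = G / Q              # far field: independent of beta
    C_N = C_F + G / beta     # near field: shrinks toward C_F as beta grows
    return C_N, C_F

small_beta = steady_state(0.5)
large_beta = steady_state(10.0)
```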

3D plots of concentrations for the two-zone differential equation system. X-axis: time (minutes); y-axis: β (m^{3} min^{−1}); z-axis: *C*_{N}, *C*_{F} (mg m^{−3}).

In Table 1, log(*C*_{N}) and log(*C*_{F}) present posterior summaries for the steady-state (when time *t* is large enough) log concentrations [i.e. log(*C*_{N}(*t*)) and log(*C*_{F}(*t*))] for the simulated data. Figure 5 shows the plots of the predictive estimates of log(*C*_{N}(*t*)) and log(*C*_{F}(*t*)) over time, where the 95% equal-tailed predictive interval is approximately symmetric about the median line. Because the error variance τ_{N} for the near field is greater than τ_{F} for the far field, the predictive interval for *C*_{N} is also wider than that for *C*_{F}. Figure 5 clearly reveals that the predicted concentration (median) stabilizes with time.

We next generated 100 data sets, where the true covariance matrix Σ was not diagonal, but had off-diagonal elements equaling 0.5. We now estimated this data set with two models: the first was the independent error model as in the preceding illustration where we put independent inverse-Gamma priors on τ_{N} and τ_{F}; the second was the model with the dependent error assumption, where the covariance matrix Σ was assigned an inverse-Wishart prior. Table 2 shows a comparison of posterior summaries for one simulated data set between these two models. The true values that generated this simulated data set are β = 5 m^{3} min^{−1}, τ_{N} = 1, τ_{F} = 0.64, τ_{NF} = 0.5, i.e. a correlation of ρ = 0.625.

Comparisons between parameter estimates from two models: the block on the left fits the model assuming independence between the fields (i.e. τ_{NF} = 0), while the block on the right corresponds to a dependent model that estimates τ_{NF}

Table 2 reveals that the 95% posterior credible interval (BCI) includes the true β only for the model with the inverse-Gamma prior. The model with the inverse-Wishart prior has a narrower BCI for β that does not include the true value; the inverse-Wishart prior involves an additional parameter estimate, which might lead to overly narrow intervals for β. The estimation of the error terms is excellent under both priors, even for the covariance term, which has a narrow BCI of (0.41, 0.96). We also calculated the BCI coverage of β for the 100 data sets with true τ_{NF} = 0.5 under the two prior models. The overall coverage rate for the inverse-Wishart model is ~91%, slightly higher than that of the independent inverse-Gamma prior model (89%). This suggests that the independent inverse-Gamma prior model can be robust even for data sets with a fairly strong covariance structure.

So far, we assumed that only β was unknown in equation (1); we now suppose that *G* and *Q* must also be estimated. We repeated the simulation experiment on the same data as Table 1, but with β, *G*, and *Q* all treated as unknown, and set ‘informative’ priors based on practical knowledge. For example, we assigned *G* a uniform prior spanning ±20% of the true value *G* = 351.5 mg min^{−1}, that is *U*(281, 422). For *Q*, an informative prior *U*(11, 17) was chosen around the true *Q* = 13.8 m^{3} min^{−1}. After setting the priors for the new parameters, the MCMC procedure is fairly routine. The results are included in Table 3.

Parameter estimates for models with *G* and *Q* also assumed unknown and estimated: the block on the left fits the model assuming independence between the fields (i.e. τ_{NF} = 0), while the block on the right corresponds to a dependent model that estimates **...**

From Table 3, all the true parameter values that generated the data are included in the 95% equal-tailed posterior intervals for both the dependent and independent models. Even with *G* and *Q* treated as unknown parameters, the estimates for β, the τs, log(C̄_{N}), and log(C̄_{F}) remain quite close to those in Table 2 for the independent model. For the dependent model, we observed a slight increase in β and slight decreases in τ_{N} and τ_{F} compared with their counterparts in Table 2. The estimate of *G* for the independent model, 351.23 mg min^{−1}, is very close to the true value of 351.5 mg min^{−1}, while the dependent model's estimate (384.84 mg min^{−1}) is a little larger than the true value. Both *Q* = 12.58 m^{3} min^{−1} for the independent model and *Q* = 14.87 m^{3} min^{−1} for the dependent model are fairly close to the true value of 13.8 m^{3} min^{−1}. This shows that even with additional unknowns, our method yields relatively stable results. This property is appealing because real problems always involve multiple unknowns that are hard to measure.

We now turn to an experiment conducted in a test chamber, depicted in Fig. 6. The test chamber is connected to the inlet of a centrifugal exhaust fan powered by a 0.75 HP (2500 rpm) motor. The upstream pre-filters increase the level of turbulence in the entering airflow. The test section is 1.73 m long, 1.27 m wide, and 1.73 m high; thus, the volume of the far field is *V*_{F} = 3.8 m^{3}. A mixing fan is mounted midway across the ceiling of the chamber, 20 cm downstream from the air inlet plane. The position of the mixing fan was selected after running flow visualization tests to maximize the air-mixing effect. The measured average airflow rate through the chamber was *Q* = 13.8 m^{3} min^{−1}.

Toluene in a 200-ml glass impinger was vaporized using an airflow of 15 l min^{−1}, and the outlet of the impinger was connected to a conical metal vessel covered with a mesh placed in the near field (described later). The toluene generation rate was calculated to be *G* = 0.404 ml min^{−1}, or 351.5 mg min^{−1}, and was constant over time.
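As a sanity check on the quoted rates (assuming a toluene density of roughly 0.87 g ml^{−1}, which the text does not state):

```python
# assumed toluene density; not given in the text
density_mg_per_ml = 870.0   # mg ml^-1 (~0.87 g ml^-1 at room temperature)
rate_ml_per_min = 0.404     # liquid feed rate from the experiment

G = rate_ml_per_min * density_mg_per_ml   # mg min^-1
print(round(G, 1))  # ~351.5, matching the stated generation rate
```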

It is useful to remember that the model (used to predict worker exposures) makes several simplifying assumptions besides those listed in Deterministic Equations of Two-Zones Modeling. These include neglecting the presence of the worker's body, body movement, and body heat. We conducted a series of experiments to explore the effect of each of these factors on concentrations in the two zones. A 25-cm high mannequin measuring 6 cm across the shoulders (a 14.4% scale mannequin) was used to model the worker. Body movement was simulated by placing the mannequin on a stand powered by a motor rotating at 10 rpm. For stationary conditions, the mannequin was oriented facing the contaminant source. To simulate heat generated by the human body, the mannequin surface was wrapped with heating tape. The voltage input to the heating tape was regulated using a variable autotransformer that maintained the surface temperature at a constant 33°C (skin temperature is normally ~4°C below the core temperature of 37°C).

Toluene concentrations were measured with an infrared analyzer (Foxboro MIRAN 1B2, Foxboro, MA, USA). The measured data were logged using a data logger (Model TC 110, Omega Engineering Inc., Stamford, CT, USA) and transferred into a spreadsheet for further analysis. Prior to data collection, the data loggers were calibrated using known toluene concentrations. To characterize the air contaminant concentration in the test chamber, concentrations were measured at >60 locations around the contaminant source to obtain spatial profiles of toluene concentrations. These measurements were made in four directions (east, west, north, and south) on three horizontal parallel planes at five different distances (10, 15, 20, 30, and 40 cm) from the generation source. The source was located on the middle plane. Concentrations were measured every 5 s for at least 15 min at each location.

In previous studies, the near field has been arbitrarily selected as a volume that contains the breathing zone of the worker. We used a different approach. The near and far fields are supposed to have distinctly different concentrations, and the model in fact predicts a sharp discontinuity at the boundary between the two zones that would not be seen in reality. Although the measured concentrations show a more gradual decrease in concentration moving away from the source, the spatial rate of change of concentration is not uniform: the maximum rate of change occurred at a distance of 10 cm from the source. Accordingly, we defined our near field as a cylinder 10 cm high with a radius of 10 cm, with its base on the plane of the generation source.

The concentrations in the near and far fields have the same initial state, i.e. *C*_{N}(0) = *C*_{F}(0) = 0 mg m^{−3}. We assume that the near- and far-field concentrations were measured simultaneously at each time point. Although the real data set does not fully meet this condition, we can estimate β and τ_{N} using the near-field concentrations alone. Later, under additional assumptions, we also used [*C*_{N}(*t*), *C*_{F}(*t*)] pairs that were taken under the same experimental conditions.

We fit the model in equation (3) to the experimental data set using three parallel MCMC chains of 10,000 iterations each. Convergence diagnostics revealed that the three chains converged fairly quickly, and the first 2000 × 3 samples were discarded as burn-in. Table 4 shows the posterior summary of β. The posterior mean of β is ~2.5 m^{3} min^{−1} for the log-scaled data, while the error variance for the log-scaled data at the near field is ~0.7.

After fixing the initial conditions, a unique solution of the equation system in equation (1) can be found [see equation (A1) in Appendix A]. From the equations, we see that *C*_{F}(*t*) → *G*/*Q* and *C*_{N}(*t*) → *G*/*Q* + *G*/β as *t* → ∞. Therefore, the steady-state solution for β is β = *G*/(C̄_{N} − *G*/*Q*), where C̄_{N} denotes the steady-state near-field concentration. Theoretically, as time goes by, the measured and modeled concentrations agree reasonably well at steady state as long as the system conditions remain the same, so we can calculate β from the steady-state solution. With the steady concentration C̄_{N} close to 60 ppm = 227.44 mg m^{−3} for toluene, the calculated β is ~1.74 m^{3} min^{−1}.
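The arithmetic behind this steady-state estimate can be verified directly:

```python
G = 351.5        # mg min^-1, toluene generation rate
Q = 13.8         # m^3 min^-1, ventilation rate through the chamber
C_N_ss = 227.44  # mg m^-3, measured steady near-field concentration (~60 ppm)

C_F_ss = G / Q                 # steady far-field concentration, ~25.5 mg m^-3
beta = G / (C_N_ss - C_F_ss)   # steady-state solution for beta
print(round(C_F_ss, 2), round(beta, 2))  # 25.47 and 1.74
```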

As in Simulation results, we plotted the 95% equal-tailed posterior predictive interval for the near-field concentration (log scaled) for the build-up data in Fig. 7. The figure shows that the real data have smaller error than the simulated data. There is a rapid increase in the calculated concentration but a more gradual one in the measured concentration; this might result from the two-zone model's assumption of complete mixing within each zone, as well as measurement error. Consequently, our estimated β is close to, but still different from, the steady-state solution. When calculating the steady-state solution, we used the steady state of the measured concentration, but it is hard to define the exact steady-state concentration precisely from experimental data. In our experiment, we used the average concentration over the last 5 min to represent the steady concentration. Measuring the steady concentration therefore involves arbitrary judgments, and the estimated steady-state solutions may also have some bias.

Posterior predictive median and 95% intervals for *C*_{N}(*t*) computed from the two-zone data considering *C*_{N}(*t*) measurements only. X-axis: time (minutes); y-axis: log(*C*_{N}), (log (mg m^{−3})).

To explore the predictive concentration at the near field, we also plotted the posterior density for *C*_{N} at some fixed time points. Figure 8 shows density for posterior *C*_{N} at time *t* = 5, 25, 250, and 1000 s. All the curves have similar shapes as expected. The posterior means for *C*_{N} increase rapidly at earlier stages (*t* = 5 and 25 s), becoming more gradual for later times (*t* = 250 and *t* = 1000 s). This agrees well with the 3D data plot (Fig. 4).

Posterior predictive density of *C*_{N} for the two-zone data considering *C*_{N} measurements only. X-axis: *C*_{N} (mg m^{−3}); y-axis: density.

As mentioned earlier, we did not measure *C*_{N} and *C*_{F} simultaneously, but we can assume that separately measured *C*_{N}(*t*) and *C*_{F}(*t*) were taken at the exact same time points. With the same initial conditions, we regarded them as pairs and fit the model. In the following, we consider the complete two-zone data with both near- and far-field measurements. A series of experiments were conducted to explore the effects of the presence of the worker's body, movement of the body, and body heat on *C*_{N}, *C*_{F}, and β.

The directional airflow through the chamber also has an effect on the shape of the near field, something that the model ignores. Flow visualization experiments (not reported here) showed that the near field is not symmetrical along the direction of the airflow through the chamber (north–south), while it is symmetrical along the east–west direction. Though measurements taken in the northbound and southbound directions were excluded when defining the near-field zone, we wanted to determine whether direction really matters for the concentrations and the estimates of β. Table 5a compares the estimates among data sets taken under the same working conditions (without mannequin, body movement, or body heat) but in different directions, where β̂ refers to the estimate of β from its posterior samples; CV(C̄_{N}) and CV(C̄_{F}) are the posterior estimates of the CV for the steady-state *C*_{N} and *C*_{F}; ‘β_{Eqm}’ refers to the steady-state solution for β as mentioned earlier; C̄_{N} denotes the posterior estimate of the steady-state *C*_{N}; and the experimental C̄_{N} is the steady-state *C*_{N} that we used to calculate the steady-state solution for β. The steady-state *C*_{N} varies across exposure conditions; therefore, their steady-state solutions differ too. As expected, all estimated βs are close to the estimated steady-state solution from the differential equation system. The east and west estimates of β are more similar to each other than to those from the south.

Comparison of estimated βs and the coefficient of variations for the two-zone experiments: (a) corresponds to different directions under the same working conditions; (b) corresponds to different working conditions but for a fixed direction (east) **...**

To assess the effects of different exposure conditions, we also experimented with the presence of the mannequin, body movement, and body heat. The results are shown in Table 5b. We expected that the presence of the mannequin would reduce the air exchange between the near and the far field, thereby increasing the toluene concentration; that is, it decreases the airflow rate β between the near field and far field. The first two rows of Table 5b show that β is ~4.33 m^{3} min^{−1} without the mannequin and becomes 3.96 m^{3} min^{−1} with the mannequin present. Compared with the base case (mannequin present), body movement and body heat show only a small influence on the concentration level and the airflow rate β, as shown in the second, third, and fourth rows of Table 5b. The estimated βs are always ~3.95 and have similar posterior credible intervals. Thus, their posterior predictive steady-state concentrations are also similar [C̄_{N} and C̄_{F} in Table 5b]. The posterior predictive steady-state concentrations at the far field are almost the same (≈25.51 mg m^{−3}) for all data sets, as is their steady-state solution β_{Eqm}. This also implies that *C*_{N} contains much more information than *C*_{F}. The extremely small CV(C̄_{F}) at the far field (<10^{−3}) shows the stability of the steady-state concentration there. Moreover, the small CV(C̄_{N}) at the near field, and its closeness to the experimental C̄_{N}, show the accuracy of our approach.

Figure 9 shows the 95% posterior predictive intervals for the real data at east without mannequin, body movement, or body heat. Fitting the complete two-zone data (including *C*_{N} and *C*_{F}), we found that the predicted *C*_{F}s are farther from the truth than the predicted *C*_{N}s. Evidently, the information about β from the near-field and far-field data do not agree well with each other. It is very likely that there is a measurement delay between the measured *C*_{N}(*t*) and *C*_{F}(*t*), which we combined to form a complete data set. Another explanation is that the concentration at the far field is more spatially variable than at the near field; different measurement positions may give very different *C*_{F}s, so our fixed-position measurement may not represent the far field very well, producing larger measurement errors for *C*_{F}(*t*) than for *C*_{N}(*t*). Consequently, the posterior for β has to strike a compromise between the two information sources, *C*_{N}(*t*) and *C*_{F}(*t*). As noted earlier for the independent near- and far-field measurements, *C*_{N} carries more information about β than *C*_{F}, and therefore the estimated β agrees more with *C*_{N}(*t*) than with *C*_{F}(*t*). As a result, the estimated variance at the near field is much smaller than at the far field, i.e. *C*_{F} has much wider posterior predictive intervals.

Prediction of *C*_{N}(*t*) and *C*_{F}(*t*) for the complete two-zone data in a fixed direction (east). X-axis: time (minutes); y-axis: log(*C*_{N}) and log(*C*_{F}) (log (mg m^{−3})).

Validation of the model is essentially a comparison of the model predictions of near- and far-field concentrations with actual measurements of the same. This, of course, presupposes that the model input parameters are known; if there are uncertainties in the input parameters, validation becomes challenging. The approach outlined here illustrates how exposure models and information on model parameters, together with knowledge of the uncertainty and variability in these quantities, can be combined in the validation process.

It is interesting to compare the values of the parameters chosen in this laboratory study with those in real-life situations reported by Nicas *et al.* (2006) and simulations reported by Cherrie (1999). The ratio *V*_{N}/*V*_{F} is ~8 × 10^{−4} in this study, while Nicas *et al.* report a value of 9 × 10^{−5}, an order of magnitude difference. The air changes per minute in the near field (β/*V*_{N}) vary between ~32 and 3180 in this study, whereas Nicas *et al.* (2006) report a range of 21–27 air changes per minute for the near field; thus, their study lies at the lower end of the range examined by us. The values of β in this study range between 0.1 and 10 m^{3} min^{−1}, comparable to the values in Cherrie (3–30 m^{3} min^{−1}). We can therefore conclude that the experimental and simulation conditions examined in this study are roughly of the same order as those reported in other studies. At the same time, the validation has been done with a substantial amount of data. The simulations were conducted over a wide range of model input conditions, and realistic effects such as body heat, body movement, and the presence of a mannequin were incorporated into the experiments. The results show a close match between model predictions and actual measurements.
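These ratios follow directly from the chamber geometry (near-field cylinder of 10 cm radius and 10 cm height, *V*_{F} = 3.8 m^{3}):

```python
import math

V_N = math.pi * 0.10 ** 2 * 0.10   # near-field cylinder volume, ~3.1e-3 m^3
V_F = 3.8                          # far-field volume, m^3

ratio = V_N / V_F
print(f"{ratio:.1e}")              # ~8e-4, as stated in the text

# air changes per minute in the near field, beta / V_N,
# over the range of beta examined (0.1-10 m^3 min^-1)
acpm_low, acpm_high = 0.1 / V_N, 10.0 / V_N
print(round(acpm_low), round(acpm_high))   # ~32 and ~3180
```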

We have proposed a nonlinear regression model within a Bayesian setting for analyzing data from two-zone experiments. This approach combines prior information on the physical model with the observed data and accounts for uncertainty in measurement and in the rate of airflow between the near and far fields. We account for the variability arising in experimental data, predict the concentration from the model, and arrive at sharper inference regarding the time needed to achieve system equilibrium.

Our findings reveal that direction and exposure conditions affect the concentration levels differently. Direction has a pronounced effect on β and the concentration level. Among the exposure conditions, the presence of the mannequin has a greater effect than body movement and body heat, which we found to affect β and the concentration level only slightly. Our estimates of β are always close to the estimated steady-state solutions, implying that our method works efficiently. The predictions of near-field concentration for both the simulated and experimental data show good concordance between the observed and predicted values, indicating that the two-zone model's assumptions agree with reality to a large extent and that the model is suitable for predicting contaminant concentrations.

Our approach also helps validate the underlying physical process using experimental data: we test whether the experimental data support the nonlinear relationship between concentration and time until steady state is attained. Simply predicting the average concentration using linear relationships, as standard linear regression models do, would not allow such validation because those models do not estimate the physical model. Our framework can also be applied to other physical models to estimate parameters or validate the models themselves.

A unique solution of equation (1) is

(A1)

where *λ*_{1} and *λ*_{2} are eigenvalues of the matrix **A**(β) in equation (2). These are available in closed form:

(B1)
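Equation (B1) itself is not reproduced above. For reference, the standard closed form for the eigenvalues of the two-zone matrix **A**(β), reconstructed here from the model structure under the usual parameterization with volumes *V*_{N} and *V*_{F} (to be checked against the original (B1)), is:

```latex
\lambda_{1,2} \;=\; \frac{1}{2}\left[
  -\left(\frac{\beta}{V_N}+\frac{\beta+Q}{V_F}\right)
  \pm \sqrt{\left(\frac{\beta}{V_N}+\frac{\beta+Q}{V_F}\right)^{2}
  - \frac{4\,\beta Q}{V_N V_F}}\;\right]
```

Both eigenvalues are real and negative (the discriminant is positive and the trace is negative), so the exponential terms in (A1) decay and the concentrations approach the steady-state values *G*/*Q* + *G*/β and *G*/*Q*.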

We provide some details on the MCMC algorithm for estimating β and Σ; the setting with *G* and *Q* unknown is analogous. The algorithm proceeds iteratively: we first assign starting values β_{(0)} and Σ_{(0)} to the parameters; the *i*-th iteration then updates the parameters by drawing from their full conditionals β_{(i)} ~ *p*(β|Σ_{(i – 1)}, **Y**) ∝ *p*(β)*p*(**Y**|β, Σ_{(i – 1)}) and Σ_{(i)} ~ *p*(Σ|β_{(i)}, **Y**) ∝ *p*(Σ)*p*(**Y**|β_{(i)}, Σ). The mathematical expressions for these ‘full conditional distributions’ are required only up to a proportionality constant.

For β, we note that the prior or expert knowledge regarding this parameter would usually be summarized as either a uniform or a log-normal distribution (note that β > 0). These do not yield a standard distribution to draw from, so we update β_{(i)} using a Metropolis–Hastings step (Gelman *et al.*, 2004). We first draw a β^{*} from a log-normal ‘proposal’ distribution, denoted *J*(*v*|*u*), with mean *u* and a fixed variance *σ*^{2}. We then calculate the ‘acceptance ratio’ α = [*p*(β^{*})*p*(**Y**|β^{*}, Σ_{(i – 1)})*J*(β_{(i – 1)}|β^{*})]/[*p*(β_{(i – 1)})*p*(**Y**|β_{(i – 1)}, Σ_{(i – 1)})*J*(β^{*}|β_{(i – 1)})]. If α ≥ 1, then we set β_{(i)} = β^{*}. If α ∈ (0, 1), then we draw a random number from *U*(0, 1) and set β_{(i)} = β^{*} if the drawn number is less than α; otherwise we ‘reject’ the proposed value and set β_{(i)} = β_{(i – 1)}.
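A minimal sketch of this Metropolis–Hastings update follows, assuming a log-normal prior on β and using a stand-in for the model log concentration log *C*(β; *t*); the prior hyperparameters, likelihood, and `model_logC` are all placeholders, not the authors' code.

```python
import math
import random

def model_logC(beta, t, G=351.5, Q=13.8):
    # stand-in for log C_N(beta; t): the steady-state near-field value;
    # a real implementation would use the transient solution (A1)
    return math.log(G / Q + G / beta)

def log_prior(beta, mu=1.0, sd=1.0):
    # assumed log-normal prior density on beta > 0 (up to a constant)
    return -((math.log(beta) - mu) ** 2) / (2 * sd ** 2) - math.log(beta)

def log_lik(beta, data, tau=1.0):
    # Gaussian log-likelihood of observed log-concentrations y at times t
    return sum(-((y - model_logC(beta, t)) ** 2) / (2 * tau) for t, y in data)

def mh_step(beta, data, prop_sd=0.2):
    # log-normal proposal centred (in log space) at the current value
    beta_star = beta * math.exp(random.gauss(0.0, prop_sd))
    # the acceptance ratio includes the proposal correction
    # J(beta | beta*) / J(beta* | beta), which here reduces to beta*/beta
    log_alpha = (log_prior(beta_star) + log_lik(beta_star, data)
                 - log_prior(beta) - log_lik(beta, data)
                 + math.log(beta_star / beta))
    if math.log(random.random()) < min(0.0, log_alpha):
        return beta_star       # accept the proposal
    return beta                # reject: keep the current value
```

Iterating `mh_step` over data simulated at a known β produces a chain whose draws concentrate around that value.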

After obtaining β_{(i)}, we turn to the parameter Σ, which has a prior distribution *IW*(*S*, *ν*) (Carlin and Louis, 2008, p. 426). Here, *S* is a 2 × 2 positive definite scale matrix and *ν* is the degrees of freedom. We let *S* be diagonal, implying no prior assumption of dependence between the near- and far-field measurements. Then Σ_{(i)} is drawn with a Gibbs update: Σ_{(i)} ~ *IW*(*S*_{1}, *ν*_{1}), where ${S}_{1}=S+{\displaystyle \sum _{j=1}^{n}\left(\mathbf{Y}({t}_{j})-\mathrm{log}\,\mathbf{C}({\mathrm{\beta}}_{(i)};{t}_{j})\right)}{\left(\mathbf{Y}({t}_{j})-\mathrm{log}\,\mathbf{C}({\mathrm{\beta}}_{(i)};{t}_{j})\right)}^{T}$ and *ν*_{1} = *ν* + *n*. For the independence model, τ_{NF} = 0. Assuming that τ_{N} and τ_{F} have *IG*(*a*_{N}, *b*_{N}) and *IG*(*a*_{F}, *b*_{F}) priors, respectively, standard conjugacy gives the Gibbs updates ${\mathrm{\tau}}_{N}\sim IG\left({a}_{N}+\frac{n}{2},\;{b}_{N}+\frac{1}{2}{\displaystyle \sum _{j=1}^{n}{\left({Y}_{N}({t}_{j})-\mathrm{log}\,{C}_{N}(\mathrm{\beta};{t}_{j})\right)}^{2}}\right)$, and analogously for τ_{F}.
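The conjugate update for τ_{N} can be sketched as follows (standard inverse-Gamma conjugacy; the residuals and hyperparameters here are placeholders, not the authors' values):

```python
import random

def gibbs_tau(residuals, a=2.0, b=1.0):
    # full conditional: tau ~ IG(a + n/2, b + SSR/2), where SSR is the
    # sum of squared log-scale residuals; draw via tau = 1/X with
    # X ~ Gamma(shape, scale = 1/rate)
    n = len(residuals)
    shape = a + n / 2.0
    rate = b + sum(r * r for r in residuals) / 2.0
    return 1.0 / random.gammavariate(shape, 1.0 / rate)
```

Given residuals generated with a known variance, repeated draws from `gibbs_tau` average out close to that variance.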

- American Industrial Hygiene Association (AIHA) Mathematical models for estimating occupational exposure to chemicals. Fairfax, VA: AIHA Exposure Assessment Strategies Committee, AIHA Press; 2000.
- Baldwin PE, Maynard AD. A survey of wind speeds in indoor workplaces. Ann Occup Hyg. 1998;42:303–13. [PubMed]
- Carlin BP, Louis TA. Bayesian methods for data analysis. Boca Raton, FL: Chapman & Hall/CRC Press; 2008.
- Cherrie JW. The effect of room size and general ventilation on relationship between near and far-field concentration. Appl Occup Environ Hyg. 1999;14:539–46. [PubMed]
- Gelman A, Carlin JB, Stern HS, et al. Bayesian data analysis. Boca Raton, FL: Chapman & Hall/CRC Press; 2004.
- Gilks W, Richardson S, Spiegelhalter D. Markov chain Monte Carlo in practice. London: Chapman and Hall; 1996.
- Hemeon WC. Plant and process ventilation. 2nd ed. New York: Industrial Press; 1963. pp. 235–45.
- Laub AJ. Matrix analysis for scientists and engineers. Philadelphia, PA: Society for Industrial and Applied Mathematics; 2005.
- Marin J, Robert CP. Bayesian core: a practical approach to computational Bayesian statistics. New York: Springer; 2007.
- Melikov A, Zhou G. Proceedings, 7th international conference on indoor air quality and climate: indoor air ’96. Vol. 1. Nagoya, Japan: 1996. Air movement at the neck of the human body; pp. 209–14.
- Nicas M. Estimating exposure intensity in an imperfectly mixed room. Am Ind Hyg Assoc J. 1996;57:542–50. [PubMed]
- Nicas M. Estimating methyl bromide exposure due to offgassing from fumigated commodities. Appl Occup Environ Hyg. 2003;18:200–10. [PubMed]
- Nicas M, Jayjock M. Uncertainty in exposure estimates made by modeling versus monitoring. AIHA J. 2002;63:275–83. [PubMed]
- Nicas M, Miller SL. A multi-zone model evaluation of the efficacy of upper-room air ultraviolet germicidal irradiation. Appl Occup Environ Hyg. 1999;14:317–28. [PubMed]
- Nicas M, Plisko MJ, Spencer JW. Estimating benzene exposure at a solvent parts washer. J Occup Environ Hyg. 2006;3:284–91. [PubMed]
- Ramachandran G. Toward better exposure assessment strategies—the new NIOSH initiative. Ann Occup Hyg. 2008;52:297–301. [PubMed]

Articles from Annals of Occupational Hygiene are provided here courtesy of **Oxford University Press**
