Ann Occup Hyg. Jun 2010; 54(4): 459–472.
Published online Apr 23, 2010. doi:  10.1093/annhyg/meq027
PMCID: PMC2913720
Statistical modeling of occupational chlorinated solvent exposures for case–control studies using a literature-based database
Misty J. Hein,1* Martha A. Waters,2 Avima M. Ruder,1 Mark R. Stenzel,3 Aaron Blair,4 and Patricia A. Stewart5
1Division of Surveillance, Hazard Evaluations and Field Studies, National Institute for Occupational Safety and Health, 4676 Columbia Parkway, Cincinnati, OH 45226, USA
2Division of Applied Research and Technology, National Institute for Occupational Safety and Health, 4676 Columbia Parkway, Cincinnati, OH 45226, USA
3Exposure Assessment Applications, LLC, 6045 North 27th Street, Arlington, VA 22207, USA
4Division of Cancer Epidemiology and Genetics, National Cancer Institute, Executive Plaza South, Room 8118, Bethesda, MD 20892, USA
5Stewart Exposure Assessments, LLC, 6045 North 27th Street, Arlington, VA 22207, USA
*Author to whom correspondence should be addressed. Tel: +1-513-841-4207; fax: +1-513-841-4486; e-mail: mhein/at/
Received October 5, 2009; Accepted March 5, 2010.
Objectives: Occupational exposure assessment for population-based case–control studies is challenging due to the wide variety of industries and occupations encountered by study participants. We developed and evaluated statistical models to estimate the intensity of exposure to three chlorinated solvents—methylene chloride, 1,1,1-trichloroethane, and trichloroethylene—using a database of air measurement data and associated exposure determinants.
Methods: A measurement database was developed after an extensive review of the published industrial hygiene literature. The database of nearly 3000 measurements or summary measurements included sample size, measurement characteristics (year, duration, and type), and several potential exposure determinants associated with the measurements: mechanism of release (e.g. evaporation), process condition, temperature, usage rate, type of ventilation, location, presence of a confined space, and proximity to the source. The natural log-transformed measurement levels in the exposure database were modeled as a function of the measurement characteristics and exposure determinants using maximum likelihood methods. Assuming a single lognormal distribution of the measurements, an arithmetic mean exposure intensity level was estimated for each unique combination of exposure determinants and decade.
Results: The proportions of variability in the measurement data explained by the modeled measurement characteristics and exposure determinants were 36, 38, and 54% for methylene chloride, 1,1,1-trichloroethane, and trichloroethylene, respectively. Model parameter estimates for the exposure determinants were in the anticipated direction. Exposure intensity estimates were plausible and exhibited internal consistency, but the ability to evaluate validity was limited.
Conclusions: These prediction models can be used to estimate chlorinated solvent exposure intensity for jobs reported by population-based case–control study participants that have sufficiently detailed information regarding the exposure determinants.
Keywords: case–control study, exposure assessment, exposure determinants, occupational exposure
Occupational exposure assessment for population-based case–control studies is challenging because exposure information comes largely from study participants' questionnaire responses. Historically, this information typically included job title, industry, and dates (and sometimes tasks, chemicals, and equipment) obtained using open-ended questions. To overcome the limitations of this design, job- or exposure-specific questionnaires have been developed to collect more detailed information. Regardless of the questionnaire design, little information has been available on how to convert questionnaire responses into exposure estimates.
Two population- and one hospital-based case–control studies were conducted in the USA by the National Institute for Occupational Safety and Health (NIOSH) of the Centers for Disease Control and Prevention (CDC), the National Cancer Institute (NCI), and the National Center on Birth Defects and Developmental Disabilities of the CDC to examine associations between health outcomes and occupational exposures (Yoon et al., 2001; Ruder et al., 2006; Samanic et al., 2008). A primary occupational hypothesis in these studies involves chlorinated solvents. Two of the studies (Ruder et al., 2006; Samanic et al., 2008) used questionnaires that collected detailed job or exposure information; therefore, an optimization approach was developed to base the estimation process on job characteristics, called exposure determinants, that could be obtained from the questionnaire responses. A goal of the approach was to provide a rigorous transparent estimation process that could characterize exposure in widely varying job situations.
The purpose of these analyses is to develop an approach for estimating exposure levels in the three studies for three chlorinated solvents: methylene chloride [Chemical Abstracts Service (CAS) 75-09-2], 1,1,1-trichloroethane (CAS 71-55-6), and trichloroethylene (CAS 79-01-6). The approach could also be used in other studies of these solvents, provided that jobs are characterized by the same exposure determinants. A similar determinant-based approach was used to estimate occupational exposures to three aromatic solvents (Hein et al., 2008).
The assessment approach, based on exposure determinants, involved several steps. First, all jobs identified as possibly exposed were characterized by a single set of exposure determinants based on the questionnaire responses. Second, this same set of determinants was used to characterize the available published measurement data for the three solvents of interest based on descriptions in the published reports. Third, a regression analysis of the measurement data and associated determinants was used to predict an exposure level for each unique set of determinant values. Finally, predicted levels were assigned to the reported jobs in the studies using each job's set of determinant values.
Measurement database
Air measurement results and associated sampling characteristics and exposure determinant information for the three solvents were compiled from three sources: the published literature; NIOSH Health Hazard Evaluations (HHEs), which are detailed industrial hygiene reports on one or several facilities; and NIOSH Industry-wide Studies (IWS) reports, which investigate typical exposure levels within specific industries (Table 1). Literature was identified through searches of the MEDLINE, TOXLINE, NIOSHTIC, and NIOSH HHE databases, other reviews (e.g. Bakke et al., 2007; Gold et al., 2008), and personal archives. The compiled literature consisted primarily of US journal articles and trade association reports (1940–2001), NIOSH HHEs (1976–1996), and NIOSH IWS reports (1951–1985); IWS reports prior to 1971 (the year NIOSH began) include reports from the US HEW/HHS Bureau of Occupational Safety and Health. See Supplementary data (available at Annals of Occupational Hygiene online) for citations of the articles and reports included in the database.
Table 1.
Summary of the chlorinated solvents measurement database compiled from the published literature and NIOSH reports
Exposure metrics and sample size
Most publications reported individual measurements, but some provided only summary exposure measures, usually arithmetic means (AMs), but occasionally the geometric mean (GM) and geometric standard deviation (GSD), the median, or the range. When summary AMs were absent, reported information was used to estimate the AMs. First, when both the GM and GSD were available, the AM was estimated assuming a lognormal distribution as given below:
$$\mathrm{AM} = \mathrm{GM}\,\exp\!\left[\tfrac{1}{2}\left(\ln \mathrm{GSD}\right)^{2}\right] \qquad (1)$$
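The GM/GSD conversion in equation (1), together with the GSD = 3.5 fallback used for GM-only reports (described next), can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code; the function and constant names are hypothetical.

```python
import math

# Fallback GSD used when a report gives a GM but no GSD.
ASSUMED_GSD_IF_MISSING = 3.5


def am_from_gm(gm: float, gsd: float = ASSUMED_GSD_IF_MISSING) -> float:
    """Arithmetic mean of a lognormal distribution from its GM and GSD.

    AM = GM * exp(0.5 * (ln GSD)**2), i.e. equation (1).
    """
    if gm <= 0:
        raise ValueError("GM must be positive")
    if gsd < 1:
        raise ValueError("GSD must be at least 1")
    sigma = math.log(gsd)  # standard deviation of the log-transformed levels
    return gm * math.exp(0.5 * sigma ** 2)


# GM = 10 p.p.m. with GSD = 2.5 implies an AM of about 15.2 p.p.m.;
# a GM-only report of 10 p.p.m. falls back to GSD = 3.5 (AM about 21.9 p.p.m.).
print(round(am_from_gm(10.0, 2.5), 1))
print(round(am_from_gm(10.0), 1))
```

Because the AM/GM ratio grows with (ln GSD)², the assumed GSD of 3.5 raises GM-only estimates noticeably more than a GSD of 2.5 would.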
If only the GM was provided, the GSD was assumed to be 3.5 and the same conversion was made. This value, although higher than often observed (Kromhout et al., 1993), was selected because many of the measurement data spanned different jobs and worksites, which would likely lead to greater variability than that reported by Kromhout et al. Finally, if only the range was provided, the AM was estimated by assuming a lognormal distribution according to the following algorithm: first, the midpoint of the log-transformed minimum and maximum levels provided an estimate, μ̂, of the mean of the log-transformed levels; second, the range of the log-transformed levels divided by W_median, the theoretical median standardized range, provided an estimate, σ̂, of the standard deviation of the log-transformed levels (Pearson and Hartley, 1942; Lavoué et al., 2007); and finally, the AM was estimated as given below:
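$$\hat{\mu} = \tfrac{1}{2}\left[\ln x_{\min} + \ln x_{\max}\right] \qquad (2)$$
$$\hat{\sigma} = \frac{\ln x_{\max} - \ln x_{\min}}{W_{\text{median}}} \qquad (3)$$
$$\mathrm{AM} = \exp\!\left(\hat{\mu} + \tfrac{1}{2}\hat{\sigma}^{2}\right) \qquad (4)$$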
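The three-step range algorithm described above can be sketched as below. The original analysis took W_median from published tables (Pearson and Hartley, 1942); as a swapped-in technique, this sketch instead computes it numerically from the distribution of the range R of n standard normal draws, P(R ≤ w) = n ∫ φ(x)[Φ(x + w) − Φ(x)]^(n−1) dx, and finds the median by bisection. The function names, the integration grid, and the requirement that both reported limits be positive (a zero or <LOD minimum would need separate handling) are assumptions of the sketch.

```python
import math


def _phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _Phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def range_cdf(w: float, n: int, lo: float = -8.0, hi: float = 8.0, steps: int = 4000) -> float:
    """P(range of n iid N(0, 1) draws <= w), by trapezoidal integration."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        f = _phi(x) * (_Phi(x + w) - _Phi(x)) ** (n - 1)
        total += f if 0 < k < steps else 0.5 * f  # half weight at the endpoints
    return n * total * h


def w_median(n: int, tol: float = 1e-6) -> float:
    """Median standardized range for sample size n, found by bisection."""
    if n < 2:
        raise ValueError("a range needs at least two observations")
    a, b = 0.0, 20.0
    while b - a > tol:
        mid = 0.5 * (a + b)
        if range_cdf(mid, n) < 0.5:
            a = mid  # median lies above mid
        else:
            b = mid
    return 0.5 * (a + b)


def am_from_range(min_level: float, max_level: float, n: int) -> float:
    """AM estimated from a reported range, assuming lognormal levels."""
    mu = 0.5 * (math.log(min_level) + math.log(max_level))  # mean of log levels
    sigma = (math.log(max_level) - math.log(min_level)) / w_median(n)  # SD of log levels
    return math.exp(mu + 0.5 * sigma ** 2)
```

For n = 2 the range of two standard normal draws is |X1 − X2|, whose median is √2 × 0.6745 ≈ 0.954; this gives a quick check on the numerical W_median.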
Publications reporting summary measures usually reported sample size. If not, sample size magnitude (e.g. 1, 2, or 10) was estimated based on information in the report including the purpose of sample collection, the number of measurements for other measured agents, the time span over which measurements were collected, and non-quantitative comments suggestive of the scale of the measurement collection effort. In the following, the term ‘reported levels’ refers to both individual measurements and reported or estimated AMs.
Reported levels presented in milligrams per cubic metre were converted to p.p.m. (by volume) using the conventional formula for gases and vapors at normal temperature and pressure (25°C, 760 mmHg). The distributions of the reported levels were strongly skewed to the right and, based on graphical methods, somewhat consistent with a lognormal distribution, although log-normality was rejected by the Shapiro–Wilk test for all three solvents (P-value < 0.0001), likely because large sample sizes give the test the power to detect even small departures from log-normality (D'Agostino and Stephens, 1986).
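The conventional conversion can be sketched as p.p.m. = mg m−3 × 24.45 / MW, where 24.45 l mol−1 is the molar volume of an ideal gas at 25°C and 760 mmHg. The molecular weights below are standard values supplied for this sketch, not figures taken from the article, and the function names are hypothetical.

```python
# Conventional molar volume of an ideal gas at 25 °C and 760 mmHg (litres/mol).
MOLAR_VOLUME_L = 24.45

# Standard molecular weights (g/mol), supplied here rather than taken from the text.
MOLECULAR_WEIGHT = {
    "methylene chloride": 84.93,
    "1,1,1-trichloroethane": 133.40,
    "trichloroethylene": 131.39,
}


def mg_m3_to_ppm(level_mg_m3: float, solvent: str) -> float:
    """Convert an air concentration from mg/m3 to p.p.m. (v/v) at 25 °C, 760 mmHg."""
    return level_mg_m3 * MOLAR_VOLUME_L / MOLECULAR_WEIGHT[solvent]


def ppm_to_mg_m3(level_ppm: float, solvent: str) -> float:
    """Inverse conversion, p.p.m. (v/v) to mg/m3 under the same conditions."""
    return level_ppm * MOLECULAR_WEIGHT[solvent] / MOLAR_VOLUME_L


# 100 mg/m3 of trichloroethylene corresponds to roughly 18.6 p.p.m.
print(round(mg_m3_to_ppm(100.0, "trichloroethylene"), 1))
```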
Measurement characteristics
The database included reported levels from long-term (≥60 min) and short-term (<60 min) and personal and area measurements. For estimating 8-h time-weighted average exposure intensities, data from long-term personal measurements were preferred; when these were unavailable, to increase the number of reported levels available for modeling, available short-term personal and long- and short-term area measurements were included. Data were available for 1940–2001, but were generally sparse prior to 1970.
Exposure determinants
Each report was reviewed to identify characteristics of the work site and the job's interaction with the work site associated with each measurement. The characteristics evaluated, called exposure determinants, were mechanisms of release of the solvent into the breathing zone of the monitored worker (described below); the process condition (closed, open, or both); the process temperature (room temperature, elevated, or both); the solvent usage rate at the location where the measurement took place (<380 l month−1, 380–3800 l month−1, or >3800 l month−1); the types of ventilation available (described below); the process location (outdoor, indoor, or both); and the worker's location, i.e. in a confined space (no, yes, or both) and proximity to the exposure source (≥0.9 m, <0.9 m, or both).
Mechanism of release included evaporation and five active mechanisms: spreading, manual agitation, rolling, mechanical agitation, and aerosolization. Basic industrial hygiene principles suggest that compared to evaporation, active mechanisms of release should be associated with higher emission levels, due to the external energy imparted to the solvent and the increased solvent–air interface. Each job was assigned a primary mechanism of release and where more than one mechanism of release was likely, a secondary mechanism of release (e.g. spraying a degreasing agent with a wand and then allowing the solvent to evaporate). Due to the small numbers of observations for some mechanisms, all active mechanisms, other than aerosolization, were combined.
Two ventilation descriptions were assigned: local exhaust ventilation (LEV) at the point of generation of the solvent and general industrial mechanical dilution ventilation (IMD; e.g. room air mixing using fans or recirculation). Ventilation was classified as: both LEV and IMD; LEV only; LEV and no LEV (where more than one source existed); IMD only; IMD and none (when some areas had no ventilation); or no ventilation present or specified. For modeling purposes, we considered effects for LEV separately from IMD. LEV was evaluated as absent, present but ineffective, or present and effective. IMD was evaluated as present or absent.
Each determinant value was also coded to indicate its level of certainty: L, literature (information in the report was used to assign the value); F, probably factual; or J, judgment. Probably factual was assigned when the value was inherent in the process. For example, room temperature was coded as probably factual for trichloroethylene measurements taken when trichloroethylene was used as an anesthetic in an operating room, because it is unlikely that trichloroethylene would be heated in such a situation. Judgment was assigned when the determinant could vary from workplace to workplace; for example, the use of ventilation in a particular year was considered a judgment.
Statistical modeling
The goal of the statistical modeling was to relate the measurement levels abstracted from the literature to their associated exposure determinants to develop models for estimating exposure intensity for jobs reported by the case–control study participants. Some reported levels were presented as below the limit of detection (LOD), non-detectable, or zero; consequently, censored regression techniques based on maximum likelihood estimation methods (Lubin et al., 2004), which perform well for censoring <30% (Uh et al., 2008), were used. When no LOD was reported, a solvent- and year-specific LOD was assigned (Supplementary data are available at Annals of Occupational Hygiene online).
The LIFEREG procedure in SAS (version 9.2, SAS Institute Inc., Cary, NC, USA) was used to estimate model parameters. The outcome variable was the natural log-transformed reported level. Each observation was weighted by its sample size, which multiplied the contribution of each reported level to the log likelihood by that sample size. A regression model relating the reported levels to the assigned determinants and measurement characteristics was given by
$$\ln(y_i) = \beta_0 + \sum_{j=1}^{14} \beta_j x_{ij} + \sum_{j=15}^{18} \beta_j x_{ij} + \sigma_m \epsilon_i$$
where yi was the ith reported level, β0 was the intercept, β1–β14 were parameters for the exposure determinants, β15–β18 were parameters for the measurement characteristics (i.e. duration, type, and year of the measurement, or the publication year if the measurement year was unavailable), xij was the value of the jth covariate for the ith reported level, σm was the model scale parameter (i.e. the standard deviation of the model residuals), and ϵi was the error term (assumed to be independent and identically distributed as normal with a mean of zero and a variance of one). Terms for measurement duration (long term, short term) and type (personal, area) were included in all models to account for the variation associated with these sampling characteristics. Measurement year, treated as a continuous variable, was centered at 1970. In this model, the intercept (β0) is interpreted as the log intensity level for evaporation as both the primary and secondary mechanism of release, no LEV, no IMD, medium usage rate, open process condition, room temperature, indoor location, not confined space, far proximity, a long-term personal measurement, and year 1970.
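To make the weighting and censoring concrete, the sketch below writes out the sample-size-weighted log likelihood of a left-censored normal regression for ln(level), the kind of model LIFEREG fits by maximum likelihood. It is an illustration under assumptions, not SAS's implementation: covariates are plain lists with a leading 1.0 for the intercept, censored rows carry ln(LOD) as their value, and only the likelihood is shown (maximizing it would require a numerical optimizer).

```python
import math


def _log_norm_pdf(z: float) -> float:
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def _log_norm_cdf(z: float) -> float:
    """Log CDF of the standard normal, via erfc for accuracy in the lower tail."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def censored_loglik(beta, sigma, rows):
    """Sample-size-weighted log likelihood of a left-censored normal model.

    Model: ln(level_i) = x_i . beta + sigma * eps_i, eps_i ~ N(0, 1).
    rows: iterable of (x, log_value, censored, weight), where x starts with 1.0
    for the intercept, log_value is ln(level) or, if censored, ln(LOD), and
    weight is the number of measurements the reported level represents.
    """
    if sigma <= 0:
        raise ValueError("scale parameter must be positive")
    total = 0.0
    for x, log_value, censored, weight in rows:
        mu = sum(b * xi for b, xi in zip(beta, x))  # linear predictor
        z = (log_value - mu) / sigma
        if censored:
            # Below the LOD: probability that ln(level) <= ln(LOD).
            contribution = _log_norm_cdf(z)
        else:
            # Detected: normal density of ln(level) with scale sigma.
            contribution = _log_norm_pdf(z) - math.log(sigma)
        total += weight * contribution  # weighting multiplies each contribution
    return total
```

As a hypothetical example, a row ([1.0, 1.0, 15.0], math.log(0.2), True, 4) could stand for four measurements below a 0.2 p.p.m. LOD in 1985 (15 years after the 1970 centering), with one indicator covariate set to 1.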
Only determinants for which a high percentage (≥50%) of values were reported in the literature (rather than coded as probably factual or judgment) were considered further, because initial modeling resulted in some parameter estimates that were difficult to interpret. Furthermore, reported levels with missing values for one or more exposure determinants were excluded, as were reported levels for which more than half of the exposure determinant values were based on judgment. These decisions resulted in models with greater interpretability. Finally, determinants with parameter estimates that were not significantly different from zero at the 5% level of significance were removed from the model using a backwards elimination procedure.
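The two screening rules can be sketched as simple filters over records coded with the L/F/J certainty values. Two details are assumptions of this sketch rather than statements from the text: the literature share is computed over non-missing values only, and the judgment rule is applied over whichever determinants are passed in. The record layout and function names are hypothetical.

```python
def select_determinants(records, determinants, min_literature_share=0.5):
    """Keep determinants whose assigned values are mostly ('L') from the literature.

    Each record maps a determinant name to (value, code) with code in
    {"L", "F", "J"}, or to None when the value is missing.
    """
    kept = []
    for name in determinants:
        codes = [r[name][1] for r in records if r.get(name) is not None]
        if codes and sum(c == "L" for c in codes) / len(codes) >= min_literature_share:
            kept.append(name)
    return kept


def filter_records(records, determinants):
    """Drop records with any missing determinant or mostly judgment-based values."""
    kept = []
    for r in records:
        if any(r.get(name) is None for name in determinants):
            continue  # missing value for one or more determinants
        n_judgment = sum(r[name][1] == "J" for name in determinants)
        if n_judgment > len(determinants) / 2:
            continue  # more than half of the values rest on judgment
        kept.append(r)
    return kept
```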
Model goodness-of-fit was estimated using a ‘pseudo’ R-squared computed from the scale parameter from the model (σm) and the scale parameter from a model fit with no independent variables (σnull) as
$$R^2_{\text{pseudo}} = \frac{\sigma_{\text{null}}^2 - \sigma_m^2}{\sigma_{\text{null}}^2} = 1 - \frac{\sigma_m^2}{\sigma_{\text{null}}^2}$$
The pseudo R-squared is analogous to the R2 from least squares regression and is interpreted as the proportion of the variation in the log-transformed reported levels (as measured by the null model's scale parameter) that is explained by the terms in the model. Standardized model residuals were examined for normality and homogeneity of variance.
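A minimal sketch of the pseudo R-squared computation from the two fitted scale parameters follows; the function name and the example values are hypothetical.

```python
def pseudo_r2(sigma_model: float, sigma_null: float) -> float:
    """Pseudo R-squared from the scale parameters of the fitted and null models."""
    if sigma_null <= 0 or sigma_model <= 0:
        raise ValueError("scale parameters must be positive")
    return 1.0 - (sigma_model / sigma_null) ** 2


# A fitted scale of 0.8 against a null-model scale of 1.0 gives 0.36,
# i.e. 36% of the variation in the log-transformed levels explained.
print(pseudo_r2(0.8, 1.0))
```

Because the ratio of scale parameters is squared, modest reductions in σm translate into sizeable gains in explained variation.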
To evaluate the model-predicted estimates, a long-term personal predicted intensity was computed for each unique combination of inputs to the prediction models. Each unique combination of exposure determinant values and decade is henceforth denoted a ‘scenario’. For example, one scenario was primary mechanism of release evaporation, secondary mechanism of release evaporation, LEV absent, IMD present, low usage rate, closed condition, elevated temperature, indoor location, not confined space, and far proximity. To avoid extrapolation errors due to limited data, no evaluations were done for years prior to 1970 for methylene chloride and 1,1,1-trichloroethane or for years prior to 1950 for trichloroethylene. Rather than evaluating yearly estimates, a prediction was made for each scenario in each decade by using the midpoint of the decade as the measurement year.
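Enumerating scenarios amounts to taking the Cartesian product of the determinant levels retained in a model and crossing it with the decades evaluated. The determinant levels below are hypothetical and do not reproduce the 192 or 432 scenarios reported for the fitted models; the decade midpoints (1975, 1985, 1995) are also an assumption of the sketch.

```python
from itertools import product

# Hypothetical determinant levels for illustration only; each fitted model
# retained its own subset of determinants and categories.
DETERMINANT_LEVELS = {
    "primary_release": ["evaporation", "active", "aerosolization"],
    "secondary_release": ["evaporation", "active", "aerosolization"],
    "lev": ["absent", "ineffective", "effective"],
    "imd": ["absent", "present"],
    "location": ["indoor", "outdoor"],
    "proximity": ["far", "near"],
}

# Assumed decade midpoints used as the measurement year.
DECADE_MIDPOINTS = {"1970s": 1975, "1980s": 1985, "1990s": 1995}


def scenarios(levels, decades):
    """Yield every combination of determinant values, once per decade."""
    names = list(levels)
    for combo in product(*(levels[name] for name in names)):
        for decade, year in decades.items():
            yield {**dict(zip(names, combo)), "decade": decade, "year": year}


grid = list(scenarios(DETERMINANT_LEVELS, DECADE_MIDPOINTS))
print(len(grid))  # 3 * 3 * 3 * 2 * 2 * 2 * 3 = 648 hypothetical scenarios
```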
Because the exposure measurements were log-transformed and a majority of the reported levels were individual observations, these predictions were geometric, rather than arithmetic, means. Assuming a lognormal distribution, the standard conversion from a GM to an AM was used (i.e. equation (1); Aitchison and Brown, 1963). The estimated scale parameter (σm) from the model can be used to estimate the GSD [GSD = exp(σm)]; however, this resulted in estimated GSDs (6.8, 9.1, and 4.0 for methylene chloride, 1,1,1-trichloroethane, and trichloroethylene, respectively) that were higher than would be expected for a single lognormal distribution. Instead, to convert estimated GM intensities to AM intensities, we used a GSD of 2.5 for each solvent because a previous analysis of a large number of measurements for a variety of chemical agents found GSDs ranging from 2.2 to 2.7 (Kromhout et al., 1993).
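The sketch below compares the AM/GM multiplier implied by the assumed GSD of 2.5 with the multipliers the model-based GSDs would have implied, and converts a predicted log intensity (a log GM) into an AM. Function names are hypothetical; the GSD values are those reported in the text.

```python
import math

ASSUMED_GSD = 2.5  # value used in the text for all three solvents

# GSDs implied by the fitted scale parameters, GSD = exp(sigma_m), as reported.
MODEL_GSD = {
    "methylene chloride": 6.8,
    "1,1,1-trichloroethane": 9.1,
    "trichloroethylene": 4.0,
}


def am_multiplier(gsd: float) -> float:
    """Factor AM / GM for a lognormal distribution with the given GSD."""
    return math.exp(0.5 * math.log(gsd) ** 2)


def predicted_am(log_gm: float, gsd: float = ASSUMED_GSD) -> float:
    """Convert a model-predicted log intensity (a log GM) into an AM intensity."""
    return math.exp(log_gm) * am_multiplier(gsd)


for solvent, gsd in MODEL_GSD.items():
    print(f"{solvent}: AM/GM = {am_multiplier(gsd):.1f} with model GSD "
          f"vs {am_multiplier(ASSUMED_GSD):.2f} with GSD 2.5")
```

The multiplier is about 1.52 for a GSD of 2.5 but roughly 2.6, 6.3, and 11.5 for the model-based GSDs of 4.0, 6.8, and 9.1, which shows how strongly the GSD choice drives the AM estimates.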
Plausibility of the model-predicted exposure intensity levels was evaluated by comparing the predicted AM intensities for the exposure scenarios to the current American Conference of Governmental Industrial Hygienists (ACGIH) threshold limit values (TLVs) (ACGIH, 2001, 2007) and to solvent saturation vapor pressures (SVPs), at 25°C. Practicing industrial hygienists use a rule of thumb that 1% of the SVP represents an air concentration likely to occur under the worst conditions (e.g. confined space with no ventilation) (Stenzel, 2006). Exposure intensities considerably higher than 1% of the SVP are not expected to occur in the workplace, at least, not with great frequency or not for lengthy periods of the workday.
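The 1% SVP rule of thumb can be sketched by converting a vapor pressure into a saturated concentration in p.p.m. (vapor pressure / total pressure × 10^6). The sketch takes the vapor pressure as an input rather than supplying solvent-specific values; the 76 mmHg used in the example is hypothetical, and the function names are illustrative.

```python
def saturation_ppm(vapor_pressure_mmhg: float, total_pressure_mmhg: float = 760.0) -> float:
    """Saturated vapor concentration (p.p.m. v/v) for a given vapor pressure."""
    return vapor_pressure_mmhg / total_pressure_mmhg * 1e6


def exceeds_one_percent_svp(level_ppm: float, vapor_pressure_mmhg: float) -> bool:
    """Rule-of-thumb plausibility check: is the level above 1% of saturation?"""
    return level_ppm > 0.01 * saturation_ppm(vapor_pressure_mmhg)


def percent_exceeding(levels_ppm, threshold_ppm: float) -> float:
    """Percentage of levels above a threshold (e.g. a TLV or 1% of the SVP)."""
    levels = list(levels_ppm)
    return 100.0 * sum(level > threshold_ppm for level in levels) / len(levels)


# A hypothetical vapor pressure of 76 mmHg gives an SVP of 100 000 p.p.m.,
# so the 1% threshold is 1000 p.p.m.
print(exceeds_one_percent_svp(1500.0, 76.0))
```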
Internal consistency of the predicted estimates was evaluated by ranking and grouping model-predicted AM intensities for the exposure scenarios into five categories; matching corresponding measurement data to these exposure categories; and calculating the mean and median of the reported levels, weighted by sample size, within each category. Because of the small number of scenarios with measurement data, solvent-specific cutpoints defining exposure categories were selected a priori based on SVP using the Rule-of-Ten (2, 20, 200, and 2000 p.p.m. for methylene chloride; 0.5, 5, 50, and 500 p.p.m. for 1,1,1-trichloroethane; and 0.3, 3, 30, and 300 p.p.m. for trichloroethylene) (Stenzel, 2006).
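The internal consistency check can be sketched as follows: assign each predicted AM to one of five categories using the solvent-specific cutpoints given above, then summarize the matched reported levels within each category with sample-size-weighted means and medians. Treating a value equal to a cutpoint as belonging to the higher category, and using the lower weighted median, are assumptions of the sketch; function names are hypothetical.

```python
from bisect import bisect_right

# Solvent-specific cutpoints (p.p.m.) defining five exposure categories.
CUTPOINTS_PPM = {
    "methylene chloride": [2, 20, 200, 2000],
    "1,1,1-trichloroethane": [0.5, 5, 50, 500],
    "trichloroethylene": [0.3, 3, 30, 300],
}


def exposure_category(predicted_am: float, cutpoints) -> int:
    """Category 1 (lowest) to 5 (highest); a value equal to a cutpoint moves up."""
    return bisect_right(cutpoints, predicted_am) + 1


def weighted_mean(values, weights) -> float:
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights) -> float:
    """Lower weighted median: first value whose cumulative weight reaches half."""
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values must be non-empty")


def summarize_by_category(matched, cutpoints):
    """matched: iterable of (predicted_am_ppm, reported_level_ppm, sample_size)."""
    groups = {}
    for predicted, reported, n in matched:
        groups.setdefault(exposure_category(predicted, cutpoints), []).append((reported, n))
    summary = {}
    for cat, pairs in sorted(groups.items()):
        values = [v for v, _ in pairs]
        weights = [w for _, w in pairs]
        summary[cat] = (weighted_mean(values, weights), weighted_median(values, weights))
    return summary
```

If predictions are internally consistent, both summaries should increase from category 1 to category 5.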
Although 100% of the data were used for modeling purposes, cross-validation of the modeling process used a combination of data splitting and Monte Carlo techniques. For each solvent, measurement data were split: 80% for modeling and 20% for validation. A model fit to the 80% modeling set was then applied to the exposure scenarios measured in the 20% validation set. Observed levels and predicted exposure levels were compared using the Spearman correlation coefficient. The process of data splitting, model fitting, and comparing observed and predicted values was repeated 1000 times, and the mean of the 1000 correlation coefficients was used as a measure of the validity of the modeling process. The Spearman correlation was used because of the need to appropriately rank the jobs reported in the epidemiologic studies by exposure level. A 95% confidence interval (CI) was estimated using the 2.5 and 97.5 percentiles of the observed correlation coefficients.
NIOSH HHEs supplied a majority of the measurements (Table 1). More than 90% of the reported levels were individual measurements and <30% were censored at the LOD. A majority of reported levels for methylene chloride and 1,1,1-trichloroethane were from long-term personal measurements, but trichloroethylene reported levels were split between personal (47%) and area (53%) measurements. The median measurement year for all three solvents was the early 1980s.
Evaporation was the most frequent primary mechanism of release for 1,1,1-trichloroethane and trichloroethylene and the most frequent secondary mechanism of release for methylene chloride (Table 2). Evaporation was specified as one of the two mechanisms of release for 87% of the database. Most reported levels were associated with open/both process conditions, indoor locations, and not confined spaces. Distributions for the other exposure determinants varied by solvent.
Table 2.
Exposure determinant distributions
Most determinants were abstracted from the literature (Table 3). Process temperature (for methylene chloride and 1,1,1-trichloroethane) and usage rate and confined space (for all three solvents) had ~50% or more of determinant values derived from judgment and so were excluded from the modeling process.
Table 3.
Distribution of decision values for each determinant
In the regression models, parameter estimates for measurement year were negative and highly statistically significant, resulting in estimates of declines of 2.7, 3.5, and 6.7% per year for methylene chloride, 1,1,1-trichloroethane, and trichloroethylene, respectively (Table 4). Estimates for measurement duration and type were not consistent. For example, compared to long-term personal measurements, short-term personal measurements were significantly lower for methylene chloride but significantly higher for 1,1,1-trichloroethane. Measurement duration, type, and year explained a fair amount of the variability with R-squared values of 22, 13, and 42% for methylene chloride, 1,1,1-trichloroethane, and trichloroethylene, respectively.
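Because the outcome is log-transformed, a year coefficient β corresponds to an annual percent change of 100 × (e^β − 1). The sketch below converts in both directions and projects the relative level over a span of years with other terms held fixed. The coefficient used in the example is back-calculated from the rounded 6.7% decline for trichloroethylene, not taken from Table 4; function names are hypothetical.

```python
import math


def percent_change_per_year(beta_year: float) -> float:
    """Annual percent change implied by a log-scale year coefficient."""
    return 100.0 * (math.exp(beta_year) - 1.0)


def beta_from_percent_change(percent: float) -> float:
    """Inverse: log-scale coefficient implied by an annual percent change."""
    return math.log(1.0 + percent / 100.0)


def relative_level(beta_year: float, year: float, reference_year: float = 1970) -> float:
    """Level in `year` relative to the 1970 reference, all other terms held fixed."""
    return math.exp(beta_year * (year - reference_year))


# Back-calculated from the rounded 6.7%/yr decline for trichloroethylene.
beta_tce = beta_from_percent_change(-6.7)
print(round(beta_tce, 4), round(relative_level(beta_tce, 1990), 2))
```

A 6.7% annual decline corresponds to β ≈ −0.069 and compounds to about one-quarter of the 1970 level by 1990.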
Table 4.
Parameter estimates and standard errors (SE) for models of natural log-transformed chlorinated solvent levels (p.p.m.)
Modeled parameter estimates for the three solvents were generally as expected. Because of small numbers and to increase interpretability, the category of active mechanism of release was combined with aerosolization for methylene chloride and with evaporation for trichloroethylene. For methylene chloride, compared to active/aerosolized, primary evaporation was associated with a 50% decrease and secondary evaporation with a 70% decrease. For 1,1,1-trichloroethane, compared to evaporation, active primary mechanism was associated with a 7-fold increase and aerosolized primary mechanism with a 30-fold increase; active and aerosolized secondary mechanisms were associated with 4-fold and 5-fold increases, respectively. For trichloroethylene, primary aerosolized was associated with a 2-fold increase and secondary aerosolized with a 10-fold increase compared to evaporation/active. Effective LEV was associated with 60–70% lower levels, and for trichloroethylene, ineffective LEV was associated with 30% lower levels, compared to no LEV. IMD was associated with 50% lower levels for methylene chloride and 1,1,1-trichloroethane, but only 20% lower levels for trichloroethylene. Elevated process temperature was associated with a 4-fold increase for trichloroethylene. Compared to working indoors, working at an outdoor location was associated with 90–95% lower levels for methylene chloride and 1,1,1-trichloroethane. Working in close proximity to the source was associated with an ~3-fold increase. The proportions of variation in the reported levels explained by these models were 36, 38, and 54% for methylene chloride, 1,1,1-trichloroethane, and trichloroethylene, respectively.
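Because the determinants enter the model as main effects on the log scale, their fold changes multiply when several apply to the same scenario. The sketch below combines the rounded effects quoted above; it assumes no interaction terms and uses the rounded fold changes from the text rather than the Table 4 coefficients. Function names are hypothetical.

```python
import math


def fold_from_percent_lower(percent_lower: float) -> float:
    """Express 'X% lower' as a multiplicative factor (60% lower -> 0.4)."""
    return 1.0 - percent_lower / 100.0


def combined_factor(fold_changes) -> float:
    """Main effects add on the log scale, so fold changes multiply."""
    return math.exp(sum(math.log(f) for f in fold_changes))


# 1,1,1-trichloroethane, rounded effects from the text: aerosolized primary
# mechanism (30-fold vs evaporation) with IMD present (50% lower).
factor = combined_factor([30.0, fold_from_percent_lower(50.0)])
print(round(factor, 1))  # 15.0
```

Summing the log coefficients and exponentiating once is equivalent to multiplying the fold changes directly; the log form mirrors how the regression combines terms.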
Standardized residuals for the reduced models were approximately normally distributed; however, formal statistical tests (Shapiro–Wilk) rejected the null hypothesis of normality for methylene chloride (P-value = 0.0026) and trichloroethylene (P-value = 0.034) (data not shown). A visual inspection (box plots) of the standardized residuals indicated no major problems with heteroscedasticity.
Models described in Table 4 were used to predict AM exposure intensity levels for the exposure scenarios (described above). Predicted AM exposure intensity levels for the evaluated exposure scenarios ranged from 0.051 to 160 p.p.m. (median 2.8 p.p.m.) for methylene chloride, from 0.0013 to 200 p.p.m. (median 0.67 p.p.m.) for 1,1,1-trichloroethane, and from 0.21 to 3700 p.p.m. (median 30 p.p.m.) for trichloroethylene (Table 5). The percent of predicted exposure levels exceeding current ACGIH TLVs was comparable to the percentage of reported levels exceeding the TLVs for 1,1,1-trichloroethane (0 and 2.1%, respectively), but lower for methylene chloride (4.7 and 23%, respectively) and higher for trichloroethylene (71 and 45%, respectively). No predicted exposure intensities for the 192 evaluated methylene chloride exposure scenarios or the 432 evaluated 1,1,1-trichloroethane scenarios exceeded the 1% SVP threshold. For trichloroethylene, the large difference in percentage of reported versus predicted levels exceeding the current ACGIH TLV and the higher percentage of predictions above the 1% SVP threshold were due predominantly to estimates derived from the earlier decades (i.e. 1950s–1970s).
Table 5.
Percent of measurement data and predicted intensities (for the evaluated exposure determinant scenarios) exceeding various thresholds
Predicted exposure intensity levels were generally consistent with the measurement data, with the median and mean reported levels from the measurement database increasing with the predicted intensity scores (Table 6). For example, estimated methylene chloride exposure intensities for 85 (44%) of 192 exposure determinant scenarios were assigned to the lowest score (<2 p.p.m.). Air measurements in the database were available for six scenarios, representing 20 reported levels. The median and mean of these measurements were 1.8 and 6.5 p.p.m., respectively.
Table 6.
Internal consistency of predicted intensities
Mean Spearman correlation coefficients between the reported levels and the predicted intensities in the 20% validation samples were 0.21 (95% CI: 0.09–0.32), 0.47 (95% CI: 0.36–0.57), and 0.61 (95% CI: 0.49–0.72) for methylene chloride, 1,1,1-trichloroethane, and trichloroethylene, respectively.
Statistical modeling frequently has been used to identify exposure determinants in a single industry or occupation [e.g. wood dust and particulates in lumber mills (Teschke et al., 1999a; Friesen et al., 2005); bitumen and polycyclic aromatic hydrocarbons among paving workers (Burstyn et al., 2000); and herbicide exposure among custom applicators (Hines et al., 2001)]. Some exposure estimates from these models have been used in occupational epidemiologic studies (Burstyn et al., 2007; Friesen et al., 2007). Modeling exposure determinants across multiple industries is somewhat less common, possibly due to a lack of available data or difficulty compiling data. Databases of measurement levels across multiple industries have been constructed for some agents by abstracting data from the published industrial hygiene literature [e.g. solvents (van Wijngaarden and Stewart, 2003; Bakke et al., 2007; Gold et al., 2008)] or by utilizing existing databanks of exposure information [e.g. the Integrated Management and Information System (IMIS) database of air sampling data from US Occupational Safety and Health Administration inspections beginning in 1972]. Teschke et al. (1999b) used IMIS data and a multiple regression model including terms for measurement year, state, industry group, and job group to estimate wood dust exposure levels for a population-based case–control study without having detailed questionnaire information about wood dust exposure. Lavoué et al. (2008) used IMIS data to model formaldehyde concentrations as a function of inspection type, sample type, season, industry, year, number of workers, state, and mean outside temperature. However, databases such as these (i.e. IMIS and, in our case, the published literature) suffer from several limitations including lack of representativeness of the available measurement data and lack of available data for many jobs and industries.
Including in such databases exposure determinants that describe conditions at the time of measurement introduces additional limitations, notably that reports often provide too little information to assess the determinants accurately. For example, the IMIS database does not include exposure determinant information considered here, such as mechanism of release and ventilation. Finally, given the retrospective nature of the exposure measurements comprising the database, there are limited avenues for model validation (Hein et al., 2008). Models derived from our database of chlorinated solvent air measurements and associated determinants share these same limitations.
Several modeling decisions may have affected our results. The decision to include air measurement results from short-term personal and area samples was made with its limitations in mind. The fundamental, but unverifiable, assumption was that the effects of the exposure determinants would be similar across different sample types and that any differences could be captured by including terms in the regression model for measurement duration and type. For example, we assumed that the effect of effective LEV was similar for all sample types, which results in the modification factor for effective LEV being the same for long-term personal, long-term area, and short-term measurements. In addition, the decision not to include a random effect for ‘publication’ in the model required the assumption that measurements from the same publication were independent. Our previous work identified a problem with confounding between random effects and some exposure determinants, so we elected to use models with no random effects (Hein et al., 2008). Finally, although a majority of the air measurement results were individual measurements, some publications reported only summary measurements. Therefore, because individual and summary measurements were combined in the regression models, the modeled estimates could not be strictly interpreted as GMs (Hein et al., 2008), and the variance estimates could not be strictly interpreted as variability in individual measurements. In contrast, Lavoué et al. (2007) developed a database of individual and summary formaldehyde measurements from the reconstituted wood panels industry for use in a regression model; however, prior to modeling, they used Monte Carlo simulation to re-create datasets from sources that did not report original data, which avoided the interpretation problems introduced by combining individual and summary measurements.
The AM is considered the measure of central tendency of choice for calculating cumulative exposure for epidemiologic studies and risk assessments (Seixas et al., 1988; Crump, 1998). Consequently, we converted predicted GM exposure intensities from the models to AM exposure intensities. The GSDs estimated from the regression models (6.8, 9.1, and 4.0 for methylene chloride, 1,1,1-trichloroethane, and trichloroethylene, respectively) were higher than those obtained from exposure determinants models of measurement data from a single industry [e.g. GSDs of 2.8–3.7 for bitumen fume, bitumen vapor, and benzo(a)pyrene in the asphalt paving industry (Burstyn et al., 2000)], even when the data for that industry spanned several years [e.g. GSDs of 2.4 for non-specific dust and wood dust from British Columbia sawmills, 1981–1997 (Friesen et al., 2005)]. The high GSDs we observed may reflect selective sampling (e.g. some reports included measurements on low-exposed jobs and some did not) or the variety of purposes of sampling (e.g. NIOSH studies may have been initiated because of reported exposure to another chemical, which could have resulted in lower levels than if the solvents had been the agent of interest), as well as the inclusion of data from multiple industries and decades. Because of these high GSDs, we did not use the model-based values; instead, we converted GM exposure intensity estimates (obtained by exponentiating the model-predicted log intensities) to AM exposure intensity estimates using an assumed GSD of 2.5.
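For a lognormal distribution the AM and GM are related by AM = GM × exp(½ (ln GSD)²) (Aitchison and Brown, 1963). The sketch below applies that relation to a model-predicted log intensity and shows why the choice of GSD matters: with the model-based GSDs the AM would exceed the GM by factors of roughly 2.6 to 11.5, whereas the assumed GSD of 2.5 gives a factor of about 1.5. The function names are illustrative, not taken from the authors' software.

```python
import math


def am_from_gm(gm, gsd):
    """Arithmetic mean of a lognormal distribution with the given GM and GSD.

    Uses AM = GM * exp(0.5 * (ln GSD)**2).
    """
    if gm <= 0 or gsd < 1:
        raise ValueError("need gm > 0 and gsd >= 1")
    sigma = math.log(gsd)  # standard deviation of ln(X)
    return gm * math.exp(0.5 * sigma ** 2)


def am_from_log_prediction(log_pred, gsd=2.5):
    """Convert a model-predicted mean log intensity (ln p.p.m.) to an AM.

    exp(log_pred) is the GM; the default gsd=2.5 mirrors the assumed GSD in the text.
    """
    return am_from_gm(math.exp(log_pred), gsd)


# AM/GM ratio implied by the assumed GSD and by each model-based GSD
for label, gsd in [("assumed", 2.5), ("methylene chloride", 6.8),
                   ("1,1,1-trichloroethane", 9.1), ("trichloroethylene", 4.0)]:
    print(f"{label:22s} GSD={gsd:4.1f}  AM/GM={am_from_gm(1.0, gsd):6.2f}")
```

For example, a predicted log intensity of ln(10), i.e. a GM of 10 p.p.m., converts to an AM of about 15 p.p.m. under the assumed GSD of 2.5.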
The cross-validation using correlation tests indicated somewhat limited model validity, particularly for methylene chloride, implying poor predictive ability. One explanation could be that the exposure determinants for methylene chloride did not reflect differences in exposure to the same extent as the determinants for the other solvents. Despite the low validity, the models showed good internal consistency. Based on the internal consistency, using the ranked scores rather than the predicted levels in p.p.m. should reduce misclassification; however, because of the limited internal validity, model bias remains likely. An additional limitation resulted from the constraints imposed by the measurement data and by the measurement characteristics and exposure descriptions provided in the literature. In many cases, important information was missing, so we assigned values for the missing determinants based on judgment, experience, and knowledge. This lack of information is likely to have resulted in some reported levels being assigned to the incorrect exposure scenario. We attempted to reduce the impact of this limitation by dropping three exposure determinants (usage rate, confined space, and, for methylene chloride and 1,1,1-trichloroethane, temperature) that had a large number of values assigned from judgment. In addition, the limited number of measurements in the literature meant that few or no reported levels were available for many exposure scenarios. It also meant that some determinant values occurred only in combination with a single set of values of the other determinants, which limited the variability of the determinants. All these constraints could have affected the goodness-of-fit of the models.
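The cross-validation procedure is not spelled out in this section, so the following is only one plausible form, under stated assumptions: measurements are split into k folds; a deliberately simple model (the mean log level of each determinant combination, standing in for the full regression) is fitted on k − 1 folds; and held-out predictions are compared with observed log levels using a Spearman rank correlation. All names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_group_means(train):
    """Mean log level per determinant combination, plus an overall fallback."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, y in train:
        sums[key] += y
        counts[key] += 1
    overall = sum(y for _, y in train) / len(train)
    return {k: sums[k] / counts[k] for k in sums}, overall


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for m in range(i, j + 1):
            r[order[m]] = avg
        i = j + 1
    return r


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cross_validate(data, k=4):
    """k-fold CV: return (predicted, observed) log levels for held-out rows."""
    pred, obs = [], []
    for fold in range(k):
        train = [row for i, row in enumerate(data) if i % k != fold]
        test = [row for i, row in enumerate(data) if i % k == fold]
        means, overall = fit_group_means(train)
        for key, y in test:
            pred.append(means.get(key, overall))  # unseen combination -> overall mean
            obs.append(y)
    return pred, obs


# Toy data: (determinant combination, ln p.p.m.)
data = [("A", 0.1), ("B", 2.0), ("C", 4.0), ("A", -0.1), ("B", 2.1), ("C", 3.9),
        ("A", 0.2), ("B", 1.9), ("C", 4.1), ("A", 0.0), ("B", 2.2), ("C", 4.2)]
pred, obs = cross_validate(data, k=4)
print(round(spearman(pred, obs), 2))
```

Each held-out measurement is predicted by a model that never saw it, so the correlation is not inflated by overfitting, and a rank-based statistic sits naturally alongside the idea of working with ranked scores rather than predicted p.p.m. levels.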
For each solvent, jobs reported by participants in the NIOSH and NCI studies that were rated as exposed to the solvent were assigned values for each of the exposure determinants (data not shown). The models described in Table 4 were then applied to these jobs to predict exposure intensities (a sample calculation is provided in the Supplementary data, available at Annals of Occupational Hygiene online). However, applying these models to specific studies introduces additional limitations. First, although the exposure measurements in the literature database spanned several decades, the jobs in the NIOSH and NCI studies covered a wider range of years. Consequently, estimated exposure intensities for early years are subject to era (time-period) extrapolation error, particularly for methylene chloride and 1,1,1-trichloroethane, for which there were few data prior to 1970. If exposures were higher in the earlier decades (as expected), predicted estimates for the earliest decades may be too low. Because there is some evidence that occupational exposure levels tend to correlate with concurrent TLVs (Roach and Rappaport, 1990), including a modification factor, such as the ratio of the TLVs, might improve such estimates.
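The Supplementary data give the authors' own sample calculation; the sketch below is only a generic illustration of how a log-linear determinants model of this kind is applied to a job. Every coefficient, determinant level, and TLV value in it is a hypothetical placeholder, not a value from Table 4 or from the ACGIH documentation. Because the model is linear on the log scale, each determinant contributes a multiplicative modification factor exp(β) to the GM, and the TLV-ratio adjustment suggested above is one more multiplicative factor.

```python
import math

# HYPOTHETICAL log-scale model: intercept plus one coefficient per
# non-reference determinant level (reference levels contribute 0).
INTERCEPT = 2.0
COEFFICIENTS = {
    ("mechanism", "spraying"): 0.9,
    ("ventilation", "effective LEV"): -1.2,
    ("location", "outdoors"): -0.7,
    ("decade", "1970s"): 0.5,  # hypothetical earliest modeled decade
}


def predicted_gm(job):
    """GM (p.p.m.) for a job described as {determinant: level}."""
    log_gm = INTERCEPT + sum(COEFFICIENTS.get(item, 0.0) for item in job.items())
    return math.exp(log_gm)


def modification_factor(determinant, level):
    """Multiplicative effect of one determinant level on the GM."""
    return math.exp(COEFFICIENTS.get((determinant, level), 0.0))


def tlv_ratio_adjust(estimate, tlv_job_year, tlv_reference):
    """Scale an estimate by the ratio of the TLV in force in the job's year
    to the TLV for the earliest modeled decade (hypothetical values)."""
    return estimate * tlv_job_year / tlv_reference


job = {"mechanism": "spraying", "ventilation": "effective LEV", "decade": "1970s"}
gm = predicted_gm(job)
print(round(gm, 2))
print(round(modification_factor("ventilation", "effective LEV"), 3))
# An earlier job whose (hypothetical) TLV was twice the reference TLV
print(round(tlv_ratio_adjust(gm, tlv_job_year=200.0, tlv_reference=100.0), 2))
```

Keeping the coefficients on the log scale and exponentiating only at the end mirrors how such models are fitted; the modification factors are then simply the exponentiated coefficients.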
The models have several strengths. Exposure determinants explained a moderate amount of the variability in the data, particularly for trichloroethylene, in spite of the sparse data in many cases. Parameter estimates for the exposure determinants were interpretable. The quality of the exposure determinant information was considered by excluding determinants and observations based on judgment, which had the effect (data not shown) of increasing interpretability of the model. The resulting estimated exposure intensities are quantitative, internally consistent, and plausible when compared to the ACGIH TLVs and likely air concentrations based on the SVP. Because the models are based on the whole range of exposure literature rather than the range of exposures likely to have been encountered within a single or a few industries, they are probably more generalizable than cohort-based models.
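A simple upper-bound plausibility check compares a predicted level with the saturation concentration implied by the vapor pressure: at equilibrium, the vapor concentration in p.p.m. by volume cannot exceed (vapor pressure / total pressure) × 10⁶. The sketch assumes pressures in mmHg and a total pressure of 760 mmHg; the vapor pressure and predicted levels used are placeholders, not data from the study.

```python
STANDARD_PRESSURE_MMHG = 760.0


def saturation_ppm(vapor_pressure_mmhg, total_pressure_mmhg=STANDARD_PRESSURE_MMHG):
    """Equilibrium (saturated) vapor concentration in p.p.m. by volume."""
    if not 0 < vapor_pressure_mmhg <= total_pressure_mmhg:
        raise ValueError("vapor pressure must be positive and <= total pressure")
    return vapor_pressure_mmhg / total_pressure_mmhg * 1e6


def is_plausible(predicted_ppm, vapor_pressure_mmhg):
    """True if a predicted air level does not exceed the saturation limit."""
    return 0 < predicted_ppm <= saturation_ppm(vapor_pressure_mmhg)


# Placeholder vapor pressure of 76 mmHg -> saturation at 100,000 p.p.m.
print(saturation_ppm(76.0))
print(is_plausible(350.0, 76.0), is_plausible(2.0e5, 76.0))
```

Saturation is only a ceiling; workplace air levels are normally far below it, so this check flags impossible predictions rather than confirming realistic ones.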
This work was done to develop rigorous and transparent methods for estimating intensity levels for three case–control studies. Raters estimating exposure levels for such studies have generally shown low to moderate agreement (Benke et al., 1997; Teschke et al., 2002; Correa et al., 2006). Here, rather than estimating intensity directly, we used determinants to characterize the measurements and develop a prediction model. Separately, jobs reported by study participants were characterized by these same determinants, and the models were used to predict their intensity. This reliance on determinants may be more straightforward, and more reproducible, than estimating intensity levels directly (Teschke et al., 1989). In particular, detailed job information (i.e. exposure determinants) elicited from study participants using job or exposure modules rather than typical work histories may lead to more accurate exposure estimates (Stewart et al., 1998). We believe that this approach should be considered when estimating intensity for epidemiologic studies. Further work is needed to determine whether the goal of increasing reliability was achieved.
In summary, we developed statistical models to estimate exposure intensity from exposure determinants for three chlorinated solvents for use in studies that included jobs from a wide variety of industries and occupations spanning a wide range of years. The models explained a moderate amount of the measurement variability and were internally consistent. These models can also be used in future case–control studies that have sufficiently detailed participant job histories.
Intramural Federal research funding from National Institutes of Health and CDC, including CDC/NIOSH Initiative for Cancer Control Projects for Farmers; CDC/NIOSH National Occupational Research Agenda; Intramural Research Program of the National Institutes of Health (National Cancer Institute).
Supplementary Material
Acknowledgements
The authors thank Steven Ahrenholz, Dennis Roberts, and Steven Wurzelbacher of NIOSH for assisting with data cleaning efforts and Diana Echeverria, Marianne Story Yencken (formerly), and James Catalano (formerly) of Battelle for assisting with the creation of the exposure database.
Disclaimer—The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the NIOSH.
  • ACGIH. Documentation of the threshold limit values and biological exposure indices. 7th edn. Cincinnati, OH: ACGIH; 2001. Dichloromethane and methyl chloroform.
  • ACGIH. Documentation of the threshold limit values and biological exposure indices. 7th edn. Cincinnati, OH: ACGIH; 2007. Trichloroethylene.
  • Aitchison J, Brown JAC. The lognormal distribution. Cambridge, UK: Cambridge University Press; 1963. p. 8.
  • Bakke B, Stewart PA, Waters MA. Uses of and exposures to trichloroethylene in U.S. industry: a systematic literature review. J Occup Environ Hyg. 2007;4:375–90.
  • Benke G, Sim M, Forbes A, et al. Retrospective assessment of occupational exposure to chemicals in community-based studies: validity and repeatability of industrial hygiene panel ratings. Int J Epidemiol. 1997;26:635–42.
  • Burstyn I, Kromhout H, Johansen C, et al. Bladder cancer incidence and exposure to polycyclic aromatic hydrocarbons among asphalt pavers. Occup Environ Med. 2007;64:520–6.
  • Burstyn I, Kromhout H, Kauppinen T, et al. Statistical modelling of the determinants of historical exposure to bitumen and polycyclic aromatic hydrocarbons among paving workers. Ann Occup Hyg. 2000;44:43–56.
  • Correa A, Min YI, Stewart PA, et al. Inter-rater agreement of assessed prenatal maternal occupational exposures to lead. Birth Defects Res A Clin Mol Teratol. 2006;76:811–24.
  • Crump KS. On summarizing group exposures in risk assessment: is an arithmetic mean or a geometric mean more appropriate? Risk Anal. 1998;18:293–7.
  • D'Agostino RB, Stephens MA, editors. Goodness-of-fit techniques. Statistics: textbooks and monographs, volume 68. New York: Marcel Dekker; 1986.
  • Friesen MC, Davies HW, Teschke K, et al. Predicting historical dust and wood exposure in sawmills: model development and validation. J Occup Environ Hyg. 2005;2:650–8.
  • Friesen MC, Davies HW, Teschke K, et al. Impact of the specificity of the exposure metric on exposure-response relationships. Epidemiology. 2007;18:88–94.
  • Gold LS, De Roos AJ, Waters M, et al. Systematic literature review of uses and levels of occupational exposure to tetrachloroethylene. J Occup Environ Hyg. 2008;5:807–39.
  • Hein MJ, Waters MA, van Wijngaarden E, et al. Issues when modeling benzene, toluene and xylene exposures using a literature database. J Occup Environ Hyg. 2008;5:36–47.
  • Hines CJ, Deddens JA, Tucker SP, et al. Distributions and determinants of pre-emergent herbicide exposures among custom applicators. Ann Occup Hyg. 2001;45:227–39.
  • Kromhout H, Symanski E, Rappaport SM. A comprehensive evaluation of within- and between-worker components of occupational exposure to chemical agents. Ann Occup Hyg. 1993;37:253–70.
  • Lavoué J, Bégin D, Beaudry C, et al. Monte Carlo simulation to reconstruct formaldehyde exposure levels from summary parameters reported in the literature. Ann Occup Hyg. 2007;51:161–72.
  • Lavoué J, Vincent R, Gérin M. Formaldehyde exposure in U.S. industries from OSHA air sampling data. J Occup Environ Hyg. 2008;5:575–87.
  • Lubin JH, Colt JS, Camann D, et al. Epidemiologic evaluation of measurement data in the presence of detection limits. Environ Health Perspect. 2004;112:1691–6.
  • Pearson ES, Hartley HO. The probability integral of the range in samples of n observations from a normal population. Biometrika. 1942;32:301–10.
  • Roach SA, Rappaport SM. But they are not thresholds: a critical analysis of the documentation of threshold limit values. Am J Ind Med. 1990;17:727–53.
  • Ruder AM, Waters MA, Carreón T, et al. The Upper Midwest Health Study: a case-control study of glioma in rural residents. J Agric Saf Health. 2006;12:255–74.
  • Samanic CM, De Roos AJ, Stewart PA, et al. Occupational exposure to pesticides and risk of adult brain tumors. Am J Epidemiol. 2008;167:976–85.
  • Seixas NS, Robins TG, Moulton LH. The use of geometric and arithmetic mean exposures in occupational epidemiology. Am J Ind Med. 1988;14:465–77.
  • Stenzel MR. Mixtures and non-ambient conditions. In: Bullock WH, Ignacio JS, editors. A strategy for assessing and managing occupational exposures. 3rd edn. Fairfax, VA: American Industrial Hygiene Association; 2006. pp. 279–84.
  • Stewart PA, Stewart WF, Siemiatycki J, et al. Questionnaires for collecting detailed occupational information for community-based case control studies. Am Ind Hyg Assoc J. 1998;59:39–44.
  • Teschke K, Demers PA, Davies HW, et al. Determinants of exposure to inhalable particulate, wood dust, resin acids, and monoterpenes in a lumber mill environment. Ann Occup Hyg. 1999a;43:247–55.
  • Teschke K, Hertzman C, Dimich-Ward H, et al. A comparison of exposure estimates by worker raters and industrial hygienists. Scand J Work Environ Health. 1989;15:424–9.
  • Teschke K, Marion SA, Vaughan TL, et al. Exposures to wood dust in U.S. industries and occupations, 1979 to 1997. Am J Ind Med. 1999b;35:581–9.
  • Teschke K, Olshan AF, Daniels JL, et al. Occupational exposure assessment in case-control studies: opportunities for improvement. Occup Environ Med. 2002;59:575–93.
  • Uh HW, Hartgers FC, Yazdanbakhsh M, et al. Evaluation of regression methods when immunological measurements are constrained by detection limits. BMC Immunol. 2008;9:59.
  • van Wijngaarden E, Stewart PA. Critical literature review of determinants and levels of occupational benzene exposure for United States community-based case-control studies. Appl Occup Environ Hyg. 2003;18:678–93.
  • Yoon PW, Rasmussen SA, Lynberg MC, et al. The National Birth Defects Prevention Study. Public Health Rep. 2001;116(Suppl 1):32–40.
Articles from Annals of Occupational Hygiene are provided here courtesy of Oxford University Press.