We developed prediction models for bone lead using blood lead levels and other standard covariates in a community-based cohort of older men.
Participants whose bone lead levels were measured by K-x-ray fluorescence were included in the model selection process (n=825). Predictors of tibia and patella lead were each identified in three quarters of the population, and the resulting models were then used to predict bone lead levels in the remaining one quarter and in the Community Lead Study.
Eighteen predictors were selected for tibia lead (blood lead, age, education, occupation, smoking status, pack-years of cigarettes, serum levels of phosphorus, uric acid, calcium, creatinine and total and high-density lipoprotein cholesterols, hematocrit, body mass index, systolic and diastolic blood pressure, diagnoses of cancer and diabetes; R²=0.32) and 16 for patella lead (of the predictors in the tibia model, diagnosis of cancer and serum levels of calcium and total cholesterol were not included in the patella lead model, but diagnosis of hypertension was included; R²=0.34). The correlation coefficients between the observed and predicted values were 0.43-0.50 for tibia and 0.52-0.58 for patella lead in internal and external validation. We applied these bone lead prediction models to the Third National Health and Nutrition Examination Survey (NHANES-III) to examine associations with hypertension and found associations that were more significant than those with blood lead.
This study suggests that the prediction equations may be used to predict bone lead levels in other community-based cohorts with reasonable accuracy.
Environmental exposure to lead in the United States has decreased substantially since the 1970s with an 84 percent decline in geometric mean blood levels of lead in adults (1, 2). However, there still remain serious concerns regarding the effects of cumulative exposure to lead, particularly, in association with chronic health endpoints (3). Lead levels in bone have been suggested as an indicator of cumulative lead exposure (3, 4). The half-life of lead in bone is of the order of years to decades (5, 6). In adults, approximately 95% of the total body burden of lead is deposited in the skeleton (7, 8) and therefore, bone lead levels may better predict the effects of lead toxicity that arise from long-term low to moderate exposure. The evolution of K x-ray fluorescence (KXRF) technology has made in vivo measurements of bone lead concentrations possible (3, 4). Numerous studies have reported associations of elevated bone lead levels as measured by KXRF with various chronic health outcomes (9-11). The KXRF measurements require fairly sophisticated trained personnel and technology and thus, accurate bone lead measurements are not commonly available in standard community-based cohorts due to limitations in infrastructure and budgetary resources. On the other hand, lead in blood reflects, for the most part, recent lead exposure and is relatively easier and less expensive to measure and therefore, may be more frequently available in other cohorts.
Several studies have reported determinants of bone lead levels (12-19). These studies presented several multivariate regression models for bone lead levels in terms of covariates such as age, race, smoking and drinking behavior, and education. However, with the exception of the study by Kosnett et al. (19), none of these studies included a model for bone lead level in terms of blood lead and other covariates or addressed the predictability of bone lead levels based on a fitted equation.
The purpose of this study is to develop and validate a prediction equation for chronic lead exposure as measured in tibia (representing cortical bone) and patella (representing trabecular bone) by KXRF using blood lead levels and other demographic, socio-economic, and clinical variables that are commonly available in any study cohort of lead. Additionally, we illustrate how predicted bone lead measures can add to prediction of health outcomes in a new cohort after incorporating the uncertainty in the predictions.
In this study we focused on participants of the Normative Aging Study (NAS) established by the Veterans Administration in 1961. The NAS is a longitudinal study in which 2280 community-dwelling men aged 21 to 80 years were recruited from the Greater Boston area. The enrollment included both veterans and nonveterans and represented a wide variety of occupations, although few, if any, participants were employed in the primary lead industry. Qualified volunteers were selected on the basis of current status, physical examination and medical history. Those with a history or presence of chronic conditions such as heart disease, hypertension, diabetes mellitus, cancer, peptic ulcer, gout, recurrent asthma, bronchitis, or sinusitis were excluded. Disqualification criteria also included systolic blood pressure (SBP) >140 mmHg or diastolic blood pressure (DBP) >90 mmHg, but not biochemical parameters. During the follow-up period, study subjects returned every 3-5 years for extensive examinations and for collection of laboratory, anthropometric and questionnaire data.
Starting in 1991, assistants at the Department of Veterans Affairs outpatient clinic in Boston collected fresh whole blood specimens from participants during their regular visits. Permission was sought from these subjects for KXRF bone lead measurements.
The blood samples for lead measurement were analyzed by graphite furnace atomic absorption with Zeeman background correction (ESA laboratories, Inc., Chelmsford, MA). Values below the minimum detection limit of 1 μg/dL (<1%) were coded as 0. The instrument was calibrated with National Institute of Standards and Technology Standard Reference Material (NIST SRM 955a, lead in blood) after every 20 samples. Ten percent of samples were run in duplicate; at least 10% of the samples were controls and 10% were blanks. In tests on reference samples from the Centers for Disease Control and Prevention (Atlanta, GA), the coefficient of variation (CV) ranged from 8% for concentrations from 10 to 30 μg/dL to 1% for higher concentrations. In this study, the CV was 5% for concentrations below 10 μg/dL. Compared with a NIST target of 5.7 μg/dL, 24 measurements by this method gave a mean of 5.3 (SD=1.2) μg/dL.
Bone lead measurements were taken at each subject's mid-tibial shaft and patella using a KXRF instrument (ABIOMED Inc., Danvers, MA). The tibia and patella have been targeted for bone lead research because they consist mainly of cortical and trabecular bone, respectively. The physical principles and technical details of this instrument have been described elsewhere (20, 21). The unit of measurement is μg of lead per g of bone mineral (μg/g). The instrument provides a continuous unbiased point estimate that oscillates around the true bone lead value; thus, negative measurements are sometimes produced when bone lead values are close to zero. The instrument also provides an estimate of the uncertainty associated with each measurement that is equivalent to a single standard deviation. Retaining all the measurements, including the negative ones, as opposed to applying a threshold approach makes better use of data in epidemiologic studies (22).
The initial dataset contained 1786 observations on 877 NAS participants. To control the quality of KXRF measurement, we first identified tibia and patella lead measurements with estimated measurement uncertainty greater than 10 and 15 μg/g, and discarded these observations. Additionally, the KXRF instrument used for bone lead measurement had been updated in 1999. To ensure homogeneity among observations, the observations measured by the new instrument were excluded from our working dataset. We initially included 28 potential predictors in our analysis, a list of which is provided in Table 1. We deleted observations with severe missing data in the initial set of covariates. After the data-cleaning procedure, our working dataset then consisted of 825 subjects with a total of 1,434 observations, with subjects having more than one observation in most cases. Among the study subjects only one had four observations; 126 had three observations; 354 had two observations and 344 had only one observation.
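The uncertainty-based quality filter described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical tuple layout of (site, value, uncertainty); only the two thresholds come from the text. Negative point estimates are kept, in line with the measurement section.

```python
# Thresholds stated in the text (ug/g); the record layout is hypothetical.
MAX_UNCERTAINTY = {"tibia": 10.0, "patella": 15.0}


def keep_measurement(site, uncertainty):
    """Return True when the reported uncertainty does not exceed the site threshold."""
    return uncertainty <= MAX_UNCERTAINTY[site]


# Illustrative readings: (site, bone lead in ug/g, measurement uncertainty in ug/g).
readings = [
    ("tibia", 21.0, 6.2),
    ("tibia", -3.0, 8.9),     # negative value is retained: uncertainty is acceptable
    ("tibia", 40.0, 12.5),    # discarded: uncertainty > 10
    ("patella", 30.0, 14.0),
    ("patella", 18.0, 16.1),  # discarded: uncertainty > 15
]

kept = [r for r in readings if keep_measurement(r[0], r[2])]
```

The filter operates on the uncertainty only, so a low or negative point estimate never causes a reading to be dropped.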
For this study, each individual's first measurements (baseline measurements, n=825) were included in the development of prediction models, instead of fitting a longitudinal model to the entire set, as (i) prediction is a more difficult concept in a longitudinal set-up, and (ii) because our goal is to develop a prediction equation which is usable in other cohorts, which may not have longitudinal data available. The follow-up observations were used for validating our models fitted on the baseline measurements. Figure 1 demonstrates procedures of model development and validation.
The 825 subjects were randomly assigned to either a training set or a test set with an approximate ratio of 3:1 after deleting the missing observations in covariates or response. The training set was used for model selection and fitting the prediction equation, whereas the test set was used to assess the prediction performance of the fitted model on a set of independent observations, not used in the model building process.
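A random 3:1 partition of this kind might look like the sketch below. The function name, the seed, and the use of integer subject identifiers are assumptions made for illustration; the study's actual randomization procedure is not described in the text.

```python
import random


def split_train_test(subject_ids, test_fraction=0.25, seed=0):
    """Randomly split subjects into training and test sets (about 3:1 by default)."""
    rng = random.Random(seed)          # seeded so the partition is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical identifiers 0..824 standing in for the 825 baseline subjects.
train_ids, test_ids = split_train_test(range(825))
```

Splitting at the subject level, rather than the observation level, keeps all of a subject's data on one side of the partition.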
Among the 28 potential predictors, we pre-selected three variables considered to be important in determining lead burden in the body from previous studies: age, job type and education (13, 18, 23, 24). To identify other important predictors, we first calculated the residuals corresponding to a linear regression model of bone lead on the three pre-selected variables, fitted on the training set. We treated these residuals as the response in the next model selection procedure step and performed the least absolute shrinkage and selection operator (LASSO), a novel model selection plus shrinkage estimation procedure (25), instead of traditional stepwise methods. In usual least squares linear regression models, the regression coefficients are estimated by minimizing the residual sum of squares. However, if a large number of predictors are available, and some variables are highly correlated, then variance of the ordinary least squares estimators may be unacceptably high. The LASSO operator, proposed by Tibshirani (25), estimates a vector of regression coefficients by minimizing the residual sum of squares, subject to sum of the absolute value of the regression coefficients being less than a constant. Because of the nature of the constraint, the LASSO shrinks some of the regression coefficients to 0, or eliminates those variables. The LASSO has better prediction accuracy than subset selection when there may be a large number of predictors with moderate to small effects, which appears to be the case in this situation. The model fitted by LASSO was then assessed by several traditional model selection criteria, like the Akaike information criterion (AIC), the Bayes information criterion (BIC), Mallows' Cp and adjusted R².
To minimize the sensitivity of the variable selection process with respect to a single random partition of the data to training set and test set, we repeated the selection procedures in 1000 randomly partitioned test and training sets and created a list of predictors which were selected in more than 50% of the replications. We then fitted a final linear regression model with these selected predictors on our fixed initial training set. We computed partial R² corresponding to each predictor using type-II sum of squares which evaluates the partial effect of a predictor by comparing a model with the full set of covariates to a model of dimension one less, by removing that particular predictor, but retaining all other covariates.
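To make the shrink-to-zero behavior concrete, here is a minimal coordinate-descent sketch of the LASSO. It uses the penalized (Lagrangian) form, (1/2n)·RSS + lam·Σ|β_j|, which is equivalent to the constrained form described above for a suitable constant. The function names, the toy data, and the penalty value are illustrative assumptions, not the software or settings used in the study. Predictors and response are assumed to be centered, so no intercept is fitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * RSS + lam * sum(|beta_j|) by cyclic coordinate descent.

    X is a list of rows; every column must have a nonzero sum of squares.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that ignores column j.
            rho = sum(
                X[i][j]
                * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            beta[j] = soft_threshold(rho, lam) / col_scale[j]
    return beta


# Toy data with two orthogonal predictors: a strong effect (2.0) and a weak one (0.1).
X_toy = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y_toy = [2.1, 1.9, -1.9, -2.1]
beta = lasso_coordinate_descent(X_toy, y_toy, lam=0.5)
```

With this penalty the weak predictor's coefficient is set exactly to zero while the strong one is shrunk but retained, which is the selection-plus-shrinkage property the paragraph describes.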
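The aggregation step, keeping predictors chosen in more than half of the replications, can be sketched as below. The function name and the example variable names are hypothetical; each inner list stands for the predictors chosen in one random partition.

```python
from collections import Counter


def stable_predictors(selected_sets, min_fraction=0.5):
    """Return predictors selected in strictly more than min_fraction of the runs."""
    n_runs = len(selected_sets)
    # set(chosen) guards against a predictor being counted twice within one run.
    counts = Counter(name for chosen in selected_sets for name in set(chosen))
    return sorted(name for name, c in counts.items() if c / n_runs > min_fraction)


# Four illustrative replications instead of the 1000 used in the study.
runs = [
    ["blood_lead", "phosphorus"],
    ["blood_lead"],
    ["blood_lead", "uric_acid"],
    ["phosphorus", "blood_lead"],
]
selected = stable_predictors(runs)
```

Here "phosphorus" appears in exactly half of the runs and is therefore excluded, because the rule requires more than 50%.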
Finally, we verified the predictive ability of the fitted models on two test sets. Test set 1 consisted of approximately 25% of the baseline observations with complete case-data, not included in the training set. Test set 2 consisted of the next follow-up observation available on the subjects included in Test set 1. Thus, both the test sets are completely non-overlapping with the training set on which fitted equations are derived. We calculated predicted values for bone lead levels corresponding to the Test set observations based on the fitted equation from the training set. We then calculated the correlation between observed bone lead levels and predicted bone lead levels in the test sets. We also calculated descriptive statistics on the absolute prediction error (|observed-predicted|) distribution to assess prediction accuracy. In order to determine relative accuracy of the predicted bone lead levels, we compared the absolute prediction errors to the estimated measurement uncertainties which are generated by the KXRF instrument and represent inherent variability of the bone lead measurements.
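The two validation summaries described above, the correlation between observed and predicted values and the distribution of absolute prediction errors, can be computed along the following lines. This is a sketch with invented numbers; the function names are assumptions, and the quartile convention (the `statistics.quantiles` default) may differ from the one used in the study.

```python
import math
import statistics


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    mo, mp = statistics.fmean(observed), statistics.fmean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def absolute_error_summary(observed, predicted):
    """Mean, SD, median and IQR of |observed - predicted|."""
    errors = [abs(o - p) for o, p in zip(observed, predicted)]
    q1, _, q3 = statistics.quantiles(errors, n=4)
    return {
        "mean": statistics.fmean(errors),
        "sd": statistics.stdev(errors),
        "median": statistics.median(errors),
        "iqr": q3 - q1,
    }


# Invented bone lead values (ug/g); negative observations are allowed.
observed = [10.0, 20.0, -2.0, 15.0]
predicted = [12.0, 15.0, 3.0, 15.0]
r = pearson_r(observed, predicted)
summary = absolute_error_summary(observed, predicted)
```

The median absolute error from such a summary is the quantity that would be set against the instrument's median measurement uncertainty.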
We also constructed a set of reduced models without blood pressure and disease diagnoses, for the following reason. Predicted bone lead levels from our full model, if naively used in an association model with similar health outcomes as the response, could lead to strongly significant results that are an artifact of the model building process and of the duplicate use of the outcome data in prediction and association. Hereafter, we refer to the model with all predictors including blood pressure and disease diagnoses as “full model” and the model without those variables as “reduced model”.
Other variables that have been reported to be associated with bone lead levels, such as dietary intakes of calcium and vitamin D (12), were also considered, but were not included in the final model because their influences were minimal and because they might not be routinely collected in population studies, which would preclude investigators from using this prediction model. We also considered cumulative traffic as a surrogate of past lead exposures to traffic particles from combustion of leaded gasoline (26), but did not include it for the same reasons.
We also developed prediction models including urinary cross-linked N-telopeptides of type I collagen (NTx), because NTx is considered a specific marker of bone resorption (27) and thus may reflect lead mobilization from bone (4, 28). The inverse of NTx (1/NTx) and an interaction between blood lead and 1/NTx were included in the models in order to account for the effect modification of blood lead by the rate of bone turnover. Because NTx values were missing for some subjects, fewer subjects were used for the models with NTx.
To validate the prediction model in another cohort, we used data from the Community Lead Study (CLS) that includes minority and female subjects. Details of population characteristics can be found elsewhere (13). Briefly, subjects were recruited from the pool of subjects who participated in a study funded by the National Institutes of Health (NIH; “Impact of Sleep-Disordered Breathing in Older Adults”, NIH HL51075; PI, D. Sparrow). Between 1999 and 2000, 84 minority subjects aged ≥ 35 years (mean age 50 years, 82% African American, 67% female) were invited to have their bone and blood lead levels measured with the same techniques used in the NAS. Characteristics of the CLS population are presented in Table 2. Because not all variables selected in the training set were available in the CLS, we constructed a prediction equation using a subset of six predictors that were available in the CLS and had high partial R² in the prediction model: age, blood lead levels, education, job type, cumulative cigarette pack-years and smoking status. Hereafter, we refer to this model as “6-predictor model”. We carried out similar validation procedures for this 6-predictor model in the CLS as well as in Test sets 1 and 2.
To examine applicability of the prediction models in another cohort where bone lead levels are not available, we examined the associations between estimated bone lead levels and hypertension using data from the Third National Health and Nutrition Examination Survey (NHANES-III). The association between blood lead and hypertension has already been published elsewhere (29, 30). Vupputuri et al. found that blood lead levels were significantly associated with hypertension among both white and black women, borderline significantly associated with hypertension among black men, and not associated with hypertension among white men (30). We ran multiple logistic regression models with hypertension as an outcome and bone lead estimated using one of the three prediction models (full, reduced or 6-predictor model) as an exposure variable, adjusting for age, age², education, smoking status, pack-years of cigarettes, BMI, hematocrit, alcohol consumption, physical activity, use of antihypertensive medication, and diagnosis of type-2 diabetes. We conducted stratified analyses by race and gender as done by Vupputuri et al. (30). We also did stratified analyses by gender and age (<50 vs. ≥ 50 years) to see whether our models constructed from older males predict bone lead levels in younger populations. We log-transformed blood lead to normalize the distribution but used non-log-transformed bone lead because those distributions were not very skewed and because bone lead levels include negative values. We illustrate how to use regression calibration techniques (31) to account for prediction uncertainties in the imputed bone lead measurements in the logistic regression association models and obtain correct standard errors. All analyses were performed using the survey package in SAS (SAS Institute, Inc., Cary, North Carolina) to account for the complex sampling design.
Table 1 shows the distributions of demographic and clinical characteristics and lead biomarkers among the baseline sample of 825 subjects in the NAS. The mean age was 67 years, and 29% and 53% of subjects were college graduates and white collar, respectively. There were no apparent differences in terms of descriptive statistics of participants in the training and test sets.
Figure 2 shows a visual description of LASSO model selection iterations in the tibia lead prediction model plotted against certain standard model assessment diagnostics like the AIC, BIC, Mallows' Cp and adjusted R². These figures correspond to the fixed training set data. See Supplemental Figure 1 for the LASSO model selection for patella lead. To arrive at a more robust set of predictors, we aggregated the model selection results over several training sets and collected the variables that appeared consistently over the iterations. Though the optimal dimension of the selected model varied across training sets, most frequently 15 predictors were included in the tibia lead model and 13 in the patella lead model, in addition to the 3 pre-selected predictors (age, education and job type); these are listed in Table 3.
Table 3 presents the regression models fitted to the complete case data in our original training set. In this training set 550 subjects had complete data on tibia lead and selected predictors, whereas 548 of these subjects had complete data on patella lead and the selected covariates. This difference in the number of complete case observations between tibia and patella lead models was due to missing data in different covariates used in the tibia lead and patella lead models. Blood lead, age, job type and education contributed almost all of the total variation explained by the two models. Blood lead had a stronger partial association with patella lead than with tibia lead (partial R²=0.0924 and 0.1253 for tibia and patella, respectively). The other variable that was statistically significant in both models was serum phosphorus (P=0.001 for tibia lead and P=0.01 for patella lead). Variables reaching significance in the tibia lead model only were serum uric acid (P=0.005), SBP (P=0.04) and current diagnosis of diabetes (P=0.04). Cumulative cigarette smoking as measured by pack-years appeared to have a stronger effect in the patella lead model (P=0.09) than in the tibia lead model (P=0.37). Current diagnosis of hypertension (P=0.01) and high density lipoprotein (HDL) (P=0.03) also attained significance in the patella lead model. In total, 18 predictors were selected for tibia lead and 16 for patella lead. The predictors that differ between the two models are serum calcium, total cholesterol, and diagnosis of cancer (tibia lead only) and diagnosis of hypertension (patella lead only). The R² for the patella lead model was slightly higher than that for the tibia lead model (0.335 vs. 0.316), and so was the adjusted R² value (0.313 vs. 0.290). The predictors listed in Table 3 were selected only if they were identified as important predictors more than 500 times out of 1000 different training sets by the LASSO procedure.
Some variables with high p-values, such as serum creatinine (p=0.85) and serum calcium (p=0.79) for tibia lead and serum creatinine (p=0.86) and hematocrit (p=0.82) for patella lead, appeared not to be predictors of bone lead levels in the original training set, but they were identified as important predictors in other training sets (data not shown). Before we delve into the issue of prediction, we would like to mention that this model selection and fitting procedure provides guidance in terms of relative importance and statistical contribution of the variables determining the different lead burdens in body.
The fitted models were then used to predict tibia lead levels in 188 subjects and patella lead levels in 186 subjects with completely observed data in the first test set from the NAS. Figure 3 presents validation of the fitted model in this test set. The correlations of the observed versus the predicted values were 0.454 for tibia lead and 0.524 for patella lead. The prediction was poor for the observations with higher observed values of tibia and patella lead, which was expected due to lack of observations and increasing variability in those sparse domains. The bottom panel plots the errors in prediction (observed – predicted), again showing that a reasonable prediction accuracy was attained by using the fitted model. The mean absolute error for tibia lead was 7.9 (SD=7.4) μg/g, with median 6.5 (interquartile range (IQR)=6.2) μg/g, and that for patella lead was 10.2 (SD=9.4) μg/g with median 7.7 (IQR=9.7) μg/g. These absolute errors do not appear to differ much from the measurement uncertainties of the instrument (median: 5.3 μg/g for tibia lead; 6.9 μg/g for patella lead).
We carried out a second validation on test set 2 as described in the methods section consisting of 118 subjects for tibia lead and 117 subjects for patella lead (see Supplemental Figure 2). We obtained slightly higher correlation coefficients for both tibia lead (r=0.497) and patella lead (r=0.575) than those in the test set 1. The mean absolute errors of prediction were also similar.
We also fit the models after dropping blood pressure and disease diagnoses (reduced model), which can be used when cancer, diabetes or blood pressure-related outcomes such as hypertension are examined (Table 4). Adjusted R² values were slightly reduced in both the tibia lead and patella lead models.
We also developed prediction models including NTx to account for individual differences in the bone resorption rate. The inverse of NTx (1/NTx) and its interaction with blood lead were not significant predictors of either tibia or patella lead levels (data not shown); thus, including these variables did not improve the prediction models.
Since the models presented in Tables 3 and 4 contain measurements on many specific serum biomarkers and personal disease history variables which may not be available in other community-based cohorts, we selected six basic variables which are likely to be routinely available in other cohorts and which account for a large contribution to the R² reported in Table 3. The results based on this model consisting of blood lead, age, education level, job type, smoking status and cumulative cigarette smoking (pack-years) are reported in Table 5. The validity of this model was tested on the two previously defined test sets constituted from the NAS. We found slightly better results in these models than in those with more covariates including serum measures and disease history (see Supplemental Figures 3 and 4). For example, the correlations of the observed versus the predicted values were 0.480 for tibia lead and 0.518 for patella lead in test set 1. The mean absolute errors of prediction for tibia lead and patella lead were 7.5 (SD=7.3) μg/g and 10.0 (SD=9.7) μg/g, respectively. Similar results were also found in test set 2.
To examine the generalizability of the prediction equation to other cohorts beyond NAS, we used the prediction equation described in Table 5, to predict bone lead levels of 84 study participants in the CLS. Figure 4 presents the plot of observed vs predicted bone lead levels as well as the error of prediction. For CLS subjects, the correlation coefficients between observed and predicted values were 0.431 for tibia lead and 0.575 for patella lead. The mean absolute errors of prediction for the CLS subjects were 9.2 (SD=7.5) μg/g with median 7.9 (IQR=9.2) μg/g for tibia lead and 11.1 (SD=9.3) μg/g with median 9.0 (IQR=13.6) μg/g for patella lead levels. The absolute errors in this external validation seem to be somewhat higher compared to the measurement uncertainties in CLS (median 5.9 μg/g for tibia lead; 7.5 μg/g for patella lead) which could result from the different demographic characteristics of the NAS versus CLS population.
Blood lead and estimated bone lead levels among NHANES-III participants are presented in Table 6. The study sample included 12,500 subjects (5,870 men and 6,630 women) with a weighted mean ± SE age of 44.6 ± 0.46 years (range 20 – 90 years). Weighted mean ± SE blood lead level was 3.52 ± 0.10 μg/dL; it was higher for men compared with women and for blacks compared with whites. Estimated bone lead levels differed by prediction model; the 6-predictor models gave the highest bone lead levels (5.47 ± 0.35 μg/g for tibia lead and 6.43 ± 0.52 μg/g for patella lead), followed by the full models (3.81 ± 0.36 and 4.97 ± 0.52 μg/g) and the reduced models (3.32 ± 0.36 and 3.52 ± 0.53 μg/g). Estimated bone lead levels were also higher for men compared with women and for blacks compared with whites, but the weighted means among subjects younger than 50 years of age were all negative (except for black men), which resulted from negative estimated bone lead levels for subjects in their 20s and 30s. Subjects aged 50 years and older had substantially higher estimated bone lead levels from all three models.
Table 7 shows the associations between hypertension and blood lead and estimated bone lead levels among NHANES-III participants. After adjustment for potential confounders, a 1-SD increase in log-transformed blood lead (0.75 μg/dL) was associated with hypertension with an odds ratio (OR) of 1.12 (95% confidence interval (CI), 1.03, 1.23). As expected, the associations with estimated bone lead levels from the full models seem to be inflated (OR=3.70 (95% CI, 2.80, 4.89) for tibia lead (SD=15.2 μg/g); OR=4.95 (95% CI, 3.55, 6.90) for patella lead (SD=21.9 μg/g)). The associations with estimated bone lead levels from the reduced models appear to be much more stable, and stronger than those with blood lead (OR=2.06 (95% CI, 1.59, 2.67) for tibia lead (SD=15.1 μg/g); OR=1.69 (95% CI, 1.34, 2.14) for patella lead (SD=22.3 μg/g)). The 6-predictor models showed slightly larger effect sizes than the blood lead model but similar levels of significance.
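Odds ratios reported per 1-SD increase follow from a per-unit logistic regression coefficient by scaling on the log-odds scale: OR per SD = exp(β × SD). The sketch below shows that conversion with a normal-approximation interval. The function name and the coefficient values are hypothetical, and a design-based analysis such as the one used for NHANES-III may use a different reference distribution for the interval.

```python
import math


def odds_ratio_per_sd(beta_per_unit, se_per_unit, sd, z=1.96):
    """Convert a per-unit log-odds coefficient into an OR (and 95% CI) per 1-SD increase."""
    log_or = beta_per_unit * sd
    half_width = z * se_per_unit * sd
    return (
        math.exp(log_or),
        math.exp(log_or - half_width),
        math.exp(log_or + half_width),
    )


# Hypothetical coefficient (per ug/g) and standard error; the SD of 15.1 ug/g is
# the tibia value quoted in the text for the reduced model.
or_sd, ci_low, ci_high = odds_ratio_per_sd(beta_per_unit=0.04, se_per_unit=0.01, sd=15.1)
```

Expressing effects per SD puts blood lead (on the log scale) and bone lead (in μg/g) on a comparable footing despite their different units.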
In stratified analyses by gender and race, the associations with blood lead were statistically significant only among women, borderline significant among black men and not significant among white men, which is consistent with the study by Vupputuri et al (30). The associations with bone lead markers from the reduced models were statistically significant for all 4 groups but the effect sizes among women seem to be overestimated. We found similar trends in the 6-predictor models.
In stratified analyses by gender and age (<50 vs. ≥ 50 years), blood lead was significantly associated with hypertension among men aged 50 years and older and among women younger than 50 years of age. The associations with bone lead markers from the reduced models among men were similar regardless of age group, but those among women were larger in the older group. The associations with bone lead markers from the 6-predictor models among men were similar to those with blood lead, but among women the associations were larger in the older group.
While the analysis reported above treats the predicted bone lead levels as truly observed values in the health outcome association model, incorporating prediction uncertainties in the standard error expression for the coefficients in the logistic regression model for hypertension in NHANES-III is necessary for achieving correct inference. One may adopt two methods to achieve this goal: (i) recall that bone lead is imputed from a regression model of the following nature: bone lead = linear function of predictors + ε, where ε is the normally distributed error with mean 0 and a standard deviation parameter that can be estimated from the square root of the mean squared error of the fitted linear model (11.26 for tibia, 16.70 for patella based on the fitted model in Table 3). One can generate M copies of the bone lead predictions by generating M (typically 5-10) random normal variables from the estimated error distribution (N(0,11.26), say for tibia) and adding the errors to the point prediction. The errors can alternatively be generated from a log-normal distribution with parameters estimated from the observed error distribution in the training set to capture skewness in the bone-lead levels. Thus, one will have M datasets with M different imputed bone lead levels and each will give rise to a fitted logistic regression model and M sets of log OR estimates. The final estimate of the log OR and the corresponding variance derived from the M datasets can be obtained using the following equations (32).
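The multiple-imputation approach in (i) can be sketched as follows. The combining step shown here uses the standard rules for multiply imputed data (pooled estimate equal to the mean of the M estimates; total variance equal to the mean within-imputation variance plus (1 + 1/M) times the between-imputation variance). That choice is an assumption about the combining equations intended, and the function names and numeric inputs are illustrative only.

```python
import random
import statistics


def impute_bone_lead(point_prediction, residual_sd, m=5, seed=0):
    """Draw m imputed values by adding normal errors to a point prediction."""
    rng = random.Random(seed)
    return [point_prediction + rng.gauss(0.0, residual_sd) for _ in range(m)]


def combine_estimates(log_ors, variances):
    """Pool M log-OR estimates and their variances across imputed datasets."""
    m = len(log_ors)
    pooled = statistics.fmean(log_ors)
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(log_ors)    # sample variance across imputations
    total_variance = within + (1 + 1 / m) * between
    return pooled, total_variance


# One subject's tibia prediction (hypothetical 18.0 ug/g) with the residual SD
# of 11.26 quoted in the text.
draws = impute_bone_lead(18.0, residual_sd=11.26, m=5)

# Hypothetical log-OR estimates and variances from M = 3 fitted logistic models.
pooled_log_or, pooled_var = combine_estimates([0.5, 0.7, 0.6], [0.01, 0.01, 0.01])
```

The between-imputation term is what carries the prediction uncertainty into the final standard error; treating a single predicted value as observed would omit it.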
Another approach is to adopt the regression calibration framework of Spiegelman et al (1997) to derive corrected large sample standard errors for the log OR. By applying this correction, 95% CI for the OR corresponding to 1-SD change in bone lead in the NHANES-III hypertension model (with bone lead as imputed via the full model) changes to (2.69, 5.12) for tibia lead; (3.38, 7.27) for patella lead. With imputation via the reduced model, 95% CIs are (1.57, 2.71) for tibia lead; (1.33, 2.16) for patella lead. For the 6-predictor model, they are (1.11, 1.83) for tibia lead and (1.09, 1.69) for patella lead. The point estimates of the OR remain unchanged.
In this paper we derived prediction equations for bone lead levels in terms of blood lead and a set of covariates that may be measured in population-based cohorts. Our analysis served two purposes, first through model selection, we identified the important predictors, and then by validating on test sets and an external cohort and applying to a cohort where bone lead data are not available, we assessed the translatability of the coefficients to a set of independent observations. Based on the comparisons between measurement uncertainty and the absolute errors of prediction and the measurement errors computed from the regression models in NHANES-III, the prediction models including important determinants seem to predict bone lead levels reasonably. Although we could not assess how accurate estimated bone lead levels in NHANES-III were, the findings that estimated bone lead markers were more strongly associated with hypertension than blood lead provide evidence of the practical value of the present study, in particular, use of the reduced models in relation to hypertension, high blood pressure and disease diagnoses, and the potential use of the full models in relation to other health endpoints, such as cognitive declines. However, the associations of bone lead levels estimated from the 6-predictor models with hypertension did not appear to be different from those of blood lead. These findings suggest that biochemical blood parameters, such as serum phosphorus, uric acid, total and HDL cholesterols, may be critical contributors to distinguish predicted bone lead from blood lead and the models including these parameters may provide better estimations of bone lead levels than the 6-predictor models.
In this study, we verified the prediction equations using internal test datasets as well as an external cohort, the CLS. The correlations between the observed and the predicted bone lead levels were high in both validation tests, ranging from 0.4 to 0.5 for tibia lead and from 0.5 to 0.6 for patella lead. The CLS participants were all members of minority groups, with 82% African American and 67% female, characteristics that differ considerably from those of the NAS. Gender is an important determinant of bone lead levels. Kosnett et al. observed no significant difference in tibia lead levels by gender among subjects less than 55 years of age, but significantly higher tibia lead levels in males 55 years and older (19). This may be because the rate of bone remodeling increases in postmenopausal women: the bone resorption rate in postmenopausal women is accelerated, resulting in old bone being replaced relatively quickly by new bone matrix formed in a more recent lower-lead environment (14, 33). Although the majority of the CLS participants were females, 75% of the females were aged less than 55 years (mean age 49 years), and therefore the influence of postmenopause might be limited. Hence, the prediction equations may not be appropriate for populations including a number of postmenopausal women. In fact, much larger effects of estimated bone lead were observed among women aged 50 years and older in NHANES-III (Table 7). Because women showed a larger association between blood lead and hypertension in NHANES-III, it is unclear whether the larger effects observed among women aged 50 years and older indicate true associations or inflation; thus, the association results should be taken as precursors for further epidemiological investigation. The equations should also be used cautiously to predict bone lead levels for persons in a stage of high bone turnover other than postmenopause, such as pregnant women.
Race/ethnicity may also be an important factor in predicting bone lead levels, but it was not included in the current prediction equations because the NAS participants were predominantly White. A recent study conducted in Baltimore found higher tibia lead levels in African Americans compared with Whites, after controlling for socio-economic status (SES) and other factors (17). The authors suggested that this difference might be due to higher lifetime cumulative exposure to more polluted environments, rather than differences in bone kinetics by race/ethnicity. Although the equations in this study predicted bone lead levels well even in the CLS, we cannot rule out the possibility that the prediction equations underestimate bone lead levels for African Americans or other members of minority groups.
Lead levels in blood generally reflect both recent exogenous exposures and the mobilization of lead from the skeleton back into the circulation (3, 34). In the present study, a 1 μg/dL difference in blood lead levels was associated with a 1.1 μg/g change in tibia lead and a 1.9 μg/g change in patella lead (Tables 3 and 4). Blood lead was the strongest predictor of bone lead levels, explaining approximately 9% and 13% of the variances in tibia lead and patella lead, respectively. Because only one study reported blood lead as a determinant of tibia lead levels and that study log-transformed tibia lead by adding an offset value (19), we could not directly compare that study with ours. The study conducted with 101 suburban residents also found blood lead (log-transformed) to be a significant predictor of tibia lead, but the association was not as strong as ours (p=0.04). This might be because of the small number of subjects with a wide age range (11 to 78 years). The current and our previous studies show a stronger association between blood lead and patella lead levels than between blood lead and tibia lead levels. Lead in trabecular bone, such as patella bone, is more readily released than lead in cortical bone, and therefore, this higher correlation between blood lead and patella lead led to better validation results.
The equations may be more suitable for older populations that are not occupationally exposed to lead. Predicted bone lead levels for older age groups seem to be comparable to the measured bone lead levels in other cohorts. Measured tibia lead levels for white men of the NAS (mean age ± SD: 67.2 ± 7.3, see Table 1), white women of the Nurses' Health Study (NHS, mean age ± SD: 58.7 ± 7.2) (35), and blacks (mean age ± SD: 59.7 ± 6.2) and whites (mean age ± SD: 59.1 ± 5.7) of the Baltimore Memory Study (36) were 21.9, 13.3, 21.5 and 16.7 μg/g, respectively. The estimated tibia lead levels in the older group of the NHANES-III were 19.2 μg/g for white men, 16.5 μg/g for white women, 24.7 μg/g for black men and 20.4 μg/g for black women (Table 6). Measured patella lead levels in NAS (white men) and NHS (white women) were 31.1 and 17.3 μg/g, respectively, and the estimated patella lead levels for the corresponding groups in NHANES-III were 27.5 and 23.2 μg/g, respectively.
Previous studies have consistently reported that age is the predominant predictor of bone lead levels (13-16, 18, 19). A one-year increase in age was positively associated with 0.28 to 0.63 μg/g increases in tibia lead and 0.24 to 0.72 μg/g increases in patella lead. In our study, the corresponding regression coefficients were 0.54 to 0.60 μg/g for tibia lead and 0.86 to 0.91 μg/g for patella lead. Age accounted for about 7 to 10% of the variance in bone lead levels, one third of the model R2 observed (Tables 3-5). Studies conducted by Kosnett et al. (19) and Lin et al. (13) suggested that age-related bone lead levels differ by age group: tibia lead levels increased at a greater rate among persons aged 55 years and older, as compared with those < 55 years. This might reflect the age-related difference in bone turnover (26) or the birth cohort effect, whereby exposure levels to environmental lead differ over time (37). Because our prediction models were derived from an older age cohort (age range 48-94 years), the bone lead levels for young participants (less than 50 years of age) in NHANES-III did not appear to be adequately predicted: estimated bone lead levels in this age group were negative. Nonetheless, this does not necessarily mean that examination of the effect of estimated bone lead in younger populations is inadequate; we could detect positive associations between estimated bone lead and hypertension in this age group (Table 7).
Education, job-type (white collar or not) and cigarette smoking were identified as strong determinants of bone lead levels. Educational attainment and job-type are proxies of SES, and low SES is a well-known predictor of bone lead levels (3, 24). In studies of populations with occupational lead exposure, exposure duration was another important determinant of bone lead levels (15, 16), which could not be examined in the present study. Because details of occupational history are unlikely to be collected in community-based studies, it is useful that even the crude white or blue collar occupation classification improves the prediction models. Smoking can also predict bone lead levels well because lead exposure may occur through direct lead intake from tobacco, increased hand-to-mouth activity, and enhanced permeability of the smoke-exposed respiratory tract (38, 39).
Biological data are often right-skewed and non-normally distributed, and the distributions of bone lead levels in the present study were not normal. In general, the assumption of normality is particularly important for hypothesis testing and construction of confidence intervals. However, the primary objective of our study is to provide point predictions for bone lead levels, not to do hypothesis testing with bone lead as the response, and thus these assumptions are not critical for our prediction rule. Least-squares fitting may be viewed as an optimization technique that yields the best-fitting surface irrespective of the distributional assumptions of a standard normal model. In addition, negative bone lead values are sometimes produced when the true value is close to zero (22), and in fact there were negative bone lead levels in this cohort, which precludes log-transformation. Therefore, we used non-transformed bone lead levels as the target outcome for prediction. Also, when correcting for prediction uncertainty in the second-step logistic regression association models with predicted bone lead as a risk factor, the normality assumption is not required (40).
Several limitations should be considered in this study. Many other factors that were not considered in the prediction equations may contribute to bone lead prediction, such as diet and genetic polymorphisms. We considered NTx, a marker of bone resorption, but it was not identified as an important predictor of bone lead levels. In the NAS, our group previously found that dietary intakes of vitamin D, phosphorus and calcium were associated with bone lead levels (12). However, those dietary variables were not identified as important predictors in the present study, probably because of high correlations with serum phosphorus and calcium which might be better markers than dietary measures. We included those serum markers because they are routinely collected from a simple blood test, but not all population studies collect dietary intakes. In addition, our group has found that bone lead levels differ by genetic polymorphisms, such as δ-aminolevulinic acid dehydratase and hemochromatosis genes (41, 42). Including these genetic polymorphisms would provide better prediction models.
We presented the results for one specific training set and its test sets, although the coefficients may change if a different training set is selected. To account for this uncertainty, one could average the fitted equations over a large number of training sets, and the prediction accuracy over the corresponding test sets. However, we refrained from that approach because description and interpretation of the model coefficients, as well as visualization of the prediction plots, would become cumbersome in that context.
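The averaging idea described above can be sketched in a few lines. The following is a minimal illustration only, not the analysis performed in this study: it uses synthetic data, a single predictor (blood lead) instead of the full covariate set, and hypothetical function names (`make_synthetic_cohort`, `repeated_split_validation`). The slope, intercept and noise level used to generate the data are arbitrary illustrative values.

```python
import random
import statistics


def make_synthetic_cohort(n=400, seed=42):
    """Generate SYNTHETIC (blood lead, bone lead) pairs for illustration only."""
    rng = random.Random(seed)
    blood = [max(0.5, rng.gauss(6.0, 3.0)) for _ in range(n)]
    # Arbitrary illustrative relationship plus noise; not the study's equation.
    bone = [10.0 + 1.1 * b + rng.gauss(0.0, 8.0) for b in blood]
    return blood, bone


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den


def repeated_split_validation(x, y, n_splits=200, train_frac=0.75, seed=0):
    """Refit on many random 3/4 training sets and score each on its 1/4 test set."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    n_train = int(train_frac * len(idx))
    intercepts, slopes, corrs = [], [], []
    for _ in range(n_splits):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        a, b = fit_simple_ols([x[i] for i in train], [y[i] for i in train])
        predicted = [a + b * x[i] for i in test]
        observed = [y[i] for i in test]
        intercepts.append(a)
        slopes.append(b)
        corrs.append(pearson(observed, predicted))
    return {
        "mean_intercept": statistics.fmean(intercepts),
        "mean_slope": statistics.fmean(slopes),
        "sd_slope": statistics.stdev(slopes),
        "mean_corr": statistics.fmean(corrs),
    }


if __name__ == "__main__":
    blood, bone = make_synthetic_cohort()
    print(repeated_split_validation(blood, bone))
```

The spread of the slopes across splits (`sd_slope`) gives a rough sense of how much a coefficient depends on the particular training set chosen, which is the uncertainty the paragraph above refers to.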
The results indicate that the prediction equation fitted in the NAS is translatable to other cohorts that may not be exactly similar to the NAS cohort in demographic and socio-economic characteristics. The consistency of the reported values across test sets and different models is reassuring in the sense that, in spite of sampling variability and differences in the study population, one can expect a correlation of approximately 0.4-0.5 between observed and predicted bone lead measurements.
If one were to impute the bone lead levels in other studies based on this equation, it is important to include a measure of the uncertainty associated with this prediction and to adjust the variance estimates of the obtained effect sizes accordingly. A naïve approach that treats the predicted bone lead levels as true values will underestimate the actual variances and inflate the Type I error rates of tests based on imputed data. Prediction uncertainty measures and prediction intervals for the measurements can be obtained using the estimated error variability from the NAS study.
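As an illustration of how a prediction interval follows from the estimated error variability, the sketch below computes the textbook interval for a single-predictor least-squares fit. It is a simplified example under stated assumptions, not the procedure used in this study: the data are synthetic, only one predictor is used, the function names (`simulate`, `fit_with_residual_sd`, `prediction_interval`) are hypothetical, and a normal quantile is used as a large-sample stand-in for the t quantile.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate(n, seed):
    """SYNTHETIC (predictor, outcome) pairs; the coefficients are arbitrary."""
    rng = random.Random(seed)
    x = [max(0.5, rng.gauss(6.0, 3.0)) for _ in range(n)]
    y = [10.0 + 1.1 * xi + rng.gauss(0.0, 8.0) for xi in x]
    return x, y


def fit_with_residual_sd(x, y):
    """Fit y = intercept + slope * x and keep what the interval needs."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "intercept": intercept,
        "slope": slope,
        "resid_sd": math.sqrt(rss / (n - 2)),  # estimated error variability
        "mean_x": mx,
        "sxx": sxx,
        "n": n,
    }


def prediction_interval(x0, fit, level=0.95):
    """Return (lower, point prediction, upper) for a NEW observation at x0."""
    # Normal quantile as a large-sample approximation to the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # The leading 1 is the new observation's own error variance; the other
    # two terms reflect uncertainty in the fitted line at x0.
    se_pred = fit["resid_sd"] * math.sqrt(
        1.0 + 1.0 / fit["n"] + (x0 - fit["mean_x"]) ** 2 / fit["sxx"]
    )
    centre = fit["intercept"] + fit["slope"] * x0
    return centre - z * se_pred, centre, centre + z * se_pred


if __name__ == "__main__":
    x, y = simulate(n=500, seed=3)
    print(prediction_interval(6.0, fit_with_residual_sd(x, y)))
```

The interval is dominated by the residual standard deviation, so it stays wide even with a large training sample. This is why treating a predicted value as if it were a measured one understates the variance in any downstream association model.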
In summary, this study suggests that the prediction equations developed in the NAS may be used to predict bone lead levels in other community-based cohorts with reasonable accuracy based on blood lead levels and other standard covariates where bone lead measurements are not available.
Disclosure: This study was funded by the National Institute of Environmental Health Sciences (NIEHS) R01-ES05257, ES10798, and P30-ES00002. S.K.P was supported by the National Institute of Environmental Health Sciences (NIEHS) K01-ES016587-01A1.
All Sources of Support: This research was supported primarily by the National Institute of Environmental Health Sciences (NIEHS) R01-ES05257, ES10798, and P30-ES00002. S.K.P was supported by the National Institute of Environmental Health Sciences (NIEHS) K01-ES016587-01A1. The VA Normative Aging Study is supported by the Cooperative Studies Program/Epidemiology Research and Information Center of the US Department of Veterans Affairs and is a component of the Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, Massachusetts. The authors declare they have no competing financial interests.