In a family-based cohort of 372 persons with the protease inhibitor ZZ genotype whose pulmonary function spanned a broad spectrum, we developed and tested predictive models for FEV1 and the presence of severe COPD by incorporating demographic, clinical, and genetic information. In validation testing, these models explained 50% of the variance in FEV1 and had a c statistic of 0.88 in models for severe COPD, a level of performance that matches or exceeds other, well-established clinical prediction tools (21). The predictive variables common to both models were age, sex, first 20 pack-years of personal smoking exposure, bronchodilator responsiveness, the presence of chronic bronchitis symptoms, and index case status. Two interacting loci in IL10 (rs3024492) and TNF (rs1800610) met the criteria for inclusion in the severe COPD model and yielded a marginal improvement in predictive performance (the c statistic increased from 0.86 to 0.88). When we tested for main effects and SNP-by-smoking interactions, none of the genetic markers in the 4 studied genes were incrementally informative in predicting FEV1 or severe COPD.
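For readers less familiar with the c statistic reported above, the following is a minimal sketch of how it can be computed for a binary outcome such as severe COPD. It is not the code used in this study; the function name and the example values are hypothetical and chosen only for illustration. The c statistic is the proportion of all case/non-case pairs in which the case received the higher predicted risk, with ties counted as one half.

```python
from itertools import product


def c_statistic(predicted_risk, has_severe_copd):
    """Concordance (c) statistic for a binary outcome.

    Every person with the outcome is paired with every person without it.
    A pair is concordant when the case has the higher predicted risk;
    tied predictions count as half a concordant pair.
    """
    cases = [p for p, y in zip(predicted_risk, has_severe_copd) if y]
    non_cases = [p for p, y in zip(predicted_risk, has_severe_copd) if not y]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_risk, non_case_risk in product(cases, non_cases):
        if case_risk > non_case_risk:
            concordant += 1.0
        elif case_risk == non_case_risk:
            concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Made-up predicted risks and outcomes (NOT study data).
example_risks = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
example_outcomes = [1, 1, 1, 0, 0, 0]
print(round(c_statistic(example_risks, example_outcomes), 3))
```

In this toy example, 8 of the 9 case/non-case pairs are concordant. A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination.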
In previous analyses from the AAT Genetic Modifier study cohort, researchers identified associations between FEV1 and sex, asthma, chronic bronchitis, pneumonia, pack-years of personal cigarette smoking, and genetic polymorphisms in IL10). As expected, results from our models corroborate the findings from previous analyses of this cohort, with the exception that asthma and the genetic polymorphisms in IL10 were not significant predictors in our models when analyzed as main effects. In the case of asthma, bronchodilator responsiveness was a correlated but more powerful predictor of FEV1. Once this variable was included in the model, asthma diagnosis provided no additional predictive information. However, in models without bronchodilator responsiveness, asthma before age 30 years was a significant predictor of FEV1 and severe COPD. Regarding associations with IL10, and NOS3, none of the polymorphisms analyzed in our study added incremental benefit to our models when analyzed as main effects. This could be due to a true lack of effect, confounding with variables included in the model, or differences in analytic methods between our study and previous studies (8). In addition, splitting our sample into development and validation samples limited our power to identify genetic associations.
A multiplicative interaction between loci in IL10 (rs3024492) and TNF (rs1800610) met the inclusion threshold for the severe COPD model but not the FEV1 model, though the direction of effect in the FEV1 model was consistent with the COPD findings. Both SNPs are located in introns. In the family-based analysis of this sample by DeMeo et al. (12), the TNF SNP but not the IL10 SNP was associated with FEV1. To our knowledge, the IL10 SNP has not been studied in other cohorts for an association with COPD. The TNF SNP has been studied in at least 4 case-control studies; in 1 of them, investigators reported increased COPD risk associated with the minor allele (23), and in the other 3, investigators reported no statistically significant association (24). While there is a biologic rationale for a link between AAT, IL10, and TNF), our finding should be interpreted cautiously given the low power of our study to detect SNP-SNP interactions and the large number of statistical tests performed. Until this interaction is replicated in larger cohorts, it should be viewed as a tentative association that may be attributable to chance.
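To make the idea of a multiplicative SNP-SNP interaction concrete, the sketch below shows one common way such a term can enter a logistic model. It is illustrative only: the additive 0/1/2 coding of minor-allele counts, the function names, and all coefficient values are assumptions, not the coding or estimates used in our analysis.

```python
import math


def interaction_term(il10_minor_alleles, tnf_minor_alleles):
    """Product of the two genotype codes (assumed additive coding: 0, 1, or 2)."""
    for count in (il10_minor_alleles, tnf_minor_alleles):
        if count not in (0, 1, 2):
            raise ValueError("minor-allele count must be 0, 1, or 2")
    return il10_minor_alleles * tnf_minor_alleles


def predicted_probability(intercept, beta_il10, beta_tnf, beta_interaction,
                          il10_minor_alleles, tnf_minor_alleles):
    """Logistic model with two main effects plus their product term.

    All coefficients are supplied by the caller; none are study estimates.
    """
    log_odds = (
        intercept
        + beta_il10 * il10_minor_alleles
        + beta_tnf * tnf_minor_alleles
        + beta_interaction * interaction_term(il10_minor_alleles,
                                              tnf_minor_alleles)
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients chosen purely for illustration.
print(predicted_probability(-1.0, 0.0, 0.0, 0.5, 1, 1))
```

With the product term, the effect of one SNP on the log odds depends on the genotype at the other SNP; when either count is 0, the interaction term contributes nothing.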
Our finding of a nonlinear relation between pack-years and FEV1 may be due to a “floor effect”; namely, once FEV1 levels of 30%–40% predicted are reached, there is simply less lung function to lose and mortality rates are so high that few people survive to manifest lower levels of FEV1. This effect has been previously described in other clinical situations (28). While previous research has demonstrated a linear relation between pack-years of smoking and FEV1, there have been conflicting reports on the relation between pack-years and FEV1 in AAT deficiency (16). A floor effect in the smoking-FEV1 relation could explain the observation of a linear relation in healthier populations and the lack of a relation in more severely affected populations. For example, in the Swedish national AAT deficiency registry (16), the relation between FEV1 decline and pack-years was found to be dose-dependent, whereas in the more severely obstructed population studied by Hutchison et al. (19), there was no statistically significant relation between FEV1 decline and smoking exposure. Understanding and correctly modeling these relations is crucial to the development of accurate predictive models.
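One simple way to encode a smoking effect that flattens beyond a threshold is to cap the exposure variable, as with the "first 20 pack-years" predictor described earlier. The sketch below illustrates the idea under stated assumptions: the function names are hypothetical, and the slope of -1.0 per pack-year is a placeholder, not an estimate from our models.

```python
def capped_pack_years(pack_years, cap=20.0):
    """Return the exposure that enters the model: pack-years up to `cap`."""
    if pack_years < 0:
        raise ValueError("pack-years cannot be negative")
    return min(float(pack_years), cap)


def smoking_contribution(pack_years, slope_per_pack_year=-1.0, cap=20.0):
    """Contribution of smoking to predicted FEV1 (% predicted).

    The slope is a hypothetical placeholder. Beyond `cap` pack-years the
    contribution no longer changes, mimicking a floor effect.
    """
    return slope_per_pack_year * capped_pack_years(pack_years, cap)


for exposure in (0, 10, 20, 40):
    print(exposure, smoking_contribution(exposure))
```

Under this coding, 20 and 40 pack-years yield the same predicted contribution, whereas a purely linear term would keep lowering the prediction.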
This study had several strengths. First, our data set was characterized by wide variability in lung function, which made it well suited for developing a predictive model. Second, the family-based nature of our study sample mirrored the situation in which a predictive model would probably be used—that is, to predict the lung function of index patients and their newly diagnosed siblings. Third, we were able to test genetic polymorphisms in 4 genes for their potential impact on lung function.
However, our study also had some limitations. First, our study population had only 1 set of spirometric values, so our predictions were for single FEV1 values rather than longitudinal decline in FEV1. Our predictive model can provide an initial assessment of disease severity by allowing a comparison of the subject's observed and predicted FEV1 values. In addition, using our models, an estimate of an individual's future FEV1 level or likelihood of developing severe COPD can be calculated; but this requires the assumption that the older members of the study population are comparable to the younger ones in terms of COPD susceptibility and environmental exposures. A model developed from longitudinal data would not be susceptible to this source of bias.
Second, given the modest size of our study population, we had limited power to detect small effects and to test for interactions between candidate variables. With a larger sample size, it is likely that some of the genetic risk factors, epidemiologic risk factors, and interaction terms that were not included in our models would have reached the levels of significance required for model entry.
Third, while our models exhibited good predictive performance, 50% of the variance in FEV1 was not explained by our FEV1 model. Our efforts represent an initial attempt at developing a predictive model for lung function level, and we have demonstrated that a limited number of predictive variables explain a significant proportion of the variability in lung function. Future models might include genetic data from genome-wide association analyses, other interactions between epidemiologic and genetic factors, or important clinical variables not assessed in our models, such as passive smoking exposure.
Fourth, we validated our model by splitting our study population into a development sample and a validation sample. However, data supporting some of our initial variables (asthma, pneumonia, and chronic bronchitis) came from an earlier analysis of this data set, which biased our model towards better validation performance than might be expected in a previously unanalyzed sample. Nevertheless, when these variables are excluded, the resulting predictive models still explain 47% of the variance in FEV1 and have a c statistic of 0.84.
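As an illustration of what "variance explained" means when a model is assessed in a held-out validation sample, the sketch below computes a coefficient of determination from observed and predicted values. This is a generic formulation (1 minus the ratio of residual to total sum of squares); it is an assumption that it matches the exact definition used in our analysis, and the numbers are made up.

```python
def variance_explained(observed, predicted):
    """R^2 = 1 - SS_residual / SS_total, evaluated on held-out data."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with 2+ values")
    mean_observed = sum(observed) / len(observed)
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("observed values have no variance")
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total


# Made-up FEV1 values (% predicted) for a validation sample; NOT study data.
observed_fev1 = [30.0, 50.0, 70.0, 90.0]
predicted_fev1 = [40.0, 45.0, 75.0, 80.0]
print(variance_explained(observed_fev1, predicted_fev1))
```

Because the predictions come from a model fitted to the development sample only, this value can fall well below the fit seen during development, which is why it is the more conservative measure of performance.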
There has been considerable debate regarding the potential usefulness of genetic information in disease risk prediction (30). While common genetic variants individually explain only a small portion of disease risk, the identification of many such variants may explain a sizeable portion (33). More importantly, common genetic variants probably interact with other factors, such as environmental exposures and other genes, to produce significant effects. Theoretical work suggests that these effects could be much larger than the main effects of genetic variants (35). While it is reasonable to search for large interaction effects in relatively small samples, the advent of well-powered genome-wide association studies may provide enough data to explore such relations and better delineate the true limits of the usefulness of genetic data in the predictive modeling of genetically complex diseases. Finally, targeted or genome-wide sequencing may identify previously unknown low-frequency, high-risk alleles and provide more accurate genetic data, which may be more useful in predictive modeling than the currently available linkage-disequilibrium tagging SNP data (36
It is likely that COPD consists of a number of different pathologic states with their own unique biology and risk factors (37). This holds for COPD overall and for COPD in AAT deficiency. With this in mind, FEV1-based predictive models are a logical place to begin, but it will also be useful to explore predictive modeling for COPD subtypes, which may be defined by COPD-related traits such as emphysema distribution, inflammatory profile, or particular combinations of genetic markers.
In summary, we have presented results from 2 predictive models for FEV1 and severe COPD in AAT deficiency. These represent a set of predictive variables that performed well in 2 separate data samples, indicating that they are a stable set of variables with relatively uniform, predictable effects in our study population. The predictive performance of these models rivals or exceeds that of other, well-established clinical predictive models, but there is still room for improvement in predictive accuracy. Further identification of epidemiologic and genetic risk factors for FEV1 decline in AAT deficiency, larger sample sizes for model development and validation, and longitudinal data on lung function will allow for the development of more powerful risk prediction tools.