Using a well-characterized group of patients from NETT, we were able to demonstrate significant genetic associations for several COPD-related phenotypes. Polymorphisms in four genes (EPHX1, LTBP4, SFTPB, and TGFB1) were significantly associated with measures of functional capacity, including exercise capacity, pulmonary function tests, and respiratory symptoms. Many of the single-SNP associations were confirmed in haplotype analyses. Spirometric traits have been analyzed previously as intermediate phenotypes in COPD genetics (2, 5, 6, 18). However, COPD genetics studies using measures of exercise tolerance and symptom severity have not been previously published.
Variants in EPHX1 were associated with traits in all three categories, including maximal work capacity on a cardiopulmonary exercise test, DlCO, and UCSD SOBQ score. SNPs in two genes in the TGF-β pathway, TGFB1 and LTBP4, were associated with maximal work capacity and UCSD SOBQ score. The association between TGFB1 and dyspnea was replicated in an independent population, using a family-based study design. In addition, an STR near SFTPB was associated with 6-min walk test distance and dyspnea index.
Microsomal epoxide hydrolase is an enzyme important in the metabolism of reactive epoxide intermediates, such as those found in cigarette smoke. A coding variant in exon 3 (rs1051740, Tyr113His), termed the “slow” variant because of its effect on enzyme activity, has been associated with COPD in case-control studies (19, 20); in the Lung Health Study, a haplotype carrying the slow variant was associated with rapid decline in lung function (21). A coding variant in exon 4 (rs2234922, His139Arg), the “fast” variant, was associated with COPD diagnosis in a study comparing the subjects in the NETT Genetics Ancillary Study to community control subjects (6); the slow allele was not associated. However, other studies have failed to confirm the associations with either polymorphism (22, 23). The functional effects of the “fast” and “slow” variants and their haplotypes found in vitro (24) have not been confirmed in vivo (25). Therefore, it is possible that other variants in EPHX1 may affect COPD susceptibility, leading to the inconsistency in previous genetic association studies.
TGFB1 is located on chromosome 19q, a region of the genome linked to COPD-related phenotypes (5). Wu and colleagues demonstrated an association between the Leu10Pro polymorphism (rs1982073) and COPD (26). Celedón and colleagues showed that the rs2241712 promoter polymorphism was associated with COPD and related traits in the Boston Early-Onset COPD Study families and in an analysis comparing the 304 NETT subjects to community control subjects (5). The other promoter SNP (rs1800469) and the Leu10Pro polymorphism were also associated with COPD susceptibility in this case-control analysis.
LTBP4 is a component of the extracellular matrix and is involved in TGF-β signaling. LTBP4 is also located in the linked region on chromosome 19q. No human genetic association studies have been reported, but a mouse model suggests the importance of LTBP4 in the development of pulmonary emphysema (27).
SFTPB is a hydrophobic protein involved in regulating surface tension in the alveoli. Mutations in SFTPB have been implicated in respiratory failure in full-term neonates (28). Several studies have found variants in or near SFTPB to be associated with COPD in adults (29, 30). Our group has reported an association between a coding SNP (Thr131Ile) and airflow obstruction in the Boston Early-Onset COPD Study families (6); in a model that accounted for gene-by-environment interaction, this SNP was also associated in the case-control study that included the NETT subjects.
The genes that we have found to be associated with COPD phenotypes can be placed into pathways that may relate to COPD pathogenesis. Pathways such as xenobiotic metabolism (EPHX1) (31), extracellular matrix properties (LTBP4) (32), and inflammation and cellular signaling (TGFB1) (33) have been areas of active investigation in COPD. The importance of surface tension (SFTPB) in COPD has not been widely studied, but a recent article describes mathematical models relating surface properties to the development of emphysema (34). The different genes associated in our study may underscore the heterogeneity of COPD. It is possible that different genes and pathways contribute in varying combinations to various COPD-related phenotypes. For example, we have found that SNPs in TGFB1 are associated with dyspnea but not with other functional measures in COPD, such as exercise capacity, despite the correlations between these traits. Narrower phenotype definitions and rational subgroup analyses may help to successfully identify and replicate associations for COPD candidate genes.
The present study has several limitations. Replication of significant association results is an important step in complex trait genetics, but we had measurements of only one of the phenotypes, dyspnea, in a separate replication population. The instrument used to measure dyspnea in this replication study, the modified MRC scale, has a narrower range of possible results than the UCSD SOBQ score used in NETT, which may reduce the power for replication. Despite these limitations, we were able to replicate the association of TGFB1 with dyspnea. The other phenotypes analyzed, such as performance on a cardiopulmonary exercise test, are not routinely collected in studies of COPD genetics but may be collected in future clinical trials of COPD therapies, allowing for continued study of these functional impairment traits. Replication of these associations will be the strongest protection from spurious results due to multiple testing.
A variable number of markers in each gene were genotyped in this study, with some genes having only one or two markers tested. If the markers tested were neither the true functional variants nor in linkage disequilibrium with the functional variants, then significant associations could be missed. This may explain why three of our four most significant genes (EPHX1, LTBP4, and TGFB1) were genes with multiple genotyped SNPs. However, no significant associations were found for SERPINE2, which had the largest number of SNPs tested.
Spurious results arising from multiple testing are a major concern in genetic epidemiology, especially in studies of multiple markers and phenotypes, such as our analysis. The optimal approach to adjust for multiple testing is not clear (35, 36). Many of the widely used methods are inappropriate for correlated data, such as multiple SNPs in a single gene or multiple related phenotypes. Therefore, we used a test–replication procedure within the NETT Genetics Ancillary Study cohort to reduce the possibility of false-positive results due to multiple testing, accepting that it may have reduced power to detect valid genetic associations. Based on power calculations in the online supplement, the power would be adequate to detect genes with moderate effects using the split dataset approach. However, we cannot exclude the possibility that some of the other candidate gene polymorphisms we studied may be associated with COPD-related phenotypes. Nevertheless, we were able to identify significant associations for four candidate genes with measures of functional impairment in COPD. Future studies using similarly specific COPD-related phenotypes may be able to identify additional genetic associations for COPD in general and for more precise subgroups in particular. In the future, narrowly defined subgroups, possibly based on genetic polymorphisms, may be better able to predict response to COPD therapies, including LVRS.
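The logic of a split-sample test and replication procedure, as discussed above for multiple testing, can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the analysis performed in this study: it assumes each marker-phenotype test yields an independent standard-normal z statistic in each half of the cohort under the null hypothesis, a two-sided threshold of 0.05 in both halves, and a requirement that the effect direction agree. The function names (`two_sided_p`, `split_sample_hit`, `simulate_false_positive_rate`) are hypothetical.

```python
import random
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def split_sample_hit(z_test, z_repl, alpha=0.05):
    """Declare an association only if it is significant in the test half,
    significant in the replication half, and in the same direction.
    (The same-direction requirement is an assumption of this sketch.)"""
    same_direction = (z_test > 0) == (z_repl > 0)
    return (two_sided_p(z_test) < alpha
            and two_sided_p(z_repl) < alpha
            and same_direction)


def simulate_false_positive_rate(n_tests=100_000, alpha=0.05, seed=0):
    """Fraction of null tests that pass both halves of the split sample."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        # Under the null, each half gives an independent N(0, 1) statistic.
        z_test = rng.gauss(0.0, 1.0)
        z_repl = rng.gauss(0.0, 1.0)
        hits += split_sample_hit(z_test, z_repl, alpha)
    return hits / n_tests


if __name__ == "__main__":
    rate = simulate_false_positive_rate()
    print(f"Per-test false-positive rate with split-sample replication: {rate:.5f}")
```

Under these assumptions the per-test false-positive rate falls from 0.05 to roughly 0.05 × 0.05 / 2, which is why requiring internal replication guards against spurious findings. The cost is that each half contains fewer subjects, so power to detect true associations is reduced.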