Case-control studies have successfully identified many significant genetic associations for complex diseases, but lack of replication has been a criticism of case-control genetic association studies in general. We selected 12 candidate genes with reported associations to chronic obstructive pulmonary disease (COPD) and genotyped 29 polymorphisms in a family-based study and in a case-control study. In the Boston Early-Onset COPD Study families, significant associations with quantitative and/or qualitative COPD-related phenotypes were found for the tumor necrosis factor (TNF)-α −308G>A promoter polymorphism (P < 0.02), a coding variant in surfactant protein B (SFTPB Thr131Ile) (P = 0.03), and the (GT)31 allele of the heme oxygenase (HMOX1) promoter short tandem repeat (P = 0.02). In the case-control study, the SFTPB Thr131Ile polymorphism was associated with COPD, but only in the presence of a gene-by-environment interaction term (P = 0.01 for both main effect and interaction). The 30-repeat, but not the 31-repeat, allele of HMOX1 was associated (P = 0.04). The TNF −308G>A polymorphism was not significant. In addition, the microsomal epoxide hydrolase “fast” allele (EPHX1 His139Arg) was significantly associated in the case-control study (P = 0.03). Although some evidence for replication was found for SFTPB and HMOX1, none of the previously published COPD genetic associations was convincingly replicated across both study designs.
Case-control association analysis is a commonly used study design in the field of complex trait genetics. Susceptibility genes successfully identified using this approach include the associations between Factor V Leiden and venous thromboembolism (1), Apolipoprotein E4 and Alzheimer's disease (2), and peroxisome proliferator–activated receptor-γ (PPARγ) and type 2 diabetes mellitus (3). However, candidate gene case-control association studies have been criticized because of a lack of replication (4, 5). Often the first published study reports a significant association between a candidate gene polymorphism and a disease of interest, but subsequent studies are unable to confirm the association (6).
Candidate genes can be selected based on previous genetic linkage analysis results (positional candidate genes), but they are more commonly chosen based on known or presumed mechanisms in disease pathophysiology or based on the results of previous association studies. In the example of PPARγ above, Altshuler and colleagues identified 16 published genetic associations for type 2 diabetes mellitus (3). They genotyped the previously associated polymorphisms in a parent-child trios study, to protect against spurious results due to population stratification. Significant associations (P < 0.05) were then tested for replication in two case-control studies and one sibling-pair study. Using this method, they confirmed only one of the previously reported associations. The common allele of the PPARγ Pro12Ala polymorphism conferred a modest (1.25-fold) but significant increase in risk for type 2 diabetes mellitus.
In this study of chronic obstructive pulmonary disease (COPD), we used a study design similar to that of Altshuler and colleagues in their study of diabetes. We sought to determine whether any of the published COPD candidate gene associations would withstand the test of replication and whether a convincing COPD genetic association could be found using this approach. Twelve candidate genes with reported significant associations to COPD were identified in the published literature (Table 1 and Table E1 in the online supplement). A total of twenty-nine polymorphisms (24 single nucleotide polymorphisms [SNPs], 1 insertion/deletion [indel], 3 short tandem repeats [STRs], and 1 null deletion) were genotyped in both a family-based COPD study and a case-control COPD study. Results from this study have been previously reported as an abstract (7).
Details of subject recruitment and phenotyping in the Boston Early-Onset COPD Study have been reported previously (8). Severe, early-onset COPD probands had an FEV1 < 40% predicted, age < 53 yr, and did not have severe α1-antitrypsin deficiency. First-degree relatives, older second-degree relatives, spouses, and other family members with COPD were invited to participate. This analysis included 949 individuals from 127 extended pedigrees. Ninety-eight percent of the Boston Early-Onset COPD Study participants were white.
The case-control study identified cases from the National Emphysema Treatment Trial (NETT) (9, 10). Subjects participating in NETT had an FEV1 ≤ 45% predicted (11), evidence of hyperinflation on pulmonary function testing, and bilateral emphysema on high-resolution chest CT scan. For this analysis, cases included 304 non-Hispanic white participants in the NETT Genetics Ancillary Study; nearly all were current or former smokers. Control subjects were participants in the Normative Aging Study (NAS), a longitudinal study of healthy men conducted by the Veterans Administration (VA) in Greater Boston, starting in the 1960s (12). Control subjects included 441 white male NAS participants with a history of at least 10 pack-years of cigarette smoking and without airflow obstruction at their most recent visit (FEV1 > 80% predicted and FEV1/FVC > 90% predicted).
All studies were approved by the appropriate institutional review boards. Participants in the Boston Early-Onset COPD Study and the NETT Genetics Ancillary Study gave written informed consent. Anonymized data were used for the Normative Aging Study participants, as approved by the IRBs of Partners Healthcare System and of the Boston VA.
Twelve candidate genes with reported associations to COPD were identified from the published literature (Tables 1 and E1). Nineteen polymorphisms were initially genotyped in both the family-based and case-control studies (Table 2). Four SNPs (EPHX1 His139Arg, EPHX1 Tyr113His, GSTP1 Ile105Val, and tumor necrosis factor [TNF] −308G>A) were genotyped using the 5′ to 3′ exonuclease assay in TaqMan (Applied Biosystems, Foster City, CA) (14), using ABI TaqMan Assays-on-Demand. The 1–base pair indel in MMP1 (rs1799750) was also genotyped using TaqMan. Details of this assay and the remaining SNP and STR assays are listed in Table E2. The remaining SNPs, including the ten additional TNF and EPHX1 SNPs below, were genotyped with unlabeled minisequencing reactions and mass spectrometry in Sequenom (San Diego, CA) (15). Sequenom genotyping protocols are available on our website (http://www.innateimmunity.net). For three STR markers, PCR was performed using fluorescent-labeled and unlabeled primers, and product sizes were assessed by capillary electrophoresis on an ABI 3100 machine. For D2S388 and D20S838, the primers listed in the public database were used (http://www.ncbi.nlm.nih.gov/). The HMOX1 STR assay was designed per Yamada and coworkers (16). Product sizes were compared with the human genome sequence (http://genome.ucsc.edu/) to determine the number of repeats. The glutathione S-transferase M1 (GSTM1) null deletion was assessed using a quantitative real-time PCR (TaqMan) assay (17). Primers and probes for GSTM1 and a two-copy gene control (BRCA1) were from the previous report (17).
Four additional SNPs in TNF (Table 2) were selected using a linkage disequilibrium tagging algorithm (http://www.innateimmunity.net), based on genotype data available from SeattleSNPs (http://pga.mbt.washington.edu/). Tag SNPs were selected using an LD threshold of r2 = 0.9 and a minor allele frequency cutoff of 0.05. Six additional SNPs in EPHX1 (Table 2) were selected from public databases (http://snpper.chip.org/, http://snp500cancer.nci.nih.gov/).
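As an illustration of the tagging step, the following Python sketch computes pairwise r2 from two-SNP haplotype counts and then applies a simple greedy binning. It is not the algorithm behind the tool cited above. The function names, the inclusive thresholds, and the greedy strategy are assumptions made for illustration only.

```python
def r_squared(n_ab, n_aB, n_Ab, n_AB):
    """Pairwise LD r^2 from counts of the four two-SNP haplotypes."""
    total = n_ab + n_aB + n_Ab + n_AB
    p_ab = n_ab / total
    p_a = (n_ab + n_aB) / total  # frequency of allele a at the first SNP
    p_b = (n_ab + n_Ab) / total  # frequency of allele b at the second SNP
    d = p_ab - p_a * p_b         # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


def greedy_tags(snps, r2, maf, r2_min=0.9, maf_min=0.05):
    """Hypothetical greedy binning: pick the SNP that covers the most others.

    r2 maps frozenset({snp1, snp2}) to the pairwise r^2; maf maps each SNP
    to its minor allele frequency. Inclusive thresholds are an assumption.
    """
    remaining = {s for s in snps if maf[s] >= maf_min}
    tags = []
    while remaining:
        def covered(s):
            return {t for t in remaining
                    if t == s or r2[frozenset((s, t))] >= r2_min}
        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=lambda s: len(covered(s)))
        tags.append(best)
        remaining -= covered(best)
    return tags
```

With three hypothetical SNPs where only the first two are in strong LD, `greedy_tags` would return one tag for that pair and a second tag for the unlinked SNP.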
In the family-based study, deviations from Mendelian inheritance were tested with Pedcheck (18). The data were analyzed using the extended pedigree family-based association test, implemented in PBAT (http://www.biostat.harvard.edu/~clange/default.htm) (19). Quantitative (FEV1 and FEV1/FVC) and qualitative (mild-to-severe and moderate-to-severe airflow obstruction) traits were analyzed under additive genetic models, adjusting for age, sex, height, smoking status, and pack-years of cigarette smoking, including quadratic terms where appropriate. Dominant genetic models and gene-by-environment interactions were tested in secondary analyses.
The case-control data were analyzed in SAS/Genetics (SAS Institute, Cary, NC). Hardy-Weinberg equilibrium was assessed in control subjects using an exact test. Odds ratios and chi-square statistics were calculated from 2 × 2 tables of allele frequencies. Logistic regression was used to control for age and pack-years in both additive and dominant genetic models. Gene-by-smoking and gene-by-gene interactions were tested by including appropriate cross-product terms in the regression models. Haplotype analysis was performed using the expectation-maximization algorithm and score tests, implemented in haplo.stats (20).
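The allele-based test on a 2 × 2 table can be sketched as follows. This is a minimal Python illustration, not the SAS/Genetics procedure used in the study; the function name is hypothetical, and the confidence interval uses the log odds ratio (Woolf) approximation, which is an assumption about how such intervals might be computed.

```python
import math


def allele_test(case_minor, case_major, ctrl_minor, ctrl_major):
    """Odds ratio, approximate 95% CI, and 1-d.f. chi-square for allele counts."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    # Pearson chi-square for a 2 x 2 table, without continuity correction
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 d.f. equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, (lower, upper), chi2, p_value
```

Each subject contributes two alleles, so the counts passed in are allele counts, not subject counts. Adjustment for age and pack-years requires the logistic regression models described above and is not covered by this sketch.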
In the case-control study, we had previously genotyped a panel of 44 unlinked SNPs to test for population stratification using the method of Pritchard and Rosenberg (21). There was no compelling evidence for overt population stratification (χ² = 58.5 with 44 d.f., P = 0.07) (22).
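The summary statistic for this kind of test can be sketched in Python, assuming that each unlinked SNP contributes a 1-d.f. chi-square statistic and that these are summed across markers. The function names are hypothetical, and the closed-form tail probability used here is valid only for an even number of degrees of freedom (as with 44 markers).

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stratification_test(marker_chi2):
    """Sum per-marker 1-d.f. chi-square statistics across unlinked markers."""
    stat = sum(marker_chi2)
    df = len(marker_chi2)
    return stat, df, chi2_sf_even_df(stat, df)
```

Calling `chi2_sf_even_df(58.5, 44)` gives a tail probability of about 0.07, in line with the value reported above.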
Characteristics of participants in the Boston Early-Onset COPD Study are shown in Table 3A. The probands were predominantly female, as has been previously reported (8, 23). Probands had severe airflow obstruction (mean FEV1 = 19.2% predicted). Details of the participants in NETT and NAS are found in Table 3B. The majority of NETT subjects were men (63.8%), while the NAS control subjects were all men. Ages of cases and control subjects were similar, but the NETT cases had a significantly greater smoking history (67.4 versus 38.5 pack-years, P < 0.0001). The 304 NETT cases had severe COPD (mean FEV1 = 24.8% predicted).
In the Boston Early-Onset COPD Study families, all markers were tested for association with the postbronchodilator phenotypes, including FEV1, FEV1/FVC, mild-to-severe airflow obstruction (FEV1 < 80% predicted, with FEV1/FVC < 90% predicted), and moderate-to-severe airflow obstruction (FEV1 < 60% predicted, with FEV1/FVC < 90% predicted). Significant results from the extended pedigree family-based association test are shown in Table 4. The strongest associations were found with the TNF promoter −308G>A SNP for both quantitative and qualitative COPD-related phenotypes (postbronchodilator), in additive genetic models, adjusting for age, sex, height, smoking status (ever versus never), and pack-years. Similar results were obtained using prebronchodilator spirometry phenotypes (data not shown). A coding variant in surfactant protein B (SFTPB), Thr131Ile, was associated with the qualitative phenotype moderate-to-severe airflow obstruction, but a STR near SFTPB (D2S388) was not associated.
In addition, the 31-repeat (137-bp) allele (allele frequency in the extended pedigrees = 0.07) of the heme oxygenase (HMOX1) promoter STR was also found to be associated with postbronchodilator FEV1 and FEV1/FVC. None of the other variants (two additional STRs, the MMP1 indel, and the other 12 SNPs) were found to be associated with qualitative or quantitative COPD-related traits in the early-onset COPD families. The GSTM1 deletion could not be analyzed in the family-based study design, as heterozygotes for the deletion could not be distinguished consistently from the wild-type. One marker, the α1-antichymotrypsin Pro229Ala SNP (Bonn-1), was found to be monomorphic in the extended pedigrees and was not genotyped in the case-control study.
To assess for gene-by-environment interactions, the TNF −308G>A and SFTPB Thr131Ile variants were examined in an analysis stratified by smoking status as well as in a model including an interaction term for pack-years. These analyses did not show evidence of gene-by-environment interaction effects.
In the analysis of the additional TNF SNPs in the early-onset COPD families, one SNP in the 3′ untranslated region (UTR), rs3091257, had a high rate of Mendelian errors. In addition, this SNP deviated from Hardy-Weinberg proportions in the NAS control subjects and was not analyzed further in either cohort. None of the three additional TNF SNPs were associated with COPD-related phenotypes in the family-based study.
In the early-onset COPD families, none of the eight SNPs in EPHX1 were found to be associated in the primary analysis. In a secondary analysis using a dominant genetic model, one intronic SNP (rs1877724) was marginally associated with postbronchodilator values for both FEV1 (P = 0.02) and moderate-to-severe airflow obstruction (P = 0.02).
As noted above, the TNF rs3091257 SNP was out of Hardy-Weinberg Equilibrium (HWE) in the NAS control subjects and was removed from the analysis. All other SNPs and STRs tested in the study were found to be in HWE in the NAS control subjects. The results of the case-control association analysis of the biallelic markers (SNPs, insertion/deletion, and GSTM1 null allele) are shown in Table 5A. The only significant association was with a coding variant in microsomal epoxide hydrolase (EPHX1), His139Arg, referred to as the “fast” allele (P = 0.03 for an additive genetic model, adjusting for age and pack-years in a logistic regression); the fast allele appeared to be protective (odds ratio 0.73; 95% confidence interval 0.56, 0.96). The positive family-based association with TNF −308G>A was not replicated in the case-control study. The SFTPB Thr131Ile polymorphism was not significant in the primary case-control analysis. However, in a logistic regression model adjusting for age and pack-years that also included an interaction between SFTPB Thr131Ile and pack-years, both the main SNP effect (additive model) and the interaction term became significant (P = 0.01 for both main effect and interaction). In analyses of both additive and dominant genetic models, none of the other biallelic markers were predictive of COPD in the case-control study, in models that did and did not include gene-by-environment interactions.
The matrix metalloproteinase-9 (MMP9) STR (D20S838) was initially analyzed by grouping alleles into “Small” and “Large” repeat numbers, according to the method of Joos and colleagues (24); alleles ≤ 110 bp (≤ 16 repeats in the previous report) were classified as small, and those ≥ 112 bp (≥ 17 repeats) were considered large. The HMOX1 promoter polymorphism was analyzed as per Yamada and coworkers (16). Alleles were classified as Small (< 129 bp, < 27 repeats), Medium (129–139 bp, 27–32 repeats) and Large (≥ 141 bp, ≥ 33 repeats). Neither STR marker was significantly associated in these analyses.
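The allele groupings can be written as a short Python sketch. The function names are hypothetical. The cut-points assume that the MMP9 boundary falls between 110 and 112 bp and that the HMOX1 Large class begins at 141 bp. The size-to-repeat conversion for HMOX1 is inferred from the size and repeat pairs quoted in the text (for example, 135 bp for 30 repeats) and applies only to this assay.

```python
def hmox1_repeats(size_bp):
    """Repeat count implied by the quoted pairs (129 bp = 27, 135 bp = 30)."""
    return (size_bp - 75) // 2


def hmox1_class(size_bp):
    """Small / Medium / Large grouping of HMOX1 promoter STR alleles."""
    if size_bp < 129:
        return "Small"
    if size_bp <= 139:
        return "Medium"
    # Alleles differ in 2-bp steps, so the next observed size is 141 bp
    return "Large"


def mmp9_class(size_bp):
    """Small / Large grouping of MMP9 (D20S838) STR alleles."""
    return "Small" if size_bp <= 110 else "Large"
```

Under these assumptions, both the 30-repeat (135-bp) and 31-repeat (137-bp) HMOX1 alleles fall in the Medium class, which is why the grouped analysis cannot distinguish between them.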
The MMP9, HMOX1, and SFTPB STRs were then analyzed by comparing each allele with a frequency of at least 0.05 to all other alleles, using both additive and dominant genetic models (Table 5B). The 30-repeat allele (135 bp, allele frequency in NETT cases = 0.43) of the HMOX1 STR was significantly associated in the case-control analysis (adjusted P = 0.04, additive genetic model), whereas the 31-repeat allele (137 bp, allele frequency in Boston Early-Onset COPD Study families = 0.07) had been significant in the family-based study. The 135-bp allele was the most common repeat size and was underrepresented in the cases (allele test, odds ratio = 0.84; 95% CI, 0.68–1.04).
None of the three additional SNPs in TNF were significant in the case-control study; no evidence of gene-by-smoking interaction was found in this analysis. In the case-control study, none of the additional SNPs in EPHX1 was found to be associated with COPD. However, in a model that included a gene-by-smoking (pack-years) interaction, both the main effect (P = 0.02) and the interaction (P = 0.03) were significant for a silent coding variant in exon 3 (rs2292566). This SNP was not in linkage disequilibrium with the fast allele in exon 4 that was found to be associated with COPD (r2 < 0.1 in NAS control subjects).
Previous authors have examined gene-by-gene interactions in COPD, specifically among genes involved in xenobiotic metabolism (25, 26). The GSTM1 deletion was tested in two-way interactions with GSTP1 Ile105Val, the EPHX1 slow allele, and the EPHX1 fast allele in the case-control study. In additive models for the SNPs, none of the interactions was significant, in models including or excluding the main SNP effects. Similarly, three-way interactions between the GSTM1 deletion, GSTP1 Ile105Val, and either the EPHX1 fast or slow alleles were not significant.
We also tested for interactions between the EPHX1 fast polymorphism, TNF −308G>A, and SFTPB Thr131Ile, the SNPs that had been significant in either the family-based or case-control study. Neither the two-way interactions nor the three-way interaction was significant.
In the case-control population, two-SNP haplotype analyses in SFTPD and GC were not significant, though there was a trend toward association for GSTP1 (global score test, P = 0.06). Haplotype analyses in TNF, using all four SNPs as well as two-SNP sliding windows, were not significant. The eight-SNP haplotype analysis in EPHX1 was not significant, and none of the two- or four-SNP sliding window haplotypes of EPHX1 was associated with COPD.
In this study, we selected 29 polymorphisms in 12 genes that had been reported to be associated with COPD in the published literature and genotyped these variants in a family-based study of early-onset COPD and a case-control COPD study. The most significant association in the family-based study (TNF −308G>A) was not replicated in the case-control study, and the strongest association in the case-control study (EPHX1 fast allele) was not found in the family-based analysis. Two variants showed modest evidence for replication across both study designs. A coding SNP in surfactant protein B (Thr131Ile) that was marginally associated (P = 0.03) with one qualitative trait (moderate-to-severe airflow obstruction) in the Boston Early-Onset COPD families was not associated in the primary case-control analysis, but did show association (P = 0.01) when an SNP-by-smoking interaction was included. An STR in HMOX1 was significant in both studies, though different alleles were associated in each cohort.
The associations with SFTPB and HMOX1 merit further investigation. The different effects of gene-by-smoking interaction in the analyses of SFTPB Thr131Ile in the family-based and case-control studies and the different alleles of the HMOX1 repeat driving the associations in the two study designs suggest that these polymorphisms are not the functional variants affecting COPD susceptibility. The effects that we detected may be due to linkage disequilibrium with nearby functional variants. Analysis of additional SNPs in these genes will be required to confirm these genetic associations. Despite our positive results, we cannot exclude that these may be spurious associations due to the multiple comparisons performed.
Several explanations have been proposed for the lack of replication that is commonly seen in case-control association studies in complex trait genetics (5, 6). Small sample sizes may lead to inadequate power to detect an association in the initial study or to replicate true associations in subsequent studies. In fact, the majority of the COPD candidate gene association studies listed in Table 1 enrolled fewer than 100 cases and 100 control subjects. Insufficient power is not likely to explain the lack of replication seen in our study. Using the example of TNF −308G>A with a 17% minor allele frequency in the NAS control subjects, our case-control study had 90% power (α = 0.05) to detect an odds ratio of 1.8 in an additive model; this odds ratio is lower than that reported in either of the two published COPD association studies with significant results (27, 28). However, if the true odds ratio were lower than 1.8 in other populations, then the power to detect significant associations would be reduced.
Spurious associations may result from multiple testing in studies that assess many genes, markers, and phenotypes (29). No consensus exists on the optimal method to adjust for multiple testing in case-control genetic association studies, though replication in an independent study may provide the strongest evidence for true association. Multiple testing was a potential problem in our family-based study, given the multiple genes and phenotypes tested, though the independent case-control sample provided an opportunity to confirm the findings from the family-based study.
Genotyping errors usually bias toward no association, though systematic errors may lead to false positive results. Deviation from Hardy-Weinberg equilibrium (HWE) in the control group may be a sign of genotyping error (29). We found that only one of the markers tested deviated from HWE, and that SNP was excluded from the association analyses. Departure from Mendelian transmission of alleles is another indication of genotyping error that is only applicable to family-based studies; besides the excluded SNP above, only a small number of Mendelian inconsistencies were found in our study.
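A check for departure from HWE at a biallelic marker can be sketched in Python as below. Note that this uses the chi-square goodness-of-fit approximation, not the exact test applied to the control subjects in this study; the function name is hypothetical.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("HWE cannot be tested at a monomorphic marker")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 d.f. equals erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

A marked deficit or excess of heterozygotes produces a large statistic. The approximation becomes unreliable when expected genotype counts are small, which is one reason an exact test may be preferred.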
Failure to demonstrate HWE may also be a sign of population stratification, which refers to differences in allele frequency between cases and controls due to ethnic differences and not due to disease status (30). Population stratification can lead to spurious association in case-control studies (21). Careful matching of cases and control subjects on ethnicity provides some protection against population stratification. Several statistical methods, based on genotype data from additional unlinked markers elsewhere in the genome, are available to test for stratification and control for its effects if present (21, 31, 32). None of the published COPD genetic association studies have employed these formal tests. We tested a modest sized panel of SNPs in our case-control study and found little evidence for stratification.
The issues described may lead to false positive (multiple testing, population stratification) or false negative (small sample size, genotyping error) results. However, true differences may lead to inconsistent results. COPD is a heterogeneous disease and published association studies have used different phenotype definitions. For example, studies of TNF have defined cases on the basis of airflow obstruction (28), emphysema (33), decline in lung function (34), or chronic bronchitis (27). It is possible that a given genetic variant may confer susceptibility to a specific COPD-related phenotype. In our case-control study, the NETT cases all had emphysema confirmed by chest CT scan. Radiographic evidence of emphysema was not a requirement for entry into the Boston Early-Onset COPD Study, though many probands did have chest CT scans showing emphysema (8). In our family-based study we analyzed quantitative and qualitative traits, based on spirometry, but the case-control study used COPD diagnosis as a binary outcome. However, we used strict spirometric criteria to define cases and controls, so the overall conclusions should not be affected. The power may be greater using quantitative versus qualitative traits, however.
For the majority of the genes studied, we genotyped only one or two markers per gene, as has been done in most of the previously reported studies. This method relies on the assumption that the variants tested have functional effects on COPD susceptibility. If another variant in or near the gene were the causal variant, then the true association could be easily missed. Different linkage disequilibrium patterns with the functional variant may lead to variable results in different populations. In two genes, TNF and EPHX1, we tested additional SNPs and used haplotype analysis to study these genes more thoroughly. However, this did not strengthen our findings.
Genetic heterogeneity may also explain the varying results among case-control association studies, especially those done in different ethnic groups. Many of the COPD association studies in Table 1 have shown inconsistent results in white and Asian populations. True differences may be the result of different genetic determinants of disease in diverse populations, variation in gene–environment interaction due to specific environmental exposures, or different patterns of linkage disequilibrium between the tested marker and the causal variant (35). Though most of our study subjects were whites from the United States, it is possible that severe early-onset COPD represents a unique disease subtype with different genetic determinants than the usually seen, later-onset COPD. However, many of the family members in the Boston Early-Onset COPD Study had less severe airflow obstruction, consistent with more usual forms of COPD. Nevertheless, variants in several genes studied, including TNF-α and surfactant protein B, may primarily increase susceptibility to severe early-onset COPD. These results should be interpreted with caution due to the multiple tests performed; replication in an independent cohort is still required.
This study highlights the major difficulty with using a candidate gene approach to uncover susceptibility genes for COPD, namely the lack of replication commonly seen in candidate gene studies. Future candidate gene association studies need to employ rigorous genetic epidemiology methods, including adequate sample sizes, control for multiple testing, and testing for population stratification. A more systematic approach to COPD genetics, starting with genome-wide linkage analysis followed by positional candidate gene association testing and/or SNP-based fine mapping, may lead to more consistent results in the search for genetic determinants of COPD.
The authors thank Salvatore Mazza, Michael Hagar, Molly Brown, Alison Brown, and Maura Regan for their genotyping work and Robert Welch (National Cancer Institute) for providing details of the GSTM1 assay. Co-investigators in the NETT Genetics Ancillary Study include: Marcia Katz, Rob McKenna, Malcolm DeCamp, Mark Ginsburg, Neil MacIntyre, James Utz, Barry Make, Philip Diaz, Gerard Criner, Andrew Ries, Mark Krasna, Fernando Martinez, Larry Kaiser, Frank Sciurba, Zab Mosenifar, and Joshua Benditt.
This work was funded by NIH grants HL61575, HL71393, and HL075478 (E.K.S.) and by an American Lung Association Career Investigator Award (E.K.S.). C.P.H. is supported by T32-HL07427. The National Emphysema Treatment Trial (NETT) was supported by the US National Heart, Lung, and Blood Institute (contracts N01HR76101, N01HR76102, N01HR76103, N01HR76104, N01HR76105, N01HR76106, N01HR76107, N01HR76108, N01HR76109, N01HR76110, N01HR76111, N01HR76112, N01HR76113, N01HR76114, N01HR76115, N01HR76116, N01HR76118, N01HR76119), the Centers for Medicare and Medicaid Services, and the Agency for Healthcare Research and Quality. The Normative Aging Study is supported by the Cooperative Studies Program/ERIC of the US Department of Veterans Affairs and is a component of the Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC).
This article has an online supplement, which is accessible from this issue's table of contents at www.atsjournals.org
Conflict of Interest Statement: C.P.H. has no declared conflicts of interest; D.L.D. has no declared conflicts of interest; C.L. has no declared conflicts of interest; A.A.L. has no declared conflicts of interest; J.J.R. has no declared conflicts of interest; D.K. has no declared conflicts of interest; N.L. has no declared conflicts of interest; J.S.S. has no declared conflicts of interest; D.S. has no declared conflicts of interest; F.E.S. has no declared conflicts of interest; S.T.W. received a grant for $900,065, Asthma Policy Modeling Study, from AstraZeneca from 1997–2003. He has been a co-investigator on a grant from Boehringer Ingelheim to investigate a COPD natural history model which began in 2003. He has received no funds for his involvement in this project. He has been an advisor to the TENOR Study for Genentech and has received $5,000 for 2003–2004. He received a grant from GlaxoWellcome for $500,000 for genomic equipment from 2000–2003. He was a consultant for Roche Pharmaceuticals in 2000 and received no financial remuneration for this consultancy; E.K.S. received grant support and honoraria from GlaxoSmithKline for a study of COPD genetics and received a $500 Speaker Fee from Wyeth for a talk on COPD genetics.