Pulmonary function measures reflect respiratory health and predict mortality, and are used in the diagnosis of chronic obstructive pulmonary disease (COPD). We tested genome-wide association with the forced expiratory volume in 1 second (FEV1) and the ratio of FEV1 to forced vital capacity (FVC) in 48,201 individuals of European ancestry, with follow-up of top associations in up to an additional 46,411 individuals. We identified new regions showing association (combined P<5×10−8) with pulmonary function, in or near MFAP2, TGFB2, HDAC4, RARB, MECOM (EVI1), SPATA9, ARMC2, NCR3, ZKSCAN3, CDC123, C10orf11, LRP1, CCDC38, MMP15, CFDP1, and KCNE2. Identification of these 16 new loci may provide insight into the molecular mechanisms regulating pulmonary function and into molecular targets for future therapy to alleviate reduced lung function.
Pulmonary function, reliably measurable by spirometry, is a heritable trait reflecting the physiological state of the airways and lungs1. Pulmonary function measures are important predictors of population morbidity and mortality2-4, and are used in the diagnosis of chronic obstructive pulmonary disease (COPD), which ranks among the leading causes of death in developed and developing countries5,6. A reduced ratio of forced expiratory volume in 1 second (FEV1) to forced vital capacity (FVC) is used to define airway obstruction, and a reduced FEV1 is used to grade the severity of airway obstruction7.
Recently, two large genome-wide association studies (GWAS), each comprising discovery sets of more than 20,000 individuals of European ancestry, identified novel loci for lung function8,9. Recognizing the need for larger datasets to increase the power to detect loci of individually modest effect size, we conducted a meta-analysis of 23 lung function GWAS comprising a total of 48,201 individuals of European ancestry (Stage 1) and followed up potentially novel loci in 17 further studies comprising up to 46,411 individuals (Stage 2). We identified 16 additional novel loci for lung function, and provided evidence corroborating association of loci previously associated with lung function8-11. Our findings implicate a number of different mechanisms underlying regulation of lung function and highlight loci shared with complex traits and diseases, including height, lung cancer, and myocardial infarction.
Meta-analyses for cross-sectional lung function measures were undertaken for approximately 2.5 million genotyped or imputed SNPs across 23 studies with a combined sample size of 48,201 adult individuals of European ancestry. Characteristics of the cohort participants and the genotyping are shown in Supplementary Tables 1A and 1B. FEV1 and FEV1/FVC were adjusted for ancestry principal components, age, age², sex, and height as covariates. Association testing of the inverse-normal transformed residuals for FEV1 and FEV1/FVC assumed an additive genetic model and was stratified by ever-smoking (versus never-smoking) status. Meta-analyses of the smoking strata within study, and the study-specific results, were undertaken using inverse variance weighting (the inverse of the standard error squared was used as the weight). We applied genomic control twice at study level (to each smoking stratum separately and to the study level pooled estimates) and also at meta-analysis level to avoid inflation of test statistics due to cryptic population structure or relatedness (see Supplementary Table 1A for study level estimates). Our application of genomic control at the three stages is likely to be overly conservative because it has recently been shown that in large meta-analyses, test statistics are expected to be elevated under polygenic inheritance even when there is no population structure12. Test statistic inflation (λGC) prior to applying genomic control at meta-analysis level was 1.12 for FEV1 and 1.09 for FEV1/FVC. Genomic inflation estimates increase with sample size, as has been shown for other traits13-15; estimates standardised to a sample of 1,000 individuals (λGC_1000) were 1.002 for FEV1 and 1.002 for FEV1/FVC.
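The standardisation of λGC to a sample of 1,000 individuals can be illustrated with a short sketch. It assumes the commonly used linear rescaling λGC_1000 = 1 + (λGC − 1) × 1000/N, which the text does not spell out; the function name `lambda_1000` is hypothetical. Under that assumption, the reported inflation factors and the Stage 1 sample size give values that round to the figures quoted above.

```python
def lambda_1000(lambda_gc: float, n: int, n_ref: int = 1000) -> float:
    """Rescale a genomic-control inflation factor to a reference sample size.

    Assumes the linear rescaling 1 + (lambda - 1) * n_ref / n. This formula
    is an assumption for illustration; the text does not state it.
    """
    return 1.0 + (lambda_gc - 1.0) * n_ref / n


N_STAGE1 = 48_201  # Stage 1 sample size reported in the text

# Inflation factors reported before genomic control at meta-analysis level.
for trait, lam in {"FEV1": 1.12, "FEV1/FVC": 1.09}.items():
    print(f"{trait}: lambda_1000 = {lambda_1000(lam, N_STAGE1):.3f}")
```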
Plots of meta-analysis P values for FEV1 and FEV1/FVC against a uniform distribution of P values expected under the null hypothesis showed deviations which were attenuated, but persisted, after removal of SNPs in loci reported previously, consistent with additional loci being associated with lung function (Supplementary Figure 1A).
Twenty-nine new loci showing evidence of association with lung function (P<3×10−6) in Stage 1 were followed up in Stage 2 by utilizing in silico data from seven studies, and by undertaking additional genotyping in 10 studies for the 10 highest-ranked SNPs (Figure 1). Full details of the SNP selection are given in the Online Methods. Inverse variance weighting meta-analysis was performed across Stages 1 and 2, and two-sided P values were obtained for the pooled estimates. Sixteen new loci reached genome-wide significance (P<5×10−8) and showed consistent direction of effects in both stages, comprising 12 new loci for FEV1/FVC, 3 new loci for FEV1, and one new locus reaching genome-wide significance for both traits (Figure 2, Table 1). To assess the heterogeneity across studies included in Stage 1 and Stage 2, chi-square tests were undertaken for all 16 SNPs; none was statistically significant after applying a Bonferroni correction for 16 tests. The sentinel SNPs at these loci were in or near the genes MFAP2 (1p36.13), TGFB2/LYPLAL1 (1q41), HDAC4/FLJ43879 (2q37.3), RARB (3p24.2), MECOM (EVI1) (3q26.2), SPATA9/RHOBTB3 (5q15), ARMC2 (6q21), NCR3/AIF1 (6p21.33), ZKSCAN3 (6p22.1), CDC123 (10p13), C10orf11 (10q22.3), LRP1 (12q13.3), CCDC38 (12q22), MMP15 (16q13), CFDP1 (16q23.1) and KCNE2/C21orf82 (21q22.11) (Supplementary Figures 1B and 1C). The strongest signals in AGER (rs2070600)8,9 and two of the novel signals (rs6903823 in ZKSCAN3 and rs2857595, upstream of NCR3) lie within a ~3.8 Mb interval at 6p21.32-22.1 which is characterised by long-range linkage disequilibrium. Nevertheless, the leading SNPs in these regions, which are within the major histocompatibility complex (MHC), were statistically independent (Supplementary Note).
We investigated mRNA expression of the nearest gene for each of the 16 novel loci in human lung tissue and in a range of human primary cells including lung, brain, airway smooth muscle cells and bronchial epithelial cells. Transcripts were detected for all selected genes in lung tissue except CCDC38, and transcripts for most genes were also detected in airway smooth muscle cells and in bronchial epithelial cells (Table 2). As we were unable to detect expression of CCDC38 in any tissue, we also examined expression of SNRPF, which is the adjacent gene (Table 2), and found expression in all four cell types. TGFB2, MFAP2, EVI1 and MMP15 were expressed in one or more lung cell types but not in peripheral blood mononuclear cells, providing evidence that these genes may exhibit tissue-specific expression.
We assessed whether SNPs in these new regions, or their proxies (r2>0.6), were associated with gene expression using a database of expression-associated SNPs in lymphoblastoid cell lines16. Four loci showed regional (cis) effects on expression (P<1×10−7, Supplementary Note). A proxy for our sentinel SNP in CFDP1, rs2865531, coincided with the peak of the expression signal for CFDP1 and the strongest proxy for rs6903823 in ZKSCAN3 coincided with the peak of expression for ZSCAN12.
The putative functions of the genes within, or closest to, the association peaks identify a range of plausible mechanisms for impacting lung function. The most statistically significant new signal for FEV1/FVC (P=7.5×10−16) was in the gene encoding MFAP2, an antigen of elastin-associated microfibrils17, although correlated SNPs in the region potentially implicate other genes that could plausibly influence lung function, such as CROCC, which encodes rootletin, a component of cilia18. Our second strongest new signal, also for FEV1/FVC, was in the gene encoding the retinoic acid receptor beta (RARB). Rarb-null knockout mice exhibit premature alveolar septation19. The third most statistically significant new signal for FEV1/FVC, and the most statistically significant new signal for FEV1, was in CDC123. This was the only novel region to show genome-wide association with both traits. CDC123 encodes a homologue of a yeast cell division cycle protein which plays a critical role in modulating eukaryotic initiation factor 2 in times of cell stress20. The fourth signal for FEV1/FVC is downstream of HDAC4, which encodes a histone deacetylase; reductions in the expression of other histone deacetylases (specifically HDAC2, HDAC5 and HDAC8) have been noted in COPD21. The regions we observed in the MHC are much more difficult to localize, with multiple genes being tagged by the top SNP, including non-synonymous SNPs in ZKSCAN3, PGBD1, ZSCAN12, ZNF323, TCF19, LTA, C6orf15 and GPANK1 (also known as BAT4) (Supplementary Table 2). At 6p21.33, the strongest association with lung function was observed for rs2857595, which is in linkage disequilibrium (LD, r2=0.47) with a non-synonymous SNP in LTA (encoding lymphotoxin alpha) and with a SNP in the upstream promoter region of TNFA (encoding tumour necrosis factor alpha, r2=0.86), both of which are plausible candidates22,23.
Our top SNP in MMP15 is in strong LD (r2=1) with a non-synonymous SNP (rs3743563, which has an association with FEV1/FVC at P=1.8×10−7) within the same gene. Plausible mechanisms implicated by the other novel signals of association with lung function reported here include TGF-beta signalling; TGFB2 expression is upregulated in bronchial epithelial cells in asthma24. The putative function of key genes (as defined by LD with the leading SNP) in each of the 16 loci, and relevant findings from animal models, are summarised in Table 2 and detailed in Supplementary Table 2.
Alleles representing 11 of the 16 novel loci showed directionally consistent effects on lung function in 6,281 children (7 to 9 years of age) (Supplementary Table 3A) suggesting that genetic determination of lung function in adults may in part act via effects on lung development, or alternatively, that some genetic determinants of lung growth and lung function decline are shared.
Although we stratified for ever-smoking versus never-smoking, we did not adjust for the amount smoked. In order to investigate the possibility that the associations at any of our 16 novel regions were driven by an effect of the SNP on smoking behaviour, we evaluated in silico data for associations with smoking amount from the Ox-GSK consortium25 for the leading SNPs in these 16 regions. None of these 16 SNPs showed statistically significant association with the number of cigarettes smoked per day (Supplementary Table 3B).
In addition, in our Stage 1 and Stage 2 datasets combined, we assessed whether the estimated effect sizes of the variants on lung function phenotypes differed substantially between ever-smokers and never-smokers (Supplementary Table 4) across the 16 loci. For the most strongly associated trait at each locus, we tested the SNP interaction with ever-smoking (versus never-smoking). None of the 16 novel loci showed a significant interaction (Bonferroni corrected threshold for 16 independent SNPs P=0.003125). These analyses suggest that the genetic effects we have identified underlie lung function variability irrespective of smoking exposure.
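The Bonferroni threshold quoted above follows from dividing the family-wise significance level by the number of tests. The sketch below assumes a family-wise level of 0.05, which the text implies but does not state explicitly; the function name is hypothetical.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction.

    alpha = 0.05 is an assumed family-wise level; the text quotes only
    the resulting per-test threshold.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# 16 independent SNPs tested for interaction with ever-smoking.
print(bonferroni_threshold(16))
```

The same function reproduces the other corrected thresholds quoted in the text when called with 15 tests (lung cancer look-up) or 180 tests (height loci).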
Our lung function associations were adjusted for height, but there are some overlaps between loci associated with height and those associated with lung function. Therefore, we evaluated in silico data for height associations of our novel regions in the GIANT consortium14 dataset. The G allele of rs2284746 (MFAP2, intron), which was associated with decreased FEV1/FVC, was associated with increased height (Supplementary Table 3C).
Given reported associations between lung cancer and either COPD or lung function decline, we also assessed in silico data for sentinel or proxy SNPs in these 16 regions for associations with lung cancer in the International Lung Cancer Consortium (ILCCO) GWAS meta-analysis26. Alleles associated with reduced lung function were associated with risk of lung cancer at the strongest available proxy SNP for rs2857595 (upstream of NCR3) at 6p21.33 (rs3099844, r2=0.67), and the strongest proxy SNP for rs6903823 (SNP in intron of ZKSCAN3 and ZNF323) at 6p22.1 (rs209181, r2=0.69) (lung cancer associations, P=2.2×10−7 and P=3.4×10−5, respectively, Supplementary Table 3D). No significant associations with lung cancer were seen at the other new loci (proxy SNPs were available for 15 of the 16 loci, Bonferroni corrected P<0.0033).
In addition to the effects on height, smoking and lung cancer described above, we examined the literature for evidence for associations with other traits for each of the 16 new loci (detailed in Supplementary Table 2). Genome-wide significant associations (P<5×10−8) have been reported in KCNE2 with myocardial infarction27, and at 6p21.33 near NCR3/AIF1, with neonatal lupus28 and with systemic lupus erythematosus29. Other significant complex disease associations have also been noted in the regions of CDC123 (type 2 diabetes30), CFDP1 (type 1 diabetes31) and MECOM (blood pressure32,33), but with weaker LD (r2<0.3) between the reported SNP and the sentinel SNP for lung function in the region (Supplementary Table 2).
Associations in 10 loci previously reported for lung function8,9 reached genome-wide significance (P<5×10−8) in our Stage 1 data, namely loci in or near TNS1, FAM13A, GSTCD/NPNT, HHIP, HTR4, ADAM19, AGER, GPR126, PTCH1, and THSD4 (Supplementary Table 5A). Thus, a total of 26 regions showed genome-wide significant association with lung function in our study. In aggregate, variants at these 26 regions explain approximately 3.2% of the additive polygenic variance for FEV1/FVC and 1.5% for FEV1 (see Supplementary Note). Following the approach described by Park et al.34 we estimated that there is a total of 102 (95% CI 57-155) independent variants with similar effect sizes to the 26 variants we report. In combination these 102 variants, comprising 26 discovered variants and 76 putative undiscovered variants, collectively explain around 7.5% of the additive polygenic variance for FEV1/FVC and 3.4% for FEV1 (see Supplementary Table 6, Online Methods and Supplementary Note).
In a meta-analysis of 23 studies comprising 48,201 individuals of European ancestry and follow-up in 17 studies comprising up to 46,411 individuals, we report genome-wide significant associations with an additional 12 regions for FEV1/FVC, an additional three regions for FEV1 and one additional region associated with both FEV1 and FEV1/FVC. We also confirm genome-wide association with 10 regions previously associated with lung function, bringing to 26 the total number of loci associated with lung function in these data. Most of the new loci are in regions not previously suspected to have been involved in lung development, the control of pulmonary function or risk of developing COPD. Elucidating the mechanisms through which these regions influence lung function should lead to a more complete understanding of lung function regulation and the pathogenesis of COPD. Four of the new loci that we show to be associated with lung function (MFAP2, ZKSCAN3, near NCR3 and near KCNE2) are also associated with other complex traits and diseases (P<5×10−8 for the other trait at a SNP with r2>0.3 with the top lung function SNP in the region). Understanding the intermediates underlying these pleiotropic effects could also reveal crucial insights into the pathophysiology of lung disease. One potential explanation is that these loci underlie control of the mechanisms regulating the development and resolution of inflammation and subsequent tissue remodelling in a range of tissues.
The effect sizes of the variants in the 26 loci associated with lung function explain a modest proportion of the additive genetic variance in FEV1/FVC and in FEV1, even after accounting for putative undetected variants with a similar distribution of effect sizes34. Our findings are consistent with those from other common complex traits, where it is thought that many as yet unidentified common and rare sequence variants, and potentially structural variants, could explain the remaining heritability35. That our study more than doubled the number of loci known to be associated with lung function underlines the utility of large sample sizes to achieve the power to detect common variants associated with complex traits. Nevertheless, it is likely that additional variants with similar effect sizes remain undiscovered14. In addition, our study was not designed to detect rare variants or structural variants associated with lung function. Identification of rare variants associated with lung function could be helpful in narrowing the scope of ongoing functional work to those genes most likely to be causally related to the association signals we detected.
Our study focused on cross-sectional measures of lung function. Adult lung function at a particular time point is influenced by the peak lung function achieved by 25-35 years of age, as well as the rate of decline of lung function after that peak36. The 26 loci now confirmed to be associated with lung function could affect either pre- or post-natal lung development and growth or decline in lung function during adulthood, or both. We showed consistent directions of estimated effects on lung function between adults and children at 7-9 years of age for SNPs at 11 of the 16 new loci, and eight of 10 previously reported loci (Supplementary Table 3A). The results we show for lung function in children provide some indication that these loci affect lung function development, although studies in larger populations of children would provide greater clarity for SNPs in the new loci. Further investigations will be required in large populations with longitudinal data to delineate the influence of these variants on the rates of development of, and decline in, lung function and on the risk of developing COPD.
Of the sentinel SNPs at the 16 new loci associated with lung function, only rs2284746 (MFAP2) was associated with height in the GIANT consortium14 dataset. The G allele of rs2284746 was associated with both increased height and reduced lung function. A similar relationship between lung function and height was previously reported for the G allele of rs3817928 in GPR126 (refs. 8,14), which is associated with decreased height, but with increased FEV1/FVC. A further three of the 180 loci found to be associated with height14 showed association (for 180 loci, Bonferroni corrected threshold P=2.8×10−4) with either FEV1 (CLIC4 and BMP6) or FEV1/FVC (PIP4K2B) (Supplementary Table 3E). In each case, the allele associated with an increase in height was associated with a decrease in lung function. This is not the case for the association of rs1032296 near HHIP, which has shown consistent directions of effects on lung function and height14,11. However, the strongest SNP associated with height in the HHIP region lies within an intron of HHIP but shows no association with FEV1 or FEV1/FVC. Furthermore, while height is an important predictor of FEV1, this is not true for its ratio to FVC37. These observations argue against the associations with lung function at these loci being simply due to incomplete adjustment for height.
We stratified by ever- and never-smoker status in our analyses. In our investigation of amount smoked in the Ox-GSK consortium25, none of the sentinel SNPs in the 16 new regions showed association with the number of cigarettes smoked per day. Additionally, none of these regions was associated with ever-smoking in the Ox-GSK consortium data (Supplementary Table 3B). Thus the SNP associations with lung function we observed are unlikely to have arisen simply as a consequence of inadequate adjustment for smoking.
We did not observe any interactions with ever-smoking for any of the sentinel SNPs in the 16 new regions that exceeded a Bonferroni-corrected significance level (for 16 SNPs). Thus, the effects on lung function of the novel variants we identified are apparent in both ever-smokers and in never-smokers, and the effects of smoking and of these genetic variants may be independent and additive.
In other common complex diseases, follow up studies that incorporate common genetic risk variants into models to predict disease have not been shown to add substantially to existing risk models, particularly when such models already include family history38,39. The same may also prove to be true for the 26 genetic variants described in this paper, as the effect size of any individual variant is small, but further work is required in this area. The major utility of our findings will be in the knowledge they provide about previously unknown pathways underlying lung function. Elucidating the mechanisms that these genes are involved in will lead to improved understanding of the regulation of lung function and potentially to new therapeutic targets for COPD.
We thank the many colleagues who contributed to collection and phenotypic characterization of the clinical sampling, genotyping, and analysis of the data. We especially thank those who kindly agreed to participate in the studies.
Major funding for this work is from the following sources (alphabetical): Academy of Finland (project grants 104781, 120315, 129269, 1114194, Center of Excellence in Complex Disease Genetics (213506 and 129680) and SALVE); Althingi (Icelandic Parliament); Arthritis Research Campaign; Asthma UK; AstraZeneca; AXA Research Fund; Biotechnology and Biological Sciences Research Council (BBSRC) (BB/F019394/1, G20234); British Heart Foundation (PG/97012, PG/06/154/22043, FS05/125); British Lung Foundation; Canadian Institutes of Health Research (Grant ID MOP-82893); Cancer Research United Kingdom; Chief Scientist Office, Scottish Government Health Directorate (CZD/16/6); Croatian Institute for Public Health; UK Department of Health; Dutch Kidney Foundation; Erasmus Medical Center and Erasmus University, Rotterdam; Estonian Genome Center, University of Tartu, Estonia (SF0180142s08); EU funding (GABRIEL GRANT Number: 018996, ECRHS II Coordination Number: QLK4-CT-1999-01237); European Commission (DG XII, EURO-BLCS, FP-5 QLG1-CT-2000-01643, FP-6 LSHB-CT-2006-018996 (GABRIEL), FP-6 LSHG-CT-2006-018947 (EUROSPAN), FP-6 GenomEUtwin project QLG2-CT-2002-01254, FP7/2007-2013: HEALTH-F2-2008-201865, GEFOS, HEALTH-F2-2008-35627, TREAT-OA, HEALTH-F4-2007-201413 (ENGAGE)); Finnish Foundation for Cardiovascular Research; Flight Attendant Medical Research Institute (FAMRI); German Asthma and COPD Network (COSYCONET: BMBF grant 01GI0883); German Bundesministerium fuer Forschung und Technologie (01 AK 803 A-H, 01 IG 07015 G); German Federal Ministry of Education and Research (BMBF) (03ZIK012, 01ZZ9603, 01ZZ0103, and 01ZZ0403); German National Genome Research Network (NGFN-2 and NGFN-plus); German Ministry of Cultural Affairs; GlaxoSmithKline; Gyllenberg Foundations; Healthway, Western Australia; Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany; Healthcare and Bioscience iNet (funded by the East Midlands Development Agency, part-financed by the European Regional Development Fund, delivered by Medilink East Midlands); Higher Education Funding Council for England (HEFCE); Hjartavernd (Icelandic Heart Association); Innsbruck Medical University; Institute for Anthropological Research in Zagreb; International Osteoporosis Foundation; Intramural Research Program of the NIH, National Institute on Aging and National Institute of Environmental Health Sciences; Jalmari and Rauha Ahokas Foundation; Juvenile Diabetes Research Foundation International (JDRF); Lifelong Health and Wellbeing Initiative (G0700704/84698); Medical Research Council UK (G1000861, G0501942, G0902313, G0000934, G0800582, G0500539, G0600705, PrevMetSyn/SALVE, G9901462); Medical Research Fund of the Tampere University Hospital; Ministry of Science, Education and Sport of the Republic of Croatia (108-1080315-0302); Medical Research Council Human Genetics Unit; Medisearch-The Leicester Medical Research Foundation; Munich Center of Health Sciences (MC Health) as part of LMUinnovativ; National Health and Medical Research Council of Australia (Grant ID 403981 and ID 003209); National Human Genome Research Institute (NHGRI) (U01-HG-004729, U01-HG-004402); National Institute for Health Research (NIHR) Comprehensive Biomedical Research Centres (Guy’s & St Thomas’ NHS Foundation Trust in partnership with King’s College London and Cambridge University Hospitals NHS Foundation Trust in partnership with the University of Cambridge); Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO) (050-060-810); Netherlands Organization for the Health Research and Development (ZonMw); Netherlands Organization of Scientific Research NWO (1750102007006, 175.010.2005.011, 911-03-012); Northern Netherlands Collaboration of Provinces (SNN); Norwegian University of Science and Technology; Novo Nordisk; Ontario Institute of Cancer Research and Canadian Cancer Society Research Institute (CCSRI 020214); Republic of Croatia Ministry of Science, Education and Sports research grants (108-1080315-0302); Research Institute for Diseases in the Elderly (RIDE) (014-93-015: RIDE2); Research Into Ageing (251); Siemens Healthcare, Erlangen, Germany and the Federal State of Mecklenburg-West Pomerania; Social Ministry of the Federal State of Mecklenburg-West Pomerania; Structure Enhancing Fund (FES) of the Dutch government; Swedish Heart and Lung Foundation grant 20050561; Swedish Research Council for Worklife and Social research (FAS), grants 2001-0263, 2003-0139; Swiss National Science Foundation (grants no 4026-28099, 3347CO-108796, 3247BO-104283, 3247BO-104288, 3247BO-104284, 32-65896.01, 32-59302.99, 32-52720.97, 32-4253.94); The Asthma, Allergy and Inflammation Research Trust; The Great Wine Estates of the Margaret River region of Western Australia; The Netherlands’ Ministry of Economic Affairs, Ministry of Education, Culture and Science and Ministry for Health, Welfare and Sports; The Royal Society; The University of Split and Zagreb Medical Schools; Tromsø University; U01 DK062418; UBS Wealth Foundation Grant BA29s8Q7-DZZ; UK Department of Health Policy Research Programme; University Hospital Oulu, Biocenter, University of Oulu, Finland (75617); University Medical Center Groningen; University of Bristol; University of Leicester HEFCE CIF award; University of Nottingham; US National Institutes of Health (NIH) (1P50 CA70907, RO1 CA121197, U19 CA148127, CA55769, CA127219, R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071252, R01HL071258, R01HL071259, UL1RR025005, contracts HHSN268200625226C, HHSN268200782096C, R01-HL084099); US NIH National Cancer Institute (RO1CA111703); US NIH National Center for Research Resources (grants M01-RR00425 and 5M01 RR00997); US NIH National Eye Institute (NEI); US NIH National Heart, Lung and Blood Institute (contracts N01-HC-85079 through N01-HC-85086, N01-HC-35129, N01 HC-15103, N01 HC-55222, N01-HC-75150, N01-HC-45133, N01-HC-95095, N01-HC-48047, N01-HC-48048, N01-HC-48049, N01-HC-48050, N01-HC-45134, N01-HC-05187, N01-HC-45205, N01-HC-45204, N01 HC-25195, N01-HC-95159 through N01-HC-95169, RR-024156, N02-HL-6-4278, R01 HL-071022, R01 HL-077612, R01 HL-074104, RC1 HL100543, HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, and HHSN268201100012C, grants HL080295, HL087652, HL105756, R01-HL-084099, R01HL087641, R01HL59367, R01HL086694, HL088133, HL075336 5R01HL087679-02 through the STAMPEED program (1RL1MH083268-01), 1K23HL094531-01); US NIH National Institute of Allergy and Infectious Diseases (NIAID); US NIH National Institute of Child Health and Human Development (NICHD); US NIH National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) (DK063491); US NIH National Institute of Environmental Health Sciences (NIEHS) (ZO1 ES49019, ES015794); US NIH National Institute of Mental Health (NIMH) (5R01MH63706:02); US NIH National Institute of Neurological Disorders and Stroke (NINDS); US NIH National Institute on Aging (NIA) (R01 AG032098, RC1 AG035835, N01AG12100, N01AG62101, N01AG62103, N01AG62106, 1R01AG032098-01A1, AG-023269, AG-15928, AG-20098, AG-027058); Wellcome Trust (077016/Z/05/Z, GR069224, 068545/Z/02, 076113/B/04/Z, 079895).
The study consisted of two stages. Stage 1 was a meta-analysis conducted on directly genotyped and imputed SNPs from individuals of European ancestry in 23 studies, with a total of 48,201 individuals. Supplementary Table 1A gives details of these studies. Thirty-four SNPs selected according to the results in Stage 1 were followed up in Stage 2. The ten leading SNPs were followed up in up to 46,411 individuals of European origin and the remaining 24 SNPs were followed up in a subset of up to 21,674 individuals (Figure 1).
A total of 23 studies, 17 from the SpiroMeta consortium and six studies from the CHARGE consortium, formed Stage 1: AGES, ARIC, B58C T1DGC, B58C WTCCC, BHS1, CHS, ECRHS, EPIC (obese cases and population-based studies), the EUROSPAN studies (CROATIA-Korcula, ORCADES and CROATIA-Vis), FHS, FTC (incorporating the FinnTwin16 and Finnish Twin Study on Aging), Health 2000, Health ABC, KORA F4, KORA S3, NFBC1966, RS-I, RS-II, SHIP and TwinsUK-I (see Supplementary Table 1 for definitions of abbreviations). Measurements of spirometry for each study are described in the Supplementary Note. The genotyping platforms and quality-control criteria implemented by each study are described in Supplementary Table 1B.
Imputation of non-genotyped SNPs was undertaken with MACH (ref. 47), IMPUTE (ref. 48) or BIMBAM (ref. 49) with pre-imputation filters and parameters as shown in Supplementary Table 1B. SNPs were excluded if the imputation information, assessed using r2.hat (MACH), .info (IMPUTE) or OEvar (BIMBAM), was <0.3. In total, 2,706,349 SNPs were analyzed.
FEV1 (milliliters) and FEV1/FVC (percent) were each regressed on age, age², sex, height and ancestry principal components using linear regression. The residuals were transformed to ranks and then transformed to normally distributed z-scores. These transformed residuals were then used as the phenotype for association testing under an additive genetic model, separately for ever-smokers and for never-smokers. The software used is specified in Supplementary Table 1B. Appropriate tests for association in related individuals were applied where necessary, as described in the Supplementary Note.
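The rank-based transformation of residuals to z-scores described above can be sketched as follows. This is a minimal illustration, not the studies' actual code: the residuals are taken as given, the rank offset (rank − 0.5)/n and the averaging of tied ranks are assumptions (the text does not state which convention was used), and the function name is hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(residuals):
    """Map residuals to normal z-scores via their ranks.

    Assumptions (not stated in the text): tied values share their average
    rank, and a rank r among n values maps to the quantile (r - 0.5) / n.
    """
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])

    # Assign 1-based ranks, averaging over runs of tied values.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and residuals[order[j + 1]] == residuals[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Convert each rank to the matching standard normal quantile.
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical residuals; the outlier is pulled in to a moderate z-score.
print(inverse_normal_transform([10.0, -5.0, 2.0, 250.0, -1.0]))
```

Because only the ranks enter the transformation, extreme residuals cannot dominate the subsequent association test.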
All Stage 1 study effect estimates, both for ever-smokers and never-smokers, were corrected using genomic control50 and were oriented to the forward strand of the NCBI build 36 reference sequence of the human genome, consistently using the alphabetically higher allele as the coded allele. Study-specific lambda estimates are shown in Supplementary Table 1. For each study, effect estimates and standard errors for ever-smokers and never-smokers were meta-analysed using inverse variance weighting. Genomic control was applied again to the pooled effect-size estimates for each study. Finally, effect-size estimates and standard errors were combined across studies using inverse variance weighting meta-analysis, and genomic control was applied to the pooled effect-size estimates. To describe the effect of imperfect imputation on power, for each SNP we report the effective sample size (N effective), defined as the sum of the study-specific products of the sample size and the imputation quality metric. Meta-analysis statistics and figures were produced using R version 2.9.2.
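The building blocks of this pipeline can be sketched in a few functions. The analyses themselves were run in R; the Python below is only an illustration under stated assumptions. The weights (inverse of the squared standard error) and the definition of N effective follow the text. The estimate of lambda as the median 1-df chi-square statistic divided by its expected median under the null, and the correction of standard errors by the square root of lambda only when lambda exceeds 1, are conventional choices assumed here rather than taken from the paper. All function names and input values are hypothetical.

```python
import math
from statistics import NormalDist, median

# Expected median of a 1-df chi-square statistic under the null (about 0.455).
NULL_MEDIAN_CHISQ = NormalDist().inv_cdf(0.75) ** 2


def ivw_meta(betas, ses):
    """Pool effect estimates using weights equal to 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def genomic_control_lambda(betas, ses):
    """Estimate lambda from the median chi-square statistic across SNPs."""
    chisq = [(b / se) ** 2 for b, se in zip(betas, ses)]
    return median(chisq) / NULL_MEDIAN_CHISQ


def apply_genomic_control(se, lam):
    """Inflate a standard error by sqrt(lambda); no change if lambda <= 1
    (an assumed convention)."""
    return se * math.sqrt(max(lam, 1.0))


def n_effective(sample_sizes, imputation_quality):
    """Sum over studies of sample size times imputation quality."""
    return sum(n * q for n, q in zip(sample_sizes, imputation_quality))


# Hypothetical example: one SNP, ever- and never-smoker strata of one study.
beta, se = ivw_meta([0.10, 0.20], [0.10, 0.10])
print(beta, se)
print(n_effective([1000, 2000], [0.9, 0.5]))
```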
All regions selected for follow-up in Stage 2 met three criteria: they contained a lead SNP with novel evidence of association (P<3×10−6) with FEV1 and/or FEV1/FVC; the lead SNP had an N effective ≥70% of the total Stage 1 sample size; and the association signals from surrounding SNPs were consistent with their correlation (linkage disequilibrium) with the leading SNP. Twenty-nine independent regions with a leading SNP meeting these criteria were assessed in Stage 2. Regions were defined as independent if the leading SNP from one region was >500kb from the leading SNP of any other region. Long-range linkage disequilibrium was also investigated between leading SNPs of regions in or near the MHC on chromosome 6 (Supplementary Note). For two regions, the leading SNP had an N effective ≥70% but <80% of the Stage 1 sample size and therefore a proxy SNP (r2=1 and 0.97 for the two regions) was also taken forward. For three regions, there were different leading SNPs for FEV1 and FEV1/FVC and so both leading SNPs were assessed. A total of 34 SNPs were analysed in Stage 2 and are listed in Supplementary Table 5B. Previously reported regions8-11,51,52 were not followed up. We present association test statistics in Stage 1 only for relevant SNPs from previously reported regions in Supplementary Table 5A.
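The numeric parts of these selection criteria can be expressed as a simple filter, sketched below in Python. This is an assumption-laden illustration, not the procedure actually used. The function name, the dictionary keys and the greedy ordering by P value are hypothetical. The check that surrounding signals are consistent with linkage disequilibrium is not modelled, and neither is the exclusion of previously reported regions.

```python
def select_lead_snps(snps, p_threshold=3e-6, min_neff_fraction=0.70,
                     total_n=48201, min_distance_bp=500_000):
    """Pick independent lead SNPs from Stage 1 results.

    snps: list of dicts with hypothetical keys 'id', 'chrom', 'pos',
    'p' and 'n_eff'. SNPs are considered in order of increasing P value,
    and a SNP is kept only if it lies more than min_distance_bp from
    every lead SNP already kept on the same chromosome.
    """
    candidates = [
        s for s in snps
        if s["p"] < p_threshold and s["n_eff"] >= min_neff_fraction * total_n
    ]
    leads = []
    for snp in sorted(candidates, key=lambda s: s["p"]):
        if all(
            snp["chrom"] != lead["chrom"]
            or abs(snp["pos"] - lead["pos"]) > min_distance_bp
            for lead in leads
        ):
            leads.append(snp)
    return leads


if __name__ == "__main__":
    # Made-up SNP records, for illustration only.
    example = [
        {"id": "snpA", "chrom": 1, "pos": 1_000_000, "p": 1e-8, "n_eff": 45000},
        {"id": "snpB", "chrom": 1, "pos": 1_200_000, "p": 1e-7, "n_eff": 45000},
        {"id": "snpC", "chrom": 1, "pos": 2_000_000, "p": 2e-6, "n_eff": 45000},
    ]
    print([s["id"] for s in select_lead_snps(example)])
```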
The 34 SNPs were followed up in up to 11,275 individuals from seven studies with in silico data: CARDIA, CROATIA-Split, LifeLines, LBC1936, MESA-Lung, RS-III and TwinsUK-II (Supplementary Table 1). SNP rs2647044 was not available from TwinsUK-II.
The 34 SNPs were ranked by P value (for association with either FEV1 or FEV1/FVC) and the top ten leading SNPs selected for follow up by genotyping in up to 35,136 individuals from ADONIX, BHS2, BRHS, BWHHS, Gedling, GS:SFHS, HCS, Nottingham Smokers, NSHD and SAPALDIA (Supplementary Table 1). If a SNP within the top ten had an N effective <80%, only the proxy SNP was included in the top ten for follow up. For regions which showed association with both FEV1 and FEV1/FVC, only the leading SNP with the lowest P value for either trait was included if it was within the top ten SNPs. The study design is illustrated in Figure 1.
All Stage 2 studies provided effect estimates for ever-smokers and never-smokers, apart from Nottingham Smokers, which included only smokers. Studies with family data (BHS2 and GS:SFHS) analysed ever- and never-smokers together to account for the family correlation, adding smoking status as a covariate in the model, and therefore provided smoking-adjusted effect estimates. All Stage 2 study effect estimates were oriented to the forward strand of the NCBI build 36 reference sequence of the human genome, consistently using the alphabetically higher allele as the coded allele. For each study with separate results for ever- and for never-smokers, effect estimates and standard errors for ever- and never-smokers were meta-analysed using inverse variance weighting. Genomic control was applied to the pooled effect sizes of those studies with in silico data that undertook the analysis genome-wide. Effect estimates and standard errors were combined across Stage 2 studies using inverse variance weighting meta-analysis.
Meta-analysis of Stage 1 and Stage 2 results was undertaken using inverse variance weighting. We described associations as genome-wide significant if P<5×10−8.
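The combined test can be illustrated with a minimal Python sketch that pools a Stage 1 and a Stage 2 estimate by inverse variance weighting and compares the resulting P value with the 5×10−8 threshold. The function names and the example numbers are hypothetical. Deriving the two-sided P value from a normal (Wald) test of the pooled estimate is an assumption, as the text does not state how the P value was computed.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # significance threshold stated in the text


def combine_stages(stage1, stage2):
    """Pool two (beta, se) estimates using inverse variance weights."""
    (b1, se1), (b2, se2) = stage1, stage2
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    beta = (w1 * b1 + w2 * b2) / (w1 + w2)
    return beta, math.sqrt(1.0 / (w1 + w2))


def two_sided_p(beta, se):
    """Two-sided P value for beta/se under a standard normal (assumed Wald test)."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


def is_genome_wide_significant(beta, se):
    """True when the two-sided P value is below the genome-wide threshold."""
    return two_sided_p(beta, se) < GENOME_WIDE_THRESHOLD


if __name__ == "__main__":
    # Made-up estimates: neither stage alone passes the threshold.
    stage1, stage2 = (0.05, 0.01), (0.05, 0.01)
    beta, se = combine_stages(stage1, stage2)
    print(is_genome_wide_significant(*stage1), is_genome_wide_significant(beta, se))
```

The example shows why follow-up adds power: two concordant estimates that each fall short of the threshold can exceed it once pooled, because the pooled standard error is smaller than either input.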
The mRNA expression profiles of TGFB2, MFAP2, HDAC4, EVI1, RARB, SPATA9, ARMC2, NCR3, CDC123, LRP1, CCDC38, SNRPF, MMP15, CFDP1, ZKSCAN3, KCNE2 and C10orf11 were determined in human lung tissue and primary cell samples using RT-PCR, including RNA from lung (Ambion/ABI), brain, airway smooth muscle cells and human bronchial epithelial cells (Clonetics42). Primer sequences are listed in Supplementary Table 2. Full details are provided in the Supplementary Note.
To permit comparison of findings with recent studies of relevance to the field, we present association test statistics (in Stage 1 only) for relevant SNPs from previously reported regions (Supplementary Table 5A). We included regions: (i) reported as showing genome-wide significant association (P<5×10−8) with lung function; (ii) reported as showing genome-wide significant association with COPD, provided that there was additional evidence of association with lung function; and (iii) DAAM2, which reached borderline significance in the SpiroMeta consortium9. Within each of these regions, if multiple SNPs had been reported, we included all relevant SNPs and also the SNP that showed the strongest association in our data.
Regions associated (P<5×10−8) with lung function or COPD (and also associated with lung function) were looked up for other traits. Where multiple SNPs were reported for different traits or by different investigators, we aimed to include all relevant SNPs, except those with r2>0.9 with another SNP in the region. We also included the SNPs that showed the strongest association in our data for each region. The following related traits were assessed: (i) lung function in children (Supplementary Table 3A); (ii) smoking amount and ever-smoking versus never-smoking in the Ox-GSK consortium25 dataset (Supplementary Table 3B); (iii) height in the GIANT consortium14 dataset (Supplementary Table 3C and 3E) and (iv) lung cancer in the International Lung Cancer Consortium (ILCCO) GWAS meta-analysis26 (Supplementary Table 3D).
We used the approach proposed by Park et al.34 to estimate the number of independent variants associated with lung function measures that have similar effect sizes to the variants already reported, and to calculate the proportion of the variance explained by them. We excluded discovery data when estimating effect sizes to avoid winner’s curse bias, and obtained the number of undiscovered variants using the discovery power to detect the unbiased effect sizes (Supplementary Table 6 and Supplementary Note).
The top SNPs from our novel loci, and proxies, were searched for correlation with known common copy number variants and expression SNPs. Analyses to identify common pathways underlying the association signals for lung function were undertaken using MAGENTA v253 and GRAIL54. Full methods and results are given in the Supplementary Note.
Author contributions are listed in alphabetical order. See Supplementary Note for definitions of study acronyms.
Project conception, design and management. Stage 1 GWAS, AGES: G.E., M.G., V.G., T.B.H., L.J.L. ARIC: S.J.L., N.F., L.R.L., D.J.C., D.B.H., B.R.J., A.C.M., K.E.N. B58C-T1DGC: D.P.S. B58C-WTCCC: D.P.S. BHS1: A.L.J., A.W.M., L.J.P. CHS: S.A.G., S.R.H., T.L., B.M.P. CROATIA-Korcula: H.C., I.G., S.J., I.R., A.F.W., L.Z. CROATIA-Vis: H.C., C.H., O.P., I.R., A.F.W. ECRHS: D.L.J., E.O., I.P., M.W. EPIC: N.J.W. FHS: J.B.W., G.T.O. FTC: J.K., K.H.P., T. Rantanen. Health ABC: M.C.A., P.A.C., T.B.H., S.B.K., Y.L., B.M. Health 2000: M.H., M.K. KORA F4: J. Heinrich. KORA S3: C.G., H.E.W. NFBC1966: P.E., A-L.H., M-R.J., A.P. ORCADES: H.C., S.H.W., J.F.W., A.F.W. RS: A. Hofman. SHIP: S.G., G.H., B.K., H.V. TwinsUK: T.D.S., G.Z. Stage 2 follow-up, ADONIX: J. Brisman, A-C.O. BHS2: J. Beilby. BRHS: R.W.M., S.G.W., P.H.W. BWHHS: G.D.S., S.E., D.A.L., P.H.W. CARDIA: A.S. CROATIA-Split: M.B., I.K., T.Z. GS:SFHS: C.M.J., S.M.K., A.D.M., D.J.P. HCS: C.C., J.W.H., A.A.S. LBC1936: I.J.D., S.E.H., J.M.S. LifeLines: H.M.B., D.S.P., J.M.V., C.W. MESA-Lung: R.G.B., J.L.H. Nottingham smokers: I.P.H. NSHD: R.H., D.K. SAPALDIA: N.P.-H., T. Rochat. Look-up studies, ALSPAC: R.G., J. Henderson. ILCCO: ILCCO data. Ox-GSK: C.F., J.M.
Phenotype collection and data management. Stage 1 GWAS, AGES: T.A. ARIC: D.J.C., N.F., L.R.L., A.C.M., K.E.N. B58C-T1DGC: A.R.R., D.P.S. B58C-WTCCC: A.R.R., D.P.S. BHS1: A.L.J., A.W.M., L.J.P. CHS: S.A.G., S.R.H., T.L., B.M.P. CROATIA-Korcula: I.G., S.J., O.P., I.R., L.Z. CROATIA-Vis: H.C., C.H., O.P., I.R., A.F.W. ECRHS: D.L.J., E.O., I.P., M.W. EPIC: N.J.W. FHS: J.B.W., G.T.O. FTC: J.K., K.H.P., T. Rantanen. Health ABC: P.A.C., B.M., W.T. Health 2000: M.H., M.K. KORA F4: S.K., H.S. KORA S3: N.P.-H. NFBC1966: P.E., A-L.H., M-R.J., A.P. ORCADES: H.C., S.H.W., J.F.W. RS: G.G.B., M.E., D.W.L., B.H.Ch.S. SHIP: S.G., B.K., H.V. TwinsUK: C.J.H., P.G. Hysi, M.M., T.D.S., G.Z. Stage 2 follow-up, ADONIX: J. Brisman, A-C.O. BHS2: J. Beilby, M.L.H. BRHS: R.W.M., S.G.W., P.H.W. BWHHS: G.D.S., S.E., D.A.L., P.H.W. CARDIA: O.D.W. CROATIA-Split: M.B., I.K., T.Z. GS:SFHS: C.M.J., A.D.M. HCS: C.C., K.A.J., A.A.S. LBC1936: I.J.D., L.M.L., J.M.S. LifeLines: D.S.P., J.M.V. MESA-Lung: R.G.B., J.L.H. Nottingham smokers: K.A.A-B., J.D.B., I.P.H., A. Henry, M.O., I. Sayers. NSHD: R.H., D.K. SAPALDIA: N.P.-H. Look up studies, ALSPAC: R.G., J. Henderson. ILCCO: ILCCO. Raine: W.Q.A., P.G. Holt, C.E.P., P.D.S.
Genotyping. Stage 1 GWAS, B58C-T1DGC: W.L.M. B58C-WTCCC: W.L.M. BHS1: A.L.J., A.W.M., L.J.P. CHS: S.R.H., B.M.P., J.I.R. CROATIA-Vis: C.H., I.R., A.F.W. ECRHS: M.W. EPIC: I.B., R.J.F.L., J.H.Z. FTC: J.K. Health ABC: Y.L., K.L. Health 2000: S.R., I. Surakka. KORA F4: N.K. KORA S3: C.G. NFBC1966: P.E., A-L.H., M-R.J., A.P., A.R. ORCADES: H.C., J.F.W. RS: F.R., A.G.U. SHIP: G.H. TwinsUK: C.J.H., S-Y.S. Stage 2 follow-up, ADONIX: S.D., F.N., A-C.O. BHS2: J. Beilby, G.C., J.H. BRHS: A.D.H., R.W.M. BWHHS: S.E., D.A.L. CARDIA: M.F., X.G. CROATIA-Split: V.B., T.Z. Gedling: J.R.B., T.M. GS:SFHS: C.M.J., S.M.K., D.J.P. HCS: J.W.H. LBC1936: I.J.D., S.E.H., L.M.L., J.M.S. LifeLines: C.W. MESA-Lung: S.S.R. NSHD: D.K., A.W. SAPALDIA: M.I., F.K. Look up studies, ALSPAC: S.M.R., W.L.M. ILCCO: ILCCO. Raine: W.Q.A., C.E.P.
Data analysis. Stage 1 GWAS, AGES: G.K.G., A.V.S. ARIC: N.F., D.B.H., L.R.L. B58C-T1DGC: A.R.R., D.P.S. B58C-WTCCC: A.R.R., D.P.S. BHS1: N.M.W. CHS: K.D.M., J.I.R. CROATIA-Korcula: C.H., J.E.H., V.V. CROATIA-Vis: C.H., V.V. ECRHS: D.L.J., A.R. EPIC: J.H.Z. FHS: J.B.W. FTC: I. Surakka. Health ABC: P.A.C., Y.L., K.L., W.T. Health 2000: M.K., S.R., I. Surakka. KORA S3: E.A. NFBC1966: A.R. ORCADES: C.H., V.V. RS: M.E., D.W.L. SHIP: S.G., G.H., B.K., H.V. TwinsUK: M.M., G.Z. Stage 2 follow-up studies, ADONIX: S.D., F.N. BHS2: G.C. BRHS: R.W.M. BWHHS: D.A.L. CARDIA: M.F., X.G. HCS: J.W.H., K.A.J. LBC1936: L.M.L. LifeLines: H.M.B. MESA-Lung: A.M., S.S.R. Nottingham smokers: I. Sayers, A. Henry. NSHD: D.G., R.H. SAPALDIA: I.C., M.I. Look up studies, ALSPAC: D.M.E. ILCCO: ILCCO. Ox-GSK: J.Z.L. Raine: W.Q.A.
Analysis group: SpiroMeta consortium, I.P.H., T.J., M.S.A., M.D.T., L.V.W. CHARGE consortium, N.F., S.J.L., D.W.L., K.D.M., A.V.S., W.T., J.B.W.
Expression profiling and bioinformatics group: SpiroMeta consortium, I.P.H., M.O., I. Sayers, M.S.A., M.D.T., L.V.W. CHARGE consortium, S.A.G., D.W.L.
Writing group: SpiroMeta consortium, P.E., I.P.H., M.O., M.S.A., D.P.S., M.D.T., L.V.W. CHARGE consortium, S.J.L., D.W.L., S.A.G., G.T.O., V.G., B.H.Ch.S., W.T.
COMPETING FINANCIAL INTERESTS:
Inês Barroso and spouse own stock in Incyte Ltd and GlaxoSmithKline. Fredrik Nyberg is employed by AstraZeneca R&D, 431 83 Mölndal, Sweden. Professor Postma has received unrestricted research grants from and has been consultant to AstraZeneca, Boehringer Ingelheim, Chiesi, GlaxoSmithKline, Nycomed, TEVA. Clyde Francks is a full-time employee of the company GlaxoSmithKline (GSK) and GSK also funded several aspects of the study as detailed in the ACKNOWLEDGMENTS section for Ox-GSK.