Users may view, print, copy, download and text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
Saccular intracranial aneurysms (IAs) are balloon-like dilations of the intracranial arterial wall; their hemorrhage commonly results in severe neurologic impairment and death. We report a second genome-wide association study with discovery and replication cohorts from Europe and Japan comprising 5,891 cases and 14,181 controls with ~832,000 genotyped and imputed SNPs across discovery cohorts. We identified three new loci showing strong evidence for association with IA in the combined data set, including intervals near RBBP8 on 18q11.2 (OR=1.22, P=1.1×10⁻¹²), STARD13/KL on 13q13.1 (OR=1.20, P=2.5×10⁻⁹) and a gene-rich region on 10q24.32 (OR=1.29, P=1.2×10⁻⁹). We also confirmed prior associations near SOX17 (8q11.23-q12.1; OR=1.28, P=1.3×10⁻¹²) and CDKN2A/B (9p21.3; OR=1.31, P=1.5×10⁻²²). It is noteworthy that several putative risk genes play a role in cell-cycle progression, potentially affecting proliferation and senescence of progenitor cell populations that are responsible for vascular formation and repair.
IA affects approximately 2% of the general population and arises from the action of multiple genetic and environmental risk factors1. We previously reported the first genome-wide association study (GWAS) of IA (ref. 2), which identified three IA risk loci on chromosomes 8q11.23-q12.1, 9p21.3 and 2q33.1 with P < 5×10⁻⁸. This previous study had limited power to detect loci imparting genotypic relative risk (GRR) < 1.35 (Supplementary Table 1).
To increase the power to detect additional loci of similar or smaller effect, we ascertained and whole-genome genotyped 2 new European case cohorts (n = 1,616) and collected genotyping data from 5 additional European control cohorts (Supplementary Note, n = 11,955). We also increased the size of the original Japanese replication cohort and added a new Japanese replication cohort (2,282 cases and 905 controls) (Table 1). The new combined cohort has nearly 3-fold more cases than the original cohort and increased our power to detect variants with modest effect sizes. For example, this study had 89% and 64% average power to detect common variants (minor allele frequencies ≥ 10%) with GRR of 1.25 and 1.20, respectively (Supplementary Table 1).
All subjects were genotyped using the Illumina platform. The new as well as the previously analyzed genotyping data were subjected to well-established quality control (QC) measures (Supplementary Table 2). We sought to eliminate potential confounding due to population stratification and gender1,3 by matching cases and controls of the same gender based on inferred genetic ancestry. As previous studies4,5 demonstrated that the Finnish population forms an ancestry cluster distinct from other European populations like those included in this study, we analyzed our Finnish cohort independently from others. To maximize opportunities for genetic matching and analytic power, we analyzed all subjects in the remaining European cohorts together. The resulting matched case-control data consisted of 808 cases and 4,393 controls in the Finnish (FI) cohort and 1,972 cases and 8,122 controls in the rest of the combined European (CE) cohort (Supplementary Table 3). We used the QC-passed genotype data and phased chromosomes from the HapMap CEU sample to impute missing genotypes6. We based our further analyses on 831,534 SNPs that passed the QC filters both in the FI and CE samples (Table 1 and Supplementary Table 2).
We tested for association of each QC-passed SNP with IA using conditional logistic regression, assuming a log-additive effect of allele dosage. We corrected each cohort for residual overdispersion (Table 1) using genomic control7, and combined the results from FI and CE to obtain P-values, odds ratios (ORs) and confidence intervals (CIs) for the discovery cohort of 2,780 cases and 12,515 controls using a fixed-effects model.
To evaluate the strength of association, in addition to using P-values, we employed a Bayesian approach8. We used the Bayes factor (BF) that represents the fold-change of the odds of association before and after observing the data9, and the posterior probability of association (PPA), calculated through the BF, that provides a simple probabilistic measure of the evidence of association8,10. For every SNP, we assumed a uniform prior probability of association of 1/10,000 and set the prior of the logarithm of per-allele OR as a normal distribution with a 95% probability of the OR to be between 0.67 and 1.5, with larger weights for smaller effect sizes9,11.
From the discovery results, we eliminated 2 imputed SNPs that showed PPAs of 0.97 and 0.94 as their association signals were not supported by surrounding genotyped SNPs and their genotypes were not confirmed by direct genotyping results (data not shown). This resulted in 831,532 QC-passed SNPs (Supplementary Table 2).
We observed 3 regions that showed very high PPA (> 0.995; Fig. 1a) and also a substantial excess of SNPs with P < 1×10⁻³ (1,295 SNPs versus 831 SNPs expected by chance) even after excluding those within previously identified associated regions2 (Fig. 1b). Moreover, we observed a strong correlation between the P-values and BFs for the upper tail of the distribution (Fig. 1c).
We focused on 5 genomic regions (Fig. 1a) that contained at least one SNP with PPA > 0.5, for which the hypothesis of association with IA is more likely than the null hypothesis of no association. The PPAs and P-values of the most highly associated SNPs in these intervals ranged from 0.6621 to > 0.9999 and 7.9×10⁻⁷ to 2.2×10⁻¹⁶, respectively (Supplementary Table 4). The 5 chromosomal segments included 3 newly identified SNP clusters at 10q24.32, 13q13.1 and 18q11.2. The remaining 2 regions were previously identified loci at 8q11.23-q12.1 and 9p21.3 (ref. 2) (Fig. 2). The third locus identified in our previous study at 2q33.1 did not contain any SNPs with PPA > 0.5. Furthermore, consistent with our previous results2, detailed analysis of the 8q11.23-q12.1 region detected two independent association signals within a < 100 kb interval that spans the SOX17 locus (Fig. 2 and Supplementary Fig. 1); hereafter these two signals are referred to as 5′-SOX17 and 3′-SOX17. Thus the 5 chromosomal segments comprised 6 independent association signals for follow-up.
We performed replication genotyping in 2 Japanese cohorts including 3,111 cases and 1,666 controls (JP1 and JP2, see Table 1). For each independent signal, we selected for replication the genotyped SNP with the highest PPA, and added up to 2 additional SNPs per locus. For the 5′-SOX17 region, we selected 2 SNPs analyzed previously, as they tag the best SNP in the current study (Supplementary Fig. 1).
All but one of the SNPs (rs12411886 on 10q24.32 in JP1) were successfully genotyped and passed QC filters. We tested for association of each SNP with IA using logistic regression stratified by gender, specifying the same model as for the discovery cohort (Supplementary Table 5). We combined results from JP1 and JP2 using a fixed-effects model (Table 2 and Supplementary Table 4). We considered an association to be replicated if the BF increased the odds of association > 10-fold after observing the replication data.
Of the 6 candidate loci, all but the 5′-SOX17 interval were replicated, with replication P-values ranging from 0.0019 to 1.0×10⁻⁷ and the odds of association with IA increasing by 22.9-fold to 1.5×10⁵-fold, yielding robust evidence for replication for each interval (Table 2).
We combined the discovery and replication results using a fixed-effects model. All 5 loci that replicated in the Japanese cohorts surpassed the conventional threshold for genome-wide significance (P < 5×10⁻⁸), with P-values ranging from 2.5×10⁻⁹ to 1.5×10⁻²², and all also had PPAs ≥ 0.998 (Table 2).
To determine each cohort's contribution to the observed association and to assess the consistency of the effect size across cohorts, we analyzed each ascertained cohort separately (Table 1 and Supplementary Table 5) and then combined the results from the 6 cohorts using a random-effects model. The association results remained highly significant (Fig. 3). For the 5 loci that were replicated in the Japanese cohorts, we found no evidence of significant heterogeneity across cohorts (P > 0.1). Every cohort had the same risk allele and provided support for association, with the exception of the JP1 cohort for the 3′-SOX17 locus, consistent with our previous study2 (Fig. 3).
The most significant association was detected in the previously reported 9p21.3 region (ref. 2) near CDKN2A and CDKN2B with P = 1.5×10⁻²² (OR = 1.32; PPA > 0.9999). All of the newly studied cohorts strongly supported this association with IA (Fig. 3). These same alleles are also associated with coronary artery disease, but not with type 2 diabetes12. Similarly, the previously reported 8q11.23-q12.1 region showed significant association. The 3′-SOX17 interval (rs9298506) showed robust association with P = 1.3×10⁻¹² (OR = 1.28; PPA > 0.9999) and all new cohorts supported association of this SNP with IA (Fig. 3). For the 5′-SOX17 region (rs10958409), the new cohorts introduced substantial heterogeneity across cohorts, lowering the PPA to 0.016 (Fig. 3).
Among the newly identified loci, the strongest association was found at rs11661542 on 18q11.2 (OR = 1.22; P = 1.1×10⁻¹²; PPA > 0.9999). A cluster of SNPs associated with IA spans the interval between 18.400 Mb and 18.509 Mb, and these SNPs are strongly correlated with rs11661542 (Fig. 2). A single gene, RBBP8 (retinoblastoma binding protein 8), is located within an extended linkage disequilibrium (LD) interval (Fig. 2).
The second strongest new association was at rs12413409 on 10q24.32 (OR = 1.29; P = 1.2×10⁻⁹; PPA = 0.9990), which maps to intron 1 of CNNM2 (cyclin M2) (Fig. 2). A cluster of SNPs strongly correlated with rs12413409 and located within a ~247 kb interval in the same LD block supported the association (Fig. 2).
The third new locus is defined by rs9315204 at 13q13.1 (OR = 1.20; P = 2.5×10⁻⁹; PPA = 0.9981) in intron 7 of STARD13 (StAR-related lipid transfer (START) domain containing 13) (Fig. 2). Two SNPs, rs1980781 and rs3742321, that are strongly correlated with rs9315204 (r² > 0.9) also showed significant association with IA (Fig. 2 and Supplementary Table 4). These two SNPs are missense (lysine to arginine) and synonymous coding variants of STARD13, respectively. Another gene that has been implicated in aging phenotypes, KL (klotho), is located nearby13.
A search of the gene-expression database (eQTL browser, http://eqtl.uchicago.edu/) for all the IA-risk loci did not reveal any consistent pattern of association of IA SNPs with variation in gene expression levels.
In this second GWAS of IA, which included nearly 3 times as many cases as the initial study, we detected 3 novel risk loci and obtained strong independent evidence for association of 2 previously identified loci. The evidence that these are bona fide risk loci for IA is very strong from both Bayesian measures and conventional P-values.
Given our power (~90%) to detect variants that confer risk of IA with GRR = 1.25 and MAFs ≥ 10%, we expect that we have identified most of these variants, limited principally by potential gaps in SNP coverage. Indeed, across the rest of the genome, there was no locus with PPA > 0.22 and MAF ≥ 10%, while there were 14 loci with PPAs between 0.1 and 0.22 and ORs between 1.16 and 1.25 (data not shown). We expect that a fraction of these loci are genuine IA risk loci, as suggested by the excess of SNPs with P < 1×10⁻³ (Fig. 1b); exploring this possibility will require analysis of still larger IA cohorts and/or genotyping of alleles with lower MAF.
Based on the results of the first GWAS of IA and the role of the implicated gene products, Sox17 and p15INK4b/p16INK4a, we previously hypothesized2 that the IA genes implicated might play a role in determining cell cycle progression, affecting proliferation14 and senescence of progenitor cell populations and/or the balance between production of progenitor cells versus cells committed to differentiation. Genes located within the newly identified regions support this idea. RBBP8, located within the 18q11.2 region, influences progression through the cell cycle by interacting with BRCA115. Similarly, of the two genes located within the 13q13.1 interval, STARD13 contains Rho-GAP and C-terminal STAR related lipid transfer (START) domains and its overexpression results in suppression of cell proliferation16. The other gene, KL, encodes a transmembrane protein that modulates FGF receptor specificity17; KL-deficient mice display accelerated aging in diverse organ systems13.
On the assumption that there is a four-fold increase in the risk of IA among siblings of cases18,19 and that the SNPs combine to increase log-odds of disease in an additive fashion, the 5 IA risk loci explain 5.2% (FI), 4.0% (CE) and 3.5% (combined JP1 and JP2) of the familial risk of IA. Under this model, the odds of developing IA vary 4.99- to 7.63-fold between the top and bottom 1% of the genetic risk profile at these loci in these populations and 3.61- to 4.64-fold between the 5% extremes (Supplementary Fig. 2). When combined with traditional risk factors such as gender, blood pressure and smoking, these findings form the basis of future work aimed at pre-clinical identification of individuals who are at high risk of IA formation and rupture.
Whole-genome genotyping for the discovery cohort was performed on the Illumina platform according to the manufacturer's protocol (Illumina, San Diego, CA, USA). Beadchips used for individual cohorts are presented in Supplementary Table 2. Replication genotyping in the JP1 cohort was performed using either Taqman (Applied Biosystems) or MassARRAY (Sequenom) assays. For the JP2 cohort, genotyping for cases was performed using the multiplex PCR-based Invader assay (Third Wave Technologies Inc.); genotyping for controls was performed on the Illumina platform as described previously20.
The study protocol was approved by the Yale Human Investigation Committee (HIC protocol #7680). Institutional review board approval for genetic studies, along with written consent from all study participants, was obtained at all participating institutions.
Prior to the analysis of genotyping data, we excluded SNPs that were located either on mtDNA or sex chromosomes; with A/T or C/G alleles; for which all subjects were assigned as ‘no call’; and assayed on Hap300v1 or 550v1 but dropped from newer versions.
We excluded subjects in the discovery cohort that did not conform to our study design on the basis of genotyping and information quality, cryptic relatedness and population outliers. We summarized the sample exclusion steps in Supplementary Table 2. This filtering process resulted in 835 cases and 6,529 controls in the Finnish (FI) cohort and 2,000 cases and 8,722 controls in the rest of the combined European (CE) cohort.
We performed imputation analysis with the HapMap phase II CEU reference panel (release 24) using the IMPUTE v1 software6. The analysis was performed separately for the FI and CE cohorts. We converted posterior probabilities of three possible genotypes to the fractional allele dosage scores (between 0 and 2) and used these scores for association tests in order to take into account the imputation uncertainty23. For the quality assessment of imputed SNPs, we also converted the posterior probabilities to the most likely genotypes with the threshold at 0.9.
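The dosage conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name, the array layout (one row per subject, columns for 0, 1 and 2 copies of the counted allele), the use of -1 for an uncalled genotype and the inclusive threshold (≥ 0.9) are assumptions made for the example, not details taken from IMPUTE or from the analysis itself.

```python
import numpy as np

def dosage_and_best_guess(probs, threshold=0.9):
    """Convert genotype posteriors to allele dosages and best-guess calls.

    probs: array-like of shape (n_subjects, 3) holding P(g=0), P(g=1), P(g=2).
    Returns (dosage, best_guess); best_guess is -1 where no genotype
    reaches the threshold (hypothetical "no call" code for this sketch).
    """
    probs = np.asarray(probs, dtype=float)
    # Expected allele count: 0*P0 + 1*P1 + 2*P2, a value between 0 and 2.
    dosage = probs[:, 1] + 2.0 * probs[:, 2]
    # Most likely genotype, kept only when its posterior reaches the threshold.
    best = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    best_guess = np.where(confident, best, -1)
    return dosage, best_guess
```

The dosage keeps the imputation uncertainty (a subject with posteriors 0.1, 0.6, 0.3 contributes 1.2 rather than a hard call of 1), whereas the thresholded calls are better suited to quality checks that need discrete genotypes.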
Population stratification and independent genotyping of cases and controls are major causes of confounding in genome-wide association studies24. Because our study consisted of multiple independently ascertained cohorts that were genotyped separately, we performed a stringent analysis to control for these biases by inferring genetic ancestries of subjects25,26. We used the Laplacian eigenmaps27 to infer population structure. Following the determination of the number of dimensions (K + 1) using the threshold given in Lee et al.28, we used the K-dimensional non-trivial generalized eigenvectors29 to calculate the Euclidean distance between two subjects.
In the course of this analysis, we excluded “isolated” subjects who were identified by using the nearest-neighbor distance distributions in any of the 2-dimensional sections. After excluding these subjects, we observed 13 and 5 dimensions in FI and CE, respectively. The larger dimensions observed in FI could be attributable to the presence of many isolated populations in Finland5.
Before matching, we stratified data into males and females because female gender is a known risk factor of IA1,3. We also set the maximum distance between cases and controls to match to be less than 0.028 and 0.009 in FI and CE cohorts, respectively. These values were determined by examining the distribution of the nearest-neighbor distances in K-dimensions (data not shown). We matched cases and controls using the fullmatch function in the R-package optmatch30,31.
For both genotyped and imputed SNPs in the discovery cohort, we applied QC filters to individual cohorts and to cases and controls separately, on the basis of the missing rate, minor allele frequency (MAF) and the P-value of the exact test of Hardy-Weinberg equilibrium (HWE)32. For imputed SNPs, we also assessed imputation quality using the average posterior probability, MAF and allelic R2 metric33. Finally, we assessed differential missingness between cases and controls (Supplementary Table 2).
Any genotyped SNP that passed the QC filters both in the CE and FI cohorts is referred to as a “genotyped SNP” while one for which we used the QC-passed imputation data either in one or both of the cohorts is classified as an “imputed SNP”.
For genotyping data of the replication cohorts, we excluded SNPs if any of the following 3 conditions were met in either cases or controls: (i) missing rate > 0.05; (ii) P-value of the exact test of HWE < 0.001; or (iii) MAF < 0.01.
We tested for association between each QC-passed SNP and IA using the conditional and unconditional logistic regression for the discovery and replication cohorts, respectively34. For the discovery cohort, we used the matched strata to correct for potential confounding due to population stratification and gender, while for the replication cohorts we adjusted for gender. We assumed the log-additive effect of allele dosage on disease risk. We obtained P-values from the score test (two-sided) and estimated the logarithm of per-allele odds ratios (ORs) with standard errors (SEs) by maximizing the (conditional or unconditional) likelihood. Both the test statistic and the SE of log-OR were corrected using genomic control7. We performed the association analysis for FI and CE, as well as sub-cohorts of CE that consisted of NL cases, DE cases or @neurIST cases and their matched controls (Table 1 and Supplementary Table 3). We used the following R-functions to perform the association analysis: clogit, glm and snp.rhs.tests22.
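The genomic control step can be illustrated with the following Python sketch. It assumes the common convention in which the inflation factor λ is the median of the observed 1-df chi-square statistics divided by the median of the chi-square distribution with 1 df (about 0.455), with λ not allowed to fall below 1; the function name and the flooring rule are assumptions for this example and may differ from the implementation actually used.

```python
import math
import numpy as np

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control(chi2_stats, se_log_or):
    """Apply genomic control to test statistics and log-OR standard errors.

    Returns (lambda, adjusted chi-square, adjusted SE, adjusted P-values).
    """
    chi2_stats = np.asarray(chi2_stats, dtype=float)
    se = np.asarray(se_log_or, dtype=float)
    # Inflation factor, floored at 1 so statistics are never inflated (assumption).
    lam = max(1.0, float(np.median(chi2_stats)) / CHI2_1DF_MEDIAN)
    chi2_adj = chi2_stats / lam          # deflate the test statistic
    se_adj = se * math.sqrt(lam)         # widen the standard error to match
    # Upper-tail P-value of a 1-df chi-square: erfc(sqrt(x / 2)).
    p_adj = np.array([math.erfc(math.sqrt(x / 2.0)) for x in chi2_adj])
    return lam, chi2_adj, se_adj, p_adj
```

Scaling the standard error by √λ keeps the corrected Wald statistic (log-OR divided by SE, squared) consistent with the corrected chi-square statistic.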
We combined the cohort-wise per-allele ORs in FI and CE using a fixed-effects model of meta-analysis for 831,534 QC-passed SNPs to obtain the discovery results. For SNPs analyzed both in the discovery and replication cohorts, we combined JP1 and JP2 to obtain replication results and all 4 cohorts to obtain combined results. Our primary analysis was based on the fixed-effects model23. In order to assess the heterogeneity of the effect size between cohorts, we first divided CE into 3 cohorts as described above, aiming to analyze data without averaging effect sizes over the combined European cohorts, and then combined 6 cohorts using the random-effects model. We employed the restricted maximum likelihood procedure to estimate the between-cohort heterogeneity variance (τ²) using the R-function MiMa35 (http://www.wvbauer.com/). From this estimate, we calculated Cochran's Q statistic and the I² statistic36.
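As a minimal sketch of the fixed-effects combination, the Python example below pools cohort-wise log-ORs by inverse-variance weighting and reports Cochran's Q and I² in their standard moment-based forms. It does not reproduce the restricted maximum likelihood estimate of the heterogeneity variance obtained with MiMa; the function name and the returned dictionary keys are hypothetical.

```python
import math

def fixed_effects_meta(log_ors, ses):
    """Inverse-variance fixed-effects meta-analysis of per-allele log-ORs."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    # Two-sided P-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"log_or": pooled, "se": pooled_se, "or": math.exp(pooled),
            "p": p, "q": q, "df": df, "i2": i2}
```

For example, `fixed_effects_meta([0.1, 0.3], [0.1, 0.1])` pools two equally precise cohorts to a log-OR of 0.2, and the disagreement between them shows up as Q = 2 on 1 degree of freedom (I² = 0.5).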
To evaluate the strength of association, we employed a Bayesian approach9,37. A limitation of the use of P-values alone is that variability in factors such as effect size, MAF and sample size can result in identical statistics that might correspond to markedly different levels of evidence regarding the strength of association10. The Bayes factor (BF) provides an alternative that compares the probabilities of the data under the alternative hypothesis versus the null hypothesis. For computational simplicity, we approximated BF as described by Wakefield8. For all SNPs, we assumed a single prior for the log-OR: a normal distribution with mean 0 and standard deviation log(1.5)/Φ⁻¹(0.975), where Φ is the standard normal distribution function9.
The posterior probability of association10 (PPA) provides a simple probabilistic measure of evidence by introducing the prior probability of association, π1. We assumed a uniform prior, π1 = 1/10,000, for all the SNPs11. For BF > 10⁶, changing π1 to a more conservative value of 1/100,000 would result in little change in the posterior probability of association.
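The calculation can be sketched in Python as follows, assuming the usual form of Wakefield's approximation in which the log-OR estimate is normal with variance V (its squared standard error) and the prior is normal with mean 0 and variance W. The function names are hypothetical, and the BF is expressed here in favour of association (alternative versus null).

```python
import math
from statistics import NormalDist

# Prior SD of the log-OR: 95% of the prior mass for the OR lies in [1/1.5, 1.5].
PRIOR_SD = math.log(1.5) / NormalDist().inv_cdf(0.975)

def bayes_factor(log_or, se, prior_sd=PRIOR_SD):
    """Approximate Bayes factor for association (alternative vs null)."""
    v = se ** 2                 # sampling variance of the estimate
    w = prior_sd ** 2           # prior variance of the log-OR
    z2 = (log_or / se) ** 2     # squared Wald statistic
    shrink = w / (v + w)
    # sqrt(V / (V + W)) * exp(z^2 / 2 * W / (V + W))
    return math.sqrt(1.0 - shrink) * math.exp(0.5 * z2 * shrink)

def ppa(bf, prior=1.0 / 10000):
    """Posterior probability of association from a BF and prior probability."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = bf * prior_odds
    return posterior_odds / (1.0 + posterior_odds)
```

Because the BF depends on the standard error as well as on the Wald statistic, two SNPs with the same P-value can receive different BFs, which is the property that motivates using it alongside P-values. For very large Wald statistics the exponential can overflow, so a production implementation would more likely work with the logarithm of the BF.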
To combine the results from multiple cohorts, we extended the formula38 to be applicable to multiple (> 2) cohorts.
For each region that contained a SNP with PPA > 0.5, we examined the number of independent association signals by testing for association of every genotyped SNP with IA by adjusting for the effect of a specified SNP (Supplementary Fig. 1).
We tested for deviation from a linear model, which assumes that two SNPs combine to increase the log-odds of disease in an additive fashion, using conditional (FI and CE) or unconditional (JP: JP1 plus JP2, stratified by cohorts and gender) logistic regression. There was no significant deviation from the linear model (data not shown).
We evaluated potential clinical implications of the genetic profiles of the 5 IA risk loci following the approach described by Clayton39. We fitted a 5-locus conditional (FI and CE) or unconditional (JP) logistic regression model including the additive and dominance-deviation terms for each locus. Using the estimated effect sizes and each individual's genotypes, we calculated the risk scores for every individual. The receiver-operating characteristic (ROC) curve for each ethnic cohort (FI, CE and JP) was depicted using the risk score.
We also calculated the ratio of the exponential of the mean of the risk scores for control subjects within the top versus bottom 5 or 1% to obtain approximated odds ratios of disease between these classes.
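A minimal Python sketch of this comparison is given below. It assumes that the risk scores are on the log-odds scale and that the top and bottom classes are formed by simple ranking of the control subjects; the function name and the rounding rule used to size each class are assumptions for illustration.

```python
import numpy as np

def extreme_odds_ratio(control_scores, percent):
    """Approximate OR between the top and bottom `percent` of risk scores.

    control_scores: risk scores (log-odds scale) of control subjects.
    percent: size of each extreme class, e.g. 5 or 1.
    """
    scores = np.sort(np.asarray(control_scores, dtype=float))
    # Number of subjects in each extreme class (at least one).
    k = max(1, int(round(scores.size * percent / 100.0)))
    bottom_mean = scores[:k].mean()
    top_mean = scores[-k:].mean()
    # Ratio of the exponentials of the two class means.
    return float(np.exp(top_mean) / np.exp(bottom_mean))
```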
The sibling recurrence risk was estimated by assuming the polygenic model that fits well to our data39. The fraction of the sibling recurrence risk attributable to all 5 loci was calculated as the ratio of the logarithms of this value and of the epidemiologically estimated value of 4 (refs. 18,19).
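The final ratio can be illustrated with a short Python sketch, under the assumption that logarithms are taken of both the sibling recurrence risk attributable to the loci and the epidemiological estimate of 4. The function names are hypothetical, and the step that derives the locus-specific recurrence risk from the fitted polygenic model is not shown.

```python
import math

def fraction_of_familial_risk(lambda_loci, lambda_sib=4.0):
    """Fraction of the sibling recurrence risk explained by the loci.

    lambda_loci: sibling recurrence risk attributable to the loci.
    lambda_sib: overall sibling recurrence risk (4 assumed in the text).
    """
    return math.log(lambda_loci) / math.log(lambda_sib)

def implied_lambda(fraction, lambda_sib=4.0):
    """Inverse relation: recurrence risk implied by an explained fraction."""
    return lambda_sib ** fraction
```

Read in reverse, an explained fraction of 5.2% corresponds to a locus-attributable sibling recurrence risk of about 1.07, since `implied_lambda(0.052)` is 4 raised to the power 0.052.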
We are grateful to the participants who made this study possible. We thank Andrea Chamberlain, Birgitt Meseck-Selchow and members of the Keck Foundation Biotechnology Resource Laboratory for their technical help. This study was supported by the Yale Center for Human Genetics and Genomics and the Yale Program on Neurogenetics, the US National Institutes of Health grants R01NS057756 (M.G.) and U24 NS051869 (S.M.) and the Howard Hughes Medical Institute (R.P.L.). The @neurIST project was funded by the European Commission, VI Framework Programme, Priority 2, Information Society Technologies, a European Public Funded Organization (Research Grant No. IST-FP6-027703). The Frankfurt case cohort collection was supported by BMBF (01GI9907), and the Utrecht control cohort by the Prinses Beatrix Fonds and the Adessium foundation (L.H.vdB.). S.M. was supported in part by the Clinical and Translational Science Award UL1 RR024139, National Center for Research Resources, NIH. We would also like to acknowledge the use of the Yale University Biomedical High Performance Computing Center (NIH grant: RR19895).
Competing Financial Interest: The authors declare competing financial interests. The authors have a provisional patent application under consideration based on the findings of this work.
Author Contributions: Study Cohorts: ascertainment, characterization and DNA preparation: M.N., M.v.u.z.F., E.G., J.E.J., J.H. and A.P. (FI case-control); Y.M.R. and G.J.E.R. (NL cases); P.B., T.D., J.B., G.Z., P.S., R.R., S.T., C.M.F., P.S., A.F.F., V.E., M.C.J.M.S., P.L., J.B., J.M. and D.R. (@neurIST case series); B.K., G.A., M.S., D.K., F.W., A.O., B.S., C.S., J.B., F.R., C.R., D.B., C.G., E.I.S., B.M., A.R. and H.S. (DE case series); A.T., A.H., H.K. and I.I. (JP1); S.K.L., H.Z. and Y.N. (JP2). Control Cohorts: A.A., A.P. and L.P. (Health2000); A.A., A.P. and L.P. (NFBC1966); C.M.v.D. and M.M.B.B. (Rotterdam Study); L.H.v.d.B. and C.W. (Utrecht); T.I. and H.E.W. (KORA-gen); S.S. (PopGen). Genotyping: K.B., Z.A., N.N., A.K.O., E.G., S.M., R.P.L. and M.G. (Yale); P.C., P.C. and F.C. (Aneurist); S.K.L., H.Z. and Y.N. (JP2). Data management and informatics: K.Y., K.B., Z.A., N.N. and M.G. (Yale); S.K.L., H.Z. and Y.N. (JP2 cohort); Statistical analysis: K.Y. and M.G. Writing team: K.Y., K.B., M.W.S., R.P.L. and M.G. Study design and analysis plan: K.Y., R.P.L. and M.G.