

Lancet Neurol. 2012 November; 11(11): 951–962.
PMCID: PMC3490334

Genetic risk factors for ischaemic stroke and its subtypes (the METASTROKE Collaboration): a meta-analysis of genome-wide association studies

Matthew Traylor,a Martin Farrall,b,c Elizabeth G Holliday,d Cathie Sudlow,e Jemma C Hopewell,f Yu-Ching Cheng,g Myriam Fornage,h M Arfan Ikram,i,j,k Rainer Malik,l Steve Bevan,a Unnur Thorsteinsdottir,m,n Mike A Nalls,o WT Longstreth,p,q Kerri L Wiggins,r Sunaina Yadav,s Eugenio A Parati,t Anita L DeStefano,u Bradford B Worrall,v,w Steven J Kittner,x,y Muhammad Saleem Khan,s Alex P Reiner,z Anna Helgadottir,c,m,n Sefanja Achterberg,aa Israel Fernandez-Cadenas,ab Sherine Abboud,ac Reinhold Schmidt,ad Matthew Walters,ae Wei-Min Chen,m,af E Bernd Ringelstein,ag Martin O'Donnell,ah Weang Kee Ho,ai Joanna Pera,aj Robin Lemmens,ak,al,am Bo Norrving,an Peter Higgins,ae Marianne Benn,ao Michele Sale,af,ap Gregor Kuhlenbäumer,aq Alexander S F Doney,ar Astrid M Vicente,as Hossein Delavaran,an Ale Algra,aa,at Gail Davies,au Sofia A Oliveira,av Colin N A Palmer,ar Ian Deary,au Helena Schmidt,aw Massimo Pandolfo,ac Joan Montaner,ab Cara Carty,z Paul I W de Bakker,ax,ay,az Konstantinos Kostulas,ba Jose M Ferro,bb Natalie R van Zuydam,ar Einar Valdimarsson,bc Børge G Nordestgaard,ao,bd Arne Lindgren,an Vincent Thijs,ak,al,am Agnieszka Slowik,aj Danish Saleheen,ai,be,bf Guillaume Paré,bg Klaus Berger,bh Gudmar Thorleifsson,m The Australian Stroke Genetics Collaborative, Wellcome Trust Case Control Consortium 2 (WTCCC2), Albert Hofman,i,k Thomas H Mosley,bi Braxton D Mitchell,g Karen Furie,bj Robert Clarke,f Christopher Levi,bk Sudha Seshadri,bl Andreas Gschwendtner,l Giorgio B Boncoraglio,t Pankaj Sharma,s Joshua C Bis,r Solveig Gretarsdottir,m Bruce M Psaty,bm Peter M Rothwell,bn Jonathan Rosand,ay,bi,bo James F Meschia,bp Kari Stefansson,m,n Martin Dichgans,l Hugh S Markus,a,* and on behalf of the International Stroke Genetics Consortium



Various genome-wide association studies (GWAS) have been done in ischaemic stroke, identifying a few loci associated with the disease, but sample sizes have been about 3500 cases or fewer. We established the METASTROKE collaboration with the aim of validating associations from previous GWAS and identifying novel genetic associations through meta-analysis of GWAS datasets for ischaemic stroke and its subtypes.


We meta-analysed data from 15 ischaemic stroke cohorts with a total of 12 389 individuals with ischaemic stroke and 62 004 controls, all of European ancestry. For the associations reaching genome-wide significance in METASTROKE, we did a further analysis, conditioning on the lead single nucleotide polymorphism in every associated region. Replication of novel suggestive signals was done in 13 347 cases and 29 083 controls.


We verified previous associations for cardioembolic stroke near PITX2 (p=2·8×10−16) and ZFHX3 (p=2·28×10−8), and for large-vessel stroke at a 9p21 locus (p=3·32×10−5) and HDAC9 (p=2·03×10−12). Additionally, we verified that all associations were subtype specific. Conditional analysis in the three regions for which the associations reached genome-wide significance (PITX2, ZFHX3, and HDAC9) indicated that all the signal in each region could be attributed to one risk haplotype. We also identified 12 potentially novel loci at p<5×10−6. However, we were unable to replicate any of these novel associations in the replication cohort.


Our results show that, although genetic variants can be detected in patients with ischaemic stroke when compared with controls, all associations we were able to confirm are specific to a stroke subtype. This finding has two implications. First, to maximise success of genetic studies in ischaemic stroke, detailed stroke subtyping is required. Second, different genetic pathophysiological mechanisms seem to be associated with different stroke subtypes.


Wellcome Trust, UK Medical Research Council (MRC), Australian National Health and Medical Research Council, National Institutes of Health (NIH) including National Heart, Lung and Blood Institute (NHLBI), the National Institute on Aging (NIA), the National Human Genome Research Institute (NHGRI), and the National Institute of Neurological Disorders and Stroke (NINDS).


Stroke is one of the three most common causes of death, is a major cause of adult chronic disability,1 and represents an important cause of age-related cognitive decline and dementia. Conventional risk factors explain only a small proportion of all stroke risk.2 Evidence from studies of twins and family history suggests that genetic predisposition is important.3 In common with many other complex diseases, in which environmental risk factors are thought to interact with multiple genes, the identification of the underlying molecular mechanisms contributing to stroke risk has been a challenge. Candidate gene studies have produced few replicable associations.4 More recently, the genome-wide association study (GWAS) approach has transformed the genetics of other complex diseases and is just beginning to affect the study of stroke.5,6

About 80% of stroke is ischaemic, whereas 20% is due to primary haemorrhage.6 Ischaemic stroke itself includes several subtypes with differing pathophysiological mechanisms, the most common of which are large-vessel disease stroke, small-vessel disease stroke, and cardioembolic stroke.7 Various genetic variants that predispose to risk factors for stroke have also been shown in GWAS to predispose to ischaemic stroke.8–10 Two loci associated with atrial fibrillation (PITX2 and ZFHX3) were associated with cardioembolic stroke, whereas a locus on chromosome 9p21 originally associated with coronary artery disease was shown to be a risk factor for large-vessel stroke.8–10 The few novel stroke-associated loci reported to date have been mainly associated with stroke subtypes, rather than with the phenotype of ischaemic stroke. In Japanese populations, a variant in the protein kinase C family (PRKCH) was associated with small-vessel stroke.11 A meta-analysis of prospective population-based cohort studies reported an association with the 12p13 region, thought to be with the NINJ2 gene, although this result was not replicated in a larger case-control sample.12,13 Recently, the Wellcome Trust Case Control Consortium 2 (WTCCC2) GWAS in ischaemic stroke reported a novel association on chromosome 7p21 within the HDAC9 gene, although it was associated only with large-vessel ischaemic stroke.14

GWAS in ischaemic stroke to date have used small discovery populations, with the largest including 3548 individuals.14 In other complex diseases, many additional associations have been detected as the discovery sample size has increased.15–17 This increase has usually been achieved by meta-analysis of independent datasets. Therefore, we established the METASTROKE collaboration to combine the available GWAS datasets of ischaemic stroke. Here, we describe the first paper from METASTROKE with a description of the constituent cohorts. Using this dataset, we attempted both to replicate previous GWAS associations with ischaemic stroke and to identify novel associations. Additionally, we determined whether stroke loci were specific to individual stroke subtypes.


Study design and participating studies

The discovery sample consisted of 15 cohorts of patients with ischaemic stroke who were of European ancestry from Europe, North America, and Australia, together with controls of matched ancestry. All studies used a case-control methodology. Most participating studies were cross-sectional, whereas four were in large, prospective, population-based cohorts (table 1).

Table 1
Description of cohorts used in analysis by study population

Additionally, 18 cohorts were analysed in the replication phase. These cohorts were included for replication only: most did not have GWAS data available, and those with GWAS data were not available at the time of the discovery analysis. 17 of the included cohorts contained individuals of solely European ancestry, and one contained individuals of Pakistani ancestry (table 1). Most cohorts (16) were cross-sectional, whereas two were population-based.

The appendix includes detailed descriptions of the design and clinical characteristics of the participating studies.

Stroke was defined as a typical clinical syndrome with radiological confirmation. Stroke subtyping was done with the Trial of Org 10172 in Acute Stroke Treatment (TOAST) classification system.18 Where subtyping was done, brain CT or MRI was undertaken for more than 95% of cases in all the discovery cohorts.

Participating studies were approved by relevant institutional review boards, and all participants gave written or oral consent for study participation, including genetic research, as approved by the local institutional body.

Data imputation and statistical analysis

The 15 discovery cohorts used commercially available GWAS panels of single nucleotide polymorphisms (SNPs) from either Affymetrix (Santa Clara, CA, USA) or Illumina (San Diego, CA, USA). 14 of the 15 centres undertook genotype imputation with HapMap II,19 HapMap III,20 or 1000 Genomes21 as reference haplotype training sets. Every centre did genotypic quality control steps before imputation, including removal of ancestry outliers defined by principal component analysis and poorly typed individuals.

We used logistic regression for all cohorts with a cross-sectional study design to model the multiplicative SNP effects on risk for the dichotomous outcome of stroke against ancestry-matched controls, whereas we used Cox proportional-hazards models for the prospective studies to assess time to first stroke, fitting an additive model relating genotype dose to the stroke outcome. Where genotypes were imputed, SNPs were modelled as allele dosages. Of the discovery cohorts, four (of 15) centres used ancestry-informative principal components as covariates to correct for population stratification. All cohorts providing genome-wide data removed population outliers before imputation. After verifying strand alignment, filtering SNPs with minor allele frequency lower than 0·01, and removing poorly imputed SNPs across centres, we did a meta-analysis of the results of the association analyses from every centre with a fixed-effects inverse-variance weighted model implemented in METAL.22
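The fixed-effects, inverse-variance weighted combination described above can be sketched in a few lines. This is a minimal illustration rather than the METAL implementation: the function name and the per-centre effect estimates (log odds ratios and their standard errors) are hypothetical values chosen only to show the arithmetic.

```python
import math

def fixed_effects_meta(betas, ses):
    """Pool per-centre log odds ratios with inverse-variance weights.

    betas: per-centre effect estimates (log OR per allele)
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / SE_i^2
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal p value
    return pooled_beta, pooled_se, z, p

# Hypothetical results for one SNP from two centres
beta, se, z, p = fixed_effects_meta([0.2, 0.4], [0.1, 0.1])
print(f"pooled OR = {math.exp(beta):.2f}, p = {p:.2e}")
```

Centres with smaller standard errors, usually the larger cohorts, dominate the pooled estimate, which is why the approach assumes that every centre is estimating the same underlying effect.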

We sought further evidence for association with novel suggestively associated SNPs in new samples from 18 different cohorts. Of the 18 centres, six submitted in-silico genotype data and 12 undertook direct genotyping with the Sequenom (Sequenom, San Diego, CA, USA) or Taqman (Applied Biosystems, Foster City, CA, USA) platforms. All of the five replication cohorts contributing genome-wide data used principal components as covariates in their analyses. We did a meta-analysis of the results for the replication cohorts using a fixed-effects, inverse variance weighted method first for all datasets, and then for replication datasets of solely European ancestry. We determined whether SNPs were significantly associated in the replication population, and additionally, we combined results from the discovery and replication analyses using a fixed-effects, inverse-variance weighted approach.

We set the study-wide genome-wide significance level at p<5×10−8 to control the experiment-wide error rate to <5%. Following the example of previous GWAS,15 we set the level for suggestive significance at p<5×10−6.

First, we attempted to determine the evidence for association for the six loci reported previously from GWAS to be associated with ischaemic stroke (HDAC9, PITX2, ZFHX3, NINJ2, PRKCH, and 9p21).8–12,14 After determining the evidence for association with the previously reported SNPs, we investigated whether any proxy SNPs were more significantly associated in the METASTROKE dataset. Because some loci had been identified in discovery populations included in METASTROKE, we initially did analyses for the whole dataset, and then we restricted analysis to the lead SNP for every locus in the METASTROKE cohorts that had not been included in the discovery phase of the initial publication. We set the significance level for independent replication at p<0·01, corresponding to Bonferroni corrected type 1 error <5% for the five SNPs (excluding PRKCH) tested.

As the SNP in PRKCH (rs2230500) underlying the previous association in Japanese cohorts11 is monomorphic in populations of European ancestry, we sought to identify any associations within this gene region, including the 50 kbp window upstream and downstream, in our large population of European ancestry. Using the modified Nyholt correction approach of Li and Ji on the 353 SNPs from the region, we estimated the effective number of SNPs tested to be 103·3.23 We therefore set the significance level at p<0·00048, corresponding to Bonferroni corrected type I error <5% for the effective SNPs tested.
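The two Bonferroni thresholds quoted in this section (p<0·01 for five SNPs and p<0·00048 for 103·3 effective SNPs) follow from dividing 0·05 by the number of tests. The sketch below reproduces that arithmetic and shows the Li and Ji estimate of the effective number of tests, which sums I(|λ| ≥ 1) + (|λ| − ⌊|λ|⌋) over the eigenvalues λ of the SNP correlation matrix. The function names and the toy eigenvalues are assumptions for illustration; the eigenvalues for the 353 PRKCH-region SNPs are not given in the text.

```python
import math

def li_ji_effective_tests(eigenvalues):
    """Effective number of independent tests (Li and Ji) from the
    eigenvalues of the SNP-by-SNP correlation matrix."""
    m_eff = 0.0
    for lam in eigenvalues:
        x = abs(lam)
        # Each eigenvalue >= 1 counts as one full test, plus its fractional part
        m_eff += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return m_eff

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error below alpha."""
    return alpha / n_tests

# Toy eigenvalues for three correlated SNPs (hypothetical)
toy_m_eff = li_ji_effective_tests([2.5, 0.3, 0.2])

# Thresholds stated in the text
known_loci_threshold = bonferroni_threshold(0.05, 5)       # five previously reported SNPs
prkch_threshold = bonferroni_threshold(0.05, 103.3)        # effective SNPs in the PRKCH region
print(toy_m_eff, known_loci_threshold, round(prkch_threshold, 5))
```

Because correlated SNPs contribute less than one test each, the 353 SNPs in the region reduce to 103·3 effective tests, giving a less stringent threshold than a naive correction for 353 tests would.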

We also did an analysis to determine whether the six previously reported variants were associated with stroke risk in prospective population-based studies. We did this analysis only for the known SNPs that had been analysed in a minimum of 100 cases in the prospective cohorts with incident stroke events for the relevant subtype.

For those associations we could confirm, we then did a conditional analysis within the associated region to identify any signal in the region that was independent of the lead SNP in every case. For every association, we selected regions used in the conditional analysis on the basis of adjacent recombination hotspots, meaning we analysed different numbers of SNPs for every locus (appendix). We used logistic regression in every centre, using imputed genotype dosages to model the effect of the lead SNP on risk as a covariate. We then did a meta-analysis of the results using a fixed-effects, inverse-variance weighted model. We used our suggestive significance threshold (p<5×10−6) to identify SNPs that were statistically independent of the lead SNP for every locus.
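The idea behind the conditional analysis can be illustrated with a toy logistic regression in which the lead SNP is included as a covariate. The sketch below is not the study pipeline: it uses invented grouped genotype counts rather than imputed dosages, a single dataset rather than per-centre fits followed by meta-analysis, and a hand-written Newton-Raphson fit so that it needs only the standard library. In the invented data, risk depends only on the lead SNP, and the second SNP is correlated with it.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def fit_logistic(groups, n_iter=25):
    """Newton-Raphson fit of a grouped logistic model.

    groups: list of (covariates, n_cases, n_total)
    Returns [intercept, beta_1, ...] on the log-odds scale.
    """
    k = len(groups[0][0]) + 1
    beta = [0.0] * k
    for _ in range(n_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for covs, cases, total in groups:
            x = [1.0] + list(covs)
            eta = sum(bi * xi for bi, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = total * p * (1.0 - p)
            for i in range(k):
                score[i] += x[i] * (cases - total * p)
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, score)
        beta = [bi + si for bi, si in zip(beta, step)]
    return beta

# Invented counts: case fraction depends only on the lead SNP genotype (0, 1, 2),
# and the second SNP is in linkage disequilibrium with it (matching genotypes are 3x as common).
CASES_PER_30 = {0: 6, 1: 10, 2: 15}
groups = []
for lead in (0, 1, 2):
    for other in (0, 1, 2):
        scale = 3 if lead == other else 1
        groups.append(((lead, other), CASES_PER_30[lead] * scale, 30 * scale))

marginal = fit_logistic([((covs[1],), cases, total) for covs, cases, total in groups])
conditional = fit_logistic(groups)
print("second SNP alone:", round(marginal[1], 3))
print("second SNP given lead SNP:", round(conditional[2], 3))
```

The second SNP looks associated when tested alone, but its coefficient drops to zero once the lead SNP is in the model, which is the pattern interpreted in this paper as a single risk haplotype accounting for the regional signal.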

We then did a meta-analysis of the genome-wide study-specific analysed datasets to identify novel associations with ischaemic stroke and its subtypes. We did the primary association analyses for all ischaemic stroke and for the three major subtypes: cardioembolic stroke, large-vessel disease, and small-vessel disease. We did additional secondary analyses for young cases (younger than 70 years at first stroke) and for the phenotype of ischaemic stroke in each sex separately. We reused the same controls per centre for all analyses. Excluding the previously published associations, we considered all SNPs reaching suggestive significance (p<5×10−6) for replication. We examined SNPs for heterogeneity across datasets and attempted replication in independent datasets for the loci that were deemed plausible candidates for association with ischaemic stroke.
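The text does not state which heterogeneity statistic was used when SNPs were examined across datasets. A common choice in fixed-effects GWAS meta-analysis is Cochran's Q with the derived I² statistic, sketched below as an assumption rather than as the study's method; the function name and input values are hypothetical.

```python
def cochran_q(betas, ses):
    """Cochran's Q and I^2 for per-dataset effect estimates.

    betas: per-dataset log odds ratios
    ses:   matching standard errors
    Returns (q, degrees_of_freedom, i_squared).
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of variation beyond what chance alone would produce
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared

# Two hypothetical datasets with clearly different effects
q, df, i2 = cochran_q([0.0, 0.4], [0.1, 0.1])
print(q, df, i2)
```

Under the null hypothesis of a shared effect, Q follows approximately a chi-squared distribution with df degrees of freedom, so a large Q relative to df flags SNPs whose effects differ between datasets.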

For a minor allele frequency of 0·25, we had 80% power to detect variants with a per-allele odds ratio (OR) greater than 1·11 for the all ischaemic stroke analysis, 1·23 for cardioembolic stroke, 1·24 for large-vessel disease, and 1·26 for small-vessel disease at p<5×10−8 in the discovery phase.

Role of the funding source

The sponsors of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.


The discovery meta-analysis of ischaemic stroke phenotypes involved a total of 12 389 cases and 62 004 controls from 15 populations (table 1; figure 1).

Figure 1
Flow diagram of METASTROKE analyses

The discovery meta-analysis confirmed associations at genome-wide significance levels for HDAC9 with large-vessel disease, and for both PITX2 and ZFHX3 with cardioembolic stroke (table 2). For PITX2, ZFHX3, and HDAC9 a proxy SNP was more significant in the METASTROKE dataset than the SNP from the original publication (original SNP shown in appendix). The 9p21 locus was associated with large-vessel disease with a similar OR (1·15, 95% CI 1·08–1·23, in METASTROKE) to that reported previously (1·21, 1·07–1·37),10 although it did not reach genome-wide significance (p=3·32×10−5). All four associations were subtype specific, being present only for a single stroke subtype (table 2). To determine the extent to which these results replicated the findings from the originally published associations, we repeated the meta-analysis, this time excluding the populations that contributed to the discovery phase of the original publication. For the PITX2, ZFHX3, HDAC9, and 9p21 loci, the associations were replicated in the independent METASTROKE samples (table 2). The population attributable risks in the METASTROKE discovery cohort were estimated as 5·8% for PITX2 and 7·0% for ZFHX3 in cardioembolic stroke, and 4·5% for HDAC9 and 7·2% for 9p21 in large-vessel disease.

Table 2
METASTROKE association signals for SNPs identified in previous genome-wide association studies by gene and disease subtype

The NINJ2 locus showed nominal evidence of association with all ischaemic stroke when all populations were included (table 2). However, no evidence was noted for association with the NINJ2 locus when the original discovery populations were excluded (table 2).

To estimate the effect of these associations in prospective population-based studies, we had a sufficient number of stroke cases for the analysis in only the cardioembolic subtype (n=376). We noted ORs similar to those identified in the overall case-control study for both PITX2 (1·26, 95% CI 1·05–1·52, in prospective studies and 1·36, 1·27–1·47, in case-control analysis) and ZFHX3 (1·23, 0·98–1·55, in prospective studies and 1·25, 1·15–1·35, in case-control analysis), although this similarity was significant only for PITX2 (appendix).
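The statement that only PITX2 was significant in the prospective studies can be checked approximately from the reported intervals. The sketch below assumes the 95% CIs are symmetric Wald intervals on the log scale, recovers the standard error from the interval width, and computes a two-sided p value; the function name is hypothetical and the result is an approximation, not the study's own test.

```python
import math

def wald_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover log OR, its standard error, and a two-sided p value
    from an odds ratio and its 95% confidence interval."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p

# Prospective-study estimates quoted in the text
pitx2 = wald_from_or_ci(1.26, 1.05, 1.52)
zfhx3 = wald_from_or_ci(1.23, 0.98, 1.55)
print(f"PITX2 p = {pitx2[2]:.3f}, ZFHX3 p = {zfhx3[2]:.3f}")
```

The ZFHX3 interval includes 1, so its p value exceeds 0·05, whereas the PITX2 interval excludes 1, consistent with the text.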

We found no significant associations between the PRKCH gene region and all ischaemic strokes or with the three main subtype analyses. Table 2 provides details of the most strongly associated SNPs in every subtype for this locus.

For those loci for which we confirmed genome-wide significance (PITX2, ZFHX3, and HDAC9), we did conditional analyses. After conditioning on the lead SNP in the given region, no SNP showed significance at p<0·01 in PITX2 or ZFHX3, and no SNP showed significance at p<0·005 in HDAC9. Furthermore, all other SNPs in the regions that were associated at p<5×10−8 in the main analysis showed no significance (p>0·05) in any of the analyses after conditioning on the lead SNP. Figure 2 shows plots of –log10(p values) against genomic position in the selected regions for the unconditional and conditional analyses.

Figure 2
Manhattan plots of –log10(p) against genomic position for principal analyses

We selected a total of 12 novel SNPs for testing in the independent replication cohort: three associated with all ischaemic stroke, five associated with specific stroke subtypes, and two each associated with young stroke and female stroke. Four of these SNPs showed associations close to genome-wide significance in the discovery cohort: rs225132 in the ERRFI1 gene and rs17696736 in the NAA25 (C12orf30) gene with all ischaemic stroke (p=6·3×10−8 and 5·9×10−8, respectively), rs7937106 in ALKBH8 with large-vessel disease (p=5·9×10−8), and rs13407662 on chromosome 2p16.2 (p=5·2×10−8) in an intergenic region with small-vessel disease. The remaining SNPs were identified at the suggestive significance level of p<5×10−6. Table 3 shows details of these SNPs, including stroke subtypes with which they were associated, and significance levels. These 12 novel SNPs were taken forward for replication in an additional 13 347 cases and 29 083 controls. Figure 3 shows the plots of –log10(p values) by chromosomal location for the analysis of all stroke and the three main subtypes.

Figure 3
Plots of conditional analysis regions before and after conditioning on lead SNP

Table 3
Association signals for SNPs selected for testing in the independent replication cohort by subtype

None of the novel SNPs reached genome-wide significance on combination of the discovery and replication data. This result was the same when replication analysis was restricted to individuals of European ancestry (table 3). There was significant heterogeneity (p<0·05) for all of the SNPs in the combined analysis. We had sufficient sample size to obtain 80% power to confirm each of the 12 loci (appendix).


METASTROKE is the first large meta-analysis of stroke GWAS data (panel). The METASTROKE collaboration brings together GWAS data from more than 12 000 cases of ischaemic stroke and 60 000 controls from 15 cohorts all of European ancestry. In this first analysis from the dataset, we confirmed four of five previously described associations with ischaemic stroke in populations of European ancestry, including replication in an independent non-overlapping sample of the dataset not included in the original GWAS. All these associations were with specific subtypes of ischaemic stroke, emphasising the genetic heterogeneity of the disease. Additionally, we identified several promising novel associations, some of which were close to genome-wide significance in the discovery cohorts, but these were not confirmed in our replication population.


Research in context

Systematic review

As part of the International Stroke Genetics Consortium we had access to several genome-wide association datasets for ischaemic stroke, including both published and unpublished studies. To identify other studies, we searched PubMed on July 30, 2012, for published genome-wide association studies in ischaemic stroke with the terms “ischaemic stroke” and “genome wide association”. The search returned studies already included by the consortium members. No further studies with ischaemic stroke as a primary endpoint were identified.


This is the largest analysis of genetic data for ischaemic stroke. This study provides evidence that common genetic variation has a role in the pathogenesis of ischaemic stroke. The genetic associations identified so far are with specific stroke subtypes, suggesting that the different subtypes of ischaemic stroke have different risk factor profiles and pathophysiological mechanisms, with potential implications for all areas of stroke research.

Our results provide further robust data supporting an association between two gene regions (PITX2 and ZFHX3) and cardioembolic stroke, and a further two (HDAC9 and 9p21) with large-vessel stroke although the 9p21 locus did not reach genome-wide significance. In all cases, these associations were present in the dataset as a whole, and also when those samples used in the original discovery cohorts that identified associations with ischaemic stroke were excluded.

Both PITX2 and ZFHX3 were originally identified as risk factors for atrial fibrillation.8,9 Atrial fibrillation is a major risk factor for stroke, particularly in the elderly, and therefore their association with ischaemic stroke is not unexpected. Our results confirm this association and clearly show that it is limited to the cardioembolic stroke subtype. Furthermore, we were able to show an association between PITX2 and ischaemic stroke in prospective cohorts. A potential bias is that a variant that is in fact associated with mortality rate after acute stroke and not with stroke risk might seem to be related to risk; for cross-sectional studies in a disease such as stroke, which has substantial early mortality, death might occur before or soon after hospital admission before samples are taken. In a prospective study, such cases are included as the sample was taken at recruitment to the study and therefore before the onset of stroke.

By contrast, the HDAC9 and 9p21 associations were specific to large-vessel stroke, and not present with other stroke subtypes. The 9p21 locus was first associated with myocardial infarction and coronary artery disease but has now been associated more widely with other arterial diseases such as aneurysms and ischaemic stroke.10,24 HDAC9 was recently identified in the WTCCC2 ischaemic stroke study as a novel association with ischaemic stroke,14 having not previously been shown in GWAS analyses of ischaemic heart disease.

For the PITX2, ZFHX3, and HDAC9 associations, we did a conditional analysis to establish whether the lead SNP that we had identified was sufficient to model all of the associations within that region, or whether other independent genetic variants were associated with disease. In every case, no significant association remained after controlling for the lead SNP, suggesting that all the signal in each region can be attributed to one risk haplotype.

A meta-analysis of prospective cohort studies reported an association between ischaemic stroke and a SNP in the 12p13 region, although this was not replicated in an independent study.13 The underlying gene was suggested to be NINJ2.12 This association was present in the METASTROKE discovery cohort, but this cohort contained the datasets in which the original association had been determined. When these datasets were excluded, there was no evidence of any associations.

In a Japanese population, a variant in PRKCH has been associated with small-artery disease, a stroke subtype that is particularly common in this ethnic group.11 This association was confirmed in a prospective study with relatively few stroke endpoints, and also in a Chinese population.25,26 Interestingly, an association was also suggested with cerebral haemorrhage, which shares some underlying pathological similarities with cerebral small-vessel disease causing lacunar infarction. The association has not yet been examined in other ancestral groups. The SNP is monomorphic in European populations and therefore we were unable to examine whether the association was present in our population. However, we assessed all SNPs at this chromosomal region and noted no evidence of any association in our population of European ancestry.

We identified associations at four loci that were near genome-wide significance in the discovery cohort and had not been associated with stroke in previous studies: SNPs in the ERRFI1 and NAA25 (C12orf30) genes with all ischaemic stroke, a SNP in ALKBH8 with large-vessel stroke, and rs13407662 on chromosome 2p16.2 in an intergenic region with small-vessel disease. We took these four forward, with an additional eight of the strongest associations that had not reached genome-wide significance, to replication in an independent sample. None of the associations replicated. Our replication sample contained a cohort of patients of Pakistani ancestry, but restriction of our analysis to individuals of European ancestry did not alter the results.

The same risk allele of SNP rs17696736 in the NAA25 gene has previously been associated with type 1 diabetes in a large genome-wide association study.27 Other SNPs in this 12q24 region have also been implicated in several related phenotypes, including microcirculation in vivo, platelet count, and blood pressure.28–30 None of the other three associations near to genome-wide significance have previously been associated with cardiovascular or neurological disease.

Our inability to replicate any of the novel associations we identified in the discovery phase could be explained by various factors. All non-imputed SNPs in all cohorts were checked for Hardy-Weinberg equilibrium and standard quality control measures were done, including checking for sex mismatch on the basis of three genotypic markers, but we cannot rule out confounding by other means. For example, many of the 12 replication cohorts only directly genotyped the 12 replication SNPs. First, this type of analysis provides no means of adjustment for ancestry-informative principal components, which could lead to results being adversely affected by population structure. Second, our strategy of attempting replication with one SNP from each region might not have been optimum. In regions such as the 12q24 locus, where the linkage disequilibrium patterns are complex, attempting replication in multiple SNPs might have proved more fruitful. Furthermore, one SNP (rs13407662) associated with small-vessel disease in the discovery phase failed genotyping in more than half of the replication cohorts. Genotyping multiple SNPs at this locus might have avoided this issue. We also cannot rule out confounding because of other environmental factors or phenotypic heterogeneity. Although phenotyping was done using the TOAST classification system, interpretation of exact classification criteria and definitions can differ across countries and studies, which becomes more of an issue when there are many smaller cohorts, such as in the replication phase of this study. Varying cohort study designs might also increase heterogeneity in large-scale meta-analyses.

Our results show that, although genetic variants associated with ischaemic stroke can be detected, all associations we were able to confirm were specific to a stroke subtype. This finding has two implications. First, to maximise success of genetic studies in ischaemic stroke, detailed stroke subtyping is needed. Second, it implies that different pathophysiological mechanisms are associated with different stroke subtypes and, therefore, drug treatments might have different effects in different stroke subtypes. Most trials of secondary prevention in stroke have included all strokes, with limited stroke subtyping, and further studies with detailed subtyping would be required to show different pharmacological profiles.

METASTROKE brings together GWAS data from most groups working in the area of stroke genetics worldwide. This paper describes the details of every population and represents the first analysis of the datasets. Various additional GWAS in stroke are currently taking place or have recently been completed, including a recently published GWAS in an Australian population, which confirmed an association at a 6p21.1 locus with large-artery atherosclerotic stroke.31 The addition of these data might lead to identification of further novel associations with ischaemic stroke.


Atherosclerosis Risk in Communities Study (ARIC) is a collaborative study supported by National Heart, Lung, and Blood Institute (NHLBI) contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, and HHSN268201100012C), R01HL087641, R01HL59367, and R01HL086694; National Human Genome Research Institute contract U01HG004402; National Institutes of Health (NIH) contract HHSN268200625226C; and NHLBI contracts N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, and grants R01-HL087641, U01 HL096917 (T H Mosley), and R01-HL093029 (M Fornage). Infrastructure was partly supported by Grant Number UL1RR025005, a component of the NIH and NIH Roadmap for Medical Research. ARIC analyses performed as part of the METASTROKE project were supported by grant HL-093029 to M Fornage.
Australian Stroke Genetics Collaboration (ASGC): Australian population control data were derived from the Hunter Community Study. We also thank the University of Newcastle for funding and the men and women of the Hunter region who participated in this study. This research was funded by grants from the Australian National Health and Medical Research Council (NHMRC Project Grant ID: 569257), the Australian National Heart Foundation (NHF Project Grant ID: G 04S 1623), the University of Newcastle, the Gladys M Brawn Fellowship scheme, and the Vincent Fairfax Family Foundation in Australia. Elizabeth G Holliday is supported by the Australian NHMRC Fellowship scheme.
Bio-Repository of DNA in Stroke (BRAINS) is partly funded by a Senior Fellowship from the Department of Health (UK) to P Sharma, the Henry Smith Charity and the UK-India Education Research Initiative (UKIERI) from the British Council.
Cardiovascular Health Study (CHS) research was supported by NHLBI contracts N01-HC-85239, N01-HC-85079 through N01-HC-85086, N01-HC-35129, N01 HC-15103, N01 HC-55222, N01-HC-75150, N01-HC-45133, HHSN268201200036C and NHLBI grants HL080295, HL087652, HL105756 with additional contribution from the National Institute of Neurological Disorders and Stroke (NINDS). Additional support was provided through AG-023629, AG-15928, AG-20098, and AG-027058 from the National Institute on Aging (NIA). DNA handling and genotyping was supported in part by National Center of Advancing Translational Technologies CTSI grant UL1TR000124 and National Institute of Diabetes and Digestive and Kidney Diseases grant DK063491 to the Southern California Diabetes Endocrinology Research Center.
deCODE Genetics: Work performed at deCODE was funded in part through a grant from the European Community's Seventh Framework Programme (FP7/2007-2013), the ENGAGE project grant agreement HEALTH-F4-2007-201413.
Framingham Heart Study (FHS): This work was supported by the dedication of the Framingham Heart Study participants, the NHLBI's Framingham Heart Study (Contract Nos. N01-HC-25195 and N02-HL-6-4278), and by grants from the NINDS (NS17950), the NHLBI (HL93029), and the NIA (AG033193).
Genetics of Early Onset Stroke (GEOS) Study, Baltimore, USA was supported by the NIH Genes, Environment and Health Initiative (GEI) Grant U01 HG004436, as part of the GENEVA consortium under GEI, with additional support provided by the Mid-Atlantic Nutrition and Obesity Research Center (P30 DK072488), and the Office of Research and Development, Medical Research Service, and the Baltimore Geriatrics Research, Education, and Clinical Center of the Department of Veterans Affairs. Genotyping services were provided by the Johns Hopkins University Center for Inherited Disease Research (CIDR), which is fully funded through a federal contract from the NIH to the Johns Hopkins University (contract number HHSN268200782096C). Assistance with data cleaning was provided by the GENEVA Coordinating Center (U01 HG 004446; PI Bruce S Weir). Study recruitment and assembly of datasets were supported by a Cooperative Agreement with the Division of Adult and Community Health, Centers for Disease Control and Prevention and by grants from NINDS and the NIH Office of Research on Women's Health (R01 NS45012, U01 NS069208-01).
Heart Protection Study (HPS) (ISRCTN48489393) was supported by the UK Medical Research Council (MRC), British Heart Foundation, Merck and Co (manufacturers of simvastatin), and Roche Vitamins Ltd (manufacturers of vitamins). Genotyping was supported by a grant to Oxford University and CNG from Merck and Co. Jemma C Hopewell acknowledges support from the British Heart Foundation Centre of Research Excellence, Oxford (RE/08/004).
Heart and Vascular Health Study (HVH) research reported in this article was funded by NHLBI grants R01 HL085251 and R01 HL073410.
Ischemic Stroke Genetics Study (ISGS)/Siblings With Ischemic Stroke Study (SWISS) was supported in part by the Intramural Research Program of the NIA, NIH project Z01 AG-000954-06. ISGS/SWISS used samples and clinical data from the NIH-NINDS Human Genetics Resource Center DNA and Cell Line Repository, human subjects protocol numbers 2003-081 and 2004-147. ISGS/SWISS used stroke-free participants from the Baltimore Longitudinal Study of Aging (BLSA) as controls. The inclusion of BLSA samples was supported in part by the Intramural Research Program of the NIA, NIH project Z01 AG-000015-50, human subjects protocol number 2003-078. The ISGS study was funded by NIH-NINDS Grant R01 NS-42733 (J F Meschia). The SWISS study was funded by NIH-NINDS Grant R01 NS-39987 (J F Meschia). This study used the high-performance computational capabilities of the Biowulf Linux cluster at the NIH.
MGH Genes Affecting Stroke Risk and Outcome Study (MGH-GASROS) was supported by NINDS (U01 NS069208), the American Heart Association/Bugher Foundation Centers for Stroke Prevention Research 0775010N, the NIH and NHLBI's STAMPEED genomics research program (R01 HL087676), and a grant from the National Center for Research Resources. The Broad Institute Center for Genotyping and Analysis is supported by grant U54 RR020278 from the National Center for Research Resources.
Milano - Besta Stroke Register: Collection and genotyping of the Milan cases within CEDIR were supported by Annual Research Funding of the Italian Ministry of Health (Grant Numbers: RC 2007/LR6, RC 2008/LR6, RC 2009/LR8, RC 2010/LR8). FP6 LSHM-CT-2007-037273 for the PROCARDIS control samples.
Rotterdam Study was supported by the Netherlands Organization of Scientific Research (175.010.2005.011), the Netherlands Genomics Initiative (NGI)/Netherlands Organization for Scientific Research (NWO) Netherlands Consortium for Healthy Ageing (050-060-810), the Erasmus Medical Center and Erasmus University, Rotterdam, the Netherlands Organization for Health Research and Development, the Research Institute for Diseases in the Elderly, the Ministry of Education, Culture, and Science, the Ministry for Health, Welfare, and Sports, the European Commission, and the Municipality of Rotterdam to the Rotterdam Study. Further funding was obtained from the Netherlands Heart Foundation (Nederlandse Hartstichting) 2009B102.
Wellcome Trust Case-Control Consortium 2 (WTCCC2) was principally funded by the Wellcome Trust, as part of the Wellcome Trust Case Control Consortium 2 project (085475/B/08/Z and 085475/Z/08/Z and WT084724MA). The Stroke Association provided additional support for collection of some of the St George's, London cases. The Oxford cases were collected as part of the Oxford Vascular Study which is funded by the MRC, Stroke Association, Dunhill Medical Trust, National Institute of Health Research (NIHR) and the NIHR Biomedical Research Centre, Oxford. The Edinburgh Stroke Study was supported by the Wellcome Trust (clinician scientist award to C Sudlow), and the Binks Trust. Sample processing occurred in the Genetics Core Laboratory of the Wellcome Trust Clinical Research Facility, Western General Hospital, Edinburgh. Much of the neuroimaging occurred in the Scottish Funding Council Brain Imaging Research Centre, Division of Clinical Neurosciences, University of Edinburgh, a core area of the Wellcome Trust Clinical Research Facility and part of the SINAPSE (Scottish Imaging Network—A Platform for Scientific Excellence) collaboration, funded by the Scottish Funding Council and the Chief Scientist Office. Collection of the Munich cases and data analysis was supported by the Vascular Dementia Research Foundation. M Farrall and A Helgadottir acknowledge support from the BHF Centre of Research Excellence in Oxford and the Wellcome Trust core award (090532/Z/09/Z).
Barcelona: The Neurovascular Research Laboratory takes part in the International Stroke Genetics Consortium (ISGC), the Spanish Stroke Genetics Consortium, and the Cooperative Neurovascular Research RENEVAS (RD06/0026/0010). This study was funded by a grant of the Spanish government (PI10/01212). The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreements #201024 and #202213 (European Stroke Network).
Belgium Stroke Study (BSS) was supported by Erasme Funds.
Edinburgh Stroke Study (ESS) (which contributed discovery cases as part of WTCCC2 and additional replication cases) was supported as described above.
Lothian Birth Cohort 1936 was supported in part by Research Into Ageing, Help the Aged (Sidney De Haan Award and The Disconnected Mind Major Gift Campaign), MRC, and UK Biotechnology and Biological Sciences Research Council (BBSRC). Lothian Birth Cohort 1936 was also supported by a programme grant from Research Into Ageing and continues with programme grants from Help the Aged/Research Into Ageing (Disconnected Mind). The work was undertaken by The University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology, part of the cross council Lifelong Health and Wellbeing Initiative (G0700704/84698). Funding from the BBSRC, Engineering and Physical Sciences Research Council (EPSRC), Economic and Social Research Council (ESRC), and MRC is gratefully acknowledged. Genotyping of the LBC1936 was funded by the BBSRC.
Glasgow: The work was supported by NHS Greater Glasgow Endowment funds.
Genetics of Diabetes Audit and Research in Tayside Study (Go-DARTS): N R van Zuydam is supported by PhD funding from IMI SUMMIT study, under the EU Framework Programme 7 funding stream. The Wellcome Trust provides support for Wellcome Trust United Kingdom Type 2 Diabetes Case Control Collection (Go-DARTS) and the Scottish Health Informatics Programme.
Graz Stroke Study: Genetic studies of the Austrian Stroke Prevention Study are supported by the Austrian Science Fund (P20545).
Interstroke has received unrestricted grants from the Canadian Institutes of Health Research, Heart and Stroke Foundation of Canada, Canadian Stroke Network, Pfizer Cardiovascular Award, Merck, AstraZeneca, and Boehringer Ingelheim. The study was facilitated by CANNeCTIN network. Funding for genotyping was provided by the Heart and Stroke Foundation of Ontario, the Canadian Stroke Network, and by an unrestricted grant from Boehringer-Ingelheim.
Leuven Stroke Study is funded by personal research funds from the Department of Neurology, University Hospital Leuven, Leuven, Belgium. V Thijs and R Lemmens are supported by Fundamental Clinical Investigatorships from FWO Flanders.
Lund Stroke Register (LSR) was supported by the Swedish Research Council (K2007-61X-20378-01-3, K2010-61X-20378-04-3), Region Skåne, the Freemasons Lodge of Instruction EOS in Lund, King Gustav V and Queen Victoria's Foundation, Lund University, and the Swedish Stroke Association. DNA extraction and preparation for LSR was performed by the SWEGENE Resource Center for Profiling Polygenic Disease (Skåne University Hospital, Malmö, Sweden).
Munster (Westphalian Stroke Cases and Controls from the Dortmund Health Study, Germany): Case ascertainment in the Westphalian Stroke Register was part of the German Competence Net Stroke, supported by the German Federal Ministry of Education and Research (01GI9909/3). Blood collection in the Dortmund Health Study was done through funds from the Institute of Epidemiology and Social Medicine University of Muenster. The collection of sociodemographic and clinical data in the Dortmund Health Study was supported by the German Migraine and Headache Society (DMKG) and by unrestricted grants of equal share from Almirall, Astra Zeneca, Berlin Chemie, Boehringer, Boots Health Care, Glaxo-Smith-Kline, Janssen Cilag, McNeil Pharma, MSD Sharp & Dohme, and Pfizer to the University of Muenster.
Portugal: SAO, JMF, and AMV are deeply grateful to all study participants, to the genotyping unit at the Instituto Gulbenkian de Ciência, and to the Portuguese study neurologists and nurses for their contributions. This work was supported by the PTDC/SAU-GMG/64426/2006 grant, a Ciência 2008 contract (SAO) and doctoral fellowships from the Portuguese Fundação para a Ciência e a Tecnologia.
Poland (Krakow): The study was supported by the grant from the Jagiellonian University, Krakow, Poland: K/ZDS/002848.
SMART study, the Netherlands: Genotyping in the SMART Study was made possible, in part, by a Complementation Grant to P I W de Bakker from the Biobanking and Biomolecular Research Infrastructure in the Netherlands (BBMRI-NL). S Achterberg is working in part on a grant from the Netherlands Heart Foundation, No. 2005B031.
VISP: The GWAS component of the VISP study was supported by the United States National Human Genome Research Institute (NHGRI), Grant U01 HG005160 (PI Michèle Sale & Bradford Worrall), as part of the Genomics and Randomized Trials Network (GARNET). Genotyping services were provided by the Johns Hopkins University Center for Inherited Disease Research (CIDR), which is fully funded through a federal contract from the NIH to the Johns Hopkins University. Assistance with data cleaning was provided by the GARNET Coordinating Center (U01 HG005157; PI Bruce S Weir). Study recruitment and collection of datasets for the VISP clinical trial were supported by an investigator-initiated research grant (R01 NS34447; PI James Toole) from the United States Public Health Service, NINDS, Bethesda, Maryland. Control data were obtained through the database of Genotypes and Phenotypes (dbGaP) maintained and supported by the United States National Center for Biotechnology Information, US National Library of Medicine.
WHI: Funding support for WHI-GARNET was provided through the NHGRI GARNET (Grant Number U01 HG005152). Assistance with phenotype harmonisation and genotype cleaning, as well as with general study coordination, was provided by the GARNET Coordinating Center (U01 HG005157). Funding support for genotyping, which was performed at the Broad Institute of MIT and Harvard, was provided by the NIH Genes, Environment, and Health Initiative (GEI; U01 HG004424).


HM, MD, and MFa designed the experiment. MT drew the figures. MFa, MT, SB, and RM did the meta-analysis and subsequent replication statistical analysis. MT, MFa, EGH, CS, JCH, Y-CC, MFo, FAI, RM, SV, UT, MAN, WTL, KLW, SY, EAP, ALD, KS, BBW, SJK, MSK, AH, THM, BDM, KF, RC, CL, SS, AG, GBB, PS, JCB, BMP, PMR, JR, JFM, SG, MD, and HSM were responsible for the collection, phenotyping, or analysis of the discovery cohorts. Replication samples or replication data were provided by AR, AH, SAc, IF-C, SAb, RS, MW, W-MC, EBR, MO, WKH, JP, RL, BN, PH, MB, MS, GK, ASFD, AMV, HD, AA, GD, SAO, CNAP, ID, HS, MP, JM, CC, PIWD, KK, JMF, NRV, BGN, AL, VT, AS, DS, GP, KB, GT, and CS. SB coordinated wet lab replication genotyping. MT, HM, and MFa wrote the first draft of the report. All authors reviewed and commented on the report.

Conflicts of interest

All authors affiliated with deCODE are employees of deCODE, a biotechnology company. Some deCODE employees own stock options in deCODE. The other authors declare that they have no conflicts of interest.


1. Department of Health. Reducing brain damage: faster access to better stroke care. National Audit Office; London: 2005.
2. Sacco RL, Ellenberg JH, Mohr JP. Infarcts of undetermined cause: the NINCDS Stroke Data Bank. Ann Neurol. 1989;25:382–390.
3. Dichgans M. Genetics of ischaemic stroke. Lancet Neurol. 2007;6:149–161.
4. Dichgans M, Markus HS. Genetic association studies in stroke: methodological issues and proposed standard criteria. Stroke. 2005;36:2027–2031.
5. Manolio TA, Collins FS, Cox NJ. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753.
6. Markus HS. Stroke genetics. Hum Mol Genet. 2011;20:R124–R131.
7. Jerrard-Dunne P, Cloud G, Hassan A, Markus HS. Evaluating the genetic component of ischemic stroke subtypes: a family history study. Stroke. 2003;34:1364–1369.
8. Gretarsdottir S, Thorleifsson G, Manolescu A. Risk variants for atrial fibrillation on chromosome 4q25 associate with ischemic stroke. Ann Neurol. 2008;64:402–409.
9. Gudbjartsson DF, Holm H, Gretarsdottir S. A sequence variant in ZFHX3 on 16q22 associates with atrial fibrillation and ischemic stroke. Nat Genet. 2009;41:876–878.
10. Gschwendtner A, Bevan S, Cole JW. Sequence variants on chromosome 9p21.3 confer risk for atherosclerotic stroke. Ann Neurol. 2009;65:531–539.
11. Kubo M, Hata J, Ninomiya T. A nonsynonymous SNP in PRKCH (protein kinase C eta) increases the risk of cerebral infarction. Nat Genet. 2007;39:212–217.
12. Ikram MA, Seshadri S, Bis JC. Genomewide association studies of stroke. N Engl J Med. 2009;360:1718–1728.
13. International Stroke Genetics Consortium, Wellcome Trust Case–Control Consortium 2. Failure to validate association between 12p13 variants and ischemic stroke. N Engl J Med. 2010;362:1547–1550.
14. International Stroke Genetics Consortium (ISGC), Wellcome Trust Case Control Consortium 2 (WTCCC2), Bellenguez C. Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke. Nat Genet. 2012;44:328–333.
15. Franke A, McGovern DP, Barrett JC. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci. Nat Genet. 2010;42:1118–1125.
16. Barrett JC, Clayton DG, Concannon P. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nat Genet. 2009;41:703–707.
17. International Parkinson Disease Genomics Consortium, Nalls MA, Plagnol V. Imputation of sequence variants for identification of genetic risks for Parkinson's disease: a meta-analysis of genome-wide association studies. Lancet. 2011;377:641–649.
18. Adams HP Jr, Bendixen BH, Kappelle LJ, et al. Classification of subtype of acute ischemic stroke. Definitions for use in a multicenter clinical trial. TOAST. Trial of Org 10172 in Acute Stroke Treatment. Stroke. 1993;24:35–41.
19. International HapMap Consortium, Frazer KA. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861.
20. International HapMap 3 Consortium, Altshuler DM, Gibbs RA. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–58.
21. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073.
22. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26:2190–2191.
23. Li J, Ji L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity (Edinb). 2005;95:221–227.
24. Helgadottir A, Thorleifsson G, Magnusson KP. The same sequence variant on 9p21 associates with myocardial infarction, abdominal aortic aneurysm and intracranial aneurysm. Nat Genet. 2008;40:217–224.
25. Wu L, Shen Y, Liu X. The 1425G/A SNP in PRKCH is associated with ischemic stroke and cerebral hemorrhage in a Chinese population. Stroke. 2009;40:2973–2976.
26. Serizawa M, Nabika T, Ochiai Y. Association between PRKCH gene polymorphisms and subcortical silent brain infarction. Atherosclerosis. 2008;199:340–345.
27. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678.
28. Ikram MK, Sim X, Jensen RA. Four novel loci (19q13, 6q24, 12q24, and 5q14) influence the microcirculation in vivo. PLoS Genet. 2010;6:e1001184.
29. Soranzo N, Spector TD, Mangino M. A genome-wide meta-analysis identifies 22 loci associated with eight hematological parameters in the HaemGen consortium. Nat Genet. 2009;41:1182–1190.
30. Newton-Cheh C, Johnson T, Gateva V. Genome-wide association study identifies eight loci associated with blood pressure. Nat Genet. 2009;41:666–676.
31. Holliday EG, Maguire JM, Evans TJ. Common variants at 6p21.1 are associated with large artery atherosclerotic stroke. Nat Genet. 2012 published online Sept 2.