Despite evidence of a genetic role in stroke, the identification of common genetic risk factors for this devastating disorder remains problematic. We aimed to identify any common genetic variability exerting a moderate to large effect on risk of ischaemic stroke, and to generate publicly available genome-wide genotype data to facilitate others doing the same.
We applied a genome-wide high-density single-nucleotide-polymorphism (SNP) genotyping approach to a cohort of samples with and without ischaemic stroke (n=278 and 275, respectively), and did an association analysis adjusted for known confounders in a final cohort of 249 cases and 268 controls. More than 400 000 unique SNPs were assayed.
We produced more than 200 million genotypes in 553 unique participants. The raw genotypes of all the controls have been posted publicly in a previous study of Parkinson’s disease. From this effort, results of genotype and allele association tests have been publicly posted for 88% of stroke patients who provided proper consent for public release. Preliminary analysis of these data did not reveal any single locus conferring a large effect on risk for ischaemic stroke.
The data generated here comprise the first phase of a genome-wide association analysis in patients with stroke. Release of phase I results generated in these publicly available samples from each consenting individual makes this dataset a valuable resource for data-mining and augmentation.
Ischaemic stroke is a common neurological disease and a leading cause of severe disability and death in developed countries.1 About 85-90% of strokes are ischaemic.2,3 In most cases, stroke is thought to be a multifactorial disorder or complex trait for which classic patterns of inheritance cannot be shown. However, studies in family and animal models have consistently indicated a genetic influence on stroke risk and prognosis.4-6
Historically, the two most common approaches to finding genes involved in disease have been linkage and candidate gene association studies. For familial disorders with suitable family structures available for sampling, linkage has been a successful approach. The identification of genetic lesions underlying monogenic disorders has now become routine in many laboratories. In the context of stroke, this approach has provided some success, most notably with the identification of NOTCH3 mutations underlying cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL).5 Combined linkage and association approaches have led to the identification of putative risk factor loci for ischaemic stroke, namely PDE4D and ALOX5AP in the Icelandic population.7,8
In the context of diseases with an oligogenic or polygenic basis of genetic risk, population-based association studies provide more statistical power than family-based linkage studies.9,10 However, similar to other late-onset disorders, the identification of common alleles that affect stroke has been problematic. Until the advent of technology that made feasible the testing of thousands of genetic variants at once, the most practical and widely employed approach to identify disease-associated loci has been candidate gene association analysis. Many candidate genes for ischaemic stroke have been investigated with numerous statistically significant associations reported; however, few associations have been consistently replicated.11
The completion of the International Haplotype Map Project (HapMap), coupled with the availability of high-throughput genotyping methods, affords the opportunity to apply the powerful approach of association testing in a genome-wide manner.12,13 A handful of genome-wide association studies have so far been published on diseases including age-related macular degeneration, Parkinson’s disease, myocardial infarction, and inflammatory bowel disease, and at least two of these have identified novel loci that have been independently replicated.14-18
In an attempt to define whether there is a common genetic risk factor underlying risk for stroke, and to generate publicly available genome-wide genotypes that can be reanalysed or augmented by others, we did whole-genome genotyping using more than 400 000 unique SNPs from the Illumina Infinium Human-1 and HumanHap300 assays in a cohort of 278 patients with ischaemic stroke and 275 neurologically normal controls. Here we present these data and an initial analysis of this genotyping effort.
All control samples were from the National Institute of Neurological Disorders and Stroke Neurogenetics Repository. All individuals involved in this study gave written consent for the genetic analysis. As described previously,16 the panels containing neurologically normal control samples were NDPT002, NDPT006, and NDPT008; these consist of DNA from 275 unique individuals and one replicate sample. Blood samples were drawn at many different sites within the USA from white individuals who were unrelated and neurologically normal. All individuals underwent a detailed medical history interview. None had a history of stroke, Alzheimer’s disease, amyotrophic lateral sclerosis, ataxia, autism, bipolar disorder, brain aneurysm, dementia, dystonia, or Parkinson’s disease. Sum scores on the mini-mental state examination19 ranged from 26 to 30, and all were interviewed for detailed family history. None had any first-degree relative with a known primary neurological disorder including amyotrophic lateral sclerosis, ataxia, autism, brain aneurysm, dystonia, Parkinson’s disease, or schizophrenia. The mean age at sample collection was 68 years (range 55-88 years).
All stroke samples used in the current study came from the Ischaemic Stroke Genetics Study (ISGS), which was a prospective five-centre case-control study in the USA. The protocol for ISGS has been reported previously.20 For the stroke cohort, all cases had recent (within 30 days) first-ever ischaemic stroke confirmed by history, physical examination, and head imaging (CT or MRI). Stroke was defined according to the WHO definition.21 Iatrogenic, septic embolic, vasospastic, and vasculitic stroke cases were excluded.
A genotype-blinded neurologist rater (RDB) classified ischaemic strokes according to the pre-specified Trial of Org 10172 in Acute Stroke Treatment (TOAST),22 Oxfordshire,23 and Baltimore24 criteria on the basis of a detailed medical record review. Video-certified examiners assessed neurological impairment using the NIH stroke scale.25 Functional status at baseline and at 90 days was assessed using the Barthel index,26 Oxford handicap scale,27 and the Glasgow outcome score.28
The work presented here represents the first phase of a multi-stage genetic association study. The first stage is a 275-case, 275-control, genome-wide association scan using more than 408 000 SNPs. In the second stage, 1225 independent cases and 1225 independent controls will be genotyped for roughly 3000 of the most highly associated SNPs. The significance level for proceeding to genotype in the second stage is 0.0075 (3000 of 400 000).
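The stage-two threshold quoted above is simply the fraction of SNPs carried forward. As a minimal sketch, the Python below reproduces that arithmetic and shows one way the top-ranked SNPs could be selected; the function name `stage_two_candidates` and the toy p values are hypothetical and are not taken from the study.

```python
# Fraction of SNPs carried forward to stage two, as quoted in the text.
n_carried_forward = 3000
n_tested = 400_000
threshold = n_carried_forward / n_tested
print(threshold)


def stage_two_candidates(pvalues, k):
    """Return the k SNP identifiers with the smallest stage-one p values.

    `pvalues` maps a SNP identifier to its p value (hypothetical input format).
    """
    ranked = sorted(pvalues, key=pvalues.get)
    return ranked[:k]


# Toy illustration with made-up SNP names and p values.
toy = {"snp_a": 0.2, "snp_b": 0.0004, "snp_c": 0.03, "snp_d": 0.6}
print(stage_two_candidates(toy, 2))
```

Ranking by p value and taking a fixed number of SNPs is equivalent to applying a fixed significance level only approximately, because the number of SNPs below any given p value depends on the data.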
Using an estimated cumulative incidence of ischaemic stroke of 10% for adults over age 55 years, we have calculated the power estimates to detect an associated SNP with given minor allele frequency and odds ratio in a series of 250 cases and 250 controls (webtable 1). Thus, for the ultimate two-stage design, the power to detect a SNP with odds ratio of 1.50, assuming a minor allele frequency of 0.20 at p=5×10⁻⁷, is 89%. Using this two-stage strategy, there is excellent power (greater than 80%) to detect stroke susceptibility loci with realistically modest effect (odds ratio around 1.5) and low (but still common) disease allele frequency (around 0.15). Although this calculation does not take into account incomplete coverage, we estimate that the combination of data from the gene-centric Human-1 and haplotype tagging HumanHap300 chips provides excellent coverage in our population, and thus feel that this would have minimal effect on these power calculations.
In comparison to a two-stage design, the power of a single-stage design with 250 cases and 250 controls is substantially lower (webtable 2). If this collection of cases and controls was the only component of the genetic study, only susceptibility loci with large effect (odds ratio >2) and common frequency (>0.30) could be detected. Such loci would probably be detected by linkage, since they would have an effect equivalent to that of HLA in patients with type 1 diabetes (originally detected with 100 cases and 100 controls).
All the control samples and 88% (219 of 249) of the stroke patients included in the association analysis gave consent for public release of their genotype data. 24 patients with stroke, whose data predated the inclusion within the NINDS Neurogenetics Repository, had given consent but did not explicitly agree to public sample release or data sharing, so we have not released raw genotype data on these individuals.
Epstein-Barr virus immortalisation of peripheral blood lymphocytes was done as previously described29,30 and DNA was extracted using a modified salting out procedure.31 DNA was also extracted from 0.5 mL of blood for subsequent quality control steps in the cell banking process. DNA for the analyses was extracted from the Epstein-Barr virus immortalised lymphocyte cell lines; these cell lines remain largely faithful to the genotype of the source tissue when examined by high-density SNP genotyping assays.32
All samples were assayed with the Illumina Infinium Human-1 and HumanHap300 SNP chips (Illumina Inc, San Diego, CA, USA). The Human-1 product assays 109 365 gene-centric SNPs and the HumanHap300 product assays 317 511 haplotype tagging SNPs derived from phase I of the international HapMap project. There are 18 073 SNPs in common between the two arrays; thus the assays combined provide data on 408 803 unique SNPs. Any assay with a call rate below 95% was repeated on a fresh DNA aliquot; if the call rate persisted below this level the sample was excluded from the analysis.
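The count of unique SNPs follows from inclusion-exclusion over the two arrays. The short Python check below uses only the figures given in the text; the genotype total is an upper bound that ignores failed calls and excluded samples.

```python
# SNP counts per array, as reported in the text.
human1_snps = 109_365
humanhap300_snps = 317_511
shared_snps = 18_073

# Inclusion-exclusion: SNPs on both arrays are counted once.
unique_snps = human1_snps + humanhap300_snps - shared_snps
print(unique_snps)

# Upper bound on genotypes if every unique SNP were called in every participant.
participants = 553
max_genotypes = unique_snps * participants
print(max_genotypes)
```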
All chips were scanned using the Illumina BeadStation system. Human-1 chips were all scanned with settings standard for that product; HumanHap300 chips were scanned with one of two settings: a slow scan setting (about 90 min scan time) or a fast scan setting (around 40 min scan time), which was made available during this analysis. Genotype concordance rates between these two settings were extremely high (0.9999 [SD 0.001]).
Data were analysed with BeadStudio v220.127.116.11 (Illumina Inc, San Diego CA, USA). All genotypes were stored within an open source in-house genotype database GERON genotyping; this database was also used for data manipulation and export for analysis using the programs STRUCTURE33 version 2.1 and SNPGWA version 2.2.
Population substructure was examined with STRUCTURE version 2.1 in the entire sample of cases and controls.33 Specifically, 267 SNPs were selected from across the autosomes ensuring that no two of the selected SNPs were in linkage disequilibrium and that each had a minor allele frequency of more than 10%. Global tests for substructure were computed and individual observations that showed departure from the general population were identified.
Statistical analysis of the raw genotype data was done with the software SNPGWA. Each SNP was tested for departures from Hardy-Weinberg equilibrium (HWE) expectations in the case, control, and combined samples using the exact test.34 Case departures from HWE expectations were compared with control proportion patterns for insight into possible genetic models. Linkage disequilibrium statistics, D’ and r², were computed for each tandem pair of SNPs. To identify any association between the individual polymorphism and stroke status, several tests were done with SNPGWA. These included the overall 2-degree of freedom test (genotype), tests of the additive genetic model (Cochran-Armitage trend test), and the corresponding test for lack of fit to additivity. Tests of allelic association and association under dominant and recessive models are also reported. For SNPs on chromosome X not within the PAR1 (pseudoautosomal region 1) and PAR2 (pseudoautosomal region 2) regions, only the dominant genetic model was considered. The human X and Y chromosomes are morphologically and genetically distinct; however, there are X-Y homologous regions, PAR1 and PAR2, which pair and recombine at meiosis.
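To illustrate the additive-model test named above, the following is a minimal sketch of the Cochran-Armitage trend test for a 2×3 genotype table, written in plain Python. It is not the SNPGWA implementation; the function name, the default weights (0, 1, 2), and the example counts are assumptions made for illustration.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    `cases` and `controls` hold genotype counts ordered by number of
    copies of the tested allele. Returns (chi-square with 1 df, p value).
    """
    R = sum(cases)
    S = sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]

    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * n for t, n in zip(weights, totals))
    sum_t2n = sum(t * t * n for t, n in zip(weights, totals))

    numerator = N * sum_tr - R * sum_tn
    denominator = R * S * (N * sum_t2n - sum_tn ** 2)
    if denominator == 0:
        raise ValueError("trend test undefined: no variation in genotype or status")

    chi2 = N * numerator ** 2 / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts for illustration only.
chi2, p = cochran_armitage_trend((10, 20, 30), (30, 20, 10))
print(chi2, p)
```

With equally spaced weights the statistic tests for a linear trend in risk per allele copy, which corresponds to the additive genetic model.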
For sets of tandem SNPs, allelic and two-marker and three-marker moving window haplotype tests were computed using the expectation-maximisation algorithm implemented in SNPGWA. For all analyses, odds ratios and 95% CIs were computed. The above analyses were repeated for the separate TOAST subtypes.
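As a simple illustration of an odds ratio with a 95% CI, the sketch below uses the standard log-odds (Woolf) approximation for a 2×2 allele table. This is an assumed method chosen for clarity; the intervals reported in this study come from SNPGWA and the adjusted models, which may compute them differently. The function name and example counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2 x 2 table.

    a, b: cases with and without the allele;
    c, d: controls with and without the allele.
    All four cells must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts for illustration only.
print(odds_ratio_ci(20, 10, 10, 20))
```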
Using the case-control data, we computed a series of generalised estimating equations35 that included relevant covariates (age, sex, hypertension, smoking status, diabetes mellitus, and heart disease). All modelling was done in a hierarchical manner, with a baseline model that included only the single nucleotide polymorphism as the predictor. Additional models were tested, with age and sex; further models were then tested by adding an individual stroke risk factor variable as a covariate. A final, fully saturated model that included all relevant covariates was analysed. P values were computed using the 2 degree-of-freedom generalised test of association. A series of genetic models were tested (dominant, additive, recessive) for estimation of best fit for risk. Results were examined only for SNPs that were in HWE in controls and had less than a 5% missing genotype rate. Only the additive model was considered with the exception of SNPs on chromosome X, where the dominant model was examined.
The study sponsors had no role in the study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.
DNA samples from 278 patients with stroke and 275 control subjects were genotyped using the Human-1 beadchip; 2 stroke cases and 1 control subject were genotyped in replicate to assess reproducibility of the genotyping platform. Ten stroke samples were dropped because of insufficient DNA and nine stroke samples did not reach the quality threshold of 95% call rate, or showed discordance between expected sex and genotype. Thus the ischaemic stroke cohort taken through to the analysis by STRUCTURE consisted of 259 unique individuals. In the control cohort, four samples were dropped because of low-quality genotyping, and two samples showed a discrepancy between expected sex and genotype. The control cohort used for STRUCTURE analysis therefore consisted of 269 unique individuals. The median genotype success rate for the 408 803 SNPs studied in our population was 99.72% (mean 99.44%; range 96.14% to 99.97%) for samples that passed the initial quality check (call rate ≥95%). Genotyping of three replicate samples showed concordance rates of 99.55% (99.81% for the control replicate with Human-1, and 99.12% and 99.71% for the two cases replicated with the HumanHap300 assays). Analysis of the 18 073 SNPs that overlap between the Human-1 and HumanHap300 products revealed genotype concordance rates of 99.97% (range 98.53% to 99.99%) between the assays across 528 samples (259 samples from patients with ischaemic stroke and 269 from unrelated population controls) including both samples scanned with fast scan and slow scan settings.
STRUCTURE failed to detect overall differences in the population substructure between cases and controls. However, 2% (11 of 528) of samples, consisting of ten stroke cases and the one control reported previously,16 seem to have a substantially different genetic background (figure). Assessment of these samples revealed that all were from self-identified African Americans. On the basis of the STRUCTURE results, these 11 individuals were not included in subsequent analyses. Statistical comparison for association was therefore done with 249 white patients with stroke and 268 white neurologically normal controls.
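The sample flow from genotyping to the final association cohort can be checked with simple arithmetic. The Python below uses only the counts reported in the text; the variable names are illustrative.

```python
# Counts reported in the text.
cases_genotyped = 278
cases_insufficient_dna = 10
cases_failed_qc = 9            # call rate below 95% or sex discordance
controls_genotyped = 275
controls_low_quality = 4
controls_sex_discrepancy = 2

# Cohorts taken through to the STRUCTURE analysis.
cases_structure = cases_genotyped - cases_insufficient_dna - cases_failed_qc
controls_structure = controls_genotyped - controls_low_quality - controls_sex_discrepancy
total_structure = cases_structure + controls_structure

# Samples excluded on the basis of the STRUCTURE results.
cases_outliers = 10
controls_outliers = 1

cases_final = cases_structure - cases_outliers
controls_final = controls_structure - controls_outliers
print(cases_structure, controls_structure, total_structure)
print(cases_final, controls_final)
```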
Table 1 shows the clinical and demographic characteristics of the study population by case status. As expected, traditional vascular risk factors are more common in cases with ischaemic stroke than in stroke-free controls. Table 2 shows stroke severity, stroke subtype, and functional status among cases.
Hardy-Weinberg equilibrium (HWE) was tested in the controls for all loci with a call rate of 95% or more. A total of 394 513 SNPs and 374 544 SNPs had a HWE p value higher than 0.001 and 0.05, respectively. Statistical analysis of association was done on all genotypes irrespective of Hardy-Weinberg disequilibrium or minor allele frequency. However, in table 3 we present the SNPs with a p value of <1×10⁻⁵ after adjusting for demographic and stroke risk factors (age, sex, hypertension, smoking status, diabetes mellitus, and heart disease) with call rates more than 95% and HWE of p>0.001.
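For readers who wish to reproduce an HWE filter of this kind, the sketch below implements an exact test by enumerating all heterozygote counts compatible with the observed allele counts. It is an independent illustration in plain Python, not the code used in SNPGWA; the function name and the example genotype counts are assumptions.

```python
def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact HWE p value: total probability of all heterozygote counts
    that are no more likely than the observed one, given allele counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    rare = 2 * min(n_aa, n_bb) + n_ab          # copies of the rarer allele
    probs = [0.0] * (rare + 1)

    # Start near the expected heterozygote count, with matching parity.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0

    # Recurrence towards fewer heterozygotes.
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1

    # Recurrence towards more heterozygotes.
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1

    total = sum(probs)
    observed = probs[n_ab]
    tail = sum(x for x in probs if x <= observed * (1 + 1e-9))
    return min(1.0, tail / total)


# Made-up control genotype counts; keep SNPs with HWE p value above 0.001.
controls = {"snp_a": (25, 50, 25), "snp_b": (50, 0, 50)}
kept = [snp for snp, counts in controls.items() if hwe_exact_p(*counts) > 0.001]
print(kept)
```

Relative probabilities are built up by recurrence from a starting point near the expected heterozygote count and normalised at the end, which avoids evaluating large factorials directly.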
Analyses according to TOAST stroke subtypes were also done. Webtables 3, 4, and 5 summarise the most significant SNPs for each of the cardioembolic, large vessel, and small vessel stroke subtypes with call rates more than 95% and HWE of p>0.001 in the control cohort.
Raw p values and genotype and allele frequencies for all loci and under each model are available at the National Center for Biotechnology Information dbGAP website. Individual genotype data for all the controls and 88% of cases are also available at this site. Within 9 months of publication we will also release raw scan data to enable the analysis of structural genomic variation.
We present here an initial genome-wide SNP association study in ischaemic stroke, which compared 408 803 unique SNPs in 249 white patients with ischaemic stroke and 268 white neurologically normal controls. The ischaemic stroke cohort, prospectively ascertained at five US stroke centres, is comparable to population-based cohorts in the United States in terms of its conventional atherosclerotic risk factor profile.36 The control cohort showed a paucity of conventional risk factors relative to what would be expected from an age-related population-based sample, probably due to the restrictive control eligibility criteria and possible volunteer bias.
As expected, our screening approach yields hundreds of nominally statistically significant associated markers, leaving the challenge to distinguish true associations from those that are false positives. None of our results are significant after Bonferroni correction. However, in view of the correlation between tests in most SNP settings, this correction is probably overly conservative. The individual risk provided by SNPs (table 3) is moderate-high with odds ratios ranging from 0.40 (0.26-0.62) to 0.54 (0.41-0.74) and 1.9 (1.42-2.65) to 8 (2.85-22.33). In a recent meta-analysis37 from 120 case-control studies, 32 genes, and around 18 000 cases of stroke and 58 000 controls, the summary odds ratio for those candidate genes with significant association varied from 1.21 (95% CI 1.08-1.35) for the angiotensin-converting enzyme (ACE) insertion or deletion polymorphism to 1.88 (1.28-2.76) for a polymorphism in the Kozak sequence of GPIBA (encoding glycoprotein Ib-α). Most published candidate gene studies report significant association with ischaemic stroke with an odds ratio of <3.11 Clearly the current study in isolation is underpowered to detect genetic loci that exert a moderate risk for stroke; however, as discussed, as a part of our two-stage design, we have excellent power to detect effects of odds ratios higher than 1.5 conferred by common risk alleles. The posting of the raw genotype data will allow other interested parties to also pursue independent follow-up work.
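To give a sense of scale for the Bonferroni correction mentioned above, the short calculation below derives the per-SNP threshold for 408 803 tests. The family-wise significance level of 0.05 is an assumption made for illustration; the text does not state the level used.

```python
# Per-test threshold under Bonferroni correction.
family_wise_alpha = 0.05      # assumed level, not stated in the text
n_tests = 408_803             # unique SNPs assayed

bonferroni_threshold = family_wise_alpha / n_tests
print(f"{bonferroni_threshold:.3e}")


def passes_bonferroni(p_value, alpha=family_wise_alpha, m=n_tests):
    """True if a single-SNP p value survives Bonferroni correction."""
    return p_value < alpha / m


print(passes_bonferroni(1e-5))
```

A p value of 1×10⁻⁵ falls short of this threshold by roughly two orders of magnitude. Because neighbouring SNPs are correlated through linkage disequilibrium, the effective number of independent tests is smaller than the number of SNPs, which is why the correction is described as conservative.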
Notably, some of the most significant SNPs listed in table 3 are within or near interesting candidate loci; for example, two genes involved in potassium transport, KCNIP4 and KCNK17. Two different functions have been suggested for KCNIP4. Firstly, all KCNIP family proteins modulate the activity of Kv4 A-type potassium channels, which contribute to the frequency of slow repetitive firing and back-propagation of action potentials in neurons and shape the action potential in the heart.38 Secondly, KCNIP4 has been shown to have a role in presenilin function.39,40 KCNK17 is a member of the acid-sensitive subfamily of tandem pore K+ channels, which are open at all membrane potentials and contribute to cellular resting membrane potential. KCNK17 transcripts are widely expressed in humans, with highest levels in liver, lung, pancreas, placenta, aorta, and heart.41 In the heart, background K+ currents are thought to modulate the cardiac action potential.42,43
In the stroke cohort, ischaemic strokes were classified according to TOAST criteria. Subtype analysis greatly reduces statistical power, making all conclusions preliminary. Nonetheless, it is of interest to note that the cardioembolic subtype of ischaemic stroke showed an association with common variation in APEG-1 (aortic preferentially expressed gene 1). Expression of the APEG-1 gene is thought to serve as a marker for differentiated vascular smooth muscle cells; given that alterations in arterial smooth muscle cell phenotypes have an important role in the pathogenesis of vascular diseases and angiogenesis, APEG-1 could be a good candidate.44
Although there is a temptation to speculate on the potential pathogenic roles that the most statistically significantly associated genes might have in disease, we should emphasise that this first-stage association work is not designed to unequivocally link genetic variability, conferring a risk of the size identified here, with stroke. As described, the approach presented here is taken as an intermediate step where putative associations will be re-tested in separate cohorts. As such, the loci listed in table 3 should not be regarded as the only regions warranting follow-up. More appropriately, several thousand of the most significantly associated loci will be re-tested in independent series. Some studies point out that analysing the data from both stages jointly can be more powerful than treating the second stage as a stand-alone replication study.45 Public release of genotype data makes the comparison and combination of experiments easier thus increasing power.
Our data strongly suggest that there is no single common genetic variant conferring a major risk of stroke. This finding contrasts with our recent study of Alzheimer’s disease, which showed that APOE ε4 was the only allele in the entire genome conferring a risk of an odds ratio higher than 2.46 Although the absence of evidence does not necessarily imply evidence of absence, the genomic coverage of the methods applied here for white populations is in the order of 90% (at r² of 0.8). Thus, we can be fairly confident that no single common variant exerts a strong effect on risk for ischaemic stroke in this population. However, subsequent genome-wide association studies in larger cohorts, and focused follow-up of candidate loci, will be key steps in delineating the role that common genetic variability has in risk for ischaemic stroke. We believe this study is a necessary first step in the elucidation of genetic risk factors underpinning the third most common cause of mortality in the developed world.
The Ischaemic Stroke Genetics study (ISGS) is funded by a grant from the National Institute of Neurological Disorders and Stroke (R01 NS42733). We thank the participants and the submitters for depositing samples at the NINDS neurogenetics repository. Many samples for this study are derived from the NINDS neurogenetics repository at Coriell Cell Repositories, and the data are available from the website. This study in part used the high-performance computational capabilities of the Biowulf PC/Linux cluster at the NIH, Bethesda, MD, USA. This work was supported by the intramural programmes of the National Institute on Aging and NINDS and by an extramural NINDS contract funding the Coriell Repository.
Conflicts of interest We have no conflicts of interest.
ISGS Investigative Team
Executive Committee James F Meschia (Principal investigator), chair; Thomas G Brott, Robert D Brown Jr, Michael Frankel, John Hardy, Stephen S Rich, Scott Silliman, Bradford B Worrall. Data Management Centre Wake Forest University Medical Center: L Douglas Case (director), Laurie Russell, Carolyn Bell, Darrin Harris, Wes Roberson. Clinical Coordinating Centre Mayo Clinic, Jacksonville, FL, USA: James F Meschia, Alexa Richie, Dale Gamble, Sothear Luke. DNA Repository Coriell Institute for Medical Research, Camden, NJ, USA: Roderick A Corriveau. Genetics Laboratory National Institute on Aging, Bethesda MD, USA: John Hardy, Andrew Singleton. Statistical Genetics Wake Forest University Medical Center, Winston-Salem NC, USA: Stephen S Rich, W Mark Brown, Carl D Langefeld.
Sites and Investigators as of Sept 26, 2006
Mayo Clinic, Jacksonville, FL, USA (271 patients)—Principal investigator: James F Meschia. Coordinators: Alexa Richie, Dale Gamble, Sothear Luke. Subinvestigators: Thomas G Brott, Benjamin H Eidelman. University of Florida/Shands Hospital, Jacksonville, Florida (216 patients)—Principal investigator: Scott Silliman. Coordinators: Yvonne Douglas, Raam Sambandam. Subinvestigators: Nader Antonios. Emory University School of Medicine, Atlanta, Georgia (237 patients)—Principal investigator: Michael Frankel. Coordinator: Sharion Sailor-Smith. Mayo Clinic, Rochester, Minnesota (266 patients)—Principal investigator: Robert D Brown Jr. Coordinator: Colleen S Albers. University of Virginia, Charlottesville, Virginia (231 patients)—Principal investigator: Bradford Worrall. Coordinator: Daniel Chernavvsky.