This article is part of the Spotlight Issue on atrial fibrillation
Atrial fibrillation (AF) is the most common cardiac arrhythmia with well-established clinical and genetic risk components. Genome-wide association studies (GWAS) have identified 17 independent susceptibility signals for AF at 14 genomic regions, but the mechanisms through which these loci confer risk of AF remain largely undefined. This problem is not unique to AF, as the field of functional genomics, which attempts to bridge this gap from genotype to phenotype, has only uncovered the mechanisms for a handful of GWAS loci. Recent functional genomic studies have made great strides towards translating genetic discoveries to an underlying mechanism, but the large-scale application of these techniques to AF has remained limited. These advances, as well as the continued unresolved challenges for both common variation in AF and the functional genomics field in general, will be the subject of the following review.
Among all cardiac arrhythmias, atrial fibrillation (AF) is by far the most prevalent, with over 3 million current cases in the USA and 30 million cases worldwide. This number is expected to greatly increase over the next several decades as the population ages, an event that will further exacerbate the societal and monetary burden caused by this arrhythmia.
Clinical risk factors for AF are numerous and include advanced age, hypertension, obesity, and heart disease, but a significant hereditary component for AF risk has also been well described.1–4 This genetic contribution is observed as a markedly increased risk for individuals with a first-degree relative with AF.1,3–5 At present, estimates of the heritability of AF are as high as 62%, of which only a small portion has been mechanistically defined.4,5 This is evidenced by the finding that adjustment for clinical risk factors or common genetic variation does not substantially alter the observed 40% increased AF risk in those with familial AF.4
Together, the total genetic contribution to AF risk can be broadly divided into three components: rare coding variation, common variation, and genetic variation that has yet to be discovered.
With respect to the rare coding variation, there have been numerous reports from individuals or families with AF with associated mutations in a variety of cardiac ion channels, signalling molecules, structural proteins, and transcription factors.6 These include mutations that are not germline, but instead somatic in nature, as has been described for the gap junction gene GJA5.7 It is likely that the decreasing cost of exome and genome sequencing will aid in the discovery of additional monogenic or polygenic variants that confer a large risk of disease, although primary tissue samples for identifying somatic causes will be difficult to obtain and analyse. In total, the utility in these studies is likely to be family-specific and may not be useful in the ultimate identification of generalizable pathways that may be therapeutic targets in the population as a whole.
With respect to the role of undiscovered genetic variation in AF, there continues to be a disconnect between the heritability of AF and the AF-associated common or rare variants described to date. This so-called ‘missing heritability’ can be attributed to many potential aetiologies including common variants that have not been captured in current GWAS arrays, epigenetics, rare variants with strong effects, large or small copy number variants, and mosaicism that is poorly captured with current techniques.
In contrast to rare coding variants, more common risk-conferring AF variants, typically found at >1% frequency in the population, may provide valuable insights into targetable pathways. However, determining the mechanism through which these variants perturb disease risk has been challenging and is the subject of the following review.
In the mid-2000s, platforms for genotyping hundreds of thousands of common variants in an individual, combined with the collection of large numbers of patient samples, led to the rapid expansion of GWAS. In brief, these studies examine the relative frequency of individual common variants in cases vs. controls and identify genomic variants that are more or less commonly observed in the cases. GWAS have been broadly used to identify the genetic basis of a wide range of traits and diseases. The first such GWAS for AF was published in 2007, in which a region on 4q25, upstream of the transcription factor gene PITX2, was found to be significantly associated with AF.8
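The case vs. control comparison described above can be illustrated for a single variant. The following is a minimal sketch, assuming a simple allelic chi-square test on a 2 × 2 table of allele counts; the function name and the example counts are hypothetical, and published GWAS typically use regression models with covariates rather than this bare test.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic chi-square test (1 degree of freedom) for one variant.

    Arguments are allele counts (not genotype counts) in cases and controls.
    Returns (odds_ratio, chi_square, p_value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi_square += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi_square, p_value


# Hypothetical counts: the alternate allele is enriched in cases.
odds_ratio, chi_square, p_value = allelic_association(300, 700, 200, 800)
```

In a genome-wide setting this test is repeated for every genotyped or imputed variant, which is why a stringent significance threshold (conventionally P < 5 × 10^-8) is applied.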
As the effectiveness of GWAS relies greatly on the sample size and resolution of the assay, efforts were made to improve both of these in subsequent analyses. In 2009, a pair of groups identified an additional AF-associated signal at 16q22,9,10 which was intronic to another transcription factor, ZFHX3. In a separate analysis focused on individuals with early onset AF, a third signal intronic to the KCNN3 gene on chromosome 1 was identified.11
Large increases in sample size and increased density of genotyping led to a trio of studies that identified many additional AF risk loci. In 2012, a meta-analysis in individuals of European and Japanese descent uncovered an additional six loci for AF.12 Next, in 2014, the combination of fine-mapping and conditional analyses revealed that there was not one but four independent signals at 4q25.13 Finally, also in 2014, a meta-analysis of over 13 000 cases and 70 000 controls identified a further five loci, one of which was exclusive to those of Asian descent.14 At present, there are 14 genomic regions of susceptibility for AF with 17 independent signals at these loci (Table 1).
Although there have been rapid advances in the discovery of new genetic loci for AF, the clinical utility of GWAS variants remains unclear. For example, a genetic risk score consisting of the top 12 AF GWAS loci allows for a stratification of the population into a five-fold gradient of AF risk.14 However, a similar risk score resulted in only a modest increase in the prediction of incident AF beyond other known risk factors for arrhythmia.16 Furthermore, the effect of genotype at the top AF risk loci has also been examined in relation to the response to treatment, where AF genotype has been reported to predict response to cardioversion,17 pulmonary vein isolation,18–20 and anti-arrhythmic medications.21 In contrast, a single study in a Korean cohort found no association between AF risk alleles and response to AF ablations.22
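A genetic risk score of the kind discussed above is commonly built as a weighted sum of risk-allele counts. The sketch below assumes weights equal to the natural log of each variant's odds ratio; the odds ratios shown are hypothetical placeholders and are not the values used in the cited studies.

```python
import math


def genetic_risk_score(dosages, odds_ratios):
    """Weighted risk score: sum of risk-allele dosage x ln(odds ratio).

    dosages      risk-allele counts (0, 1, or 2) for one individual
    odds_ratios  per-allele odds ratio for each variant, in the same order
    """
    if len(dosages) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per variant")
    return sum(d * math.log(o) for d, o in zip(dosages, odds_ratios))


# Hypothetical per-allele odds ratios for three variants.
example_odds_ratios = [1.5, 1.2, 1.1]
low_risk = genetic_risk_score([0, 0, 0], example_odds_ratios)
higher_risk = genetic_risk_score([2, 1, 0], example_odds_ratios)
```

Individuals can then be ranked by score and grouped (for example, into quintiles) to examine how AF risk varies across the distribution.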
Importantly, most of the studies relating genetic variants to AF treatments or outcomes have been of modest size and only considered the top few variants for AF. Thus, it will be interesting to re-examine each of these questions as we develop a more comprehensive picture of the AF genetics and gain access to larger patient populations. In future, it will also be interesting to determine whether AF genetic risk scores can be used to identify high-risk patients with heart failure or cryptogenic stroke who would benefit from more intensive monitoring or anticoagulation.
In the upcoming years, it is clear that increased resources will be applied to this endeavour, and we can expect GWAS for AF with massively increased sample sizes. Such an approach has been taken for other common diseases including coronary artery disease23 and schizophrenia,24 and what emerges is a comprehensive picture of the common genetic underpinnings of a disease. Larger GWAS will also improve and refine the clinical risk prediction models of AF and will enable sufficiently powered analyses in cohorts stratified for AF outcomes such as stroke or heart failure.
Despite these anticipated advances, there remains a great knowledge gap in ‘how’ common variants at these loci confer AF risk. This void must be filled if we are to maximally utilize these discovery efforts to understand the molecular pathways underlying AF.
Although GWAS can be a powerful tool, this method is blind to the divergent pathways through which AF can arise, and it does not directly help to understand the basis for AF pathogenesis. For example, does the genotype at an AF SNP alter atrial fibrosis? Increase atrial diameter? Decrease atrial conduction time? Affect ion channel remodelling? Alter the positioning or development of the myocardial sleeves? Perturb autonomic tone? These and other possibilities25–27 will ultimately need to be considered when evaluating each locus. Such genotype–phenotype correlations are a daunting but important task that will provide insight for translational biology to develop novel, and hopefully personalized, treatments for AF.
For most AF loci, the steps from an association to the phenotype are entirely unknown. AF-associated SNPs mark a general region of the genome without the implication that an individual SNP is causative. The definition of the functional SNP(s) will be key for identifying a transcriptional network for AF, and an integrated approach spanning from genetics to animal modelling will be necessary (Figure 1). First, it will be essential to refine the genetic association at a locus. Second, for coding variants, the challenge is relatively straightforward and will involve determining the effect of an amino acid substitution on protein function. However, as with most GWAS, the vast majority of AF loci reside in non-coding regions. Therefore, a third approach will be to link the AF-associated SNP to a gene in the region. Fourth, we will need to refine the epigenetic landscape at a given locus. Fifth, it will be necessary to develop higher throughput methods to identify functional variants, and finally, appropriate modelling of the disease-related gene can be undertaken. In practice, it will be essential to combine these different approaches to develop a comprehensive view of how a locus contributes to disease.
The challenge behind identification of functional non-coding variation is rooted in the linkage structure of the human genome, in which relatively large regions or haplotype blocks of many SNPs are inherited together. The ‘sentinel’ SNPs identified by GWAS serve as a marker for a larger region of SNPs, all with a similar degree of association with disease. Thus, it is important to note that the sentinel SNP is not necessarily the functional variant at a locus. Further complicating these efforts, it is also possible that a common SNP may only serve as a tag for a coding variant, rare variant, or a variation in copy number such as an insertion or deletion.
A convenient way to display the results of an association at a given locus is with a regional plot (Figure 2). The data are plotted as the –log10 of the P-value for the association with AF vs. chromosomal position. Each point represents a distinct SNP that is tested for an association with AF, and the intensity of the red colour indicates the degree of linkage disequilibrium with the sentinel SNP at the locus. The two plots for PITX2 and ZFHX3 illustrate some of the challenges in deciphering the mechanism of the AF-associated variants. For example, in the plot of the PITX2 region (Figure 2, left), there are a large number of non-coding, common variants that span a nearly 75 kb region (illustrated by the red bar) and are all essentially equivalent to the top signal. The region is much narrower at the ZFHX3 locus (Figure 2, right), so one could expect that the identification of a functional variant would be more straightforward, yet there are still many potential variants to consider even at a relatively simple locus.
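The quantities shown in a regional plot can be computed directly from summary statistics and genotypes. The sketch below assumes that linkage disequilibrium is approximated by the squared correlation of genotype dosages (a common stand-in for haplotype-based r2); the SNP names, positions, and P-values are hypothetical.

```python
import math


def ld_r2(geno_a, geno_b):
    """Squared Pearson correlation between two genotype vectors (0/1/2)."""
    n = len(geno_a)
    mean_a = sum(geno_a) / n
    mean_b = sum(geno_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    var_a = sum((a - mean_a) ** 2 for a in geno_a)
    var_b = sum((b - mean_b) ** 2 for b in geno_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def regional_plot_points(snps, sentinel):
    """Return (position, -log10 P, r2 to sentinel) for each SNP, by position.

    snps maps a SNP name to (position, p_value, genotypes).
    """
    sentinel_genotypes = snps[sentinel][2]
    points = []
    for name, (position, p_value, genotypes) in snps.items():
        points.append(
            (position, -math.log10(p_value), ld_r2(genotypes, sentinel_genotypes))
        )
    return sorted(points)


# Hypothetical locus with a sentinel, a perfect proxy, and a distant SNP.
example_snps = {
    "snp_sentinel": (100, 1e-20, [0, 1, 2, 1, 0, 2]),
    "snp_proxy": (150, 1e-18, [0, 1, 2, 1, 0, 2]),
    "snp_distant": (900, 0.2, [1, 1, 0, 2, 1, 0]),
}
points = regional_plot_points(example_snps, "snp_sentinel")
```

SNPs with a high r2 to the sentinel and a similar –log10 P-value, such as the proxy here, are the ones that cannot be separated from the top signal by association statistics alone.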
Given the challenges of the aforementioned studies, as a first step after the identification of an AF locus, it is helpful to refine the association signal. Potential approaches include performing conditional analyses, imputation to newer haplotype maps, genotyping with alternate arrays to capture exonic, copy number, or epigenetic markers, and targeted or whole genome sequencing studies.
In a conditional analysis, one adjusts for the top SNPs in a region to determine whether there is just one or potentially multiple independent association signals at the locus. This technique allowed the discovery of three additional independent signals at the 4q25 locus, which, when combined with the main 4q25 signal, could be used to predict a five-fold graded risk score in individuals of European or Japanese descent.14 It is possible that other emerging AF loci may also contain multiple independent signals.
In addition to fine-mapping, one may improve the resolution at a locus by re-imputing previous data to a newer reference map of the genome. Imputation is a strategy whereby known genotypes at several SNPs in a region can be used to predict those not directly genotyped. Initially, this prediction was based upon reference maps from the HapMap consortium consisting of approximately 2 million markers.28 In recent years, these maps have been supplanted by those built from the 1000 Genomes Project,29 wherein hundreds of genomes have been sequenced and assembled. With 8–10 million variants, imputation to the 1000 Genomes data set will greatly increase the resolution of any GWAS locus, as has been the case with coronary artery disease.23
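The logic of a conditional analysis can be sketched with ordinary least squares. The example below is an illustration under simplifying assumptions: it treats the phenotype as a numeric trait and removes the sentinel SNP from both the phenotype and the candidate SNP before testing their residual correlation. Case-control studies would normally use logistic regression instead, and all names and data here are hypothetical.

```python
import math


def _residuals(y, x):
    """Residuals from a least-squares regression of y on x with an intercept."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return [yi - mean_y - slope * (xi - mean_x) for xi, yi in zip(x, y)]


def conditional_correlation(phenotype, candidate, sentinel, eps=1e-12):
    """Correlation of candidate SNP and phenotype after adjusting for sentinel.

    A value near zero suggests the candidate only tags the sentinel signal;
    a value far from zero suggests an independent signal at the locus.
    """
    res_y = _residuals(phenotype, sentinel)
    res_x = _residuals(candidate, sentinel)
    sxx = sum(a * a for a in res_x)
    syy = sum(b * b for b in res_y)
    if sxx < eps or syy < eps:
        return 0.0  # nothing left to explain once the sentinel is removed
    sxy = sum(a * b for a, b in zip(res_x, res_y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical genotypes: the trait depends on two independent SNPs.
sentinel = [0, 1, 2, 0, 1, 2, 0, 1]
second_signal = [0, 0, 0, 1, 1, 1, 2, 2]
trait = [s + 2 * o for s, o in zip(sentinel, second_signal)]

proxy_result = conditional_correlation(trait, sentinel, sentinel)
independent_result = conditional_correlation(trait, second_signal, sentinel)
```

In this toy example a perfect proxy of the sentinel shows no residual association, whereas the second SNP remains associated after adjustment, which is the pattern that would indicate multiple independent signals.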
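The idea behind imputation can be shown with a deliberately simplified sketch. It assumes phased haplotypes coded as 0/1 alleles and fills untyped sites from the reference haplotypes that best match the typed sites. Production imputation software relies on probabilistic haplotype models rather than this nearest-match rule, and the panel below is hypothetical.

```python
def impute_haplotype(observed, reference_panel):
    """Fill untyped sites (None) using the best-matching reference haplotypes.

    observed         list of 0/1 alleles, with None at untyped sites
    reference_panel  list of fully typed haplotypes of the same length
    Returns the observed alleles with each None replaced by the mean allele
    (an expected dosage between 0 and 1) among the closest references.
    """
    def mismatches(reference):
        return sum(
            1 for o, r in zip(observed, reference) if o is not None and o != r
        )

    best = min(mismatches(reference) for reference in reference_panel)
    closest = [ref for ref in reference_panel if mismatches(ref) == best]

    imputed = []
    for index, allele in enumerate(observed):
        if allele is not None:
            imputed.append(allele)
        else:
            imputed.append(sum(ref[index] for ref in closest) / len(closest))
    return imputed


# Hypothetical reference panel of three haplotypes over four sites.
panel = [
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
]
# Sites 1 and 3 were not on the genotyping array.
result = impute_haplotype([1, None, 0, None], panel)
```

A denser reference panel adds more sites to each haplotype, which is how re-imputation increases the number of variants that can be tested at a locus without new genotyping.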
The greatest resolution of an association signal at a locus can be provided by whole-genome sequencing. Although genome sequencing was once far too costly, with the continued improvements in sequencing cost, quality, and turnaround times, such studies are increasingly feasible. Genome sequencing will permit the identification of all common genetic variants and uncover any previously unidentified causative rare or copy number variation that was previously tagged by common variants.
Although each of these approaches will improve the resolution at a given locus, they will not solve the challenge of defining a functional regulatory variant. Further prioritization and functional analysis will be needed to define which variant(s) underlie the association.
Even if genome sequencing data were readily available, there would still be dozens, if not hundreds, of candidate SNPs that could be the functional variant(s) at a given locus. Although it is possible that some may be coding variants, the vast majority of GWAS loci reside in non-coding regions (Figure 2, left, PITX2). For these loci, it is expected that functional variants act by differentially regulating the expression of nearby genes through enhancer or promoter activities. As comprehensively testing each of these with reporter assays, such as luciferase or EGFP expression, is an expensive and time-consuming endeavour, previous studies have attempted to limit the number of SNPs tested through prioritization schemes. The most widely used approach is to examine epigenetic data from the ENCODE project,30 including studies of DNase hypersensitivity and histone post-translational modification marks obtained from a series of immortalized cell lines and human primary samples.30 Together, the signals observed in these data sets allow researchers to home in on regions of active chromatin that may be acting as enhancers or promoters in a given tissue. In theory, this can allow one to test far fewer SNPs as potential candidates.
While impressive for their time, the ENCODE data were limited in their scope and depth and have been integrated with the more comprehensive Roadmap Epigenomics Project data sets.31 These data, obtained from 111 epigenomes of human tissues, include additional ChIP-seq data for histone modifications, DNA methylation status, DNase hypersensitivity, and RNA sequencing data for relative expression. Using these data in combination, the Roadmap Epigenomics Project31 has released a series of models which predict the state of chromatin in a given tissue, a valuable resource for those looking to follow up on the functional basis for GWAS loci. As a template, in initial studies, these data were used to generate predictive models of where functional variants for 58 studies resided, including those for lipids, PR interval, blood pressure, and aortic root size.
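The prioritization step described above amounts to intersecting candidate SNP positions with annotated regions of active chromatin. The following is a minimal sketch, assuming half-open genomic intervals such as those found in peak files; the SNP names, coordinates, and regions are hypothetical.

```python
def prioritize_snps(snps, active_regions):
    """Keep SNPs that fall within any annotated active-chromatin interval.

    snps            list of (name, chromosome, position)
    active_regions  list of (chromosome, start, end), half-open [start, end)
    """
    kept = []
    for name, chrom, position in snps:
        for region_chrom, start, end in active_regions:
            if chrom == region_chrom and start <= position < end:
                kept.append(name)
                break  # one overlapping region is enough
    return kept


# Hypothetical candidates and annotated regions.
candidates = [
    ("snp1", "chr4", 1000),
    ("snp2", "chr4", 5000),
    ("snp3", "chr16", 1200),
]
regions = [("chr4", 900, 1100), ("chr16", 2000, 3000)]
shortlisted = prioritize_snps(candidates, regions)
```

The usefulness of such a filter depends entirely on whether the annotations come from a tissue and cell type relevant to the disease.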
Although the next release of the Roadmap data will be the result of more than 180 biological samples, its usefulness for the study of AF is limited by the lack of left atrial samples, nodal tissue, myocardial sleeves, or pulmonary vasculature. Inclusion of these samples in future epigenomic analyses will be important for the data set as a tool for functional follow-up for AF loci. Additionally, it should be noted that for human primary samples, the epigenomic map is a product of a mixed cell population from the tissue of interest. Therefore, if the molecular basis for an AF locus was the modification of enhancer activity in a minor cell population of the heart, it would be unlikely to be marked as ‘active chromatin’ in the aggregate analyses. A refinement of maps which take into account tissue heterogeneity will be an important improvement to watch for in future iterations of epigenetic modelling.
As regions of chromatin can form long-range regulatory interactions that can extend as far as a megabase, it is important to consider all genes in a wide region around an AF GWAS signal as potential candidate genes. Sometimes, there are relatively few candidate genes in a region, and the potential association with AF is clear, for example, the association between AF and the GJA1 or connexin 43 locus.14 However, in most other loci, there are many potential candidate genes with a number of plausible biological mechanisms. In these cases, determining which of these genes are regulated differentially based on genotype is of paramount importance prior to any further functional evaluation. Current approaches include expression quantitative trait locus (eQTL) analyses or the measurement of the physical interactions between regulatory regions and proximal promoters.
In an eQTL analysis, samples isolated from primary tissue are genotyped for a sentinel SNP at an AF locus, and then transcription of genes in the region is quantitated. If there is a correlation between the AF SNP and expression of a nearby gene, then that gene is highly likely to be the causative one for AF. On a small scale, surgical samples from left atrial appendage tissue have been used to determine that the AF locus on chromosome 10q22 alters expression of a nearby gene (MYOZ1) and not the gene closest to the AF SNP (SYNPO2L).32 Similar analyses can also be performed more globally by performing genome-wide genotyping and RNA sequencing in relevant tissues.14,33 Large-scale projects for examining gene expression using RNA sequencing are currently underway in the Genotype Tissue Expression (GTEx) project.15 However, at present, there are 190 samples from the left atrial appendage, a number that is still underpowered to detect the subtle effects of most GWAS functional variants. Future evaluations by the GTEx Consortium or other investigators will hopefully provide well-powered analysis of gene expression in the upcoming year.
A complementary approach to find which genes are affected by GWAS SNPs is to examine the physical interaction between a potential enhancer(s) in a non-coding GWAS locus and the promoters of the genes in the region using chromatin conformation capture.34 Current models of distal enhancer action suggest that physical interactions occur between factors binding enhancers and the transcription initiation complexes at proximal promoters. Thus, by physically linking these enhancer-promoter interactions and examining the frequency with which chromosomally distant DNA regions are physically near to one another by sequencing or targeted polymerase chain reaction, one can develop an interaction map of the locus. One iteration of this approach, 4C, works from a limited number of reference points and can provide a high-resolution interaction map of a reference point with all other regions at a given locus. Alternatively, Hi-C examines the interaction between all points in the genome with all other points in the genome. However, the latter currently provides low-resolution maps due to limited sequencing depth and is only available in a small number of cell and tissue types. Investment to improve both these characteristics in Hi-C data will greatly increase the usefulness of these data for AF and other cardiovascular diseases.
Although eQTL signals or chromatin conformation capture provide strong evidence to implicate a given gene, the absence of either is uninformative. Both these strategies rely on homogeneous tissue samples for best resolution, and myocardial samples used for the creation of current maps are a mixture of several distinct cell types. Therefore, if an important interaction were to occur in a minor subpopulation, it would be unlikely to be observed above the noise of the assay. To address this, evaluation in cell subpopulations will be made feasible as quantitative in situ RNAseq techniques such as FISSEQ continue to mature.35,36 In the meantime, despite general technical limitations, generation of publicly available eQTL and Hi-C maps for the left atrium, pulmonary vasculature, or nodal tissues would be a great advance for the functional genomic analysis of AF and should be a focus of subsequent efforts.
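At its simplest, an eQTL test is a regression of gene expression on genotype. The sketch below assumes an additive model with risk-allele dosage as the only predictor and no covariates; the expression values are hypothetical, and real analyses typically adjust for technical and population covariates.

```python
import math


def eqtl_regression(dosages, expression):
    """Regress expression on allele dosage (0/1/2) under an additive model.

    Returns (slope, correlation): the change in expression per copy of the
    allele and the Pearson correlation between dosage and expression.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary across samples")
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical normalized expression of one gene in six tissue samples.
sample_dosages = [0, 0, 1, 1, 2, 2]
sample_expression = [10.0, 10.2, 11.1, 10.9, 12.0, 11.8]
slope, correlation = eqtl_regression(sample_dosages, sample_expression)
```

Because the effects of most GWAS variants on expression are expected to be small, the number of samples needed to detect a non-zero slope reliably is large, which is the power limitation noted above.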
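The interaction map from a single reference point can be thought of as a count of ligation events between that point and every other genomic bin. The sketch below assumes read pairs on one chromosome given as coordinate pairs and a fixed bin size; it omits the normalization and bias correction that real 4C or Hi-C pipelines apply, and the coordinates are hypothetical.

```python
from collections import Counter


def viewpoint_profile(read_pairs, viewpoint, bin_size):
    """Count contacts between the viewpoint bin and every other bin.

    read_pairs  iterable of (position_a, position_b) on one chromosome
    viewpoint   coordinate of the reference point (e.g. a GWAS enhancer)
    bin_size    width of each genomic bin in base pairs
    Returns a dict mapping bin start coordinate to contact count.
    """
    viewpoint_bin = viewpoint // bin_size
    counts = Counter()
    for position_a, position_b in read_pairs:
        bin_a = position_a // bin_size
        bin_b = position_b // bin_size
        if bin_a == viewpoint_bin and bin_b != viewpoint_bin:
            counts[bin_b] += 1
        elif bin_b == viewpoint_bin and bin_a != viewpoint_bin:
            counts[bin_a] += 1
    return {b * bin_size: c for b, c in sorted(counts.items())}


# Hypothetical ligation read pairs around a viewpoint at position 1000.
pairs = [
    (1050, 52000),
    (53000, 1900),
    (1200, 1300),    # both ends in the viewpoint bin: ignored
    (70000, 90000),  # does not involve the viewpoint: ignored
    (1500, 52500),
    (2000, 120000),
]
profile = viewpoint_profile(pairs, viewpoint=1000, bin_size=5000)
```

Bins with many contacts that also contain a gene promoter are candidate targets of the regulatory element at the viewpoint. Hi-C generalizes the same counting to every pair of bins, which is why it demands far greater sequencing depth for comparable resolution.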
Knowledge of functional variation is necessary to provide insight into genes that are linked with the development of AF and the ‘when and where’ of the altered regulation, but modelling these functional consequences is critical for comprehensive understanding of mechanism, secondary pharmaceutical screens, and therapeutic translation.
A definitive functional analysis in the context of a complex and multifactorial disease such as AF needs a suitable animal model. So far, however, there is no ideal animal model for AF available.37,38 Large animals such as pigs have a cardiac anatomy similar to that of humans and would allow a comprehensive in vivo evaluation of the cardiovascular system including three-dimensional mapping, voltage mapping, or left atrial electrophysiology (EP) studies.38 Unfortunately, genetic manipulation to generate large transgenic animals is still not widely available, and housing and experimental costs would be prohibitive.39 Genetic engineering is routinely available in small animal models such as zebrafish and mouse models.40,41 Although zebrafish models can be used to evaluate basic EP parameters such as the action potential duration, they are limited by their small size and by the lack of a four-chamber heart.42,43 Mouse models seem to be a good compromise as transgenic mice are available, and performing an in vivo EP study including induction and evaluation of arrhythmias is possible.44,45 However, the dramatically different heart rate and action potential duration in a mouse significantly limit the use of the model for studies of AF. A final approach would be to perform gene knockout or overexpression of candidates in induced pluripotent stem-cell-derived cardiomyocytes.46,47 These cells have the advantage of being human in origin, but have limitations related to unknown chamber specificity and various levels of maturity. Therefore, it is difficult for this model in its current state to recapitulate all of the features of the mature human myocardium.
Despite the many challenges outlined earlier, there have been some important successes in identifying functional variants at the SORT1 locus for myocardial infarction (MI)/low-density lipoprotein (LDL),48 NOS1AP in the QT interval,49 SCN10A/5A for the PR interval,50 BCL11A for foetal haemoglobin,51 and MEIS1 for restless legs syndrome,52 among others. However, the effort required for these analyses is immense and time-consuming, problems best displayed by the vast disconnect between the thousands of known GWAS loci and the handful which are defined mechanistically.
It is also important to note that these studies have relied on important advantages, which are largely not available for most other GWAS loci. For example, the SORT1 locus for MI/LDL encompassed a 6 kb region which contained only six SNPs to be functionally analysed. This is notable when compared to the PITX2 locus for AF on 4q25 (Figure 2), which is ~75 kb in length and incorporates nearly 100 associated SNPs. Similar smaller regions of associations have been observed in all mechanistically defined loci, but this advantageous size is the exception rather than the rule for GWAS loci. Therefore, integration of additional refinement methods is an important pursuit which will increase the rate at which functional variants are described and defined.
In addition to the prioritization schemes using fine-mapping and epigenomic marks, the use of massively parallel reporter assays53–56 will also aid in this rate of functional variant discovery. These assays, which are capable of profiling hundreds or thousands of variants simultaneously, have been applied to the study of transcription factor base preference and to the study of enhancer organization, but their potential for the study of variants identified by GWAS has remained untapped.
Finally, following discovery of a functional SNP, the definition of mechanism for altered activity, presumably through differential transcription factor binding, has become standard. Although this has not been demonstrated at any locus for AF, ChIP-seq studies have aided discovery efforts for other GWAS loci. In those studies, ChIP-seq signals obtained from large-scale pulldown and mapping efforts were used to discover the altered transcription factor binding site. In other cases in which ChIP-seq data might not exist, databases such as JASPAR,57 Haploreg,58 PROMO,59 Transfac,60 or Uniprobe61 may be useful for predicting differential binding of transcription factors. Efforts can also be made to identify the transcription factor de novo, although nuclear lysates from appropriate tissues and comprehensive follow-up in vitro and in vivo would be necessary for the establishment of a convincing mechanism.
Moving forward in the dissection of AF loci will require an integrated approach with collaborative teams of investigators with a diverse set of skills to enable each of the studies outlined earlier. Progress in dissecting AF and other common disease loci has remained challenging due to the complex nature of AF disease pathogenesis, but the advent of newly designed analysis tools, release of public data sets for linking locus to gene modulation, increased ease of genetic modulation of gene expression, and improvements in sequencing technology should increase the rate of discovery in the near future. Ultimately, only through a shared understanding of the genetics, epigenomics, three-dimensional chromatin organization, and gene modelling studies will we reach our goal of more completely understanding the link between genetic variation and the causative mechanisms for AF.
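Predictions of differential transcription factor binding from motif databases generally rest on scoring each allele against a position weight matrix. The sketch below assumes a position frequency matrix of base counts, a uniform background, and a pseudocount; the motif and sequences are hypothetical and are not drawn from any of the databases named above.

```python
import math


def pwm_score(sequence, frequency_matrix, background=0.25, pseudocount=0.5):
    """Log2-odds score of a sequence against a position frequency matrix.

    frequency_matrix is a list with one dict of base counts per position.
    """
    if len(sequence) != len(frequency_matrix):
        raise ValueError("sequence and motif must have the same length")
    score = 0.0
    for base, column in zip(sequence, frequency_matrix):
        total = sum(column.values()) + 4 * pseudocount
        probability = (column.get(base, 0) + pseudocount) / total
        score += math.log2(probability / background)
    return score


def allele_effect(ref_sequence, alt_sequence, frequency_matrix):
    """Change in motif score caused by the alternate allele (alt minus ref)."""
    return pwm_score(alt_sequence, frequency_matrix) - pwm_score(
        ref_sequence, frequency_matrix
    )


# Hypothetical three-base motif built from ten aligned binding sites.
motif = [
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
# The SNP changes the middle base from C (reference) to G (alternate).
delta = allele_effect("ACT", "AGT", motif)
```

A strongly negative difference suggests that the alternate allele disrupts a predicted binding site. Such a prediction is a hypothesis that still requires experimental confirmation of binding in the relevant tissue.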
Conflict of interest: P.T.E. is the principal investigator on a grant from Bayer HealthCare to the Broad Institute related to the genetics and therapeutics of atrial fibrillation.
This work was supported by grants from the National Institutes of Health to P.T.E. (1RO1HL092577, R01HL128914, and K24HL105780). S.C. was supported by a Marie Curie International Outgoing Fellowship within the 7th European Community Framework Programme (PIOF-GA-2012-328352) and by the German Centre for Cardiovascular Research (DZHK; 81X2600210, 81X2600204). P.T.E. is also supported by an Established Investigator Award from the American Heart Association (13EIA14220013) and by the Fondation Leducq (14CVD01).