Correspondence to: Dr. Cecilia Vecoli, CNR, Institute of Clinical Physiology, via Aurelia Sud, 54100 Massa, Italy. vecoli/at/ifc.cnr.it
Telephone: +39-585-493693 Fax: +39-585-493501
In the last few years, the advent of next generation sequencing (NGS) has revolutionized the approach to genetic studies, making whole-genome sequencing a possible way of obtaining global genomic information. NGS has very recently been shown to be successful in identifying novel causative mutations of rare or common Mendelian disorders. At the present time, it is expected that NGS will be increasingly important in the study of inherited and complex cardiovascular diseases (CVDs). However, the NGS approach to the genetics of CVDs represents a territory which has not been widely investigated. The identification of rare and frequent genetic variants can be very important in clinical practice to detect pathogenic mutations or to establish a profile of risk for the development of pathology. The purpose of this paper is to discuss the recent application of NGS in the study of several CVDs such as inherited cardiomyopathies, channelopathies, coronary artery disease and aortic aneurysm. We also discuss the future utility and challenges related to NGS in studying the genetic basis of CVDs in order to improve diagnosis, prevention, and treatment.
In the years after 2000, the completion of the Human Genome Project (HGP) has completely changed the approach to many genetic research studies.
Indeed, the knowledge of the genome sequence has been increasingly important in order to define the basis of human biology and medicine, providing a single, essential reference for all genetic information. Currently, ten years after the HGP, a new technology, next-generation sequencing (NGS), has revolutionized the genomic and transcriptomic approaches to biology reducing the sequencing cost and significantly increasing the throughput. Whole-genome sequencing has become a possible and efficient way to obtain global genomic information.
At present, Roche/454 (Roche), Solexa (Illumina) and AB SOLiD (Applied Biosystems) are the NGS technologies predominantly used in genetics. In all NGS platforms, a whole genome, or targeted regions of the genome, is randomly broken into small fragments, which are sequenced as short reads and then either aligned to a reference genome or assembled (Figure 1). Each platform is distinguished by its own combination of protocols, which determines its limitations and advantages. Because this sequencing strategy produces many short reads (tens or hundreds of Gbp per run), several bioinformatics tools have had to be developed to perform the alignment or assembly correctly and to analyze the large amounts of data. To date, many bioinformatics programs have been created for the different NGS platforms.
For instance, Mapping and Assembly with Quality (MAQ) is a very popular NGS software program developed to efficiently map short reads to a reference genome and derive genotype calls to the consensus sequence with quality scores. MAQ is one of the first reference guided assembly programs. It is accurate, efficient, versatile and has been successfully applied to several NGS projects. Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) is a different NGS program designed to search DNA files for short DNA reads allowing up to 2 errors per match[7,8]. Benchmarks comparing ELAND with other popular NGS software, such as MAQ, Basic Local Alignment Search Tool, Short Oligonucleotide Alignment Program or SeqMap generally place ELAND as one of the fastest available programs. Since many of the programs are open source, additional programming may be needed to modify the program to the needs of a specific NGS project. Finally, some online utility programs, such as EagleView or LookSeq, provide some additional assistance on NGS data analysis and interpretation.
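The idea of mapping short reads to a reference while tolerating a small number of errors can be illustrated with a minimal sketch. It is not the algorithm used by MAQ or ELAND, whose internals are not described here; it is a brute-force scan, and the function names, the toy reference and the toy reads are all assumptions made for illustration. Only the limit of 2 mismatches per match is taken from the text.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, max_mismatches: int = 2):
    """Return (position, mismatches) of the best placement of `read`, or None.

    Brute-force scan: every window of the reference is compared with the read.
    Ties are resolved in favour of the leftmost position.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        d = hamming(read, window)
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (pos, d)
    return best


# Toy data (hypothetical, for illustration only).
reference = "ACGTTAGCCGATAGGCTTAACGGATCCA"
reads = ["TAGCCGAT", "GGCTAAAC", "TTTTTTTT"]
hits = {r: map_read(r, reference) for r in reads}
```

In this toy case the first read maps exactly, the second maps with one mismatch, and the third cannot be placed within the mismatch limit. Production aligners avoid the quadratic scan by indexing the reference or the reads, and they also use base quality scores, which this sketch ignores.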
However, data management remains the biggest challenge in NGS and the major limiting factor in moving the sequencing to the clinical setting.
Indeed, the production of large numbers of low-cost reads could make the NGS platforms useful for many applications including variant discovery by re-sequencing targeted regions of interest or whole genomes, cataloguing the transcriptomes of cells, and genome-wide profiling of epigenetic marks.
Furthermore, it is expected that NGS will be increasingly important in the studies of complex diseases, such as common cardiovascular diseases in which one or more variants in a single gene or more variants in different genes are involved. Currently, the NGS approach to the genetics of cardiovascular diseases (CVDs) represents a territory which has not been widely investigated.
The purpose of this paper is to discuss the recent results of NGS in monogenic classic and in complex genetic cardiovascular disorders, such as inherited cardiomyopathy, channelopathies and coronary artery disease (CAD). We also discuss the potential contribution of future NGS applications in order to significantly improve our understanding of the genetics of CVDs.
In the etiology of most CVDs, a clear hereditary component has been demonstrated.
CVDs can be divided into two major categories: the monogenic forms (which are rarer) and the polygenic/multifactorial forms (Figure 2).
In the monogenic diseases, the mutation of a single gene causes the pathology. These diseases are rare Mendelian traits that show the classical inheritance patterns: autosomal dominant, autosomal recessive, X-linked, or mitochondrial (maternally inherited). Examples of these traits in cardiovascular medicine include structural cardiomyopathies (e.g., hypertrophic or dilated cardiomyopathy) and channelopathies (e.g., Brugada and long QT syndrome) as well as familial dyslipidemias.
In clinical practice, the most common CVDs (e.g., CAD) are complex traits that arise from elaborate gene-gene and gene-environment interactions that confer risk for disease in a probabilistic manner. In these cases, a series of polymorphic variants in several genes increases the risk of developing the disease.
The inherited cardiomyopathies are heterogeneous diseases caused by functional abnormalities of cardiac muscle. Hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM) are the two major clinical forms of inherited cardiomyopathy. HCM, the major cause of sudden death in young people and of heart failure, is characterized by left ventricular hypertrophy, often asymmetric, accompanied by myofibrillar disarray and reduced compliance (diastolic dysfunction) of the cardiac ventricles. Conversely, DCM is characterized by a dilated ventricular cavity with systolic dysfunction. The main clinical manifestation of DCM is heart failure, often associated with sudden death. More than half of HCM patients have a family history of the disease consistent with an autosomal dominant genetic trait. In the case of DCM, about 20%-35% of patients show a family history of the disease (consistent with autosomal dominant inheritance), although some familial cases can be explained by an autosomal recessive or X-linked recessive trait[18-22].
The inherited forms of cardiomyopathies can be caused by mutations in at least 30 different genes. A specific genetic test in patients with cardiomyopathy is of immense clinical importance since some genetic forms of heart muscle diseases are associated with disease manifestation at an early age, an overall poor prognosis, or a high incidence of sudden cardiac death.
Characterizing the genetics of DCM has been a challenging task due to incomplete knowledge of the genes involved in the disease. Unlike HCM, which is largely a disease of the sarcomere (more than 450 mutations have been identified in 16 genes encoding sarcomere proteins), the pathways leading to DCM are considerably more diverse, involving genes encoding components of the sarcomere, Z-disk, nuclear lamina proteins, intermediate filaments, and the dystrophin-associated glycoprotein complex.
NGS offers a new approach in the diagnosis of cardiomyopathies, and it has recently been used to characterize both HCM and DCM patients. At the present time, despite its good cost-efficiency ratio, NGS is not suitable for clinical practice because efficient methods for reducing genomic complexity and established protocols are both lacking.
Using HCM as a diagnostic model, Dames et al developed an NGS-based approach for multi-gene re-sequencing in a clinical laboratory. Sixteen genes implicated in HCM were sequenced using an uncharacterized human DNA sample. Long-range polymerase chain reaction (PCR), a PCR method used to amplify a long region of the genome, was used for gene enrichment, followed by comparative sequencing on the Illumina Genome Analyzer and Roche 454 GS FLX platforms. Both platforms detected several different variants, of which only 27 were common and were confirmed by classical Sanger sequencing. On the basis of these results, the authors proposed targeted re-sequencing that combines long-range PCR and NGS as a new approach for multigene analysis.
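A comparison of variant calls between two platforms can be sketched as a simple set operation. The sketch below assumes that "common" is read as "detected by both platforms", and it treats two calls as the same variant only when chromosome, position, reference allele and alternate allele all agree. The `Variant` type, the `concordance` function and the example calls are hypothetical and are not data from the study by Dames et al.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Minimal variant key: two calls match only if all four fields agree."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordance(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> dict:
    """Split two call sets into shared and platform-specific variants."""
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "shared": shared,
        "only_a": a - b,
        "only_b": b - a,
        # Jaccard index: shared calls divided by all distinct calls.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical calls from two platforms (illustrative coordinates only).
platform_a = [Variant("chr14", 100, "G", "A"),
              Variant("chr11", 200, "C", "T"),
              Variant("chr1", 300, "A", "G")]
platform_b = [Variant("chr14", 100, "G", "A"),
              Variant("chr11", 200, "C", "T"),
              Variant("chr15", 400, "T", "C")]

result = concordance(platform_a, platform_b)
```

In practice the shared set would be the natural candidate list for confirmation by Sanger sequencing, while the platform-specific calls would need closer inspection as possible platform artifacts.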
Conversely, Meder et al established a microarray-based target enrichment followed by SOLiD NGS for a comprehensive and cost-efficient genetic diagnosis of cardiomyopathies. This approach increased the mean depth of coverage of cardiomyopathy genes analyzing 1092 disease exons and adjacent intronic regions in only one NGS run and identified 1891 sequence variants within these regions (of which 349 were nonsynonymous).
NGS has also been used to study the maternally-inherited cardiomyopathy caused by mutations in mitochondrial DNA. Zaragoza et al studied 18 patients with mitochondrial cardiomyopathy and two patients with suspected mitochondrial disease. Sequencing PCR-amplified mtDNA with a single run on Roche’s 454 Genome Sequencer identified 427 variants.
Recently, Herman et al analyzed the gene encoding the sarcomere protein, titin (TTN), in subjects with DCM, subjects with HCM, and in controls using NGS, and evaluated the deleterious variants for cosegregation in families and assessed clinical characteristics. Using this approach they found that TTN truncating mutations are a common cause of DCM, occurring in approximately 25% of familial cases of idiopathic DCM and in 18% of sporadic cases. The authors concluded that the incorporation of sequencing approaches which detect TTN truncations in genetic testing for DCM should substantially increase test sensitivity, thereby allowing earlier diagnosis and therapeutic intervention for many patients with this pathology.
Finally, the NGS tools may be used to develop RNA sequencing methodologies for high-throughput comprehensive analysis of individual transcriptomic profiles. In Gαq transgenic mice, a well-characterized model of cardiac hypertrophy/cardiomyopathy, the results of sequencing through NGS (Illumina platform) have been compared with an array-based transcriptional profiling.
The results of this study revealed superior dynamic range for mRNA expression and enhanced specificity for reporting low-abundance transcripts by RNA sequencing.
Together these studies suggest that the application of NGS tools in the inherited cardiomyopathies will be increasingly important to define the genetic component of these disorders and to detect cardiomyopathy-causing mutations with high accuracy in a fast and cost-efficient manner which will be suitable for daily clinical practice of genetic testing.
A different group of CVDs included among the inherited cardiomyopathies comprises the primary electrical diseases, such as Brugada syndrome (BrS) and long QT syndrome (LQTS). Each of these cardiac channelopathies is characterized by a unique genetic profile and clinical features. Advances in molecular biology have linked to specific genes many disorders that were previously regarded as idiopathic. Genetic studies in families with LQTS have associated this disorder with gene mutations affecting cardiac ion channels, specifically the sodium and potassium channels. To date, hundreds of variants have been identified in 13 LQTS genes.
Conversely, BrS has been associated with more than 100 mutations in 7 genes. Loss-of-function mutations in the SCN5A gene, which encodes the alpha-subunit of the sodium ion channel, cause 18%-30% of BrS cases.
To date, NGS methodologies have not been applied to these disorders.
Nevertheless, since multiple genes are involved in these diseases, the application of NGS should provide important advantages for identifying pathogenic mutations in a fast and cost-efficient manner.
CAD remains the leading cause of death in industrialized and developing countries.
It has been estimated that heritable factors contribute 30%-60% of the inter-individual variation in the risk of CAD.
Mendelian disorders such as familial dyslipidemia which lead to alterations in the lipid profile are heritable risk factors for CAD. While these rare mutations are well-recognized and well-characterized, the identification of common genetic variants associated with CAD is more difficult despite strong evidence that disease susceptibility is heritable.
Recently, genome-wide association studies (GWAS) have identified several common variants (single nucleotide polymorphisms, SNPs) associated with the risk of CAD. Notably, these SNPs are not inherited independently, but as “bins” or “blocks”. Furthermore, the genotype of 1 SNP may be sufficient to infer the genotypes of all other SNPs within a given linkage disequilibrium block (haplotype), thereby “tagging” an entire region of interest.
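How well one SNP "tags" another is commonly quantified with the squared correlation coefficient r² between the two sites. The following is a minimal sketch of that calculation, assuming phased haplotypes with alleles coded as 0 or 1; the function name and the example haplotypes are illustrative assumptions and are not taken from the studies cited above.

```python
def ld_r2(haplotypes):
    """Squared correlation (r^2) between two biallelic SNPs.

    `haplotypes` is a list of (a, b) pairs, one per phased chromosome,
    where a and b are the alleles (0 or 1) carried at SNP A and SNP B.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Hypothetical haplotypes: the two alleles always travel together.
perfect_block = [(0, 0)] * 6 + [(1, 1)] * 4
# Hypothetical haplotypes: all four combinations equally frequent.
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]

r2_perfect = ld_r2(perfect_block)
r2_independent = ld_r2(independent)
```

A value of r² close to 1 means that genotyping one SNP is nearly as informative as genotyping both, which is the sense in which a single SNP can tag a block; a value close to 0 means the two sites carry independent information.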
Next generation sequencing has enabled targeted re-sequencing of genomic regions found to be involved in the disease. In particular, the re-sequencing of genomic regions identified by GWAS in healthy and diseased populations represents a powerful strategy for assessing the contribution of rare variants to disease etiology. This is because NGS is able to identify rare genetic variants with a minor allele frequency (MAF) of < 5%, which complement the common susceptibility SNPs (MAF > 5%) established through the GWAS.
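The distinction between rare and common variants rests on the minor allele frequency, which can be computed directly from genotype counts. The sketch below assumes a biallelic site in diploid individuals; the function names and the example counts are hypothetical. The 5% threshold is the one given in the text, and because the text does not assign a variant at exactly 5% to either class, treating it as common is an assumption of this sketch.

```python
def minor_allele_frequency(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """MAF at a biallelic site from diploid genotype counts."""
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    alt_freq = (n_het + 2 * n_alt_hom) / total_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1 - alt_freq)


def classify(maf: float, threshold: float = 0.05) -> str:
    """Label a variant as rare (MAF below threshold) or common.

    Assumption: a MAF exactly equal to the threshold is labeled common.
    """
    return "rare" if maf < threshold else "common"


# Hypothetical genotype counts in 100 individuals.
maf_rare = minor_allele_frequency(96, 4, 0)      # 4 alternate alleles out of 200
maf_common = minor_allele_frequency(50, 40, 10)  # 60 alternate alleles out of 200
labels = (classify(maf_rare), classify(maf_common))
```

With small sample sizes the estimate is coarse: in 47 individuals, for example, a single heterozygote already corresponds to a frequency slightly above 1%, so the label for low-frequency variants depends strongly on how many individuals were sequenced.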
Interestingly, a very recent study analyzed sequence data from a 240-kilobase (kb) region on chromosome 9p21 in 47 individuals using the Illumina GA platform. The authors compared the results of targeted sequencing with NGS to pilot data from the 1000 Genomes Project (characterized by a description of the location, allele frequency and local haplotype structure of approximately 15 million SNPs). The findings showed that the targeted sequencing provides high sensitivity for lower-frequency variants despite several gaps in sequence coverage which existed after the alignment to a reference genome.
Furthermore, NGS has been used to detect rare variants in the gene encoding adiponectin. In this study, a combination of family-based linkage, whole-exome sequencing (by NGS), direct sequencing and association methods has been developed in order to efficiently identify rare variants associated with large effects in families from the Insulin Resistance Atherosclerosis Family Study. These results suggest that this approach could be advantageous in discovering novel genes influencing complex traits in a wide range of family studies.
NGS technologies have also been applied in other cardiovascular diseases with complex traits, such as aortic aneurysm. Harakalova et al performed a pilot experiment designed to find an efficient method for detecting rare genetic variants in regions of interest in large sample groups with aortic aneurysm using the SOLiD platform. They discussed the challenges and limitations connected with this approach and showed that the high number of novel variants detected per pool can be a limiting factor for successful variant prioritization and confirmation. Indeed, they discovered 681 coding variants; however, the majority of the detected candidate novel variants were false positives.
Moreover, a very recent study using exome sequencing of 2 distantly related, affected relatives efficiently and successfully identified a frameshift mutation in the SMAD3 gene as the cause of vascular disease in a family with arterial aneurysms and dissections inherited in an autosomal dominant pattern. Subsequent sequencing of families involving multiple members with thoracic aortic aneurysms and acute aortic dissections identified SMAD3 mutations in 2% of familial thoracic aortic aneurysms and dissections.
Additionally, Sakai et al used two different technologies (re-sequencing array technology and NGS) to analyze eight genes associated with syndromic aortic aneurysms and/or dissections. They identified eighteen variants with both technologies and concluded that NGS was able to detect almost all types of mutations, but it requires improved informatics methods.
Finally, NGS with the Illumina platform has been used to sequence genomic DNA in a nuclear family with a history of thrombophilia. Two hundred variants were identified and compared with different groups of HapMap populations. This method allowed the identification of multigenic risk for inherited thrombophilia and of appropriate pharmacological therapy.
Table 1 summarizes the current application of NGS in cardiovascular disorders. NGS technology promises to improve our understanding of the genetic architecture and the missing heritability of CVDs.
In the near future, NGS will revolutionize the genetic study of cardiovascular disease allowing unprecedented opportunities to detect mutations in disease-genes with high accuracy in a fast and cost-efficient manner in daily clinical practice.
In particular, the targeted re-sequencing of the region of interest selected by GWAS, using NGS technologies, will allow identification of rare SNPs involved in the risk of CVD.
To date, data on the reproducibility of NGS results in the cardiovascular setting are limited because too few studies are available. In addition, results in other, more thoroughly explored fields such as cancer have shown that two independent groups can simultaneously arrive at different sets of gene alterations, without overlap between the two sets of mutations identified[54,55]. This finding suggests that the reproducibility of data could be one major limitation of these advanced techniques. Furthermore, as previously cited, Dames et al demonstrated that different variants were found using two different NGS tools, of which only a few were subsequently confirmed by the conventional Sanger approach of sequencing. Thus, new efforts are needed to improve sequencing accuracy and streamline technical processes as the next steps toward transitioning NGS into the clinical laboratory.
The major challenge in NGS is that, although it produces an enormous volume of data cheaply, in most cases the millions of reads generated do not cover the coding regions of disease genes. Indeed, NGS provides reads of only 50-500 contiguous base pairs, which complicates both assembly and data analysis. Therefore, new methods should be developed to selectively capture DNA from the region of interest in order to sequence only targeted regions.
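The relationship between read number, read length and coverage of a target region can be made concrete with a short calculation. The sketch below uses the standard idealization in which reads are placed uniformly at random, so that the mean depth is the number of reads multiplied by the read length and divided by the target size, and the expected fraction of bases with no coverage is approximately exp(-depth). The function names and the numbers in the example are assumptions chosen for illustration. Real enrichment and sequencing are not uniform, so actual gaps are typically larger than this estimate.

```python
import math


def mean_depth(n_reads: int, read_length: int, target_size: int) -> float:
    """Average number of reads covering each base of the target."""
    return n_reads * read_length / target_size


def fraction_uncovered(depth: float) -> float:
    """Expected fraction of bases with zero coverage under uniform random sampling."""
    return math.exp(-depth)


# Hypothetical run: 24,000 reads of 100 bp over a 240-kb target region.
depth = mean_depth(n_reads=24_000, read_length=100, target_size=240_000)
gap_fraction = fraction_uncovered(depth)
```

The same functions show why untargeted sequencing leaves disease genes poorly covered: spreading the same reads over a whole genome rather than a 240-kb target lowers the depth by several orders of magnitude, and the expected uncovered fraction approaches 1.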
In addition to short DNA sequence reads, these technologies can generate terabyte-sized data files for each instrument run, greatly increasing the computer resource requirements. Given the vast amount of data produced by NGS, the creation of informatics tools for the storage and analysis of data will be essential to the successful application of NGS.
In the future, with the advent of NGS and the progressive increase in data sequences of the human genome from projects such as HapMap and the 1000 Genomes Project, investigators will have to choose between the multiple strategies to test a reference panel of polymorphic sites.
Moreover, parallel genome-wide studies are characterizing a large number of genes affecting the risk factors for CAD including dyslipidemia, hypertension, diabetes mellitus, and obesity. These findings are to be integrated with loci directly associated with CAD to obtain the fullest picture. Thus, in the next few years, the main focus of these studies will be to define a risk prediction as well as a preventive and individual therapy for CAD.
In the last few years, a technological revolution has taken place in the field of epigenomics.
The development of NGS devices is now providing researchers with tools to draw high-resolution maps of DNA methylation and histone modifications in normal and diseased tissues. NGS technologies may be used to profile epigenetic alterations that influence gene expression and to study the genome-wide epigenetic changes that occur in cardiovascular disease.
Peer reviewers: Cristina Vassalle, PhD, G. Monasterio Foundation and Institute of Clinical Physiology, Via Moruzzi 1, I-56124 Pisa, Italy; Yuri V Bobryshev, PhD, Associate Professor, School of Medical Sciences, Faculty of Medicine, University of New South Wales, Kensington, NSW 2052, Australia; Mohamed Chahine, PhD, Professeur Titulaire, Le Centre de Recherche Université Laval Robert-Giffard, Local F-6539, 2601 chemin de la Canardière, Québec G1J 2G3, Canada
S- Editor Cheng JX L- Editor Webster JR E- Editor Li JY