Coronary heart disease (CHD) will soon become the leading cause of death and morbidity in the world. Early detection and treatment of CHD is thus imperative to improve global health. Atherosclerosis of the coronary arteries is a complex multifactorial disease process involving multiple pathways that can be influenced by both genetic and environmental factors. With the recent advances in genomics and proteomics, many new risk factors with small-to-moderate effects are likely to be identified. Additionally, individualized risk stratification and targeted therapy may become feasible; each individual could potentially be assessed with a panel of tests for genomic and proteomic markers and, on the basis of the individual’s composite risk profile, preventive and therapeutic steps could then be undertaken. With a multimarker approach, it may also be possible to identify alterations in pathways involved in atherogenesis, rather than focus on individual risk factors. In this article, we use the specific example of atherosclerosis to discuss the role of genomics and proteomics in cardiovascular risk assessment.
Significant progress was made in reducing mortality from coronary heart disease (CHD) in the western world in the past century. These advances resulted from identification of risk factors for CHD, treatment of these risk factors, and improved care of patients with acute coronary syndromes. Despite these successes, CHD will soon be the leading cause of death and morbidity in the world.1 The first manifestation of CHD is often a devastating event, such as sudden death or myocardial infarction, and a substantial number of individuals develop acute coronary syndromes at a relatively young age or in the absence or paucity of conventional risk factors. Early-onset disease is particularly common in urban centers of developing countries, such as in Southeast Asia, where there is an ongoing epidemic of CHD.2 Refining methods for early detection and treatment of CHD is thus imperative to improve global health.
Prediction of cardiovascular events is based on conventional risk factors for atherosclerosis, such as age, male sex, hypertension, diabetes, dyslipidemia, and smoking.3,4 Because these risk factors are prevalent in much of the population, available risk-prediction algorithms are less accurate than desired.5 For example, one study showed that the algorithms correctly predicted just ~11% of the CHD events occurring within 10 years.6 New tools that better predict an individual’s cardiovascular risk are urgently needed.7 Whereas several drugs to treat risk factors such as dyslipidemia and hypertension have been developed over the past few decades, the identification of new biomarkers for early detection of atherosclerotic vascular disease has lagged behind.
In the past few years, substantial progress has been made in three areas—genomics, proteomics, and imaging (the latter of which is discussed by Fuster et al.8 in this issue of Nature Reviews Cardiology)—and is likely to lead to more-accurate cardiovascular risk prediction. Although outside the scope of this Review article (and reviewed in detail elsewhere9), metabolomics—the study of the total complement of small-molecule metabolites found in or produced by an organism—is another promising tool for early detection of disease and understanding of disease pathophysiology.
Knowledge of new plasma markers and genetic susceptibility variants may provide more-precise estimates of risk while also defining the pathways perturbed in individual patients, revealing new targets for intervention, and ultimately enabling an individualized approach to care.10–12 In this article, we discuss the current state of genomics and proteomics for cardiovascular risk assessment and how advances in these fields may promote a better understanding of the complexity of atherosclerotic vascular disease. These advances may refine cardiovascular risk assessment, allow early treatment and, therefore, improve global health.
Our knowledge of the biology of atherosclerotic vascular disease has substantially increased over the past few decades. After an initial preoccupation with arterial lumen narrowing and ischemia, there has been gradual acceptance of the concept that atherosclerosis is a response to injury and that inflammation is involved in all stages of the disease. That acute coronary syndromes can occur in the absence of significant luminal stenosis is now well established. Alterations in several key pathways—including those involved in inflammation, thrombosis, lipoprotein metabolism, calcification, vascular remodeling, oxidative stress and cell death—are known to result in the development of atherosclerosis, as well as its progression and complications.13,14 Inclusion of potential markers of each of these pathways in CHD screening may lead to increased accuracy of CHD risk prediction. Furthermore, determination of how alterations in these pathways lead to atherosclerosis could allow implementation of therapies to restore perturbed pathways to their normal functions.
In addition to our increased knowledge of the cells and pathways involved in the pathogenesis of atherosclerosis, we are now also aware that the risk of acute coronary syndromes is dynamic and fluctuates over time. Although atherosclerosis generally progresses with age, there are periods of quiescence punctuated by episodic increases in plaque inflammation and growth. Most risk predictions are made on the basis of single cross-sectional profiles, and newer algorithms will need to incorporate the nonlinear behavior of atherosclerotic plaque growth and activity. One approach would be to have periodic risk assessments targeted to detect such changes in plaque behavior.
Our improved understanding of the biology of atherosclerosis has, therefore, highlighted the following two main paradigms important for CHD risk assessment: firstly, the complexity of CHD necessitates a multimodal and multimarker approach; and, secondly, there is need for periodic assessment, owing to the dynamic nature of plaque activity and CHD risk.
Family history has been described as a “…free, well-proven, personalized genomic tool that captures many of the genes and environmental interactions and can serve as the cornerstone for individualized disease prevention”.15 CHD has a significant heritable component, as exemplified by twin and family studies,16,17 and high-risk families make up a considerable proportion of early CHD cases in the general population.18 A history of early CHD in a first-degree relative approximately doubles the risk of CHD, although the reported relative risk ranges from 1.3 to 11.3.17 The familial clustering of CHD can be partly explained by heritable variation in known CHD risk factors, which can thus be quantified by using currently available risk-prediction algorithms; however, existing evidence suggests that family history contributes to an increased risk of CHD independently of the known risk factors.19 Multiple, as yet undiscovered, genetic susceptibility variants that mediate the familial clustering of CHD must, therefore, exist. Both candidate-gene and agnostic genomic approaches have been used to identify genetic variants that influence susceptibility to CHD.
The candidate-gene approach involves genotyping variants in genes with important roles in etiologic pathways of atherosclerosis, such as lipoprotein metabolism, to determine whether the variants—typically single nucleotide polymorphisms (SNPs)—are associated with adverse cardiovascular events or quantitative measures of subclinical atherosclerotic vascular disease. Candidate-gene studies of lipoprotein-related genes, which include APOE, LPA and PCSK9, have reinforced the importance of lipoprotein metabolism in atherogenesis. Presence of one or more APOE ε4 alleles is associated with dyslipidemia and increased risk of CHD.20 Similarly, variants in LPA that are associated with higher levels of lipoprotein(a) are in turn associated with an increased risk of CHD.21 Contrarily, mutations in PCSK9 that lead to lower levels of LDL cholesterol are associated with lower lifetime risk of CHD.22
By contrast to the candidate-gene studies of the lipoprotein metabolism pathway, findings from the majority of candidate-gene studies have not been replicable. Typically, between one and several SNPs in a candidate gene were genotyped in cases and controls and statistical significance was set at P <0.05. The newer generation of candidate-gene studies attempts a more-comprehensive coverage of the genetic variation at the candidate gene locus, on the basis of knowledge of neighboring SNPs being strongly correlated with each other (that is, in linkage disequilibrium), and replicates the findings either within a study sample or in an independent cohort.23 There is also growing interest in sequencing the ‘exome’—the gene-coding parts of the human genome—to identify rare variants that are associated with human diseases.24
Genome-wide association studies (GWAS; Figure 1) have become feasible owing to our knowledge of patterns of linkage disequilibrium across the genome and availability of high-throughput genotyping platforms.25 By contrast to family-based linkage studies, GWAS are easier to initiate and simply require an adequate number of cases and controls. DNA is typically extracted from white blood cells and then subjected to genotyping of anywhere from 500,000 to 1 million SNPs across the genome. Although at least 7 million common SNPs (minor allele frequency >5%) exist in the human genome, neighboring SNPs are often in linkage disequilibrium. GWAS take advantage of patterns of linkage disequilibrium, such that genotyping approximately 500,000 SNPs will cover more than 80% of the common SNPs in the genomes of nonblack populations, whereas 1 million SNPs are needed to obtain similar coverage in black populations.25
SNPs that differ in frequency between cases and controls are then said to be ‘associated’ with the disease of interest. To account for multiple testing, the P value for statistical significance is usually set at 5 × 10⁻⁸. Of note, a statistically significant SNP is not necessarily causal, but may be in linkage disequilibrium with the causal SNP. Establishing causality typically requires bioinformatics study of the locus of interest, additional genotyping, and functional assays in vitro or in experimental animal models.26
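The genome-wide significance threshold quoted above is consistent with a Bonferroni correction of a conventional 0.05 significance level for roughly 1 million independent tests. The following Python sketch illustrates that arithmetic together with a simple allelic case-control test for a single SNP. It is only an illustration under stated assumptions: the function names and the allele counts are hypothetical, and real GWAS pipelines typically use regression models with covariate adjustment rather than a bare 2 × 2 test.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def allelic_chi_square(case_risk: int, case_other: int,
                       control_risk: int, control_other: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi-square statistic, P value).
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts, for illustration only (not data from any study).
threshold = bonferroni_threshold(0.05, 1_000_000)
chi2, p_value = allelic_chi_square(60, 40, 40, 60)
print(f"threshold = {threshold:.1e}, chi2 = {chi2:.2f}, P = {p_value:.4f}")
print("genome-wide significant:", p_value < threshold)
```

In this toy example the SNP would pass a nominal P < 0.05 threshold but falls far short of genome-wide significance, which mirrors why many early single-SNP findings were not replicable.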
Over the past few years, GWAS have generated hundreds of replicable SNP associations for complex traits and diseases, including at least 35 loci influencing various atherosclerotic vascular diseases and related intermediate traits.25 Loci implicated in susceptibility to CHD and replicated in different populations are listed in Table 1. The susceptibility variants that have been uncovered increase the risk modestly, typically by 10–40% per risk allele. Notably, however, many of the alleles associated with increased risk are frequently found in the population, so the population-attributable risk is often substantial.27 Many more such loci associated with weak effects on risk will probably be identified by GWAS of large sample sizes and also as a result of meta-analyses that combine results of several studies. The discovery of new, robust and replicable CHD-associated SNPs in GWAS might confer clinical value by facilitating multiplex testing of these SNPs in an individual, thereby yielding predictive information that is incremental to assessment of conventional risk factors.
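The point that common alleles with modest effects can still carry a substantial population-attributable risk can be illustrated with Levin’s formula, PAF = p(RR − 1) / [1 + p(RR − 1)], where p is the proportion of the population exposed and RR is the relative risk. The Python sketch below is a simplification: it assumes a binary exposure (carrier versus noncarrier), and the prevalence and relative-risk values are hypothetical rather than taken from Table 1.

```python
def population_attributable_fraction(exposure_prevalence: float,
                                     relative_risk: float) -> float:
    """Levin's formula for the fraction of cases attributable to an exposure."""
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: a common variant with a modest effect versus
# a rare variant with a stronger effect.
common_modest = population_attributable_fraction(0.50, 1.3)
rare_strong = population_attributable_fraction(0.01, 3.0)
print(f"common, modest effect: {common_modest:.3f}")
print(f"rare, strong effect:   {rare_strong:.3f}")
```

Under these assumed inputs, the common variant accounts for a larger share of cases in the population than the rare variant, even though its effect on an individual’s risk is much smaller.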
Over the past several years, substantial advances have taken place in genotyping technology, including SNP ‘arrays’ that can allow assays of up to 1 million SNPs. Detection of SNPs is by means of specific oligonucleotide probes (25–50 base pairs in length), which are placed either on beads 2 μm in diameter or ligated directly on to glass slides (SNP array technology is reviewed in detail elsewhere28). In addition to rapid technological advances, the cost per genotype has gone down dramatically. These developments have enabled the widespread use of GWAS and the identification of many new genetic variants associated with common diseases. Custom-designed SNP assays can be utilized for follow-up replication and validation studies.
In addition to sequence variation, structural variation contributes substantially to genetic variation in humans. Copy-number variants—defined as fragments of the genome that are larger than 1 kb and vary in copy number between individuals—are being investigated for their contribution to common diseases including cardiovascular diseases.29 To date, few copy-number variants implicated in susceptibility to atherosclerosis have been identified; however, one example is variation in the region on chromosome 6 that harbors the LPA gene.30 The current SNP genotyping arrays incorporate assays for copy-number variants, and new analysis algorithms are being tested to infer copy-number variants from SNP genotyping data.
In the near future, high-density genotyping and candidate-gene multiplex-SNP testing might both be replaced by sequencing of entire genomes or ‘exomes’. The milestone of the $1,000 genome, once thought improbable, now seems to be likely in the not-too-distant future. Sequencing of candidate genes, exons, or whole genomes will allow identification of ‘rare’ susceptibility variants that may have stronger effects on disease susceptibility.
Because of small relative risks and odds ratios, most genetic susceptibility variants have low discriminative accuracy and contribute only marginally to the c-statistic, compared with existing risk factors, such as presence of diabetes or history of smoking.31–33 The predictive value of genetic profiling for CHD and related traits has been investigated in several studies. Talmud et al. found that a chromosome 9p21.3 SNP (rs10757274) did not add substantially to the Framingham risk score for predicting CHD events in 2,742 men, but led to reclassification of risk in 22%, of whom 63% moved into more-accurate categories (defined by the observed risk corresponding better to the predicted risk in the new category).34 Two studies published in 2009 also investigated whether the addition of the chromosome 9p21 allele to conventional risk factors improved CHD risk prediction; these studies yielded conflicting results.35,36
The effect of susceptibility genetic variants may vary in different environmental backgrounds. The study of gene–environment interactions in the setting of GWAS is important to improve understanding of disease pathways, facilitate individualized medicine to maximize response or minimize adverse effects on the basis of genetic susceptibility, and improve prediction of an individual’s risk of disease or prognosis, and of potential changes in risk in relation to modifiable environmental factors.37 Improved methods to quantify environmental exposures and large sample sizes are needed to accelerate progress in this area.
Several web-based companies have already started to market genotyping of disease-susceptibility variants to the public.38 Estimation of CHD risk is a prominent feature of the reports given to the consumers after genotyping. Given the proliferation of direct-to-consumer genetic testing, there is a need to study the integration of predictive genetic risk assessment in clinical practice.39,40 Furthermore, although genetic testing may improve accuracy of risk profiles and thereby aid in early detection of CHD, whether genetic testing will improve outcomes remains to be established. Box 1 details important points to consider when evaluating the utility of genetic testing, as proposed by Haddow and Palomaki in 2004.41 Several aspects require further investigation, including whether genetic testing could lead to potential harm (for example, by causing undue anxiety), how physicians should deal with predictive genetic information, how to communicate risk to patients such that they can make sense of it, and the effect of communication of genetic risk on patient behavior and lifestyle.
Analytic validity: ability to measure the genotype of interest both accurately and reliably.
Clinical validity: ability of a genetic test to detect or predict the presence or absence of the phenotype or clinical disease.
Clinical utility: the likelihood that the genetic test will lead to an improved outcome, including test and disease characteristics.
Ethical, legal, and social implications: impact of genetic test results on insurance and employment, privacy and confidentiality, equity of access, and stigmatization.
As for genetic variation, the expectation that a single protein marker will provide significant incremental information that will be useful for early detection of cardiovascular disease is unrealistic.13,42 Given the complexity of atherosclerotic vascular disease, a multimarker approach is likely to be more informative than use of a single marker because it might help detect perturbations in one or more of the various etiologic pathways of atherosclerosis.13,43 One caveat to the multimarker approach is that if the candidate biomarkers derive from pathways that are correlated with clinical characteristics that are already being measured (for example, lipid levels or inflammatory markers), their incremental predictive value is limited. The failure of correlated biomarkers to provide incremental predictive value was illustrated by Pepe and colleagues.44
To identify new biomarkers that are not correlated with the known candidate markers and are associated with risk of CHD, an unbiased approach is needed. The utility of an agnostic approach—which assumes no prior knowledge of disease pathophysiology—has been highlighted by the often unexpected genetic susceptibility variants identified by GWAS. The results of these studies also highlight that our knowledge of disease pathophysiology is far from complete.
The principal technology of proteomic discovery is mass spectrometry (MS) (Figure 2). In what is often termed ‘shotgun proteomics’, proteins in a particular sample are initially digested by trypsin into their constituent peptides and then separated by high-performance liquid chromatography. The sequences of these peptides are then established by tandem MS (MS/MS) experiments in which parent ions are fragmented by collision with nitrogen or helium gas to produce fragment ions. The resulting product ion mass spectra plot the mass-to-charge ratio of the observed ions against their detected abundance.45 Software tools can then match the MS/MS spectra to peptide identities through database searching, and thereby deduce the amino-acid sequences and, if applicable, the quantities of the proteins in a sample.46 MS can also be useful in the triage and validation of novel proteins found in discovery efforts. Sensitivity can be increased if MS is coupled with immunogenic extraction and enrichment methods that improve the capture of low-concentration targets.47,48 To date, no specific markers for clinical use in CHD risk stratification have emerged from MS studies.
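The digestion step of shotgun proteomics can be mimicked in silico, which is essentially what database-search software does when it generates candidate peptides to compare against MS/MS spectra. The Python sketch below applies the commonly used rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R), except when the next residue is proline (P). The function name and the example sequence are hypothetical, and missed cleavages, modifications, and mass calculation are deliberately ignored.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Split a protein sequence into peptides using the simple trypsin rule:
    cleave after K or R unless the following residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in ("K", "R") and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


# Hypothetical sequence fragment, for illustration only.
for peptide in tryptic_digest("MKWVTFISLLLLFSSAYSRGVFRR"):
    print(peptide)
```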
Additional technology platforms that incorporate MS for proteomic biomarker discovery include pattern-based methods that produce MS-derived protein patterns via surface-enhanced laser desorption–ionization (SELDI) or matrix-assisted laser desorption–ionization (MALDI). Further details of these technologies are beyond the scope of the present article and are provided in reviews published elsewhere.45,46,48 A substantial disadvantage of these approaches is that once a differential pattern is detected, identification of the peaks in the pattern is often challenging.49 Without knowing the identity of the candidate peaks, it is difficult to gain insight into the disease pathophysiology and to move to higher-throughput, widely disseminated, clinical assay platforms, which are typically immunoassays (see the section on assay standardization below). Furthermore, pattern-only approaches face considerable challenges when used as diagnostic tests in the clinical setting.10
Pattern-based and identity-based methods can be combined, as in MS/MS analysis of selected spots from differential protein displays, such as two-dimensional polyacrylamide gel electrophoresis (2D-PAGE). The limitations of this approach include limited sensitivity, reproducibility and throughput; these limitations are ameliorated to a certain degree by the method of upfront liquid-chromatography followed by MS/MS.
Although proteomic technologies have the potential to identify novel proteins in the plasma that can improve accuracy of cardiovascular risk prediction, many challenges exist. Some of the main technical challenges for discovery proteomics are associated with depleting abundant proteins in the plasma to obtain the less-abundant proteins (the ‘deep proteome’). Another issue is that, by contrast to only ~25,000 genes in the human genome, hundreds of thousands of unique proteins circulate in the blood, owing to alternative splicing of genes leading to different transcripts from the same gene, as well as to post-translational modification of proteins. The latter includes glycosylation, which is particularly relevant to atherosclerosis given the role of advanced glycation end-products in increasing oxidative stress in the vascular wall.50
Unlike the GWAS agnostic approach, which is a productive method for identification of genetic variants that influence susceptibility to common chronic diseases, an agnostic proteomics approach has not yet had much success. Advances in proteomic technology, standardizing the collection and storage of specimens, preventing degradation during storage, and using well-defined cases and controls will increase the likelihood of success in identifying novel markers that may help in disease prediction and prognostication.46,51
Considerable work is required for biomarkers to be successfully transitioned from proteomic discovery to clinical use for risk prediction.52 Most research-based protein marker assays are not validated according to Clinical Laboratory Improvement Amendments (CLIA) standards. Before clinical use in the USA, these assays must meet CLIA requirements53 in terms of accuracy, precision, analytical sensitivity, analytical specificity, reportable range of test results, and reference intervals. Furthermore, these procedures must be clinically validated using samples from patients with well-characterized clinical profiles. This translation process demands an established infrastructure and multidisciplinary expertise, particularly for assays that provide multiple, parallel protein measurements on the same specimen (‘multiplex assays’).49,54–56 Extensive validation is required for multiplex protein test panels intended for use in clinical trials or diagnostic laboratories.57 To date, FDA-cleared protein multiplex assays consist primarily of the lateral-flow immunoassays used for point-of-care evaluation.58 No multiplex assays are currently available for CHD risk prediction in asymptomatic individuals.
Several studies have tested whether multiple markers increase the ability to predict adverse cardiovascular outcomes. Wang and colleagues evaluated 10 biomarkers of cardiovascular disease in over 3,000 study participants who were followed for the development of cardiovascular disease over a median of 7.4 years.59 Of the 10 biomarkers measured, two were significant predictors of cardiovascular events (B-type natriuretic peptide and urinary albumin excretion) and five were significant predictors of mortality (C-reactive protein, B-type natriuretic peptide, urinary albumin excretion, renin, and homocysteine). Combining the biomarkers into a multimarker score improved risk prediction, but yielded only a modest increment in the c-statistic. By contrast, a study of elderly men in Sweden revealed that four biomarkers, namely N-terminal pro-B-type natriuretic peptide (NT-proBNP), C-reactive protein, cystatin C, and troponin, led to significant improvement in the c-statistic for predicting myocardial infarction and death during a median follow-up of 10 years.60
Evaluation of diagnostic or predictive tests uses the c-statistic as a measure of the test’s ability to discriminate individuals with disease from those without disease.61 Distributions of biomarkers in individuals with and without cardiovascular disease typically overlap a great deal, which provides one explanation for the modest increase in the c-statistic seen in most clinical biomarker studies of cardiovascular risk.62 Because it is difficult to demonstrate improvements in the c-statistic, some investigators have advocated use of other metrics (described below) to evaluate the predictive utility of new biomarkers.63,64
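For a binary outcome, the c-statistic can be read as the probability that a randomly chosen individual with disease receives a higher predicted risk than a randomly chosen individual without disease, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison. The function name and the scores are hypothetical, and time-to-event data with censoring would need a different estimator.

```python
def c_statistic(scores: list[float], outcomes: list[int]) -> float:
    """Proportion of case/non-case pairs in which the case has the higher score.

    outcomes: 1 for individuals with disease, 0 for those without.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks: the two distributions overlap,
# so discrimination is imperfect.
risks = [0.10, 0.40, 0.35, 0.80]
disease = [0, 0, 1, 1]
print(c_statistic(risks, disease))
```

A value of 0.5 corresponds to chance discrimination and 1.0 to perfect separation. Heavily overlapping score distributions keep the value well below 1, which is why adding a single biomarker often changes it only slightly.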
The net reclassification index summarizes the net proportion of persons who move into a more appropriate risk category when prediction models incorporate new biomarkers.63 Reclassification refers to the ability of new biomarkers to move people between discrete risk categories, so that some low-risk individuals may be reclassified as high-risk and vice versa. For CHD, reclassification is meaningful and clinically relevant, given the widespread use of low-risk, intermediate-risk, and high-risk categories of 10-year CHD risk. Movement between risk categories might not be clinically relevant if there is no management strategy explicitly linked to the categorization, or if most of the movements involve small shifts in absolute risk from just below to just above the cutoff point.65 However, if risk categories are defined according to cutoff points used to indicate type or intensity of interventions, as is true for CHD, reclassification can affect clinical management.
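In its commonly used categorical form, the net reclassification index adds two components: among individuals who had an event, the proportion moved up a category minus the proportion moved down; and among those who did not, the proportion moved down minus the proportion moved up. The Python sketch below follows that definition. The cutoff points of 10% and 20% 10-year risk are an assumption chosen for illustration, and the function names and predicted risks are hypothetical.

```python
from bisect import bisect_right


def risk_category(risk: float, cutoffs: tuple[float, ...] = (0.10, 0.20)) -> int:
    """Index of the risk category: 0 = low, 1 = intermediate, 2 = high."""
    return bisect_right(cutoffs, risk)


def net_reclassification_index(old_risks: list[float], new_risks: list[float],
                               outcomes: list[int],
                               cutoffs: tuple[float, ...] = (0.10, 0.20)) -> float:
    """Categorical NRI for a new model versus an old model."""
    up_events = down_events = n_events = 0
    up_nonevents = down_nonevents = n_nonevents = 0
    for old, new, outcome in zip(old_risks, new_risks, outcomes):
        move = risk_category(new, cutoffs) - risk_category(old, cutoffs)
        if outcome == 1:
            n_events += 1
            up_events += move > 0
            down_events += move < 0
        else:
            n_nonevents += 1
            up_nonevents += move > 0
            down_nonevents += move < 0
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    return ((up_events - down_events) / n_events
            + (down_nonevents - up_nonevents) / n_nonevents)


# Hypothetical predicted 10-year risks before and after adding a biomarker.
old = [0.08, 0.15, 0.15, 0.25]
new = [0.12, 0.25, 0.05, 0.22]
had_event = [1, 1, 0, 0]
print(net_reclassification_index(old, new, had_event))
```

Note that the last individual moves from 0.25 to 0.22 without crossing a cutoff point and so does not count as reclassified, which illustrates how the index depends on where the category boundaries are drawn.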
Integrated discrimination improvement is another method of assessing the discriminative value of a biomarker beyond known risk factors,64 and has been used in the setting of CHD.37 This method measures the difference in discriminative ability between two models according to their predicted survival probabilities. Comparing one model to another, an increased predicted probability of an outcome among individuals who eventually did have the outcome, and a decreased predicted probability among individuals who did not have the outcome, implies better predictive ability, whereas the opposite implies worse predictive ability. These two changes are summed to give the integrated discrimination improvement (with improvement always being considered positive).
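The summation described above can be written out directly: the mean increase in predicted probability among individuals who had the outcome, plus the mean decrease among those who did not. The Python sketch below assumes a binary outcome and predicted event probabilities from two models. The function name and all values are hypothetical.

```python
from statistics import fmean


def integrated_discrimination_improvement(old_probs: list[float],
                                          new_probs: list[float],
                                          outcomes: list[int]) -> float:
    """IDI of a new model versus an old model for a binary outcome."""
    # Mean rise in predicted probability among those who had the outcome.
    event_gain = fmean([new - old
                        for old, new, y in zip(old_probs, new_probs, outcomes)
                        if y == 1])
    # Mean fall in predicted probability among those who did not.
    nonevent_drop = fmean([old - new
                           for old, new, y in zip(old_probs, new_probs, outcomes)
                           if y == 0])
    return event_gain + nonevent_drop


# Hypothetical predicted probabilities from models without and with a biomarker.
old = [0.20, 0.30, 0.20, 0.30]
new = [0.40, 0.50, 0.10, 0.20]
had_outcome = [1, 1, 0, 0]
print(round(integrated_discrimination_improvement(old, new, had_outcome), 3))
```

Unlike the net reclassification index, this measure does not depend on risk-category cutoff points, so small shifts in predicted probability contribute even when no category boundary is crossed.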
Clinical utility refers to the likelihood that measurement of a biomarker will lead to an improved outcome.41 To determine clinical utility of a new biomarker, several features in addition to the ability of the marker to discriminate between individuals who will or will not develop cardiovascular disease need to be evaluated. These considerations include the potential added benefits (including prevention of adverse cardiovascular outcomes and reduction of health-related costs) and harms (for example, anxiety, excessive downstream testing and unwarranted pharmacotherapy) of using the new biomarker. Although this concept remains to be proven, it is hoped that measurement of new biomarkers will promote cardiovascular health by enabling early disease detection and management.66 The availability of clinically useful biomarkers for the evaluation of cardiovascular risk might enable the health-care system to become more proactive, moving the focus away from treatment of end-stage disease and towards early detection of disease risk and prevention of adverse outcomes.
Atherosclerosis is a complex multifactorial disease process involving multiple pathways that are influenced by genetic and environmental factors.14 At the population level, efforts to reduce the prevalence of conventional risk factors, and to treat these factors when present, are enormously important for reduction of the burden of cardiovascular disease. However, to refine cardiovascular risk assessment in an individual, and thereby promote cardiovascular health, further work is needed to elucidate the role of the newer genomic and proteomic markers. Ideally, a new marker should add to the accuracy of risk prediction beyond the traditional risk factors, have a standardized and reproducible assay with established cutoff points to guide interpretation of the results, and have a therapeutic intervention available that leads to a reduction in the incidence of cardiovascular events.67–69
An important component of individualized medicine—a concept currently surrounded by a great deal of interest—is knowledge of which risk factors are increased in the individual. With advances in genomics and proteomics, robust, individualized risk stratification and targeted therapy may become feasible in the near future. Many risk factors with small-to-moderate effects are likely to be identified as a result of advances in the genomic and proteomic sciences,70 and each individual could potentially be assessed with a panel of tests for genomic and proteomic markers. With a multimarker approach, it may also be possible to identify alterations in specific pathways of atherogenesis, rather than focus on individual risk factors; this approach might aid in provision of the best preventative and therapeutic interventions for each individual, which would restore the specific perturbed pathways to their normal function.
The databases searched were PubMed and MEDLINE, for articles published from 1995 to the present, using the search terms “genomics”, “proteomics”, “genome-wide association studies”, “biomarker discovery”, and “mass spectrometry”. Only articles published in English were considered.
This work was supported in part by grants HL81331 and HG04599 from the NIH, USA, and a generous gift from the Marriot family.
The authors declare no competing interests.