Disease heterogeneity presents a formidable challenge for clinical medicine. We are unlikely to develop successful, targeted treatments unless diagnostic schemes begin to reflect the biologic complexity underlying superficially similar disease phenotypes. To address heterogeneity, there is an increasing recognition that realistic disease models need to be built upon a foundation of quantitative molecular information1. Such a “systems medicine” approach should ultimately allow patient classification into biologically more homogeneous groups, with similar prognosis and treatment response.
Oncology has been among the first specialties to embrace molecular profiling to improve clinical decision-making. Today, a wealth of expression profiling information exists for tumors and early efforts are being made to classify patients into likely treatment responders for specific chemotherapeutics2. The investigation of complex multisystem pathologies such as cardiovascular disease has been slower to incorporate molecular profiling, in part because of the involvement of multiple tissues, many of which are not readily accessible. Moreover, cardiovascular disease, unlike cancer, is not clonal in origin, making experimental analyses more challenging. It is clear, however, that broadly defined diseases such as the cardiomyopathies are the product of diverse genetic and environmental agents3; one can expect that quantitative molecular profiling will shed light on distinct, pathologic activities underlying these conditions, thus allowing more meaningful classification schemes beyond those based solely on anatomic or hemodynamic considerations.
The manuscript by Barth et al in the current issue of Circulation Cardiovascular Genetics represents such a broadened search for molecular correlates of cardiovascular disease. The authors focus on the problem of heart failure with mechanical dyssynchrony (DHF) and its treatment by cardiac resynchronization therapy (CRT). Clinically, CRT has been a remarkable success, with significant gains in quality of life and mortality4. Multiple studies have investigated the physiologic consequences of dyssynchrony and the improvements resulting from CRT; however, the underlying molecular processes largely remain unclear. The current study is unique in its use of DNA microarrays to ask if heterogeneity resulting from DHF and the physiologic benefits from CRT are broadly reflected at a molecular level.
DNA microarrays allow scientists to simultaneously survey the expression level of thousands of mRNAs under a variety of experimental conditions. A key strength of microarray technology is that it is inherently free of inspection or ascertainment biases: a “transcriptome-wide” approach does not require preconceived notions of which biological processes are important. Unbiased approaches, including genome-wide association and metabolomics studies, have the potential to lead us to entirely unexpected disease mechanisms. However, large-scale (‘omic) data present analysis challenges of their own, requiring an understanding of measurement error, detection limits, and the problems that arise when a plethora (sometimes thousands) of hypotheses are tested. Any of these issues can lead to erroneous conclusions and compromise the generalizability of the results. Fortunately, bioinformatics research has focused on these issues for more than a decade, and many solutions have become available.
Armed with ‘omic data sets and bioinformatics tools, Barth et al tackle the hypothesis that CRT reduces the regional heterogeneity in gene expression induced by DHF. The experimental design (Figure 1) features three groups of dogs, referred to below as NF, DHF and CRT.
Tissue samples from the anterior and lateral segments of the left ventricle of each dog were subjected to microarray profiling, and comparisons made within and across groups. Bioinformatics methods were used to analyze the microarray data including pathway enrichment analysis and hierarchical clustering. Each of these will be discussed below.
Typical ‘omics experiments generate lists of significantly changed genes, proteins or metabolites. In the usual microarray experiment, these lists include hundreds of genes, only a handful of which are well known to a given experimenter. It is difficult for the researcher to see the forest for the trees in a long list of changed genes, a problem exacerbated by the fact that nearly every method in experimental molecular biology contains a mixture of true and spurious results. Since the usual goal of such experiments is to understand what biologic processes differentiate the groups under comparison, new analytic techniques were required. Pathway analysis has emerged as just such a tool.
Many properties are known or measured for genes and their encoded mRNAs and protein products, including tissue expression patterns, interaction partners, and molecular functions. Systematic efforts have been made to compile such properties into databases, with a vocabulary of Gene Ontology (GO) terms (http://www.geneontology.org/) being the most widely used. Both automated and manual efforts have annotated genes with GO Terms in dozens of organisms. GO Terms cover a range of gene properties including cellular location (e.g. mitochondria), molecular function (e.g. transcription factor) and involvement in biological processes (e.g. apoptosis). Related to biological processes, biological pathways comprise groups of genes that participate in a common cellular function such as “Oxidative Phosphorylation”. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a commonly used database of biological pathways and currently contains hundreds of groupings of genes (http://www.genome.jp/kegg/pathway.html).
This wealth of information defines many “gene sets” and we can ask whether a “query list” of changed genes from a microarray experiment shares significant similarity with any of these sets. A number of software packages have been designed to carry out this type of analysis. These broadly differ by the statistic used to measure enrichment of a gene property within a query gene list and the method used to determine statistical significance. In this manuscript, Barth and colleagues use two such packages: FatiGO (http://babelomics.bioinfo.cipf.es/EntryPoint?loadForm=fatigo) and Gene Set Analysis (http://www-stat.stanford.edu/~tibs/GSA/), which illustrate two categories of pathway enrichment analysis.
FatiGO, FuncAssociate5 and several other applications use the hypergeometric test as a measure of how well one’s query list overlaps with each predefined gene set. The relevant analytic parameters include the number of total genes in the “universe” to be considered (in this case, all the genes on the microarray), the number of genes in the query list, the number of “universe” genes corresponding to the gene set of interest, and the number of genes in the query list corresponding to that same gene set. Fisher’s exact test is used to generate a p-value for significance of overlap. For example, if the canine microarray includes 20,000 genes, 500 of which correspond to the pathway “Oxidative Phosphorylation”, and your query list of 100 genes includes 10 genes corresponding to the same pathway, your query list would be significantly enriched for “Oxidative Phosphorylation” (p=0.0002).
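The worked example above can be checked with a few lines of code. The following is a minimal sketch of the one-sided hypergeometric (Fisher's exact) tail probability, written with the Python standard library only; the function name `hypergeom_enrichment_p` and its parameter names are illustrative assumptions and are not taken from FatiGO or FuncAssociate.

```python
from math import comb

def hypergeom_enrichment_p(universe, in_set, query, overlap):
    """Upper-tail hypergeometric p-value: probability of seeing at least
    `overlap` gene-set members in a random query list of size `query`."""
    total = comb(universe, query)          # all possible query lists
    max_k = min(in_set, query)             # largest possible overlap
    tail = sum(
        comb(in_set, k) * comb(universe - in_set, query - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total                    # exact integers, one final division

# Numbers from the example in the text: 20,000 genes on the array,
# 500 in the pathway, 100 in the query list, 10 of those in the pathway.
p = hypergeom_enrichment_p(universe=20000, in_set=500, query=100, overlap=10)
print(f"enrichment p-value = {p:.1e}")
```

With about 2.5 pathway genes expected by chance in a list of 100, an overlap of 10 is a strong excess, and the computed tail probability should be on the order of the p=0.0002 quoted above.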
A caveat to this analysis is that one is often testing one’s gene list against hundreds of pathways or thousands of GO Terms. Clearly, low p-values will arise simply by chance and some correction for multiple hypothesis testing is essential. Simple Bonferroni approaches are often applied, but because many GO Terms overlap and are far from independent, these corrections are unnecessarily conservative.
FatiGO outputs an adjusted p-value for several methods of false-discovery rate (FDR) estimation. In addition to the well-known Benjamini-Hochberg method to estimate the FDR, FatiGO uses resampling-based testing to develop the distribution of Fisher’s Exact Test results for many GO Terms under the null hypothesis. In this case, the labels “in the query list” and “not in the query list” are permuted randomly among the genes in the universe and an enrichment p-value is computed for overlap of the random query list with each gene set. In principle, FatiGO and FuncAssociate can be used to find any type of overrepresented gene property, such as promoter elements and protein domains. Many packages allow you to supply your own annotations for other types of ‘omic data; the results of the analyses (including those using GO Terms or KEGG pathways) will, of course, depend on the completeness and quality of the annotations.
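To make the contrast between Bonferroni and Benjamini-Hochberg corrections concrete, here is a minimal standard-library sketch of both adjustments. It is not FatiGO's implementation, and the raw p-values in `raw` are invented purely for illustration.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical enrichment p-values for five gene sets (illustrative only).
raw = [0.0002, 0.004, 0.03, 0.04, 0.5]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
for p, b, q in zip(raw, bonf, bh):
    print(f"raw={p:<7} bonferroni={b:.3f} bh={q:.3f}")
```

In this toy list the Benjamini-Hochberg values are never larger than the Bonferroni values, which is the sense in which the FDR approach is less conservative.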
A second general approach to pathway enrichment is used in Gene Set Enrichment Analysis (GSEA, http://www.broad.mit.edu/gsea) and Gene Set Analysis (GSA). In these packages, a statistic is computed for each gene set that captures the degree to which genes in the set are statistical outliers in the experimental microarray comparison data. The null distribution for each statistic is generated by permuting the class labels for each microarray (e.g. shuffling case and control status) and/or the gene membership of the various gene sets.
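The label-permutation idea can be sketched in a few lines. The example below is a deliberately simplified stand-in: the set-level score (mean absolute case-versus-control difference) is an assumption chosen for brevity and is not the statistic used by GSEA or GSA, and the expression values, gene names and function names are hypothetical.

```python
import random
from statistics import mean

def set_score(expr, labels, gene_set):
    """Mean absolute case-vs-control difference over the genes in the set."""
    diffs = []
    for gene in gene_set:
        case = [v for v, lab in zip(expr[gene], labels) if lab == 1]
        ctrl = [v for v, lab in zip(expr[gene], labels) if lab == 0]
        diffs.append(abs(mean(case) - mean(ctrl)))
    return mean(diffs)

def permutation_p(expr, labels, gene_set, n_perm=2000, seed=0):
    """Empirical p-value: how often shuffled class labels score as high."""
    rng = random.Random(seed)
    observed = set_score(expr, labels, gene_set)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)              # break the sample/class link
        if set_score(expr, shuffled, gene_set) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)       # add-one keeps p above zero

# Hypothetical data: 4 cases (label 1) followed by 4 controls (label 0).
expr = {
    "geneA": [5.1, 5.3, 4.9, 5.2, 2.0, 2.2, 1.9, 2.1],
    "geneB": [7.0, 7.2, 6.8, 7.1, 4.0, 4.1, 3.9, 4.2],
    "geneC": [3.0, 3.1, 2.9, 3.2, 3.1, 3.0, 2.9, 3.1],
}
labels = [1, 1, 1, 1, 0, 0, 0, 0]
p_set = permutation_p(expr, labels, ["geneA", "geneB"])
print(f"permutation p-value = {p_set:.3f}")
```

Note that with only eight samples there are just 70 distinct ways to split them into two groups of four, so the smallest attainable p-value is limited by sample size regardless of how strong the effect is.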
In this manuscript, Barth and colleagues use the FatiGO tool to evaluate pathway enrichment in the list of the significantly changed genes in their comparison of the anterior wall (DHF vs. NF) and find multiple overrepresented pathways, including metabolic (e.g. Oxidative Phosphorylation) and signaling pathways (e.g. Wnt, VEGF). The pathways listed are consistent across at least two experiments, reinforcing their validity. The utility of an enriched pathway or GO Term, of course, depends on whether it can motivate a testable hypothesis. For example, knowing that a specific kinase pathway appears to be activated might lead one to evaluate the biological effects of specific kinase inhibitors. Enrichment for vague pathways may not represent an immediately testable hypothesis.
In addition to identifying underlying biological processes in a set of changed genes, pathway analysis can be used to compare different microarray experiments for “biological similarity”. Pathway-based comparisons can be used to test the generalizability of one’s results (as performed in Barth et al) or to search for other perturbations that are similar to the one under consideration (a pathway “signature”)6. When performing comparisons, one should choose a quantitative measure for comparison that allows establishment of statistical significance. Many measures are “non-classical” in that they lack established null distributions, but resampling can generate empirical distributions of the measure and enable one to estimate whether the observed similarity is significant. Typically, such pathway-focused analyses use normalized gene expression values for genes in the pathway7, or, if case-control experiments are to be compared, a fold-change value or t-statistic is used for individual genes6.
A key conclusion of the Barth et al manuscript is that CRT reduces the heterogeneity in gene expression across the left ventricle introduced by dyssynchronous heart failure. If so, CRT hearts would more closely resemble NF than DHF hearts. As one method to evaluate this hypothesis, the authors use hierarchical clustering to look for natural groupings of NF, DHF and CRT microarray samples. Hierarchical clustering is a powerful, potentially unbiased method of looking for similarities among experimental samples8. It has been applied successfully to microarray data to identify genomic features of cancers that predict mortality and chemotherapy response9, and to identify likely gene targets for established drugs10. All forms of clustering require a metric for comparison (usually Euclidean Distance or some type of correlation coefficient) and a method of grouping similar samples. Hierarchical clustering works by first clustering the most similar samples, and then successively grouping small clusters into larger ones.
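The two ingredients named above, a comparison metric and a grouping rule, can be illustrated with a small sketch. It assumes a correlation-based distance (1 minus the Pearson correlation) and average linkage; both are common choices but are assumptions here, not necessarily those used by Barth et al. The sample names and expression profiles are invented.

```python
from math import sqrt

def correlation_distance(x, y):
    """1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def hierarchical_cluster(profiles):
    """Average-linkage agglomerative clustering.
    Returns (tree as nested tuples, list of (members, distance) merges)."""
    clusters = [(name, [name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i][1] for b in clusters[j][1]]
                d = sum(correlation_distance(profiles[a], profiles[b])
                        for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best                     # closest pair of clusters
        tree = (clusters[i][0], clusters[j][0])
        members = clusters[i][1] + clusters[j][1]
        merges.append((sorted(members), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((tree, members))
    return clusters[0][0], merges

# Hypothetical profiles (one value per gene) for four samples.
profiles = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [1.1, 2.1, 2.9, 4.2],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [4.1, 2.8, 2.2, 0.9],
}
tree, merges = hierarchical_cluster(profiles)
for members, dist in merges:
    print(members, round(dist, 4))
```

The merge list records the order in which samples are joined, which is the information a dendrogram displays: the most similar samples pair first, and the resulting small clusters are joined last.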
Hierarchical clustering is unbiased only when clustering is performed across all genes in the array. However, there are many potential sources of both technical and biological variance in microarray analysis that are unrelated to the biological differences of interest, and it is notoriously difficult to cluster biological samples. In a tour-de-force microarray analysis of 300 genetic and experimental perturbations in yeast, Hughes and colleagues accounted for the intrinsic variability of genes in 63 control samples to greatly improve clustering ability10. Unfortunately, most microarray studies (mammalian or otherwise) have not included the large number of control samples needed to establish baseline variance for each gene. In a large series of cancer profiling experiments, Brown and colleagues limited cluster analysis to those “interesting” genes showing a 2-fold deviation from the median in some minimum fraction (~10–15%) of microarrays9. For hierarchical clustering, the Barth study limited the set of genes to a subset identified by ANOVA analysis to discriminate amongst the samples. This method appears to have selected hundreds of genes that discriminated NF from DHF and very few distinguishing NF from CRT. Failure to discriminate between two samples may stem from biological similarity between two samples or, alternatively, can arise if one or both samples are highly variable, leading to reduced power to detect significant differences. In either case, clustering based on this gene subset showed NF and CRT groups to be indistinguishable, and found both to be markedly different from DHF. Although promising in support of the hypothesis that CRT regularizes heterogeneity in ventricular gene expression, it will be important to see if this observation extends over a broader set of genes.
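The variance-style filter attributed above to Brown and colleagues can be sketched as follows. This is an illustration under stated assumptions: values are taken to be log2 expression ratios, the median is computed per gene across arrays, and the function name, thresholds and toy data are hypothetical rather than drawn from the cited work.

```python
from math import log2
from statistics import median

def variable_genes(log2_expr, min_fold=2.0, min_fraction=0.10):
    """Keep genes deviating at least `min_fold` from their own median
    in at least `min_fraction` of the arrays."""
    threshold = log2(min_fold)             # 2-fold is 1 unit on a log2 scale
    keep = []
    for gene, values in log2_expr.items():
        mid = median(values)
        n_dev = sum(1 for v in values if abs(v - mid) >= threshold)
        if n_dev / len(values) >= min_fraction:
            keep.append(gene)
    return keep

# Hypothetical log2 ratios across ten arrays.
toy = {
    "geneX": [0.0, 0.1, -0.1, 0.0, 1.5, 0.0, 0.1, -1.4, 0.0, 0.1],
    "geneY": [0.2, 0.1, 0.0, -0.1, 0.3, 0.1, 0.0, -0.2, 0.1, 0.0],
}
kept = variable_genes(toy)
print(kept)
```

A filter of this kind selects genes without reference to group labels, in contrast to an ANOVA-based selection, which picks genes precisely because they discriminate among the predefined groups.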
While CRT has proven remarkably successful in treating heart failure patients, up to a quarter of individuals still fail to respond4. Well-designed microarray experiments along with rigorous bioinformatics analyses may identify biological pathways that remain dysregulated even after CRT, and thus could be targeted by adjunct pharmacologic therapy. Such a discovery would truly represent a triumph for a “systems medicine” approach to complex, cardiovascular disease.
RCD was supported by NIH grant T32 HL007208. FPR was supported by NIH grants HL081341, HG004233, HG0017115, NS035611, HG003224 and by the Canadian Institute for Advanced Research.