1.  ContrastRank: a new method for ranking putative cancer driver genes and classification of tumor samples 
Bioinformatics  2014;30(17):i572-i578.
Motivation: Recent advances in high-throughput sequencing technologies are generating a huge amount of data that is becoming an important resource for deciphering the genotype underlying a given phenotype. Genome sequencing has been extensively applied to the study of cancer genomes. Although a few methods have already been proposed for the detection of cancer-related genes, their automatic identification is still a challenging task. Using the genomic data made available by The Cancer Genome Atlas Consortium (TCGA), we propose a new prioritization approach based on the analysis of the distribution of putative deleterious variants in a large cohort of cancer samples.
Results: In this paper, we present ContrastRank, a new method for the prioritization of putatively impaired genes in cancer. The method is based on the comparison of the putative defective rate of each gene in tumor samples versus normal and 1000 Genomes samples. We show that the method is able to provide a ranked list of putatively impaired genes for colon, lung and prostate adenocarcinomas. The list significantly overlaps with previously published lists of known cancer driver genes. More importantly, by using our scoring approach, we can successfully discriminate between TCGA normal and tumor samples. A binary classifier based on the ContrastRank score reaches an overall accuracy >90% and an area under the receiver operating characteristic (ROC) curve (AUC) >0.95 for all three types of adenocarcinoma analyzed in this paper. In addition, using the ContrastRank score, we are able to discriminate the three tumor types with a minimum overall accuracy of 77% and AUC of 0.83.
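As an illustration of the underlying idea (not the authors' exact statistic), the sketch below contrasts, for each gene, the fraction of tumor versus control samples carrying a putatively defective copy and ranks genes by the resulting p-value; the gene names and counts are hypothetical.

```python
# Illustrative sketch: contrast the rate of samples carrying putatively
# deleterious variants in a gene between tumor and control cohorts, then
# rank genes by the enrichment p-value. Counts below are invented.
from scipy.stats import fisher_exact

def contrast_gene(tumor_hits, n_tumor, control_hits, n_control):
    """Return an enrichment odds ratio and p-value for one gene."""
    table = [[tumor_hits, n_tumor - tumor_hits],
             [control_hits, n_control - control_hits]]
    odds_ratio, p_value = fisher_exact(table, alternative="greater")
    return odds_ratio, p_value

# Hypothetical counts: samples with a putative defective copy of the gene.
genes = {"TP53": (60, 100, 2, 500), "GENE_X": (5, 100, 4, 500)}
ranked = sorted(genes, key=lambda g: contrast_gene(*genes[g])[1])
print(ranked)  # genes ordered from most to least tumor-enriched
```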
Conclusions: We describe ContrastRank, a method for prioritizing putatively impaired genes in cancer. The method is based on the comparison of exome sequencing data from different cohorts and can detect putative cancer driver genes.
ContrastRank can also be used to estimate a global score for an individual genome reflecting the risk of adenocarcinoma, based on the genetic variant information from a whole-exome VCF (Variant Call Format) file. We believe that the application of ContrastRank can be an important step in genomic medicine to enable genome-based diagnosis.
Availability and implementation: The lists of ContrastRank scores of all genes in each tumor type are available as supplementary materials. A webserver for evaluating the risk of the three studied adenocarcinomas starting from a whole-exome VCF file is under development.
Contact: emidio@uab.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btu466
PMCID: PMC4147919  PMID: 25161249
2.  Disease candidate gene identification and prioritization using protein interaction networks 
BMC Bioinformatics  2009;10:73.
Background
Although most of the current disease candidate gene identification and prioritization methods depend on functional annotations, the coverage of the gene functional annotations is a limiting factor. In the current study, we describe a candidate gene prioritization method that is entirely based on protein-protein interaction network (PPIN) analyses.
Results
For the first time, extended versions of the PageRank and HITS algorithms, and the K-Step Markov method, are applied to prioritize disease candidate genes in a training-test schema. Using a list of known disease-related genes from our earlier study as a training set ("seeds"), and the rest of the known genes as a test list, we perform large-scale cross validation to rank the candidate genes and also to evaluate and compare the performance of our approach. Under appropriate settings – for example, a back probability of 0.3 for PageRank with Priors and HITS with Priors, and a step size of 6 for the K-Step Markov method – the three methods achieved comparable AUC values, suggesting similar performance.
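The core of PageRank with Priors is a random walk that restarts at the seed genes; a minimal sketch on a toy PPI network follows, using networkx and invented gene names (a back probability of 0.3 corresponds to a damping factor alpha = 0.7).

```python
# Minimal sketch of "PageRank with Priors" for candidate-gene ranking on a
# toy protein-protein interaction network. Gene names are made up.
import networkx as nx

ppi = nx.Graph()
ppi.add_edges_from([("SEED1", "CAND_A"), ("SEED1", "CAND_B"),
                    ("SEED2", "CAND_B"), ("CAND_B", "CAND_C"),
                    ("CAND_C", "CAND_D")])

seeds = {"SEED1", "SEED2"}  # known disease genes used as priors
priors = {g: (1.0 if g in seeds else 0.0) for g in ppi.nodes}

# alpha = 0.7 means the walk restarts at the seed set with probability 0.3.
scores = nx.pagerank(ppi, alpha=0.7, personalization=priors)
candidates = sorted((g for g in ppi.nodes if g not in seeds),
                    key=scores.get, reverse=True)
print(candidates)  # candidate genes ranked by proximity to the seed set
```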
Conclusion
Even though network-based methods are generally not as effective as integrated functional annotation-based methods for disease candidate gene prioritization, in a one-to-one comparison, PPIN-based candidate gene prioritization performs better than all other gene features or annotations. Additionally, we demonstrate that methods used for studying both social and Web networks can be successfully used for disease candidate gene prioritization.
doi:10.1186/1471-2105-10-73
PMCID: PMC2657789  PMID: 19245720
3.  Exploiting Protein-Protein Interaction Networks for Genome-Wide Disease-Gene Prioritization 
PLoS ONE  2012;7(9):e43557.
Complex genetic disorders often involve the products of multiple genes acting cooperatively. Hence, the pathophenotype is the outcome of perturbations in the underlying pathways, where gene products cooperate through various mechanisms such as protein-protein interactions. Pinpointing the decisive elements of such disease pathways is still challenging. Over recent years, computational approaches exploiting interaction network topology have been successfully applied to prioritize individual genes involved in diseases. Although linkage intervals provide a list of disease-gene candidates, recent genome-wide studies demonstrate that genes not associated with any known linkage interval may also contribute to the disease phenotype. Network-based prioritization methods help highlight such associations. Still, there is a need for robust methods that capture the interplay among disease-associated genes mediated by the topology of the network. Here, we propose a genome-wide network-based prioritization framework named GUILD. This framework implements four network-based disease-gene prioritization algorithms. We analyze the performance of these algorithms in dozens of disease phenotypes. The algorithms in GUILD are compared to state-of-the-art network-topology-based algorithms for the prioritization of genes. As a proof of principle, we investigate top-ranking genes in Alzheimer's disease (AD), diabetes and AIDS using disease-gene associations from various sources. We show that GUILD is able to significantly highlight disease-gene associations that are not used a priori. Our findings suggest that GUILD helps to identify genes implicated in the pathology of human disorders independent of the loci associated with the disorders.
doi:10.1371/journal.pone.0043557
PMCID: PMC3448640  PMID: 23028459
4.  Deducing corticotropin-releasing hormone receptor type 1 signaling networks from gene expression data by usage of genetic algorithms and graphical Gaussian models 
BMC Systems Biology  2010;4:159.
Background
Dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis is a hallmark of complex and multifactorial psychiatric diseases such as anxiety and mood disorders. About 50-60% of patients with major depression show HPA axis dysfunction, i.e. hyperactivity and impaired negative feedback regulation. The neuropeptide corticotropin-releasing hormone (CRH) and its receptor type 1 (CRHR1) are key regulators of this neuroendocrine stress axis. Therefore, we analyzed CRH/CRHR1-dependent gene expression data obtained from the pituitary corticotrope cell line AtT-20, a well-established in vitro model for CRHR1-mediated signal transduction. To extract significantly regulated genes from a genome-wide microarray data set and to deduce underlying CRHR1-dependent signaling networks, we combined supervised and unsupervised algorithms.
Results
We present an efficient variable selection strategy by consecutively applying univariate as well as multivariate methods followed by graphical models. First, feature preselection was used to exclude genes not differentially regulated over time from the dataset. For multivariate variable selection, a maximum likelihood (MLHD) discriminant function within GALGO, an R package based on a genetic algorithm (GA), was chosen. The topmost genes representing major nodes in the expression network were ranked to find highly separating candidate genes. By using groups of five genes (chromosome size) in the discriminant function and repeating the genetic algorithm separately four times, we found eleven genes occurring in at least three of the top-ranked result lists of the four repetitions. In addition, we compared the results of GA/MLHD with the alternative optimization algorithms greedy selection and simulated annealing, as well as with the state-of-the-art method random forest. In every case we obtained a clear overlap of the selected genes, independently confirming the results of MLHD in combination with a genetic algorithm.
With two unsupervised algorithms, principal component analysis and graphical Gaussian models, putative interactions of the candidate genes were determined and reconstructed by literature mining. Differential regulation of six candidate genes was validated by qRT-PCR.
Conclusions
The combination of supervised and unsupervised algorithms in this study allowed a small subset of meaningful candidate genes to be extracted from the genome-wide expression data set. Notably, variable selection using different optimization algorithms based on linear classifiers, as well as the nonlinear random forest method, resulted in congruent candidate genes. The calculated interaction network connecting these new target genes was bioinformatically mapped to known CRHR1-dependent signaling pathways. Additionally, the differential expression of the identified target genes was confirmed experimentally.
doi:10.1186/1752-0509-4-159
PMCID: PMC3002901  PMID: 21092110
5.  SVM classifier to predict genes important for self-renewal and pluripotency of mouse embryonic stem cells 
BMC Systems Biology  2010;4:173.
Background
Mouse embryonic stem cells (mESCs) are derived from the inner cell mass of a developing blastocyst and can be cultured indefinitely in vitro. Their distinct features are their ability to self-renew and to differentiate into all adult cell types. Genes that maintain mESC self-renewal and pluripotency identity are of interest to stem cell biologists. Although significant steps have been made toward the identification and characterization of such genes, the list is still incomplete and controversial. For example, the overlap among candidate self-renewal and pluripotency genes across different RNAi screens is surprisingly small. Meanwhile, machine learning approaches have been used to analyze multi-dimensional experimental data and integrate results from many studies, yet they have not been applied specifically to the task of predicting and classifying self-renewal and pluripotency gene membership.
Results
For this study we developed a classifier, a supervised machine learning framework for predicting self-renewal and pluripotency mESC stemness membership genes (MSMG) using support vector machines (SVM). The data used to train the classifier were derived from mESC-related studies using mRNA microarrays, measuring gene expression at various stages of early differentiation, as well as ChIP-seq studies in mESCs profiling the genome-wide binding of key transcription factors, such as Nanog, Oct4, and Sox2, to the regulatory regions of other genes. Comparison to other classification methods using leave-one-out cross-validation was employed to evaluate the accuracy and generality of the classification. Finally, two sets of candidate genes from genome-wide RNA interference screens were used to test the generality and potential application of the classifier.
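A minimal sketch of this kind of setup, with synthetic per-gene features (expression values plus transcription-factor-binding indicators) and synthetic labels, evaluated by leave-one-out cross-validation as described above:

```python
# Hedged sketch of an SVM classifier with leave-one-out cross-validation.
# Feature values and labels below are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))     # 40 genes x 6 features (expression + ChIP-seq)
y = rng.integers(0, 2, size=40)  # 1 = self-renewal/pluripotency gene, 0 = not

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print(f"leave-one-out accuracy: {acc:.2f}")
```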
Conclusions
Our results reveal that an SVM approach can be useful for prioritizing genes for functional validation experiments and complement the analyses of high-throughput profiling experimental data in stem cell research.
doi:10.1186/1752-0509-4-173
PMCID: PMC3019180  PMID: 21176149
6.  Functional Genomics Complements Quantitative Genetics in Identifying Disease-Gene Associations 
PLoS Computational Biology  2010;6(11):e1000991.
An ultimate goal of genetic research is to understand the connection between genotype and phenotype in order to improve the diagnosis and treatment of diseases. The quantitative genetics field has developed a suite of statistical methods to associate genetic loci with diseases and phenotypes, including quantitative trait loci (QTL) linkage mapping and genome-wide association studies (GWAS). However, each of these approaches has technical and biological shortcomings. For example, the amount of heritable variation explained by GWAS is often surprisingly small, and the resolution of many QTL linkage mapping studies is poor. The predictive power and interpretation of QTL and GWAS results are consequently limited. In this study, we propose a complementary approach to quantitative genetics by interrogating the vast amount of high-throughput genomic data in model organisms to functionally associate genes with phenotypes and diseases. Our algorithm combines the genome-wide functional relationship network for the laboratory mouse and a state-of-the-art machine learning method. We demonstrate the superior accuracy of this algorithm through predicting genes associated with each of 1157 diverse phenotype ontology terms. Comparison between our prediction results and a meta-analysis of quantitative genetic studies reveals both overlapping candidates and distinct, accurate predictions uniquely identified by our approach. Focusing on bone mineral density (BMD), a phenotype related to osteoporotic fracture, we experimentally validated two of our novel predictions (not observed in any previous GWAS/QTL studies) and found significant bone density defects in both Timp2- and Abcg8-deficient mice. Our results suggest that the integration of functional genomics data into networks, which itself is informative of protein function and interactions, can successfully be utilized as a complementary approach to quantitative genetics to predict disease risks. All supplementary material is available at http://cbfg.jax.org/phenotype.
Author Summary
Many recent efforts to understand the genetic origins of complex diseases utilize statistical approaches to analyze phenotypic traits measured in genetically well-characterized populations. While these quantitative genetics methods are powerful, their success is limited by sampling biases and other confounding factors, and the biological interpretation of results can be challenging since these methods are not based on any functional information for candidate loci. On the other hand, the functional genomics field has greatly expanded in past years, both in terms of experimental approaches and analytical algorithms. However, functional approaches have been applied to understanding phenotypes in only the most basic ways. In this study, we demonstrate that functional genomics can complement traditional quantitative genetics by analytically extracting protein function information from large collections of high throughput data, which can then be used to predict genotype-phenotype associations. We applied our prediction methodology to the laboratory mouse, and we experimentally confirmed a role in osteoporosis for two of our predictions that were not candidates from any previous quantitative genetics study. The ability of our approach to produce accurate and unique predictions implies that functional genomics can complement quantitative genetics and can help address previous limitations in identifying disease genes.
doi:10.1371/journal.pcbi.1000991
PMCID: PMC2978695  PMID: 21085640
7.  Combining Genome-Wide Association Mapping and Transcriptional Networks to Identify Novel Genes Controlling Glucosinolates in Arabidopsis thaliana 
PLoS Biology  2011;9(8):e1001125.
Genome-wide association mapping is highly sensitive to environmental changes, but network analysis allows rapid causal gene identification.
Background
Genome-wide association (GWA) is gaining popularity as a means to study the architecture of complex quantitative traits, partially due to the improvement of high-throughput, low-cost genotyping and phenotyping technologies. Glucosinolate (GSL) secondary metabolites within Arabidopsis spp. can serve as a model system to understand the genomic architecture of adaptive quantitative traits. GSL are key anti-herbivory defenses that impart adaptive advantages within field trials. While little is known about how variation in the external or internal environment of an organism may influence the efficiency of GWA, GSL variation is known to be highly dependent upon the external stresses and developmental processes of the plant, making it an excellent model for studying conditional GWA.
Methodology/Principal Findings
To understand how development and environment can influence GWA, we conducted a study using 96 Arabidopsis thaliana accessions, >40 GSL phenotypes across three conditions (one developmental comparison and one environmental comparison) and ∼230,000 SNPs. Developmental stage had dramatic effects on the outcome of GWA, with each stage identifying different loci associated with GSL traits. Further, while the molecular bases of numerous quantitative trait loci (QTL) controlling GSL traits have been identified, there is currently no estimate of how many additional genes may control natural variation in these traits. We developed a novel co-expression network approach to prioritize the thousands of GWA candidates and successfully validated a large number of these genes as influencing GSL accumulation within A. thaliana using single gene isogenic lines.
Conclusions/Significance
Together, these results suggest that complex traits imparting environmentally contingent adaptive advantages are likely influenced by up to thousands of loci that are sensitive to fluctuations in the environment or developmental state of the organism. Additionally, while GWA is highly conditional upon genetics, the use of additional genomic information can rapidly identify causal loci en masse.
Author Summary
Understanding how genetic variation can control phenotypic variation is a fundamental goal of modern biology. A major push has been made using genome-wide association mapping in all organisms to attempt to rapidly identify the genes contributing to phenotypes such as disease and nutritional disorders. But a number of fundamental questions have not been answered about the use of genome-wide association: for example, how does the internal or external environment influence the genes found? Furthermore, the simple question of how many genes may influence a trait remains unanswered. Finally, a number of studies have identified significant false-positive and -negative issues within genome-wide association studies that are not solvable by direct statistical approaches. We have used genome-wide association mapping in the plant Arabidopsis thaliana to begin exploring these questions. We show that both external and internal environments significantly alter the identified genes, such that using different tissues can lead to the identification of nearly completely different gene sets. Given the large number of potential false positives, we developed an orthogonal approach to filtering the possible genes, by identifying co-functioning networks using the nominal candidate gene list derived from genome-wide association studies. This allowed us to rapidly identify and validate a large number of novel and unexpected genes that affect Arabidopsis thaliana defense metabolism within phenotypic ranges that have been shown to be selectable within the field. These genes and the associated networks suggest that Arabidopsis thaliana defense metabolism more closely resembles the infinite gene hypothesis, according to which there is a vast number of causative genes controlling natural variation in this phenotype. It remains to be seen how frequently this is true for other organisms and other phenotypes.
doi:10.1371/journal.pbio.1001125
PMCID: PMC3156686  PMID: 21857804
8.  Factors affecting reproducibility between genome-scale siRNA-based screens 
Journal of biomolecular screening  2010;15(7):735-747.
RNA interference-based screening is a powerful new genomic technology that addresses gene function en masse. To evaluate factors influencing hit list composition and reproducibility, we performed two identically designed small interfering RNA (siRNA)-based, whole-genome screens for host factors supporting yellow fever virus infection. These screens represent two separate experiments completed five months apart and allow the direct assessment of the reproducibility of a given siRNA technology when performed in the same environment. Candidate hit lists generated by sum rank, median absolute deviation, z-score, and strictly standardized mean difference were compared within and between whole-genome screens. Application of these analysis methodologies within a single screening dataset using a fixed threshold equivalent to a p-value ≤ 0.001 resulted in hit lists ranging from 82 to 1,140 members and highlighted the tremendous impact the analysis methodology has on hit list composition. Intra- and inter-screen reproducibility was significantly influenced by the analysis methodology and ranged from 32% to 99%. This study also highlighted the power of testing at least two independent siRNAs for each gene product in primary screens. To facilitate validation, we conclude by suggesting methods to reduce false discovery at the primary screening stage.
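To make the comparison concrete, the toy sketch below applies two of the hit-calling statistics mentioned above (a plate-wise z-score and a MAD-based robust z-score, plus one simplified SSMD variant) to invented per-gene scores; the thresholds are arbitrary and the paper's exact pipelines differ.

```python
# Illustrative hit-calling on a toy siRNA screen with invented scores.
import numpy as np

values = np.array([0.1, -0.2, 0.05, -3.1, 0.3, -2.8, 0.0, -0.4])  # per-gene scores

z = (values - values.mean()) / values.std(ddof=1)
mad = np.median(np.abs(values - np.median(values)))
robust_z = (values - np.median(values)) / (1.4826 * mad)
# One simple SSMD-style statistic of each well against the plate distribution.
ssmd = (values - values.mean()) / (np.sqrt(2) * values.std(ddof=1))

hits_z = np.where(z < -2)[0]
hits_robust = np.where(robust_z < -2)[0]
print(hits_z, hits_robust)  # the two hit lists need not agree
```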
In this study we present the first comprehensive comparison of multiple analysis strategies, and demonstrate the impact of the analysis methodology on the composition of the “hit list”. Therefore, we propose that the entire dataset derived from functional genome-scale screens, especially if publicly funded, should be made available as is done with data derived from gene expression and genome-wide association studies.
doi:10.1177/1087057110374994
PMCID: PMC3149892  PMID: 20625183
RNA interference; analysis; RNAi screen analysis; siRNA; RNAi; siRNA screening; sum rank; median absolute deviation; strictly standardized mean difference; genome-wide; whole-genome; comparison; overlap; hit list
9.  RankAggreg, an R package for weighted rank aggregation 
BMC Bioinformatics  2009;10:62.
Background
Researchers in the field of bioinformatics often face a challenge of combining several ordered lists in a proper and efficient manner. Rank aggregation techniques offer a general and flexible framework that allows one to objectively perform the necessary aggregation. With the rapid growth of high-throughput genomic and proteomic studies, the potential utility of rank aggregation in the context of meta-analysis becomes even more apparent. One of the major strengths of rank-based aggregation is the ability to combine lists coming from different sources and platforms, for example different microarray chips, which may or may not be directly comparable otherwise.
Results
The RankAggreg package provides two methods for combining the ordered lists: the Cross-Entropy method and the Genetic Algorithm. Two examples of rank aggregation using the package are given in the manuscript: one in the context of clustering based on gene expression, and the other one in the context of meta-analysis of prostate cancer microarray experiments.
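The package itself uses stochastic optimization (Cross-Entropy and a Genetic Algorithm) over a weighted distance; as a toy stand-in for that search, the sketch below brute-forces the ordering that minimizes the summed Spearman footrule distance to three tiny invented input lists.

```python
# Toy rank aggregation: exhaustive search for the consensus ordering that
# minimizes the total Spearman footrule distance. Gene names are invented;
# this brute force only works for very short lists.
from itertools import permutations

lists = [["geneA", "geneB", "geneC", "geneD"],
         ["geneB", "geneA", "geneD", "geneC"],
         ["geneA", "geneC", "geneB", "geneD"]]

def footrule(candidate, ranked_list):
    pos = {g: i for i, g in enumerate(ranked_list)}
    return sum(abs(i - pos[g]) for i, g in enumerate(candidate))

items = lists[0]
best = min(permutations(items),
           key=lambda cand: sum(footrule(cand, lst) for lst in lists))
print(best)  # consensus ordering closest to all input lists
```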
Conclusion
The two examples described in the manuscript clearly show the utility of the RankAggreg package in the current bioinformatics context where ordered lists are routinely produced as a result of modern high-throughput technologies.
doi:10.1186/1471-2105-10-62
PMCID: PMC2669484  PMID: 19228411
10.  Prioritizing Genomic Drug Targets in Pathogens: Application to Mycobacterium tuberculosis 
PLoS Computational Biology  2006;2(6):e61.
We have developed a software program that weights and integrates specific properties on the genes in a pathogen so that they may be ranked as drug targets. We applied this software to produce three prioritized drug target lists for Mycobacterium tuberculosis, the causative agent of tuberculosis, a disease for which a new drug is desperately needed. Each list is based on an individual criterion. The first list prioritizes metabolic drug targets by the uniqueness of their roles in the M. tuberculosis metabolome (“metabolic chokepoints”) and their similarity to known “druggable” protein classes (i.e., classes whose activity has previously been shown to be modulated by binding a small molecule). The second list prioritizes targets that would specifically impair M. tuberculosis, by weighting heavily those that are closely conserved within the Actinobacteria class but lack close homology to the host and gut flora. M. tuberculosis can survive asymptomatically in its host for many years by adapting to a dormant state referred to as “persistence.” The final list aims to prioritize potential targets involved in maintaining persistence in M. tuberculosis. The rankings of current, candidate, and proposed drug targets are highlighted with respect to these lists. Some features were found to be more accurate than others in prioritizing studied targets. It can also be shown that targets can be prioritized by using evolutionary programming to optimize the weights of each desired property. We demonstrate this approach in prioritizing persistence targets.
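A minimal sketch of the weighted-sum idea, with entirely hypothetical gene names, properties and weights (the paper's actual feature set and weighting scheme differ):

```python
# Weighted-sum target prioritization sketch; all names and weights are invented.
properties = {
    "geneA": {"chokepoint": 1.0, "druggable": 0.8, "host_homology": 0.1},
    "geneB": {"chokepoint": 0.2, "druggable": 0.9, "host_homology": 0.7},
    "geneC": {"chokepoint": 0.9, "druggable": 0.3, "host_homology": 0.0},
}
# Positive weights favor a property; the negative weight penalizes host homology.
weights = {"chokepoint": 1.0, "druggable": 0.5, "host_homology": -1.5}

def score(gene):
    feats = properties[gene]
    return sum(weights[k] * feats[k] for k in weights)

ranking = sorted(properties, key=score, reverse=True)
print(ranking)  # genes ordered from best to worst candidate target
```

The weights in such a scaffold could, in principle, be tuned automatically, for example by the evolutionary programming the abstract describes, rather than set by hand.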
Synopsis
The search for drugs to prevent or treat infections remains an urgent focus in infectious disease research. A new software program has been developed by the authors of this article that can be used to rank genes as potential drug targets in pathogens. Traditional prioritization approaches to drug target identification, such as searching the literature and trying to mentally integrate varied criteria, can quickly become overwhelming for the drug discovery researcher. Alternatively, one can computationally integrate different criteria to create a ranking function that can help to identify targets. The authors demonstrate the applicability of this approach on the genome of Mycobacterium tuberculosis, the organism that causes tuberculosis (TB), a disease for which new drug treatments are especially needed because of emerging drug-resistant strains. The experiences gained from this work will be useful for both wet-lab and informatics scientists working in infectious disease research. First, it demonstrates that ample public data already exist on the M. tuberculosis genome that can be tuned effectively for prioritizing drug targets. Second, the output from numerous freely available bioinformatics tools can be harnessed to achieve these goals. Third, the methodology can easily be extended to other pathogens of interest. Currently studied TB targets are also highlighted in terms of the authors' ranking system, which should be useful for researchers focusing on TB drug discovery.
doi:10.1371/journal.pcbi.0020061
PMCID: PMC1475714  PMID: 16789813
11.  A gene signature based method for identifying subtypes and subtype-specific drivers in cancer with an application to medulloblastoma 
BMC Bioinformatics  2013;14(Suppl 18):S1.
Background
Subtypes are widely found in cancer. They are characterized by different behaviors in clinical and molecular profiles, such as survival rates, gene signatures and copy number aberrations (CNAs). While cancer is generally believed to be caused by genetic aberrations, the number of such events is tremendous in the cancer tissue and only a small subset of them may be tumorigenic. On the other hand, the gene expression signature of a subtype represents residuals of the subtype-specific cancer mechanisms. Using high-throughput data to link these factors in order to define subtype boundaries and identify subtype-specific drivers is a promising yet largely unexplored topic.
Results
We report a systematic method to automate the identification of cancer subtypes and candidate drivers. Specifically, we propose an iterative algorithm that alternates between gene expression clustering and gene signature selection. We applied the method to datasets of the pediatric cerebellar tumor medulloblastoma (MB). The subtyping algorithm consistently converges on multiple datasets of medulloblastoma, and the converged signatures and copy number landscapes are also found to be highly reproducible across the datasets. Based on the identified subtypes, we developed a PCA-based approach for subtype-specific identification of cancer drivers. The top-ranked driver candidates are found to be enriched with known pathways in certain subtypes of MB. This may provide new insights into these subtypes.
This article is an extended abstract of our ICCABS '12 paper (Chen et al. 2012), with revised methods in iterative subtyping, the use of canonical correlation analysis for driver identification, and an extra dataset (Northcott90) for cross-validation. Discussions of the algorithm's performance and of the slightly different gene lists identified are also added.
Conclusions
Our study indicates that subtype-signature defines the subtype boundaries, characterizes the subtype-specific processes and can be used to prioritize signature-related drivers.
doi:10.1186/1471-2105-14-S18-S1
PMCID: PMC3820164  PMID: 24564171
subtypes of cancer; medulloblastoma; gene signature; copy number aberrations; microarrays; driver genes
12.  Using a large-scale knowledge database on reactions and regulations to propose key upstream regulators of various sets of molecules participating in cell metabolism 
BMC Systems Biology  2014;8:32.
Background
Most of the existing methods to analyze high-throughput data are based on gene ontology principles, providing information on the main functions and biological processes. However, these methods do not indicate the regulations behind the biological pathways. A critical point in this context is the extraction of information from the many possible relationships between the regulated genes, and its combination with biochemical regulations. This study aimed to develop an automatic method for proposing a reasonable number of upstream regulatory candidates from lists of various regulated molecules by confronting experimental data with encyclopedic information.
Results
A new formalism of regulated reactions combining biochemical transformations and regulatory effects was proposed to unify the different mechanisms contained in knowledge libraries. Based on the related causality graph, an algorithm was developed to propose a reasonable set of upstream regulators from lists of target molecules. Scores were added to candidates according to their ability to explain the greatest number of targets or only a few specific ones. By testing 250 lists of target genes as inputs, each with a known solution, the success of the method in providing the expected transcription factor among 50 or 100 proposed regulatory candidates was evaluated at 62.6% and 72.5% of cases, respectively. An additional prioritization among candidates might be further realized by adding functional ontology information. The benefit of this strategy was demonstrated by identifying PPAR isotypes and their partners as the upstream regulators of a list of experimentally identified targets of PPARA, a pivotal transcription factor in lipid oxidation. The proposed candidates participated in various biological functions that further enriched the original information. The efficiency of the method in merging reactions and regulations was also illustrated by identifying gene candidates participating in glucose homeostasis from an input list of metabolites involved in cell glycolysis.
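A hedged sketch of the basic mechanism (the paper's scoring is richer than this): given a directed causality graph, each candidate regulator is scored by how many molecules of the input target list it can reach. The graph, regulator and target names below are invented.

```python
# Score candidate upstream regulators by coverage of an input target list
# on a toy causality graph (edges point from regulator to regulated molecule).
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneB"),
                  ("TF2", "geneC"), ("geneC", "metabolite1")])

targets = {"geneA", "geneB", "metabolite1"}  # experimentally regulated molecules

def coverage(regulator):
    reachable = nx.descendants(g, regulator)
    return len(reachable & targets)

candidates = sorted((n for n in g.nodes if n.startswith("TF")),
                    key=coverage, reverse=True)
print([(c, coverage(c)) for c in candidates])
```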
Conclusion
This method proposes a reasonable number of regulatory candidates for lists of input molecules that may include transcripts of genes and metabolites. The proposed upstream regulators are the transcription factors themselves and protein complexes, so that a multi-level description of how cell metabolism is regulated is obtained.
doi:10.1186/1752-0509-8-32
PMCID: PMC4004165  PMID: 24635915
Biochemical reactions; Causalities; Gene expression; Knowledge integration; Protein partners; Upstream regulators
13.  Whole genome identification of Mycobacterium tuberculosis vaccine candidates by comprehensive data mining and bioinformatic analyses 
BMC Medical Genomics  2008;1:18.
Background
Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), infects ~8 million people annually, culminating in ~2 million deaths. Moreover, about one third of the population is latently infected, 10% of whom develop disease during their lifetime. Currently approved prophylactic TB vaccines (BCG and derivatives thereof) are of variable efficiency in adult protection against pulmonary TB (0%–80%), and are directed essentially against early-phase infection.
Methods
A genome-scale dataset was constructed by analyzing published data of: (1) global gene expression studies under conditions which simulate intra-macrophage stress, dormancy, persistence and/or reactivation; (2) cellular and humoral immunity, and vaccine potential. This information was compiled along with revised annotation/bioinformatic characterization of selected gene products and in silico mapping of T-cell epitopes. Protocols for scoring, ranking and prioritization of the antigens were developed and applied.
Results
Cross-matching of literature and in silico-derived data, in conjunction with the prioritization scheme and biological rationale, allowed for the selection of 189 putative vaccine candidates from the entire genome. Within the 189 set, the relative distribution of antigens in 3 functional categories differs significantly from their distribution in the whole genome, with a reduction in the Conserved hypothetical category (due to improved annotation) and enrichment in the Lipid and Virulence categories. Other prominent representatives in the 189 set are the PE/PPE proteins; iron sequestration, nitroreductases and proteases, all within the Intermediary metabolism and respiration category; and ESX secretion systems, resuscitation promoting factors and lipoproteins, all within the Cell wall category. Application of a ranking scheme based on qualitative and quantitative scores resulted in a list of 45 best-scoring antigens, of which: 74% belong to the dormancy/reactivation/resuscitation classes; 30% belong to the Cell wall category; 13% are classical vaccine candidates; 9% are categorized as Conserved hypotheticals, all potentially very potent T-cell antigens.
Conclusion
The comprehensive literature and in silico-based analyses allowed for the selection of a repertoire of 189 vaccine candidates, out of the whole-genome 3989 ORF products. This repertoire, which was ranked to generate a list of 45 top-hits antigens, is a platform for selection of genes covering all stages of M. tuberculosis infection, to be incorporated in rBCG or subunit-based vaccines.
doi:10.1186/1755-8794-1-18
PMCID: PMC2442614  PMID: 18505592
14.  mirTarPri: Improved Prioritization of MicroRNA Targets through Incorporation of Functional Genomics Data 
PLoS ONE  2013;8(1):e53685.
MicroRNAs (miRNAs) are a class of small (19–25 nt) non-coding RNAs. This important class of gene regulator downregulates gene expression through sequence-specific binding to the 3′ untranslated regions (3′UTRs) of target mRNAs. Several computational approaches have been developed for predicting miRNA targets. However, the predicted target lists often have high false positive rates. To construct a workable target list for subsequent experimental studies, we need novel approaches to properly rank the candidate targets produced by traditional methods. We performed a systematic analysis of experimentally validated miRNA targets using functional genomics data, and found significant functional associations between genes that were targeted by the same miRNA. Based on this finding, we developed a miRNA target prioritization method named mirTarPri to rank the predicted target lists from commonly used target prediction methods. Leave-one-out cross-validation proved successful in identifying known targets, achieving an AUC score of up to 0.84. Validation on high-throughput data showed that mirTarPri is an unbiased method. Applying mirTarPri to prioritize the results of six commonly used target prediction methods allowed us to find more positive targets at the top of the prioritized candidate list. In comparison with other methods, mirTarPri had an outstanding performance on gold standard and CLIP data. mirTarPri is a valuable method for improving the efficacy of current miRNA target prediction methods. We have also developed a web-based server implementing the mirTarPri method, which is freely accessible at http://bioinfo.hrbmu.edu.cn/mirTarPri.
doi:10.1371/journal.pone.0053685
PMCID: PMC3541237  PMID: 23326485
15.  Computational selection and prioritization of candidate genes for Fetal Alcohol Syndrome 
BMC Genomics  2007;8:389.
Background
Fetal alcohol syndrome (FAS) is a serious global health problem and is observed at high frequencies in certain South African communities. Although in utero alcohol exposure is the primary trigger, there is evidence for genetic and other susceptibility factors in FAS development. No genome-wide association or linkage studies have been performed for FAS, making computational selection and prioritization of candidate disease genes an attractive approach.
Results
A total of 10,174 candidate genes were initially selected from the whole genome using a previously described method, which selects candidate genes according to their expression in disease-affected tissues. Candidates were then prioritized for experimental investigation by assessing criteria pertinent to FAS and applying binary filtering. Twenty-nine criteria were assessed by mining various database sources to populate criteria-specific gene lists. Candidate genes were then prioritized for experimental investigation using a binary system that assessed the criteria gene lists against the candidate list, and candidate genes were scored accordingly. A group of 87 genes was prioritized as candidates for future experimental validation. The validity of the binary prioritization method was assessed by investigating the protein-protein interactions, functional enrichment and common promoter element binding sites of the top-ranked genes.
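In essence, each candidate earns one point per criterion list it appears in; a minimal sketch with placeholder criterion names and gene identifiers follows (the study's actual 29 criteria are not reproduced).

```python
# Binary-filter prioritization sketch: score candidates by how many
# criterion lists contain them. Criterion and gene names are placeholders.
criteria = {
    "expressed_in_fetal_brain": {"GENE1", "GENE2", "GENE5"},
    "TGF_beta_pathway":         {"GENE2", "GENE3"},
    "ethanol_responsive":       {"GENE2", "GENE5", "GENE6"},
}
candidates = ["GENE1", "GENE2", "GENE3", "GENE4", "GENE5"]

scores = {g: sum(g in members for members in criteria.values()) for g in candidates}
prioritized = sorted(candidates, key=scores.get, reverse=True)
print([(g, scores[g]) for g in prioritized])  # GENE2 scores highest here
```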
Conclusion
This analysis highlighted a list of strong candidate genes from the TGF-β, MAPK and Hedgehog signalling pathways, which are all integral to fetal development and potential targets for alcohol's teratogenic effect. We conclude that this novel bioinformatics approach effectively prioritizes credible candidate genes for further experimental analysis.
doi:10.1186/1471-2164-8-389
PMCID: PMC2194724  PMID: 17961254
16.  Enhancing the Prioritization of Disease-Causing Genes through Tissue Specific Protein Interaction Networks 
PLoS Computational Biology  2012;8(9):e1002690.
The prioritization of candidate disease-causing genes is a fundamental challenge in the post-genomic era. Current state-of-the-art methods exploit a protein-protein interaction (PPI) network for this task. They are based on the observation that genes causing phenotypically similar diseases tend to lie close to one another in a PPI network. However, to date, these methods have used a static picture of human PPIs, while diseases impact specific tissues in which the PPI networks may be dramatically different. Here, for the first time, we perform a large-scale assessment of the contribution of tissue-specific information to gene prioritization. By integrating tissue-specific gene expression data with PPI information, we construct tissue-specific PPI networks for 60 tissues and investigate their prioritization power. We find that tissue-specific PPI networks considerably improve the prioritization results compared to those obtained using a generic PPI network. Furthermore, they allow the prediction of novel disease-tissue associations, pointing to sub-clinical tissue effects that may escape early detection.
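The construction step itself can be as simple as taking the subgraph induced by the genes expressed in a tissue; a minimal sketch with an invented network and expression call follows (the study's expression thresholds and edge handling are more involved).

```python
# Build a toy tissue-specific PPI network by keeping only proteins whose
# genes are expressed in the tissue of interest. All data are invented.
import networkx as nx

generic_ppi = nx.Graph()
generic_ppi.add_edges_from([("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P4")])

expressed_in_heart = {"P1", "P2", "P4"}            # tissue-specific expression call
heart_ppi = generic_ppi.subgraph(expressed_in_heart).copy()

print(sorted(heart_ppi.edges))  # edges surviving the tissue filter
# Any network-based prioritization (e.g., the personalized PageRank sketched
# earlier in this listing) can then be run on heart_ppi instead of the
# generic network.
```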
Author Summary
Identifying the genes causing genetic disease is a key challenge in human health, and a crucial step on the road to developing novel diagnostics and treatments. Modern discovery methods involve genome-wide association studies that reveal regions of the genome where the causal gene is likely to reside, followed by prioritizing the candidate genes within these regions and experimentally examining the most promising candidates' potential influence on the disease. Many computational methods have been developed to automatically prioritize candidate genes. Some of the most successful methods use a biological network of interacting genes or proteins as an input. However, these networks – and consequently, these methods – do not take into account the differences between tissues. In other words, a heart disease is analyzed using the same network as a skin disease. We constructed tissue-specific protein interaction networks and explored their effect on an existing prioritization algorithm by comparing the algorithm's performance on the tissue-specific networks and the generic network. We find that integrating tissue-specific data indeed leads to better prioritization. We also used the prioritization results of different tissues to suggest new disease-tissue associations.
doi:10.1371/journal.pcbi.1002690
PMCID: PMC3459874  PMID: 23028288
17.  Investigating the concordance of Gene Ontology terms reveals the intra- and inter-platform reproducibility of enrichment analysis 
BMC Bioinformatics  2013;14:143.
Background
Reliability and reproducibility of differentially expressed genes (DEGs) are essential for the biological interpretation of microarray data. The MicroArray Quality Control (MAQC) project launched by the US Food and Drug Administration (FDA) showed that the lists of DEGs generated by intra- and inter-platform comparisons can reach a high level of concordance, depending mainly on the statistical criteria used for ranking and selecting DEGs. Generally, combining fold-change ranking with a non-stringent p-value cutoff produces reproducible lists of DEGs. For further interpretation of gene expression data, statistical methods of gene enrichment analysis provide powerful tools for associating the DEGs with prior biological knowledge, e.g. Gene Ontology (GO) terms and pathways, and are widely used in genome-wide research. Although the DEG lists generated from the same compared conditions proved to be reliable, reproducible enrichment results are still crucial to the discovery of the underlying molecular mechanism differentiating the two conditions. Therefore, it is important to know whether the enrichment results remain reproducible when using lists of DEGs generated by different statistical criteria from inter-laboratory and cross-platform comparisons. In our study, we used the MAQC data sets to systematically assess the intra- and inter-platform concordance of GO terms enriched by Gene Set Enrichment Analysis (GSEA) and LRpath.
Results
In intra-platform comparisons, the overlap of enriched GO terms was as high as ~80% when the input lists of DEGs were generated by fold-change ranking and Significance Analysis of Microarrays (SAM), whereas the percentages decreased by about 20% when the lists of DEGs were generated using fold-change ranking and the t-test, or using SAM and the t-test. Similar results were found in inter-platform comparisons.
Conclusions
Our results demonstrated that the lists of DEGs in a high level of concordance can ensure the high concordance of enrichment results. Importantly, based on the lists of DEGs generated by a straightforward method of combining fold change ranking with a non-stringent p-value cutoff, enrichment analysis will produce reproducible enriched GO terms for the biological interpretation.
doi:10.1186/1471-2105-14-143
PMCID: PMC3644270  PMID: 23627640
DNA microarray; Intra-/inter-platform comparison; Gene Ontology enrichment; Microarray quality control (MAQC)
18.  Genomic convergence and network analysis approach to identify candidate genes in Alzheimer's disease 
BMC Genomics  2014;15:199.
Background
Alzheimer's disease (AD) is a leading genetically complex and heterogeneous disorder that is influenced by both genetic and environmental factors. The underlying risk factors remain largely unclear for this heterogeneous disorder. In recent years, high-throughput methodologies, such as genome-wide linkage analysis (GWL), genome-wide association (GWA) studies, and genome-wide expression profiling (GWE), have led to the identification of several candidate genes associated with AD. However, due to the lack of consistency among their findings, an integrative approach is warranted. Here, we have designed a rank-based gene prioritization approach involving convergent analysis of multi-dimensional data and protein-protein interaction (PPI) network modelling.
Results
Our approach integrates three different AD datasets (GWL, GWA and GWE) to identify overlapping candidate genes, ranked using a novel cumulative rank score (SR) based method and then prioritized using clusters derived from the PPI network. The SR for each gene is calculated by adding the ranks assigned to that gene, based on either p-value or score, in the three datasets. This analysis yielded 108 plausible AD genes. Network modelling by creating a PPI network from the proteins encoded by these genes and their direct interactors resulted in a layered network of 640 proteins. Clustering of these proteins further helped us identify 6 significant clusters, with 7 proteins (EGFR, ACTB, CDC2, IRAK1, APOE, ABCA1 and AMPH) forming the central hub nodes. Functional annotation of the 108 genes revealed their roles in several biological activities such as neurogenesis, regulation of MAP kinase activity, response to calcium ion and endocytosis, paralleling AD-specific attributes. Finally, 3 potential biochemical biomarkers were found from the overlap of the 108 AD proteins with proteins from the CSF and plasma proteomes. EGFR and ACTB were found to be the two most significant AD risk genes.
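The cumulative rank score reduces to summing a gene's per-dataset ranks; a minimal sketch with invented p-values follows (smaller SR means stronger combined evidence).

```python
# Cumulative rank score (SR) sketch across three datasets; p-values are invented.
p_values = {
    "GWL": {"EGFR": 0.001, "ACTB": 0.02,  "APOE": 0.005},
    "GWA": {"EGFR": 0.01,  "ACTB": 0.001, "APOE": 0.03},
    "GWE": {"EGFR": 0.002, "ACTB": 0.004, "APOE": 0.001},
}

def ranks(scores):
    ordered = sorted(scores, key=scores.get)           # most significant first
    return {gene: i + 1 for i, gene in enumerate(ordered)}

per_dataset = {ds: ranks(vals) for ds, vals in p_values.items()}
sr = {g: sum(per_dataset[ds][g] for ds in p_values) for g in p_values["GWL"]}
print(sorted(sr.items(), key=lambda kv: kv[1]))  # lowest SR = top candidate
```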
Conclusions
With the assumption that common genetic signals obtained from different methodological platforms might serve as robust AD risk markers than candidates identified using single dimension approach, here we demonstrated an integrated genomic convergence approach for disease candidate gene prioritization from heterogeneous data sources linked to AD.
doi:10.1186/1471-2164-15-199
PMCID: PMC4028079  PMID: 24628925
Gene prioritization; Protein-protein interaction; Clustering; Functional annotation
19.  Efficient haplotype block recognition of very long and dense genetic sequences 
BMC Bioinformatics  2014;15:10.
Background
New sequencing technologies make it possible to scan very long and dense genetic sequences, yielding datasets of genetic markers that are an order of magnitude larger than previously available. Such genetic sequences are characterized by common alleles interspersed with multiple rarer alleles. This situation has renewed interest in the identification of haplotypes carrying the rare risk alleles. However, large-scale explorations of the linkage-disequilibrium (LD) pattern to identify haplotype blocks are not easy to perform, because traditional algorithms have at least Θ(n²) time and memory complexity.
Results
We derived three incremental optimizations of the widely used haplotype block recognition algorithm proposed by Gabriel et al. in 2002. Our most efficient solution, called MIG++, has only Θ(n) memory complexity and, on a genome-wide scale, omits >80% of the calculations, which makes it an order of magnitude faster than the original algorithm. Unlike existing software, MIG++ analyzes the LD between SNPs at any distance, avoiding restrictions on the maximal block length. The haplotype block partition of the entire HapMap II CEPH dataset was obtained in 457 hours. By replacing the standard likelihood-based D′ variance estimator with an approximate estimator, the runtime was further improved. While producing a coarser partition, the approximate method allowed us to obtain the full-genome haplotype block partition of the entire 1000 Genomes Project CEPH dataset in 44 hours, with no restrictions on allele frequency or long-range correlations. These experiments showed that LD-based haplotype blocks can span more than one million base pairs in both the HapMap II and 1000 Genomes datasets. An application to the North American Rheumatoid Arthritis Consortium (NARAC) dataset shows how MIG++ can support genome-wide haplotype association studies.
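For readers unfamiliar with the underlying quantity, the sketch below computes |D′| for a single pair of biallelic SNPs from phased, 0/1-coded haplotypes; the incremental bookkeeping that makes MIG++ scale genome-wide is not reproduced here, and the example haplotypes are invented.

```python
# Pairwise |D'| from phased haplotypes for two biallelic SNPs.
def d_prime(haplotypes):
    """haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs, coded 0/1."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n            # freq of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n            # freq of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0

haps = [(0, 0), (0, 0), (1, 1), (1, 1), (1, 0), (0, 1), (1, 1), (0, 0)]
print(round(d_prime(haps), 3))  # 0.5 for this toy example
```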
Conclusions
MIG++ enables LD-based haplotype block recognition on genetic sequences of any length and density. In the next-generation sequencing era, this can help identify haplotypes that carry rare variants of interest. The low computational requirements open the possibility of including the haplotype block structure in genome-wide association scans, downstream analyses, and visual interfaces for online genome browsers.
doi:10.1186/1471-2105-15-10
PMCID: PMC3898000  PMID: 24423111
20.  Huvariome: a web server resource of whole genome next-generation sequencing allelic frequencies to aid in pathological candidate gene selection 
Background
Next-generation sequencing provides clinical research scientists with a direct readout of innumerable variants, including personal, pathological and common benign variants. The aim of resequencing studies is to determine the candidate pathogenic variants from individual genomes, or from family-based or tumor/normal genome comparisons. While the use of appropriate controls within the experimental design will minimize the number of false-positive variants selected, this number can be reduced further with the use of high-quality whole-genome reference data to exclude false-positive variants prior to candidate gene selection. In addition, the use of platform-related sequencing error models can help in the recovery of ambiguous genotypes from lower-coverage data.
Description
We have developed a whole-genome database of human genetic variation, Huvariome, determined from whole-genome deep sequencing data with high coverage and low error rates. The database was designed to be sequencing-technology independent but is currently populated with 165 individual whole genomes consisting of small pedigrees and matched tumor/normal samples sequenced with the Complete Genomics sequencing platform. Common variants have been determined for a Benelux population cohort and are represented as genotypes alongside the results of two sets of control data (73 of the 165 genomes): Huvariome Core, which comprises 31 healthy individuals from the Benelux region, and the Diversity Panel, consisting of 46 healthy individuals representing 10 different populations and 21 samples in three pedigrees. Users can query the database by gene or position via a web interface, and the results are displayed as the frequency of the variations as detected in the datasets. We demonstrate that Huvariome can provide accurate reference allele frequencies to disambiguate sequencing inconsistencies produced in resequencing experiments. Huvariome has been used to support the selection of candidate cardiomyopathy-related genes which have a homozygous genotype in the reference cohorts. The database allows users to see which selected variants are common variants (>5% minor allele frequency) in the Huvariome Core samples, thus aiding in the selection of potentially pathogenic variants by filtering out common variants that are not listed in one of the other public genomic variation databases. The no-call rate and the accuracy of allele calling in Huvariome provide the user with the possibility of identifying platform-dependent errors associated with specific regions of the human genome.
Conclusion
Huvariome is a simple to use resource for validation of resequencing results obtained by NGS experiments. The high sequence coverage and low error rates provide scientists with the ability to remove false positive results from pedigree studies. Results are returned via a web interface that displays location-based genetic variation frequency, impact on protein function, association with known genetic variations and a quality score of the variation base derived from Huvariome Core and the Diversity Panel data. These results may be used to identify and prioritize rare variants that, for example, might be disease relevant. In testing the accuracy of the Huvariome database, alleles of a selection of ambiguously called coding single nucleotide variants were successfully predicted in all cases. Data protection of individuals is ensured by restricted access to patient derived genomes from the host institution which is relevant for future molecular diagnostics.
doi:10.1186/2043-9113-2-19
PMCID: PMC3549785  PMID: 23164068
Medical genetics; Medical genomics; Whole genome sequencing; Allele frequency; Cardiomyopathy
21.  Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples 
BMC Bioinformatics  2008;9(Suppl 9):S17.
Background
Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling is paramount, since errors introduced by calling algorithms can lead to inflation of false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS are based on multiple arrays. Because hundreds of gigabytes (GB) of raw data are generated from a GWAS, the samples are typically partitioned into batches containing subsets of the entire dataset for genotype calling. High call rates and accuracies have been achieved. However, the effects of batch size (i.e., the number of chips analyzed together) and of batch composition (i.e., the choice of chips in a batch) on call rate and accuracy, as well as the propagation of these effects into the significantly associated SNPs identified, have not been investigated. In this paper, we analyzed both batch size and batch composition for effects on the genotype calling algorithm BRLMM using raw data from 270 HapMap samples analyzed with the Affymetrix Human Mapping 500 K array set.
Results
Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500 K array set, three different batch sizes and three different batch compositions were used for genotyping using the BRLMM algorithm. Comparative analysis of the calling results and the corresponding lists of significant SNPs identified through association analysis revealed that both batch size and composition affected genotype calling results and significantly associated SNPs. Batch size and batch composition effects were more severe on samples and SNPs with lower call rates than ones with higher call rates, and on heterozygous genotype calls compared to homozygous genotype calls.
Conclusion
Batch size and composition affect the genotype calling results in GWAS using BRLMM. The larger the differences in batch sizes, the larger the effect. The more homogeneous the samples in the batches, the more consistent the genotype calls. The inconsistency propagates to the lists of significantly associated SNPs identified in downstream association analysis. Thus, uniform and large batch sizes should be used to make genotype calls for GWAS. In addition, samples of high homogeneity should be placed into the same batch.
doi:10.1186/1471-2105-9-S9-S17
PMCID: PMC2537568  PMID: 18793462
22.  End-Stage Liver Disease Candidates at the Highest MELD Scores Have Higher Wait-list Mortality than Status-1A Candidates 
Hepatology (Baltimore, Md.)  2011;55(1):192-198.
Background
Candidates with fulminant hepatic failure (Status-1A) receive the highest priority for liver transplantation (LT) in the United States. However, no studies have compared wait-list mortality risk among end-stage liver disease (ESLD) candidates with high Model for End-stage Liver Disease (MELD) scores to those listed as Status-1A. We aimed to determine if there are MELD scores for ESLD candidates at which their wait-list mortality risk is higher than that of Status-1A, and to identify the factors predicting wait-list mortality among Status-1A.
Methods
Data were obtained from the Scientific Registry of Transplant Recipients for adult LT candidates (n=52,459) listed between 09/01/2001 and 12/31/2007. Candidates listed for repeat LT as Status-1A were excluded. Starting from the date of wait listing, candidates were followed for 14 days or until the earliest of death, transplant, or granting of an exception MELD score. ESLD candidates were categorized by MELD score, with a separate category for those with calculated MELD >40. We compared wait-list mortality between each MELD category and Status-1A (reference) using time-dependent Cox regression.
Results
ESLD candidates with MELD >40 had almost twice the wait-list mortality risk of Status-1A candidates, with a covariate-adjusted hazard ratio (HR) of 1.96 (p=0.004). There was no difference in wait-list mortality risk between candidates with MELD 36–40 and Status-1A, while candidates with MELD <36 had significantly lower mortality risk than Status-1A candidates. MELD score did not significantly predict wait-list mortality among Status-1A candidates (p=0.18). Among Status-1A candidates with acetaminophen toxicity, MELD was a significant predictor of wait-list mortality (p<0.0009). Post-transplant survival was similar for Status-1A and ESLD candidates with MELD >20 (p=0.6).
Conclusions
Candidates with MELD >40 have significantly higher wait-list mortality than, and similar post-transplant survival to, Status-1A candidates, and should therefore be assigned higher priority than Status-1A for allocation. Since ESLD candidates with MELD 36–40 and Status-1A candidates have similar wait-list mortality risk and post-transplant survival, these candidates should be assigned similar rather than sequential priority for deceased donor LT.
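As a rough, non-authoritative illustration of the comparison described in the Methods, the Python sketch below fits a Cox proportional hazards model with MELD-category indicators and Status-1A as the reference group using the lifelines package. The column names and simulated data are hypothetical, and the published analysis used covariate-adjusted, time-dependent Cox regression on registry data rather than this simplified setup.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500

# Hypothetical wait-list data: follow-up capped at 14 days, a death indicator,
# and the listing category, with Status-1A as the reference level.
categories = rng.choice(["Status-1A", "MELD<36", "MELD36-40", "MELD>40"], size=n)
df = pd.DataFrame({
    "days": rng.uniform(1, 14, size=n).round(1),
    "death": rng.integers(0, 2, size=n),
    "category": pd.Categorical(
        categories,
        categories=["Status-1A", "MELD<36", "MELD36-40", "MELD>40"]),
})

# Indicator columns for each MELD category relative to Status-1A.
design = pd.get_dummies(df["category"], drop_first=True).astype(float)
design[["days", "death"]] = df[["days", "death"]]

cph = CoxPHFitter()
cph.fit(design, duration_col="days", event_col="death")
cph.print_summary()  # exp(coef) gives the hazard ratio of each MELD category vs. Status-1A
```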
doi:10.1002/hep.24632
PMCID: PMC3235236  PMID: 21898487
decompensated end-stage liver disease; fulminant hepatic failure; model for end-stage liver disease; Status-1A; Status-1B; survival
23.  Integrated Assessment of Genomic Correlates of Protein Evolutionary Rate 
PLoS Computational Biology  2009;5(6):e1000413.
Rates of evolution differ widely among proteins, but the causes and consequences of such differences remain under debate. With the advent of high-throughput functional genomics, it is now possible to rigorously assess the genomic correlates of protein evolutionary rate. However, dissecting the correlations among evolutionary rate and these genomic features remains a major challenge. Here, we use an integrated probabilistic modeling approach to study genomic correlates of protein evolutionary rate in Saccharomyces cerevisiae. We measure and rank degrees of association between (i) an approximate measure of protein evolutionary rate with high genome coverage, and (ii) a diverse list of protein properties (sequence, structural, functional, network, and phenotypic). We observe, among many statistically significant correlations, that slowly evolving proteins tend to be regulated by more transcription factors, deficient in predicted structural disorder, involved in characteristic biological functions (such as translation), biased in amino acid composition, and are generally more abundant, more essential, and enriched for interaction partners. Many of these results are in agreement with recent studies. In addition, we assess information contribution of different subsets of these protein properties in the task of predicting slowly evolving proteins. We employ a logistic regression model on binned data that is able to account for intercorrelation, non-linearity, and heterogeneity within features. Our model considers features both individually and in natural ensembles (“meta-features”) in order to assess joint information contribution and degree of contribution independence. Meta-features based on protein abundance and amino acid composition make strong, partially independent contributions to the task of predicting slowly evolving proteins; other meta-features make additional minor contributions. The combination of all meta-features yields predictions comparable to those based on paired species comparisons, and approaching the predictive limit of optimal lineage-insensitive features. Our integrated assessment framework can be readily extended to other correlational analyses at the genome scale.
Author Summary
Proteins encoded within a given genome are known to evolve at drastically different rates. Through recent large-scale studies, researchers have measured a wide variety of properties for all proteins in yeast. We are interested to know how these properties relate to one another and to what extent they explain evolutionary rate variation. Protein properties are a heterogeneous mix, a factor which complicates research in this area. For example, some properties (e.g., protein abundance) are numerical, while others (e.g., protein function) are descriptive; protein properties may also suffer from noise and hidden redundancies. We have addressed these issues within a flexible and robust statistical framework. We first ranked a large list of protein properties by the strength of their relationships with evolutionary rate; this confirms many known evolutionary relationships and also highlights several new ones. Similar protein properties were then grouped and applied to predict slowly evolving proteins. Some of these groups were as effective as paired species comparison in making correct predictions, although in both cases a great deal of evolutionary rate variation remained to be explained. Our work has helped to refine the set of protein properties that researchers should consider as they investigate the mechanisms underlying protein evolution.
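As a loose sketch of the binned logistic-regression idea described above (predicting slowly evolving proteins from binned, one-hot-encoded protein properties), the Python example below uses simulated stand-ins for the yeast protein properties; the feature names, quantile binning, and data are hypothetical and do not reproduce the paper's model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 2000

# Hypothetical protein properties (e.g., abundance, number of regulating TFs).
abundance = rng.lognormal(mean=3.0, sigma=1.0, size=n)
n_tfs = rng.poisson(lam=5, size=n)
# "Slowly evolving" label loosely tied to abundance, mirroring the reported trend.
slow = (np.log(abundance) + rng.normal(0, 1.5, size=n)) > 3.5

# Bin each numeric feature into quantiles and one-hot encode the bins, which lets
# a linear model capture non-linear, heterogeneous effects within each feature.
features = pd.DataFrame({
    "abundance_bin": pd.qcut(abundance, q=5, labels=False),
    "tf_bin": pd.qcut(n_tfs, q=4, labels=False, duplicates="drop"),
})
X = pd.get_dummies(features.astype("category"))
model = LogisticRegression(max_iter=1000)
print("CV accuracy:", cross_val_score(model, X, slow, cv=5).mean())
```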
doi:10.1371/journal.pcbi.1000413
PMCID: PMC2688033  PMID: 19521505
24.  Single Photon Emission Computed Tomography for the Diagnosis of Coronary Artery Disease 
Executive Summary
In July 2009, the Medical Advisory Secretariat (MAS) began work on Non-Invasive Cardiac Imaging Technologies for the Diagnosis of Coronary Artery Disease (CAD), an evidence-based review of the literature surrounding different cardiac imaging modalities to ensure that appropriate technologies are accessed by patients suspected of having CAD. This project came about when the Health Services Branch at the Ministry of Health and Long-Term Care asked MAS to provide an evidentiary platform on effectiveness and cost-effectiveness of non-invasive cardiac imaging modalities.
After an initial review of the strategy and consultation with experts, MAS identified five key non-invasive cardiac imaging technologies for the diagnosis of CAD. Evidence-based analyses have been prepared for each of these five imaging modalities: cardiac magnetic resonance imaging, single photon emission computed tomography, 64-slice computed tomographic angiography, stress echocardiography, and stress echocardiography with contrast. For each technology, an economic analysis was also completed (where appropriate). A summary decision analytic model was then developed to encapsulate the data from each of these reports (available on the OHTAC and MAS website).
The Non-Invasive Cardiac Imaging Technologies for the Diagnosis of Coronary Artery Disease series is made up of the following reports, which can be publicly accessed at the MAS website at: www.health.gov.on.ca/mas or at www.health.gov.on.ca/english/providers/program/mas/mas_about.html
Single Photon Emission Computed Tomography for the Diagnosis of Coronary Artery Disease: An Evidence-Based Analysis
Stress Echocardiography for the Diagnosis of Coronary Artery Disease: An Evidence-Based Analysis
Stress Echocardiography with Contrast for the Diagnosis of Coronary Artery Disease: An Evidence-Based Analysis
64-Slice Computed Tomographic Angiography for the Diagnosis of Coronary Artery Disease: An Evidence-Based Analysis
Cardiac Magnetic Resonance Imaging for the Diagnosis of Coronary Artery Disease: An Evidence-Based Analysis
Please note that two related evidence-based analyses of non-invasive cardiac imaging technologies for the assessment of myocardial viability are also available on the MAS website:
Positron Emission Tomography for the Assessment of Myocardial Viability: An Evidence-Based Analysis
Magnetic Resonance Imaging for the Assessment of Myocardial Viability: An Evidence-Based Analysis
The Toronto Health Economics and Technology Assessment Collaborative has also produced an associated economic report entitled:
The Relative Cost-effectiveness of Five Non-invasive Cardiac Imaging Technologies for Diagnosing Coronary Artery Disease in Ontario [Internet]. Available from: http://theta.utoronto.ca/reports/?id=7
Objective
The objective of the analysis is to determine the diagnostic accuracy of single photon emission tomography (SPECT) in the diagnosis of coronary artery disease (CAD) compared to the reference standard of coronary angiography (CA). The analysis is primarily meant to allow for indirect comparisons between non-invasive strategies for the diagnosis of CAD, using CA as a reference standard.
SPECT
Cardiac SPECT, or myocardial perfusion scintigraphy (MPS), is a widely used nuclear, non-invasive image acquisition technique for investigating ischemic heart disease. SPECT is currently appropriate for all aspects of detecting and managing ischemic heart disease including diagnosis, risk assessment/stratification, assessment of myocardial viability, and the evaluation of left ventricular function. Myocardial perfusion scintigraphy was originally developed as a two-dimensional planar imaging technique, but SPECT acquisition has since become the clinical standard in current practice. Cardiac SPECT for the diagnosis of CAD uses an intravenously administered radiopharmaceutical tracer to evaluate regional coronary blood flow, usually at rest and after stress. The radioactive tracers thallium (201Tl) or technetium-99m (99mTc), or both, may be used to visualize the SPECT acquisition. Exercise or a pharmacologic agent is used to achieve stress. After the administration of the tracer, its distribution within the myocardium (which is dependent on myocardial blood flow) is imaged using a gamma camera. In SPECT imaging, the gamma camera rotates around the patient for 10 to 20 minutes so that multiple two-dimensional projections are acquired from various angles. The raw data are then processed using computational algorithms to obtain three-dimensional tomographic images.
Since its inception, SPECT has evolved and its techniques and applications have become increasingly complex and numerous. Accordingly, new techniques such as attenuation correction and ECG gating have been developed to correct for attenuation due to motion or soft-tissue artifact and to improve overall image clarity.
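The reconstruction step described above, in which two-dimensional projections acquired at multiple angles are converted into tomographic slices, can be sketched with the Radon-transform utilities in scikit-image. This is a generic filtered back-projection illustration on a standard test phantom, not the vendor reconstruction algorithms used in clinical SPECT systems.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, rescale

# Test slice standing in for a single transaxial slice of the myocardium.
image = rescale(shepp_logan_phantom(), scale=0.4, mode="reflect")

# Simulate the rotating gamma camera: 1-D projections at evenly spaced angles.
theta = np.linspace(0.0, 180.0, max(image.shape), endpoint=False)
sinogram = radon(image, theta=theta)

# Filtered back-projection reconstructs the tomographic slice from the projections.
reconstruction = iradon(sinogram, theta=theta)
error = np.sqrt(np.mean((reconstruction - image) ** 2))
print(f"reconstruction RMS error: {error:.4f}")
```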
Research Questions
What is the diagnostic accuracy of SPECT for the diagnosis of CAD compared to the reference standard of CA?
Is SPECT cost-effective compared to other non-invasive cardiac imaging modalities for the diagnosis of CAD?
What are the major safety concerns with SPECT when used for the diagnosis of CAD?
Methods
A preliminary literature search was performed across OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, the Cochrane Library, and the International Agency for Health Technology Assessment (INAHTA) for all systematic reviews/meta-analyses published between January 1, 2004 and August 22, 2009. A comprehensive systematic review was identified from this search and used as a basis for an updated search.
A second comprehensive literature search was then performed on October 30, 2009 across the same databases for studies published between January 1, 2002 and October 30, 2009. Abstracts were reviewed by a single reviewer and, for those studies meeting the eligibility criteria, full-text articles were obtained. Reference lists were also hand-searched for any additional studies.
Inclusion Criteria
Systematic reviews, meta-analyses, controlled clinical trials, and observational studies
Minimum sample size of 20 patients who completed coronary angiography
Use of CA as a reference standard for the diagnosis of CAD
Data available to calculate true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN)
Accuracy data reported by patient, not by segment
English language
Exclusion Criteria
Non-systematic reviews, case reports
Grey literature and abstracts
Trials using planar imaging only
Trials conducted in patients with non-ischemic heart disease
Studies done exclusively in special populations (e.g., patients with left bundle branch block, diabetics, minority populations) unless insufficient data were available
Summary of Findings
Eighty-four observational studies, one non-randomized, single arm controlled clinical trial, and one poorly reported trial that appeared to be a randomized controlled trial (RCT) met the inclusion criteria for this review. All studies assessed the diagnostic accuracy of myocardial perfusion SPECT for the diagnosis of CAD using CA as a reference standard. Based on the results of these studies the following conclusions were made:
According to very low quality evidence, the addition of attenuation correction to traditional or ECG-gated SPECT greatly improves the specificity of SPECT for the diagnosis of CAD although this improvement is not statistically significant. A trend towards improvement of specificity was also observed with the addition of ECG gating to traditional SPECT.
According to very low quality evidence, neither the choice of stress agent (exercise or pharmacologic) nor the choice of radioactive tracer (technetium vs. thallium) significantly affect the diagnostic accuracy of SPECT for the diagnosis of CAD although a trend towards accuracy improvement was observed with the use of pharmacologic stress over exercise stress and technetium over thallium.
Considerable heterogeneity was observed both within and between trials. This heterogeneity may explain why some of the differences observed between accuracy estimates for various subgroups were not statistically significant.
More complex analytic techniques such as meta-regression may help to better understand which study characteristics significantly influence the diagnostic accuracy of SPECT.
PMCID: PMC3377554  PMID: 23074411
25.  64-Slice Computed Tomographic Angiography for the Diagnosis of Intermediate Risk Coronary Artery Disease 
Executive Summary
In July 2009, the Medical Advisory Secretariat (MAS) began work on Non-Invasive Cardiac Imaging Technologies for the Diagnosis of Coronary Artery Disease (CAD), an evidence-based review of the literature surrounding different cardiac imaging modalities to ensure that appropriate technologies are accessed by patients suspected of having CAD. This project came about when the Health Services Branch at the Ministry of Health and Long-Term Care asked MAS to provide an evidentiary platform on effectiveness and cost-effectiveness of non-invasive cardiac imaging modalities.
After an initial review of the strategy and consultation with experts, MAS identified five key non-invasive cardiac imaging technologies for the diagnosis of CAD. Evidence-based analyses have been prepared for each of these five imaging modalities: cardiac magnetic resonance imaging, single photon emission computed tomography, 64-slice computed tomographic angiography, stress echocardiography, and stress echocardiography with contrast. For each technology, an economic analysis was also completed (where appropriate). A summary decision analytic model was then developed to encapsulate the data from each of these reports (available on the OHTAC and MAS website).
The Non-Invasive Cardiac Imaging Technologies for the Diagnosis of Coronary Artery Disease series is made up of the following reports, which can be publicly accessed at the MAS website at: www.health.gov.on.ca/mas or at www.health.gov.on.ca/english/providers/program/mas/mas_about.html
Single Photon Emission Computed Tomography for the Diagnosis of Coronary Artery Disease: An Evidence-Based Analysis
Stress Echocardiography for the Diagnosis of Coronary Artery Disease: An Evidence-Based Analysis
Stress Echocardiography with Contrast for the Diagnosis of Coronary Artery Disease: An Evidence-Based Analysis
64-Slice Computed Tomographic Angiography for the Diagnosis of Coronary Artery Disease: An Evidence-Based Analysis
Cardiac Magnetic Resonance Imaging for the Diagnosis of Coronary Artery Disease: An Evidence-Based Analysis
Please note that two related evidence-based analyses of non-invasive cardiac imaging technologies for the assessment of myocardial viability are also available on the MAS website:
Positron Emission Tomography for the Assessment of Myocardial Viability: An Evidence-Based Analysis
Magnetic Resonance Imaging for the Assessment of Myocardial Viability: An Evidence-Based Analysis
The Toronto Health Economics and Technology Assessment Collaborative has also produced an associated economic report entitled:
The Relative Cost-effectiveness of Five Non-invasive Cardiac Imaging Technologies for Diagnosing Coronary Artery Disease in Ontario [Internet]. Available from: http://theta.utoronto.ca/reports/?id=7
Objective
The objective of this report is to determine the accuracy of computed tomographic angiography (CTA) compared to the more invasive option of coronary angiography (CA) in the detection of coronary artery disease (CAD) in stable (non-emergent) symptomatic patients.
CT Angiography
CTA is a cardiac imaging test that assesses the presence or absence, as well as the extent, of coronary artery stenosis for the diagnosis of CAD. As such, it is a test of cardiac structure and anatomy, in contrast to the other cardiac imaging modalities that assess cardiac function. It is, however, unclear whether cardiac structural features alone, in the absence of cardiac function information, are sufficient to determine the presence or absence of CAD in patients at intermediate pre-test risk.
CTA technology is changing rapidly with increasing scan speeds and anticipated reductions in radiation exposure. Initial scanners based on 4, 8, 16, 32, and 64 slice machines have been available since the end of 2004. Although 320-slice machines are now available, these are not widely diffused and the existing published evidence is specific to 64-slice scanners. In general, CTA allows for 3-dimensional (3D) viewing of the coronary arteries derived from software algorithms of 2-dimensional (2D) images.
The advantage of CTA over CA, the gold standard for the diagnosis of CAD, is that it is less invasive and may serve as a test for determining which patients are best suited for CA. CA requires insertion of a catheter through an artery in the arm or leg up to the area being studied, although both tests involve contrast agents and radiation exposure. Therefore, identifying the patients for whom CTA or CA is more appropriate may help to avoid more invasive tests, treatment delays, and unnecessary radiation exposure. The main advantage of CA, however, is that treatment can be administered in the same session as the test procedure; as such, it is recommended for patients with a pre-test probability of CAD of ≥80%. Proceeding directly to the more invasive CA allows diagnosis and treatment in one session without the added radiation exposure from a previous CTA.
The visibility of arteries in CTA images is best in populations with a disease prevalence, or pre-test probability of CAD, of 40% to 80%, beyond which patients are considered at high pre-test probability. Visibility decreases with increasing prevalence as arteries become increasingly calcified (coronary artery calcification is quantified using the Agatston score). Such higher-risk patients are not candidates for the less invasive diagnostic procedures and should proceed directly to CA, where treatment can be administered in conjunction with the test itself, bypassing the radiation exposure from CTA.
CTA requires the addition of an iodinated contrast agent, which can be administered only in patients with sufficient renal function (creatinine levels >30 micromoles/litre) to allow the contrast to be cleared from the body. In some cases, the contrast is administered in patients with creatinine levels less than 30 micromoles/litre.
A second important criterion for administering CTA is patient heart rate, which should be less than 65 beats/min for single-source CTA machines and less than 80 beats/min for dual-source machines. To decrease heart rates to these levels, beta-blockers are often required. Although the accuracy of the two machine types does not differ, dual-source machines can be used in a higher proportion of patients because they accommodate heart rates of up to 80 beats/min. Approximately 10% of patients are considered ineligible for CTA because their heart rates cannot be decreased to the required levels. Additional contraindications include renal insufficiency as described above and atrial fibrillation; approximately 10% of intermediate-risk patients are ineligible for CTA due to these contraindications. The duration of the procedure may be between 1 and 1.5 hours, with about 15 minutes for the CTA itself and the remaining time for patient preparation.
CTA is licensed by Health Canada as a Class III device. Currently, two companies have licenses for 64-slice CT scanners, Toshiba Medical Systems Corporation (License 67604) and Philips Medical Systems (License 67599 and 73260).
Research Questions
How does the accuracy of CTA compare to the more invasive CA in the diagnosis of CAD in symptomatic patients at intermediate risk of the disease?
How does the accuracy of CTA compare to that of other modalities in the detection of CAD?
Research Methods
Literature Search
A literature search was performed on July 20, 2009 using OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, the Cumulative Index to Nursing & Allied Health Literature (CINAHL), the Cochrane Library, and the International Agency for Health Technology Assessment (INAHTA) for studies published from January 1, 2004 until July 20, 2009. Abstracts were reviewed by a single reviewer and, for those studies meeting the eligibility criteria, full-text articles were obtained. Reference lists were also examined for any relevant studies not identified through the search. The quality of evidence was assessed as high, moderate, low or very low according to GRADE methodology.
Inclusion Criteria
English language articles and English or French-language HTAs published from January 1, 2004 to July 20, 2009.
Randomized controlled trials (RCTs), non-randomized clinical trials, systematic reviews and meta-analyses.
Studies of symptomatic patients at intermediate pre-test probability of CAD.
Studies of single source CTA compared to CA for the diagnosis of CAD.
Studies in which sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV) could be established.
HTAs, SRs, clinical trials, observational studies.
Exclusion Criteria
Non-English studies.
Pediatric populations.
Studies of patients at low or high pre-test probability of CAD.
Studies of unstable patients, e.g., emergency room visits, or a prior diagnosis of CAD.
Studies in patients with non-ischemic heart disease.
Studies in which outcomes were not specific to those of interest in this report.
Studies in which CTA was not compared to CA in a stable population.
Outcomes of Interest
CAD defined as ≥50% stenosis.
Comparator
Coronary angiography.
Measures of Interest
Sensitivity, specificity;
Negative predictive value (NPV), positive predictive value (PPV);
Area under the curve (AUC) and diagnostic odds ratios (DOR).
Results of Literature Search and Evidence-Based Analysis
The literature search yielded two HTAs, the first published by MAS in April 2005, the other from the Belgian Health Care Knowledge Centre published in 2008, as well as three recent non-randomized clinical studies. The three most significant studies concerning the accuracy of CTA versus CA are the CORE-64 study, the ACCURACY trial, and a prospective, multicenter, multivendor study conducted in the Netherlands. Five additional non-randomized studies were extracted from the Belgian Health Technology Assessment (2008).
To provide summary estimates of sensitivity, specificity, area under the SROC curve (AUC), and diagnostic odds ratios (DORs), a meta-analysis of the above-mentioned studies was conducted. Pooled estimates of sensitivity and specificity were 97.7% (95% CI: 95.5%–99.9%) and 78.8% (95% CI: 70.8%–86.8%), respectively. These results indicate that the sensitivity of CTA is almost as good as that of CA, while its specificity is poorer. The diagnostic odds ratio (DOR) was estimated at 157.0 (95% CI: 11.2–302.7) and the AUC was found to be 0.94; however, the inability to provide a confidence interval for the AUC limited its utility as an outcome measure in this review.
This meta-analysis was limited by significant heterogeneity between studies for both the pooled sensitivity and specificity (heterogeneity chi-square p=0.000). To minimize these statistical concerns, the analysis was restricted to studies of intermediate-risk patients with no previous history of cardiac events. Nevertheless, the underlying prevalence of CAD ranged from 24.8% to 78% between studies, indicating that there was still some variability in the pre-test probabilities of disease within this stable population. The variation in the prevalence of CAD, accompanied by differences in the proportion of calcification, likely affected the specificity directly and the sensitivity indirectly across studies.
In February 2010, the results of the Ontario Multi-detector Computed Tomography Coronary Angiography Study (OMCAS) became available and were included in a second meta-analysis of the above studies. The OMCAS was a non-randomized, double-blind study conducted in 3 Ontario centers, undertaken in response to a 2005 MAS review requesting an evaluation of the accuracy of 64-slice CTA for CAD detection. Within 10 days of their scheduled CA, all patients received an additional evaluation with CTA. Adding the 117 symptomatic OMCAS patients with intermediate probability of CAD (10%–90% probability) to the above-mentioned studies yielded a pooled sensitivity of 96.1% (95% CI: 94.0%–98.3%) and a pooled specificity of 81.5% (95% CI: 73.0%–89.9%).
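As a minimal sketch of how per-study accuracy measures can be pooled, the Python example below computes sensitivity, specificity, and diagnostic odds ratios from hypothetical 2x2 counts and pools them with simple inverse-variance weighting on the logit scale. The counts are invented placeholders (not the CORE-64, ACCURACY, or OMCAS data), and this fixed-effect pooling stands in for the more elaborate bivariate/SROC methods typically used in such reviews.

```python
import numpy as np

# Hypothetical per-study 2x2 counts: (TP, FP, FN, TN) against coronary angiography.
studies = [(95, 12, 3, 45), (60, 9, 2, 30), (120, 20, 5, 70)]

def logit_pool(events, totals):
    """Inverse-variance pooled proportion on the logit scale (fixed effect)."""
    events, totals = np.asarray(events, float), np.asarray(totals, float)
    p = events / totals
    logit = np.log(p / (1 - p))
    var = 1 / events + 1 / (totals - events)  # approximate variance of logit(p)
    w = 1 / var
    pooled = np.sum(w * logit) / np.sum(w)
    return 1 / (1 + np.exp(-pooled))

tp, fp, fn, tn = map(np.array, zip(*studies))
print("pooled sensitivity:", logit_pool(tp, tp + fn))
print("pooled specificity:", logit_pool(tn, tn + fp))

# Diagnostic odds ratio per study, then a simple geometric-mean summary.
dor = (tp * tn) / (fp * fn)
print("per-study DORs:", dor)
print("geometric-mean DOR:", np.exp(np.mean(np.log(dor))))
```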
Summary of Findings
CTA is almost as good as CA in detecting true positives but poorer in the rate of false positives. The main value of CTA may be in ruling out significant CAD.
Increased prevalence of CAD decreases study specificity, and specificity is also decreased in the presence of increased arterial calcification, even in lower prevalence studies.
Positive CT angiograms may require additional tests such as stress tests or the more invasive CA, partly to identify false positives.
Radiation exposure is an important safety concern that needs to be considered, particularly the cumulative exposures from repeat CTAs.
PMCID: PMC3377576  PMID: 23074388
