
Results 1-10 (10)

1.  An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge 
Brownstein, Catherine A | Beggs, Alan H | Homer, Nils | Merriman, Barry | Yu, Timothy W | Flannery, Katherine C | DeChene, Elizabeth T | Towne, Meghan C | Savage, Sarah K | Price, Emily N | Holm, Ingrid A | Luquette, Lovelace J | Lyon, Elaine | Majzoub, Joseph | Neupert, Peter | McCallie Jr, David | Szolovits, Peter | Willard, Huntington F | Mendelsohn, Nancy J | Temme, Renee | Finkel, Richard S | Yum, Sabrina W | Medne, Livija | Sunyaev, Shamil R | Adzhubey, Ivan | Cassa, Christopher A | de Bakker, Paul IW | Duzkale, Hatice | Dworzyński, Piotr | Fairbrother, William | Francioli, Laurent | Funke, Birgit H | Giovanni, Monica A | Handsaker, Robert E | Lage, Kasper | Lebo, Matthew S | Lek, Monkol | Leshchiner, Ignaty | MacArthur, Daniel G | McLaughlin, Heather M | Murray, Michael F | Pers, Tune H | Polak, Paz P | Raychaudhuri, Soumya | Rehm, Heidi L | Soemedi, Rachel | Stitziel, Nathan O | Vestecka, Sara | Supper, Jochen | Gugenmus, Claudia | Klocke, Bernward | Hahn, Alexander | Schubach, Max | Menzel, Mortiz | Biskup, Saskia | Freisinger, Peter | Deng, Mario | Braun, Martin | Perner, Sven | Smith, Richard JH | Andorf, Janeen L | Huang, Jian | Ryckman, Kelli | Sheffield, Val C | Stone, Edwin M | Bair, Thomas | Black-Ziegelbein, E Ann | Braun, Terry A | Darbro, Benjamin | DeLuca, Adam P | Kolbe, Diana L | Scheetz, Todd E | Shearer, Aiden E | Sompallae, Rama | Wang, Kai | Bassuk, Alexander G | Edens, Erik | Mathews, Katherine | Moore, Steven A | Shchelochkov, Oleg A | Trapane, Pamela | Bossler, Aaron | Campbell, Colleen A | Heusel, Jonathan W | Kwitek, Anne | Maga, Tara | Panzer, Karin | Wassink, Thomas | Van Daele, Douglas | Azaiez, Hela | Booth, Kevin | Meyer, Nic | Segal, Michael M | Williams, Marc S | Tromp, Gerard | White, Peter | Corsmeier, Donald | Fitzgerald-Butt, Sara | Herman, Gail | Lamb-Thrush, Devon | McBride, Kim L | Newsom, David | Pierson, Christopher R | Rakowsky, Alexander T | Maver, Aleš | Lovrečić, Luca | Palandačić, Anja | Peterlin, Borut | 
Torkamani, Ali | Wedell, Anna | Huss, Mikael | Alexeyenko, Andrey | Lindvall, Jessica M | Magnusson, Måns | Nilsson, Daniel | Stranneheim, Henrik | Taylan, Fulya | Gilissen, Christian | Hoischen, Alexander | van Bon, Bregje | Yntema, Helger | Nelen, Marcel | Zhang, Weidong | Sager, Jason | Zhang, Lu | Blair, Kathryn | Kural, Deniz | Cariaso, Michael | Lennon, Greg G | Javed, Asif | Agrawal, Saloni | Ng, Pauline C | Sandhu, Komal S | Krishna, Shuba | Veeramachaneni, Vamsi | Isakov, Ofer | Halperin, Eran | Friedman, Eitan | Shomron, Noam | Glusman, Gustavo | Roach, Jared C | Caballero, Juan | Cox, Hannah C | Mauldin, Denise | Ament, Seth A | Rowen, Lee | Richards, Daniel R | Lucas, F Anthony San | Gonzalez-Garay, Manuel L | Caskey, C Thomas | Bai, Yu | Huang, Ying | Fang, Fang | Zhang, Yan | Wang, Zhengyuan | Barrera, Jorge | Garcia-Lobo, Juan M | González-Lamuño, Domingo | Llorca, Javier | Rodriguez, Maria C | Varela, Ignacio | Reese, Martin G | De La Vega, Francisco M | Kiruluta, Edward | Cargill, Michele | Hart, Reece K | Sorenson, Jon M | Lyon, Gholson J | Stevenson, David A | Bray, Bruce E | Moore, Barry M | Eilbeck, Karen | Yandell, Mark | Zhao, Hongyu | Hou, Lin | Chen, Xiaowei | Yan, Xiting | Chen, Mengjie | Li, Cong | Yang, Can | Gunel, Murat | Li, Peining | Kong, Yong | Alexander, Austin C | Albertyn, Zayed I | Boycott, Kym M | Bulman, Dennis E | Gordon, Paul MK | Innes, A Micheil | Knoppers, Bartha M | Majewski, Jacek | Marshall, Christian R | Parboosingh, Jillian S | Sawyer, Sarah L | Samuels, Mark E | Schwartzentruber, Jeremy | Kohane, Isaac S | Margulies, David M
Genome Biology  2014;15(3):R53.
There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance.
A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization.
The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.
PMCID: PMC4073084  PMID: 24667040
2.  Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics 
Science (New York, N.Y.)  2013;342(6154):1235587.
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations (“ultrasensitive”) and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, “motif-breakers”). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
PMCID: PMC3947637  PMID: 24092746
3.  Protein Interaction-Based Genome-Wide Analysis of Incident Coronary Heart Disease 
Network-based approaches may leverage genome-wide association (GWA) analysis by testing for the aggregate association across several pathway members. We aimed to examine if networks of genes that represent experimentally determined protein-protein interactions are enriched in genes associated with risk of coronary heart disease (CHD).
Methods and Results
GWA analyses of ~700,000 SNPs in 899 incident CHD cases and 1,823 age- and sex-matched controls within the Nurses’ Health and the Health Professionals Follow-Up Studies were used to assign gene-wise p-values. A large database of protein-protein interactions (PPI) was used to assemble 8,300 unbiased protein complexes and corresponding gene-sets. Superimposed gene-wise p-values were used to rank gene-sets based on their enrichment in genes associated with CHD. After correcting for the number of complexes tested, one gene-set was overrepresented in CHD-associated genes (p-value=0.002). Centered on the beta-1-adrenergic receptor gene (ADRB1), this complex included 18 protein interaction partners that, so far, have not been identified as candidate loci for CHD. Five of the 19 genes in the top-complex are reported to be involved in abnormal cardiovascular system physiology based on knock-out mice (4-fold enrichment; p-value, Fisher’s exact test= 0.006). Ingenuity pathway analysis revealed that especially canonical pathways related to blood pressure regulation were significantly enriched in the genes from the top complex.
The integration of a GWA study with PPI data successfully identifies a set of candidate susceptibility genes for incident CHD that would have been missed in single-marker GWA analysis.
PMCID: PMC3197770  PMID: 21880673
Genetics of cardiovascular disease; acute myocardial infarction; epidemiology
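The knock-out enrichment reported in entry 3 (5 of the 19 genes in the top complex, 4-fold enrichment, Fisher's exact test) can be illustrated with a one-sided hypergeometric tail, which is what a one-sided Fisher's exact test computes for a 2x2 table. This is a minimal sketch, not the authors' code: the abstract does not give the background counts, so the genome size (20,000 genes) and the number of annotated genes (1,316) below are hypothetical values chosen only to be consistent with a roughly 4-fold enrichment.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation (one-sided Fisher's exact test)."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total

# From the abstract: 5 of the 19 genes in the top complex are annotated.
annotated_in_complex = 5
complex_size = 19

# HYPOTHETICAL background counts (not given in the abstract).
background_genes = 20000
background_annotated = 1316

fold = (annotated_in_complex / complex_size) / (background_annotated / background_genes)
p_value = hypergeom_upper_tail(annotated_in_complex, complex_size,
                               background_annotated, background_genes)

print(f"fold enrichment: {fold:.2f}")
print(f"one-sided p-value: {p_value:.4f}")
```

Exact integer arithmetic via `math.comb` avoids the rounding problems of multiplying many small probabilities; only the final division produces a float.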
4.  Concordance of gene expression in human protein complexes reveals tissue specificity and pathology 
Nucleic Acids Research  2013;41(18):e171.
Disease-causing variants in human genes usually lead to phenotypes specific to only a few tissues. Here, we present a method for predicting tissue specificity based on quantitative deregulation of protein complexes. The underlying assumption is that the degree of coordinated expression among proteins in a complex within a given tissue may pinpoint tissues that will be affected by a mutation in the complex and coordinated expression may reveal the complex to be active in the tissue. We identified known disease genes and their protein complex partners in a high-quality human interactome. Each susceptibility gene's tissue involvement was ranked based on coordinated expression with its interaction partners in a non-disease global map of human tissue-specific expression. The approach demonstrated high overall area under the curve (0.78) and was very successfully benchmarked against a random model and an approach not using protein complexes. This was illustrated by correct tissue predictions for three case studies on leptin, insulin-like-growth-factor 2 and the inhibitor of NF-κB kinase subunit gamma that show high concordant expression in biologically relevant tissues. Our method identifies novel gene-phenotype associations in human diseases and predicts the tissues where associated phenotypic effects may arise.
PMCID: PMC3794609  PMID: 23921638
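The area under the curve quoted in entry 4 (0.78) summarizes how well a ranking separates true from false tissue assignments. As a hedged illustration, the sketch below computes AUC as the probability that a randomly chosen positive item scores higher than a randomly chosen negative one, with ties counted as one half. The scores are hypothetical and the function is a generic AUC, not the benchmark procedure used in the paper.

```python
def auc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# HYPOTHETICAL concordance scores for tissues known to be affected (positives)
# and tissues not known to be affected (negatives) by a disease gene.
affected = [0.81, 0.64, 0.55]
unaffected = [0.70, 0.42, 0.30, 0.12]

print(f"AUC = {auc(affected, unaffected):.2f}")
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.78 should be read.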
5.  MetaRanker 2.0: a web server for prioritization of genetic variation data 
Nucleic Acids Research  2013;41(Web Server issue):W104-W108.
MetaRanker 2.0 is a web server for prioritization of common and rare frequency genetic variation data. Based on heterogeneous data sets including genetic association data, protein–protein interactions, large-scale text-mining data, copy number variation data and gene expression experiments, MetaRanker 2.0 prioritizes the protein-coding part of the human genome to shortlist candidate genes for targeted follow-up studies. MetaRanker 2.0 is made freely available at
PMCID: PMC3692047  PMID: 23703204
6.  The Validation and Assessment of Machine Learning: A Game of Prediction from High-Dimensional Data 
PLoS ONE  2009;4(8):e6287.
In applied statistics, tools from machine learning are popular for analyzing complex and high-dimensional data. However, few theoretical results are available to guide the choice of an appropriate machine learning tool in a new application. Initial development of an overall strategy thus often implies that multiple methods are tested and compared on the same set of data. This is particularly difficult in situations that are prone to over-fitting, where the number of subjects is low compared to the number of potential predictors. The article presents a game which provides some grounds for conducting a fair model comparison. Each player selects a modeling strategy for predicting individual response from potential predictors. A strictly proper scoring rule, bootstrap cross-validation, and a set of rules are used to make the results obtained with different strategies comparable. To illustrate the ideas, the game is applied to data from the Nugenob Study where the aim is to predict the fat oxidation capacity based on conventional factors and high-dimensional metabolomics data. Three players have chosen to use support vector machines, LASSO, and random forests, respectively.
PMCID: PMC2716515  PMID: 19652722
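The comparison scheme described in entry 6 (each player submits a modeling strategy, and all strategies are scored with a strictly proper scoring rule under bootstrap cross-validation) can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: the Brier score stands in for the scoring rule (the abstract does not name one), the data are synthetic, and the two toy strategies `fit_prevalence` and `fit_threshold` are hypothetical stand-ins for the players' models.

```python
import random

def brier(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

def bootstrap_cv(data, strategies, n_rounds=200, seed=0):
    """Train each strategy on a bootstrap sample, score it on the subjects
    left out of that sample, and average the score over rounds."""
    rng = random.Random(seed)
    n = len(data)
    totals = {name: 0.0 for name in strategies}
    used = 0
    for _ in range(n_rounds):
        train_idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(train_idx)
        test = [data[i] for i in range(n) if i not in in_bag]
        if not test:
            continue  # no left-out subjects in this round
        train = [data[i] for i in train_idx]
        # Every strategy sees the same split, so the comparison is fair.
        for name, fit in strategies.items():
            predict = fit(train)
            totals[name] += brier([predict(x) for x, _ in test],
                                  [y for _, y in test])
        used += 1
    return {name: total / used for name, total in totals.items()}

def fit_prevalence(train):
    """Toy strategy: always predict the training event rate."""
    rate = sum(y for _, y in train) / len(train)
    return lambda x: rate

def fit_threshold(train):
    """Toy strategy: event rate below and above the median predictor value."""
    xs = sorted(x for x, _ in train)
    cut = xs[len(xs) // 2]
    overall = sum(y for _, y in train) / len(train)
    low = [y for x, y in train if x < cut]
    high = [y for x, y in train if x >= cut]
    p_low = sum(low) / len(low) if low else overall
    p_high = sum(high) / len(high) if high else overall
    return lambda x: p_high if x >= cut else p_low

# SYNTHETIC data: one predictor x in [0, 1], outcome is 1 with probability x.
data_rng = random.Random(1)
data = []
for _ in range(200):
    x = data_rng.random()
    y = 1 if data_rng.random() < x else 0
    data.append((x, y))

strategies = {"prevalence": fit_prevalence, "threshold": fit_threshold}
scores = bootstrap_cv(data, strategies)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: mean out-of-bag Brier score = {score:.3f}")
```

Lower Brier scores are better. Scoring only the subjects left out of each bootstrap sample is what guards against rewarding a strategy for over-fitting its training data.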
7.  A genome-wide association study of men with symptoms of testicular dysgenesis syndrome and its network biology interpretation 
Journal of Medical Genetics  2011;49(1):58-65.
Testicular dysgenesis syndrome (TDS) is a common disease that links testicular germ cell cancer, cryptorchidism and some cases of hypospadias and male infertility with impaired development of the testis. The incidence of these disorders has increased over the last few decades, and testicular cancer now affects 1% of the Danish and Norwegian male population.
To identify genetic variants that span the four TDS phenotypes, the authors performed a genome-wide association study (GWAS) using Affymetrix Human SNP Array 6.0 to screen 488 patients with symptoms of TDS and 439 selected controls with excellent reproductive health. Furthermore, they developed a novel integrative method that combines GWAS data with other TDS-relevant data types and identified additional TDS markers. The most significant findings were replicated in an independent cohort of 671 Nordic men.
Markers located in the region of TGFBR3 and BMP7 showed association with all TDS phenotypes in both the discovery and replication cohorts. An immunohistochemistry investigation confirmed the presence of transforming growth factor β receptor type III (TGFBR3) in peritubular and Leydig cells, in both fetal and adult testis. Single-nucleotide polymorphisms in the KITLG gene showed significant associations, but only with testicular cancer.
The association of single-nucleotide polymorphisms in the TGFBR3 and BMP7 genes, which belong to the transforming growth factor β signalling pathway, suggests a role for this pathway in the pathogenesis of TDS. Integrating data from multiple layers can highlight findings in GWAS that are biologically relevant despite having borderline significance at currently accepted statistical levels.
PMCID: PMC3284313  PMID: 22140272
TDS; systems biology; GWAS; infertility; testis cancer; reproductive medicine; genome-wide; genetics; epidemiology; diabetes; endocrinology; genetic epidemiology; cancer: urological; chromosomal; oncology; developmental
8.  Macrophages and Adipocytes in Human Obesity 
Diabetes  2009;58(7):1558-1567.
We investigated the regulation of adipose tissue gene expression during different phases of a dietary weight loss program and its relation with insulin sensitivity.
Twenty-two obese women followed a dietary intervention program composed of an energy restriction phase with a 4-week very-low-calorie diet and a weight stabilization period composed of a 2-month low-calorie diet followed by 3–4 months of a weight maintenance diet. At each time point, a euglycemic-hyperinsulinemic clamp and subcutaneous adipose tissue biopsies were performed. Adipose tissue gene expression profiling was performed using a DNA microarray in a subgroup of eight women. RT–quantitative PCR was used for determination of mRNA levels of 31 adipose tissue macrophage markers (n = 22).
Body weight, fat mass, and C-reactive protein level decreased and glucose disposal rate increased during the dietary intervention program. Transcriptome profiling revealed two main patterns of variations. The first involved 464 mostly adipocyte genes involved in metabolism that were downregulated during energy restriction, upregulated during weight stabilization, and unchanged during the dietary intervention. The second comprised 511 mainly macrophage genes involved in inflammatory pathways that were not changed or upregulated during energy restriction and downregulated during weight stabilization and dietary intervention. Accordingly, macrophage markers were upregulated during energy restriction and downregulated during weight stabilization and dietary intervention. The increase in glucose disposal rates in each dietary phase was associated with variation in expression of sets of 80–110 genes that differed among energy restriction, weight stabilization, and dietary intervention.
Adipose tissue macrophages and adipocytes show distinct patterns of gene regulation and association with insulin sensitivity during the various phases of a dietary weight loss program.
PMCID: PMC2699855  PMID: 19401422
9.  Metabolic Network Topology Reveals Transcriptional Regulatory Signatures of Type 2 Diabetes 
PLoS Computational Biology  2010;6(4):e1000729.
Type 2 diabetes mellitus (T2DM) is a disorder characterized by both insulin resistance and impaired insulin secretion. Recent transcriptomics studies related to T2DM have revealed changes in expression of a large number of metabolic genes in a variety of tissues. Identification of the molecular mechanisms underlying these transcriptional changes and their impact on the cellular metabolic phenotype is a challenging task due to the complexity of transcriptional regulation and the highly interconnected nature of the metabolic network. In this study we integrate skeletal muscle gene expression datasets with human metabolic network reconstructions to identify key metabolic regulatory features of T2DM. These features include reporter metabolites—metabolites with significant collective transcriptional response in the associated enzyme-coding genes, and transcription factors with significant enrichment of binding sites in the promoter regions of these genes. In addition to metabolites from TCA cycle, oxidative phosphorylation, and lipid metabolism (known to be associated with T2DM), we identified several reporter metabolites representing novel biomarker candidates. For example, the highly connected metabolites NAD+/NADH and ATP/ADP were also identified as reporter metabolites that are potentially contributing to the widespread gene expression changes observed in T2DM. An algorithm based on the analysis of the promoter regions of the genes associated with reporter metabolites revealed a transcription factor regulatory network connecting several parts of metabolism. The identified transcription factors include members of the CREB, NRF1 and PPAR family, among others, and represent regulatory targets for further experimental analysis. Overall, our results provide a holistic picture of key metabolic and regulatory nodes potentially involved in the pathogenesis of T2DM.
Author Summary
Type 2 diabetes mellitus is a complex metabolic disease recognized as one of the main threats to human health in the 21st century. Recent studies of gene expression levels in human tissue samples have indicated that multiple metabolic pathways are dysregulated in diabetes and in individuals at risk for diabetes; which of these are primary, or central to disease pathogenesis, remains a key question. Cellular metabolic networks are highly interconnected and often tightly regulated; any perturbations at a single node can thus rapidly diffuse to the rest of the network. Such complexity presents a considerable challenge in pinpointing key molecular mechanisms and biomarkers associated with insulin resistance and type 2 diabetes. In this study, we address this problem by using a methodology that integrates gene expression data with the human cellular metabolic network. We demonstrate our approach by analyzing gene expression patterns in skeletal muscle. The analysis identified transcription factors and metabolites that represent potential targets for therapeutic agents and future clinical diagnostics for type 2 diabetes and impaired glucose metabolism. In a broader perspective, the study provides a framework for analysis of gene expression datasets from complex diseases in the context of changes in cellular metabolism.
PMCID: PMC2848542  PMID: 20369014
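Entry 9 defines reporter metabolites as metabolites with a significant collective transcriptional response in their associated enzyme-coding genes. One common way to score such a collective response is an aggregated Z-score: each gene's p-value is converted to a normal Z-score, and the Z-scores of the genes around a metabolite are summed and divided by the square root of their count. The sketch below assumes that form, which the abstract does not spell out, and omits any background correction. The metabolite-to-gene mapping, gene names and p-values are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_standard_normal = NormalDist()

def gene_z(p_value):
    """Convert a gene-level p-value (strictly between 0 and 1) to a Z-score."""
    return _standard_normal.inv_cdf(1.0 - p_value)

def reporter_score(genes, p_values):
    """Aggregated Z-score of the genes associated with one metabolite."""
    zs = [gene_z(p_values[g]) for g in genes]
    return sum(zs) / sqrt(len(zs))

# HYPOTHETICAL differential-expression p-values for placeholder genes.
p_values = {
    "gene_a": 0.01, "gene_b": 0.02, "gene_c": 0.04,
    "gene_d": 0.40, "gene_e": 0.60,
}

# HYPOTHETICAL mapping from metabolites to the genes of neighbouring enzymes.
metabolite_genes = {
    "NAD+": ["gene_a", "gene_b", "gene_c"],
    "metabolite_x": ["gene_d", "gene_e"],
}

scores = {m: reporter_score(g, p_values) for m, g in metabolite_genes.items()}
for metabolite, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{metabolite}: aggregated Z = {score:.2f}")
```

Dividing by the square root of the number of genes keeps scores comparable between highly connected metabolites and those with only a few neighbouring enzymes.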
10.  Fatness-Associated FTO Gene Variant Increases Mortality Independent of Fatness – in Cohorts of Danish Men 
PLoS ONE  2009;4(2):e4428.
The A-allele of the single nucleotide polymorphism (SNP), rs9939609, in the FTO gene is associated with increased fatness. We hypothesized that the SNP is associated with morbidity and mortality through the effect on fatness.
Methodology/Principal Findings
In a population of 362,200 Danish young men, examined for military service between 1943 and 1977, all obese (BMI≥31.0 kg/m2) and a random 1% sample of the others were identified. In 1992–94, at an average age of 46 years, 752 of the obese and 876 of the others were re-examined, including measurements of weight, fat mass, height, and waist circumference, and DNA sampling. Hospitalization and death occurring during the following median 13.5 years were ascertained by linkage to national registers. Cox regression analyses were performed using a dominant effect model (TT vs. TA or AA). In total 205 men died. Mortality was 42% lower (p = 0.001) with the TT genotype than in A-allele carriers. This phenomenon was observed in both the obese and the randomly sampled cohort when analysed separately. Adjustment for fatness covariates attenuated the association only slightly. Exploratory analyses of cause-specific mortality and morbidity prior to death suggested a general protective effect of the TT genotype, whereas there were only weak associations with disease incidence, except for diseases of the nervous system.
Independent of fatness, the A-allele of the FTO SNP appears to increase mortality by a magnitude similar to that of smoking, with no particular underlying disease pattern apart from an increased risk of diseases of the nervous system.
PMCID: PMC2639637  PMID: 19214238
