Results 1-11 (11)
 

1.  Interpretation of Genomic Variants Using a Unified Biological Network Approach 
PLoS Computational Biology  2013;9(3):e1002886.
The decreasing cost of sequencing is leading to a growing repertoire of personal genomes. However, we are lagging behind in understanding the functional consequences of the millions of variants obtained from sequencing. Global system-wide effects of variants in coding genes are particularly poorly understood. It is known that while variants in some genes can lead to diseases, complete disruption of other genes, called ‘loss-of-function tolerant’, is possible with no obvious effect. Here, we build a systems-based classifier to quantitatively estimate the global perturbation caused by deleterious mutations in each gene. We first survey the degree to which gene centrality in various individual networks and a unified ‘Multinet’ correlates with the tolerance to loss-of-function mutations and evolutionary conservation. We find that functionally significant and highly conserved genes tend to be more central in physical protein-protein and regulatory networks. However, this is not the case for metabolic pathways, where the highly central genes have more duplicated copies and are more tolerant to loss-of-function mutations. Integration of three-dimensional protein structures reveals that the correlation with centrality in the protein-protein interaction network is also seen in terms of the number of interaction interfaces used. Finally, combining all the network and evolutionary properties allows us to build a classifier distinguishing functionally essential and loss-of-function tolerant genes with higher accuracy (AUC = 0.91) than any individual property. Application of the classifier to the whole genome shows its strong potential for interpretation of variants involved in Mendelian diseases and in complex disorders probed by genome-wide association studies.
Author Summary
The number of personal genomes sequenced has grown rapidly over the last few years and is likely to grow further. In order to use the DNA sequence variants amongst individuals for personalized medicine, we need to understand the functional impact of these variants. Deleterious variants in genes can have a wide spectrum of global effects, ranging from fatal for essential genes to no obvious damaging effect for loss-of-function tolerant genes. The global effect of a gene mutation is largely governed by the diverse biological networks in which the gene participates. Since genes participate in many networks, no singular network captures the global picture of gene interactions. Here we integrate the diverse modes of gene interactions (regulatory, genetic, phosphorylation, signaling, metabolic and physical protein-protein interactions) to create a unified biological network. We then exploit the unique properties of loss-of-function tolerant and essential genes in this unified network to build a computational model that can predict global perturbation caused by deleterious mutations in all genes. Our model can distinguish between these two gene sets with high accuracy and we further show that it can be used for interpretation of variants involved in Mendelian diseases and in complex disorders probed by genome-wide association studies.
doi:10.1371/journal.pcbi.1002886
PMCID: PMC3591262  PMID: 23505346
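The abstract above relates gene centrality to loss-of-function tolerance and reports classifier accuracy as an AUC. The sketch below is not the authors' classifier. It is a minimal illustration, assuming an undirected edge list and hand-picked labels, of how a degree centrality and a rank-based AUC can be computed. The gene names, labels, and the helper names `degree_centrality` and `auc` are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return degree / (n - 1) for every node of an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical toy network; names are placeholders, not data from the paper.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
centrality = degree_centrality(edges)

# Assumed labels, for illustration only.
essential = ["A", "B"]
lof_tolerant = ["D", "E"]

score = auc(
    [centrality[g] for g in essential],
    [centrality[g] for g in lof_tolerant],
)
print(f"AUC of degree centrality on the toy labels: {score:.3f}")
```

A single feature such as degree centrality is used here only to keep the example short; the paper describes combining several network and evolutionary properties.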
2.  A systematic survey of loss-of-function variants in human protein-coding genes 
Science (New York, N.Y.)  2012;335(6070):823-828.
Genome sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2,951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in non-essential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes, and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
doi:10.1126/science.1215040
PMCID: PMC3299548  PMID: 22344438
3.  Computational study of drug binding to the membrane-bound tetrameric M2 peptide bundle from influenza A virus 
Biochimica et biophysica acta  2010;1808(2):530-537.
The M2 protein of influenza A virus performs the crucial function of transporting protons to the interior of virions enclosed in the endosome. Adamantane drugs, amantadine (AMN) and rimantadine (RMN), block the proton conduction in some strains, and have been used for the treatment and prophylaxis of influenza A infections. The structures of the transmembrane (TM) region of M2 that have been solved in micelles using NMR (residues 23-60) [Schnell and Chou (2008)] and by X-ray crystallography (residues 22-46) [Stouffer et al. (2008)] suggest different drug binding sites: external and internal for RMN and AMN, respectively. We have used molecular dynamics (MD) simulations to investigate the nature of the binding site and binding mode of adamantane drugs on the membrane-bound tetrameric M2-TM peptide bundles using as initial conformations the low-pH AMN-bound crystal structure, a high-pH model derived from the drug-free crystal structure, and the high-pH NMR structure. The MD simulations indicate that under both low- and high-pH conditions, AMN is stable inside the tetrameric bundle, spanning the region from residue Val27 to Gly34. At low pH the polar group of AMN is oriented toward the His37 gate while under high-pH conditions its orientation exhibits large fluctuations. The present MD simulations also suggest that AMN and RMN molecules do not show strong affinity to the external binding sites.
doi:10.1016/j.bbamem.2010.03.025
PMCID: PMC2975046  PMID: 20385097
molecular dynamics; simulations; amantadine; adamantane; transmembrane; ion channel
4.  Mapping copy number variation by population scale genome sequencing 
Nature  2011;470(7332):59-65.
Summary
Genomic structural variants (SVs) are abundant in humans, differing from other variation classes in extent, origin, and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (i.e., copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analyzing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
doi:10.1038/nature09708
PMCID: PMC3077050  PMID: 21293372
5.  Segmental duplications in the human genome reveal details of pseudogene formation 
Nucleic Acids Research  2010;38(20):6997-7007.
Duplicated pseudogenes in the human genome are disabled copies of functioning parent genes. They result from block duplication events occurring throughout evolutionary history. Relatively recent duplications (with sequence similarity ≥90% and length ≥1 kb) are termed segmental duplications (SDs); here, we analyze the interrelationship of SDs and pseudogenes. We present a decision-tree approach to classify pseudogenes based on their (and their parents’) characteristics in relation to SDs. The classification identifies 140 novel pseudogenes and makes possible improved annotation for the 3172 pseudogenes located in SDs. In particular, it reveals that many pseudogenes in SDs likely did not arise directly from parent genes, but are the result of a multi-step process. In these cases, the initial duplication or retrotransposition of a parent gene gives rise to a ‘parent pseudogene’, followed by further duplication creating duplicated–duplicated or duplicated–processed pseudogenes, respectively. Moreover, we can precisely identify these parent pseudogenes by overlap with ancestral SD loci. Finally, a comparison of nucleotide substitutions per site in a pseudogene with its surrounding SD region allows us to estimate the time difference between duplication and disablement events, and this suggests that most duplicated pseudogenes in SDs were likely disabled around the time of the original duplication.
doi:10.1093/nar/gkq587
PMCID: PMC2978362  PMID: 20615899
6.  Using semantic web rules to reason on an ontology of pseudogenes 
Bioinformatics  2010;26(12):i71-i78.
Motivation: Recent years have seen the development of a wide range of biomedical ontologies. Notable among these is Sequence Ontology (SO) which offers a rich hierarchy of terms and relationships that can be used to annotate genomic data. Well-designed formal ontologies allow data to be reasoned upon in a consistent and logically sound way and can lead to the discovery of new relationships. The Semantic Web Rules Language (SWRL) augments the capabilities of a reasoner by allowing the creation of conditional rules. To date, however, formal reasoning, especially the use of SWRL rules, has not been widely used in biomedicine.
Results: We have built a knowledge base of human pseudogenes, extending the existing SO framework to incorporate additional attributes. In particular, we have defined the relationships between pseudogenes and segmental duplications. We then created a series of logical rules using SWRL to answer research questions and to annotate our pseudogenes appropriately. Finally, we were left with a knowledge base which could be queried to discover information about human pseudogene evolution.
Availability: The fully populated knowledge base described in this document is available for download from http://ontology.pseudogene.org. A SPARQL endpoint from which to query the dataset is also available at this location.
Contact: matthew.holford@yale.edu; mark.gerstein@yale.edu
doi:10.1093/bioinformatics/btq173
PMCID: PMC2881358  PMID: 20529940
7.  Artificial Transmembrane Oncoproteins Smaller than the Bovine Papillomavirus E5 Protein Redefine Sequence Requirements for Activation of the Platelet-Derived Growth Factor β Receptor 
Journal of Virology  2009;83(19):9773-9785.
The bovine papillomavirus E5 protein (BPV E5) is a 44-amino-acid homodimeric transmembrane protein that binds directly to the transmembrane domain of the platelet-derived growth factor (PDGF) β receptor and induces ligand-independent receptor activation. Three specific features of BPV E5 are considered important for its ability to activate the PDGF β receptor and transform mouse fibroblasts: a pair of C-terminal cysteines, a transmembrane glutamine, and a juxtamembrane aspartic acid. By using a new genetic technique to screen libraries expressing artificial transmembrane proteins for activators of the PDGF β receptor, we isolated much smaller proteins, from 32 to 36 residues, that lack all three of these features yet still dimerize noncovalently, specifically activate the PDGF β receptor via its transmembrane domain, and transform cells efficiently. The primary amino acid sequence of BPV E5 is virtually unrecognizable in some of these proteins, which share as few as seven consecutive amino acids with the viral protein. Thus, small artificial proteins that bear little resemblance to a viral oncoprotein can nevertheless productively interact with the same cellular target. We speculate that similar cellular proteins may exist but have been overlooked due to their small size and hydrophobicity.
doi:10.1128/JVI.00946-09
PMCID: PMC2748040  PMID: 19605488
8.  Computational analysis of membrane proteins: the largest class of drug targets 
Drug discovery today  2009;14(23-24):1130-1135.
Given the key roles of integral membrane proteins as transporters and channels, it is necessary to understand their structures and, hence, mechanisms and regulation at the molecular level. Membrane proteins represent ~30% of all proteins of currently sequenced genomes. Paradoxically, however, only ~2% of crystal structures deposited in the protein data bank are of membrane proteins, and very few of these are at high resolution (better than 2 Å). The great disparity between our understanding of soluble proteins and our understanding of membrane proteins is because of the practical problems of working with membrane proteins – specifically, difficulties in expression, purification and crystallization. Thus, computational modeling has been utilized extensively to make crucial advances in understanding membrane protein structure and function.
doi:10.1016/j.drudis.2009.08.006
PMCID: PMC2796609  PMID: 19733256
9.  Probing Peptide Nanotube Self-Assembly at a Liquid-Liquid Interface with Coarse-Grained Molecular Dynamics 
Nano letters  2008;8(11):3626-3630.
Self-assembly at a liquid-liquid interface is a powerful experimental route to novel nanomaterials. We report herein a computational study of peptide nanotube formation at an oil-water interface. We probe interfacial self-assembly and nanotube formation of the cyclic octapeptide, cyclo [(-L-Trp-D-Leu-)4] as an illustrative example. Individual peptide rings are rapidly adsorbed at the liquid-liquid interface where they self-assemble. Monomeric and dimeric peptide rings lie with their molecular planes mostly parallel to the interface. Longer oligomeric nanotubes are increasingly tilted at the interface and grow by an Ostwald ripening mechanism to eventually align their tube axis parallel to the interface. The present results on nanotube assembly suggest that computation will be a useful complement to experiment in understanding the nature of self-assembly of nanomaterials at liquid-liquid interfaces.
doi:10.1021/nl801564m
PMCID: PMC2696305  PMID: 18855461
10.  Comprehensive analysis of the pseudogenes of glycolytic enzymes in vertebrates: the anomalously high number of GAPDH pseudogenes highlights a recent burst of retrotranspositional activity 
BMC Genomics  2009;10:480.
Background
Pseudogenes provide a record of the molecular evolution of genes. As glycolysis is such a highly conserved and fundamental metabolic pathway, the pseudogenes of glycolytic enzymes comprise a standardized genomic measuring stick and an ideal platform for studying molecular evolution. One of the glycolytic enzymes, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), has already been noted to have one of the largest numbers of associated pseudogenes, among all proteins.
Results
We assembled the first comprehensive catalog of the processed and duplicated pseudogenes of glycolytic enzymes in many vertebrate model-organism genomes, including human, chimpanzee, mouse, rat, chicken, zebrafish, pufferfish, fruitfly, and worm (available at ). We found that glycolytic pseudogenes are predominantly processed, i.e. retrotransposed from the mRNA of their parent genes. Although each glycolytic enzyme plays a unique role, GAPDH has by far the most pseudogenes, perhaps reflecting its large number of non-glycolytic functions or its possession of a particularly retrotranspositionally active sub-sequence. Furthermore, the number of GAPDH pseudogenes varies significantly among the genomes we studied: none in zebrafish, pufferfish, fruitfly, and worm, 1 in chicken, 50 in chimpanzee, 62 in human, 331 in mouse, and 364 in rat. Next, we developed a simple method of identifying conserved syntenic blocks (consistently applicable to the wide range of organisms in the study) by using orthologous genes as anchors delimiting a conserved block between a pair of genomes. This approach showed that few glycolytic pseudogenes are shared between primate and rodent lineages. Finally, by estimating pseudogene ages using Kimura's two-parameter model of nucleotide substitution, we found evidence for bursts of retrotranspositional activity approximately 42, 36, and 26 million years ago in the human, mouse, and rat lineages, respectively.
Conclusion
Overall, we performed a consistent analysis of one group of pseudogenes across multiple genomes, finding evidence that most of them were created within the last 50 million years, subsequent to the divergence of rodent and primate lineages.
doi:10.1186/1471-2164-10-480
PMCID: PMC2770531  PMID: 19835609
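The abstract above mentions estimating pseudogene ages with Kimura's two-parameter model. As a hedged illustration of that model only, the sketch below computes the standard two-parameter distance, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The function name `k2p_distance` and the example sequences are hypothetical, and the authors' alignment and age-calibration steps are not reproduced here: converting a distance to an age needs a substitution rate that the abstract does not give.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0.0 or term2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Hypothetical 10-site alignment: one transition (A->G), one transversion (A->C).
parent = "AAAAAAAAAA"
pseudogene = "GAAAAAAAAC"
distance = k2p_distance(parent, pseudogene)
print(f"K2P distance: {distance:.4f}")
```

The correction matters because raw mismatch counts underestimate divergence once multiple substitutions hit the same site; for the toy alignment the corrected distance is larger than the 0.2 observed mismatch fraction.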
11.  Pseudofam: the pseudogene families database 
Nucleic Acids Research  2008;37(Database issue):D738-D743.
Pseudofam (http://pseudofam.pseudogene.org) is a database of pseudogene families based on the protein families from the Pfam database. It provides resources for analyzing the family structure of pseudogenes including query tools, statistical summaries and sequence alignments. The current version of Pseudofam contains more than 125 000 pseudogenes identified from 10 eukaryotic genomes and aligned within nearly 3000 families (approximately one-third of the total families in PfamA). Pseudofam uses a large-scale parallelized homology search algorithm (implemented as an extension of the PseudoPipe pipeline) to identify pseudogenes. Each identified pseudogene is assigned to its parent protein family and subsequently aligned to each other by transferring the parent domain alignments from the Pfam family. Pseudogenes are also given additional annotation based on an ontology, reflecting their mode of creation and subsequent history. In particular, our annotation highlights the association of pseudogene families with genomic features, such as segmental duplications. In addition, pseudogene families are associated with key statistics, which identify outlier families with an unusual degree of pseudogenization. The statistics also show how the number of genes and pseudogenes in families correlates across different species. Overall, they highlight the fact that housekeeping families tend to be enriched with a large number of pseudogenes.
doi:10.1093/nar/gkn758
PMCID: PMC2686518  PMID: 18957444
