1.  MORPHIN: a web tool for human disease research by projecting model organism biology onto a human integrated gene network 
Nucleic Acids Research  2014;42(Web Server issue):W147-W153.
Despite recent advances in human genetics, model organisms are indispensable for human disease research. Most human disease pathways are evolutionarily conserved among other species, where they may phenocopy the human condition or be associated with seemingly unrelated phenotypes. Much of the known gene-to-phenotype association information is distributed across diverse databases, growing rapidly due to new experimental techniques. Accessible bioinformatics tools will therefore facilitate translation of discoveries from model organisms into human disease biology. Here, we present a web-based discovery tool for human disease studies, MORPHIN (model organisms projected on a human integrated gene network), which prioritizes the most relevant human diseases for a given set of model organism genes, potentially highlighting new model systems for human diseases and providing context to model organism studies. Conceptually, MORPHIN investigates human diseases by an orthology-based projection of a set of model organism genes onto a genome-scale human gene network. MORPHIN then prioritizes human diseases by relevance to the projected model organism genes using two distinct methods: a conventional overlap-based gene set enrichment analysis and a network-based measure of closeness between the query and disease gene sets capable of detecting associations undetectable by the conventional overlap-based methods. MORPHIN is freely accessible at
PMCID: PMC4086117  PMID: 24861622
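As an illustration of the "conventional overlap-based gene set enrichment analysis" mentioned in the MORPHIN abstract above, the following is a minimal sketch of a one-sided hypergeometric test. It is not MORPHIN's code; the function name, its parameters, and the choice of the hypergeometric test are assumptions made for this example.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, disease_size, query_size, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    Hypothetical helper: assumes query_size genes are drawn without
    replacement from universe_size genes, of which disease_size are
    annotated to the disease of interest.
    """
    max_overlap = min(disease_size, query_size)
    total = comb(universe_size, query_size)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(disease_size, k) * comb(universe_size - disease_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers: 10 genes in total, 3 disease genes, 3 query genes, all 3 shared.
p_value = hypergeom_enrichment_p(10, 3, 3, 3)
```

A small p-value suggests that the query and disease gene sets share more genes than expected by chance. Associations with no direct overlap would need the network-based measure the abstract describes, which this sketch does not cover.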
2.  Bacteriophages use an expanded genetic code on evolutionary paths to higher fitness 
Nature chemical biology  2014;10(3):178-180.
Bioengineering advances have made it possible to fundamentally alter the genetic codes of organisms. However, the evolutionary consequences of expanding an organism's genetic code with a non-canonical amino acid are poorly understood. Here we show that bacteriophages evolved on a host that incorporates 3-iodotyrosine at the amber stop codon acquired neutral and beneficial mutations to this new amino acid in their proteins, demonstrating that an expanded genetic code increases evolvability.
PMCID: PMC3932624  PMID: 24487692
3.  Global signatures of protein and mRNA expression levels 
Molecular bioSystems  2009;5(12):1512-1526.
Cellular states are determined by differential expression of the cell’s proteins. The relationship between protein and mRNA expression levels informs about the combined outcomes of translation and protein degradation which are, in addition to transcription and mRNA stability, essential contributors to gene expression regulation. This review summarizes the state of knowledge about large-scale measurements of absolute protein and mRNA expression levels, and the degree of correlation between the two parameters. We summarize the information that can be derived from comparison of protein and mRNA expression levels and discuss how corresponding sequence characteristics suggest modes of regulation.
PMCID: PMC4089977  PMID: 20023718
4.  A role for central spindle proteins in cilia structure and function 
Cytoskeleton (Hoboken, N.J.)  2011;68(2):112-124.
Cytokinesis and ciliogenesis are fundamental cellular processes that require strict coordination of microtubule organization and directed membrane trafficking. These processes have been intensely studied, but there has been little indication that regulatory machinery might be extensively shared between them. Here, we show that several central spindle/midbody proteins (PRC1, MKLP-1, INCENP, centriolin) also localize in specific patterns at the basal body complex in vertebrate ciliated epithelial cells. Moreover, bioinformatic comparisons of midbody and cilia proteomes reveal a highly significant degree of overlap. Finally, we used temperature-sensitive alleles of PRC1/spd-1 and MKLP-1/zen-4 in C. elegans to assess ciliary functions while bypassing these proteins' early role in cell division. These mutants displayed defects in both cilia function and cilia morphology. Together, these data suggest the conserved re-use of a surprisingly large number of proteins in the cytokinetic apparatus and in cilia.
PMCID: PMC4089984  PMID: 21246755
Ciliogenesis; cytokinesis; PRC1; INCENP; MKLP-1; bioinformatics; cilia midbody
5.  Dynamic Reorganization of Metabolic Enzymes into Intracellular Bodies 
Both focused and large-scale cell biological and biochemical studies have revealed that hundreds of metabolic enzymes across diverse organisms form large intracellular bodies. These proteinaceous bodies range in form from fibers and intracellular foci—such as those formed by enzymes of nitrogen and carbon utilization and of nucleotide biosynthesis—to high-density packings inside bacterial microcompartments and eukaryotic microbodies. Although many enzymes clearly form functional mega-assemblies, it is not yet clear for many recently discovered cases whether they represent functional entities, storage bodies, or aggregates. In this article, we survey intracellular protein bodies formed by metabolic enzymes, asking when and why such bodies form and what their formation implies for the functionality—and dysfunctionality—of the enzymes that comprise them. The panoply of intracellular protein bodies also raises interesting questions regarding their evolution and maintenance within cells. We speculate on models for how such structures form in the first place and why they may be inevitable.
PMCID: PMC4089986  PMID: 23057741
self-assembly; allosteric regulation; fibers; foci; storage bodies; aggregates; metabolic efficiency
6.  Pseudomonas aeruginosa Enhances Production of a Non-Alginate Exopolysaccharide during Long-Term Colonization of the Cystic Fibrosis Lung 
PLoS ONE  2013;8(12):e82621.
The gram-negative opportunistic pathogen Pseudomonas aeruginosa is the primary cause of chronic respiratory infections in individuals with the heritable disease cystic fibrosis (CF). These infections can last for decades, during which time P. aeruginosa has been proposed to acquire beneficial traits via adaptive evolution. Because CF lacks an animal model that can acquire chronic P. aeruginosa infections, identifying genes important for long-term in vivo fitness remains difficult. However, since clonal, chronological samples can be obtained from chronically infected individuals, traits undergoing adaptive evolution can be identified. Recently we identified 24 P. aeruginosa gene expression traits undergoing parallel evolution in vivo in multiple individuals, suggesting they are beneficial to the bacterium. The goal of this study was to determine if these genes impact P. aeruginosa phenotypes important for survival in the CF lung. By using a gain-of-function genetic screen, we found that 4 genes and 2 operons undergoing parallel evolution in vivo promote P. aeruginosa biofilm formation. These genes/operons promote biofilm formation by increasing levels of the non-alginate exopolysaccharide Psl. One of these genes, phaF, enhances Psl production via a post-transcriptional mechanism, while the other 5 genes/operons do not act on either psl transcription or translation. Together, these data demonstrate that P. aeruginosa has evolved at least two pathways to over-produce a non-alginate exopolysaccharide during long-term colonization of the CF lung. More broadly, this approach allowed us to attribute a biological significance to genes with unknown function, demonstrating the power of using evolution as a guide for targeted genetic studies.
PMCID: PMC3855792  PMID: 24324811
7.  Id2a functions to limit Notch pathway activity and thereby influence the transition from proliferation to differentiation of retinoblasts during zebrafish retinogenesis 
Developmental biology  2012;371(2):280-292.
During vertebrate retinogenesis, the precise balance between retinoblast proliferation and differentiation is spatially and temporally regulated through a number of intrinsic factors and extrinsic signaling pathways. Moreover, there are complex gene regulatory network interactions between these intrinsic factors and extrinsic pathways, which ultimately function to determine when retinoblasts exit the cell cycle and terminally differentiate. We recently uncovered a cell non-autonomous role for the intrinsic HLH factor, Id2a, in regulating retinoblast proliferation and differentiation, with Id2a-deficient retinae containing an abundance of proliferative retinoblasts and an absence of terminally differentiated retinal neurons and glia. Here, we report that Id2a function is necessary and sufficient to limit Notch pathway activity during retinogenesis. Id2a-deficient retinae possess elevated levels of Notch pathway component gene expression, while retinae overexpressing id2a possess reduced expression of Notch pathway component genes. Attenuation of Notch signaling activity by DAPT or by morpholino knockdown of Notch1a is sufficient to rescue both the proliferative and differentiation defects in Id2a-deficient retinae. In addition to regulating Notch pathway activity, through a novel RNA-Seq and differential gene expression analysis of Id2a-deficient retinae, we identify a number of additional intrinsic and extrinsic regulatory pathway components whose expression is regulated by Id2a. These data highlight the integral role played by Id2a in the gene regulatory network governing the transition from retinoblast proliferation to terminal differentiation during vertebrate retinogenesis.
PMCID: PMC3477674  PMID: 22981606
Id2a; Id2; retinogenesis; Notch; zebrafish
8.  Correction: Prediction and Validation of Gene-Disease Associations Using Methods Inspired by Social Network Analyses 
PLoS ONE  2013;8(9):10.1371/annotation/5aeb88a0-1630-4a07-bb49-32cb5d617af1.
PMCID: PMC3779276
9.  A Census of Human Soluble Protein Complexes 
Cell  2012;150(5):1068-1081.
Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions which were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably-associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes, and encompass both candidate disease genes and unannotated proteins to inform on mechanism. Strikingly, whereas larger multi-protein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with 5 or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.
PMCID: PMC3477804  PMID: 22939629
10.  A Bacteriophage Tailspike Domain Promotes Self-Cleavage of a Human Membrane-Bound Transcription Factor, the Myelin Regulatory Factor MYRF 
PLoS Biology  2013;11(8):e1001624.
Myelination of the central nervous system (CNS) is critical to vertebrate nervous systems for efficient neural signaling. CNS myelination occurs as oligodendrocytes terminally differentiate, a process regulated in part by the myelin regulatory factor, MYRF. Using bioinformatics and extensive biochemical and functional assays, we find that MYRF is generated as an integral membrane protein that must be processed to release its transcription factor domain from the membrane. In contrast to most membrane-bound transcription factors, MYRF proteolysis seems constitutive and independent of cell- and tissue-type, as we demonstrate by reconstitution in E. coli and yeast. The apparent absence of physiological cues raises the question as to how and why MYRF is processed. By using computational methods capable of recognizing extremely divergent sequence homology, we identified a MYRF protein domain distantly related to bacteriophage tailspike proteins. Although occurring in otherwise unrelated proteins, the phage domains are known to chaperone the tailspike proteins' trimerization and auto-cleavage, raising the hypothesis that the MYRF domain might contribute to a novel activation method for a membrane-bound transcription factor. We find that the MYRF domain indeed serves as an intramolecular chaperone that facilitates MYRF trimerization and proteolysis. Functional assays confirm that the chaperone domain-mediated auto-proteolysis is essential both for MYRF's transcriptional activity and its ability to promote oligodendrocyte maturation. This work thus reveals a previously unknown key step in CNS myelination. These data also reconcile conflicting observations of this protein family, different members of which have been identified as transmembrane or nuclear proteins. 
Finally, our data illustrate a remarkable evolutionary repurposing between bacteriophages and eukaryotes, with a chaperone domain capable of catalyzing trimerization-dependent auto-proteolysis in two entirely distinct protein and cellular contexts, in one case participating in bacteriophage tailspike maturation and in the other activating a key transcription factor for CNS myelination.
Author Summary
Membrane-bound transcription factors are synthesized as integral membrane proteins, but are proteolytically cleaved in response to relevant cues, untethering their transcription factor domains from the membrane to control gene expression in the nucleus. Here, we find that the myelin regulatory factor MYRF, a major transcriptional regulator of oligodendrocyte differentiation and central nervous system myelination, is also a membrane-bound transcription factor. In marked contrast to most well-known membrane-bound transcription factors, cleavage of MYRF appears to be unconditional. Surprisingly, this processing is performed by a protein domain shared with bacteriophages in otherwise unrelated proteins, where the domain is critical to the folding and proteolytic maturation of virus tailspikes. In addition to revealing a previously unknown key step in central nervous system myelination, this work also illustrates a remarkable example of evolutionary repurposing between bacteriophages and eukaryotes, with the same protein domain capable of catalyzing trimerization-dependent auto-proteolysis in two completely distinct protein and cellular contexts.
PMCID: PMC3742443  PMID: 23966832
11.  Role of Pseudomonas aeruginosa Peptidoglycan-Associated Outer Membrane Proteins in Vesicle Formation 
Journal of Bacteriology  2013;195(2):213-219.
Gram-negative bacteria produce outer membrane vesicles (OMVs) that package and deliver proteins, small molecules, and DNA to prokaryotic and eukaryotic cells. The molecular details of OMV biogenesis have not been fully elucidated, but peptidoglycan-associated outer membrane proteins that tether the outer membrane to the underlying peptidoglycan have been shown to be critical for OMV formation in multiple Enterobacteriaceae. In this study, we demonstrate that the peptidoglycan-associated outer membrane proteins OprF and OprI, but not OprL, impact production of OMVs by the opportunistic pathogen Pseudomonas aeruginosa. Interestingly, OprF does not appear to be important for tethering the outer membrane to peptidoglycan but instead impacts OMV formation through modulation of the levels of the Pseudomonas quinolone signal (PQS), a quorum signal previously shown by our laboratory to be critical for OMV formation. Thus, the mechanism by which OprF impacts OMV formation is distinct from that for other peptidoglycan-associated outer membrane proteins, including OprI.
PMCID: PMC3553829  PMID: 23123904
12.  A flaw in the typical evaluation scheme for pair-input computational predictions 
Nature methods  2012;9(12):1134-1136.
PMCID: PMC3531800  PMID: 23223166
13.  Label-Free Protein Quantitation Using Weighted Spectral Counting 
Methods in molecular biology (Clifton, N.J.)  2012;893:10.1007/978-1-61779-885-6_20.
Mass spectrometry (MS)-based shotgun proteomics allows protein identifications even in complex biological samples. Protein abundances can then be estimated from the counts of MS/MS spectra attributable to each protein, provided that one corrects for differential MS-detectability of the contributing peptides. We describe the use of a method, APEX, which calculates Absolute Protein EXpression levels based on learned correction factors, MS/MS spectral counts, and each protein's probability of correct identification.
The APEX-based calculations consist of three parts: (1) Using training data, peptide sequences and their sequence properties, a model is built that can be used to estimate MS-detectability (Oi) for any given protein. (2) Absolute abundances of proteins measured in an MS/MS experiment are calculated with information from spectral counts, identification probabilities and the learned Oi -values. (3) Simple statistics allow for significance analysis of differential expression in two distinct biological samples, i.e., measuring relative protein abundances. APEX-based protein abundances span more than four orders of magnitude and are applicable to mixtures of hundreds to thousands of proteins from any type of organism.
PMCID: PMC3654649  PMID: 22665309
Quantitative proteomics; Protein expression; Label-free mass spectrometry; Spectral counting
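Step (2) of the APEX description above can be sketched as follows. The abstract names the ingredients (spectral counts, identification probabilities, learned Oi values) but not the formula, so the combination used here, count times probability divided by Oi and normalized to a total, is an assumption. The function name and the `total_molecules` scaling constant are likewise hypothetical.

```python
def apex_abundances(spectral_counts, id_probs, detectability, total_molecules):
    """Estimate absolute abundances from spectral counts.

    Assumed formula (not taken from the abstract):
        abundance_i = (n_i * p_i / O_i) / sum_k(n_k * p_k / O_k) * total
    where n is the spectral count, p the identification probability and
    O the learned MS-detectability of each protein.
    """
    weighted = {
        protein: spectral_counts[protein] * id_probs[protein] / detectability[protein]
        for protein in spectral_counts
    }
    norm = sum(weighted.values())
    return {protein: w / norm * total_molecules for protein, w in weighted.items()}


# Toy input: equal counts, but protein B is twice as detectable as protein A,
# so B's corrected abundance comes out at half of A's.
abundances = apex_abundances(
    spectral_counts={"A": 10, "B": 10},
    id_probs={"A": 1.0, "B": 1.0},
    detectability={"A": 1.0, "B": 2.0},
    total_molecules=300,
)
```

Dividing by the detectability is what keeps easily observed proteins from looking more abundant than they are.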
14.  Insights into the regulation of protein abundance from proteomic and transcriptomic analyses 
Nature reviews. Genetics  2012;13(4):227-232.
Recent advances in next-generation DNA sequencing and proteomics provide an unprecedented ability to survey mRNA and protein abundances. Such proteome-wide surveys are illuminating the extent to which different aspects of gene expression help to regulate cellular protein abundances. Current data demonstrate a substantial role for regulatory processes occurring after mRNA is made — that is, post-transcriptional, translational and protein degradation regulation — in controlling steady-state protein abundances. Intriguing observations are also emerging in relation to cells following perturbation, single-cell studies and the apparent evolutionary conservation of protein and mRNA abundances. Here, we summarize current understanding of the major factors regulating protein expression.
PMCID: PMC3654667  PMID: 22411467
15.  High-Throughput Immunofluorescence Microscopy Using Yeast Spheroplast Cell-Based Microarrays 
We have described a protocol for performing high-throughput immunofluorescence microscopy on microarrays of yeast cells. This approach employs immunostaining of spheroplasted yeast cells printed as high-density cell microarrays, followed by imaging using automated microscopy. A yeast spheroplast microarray can contain more than 5,000 printed spots, each containing cells from a given yeast strain, and is thus suitable for genome-wide screens focusing on single cell phenotypes, such as systematic localization or co-localization studies or genetic assays for genes affecting probed targets. We demonstrate the use of yeast spheroplast microarrays to probe microtubule and spindle defects across a collection of yeast strains harboring tetracycline-down-regulatable alleles of essential genes.
PMCID: PMC3654672  PMID: 21104056
Yeast; immunofluorescence; high-throughput microscopy; cell microarrays; microtubule
16.  Transiently Transfected Purine Biosynthetic Enzymes Form Stress Bodies 
PLoS ONE  2013;8(2):e56203.
It has been hypothesized that components of enzymatic pathways might organize into intracellular assemblies to improve their catalytic efficiency or lead to coordinate regulation. Accordingly, de novo purine biosynthesis enzymes may form a purinosome in the absence of purines, and a punctate intracellular body has been identified as the purinosome. We investigated the mechanism by which human de novo purine biosynthetic enzymes might be organized into purinosomes, especially under differing cellular conditions. Irrespective of the activity of bodies formed by endogenous enzymes, we demonstrate that intracellular bodies formed by transiently transfected, fluorescently tagged human purine biosynthesis proteins are best explained as protein aggregation.
PMCID: PMC3566086  PMID: 23405267
17.  Revisiting the negative example sampling problem for predicting protein–protein interactions 
Bioinformatics  2011;27(21):3024-3028.
Motivation: A number of computational methods have been proposed that predict protein–protein interactions (PPIs) based on protein sequence features. Since the number of potential non-interacting protein pairs (negative PPIs) is very high both in absolute terms and in comparison to that of interacting protein pairs (positive PPIs), computational prediction methods rely upon subsets of negative PPIs for training and validation. Hence, the need arises for subset sampling for negative PPIs.
Results: We clarify that there are two fundamentally different types of subset sampling for negative PPIs. One is subset sampling for cross-validated testing, where one desires unbiased subsets so that predictive performance estimated with them can be safely assumed to generalize to the population level. The other is subset sampling for training, where one desires the subsets that best train predictive algorithms, even if these subsets are biased. We show that confusion between these two fundamentally different types of subset sampling led one study recently published in Bioinformatics to the erroneous conclusion that predictive algorithms based on protein sequence features are hardly better than random in predicting PPIs. Rather, both protein sequence features and the ‘hubbiness’ of interacting proteins contribute to effective prediction of PPIs. We provide guidance for appropriate use of random versus balanced sampling.
Availability: The datasets used for this study are available at
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3198576  PMID: 21908540
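The distinction between random and balanced sampling of negative PPIs in the entry above can be made concrete with a small sketch. It assumes that "balanced" means each protein appears in the negative set as often as in the positive set; the abstract does not define the term, and both function names are hypothetical.

```python
import random
from itertools import combinations


def random_negatives(proteins, positives, n, seed=0):
    """Draw n non-interacting pairs uniformly at random (unbiased subset)."""
    rng = random.Random(seed)
    known = {frozenset(pair) for pair in positives}
    candidates = [
        pair for pair in combinations(sorted(proteins), 2)
        if frozenset(pair) not in known
    ]
    return rng.sample(candidates, n)


def balanced_negatives(positives, seed=0, max_tries=1000):
    """Draw negatives in which each protein keeps its positive-set frequency.

    Assumed definition of 'balanced'. Uses simple rejection sampling:
    shuffle one slot per positive-pair membership, pair the slots up, and
    retry if any pair is a self-pair, a known positive, or a duplicate.
    """
    rng = random.Random(seed)
    known = {frozenset(pair) for pair in positives}
    slots = [protein for pair in positives for protein in pair]
    for _ in range(max_tries):
        rng.shuffle(slots)
        pairs = [frozenset(slots[i:i + 2]) for i in range(0, len(slots), 2)]
        valid = all(len(p) == 2 and p not in known for p in pairs)
        if valid and len(set(pairs)) == len(pairs):
            return [tuple(sorted(p)) for p in pairs]
    raise RuntimeError("no balanced negative set found within max_tries")


positives = [("A", "B"), ("C", "D")]
unbiased = random_negatives({"A", "B", "C", "D"}, positives, n=2)
balanced = balanced_negatives(positives)
```

Following the abstract's reasoning, the uniform sampler is the kind of subset suited to cross-validated testing, while a balanced subset is biased by construction and may be used for training.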
18.  Evolutionarily Repurposed Networks Reveal the Well-Known Antifungal Drug Thiabendazole to Be a Novel Vascular Disrupting Agent 
PLoS Biology  2012;10(8):e1001379.
Analysis of a genetic module repurposed between yeast and vertebrates reveals that a common antifungal medication is also a potent vascular disrupting agent.
Studies in diverse organisms have revealed a surprising depth to the evolutionary conservation of genetic modules. For example, a systematic analysis of such conserved modules has recently shown that genes in yeast that maintain cell walls have been repurposed in vertebrates to regulate vein and artery growth. We reasoned that by analyzing this particular module, we might identify small molecules targeting the yeast pathway that also act as angiogenesis inhibitors suitable for chemotherapy. This insight led to the finding that thiabendazole, an orally available antifungal drug in clinical use for 40 years, also potently inhibits angiogenesis in animal models and in human cells. Moreover, in vivo time-lapse imaging revealed that thiabendazole reversibly disassembles newly established blood vessels, marking it as a vascular disrupting agent (VDA) and thus as a potential complementary therapeutic for use in combination with current anti-angiogenic therapies. Importantly, we also show that thiabendazole slows tumor growth and decreases vascular density in preclinical fibrosarcoma xenografts. Thus, an exploration of the evolutionary repurposing of gene networks has led directly to the identification of a potential new therapeutic application for an inexpensive drug that is already approved for clinical use in humans.
Author Summary
Yeast cells and vertebrate blood vessels would not seem to have much in common. However, we have discovered that during the course of evolution, a group of proteins whose function in yeast is to maintain cell walls has found an alternative use in vertebrates regulating angiogenesis. This remarkable repurposing of the proteins during evolution led us to hypothesize that, despite the different functions of the proteins in humans compared to yeast, drugs that modulated the yeast pathway might also modulate angiogenesis in humans and in animal models. One compound seemed a particularly promising candidate for this sort of approach: thiabendazole (TBZ), which has been in clinical use as a systemic antifungal and deworming treatment for 40 years. Gratifyingly, our study shows that TBZ is indeed able to act as a vascular disrupting agent and an angiogenesis inhibitor. Notably, TBZ also slowed tumor growth and decreased vascular density in human tumors grafted into mice. TBZ’s historical safety data and low cost make it an outstanding candidate for translation to clinical use as a complement to current anti-angiogenic strategies for the treatment of cancer. Our work demonstrates how model organisms from distant branches of the evolutionary tree can be exploited to arrive at a promising new drug.
PMCID: PMC3423972  PMID: 22927795
19.  Protein abundances are more conserved than mRNA abundances across diverse taxa 
Proteomics  2010;10(23):4209-4212.
Proteins play major roles in most biological processes; as a consequence, protein expression levels are highly regulated. While extensive post-transcriptional, translational and protein degradation control clearly influences protein concentration and functionality, it is often thought that protein abundances are primarily determined by the abundances of the corresponding mRNAs. Surprisingly, then, a recent study showed that abundances of orthologous nematode and fly proteins correlate better than their corresponding mRNA abundances. We tested if this phenomenon is general by collecting and testing matching large-scale protein and mRNA expression datasets from seven different species: two bacteria, yeast, nematode, fly, human, and plant. We find that steady-state abundances of proteins show significantly higher correlation across these diverse phylogenetic taxa than the abundances of their corresponding mRNAs (p=0.0008, paired Wilcoxon). These data support the presence of strong selective pressure to maintain protein abundances during evolution, even when mRNA abundances diverge.
PMCID: PMC3113407  PMID: 21089048
20.  It’s the machine that matters: Predicting gene function and phenotype from protein networks 
Journal of proteomics  2010;73(11):2277-2289.
Increasing knowledge about the organization of proteins into complexes, systems, and pathways has led to a flowering of theoretical approaches for exploiting this knowledge in order to better learn the functions of proteins and their roles underlying phenotypic traits and diseases. Much of this body of theory has been developed and tested in model organisms, relying on their relative simplicity and genetic and biochemical tractability to accelerate the research. In this review, we discuss several of the major approaches for computationally integrating proteomics and genomics observations into integrated protein networks, then applying guilt-by-association in these networks in order to identify genes underlying traits. Recent trends in this field include a rising appreciation of the modular network organization of proteins underlying traits or mutational phenotypes, and how to exploit such protein modularity using computational approaches related to the internet search algorithm PageRank. Many protein network-based predictions have recently been experimentally confirmed in yeast, worms, plants, and mice, and several successful approaches in model organisms have been directly translated to analyze human disease, with notable recent applications to glioma and breast cancer prognosis.
PMCID: PMC2953423  PMID: 20637909
Data integration; Function prediction; Humans; Model organisms; Phenotype prediction; Protein interaction networks
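The network guilt-by-association idea in the review above, using "computational approaches related to the internet search algorithm PageRank", can be sketched as a personalized PageRank that restarts at genes already linked to a trait. This is a generic illustration rather than any specific published method; the function name, the damping value, and the toy network are assumptions.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Score every node by its network proximity to the seed genes.

    graph: dict mapping each node to a list of neighbours; every
           neighbour is assumed to also be a key of the dict.
    seeds: set of nodes already associated with the trait.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        # Each round, a (1 - damping) share of the walk restarts at the seeds.
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                # Isolated node: hand its mass back to the seeds.
                for s in seeds:
                    new_rank[s] += damping * rank[n] / len(seeds)
                continue
            share = damping * rank[n] / len(neighbours)
            for m in neighbours:
                new_rank[m] += share
        rank = new_rank
    return rank


# Toy network: A-B-C form one module, D-E form an unconnected one.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": ["E"], "E": ["D"]}
scores = personalized_pagerank(network, seeds={"A"})
candidates = sorted((g for g in network if g != "A"), key=scores.get, reverse=True)
```

Genes in the same module as the seed receive nonzero scores and rank ahead of genes in the disconnected module, which is the modularity effect the review highlights.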
21.  MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines 
Journal of proteome research  2011;10(7):2949-2958.
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for all possible PSMs and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for all detected proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.
PMCID: PMC3128686  PMID: 21488652
integrative analysis; database search; peptide identification
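To show how a per-PSM probability score can yield a false discovery rate estimate, as described in the MSblender abstract above, here is a minimal sketch using a common convention: the expected fraction of incorrect matches among accepted PSMs is the mean of (1 - probability). This is not MSblender's model or code; the function name and the estimator are assumptions.

```python
def estimated_fdr(posteriors, threshold):
    """Estimate the FDR among PSMs whose probability is >= threshold.

    Assumed estimator: mean of (1 - p) over accepted PSMs, where p is
    the probability that a peptide-spectrum match is correct.
    """
    accepted = [p for p in posteriors if p >= threshold]
    if not accepted:
        return 0.0  # nothing accepted, so no false discoveries to count
    return sum(1.0 - p for p in accepted) / len(accepted)


# Toy probabilities for four PSMs; accepting those at or above 0.9
# keeps the first two.
psm_probabilities = [0.99, 0.95, 0.60, 0.20]
fdr_at_090 = estimated_fdr(psm_probabilities, threshold=0.9)
```

Lowering the threshold accepts more PSMs at the cost of a higher estimated FDR, which is why methods are compared at the same false discovery rate.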
22.  Characterising and Predicting Haploinsufficiency in the Human Genome 
PLoS Genetics  2010;6(10):e1001154.
Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We have compiled a map of 1,079 haplosufficient (HS) genes by systematic identification of genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals and contrasted the genomic, evolutionary, functional, and network properties between these HS genes and known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions of haploinsufficiency by demonstrating that genes with a high predicted probability of exhibiting haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We have transformed these gene-based haploinsufficiency predictions into haploinsufficiency scores for genic deletions, which we demonstrate to better discriminate between pathogenic and benign deletions than consideration of the deletion size or numbers of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.
Author Summary
Humans, like most complex organisms, have two copies of most genes in their genome, one from the mother and one from the father. This redundancy provides a back-up copy for most genes, should one copy be lost through mutation. For a minority of genes, one functional copy is not enough to sustain normal human function, and mutations causing the loss of function of one of the copies of such genes are a major cause of childhood developmental diseases. Over the past 20 years medical geneticists have identified over 300 such genes, but it is not known how many of the 22,000 genes in our genome may also be sensitive to gene loss. By comparing these ∼300 genes known to be sensitive to gene loss with over 1,000 genes where loss of a single copy does not result in disease, we have identified some key evolutionary and functional similarities between genes sensitive to loss of a single copy. We have used these similarities to predict, for most genes in the genome, whether loss of a single copy is likely to result in disease. These predictions will help in the interpretation of mutations seen in patients.
PMCID: PMC2954820  PMID: 20976243
23.  Parallel Evolution in Pseudomonas aeruginosa over 39,000 Generations In Vivo 
mBio  2010;1(4):e00199-10.
The Gram-negative bacterium Pseudomonas aeruginosa is a common cause of chronic airway infections in individuals with the heritable disease cystic fibrosis (CF). After prolonged colonization of the CF lung, P. aeruginosa becomes highly resistant to host clearance and antibiotic treatment; therefore, understanding how this bacterium evolves during chronic infection is important for identifying beneficial adaptations that could be targeted therapeutically. To identify potential adaptive traits of P. aeruginosa during chronic infection, we carried out global transcriptomic profiling of chronological clonal isolates obtained from 3 individuals with CF. Isolates were collected sequentially over periods ranging from 3 months to 8 years, representing up to 39,000 in vivo generations. We identified 24 genes that were commonly regulated by all 3 P. aeruginosa lineages, including several genes encoding traits previously shown to be important for in vivo growth. Our results reveal that parallel evolution occurs in the CF lung and that at least a proportion of the traits identified are beneficial for P. aeruginosa chronic colonization of the CF lung.
Deadly diseases like AIDS, malaria, and tuberculosis are the result of long-term chronic infections. Pathogens that cause chronic infections adapt to the host environment, avoiding the immune response and resisting antimicrobial agents. Studies of pathogen adaptation are therefore important for understanding how the efficacy of current therapeutics may change upon prolonged infection. One notorious chronic pathogen is Pseudomonas aeruginosa, a bacterium that causes long-term infections in individuals with the heritable disease cystic fibrosis (CF). We used gene expression profiles to identify 24 genes that commonly changed expression over time in 3 P. aeruginosa lineages, indicating that these changes occur in parallel in the lungs of individuals with CF. Several of these genes have previously been shown to encode traits critical for in vivo-relevant processes, suggesting that they are likely beneficial adaptations important for chronic colonization of the CF lung.
PMCID: PMC2939680  PMID: 20856824
24.  A Synthetic Genetic Edge Detection Program 
Cell  2009;137(7):1272-1281.
Edge detection is a signal processing algorithm common in artificial intelligence and image recognition programs. We have constructed a genetically encoded edge detection algorithm that programs an isogenic community of E. coli to sense an image of light, communicate to identify the light-dark edges, and visually present the result of the computation. The algorithm is implemented using multiple genetic circuits. An engineered light sensor enables cells to distinguish between light and dark regions. In the dark, cells produce a diffusible chemical signal that diffuses into light regions. Genetic logic gates are used so that only cells that sense light and the diffusible signal produce a positive output. A mathematical model constructed from first principles and parameterized with experimental measurements of the component circuits predicts the performance of the complete program. Quantitatively accurate models will facilitate the engineering of more complex biological behaviors and inform bottom-up studies of natural genetic regulatory networks.
PMCID: PMC2775486  PMID: 19563759
25.  Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana 
Nature biotechnology  2010;28(2):149-156.
Plants are essential sources of food, fiber and renewable energy. Effective methods for manipulating plant traits have important agricultural and economic consequences. We introduce a rational approach for associating genes with plant traits by combined use of a genome-scale functional network and targeted reverse genetic screening. We present a probabilistic network (AraNet) of functional associations among 19,647 (73%) genes of the reference flowering plant Arabidopsis thaliana. AraNet associations have measured precision greater than literature-based protein interactions (21%) for 55% of genes, and are highly predictive for diverse biological pathways. Using AraNet, we found a 10-fold enrichment in identifying early seedling development genes. By interrogating network neighborhoods, we identify At1g80710 (now Drought sensitive 1; Drs1) and At3g05090 (now Lateral root stimulator 1; Lrs1) as novel regulators of drought sensitivity and lateral root development, respectively. AraNet provides a global resource for plant gene function identification and genetic dissection of plant traits.
PMCID: PMC2857375  PMID: 20118918
