Results 26-41 (41)
 

26.  Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy 
Genome Biology  2008;9(Suppl 1):S5.
The complete set of mouse genes, as with the set of human genes, is still largely uncharacterized, with many pieces of experimental evidence accumulating regarding the activities and expression of the genes, but the majority of genes as yet still of unknown function. Within the context of the MouseFunc competition, we developed and applied two distinct large-scale data mining approaches to infer the functions (Gene Ontology annotations) of mouse genes from experimental observations from available functional genomics, proteomics, comparative genomics, and phenotypic data. The two strategies — the first using classifiers to map features to annotations, the second propagating annotations from characterized genes to uncharacterized genes along edges in a network constructed from the features — offer alternative and possibly complementary approaches to providing functional annotations. Here, we re-implement and evaluate these approaches and their combination for their ability to predict the proper functional annotations of genes in the MouseFunc data set. We show that, when controlling for the same set of input features, the network approach generally outperformed a naïve Bayesian classifier approach, while their combination offers some improvement over either independently. We make our observations of predictive performance on the MouseFunc competition hold-out set, as well as on a ten-fold cross-validation of the MouseFunc data. Across all 1,339 annotated genes in the MouseFunc test set, the median predictive power was quite strong (median area under a receiver operating characteristic plot of 0.865 and average precision of 0.195), indicating that a mining-based strategy with existing data is a promising path towards discovering mammalian gene functions. 
As one product of this work, a high-confidence subset of the functional mouse gene network was produced — spanning >70% of mouse genes with >1.6 million associations — that is predictive of mouse (and therefore often human) gene function and functional associations. The network should be generally useful for mammalian gene functional analyses, such as for predicting interactions, inferring functional connections between genes and pathways, and prioritizing candidate genes. The network and all predictions are available on the worldwide web.
doi:10.1186/gb-2008-9-s1-s5
PMCID: PMC2447539  PMID: 18613949
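The abstract of entry 26 summarizes predictive power as the area under a receiver operating characteristic plot and as average precision. The sketch below shows one common way to compute both from a ranked list of predictions for a single annotation term. It is illustrative only: the function names and the toy scores are assumptions, not the MouseFunc evaluation code.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise (rank-sum) comparison."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values measured at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive example")
    return sum(precisions) / len(precisions)


if __name__ == "__main__":
    # Hypothetical scores for five genes; label 1 means the gene truly has the term.
    toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
    toy_labels = [1, 0, 1, 0, 0]
    print(roc_auc(toy_scores, toy_labels))
    print(average_precision(toy_scores, toy_labels))
```

Average precision is sensitive to how rare the positives are, which is one reason a high ROC area can coexist with a modest average precision when few genes carry a given annotation.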
27.  Group II Intron Protein Localization and Insertion Sites Are Affected by Polyphosphate 
PLoS Biology  2008;6(6):e150.
Mobile group II introns consist of a catalytic intron RNA and an intron-encoded protein with reverse transcriptase activity, which act together in a ribonucleoprotein particle to promote DNA integration during intron mobility. Previously, we found that the Lactococcus lactis Ll.LtrB intron-encoded protein (LtrA) expressed alone or with the intron RNA to form ribonucleoprotein particles localizes to bacterial cellular poles, potentially accounting for the intron's preferential insertion in the oriC and ter regions of the Escherichia coli chromosome. Here, by using cell microarrays and automated fluorescence microscopy to screen a transposon-insertion library, we identified five E. coli genes (gppA, uhpT, wcaK, ynbC, and zntR) whose disruption results in both an increased proportion of cells with more diffuse LtrA localization and a more uniform genomic distribution of Ll.LtrB-insertion sites. Surprisingly, we find that a common factor affecting LtrA localization in these and other disruptants is the accumulation of intracellular polyphosphate, which appears to bind LtrA and other basic proteins and delocalize them away from the poles. Our findings show that the intracellular localization of a group II intron-encoded protein is a major determinant of insertion-site preference. More generally, our results suggest that polyphosphate accumulation may provide a means of localizing proteins to different sites of action during cellular stress or entry into stationary phase, with potentially wide physiological consequences.
Author Summary
Group II introns are bacterial mobile elements thought to be ancestors of introns—genetic material that is discarded from messenger RNA transcripts—and retroelements—genetic elements and viruses that replicate via reverse transcription—in higher organisms. They propagate by forming a complex consisting of the catalytically active intron RNA and an intron-encoded reverse transcriptase (which converts the RNA to DNA, which can then be reinserted in the host genome). The Ll.LtrB group II intron-encoded protein (LtrA) was found previously to localize to bacterial cellular poles, potentially accounting for the preferential insertion of Ll.LtrB in the replication origin (oriC) and terminus (ter) regions of the Escherichia coli chromosome, which are located near the poles during much of the cell cycle. Here, we identify E. coli genes whose disruption leads both to more diffuse LtrA localization and a more uniform chromosomal distribution of Ll.LtrB-insertion sites, proving that the location of the LtrA protein contributes to insertion-site preference. Surprisingly, we find that LtrA localization in the disruptants is affected by the accumulation of intracellular polyphosphate, which appears to bind basic proteins and delocalize them away from the cellular poles. Thus, polyphosphate, a ubiquitous but enigmatic molecule in prokaryotes and eukaryotes, can localize proteins to different sites of action, with potentially wide physiological consequences.
A novel cell microarray method uncovers connections between group II intron mobility, cell stress, and polyphosphate metabolism, including the finding that polyphosphate can influence intracellular protein localization.
doi:10.1371/journal.pbio.0060150
PMCID: PMC2435150  PMID: 18593213
28.  A map of human protein interactions derived from co-expression of human mRNAs and their orthologs 
The human protein interaction network will offer global insights into the molecular organization of cells and provide a framework for modeling human disease, but the network's large scale demands new approaches. We report a set of 7000 physical associations among human proteins inferred from indirect evidence: the comparison of human mRNA co-expression patterns with those of orthologous genes in five other eukaryotes, which we demonstrate identifies proteins in the same physical complexes. To evaluate the accuracy of the predicted physical associations, we apply quantitative mass spectrometry shotgun proteomics to measure elution profiles of 3013 human proteins during native biochemical fractionation, demonstrating systematically that putative interaction partners tend to co-sediment. We further validate uncharacterized proteins implicated by the associations in ribosome biogenesis, including WBSCR20C, associated with Williams–Beuren syndrome. This meta-analysis therefore exploits non-protein-based data, but successfully predicts associations, including 5589 novel human physical protein associations, with measured accuracies of 54±10%, comparable to direct large-scale interaction assays. The new associations' derivation from conserved in vivo phenomena argues strongly for their biological relevance.
doi:10.1038/msb.2008.19
PMCID: PMC2387231  PMID: 18414481
interactions; mass spectrometry; networks; proteomics; systems biology
29.  Broad network-based predictability of Saccharomyces cerevisiae gene loss-of-function phenotypes 
Genome Biology  2007;8(12):R258.
Loss-of-function phenotypes of yeast genes can be predicted from the loss-of-function phenotypes of their neighbours in functional gene networks. This could potentially be applied to the prediction of human disease genes.
We demonstrate that loss-of-function yeast phenotypes are predictable by guilt-by-association in functional gene networks. Testing 1,102 loss-of-function phenotypes from genome-wide assays of yeast reveals predictability of diverse phenotypes, spanning cellular morphology, growth, metabolism, and quantitative cell shape features. We apply the method to extend a genome-wide screen by predicting, then verifying, genes whose disruption elongates yeast cells, and to predict human disease genes. To facilitate network-guided screens, a web server is available.
doi:10.1186/gb-2007-8-12-r258
PMCID: PMC2246260  PMID: 18053250
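Guilt-by-association, as named in entry 29, can be illustrated with a simple neighbor-weighted score: a candidate gene is ranked by the summed weight of its network links to genes already known to show the phenotype. The sketch below is a minimal version under that assumption; the scoring rule, gene names, and edge weights are hypothetical and are not taken from the paper or its web server.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, float]  # (gene_a, gene_b, link weight)


def guilt_by_association(edges: Iterable[Edge], known: Set[str]) -> Dict[str, float]:
    """Score each gene by the summed weight of its edges to known phenotype genes."""
    scores: Dict[str, float] = defaultdict(float)
    for a, b, w in edges:
        # The network is undirected, so credit flows in both directions.
        if b in known:
            scores[a] += w
        if a in known:
            scores[b] += w
    return dict(scores)


def rank_candidates(scores: Dict[str, float], known: Set[str]):
    """Return genes not already in the known set, highest score first."""
    candidates = [(g, s) for g, s in scores.items() if g not in known]
    return sorted(candidates, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy network and seed set.
    toy_edges = [("g1", "g2", 2.0), ("g2", "g3", 1.5),
                 ("g3", "g4", 0.5), ("g1", "g4", 1.0)]
    seeds = {"g1", "g3"}
    print(rank_candidates(guilt_by_association(toy_edges, seeds), seeds))
```

In this toy network, g2 outranks g4 because its links to the two seed genes carry more total weight.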
30.  An Improved, Bias-Reduced Probabilistic Functional Gene Network of Baker's Yeast, Saccharomyces cerevisiae 
PLoS ONE  2007;2(10):e988.
Background
Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations.
Methodology/Principal Findings
We report a significantly improved version (v. 2) of a probabilistic functional gene network [1] of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis.
Conclusions/Significance
YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org.
doi:10.1371/journal.pone.0000988
PMCID: PMC1991590  PMID: 17912365
31.  Quantitative gene expression assessment identifies appropriate cell line models for individual cervical cancer pathways 
BMC Genomics  2007;8:117.
Background
Cell lines have been used to study cancer for decades, but truly quantitative assessment of their performance as models is often lacking. We used gene expression profiling to quantitatively assess the gene expression of nine cell line models of cervical cancer.
Results
We find a wide variation in the extent to which different cell culture models mimic late-stage invasive cervical cancer biopsies. The lowest agreement was from monolayer HeLa cells, a common cervical cancer model; the highest agreement was from primary epithelial cells, C4-I, and C4-II cell lines. In addition, HeLa and SiHa cell lines cultured in an organotypic environment increased their correlation to cervical cancer significantly. We also find wide variation in agreement when we considered how well individual biological pathways model cervical cancer. Cell lines with an anti-correlation to cervical cancer were also identified and should be avoided.
Conclusion
Using gene expression profiling and quantitative analysis, we have characterized nine cell lines with respect to how well they serve as models of cervical cancer. Applying this method to individual pathways, we identified the appropriateness of particular cell lines for studying specific pathways in cervical cancer. This study will allow researchers to choose a cell line with the highest correlation to cervical cancer at a pathway level. This method is applicable to other cancers and could be used to identify the appropriate cell line and growth condition to employ when studying other cancers.
doi:10.1186/1471-2164-8-117
PMCID: PMC1878486  PMID: 17493265
32.  How complete are current yeast and human protein-interaction networks? 
Genome Biology  2006;7(11):120.
How can protein-interaction networks be made more complete?
We estimate the full yeast protein-protein interaction network to contain 37,800-75,500 interactions and the human network 154,000-369,000, but owing to a high false-positive rate, current maps are roughly only 50% and 10% complete, respectively. Paradoxically, releasing raw, unfiltered assay data might help separate true from false interactions.
doi:10.1186/gb-2006-7-11-120
PMCID: PMC1794583  PMID: 17147767
33.  Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome 
Genome Biology  2005;6(5):R40.
In order to consolidate the known human protein interactions, two tests were developed to measure the relative accuracy of the available interaction data. In addition, 6,580 interactions among 3,737 human proteins were recovered from Medline abstracts and combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing data sets.
Background
Extensive protein interaction maps are being constructed for yeast, worm, and fly to ask how the proteins organize into pathways and systems, but no such genome-wide interaction map yet exists for the set of human proteins. To prepare for studies in humans, we wished to establish tests for the accuracy of future interaction assays and to consolidate the known interactions among human proteins.
Results
We established two tests of the accuracy of human protein interaction datasets and measured the relative accuracy of the available data. We then developed and applied natural language processing and literature-mining algorithms to recover from Medline abstracts 6,580 interactions among 3,737 human proteins. A three-part algorithm was used: first, human protein names were identified in Medline abstracts using a discriminator based on conditional random fields, then interactions were identified by the co-occurrence of protein names across the set of Medline abstracts, filtering the interactions with a Bayesian classifier to enrich for legitimate physical interactions. These mined interactions were combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing datasets.
Conclusion
These interactions and the accuracy benchmarks will aid interpretation of current functional genomics data and provide a basis for determining the quality of future large-scale human protein interaction assays. Projecting from the approximately 15 interactions per protein in the best-sampled interaction set to the estimated 25,000 human genes implies more than 375,000 interactions in the complete human protein interaction network. This set therefore represents no more than 10% of the complete network.
doi:10.1186/gb-2005-6-5-r40
PMCID: PMC1175952  PMID: 15892868
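The projection in the conclusion of entry 33 is a short piece of arithmetic, reproduced here as a sketch. It only restates the multiplication given in the abstract (interactions per protein times gene count) and compares the consolidated network against that figure; the variable names are illustrative.

```python
# Figures quoted in the abstract of entry 33.
interactions_per_protein = 15      # approximate, from the best-sampled interaction set
estimated_human_genes = 25_000     # estimate used in the abstract
consolidated_interactions = 31_609

# Projection as stated in the abstract: per-protein count times number of genes.
projected_total = interactions_per_protein * estimated_human_genes

# Share of the projected network covered by the consolidated set.
coverage = consolidated_interactions / projected_total

if __name__ == "__main__":
    print(f"projected interactions: {projected_total:,}")
    print(f"coverage of projected network: {coverage:.1%}")
```

The coverage comes out below one tenth, consistent with the abstract's statement that the set represents no more than 10% of the complete network.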
34.  Assembling a jigsaw puzzle with 20,000 parts 
Genome Biology  2003;4(6):323.
A report on the Keystone Symposium 'Proteomics: Technologies and Applications', Keystone, USA, 25-30 March 2003.
doi:10.1186/gb-2003-4-6-323
PMCID: PMC193613  PMID: 12801408
35.  DIP: The Database of Interacting Proteins: 2001 update 
Nucleic Acids Research  2001;29(1):239-241.
The Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu) is a database that documents experimentally determined protein–protein interactions. Since January 2000 the number of protein–protein interactions in DIP has nearly tripled to 3472 and the number of proteins to 2659. New interactive tools have been developed to aid in the visualization, navigation and study of networks of protein interactions.
PMCID: PMC29798  PMID: 11125102
36.  Characterization of a Thermostable DNA Glycosylase Specific for U/G and T/G Mismatches from the Hyperthermophilic Archaeon Pyrobaculum aerophilum 
Journal of Bacteriology  2000;182(5):1272-1279.
U/G and T/G mismatches commonly occur due to spontaneous deamination of cytosine and 5-methylcytosine in double-stranded DNA. This mutagenic effect is particularly strong for extreme thermophiles, since the spontaneous deamination reaction is much enhanced at high temperature. Previously, a U/G and T/G mismatch-specific glycosylase (Mth-MIG) was found on a cryptic plasmid of the archaeon Methanobacterium thermoautotrophicum, a thermophile with an optimal growth temperature of 65°C. We report characterization of a putative DNA glycosylase from the hyperthermophilic archaeon Pyrobaculum aerophilum, whose optimal growth temperature is 100°C. The open reading frame was first identified through a genome sequencing project in our laboratory. The predicted product of 230 amino acids shares significant sequence homology to [4Fe-4S]-containing Nth/MutY DNA glycosylases. The histidine-tagged recombinant protein was expressed in Escherichia coli and purified. It is thermostable and displays DNA glycosylase activities specific to U/G and T/G mismatches with an uncoupled AP lyase activity. It also processes U/7,8-dihydro-oxoguanine and T/7,8-dihydro-oxoguanine mismatches. We designate it Pa-MIG. Using sequence comparisons among complete bacterial and archaeal genomes, we have uncovered a putative MIG protein from another hyperthermophilic archaeon, Aeropyrum pernix. The unique conserved amino acid motifs of MIG proteins are proposed to distinguish MIG proteins from the closely related Nth/MutY DNA glycosylases.
PMCID: PMC94412  PMID: 10671447
37.  DIP: the Database of Interacting Proteins 
Nucleic Acids Research  2000;28(1):289-291.
The Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu) is a database that documents experimentally determined protein–protein interactions. This database is intended to provide the scientific community with a comprehensive and integrated tool for browsing and efficiently extracting information about protein interactions and interaction networks in biological processes. Beyond cataloging details of protein–protein interactions, the DIP is useful for understanding protein function and protein–protein relationships, studying the properties of networks of interacting proteins, benchmarking predictions of protein–protein interactions, and studying the evolution of protein–protein interactions.
PMCID: PMC102387  PMID: 10592249
38.  Evolutionarily Repurposed Networks Reveal the Well-Known Antifungal Drug Thiabendazole to Be a Novel Vascular Disrupting Agent 
PLoS Biology  2012;10(8):e1001379.
Analysis of a genetic module repurposed between yeast and vertebrates reveals that a common antifungal medication is also a potent vascular disrupting agent.
Studies in diverse organisms have revealed a surprising depth to the evolutionary conservation of genetic modules. For example, a systematic analysis of such conserved modules has recently shown that genes in yeast that maintain cell walls have been repurposed in vertebrates to regulate vein and artery growth. We reasoned that by analyzing this particular module, we might identify small molecules targeting the yeast pathway that also act as angiogenesis inhibitors suitable for chemotherapy. This insight led to the finding that thiabendazole, an orally available antifungal drug in clinical use for 40 years, also potently inhibits angiogenesis in animal models and in human cells. Moreover, in vivo time-lapse imaging revealed that thiabendazole reversibly disassembles newly established blood vessels, marking it as a vascular disrupting agent (VDA) and thus as a potential complementary therapeutic for use in combination with current anti-angiogenic therapies. Importantly, we also show that thiabendazole slows tumor growth and decreases vascular density in preclinical fibrosarcoma xenografts. Thus, an exploration of the evolutionary repurposing of gene networks has led directly to the identification of a potential new therapeutic application for an inexpensive drug that is already approved for clinical use in humans.
Author Summary
Yeast cells and vertebrate blood vessels would not seem to have much in common. However, we have discovered that during the course of evolution, a group of proteins whose function in yeast is to maintain cell walls has found an alternative use in vertebrates regulating angiogenesis. This remarkable repurposing of the proteins during evolution led us to hypothesize that, despite the different functions of the proteins in humans compared to yeast, drugs that modulated the yeast pathway might also modulate angiogenesis in humans and in animal models. One compound seemed a particularly promising candidate for this sort of approach: thiabendazole (TBZ), which has been in clinical use as a systemic antifungal and deworming treatment for 40 years. Gratifyingly, our study shows that TBZ is indeed able to act as a vascular disrupting agent and an angiogenesis inhibitor. Notably, TBZ also slowed tumor growth and decreased vascular density in human tumors grafted into mice. TBZ’s historical safety data and low cost make it an outstanding candidate for translation to clinical use as a complement to current anti-angiogenic strategies for the treatment of cancer. Our work demonstrates how model organisms from distant branches of the evolutionary tree can be exploited to arrive at a promising new drug.
doi:10.1371/journal.pbio.1001379
PMCID: PMC3423972  PMID: 22927795
39.  Rational Extension of the Ribosome Biogenesis Pathway Using Network-Guided Genetics 
PLoS Biology  2009;7(10):e1000213.
Gene networks are an efficient route for associating candidate genes with biological processes. Here, networks are used to discover more than 15 new genes for ribosomal subunit maturation, rRNA processing, and ribosomal export from the nucleus.
Biogenesis of ribosomes is an essential cellular process conserved across all eukaryotes and is known to require >170 genes for the assembly, modification, and trafficking of ribosome components through multiple cellular compartments. Despite intensive study, this pathway likely involves many additional genes. Here, we employ network-guided genetics—an approach for associating candidate genes with biological processes that capitalizes on recent advances in functional genomic and proteomic studies—to computationally identify additional ribosomal biogenesis genes. We experimentally evaluated >100 candidate yeast genes in a battery of assays, confirming involvement of at least 15 new genes, including previously uncharacterized genes (YDL063C, YIL091C, YOR287C, YOR006C/TSR3, YOL022C/TSR4). We associate the new genes with specific aspects of ribosomal subunit maturation, ribosomal particle association, and ribosomal subunit nuclear export, and we identify genes specifically required for the processing of 5S, 7S, 20S, 27S, and 35S rRNAs. These results reveal new connections between ribosome biogenesis and mRNA splicing and add >10% new genes—most with human orthologs—to the biogenesis pathway, significantly extending our understanding of a universally conserved eukaryotic process.
Author Summary
Ribosomes are the extremely complex cellular machines responsible for constructing new proteins. In eukaryotic cells, such as yeast, each ribosome contains more than 80 protein or RNA components. These complex machines must themselves be assembled by an even more complex machinery spanning multiple cellular compartments and involving perhaps 200 components in an ordered series of processing events, resulting in delivery of the two halves of the mature ribosome, the 40S and 60S components, to the cytoplasm. The ribosome biogenesis machinery has been only partially characterized, and many lines of evidence suggest that there are additional components that are still unknown. We employed an emerging computational technique called network-guided genetics to identify new candidate genes for this pathway. We then tested the candidates in a battery of experimental assays to determine what roles the genes might play in the biogenesis of ribosomes. This approach proved an efficient route to the discovery of new genes involved in ribosome biogenesis, significantly extending our understanding of a universally conserved eukaryotic process.
doi:10.1371/journal.pbio.1000213
PMCID: PMC2749941  PMID: 19806183
40.  A critical assessment of Mus musculus gene function prediction using integrated genomic evidence 
Genome Biology  2008;9(Suppl 1):S2.
Background:
Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated.
Results:
In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%.
Conclusion:
We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized.
doi:10.1186/gb-2008-9-s1-s2
PMCID: PMC2447536  PMID: 18613946
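Entry 40 reports precision at a fixed recall rate (41% precision at 20% recall). A minimal sketch of how such a figure can be read off a ranked prediction list for one GO term follows. The function name, tie handling, and toy data are assumptions for illustration and do not reproduce the evaluation pipeline used in the study.

```python
from typing import Sequence


def precision_at_recall(scores: Sequence[float], labels: Sequence[int],
                        target_recall: float) -> float:
    """Precision at the first rank where recall reaches target_recall."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    # Walk down the list from the most to the least confident prediction.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    for rank, (_, y) in enumerate(ranked, start=1):
        hits += y
        if hits / total_pos >= target_recall:
            return hits / rank
    raise ValueError("target_recall must not exceed 1.0")


if __name__ == "__main__":
    # Hypothetical predictions for ten genes, five of which truly carry the term.
    toy_scores = [0.95, 0.9, 0.8, 0.6, 0.5, 0.3, 0.2, 0.1, 0.05, 0.01]
    toy_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    print(precision_at_recall(toy_scores, toy_labels, 0.2))
    print(precision_at_recall(toy_scores, toy_labels, 0.5))
```

Averaging this value over many GO terms would give a single summary figure of the kind quoted in the abstract, though the exact averaging scheme used there is not stated in this listing.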
41.  Systematic profiling of cellular phenotypes with spotted cell microarrays reveals mating-pheromone response genes 
Genome Biology  2006;7(1):R6.
Spotted cell microarrays were developed for measuring cellular phenotypes on a large scale and used to identify genes involved in the response of yeast to mating pheromone.
We have developed spotted cell microarrays for measuring cellular phenotypes on a large scale. Collections of cells are printed, stained for subcellular features, then imaged via automated, high-throughput microscopy, allowing systematic phenotypic characterization. We used this technology to identify genes involved in the response of yeast to mating pheromone. Besides morphology assays, cell microarrays should be valuable for high-throughput in situ hybridization and immunoassays, enabling new classes of genetic assays based on cell imaging.
doi:10.1186/gb-2006-7-1-r6
PMCID: PMC1431703  PMID: 16507139
