
Results 1-11 (11)

1.  Systematic Definition of Protein Constituents along the Major Polarization Axis Reveals an Adaptive Reuse of the Polarization Machinery in Pheromone-Treated Budding Yeast 
Polarizing cells extensively restructure cellular components in a spatially and temporally coupled manner along the major axis of cellular extension. Budding yeast are a useful model of polarized growth, helping to define many molecular components of this conserved process. Besides budding, yeast cells also differentiate upon treatment with pheromone from the opposite mating type, forming a mating projection (the ‘shmoo’) by directional restructuring of the cytoskeleton, localized vesicular transport and overall reorganization of the cytosol. To characterize the proteomic localization changes accompanying polarized growth, we developed and implemented a novel cell microarray-based imaging assay for measuring the spatial redistribution of a large fraction of the yeast proteome, and applied this assay to identify proteins localized along the mating projection following pheromone treatment. We further trained a machine learning algorithm to refine the cell imaging screen, identifying additional shmoo-localized proteins. In all, we identified 74 proteins that specifically localize to the mating projection, including previously uncharacterized proteins (Ycr043c, Ydr348c, Yer071c, Ymr295c, and Yor304c-a) and known polarization complexes such as the exocyst. Functional analysis of these proteins, coupled with quantitative analysis of individual organelle movements during shmoo formation, suggests a model in which the basic machinery for cell polarization is generally conserved between processes forming the bud and the shmoo, with a distinct subset of proteins used only for shmoo formation. The net effect is a defined ordering of major organelles along the polarization axis, with specific proteins implicated at the proximal growth tip.
Upon sensing mating pheromone, budding yeast cells form a mating projection (the ‘shmoo’) that serves as a model for polarized cell growth, involving cytoskeletal/cytosolic restructuring and directed vesicular transport. We developed a cell microarray-based imaging assay for measuring localization of the yeast proteome during polarized growth. We find major organelles ordered along the polarization axis, localize 74 proteins to the growth tip, and observe adaptive reuse of general polarization machinery.
PMCID: PMC2651748  PMID: 19053807
Proteomics; polarized growth; subcellular localization; pheromone response; yeast
2.  Buffering by gene duplicates: an analysis of molecular correlates and evolutionary conservation 
BMC Genomics  2008;9:609.
One mechanism to account for robustness against gene knockouts or knockdowns is through buffering by gene duplicates, but the extent and general correlates of this process in organisms is still a matter of debate. To reveal general trends of this process, we provide a comprehensive comparison of gene essentiality, duplication and buffering by duplicates across seven bacteria (Mycoplasma genitalium, Bacillus subtilis, Helicobacter pylori, Haemophilus influenzae, Mycobacterium tuberculosis, Pseudomonas aeruginosa, Escherichia coli), and four eukaryotes (Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Mus musculus (mouse)).
In nine of the eleven organisms, duplicates significantly increase chances of survival upon gene deletion (P-value ≤ 0.05), but only by up to 13%. Given that duplicates make up to 80% of eukaryotic genomes, the small contribution is surprising and points to dominant roles of other buffering processes, such as alternative metabolic pathways. The buffering capacity of duplicates appears to be independent of the degree of gene essentiality and tends to be higher for genes with high expression levels. For example, buffering capacity increases to 23% amongst highly expressed genes in E. coli. Sequence similarity and the number of duplicates per gene are weak predictors of the duplicate's buffering capacity. In a case study we show that buffering gene duplicates in yeast and worm are somewhat more similar in their functions than non-buffering duplicates and have increased transcriptional and translational activity.
In sum, the extent of gene essentiality and buffering by duplicates is not conserved across organisms and does not correlate with the organisms' apparent complexity. This heterogeneity goes beyond what would be expected from differences in experimental approaches alone. Buffering by duplicates contributes to robustness in several organisms, but to a small extent – and the relatively large amount of buffering by duplicates observed in yeast and worm may be largely specific to these organisms. Thus, the only common factor of buffering by duplicates between different organisms may be the by-product of duplicate retention due to demands of high dosage.
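The buffering figures quoted in this abstract can be read as a difference in survival rates. The sketch below assumes that "buffering capacity" means the fraction of non-essential genes among duplicates minus the same fraction among singletons; that definition and the function name buffering_capacity are illustrative assumptions, not the authors' exact procedure.

```python
def buffering_capacity(genes):
    """Estimate buffering by duplicates from (has_duplicate, is_essential) pairs.

    Assumed definition (not taken from the paper): the survival rate upon
    deletion for genes with a duplicate minus the rate for singleton genes.
    """
    dup = [essential for has_dup, essential in genes if has_dup]
    single = [essential for has_dup, essential in genes if not has_dup]
    if not dup or not single:
        raise ValueError("need at least one duplicate and one singleton gene")
    # A deletion is survived when the gene is not essential.
    survival_dup = 1 - sum(dup) / len(dup)
    survival_single = 1 - sum(single) / len(single)
    return survival_dup - survival_single


# Hypothetical toy data: 10 duplicated genes (1 essential),
# 10 singleton genes (2 essential).
toy_genes = (
    [(True, False)] * 9 + [(True, True)] * 1
    + [(False, False)] * 8 + [(False, True)] * 2
)
toy_capacity = buffering_capacity(toy_genes)
```

A significance test on the underlying 2x2 table would be needed before reading anything into such a difference.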
PMCID: PMC2627895  PMID: 19087332
3.  The APEX Quantitative Proteomics Tool: Generating protein quantitation estimates from LC-MS/MS proteomics results 
BMC Bioinformatics  2008;9:529.
Mass spectrometry (MS) based label-free protein quantitation has mainly focused on analysis of ion peak heights and peptide spectral counts. Most analyses of tandem mass spectrometry (MS/MS) data begin with an enzymatic digestion of a complex protein mixture to generate smaller peptides that can be separated and identified by an MS/MS instrument. Peptide spectral counting techniques attempt to quantify protein abundance by counting the number of detected tryptic peptides and their corresponding MS spectra. However, spectral counting is confounded by the fact that peptide physicochemical properties severely affect MS detection resulting in each peptide having a different detection probability. Lu et al. (2007) described a modified spectral counting technique, Absolute Protein Expression (APEX), which improves on basic spectral counting methods by including a correction factor for each protein (called Oi value) that accounts for variable peptide detection by MS techniques. The technique uses machine learning classification to derive peptide detection probabilities that are used to predict the number of tryptic peptides expected to be detected for one molecule of a particular protein (Oi). This predicted spectral count is compared to the protein's observed MS total spectral count during APEX computation of protein abundances.
The APEX Quantitative Proteomics Tool, introduced here, is a free open source Java application that supports the APEX protein quantitation technique. The APEX tool uses data from standard tandem mass spectrometry proteomics experiments and provides computational support for APEX protein abundance quantitation through a set of graphical user interfaces that partition the parameter controls for the various processing tasks. The tool also provides a Z-score analysis for identification of significant differential protein expression, a utility to assess APEX classifier performance via cross validation, and a utility to merge multiple APEX results into a standardized format in preparation for further statistical analysis.
The APEX Quantitative Proteomics Tool provides a simple means to quickly derive hundreds to thousands of protein abundance values from standard liquid chromatography-tandem mass spectrometry proteomics datasets. The tool provides a straightforward, intuitive interface overlaying a highly customizable computational workflow to produce protein abundance values from LC-MS/MS datasets.
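The comparison of observed and expected spectral counts described above can be sketched as a normalized ratio. This is a minimal illustration in Python, not the tool's Java code: it assumes each protein's abundance is proportional to its observed count, weighted by an identification probability and divided by its Oi value, then scaled to a total. The function name apex_abundances and the parameters id_probs and total_molecules are assumptions introduced for the example.

```python
def apex_abundances(observed_counts, expected_counts, id_probs=None,
                    total_molecules=1.0):
    """Sketch of an APEX-style abundance estimate.

    observed_counts: protein -> observed total spectral count (n_i)
    expected_counts: protein -> predicted count per molecule (O_i)
    id_probs: protein -> identification probability (assumed 1.0 if omitted)
    total_molecules: scaling constant; 1.0 yields relative abundances
    """
    id_probs = id_probs or {}
    corrected = {}
    for protein, n_obs in observed_counts.items():
        o_i = expected_counts[protein]
        if o_i <= 0:
            raise ValueError(f"expected count must be positive for {protein}")
        # Divide the observed count by the expected count to correct for
        # how detectable this protein's peptides are.
        corrected[protein] = n_obs * id_probs.get(protein, 1.0) / o_i
    norm = sum(corrected.values())
    if norm == 0:
        raise ValueError("no spectral counts observed")
    return {p: total_molecules * v / norm for p, v in corrected.items()}


# Hypothetical counts: B has twice the spectra of A but is four times as
# detectable, so A comes out as the more abundant protein.
toy_abundances = apex_abundances({"A": 10, "B": 20}, {"A": 1.0, "B": 4.0})
```

With the default total of 1.0 the result is a set of fractions that sum to one; passing an estimated number of protein molecules per cell would convert them to absolute values.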
PMCID: PMC2639435  PMID: 19068132
4.  Age-Dependent Evolution of the Yeast Protein Interaction Network Suggests a Limited Role of Gene Duplication and Divergence 
PLoS Computational Biology  2008;4(11):e1000232.
Proteins interact in complex protein–protein interaction (PPI) networks whose topological properties—such as scale-free topology, hierarchical modularity, and disassortativity—have suggested models of network evolution. Currently preferred models invoke preferential attachment or gene duplication and divergence to produce networks whose topology matches that observed for real PPIs, thus supporting these as likely models for network evolution. Here, we show that the interaction density and homodimeric frequency are highly protein age–dependent in real PPI networks in a manner which does not agree with these canonical models. In light of these results, we propose an alternative stochastic model, which adds each protein sequentially to a growing network in a manner analogous to protein crystal growth (CG) in solution. The key ideas are (1) interaction probability increases with availability of unoccupied interaction surface, thus following an anti-preferential attachment rule, (2) as a network grows, highly connected sub-networks emerge into protein modules or complexes, and (3) once a new protein is committed to a module, further connections tend to be localized within that module. The CG model produces PPI networks consistent in both topology and age distributions with real PPI networks and is well supported by the spatial arrangement of protein complexes of known 3-D structure, suggesting a plausible physical mechanism for network evolution.
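The three key ideas can be illustrated with a toy simulation. The sketch below is not the authors' CG model: it assumes every protein has a fixed number of interaction sites (capacity), chooses the first partner with probability proportional to that partner's free sites (rule 1), and then adds further links only among the partner's neighbours (rule 3). Modules (rule 2) are left to emerge from those two rules. All names and parameter values are hypothetical.

```python
import random


def grow_network(n_proteins, capacity=4, p_within=0.5, seed=0):
    """Toy crystal-growth-style network; returns protein -> set of partners."""
    rng = random.Random(seed)
    neighbors = {0: set()}
    for new in range(1, n_proteins):
        existing = list(neighbors)
        neighbors[new] = set()
        # Rule 1: weight each candidate by its unoccupied interaction surface.
        free = {p: capacity - len(neighbors[p]) for p in existing}
        candidates = [p for p in existing if free[p] > 0]
        if not candidates:
            continue  # every surface is occupied: newcomer seeds a new module
        weights = [free[p] for p in candidates]
        first = rng.choices(candidates, weights=weights, k=1)[0]
        neighbors[new].add(first)
        neighbors[first].add(new)
        # Rule 3: further links stay inside the module of the first partner.
        for other in sorted(neighbors[first] - {new}):
            if len(neighbors[new]) >= capacity:
                break
            if len(neighbors[other]) < capacity and rng.random() < p_within:
                neighbors[new].add(other)
                neighbors[other].add(new)
    return neighbors


toy_network = grow_network(50, capacity=4, seed=1)
```

Because a protein's chance of gaining a partner falls as its sites fill up, older proteins stop accumulating links, which is the opposite of a "rich get richer" rule.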
Author Summary
Proteins function together forming stable protein complexes or transient interactions in various cellular processes, such as gene regulation and signaling. Here, we address the basic question of how these networks of interacting proteins evolve. This is an important problem, as the structures of such networks underlie important features of biological systems, such as functional modularity, error-tolerance, and stability. It is not yet known how these network architectures originate or what driving forces underlie the observed network structure. Several models have been proposed over the past decade—in particular, a “rich get richer” model (preferential attachment) and a model based upon gene duplication and divergence—often based only on network topologies. Here, we show that real yeast protein interaction networks show a unique age distribution among interacting proteins, which rules out these canonical models. In light of these results, we developed a simple, alternative model based on well-established physical principles, analogous to the process of growing protein crystals in solution. The model better explains many features of real PPI networks, including the network topologies, their characteristic age distributions, and the spatial distribution of subunits of differing ages within protein complexes, suggesting a plausible physical mechanism of network evolution.
PMCID: PMC2583957  PMID: 19043579
5.  mspire: mass spectrometry proteomics in Ruby 
Bioinformatics  2008;24(23):2796-2797.
Summary: Mass spectrometry-based proteomics stands to gain from additional analysis of its data, but its large, complex datasets make demands on speed and memory usage requiring special consideration from scripting languages. The software library ‘mspire’—developed in the Ruby programming language—offers quick and memory-efficient readers for standard xml proteomics formats, converters for intermediate file types in typical proteomics spectral-identification work flows (including the Bioworks .srf format), and modules for the calculation of peptide false identification rates.
Availability: Freely available at Additional data models, usage information, and methods available at
PMCID: PMC2639276  PMID: 18930952
6.  Bud23 Methylates G1575 of 18S rRNA and Is Required for Efficient Nuclear Export of Pre-40S Subunits
Molecular and Cellular Biology  2008;28(10):3151-3161.
BUD23 was identified from a bioinformatics analysis of Saccharomyces cerevisiae genes involved in ribosome biogenesis. Deletion of BUD23 leads to severely impaired growth, reduced levels of the small (40S) ribosomal subunit, and a block in processing 20S rRNA to 18S rRNA, a late step in 40S maturation. Bud23 belongs to the S-adenosylmethionine-dependent Rossmann-fold methyltransferase superfamily and is related to small-molecule methyltransferases. Nevertheless, we considered that Bud23 methylates rRNA. Methylation of G1575 is the only mapped modification for which the methylase has not been assigned. Here, we show that this modification is lost in bud23 mutants. The nuclear accumulation of the small-subunit reporters Rps2-green fluorescent protein (GFP) and Rps3-GFP, as well as the rRNA processing intermediate, the 5′ internal transcribed spacer 1, indicate that bud23 mutants are defective for small-subunit export. Mutations in Bud23 that inactivated its methyltransferase activity complemented a bud23Δ mutant. In addition, mutant ribosomes in which G1575 was changed to adenosine supported growth comparable to that of cells with wild-type ribosomes. Thus, Bud23 protein, but not its methyltransferase activity, is important for biogenesis and export of the 40S subunit in yeast.
PMCID: PMC2423152  PMID: 18332120
7.  Mechanisms of Cell Cycle Control Revealed by a Systematic and Quantitative Overexpression Screen in S. cerevisiae 
PLoS Genetics  2008;4(7):e1000120.
Regulation of cell cycle progression is fundamental to cell health and reproduction, and failures in this process are associated with many human diseases. Much of our knowledge of cell cycle regulators derives from loss-of-function studies. To reveal new cell cycle regulatory genes that are difficult to identify in loss-of-function studies, we performed a near-genome-wide flow cytometry assay of yeast gene overexpression-induced cell cycle delay phenotypes. We identified 108 genes whose overexpression significantly delayed the progression of the yeast cell cycle at a specific stage. Many of the genes are newly implicated in cell cycle progression, for example SKO1, RFA1, and YPR015C. The overexpression of RFA1 or YPR015C delayed the cell cycle at G2/M phases by disrupting spindle attachment to chromosomes and activating the DNA damage checkpoint, respectively. In contrast, overexpression of the transcription factor SKO1 arrests cells at G1 phase by activating the pheromone response pathway, revealing new cross-talk between osmotic sensing and mating. More generally, 92%–94% of the genes exhibit distinct phenotypes when overexpressed as compared to their corresponding deletion mutants, supporting the notion that many genes may gain functions upon overexpression. This work thus implicates new genes in cell cycle progression, complements previous screens, and lays the foundation for future experiments to define more precisely roles for these genes in cell cycle progression.
Author Summary
All cells require proper cell cycle regulation; failure leads to numerous human diseases. Cell cycle mechanisms are broadly conserved across eukaryotes, with many key regulatory genes known. Nonetheless, our knowledge of regulators is incomplete. Many classic studies have analyzed yeast loss-of-function mutants to identify cell cycle genes. Studies have also implicated genes based upon their overexpression phenotypes, but the effects of gene overexpression on the cell cycle have not been quantified for all yeast genes. We individually quantified the effect of overexpression on cell cycle progression for nearly all (91%) of yeast genes, and we report the 108 genes causing the most significant and reproducible cell cycle defects, most of which have not been previously observed. We characterize three genes in more detail, implicating one in chromosomal segregation and mitotic spindle formation. A second affects mitotic stability and the DNA damage checkpoint. Curiously, overexpression of a third gene, SKO1, arrests the cell cycle by activating the pheromone response pathway, with cells mistakenly behaving as if mating pheromone is present. These results establish a basis for future experiments elucidating precise cell cycle roles for these genes. Similar assays in human cells could help further clarify the many connections between cell cycle control and cancers.
PMCID: PMC2438615  PMID: 18617996
8.  Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy 
Genome Biology  2008;9(Suppl 1):S5.
The complete set of mouse genes, as with the set of human genes, is still largely uncharacterized, with many pieces of experimental evidence accumulating regarding the activities and expression of the genes, but the majority of genes as yet still of unknown function. Within the context of the MouseFunc competition, we developed and applied two distinct large-scale data mining approaches to infer the functions (Gene Ontology annotations) of mouse genes from experimental observations from available functional genomics, proteomics, comparative genomics, and phenotypic data. The two strategies — the first using classifiers to map features to annotations, the second propagating annotations from characterized genes to uncharacterized genes along edges in a network constructed from the features — offer alternative and possibly complementary approaches to providing functional annotations. Here, we re-implement and evaluate these approaches and their combination for their ability to predict the proper functional annotations of genes in the MouseFunc data set. We show that, when controlling for the same set of input features, the network approach generally outperformed a naïve Bayesian classifier approach, while their combination offers some improvement over either independently. We make our observations of predictive performance on the MouseFunc competition hold-out set, as well as on a ten-fold cross-validation of the MouseFunc data. Across all 1,339 annotated genes in the MouseFunc test set, the median predictive power was quite strong (median area under a receiver operating characteristic plot of 0.865 and average precision of 0.195), indicating that a mining-based strategy with existing data is a promising path towards discovering mammalian gene functions. 
As one product of this work, a high-confidence subset of the functional mouse gene network was produced — spanning >70% of mouse genes with >1.6 million associations — that is predictive of mouse (and therefore often human) gene function and functional associations. The network should be generally useful for mammalian gene functional analyses, such as for predicting interactions, inferring functional connections between genes and pathways, and prioritizing candidate genes. The network and all predictions are available on the worldwide web.
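The network strategy in this abstract, propagating annotations along edges from characterized to uncharacterized genes, can be sketched as weighted neighbour voting. This is a minimal illustration under that assumption; the abstract does not specify the scoring rule, and the function name propagate_annotations and the edge weights are hypothetical.

```python
def propagate_annotations(edges, annotated):
    """Score unannotated genes for one GO term by weighted neighbour voting.

    edges: iterable of (gene_a, gene_b, weight) for an undirected network
    annotated: set of genes already carrying the annotation
    Returns gene -> summed weight of its edges to annotated genes.
    """
    scores = {}
    for gene_a, gene_b, weight in edges:
        # Treat each edge as undirected: votes can flow in either direction.
        for source, target in ((gene_a, gene_b), (gene_b, gene_a)):
            if source in annotated and target not in annotated:
                scores[target] = scores.get(target, 0.0) + weight
    return scores


# Hypothetical network: g3 is linked to both annotated genes,
# g4 only to the unannotated g3.
toy_edges = [
    ("g1", "g2", 0.5),
    ("g2", "g3", 0.25),
    ("g1", "g3", 1.0),
    ("g3", "g4", 0.1),
]
toy_scores = propagate_annotations(toy_edges, {"g1", "g2"})
```

Genes with no annotated neighbour, such as g4 here, receive no score at all, which is one reason a feature-based classifier can complement the network approach.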
PMCID: PMC2447539  PMID: 18613949
9.  Group II Intron Protein Localization and Insertion Sites Are Affected by Polyphosphate 
PLoS Biology  2008;6(6):e150.
Mobile group II introns consist of a catalytic intron RNA and an intron-encoded protein with reverse transcriptase activity, which act together in a ribonucleoprotein particle to promote DNA integration during intron mobility. Previously, we found that the Lactococcus lactis Ll.LtrB intron-encoded protein (LtrA) expressed alone or with the intron RNA to form ribonucleoprotein particles localizes to bacterial cellular poles, potentially accounting for the intron's preferential insertion in the oriC and ter regions of the Escherichia coli chromosome. Here, by using cell microarrays and automated fluorescence microscopy to screen a transposon-insertion library, we identified five E. coli genes (gppA, uhpT, wcaK, ynbC, and zntR) whose disruption results in both an increased proportion of cells with more diffuse LtrA localization and a more uniform genomic distribution of Ll.LtrB-insertion sites. Surprisingly, we find that a common factor affecting LtrA localization in these and other disruptants is the accumulation of intracellular polyphosphate, which appears to bind LtrA and other basic proteins and delocalize them away from the poles. Our findings show that the intracellular localization of a group II intron-encoded protein is a major determinant of insertion-site preference. More generally, our results suggest that polyphosphate accumulation may provide a means of localizing proteins to different sites of action during cellular stress or entry into stationary phase, with potentially wide physiological consequences.
Author Summary
Group II introns are bacterial mobile elements thought to be ancestors of introns—genetic material that is discarded from messenger RNA transcripts—and retroelements—genetic elements and viruses that replicate via reverse transcription—in higher organisms. They propagate by forming a complex consisting of the catalytically active intron RNA and an intron-encoded reverse transcriptase (which converts the RNA to DNA, which can then be reinserted in the host genome). The Ll.LtrB group II intron-encoded protein (LtrA) was found previously to localize to bacterial cellular poles, potentially accounting for the preferential insertion of Ll.LtrB in the replication origin (oriC) and terminus (ter) regions of the Escherichia coli chromosome, which are located near the poles during much of the cell cycle. Here, we identify E. coli genes whose disruption leads both to more diffuse LtrA localization and a more uniform chromosomal distribution of Ll.LtrB-insertion sites, proving that the location of the LtrA protein contributes to insertion-site preference. Surprisingly, we find that LtrA localization in the disruptants is affected by the accumulation of intracellular polyphosphate, which appears to bind basic proteins and delocalize them away from the cellular poles. Thus, polyphosphate, a ubiquitous but enigmatic molecule in prokaryotes and eukaryotes, can localize proteins to different sites of action, with potentially wide physiological consequences.
A novel cell microarray method uncovers connections between group II intron mobility, cell stress, and polyphosphate metabolism, including the finding that polyphosphate can influence intracellular protein localization.
PMCID: PMC2435150  PMID: 18593213
10.  A map of human protein interactions derived from co-expression of human mRNAs and their orthologs 
The human protein interaction network will offer global insights into the molecular organization of cells and provide a framework for modeling human disease, but the network's large scale demands new approaches. We report a set of 7000 physical associations among human proteins inferred from indirect evidence: the comparison of human mRNA co-expression patterns with those of orthologous genes in five other eukaryotes, which we demonstrate identifies proteins in the same physical complexes. To evaluate the accuracy of the predicted physical associations, we apply quantitative mass spectrometry shotgun proteomics to measure elution profiles of 3013 human proteins during native biochemical fractionation, demonstrating systematically that putative interaction partners tend to co-sediment. We further validate uncharacterized proteins implicated by the associations in ribosome biogenesis, including WBSCR20C, associated with Williams–Beuren syndrome. This meta-analysis therefore exploits non-protein-based data, but successfully predicts associations, including 5589 novel human physical protein associations, with measured accuracies of 54±10%, comparable to direct large-scale interaction assays. The new associations' derivation from conserved in vivo phenomena argues strongly for their biological relevance.
PMCID: PMC2387231  PMID: 18414481
interactions; mass spectrometry; networks; proteomics; systems biology
11.  A critical assessment of Mus musculus gene function prediction using integrated genomic evidence 
Genome Biology  2008;9(Suppl 1):S2.
Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated.
In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%.
We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized.
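The precision-at-recall figures quoted in this abstract come from ranking predictions by score and reading precision off at a fixed recall. The sketch below shows that calculation for a single GO term, assuming binary labels and ignoring tied scores; the function name precision_at_recall is an assumption, and this is not the evaluation code used in the study.

```python
import math


def precision_at_recall(scores, labels, recall=0.2):
    """Precision at the first rank where the requested recall is reached.

    scores: prediction scores, higher meaning more confident
    labels: 1 if the gene truly has the annotation, else 0
    """
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("at least one positive label is required")
    # Number of true positives needed; the small epsilon guards against
    # floating-point overshoot and at least one hit is always required.
    needed = max(1, math.ceil(recall * n_positive - 1e-9))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives >= needed:
            return true_positives / rank
    raise ValueError("requested recall exceeds 1.0")


# Hypothetical predictions for six genes, three of them true positives.
toy_precision = precision_at_recall(
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [0, 1, 1, 0, 1, 0], recall=0.5
)
```

Averaging this value over GO terms would give a summary comparable in form to the 41% figure, although the study's exact averaging scheme is not stated in the abstract.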
PMCID: PMC2447536  PMID: 18613946
