PMCC

Results 26-50 (59)

26.  Parallel Evolution in Pseudomonas aeruginosa over 39,000 Generations In Vivo 
mBio  2010;1(4):e00199-10.
The Gram-negative bacterium Pseudomonas aeruginosa is a common cause of chronic airway infections in individuals with the heritable disease cystic fibrosis (CF). After prolonged colonization of the CF lung, P. aeruginosa becomes highly resistant to host clearance and antibiotic treatment; therefore, understanding how this bacterium evolves during chronic infection is important for identifying beneficial adaptations that could be targeted therapeutically. To identify potential adaptive traits of P. aeruginosa during chronic infection, we carried out global transcriptomic profiling of chronological clonal isolates obtained from 3 individuals with CF. Isolates were collected sequentially over periods ranging from 3 months to 8 years, representing up to 39,000 in vivo generations. We identified 24 genes that were commonly regulated by all 3 P. aeruginosa lineages, including several genes encoding traits previously shown to be important for in vivo growth. Our results reveal that parallel evolution occurs in the CF lung and that at least a proportion of the traits identified are beneficial for P. aeruginosa chronic colonization of the CF lung.
IMPORTANCE
Deadly diseases like AIDS, malaria, and tuberculosis are the result of long-term chronic infections. Pathogens that cause chronic infections adapt to the host environment, avoiding the immune response and resisting antimicrobial agents. Studies of pathogen adaptation are therefore important for understanding how the efficacy of current therapeutics may change upon prolonged infection. One notorious chronic pathogen is Pseudomonas aeruginosa, a bacterium that causes long-term infections in individuals with the heritable disease cystic fibrosis (CF). We used gene expression profiles to identify 24 genes that commonly changed expression over time in 3 P. aeruginosa lineages, indicating that these changes occur in parallel in the lungs of individuals with CF. Several of these genes have previously been shown to encode traits critical for in vivo-relevant processes, suggesting that they are likely beneficial adaptations important for chronic colonization of the CF lung.
doi:10.1128/mBio.00199-10
PMCID: PMC2939680  PMID: 20856824
27.  Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line 
We provide a large-scale dataset on absolute protein and matching mRNA concentrations from the human medulloblastoma cell line Daoy. The correlation between mRNA and protein concentrations is significant and positive (Rs = 0.46, R² = 0.29, P-value < 2e-16), although non-linear. Out of ∼200 tested sequence features, sequence length, frequency and properties of amino acids, and translation initiation-related features are the strongest individual correlates of protein abundance when accounting for variation in mRNA concentration. By integrating mRNA expression data and all sequence features into a non-parametric regression model (Multivariate Adaptive Regression Splines), we were able to explain up to 67% of the variation in protein concentrations. Half of the contributions were attributed to mRNA concentrations, the other half to sequence features relating to regulation of translation and protein degradation. The sequence features are primarily linked to the coding region and the 3′ untranslated region. To our knowledge, this is the most comprehensive predictive model of human protein concentrations achieved so far.
mRNA decay, translation regulation and protein degradation are essential parts of eukaryotic gene expression regulation (Hieronymus and Silver, 2004; Mata et al, 2005); they enable the dynamics of cellular systems and their responses to external and internal stimuli without relying exclusively on transcription regulation. The importance of these processes is emphasized by the generally low correlation between mRNA and protein concentrations. For many prokaryotic and eukaryotic organisms, less than 50% of the variation in protein abundance is explained by variation in mRNA concentrations (de Sousa Abreu et al, 2009).
Given the plethora of regulatory mechanisms involved, most studies so far have focused on individual regulators and specific targets. Particularly in humans, we currently lack system-wide, quantitative analyses that evaluate the relative contributions of regulatory elements encoded in the mRNA and protein sequences. Existing studies have been carried out only in bacteria and yeast (Nie et al, 2006; Brockmann et al, 2007; Tuller et al, 2007; Wu et al, 2008). Here, we present the first comprehensive analysis of the impact of translation and protein degradation on protein abundance variation in a human cell line. For this purpose, we experimentally measured absolute protein and mRNA concentrations in the Daoy medulloblastoma cell line, using shotgun proteomics and microarrays, respectively (Figure 1). These data comprise one of the largest such datasets available today for humans. We focused on sequence features that likely impact protein translation and protein degradation, including length, nucleotide composition, structure of the untranslated regions (UTRs), coding sequence, composition of the translation initiation site, presence of upstream open reading frames, putative target sites of miRNAs, codon usage, amino-acid composition and protein degradation signals.
We conducted three types of tests: (a) partial Spearman's rank correlation of numerical features (e.g. length) with protein concentration, accounting for variation in mRNA concentrations; (b) for numerical and categorical features (e.g. function), a comparison of two extreme populations with Welch's t-test; and (c) a Multivariate Adaptive Regression Splines model of the combined contributions of mRNA expression and sequence features to protein abundance variation (Figure 1). To account for the non-linearity of many relationships, we used non-parametric approaches throughout the analysis.
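As an illustration of test (a), the sketch below computes a partial Spearman correlation between a sequence feature and protein concentration while controlling for mRNA concentration. It assumes the common formulation of rank-transforming all three variables and then applying the first-order partial correlation formula; the function names are hypothetical, and this is not the authors' analysis code. Tests (b) and (c) are not shown.

```python
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_spearman(feature, protein, mrna) -> float:
    """Spearman correlation of feature vs. protein, controlling for mRNA.

    Computed as a first-order partial Pearson correlation on ranks.
    Undefined if the mRNA ranks correlate perfectly with either other variable.
    """
    rx, ry, rz = ranks(feature), ranks(protein), ranks(mrna)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
```

Ranking first makes the measure insensitive to monotone non-linear relationships, which matches the stated reason for preferring non-parametric methods.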
We observed a significant positive correlation between mRNA and protein concentrations, stronger than in many previous measurements (de Sousa Abreu et al, 2009). We also show that the contribution of translation and protein degradation to the abundance variation of the final protein products is at least as important as that of mRNA transcription and stability. Although variation in mRNA expression explains ∼25–30% of the variation in protein abundance, another 30–40% can be accounted for by characteristics of the sequences, which we identified in a comparative assessment of global correlates. Among these characteristics, sequence length, amino-acid frequencies and nucleotide frequencies in the coding region have a strong influence (Figure 3A). Characteristics of the 3′UTR and of the 5′UTR, that is, length, nucleotide composition and secondary structures, describe another part of the variation, leaving 33% of the variation in protein abundance unexplained. The unexplained fraction may be accounted for by mechanisms not considered in this analysis (e.g. regulation by RNA-binding proteins or gene-specific structural motifs), as well as by expression and measurement noise.
Our combined model including mRNA concentration and sequence features can explain 67% of the variation of protein abundance in this system—and thus has the highest predictive power for human protein abundance achieved so far (Figure 3B).
Transcription, mRNA decay, translation and protein degradation are essential processes during eukaryotic gene expression, but their relative global contributions to steady-state protein concentrations in multi-cellular eukaryotes are largely unknown. Using measurements of absolute protein and mRNA abundances in cellular lysate from the human Daoy medulloblastoma cell line, we quantitatively evaluate the impact of mRNA concentration and sequence features implicated in translation and protein degradation on protein expression. Sequence features related to translation and protein degradation have an impact similar to that of mRNA abundance, and their combined contribution explains two-thirds of protein abundance variation. mRNA sequence lengths, amino-acid properties, upstream open reading frames and secondary structures in the 5′ untranslated region (UTR) were the strongest individual correlates of protein concentrations. In a combined model, characteristics of the coding region and the 3′UTR explained a larger proportion of protein abundance variation than characteristics of the 5′UTR. The absolute protein and mRNA concentration measurements for >1000 human genes described here represent one of the largest datasets currently available, and reveal both general trends and specific examples of post-transcriptional regulation.
doi:10.1038/msb.2010.59
PMCID: PMC2947365  PMID: 20739923
gene expression regulation; protein degradation; protein stability; translation
28.  A Synthetic Genetic Edge Detection Program 
Cell  2009;137(7):1272-1281.
Summary
Edge detection is a signal processing algorithm common in artificial intelligence and image recognition programs. We have constructed a genetically encoded edge detection algorithm that programs an isogenic community of E. coli to sense an image of light, communicate to identify the light-dark edges, and visually present the result of the computation. The algorithm is implemented using multiple genetic circuits. An engineered light sensor enables cells to distinguish between light and dark regions. In the dark, cells produce a chemical signal that diffuses into light regions. Genetic logic gates are used so that only cells that sense both light and the diffusible signal produce a positive output. A mathematical model constructed from first principles and parameterized with experimental measurements of the component circuits predicts the performance of the complete program. Quantitatively accurate models will facilitate the engineering of more complex biological behaviors and inform bottom-up studies of natural genetic regulatory networks.
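The program's logic (dark cells emit a signal, and an AND gate fires only in cells that see light and enough signal) can be mimicked with a toy grid simulation. This is a minimal sketch under stated assumptions, not the authors' mathematical model: diffusion is approximated by repeated neighbour averaging, and the step count and threshold are arbitrary illustrative values.

```python
from typing import List

Grid = List[List[bool]]  # True = cell sees light, False = cell is in the dark


def diffuse_signal(light: Grid, steps: int) -> List[List[float]]:
    """Toy diffusion: dark cells are fixed sources (level 1.0); at each step,
    every lit cell takes the mean of itself and its in-bounds 4-neighbours."""
    rows, cols = len(light), len(light[0])
    s = [[0.0 if light[r][c] else 1.0 for c in range(cols)] for r in range(rows)]
    for _ in range(steps):
        new = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if not light[r][c]:
                    new[r][c] = 1.0  # dark cells keep producing the signal
                    continue
                total, count = s[r][c], 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += s[rr][cc]
                        count += 1
                new[r][c] = total / count
        s = new
    return s


def edge_detect(light: Grid, steps: int = 2, threshold: float = 0.3) -> Grid:
    """AND gate: output only where a cell senses light AND enough signal."""
    s = diffuse_signal(light, steps)
    return [[light[r][c] and s[r][c] > threshold for c in range(len(light[0]))]
            for r in range(len(light))]
```

On a 3×3 grid with a single dark cell in the centre, edge_detect marks only the four lit cells that touch it, returning [[False, True, False], [True, False, True], [False, True, False]]. Uniformly lit or uniformly dark grids produce no output, as expected for an edge detector.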
doi:10.1016/j.cell.2009.04.048
PMCID: PMC2775486  PMID: 19563759
29.  Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana 
Nature biotechnology  2010;28(2):149-156.
Plants are essential sources of food, fiber and renewable energy. Effective methods for manipulating plant traits have important agricultural and economic consequences. We introduce a rational approach for associating genes with plant traits by combined use of a genome-scale functional network and targeted reverse genetic screening. We present a probabilistic network (AraNet) of functional associations among 19,647 (73%) genes of the reference flowering plant Arabidopsis thaliana. AraNet associations have measured precision greater than literature-based protein interactions (21%) for 55% of genes, and are highly predictive for diverse biological pathways. Using AraNet, we found a 10-fold enrichment in identifying early seedling development genes. By interrogating network neighborhoods, we identify At1g80710 (now Drought sensitive 1; Drs1) and At3g05090 (now Lateral root stimulator 1; Lrs1) as novel regulators of drought sensitivity and lateral root development, respectively. AraNet (http://www.functionalnet.org/aranet/) provides a global resource for plant gene function identification and genetic dissection of plant traits.
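Interrogating a network neighbourhood usually follows a guilt-by-association idea: genes linked to many known trait genes become candidates. A minimal sketch, assuming candidates are scored by summing the weights of their links to seed genes; AraNet's actual scoring scheme may differ, and the gene identifiers in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_candidates(
    edges: Iterable[Tuple[str, str, float]],  # (gene_a, gene_b, association weight)
    seeds: Set[str],                          # genes already known for the trait
) -> List[Tuple[str, float]]:
    """Rank non-seed genes by the summed weight of their links to seed genes."""
    adj: Dict[str, Dict[str, float]] = defaultdict(dict)
    for a, b, w in edges:
        adj[a][b] = w
        adj[b][a] = w
    scores = {}
    for gene, nbrs in adj.items():
        if gene in seeds:
            continue
        score = sum(w for n, w in nbrs.items() if n in seeds)
        if score > 0:
            scores[gene] = score
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, with edges [("S1", "G1", 2.0), ("S2", "G1", 1.5), ("S1", "G2", 1.0), ("G2", "G3", 3.0)] and seeds {"S1", "S2"}, rank_candidates returns [("G1", 3.5), ("G2", 1.0)]; G3 is omitted because it has no direct link to a seed.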
doi:10.1038/nbt.1603
PMCID: PMC2857375  PMID: 20118918
30.  The planar cell polarity effector Fuz is essential for targeted membrane trafficking, ciliogenesis, and mouse embryonic development 
Nature cell biology  2009;11(10):1225-1232.
The planar cell polarity (PCP) signaling pathway is essential for embryonic development because it governs diverse cellular behaviors, and the “core PCP” proteins, such as Dishevelled and Frizzled, have been extensively characterized [1–4]. By contrast, the “PCP effector” proteins, such as Intu and Fuz, remain largely unstudied [5, 6]. These proteins are essential for PCP signaling, but they have never been investigated in a mammal and their cell biological activities remain entirely unknown. We report here that Fuz mutant mice display neural tube defects, skeletal dysmorphologies, and Hedgehog signaling defects stemming from disrupted ciliogenesis. Using bioinformatics and imaging of an in vivo mucociliary epithelium, we establish a central role for Fuz in membrane trafficking, showing that Fuz is essential for trafficking of cargo to basal bodies and to the apical tips of cilia. Fuz is also essential for exocytosis in secretory cells. Finally, we identify a novel, Rab-related small GTPase as a Fuz interaction partner that is also essential for ciliogenesis and secretion. These results are significant because they provide novel insights into the mechanisms by which developmental regulatory systems like PCP signaling interface with fundamental cellular systems such as the vesicle trafficking machinery.
doi:10.1038/ncb1966
PMCID: PMC2755648  PMID: 19767740
31.  Disorder, promiscuity, and toxic partnerships 
Cell  2009;138(1):16-18.
Many genes are toxic when overexpressed, but general mechanisms for this toxicity have proven elusive. Vavouri et al. (2009) find that intrinsic protein disorder and promiscuous molecular interactions are strong determinants of dosage sensitivity, explaining in part the toxicity of dosage-sensitive oncogenes in mice and humans.
doi:10.1016/j.cell.2009.06.024
PMCID: PMC2848715  PMID: 19596229
32.  Ribosome stalk assembly requires the dual-specificity phosphatase Yvh1 for the exchange of Mrt4 with P0 
The Journal of Cell Biology  2009;186(6):849-862.
The step-by-step assembly process from preribosome in the nucleus to translation-competent 60S ribosome subunit in the cytoplasm is revealed (also see Kemmler et al. in this issue).
The ribosome stalk is essential for recruitment of translation factors. In yeast, P0 and Rpl12 correspond to bacterial L10 and L11 and form the stalk base of mature ribosomes, whereas Mrt4 is a nuclear paralogue of P0. In this study, we show that the dual-specificity phosphatase Yvh1 is required for the release of Mrt4 from the pre-60S subunits. Deletion of YVH1 leads to the persistence of Mrt4 on pre-60S subunits in the cytoplasm. A mutation in Mrt4 at the protein–RNA interface bypasses the requirement for Yvh1. Pre-60S subunits associated with Yvh1 contain Rpl12 but lack both Mrt4 and P0. These results suggest a linear series of events in which Yvh1 binds to the pre-60S subunit to displace Mrt4. Subsequently, P0 loads onto the subunit to assemble the mature stalk, and Yvh1 is released. The initial assembly of the ribosome with Mrt4 may provide functional compartmentalization of ribosome assembly in addition to the spatial separation afforded by the nuclear envelope.
doi:10.1083/jcb.200904110
PMCID: PMC2753163  PMID: 19797078
33.  Human Cell Chips: Adapting DNA Microarray Spotting Technology to Cell-Based Imaging Assays 
PLoS ONE  2009;4(10):e7088.
Here we describe human spotted cell chips, a technology for determining cellular state across arrays of cells subjected to chemical or genetic perturbation. Cells are grown and treated under standard tissue culture conditions before being fixed and printed onto replicate glass slides, effectively decoupling the experimental conditions from the assay technique. Each slide is then probed using immunofluorescence or another optical reporter and assayed by automated microscopy. We show potential applications of the cell chip by assaying HeLa and A549 samples for changes in target protein abundance (of the dsRNA-activated protein kinase PKR), subcellular localization (nuclear translocation of NFκB) and activation state (phosphorylation of STAT1 and of the p38 and JNK stress kinases) in response to treatment by several chemical effectors (anisomycin, TNFα, and interferon), and we demonstrate scalability by printing a chip with ∼4,700 discrete samples of HeLa cells. Coupling this technology to high-throughput methods for culturing and treating cell lines could enable researchers to examine the impact of exogenous effectors on the same population of experimentally treated cells across multiple reporter targets potentially representing a variety of molecular systems, thus producing a highly multiplexed dataset with minimized experimental variance and at reduced reagent cost compared to alternative techniques. The ability to prepare and store chips also allows researchers to follow up on observations gleaned from initial screens with maximal repeatability.
doi:10.1371/journal.pone.0007088
PMCID: PMC2760726  PMID: 19862318
34.  Mining gene functional networks to improve mass-spectrometry-based protein identification 
Bioinformatics  2009;25(22):2955-2961.
Motivation: High-throughput protein identification experiments based on tandem mass spectrometry (MS/MS) often suffer from low sensitivity and low-confidence protein identifications. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, there is often other evidence to suggest that a protein is present and confidence in individual protein identification can be updated accordingly.
Results: We develop a method that analyzes MS/MS experiments in the larger context of the biological processes active in a cell. Our method, MSNet, improves protein identification in shotgun proteomics experiments by considering information on functional associations from a gene functional network. MSNet substantially increases the number of proteins identified in the sample at a given error rate. We identify 8–29% more proteins than the original MS experiment when applied to yeast grown in different experimental conditions analyzed on different MS/MS instruments, and 37% more proteins in a human sample. We validate up to 94% of our identifications in yeast by presence in ground-truth reference sets.
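One simple way to express the idea of letting network neighbours inform identification confidence is to blend each protein's MS-derived probability with those of its functional partners. This is a hypothetical smoothing rule for illustration only, not MSNet's published scoring; the blending weight gamma is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def network_smoothed_probs(
    ms_prob: Dict[str, float],                 # protein -> MS identification probability
    edges: Iterable[Tuple[str, str, float]],   # (protein_a, protein_b, link weight)
    gamma: float = 0.5,                        # hypothetical weight on network evidence
) -> Dict[str, float]:
    """Blend each protein's MS probability with the weighted mean probability of
    its functional-network neighbours; proteins without neighbours are unchanged."""
    nbrs = defaultdict(list)
    for a, b, w in edges:
        nbrs[a].append((b, w))
        nbrs[b].append((a, w))
    out = {}
    for prot, p in ms_prob.items():
        linked = [(ms_prob[n], w) for n, w in nbrs[prot] if n in ms_prob]
        if not linked:
            out[prot] = p
            continue
        neighbour_mean = sum(q * w for q, w in linked) / sum(w for _, w in linked)
        out[prot] = (1 - gamma) * p + gamma * neighbour_mean
    return out
```

This rule can lower as well as raise a probability, so any practical scheme would still need to be calibrated against an error rate, as the abstract describes.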
Availability and Implementation: Software and datasets are available at http://aug.csres.utexas.edu/msnet
Contact: miranker@cs.utexas.edu, marcotte@icmb.utexas.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp461
PMCID: PMC2773251  PMID: 19633097
35.  Integrating shotgun proteomics and mRNA expression data to improve protein identification 
Bioinformatics  2009;25(11):1397-1403.
Motivation: Tandem mass spectrometry (MS/MS) offers fast and reliable characterization of complex protein mixtures, but suffers from low sensitivity in protein identification. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, there is often other information available, e.g. the probability of a protein's presence is likely to correlate with its mRNA concentration.
Results: We develop a Bayesian score that estimates the posterior probability of a protein's presence in the sample given its identification in an MS/MS experiment and its mRNA concentration measured under similar experimental conditions. Our method, MSpresso, substantially increases the number of proteins identified in an MS/MS experiment at the same error rate, e.g. in yeast, MSpresso increases the number of proteins identified by ∼40%. We apply MSpresso to data from different MS/MS instruments, experimental conditions and organisms (Escherichia coli, human), and predict 19–63% more proteins across the different datasets. MSpresso demonstrates that incorporating prior knowledge of protein presence into shotgun proteomics experiments can substantially improve protein identification scores.
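The core of such a Bayesian score can be sketched as a change of prior: if a search engine reports a presence probability computed under some generic prior, its implied likelihood ratio can be recombined with a prior derived from mRNA concentration. This assumes the MS evidence and the mRNA level are conditionally independent given presence, and that an mRNA-bin prior is already available; it is an illustration, not MSpresso's exact formulation.

```python
def combine_with_mrna_prior(p_ms: float, ms_prior: float, mrna_prior: float) -> float:
    """Posterior P(present) after swapping the prior behind an MS probability.

    Assumes p_ms was computed under a prior of ms_prior, and that mrna_prior is
    P(present) for proteins in the same mRNA-concentration bin. All three
    values must lie strictly between 0 and 1.
    """
    for v in (p_ms, ms_prior, mrna_prior):
        if not 0.0 < v < 1.0:
            raise ValueError("probabilities must be strictly between 0 and 1")
    # Likelihood ratio of the MS evidence = posterior odds / prior odds.
    lr = (p_ms / (1 - p_ms)) / (ms_prior / (1 - ms_prior))
    # Bayes' rule in odds form with the mRNA-informed prior.
    odds = lr * mrna_prior / (1 - mrna_prior)
    return odds / (1 + odds)
```

With a neutral MS result (p_ms = 0.5 under a 0.5 prior), the posterior simply equals the mRNA-based prior; a moderately confident MS result (0.6) combined with an mRNA prior of 0.8 rises to 6/7, about 0.86.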
Availability and Implementation: Software is available upon request from the authors. Mass spectrometry datasets and supplementary information are available from http://www.marcottelab.org/MSpresso/.
Contact: marcotte@icmb.utexas.edu; miranker@cs.utexas.edu
Supplementary Information: Supplementary data website: http://www.marcottelab.org/MSpresso/.
doi:10.1093/bioinformatics/btp168
PMCID: PMC2682515  PMID: 19318424
36.  Systematic Definition of Protein Constituents along the Major Polarization Axis Reveals an Adaptive Reuse of the Polarization Machinery in Pheromone-Treated Budding Yeast 
Polarizing cells extensively restructure cellular components in a spatially and temporally coupled manner along the major axis of cellular extension. Budding yeast are a useful model of polarized growth, helping to define many molecular components of this conserved process. Besides budding, yeast cells also differentiate upon treatment with pheromone from the opposite mating type, forming a mating projection (the ‘shmoo’) by directional restructuring of the cytoskeleton, localized vesicular transport and overall reorganization of the cytosol. To characterize the proteomic localization changes accompanying polarized growth, we developed and implemented a novel cell microarray-based imaging assay for measuring the spatial redistribution of a large fraction of the yeast proteome, and applied this assay to identify proteins localized along the mating projection following pheromone treatment. We further trained a machine learning algorithm to refine the cell imaging screen, identifying additional shmoo-localized proteins. In all, we identified 74 proteins that specifically localize to the mating projection, including previously uncharacterized proteins (Ycr043c, Ydr348c, Yer071c, Ymr295c, and Yor304c-a) and known polarization complexes such as the exocyst. Functional analysis of these proteins, coupled with quantitative analysis of individual organelle movements during shmoo formation, suggests a model in which the basic machinery for cell polarization is generally conserved between processes forming the bud and the shmoo, with a distinct subset of proteins used only for shmoo formation. The net effect is a defined ordering of major organelles along the polarization axis, with specific proteins implicated at the proximal growth tip.
Upon sensing mating pheromone, budding yeast cells form a mating projection (the ‘shmoo’) that serves as a model for polarized cell growth, involving cytoskeletal/cytosolic restructuring and directed vesicular transport. We developed a cell microarray-based imaging assay for measuring localization of the yeast proteome during polarized growth. We find major organelles ordered along the polarization axis, localize 74 proteins to the growth tip, and observe adaptive reuse of general polarization machinery.
doi:10.1021/pr800524g
PMCID: PMC2651748  PMID: 19053807
Proteomics; polarized growth; subcellular localization; pheromone response; yeast
37.  Buffering by gene duplicates: an analysis of molecular correlates and evolutionary conservation 
BMC Genomics  2008;9:609.
Background
One mechanism to account for robustness against gene knockouts or knockdowns is through buffering by gene duplicates, but the extent and general correlates of this process in organisms is still a matter of debate. To reveal general trends of this process, we provide a comprehensive comparison of gene essentiality, duplication and buffering by duplicates across seven bacteria (Mycoplasma genitalium, Bacillus subtilis, Helicobacter pylori, Haemophilus influenzae, Mycobacterium tuberculosis, Pseudomonas aeruginosa, Escherichia coli), and four eukaryotes (Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Mus musculus (mouse)).
Results
In nine of the eleven organisms, duplicates significantly increase chances of survival upon gene deletion (P-value ≤ 0.05), but only by up to 13%. Given that duplicates make up to 80% of eukaryotic genomes, the small contribution is surprising and points to dominant roles of other buffering processes, such as alternative metabolic pathways. The buffering capacity of duplicates appears to be independent of the degree of gene essentiality and tends to be higher for genes with high expression levels. For example, buffering capacity increases to 23% amongst highly expressed genes in E. coli. Sequence similarity and the number of duplicates per gene are weak predictors of the duplicate's buffering capacity. In a case study we show that buffering gene duplicates in yeast and worm are somewhat more similar in their functions than non-buffering duplicates and have increased transcriptional and translational activity.
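Buffering can be quantified as the gain in viability upon deletion for duplicated genes relative to singletons, with a one-sided Fisher's exact test for significance. The exact measure and test used in the study are not given here, so both are assumptions of this sketch.

```python
from math import comb


def buffering_effect(dup_viable: int, dup_total: int,
                     single_viable: int, single_total: int) -> float:
    """Difference in the fraction of genes whose deletion is viable
    (non-essential) between duplicated genes and singletons."""
    return dup_viable / dup_total - single_viable / single_total


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test P(X >= a) for the 2x2 table
    [[a, b], [c, d]] (rows: duplicate/singleton; cols: viable/essential)."""
    n_row1, n_col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, n_row1)
    # Hypergeometric tail: tables at least as enriched for viable duplicates.
    return sum(comb(n_col1, x) * comb(n - n_col1, n_row1 - x)
               for x in range(a, min(n_row1, n_col1) + 1)) / denom
```

A difference of 0.13 from buffering_effect would correspond to the "up to 13%" increase in survival mentioned above, under this assumed definition.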
Conclusion
In sum, the extent of gene essentiality and buffering by duplicates is not conserved across organisms and does not correlate with the organisms' apparent complexity. This heterogeneity goes beyond what would be expected from differences in experimental approaches alone. Buffering by duplicates contributes to robustness in several organisms, but to a small extent – and the relatively large amount of buffering by duplicates observed in yeast and worm may be largely specific to these organisms. Thus, the only common factor of buffering by duplicates between different organisms may be the by-product of duplicate retention due to demands of high dosage.
doi:10.1186/1471-2164-9-609
PMCID: PMC2627895  PMID: 19087332
38.  The APEX Quantitative Proteomics Tool: Generating protein quantitation estimates from LC-MS/MS proteomics results 
BMC Bioinformatics  2008;9:529.
Background
Mass spectrometry (MS)-based label-free protein quantitation has mainly focused on analysis of ion peak heights and peptide spectral counts. Most analyses of tandem mass spectrometry (MS/MS) data begin with an enzymatic digestion of a complex protein mixture to generate smaller peptides that can be separated and identified by an MS/MS instrument. Peptide spectral counting techniques attempt to quantify protein abundance by counting the number of detected tryptic peptides and their corresponding MS spectra. However, spectral counting is confounded by the fact that peptide physicochemical properties severely affect MS detection, resulting in each peptide having a different detection probability. Lu et al. (2007) described a modified spectral counting technique, Absolute Protein Expression (APEX), which improves on basic spectral counting methods by including a correction factor for each protein (called the Oi value) that accounts for variable peptide detection by MS techniques. The technique uses machine learning classification to derive peptide detection probabilities that are used to predict the number of tryptic peptides expected to be detected for one molecule of a particular protein (Oi). This predicted spectral count is compared to the protein's observed MS total spectral count during APEX computation of protein abundances.
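The APEX computation described above can be written as a normalisation of expected-count-corrected spectral counts. The formula in the docstring is our reading of Lu et al. (2007), including a presence-probability factor p_i and a total-molecule constant C that the abstract does not mention; verify it against the original paper before relying on it. This is plain Python, not the Java tool's code.

```python
from typing import Dict


def apex_abundances(
    spectral_counts: Dict[str, float],  # n_i: observed MS/MS spectra per protein
    oi: Dict[str, float],               # O_i: expected spectra per protein molecule
    p_present: Dict[str, float],        # p_i: probability the protein is present
    total_molecules: float,             # C: assumed total molecules in the sample
) -> Dict[str, float]:
    """APEX_i = (n_i * p_i / O_i) / sum_k (n_k * p_k / O_k) * C."""
    corrected = {prot: spectral_counts[prot] * p_present[prot] / oi[prot]
                 for prot in spectral_counts}
    total = sum(corrected.values())
    # Normalise so the estimates sum to the assumed total number of molecules.
    return {prot: v / total * total_molecules for prot, v in corrected.items()}
```

For two proteins with 10 spectra each, the one with the higher expected detectability (Oi = 2 versus 1) receives half the abundance estimate of the other, which is the correction spectral counting alone lacks.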
Results
The APEX Quantitative Proteomics Tool, introduced here, is a free open-source Java application that supports the APEX protein quantitation technique. The APEX tool uses data from standard tandem mass spectrometry proteomics experiments and provides computational support for APEX protein abundance quantitation through a set of graphical user interfaces that partition the parameter controls for the various processing tasks. The tool also provides a Z-score analysis for identification of significant differential protein expression, a utility to assess APEX classifier performance via cross validation, and a utility to merge multiple APEX results into a standardized format in preparation for further statistical analysis.
Conclusion
The APEX Quantitative Proteomics Tool provides a simple means to quickly derive hundreds to thousands of protein abundance values from standard liquid chromatography-tandem mass spectrometry proteomics datasets. The APEX tool provides a straightforward intuitive interface design overlaying a highly customizable computational workflow to produce protein abundance values from LC-MS/MS datasets.
doi:10.1186/1471-2105-9-529
PMCID: PMC2639435  PMID: 19068132
39.  Age-Dependent Evolution of the Yeast Protein Interaction Network Suggests a Limited Role of Gene Duplication and Divergence 
PLoS Computational Biology  2008;4(11):e1000232.
Proteins interact in complex protein–protein interaction (PPI) networks whose topological properties—such as scale-free topology, hierarchical modularity, and dissortativity—have suggested models of network evolution. Currently preferred models invoke preferential attachment or gene duplication and divergence to produce networks whose topology matches that observed for real PPIs, thus supporting these as likely models for network evolution. Here, we show that the interaction density and homodimeric frequency are highly protein age–dependent in real PPI networks in a manner which does not agree with these canonical models. In light of these results, we propose an alternative stochastic model, which adds each protein sequentially to a growing network in a manner analogous to protein crystal growth (CG) in solution. The key ideas are (1) interaction probability increases with availability of unoccupied interaction surface, thus following an anti-preferential attachment rule, (2) as a network grows, highly connected sub-networks emerge into protein modules or complexes, and (3) once a new protein is committed to a module, further connections tend to be localized within that module. The CG model produces PPI networks consistent in both topology and age distributions with real PPI networks and is well supported by the spatial arrangement of protein complexes of known 3-D structure, suggesting a plausible physical mechanism for network evolution.
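Rule (1), anti-preferential attachment driven by free interaction surface, can be illustrated with a toy growth simulation in which each new protein picks partners with probability proportional to their unoccupied binding sites. This sketch covers only rule (1); the number of sites per protein and the number of attachment attempts are hypothetical parameters, and it is not the authors' CG model.

```python
import random
from typing import List, Set, Tuple


def grow_network(n_proteins: int, sites: int = 3, attempts: int = 2,
                 seed: int = 0) -> Tuple[Set[Tuple[int, int]], List[int]]:
    """Add proteins one at a time; each new protein makes up to `attempts`
    bonds, choosing partners with probability proportional to their free
    interaction sites (anti-preferential attachment)."""
    rng = random.Random(seed)
    free = [sites]                      # free sites of protein 0
    edges: Set[Tuple[int, int]] = set()
    for new in range(1, n_proteins):
        free.append(sites)
        for _ in range(attempts):
            if free[new] == 0:
                break
            candidates = [p for p in range(new)
                          if free[p] > 0 and (p, new) not in edges]
            if not candidates:
                break
            weights = [free[p] for p in candidates]
            partner = rng.choices(candidates, weights=weights, k=1)[0]
            edges.add((partner, new))   # stored as (older, newer)
            free[partner] -= 1          # binding occupies surface on both partners
            free[new] -= 1
    return edges, free
```

Unlike preferential attachment, a protein becomes less attractive as it gains partners, because each bond uses up part of its interaction surface.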
Author Summary
Proteins function together forming stable protein complexes or transient interactions in various cellular processes, such as gene regulation and signaling. Here, we address the basic question of how these networks of interacting proteins evolve. This is an important problem, as the structures of such networks underlie important features of biological systems, such as functional modularity, error-tolerance, and stability. It is not yet known how these network architectures originate or what driving forces underlie the observed network structure. Several models have been proposed over the past decade—in particular, a “rich get richer” model (preferential attachment) and a model based upon gene duplication and divergence—often based only on network topologies. Here, we show that real yeast protein interaction networks show a unique age distribution among interacting proteins, which rules out these canonical models. In light of these results, we developed a simple, alternative model based on well-established physical principles, analogous to the process of growing protein crystals in solution. The model better explains many features of real PPI networks, including the network topologies, their characteristic age distributions, and the spatial distribution of subunits of differing ages within protein complexes, suggesting a plausible physical mechanism of network evolution.
doi:10.1371/journal.pcbi.1000232
PMCID: PMC2583957  PMID: 19043579
40.  mspire: mass spectrometry proteomics in Ruby 
Bioinformatics  2008;24(23):2796-2797.
Summary: Mass spectrometry-based proteomics stands to gain from additional analysis of its data, but its large, complex datasets make demands on speed and memory usage, requiring special consideration from scripting languages. The software library ‘mspire’, developed in the Ruby programming language, offers quick and memory-efficient readers for standard XML proteomics formats, converters for intermediate file types in typical proteomics spectral-identification workflows (including the Bioworks .srf format), and modules for the calculation of peptide false identification rates.
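Peptide false identification rates are commonly estimated with a target-decoy search. The sketch below is plain Python, not mspire's Ruby API, and assumes separate target and decoy searches with the false discovery rate (FDR) estimated as decoy hits divided by target hits above a score threshold; mspire's own calculations may use other estimators.

```python
from typing import Optional, Sequence


def fdr_at_threshold(target: Sequence[float], decoy: Sequence[float],
                     threshold: float) -> float:
    """Estimated FDR = decoy hits / target hits at or above the threshold."""
    t = sum(s >= threshold for s in target)
    d = sum(s >= threshold for s in decoy)
    return d / t if t else 0.0


def threshold_for_fdr(target: Sequence[float], decoy: Sequence[float],
                      alpha: float) -> Optional[float]:
    """Lowest target-score cutoff whose estimated FDR is <= alpha, i.e. the
    cutoff that keeps the most identifications; None if no cutoff qualifies."""
    for thr in sorted(set(target)):
        if fdr_at_threshold(target, decoy, thr) <= alpha:
            return thr
    return None
```

The estimated FDR is not necessarily monotone in the cutoff, which is why q-value methods report, for each cutoff, the minimum FDR at that cutoff or any less stringent one.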
Availability: Freely available at http://mspire.rubyforge.org. Additional data models, usage information, and methods available at http://bioinformatics.icmb.utexas.edu/mspire
Contact: marcotte@icmb.utexas.edu
doi:10.1093/bioinformatics/btn513
PMCID: PMC2639276  PMID: 18930952
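The summary mentions modules for calculating peptide false identification rates but does not say how they work. One widely used approach is target-decoy estimation, sketched below in Python under that assumption. It does not use or reproduce mspire's Ruby API, and the scored hits are invented for illustration.

```python
from typing import List, Optional, Tuple

# Each hit: (search score, True if the hit matched the decoy database).
Hit = Tuple[float, bool]

def decoy_fdr_at_threshold(hits: List[Hit], threshold: float) -> float:
    """Estimate the false identification rate among hits scoring >= threshold.

    Uses the simple ratio (#decoy hits) / (#target hits) above the threshold,
    which is one of several target-decoy conventions.
    """
    targets = sum(1 for score, is_decoy in hits if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits if score >= threshold and is_decoy)
    if targets == 0:
        return float("nan")  # undefined: no target hits pass the threshold
    return decoys / targets

def threshold_for_fdr(hits: List[Hit], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr, if any."""
    best = None
    for score in sorted({s for s, _ in hits}, reverse=True):
        if decoy_fdr_at_threshold(hits, score) <= max_fdr:
            best = score
    return best

# Hypothetical scored hits from a combined target + decoy search.
hits = [(3.1, False), (2.8, False), (2.7, True), (2.5, False),
        (2.2, False), (1.9, True), (1.5, False), (1.2, True)]
print(decoy_fdr_at_threshold(hits, 2.0))  # 0.25 (1 decoy / 4 targets)
print(threshold_for_fdr(hits, 0.25))      # 2.2
```

The scan checks every observed score because the estimated rate is not guaranteed to fall monotonically as the threshold rises.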
41.  Bud23 Methylates G1575 of 18S rRNA and Is Required for Efficient Nuclear Export of Pre-40S Subunits
Molecular and Cellular Biology  2008;28(10):3151-3161.
BUD23 was identified from a bioinformatics analysis of Saccharomyces cerevisiae genes involved in ribosome biogenesis. Deletion of BUD23 leads to severely impaired growth, reduced levels of the small (40S) ribosomal subunit, and a block in processing 20S rRNA to 18S rRNA, a late step in 40S maturation. Bud23 belongs to the S-adenosylmethionine-dependent Rossmann-fold methyltransferase superfamily and is related to small-molecule methyltransferases. Nevertheless, we hypothesized that Bud23 methylates rRNA. Methylation of G1575 is the only mapped modification for which the methylase has not been assigned. Here, we show that this modification is lost in bud23 mutants. The nuclear accumulation of the small-subunit reporters Rps2-green fluorescent protein (GFP) and Rps3-GFP, as well as of the rRNA processing intermediate, the 5′ internal transcribed spacer 1, indicates that bud23 mutants are defective for small-subunit export. Mutations in Bud23 that inactivated its methyltransferase activity complemented a bud23Δ mutant. In addition, mutant ribosomes in which G1575 was changed to adenosine supported growth comparable to that of cells with wild-type ribosomes. Thus, Bud23 protein, but not its methyltransferase activity, is important for biogenesis and export of the 40S subunit in yeast.
doi:10.1128/MCB.01674-07
PMCID: PMC2423152  PMID: 18332120
42.  Mechanisms of Cell Cycle Control Revealed by a Systematic and Quantitative Overexpression Screen in S. cerevisiae 
PLoS Genetics  2008;4(7):e1000120.
Regulation of cell cycle progression is fundamental to cell health and reproduction, and failures in this process are associated with many human diseases. Much of our knowledge of cell cycle regulators derives from loss-of-function studies. To reveal new cell cycle regulatory genes that are difficult to identify in loss-of-function studies, we performed a near-genome-wide flow cytometry assay of yeast gene overexpression-induced cell cycle delay phenotypes. We identified 108 genes whose overexpression significantly delayed the progression of the yeast cell cycle at a specific stage. Many of the genes are newly implicated in cell cycle progression, for example SKO1, RFA1, and YPR015C. The overexpression of RFA1 or YPR015C delayed the cell cycle at G2/M phases by disrupting spindle attachment to chromosomes and activating the DNA damage checkpoint, respectively. In contrast, overexpression of the transcription factor SKO1 arrests cells at G1 phase by activating the pheromone response pathway, revealing new cross-talk between osmotic sensing and mating. More generally, 92%–94% of the genes exhibit distinct phenotypes when overexpressed as compared to their corresponding deletion mutants, supporting the notion that many genes may gain functions upon overexpression. This work thus implicates new genes in cell cycle progression, complements previous screens, and lays the foundation for future experiments to define more precisely the roles of these genes in cell cycle progression.
Author Summary
All cells require proper cell cycle regulation; failure leads to numerous human diseases. Cell cycle mechanisms are broadly conserved across eukaryotes, with many key regulatory genes known. Nonetheless, our knowledge of regulators is incomplete. Many classic studies have analyzed yeast loss-of-function mutants to identify cell cycle genes. Studies have also implicated genes based upon their overexpression phenotypes, but the effects of gene overexpression on the cell cycle have not been quantified for all yeast genes. We individually quantified the effect of overexpression on cell cycle progression for nearly all (91%) of yeast genes, and we report the 108 genes causing the most significant and reproducible cell cycle defects, most of which have not been previously observed. We characterize three genes in more detail, implicating one in chromosomal segregation and mitotic spindle formation. A second affects mitotic stability and the DNA damage checkpoint. Curiously, overexpression of a third gene, SKO1, arrests the cell cycle by activating the pheromone response pathway, with cells mistakenly behaving as if mating pheromone is present. These results establish a basis for future experiments elucidating precise cell cycle roles for these genes. Similar assays in human cells could help further clarify the many connections between cell cycle control and cancers.
doi:10.1371/journal.pgen.1000120
PMCID: PMC2438615  PMID: 18617996
43.  Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy 
Genome Biology  2008;9(Suppl 1):S5.
The complete set of mouse genes, like the set of human genes, is still largely uncharacterized: experimental evidence regarding the activities and expression of the genes is accumulating, but most genes are still of unknown function. Within the context of the MouseFunc competition, we developed and applied two distinct large-scale data mining approaches to infer the functions (Gene Ontology annotations) of mouse genes from experimental observations in available functional genomics, proteomics, comparative genomics, and phenotypic data. The two strategies (the first using classifiers to map features to annotations, the second propagating annotations from characterized genes to uncharacterized genes along edges in a network constructed from the features) offer alternative and possibly complementary approaches to providing functional annotations. Here, we re-implement and evaluate these approaches and their combination for their ability to predict the proper functional annotations of genes in the MouseFunc data set. We show that, when controlling for the same set of input features, the network approach generally outperformed a naïve Bayesian classifier approach, while their combination offers some improvement over either independently. We measure predictive performance both on the MouseFunc competition hold-out set and by ten-fold cross-validation on the MouseFunc data. Across all 1,339 annotated genes in the MouseFunc test set, the median predictive power was quite strong (median area under a receiver operating characteristic curve of 0.865 and average precision of 0.195), indicating that a mining-based strategy with existing data is a promising path towards discovering mammalian gene functions.
As one product of this work, a high-confidence subset of the functional mouse gene network was produced — spanning >70% of mouse genes with >1.6 million associations — that is predictive of mouse (and therefore often human) gene function and functional associations. The network should be generally useful for mammalian gene functional analyses, such as for predicting interactions, inferring functional connections between genes and pathways, and prioritizing candidate genes. The network and all predictions are available on the worldwide web.
doi:10.1186/gb-2008-9-s1-s5
PMCID: PMC2447539  PMID: 18613949
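The reported performance figures (area under the ROC curve and average precision) can be computed from any ranked list of predictions for a single annotation term. The sketch below shows one standard way to compute both; the scores and labels are hypothetical, and the exact averaging and tie-handling conventions used in the MouseFunc evaluation are not stated here and may differ.

```python
from typing import Sequence

def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive gene outscores a
    randomly chosen negative gene (ties count 1/2). O(P*N), fine for a demo.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Mean precision at the rank of each positive (ties keep input order)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive")
    return sum(precisions) / len(precisions)

# Hypothetical predictor scores for six genes; True = gene carries the annotation.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [True, False, True, False, False, True]
print(f"AUC = {roc_auc(scores, labels):.3f}")
print(f"AP  = {average_precision(scores, labels):.3f}")
```

AUC ignores class imbalance, while average precision is sensitive to it, which is one reason a high AUC can coexist with a modest average precision when positives are rare.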
44.  Group II Intron Protein Localization and Insertion Sites Are Affected by Polyphosphate 
PLoS Biology  2008;6(6):e150.
Mobile group II introns consist of a catalytic intron RNA and an intron-encoded protein with reverse transcriptase activity, which act together in a ribonucleoprotein particle to promote DNA integration during intron mobility. Previously, we found that the Lactococcus lactis Ll.LtrB intron-encoded protein (LtrA), expressed alone or with the intron RNA to form ribonucleoprotein particles, localizes to bacterial cellular poles, potentially accounting for the intron's preferential insertion in the oriC and ter regions of the Escherichia coli chromosome. Here, by using cell microarrays and automated fluorescence microscopy to screen a transposon-insertion library, we identified five E. coli genes (gppA, uhpT, wcaK, ynbC, and zntR) whose disruption results in both an increased proportion of cells with more diffuse LtrA localization and a more uniform genomic distribution of Ll.LtrB-insertion sites. Surprisingly, we find that a common factor affecting LtrA localization in these and other disruptants is the accumulation of intracellular polyphosphate, which appears to bind LtrA and other basic proteins and delocalize them away from the poles. Our findings show that the intracellular localization of a group II intron-encoded protein is a major determinant of insertion-site preference. More generally, our results suggest that polyphosphate accumulation may provide a means of localizing proteins to different sites of action during cellular stress or entry into stationary phase, with potentially wide physiological consequences.
Author Summary
Group II introns are bacterial mobile elements thought to be ancestors of introns (genetic material that is discarded from messenger RNA transcripts) and retroelements (genetic elements and viruses that replicate via reverse transcription) in higher organisms. They propagate by forming a complex consisting of the catalytically active intron RNA and an intron-encoded reverse transcriptase, which converts the RNA into DNA that can then be inserted into the host genome. The Ll.LtrB group II intron-encoded protein (LtrA) was found previously to localize to bacterial cellular poles, potentially accounting for the preferential insertion of Ll.LtrB in the replication origin (oriC) and terminus (ter) regions of the Escherichia coli chromosome, which are located near the poles during much of the cell cycle. Here, we identify E. coli genes whose disruption leads both to more diffuse LtrA localization and to a more uniform chromosomal distribution of Ll.LtrB-insertion sites, showing that the location of the LtrA protein contributes to insertion-site preference. Surprisingly, we find that LtrA localization in the disruptants is affected by the accumulation of intracellular polyphosphate, which appears to bind basic proteins and delocalize them away from the cellular poles. Thus, polyphosphate, a ubiquitous but enigmatic molecule in prokaryotes and eukaryotes, can localize proteins to different sites of action, with potentially wide physiological consequences.
A novel cell microarray method uncovers connections between group II intron mobility, cell stress, and polyphosphate metabolism, including the finding that polyphosphate can influence intracellular protein localization.
doi:10.1371/journal.pbio.0060150
PMCID: PMC2435150  PMID: 18593213
45.  A map of human protein interactions derived from co-expression of human mRNAs and their orthologs 
The human protein interaction network will offer global insights into the molecular organization of cells and provide a framework for modeling human disease, but the network's large scale demands new approaches. We report a set of 7000 physical associations among human proteins inferred from indirect evidence: the comparison of human mRNA co-expression patterns with those of orthologous genes in five other eukaryotes, which we demonstrate identifies proteins in the same physical complexes. To evaluate the accuracy of the predicted physical associations, we apply quantitative mass spectrometry shotgun proteomics to measure elution profiles of 3013 human proteins during native biochemical fractionation, demonstrating systematically that putative interaction partners tend to co-sediment. We further validate uncharacterized proteins implicated by the associations in ribosome biogenesis, including WBSCR20C, associated with Williams–Beuren syndrome. This meta-analysis therefore exploits non-protein-based data, but successfully predicts associations, including 5589 novel human physical protein associations, with measured accuracies of 54±10%, comparable to direct large-scale interaction assays. The new associations' derivation from conserved in vivo phenomena argues strongly for their biological relevance.
doi:10.1038/msb.2008.19
PMCID: PMC2387231  PMID: 18414481
interactions; mass spectrometry; networks; proteomics; systems biology
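The validation step rests on the idea that physically associated proteins co-elute across biochemical fractions. A minimal way to quantify that is sketched below, assuming Pearson correlation as the similarity measure (the abstract does not name one) and using invented spectral-count profiles for three hypothetical proteins.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length elution profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: correlation undefined, treated as uninformative
    return cov / (sx * sy)

# Hypothetical spectral counts across 8 fractions.
protein_a = [0, 2, 10, 25, 12, 3, 0, 0]
protein_b = [0, 1, 8, 22, 14, 2, 1, 0]   # peaks in the same fractions as A
protein_c = [5, 12, 3, 0, 0, 1, 9, 20]   # elutes elsewhere

print(f"A vs B: {pearson(protein_a, protein_b):.2f}")
print(f"A vs C: {pearson(protein_a, protein_c):.2f}")
```

Co-elution alone is weak evidence, since unrelated proteins can share a fraction by chance; that is why the abstract uses it to evaluate predicted associations in aggregate rather than to call individual interactions.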
46.  Broad network-based predictability of Saccharomyces cerevisiae gene loss-of-function phenotypes 
Genome Biology  2007;8(12):R258.
Loss-of-function phenotypes of yeast genes can be predicted from the loss-of-function phenotypes of their neighbours in functional gene networks. This could potentially be applied to the prediction of human disease genes.
We demonstrate that loss-of-function yeast phenotypes are predictable by guilt-by-association in functional gene networks. Testing 1,102 loss-of-function phenotypes from genome-wide assays of yeast reveals predictability of diverse phenotypes, spanning cellular morphology, growth, metabolism, and quantitative cell shape features. We apply the method to extend a genome-wide screen by predicting, then verifying, genes whose disruption elongates yeast cells, and to predict human disease genes. To facilitate network-guided screens, a web server is available.
doi:10.1186/gb-2007-8-12-r258
PMCID: PMC2246260  PMID: 18053250
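Guilt-by-association prediction ranks genes by how strongly they are linked to genes already known to show a phenotype. The sketch below uses one simple scoring rule, summing the weights of edges to known ("seed") genes. The rule, the gene names (`g1` to `g6`), and the edge weights are illustrative assumptions rather than the paper's exact method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def gba_scores(edges: Iterable[Tuple[str, str, float]], seeds: Set[str]) -> Dict[str, float]:
    """Score each non-seed gene by the summed weight of its links to seed genes."""
    neighbours: Dict[str, Dict[str, float]] = defaultdict(dict)
    for a, b, w in edges:  # undirected, weighted edges
        neighbours[a][b] = w
        neighbours[b][a] = w
    scores = {}
    for gene, links in neighbours.items():
        if gene in seeds:
            continue  # already known to have the phenotype
        scores[gene] = sum((w for other, w in links.items() if other in seeds), 0.0)
    return scores

# Hypothetical weighted functional network.
edges = [
    ("g1", "g2", 2.0), ("g1", "g3", 1.5), ("g2", "g3", 1.0),
    ("g3", "g4", 0.5), ("g4", "g5", 2.5), ("g2", "g6", 0.8),
]
seeds = {"g1", "g2"}  # genes whose disruption is known to cause the phenotype
ranked = sorted(gba_scores(edges, seeds).items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

A natural way to test such a rule is to hide some seed genes and check how highly they are ranked when scored from the remaining seeds.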
47.  An Improved, Bias-Reduced Probabilistic Functional Gene Network of Baker's Yeast, Saccharomyces cerevisiae 
PLoS ONE  2007;2(10):e988.
Background
Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations.
Methodology/Principal Findings
We report a significantly improved version (v. 2) of a probabilistic functional gene network [1] of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis.
Conclusions/Significance
YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org.
doi:10.1371/journal.pone.0000988
PMCID: PMC1991590  PMID: 17912365
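The abstract does not give YeastNet's scoring formula. A log-likelihood score (LLS) of the form below is a common way to weight each evidence type in probabilistic functional networks, comparing the odds that gene pairs linked by a dataset share a reference annotation with the prior odds over all reference pairs. Assuming that form, a minimal calculation from pair counts looks like this; all counts are hypothetical.

```python
import math

def log_likelihood_score(pos_linked: int, neg_linked: int,
                         pos_total: int, neg_total: int) -> float:
    """LLS = ln[(P(L|E) / P(~L|E)) / (P(L) / P(~L))], estimated from pair counts.

    pos_linked / neg_linked: gene pairs linked by dataset E that do / do not
    share a reference annotation. pos_total / neg_total: the same counts over
    all reference gene pairs (the prior).
    """
    if min(pos_linked, neg_linked, pos_total, neg_total) <= 0:
        raise ValueError("all counts must be positive")
    odds_given_evidence = pos_linked / neg_linked
    prior_odds = pos_total / neg_total
    return math.log(odds_given_evidence / prior_odds)

# Hypothetical counts: the dataset's links are 25x enriched for shared annotations.
lls = log_likelihood_score(pos_linked=500, neg_linked=2_000,
                           pos_total=10_000, neg_total=1_000_000)
print(f"LLS = {lls:.2f}")  # ln(25), about 3.22
```

A positive score means a dataset links functionally related genes more often than chance, and a score near zero means it adds little information. How strongly the reference set is biased toward particular functions directly shifts these counts, which is the kind of training-set bias the abstract describes reducing.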
48.  Quantitative gene expression assessment identifies appropriate cell line models for individual cervical cancer pathways 
BMC Genomics  2007;8:117.
Background
Cell lines have been used to study cancer for decades, but truly quantitative assessment of their performance as models is often lacking. We used gene expression profiling to quantitatively assess the gene expression of nine cell line models of cervical cancer.
Results
We find a wide variation in the extent to which different cell culture models mimic late-stage invasive cervical cancer biopsies. The lowest agreement was from monolayer HeLa cells, a common cervical cancer model; the highest agreement was from primary epithelial cells, C4-I, and C4-II cell lines. In addition, HeLa and SiHa cell lines cultured in an organotypic environment increased their correlation to cervical cancer significantly. We also find wide variation in agreement when we consider how well individual biological pathways model cervical cancer. Cell lines with an anti-correlation to cervical cancer were also identified and should be avoided.
Conclusion
Using gene expression profiling and quantitative analysis, we have characterized nine cell lines with respect to how well they serve as models of cervical cancer. Applying this method to individual pathways, we determined which cell lines are appropriate for studying specific pathways in cervical cancer. This study will allow researchers to choose a cell line with the highest correlation to cervical cancer at a pathway level. This method is applicable to other cancers and could be used to identify the appropriate cell line and growth condition to employ when studying other cancers.
doi:10.1186/1471-2164-8-117
PMCID: PMC1878486  PMID: 17493265
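The pathway-level comparison can be pictured as correlating each cell line's expression of a pathway's genes with the tumour profile for the same genes. The sketch below assumes Pearson correlation and uses invented log-ratio values and placeholder names (`line_1`, `line_2`, `geneA` to `geneE`); it is not the paper's exact statistic.

```python
import math
from typing import Dict, List

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def pathway_agreement(tumour: Dict[str, float],
                      cell_lines: Dict[str, Dict[str, float]],
                      pathway: List[str]) -> Dict[str, float]:
    """Correlate each cell line with the tumour profile over one pathway's genes."""
    t = [tumour[g] for g in pathway]
    return {name: pearson([expr[g] for g in pathway], t) for name, expr in cell_lines.items()}

# Hypothetical log-ratio expression values (sample vs. normal tissue) for a 5-gene pathway.
pathway = ["geneA", "geneB", "geneC", "geneD", "geneE"]
tumour = {"geneA": 2.1, "geneB": 1.4, "geneC": -0.8, "geneD": 0.3, "geneE": -1.5}
cell_lines = {
    "line_1": {"geneA": 1.8, "geneB": 1.1, "geneC": -0.5, "geneD": 0.6, "geneE": -1.2},
    "line_2": {"geneA": -1.9, "geneB": -1.0, "geneC": 0.9, "geneD": -0.2, "geneE": 1.3},
}
agreement = pathway_agreement(tumour, cell_lines, pathway)
for name, r in sorted(agreement.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: r = {r:+.2f}")
```

Here `line_2` comes out strongly anti-correlated with the tumour profile for this pathway, the situation in which the abstract recommends avoiding a cell line.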
49.  How complete are current yeast and human protein-interaction networks? 
Genome Biology  2006;7(11):120.
How can protein-interaction networks be made more complete?
We estimate the full yeast protein-protein interaction network to contain 37,800-75,500 interactions and the human network 154,000-369,000, but owing to a high false-positive rate, current maps are roughly only 50% and 10% complete, respectively. Paradoxically, releasing raw, unfiltered assay data might help separate true from false interactions.
doi:10.1186/gb-2006-7-11-120
PMCID: PMC1794583  PMID: 17147767
50.  Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome 
Genome Biology  2005;6(5):R40.
To consolidate the known human protein interactions, two tests were developed to measure the relative accuracy of the available interaction data. In addition, 6,580 interactions among 3,737 human proteins were recovered from Medline abstracts and combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing data sets.
Background
Extensive protein interaction maps are being constructed for yeast, worm, and fly to ask how the proteins organize into pathways and systems, but no such genome-wide interaction map yet exists for the set of human proteins. To prepare for studies in humans, we wished to establish tests for the accuracy of future interaction assays and to consolidate the known interactions among human proteins.
Results
We established two tests of the accuracy of human protein interaction datasets and measured the relative accuracy of the available data. We then developed and applied natural language processing and literature-mining algorithms to recover 6,580 interactions among 3,737 human proteins from Medline abstracts. A three-part algorithm was used: first, human protein names were identified in Medline abstracts using a discriminator based on conditional random fields; second, interactions were identified by the co-occurrence of protein names across the set of Medline abstracts; and third, the interactions were filtered with a Bayesian classifier to enrich for legitimate physical interactions. These mined interactions were combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing datasets.
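Of the three steps, the co-occurrence step is the simplest to illustrate. The sketch below assumes protein names have already been tagged in each abstract (the conditional-random-field tagger and the Bayesian filter are not reproduced) and counts how many abstracts mention each pair of proteins. The abstracts and protein names (`ProtA` to `ProtE`) are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set

def cooccurrence_counts(tagged_abstracts: Iterable[Set[str]]) -> Counter:
    """Count, for each unordered protein pair, the abstracts that mention both."""
    counts: Counter = Counter()
    for names in tagged_abstracts:
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(names), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical tagger output: the protein names found in each abstract.
abstracts = [
    {"ProtA", "ProtB"},
    {"ProtA", "ProtB", "ProtC"},
    {"ProtD", "ProtE"},
    {"ProtA", "ProtC"},
]
counts = cooccurrence_counts(abstracts)
print(counts.most_common(2))
```

Raw co-occurrence mixes physical interactions with many other relationships (shared pathways, comparisons, lists of markers), which is why a downstream classifier is needed to enrich for physical interactions.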
Conclusion
These interactions and the accuracy benchmarks will aid interpretation of current functional genomics data and provide a basis for determining the quality of future large-scale human protein interaction assays. Projecting from the approximately 15 interactions per protein in the best-sampled interaction set to the estimated 25,000 human genes implies more than 375,000 interactions in the complete human protein interaction network. This set therefore represents no more than 10% of the complete network.
doi:10.1186/gb-2005-6-5-r40
PMCID: PMC1175952  PMID: 15892868
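The projection in the Conclusion is a single multiplication, and comparing it with the size of the consolidated network gives the coverage figure. The snippet follows the abstract's own framing (interactions per protein times number of genes), with all numbers taken from the abstract.

```python
interactions_per_protein = 15   # best-sampled interaction set, per the abstract
human_genes = 25_000            # estimated number of human genes, per the abstract
consolidated = 31_609           # interactions in the consolidated network

projected_total = interactions_per_protein * human_genes
coverage = consolidated / projected_total
print(projected_total)          # 375000
print(f"{coverage:.1%}")        # 8.4%
```

The result, roughly 8% coverage, is consistent with the statement that the consolidated set represents no more than 10% of the complete network.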
