Despite recent advances in human genetics, model organisms are indispensable for human disease research. Most human disease pathways are evolutionarily conserved among other species, where they may phenocopy the human condition or be associated with seemingly unrelated phenotypes. Much of the known gene-to-phenotype association information is distributed across diverse databases and is growing rapidly due to new experimental techniques. Accessible bioinformatics tools will therefore facilitate translation of discoveries from model organisms into human disease biology. Here, we present a web-based discovery tool for human disease studies, MORPHIN (model organisms projected on a human integrated gene network), which prioritizes the most relevant human diseases for a given set of model organism genes, potentially highlighting new model systems for human diseases and providing context to model organism studies. Conceptually, MORPHIN investigates human diseases by an orthology-based projection of a set of model organism genes onto a genome-scale human gene network. MORPHIN then prioritizes human diseases by relevance to the projected model organism genes using two distinct methods: a conventional overlap-based gene set enrichment analysis and a network-based measure of closeness between the query and disease gene sets capable of detecting associations undetectable by the conventional overlap-based methods. MORPHIN is freely accessible at http://www.inetbio.org/morphin.
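The overlap-based half of this prioritization can be illustrated with a one-sided hypergeometric test, a common choice for gene set enrichment. The sketch below is not MORPHIN's implementation; the function names, the toy gene identifiers, and the choice of the hypergeometric test itself are assumptions made for illustration, and the network-based closeness measure is not covered.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, disease_size, query_size, overlap):
    """P(X >= overlap) when query_size genes are drawn without replacement
    from a universe that contains disease_size disease genes."""
    max_overlap = min(disease_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(disease_size, k) * comb(universe_size - disease_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def rank_diseases(query_genes, disease_sets, universe):
    """Rank hypothetical disease gene sets by overlap enrichment with the
    (already orthology-projected) query genes; smallest p-value first."""
    universe = set(universe)
    query = set(query_genes) & universe
    results = []
    for name, genes in disease_sets.items():
        genes = set(genes) & universe
        k = len(query & genes)
        p = hypergeom_enrichment_p(len(universe), len(genes), len(query), k)
        results.append((name, k, p))
    return sorted(results, key=lambda r: r[2])


# Toy data (hypothetical gene names, not taken from any database).
universe = [f"g{i}" for i in range(1, 11)]
query = ["g1", "g2", "g3"]
diseases = {"disease_A": ["g1", "g2", "g3", "g4"], "disease_B": ["g8", "g9"]}
for name, k, p in rank_diseases(query, diseases, universe):
    print(f"{name}: overlap={k}, p={p:.4f}")
```

A test of this kind scores only direct gene overlap, which is why a query set that merely neighbours a disease gene set in the network would go undetected without the second, network-based measure.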
We are developing a new class of lanthanide-based self-assembling molecular nanoparticles as potential reporter molecules for imaging, and as multi-functional nanoprobes or nanosensors in diagnostic systems. These lanthanide “nano-drums” are homogeneous 4d–4f clusters approximately 25 to 30 Å in diameter that can emit from the visible to near-infrared (NIR) wavelengths. Here, we present syntheses, crystal structures, photophysical properties, and comparative cytotoxicity data for six nano-drums containing either Eu, Tb, Lu, Er, Yb or Ho. Imaging capabilities of these nano-drums are demonstrated using epifluorescence, total internal reflection fluorescence (TIRF), and two-photon microscopy. We discuss how these molecular nanoparticles can be adapted for a range of assays, particularly by taking advantage of functionalization strategies with chemical moieties to enable conjugation to protein or nucleic acids.
Studies have shown that the concentrations of proteins expressed
from orthologous genes are often conserved across organisms and to
a greater extent than the abundances of the corresponding mRNAs. However,
such studies have not distinguished between evolutionary (e.g., sequence
divergence) and environmental (e.g., growth condition) effects on
the regulation of steady-state protein and mRNA abundances. Here,
we systematically investigated the transcriptome and proteome of two
closely related Pseudomonas aeruginosa strains, PAO1 and PA14, under identical experimental conditions,
thus controlling for environmental effects. For 703 genes observed
by both shotgun proteomics and microarray experiments, we found that
the protein-to-mRNA ratios are highly correlated between orthologous
genes in the two strains to an extent comparable to protein and mRNA
abundances. In spite of this high molecular similarity between PAO1
and PA14, we found that several metabolic, virulence, and antibiotic
resistance genes are differentially expressed between the two strains,
mostly at the protein but not at the mRNA level. Our data demonstrate
that the magnitude and direction of the effect of protein abundance
regulation occurring after the setting of mRNA levels are conserved
between bacterial strains and are important for explaining the discordance
between mRNA and protein abundances.
Transcriptomics; proteomics; Pseudomonas
Many normally cytosolic yeast proteins form insoluble intracellular bodies in response to nutrient depletion, suggesting the potential for widespread protein aggregation in stressed cells. Nearly 200 such bodies have been found in yeast by screening libraries of fluorescently tagged proteins. To characterize the formation of these bodies in response to stress more broadly, we employed a proteome-wide shotgun mass spectrometry assay to measure shifts in the intracellular solubilities of endogenous proteins following heat stress. As quantified by mass spectrometry, heat stress tended to shift the same proteins into insoluble form as did nutrient depletion; many of these proteins were also known to form foci in response to arsenic stress. Affinity purification of several foci-forming proteins showed enrichment for co-purifying chaperones, including Hsp90 chaperones. Tests of induction conditions and co-localization of metabolic enzymes participating in the same metabolic pathways suggested those foci did not correspond to multi-enzyme organizing centers. Thus, in yeast, the formation of stress bodies appears common across diverse, normally diffuse cytoplasmic proteins and is induced by multiple types of cell stress, including thermal, chemical, and nutrient stress.
Characterizing the in vivo dynamics of the polyclonal
antibody repertoire in serum, such as that which might arise in response
to stimulation with an antigen, is difficult due to the presence of
many highly similar immunoglobulin proteins, each specified by distinct
B lymphocytes. These challenges have precluded the use of conventional
mass spectrometry for antibody identification based on peptide mass
spectral matches to a genomic reference database. Recently, progress
has been made using bottom-up analysis of serum antibodies by nanoflow
liquid chromatography/high-resolution tandem mass spectrometry combined
with a sample-specific antibody sequence database generated by high-throughput
sequencing of individual B cell immunoglobulin variable domains (V
genes). Here, we describe how intrinsic features of antibody primary
structure, most notably the interspersed segments of variable and
conserved amino acid sequences, generate recurring patterns in the
corresponding peptide mass spectra of V gene peptides, greatly complicating
the assignment of correct sequences to mass spectral data. We show
that the standard method of decoy-based error modeling fails to account
for the error introduced by these highly similar sequences, leading
to a significant underestimation of the false discovery rate. Because
of these effects, antibody-derived peptide mass spectra require increased
stringency in their interpretation. The use of filters based on the
mean precursor ion mass accuracy of peptide-spectrum matches is shown
to be particularly effective in distinguishing between “true”
and “false” identifications. These findings highlight
important caveats associated with the use of standard database search
and error-modeling methods with nonstandard data sets and custom sequence databases.
We find that the topologies of real-world networks, such as those formed within human societies, by the Internet, or among cellular proteins, are dominated by the mode of the interactions considered among the individuals. Specifically, a major dichotomy in previously studied networks arises from modeling networks in terms of pairwise versus group tasks. The former often intrinsically give rise to scale-free, disassortative, hierarchical networks, whereas the latter often give rise to single- or broad-scale, assortative, nonhierarchical networks. These dependencies explain contrasting observations among previous topological analyses of real-world complex systems. We also observe this trend in systems with natural hierarchies, in which alternate representations of the same networks, but which capture different levels of the hierarchy, manifest these signature topological differences. For example, in both the Internet and cellular proteomes, networks of lower-level system components (routers within domains or proteins within biological processes) are assortative and nonhierarchical, whereas networks of upper-level system components (internet domains or biological processes) are disassortative and hierarchical. Our results demonstrate that network topologies of complex systems must be interpreted in light of their hierarchical natures and interaction types.
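The assortative versus disassortative distinction can be made concrete with the degree assortativity coefficient, computed here as the Pearson correlation between the degrees found at the two ends of each edge. The sketch below is a minimal illustration on toy graphs, assuming undirected, unweighted edge lists; the function name and the example graphs are hypothetical and are not the networks analyzed in the study.

```python
from collections import Counter
from math import sqrt


def degree_assortativity(edges):
    """Pearson correlation of node degrees across the ends of each edge.
    Each undirected edge is counted in both directions so the measure is symmetric."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # all degrees equal: correlation is undefined
    return cov / sqrt(var_x * var_y)


# A hub-and-spoke graph: high-degree nodes attach to low-degree nodes.
star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]

# A triangle plus a separate pair: nodes attach to nodes of equal degree.
groups = [("p", "q"), ("q", "r"), ("p", "r"), ("s", "t")]

print(degree_assortativity(star))    # negative: disassortative
print(degree_assortativity(groups))  # positive: assortative
```

Negative values indicate that hubs tend to connect to low-degree nodes, as in the pairwise-task networks described above, while positive values indicate that nodes of similar degree connect to each other, as in group-based networks.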
Some metabolic pathway enzymes are known to organize into multi-enzyme complexes for reasons of catalytic efficiency, metabolite channeling, and other advantages of compartmentalization. It has long been an appealing prospect that de novo purine biosynthesis enzymes form such a complex, termed the “purinosome.” Early work characterizing these enzymes garnered scarce but encouraging evidence for its existence. Recent investigations led to the discovery in human cell lines of purinosome bodies—cytoplasmic puncta containing transfected purine biosynthesis enzymes, which were argued to correspond to purinosomes. New discoveries challenge both the functional and physiological relevance of these bodies in favor of protein aggregation.
The proteomes of cells, tissues, and organisms reflect active cellular processes and change continuously in response to intracellular and extracellular cues. Deep, quantitative profiling of the proteome, especially if combined with mRNA and metabolite measurements, should provide an unprecedented view of cell state, better revealing functions and interactions of cell components. Molecular diagnostics and biomarker discovery should benefit particularly from the accurate quantification of proteomes, since complex diseases like cancer change protein abundances and modifications. Currently, shotgun mass spectrometry is the primary technology for high-throughput protein identification and quantification; while powerful, it lacks high sensitivity and coverage. We draw parallels with next-generation DNA sequencing and propose a strategy, termed fluorosequencing, for sequencing peptides in a complex protein sample at the level of single molecules. In the proposed approach, millions of individual fluorescently labeled peptides are visualized in parallel, monitoring changing patterns of fluorescence intensity as N-terminal amino acids are sequentially removed, and using the resulting fluorescence signatures (fluorosequences) to uniquely identify individual peptides. We introduce a theoretical foundation for fluorosequencing and, by using Monte Carlo computer simulations, we explore its feasibility, anticipate the most likely experimental errors, quantify their potential impact, and discuss the broad potential utility offered by a high-throughput peptide sequencing technology.
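The idea of a fluorescence signature can be sketched for an idealized, error-free case. The code below assumes a single dye that labels every lysine (K) and one N-terminal residue removed per cycle; the function names and the example peptides are hypothetical, and none of the experimental errors explored in the Monte Carlo simulations (for example failed cleavage or dye loss) are modeled.

```python
def intensity_trace(peptide, labeled="K", cycles=None):
    """Number of dye-labeled residues still on the peptide after each
    degradation cycle, starting with cycle 0 (the intact peptide)."""
    cycles = len(peptide) if cycles is None else min(cycles, len(peptide))
    trace = []
    for c in range(cycles + 1):
        remaining = peptide[c:]  # residues left after removing c from the N-terminus
        trace.append(sum(1 for aa in remaining if aa in labeled))
    return trace


def fluorosequence(peptide, labeled="K", cycles=None):
    """Cycles at which fluorescence drops, i.e. positions of labeled residues."""
    trace = intensity_trace(peptide, labeled, cycles)
    return tuple(c for c in range(1, len(trace)) if trace[c] < trace[c - 1])


# Two hypothetical peptides with the same composition but different lysine positions.
print(intensity_trace("AKGAKR"))   # [2, 2, 1, 1, 1, 0, 0]
print(fluorosequence("AKGAKR"))    # (2, 5)
print(fluorosequence("KAGAKR"))    # (1, 5)
```

Under these assumptions the two peptides yield different signatures even though they contain the same residues, which is the property that would allow a sparse labeling pattern to identify a peptide against a reference proteome.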
The development of next-generation DNA and RNA sequencing methods has transformed biology, with current platforms generating >1 billion sequencing reads per run. Unfortunately, no method of similar scale and throughput exists to identify and quantify specific proteins in complex mixtures, representing a critical bottleneck in many biochemical and molecular diagnostic assays. What is urgently needed is a massively parallel method, akin to next-gen DNA sequencing, for identifying and quantifying peptides or proteins in a sample. In principle, single-molecule peptide sequencing could achieve this goal, allowing billions of distinct peptides to be sequenced in parallel and thereby identifying proteins composing the sample and digitally quantifying them by direct counting of peptides. Here, we discuss theoretical considerations of single molecule peptide sequencing, suggest one possible experimental strategy, and, using computer simulations, characterize the potential utility and unusual properties of this future proteomics technology.
In eukaryotes, the highly conserved U3 small nucleolar RNA (snoRNA) base-pairs to multiple sites in the pre-ribosomal RNA (pre-rRNA) to promote early cleavage and folding events. Binding of the U3 box A region to the pre-rRNA is mutually exclusive with folding of the central pseudoknot (CPK), a universally conserved rRNA structure of the small ribosomal subunit essential for protein synthesis. Here, we report that the DEAH-box helicase Dhr1 (Ecm16) is responsible for displacing U3. An active site mutant of Dhr1 blocked release of U3 from the pre-ribosome, thereby trapping a pre-40S particle. This particle had not yet achieved its mature structure because it contained U3, pre-rRNA, and a number of early-acting ribosome synthesis factors but noticeably lacked ribosomal proteins (r-proteins) that surround the CPK. Dhr1 was cross-linked in vivo to the pre-rRNA and to U3 sequences flanking regions that base-pair to the pre-rRNA including those that form the CPK. Point mutations in the box A region of U3 suppressed a cold-sensitive mutation of Dhr1, strongly indicating that U3 is an in vivo substrate of Dhr1. To support the conclusions derived from in vivo analysis we showed that Dhr1 unwinds U3-18S duplexes in vitro by using a mechanism reminiscent of DEAD box proteins.
U3 snoRNA binds to pre-rRNA, helping to orchestrate key steps in ribosome assembly. This study identifies Dhr1 as the essential RNA helicase that releases U3 snoRNA and allows ribosome maturation to continue.
Ribosomes are intricate assemblies of RNA and protein that are responsible for decoding a cell’s genetic information. Their assembly is a very rapid and dynamic process, requiring many ancillary factors in eukaryotic cells. One critical factor is the U3 snoRNA, which binds to the immature ribosomal RNA to direct early processing and folding of the RNA of the small subunit. Although U3 is essential to promote assembly, it must be actively removed to allow completion of RNA folding. Such RNA dynamics are often driven by RNA helicases, and here we use a broad range of experimental approaches to identify the RNA helicase Dhr1 as the enzyme responsible for removing U3 in yeast. A combination of techniques allows us to assess what goes wrong when Dhr1 is mutated, which parts of the RNA molecules the enzyme binds to, and how Dhr1 unwinds its substrates.
Bioengineering advances have made it possible to fundamentally alter the genetic codes of organisms. However, the evolutionary consequences of expanding an organism's genetic code with a non-canonical amino acid are poorly understood. Here we show that bacteriophages evolved on a host that incorporates 3-iodotyrosine at the amber stop codon acquired neutral and beneficial mutations to this new amino acid in their proteins, demonstrating that an expanded genetic code increases evolvability.
Cellular states are determined by differential expression of the cell’s proteins. The relationship between protein and mRNA expression levels is informative about the combined outcomes of translation and protein degradation, which are, in addition to transcription and mRNA stability, essential contributors to gene expression regulation. This review summarizes the state of knowledge about large-scale measurements of absolute protein and mRNA expression levels, and the degree of correlation between the two parameters. We summarize the information that can be derived from comparison of protein and mRNA expression levels and discuss how corresponding sequence characteristics suggest modes of regulation.
Cytokinesis and ciliogenesis are fundamental cellular processes that require strict coordination of microtubule organization and directed membrane trafficking. These processes have been intensely studied, but there has been little indication that regulatory machinery might be extensively shared between them. Here, we show that several central spindle/midbody proteins (PRC1, MKLP-1, INCENP, centriolin) also localize in specific patterns at the basal body complex in vertebrate ciliated epithelial cells. Moreover, bioinformatic comparisons of midbody and cilia proteomes reveal a highly significant degree of overlap. Finally, we used temperature-sensitive alleles of PRC1/spd-1 and MKLP-1/zen-4 in C. elegans to assess ciliary functions while bypassing these proteins' early role in cell division. These mutants displayed defects in both cilia function and cilia morphology. Together, these data suggest the conserved re-use of a surprisingly large number of proteins in the cytokinetic apparatus and in cilia.
Ciliogenesis; cytokinesis; PRC1; INCENP; MKLP-1; bioinformatics; cilia midbody
Both focused and large-scale cell biological and biochemical studies have revealed that hundreds of metabolic enzymes across diverse organisms form large intracellular bodies. These proteinaceous bodies range in form from fibers and intracellular foci—such as those formed by enzymes of nitrogen and carbon utilization and of nucleotide biosynthesis—to high-density packings inside bacterial microcompartments and eukaryotic microbodies. Although many enzymes clearly form functional mega-assemblies, it is not yet clear for many recently discovered cases whether they represent functional entities, storage bodies, or aggregates. In this article, we survey intracellular protein bodies formed by metabolic enzymes, asking when and why such bodies form and what their formation implies for the functionality—and dysfunctionality—of the enzymes that comprise them. The panoply of intracellular protein bodies also raises interesting questions regarding their evolution and maintenance within cells. We speculate on models for how such structures form in the first place and why they may be inevitable.
self-assembly; allosteric regulation; fibers; foci; storage bodies; aggregates; metabolic efficiency
The gram-negative opportunistic pathogen Pseudomonas aeruginosa is the primary cause of chronic respiratory infections in individuals with the heritable disease cystic fibrosis (CF). These infections can last for decades, during which time P. aeruginosa has been proposed to acquire beneficial traits via adaptive evolution. Because CF lacks an animal model that can acquire chronic P. aeruginosa infections, identifying genes important for long-term in vivo fitness remains difficult. However, since clonal, chronological samples can be obtained from chronically infected individuals, traits undergoing adaptive evolution can be identified. Recently we identified 24 P. aeruginosa gene expression traits undergoing parallel evolution in vivo in multiple individuals, suggesting they are beneficial to the bacterium. The goal of this study was to determine if these genes impact P. aeruginosa phenotypes important for survival in the CF lung. By using a gain-of-function genetic screen, we found that 4 genes and 2 operons undergoing parallel evolution in vivo promote P. aeruginosa biofilm formation. These genes/operons promote biofilm formation by increasing levels of the non-alginate exopolysaccharide Psl. One of these genes, phaF, enhances Psl production via a post-transcriptional mechanism, while the other 5 genes/operons do not act on either psl transcription or translation. Together, these data demonstrate that P. aeruginosa has evolved at least two pathways to over-produce a non-alginate exopolysaccharide during long-term colonization of the CF lung. More broadly, this approach allowed us to attribute a biological significance to genes with unknown function, demonstrating the power of using evolution as a guide for targeted genetic studies.
During vertebrate retinogenesis, the precise balance between retinoblast proliferation and differentiation is spatially and temporally regulated through a number of intrinsic factors and extrinsic signaling pathways. Moreover, there are complex gene regulatory network interactions between these intrinsic factors and extrinsic pathways, which ultimately function to determine when retinoblasts exit the cell cycle and terminally differentiate. We recently uncovered a cell non-autonomous role for the intrinsic HLH factor, Id2a, in regulating retinoblast proliferation and differentiation, with Id2a-deficient retinae containing an abundance of proliferative retinoblasts and an absence of terminally differentiated retinal neurons and glia. Here, we report that Id2a function is necessary and sufficient to limit Notch pathway activity during retinogenesis. Id2a-deficient retinae possess elevated levels of Notch pathway component gene expression, while retinae overexpressing id2a possess reduced expression of Notch pathway component genes. Attenuation of Notch signaling activity by DAPT or by morpholino knockdown of Notch1a is sufficient to rescue both the proliferative and differentiation defects in Id2a-deficient retinae. In addition to regulating Notch pathway activity, through a novel RNA-Seq and differential gene expression analysis of Id2a-deficient retinae, we identify a number of additional intrinsic and extrinsic regulatory pathway components whose expression is regulated by Id2a. These data highlight the integral role played by Id2a in the gene regulatory network governing the transition from retinoblast proliferation to terminal differentiation during vertebrate retinogenesis.
Id2a; Id2; retinogenesis; Notch; zebrafish
Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions which were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably-associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes, and encompass both candidate disease genes and unannotated proteins to inform on mechanism. Strikingly, whereas larger multi-protein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with 5 or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.
Myelination of the central nervous system (CNS) is critical to vertebrate nervous systems for efficient neural signaling. CNS myelination occurs as oligodendrocytes terminally differentiate, a process regulated in part by the myelin regulatory factor, MYRF. Using bioinformatics and extensive biochemical and functional assays, we find that MYRF is generated as an integral membrane protein that must be processed to release its transcription factor domain from the membrane. In contrast to most membrane-bound transcription factors, MYRF proteolysis seems constitutive and independent of cell- and tissue-type, as we demonstrate by reconstitution in E. coli and yeast. The apparent absence of physiological cues raises the question as to how and why MYRF is processed. By using computational methods capable of recognizing extremely divergent sequence homology, we identified a MYRF protein domain distantly related to bacteriophage tailspike proteins. Although occurring in otherwise unrelated proteins, the phage domains are known to chaperone the tailspike proteins' trimerization and auto-cleavage, raising the hypothesis that the MYRF domain might contribute to a novel activation method for a membrane-bound transcription factor. We find that the MYRF domain indeed serves as an intramolecular chaperone that facilitates MYRF trimerization and proteolysis. Functional assays confirm that the chaperone domain-mediated auto-proteolysis is essential both for MYRF's transcriptional activity and its ability to promote oligodendrocyte maturation. This work thus reveals a previously unknown key step in CNS myelination. These data also reconcile conflicting observations of this protein family, different members of which have been identified as transmembrane or nuclear proteins. 
Finally, our data illustrate a remarkable evolutionary repurposing between bacteriophages and eukaryotes, with a chaperone domain capable of catalyzing trimerization-dependent auto-proteolysis in two entirely distinct protein and cellular contexts, in one case participating in bacteriophage tailspike maturation and in the other activating a key transcription factor for CNS myelination.
Membrane-bound transcription factors are synthesized as integral membrane proteins, but are proteolytically cleaved in response to relevant cues, untethering their transcription factor domains from the membrane to control gene expression in the nucleus. Here, we find that the myelin regulatory factor MYRF, a major transcriptional regulator of oligodendrocyte differentiation and central nervous system myelination, is also a membrane-bound transcription factor. In marked contrast to most well-known membrane-bound transcription factors, cleavage of MYRF appears to be unconditional. Surprisingly, this processing is performed by a protein domain shared with bacteriophages in otherwise unrelated proteins, where the domain is critical to the folding and proteolytic maturation of virus tailspikes. In addition to revealing a previously unknown key step in central nervous system myelination, this work also illustrates a remarkable example of evolutionary repurposing between bacteriophages and eukaryotes, with the same protein domain capable of catalyzing trimerization-dependent auto-proteolysis in two completely distinct protein and cellular contexts.
Gram-negative bacteria produce outer membrane vesicles (OMVs) that package and deliver proteins, small molecules, and DNA to prokaryotic and eukaryotic cells. The molecular details of OMV biogenesis have not been fully elucidated, but peptidoglycan-associated outer membrane proteins that tether the outer membrane to the underlying peptidoglycan have been shown to be critical for OMV formation in multiple Enterobacteriaceae. In this study, we demonstrate that the peptidoglycan-associated outer membrane proteins OprF and OprI, but not OprL, impact production of OMVs by the opportunistic pathogen Pseudomonas aeruginosa. Interestingly, OprF does not appear to be important for tethering the outer membrane to peptidoglycan but instead impacts OMV formation through modulation of the levels of the Pseudomonas quinolone signal (PQS), a quorum signal previously shown by our laboratory to be critical for OMV formation. Thus, the mechanism by which OprF impacts OMV formation is distinct from that for other peptidoglycan-associated outer membrane proteins, including OprI.
Mass spectrometry (MS)-based shotgun proteomics allows protein identifications even in complex biological samples. Protein abundances can then be estimated from the counts of MS/MS spectra attributable to each protein, provided that one corrects for differential MS-detectability of the contributing peptides. We describe the use of a method, APEX, which calculates Absolute Protein EXpression levels based on learned correction factors, MS/MS spectral counts, and each protein's probability of correct identification.
The APEX-based calculations consist of three parts: (1) Using training data, peptide sequences and their sequence properties, a model is built that can be used to estimate MS-detectability (Oi) for any given protein. (2) Absolute abundances of proteins measured in an MS/MS experiment are calculated with information from spectral counts, identification probabilities and the learned Oi values. (3) Simple statistics allow for significance analysis of differential expression in two distinct biological samples, i.e., measuring relative protein abundances. APEX-based protein abundances span more than four orders of magnitude and are applicable to mixtures of hundreds to thousands of proteins from any type of organism.
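Part (2) can be sketched as a small calculation. The text above does not spell out the formula, so the code assumes the commonly cited form in which each protein's spectral count is weighted by its identification probability, divided by its MS-detectability Oi, normalized over all proteins, and scaled by an estimated total number of protein molecules. The function name, parameter names, and all numbers are hypothetical, and the Oi values are taken as given rather than learned.

```python
def apex_abundances(spectral_counts, id_probs, detectability, total_molecules=1.0):
    """Estimate absolute abundances from spectral counts.

    spectral_counts: protein -> observed MS/MS spectral count
    id_probs:        protein -> probability the identification is correct
    detectability:   protein -> expected MS-detectability (Oi), assumed precomputed
    total_molecules: scaling constant (assumed total protein molecules per cell)
    """
    # Correct each count for identification confidence and detectability.
    weighted = {
        protein: spectral_counts[protein] * id_probs[protein] / detectability[protein]
        for protein in spectral_counts
    }
    norm = sum(weighted.values())
    # Normalize to fractions of the total, then scale to absolute numbers.
    return {protein: total_molecules * w / norm for protein, w in weighted.items()}


# Toy example: equal spectral counts, but protB is twice as detectable as protA,
# so protA is inferred to be twice as abundant.
counts = {"protA": 10, "protB": 10}
probs = {"protA": 1.0, "protB": 1.0}
o_values = {"protA": 1.0, "protB": 2.0}
abundances = apex_abundances(counts, probs, o_values, total_molecules=3e6)
print(abundances)
```

Dividing by Oi is what separates this estimate from raw spectral counting: a protein whose peptides are easy to detect contributes many spectra per molecule, so its count is scaled down accordingly.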
Quantitative proteomics; Protein expression; Label-free mass spectrometry; Spectral counting
Recent advances in next-generation DNA sequencing and proteomics provide an unprecedented ability to survey mRNA and protein abundances. Such proteome-wide surveys are illuminating the extent to which different aspects of gene expression help to regulate cellular protein abundances. Current data demonstrate a substantial role for regulatory processes occurring after mRNA is made — that is, post-transcriptional, translational and protein degradation regulation — in controlling steady-state protein abundances. Intriguing observations are also emerging in relation to cells following perturbation, single-cell studies and the apparent evolutionary conservation of protein and mRNA abundances. Here, we summarize current understanding of the major factors regulating protein expression.
We have described a protocol for performing high-throughput immunofluorescence microscopy on microarrays of yeast cells. This approach employs immunostaining of spheroplasted yeast cells printed as high-density cell microarrays, followed by imaging using automated microscopy. A yeast spheroplast microarray can contain more than 5,000 printed spots, each containing cells from a given yeast strain, and is thus suitable for genome-wide screens focusing on single cell phenotypes, such as systematic localization or co-localization studies or genetic assays for genes affecting probed targets. We demonstrate the use of yeast spheroplast microarrays to probe microtubule and spindle defects across a collection of yeast strains harboring tetracycline-down-regulatable alleles of essential genes.
Yeast; immunofluorescence; high-throughput microscopy; cell microarrays; microtubule
It has been hypothesized that components of enzymatic pathways might organize into intracellular assemblies to improve their catalytic efficiency or lead to coordinate regulation. Accordingly, de novo purine biosynthesis enzymes may form a purinosome in the absence of purines, and a punctate intracellular body has been identified as the purinosome. We investigated the mechanism by which human de novo purine biosynthetic enzymes might be organized into purinosomes, especially under differing cellular conditions. Regardless of the activity of bodies formed by endogenous enzymes, we demonstrate that intracellular bodies formed by transiently transfected, fluorescently tagged human purine biosynthesis proteins are best explained as protein aggregation.
Motivation: A number of computational methods have been proposed that predict protein–protein interactions (PPIs) based on protein sequence features. Since the number of potential non-interacting protein pairs (negative PPIs) is very high both in absolute terms and in comparison to that of interacting protein pairs (positive PPIs), computational prediction methods rely upon subsets of negative PPIs for training and validation. Hence, the need arises for subset sampling for negative PPIs.
Results: We clarify that there are two fundamentally different types of subset sampling for negative PPIs. One is subset sampling for cross-validated testing, where one desires unbiased subsets so that predictive performance estimated with them can be safely assumed to generalize to the population level. The other is subset sampling for training, where one desires the subsets that best train predictive algorithms, even if these subsets are biased. We show that confusion between these two fundamentally different types of subset sampling led one study recently published in Bioinformatics to the erroneous conclusion that predictive algorithms based on protein sequence features are hardly better than random in predicting PPIs. Rather, both protein sequence features and the ‘hubbiness’ of interacting proteins contribute to effective prediction of PPIs. We provide guidance for appropriate use of random versus balanced sampling.
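The difference between the two sampling schemes can be illustrated on a toy interaction set. The sketch below is one plausible reading, not the procedure used in the study: random sampling draws uniformly from all non-interacting pairs, while balanced sampling is approximated by a greedy scheme in which each protein appears in the negative set at most as often as it appears in the positive set. All function names and protein identifiers are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def random_negatives(proteins, positives, n, rng):
    """Uniformly sample n protein pairs that are not known positives."""
    pos = {frozenset(p) for p in positives}
    candidates = [
        frozenset(c) for c in combinations(sorted(proteins), 2)
        if frozenset(c) not in pos
    ]
    return rng.sample(candidates, n)


def balanced_negatives(positives, rng, max_tries=1000):
    """Greedy approximation of balanced sampling: each protein gets one
    'stub' per positive interaction, and negatives are built by pairing stubs,
    so no protein is used more often than in the positive set."""
    pos = {frozenset(p) for p in positives}
    stubs = [protein for pair in positives for protein in pair]
    negatives = set()
    for _ in range(max_tries):
        if len(stubs) < 2:
            break
        a, b = rng.sample(range(len(stubs)), 2)
        pair = frozenset((stubs[a], stubs[b]))
        # Reject self-pairs, known positives, and duplicates.
        if len(pair) == 2 and pair not in pos and pair not in negatives:
            negatives.add(pair)
            for i in sorted((a, b), reverse=True):
                stubs.pop(i)
    return negatives


def usage_counts(pairs):
    """How many pairs each protein takes part in."""
    return Counter(protein for pair in pairs for protein in pair)


# Toy positive set in which H is a hub.
positives = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"),
             ("A", "B"), ("C", "E"), ("D", "F"), ("E", "F")]
proteins = {p for pair in positives for p in pair}
rng = random.Random(0)

print(usage_counts(random_negatives(proteins, positives, 8, rng)))
print(usage_counts(balanced_negatives(positives, rng)))
```

In the balanced set, hub proteins can recur among the negatives roughly as often as among the positives, which is why such sets may train a classifier well yet give a biased estimate of performance on the population of all protein pairs.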
Availability: The datasets used for this study are available at http://www.marcottelab.org/PPINegativeDataSampling.
Supplementary Information: Supplementary data are available at Bioinformatics online.