Results 1-13 (13)

1.  Integrating protein-protein interaction networks with phenotypes reveals signs of interactions 
Nature methods  2013;11(1):10.1038/nmeth.2733.
A major objective of systems biology is to organize molecular interactions as networks and to characterize information flow within networks. We describe a computational framework to integrate protein-protein interaction (PPI) networks and genetic screens to predict the “signs” of interactions (i.e., activation/inhibition relationships). We constructed a signed Drosophila melanogaster PPI network of 6,125 signed PPIs connecting 3,352 proteins; this network can be used to identify positive and negative regulators of signaling pathways and protein complexes. We identified unexpected roles for the metabolic enzymes Enolase and Aldo-keto reductase as positive and negative regulators of proteolysis, respectively. Characterization of the activation/inhibition relationships between physically interacting proteins within signaling pathways will impact our understanding of many biological functions, including signal transduction and mechanisms of disease.
PMCID: PMC3877743  PMID: 24240319
2.  Online GESS: prediction of miRNA-like off-target effects in large-scale RNAi screen data by seed region analysis 
BMC Bioinformatics  2014;15:192.
RNA interference (RNAi) is an effective and important tool used to study gene function. For large-scale screens, RNAi is used to systematically down-regulate genes of interest and analyze their roles in a biological process. However, RNAi is associated with off-target effects (OTEs), including microRNA (miRNA)-like OTEs. The contribution of reagent-specific OTEs to RNAi screen data sets can be significant. In addition, the post-screen validation process is time- and labor-intensive. Thus, robust approaches to identify candidate off-targeted transcripts would be beneficial.
Considerable efforts have been made to eliminate false positive results attributable to sequence-specific OTEs associated with RNAi. These approaches have included improved algorithms for RNAi reagent design, incorporation of chemical modifications into siRNAs, and the use of various bioinformatics strategies to identify possible OTEs in screen results. Genome-wide Enrichment of Seed Sequence matches (GESS) was developed to identify potential off-targeted transcripts in large-scale screen data by seed-region analysis. Here, we introduce a user-friendly web application that provides researchers with a relatively quick and easy way to perform GESS analysis on data from human or mouse cell-based screens using short interfering RNAs (siRNAs) or short hairpin RNAs (shRNAs), as well as from Drosophila screens using shRNAs. Online GESS relies on up-to-date transcript sequence annotations for human and mouse genes, extracted from NCBI Reference Sequence (RefSeq), and for Drosophila genes, extracted from FlyBase. The tool also accommodates analysis with user-provided reference sequence files.
Online GESS provides a straightforward user interface for genome-wide seed region analysis for human, mouse and Drosophila RNAi screen data. With the tool, users can either use a built-in database or provide a database of transcripts for analysis. This makes it possible to analyze RNAi data from any organism for which the user can provide transcript sequences.
PMCID: PMC4073188  PMID: 24934636
RNAi; Off-target effects; Data analysis; Seed region; miRNA; siRNA; shRNA; High-throughput screening
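The seed-region idea behind GESS can be illustrated with a short sketch. It is not the Online GESS implementation: it assumes a 7-nt seed at guide-strand positions 2-8, perfect seed matches only, and a one-sided hypergeometric test that compares hit reagents with all other screened reagents for a single transcript's 3' UTR. The function names and example sequences are hypothetical.

```python
from math import comb

# Assumption: seed = guide-strand positions 2-8 (1-based, inclusive).
# GESS itself may use a different seed definition or strand handling.
SEED_START, SEED_END = 2, 8


def reverse_complement(seq: str) -> str:
    """Reverse complement of an RNA or DNA sequence, returned as DNA."""
    return seq.upper().translate(str.maketrans("ACGTU", "TGCAA"))[::-1]


def seed_site(guide: str) -> str:
    """Target-site sequence (DNA, 5'->3') complementary to the guide's seed."""
    return reverse_complement(guide[SEED_START - 1:SEED_END])


def has_seed_match(guide: str, utr: str) -> bool:
    """True if the 3' UTR contains a perfect match to the guide's seed."""
    return seed_site(guide) in utr.upper().replace("U", "T")


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def seed_enrichment(hit_guides, other_guides, utr):
    """Test whether seed matches to one transcript are over-represented among hits.

    Returns (k, K, n, N, p): hits with a match, reagents with a match,
    number of hits, number of reagents screened, one-sided p-value.
    """
    n = len(hit_guides)
    N = n + len(other_guides)
    k = sum(has_seed_match(g, utr) for g in hit_guides)
    K = k + sum(has_seed_match(g, utr) for g in other_guides)
    return k, K, n, N, hypergeom_upper_tail(k, K, n, N)
```

In a genome-wide run, one would repeat `seed_enrichment` for every transcript in the reference set (for example, RefSeq or FlyBase 3' UTRs) and correct the resulting p-values for multiple testing.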
3.  FlyPrimerBank: An Online Database for Drosophila melanogaster Gene Expression Analysis and Knockdown Evaluation of RNAi Reagents 
G3: Genes|Genomes|Genetics  2013;3(9):1607-1616.
The evaluation of specific endogenous transcript levels is important for understanding transcriptional regulation. More specifically, it is useful for independent confirmation of results obtained by the use of microarray analysis or RNA-seq and for evaluating RNA interference (RNAi)-mediated gene knockdown. Designing specific and effective primers for high-quality, moderate-throughput evaluation of transcript levels, i.e., quantitative, real-time PCR (qPCR), is nontrivial. To meet community needs, predefined qPCR primer pairs for mammalian genes have been designed and sequences made available, e.g., via PrimerBank. In this work, we adapted and refined the algorithms used for the mammalian PrimerBank to design 45,417 primer pairs for 13,860 Drosophila melanogaster genes, with three or more primer pairs per gene. We experimentally validated primer pairs for ~300 randomly selected genes expressed in early Drosophila embryos, using SYBR Green-based qPCR and sequence analysis of products derived from conventional PCR. All relevant information, including primer sequences, isoform specificity, spatial transcript targeting, and any available validation results and/or user feedback, is available from an online database. At FlyPrimerBank, researchers can retrieve primer information for fly genes either one gene at a time or in batch mode. Importantly, we included the overlap of each predicted amplified sequence with RNAi reagents from several public resources, making it possible for researchers to choose primers suitable for knockdown evaluation of RNAi reagents (i.e., to avoid amplification of the RNAi reagent itself). We demonstrate the utility of this resource for validation of RNAi reagents in vivo.
PMCID: PMC3755921  PMID: 23893746
Drosophila; real-time PCR; gene expression; RNAi; knockdown evaluation
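The primer-selection criterion described above, avoiding amplicons that overlap an RNAi reagent, amounts to an interval-overlap check. The sketch below assumes that amplicons and reagent target regions are given as half-open intervals on the same transcript; the `Interval` type, the function names and the primer-pair ids are hypothetical and do not reflect FlyPrimerBank's data model.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Half-open interval [start, end) in transcript coordinates (assumed)."""
    start: int
    end: int


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of positions shared by two half-open intervals (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def primers_for_knockdown_check(amplicons: dict, reagent_regions: list) -> list:
    """Primer-pair ids whose amplicon overlaps none of the RNAi reagent regions.

    Such pairs measure the endogenous transcript while avoiding
    amplification of the RNAi reagent itself.
    """
    return [
        pair_id
        for pair_id, amplicon in amplicons.items()
        if all(overlap_length(amplicon, region) == 0 for region in reagent_regions)
    ]


if __name__ == "__main__":
    amplicons = {"pair-1": Interval(100, 250), "pair-2": Interval(600, 720)}
    reagent_regions = [Interval(200, 500)]
    print(primers_for_knockdown_check(amplicons, reagent_regions))  # ['pair-2']
```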
4.  Protein Complex–Based Analysis Framework for High-Throughput Data Sets 
Science signaling  2013;6(264):rs5.
Analysis of high-throughput data increasingly relies on pathway annotation and functional information derived from Gene Ontology. This approach has limitations, in particular for the analysis of network dynamics over time or under different experimental conditions, in which modules within a network rather than complete pathways might respond and change. We report an analysis framework based on protein complexes, which are at the core of network reorganization. We generated a protein complex resource for human, Drosophila, and yeast from the literature and databases of protein-protein interaction networks, with each species having thousands of complexes. We developed COMPLEAT, a tool for data mining and visualization for complex-based analysis of high-throughput data sets, as well as for analysis and integration of heterogeneous proteomics and gene expression data sets. With COMPLEAT, we identified dynamically regulated protein complexes among genome-wide RNA interference data sets that used the abundance of phosphorylated extracellular signal–regulated kinase in cells stimulated with either insulin or epidermal growth factor as the output. The analysis predicted that the Brahma complex participated in the insulin response.
PMCID: PMC3756668  PMID: 23443684
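A complex-based analysis scores groups of genes rather than single genes. As a rough illustration only, the sketch below scores a complex by the mean absolute screen score of its members and estimates significance by comparing it with random gene sets of the same size. This is a generic permutation approach chosen for simplicity; COMPLEAT's actual scoring and statistics may differ, and all names here are hypothetical.

```python
import random
from statistics import mean


def complex_score(members, gene_scores):
    """Mean absolute screen score of a complex's members (one simple choice)."""
    return mean(abs(gene_scores[g]) for g in members)


def complex_p_value(complex_genes, gene_scores, n_perm=1000, seed=0):
    """Empirical p-value: how often a random gene set of the same size scores
    at least as high as the complex. Members without a score are ignored."""
    members = [g for g in complex_genes if g in gene_scores]
    if not members:
        raise ValueError("no complex member has a screen score")
    observed = complex_score(members, gene_scores)
    rng = random.Random(seed)  # fixed seed for reproducible permutations
    universe = list(gene_scores)
    as_extreme = sum(
        complex_score(rng.sample(universe, len(members)), gene_scores) >= observed
        for _ in range(n_perm)
    )
    # The +1 correction keeps the estimate away from exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)
```

Taking the absolute value lets a complex score highly whether its members act as positive or negative regulators; a signed mean would instead separate the two directions.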
5.  Genetic Determinants of Phosphate Response in Drosophila 
PLoS ONE  2013;8(3):e56753.
Phosphate is required for many important cellular processes, and having too little or too much phosphate can cause disease and reduce life span in humans. However, the mechanisms underlying homeostatic control of extracellular phosphate levels and cellular effects of phosphate are poorly understood. Here, we establish Drosophila melanogaster as a model system for the study of phosphate effects. We found that Drosophila larval development depends on the availability of phosphate in the medium. Conversely, life span is reduced when adult flies are cultured on high phosphate medium or when hemolymph phosphate is increased in flies with impaired Malpighian tubules. In addition, RNAi-mediated inhibition of MAPK-signaling by knockdown of Ras85D, phl/D-Raf or Dsor1/MEK affects larval development, adult life span and hemolymph phosphate, suggesting that some in vivo effects involve activation of this signaling pathway by phosphate. To identify novel genetic determinants of phosphate responses, we used Drosophila hemocyte-like cultured cells (S2R+) to perform a genome-wide RNAi screen using MAPK activation as the readout. We identified a number of candidate genes potentially important for the cellular response to phosphate. Evaluation of 51 genes in live flies revealed some that affect larval development, adult life span and hemolymph phosphate levels.
PMCID: PMC3592877  PMID: 23520455
6.  RNAi Screening: New Approaches, Understandings and Organisms 
RNA interference (RNAi) leads to sequence-specific knockdown of gene function. The approach can be used in large-scale screens to interrogate function in various model organisms and an increasing number of other species. Genome-scale RNAi screens are routinely performed in cultured or primary cells or in vivo in organisms such as C. elegans. High-throughput RNAi screening is benefiting from the development of sophisticated new instrumentation and software tools for collecting and analyzing data, including high-content image data. The results of large-scale RNAi screens have already proved useful, leading to new understandings of gene function relevant to topics such as infection, cancer, obesity and aging. Nevertheless, important caveats apply and should be taken into consideration when developing or interpreting RNAi screens. Some level of false discovery is inherent to any high-throughput approach; specific to RNAi screens, false discovery due to off-target effects (OTEs) of RNAi reagents remains a problem. The need to improve our ability to use RNAi to elucidate gene function at large scale and in additional systems continues to be addressed through improved RNAi library design, development of innovative computational and analysis tools and other approaches.
PMCID: PMC3249004  PMID: 21953743
RNAi; high-throughput screens; high-content imaging; cell-based assays
7.—the database of the Drosophila RNAi screening center: 2012 update 
Nucleic Acids Research  2011;40(Database issue):D715-D719.
FlyRNAi, the database and website of the Drosophila RNAi Screening Center (DRSC) at Harvard Medical School, serves a dual role, tracking both the production of reagents for RNA interference (RNAi) screening in Drosophila cells and RNAi screen results. The database and website are used as a platform for community availability of protocols, tools, and other resources useful to researchers planning, conducting, analyzing or interpreting the results of Drosophila RNAi screens. Based on our own experience and user feedback, we have made several changes. Specifically, we have restructured the database to accommodate new types of reagents; added information about new RNAi libraries and other reagents; updated the user interface and website; and added new tools of use to the Drosophila community and others. Overall, the result is a more useful, flexible and comprehensive website and database.
PMCID: PMC3245182  PMID: 22067456
8.  An integrative approach to ortholog prediction for disease-focused and other functional studies 
BMC Bioinformatics  2011;12:357.
Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward.
We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to the highest-confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST).
DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.
PMCID: PMC3179972  PMID: 21880147
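One simple way to integrate several ortholog predictors, consistent with the 'cast a wide net' versus highest-confidence trade-off described above, is to count how many tools support each predicted pair and filter on that count. The sketch assumes each tool's output is available as (query, ortholog) pairs; gene and tool names are placeholders, and DIOPT's own scoring and filters may involve more than this vote count.

```python
from collections import Counter


def integrate_predictions(predictions_by_tool, min_score=1):
    """Score each (query, ortholog) pair by how many tools predict it.

    predictions_by_tool maps a tool name to an iterable of (query, ortholog)
    pairs. Raising min_score trades sensitivity for specificity.
    """
    scores = Counter()
    for pairs in predictions_by_tool.values():
        scores.update(set(pairs))  # each tool counts at most once per pair
    return {pair: s for pair, s in scores.items() if s >= min_score}


def best_matches(scores):
    """For each query gene, the ortholog(s) with the highest score."""
    top = {}
    for (query, ortholog), s in scores.items():
        best_s, orthologs = top.get(query, (0, []))
        if s > best_s:
            top[query] = (s, [ortholog])
        elif s == best_s:
            orthologs.append(ortholog)  # tie: keep all equally supported hits
    return {q: sorted(orths) for q, (_, orths) in top.items()}
```

Calling `integrate_predictions(preds, min_score=1)` keeps every pair any tool reports (the wide net), while a higher `min_score` keeps only pairs that several tools agree on.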
9.  False negative rates in Drosophila cell-based RNAi screens: a case study 
BMC Genomics  2011;12:50.
High-throughput screening using RNAi is a powerful gene discovery method but is often complicated by false positive and false negative results. Whereas false positive results associated with RNAi reagents have been a matter of extensive study, the issue of false negatives has received less attention.
We performed a meta-analysis of several genome-wide, cell-based Drosophila RNAi screens, together with a more focused RNAi screen, and conclude that the rate of false negative results is at least 8%. Further, we demonstrate how knowledge of the cell transcriptome can be used to resolve ambiguous results and how the number of false negative results can be reduced by using multiple, independently tested RNAi reagents per gene.
RNAi reagents that target the same gene do not always yield consistent results due to false positives and weak or ineffective reagents. False positive results can be partially minimized by filtering with transcriptome data. RNAi libraries with multiple reagents per gene also reduce false positive and false negative outcomes when inconsistent results are disambiguated carefully.
PMCID: PMC3036618  PMID: 21251254
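The two mitigation strategies described above, filtering with transcriptome data and using multiple reagents per gene, can be expressed as a small decision rule. The threshold (at least two agreeing reagents) and the labels are illustrative assumptions, not the rules used in the study. `false_negative_rate` estimates the rate relative to a reference set of genes assumed to be true positives.

```python
def classify_gene(reagent_hits, expressed, min_agreeing=2):
    """Combine per-reagent calls for one gene with transcriptome data.

    reagent_hits: booleans, one per independent RNAi reagent targeting the gene.
    expressed: whether the gene is detected in the screened cells' transcriptome.
    """
    n_hits = sum(reagent_hits)
    if n_hits == 0:
        return "non-hit"
    if not expressed:
        # Knocking down an unexpressed gene should have no effect,
        # so a hit here points to an off-target (false positive) result.
        return "likely false positive"
    if n_hits >= min_agreeing:
        return "hit"
    return "ambiguous"  # e.g. one reagent scores, the others do not


def false_negative_rate(called_hits, known_positives):
    """Fraction of known positive genes that the screen failed to call."""
    known = set(known_positives)
    return len(known - set(called_hits)) / len(known)
```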
10.  Protein Structure Initiative Material Repository: an open shared public resource of structural genomics plasmids for the biological community 
Nucleic Acids Research  2009;38(Database issue):D743-D749.
The Protein Structure Initiative Material Repository (PSI-MR) provides centralized storage and distribution for the protein expression plasmids created by PSI researchers. These plasmids are a resource that allows the research community to dissect the biological function of proteins whose structures have been identified by the PSI. The plasmid annotation, which includes the full-length sequence, vector information and associated publications, is stored in a freely available, searchable database called DNASU. Each PSI plasmid is also linked to a variety of additional resources, which facilitates cross-referencing of a particular plasmid to protein annotations and experimental data. Plasmid samples can be requested directly through the website. We have also developed a novel strategy to address the most common concern encountered when distributing plasmids, namely the complexity of material transfer agreement (MTA) processing and the resulting delays. The Expedited Process MTA, in which we created a network of institutions that agree to the terms of transfer in advance of a material request, eliminates these delays. Our hope is that by creating a repository of expression-ready plasmids and expediting the process for receiving these plasmids, we will help improve accessibility and accelerate the pace of scientific discovery.
PMCID: PMC2808882  PMID: 19906724
11.  A Full-Genomic Sequence-Verified Protein-Coding Gene Collection for Francisella tularensis 
PLoS ONE  2007;2(6):e577.
The rapid development of new technologies for the high-throughput (HT) study of proteins has increased the demand for comprehensive plasmid clone resources that support protein expression. These clones must be full-length, sequence-verified and in a flexible format. The generation of these resources requires automated pipelines supported by software management systems. Although the availability of clone resources is growing, current collections are either incomplete or not fully sequence-verified. We report an automated pipeline, supported by several software applications, that enabled the construction of the first comprehensive sequence-verified plasmid clone resource, covering more than 96% of the protein-coding sequences in the genome of F. tularensis, a highly virulent human pathogen and the causative agent of tularemia. This clone resource was applied to an HT protein purification pipeline, which successfully produced recombinant proteins for 72% of the genes. These methods and resources represent significant technological steps towards exploiting the genomic information of F. tularensis in discovery applications.
PMCID: PMC1894649  PMID: 17593976
12.  A novel approach to sequence validating protein expression clones with automated decision making 
BMC Bioinformatics  2007;8:198.
Whereas the molecular assembly of protein expression clones is readily automated and routinely accomplished in high throughput, sequence verification of these clones is still largely performed manually, an arduous and time-consuming process. The ultimate goal of validation is to determine if a given plasmid clone matches its reference sequence sufficiently to be "acceptable" for use in protein expression experiments. Given the accelerating increase in availability of tens of thousands of unverified clones, there is a strong demand for rapid, efficient and accurate software that automates clone validation.
We have developed an Automated Clone Evaluation (ACE) system – the first comprehensive, multi-platform, web-based plasmid sequence verification software package. ACE automates the clone verification process by defining each clone sequence as a list of multidimensional discrepancy objects, each describing a difference between the clone and its expected sequence including the resulting polypeptide consequences. To evaluate clones automatically, this list can be compared against user acceptance criteria that specify the allowable number of discrepancies of each type. This strategy allows users to re-evaluate the same set of clones against different acceptance criteria as needed for use in other experiments. ACE manages the entire sequence validation process including contig management, identifying and annotating discrepancies, determining if discrepancies correspond to polymorphisms and clone finishing. Designed to manage thousands of clones simultaneously, ACE maintains a relational database to store information about clones at various completion stages, project processing parameters and acceptance criteria. In a direct comparison, the automated analysis by ACE took less time and was more accurate than a manual analysis of a 93-gene clone set.
ACE was designed to facilitate high throughput clone sequence verification projects. The software has been used successfully to evaluate more than 55,000 clones at the Harvard Institute of Proteomics. The software dramatically reduced the amount of time and labor required to evaluate clone sequences and decreased the number of missed sequence discrepancies, which commonly occur during manual evaluation. In addition, ACE helped to reduce the number of sequencing reads needed to achieve adequate coverage for making decisions on clones.
PMCID: PMC1914086  PMID: 17567908
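ACE's central idea, representing each clone as a list of typed discrepancies and accepting it only if the count of each type stays within user-defined limits, can be sketched as follows. The discrepancy categories, the treatment of polymorphisms (excluded from the counts here) and all names are assumptions for illustration; ACE's real discrepancy objects carry more dimensions, such as polypeptide consequences.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    """One difference between a clone sequence and its reference (simplified)."""
    kind: str                      # hypothetical categories, e.g. "silent", "missense"
    position: int                  # 1-based position in the reference sequence
    is_polymorphism: bool = False  # assumed: known polymorphisms are not penalized


def is_acceptable(discrepancies, max_allowed):
    """True if, for every discrepancy type, the count stays within the limit.

    Types missing from max_allowed are treated as not allowed at all.
    """
    counts = Counter(d.kind for d in discrepancies if not d.is_polymorphism)
    return all(n <= max_allowed.get(kind, 0) for kind, n in counts.items())


if __name__ == "__main__":
    clone = [
        Discrepancy("silent", 120),
        Discrepancy("silent", 450),
        Discrepancy("missense", 300, is_polymorphism=True),
    ]
    print(is_acceptable(clone, {"silent": 1}))  # False: two silent changes
    print(is_acceptable(clone, {"silent": 3}))  # True: polymorphism not counted
```

Because the criteria are a separate argument, the same discrepancy list can be re-evaluated against stricter or more lenient criteria without re-analyzing the sequence, which mirrors the re-evaluation workflow described in the abstract.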
13.  PlasmID: a centralized repository for plasmid clone information and distribution 
Nucleic Acids Research  2006;35(Database issue):D680-D684.
The Plasmid Information Database (PlasmID) was developed as a community-based resource portal to facilitate search and request of plasmid clones shared with the Dana-Farber/Harvard Cancer Center (DF/HCC) DNA Resource Core. PlasmID serves as a central data repository and enables researchers to search the collection online using common gene names and identifiers, keywords, vector features, author names and PubMed IDs. As of October 2006, the repository contains >46 000 plasmids in 98 different vectors, including cloned cDNA and genomic fragments from 26 different species. Moreover, the clones include plasmid vectors useful for routine and cutting-edge techniques; functionally related sets of human cDNA clones; and genome-scale gene collections for Saccharomyces cerevisiae, Pseudomonas aeruginosa, Yersinia pestis, Francisella tularensis, Bacillus anthracis and Vibrio cholerae. The plasmids have been fully annotated to a high-quality standard, and clone samples are stored as glycerol stocks in a state-of-the-art automated −80°C freezer storage system. Clone replication and distribution are highly automated to minimize human error. Information about vectors and plasmid clones, including downloadable maps and sequence data, is freely available online. Researchers interested in requesting clone samples or sharing their own plasmids with the repository can visit the PlasmID website for more information.
PMCID: PMC1716714  PMID: 17132831
