Related Articles

1.  A network-based integrative approach to prioritize reliable hits from multiple genome-wide RNAi screens in Drosophila 
BMC Genomics  2009;10:220.
The recently developed RNA interference (RNAi) technology has created an unprecedented opportunity which allows the function of individual genes in whole organisms or cell lines to be interrogated at genome-wide scale. However, multiple issues, such as off-target effects or low efficacies in knocking down certain genes, have produced RNAi screening results that are often noisy and that potentially yield both high rates of false positives and false negatives. Therefore, integrating RNAi screening results with other information, such as protein-protein interaction (PPI), may help to address these issues.
By analyzing 24 genome-wide RNAi screens interrogating various biological processes in Drosophila, we found that RNAi positive hits were significantly more connected to each other when analyzed within a protein-protein interaction network, as opposed to random cases, for nearly all screens. Based on this finding, we developed a network-based approach to identify false positives (FPs) and false negatives (FNs) in these screening results. This approach relied on a scoring function, which we termed NePhe, to integrate information obtained from both PPI network and RNAi screening results. Using a novel rank-based test, we compared the performance of different NePhe scoring functions and found that diffusion kernel-based methods generally outperformed others, such as direct neighbor-based methods. Using two genome-wide RNAi screens as examples, we validated our approach extensively from multiple aspects. We prioritized hits in the original screens that were more likely to be reproduced by the validation screen and recovered potential FNs whose involvements in the biological process were suggested by previous knowledge and mutant phenotypes. Finally, we demonstrated that the NePhe scoring system helped to biologically interpret RNAi results at the module level.
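The connectivity finding above can be illustrated with a small permutation test. This is only a sketch under stated assumptions: the gene names, the toy network, and the functions `edges_within` and `connectivity_p_value` are hypothetical, and the test shown (counting edges among hits against random gene sets of the same size) is a generic stand-in, not the NePhe scoring function or the rank-based test described in the abstract.

```python
import random

def edges_within(genes, edges):
    """Count PPI edges whose two endpoints are both in `genes`."""
    gene_set = set(genes)
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)

def connectivity_p_value(hits, all_genes, edges, n_perm=1000, seed=0):
    """Empirical probability that a random gene set of the same size
    is at least as connected as the observed hit list."""
    rng = random.Random(seed)
    observed = edges_within(hits, edges)
    pool = sorted(all_genes)
    at_least = sum(
        1
        for _ in range(n_perm)
        if edges_within(rng.sample(pool, len(hits)), edges) >= observed
    )
    # Add-one correction keeps the estimate away from an exact zero.
    return observed, (at_least + 1) / (n_perm + 1)

# Toy network (hypothetical genes): one fully connected group of four
# genes plus two unrelated edges, embedded in a 20-gene background.
edges = [
    ("g1", "g2"), ("g1", "g3"), ("g2", "g3"),
    ("g1", "g4"), ("g2", "g4"), ("g3", "g4"),
    ("g5", "g6"), ("g7", "g8"),
]
all_genes = [f"g{i}" for i in range(1, 21)]
hits = ["g1", "g2", "g3", "g4"]

observed, p_value = connectivity_p_value(hits, all_genes, edges)
print(observed, p_value)
```

A small empirical p-value here would indicate that the hits are more interconnected than random gene sets of the same size, which is the kind of signal the network-based prioritization builds on.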
By comprehensively analyzing multiple genome-wide RNAi screens, we conclude that network information can be effectively integrated with RNAi results to produce suggestive FPs and FNs, and to bring biological insight to the screening results.
PMCID: PMC2697172  PMID: 19435510
2.  Analysis of high-throughput RNAi screening data in identifying genes mediating sensitivity to chemotherapeutic drugs: statistical approaches and perspectives 
BMC Genomics  2012;13(Suppl 8):S3.
High-throughput RNA interference (RNAi) screens have been used to find genes that, when silenced, result in sensitivity to certain chemotherapy drugs. Researchers therefore can further identify drug-sensitive targets and novel drug combinations that sensitize cancer cells to chemotherapeutic drugs. Considerable uncertainty exists about the efficiency and accuracy of statistical approaches used for RNAi hit selection in drug sensitivity studies. Researchers require statistical methods suitable for analyzing high-throughput RNAi screening data that will reduce false-positive and false-negative rates.
In this study, we carried out a simulation study to evaluate four types of statistical approaches (fold-change/ratio, parametric tests/statistics, sensitivity index, and linear models) with different scenarios of RNAi screenings for drug sensitivity studies. With the simulated datasets, the linear model resulted in significantly lower false-negative and false-positive rates. Based on the results of the simulation study, we then make recommendations of statistical analysis methods for high-throughput RNAi screening data in different scenarios. We assessed promising methods using real data from a loss-of-function RNAi screen to identify hits that modulate paclitaxel sensitivity in breast cancer cells. High-confidence hits with specific inhibitors were further analyzed for their ability to inhibit breast cancer cell growth. Our analysis identified a number of gene targets with inhibitors known to enhance paclitaxel sensitivity, suggesting other genes identified may merit further investigation.
RNAi screening can identify druggable targets and novel drug combinations that can sensitize cancer cells to chemotherapeutic drugs. However, applying an inappropriate statistical method or model to the RNAi screening data will result in decreased power to detect the true hits and increase false positive and false negative rates, leading researchers to draw incorrect conclusions. In this paper, we make recommendations to enable more objective selection of statistical analysis methods for high-throughput RNAi screening data.
PMCID: PMC3535706  PMID: 23281588
3.  A protein network-guided screen for cell cycle regulators in Drosophila 
BMC Systems Biology  2011;5:65.
Large-scale RNAi-based screens are playing a critical role in defining sets of genes that regulate specific cellular processes. Numerous screens have been completed and in some cases more than one screen has examined the same cellular process, enabling a direct comparison of the genes identified in separate screens. Surprisingly, the overlap observed between the results of similar screens is low, suggesting that RNAi screens have relatively high levels of false positives, false negatives, or both.
We re-examined genes that were identified in two previous RNAi-based cell cycle screens to identify potential false positives and false negatives. We were able to confirm many of the originally observed phenotypes and to reveal many likely false positives. To identify potential false negatives from the previous screens, we used protein interaction networks to select genes for re-screening. We demonstrate cell cycle phenotypes for a significant number of these genes and show that the protein interaction network is an efficient predictor of new cell cycle regulators. Combining our results with the results of the previous screens identified a group of validated, high-confidence cell cycle/cell survival regulators. Examination of the subset of genes from this group that regulate the G1/S cell cycle transition revealed the presence of multiple members of three structurally related protein complexes: the eukaryotic translation initiation factor 3 (eIF3) complex, the COP9 signalosome, and the proteasome lid. Using a combinatorial RNAi approach, we show that while all three of these complexes are required for Cdk2/Cyclin E activity, the eIF3 complex is specifically required for some other step that limits the G1/S cell cycle transition.
Our results show that false positives and false negatives each play a significant role in the lack of overlap that is observed between similar large-scale RNAi-based screens. Our results also show that protein network data can be used to minimize false negatives and false positives and to more efficiently identify comprehensive sets of regulators for a process. Finally, our data provides a high confidence set of genes that are likely to play key roles in regulating the cell cycle or cell survival.
PMCID: PMC3113730  PMID: 21548953
4.  Limited Agreement of Independent RNAi Screens for Virus-Required Host Genes Owes More to False-Negative than False-Positive Factors 
PLoS Computational Biology  2013;9(9):e1003235.
Systematic, genome-wide RNA interference (RNAi) analysis is a powerful approach to identify gene functions that support or modulate selected biological processes. An emerging challenge shared with some other genome-wide approaches is that independent RNAi studies often show limited agreement in their lists of implicated genes. To better understand this, we analyzed four genome-wide RNAi studies that identified host genes involved in influenza virus replication. These studies collectively identified and validated the roles of 614 cell genes, but pair-wise overlap among the four gene lists was only 3% to 15% (average 6.7%). However, a number of functional categories were overrepresented in multiple studies. The pair-wise overlap of these enriched-category lists was high, ∼19%, implying more agreement among studies than apparent at the gene level. Probing this further, we found that the gene lists implicated by independent studies were highly connected in interacting networks by independent functional measures such as protein-protein interactions, at rates significantly higher than predicted by chance. We also developed a general, model-based approach to gauge the effects of false-positive and false-negative factors and to estimate, from a limited number of studies, the total number of genes involved in a process. For influenza virus replication, this novel statistical approach estimates the total number of cell genes involved to be ∼2,800. This and multiple other aspects of our experimental and computational results imply that, when following good quality control practices, the low overlap between studies is primarily due to false negatives rather than false-positive gene identifications. These results and methods have implications for and applications to multiple forms of genome-wide analysis.
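As a rough illustration of the pair-wise overlap comparison, the sketch below computes the overlap for every pair of gene lists. It assumes the Jaccard index (shared genes divided by the union) as the overlap measure, which the abstract does not specify; the screen names and gene identifiers are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared genes divided by all genes named in either list."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_overlap(gene_lists):
    """Map each pair of study names to the overlap of their gene lists."""
    return {
        (m, n): jaccard(gene_lists[m], gene_lists[n])
        for m, n in combinations(sorted(gene_lists), 2)
    }

# Hypothetical hit lists from three independent screens.
screens = {
    "screen_A": {"G1", "G2", "G3", "G4"},
    "screen_B": {"G3", "G4", "G5", "G6"},
    "screen_C": {"G4", "G7", "G8", "G9"},
}

overlaps = pairwise_overlap(screens)
mean_overlap = sum(overlaps.values()) / len(overlaps)
for pair, value in overlaps.items():
    print(pair, round(value, 3))
print("mean", round(mean_overlap, 3))
```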
Author Summary
Genome-wide RNA interference assays of gene functions offer the potential for systematic, global analysis of biological processes. A pressing challenge is to develop meta-analysis methods that effectively combine information from multiple studies. One puzzle is that implicated gene lists from independent studies of the same process often show relatively low overlap. This disagreement might arise from false-positive factors, such as imperfect gene targeting (off-target effects), or from false negatives if separate studies access different components of large, complex systems. We present new methods to examine the relations between individual genome-wide RNAi studies, using studies of host genes in influenza virus replication as a test case. We find that cross-study agreement is greater than suggested by overlap of reported gene lists. This better agreement is evidenced by the strong relation of independent gene lists in functional pathways and protein interaction networks, and by a statistical model that relates multi-study, gene-level findings to factors driving correct, false-negative, and false-positive gene identification. Our analysis of multiple genome-wide studies predicts that there are many undetected host genes important for influenza virus infection, and that false negatives are the major concerns for genome-wide studies.
PMCID: PMC3777922  PMID: 24068911
5.  Online GESS: prediction of miRNA-like off-target effects in large-scale RNAi screen data by seed region analysis 
BMC Bioinformatics  2014;15:192.
RNA interference (RNAi) is an effective and important tool used to study gene function. For large-scale screens, RNAi is used to systematically down-regulate genes of interest and analyze their roles in a biological process. However, RNAi is associated with off-target effects (OTEs), including microRNA (miRNA)-like OTEs. The contribution of reagent-specific OTEs to RNAi screen data sets can be significant. In addition, the post-screen validation process is time and labor intensive. Thus, the availability of robust approaches to identify candidate off-targeted transcripts would be beneficial.
Significant efforts have been made to eliminate false positive results attributable to sequence-specific OTEs associated with RNAi. These approaches have included improved algorithms for RNAi reagent design, incorporation of chemical modifications into siRNAs, and the use of various bioinformatics strategies to identify possible OTEs in screen results. Genome-wide Enrichment of Seed Sequence matches (GESS) was developed to identify potential off-targeted transcripts in large-scale screen data by seed-region analysis. Here, we introduce a user-friendly web application that provides researchers a relatively quick and easy way to perform GESS analysis on data from human or mouse cell-based screens using short interfering RNAs (siRNAs) or short hairpin RNAs (shRNAs), as well as for Drosophila screens using shRNAs. Online GESS relies on up-to-date transcript sequence annotations for human and mouse genes extracted from NCBI Reference Sequence (RefSeq) and Drosophila genes from FlyBase. The tool also accommodates analysis with user-provided reference sequence files.
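The matching step behind seed-region analysis can be sketched as follows. This is not the Online GESS implementation: it assumes the seed is nucleotides 2 to 8 of the guide strand (a common convention), looks only for exact matches to the reverse complement of that seed, and omits the enrichment statistics entirely. The guide sequence, transcript names, and function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA or RNA sequence, returned as DNA."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    dna = seq.upper().replace("U", "T")
    return "".join(comp[base] for base in reversed(dna))

def seed_match_site(guide, start=2, length=7):
    """Sequence a transcript must contain to pair with the guide's seed.

    `start` is 1-based; the default covers guide positions 2 to 8,
    which is an assumption of this sketch.
    """
    dna = guide.upper().replace("U", "T")
    seed = dna[start - 1:start - 1 + length]
    return reverse_complement(seed)

def transcripts_with_seed_match(guide, transcripts):
    """Names of transcripts containing an exact seed-match site."""
    site = seed_match_site(guide)
    return sorted(
        name
        for name, seq in transcripts.items()
        if site in seq.upper().replace("U", "T")
    )

# Hypothetical guide strand and transcript fragments.
guide = "UACGGAUCCAGUUCGAAUGCA"
transcripts = {
    "tx1": "TTTGATCCGTAAA",  # contains the seed-match site
    "tx2": "TTTACGGATCAAA",  # contains the seed itself, not its complement
}

print(seed_match_site(guide))
print(transcripts_with_seed_match(guide, transcripts))
```

In a screen-level analysis, the count of such matches would then be compared between active and inactive reagents; that comparison is deliberately left out here.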
Online GESS provides a straightforward user interface for genome-wide seed region analysis for human, mouse and Drosophila RNAi screen data. With the tool, users can either use a built-in database or provide a database of transcripts for analysis. This makes it possible to analyze RNAi data from any organism for which the user can provide transcript sequences.
PMCID: PMC4073188  PMID: 24934636
RNAi; Off-target effects; Data analysis; Seed region; miRNA; siRNA; shRNA; High-throughput screening
6.  In Vivo RNAi-Based Screens: Studies in Model Organisms 
Genes  2013;4(4):646-665.
RNA interference (RNAi) is a technique widely used for gene silencing in organisms and cultured cells, and depends on sequence homology between double-stranded RNA (dsRNA) and target mRNA molecules. Numerous cell-based genome-wide screens have successfully identified novel genes involved in various biological processes, including signal transduction, cell viability/death, and cell morphology. However, cell-based screens cannot address cellular processes such as development, behavior, and immunity. Drosophila and Caenorhabditis elegans are two model organisms whose whole bodies and individual body parts have been subjected to RNAi-based genome-wide screening. Moreover, Drosophila RNAi allows the manipulation of gene function in a spatiotemporal manner when it is implemented using the Gal4/UAS system. Using this inducible RNAi technique, various large-scale screens have been performed in Drosophila, demonstrating that the method is straightforward and valuable. However, accumulated results reveal that the results of RNAi-based screens have relatively high levels of error, such as false positives and negatives. Here, we review in vivo RNAi screens in Drosophila and the methods that could be used to remove ambiguity from screening results.
PMCID: PMC3927573  PMID: 24705267
Drosophila; genome-wide screen; RNAi library; false results; interaction network
7.  Modularity and hormone sensitivity of the Drosophila melanogaster insulin receptor/target of rapamycin interaction proteome 
First systematic analysis of the evolutionary conserved InR/TOR pathway interaction proteome in Drosophila. Quantitative mass spectrometry revealed that 22% of identified protein interactions are regulated by the growth hormone insulin affecting membrane proximal as well as intracellular signaling complexes. Systematic RNA interference linked a significant fraction of network components to the control of dTOR kinase activity. Combined biochemical and genetic data suggest dTTT, a dTOR-containing complex required for cell growth control by dTORC1 and dTORC2 in vivo.
Cellular growth is a fundamental process that requires constant adaptations to changing environmental conditions, like growth factor and nutrient availability, energy levels and more. Over the years, the insulin receptor/target of rapamycin pathway (InR/TOR) emerged as a key signaling system for the control of metazoan cell growth. Genetic screens carried out in the fruit fly Drosophila melanogaster identified key InR/TOR pathway components and their relationships. Phenotypes such as altered cell growth are likely to emerge from perturbed dynamic networks containing InR/TOR pathway components, which stably or transiently interact with other cellular proteins to form complexes and networks thereof. Systematic studies on the topology and dynamics of protein interaction networks therefore become highly relevant to gain systems level understanding of deregulated cell growth. Despite much progress in genetic analysis only a few systematic protein interaction studies have been reported for Drosophila, which in most cases lack quantitative information representing the dynamic nature of such networks. Here, we present the first quantitative affinity purification mass spectrometry (AP–MS/MS) analysis on the evolutionary conserved InR/TOR signaling network in Drosophila. Systematic RNAi-based functional analysis of identified network components revealed key components linked to the regulation of the central effector kinase dTOR. This also includes dTTT, a novel dTOR-containing complex required for the control of dTORC1 and dTORC2 in vivo.
For systematic AP–MS analysis, we generated Drosophila Kc167 cell lines inducibly expressing affinity-tagged bait proteins previously linked to InR/TOR signaling. Bait expressing Kc167 cell lines were harvested before and after insulin stimulation for subsequent affinity purification. Following LC–MS/MS analysis and probabilistic data filtering using SAINT (Choi et al, 2010), we generated a quantitative network model from 97 high confidence protein–protein interactions and 58 network components (Figure 2). The presented network displayed a high degree of orthologous interactions conserved also in human cells and identified a number of novel molecular interactions with InR/TOR signaling components for future hypothesis driven analysis.
To measure insulin-induced changes within the InR/TOR interaction proteome, we applied a recently introduced label-free quantitative MS approach (Rinner et al, 2007). The obtained quantitative data suggest that 22% of all interactions in the network are regulated by insulin. Major changes could be observed within the membrane proximal InR/chico/PI3K signaling complexes, and also in 14-3-3 protein containing signaling complexes and dTORC1, a complex that contains besides dTOR all major orthologous proteins found also in human mTORC1 including the two dTORC1 substrates d4E-BP (Thor) and S6 Kinase (S6K). Insulin triggered both dissociation and association of dTORC1 proteins. Among the proteins that showed enhanced binding to dTORC1 upon insulin stimulation we found Unkempt, a RING-finger protein with a proposed role in ubiquitin-mediated protein degradation (Lores et al, 2010). Besides dTORC1 our systematic AP–MS analysis also revealed the presence of dTORC2, the second major TOR complex in Drosophila. dTORC2 contains the Drosophila orthologs of human mTORC2 proteins, but in contrast to dTORC1 was not affected upon insulin stimulation. Interestingly, we also found a specific set of proteins that were not linked to the canonical TOR complexes TORC1 and TORC2 in dTOR purifications. These include LqfR (liquid facets related), Pontin, Reptin, Spaghetti and the gene product of CG16908. We found the same set of proteins when we used CG16908 as a bait, suggesting complex formation among the identified proteins. None of the dTORC1/2 components besides dTOR was identified in CG16908 purifications, indicating that these proteins form dTOR complexes distinct from dTORC1 and dTORC2. Based on known interaction information from other species and data obtained from this study we refer to this complex as dTTT (Drosophila TOR, TELO2, TTI1) (Horejsi et al, 2010; Hurov et al, 2010; Kaizuka et al, 2010).
A directed quantitative MS analysis of dTOR complex components suggests that dTORC1 is the most abundant dTOR complex we identified in Kc167 cells.
We next studied the potential roles of the identified network components for controlling the activity of the dInR/TOR pathway using systematic RNAi depletion and quantitative western blotting to measure the changes in abundance of phosphorylated substrates of dTORC1 (Thor/d4E-BP, dS6K) and dTORC2 (dPKB) in RNAi-treated cells (Figure 5). Overall, we could identify 16 proteins (out of 58) whose depletion caused an at least 50% increase or decrease in the levels of phosphorylated d4E-BP, S6K and/or PKB compared with control GFP RNAi. Besides established pathway components, we found several novel regulators within the dInR/TOR interaction network. For example, RNAi against the novel insulin-regulated dTORC1 component Unkempt resulted in enhanced phosphorylation of the dTORC1 substrate d4E-BP, which suggests a negative role for Unkempt on dTORC1 activity. In contrast, depletion of CG16908 and LqfR caused hypo-phosphorylation of all dTOR substrates similar to dTOR itself, suggesting a positive role for the dTTT complex on dTOR activity. Subsequently, we tested whether dTTT components also play a role in dTOR-mediated cell growth in vivo. Depletion of both dTTT components, CG16908 and LqfR, in the Drosophila eye resulted in a substantial decrease in eye size. Likewise, FLP-FRT-mediated mitotic recombination resulted in CG16908 and LqfR mutant clones with a similar reduced growth phenotype as observed in dTOR mutant clones. Hence, the combined biochemical and genetic analysis revealed dTTT as a dTOR-containing complex that is required for the activity of both dTORC1 and dTORC2 and thus plays a critical role in controlling cell growth.
Taken together, these results illustrate how a systematic quantitative AP–MS approach when combined with systematic functional analysis in Drosophila can reveal novel insights into the dynamic organization of regulatory networks for cell growth control in metazoans.
Using quantitative mass spectrometry, this study reports how insulin affects the modularity of the interaction proteome of the Drosophila InR/TOR pathway, an evolutionary conserved signaling system for the control of metazoan cell growth. Systematic functional analysis linked a significant number of identified network components to the control of dTOR activity and revealed dTTT, a dTOR complex required for in vivo cell growth control by dTORC1 and dTORC2.
Genetic analysis in Drosophila melanogaster has been widely used to identify a system of genes that control cell growth in response to insulin and nutrients. Many of these genes encode components of the insulin receptor/target of rapamycin (InR/TOR) pathway. However, the biochemical context of this regulatory system is still poorly characterized in Drosophila. Here, we present the first quantitative study that systematically characterizes the modularity and hormone sensitivity of the interaction proteome underlying growth control by the dInR/TOR pathway. Applying quantitative affinity purification and mass spectrometry, we identified 97 high confidence protein interactions among 58 network components. In all, 22% of the detected interactions were regulated by insulin affecting membrane proximal as well as intracellular signaling complexes. Systematic functional analysis linked a subset of network components to the control of dTORC1 and dTORC2 activity. Furthermore, our data suggest the presence of three distinct dTOR kinase complexes, including the evolutionary conserved dTTT complex (Drosophila TOR, TELO2, TTI1). Subsequent genetic studies in flies suggest a role for dTTT in controlling cell growth via a dTORC1- and dTORC2-dependent mechanism.
PMCID: PMC3261712  PMID: 22068330
cell growth; InR/TOR pathway; interaction proteome; quantitative mass spectrometry; signaling
8.  RNAi Screening: New Approaches, Understandings and Organisms 
RNA interference (RNAi) leads to sequence-specific knockdown of gene function. The approach can be used in large-scale screens to interrogate function in various model organisms and an increasing number of other species. Genome-scale RNAi screens are routinely performed in cultured or primary cells or in vivo in organisms such as C. elegans. High-throughput RNAi screening is benefitting from the development of sophisticated new instrumentation and software tools for collecting and analyzing data, including high-content image data. The results of large-scale RNAi screens have already proved useful, leading to new understandings of gene function relevant to topics such as infection, cancer, obesity and aging. Nevertheless, important caveats apply and should be taken into consideration when developing or interpreting RNAi screens. Some level of false discovery is inherent to high-throughput approaches and, specific to RNAi screens, false discovery due to off-target effects (OTEs) of RNAi reagents remains a problem. The need to improve our ability to use RNAi to elucidate gene function at large scale and in additional systems continues to be addressed through improved RNAi library design, development of innovative computational and analysis tools and other approaches.
PMCID: PMC3249004  PMID: 21953743
RNAi; high-throughput screens; high-content imaging; cell-based assays
9.  OrthoList: A Compendium of C. elegans Genes with Human Orthologs 
PLoS ONE  2011;6(5):e20085.
C. elegans is an important model for genetic studies relevant to human biology and disease. We sought to assess the orthology between C. elegans and human genes to understand better the relationship between their genomes and to generate a compelling list of candidates to streamline RNAi-based screens in this model.
We performed a meta-analysis of results from four orthology prediction programs and generated a compendium, “OrthoList”, containing 7,663 C. elegans protein-coding genes. Various assessments indicate that OrthoList has extensive coverage with low false-positive and false-negative rates. Part of this evaluation examined the conservation of components of the receptor tyrosine kinase, Notch, Wnt, TGF-ß and insulin signaling pathways, and led us to update compendia of conserved C. elegans kinases, nuclear hormone receptors, F-box proteins, and transcription factors. Comparison with two published genome-wide RNAi screens indicated that virtually all of the conserved hits would have been obtained had just the OrthoList set (∼38% of the genome) been targeted. We compiled OrthoList by InterPro domains and Gene Ontology annotation, making it easy to identify C. elegans orthologs of human disease genes for potential functional analysis.
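Combining the output of several orthology prediction programs can be sketched as a simple support count per gene. The inclusion rule is an assumption of this sketch: the abstract does not state how many programs had to agree, so `min_support` is left as a parameter. The program names and gene identifiers are hypothetical.

```python
from collections import Counter

def combine_predictions(predictions, min_support=1):
    """Return genes with the number of programs predicting a human ortholog.

    `predictions` maps a program name to the genes it calls conserved.
    Genes supported by fewer than `min_support` programs are dropped.
    """
    support = Counter(
        gene for genes in predictions.values() for gene in set(genes)
    )
    return {gene: n for gene, n in support.items() if n >= min_support}

# Hypothetical calls from four programs.
predictions = {
    "program_1": ["geneA", "geneB", "geneC"],
    "program_2": ["geneA", "geneB"],
    "program_3": ["geneA", "geneD"],
    "program_4": ["geneA", "geneB", "geneD"],
}

compendium = combine_predictions(predictions)
high_confidence = combine_predictions(predictions, min_support=3)
print(sorted(compendium.items()))
print(sorted(high_confidence))
```

Keeping the support count alongside each gene makes it possible to trade coverage against confidence after the fact, rather than fixing one threshold up front.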
We anticipate that OrthoList will be of considerable utility to C. elegans researchers for streamlining RNAi screens, by focusing on genes with apparent human orthologs, thus reducing screening effort by ∼60%. Moreover, we find that OrthoList provides a useful basis for annotating orthology and reveals more C. elegans orthologs of human genes in various functional groups, such as transcription factors, than previously described.
PMCID: PMC3102077  PMID: 21647448
10.  Clustering phenotype populations by genome-wide RNAi and multiparametric imaging 
How to predict gene function from phenotypic cues is a longstanding question in biology. Using quantitative multiparametric imaging, RNAi-mediated cell phenotypes were measured on a genome-wide scale. On the basis of phenotypic ‘neighbourhoods’, we identified previously uncharacterized human genes as mediators of the DNA damage response pathway and the maintenance of genomic integrity. The phenotypic map is provided as an online resource for discovering further functional relationships for a broad spectrum of biological modules.
Genetic screens for phenotypic similarity have made key contributions for associating genes with biological processes. Aggregating genes by similarity of their loss-of-function phenotype has provided insights into signalling pathways that have a conserved function from Drosophila to human (Nusslein-Volhard and Wieschaus, 1980; Bier, 2005). Complex visual phenotypes, such as defects in pattern formation during development, greatly facilitated the classification of genes into pathways, and phenotypic similarities in many cases predicted molecular relationships. With RNA interference (RNAi), highly parallel phenotyping of loss-of-function effects in cultured cells has become feasible in many organisms whose genomes have been sequenced (Boutros and Ahringer, 2008). One of the current challenges is the computational categorization of visual phenotypes and the prediction of gene function and associated biological processes. With large parts of the genome still being in uncharted territory, deriving functional information from large-scale phenotype analysis promises to uncover novel gene–gene relationships and to generate functional maps to explore cellular processes.
In this study, we developed an automated approach using RNAi-mediated cell phenotypes, multiparametric imaging and computational modelling to obtain functional information on previously uncharacterized genes. To generate broad, computer-readable phenotypic signatures, we measured the effect of RNAi-mediated knockdowns on changes of cell morphology in human cells on a genome-wide scale. First, the several million cells were stained for nuclear and cytoskeletal markers and then imaged using automated microscopy. On the basis of fluorescent markers, we established an automated image analysis to classify individual cells (Figure 1A). After cell segmentation for determining nuclei and cell boundaries (Figure 1C), we computed 51 cell descriptors that quantified intensities, shape characteristics and texture (Figure 1F). Individual cells were categorized into 1 of 10 classes, which included cells showing protrusion/elongation, cells in metaphase, large cells, condensed cells, cells with lamellipodia and cellular debris (Figure 1D and E). Each siRNA knockdown was summarized by a phenotypic profile and differences between RNAi knockdowns were quantified by the similarity between phenotypic profiles. We termed the vector of scores a phenoprint (Figure 3C) and defined the phenotypic distance between a pair of perturbations as the distance between their corresponding phenoprints.
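The phenoprint comparison described above can be illustrated with a small distance calculation. The abstract does not name the distance measure, so Euclidean distance is assumed here; the score vectors, perturbation names, and function names are hypothetical.

```python
import math

def phenotypic_distance(p, q):
    """Euclidean distance between two phenoprints (assumed metric)."""
    return math.dist(p, q)

def nearest_neighbours(query, phenoprints, k=2):
    """Names of the k perturbations whose phenoprints are closest to `query`."""
    ranked = sorted(
        (phenotypic_distance(phenoprints[query], vec), name)
        for name, vec in phenoprints.items()
        if name != query
    )
    return [name for _, name in ranked[:k]]

# Hypothetical three-score phenoprints for four knockdowns.
phenoprints = {
    "siRNA_a": [0.9, 0.1, 0.0],
    "siRNA_b": [0.8, 0.2, 0.1],
    "siRNA_c": [0.0, 0.9, 0.7],
    "siRNA_d": [0.1, 0.8, 0.9],
}

print(nearest_neighbours("siRNA_a", phenoprints, k=1))
print(nearest_neighbours("siRNA_c", phenoprints, k=1))
```

Perturbations that land close together under such a measure form the phenotypic neighbourhoods from which shared function is then hypothesized.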
To visualize the distribution of all phenoprints, we plotted them in a genome-wide map as a two-dimensional representation of the phenotypic similarity relationships (Figure 3A). The complete data set and an interactive version of the phenotypic map are available online. The map identified phenotypic ‘neighbourhoods’, which are characterized by cells with lamellipodia (WNK3, ANXA4), cells with prominent actin fibres (ODF2, SOD3), abundance of large cells (CA14), many elongated cells (SH2B2, ELMO2), decrease in cell number (TPX2, COPB1, COPA), increase in number of cells in metaphase (BLR1, CIB2) and combinations of phenotypes such as presence of large cells with protrusions and bright nuclei (PTPRZ1, RRM1; Figure 3B).
To test whether phenotypic similarity might serve as a predictor of gene function, we focused our further analysis on two clusters that contained genes associated with the DNA damage response (DDR) and genomic integrity (Figure 3A and C). The first phenotypic cluster included proteins with kinetochore-associated functions such as NUF2 (Figure 3B) and SGOL1. It also contained the centrosomal protein CEP164 that has been described as an important mediator of the DNA damage-activated signalling cascade (Sivasubramaniam et al, 2008) and the largely uncharacterized genes DONSON and SON. A second phenotypically distinct cluster included previously described components of the DDR pathway such as RRM1 (Figure 3A–C), CLSPN, PRIM2 and SETD8. Furthermore, this cluster contained the poorly characterized genes CADM1 and CD3EAP.
Cells activate a signalling cascade in response to DNA damage induced by exogenous and endogenous factors. Central are the kinases ATM and ATR as they serve as sensors of DNA damage and activators of further downstream kinases (Harper and Elledge, 2007; Cimprich and Cortez, 2008). To investigate whether DONSON, SON, CADM1 and CD3EAP, which were found in phenotypic ‘neighbourhoods’ to known DDR components, have a role in the DNA damage signalling pathway, we tested the effect of their depletion on the DDR on γ irradiation. As indicated by reduced CHEK1 phosphorylation, siRNA knock down of DONSON, SON, CD3EAP or CADM1 resulted in impaired DDR signalling on γ irradiation. Furthermore, knock down of DONSON or SON reduced phosphorylation of downstream effectors such as NBS1, CHEK1 and the histone variant H2AX on UVC irradiation. DONSON depletion also impaired recruitment of RPA2 onto chromatin and SON knockdown reduced RPA2 phosphorylation indicating that DONSON and SON presumably act downstream of the activation of ATM. In agreement with their phenotypic profile, these results suggest that DONSON, SON, CADM1 and CD3EAP are important mediators of the DDR. Further experiments demonstrated that they are also required for the maintenance of genomic integrity.
In summary, we show that genes with similar phenotypic profiles tend to share similar functions. The power of our computational and experimental approach is demonstrated by the identification of novel signalling regulators whose phenotypic profiles were found in proximity to known biological modules. Therefore, we believe that such phenotypic maps can serve as a resource for functional discovery and characterization of unknown genes. Furthermore, such approaches are also applicable for other perturbation reagents, such as small molecules in drug discovery and development. One could also envision combined maps that contain both siRNAs and small molecules to predict target–small molecule relationships and potential side effects.
Genetic screens for phenotypic similarity have made key contributions to associating genes with biological processes. With RNA interference (RNAi), highly parallel phenotyping of loss-of-function effects in cells has become feasible. One of the current challenges however is the computational categorization of visual phenotypes and the prediction of biological function and processes. In this study, we describe a combined computational and experimental approach to discover novel gene functions and explore functional relationships. We performed a genome-wide RNAi screen in human cells and used quantitative descriptors derived from high-throughput imaging to generate multiparametric phenotypic profiles. We show that profiles predicted functions of genes by phenotypic similarity. Specifically, we examined several candidates including the largely uncharacterized gene DONSON, which shared phenotype similarity with known factors of DNA damage response (DDR) and genomic integrity. Experimental evidence supports that DONSON is a novel centrosomal protein required for DDR signalling and genomic integrity. Multiparametric phenotyping by automated imaging and computational annotation is a powerful method for functional discovery and mapping the landscape of phenotypic responses to cellular perturbations.
PMCID: PMC2913390  PMID: 20531400
DNA damage response signalling; massively parallel phenotyping; phenotype networks; RNAi screening
11.  Where Have All the Interactions Gone? Estimating the Coverage of Two-Hybrid Protein Interaction Maps 
PLoS Computational Biology  2007;3(11):e214.
Yeast two-hybrid screens are an important method for mapping pairwise physical interactions between proteins. The fraction of interactions detected in independent screens can be very small, and an outstanding challenge is to determine the reason for the low overlap. Low overlap can arise from either a high false-discovery rate (interaction sets have low overlap because each set is contaminated by a large number of stochastic false-positive interactions) or a high false-negative rate (interaction sets have low overlap because each misses many true interactions). We extend capture–recapture theory to provide the first unified model for false-positive and false-negative rates for two-hybrid screens. Analysis of yeast, worm, and fly data indicates that 25% to 45% of the reported interactions are likely false positives. Membrane proteins have higher false-discovery rates on average, and signal transduction proteins have lower rates. The overall false-negative rate ranges from 75% for worm to 90% for fly, which arises from a roughly 50% false-negative rate due to statistical undersampling and a 55% to 85% false-negative rate due to proteins that appear to be systematically lost from the assays. Finally, statistical model selection conclusively rejects the Erdös-Rényi network model in favor of the power law model for yeast and the truncated power law for worm and fly degree distributions. Much as genome sequencing coverage estimates were essential for planning the human genome sequencing project, the coverage estimates developed here will be valuable for guiding future proteomic screens. All software and datasets are available in Datasets S1 and S2, Figures S1–S5, and Tables S1−S6, and are also available from our Web site,
Author Summary
The genome sequence of an organism provides a parts list of proteins, but not an instruction manual for assembling the parts into a cell. Assembly instructions now come from experiments such as two-hybrid screens that detect physical interactions between pairs of proteins. Defining the resources required for generating a full interaction map requires accurate estimates of the false-negative and false-positive rates of genome-scale screens. Two-hybrid screens often select a query protein and sample its interaction partners. True partners may be missed, and false partners may be spuriously identified. This sampling process resembles a capture–recapture experiment, except that classical capture–recapture theory assumes no false positives. Novel extensions to capture–recapture theory permit its application to proteomic screens. This new theory provides statistically grounded answers to long-standing questions: false-discovery rates of high-throughput screens (possibly over 50% per unique interaction, but probably no more than 15% per clone); the quality of different screening libraries; protein properties leading to “sticky” or “promiscuous” interactions; the global network topology; and, most importantly, the coverage of existing two-hybrid maps. Models estimate roughly 30,000 total pairwise interactions in yeast and 500,000 to 1,000,000 in metazoans. The majority of these interactions remain to be discovered.
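As a rough illustration of the capture–recapture idea described above, the sketch below implements only the classical Lincoln–Petersen estimator, which (as the summary notes) assumes no false positives. It is not the extended model developed in the article; the function names and the example counts are hypothetical.

```python
def lincoln_petersen(n_first: int, n_second: int, n_both: int) -> float:
    """Classical capture-recapture estimate of the total number of interactions.

    n_first  -- interactions reported by the first screen
    n_second -- interactions reported by the second screen
    n_both   -- interactions reported by both screens
    Assumes every reported interaction is real (no false positives).
    """
    if n_both <= 0:
        raise ValueError("the two screens must share at least one interaction")
    return n_first * n_second / n_both


def coverage(n_detected: int, n_total: float) -> float:
    """Fraction of the estimated total that a single screen detected."""
    return n_detected / n_total


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
estimated_total = lincoln_petersen(n_first=1000, n_second=800, n_both=200)
first_screen_coverage = coverage(1000, estimated_total)
```

With these made-up counts the estimate is 4,000 interactions, so the first screen would cover 25% of them. False positives inflate each screen's count without adding to the overlap, which is why the classical estimator overstates the total and an extension is needed.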
PMCID: PMC2082503  PMID: 18039026
12.  A novel method for tissue-specific RNAi rescue in Drosophila 
Nucleic Acids Research  2009;37(13):e93.
Targeted gene silencing by RNA interference allows the study of gene function in plants and animals. In cell culture and small animal models, genetic screens can be performed—even tissue-specifically in Drosophila—with genome-wide RNAi libraries. However, a major problem with the use of RNAi approaches is the unavoidable false-positive error caused by off-target effects. Until now, this is minimized by computational RNAi design, comparing RNAi to the mutant phenotype if known, and rescue with a presumed ortholog. The ultimate proof of specificity would be to restore expression of the same gene product in vivo. Here, we present a simple and efficient method to rescue the RNAi-mediated knockdown of two independent genes in Drosophila. By exploiting the degenerate genetic code, we generated Drosophila RNAi Escape Strategy Construct (RESC) rescue proteins containing frequent silent mismatches in the complete RNAi target sequence. RESC products were no longer efficiently silenced by RNAi in cell culture and in vivo. As a proof of principle, we rescue the RNAi-induced loss of function phenotype of the eye color gene white and tracheal defects caused by the knockdown of the heparan sulfate proteoglycan syndecan. Our data suggest that RESC is widely applicable to rescue and validate ubiquitous or tissue-specific RNAi and to perform protein structure–function analysis.
PMCID: PMC2715260  PMID: 19483100
13.  Using Multiple Phenotype Assays and Epistasis Testing to Enhance the Reliability of RNAi Screening and Identify Regulators of Muscle Protein Degradation 
Genes  2012;3(4):686-701.
RNAi is a convenient, widely used tool for screening for genes of interest. We have recently used this technology to screen roughly 750 candidate genes, in C. elegans, for potential roles in regulating muscle protein degradation in vivo. To maximize confidence and assess reproducibility, we have only used previously validated RNAi constructs and have included time courses and replicates. To maximize mechanistic understanding, we have examined multiple sub-cellular phenotypes in multiple compartments in muscle. We have also tested knockdowns of putative regulators of degradation in the context of mutations or drugs that were previously shown to inhibit protein degradation by diverse mechanisms. Here we discuss how assaying multiple phenotypes, multiplexing RNAi screens with use of mutations and drugs, and use of bioinformatics can provide more data on rates of potential false positives and negatives as well as more mechanistic insight than simple RNAi screening.
PMCID: PMC3495584  PMID: 23152949
RNAi; Systems Biology; Network Biology; Functional Genomics; Muscle; Proteolysis; C. elegans
15.  Advances in genome-wide RNAi cellular screens: a case study using the Drosophila JAK/STAT pathway 
BMC Genomics  2012;13:506.
Genome-scale RNA-interference (RNAi) screens are becoming ever more common gene discovery tools. However, whilst every screen identifies interacting genes, less attention has been given to how factors such as library design and post-screening bioinformatics may be affecting the data generated.
Here we present a new genome-wide RNAi screen of the Drosophila JAK/STAT signalling pathway undertaken in the Sheffield RNAi Screening Facility (SRSF). This screen was carried out using a second-generation, computationally optimised dsRNA library and analysed using current methods and bioinformatic tools. To examine advances in RNAi screening technology, we compare this screen to a biologically very similar screen undertaken in 2005 with a first-generation library. Both screens used the same cell line, reporters and experimental design, with the SRSF screen identifying 42 putative regulators of JAK/STAT signalling, 22 of which were verified in a secondary screen and 16 of which were verified with an independent probe design. Following reanalysis of the original screen data, comparisons of the two gene lists allow us to make estimates of false discovery rates in the SRSF data and to conduct an assessment of off-target effects (OTEs) associated with both libraries. We discuss the differences and similarities between the resulting data sets and examine the relative improvements in gene discovery protocols.
Our work represents one of the first direct comparisons between first- and second-generation libraries and shows that modern library designs together with methodological advances have had a significant influence on genome-scale RNAi screens.
PMCID: PMC3526451  PMID: 23006893
Genome screening; RNAi; Off-target effect; JAK/STAT pathway; Functional genomics; dsRNA
16.  GUItars: A GUI Tool for Analysis of High-Throughput RNA Interference Screening Data 
PLoS ONE  2012;7(11):e49386.
High-throughput RNA interference (RNAi) screening has become a widely used approach to elucidating gene functions. However, analysis and annotation of large data sets generated from these screens has been a challenge for researchers without a programming background. Over the years, numerous data analysis methods were produced for plate quality control and hit selection and implemented by a few open-access software packages. Recently, strictly standardized mean difference (SSMD) has become a widely used method for RNAi screening analysis mainly due to its better control of false negative and false positive rates and its ability to quantify RNAi effects with a statistical basis. We have developed GUItars to enable researchers without a programming background to use SSMD as both a plate quality and a hit selection metric to analyze large data sets.
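For readers unfamiliar with SSMD, the sketch below shows the standard definition for two independent groups: the difference of the means divided by the square root of the sum of the variances. It is an illustration only. The abstract does not state which SSMD estimator GUItars implements, and the function name and well readings here are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def ssmd(sample: list[float], control: list[float]) -> float:
    """Strictly standardized mean difference for two independent groups.

    Uses the sample means and sample variances:
        (mean(sample) - mean(control)) / sqrt(var(sample) + var(control))
    Each group needs at least two values so a variance can be computed.
    """
    spread = sqrt(variance(sample) + variance(control))
    if spread == 0:
        raise ValueError("both groups have zero variance")
    return (mean(sample) - mean(control)) / spread


# Hypothetical well readings for one siRNA and for the negative control.
sirna_wells = [1.0, 2.0, 3.0]
negative_control_wells = [4.0, 5.0, 6.0]
effect = ssmd(sirna_wells, negative_control_wells)
```

A negative value indicates that the siRNA decreased the assay readout relative to the control, and a positive value that it increased it. Unlike a z-score, the denominator accounts for variability in both groups.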
The software is accompanied by an intuitive graphical user interface for easy and rapid analysis workflow. SSMD analysis methods have been provided to the users along with traditionally-used z-score, normalized percent activity, and t-test methods for hit selection. GUItars is capable of analyzing large-scale data sets from screens with or without replicates. The software is designed to automatically generate and save numerous graphical outputs known to be among the most informative high-throughput data visualization tools capturing plate-wise and screen-wise performances. Graphical outputs are also written in HTML format for easy access, and a comprehensive summary of screening results is written into tab-delimited output files.
With GUItars, we demonstrated robust SSMD-based analysis workflow on a 3840-gene small interfering RNA (siRNA) library and identified 200 siRNAs that increased and 150 siRNAs that decreased the assay activities with moderate to stronger effects. GUItars enables rapid analysis and illustration of data from large- or small-scale RNAi screens using SSMD and other traditional analysis methods. The software is freely available at
PMCID: PMC3502531  PMID: 23185323
17.  Comparative Genomics Reveals Two Novel RNAi Factors in Trypanosoma brucei and Provides Insight into the Core Machinery 
PLoS Pathogens  2012;8(5):e1002678.
The introduction ten years ago of RNA interference (RNAi) as a tool for molecular exploration in Trypanosoma brucei has led to a surge in our understanding of the pathogenesis and biology of this human parasite. In particular, a genome-wide RNAi screen has recently been combined with next-generation Illumina sequencing to expose catalogues of genes associated with loss of fitness in distinct developmental stages. At present, this technology is restricted to RNAi-positive protozoan parasites, which excludes T. cruzi, Leishmania major, and Plasmodium falciparum. Therefore, elucidating the mechanism of RNAi and identifying the essential components of the pathway is fundamental for improving RNAi efficiency in T. brucei and for transferring the RNAi tool to RNAi-deficient pathogens. Here we used comparative genomics of RNAi-positive and -negative trypanosomatid protozoans to identify the repertoire of factors in T. brucei. In addition to the previously characterized Argonaute 1 (AGO1) protein and the cytoplasmic and nuclear Dicers, TbDCL1 and TbDCL2, respectively, we identified the RNA Interference Factors 4 and 5 (TbRIF4 and TbRIF5). TbRIF4 is a 3′-5′ exonuclease of the DnaQ superfamily and plays a critical role in the conversion of duplex siRNAs to the single-stranded form, thus generating a TbAGO1-siRNA complex required for target-specific cleavage. TbRIF5 is essential for cytoplasmic RNAi and appears to act as a TbDCL1 cofactor. The availability of the core RNAi machinery in T. brucei provides a platform to gain mechanistic insights in this ancient eukaryote and to identify the minimal set of components required to reconstitute RNAi in RNAi-deficient parasites.
Author Summary
RNA interference (RNAi), a naturally-occurring pathway whereby the presence of double-stranded RNA in a cell triggers the degradation of homologous mRNA, has been harnessed in many organisms as an invaluable molecular biology tool to interrogate gene function. Although this technology is widely used in the protozoan parasite Trypanosoma brucei, other parasites of considerable public health significance, such as Trypanosoma cruzi, Leishmania major, and Plasmodium falciparum do not perform RNAi. Since RNAi has recently been introduced into budding yeast, this opens up the possibility that RNAi can be reconstituted in these pathogens. The key to this is getting a handle on the essential RNAi factors in T. brucei. By applying comparative genomics we identified five genes that are present in the RNAi-proficient species, but not in RNAi-deficient species: three previously identified RNAi factors, and two novel ones, which are described here. This insight into the core T. brucei RNAi machinery represents a major step towards transferring this pathway to RNAi-deficient parasites.
PMCID: PMC3359990  PMID: 22654659
18.  Programmed fluctuations in sense/antisense transcript ratios drive sexual differentiation in S. pombe 
Strand-specific RNA sequencing of S. pombe reveals a highly structured programme of ncRNA expression at over 600 loci. Functional investigations show that this extensive ncRNA landscape controls the complex programme of sexual differentiation in S. pombe.
The model eukaryote S. pombe features substantial numbers of ncRNAs many of which are antisense regulatory transcripts (ARTs), ncRNAs expressed on the opposing strand to coding sequences. Individual ARTs are generated during the mitotic cycle, or at discrete stages of sexual differentiation to downregulate the levels of proteins that drive and coordinate sexual differentiation. Antisense transcription occurring from events such as bidirectional transcription is not simply artefactual ‘chatter', it performs a critical role in regulating gene expression.
Regulation of the RNA profile is a principal control driving sexual differentiation in the fission yeast Schizosaccharomyces pombe. Before transcription, RNAi-mediated formation of heterochromatin is used to suppress expression, while post-transcription, regulation is achieved via the active stabilisation or destruction of transcripts, and through at least two distinct types of splicing control (Mata et al, 2002; Shimoseki and Shimoda, 2001; Averbeck et al, 2005; Mata and Bähler, 2006; Xue-Franzen et al, 2006; Moldon et al, 2008; Djupedal et al, 2009; Amorim et al, 2010; Grewal, 2010; Cremona et al, 2011).
Around 94% of the S. pombe genome is transcribed (Wilhelm et al, 2008). While many of these transcripts encode proteins (Wood et al, 2002; Bitton et al, 2011), the majority have no known function. We used a strand-specific protocol to sequence total RNA extracts taken from vegetatively growing cells, and at different points during a time course of sexual differentiation. The resulting data redefined existing gene coordinates and identified additional transcribed loci. The frequency of reads at each of these was used to monitor transcript abundance.
Transcript levels at 6599 loci changed in at least one sample (G-statistic; False Discovery Rate <5%). 4231 (72.3%), of which 4011 map to protein-coding genes, while 809 loci were antisense to a known gene. Comparisons between haploid and diploid strains identified changes in transcript levels at over 1000 loci.
At 354 loci, greater antisense abundance was observed relative to sense, in at least one sample (putative antisense regulatory transcripts—ARTs). Since antisense mechanisms are known to modulate sense transcript expression through a variety of inhibitory mechanisms (Faghihi and Wahlestedt, 2009), we postulated that the waves of antisense expression activated at different stages during meiosis might be regulating protein expression.
To ask whether transcription factors that drive sense-transcript levels influenced ART production, we performed RNA-seq of a pat1.114 diploid meiosis in the absence of the transcription factors Atf21 and Atf31 (responsible for late meiotic transcription; Mata et al, 2002). Transcript levels at 185 ncRNA loci showed significant changes in the knockout backgrounds. Although meiotic progression is largely unaffected by removal of Atf21 and Atf31, viability of the resulting spores was significantly diminished, indicating that Atf21- and Atf31-mediated events are critical to efficient sexual differentiation.
If changes to relative antisense/sense transcript levels during a particular phase of sexual differentiation were to regulate protein expression, then the continued presence of the antisense at points in the differentiation programme where it would normally be absent should abolish protein function during this phase. We tested this hypothesis at four loci representing the three means of antisense production: convergent gene expression, improper termination and nascent transcription from an independent locus. Induction of the natural antisense transcripts that opposed spo4+, spo6+ and dis1+ (Figures 3 and 7) in trans from a heterologous locus phenocopied a loss of function of the target protein. ART overexpression decreased Dis1 protein levels. Antisense transcription opposing spk1+ originated from improper termination of the sense ups1+ transcript on the opposite strand (Figure 3B, left locus). Expression of either the natural full-length ups1+ transcript or a truncated version, restricted to the portion of ups1+ overlapping spk1+ (Figure 3, orange transcripts) in trans from a heterologous locus phenocopied the spk1.Δ differentiation deficiency. Convergent transcription from a neighbouring gene on the opposing strand is, therefore, an effective mechanism to generate RNAi-mediated (below) silencing in fission yeast. Further analysis of the data revealed, for many loci, substantial changes in UTR length over the course of meiosis, suggesting that UTR dynamics may have an active role in regulating gene expression by controlling the transcriptional overlap between convergent adjacent gene pairs.
The RNAi machinery (Grewal, 2010) was required for antisense suppression at each of the dis1, spk1, spo4 and spo6 loci, as antisense to each locus had no impact in ago1.Δ, dcr1.Δ and rdp1.Δ backgrounds. We conclude that RNAi control has a key role in maintaining the fidelity of sexual differentiation in fission yeast. The histone H3 methyl transferase Clr4 was required for antisense control from a heterologous locus.
Thus, a significant portion of the impact of ncRNA upon sexual differentiation arises from antisense gene silencing. Importantly, in contrast to the extensively characterised ability of the RNAi machinery to operate in cis at a target locus in S. pombe (Grewal, 2010), each case of gene silencing generated here could be achieved in trans by expression of the antisense transcript from a single heterologous locus elsewhere in the genome.
Integration of an antibiotic marker gene immediately downstream of the dis1+ locus instigated antisense control in an orientation-dependent manner. PCR-based gene tagging approaches are widely used to fuse the coding sequences of epitope or protein tags to a gene of interest. Not only do these tagging approaches disrupt normal 3′UTR controls, but the insertion of a heterologous marker gene immediately downstream of an ORF can clearly have a significant impact upon transcriptional control of the resulting fusion protein. Thus, PCR tagging approaches can no longer be viewed as benign manipulations of a locus that only result in the production of a tagged protein product.
Repression of Dis1 function by gene deletion or antisense control revealed a key role for this conserved microtubule regulator in driving the horsetail nuclear migrations that promote recombination during meiotic prophase.
Non-coding transcripts have often been viewed as simple ‘chatter', maintained solely because evolutionary pressures have not been strong enough to force their elimination from the system. Our data show that phenomena such as improper termination and bidirectional transcription are not simply interesting artifacts arising from the complexities of transcription or genome history, but have a critical role in regulating gene expression in the current genome. Given the widespread use of RNAi, it is reasonable to anticipate that future analyses will establish ARTs to have equal importance in other organisms, including vertebrates.
These data highlight the need to modify our concept of a gene from that of a spatially distinct locus. This view is becoming increasingly untenable. Not only are the 5′ and 3′ ends of many genes indistinct, but this lack of a hard and fast boundary is also actively used by cells to control the transcription of adjacent and overlapping loci, and thus to regulate critical events in the life of a cell.
Strand-specific RNA sequencing of S. pombe revealed a highly structured programme of ncRNA expression at over 600 loci. Waves of antisense transcription accompanied sexual differentiation. A substantial proportion of ncRNA arose from mechanisms previously considered to be largely artefactual, including improper 3′ termination and bidirectional transcription. Constitutive induction of the entire spk1+, spo4+, dis1+ and spo6+ antisense transcripts from an integrated, ectopic, locus disrupted their respective meiotic functions. This ability of antisense transcripts to disrupt gene function when expressed in trans suggests that cis production at native loci during sexual differentiation may also control gene function. Consistently, insertion of a marker gene adjacent to the dis1+ antisense start site mimicked ectopic antisense expression in reducing the levels of this microtubule regulator and abolishing the microtubule-dependent ‘horsetail' stage of meiosis. Antisense production had no impact at any of these loci when the RNA interference (RNAi) machinery was removed. Thus, far from being simply ‘genome chatter', this extensive ncRNA landscape constitutes a fundamental component in the controls that drive the complex programme of sexual differentiation in S. pombe.
PMCID: PMC3738847  PMID: 22186733
antisense; meiosis; ncRNA; S. pombe; siRNA
19.  Neuron-Specific Feeding RNAi in C. elegans and Its Use in a Screen for Essential Genes Required for GABA Neuron Function 
PLoS Genetics  2013;9(11):e1003921.
Forward genetic screens are important tools for exploring the genetic requirements for neuronal function. However, conventional forward screens often have difficulty identifying genes whose relevant functions are masked by pleiotropy. In particular, if loss of gene function results in sterility, lethality, or other severe pleiotropy, neuronal-specific functions cannot be readily analyzed. Here we describe a method in C. elegans for generating cell-specific knockdown in neurons using feeding RNAi and its application in a screen for the role of essential genes in GABAergic neurons. We combine manipulations that increase the sensitivity of select neurons to RNAi with manipulations that block RNAi in other cells. We produce animal strains in which feeding RNAi results in restricted gene knockdown in either GABA-, acetylcholine-, dopamine-, or glutamate-releasing neurons. In these strains, we observe neuron cell-type specific behavioral changes when we knock down genes required for these neurons to function, including genes encoding the basal neurotransmission machinery. These reagents enable high-throughput, cell-specific knockdown in the nervous system, facilitating rapid dissection of the site of gene action and screening for neuronal functions of essential genes. Using the GABA-specific RNAi strain, we screened 1,320 RNAi clones targeting essential genes on chromosomes I, II, and III for their effect on GABA neuron function. We identified 48 genes whose GABA cell-specific knockdown resulted in reduced GABA motor output. This screen extends our understanding of the genetic requirements for continued neuronal function in a mature organism.
Author Summary
Living organisms often reuse the same genes multiple times for different purposes. If one function of a gene is essential, death or arrest of the mutant masks other functions. Understanding the functions of essential genes is particularly critical in the nervous system, which must maintain plasticity and fend off disease long after development is complete. However, current strategies for generating conditional knockouts rely on making a new transgenic animal for each gene and thus are not useful for forward genetic screens or for other experiments involving a large number of genes. We have developed a technique in C. elegans for generating gene knockdown in selected neuron sub-types in response to feeding RNAi. Using this technique, we performed a screen aimed at identifying essential genes that are required for the function of mature GABAergic neurons. By knocking these genes down in only GABAergic neurons, we can circumvent the muddying effects of pleiotropy and find essential genes that function cell intrinsically to promote GABA neuron function. The genes we identified using this method provide a more complete understanding of the complex genetic requirements of post-developmental neurons.
PMCID: PMC3820814  PMID: 24244189
20.  Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources 
The identification of protein and gene names (PGNs) from the scientific literature requires semantic resources: terminological and lexical resources deliver the term candidates into PGN tagging solutions and the gold standard corpora (GSC) train them to identify term parameters and contextual features. Ideally all three resources, i.e. corpora, lexica and taggers, cover the same domain knowledge, and thus support identification of the same types of PGNs and cover all of them. Unfortunately, none of the three serves as a predominant standard and for this reason it is worth exploring how these three resources comply with each other. We systematically compare different PGN taggers against publicly available corpora and analyze the impact of the included lexical resource on their performance. In particular, we determine the performance gains through false positive filtering, which contributes to the disambiguation of identified PGNs.
In general, machine learning approaches (ML-Tag) for PGN tagging show higher F1-measure performance against the BioCreative-II and Jnlpba GSCs (exact matching), whereas the lexicon based approaches (LexTag) in combination with disambiguation methods show better results on FsuPrge and PennBio. The ML-Tag solutions balance precision and recall, whereas the LexTag solutions have different precision and recall profiles at the same F1-measure across all corpora. Higher recall is achieved with larger lexical resources, which also introduce more noise (false positive results). The ML-Tag solutions certainly perform best if the test corpus is from the same GSC as the training corpus. As expected, the false negative errors characterize the test corpora and – on the other hand – the profiles of the false positive mistakes characterize the tagging solutions. LexTag solutions that are based on a large terminological resource in combination with false positive filtering produce better results, which, in addition, provide concept identifiers from a knowledge source in contrast to ML-Tag solutions.
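The observation that taggers can have different precision and recall profiles at the same F1-measure can be made concrete with a small sketch. The counts below are hypothetical and are not taken from the study; the function simply applies the standard definitions of precision, recall and F1.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true positive, false positive
    and false negative counts; undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical tagger A: balanced precision and recall.
balanced = precision_recall_f1(tp=80, fp=20, fn=20)

# Hypothetical tagger B: no false positives, but more missed names.
high_precision = precision_recall_f1(tp=60, fp=0, fn=30)
```

Both hypothetical taggers reach an F1 of 0.8, although the first has precision and recall of 0.8 each, while the second has a precision of 1.0 and a recall of about 0.67. This is why F1 alone does not reveal whether a tagger errs towards false positives or false negatives.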
The standard ML-Tag solutions achieve high performance, but not across all corpora, and thus should be trained using several different corpora to reduce possible biases. The LexTag solutions have different profiles for their precision and recall performance, but with similar F1-measure. This result is surprising and suggests that they cover a portion of the most common naming standards, but cope differently with the term variability across the corpora. The false positive filtering applied to LexTag solutions does improve the results by increasing their precision without significantly compromising their recall. The harmonisation of the annotation schemes in combination with standardized lexical resources in the tagging solutions will enable their comparability and will pave the way for a shared standard.
PMCID: PMC4021975  PMID: 24112383
21.  Huvariome: a web server resource of whole genome next-generation sequencing allelic frequencies to aid in pathological candidate gene selection 
Next generation sequencing provides clinical research scientists with direct readout of innumerable variants, including personal, pathological and common benign variants. The aim of resequencing studies is to determine the candidate pathogenic variants from individual genomes, or from family-based or tumor/normal genome comparisons. Whilst the use of appropriate controls within the experimental design will minimize the number of false positive variations selected, this number can be reduced further with the use of high quality whole genome reference data to minimize false positive variants prior to candidate gene selection. In addition, the use of platform related sequencing error models can help in the recovery of ambiguous genotypes from lower coverage data.
We have developed a whole genome database of human genetic variations, Huvariome, determined by whole genome deep sequencing data with high coverage and low error rates. The database was designed to be sequencing technology independent but is currently populated with 165 individual whole genomes consisting of small pedigrees and matched tumor/normal samples sequenced with the Complete Genomics sequencing platform. Common variants have been determined for a Benelux population cohort and represented as genotypes alongside the results of two sets of control data (73 of the 165 genomes): Huvariome Core, which comprises 31 healthy individuals from the Benelux region, and the Diversity Panel, consisting of 46 healthy individuals representing 10 different populations and 21 samples in three pedigrees. Users can query the database by gene or position via a web interface and the results are displayed as the frequency of the variations as detected in the datasets. We demonstrate that Huvariome can provide accurate reference allele frequencies to disambiguate sequencing inconsistencies produced in resequencing experiments. Huvariome has been used to support the selection of candidate cardiomyopathy related genes which have a homozygous genotype in the reference cohorts. This database allows the users to see which selected variants are common variants (> 5% minor allele frequency) in the Huvariome core samples, thus aiding in the selection of potentially pathogenic variants by filtering out common variants that are not listed in one of the other public genomic variation databases. The no-call rate and the accuracy of allele calling in Huvariome provide the user with the possibility of identifying platform dependent errors associated with specific regions of the human genome.
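The common-variant filter described above (minor allele frequency above 5% in a reference cohort) can be sketched as follows. This is an illustration only and does not reflect the Huvariome implementation or its web interface; the `Variant` class, the variant identifiers and the allele counts are hypothetical. The total of 62 alleles assumes 31 diploid individuals, matching the size of Huvariome Core.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    identifier: str          # hypothetical label, not a real variant ID
    alt_allele_count: int    # alternate alleles observed in the reference cohort
    total_allele_count: int  # called alleles in the cohort (2 per diploid sample)


def minor_allele_frequency(variant: Variant) -> float:
    """Frequency of the less common of the two alleles at a biallelic site."""
    if variant.total_allele_count <= 0:
        raise ValueError("total_allele_count must be positive")
    alt_frequency = variant.alt_allele_count / variant.total_allele_count
    return min(alt_frequency, 1.0 - alt_frequency)


def drop_common_variants(variants: list[Variant], threshold: float = 0.05) -> list[Variant]:
    """Keep only variants whose minor allele frequency does not exceed the threshold."""
    return [v for v in variants if minor_allele_frequency(v) <= threshold]


# Hypothetical candidates scored against a 31-person (62-allele) reference cohort.
candidates = [
    Variant("variant_a", alt_allele_count=1, total_allele_count=62),   # MAF ~0.016
    Variant("variant_b", alt_allele_count=10, total_allele_count=62),  # MAF ~0.161
    Variant("variant_c", alt_allele_count=60, total_allele_count=62),  # MAF ~0.032
]
rare_candidates = drop_common_variants(candidates)
print([v.identifier for v in rare_candidates])
```

With a cohort this small, a single observed allele already corresponds to a frequency of about 1.6%, so a 5% threshold separates variants seen at most three times in 62 alleles from those seen more often.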
Huvariome is a simple to use resource for validation of resequencing results obtained by NGS experiments. The high sequence coverage and low error rates provide scientists with the ability to remove false positive results from pedigree studies. Results are returned via a web interface that displays location-based genetic variation frequency, impact on protein function, association with known genetic variations and a quality score of the variation base derived from Huvariome Core and the Diversity Panel data. These results may be used to identify and prioritize rare variants that, for example, might be disease relevant. In testing the accuracy of the Huvariome database, alleles of a selection of ambiguously called coding single nucleotide variants were successfully predicted in all cases. Data protection of individuals is ensured by restricted access to patient derived genomes from the host institution which is relevant for future molecular diagnostics.
PMCID: PMC3549785  PMID: 23164068
Medical genetics; Medical genomics; Whole genome sequencing; Allele frequency; Cardiomyopathy
22.  Genome-Wide RNAi of C. elegans Using the Hypersensitive rrf-3 Strain Reveals Novel Gene Functions 
PLoS Biology  2003;1(1):e12.
RNA-mediated interference (RNAi) is a method to inhibit gene function by introduction of double-stranded RNA (dsRNA). Recently, an RNAi library was constructed that consists of bacterial clones expressing dsRNA, corresponding to nearly 90% of the 19,427 predicted genes of C. elegans. Feeding of this RNAi library to the standard wild-type laboratory strain Bristol N2 detected phenotypes for approximately 10% of the corresponding genes. To increase the number of genes for which a loss-of-function phenotype can be detected, we undertook a genome-wide RNAi screen using the rrf-3 mutant strain, which we found to be hypersensitive to RNAi. Feeding of the RNAi library to rrf-3 mutants resulted in additional loss-of-function phenotypes for 393 genes, increasing the number of genes with a phenotype by 23%. These additional phenotypes are distributed over different phenotypic classes. We also studied interexperimental variability in RNAi results and found persistent levels of false negatives. In addition, we used the RNAi phenotypes obtained with the genome-wide screens to systematically clone seven existing genetic mutants with visible phenotypes. The genome-wide RNAi screen using rrf-3 significantly increased the functional data on the C. elegans genome. The resulting dataset will be valuable in conjunction with other functional genomics approaches, as well as in other model organisms.
The screen suggested functions for 393 genes for which no RNAi-mediated phenotype was known. The comparison with similar screens in worms has general implications for RNAi experiments.
PMCID: PMC212692  PMID: 14551910
23.  Identification of Drosophila Mitotic Genes by Combining Co-Expression Analysis and RNA Interference 
PLoS Genetics  2008;4(7):e1000126.
RNAi screens have, to date, identified many genes required for mitotic divisions of Drosophila tissue culture cells. However, the inventory of such genes remains incomplete. We have combined the powers of bioinformatics and RNAi technology to detect novel mitotic genes. We found that Drosophila genes involved in mitosis tend to be transcriptionally co-expressed. We thus constructed a co-expression–based list of 1,000 genes that are highly enriched in mitotic functions, and we performed RNAi for each of these genes. By limiting the number of genes to be examined, we were able to perform a very detailed phenotypic analysis of RNAi cells. We examined dsRNA-treated cells for possible abnormalities in both chromosome structure and spindle organization. This analysis allowed the identification of 142 mitotic genes, which were subdivided into 18 phenoclusters. Seventy of these genes have not previously been associated with mitotic defects; 30 of them are required for spindle assembly and/or chromosome segregation, and 40 are required to prevent spontaneous chromosome breakage. We note that the latter type of genes has never been detected in previous RNAi screens in any system. Finally, we found that RNAi against genes encoding kinetochore components or highly conserved splicing factors results in identical defects in chromosome segregation, highlighting an unanticipated role of splicing factors in centromere function. These findings indicate that our co-expression–based method for the detection of mitotic functions works remarkably well. We can foresee that elaboration of co-expression lists using genes in the same phenocluster will provide many candidate genes for small-scale RNAi screens aimed at completing the inventory of mitotic proteins.
Author Summary
Mitosis is the evolutionarily conserved process that enables a dividing cell to equally partition its genetic material between the two daughter cells. The fidelity of mitotic division is crucial for normal development of multicellular organisms and to prevent cancer or birth defects. Understanding the molecular mechanisms of mitosis requires the identification of genes involved in this process. Previous studies have shown that such genes can be readily identified by RNA interference (RNAi) in Drosophila tissue culture cells. Because the inventory of mitotic genes is still incomplete, we have undertaken an RNAi screen using a novel approach. We used a co-expression–based bioinformatic procedure to select a group of 1,000 genes enriched in mitotic functions from a dataset of 13,166 Drosophila genes. This group includes roughly half of the known mitotic genes, implying that it should contain half of all mitotic genes, including those that are currently unknown. We performed RNAi against each of the 1,000 genes in the group. By limiting the number of genes to be examined, we were able to perform a very detailed phenotypic analysis of RNAi cells. This analysis allowed the identification of 70 genes whose mitotic role was previously unknown; 30 are required for proper chromosome segregation and 40 are required to maintain chromosome integrity.
PMCID: PMC2537813  PMID: 18797514
24.  Screening Mammography for Women Aged 40 to 49 Years at Average Risk for Breast Cancer 
Executive Summary
The aim of this review was to determine the effectiveness of screening mammography in women aged 40 to 49 years at average risk for breast cancer.
Clinical Need
The effectiveness of screening mammography in women aged over 50 years has been established, yet the issue of screening in women aged 40 to 49 years is still unsettled. The Canadian Task Force of Preventive Services, which sets guidelines for screening mammography for all provinces, supports neither the inclusion nor the exclusion of this screening procedure for 40- to 49-year-old women from the periodic health examination. In addition to this, 2 separate reviews, one conducted in Quebec in 2005 and the other in Alberta in 2000, each concluded that there is an absence of convincing evidence on the effectiveness of screening mammography for women in this age group who are at average risk for breast cancer.
In the United States, there is disagreement among organizations on whether population-based mammography should begin at the age of 40 or 50 years. The National Institutes of Health, the American Association for Cancer Research, and the American Academy of Family Physicians recommend against screening women in their 40s, whereas the United States Preventive Services Task Force, the National Cancer Institute, the American Cancer Society, the American College of Radiology, and the American College of Obstetricians and Gynecologists recommend screening mammograms for women aged 40 to 49 years. Furthermore, in comparing screening guidelines between Canada and the United States, it is also important to recognize that “standard care” within a socialized medical system such as Canada’s differs from that of the United States. The National Breast Screening Study (NBSS-1), a randomized screening trial conducted in multiple centres across Canada, has shown there is no benefit in mortality from breast cancer from annual mammograms in women randomized between the ages of 40 and 49, relative to standard care (i.e. physical exam and teaching of breast-self examination on entry to the study, with usual community care thereafter).
At present, organized screening programs in Canada systematically screen women starting at 50 years of age, although with a physician’s referral, a screening mammogram is an insured service in Ontario for women under 50 years of age.
International estimates of the epidemiology of breast cancer show that the incidence of breast cancer is increasing for all ages combined, whereas mortality is decreasing, though at a slower rate. These decreasing mortality rates may be attributed to screening and advances in breast cancer therapy over time. Decreases in mortality attributable to screening may be a result of the earlier detection and treatment of invasive cancers, in addition to the increased detection of ductal carcinoma in situ (DCIS), of which certain subpathologies are less lethal. Evidence from the SEER cancer registry in the United States indicates that the age-adjusted incidence of DCIS has increased almost 10-fold over a 20-year period (from 2.7 to 25 per 100,000).
The incidence of breast cancer is lower in women aged 40 to 49 years than in women aged 50 to 69 years (about 140 per 100,000 versus 500 per 100,000 women, respectively), as are the sensitivity (about 75% versus 85% for women aged under and over 50, respectively) and specificity of mammography (about 80% versus 90% for women aged under and over 50, respectively). The increased density of breast tissue in younger women is mainly responsible for the lower accuracy of this procedure in this age group. In addition, because breast cancers that occur before the age of 50 are more likely to be associated with genetic predisposition than those diagnosed in women after the age of 50, mammography may not be an optimal screening method for younger women.
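To make the combined effect of these figures concrete, the sketch below works through the arithmetic for 100,000 women in each age group. It is an illustration and not part of the review: it assumes that the quoted incidence can stand in for the proportion of screened women who have cancer at the time of screening, and it applies the approximate sensitivity and specificity values quoted above. The function name and the output keys are hypothetical.

```python
def expected_screening_outcomes(cases_per_100k: float,
                                sensitivity: float,
                                specificity: float) -> dict[str, float]:
    """Expected test outcomes per 100,000 women screened once.

    Assumption: cases_per_100k approximates the number of women with
    cancer present at screening, which simplifies real screening dynamics.
    """
    population = 100_000
    with_cancer = cases_per_100k
    without_cancer = population - with_cancer
    true_positives = with_cancer * sensitivity
    false_negatives = with_cancer - true_positives
    false_positives = without_cancer * (1.0 - specificity)
    positive_predictive_value = true_positives / (true_positives + false_positives)
    return {
        "true_positives": true_positives,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        "positive_predictive_value": positive_predictive_value,
    }


# Approximate figures quoted in the text for each age group.
aged_40_to_49 = expected_screening_outcomes(140, sensitivity=0.75, specificity=0.80)
aged_50_to_69 = expected_screening_outcomes(500, sensitivity=0.85, specificity=0.90)

for label, outcome in [("40 to 49", aged_40_to_49), ("50 to 69", aged_50_to_69)]:
    print(label, {key: round(value, 4) for key, value in outcome.items()})
```

Under these assumptions, lower incidence and lower specificity both push in the same direction: the younger group has more false positives per cancer detected, which is the trade-off the review questions on false positive and false negative results address.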
Treatment options vary with the stage of disease (based on tumor size, involvement of surrounding tissue, and number of affected axillary lymph nodes) and its pathology, and may include a combination of surgery, chemotherapy, and/or radiotherapy.
Surgery is the first-line intervention for biopsy confirmed tumours. The subsequent use of radiation, chemotherapy, or hormonal treatments is dependent on the histopathologic characteristics of the tumor and the type of surgery. There is controversy regarding the optimal treatment of DCIS, which is noninvasive.
With such controversy as to the effectiveness of mammography and the potential risk associated with women being overtreated or actual cancers being missed, and the increased risk of breast cancer associated with exposure to annual mammograms over a 10-year period, the Ontario Health Technology Advisory Committee requested this review of screening mammography in women aged 40 to 49 years at average risk for breast cancer. This review is the first of 2 parts and concentrates on the effectiveness of screening mammography (i.e., film mammography, FM) for women at average risk aged 40 to 49 years. The second part will be an evaluation of screening by either magnetic resonance imaging or digital mammography, with the objective of determining the optimal screening modality in these younger women.
Review Strategy
The following questions were asked:
Does screening mammography for women aged 40 to 49 years who are at average risk for breast cancer reduce breast cancer mortality?
What are the sensitivity and specificity of mammography for this age group?
What are the risks associated with annual screening from ages 40 to 49?
What are the risks associated with false positive and false negative mammography results?
What are the economic considerations if evidence for effectiveness is established?
The Medical Advisory Secretariat followed its standard procedures and searched these electronic databases: Ovid MEDLINE, EMBASE, Ovid MEDLINE In-Process and Other Non-Indexed Citations, Cochrane Central Register of Controlled Trials, Cochrane Database of Systematic Reviews and the International Network of Agencies for Health Technology Assessment.
Keywords used in the search were breast cancer, breast neoplasms, mass screening, and mammography.
In total, the search yielded 6,359 articles specific to breast cancer screening and mammography. This did not include reports on diagnostic mammograms. The search was further restricted to English-language randomized controlled trials (RCTs), systematic reviews, and meta-analyses published between 1995 and 2005. Excluded were case reports, comments, editorials, and letters, which narrowed the results to 516 articles and previous health technology policy assessments.
These were examined against the criteria outlined below. This resulted in the inclusion of 5 health technology assessments, the Canadian Preventive Services Task Force report, the United States Preventive Services Task Force report, 1 Cochrane review, and 8 RCTs.
Inclusion Criteria
English-language articles, and English and French-language health technology policy assessments, conducted by other organizations, from 1995 to 2005
Articles specific to RCTs of screening mammography of women at average risk for breast cancer that included results for women randomized to studies between the ages of 40 and 49 years
Studies in which women were randomized to screening with or without mammography, although women may have had clinical breast examinations and/or may have been conducting breast self-examination.
UK Age Trial results published in December 2006.
Exclusion Criteria
Observational studies, including those nested within RCTs
RCTs that do not include results on women between the ages of 40 and 49 at randomization
Studies in which mammography was compared with other radiologic screening modalities, for example, digital mammography, magnetic resonance imaging or ultrasound.
Studies in which women randomized had a personal history of breast cancer.
Film mammography
Within RCTs, the comparison group would have been women randomized to not undergo screening mammography, although they may have had clinical breast examinations and/or have been conducting breast self-examination.
Outcomes of Interest
Breast cancer mortality
Summary of Findings
There is Level 1 Canadian evidence that screening women between the ages of 40 and 49 years who are at average risk for breast cancer is not effective, and that the absence of a benefit is sustained over a maximum follow-up period of 16 years.
All remaining studies that reported on women aged under 50 years were based on subset analyses. They provide additional evidence that, when all these RCTs are taken into account, there is no significant reduction in breast cancer mortality associated with screening mammography in women aged 40 to 49 years.
There is Level 1 evidence that screening mammography in women aged 40 to 49 years at average risk for breast cancer is not effective in reducing mortality.
Moreover, risks associated with exposure to mammographic radiation, the increased risk of missed cancers due to lower mammographic sensitivity, and the psychological impact of false positives, are not inconsequential.
The UK Age Trial results published in December 2006 did not change these conclusions.
PMCID: PMC3377515  PMID: 23074501
25.  An analysis of normalization methods for Drosophila RNAi genomic screens and development of a robust validation scheme 
Journal of biomolecular screening  2008;13(8):777-784.
Genome-wide RNAi screening is a powerful, yet relatively immature technology that allows investigation into the role of individual genes in a process of choice. Most RNAi screens identify a large number of genes with a continuous gradient in the assessed phenotype. Screeners must then decide whether to examine just those genes with the most robust phenotype or to examine the full gradient of genes that cause an effect and how to identify the candidate genes to be validated. We have used RNAi in Drosophila cells to examine viability in a 384-well plate format and compare two screens, untreated control and treatment. We compare multiple normalization methods, which take advantage of different features within the data, including quantile normalization, background subtraction, scaling, cellHTS2, and interquartile range measurement. Considering the false-positive potential that arises from RNAi technology, a robust validation method was designed for the purpose of gene selection for future investigations. In a retrospective analysis, we describe the use of validation data to evaluate each normalization method. While no normalization method worked ideally, we found that a combination of two methods, background subtraction followed by quantile normalization and cellHTS2, at different thresholds, captures the most dependable and diverse candidate genes. Thresholds are suggested depending on whether a few candidate genes are desired or a more extensive systems level analysis is sought. In summary, our normalization approaches and experimental design to perform validation experiments are likely to apply to those high-throughput screening systems attempting to identify genes for systems level analysis.
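The sequence "background subtraction followed by quantile normalization" can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how background was estimated, so the plate median is used here as a stand-in, ties between wells are not given special treatment, and the four-well readings are hypothetical placeholders for full 384-well plates.

```python
from statistics import median


def subtract_background(plate: list[float]) -> list[float]:
    """Subtract the plate median from every well (an assumed background estimate)."""
    background = median(plate)
    return [value - background for value in plate]


def quantile_normalize(plates: list[list[float]]) -> list[list[float]]:
    """Give every plate the same distribution: each well receives the mean,
    across plates, of the values that share its within-plate rank."""
    n_wells = len(plates[0])
    if any(len(plate) != n_wells for plate in plates):
        raise ValueError("all plates must have the same number of wells")
    sorted_plates = [sorted(plate) for plate in plates]
    rank_means = [sum(column) / len(plates) for column in zip(*sorted_plates)]
    normalized = []
    for plate in plates:
        # Well indices ordered from lowest to highest reading on this plate.
        order = sorted(range(n_wells), key=lambda index: plate[index])
        result = [0.0] * n_wells
        for rank, well in enumerate(order):
            result[well] = rank_means[rank]
        normalized.append(result)
    return normalized


# Hypothetical viability readings for the same four wells in two screens.
control = [5.0, 2.0, 3.0, 4.0]
treated = [4.0, 1.0, 4.5, 2.0]

corrected = [subtract_background(control), subtract_background(treated)]
normalized_control, normalized_treated = quantile_normalize(corrected)
print(normalized_control)
print(normalized_treated)
```

After normalization both screens contain the same set of values, so a well can be compared between control and treatment by its rank-derived value rather than by raw signal, which differs between plates for technical reasons.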
PMCID: PMC2956424  PMID: 18753689
RNAi; high-throughput screen; normalization; validation
