Related Articles

1.  Clustering phenotype populations by genome-wide RNAi and multiparametric imaging 
How to predict gene function from phenotypic cues is a longstanding question in biology. Using quantitative multiparametric imaging, RNAi-mediated cell phenotypes were measured on a genome-wide scale. On the basis of phenotypic ‘neighbourhoods', we identified previously uncharacterized human genes as mediators of the DNA damage response pathway and the maintenance of genomic integrity. The phenotypic map is provided as an online resource at http://www.cellmorph.org for discovering further functional relationships for a broad spectrum of biological modules.
Genetic screens for phenotypic similarity have made key contributions to associating genes with biological processes. Aggregating genes by similarity of their loss-of-function phenotype has provided insights into signalling pathways that have a conserved function from Drosophila to human (Nusslein-Volhard and Wieschaus, 1980; Bier, 2005). Complex visual phenotypes, such as defects in pattern formation during development, greatly facilitated the classification of genes into pathways, and phenotypic similarities in many cases predicted molecular relationships. With RNA interference (RNAi), highly parallel phenotyping of loss-of-function effects in cultured cells has become feasible in many organisms whose genomes have been sequenced (Boutros and Ahringer, 2008). One of the current challenges is the computational categorization of visual phenotypes and the prediction of gene function and associated biological processes. With large parts of the genome still being uncharted territory, deriving functional information from large-scale phenotype analysis promises to uncover novel gene–gene relationships and to generate functional maps to explore cellular processes.
In this study, we developed an automated approach using RNAi-mediated cell phenotypes, multiparametric imaging and computational modelling to obtain functional information on previously uncharacterized genes. To generate broad, computer-readable phenotypic signatures, we measured the effect of RNAi-mediated knockdowns on changes of cell morphology in human cells on a genome-wide scale. First, several million cells were stained for nuclear and cytoskeletal markers and then imaged using automated microscopy. On the basis of the fluorescent markers, we established an automated image analysis to classify individual cells (Figure 1A). After cell segmentation for determining nuclei and cell boundaries (Figure 1C), we computed 51 cell descriptors that quantified intensities, shape characteristics and texture (Figure 1F). Individual cells were categorized into 1 of 10 classes, which included cells showing protrusion/elongation, cells in metaphase, large cells, condensed cells, cells with lamellipodia and cellular debris (Figure 1D and E). Each siRNA knockdown was summarized by a phenotypic profile, and differences between RNAi knockdowns were quantified by the similarity between phenotypic profiles. We termed the vector of scores a phenoprint (Figure 3C) and defined the phenotypic distance between a pair of perturbations as the distance between their corresponding phenoprints.
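The phenoprint comparison described above can be illustrated with a minimal sketch. The abstract does not state which distance metric was used, so the Euclidean distance here is an assumption, and the siRNA names and scores are hypothetical values chosen only for illustration.

```python
import math

def phenotypic_distance(a, b):
    """Euclidean distance between two equal-length phenoprints (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("phenoprints must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbours(query, phenoprints, k=2):
    """Return the k perturbations whose phenoprints lie closest to `query`."""
    others = [
        (phenotypic_distance(phenoprints[query], vector), name)
        for name, vector in phenoprints.items()
        if name != query
    ]
    return [name for _, name in sorted(others)[:k]]

# Hypothetical phenoprints: one score per phenotype class.
phenoprints = {
    "siRNA_A": [0.9, 0.1, 0.0],
    "siRNA_B": [0.8, 0.2, 0.1],
    "siRNA_C": [0.0, 0.1, 0.9],
}

closest = nearest_neighbours("siRNA_A", phenoprints, k=1)
```

Ranking perturbations by this distance is one simple way to read off a phenotypic neighbourhood for a gene of interest.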
To visualize the distribution of all phenoprints, we plotted them in a genome-wide map as a two-dimensional representation of the phenotypic similarity relationships (Figure 3A). The complete data set and an interactive version of the phenotypic map are available at http://www.cellmorph.org. The map identified phenotypic ‘neighbourhoods', which are characterized by cells with lamellipodia (WNK3, ANXA4), cells with prominent actin fibres (ODF2, SOD3), abundance of large cells (CA14), many elongated cells (SH2B2, ELMO2), decrease in cell number (TPX2, COPB1, COPA), increase in number of cells in metaphase (BLR1, CIB2) and combinations of phenotypes such as presence of large cells with protrusions and bright nuclei (PTPRZ1, RRM1; Figure 3B).
To test whether phenotypic similarity might serve as a predictor of gene function, we focused our further analysis on two clusters that contained genes associated with the DNA damage response (DDR) and genomic integrity (Figure 3A and C). The first phenotypic cluster included proteins with kinetochore-associated functions such as NUF2 (Figure 3B) and SGOL1. It also contained the centrosomal protein CEP164 that has been described as an important mediator of the DNA damage-activated signalling cascade (Sivasubramaniam et al, 2008) and the largely uncharacterized genes DONSON and SON. A second phenotypically distinct cluster included previously described components of the DDR pathway such as RRM1 (Figure 3A–C), CLSPN, PRIM2 and SETD8. Furthermore, this cluster contained the poorly characterized genes CADM1 and CD3EAP.
Cells activate a signalling cascade in response to DNA damage induced by exogenous and endogenous factors. Central are the kinases ATM and ATR as they serve as sensors of DNA damage and activators of further downstream kinases (Harper and Elledge, 2007; Cimprich and Cortez, 2008). To investigate whether DONSON, SON, CADM1 and CD3EAP, which were found in the phenotypic ‘neighbourhoods' of known DDR components, have a role in the DNA damage signalling pathway, we tested the effect of their depletion on the DDR upon γ irradiation. As indicated by reduced CHEK1 phosphorylation, siRNA knockdown of DONSON, SON, CD3EAP or CADM1 resulted in impaired DDR signalling upon γ irradiation. Furthermore, knockdown of DONSON or SON reduced phosphorylation of downstream effectors such as NBS1, CHEK1 and the histone variant H2AX upon UVC irradiation. DONSON depletion also impaired recruitment of RPA2 onto chromatin and SON knockdown reduced RPA2 phosphorylation, indicating that DONSON and SON presumably act downstream of the activation of ATM. In agreement with their phenotypic profile, these results suggest that DONSON, SON, CADM1 and CD3EAP are important mediators of the DDR. Further experiments demonstrated that they are also required for the maintenance of genomic integrity.
In summary, we show that genes with similar phenotypic profiles tend to share similar functions. The power of our computational and experimental approach is demonstrated by the identification of novel signalling regulators whose phenotypic profiles were found in proximity to known biological modules. Therefore, we believe that such phenotypic maps can serve as a resource for functional discovery and characterization of unknown genes. Furthermore, such approaches are also applicable for other perturbation reagents, such as small molecules in drug discovery and development. One could also envision combined maps that contain both siRNAs and small molecules to predict target–small molecule relationships and potential side effects.
Genetic screens for phenotypic similarity have made key contributions to associating genes with biological processes. With RNA interference (RNAi), highly parallel phenotyping of loss-of-function effects in cells has become feasible. One of the current challenges, however, is the computational categorization of visual phenotypes and the prediction of biological function and processes. In this study, we describe a combined computational and experimental approach to discover novel gene functions and explore functional relationships. We performed a genome-wide RNAi screen in human cells and used quantitative descriptors derived from high-throughput imaging to generate multiparametric phenotypic profiles. We show that these profiles predicted gene functions by phenotypic similarity. Specifically, we examined several candidates including the largely uncharacterized gene DONSON, which shared phenotypic similarity with known factors of DNA damage response (DDR) and genomic integrity. Experimental evidence supports that DONSON is a novel centrosomal protein required for DDR signalling and genomic integrity. Multiparametric phenotyping by automated imaging and computational annotation is a powerful method for functional discovery and mapping the landscape of phenotypic responses to cellular perturbations.
doi:10.1038/msb.2010.25
PMCID: PMC2913390  PMID: 20531400
DNA damage response signalling; massively parallel phenotyping; phenotype networks; RNAi screening
2.  Phenotype Recognition with Combined Features and Random Subspace Classifier Ensemble 
BMC Bioinformatics  2011;12:128.
Background
Automated, image-based high-content screening is a fundamental tool for discovery in biological science. Modern robotic fluorescence microscopes are able to capture thousands of images from massively parallel experiments such as RNA interference (RNAi) or small-molecule screens. As such, efficient computational methods capable of dealing with large image data sets are required for automatic cellular phenotype identification. In this paper we investigated an efficient method for the extraction of quantitative features from images by combining second-order statistics, or Haralick features, with the curvelet transform. A random subspace based classifier ensemble with a multilayer perceptron (MLP) as the base classifier was then exploited for classification. Haralick features estimate image properties related to second-order statistics based on the grey level co-occurrence matrix (GLCM), which has been extensively used for various image processing applications. The curvelet transform provides a sparser representation of the image than the wavelet transform, thus offering a description with higher time-frequency resolution and a high degree of directionality and anisotropy, which is particularly appropriate for images rich in edges and curves. A combined feature description from Haralick features and the curvelet transform can further increase classification accuracy by exploiting their complementary information. We then investigated the applicability of the random subspace (RS) ensemble method for phenotype classification based on microscopy images. Each base classifier is trained on an RS-sampled subset of the original feature set, and the ensemble assigns a class label by majority voting.
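The random subspace scheme described above (each base classifier sees a random subset of the features, and the ensemble votes) can be sketched as follows. This is a minimal illustration, not the paper's implementation: a nearest-centroid classifier is swapped in for the MLP base classifier to keep the example short, and the feature values, class labels and parameter defaults are hypothetical.

```python
import random
from collections import Counter

def train_centroids(X, y, features):
    """Fit a nearest-centroid classifier using only the selected feature indices."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(centroids, features, row):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def dist(centre):
        return sum((row[f] - centre[j]) ** 2 for j, f in enumerate(features))
    return min(centroids, key=lambda label: dist(centroids[label]))

def train_random_subspace(X, y, n_members=5, subspace_size=2, seed=0):
    """Train n_members base classifiers, each on a random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_members):
        features = rng.sample(range(n_features), subspace_size)
        ensemble.append((features, train_centroids(X, y, features)))
    return ensemble

def predict_ensemble(ensemble, row):
    """Assign the class label that receives the most votes."""
    votes = Counter(predict_centroid(c, f, row) for f, c in ensemble)
    return votes.most_common(1)[0][0]

# Hypothetical four-feature descriptors for two phenotype classes.
X = [[0.1, 0.2, 0.1, 0.0], [0.2, 0.1, 0.0, 0.1],
     [0.9, 0.8, 1.0, 0.9], [1.0, 0.9, 0.8, 1.0]]
y = ["normal", "normal", "metaphase", "metaphase"]
ensemble = train_random_subspace(X, y)
```

The two parameters exposed here, `subspace_size` and `n_members`, correspond to the feature-subset dimensionality and ensemble size whose estimation the abstract discusses.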
Results
Experimental results on phenotype recognition from three benchmark image sets, HeLa, CHO and RNAi, show the effectiveness of the proposed approach. The combined feature gives better classification accuracy than either individual feature. The ensemble model produces better classification performance than its component neural networks. For the three image sets HeLa, CHO and RNAi, the random subspace ensemble achieves classification rates of 91.20%, 98.86% and 91.03% respectively, compared with the published results of 84%, 93% and 82% from the multi-purpose image classifier WND-CHARM, which applies wavelet transforms and other feature extraction methods. We investigated the estimation of ensemble parameters and found that a satisfactory performance improvement could be obtained with feature subsets of relatively medium dimensionality and a small ensemble size.
Conclusions
The multiscale and multidirectional character of the curvelet transform suits the description of microscopy images very well. It is empirically demonstrated that the curvelet-based feature is clearly preferable to the wavelet-based feature for bioimage descriptions. The random subspace ensemble of MLPs performs much better than a number of commonly applied multi-class classifiers in the investigated application of phenotype recognition.
doi:10.1186/1471-2105-12-128
PMCID: PMC3098787  PMID: 21529372
3.  A novel phenotypic dissimilarity method for image-based high-throughput screens 
BMC Bioinformatics  2013;14:336.
Background
Discovering functional relationships of genes through cell-based phenotyping has become an important approach in functional genomics. High-throughput imaging offers the ability to quantitatively assess complex phenotypes after perturbation by RNA interference (RNAi). Such image-based high-throughput RNAi screening studies have facilitated the discovery of novel components of gene networks and their interactions. Images generated by automated microscopy are typically analyzed by extracting quantitative features of individual cells, resulting in large multidimensional data sets. Robust and sensitive methods to interpret these data sets and to derive biologically relevant information in a high-throughput and unbiased manner remain to be developed.
Results
Here we propose a new analysis method, PhenoDissim, which computes the phenotypic dissimilarity between cell populations via Support Vector Machine classification and cross validation. Applying this method to a kinome RNAi screening data set, we demonstrate that the proposed method shows a good replicate reproducibility, separation of controls and clustering quality, and we are able to identify siRNA phenotypes and discover potential functional links between genes.
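The idea of measuring dissimilarity through classification can be sketched as follows: if a classifier can separate two cell populations well under cross-validation, the populations are phenotypically dissimilar. This is only an illustration under stated assumptions. A nearest-centroid classifier is swapped in for the Support Vector Machine, and the rescaling of accuracy to a score between 0 and 1 is a hypothetical choice, not the formula used by PhenoDissim. All feature values are invented.

```python
def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cv_accuracy(pop_a, pop_b, k=3):
    """k-fold cross-validated accuracy of separating two cell populations."""
    data = [(row, 0) for row in pop_a] + [(row, 1) for row in pop_b]
    correct = 0
    for fold in range(k):
        test = data[fold::k]  # every k-th cell, starting at `fold`
        train = [d for i, d in enumerate(data) if i % k != fold]
        c0 = centroid([r for r, lab in train if lab == 0])
        c1 = centroid([r for r, lab in train if lab == 1])
        for row, lab in test:
            pred = 0 if sq_dist(row, c0) <= sq_dist(row, c1) else 1
            correct += pred == lab
    return correct / len(data)

def dissimilarity(pop_a, pop_b, k=3):
    """Map accuracy 0.5 (chance) to 0 and accuracy 1.0 to 1 (assumed rescaling)."""
    return max(0.0, 2 * cv_accuracy(pop_a, pop_b, k) - 1)

# Hypothetical per-cell features for a control and a treated population.
control = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.2, 1.0], [0.8, 0.9]]
treated = [[4.0, 4.0], [4.1, 3.9], [3.9, 4.1], [4.0, 4.2], [4.2, 4.0], [3.8, 3.9]]
```

Comparing a population with itself yields chance-level accuracy and therefore a dissimilarity of zero, while well-separated populations approach one.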
Conclusions
PhenoDissim is a novel analysis method for image-based high-throughput screens, relying on two parameters which can be automatically optimized without a priori knowledge. PhenoDissim is freely available as an R package.
doi:10.1186/1471-2105-14-336
PMCID: PMC4225524  PMID: 24256072
Phenotypic dissimilarity; Image-based high-throughput screening; High-content screening; RNAi; Gene networks
4.  Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens 
BMC Bioinformatics  2008;9:264.
Background
The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has been dependent upon the expert opinions of well-trained biologists. Such qualitative analysis is particularly effective in detecting subtle, but important, deviations in phenotypes. However, while the rapid and continuing development of automated microscope-based technologies now facilitates the acquisition of trillions of cells in thousands of diverse experimental conditions, such as in the context of RNA interference (RNAi) or small-molecule screens, the massive size of these datasets precludes human analysis. Thus, the development of automated methods which aim to identify novel and biologically relevant phenotypes online is one of the major challenges in high-throughput image-based screening. Ideally, phenotype discovery methods should be designed to utilize prior/existing information and tackle three challenging tasks, i.e. restoring pre-defined biologically meaningful phenotypes, differentiating novel phenotypes from known ones and distinguishing novel phenotypes from each other. Arbitrarily extracted information causes biased analysis, while combining the complete existing datasets with each new image is intractable in high-throughput screens.
Results
Here we present the design and implementation of a novel and robust online phenotype discovery method with broad applicability that can be used in diverse experimental contexts, especially high-throughput RNAi screens. This method features phenotype modelling and iterative cluster merging using improved gap statistics. A Gaussian Mixture Model (GMM) is employed to estimate the distribution of each existing phenotype, and is then used as the reference distribution in gap statistics. This method is broadly applicable to a number of different types of image-based datasets derived from a wide spectrum of experimental conditions and is suitable for adaptively processing new images which are continuously added to existing datasets. Validations were carried out on different datasets, including a published RNAi screen using Drosophila embryos [Additional files 1, 2], a dataset for cell cycle phase identification using HeLa cells [Additional files 1, 3, 4] and a synthetic dataset using polygons. Our method tackled the three aforementioned tasks effectively, with an accuracy range of 85%–90%. When our method is implemented in the context of a Drosophila genome-scale RNAi image-based screen of cultured cells aimed at identifying the contribution of individual genes towards the regulation of cell shape, it efficiently discovers meaningful new phenotypes and provides novel biological insight. We also propose a two-step procedure to modify the novelty detection method based on one-class SVM, so that it can be used for online phenotype discovery. Under different conditions, we compared the SVM-based method with our method using various datasets, and our method consistently outperformed the SVM-based method in at least two of the three tasks by 2% to 5%. These results demonstrate that our method can be used to better identify novel phenotypes in image-based datasets from a wide range of conditions and organisms.
Conclusion
We demonstrate that our method can detect various novel phenotypes effectively in complex datasets. Experimental results also validate that our method performs consistently under different orders of image input, variation of starting conditions including the number and composition of existing phenotypes, and datasets from different screens. Based on our findings, the proposed method is suitable for online phenotype discovery in diverse high-throughput image-based genetic and chemical screens.
doi:10.1186/1471-2105-9-264
PMCID: PMC2443381  PMID: 18534020
5.  Automatic Annotation of Spatial Expression Patterns via Sparse Bayesian Factor Models 
PLoS Computational Biology  2011;7(7):e1002098.
Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D–4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Developing computational methods to compare, annotate, and model gene expression based on images is imperative, considering that available data are rapidly increasing. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity among a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach to embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide for a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach reaches similar or better classification of expression patterns at different developmental stages, when compared to other automatic image annotation methods using thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself, as well as its application for analysis tasks such as automated annotation, can provide insight into biological questions.
Author Summary
High throughput image acquisition is a quickly increasing new source of data for problems in computational biology, such as phenotypic screens. Given the very diverse nature of imaging technology, samples, and biological questions, approaches are oftentimes very tailored and ad hoc to a specific data set. In particular, the image-based genome scale profiling of gene expression patterns via approaches like in situ hybridization requires the development of accurate and automatic image analysis systems for understanding regulatory networks and development of multicellular organisms. Here, we present a computational method for automated annotation of Drosophila gene expression images. This framework allows us to extract, identify and compare spatial expression patterns, of essence for higher organisms. Based on a sparse feature extraction technique, we successfully cluster and annotate expression patterns with high reliability, and show that the model represents a “vocabulary” of basic patterns reflecting common function or regulation.
doi:10.1371/journal.pcbi.1002098
PMCID: PMC3140966  PMID: 21814502
6.  Automated identification of pathways from quantitative genetic interaction data 
We present a novel Bayesian learning method that reconstructs large detailed gene networks from quantitative genetic interaction (GI) data. The method uses global reasoning to handle missing and ambiguous measurements, and provides confidence estimates for each prediction. Applied to a recent data set over genes relevant to protein folding, the learned networks reflect known biological pathways, including details such as pathway ordering and directionality of relationships. The reconstructed networks also suggest novel relationships, including the placement of SGT2 in the tail-anchored biogenesis pathway, a finding that we experimentally validated.
Recent developments have enabled large-scale quantitative measurement of genetic interactions (GIs) that report on the extent to which the activity of one gene is dependent on a second. It has long been recognized (Avery and Wasserman, 1992; Hartman et al, 2001; Segre et al, 2004; Tong et al, 2004; Drees et al, 2005; Schuldiner et al, 2005; St Onge et al, 2007; Costanzo et al, 2010) that functional dependencies revealed by GI data can provide rich information regarding underlying biological pathways. Further, the precise phenotypic measurements provided by quantitative GI data can provide evidence for even more detailed aspects of pathway structure, such as differentiating between full and partial dependence between two genes (Drees et al, 2005; Schuldiner et al, 2005; St Onge et al, 2007; Jonikas et al, 2009) (Figure 1A). As GI data sets become available for a range of quantitative phenotypes and organisms, such patterns will allow researchers to elucidate pathways important to a diverse set of biological processes.
We present a new method that exploits the high-quality, quantitative nature of recent GI assays to automatically reconstruct detailed multi-gene pathway structures, including the organization of a large set of genes into coherent pathways, the connectivity and ordering within each pathway, and the directionality of each relationship. We introduce activity pathway networks (APNs), which represent functional dependencies among a set of genes in the form of a network. We present an automatic method to efficiently reconstruct APNs over large sets of genes based on quantitative GI measurements. This method handles uncertainty in the data arising from noise, missing measurements, and data points with ambiguous interpretations, by performing global reasoning that combines evidence from multiple data points. In addition, because some structure choices remain uncertain even when jointly considering all measurements, our method maintains multiple likely networks, and allows computation of confidence estimates over each structure choice.
We applied our APN reconstruction method to the recent high-quality GI data set of Jonikas et al (2009), which examined the functional interaction between genes that contribute to protein folding in the ER. Specifically, Jonikas et al used the cell's endogenous sensor (the unfolded protein response), to first identify several hundred yeast genes with functions in endoplasmic reticulum folding and then systematically characterized their functional interdependencies by measuring unfolded protein response levels in double mutants. Our analysis produced an ensemble of 500 likelihood-weighted APNs over 178 genes (Figure 2).
We performed an aggregate evaluation of our results by comparing to known biological relationships between gene pairs, including participation in pathways according to the Kyoto Encyclopedia of Genes and Genomes (KEGG), correlation of chemical genomic profiles in a recent high-throughput assay (Hillenmeyer et al, 2008) and similarity of Gene Ontology (GO) annotations. In each evaluation performed, our reconstructed APNs were significantly more consistent with the known relationships than either the raw GI values or the Pearson correlation between profiles of GI values.
Importantly, our approach provides not only an improved means for defining pairs or groups of related genes, but also enables the identification of detailed multi-gene network structures. In many cases, our method successfully reconstructed known cellular pathways, including the ER-associated degradation (ERAD) pathway, and the biosynthesis of N-linked glycans, ranking them among the highest confidence structures. In-depth examination of the learned network structures indicates agreement with many known details of these pathways. In addition, quantitative analysis indicates that our learned APNs are indicative of ordering within KEGG-annotated biological pathways.
Our results also suggest several novel relationships, including placement of uncharacterized genes into pathways, and novel relationships between characterized genes. These include the dependence of the J domain chaperone JEM1 on the PDI homolog MPD1, dependence of the Ubiquitin-recycling enzyme DOA4 on N-linked glycosylation, and the dependence of the E3 Ubiquitin ligase DOA10 on the signal peptidase complex subunit SPC2. Our APNs also place the poorly characterized TPR-containing protein SGT2 upstream of the tail-anchored protein biogenesis machinery components GET3, GET4, and MDY2 (also known as GET5), suggesting that SGT2 has a function in the insertion of tail-anchored proteins into membranes. Consistent with this prediction, our experimental analysis shows that sgt2Δ cells show a defect in localization of the tail-anchored protein GFP-Sed5 from punctate Golgi structures to a more diffuse pattern, as seen in other genes involved in this pathway.
Our results show that multi-gene, detailed pathway networks can be reconstructed from quantitative GI data, providing a concrete computational manifestation to intuitions that have traditionally accompanied the manual interpretation of such data. Ongoing technological developments in both genetics and imaging are enabling the measurement of GI data at a genome-wide scale, using high-accuracy quantitative phenotypes that relate to a range of particular biological functions. Methods based on RNAi will soon allow collection of similar data for human cell lines and other mammalian systems (Moffat et al, 2006). Thus, computational methods for analyzing GI data could have an important function in mapping pathways involved in complex biological systems including human cells.
High-throughput quantitative genetic interaction (GI) measurements provide detailed information regarding the structure of the underlying biological pathways by reporting on functional dependencies between genes. However, the analytical tools for fully exploiting such information lag behind the ability to collect these data. We present a novel Bayesian learning method that uses quantitative phenotypes of double knockout organisms to automatically reconstruct detailed pathway structures. We applied our method to a recent data set that measures GIs for endoplasmic reticulum (ER) genes, using the unfolded protein response as a quantitative phenotype. The results provided reconstructions of known functional pathways including N-linked glycosylation and ER-associated protein degradation. It also contained novel relationships, such as the placement of SGT2 in the tail-anchored biogenesis pathway, a finding that we experimentally validated. Our approach should be readily applicable to the next generation of quantitative GI data sets, as assays become available for additional phenotypes and eventually higher-level organisms.
doi:10.1038/msb.2010.27
PMCID: PMC2913392  PMID: 20531408
computational biology; genetic interaction; pathway reconstruction; probabilistic methods
7.  Unsupervised automated high throughput phenotyping of RNAi time-lapse movies 
BMC Bioinformatics  2013;14:292.
Background
Gene perturbation experiments in combination with fluorescence time-lapse cell imaging are a powerful tool in reverse genetics. High content applications require tools for the automated processing of the large amounts of data. These tools include in general several image processing steps, the extraction of morphological descriptors, and the grouping of cells into phenotype classes according to their descriptors. This phenotyping can be applied in a supervised or an unsupervised manner. Unsupervised methods are suitable for the discovery of formerly unknown phenotypes, which are expected to occur in high-throughput RNAi time-lapse screens.
Results
We developed an unsupervised phenotyping approach based on Hidden Markov Models (HMMs) with multivariate Gaussian emissions for the detection of knockdown-specific phenotypes in RNAi time-lapse movies. The automated detection of abnormal cell morphologies allows us to assign a phenotypic fingerprint to each gene knockdown. By applying our method to the Mitocheck database, we show that a phenotypic fingerprint is indicative of a gene’s function.
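To make the role of an HMM with Gaussian emissions more concrete, the sketch below decodes the most likely sequence of morphology states for one cell track using the Viterbi algorithm. It is a simplified illustration: the emissions are univariate rather than multivariate, the parameters are fixed by hand rather than learned without supervision, and the state names, feature and numbers are all hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(obs, states, log_start, log_trans, emissions):
    """Most likely state path for an HMM with 1-D Gaussian emissions."""
    score = {s: log_start[s] + log_gauss(obs[0], *emissions[s]) for s in states}
    back = []
    for x in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            # Best predecessor state for arriving in state s at this time step.
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            score[s] = prev[best] + log_trans[best][s] + log_gauss(x, *emissions[s])
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical two-state model over a single morphological descriptor.
states = ["interphase", "mitotic"]
log_start = {s: math.log(0.5) for s in states}
log_trans = {
    "interphase": {"interphase": math.log(0.8), "mitotic": math.log(0.2)},
    "mitotic": {"interphase": math.log(0.2), "mitotic": math.log(0.8)},
}
emissions = {"interphase": (1.0, 0.1), "mitotic": (3.0, 0.1)}  # (mean, variance)
track = [1.0, 1.1, 2.9, 3.1, 1.0]
path = viterbi(track, states, log_start, log_trans, emissions)
```

In a fully unsupervised setting the start, transition and emission parameters would themselves be estimated from the movies, for example with the Baum-Welch algorithm, before decoding.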
Conclusion
Our fully unsupervised HMM-based phenotyping is able to automatically identify cell morphologies that are specific for a certain knockdown. Beyond the identification of genes whose knockdown affects cell morphology, phenotypic fingerprints can be used to find modules of functionally related genes.
doi:10.1186/1471-2105-14-292
PMCID: PMC3851277  PMID: 24090185
8.  Identification of Neural Outgrowth Genes using Genome-Wide RNAi 
PLoS Genetics  2008;4(7):e1000111.
While genetic screens have identified many genes essential for neurite outgrowth, they have been limited in their ability to identify neural genes that also have earlier critical roles in the gastrula, or neural genes for which maternally contributed RNA compensates for gene mutations in the zygote. To address this, we developed methods to screen the Drosophila genome using RNA-interference (RNAi) on primary neural cells and present the results of the first full-genome RNAi screen in neurons. We used live-cell imaging and quantitative image analysis to characterize the morphological phenotypes of fluorescently labelled primary neurons and glia in response to RNAi-mediated gene knockdown. From the full genome screen, we focused our analysis on 104 evolutionarily conserved genes that when downregulated by RNAi, have morphological defects such as reduced axon extension, excessive branching, loss of fasciculation, and blebbing. To assist in the phenotypic analysis of the large data sets, we generated image analysis algorithms that could assess the statistical significance of the mutant phenotypes. The algorithms were essential for the analysis of the thousands of images generated by the screening process and will become a valuable tool for future genome-wide screens in primary neurons. Our analysis revealed unexpected, essential roles in neurite outgrowth for genes representing a wide range of functional categories including signalling molecules, enzymes, channels, receptors, and cytoskeletal proteins. We also found that genes known to be involved in protein and vesicle trafficking showed similar RNAi phenotypes. We confirmed phenotypes of the protein trafficking genes Sec61alpha and Ran GTPase using Drosophila embryo and mouse embryonic cerebral cortical neurons, respectively. Collectively, our results showed that RNAi phenotypes in primary neural culture can parallel in vivo phenotypes, and the screening technique can be used to identify many new genes that have important functions in the nervous system.
Author Summary
Development and function of the brain require the coordinated action of thousands of genes, and currently we understand the roles of only a small fraction of them. Recent advances in genomics, such as the sequencing of entire genomes and the discovery of RNA-interference as a means of testing the effects of gene loss, have opened up the possibility to systematically analyze the function of all known and predicted genes in an organism. Until now, this type of functional genomics approach has not been applied to the study of very complex cells, such as the brain's neurons, on a full-genome scale. In this work, we developed techniques to test all genes, one by one in a rapid manner, for their potential role in neuronal development using neurons isolated from fruit fly embryos. These results yielded a global perspective of what types of genes are necessary for brain development; importantly, they show that a large variety of genes can be studied in this way.
doi:10.1371/journal.pgen.1000111
PMCID: PMC2435276  PMID: 18604272
9.  Characterizing Protein Interactions Employing a Genome-Wide siRNA Cellular Phenotyping Screen 
PLoS Computational Biology  2014;10(9):e1003814.
Characterizing the activating and inhibiting effects of protein-protein interactions (PPIs) is fundamental to gaining insight into the complex signaling system of a human cell. A plethora of methods has been suggested to infer PPIs from data on a large scale, but none of them is able to characterize the effect of these interactions. Here, we present a novel computational development that employs mitotic phenotypes of a genome-wide RNAi knockdown screen and enables identifying the activating and inhibiting effects of PPIs. As an example, we applied our technique to a knockdown screen of HeLa cells cultivated under standard conditions. Using a machine learning approach, we obtained high accuracy (82% AUC of the receiver operating characteristic) by cross-validation using 6,870 known activating and inhibiting PPIs as the gold standard. We predicted de novo unknown activating and inhibiting effects for 1,954 PPIs in HeLa cells covering the ten major signaling pathways of the Kyoto Encyclopedia of Genes and Genomes, and made these predictions publicly available in a database. We finally demonstrate that the predicted effects can be used to cluster knockdown genes of similar biological processes into coherent subgroups. The characterization of the activating or inhibiting effect of individual PPIs opens up new perspectives for the interpretation of large datasets of PPIs and thus considerably increases the value of PPIs as an integrated resource for studying the detailed function of signaling pathways of the cellular system of interest.
Author Summary
Mathematical models that aim to describe cellular signaling start by constructing an interaction network of effectors, mediators and their affected target proteins. Several developments have made it easier to assemble these links. Besides the tedious assembly of knowledge from textbooks and research articles, experimental high-throughput methods such as yeast two-hybrid assays or fluorescence resonance energy transfer have been established. However, these methods do not elucidate the effect of such interactions. We aimed to infer whether an interaction in a specific cellular context is activating or inhibiting. We used cellular phenotypes of a genome-wide RNAi knockdown screen of live cells to identify such activating and inhibiting effects of protein interactions. The rationale is that an activating protein interaction should lead to similar phenotypes when the respective genes are knocked down, whereas an inhibiting protein interaction should lead to dissimilar phenotypes. As an example, we applied our method to a phenotype screen of perturbed HeLa cells. Our predictions effectively reproduced textbook relationships between proteins or domains when comparing the predicted effects with pairs of effectors, receptors, kinases, phosphatases and of general signalling modules. The presented computational approach is generic and may also enable elucidating the effects of interactions in other cellular systems under more specific conditions.
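The rationale described above can be illustrated with a minimal sketch. It assumes that each knockdown is summarized as a numeric phenotype vector, uses Pearson correlation as the similarity measure, and treats strong anti-correlation as "dissimilar". The function names, the threshold and the example vectors are hypothetical; this is not the machine learning method used in the study.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long phenotype vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("phenotype vector has zero variance")
    return cov / math.sqrt(var_a * var_b)


def guess_effect(pheno_a, pheno_b, threshold=0.5):
    """Label an interaction from the knockdown phenotypes of its two genes.

    Assumption: similar phenotypes (high correlation) hint at an activating
    interaction, dissimilar ones (strong anti-correlation) at an inhibiting
    one. The threshold is arbitrary and purely illustrative.
    """
    r = pearson(pheno_a, pheno_b)
    if r >= threshold:
        return "activating"
    if r <= -threshold:
        return "inhibiting"
    return "undetermined"


# Hypothetical phenotype vectors for two knockdowns
print(guess_effect([1, 2, 3, 4], [2, 4, 6, 8]))
print(guess_effect([1, 2, 3, 4], [8, 6, 4, 2]))
```

A real analysis would need many phenotype features per knockdown and a classifier trained on known activating and inhibiting pairs, as the abstract describes.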
doi:10.1371/journal.pcbi.1003814
PMCID: PMC4178005  PMID: 25255318
10.  Quantitative and Automated High-throughput Genome-wide RNAi Screens in C. elegans 
RNA interference is a powerful method to understand gene function, especially when conducted at a whole-genome scale and in a quantitative context. In C. elegans, gene function can be knocked down simply and efficiently by feeding worms with bacteria expressing a dsRNA corresponding to a specific gene 1. While the creation of libraries of RNAi clones covering most of the C. elegans genome 2,3 opened the way for true functional genomic studies (see for example 4-7), most established methods are laborious. Moy and colleagues have developed semi-automated protocols that facilitate genome-wide screens 8. The approach relies on microscopic imaging and image analysis.
Here we describe an alternative protocol for a high-throughput genome-wide screen, based on robotic handling of bacterial RNAi clones, quantitative analysis using the COPAS Biosort (Union Biometrica (UBI)), and integrated software, the MBioLIMS (Laboratory Information Management System from Modul-Bio), a technology that provides increased throughput for data management and sample tracking. The method allows screens to be conducted on solid medium plates. This is particularly important for some studies, such as those addressing host-pathogen interactions in C. elegans, since certain microbes do not efficiently infect worms in liquid culture.
We show how the method can be used to quantify the importance of genes in anti-fungal innate immunity in C. elegans. In this case, the approach relies on the use of a transgenic strain carrying an epidermal infection-inducible fluorescent reporter gene, with GFP under the control of the promoter of the antimicrobial peptide gene nlp-29, and a red fluorescent reporter that is expressed constitutively in the epidermis. The latter provides an internal control for the functional integrity of the epidermis and nonspecific transgene silencing 9. When control worms are infected by the fungus, they fluoresce green. Knocking down by RNAi a gene required for nlp-29 expression results in diminished fluorescence after infection. Currently, this protocol allows more than 3,000 RNAi clones to be tested and analyzed per week, opening the possibility of screening the entire genome in less than 2 months.
doi:10.3791/3448
PMCID: PMC3399495  PMID: 22395785
Molecular Biology;  Issue 60;  C. elegans;  fluorescent reporter;  Biosort;  LIMS;  innate immunity;  Drechmeria coniospora
11.  Integration of biological data by kernels on graph nodes allows prediction of new genes involved in mitotic chromosome condensation 
Molecular Biology of the Cell  2014;25(16):2522-2536.
A gene function prediction method suitable for the design of targeted RNAi libraries is described and used to predict chromosome condensation genes. Systematic experimental validation of candidate genes in a focused RNAi screen by automated microscopy and quantitative image analysis reveals many new chromosome condensation factors.
The advent of genome-wide RNA interference (RNAi)–based screens puts us in the position to identify genes for all functions human cells carry out. However, for many functions, assay complexity and cost make genome-scale knockdown experiments impossible. Methods to predict genes required for cell functions are therefore needed to focus RNAi screens from the whole genome on the most likely candidates. Although different bioinformatics tools for gene function prediction exist, they lack experimental validation and are therefore rarely used by experimentalists. To address this, we developed an effective computational gene selection strategy that represents public data about genes as graphs and then analyzes these graphs using kernels on graph nodes to predict functional relationships. To demonstrate its performance, we predicted human genes required for a poorly understood cellular function—mitotic chromosome condensation—and experimentally validated the top 100 candidates with a focused RNAi screen by automated microscopy. Quantitative analysis of the images demonstrated that the candidates were indeed strongly enriched in condensation genes, including the discovery of several new factors. By combining bioinformatics prediction with experimental validation, our study shows that kernels on graph nodes are powerful tools to integrate public biological data and predict genes involved in cellular functions of interest.
doi:10.1091/mbc.E13-04-0221
PMCID: PMC4142622  PMID: 24943848
12.  Exploring systemic RNA interference in insects: a genome-wide survey for RNAi genes in Tribolium 
Genome Biology  2008;9(1):R10.
Tribolium resembles C. elegans in showing a robust systemic RNAi response, but does not have C. elegans-type RNAi mechanisms; insect systemic RNAi probably uses a different mechanism.
Background
RNA interference (RNAi) is a highly conserved cellular mechanism. In some organisms, such as Caenorhabditis elegans, the RNAi response can be transmitted systemically. Some insects also exhibit a systemic RNAi response. However, Drosophila, the leading insect model organism, does not show a robust systemic RNAi response, necessitating another model system to study the molecular mechanism of systemic RNAi in insects.
Results
We used Tribolium, which exhibits robust systemic RNAi, as an alternative model system. We have identified the core RNAi genes, as well as genes potentially involved in systemic RNAi, from the Tribolium genome. Both phylogenetic and functional analyses suggest that Tribolium has a somewhat larger inventory of core component genes than Drosophila, perhaps allowing a more sensitive response to double-stranded RNA (dsRNA). We also identified three Tribolium homologs of C. elegans sid-1, which encodes a possible dsRNA channel. However, detailed sequence analysis has revealed that these Tribolium homologs share more identity with another C. elegans gene, tag-130. We analyzed tag-130 mutants, and found that this gene does not have a function in systemic RNAi in C. elegans. Likewise, the Tribolium sid-like genes do not seem to be required for systemic RNAi. These results suggest that insect sid-1-like genes have a different function than dsRNA uptake. Moreover, Tribolium lacks homologs of several genes important for RNAi in C. elegans.
Conclusion
Although both Tribolium and C. elegans show a robust systemic RNAi response, our genome-wide survey reveals significant differences between the RNAi mechanisms of these organisms. Thus, insects may use an alternative mechanism for the systemic RNAi response. Understanding this process would assist with rendering other insects amenable to systemic RNAi, and may influence pest control approaches.
doi:10.1186/gb-2008-9-1-r10
PMCID: PMC2395250  PMID: 18201385
13.  RNAiDB and PhenoBlast: web tools for genome-wide phenotypic mapping projects 
Nucleic Acids Research  2004;32(Database issue):D406-D410.
RNA interference (RNAi) is being used in large-scale genomic studies as a rapid way to obtain in vivo functional information associated with specific genes. How best to archive and mine the complex data derived from these studies provides a series of challenges associated with both the methods used to elicit the RNAi response and the functional data gathered. RNAiDB (RNAi Database; http://www.rnai.org) has been created for the archival, distribution and analysis of phenotypic data from large-scale RNAi analyses in Caenorhabditis elegans. The database contains a compendium of publicly available data and provides information on experimental methods and phenotypic results, including raw data in the form of images and streaming time-lapse movies. Phenotypic summaries together with graphical displays of RNAi to gene mappings allow quick intuitive comparison of results from different RNAi assays and visualization of the gene product(s) potentially inhibited by each RNAi experiment based on multiple sequence analysis methods. RNAiDB can be searched using combinatorial queries and using the novel tool PhenoBlast, which ranks genes according to their overall phenotypic similarity. RNAiDB could serve as a model database for distributing and navigating in vivo functional information from large-scale systematic phenotypic analyses in different organisms.
doi:10.1093/nar/gkh110
PMCID: PMC308844  PMID: 14681444
14.  RNAi screen of Salmonella invasion shows role of COPI in membrane targeting of cholesterol and Cdc42 
A genome wide RNAi screen identifies 72 host cell genes affecting S. Typhimurium entry, including actin regulators and COPI. This study implicates COPI-dependent cholesterol and sphingolipid localization as a common mechanism of infection by bacterial and viral pathogens.
Genome-scale RNAi screen identifies 72 host genes affecting S. Typhimurium host cell invasion. Step-specific follow-up assays assign the phenotypes to specific steps of the invasion process. COPI effects on host cell binding, ruffling and invasion were traced to a key role of COPI in membrane targeting of cholesterol, sphingolipids, Rac1 and Cdc42. This new role of COPI explains why COPI is required for host cell infection by numerous bacterial and viral pathogens.
Pathogens are not only a menace to public health, but they also provide excellent tools for probing host cell function. Thus, studying infection mechanisms has fueled progress in cell biology (Ridley et al, 1992; Welch et al, 1997). In the presented study, we have performed an RNAi screen to identify host cell genes required for Salmonella host cell invasion. This screen identified proteins known to contribute to Salmonella-induced actin rearrangements (e.g., Cdc42 and the Arp2/3 complex; reviewed in Schlumberger and Hardt, 2006) and vesicular traffic (e.g., Rab7) as well as unexpected hits, such as the COPI complex. COPI is a known organizer of Golgi-to-ER vesicle transport (Bethune et al, 2006; Beck et al, 2009). Here, we show that COPI is also involved in plasma membrane targeting of cholesterol, sphingolipids and the Rho GTPases Cdc42 and Rac1, essential host cell factors required for Salmonella invasion. This explains why COPI depletion inhibits infection by S. Typhimurium and illustrates how combining bacterial pathogenesis and systems approaches can promote cell biology.
Salmonella Typhimurium is a common food-borne pathogen and worldwide a major public health problem causing severe diarrhea. The pathogen uses the host's gut mucosa as a portal of entry and gut tissue invasion is a key event leading to the disease. This explains the intense interest from medicine and basic biology in the mechanism of Salmonella host cell invasion.
Tissue culture infection models have delineated a sequence of events leading to host cell invasion (Figure 1; Schlumberger and Hardt, 2006): (i) pathogen binding to the host cell surface; (ii) activation of a syringe-like apparatus (‘Type III secretion system 1', T1) of the bacterium and injection of a bacterial toxin cocktail into the host cell. These toxins include SopE, a key virulence factor triggering invasion (Hardt et al, 1998), which was analyzed in our study; (iii) toxin-triggered membrane ruffling. To a significant extent, this is facilitated by SopE-triggered activation of Cdc42 and Rac1 and subsequent actin polymerization at the site of infection; (iv) engulfment of the pathogen within a vesicular compartment (SCV) and (v) maturation of the SCV, a process driven by a second Type III secretion system (T2), which is expressed by the pathogen upon bacterial entry (Figure 1). This sequence of events mediates Salmonella invasion into the gut epithelium and illustrates that this pathogen can be used for probing mechanisms of host cell actin control, membrane biogenesis, vesicle formation and vesicular trafficking.
SopE is a key virulence factor of invasion and triggers the activation of Cdc42 and Rac1 and subsequent actin polymerization at the site of infection. We have employed a SopE-expressing S. Typhimurium strain and RNAi screening technology to identify host cell factors affecting invasion. First, we developed an automated fluorescence microscopy assay to quantify S. Typhimurium entry in a high-throughput format (Figure 1C). This assay was based on a GFP reporter expressed by the pathogen after invasion and maturation of the SCV. Using this assay, we screened a ‘druggable genome' siRNA library (6978 genes, 3 oligos each, 1 oligo per well) and identified 72 invasion hits. These included established regulators of the actin cytoskeleton (Cdc42, Arp2/3, Nap1; Schlumberger and Hardt, 2006), some of which have not been implicated so far in Salmonella entry (Pfn1, Cap1), as well as proteins not previously thought to influence infection (Atp1a1, Rbx1, COPI complex). Potentially, these hits could affect any step of the invasion process (Figure 1A).
In the second stage of the study, we assigned each ‘invasion hit' to particular steps of the invasion process. For this purpose, we developed step-specific assays for Salmonella binding, injection, ruffling and membrane engulfment and re-screened the genes found as hits in the first screen (four siRNAs per gene). As expected, a significant number of ‘hits' affected binding to the host cell, others affected binding and ruffling (e.g., Pfn1, Itgβ5, Cap1), a few were specific for the ruffling step (e.g., Cdc42) and some affected SCV maturation, namely Rab7a, the trafficking protein Vps39 and the vacuolar proton pump Atp6ap2. Thus, our experimental strategy allowed mechanistic interpretation and linked novel hits to particular phenotypes, providing a basis for further studies (Figure 1).
COPI depletion impaired effector injection and ruffling. This was surprising, as the COPI complex was known to regulate retrograde Golgi-to-ER transport, but was not expected to affect pathogen interactions at the plasma membrane. Therefore, we investigated the underlying mechanism. We observed that COPI depletion entailed dramatic changes in the plasma membrane composition (Figure 6). Cholesterol and sphingolipids, which form domains (‘lipid rafts') in the plasma membrane, were depleted from the cell surface and redirected into a large vesicular compartment. The same was true for the Rho GTPases Rac1 and Cdc42. This strong decrease in the amount of cholesterol-enriched microdomains and Rho GTPases in the plasma membrane explained the observed defects in S. Typhimurium host cell invasion and assigned a novel role for COPI in controlling mammalian plasma membrane composition. It should be noted that other viral and bacterial pathogens show a similar dependency on host cellular COPI and plasma membrane lipids. This includes notorious pathogens such as Staphylococcus aureus (Ramet et al, 2002; Potrich et al, 2009), Listeria monocytogenes (Seveau et al, 2004; Agaisse et al, 2005; Cheng et al, 2005; Gekara et al, 2005), Mycobacterium tuberculosis (Munoz et al, 2009), Chlamydia trachomatis (Elwell et al, 2008), influenza virus (Hao et al, 2008; Konig et al, 2010), hepatitis C virus (Tai et al, 2009; Popescu and Dubuisson, 2010) and the vesicular stomatitis virus (presented study) and suggests that COPI-mediated control of host cell plasma membrane composition might be of broad importance for pathogenesis. Future work will have to address whether this might offer starting points for developing anti-infective therapeutics with a very broad spectrum of activity.
The pathogen Salmonella Typhimurium is a common cause of diarrhea and invades the gut tissue by injecting a cocktail of virulence factors into epithelial cells, triggering actin rearrangements, membrane ruffling and pathogen entry. One of these factors is SopE, a G-nucleotide exchange factor for the host cellular Rho GTPases Rac1 and Cdc42. How SopE mediates cellular invasion is incompletely understood. Using genome-scale RNAi screening we identified 72 known and novel host cell proteins affecting SopE-mediated entry. Follow-up assays assigned these ‘hits' to particular steps of the invasion process; i.e., binding, effector injection, membrane ruffling, membrane closure and maturation of the Salmonella-containing vacuole. Depletion of the COPI complex revealed a unique effect on virulence factor injection and membrane ruffling. Both effects are attributable to mislocalization of cholesterol, sphingolipids, Rac1 and Cdc42 away from the plasma membrane into a large intracellular compartment. Equivalent results were obtained with the vesicular stomatitis virus. Therefore, COPI-facilitated maintenance of lipids may represent a novel, unifying mechanism essential for a wide range of pathogens, offering opportunities for designing new drugs.
doi:10.1038/msb.2011.7
PMCID: PMC3094068  PMID: 21407211
coatomer; HeLa; Salmonella; siRNA; systems biology
15.  A network-based integrative approach to prioritize reliable hits from multiple genome-wide RNAi screens in Drosophila 
BMC Genomics  2009;10:220.
Background
The recently developed RNA interference (RNAi) technology has created an unprecedented opportunity which allows the function of individual genes in whole organisms or cell lines to be interrogated at genome-wide scale. However, multiple issues, such as off-target effects or low efficacies in knocking down certain genes, have produced RNAi screening results that are often noisy and that potentially yield both high rates of false positives and false negatives. Therefore, integrating RNAi screening results with other information, such as protein-protein interaction (PPI), may help to address these issues.
Results
By analyzing 24 genome-wide RNAi screens interrogating various biological processes in Drosophila, we found that RNAi positive hits were significantly more connected to each other when analyzed within a protein-protein interaction network, as opposed to random cases, for nearly all screens. Based on this finding, we developed a network-based approach to identify false positives (FPs) and false negatives (FNs) in these screening results. This approach relied on a scoring function, which we termed NePhe, to integrate information obtained from both PPI network and RNAi screening results. Using a novel rank-based test, we compared the performance of different NePhe scoring functions and found that diffusion kernel-based methods generally outperformed others, such as direct neighbor-based methods. Using two genome-wide RNAi screens as examples, we validated our approach extensively from multiple aspects. We prioritized hits in the original screens that were more likely to be reproduced by the validation screen and recovered potential FNs whose involvements in the biological process were suggested by previous knowledge and mutant phenotypes. Finally, we demonstrated that the NePhe scoring system helped to biologically interpret RNAi results at the module level.
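The contrast between direct neighbor-based and diffusion kernel-based scoring can be sketched on a toy network. The sketch below is a hypothetical stand-in for the NePhe scoring functions, whose exact definitions are not given here: the function names, the diffusion parameter `beta` and the four-protein path network are assumptions. The diffusion kernel exp(-beta * L) of the graph Laplacian L is approximated with a truncated Taylor series, which is adequate only for very small graphs.

```python
def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(adj, beta=0.5, terms=30):
    """Approximate K = exp(-beta * L) by summing the first Taylor terms."""
    n = len(adj)
    lap = laplacian(adj)
    m = [[-beta * lap[i][j] for j in range(n)] for i in range(n)]
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]          # holds m^k / k!
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]
        for i in range(n):
            for j in range(n):
                kernel[i][j] += term[i][j]
    return kernel


def direct_neighbor_scores(adj, hits):
    """Score each gene by the summed screen scores of its direct neighbors."""
    n = len(adj)
    return [sum(adj[i][j] * hits[j] for j in range(n)) for i in range(n)]


def kernel_scores(kernel, hits):
    """Score each gene by kernel-weighted screen scores of all genes."""
    n = len(kernel)
    return [sum(kernel[i][j] * hits[j] for j in range(n)) for i in range(n)]


# Hypothetical path network 0 - 1 - 2 - 3 with a single screen hit at gene 0
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
hits = [1, 0, 0, 0]
print(direct_neighbor_scores(adj, hits))
print(kernel_scores(diffusion_kernel(adj), hits))
```

In this toy case the direct neighbor score is non-zero only for gene 1, whereas the kernel score decays gradually with network distance from the hit, so genes two or three steps away still receive some support.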
Conclusion
By comprehensively analyzing multiple genome-wide RNAi screens, we conclude that network information can be effectively integrated with RNAi results to produce suggestive FPs and FNs, and to bring biological insight to the screening results.
doi:10.1186/1471-2164-10-220
PMCID: PMC2697172  PMID: 19435510
16.  Accurate, precise modeling of cell proliferation kinetics from time-lapse imaging and automated image analysis of agar yeast culture arrays 
Background
Genome-wide mutant strain collections have increased demand for high throughput cellular phenotyping (HTCP). For example, investigators use HTCP to investigate interactions between gene deletion mutations and additional chemical or genetic perturbations by assessing differences in cell proliferation among the collection of 5000 S. cerevisiae gene deletion strains. Such studies have thus far been predominantly qualitative, using agar cell arrays to subjectively score growth differences. Quantitative systems level analysis of gene interactions would be enabled by more precise HTCP methods, such as kinetic analysis of cell proliferation in liquid culture by optical density. However, requirements for processing liquid cultures make them relatively cumbersome and low throughput compared to agar. To improve HTCP performance and advance capabilities for quantifying interactions, YeastXtract software was developed for automated analysis of cell array images.
Results
YeastXtract software was developed for kinetic growth curve analysis of spotted agar cultures. The accuracy and precision for image analysis of agar culture arrays was comparable to OD measurements of liquid cultures. Using YeastXtract, image intensity vs. biomass of spot cultures was linearly correlated over two orders of magnitude. Thus cell proliferation could be measured over about seven generations, including four to five generations of relatively constant exponential phase growth. Spot area normalization reduced the variation in measurements of total growth efficiency. A growth model, based on the logistic function, increased precision and accuracy of maximum specific rate measurements, compared to empirical methods. The logistic function model was also more robust against data sparseness, meaning that less data was required to obtain accurate, precise, quantitative growth phenotypes.
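As an illustration of a logistic growth model, the following sketch evaluates a standard three-parameter logistic curve and the rates derived from it. The parameterization (carrying capacity K, rate r, time of half-maximal growth l) and all numeric values are assumptions for illustration; the exact model form and fitting procedure used by YeastXtract are not given here.

```python
import math


def logistic(t, capacity, rate, midpoint):
    """Logistic growth curve N(t) = K / (1 + exp(-r * (t - l)))."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


def specific_rate(t, capacity, rate, midpoint):
    """Specific growth rate (dN/dt) / N = r * (1 - N / K).

    It approaches its maximum, r, while N is still far below K.
    """
    n = logistic(t, capacity, rate, midpoint)
    return rate * (1.0 - n / capacity)


def max_absolute_rate(capacity, rate):
    """Largest slope dN/dt of the logistic curve, reached at t = l: r*K/4."""
    return rate * capacity / 4.0


# Hypothetical parameters: K = 100 (arbitrary intensity units),
# r = 0.5 per hour, half-maximal signal at l = 10 hours
K, r, l = 100.0, 0.5, 10.0
h = 1e-4
numeric_slope = (logistic(l + h, K, r, l) - logistic(l - h, K, r, l)) / (2 * h)
print(numeric_slope, max_absolute_rate(K, r))
print(specific_rate(-50.0, K, r, l))
```

Because the whole curve is described by three parameters, a fit can still recover the rate when only a few time points are available, which is consistent with the robustness against data sparseness reported above.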
Conclusion
Microbial cultures spotted onto agar media are widely used for genotype-phenotype analysis; however, quantitative HTCP methods capable of measuring kinetic growth rates have not been available previously. YeastXtract provides objective, automated, quantitative image analysis of agar cell culture arrays. Fitting the resulting data to a logistic equation-based growth model yields robust, accurate growth rate information. These methods allow the incorporation of imaging and automated image analysis of cell arrays, grown on solid agar media, into HTCP-driven experimental approaches, such as global, quantitative analysis of gene interaction networks.
doi:10.1186/1752-0509-1-3
PMCID: PMC1847469  PMID: 17408510
17.  A novel computational model of the circadian clock in Arabidopsis that incorporates PRR7 and PRR9 
We developed a mathematical model of the Arabidopsis circadian clock, including PRR7 and PRR9, which is able to predict several single, double and triple mutant phenotypes. Sensitivity analysis was used to identify the properties and time sensing mechanisms of model structures. PRR7 and CCA1/LHY were identified as weak points of the mathematical model indicating where more experimental data is needed for further model development. Detailed dynamical studies showed that the timing of an evening light sensing element is essential for day length responsiveness.
In recent years, molecular genetic techniques have revealed a complex network of components in the Arabidopsis circadian clock. Mathematical models allow for a detailed study of the dynamics and architecture of such complex gene networks leading to a better understanding of the genetic interactions. It is important to maintain a constant iteration with experimentation, to include novel components as they are discovered and use the updated model to design new experiments. This study develops a framework to introduce new components into the mathematical model of the Arabidopsis circadian clock accelerating the iterative model development process and gaining insight into the system's properties.
We used the interlocked feedback loop model published in Locke et al (2005) as the base model. In Arabidopsis, the first suggested regulatory loop involves the morning expressed transcription factors CIRCADIAN CLOCK-ASSOCIATED 1 (CCA1) and LATE ELONGATED HYPOCOTYL (LHY), and the evening expressed pseudo-response regulator TIMING OF CAB EXPRESSION (TOC1). The hypothetical component X had been introduced to realize a longer delay between gene expression of CCA1/LHY and TOC1. The introduction of Y was motivated by the need for a mechanism to reproduce the dampening short period rhythms of the cca1/lhy double mutant and to include an additional light input at the end of the day.
In this study, the new components pseudo-response regulators PRR7 and PRR9 were added in negative feedback loops based on the biological hypothesis that they are activated by LHY and in turn repress LHY transcription (Farré et al, 2005; Figure 1). We present three iteration steps of model development (Figure 1A–C).
A wide range of tools was used to establish and analyze new model structures. One of the challenges facing mathematical modeling of biological processes is parameter identification; they are notoriously difficult to determine experimentally. We established an optimization procedure based on an evolutionary strategy with a cost function mainly derived from wild-type characteristics. This ensured that the model was not restricted by a specific set of parameters and enabled us to use a large set of biological mutant information to assess the predictive capability of the model structure. Models were evaluated by means of an extended phenotype catalogue, allowing for an easy and fair comparison of the structures. We also carried out detailed simulation analysis of component interactions to identify weak points in the structure and suggest further modifications. Finally, we applied sensitivity analysis in a novel manner, using it to direct the model development. Sensitivity analysis provides quantitative measures of robustness; the two measures in this study were the traces of component concentrations over time (classical state sensitivities) and phase behavior (measured by the phase response curve). Three major results emerged from the model development process.
First, the iteration process helped us to learn about general characteristics of the system. We observed that the timing of Y expression is critical for evening light entrainment, which enables the system to respond to changes in day length. This is important for our understanding of the mechanism of light input to the clock and will aid in the identification of biological candidates for this function. In addition, our results suggest that a detailed description of the mechanisms of genetic interactions is important for the system's behavior. We observed that the introduction of an experimentally based precise light regulation mechanism on PRR9 expression had a significant effect on the system's behavior.
Second, the final model structure (Figure 1C) was capable of predicting a wide range of mutant phenotypes, such as a reduction of TOC1 expression by RNAi (toc1RNAi), mutations in PRR7 and PRR9 and the novel mutant combinations prr9toc1RNAi and prr7prr9toc1RNAi. However, it was unable to predict the mutations in CCA1 and LHY.
Finally, sensitivity analysis identified the weak points of the system. The developed model structure was heavily based on the TOC1/Y feedback loop. This could explain the model's failure to predict the cca1lhy double mutant phenotype. More detailed information on the regulation of CCA1 and LHY expression will be important to achieve the right balance between the different regulatory loops in the mathematical model. This is in accordance with genetic studies that have identified several genes involved in the regulation of LHY and CCA1 expression. The identification of their mechanism of action will be necessary for the next model development.
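Classical state sensitivities, as used above, measure how a component's concentration trace changes when a parameter is perturbed. The sketch below estimates them by central finite differences on a toy two-component negative feedback loop integrated with the explicit Euler method. The toy equations, parameter names and values are hypothetical and far simpler than the published clock model; phase response curves are not covered.

```python
# Hypothetical parameters of a toy loop: activator A drives repressor R,
# and R in turn represses production of A.
DEFAULT_PARAMS = {"v": 1.0, "K": 0.5, "n": 2, "dA": 0.5, "k": 1.0, "dR": 0.5}


def simulate(params, steps=2000, dt=0.01, a0=0.1, r0=0.0):
    """Integrate the toy loop with explicit Euler; returns a list of (A, R)."""
    a, r = a0, r0
    trace = [(a, r)]
    for _ in range(steps):
        # Production of A is repressed by R through a Hill-type term
        da = params["v"] / (1.0 + (r / params["K"]) ** params["n"]) - params["dA"] * a
        # R is produced in proportion to A and degraded linearly
        dr = params["k"] * a - params["dR"] * r
        a, r = a + dt * da, r + dt * dr
        trace.append((a, r))
    return trace


def state_sensitivity(params, name, rel_step=1e-4, **sim_kwargs):
    """Central finite-difference estimate of d(A, R)/d(parameter) over time."""
    delta = rel_step * params[name]
    up = dict(params)
    up[name] += delta
    down = dict(params)
    down[name] -= delta
    trace_up = simulate(up, **sim_kwargs)
    trace_down = simulate(down, **sim_kwargs)
    return [((au - ad) / (2 * delta), (ru - rd) / (2 * delta))
            for (au, ru), (ad, rd) in zip(trace_up, trace_down)]


# Sensitivity of both components to the production rate v
sens = state_sensitivity(DEFAULT_PARAMS, "v", steps=500, dt=0.01)
print(sens[-1])
```

Parameters whose sensitivities stay large over the whole trace indicate parts of a structure on which the model output depends heavily, which is the sense in which sensitivity analysis can point to weak points of a model.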
In plants, as in animals, the core mechanism to retain rhythmic gene expression relies on the interaction of multiple feedback loops. In recent years, molecular genetic techniques have revealed a complex network of clock components in Arabidopsis. To gain insight into the dynamics of these interactions, new components need to be integrated into the mathematical model of the plant clock. Our approach accelerates the iterative process of model identification, making it possible to incorporate new components and to systematically test different proposed structural hypotheses. Recent studies indicate that the pseudo-response regulators PRR7 and PRR9 play a key role in the core clock of Arabidopsis. We incorporate PRR7 and PRR9 into an existing model involving the transcription factors TIMING OF CAB EXPRESSION (TOC1), LATE ELONGATED HYPOCOTYL (LHY) and CIRCADIAN CLOCK-ASSOCIATED 1 (CCA1). We propose candidate models based on experimental hypotheses and identify the computational models with the application of an optimization routine. Validation is accomplished through systematic analysis of various mutant phenotypes. We introduce and apply sensitivity analysis as a novel tool for analyzing and distinguishing the characteristics of proposed architectures, which also allows for further validation of the hypothesized structures.
doi:10.1038/msb4100101
PMCID: PMC1682023  PMID: 17102803
Arabidopsis; circadian rhythms; mathematical modeling; parameter optimization; sensitivity analysis
18.  RNA interference is ineffective as a routine method for gene silencing in chick embryos as monitored by fgf8 silencing 
The in vivo accessibility of the chick embryo makes it a favoured model system for experimental developmental biology. Although the range of available techniques now extends to misexpression of genes through in ovo electroporation, it remains difficult to knock out individual gene expression. Recently, the possibility of silencing gene expression by RNAi in chick embryos has been reported. However, published studies show only discrete quantitative differences in the expression of the endogenous targeted genes and unclear morphological alterations. To elucidate whether the tools currently available are adequate to silence gene expression sufficiently to produce a clear and specific null-like mutant phenotype, we have performed several experiments with different molecules that trigger RNAi: dsRNA, siRNA, and shRNA produced from a plasmid coexpressing green fluorescent protein as an internal marker. Focussing on fgf8 expression in the developing isthmus, we show that no morphological defects are observed, and that fgf8 expression is silenced neither in embryos microinjected with dsRNA nor in embryos microinjected and electroporated with a pool of siRNAs. Moreover, fgf8 expression was not significantly silenced in most isthmic cells transformed with a plasmid producing engineered shRNAs to fgf8. We also show that siRNA molecules do not spread significantly from cell to cell as reported for invertebrates, suggesting the existence of molecular differences between different model systems that may explain the different responses to RNAi. Although our results are basically in agreement with previously reported studies, we suggest, in contrast to them, that with currently available tools and techniques the number of cells in which fgf8 gene expression is decreased, if any, is not sufficient to generate a detectable mutant phenotype, thus making RNAi useless as a routine method for functional gene analysis in chick embryos.
PMCID: PMC1140352  PMID: 15951844
RNA interference (RNAi); small interfering RNA (siRNA); short hairpin RNA (shRNA); chick embryo; isthmus; fgf8
19.  Content-based image retrieval for brain MRI: An image-searching engine and population-based analysis to utilize past clinical data for future diagnosis 
NeuroImage : Clinical  2015;7:367-376.
Radiological diagnosis is based on subjective judgment by radiologists. The reasoning behind this process is difficult to document and share, which is a major obstacle in adopting evidence-based medicine in radiology. We report our attempt to use a comprehensive brain parcellation tool to systematically capture image features and use them to record, search, and evaluate anatomical phenotypes. Anatomical images (T1-weighted MRI) were converted to a standardized index by using a high-dimensional image transformation method followed by atlas-based parcellation of the entire brain. We investigated how the indexed anatomical data captured the anatomical features of healthy controls and a population with Primary Progressive Aphasia (PPA). PPA was chosen because patients have apparent atrophy at different degrees and locations, thus the automated quantitative results can be compared with trained clinicians' qualitative evaluations. We explored and tested the power of individual classifications and of performing a search for images with similar anatomical features in a database using partial least squares-discriminant analysis (PLS-DA) and principal component analysis (PCA). The agreement between the automated z-score and the averaged visual scores for atrophy (r = 0.8) was virtually the same as the inter-evaluator agreement. The PCA plot distribution correlated with the anatomical phenotypes and the PLS-DA resulted in a model with an accuracy of 88% for distinguishing PPA variants. The quantitative indices captured the main anatomical features. The indexing of image data has a potential to be an effective, comprehensive, and easily translatable tool for clinical practice, providing new opportunities to mine clinical databases for medical decision support.
Highlights
• Brain parcellation tools define structures automatically and convert images into standardized and quantitative matrices.
• We tested if an automated tool and the resultant vector of structural volumes can accurately capture anatomical phenotypes.
• The agreement between visual and automated atrophy detection was virtually the same as the inter-evaluator agreement.
• The quantitative indices captured the main anatomical features in brains with atrophy in different degrees and location.
• The image quantification has potential to be an effective, comprehensive, and easily translatable tool for clinical practice.
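The automated atrophy z-score mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each structure's volume is compared with the mean and standard deviation of the same structure in healthy controls; the structure names, the volumes, and the function `atrophy_z_scores` are hypothetical and are not taken from the paper.

```python
import statistics


def atrophy_z_scores(patient_volumes, control_volumes):
    """Return a z-score per structure relative to a control population.

    patient_volumes: dict mapping structure name -> volume for one patient.
    control_volumes: dict mapping structure name -> list of control volumes.
    Negative values indicate a smaller volume than the control mean.
    """
    z_scores = {}
    for structure, value in patient_volumes.items():
        reference = control_volumes[structure]
        mean = statistics.mean(reference)
        sd = statistics.stdev(reference)  # sample standard deviation
        z_scores[structure] = (value - mean) / sd
    return z_scores


if __name__ == "__main__":
    # Hypothetical volumes in arbitrary units, for illustration only.
    controls = {
        "structure_a": [10.0, 12.0, 11.0, 9.0, 13.0],
        "structure_b": [20.0, 22.0, 21.0, 19.0, 23.0],
    }
    patient = {"structure_a": 7.0, "structure_b": 21.0}
    for name, z in atrophy_z_scores(patient, controls).items():
        print(f"{name}: z = {z:.2f}")
```

A vector of such per-structure scores is one plausible form of the "standardized index" that can then be searched or fed to PCA or PLS-DA.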
doi:10.1016/j.nicl.2015.01.008
PMCID: PMC4309952
Automated parcellation; Brain; MRI; Content-based image retrieval; Atlas-based analysis
20.  Modularity and hormone sensitivity of the Drosophila melanogaster insulin receptor/target of rapamycin interaction proteome 
First systematic analysis of the evolutionarily conserved InR/TOR pathway interaction proteome in Drosophila. Quantitative mass spectrometry revealed that 22% of identified protein interactions are regulated by the growth hormone insulin affecting membrane proximal as well as intracellular signaling complexes. Systematic RNA interference linked a significant fraction of network components to the control of dTOR kinase activity. Combined biochemical and genetic data suggest dTTT, a dTOR-containing complex required for cell growth control by dTORC1 and dTORC2 in vivo.
Cellular growth is a fundamental process that requires constant adaptations to changing environmental conditions, like growth factor and nutrient availability, energy levels and more. Over the years, the insulin receptor/target of rapamycin pathway (InR/TOR) emerged as a key signaling system for the control of metazoan cell growth. Genetic screens carried out in the fruit fly Drosophila melanogaster identified key InR/TOR pathway components and their relationships. Phenotypes such as altered cell growth are likely to emerge from perturbed dynamic networks containing InR/TOR pathway components, which stably or transiently interact with other cellular proteins to form complexes and networks thereof. Systematic studies on the topology and dynamics of protein interaction networks therefore become highly relevant to gain systems level understanding of deregulated cell growth. Despite much progress in genetic analysis, only a few systematic protein interaction studies have been reported for Drosophila, which in most cases lack quantitative information representing the dynamic nature of such networks. Here, we present the first quantitative affinity purification mass spectrometry (AP–MS/MS) analysis on the evolutionarily conserved InR/TOR signaling network in Drosophila. Systematic RNAi-based functional analysis of identified network components revealed key components linked to the regulation of the central effector kinase dTOR. This also includes dTTT, a novel dTOR-containing complex required for the control of dTORC1 and dTORC2 in vivo.
For systematic AP–MS analysis, we generated Drosophila Kc167 cell lines inducibly expressing affinity-tagged bait proteins previously linked to InR/TOR signaling. Bait expressing Kc167 cell lines were harvested before and after insulin stimulation for subsequent affinity purification. Following LC–MS/MS analysis and probabilistic data filtering using SAINT (Choi et al, 2010), we generated a quantitative network model from 97 high confidence protein–protein interactions and 58 network components (Figure 2). The presented network displayed a high degree of orthologous interactions conserved also in human cells and identified a number of novel molecular interactions with InR/TOR signaling components for future hypothesis driven analysis.
To measure insulin-induced changes within the InR/TOR interaction proteome, we applied a recently introduced label-free quantitative MS approach (Rinner et al, 2007). The obtained quantitative data suggest that 22% of all interactions in the network are regulated by insulin. Major changes could be observed within the membrane proximal InR/chico/PI3K signaling complexes, and also in 14-3-3 protein containing signaling complexes and dTORC1, a complex that contains besides dTOR all major orthologous proteins found also in human mTORC1 including the two dTORC1 substrates d4E-BP (Thor) and S6 Kinase (S6K). Insulin triggered both dissociation and association of dTORC1 proteins. Among the proteins that showed enhanced binding to dTORC1 upon insulin stimulation we found Unkempt, a RING-finger protein with a proposed role in ubiquitin-mediated protein degradation (Lores et al, 2010). Besides dTORC1 our systematic AP–MS analysis also revealed the presence of dTORC2, the second major TOR complex in Drosophila. dTORC2 contains the Drosophila orthologs of human mTORC2 proteins, but in contrast to dTORC1 was not affected upon insulin stimulation. Interestingly, we also found a specific set of proteins that were not linked to the canonical TOR complexes TORC1 and TORC2 in dTOR purifications. These include LqfR (liquid facets related), Pontin, Reptin, Spaghetti and the gene product of CG16908. We found the same set of proteins when we used CG16908 as a bait, suggesting complex formation among the identified proteins. None of the dTORC1/2 components besides dTOR was identified in CG16908 purifications, indicating that these proteins form dTOR complexes distinct from dTORC1 and dTORC2. Based on known interaction information from other species and data obtained from this study we refer to this complex as dTTT (Drosophila TOR, TELO2, TTI1) (Horejsi et al, 2010; Hurov et al, 2010; Kaizuka et al, 2010).
A directed quantitative MS analysis of dTOR complex components suggests that dTORC1 is the most abundant dTOR complex we identified in Kc167 cells.
We next studied the potential roles of the identified network components for controlling the activity of the dInR/TOR pathway using systematic RNAi depletion and quantitative western blotting to measure the changes in abundance of phosphorylated substrates of dTORC1 (Thor/d4E-BP, dS6K) and dTORC2 (dPKB) in RNAi-treated cells (Figure 5). Overall, we could identify 16 proteins (out of 58) whose depletion caused an at least 50% increase or decrease in the levels of phosphorylated d4E-BP, S6K and/or PKB compared with control GFP RNAi. Besides established pathway components, we found several novel regulators within the dInR/TOR interaction network. For example, RNAi against the novel insulin-regulated dTORC1 component Unkempt resulted in enhanced phosphorylation of the dTORC1 substrate d4E-BP, which suggests a negative role for Unkempt on dTORC1 activity. In contrast, depletion of CG16908 and LqfR caused hypo-phosphorylation of all dTOR substrates similar to dTOR itself, suggesting a positive role for the dTTT complex on dTOR activity. Subsequently, we tested whether dTTT components also play a role in dTOR-mediated cell growth in vivo. Depletion of both dTTT components, CG16908 and LqfR, in the Drosophila eye resulted in a substantial decrease in eye size. Likewise, FLP-FRT-mediated mitotic recombination resulted in CG16908 and LqfR mutant clones with a similar reduced growth phenotype as observed in dTOR mutant clones. Hence, the combined biochemical and genetic analysis revealed dTTT as a dTOR-containing complex that is required for the activity of both dTORC1 and dTORC2 and thus plays a critical role in controlling cell growth.
Taken together, these results illustrate how a systematic quantitative AP–MS approach when combined with systematic functional analysis in Drosophila can reveal novel insights into the dynamic organization of regulatory networks for cell growth control in metazoans.
Using quantitative mass spectrometry, this study reports how insulin affects the modularity of the interaction proteome of the Drosophila InR/TOR pathway, an evolutionarily conserved signaling system for the control of metazoan cell growth. Systematic functional analysis linked a significant number of identified network components to the control of dTOR activity and revealed dTTT, a dTOR complex required for in vivo cell growth control by dTORC1 and dTORC2.
Genetic analysis in Drosophila melanogaster has been widely used to identify a system of genes that control cell growth in response to insulin and nutrients. Many of these genes encode components of the insulin receptor/target of rapamycin (InR/TOR) pathway. However, the biochemical context of this regulatory system is still poorly characterized in Drosophila. Here, we present the first quantitative study that systematically characterizes the modularity and hormone sensitivity of the interaction proteome underlying growth control by the dInR/TOR pathway. Applying quantitative affinity purification and mass spectrometry, we identified 97 high confidence protein interactions among 58 network components. In all, 22% of the detected interactions were regulated by insulin affecting membrane proximal as well as intracellular signaling complexes. Systematic functional analysis linked a subset of network components to the control of dTORC1 and dTORC2 activity. Furthermore, our data suggest the presence of three distinct dTOR kinase complexes, including the evolutionarily conserved dTTT complex (Drosophila TOR, TELO2, TTI1). Subsequent genetic studies in flies suggest a role for dTTT in controlling cell growth via a dTORC1- and dTORC2-dependent mechanism.
doi:10.1038/msb.2011.79
PMCID: PMC3261712  PMID: 22068330
cell growth; InR/TOR pathway; interaction proteome; quantitative mass spectrometry; signaling
21.  PPARα siRNA–Treated Expression Profiles Uncover the Causal Sufficiency Network for Compound-Induced Liver Hypertrophy 
PLoS Computational Biology  2007;3(3):e30.
Uncovering pathways underlying drug-induced toxicity is a fundamental objective in the field of toxicogenomics. Developing mechanism-based toxicity biomarkers requires the identification of such novel pathways and the order of their sufficiency in causing a phenotypic response. Genome-wide RNA interference (RNAi) phenotypic screening has emerged as an effective tool in unveiling the genes essential for specific cellular functions and biological activities. However, eliciting the relative contribution of and sufficiency relationships among the genes identified remains challenging. In the rodent, the most widely used animal model in preclinical studies, it is unrealistic to exhaustively examine all potential interactions by RNAi screening. Application of existing computational approaches to infer regulatory networks with biological outcomes in the rodent is limited by the requirements for a large number of targeted permutations. Therefore, we developed a two-step relay method that requires only one targeted perturbation for genome-wide de novo pathway discovery. Using expression profiles in response to small interfering RNAs (siRNAs) against the gene for peroxisome proliferator-activated receptor α (Ppara), our method unveiled the potential causal sufficiency order network for liver hypertrophy in the rodent. The validity of the inferred 16 causal transcripts or 15 known genes for PPARα-induced liver hypertrophy is supported by their ability to predict non-PPARα–induced liver hypertrophy with 84% sensitivity and 76% specificity. Simulation shows that the probability of achieving such predictive accuracy without the inferred causal relationship is exceedingly small (p < 0.005). Five of the most sufficient causal genes have been previously disrupted in mouse models; the resulting phenotypic changes in the liver support the inferred causal roles in liver hypertrophy. 
Our results demonstrate the feasibility of defining pathways mediating drug-induced toxicity from siRNA-treated expression profiles. When combined with phenotypic evaluation, our approach should help to unleash the full potential of siRNAs in systematically unveiling the molecular mechanism of biological events.
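For readers less familiar with the two accuracy figures quoted above, the following sketch shows how sensitivity and specificity are computed from confusion-matrix counts. The counts are hypothetical and were chosen only so that the arithmetic reproduces 84% and 76%; they are not the sample sizes used in the study.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of truly positive cases that the predictor calls positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of truly negative cases that the predictor calls negative."""
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # Hypothetical counts: 25 compounds that cause hypertrophy, 25 that do not.
    tp, fn = 21, 4
    tn, fp = 19, 6
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")
    print(f"specificity = {specificity(tn, fp):.2f}")
```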
Author Summary
Approaches for discovering mechanisms of action and for identifying molecular biomarkers in biomedical research are evolving today, as the growing symbiosis with computational sciences becomes more widely appreciated. In fact, the combination of various new technologies has been pushing forward both frontiers. Here, we present an example of the combined use of in vivo siRNA knock-down technology, genome-wide gene expression profiling, and computational reasoning to unveil regulatory causal relationships and the sufficiency network of identified genes for compound-induced toxicity. Unlike previously reported approaches, our method requires only one targeted perturbation for genome-wide de novo pathway discovery. Hence, our method can be directly applied to animal models in which it is still technically challenging to perform genome-wide genetic perturbation or RNAi screening. The independent application of our derived model to compounds with unrelated mechanisms of action suggests the existence of a universal molecular module that mediates liver hypertrophy. The resulting sufficiency network for induction of liver hypertrophy will have an immediate impact on the progress of toxicogenomics. When combined with phenotypic evaluation, our approach should help to unleash the full potential of siRNAs in systematically unveiling the molecular mechanisms of biological events.
doi:10.1371/journal.pcbi.0030030
PMCID: PMC1808491  PMID: 17335344
22.  A regression model approach to enable cell morphology correction in high-throughput flow cytometry 
Large variations in cell size and shape can undermine traditional gating methods for analyzing flow cytometry data. Correcting for these effects enables analysis of high-throughput data sets, including >5000 yeast samples with diverse cell morphologies.
The regression model approach corrects for the effects of cell morphology on fluorescence, as well as an extremely small and restrictive gate, but without removing any of the cells. In contrast to traditional gating, this approach enables the quantitative analysis of high-throughput flow cytometry experiments, since the regression model can compare between biological samples that show no or little overlap in terms of the morphology of the cells. The analysis of a high-throughput yeast flow cytometry data set consisting of >5000 biological samples identified key proteins that affect the time and intensity of the bifurcation event that happens after the carbon source transition from glucose to fatty acids. Here, some yeast cells undergo major structural changes, while others do not.
Flow cytometry is a widely used technique that enables the measurement of different optical properties of individual cells within large populations of cells in a fast and automated manner. For example, by targeting cell-specific markers with fluorescent probes, flow cytometry is used to identify (and isolate) cell types within complex mixtures of cells. In addition, fluorescence reporters can be used in conjunction with flow cytometry to measure protein, RNA or DNA concentration within single cells of a population.
One of the biggest advantages of this technique is that it provides information on how each cell behaves instead of just measuring the population average. This can be essential when analyzing complex samples that consist of diverse cell types or when measuring cellular responses to stimuli. For example, there is an important difference between a 50% expression increase of all cells in a population after stimulation and a 100% increase in only half of the cells, while the other half remains unresponsive. Another important advantage of flow cytometry is automation, which enables high-throughput studies with thousands of samples and conditions. However, current methods are confounded by populations of cells that are non-uniform in terms of size and granularity. Such variability affects the emitted fluorescence of the cell and adds undesired variability when estimating population fluorescence. This effect also frustrates a sensible comparison between conditions, where not only fluorescence but also cell size and granularity may be affected.
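The difference between the two scenarios in the example above can be made concrete with a few lines of arithmetic. This is an illustrative sketch with an assumed baseline expression of 100 arbitrary units and 1000 cells; neither number comes from the paper.

```python
import statistics

baseline = 100.0  # assumed expression level before stimulation (arbitrary units)
n_cells = 1000    # assumed population size

# Scenario 1: every cell increases its expression by 50%.
uniform_response = [baseline * 1.5] * n_cells

# Scenario 2: half of the cells increase by 100%, the other half do not respond.
split_response = [baseline * 2.0] * (n_cells // 2) + [baseline] * (n_cells // 2)

mean_uniform = statistics.mean(uniform_response)
mean_split = statistics.mean(split_response)
sd_uniform = statistics.pstdev(uniform_response)
sd_split = statistics.pstdev(split_response)

print(f"uniform: mean = {mean_uniform}, sd = {sd_uniform}")
print(f"split:   mean = {mean_split}, sd = {sd_split}")
```

Both populations have the same mean, so a bulk measurement could not tell them apart; only the per-cell distribution reveals the unresponsive subpopulation.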
Traditionally, this problem has been addressed by using ‘gates' that restrict the analysis to cells with similar morphological properties (i.e. cell size and cell granularity). Because cells inside the gate are morphologically similar to one another, they will show a smaller variability in their response within the population. Moreover, applying the same gate in all samples assures that observed differences between these samples are not due to differential cell morphologies.
Gating, however, comes with costs. First, since only a subgroup of cells is selected, the final number of cells analyzed can be significantly reduced. This means that in order to have sufficient statistical power, more cells have to be acquired, which, if even possible in the first place, increases the time and cost of the experiment. Second, finding a good gate for all samples and conditions can be challenging if not impossible, especially in cases where cellular morphology changes dramatically between conditions. Finally, gating is a very user-dependent process, where both the size and shape of the gate are determined by the researcher and will affect the outcome, introducing subjectivity in the analysis that complicates reproducibility.
In this paper, we present an alternative method to gating that addresses the issues stated above. The method is based on a regression model containing linear and non-linear terms that estimates and corrects for the effect of cell size and granularity on the observed fluorescence of each cell in a sample. The corrected fluorescence thus becomes ‘free' of the morphological effects.
Because the model uses all cells in the sample, it assures that the corrected fluorescence is an accurate representation of the sample. In addition, the regression model can predict the expected fluorescence of a sample in areas where there are no cells. This makes it possible to compare between samples that have little overlap with good confidence. Furthermore, because the regression model is automated, it is fully reproducible between labs and conditions. Finally, it allows for a rapid analysis of big data sets containing thousands of samples.
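The idea described in the two paragraphs above can be sketched as follows. This is a minimal illustration, not the authors' model: it assumes an ordinary least-squares fit with linear, quadratic and interaction terms in cell size and granularity, and it reports each cell's residual shifted to the prediction at a common reference morphology. All function names, the choice of terms and the simulated data are assumptions made for the example.

```python
import random
import statistics


def design_row(size, gran):
    """Intercept, linear, quadratic and interaction morphology terms."""
    return [1.0, size, gran, size * size, gran * gran, size * gran]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_morphology_model(sizes, grans, fluor):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    rows = [design_row(s, g) for s, g in zip(sizes, grans)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, fluor)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coef, size, gran):
    """Expected fluorescence at a given morphology, even where no cells lie."""
    return sum(c * t for c, t in zip(coef, design_row(size, gran)))


def corrected_fluorescence(coef, sizes, grans, fluor, ref_size, ref_gran):
    """Residual of every cell plus the prediction at a reference morphology."""
    ref = predict(coef, ref_size, ref_gran)
    return [y - predict(coef, s, g) + ref for s, g, y in zip(sizes, grans, fluor)]


def simulate_cells(n, seed=0):
    """Synthetic cells whose fluorescence depends on size and granularity."""
    rng = random.Random(seed)
    sizes = [rng.uniform(1.0, 3.0) for _ in range(n)]
    grans = [rng.uniform(1.0, 3.0) for _ in range(n)]
    fluor = [2.0 + 0.5 * s + 0.3 * g + 0.1 * s * s + rng.gauss(0.0, 0.1)
             for s, g in zip(sizes, grans)]
    return sizes, grans, fluor


if __name__ == "__main__":
    sizes, grans, fluor = simulate_cells(500)
    coef = fit_morphology_model(sizes, grans, fluor)
    corrected = corrected_fluorescence(coef, sizes, grans, fluor, 2.0, 2.0)
    print("cells kept:", len(corrected), "of", len(fluor))
    print("raw variance:      ", statistics.pvariance(fluor))
    print("corrected variance:", statistics.pvariance(corrected))
```

Unlike a gate, the correction keeps every cell, and `predict` can be evaluated at a morphology shared by two samples even when their cells barely overlap.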
To probe the validity of the model, we performed several experiments. We show how the regression model is able to remove the morphology-associated variability as well as an extremely small and restrictive gate, but without the caveat of removing cells. We test the method in different organisms (yeast and human) and applications (protein level detection, separation of mixed subpopulations). We then apply this method to unveil new biological insights in the mechanistic processes involved in transcriptional noise.
Gene transcription is a process subjected to the randomness intrinsic to any molecular event. Although such randomness may seem to be undesirable for the cell, since it prevents consistent behavior, there are situations where some degree of randomness is beneficial (e.g. bet hedging). For this reason, each gene is tuned to exhibit different levels of randomness or noise depending on its functions. For core and essential genes, the cell has developed mechanisms to lower the level of noise, while for genes involved in the response to stress, the variability is greater.
This gene transcription tuning can be determined at many levels, from the architecture of the transcriptional network, to epigenetic regulation. In our study, we analyze the latter using the response of yeast to the presence of fatty acid in the environment. Fatty acid can be used as energy by yeast, but it requires major structural changes and commitments. We have observed that at the population level, there is a bifurcation event whereby some cells undergo these changes and others do not. We have analyzed this bifurcation event in mutants for all the non-essential epigenetic regulators in yeast and identified key proteins that affect the time and intensity of this bifurcation. Even though fatty acid triggers major morphological changes in the cell, the regression model still makes it possible to analyze the over 5000 flow cytometry samples in this data set in an automated manner, whereas a traditional gating approach would be impossible.
Cells exposed to stimuli exhibit a wide range of responses ensuring phenotypic variability across the population. Such single cell behavior is often examined by flow cytometry; however, gating procedures typically employed to select a small subpopulation of cells with similar morphological characteristics make it difficult, even impossible, to quantitatively compare cells across a large variety of experimental conditions because these conditions can lead to profound morphological variations. To overcome these limitations, we developed a regression approach to correct for variability in fluorescence intensity due to differences in cell size and granularity without discarding any of the cells, which gating ipso facto does. This approach enables quantitative studies of cellular heterogeneity and transcriptional noise in high-throughput experiments involving thousands of samples. We used this approach to analyze a library of yeast knockout strains and reveal genes required for the population to establish a bimodal response to oleic acid induction. We identify a group of epigenetic regulators and nucleoporins that, by maintaining an ‘unresponsive population,' may provide the population with the advantage of diversified bet hedging.
doi:10.1038/msb.2011.64
PMCID: PMC3202802  PMID: 21952134
flow cytometry; high-throughput experiments; statistical regression model; transcriptional noise
23.  Identification of Drosophila Mitotic Genes by Combining Co-Expression Analysis and RNA Interference 
PLoS Genetics  2008;4(7):e1000126.
RNAi screens have, to date, identified many genes required for mitotic divisions of Drosophila tissue culture cells. However, the inventory of such genes remains incomplete. We have combined the powers of bioinformatics and RNAi technology to detect novel mitotic genes. We found that Drosophila genes involved in mitosis tend to be transcriptionally co-expressed. We thus constructed a co-expression–based list of 1,000 genes that are highly enriched in mitotic functions, and we performed RNAi for each of these genes. By limiting the number of genes to be examined, we were able to perform a very detailed phenotypic analysis of RNAi cells. We examined dsRNA-treated cells for possible abnormalities in both chromosome structure and spindle organization. This analysis allowed the identification of 142 mitotic genes, which were subdivided into 18 phenoclusters. Seventy of these genes have not previously been associated with mitotic defects; 30 of them are required for spindle assembly and/or chromosome segregation, and 40 are required to prevent spontaneous chromosome breakage. We note that the latter type of genes has never been detected in previous RNAi screens in any system. Finally, we found that RNAi against genes encoding kinetochore components or highly conserved splicing factors results in identical defects in chromosome segregation, highlighting an unanticipated role of splicing factors in centromere function. These findings indicate that our co-expression–based method for the detection of mitotic functions works remarkably well. We can foresee that elaboration of co-expression lists using genes in the same phenocluster will provide many candidate genes for small-scale RNAi screens aimed at completing the inventory of mitotic proteins.
Author Summary
Mitosis is the evolutionarily conserved process that enables a dividing cell to equally partition its genetic material between the two daughter cells. The fidelity of mitotic division is crucial for normal development of multicellular organisms and to prevent cancer or birth defects. Understanding the molecular mechanisms of mitosis requires the identification of genes involved in this process. Previous studies have shown that such genes can be readily identified by RNA interference (RNAi) in Drosophila tissue culture cells. Because the inventory of mitotic genes is still incomplete, we have undertaken an RNAi screen using a novel approach. We used a co-expression–based bioinformatic procedure to select a group of 1,000 genes enriched in mitotic functions from a dataset of 13,166 Drosophila genes. This group includes roughly half of the known mitotic genes, implying that it should contain half of all mitotic genes, including those that are currently unknown. We performed RNAi against each of the 1,000 genes in the group. By limiting the number of genes to be examined, we were able to perform a very detailed phenotypic analysis of RNAi cells. This analysis allowed the identification of 70 genes whose mitotic role was previously unknown; 30 are required for proper chromosome segregation and 40 are required to maintain chromosome integrity.
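A co-expression-based selection of candidate genes, as described above, can be sketched in a few lines. The scoring rule used here (mean Pearson correlation with a set of known mitotic "seed" genes) is an assumption chosen for illustration and is not necessarily the procedure used in the study; the gene names and expression profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_by_coexpression(profiles, seed_genes, top_n):
    """Rank genes by mean correlation with the seed genes and keep the top_n.

    profiles: dict mapping gene name -> list of expression values.
    seed_genes: known members of the process (at least two are required so
    that a seed gene is never scored against itself alone).
    """
    if len(seed_genes) < 2:
        raise ValueError("need at least two seed genes")
    scores = {}
    for gene, profile in profiles.items():
        others = [s for s in seed_genes if s != gene]
        total = sum(pearson(profile, profiles[s]) for s in others)
        scores[gene] = total / len(others)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical expression profiles across four conditions.
    profiles = {
        "seed1": [1.0, 2.0, 3.0, 4.0],
        "seed2": [2.0, 4.0, 6.0, 8.5],
        "cand1": [1.1, 2.1, 2.9, 4.2],
        "cand2": [4.0, 3.0, 2.0, 1.0],
        "cand3": [1.0, 1.0, 1.0, 1.0],
    }
    print(rank_by_coexpression(profiles, ["seed1", "seed2"], top_n=3))
```

In the study the analogous list held 1,000 of 13,166 genes; here `top_n` plays that role on a toy scale.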
doi:10.1371/journal.pgen.1000126
PMCID: PMC2537813  PMID: 18797514
24.  A Computational Framework for Ultrastructural Mapping of Neural Circuitry 
PLoS Biology  2009;7(3):e1000074.
Circuitry mapping of metazoan neural systems is difficult because canonical neural regions (regions containing one or more copies of all components) are large, regional borders are uncertain, neuronal diversity is high, and potential network topologies are so numerous that only anatomical ground truth can resolve them. Complete mapping of a specific network requires synaptic resolution, canonical region coverage, and robust neuronal classification. Though transmission electron microscopy (TEM) remains the optimal tool for network mapping, the process of building large serial section TEM (ssTEM) image volumes is rendered difficult by the need to precisely mosaic distorted image tiles and register distorted mosaics. Moreover, most molecular neuronal class markers are poorly compatible with optimal TEM imaging. Our objective was to build a complete framework for ultrastructural circuitry mapping. This framework combines strong TEM-compliant small molecule profiling with automated image tile mosaicking, automated slice-to-slice image registration, and gigabyte-scale image browsing for volume annotation. Specifically we show how ultrathin molecular profiling datasets and their resultant classification maps can be embedded into ssTEM datasets and how scripted acquisition tools (SerialEM), mosaicking and registration (ir-tools), and large slice viewers (MosaicBuilder, Viking) can be used to manage terabyte-scale volumes. These methods enable large-scale connectivity analyses of new and legacy data. In well-posed tasks (e.g., complete network mapping in retina), terabyte-scale image volumes that previously would require decades of assembly can now be completed in months. Perhaps more importantly, the fusion of molecular profiling, image acquisition by SerialEM, ir-tools volume assembly, and data viewers/annotators also allows ssTEM to be used as a prospective tool for discovery in nonneural systems and a practical screening methodology for neurogenetics.
Finally, this framework provides a mechanism for parallelization of ssTEM imaging, volume assembly, and data analysis across an international user base, enhancing the productivity of a large cohort of electron microscopists.
Author Summary
Building an accurate neural network diagram of the vertebrate nervous system is a major challenge in neuroscience. Diverse groups of neurons that function together form complex patterns of connections often spanning large regions of brain tissue, with uncertain borders. Although serial-section transmission electron microscopy remains the optimal tool for fine anatomical analyses, the time and cost of the undertaking has been prohibitive. We have assembled a complete framework for ultrastructural mapping using conventional transmission electron microscopy that tremendously accelerates image analysis. This framework combines small-molecule profiling to classify cells, automated image acquisition, automated mosaic formation, automated slice-to-slice image registration, and large-scale image browsing for volume annotation. Terabyte-scale image volumes requiring decades or more to assemble manually can now be automatically built in a few months. This makes serial-section transmission electron microscopy practical for high-resolution exploration of all complex tissue systems (neural or nonneural) as well as for ultrastructural screening of genetic models.
A framework for analysis of terabyte-scale serial-section transmission electron microscopic (ssTEM) datasets overcomes computational barriers and accelerates high-resolution tissue analysis, providing a practical way of mapping complex neural circuitry and an effective screening tool for neurogenetics.
doi:10.1371/journal.pbio.1000074
PMCID: PMC2661966  PMID: 19855814
25.  Automatic categorization of diverse experimental information in the bioscience literature 
BMC Bioinformatics  2012;13:16.
Background
Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify, from all published literature, the papers that contain results for the specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest, and is therefore usually time-consuming. We developed an automatic method, based on the Support Vector Machine (SVM) machine learning method, for identifying papers containing these curation data types among a large pool of published scientific papers. This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in production use for automatic categorization of 10 different experimental data types in the biocuration process at WormBase for the past two years, and it is being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community and thereby greatly reduce the time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure that uses training papers of similar data types from different bodies of literature, such as C. elegans and D. melanogaster, to identify papers with any of these data types for a single database. This approach is significant because, for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance.
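As a rough illustration of the idea described above, the following minimal sketch trains a linear SVM on a bag-of-words representation and pools training papers from two bodies of literature. It is not the authors' implementation: the abstract does not specify features, kernel, or training procedure, so the stop-word list, the L2-normalized word counts, the bias-free hinge-loss sub-gradient trainer, the toy paper texts, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the abstract does not describe preprocessing.
STOPWORDS = {"the", "of", "a", "in", "and", "we"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def featurize(text, vocab):
    """L2-normalized word counts over a fixed vocabulary (unknown words are ignored)."""
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=50):
    """Sub-gradient descent on (lam/2)*||w||^2 + hinge loss.

    Simplifications (assumptions): no bias term, fixed learning rate,
    labels in {+1, -1}, examples visited in a fixed order.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * dot(w, xi)
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrinkage
            if margin < 1.0:  # example violates the margin: apply hinge sub-gradient
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, vocab, text):
    """Return +1 if the paper is predicted to contain the data type, else -1."""
    return 1 if dot(w, featurize(text, vocab)) > 0 else -1

# Toy, made-up "papers" labeled +1 if they report RNAi data, -1 otherwise.
worm_papers = [
    ("rnai knockdown of the gene caused embryonic lethality", 1),
    ("we review the history of nematode taxonomy", -1),
]
fly_papers = [
    ("injection of dsrna triggered rnai knockdown in embryos", 1),
    ("a survey of museum collections and taxonomy archives", -1),
]

# Pool training papers from both bodies of literature for one classifier.
training = worm_papers + fly_papers
vocab = sorted({w for text, _ in training for w in tokenize(text)})
X = [featurize(text, vocab) for text, _ in training]
y = [label for _, label in training]
w = train_linear_svm(X, y)

print(predict(w, vocab, "rnai knockdown of a kinase gene"))
print(predict(w, vocab, "a history of taxonomy"))
```

In this sketch one classifier is trained per data type, and pooling simply concatenates the labeled papers from the two corpora before building the vocabulary. A production system would more likely rely on an established SVM library and full-text features.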
Results
We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genome Informatics (MGI). It is being used in the curation workflow at WormBase for automatic association of newly published papers with ten data types: RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction.
Conclusions
Our methods are applicable to a variety of data types with training sets containing several hundred to a few thousand documents. They are completely automatic and thus can be readily incorporated into different workflows at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort.
doi:10.1186/1471-2105-13-16
PMCID: PMC3305665  PMID: 22280404
