1.  The Hippo Signaling Pathway Interactome 
Science (New York, N.Y.)  2013;342(6159):737-740.
The Hippo pathway controls metazoan organ growth by regulating cell proliferation and apoptosis. Many components have been identified, but our knowledge of the composition and structure of this pathway is still incomplete. Using existing pathway components as baits, we generated by mass spectrometry a high-confidence Drosophila Hippo protein-protein interaction network (Hippo-PPIN) consisting of 153 proteins and 204 interactions. Depletion of 67% of the proteins by RNA interference regulated the transcriptional coactivator Yorkie (Yki) either positively or negatively. We selected for further characterization a new member of the alpha-arrestin family, Leash, and show that it promotes degradation of Yki through the lysosomal pathway. Given the importance of the Hippo pathway in tumor development, the Hippo-PPIN will contribute to our understanding of this network in both normal growth and cancer.
PMCID: PMC3951131  PMID: 24114784
2.  PPIRank - an advanced method for ranking protein-protein interactions in TAP/MS data 
Proteome Science  2013;11(Suppl 1):S16.
Tandem affinity purification coupled with mass spectrometry (TAP/MS) analysis is a popular method for the large-scale identification of novel endogenous protein-protein interactions (PPIs). Computational analysis of TAP/MS data is a critical step, particularly for high-throughput datasets, yet it remains challenging due to the noisy nature of TAP/MS data.
We investigated several major TAP/MS data analysis methods for identifying PPIs and developed an advanced method that incorporates an improved statistical method to filter out false positives from the negative controls. Our method is named PPIRank, which stands for PPI ranking in TAP/MS data. We compared PPIRank with several other existing methods in analyzing two pathway-specific TAP/MS PPI datasets from Drosophila.
Experimental results show that PPIRank is more capable than other approaches at identifying known interactions collected in the BioGRID PPI database. Specifically, PPIRank captures more true interactions and, at the same time, fewer false positives in both the Insulin and Hippo pathways of Drosophila melanogaster.
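The comparison described above, in which ranked interactions are checked against a reference set of known interactions, can be sketched roughly as follows. This is only an illustration of the evaluation idea, not the PPIRank scoring method itself; the function name `known_hits_at_k` and the placeholder identifiers `P1` to `P5` are hypothetical and do not come from the paper.

```python
def known_hits_at_k(ranked_pairs, known_pairs, k):
    """Count how many of the top-k ranked interactions appear in a reference set.

    Pairs are treated as unordered, so ("P1", "P2") matches ("P2", "P1").
    """
    known = {frozenset(pair) for pair in known_pairs}
    return sum(1 for pair in ranked_pairs[:k] if frozenset(pair) in known)


# Toy data (hypothetical): interactions sorted from highest to lowest score.
ranked = [("P1", "P2"), ("P3", "P1"), ("P2", "P4"), ("P4", "P5")]
# Toy reference set standing in for a curated database of known interactions.
reference = [("P2", "P1"), ("P4", "P5")]

hits_top2 = known_hits_at_k(ranked, reference, 2)
hits_top4 = known_hits_at_k(ranked, reference, 4)
```

Under this kind of measure, a method that places more reference interactions near the top of its ranking scores higher at a given cut-off k.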
PMCID: PMC3908380  PMID: 24565074
Protein-Protein Interaction; TAP/MS; Spectral Counts
3.  A novel approach to minimize false discovery rate in genome-wide data analysis 
BMC Systems Biology  2013;7(Suppl 4):S1.
High-throughput technologies, such as DNA microarrays, have significantly advanced biological and biomedical research by enabling researchers to carry out genome-wide screens. One critical task in analyzing genome-wide datasets is to control the false discovery rate (FDR) so that the proportion of false positive features among those called significant is restrained. Recently a number of FDR control methods have been proposed and widely practiced, such as the Benjamini-Hochberg approach, the Storey approach and Significance Analysis of Microarrays (SAM).
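As a point of reference for the FDR control methods named above, the standard Benjamini-Hochberg step-up procedure can be sketched in a few lines. This is a minimal illustration of BH only, not of miFDR; the function name and the example p-values are made up for the sketch.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of features called significant at FDR level alpha.

    Step-up rule: find the largest rank k such that p_(k) <= alpha * k / m,
    then call the k features with the smallest p-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k])


# Illustrative p-values (hypothetical, not taken from any dataset).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, alpha=0.05)
```

With these ten values only the two smallest p-values pass their rank-dependent thresholds (0.005 and 0.01), so two features are called significant.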
This paper presents a straightforward yet powerful FDR control method termed miFDR, which aims to minimize FDR when calling a fixed number of significant features. We theoretically proved that the strategy used by miFDR is able to find the optimal number of significant features when the desired FDR is fixed.
We compared miFDR with the BH approach, the Storey approach and SAM on both simulated datasets and public DNA microarray datasets. The results demonstrated that miFDR outperforms the others by identifying more significant features under the same FDR cut-offs. A literature search showed that many genes called only by miFDR are indeed relevant to the underlying biology of interest.
FDR control has been widely applied to the analysis of high-throughput datasets, allowing for rapid discoveries. Under the same FDR threshold, miFDR is capable of identifying more significant features than its competitors at a comparable level of complexity. Therefore, it can potentially have a great impact on biological and biomedical research.
miFDR is available from the authors on request.
PMCID: PMC3856609  PMID: 24564975
4.  Proteomic and Functional Genomic Landscape of Receptor Tyrosine Kinase and Ras to Extracellular Signal–Regulated Kinase Signaling 
Science signaling  2011;4(196):rs10.
Characterizing the extent and logic of signaling networks is essential to understanding specificity in such physiological and pathophysiological contexts as cell fate decisions and mechanisms of oncogenesis and resistance to chemotherapy. Cell-based RNA interference (RNAi) screens enable the inference of large numbers of genes that regulate signaling pathways, but these screens cannot provide network structure directly. We describe an integrated network around the canonical receptor tyrosine kinase (RTK)–Ras–extracellular signal–regulated kinase (ERK) signaling pathway, generated by combining parallel genome-wide RNAi screens with protein-protein interaction (PPI) mapping by tandem affinity purification–mass spectrometry. We found that only a small fraction of the total number of PPI or RNAi screen hits was isolated under all conditions tested and that most of these represented the known canonical pathway components, suggesting that much of the core canonical ERK pathway is known. Because most of the newly identified regulators are likely cell type– and RTK-specific, our analysis provides a resource for understanding how output through this clinically relevant pathway is regulated in different contexts. We report in vivo roles for several of the previously unknown regulators, including CG10289 and PpV, the Drosophila orthologs of two components of the serine/threonine–protein phosphatase 6 complex; the Drosophila ortholog of TepIV, a glycophosphatidylinositol-linked protein mutated in human cancers; CG6453, a noncatalytic subunit of glucosidase II; and Rtf1, a histone methyltransferase.
PMCID: PMC3439136  PMID: 22028469
5.  Imaging analysis of clock neurons: light buffers the wake-promoting effect of dopamine 
Nature neuroscience  2011;14(7):889-895.
How animals maintain proper amounts of sleep yet remain flexible to changes in environmental conditions remains unknown. Here we showed that environmental light suppresses the wake-promoting effects of dopamine in fly brains. A subset of clock neurons, the 10 large lateral-ventral neurons (l-LNvs), are wake-promoting and respond to dopamine, octopamine as well as light. Behavioral and imaging analyses suggested that dopamine is a stronger arousal signal than octopamine. Surprisingly, light exposure not only suppressed the l-LNv responses but also synchronized responses of neighboring l-LNvs. This regulation occurred by distinct mechanisms: light-mediated suppression of octopamine responses is regulated by the circadian clock, whereas light regulation of dopamine responses occurs by upregulation of inhibitory dopamine receptors. Plasticity therefore alters the relative importance of diverse cues based on the environmental mix of stimuli. The regulatory mechanisms described here may contribute to the control of sleep stability while still allowing behavioral flexibility.
PMCID: PMC3424274  PMID: 21685918
6.  Natural Language Processing and the Oncologic History: Is There a Match? 
Journal of Oncology Practice  2011;7(4):e15-e19.
The widespread adoption of electronic health records within the oncology community is creating rich databases that contain details of the cancer care continuum. Large portions of this information are locked up in free text, but several efforts are underway to address this.
The widespread adoption of electronic health records (EHRs) is creating rich databases documenting the cancer patient's care continuum. However, much of this information, especially narrative “oncologic histories,” is “locked” within free-text (unstructured) portions of notes. Nationwide incentives, ranging from certification (Quality Oncology Practice Initiative) to monetary reimbursement (the Health Information Technology for Economic and Clinical Health Act), increasingly require the translation of these histories into treatment summaries for patient use and into tools to assist in transitions of care. Unfortunately, formulation of treatment summaries from these data is difficult and time-consuming. The rapidly developing field of automated natural language processing may offer a solution to this communication problem.
We surveyed a cross section of providers at Beth Israel Deaconess Medical Center regarding the importance of treatment summaries and whether these were being formulated on a regular basis. We also developed a program for the Informatics for Integrating Biology and the Bedside challenge, which was designed to extract meaningful information from EHRs. The program was then applied to a sample of narrative oncologic histories.
The majority of providers (86%) felt that treatment summaries were important, but only 11% actually implemented them. The most common obstacles identified were lack of time and lack of EHR tools. We demonstrated that relevant medical concepts can be automatically extracted from oncologic histories with reasonable accuracy and precision.
Natural language processing technology offers a promising method for structuring a free-text oncologic history into a compact treatment summary, creating a robust and accurate means of communication between providers and between provider and patient.
PMCID: PMC3140455  PMID: 22043196
7.  High-Content Chemical and RNAi Screens for Suppressors of Neurotoxicity in a Huntington's Disease Model 
PLoS ONE  2011;6(8):e23841.
To identify Huntington's Disease therapeutics, we conducted high-content small molecule and RNAi suppressor screens using a Drosophila primary neural culture Huntingtin model. Drosophila primary neurons offer a sensitive readout for neurotoxicity, as their neurites develop dysmorphic features in the presence of mutant polyglutamine-expanded Huntingtin compared to nonpathogenic Huntingtin. By tracking the subcellular distribution of mRFP-tagged pathogenic Huntingtin and assaying neurite branch morphology via live-imaging, we identified suppressors that could reduce Huntingtin aggregation and/or prevent the formation of dystrophic neurites. The custom algorithms we used to quantify neurite morphologies in complex cultures provide a useful tool for future high-content screening approaches focused on neurodegenerative disease models. Compounds previously found to be effective aggregation inhibitors in mammalian systems were also effective in Drosophila primary cultures, suggesting translational capacity between these models. However, we did not observe a direct correlation between the ability of a compound or gene knockdown to suppress aggregate formation and its ability to rescue dysmorphic neurites. Only a subset of aggregation inhibitors could revert dysmorphic cellular profiles. We identified lkb1, an upstream kinase in the mTOR/Insulin pathway, and four novel drugs, Camptothecin, OH-Camptothecin, 18β-Glycyrrhetinic acid, and Carbenoxolone, that were strong suppressors of mutant Huntingtin-induced neurotoxicity. Huntingtin neurotoxicity suppressors identified through our screen also restored viability in an in vivo Drosophila Huntington's Disease model, making them attractive candidates for further therapeutic evaluation.
PMCID: PMC3166080  PMID: 21909362
8.  Oligodendrocyte development and myelinogenesis are not impaired by high concentrations of phenylalanine or its metabolites 
Phenylketonuria (PKU) is a metabolic genetic disease characterized by deficient phenylalanine hydroxylase (PAH) enzymatic activity. Brain hypomyelination has been reported in untreated patients, but its mechanism remains unclear. We therefore investigated the influence of phenylalanine (Phe), phenylpyruvate (PP), and phenylacetate (PA) on oligodendrocytes. We first showed in a mouse model of PKU that the number of oligodendrocytes is not different in corpus callosum sections from adult mutants or from control brains. Then, using enriched oligodendroglial cultures, we detected no cytotoxic effect of high concentrations of Phe, PP, or PA. Finally, we analyzed the impact of Phe, PP, and PA on the myelination process in myelinating cocultures using both an in vitro index of myelination, based on activation of the myelin basic protein (MBP) promoter, and the direct quantification of myelin sheaths by both optical measurement and a bioinformatics method. None of these parameters was affected by the increased levels of Phe or its derivatives. Taken together, our data demonstrate that high levels of Phe, such as in PKU, are unlikely to directly induce brain hypomyelination, suggesting involvement of alternative mechanisms in this myelination defect.
PMCID: PMC3071566  PMID: 20151197
9.  Intelligent Interfaces for Mining Large-Scale RNAi-HCS Image Databases 
Recently, high-content screening (HCS) has been combined with RNA interference (RNAi) to become an essential image-based high-throughput method for studying genes and biological networks through RNAi-induced cellular phenotype analyses. However, a genome-wide RNAi-HCS screen typically generates tens of thousands of images, most of which remain uncategorized due to the inadequacies of existing HCS image analysis tools. Until now, highly trained scientists have had to browse a prohibitively large RNAi-HCS image database to produce only a handful of qualitative results regarding cellular morphological phenotypes. For this reason, we have developed intelligent interfaces to facilitate the application of HCS technology in biomedical research. Our new interfaces give biologists the computational power not only to explore large-scale RNAi-HCS image databases effectively and efficiently, but also to apply their knowledge and experience to interactive mining of cellular phenotypes using Content-Based Image Retrieval (CBIR) with Relevance Feedback (RF) techniques.
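One classical way to implement relevance feedback in a retrieval system is a Rocchio-style query update, sketched below. The abstract does not say which RF technique the interfaces use, so the choice of Rocchio, the function name, the weights and the toy feature vectors are all assumptions made purely for illustration.

```python
def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move a query feature vector toward images marked relevant and away
    from images marked non-relevant (Rocchio-style update)."""

    def centroid(vectors):
        # Mean feature vector; an all-zero vector if no examples were marked.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel_c, non_c)]


# Toy two-dimensional image features (hypothetical values).
query = [1.0, 0.0]
marked_relevant = [[0.0, 2.0], [0.0, 4.0]]
marked_non_relevant = [[2.0, 0.0]]
new_query = rocchio_update(query, marked_relevant, marked_non_relevant)
```

In an interactive setting, the updated query vector would be used to re-rank the image database, and the biologist would mark a new round of results.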
PMCID: PMC3028207  PMID: 21278820
10.  Automatic inference of multicellular regulatory networks using informative priors 
To fully understand the mechanisms governing animal development, computational models and algorithms are needed to enable quantitative studies of the underlying regulatory networks. We developed a mathematical model based on dynamic Bayesian networks to model multicellular regulatory networks that govern cell differentiation processes. A machine-learning method was developed to automatically infer such a model from heterogeneous data. We show that the model inference procedure can be greatly improved by incorporating interaction data across species. The proposed approach was applied to C. elegans vulval induction to reconstruct a model capable of simulating C. elegans vulval induction under 73 different genetic conditions.
PMCID: PMC3024031  PMID: 20090166
multicellular regulatory network; DBN; dynamic Bayesian network; animal development
11.  Automatic Robust Neurite Detection and Morphological Analysis of Neuronal Cell Cultures in High-content Screening 
Neuroinformatics  2010;8(2):83-100.
Cell-based high content screening (HCS) is becoming an important and increasingly favored approach in therapeutic drug discovery and functional genomics. In HCS, changes in cellular morphology and biomarker distributions provide an information-rich profile of cellular responses to experimental treatments such as small molecules or gene knockdown probes. One obstacle that currently exists with such cell-based assays is the availability of image processing algorithms that are capable of reliably and automatically analyzing large HCS image sets. HCS images of primary neuronal cell cultures are particularly challenging to analyze due to complex cellular morphology. Here we present a robust method for quantifying and statistically analyzing the morphology of neuronal cells in HCS images. The major advantages of our method over existing software lie in its capability to correct non-uniform illumination using the contrast-limited adaptive histogram equalization method; segment neuromeres using Gabor-wavelet texture analysis; and detect faint neurites by a novel phase-based neurite extraction algorithm that is invariant to changes in illumination and contrast and can accurately localize neurites. Our method was successfully applied to analyze a large HCS image set generated in a morphology screen for polyglutamine-mediated neuronal toxicity using primary neuronal cell cultures derived from embryos of a Drosophila Huntington’s Disease (HD) model.
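For readers unfamiliar with Gabor-based texture analysis, the real part of a two-dimensional Gabor kernel (a Gaussian envelope multiplied by a cosine carrier) can be generated as in the sketch below. This shows only the textbook kernel definition; the parameter values are arbitrary, and nothing here reproduces the authors' segmentation pipeline.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Real part of a 2-D Gabor kernel as a size x size list of rows.

    size is assumed to be odd so that the kernel has a central pixel.
    theta is the orientation in radians, sigma the Gaussian width,
    gamma the spatial aspect ratio and psi the phase offset.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


kernel = gabor_kernel(size=5, wavelength=4.0, theta=0.0, sigma=2.0)
```

Convolving an image with a bank of such kernels at several orientations and wavelengths gives texture responses that a segmentation step can then threshold or classify.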
PMCID: PMC3022421  PMID: 20405243
High content screening; Neurite detection; Neuromeres; Gabor filter; Phase symmetry; Huntington’s Disease
12.  DMob4/Phocein regulates synapse formation, axonal transport, and microtubule organization 
The Mob family of kinase-interacting proteins regulates cell cycle and cell morphology, and their dysfunction has been linked to cancer. Models for Mob function are largely based on studies of Mob1 and Mob2 family members in yeast. In contrast, the function of the highly conserved metazoan Phocein/Mob3 subfamily is unknown. We identified the Drosophila Phocein homolog (DMob4) as a regulator of neurite branching in a genome-wide RNAi screen for neuronal morphology mutants. To further characterize DMob4, we generated null and hypomorphic alleles and carried out in vivo cell biological and physiological analysis. We find that DMob4 plays a prominent role in neural function, regulating axonal transport, membrane excitability and organization of microtubule networks. DMob4 mutant neuromuscular synapses also show a profound overgrowth of synaptic boutons, similar to known Drosophila endocytotic mutants. DMob4 and human Phocein are >80% identical, and the lethality of DMob4 mutants can be rescued by a human phocein transgene, indicating a conservation of function across evolution. These findings suggest a novel role for Phocein proteins in the regulation of axonal transport, neurite elongation, synapse formation and microtubule organization.
PMCID: PMC2862384  PMID: 20392941
Drosophila; knockout; Axonal Transport [Axoplasmic Transport]; Microtubule; Dendrite; Synapse
13.  Comparative Analysis of Argonaute-dependent Small RNA Pathways in Drosophila 
Molecular cell  2008;32(4):592-599.
The specificity of RNAi pathways is determined by several classes of small RNAs, which include siRNAs, piRNAs, endo-siRNAs, and microRNAs (miRNAs). These small RNAs are invariably incorporated into large Argonaute (Ago)-containing effector complexes known as RNA-induced silencing complexes (RISCs), which they guide to silencing targets. Both genetic and biochemical strategies have yielded conserved molecular components of small RNA biogenesis and effector machineries. However, given the complexity of these pathways, there are likely to be additional components and regulators that remain to be uncovered. We have undertaken a comparative and comprehensive RNAi screen to identify genes that impact three major Ago-dependent small RNA pathways that operate in Drosophila S2 cells. We identify subsets of candidates that act positively or negatively in siRNA, endo-siRNA and miRNA pathways. Our studies indicate that many components are shared among all three Argonaute-dependent silencing pathways, though each is also impacted by discrete sets of genes.
PMCID: PMC2615197  PMID: 19026789
14.  Identification of Neural Outgrowth Genes using Genome-Wide RNAi 
PLoS Genetics  2008;4(7):e1000111.
While genetic screens have identified many genes essential for neurite outgrowth, they have been limited in their ability to identify neural genes that also have earlier critical roles in the gastrula, or neural genes for which maternally contributed RNA compensates for gene mutations in the zygote. To address this, we developed methods to screen the Drosophila genome using RNA-interference (RNAi) on primary neural cells and present the results of the first full-genome RNAi screen in neurons. We used live-cell imaging and quantitative image analysis to characterize the morphological phenotypes of fluorescently labelled primary neurons and glia in response to RNAi-mediated gene knockdown. From the full genome screen, we focused our analysis on 104 evolutionarily conserved genes that, when downregulated by RNAi, produce morphological defects such as reduced axon extension, excessive branching, loss of fasciculation, and blebbing. To assist in the phenotypic analysis of the large data sets, we generated image analysis algorithms that could assess the statistical significance of the mutant phenotypes. The algorithms were essential for the analysis of the thousands of images generated by the screening process and will become a valuable tool for future genome-wide screens in primary neurons. Our analysis revealed unexpected, essential roles in neurite outgrowth for genes representing a wide range of functional categories including signalling molecules, enzymes, channels, receptors, and cytoskeletal proteins. We also found that genes known to be involved in protein and vesicle trafficking showed similar RNAi phenotypes. We confirmed phenotypes of the protein trafficking genes Sec61alpha and Ran GTPase using Drosophila embryo and mouse embryonic cerebral cortical neurons, respectively.
Collectively, our results showed that RNAi phenotypes in primary neural culture can parallel in vivo phenotypes, and the screening technique can be used to identify many new genes that have important functions in the nervous system.
Author Summary
Development and function of the brain requires the coordinated action of thousands of genes, and currently we understand the roles of only a small fraction of them. Recent advances in genomics, such as the sequencing of entire genomes and the discovery of RNA-interference as a means of testing the effects of gene loss, have opened up the possibility to systematically analyze the function of all known and predicted genes in an organism. Until now, this type of functional genomics approach has not been applied to the study of very complex cells, such as the brain's neurons, on a full-genome scale. In this work, we developed techniques to test all genes, one by one in a rapid manner, for their potential role in neuronal development using neurons isolated from fruit fly embryos. These results yielded a global perspective of what types of genes are necessary for brain development; importantly, they show that a large variety of genes can be studied in this way.
PMCID: PMC2435276  PMID: 18604272
15.  GeneNotes – A novel information management software for biologists 
BMC Bioinformatics  2005;6:20.
Collecting and managing information is a challenging task in a genome-wide profiling research project. Most databases and online computational tools require direct human involvement. Information and computational results are presented in various multimedia formats (e.g., text, image, PDF, Word files, etc.), many of which cannot be automatically processed by computers in biologically meaningful ways. In addition, the quality of computational results is far from perfect and requires nontrivial manual examination. The timely selection, integration and interpretation of heterogeneous biological information still rely heavily on the judgment of biologists. Biologists often feel overwhelmed by the huge amount and great diversity of distributed heterogeneous biological information.
We developed an information management application called GeneNotes. GeneNotes is the first application that allows users to collect and manage multimedia biological information about genes/ESTs. GeneNotes provides an integrated environment for users to surf the Internet, collect notes for genes/ESTs, and retrieve notes. GeneNotes is supported by a server that integrates gene annotations from many major databases (e.g., HGNC, MGI, etc.). GeneNotes uses the integrated gene annotations to (a) identify genes given various types of gene IDs (e.g., RefSeq ID, GenBank ID, etc.), and (b) provide quick views of genes. GeneNotes is free for academic usage. The program and the tutorials are available at: .
GeneNotes provides a novel human-computer interface to help researchers collect and manage biological information. It also provides a platform for studying how users behave when they manipulate biological information. The results of such studies can lead to more intelligent human-computer interfaces that greatly shorten the cycle of biology research.
PMCID: PMC549201  PMID: 15686593

