1.  esyN: Network Building, Sharing and Publishing 
PLoS ONE  2014;9(9):e106035.
The construction and analysis of networks is increasingly widespread in biological research. We have developed esyN (“easy networks”) as a free and open source tool to facilitate the exchange of biological network models between researchers. esyN acts as a searchable database of user-created networks from any field. We have developed a simple companion web tool that enables users to view and edit networks using data from publicly available databases. Both normal interaction networks (graphs) and Petri nets can be created. In addition to its basic tools, esyN contains a number of logical templates that can be used to create models more easily. The ability to use previously published models as building blocks makes esyN a powerful tool for the construction of models and network graphs. Users are able to save their own projects online and share them either publicly or with a list of collaborators. The latter can be given the ability to edit the network themselves, allowing online collaboration on network construction. esyN is designed to facilitate unrestricted exchange of this increasingly important type of biological information. Ultimately, the aim of esyN is to bring the advantages of Open Source software development to the construction of biological networks.
PMCID: PMC4152123  PMID: 25181461
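The abstract notes that esyN can build Petri nets as well as ordinary interaction graphs. The minimal Python sketch below illustrates what a Petri net adds: places hold tokens, and transitions consume and produce them. The `PetriNet` class, its method names and the binding example are hypothetical. They do not reflect esyN's internal data model or file format.

```python
from typing import Dict, Tuple

Marking = Dict[str, int]   # place name -> token count
Arcs = Dict[str, int]      # place name -> arc weight

class PetriNet:
    """Hypothetical minimal Petri net, for illustration only (not esyN's model)."""

    def __init__(self, transitions: Dict[str, Tuple[Arcs, Arcs]]):
        # each transition name maps to (input arcs, output arcs)
        self.transitions = transitions

    def enabled(self, marking: Marking, name: str) -> bool:
        # a transition may fire only if every input place holds enough tokens
        inputs, _ = self.transitions[name]
        return all(marking.get(place, 0) >= w for place, w in inputs.items())

    def fire(self, marking: Marking, name: str) -> Marking:
        """Return the marking after firing `name`; the input marking is not modified."""
        if not self.enabled(marking, name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        new = dict(marking)
        for place, w in inputs.items():
            new[place] = new.get(place, 0) - w
        for place, w in outputs.items():
            new[place] = new.get(place, 0) + w
        return new

# Hypothetical binding reaction A + B -> AB
net = PetriNet({"bind": ({"A": 1, "B": 1}, {"AB": 1})})
m0 = {"A": 2, "B": 1, "AB": 0}
m1 = net.fire(m0, "bind")   # {'A': 1, 'B': 0, 'AB': 1}
```

Because firing requires enough tokens in every input place, a Petri net can express stoichiometry and limited resources, which a plain interaction graph cannot. Here B is used up after one firing, so `bind` is no longer enabled.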
2.  InterMine: extensive web services for modern biology 
Nucleic Acids Research  2014;42(Web Server issue):W468-W472.
InterMine ( is a biological data warehousing system providing extensive automatically generated and configurable RESTful web services that underpin the web interface and can be re-used in many other applications: to find and filter data; export it in a flexible and structured way; to upload, use, manipulate and analyze lists; to provide services for flexible retrieval of sequence segments, and for other statistical and analysis tools. Here we describe these features and discuss how they can be used separately or in combinations to support integrative and comparative analysis.
PMCID: PMC4086141  PMID: 24753429
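The abstract lists the upload, manipulation and analysis of lists among the services. The local Python sketch below illustrates the set semantics usually behind such list manipulation: union, intersection, subtraction and symmetric difference of two lists. It does not call the InterMine web services. The function name and gene identifiers are hypothetical.

```python
from typing import Iterable, List

def combine_lists(op: str, a: Iterable[str], b: Iterable[str]) -> List[str]:
    """Combine two identifier lists with a named set operation; output is sorted."""
    sa, sb = set(a), set(b)
    ops = {
        "union": sa | sb,                 # in either list
        "intersection": sa & sb,          # in both lists
        "subtract": sa - sb,              # in a but not in b
        "symmetric_difference": sa ^ sb,  # in exactly one of a, b
    }
    if op not in ops:
        raise ValueError(f"unknown operation: {op}")
    return sorted(ops[op])

# Hypothetical gene lists
list_a = ["eve", "ftz", "hb", "zen"]
list_b = ["hb", "kni", "zen"]
shared = combine_lists("intersection", list_a, list_b)   # ['hb', 'zen']
```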
3.  Identification of MicroRNAs in the Coral Stylophora pistillata 
PLoS ONE  2014;9(3):e91101.
Coral reefs are major contributors to marine biodiversity. However, they are in rapid decline due to global environmental changes such as rising sea surface temperatures, ocean acidification, and pollution. Genomic and transcriptomic analyses have broadened our understanding of coral biology, but a study of the microRNA (miRNA) repertoire of corals is missing. miRNAs constitute a class of small non-coding RNAs of ∼22 nt in size that play crucial roles in development, metabolism, and stress response in plants and animals alike. In this study, we examined the coral Stylophora pistillata for the presence of miRNAs and the corresponding core protein machinery required for their processing and function. Based on small RNA sequencing, we present evidence for 31 bona fide microRNAs, 5 of which (miR-100, miR-2022, miR-2023, miR-2030, and miR-2036) are conserved in other metazoans. Homologues of Argonaute, Piwi, Dicer, Drosha, Pasha, and HEN1 were identified in the transcriptome of S. pistillata based on strong sequence conservation with known RNAi proteins, with additional support derived from phylogenetic trees. Examination of putative miRNA gene targets indicates potential roles in development, metabolism, immunity, and biomineralisation for several of the microRNAs. Here, we present first evidence of a functional RNAi machinery and five conserved miRNAs in S. pistillata, implying that miRNAs play a role in organismal biology of scleractinian corals. Analysis of predicted miRNA target genes in S. pistillata suggests potential roles of miRNAs in symbiosis and coral calcification. Given the importance of miRNAs in regulating gene expression in other metazoans, further expression analyses of small non-coding RNAs in transcriptional studies of corals should be informative about miRNA-affected processes and pathways.
PMCID: PMC3962355  PMID: 24658574
4.  BioJS DAGViewer: A reusable JavaScript component for displaying directed graphs 
F1000Research  2014;3:51.
Summary: The DAGViewer BioJS component is a reusable JavaScript component made available as part of the BioJS project and intended to be used to display graphs of structured data, with a particular emphasis on Directed Acyclic Graphs (DAGs). It enables users to embed representations of graphs of data, such as ontologies or phylogenetic trees, in hyper-text documents (HTML). This component is generic, since it is capable (given the appropriate configuration) of displaying any kind of data that is organised as a graph. The features of this component which are useful for examining and filtering large and complex graphs are described.
PMCID: PMC3945768  PMID: 24627804
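Drawing a directed acyclic graph such as an ontology usually starts with two steps: checking that the graph really is acyclic, and assigning each node to a layer. The Python sketch below does both using Kahn's topological sort, placing each node one layer below its deepest parent. This is an assumed, generic approach for illustration. The abstract does not say which layout algorithm the DAGViewer component uses, and `layer_dag` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List

def layer_dag(edges: Dict[str, List[str]]) -> Dict[str, int]:
    """Assign each node a layer equal to the longest path from any root.

    Raises ValueError if the graph contains a cycle (i.e. it is not a DAG).
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    indegree = {n: 0 for n in nodes}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    layer = {n: 0 for n in nodes}
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))  # roots
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in edges.get(node, []):
            # a child sits at least one layer below each of its parents
            layer[child] = max(layer[child], layer[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if visited != len(nodes):
        raise ValueError("graph contains a cycle")
    return layer

# Node "c" has parents at depths 1 and 2, so it lands on layer 3.
layers = layer_dag({"root": ["a", "b"], "a": ["c"], "b": ["d"], "d": ["c"]})
```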
6.  Integrating microRNA and mRNA expression profiling in Symbiodinium microadriaticum, a dinoflagellate symbiont of reef-building corals 
BMC Genomics  2013;14:704.
Animal and plant genomes produce numerous small RNAs (smRNAs) that regulate gene expression post-transcriptionally, affecting metabolism, development, and epigenetic inheritance. In order to characterize the repertoire of endogenous smRNAs and potential gene targets in dinoflagellates, we conducted smRNA and mRNA expression profiling over 9 experimental treatments of cultures from Symbiodinium microadriaticum, a photosynthetic symbiont of scleractinian corals.
We identified a set of 21 novel smRNAs that share stringent key features with functional microRNAs from other model organisms. smRNAs were predicted independently over all 9 treatments and their putative gene targets were identified. We found 1,720 animal-like target sites in the 3'UTRs of 12,858 mRNAs and 19 plant-like target sites in 51,917 genes. We assembled a transcriptome of 58,649 genes and determined differentially expressed genes (DEGs) between treatments. Heat stress was found to produce a much larger number of DEGs than the other treatments, which yielded only a few DEGs. Analysis of DEGs also revealed that minicircle-encoded photosynthesis proteins seem to be common targets of transcriptional regulation. Furthermore, we identified the core RNAi protein machinery in Symbiodinium.
Integration of smRNA and mRNA expression profiling identified a variety of processes that could be under microRNA control, e.g. protein modification, signaling, gene expression, and response to DNA damage. Given that Symbiodinium seems to have a paucity of transcription factors and differentially expressed genes, identification and characterization of its smRNA repertoire establishes the possibility of a range of gene regulatory mechanisms in dinoflagellates acting post-transcriptionally.
PMCID: PMC3853145  PMID: 24119094
Symbiodinium; Dinoflagellates; Scleractinian corals; Symbiont; Coral reef; Small RNA (smRNA); microRNA (miRNA); Small interfering RNA (siRNA); mRNA; Expression profiling; RNAseq
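In animals, miRNA target recognition is commonly modelled by perfect pairing between the miRNA "seed" (nucleotides 2-8) and a site in the mRNA 3'UTR. The Python sketch below scans a 3'UTR for such 7-nucleotide seed matches. It is a simplified, assumed illustration of "animal-like" site detection, not the prediction pipeline used in the study. The function name and the UTR fragment are hypothetical, and let-7a is used only because it is a familiar example.

```python
from typing import List

# Map each RNA base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_sites(mirna: str, utr: str) -> List[int]:
    """0-based start positions in `utr` that pair perfectly with miRNA nucleotides 2-8."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[1:8]                          # nucleotides 2..8 (1-based) of the miRNA
    site = seed.translate(_COMPLEMENT)[::-1]   # reverse complement, written 5'->3' as on the mRNA
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]

let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7a mature sequence, used only as an example
hits = seed_sites(let7a, "AAACUACCUCAAA")   # [3]
```

Plant miRNAs, by contrast, usually pair with near-complete complementarity along their whole length, so "plant-like" sites would require a different search.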
7.  InterMOD: integrated data and tools for the unification of model organism research 
Scientific Reports  2013;3:1802.
Model organisms are widely used for understanding basic biology, and have significantly contributed to the study of human disease. In recent years, genomic analysis has provided extensive evidence of widespread conservation of gene sequence and function amongst eukaryotes, allowing insights from model organisms to help decipher gene function in a wider range of species. The InterMOD consortium is developing an infrastructure based around the InterMine data warehouse system to integrate genomic and functional data from a number of key model organisms, leading the way to improved cross-species research. So far including budding yeast, nematode worm, fruit fly, zebrafish, rat and mouse, the project has set up data warehouses, synchronized data models, and created analysis tools and links between data from different species. The project unites a number of major model organism databases, improving both the consistency and accessibility of comparative research, to the benefit of the wider scientific community.
PMCID: PMC3647165  PMID: 23652793
8.  The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies 
BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Sciences (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research.
The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and interoperability of resources. We consequently developed tools and clients for analysis and visualization.
We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies - from source provider, through middleware, to the end-consumer.
PMCID: PMC3598643  PMID: 23398680
BioHackathon; Open source; Software; Semantic Web; Databases; Data integration; Data visualization; Web services; Interfaces
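Producing Semantic Web data from an existing resource ultimately means emitting RDF triples. The Python sketch below serialises triples as N-Triples lines, one of the standard RDF syntaxes. The resource and class IRIs under `example.org` are hypothetical. `rdf:type` and `rdfs:label` are the standard W3C vocabulary terms. Literal escaping here is deliberately minimal and does not cover every character the full N-Triples grammar handles.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def ntriple(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Serialise one triple as an N-Triples line: IRIs in <...>, literals in quotes."""
    if literal:
        # escape backslashes first, then double quotes and newlines
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."

gene = "http://example.org/gene/zen"          # hypothetical resource IRI
gene_class = "http://example.org/vocab/Gene"  # hypothetical class IRI
lines = [
    ntriple(gene, RDF_TYPE, gene_class),
    ntriple(gene, RDFS_LABEL, "zen", literal=True),
]
print("\n".join(lines))
```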
9.  Biochemical Diversification through Foreign Gene Expression in Bdelloid Rotifers 
PLoS Genetics  2012;8(11):e1003035.
Bdelloid rotifers are microinvertebrates with unique characteristics: they have survived tens of millions of years without sexual reproduction; they withstand extreme desiccation by undergoing anhydrobiosis; and they tolerate very high levels of ionizing radiation. Recent evidence suggests that subtelomeric regions of the bdelloid genome contain sequences originating from other organisms by horizontal gene transfer (HGT), of which some are known to be transcribed. However, the extent to which foreign gene expression plays a role in bdelloid physiology is unknown. We address this in the first large scale analysis of the transcriptome of the bdelloid Adineta ricciae: cDNA libraries from hydrated and desiccated bdelloids were subjected to massively parallel sequencing and assembled transcripts compared against the UniProtKB database by blastx to identify their putative products. Of ∼29,000 matched transcripts, ∼10% were inferred from blastx matches to be horizontally acquired, mainly from eubacteria but also from fungi, protists, and algae. After allowing for possible sources of error, the rate of HGT is at least 8%–9%, a level significantly higher than other invertebrates. We verified their foreign nature by phylogenetic analysis and by demonstrating linkage of foreign genes with metazoan genes in the bdelloid genome. Approximately 80% of horizontally acquired genes expressed in bdelloids code for enzymes, and these represent 39% of enzymes in identified pathways. Many enzymes encoded by foreign genes enhance biochemistry in bdelloids compared to other metazoans, for example, by potentiating toxin degradation or generation of antioxidants and key metabolites. They also supplement, and occasionally potentially replace, existing metazoan functions. Bdelloid rotifers therefore express horizontally acquired genes on a scale unprecedented in animals, and foreign genes make a profound contribution to their metabolism. 
This represents a potential mechanism for ancient asexuals to adapt rapidly to changing environments and thereby persist over long evolutionary time periods in the absence of sex.
Author Summary
Bdelloid rotifers are tiny invertebrates with unusual characteristics: they withstand stresses, such as desiccation and high levels of ionising radiation, that kill other animals, and they have survived over millions of years without sexual reproduction, which contradicts theories on the evolutionary advantages of sex. In this study, we investigate another bizarre feature of bdelloids, namely their ability to acquire genes from other organisms in a process known as horizontal gene transfer (HGT). We show that HGT happens on an unprecedented scale in bdelloids: approximately 10% of active genes are “foreign,” mostly originating from bacteria and other simple organisms like fungi and algae, but now functioning as bdelloid genes. About 80% of foreign genes code for enzymes, and these make a major contribution to bdelloid biochemistry: 39% of enzyme activities have a foreign contribution, and in 23% of cases the activity in question is uniquely specified by a foreign gene. This indicates biochemistry, such as toxin degradation and antioxidant generation, that is unknown in other animals and that is expected to improve the “robustness” of the bdelloid. It also represents a possible mechanism for survival without sex, by diversification of functional capacity and even replacement of defective genes by foreign counterparts.
PMCID: PMC3499245  PMID: 23166508
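The abstract states that foreign origin was inferred from blastx matches and then verified by phylogenetics and genomic linkage. The Python sketch below shows one simplified way to turn best-hit scores into first-pass calls: compare each transcript's best non-metazoan bitscore with its best metazoan bitscore. The scoring rule, the 30-bit margin and all names are assumptions made for illustration. They are not necessarily the authors' exact criterion, and the transcript scores are invented toy values.

```python
from typing import Dict, Tuple

# Hypothetical margin (in bits) separating a confident call from an ambiguous one.
MARGIN = 30.0

def classify(best_hits: Dict[str, Tuple[float, float]], margin: float = MARGIN) -> Dict[str, str]:
    """Classify transcripts from their best blastx bitscores.

    best_hits maps a transcript ID to (best metazoan bitscore, best non-metazoan bitscore);
    use 0.0 when a transcript has no hit in one of the two groups.
    """
    calls = {}
    for tid, (metazoan, non_metazoan) in best_hits.items():
        score = non_metazoan - metazoan
        if score >= margin:
            calls[tid] = "candidate_foreign"
        elif score <= -margin:
            calls[tid] = "metazoan"
        else:
            calls[tid] = "ambiguous"
    return calls

hits = {"t1": (50.0, 400.0), "t2": (600.0, 100.0), "t3": (200.0, 210.0)}  # toy scores
calls = classify(hits)
foreign_fraction = sum(c == "candidate_foreign" for c in calls.values()) / len(calls)
```

The "ambiguous" class matters for estimates such as the 8%-9% lower bound quoted above. Candidate calls from a rule like this would still need the phylogenetic and linkage checks the abstract describes.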
10.  InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data 
Bioinformatics  2012;28(23):3163-3165.
Summary: InterMine is an open-source data warehouse system that facilitates the building of databases with complex data integration requirements and a need for a fast customizable query facility. Using InterMine, large biological databases can be created from a range of heterogeneous data sources, and the extensible data model allows for easy integration of new data types. The analysis tools include a flexible query builder, genomic region search and a library of ‘widgets’ performing various statistical analyses. The results can be exported in many commonly used formats. InterMine is a fully extensible framework where developers can add new tools and functionality. Additionally, there is a comprehensive set of web services, for which client libraries are provided in five commonly used programming languages.
Availability: Freely available from under the LGPL license.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3516146  PMID: 23023984
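The abstract does not specify which statistical analyses the widgets perform. Enrichment of a gene list against a background is a typical example, and the hypergeometric test is a standard choice for it. The sketch below assumes that kind of test and computes the one-sided p-value with Python's `math.comb`. It is not InterMine's implementation, `enrichment_p_value` is a hypothetical name, and a real analysis would also correct for testing many terms.

```python
from math import comb

def enrichment_p_value(k: int, population: int, annotated: int, list_size: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    population: number of background genes (N)
    annotated:  background genes carrying the annotation (K)
    list_size:  genes in the user's list (n)
    k:          genes in the list carrying the annotation
    """
    total = comb(population, list_size)
    upper = min(annotated, list_size)
    # math.comb returns 0 when asked for more items than exist, so impossible terms vanish
    tail = sum(
        comb(annotated, i) * comb(population - annotated, list_size - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / total

# Toy numbers: all 3 genes in a list share a term annotated to 4 of 10 background genes.
p = enrichment_p_value(3, population=10, annotated=4, list_size=3)  # 4/120
```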
11.  Multiple functionally divergent and conserved copies of alpha tubulin in bdelloid rotifers 
Bdelloid rotifers are microscopic animals that have apparently survived without sex for millions of years and are able to survive desiccation at all life stages through a process called anhydrobiosis. Both of these characteristics are believed to have played a role in shaping several unusual features of bdelloid genomes discovered in recent years. Studies into the impact of asexuality and anhydrobiosis on bdelloid genomes have focused on understanding gene copy number. Here we investigate copy number and sequence divergence in alpha tubulin. Alpha tubulin is conserved and normally present in low copy numbers in animals, but multiplication of alpha tubulin copies has occurred in animals adapted to extreme environments, such as cold-adapted Antarctic fish. Using cloning and sequencing we compared alpha tubulin copy variation in four species of bdelloid rotifers and four species of monogonont rotifers, which are facultatively sexual and cannot survive desiccation as adults. Results were verified using transcriptome data from one bdelloid species, Adineta ricciae.
In common with the typical pattern for animals, monogonont rotifers contain either one or two copies of alpha tubulin, but bdelloid species contain between 11 and 13 different copies, distributed across five classes. Approximately half of the copies form a highly conserved group that shows only 1.1% pairwise amino acid divergence, both among its members and from the monogonont copies. The other copies have divergent amino acid sequences that evolved significantly faster between classes than within them, relative to synonymous changes, and vary in predicted biochemical properties. Copies of each class were expressed under the laboratory conditions used to construct the transcriptome.
Our findings are consistent with recent evidence that bdelloids are degenerate tetraploids and that functional divergence of ancestral copies of genes has occurred, but show how further duplication events in the ancestor of bdelloids led to proliferation in both conserved and functionally divergent copies of this gene.
PMCID: PMC3464624  PMID: 22901238
Bdelloid rotifers; Gene copies; Tubulin; Evolution
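One simple reading of "1.1% pairwise amino acid divergence" is the mean p-distance over all pairs of aligned copies: the proportion of compared sites at which two sequences differ. The Python sketch below computes that quantity and skips gap columns. The abstract does not say which distance measure was used, so treat this choice as an assumption. The sequences are hypothetical toy fragments, not data from the study.

```python
from itertools import combinations
from typing import Dict

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0

def mean_pairwise_divergence(seqs: Dict[str, str]) -> float:
    """Average p-distance over every unordered pair of sequences."""
    pairs = list(combinations(seqs.values(), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

toy = {"copy1": "MRECISIHVG", "copy2": "MRECISVHVG", "copy3": "MREC-SIHVG"}
divergence_pct = 100 * mean_pairwise_divergence(toy)
```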
12.  YeastMine—an integrated data warehouse for Saccharomyces cerevisiae data as a multipurpose tool-kit 
The Saccharomyces Genome Database (SGD; provides high-quality curated genomic, genetic, and molecular information on the genes and their products of the budding yeast Saccharomyces cerevisiae. To accommodate the increasingly complex, diverse needs of researchers for searching and comparing data, SGD has implemented InterMine (, an open source data warehouse system with a sophisticated querying interface, to create YeastMine ( YeastMine is a multifaceted search and retrieval environment that provides access to diverse data types. Searches can be initiated with a list of genes, a list of Gene Ontology terms, or lists of many other data types. The results from queries can be combined for further analysis and saved or downloaded in customizable file formats. Queries themselves can be customized by modifying predefined templates or by creating a new template to access a combination of specific data types. YeastMine offers multiple scenarios in which it can be used such as a powerful search interface, a discovery tool, a curation aid and also a complex database presentation format.
Database URL:
PMCID: PMC3308152  PMID: 22434830
13.  modMine: flexible access to modENCODE data 
Nucleic Acids Research  2011;40(Database issue):D1082-D1088.
In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database ( described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) facilitate fine-grained analysis of the above data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.
PMCID: PMC3245176  PMID: 22080565
14.  Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE 
Roy, Sushmita | Ernst, Jason | Kharchenko, Peter V. | Kheradpour, Pouya | Negre, Nicolas | Eaton, Matthew L. | Landolin, Jane M. | Bristow, Christopher A. | Ma, Lijia | Lin, Michael F. | Washietl, Stefan | Arshinoff, Bradley I. | Ay, Ferhat | Meyer, Patrick E. | Robine, Nicolas | Washington, Nicole L. | Di Stefano, Luisa | Berezikov, Eugene | Brown, Christopher D. | Candeias, Rogerio | Carlson, Joseph W. | Carr, Adrian | Jungreis, Irwin | Marbach, Daniel | Sealfon, Rachel | Tolstorukov, Michael Y. | Will, Sebastian | Alekseyenko, Artyom A. | Artieri, Carlo | Booth, Benjamin W. | Brooks, Angela N. | Dai, Qi | Davis, Carrie A. | Duff, Michael O. | Feng, Xin | Gorchakov, Andrey A. | Gu, Tingting | Henikoff, Jorja G. | Kapranov, Philipp | Li, Renhua | MacAlpine, Heather K. | Malone, John | Minoda, Aki | Nordman, Jared | Okamura, Katsutomo | Perry, Marc | Powell, Sara K. | Riddle, Nicole C. | Sakai, Akiko | Samsonova, Anastasia | Sandler, Jeremy E. | Schwartz, Yuri B. | Sher, Noa | Spokony, Rebecca | Sturgill, David | van Baren, Marijke | Wan, Kenneth H. | Yang, Li | Yu, Charles | Feingold, Elise | Good, Peter | Guyer, Mark | Lowdon, Rebecca | Ahmad, Kami | Andrews, Justen | Berger, Bonnie | Brenner, Steven E. | Brent, Michael R. | Cherbas, Lucy | Elgin, Sarah C. R. | Gingeras, Thomas R. | Grossman, Robert | Hoskins, Roger A. | Kaufman, Thomas C. | Kent, William | Kuroda, Mitzi I. | Orr-Weaver, Terry | Perrimon, Norbert | Pirrotta, Vincenzo | Posakony, James W. | Ren, Bing | Russell, Steven | Cherbas, Peter | Graveley, Brenton R. | Lewis, Suzanna | Micklem, Gos | Oliver, Brian | Park, Peter J. | Celniker, Susan E. | Henikoff, Steven | Karpen, Gary H. | Lai, Eric C. | MacAlpine, David M. | Stein, Lincoln D. | White, Kevin P. | Kellis, Manolis
Science (New York, N.Y.)  2010;330(6012):1787-1797.
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
PMCID: PMC3192495  PMID: 21177974
15.  The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details 
The model organism Encyclopedia of DNA Elements (modENCODE) project is a National Human Genome Research Institute (NHGRI) initiative designed to characterize the genomes of Drosophila melanogaster and Caenorhabditis elegans. A Data Coordination Center (DCC) was created to collect, store and catalog modENCODE data. An effective DCC must gather, organize and provide all primary, interpreted and analyzed data, and ensure the community is supplied with the knowledge of the experimental conditions, protocols and verification checks used to generate each primary data set. We present here the design principles of the modENCODE DCC, and describe the ramifications of collecting thorough and deep metadata for describing experiments, including the use of a wiki for capturing protocol and reagent information, and the BIR-TAB specification for linking biological samples to experimental results. modENCODE data can be found at
Database URL:
PMCID: PMC3170170  PMID: 21856757
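The abstract mentions the BIR-TAB specification for linking biological samples to experimental results. The snippet below sketches the general idea of a tab-delimited table that ties each sample to its protocol and data file, assuming a simplified layout. Its column names, sample names and file names are hypothetical and are not taken from the BIR-TAB specification.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical, simplified sample-to-data table (tab-separated).
SAMPLE_TABLE = """Source Name\tProtocol REF\tData File
embryo_0-4h_rep1\tChIP-chip\tchip_rep1.CEL
embryo_0-4h_rep1\tinput\tinput_rep1.CEL
embryo_0-4h_rep2\tChIP-chip\tchip_rep2.CEL
"""

def files_by_sample(text: str) -> Dict[str, List[str]]:
    """Group data files under the sample each row links them to."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    links: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        links[row["Source Name"]].append(row["Data File"])
    return dict(links)

links = files_by_sample(SAMPLE_TABLE)
```

Keeping the sample, protocol and file on one row means any data file can be traced back to the biological sample and protocol that produced it.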
16.  Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project 
Gerstein, Mark B. | Lu, Zhi John | Van Nostrand, Eric L. | Cheng, Chao | Arshinoff, Bradley I. | Liu, Tao | Yip, Kevin Y. | Robilotto, Rebecca | Rechtsteiner, Andreas | Ikegami, Kohta | Alves, Pedro | Chateigner, Aurelien | Perry, Marc | Morris, Mitzi | Auerbach, Raymond K. | Feng, Xin | Leng, Jing | Vielle, Anne | Niu, Wei | Rhrissorrakrai, Kahn | Agarwal, Ashish | Alexander, Roger P. | Barber, Galt | Brdlik, Cathleen M. | Brennan, Jennifer | Brouillet, Jeremy Jean | Carr, Adrian | Cheung, Ming-Sin | Clawson, Hiram | Contrino, Sergio | Dannenberg, Luke O. | Dernburg, Abby F. | Desai, Arshad | Dick, Lindsay | Dosé, Andréa C. | Du, Jiang | Egelhofer, Thea | Ercan, Sevinc | Euskirchen, Ghia | Ewing, Brent | Feingold, Elise A. | Gassmann, Reto | Good, Peter J. | Green, Phil | Gullier, Francois | Gutwein, Michelle | Guyer, Mark S. | Habegger, Lukas | Han, Ting | Henikoff, Jorja G. | Henz, Stefan R. | Hinrichs, Angie | Holster, Heather | Hyman, Tony | Iniguez, A. Leo | Janette, Judith | Jensen, Morten | Kato, Masaomi | Kent, W. James | Kephart, Ellen | Khivansara, Vishal | Khurana, Ekta | Kim, John K. | Kolasinska-Zwierz, Paulina | Lai, Eric C. | Latorre, Isabel | Leahey, Amber | Lewis, Suzanna | Lloyd, Paul | Lochovsky, Lucas | Lowdon, Rebecca F. | Lubling, Yaniv | Lyne, Rachel | MacCoss, Michael | Mackowiak, Sebastian D. | Mangone, Marco | McKay, Sheldon | Mecenas, Desirea | Merrihew, Gennifer | Miller, David M. | Muroyama, Andrew | Murray, John I. | Ooi, Siew-Loon | Pham, Hoang | Phippen, Taryn | Preston, Elicia A. | Rajewsky, Nikolaus | Rätsch, Gunnar | Rosenbaum, Heidi | Rozowsky, Joel | Rutherford, Kim | Ruzanov, Peter | Sarov, Mihail | Sasidharan, Rajkumar | Sboner, Andrea | Scheid, Paul | Segal, Eran | Shin, Hyunjin | Shou, Chong | Slack, Frank J. | Slightam, Cindie | Smith, Richard | Spencer, William C. | Stinson, E. O. | Taing, Scott | Takasaki, Teruaki | Vafeados, Dionne | Voronina, Ksenia | Wang, Guilin | Washington, Nicole L. | Whittle, Christina M. | Wu, Beijing | Yan, Koon-Kiu | Zeller, Georg | Zha, Zheng | Zhong, Mei | Zhou, Xingliang | Ahringer, Julie | Strome, Susan | Gunsalus, Kristin C. | Micklem, Gos | Liu, X. Shirley | Reinke, Valerie | Kim, Stuart K. | Hillier, LaDeana W. | Henikoff, Steven | Piano, Fabio | Snyder, Michael | Stein, Lincoln | Lieb, Jason D. | Waterston, Robert H.
Science (New York, N.Y.)  2010;330(6012):1775-1787.
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
PMCID: PMC3142569  PMID: 21177976
17.  The impact of quantitative optimization of hybridization conditions on gene expression analysis 
BMC Bioinformatics  2011;12:73.
With the growing availability of entire genome sequences, an increasing number of scientists can exploit oligonucleotide microarrays for genome-scale expression studies. While probe-design is a major research area, relatively little work has been reported on the optimization of microarray protocols.
As shown in this study, suboptimal conditions can have considerable impact on biologically relevant observations. For example, deviation from the optimal temperature by one degree Celsius led to a loss of up to 44% of the differentially expressed genes identified. While genes from thousands of Gene Ontology categories were affected, transcription factors and other low-copy-number regulators were disproportionately lost. Calibrated protocols are thus required in order to take full advantage of the large dynamic range of microarrays.
For an objective optimization of protocols we introduce an approach that maximizes the amount of information obtained per experiment. A comparison of two typical samples is sufficient for this calibration. We can ensure, however, that optimization results are independent of the samples and the specific measures used for calibration. Both simulations and spike-in experiments confirmed an unbiased determination of generally optimal experimental conditions.
Well-calibrated hybridization conditions are thus easily achieved and necessary for the efficient detection of differential expression. They are essential for the sensitive profiling of low-copy-number molecules. This is particularly critical for studies of transcription factor expression, or the inference and study of regulatory networks.
PMCID: PMC3065421  PMID: 21401920
18.  Poly(A) Signals Located near the 5′ End of Genes Are Silenced by a General Mechanism That Prevents Premature 3′-End Processing
Molecular and Cellular Biology  2010;31(4):639-651.
Poly(A) signals located at the 3′ end of eukaryotic genes drive cleavage and polyadenylation at the same end of pre-mRNA. Although these sequences are expected only at the 3′ end of genes, we found that strong poly(A) signals are also predicted within the 5′ untranslated regions (UTRs) of many Drosophila melanogaster mRNAs. Most of these 5′ poly(A) signals have little influence on the processing of the endogenous transcripts, but they are very active when placed at the 3′ end of reporter genes. In investigating these unexpected observations, we discovered that both these novel poly(A) signals and standard poly(A) signals become functionally silent when they are positioned close to transcription start sites in either Drosophila or human cells. This indicates that the stage when the poly(A) signal emerges from the polymerase II (Pol II) transcription complex determines whether a putative poly(A) signal is recognized as functional. The data suggest that this mechanism, which probably prevents cryptic poly(A) signals from causing premature transcription termination, depends on low Ser2 phosphorylation of the C-terminal domain of Pol II and inefficient recruitment of processing factors.
PMCID: PMC3028650  PMID: 21135120
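The canonical poly(A) signal is the hexamer AAUAAA (AATAAA in DNA), and AUUAAA is its most common variant. The Python sketch below scans a sequence, such as a 5′ UTR, for these two hexamers and reports overlapping occurrences too. It is only a motif scan under that assumption. The study's prediction of "strong" poly(A) signals is not described in the abstract and may use additional sequence features. `find_pas` and the test sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Canonical poly(A) signal hexamer and its most common variant (DNA alphabet).
PAS_HEXAMERS = ("AATAAA", "ATTAAA")

def find_pas(seq: str) -> List[Tuple[int, str]]:
    """Return sorted (0-based position, hexamer) pairs for every match in `seq`."""
    seq = seq.upper().replace("U", "T")   # accept RNA or DNA input
    hits = []
    for hexamer in PAS_HEXAMERS:
        # a zero-width lookahead reports overlapping occurrences as well
        for m in re.finditer(f"(?={hexamer})", seq):
            hits.append((m.start(), hexamer))
    return sorted(hits)

hits = find_pas("GCAATAAAATTAAAGC")   # [(2, 'AATAAA'), (8, 'ATTAAA')]
```

A position-aware variant could report how far each hit lies from the transcription start site. The abstract's finding is that such signals are functionally silent when they lie close to it.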
19.  Unlocking the secrets of the genome 
Nature  2009;459(7249):927-930.
Despite the successes of genomics, little is known about how genetic information produces complex organisms. A look at the crucial functional elements of fly and worm genomes could change that.
PMCID: PMC2843545  PMID: 19536255
20.  Identification and analysis of serpin-family genes by homology and synteny across the 12 sequenced Drosophilid genomes 
BMC Genomics  2009;10:489.
The Drosophila melanogaster genome contains 29 serpin genes, 12 as single transcripts and 17 within 6 gene clusters. Many of these serpins have a conserved "hinge" motif characteristic of active proteinase inhibitors. However, a substantial proportion (42%) lacks this motif and represents non-inhibitory serpin-fold proteins of unknown function. Currently, it is not known whether orthologous, inhibitory serpin genes retain the same target proteinase specificity within the Drosophilid lineage, nor whether they give rise to non-inhibitory serpin-fold proteins or other, more diverged, proteins.
We collated 188 orthologues of the D. melanogaster serpins from the other 11 Drosophilid genomes and used synteny to find further family members, raising the total to 226, or 71% of the number of orthologues expected assuming complete conservation across all 12 Drosophilid species. In general, the sequence constraints on the serpin-fold itself are loose. The critical Reactive Centre Loop (RCL) sequence, including the target proteinase cleavage site, is strongly conserved in inhibitory serpins, although there are 3 exceptional sets of orthologues in which the evolutionary constraints are looser. Conversely, the RCL of non-inhibitory serpin orthologues is less conserved, with 3 exceptions that presumably bind to conserved partner molecules. We derive a consensus hinge motif for Drosophilid inhibitory serpins, which differs somewhat from that of the vertebrate consensus. Three gene clusters, Spn28D, Spn77B and Spn88E, appear to have originated in the melanogaster subgroup, each containing one inhibitory serpin orthologue that is present in all Drosophilids. In addition, the Spn100A transcript appears to represent a novel serpin-derived fold.
In general, inhibitory serpins rarely change their range of proteinase targets, except by a duplication/divergence mechanism. Non-inhibitory serpins appear to derive from inhibitory serpins, but not the reverse. The conservation of different family members varied widely across the 12 sequenced Drosophilid genomes. An approach considering synteny as well as homology was important to find the largest set of orthologues.
PMCID: PMC2770083  PMID: 19849829
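The 71% figure can be checked with simple arithmetic. This rests on an inference not stated in the abstract: that the expected total is one orthologue of each of the 29 D. melanogaster serpins in each of the other 11 Drosophilid genomes.

```latex
\text{expected} = 29 \times 11 = 319, \qquad \frac{226}{319} \approx 0.708 \approx 71\%
```

Under the same assumption, the 188 orthologues found by homology alone correspond to about 59% (188/319), so the synteny step added 38 genes.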
21.  FlyMine: an integrated database for Drosophila and Anopheles genomics 
Genome Biology  2007;8(7):R129.
This novel web-based database provides unique accessibility and querying of integrated genomic and proteomic data for Drosophila and Anopheles.
FlyMine is a data warehouse that addresses one of the important challenges of modern biology: how to integrate and make use of the diversity and volume of current biological data. Its main focus is genomic and proteomic data for Drosophila and other insects. It provides web access to integrated data at a number of different levels, from simple browsing to construction of complex queries, which can be executed on either single items or lists.
PMCID: PMC2323218  PMID: 17615057

