The construction and analysis of networks is increasingly widespread in biological research. We have developed esyN (“easy networks”) as a free and open source tool to facilitate the exchange of biological network models between researchers. esyN acts as a searchable database of user-created networks from any field. We have developed a simple companion web tool that enables users to view and edit networks using data from publicly available databases. Both normal interaction networks (graphs) and Petri nets can be created. In addition to its basic tools, esyN contains a number of logical templates that can be used to create models more easily. The ability to use previously published models as building blocks makes esyN a powerful tool for the construction of models and network graphs. Users are able to save their own projects online and share them either publicly or with a list of collaborators. The latter can be given the ability to edit the network themselves, allowing online collaboration on network construction. esyN is designed to facilitate unrestricted exchange of this increasingly important type of biological information. Ultimately, the aim of esyN is to bring the advantages of Open Source software development to the construction of biological networks.
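esyN supports Petri nets as well as plain interaction graphs. For readers unfamiliar with the formalism, here is a minimal Python sketch of the standard Petri net firing rule. A transition is enabled when every input place holds at least as many tokens as its arc weight. Firing it consumes those tokens and deposits tokens in the output places. The class, place and transition names are hypothetical, and this is not esyN's internal representation or file format.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    # marking: number of tokens currently held by each place
    marking: dict[str, int]
    # transitions: name -> (input arc weights, output arc weights)
    transitions: dict[str, tuple[dict[str, int], dict[str, int]]] = field(default_factory=dict)

    def enabled(self, name: str) -> bool:
        """A transition is enabled if every input place holds at least its arc weight in tokens."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) >= w for place, w in inputs.items())

    def fire(self, name: str) -> None:
        """Consume tokens from input places and deposit tokens in output places."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place, w in inputs.items():
            self.marking[place] -= w
        for place, w in outputs.items():
            self.marking[place] = self.marking.get(place, 0) + w


# Hypothetical example: enzyme E binds substrate S to form complex ES,
# which then releases product P and regenerates E.
net = PetriNet(
    marking={"E": 1, "S": 2, "ES": 0, "P": 0},
    transitions={
        "bind": ({"E": 1, "S": 1}, {"ES": 1}),
        "release": ({"ES": 1}, {"E": 1, "P": 1}),
    },
)
net.fire("bind")
net.fire("release")
print(net.marking)  # {'E': 1, 'S': 1, 'ES': 0, 'P': 1}
```

Unlike a plain interaction graph, the marking gives the model a state that changes as transitions fire. This is what makes Petri nets suitable for simulating simple dynamics.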
InterMine (www.intermine.org) is a biological data warehousing system providing extensive, automatically generated and configurable RESTful web services that underpin the web interface and can be re-used in many other applications: to find and filter data; to export it in a flexible and structured way; to upload, use, manipulate and analyze lists; to retrieve sequence segments flexibly; and to provide other statistical and analysis tools. Here we describe these features and discuss how they can be used separately or in combination to support integrative and comparative analysis.
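The following sketch shows how such services can be re-used outside the web interface. It builds a GET request URL for a query-results endpoint from a small PathQuery-style XML query. Several details are assumptions based on common InterMine deployments and should be checked against the documentation of the specific mine: the service root URL, the `/query/results` path, the `query` and `format` parameter names, and the `Gene.symbol` view. The officially supported route is the provided client libraries; this only illustrates the shape of a request.

```python
from urllib.parse import urlencode
from xml.etree.ElementTree import Element, SubElement, tostring


def build_query_xml(view: list[str], constraints: list[tuple[str, str, str]]) -> str:
    """Serialise a simple PathQuery-style XML document (format assumed for illustration)."""
    query = Element("query", model="genomic", view=" ".join(view))
    for path, op, value in constraints:
        SubElement(query, "constraint", path=path, op=op, value=value)
    return tostring(query, encoding="unicode")


def results_url(service_root: str, query_xml: str, fmt: str = "tab") -> str:
    """Compose a GET URL for the (assumed) query-results endpoint."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{service_root.rstrip('/')}/query/results?{params}"


xml = build_query_xml(["Gene.symbol", "Gene.primaryIdentifier"],
                      [("Gene.symbol", "=", "eve")])
url = results_url("https://www.flymine.org/flymine/service", xml)
print(url)
```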
Coral reefs are major contributors to marine biodiversity. However, they are in rapid decline due to global environmental changes such as rising sea surface temperatures, ocean acidification, and pollution. Genomic and transcriptomic analyses have broadened our understanding of coral biology, but a study of the microRNA (miRNA) repertoire of corals is missing. miRNAs constitute a class of small non-coding RNAs of ∼22 nt in size that play crucial roles in development, metabolism, and stress response in plants and animals alike. In this study, we examined the coral Stylophora pistillata for the presence of miRNAs and the corresponding core protein machinery required for their processing and function. Based on small RNA sequencing, we present evidence for 31 bona fide microRNAs, 5 of which (miR-100, miR-2022, miR-2023, miR-2030, and miR-2036) are conserved in other metazoans. Homologues of Argonaute, Piwi, Dicer, Drosha, Pasha, and HEN1 were identified in the transcriptome of S. pistillata based on strong sequence conservation with known RNAi proteins, with additional support derived from phylogenetic trees. Examination of putative miRNA gene targets indicates potential roles in development, metabolism, immunity, and biomineralisation for several of the microRNAs. Here, we present the first evidence of a functional RNAi machinery and five conserved miRNAs in S. pistillata, implying that miRNAs play a role in the organismal biology of scleractinian corals. Analysis of predicted miRNA target genes in S. pistillata suggests potential roles of miRNAs in symbiosis and coral calcification. Given the importance of miRNAs in regulating gene expression in other metazoans, further expression analyses of small non-coding RNAs in transcriptional studies of corals should be informative about miRNA-affected processes and pathways.
Animal and plant genomes produce numerous small RNAs (smRNAs) that regulate gene expression post-transcriptionally, affecting metabolism, development, and epigenetic inheritance. In order to characterize the repertoire of endogenous smRNAs and potential gene targets in dinoflagellates, we conducted smRNA and mRNA expression profiling over 9 experimental treatments of cultures from Symbiodinium microadriaticum, a photosynthetic symbiont of scleractinian corals.
We identified a set of 21 novel smRNAs that share stringent key features with functional microRNAs from other model organisms. smRNAs were predicted independently over all 9 treatments and their putative gene targets were identified. We found 1,720 animal-like target sites in the 3'UTRs of 12,858 mRNAs and 19 plant-like target sites in 51,917 genes. We assembled a transcriptome of 58,649 genes and determined differentially expressed genes (DEGs) between treatments. Heat stress produced a much larger number of DEGs than the other treatments, which yielded only a few DEGs. Analysis of DEGs also revealed that minicircle-encoded photosynthesis proteins seem to be common targets of transcriptional regulation. Furthermore, we identified the core RNAi protein machinery in Symbiodinium.
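The abstract does not spell out how target sites were called. As a hedged illustration of the "animal-like" case, the sketch below finds 7mer seed matches. These are 3'UTR sites complementary to miRNA nucleotides 2 to 8, which is the usual core criterion for animal miRNA targets. Plant-like sites typically require near-complete complementarity along the whole miRNA and are not covered here. The function names and the let-7-like toy sequences are hypothetical.

```python
def revcomp_rna_to_dna(rna: str) -> str:
    """Reverse-complement an RNA sequence, writing the result in the DNA alphabet."""
    pairs = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna.upper()))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions in a DNA-alphabet 3'UTR that are
    complementary to miRNA nucleotides 2-8 (a 7mer seed match)."""
    seed = mirna[1:8]                 # nucleotides 2-8, counting from 1
    site = revcomp_rna_to_dna(seed)   # sequence the UTR must contain
    utr = utr.upper()
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


# Toy data: a let-7-like miRNA and a made-up UTR with two seed matches.
print(seed_sites("UGAGGUAGUAGGUUGUAUAGUU", "AAACTACCTCAAAGGCTACCTCTT"))  # [3, 15]
```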
Integration of smRNA and mRNA expression profiling identified a variety of processes that could be under microRNA control, e.g. protein modification, signaling, gene expression, and response to DNA damage. Given that Symbiodinium seems to have a paucity of transcription factors and differentially expressed genes, identification and characterization of its smRNA repertoire raises the possibility that a range of post-transcriptional gene regulatory mechanisms operates in dinoflagellates.
Symbiodinium; Dinoflagellates; Scleractinian corals; Symbiont; Coral reef; Small RNA (smRNA); microRNA (miRNA); Small interfering RNA (siRNA); mRNA; Expression profiling; RNAseq
Model organisms are widely used for understanding basic biology, and have significantly contributed to the study of human disease. In recent years, genomic analysis has provided extensive evidence of widespread conservation of gene sequence and function amongst eukaryotes, allowing insights from model organisms to help decipher gene function in a wider range of species. The InterMOD consortium is developing an infrastructure based around the InterMine data warehouse system to integrate genomic and functional data from a number of key model organisms, leading the way to improved cross-species research. The project, which so far includes budding yeast, nematode worm, fruit fly, zebrafish, rat and mouse, has set up data warehouses, synchronized data models, and created analysis tools and links between data from different species. The project unites a number of major model organism databases, improving both the consistency and accessibility of comparative research, to the benefit of the wider scientific community.
BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Sciences (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research.
The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and interoperability of resources. We consequently developed tools and clients for analysis and visualization.
We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies, from source provider, through middleware, to the end-consumer.
BioHackathon; Open source; Software; Semantic Web; Databases; Data integration; Data visualization; Web services; Interfaces
Bdelloid rotifers are microinvertebrates with unique characteristics: they have survived tens of millions of years without sexual reproduction; they withstand extreme desiccation by undergoing anhydrobiosis; and they tolerate very high levels of ionizing radiation. Recent evidence suggests that subtelomeric regions of the bdelloid genome contain sequences originating from other organisms by horizontal gene transfer (HGT), some of which are known to be transcribed. However, the extent to which foreign gene expression plays a role in bdelloid physiology is unknown. We address this in the first large scale analysis of the transcriptome of the bdelloid Adineta ricciae: cDNA libraries from hydrated and desiccated bdelloids were subjected to massively parallel sequencing and assembled transcripts compared against the UniProtKB database by blastx to identify their putative products. Of ∼29,000 matched transcripts, ∼10% were inferred from blastx matches to be horizontally acquired, mainly from eubacteria but also from fungi, protists, and algae. After allowing for possible sources of error, the rate of HGT is at least 8%–9%, a level significantly higher than in other invertebrates. We verified their foreign nature by phylogenetic analysis and by demonstrating linkage of foreign genes with metazoan genes in the bdelloid genome. Approximately 80% of horizontally acquired genes expressed in bdelloids code for enzymes, and these represent 39% of enzymes in identified pathways. Many enzymes encoded by foreign genes enhance biochemistry in bdelloids compared to other metazoans, for example, by potentiating toxin degradation or generation of antioxidants and key metabolites. They also supplement, and occasionally potentially replace, existing metazoan functions. Bdelloid rotifers therefore express horizontally acquired genes on a scale unprecedented in animals, and foreign genes make a profound contribution to their metabolism.
This represents a potential mechanism for ancient asexuals to adapt rapidly to changing environments and thereby persist over long evolutionary time periods in the absence of sex.
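One way to make "inferred from blastx matches" concrete is a bitscore-difference index. The index is the best bitscore against non-metazoan proteins minus the best bitscore against metazoan proteins, and a transcript is flagged as a foreign candidate when the difference exceeds a cut-off. The sketch below is hedged throughout. The 30-bit threshold, the `Hit` record layout and the example scores are hypothetical. This abstract does not state the exact criteria used in the study, and flagged candidates would still need the phylogenetic and linkage checks described above.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    subject: str       # UniProtKB accession of the blastx hit
    bitscore: float    # blastx bitscore
    is_metazoan: bool  # whether the hit comes from an animal


def hgt_index(hits: Iterable[Hit]) -> float:
    """Best non-metazoan bitscore minus best metazoan bitscore (0 for an empty group)."""
    hits = list(hits)  # allow generators: the hits are scanned twice
    best_in = max((h.bitscore for h in hits if h.is_metazoan), default=0.0)
    best_out = max((h.bitscore for h in hits if not h.is_metazoan), default=0.0)
    return best_out - best_in


def classify(hits: Iterable[Hit], threshold: float = 30.0) -> str:
    """Flag a transcript as a foreign candidate if its index reaches the (assumed) threshold."""
    return "foreign candidate" if hgt_index(hits) >= threshold else "metazoan-like"


# Hypothetical hits for one transcript: a strong bacterial match, weaker animal match.
hits = [Hit("A0A000", 250.0, False), Hit("B0B000", 120.0, True)]
print(hgt_index(hits), classify(hits))
```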
Bdelloid rotifers are tiny invertebrates with unusual characteristics: they withstand stresses, such as desiccation and high levels of ionising radiation, that kill other animals, and they have survived for millions of years without sexual reproduction, which contradicts theories on the evolutionary advantages of sex. In this study, we investigate another bizarre feature of bdelloids, namely their ability to acquire genes from other organisms in a process known as horizontal gene transfer (HGT). We show that HGT happens on an unprecedented scale in bdelloids: approximately 10% of active genes are “foreign,” mostly originating from bacteria and other simple organisms like fungi and algae, but now functioning as bdelloid genes. About 80% of foreign genes code for enzymes, and these make a major contribution to bdelloid biochemistry: 39% of enzyme activities have a foreign contribution, and in 23% of cases the activity in question is uniquely specified by a foreign gene. This points to biochemical capabilities, such as toxin degradation and antioxidant generation, that are unknown in other animals and that are expected to improve the “robustness” of the bdelloid. It also represents a possible mechanism for survival without sex, by diversification of functional capacity and even replacement of defective genes by foreign counterparts.
Summary: InterMine is an open-source data warehouse system that facilitates the building of databases with complex data integration requirements and a need for a fast customizable query facility. Using InterMine, large biological databases can be created from a range of heterogeneous data sources, and the extensible data model allows for easy integration of new data types. The analysis tools include a flexible query builder, genomic region search and a library of ‘widgets’ performing various statistical analyses. The results can be exported in many commonly used formats. InterMine is a fully extensible framework where developers can add new tools and functionality. Additionally, there is a comprehensive set of web services, for which client libraries are provided in five commonly used programming languages.
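The statistical widgets mentioned above include enrichment analyses of lists. A common statistic for this kind of analysis is the one-sided hypergeometric test. It asks how likely it is to see at least k annotated genes in a list of n, drawn from a background of N genes of which K carry the annotation. The sketch below is a generic implementation using only the standard library. It is not InterMine's code, and it omits the multiple-testing correction that would normally be applied across many terms.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background population
    K: background genes annotated with the term
    n: genes in the user's list
    k: list genes annotated with the term
    """
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy numbers: 10 background genes, 4 annotated; a list of 3 contains 2 annotated.
print(hypergeom_sf(2, 10, 4, 3))
```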
Availability: Freely available from http://www.intermine.org under the LGPL license.
Supplementary data are available at Bioinformatics online.
Bdelloid rotifers are microscopic animals that have apparently survived without sex for millions of years and are able to survive desiccation at all life stages through a process called anhydrobiosis. Both of these characteristics are believed to have played a role in shaping several unusual features of bdelloid genomes discovered in recent years. Studies into the impact of asexuality and anhydrobiosis on bdelloid genomes have focused on understanding gene copy number. Here we investigate copy number and sequence divergence in alpha tubulin. Alpha tubulin is conserved and normally present in low copy numbers in animals, but multiplication of alpha tubulin copies has occurred in animals adapted to extreme environments, such as cold-adapted Antarctic fish. Using cloning and sequencing, we compared alpha tubulin copy variation in four species of bdelloid rotifers and four species of monogonont rotifers, which are facultatively sexual and cannot survive desiccation as adults. Results were verified using transcriptome data from one bdelloid species, Adineta ricciae.
In common with the typical pattern for animals, monogonont rotifers contain either one or two copies of alpha tubulin, but bdelloid species contain between 11 and 13 different copies, distributed across five classes. Approximately half of the copies form a highly conserved group whose members show only 1.1% pairwise amino acid divergence from each other and from the monogonont copies. The other copies have divergent amino acid sequences that evolved significantly faster between classes than within them, relative to synonymous changes, and vary in predicted biochemical properties. Copies of each class were expressed under the laboratory conditions used to construct the transcriptome.
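Pairwise amino acid divergence can be illustrated with the uncorrected p-distance, the proportion of aligned positions at which two sequences differ. The sketch below assumes sequences that are already aligned and skips any column where either sequence has a gap. Whether the study used this measure, a corrected distance, or a different gap treatment is not stated in the abstract. The copy names and peptide fragments are made-up toy data, not real bdelloid tubulin sequences.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in compared) / len(compared)


def mean_pairwise_distance(seqs: dict[str, str]) -> float:
    """Average p-distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(seqs.values(), 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned fragments for three gene copies.
copies = {"copy1": "MREVISIHVG", "copy2": "MREVISIHIG", "copy3": "MRECISIHVG"}
print(f"{mean_pairwise_distance(copies):.1%}")
```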
Our findings are consistent with recent evidence that bdelloids are degenerate tetraploids and that functional divergence of ancestral copies of genes has occurred, but show how further duplication events in the ancestor of bdelloids led to proliferation in both conserved and functionally divergent copies of this gene.
Bdelloid rotifers; Gene copies; Tubulin; Evolution
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) provides high-quality curated genomic, genetic, and molecular information on the genes and gene products of the budding yeast Saccharomyces cerevisiae. To accommodate the increasingly complex, diverse needs of researchers for searching and comparing data, SGD has implemented InterMine (http://www.InterMine.org), an open source data warehouse system with a sophisticated querying interface, to create YeastMine (http://yeastmine.yeastgenome.org). YeastMine is a multifaceted search and retrieval environment that provides access to diverse data types. Searches can be initiated with a list of genes, a list of Gene Ontology terms, or lists of many other data types. The results from queries can be combined for further analysis and saved or downloaded in customizable file formats. Queries themselves can be customized by modifying predefined templates or by creating a new template to access a combination of specific data types. YeastMine can be used in multiple ways: as a powerful search interface, a discovery tool, a curation aid, and a complex database presentation format.
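Combining query results usually amounts to set operations on lists of identifiers. The sketch below shows union, intersection, subtraction and symmetric difference with Python sets. The operation names, the `combine_lists` helper and the gene lists are hypothetical illustrations, not YeastMine's interface. Real lists would hold stable database identifiers rather than bare symbols.

```python
import operator

# Hypothetical operation names mapped to the corresponding set operators.
OPERATIONS = {
    "union": operator.or_,                 # in either list
    "intersect": operator.and_,            # in both lists
    "subtract": operator.sub,              # in the first list but not the second
    "symmetric_difference": operator.xor,  # in exactly one of the two lists
}


def combine_lists(a, b, operation: str) -> set[str]:
    """Combine two identifier lists with a named set operation."""
    try:
        op = OPERATIONS[operation]
    except KeyError:
        raise ValueError(f"unknown operation {operation!r}") from None
    return op(set(a), set(b))


# Illustrative gene lists, e.g. the results of two earlier queries.
list_a = ["HSP104", "SSA1", "HSP82"]
list_b = ["SSA1", "HSP82", "CCT2"]
print(sorted(combine_lists(list_a, list_b, "intersect")))
```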
In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database (http://intermine.modencode.org) described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) carry out fine-grained analysis of these data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
The model organism Encyclopedia of DNA Elements (modENCODE) project is a National Human Genome Research Institute (NHGRI) initiative designed to characterize the genomes of Drosophila melanogaster and Caenorhabditis elegans. A Data Coordination Center (DCC) was created to collect, store and catalog modENCODE data. An effective DCC must gather, organize and provide all primary, interpreted and analyzed data, and ensure the community is supplied with the knowledge of the experimental conditions, protocols and verification checks used to generate each primary data set. We present here the design principles of the modENCODE DCC, and describe the ramifications of collecting thorough and deep metadata for describing experiments, including the use of a wiki for capturing protocol and reagent information, and the BIR-TAB specification for linking biological samples to experimental results. modENCODE data can be found at http://www.modencode.org.
Database URL: http://www.modencode.org.
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
With the growing availability of entire genome sequences, an increasing number of scientists can exploit oligonucleotide microarrays for genome-scale expression studies. While probe design is a major research area, relatively little work has been reported on the optimization of microarray protocols.
As shown in this study, suboptimal conditions can have considerable impact on biologically relevant observations. For example, deviation from the optimal temperature by one degree Celsius led to a loss of up to 44% of differentially expressed genes identified. While genes from thousands of Gene Ontology categories were affected, transcription factors and other low-copy-number regulators were disproportionately lost. Calibrated protocols are thus required in order to take full advantage of the large dynamic range of microarrays.
For an objective optimization of protocols we introduce an approach that maximizes the amount of information obtained per experiment. A comparison of two typical samples is sufficient for this calibration. We can ensure, however, that optimization results are independent of the samples and the specific measures used for calibration. Both simulations and spike-in experiments confirmed an unbiased determination of generally optimal experimental conditions.
Well-calibrated hybridization conditions are thus easily achieved and necessary for the efficient detection of differential expression. They are essential for the sensitive profiling of low-copy-number molecules. This is particularly critical for studies of transcription factor expression, or the inference and study of regulatory networks.
Poly(A) signals located at the 3′ end of eukaryotic genes drive cleavage and polyadenylation at the same end of pre-mRNA. Although these sequences are expected only at the 3′ end of genes, we found that strong poly(A) signals are also predicted within the 5′ untranslated regions (UTRs) of many Drosophila melanogaster mRNAs. Most of these 5′ poly(A) signals have little influence on the processing of the endogenous transcripts, but they are very active when placed at the 3′ end of reporter genes. In investigating these unexpected observations, we discovered that both these novel poly(A) signals and standard poly(A) signals become functionally silent when they are positioned close to transcription start sites in either Drosophila or human cells. This indicates that the stage when the poly(A) signal emerges from the polymerase II (Pol II) transcription complex determines whether a putative poly(A) signal is recognized as functional. The data suggest that this mechanism, which probably prevents cryptic poly(A) signals from causing premature transcription termination, depends on low Ser2 phosphorylation of the C-terminal domain of Pol II and inefficient recruitment of processing factors.
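The abstract does not describe how poly(A) signals were predicted. As a simple illustration only, the sketch below scans a sequence for the canonical AATAAA hexamer and its common ATTAAA variant. It reports overlapping hits and their distance from the start of the sequence. Treating the first base as the transcription start site, the `max_distance` cut-off and the toy sequence are all hypothetical. Real predictors also weigh downstream GU-rich elements and other sequence context.

```python
import re

# Canonical poly(A) signal hexamer and its most common variant (DNA alphabet).
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas(seq: str, hexamers=PAS_HEXAMERS) -> list[tuple[int, str]]:
    """Return sorted (0-based position, hexamer) pairs, including overlapping hits."""
    seq = seq.upper().replace("U", "T")
    hits = []
    for hexamer in hexamers:
        # A zero-width lookahead lets overlapping occurrences be reported.
        for m in re.finditer(f"(?={hexamer})", seq):
            hits.append((m.start(), hexamer))
    return sorted(hits)


def near_start(hits: list[tuple[int, str]], max_distance: int) -> list[tuple[int, str]]:
    """Keep hits starting within max_distance bases of position 0 (assumed TSS)."""
    return [h for h in hits if h[0] <= max_distance]


hits = find_pas("GCAAUAAAGGAUUAAAC")
print(hits, near_start(hits, 5))
```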
Despite the successes of genomics, little is known about how genetic information produces complex organisms. A look at the crucial functional elements of fly and worm genomes could change that.
The Drosophila melanogaster genome contains 29 serpin genes, 12 as single transcripts and 17 within 6 gene clusters. Many of these serpins have a conserved "hinge" motif characteristic of active proteinase inhibitors. However, a substantial proportion (42%) lacks this motif and represents non-inhibitory serpin-fold proteins of unknown function. Currently, it is not known whether orthologous, inhibitory serpin genes retain the same target proteinase specificity within the Drosophilid lineage, nor whether they give rise to non-inhibitory serpin-fold proteins or other, more diverged, proteins.
We collated 188 orthologues to the D. melanogaster serpins from the other 11 Drosophilid genomes and used synteny to find further family members, raising the total to 226, or 71% of the number of orthologues expected assuming complete conservation across all 12 Drosophilid species. In general the sequence constraints on the serpin-fold itself are loose. The critical Reactive Centre Loop (RCL) sequence, including the target proteinase cleavage site, is strongly conserved in inhibitory serpins, although there are 3 exceptional sets of orthologues in which the evolutionary constraints are looser. Conversely, the RCL of non-inhibitory serpin orthologues is less conserved, with 3 exceptions that presumably bind to conserved partner molecules. We derive a consensus hinge motif for Drosophilid inhibitory serpins that differs somewhat from the vertebrate consensus. Three gene clusters, Spn28D, Spn77B and Spn88E, appear to have originated in the melanogaster subgroup; each contains one inhibitory serpin orthologue that is present in all Drosophilids. In addition, the Spn100A transcript appears to represent a novel serpin-derived fold.
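The 71% figure can be checked from the counts given if we assume something the abstract does not state. The assumption is that the expected total is one orthologue of each of the 29 D. melanogaster serpins in each of the 11 other Drosophilid genomes:

```latex
\frac{226}{29 \times 11} = \frac{226}{319} \approx 0.708 \approx 71\%
```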
In general, inhibitory serpins rarely change their range of proteinase targets, except by a duplication/divergence mechanism. Non-inhibitory serpins appear to derive from inhibitory serpins, but not the reverse. The conservation of different family members varied widely across the 12 sequenced Drosophilid genomes. An approach considering synteny as well as homology was important to find the largest set of orthologues.
This novel web-based database provides unique accessibility and querying of integrated genomic and proteomic data for Drosophila and Anopheles.
FlyMine is a data warehouse that addresses one of the important challenges of modern biology: how to integrate and make use of the diversity and volume of current biological data. Its main focus is genomic and proteomic data for Drosophila and other insects. It provides web access to integrated data at a number of different levels, from simple browsing to construction of complex queries, which can be executed on either single items or lists.