This article introduces the second release of the Gypsy Database of Mobile Genetic Elements (GyDB 2.0): a research project devoted to the evolutionary dynamics of viruses and transposable elements based on their phylogenetic classification (per lineage and protein domain). The Gypsy Database (GyDB) is a long-term project in continuous progress that, owing to the high molecular diversity of mobile elements, needs to be completed in several stages. GyDB 2.0 includes a wiki to allow other researchers to participate in the project. The current database stage and scope are long terminal repeat (LTR) retroelements and their relatives. GyDB 2.0 is an update based on the analysis of Ty3/Gypsy, Retroviridae, Ty1/Copia and Bel/Pao LTR retroelements and the Caulimoviridae pararetroviruses of plants. Among other features, in terms of the aforementioned topics, this update adds: (i) a variety of descriptions and reviews distributed across multiple web pages; (ii) protein-based phylogenies, where phylogenetic levels are assigned to distinct classified elements; (iii) a collection of multiple alignments, lineage-specific hidden Markov models and consensus sequences, called the GyDB collection; (iv) updated RefSeq databases and BLAST and HMM servers to facilitate sequence characterization of new LTR retroelement and caulimovirus queries; and (v) a bibliographic server. GyDB 2.0 is available at http://gydb.org.
Mobile genetic elements (MGEs) are ubiquitous, autonomous genetic units that often constitute a significant part of their host genomes. It is commonly accepted that mobile DNA elements are powerful vectors for disease and evolution, from which distinct host genes have evolved during the history of life (1,2). The emergence of viruses and MGEs, and the role they have subsequently played in the history of life, is a topic that requires further investigation. In this respect, researchers aim to discern the molecular changes responsible for characteristics of organisms related to horizontal transfer, infection and disease. Among the distinct initiatives launched to investigate the diversity of MGEs (see for example 3–5) was the Gypsy Database (GyDB) of MGEs (6), a research project devoted to the evolutionary dynamics of viruses and MGEs (and their related host proteins), which was launched in 2008. The GyDB is a database organized within an evolutionary framework of classification, in which each piece of research yields a conclusion that leads to the next goal. A distinctive aspect of this project is that a share of our efforts is dedicated to the interpretation of analyses, paying particular attention to non-redundant elements that display a certain degree of distance and investigating how they can be collectively aligned or related, in terms of protein domain architecture, with other lineages and elements. Because of the molecular diversity of viruses and MGEs, the GyDB is a long-term project, organized as a database in continuous progress, that must be completed in stages. The current database stage and scope are retroviruses and retrotransposons with long terminal repeats (LTR retroelements) and their relatives.
Following the outline of the earlier release (the study of Ty3/Gypsy and Retroviridae LTR retroelements), this article presents the GyDB update based on the phylogenetic evaluation of the most representative LTR retroelement families and the plant caulimoviruses. This update, called GyDB 2.0, is available at http://gydb.org and includes sequence phylogenetic classification in addition to significant bioinformatic improvements. In particular, the new infrastructure implements a wiki management system constructed with the aim of promoting a world-wide community of researchers collaborating in the analysis and classification of MGEs and viruses inhabiting (or circulating in) living organisms.
GyDB 2.0 consists of 1234 web pages addressing the phylogenetic study of Ty3/Gypsy, Retroviridae, Ty1/Copia and Bel/Pao LTR retroelements. Caulimoviruses (Caulimoviridae) are formally plant DNA pararetroviruses, but they are included in GyDB 2.0 owing to their relationship with LTR retroelements based on the common gag/coat and pol regions [for more details, see (7) and references therein]. Table 1 summarizes the topics addressed in this update, as well as the servers and database sections it offers. The sequences on which GyDB 2.0 is based were retrieved from GenBank (8), and the methodologies employed were the same as those described earlier in references (6,7,9). At GyDB we evaluate the phylogenetic signal of the distinct classified elements and create hidden Markov model (HMM) profiles (10) per lineage and protein domain. In addition, the project is concerned with the evolutionary relationships between MGEs and their host genomes, based on the analysis of common protein families. In this regard, GyDB 2.0 focuses on two protein superfamilies that include protein products commonly encoded by LTR retroelements and their host genomes: the chromodomain superfamily (11) and clan AA of aspartic peptidases (12,13). This second release is accompanied by bibliographic data mining of the PubMed databases hosted at the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/) to document up-to-date information regarding the distinct classified elements.
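To make the idea of per-domain alignments, profiles and consensus sequences more concrete, the following minimal sketch derives column-wise residue frequencies and a majority-rule consensus from a small multiple alignment. It is only an illustration under stated assumptions: the function names, the 0.5 threshold and the toy sequences are invented here, and a frequency table is a much simpler object than the HMM profiles (10) actually built at GyDB.

```python
from collections import Counter


def column_frequencies(alignment):
    """Return, for each alignment column, the residue frequencies (gaps ignored)."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        # An all-gap column yields an empty frequency table.
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


def majority_consensus(alignment, threshold=0.5):
    """Most frequent residue per column; 'x' when no residue reaches the threshold."""
    consensus = []
    for freqs in column_frequencies(alignment):
        if not freqs:
            continue  # skip all-gap columns
        # Ties are broken alphabetically so the result is deterministic.
        residue, freq = max(freqs.items(), key=lambda item: (item[1], item[0]))
        consensus.append(residue if freq >= threshold else "x")
    return "".join(consensus)


# Invented toy alignment (not GyDB data).
toy_alignment = ["DTGAD", "DSGAS", "DTGA-", "DTGSD"]
print(majority_consensus(toy_alignment))
```

A consensus of this kind discards most of the information held in the alignment columns, which is one reason profile methods such as HMMs are generally preferred for detecting distant relatives.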
GyDB 2.0 is deployed over a Linux-Apache-MySQL-PHP (LAMP) stack, with additional Ajax programming to minimize server responses to client browsers. The design is similar to that of the previous release but implements various changes to the web interface. As shown in Figure 1, the database organization is founded upon two major menus: a top menu and a side menu. The top menu gives access to the three servers (the BLAST, HMM and bibliographic servers).
An additional new tool in GyDB 2.0 is its wiki, powered by the MediaWiki content management system (http://www.mediawiki.org/). This tool has been implemented to allow other users to participate in the project by editing or creating topics. Access to this wiki is free but requires registration. The rationale behind this choice is twofold: first, edits are recorded by date and author in order to credit contributions; second, we have programmed a revision mechanism to review all changes constructively before making them public. The top menu includes three sections to log in and manage the distinct wiki resources. Finally, to the right of the top menu, GyDB 2.0 includes a text field to search the whole project under two modes (detailed in Figure 1). The side menu divides the distinct GyDB sections into three major demarcations (emphasized with boxes in Figure 1). The first collects sections associated with the systematics applied at GyDB. The second provides information concerning the domains typically observed in the genomic structure of the elements we classify. The third demarcation offers free access to distinct databases, which are organized into three sections.
Finally, a variety of links to other database initiatives relevant to the topic are included in the side menu.
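The wiki workflow described above (edits recorded by date and author, and reviewed before becoming public) can be pictured with the following minimal sketch. It is a hypothetical model written for illustration only: the class and method names are assumptions, and it does not reproduce MediaWiki's data model or the revision mechanism actually programmed at GyDB.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Revision:
    author: str
    text: str
    submitted_at: datetime
    approved: bool = False  # becomes True only after review


@dataclass
class WikiPage:
    title: str
    revisions: List[Revision] = field(default_factory=list)

    def submit(self, author: str, text: str,
               when: Optional[datetime] = None) -> Revision:
        """Record an edit with its author and date; it stays private until approved."""
        revision = Revision(author, text, when or datetime.now(timezone.utc))
        self.revisions.append(revision)
        return revision

    def approve(self, revision: Revision) -> None:
        """Mark a reviewed revision as publishable."""
        revision.approved = True

    def public_text(self) -> Optional[str]:
        """Text of the most recent approved revision, or None if nothing is public."""
        for revision in reversed(self.revisions):
            if revision.approved:
                return revision.text
        return None

    def credits(self):
        """(author, date) pairs of approved revisions, for crediting contributions."""
        return [(r.author, r.submitted_at) for r in self.revisions if r.approved]
```

In this model a registered user's edit is stored immediately, but `public_text()` keeps returning the last approved version until a reviewer calls `approve()` on the newer revision.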
Sequencing projects constantly deliver new types of MGEs [for example (17–22)]; hence the classification of non-redundant elements based on their phylogenetic signal is an open issue at GyDB, and results in the preparation of new sections. For example, we are committed to improving the understanding of the diversity and evolutionary dynamics of MGEs in eukaryotic and prokaryotic organisms. With regard to eukaryotic LTR retroelements (the current database scope), the sequence repertoire at GyDB will be extended with representative elements retrieved from recently sequenced marine secondary endosymbionts, including the brown alga Ectocarpus siliculosus (heterokont) and the coccolithophore Emiliania huxleyi (haptophyte). Among other research topics in preparation, one concerns the construction of a server devoted to the study of the complete set of MGEs and repeats (the mobilome) of biological genomes. This server will be introduced with two forthcoming publications focusing on the LTR retroelements and their related transposases of the pea aphid Acyrthosiphon pisum genome [see (23)]. At the technical level, we are exploring the application of formal grammars and machine learning algorithms to automate, as far as possible, the management and classification of the sequence data. We are also committed to developing solutions for other non-trivial difficulties that arise with the growing size of the databases. Viruses and MGEs usually show different rates of evolution and high variability depending on the evaluated protein or region. Therefore, we aim to implement more than one method of phylogenetic reconstruction to offer the user different perspectives based on different methods (or the opportunity to upload updated phylogenies via the wiki).
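The point that variability depends on the evaluated protein or region can be illustrated with a minimal sketch that computes the mean pairwise p-distance (the proportion of differing sites) for two alignments of the same elements. The p-distance is used here only because it is the simplest distance measure; it is not presented as the method applied at GyDB, and the domain names, element names and sequences below are invented toy data.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, skipping gapped sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore sites where either sequence has a gap
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def mean_pairwise_distance(alignment):
    """Average p-distance over all sequence pairs in a {name: sequence} alignment."""
    pairs = list(combinations(alignment.values(), 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignments of two regions from the same three elements.
domains = {
    "conserved_domain": {"el1": "DTGADK", "el2": "DTGADK", "el3": "DSGADK"},
    "variable_domain": {"el1": "MKLV-Q", "el2": "MRIVTQ", "el3": "LKIASQ"},
}
for name, alignment in domains.items():
    print(name, round(mean_pairwise_distance(alignment), 3))
```

When regions differ this much in divergence, trees inferred from each of them (or with different reconstruction methods) may disagree, which is the motivation for offering more than one perspective.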
On the other hand, the traditional view of the origin and evolution of biological systems is that they are usually monophyletic, but this assumption has been challenged by increasing evidence suggesting that natural evolution frequently proceeds not only by gradual and vertical means but also through distinct modular, saltatory and reticulate events (24–36). In this respect, we are investigating appropriate protocols to combine phylogenetic inference with new trends in network biology [see also (7)].
Centro de Desarrollo Tecnológico Industrial (CDTI) (grant IDI-20100007, partial); Empresa Nacional de Innovación, S.A (ENISA) (17092008, partial); IMPIVA (IMIDTA/2009/118 and IMDTA/2010/740, partial); European Regional Development Fund (ERDF); Ministerio de Ciencia e Innovación (MICINN) (Torres-Quevedo grants PTQ-09-01-00020, PTQ-09-01-00670 and PTQ-10-03552, partial). Funding for open access charge: University of Valencia.
Conflict of interest statement. None declared.
We thank all the colleagues detailed in the list available at (http://gydb.org/index.php/Acknowledgments) for their support in contributing images of biological host organisms. We are also grateful to Senior NAR Editor Dr Michael Galperin and to the two anonymous reviewers for their constructive comments, which helped to improve this article. Finally, we thank Denys Wheatley and Angela Panther from Biomedes for copyediting this article.