As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
The mammalian radiation has corresponded with rapid changes in noncoding regions of the genome, but we lack a comprehensive understanding of regulatory evolution in mammals. Here, we track the evolution of promoters and enhancers active in liver across 20 mammalian species from six diverse orders by profiling genomic enrichment of H3K27 acetylation and H3K4 trimethylation. We report that rapid evolution of enhancers is a universal feature of mammalian genomes. Most of the recently evolved enhancers arise from ancestral DNA exaptation, rather than lineage-specific expansions of repeat elements. In contrast, almost all liver promoters are partially or fully conserved across these species. Our data further reveal that recently evolved enhancers can be associated with genes under positive selection, demonstrating the power of this approach for annotating regulatory adaptations in genomic sequences. These results provide important insight into the functional genetics underpinning mammalian regulatory evolution.
• Rapid enhancer and slow promoter evolution across genomes of 20 mammalian species
• Enhancers are rarely conserved across these mammals
• Recently evolved enhancers dominate mammalian regulatory landscapes
• Unbiased mapping links candidate enhancers with lineage-specific positive selection
Comparative functional genomic analysis in 20 mammalian species reveals distinct features for the evolution of enhancers, in comparison to those of promoters, across 180 million years.
Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.
Motivation: We present a Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA while minimizing client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor tool permitting large-scale programmatic variant analysis independent of any specific programming language.
Availability and implementation: The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest.
Supplementary data are available at Bioinformatics online.
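The REST access pattern described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' client code: the helper names `build_url` and `fetch_json` are hypothetical, the server address is the one given in the abstract, and `/lookup/id/` is one of the service's documented endpoints. The JSON format is requested through a request header.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Server address as given in the abstract.
SERVER = "http://rest.ensembl.org"


def build_url(endpoint, **params):
    """Join the server, an endpoint path and optional query parameters.

    Hypothetical helper for illustration only.
    """
    url = SERVER + endpoint
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_json(endpoint, **params):
    """Send a GET request and decode the JSON response.

    Hypothetical helper; needs network access when called.
    """
    request = urllib.request.Request(
        build_url(endpoint, **params),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Under these assumptions, `fetch_json("/lookup/id/ENSG00000157764")` would return a dictionary describing that human gene. Keeping URL construction separate from the network call makes the request logic easy to check offline.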
The unique anatomical features of turtles have raised unanswered questions about the origin of their unique body plan. We generated and analyzed draft genomes of the soft-shell turtle (Pelodiscus sinensis) and the green sea turtle (Chelonia mydas); our results indicated the close relationship of the turtles to the bird-crocodilian lineage, from which they split ~267.9–248.3 million years ago (Upper Permian to Triassic). We also found extensive expansion of olfactory receptor genes in these turtles. Embryonic gene expression analysis identified an hourglass-like divergence of turtle and chicken embryogenesis, with maximal conservation around the vertebrate phylotypic period, rather than at later stages that show the amniote-common pattern. Wnt5a expression was found in the growth zone of the dorsal shell, supporting the possible co-option of limb-associated Wnt signaling in the acquisition of this turtle-specific novelty. Our results suggest that turtle evolution was accompanied by an unexpectedly conservative vertebrate phylotypic period, followed by turtle-specific repatterning of development to yield the novel structure of the shell.
The main limitations in the analysis of viral metagenomes are perhaps the high genetic variability and the lack of information in extant databases. To address these issues, several bioinformatic tools have been specifically designed or adapted for metagenomics by improving read assembly and creating more sensitive methods for homology detection. This study compares the performance of different available assemblers and taxonomic annotation software using simulated viral-metagenomic data.
We simulated two 454 viral metagenomes using genomes from NCBI's RefSeq database based on the list of actual viruses found in previously published metagenomes. Three different assembly strategies, spanning six assemblers, were tested for performance: the overlap-layout-consensus algorithms Newbler, Celera and Minimo; the de Bruijn graph algorithms Velvet and MetaVelvet; and the read probabilistic model Genovo. The performance of the assemblies was measured by the length of resulting contigs (using N50), the percentage of reads assembled and the overall accuracy when comparing against corresponding reference genomes. Additionally, the number of chimeras per contig and the lowest common ancestor were estimated in order to assess the effect of assembling on taxonomic and functional annotation. The functional classification of the reads was evaluated by counting the reads that correctly matched the functional data previously reported for the original genomes and calculating the number of over-represented functional categories in chimeric contigs. The sensitivity and specificity of tBLASTx, PhymmBL and the k-mer frequencies were measured by accurate predictions when comparing simulated reads against the NCBI Virus genomes RefSeq database.
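For readers unfamiliar with the N50 statistic mentioned above, the following sketch shows the standard definition: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` is our own, and this is not code from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the cumulative length first reaches half the total.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
```

For example, contigs of lengths 2, 2, 2, 3, 3, 4, 8 and 8 total 32, and the two longest contigs already cover half of that, so the N50 is 8.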
Assembling improves functional annotation by increasing accurate assignments and decreasing ambiguous hits between viruses and bacteria. However, the success is limited by the chimeric contigs occurring at all taxonomic levels. The assembler and its parameters should be selected based on the focus of each study. Minimo's non-chimeric contigs and Genovo's long contigs excelled in taxonomic assignment and functional annotation, respectively.
tBLASTx stood out as the best approach for taxonomic annotation for virus identification. PhymmBL proved useful in datasets in which no related sequences are present as it uses genomic features that may help identify distant taxa. The k-mer frequencies underperformed in all viral datasets.
Viral metagenome; Assembler performance; Taxonomic classification; Chimera identification; Functional annotation
Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.
TreeFam (http://www.treefam.org) is a database of phylogenetic trees inferred from animal genomes. For every TreeFam family we provide homology predictions together with the evolutionary history of the genes. Here we describe an update of the TreeFam database. The TreeFam project was resurrected in 2012 and has seen two releases since. The latest release (TreeFam 9) was made available in March 2013. It has orthology predictions and gene trees for 109 species in 15 736 families covering ∼2.2 million sequences. With release 9 we made modifications to our production pipeline and redesigned our website with improved gene tree visualizations and Wikipedia integration. Furthermore, we now provide an HMM-based sequence search that places a user-provided protein sequence into a TreeFam gene tree and provides quick orthology prediction. The tool uses Mafft and RAxML for the fast insertion into a reference alignment and tree, respectively. Besides the aforementioned technical improvements, we present a new approach to visualize gene trees and alternative displays that focuses on showing homology information from a species tree point of view. From release 9 onwards, TreeFam is now hosted at the EBI.
Lampreys are representatives of an ancient vertebrate lineage that diverged from our own ~500 million years ago. By virtue of this deeply shared ancestry, the sea lamprey (P. marinus) genome is uniquely poised to provide insight into the ancestry of vertebrate genomes and the underlying principles of vertebrate biology. Here, we present the first lamprey whole-genome sequence and assembly. We note challenges faced owing to its high content of repetitive elements and GC bases, as well as the absence of broad-scale sequence information from closely related species. Analyses of the assembly indicate that two whole-genome duplications likely occurred before the divergence of ancestral lamprey and gnathostome lineages. Moreover, the results help define key evolutionary events within vertebrate lineages, including the origin of myelin-associated proteins and the development of appendages. The lamprey genome provides an important resource for reconstructing vertebrate origins and the evolutionary events that have shaped the genomes of extant organisms.
The oral cavity of humans is inhabited by hundreds of bacterial species and some of them have a key role in the development of oral diseases, mainly dental caries and periodontitis. We describe for the first time the metagenome of the human oral cavity under healthy and diseased conditions, with a focus on supragingival dental plaque and cavities. Direct pyrosequencing of eight samples with different oral-health status produced 1 Gbp of sequence without the biases imposed by PCR or cloning. These data show that cavities are not dominated by Streptococcus mutans (the species originally identified as the etiological agent of dental caries) but are in fact a complex community formed by tens of bacterial species, in agreement with the view that caries is a polymicrobial disease. The analysis of the reads indicated that the oral cavity is functionally a different environment from the gut, with many functional categories enriched in one of the two environments and depleted in the other. Individuals who had never suffered from dental caries showed an over-representation of several functional categories, like genes for antimicrobial peptides and quorum sensing. In addition, they did not have mutans streptococci but displayed high recruitment of other species. Several isolates belonging to these dominant bacteria in healthy individuals were cultured and shown to inhibit the growth of cariogenic bacteria, suggesting the use of these commensal bacterial strains as probiotics to promote oral health and prevent dental caries.
metagenomics; human microbiome; dental caries; Streptococcus mutans; pyrosequencing; probiotics
The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.
Lactococcus garvieae is the etiological agent of lactococcosis disease, affecting many cultured fish species worldwide. In addition, this bacterium is currently considered a potential zoonotic microorganism since it is known to cause several opportunistic human infections. Here we present the draft genome sequence of the L. garvieae strain UNIUD074.
Cockroaches (Blattaria: Dictyoptera) harbor the endosymbiont Blattabacterium sp. in their abdominal fat body. This endosymbiont is involved in nitrogen recycling and amino acid provision to its host. In this study, the genome of Blattabacterium sp. of Cryptocercus punctulatus (BCpu) was sequenced and compared with those of the symbionts of Blattella germanica and Periplaneta americana, BBge and BPam, respectively. The BCpu genome consists of a chromosome of 605.7 kb and a plasmid of 3.8 kb and is therefore approximately 31 kb smaller than the other two aforementioned genomes. The size reduction is due to the loss of 55 genes, 23 of which belong to biosynthetic pathways for amino acids. The pathways for the production of tryptophan, leucine, isoleucine/threonine/valine, methionine, and cysteine have been completely lost. Additionally, the genes for the enzymes catalyzing the last steps of arginine and lysine biosynthesis, argH and lysA, were found to be missing and pseudogenized, respectively. These gene losses render BCpu auxotrophic for nine amino acids more than those corresponding to BBge and BPam. BCpu has also lost capacities for sulfate reduction, production of heme groups, as well as genes for several other unlinked metabolic processes, and genes present in BBge and BPam in duplicates. Amino acids and cofactors that are not synthesized by BCpu are either produced in abundance by hindgut microbiota or are provisioned via a copious diet of dampwood colonized by putrefying microbiota, supplying host and Blattabacterium symbiont with the necessary nutrients and thus permitting genome economization of BCpu.
symbiosis; genome reduction; Blattabacterium; Bacteroidetes; metabolic pathway loss; wood-feeding
The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.
A frequent step in metagenomic data analysis is the assembly of the sequenced reads. Many assembly tools targeting data from next-generation sequencing (NGS) technologies have been published in recent years, but these assemblers have not been designed for or tested in the multi-genome scenarios that characterize metagenomic studies. Here we provide a critical assessment of current de novo short-read assembly tools in multi-genome scenarios using complex simulated metagenomic data. With this approach we tested the fidelity of different assemblers in metagenomic studies, demonstrating that even under the simplest compositions the number of chimeric contigs involving different species is noticeable. We further showed that the assembly process reduces the accuracy of the functional classification of the metagenomic data and that these errors can be overcome by raising the coverage of the studied metagenome. The results presented here highlight the particular difficulties that de novo genome assemblers face in multi-genome scenarios, demonstrating that these difficulties, which often compromise the functional classification of the analyzed data, can be overcome with a high sequencing effort.
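With simulated data the source genome of every read is known, so chimeric contigs can be flagged directly. The sketch below illustrates one simple way to do this and is not the procedure used in the study: the function name `chimeric_contigs`, the input layout (one `(contig_id, source_species)` pair per assembled read) and the `min_minor_reads` threshold are all assumptions made for illustration.

```python
from collections import Counter


def chimeric_contigs(read_assignments, min_minor_reads=1):
    """Flag contigs built from reads of more than one source species.

    read_assignments: iterable of (contig_id, source_species) pairs, one per
    assembled read (hypothetical input layout).
    min_minor_reads: smallest number of reads from a non-dominant species
    needed to call a contig chimeric (hypothetical threshold).
    Returns a dict mapping each chimeric contig to its per-species read counts.
    """
    per_contig = {}
    for contig_id, species in read_assignments:
        per_contig.setdefault(contig_id, Counter())[species] += 1

    chimeras = {}
    for contig_id, counts in per_contig.items():
        dominant, _ = counts.most_common(1)[0]
        # Keep only non-dominant species with enough supporting reads.
        minor = [s for s, n in counts.items()
                 if s != dominant and n >= min_minor_reads]
        if minor:
            chimeras[contig_id] = dict(counts)
    return chimeras
```

Raising `min_minor_reads` makes the call more conservative, which may be useful when a few misplaced reads should not mark an otherwise clean contig as chimeric.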
The human gut is the natural habitat for a large and dynamic bacterial community that has a great relevance for health. Metagenomics is increasing our knowledge of gene content as well as of functional and genetic variability in this microbiome. However, little is known about the active bacteria and their function(s) in the gastrointestinal tract. We performed a metatranscriptomic study on ten healthy volunteers to elucidate the active members of the gut microbiome and their functionality under conditions of health. First, the microbial cDNAs obtained from each sample were sequenced using 454 technology. The analysis of 16S transcripts showed the phylogenetic structure of the active microbial community. Lachnospiraceae, Ruminococcaceae, Bacteroidaceae, Prevotellaceae, and Rikenellaceae were the predominant families detected in the active microbiota. The characterization of mRNAs revealed a uniform functional pattern in healthy individuals. The main functional roles of the gut microbiota were carbohydrate metabolism, energy production and synthesis of cellular components. In contrast, housekeeping activities such as amino acid and lipid metabolism were underrepresented in the metatranscriptome. Our results provide new insights into the functionality of the complex gut microbiota in healthy individuals. In this RNA-based survey, we also detected small RNAs, which are important regulatory elements in prokaryotic physiology and pathogenicity.
The increasing availability of gene sequences of prokaryotic species in samples extracted from all kinds of locations makes it possible to study the influence of environmental patterns on prokaryotic biodiversity. We present a comprehensive study to address the potential existence of environmental preferences of prokaryotic taxa and the commonness of the specialist and generalist strategies. We also assessed the most significant environmental factors shaping the environmental distribution of taxa.
We used 16S rDNA sequences from 3,502 sampling experiments in natural and artificial sources. These sequences were taxonomically assigned, and the corresponding samples were also classified into a hierarchical classification of environments. We used several statistical methods to analyze the environmental distribution of taxa. Our results indicate that environmental specificity is not very common at the higher taxonomic levels (phylum to family), but emerges at lower taxonomic levels (genus and species). The most selective environmental characteristics are those of animal tissues and thermal locations. Salinity is another very important factor for constraining prokaryotic diversity. On the other hand, soil and freshwater habitats are the least restrictive environments, harboring the largest number of prokaryotic taxa. All information on taxa, samples and environments is provided at the envDB online database, http://metagenomics.uv.es/envDB.
This is, as far as we know, the most comprehensive assessment of the distribution and diversity of prokaryotic taxa and their associations with different environments. Our data indicate that we are still far from characterizing prokaryotic diversity in any environment, except, perhaps, for human tissues such as the oral cavity and the vagina.
The genome of the pea aphid Acyrthosiphon pisum lacks genes thought to be crucial in other insects for recognition, signaling and killing of microbes.
Recent genomic analyses of arthropod defense mechanisms suggest conservation of key elements underlying responses to pathogens, parasites and stresses. At the center of pathogen-induced immune responses are signaling pathways triggered by the recognition of fungal, bacterial and viral signatures. These pathways result in the production of response molecules, such as antimicrobial peptides and lysozymes, which degrade or destroy invaders. Using the recently sequenced genome of the pea aphid (Acyrthosiphon pisum), we conducted the first extensive annotation of the immune and stress gene repertoire of a hemipterous insect, which is phylogenetically distantly related to previously characterized insect models.
Strikingly, pea aphids appear to be missing genes present in insect genomes characterized to date and thought critical for recognition, signaling and killing of microbes. In line with results of gene annotation, experimental analyses designed to characterize immune response through the isolation of RNA transcripts and proteins from immune-challenged pea aphids uncovered few immune-related products. Gene expression studies, however, indicated some expression of immune and stress-related genes.
The absence of genes suspected to be essential for the insect immune response suggests that the traditional view of insect immunity may not be as broadly applicable as once thought. The limitations of the aphid immune system may be representative of a broad range of insects, or may be aphid specific. We suggest that several aspects of the aphid life style, such as their association with microbial symbionts, could facilitate survival without strong immune protection.
Heterozygous HNF1A mutations cause pancreatic-islet β-cell dysfunction and monogenic diabetes (MODY3). Hnf1α is known to regulate numerous hepatic genes, yet knowledge of its function in pancreatic islets is more limited. We now show that Hnf1a deficiency in mice leads to highly tissue-specific changes in the expression of genes involved in key functions of both islets and liver. To gain insights into the mechanisms of tissue-specific Hnf1α regulation, we integrated expression studies of Hnf1a-deficient mice with identification of direct Hnf1α targets. We demonstrate that Hnf1α can bind in a tissue-selective manner to genes that are expressed only in liver or islets. We also show that Hnf1α is essential only for the transcription of a minor fraction of its direct-target genes. Even among genes that were expressed in both liver and islets, the subset of targets showing functional dependence on Hnf1α was highly tissue specific. This was partly explained by the compensatory occupancy by the paralog Hnf1β at selected genes in Hnf1a-deficient liver. In keeping with these findings, the biological consequences of Hnf1a deficiency were markedly different in islets and liver. Notably, Hnf1a deficiency led to impaired large-T-antigen-induced growth and oncogenesis in β cells yet enhanced proliferation in hepatocytes. Collectively, these findings show that Hnf1α governs broad, highly tissue-specific genetic programs in pancreatic islets and liver and reveal key consequences of Hnf1a deficiency relevant to the pathophysiology of monogenic diabetes.
Bacterial endosymbionts of insects play a central role in upgrading the diet of their hosts. In certain cases, such as aphids and tsetse flies, endosymbionts complement the metabolic capacity of hosts living on nutrient-deficient diets, while the bacteria harbored by omnivorous carpenter ants are involved in nitrogen recycling. In this study, we describe the genome sequence and inferred metabolism of Blattabacterium strain Bge, the primary Flavobacteria endosymbiont of the omnivorous German cockroach Blattella germanica. Through comparative genomics with other insect endosymbionts and free-living Flavobacteria we reveal that Blattabacterium strain Bge shares the same distribution of functional gene categories only with Blochmannia strains, the primary Gamma-Proteobacteria endosymbiont of carpenter ants. This is a remarkable example of evolutionary convergence during the symbiotic process, involving very distant phylogenetic bacterial taxa within hosts feeding on similar diets. Despite this similarity, different nitrogen economy strategies have emerged in each case. Both bacterial endosymbionts code for urease but display different metabolic functions: Blochmannia strains produce ammonia from dietary urea and then use it as a source of nitrogen, whereas Blattabacterium strain Bge codes for the complete urea cycle that, in combination with urease, produces ammonia as an end product. Not only does the cockroach endosymbiont play an essential role in nutrient supply to the host, but also in the catabolic use of amino acids and nitrogen excretion, as strongly suggested by the stoichiometric analysis of the inferred metabolic network. Here, we explain the metabolic reasons underlying the enigmatic return of cockroaches to the ancestral ammonotelic state.
Bacterial endosymbionts from insects are subjected to a process of genome reduction from the moment they interact with their host, especially when the symbiosis is strict (the partners live together permanently) and the endosymbiont is maternally inherited. The type of genes that are retained correlates with specific metabolic host requirements. Here, we report the genome sequence of Blattabacterium strain Bge, the primary endosymbiont of the German cockroach B. germanica. Cockroaches are omnivorous insects and Blattabacterium cooperates with their metabolism, not only with essential nutrient metabolism but also through an efficient use of amino acids and the nitrogen excretion by the combination of a urea cycle and urease activity. The repertoires of functions that are maintained in Blattabacterium are similar to those already observed in Blochmannia spp., the primary endosymbiont of carpenter ants, also an omnivorous insect. This constitutes a nice example of evolutionary convergence of two endosymbionts belonging to very different bacterial phyla that have evolved a similar repertoire of functions according to the host. However, the current set of genes and, more importantly, those that were lost in the process of genome reduction in both endosymbiont lineages have also contributed to a different involvement of Blattabacterium and Blochmannia in nitrogen metabolism.
Transcriptional analysis of chromatin regulator mutants in Drosophila melanogaster identified clusters of functionally related genes conserved in other insect species.
The trithorax group (trxG) and Polycomb group (PcG) proteins are responsible for the maintenance of stable transcriptional patterns of many developmental regulators. They bind to specific regions of DNA and direct the post-translational modifications of histones, playing a role in the dynamics of chromatin structure.
We have performed genome-wide expression studies of trx and ash2 mutants in Drosophila melanogaster. Using computational analysis of our microarray data, we have identified 25 clusters of genes potentially regulated by TRX. Most of these clusters consist of genes that encode structural proteins involved in cuticle formation. This organization appears to be a distinctive feature of the regulatory networks of TRX and other chromatin regulators, since we have observed the same arrangement in clusters after experiments performed with ASH2, as well as in experiments performed by others with NURF, dMyc, and ASH1. We have also found many of these clusters to be significantly conserved in D. simulans, D. yakuba, D. pseudoobscura and partially in Anopheles gambiae.
The analysis of genes governed by chromatin regulators has led to the identification of clusters of functionally related genes conserved in other insect species, suggesting this chromosomal organization is biologically important. Moreover, our results indicate that TRX and other chromatin regulators may act globally on chromatin domains that contain transcriptionally co-regulated genes.
Analysis of the gene expression profiles of wing imaginal discs from ash2 and ash1 mutants shows that they are highly similar, supporting a model in which they act together to maintain stable states of transcription.
The trithorax group (trxG) genes absent, small or homeotic discs 1 (ash1) and 2 (ash2) were isolated in a screen for mutants with abnormal imaginal discs. Mutations in either gene cause homeotic transformations but Hox genes are not their only targets. Although analysis of double mutants revealed that ash2 and ash1 mutations enhance each other's phenotypes, suggesting they are functionally related, it was shown that these proteins are subunits of distinct complexes.
The analysis of wing imaginal disc transcriptomes from ash2 and ash1 mutants showed that they are highly similar. Functional annotation of regulated genes using Gene Ontology allowed identification of severely affected groups of genes that could be correlated to the wing phenotypes observed. Comparison of the differentially expressed genes with those from other genome-wide analyses revealed similarities between ASH2 and Sin3A, suggesting a putative functional relationship. Coimmunoprecipitation studies and immunolocalization on polytene chromosomes demonstrated that ASH2 and Sin3A interact with HCF (host-cell factor). The results of nucleosome western blots and clonal analysis indicated that ASH2 is necessary for trimethylation of the Lys4 on histone 3 (H3K4).
The similarity between the transcriptomes of ash2 and ash1 mutants supports a model in which the two genes act together to maintain stable states of transcription. As in humans, both ASH2 and Sin3A bind HCF. Finally, the reduction of H3K4 trimethylation in ash2 mutants is the first evidence in Drosophila regarding the molecular function of this trxG gene.