The unicellular cyanobacterium UCYN-A, one of the major contributors to nitrogen fixation in the open ocean, lives in symbiosis with single-celled phytoplankton. UCYN-A includes several closely related lineages whose partner fidelity, genome-wide expression and time of evolutionary divergence remain to be resolved. Here we detect and distinguish UCYN-A1 and UCYN-A2 lineages in symbiosis with two distinct prymnesiophyte partners in the South Atlantic Ocean. Both symbiotic systems are lineage specific and differ in the number of UCYN-A cells involved. Our analyses infer a streamlined genome expression towards nitrogen fixation in both UCYN-A lineages. Comparative genomics reveals strong purifying selection in UCYN-A1 and UCYN-A2, with diversification ∼91 Myr ago, in the late Cretaceous, after the low-nutrient regime that occurred during the Jurassic. These findings suggest that UCYN-A diversified in a co-evolutionary process, wherein their prymnesiophyte partners acted as a barrier driving an allopatric speciation of extant UCYN-A lineages.
Nitrogen fixation in oceans is facilitated by associations between marine phytoplankton and cyanobacteria such as UCYN-A. Here, Cornejo-Castillo et al. show that UCYN-A diversified in the late Cretaceous under strong purifying selection to become lineage-specific symbiont partners with different prymnesiophytes.
To investigate the complexity of alternative splicing in the retina, we sequenced and analyzed a total of 115,706 clones from normalized cDNA libraries from mouse neural retina (66,217) and rat retinal pigmented epithelium (49,489). Based upon clustering the cDNAs and mapping them to their respective genomes, the estimated numbers of genes were 9,134 for the mouse neural retina and 12,050 for the rat retinal pigmented epithelium libraries. This unique collection of retinal messenger RNAs is maintained and accessible through a web-based server to the whole community of retinal biologists for further functional characterization. The analysis revealed 3,248 and 3,202 alternative splice events for mouse neural retina and rat retinal pigmented epithelium, respectively. We focused on transcription factors involved in vision. Among the six candidates suitable for functional analysis, we selected Otx2S, a novel variant of the Otx2 gene with a deletion within the homeodomain sequence. Otx2S is expressed in both the neural retina and retinal pigmented epithelium, and encodes a protein that is targeted to the nucleus. OTX2S exerts transdominant activity on the tyrosinase promoter when tested in the physiological environment of primary RPE cells. By overexpressing OTX2S in primary RPE cells using an adeno-associated viral vector, we identified 10 genes whose expression is positively regulated by OTX2S. We find that OTX2S is able to bind to the chromatin at the promoter of the retinal dehydrogenase 10 (RDH10) gene.
The evolutionary history of the characters underlying the adaptation of microorganisms to food and biotechnological uses is poorly understood. We undertook comparative genomics to investigate evolutionary relationships of the dairy yeast Geotrichum candidum within Saccharomycotina. Surprisingly, a remarkable proportion of genes showed discordant phylogenies, clustering with the filamentous fungus subphylum (Pezizomycotina), rather than the yeast subphylum (Saccharomycotina), of the Ascomycota. These genes appear not to be the result of Horizontal Gene Transfer (HGT), but to have been specifically retained by G. candidum after the filamentous fungi–yeasts split concomitant with the yeasts’ genome contraction. We refer to these genes as SRAGs (Specifically Retained Ancestral Genes), having been lost by all or nearly all other yeasts, and thus contributing to the phenotypic specificity of lineages. SRAG functions include lipases consistent with a role in cheese making and novel endoglucanases associated with degradation of plant material. Similar gene retention was observed in three other distantly related yeasts representative of this ecologically diverse subphylum. The phenomenon thus appears to be widespread in the Saccharomycotina and argues that, alongside neo-functionalization following gene duplication and HGT, specific gene retention must be recognized as an important mechanism for generation of biodiversity and adaptation in yeasts.
Emiliania huxleyi is the most abundant calcifying plankton in modern oceans with substantial intraspecific genome variability and a biphasic life cycle involving sexual alternation between calcified 2N and flagellated 1N cells. We show that high genome content variability in Emiliania relates to erosion of 1N-specific genes and loss of the ability to form flagellated cells. Analysis of 185 E. huxleyi strains isolated from world oceans suggests that loss of flagella occurred independently in lineages inhabiting oligotrophic open oceans over short evolutionary timescales. This environmentally linked physiogenomic change suggests life cycling is not advantageous in very large/diluted populations experiencing low biotic pressure and low ecological variability. Gene loss did not appear to reflect pressure for genome streamlining in oligotrophic oceans as previously observed in picoplankton. Life-cycle modifications might be common in plankton and cause major functional variability to be hidden from traditional taxonomic or molecular markers.
Long-read sequencing technologies were launched a few years ago, and in contrast with short-read sequencing technologies, they offered a promise of solving assembly problems for large and complex genomes. Moreover, by providing long-range information, they could also solve haplotype phasing. However, existing long-read technologies still have several limitations that complicate their use for most research laboratories, as well as in large and/or complex genome projects. In 2014, Oxford Nanopore released the MinION® device, a small and low-cost single-molecule nanopore sequencer, which offers the possibility of sequencing long DNA fragments.
The assembly of long reads generated using the Oxford Nanopore MinION® instrument is challenging, as existing assemblers were not designed to deal with long reads exhibiting error rates close to 30%. Here, we present a hybrid approach developed to take advantage of data generated using the MinION® device. We sequenced a well-known bacterium, Acinetobacter baylyi ADP1, and applied our method to obtain a highly contiguous (one single contig) and accurate genome assembly even in repetitive regions, in contrast to an Illumina-only assembly. Our hybrid strategy was able to generate NaS (Nanopore Synthetic-long) reads up to 60 kb that aligned entirely and with no error to the reference genome and that spanned highly conserved repetitive regions. The average accuracy of NaS reads reached 99.99% without losing the initial size of the input MinION® reads.
We describe the NaS tool, a hybrid approach allowing the sequencing of microbial genomes using the MinION® device. Our method, ideally based on 20x of NaS reads and 50x of Illumina reads, provides an efficient and cost-effective way of sequencing microbial or small eukaryotic genomes in a very short time, even in small facilities. Moreover, we demonstrate that although the Oxford Nanopore technology is a relatively new sequencing technology, currently with a high error rate, it is already useful in the generation of high-quality genome assemblies.
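The 20x and 50x coverage targets quoted above translate into a sequencing yield through the usual relation coverage = total sequenced bases / genome size. The following is a minimal sketch of that arithmetic only; the genome size and mean read length are illustrative assumptions, not values reported in the study, and the function names are hypothetical.

```python
import math


def bases_needed(genome_size_bp: int, coverage: float) -> int:
    """Total sequenced bases required to reach a mean coverage."""
    return math.ceil(genome_size_bp * coverage)


def reads_needed(genome_size_bp: int, coverage: float, mean_read_len_bp: float) -> int:
    """Number of reads of a given mean length required for that coverage."""
    return math.ceil(genome_size_bp * coverage / mean_read_len_bp)


# Assumption: a hypothetical 4 Mb microbial genome (not a figure from the text).
GENOME_SIZE_BP = 4_000_000

nas_bases = bases_needed(GENOME_SIZE_BP, 20)       # 20x of NaS reads
illumina_bases = bases_needed(GENOME_SIZE_BP, 50)  # 50x of Illumina reads

# Assumption: 250 bp mean Illumina read length, chosen only for illustration.
illumina_reads = reads_needed(GENOME_SIZE_BP, 50, 250)

print(nas_bases, illumina_bases, illumina_reads)
```

Mean coverage says nothing about how evenly reads are distributed, so a real project would typically plan some margin above these minimum yields.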
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1519-z) contains supplementary material, which is available to authorized users.
Nanopore sequencing; Oxford nanopore; MinION® device; de novo genome assembly; Genome finishing
Rhizaria are an important component of oceanic plankton communities worldwide. A number of species harbor eukaryotic microalgal symbionts, which are horizontally acquired in the environment at each generation. Although these photosymbioses are key to the ability of Rhizaria to thrive in oceanic ecosystems, the mechanisms for symbiotic interactions are unclear. Using high-throughput sequencing technology (i.e., 454), we generated large Expressed Sequence Tag (EST) datasets from four uncultured Rhizaria, an acantharian (Amphilonche elongata), two polycystines (Collozoum sp. and Spongosphaera streptacantha), and one phaeodarian (Aulacantha scolymantha). We assessed the main genetic features of the host/symbionts consortium (i.e., the holobiont) transcriptomes and found rRNA sequences affiliated to a wide range of bacteria and protists in all samples, suggesting that diverse microbial communities are associated with the holobionts. We then searched for genes potentially involved in symbiotic processes, such as genes coding for c-type lectins, which are proteins that play a role in cell recognition among eukaryotes. Unigenes coding putative c-type lectin domains (CTLD) were found in the species bearing photosynthetic symbionts (A. elongata, Collozoum sp., and S. streptacantha) but not in the non-symbiotic one (A. scolymantha). More particularly, phylogenetic analyses group CTLDs from A. elongata and Collozoum sp. on a distinct branch from S. streptacantha CTLDs, which contained carbohydrate-binding motifs typically observed in other marine photosymbioses. Our data suggest that, similarly to other well-known marine photosymbioses involving metazoans, the interactions of glycans with c-type lectins are likely involved in modulation of the host/symbiont specific recognition in Radiolaria.
radiolarian; ESTs; c-type lectins; plankton; photosymbiosis; Rhizaria
Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before.
By combining this sequence with RNA-seq data, we construct a fine transcriptome map of chromosome 3B. More than 8,800 transcription sites are identified, distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing, and several structural features of genes, including transcript length, number of exons, and cumulative intron length, are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level.
Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-015-0601-9) contains supplementary material, which is available to authorized users.
The domestication of citrus is poorly understood. Cultivated types are selections from, or hybrids of, wild progenitor species, whose identities and contributions remain controversial. By comparative analysis of a collection of citrus genomes, including a high quality haploid reference, we show that cultivated types were derived from two progenitor species. Though cultivated pummelos represent selections from a single progenitor species, C. maxima, cultivated mandarins are introgressions of C. maxima into the ancestral mandarin species, C. reticulata. The most widely cultivated citrus, sweet orange, is the offspring of previously admixed individuals, but sour orange is an F1 hybrid of pure C. maxima and C. reticulata parents, implying that wild mandarins were part of the early breeding germplasm. A wild “mandarin” from China exhibited substantial divergence from C. reticulata, suggesting the possibility of other unrecognized wild citrus species. Understanding citrus phylogeny through genome analysis clarifies taxonomic relationships and enables sequence-directed genetic improvement.
Dimorphic mating-type chromosomes in fungi are excellent models for understanding the genomic consequences of recombination suppression. Their suppressed recombination and reduced effective population size are expected to limit the efficacy of natural selection, leading to genomic degeneration. Our aim was to identify the sequences of the mating-type chromosomes (a1 and a2) of the anther-smut fungi and to investigate degeneration in their nonrecombining regions. We used the haploid a1 Microbotryum lychnidis-dioicae reference genome sequence. The a1 and a2 mating-type chromosomes were both isolated electrophoretically and sequenced. Integration with restriction-digest optical maps identified regions of recombination and nonrecombination in the mating-type chromosomes. Genome sequence data were also obtained for 12 other Microbotryum species. We found strong evidence of degeneration across the genus in the nonrecombining regions of the mating-type chromosomes, with significantly higher rates of nonsynonymous substitution (dN/dS) than in nonmating-type chromosomes or in recombining regions of the mating-type chromosomes. The nonrecombining regions of the mating-type chromosomes also showed high transposable element content, weak gene expression, and gene losses. The levels of degeneration did not differ between the a1 and a2 mating-type chromosomes, consistent with the lack of homogametic/heterogametic asymmetry between them, and contrasting with X/Y or Z/W sex chromosomes.
Y chromosome; Silene latifolia; Microbotryum violaceum; PAR; evolutionary strata; autosomes; allosomes; genetic map
Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes compose over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation-sequencing are described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements.
We present TE-Tracker, a general and accurate computational method for the de-novo detection of germ line TE mobilization from re-sequenced genomes, as well as the identification of both their source and destination sequences. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker.
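The two metrics used in this comparison have standard definitions: positive predictive value is TP / (TP + FP) and sensitivity is TP / (TP + FN), where TP, FP and FN are counts of true positive, false positive and false negative calls. The sketch below only illustrates these definitions; the counts are hypothetical and are not results from the TE-Tracker evaluation, and the function names are not part of TE-Tracker.

```python
def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of reported events that are real: TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("no predicted events, PPV is undefined")
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of real events that are reported: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no true events, sensitivity is undefined")
    return tp / (tp + fn)


# Hypothetical counts for one detection tool on a benchmark set.
tp, fp, fn = 90, 10, 30

print(f"PPV = {positive_predictive_value(tp, fp):.2f}")
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
```

A tool can score well on one metric and poorly on the other, which is why the comparison reports both.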
We show that TE-Tracker accurately detects both the source and destination of novel transposition events in re-sequenced genomes. Moreover, TE-Tracker is able to detect all potential donor sequences for a given insertion, and can identify the correct one among them. Furthermore, TE-Tracker produces significantly fewer false positives than common SV detection programs, thus greatly facilitating the detection and analysis of TE mobilization events.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0377-z) contains supplementary material, which is available to authorized users.
Transposable elements; Structural variants; Arabidopsis thaliana; Resequencing; DNA methylation
In order to gain insight into the impact of yolk increase on endoderm development, we have analyzed the mechanisms of endoderm formation in the catshark S. canicula, a species exhibiting telolecithal eggs and a distinct yolk sac. We show that in this species, endoderm markers are expressed in two distinct tissues, the deep mesenchyme, a mesenchymal population of deep blastomeres lying beneath the epithelial-like superficial layer, already specified at early blastula stages, and the involuting mesendoderm layer, which appears at the blastoderm posterior margin at the onset of gastrulation. Formation of the deep mesenchyme involves cell internalizations from the superficial layer prior to gastrulation, by a movement suggestive of ingressions. These cell movements were observed not only at the posterior margin, where massive internalizations take place prior to the start of involution, but also in the center of the blastoderm, where internalizations of single cells prevail. Like the adjacent involuting mesendoderm, the posterior deep mesenchyme expresses anterior mesendoderm markers under the control of Nodal/activin signaling. Comparisons across vertebrates support the conclusion that endoderm is specified in two distinct temporal phases in the catshark as in all major osteichthyan lineages, in line with an ancient origin of a biphasic mode of endoderm specification in gnathostomes. They also highlight unexpected similarities with amniotes, such as the occurrence of cell ingressions from the superficial layer prior to gastrulation. These similarities may correspond to homoplastic traits fixed separately in amniotes and chondrichthyans and related to the increase in egg yolk mass.
endoderm; telolecithal egg; chondrichthyan; Nodal signalling
Metatranscriptomics is rapidly expanding our knowledge of gene expression patterns and pathway dynamics in natural microbial communities. However, to cope with the challenges of environmental sampling, various rRNA removal and cDNA synthesis methods have been applied in published microbial metatranscriptomic studies, making comparisons arduous. Whereas efficiency and biases introduced by rRNA removal methods have been relatively well explored, the impact of cDNA synthesis and library preparation on transcript abundance remains poorly characterized. The evaluation of potential biases introduced at this step is challenging for metatranscriptomic samples, where data analyses are complex, for example because of the lack of reference genomes.
Herein, we tested four cDNA synthesis and Illumina library preparation protocols on a simplified mixture of total RNA extracted from four bacterial species. In parallel, RNA from each microbe was tested individually. cDNA synthesis was performed on rRNA-depleted samples using the TruSeq Stranded Total RNA Library Preparation, the SMARTer Stranded RNA-Seq, or the Ovation RNA-Seq V2 System. A fourth experiment was performed directly on total RNA using the Encore Complete Prokaryotic RNA-Seq. The obtained sequencing data were analyzed for: library complexity and reproducibility; rRNA removal efficiency and bias; the number of genes detected; coverage uniformity; and the impact of protocols on expression biases. Significant variations, especially in organism representation and gene expression patterns, were observed among the four methods. TruSeq generally performed best, but is limited by its requirement of hundreds of nanograms of total RNA. The SMARTer method appears the best solution for smaller amounts of input RNA. For very low amounts of RNA, the Ovation System provides the only option; however, the observed biases emphasized its limitations for quantitative analyses.
cDNA and library preparation methods may affect the outcome and interpretation of metatranscriptomic data. The most appropriate method should be chosen based on the available quantity of input RNA and the quantitative or non-quantitative objectives of the study. When low amounts of RNA are available, as in most metatranscriptomic studies, the SMARTer method seems to be the best compromise to obtain reliable results. This study emphasized the difficulty in comparing metatranscriptomic studies performed using different methods.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-912) contains supplementary material, which is available to authorized users.
Metatranscriptomics; cDNA synthesis method; Gene expression
Legume roots show a remarkable plasticity to adapt their architecture to biotic and abiotic constraints, including symbiotic interactions. However, global analysis of miRNA regulation in roots is limited, and a global view of the evolution of miRNA-mediated diversification in different ecotypes is lacking.
In the model legume Medicago truncatula, we analyze the small RNA transcriptome of roots subjected to symbiotic and pathogenic interactions. Genome mapping and a computational pipeline identify 416 miRNA candidates, including known and novel variants of 78 miRNA families present in miRBase. Stringent criteria of pre-miRNA prediction yield 52 new mtr-miRNAs, including 27 miRtrons. Analyzing miRNA precursor polymorphisms in 26 M. truncatula ecotypes identifies higher sequence polymorphism in conserved rather than Medicago-specific miRNA precursors. An average of 19 targets, mainly involved in environmental responses and signalling, is predicted per novel miRNA. We identify miRNAs responsive to bacterial and fungal pathogens or symbionts as well as their related Nod and Myc-LCO symbiotic signals. Network analyses reveal modules of new and conserved co-expressed miRNAs that regulate distinct sets of targets, highlighting potential miRNA-regulated biological pathways relevant to pathogenic and symbiotic interactions.
We identify 52 novel genuine miRNAs and large plasticity of the root miRNAome in response to the environment, and also in response to purified Myc/Nod signaling molecules. The new miRNAs identified and their sequence variation across M. truncatula ecotypes may be crucial to understanding the adaptation of root growth to the soil environment, notably in agriculturally important legume crops.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0457-4) contains supplementary material, which is available to authorized users.
The industrially important yeast Blastobotrys (Arxula) adeninivorans is an asexual hemiascomycete phylogenetically very distant from Saccharomyces cerevisiae. Its unusual metabolic flexibility allows it to use a wide range of carbon and nitrogen sources, while being thermotolerant, xerotolerant and osmotolerant.
The sequencing of strain LS3 revealed that the nuclear genome of A. adeninivorans is 11.8 Mb long and consists of four chromosomes with regional centromeres. Its closest sequenced relative is Yarrowia lipolytica, although mean conservation of orthologs is low. With 914 introns within 6116 genes, A. adeninivorans is one of the most intron-rich hemiascomycetes sequenced to date. Several large species-specific families appear to result from multiple rounds of segmental duplications of tandem gene arrays, a novel mechanism not yet described in yeasts. An analysis of the genome and its transcriptome revealed enzymes with biotechnological potential, such as two extracellular tannases (Atan1p and Atan2p) of the tannic-acid catabolic route, and a new pathway for the assimilation of n-butanol via butyric aldehyde and butyric acid.
The high-quality genome of this species, which diverged early in Saccharomycotina, will allow further fundamental studies on comparative genomics, evolution and phylogenetics. Protein components of different pathways for carbon and nitrogen source utilization, which have so far remained unexplored in yeast, were identified, offering clues for further biotechnological developments. In the search for alternative microorganisms of biotechnological interest, A. adeninivorans has already proved competitive as a promising cell factory for many more applications.
Yeast; Genome; Biotechnology; Tannic acid; n-butanol; Metabolism
Members of the family Trypanosomatidae infect many organisms, including animals, plants and humans. Plant-infecting trypanosomes are grouped under the single genus Phytomonas, failing to reflect the wide biological and pathological diversity of these protists. While some Phytomonas spp. multiply in the latex of plants, or in fruit or seeds without apparent pathogenicity, others colonize the phloem sap and afflict plants of substantial economic value, including the coffee tree, coconut and oil palms. Plant trypanosomes have not been studied extensively at the genome level, a major gap in understanding and controlling pathogenesis. We describe the genome sequences of two plant trypanosomatids, one pathogenic isolate from a Guianan coconut and one non-symptomatic isolate from Euphorbia collected in France. Although these parasites have extremely distinct pathogenic impacts, very few genes are unique to either, with the vast majority of genes shared by both isolates. Significantly, both Phytomonas spp. genomes consist essentially of single copy genes for the bulk of their metabolic enzymes, whereas other trypanosomatids, e.g. Leishmania and Trypanosoma, possess multiple paralogous genes or families. Indeed, comparison with other trypanosomatid genomes revealed a highly streamlined genome, encoding a minimized metabolic system while conserving the major pathways, and with retention of a full complement of endomembrane organelles, but with no evidence for functional complexity. Identification of the metabolic genes of Phytomonas provides opportunities for establishing in vitro culturing of these fastidious parasites and new tools for the control of agricultural plant disease.
Some plant trypanosomes, single-celled organisms living in phloem sap, are responsible for important palm diseases, inducing frequent expensive and toxic insecticide treatments against their insect vectors. Other trypanosomes multiply in latex tubes without detriment to their host. Despite the wide range of behaviors and impacts, these trypanosomes have been rather unceremoniously lumped into a single genus: Phytomonas. A battery of molecular probes has been used for their characterization but no clear phylogeny or classification has been established. We have sequenced the genomes of a pathogenic phloem-specific Phytomonas from a diseased South American coconut palm and a latex-specific isolate collected from an apparently healthy wild euphorb in the south of France. Upon comparison with each other and with human pathogenic trypanosomes, both Phytomonas revealed distinctive compact genomes, consisting essentially of single-copy genes, with the vast majority of genes shared by both isolates irrespective of their effect on the host. A strong cohort of enzymes in the sugar metabolism pathways was consistent with the nutritional environments found in plants. The genetic nuances may reveal the basis for the behavioral differences between these two unique plant parasites, and indicate the direction of our future studies in search of effective treatment of the crop disease parasites.
The numerous yeast genome sequences presently available provide a rich source of information for functional as well as evolutionary genomics but unequally cover the large phylogenetic diversity of extant yeasts. We present here the complete sequence of the nuclear genome of the haploid-type strain of Kuraishia capsulata (CBS1993T), a nitrate-assimilating Saccharomycetales of uncertain taxonomy, isolated from tunnels of insect larvae underneath coniferous barks and characterized by its copious production of extracellular polysaccharides. The sequence is composed of seven scaffolds, one per chromosome, totaling 11.4 Mb and containing 6,029 protein-coding genes, ∼13.5% of which are interrupted by introns. This GC-rich yeast genome (45.7%) appears phylogenetically related to the few other nitrate-assimilating yeasts sequenced so far, Ogataea polymorpha, O. parapolymorpha, and Dekkera bruxellensis, with which it shares a very reduced number of tRNA genes, a novel tRNA sparing strategy, and a common nitrate assimilation cluster, three features specific to this group of yeasts. Centromeres were recognized in GC-poor troughs of each scaffold. The strain bears MAT alpha genes at a single MAT locus and presents a significant degree of conservation with Saccharomyces cerevisiae genes, suggesting that it can perform sexual cycles in nature, although genes involved in meiosis were not all recognized. The complete absence of conservation of synteny between K. capsulata and any other yeast genome described so far, including the three other nitrate-assimilating species, validates the interest of this species for long-range evolutionary genomic studies among Saccharomycotina yeasts.
phylogeny; centromere; synteny; noncoding RNA; introgression
Candida glabrata follows C. albicans as the second or third most prevalent cause of candidemia worldwide. These two pathogenic yeasts are distantly related, C. glabrata being part of the Nakaseomyces, a group more closely related to Saccharomyces cerevisiae. Although C. glabrata was thought to be the only pathogenic Nakaseomyces, two new pathogens have recently been described within this group: C. nivariensis and C. bracarensis. To gain insight into the genomic changes underlying the emergence of virulence, we sequenced the genomes of these two, and three other non-pathogenic Nakaseomyces, and compared them to other sequenced yeasts.
Our results indicate that the two new pathogens are more closely related to the non-pathogenic N. delphensis than to C. glabrata. We uncover duplications and accelerated evolution that specifically affected genes in the lineage preceding the group containing N. delphensis and the three pathogens, which may provide clues to the higher propensity of this group to infect humans. Finally, the number of Epa-like adhesins is specifically enriched in the pathogens, particularly in C. glabrata.
Remarkably, some features thought to be the result of adaptation of C. glabrata to a pathogenic lifestyle are present throughout the Nakaseomyces, indicating that these are rather ancient adaptations to other environments. Phylogeny suggests that human pathogenesis evolved several times independently within the clade. The expansion of the EPA gene family in pathogens establishes an evolutionary link between adhesion and virulence phenotypes. Our analyses thus shed light onto the relationships between virulence and the recent genomic changes that occurred within the Nakaseomyces.
Sequence Accession Numbers
Nakaseomyces delphensis: CAPT01000001 to CAPT01000179
Candida bracarensis: CAPU01000001 to CAPU01000251
Candida nivariensis: CAPV01000001 to CAPV01000123
Candida castellii: CAPW01000001 to CAPW01000101
Nakaseomyces bacillisporus: CAPX01000001 to CAPX01000186
Candida glabrata; Fungal pathogens; Nakaseomyces; Yeast genomes; Yeast evolution
Genomes of animals as different as sponges and humans show conservation of global architecture. Here we show that multiple genomic features including transposon diversity, developmental gene repertoire, physical gene order, and intron-exon organization are shattered in the tunicate Oikopleura, belonging to the sister group of vertebrates and retaining chordate morphology. Ancestral architecture of animal genomes can be deeply modified and may therefore be largely nonadaptive. This rapidly evolving animal lineage thus offers unique perspectives on the level of genome plasticity. It also illuminates issues as fundamental as the mechanisms of intron gain.
Nucleo-cytoplasmic large DNA viruses (NCLDVs) constitute a group of eukaryotic viruses that can have crucial ecological roles in the sea by accelerating the turnover of their unicellular hosts or by causing diseases in animals. To better characterize the diversity, abundance and biogeography of marine NCLDVs, we analyzed 17 metagenomes derived from microbial samples (0.2–1.6 μm size range) collected during the Tara Oceans Expedition. The sample set includes ecosystems under-represented in previous studies, such as the Arabian Sea oxygen minimum zone (OMZ) and Indian Ocean lagoons. By combining computationally derived relative abundance and direct prokaryote cell counts, the abundance of NCLDVs was found to be in the order of 10⁴–10⁵ genomes ml⁻¹ for the samples from the photic zone and 10²–10³ genomes ml⁻¹ for the OMZ. The Megaviridae and Phycodnaviridae dominated the NCLDV populations in the metagenomes, although most of the reads classified in these families showed large divergence from known viral genomes. Our taxon co-occurrence analysis revealed a potential association between viruses of the Megaviridae family and eukaryotes related to oomycetes. In support of this predicted association, we identified six cases of lateral gene transfer between Megaviridae and oomycetes. Our results suggest that marine NCLDVs probably outnumber eukaryotic organisms in the photic layer (per given water mass) and that metagenomic sequence analyses promise to shed new light on the biodiversity of marine viruses and their interactions with potential hosts.
eukaryotic viruses; marine NCLDVs; taxon co-occurrence; oomycetes
We performed high-throughput sequencing of DNA from fossilized faeces to evaluate this material as a source of information on the genome and diet of Pleistocene carnivores. We analysed coprolites derived from the extinct cave hyena (Crocuta crocuta spelaea), and sequenced 90 million DNA fragments from two specimens. The DNA reads enabled a reconstruction of the cave hyena mitochondrial genome with up to a 158-fold coverage. This genome, and those sequenced from extant spotted (Crocuta crocuta) and striped (Hyaena hyaena) hyena specimens, allows for the establishment of a robust phylogeny that supports a close relationship between the cave and the spotted hyena. We also demonstrate that high-throughput sequencing yields data for cave hyena multi-copy and single-copy nuclear genes, and that about 50 per cent of the coprolite DNA can be ascribed to this species. Analysing the data for additional species to indicate the cave hyena diet, we retrieved abundant sequences for the red deer (Cervus elaphus), and characterized its mitochondrial genome with up to a 3.8-fold coverage. In conclusion, we have demonstrated the presence of abundant ancient DNA in the coprolites surveyed. Shotgun sequencing of this material yielded a wealth of DNA sequences for a Pleistocene carnivore and allowed unbiased identification of diet.
ancient DNA; Cervus elaphus; Crocuta crocuta; Hyaena hyaena; mitochondrial genome
Microbes drive the biogeochemistry that fuels the planet. Microbial viruses modulate their hosts directly through mortality and horizontal gene transfer, and indirectly by re-programming host metabolisms during infection. However, our ability to study these virus-host interactions is limited by methods that are low-throughput and heavily reliant upon the subset of organisms that are in culture. One way forward is the use of culture-independent metagenomic approaches, but these novel methods are rarely rigorously tested, especially for studies of environmental viruses, air microbiomes, extreme environment microbiology and other areas with constrained sample amounts. Here we perform replicated experiments to evaluate Roche 454, Illumina HiSeq, and Ion Torrent PGM sequencing and library preparation protocols on virus metagenomes generated from as little as 10 pg of DNA.
Using %G + C content to compare metagenomes, we find that (i) metagenomes are highly replicable, (ii) some treatment effects are minimal, e.g., sequencing technology choice has 6-fold less impact than varying input DNA amount, and (iii) when restricted to a limited DNA concentration (<1 μg), changing the amount of amplification produces little variation. These trends were also observed when examining the metagenomes for gene function and assembly performance, although the latter more closely aligned to sequencing effort and read length than preparation steps tested. Among Illumina library preparation options, transposon-based libraries diverged from all others and adaptor ligation was a critical step for optimizing sequencing yields.
These data guide researchers in generating systematic, comparative datasets to understand complex ecosystems, and suggest that neither varied amplification nor sequencing platforms will deter such efforts.
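As an illustration of the %G + C comparison described above, the following minimal Python sketch computes a per-read %G + C histogram for each metagenome and a simple distance between two histograms. The function names, the 5% bin width and the use of half the L1 distance are assumptions made for this example; the abstract does not state which statistic the authors used.

```python
def gc_percent(seq):
    """Return %G+C of a nucleotide sequence, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * gc / total


def gc_histogram(reads, bin_width=5):
    """Fraction of reads per %G+C bin, as a dict {bin_start: fraction}.

    bin_width is an illustrative choice, not a value from the study.
    """
    counts = {}
    n = 0
    for read in reads:
        try:
            pct = gc_percent(read)
        except ValueError:
            continue  # skip reads with no A/C/G/T at all
        # Reads at exactly 100% go into the last bin.
        start = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        counts[start] = counts.get(start, 0) + 1
        n += 1
    return {k: v / n for k, v in sorted(counts.items())} if n else {}


def histogram_distance(h1, h2):
    """Half the L1 distance between two histograms (0 = identical, 1 = disjoint)."""
    keys = set(h1) | set(h2)
    return 0.5 * sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in keys)
```

Replicate metagenomes would be expected to give small distances under such a measure, while treatments that shift the %G + C profile would give larger ones.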
We report the development of OikoBase (http://oikoarrays.biology.uiowa.edu/Oiko/), a tiling array-based genome browser resource for Oikopleura dioica, a metazoan belonging to the urochordates, the closest extant group to vertebrates. OikoBase facilitates retrieval and mining of a variety of useful genomics information. First, it includes a genome browser which interrogates 1260 genomic sequence scaffolds and features gene, transcript and CDS annotation tracks. Second, we annotated gene models with gene ontology (GO) terms and InterPro domains which are directly accessible in the browser with links to their entries in the GO (http://www.geneontology.org/) and InterPro (http://www.ebi.ac.uk/interpro/) databases, and we provide transcript and peptide links for sequence downloads. Third, we introduce the transcriptomics of a comprehensive set of developmental stages of O. dioica at high resolution and provide downloadable gene expression data for all developmental stages. Fourth, we incorporate a BLAST tool to identify homologs of genes and proteins. Finally, we include a tutorial that describes how to use OikoBase as well as a link to detailed methods, explaining the data generation and analysis pipeline. OikoBase will provide a valuable resource for research in chordate development, genome evolution and plasticity and the molecular ecology of this important marine planktonic organism.
Insertions of parasitic DNA within coding sequences are usually deleterious and are generally counter-selected during evolution. Thanks to nuclear dimorphism, ciliates provide unique models to study the fate of such insertions. Their germline genome undergoes extensive rearrangements during development of a new somatic macronucleus from the germline micronucleus following sexual events. In Paramecium, these rearrangements include precise excision of unique-copy Internal Eliminated Sequences (IES) from the somatic DNA, requiring the activity of a domesticated piggyBac transposase, PiggyMac. We have sequenced Paramecium tetraurelia germline DNA, establishing a genome-wide catalogue of ∼45,000 IESs, in order to gain insight into their evolutionary origin and excision mechanism. We obtained direct evidence that PiggyMac is required for excision of all IESs. Homology with known P. tetraurelia Tc1/mariner transposons, described here, indicates that at least a fraction of IESs derive from these elements. Most IES insertions occurred before a recent whole-genome duplication that preceded diversification of the P. aurelia species complex, but IES invasion of the Paramecium genome appears to be an ongoing process. Once inserted, IESs decay rapidly by accumulation of deletions and point substitutions. Over 90% of the IESs are shorter than 150 bp and present a remarkable size distribution with a ∼10 bp periodicity, corresponding to the helical repeat of double-stranded DNA and suggesting DNA loop formation during assembly of a transpososome-like excision complex. IESs are equally frequent within and between coding sequences; however, excision is not 100% efficient and there is selective pressure against IES insertions, in particular within highly expressed genes. 
We discuss the possibility that ancient domestication of a piggyBac transposase favored subsequent propagation of transposons throughout the germline by allowing insertions in coding sequences, a fraction of the genome in which parasitic DNA is not usually tolerated.
Ciliates are unicellular eukaryotes that rearrange their genomes at every sexual generation when a new somatic macronucleus, responsible for gene expression, develops from a copy of the germline micronucleus. In Paramecium, assembly of a functional somatic genome requires precise excision of interstitial DNA segments, the Internal Eliminated Sequences (IES), involving a domesticated piggyBac transposase, PiggyMac. To study IES origin and evolution, we sequenced germline DNA and identified 45,000 IESs. We found that at least some of these unique-copy elements are decayed Tc1/mariner transposons and that IES insertion is likely an ongoing process. After insertion, elements decay rapidly by accumulation of deletions and substitutions. The 93% of IESs shorter than 150 bp display a remarkable size distribution with a periodicity of 10 bp, the helical repeat of double-stranded DNA, consistent with the idea that evolution has only retained IESs that can form a double-stranded DNA loop during assembly of an excision complex. We propose that the ancient domestication of a piggyBac transposase, which provided a precise excision mechanism, enabled transposons to subsequently invade Paramecium coding sequences, a fraction of the genome that does not usually tolerate parasitic DNA.
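The ∼10 bp periodicity in IES sizes described above could be checked with a simple autocorrelation of the length histogram. The Python sketch below is only an illustration under stated assumptions: the function names, the 150 bp cut-off used as a default and the lag range are choices made for this example, not the authors' published method.

```python
from collections import Counter


def length_counts(lengths, max_len=150):
    """Count sequences at each length from 0 to max_len (inclusive)."""
    c = Counter(n for n in lengths if 0 < n <= max_len)
    return [c.get(i, 0) for i in range(max_len + 1)]


def autocorrelation(counts, lag):
    """Normalized autocorrelation of the mean-centred count vector at one lag."""
    n = len(counts)
    mean = sum(counts) / n
    centred = [x - mean for x in counts]
    var = sum(x * x for x in centred)
    if var == 0:
        return 0.0
    return sum(centred[i] * centred[i + lag] for i in range(n - lag)) / var


def dominant_period(lengths, min_lag=2, max_lag=30, max_len=150):
    """Lag (in bp) with the highest autocorrelation of the length histogram."""
    counts = length_counts(lengths, max_len)
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(counts, lag))
```

On real data, multiples of the true period also show elevated autocorrelation, so the full autocorrelation profile should be inspected instead of relying on the single best lag.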
The highly hemolytic strain Bacillus cereus F837/76 was isolated in 1976 from a contaminated prostate wound. The complete nucleotide sequence of this strain, reported here, shows nearly 36,500 single-nucleotide differences from the closest sequenced strain, Bacillus thuringiensis Al Hakam. F837/76 also contains a 10-kb plasmid that was not detected in the Al Hakam strain.