The domestication of citrus, is poorly understood. Cultivated types are selections from, or hybrids of, wild progenitor species, whose identities and contributions remain controversial. By comparative analysis of a collection of citrus genomes, including a high quality haploid reference, we show that cultivated types were derived from two progenitor species. Though cultivated pummelos represent selections from a single progenitor species, C. maxima, cultivated mandarins are introgressions of C. maxima into the ancestral mandarin species, C. reticulata. The most widely cultivated citrus, sweet orange, is the offspring of previously admixed individuals, but sour orange is an F1 hybrid of pure C. maxima and C. reticulata parents, implying that wild mandarins were part of the early breeding germplasm. A wild “mandarin” from China exhibited substantial divergence from C. reticulata, suggesting the possibility of other unrecognized wild citrus species. Understanding citrus phylogeny through genome analysis clarifies taxonomic relationships and enables sequence-directed genetic improvement.
Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes compose over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation-sequencing are described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements.
We present TE-Tracker, a general and accurate computational method for the de-novo detection of germ line TE mobilization from re-sequenced genomes, as well as the identification of both their source and destination sequences. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker.
We show that TE-Tracker accurately detects both the source and destination of novel transposition events in re-sequenced genomes. Moreover, TE-Tracker is able to detect all potential donor sequences for a given insertion, and can identify the correct one among them. Furthermore, TE-Tracker produces significantly fewer false positives than common SV detection programs, thus greatly facilitating the detection and analysis of TE mobilization events.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0377-z) contains supplementary material, which is available to authorized users.
Transposable elements; Structural variants; Arabidopsis thaliana; Resequencing; DNA methylation
In order to gain insight into the impact of yolk increase on endoderm development, we have analyzed the mechanisms of endoderm formation in the catshark S. canicula, a species exhibiting telolecithal eggs and a distinct yolk sac. We show that in this species, endoderm markers are expressed in two distinct tissues, the deep mesenchyme, a mesenchymal population of deep blastomeres lying beneath the epithelial-like superficial layer, already specified at early blastula stages, and the involuting mesendoderm layer, which appears at the blastoderm posterior margin at the onset of gastrulation. Formation of the deep mesenchyme involves cell internalizations from the superficial layer prior to gastrulation, by a movement suggestive of ingressions. These cell movements were observed not only at the posterior margin, where massive internalizations take place prior to the start of involution, but also in the center of the blastoderm, where internalizations of single cells prevail. Like the adjacent involuting mesendoderm, the posterior deep mesenchyme expresses anterior mesendoderm markers under the control of Nodal/activin signaling. Comparisons across vertebrates support the conclusion that endoderm is specified in two distinct temporal phases in the catshark as in all major osteichthyan lineages, in line with an ancient origin of a biphasic mode of endoderm specification in gnathostomes. They also highlight unexpected similarities with amniotes, such as the occurrence of cell ingressions from the superficial layer prior to gastrulation. These similarities may correspond to homoplastic traits fixed separately in amniotes and chondrichthyans and related to the increase in egg yolk mass.
endoderm; telolecithal egg; chondrichthyan; Nodal signalling
Metatranscriptomics is rapidly expanding our knowledge of gene expression patterns and pathway dynamics in natural microbial communities. However, to cope with the challenges of environmental sampling, various rRNA removal and cDNA synthesis methods have been applied in published microbial metatranscriptomic studies, making comparisons arduous. Whereas efficiency and biases introduced by rRNA removal methods have been relatively well explored, the impact of cDNA synthesis and library preparation on transcript abundance remains poorly characterized. The evaluation of potential biases introduced at this step is challenging for metatranscriptomic samples, where data analyses are complex, for example because of the lack of reference genomes.
Herein, we tested four cDNA synthesis and Illumina library preparation protocols on a simplified mixture of total RNA extracted from four bacterial species. In parallel, RNA from each microbe was tested individually. cDNA synthesis was performed on rRNA depleted samples using the TruSeq Stranded Total RNA Library Preparation, the SMARTer Stranded RNA-Seq, or the Ovation RNA-Seq V2 System. A fourth experiment was made directly from total RNA using the Encore Complete Prokaryotic RNA-Seq. The obtained sequencing data were analyzed for: library complexity and reproducibility; rRNA removal efficiency and bias; the number of genes detected; coverage uniformity; and the impact of protocols on expression biases. Significant variations, especially in organism representation and gene expression patterns, were observed among the four methods. TruSeq generally performed best, but is limited by its requirement of hundreds of nanograms of total RNA. The SMARTer method appears the best solution for smaller amounts of input RNA. For very low amounts of RNA, the Ovation System provides the only option; however, the observed biases emphasized its limitations for quantitative analyses.
cDNA and library preparation methods may affect the outcome and interpretation of metatranscriptomic data. The most appropriate method should be chosen based on the available quantity of input RNA and the quantitative or non-quantitative objectives of the study. When low amounts of RNA are available, as in most metatranscriptomic studies, the SMARTer method seems to be the best compromise to obtain reliable results. This study emphasized the difficulty in comparing metatranscriptomic studies performed using different methods.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-912) contains supplementary material, which is available to authorized users.
Metatranscriptomics; cDNA synthesis method; Gene expression
Legume roots show a remarkable plasticity to adapt their architecture to biotic and abiotic constraints, including symbiotic interactions. However, global analysis of miRNA regulation in roots is limited, and a global view of the evolution of miRNA-mediated diversification in different ecotypes is lacking.
In the model legume Medicago truncatula, we analyze the small RNA transcriptome of roots submitted to symbiotic and pathogenic interactions. Genome mapping and a computational pipeline identify 416 miRNA candidates, including known and novel variants of 78 miRNA families present in miRBase. Stringent criteria of pre-miRNA prediction yield 52 new mtr-miRNAs, including 27 miRtrons. Analyzing miRNA precursor polymorphisms in 26 M. truncatula ecotypes identifies higher sequence polymorphism in conserved rather than Medicago-specific miRNA precursors. An average of 19 targets, mainly involved in environmental responses and signalling, is predicted per novel miRNA. We identify miRNAs responsive to bacterial and fungal pathogens or symbionts as well as their related Nod and Myc-LCO symbiotic signals. Network analyses reveal modules of new and conserved co-expressed miRNAs that regulate distinct sets of targets, highlighting potential miRNA-regulated biological pathways relevant to pathogenic and symbiotic interactions.
We identify 52 novel genuine miRNAs and large plasticity of the root miRNAome in response to the environment, and also in response to purified Myc/Nod signaling molecules. The new miRNAs identified and their sequence variation across M. truncatula ecotypes may be crucial to understand the adaptation of root growth to the soil environment, notably in the agriculturally important legume crops.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0457-4) contains supplementary material, which is available to authorized users.
The industrially important yeast Blastobotrys (Arxula) adeninivorans is an asexual hemiascomycete phylogenetically very distant from Saccharomyces cerevisiae. Its unusual metabolic flexibility allows it to use a wide range of carbon and nitrogen sources, while being thermotolerant, xerotolerant and osmotolerant.
The sequencing of strain LS3 revealed that the nuclear genome of A. adeninivorans is 11.8 Mb long and consists of four chromosomes with regional centromeres. Its closest sequenced relative is Yarrowia lipolytica, although mean conservation of orthologs is low. With 914 introns within 6116 genes, A. adeninivorans is one of the most intron-rich hemiascomycetes sequenced to date. Several large species-specific families appear to result from multiple rounds of segmental duplications of tandem gene arrays, a novel mechanism not yet described in yeasts. An analysis of the genome and its transcriptome revealed enzymes with biotechnological potential, such as two extracellular tannases (Atan1p and Atan2p) of the tannic-acid catabolic route, and a new pathway for the assimilation of n-butanol via butyric aldehyde and butyric acid.
The high-quality genome of this species that diverged early in Saccharomycotina will allow further fundamental studies on comparative genomics, evolution and phylogenetics. Protein components of different pathways for carbon and nitrogen source utilization were identified, which so far has remained unexplored in yeast, offering clues for further biotechnological developments. In the course of identifying alternative microorganisms for biotechnological interest, A. adeninivorans has already proved its strengthened competitiveness as a promising cell factory for many more applications.
Yeast; Genome; Biotechnology; Tannic acid; n-butanol; Metabolism
Members of the family Trypanosomatidae infect many organisms, including animals, plants and humans. Plant-infecting trypanosomes are grouped under the single genus Phytomonas, failing to reflect the wide biological and pathological diversity of these protists. While some Phytomonas spp. multiply in the latex of plants, or in fruit or seeds without apparent pathogenicity, others colonize the phloem sap and afflict plants of substantial economic value, including the coffee tree, coconut and oil palms. Plant trypanosomes have not been studied extensively at the genome level, a major gap in understanding and controlling pathogenesis. We describe the genome sequences of two plant trypanosomatids, one pathogenic isolate from a Guianan coconut and one non-symptomatic isolate from Euphorbia collected in France. Although these parasites have extremely distinct pathogenic impacts, very few genes are unique to either, with the vast majority of genes shared by both isolates. Significantly, both Phytomonas spp. genomes consist essentially of single copy genes for the bulk of their metabolic enzymes, whereas other trypanosomatids e.g. Leishmania and Trypanosoma possess multiple paralogous genes or families. Indeed, comparison with other trypanosomatid genomes revealed a highly streamlined genome, encoding for a minimized metabolic system while conserving the major pathways, and with retention of a full complement of endomembrane organelles, but with no evidence for functional complexity. Identification of the metabolic genes of Phytomonas provides opportunities for establishing in vitro culturing of these fastidious parasites and new tools for the control of agricultural plant disease.
Some plant trypanosomes, single-celled organisms living in phloem sap, are responsible for important palm diseases, inducing frequent expensive and toxic insecticide treatments against their insect vectors. Other trypanosomes multiply in latex tubes without detriment to their host. Despite the wide range of behaviors and impacts, these trypanosomes have been rather unceremoniously lumped into a single genus: Phytomonas. A battery of molecular probes has been used for their characterization but no clear phylogeny or classification has been established. We have sequenced the genomes of a pathogenic phloem-specific Phytomonas from a diseased South American coconut palm and a latex-specific isolate collected from an apparently healthy wild euphorb in the south of France. Upon comparison with each other and with human pathogenic trypanosomes, both Phytomonas revealed distinctive compact genomes, consisting essentially of single-copy genes, with the vast majority of genes shared by both isolates irrespective of their effect on the host. A strong cohort of enzymes in the sugar metabolism pathways was consistent with the nutritional environments found in plants. The genetic nuances may reveal the basis for the behavioral differences between these two unique plant parasites, and indicate the direction of our future studies in search of effective treatment of the crop disease parasites.
The numerous yeast genome sequences presently available provide a rich source of information for functional as well as evolutionary genomics but unequally cover the large phylogenetic diversity of extant yeasts. We present here the complete sequence of the nuclear genome of the haploid-type strain of Kuraishia capsulata (CBS1993T), a nitrate-assimilating Saccharomycetales of uncertain taxonomy, isolated from tunnels of insect larvae underneath coniferous barks and characterized by its copious production of extracellular polysaccharides. The sequence is composed of seven scaffolds, one per chromosome, totaling 11.4 Mb and containing 6,029 protein-coding genes, ∼13.5% of which being interrupted by introns. This GC-rich yeast genome (45.7%) appears phylogenetically related with the few other nitrate-assimilating yeasts sequenced so far, Ogataea polymorpha, O. parapolymorpha, and Dekkera bruxellensis, with which it shares a very reduced number of tRNA genes, a novel tRNA sparing strategy, and a common nitrate assimilation cluster, three specific features to this group of yeasts. Centromeres were recognized in GC-poor troughs of each scaffold. The strain bears MAT alpha genes at a single MAT locus and presents a significant degree of conservation with Saccharomyces cerevisiae genes, suggesting that it can perform sexual cycles in nature, although genes involved in meiosis were not all recognized. The complete absence of conservation of synteny between K. capsulata and any other yeast genome described so far, including the three other nitrate-assimilating species, validates the interest of this species for long-range evolutionary genomic studies among Saccharomycotina yeasts.
phylogeny; centromere; synteny; noncoding RNA; introgression
Candida glabrata follows C. albicans as the second or third most prevalent cause of candidemia worldwide. These two pathogenic yeasts are distantly related, C. glabrata being part of the Nakaseomyces, a group more closely related to Saccharomyces cerevisiae. Although C. glabrata was thought to be the only pathogenic Nakaseomyces, two new pathogens have recently been described within this group: C. nivariensis and C. bracarensis. To gain insight into the genomic changes underlying the emergence of virulence, we sequenced the genomes of these two, and three other non-pathogenic Nakaseomyces, and compared them to other sequenced yeasts.
Our results indicate that the two new pathogens are more closely related to the non-pathogenic N. delphensis than to C. glabrata. We uncover duplications and accelerated evolution that specifically affected genes in the lineage preceding the group containing N. delphensis and the three pathogens, which may provide clues to the higher propensity of this group to infect humans. Finally, the number of Epa-like adhesins is specifically enriched in the pathogens, particularly in C. glabrata.
Remarkably, some features thought to be the result of adaptation of C. glabrata to a pathogenic lifestyle, are present throughout the Nakaseomyces, indicating these are rather ancient adaptations to other environments. Phylogeny suggests that human pathogenesis evolved several times, independently within the clade. The expansion of the EPA gene family in pathogens establishes an evolutionary link between adhesion and virulence phenotypes. Our analyses thus shed light onto the relationships between virulence and the recent genomic changes that occurred within the Nakaseomyces.
Sequence Accession Numbers
Nakaseomyces delphensis: CAPT01000001 to CAPT01000179
Candida bracarensis: CAPU01000001 to CAPU01000251
Candida nivariensis: CAPV01000001 to CAPV01000123
Candida castellii: CAPW01000001 to CAPW01000101
Nakaseomyces bacillisporus: CAPX01000001 to CAPX01000186
Candida glabrata; Fungal pathogens; Nakaseomyces; Yeast genomes; Yeast evolution
Genomes of animals as different as sponges and humans show conservation of global architecture. Here we show that multiple genomic features including transposon diversity, developmental gene repertoire, physical gene order, and intron-exon organization are shattered in the tunicate Oikopleura, belonging to the sister group of vertebrates and retaining chordate morphology. Ancestral architecture of animal genomes can be deeply modified and may therefore be largely nonadaptive. This rapidly evolving animal lineage thus offers unique perspectives on the level of genome plasticity. It also illuminates issues as fundamental as the mechanisms of intron gain.
Nucleo-cytoplasmic large DNA viruses (NCLDVs) constitute a group of eukaryotic viruses that can have crucial ecological roles in the sea by accelerating the turnover of their unicellular hosts or by causing diseases in animals. To better characterize the diversity, abundance and biogeography of marine NCLDVs, we analyzed 17 metagenomes derived from microbial samples (0.2–1.6 μm size range) collected during the Tara Oceans Expedition. The sample set includes ecosystems under-represented in previous studies, such as the Arabian Sea oxygen minimum zone (OMZ) and Indian Ocean lagoons. By combining computationally derived relative abundance and direct prokaryote cell counts, the abundance of NCLDVs was found to be in the order of 104–105 genomes ml−1 for the samples from the photic zone and 102–103 genomes ml−1 for the OMZ. The Megaviridae and Phycodnaviridae dominated the NCLDV populations in the metagenomes, although most of the reads classified in these families showed large divergence from known viral genomes. Our taxon co-occurrence analysis revealed a potential association between viruses of the Megaviridae family and eukaryotes related to oomycetes. In support of this predicted association, we identified six cases of lateral gene transfer between Megaviridae and oomycetes. Our results suggest that marine NCLDVs probably outnumber eukaryotic organisms in the photic layer (per given water mass) and that metagenomic sequence analyses promise to shed new light on the biodiversity of marine viruses and their interactions with potential hosts.
eukaryotic viruses; marine NCLDVs; taxon co-occurrence; oomycetes
We performed high-throughput sequencing of DNA from fossilized faeces to evaluate this material as a source of information on the genome and diet of Pleistocene carnivores. We analysed coprolites derived from the extinct cave hyena (Crocuta crocuta spelaea), and sequenced 90 million DNA fragments from two specimens. The DNA reads enabled a reconstruction of the cave hyena mitochondrial genome with up to a 158-fold coverage. This genome, and those sequenced from extant spotted (Crocuta crocuta) and striped (Hyaena hyaena) hyena specimens, allows for the establishment of a robust phylogeny that supports a close relationship between the cave and the spotted hyena. We also demonstrate that high-throughput sequencing yields data for cave hyena multi-copy and single-copy nuclear genes, and that about 50 per cent of the coprolite DNA can be ascribed to this species. Analysing the data for additional species to indicate the cave hyena diet, we retrieved abundant sequences for the red deer (Cervus elaphus), and characterized its mitochondrial genome with up to a 3.8-fold coverage. In conclusion, we have demonstrated the presence of abundant ancient DNA in the coprolites surveyed. Shotgun sequencing of this material yielded a wealth of DNA sequences for a Pleistocene carnivore and allowed unbiased identification of diet.
ancient DNA; Cervus elaphus; Crocuta crocuta; Hyaena hyaena; mitochondrial genome
Microbes drive the biogeochemistry that fuels the planet. Microbial viruses modulate their hosts directly through mortality and horizontal gene transfer, and indirectly by re-programming host metabolisms during infection. However, our ability to study these virus-host interactions is limited by methods that are low-throughput and heavily reliant upon the subset of organisms that are in culture. One way forward are culture-independent metagenomic approaches, but these novel methods are rarely rigorously tested, especially for studies of environmental viruses, air microbiomes, extreme environment microbiology and other areas with constrained sample amounts. Here we perform replicated experiments to evaluate Roche 454, Illumina HiSeq, and Ion Torrent PGM sequencing and library preparation protocols on virus metagenomes generated from as little as 10pg of DNA.
Using %G + C content to compare metagenomes, we find that (i) metagenomes are highly replicable, (ii) some treatment effects are minimal, e.g., sequencing technology choice has 6-fold less impact than varying input DNA amount, and (iii) when restricted to a limited DNA concentration (<1μg), changing the amount of amplification produces little variation. These trends were also observed when examining the metagenomes for gene function and assembly performance, although the latter more closely aligned to sequencing effort and read length than preparation steps tested. Among Illumina library preparation options, transposon-based libraries diverged from all others and adaptor ligation was a critical step for optimizing sequencing yields.
These data guide researchers in generating systematic, comparative datasets to understand complex ecosystems, and suggest that neither varied amplification nor sequencing platforms will deter such efforts.
We report the development of OikoBase (http://oikoarrays.biology.uiowa.edu/Oiko/), a tiling array-based genome browser resource for Oikopleura dioica, a metazoan belonging to the urochordates, the closest extant group to vertebrates. OikoBase facilitates retrieval and mining of a variety of useful genomics information. First, it includes a genome browser which interrogates 1260 genomic sequence scaffolds and features gene, transcript and CDS annotation tracks. Second, we annotated gene models with gene ontology (GO) terms and InterPro domains which are directly accessible in the browser with links to their entries in the GO (http://www.geneontology.org/) and InterPro (http://www.ebi.ac.uk/interpro/) databases, and we provide transcript and peptide links for sequence downloads. Third, we introduce the transcriptomics of a comprehensive set of developmental stages of O. dioica at high resolution and provide downloadable gene expression data for all developmental stages. Fourth, we incorporate a BLAST tool to identify homologs of genes and proteins. Finally, we include a tutorial that describes how to use OikoBase as well as a link to detailed methods, explaining the data generation and analysis pipeline. OikoBase will provide a valuable resource for research in chordate development, genome evolution and plasticity and the molecular ecology of this important marine planktonic organism.
Insertions of parasitic DNA within coding sequences are usually deleterious and are generally counter-selected during evolution. Thanks to nuclear dimorphism, ciliates provide unique models to study the fate of such insertions. Their germline genome undergoes extensive rearrangements during development of a new somatic macronucleus from the germline micronucleus following sexual events. In Paramecium, these rearrangements include precise excision of unique-copy Internal Eliminated Sequences (IES) from the somatic DNA, requiring the activity of a domesticated piggyBac transposase, PiggyMac. We have sequenced Paramecium tetraurelia germline DNA, establishing a genome-wide catalogue of ∼45,000 IESs, in order to gain insight into their evolutionary origin and excision mechanism. We obtained direct evidence that PiggyMac is required for excision of all IESs. Homology with known P. tetraurelia Tc1/mariner transposons, described here, indicates that at least a fraction of IESs derive from these elements. Most IES insertions occurred before a recent whole-genome duplication that preceded diversification of the P. aurelia species complex, but IES invasion of the Paramecium genome appears to be an ongoing process. Once inserted, IESs decay rapidly by accumulation of deletions and point substitutions. Over 90% of the IESs are shorter than 150 bp and present a remarkable size distribution with a ∼10 bp periodicity, corresponding to the helical repeat of double-stranded DNA and suggesting DNA loop formation during assembly of a transpososome-like excision complex. IESs are equally frequent within and between coding sequences; however, excision is not 100% efficient and there is selective pressure against IES insertions, in particular within highly expressed genes. We discuss the possibility that ancient domestication of a piggyBac transposase favored subsequent propagation of transposons throughout the germline by allowing insertions in coding sequences, a fraction of the genome in which parasitic DNA is not usually tolerated.
Ciliates are unicellular eukaryotes that rearrange their genomes at every sexual generation when a new somatic macronucleus, responsible for gene expression, develops from a copy of the germline micronucleus. In Paramecium, assembly of a functional somatic genome requires precise excision of interstitial DNA segments, the Internal Eliminated Sequences (IES), involving a domesticated piggyBac transposase, PiggyMac. To study IES origin and evolution, we sequenced germline DNA and identified 45,000 IESs. We found that at least some of these unique-copy elements are decayed Tc1/mariner transposons and that IES insertion is likely an ongoing process. After insertion, elements decay rapidly by accumulation of deletions and substitutions. The 93% of IESs shorter than 150 bp display a remarkable size distribution with a periodicity of 10 bp, the helical repeat of double-stranded DNA, consistent with the idea that evolution has only retained IESs that can form a double-stranded DNA loop during assembly of an excision complex. We propose that the ancient domestication of a piggyBac transposase, which provided a precise excision mechanism, enabled transposons to subsequently invade Paramecium coding sequences, a fraction of the genome that does not usually tolerate parasitic DNA.
Highly hemolytic strain Bacillus cereus F837/76 was isolated in 1976 from a contaminated prostate wound. The complete nucleotide sequence of this strain reported here counts nearly 36,500 single-nucleotide differences from the closest sequenced strain, Bacillus thuringiensis Al Hakam. F827/76 also contains a 10-kb plasmid that was not detected in the Al Hakam strain.
Bathycoccus prasinos is an extremely small cosmopolitan marine green alga whose cells are covered with intricate spider's web patterned scales that develop within the Golgi cisternae before their transport to the cell surface. The objective of this work is to sequence and analyze its genome, and to present a comparative analysis with other known genomes of the green lineage.
Its small genome of 15 Mb consists of 19 chromosomes and lacks transposons. Although 70% of all B. prasinos genes share similarities with other Viridiplantae genes, up to 428 genes were probably acquired by horizontal gene transfer, mainly from other eukaryotes. Two chromosomes, one big and one small, are atypical, an unusual synapomorphic feature within the Mamiellales. Genes on these atypical outlier chromosomes show lower GC content and a significant fraction of putative horizontal gene transfer genes. Whereas the small outlier chromosome lacks colinearity with other Mamiellales and contains many unknown genes without homologs in other species, the big outlier shows a higher intron content, increased expression levels and a unique clustering pattern of housekeeping functionalities. Four gene families are highly expanded in B. prasinos, including sialyltransferases, sialidases, ankyrin repeats and zinc ion-binding genes, and we hypothesize that these genes are associated with the process of scale biogenesis.
The minimal genomes of the Mamiellophyceae provide a baseline for evolutionary and functional analyses of metabolic processes in green plants.
We have sequenced the genome of the emerging human pathogen Babesia microti and compared it with that of other protozoa. B. microti has the smallest nuclear genome among all Apicomplexan parasites sequenced to date with three chromosomes encoding ∼3500 polypeptides, several of which are species specific. Genome-wide phylogenetic analyses indicate that B. microti is significantly distant from all species of Babesidae and Theileridae and defines a new clade in the phylum Apicomplexa. Furthermore, unlike all other Apicomplexa, its mitochondrial genome is circular. Genome-scale reconstruction of functional networks revealed that B. microti has the minimal metabolic requirement for intraerythrocytic protozoan parasitism. B. microti multigene families differ from those of other protozoa in both the copy number and organization. Two lateral transfer events with significant metabolic implications occurred during the evolution of this parasite. The genomic sequencing of B. microti identified several targets suitable for the development of diagnostic assays and novel therapies for human babesiosis.
Legumes (Fabaceae or Leguminosae) are unique among cultivated plants for their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes belong to one of the two main groups of eurosids, the Fabidae, which includes most species capable of endosymbiotic nitrogen fixation 1. Legumes comprise several evolutionary lineages derived from a common ancestor 60 million years ago (Mya). Papilionoids are the largest clade, dating nearly to the origin of legumes and containing most cultivated species 2. Medicago truncatula (Mt) is a long-established model for the study of legume biology. Here we describe the draft sequence of the Mt euchromatin based on a recently completed BAC-assembly supplemented with Illumina-shotgun sequence, together capturing ~94% of all Mt genes. A whole-genome duplication (WGD) approximately 58 Mya played a major role in shaping the Mt genome and thereby contributed to the evolution of endosymbiotic nitrogen fixation. Subsequent to the WGD, the Mt genome experienced higher levels of rearrangement than two other sequenced legumes, Glycine max (Gm) and Lotus japonicus (Lj). Mt is a close relative of alfalfa (M. sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics. As such, the Mt genome sequence provides significant opportunities to expand alfalfa’s genomic toolbox.
African Green Monkeys (AGM) are amongst the most frequently used nonhuman primate models in clinical and biomedical research, nevertheless only few genomic resources exist for this species. Such information would be essential for the development of dedicated new generation technologies in fundamental and pre-clinical research using this model, and would deliver new insights into primate evolution.
We have exhaustively sequenced an Expression Sequence Tag (EST) library made from a pool of Peripheral Blood Mononuclear Cells from sixteen Chlorocebus sabaeus monkeys. Twelve of them were infected with the Simian Immunodeficiency Virus. The mononuclear cells were or not stimulated in vitro with Concanavalin A, with lipopolysacharrides, or through mixed lymphocyte reaction in order to generate a representative and broad library of expressed sequences in immune cells. We report here 37,787 sequences, which were assembled into 14,410 contigs representing an estimated 12% of the C. sabaeus transcriptome. Using data from primate genome databases, 9,029 assembled sequences from C. sabaeus could be annotated. Sequences have been systematically aligned with ten cDNA references of primate species including Homo sapiens, Pan troglodytes, and Macaca mulatta to identify ortholog transcripts. For 506 transcripts, sequences were quasi-complete. In addition, 6,576 transcript fragments are potentially specific to the C. sabaeus or corresponding to not yet described primate genes.
The EST library we provide here will prove useful in gene annotation efforts for future sequencing of the African Green Monkey genomes. Furthermore, this library, which particularly well represents immunological and hematological gene expression, will be an important resource for the comparative analysis of gene expression in clinically relevant nonhuman primate and human research.
Streptococcus salivarius is a commensal species commonly found in the human oral cavity and digestive tract, although it is also associated with human infections such as meningitis, endocarditis, and bacteremia. Here, we report the complete sequence of S. salivarius strain CCHSS3, isolated from human blood.
The commensal bacterium Streptococcus salivarius is a prevalent species of the human oropharyngeal tract with an important role in oral ecology. Here, we report the complete 2.2-Mb genome sequence and annotation of strain JIM8777, which was recently isolated from the oral cavity of a healthy, dentate infant.
Polyploidization is an important process in the evolution of eukaryotic genomes, but ensuing molecular mechanisms remain to be clarified. Autopolyploidization or whole-genome duplication events frequently are resolved in resulting lineages by the loss of single genes from most duplicated pairs, causing transient gene dosage imbalance and accelerating speciation through meiotic infertility. Allopolyploidization or formation of interspecies hybrids raises the problem of genetic incompatibility (Bateson-Dobzhansky-Muller effect) and may be resolved by the accumulation of mutational changes in resulting lineages. In this article, we show that an osmotolerant yeast species, Pichia sorbitophila, recently isolated in a concentrated sorbitol solution in industry, illustrates this last situation. Its genome is a mosaic of homologous and homeologous chromosomes, or parts thereof, that corresponds to a recently formed hybrid in the process of evolution. The respective parental contributions to this genome were characterized using existing variations in GC content. The genomic changes that occurred during the short period since hybrid formation were identified (e.g., loss of heterozygosity, unilateral loss of rDNA, reciprocal exchange) and distinguished from those undergone by the two parental genomes after separation from their common ancestor (i.e., NUMT (NUclear sequences of MiTochondrial origin) insertions, gene acquisitions, gene location movements, reciprocal translocation). We found that the physiological characteristics of this new yeast species are determined by specific but unequal contributions of its two parents, one of which could be identified as very closely related to an extant Pichia farinosa strain.
osmotolerant yeast P. sorbitophila; allopolyploidy; hybridization; genome evolution; loss of heterozygosity
Sequencing projects using a clone-by-clone approach require the availability of a robust physical map. The SNaPshot technology, based on pair-wise comparisons of restriction fragments sizes, has been used recently to build the first physical map of a wheat chromosome and to complete the maize physical map. However, restriction fragments sizes shared randomly between two non-overlapping BACs often lead to chimerical contigs and mis-assembled BACs in such large and repetitive genomes. Whole Genome Profiling (WGP™) was developed recently as a new sequence-based physical mapping technology and has the potential to limit this problem.
A subset of the wheat 3B chromosome BAC library covering 230 Mb was used to establish a WGP physical map and to compare it to a map obtained with the SNaPshot technology. We first adapted the WGP-based assembly methodology to cope with the complexity of the wheat genome. Then, the results showed that the WGP map covers the same length than the SNaPshot map but with 30% less contigs and, more importantly with 3.5 times less mis-assembled BACs. Finally, we evaluated the benefit of integrating WGP tags in different sequence assemblies obtained after Roche/454 sequencing of BAC pools. We showed that while WGP tag integration improves assemblies performed with unpaired reads and with paired-end reads at low coverage, it does not significantly improve sequence assemblies performed at high coverage (25x) with paired-end reads.
Our results demonstrate that, with a suitable assembly methodology, WGP builds more robust physical maps than the SNaPshot technology in wheat and that WGP can be adapted to any genome. Moreover, WGP tag integration in sequence assemblies improves low quality assembly. However, to achieve a high quality draft sequence assembly, a sequencing depth of 25x paired-end reads is required, at which point WGP tag integration does not provide additional scaffolding value. Finally, we suggest that WGP tags can support the efficient sequencing of BAC pools by enabling reliable assignment of sequence scaffolds to their BAC of origin, a feature that is of great interest when using BAC pooling strategies to reduce the cost of sequencing large genomes.
Wolbachia are vertically transmitted bacteria known to be the most widespread endosymbiont in arthropods. They induce various alterations of the reproduction of their host, including feminization of genetic males in isopod crustaceans. In the pill bug Armadillidium vulgare, the presence of Wolbachia is also associated with detrimental effects on host fertility and lifespan. Deleterious effects have been demonstrated on hemocyte density, phenoloxidase activity, and natural hemolymph septicemia, suggesting that infected individuals could have defective immune capacities. Since nothing is known about the molecular mechanisms involved in Wolbachia-A. vulgare interactions and its secondary immunocompetence modulation, we developed a transcriptomics strategy and compared A. vulgare gene expression between Wolbachia-infected animals (i.e., “symbiotic” animals) and uninfected ones (i.e., “asymbiotic” animals) as well as between animals challenged or not challenged by a pathogenic bacteria.
Since very little genetic data is available on A. vulgare, we produced several EST libraries and generated a total of 28 606 ESTs. Analyses of these ESTs revealed that immune processes were over-represented in most experimental conditions (responses to a symbiont and to a pathogen). Considering canonical crustacean immune pathways, these genes encode antimicrobial peptides or are involved in pathogen recognition, detoxification, and autophagy. By RT-qPCR, we demonstrated a general trend towards gene under-expression in symbiotic whole animals and ovaries whereas the same gene set tends to be over-expressed in symbiotic immune tissues.
This study allowed us to generate the first reference transcriptome ever obtained in the Isopoda group and to identify genes involved in the major known crustacean immune pathways encompassing cellular and humoral responses. Expression of immune-related genes revealed a modulation of host immunity when females are infected by Wolbachia, including in ovaries, the crucial tissue for the Wolbachia route of transmission.