Haloferax mediterranei, an extremely halophilic archaeon, has shown promise for production of poly(3-hydroxybutyrate-co-3-hydroxyvalerate) (PHBV) from unrelated cheap carbon sources. Here we report the complete genome (3,904,707 bp) of H. mediterranei CGMCC 1.2087, consisting of one chromosome and three megaplasmids.
The accessory sex gland (ASG) is an important component of the male reproductive system, which functions to enhance the fertility of spermatozoa during male reproduction. Certain proteins secreted by the ASG are known to bind to the spermatozoa membrane and affect its function. The ASG gene expression profile in Chinese mitten crab (Eriocheir sinensis) has not been extensively studied, and limited genetic research has been conducted on this species. The advent of high-throughput sequencing technologies enables the generation of genomic resources within a short period of time and at minimal cost. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive transcript dataset for the ASG of E. sinensis using Illumina sequencing technology. This analysis yielded a total of 33,221,284 sequencing reads, including 2.6 Gb of total nucleotides. Reads were assembled into 85,913 contigs (average 218 bp), or 58,567 scaffold sequences (average 292 bp), that identified 37,955 unigenes (average 385 bp). We assembled all unigenes and compared them with the published testis transcriptome from E. sinensis. In order to identify which genes may be involved in ASG function, as it pertains to modification of spermatozoa, we compared the ASG and testis transcriptome of E. sinensis. Our analysis identified specific genes with both higher and lower tissue expression levels in the two tissues, and the functions of these genes were analyzed to elucidate their potential roles during maturation of spermatozoa. Availability of detailed transcriptome data from ASG and testis in E. sinensis can assist our understanding of the molecular mechanisms involved with spermatozoa conservation, transport, maturation and capacitation and potentially acrosome activation.
The mammalian circadian system controls various physiological processes and behavioral responses by regulating thousands of circadian genes with rhythmic expression. In this study, we redefined circadian-regulated genes based on published results in the mouse liver and compared them with other gene groups defined relative to circadian regulation, especially the non-circadian-regulated genes expressed in liver, at multiple molecular levels from gene position to protein expression, based on integrative analyses of different datasets from the literature. Based on the intra-tissue analysis, the liver circadian genes (LCGs) show unique features when compared to other gene groups. First, LCGs in general have fewer neighboring genes and are larger in both genomic and 3′-UTR lengths but shorter in CDS (coding sequence) length. Second, LCGs have higher mRNA and protein abundance, higher temporal expression variations, and shorter mRNA half-life. Third, more than 60% of LCGs form major co-expression clusters centered in four temporal windows: dawn, day, dusk, and night. In addition, larger and smaller LCGs are found mainly expressed in the day and night temporal windows, respectively, and we believe that LCGs are well-partitioned into the gene expression regulatory network that takes advantage of gene size, expression constraint, and chromosomal architecture. Based on inter-tissue analysis, more than half of LCGs are ubiquitously expressed in multiple tissues but show rhythmic expression in only one or a limited number of tissues. LCGs show at least three-fold lower expression variation across the temporal windows than among different tissues, and this observation suggests that temporal expression variation regulated by the circadian system is relatively subtle compared with the tissue expression variation formed during development.
Taken together, we suggest that the circadian system selects gene parameters in a cost-effective way to improve tissue-specific functions by adapting temporal variations from the environment over evolutionary time scales.
The nucleotide composition of the light (L-) and heavy (H-) strands of animal mitochondrial genomes is known to exhibit strand-biased compositional asymmetry (SCA). One possible explanation is the existence of a replication-associated mutational pressure (RMP) that may introduce characteristic nucleotide changes among mitochondrial genomes of different animal lineages. Here, we discuss the influence of RMP on nucleotide and amino acid compositions as well as gene organization. Among animal mitochondrial genomes, RMP may represent the major force that compels the evolution of mitochondrial protein-coding genes, coupled with other process-based selective pressures, such as those on components of the translation machinery, namely tRNAs and their anticodons. Through comparative analyses of sequenced mitochondrial genomes among diverse animal lineages and literature reviews, we suggest a strong RMP effect, observed among invertebrate mitochondrial genes as compared to those of vertebrates, that is either a result of positive selection on the invertebrate or a relaxed selective pressure on the vertebrate mitochondrial genes.
Function-based selection; mitochondrion genome; replication-associated mutational pressure; strand-biased compositional asymmetry.
Rice blast caused by Magnaporthe oryzae is one of the most destructive diseases of rice worldwide. The fungal pathogen is notorious for its ability to overcome host resistance. To better understand its genetic variation in nature, we sequenced the genomes of two field isolates, Y34 and P131. In comparison with the previously sequenced laboratory strain 70-15, both field isolates had a similar genome size but slightly more genes. Sequences from the field isolates were used to improve genome assembly and gene prediction of 70-15. Although the overall genome structure is similar, a number of gene families that are likely involved in plant-fungal interactions are expanded in the field isolates. Genome-wide analysis of nonsynonymous to synonymous nucleotide substitution rates revealed that many infection-related genes underwent diversifying selection. The field isolates also have hundreds of isolate-specific genes and a number of isolate-specific gene duplication events. Functional characterization of randomly selected isolate-specific genes revealed that they play diverse roles, some of which affect virulence. Furthermore, each genome contains thousands of loci of transposon-like elements, but fewer than 30% of them are conserved among different isolates, suggesting active transposition events in M. oryzae. A total of approximately 200 genes were disrupted in these three strains by transposable elements. Interestingly, transposon-like elements tend to be associated with isolate-specific or duplicated sequences. Overall, our results indicate that gain or loss of unique genes, DNA duplication, gene family expansion, and frequent translocation of transposon-like elements are important factors in genome variation of the rice blast fungus.
Magnaporthe oryzae is the causal agent of rice blast, which is mainly controlled with resistant cultivars. However, genetic variations in the pathogen often allow it to overcome R gene-mediated resistance in rice cultivars. In this study we sequenced two field isolates from China and Japan. In comparison with the laboratory strain that was previously sequenced, the field isolates have a similar genome size and overall genome structure. However, they have slightly more genes and contain a number of expanded gene families that are likely involved in plant-fungal interactions. Each of the isolates has specific genes, some of which affect virulence and others of which are important for asexual development. The three strains differ noticeably in the distribution of transposon-like elements. Many of the transposable elements tend to be associated with isolate-specific or duplicated sequences. This study revealed genetic factors involved in genome variation of the rice blast fungus.
Hassawi rice (Oryza sativa L.) is a landrace adapted to the climate of Saudi Arabia, characterized by its strong resistance to soil salinity and drought. Using high-quality sequencing reads extracted from raw data of a whole genome sequencing project, we assembled both chloroplast (cp) and mitochondrial (mt) genomes of the wild-type Hassawi rice (Hassawi-1) and its dwarf hybrid (Hassawi-2). We discovered 16 InDels (insertions and deletions) but no SNPs (single nucleotide polymorphisms) between the two Hassawi cp genomes. We identified 48 InDels and 26 SNPs in the two Hassawi mt genomes and a new type of sequence variation, termed reverse complementary variation (RCV), in the rice cp genomes. There are two and four RCVs identified in Hassawi-1 when compared to 93–11 (indica) and Nipponbare (japonica), respectively. Microsatellite sequence analysis showed there are more SSRs in the genic regions of both cp and mt genomes in the Hassawi rice than in the other rice varieties. There are also large repeats in the Hassawi mt genomes, with the longest being 96,168 bp and 96,165 bp in Hassawi-1 and Hassawi-2, respectively. We believe that frequent DNA rearrangement in the Hassawi mt and cp genomes indicates ongoing dynamic processes to reach genetic stability under strong environmental pressures. Based on sequence variation analysis and the breeding history, we suggest that both Hassawi-1 and Hassawi-2 originated from the Indonesian variety Peta, since genetic diversity between the two Hassawi cultivars is very low, although the historic origin of the wild-type Hassawi rice is unknown.
The rumen hosts one of the most efficient microbial systems for degrading plant cell walls, yet the predominant cellulolytic proteins and fibrolytic mechanism(s) remain elusive. Here we investigated the cellulolytic microbiome of the yak rumen by using a combination of metagenome-based and bacterial artificial chromosome (BAC)-based functional screening approaches. In total, 223 fibrolytic BAC clones were pyrosequenced and 10,070 ORFs were identified. Among them, 150 were annotated as glycoside hydrolase (GH) genes for fibrolytic proteins, and the majority (69%) of them were clustered or linked with genes encoding related functions. Among the 35 fibrolytic contigs of >10 kb in length, 25 were derived from Bacteroidetes and four from Firmicutes. Coverage analysis indicated that the fibrolytic genes on most Bacteroidetes contigs were abundantly represented in the metagenomic sequences, and they were frequently linked with genes encoding SusC/SusD-type outer-membrane proteins. GH5, GH9, and GH10 cellulase/hemicellulase genes were predominant, but no GH48 exocellulase gene was found. Most (85%) of the cellulase and hemicellulase proteins possessed a signal peptide; only a few carried carbohydrate-binding modules, and no cellulosomal domains were detected. These findings suggest that the SusC/SusD-involving mechanism, instead of one based on cellulosomes or the free-enzyme system, plays a major role in lignocellulose degradation in the yak rumen. Genes encoding an endoglucanase of a novel GH5 subfamily occurred frequently in the metagenome, and the recombinant proteins encoded by these genes displayed moderate Avicelase activity in addition to endoglucanase activity, suggesting their important contribution to lignocellulose degradation in the exocellulase-scarce rumen.
EvolView is a web application for visualizing, annotating and managing phylogenetic trees. First, EvolView is a phylogenetic tree viewer and customization tool; it visualizes trees in various formats, customizes them through built-in functions that can link information from external datasets, and exports the customized results to publication-ready figures. Second, EvolView is a tree and dataset management tool: users can easily organize related trees into distinct projects, add new datasets to trees and edit and manage existing trees and datasets. To make EvolView easy to use, it is equipped with an intuitive user interface. With a free account, users can save data and manipulations on the EvolView server. EvolView is freely available at: http://www.evolgenius.info/evolview.html.
Based on next-generation sequencing data, we assembled the mitochondrial (mt) genome of date palm (Phoenix dactylifera L.) into a circular molecule of 715,001 bp in length. The mt genome of P. dactylifera encodes 38 proteins, 30 tRNAs, and 3 ribosomal RNAs, which constitute a gene content of 6.5% (46,770 bp) over the full length. The rest, 93.5% of the genome sequence, is comprised of cp (chloroplast)-derived (10.3% with respect to the whole genome length) and non-coding sequences. In the non-coding regions, there are 0.33% tandem and 2.3% long repeats. Our transcriptomic data from eight tissues (root, seed, bud, fruit, green leaf, yellow leaf, female flower, and male flower) showed higher gene expression levels in male flower, root, bud, and female flower, as compared to the four other tissues. We identified 120 potential SNPs among three date palm cultivars (Khalas, Fahal, and Sukry), seven of which lie in the coding sequences. A phylogenetic analysis, based on 22 conserved genes of 15 representative plant mitochondria, showed that P. dactylifera is positioned at the root of all sequenced monocot mt genomes. In addition, consistent with previous discoveries, there are three co-transcribed gene clusters (18S-5S rRNA, rps3-rpl16 and nad3-rps12) in P. dactylifera, which are highly conserved among all known mitochondrial genomes of angiosperms.
Small RNAs are a group of regulatory RNA molecules that control gene expression at transcriptional or post-transcriptional levels among eukaryotes. The genome of the silkworm, Bombyx mori L., harbors abundant repetitive sequences derived from families of retrotransposons and transposons, which together constitute almost half of the genome space and provide ample resource for biogenesis of the three major small RNA families. We systematically discovered transposable-element (TE)-associated small RNAs in the B. mori genome based on a deep RNA-sequencing strategy, and the effort yielded 182, 788, and 4,990 TE-associated small RNAs in the miRNA, siRNA, and piRNA species, respectively. Our analysis suggested that the three small RNA species preferentially associate with different TEs to create sequence and functional diversity, and we also show evidence that a single Bombyx non-LTR retrotransposon, bm1645, contributes very significantly to the generation of TE-associated small RNAs. The fact that bm1645-associated small RNAs partially overlap with each other implies that this element may be modulated by different mechanisms to generate different products with diverse functions. Taken together, these discoveries expand the small RNA pool in the B. mori genome and lead to new knowledge on the diversity and functional significance of TE-associated small RNAs.
Streptococcus parasanguinis, a primary colonizer of the tooth surface, is also an opportunistic pathogen for subacute endocarditis. In this study, the complete genome of strain FW213 was determined using the traditional shotgun sequencing approach and further refined using the transcriptomes of cells in early exponential and early stationary growth phases. The transcriptome analysis also revealed 10 transcripts encoding known hypothetical proteins, one pseudogene, five transcripts matching Rfam entries, and an additional 87 putative small RNAs within the intergenic regions defined by the GLIMMER analysis. The genome contains five acquired genomic islands (GIs) encoding proteins which potentially contribute to the overall pathogenic capacity and fitness of this microbe. The differential expression of the GIs and various open reading frames outside the GIs at the two growth phases suggested that FW213 possesses a range of mechanisms to avoid host immune clearance, to colonize host tissues, to survive within oral biofilms and to overcome various environmental insults. Furthermore, the comparative genome analysis of five S. parasanguinis strains indicates that although S. parasanguinis strains are highly conserved, variations in the genome content exist. These variations may reflect differences in pathogenic potential between the strains.
We previously reported that the multidrug-resistant (MDR) Acinetobacter baumannii strain MDR-ZJ06, belonging to European clone II, was widely spread in China. In this study, we report the whole-genome sequence of this clinically important strain. A 38.6-kb AbaR-type genomic resistance island (AbaR22) was identified in MDR-ZJ06. AbaR22 has a structure similar to those of the resistance islands found in A. baumannii strains AYE and AB0057, but it contained only a few antibiotic resistance genes. The previously described region of resistance gene accumulation was not found in AbaR22. In the chromosome of strain MDR-ZJ06, we identified the gene blaOXA-23 in a composite transposon (Tn2009). Tn2009 shared its backbone with other A. baumannii transposons that harbor blaOXA-23, but it was bracketed by two ISAba1 elements which were transcribed in the same orientation. MDR-ZJ06 also expressed the armA gene on its plasmid pZJ06, and this gene has the same genetic environment as the armA gene of the Enterobacteriaceae. These results suggest variability of resistance acquisition even in closely related A. baumannii strains.
Sulfobacillus acidophilus strain TPY is a moderately thermoacidophilic bacterium originally isolated from a hydrothermal vent in the Pacific Ocean. Ferrous iron and sulfur oxidation in acidic environments in strain TPY have been confirmed. Here we report the genome sequence and annotation of the strain TPY, which is the first complete genome of Sulfobacillus acidophilus.
Agrobacterium tumefaciens F2 is an efficient bioflocculant-producing bacterium, but the genes related to the metabolic pathway of bioflocculant biosynthesis in strain F2 are unknown. We present the draft genome of A. tumefaciens F2, which could provide further insight into the biosynthetic mechanism of polysaccharide-like bioflocculant in strain F2.
Alicyclobacillus acidocaldarius strain Tc-4-1 was initially isolated from a hot spring in Tengchong, China. This organism is both thermophilic and acidophilic. It can produce heat- and acid-stable enzymes, such as amylase and esterase, which may be important in industry. Here we report the whole genome sequence of the strain.
Streptococcus salivarius 57.I is one of the most abundant and highly ureolytic bacteria in the human mouth. It can utilize urea as the sole nitrogen source via the activity of urease. Complete genome sequencing of S. salivarius 57.I revealed a chromosome and a phage which are absent in strain SK126.
Streptococcus equi subsp. zooepidemicus is an opportunistic pathogen. It has caused a very large economic loss in the swine industry of China and has become a threat to human health. We announce the complete genome sequence of S. equi subsp. zooepidemicus strain ATCC 35246, which provides opportunities to understand its pathogenesis mechanism and genetic basis.
Date palm has provided both staple food and gardening for the Middle Eastern and North African countries for thousands of years. Its fruits have diversified significantly in traits such as nutritional content, size, length, weight, color, and ripening process. Date palm represents an excellent model system for the study of fruit development and diversity of fruit-bearing palm species, which produce the most versatile fruit types as compared to other plant families. Using a Roche/454 GS FLX instrument, we acquired 7.6 million sequence tags from seven fruiting stages (F1–F7). Over 99% of the raw reads were assembled, and the numbers of isotigs (equivalent to transcription units or unigenes) range from 30,684 to 40,378 across the fruiting stages. We annotated isotigs using BLASTX and BLASTN, and mapped 74% of the isotigs to known functional sequences or genes. Based on gene ontology categorization and pathway analysis, we identified 10 core cell division genes, 18 ripening-related genes, and 7 starch metabolic enzymes, which are involved in nutrient storage and sugar/starch metabolism. We noticed that many metabolic pathways vary significantly during fruit development, and carbohydrate metabolism (especially sugar synthesis) is particularly prominent during fruit ripening. Our transcriptomic study of the fruiting stages of date palm shows complicated metabolic activities during fruit development, ripening, and the synthesis and accumulation of starch enzymes and other related sugars. Most genes are highly expressed in early stages of development, while late developmental stages are critical for fruit ripening, including most of the metabolism-associated genes.
Electronic supplementary material
The online version of this article (doi:10.1007/s11103-012-9890-5) contains supplementary material, which is available to authorized users.
Date palm; Transcriptome; Fruit; Development stage
The complete nucleotide sequences of the chloroplast (cp) and mitochondrial (mt) genomes of the resurrection plant Boea hygrometrica (Bh, Gesneriaceae) have been determined, with lengths of 153,493 bp and 510,519 bp, respectively. The smaller chloroplast genome contains more genes (147) with a coding fraction of 72%, and the larger mitochondrial genome has fewer genes (65) with a coding fraction of 12%. Similar to other seed plants, the Bh cp genome has a typical quadripartite organization with a conserved gene in each region. The Bh mt genome has three recombinant sequence repeats of 222 bp, 843 bp, and 1474 bp in length, which divide the genome into a single master circle (MC) and four isomeric molecules. Compared to other angiosperms, one remarkable feature of the Bh mt genome is the frequent transfer of genetic material from the cp genome during recent Bh evolution. We also analyzed organellar genome evolution in general regarding genome features as well as compositional dynamics of sequence and gene structure/organization, providing clues for the understanding of the evolution of organellar genomes in plants. The cp-derived sequences including tRNAs found in angiosperm mt genomes support the conclusion that frequent gene transfer events may have begun early in the land plant lineage.
As a key parameter of genome sequence variation, the GC content of bacterial genomes has been investigated for over half a century, and many hypotheses have been put forward to explain this GC content variation and its relationship to other fundamental processes. Previously, we classified eubacteria into dnaE-based groups (the dimeric combination of DNA polymerase III alpha subunits), according to a hypothesis where GC content variation is essentially governed by genome replication and DNA repair mechanisms. Further investigation led to the discovery that two major mutator genes, polC and dnaE2, may be responsible for genomic GC content variation. Consequently, an in-depth analysis was conducted to evaluate various potential intrinsic and extrinsic factors in association with GC content variation among eubacterial genomes.
Mutator genes, especially those with dominant effects on the mutation spectra, are biased towards either GC or AT richness, and they alter genomic GC content in the two opposite directions. Increased bacterial genome size (or gene number) appears to rely on increased genomic GC content; however, it is unclear whether the changes are directly related to certain environmental pressures. Certain environmental and bacteriological features are related to GC content variation, but their trends are more obvious when analyzed under the dnaE-based grouping scheme. Most terrestrial, plant-associated, and nitrogen-fixing bacteria are members of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those dwelling in aquatic environments, are largely members of the dnaE1|polV group.
Our studies provide several lines of evidence indicating that the DNA polymerase III α subunit and its isoforms participating in either replication (such as polC) or SOS mutagenesis/translesion synthesis (such as dnaE2) play dominant roles in determining GC variability. Other environmental or bacteriological factors, such as genome size, temperature, oxygen requirement, and habitat, either play subsidiary roles or rely indirectly on different mutator genes to fine-tune the GC content. These results provide a comprehensive insight into mechanisms of GC content variation and the robustness of eubacterial genomes in adapting to their ever-changing environments over billions of years.
This paper was reviewed by Nicolas Galtier, Adam Eyre-Walker, and Eugene Koonin.
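The genomic GC content analyzed above is simply the fraction of G and C bases in a sequence. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the GC fraction of a DNA sequence.

    Assumption (not from the source): only unambiguous bases
    (A, C, G, T) are counted; N and other IUPAC codes are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


if __name__ == "__main__":
    # Toy sequence for illustration only, not real genome data.
    print(f"GC content: {gc_content('ATGCGCNNAT'):.2%}")
```

Applied to a whole genome, the same function would be called on the concatenated replicon sequences; grouping genomes by their dnaE-based class before comparing GC values is a separate step not shown here.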
High-throughput next-generation sequencing technologies provide an excellent opportunity for the detection of low-abundance transcripts that may not be identifiable by previously available techniques. Here, we report the discovery of thousands of novel transcripts (mostly non-coding RNAs) that are expressed in mouse cerebrum, testis, and embryonic stem (ES) cells, through an in-depth analysis of ribo-minus RNA-seq (rmRNA-seq) data. These transcripts show significant associations with transcriptional start and elongation signals. Upstream of these transcripts we observed significant enrichment of histone marks (histone H3 lysine 4 trimethylation, H3K4me3), RNAPII binding sites, and cap analysis of gene expression tags that mark transcriptional start sites. Along the length of these transcripts, we also observed enrichment of histone H3 lysine 36 trimethylation (H3K36me3). Moreover, these transcripts show strong purifying selection in their genomic loci, exonic sequences, and promoter regions, implying functional constraints on the evolution of these transcripts. These results define a collection of novel transcripts in the mouse genome and indicate their potential functions in the mouse tissues and cells.
novel transcripts; non-coding RNA; ribo-minus RNA-seq; next-generation sequencing
Riemerella anatipestifer is a well-described pathogen of waterfowl and other avian species which can cause a great loss to the poultry industry. Here we obtained the complete genome sequence of R. anatipestifer strain RA-GD, which was isolated from an infected duck in Guangzhou, China, and was cultivated in our laboratory.
Complete organellar genome sequences (chloroplasts and mitochondria) provide valuable resources and information for studying plant molecular ecology and evolution. As high-throughput sequencing technology advances, it has become the norm to use a shotgun approach to obtain complete genome sequences. Assembling organellar sequences from whole-genome shotgun reads is therefore inevitable. However, the associated techniques are often cumbersome, time-consuming, and difficult, because true organellar DNA is difficult to separate efficiently from the copies that have been transferred to the nucleus through the course of evolution.
We report a new, rapid procedure for plant chloroplast and mitochondrial genome sequencing and assembly using the Roche/454 GS FLX platform. Plant cells can contain multiple copies of the organellar genomes, and there is a significant correlation between the depth of sequence reads in contigs and the number of copies of the genome. Without isolating organellar DNA from the mixture of nuclear and organellar DNA for sequencing, we retrospectively extracted assembled contigs of either chloroplast or mitochondrial sequences from the whole genome shotgun data. Moreover, the contig connection graph property of Newbler (a platform-specific sequence assembler) ensures an efficient final assembly. Using this procedure, we assembled both chloroplast and mitochondrial genomes of a resurrection plant, Boea hygrometrica, with high fidelity. We also present information and a minimal sequence dataset as a reference for the assembly of other plant organellar genomes.
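The depth-based extraction described above can be illustrated with a small sketch. It is not the authors' procedure: the function name, the use of the median contig depth as a nuclear baseline, and the fold-change thresholds `mt_fold` and `cp_fold` are all hypothetical choices, as is the assumption that chloroplast contigs are deeper than mitochondrial ones. In practice, candidates would still need to be confirmed by sequence similarity and by the assembler's contig connection graph.

```python
from statistics import median


def partition_contigs_by_depth(depths, mt_fold=5.0, cp_fold=50.0):
    """Label contigs by read depth relative to the median contig depth.

    depths: mapping of contig name -> mean read depth.
    mt_fold, cp_fold: hypothetical fold-change cutoffs (assumptions,
    not values from the source) above the nuclear baseline.
    """
    if not depths:
        return {}
    # Assumption: most contigs are nuclear, so the median depth
    # approximates single-copy nuclear coverage.
    baseline = median(depths.values())
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    labels = {}
    for name, depth in depths.items():
        ratio = depth / baseline
        if ratio >= cp_fold:
            labels[name] = "chloroplast_candidate"
        elif ratio >= mt_fold:
            labels[name] = "mitochondrial_candidate"
        else:
            labels[name] = "nuclear"
    return labels


if __name__ == "__main__":
    # Toy depths for illustration only, not measured data.
    example = {"c1": 10, "c2": 12, "c3": 11, "c4": 100, "c5": 900}
    for contig, label in partition_contigs_by_depth(example).items():
        print(contig, label)
```

Using a ratio to the median rather than absolute depth keeps the cutoffs independent of total sequencing yield, although suitable fold values would have to be chosen per dataset.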
To further understand the relationship between nucleosome-space occupancy (NO) and global transcriptional activity in mammals, we acquired a set of genome-wide nucleosome distribution and transcriptome data from the mouse cerebrum and testis based on ChIP (H3)-seq and RNA-seq, respectively. We identified nearly consistent NO patterns among three mouse tissues (cerebrum, testis, and ESCs) and found, through clustering analysis for transcriptional activation, that the NO variations among chromosomes are closely associated with distinct expression levels between house-keeping (HK) genes and tissue-specific (TS) genes. Both TS and HK genes form clusters albeit the obvious majority. This feature implies that NO patterns, i.e. nucleosome binding and clustering, are coupled with gene clustering that may be functionally and evolutionarily conserved in regulating gene expression among different cell types.