Aging-related kidney diseases are a major health concern. Currently, models to study renal aging are lacking. Owing to their reduced life span, progeroid models promise to facilitate aging studies and allow examination of tissue-specific changes. Defects in genome maintenance in the Ercc1-/Δ progeroid mouse model result in premature aging and typical age-related pathologies. Here, we compared the glomerular transcriptome of young and aged Ercc1-deficient mice with that of young and aged wild-type (WT) mice in order to establish a novel model for research on aging-related kidney disease.
In a principal component analysis, age and genotype emerged as first and second principal components. Hierarchical clustering of all 521 genes differentially regulated between young and old WT and young and old Ercc1-/Δ mice showed cluster formation between young WT and Ercc1-/Δ as well as old WT and Ercc1-/Δ samples. An unexpectedly high number of 77 genes were differentially regulated in both WT and Ercc1-/Δ mice (p < 0.0001). GO term enrichment analysis revealed these genes to be involved in immune and inflammatory response, cell death, and chemotaxis. In a network analysis, these genes were part of insulin signaling, chemokine and cytokine signaling and extracellular matrix pathways.
Beyond insulin signaling, we find chemokine and cytokine signaling as well as modifiers of extracellular matrix composition to be subject to major changes in the aging glomerulus. At the level of the transcriptome, the pattern of gene activities is similar in the progeroid Ercc1-/Δ mouse model, which therefore constitutes a valuable tool for future studies of aging-associated glomerular pathologies.
Renal aging; Glomerular aging; Gene expression profiling; Microarray analysis; DNA damage; Nucleotide excision repair
The grain aphid (Sitobion avenae F.) is a major agricultural pest that causes significant annual yield losses of wheat in China, Europe and North America. Transcriptome profiling of the grain aphid alimentary canal after feeding on wheat plants could provide comprehensive information on the expression of genes involved in feeding, ingestion and digestion. Furthermore, selection of aphid-specific RNAi target genes is essential for using a plant-mediated RNAi strategy to control aphids via a non-toxic mode of action. However, owing to the tiny size of the alimentary canal and the lack of genomic information on the grain aphid as a whole, selection of RNAi targets is a challenging task that, as far as we are aware, has not been documented previously.
In this study, we performed de novo transcriptome assembly and gene expression analyses of the alimentary canals of grain aphids before and after feeding on wheat plants using Illumina RNA sequencing. The transcriptome profiling generated 30,427 unigenes with an average length of 664 bp. Furthermore, comparison of the transcriptomes of alimentary canals of pre- and post-feeding grain aphids indicated that 5,490 unigenes were differentially expressed, among which diverse genes and/or pathways were identified and annotated. Based on the RPKM values of these unigenes, 16 that were significantly up- or down-regulated upon feeding were selected for a dsRNA artificial feeding assay. Of these, 5 unigenes led to higher mortality and developmental stunting in the artificial feeding assay owing to down-regulation of target gene expression. Finally, by adding fluorescently labelled dsRNA to the artificial diet, the spread of the fluorescence signal throughout the body tissues of the grain aphid was observed.
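The RPKM values mentioned above normalize a raw read count by transcript length and sequencing depth. The following is a minimal sketch of that normalization and of a log2 fold change between post- and pre-feeding libraries. The function names, the pseudocount, and the example numbers are illustrative assumptions and do not come from the study's pipeline.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (reads -> millions of reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(rpkm_after, rpkm_before, pseudocount=1.0):
    """log2 ratio of two RPKM values.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for unigenes that are silent in one library.
    """
    return math.log2((rpkm_after + pseudocount) / (rpkm_before + pseudocount))


if __name__ == "__main__":
    # Hypothetical unigene: 664 bp long, 500 reads out of 10 million mapped.
    print(rpkm(500, 664, 10_000_000))
    print(log2_fold_change(3.0, 1.0))
```

A positive log2 fold change marks a unigene as up-regulated after feeding and a negative one as down-regulated; a significance test would still be needed before selecting candidates.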
Comparison of the transcriptome profiles of the alimentary canals of pre- and post-feeding grain aphids on wheat plants provided comprehensive gene expression information that could facilitate our understanding of the molecular mechanisms underlying feeding, ingestion and digestion. Furthermore, five novel and effective potential RNAi target genes were identified in the grain aphid for the first time. This finding provides a fundamental basis for aphid control in wheat through a plant-mediated RNAi strategy.
Grain aphid (Sitobion avenae F.); Alimentary canal; Transcriptome profile; Double strand RNA (dsRNA); Artificial feeding assay; RNA interference (RNAi); Aphid control
This descriptive study of the abdominal fat transcriptome takes advantage of two experimental lines of meat-type chickens (Gallus domesticus), which were selected over seven generations for a large difference in abdominal (visceral) fatness. At the age of selection (9 wk), the fat line (FL) and lean line (LL) chickens exhibit a 2.5-fold difference in abdominal fat weight, while their feed intake and body weight are similar. These unique avian models were originally created to unravel genetic and endocrine regulation of adiposity and lipogenesis in meat-type chickens. The Del-Mar 14K Chicken Integrated Systems microarray was used for a time-course analysis of gene expression in abdominal fat of FL and LL chickens during juvenile development (1–11 weeks of age).
Microarray analysis of abdominal fat in FL and LL chickens revealed 131 differentially expressed (DE) genes (FDR≤0.05) as the main effect of genotype, 254 DE genes as an interaction of age and genotype and 3,195 DE genes (FDR≤0.01) as the main effect of age. The most notable discoveries in the abdominal fat transcriptome were higher expression of many genes involved in blood coagulation in the LL and up-regulation of numerous adipogenic and lipogenic genes in FL chickens. Many of these DE genes belong to pathways controlling the synthesis, metabolism and transport of lipids or endocrine signaling pathways activated by adipokines, retinoid and thyroid hormones.
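The FDR thresholds quoted above imply a multiple-testing correction across all probes. The abstract does not name the procedure, so the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene p-values
    adj = benjamini_hochberg(raw)
    # Genes passing an FDR threshold of 0.05 (by position in `raw`).
    print([i for i, q in enumerate(adj) if q <= 0.05])
```

Under this assumption, a gene would be called differentially expressed when its adjusted value is at or below the stated threshold (0.05 or 0.01).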
The present study provides a dynamic view of differential gene transcription in abdominal fat of chickens genetically selected for fatness (FL) or leanness (LL). Remarkably, the LL chickens over-express a large number of hemostatic genes that could be involved in proteolytic processing of adipokines and endocrine factors, which contribute to their higher lipolysis and export of stored lipids. Some of these changes are already present at 1 week of age before the divergence in fatness. In contrast, the FL chickens have enhanced expression of numerous lipogenic genes mainly after onset of divergence, presumably directed by multiple transcription factors. This transcriptional analysis shows that abdominal fat of the chicken serves a dual function as both an endocrine organ and an active metabolic tissue, which could play a more significant role in lipogenesis than previously thought.
Adipogenesis; Transcriptional regulators; Hemostatic genes; Lipogenesis; Adipokines; Retinoic acid signaling; Thyroid hormone action; Polygenic trait; Visceral obesity; Gene interaction networks; Canonical metabolic/regulatory pathways
Non-coding RNAs (ncRNAs) are key regulatory elements that control a wide range of cellular processes in all bacteria in which they have been studied. Taking advantage of recent technological innovations, we set out to fully explore the ncRNA potential of the multicellular, antibiotic-producing Streptomyces bacteria.
Using a comparative RNA sequencing analysis of three divergent model streptomycetes (S. coelicolor, S. avermitilis and S. venezuelae), we discovered hundreds of novel cis-antisense RNAs and intergenic small RNAs (sRNAs). We identified a ubiquitous antisense RNA species that arose from the overlapping transcription of convergently-oriented genes; we termed these RNA species ‘cutoRNAs’, for convergent untranslated overlapping RNAs. Conservation between different classes of ncRNAs varied greatly, with sRNAs being more conserved than antisense RNAs. Many species-specific ncRNAs, including many distinct cutoRNA pairs, were located within antibiotic biosynthetic clusters, including the actinorhodin, undecylprodigiosin, and coelimycin clusters of S. coelicolor, the chloramphenicol cluster of S. venezuelae, and the avermectin cluster of S. avermitilis.
These findings indicate that ncRNAs, including a novel class of antisense RNA, may exert a previously unrecognized level of regulatory control over antibiotic production in these bacteria. Collectively, this work has dramatically expanded the ncRNA repertoire of three Streptomyces species and has established a critical foundation from which to investigate ncRNA function in this medically and industrially important bacterial genus.
Streptomyces; Non-coding RNA; sRNA; Antisense RNA; Secondary metabolic gene cluster; Antibiotic; RNA degradation
Gene duplication supplies the raw materials for novel gene functions, and many gene families that have arisen from duplication experience adaptive evolution. Most studies of young duplicates have focused on mammals, especially humans, whereas reports describing their genome-wide evolutionary patterns across the closely related Drosophila species are rare. The 12 sequenced Drosophila genomes provide the opportunity to address this issue.
In our study, 3,647 young duplicate gene families were identified across the 12 Drosophila species, and three types of expansions (species-specific, lineage-specific and complex) were detected in these gene families. Our data showed that the species-specific young duplicate genes predominated (86.6%) over the other two types. Interestingly, independent species-specific expansions of the same gene family were observed in many species, in some cases in 11 or all 12 Drosophila species. Our data also showed that the functional bias observed in these young duplicate genes was mainly related to responses to environmental stimuli and biotic stresses.
This study reveals the evolutionary patterns of young duplicates across 12 Drosophila species on a genomic scale. Our results suggest that convergent evolution acts on young duplicate genes after species differentiation and that adaptive evolution may play an important role in duplicate genes for adaptation to ecological factors and environmental changes in Drosophila.
Young duplication; Environmental factor; Convergent evolution; Adaptive evolution
Aspartic proteases (APs) are a large family of proteolytic enzymes found in almost all organisms. In plants, they are involved in many biological processes, such as senescence, stress responses, programmed cell death, and reproduction. Prior to the present study, no grape AP genes had been reported, and research on APs in woody species was very limited.
In this study, a total of 50 AP genes (VvAP) were identified in the grape genome, among which 30 contained the complete ASP domain. Synteny analysis within grape indicated that segmental and tandem duplication events contributed to the expansion of the grape AP family. Additional analysis between grape and Arabidopsis demonstrated that several grape AP genes were found in the corresponding syntenic blocks of Arabidopsis, suggesting that these genes arose before the divergence of grape and Arabidopsis. Phylogenetic relationships of the 30 VvAPs with the complete ASP domain and their Arabidopsis orthologs, as well as their gene and protein features, were analyzed, and their cellular localization was predicted. Moreover, expression profiles of VvAP genes in six different tissues were determined, and their transcript abundance under various stresses and hormone treatments was measured. Twenty-seven VvAP genes were expressed in at least one of the six tissues examined; 19 VvAPs responded to at least one abiotic stress, 12 VvAPs responded to powdery mildew infection, and most of the VvAPs responded to SA and ABA treatments. Furthermore, integrated synteny and phylogenetic analysis identified orthologous AP genes between grape and Arabidopsis, providing a unique starting point for investigating the function of grape AP genes.
The genome-wide identification, evolutionary and expression analyses of grape AP genes provide a framework for future analysis of AP genes in defining their roles during stress response. Integrated synteny and phylogenetic analyses provide novel insight into the functions of less well-studied genes using information from their better understood orthologs.
Synteny analysis; Phylogenetic analysis; Gene expression; Orthologous genes; Grape
Application of Single Nucleotide Polymorphism (SNP) marker technology as a tool in sunflower breeding programs offers enormous potential to improve sunflower genetics, and facilitate faster release of sunflower hybrids to the market place. Through a National Sunflower Association (NSA) funded initiative, we report on the process of SNP discovery through reductive genome sequencing and local assembly of six diverse sunflower inbred lines that represent oil as well as confection types.
A combination of Restriction site Associated DNA Sequencing (RAD-Seq) protocols and Illumina paired-end sequencing chemistry generated 89.4 M high-quality paired-end reads from the six lines, representing 5.3 GB of sequencing data. Raw reads from the sunflower line RHA 464 were assembled de novo to serve as a framework reference genome. Assembly of the RHA 464 sequencing data yielded about 15.2 Mb of sunflower genome sequence distributed over 42,267 contigs; contig lengths ranged from 200 to 950 bp, with an N50 length of 393 bp. SNP calling was performed by aligning sequencing data from the six sunflower lines to the assembled RHA 464 reference. On average, 1 SNP was located every 143 bp of the sunflower genome sequence. Based on several filtering criteria, a final set of 16,467 putative sequence variants with characteristics favorable for Illumina Infinium Genotyping Technology (IGT) was mined from the sequence data generated across the six diverse sunflower lines.
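The N50 statistic reported for the assembly is the contig length at which contigs of that length or longer account for at least half of the total assembled bases. A minimal sketch of the calculation follows; the function name and the toy contig lengths are illustrative and are not taken from the RHA 464 assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Accumulate from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    toy_assembly = [200, 250, 300, 393, 400, 950]  # hypothetical lengths in bp
    print(n50(toy_assembly))
```

Because RAD-Seq contigs cluster around restriction sites, a short N50 such as the one reported is expected and does not by itself indicate a poor assembly.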
Here we report the molecular and computational methodology involved in SNP development for a complex genome such as sunflower that lacks a reference assembly, offering an attractive tool for molecular breeding in sunflower.
Single nucleotide polymorphism (SNP); Restriction site associated DNA sequencing (RAD-Seq)
Litchi (Litchi chinensis Sonn.) is one of the most important fruit trees cultivated in tropical and subtropical areas. However, a lack of transcriptomic and genomic information hinders our understanding of the molecular mechanisms underlying fruit set and fruit development in litchi. Shading during early fruit development decreases fruit growth and induces fruit abscission. Here, high-throughput RNA sequencing (RNA-Seq) was employed for the de novo assembly and characterization of the fruit transcriptome in litchi, and differentially regulated genes that are responsive to shading were also investigated using digital transcript abundance (DTA) profiling.
More than 53 million paired-end reads were generated and assembled into 57,050 unigenes with an average length of 601 bp. These unigenes were annotated by querying against various public databases, with 34,029 unigenes found to be homologous to genes in the NCBI GenBank database and 22,945 unigenes annotated based on known proteins in the Swiss-Prot database. In further orthologous analyses, 5,885 unigenes were assigned with one or more Gene Ontology terms, 10,234 hits were aligned to the 24 Clusters of Orthologous Groups classifications and 15,330 unigenes were classified into 266 Kyoto Encyclopedia of Genes and Genomes pathways. Based on the newly assembled transcriptome, the DTA profiling approach was applied to investigate the differentially expressed genes related to shading stress. A total of 3.6 million and 3.5 million high-quality tags were generated from shaded and non-shaded libraries, respectively. As many as 1,039 unigenes were shown to be significantly differentially regulated. Eleven of the 14 differentially regulated unigenes, which were randomly selected for more detailed expression comparison during the course of shading treatment, were identified as being likely to be involved in the process of fruitlet abscission in litchi.
The assembled transcriptome of litchi fruit provides a global description of expressed genes in litchi fruit development, and could serve as an ideal repository for future functional characterization of specific genes. The DTA analysis revealed that more than 1000 differentially regulated unigenes respond to the shading signal, some of which might be involved in the fruitlet abscission process in litchi, shedding new light on the molecular mechanisms underlying organ abscission.
Litchi chinensis; Transcriptome; Fruit; RNA-Seq; DTA; Shade; Abscission
Citrus bacterial canker is a disease that has severe economic impact on citrus industries worldwide and is caused by a few species and pathotypes of Xanthomonas. X. citri subsp. citri strain 306 (XccA306) is a type A (Asiatic) strain with a wide host range, whereas its variant X. citri subsp. citri strain Aw12879 (Xcaw12879, Wellington strain) is restricted to Mexican lime.
To characterize the mechanism underlying the differences in host range between XccA and Xcaw, the recently completed genome of Xcaw12879 was compared with the XccA306 genome. The effectors xopAF and avrGf1 are present in Xcaw12879 but absent from XccA306. AvrGf1 of Xcaw was previously shown to cause a hypersensitive response in Duncan grapefruit. Mutation analysis of xopAF indicates that the gene contributes to Xcaw growth in Mexican lime but does not contribute to the limited host range of Xcaw. RNA-Seq analysis was conducted to compare the expression profiles of Xcaw12879 and XccA306 in Nutrient Broth (NB) medium and XVM2 medium, which induces hrp gene expression. In XVM2 compared with NB, 292 and 281 genes showed differential expression for XccA306 and Xcaw12879, respectively. Twenty-five type 3 secretion system genes were up-regulated in XVM2 for both XccA and Xcaw. Among the 4,370 genes common to Xcaw12879 and XccA306, 603 genes in NB and 450 genes in XVM2 were differentially regulated between the two strains. Xcaw12879 showed higher protease activity but lower pectate lyase activity than XccA306.
Comparative genomic analysis of XccA306 and Xcaw12879 identified strain specific genes. Our study indicated that AvrGf1 contributes to the host range limitation of Xcaw12879 whereas XopAF contributes to virulence. Transcriptome analyses of XccA306 and Xcaw12879 presented insights into the expression of the two closely related strains of X. citri subsp. citri. Virulence genes including genes encoding T3SS components and effectors are induced in XVM2 medium. Numerous genes with differential expression in Xcaw12879 and XccA306 were identified. This study provided the foundation to further characterize the mechanisms for virulence and host range of pathotypes of X. citri subsp. citri.
Xanthomonas citri; Wellington strain; Citrus canker; HR; Virulence; Transcriptome; RNA-Seq
Chimeric transcripts, including partial and internal tandem duplications (PTDs, ITDs) and gene fusions, are important in the detection, prognosis, and treatment of human cancers.
We describe Barnacle, a production-grade analysis tool that detects such chimeras in de novo assemblies of RNA-seq data, and supports prioritizing them for review and validation by reporting the relative coverage of co-occurring chimeric and wild-type transcripts. We demonstrate applications in large-scale disease studies, by identifying PTDs in MLL, ITDs in FLT3, and reciprocal fusions between PML and RARA, in two deeply sequenced acute myeloid leukemia (AML) RNA-seq datasets.
Our analyses of real and simulated data sets show that, with appropriate filter settings, Barnacle makes highly specific predictions for three types of chimeric transcripts that are important in a range of cancers: PTDs, ITDs, and fusions. High specificity makes manual review and validation efficient, which is necessary in large-scale disease studies. Characterizing an extended range of chimera types will help generate insights into progression, treatment, and outcomes for complex diseases.
Transcriptome assembly; Chimeric transcripts; Fusion; Partial tandem duplication; PTD; Internal tandem duplication; ITD; RNA-seq; Transcriptome
Recent studies suggested that human/mammalian genomes are divided into large, discrete domains that are units of chromosome organization. CTCF, a CCCTC binding factor, has a diverse role in genome regulation including transcriptional regulation, chromosome-boundary insulation, DNA replication, and chromatin packaging. It remains unclear whether a subset of CTCF binding sites plays a functional role in establishing/maintaining chromatin topological domains.
We systematically analysed the genomic, transcriptomic and epigenetic profiles of the CTCF binding sites in 56 human cell lines from ENCODE. We identified ~24,000 CTCF sites (referred to as constitutive sites) that were bound in more than 90% of the cell lines. Our analysis revealed: 1) constitutive CTCF loci were located in constitutive open chromatin and often co-localized with constitutive cohesin loci; 2) most constitutive CTCF loci were distant from transcription start sites and lacked CpG islands but were enriched with the full-spectrum CTCF motifs: a recently reported 33/34-mer and two other potentially novel (22/26-mer); 3) more importantly, most constitutive CTCF loci were present in CTCF-mediated chromatin interactions detected by ChIA-PET and these pair-wise interactions occurred predominantly within, but not between, topological domains identified by Hi-C.
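The definition of a constitutive site used above (bound in more than 90% of the cell lines) can be expressed as a simple occupancy filter. The sketch below assumes that peaks from different cell lines have already been merged into shared site identifiers; the data structure, function name, and toy data are hypothetical.

```python
def constitutive_sites(binding, n_cell_lines, min_fraction=0.9):
    """Return site IDs bound in more than `min_fraction` of cell lines.

    binding: dict mapping a site ID to the set of cell lines in which
             the site is bound (assumed to be pre-merged peak calls).
    """
    if n_cell_lines <= 0:
        raise ValueError("n_cell_lines must be positive")
    return sorted(
        site
        for site, lines in binding.items()
        if len(lines) / n_cell_lines > min_fraction
    )


if __name__ == "__main__":
    cell_lines = [f"line{i}" for i in range(10)]  # toy panel of 10 lines
    toy = {
        "siteA": set(cell_lines),      # bound everywhere
        "siteB": set(cell_lines[:8]),  # bound in 8 of 10 lines
        "siteC": set(cell_lines[:5]),  # bound in half of the lines
    }
    print(constitutive_sites(toy, n_cell_lines=len(cell_lines)))
```

With 56 cell lines, the strict "more than 90%" cutoff corresponds to sites bound in at least 51 lines.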
Our results suggest that the constitutive CTCF sites may play a role in organizing/maintaining the recently identified topological domains that are common across most human cells.
CTCF; Cohesin; Constitutive binding site; Chromatin interaction; Topological domain
In the alpha subclass of proteobacteria, iron homeostasis is controlled by diverse iron-responsive regulators. Caulobacter crescentus, an important freshwater α-proteobacterium, uses the ferric uptake repressor (Fur) for this purpose. However, the impact of iron availability on the C. crescentus transcriptome and an overall perspective of the regulatory networks involved remain unknown.
In this work we report the identification of iron-responsive and Fur-regulated genes in C. crescentus using microarray-based global transcriptional analyses. We identified 42 genes that were strongly upregulated both by mutation of fur and by iron limitation. Among them are genes involved in iron uptake (four TonB-dependent receptor gene clusters, and feoAB) and riboflavin biosynthesis, as well as genes encoding hypothetical proteins. Most of these genes are associated with predicted Fur binding sites, implicating them as direct targets of Fur-mediated repression. These data were validated by β-galactosidase and EMSA assays for two operons encoding putative transporters. The role of Fur as a positive regulator is also evident, given that 27 genes were downregulated both by mutation of fur and under low-iron conditions. As expected, this group includes many genes involved in energy metabolism, mostly encoding iron-using enzymes. Surprisingly, this group also includes TonB-dependent receptor genes and the genes fixK, fixT and ftrB, which encode an oxygen signaling network required for growth during hypoxia. Bioinformatics analyses suggest that positive regulation by Fur is mainly indirect. In addition to the Fur modulon, iron limitation altered expression of 113 more genes, including induction of genes involved in Fe-S cluster assembly, oxidative stress and heat shock response, as well as repression of genes implicated in amino acid metabolism, chemotaxis and motility.
Using a global transcriptional approach, we determined the C. crescentus iron stimulon. Many, but not all, of the iron-responsive genes were directly or indirectly controlled by Fur. The iron limitation stimulon overlaps with other regulatory systems, such as the RpoH and FixK regulons. Altogether, our results showed that adaptation of C. crescentus to iron limitation not only involves increasing the transcription of iron-acquisition systems and decreasing the production of iron-using proteins, but also includes novel genes and regulatory mechanisms.
Caulobacter crescentus; Iron stimulon; Fur regulon; Transcriptome; Iron homeostasis; TonB-dependent receptor
A co-ordinated tissue-independent gene expression profile associated with growth is present in rodent models and this is hypothesised to extend to all mammals. Growth in humans has similarities to other mammals but the return to active long bone growth in the pubertal growth spurt is a distinctly human growth event. The aim of this study was to describe gene expression and biological pathways associated with stages of growth in children and to assess tissue-independent expression patterns in relation to human growth.
We conducted gene expression analysis on a library of datasets from normal children with age annotation, collated from the NCBI Gene Expression Omnibus (GEO) and EBI ArrayExpress databases. A primary data set was generated using cells of lymphoid origin from normal children; the expression of 688 genes (ANOVA false discovery rate modified p-value, q < 0.1) was associated with age, and subsets of these genes formed clusters that correlated with the phases of growth – infancy, childhood, puberty and final height. Network analysis on these clusters identified evolutionarily conserved growth pathways (NOTCH, VEGF, TGFB, WNT and glucocorticoid receptor – hypergeometric test, q < 0.05). The greatest degree of network ‘connectivity’, and hence functional significance, was present in infancy (Wilcoxon test, p < 0.05) and then decreased through to adulthood. These observations were confirmed in a separate validation data set from lymphoid tissue. Similar biological pathways were observed to be associated with development-related gene expression in other tissues (conjunctival epithelia, temporal lobe brain tissue and bone marrow), suggesting the existence of a tissue-independent genetic program for human growth and maturation.
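The hypergeometric test named above asks whether a gene cluster contains more members of a pathway than expected by chance. A minimal sketch of the upper-tail probability is given below, using only the standard library; the function name and every number in the example are hypothetical, and the multiple-testing step that yields q-values is not shown.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: number of genes in the background (e.g. all genes on the array)
    successes:  number of background genes annotated to the pathway
    draws:      number of genes in the cluster being tested
    k:          number of cluster genes annotated to the pathway
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("successes and draws must lie within the population")
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical: 20,000 background genes, 150 in a pathway,
    # a 200-gene cluster containing 8 pathway members.
    print(hypergeom_upper_tail(8, 20_000, 150, 200))
```

A small tail probability indicates over-representation of the pathway in the cluster; in practice the p-values from all tested pathways would then be adjusted to control the false discovery rate.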
Similar evolutionarily conserved pathways have been associated with gene expression and child growth in multiple tissues. These expression profiles associate with the developmental phases of growth including the return to active long bone growth in puberty, a distinctly human event. These observations also have direct medical relevance to pathological changes that induce disease in children. Taking into account development-dependent gene expression profiles for normal children will be key to the appropriate selection of genes and pathways as potential biomarkers of disease or as drug targets.
Development; Evolution; Gene expression; Growth; Network analysis; Pediatrics
Larix gmelinii is a dominant tree species in China’s boreal forests and plays an important role in the coniferous ecosystem. It is also one of the most economically important tree species in the Chinese timber industry due to excellent water resistance and anti-corrosion of its wood products. Unfortunately, in Northeast China, L. gmelinii often suffers from serious attacks by diseases and insects. The application of exogenous volatile semiochemicals may induce and enhance its resistance against insect or disease attacks; however, little is known regarding the genes and molecular mechanisms related to induced resistance.
We performed de novo sequencing and assembly of the L. gmelinii transcriptome using a short-read sequencing technology (Illumina). Chemical defenses of L. gmelinii seedlings were induced with jasmonic acid (JA) or methyl jasmonate (MeJA) for 6 hours. Transcriptomes were compared between seedlings induced by JA, MeJA and untreated controls using a tag-based digital gene expression profiling system. In a single run, 25,977,782 short reads were produced and 51,157 unigenes were obtained with a mean length of 517 nt. We sequenced 3 digital gene expression libraries and generated between 3.5 and 5.9 million raw tags, and obtained 52,040 reliable reference genes after removing redundancy. The expression of disease/insect-resistance genes (e.g., phenylalanine ammonia-lyase, coumarate 3-hydroxylase, lipoxygenase, allene oxide synthase and allene oxide cyclase) was up-regulated. The expression profiles of some abundant genes under different elicitor treatments were studied using real-time qRT-PCR.
The results showed that the expression levels of disease/insect-resistance genes in the seedling samples induced by JA and MeJA were higher than those in the control group. MeJA elicited the strongest increases in expression of disease/insect-resistance genes.
Both JA- and MeJA-induced seedlings of L. gmelinii showed significantly increased expression of disease/insect-resistance genes. MeJA seemed to have a stronger induction effect than JA on expression of disease/insect-resistance-related genes. This study provides sequence resources for L. gmelinii research and will help us to better understand the functions of disease/insect-resistance genes and the molecular mechanisms of secondary metabolism in L. gmelinii.
Genomic and transcriptomic sequence data are essential tools for tackling ecological problems. Using an approach that combines next-generation sequencing, de novo transcriptome assembly, gene annotation and synthetic gene construction, we identify and cluster the protein families from Favia corals from the northern Red Sea.
We obtained 80 million 75 bp paired-end cDNA reads from two Favia adult samples collected at 65 m (Fav1, Fav2) on the Illumina GA platform, and generated two de novo assemblies using ABySS and CAP3. After removing redundancy and filtering out low-quality reads, our transcriptome datasets contained 58,268 (Fav1) and 62,469 (Fav2) contigs longer than 100 bp, with N50 values of 1,665 bp and 1,439 bp, respectively. Using the proteome of the sea anemone Nematostella vectensis as a reference, we were able to annotate almost 20% of each dataset using reciprocal homology searches. Homologous clustering of these annotated transcripts allowed us to divide them into 7,186 (Fav1) and 6,862 (Fav2) homologous transcript clusters (E-value ≤ 2e-30). Functional annotation categories were assigned to homologous clusters using the functional annotation of Nematostella vectensis. General annotation of the assembled transcripts was improved by 1-3% using the Acropora digitifera proteome. In addition, we screened these transcript isoform clusters for fluorescent protein (FP) homologs and identified seven potential FP homologs in Fav1 and four in Fav2. These transcripts were validated as bona fide FP transcripts via heterologous expression, which yielded robust fluorescence. Annotation of the assembled contigs revealed that 1.34% and 1.61% (in Fav1 and Fav2, respectively) of the total assembled contigs likely originated from the corals’ algal symbiont, Symbiodinium spp.
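One common way to implement reciprocal homology searches of the kind described above is the reciprocal best hit criterion: a contig and a reference protein are paired only if each is the other's top-scoring match. The abstract does not state that this exact criterion was used, so the sketch below is an assumption; the input format, function name, and identifiers are hypothetical.

```python
def reciprocal_best_hits(forward_hits, reverse_hits, max_evalue=2e-30):
    """Pair queries and subjects that are each other's best hit.

    forward_hits: dict contig ID -> list of (reference protein ID, E-value)
    reverse_hits: dict reference protein ID -> list of (contig ID, E-value)
    max_evalue:   hits with a larger E-value are ignored (threshold assumed
                  here to mirror the cutoff quoted in the text)
    """
    def best(hits):
        top = {}
        for query, candidates in hits.items():
            kept = [c for c in candidates if c[1] <= max_evalue]
            if kept:
                # The smallest E-value is the most significant match.
                top[query] = min(kept, key=lambda c: c[1])[0]
        return top

    fwd, rev = best(forward_hits), best(reverse_hits)
    return {q: s for q, s in fwd.items() if rev.get(s) == q}


if __name__ == "__main__":
    forward = {
        "contig1": [("NvP1", 1e-80), ("NvP2", 1e-40)],
        "contig2": [("NvP1", 1e-50)],
    }
    reverse = {
        "NvP1": [("contig1", 1e-78), ("contig2", 1e-49)],
        "NvP2": [("contig1", 1e-41)],
    }
    print(reciprocal_best_hits(forward, reverse))
```

In this toy input only contig1 and NvP1 are mutual best hits, so contig2 would remain unannotated under this criterion.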
Here we present a study that identifies the homologous transcript isoform clusters in the transcriptome of Favia corals using a distantly related reference proteome. Furthermore, the symbiont-derived transcripts were isolated from the datasets and their contribution was quantified. This is the first annotated transcriptome of the genus Favia and represents a major increase in the genomic resources available for this important family of corals.
K-mer; Contig; Open reading frame; Fluorescent protein; Blast; Clustering; High-throughput sequencing; Illumina paired-end; Coral
Methamphetamine (METH) is an illicit drug of abuse that influences gene expression in the rat striatum. Histone modifications regulate gene transcription.
We therefore used microarray analysis and genome-scale approaches to examine potential relationships between the effects of METH on gene expression and on DNA binding of histone H4 acetylated at lysine 5 (H4K5Ac) in the dorsal striatum of METH-naïve and METH-pretreated rats.
Acute and chronic METH administration caused differential changes in striatal gene expression. METH also increased H4K5Ac binding around the transcriptional start sites (TSSs) of genes in the rat striatum. In order to relate gene expression to histone acetylation, we binned genes of similar expression into groups of 100 genes and proceeded to relate gene expression to H4K5Ac binding. We found a positive correlation between gene expression and H4K5Ac binding in the striatum of control rats. Similar correlations were observed in METH-treated rats. Genes that showed acute METH-induced increased expression in saline-pretreated rats also showed METH-induced increased H4K5Ac binding. The acute METH injection caused similar increases in H4K5Ac binding in METH-pretreated rats, without affecting gene expression to the same degree. Finally, genes that showed METH-induced decreased expression exhibited either decreases or no changes in H4K5Ac binding.
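The binning step described above (ranking genes by expression, grouping them into bins of 100, and relating the bin averages to H4K5Ac binding) can be sketched as follows. The choice of Pearson correlation on bin means is an assumption of this sketch, as are the function names and the synthetic data; the abstract does not specify the statistic used.

```python
import math


def bin_means(values, bin_size=100):
    """Average consecutive chunks of `bin_size` values (last bin may be smaller)."""
    return [
        sum(values[start:start + bin_size]) / len(values[start:start + bin_size])
        for start in range(0, len(values), bin_size)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def binned_correlation(expression, binding, bin_size=100):
    """Sort genes by expression, bin them, and correlate the bin means."""
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    expr_sorted = [expression[i] for i in order]
    bind_sorted = [binding[i] for i in order]
    return pearson(bin_means(expr_sorted, bin_size), bin_means(bind_sorted, bin_size))


if __name__ == "__main__":
    # Synthetic example: binding rises linearly with expression.
    expression = [float(i) for i in range(400)]
    binding = [2.0 * e + 1.0 for e in expression]
    print(binned_correlation(expression, binding))
```

Averaging within bins reduces gene-level noise in the ChIP signal, which is why a trend can be visible across bins even when individual genes scatter widely.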
Acute METH injections increased the expression of genes that showed increased H4K5Ac binding near their transcription start sites.
ChIP sequencing; Neural networks; Epigenetics; Histone acetylation; Microarray
MicroRNAs (miRNAs) are a family of short, non-coding RNAs modulating expression of human protein coding genes (miRNA target genes). Their dysfunction is associated with many human diseases, including neurodevelopmental disorders. It has been recently shown that genomic copy number variations (CNVs) can cause aberrant expression of integral miRNAs and their target genes, and contribute to intellectual disability (ID).
To better understand the CNV-miRNA relationship in ID, we investigated the prevalence and function of miRNAs and miRNA target genes in five groups of CNVs. Three groups of CNVs were from 213 probands with ID (24 de novo CNVs, 46 familial and 216 common CNVs), one group of CNVs was from a cohort of 32 cognitively normal subjects (67 CNVs) and one group of CNVs represented 40 ID related syndromic regions listed in DECIPHER (30 CNVs) which served as positive controls for CNVs causing or predisposing to ID. Our results show that: (1) the number of miRNAs is significantly higher in de novo or DECIPHER CNVs than in familial or common CNV subgroups (P < 0.01); (2) miRNAs with brain-related functions are more prevalent in de novo CNV groups than in common CNV groups; (3) more miRNA target genes are found in de novo, familial and DECIPHER CNVs than in the common CNV subgroup (P < 0.05); and (4) the MAPK signaling cascade is enriched among the miRNA target genes from the de novo and DECIPHER CNV subgroups.
Our findings reveal an increase in miRNA and miRNA target gene content in de novo versus common CNVs in subjects with ID. Their expression profile and participation in pathways support a possible role of miRNA copy number change in cognition and/or CNV-mediated developmental delay. Systematic analysis of expression/function of miRNAs in addition to coding genes integral to CNVs could uncover new causes of ID.
Micro RNA (miRNA); Copy number variants (CNVs); Copy number variant regions (CNVRs); Intellectual disabilities (ID); Functional pathways
A major part of second generation biofuel production is the enzymatic saccharification of lignocellulosic biomass into fermentable sugars. Many fungi produce enzymes that can saccharify lignocellulose, and cocktails from several fungi, including well-studied species such as Trichoderma reesei and Aspergillus niger, are available commercially for this process. Such commercially available enzyme cocktails are not necessarily representative of the array of enzymes used by the fungi themselves when faced with a complex lignocellulosic material. The global induction of genes in response to exposure of T. reesei to wheat straw was explored using RNA-seq and compared to published RNA-seq data and a model of how A. niger senses and responds to wheat straw.
In T. reesei, levels of transcripts that encode known and predicted cell-wall degrading enzymes were very high after 24 h exposure to straw (approximately 13% of the total mRNA) but were lower than those recorded in A. niger (approximately 19% of the total mRNA). Closer analysis revealed that enzymes from the same glycoside hydrolase families but different carbohydrate esterase and polysaccharide lyase families were up-regulated in both organisms. Accessory proteins that have been hypothesised to have a role in enhancing carbohydrate deconstruction in A. niger were also uncovered in T. reesei, and the categories of enzymes induced were in general similar to those in A. niger. As in A. niger, antisense transcripts are present in T. reesei and their expression is regulated by the growth condition.
T. reesei uses an array of enzymes similar to that of A. niger for the deconstruction of a solid lignocellulosic substrate. This suggests a conserved strategy towards lignocellulose degradation in both saprobic fungi. This study provides a basis for further analysis and characterisation of genes shown to be highly induced in the presence of a lignocellulosic substrate. The data will help to elucidate the mechanism of solid substrate recognition and subsequent degradation by T. reesei and provide information which could prove useful for efficient production of second generation biofuels.
Trichoderma reesei; Aspergillus niger; Glycoside hydrolases; Carbohydrate esterases; Antisense RNA; RNA-sequencing
The Major Histocompatibility Complex (MHC) is the most important genetic marker to study patterns of adaptive genetic variation determining pathogen resistance and associated life history decisions. It is used in many different research fields, ranging from human medicine and molecular evolution to functional biodiversity studies. Correct assessment of the individual allelic diversity pattern and the underlying structural sequence variation is the basic requirement to address the functional importance of MHC variability. Next-generation sequencing (NGS) technologies are likely to replace traditional genotyping methods to a great extent in the near future, but the first empirical studies strongly indicate the need for a rigorous quality control pipeline. Strict approaches for data validation and allele calling to distinguish true alleles from artefacts are required.
We developed an analytical methodology and validated a data processing procedure that can be applied to any organism. It allows the separation of true alleles from artefacts and the evaluation of genotyping reliability, which, in addition to artefacts, considers for the first time the possibility of allelic dropout due to unbalanced amplification efficiencies across alleles. Finally, we developed a method to assess the confidence level per genotype a posteriori, which helps to decide which alleles and individuals should be included in downstream analyses. The latter method could also be used to optimize experimental designs in the future.
Combining our workflow with the study of amplification efficiency offers the chance for researchers to evaluate enormous amounts of NGS-generated data in great detail, improving confidence over the downstream analyses and subsequent applications.
Major histocompatibility complex; Next-generation sequencing; 454 pyrosequencing; Molecular cloning; PCR and sequencing artefacts; Amplification efficiency; Allelic dropout; Rodent; Delomys sublineatus
Solanum torvum Sw is employed worldwide as a rootstock for eggplant cultivation because of its vigour and its resistance/tolerance to the most serious soil-borne diseases, such as bacterial and fungal wilts and root-knot nematodes. The scarcity of information on Solanum torvum (hereafter Torvum) resistance mechanisms is mostly attributable to the lack of genomic tools (e.g. a dedicated microarray) as well as to the paucity of database information, which limits high-throughput expression studies in Torvum.
As a first step towards transcriptome profiling of Torvum inoculated with the nematode M. incognita, we built a Torvum 3’ transcript catalogue. One-quarter of a 454 full run resulted in 205,591 quality-filtered reads. De novo assembly yielded 24,922 contigs and 11,875 singletons. Similarity searches of the S. torvum transcript tags catalogue produced 12,344 annotations. A 30,000-feature custom CombiMatrix chip was then designed, and microarray hybridizations were conducted for both control roots and Meloidogyne incognita-infected roots sampled at 14 dpi (days post inoculation), resulting in 390 differentially expressed genes (DEG). We also tested the chip with samples from the phylogenetically related, nematode-susceptible eggplant species Solanum melongena. An in silico validation strategy was developed based on assessment of sequence similarity among Torvum probes and eggplant expressed sequences available in public repositories. GO term enrichment analyses with the 390 Torvum DEG revealed enhancement of several processes, such as chitin catabolism and sesquiterpenoid biosynthesis, while no GO term enrichment was found with eggplant DEG.
The genes identified from the S. torvum catalogue that bear high similarity to known nematode resistance genes were further investigated in view of their potential role in the nematode resistance mechanism.
By combining 454 pyrosequencing and microarray technology we were able to conduct cost-effective global transcriptome profiling in a non-model species. In addition, the development of an in silico validation strategy allowed us to extend the use of the custom chip to a related species and to assess by comparison the expression of selected genes without major concerns about artifacts. The expression profiling of S. torvum responses to nematode infection points to sesquiterpenoids and chitinases as major effectors of nematode resistance. The availability of the long sequence tags in the S. torvum catalogue will allow precise identification of active nematocidal/nematostatic compounds and associated enzymes, laying the basis for exploitation of these resistance mechanisms in other species.
Torvum; Nematode resistance; 454 pyrosequencing; Microarray; Heterologous hybridizations
Epstein-Barr virus (EBV) is a human herpesvirus implicated in cancer and autoimmune disorders. Little is known concerning the roles of RNA structure in this important human pathogen. This study provides the first comprehensive genome-wide survey of RNA and RNA structure in EBV.
Novel EBV RNAs and RNA structures were identified by computational modeling and RNA-Seq analyses of EBV. Scans of the genomic sequences of four EBV strains (EBV-1, EBV-2, GD1, and GD2) and of the closely related Macacine herpesvirus 4 using the RNAz program discovered 265 regions with high probability of forming conserved RNA structures. Secondary structure models are proposed for these regions based on a combination of free energy minimization and comparative sequence analysis. The analysis of RNA-Seq data uncovered the first observation of a stable intronic sequence RNA (sisRNA) in EBV. The abundance of this sisRNA rivals that of the well-known and highly expressed EBV-encoded non-coding RNAs (EBERs).
This work identifies regions of the EBV genome likely to generate functional RNAs and RNA structures, provides structural models for these regions, and discusses potential functions suggested by the modeled structures. Enhanced understanding of the EBV transcriptome will guide future experimental analyses of the discovered RNAs and RNA structures.
Epstein-Barr virus (EBV); Herpesvirus; RNA; RNA structure; Non-coding RNA (ncRNA); RNA-Seq; Bioinformatics; W repeat; sisRNA; RNA editing
Latimeria menadoensis is a coelacanth species first identified in 1997 in Indonesia, 10,000 km from its African congener. To date, only six specimens have been caught and only very limited molecular data are available. In the present work we describe the de novo transcriptome assembly obtained from liver and testis samples collected from the fifth specimen ever caught of this species.
The deep RNA sequencing performed with Illumina technologies generated 145,435,156 paired-end reads, accounting for ~14 GB of sequence data, which were de novo assembled using a Trinity/CLC combined strategy. The assembly output was processed and filtered producing a set of 66,308 contigs, whose quality was thoroughly assessed. The comparison with the recently sequenced genome of the African congener Latimeria chalumnae and with the available genomic resources of other vertebrates revealed a good reconstruction of full length transcripts and a high coverage of the predicted full coelacanth transcriptome.
The RNA-seq analysis revealed remarkable differences in the expression profiles between the two tissues, allowing the identification of liver- and testis-specific transcripts which may play a fundamental role in important biological processes carried out by these two organs.
Given the high genomic affinity between the two coelacanth species, the de novo transcriptome assembly described here can be considered a valuable support tool for improving gene prediction within the genome of L. chalumnae and a valuable resource for investigating many aspects of tetrapod evolution.
Coelacanth; Latimeria menadoensis; Transcriptome; de novo assembly; RNA-seq
Histone acetylation has been implicated in learning and memory in the brain; however, its function at the level of the genome and at individual genetic loci remains poorly investigated. This study examines a key acetylation mark, histone H4 lysine 5 acetylation (H4K5ac), genome-wide and its role in activity-dependent gene transcription in the adult mouse hippocampus following contextual fear conditioning.
Using ChIP-Seq, we identified 23,235 genes in which H4K5ac correlates with absolute gene expression in the hippocampus. However, in the absence of transcription factor binding sites 150 bp upstream of the transcription start site, genes were associated with higher H4K5ac and expression levels. We further establish H4K5ac as a ubiquitous modification across the genome. Approximately one-third of all genes have above-average H4K5ac, of which ~15% are specific to memory formation and ~65% are co-acetylated for H4K12. Although H4K5ac is prevalent across the genome, enrichment of H4K5ac at specific regions in the promoter and coding region is associated with different levels of gene expression. Additionally, unbiased peak calling for genes differentially acetylated for H4K5ac identified 114 unique genes specific to fear memory, over half of which have not previously been associated with memory processes.
Our data provide novel insights into potential mechanisms of gene priming and bookmarking by histone acetylation following hippocampal memory activation. Specifically, we propose that hyperacetylation of H4K5 may prime genes for rapid expression following activity. More broadly, this study strengthens the importance of histone posttranslational modifications for the differential regulation of transcriptional programs in cognitive processes.
ChIP-Seq; Contextual fear conditioning; Gene bookmarking; Gene priming; H4K5 acetylation; Learning and memory
The numerous classes of repeats often impede the assembly of genome sequences from the short reads provided by new sequencing technologies. We demonstrate a simple and rapid means to ascertain the repeat structure and total size of a bacterial or archaeal genome without the need for assembly by directly analyzing the abundances of distinct k-mers among reads.
The sensitivity of this procedure to resolve variation within a bacterial species is demonstrated: genome sizes and repeat structure of five environmental strains of E. coli from short Illumina reads were estimated by this method, and total genome sizes corresponded well with those obtained for the same strains by pulsed-field gel electrophoresis. In addition, this approach was applied to read-sets for completed genomes and shown to be accurate over a wide range of microbial genome sizes.
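To illustrate the general idea of estimating genome size from k-mer abundances without assembly, the sketch below uses a widely known estimator: total k-mer occurrences divided by the modal k-mer depth. This is not necessarily the procedure used in the study. The function names, the `min_abundance` cutoff for discarding likely sequencing-error k-mers, and the toy counts are all assumptions made here for illustration. Reverse-complement strands are not merged, which a real analysis would need to handle.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(kmer_counts, min_abundance=2):
    """Estimate genome size (in k-mer positions) and the modal k-mer depth.

    K-mers seen fewer than `min_abundance` times are treated as probable
    sequencing errors and ignored. The modal depth is the abundance shared
    by the largest number of distinct k-mers, assumed to be single-copy.
    """
    histogram = Counter(kmer_counts.values())  # abundance -> distinct k-mers
    solid = {a: n for a, n in histogram.items() if a >= min_abundance}
    if not solid:
        raise ValueError("no k-mers at or above min_abundance")
    peak_depth = max(solid, key=lambda a: solid[a])
    total_occurrences = sum(a * n for a, n in solid.items())
    return round(total_occurrences / peak_depth), peak_depth


# Toy abundance table (hypothetical): 1000 single-copy k-mers at depth 20,
# 50 two-copy repeat k-mers at depth 40, and 300 error k-mers seen once.
toy_counts = {f"unique{i}": 20 for i in range(1000)}
toy_counts.update({f"repeat{i}": 40 for i in range(50)})
toy_counts.update({f"error{i}": 1 for i in range(300)})
toy_size, toy_depth = estimate_genome_size(toy_counts)
```

In the toy table, k-mers at twice the modal depth are counted twice in the estimate, which is how repeat content contributes to total genome size under this estimator.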
Application of these procedures, based solely on k-mer abundances in short read data sets, allows aspects of genome structure to be resolved that are not apparent from conventional short read assemblies. This knowledge of the repetitive content of genomes provides insights into genome evolution and diversity.
K-mer; Genome assembly; Repetitive elements; Bacterial evolution
RNA-seq can be used to measure allele-specific expression (ASE) by assigning sequence reads to individual alleles; however, relative ASE is systematically biased when sequence reads are aligned to a single reference genome. Aligning sequence reads to both parental genomes can eliminate this bias, but this approach is not always practical, especially for non-model organisms. To improve accuracy of ASE measured using a single reference genome, we identified properties of differentiating sites responsible for biased measures of relative ASE.
We found that clusters of differentiating sites prevented sequence reads from an alternate allele from aligning to the reference genome, causing a bias in relative ASE favoring the reference allele. This bias increased with greater sequence divergence between alleles. Increasing the number of mismatches allowed when aligning sequence reads to the reference genome and restricting analysis to genomic regions with fewer differentiating sites than the number of mismatches allowed almost completely eliminated this systematic bias. Accuracy of allelic abundance was increased further by excluding differentiating sites within sequence reads that could not be aligned uniquely within the genome (imperfect mappability) and reads that overlapped one or more insertions or deletions (indels) between alleles.
After aligning sequence reads to a single reference genome, excluding differentiating sites with at least as many neighboring differentiating sites as the number of mismatches allowed, imperfect mappability, and/or an indel(s) nearby resulted in measures of allelic abundance comparable to those derived from aligning sequence reads to both parental genomes.
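The site-exclusion rule summarized above could be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name `usable_sites`, the definition of "neighboring" as any other differentiating site within one read length minus one base on either side, and the representation of imperfect-mappability and indel-adjacent sites as precomputed sets of positions are all assumptions introduced here.

```python
from bisect import bisect_left, bisect_right


def usable_sites(positions, read_length, max_mismatches,
                 unmappable=frozenset(), near_indel=frozenset()):
    """Return differentiating sites retained for allele-specific counting.

    A site is dropped if it has at least `max_mismatches` neighboring
    differentiating sites within (read_length - 1) bp, or if it is flagged
    as imperfectly mappable or as lying near an indel.
    Positions are coordinates on a single chromosome.
    """
    ordered = sorted(set(positions))
    span = read_length - 1  # farthest offset at which one read covers both sites
    kept = []
    for pos in ordered:
        lo = bisect_left(ordered, pos - span)
        hi = bisect_right(ordered, pos + span)
        neighbors = hi - lo - 1  # exclude the site itself
        if neighbors >= max_mismatches:
            continue
        if pos in unmappable or pos in near_indel:
            continue
        kept.append(pos)
    return kept


# Hypothetical coordinates: a dense cluster near 100 and isolated sites elsewhere.
toy_positions = [100, 105, 110, 500, 900, 905]
toy_kept = usable_sites(toy_positions, read_length=36, max_mismatches=2)
```

Counting neighbors on both sides of a site is conservative, because sites on opposite sides cannot always be covered by the same read. A stricter reading of the rule would evaluate each possible read window separately.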
Next-generation sequencing; Mapping bias; Drosophila melanogaster; Drosophila simulans; DGRP; Allelic imbalance; Genomics; Gene expression; Illumina