The revolution in DNA sequencing technology continues unabated, and is affecting all aspects of the biological and medical sciences. The training and recruitment of the next generation of researchers who are able to use and exploit the new technology is severely lacking and potentially negatively influencing research and development efforts to advance genome biology. Here we present a cross-disciplinary course that provides undergraduate students with practical experience in running a next generation sequencing instrument through to the analysis and annotation of the generated DNA sequences.
Many labs across the world are installing next generation sequencing technology, and we show that undergraduate students produced quality sequence data and were excited to participate in cutting edge research. The students conducted the workflow from DNA extraction and library preparation, through running the sequencing instrument, to the extraction and analysis of the data. They sequenced microbes, metagenomes, and a marine mammal, the Californian sea lion, Zalophus californianus. The students met sequencing quality controls, had no detectable contamination in the targeted DNA sequences, provided publication quality data, and became part of an international collaboration to investigate carcinomas in carnivores.
Students learned important skills for their future education and career opportunities, and a perceived increase in students’ ability to conduct independent scientific research was measured. DNA sequencing is rapidly expanding in the life sciences. Teaching undergraduates to use the latest technology to sequence genomic DNA ensures they are ready to meet the challenges of the genomic era and allows them to participate in annotating the tree of life.
Undergraduate education; DNA sequencing; Sea lion; Metagenome
In silico comparative genomics approaches have been used efficiently for functional prediction and reconstruction of metabolic and regulatory networks. Riboswitches are metabolite-sensing structures often found in bacterial mRNA leaders, where they control gene expression at the transcriptional or translational level.
An increasing number of riboswitches and other cis-regulatory RNAs have been recently classified into numerous RNA families in the Rfam database. High conservation of these RNA motifs provides a unique advantage for their genomic identification and comparative analysis.
A comparative genomics approach implemented in the RegPredict tool was used for reconstruction and functional annotation of regulons controlled by RNAs from 43 Rfam families in diverse taxonomic groups of Bacteria. The inferred regulons include ~5200 cis-regulatory RNAs and more than 12000 target genes in 255 microbial genomes. All predicted RNA-regulated genes were classified into specific and overall functional categories. Analysis of taxonomic distribution of these categories allowed us to establish major functional preferences for each analyzed cis-regulatory RNA motif family. Overall, most RNA motif regulons showed predictable functional content in accordance with their experimentally established effector ligands. Our results suggest that some RNA motifs (including thiamin pyrophosphate and cobalamin riboswitches that control the cofactor metabolism) are widespread and likely originated from the last common ancestor of all bacteria. However, many more analyzed RNA motifs are restricted to a narrow taxonomic group of bacteria and likely represent more recent evolutionary innovations.
The reconstructed regulatory networks for major known RNA motifs substantially expand the existing knowledge of transcriptional regulation in bacteria. The inferred regulons can be used for genetic experiments, functional annotations of genes, metabolic reconstruction and evolutionary analysis. The obtained genome-wide collection of reference RNA motif regulons is available in the RegPrecise database (http://regprecise.lbl.gov/).
RNA regulatory motif; Riboswitch; Regulon; Gene function; Comparative genomics; Bacteria
Mammalian olfactory receptors (ORs) are encoded by the largest mammalian multigene family. Understanding the OR gene repertoire in the cattle genome could help link the effects of genetic differences in these genes to variations in olfaction in cattle.
We report here a whole genome analysis of the olfactory receptor genes of Bos taurus using conserved OR gene-specific motifs and known OR protein sequences from diverse species. Our analysis, using the current cattle genome assembly UMD 3.1 covering 99.9% of the cattle genome, shows that the cattle genome contains 1,071 OR-related sequences including 881 functional, 190 pseudo, and 352 partial OR sequences. The OR genes are located in 49 clusters on 26 cattle chromosomes. We classified them into 18 families consisting of 4 Class I and 14 Class II families and these were further grouped into 272 subfamilies. Comparative analyses of the OR genes of cattle, pigs, humans, mice, and dogs showed that 6.0% (n = 53) of functional OR cattle genes were species-specific. We also showed that significant copy number variations are present in the OR repertoire of the cattle from the analysis of 10 selected OR genes.
Our analysis revealed the almost complete OR gene repertoire from an individual cattle genome. Though the number of OR genes was lower than in pigs, the analysis of the genetic system of cattle ORs showed close similarities to that of the pig.
Olfactory receptor; Cattle; Olfaction; OR genes
Cytokinins (CKs) have significant roles in various aspects of plant growth and development, and they are also involved in plant stress adaptations. The fine-tuning of the controlled CK levels in individual tissues, cells, and organelles is properly maintained by isopentenyl transferases (IPTs) and cytokinin oxidase/dehydrogenases (CKXs). Chinese cabbage is one of the most economically important vegetable crops worldwide. The whole genome sequencing of Brassica rapa enables us to perform the genome-wide identification and functional analysis of the IPT and CKX gene families.
In this study, a total of 13 BrIPT genes and 12 BrCKX genes were identified. The gene structures, conserved domains and phylogenetic relationships were analyzed. The isoelectric point, subcellular localization and glycosylation sites of the proteins were predicted. Segmental duplicates were found in both BrIPT and BrCKX gene families. We also analyzed evolutionary patterns and divergence of the IPT and CKX genes in the Cruciferae family. The transcription levels of BrIPT and BrCKX genes were analyzed to obtain an initial picture of the functions of these genes. Abiotic stress elements related to adverse environmental stimuli were found in the promoter regions of BrIPT and BrCKX genes and they were confirmed to respond to drought and high salinity conditions. The effects of 6-BA and ABA on the expressions of BrIPT and BrCKX genes were also investigated.
The expansion of BrIPT and BrCKX genes after speciation from Arabidopsis thaliana is mainly attributed to segmental duplication events during the whole genome triplication (WGT), and a substantial number of duplicated genes were lost during the long evolutionary history. Genes produced by segmental duplication events have changed their expression patterns or may have adopted new functions and thus have been retained. BrIPT and BrCKX genes respond well to drought and high salinity stresses, and their transcripts are affected by exogenous hormones, such as 6-BA and ABA, suggesting potential roles in abiotic stress responses and in the regulatory mechanisms of plant hormone homeostasis. The appropriate modulation of endogenous CK levels by IPT and CKX genes is a promising approach for developing economically important high-yielding and high-quality stress-tolerant crops in agriculture.
Polycomb Repressive Complex 2 (PRC2) is an essential regulator of gene expression that maintains genes in a repressed state by marking chromatin with trimethylated Histone H3 lysine 27 (H3K27me3). In Arabidopsis, loss of PRC2 function leads to pleiotropic effects on growth and development thought to be due to ectopic expression of seed and embryo-specific genes. While there is some understanding of the mechanisms by which specific genes are targeted by PRC2 in animal systems, it is still not clear how PRC2 is recruited to specific regions of plant genomes.
We used ChIP-seq to determine the genome-wide distribution of hemagglutinin (HA)-tagged FERTILIZATION INDEPENDENT ENDOSPERM (FIE-HA), the Extra Sex Combs homolog protein present in all Arabidopsis PRC2 complexes. We found that the FIE-HA binding sites co-locate with a subset of the H3K27me3 sites in the genome and that the associated genes were more likely to be de-repressed in mutants of PRC2 components. The FIE-HA binding sites are enriched for three sequence motifs including a putative GAGA factor binding site that is also found in Drosophila Polycomb Response Elements (PREs).
Our results suggest that PRC2 binding sites in plant genomes share some sequence features with Drosophila PREs. However, unlike Drosophila PREs which are located in promoters and devoid of H3K27me3, Arabidopsis FIE binding sites tend to be in gene coding regions and co-localize with H3K27me3.
Polycomb; Chromatin immunoprecipitation; H3K27me3
Massive mortalities have been observed in France since 2008 on spat and juvenile Pacific oysters, Crassostrea gigas. A herpes virus called OsHV-1, easily detectable by PCR, has been implicated in the mortalities as demonstrated by the results of numerous field studies linking mortality with OsHV-1 prevalence. Moreover, experimental infections using viral particles have documented the pathogenicity of OsHV-1 but the physiological responses of host to pathogen are not well known.
The aim of this study was to understand the mechanisms brought into play against the virus during infection in the field. A microarray assay was developed for a major part of the oyster genome and used to study the host transcriptome during mortality events in the field. Spat with detectable OsHV-1 infection and mortality were compared by microarray with spat showing neither detectable infection nor mortality during mortality episodes. A number of genes were regulated in response to pathogen infection in the field, which argues for an involvement of the virus in the observed mortality. The results allowed us to establish a hypothetical scheme of the host cell’s infection by, and response to, the pathogen.
This response shows a “sensu stricto” innate immunity through genic regulation of the OsHV-1 life cycle, but also involves other biological processes, resulting in complex interactions between host and pathogens in general.
Crassostrea gigas; OsHV-1; Host response; Mortality; Transcriptome
Qualitative alterations or abnormal expression of microRNAs (miRNAs) in colon cancer have mainly been demonstrated in primary tumors. Poorly overlapping sets of oncomiRs, tumor suppressor miRNAs and metastamiRs have been linked with distinct stages in the progression of colorectal cancer. To identify changes in both miRNA and gene expression levels among normal colon mucosa, primary tumor and liver metastasis samples, and to classify miRNAs into functional networks, in this work miRNA and gene expression profiles in 158 samples from 46 patients were analysed.
Most changes in miRNA and gene expression levels had already manifested in the primary tumors, while these levels were almost stably maintained in the subsequent primary tumor-to-metastasis transition. In addition, comparing normal tissue, tumor and metastasis, we did not observe general impairment or any rise in miRNA biogenesis. While only a few mRNAs were found to be differentially expressed between primary colorectal carcinoma and liver metastases, miRNA expression profiles can classify primary tumors and metastases well, including differential expression of miR-10b, miR-210 and miR-708. Of 82 miRNAs that were modulated during tumor progression, 22 were involved in EMT. qRT-PCR confirmed the down-regulation of miR-150 and miR-10b in both primary tumor and metastasis compared to normal mucosa and of miR-146a in metastases compared to primary tumor. The upregulation of miR-201 in metastasis compared with both normal mucosa and primary tumor was also confirmed. A preliminary survival analysis considering differentially expressed miRNAs suggested a possible link between miR-10b expression in metastasis and patient survival. By integrating miRNA and target gene expression data, we identified a combination of interconnected miRNAs, which are organized into sub-networks, including several regulatory relationships with differentially expressed genes. Key regulatory interactions were validated experimentally. Specific mixed circuits involving miRNAs and transcription factors were identified and deserve further investigation. The suppressor activity of miR-182 on the ENTPD5 gene was identified for the first time and confirmed in an independent set of samples.
Using a large dataset of CRC miRNA and gene expression profiles, we describe the interplay of miRNA groups in regulating gene expression, which in turn affects modulated pathways that are important for tumor development.
microRNA; Gene expression; Regulatory networks; Colorectal cancer; Metastasis
Cotton bollworm (Helicoverpa armigera) and oriental tobacco budworm (Helicoverpa assulta) are noctuid sibling species. Under artificial manipulation, they can mate and produce fertile offspring. Both are serious agricultural pests, but cotton bollworms are euryphagous whereas oriental tobacco budworms are oligophagous. To identify the differentially expressed genes that affect host recognition and host adaptation between the two species, we constructed digital gene expression tag profiles for four developmental stages of the two species. High-throughput sequencing yielded more than 23 million 17-nt clean tags from each species. The number of unique clean tags was nearly the same in both species (approximately 357,000).
According to the gene annotation results, we identified 83 and 68 olfaction related transcripts from H. armigera and H. assulta, respectively. At the same time, 1137 and 1138 transcripts of digestion enzymes were identified from the two species. Among the olfaction related transcripts, more odorant binding proteins and G protein-coupled receptors were identified in H. armigera than in H. assulta. Among the digestion enzymes, more detoxification enzymes (e.g. P450, carboxypeptidase and ATPase) were found in H. assulta than in H. armigera. These differences may be partly explained by the narrow host plant range of H. assulta, for which more detoxification enzymes would increase the efficiency of food detoxification and utilization.
This study identified some differentially expressed genes affecting host selection and adaptation between the two sibling species. These genes will be useful for studying the evolution of host plant selection. The study also provides some important target genes for species-specific insect control by RNAi technology.
Development; Host plant range; Transcripts; Digital gene expression tag profile (DGE-Tag); Sibling species; Differential expression gene; Helicoverpa armigera; Helicoverpa assulta
A composite biological structure, such as an insect head or abdomen, contains many internal structures with distinct functions. Composite structures are often used in RNA-seq studies, though it is unclear how expression of the same gene in different tissues and structures within the same structure affects the measurement (or even utility) of the resulting patterns of gene expression. Here we determine how complex composite tissue structure affects measures of gene expression using RNA-seq.
We focus on two structures in the honey bee (the sting gland and digestive tract) both contained within one larger structure, the whole abdomen. For each of the three structures, we used RNA-seq to identify differentially expressed genes between two developmental stages, nurse bees and foragers. Based on RNA-seq for each structure-specific extraction, we found that RNA-seq with composite structures leads to many false negatives (genes strongly differentially expressed in particular structures which are not found to be differentially expressed within the composite structure). We also found a significant number of genes with one pattern of differential expression in the tissue-specific extraction, and the opposite in the composite extraction, suggesting multiple signals from such genes within the composite structure. We found these patterns for different classes of genes including transcription factors.
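The false negatives and sign reversals described above can be illustrated with a toy mixture model. This is only a sketch, assuming that a composite extraction measures an RNA-fraction-weighted average of its parts; the tissue fractions, expression values, and the names `RNA_FRACTION`, `GENE_X` and `GENE_Y` are hypothetical and are not data from the study.

```python
import math

def composite_expression(expression_by_tissue, rna_fraction_by_tissue):
    """Model the composite measurement as a weighted mean of tissue-level expression."""
    return sum(expression_by_tissue[t] * rna_fraction_by_tissue[t]
               for t in rna_fraction_by_tissue)

def log2_fold_change(before, after):
    """log2 ratio of expression after vs. before (positive means up-regulated)."""
    return math.log2(after / before)

# Hypothetical share of abdominal RNA contributed by each part (sums to 1).
RNA_FRACTION = {"sting_gland": 0.05, "digestive_tract": 0.35, "rest_of_abdomen": 0.60}

# Hypothetical gene X: 8-fold up in the sting gland of foragers, unchanged elsewhere.
GENE_X = {
    "nurse":   {"sting_gland": 10.0, "digestive_tract": 100.0, "rest_of_abdomen": 50.0},
    "forager": {"sting_gland": 80.0, "digestive_tract": 100.0, "rest_of_abdomen": 50.0},
}

# Hypothetical gene Y: up in the sting gland but down in the rest of the abdomen.
GENE_Y = {
    "nurse":   {"sting_gland": 10.0, "digestive_tract": 100.0, "rest_of_abdomen": 100.0},
    "forager": {"sting_gland": 40.0, "digestive_tract": 100.0, "rest_of_abdomen": 60.0},
}

def summarize(gene):
    """Return (sting-gland log2 fold change, whole-abdomen log2 fold change)."""
    local = log2_fold_change(gene["nurse"]["sting_gland"], gene["forager"]["sting_gland"])
    whole = log2_fold_change(composite_expression(gene["nurse"], RNA_FRACTION),
                             composite_expression(gene["forager"], RNA_FRACTION))
    return local, whole

for name, gene in (("X", GENE_X), ("Y", GENE_Y)):
    local, whole = summarize(gene)
    print(f"gene {name}: sting gland {local:+.2f}, whole abdomen {whole:+.2f}")
```

Under these assumed numbers, gene X changes strongly in the sting gland but barely at all in the whole abdomen (a false negative), while gene Y changes in opposite directions in the two extractions.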
Many RNA-seq studies currently use composite extractions, and even whole insect extractions, when tissue and structure specific extractions are possible. This is due to the logistical difficulty of micro-dissection and a lack of awareness of the potential errors associated with composite extractions. The present study suggests that RNA-seq studies of composite structures are prone to false negatives and to positive signals that are difficult to interpret for genes with variable patterns of local expression. In general, our results suggest that RNA-seq on large composite structures should be avoided unless it is possible to demonstrate that the effects shown here do not exist for the genes of interest.
RNA-seq; Tissue specificity; Genomics
In spite of its association with gastroenteritis and inflammatory bowel diseases, the isolation of Campylobacter concisus from both diseased and healthy individuals has led to controversy regarding its role as an intestinal pathogen. One proposed reason for this is the presence of high genetic diversity among the genomes of C. concisus strains.
In this study the genomes of six C. concisus strains were sequenced, assembled and annotated including two strains isolated from Crohn’s disease patients (UNSW2 and UNSW3), three from gastroenteritis patients (UNSW1, UNSWCS and ATCC 51562) and one from a healthy individual (ATCC 51561). The genomes of C. concisus BAA-1457 and UNSWCD, available from NCBI, were included in subsequent comparative genomic analyses. The Pan and Core genomes for the sequenced C. concisus strains consisted of 3254 and 1556 protein coding genes, respectively.
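The pan and core genome sizes reported above correspond to a union and an intersection over per-strain gene sets. The following is a minimal sketch of that idea, assuming genes have already been clustered into shared family identifiers; the strain labels and family names in the toy input are hypothetical, and the clustering method used in the study is not shown here.

```python
from __future__ import annotations

def pan_and_core(gene_families_by_strain: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (pan genome, core genome) as sets of gene-family identifiers.

    Pan genome: families present in at least one strain (set union).
    Core genome: families present in every strain (set intersection).
    """
    gene_sets = list(gene_families_by_strain.values())
    if not gene_sets:
        raise ValueError("at least one strain is required")
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core

# Hypothetical toy input: three strains, family IDs invented for illustration.
toy_strains = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam5"},
}

pan, core = pan_and_core(toy_strains)
print(len(pan), len(core))
```

In a real analysis the hard part is the upstream ortholog clustering; once that is done, the pan and core counts follow directly from these two set operations.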
Genes were identified with specific conservation in C. concisus strains grouped by phenotypes such as invasiveness, adherence, motility and diseased states. Phylogenetic trees based on ribosomal RNA sequences and concatenated host-related pathways for the eight C. concisus strains were generated using the neighbor-joining method, of which the 16S rRNA gene and peptidoglycan biosynthesis grouped the C. concisus strains according to their pathogenic phenotypes. Furthermore, 25 non-synonymous amino acid changes with 14 affecting functional domains, were identified within proteins of conserved host-related pathways, which had possible associations with the pathogenic potential of C. concisus strains. Finally, the genomes of the eight C. concisus strains were compared to the nine available genomes of the well-established pathogen Campylobacter jejuni, which identified several important differences in the respiration pathways of these two species. Our findings indicate that C. concisus strains are genetically diverse, and suggest the genomes of this bacterium contain respiration pathways and modifications in the peptidoglycan layer that may play an important role in its virulence.
Campylobacter concisus; Comparative genomics; Pathogenesis; Phylogeny; Peptidoglycan; Respiration; Campylobacter jejuni
The draft genome of the domestic pig (Sus scrofa) has recently been published, permitting refined analysis of the transcriptome. Pig breeds have been reported to differ in their resistance to infectious disease. In this study we examine whether there are corresponding differences in gene expression in innate immune cells.
We demonstrate that macrophages can be harvested from three different compartments of the pig (lungs, blood and bone-marrow), cryopreserved and subsequently recovered and differentiated in CSF-1. We have performed surface marker analysis and gene expression profiling on macrophages from these compartments, comparing twenty-five animals from five different breeds and their response to lipopolysaccharide. The results provide a clear distinction between alveolar macrophages (AM) and monocyte-derived (MDM) and bone-marrow-derived macrophages (BMDM). In particular, the lung macrophages express the growth factor receptor FLT1 and its ligand VEGFA at high levels, suggesting a distinct pathway of growth regulation. Relatively few genes showed breed-specific differential expression, notably CXCR2 and CD302 in alveolar macrophages. In contrast, there was substantial inter-individual variation between pigs within breeds, mostly affecting genes annotated as being involved in immune responses.
Pig macrophages more closely resemble human than mouse macrophages in their set of macrophage-expressed and LPS-inducible genes. Future research will address whether inter-individual variation in macrophage gene expression is heritable, and might form the basis for selective breeding for disease resistance.
Pig; Macrophages; Microarray; Breed; Lipopolysaccharide
The insect exoskeleton provides shape, waterproofing, and locomotion via attached somatic muscles. The exoskeleton is renewed during molting, a process regulated by ecdysteroid hormones. The holometabolous pupa transforms into an adult during the imaginal molt, when the epidermis synthesizes the definitive exoskeleton that then differentiates progressively. An important issue in insect development concerns how the exoskeletal regions are constructed to provide their morphological, physiological and mechanical functions. We used whole-genome oligonucleotide microarrays to screen for genes involved in exoskeletal formation in the honeybee thoracic dorsum. Our analysis included three sampling times during the pupal-to-adult molt, i.e., before, during and after the ecdysteroid-induced apolysis that triggers synthesis of the adult exoskeleton.
Gene ontology annotation based on orthologous relationships with Drosophila melanogaster genes placed the honeybee differentially expressed genes (DEGs) into distinct categories of Biological Process and Molecular Function, depending on developmental time, revealing the functional elements required for adult exoskeleton formation. Of the 1,253 unique DEGs, 547 were upregulated in the thoracic dorsum after apolysis, suggesting induction by the ecdysteroid pulse. The upregulated gene set included 20 of the 47 cuticular protein (CP) genes that were previously identified in the honeybee genome, and three novel putative CP genes that do not belong to a known CP family. In situ hybridization showed that two of the novel genes were abundantly expressed in the epidermis during adult exoskeleton formation, strongly implicating them as genuine CP genes. Conserved sequence motifs identified the CP genes as members of the CPR, Tweedle, Apidermin, CPF, CPLCP1 and Analogous-to-Peritrophins families. Furthermore, 28 of the 36 muscle-related DEGs were upregulated during the de novo formation of striated fibers attached to the exoskeleton. A search for cis-regulatory motifs in the 5′-untranslated region of the DEGs revealed potential binding sites for known transcription factors. Construction of a regulatory network showed that various upregulated CP- and muscle-related genes (15 and 21 genes, respectively) share common elements, suggesting co-regulation during thoracic exoskeleton formation.
These findings help reveal molecular aspects of rigid thoracic exoskeleton formation during the ecdysteroid-coordinated pupal-to-adult molt in the honeybee.
Cuticular protein genes; Metamorphosis; Molt; Thoracic musculature; Microarrays; Honeybee; Apis mellifera
The exonization of transposable elements (TEs) has proven to be a significant mechanism for the creation of novel exons. Existing knowledge of the retention patterns of TE exons in mRNAs was mainly established by the analysis of Expressed Sequence Tag (EST) data and microarray data.
This study seeks to validate and extend previous studies on the expression of TE exons by an integrative statistical analysis of high throughput RNA sequencing data. We collected 26 RNA-seq datasets spanning multiple tissues and cancer types. The exon-level digital expressions (indicating retention rates in mRNAs) were quantified by a double normalized measure, called the rescaled RPKM (Reads Per Kilobase of exon model per Million mapped reads). We analyzed the distribution profiles and the variability (across samples and between tissue/disease groups) of TE exon expressions, and compared them with those of other constitutive or cassette exons. We inferred the effects of four genomic factors, including the location, length, cognate TE family and TE nucleotide proportion (RTE, see Methods section) of a TE exon, on the exons’ expression level and expression variability. We also investigated the biological implications of an assembly of highly-expressed TE exons.
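The two quantities named above, RPKM and expression variability (CV), can be sketched as follows. This shows only the standard RPKM formula and a coefficient of variation; the additional rescaling step that gives the "rescaled RPKM" is not specified in this text and is deliberately not reproduced. Function names are illustrative, and the use of the population standard deviation for the CV is an assumption.

```python
from statistics import mean, pstdev

def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of exon model per Million mapped reads.

    RPKM = reads / (exon length in kb * mapped reads in millions)
         = reads * 1e9 / (exon length in bp * total mapped reads)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and library size must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)

def coefficient_of_variation(values) -> float:
    """CV = standard deviation / mean (population SD, an assumed convention)."""
    values = list(values)
    m = mean(values)
    if m == 0:
        return float("nan")  # CV is undefined when the mean is zero
    return pstdev(values) / m

# Hypothetical exon: 50 reads on a 200 bp exon in a library of 10 million mapped reads.
print(rpkm(50, 200, 10_000_000))
```

Normalizing by both exon length and library size is what makes expression values comparable across exons of different lengths and across samples sequenced to different depths.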
Our analysis confirmed prior studies from the following four aspects. First, with relatively high expression variability, most TE exons in mRNAs, especially those without exact counterparts in the UCSC RefSeq (Reference Sequence) gene tables, demonstrate low but still detectable expression levels in most tissue samples. Second, the TE exons in coding DNA sequences (CDSs) are less highly expressed than those in 3′ (5′) untranslated regions (UTRs). Third, the exons derived from chronologically ancient repeat elements, such as MIRs, tend to be highly expressed in comparison with those derived from younger TEs. Fourth, the previously observed negative relationship between the lengths of exons and the inclusion levels in transcripts is also true for exonized TEs. Furthermore, our study resulted in several novel findings. They include: (1) for the TE exons with non-zero expression and as shown in most of the studied biological samples, a high TE nucleotide proportion leads to their lower retention rates in mRNAs; (2) the considered genomic features (i.e. a continuous variable such as the exon length or a category indicator such as 3′UTR) influence the expression level and the expression variability (CV) of TE exons in an inverse manner; (3) not only the exons derived from Alu elements but also the exons from the TEs of other families were preferentially established in zinc finger (ZNF) genes.
The stress response in bacteria involves the multistage control of gene expression but is not entirely understood. To identify the translational response of bacteria in stress conditions and assess its contribution to the regulation of gene expression, the translational states of all mRNAs were compared under optimal growth condition and during nutrient (isoleucine) starvation.
A genome-scale study of the translational response to nutritional limitation was performed in the model bacterium Lactococcus lactis. Two measures were used to assess the translational status of each individual mRNA: the fraction engaged in translation (ribosome occupancy) and ribosome density (number of ribosomes per 100 nucleotides). Under isoleucine starvation, half of the mRNAs considered were translationally down-regulated mainly due to decreased ribosome density. This pattern concerned genes involved in growth-related functions such as translation, transcription, and the metabolism of fatty acids, phospholipids and bases, contributing to the slowdown of growth. Only 4% of the mRNAs were translationally up-regulated, mostly related to prophagic expression in response to stress. The remaining genes exhibited antagonistic regulations of the two markers of translation. Ribosome occupancy increased significantly for all the genes involved in the biosynthesis of isoleucine, although their ribosome density had decreased. The results revealed complex translational regulation of this pathway, essential to cope with isoleucine starvation.
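The two translational measures defined above can be written out explicitly. The sketch below assumes per-mRNA inputs (ribosome counts, transcript length, copy numbers) are already available; the classification rule at the end is a simplified, hypothetical reading of "up-regulated", "down-regulated" and "antagonistic", not the criteria used in the study.

```python
def ribosome_density(ribosomes_per_mrna: float, mrna_length_nt: int) -> float:
    """Number of ribosomes per 100 nucleotides of mRNA."""
    if mrna_length_nt <= 0:
        raise ValueError("mRNA length must be positive")
    return 100.0 * ribosomes_per_mrna / mrna_length_nt

def ribosome_occupancy(translated_copies: float, total_copies: float) -> float:
    """Fraction of copies of an mRNA that are engaged in translation."""
    if total_copies <= 0:
        raise ValueError("total copies must be positive")
    return translated_copies / total_copies

def classify_response(occupancy_ref: float, occupancy_stress: float,
                      density_ref: float, density_stress: float) -> str:
    """Hypothetical rule combining the direction of change of both markers."""
    changes = (occupancy_stress - occupancy_ref, density_stress - density_ref)
    went_up = any(c > 0 for c in changes)
    went_down = any(c < 0 for c in changes)
    if went_up and went_down:
        return "antagonistic"  # the two markers move in opposite directions
    if went_down:
        return "down"
    if went_up:
        return "up"
    return "unchanged"

# Hypothetical mRNA: occupancy rises under starvation while density falls.
print(classify_response(0.7, 0.9, 1.2, 0.8))
```

A real analysis would compare changes against a significance threshold rather than the sign alone; the sign-only rule is used here just to make the three categories concrete.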
To elucidate the regulation of global gene expression more generally, translational regulation was compared to transcriptional regulation under isoleucine starvation and to other post-transcriptional regulations related to mRNA degradation and mRNA dilution by growth. Translational regulation appeared to accentuate the effects of transcriptional changes for down-regulated growth-related functions under isoleucine starvation although mRNA stabilization and lower dilution by growth counterbalanced this effect.
We show that the contribution of translational regulation to the control of gene expression is significant in the stress response. Post-transcriptional regulation is complex and not systematically co-directional with transcription regulation. Post-transcriptional regulation is important to the understanding of gene expression control.
Translational regulation; Stress; Bacterial adaptation; Gene expression regulation; Post-transcriptional regulation; Lactococcus lactis
As is true for many other antibiotic-resistant Gram-negative pathogens, members of the Burkholderia cepacia complex (BCC) are currently being assessed for their susceptibility to phage therapy as an antimicrobial treatment. The objective of this study was to perform genomic and limited functional characterization of the novel BCC phage JG068 (vB_BceP_JG068).
JG068 is a podovirus that forms large, clear plaques on Burkholderia cenocepacia K56-2. Host range analysis indicates that this phage can infect environmental, clinical, and epidemic isolates of Burkholderia multivorans, B. cenocepacia, Burkholderia stabilis, and Burkholderia dolosa, likely through interaction with the host lipopolysaccharide as a receptor. The JG068 chromosome is 41,604 base pairs (bp) in length and is flanked by 216 bp short direct terminal repeats. Gene expression originates from both host and phage promoters and is in the forward direction for all 49 open reading frames. The genome sequence shows similarity to Ralstonia phage ϕRSB1, Caulobacter phage Cd1, and uncharacterized genetic loci of blood disease bacterium R229 and Burkholderia pseudomallei 1710b. CoreGenesUniqueGenes analysis indicates that JG068 belongs to the Autographivirinae subfamily and ϕKMV-like phages genus. Modules within the genome encode proteins involved in DNA-binding, morphogenesis, and lysis, but none associated with pathogenicity or lysogeny. Similar to the signal-arrest-release (SAR) endolysin of ϕKMV, inducible expression of the JG068 SAR endolysin causes lysis of Escherichia coli that is dependent on the presence of an N-terminal signal sequence. In an in vivo assay using the Galleria mellonella infection model, treatment of B. cenocepacia K56-2-infected larvae with JG068 results in a significant increase in larval survival.
As JG068 has a broad host range, does not encode virulence factors, is obligately lytic, and has activity against an epidemic B. cenocepacia strain in vivo, this phage is a highly promising candidate for BCC phage therapy development.
Burkholderia cepacia complex; Phage therapy; Autographivirinae; ϕKMV-like phages; SAR endolysin; Galleria mellonella
Chinese cabbage (Brassica rapa ssp. pekinensis) is one of the most important leaf vegetables grown worldwide and has undergone thousands of years of cultivation and artificial selection. The entire Chinese cabbage genome sequence and more than forty thousand proteins have been obtained to date. The genome has undergone triplication events since its divergence from Arabidopsis thaliana (13 to 17 Mya); however, a high degree of sequence similarity and conserved genome structure remain between the two species. Arabidopsis is therefore a viable reference species for comparative genomics studies. Variation in the number of members in gene families due to genome triplication may contribute to the broad range of phenotypic plasticity and the increased tolerance to environmental extremes observed in Brassica species. Transcription factors are important regulators involved in plant developmental and physiological processes. The AP2/ERF proteins, one of the most important families of transcriptional regulators, play a crucial role in plant growth and in response to biotic and abiotic stressors. Our analysis will provide resources for understanding the tolerance mechanisms in Brassica rapa ssp. pekinensis.
In the present study, 291 putative AP2/ERF transcription factor proteins were identified from the Chinese cabbage genome database, and compared with proteins from 15 additional species. The Chinese cabbage AP2/ERF superfamily was classified into four families, including AP2, ERF, RAV, and Soloist. The ERF family was further divided into DREB and ERF subfamilies. The AP2/ERF superfamily was subsequently divided into 15 groups. The identification, classification, phylogenetic reconstruction, conserved motifs, chromosome distribution, functional annotation, expression patterns, and interaction networks of the AP2/ERF transcription factor superfamily were predicted and analyzed. Distribution mapping results showed AP2/ERF superfamily genes were localized on the 10 Chinese cabbage chromosomes. AP2/ERF transcription factor expression levels exhibited differences among six tissue types based on expressed sequence tags (ESTs). In the AP2/ERF superfamily, 214 orthologous genes were identified between Chinese cabbage and Arabidopsis. Orthologous gene interaction networks were constructed, and included seven CBF and four AP2 genes, primarily involved in cold regulatory pathways and ovule development, respectively.
The evolution of the AP2/ERF transcription factor superfamily in Chinese cabbage resulted from genome triplication and tandem duplications. A comprehensive analysis of the physiological functions and biological roles of AP2/ERF superfamily genes in Chinese cabbage is required to fully elucidate the AP2/ERF superfamily, which provides rich resources and opportunities for understanding crop stress tolerance mechanisms.
Chinese cabbage; AP2/ERF; Stress tolerance; Gene expression; Interaction network; Protein annotation
The origin, evolution and speciation of the lion have been subjects of interest, debate and study. The surviving lions of the genus Panthera comprise eight subspecies, including the Asiatic lion, Panthera leo persica, of India's Gir forest. The other seven subspecies are found in different parts of Africa. There have been different opinions regarding the phylogenetic status of Panthera leo, as well as the classification of lions from different geographic regions into subspecies and races. In the present study, the mitogenome sequence of P. leo persica was deduced using the Ion Torrent PGM to assess its phylogeny and evolution, which may play an increasingly important role in conservation biology.
The mtDNA sequence of P. leo persica is 17,057 bp in length with 40.8% GC content. Annotation of the mitogenome revealed a total of 37 genes, including 13 protein-coding, 2 rRNA and 22 tRNA genes. Phylogenetic analysis based on the whole mitogenome suggests Panthera pardus as a neighbouring species to P. leo, with species divergence at ~2.96 mya.
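The GC content figure reported above is a simple proportion, and a minimal Python sketch of how such a value is typically computed may help readers reproduce it from a FASTA sequence. The function name `gc_content` and the toy sequence are illustrative assumptions; this is not the authors' pipeline, and ambiguous bases are simply excluded from the denominator here.

```python
def gc_content(sequence: str) -> float:
    """Return the GC content of a DNA sequence as a percentage."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the result.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


if __name__ == "__main__":
    toy = "ATGCGCNNAT"  # hypothetical toy sequence, not the lion mitogenome
    print(f"GC content: {gc_content(toy):.1f}%")
```

Applied to the full 17,057 bp mitogenome sequence, a function of this kind would be expected to return a value close to the reported 40.8%, provided ambiguous bases are handled the same way.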
This work presents the first report on the complete mitogenome of Panthera leo persica. It sheds light on the phylogenetic and evolutionary status within and across Felidae members. The results, compared and evaluated against earlier reports on Felidae, show alterations in phylogenetic status and species evolution. This study may provide information on genetic diversity and population stability.
Asiatic lion; Big cats; Panthera leo persica; Mitogenome; Ion torrent; Phylogeny; Evolution; Felidae; Divergence time
Measuring allelic RNA expression ratios is a powerful approach for detecting cis-acting regulatory variants, RNA editing, loss of heterozygosity in cancer, copy number variation, and allele-specific epigenetic gene silencing. Whole transcriptome RNA sequencing (RNA-Seq) has emerged as a genome-wide tool for identifying allelic expression imbalance (AEI), but numerous factors bias allelic RNA ratio measurements. Here, we compare RNA-Seq allelic ratios measured in nine different human brain regions with a highly sensitive and accurate SNaPshot measure of allelic RNA ratios, identifying factors affecting reliable allelic ratio measurement. Accounting for these factors, we subsequently surveyed the variability of RNA editing across brain regions and across individuals.
We find that RNA-Seq allelic ratios from standard alignment methods correlate poorly with SNaPshot, but applying alternative alignment strategies and correcting for observed biases significantly improves correlations. Deploying these methods on a transcriptome-wide basis in nine brain regions from a single individual, we identified genes with AEI across all regions (SLC1A3, NHP2L1) and many others with region-specific AEI. In dorsolateral prefrontal cortex (DLPFC) tissues from 14 individuals, we found evidence for frequent regulatory variants affecting RNA expression in tens to hundreds of genes, depending on stringency for assigning AEI. Further, we find that the extent and variability of RNA editing is similar across brain regions and across individuals.
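To make the notion of an allelic ratio concrete, the following is a minimal Python sketch of how an allelic ratio and a naive test for imbalance can be computed from read counts at a heterozygous site. It assumes a simple exact binomial test against an expected 0.5 ratio; the function names are hypothetical, and the alignment strategies and bias corrections described above are not reproduced here.

```python
from math import comb


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a heterozygous site that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total


def binomial_two_sided_p(ref_reads: int, alt_reads: int, expected: float = 0.5) -> float:
    """Exact two-sided binomial p-value for departure from the expected ratio.

    Intended for modest read depths (up to a few hundred reads); very deep
    sites would need a log-space or library implementation instead.
    """
    n = ref_reads + alt_reads
    probs = [comb(n, k) * expected**k * (1 - expected) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Sum every outcome that is at most as likely as the observed one.
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    ref, alt = 30, 10  # hypothetical read counts, not data from the study
    print(allelic_ratio(ref, alt), binomial_two_sided_p(ref, alt))
```

A raw test of this kind treats every read as unbiased, which is exactly the assumption the comparison with SNaPshot calls into question; in practice the expected ratio would need to be adjusted for mapping bias before imbalance is called.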
These results identify critical factors affecting allelic ratios measured by RNA-Seq and provide a foundation for using this technology to screen allelic RNA expression on a transcriptome-wide basis. Using this technology as a screening tool reveals tens to hundreds of genes harboring frequent functional variants affecting RNA expression in the human brain. With respect to RNA editing, the similarities within and between individuals lead us to conclude that this post-transcriptional process is under heavy regulatory influence to maintain an optimal degree of editing for normal biological function.
RNA-Seq; Whole transcriptome; Allele expression; mRNA expression; Functional genetics; Regulatory polymorphism; eQTL; Read alignment; Next generation sequencing; Bioinformatics
Introgressive hybridization is an important evolutionary process that can lead to the creation of novel genome structures and thus potentially new genetic variation for selection to act upon. On the other hand, hybridization with introduced species can threaten native species, such as cutthroat trout (Oncorhynchus clarkii) following the introduction of rainbow trout (O. mykiss). Neither the evolutionary consequences nor the conservation implications of rainbow trout introgression in cutthroat trout are well understood. Therefore, we generated a genetic linkage map for rainbow-Yellowstone cutthroat trout (O. clarkii bouvieri) hybrids to evaluate genome processes that may help explain how introgression affects hybrid genome evolution.
The hybrid map closely aligned with the rainbow trout map (a cutthroat trout map does not exist), sharing all but one linkage group. This linkage group (RYHyb20) represented a fusion between an acrocentric (Omy28) and a metacentric chromosome (Omy20) in rainbow trout. Additional mapping in Yellowstone cutthroat trout indicated the two rainbow trout homologues were fused in the Yellowstone genome. Variation in the number of hybrid linkage groups (28 or 29) likely depended on a Robertsonian rearrangement polymorphism within the rainbow trout stock. Comparison between the female-merged F1 map and a female consensus rainbow trout map revealed that introgression suppressed recombination across large genomic regions in 5 hybrid linkage groups. Two of these linkage groups (RYHyb20 and RYHyb25_29) contained confirmed chromosome rearrangements between rainbow and Yellowstone cutthroat trout indicating that rearrangements may suppress recombination. The frequency of allelic and genotypic segregation distortion varied among parents and families, suggesting few incompatibilities exist between rainbow and Yellowstone cutthroat trout genomes.
Chromosome rearrangements suppressed recombination in the hybrids. This result supports several previous findings demonstrating that recombination suppression restricts gene flow between chromosomes that differ by arrangement. Conservation of synteny and map order between the hybrid and rainbow trout maps and minimal segregation distortion in the hybrids suggest rainbow and Yellowstone cutthroat trout genomes freely introgress across chromosomes with similar arrangement. Taken together, these results suggest that rearrangements impede introgression. Recombination suppression across rearrangements could enable large portions of non-recombined chromosomes to persist within admixed populations.
Mammalian hibernators display phenotypes similar to physiological responses to calorie restriction and fasting, sleep, cold exposure, and ischemia-reperfusion in non-hibernating species. Whether biochemical changes evident during hibernation have parallels in non-hibernating systems on molecular and genetic levels is unclear.
We identified the molecular signatures of torpor and arousal episodes during hibernation using a custom-designed microarray for the Arctic ground squirrel (Urocitellus parryii) and compared them with molecular signatures of selected mouse phenotypes. Our results indicate that differential gene expression related to metabolism during hibernation is associated with that during calorie restriction and that the nuclear receptor protein PPARα is potentially crucial for metabolic remodeling in torpor. Sleep-wake cycle-related and temperature response genes follow the same expression changes as during the torpor-arousal cycle. Increased fatty acid metabolism occurs during hibernation but not during ischemia-reperfusion injury in mice and, thus, might contribute to protection against ischemia-reperfusion during hibernation.
In this study, we systematically compared hibernation with alternative phenotypes to reveal novel mechanisms that might be used therapeutically in human pathological conditions.
Hibernation; Microarray; Molecular signatures; Liver; Urocitellus parryii
Atlantic halibut (Hippoglossus hippoglossus) is a high-value, niche market species for cold-water marine aquaculture. Production of monosex female stocks is desirable in commercial production since females grow faster and mature later than males. Understanding the sex determination mechanism and developing sex-associated markers will shorten the time for the development of monosex female production, thus decreasing the costs of farming.
Halibut juveniles were masculinised with 17 α-methyldihydrotestosterone (MDHT) and grown to maturity. Progeny groups from four treated males were reared and sexed. Two of these groups (n = 26 and 70) consisted of only females, while the other two (n = 30 and 71) contained balanced sex ratios (50% and 48% females respectively). DNA from parents and offspring from the two mixed-sex families was used as a template for Restriction-site Associated DNA (RAD) sequencing. The 648 million raw reads produced 90,105 unique RAD-tags. A linkage map consisting of 24 linkage groups, which corresponds to the number of chromosome pairs in this species, was constructed based on 5703 Single Nucleotide Polymorphism (SNP) markers and 7 microsatellites. A major sex determining locus was mapped to linkage group 13 in both families. Assays for 10 SNPs with significant association with phenotypic sex were tested in both population data and in 3 additional families. Using a variety of machine-learning algorithms, 97% correct classification could be obtained, with the 3% of errors being phenotypic males predicted to be females.
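The progeny sex ratios above carry the argument for an XX/XY system: an all-female family is what a masculinised XX sire would produce, while a balanced family is what a genetic XY male would produce. The following Python sketch shows one common way to test a family against a 1:1 ratio, a chi-square goodness-of-fit test with one degree of freedom. The function name is hypothetical and this is not necessarily the test the authors used; the 15:15 count is derived from the reported 50% females in the n = 30 family, while the 26:0 count restates the all-female n = 26 family.

```python
from math import erfc, sqrt
from typing import Tuple


def sex_ratio_chi_square(females: int, males: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test of a progeny group against a 1:1 sex ratio."""
    total = females + males
    if total == 0:
        raise ValueError("progeny group is empty")
    expected = total / 2
    chi2 = ((females - expected) ** 2 + (males - expected) ** 2) / expected
    # With 1 degree of freedom the chi-square survival function is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Balanced family: 50% females among n = 30 offspring.
    print(sex_ratio_chi_square(15, 15))
    # All-female family: n = 26 offspring, no males.
    print(sex_ratio_chi_square(26, 0))
```

A large p-value for the balanced families and a very small one for the all-female families would be consistent with the interpretation given in the text.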
Altogether our findings support the hypothesis that the Atlantic halibut has an XX/XY sex determination system. Assays are described for sex-associated DNA markers developed from the RAD sequencing analysis to fast track progeny testing and implement monosex female halibut production for an immediate improvement in productivity. These should also help to speed up the inclusion of neomales derived from many families to maintain a larger effective population size and ensure long-term improvement through selective breeding.
Hippoglossus hippoglossus; Sex determination; Monosex; QTL mapping; RAD-seq; Aquaculture
Ginseng, including North American ginseng (Panax quinquefolius L.), is one of the most widely used medicinal plants. Its success is thought to be due to a diverse collection of ginsenosides that serve as its major bioactive compounds. However, few genomic resources exist and the details concerning its various biosynthetic pathways remain poorly understood. As the root is the primary tissue harvested commercially for ginsenosides, next generation sequencing was applied to the characterization and assembly of the root transcriptome throughout seasonal development. Transcripts showing homology to ginsenoside biosynthesis enzymes were profiled in greater detail.
RNA extracts from root samples from seven development stages of North American ginseng were subjected to 454 sequencing, filtered for quality and used in the de novo assembly of a collective root reference transcriptome consisting of 41,623 transcripts. Annotation efforts using a number of public databases resulted in detailed annotation information for 34,801 (84%) transcripts. In addition, 3,955 genes were assigned to metabolic pathways using the Kyoto Encyclopedia of Genes and Genomes. Among our results, we found all of the known enzymes involved in ginsenoside backbone biosynthesis and used co-expression analysis to identify a number of candidate sequences involved in the latter stages of the ginsenoside biosynthesis pathway. Transcript profiles suggest ginsenoside biosynthesis occurs at distinct stages of development.
The assembly generated provides a comprehensive annotated reference for future transcriptomic study of North American ginseng. A collection of putative ginsenoside biosynthesis genes were identified and candidate genes predicted from the lesser understood downstream stages of biosynthesis. Transcript expression profiles across seasonal development suggest a primary dammarane-type ginsenoside biosynthesis occurs just prior to plant senescence, with secondary ginsenoside production occurring throughout development. Data from the study provide a valuable resource for conducting future ginsenoside biosynthesis research in this important medicinal plant.
North American ginseng; Transcriptome; Next generation sequencing; Ginsenoside
Longan is a tropical/subtropical fruit tree of great economic importance in Southeast Asia. Progress in understanding the molecular mechanisms of longan embryogenesis, which is the primary influence on fruit quality and yield, is slowed by the lack of transcriptomic and genomic information. Illumina second generation sequencing is suitable for generating enormous numbers of transcript sequences that can be used for functional genomic analysis of longan.
In this study, a longan embryogenic callus (EC) cDNA library was sequenced using an Illumina HiSeq 2000 system. A total of 64,876,258 clean reads comprising 5.84 Gb of nucleotides were assembled into 68,925 unigenes of 448-bp mean length, with unigenes ≥1000 bp accounting for 8.26% of the total. Using BLASTx, 40,634 unigenes were found to have significant similarity with accessions in the Nr and Swiss-Prot databases. Of these, 38,845 unigenes were assigned to 43 GO sub-categories and 17,118 unigenes were classified into 25 COG sub-groups. In addition, 17,306 unigenes mapped to 199 KEGG pathways, with the categories of Metabolic pathways, Plant-pathogen interaction, Biosynthesis of secondary metabolites, and Genetic information processing being well represented. Analyses of unigenes ≥1000 bp revealed 328 embryogenesis-related unigenes as well as numerous unigenes expressed in EC associated with functions of reproductive growth, such as flowering, gametophytogenesis, and fertility, and vegetative growth, such as root and shoot growth. Furthermore, 23 unigenes related to embryogenesis and reproductive and vegetative growth were validated by quantitative real time PCR (qPCR) in samples from different stages of longan somatic embryogenesis (SE); their differential expression in the various embryogenic cultures indicated their possible roles in longan SE.
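The headline numbers in this paragraph can be sanity-checked with a few lines of arithmetic. The Python sketch below derives the implied mean read length and the implied number of unigenes of at least 1000 bp from the reported totals. It assumes that "Gb" means 10^9 bases; the function names are hypothetical and the derived values are back-calculations, not figures stated by the authors.

```python
def mean_read_length(total_bases: float, read_count: int) -> float:
    """Average number of bases per read implied by the sequencing totals."""
    return total_bases / read_count


def count_from_fraction(total_items: int, fraction: float) -> int:
    """Number of items implied by a reported fraction of a total, rounded to the nearest integer."""
    return round(total_items * fraction)


if __name__ == "__main__":
    clean_reads = 64_876_258   # reported clean reads
    total_bases = 5.84e9       # reported 5.84 Gb, assuming 1 Gb = 1e9 bases
    unigenes = 68_925          # reported unigene count
    long_fraction = 0.0826     # reported share of unigenes >= 1000 bp

    print(f"Implied mean read length: {mean_read_length(total_bases, clean_reads):.1f} bp")
    print(f"Implied unigenes >= 1000 bp: {count_from_fraction(unigenes, long_fraction)}")
```

The implied read length comes out near 90 bp, which is in the range expected for short-read Illumina data and suggests the reported totals are internally consistent.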
The quantity and variety of expressed EC genes identified in this study is sufficient to serve as a global transcriptome dataset for longan EC and to provide more molecular resources for longan functional genomics.
Powdery mildew (Blumeria graminis f. sp. tritici) is one of the most damaging diseases of wheat. The objective of this study was to identify the wheat genomic regions that are involved in the control of powdery mildew resistance through a quantitative trait loci (QTL) meta-analysis approach. This meta-analysis allows the use of collected QTL data from different published studies to obtain consensus QTL across different genetic backgrounds, thus providing a better definition of the regions responsible for the trait, and the possibility to obtain molecular markers that will be suitable for marker-assisted selection.
Five QTL for resistance to powdery mildew were identified under field conditions in the durum-wheat segregating population Creso × Pedroso. An integrated map was developed for the projection of resistance genes/alleles and the QTL from the present study and the literature, and to investigate their distribution in the wheat genome. Molecular markers that correspond to candidate genes for plant responses to pathogens were also projected onto the map, particularly considering NBS-LRR and receptor-like protein kinases. More than 80 independent QTL and 51 resistance genes from 62 different mapping populations were projected onto the consensus map using the Biomercator statistical software. Twenty-four MQTL that comprised 2–6 initial QTL that had widely varying confidence intervals were found on 15 chromosomes. The co-location of the resistance QTL and genes was investigated. Moreover, from analysis of the sequences of DArT markers, 28 DArT clones mapped on wheat chromosomes have been shown to be associated with the NBS-LRR genes and positioned in the same regions as the MQTL for powdery mildew resistance.
The results from the present study provide a detailed analysis of the genetic basis of resistance to powdery mildew in wheat. The study of the Creso × Pedroso durum-wheat population has revealed some QTL that had not been previously identified. Furthermore, the analysis of the co-localization of resistance loci and functional markers provides a large list of candidate genes and opens up a new perspective for the fine mapping and isolation of resistance genes, and for the marker-assisted improvement of resistance in wheat.
Wheat; Powdery mildew; MQTL; Collinearity; Resistance gene
Agaves are succulent monocotyledonous plants native to xeric environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis), and existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits.
Here, we present comprehensive, high quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, built from short-read RNA-seq data. Our analyses support completeness and accuracy of the de novo transcriptome assemblies, with each species having a minimum of approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, a focus on the transcriptomics of the A. deserti juvenile leaf confirms evolutionary conservation of monocotyledonous leaf physiology and development along the proximal-distal axis.
Our work presents a comprehensive transcriptome resource for two Agave species and provides insight into their biology and physiology. These resources are a foundation for further investigation of agave biology and their improvement for bioenergy development.
RNA-seq; Bioenergy; Crassulacean acid metabolism; de novo transcriptome assembly; Q420 Alternative energy sources