MicroRNAs (miRNAs) are small non-coding RNAs found to regulate several biological processes including adipogenesis. Understanding adipose tissue regulation is critical for beef cattle as fat is an important determinant of beef quality and nutrient value. This study analyzed the association between genomic context characteristics of miRNAs with their expression and function in bovine adipose tissue. Twenty-four subcutaneous adipose tissue biopsies were obtained from eight British-continental crossbred steers at 3 different time points. Total RNA was extracted and miRNAs were profiled using a miRNA microarray with expression further validated by qRT-PCR.
A total of 224 miRNAs were detected, of which 155 were expressed in all steers (n = 8) and were defined as the core miRNAs of bovine subcutaneous adipose tissue. Core adipose miRNAs varied in terms of genomic location (59.5% intergenic, 38.7% intronic, 1.2% exonic, and 0.6% mirtron), organization (55.5% non-clustered and 44.5% clustered), and conservation (49% highly conserved, 14% conserved and 37% poorly conserved). Clustered miRNAs and highly conserved miRNAs were more highly expressed (p < 0.05) and had more predicted targets than non-clustered or less conserved miRNAs (p < 0.001). A total of 34 miRNAs were coordinately expressed, being part of six identified relevant networks. Two intronic miRNAs (miR-33a and miR-1281) were confirmed to have coordinated expression with their host genes, transcription factor SREBF2 and EP300 (a transcriptional co-activator of transcription factor C/EBPα), respectively, which are involved in lipid metabolism, suggesting these miRNAs may also play a role in regulation of bovine lipid metabolism/adipogenesis. Furthermore, a total of 17 bovine specific miRNAs were predicted to be involved in the regulation of energy balance in adipose tissue.
These findings improve our understanding of the behavior of miRNAs in the regulation of bovine adipogenesis and fat metabolism, as they reveal that miRNA expression patterns and functions are associated with miRNA genomic location, organization and conservation.
Adipogenesis; Adipose tissue; Bovine; Fat metabolism; Genomic context; microRNA; Cluster; Co-expression; Species specific
Aneuploidy, a karyotype deviating from multiples of a haploid chromosome set, affects the physiology of eukaryotes. In humans, aneuploidy is linked to pathological defects such as developmental abnormalities, mental retardation or cancer, but the underlying mechanisms remain elusive. There are many different types and origins of aneuploidy, but whether there is a uniform cellular response to aneuploidy in human cells has not been addressed so far.
Here we evaluate the transcription profiles of eleven trisomic and tetrasomic cell lines and two cell lines with complex aneuploid karyotypes. We identify a characteristic aneuploidy response pattern defined by upregulation of genes linked to endoplasmic reticulum, Golgi apparatus and lysosomes, and downregulation of DNA replication, transcription as well as ribosomes. Strikingly, complex aneuploidy elicits the same transcriptional changes as trisomy. To uncover the triggers of the response, we compared the profiles with transcription changes in human cells subjected to stress conditions. Interestingly, we found an overlap only with the response to treatment with the autophagy inhibitor bafilomycin A1. Finally, we identified 23 genes whose expression is significantly altered in all aneuploids and which may thus serve as aneuploidy markers.
Our analysis shows that despite the variability in chromosome content, aneuploidy triggers a uniform transcriptional response in human cells. A common response independent of the type of aneuploidy might be exploited as a novel target for cancer therapy. Moreover, the potential aneuploidy markers identified in our analysis might represent novel biomarkers to assess the malignant potential of a tumor.
Saccharomyces cerevisiae strains isolated from natural settings form structured biofilm colonies that are equipped with intricate protective mechanisms. These wild strains are able to reprogram themselves with a certain frequency during cultivation in plentiful laboratory conditions. The resulting domesticated strains switch off certain protective mechanisms and form smooth colonies that resemble those of common laboratory strains.
Here, we show that domestication can be reversed when a domesticated strain is challenged by various adverse conditions; the resulting feral strain restores its ability to form structured biofilm colonies. Phenotypic, microscopic and transcriptomic analyses show that phenotypic transition is a complex process that affects various aspects of feral strain physiology; it leads to a phenotype that resembles the original wild strain in some aspects and the domesticated derivative in others. We specify the genetic determinants that are likely involved in the formation of structured biofilm colonies. In addition to FLO11, these determinants include genes that affect the cell wall and membrane composition. We also identify changes occurring during phenotypic transitions that affect other properties of phenotypic strain-variants, such as resistance to the impact of environmental stress. Here we document the regulatory role of the histone deacetylase Hda1p in developing such a resistance.
We provide detailed analysis of transcriptomic and phenotypic modulations of three related S. cerevisiae strains that arose by phenotypic switching under diverse environmental conditions. We identify changes specifically related to a strain’s ability to create complex structured colonies; we also show that other changes, such as genome rearrangement(s), are unrelated to this ability. Finally, we identify the importance of histone deacetylase Hda1p in strain resistance to stresses.
Biofilm colony; Histone deacetylase; Phenotypic switching; Wild yeast strains
Genome-wide expression profiles are altered during biological aging and can describe molecular regulation of tissue degeneration. Age-regulated mRNA expression trends from cross-sectional studies could describe how aging progresses. We developed a novel statistical methodology to identify age-regulated expression trends in cross-sectional datasets.
We studied six cross-sectional RNA expression profiles from different human tissues. Our methodology, capable of overcoming technical and genetic background differences, identified age-regulation in four of the tissues. For the identification of expression trends, five regression models were compared and the quadratic model was found to be the most suitable for this study. After k-means clustering of the age-associated probes, expression trends were found to change at two major age-positions in brain cortex and in Vastus lateralis muscles. The first age-position was found to occur during the fifth decade and a later one during the eighth decade. In kidney cortex, however, only one age-position was identified, correlating with a late age-position. Functional mapping of genes at each age-position suggests that calcium homeostasis and lipid metabolism are initially affected and subsequently, in the elderly, mitochondria, apoptosis and hormonal signaling pathways are affected.
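The comparison of regression models mentioned above can be illustrated with a minimal sketch. It fits a linear and a quadratic model to a synthetic expression trend by ordinary least squares and compares their residual sums of squares (RSS). The data, the function names `polyfit` and `rss`, and the use of RSS as the comparison criterion are assumptions made for illustration; the abstract does not state which criterion was used, and RSS alone always favours the model with more parameters, so a real comparison would penalise model complexity.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients ordered from the constant term upwards.
    """
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def rss(xs, ys, coeffs):
    """Residual sum of squares of a fitted polynomial."""
    return sum(
        (y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )


# Synthetic probe (hypothetical values): age in decades, expression peaking mid-life.
age_decades = [2, 3, 4, 5, 6, 7, 8]
expression = [5.5, 8.0, 9.5, 10.0, 9.5, 8.0, 5.5]

linear = polyfit(age_decades, expression, 1)
quadratic = polyfit(age_decades, expression, 2)
rss_linear = rss(age_decades, expression, linear)
rss_quadratic = rss(age_decades, expression, quadratic)
```

On this synthetic trend the linear model cannot capture the change of direction around the fifth decade, whereas the quadratic model reproduces it, which is the kind of behaviour that makes a quadratic term useful for non-monotonic age trends.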
Our results suggest that age-associated temporal changes in human tissues progress at distinct age-positions, which differ between tissues and in their molecular composition.
Human aging; Expression profiles; Quadratic regression model; Kmeans clustering
Campylobacter jejuni and C. coli share a multitude of risk factors associated with human gastrointestinal disease, yet their phylogeny differs significantly. C. jejuni is scattered into several lineages, with no apparent linkage, whereas C. coli clusters into three distinct phylogenetic groups (clades) of which clade 1 has shown extensive genome-wide introgression with C. jejuni, yet the other two clades (2 and 3) have less than 2% of C. jejuni ancestry. We characterized a C. coli strain (76339) with four novel multilocus sequence type alleles (ST-5088) and having the capability to express gamma-glutamyltranspeptidase (GGT); an accessory feature in C. jejuni. Our aim was to further characterize unintrogressed C. coli clades 2 and 3, using comparative genomics and with additional genome sequences available, to investigate the impact of horizontal gene transfer in shaping the accessory and core gene pools in unintrogressed C. coli.
Here, we present the first fully closed C. coli clade 3 genome (76339). The phylogenomic analysis of strain 76339 revealed that it belonged to clade 3 of unintrogressed C. coli. A more extensive respiratory metabolism among unintrogressed C. coli strains was found compared to introgressed C. coli (clade 1). We also identified other genes, such as serine proteases and an active sialyltransferase in the lipooligosaccharide locus, not present in C. coli clade 1 and we further propose a unique scenario for the evolution of Campylobacter ggt.
We propose new insights into the evolution of the accessory genome of C. coli clade 3 and C. jejuni. Also, in silico analysis of the gene content revealed that C. coli clades 2 and 3 have genes associated with infection, suggesting they are potent human pathogens and may currently be underreported in human infections due to niche separation.
Campylobacter coli; Comparative genomics; Phylogeny; Gamma glutamyltranspeptidase; Sialyltransferase
Methylation on the fifth position of cytosine (5-mC) is an essential epigenetic mark that is linked to both normal neurodevelopment and neurological diseases. The recent identification of another modified form of cytosine, 5-hydroxymethylcytosine (5-hmC), in both stem cells and post-mitotic neurons, raises new questions as to the role of this base in mediating epigenetic effects. Genomic studies of these marks using model systems are limited, particularly with array-based tools, because the standard method of detecting DNA methylation cannot distinguish between 5-mC and 5-hmC and most methods have been developed to only survey the human genome.
We show that non-human data generated using the optimization of a widely used human DNA methylation array, designed only to detect 5-mC, reproducibly distinguishes tissue types within and between chimpanzee, rhesus, and mouse, with correlations near the human DNA level (R² > 0.99). Genome-wide methylation analysis, using this approach, reveals 6,102 differentially methylated loci between rhesus placental and fetal tissues with pathways analysis significantly overrepresented for developmental processes. Restricting the analysis to oncogenes and tumor suppressor genes finds 76 differentially methylated loci, suggesting that rhesus placental tissue carries a cancer epigenetic signature. Similarly, adapting the assay to detect 5-hmC finds highly reproducible 5-hmC levels within human, rhesus, and mouse brain tissue that are species-specific, with a hierarchical abundance among the three species (human > rhesus >> mouse). Annotation of 5-hmC with respect to gene structure reveals a significant prevalence in the 3'UTR and an association with chromatin-related ontological terms, suggesting an epigenetic feedback loop mechanism for 5-hmC.
Together, these data show that this array-based methylation assay is generalizable to all mammals for the detection of both 5-mC and 5-hmC, greatly improving the utility of mammalian model systems to study the role of epigenetics in human health, disease, and evolution.
Epigenetics; DNA methylation; 5-hydroxymethylcytosine (5-hmC); Evolution
Cathelicidins comprise a major group of host-defence peptides. Conserved across a wide range of species, they have several functions related to host defence. Only one cathelicidin has been found in humans but several cathelicidin genes occur in the bovine genome. We propose that these molecules may have a protective role against mastitis. The aim of this study was to characterise the cathelicidin gene-cluster in the bovine genome and to identify sites of expression in the bovine mammary gland.
Bioinformatic analysis of the bovine genome (BosTau7) revealed seven protein-coding cathelicidin genes, CATHL1-7, including two identical copies of CATHL4, as well as three additional putative cathelicidin genes, all clustered on the long arm of chromosome 22. Six of the seven protein-coding genes were expressed in leukocytes extracted from milk of high somatic cell count (SCC) cows. CATHL5 was expressed across several sites in the mammary gland, but did not increase in response to Staphylococcus aureus infection.
Here, we characterise the bovine cathelicidin gene cluster and reconcile inconsistencies in the datasets of previous studies. Constitutive cathelicidin expression in the mammary gland suggests a possible role for these host defence peptides in its protection.
Cathelicidin; Hidden Markov Model (HMM); Gene cluster; Locus; Tissue expression
Pea has a complex genome of 4.3 Gb for which only limited genomic resources are available to date. Although SNP markers are now highly valuable for research and modern breeding, only a few are described and used in pea for genetic diversity and linkage analysis.
We developed a large resource by cDNA sequencing of 8 genotypes representative of modern breeding material using the Roche 454 technology, combining both long reads (400 bp) and high coverage (3.8 million reads, reaching a total of 1,369 megabases). Sequencing data were assembled and generated a 68 K unigene set, from which 41 K were annotated from their best blast hit against the model species Medicago truncatula. Annotated contigs showed an even distribution along M. truncatula pseudochromosomes, suggesting a good representation of the pea genome. 10 K pea contigs were found to be polymorphic among the genetic material surveyed, corresponding to 35 K SNPs.
We validated a subset of 1538 SNPs through the GoldenGate assay, proving their ability to structure a diversity panel of breeding germplasm. Among them, 1340 were genetically mapped and used to build a new consensus map comprising a total of 2070 markers. Based on blast analysis, we could establish 1252 bridges between our pea consensus map and the pseudochromosomes of M. truncatula, which provides new insight on synteny between the two species.
Our approach created significant new resources in pea, i.e. the most comprehensive genetic map to date tightly linked to the model species M. truncatula and a large SNP resource for both academic research and breeding.
Pisum sativum; Medicago truncatula; Next generation sequencing; Genetic diversity; Composite genetic map; Synteny; Marker assisted selection
Paracoccus aminophilus JCM 7686 is a methylotrophic α-Proteobacterium capable of utilizing reduced one-carbon compounds as sole carbon and energy source for growth, including toxic N,N-dimethylformamide, formamide, methanol, and methylamines, which are widely used in industry. P. aminophilus JCM 7686, like many other Paracoccus spp., possesses a genome with a multipartite structure, in which the genomic information is split between various replicons, including chromids: essential plasmid-like replicons with properties of both chromosomes and plasmids. In this study, whole-genome sequencing and functional genomics approaches were applied to investigate P. aminophilus genome information.
The P. aminophilus JCM 7686 genome has a multipartite structure, composed of a single circular chromosome and eight additional replicons ranging in size between 5.6 and 438.1 kb. Functional analyses revealed that two of the replicons, pAMI5 and pAMI6, are essential for host viability, therefore they should be considered as chromids. Both replicons carry housekeeping genes, e.g. responsible for de novo NAD biosynthesis and ammonium transport. Other mobile genetic elements have also been identified, including 20 insertion sequences, 4 transposons and 10 prophage regions, one of which represents a novel, functional serine recombinase-encoding bacteriophage, ϕPam-6. Moreover, in silico analyses allowed us to predict the transcription regulatory network of the JCM 7686 strain, as well as components of the stress response, recombination, repair and methylation machineries. Finally, comparative genomic analyses revealed that P. aminophilus JCM 7686 has a relatively distant relationship to other representatives of the genus Paracoccus.
P. aminophilus genome exploration provided insights into the overall structure and functions of the genome, with a special focus on the chromids. Based on the obtained results we propose the classification of bacterial chromids into two types: “primary” chromids, which are indispensable for host viability and “secondary” chromids, which are essential, but only under some environmental conditions and which were probably formed quite recently in the course of evolution. Detailed genome investigation and its functional analysis make P. aminophilus JCM 7686 a suitable reference strain for the genus Paracoccus. Moreover, this study has increased knowledge on overall genome structure and composition of members within the class Alphaproteobacteria.
Paracoccus aminophilus JCM 7686; Genome; Chromid; Plasmid; Mobile genetic element; Bacteriophage
Currently, six commercial whole-genome SNP chips are available for cattle genotyping, produced by two different genotyping platforms. Technical issues need to be addressed to combine data that originate from the different platforms, or from different versions of the same array generated by the manufacturer. For example: i) genome coordinates for SNPs may refer to different genome assemblies; ii) reference genome sequences are updated over time, changing the positions of, or even removing, sequences which contain SNPs; iii) not all commercial SNP IDs are searchable within public databases; iv) SNPs can be coded using different formats and referencing different strands (e.g. A/B or A/C/T/G alleles, referencing forward/reverse, top/bottom or plus/minus strand); v) due to new information being discovered, higher density chips do not necessarily include all the SNPs present in the lower density chips; and vi) SNP IDs may not be consistent across chips and platforms. Most researchers and breed associations manage SNP data in real-time and thus require tools to standardise data in a user-friendly manner.
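Issue iv) above can be made concrete with a short sketch of the simplest case, converting A/C/G/T alleles between the forward and reverse strands. The function names are hypothetical and the sketch does not cover A/B or top/bottom coding, which need chip-specific annotation. It also flags A/T and C/G SNPs, whose alleles read the same on either strand, so their strand cannot be inferred from the alleles alone.

```python
# Watson-Crick complements used to move an allele to the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(alleles):
    """Return the alleles of a SNP as read on the opposite strand."""
    return tuple(COMPLEMENT[allele] for allele in alleles)


def is_strand_ambiguous(alleles):
    """True for A/T and C/G SNPs, which look identical on both strands."""
    return set(flip_strand(alleles)) == set(alleles)


flipped = flip_strand(("A", "G"))           # alleles on the opposite strand
ambiguous_at = is_strand_ambiguous(("A", "T"))
ambiguous_ag = is_strand_ambiguous(("A", "G"))
```

In practice, merging genotypes from two sources would combine such a flip with strand information taken from the chip manifest rather than from the alleles themselves.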
Here we present SNPchiMp, a MySQL database linked to an open access web-based interface. Features of this interface include, but are not limited to, the following functions: 1) referencing the SNP mapping information to the latest genome assembly, 2) extraction of information contained in dbSNP for SNPs present in all commercially available bovine chips, and 3) identification of SNPs in common between two or more bovine chips (e.g. for SNP imputation from lower to higher density). In addition, SNPchiMp can retrieve this information on subsets of SNPs, accessing such data either via physical position on a supported assembly, or by a list of SNP IDs, rs or ss identifiers.
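The third function listed above, finding SNPs shared between chips, amounts to a set intersection. The sketch below shows the idea in memory with made-up chip names and SNP identifiers; it is not SNPchiMp's interface or schema, which the abstract does not describe beyond a MySQL database behind a web front end.

```python
def common_snps(chips, names):
    """Return the SNP identifiers present on every chip listed in `names`."""
    return set.intersection(*(chips[name] for name in names))


# Hypothetical chip contents keyed by a hypothetical chip name.
chips = {
    "low_density": {"rs0001", "rs0002", "rs0003"},
    "high_density": {"rs0002", "rs0003", "rs0004", "rs0005"},
}

shared = common_snps(chips, ["low_density", "high_density"])
only_low = chips["low_density"] - chips["high_density"]
```

The set difference `only_low` illustrates point v) of the background: a lower density chip can carry SNPs that the higher density chip lacks, which matters when choosing markers for imputation.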
This tool combines many different sources of information, that otherwise are time consuming to obtain and difficult to integrate. The SNPchiMp not only provides the information in a user-friendly format, but also enables researchers to perform a large number of operations with a few clicks of the mouse. This significantly reduces the time needed to execute the large number of operations required to manage SNP data.
SNP chip; SNP data; Relational database; Assembly; Imputation; GWAS; Bovine livestock
Alternative splicing is an important process in higher eukaryotes that allows obtaining several transcripts from one gene. A specific case of alternative splicing is mutually exclusive splicing, in which exactly one exon out of a cluster of neighbouring exons is spliced into the mature transcript. Recently, a new algorithm for the prediction of these exons has been developed based on the preconditions that the exons of the cluster have similar lengths, sequence homology, and conserved splice sites, and that they are translated in the same reading frame.
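The preconditions listed above can be sketched as a simple filter over a pair of neighbouring exons. This is an illustration under stated assumptions, not the published prediction algorithm: the thresholds are hypothetical, `difflib.SequenceMatcher` stands in for a proper sequence alignment, a length difference divisible by three is used as a simplifying proxy for "same reading frame", and the splice-site check is omitted because it needs the flanking intron sequence.

```python
from difflib import SequenceMatcher


def is_mxe_candidate(exon_a, exon_b, max_len_diff=20, min_similarity=0.5):
    """Check two neighbouring exon sequences against simplified MXE criteria.

    max_len_diff and min_similarity are hypothetical thresholds.
    """
    len_a, len_b = len(exon_a), len(exon_b)
    # Similar length: the exons should differ by only a few nucleotides.
    similar_length = abs(len_a - len_b) <= max_len_diff
    # Reading-frame proxy: swapping the exons must not shift the downstream frame.
    same_frame = (len_a - len_b) % 3 == 0
    # Sequence similarity: a crude stand-in for an alignment-based homology score.
    similarity = SequenceMatcher(None, exon_a, exon_b).ratio()
    return similar_length and same_frame and similarity >= min_similarity


# Toy sequences (hypothetical): two point substitutions versus a one-base insertion.
exon_1 = "ATGGCCAAGTTCGAT"
exon_2 = "ATGGCTAAGTTTGAT"
exon_3 = "ATGGCCAAGTTCGATA"

pair_ok = is_mxe_candidate(exon_1, exon_2)
pair_frameshift = is_mxe_candidate(exon_1, exon_3)
```

Loosening `min_similarity` corresponds to the idea, mentioned for the web interface, of adjusting prediction parameters to reveal more divergent exon candidates.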
In this contribution we introduce Kassiopeia, a database and web application for the generation, storage, and presentation of genome-wide analyses of mutually exclusive exomes. Currently, Kassiopeia provides access to the mutually exclusive exomes of twelve Drosophila species, the thale cress Arabidopsis thaliana, the roundworm Caenorhabditis elegans, and human. Mutually exclusive spliced exons (MXEs) were predicted based on gene reconstructions from Scipio. Based on the standard prediction values, with which 83.5% of the annotated MXEs of Drosophila melanogaster were reconstructed, the exomes contain surprisingly more MXEs than previously supposed and identified. The user can search Kassiopeia using BLAST or browse the genes of each species, optionally adjusting the parameters used for the prediction to reveal more divergent or only very similar exon candidates.
We developed a pipeline to predict MXEs in the genomes of several model organisms and a web interface, Kassiopeia, for their visualization. For each gene Kassiopeia provides a comprehensive gene structure scheme, the sequences and predicted secondary structures of the MXEs, and, if available, further evidence for MXE candidates from cDNA/EST data, predictions of MXEs in homologous genes of closely related species, and RNA secondary structure predictions. Kassiopeia can be accessed at http://www.motorprotein.de/kassiopeia.
Mutually exclusive splicing; Database; Web application; Drosophila
Parkinson’s disease (PD) is complex and heterogeneous. The numerous susceptibility loci that have been identified reaffirm the complexity of PD but do not fully explain it; e.g., it is not known if any given PD susceptibility gene is associated with all PD or a disease subtype. We also suspect that important disease genes may have escaped detection because of this heterogeneity. We used presence/absence of family history to subdivide the cases and performed genome-wide association studies (GWAS) in Sporadic-PD and Familial-PD separately. The aim was to uncover new genes and gain insight into the genetic architecture of PD.
Employing GWAS on the NeuroGenetics Research Consortium (NGRC) dataset stratified by family history (1565 Sporadic-PD, 435 Familial-PD, 1986 controls), we identified a novel locus on chromosome 1p21 in Sporadic-PD (P_NGRC = 4×10⁻⁸) and replicated the finding (P_Replication = 6×10⁻³; P_Pooled = 4×10⁻¹⁰) in 1528 Sporadic-PD and 796 controls from the National Institute of Neurological Disorders and Stroke (NINDS) Repository. This is the fifth PD locus to be mapped to the short arm of chromosome 1. It is flanked by S1PR1 and OLFM3 genes, and is 200 kb from a multiple sclerosis susceptibility gene. The second aim of the study was to extend the stratified GWAS to the well-established PD genes. SNCA_rs356220 was associated with both Sporadic-PD (OR = 1.37, P = 1×10⁻⁹) and Familial-PD (OR = 1.40, P = 2×10⁻⁵). HLA_rs3129882 was more strongly associated with Sporadic-PD (OR = 1.38, P = 5×10⁻¹⁰) than Familial-PD (OR = 1.12, P = 0.15). In the MAPT region, virtually every single nucleotide polymorphism (SNP) had a stronger effect-size and lower P-value in Familial-PD (peak P = 8×10⁻⁷) than in Sporadic-PD (peak P = 2×10⁻⁵).
We discovered and replicated a new locus for Sporadic-PD which had escaped detection in un-stratified GWAS. This demonstrates that by stratifying on a key variable the power gained due to diminished heterogeneity can sometimes outweigh the power lost to reduced sample size. We also detected distinct patterns of disease associations for previously established PD susceptibility genes, which gives insight into the genetic architecture of the disease and could aid in the selection of appropriate study populations for future studies.
GWAS; Parkinson’s disease; SNCA; MAPT; HLA; Genetic heterogeneity; Secondary GWAS; Stratified GWAS; Chromosome 1p
The persimmon Diospyros kaki Thunb. is an important commercial and deciduous fruit tree. The fruits have proanthocyanidin (PA) content of >25% of the dry weight and are astringent. PAs cause astringency that is often undesirable for human consumption; thus, the removal of astringency is an important practice in the persimmon industry. Soluble PAs can be converted to insoluble PAs by enclosing the fruit in a polyethylene bag containing diluted ethanol. The genomic resource development of the persimmon is delayed because of its large and complex genome. Second-generation sequencing is an efficient technique for generating huge sequences that can represent a large number of genes and their expression levels.
We used 454 sequencing for the de novo transcriptome assembly of persimmon fruit treated with 5% ethanol (Tr library) and without treatment as the control (Co library) to investigate the genes and pathways that control PA biosynthesis and other secondary metabolites. We obtained 374.6 Mb in clean nucleotides comprising 624,690 and 626,203 clean sequencing reads from the Tr and Co libraries, respectively. We also identified 83,898 unigenes; 54,719 (~65.2%) unigenes were annotated based on similarity searches with known proteins. Up to 14,954 of the unigenes were assigned to the protein database Clusters of Orthologous Groups (COG), 24,337 were assigned to the term annotation database of Gene Ontology (GO), and 45,506 were assigned to 200 pathways in the database of Kyoto Encyclopedia of Genes and Genomes (KEGG). The two libraries were compared to identify the differentially expressed unigenes. The expression levels of genes involved in PA biosynthesis and tannin coagulation were analysed, and some of them were verified using quantitative real time PCR (qRT-PCR).
This study provides abundant genomic data for persimmon and offers comprehensive sequence resources for persimmon research. The transcriptome dataset will improve our understanding of the molecular mechanisms of tannin coagulation and other biochemical processes in persimmons.
Persimmon; Transcriptome analysis; 454 sequencing
Recent transcriptomic analysis of the bovine Y chromosome revealed at least six multi-copy protein coding gene families, including TSPY, HSFY and ZNF280BY, on the male-specific region (MSY). Previous studies indicated that the copy number variations (CNVs) of the human and bovine TSPY were associated with male fertility in men and cattle. However, the relationship between CNVs of the bovine Y-linked HSFY and ZNF280BY gene families and bull fertility has not been investigated.
We investigated the copy number (CN) of the bovine HSFY and ZNF280BY in a total of 460 bulls from 15 breeds using a quantitative PCR approach. We observed CNVs for both gene families within and between cattle breeds. The median copy number (MCN) of HSFY among all bulls was 197, ranging from 21 to 308. The MCN of ZNF280BY was 236, varying from 28 to 380. Furthermore, bulls in the Bos taurus (BTA) lineage had a significantly higher MCN (202) of HSFY than bulls in the Bos indicus (BIN) lineage (178), while taurine bulls had a significantly lower MCN (231) of ZNF280BY than indicine bulls (284). In addition, the CN of ZNF280BY was positively correlated to that of HSFY on the BTAY. Association analysis revealed that the CNVs of both HSFY and ZNF280BY were correlated negatively with testis size, while positively with sire conception rate.
The bovine HSFY and ZNF280BY gene families have extensively expanded on the Y chromosome during evolution. The CN of both gene families varies significantly among individuals and cattle breeds. These variations were associated with testis size and bull fertility in Holstein, suggesting that the CNVs of HSFY and ZNF280BY may serve as valuable markers for male fertility selection in cattle.
CNVs; HSFY; ZNF280BY; Male fertility; Testis size; Sire conception rate; Cattle
Nematode-trapping fungi are a unique group of organisms that can capture nematodes using sophisticated trapping structures. The genome of Drechslerella stenobrocha, a constricting-ring-forming fungus, has been sequenced and reported, and provided new insights into the evolutionary origins of nematode predation in fungi, the trapping mechanisms, and the dual lifestyles of saprophagy and predation.
The genome of the fungus Drechslerella stenobrocha, which mechanically traps nematodes using a constricting ring, was sequenced. The genome was 29.02 Mb in size and showed rarer instances of transposons and repeat-induced point mutations than that of Arthrobotrys oligospora. The functional proteins involved in nematode-infection, such as chitinases, subtilisins, and adhesive proteins, underwent a significant expansion in the A. oligospora genome, while there were fewer lectin genes that mediate fungus-nematode recognition in the D. stenobrocha genome. The carbohydrate-degrading enzyme catalogs in both species were similar to those of efficient cellulolytic fungi, suggesting a saprophytic origin of nematode-trapping fungi. In D. stenobrocha, the down-regulation of saprophytic enzyme genes and the up-regulation of infection-related genes during the capture of nematodes indicated a transition between dual life strategies of saprophagy and predation. The transcriptional profiles also indicated that trap formation was related to the protein kinase C (PKC) signal pathway and regulated by Zn(2)–C6 type transcription factors.
The genome of D. stenobrocha provides support for the hypothesis that nematode-trapping fungi evolved from saprophytic fungi in a high carbon and low nitrogen environment. It reveals the transition between saprophagy and predation of these fungi and also provides new insights into the mechanisms of mechanical trapping.
Nematode-trapping fungi; Comparative genomic analysis; Origin of nematode predation; Transcriptomes; Trapping mechanism
We explored the use of genotyping by sequencing (GBS) on a recombinant inbred line population (GPMx) derived from a cross between the two-rowed barley cultivar ‘Golden Promise’ (ari-e.GP/Vrs1) and the six-rowed cultivar ‘Morex’ (Ari-e/vrs1) to map plant height. We identified three Quantitative Trait Loci (QTL), the first in a region encompassing the spike architecture gene Vrs1 on chromosome 2H, the second in an uncharacterised centromeric region on chromosome 3H, and the third in a region of chromosome 5H coinciding with the previously described dwarfing gene Breviaristatum-e (Ari-e).
Barley cultivars in North-western Europe largely contain either of two dwarfing genes; Denso on chromosome 3H, a presumed ortholog of the rice green revolution gene OsSd1, or Breviaristatum-e (ari-e) on chromosome 5H. A recessive mutant allele of the latter gene, ari-e.GP, was introduced into cultivation via the cv. ‘Golden Promise’ that was a favourite of the Scottish malt whisky industry for many years and is still used in agriculture today.
Using GBS mapping data and phenotypic measurements we show that ari-e.GP maps to a small genetic interval on chromosome 5H and that alternative alleles at a region encompassing Vrs1 on 2H along with a region on chromosome 3H also influence plant height. The location of Ari-e is supported by analysis of near-isogenic lines containing different ari-e alleles. We explored use of the GBS to populate the region with sequence contigs from the recently released physically and genetically integrated barley genome sequence assembly as a step towards Ari-e gene identification.
GBS was an effective and relatively low-cost approach to rapidly construct a genetic map of the GPMx population that was suitable for genetic analysis of row type and height traits, allowing us to precisely position ari-e.GP on chromosome 5H. Mapping resolution was lower than we anticipated. We found the GBS data more complex to analyse than other data types but it did directly provide linked SNP markers for subsequent higher resolution genetic analysis.
Barley; Dwarfing gene; Genotyping by sequencing; Physical map
Repeat sequences are abundant in eukaryotic genomes but many are excluded from genome assemblies. In Drosophila melanogaster classical studies of repeat content suggested variability between individuals, but they lacked the precision of modern high throughput sequencing technologies. Genome-wide profiling of chromatin features such as histone tail modifications and DNA-binding proteins relies on alignment to the reference genome and hence excludes highly repetitive sequences.
By analyzing repeat libraries, sequence complexity and k-mer counts we determined the abundances of different D. melanogaster repeat classes in flies in two public datasets, DGRP and modENCODE. We found that larval DNA was depleted of all repeat classes relative to adult and embryonic DNA, as expected from the known depletion of repeat-rich pericentromeric regions during polytenization of larval tissues. By applying a method that is independent of alignment to the genome assembly, we found that satellite repeats associate with distinct H3 tail modifications, such as H3K9me2 and H3K9me3 for short repeats and H3K9me1 for 359 bp repeats. Short AT-rich repeats however are depleted of nucleosomes and hence all histone modifications and associated chromatin proteins.
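The alignment-independent k-mer counting mentioned above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the single-strand treatment (reverse complements are ignored) and the toy reads are assumptions made for illustration only.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the reads, skipping windows with non-ACGT symbols."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def repeat_fraction(reads, repeat_unit):
    """Fraction of k-mers that match any rotation of a tandem repeat unit.

    Assumption: only the given strand is considered; a real analysis would
    also count the reverse complement of each rotation.
    """
    unit = repeat_unit.upper()
    k = len(unit)
    rotations = {unit[i:] + unit[:i] for i in range(k)}
    counts = count_kmers(reads, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[r] for r in rotations) / total


if __name__ == "__main__":
    # Toy reads: one made entirely of the AATAT repeat, one unrelated.
    toy_reads = ["AATATAATAT", "ACGTACGTAC"]
    print(repeat_fraction(toy_reads, "AATAT"))
```

Because the counts come straight from the reads, the estimate does not depend on whether the repeat is represented in the genome assembly.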
The total repeat content and association of repeat sequences with chromatin modifications can be determined despite repeats being excluded from genome assemblies, revealing unexpected distinctions in chromatin features based on sequence composition.
DNA satellites; Next-generation sequencing; ChIP-seq; Histone modification
β-thalassaemia and sickle cell disease (SCD) are two of the most common monogenic diseases and are found in many populations worldwide. In both disorders the clinical severity is highly variable, with the persistence of fetal haemoglobin (HbF) being one of the major ameliorating factors. HbF levels are affected by, amongst other factors, single nucleotide polymorphisms (SNPs) at the BCL11A gene and the HBS1L-MYB intergenic region, which are located outside the β-globin locus. For this reason, we developed two multiplex assays that allow the genotyping of SNPs at these two genomic regions which have been shown to be associated with variable HbF levels in different populations.
Two multiplex assays based on the SNaPshot minisequencing approach were developed. The two assays can be used to simultaneously genotype twelve SNPs at the BCL11A gene and sixteen SNPs at the HBS1L-MYB intergenic region which were shown to modify HbF levels. The different genotypes can be determined based on the position and the fluorescent colour of the peaks in a single electropherogram. DNA sequencing and PCR-restriction fragment length polymorphism (PCR-RFLP) assays were used to verify genotyping results obtained by SNaPshot minisequencing.
In summary, we propose two multiplex assays based on the SNaPshot minisequencing approach for the simultaneous identification of SNPs located at the BCL11A gene and the HBS1L-MYB intergenic region which have an effect on HbF levels. The assays can be easily applied for accurate, time- and cost-efficient genotyping of the selected SNPs in various populations.
BCL11A; HBS1L-MYB; HbF; Thalassaemia; SCD; SNaPshot minisequencing; Multiplex PCR; Polymorphisms
Dense single nucleotide polymorphism (SNP) genotyping arrays provide extensive information on polymorphic variation across the genome of species of interest. Such information can be used in studies of the genetic architecture of quantitative traits and to improve the accuracy of selection in breeding programs. In Atlantic salmon (Salmo salar), these goals are currently hampered by the lack of a high-density SNP genotyping platform. Therefore, the aim of the study was to develop and test a dense Atlantic salmon SNP array.
SNP discovery was performed using extensive deep sequencing of Reduced Representation (RR-Seq), Restriction site-Associated DNA (RAD-Seq) and mRNA (RNA-Seq) libraries derived from farmed and wild Atlantic salmon samples (n = 283) resulting in the discovery of > 400 K putative SNPs. An Affymetrix Axiom® myDesign Custom Array was created and tested on samples of animals of wild and farmed origin (n = 96) revealing a total of 132,033 polymorphic SNPs with high call rate, good cluster separation on the array and stable Mendelian inheritance in our sample. At least 38% of these SNPs are from transcribed genomic regions and therefore more likely to include functional variants. Linkage analysis utilising the lack of male recombination in salmonids allowed the mapping of 40,214 SNPs distributed across all 29 pairs of chromosomes, highlighting the extensive genome-wide coverage of the SNPs. An identity-by-state clustering analysis revealed that the array can clearly distinguish between fish of different origins, within and between farmed and wild populations. Finally, Y-chromosome-specific probes included on the array provide an accurate molecular genetic test for sex.
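As an illustration of the identity-by-state (IBS) idea used above to separate fish of different origins, the following sketch computes a pairwise IBS similarity from SNP genotypes. The 0/1/2 allele-count coding, the use of `None` for missing calls and the function names are assumptions for this example; the abstract does not specify the software or coding used.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical-by-state between two samples.

    Genotypes are coded as counts of one allele (0, 1 or 2); None marks a
    missing call, and loci missing in either sample are skipped.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this locus
        compared += 1
    if compared == 0:
        raise ValueError("no loci with calls in both samples")
    return shared / (2 * compared)


def ibs_matrix(genotypes):
    """Symmetric matrix of pairwise IBS similarities for a list of samples."""
    n = len(genotypes)
    return [[ibs_similarity(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    samples = [
        [0, 1, 2, 2],
        [0, 1, 0, None],
        [2, 1, 0, 0],
    ]
    for row in ibs_matrix(samples):
        print(row)
```

A matrix of this kind (or one minus it, as a distance) is the typical input to a hierarchical clustering or multidimensional scaling step.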
This manuscript describes the first high-density SNP genotyping array for Atlantic salmon. This array will be publicly available and is likely to be used as a platform for high-resolution genetics research into traits of evolutionary and economic importance in salmonids and in aquaculture breeding programs via genomic selection.
Atlantic salmon; Salmo salar; Polymorphism; Single nucleotide polymorphism, SNP; Next-generation sequencing; Array; Genomics; Mapping; Genome duplication
Brassica juncea is economically important as a vegetable crop in China, an oil crop in India and a condiment crop in Europe, and has recently been selected for canola quality in Canada and Australia. B. juncea (2n = 36, AABB) is an allotetraploid derived from interspecific hybridization between B. rapa (2n = 20, AA) and B. nigra (2n = 16, BB), followed by spontaneous chromosome doubling.
Comparative genome analysis by genome survey sequence (GSS) of allopolyploid B. juncea with B. rapa was carried out based on high-throughput sequencing approaches. Over 28.35 Gb of GSS data were used for comparative analysis of B. juncea and B. rapa, producing 45.93% reads mapping to the B. rapa genome with a high ratio of single-end reads. Mapping data suggested more structure variation (SV) in the B. juncea genome than in B. rapa. We detected 2,921,310 single nucleotide polymorphisms (SNPs) with high heterozygosity and 113,368 SVs, including 1-3 bp Indels, between B. juncea and B. rapa. Non-synonymous polymorphisms in glucosinolate biosynthesis genes may account for differences in glucosinolate biosynthesis and glucosinolate components between B. juncea and B. rapa. Furthermore, we identified distinctive vernalization-dependent and photoperiod-dependent flowering pathways coexisting in allopolyploid B. juncea, suggesting contribution of these pathways to adaptation for survival during polyploidization.
Taken together, we propose that polyploidization has allowed for accelerated evolution of the glucosinolate biosynthesis and flowering pathways in B. juncea, which likely permits the phenotypic variation observed in the crop.
Brassica juncea; Comparative genome analysis; Flowering pathway; Genome survey sequencing; Glucosinolate biosynthesis
Divergence in gene regulation has emerged as a key mechanism underlying species differentiation. Comparative analysis of co-expression networks across species can reveal conservation and divergence in the regulation of genes.
We inferred co-expression networks of A. thaliana, Populus spp. and O. sativa using state-of-the-art methods based on mutual information and context likelihood of relatedness, and conducted a comprehensive comparison of these networks across a range of co-expression thresholds. In addition to quantifying gene-gene link and network neighbourhood conservation, we also applied recent advancements in network analysis to do cross-species comparisons of network properties such as scale free characteristics and gene centrality as well as network motifs. We found that in all species the networks emerged as scale free only above a certain co-expression threshold, and that the high-centrality genes upholding this organization tended to be conserved. Network motifs, in particular the feed-forward loop, were found to be significantly enriched in specific functional subnetworks but were much less conserved across species than gene centrality. Although individual gene-gene co-expression had massively diverged, up to ~80% of the genes still had a significantly conserved network neighbourhood. For genes with multiple predicted orthologs, about half had one ortholog with conserved regulation and another ortholog with diverged or non-conserved regulation. Furthermore, the most sequence similar ortholog was not the one with the most conserved gene regulation in over half of the cases.
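The combination of mutual information (MI) with context likelihood of relatedness (CLR) named above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes expression values have already been discretized into bins, estimates MI by simple plug-in counting, and computes each gene's background from its MI values to all other genes (excluding itself), which is one of several conventions in use.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discretized profiles."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        mi += (c / n) * math.log((c * n) / (px[a] * py[b]))
    return mi


def clr_scores(mi):
    """CLR score matrix from a symmetric MI matrix.

    Each MI value is converted to a z-score against the background of each
    gene's MI values, negative z-scores are clipped to zero, and the two
    z-scores of a gene pair are combined as sqrt(z_i^2 + z_j^2).
    """
    n = len(mi)
    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        background = [mi[i][j] for j in range(n) if j != i]
        mu, sd = mean(background), pstdev(background)
        for j in range(n):
            if j != i and sd > 0:
                z[i][j] = max(0.0, (mi[i][j] - mu) / sd)
    return [[math.sqrt(z[i][j] ** 2 + z[j][i] ** 2) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical binned expression profiles for three genes.
    profiles = [[0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]]
    mi = [[mutual_information(a, b) for b in profiles] for a in profiles]
    for row in clr_scores(mi):
        print(row)
```

Thresholding the resulting score matrix at different cut-offs yields networks of different density, which corresponds to comparing networks across a range of co-expression thresholds.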
We have provided a comprehensive analysis of gene regulation evolution in plants and built a web tool for Comparative analysis of Plant co-Expression networks (ComPlEx, http://complex.plantgenie.org/). The tool can be particularly useful for identifying the ortholog with the most conserved regulation among several sequence-similar alternatives and can thus be of practical importance in e.g. finding candidate genes for perturbation experiments.
Rice is considered a short day plant. Originally from tropical regions, rice has been progressively adapted to temperate climates and long day conditions, in part by modulating its sensitivity to day length. Heading date 3a (Hd3a) and RICE FLOWERING LOCUS T 1 (RFT1), which code for florigens, are known as major regulatory genes of floral transition in rice. Both Hd3a and RFT1 are regulated by Early heading date 1 (Ehd1) and Days to heading on chromosome 2 (DTH2), while Heading date 1 (Hd1) also governs Hd3a expression. To investigate the mechanism of rice adaptation to temperate climates, we analyzed the natural variation of these five genes in a collection of japonica rice representing the genetic diversity of long day cultivated rice.
We investigated polymorphisms of Hd3a, RFT1, Ehd1, Hd1 and DTH2 in a collection of 57 japonica varieties. Hd3a and RFT1 were highly conserved, displaying one major allele. Expression analysis suggested that RFT1 rather than Hd3a could be the pivotal gene controlling flowering under long day conditions. While few alleles were found in the Ehd1 promoter and DTH2 coding region, a high degree of variation in Hd1, including non-functional alleles, was observed. Correlation analysis between gene expression levels and flowering periods suggested the occurrence of other factors, in addition to Ehd1, affecting RFT1 regulation in long day adapted cultivars.
During domestication, rice expansion was accompanied by changes in the regulatory mechanism of flowering. The existence of non-functional Hd1 alleles and the lack of correlation of their presence with flowering times in plants grown under long day conditions indicate a minor role of this branch in this process and the existence of an alternative regulatory pathway in northern latitudes. Expression analysis data and a high degree of conservation of RFT1 suggested that this gene could be the main factor regulating flowering among japonica cultivars adapted to northern areas. In the absence of inhibition exerted by Hd1 through repression of Hd3a expression, the role of Ehd1 as a regulator of RFT1 and Hd3a appears to be reinforced. Data also indicated the occurrence of additional regulatory factors controlling flowering.
Flowering; Short day; Rice; Polymorphism; Natural variation
Prions are a particular type of amyloids related to a large variety of important processes in cells, but also responsible for serious diseases in mammals and humans. The number of experimentally characterized prions is still low and corresponds to a handful of examples in microorganisms and mammals. Prion aggregation is mediated by specific protein domains with a remarkable compositional bias towards glutamine/asparagine and against charged residues and prolines. These compositional features have been used to predict new prion proteins in the genomes of different organisms. Despite these efforts, there are only a few available data sources containing prion predictions at a genomic scale.
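The compositional bias described above (enrichment in glutamine/asparagine, depletion of charged residues and proline) can be illustrated with a simple sliding-window scan. This sketch is not the PrionScan scoring method; the window length and both thresholds are hypothetical values chosen only to make the example concrete.

```python
def qn_rich_windows(sequence, window=60, min_qn=0.3, max_charged_pro=0.1):
    """Return (start, Q/N fraction) for windows with prion-like composition.

    A window qualifies when its Q+N fraction is at least `min_qn` and its
    fraction of charged residues (D, E, K, R) plus proline is at most
    `max_charged_pro`. All three parameters are illustrative assumptions.
    """
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        win = seq[start:start + window]
        qn = sum(win.count(c) for c in "QN") / window
        depleted = sum(win.count(c) for c in "DEKRP") / window
        if qn >= min_qn and depleted <= max_charged_pro:
            hits.append((start, qn))
    return hits


if __name__ == "__main__":
    # Toy sequence: a Q/N-rich stretch followed by a charged/proline stretch.
    toy = "QNQNQNQNQN" + "DEKRPDEKRP"
    print(qn_rich_windows(toy, window=10))
```

A composition filter of this kind only flags candidate regions; a dedicated scoring scheme is still needed to rank them.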
Here we present PrionScan, a new database of predicted prion-like domains in complete proteomes. We have previously developed a predictive methodology to identify and score prionogenic stretches in protein sequences. In the present work, we exploit this approach to scan all the protein sequences in public databases and compile a repository containing relevant information on proteins bearing prion-like domains. The database is updated regularly alongside UniProtKB and in its present version contains approximately 28,000 predictions in proteins from different functional categories in more than 3200 organisms from all the taxonomic subdivisions. PrionScan can be used in two different ways: database query and analysis of protein sequences submitted by the users. In the first mode, simple queries retrieve a detailed description of the properties of a defined protein. Queries can also be combined to generate more complex and specific search patterns. In the second mode, users can submit and analyze their own sequences.
We expect that this database will provide relevant insights into prion functions and regulation from a genome-wide perspective, allowing researchers to perform cross-species prion biology studies. Our database might also be useful for guiding experimentalists in the identification of new candidates for further experimental characterization.
Prion domain; Protein aggregation; Amyloid fibrils; Prion prediction
Temperature sensitive lethal (tsl) mutants of the tephritid C. capitata are used extensively in control programs involving sterile insect technique in California. These flies are artificially reared and treated with ionizing radiation to render males sterile for further release en masse into the field to compete with wild males and disrupt establishment of invasive populations. Recent research suggests establishment of C. capitata in California, despite the fact that over 250 million sterile flies are released weekly as part of the state’s preventative program. In this project, genome-level quality assessment was performed, measured as expression differences between the Vienna-7 tsl mutants used in SIT programs and wild flies. RNA-seq was performed to provide a genome-wide map of the messenger RNA populations in C. capitata, and to investigate significant expression changes in Vienna-7 mass reared flies.
Flies from the Vienna-7 colony showed a markedly reduced abundance of transcripts related to visual and chemical responses, including light stimuli, neural development and signaling pathways when compared to wild flies. In addition, genes associated with muscle development and locomotion were shown to be reduced. This suggests that the Vienna-7 line may be less competitive in mating and host plant finding where these stimuli are utilized. Irradiated flies showed several transcripts representing stress associated with irradiation.
There are significant changes at the transcriptome level that likely alter the competitiveness of mass reared flies and provide justification for pursuing methods for strain improvement, increasing competitiveness of mass-reared flies, or exploring alternative SIT approaches to increase the efficiency of eradication programs.
Medfly; Ceratitis capitata; RNA-seq; Sterile insect technique; Irradiation; Sterilization
The immature fiber (im) mutant of Gossypium hirsutum L. is a special cotton fiber mutant with non-fluffy fibers. It has low dry weight and fineness of fibers due to developmental defects in fiber secondary cell wall (SCW).
We compared the cellulose content in fibers, the thickness of the fiber cell wall and fiber transcriptional profiles during SCW development in the im mutant and its near-isogenic wild-type line (NIL) TM-1. The im mutant had lower cellulose content and thinner cell walls than TM-1 at the same fiber developmental stage. From 25 to 35 days post-anthesis (DPA), sucrose content, an important carbon source for cellulose synthesis, was also significantly lower in the im mutant than in TM-1. Comparative analysis of fiber transcriptional profiles from 13 to 25 DPA indicated that the largest transcriptional variations between the two lines occurred at the onset of SCW development. TM-1 began SCW biosynthesis at approximately 16 DPA, whereas the same fiber developmental program in the im mutant was delayed until 19 DPA, suggesting an asynchronous fiber developmental program between TM-1 and the im mutant. Functional classification and enrichment analysis of differentially expressed genes (DEGs) between the two NILs indicated that genes associated with biological processes related to cellulose synthesis, secondary cell wall biogenesis, cell wall thickening and sucrose metabolism were significantly up-regulated in TM-1. Twelve genes related to carbohydrate metabolism were validated by quantitative reverse transcription PCR (qRT-PCR), which confirmed a temporal difference at the earlier transition and SCW biosynthesis stages of fiber development between TM-1 and the im mutant.
We propose that Im is an important regulatory gene influencing temporal differences in expression of genes related to fiber SCW biosynthesis. This study lays a foundation for cloning the Im gene, elucidating the molecular mechanism of fiber SCW development and further genetic manipulation for the improvement of fiber fineness and maturity.
Immature fiber mutant; Expression profiling; Fiber secondary cell wall thickening; Fiber micronaire; Microarray; Gossypium hirsutum