Cotton is the leading fiber crop worldwide. Gossypium barbadense is an important cotton species because of its extra-long staple fibers with superior luster and silkiness. However, cDNA sequences from developing G. barbadense fibers have not yet been systematically analyzed or utilized.
A total of 21,079 high-quality sequences were generated from two non-normalized cDNA libraries prepared from a mixture of G. barbadense Hai7124 fibers and ovules. After assembly, a set of 8,653 unigenes was obtained. Of those, 7,786 matched known proteins and 7,316 were assigned to functional categories. The molecular functions of these unigenes were mostly related to binding and catalytic activity, and carbohydrate, amino acid, and energy metabolism were the major contributors among the metabolism subsets. Sequence comparison between G. barbadense and G. hirsutum revealed that 8,245 G. barbadense unigenes were similar to publicly released G. hirsutum sequences, whereas the remaining 408 had no hits against the G. hirsutum unigene database. Furthermore, in silico analyses discovered 13,275 putative EST InDel loci reflecting orthologous and/or homoeologous differences between or within G. barbadense and G. hirsutum, and 2,160 InDel markers were developed from ESTs with more than five insertions or deletions. Gel electrophoresis combined with sequencing verification, using G. hirsutum acc. TM-1 and G. barbadense cv. Hai7124, reconfirmed 71.11% of the candidate InDel loci as orthologous and/or homoeologous polymorphisms. Blastx results showed that 81 of the 2,160 InDel loci had significant functional similarity to known genes associated with secondary wall synthesis, suggesting important roles in fiber quality in tetraploid cultivated cotton species.
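The in silico InDel screen described above can be illustrated with a minimal sketch. It assumes two EST sequences have already been pairwise aligned, with "-" as the gap character. The function name and the example sequences are hypothetical, and the abstract does not state which alignment tool or thresholds were actually used.

```python
def indel_runs(aligned_a, aligned_b, gap="-"):
    """Return (start_column, length, gapped_row) for each gap run.

    Both inputs must be rows of the same pairwise alignment (equal length).
    gapped_row is "a" or "b", naming the row that carries the gap.
    Columns are 0-based.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    current = None  # [start, length, row] of the gap run being extended
    for col, (x, y) in enumerate(zip(aligned_a, aligned_b)):
        if x == gap and y != gap:
            which = "a"
        elif y == gap and x != gap:
            which = "b"
        else:
            which = None  # no indel in this column
        if which is not None and current is not None and current[2] == which:
            current[1] += 1  # extend the open gap run
        else:
            if current is not None:
                runs.append(tuple(current))
            current = [col, 1, which] if which is not None else None
    if current is not None:
        runs.append(tuple(current))
    return runs


# Hypothetical aligned EST fragments (not data from the study).
gb = "ATGCC---GTTAGC"
gh = "ATGCCAAAGTT-GC"
print(indel_runs(gb, gh))
```

Each reported run is one candidate InDel; a real pipeline would additionally filter by run length and flanking sequence quality before designing primers.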
Sequence comparisons and InDel marker development will lay the groundwork for identifying genes related to superior agronomic traits and for studies of genetic differentiation and comparative genomics between G. hirsutum and G. barbadense.
Chromosomal rearrangements in the form of deletions, insertions, inversions and translocations are frequently observed in breast cancer genomes, and a subset of these rearrangements may play a crucial role in tumorigenesis. To identify novel somatic chromosomal rearrangements, we determined the genome structures of 15 hormone-receptor negative breast tumors by long-insert mate pair massively parallel sequencing.
We identified and validated 40 somatic structural alterations, including the recurring fusion between genes DDX10 and SKA3 and translocations involving the EPHA5 gene. Other rearrangements were found to affect genes in pathways involved in epigenetic regulation, mitosis and signal transduction, underscoring their potential role in breast tumorigenesis. RNA interference-mediated suppression of five candidate genes (DDX10, SKA3, EPHA5, CLTC and TNIK) led to inhibition of breast cancer cell growth. Moreover, downregulation of DDX10 in breast cancer cells led to an increased frequency of apoptotic nuclear morphology.
Using whole genome mate pair sequencing and RNA interference assays, we have discovered a number of novel gene rearrangements in breast cancer genomes and identified DDX10, SKA3, EPHA5, CLTC and TNIK as potential cancer genes with impact on the growth and proliferation of breast cancer cells.
Double strand (ds) DNA breaks are a form of DNA damage that can be generated from both genotoxic exposures and physiologic processes, can disrupt cellular functions and can be lethal if not repaired properly. Physiologic dsDNA breaks are generated in a variety of normal cellular functions, including the RAG endonuclease-mediated rearrangement of antigen receptor genes during the normal development of lymphocytes. We previously showed that physiologic breaks initiate lymphocyte development-specific transcriptional programs. Here we compare transcriptional responses to physiological DNA breaks with responses to genotoxic DNA damage induced by ionizing radiation.
We identified a central lymphocyte-specific transcriptional response common to both physiologic and genotoxic breaks, which includes many lymphocyte developmental processes. Genotoxic damage causes robust alterations to pathways associated with B cell activation and increased proliferation, suggesting that genotoxic damage initiates not only the normal B cell maturation processes but also mimics activated B cell response to antigenic agents. Notably, changes including elevated levels of expression of Kras and mmu-miR-155 and the repression of Socs1 were observed following genotoxic damage, reflecting induction of a cancer-prone phenotype.
Comparing these transcriptional responses provides a greater understanding of the mechanisms cells use in the differentiation between types of DNA damage and the potential consequences of different sources of damage. These results suggest genotoxic damage may induce a unique cancer-prone phenotype and processes mimicking activated B cell response to antigenic agents, as well as the normal B cell maturation processes.
miR-155; B cells; Ionizing radiation; DNA damage; Double strand breaks; Transcriptome profiles
Despite the importance of wheat as a major staple crop and the negative impact of diseases on its production worldwide, the genetic mechanisms and gene interactions involved in the resistance response in wheat are still poorly understood. The complete sequence of the rice genome has provided an extremely useful parallel road map for genetic and genomics studies in wheat. The recent construction of a defense response interactome in rice has the potential to further enhance the translation of advances in rice to wheat and other grasses. The objective of this study was to determine the degree of conservation in the protein-protein interactions in the rice and wheat defense response interactomes. As entry points we selected proteins that serve as key regulators of the rice defense response: the RAR1/SGT1/HSP90 protein complex, NPR1, XA21, and XB12 (XA21 interacting protein 12).
Using available wheat sequence databases and phylogenetic analyses, we identified and cloned the wheat orthologs of these four rice proteins, including recently duplicated paralogs, and their known direct interactors, and tested 86 binary protein interactions using yeast two-hybrid (Y2H) assays. All interactions between wheat proteins were further tested using in planta bimolecular fluorescence complementation (BiFC). Of the known rice interactions, 83% were confirmed when wheat proteins were tested with rice interactors and 76% were confirmed using wheat protein pairs. All interactions in the RAR1/SGT1/HSP90, NPR1 and XB12 nodes were confirmed for the identified orthologous wheat proteins, whereas only 44% of the interactions were confirmed in the interactome node centered on XA21. We hypothesize that this reduction may be associated with a different sub-functionalization history of the multiple duplications that occurred in this gene family after the divergence of the wheat and rice lineages.
The observed high conservation of interactions between proteins that serve as key regulators of the rice defense response suggests that the existing rice interactome can be used to predict interactions in wheat. Such predictions are less reliable for nodes that have undergone a different history of duplications and sub-functionalization in the two lineages.
Animal models are indispensable for understanding lipid metabolism and lipid metabolic diseases. Over the last decade, the nematode Caenorhabditis elegans has become a popular animal model for exploring the regulation of lipid metabolism, obesity, and obesity-related diseases. However, the genomic and functional conservation of lipid metabolism from C. elegans to humans remains unknown. In the present study, we systematically analyzed genes involved in lipid metabolism in the C. elegans genome using comparative genomics.
We built a database containing 471 lipid genes from the C. elegans genome and assigned most of them to 16 different lipid metabolic pathways that were integrated into a network. Over 70% of C. elegans lipid genes have human orthologs, with 237 of 471 C. elegans lipid genes being conserved in humans, mice, rats, and Drosophila, of which 71 genes are specifically related to human metabolic diseases. Moreover, RNA-mediated interference (RNAi) was used to disrupt the expression of the 356 of 471 lipid genes for which RNAi clones were available. We found that 21 genes strongly affect fat storage, development, reproduction, and other visible phenotypes, 6 of which have not previously been implicated in the regulation of fat metabolism and other phenotypes.
This study provides the first systematic genomic insight into lipid metabolism in C. elegans, supporting the use of C. elegans as an increasingly prominent model in the study of metabolic diseases.
Caenorhabditis elegans; Lipid metabolism; Comparative genomics; RNAi; Fat storage
Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor.
Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li’s D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li’s D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST.
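Within-sample diversity at a single site can be computed directly from the read counts per base. The sketch below uses a common unbiased per-site estimator, n/(n-1) * (1 - sum of squared allele frequencies). This is an assumption, since the abstract does not state which estimator was used, and the counts are invented for illustration.

```python
def site_diversity(base_counts):
    """Unbiased per-site nucleotide diversity from read counts per base.

    base_counts maps a base (e.g. "A") to the number of reads supporting it.
    Returns 0.0 when fewer than two reads cover the site.
    """
    n = sum(base_counts.values())
    if n < 2:
        return 0.0
    # Probability that two reads drawn with replacement carry the same base.
    homozygosity = sum((c / n) ** 2 for c in base_counts.values())
    # The n / (n - 1) factor corrects for sampling without replacement.
    return n / (n - 1) * (1.0 - homozygosity)


# Hypothetical pileup counts at one site (not data from the study).
print(round(site_diversity({"A": 90, "G": 10}), 4))
```

Averaging this quantity over sites, or over third codon positions only, gives the kind of within-sample mean diversity compared between the two viruses.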
This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens.
Apis mellifera; Deformed wing virus; Israel acute paralysis virus; Single nucleotide polymorphism; Population genomics
In an intercross between the SWR/J and BALB/c mouse strains, the pulmonary adenoma progression 1 (Papg1) locus on chromosome 4 modulates lung tumor size, one of several measures of lung tumor progression. This locus has not been fully characterized and defined in its extent and genetic content. Fine mapping of this and other loci affecting lung tumor phenotype is possible using recombinant inbred strains.
A population of 376 mice, obtained by crossing mice of the SWR/J strain with CXBN recombinant inbred mice, was treated with a single dose of urethane and assayed for multiplicity of large lung tumors (N2lung). A genome-wide analysis comparing N2lung with 6364 autosomal SNPs revealed multiple peaks of association. The Papg1 locus had two peaks, at rs3654162 (70.574 Mb, -logP=2.8) and rs6209043 (86.606 Mb, -logP=2.7), joined by an interval of weaker statistical association; these data confirm the presence of Papg1 on chromosome 4 and reduce the mapping region to two stretches of ~6.8 and ~4.2 Mb, in the proximal and distal peaks, respectively. The distal peak included Cdkn2a, a gene already proposed as being involved in Papg1 function. Other loci possibly modulating N2lung were detected on chromosomes 5, 8, 9, 11, 15, and 19, but analysis of linkage disequilibrium between these putative loci and the Papg1 locus suggested that only those on chromosomes 11 and 15 were true positives.
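The peak scores can be converted back to P values for easier reading. The sketch below assumes that -logP denotes the negative base-10 logarithm, which is conventional for association scans but is not stated explicitly in the text; the function name is hypothetical.

```python
def p_from_neg_log(neg_log_p):
    """Convert a -log10(P) score back to a P value."""
    return 10.0 ** (-neg_log_p)


# Peak scores quoted in the text for the two Papg1 SNPs.
for snp, score in [("rs3654162", 2.8), ("rs6209043", 2.7)]:
    print(snp, f"P = {p_from_neg_log(score):.2e}")
```

Under that assumption, both peaks correspond to nominal P values of roughly 0.002.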
These findings suggest that Papg1 most likely consists of two distinct, nearby loci, and point to putative additional loci on chromosomes 11 and 15 modulating lung tumor size. Within Papg1, Cdkn2a appears to be a strong candidate gene, while additional Papg1 genes await identification. Greater knowledge of the genetic and biochemical mechanisms underlying the germ-line modulation of lung tumor size in mice is relevant to other species, including humans, in that it may help identify new therapeutic targets in the fight against tumor progression.
Animal models; CXB recombinant inbred; Disease models; Genome-wide association study; Lung tumors; SNPs; Tumor multiplicity
The meninges (arachnoid and pial membranes) and associated vasculature (MAV) and choroid plexus are important in maintaining cerebrospinal fluid (CSF) generation and flow. MAV vasculature was previously observed to be adversely affected by environmentally-induced hyperthermia (EIH) and more so by a neurotoxic amphetamine (AMPH) exposure. Herein, microarray and RT-PCR analyses were used to compare the gene expression profiles of choroid plexus and MAV under control conditions and at 3 hours and 1 day after EIH or AMPH exposure. Since AMPH and EIH are so disruptive to vasculature, genes related to vasculature integrity and function were of interest.
Our data show that, under control conditions, many of the genes with relatively high expression in both the MAV and choroid plexus are also abundant in many epithelial tissues. These genes function in transport of water, ions, and solutes, and likely play a role in CSF regulation. Most genes that help form the blood–brain barrier (BBB) and tight junctions were also highly expressed in MAV but not in choroid plexus. In MAV, exposure to EIH and more so to AMPH decreased the expression of BBB-related genes such as Sox18, Ocln, and Cldn5, but they were much less affected in the choroid plexus. There was a correlation between the genes related to reactive oxidative stress and damage that were significantly altered in the MAV and choroid plexus after either EIH or AMPH. However, AMPH (at 3 hr) significantly affected about 5 times as many genes as EIH in the MAV, while in the choroid plexus EIH affected more genes than AMPH. Several unique genes that are not specifically related to vascular damage increased to a much greater extent after AMPH than after EIH in the MAV (Lbp, Reg3a, Reg3b, Slc15a1, Sct and Fst) and choroid plexus (Bmp4, Dio2 and Lbp).
Our study indicates that the disruption of choroid plexus function and damage produced by AMPH and EIH is significant, but the changes may not be as pronounced as they are in the MAV, particularly for AMPH. Expression profiles in the MAV and choroid plexus differed to some extent and differences were not restricted to vascular related genes.
Gene expression; Meninges; Cerebral vasculature; Choroid plexus; Cerebrospinal fluid; Amphetamines; Hyperthermia
Macrosatellite repeats (MSRs), usually spanning hundreds of kilobases of genomic DNA, comprise a significant proportion of the human genome. Because of their highly polymorphic nature, MSRs represent an extreme example of copy number variation, but their structure and function are largely understudied. Here, we describe a detailed study of six autosomal and two X chromosomal MSRs among 270 HapMap individuals from Central Europe, Asia and Africa. Copy number variation, stability and genetic heterogeneity of the autosomal macrosatellite repeats RS447 (chromosome 4p), MSR5p (5p), FLJ40296 (13q), RNU2 (17q) and D4Z4 (4q and 10q) and X chromosomal DXZ4 and CT47 were investigated.
Repeat array size distribution analysis shows that all of these MSRs are highly polymorphic with the most genetic variation among Africans and the least among Asians. A mitotic mutation rate of 0.4-2.2% was observed, exceeding meiotic mutation rates and possibly explaining the large size variability found for these MSRs. By means of a novel Bayesian approach, statistical support for a distinct multimodal rather than a uniform allele size distribution was detected in seven out of eight MSRs, with evidence for equidistant intervals between the modes.
The multimodal distributions with evidence for equidistant intervals, in combination with the observation of MSR-specific constraints on minimum array size, suggest that MSRs are limited in their configurations and that deviations from these configurations may cause disease, as is the case for facioscapulohumeral muscular dystrophy. However, at present we cannot exclude that there are mechanistic constraints for MSRs that are not directly disease-related. This is the first comprehensive study of MSRs in different human populations; it applies novel statistical methods and identifies commonalities and differences in their organization and function in the human genome.
Tandem repeat sequences; DNA copy number variations; Population genetics; Bayes theorem
A subset of breast cancer cells displays increased ability to self-renew and reproduce breast cancer heterogeneity. The characterization of these so-called putative breast tumor-initiating cells (BT-ICs) may open the road for novel therapeutic strategies. As microRNAs (miRNAs) control developmental programs in stem cells, BT-ICs may also rely on specific miRNA profiles for their sustained activity. To explore the notion that miRNAs may have a role in sustaining BT-ICs, we performed a comprehensive profiling of miRNA expression in a model of putative BT-ICs enriched by non-attachment growth conditions.
We found that breast cancer cells grown under non-attachment conditions display a unique pattern of miRNA expression, highlighted by markedly low expression of miR-30 family members relative to parental cells. We further show that miR-30a regulates non-attachment growth. A target screen revealed that the miR-30 family redundantly modulates the expression of apoptosis- and proliferation-related genes. At least one of these targets, the anti-apoptotic protein AVEN, was able to partially revert the effect of miR-30a overexpression. Finally, overexpression of miR-30a in vivo was associated with reduced breast tumor progression.
The miR-30 family regulates the growth of breast cancer cells in non-attachment conditions. This is the first analysis of target prediction in a whole family of microRNAs potentially involved in survival of putative BT-ICs.
Breast cancer; BT-ICs; Mammospheres; microRNAs; miR-30 family; AVEN
MicroRNAs (miRNAs) are 20–21 nucleotide RNA molecules that suppress the transcription of target genes and may also inhibit translation. Despite the thousands of miRNAs identified and validated in numerous plant species, only small numbers have been identified from the oilseed crop plant Brassica napus (canola) – especially in seeds.
Using next-generation sequencing technologies, we performed a comprehensive analysis of miRNAs during seed maturation at 9 time points from 10 days after flowering (DAF) to 50 DAF using whole seeds, and included separate analyses of radicle, hypocotyl, cotyledon, embryo, endosperm and seed coat tissues at 4 selected time points. We identified more than 500 conserved miRNA or variant unique sequences with >300 sequence reads and also found 10 novel miRNAs. Only 27 of the conserved miRNA sequences had been previously identified in B. napus (miRBase Release 18). More than 180 MIRNA loci were identified and annotated using the B. rapa genome as a surrogate for the B. napus A genome. Numerous miRNAs were expressed in a stage- or tissue-specific manner, suggesting that they have specific functions related to the fine tuning of transcript abundance during seed development. miRNA targets in B. napus were predicted and their expression patterns profiled using microarray analyses. Global correlation analysis of the expression patterns of miRNAs and their targets revealed complex miRNA-target gene regulatory networks during seed development. The miR156 family was the most abundant, and the majority of its members were primarily expressed in the embryo.
Large numbers of miRNAs with diverse expression patterns, multiple-targeting and co-targeting of many miRNAs, and complex relationships between expression of miRNAs and targets were identified in this study. Several key miRNA-target expression patterns were identified and new roles of miRNAs in regulating seed development are suggested. miR156, miR159, miR172, miR167, miR158 and miR166 are the major contributors to the network controlling seed development and maturation through their pivotal roles in plant development. miR156 may regulate the developmental transition to germination.
Seed development; Embryo; Next generation sequencing
Transcriptome analysis in combination with pathway-focused bioassays is suggested to be a helpful approach for gaining deeper insights into the complex mechanisms of action of herbal multicomponent preparations in living cells. The polyherbalism-based concept of Tibetan and Ayurvedic medicine considers therapeutic efficacy through multi-target effects. A polyherbal Indo-Tibetan preparation, Padma 28, approved by the Swiss drug authorities (Swissmedic Nr. 58436), was subjected to a more detailed dissection of its mechanism of action in human hepatoma HepG2 cells. Cell-free and cell-based assays were employed to evaluate the antioxidant capacity. Genome-wide expression profiling was done using Human Genome U133 Plus 2.0 Affymetrix arrays. Pathway- and network-oriented analysis elucidated the affected biological processes. The results were validated using reporter gene assays and quantitative real-time PCR.
To reveal the direct radical scavenging effects of the ethanolic extract of the Indo-Tibetan polyherbal remedy Padma 28, an in vitro oxygen radical absorbance capacity (ORAC) assay was employed, which showed a peroxyl-radical scavenging activity of 2006 ± 235 μmol TE/g. Furthermore, the antioxidant capacity of Padma 28 was analysed in living HepG2 cells by measuring its scavenging potential against radical-induced ROS. This formulation showed a considerable antioxidant capacity by significantly reducing ROS levels in a dose-dependent manner. Integrated transcriptome analysis revealed a major influence on phase I and phase II detoxification and the oxidative stress response. Selected target genes, such as heme oxygenase 1, were validated in qPCR experiments. Network analysis showed 18 interrelated networks involved in important biological functions such as drug and bio-molecule metabolism, molecular transport and cellular communication. Some molecules are part of signaling cascades that are active during development and morphogenesis or are involved in pathological conditions and inflammatory response.
The identified molecular targets and pathways suggest several mechanisms that underlie the biological activity of the preparation. Although extrapolation of these findings to the in vivo situation is not possible, the results obtained might be the basis for further investigations and new hypotheses to be tested. This study demonstrates the potential of the combination of focused and unbiased research strategies in the mode of action analysis of multicomponent herbal mixtures.
HepG2; Microarray; Multicomponent; Pathway analysis; Polyherbal; qPCR
It is well known that both natural and synthetic steroidal compounds are powerful endocrine-disrupting compounds (EDCs) that can cause reproductive toxicity and affect cellular development in mammals, and they are thus generally regarded as serious contributors to water pollution. Streptomyces virginiae IBL14 effectively degrades many steroidal compounds and can also catalyze the C25 hydroxylation of diosgenin, the first biotransformation found on the F-ring of diosgenin.
To fully elucidate the hydroxylation function of the cytochrome P450 genes (CYPs) found during biotransformation of steroids by S. virginiae IBL14, the whole genome of this strain was sequenced using the 454 Sequencing System. BLASTP analysis showed that strain IBL14 contains 33 CYPs, 7 ferredoxins and 3 ferredoxin reductases in its 8.0 Mb linear chromosome. CYPs from S. virginiae IBL14 are phylogenetically close to those of Streptomyces sp. Mg1 and Streptomyces sp. C. One new subfamily was identified because CYP Svu001 in S. virginiae IBL14 shares only 66% identity with the protein (ZP_05001937, protein identifier) from Streptomyces sp. Mg1. Further analysis showed that, among the 33 CYPs in S. virginiae IBL14, three are clustered with ferredoxins, one with a ferredoxin and a ferredoxin reductase, and three with ATP/GTP binding proteins; four are arranged with transcriptional regulatory genes, one is located upstream of an ATP-binding protein and transcriptional regulators, and four are associated with other functional genes involved in secondary metabolism and degradation.
These characteristics of the CYPs from S. virginiae IBL14 show that the EXXR motif in the K-helix is not absolutely conserved in the CYP157 family and that the I-helix is not absolutely essential for the CYP structure. Experimental results showed that CYP Svh01 and CYP Svu022 are both hydroxylases, capable of bioconverting diosgenone into isonuatigenone and β-estradiol into estriol, respectively.
Biotransformation; Cytochrome P450; Ferredoxin; Ferredoxin reductase; Gene sequencing; Secondary metabolism
Understanding the genetic basis of diseases is key to the development of better diagnoses and treatments. Unfortunately, only a small fraction of the existing data linking genes to phenotypes is available through online public resources and, when available, it is scattered across multiple access tools.
Neurocarta is a knowledgebase that consolidates information on genes and phenotypes across multiple resources and allows tracking and exploring of the associations. The system enables automatic and manual curation of evidence supporting each association, as well as user-enabled entry of their own annotations. Phenotypes are recorded using controlled vocabularies such as the Disease Ontology to facilitate computational inference and linking to external data sources. The gene-to-phenotype associations are filtered by stringent criteria to focus on the annotations most likely to be relevant. Neurocarta is constantly growing and currently holds more than 30,000 lines of evidence linking over 7,000 genes to 2,000 different phenotypes.
Neurocarta is a one-stop shop for researchers looking for candidate genes for any disorder of interest. In Neurocarta, they can review the evidence linking genes to phenotypes and filter out the evidence they’re not interested in. In addition, researchers can enter their own annotations from their experiments and analyze them in the context of existing public annotations. Neurocarta’s in-depth annotation of neurodevelopmental disorders makes it a unique resource for neuroscientists working on brain development.
Phenotype; Genes; Knowledgebase; Brain development
Cooperia oncophora and Ostertagia ostertagi are among the most important gastrointestinal nematodes of cattle worldwide. The economic losses caused by these parasites are on the order of hundreds of millions of dollars per year. Conventional treatment of these parasites is through anthelmintic drugs; however, as resistance to anthelmintics increases, overall effectiveness has begun decreasing. New methods of control and alternative drug targets are necessary. In-depth analysis of transcriptomic data can help provide these targets.
The assembly of 8.7 million and 11 million sequences from C. oncophora and O. ostertagi, respectively, resulted in 29,900 and 34,792 transcripts. Among these, 69% and 73% of the predicted peptides encoded by C. oncophora and O. ostertagi had homologues in other nematodes. Approximately 21% and 24% were constitutively expressed in both species, respectively; however, the numbers of transcripts that were stage specific were much smaller (~1% of the transcripts expressed in a stage). Approximately 21% of the transcripts in C. oncophora and 22% in O. ostertagi were up-regulated in a particular stage. Functional molecular signatures were detected for 46% and 35% of the transcripts in C. oncophora and O. ostertagi, respectively. More in-depth examination of the most prevalent domains revealed gene expression changes between the free-living (egg, L1, L2 and L3 sheathed) and parasitic (L3 exsheathed, L4, and adult) stages. Domains previously implicated in growth and development, such as chromo domains and the MADF domain, tended to dominate in the free-living stages. In contrast, domains potentially involved in feeding, such as the zinc finger and CAP domains, dominated in the parasitic stages. Pathway analyses showed significant associations between life-cycle stages and peptides involved in energy metabolism in O. ostertagi, whereas metabolism of cofactors and vitamins was specifically up-regulated in the parasitic stages of C. oncophora. Substantial differences were also observed between Gene Ontology terms associated with free-living and parasitic stages.
This study characterized transcriptomes from multiple life stages from both C. oncophora and O. ostertagi. These data represent an important resource for studying these parasites. The results of this study show distinct differences in the genes involved in the free-living and parasitic life cycle stages. The data produced will enable better annotation of the upcoming genome sequences and will allow future comparative analyses of the biology, evolution and adaptation to parasitism in nematodes.
Cattle; Parasite; Nematode; Transcripts; Ostertagia ostertagi; Cooperia oncophora; Comparative genomics
Trichoderma is a genus of mycotrophic filamentous fungi (teleomorph Hypocrea) that display a wide variety of biotrophic and saprotrophic lifestyles. The ability to parasitize and/or kill other fungi (mycoparasitism) is used in plant protection against soil-borne fungal diseases (biological control, or biocontrol). To investigate mechanisms of mycoparasitism, we compared the transcriptional responses of the cosmopolitan opportunistic species and powerful biocontrol agents Trichoderma atroviride and T. virens with those of the tropical, ecologically restricted species T. reesei during confrontations with the plant-pathogenic fungus Rhizoctonia solani.
The three Trichoderma spp. exhibited strikingly different transcriptomic responses even before physical contact with the foreign hyphae. T. atroviride expressed an array of genes involved in production of secondary metabolites, GH16 β-glucanases, various proteases and small secreted cysteine-rich proteins. T. virens, on the other hand, expressed mainly the genes for biosynthesis of gliotoxin, its precursors and also glutathione, which is necessary for gliotoxin biosynthesis. In contrast, T. reesei increased the expression of genes encoding cellulases and hemicellulases, and of genes involved in solute transport. The majority of differentially regulated genes were orthologues present in all three species or in both T. atroviride and T. virens, indicating that the regulation of expression of these genes differs among the three Trichoderma spp. The genes expressed in all three fungi exhibited a nonrandom genomic distribution, indicating that they may be regulated via chromatin modification.
This genome-wide expression study demonstrates that the initial Trichoderma mycotrophy has differentiated into several alternative ecological strategies ranging from parasitism to predation and saprotrophy. It provides first insights into the mechanisms of interactions between Trichoderma and other fungi that may be exploited for further development of biofungicides.
Hypocrea; T. atroviride; T. virens; T. reesei; Mycoparasitism; Gene expression; Biocontrol; Transcriptomics
The plant-pathogenic fungus Fusarium oxysporum f. sp. lycopersici (Fol) has accessory, lineage-specific (LS) chromosomes that can be transferred horizontally between strains. A single LS chromosome in the Fol4287 reference strain harbors all known Fol effector genes. Transfer of this pathogenicity chromosome confers virulence to a previously non-pathogenic recipient strain. We hypothesize that expression and evolution of effector genes is influenced by their genomic context.
To gain a better understanding of the genomic context of the effector genes, we manually curated the annotated genes on the pathogenicity chromosome and identified and classified transposable elements. Both retro- and DNA transposons are present with no particular overrepresented class. Retrotransposons appear evenly distributed over the chromosome, while DNA transposons tend to concentrate in large chromosomal subregions. In general, genes on the pathogenicity chromosome are dispersed within the repeat landscape. Effector genes are present within subregions enriched for DNA transposons. A miniature Impala (mimp) is always present in their promoters. Although promoter deletion studies of two effector gene loci did not reveal a direct function of the mimp for gene expression, we were able to use proximity to a mimp as a criterion to identify new effector gene candidates. Through xylem sap proteomics we confirmed that several of these candidates encode proteins secreted during plant infection.
Effector genes in Fol reside in characteristic subregions on a pathogenicity chromosome. Their genomic context allowed us to develop a method for the successful identification of novel effector genes. Since our approach is not based on effector gene similarity, but on unique genomic features, it can easily be extended to identify effector genes in Fo strains with different host specificities.
High density genetic maps built with SNP markers that are polymorphic in various genetic backgrounds are very useful for studying the genetics of agronomical traits as well as genome organization and evolution. Simultaneous dense SNP genotyping of segregating populations and variety collections was applied to oilseed rape (Brassica napus L.) to obtain a high density genetic map for this species and to study the linkage disequilibrium pattern.
We developed an integrated genetic map for oilseed rape by high throughput SNP genotyping of four segregating doubled haploid populations. A very high level of collinearity was observed between the four individual maps and a large number of markers (>59%) was common to more than two maps. The precise integrated map comprises 5764 SNP and 1603 PCR markers. With a total genetic length of 2250 cM, the integrated map contains a density of 3.27 markers (2.56 SNP) per cM. Genotyping of these mapped SNP markers in oilseed rape collections allowed polymorphism level and linkage disequilibrium (LD) to be studied across the different collections (winter vs spring, different seed quality types) and along the linkage groups. Overall, polymorphism level was higher and LD decayed faster in spring than in “00” winter oilseed rape types but this was shown to vary greatly along the linkage groups.
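The reported marker densities follow directly from the marker counts and the total map length. A minimal arithmetic check, using only the figures stated above (the variable names are illustrative):

```python
# Counts and map length as reported in the abstract above.
snp_markers = 5764
pcr_markers = 1603
map_length_cm = 2250

# Density is simply markers divided by genetic length.
total_markers = snp_markers + pcr_markers
total_density = total_markers / map_length_cm
snp_density = snp_markers / map_length_cm

print(f"{total_markers} markers over {map_length_cm} cM")
print(f"{total_density:.2f} markers per cM ({snp_density:.2f} SNP per cM)")
```

Both values round to the densities quoted in the text (3.27 markers and 2.56 SNP per cM).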
Our study provides a valuable resource for further genetic studies using linkage or association mapping, for marker assisted breeding and for Brassica napus sequence assembly and genome organization analyses.
Two species of wild silkworms, the Chinese oak silkworm (Antheraea pernyi) and the castor silkworm Philosamia cynthia ricini, can acquire a serious disease caused by nucleopolyhedroviruses (NPVs) (known as AnpeNPV and PhcyNPV, respectively). The two viruses have similar polyhedral morphologies and their viral fragments share high sequence similarity. However, the physical maps of the viral genomes and cross-infectivity of the viruses are different. The genome sequences of two AnpeNPV isolates have been published.
We sequenced and analyzed the full-length genome of PhcyNPV to compare the gene contents of the two viruses. The genome of PhcyNPV is 125,376 bp, with a G + C content of 53.65%, and encodes 138 open reading frames (ORFs) of at least 50 amino acids (aa) (GenBank accession number: JX404026). Between PhcyNPV and AnpeMNPV-L and -Z isolates, 126 ORFs are identical, including 30 baculovirus core genes. Nine ORFs were only found in PhcyNPV. Four genes, cath, v-chi, lef 10 and lef 11, were not found in PhcyNPV. However, most of the six genes required for infectivity via the oral route were found in PhcyNPV and in the two AnpeNPV isolates, with high sequence similarities. The pif-3 gene of PhcyNPV contained 59 extra amino acids at the N-terminus compared with AnpeNPV.
Most of the genes in PhcyNPV are similar to those of the two AnpeNPV isolates, including the direction of expression of the ORFs. Only a few genes were missing from PhcyNPV. These data suggest that PhcyNPV and AnpeNPV might be variants of each other, and that the differences in cross-infection might be caused by gene mutations.
The guanine nucleotide binding protein (G protein)-coupled receptors (GPCRs) regulate cell growth, proliferation and differentiation. G proteins are also implicated in erythroid differentiation, and some of them are expressed principally in hematopoietic cells. GPCR-linked NO/cGMP and p38 MAPK signaling pathways have already demonstrated potency for globin gene stimulation. By analyzing erythroid progenitors, derived from hematopoietic cells through in vitro ontogeny, our study intends to determine early markers and signaling pathways of globin gene regulation and their relation to GPCR expression.
Human hematopoietic CD34+ progenitors are isolated from fetal liver (FL), cord blood (CB), adult bone marrow (BM), peripheral blood (PB) and G-CSF stimulated mobilized PB (mPB), and then differentiated in vitro into erythroid progenitors. We find that growth capacity is most abundant in FL- and CB-derived erythroid cells. The erythroid progenitor cells are sorted as 100% CD71+, and we find no statistically significant variation in CD34, CD36 and GlyA antigens, which confirms similar maturation across the studied ontogenic periods. During ontogeny, beta-globin gene expression reaches maximum levels in cells of adult blood origin (176 fmol/μg), while gamma-globin gene expression is consistently up-regulated in CB-derived cells (60 fmol/μg). During gamma-globin induction by hydroxycarbamide, we identify stimulated GPCRs (PTGDR, PTGER1) and GPCR-coupled genes known to be activated via the cAMP/PKA (ADIPOQ), MAPK (JUN) and NO/cGMP (PRPF18) signaling pathways. During ontogeny, GPR45 and ARRDC1 genes have the most prominent expression in FL-derived erythroid progenitor cells, GNL3 and GRP65 genes in CB-derived cells (high gamma-globin gene expression), GPR110 and GNG10 in BM-derived cells, GPR89C and GPR172A in PB-derived cells, and GPR44 and GNAQ genes in mPB-derived cells (high beta-globin gene expression).
These results demonstrate the concomitant activity of GPCR-coupled genes and related signaling pathways during erythropoietic stimulation of globin genes. In accordance with previous reports, the stimulation of GPCRs supports the postulated connection between cAMP/PKA and NO/cGMP pathways in activation of γ-globin expression, via JUN and p38 MAPK signaling.
G protein; G protein-coupled receptors; Erythroid progenitors; Ontogeny; Globins
As one of the most dominant bacterial groups on Earth, cyanobacteria play a pivotal role in global carbon cycling and the composition of Earth's atmosphere. Understanding their molecular responses to environmental perturbations has important scientific and environmental value. Since important biological processes or networks are often evolutionarily conserved, cross-species transcriptional network analysis offers a useful strategy to decipher conserved and species-specific transcriptional mechanisms that cells utilize to deal with various biotic and abiotic disturbances, and it will eventually lead to a better understanding of associated adaptation and regulatory networks.
In this study, the Weighted Gene Co-expression Network Analysis (WGCNA) approach was used to establish transcriptional networks for four important cyanobacteria species under metal stress, including iron depletion and high copper conditions. Cross-species network comparison led to discovery of several core response modules and genes possibly essential to metal stress, as well as species-specific hub genes for metal stresses in different cyanobacteria species, shedding light on survival strategies of cyanobacteria responding to different environmental perturbations.
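As a rough illustration of the core idea behind a weighted co-expression network, the sketch below builds an unsigned soft-threshold adjacency, |cor(x_i, x_j)|^beta, and sums it into a per-gene connectivity, which is one common way of ranking candidate hub genes. The expression values, gene names and the choice of beta = 6 are assumptions made for illustration only; they are not taken from the study, and the full WGCNA workflow (topological overlap, module detection, cross-species comparison) is not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for every gene pair."""
    adjacency = {}
    for g in expr:
        for h in expr:
            if g != h:
                adjacency[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adjacency

def connectivity(adjacency, gene):
    """Sum of adjacencies of one gene to all others (higher suggests a hub)."""
    return sum(w for (g, _), w in adjacency.items() if g == gene)

# Hypothetical expression profiles across five conditions.
expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 11],
    "geneC": [5, 1, 4, 2, 3],
}
adj = soft_threshold_adjacency(expr)
for gene in expr:
    print(gene, round(connectivity(adj, gene), 4))
```

Raising the correlation to a power suppresses weak correlations while keeping the network weighted, so no hard cutoff is needed.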
The WGCNA demonstrated that cross-species transcriptional network analysis can lead to novel insights into molecular responses to environmental changes that would not otherwise be achieved by analyzing data from a single species.
Cross-species; Transcriptional network; Metal stress; Cyanobacteria
Plant nucleotide-binding site (NBS)-leucine-rich repeat (LRR) proteins encoded by resistance genes play an important role in the responses of plants to various pathogens, including viruses, bacteria, fungi, and nematodes. In this study, a comprehensive analysis of NBS-encoding genes within the whole cucumber genome was performed, and the phylogenetic relationships of NBS-encoding resistance gene homologues (RGHs) belonging to six species in five genera of Cucurbitaceae crops were compared.
Cucumber has relatively few NBS-encoding genes. Nevertheless, cucumber maintains genes belonging to both Toll/interleukin-1 receptor (TIR) and CC (coiled-coil) families. Eight commonly conserved motifs have been established in these two families which support the grouping into TIR and CC families. Moreover, three additional conserved motifs, namely, CNBS-1, CNBS-2 and TNBS-1, have been identified in sequences from CC and TIR families. Analyses of exon/intron configurations revealed that some intron loss or gain events occurred during the structural evolution between the two families. Phylogenetic analyses suggested that gene duplication, sequence divergence, and gene loss were the major modes of evolution of NBS-encoding genes in Cucurbitaceae species. Compared with NBS-encoding sequences from the Arabidopsis thaliana genome, the remaining seven TIR families of NBS proteins and RGHs from Cucurbitaceae species have been shown to be phylogenetically distinct from the TIR family of NBS-encoding genes in Arabidopsis, except for two subfamilies (TIR4 and TIR9). In contrast, members of the CC-NBS family grouped closely with the CC family of NBS-encoding genes in Arabidopsis. Thus, the NBS-encoding genes in Cucurbitaceae crops are shown to be ancient, and NBS-encoding gene expansions (especially the TIR family) may have occurred before the divergence of Cucurbitaceae and Arabidopsis.
The results of this paper will provide a genomic framework for the further isolation of candidate disease resistance NBS-encoding genes in cucumber, and contribute to the understanding of the evolutionary mode of NBS-encoding genes in Cucurbitaceae crops.
NBS-LRR; Cucumber; Cucurbitaceae; Phylogenetic relationship
Genome-wide association studies have identified thousands of SNP variants associated with hundreds of phenotypes. For most associations the causal variants and the molecular mechanisms underlying pathogenesis remain unknown. Exploration of the underlying functional annotations of trait-associated loci has thrown some light on their potential roles in pathogenesis. However, there are some shortcomings of the methods used to date, which may undermine efforts to prioritize variants for further analyses. Here, we introduce and apply novel methods to rigorously identify annotation classes showing enrichment or depletion of trait-associated variants taking into account the underlying associations due to co-location of different functional annotations and linkage disequilibrium.
We assessed enrichment and depletion of variants in publicly available annotation classes such as genic regions, regulatory features, measures of conservation, and patterns of histone modifications. We used logistic regression to build a multivariate model that identified the most influential functional annotations for trait-association status of genome-wide significant variants. SNPs associated with all of the enriched annotations were 8 times more likely to be trait-associated variants than SNPs annotated with none of them. Annotations associated with chromatin state together with prior knowledge of the existence of a local expression QTL (eQTL) were the most important factors in the final logistic regression model. Surprisingly, despite the widespread use of evolutionary conservation to prioritize variants for study we find only modest enrichment of trait-associated SNPs in conserved regions.
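To make the link between a multivariate logistic regression and the "times more likely" statement concrete, the sketch below shows how per-annotation log-odds coefficients combine multiplicatively into a single odds ratio for a SNP carrying several annotations. The coefficients, intercept and annotation names are hypothetical placeholders, not values from the study, and whether the reported 8-fold figure is an odds ratio or a probability ratio is not specified in the text.

```python
import math

# Hypothetical log-odds coefficients; NOT estimates from the study.
coefficients = {
    "eqtl": 0.9,
    "active_chromatin": 0.7,
    "conserved": 0.2,
}
intercept = -6.0  # hypothetical baseline log-odds for a SNP with no annotation

def odds_ratio(annotation):
    """Odds ratio contributed by a single annotation: exp(beta)."""
    return math.exp(coefficients[annotation])

def combined_odds_ratio(annotations):
    """Odds ratio of carrying all listed annotations versus none of them."""
    return math.exp(sum(coefficients[a] for a in annotations))

def predicted_probability(annotations):
    """Logistic model probability that a SNP is trait-associated."""
    z = intercept + sum(coefficients[a] for a in annotations)
    return 1.0 / (1.0 + math.exp(-z))

all_annotations = list(coefficients)
print("combined odds ratio:", round(combined_odds_ratio(all_annotations), 2))
print("P(no annotations): ", predicted_probability([]))
print("P(all annotations):", predicted_probability(all_annotations))
```

Because the model is additive on the log-odds scale, the combined odds ratio is the product of the individual ones, which is why a few moderately enriched annotations can together yield a large fold change.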
We established odds ratios of functional annotations that are more likely to contain significantly trait-associated SNPs, for the purpose of prioritizing GWAS hits for further studies. Additionally, we estimated the relative and combined influence of the different genomic annotations, which may facilitate future prioritization methods by adding substantial information.
GWAS; Trait-associated SNPs; Chromatin states; Genomic features; Permutations; Logistic regression; Prioritization
The core promoter is the region flanking the transcription start site (TSS) that directs formation of the pre-initiation complex. Core promoters have been studied intensively in mammals and yeast, but not in more diverse eukaryotes. Here we investigate core promoters in oomycetes, a group within the Stramenopile kingdom that includes important plant and animal pathogens. Prior studies of a small collection of genes proposed that oomycete core promoters contain a 16 to 19 nt motif bearing an Initiator-like sequence (INR) flanked by a novel sequence named FPR, but this has not been extended to whole-genome analysis.
We used expectation maximization to find over-represented motifs near TSSs of Phytophthora infestans, the potato blight pathogen. The motifs corresponded to INR, FPR, and a new element found about 25 nt downstream of the TSS called DPEP. TATA boxes were not detected. Assays of DPEP function by mutagenesis were consistent with its role as a core motif. Genome-wide searches found a well-conserved combined INR+FPR in only about 13% of genes after correcting for false discovery, which contradicted prior reports that INR and FPR are found together in most genes. INR or FPR were found alone near TSSs in 18% and 7% of genes, respectively. Promoters lacking the motifs had pyrimidine-rich regions near the TSS. The combined INR+FPR motif was linked to higher than average mRNA levels, developmentally-regulated transcription, and functions related to plant infection, while DPEP and FPR were over-represented in constitutively-expressed genes. The INR, FPR, and combined INR+FPR motifs were detected in other oomycetes including Hyaloperonospora arabidopsidis, Phytophthora sojae, Pythium ultimum, and Saprolegnia parasitica, while DPEP was found in all but S. parasitica. Only INR seemed present in a non-oomycete stramenopile.
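Once a motif has been discovered, a genome-wide search of the kind described above amounts to asking what fraction of promoters carry the motif within some distance of the TSS. The sketch below illustrates that counting step only; it does not perform expectation maximization or false-discovery correction. The consensus string, sequences, TSS positions and window size are hypothetical placeholders and do not represent the INR, FPR or DPEP sequences from the study.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_offsets(sequence, tss_index, consensus):
    """Start positions of every (possibly overlapping) match, relative to the TSS."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    lookahead = re.compile(f"(?={body})")  # zero-width, so overlaps are found
    return [m.start() - tss_index for m in lookahead.finditer(sequence.upper())]

def fraction_with_motif(promoters, consensus, window):
    """Fraction of promoters with a match starting within +/- window nt of the TSS."""
    hits = sum(
        any(abs(offset) <= window for offset in motif_offsets(seq, tss, consensus))
        for seq, tss in promoters
    )
    return hits / len(promoters)

# Hypothetical promoters as (sequence, index of the TSS within the sequence).
promoters = [
    ("GGGGTCATTCCGG", 5),
    ("AAAAAAAAAAAA", 6),
]
hypothetical_consensus = "YCATTYY"  # placeholder, not a motif from the paper
print(fraction_with_motif(promoters, hypothetical_consensus, window=5))
```

In practice a position weight matrix with a score threshold would usually replace the exact consensus match, since core promoter motifs are degenerate.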
The absence of a TATA box and presence of novel motifs show that the oomycete core promoter has diverged from that of model systems, and likely explains the lack of activity of non-oomycete promoters in Phytophthora transformants. The association of the INR+FPR motif with developmentally-regulated genes shows that oomycete core elements influence stage-specific transcription in addition to regulating formation of the pre-initiation complex.
Core promoter; Transcription initiation; Oomycete genome; Promoter mutagenesis; Transcription factor binding site; Reporter gene assay
Classification is the problem of assigning each input object to one of a finite number of classes. This problem has been extensively studied in machine learning and statistics, and there are numerous applications to bioinformatics as well as many other fields. Building a multiclass classifier has been a challenge, where the direct approach of altering the binary classification algorithm to accommodate more than two classes can be computationally too expensive. Hence the indirect approach of using binary decomposition has been commonly used, in which retrieving the class posterior probabilities from the set of binary posterior probabilities given by the individual binary classifiers has been a major issue.
In this work, we present an extension of a recently introduced probabilistic kernel-based learning algorithm called the Classification Relevance Units Machine (CRUM) to the multiclass setting to increase its applicability. The extension is achieved under the error correcting output codes framework. The probabilistic outputs of the binary CRUMs are preserved using a proposed linear-time decoding algorithm, an alternative to the generalized Bradley-Terry (GBT) algorithm whose application to large-scale prediction settings is prohibited by its computational complexity. The resulting classifier is called the Multiclass Relevance Units Machine (McRUM).
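To illustrate what decoding under an error correcting output codes framework involves, the sketch below turns the probabilistic outputs of several binary classifiers into class posteriors with a single pass over the code matrix. This is a generic product-and-normalize scheme chosen for illustration; it is an assumption and should not be read as the Naïve decoding algorithm proposed in the paper, whose details are not given here. The code matrix and probabilities are hypothetical.

```python
def decode_ecoc(code_matrix, binary_probs):
    """Estimate class posteriors from binary classifier outputs.

    code_matrix[c][j] is +1 if class c is a positive for classifier j,
    -1 if it is a negative, and 0 if classifier j ignores class c.
    binary_probs[j] is classifier j's probability of the positive label.
    Cost is one pass over the matrix: O(classes * classifiers).
    """
    scores = []
    for row in code_matrix:
        score = 1.0
        for code, p in zip(row, binary_probs):
            if code == 1:
                score *= p          # classifier agrees with a positive code
            elif code == -1:
                score *= 1.0 - p    # classifier agrees with a negative code
            # code == 0: this classifier says nothing about the class
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical one-versus-rest code matrix for three classes.
one_vs_rest = [
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
]
# Hypothetical outputs of the three binary classifiers for one input.
probs = [0.8, 0.1, 0.2]

posteriors = decode_ecoc(one_vs_rest, probs)
predicted_class = max(range(len(posteriors)), key=posteriors.__getitem__)
print(posteriors, predicted_class)
```

The appeal of any such closed-form decoder is that it avoids the iterative fitting required by Bradley-Terry style methods, at the cost of treating the binary classifiers as independent.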
The evaluation of McRUM on a variety of real small-scale benchmark datasets shows that our proposed Naïve decoding algorithm is computationally more efficient than the GBT algorithm while maintaining a similar level of predictive accuracy. A set of experiments on a larger-scale dataset for small ncRNA classification was then conducted with Naïve McRUM and compared with the Gaussian and linear SVM. Although McRUM's predictive performance is slightly lower than that of the Gaussian SVM, the results show that a similar true positive rate can be achieved by slightly sacrificing the false positive rate. Furthermore, McRUM is computationally more efficient than the SVM, which is an important factor for large-scale analysis.
We have proposed McRUM, a multiclass extension of binary CRUM. McRUM with the Naïve decoding algorithm is computationally efficient in run-time and its predictive performance is comparable to the well-known SVM, showing its potential in solving large-scale multiclass problems in bioinformatics and other fields of study.