26.  Characterization of the lipoxygenase (LOX) gene family in the Chinese white pear (Pyrus bretschneideri) and comparison with other members of the Rosaceae 
BMC Genomics  2014;15(1):444.
Background
Lipoxygenases (LOXs), a type of non-haem iron-containing dioxygenase, are ubiquitous enzymes in plants and participate in the formation of fruit aroma which is a very important aspect of fruit quality. Amongst the various aroma volatiles, saturated and unsaturated alcohols and aldehydes provide the characteristic aroma of the fruit. These compounds are formed from unsaturated fatty acids through oxidation, pyrolysis and reduction steps. This biosynthetic pathway involves at least four enzymes, including LOX, the enzyme responsible for lipid oxidation. Although some studies have been conducted on the LOX gene family in several species including Arabidopsis, soybean, cucumber and apple, there is no information from pear; and the evolutionary history of this gene family in the Rosaceae is still not resolved.
Results
In this study we identified 107 LOX homologous genes from five Rosaceous species (Pyrus bretschneideri, Malus × domestica, Fragaria vesca, Prunus mume and Prunus persica); 23 of these sequences were from pear. By using structure analysis, phylogenetic analysis and collinearity analysis, we identified variation in gene structure and revealed the phylogenetic evolutionary relationship of this gene family. Expression of certain pear LOX genes during fruit development was verified by analysis of transcriptome data.
Conclusions
23 LOX genes were identified in pear and these genes were found to have undergone a duplication 30–45 MYA; most of these 23 genes are functional. Specific gene duplication was found on chromosome 4 in the pear genome. Useful information was provided for future research on the evolutionary history and transgenic research on LOX genes.
Electronic supplementary material
The online version of this article (doi: 10.1186/1471-2164-15-444) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-444
PMCID: PMC4072886  PMID: 24906560
Pear; LOX; Fruit flavor; Gene family; Rosaceae
27.  Improved Multiple Displacement Amplification (iMDA) and Ultraclean Reagents 
BMC Genomics  2014;15(1):443.
Background
Next-generation sequencing sample preparation requires nanogram to microgram quantities of DNA; however, many relevant samples are comprised of only a few cells. Genomic analysis of these samples requires a whole genome amplification method that is unbiased and free of exogenous DNA contamination. To address these challenges we have developed protocols for the production of DNA-free consumables including reagents and have improved upon multiple displacement amplification (iMDA).
Results
A specialized ethylene oxide treatment was developed that renders free DNA and DNA present within Gram positive bacterial cells undetectable by qPCR. To reduce DNA contamination in amplification reagents, a combination of ion exchange chromatography, filtration, and lot testing protocols was developed. Our multiple displacement amplification protocol employs a second strand-displacing DNA polymerase, improved buffers, improved reaction conditions and DNA-free reagents. The iMDA protocol, when used in combination with DNA-free laboratory consumables and reagents, significantly improved efficiency and accuracy of amplification and sequencing of specimens with moderate to low levels of DNA. The sensitivity and specificity of sequencing of amplified DNA prepared using iMDA were compared to those of DNA obtained with two commercial whole genome amplification kits using 10 fg (~1-2 bacterial cells' worth) of bacterial genomic DNA as a template. Analysis showed >99% of the iMDA reads mapped to the template organism whereas only 0.02% of the reads from the commercial kits mapped to the template. To assess the ability of iMDA to achieve balanced genomic coverage, a non-stochastic amount of bacterial genomic DNA (1 pg) was amplified and sequenced, and data obtained were compared to sequencing data obtained directly from genomic DNA. The iMDA DNA and genomic DNA sequencing had comparable coverage, with 99.98% of the reference genome covered at ≥1X and 99.9% at ≥5X, while maintaining both balance and representation of the genome.
Conclusions
The iMDA protocol in combination with DNA-free laboratory consumables significantly improved the ability to sequence specimens with low levels of DNA. iMDA has broad utility in metagenomics, diagnostics, ancient DNA analysis, pre-implantation embryo screening, single-cell genomics, whole genome sequencing of unculturable organisms, and forensic applications for both human and microbial targets.
doi:10.1186/1471-2164-15-443
PMCID: PMC4061449  PMID: 24906487
Whole genome amplification; Next generation sequencing; Multiple displacement amplification; Contamination; Clean reagents; DNA-free
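The coverage figures quoted in the abstract above (share of the reference genome at ≥1X and ≥5X) are breadth-of-coverage values. As a minimal sketch, assuming per-base read depths are already available as a list of integers, breadth at a given depth can be computed as below. The function name and the toy depth values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def breadth_of_coverage(depths: Sequence[int], min_depth: int) -> float:
    """Fraction of reference positions covered by at least `min_depth` reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Hypothetical per-base depths for a 10 bp toy reference (not real data).
toy_depths = [0, 3, 7, 12, 5, 9, 1, 6, 8, 10]

print(f">=1X: {breadth_of_coverage(toy_depths, 1):.2%}")
print(f">=5X: {breadth_of_coverage(toy_depths, 5):.2%}")
```

In practice the depth vector would come from an alignment tool's per-position depth output; that step is outside this sketch.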
28.  Regulatory and coding genome regions are enriched for trait associated variants in dairy and beef cattle 
BMC Genomics  2014;15(1):436.
Background
In livestock, as in humans, the number of genetic variants that can be tested for association with complex quantitative traits, or used in genomic predictions, is increasing exponentially as whole genome sequencing becomes more common. The power to identify variants associated with traits, particularly those of small effects, could be increased if certain regions of the genome were known a priori to be enriched for associations. Here, we investigate whether twelve genomic annotation classes were enriched or depleted for significant associations in genome wide association studies for complex traits in beef and dairy cattle. We also describe a variance component approach to determine the proportion of genetic variance captured by each annotation class.
Results
P-values from large GWAS using 700K SNP in both dairy and beef cattle were available for 11 and 10 traits respectively. We found significant enrichment for trait associated variants (SNP significant in the GWAS) in the missense class along with regions 5 kilobases upstream and downstream of coding genes. We found that the non-coding conserved regions (across mammals) were not enriched for trait associated variants. The results from the enrichment or depletion analysis were not in complete agreement with the results from variance component analysis, where the missense and synonymous classes gave the greatest increase in variance explained, while the upstream and downstream classes showed a more modest increase in the variance explained.
Conclusion
Our results indicate that functional annotations could assist in prioritization of variants to a subset more likely to be associated with complex traits, including missense variants and upstream and downstream regions. The differences between the two sets of results (GWAS enrichment or depletion versus variance component approaches) might be explained by the fact that the variance component approach has greater power to capture the cumulative effect of mutations of small effect, while the enrichment or depletion approach only captures the variants that are significant in GWAS, which is restricted to a limited number of common variants of moderate effects.
Electronic supplementary material
The online version of this article (doi: 10.1186/1471-2164-15-436) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-436
PMCID: PMC4070550  PMID: 24903263
Variants component analysis; Regulatory genome; GWAS prioritization; Enrichment depletion
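The abstract above does not state which statistic was used to test annotation classes for enrichment. One common choice for this kind of question is a one-sided hypergeometric test, sketched here under that assumption. All counts in the example are hypothetical, and the function names are illustrative only.

```python
from math import comb


def hypergeom_enrichment_p(total: int, in_class: int,
                           significant: int, overlap: int) -> float:
    """P(X >= overlap) when `significant` SNPs are drawn without replacement
    from `total` SNPs, of which `in_class` carry the annotation."""
    upper = min(in_class, significant)
    denom = comb(total, significant)
    tail = sum(
        comb(in_class, k) * comb(total - in_class, significant - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom


def fold_enrichment(total: int, in_class: int,
                    significant: int, overlap: int) -> float:
    """Observed share of annotated SNPs among significant SNPs,
    divided by their share among all tested SNPs."""
    return (overlap / significant) / (in_class / total)


# Hypothetical counts: 1000 SNPs tested, 50 in the annotation class,
# 100 significant in the GWAS, 12 of those in the class.
p = hypergeom_enrichment_p(1000, 50, 100, 12)
fe = fold_enrichment(1000, 50, 100, 12)
print(f"fold enrichment = {fe:.2f}, one-sided p = {p:.3g}")
```

A small p-value with a fold enrichment above 1 would suggest enrichment; the mirror-image lower tail would be used to test depletion.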
29.  Efficient de novo assembly of large and complex genomes by massively parallel sequencing of Fosmid pools 
BMC Genomics  2014;15(1):439.
Background
Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality.
Results
In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package GAM-NGS.
Conclusions
By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process.
We have made public the input data (FASTQ format) for the set of pools used in this study:
ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/.
(alternatively accessible via http://congenie.org/downloads).
The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-439) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-439
PMCID: PMC4070561  PMID: 24906298
30.  Analysis of peptide PSY1 responding transcripts in the two Arabidopsis plant lines: wild type and psy1r receptor mutant 
BMC Genomics  2014;15(1):441.
Background
Small secreted peptides are emerging as important components in cell-cell communication during basic developmental stages of plant cell growth and development. Plant peptide containing sulfated tyrosine 1 (PSY1) has been reported to promote cell expansion and differentiation in the elongation zone of roots. PSY1 action is dependent on a receptor PSY1R that triggers a signaling cascade leading to cell elongation. However, little is known about cellular functions and the components involved in the PSY1-based signaling cascade.
Results
Differentially expressed genes were identified in a wild type plant line and in a psy1r receptor mutant line of Arabidopsis thaliana after treatment with PSY1. Seventy-seven genes were found to be responsive to the PSY1 peptide in wild type plants while 154 genes were responsive in the receptor mutant plants. PSY1 activates the transcripts of genes involved in cell wall modification. Gene enrichment analysis revealed that PSY1-responsive genes are involved in responses to stimuli, metabolic processes and biosynthetic processes. The significant enrichment terms of PSY1-responsive genes were higher in psy1r mutant plants compared to in wild type plants. Two parallel responses to PSY1 were identified, differing in their dependency on the PSY1R receptor. Promoter analysis of the differentially expressed genes identified a light regulatory motif in some of these.
Conclusion
PSY1-responsive genes are involved in cellular functions and stimuli responses suggesting a crosstalk between developmental cues and environmental stimuli. Possibly, two parallel responses to PSY1 exist. A motif involved in light regulation was identified in the promoter region of the differentially expressed genes. Reduced hypocotyl growth was observed in etiolated receptor mutant seedlings.
Electronic supplementary material
The online version of this article (doi: 10.1186/1471-2164-15-441) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-441
PMCID: PMC4070568  PMID: 24906416
Cellular functions; Gene enrichment analysis; Microarray; Signaling cascade; Small signaling peptides
31.  Whole-genome sequencing of Mesorhizobium huakuii 7653R provides molecular insights into host specificity and symbiosis island dynamics 
BMC Genomics  2014;15(1):440.
Background
Evidence based on genomic sequences is urgently needed to confirm the phylogenetic relationship between Mesorhizobium strain MAFF303099 and M. huakuii. To define underlying causes for the rather striking difference in host specificity between M. huakuii strain 7653R and MAFF303099, several probable determinants also require comparison at the genomic level. An improved understanding of mobile genetic elements that can be integrated into the main chromosomes of Mesorhizobium to form genomic islands would enrich our knowledge of how genome dynamics may contribute to Mesorhizobium evolution in general.
Results
In this study, we sequenced the complete genome of 7653R and compared it with five other Mesorhizobium genomes. Genomes of 7653R and MAFF303099 were found to share a large set of orthologs and, most importantly, a conserved chromosomal backbone and even larger perfectly conserved synteny blocks. We also identified candidate molecular differences responsible for the different host specificities of these two strains. Finally, we reconstructed an ancestral Mesorhizobium genomic island that has evolved into diverse forms in different Mesorhizobium species.
Conclusions
Our ortholog and synteny analyses firmly establish MAFF303099 as a strain of M. huakuii. Differences in nodulation factors and secretion systems T3SS, T4SS, and T6SS may be responsible for the unique host specificities of 7653R and MAFF303099 strains. The plasmids of 7653R may have arisen by excision of the original genomic island from the 7653R chromosome.
Electronic supplementary material
The online version of this article (doi: 10.1186/1471-2164-15-440) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-440
PMCID: PMC4072884  PMID: 24906389
Mesorhizobium huakuii 7653R; Genome sequencing; Comparative analysis; Host specificity; Symbiosis island
32.  Gene expression profiling of brains from bovine spongiform encephalopathy (BSE)-infected cynomolgus macaques 
BMC Genomics  2014;15(1):434.
Background
Prion diseases are fatal neurodegenerative disorders whose pathogenesis mechanisms are not fully understood. In this context, the analysis of gene expression alterations occurring in prion-infected animals represents a powerful tool that may help to unravel the molecular basis of prion diseases and therefore to discover novel potential targets for diagnosis and therapeutics. Here we present the first large-scale transcriptional profiling of brains from BSE-infected cynomolgus macaques, which are an excellent model for human prion disorders.
Results
The study was conducted using the GeneChip® Rhesus Macaque Genome Array and revealed 300 transcripts with expression changes greater than twofold. Among these, the bioinformatics analysis identified 86 genes with known functions, most of which are involved in cellular development, cell death and survival, lipid homeostasis, and acute phase response signaling. RT-qPCR was performed on selected gene transcripts in order to validate the differential expression in infected animals versus controls. The results obtained with the microarray technology were confirmed and a gene signature was identified. In brief, HBB and HBA2 were down-regulated in infected macaques, whereas TTR, APOC1 and SERPINA3 were up-regulated.
Conclusions
Some genes involved in oxygen or lipid transport and in innate immunity were found to be dysregulated in prion infected macaques. These genes are known to be involved in other neurodegenerative disorders such as Alzheimer’s and Parkinson’s diseases. Our results may facilitate the identification of potential disease biomarkers for many neurodegenerative diseases.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-434) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-434
PMCID: PMC4061447  PMID: 24898206
Prion diseases; BSE; Non-human primates; Neurodegeneration; Transcriptome; Microarray; RT-qPCR; Biomarker; Serpina3; Hemoglobin
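The "greater than twofold" criterion in the abstract above can be expressed as an absolute log2 ratio above 1. The sketch below assumes positive, already-normalized expression values per transcript. The probe names and intensities are hypothetical, and this is not the authors' analysis pipeline.

```python
from math import log2
from typing import Dict, Tuple


def changed_more_than_twofold(control: float, infected: float) -> bool:
    """True when expression differs by more than 2x in either direction."""
    if control <= 0 or infected <= 0:
        raise ValueError("expression values must be positive")
    return abs(log2(infected / control)) > 1.0


# Hypothetical normalized intensities: probe -> (control, infected).
toy_expression: Dict[str, Tuple[float, float]] = {
    "probe_1": (100.0, 250.0),  # up more than twofold
    "probe_2": (100.0, 150.0),  # up, but less than twofold
    "probe_3": (100.0, 40.0),   # down more than twofold
}

hits = [name for name, (ctrl, inf) in toy_expression.items()
        if changed_more_than_twofold(ctrl, inf)]
print(hits)
```

Working on the log2 scale treats up- and down-regulation symmetrically, which a raw ratio threshold would not.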
33.  Dynamic imbalance between cancer cell subpopulations induced by Transforming Growth Factor Beta (TGF-β) is associated with a DNA methylome switch 
BMC Genomics  2014;15(1):435.
Background
Distinct subpopulations of neoplastic cells within tumors, including hepatocellular carcinoma (HCC), display pronounced ability to initiate new tumors and induce metastasis. Recent evidence suggests that signals from transforming growth factor beta (TGF-β) may increase the survival of these so called tumor initiating cells leading to poor HCC prognosis. However, how TGF-β establishes and modifies the key features of these cell subpopulations is not fully understood.
Results
In the present report we describe the differential DNA methylome of CD133-negative and CD133-expressing liver cancer cells. Next, we show that TGF-β is able to increase the proportion of CD133+ cells in liver cancer cell lines in a way that is stable and persistent across cell division. This process is associated with stable genome-wide changes in DNA methylation that persist through cell division. Differential methylation in response to TGF-β is under-represented at promoter CpG islands and enriched at gene bodies, including a locus in the body of the de novo DNA methyl-transferase DNMT3B gene. Moreover, phenotypic changes induced by TGF-β, including the induction of CD133, are impaired by siRNA silencing of de novo DNA methyl-transferases.
Conclusions
Our study reveals a self-perpetuating crosstalk between TGF-β signaling and the DNA methylation machinery, which can be relevant in the establishment of cellular phenotypes. This is the first indication of the ability of TGF-β to induce genome-wide changes in DNA methylation, resulting in a stable change in the proportion of liver cancer cell subpopulations.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-435) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-435
PMCID: PMC4070873  PMID: 24898317
HCC; Tumor-initiating cells; CD133; DNA methylation; TGF-β pathway
34.  Transcriptomics of cryophilic Saccharomyces kudriavzevii reveals the key role of gene translation efficiency in cold stress adaptations 
BMC Genomics  2014;15(1):432.
Background
Comparative transcriptomics and functional studies of different Saccharomyces species have opened up the possibility of studying and understanding new yeast abilities. This is the case of yeast adaptation to stress, in particular the cold stress response, which is especially relevant for the food industry. Since the species Saccharomyces kudriavzevii is adapted to grow at low temperatures, it has been suggested that it contains physiological adaptations that allow it to rapidly and efficiently acclimatise after cold shock.
Results
In this work, we aimed to provide new insights into the molecular basis determining this better cold adaptation of S. kudriavzevii strains. To this end, we compared the S. cerevisiae and S. kudriavzevii transcriptomes after the yeasts adapted to cold shock. A comparison of 12°C with 28°C showed that both yeasts mainly activated genes related to the translation machinery, but the S. kudriavzevii response was stronger, showing an increased expression of dozens of genes involved in protein synthesis. This suggested enhanced translation efficiency at low temperatures, which was confirmed when we observed increased resistance to the translation inhibitor paromomycin. Finally, 35S-methionine incorporation assays confirmed the increased S. kudriavzevii translation rate after cold shock.
Conclusions
This work confirms that S. kudriavzevii is able to grow at low temperatures, an interesting ability for different industrial applications. We propose that this adaptation is based on its enhanced ability to initiate a quick, efficient translation of crucial genes in cold adaptation among others, a mechanism that has been suggested for other microorganisms.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-432) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-432
PMCID: PMC4058008  PMID: 24898014
Saccharomyces cerevisiae; S. kudriavzevii; Transcriptomics; Cold stress; Translation
35.  An ultra-high density bin-map for rapid QTL mapping for tassel and ear architecture in a large F2 maize population 
BMC Genomics  2014;15(1):433.
Background
Understanding genetic control of tassel and ear architecture in maize (Zea mays L. ssp. mays) is important due to their relationship with grain yield. High resolution QTL mapping is critical for understanding the underlying molecular basis of phenotypic variation. Advanced populations, such as recombinant inbred lines, have been broadly adopted for QTL mapping; however, construction of large advanced generation crop populations is time-consuming and costly. The rapidly declining cost of genotyping due to recent advances in next-generation sequencing technologies has generated new possibilities for QTL mapping using large early generation populations.
Results
A set of 708 F2 progeny derived from inbreds Chang7-2 and 787 was generated and genotyped by a whole-genome low-coverage genotyping-by-sequencing method (average 0.04×). A genetic map containing 6,533 bin-markers was constructed based on the parental SNPs and a sliding-window method, spanning a total genetic distance of 1,396 cM. The high quality and accuracy of this map were validated by the identification of two well-studied genes, r1, a qualitative trait locus for color of silk (chromosome 10) and ba1 for tassel branch number (chromosome 3). Three traits of tassel and ear architecture were evaluated in this population, and a total of 10 QTL were detected using a permutation-based significance threshold, seven of which overlapped with reported QTL. Three genes (GRMZM2G316366, GRMZM2G492156 and GRMZM5G805008) encoding MADS-box domain proteins and a BTB/POZ domain protein were located in the small intervals of qTBN5 and qTBN7 (~800 Kb and 1.6 Mb in length, respectively) and may be involved in patterning of tassel architecture. The small physical intervals of most QTL indicate that high-resolution mapping is obtainable with this method.
Conclusions
We constructed an ultra-high-density linkage map for the large early generation population in maize. Our study provides an efficient approach for fast detection of quantitative loci responsible for complex trait variation with high accuracy, thus helping to dissect the underlying molecular basis of phenotypic variation and accelerate improvement of crop breeding in a cost-effective fashion.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-433) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-433
PMCID: PMC4059873  PMID: 24898122
Quantitative trait loci; Genotyping by sequencing; Next generation sequencer; Breeding; Maize
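The abstract above mentions a sliding-window method over parental SNPs but gives no parameters. The following is a minimal sketch of the general idea, assuming each informative SNP in an F2 individual has already been labelled by the parent it matches ("A" or "B"). The window size, the homozygous cutoff, and the function names are hypothetical choices for illustration, not the study's settings.

```python
from typing import List, Tuple


def call_windows(snp_alleles: str, window: int = 5, min_hom: int = 4) -> List[str]:
    """Call one genotype per sliding window of `window` consecutive SNPs.

    A window is called 'A' or 'B' when at least `min_hom` of its SNPs match
    that parent, and 'H' (heterozygous) otherwise.
    """
    calls = []
    for start in range(len(snp_alleles) - window + 1):
        chunk = snp_alleles[start:start + window]
        count_a = chunk.count("A")
        count_b = chunk.count("B")
        if count_a >= min_hom:
            calls.append("A")
        elif count_b >= min_hom:
            calls.append("B")
        else:
            calls.append("H")
    return calls


def merge_bins(calls: List[str]) -> List[Tuple[str, int]]:
    """Collapse runs of identical consecutive calls into (genotype, run_length)."""
    bins: List[Tuple[str, int]] = []
    for call in calls:
        if bins and bins[-1][0] == call:
            bins[-1] = (call, bins[-1][1] + 1)
        else:
            bins.append((call, 1))
    return bins


# Toy SNP string for one individual: a block from parent A, a transition,
# then a block from parent B (hypothetical data).
toy = "AAAAAABABBBBBB"
print(merge_bins(call_windows(toy)))

# Average genetic length per bin from the figures reported in the abstract.
mean_bin_cm = 1396 / 6533
print(f"{mean_bin_cm:.3f} cM per bin")
```

Smoothing over a window tolerates isolated miscalled SNPs, which matters at very low sequencing depth. Dividing the reported 1,396 cM by 6,533 bins gives an average of roughly 0.21 cM per bin.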
36.  Genomic characterization of Salmonella Cerro ST367, an emerging Salmonella subtype in cattle in the United States 
BMC Genomics  2014;15(1):427.
Background
Within the last decade, Salmonella enterica subsp. enterica serovar Cerro (S. Cerro) has become one of the most common serovars isolated from cattle and dairy farm environments in the northeastern US. The fact that this serovar is commonly isolated from subclinically infected cattle and is rarely associated with human disease, despite its frequent isolation from cattle, has led to the hypothesis that this emerging serovar may be characterized by reduced virulence. We applied comparative and population genomic approaches to (i) characterize the evolution of this recently emerged serovar and to (ii) gain a better understanding of genomic features that could explain some of the unique epidemiological features associated with this serovar.
Results
In addition to generating a de novo draft genome for one Salmonella Cerro strain, we also generated whole genome sequence data for 26 additional S. Cerro isolates, including 16 from cattle operations in New York (NY) state, 2 from human clinical cases from NY in 2008, and 8 from diverse animal sources (7 from Washington state and 1 from Florida). All isolates sequenced in this study represent sequence type ST367. Population genomic analysis showed that isolates from the NY cattle operations form a well-supported clade within S. Cerro ST367 (designated here “NY bovine clade”), distinct from isolates from Washington state, Florida and the human clinical cases. A molecular clock analysis indicates that the most recent common ancestor of the NY bovine clade dates back to 1998, supporting the recent emergence of this clone.
Comparative genomic analyses revealed several relevant genomic features of S. Cerro ST367 that may be responsible for reduced virulence of S. Cerro, including an insertion creating a premature stop codon in sopA. In addition, patterns of gene deletion in S. Cerro ST367 further support adaptation of this clone to a unique ecological or host-related niche.
Conclusions
Our results indicate that the increase in prevalence of S. Cerro ST367 is caused by a highly clonal subpopulation and that S. Cerro ST367 is characterized by unique genomic deletions that may indicate adaptation to specific ecological niches and possibly reduced virulence in some hosts.
Electronic supplementary material
The online version of this article (doi: 10.1186/1471-2164-15-427) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-427
PMCID: PMC4070546  PMID: 24898914
37.  Genome-wide analysis of regulatory proteases sequences identified through bioinformatics data mining in Taenia solium 
BMC Genomics  2014;15(1):428.
Background
Cysticercosis remains a major neglected tropical disease of humanity in many regions, especially in sub-Saharan Africa, Central America and elsewhere. Owing to the emerging drug resistance and the inability of current drugs to prevent re-infection, identification of novel vaccines and chemotherapeutic agents against Taenia solium and related helminth pathogens is a public health priority. The T. solium genome and the predicted proteome were reported recently, providing a wealth of information from which new interventional targets might be identified. In order to characterize and classify the entire repertoire of protease-encoding genes of T. solium, which play fundamental biological roles in all life processes, we analyzed the predicted proteins of this cestode through a combination of bioinformatics tools. Functional annotation was performed to yield insights into the signaling processes relevant to the complex developmental cycle of this tapeworm and to highlight a suite of the proteases as potential intervention targets.
Results
Within the genome of this helminth parasite, we identified 200 open reading frames encoding proteases from five clans, which correspond to 1.68% of the 11,902 protein-encoding genes predicted to be present in its genome. These proteases include calpains, cytosolic, mitochondrial signal peptidases, ubiquitylation related proteins, and others. Many not only show significant similarity to proteases in the Conserved Domain Database but have conserved active sites and catalytic domains. KEGG Automatic Annotation Server (KAAS) analysis indicated that ~60% of these proteases share strong sequence identities with proteins of the KEGG database, which are involved in human disease, metabolic pathways, genetic information processes, cellular processes, environmental information processes and organismal systems. Also, we identified signal peptides and transmembrane helices through comparative analysis with classes of important regulatory proteases. Phylogenetic analysis using Bayes approach provided support for inferring functional divergence among regulatory cysteine and serine proteases.
Conclusion
Numerous putative proteases were identified for the first time in T. solium, and important regulatory proteases have been predicted. This comprehensive analysis not only complements the growing knowledge base of proteolytic enzymes, but also provides a platform from which to expand knowledge of cestode proteases and to explore their biochemistry and potential as intervention targets.
Electronic supplementary material
The online version of this article (doi: 10.1186/1471-2164-15-428) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-428
PMCID: PMC4070553  PMID: 24899069
Proteases; Taenia solium; Drug target; Vaccine candidate antigen; Genome-wide analysis; Cysticercosis; Platyhelminth
38.  Association of purine asymmetry, strand-biased gene distribution and PolC within Firmicutes and beyond: a new appraisal 
BMC Genomics  2014;15(1):430.
Background
The Firmicutes often possess three conspicuous genome features: marked Purine Asymmetry (PAS) across two strands of replication, Strand-biased Gene Distribution (SGD) and presence of two isoforms of DNA polymerase III alpha subunit, PolC and DnaE. Despite considerable research efforts, it is not clear whether the co-existence of PAS, PolC and/or SGD is an essential and exclusive characteristic of the Firmicutes. The nature of correlations, if any, between these three features within and beyond the lineages of Firmicutes has also remained elusive. The present study has been designed to address these issues.
Results
A large-scale analysis of diverse bacterial genomes indicates that PAS, PolC and SGD are neither essential nor exclusive features of the Firmicutes. PolC prevails in four bacterial phyla: Firmicutes, Fusobacteria, Tenericutes and Thermotogae, while PAS occurs only in subsets of Firmicutes, Fusobacteria and Tenericutes. There are five major compositional trends in Firmicutes: (I) an explicit PAS or G + A-dominance along the entire leading strand (II) only G-dominance in the leading strand, (III) alternate stretches of purine-rich and pyrimidine-rich sequences, (IV) G + T dominance along the leading strand, and (V) no identifiable patterns in base usage. Presence of strong SGD has been observed not only in genomes having PAS, but also in genomes with G-dominance along their leading strands – an observation that defies the notion of co-occurrence of PAS and SGD in Firmicutes. The PolC-containing non-Firmicutes organisms often have alternate stretches of R-dominant and Y-dominant sequences along their genomes and most of them show relatively weak, but significant SGD. Firmicutes having G + A-dominance or G-dominance along LeS usually show distinct base usage patterns in three codon sites of genes. Probable molecular mechanisms that might have incurred such usage patterns have been proposed.
Conclusion
Co-occurrence of PAS, strong SGD and PolC should not be regarded as a genome signature of the Firmicutes. Presence of PAS in a species may warrant PolC and strong SGD, but PolC and/or SGD do not necessarily imply PAS.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-430) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-430
PMCID: PMC4070872  PMID: 24899249
Fusobacteria; Tenericutes; Thermotogae; G-dominance; Leading strand; Lagging strand; Mutational bias; Cytosine methylation; Codon sites; Base usage
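The compositional trends listed in the abstract above (G + A dominance, G-dominance, alternating purine-rich and pyrimidine-rich stretches) all rest on base counts along one strand. As a minimal sketch, assuming a single-strand sequence given as a plain string, the purine (G + A) fraction can be computed overall and in windows as below. The abstract does not describe how PAS was quantified, so the function names, window sizes and toy sequences here are hypothetical.

```python
from typing import List


def purine_fraction(seq: str) -> float:
    """Fraction of G + A among the unambiguous bases (A, C, G, T) of `seq`."""
    seq = seq.upper()
    purines = seq.count("G") + seq.count("A")
    pyrimidines = seq.count("C") + seq.count("T")
    total = purines + pyrimidines
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return purines / total


def windowed_purine_fraction(seq: str, window: int, step: int) -> List[float]:
    """Purine fraction in consecutive windows of length `window`,
    advancing by `step` bases each time."""
    return [
        purine_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Toy sequences (hypothetical, far shorter than any real replichore).
print(purine_fraction("GGAAGACT"))
print(windowed_purine_fraction("GGAACCTT", window=4, step=4))
```

A value above 0.5 on a strand indicates purine excess on that strand; a windowed profile that swings above and below 0.5 would correspond to alternating purine-rich and pyrimidine-rich stretches.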
39.  Identification of genes required for the survival of B. fragilis using massive parallel sequencing of a saturated transposon mutant library 
BMC Genomics  2014;15(1):429.
Background
Bacteroides fragilis is a Gram-negative anaerobe that is normally a human gut commensal; it comprises a small percentage of the gut Bacteroides but is the most frequently isolated Bacteroides from human infections. Identification of the essential genes necessary for the survival of B. fragilis provides novel information which can be exploited for the treatment of bacterial infections.
Results
Massive parallel sequencing of saturated transposon mutant libraries (two mutant pools of approximately 50,000 mutants each) was used to determine the essential genes for the growth of B. fragilis 638R on nutrient-rich medium. Among the 4326 protein-coding genes, 550 genes (12.7%) were found to be essential for the survival of B. fragilis 638R. Of the 550 essential genes, only 367 genes were assigned to a Cluster of Orthologous Genes, and about 290 genes had Kyoto Encyclopedia of Genes and Genomes orthologous members. Interestingly, genes with hypothetical functions accounted for 41.3% of essential genes (227 genes), indicating that the functions of a significant percentage of the genes used by B. fragilis 638R are still unknown. Global transcriptome analysis using RNA-Seq indicated that most of the essential genes (92%) are, in fact, transcribed in B. fragilis 638R including most of those coding for hypothetical proteins. Three hundred fifty of the 550 essential genes of B. fragilis 638R are present in the Database of Essential Genes 10.02, and 31% of those are genes included as essential genes for nine species (including Gram-positive pathogenic bacteria).
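The proportions quoted in this paragraph follow directly from the reported gene counts. A minimal Python check, using only the numbers given in the abstract (the helper name `percent` is illustrative and not from the article):

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

protein_coding = 4326   # protein-coding genes in B. fragilis 638R
essential = 550         # genes reported as essential
hypothetical = 227      # essential genes with hypothetical function

essential_pct = percent(essential, protein_coding)
hypothetical_pct = percent(hypothetical, essential)
print(essential_pct, hypothetical_pct)
```

Both values agree with the percentages stated in the text (12.7% and 41.3%).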
Conclusions
The essential gene data described in this investigation provides a valuable resource to study gene function and pathways involved in B. fragilis survival. Thorough examination of the B. fragilis-specific essential genes and genes that are shared between divergent organisms opens new research avenues that will lead to enhanced understanding of survival strategies used by bacteria in different microniches and under different stress situations.
Electronic supplementary material
The online version of this article (doi: 10.1186/1471-2164-15-429) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-429
PMCID: PMC4072883  PMID: 24899126
Bacteroides fragilis; Transposon mutants; Essential genes; Massively parallel sequencing; COG; DEG
40.  Anthocyanin biosynthetic genes in Brassica rapa 
BMC Genomics  2014;15(1):426.
Background
Anthocyanins are a group of flavonoid compounds. As a group of important secondary metabolites, they perform several key biological functions in plants. Anthocyanins also play beneficial health roles as potentially protective factors against cancer and heart disease. To elucidate the anthocyanin biosynthetic pathway in Brassica rapa, we conducted comparative genomic analyses between Arabidopsis thaliana and B. rapa on a genome-wide level.
Results
In total, we identified 73 genes in B. rapa as orthologs of 41 anthocyanin biosynthetic genes in A. thaliana. In B. rapa, the anthocyanin biosynthetic genes (ABGs) have expanded and most genes exist in more than one copy. The anthocyanin biosynthetic structural genes have expanded through whole genome and tandem duplication in B. rapa. More structural genes located upstream in the anthocyanin biosynthetic pathway have been retained than those located downstream. More negative regulatory genes are retained in the anthocyanin biosynthesis regulatory system of B. rapa.
Conclusions
These results will promote an understanding of the genetic mechanism of anthocyanin biosynthesis and will help improve the nutritional quality of B. rapa through the breeding of varieties with high anthocyanin content.
Electronic supplementary material
The online version of this article (doi: 10.1186/1471-2164-15-426) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-426
PMCID: PMC4072887  PMID: 24893600
Comparative genomics; Anthocyanin biosynthetic genes; Whole genome duplication; Brassica rapa; Cruciferae
41.  Crossroads between light response and nutrient signalling: ENV1 and PhLP1 act as mutual regulatory pair in Trichoderma reesei 
BMC Genomics  2014;15(1):425.
Background
Crosstalk between the signalling pathways responding to light–dark cycles and those triggering the adaptation of metabolism to the environment is known to occur in various organisms. This interrelationship of light response and nutrient signalling is crucial for health and fitness. The tropical ascomycete Trichoderma reesei (syn. Hypocrea jecorina) represents one of the most efficient plant cell wall degraders. Regulation of the enzymes required for this process is affected by nutritional signals as well as other environmental signals including light. Therefore we aimed to elucidate the interrelationship between nutrient and light signaling and how the light signal is transmitted to downstream pathways.
Results
We found that the targets of the light regulatory protein ENV1 in light show considerable overlap with those of the heterotrimeric G-protein components PhLP1, GNB1 and GNG1. Detailed investigation of a regulatory interrelationship of these components with ENV1 under conditions of early and late light response indicated a transcriptional mutual regulation between PhLP1 and ENV1, which appears to dampen nutrient signalling during early light response, presumably to free resources for protective measures prior to adaptation of metabolism to light. Investigating the downstream part of the cascade we found support for the hypothesis that ENV1 is necessary for cAMP mediated regulation of a considerable part of the core functions of the output pathway of this cascade, including regulation of glycoside hydrolase genes and those involved in nitrogen, sulphur and amino acid metabolism.
Conclusions
ENV1 and PhLP1 are mutual regulators connecting light signaling with nutrient signaling, with ENV1 triggering the output pathway by influencing cAMP levels.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-425) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-425
PMCID: PMC4076981  PMID: 24893562
42.  Analysis of the genetics of boar taint reveals both single SNPs and regional effects 
BMC Genomics  2014;15(1):424.
Background
Boar taint is an offensive urine or faecal-like odour, affecting the smell and taste of cooked pork from some mature non-castrated male pigs. Androstenone and skatole in fat are the molecules responsible. In most pig production systems, males that are not required for breeding are castrated shortly after birth to reduce the risk of boar taint. There is evidence for genetic variation in the predisposition to boar taint.
A genome-wide association study (GWAS) was performed to identify loci with effects on boar taint. Five hundred Danish Landrace boars with high levels of skatole in fat (>0.3 μg/g) were each matched with a litter mate with low levels of skatole and measured for androstenone. DNA from these 1,000 non-castrated boars was genotyped using the Illumina PorcineSNP60 Beadchip. After quality control, tests for SNPs associated with boar taint were performed on 938 phenotyped individuals and 44,648 SNPs. Empirical significance thresholds were set by permutation (100,000 permutations). For androstenone, a ‘regional heritability approach’ combining information from multiple SNPs was used to estimate the genetic variation attributable to individual autosomes.
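The abstract does not say how the permutation thresholds were derived. One common scheme, sketched here purely as an assumption, shuffles the phenotype, records the largest association statistic over all SNPs in each shuffle, and takes an upper quantile of those maxima as the genome-wide threshold. The statistic (absolute Pearson correlation), the function names and the toy data are all hypothetical.

```python
import random
from math import sqrt

def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP or constant trait carries no signal
    return abs(sxy) / sqrt(sxx * syy)

def permutation_threshold(genotypes, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the per-permutation maximum statistic over SNPs."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        maxima.append(max(abs_corr(snp, shuffled) for snp in genotypes))
    maxima.sort()
    index = min(int((1 - alpha) * n_perm), n_perm - 1)
    return maxima[index]

# Toy data (not from the study): 25 SNPs coded 0/1/2 in 40 animals.
toy_rng = random.Random(1)
genotypes = [[toy_rng.choice((0, 1, 2)) for _ in range(40)] for _ in range(25)]
phenotype = [toy_rng.gauss(0.0, 1.0) for _ in range(40)]
threshold = permutation_threshold(genotypes, phenotype, n_perm=200)
print(threshold)
```

Taking the maximum over SNPs within each permutation is what makes the threshold genome-wide: an observed statistic must exceed what the best SNP achieves by chance.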
Results
A highly significant association was found between variation in skatole levels and SNPs within the CYP2E1 gene on chromosome 14 (SSC14), which encodes an enzyme involved in degradation of skatole. Nominal significance was found for effects on skatole associated with 4 other SNPs including a region of SSC6 reported previously. Genome-wide significance was found for an association between SNPs on SSC5 and androstenone levels and nominal significance for associations with SNPs on SSC13 and SSC17. The regional analyses confirmed large effects on SSC5 for androstenone and suggest that SSC5 explains 23% of the genetic variation in androstenone. The autosomal heritability analyses also suggest that there is a large effect associated with androstenone on SSC2, not detected using GWAS.
Conclusions
Significant SNP associations were found for skatole on SSC14 and for androstenone on SSC5 in Landrace pigs. The study agrees with evidence that the CYP2E1 gene has effects on skatole breakdown in the liver. Autosomal heritability estimates can uncover clusters of smaller genetic effects that individually do not exceed the threshold for GWAS significance.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-424) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-424
PMCID: PMC4059876  PMID: 24894739
Boar taint; Skatole; Androstenone; Regional heritability; Genome-wide association
43.  CAP-miRSeq: a comprehensive analysis pipeline for microRNA sequencing data 
BMC Genomics  2014;15(1):423.
Background
miRNAs play a key role in normal physiology and various diseases. miRNA profiling through next generation sequencing (miRNA-seq) has become the main platform for biological research and biomarker discovery. However, analyzing miRNA sequencing data is challenging as it needs a significant amount of computational resources and bioinformatics expertise. Several web based analytical tools have been developed but they are limited to processing one or a pair of samples at a time and are not suitable for a large scale study. Lack of flexibility and reliability of these web applications are also common issues.
Results
We developed a Comprehensive Analysis Pipeline for microRNA Sequencing data (CAP-miRSeq) that integrates read pre-processing, alignment, mature/precursor/novel miRNA detection and quantification, data visualization, variant detection in miRNA coding region, and more flexible differential expression analysis between experimental conditions. Depending on the available computational infrastructure, users can install the package locally or deploy it in Amazon Cloud to run samples sequentially or in parallel, enabling speedy analysis of a large number of samples. In either case, summary and expression reports for all samples are generated for easier quality assessment and downstream analyses. Using well characterized data, we demonstrated the pipeline’s superior performance, flexibility, and practical use in research and biomarker discovery.
Conclusions
CAP-miRSeq is a powerful and flexible tool for users to process and analyze miRNA-seq data, scalable from a few to hundreds of samples. The results are presented in a convenient way for investigators or analysts to conduct further investigation and discovery.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-423) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-423
PMCID: PMC4070549  PMID: 24894665
miRNA sequencing; Analysis pipeline; Differential expression; Variant detection
44.  Expression-based network biology identifies immune-related functional modules involved in plant defense 
BMC Genomics  2014;15(1):421.
Background
Plants respond to diverse environmental cues including microbial perturbations by coordinated regulation of thousands of genes. These intricate transcriptional regulatory interactions depend on the recognition of specific promoter sequences by regulatory transcription factors. The combinatorial and cooperative action of multiple transcription factors defines a regulatory network that enables plant cells to respond to distinct biological signals. The identification of immune-related modules in large-scale transcriptional regulatory networks can reveal the mechanisms by which exposure to a pathogen elicits a precise phenotypic immune response.
Results
We have generated a large-scale immune co-expression network using a comprehensive set of Arabidopsis thaliana (hereafter Arabidopsis) transcriptomic data, which consists of a wide spectrum of immune responses to pathogens or pathogen-mimicking stimuli treatments. We employed both linear and non-linear models to generate the Arabidopsis immune co-expression regulatory (AICR) network. We computed network topological properties and ascertained that this newly constructed immune network is densely connected, possesses hubs, exhibits high modularity, and displays hallmarks of a “real” biological network. We partitioned the network and identified 156 novel modules related to immune functions. Gene Ontology (GO) enrichment analyses provided insight into the key biological processes involved in determining finely tuned immune responses. We also developed novel software called OCCEAN (One Click Cis-regulatory Elements ANalysis) to discover statistically enriched promoter elements in the upstream regulatory regions of Arabidopsis at a whole genome level. We demonstrated that OCCEAN exhibits higher precision than the existing promoter element discovery tools. In light of known and newly discovered cis-regulatory elements, we evaluated the biological significance of two key immune-related functional modules and proposed mechanism(s) to explain how large sets of diverse GO genes coherently function to mount effective immune responses.
Conclusions
We used a network-based, top-down approach to discover immune-related modules from transcriptomic data in Arabidopsis. Detailed analyses of these functional modules reveal new insight into the topological properties of immune co-expression networks and a comprehensive understanding of multifaceted plant defense responses. We present evidence that our newly developed software, OCCEAN, could become a popular tool for the Arabidopsis research community and could potentially be extended to analyze other eukaryotic genomes.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-421) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-421
PMCID: PMC4070563  PMID: 24888606
Pathogen and pathogen-mimic stimuli; Linear and non-linear models; Transcriptional regulatory network; Network topology; GO terms; OCCEAN software; Immune-related functional modules
45.  Comparative genomics of Bradyrhizobium japonicum CPAC 15 and Bradyrhizobium diazoefficiens CPAC 7: elite model strains for understanding symbiotic performance with soybean 
BMC Genomics  2014;15(1):420.
Background
The soybean-Bradyrhizobium symbiosis can be highly efficient in fixing nitrogen, but few genomic sequences of elite inoculant strains are available. Here we contribute information on the genomes of two commercial strains that are broadly applied to soybean crops in the tropics. B. japonicum CPAC 15 (=SEMIA 5079) is outstanding in its saprophytic capacity and competitiveness, whereas B. diazoefficiens CPAC 7 (=SEMIA 5080) is known for its high efficiency in fixing nitrogen. Both are well adapted to tropical soils. The genomes of CPAC 15 and CPAC 7 were compared to each other and also to those of B. japonicum USDA 6T and B. diazoefficiens USDA 110T.
Results
Differences in genome size were found between species, with B. japonicum having larger genomes than B. diazoefficiens. Although most of the four genomes were syntenic, genome rearrangements within and between species were observed, including events in the symbiosis island. In addition to the symbiotic region, several genomic islands were identified. Altogether, these features must confer high genomic plasticity that might explain adaptation and differences in symbiotic performance. It was not possible to attribute known functions to half of the predicted genes. About 10% of each genome was composed of genes exclusive to that strain, but up to 98% of them were of unknown function or coded for mobile genetic elements. In CPAC 15, more genes were associated with secondary metabolites, nutrient transport, iron-acquisition and IAA metabolism, potentially correlated with higher saprophytic capacity and competitiveness than seen with CPAC 7. In CPAC 7, more genes were related to the metabolism of amino acids and hydrogen uptake, potentially correlated with higher efficiency of nitrogen fixation than seen with CPAC 15.
Conclusions
Several differences and similarities detected between the two elite soybean-inoculant strains and between the two species of Bradyrhizobium provide new insights into adaptation to tropical soils, efficiency of N2 fixation, nodulation and competitiveness.
Electronic supplementary material
The online version of this article (doi: 10.1186/1471-2164-15-420) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-420
PMCID: PMC4070871  PMID: 24888481
Symbiosis; Nodulation; Nitrogen fixation; Competitiveness; Secretion systems; Horizontal gene transfer; Membrane transporters; Surface polysaccharides; Secondary metabolism; Phytohormone synthesis
46.  Direct observation of genomic heterogeneity through local haplotyping analysis 
BMC Genomics  2014;15(1):418.
Background
It has been an abiding belief among geneticists that multicellular organisms’ genomes can be analyzed under the assumption that a single individual has a uniform genome in all its cells. Despite some evidence to the contrary, this belief has been used as an axiomatic assumption in most genome analysis software packages. In this paper we present observations in human whole genome data, human whole exome data and in mouse whole genome data to challenge this assumption. We show that heterogeneity is in fact ubiquitous and readily observable in ordinary Next Generation Sequencing (NGS) data.
Results
Starting with the assumption that a single NGS read (or read pair) must come from one haplotype, we built a procedure for directly observing haplotypes at a local level by examining 2 or 3 adjacent single nucleotide polymorphisms (SNPs) which are close enough on the genome to be spanned by individual reads. We applied this procedure to NGS data from three different sources: whole genome of a Central European trio from the 1000 genomes project, whole genome data from laboratory-bred strains of mouse, and whole exome data from a set of patients with head and neck tumors. Thousands of loci were found in each genome where reads spanning 2 or 3 SNPs displayed more than two haplotypes, indicating that the locus is heterogeneous. We show that such loci are ubiquitous in the genome and cannot be explained by segmental duplications. We explain them on the basis of cellular heterogeneity at the genomic level. Such heterogeneous loci were found in all normal and tumor genomes examined.
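The idea described in this paragraph can be illustrated with a small Python sketch: collect the allele combination each spanning read shows at 2 or 3 adjacent SNPs, then flag the locus if more than two distinct combinations appear. This is not the authors' implementation. The read representation, the function names and the `min_support` filter against sequencing errors are assumptions made for illustration.

```python
from collections import Counter

def local_haplotypes(reads, snp_positions, min_support=2):
    """Count haplotypes seen in reads that cover every listed SNP position.

    reads: list of dicts mapping genomic position -> base call for one read
           (or read pair); a hypothetical, simplified representation.
    min_support: hypothetical filter; haplotypes seen in fewer reads are
           dropped as likely sequencing errors.
    """
    counts = Counter(
        tuple(read[pos] for pos in snp_positions)
        for read in reads
        if all(pos in read for pos in snp_positions)
    )
    return {hap: n for hap, n in counts.items() if n >= min_support}

def is_heterogeneous(reads, snp_positions, ploidy=2, min_support=2):
    """Flag a locus showing more haplotypes than the ploidy allows."""
    return len(local_haplotypes(reads, snp_positions, min_support)) > ploidy

# Toy reads over two SNPs at positions 100 and 130 (made-up data).
reads = (
    [{100: "A", 130: "C"}] * 3
    + [{100: "G", 130: "T"}] * 3
    + [{100: "A", 130: "T"}] * 2   # a third, well-supported haplotype
    + [{100: "G", 130: "C"}]       # single read: filtered out
    + [{100: "A"}]                 # does not span both SNPs: ignored
)
haps = local_haplotypes(reads, [100, 130])
print(haps, is_heterogeneous(reads, [100, 130]))
```

Only reads covering every SNP in the window contribute, which mirrors the stated assumption that a single read or read pair comes from one haplotype.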
Conclusions
Our results highlight the need for new methods to analyze genomic variation because existing ones do not systematically consider local haplotypes. Identification of cancer somatic mutations is complicated because of tumor heterogeneity. It is further complicated if, as we show, normal tissues are also heterogeneous. Methods for biomarker discovery must consider contextual haplotype information rather than just whether a variant “is present”.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-418) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-418
PMCID: PMC4053652  PMID: 24888354
47.  Comparison of RNA-Seq by poly (A) capture, ribosomal RNA depletion, and DNA microarray for expression profiling 
BMC Genomics  2014;15(1):419.
Background
RNA sequencing (RNA-Seq) is often used for transcriptome profiling as well as the identification of novel transcripts and alternative splicing events. Typically, RNA-Seq libraries are prepared from total RNA using poly(A) enrichment of the mRNA (mRNA-Seq) to remove ribosomal RNA (rRNA); however, this method fails to capture non-poly(A) transcripts or partially degraded mRNAs. Hence, an mRNA-Seq protocol will not be compatible for use with RNAs coming from Formalin-Fixed and Paraffin-Embedded (FFPE) samples.
Results
To address the desire to perform RNA-Seq on FFPE materials, we evaluated two different library preparation protocols that could be compatible for use with small RNA fragments. We obtained paired Fresh Frozen (FF) and FFPE RNAs from multiple tumors and subjected these to different gene expression profiling methods. We tested 11 human breast tumor samples using: (a) FF RNAs by microarray, mRNA-Seq, Ribo-Zero-Seq and DSN-Seq (Duplex-Specific Nuclease) and (b) FFPE RNAs by Ribo-Zero-Seq and DSN-Seq. We also performed these different RNA-Seq protocols using 10 TCGA tumors as a validation set.
The data from paired RNA samples showed high concordance in transcript quantification across all protocols and between FF and FFPE RNAs. In both FF and FFPE, Ribo-Zero-Seq removed rRNA with efficiency comparable to that of mRNA-Seq, and it provided equivalent or less biased coverage of gene 3′ ends. Compared to mRNA-Seq where 69% of bases were mapped to the transcriptome, DSN-Seq and Ribo-Zero-Seq contained significantly fewer reads mapping to the transcriptome (20–30%); in these RNA-Seq protocols, many if not most reads mapped to intronic regions. Approximately 14 million reads in mRNA-Seq and 45–65 million reads in Ribo-Zero-Seq or DSN-Seq were required to achieve the same gene detection levels as a standard Agilent DNA microarray.
Conclusions
Our results demonstrate that compared to mRNA-Seq and microarrays, Ribo-Zero-Seq provides equivalent rRNA removal efficiency, coverage uniformity, genome-based mapped reads, and consistently high quality quantification of transcripts. Moreover, Ribo-Zero-Seq and DSN-Seq have consistent transcript quantification using FFPE RNAs, suggesting that RNA-Seq can be used with FFPE-derived RNAs for gene expression profiling.
Electronic supplementary material
The online version of this article (doi: 10.1186/1471-2164-15-419) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-419
PMCID: PMC4070569  PMID: 24888378
RNA sequencing; FFPE; RNA depletion; Ribo-zero; Gene expression; Microarray
48.  De novo assembly and transcriptome characterization: novel insights into the natural resistance mechanisms of Microtus fortis against Schistosoma japonicum 
BMC Genomics  2014;15(1):417.
Background
Microtus fortis is a non-permissive host of Schistosoma japonicum. It has natural resistance against schistosomes, although the precise resistance mechanisms remain unclear. The paucity of genetic information for M. fortis limits the use of available immunological methods. Thus, studies based on high-throughput sequencing technologies are required to obtain information about resistance mechanisms against S. japonicum.
Results
Using Illumina single-end technology, a de novo assembly of the M. fortis transcriptome produced 67,751 unigenes with an average length of 868 nucleotides. Comparisons were made between M. fortis before and after infection with S. japonicum using RNA-seq quantification analysis. The highest number of differentially expressed genes (DEGs) occurred two weeks after infection, and the highest number of down-regulated DEGs occurred three weeks after infection. Simultaneously, the strongest pathological changes in the liver were observed at week two. Gene ontology terms and pathways related to the DEGs revealed that up-regulated transcripts were involved in metabolism, immunity and inflammatory responses. Quantitative real-time PCR analysis showed that patterns of gene expression were consistent with RNA-seq results.
Conclusions
After infection with S. japonicum, a defensive reaction in M. fortis commenced rapidly, increasing dramatically in the second week, and gradually decreasing three weeks after infection. The obtained M. fortis transcriptome and DEGs profile data demonstrated that natural and adaptive immune responses play an important role in M. fortis immunity to S. japonicum. These findings provide a better understanding of the natural resistance mechanisms of M. fortis against schistosomes.
Electronic supplementary material
The online version of this article (doi: 10.1186/1471-2164-15-417) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-417
PMCID: PMC4073500  PMID: 24886088
Microtus fortis; Schistosoma japonicum; Non-permissive host; RNA-seq
49.  Stability of gene expression and epigenetic profiles highlights the utility of patient-derived paediatric acute lymphoblastic leukaemia xenografts for investigating molecular mechanisms of drug resistance 
BMC Genomics  2014;15(1):416.
Background
Patient-derived tumour xenografts are an attractive model for preclinical testing of anti-cancer drugs. Insights into tumour biology and biomarkers predictive of responses to chemotherapeutic drugs can also be gained from investigating xenograft models. As a first step towards examining the equivalence of epigenetic profiles between xenografts and primary tumours in paediatric leukaemia, we performed genome-scale DNA methylation and gene expression profiling on a panel of 10 paediatric B-cell precursor acute lymphoblastic leukaemia (BCP-ALL) tumours that were stratified by prednisolone response.
Results
We found high correlations in DNA methylation and gene expression profiles between matching primary and xenograft tumour samples with Pearson’s correlation coefficients ranging between 0.85 and 0.98. In order to demonstrate the potential utility of epigenetic analyses in BCP-ALL xenografts, we identified DNA methylation biomarkers that correlated with prednisolone responsiveness of the original tumour samples. Differential methylation of CAPS2, ARHGAP21, ARX and HOXB6 was confirmed by locus specific analysis. We identified 20 genes showing an inverse relationship between DNA methylation and gene expression in association with prednisolone response. Pathway analysis of these genes implicated apoptosis, cell signalling and cell structure networks in prednisolone responsiveness.
Conclusions
The findings of this study confirm the stability of epigenetic and gene expression profiles of paediatric BCP-ALL propagated in mouse xenograft models. Further, our preliminary investigation of prednisolone sensitivity highlights the utility of mouse xenograft models for preclinical development of novel drug regimens with parallel investigation of underlying gene expression and epigenetic responses associated with novel drug responses.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-416) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-416
PMCID: PMC4057609  PMID: 24885906
Acute lymphoblastic leukaemia; Xenografts; Genome-wide DNA methylation; Microarray analysis of gene expression; Glucocorticoid resistance
50.  Disruption of Mycobacterium avium subsp. paratuberculosis-specific genes impairs in vivo fitness 
BMC Genomics  2014;15(1):415.
Background
Mycobacterium avium subsp. paratuberculosis (MAP) is an obligate intracellular pathogen that infects many ruminant species. The acquisition of foreign genes via horizontal gene transfer has been postulated to contribute to its pathogenesis, as these genetic elements are absent from its putative ancestor, M. avium subsp. hominissuis (MAH), an environmental organism with lesser pathogenicity. In this study, high-throughput sequencing of MAP transposon libraries was used to qualitatively and quantitatively determine the contribution of individual genes to bacterial survival during infection.
Results
Out of 52384 TA dinucleotides present in the MAP K-10 genome, 12607 had a MycoMarT7 transposon in the input pool, interrupting 2443 of the 4350 genes in the MAP genome (56%). Of 96 genes situated in MAP-specific genomic islands, 82 were disrupted in the input pool, indicating that MAP-specific genomic regions are dispensable for in vitro growth (odds ratio = 0.21). Following 5 independent in vivo infections with this pool of mutants, the correlation between output pools was high for 4 of 5 (R = 0.49 to 0.61) enabling us to define genes whose disruption reproducibly reduced bacterial fitness in vivo. At three different thresholds for reduced fitness in vivo, MAP-specific genes were over-represented in the list of predicted essential genes. We also identified additional genes that were severely depleted after infection, and several of them have orthologues that are essential genes in M. tuberculosis.
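The abstract does not state how the odds ratio was computed. One reading that reproduces the reported value, offered here as an assumption, is a 2×2 table comparing genes left undisrupted versus disrupted in the input pool, for MAP-specific island genes versus all other genes. The variable and function names are illustrative.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio (a/b) / (c/d) for a 2x2 table."""
    return (a / b) / (c / d)

# Counts taken from the abstract.
total_genes, total_disrupted = 4350, 2443
island_genes, island_disrupted = 96, 82

# Derived cells of the assumed 2x2 table.
island_intact = island_genes - island_disrupted                 # not hit
other_disrupted = total_disrupted - island_disrupted
other_intact = (total_genes - island_genes) - other_disrupted

# Odds of a gene being left undisrupted: island genes vs. all other genes.
or_intact = odds_ratio(island_intact, island_disrupted,
                       other_intact, other_disrupted)
disrupted_pct = round(100.0 * total_disrupted / total_genes)
print(round(or_intact, 2), disrupted_pct)
```

Under this reading, an odds ratio well below 1 means island genes were less likely than other genes to escape transposon insertion, which fits the statement that these regions are dispensable for in vitro growth.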
Conclusions
This work indicates that the genetic elements required for the in vivo survival of MAP represent a combination of conserved mycobacterial virulence genes and MAP-specific genes acquired via horizontal gene transfer. In addition, the in vitro and in vivo essential genes identified in this study may be further characterized to offer a better understanding of MAP pathogenesis, and potentially contribute to the discovery of novel therapeutic and vaccine targets.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-415) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-415
PMCID: PMC4058006  PMID: 24885784
Mycobacterium avium; M. avium subsp. paratuberculosis; Transposon insertion sequencing; Horizontal gene transfer; Mycobacterial pathogenesis
