1.  Identification of sex-linked SNP markers using RAD sequencing suggests ZW/ZZ sex determination in Pistacia vera L. 
BMC Genomics  2015;16(1):98.
Pistachio (Pistacia vera L.) is a dioecious species that has a long juvenility period. Therefore, development of marker-assisted selection (MAS) techniques would greatly facilitate pistachio cultivar-breeding programs. The sex determination mechanism is presently unknown in pistachio. The generation of sex-linked markers is likely to reduce time, labor, and costs associated with breeding programs, and will help to clarify the sex determination system in pistachio.
Restriction site-associated DNA (RAD) markers were used to identify sex-linked markers and to elucidate the sex determination system in pistachio. Eight male and eight female F1 progenies from a Pistacia vera L. Siirt × Bağyolu cross, along with the parents, were subjected to RAD sequencing in two lanes of a Hi-Seq 2000 sequencing platform. This generated 449 million reads, comprising approximately 37.7 Gb of sequences. There were 33,757 polymorphic single nucleotide polymorphism (SNP) loci between the parents. Thirty-eight of these, from 28 RAD reads, were detected as putative sex-associated loci in pistachio. Validation was performed by SNaPshot analysis in 42 mature F1 progenies and in 124 cultivars and genotypes in a germplasm collection. Eight loci could distinguish sex with 100% accuracy in pistachio. To ascertain cost-effective application of markers in a breeding program, high-resolution melting (HRM) analysis was performed; four markers were found to perfectly separate sexes in pistachio. Because of the female heterogamety in all candidate SNP loci, we report for the first time that pistachio has a ZZ/ZW sex determination system. As the reported female-to-male segregation ratio is 1:1 in all known segregating populations and there is no previous report of super-female genotypes or female heteromorphic chromosomes in pistachio, it appears that the WW genotype is not viable.
Sex-linked SNP markers were identified and validated in a large germplasm and proved their suitability for MAS in pistachio. HRM analysis successfully validated the sex-linked markers for MAS. For the first time in dioecious pistachio, a female heterogamety ZW/ZZ sex determination system is suggested.
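The screening logic behind female heterogamety can be illustrated with a small sketch. Under a ZW/ZZ system, a fully sex-linked SNP would be expected to be heterozygous in every female and homozygous for one shared allele in every male. The function name, the genotype encoding (two-letter strings) and the toy loci below are assumptions made for illustration only; they are not the authors' pipeline or data.

```python
def is_zw_sex_linked(female_genotypes, male_genotypes):
    """Return True if a locus fits a simple ZW/ZZ pattern.

    Assumed encoding (hypothetical): each genotype is a two-character
    string such as "AG". The pattern tested is: every female is
    heterozygous and all males are homozygous for one shared allele.
    """
    if not female_genotypes or not male_genotypes:
        return False
    # Every female must carry two different alleles.
    if not all(g[0] != g[1] for g in female_genotypes):
        return False
    # Across all males only a single allele may be observed.
    male_alleles = {allele for g in male_genotypes for allele in g}
    return len(male_alleles) == 1


# Toy data, invented for illustration: locus -> (females, males).
loci = {
    "locus_A": (["AG", "AG", "GA"], ["AA", "AA"]),
    "locus_B": (["AG", "AA"], ["AA", "AG"]),
}

candidates = [name for name, (f, m) in loci.items() if is_zw_sex_linked(f, m)]
print(candidates)
```

A real screen would also need to handle missing calls and genotyping error, which is one reason the abstract reports a separate validation step in a larger panel.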
PMCID: PMC4336685
Pistachio; Pistacia vera; Sex determination; RADseq; SNP; SNaPshot; HRM
2.  Sequence and analysis of a whole genome from Kuwaiti population subgroup of Persian ancestry 
BMC Genomics  2015;16(1):92.
The 1000 Genomes Project paved the way for sequencing diverse human populations. New genome projects are being established to sequence underrepresented populations, helping to characterize human genetic diversity. The Kuwait Genome Project is an initiative to sequence individual genomes from the three subgroups of the Kuwaiti population, namely Saudi Arabian tribe; “tent-dwelling” Bedouin; and Persian, which trace their ancestry to different regions of the Arabian Peninsula and to modern-day Iran (West Asia). These subgroups are in line with settlement history and are confirmed by genetic studies. In this work, we report the whole genome sequence of a Kuwaiti native from the Persian subgroup at >37X coverage.
We document 3,573,824 SNPs, 404,090 insertions/deletions, and 11,138 structural variations. Out of the reported SNPs and indels, 85,939 are novel. We identify 295 ‘loss-of-function’ and 2,314 ‘deleterious’ coding variants, some of which carry homozygous genotypes in the sequenced genome; the associated phenotypes include pharmacogenomic traits such as greater triglyceride-lowering ability with fenofibrate treatment, and the requirement of a high warfarin dosage to elicit an anticoagulation response. 6,328 non-coding SNPs associate with 811 phenotype traits: in congruence with the medical history of the participant for Type 2 diabetes and β-Thalassemia, and of the participant’s family for migraine, 72 (of 159 known) Type 2 diabetes, 3 (of 4) β-Thalassemia, and 76 (of 169) migraine variants are seen in the genome. Intergenome comparisons based on shared disease-causing variants position the sequenced genome between Asian and European genomes, in congruence with the geographical location of the region. On comparison, bead arrays perform better than sequencing platforms in correctly calling genotypes in low-coverage sequenced genome regions; however, a novel SNP or indel near the genotype-calling position can lead to false calls with bead arrays.
We report, for the first time, a reference genome resource for the population of Persian ancestry. The resource provides a starting point for designing large-scale genetic studies in the Peninsula, including Kuwait, and in the Persian population. Such efforts on populations under-represented in global genome variation surveys help augment current knowledge on human genome diversity.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1233-x) contains supplementary material, which is available to authorized users.
PMCID: PMC4336699
Persian genome; Personal genome; Whole genome sequencing; Kuwaiti population; Arabian Peninsula
3.  Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution 
BMC Genomics  2015;16(1):87.
Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, relationships among sequence, conservation, and function are still poorly understood.
We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA).
Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence on the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequence shows binding activity on the other species for human and mouse respectively. However, 79.87% and 69.22% is repurposed such that it binds the same TF in different cells or different TFs in the same cells. Remarkably, within the set of repurposed TFos, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest exaptation of some functional regulatory sequences into new function.
Despite TFos repurposing, we did not find substantial changes in their predicted target genes, suggesting that CRMs buffer evolutionary events allowing little or no change in the TFos – target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions.
We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse using WGA. A comparative analysis of this correspondence unveiled the extent of the shared regulatory sequence across TFs and cell types under study. Importantly, a large part of the shared regulatory sequence is repurposed on the other species. This sequence, fueled by turnover events, provides a strong case for exaptation in regulatory elements.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1245-6) contains supplementary material, which is available to authorized users.
PMCID: PMC4333152
Mouse ENCODE; Regulatory sequences; Comparative genomics
4.  Characterization of a novel RXR receptor in the salmon louse (Lepeophtheirus salmonis, Copepoda) regulating growth and female reproduction 
BMC Genomics  2015;16(1):81.
Nuclear receptors have crucial roles in all metazoan animals as regulators of gene transcription. A wide range of studies have elucidated molecular and biological significance of nuclear receptors but there are still a large number of animals where the knowledge is very limited. In the present study we have identified an RXR type of nuclear receptor in the salmon louse (Lepeophtheirus salmonis) (i.e. LsRXR). RXR is one of the two partners of the Ecdysteroid receptor in arthropods, the receptor for the main molting hormone 20-hydroxyecdysone (E20) with a wide array of effects in arthropods.
Five different LsRXR transcripts, identified by RACE, showed large differences in domain structure. The largest isoforms contained a complete DNA binding domain (DBD) and ligand binding domain (LBD), whereas some variants had an incomplete DBD or none. LsRXR is transcribed in several tissues in the salmon louse, including ovary, subcuticular tissue, intestine and glands. Q-PCR showed that LsRXR mRNA levels vary throughout the L. salmonis life cycle. We also show that the truncated LsRXR transcript comprises about 50% in all examined samples. We used RNAi to knock down transcription in adult reproducing female lice. This resulted in close to zero viable offspring. We also assessed the LsRXR RNAi effects using an L. salmonis microarray and saw significant effects on transcription in the female lice. Transcription of the major yolk proteins was strongly reduced by knock-down of LsRXR. Genes involved in lipid metabolism and transport were also down-regulated. Furthermore, different types of growth processes were up-regulated, and many cuticle proteins were present in this group.
The present study demonstrates the significance of LsRXR in adult female L. salmonis and discusses the functional aspects in relation to other arthropods. LsRXR has a unique structure that should be elucidated in the future.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1277-y) contains supplementary material, which is available to authorized users.
PMCID: PMC4333900
Ultraspiracle; Retinoid X receptor; Sea louse; Copepod; Atlantic salmon; RNAi; Microarray
5.  Genome-wide comparison of PU.1 and Spi-B binding sites in a mouse B lymphoma cell line 
BMC Genomics  2015;16(1):76.
Spi-B and PU.1 are highly related members of the E26-transformation-specific (ETS) family of transcription factors that have similar, but not identical, roles in B cell development. PU.1 and Spi-B are both expressed in B cells, and have been demonstrated to redundantly activate transcription of genes required for B cell differentiation and function. It was hypothesized that Spi-B and PU.1 occupy a similar set of regions within the genome of a B lymphoma cell line.
To compare binding regions of Spi-B and PU.1, murine WEHI-279 lymphoma cells were infected with retroviral vectors encoding 3XFLAG-tagged PU.1 or Spi-B. Anti-FLAG chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq) was performed. Analysis for high-stringency enriched genomic regions demonstrated that PU.1 occupied 4528 regions and Spi-B occupied 3360 regions. The majority of regions occupied by Spi-B were also occupied by PU.1. Regions bound by Spi-B and PU.1 were frequently located immediately upstream of genes associated with immune response and activation of B cells. Motif-finding revealed that both transcription factors were predominantly located at the ETS core domain (GGAA), however, other unique motifs were identified when examining regions associated with only one of the two factors. Motifs associated with unique PU.1 binding included POU2F2, while unique motifs in the Spi-B regions contained a combined ETS-IRF motif.
Our results suggest that complementary biological functions of PU.1 and Spi-B may be explained by their interaction with a similar set of regions in the genome of B cells. However, sites uniquely occupied by PU.1 or Spi-B provide insight into their unique functions.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1303-0) contains supplementary material, which is available to authorized users.
PMCID: PMC4334403
PU.1; Spi-B; ChIP-seq; Motif; B cell; Gene regulation
6.  Comparative genome and transcriptome analyses of the social amoeba Acytostelium subglobosum that accomplishes multicellular development without germ-soma differentiation 
BMC Genomics  2015;16(1):80.
Social amoebae are lower eukaryotes that inhabit the soil. They are characterized by the construction of a starvation-induced multicellular fruiting body with a spore ball and supportive stalk. In most species, the stalk is filled with motile stalk cells, as represented by the model organism Dictyostelium discoideum, whose developmental mechanisms have been well characterized. However, in the genus Acytostelium, the stalk is acellular and all aggregated cells become spores. Phylogenetic analyses have shown that it is not an ancestral genus but has lost the ability to undergo cell differentiation.
We performed genome and transcriptome analyses of Acytostelium subglobosum and compared our findings to other available dictyostelid genome data. Although A. subglobosum adopts a qualitatively different developmental program from other dictyostelids, its gene repertoire was largely conserved. Yet, families of polyketide synthase and extracellular matrix proteins have not expanded and a serine protease and ABC transporter B family gene, tagA, and a few other developmental genes are missing in the A. subglobosum lineage. Temporal gene expression patterns are astonishingly dissimilar from those of D. discoideum, and only a limited fraction of the ortholog pairs shared the same expression patterns, so that some signaling cascades for development seem to be disabled in A. subglobosum.
The absence of the ability to undergo cell differentiation in Acytostelium is accompanied by a small change in coding potential and extensive alterations in gene expression patterns.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1278-x) contains supplementary material, which is available to authorized users.
PMCID: PMC4334915
Multicellular development; Cell differentiation; Signaling cascade; Gene expression; Evolution
7.  Transcriptome profiling of the dynamic life cycle of the scyphozoan jellyfish Aurelia aurita 
BMC Genomics  2015;16(1):74.
The moon jellyfish Aurelia aurita is a widespread scyphozoan species that forms large seasonal blooms. Here we provide the first comprehensive view of the entire complex life of the Aurelia Red Sea strain by employing transcriptomic profiling of each stage from planula to mature medusa.
A de novo transcriptome was assembled from Illumina RNA-Seq data generated from six stages throughout the Aurelia life cycle. Transcript expression profiling yielded clusters of annotated transcripts with functions related to each specific life-cycle stage. Free-swimming planulae were found highly enriched for functions related to cilia and microtubules, and the drastic morphogenetic process undergone by the planula while establishing the future body of the polyp may be mediated by specifically expressed Wnt ligands. Specific transcripts related to sensory functions were found in the strobila and the ephyra, whereas extracellular matrix functions were enriched in the medusa due to high expression of transcripts such as collagen, fibrillin and laminin, presumably involved in mesoglea development. The CL390-like gene, suggested to act as a strobilation hormone, was also highly expressed in the advanced strobila of the Red Sea species, and in the medusa stage we identified betaine-homocysteine methyltransferase, an enzyme that may play an important part in maintaining equilibrium of the medusa’s bell. Finally, we identified the transcription factors participating in the Aurelia life-cycle and found that 70% of these 487 identified transcription factors were expressed in a developmental-stage-specific manner.
This study provides the first scyphozoan transcriptome covering the entire developmental trajectory of the life cycle of Aurelia. It highlights the importance of numerous stage-specific transcription factors in driving morphological and functional changes throughout this complex metamorphosis, and is expected to be a valuable resource to the community.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1320-z) contains supplementary material, which is available to authorized users.
PMCID: PMC4334923
Aurelia aurita; Jellyfish; Scyphozoa; Transcriptomics; Life-cycle stages
8.  Next generation sequencing analysis reveals that the ribonucleases RNase II, RNase R and PNPase affect bacterial motility and biofilm formation in E. coli 
BMC Genomics  2015;16(1):72.
The RNA steady-state levels in the cell are a balance between synthesis and degradation rates. Although transcription is important, RNA processing and turnover are also key factors in the regulation of gene expression. In Escherichia coli there are three main exoribonucleases (RNase II, RNase R and PNPase) involved in RNA degradation. Although there are many studies about these exoribonucleases not much is known about their global effect in the transcriptome.
To study the effects of the exoribonucleases on the transcriptome, we sequenced total RNA (RNA-Seq) from wild-type cells and from mutants for each of the exoribonucleases (∆rnb, ∆rnr and ∆pnp). We compared each mutant transcriptome with the wild-type to determine the global effects of deleting each exoribonuclease in exponential phase. Deletion of RNase II significantly affected 187 transcripts, deletion of RNase R affected 202 transcripts, and deletion of PNPase affected 226 transcripts. Surprisingly, many of the transcripts are actually down-regulated in the exoribonuclease mutants when compared to the wild-type control. The transcriptomic analysis indicated that these enzymes change the expression of genes related to flagellum assembly, motility and biofilm formation. The three exoribonucleases affected some stable RNAs, but PNPase was the main exoribonuclease affecting this class of RNAs. We confirmed by qPCR some fold-change values obtained from the RNA-Seq data. We also observed that all the exoribonuclease mutants were significantly less motile than the wild-type cells. Additionally, RNase II and RNase R mutants were shown to produce more biofilm than the wild-type control, while the PNPase mutant did not form biofilms.
In this work we demonstrate how deep sequencing can be used to discover new and relevant functions of the exoribonucleases. We were able to obtain valuable information about the transcripts affected by each of the exoribonucleases and compare the roles of the three enzymes. Our results show that the three exoribonucleases affect cell motility and biofilm formation, two very important factors for cell survival, especially for pathogenic cells.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1237-6) contains supplementary material, which is available to authorized users.
PMCID: PMC4335698
Exoribonucleases; RNase II; RNase R; PNPase; Transcriptome; RNA-Seq; Motility; Biofilm formation
9.  Simple regression for correcting ΔCt bias in RT-qPCR low-density array data normalization 
BMC Genomics  2015;16(1):82.
Reverse transcription quantitative PCR (RT-qPCR) is considered the gold standard for quantifying relative gene expression. Normalization of RT-qPCR data is commonly achieved by subtracting the Ct values of the internal reference genes from the Ct values of the target genes to obtain ΔCt. ΔCt values are then used to derive ΔΔCt when compared to a control group or to conduct further statistical analysis.
We examined two rheumatoid arthritis RT-qPCR low density array datasets and found that this normalization method introduces substantial bias due to differences in PCR amplification efficiency among genes. This bias results in undesirable correlations between target genes and reference genes, which affect the estimation of fold changes and the tests for differentially expressed genes. Similar biases were also found in multiple public mRNA and miRNA RT-qPCR array datasets we analysed. We propose to regress the Ct values of the target genes onto those of the reference genes to obtain regression coefficients, which are then used to adjust the reference gene Ct values before calculating ΔCt.
The per-gene regression method effectively removes the ΔCt bias. This method can be applied to both low density RT-qPCR arrays and individual RT-qPCR assays.
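A minimal sketch of the per-gene regression idea follows, assuming the simplest reading of the abstract: the target-gene Ct is regressed on the reference-gene Ct by ordinary least squares, and the fitted slope scales the reference Ct before subtraction. The function names and the toy Ct values are assumptions for illustration, and the exact adjustment used in the article may differ.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x (single predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def standard_delta_ct(target_ct, ref_ct):
    """Conventional normalization: target Ct minus reference Ct."""
    return [t - r for t, r in zip(target_ct, ref_ct)]


def adjusted_delta_ct(target_ct, ref_ct):
    """Scale the reference Ct by the per-gene slope before subtracting."""
    slope = ols_slope(ref_ct, target_ct)
    return [t - slope * r for t, r in zip(target_ct, ref_ct)]


# Toy values, invented for illustration: the target gene changes by
# 1.5 cycles for every 1 cycle of change in the reference gene.
ref = [20.0, 21.0, 22.0, 23.0]
target = [25.0, 26.5, 28.0, 29.5]

print(standard_delta_ct(target, ref))  # still trends with the reference Ct
print(adjusted_delta_ct(target, ref))  # trend with the reference Ct removed
```

In this toy case the conventional ΔCt rises with the reference Ct, which is the kind of unwanted correlation the abstract describes, while the adjusted values are constant across samples.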
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1274-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4335788
RT-PCR; Normalization; ΔCt; Housekeeping genes; Regression
10.  De novo assembly and characterization of transcriptomes of early-stage fruit from two genotypes of Annona squamosa L. with contrast in seed number 
BMC Genomics  2015;16(1):86.
Annona squamosa L., a popular fruit tree, is the most widely cultivated species of the genus Annona. The lack of transcriptomic and genomic information limits the scope of genome investigations in this important shrub. It bears aggregate fruits with numerous seeds. A few rare accessions with very few seeds have been reported for Annona. A massive pyrosequencing (Roche, 454 GS FLX+) of transcriptome from early stages of fruit development (0, 4, 8 and 12 days after pollination) was performed to produce expression datasets in two genotypes, Sitaphal and NMK-1, that show a contrast in the number of seeds set in fruits. The data reported here is the first source of genome-wide differential transcriptome sequence in two genotypes of A. squamosa, and identifies several candidate genes related to seed development.
Approximately 1.9 million high-quality clean reads were obtained in the cDNA library from the developing fruits of both genotypes, with an average length of about 568 bp. Quality reads were assembled de novo into 2074 to 11004 contigs in the developing fruit samples at different stages of development. The contig sequence data of all four stages of each genotype were combined into larger units, resulting in 14921 (Sitaphal) and 14178 (NMK-1) unigenes, with a mean size of more than 1 Kb. Assembled unigenes were functionally annotated by querying against the protein sequences of five different public databases (NCBI non-redundant, Prunus persica, Vitis vinifera, Fragaria vesca, and Amborella trichopoda), with an E-value cut-off of 10⁻⁵. A total of 4588 (Sitaphal) and 2502 (NMK-1) unigenes did not match any known protein in the NR database. These sequences could be genes specific to Annona sp. or belong to untranslated regions. Several of the unigenes representing pathways related to primary and secondary metabolism, and to seed and fruit development, were expressed at a higher level in Sitaphal, the densely seeded cultivar, than in the poorly seeded NMK-1. A total of 2629 (Sitaphal) and 3445 (NMK-1) Simple Sequence Repeat (SSR) motifs were identified in the two genotypes, respectively. These could be potential candidates for transcript-based microsatellite analysis in A. squamosa.
The present work provides early-stage fruit specific transcriptome sequence resource for A. squamosa. This repository will serve as a useful resource for investigating the molecular mechanisms of fruit development, and improvement of fruit related traits in A. squamosa and related species.
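As an illustration of how SSR motifs can be located in assembled sequences, the sketch below scans a sequence with a regular expression. The thresholds (motifs of 2 to 6 bp, at least four tandem copies) and the function name are assumptions chosen for the example; the abstract does not state which tool or cut-offs the authors used.

```python
import re

# Assumed thresholds: a 2-6 bp motif followed by at least three more
# copies of itself, i.e. four or more tandem repeats in total.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_ssrs(sequence):
    """Return (start, motif, repeat_count) for each SSR found.

    Note: a homopolymer run such as "AAAAAAAA" is reported as a
    dinucleotide repeat ("AA") by this simple pattern.
    """
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Toy sequence, invented for illustration.
print(find_ssrs("GGCATATATATATCC"))
```

Dedicated SSR finders typically apply different minimum repeat counts per motif length, so counts from a sketch like this would not be directly comparable to the figures reported above.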
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1248-3) contains supplementary material, which is available to authorized users.
PMCID: PMC4336476
Annona squamosa L. transcriptomics; Early-stage developing fruit; De novo transcriptome assembly; Simple sequence repeats; Web resource
11.  Species-specific duplications driving the recent expansion of NBS-LRR genes in five Rosaceae species 
BMC Genomics  2015;16(1):77.
Disease resistance (R) genes from different Rosaceae species have been identified by map-based cloning for resistance breeding. However, there are few reports describing the pattern of R-gene evolution in Rosaceae species because several Rosaceae genome sequences have only recently become available.
Since most disease resistance genes encode NBS-LRR proteins, we performed a systematic genome-wide survey of NBS-LRR genes between five Rosaceae species, namely Fragaria vesca (strawberry), Malus × domestica (apple), Pyrus bretschneideri (pear), Prunus persica (peach) and Prunus mume (mei) which contained 144, 748, 469, 354 and 352 NBS-LRR genes, respectively. A high proportion of multi-genes and similar Ks peaks (Ks = 0.1- 0.2) of gene families in the four woody genomes were detected. A total of 385 species-specific duplicate clades were observed in the phylogenetic tree constructed using all 2067 NBS-LRR genes. High percentages of NBS-LRR genes derived from species-specific duplication were found among the five genomes (61.81% in strawberry, 66.04% in apple, 48.61% in pear, 37.01% in peach and 40.05% in mei). Furthermore, the Ks and Ka/Ks values of TIR-NBS-LRR genes (TNLs) were significantly greater than those of non-TIR-NBS-LRR genes (non-TNLs), and most of the NBS-LRRs had Ka/Ks ratios less than 1, suggesting that they were evolving under a subfunctionalization model driven by purifying selection.
Our results indicate that recent duplications played an important role in the evolution of NBS-LRR genes in the four woody perennial Rosaceae species. Based on the phylogenetic tree produced, it could be inferred that species-specific duplication has mainly contributed to the expansion of NBS-LRR genes in the five Rosaceae species. In addition, the Ks and Ka/Ks ratios suggest that the rapidly evolved TNLs have different evolutionary patterns to adapt to different pathogens compared with non-TNL resistant genes.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1291-0) contains supplementary material, which is available to authorized users.
PMCID: PMC4336698
NBS-LRR genes; Rosaceae species; Disease resistance genes; Species-specific duplication
12.  A systems genetics study of swine illustrates mechanisms underlying human phenotypic traits 
BMC Genomics  2015;16(1):88.
The pig, which shares greater similarities with human than with mouse, is important for agriculture and for studying human diseases. However, similarities in the genetic architecture and molecular regulations underlying phenotypic variations in humans and swine have not been systematically assessed.
We systematically surveyed ~500 F2 pigs genetically and phenotypically. By comparing candidates for anemia traits identified in swine genome-wide SNP association and human genome-wide association studies (GWAS), we showed that both sets of candidates are related to the biological process “cellular lipid metabolism” in liver. Human height is a complex heritable trait; by integrating genome-wide SNP data and human adipose Bayesian causal network, which closely represents bone transcriptional regulations, we identified PLAG1 as a causal gene for limb bone length. This finding is consistent with GWAS findings for human height and supports the common genetic architecture between swine and humans. By leveraging a human protein-protein interaction network, we identified two putative candidate causal genes TGFB3 and DAB2IP and the known regulators MESP1 and MESP2 as responsible for the variation in rib number and identified the potential underlying molecular mechanisms. In mice, knockout of Tgfb3 and Tgfb2 together decreases rib number.
Our findings show that integrative network analyses reveal causal regulators underlying the genetic association of complex traits in swine and that these causal regulators have similar effects in humans. Thus, swine are a potentially good animal model for studying some complex human traits that are not under intense selection.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1240-y) contains supplementary material, which is available to authorized users.
PMCID: PMC4336704
Systems genetics; Swine model; Complex human traits
13.  Functional analysis of C1 family cysteine peptidases in the larval gut of Tenebrio molitor and Tribolium castaneum 
BMC Genomics  2015;16(1):75.
Larvae of the tenebrionids Tenebrio molitor and Tribolium castaneum have highly compartmentalized guts, with primarily cysteine peptidases in the acidic anterior midgut that contribute to the early stages of protein digestion.
High throughput sequencing was used to quantify and characterize transcripts encoding cysteine peptidases from the C1 papain family in the gut of tenebrionid larvae. For T. castaneum, 25 genes and one questionable pseudogene encoding cysteine peptidases were identified, including 11 cathepsin L or L-like, 11 cathepsin B or B-like, and one each F, K, and O. The majority of transcript expression was from two cathepsin L genes on chromosome 10 (LOC659441 and LOC659502). For cathepsin B, the major expression was from genes on chromosome 3 (LOC663145 and LOC663117). Some transcripts were expressed at lower levels or not at all in the larval gut, including cathepsins F, K, and O. For T. molitor, there were 29 predicted cysteine peptidase genes, including 14 cathepsin L or L-like, 13 cathepsin B or B-like, and one each cathepsin O and F. One cathepsin L and one cathepsin B were also highly expressed, orthologous to those in T. castaneum. Peptidases lacking conservation in active site residues were identified in both insects, and sequence analysis of orthologs indicated that changes in these residues occurred prior to evolutionary divergence. Sequences from both insects have a high degree of variability in the substrate binding regions, consistent with the ability of these enzymes to degrade a variety of cereal seed storage proteins and inhibitors. Predicted cathepsin B peptidases from both insects included some with a shortened occluding loop without active site residues in the middle, apparently lacking exopeptidase activity and unique to tenebrionid insects. Docking of specific substrates with models of T. molitor cysteine peptidases indicated that some insect cathepsins B and L bind substrates with affinities similar to human cathepsin L, while others do not and have presumably different substrate specificity.
These studies have refined our model of protein digestion in the larval gut of tenebrionid insects, and suggest genes that may be targeted by inhibitors or RNA interference for the control of cereal pests in storage areas.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1306-x) contains supplementary material, which is available to authorized users.
PMCID: PMC4336737
High throughput sequencing; Cysteine peptidases; Cathepsin L; Cathepsin B; Peptidase homologs; Tenebrio molitor; Tribolium castaneum
14.  Involvement of a citrus meiotic recombination TTC-repeat motif in the formation of gross deletions generated by ionizing radiation and MULE activation 
BMC Genomics  2015;16(1):69.
Transposable-element mediated chromosomal rearrangements require the involvement of two transposons and two double-strand breaks (DSB) located in close proximity. In radiobiology, DSB proximity is also a major factor contributing to rearrangements. However, the whole issue of DSB proximity remains virtually unexplored.
Based on DNA sequencing analysis, we show that the genomes of two derived mutants, Arrufatina (sport) and Nero (irradiation), share a similar 2 Mb deletion of chromosome 3. A 7 kb Mutator-like element found in Clemenules was present in Arrufatina in inverted orientation flanking the 5′ end of the deletion. The Arrufatina Mule displayed “dissimilar” 9-bp target site duplications separated by 2 Mb. Fine-scale single nucleotide variant analyses of the deleted fragments identified a TTC-repeat sequence motif located in the center of the deletion and responsible for a meiotic crossover detected in the citrus reference genome.
Taken together, this information is compatible with the proposal that in both mutants, the TTC-repeat motif formed a triplex DNA structure generating a loop that brought the originally distinct reactive ends into close proximity. In Arrufatina, the loop brought the Mule ends near the two distinct insertion target sites, and the inverted insertion of the transposable element between these target sites provoked the release of the in-between fragment. This proposal requires the involvement of only a single transposon and sheds light on the unresolved question of how two distinct sites become located in close proximity. These observations confer a crucial role on the TTC-repeats in fundamental plant processes such as meiotic recombination and chromosomal rearrangements.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1280-3) contains supplementary material, which is available to authorized users.
PMCID: PMC4334395
Double-strand breaks; Crossover hot spot; Structural variations; Transposable-element
15.  Transcriptome analysis of northern elephant seal (Mirounga angustirostris) muscle tissue provides a novel molecular resource and physiological insights 
BMC Genomics  2015;16(1):64.
The northern elephant seal, Mirounga angustirostris, is a valuable animal model of fasting adaptation and hypoxic stress tolerance. However, no reference sequence is currently available for this and many other marine mammal study systems, hindering molecular understanding of marine adaptations and unique physiology.
We sequenced a transcriptome of M. angustirostris derived from muscle sampled during an acute stress challenge experiment to identify species-specific markers of stress axis activation and recovery. De novo assembly generated 164,966 contigs and a total of 522,699 transcripts, of which 68.70% were annotated using mouse, human, and domestic dog reference protein sequences. To reduce transcript redundancy, we removed highly similar isoforms in large gene families and produced a filtered assembly containing 336,657 transcripts. We found that a large number of annotated genes are associated with metabolic signaling, immune and stress responses, and muscle function. Preliminary differential expression analysis suggests a limited transcriptional response to acute stress involving alterations in metabolic and immune pathways and muscle tissue maintenance, potentially driven by early response transcription factors such as Cebpd.
We present the first reference sequence for Mirounga angustirostris produced by RNA sequencing of muscle tissue and cloud-based de novo transcriptome assembly. We annotated 395,102 transcripts, some of which may be novel isoforms, and have identified thousands of genes involved in key physiological processes. This resource provides elephant seal-specific gene sequences, complementing existing metabolite and protein expression studies and enabling future work on molecular pathways regulating adaptations such as fasting, hypoxia, and environmental stress responses in marine mammals.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1253-6) contains supplementary material, which is available to authorized users.
PMCID: PMC4328371
Transcriptome; de novo assembly; Pinniped; Stress; Cloud computing
16.  Defining the gene repertoire and spatiotemporal expression profiles of adhesion G protein-coupled receptors in zebrafish 
BMC Genomics  2015;16(1):62.
Adhesion G protein-coupled receptors (aGPCRs) are the second largest of the five GPCR families and are essential for a wide variety of physiological processes. Zebrafish have proven to be a very effective model for studying the biological functions of aGPCRs in both developmental and adult contexts. However, aGPCR repertoires have not been defined in any fish species, nor are aGPCR expression profiles in adult tissues known. Additionally, the expression profiles of the aGPCR family have never been extensively characterized over a developmental time-course in any species.
Here, we report that there are at least 59 aGPCRs in zebrafish that represent homologs of 24 of the 33 aGPCRs found in humans; compared to humans, zebrafish lack clear homologs of GPR110, GPR111, GPR114, GPR115, GPR116, EMR1, EMR2, EMR3, and EMR4. We find that several aGPCRs in zebrafish have multiple paralogs, in line with the teleost-specific genome duplication. Phylogenetic analysis suggests that most zebrafish aGPCRs cluster closely with their mammalian homologs, with the exception of three zebrafish-specific expansion events in Groups II, VI, and VIII. Using quantitative real-time PCR, we have defined the expression profiles of 59 zebrafish aGPCRs at 12 developmental time points and 10 adult tissues representing every major organ system. Importantly, expression profiles of zebrafish aGPCRs in adult tissues are similar to those previously reported in mouse, rat, and human, underscoring the evolutionary conservation of this family, and therefore the utility of the zebrafish for studying aGPCR biology.
Our results support the notion that zebrafish are a potentially useful model to study the biology of aGPCRs from a functional perspective. The zebrafish aGPCR repertoire, classification, and nomenclature, together with their expression profiles during development and in adult tissues, provide a crucial foundation for elucidating aGPCR functions and pursuing aGPCRs as therapeutic targets.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1296-8) contains supplementary material, which is available to authorized users.
PMCID: PMC4335454
Adhesion G protein-coupled receptors; Zebrafish genome; Expression profiling; High-throughput quantitative real-time PCR
17.  Genome-wide survey and analysis of microsatellites in giant panda (Ailuropoda melanoleuca), with a focus on the applications of a novel microsatellite marker system 
BMC Genomics  2015;16(1):61.
The giant panda (Ailuropoda melanoleuca) is a critically endangered species endemic to China. Microsatellites are among the most popular molecular markers and have proven effective for estimating population size, testing paternity, and assessing genetic diversity in critically endangered species. The availability of the complete giant panda genome sequence provides the opportunity to carry out genome-wide scans for all types of microsatellite markers, opening the way for the analysis and development of microsatellites in giant panda.
By in silico mining of the whole genome sequence of giant panda, we identified microsatellites in the genome and analyzed their frequency and distribution in different genomic regions. Based on our search criteria, a repertoire of 855,058 SSRs was detected, with mono-nucleotides being the most abundant. SSRs were found in all genomic regions and were more abundant in non-coding regions than in coding regions. A total of 160 primer pairs were designed to screen for polymorphic microsatellites using the selected tetranucleotide microsatellite sequences. Fifty-one novel polymorphic tetranucleotide microsatellite loci were discovered by genotyping blood DNA from 22 captive giant pandas in this study. Finally, a total of 15 markers, which showed good polymorphism, stability, and repeatability in faecal samples, were used to establish the novel microsatellite marker system for giant panda. A genotyping database for the Chengdu captive giant pandas (n = 57) was also set up using this standardized system. Furthermore, as applications of this marker system, a universal individual identification method was established and genetic diversity was analysed.
Microsatellite abundance and diversity were characterized in the giant panda genome. A total of 154,677 tetranucleotide microsatellites were identified, and 15 of them were found to be polymorphic and stable loci. The individual identification method and the genetic diversity analysis method in this study provide adequate material for future studies of giant panda.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1268-z) contains supplementary material, which is available to authorized users.
PMCID: PMC4335702
Ailuropoda melanoleuca; Genome sequence; Tetranucleotide microsatellite; Marker system
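The in silico microsatellite mining described in the abstract above can be illustrated with a minimal Python sketch. It scans a DNA string for perfect tandem repeats of 1 to 6 bp motifs using regular-expression backreferences. The minimum repeat counts in `MIN_REPEATS`, and the function names `find_ssrs` and `count_by_motif_length`, are hypothetical: the abstract does not state the study's actual search criteria or software.

```python
import re
from collections import Counter

# Hypothetical minimum repeat counts per motif length (1 = mono ... 6 = hexa).
# These thresholds are assumptions; the abstract does not give the real criteria.
MIN_REPEATS = {1: 12, 2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len in sorted(min_repeats):
        n = min_repeats[motif_len]
        # A motif of motif_len bases followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), so each locus is reported only once.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


def count_by_motif_length(hits):
    """Tally SSR loci by motif length (1 = mononucleotide, 4 = tetranucleotide, ...)."""
    return Counter(len(motif) for _, motif, _ in hits)
```

On a real genome, the same scan would be run per chromosome or scaffold, and the tally by motif length is what would produce a breakdown such as "mono-nucleotides being the most abundant". Imperfect or compound repeats are not handled by this sketch.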
18.  Deep RNA sequencing reveals a high frequency of alternative splicing events in the fungus Trichoderma longibrachiatum 
BMC Genomics  2015;16(1):54.
Alternative splicing is crucial for proteome diversity and functional complexity in higher organisms. However, the alternative splicing landscape in fungi is still elusive.
The transcriptome of the filamentous fungus Trichoderma longibrachiatum was deep sequenced using Illumina Solexa technology. A total of 14305 splice junctions were discovered. Analyses of alternative splicing events revealed that the numbers of all alternative splicing events (10034), intron retentions (IR, 9369), alternative 5’ splice sites (A5SS, 167), and alternative 3’ splice sites (A3SS, 302) were 7.3-, 7.4-, 5.1-, and 5.9-fold higher, respectively, than those observed in the fungus Aspergillus oryzae using Illumina Solexa technology. This unexpectedly high ratio of alternative splicing suggests that alternative splicing is important to the transcriptome diversity of T. longibrachiatum. Alternatively spliced introns were longer and had higher GC contents and lower splice site scores than constitutive introns. Further analysis demonstrated that the isoform relative frequencies were correlated with the splice site scores of the isoforms. Moreover, comparative transcriptomics determined that most enzymes related to glycolysis, the citrate cycle, and the glyoxylate cycle, as well as a few carbohydrate-active enzymes, are transcriptionally regulated.
This study, consisting of a comprehensive analysis of the alternative splicing landscape in the filamentous fungus T. longibrachiatum, revealed an unexpectedly high ratio of alternative splicing events and provided new insights into transcriptome diversity in fungi.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1251-8) contains supplementary material, which is available to authorized users.
PMCID: PMC4324775  PMID: 25652134
Alternative splicing; Fungi; RNA-Seq; Intron retention; Transcriptome; Trichoderma longibrachiatum
19.  Development of chromosome-specific markers with high polymorphism for allotetraploid cotton based on genome-wide characterization of simple sequence repeats in diploid cottons (Gossypium arboreum L. and Gossypium raimondii Ulbrich) 
BMC Genomics  2015;16(1):55.
Tetraploid cotton contains two sets of homologous chromosomes, the At- and Dt-subgenomes. Consequently, many markers in cotton were mapped to multiple positions during linkage genetic map construction, posing a challenge to anchoring linkage groups and mapping economically-important genes to particular chromosomes. Chromosome-specific markers could solve this problem. Recently, the genomes of two diploid species were sequenced whose progenitors were putative contributors of the At- and Dt-subgenomes to tetraploid cotton. These sequences provide a powerful tool for developing chromosome-specific markers given the high level of synteny among tetraploid and diploid cotton genomes. In this study, simple sequence repeats (SSRs) on each chromosome in the two diploid genomes were characterized. Chromosome-specific SSRs were developed by comparative analysis and proved to distinguish chromosomes.
A total of 200,744 and 142,409 SSRs were detected on the 13 chromosomes of Gossypium arboreum L. and Gossypium raimondii Ulbrich, respectively. Chromosome-specific SSRs were obtained by comparing SSR flanking sequences from each chromosome with those from the other 25 chromosomes. The average was 7,996 per chromosome. To confirm their chromosome specificity, these SSRs were used to distinguish two homologous chromosomes in tetraploid cotton through linkage group construction. The chromosome-specific SSRs and previously-reported chromosome markers were grouped together, and no marker mapped to another homologous chromosome, proving that the chromosome-specific SSRs were unique and could distinguish homologous chromosomes in tetraploid cotton. Because longer dinucleotide AT-rich repeats were the most polymorphic in previous reports, the SSRs on each chromosome were sorted by motif type and repeat length for convenient selection. The primer sequences of all chromosome-specific SSRs were also made publicly available.
Chromosome-specific SSRs are efficient tools for chromosome identification by anchoring linkage groups to particular chromosomes during genetic mapping and are especially useful in mapping of qualitative-trait genes or quantitative trait loci with just a few markers. The SSRs reported here will facilitate a number of genetic and genomic studies in cotton, including construction of high-density genetic maps, positional gene cloning, fingerprinting, and genetic diversity and comparative evolutionary analyses among Gossypium species.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1265-2) contains supplementary material, which is available to authorized users.
PMCID: PMC4325953  PMID: 25652321
Chromosome-specific; SSR; Tetraploid cotton; Genome-wide
20.  Transcriptomic analysis of hepatic responses to testosterone deficiency in miniature pigs fed a high-cholesterol diet 
BMC Genomics  2015;16(1):59.
Recent studies have indicated that low serum testosterone levels are associated with increased risk of developing hepatic steatosis; however, the mechanisms mediating this phenomenon have not been fully elucidated. To gain insight into the role of testosterone in modulating hepatic steatosis, we investigated the effects of testosterone on the development of hepatic steatosis in pigs fed a high-fat and high-cholesterol (HFC) diet and profiled hepatic gene expression by RNA-Seq in HFC-fed intact male pigs (IM), castrated male pigs (CM), and castrated male pigs with testosterone replacement (CMT).
Serum testosterone levels were significantly decreased in CM pigs, and testosterone replacement attenuated castration-induced testosterone deficiency. CM pigs showed increased liver injury accompanied by increased hepatocellular steatosis, inflammation, and elevated serum alanine aminotransferase levels compared with IM pigs. Moreover, serum levels of total cholesterol, low-density lipoprotein cholesterol, and triglycerides were markedly increased in CM pigs. Testosterone replacement decreased serum and hepatic lipid levels and improved liver injury in CM pigs. Compared to IM and CMT pigs, CM pigs had lower serum levels of superoxide dismutase but higher levels of malondialdehyde. Gene expression analysis revealed that upregulated genes in the livers of CM pigs were mainly enriched for genes mediating immune and inflammatory responses, oxidative stress, and apoptosis. Surprisingly, the downregulated genes mainly included those that regulate metabolism-related processes, including fatty acid oxidation, steroid biosynthesis, cholesterol and bile acid metabolism, and glucose metabolism. KEGG analysis showed that metabolic pathways, fatty acid degradation, pyruvate metabolism, the tricarboxylic acid cycle, and the nuclear factor-kappaB signaling pathway were the major pathways altered in CM pigs.
This study demonstrated that testosterone deficiency aggravated hypercholesterolemia and hepatic steatosis in pigs fed an HFC diet and that these effects could be reversed by testosterone replacement therapy. Impaired metabolic processes, enhanced immune and inflammatory responses, oxidative stress, and apoptosis may contribute to the increased hepatic steatosis induced by testosterone deficiency and an HFC diet. These results deepened our understanding of the molecular mechanisms of testosterone deficiency-induced hepatic steatosis and provided a foundation for future investigations.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1283-0) contains supplementary material, which is available to authorized users.
PMCID: PMC4328429
Testosterone; Nonalcoholic fatty liver disease; Hepatic steatosis; Miniature pigs; RNA-Seq
21.  Comparative analysis of the silk gland transcriptomes between the domestic and wild silkworms 
BMC Genomics  2015;16(1):60.
Bombyx mori was domesticated from the Chinese wild silkworm, Bombyx mandarina. Wild and domestic silkworms are good models in which to investigate genes related to silk protein synthesis that may be differentially expressed in silk glands, because their silk productions are very different. Here we used the mRNA deep sequencing (RNA-seq) approach to identify the differentially expressed genes (DEGs) in the transcriptomes of the median/posterior silk glands of two domestic and two wild silkworms.
The results indicated that about 58% of the total genes were expressed (reads per kilobase per million reads (RPKM) ≥ 1) in each silkworm. Comparisons of the domestic and wild silkworm transcriptomes revealed 32 DEGs, of which 16 were up-regulated in the domestic silkworms relative to the wild silkworms, and the other 16 were up-regulated in the wild silkworms relative to the domestic silkworms. Quantitative real-time polymerase chain reaction (qPCR) was performed for 15 randomly selected DEGs in domestic versus wild silkworms. The qPCR results were mostly consistent with the expression levels determined from the RNA-seq data. Based on a Gene Ontology (GO) enrichment analysis and manual annotation, five of the up-regulated DEGs in the wild silkworms were predicted to be involved in immune response, and seven of the up-regulated DEGs were related to the GO term “oxidoreductase activity”, which is associated with antioxidant systems. In the domestic silkworms, the up-regulated DEGs were related mainly to tissue development, secretion of proteins and metabolism.
The up-regulated DEGs in the two domestic silkworms may be involved mainly in the highly efficient biosynthesis and secretion of silk proteins, while the up-regulated DEGs in the two wild silkworms may play more important roles in tolerance to pathogens and environment adaptation. Our results provide a foundation for understanding the molecular mechanisms of the silk production difference between domestic and wild silkworms.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1287-9) contains supplementary material, which is available to authorized users.
PMCID: PMC4328555
Bombyx mori; Domestication; Silk gland; RNA-seq; Differentially expressed gene
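The expression threshold used in the abstract above (RPKM ≥ 1) follows from the standard definition of RPKM: read count, divided by gene length in kilobases, divided by total mapped reads in millions. The Python sketch below shows that calculation and how a "fraction of genes expressed" figure could be derived from it. The function names and the toy input layout are assumptions for illustration, and the library size is approximated as the sum of per-gene counts, which may differ from how the study computed it.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def fraction_expressed(counts, lengths, threshold=1.0):
    """Fraction of genes whose RPKM is at or above the threshold.

    counts  : dict mapping gene id -> mapped read count (hypothetical layout)
    lengths : dict mapping gene id -> gene length in bp
    The library size is taken as the sum of all counts (an assumption).
    """
    total = sum(counts.values())
    expressed = [
        gene for gene in counts
        if rpkm(counts[gene], lengths[gene], total) >= threshold
    ]
    return len(expressed) / len(counts)
```

For example, a 2,000 bp gene with 500 mapped reads in a library of 10 million mapped reads has an RPKM of 25.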
22.  Fast forward genetics to identify mutations causing a high light tolerant phenotype in Chlamydomonas reinhardtii by whole-genome-sequencing 
BMC Genomics  2015;16(1):57.
High light tolerance of microalgae is a desired phenotype for efficient cultivation in large scale production systems under fluctuating outdoor conditions. Outdoor cultivation requires the use of either wild-type or non-GMO derived mutant strains due to safety concerns. The identification and molecular characterization of such mutants derived from untagged forward genetics approaches was limited previously by the tedious and time-consuming methods involving techniques such as classical meiotic mapping. The combination of mapping with next generation sequencing technologies offers alternative strategies to identify genes involved in high light adaptation in untagged mutants.
We used the model alga Chlamydomonas reinhardtii in a non-GMO mutation strategy without any preceding crossing step or pooled progeny to identify genes involved in the regulatory processes of high light adaptation. To generate high light tolerant mutants, wildtype cells were mutagenized only to a low extent, followed by a stringent selection. We performed whole-genome sequencing of two independent mutants, hit1 and hit2, and the parental wildtype. The availability of a reference genome sequence and the removal of shared background variants between the wildtype strain and each mutant enabled us to identify two single nucleotide polymorphisms within the same gene, Cre02.g085050, hereafter called LRS1 (putative Light Response Signaling protein 1). These two independent single amino acid exchanges are both located in the putative WD40 propeller domain of the corresponding protein LRS1. Both mutants exhibited an increased rate of non-photochemical-quenching (NPQ) and an improved resistance against chemically induced reactive oxygen species. In silico analyses revealed homology of LRS1 to the photoregulatory protein COP1 in plants.
In this work we identified the nuclear encoded gene LRS1 as an essential factor for high light adaptation in C. reinhardtii. The causative random mutation within this gene was identified by a rapid and efficient method, avoiding any preceding crossing step, meiotic mapping, or pooled progeny. Our results open up new insights into mechanisms of high light adaptation in microalgae and at the same time provide a simplified strategy for non-GMO forward genetics, a crucial precondition that could result in the identification of key factors for economically relevant biological processes within algae.
PMCID: PMC4336690
Whole-genome-sequencing; Chlamydomonas reinhardtii; Forward genetics; Mutation identification; SNPs; High light
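The filtering logic summarized in the abstract above (remove variants shared with the parental wildtype, then look for a gene hit independently in both mutants) can be sketched in Python with plain set operations. Everything concrete here is hypothetical: the variant tuples, the `gene_of` lookup, and the function names are illustrative and do not come from the study's pipeline or data.

```python
def mutant_specific(mutant_calls, wildtype_calls):
    """Variants called in the mutant but absent from the parental wildtype.

    Each variant is a (chrom, pos, ref, alt) tuple; shared background
    variants relative to the reference genome drop out of the difference.
    """
    return set(mutant_calls) - set(wildtype_calls)


def genes_mutated_in_all(specific_by_mutant, gene_of):
    """Genes carrying at least one mutant-specific variant in every mutant.

    specific_by_mutant : dict mapping mutant name -> set of variants
    gene_of            : dict mapping variant -> gene id (hypothetical
                         annotation lookup; intergenic variants are absent)
    """
    gene_sets = [
        {gene_of[v] for v in variants if v in gene_of}
        for variants in specific_by_mutant.values()
    ]
    return set.intersection(*gene_sets) if gene_sets else set()
```

A gene that appears in the intersection with a different variant in each independent mutant is a strong candidate, which is the pattern the abstract reports for LRS1.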
23.  Expression and regulation of long noncoding RNAs in TLR4 signaling in mouse macrophages 
BMC Genomics  2015;16(1):45.
Though long non-coding RNAs (lncRNAs) are emerging as critical regulators of immune responses, whether they are involved in the LPS-activated TLR4 signaling pathway and how their expression is regulated in mouse macrophages remain unexplored.
By repurposing expression microarray probes, we identified 994 lncRNAs in bone marrow-derived macrophages (BMDMs) and classified them into enhancer-like lncRNAs (elncRNAs) and promoter-associated lncRNAs (plncRNAs) according to chromatin signatures defined by relative levels of H3K4me1 and H3K4me3. Fifteen elncRNAs and 12 plncRNAs are differentially expressed upon LPS stimulation. The expression changes of lncRNAs and their neighboring protein-coding genes are significantly correlated. Also, the regulation of both elncRNA and plncRNA expression is associated with H3K4me3 and H3K27Ac. Crucially, many identified LPS-regulated lncRNAs, such as lncRNA-Nfkb2 and lncRNA-Rel, are located near immune response protein-coding genes. The majority of LPS-regulated lncRNAs had at least one binding site for the transcription factors p65, IRF3, JunB or cJun.
We established an integrative microarray analysis pipeline for profiling lncRNAs. Also, our results suggest that lncRNAs can be important regulators of LPS-induced innate immune response in BMDMs.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1270-5) contains supplementary material, which is available to authorized users.
PMCID: PMC4320810  PMID: 25652569
TLR4; LPS; elncRNA; plncRNA; Histone modification
24.  Using QTL mapping to investigate the relationships between abiotic stress tolerance (drought and salinity) and agronomic and physiological traits 
BMC Genomics  2015;16(1):43.
Drought and salinity are two major abiotic stresses that severely limit barley production worldwide. Physiological and genetic complexity of these tolerance traits has significantly slowed the progress of developing stress-tolerant cultivars. Marker-assisted selection (MAS) may potentially overcome this problem. In the current research, seventy-two doubled haploid (DH) lines from a cross between TX9425 (a Chinese landrace variety with superior drought and salinity tolerance) and a sensitive variety, Franklin, were used to identify quantitative trait loci (QTL) for drought and salinity tolerance, based on a range of developmental and physiological traits.
Two QTL for drought tolerance (leaf wilting under drought stress) and one QTL for salinity tolerance (plant survival under salt stress) were identified from this population. The QTL on 2H for drought tolerance determined 42% of phenotypic variation, based on three independent experiments. This QTL was closely linked with a gene controlling ear emergence. The QTL on 5H for drought tolerance was less affected by agronomic traits and can be effectively used in breeding programs. A candidate gene for this QTL on 5H was identified based on the draft barley genome sequence. The QTL for proline accumulation, under both drought and salinity stresses, were located at different positions from those for drought and salinity tolerance, indicating no relationship with plant tolerance to either of these stresses.
Using QTL mapping, the relationships between QTL for agronomic and physiological traits and plant drought and salinity tolerance were studied. A new QTL for drought tolerance which was not linked to any of the studied traits was identified. This QTL can be effectively used in breeding programs. It was also shown that proline accumulation under stresses was not necessarily linked with drought or salinity tolerance based on methods of phenotyping used in this experiment. The use of proline content in breeding programs can also be limited by the accuracy of phenotyping.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1243-8) contains supplementary material, which is available to authorized users.
PMCID: PMC4320823  PMID: 25651931
QTL analysis; Drought tolerance; Proline content; Leaf wilting; Salinity tolerance
25.  Comparative whole-genome analyses of selection marker–free rice-based cholera toxin B-subunit vaccine lines and wild-type lines 
BMC Genomics  2015;16(1):48.
We have developed a rice-based oral cholera vaccine named MucoRice-CTB (Cholera Toxin B-subunit) by using an Agrobacterium tumefaciens–mediated co-transformation system. To assess the genome-wide effects of this system on the rice genome, we compared the genomes of three selection marker–free MucoRice-CTB lines with those of two wild-type rice lines (Oryza sativa L. cv. Nipponbare). Mutation profiles of the transgenic and wild-type genomes were examined by next-generation sequencing (NGS).
Using paired-end short-read sequencing, a total of more than 300 million reads for each line were obtained and mapped onto the rice reference genome. The number and distribution of variants were similar in all five lines: the numbers of line-specific variants ranged from 524 to 842 and corresponding mutation rates ranged from 1.41 × 10⁻⁶ per site to 2.28 × 10⁻⁶ per site. The frequency of guanine-to-thymine and cytosine-to-adenine transversions was higher in MucoRice-CTB lines than in WT lines. The transition-to-transversion ratio was 1.12 in MucoRice-CTB lines and 1.65 in WT lines. Analysis of variant-sharing profiles showed that the variants common to all five lines were the most abundant, and the numbers of line-specific variants for all lines were similar. The numbers of non-synonymous amino acid substitutions in MucoRice-CTB lines (15 to 21) were slightly higher than those in WT lines (7 or 8), whereas the numbers of frame shifts were similar in all five lines.
We conclude that MucoRice-CTB and WT are almost identical at the genomic level and that genome-wide effects caused by the Agrobacterium-mediated transformation system for marker-free MucoRice-CTB lines were slight. The comparative whole-genome analyses between MucoRice-CTB and WT lines using NGS provide a reliable estimate of genome-wide differences. A similar approach may be applicable to other transgenic rice plants generated by using this Agrobacterium-mediated transformation system.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1285-y) contains supplementary material, which is available to authorized users.
PMCID: PMC4320824  PMID: 25653106
Plant-made pharmaceuticals; Oral vaccine; Whole-genome resequencing; Transgenic rice; MucoRice-CTB; Variant comparison
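Two of the summary statistics quoted in the abstract above, the per-site mutation rate and the transition-to-transversion ratio, are simple to compute from a list of single-nucleotide variants. The Python sketch below shows one way to do so. The function names are hypothetical, and the number of callable sites passed to `per_site_rate` is an assumption supplied by the caller, since the abstract does not report the denominator it used.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snvs):
    """Transition-to-transversion ratio for an iterable of (ref, alt) pairs."""
    snvs = [(r.upper(), a.upper()) for r, a in snvs if r.upper() != a.upper()]
    transitions = sum(1 for r, a in snvs if is_transition(r, a))
    transversions = len(snvs) - transitions
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


def per_site_rate(n_variants, callable_sites):
    """Line-specific variants divided by the number of sites assessed.

    callable_sites is an assumed denominator (e.g. the mapped genome
    length); the abstract does not state the value actually used.
    """
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    return n_variants / callable_sites
```

With a denominator of a few hundred million sites, which is the scale of the rice genome, 524 to 842 line-specific variants give rates on the order of 10⁻⁶ per site, the same order as the values reported above.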
