Systematic genome-wide efforts to identify and fine-map robust quantitative trait loci (QTLs)/genes governing important agro-morphological traits in chickpea using wild Cicer accessions are still lacking. In this context, we constructed a high-density genetic linkage map based on 834 simple sequence repeat (SSR) and single-nucleotide polymorphism (SNP) markers between cultivated and wild parental accessions (Cicer arietinum desi cv. ICC 4958 and Cicer reticulatum wild accession ICC 17160). This inter-specific genetic map, comprising eight linkage groups, spanned 949.4 cM with an average inter-marker distance of 1.14 cM. Eleven novel major genomic regions harbouring 15 robust QTLs (15.6–39.8% R² at 4.2–15.7 logarithm of odds, LOD) associated with four agro-morphological traits (100-seed weight, pod and branch number/plant and plant hairiness) were identified and mapped on chickpea chromosomes. Most of these QTLs showed positive additive gene effects with effective allelic contribution from ICC 4958, particularly for increasing seed weight (SW) and pod and branch number. One robust SW-influencing major QTL region (qSW4.2) was narrowed down by combining QTL mapping with high-resolution QTL region-specific association analysis, differential expression profiling and gene haplotype-based association/linkage disequilibrium (LD) mapping. This enabled us to delineate a strong SW-regulating ABI3VP1 transcription factor (TF) gene within the trait-specific QTL interval, and consequently to identify favourable natural allelic variants and superior high seed weight-specific haplotypes in the upstream regulatory region of this gene, which showed increased transcript expression during seed development. The genes (TFs) underlying diverse trait-regulating QTLs, once validated and fine-mapped by our rapid integrated genomic approach and through gene/QTL map-based cloning, can be utilized as potential candidates for marker-assisted genetic enhancement of chickpea.
chickpea; SSR; SNP; QTLs; transcription factor; wild
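The LOD and R² values quoted above come from QTL mapping. The simplest related calculation is single-marker regression, sketched here in Python as an illustration only. It assumes a two-class marker coding (e.g. one class per parental allele) and is not the authors' interval-mapping pipeline.

```python
import math

def single_marker_qtl(genotypes, phenotypes):
    """Single-marker regression test for a QTL.

    genotypes:  marker genotype code for each line (hypothetical coding,
                e.g. 0 = ICC 4958 allele, 1 = ICC 17160 allele)
    phenotypes: trait value (e.g. 100-seed weight) for the same lines
    Returns (LOD, R2).
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    # Null model: one mean for all lines (no QTL at this marker).
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)
    # Alternative model: a separate mean for each genotype class.
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        rss1 += sum((y - m) ** 2 for y in ys)
    if rss1 == 0:
        return math.inf, 1.0  # genotype explains all phenotypic variation
    lod = (n / 2) * math.log10(rss0 / rss1)
    r2 = 1 - rss1 / rss0
    return lod, r2
```

For this model the two statistics are linked by R² = 1 − 10^(−2·LOD/n), so a given LOD corresponds to a larger R² in a smaller population.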
Eukaryotic microbial communities play key functional roles in soil biology and potentially represent a rich source of natural products, including biocatalysts. Culture-independent molecular methods are powerful tools for isolating functional genes from uncultured microorganisms. However, none of the methods used in environmental genomics allows rapid isolation of numerous functional genes from eukaryotic microbial communities. We developed an original adaptation of solution hybrid selection (SHS) for the efficient recovery of functional complementary DNAs (cDNAs) synthesized from soil-extracted polyadenylated mRNAs. This protocol was tested on the glycoside hydrolase family 11 (GH11) genes encoding endo-xylanases, for which we designed 35 exploratory 31-mer capture probes. SHS was applied to four soil eukaryotic cDNA pools. After two successive rounds of capture, >90% of the resulting cDNAs were GH11 sequences, of which 70% (38 of 53 sequenced genes) were full length. Between 1.5 and 25% of the cloned captured sequences were expressed in Saccharomyces cerevisiae. Sequencing of polymerase chain reaction-amplified GH11 gene fragments from the captured sequences revealed hundreds of phylogenetically diverse sequences not yet described in public databases. This protocol makes it possible to explore eukaryotic gene families exhaustively within microbial communities thriving in any type of environment.
metatranscriptomics; soil RNA; soil eukaryotes; sequence capture; glycoside hydrolase family GH11
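Capture works because each probe hybridizes to cDNAs carrying a closely matching sequence on either strand. As a crude in-silico proxy, the Python sketch below flags which cDNAs a probe set could capture, using Hamming distance on both strands. The mismatch threshold is a hypothetical parameter; real hybridization depends on melting temperature, salt and stringency, which this sketch ignores.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))

def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def probe_hits(probe, target, max_mismatches=3):
    """Return (position, strand, mismatches) for every window of `target`
    matching `probe` or its reverse complement within `max_mismatches`."""
    k = len(probe)
    probe_rc = revcomp(probe)
    hits = []
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        for strand, p in (("+", probe), ("-", probe_rc)):
            mm = hamming(p, window)
            if mm <= max_mismatches:
                hits.append((i, strand, mm))
    return hits

def captured(probes, cdnas, max_mismatches=3):
    """Names of cDNAs ({name: sequence}) hit by at least one probe."""
    return [name for name, seq in cdnas.items()
            if any(probe_hits(p, seq, max_mismatches) for p in probes)]
```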
Evolution of bacteria under sublethal concentrations of antibiotics represents a trade-off between growth and resistance to the antibiotic. To understand this trade-off, we performed short-term in vitro evolution of laboratory Escherichia coli under sublethal concentrations of the aminoglycoside kanamycin. We report that less costly kanamycin-resistant mutants became fixed earlier in populations growing at lower sublethal concentrations of the antibiotic than in those growing at higher sublethal concentrations; in the latter, resistant mutants with a significant growth defect persisted longer. Using deep sequencing, we identified kanamycin resistance-conferring mutations that were either costly or not costly in terms of growth in the absence of the antibiotic. Multiple mutations in the C-terminal end of domain IV of the translation elongation factor EF-G provided low-cost resistance to kanamycin. Despite targeting the same or adjacent residues of the protein, these mutants differed from each other in the levels of resistance they provided. Analysis of one of these mutations showed that it caused little defect in growth or in the synthesis of green fluorescent protein (GFP) from an inducible plasmid in the absence of the antibiotic. A second class of mutations, recovered only during evolution at higher sublethal concentrations of the antibiotic, deleted the C-terminal end of the ATP synthase shaft. This mutation confers basal-level resistance to kanamycin but causes a strong growth defect in the absence of the antibiotic. In conclusion, the early dynamics of resistance development to an aminoglycoside antibiotic depend on the level of stress (concentration) imposed by the antibiotic, and the evolution of less costly variants is only a matter of time.
antibiotic resistance; aminoglycosides; evolution
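The growth-versus-resistance trade-off can be illustrated with a textbook deterministic haploid selection model. In this model, a mutant's relative fitness at a given antibiotic concentration (benefit of resistance minus growth cost) sets how fast it fixes. The Python sketch below assumes no drift, no clonal interference and hypothetical fitness values. It is not the authors' analysis.

```python
def generations_to_fixation(p0, w_mut, w_wt=1.0, threshold=0.99, max_gen=100_000):
    """Deterministic haploid selection of a resistant mutant.

    Each generation: p' = p * w_mut / (p * w_mut + (1 - p) * w_wt).
    Returns the number of generations until the mutant frequency reaches
    `threshold`, or None if it never does within `max_gen` generations.
    """
    p = p0
    for gen in range(max_gen + 1):
        if p >= threshold:
            return gen
        mean_fitness = p * w_mut + (1 - p) * w_wt
        p = p * w_mut / mean_fitness
    return None
```

For example, a low-cost mutant and a costly mutant at the same concentration could be represented by different `w_mut` values. A mutant with `w_mut < w_wt` never fixes in this model.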
Continuous cell lines originating from mammalian tissues serve not only as invaluable tools for the life sciences but also as important animal cell substrates for the production of various biological pharmaceuticals. Vero cells are susceptible to many types of microbes and toxins and have contributed widely not only to microbiology but also to the production of vaccines for human use. Here we describe the genome landscape of a Vero cell line, in which 25,877 putative protein-coding genes were identified in the 2.97-Gb genome sequence. A homozygous ∼9-Mb deletion on chromosome 12 has caused the loss of the type I interferon gene cluster and cyclin-dependent kinase inhibitor genes in Vero cells. In addition, a ∼59-Mb loss of heterozygosity around this deleted region suggested that the homozygosity of the deletion was established by a large-scale conversion. Moreover, genomic analysis of Vero cells revealed a female Chlorocebus sabaeus origin and proviral variations of the endogenous simian type D retrovirus. These results reveal the genomic basis of the non-tumourigenic, permanent Vero cell lineage susceptible to various pathogens and will be useful for generating new sub-lines and developing new tools for the quality control of Vero cells.
Vero cell; whole genome; infectious diseases; vaccine; animal cell substrate
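Loss of heterozygosity appears as a long stretch depleted of heterozygous SNPs. The Python sketch below counts heterozygous sites in fixed windows and reports runs of low-heterozygosity windows. The window size, the per-window threshold and the minimum run length are all hypothetical parameters, and this is not necessarily how the ∼59-Mb region was detected.

```python
def loh_segments(het_positions, chrom_length, window=1_000_000,
                 max_het=1, min_windows=3):
    """Report candidate LOH segments on one chromosome.

    het_positions: 0-based positions (< chrom_length) of heterozygous SNPs.
    A window is 'homozygous' if it holds at most `max_het` heterozygous sites;
    runs of at least `min_windows` such windows are returned as (start, end) bp.
    """
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in het_positions:
        counts[pos // window] += 1
    segments = []
    run_start = None
    # A sentinel value above max_het closes any run that reaches the chromosome end.
    for i, c in enumerate(counts + [max_het + 1]):
        if c <= max_het:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_windows:
                segments.append((run_start * window, min(i * window, chrom_length)))
            run_start = None
    return segments
```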
Pathogen genome sequencing directly from clinical samples is rapidly gaining importance in genetic and medical research. However, low DNA yield from blood-borne pathogens is often a limiting factor, and the problem worsens in genomes with extreme base-composition bias, such as that of the AT-rich Plasmodium falciparum. We present a strategy for whole-genome amplification (WGA) of low-yield P. falciparum samples prior to short-read sequencing. We developed WGA conditions that incorporate tetramethylammonium chloride to improve amplification and coverage of AT-rich regions of the genome. We show that this method reduces amplification bias and chimera formation. Our data show that the method works with as little as 10 pg of input DNA, making it possible to sequence the parasite genome from small blood samples.
whole-genome amplification; AT-rich; malaria; tetramethylammonium chloride
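One way to quantify amplification bias in an AT-rich genome is to bin genomic windows by AT content and compare mean read depth per bin with the genome-wide mean. A flat profile near 1.0 indicates little bias. The Python sketch below assumes windows and their mean depths have already been computed by some upstream step. The bin count is a hypothetical choice.

```python
def at_fraction(seq):
    """Fraction of A/T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("A") + seq.count("T")) / acgt if acgt else 0.0

def coverage_by_at_bin(windows, n_bins=10):
    """windows: list of (sequence, mean_depth) pairs.

    Returns {bin_index: mean depth in that AT-content bin / genome-wide mean},
    so 1.0 means the bin is covered at the average depth."""
    overall = sum(depth for _, depth in windows) / len(windows)
    sums, counts = {}, {}
    for seq, depth in windows:
        b = min(int(at_fraction(seq) * n_bins), n_bins - 1)
        sums[b] = sums.get(b, 0.0) + depth
        counts[b] = counts.get(b, 0) + 1
    return {b: (sums[b] / counts[b]) / overall for b in sorted(sums)}
```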
Unlike other important Solanaceae crops such as tomato, potato, chili pepper and tobacco, all of which originated in South America and are cultivated worldwide, eggplant (Solanum melongena L.) is indigenous to the Old World, and in this respect it is phylogenetically unique. To further broaden our knowledge of the genomic nature of solanaceous plants, we dissected the eggplant genome and built a draft genome dataset, termed SME_r2.5.1, consisting of 33,873 scaffolds that cover 833.1 Mb, ca. 74% of the eggplant genome. Approximately 90% of the gene space was estimated to be covered by SME_r2.5.1, and 85,446 genes were predicted in the genome. Clustering analysis of the predicted genes of eggplant together with those of three other solanaceous plants and Arabidopsis thaliana revealed that, of the 35,000 clusters generated, 4,018 were composed exclusively of eggplant genes, which may confer eggplant-specific traits. Between eggplant and tomato, 16,573 gene pairs were deduced to be orthologous, and 9,489 eggplant scaffolds could be mapped onto the tomato genome. Furthermore, 56 conserved synteny blocks were identified between the two species. This detailed comparative analysis of the eggplant and tomato genomes will facilitate our understanding of the genomic architecture of solanaceous plants, which will contribute to the cultivation and further utilization of these crops.
Solanum melongena L.; eggplant; genome sequencing; gene prediction; comparative analysis
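The abstract does not state how the 16,573 orthologous pairs were deduced. A common approach, shown here only as a stand-in, is reciprocal best hits (RBH). In RBH, two genes are paired when each is the other's highest-scoring match. The Python sketch assumes that all-versus-all similarity scores (e.g. BLASTP bit scores) have already been computed.

```python
def best_hits(scores):
    """scores: {(query, subject): score}. Returns {query: best-scoring subject}."""
    best = {}
    for (q, s), score in scores.items():
        if q not in best or score > best[q][1]:
            best[q] = (s, score)
    return {q: s for q, (s, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Sorted (gene_a, gene_b) pairs in which each gene is the other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)
```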
Hopanoids are present in vast amounts as integral components of bacteria and plants, where their primary function is to increase the rigidity of the plasma membrane. To establish their roles more precisely, we sequenced the whole genome of Rhodomicrobium udaipurense JA643T, isolated from a freshwater stream in Udaipur, Himachal Pradesh, India, using 2 × 100 bp paired-end chemistry on the Illumina HiSeq platform. The genome showed a high degree of similarity to that of R. vannielii ATCC17100T; the 13.7 million reads generated yielded a sequence of 3,649,277 bp containing 3,611 putative genes. The genomic data were then examined for genes involved in various features. The organism possesses the machinery required for the degradation of aromatic compounds and resistance to solvents, as well as all that is required for photosynthesis. Through extensive functional annotation, 18 genes involved in hopanoid biosynthesis were predicted, namely those responsible for the synthesis of diploptene, diplopterol, adenosylhopane, ribosylhopane, aminobacteriohopanetriol, glycosyl group-containing hopanoids and unsaturated hopanoids. The hopanoid biosynthetic pathway was then inferred from the genes identified and through experimental validation of individual hopanoid molecules. The genome data of R. udaipurense JA643T will be useful for understanding the functional features of hopanoids in this bacterium.
Rhodomicrobium udaipurense JA643T; genome sequence; Illumina HiSeq; hopanoid biosynthesis pathway
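The read count and assembly length above allow a quick estimate of sequencing depth with the Lander-Waterman relation c = N·L/G. The Python sketch below assumes that each of the 13.7 million reads is 100 bp long. The abstract does not say whether this figure counts single reads or read pairs. Under that assumption the expected mean depth is roughly 375×.

```python
import math

def mean_coverage(n_reads, read_length, genome_size):
    """Expected mean sequencing depth (Lander-Waterman): c = N * L / G."""
    return n_reads * read_length / genome_size

def expected_fraction_uncovered(coverage):
    """Expected fraction of bases with zero reads under a Poisson model: e^(-c)."""
    return math.exp(-coverage)

# Assumes 13.7 million reads of 100 bp each (not stated explicitly in the abstract).
depth = mean_coverage(13_700_000, 100, 3_649_277)
print(f"{depth:.0f}x")
```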
Grain amaranths, edible C4 dicots, produce pseudo-cereals high in lysine. Because lysine is one of the most limiting essential amino acids in cereals, and C4 photosynthesis is one of the most sought-after phenotypes in protein-rich legume crops, the genome of a grain amaranth is likely to play a critical role in crop research. We sequenced the genome and transcriptome of Amaranthus hypochondriacus, a diploid (2n = 32) belonging to the order Caryophyllales with an estimated genome size of 466 Mb. Of the 411 linkage single-nucleotide polymorphisms (SNPs) reported for grain amaranths, 355 (86%) are represented in the scaffolds, and 74% of the 8.6 billion bases of the sequenced transcriptome map to the genomic scaffolds. The genome of A. hypochondriacus codes for at least 24,829 proteins, shares the paleohexaploidy event with species in the superorders Rosids and Asterids, harbours 1 SNP per 1,000 bases, and contains 13.76% repeat elements. Annotation of all the genes in the lysine biosynthetic pathway using comparative genomics and expression analysis offers insights into the high-lysine phenotype. As this is the first genome reported for a grain species in the Caryophyllales and the first C4 dicot genome, the work presented here will be beneficial for improving crops and for expanding our understanding of angiosperm evolution.
Caryophyllales; grain amaranth; Amaranthus hypochondriacus; lysine biosynthesis; C4 photosynthesis
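A repeat-content figure such as 13.76% should be computed on the union of repeat annotations. Otherwise, overlapping hits from a repeat-annotation tool would be counted twice. The Python sketch below merges half-open intervals before summing their lengths. It assumes the intervals lie on a single coordinate system (one sequence or concatenated coordinates).

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def repeat_fraction(repeat_intervals, genome_size):
    """Fraction of the genome covered by at least one repeat annotation."""
    covered = sum(end - start for start, end in merge_intervals(repeat_intervals))
    return covered / genome_size
```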
The pufferfish Takifugu flavidus is an economically important species because of its outstanding flavour and high market value. It has also been regarded as an excellent model for genetic studies for decades. In the present study, three mate-pair libraries of the T. flavidus genome were sequenced on the SOLiD 4 next-generation sequencing platform, and the draft genome was constructed from the short reads using an assisted assembly strategy. The draft consists of 50,947 scaffolds with an N50 of 305.7 kb, and the average GC content is 45.2%. Repetitive sequences totalled 26.5 Mb, accounting for 6.87% of the genome, indicating that the compactness of the T. flavidus genome is comparable to that of the T. rubripes genome. A total of 1,253 non-coding RNA genes and 30,285 protein-coding genes were assigned to the genome. There were 132, 775 and 394 presumptive genes playing roles in colour pattern variation, relatively slow growth and lipid metabolism, respectively. Among them, genes involved in the microtubule-dependent transport system, angiogenesis, the decapentaplegic pathway and lipid mobilization were significantly expanded in the T. flavidus genome. This draft genome provides a valuable resource for both fundamental and applied research on pufferfish in the future.
Takifugu flavidus; draft genome; NGS
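The N50 reported above is the scaffold length L at which scaffolds of length ≥ L together contain at least half of the assembly. A minimal Python sketch of that standard definition:

```python
def n50(lengths):
    """Smallest length L such that sequences of length >= L cover at least
    half of the total assembly length. Returns 0 for an empty list."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0
```

Applied to a list of 50,947 scaffold lengths, this returns the single value that the abstract summarises as 305.7 kb.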
Breeding new varieties with low seed glucosinolate (GS) concentrations has long been a prime target in Brassica napus. In this study, a novel association mapping methodology termed ‘associative transcriptomics’ (AT) was applied to a panel of 101 B. napus lines to define genetic regions and candidate genes controlling total seed GS content. Over 100,000 informative single-nucleotide polymorphisms (SNPs) and gene expression markers (GEMs) were developed for AT analysis, leading to the identification of 10 SNP and 7 GEM association peaks. Within these peaks, 26 genes were inferred to be involved in GS biosynthesis. A weighted gene co-expression network analysis provided an additional 40 candidate genes. The leaf transcript abundance of two candidate genes, BnaA.GTR2a on chromosome A2 and BnaC.HAG3b on C9, was correlated with seed GS content, explaining 18.8 and 16.8% of phenotypic variation, respectively. Resequencing of these genomic regions revealed six new SNPs in BnaA.GTR2a and four insertions or deletions in BnaC.HAG3b. These deletion polymorphisms were then successfully converted into polymerase chain reaction–based diagnostic markers that, owing to the high linkage disequilibrium observed in these regions of the genome, can be used in marker-assisted breeding for low seed GS lines.
associative transcriptomics; SNP; GEM; glucosinolate
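A diagnostic marker is only a reliable proxy for a nearby causal variant when the two are in high linkage disequilibrium. The usual measure is r² = D²/(p_A·p_a·p_B·p_b), where D = p_AB − p_A·p_B. The Python sketch below assumes phased two-locus haplotypes are available. Unphased genotype data would need an extra haplotype-estimation step, such as EM, which is not shown.

```python
def ld_r2(haplotypes):
    """LD r^2 between two biallelic loci.

    haplotypes: phased two-locus haplotypes as strings 'AB', 'Ab', 'aB' or 'ab',
    where uppercase marks allele 1 at each locus.
    """
    n = len(haplotypes)
    p_AB = sum(h == "AB" for h in haplotypes) / n
    p_A = sum(h[0] == "A" for h in haplotypes) / n
    p_B = sum(h[1] == "B" for h in haplotypes) / n
    d = p_AB - p_A * p_B  # coefficient of linkage disequilibrium
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    return d * d / denom if denom else 0.0  # undefined if a locus is monomorphic
```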
8-Oxoguanine (8-oxoG) is one of the most common DNA lesions generated by reactive oxygen species. In this study, we analysed the genome-wide distribution of 8-oxoG by combining immunoprecipitation, using antibodies specific for DNA fragments containing 8-oxoG, with a microarray covering the rat genome. Genome-wide mapping of 8-oxoG in normal rat kidney revealed that 8-oxoG is preferentially located in gene deserts. We did not observe differences in 8-oxoG levels between groups of genes with high and low expression, possibly because 8-oxoG levels are generally low in genic regions compared with gene deserts. The distributions of 8-oxoG and lamina-associated domains (LADs) were strongly correlated, suggesting that the spatial location of genomic DNA in the nucleus determines its susceptibility to oxidative modification. One possible explanation for the high 8-oxoG levels in LADs is that the nuclear periphery is more susceptible to oxidative damage caused by extra-nuclear factors. Moreover, LADs have a rather compact conformation, which may limit the recruitment of repair components to the modified bases.
8-oxoguanine; lamina-associated domain; DNA modification; oxidative stress
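One simple way to quantify co-localization of 8-oxoG-enriched regions with LADs is an observed/expected overlap ratio. This is not necessarily the statistic used in the study. The ratio compares the fraction of peak base pairs falling in LADs with the fraction of the genome that LADs occupy, and a value above 1 indicates enrichment. The Python sketch assumes that each interval list is sorted, non-overlapping and on a single chromosome.

```python
def overlap_bp(a, b):
    """Total overlap (bp) between two sorted, non-overlapping lists of
    half-open [start, end) intervals, via a linear two-pointer sweep."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            total += end - start
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

def enrichment(peaks, lads, genome_size):
    """(fraction of peak bp inside LADs) / (fraction of genome inside LADs)."""
    peak_bp = sum(e - s for s, e in peaks)
    lad_bp = sum(e - s for s, e in lads)
    observed = overlap_bp(peaks, lads) / peak_bp
    expected = lad_bp / genome_size
    return observed / expected
```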
The Caco-2 cell line is one of the most important in vitro models of enterocytes and is used to study drug absorption and diseases, including inflammatory bowel disease and cancer. To use the model optimally, its functional elements need to be mapped. In this study, we generated genome-wide maps of active transcription start sites (TSSs) and active enhancers in Caco-2 cells with or without tumour necrosis factor (TNF)-α stimulation to mimic an inflammatory state. We found 520 promoters whose usage changed significantly upon TNF-α stimulation; of these, 52% are not annotated. A subset of these could alter protein function through the exclusion of protein domains. Moreover, we located 890 transcribed enhancer candidates, ∼50% of which changed in usage after TNF-α stimulation. These enhancers share motif enrichments with similarly responding gene promoters. As a case example, we characterize an enhancer that regulates the laminin-5 γ2-chain (LAMC2) gene through nuclear factor (NF)-κB binding. This report is the first to present comprehensive TSS and enhancer maps of Caco-2 cells, and highlights many novel inflammation-specific promoters and enhancers.
alternative promoters; inflammation; non-coding RNAs; transcribed enhancers; transcriptional regulation
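Transcribed enhancers are often recognised in 5′-end tag data (such as CAGE) by divergent, bidirectional transcription. In this pattern, a minus-strand tag cluster sits a short distance upstream of a plus-strand cluster, with transcription running outward from a central region. The Python sketch below pairs such clusters on one chromosome. The use of this criterion and the `max_gap` value are assumptions for illustration, not details given in the abstract.

```python
def bidirectional_candidates(minus_clusters, plus_clusters, max_gap=400):
    """Pair tag clusters in divergent orientation on one chromosome.

    minus_clusters, plus_clusters: summit positions (bp) of minus- and
    plus-strand tag clusters. A candidate is a minus-strand cluster with the
    nearest downstream plus-strand cluster within `max_gap` bp.
    Returns (minus_pos, plus_pos, midpoint) tuples.
    """
    plus_sorted = sorted(plus_clusters)
    candidates = []
    for m in sorted(minus_clusters):
        for p in plus_sorted:
            if p <= m:
                continue  # not divergent: plus cluster must lie downstream
            if p - m > max_gap:
                break  # nearest downstream partner is too far away
            candidates.append((m, p, (m + p) // 2))
            break  # keep only the nearest plus-strand partner
    return candidates
```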
The large genome and allohexaploidy of common wheat have complicated the construction of a high-density genetic map. Although improvements in the throughput of next-generation sequencing (NGS) technologies have made it possible to obtain large amounts of genotyping data for an entire mapping population by direct sequencing, even in hexaploid wheat, a significant number of missing data points are often apparent owing to low sequencing coverage. In the present study, a microarray-based polymorphism detection system was developed using NGS data obtained from complexity-reduced genomic DNA of two common wheat cultivars, Chinese Spring (CS) and Mironovskaya 808. After design and selection of polymorphic probes, 13,056 new markers were added to the linkage map of a recombinant inbred mapping population between CS and Mironovskaya 808. On average, 2.49 missing data points per marker were observed across the 201 recombinant inbred lines, with a maximum of 42. Around 40% of the new markers were derived from genic regions and 11% from repetitive regions. The low number of retroelements indicated that the new polymorphic markers were mainly derived from the less repetitive regions of the wheat genome. Around 25% of the mapped sequences were useful for alignment with the physical map of barley. Quantitative trait locus (QTL) analyses of 14 agronomically important traits related to flowering, spikes and seeds demonstrated that the new high-density map improved QTL detection, resolution and accuracy over the original simple sequence repeat map.
array-based genotyping; chromosomal synteny; high-density genetic map; next-generation sequencing; QTL analysis
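In a recombinant inbred population produced by selfing, the observed fraction of recombinant lines R between two markers exceeds the single-meiosis recombination fraction r, because R = 2r/(1 + 2r). This must be inverted before a mapping function is applied. The Python sketch below uses the Kosambi function as an example. The abstract does not say which mapping function was used for the wheat map.

```python
import math

def ril_to_meiotic_r(R):
    """Convert the recombinant fraction in selfed RILs to the meiotic
    recombination fraction: R = 2r / (1 + 2r)  =>  r = R / (2 * (1 - R))."""
    if not 0 <= R < 0.5:
        raise ValueError("R must lie in [0, 0.5) for selfed RILs")
    return R / (2 * (1 - R))

def kosambi_cm(r):
    """Kosambi map distance in centimorgans: d = 25 * ln((1 + 2r) / (1 - 2r))."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 25 * math.log((1 + 2 * r) / (1 - 2 * r))
```

For small r, the Kosambi distance is close to 100·r cM.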
Rosette neural stem cells (R-NSCs) represent an early stage of neural development and possess full neural differentiation and regionalization capacities. R-NSCs are considered stem cells of the neural lineage and have important implications for the study of neurogenesis and for cell replacement therapy. However, the molecules regulating their functional properties remain largely unknown. The rhesus monkey is an ideal model for studying human neurodegenerative diseases and serves an intermediate translational role as therapeutic strategies move from rodent systems to human clinical applications. In this study, we derived R-NSCs from rhesus monkey embryonic stem cells (ESCs) and systematically investigated their unique expression of mRNAs, microRNAs (miRNAs) and signalling pathways by genome-wide comparison of the mRNA and miRNA profiles of ESCs, R-NSCs at early (R-NSCP1) and late (R-NSCP6) passages, and neural progenitor cells. Apart from the R-NSCP1-specific protein-coding genes and miRNAs, we identified several pathways, including Hedgehog and Wnt, that were highly activated in R-NSCP1. Possible regulatory interactions among the miRNAs, protein-coding genes and signalling pathways were proposed. In addition, many genes showing alternative splicing switches were identified in R-NSCP1. These data provide a valuable resource for understanding the regulation of early neurogenesis and for better manipulating R-NSCs for cell replacement therapy.
rhesus monkeys; embryonic stem cells; neural differentiation; transcriptome; microRNAomes
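Because miRNAs generally repress their targets, candidate regulatory miRNA-mRNA interactions are often shortlisted as predicted pairs whose expression is anticorrelated across samples. Here the samples could be ESCs, R-NSCP1, R-NSCP6 and neural progenitor cells. The Python sketch below shows that filtering step only. It assumes that target predictions and expression values are given as inputs, and the correlation cutoff is hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def anticorrelated_pairs(mirna_expr, mrna_expr, candidate_targets, r_max=-0.8):
    """mirna_expr / mrna_expr: {name: [expression per sample]}, same sample order.
    candidate_targets: {mirna: [mrna, ...]} from a separate target-prediction step.
    Returns (mirna, mrna, r) for predicted pairs with Pearson r <= r_max."""
    pairs = []
    for mirna, targets in candidate_targets.items():
        for gene in targets:
            r = pearson(mirna_expr[mirna], mrna_expr[gene])
            if r <= r_max:
                pairs.append((mirna, gene, r))
    return pairs
```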
In many crop species, DNA fingerprinting is required for the precise identification of cultivars to protect breeders' rights. Many retrotransposon families have multiple copies throughout eukaryotic genomes, and their integrated copies are inherited genetically. Thus, their insertion polymorphisms among cultivars are useful for DNA fingerprinting. In this study, we performed DNA fingerprinting based on the insertion polymorphisms of active retrotransposon families (Rtsp-1 and LIb) in sweet potato. Using 38 cultivars, we identified 2,024 insertion sites of the two families with the Illumina MiSeq sequencing platform. Of these insertion sites, 91.4% appeared to be polymorphic among the cultivars, and 376 cultivar-specific insertion sites were identified, which were converted directly into cultivar-specific sequence-characterized amplified region (SCAR) markers. A phylogenetic tree constructed from these insertion sites corresponded well with known pedigree information, indicating their suitability for genetic diversity studies. Thus, genome-wide comparative analysis of active retrotransposon insertion sites using the bench-top MiSeq platform is highly effective for DNA fingerprinting without requiring whole-genome sequence information. This approach may facilitate the development of a practical polymerase chain reaction-based cultivar diagnostic system and could also be applied to determining genetic relationships.
DNA fingerprinting; high-throughput sequencing; molecular marker; retrotransposon; sweet potato
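The counts above follow directly from a presence/absence table of insertion sites per cultivar. A site is polymorphic if at least one cultivar lacks it, and cultivar-specific if only one cultivar has it. Pairwise distances computed from the same table can feed a tree-building method such as UPGMA or neighbour joining. The abstract does not say which method was used, so the Jaccard distance below is only an illustrative choice.

```python
def cultivar_specific_sites(presence):
    """presence: {cultivar: set of insertion-site IDs}.
    Returns {cultivar: sites found in that cultivar only}."""
    specific = {}
    for cv, sites in presence.items():
        others = set().union(*(s for c, s in presence.items() if c != cv))
        specific[cv] = sites - others
    return specific

def polymorphic_fraction(presence):
    """Fraction of all detected sites that are absent from at least one cultivar."""
    all_sites = set().union(*presence.values())
    shared = set.intersection(*presence.values())
    return len(all_sites - shared) / len(all_sites)

def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| between two cultivars' insertion-site sets."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0
```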
RNA-seq, and especially differential RNA-seq (dRNA-seq)-type transcriptomic analyses, are powerful analytical tools, as they not only reveal gene expression changes but also provide detailed information about all promoters active at a given moment, giving deep insight into the transcriptional landscape. Synechocystis sp. PCC 6803 (Synechocystis 6803) is a unicellular model cyanobacterium widely used in research fields ranging from ecology and photophysiology to systems biology, modelling and biotechnology. Here, we analysed the response of the Synechocystis 6803 primary transcriptome to different, environmentally relevant stimuli. We established genome-wide maps of the transcriptional start sites active under 10 different conditions relevant to photosynthetic growth and identified 4,091 transcriptional units, which provide information about operons and 5′ and 3′ untranslated regions (UTRs). Based on a unique expression factor, we describe regulons and relevant promoter sequences at single-nucleotide resolution. Finally, we report several sRNAs with intriguing expression patterns, and therefore likely functions, specific to carbon depletion (CsiR1), nitrogen depletion (NsiR4), phosphate depletion (PsiR1), iron stress (IsaR1) or photosynthesis (PsrR1). This dataset is accompanied by comprehensive information providing extensive visualization and data access, allowing an easy-to-use approach to the design of experiments, incorporation into modelling studies of the regulatory system, and comparative analyses.
comparative transcriptome analysis; cyanobacteria; regulation of gene expression; sRNA; transcriptional unit
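In dRNA-seq, one library is treated with terminator exonuclease (TEX), which degrades processed RNAs carrying a 5′ monophosphate. Primary transcripts, which carry a 5′ triphosphate, are thereby enriched. Positions where read 5′ ends are enriched in the TEX-treated library relative to the untreated library are therefore candidate primary TSSs. The Python sketch below applies that logic with hypothetical thresholds. It omits library-size normalization and replicate handling, which a real pipeline would need.

```python
def call_primary_tss(starts_tex, starts_untreated, min_reads=10,
                     min_ratio=2.0, pseudocount=1.0):
    """starts_*: {position: number of reads whose 5' end maps there}.

    A position is called when the TEX-treated library has at least `min_reads`
    5' ends there and is enriched at least `min_ratio`-fold over the untreated
    library (a pseudocount avoids division by zero). Returns sorted positions."""
    calls = []
    for pos, n_tex in sorted(starts_tex.items()):
        n_untreated = starts_untreated.get(pos, 0)
        ratio = (n_tex + pseudocount) / (n_untreated + pseudocount)
        if n_tex >= min_reads and ratio >= min_ratio:
            calls.append(pos)
    return calls
```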
In this study, we carried out evolutionary, transcriptional and functional analyses of the trihelix transcription factor family. A total of 319 trihelix members, identified from 11 land plant species, were classified into five clades. Phylogenetic analysis indicates that the GT1 and GT2 binding domains diverged early in land plant evolution. Genomic localization revealed that trihelix family members were highly conserved among cereal species, even though some homeologs generated during the tetraploidization of maize were lost. Three-dimensional structural analyses and an examination of the subcellular localization of this family supported the involvement of all five clades in transcriptional regulation. Furthermore, family members from all clades in sorghum and rice showed broad and dynamic expression patterns in response to abiotic stresses, indicating regulatory subfunctionalization of their original functions. This finding is further supported by the enhanced tolerance to cold, salt and drought of transgenic plants overexpressing Sb06g023980 and Sb06g024110. In contrast, few Arabidopsis genes showed inducible expression under abiotic stress conditions, which may indicate a functional shift. Finally, our co-expression analysis points to the involvement of this family in various metabolic processes, implying further functional divergence.
trihelix; abiotic stress; sorghum; subfunctionalization
Fructooligosaccharide (FOS), a prebiotic well known for its health-promoting properties, can improve the human gut ecosystem, most likely through changes in its microbial composition. However, the detailed mechanism(s) by which FOS modulates the gut ecosystem remain(s) obscure. Traditional methods of profiling microbes and metabolites revealed few significant features because of large interindividual differences. In contrast, our novel microbe–metabolite correlation approach, combined with faecal immunoglobulin A (IgA) measurements, revealed that the induction of mucosal IgA by FOS supplementation correlated with the presence of specific bacteria. Furthermore, the metabolic dynamics of butyrate, L-phenylalanine, L-lysine and tyramine were positively correlated with those of these bacteria and with IgA production, whereas those of p-cresol were negatively correlated. Taken together, our focused intraindividual analysis using omics approaches is a powerful strategy for uncovering the gut molecular network and could provide a new vista for understanding the human gut ecosystem.
commensal microbiota; correlation analysis; gut ecosystem; metabolite; prebiotics
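An intraindividual analysis correlates time series within each subject rather than pooling subjects. This keeps large between-person baseline differences from dominating the result. The Python sketch below computes a per-subject Spearman rank correlation, for example between a bacterial abundance and faecal IgA. The abstract does not specify the correlation statistic used, so Spearman is an assumption here.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def per_individual_spearman(series_a, series_b):
    """series_*: {subject: [values over time points]}. Returns {subject: rho}."""
    return {s: pearson(ranks(series_a[s]), ranks(series_b[s])) for s in series_a}
```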
Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Because the genomes of Brassica and related species, including radish, have undergone rearrangement, it is difficult to perform functional analysis of radish based on the reported genomic sequence of Brassica rapa. We therefore sequenced the radish genome. A total of 191.1 Gb of short genomic reads was obtained by next-generation sequencing (NGS) of a radish inbred line. From these reads, together with bacterial artificial chromosome (BAC)-end sequences, 76,592 scaffolds of ≥300 bp were constructed. Finally, a draft genomic sequence of 402 Mb, spanning 75.9% of the estimated genome size and containing 61,572 predicted genes, was obtained. Subsequently, 221 single-nucleotide polymorphism (SNP) markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study to construct a linkage map. This map was further combined with another radish linkage map, constructed mainly with expressed sequence tag-simple sequence repeat markers, into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds, spanning 116.0 Mb, were assigned to the linkage map. Bulked PCR products amplified with 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified.
radish; draft sequence; high-density genetic map
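A PCR-RFLP marker works when a SNP creates or destroys a restriction site within the amplicon, so that the two alleles give different digestion patterns. The Python sketch below compares exact recognition sites between two allele sequences. The enzyme panel is a hypothetical example, although GAATTC (EcoRI) and AAGCTT (HindIII) are their standard sites. Degenerate motifs and cut-position offsets are not handled.

```python
def find_sites(seq, motif):
    """0-based start positions of an exact, non-degenerate motif (overlaps allowed)."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions

def rflp_enzymes(allele1, allele2, enzymes):
    """Names of enzymes ({name: recognition motif}) whose site positions differ
    between two allele sequences of the same amplicon."""
    return sorted(name for name, motif in enzymes.items()
                  if find_sites(allele1, motif) != find_sites(allele2, motif))

# Hypothetical enzyme panel for illustration.
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}
```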
With a remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analysis of big sequence data. The self-organizing map (SOM) is a powerful tool for clustering high-dimensional data on a single plane. To handle oligonucleotide compositions as high-dimensional data, we previously modified the conventional SOM for genome informatics: BLSOM. In the present study, we constructed BLSOMs for oligonucleotide compositions in fragment sequences (e.g. 100 kb) from a wide range of vertebrates, including coelacanth. The sequences were clustered primarily according to species, even though no species information was provided. As one of the nearest living relatives of tetrapod ancestors, the coelacanth is believed to provide access to the phenotypic and genomic transitions leading to the emergence of tetrapods. The characteristic oligonucleotide composition found for coelacanth was associated with the lowest dinucleotide CG occurrence (i.e. the strongest CG suppression) among fishes, comparable to that of tetrapods. This evident CG suppression in coelacanth should reflect molecular evolutionary processes of epigenetic systems, including DNA methylation, during vertebrate evolution. The sequence of a de novo DNA methyltransferase (Dnmt3a) of coelacanth was found to be more closely related to those of tetrapods than to those of other fishes.
big data; epigenetic; SOM; DNA methylation; CG suppression
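BLSOM takes as input, for each sequence fragment, a vector of oligonucleotide frequencies. CG suppression is commonly summarised as the observed/expected CG ratio f(CG)/(f(C)·f(G)). The Python sketch below computes both quantities for a single fragment. It does not implement BLSOM itself, and BLSOM's own normalization details may differ.

```python
from itertools import product

def kmer_composition(seq, k=2):
    """Relative frequencies of all 4**k oligonucleotides (ACGT only),
    in lexicographic order; k-mers containing other characters are skipped."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]

def cg_obs_exp(seq):
    """Observed/expected CG ratio: f(CG) / (f(C) * f(G)); values well below 1
    indicate CG suppression. Returns NaN when undefined."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return float("nan")
    f_c = seq.count("C") / n
    f_g = seq.count("G") / n
    if f_c == 0 or f_g == 0:
        return float("nan")
    n_cg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    return (n_cg / (n - 1)) / (f_c * f_g)
```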
Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected, largely because they are highly diverse and are not expressed under many cultivation conditions. Several software tools, including SMURF and antiSMASH, have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase, as well as several other genes typically present in such clusters. In this work, we devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information from known SMB genes. The method detects SMB gene clusters by searching for genes that share a similar order across genomes and lie in nonsyntenic blocks. With this method, we identified many known SMB gene clusters containing core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, we detected a significant difference in sequence characteristics between genes residing inside the clusters and those outside them.
secondary metabolism; bioinformatics; filamentous fungi
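The Python sketch below illustrates only one ingredient of the approach described above: finding runs of genes whose orthologs are also adjacent, in the same or reversed order, in a second genome. It is a simplified stand-in for the published method. It requires strictly consecutive orthologs rather than a "similar" order, assumes one contig per genome, and does not apply the nonsynteny filter. The ortholog map and `min_genes` are hypothetical inputs.

```python
def conserved_adjacent_runs(genome_a, genome_b, ortholog, min_genes=3):
    """genome_a / genome_b: gene IDs in chromosomal order (one contig each).
    ortholog: {gene_in_a: gene_in_b}.
    Returns runs (lists of A genes, length >= min_genes) whose orthologs are
    consecutive in B in a consistent direction (+1 or -1)."""
    pos_b = {g: i for i, g in enumerate(genome_b)}
    runs = []
    current = []   # genes of A in the run being extended
    last_j = None  # position in B of the previous gene's ortholog
    step = None    # +1 (same orientation) or -1 (reversed)
    for gene in genome_a:
        j = pos_b.get(ortholog.get(gene))
        extends = (
            j is not None and last_j is not None
            and abs(j - last_j) == 1
            and (step is None or j - last_j == step)
        )
        if extends:
            step = j - last_j
            current.append(gene)
        else:
            if len(current) >= min_genes:
                runs.append(current)
            current = [gene] if j is not None else []
            step = None
        last_j = j
    if len(current) >= min_genes:
        runs.append(current)
    return runs
```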
Plasmodium falciparum malaria is a serious public health concern throughout the tropics. Although genetic tools are essential for fully investigating malaria parasites, the currently available forward and reverse genetic tools are fairly limited. Parasites with a high mutation rate are expected to acquire novel phenotypes/traits readily; however, they remain an untapped tool in malaria biology. Here, we generated a mutator malaria parasite (hereinafter called a ‘malaria mutator’) using site-directed mutagenesis and gene transfection techniques. A mutator Plasmodium berghei line with defective proofreading 3′ → 5′ exonuclease activity in DNA polymerase δ (referred to as PbMut) and a control P. berghei line with wild-type DNA polymerase δ (referred to as PbCtl) were maintained by weekly passage in ddY mice for 122 weeks. High-throughput genome sequencing revealed that two PbMut lines had 175–178 mutations and an 86- to 90-fold higher mutation rate than a PbCtl line. PbMut, PbCtl and their parent strain, PbWT, showed similar courses of infection. Interestingly, PbMut lost the ability to form gametocytes during serial passage. We believe that the malaria mutator system could provide a novel and useful tool for investigating malaria biology.
mutator; Plasmodium; DNA polymerase δ; genome sequencing
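In a mutation-accumulation design like this one, a per-site mutation rate is estimated as the number of mutations detected divided by the number of sites examined and the number of generations. When two lines share the same genome and passage history, the fold difference in rate reduces to the ratio of mutation counts. The abstract gives neither the generation count nor the control line's mutation count, so all inputs in the Python sketch below are placeholders.

```python
def mutation_rate(n_mutations, sites_examined, generations):
    """Mutations per site per generation, assuming all mutations are detected
    and none are removed by selection."""
    return n_mutations / (sites_examined * generations)

def fold_difference(rate_mutator, rate_control):
    """How many times higher the mutator line's rate is than the control's."""
    return rate_mutator / rate_control
```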
We fully sequenced four plastid genomes, and partially sequenced six additional ones, of the model legume Medicago truncatula. Three accessions, Jemalong 2HA, Borung and Paraggio, belong to ssp. truncatula, and R108 belongs to ssp. tricycla. We report here that R108 ptDNA has a ∼45-kb inversion relative to ptDNA of ssp. truncatula, mediated by a short, imperfect repeat. DNA gel blot analyses of seven additional ssp. tricycla accessions detected only one of the two alternative genome arrangements in each accession, with the two arrangements represented by three and four accessions, respectively. Furthermore, we found a variable number of repeats in the essential accD and ycf1 coding regions. The repeats within accD are recombinationally active, yielding variable-length insertions and deletions in the central part of the coding region. The length of ACCD was distinct in each of the 10 sequenced ecotypes, ranging from 650 to 796 amino acids. The repeats in the ycf1 coding region are also recombinationally active, yielding short indels in 10 regions of the reading frame. Thus, the plastid genome variability we report here could be linked to repeat-mediated genome rearrangements. However, the rate of recombination was sufficiently low that no heterogeneity of ptDNA could be observed in populations maintained by single-seed descent.
accD; Medicago truncatula; plastid genome; ptDNA; ycf1
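In the simplest intramolecular model, recombination between two copies of a direct repeat deletes one copy together with the sequence between them. Repeated rounds of such events can generate the variable-length indels described in accD and ycf1. The Python sketch below finds exact direct repeats of a chosen length and predicts that deletion product. The real repeats may be imperfect, the repeat length is a hypothetical parameter, and inversions (which arise from inverted repeats) are not modelled.

```python
def direct_repeats(seq, k=8):
    """{k-mer: [0-based positions]} for k-mers occurring more than once on the same strand."""
    seen = {}
    for i in range(len(seq) - k + 1):
        seen.setdefault(seq[i:i + k], []).append(i)
    return {kmer: pos for kmer, pos in seen.items() if len(pos) > 1}

def deletion_product(seq, i, j):
    """Sequence after recombination between direct-repeat copies starting at
    i < j: the first copy is kept, and the spacer plus the second copy are lost."""
    return seq[:i] + seq[j:]
```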