Next-generation sequencing (NGS) technologies enable the rapid production of an enormous quantity of sequence data. These powerful new technologies allow the identification of mutations by whole-genome sequencing. However, most reported NGS-based mapping methods, which are based on bulked segregant analysis, are costly and laborious. To address these limitations, we designed a versatile NGS-based mapping method that combines low- to medium-coverage multiplex SOLiD (Sequencing by Oligonucleotide Ligation and Detection) with classical genetic rough mapping. Using only low to medium coverage reduces the SOLiD sequencing costs and, since just 10 to 20 mutant F2 plants are required for rough mapping, the operation is simple enough to handle in a laboratory with limited space and funding. As a proof of principle, we successfully applied this method to identify CTR1, which is involved in boron-mediated root development, from among a population of high-boron-requiring Arabidopsis thaliana mutants. Our work demonstrates that this NGS-based mapping method is a moderately priced and versatile method that can readily be applied to other model organisms.
Next-generation sequencing (NGS) technologies have superseded conventional Sanger sequencing by capillary electrophoresis in many instances of large-scale analysis, such as de novo sequencing, transcriptomics, DNA methylation analysis, metagenomics and population genetic studies.1-4 These new technologies also provide a powerful tool for identifying ethyl methanesulfonate (EMS)- and N-ethyl-N-nitrosourea (ENU)-induced mutations by whole-genome re-sequencing. For example, mutations underlying an aberrant neuronal phenotype in Caenorhabditis elegans were identified by re-sequencing of the mutant genomes.5 In another case, mutants of the ethanol-producing yeast Pichia stipitis were sequenced to identify mutations that facilitated efficient fermentation.6 The ability to identify mutations via NGS technologies has greatly reduced the amount of time needed compared with conventional map-based cloning.
In plant research, as in research in a variety of model organisms, these NGS technologies have been successfully applied to identify the mutations underlying phenotypes of interest. Schneeberger, et al.7 developed a method called SHOREmap that uses an Illumina Genome Analyzer (GA) to identify causative mutations of A. thaliana.7 By sequencing a genomic DNA sample prepared from a pool of 500 mutant F2 plants that were obtained by crossing a mutant with another wild-type accession, they have identified a causative mutation. This method is based on an approach termed bulked segregant analysis.8,9 For example, a mutant in the Columbia (Col-0) background was crossed to the polymorphic Landsberg erecta (Ler) accession, followed by selfing of the F1 progeny to generate an F2 population in which the mutant phenotype segregates according to Mendelian rules. Then, F2 plants exhibiting the mutant phenotype were pooled and the bulk segregant pool was subjected to deep sequencing. The causative region was confined to a small genomic region by analyzing the index of enrichment of Col-0-type single nucleotide polymorphisms (SNPs) and, finally, causal mutations were identified from the sequence data obtained by deep sequencing analysis.7 Recently, Austin, et al.10 developed a Next-Gen Mapping (NGM) method by modifying the above-mentioned SHOREmap technique, and succeeded in identifying three genes involved in cell wall biology.10 Moreover, a spontaneous mutation in a nonreference A. thaliana accession and EMS-induced mutations in a nonreference accession background were successfully identified using deep sequencing.11,12 These modifications of bulked segregant analysis are extremely useful for identifying mutations in A. thaliana and can also be applied in crops and other organisms with fully sequenced genomes. However, deep sequencing remains expensive and laborious, as approximately 100 or more mutant F2 plants are required for this type of bulked segregant analysis.
To address these problems, we designed a versatile NGS-based mapping method that incorporates SOLiD (Sequencing by Oligonucleotide Ligation and Detection). This mapping method is based on a combination of low- to medium-coverage SOLiD13 and classical genetic rough mapping. Sequencing at just low to medium coverage reduces costs. Furthermore, since rough mapping requires only 10 to 20 F2 plants with the mutant phenotype, experiments using this strategy do not require much space. Using this method, we rapidly identified CTR1, which is involved in boron-mediated growth. Here, we describe this NGS-based mapping method and discuss its applications.
Boron is an essential nutrient for plants and boron deficiency is a major cause of reduced crop production.14 Boron maintains the structure and function of the plant cell wall by cross-linking the pectic polysaccharide rhamnogalacturonan (RG) II.15 Several boron transporters that are upregulated under boron limitation have been identified in A. thaliana16-18 and their polar localization and degradation through trafficking pathways in plant cells have been demonstrated.19,20 Although significant progress has been made in our understanding of boron transport mechanisms, the precise role of boron in plant growth and development remains unclear.
To obtain insight into boron function in plants, the EMS-mutagenized M2 seeds of A. thaliana were screened for mutants that required more boron than the wild type for root elongation. Approximately 20,000 seeds were sown onto normal medium (30 µM B) and short-root plants were transferred to medium containing 1 mM boron after 7 d. After growth on high boron medium for 7 d, plants that exhibited increased root elongation at 1 mM boron were selected. From this screening, we isolated 13 mutants. We named one of these mutants b26–6/ctr1–16, as it is allelic to the ctr1 mutants described later (Fig. 1A).
The b26–6/ctr1–16 mutant in the Col-0 background was crossed with Ler wild-type plants for rough mapping. The F2 population segregated into wild type and mutant type at a ratio of 3:1, indicating that the mutant phenotype is caused by a single recessive mutation. Genomic DNA was isolated from 12 F2 plants that exhibited the mutant phenotype and the mutation was assigned to a chromosome using simple sequence length polymorphism (SSLP) markers F15A17 and T32M21. A candidate region with the mutation was rough mapped to between 0.70 Mb and 1.26 Mb on chromosome 5, a region that spanned 175 putative genes annotated in TAIR9 (Fig. 1B and Table 2).
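The 3:1 segregation described above can be checked with a chi-square goodness-of-fit test. The following is a minimal sketch only: the plant counts are hypothetical (the F2 counts are not reported here), the function name is illustrative, and 3.841 is the standard 5% critical value for one degree of freedom.

```python
def chi_square_3_to_1(n_wild_type, n_mutant):
    """Chi-square statistic against a 3:1 (wild type : mutant) expectation."""
    total = n_wild_type + n_mutant
    expected_wt = total * 0.75
    expected_mut = total * 0.25
    return ((n_wild_type - expected_wt) ** 2 / expected_wt
            + (n_mutant - expected_mut) ** 2 / expected_mut)


# Standard 5% critical value of the chi-square distribution, 1 degree of freedom.
CRITICAL_5PCT_DF1 = 3.841

# Hypothetical F2 counts, for illustration only (not data from this study).
stat = chi_square_3_to_1(n_wild_type=152, n_mutant=48)
consistent_with_single_recessive = stat < CRITICAL_5PCT_DF1
print(f"chi-square = {stat:.3f}, consistent with 3:1: {consistent_with_single_recessive}")
```

A statistic below the critical value means the observed counts do not deviate significantly from the ratio expected for a single recessive mutation.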
To identify point mutations, we sequenced the genomic DNA of the b26–6/ctr1–16 mutant by SOLiD. We constructed sequence libraries from the b26–6/ctr1–16 mutant and seven other mutants derived from Lehle Seeds using the SOLiD barcoding system to distinguish the eight samples (Fig. 2). The 8-plex libraries were sequenced on a single SOLiD slide. In total, 378.4 M reads were obtained, of which 58.4 M were assigned to the b26–6/ctr1–16 mutant library (see Table 1 for details). Of all the b26–6/ctr1–16 mutant library reads, 73.2% were mapped to the TAIR9 release of the A. thaliana Col-0 genome. The median value of per-base sequence depth was 10 and the genome coverage was 91.8% (Table 1 and Fig. S1).
SNPs refer to sites that differ from the TAIR9 release of the A. thaliana Col-0 genome. We used diBayes for SNP calling with a low stringency setting. Although this setting may increase the risk of reporting false positives, considering that the sequence depth was relatively low (median = 10), we chose this aggressive but sensitive SNP calling strategy to avoid overlooking true SNPs of low sequence coverage. With this strategy, the diBayes program listed 2162 homozygous SNP sites (Table 2).
We assessed the pattern of genome-wide SNPs by plotting the SNP frequencies using a bin size of 1 Mb (Fig. S2A). Almost all SNPs were distributed evenly throughout the genome at an average of 18 SNPs per 1 Mb, with a few regions being enriched in SNPs. This SNP enrichment may be due to variations that were already present in the parental line, rather than to an EMS mutagenesis bias, as a substantial number of SNPs in these regions were shared among the eight Lehle Seeds libraries sequenced in this experiment (Fig. S2B). By subtracting shared SNPs, which were regarded as background, the number of SNPs was reduced to 2046. We then extracted SNPs that involved G-to-A (or complementary C-to-T) nucleotide substitutions, which are typical of EMS mutagenesis,21,22 and obtained 462 SNPs. We classified the SNPs according to gene models: 179 occurred in exons and 0 in splice donor/acceptor sites. In total, 177 genes were found to be mutagenized in the genome of the b26–6/ctr1–16 mutant. The b26–6/ctr1–16 mutant was isolated from the mutagenized seed stock of Lehle Seeds, which is widely used by Arabidopsis researchers. Based on this experiment and additional unpublished ones (Kamiya T and Fujiwara T, unpublished data), an average of 450 mutations was found to exist in the genomes of the Lehle Seeds mutants, approximately 200 of which were exonic. This information can be used in the design of experiments to isolate A. thaliana mutants.
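The filtering steps described above (background subtraction, selection of EMS-type substitutions and binning at 1 Mb) can be sketched as follows. This is an illustrative outline, not the custom scripts used in this study: the SNP representation as (chromosome, position, reference base, alternative base) tuples, the function names and the example data are all hypothetical, and a SNP is assumed to be background if it appears in at least one other library.

```python
from collections import Counter

# Substitutions typical of EMS mutagenesis: G-to-A and the complementary C-to-T.
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}


def subtract_background(target_snps, other_libraries):
    """Drop SNPs that also occur in any other library (assumed background)."""
    background = set().union(*other_libraries)
    return [snp for snp in target_snps if snp not in background]


def is_ems_type(ref, alt):
    """True for G-to-A or C-to-T substitutions relative to the reference."""
    return (ref.upper(), alt.upper()) in EMS_TRANSITIONS


def count_per_bin(snps, bin_size=1_000_000):
    """Count SNPs per (chromosome, bin index) using fixed-size bins."""
    return Counter((chrom, pos // bin_size) for chrom, pos, _ref, _alt in snps)


# Hypothetical SNP calls, for illustration only.
target = [
    ("Chr5", 710_000, "G", "A"),
    ("Chr5", 905_000, "C", "T"),
    ("Chr5", 1_100_000, "A", "G"),
    ("Chr1", 2_500_000, "G", "A"),
    ("Chr3", 400_000, "C", "T"),
]
others = [
    {("Chr1", 2_500_000, "G", "A")},   # shared with the target: background
    {("Chr2", 100, "T", "C")},
]

unique_snps = subtract_background(target, others)
ems_snps = [snp for snp in unique_snps if is_ems_type(snp[2], snp[3])]
bins = count_per_bin(ems_snps)
print(len(unique_snps), len(ems_snps), dict(bins))
```

A stricter or looser definition of "shared" (for example, present in at least two other libraries) would only require changing how the background set is built.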
Our re-sequencing method detected three SNPs within the 0.56-Mb candidate genomic region on chromosome 5, which was narrowed down by genetic rough mapping and found to contain 175 genes (Fig. 1B). Two of the three mutations resulted in non-synonymous amino acid changes (Table 3).
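Intersecting the filtered SNP list with the rough-mapped interval is a simple coordinate filter. The sketch below uses the interval reported above (0.70 to 1.26 Mb on chromosome 5); the SNP positions, the tuple layout and the function name are hypothetical and serve only to illustrate the step.

```python
def in_interval(snp, chrom, start, end):
    """True if the SNP lies on `chrom` within [start, end] (inclusive, in bp)."""
    snp_chrom, pos = snp[0], snp[1]
    return snp_chrom == chrom and start <= pos <= end


# Rough-mapped candidate region from the text: 0.70 Mb to 1.26 Mb on chromosome 5.
ROUGH_MAP = ("Chr5", 700_000, 1_260_000)
interval_width_mb = (ROUGH_MAP[2] - ROUGH_MAP[1]) / 1_000_000

# Hypothetical SNP positions (chromosome, position), for illustration only.
snps = [
    ("Chr5", 650_000),
    ("Chr5", 735_000),
    ("Chr5", 958_000),
    ("Chr5", 1_300_000),
    ("Chr1", 900_000),
]

candidates = [snp for snp in snps if in_interval(snp, *ROUGH_MAP)]
print(f"{interval_width_mb:.2f} Mb interval, {len(candidates)} candidate SNPs")
```

Because the interval is defined by only two flanking markers, the number of surviving candidates depends mainly on the interval width and the genome-wide mutation density.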
Genome re-sequencing by SOLiD identified three candidate SNPs for the b26–6/ctr1–16 mutant in the region narrowed down by rough mapping. Two of the three mutations resulted in an amino acid substitution in At5g02990 and At5g03730, respectively. At5g03730 is annotated as CONSTITUTIVE TRIPLE RESPONSE1 (CTR1),23 and the root phenotype of ctr1 mutants, especially an abundance of hairy roots, was quite similar to that of the mutant identified in this study, b26–6/ctr1–16.24,25 b26–6/ctr1–16 has a point mutation that changes the 610th codon from CTT to TTT, which results in an amino acid substitution of Leu to Phe. Leu610 is highly conserved among species and localizes to the kinase domain of CTR1.
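The effect of the codon change described above can be verified against the standard genetic code. The following sketch builds the standard codon table and classifies a codon substitution; the function name and the three category labels are illustrative choices rather than the annotation scheme of any particular tool.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Return (ref amino acid, alt amino acid, effect) for a codon substitution."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    else:
        effect = "non-synonymous"
    return ref_aa, alt_aa, effect


# Codon 610 of CTR1 in b26-6/ctr1-16, as described in the text: CTT to TTT.
print(classify_codon_change("CTT", "TTT"))
```

The CTT-to-TTT change is a C-to-T substitution at the first codon position, which is consistent with the EMS-type substitutions selected during SNP filtering.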
To show that CTR1 is the causal gene of b26–6/ctr1–16, the boron response of two ctr1 alleles, ctr1–1 and sis1/ctr1–12, was examined (Fig. 1A).23,25 Both alleles displayed a phenotype similar to that of b26–6/ctr1–16 under normal growth conditions and, as in b26–6/ctr1–16, root elongation was restored to normal rates by treatment with 1 mM boron. These results indicate that the causal gene of b26–6/ctr1–16 is indeed CTR1.
In this paper, we demonstrate that a versatile NGS-based mapping method, which combines low- to medium-coverage sequencing by SOLiD with classical genetic rough mapping, may be used to identify causal candidate mutations in A. thaliana. This method is superior to other NGS-based mapping methods in terms of time and cost. Recently, a number of NGS-based mapping methods using bulked segregant analysis have been proposed as means to identify causative mutations.5-7,10-12 Although these methods are undoubtedly effective, they are all laborious and time consuming, as they involve the preparation of approximately 100 or more mutant F2 plants for bulked segregant analysis. This is particularly problematic when dealing with mutants that have a phenotype that is difficult to propagate or score. In addition, when a mutant is crossed to another accession for mapping, many SNPs from the two parental backgrounds are mixed. These background variations occasionally make it difficult to distinguish between the mutant and wild-type phenotypes. To avoid this problem, we separated genetic mapping from NGS analysis, which was performed using SOLiD at low to medium coverage. In our method, rough mapping requires only 10 to 20 mutant F2 plants, which can readily be produced in a small laboratory space. In fact, rough mapping using 12 mutant F2 plants was sufficient to identify the causal gene in this study. Based on this experiment and additional unpublished ones (Kamiya T and Fujiwara T, unpublished data), we have identified mutations in intervals of up to 3 Mb that were narrowed down by rough mapping using approximately ten mutant F2 segregants.
Furthermore, since low- to medium-coverage sequencing can be used to identify a mutation, our method facilitates the simultaneous analysis of multiple mutants (a slide can be shared by 16 to 20 mutants under the current conditions) by using multiplexed barcoded samples in a single sequencing run. Thus, we can handle multiple mutants in a single run, and/or multiple laboratories can share slides of barcoded DNA. This reduces the cost per sample.
We applied our NGS-based mapping technique to mutants other than b26–6/ctr1–16, i.e., #6–10, #7–3 and #9–2, to identify the causative mutations, and succeeded in detecting a small number of testable candidate mutations. In A. thaliana, the CLAVATA3 (CLV3) peptide negatively regulates the size of the shoot meristem via the CLV1, CLV2, CRN and RPK2 receptors.26-28 Moreover, overexpression of CLV3 or some other CLV3/ESR-like (CLE) genes, and CLE peptide treatments, reduce the size of the root meristem and the length of the roots, suggesting that the CLE signaling pathway also acts in the root.29 To identify new factors that regulate root meristem development, we screened for mutants using the CLE peptide-resistance phenotype of root formation in A. thaliana as an index, and isolated the #6–10, #7–3 and #9–2 mutants. In these three cases, we detected 9, 6 and 13 SNPs in 3.1-, 2.7- and 5.8-Mb regions, using mapping populations of 17, 15 and 19 F2 plants, respectively (Yamada M, Tabata R and Sawa S, unpublished data; Table 4).
We also used this method to identify the causal mutations of mutants derived from non-Col-0 backgrounds. In each case, a small number of candidate mutations could be detected. Therefore, our method appears to be suitable for detecting mutations in a non-reference A. thaliana accession background (data not shown).
Moreover, we have already used our method to identify strong mutations in mutagenized rice plants (data not shown). Rice has a complex genome that is three times larger than that of A. thaliana,30 indicating that our method may readily be applied to various other model plants that can be crossed to a mapping line. We will soon provide a platform to identify the mutations responsible for phenotypes of interest in rice and other plants, as we have already done for those in A. thaliana.
Examining a large number of F2 plants for traditional map-based cloning and bulked segregant analysis is generally time-consuming and labor-intensive. The NGS-based mapping methods using bulked segregant analysis proposed by Uchida, et al.12 and Austin, et al.10 recommend the use of 80 and at least 50 mutant F2 plants, respectively. Because these NGS-based mapping methods are excellent for identifying causal mutations, they are useful in cases in which a sufficient number of mutant F2 plants can readily be collected. However, in this study, we have shown in a number of examples that only 10 to 20 mutant F2 plants are needed to identify causal mutations using our NGS-based mapping method. Our method is particularly effective in situations in which it is difficult to propagate or score mutant F2 plants.
The causal mutation of the b26–6/ctr1–16 mutant is located in a low-recombination region near the telomere of chromosome 5. Thus, we have identified a causal mutation even in a region of the genome that is difficult to sequence, i.e., near the telomere. Moreover, the mutations underlying #6–10, #7–3 and #9–2 were located on various chromosomes and regions, demonstrating that the NGS-based mapping method presented here has utility across the A. thaliana genome; however, it remains unclear whether this method is applicable to centromeric regions.
The short-root phenotype of b26–6/ctr1–16 mutant plants was rescued by supplementation with high boron (Fig. 1A). Considering that CTR1 is a suppressor of ethylene signaling,23 loss-of-function mutants of which show a constitutively active ethylene response, there are two possible explanations for this recovery. One is that high boron inhibits constitutive ethylene signaling in the b26–6/ctr1–16 mutant. It has been reported that low boron induces the expression of an ethylene-responsive gene in tobacco BY-2 cells and of an ethylene-responsive reporter gene in A. thaliana roots.31,32 In other words, boron inhibits the induction of ethylene-responsive genes. Although there is no direct evidence that high boron inhibits ethylene signaling, it is possible that ethylene signaling downstream of CTR1 is suppressed by high boron, leading to root elongation. The other possibility is that high boron rescues a defect in the cross-linking of RG-II with borate, which is the only reported direct function of boron, in the mutant.33,34 In the ctl1/arm mutant, which produces an increased amount of ethylene,35 the cell wall composition of the root is changed, with increases in arabinose and galactose and decreases in xyloglucan (XyG), mannose (Man) and the methylesterification of homogalacturonans. These changes in cell wall components may affect the structure of RG-II. For example, the Man-containing nucleotide sugar GDP-D-Man is required for the synthesis of GDP-L-fucose, which is a component of RG-II. The phenotype of the mur1 mutant, which has a defect in the gene required for the synthesis of GDP-L-fucose from GDP-D-Man, was rescued by high boron, like that of b26–6/ctr1–16.36-38 Furthermore, the phenotypes of mur3 and hsr8/mur4, which have defects in RG-II composition, were also rescued by high boron.38 These results suggest that the short-root phenotype of b26–6/ctr1–16 is caused by a defect in boron cross-linking of RG-II.
In this study, we developed a versatile NGS-based mapping method that combines low- to medium-coverage sequencing by SOLiD with classical genetic rough mapping and successfully applied this method to identify CTR1, which is involved in boron-mediated root development. We demonstrated that this is a cost-effective and versatile method for identifying EMS-induced causal mutations. We hope that this approach will be a useful tool for many Arabidopsis researchers and will be applied to studies on other model plants.
EMS-mutagenized Columbia (Col-0 gl1–1) seeds were purchased from Lehle Seeds. ctr1–1 (CS8057) and sis1/ctr1–12 (CS3874) were obtained from the Arabidopsis Biological Resource Center. The seeds were surface-sterilized and sown on MGRL growth medium39 solidified with 1.2% gellan gum (Wako) and supplemented with 1% sucrose.
Approximately 20,000 seeds were sown onto normal MGRL medium (30 µM boron). After incubation for 2 d at 4°C, the plates were placed vertically and the plants were grown at 22°C under a 16 h light/8 h dark photoperiod. After 7 d, the plants with short roots were transferred to medium containing 1 mM boron. After a further 7 d, the plants with partially recovered roots were selected and the phenotype was confirmed in the M3 generation.
For mapping, the b26–6/ctr1–16 mutant was crossed with wild-type Landsberg erecta (Ler) and F2 seeds were obtained. Genomic DNA was isolated from F2 plants that exhibited the mutant phenotype and the gene was mapped using simple sequence length polymorphism (SSLP) markers and the following primer pairs: F15A17-F (5′-ACCCAAACTTGGCTCACAAC-3′), F15A17-R (5′-CAAAATCATCTCCCCTTGGA-3′) and T32M21-F (5′-CAAACGTAAAACATAAAAGAGAACCA-3′), T32M21-R (5′-TCCGTTGCTTAGAACATTTGC-3′).
Genomic DNA of the M4 generation of the b26–6/ctr1–16 homozygous mutant was isolated using the DNeasy Plant Mini Kit (QIAGEN). The genomic DNA was sheared into 100- to 150-bp fragments using the Covaris S2 system (Covaris, Inc.) in a 120-μL reaction containing 10 mM TE buffer in a Covaris microTube, using the following program: 20% duty cycle, intensity 5 and 200 cycles per burst for 60 sec at 5°C. The fragments were assembled into a SOLiD barcoded fragment library using the SOLiD Fragment Library Construction Kit (Life Technologies).
The fragment libraries were amplified by emulsion PCR (ePCR) at a library concentration of 0.5 pM. After ePCR, the beads were modified at the 3′ terminal and deposited onto a SOLiD sequencing slide, according to the manufacturer’s instructions. The libraries were sequenced to 50 base pairs using an Applied Biosystems SOLiD 3 Plus System.
Color space reads were mapped to the Arabidopsis thaliana genome reference TAIR9 (TAIR_chr_all.fas) using BioScope 1.3 software (Life Technologies) with default parameters. Single nucleotide polymorphisms (SNPs) were called using the diBayes SNP caller, a component of BioScope 1.3, which implements a Bayesian algorithm that includes color space error detection. diBayes was executed with the parameter setting defined as “low call stringency” using the default parameters (major parameters: dibayes.reads.min.mapping.qv = 8; dibayes.reads.only.uniquely.mapped.allow = no; dibayes.snp.both.strands = no; dibayes.snps.min.base.qv = 26; dibayes.hom.min.nonref.start.pos = 1). The Integrative Genomics Viewer (IGV; http://www.broadinstitute.org/igv/)40 was used to visualize the mapped reads and called SNPs along with gene models. To predict the functional impact of the SNPs, we categorized them as coding (synonymous or non-synonymous), intronic, intergenic or splice site using custom scripts. Gene models and annotations were based on TAIR9.
We thank Yuko Kawara and Yayoi Inui-Tsujimoto for technical assistance and Dr. N. Uchida for his comments on the manuscript.
This study was performed under the NIBB cooperative research program (11–103). This work was supported by: a Grant-in-Aid for Creative Scientific Research; a Grant-in-Aid for Young Scientists S to S.S. (19677001) from the Japan Society for the Promotion of Science; a Grant-in-Aid for Scientific Research for Priority Areas from the Ministry of Education, Culture, Sports, Science and Technology, Japan to S.S. (221S0002, 23119517, 23012034, 24114001 and 24114009); and a Grant-in-Aid for Scientific Research S (to T.F.) and a Grant-in-Aid for Scientific Research on Innovative Areas (to T.F.) from the Ministry of Education, Culture, Sports, Science and Technology.
Conceived and designed the experiments: R.T., T.K., T.F. and S.S. Performed the experiments: R.T., T.K., K.Y. and M.Y. Analyzed the data: S.S. and K.Y. Contributed reagents/materials/analysis tools: M.H., T.F. and S.S. Wrote the paper: R.T., T.K. and S.S.
No potential conflicts of interest were disclosed.
Previously published online: www.landesbioscience.com/journals/psb/article/22534