We have described a new panel of genetically diverse and highly recombinant inbred lines of A. thaliana. Like other recombinant inbred lines, they do not require repeated genotyping, and since unlimited replicates of each line can be grown, data for many traits can be accumulated, facilitating the study of trait correlations, genotype-by-environment interactions, and the genetic basis of phenotypic plasticity. They represent a significant improvement over standard RILs descended from just two founders in that they capture more of the genetic and phenotypic variation present. Furthermore, they have a higher density of recombinants, which improves mapping resolution. We have shown how to take account of the increased genetic complexity in the analysis, and our results show that mapping accuracy and QTL detection are much improved in the MLs when compared to traditional two-parent F2 and RIL mapping populations. Consequently, the MLs are an important new tool for the study of the genetic basis of plant growth and yield under multiple environments. Improved understanding of the genetic basis of such quantitative traits is important for the improvement of crop varieties, and for our basic knowledge of plant form, growth and development.
These lines are the first completed population of RILs descended from a large number of founders. Other populations, descended from eight founders, are in production in A. thaliana [50] and Mus musculus (the Collaborative Cross [51], [52]). There are also ongoing efforts to produce similar populations in a number of crops, including wheat, rice and sorghum, with financial support from the Generation Challenge Programme (http://www.generationcp.org, and Ian Mckay (NIAB), personal communication). The analysis of all these populations presents similar challenges, so lessons learnt with our lines should be valuable to the others.
Current strategies for QTL mapping in Arabidopsis range in complexity from F2 crosses, through panels of recombinant inbred lines and advanced intercross lines derived from two accessions [48], through combining multiple panels of RILs [27], [49], the MAGIC lines described here, and finally association mapping using a large collection of natural accessions. The MAGIC lines represent a compromise between the extreme simplicity of a diallelic system found in a RIL panel descended from just two progenitors with no population structure other than that due to segregation distortion [48], and the much greater complexity encountered in the natural accessions [13].
The power to detect a QTL in any mapping population depends on the phenotypic variance it explains, which ultimately depends on the minor allele frequency at the QTL. The QTL minor allele frequency is 0.5 in diallelic populations, at least 1/19 (0.053) in MAGIC (with a mean value of 0.22, if the genotyped SNPs are representative), and potentially lower in natural accessions (where many variants are unique to one accession [22]). Thus, to fine map QTL of small effect, a larger number of plants and genotypes is likely to be needed in a study using MAGIC lines or natural accessions than in one using diallelic populations. Increasing replication within lines reduces non-genetic variance and improves power. However, even an infinite degree of replication cannot increase the fraction of variance explained by a single QTL to more than the fraction of total genetic variance it explains. Hence mapping QTL of very small effect and low minor allele frequency is likely to remain a challenge.
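The dependence on minor allele frequency and replication can be illustrated with a small calculation. The sketch below is not part of the analysis reported here; it assumes a single biallelic QTL with an additive effect in fully homozygous lines, and the function names, effect size and variance values are hypothetical choices made only for illustration.

```python
def qtl_variance(maf, effect):
    """Variance contributed by a biallelic QTL among fully inbred lines.

    Assumption: every line is homozygous, so genotypic values are
    +effect or -effect with frequencies maf and 1 - maf, which gives
    a variance of 4 * maf * (1 - maf) * effect**2.
    """
    return 4.0 * maf * (1.0 - maf) * effect ** 2


def fraction_explained(v_qtl, v_background, v_env, replicates):
    """Fraction of the variance of line means explained by the QTL.

    Averaging over replicates divides the non-genetic variance by the
    number of replicates; the genetic terms are unaffected.
    """
    return v_qtl / (v_qtl + v_background + v_env / replicates)


def replication_ceiling(v_qtl, v_background):
    """Limit of fraction_explained as replication grows without bound."""
    return v_qtl / (v_qtl + v_background)


if __name__ == "__main__":
    effect = 0.5        # hypothetical additive effect
    v_background = 1.0  # hypothetical genetic variance from other loci
    v_env = 4.0         # hypothetical non-genetic variance per plant
    for label, maf in [("diallelic", 0.5), ("MAGIC minimum", 1.0 / 19.0)]:
        v_qtl = qtl_variance(maf, effect)
        for r in (1, 4, 16):
            frac = fraction_explained(v_qtl, v_background, v_env, r)
            print(f"{label}: maf={maf:.3f} replicates={r} fraction={frac:.4f}")
        print(f"{label}: ceiling={replication_ceiling(v_qtl, v_background):.4f}")
```

Under these assumptions, a QTL carried by a single founder (frequency 1/19) contributes roughly one fifth of the variance it would contribute at frequency 0.5, and replication raises the explained fraction only towards the ceiling set by the genetic variance.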
The genetic architecture of the traits we have mapped in this study ranges from simple (one QTL of large effect) to complex, with many QTL of smaller effect, some of which are physically linked. As expected, it is straightforward to map unlinked QTL, and power and mapping resolution improve as the fraction of variance explained by the QTL increases. The dissection of multiple linked QTL is harder, and the methodologies we have presented here could be improved. Nonetheless, it is reassuring that the three methods we used (resample-based, hierarchical Bayesian and empirical Bayesian) all produce concordant QTL predictions. This suggests that the population structure of the MLs is not an impediment.
While previous RIL QTL studies have produced confidence intervals in the range of 2–20 Mb [53], the MAGIC lines generally produce much better resolution. The 90% confidence intervals were always smaller than 6 Mb, with some of the confidence intervals under 1 Mb; simulations indicate that for QTL with 10% effect size, the mean distance between the true QTL location and the midpoint of the marker interval containing the QTL peak is about 300 kb. Our results were in agreement with this expectation. For known QTL of large effect, as in the case of ERECTA, GLABROUS and FRI, the distance from the observed peak to the probable candidate genes was less than 300 kb. Certainly, in cases where these lines will be used for gene discovery, the size of the confidence intervals will still be an issue. However, we show that reasonable candidate genes are also found in close proximity to QTL even when a priori candidate genes were not known (e.g. in the case of EIN 2, 5 and PHYE).
We have shown that accuracy of about 300 kb is achievable in the MLs using the statistical methodology described here. However, in association mapping the resolution is much greater (measured in the low tens of kb, or close to a single gene) thanks to the very rapid decay in linkage disequilibrium with distance among wild accessions. Improvements in the power and mapping resolution of MLs are likely to come from using additional lines (currently in production) containing independent recombination events, with which mapping resolution of under 200 kb should be achievable. We also expect to improve resolution by incorporating information about sequence differences between the founder strains (resequencing of the 19 founders of the MAGIC lines is now being conducted using sequencing by synthesis [54]). We plan to use merge analysis [55] to determine whether the allelic distribution of a variant across the 19 founders is consistent with the inferred phenotypic pattern of action, in order to test whether the variant could be causal for the QTL.
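The idea behind such a test can be sketched in a few lines. The example below is a simplified illustration and not the procedure of [55]: it assumes hard founder assignments at the locus (rather than probabilistic haplotype reconstruction), compares a model with one mean per founder against the nested model in which founders sharing an allele at the candidate variant are merged, and summarises the comparison with a plain F ratio. All names and data are hypothetical.

```python
def group_rss(phenotypes, labels):
    """Residual sum of squares after fitting one mean per label,
    together with the number of distinct labels."""
    groups = {}
    for y, g in zip(phenotypes, labels):
        groups.setdefault(g, []).append(y)
    rss = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        rss += sum((v - mean) ** 2 for v in values)
    return rss, len(groups)


def merge_f_ratio(phenotypes, founders, variant_allele):
    """F ratio of the founder model against the merged (allele) model.

    founders       : founder label at the locus for each line (hard calls,
                     an assumption of this sketch)
    variant_allele : dict mapping each founder to its allele at the variant
    A small ratio means merging founders by allele loses little fit, so the
    variant's allelic distribution is consistent with the QTL pattern.
    Assumes the founder model leaves some residual variation and has more
    groups than the merged model.
    """
    merged = [variant_allele[f] for f in founders]
    rss_full, k_full = group_rss(phenotypes, founders)
    rss_merged, k_merged = group_rss(phenotypes, merged)
    df_num = k_full - k_merged
    df_den = len(phenotypes) - k_full
    return ((rss_merged - rss_full) / df_num) / (rss_full / df_den)


if __name__ == "__main__":
    # Hypothetical toy data: four founders, two lines each.
    founders = ["A", "A", "B", "B", "C", "C", "D", "D"]
    phenotypes = [1.0, 1.2, 0.9, 1.1, 2.0, 2.2, 1.9, 2.1]
    consistent = {"A": 0, "B": 0, "C": 1, "D": 1}
    inconsistent = {"A": 0, "B": 1, "C": 0, "D": 1}
    print("consistent variant:", merge_f_ratio(phenotypes, founders, consistent))
    print("inconsistent variant:", merge_f_ratio(phenotypes, founders, inconsistent))
```

In this toy example the variant whose alleles separate the high-phenotype founders from the low-phenotype founders yields a much smaller ratio than the variant whose alleles cut across them, which is the kind of contrast used to prioritise candidate causal variants.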
Finally, the combination of MAGIC and association mapping may prove fruitful. While association mapping may be able to identify QTL with better accuracy, the population structure observed among natural accessions requires much care to distinguish between true QTL and false positives [39]. In comparison, the structure of the MLs is relatively simple. If there are common variants in MLs and natural accessions, the MLs may provide ideal material to verify QTL identified with association mapping.