1.  Epigenomic diversity in a global collection of Arabidopsis thaliana accessions 
Cell  2016;166(2):492-505.
The epigenome orchestrates genome accessibility, functionality and three-dimensional structure. Because epigenetic variation can impact transcription and thus phenotypes, it may contribute to adaptation. Here we report 1,107 high-quality single-base resolution methylomes and 1,203 transcriptomes from the 1001 Genomes collection of Arabidopsis thaliana. Although the genetic basis of methylation variation is highly complex, geographic origin is a major predictor of genome-wide DNA methylation levels and of altered gene expression caused by epialleles. Comparison to cistrome and epicistrome datasets identifies associations between transcription factor binding sites, methylation, nucleotide variation and co-expression modules. Physical maps for nine of the most diverse genomes reveal how transposons and other structural variants shape the epigenome, with dramatic effects on immunity genes. The 1001 Epigenomes Project provides a comprehensive resource for understanding how variation in DNA methylation contributes to molecular and non-molecular phenotypes in natural populations of the most studied model plant.
PMCID: PMC5172462  PMID: 27419873
2.  Multiple alleles at a single locus control seed dormancy in Swedish Arabidopsis 
eLife  2016;5:e22502.
Seed dormancy is a complex life history trait that determines the timing of germination and is crucial for local adaptation. Genetic studies of dormancy are challenging, because the trait is highly plastic and strongly influenced by the maternal environment. Using a combination of statistical and experimental approaches, we show that multiple alleles at the previously identified dormancy locus DELAY OF GERMINATION1 jointly explain as much as 57% of the variation observed in Swedish Arabidopsis thaliana, but give rise to spurious associations that seriously mislead genome-wide association studies unless modeled correctly. Field experiments confirm that the major alleles affect germination as well as survival under natural conditions, and demonstrate that locally adaptive traits can sometimes be dissected genetically.
PMCID: PMC5226650  PMID: 27966430
local adaptation; seed dormancy; life history; genetic architecture; Arabidopsis; germination; DOG1; A. thaliana
3.  Molecular, genetic and evolutionary analysis of a paracentric inversion in Arabidopsis thaliana  
The Plant Journal  2016;88(2):159-178.
Chromosomal inversions can provide windows onto the cytogenetic, molecular, evolutionary and demographic histories of a species. Here we investigate a paracentric 1.17‐Mb inversion on chromosome 4 of Arabidopsis thaliana with nucleotide precision of its borders. The inversion is created by Vandal transposon activity, splitting an F‐box and relocating a pericentric heterochromatin segment in juxtaposition with euchromatin without affecting the epigenetic landscape. Examination of the RegMap panel and the 1001 Arabidopsis genomes revealed more than 170 accessions carrying the inversion in Europe and North America. The SNP patterns revealed historical recombinations from which we infer diverse haplotype patterns, ancient introgression events and phylogenetic relationships. We find a robust association between the inversion and fecundity under drought. We also find linkage disequilibrium between the inverted region and the early flowering Col‐FRIGIDA allele. Finally, SNP analysis traces the origin of the inversion to South‐Eastern Europe approximately 5000 years ago and that of the FRI‐Col allele to North‐West Europe, and reveals the spread of a single haplotype to North America during the 17th to 19th centuries. The ‘American haplotype’ was identified from several European localities, potentially due to return migration.
Significance Statement
Structural rearrangements redefine chromosomes, shape genome diversity and can have profound effects on selection, adaptation and spread. Here we elucidate the history of a paracentric inversion in Arabidopsis thaliana, including its origin a few thousand years ago, its maintenance under certain environmental conditions and its migration patterns, from Europe to North America and back.
PMCID: PMC5113708  PMID: 27436134
chromosome rearrangement; Arabidopsis thaliana; transposon; phylogenetic relationship; introgression; haplotype pattern
4.  AraPheno: a public database for Arabidopsis thaliana phenotypes 
Nucleic Acids Research  2016;45(Database issue):D1054-D1059.
Natural genetic variation makes it possible to discover evolutionary changes that have been maintained in a population because they are advantageous. To understand genotype–phenotype relationships and to investigate trait architecture, the existence of both high-resolution genotypic and phenotypic data is necessary. Arabidopsis thaliana is a prime model for these purposes. This herb naturally occurs across much of the Eurasian continent and North America. Thus, it is exposed to a wide range of environmental factors and has been subject to natural selection under distinct conditions. Full genome sequencing data for more than 1000 different natural inbred lines are available, and this has encouraged the distributed generation of many types of phenotypic data. To leverage these data for meta-analyses, AraPheno provides a central repository of population-scale phenotypes for A. thaliana inbred lines. AraPheno includes various features to easily access, download and visualize the phenotypic data. This will facilitate a comparative analysis of the many different types of phenotypic data, which is the basis for further enhancing our understanding of the genotype–phenotype map.
PMCID: PMC5210660  PMID: 27924043
5.  Limited Contribution of DNA Methylation Variation to Expression Regulation in Arabidopsis thaliana 
PLoS Genetics  2016;12(7):e1006141.
The extent to which epigenetic variation affects complex traits in natural populations is not known. We addressed this question using transcriptome and DNA methylation data from a sample of 135 sequenced A. thaliana accessions. Across individuals, expression was significantly associated with cis-methylation for hundreds of genes, and many of these associations remained significant after taking SNP effects into account. The pattern of correlations differed markedly between gene body methylation and transposable element methylation. The former was usually positively correlated with expression, and the latter usually negatively correlated, although exceptions were found in both cases. Finally, we developed graphical models of causality that adapt to a sample with heavy population structure, and used them to show that while methylation appears to affect gene expression more often than expression affects methylation, there is also strong support for both being independently controlled. In conclusion, although we find clear evidence for epigenetic regulation, both the number of loci affected and the magnitude of the effects appear to be small compared to the effect of SNPs.
Author Summary
It has been demonstrated experimentally that epigenetic variation, in particular DNA methylation, can transmit information across generations. However, it is difficult to evaluate the importance of such effects in natural populations due to complex genetic background effects, which make the experimental separation of genetic and epigenetic effects challenging. Here we use quantitative genetic models to test whether epigenetic variation plays a significant role in gene expression variation once genetic variation has been taken into account. In addition, we devise and apply methods that go beyond a simple association framework in order to infer causal relationships. Our results suggest a significant but small epigenetic contribution to expression regulation.
PMCID: PMC4939946  PMID: 27398721
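The idea of a methylation–expression association that "remained significant after taking SNP effects into account" can be illustrated with a partial correlation: regress both expression and methylation on a cis-SNP genotype, then correlate the residuals. This is a minimal sketch under simplifying assumptions (a single SNP, linear effects, no correction for population structure); it is not the model used in the study, and the function names and toy data are hypothetical.

```python
from statistics import mean


def residuals(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5


def partial_corr(expr, meth, snp):
    """Correlation of expression and methylation after removing a SNP's linear effect."""
    return pearson(residuals(expr, snp), residuals(meth, snp))


# Hypothetical toy data: the SNP shifts both traits, but within each genotype
# class expression and methylation vary independently of each other.
snp = [0, 0, 0, 0, 1, 1, 1, 1]
expr = [1, -1, 1, -1, 3, 1, 3, 1]
meth = [1, 1, -1, -1, 4, 4, 2, 2]
print(round(pearson(expr, meth), 2))             # raw correlation: 0.59
print(round(partial_corr(expr, meth, snp), 2))   # after removing the SNP effect: 0.0
```

In the toy data the raw correlation is produced entirely by the shared SNP, so it disappears once the genotype is regressed out; an association that survives this step is the kind of signal the abstract describes.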
6.  "Missing" G x E Variation Controls Flowering Time in Arabidopsis thaliana  
PLoS Genetics  2015;11(10):e1005597.
Understanding how genetic variation interacts with the environment is essential for understanding adaptation. In particular, the life cycle of plants is tightly coordinated with local environmental signals through complex interactions with the genetic variation (G x E). The mechanistic basis for G x E is almost completely unknown. We collected flowering time data for 173 natural inbred lines of Arabidopsis thaliana from Sweden under two growth temperatures (10°C and 16°C), and observed massive G x E variation. To identify the genetic polymorphisms underlying this variation, we conducted genome-wide scans using both SNPs and local variance components. The SNP-based scan identified several variants that had common effects in both environments, but found no trace of G x E effects, whereas the scan using local variance components found both. Furthermore, the G x E effects appear to be concentrated in a small fraction of the genome (0.5%). Our conclusion is that G x E effects in this study are mostly due to large numbers of alleles or haplotypes at a small number of loci, many of which correspond to previously identified flowering time genes.
Author Summary
Many traits are influenced by genetic variation in interaction with the environment, so-called G x E variation. In agriculture, for example, different varieties are optimal in different environments. In evolution, G x E is also crucial for local adaptation. Identifying the genes underlying G x E has proven extremely challenging, however. Using a collection of inbred lines of the model plant Arabidopsis thaliana, we measured flowering time under two temperature regimes, and scanned the genome for polymorphisms responsible for variation in this trait. Although most of the variation is due to G x E, genome-wide scans using SNPs only revealed direct genetic effects (G), and failed to reveal any significant G x E associations. In contrast, scanning the genome using local windows of polymorphism suggested that almost all the observed variation can be explained by 2% of the genome. Previously identified flowering time genes are strongly overrepresented in these regions, and our results are compatible with a model under which G x E is mainly due to many alleles at a relatively small number of loci.
PMCID: PMC4608753  PMID: 26473359
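What "G x E variation" means can be made concrete with the classical two-way ANOVA decomposition for a balanced design (lines × environments × replicates). This is a textbook sketch, not the local variance-component scan used in the study; the function name and toy data are hypothetical, and the design is assumed to be balanced with no missing values.

```python
from statistics import mean


def gxe_anova(data):
    """Sums of squares for a balanced two-way design.

    data[g][e] is the list of replicate measurements (e.g. days to flowering)
    for line g in environment e; every cell must have the same number of replicates.
    """
    n_g, n_e = len(data), len(data[0])
    r = len(data[0][0])
    cell = [[mean(data[g][e]) for e in range(n_e)] for g in range(n_g)]
    grand = mean(v for row in cell for v in row)  # equals the overall mean when balanced
    line_mean = [mean(row) for row in cell]
    env_mean = [mean(cell[g][e] for g in range(n_g)) for e in range(n_e)]

    ss_g = n_e * r * sum((m - grand) ** 2 for m in line_mean)   # genotype (G)
    ss_e = n_g * r * sum((m - grand) ** 2 for m in env_mean)    # environment (E)
    ss_gxe = r * sum(                                           # interaction (G x E)
        (cell[g][e] - line_mean[g] - env_mean[e] + grand) ** 2
        for g in range(n_g) for e in range(n_e)
    )
    ss_err = sum(                                               # within-cell residual
        (y - cell[g][e]) ** 2
        for g in range(n_g) for e in range(n_e) for y in data[g][e]
    )
    return {"G": ss_g, "E": ss_e, "GxE": ss_gxe, "error": ss_err}


# Hypothetical toy data: two lines whose flowering-time ranks swap between 10°C and 16°C.
toy = [[[9, 11], [19, 21]],   # line A: early in one environment, late in the other
       [[19, 21], [9, 11]]]   # line B: the reverse
print(gxe_anova(toy))  # G and E are 0, GxE is 200, error is 8
```

Because the two reaction norms cross, the line means and environment means are identical and all between-cell variation lands in the interaction term, even though each line responds strongly to temperature.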
7.  Genome-wide association study of Arabidopsis thaliana's leaf microbial community 
Nature Communications  2014;5:5320.
Identifying the factors that influence the outcome of host-microbial interactions is critical to protecting biodiversity, minimizing agricultural losses, and improving human health. A few genes that determine symbiosis or resistance to infectious disease have been identified in model species, but a comprehensive examination of how a host's genotype influences the structure of its microbial community is lacking. Here we report the results of a field experiment with the model plant Arabidopsis thaliana to identify the fungi and bacteria that colonize its leaves and the host loci that influence the microbes’ numbers. The composition of this community differs among accessions of A. thaliana. Genome-wide association studies (GWAS) suggest that plant loci responsible for defense and cell wall integrity affect variation in this community. Furthermore, species richness in the bacterial community is shaped by host genetic variation, notably at loci that also influence the reproduction of viruses, trichome branching and morphogenesis.
PMCID: PMC4232226  PMID: 25382143
8.  DNA methylation in Arabidopsis has a genetic basis and shows evidence of local adaptation 
eLife  2015;4:e05255.
Epigenome modulation potentially provides a mechanism for organisms to adapt, within and between generations. However, neither the extent to which this occurs, nor the mechanisms involved are known. Here we investigate DNA methylation variation in Swedish Arabidopsis thaliana accessions grown at two different temperatures. Environmental effects were limited to transposons, where CHH methylation was found to increase with temperature. Genome-wide association studies (GWAS) revealed that the extensive CHH methylation variation was strongly associated with genetic variants in both cis and trans, including a major trans-association close to the DNA methyltransferase CMT2. Unlike CHH methylation, CpG gene body methylation (GBM) was not affected by growth temperature, but was instead correlated with the latitude of origin. Accessions from colder regions had higher levels of GBM for a significant fraction of the genome, and this was associated with increased transcription for the genes affected. GWAS revealed that this effect was largely due to trans-acting loci, many of which showed evidence of local adaptation.
eLife digest
Organisms need to adapt quickly to changes in their environment. Mutations in the DNA sequence of genes can lead to new adaptations, but this can take many generations. Instead, altering how genes are switched on by changing how the DNA is packaged in cells can allow organisms to adapt within and between generations. One way that genes are controlled in organisms is by a process known as DNA methylation, where ‘methyl’ tags are added to DNA and act as markers for other proteins involved in activating genes.
DNA is made of four different molecules called ‘nucleotides’ that are arranged in different orders to produce a vast variety of DNA sequences. One type of DNA methylation can happen at sites where a nucleotide called cytosine is followed by two other non-cytosine nucleotides. Another type of methylation can take place at sites where a cytosine is followed by a guanine nucleotide. However, it is not clear how big a role DNA methylation plays in allowing organisms to adapt to their changing environment.
Here, Dubin, Zhang, Meng, Remigereau et al. studied DNA methylation in a plant called Arabidopsis thaliana. Several different varieties of A. thaliana plants from Sweden were grown at two different temperatures. The experiments showed that the A. thaliana plants grown at higher temperatures were more likely to have methyl tags attached to sections of DNA called transposons, which are able to move around the genome. There was a lot of variety in the levels of this DNA methylation in the different plants, and some of it was shown to be associated with variation in a gene that is involved in DNA methylation.
However, not all of the DNA methylation in these plants was sensitive to the temperature the plants were grown in. Dubin, Zhang, Meng, Remigereau et al. show that the pattern of a type of DNA methylation that is found within genes depends on how far north in Sweden the plants' ancestors came from rather than the temperature the plants were grown in. Plants that originated from colder regions, farther north, had more DNA methylation within many genes and these genes were more active.
These findings suggest that genetic differences in these plants strongly influence the levels of DNA methylation, and they provide the first direct link between DNA methylation and adaptation to the environment. Future studies should reveal how DNA methylation is regulated in these plants, and whether it plays a key role in adaptation, or merely reflects other changes in the genome.
PMCID: PMC4413256  PMID: 25939354
epigenetics; population genetics; local adaptation; DNA methylation; Arabidopsis
9.  Keeping It Local: Evidence for Positive Selection in Swedish Arabidopsis thaliana 
Molecular Biology and Evolution  2014;31(11):3026-3039.
Detecting positive selection in species with heterogeneous habitats and complex demography is notoriously difficult and prone to statistical biases. The model plant Arabidopsis thaliana exemplifies this problem: In spite of the large amounts of data, little evidence for classic selective sweeps has been found. Moreover, many aspects of the demography are unclear, which makes it hard to judge whether the few signals are indeed signs of selection, or false positives caused by demographic events. Here, we focus on Swedish A. thaliana and we find that the demography can be approximated as a two-population model. Careful analysis of the data shows that such a two-island model is characterized by a very old split time that significantly predates the last glacial maximum followed by secondary contact with strong migration. We evaluate selection based on this demography and find that this secondary contact model strongly affects the power to detect sweeps. Moreover, it affects the power differently for northern Sweden (more false positives) as compared with southern Sweden (more false negatives). However, even when the demographic history is accounted for, sweep signals in northern Sweden are stronger than in southern Sweden, with little or no positional overlap. Further simulations including the complex demography and selection confirm that this is not compatible with global selection acting on both populations, and thus can be taken as evidence for local selection within subpopulations of Swedish A. thaliana. This study demonstrates the necessity of combining demographic analyses and sweep scans for the detection of selection, particularly when selection acts predominantly locally.
PMCID: PMC4209139  PMID: 25158800
local adaptation; selective sweeps; demography; Arabidopsis thaliana
10.  Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden 
Nature Genetics  2013;45(8):884-890.
Despite advances in sequencing, the goal of obtaining a comprehensive view of genetic variation in populations is still far from reached. We sequenced 180 lines of A. thaliana from Sweden to obtain as complete a picture as possible of variation in a single region. Whereas simple polymorphisms in the unique portion of the genome are readily identified, other polymorphisms are not. The massive variation in genome size identified by flow cytometry seems largely to be due to 45S rDNA copy number variation, with lines from northern Sweden having particularly large numbers of copies. Strong selection is evident in the form of long-range linkage disequilibrium (LD), as well as in LD between nearby compensatory mutations. Many footprints of selective sweeps were found in lines from northern Sweden, and a massive global sweep was shown to have involved a 700-kb transposition.
PMCID: PMC3755268  PMID: 23793030
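Linkage disequilibrium between two sites is commonly summarized by r². Because the Swedish lines are inbred, each line can be treated as a single haplotype, which makes the calculation direct. A minimal sketch, assuming biallelic SNPs coded 0/1 and no missing data; the function name is hypothetical, and the same statistic applies whether the two sites are adjacent or far apart.

```python
def ld_r2(locus1, locus2):
    """r^2 between two biallelic loci coded 0/1 across inbred lines (one haplotype per line)."""
    if len(locus1) != len(locus2):
        raise ValueError("loci must be scored on the same lines")
    n = len(locus1)
    p_a = sum(locus1) / n                      # frequency of allele 1 at locus 1
    p_b = sum(locus2) / n                      # frequency of allele 1 at locus 2
    p_ab = sum(1 for a, b in zip(locus1, locus2) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # complete LD: 1.0
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # linkage equilibrium: 0.0
```

Because r² squares D, complete association in repulsion (allele 1 at one site always paired with allele 0 at the other) also gives r² = 1.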
11.  Co-Variation between Seed Dormancy, Growth Rate and Flowering Time Changes with Latitude in Arabidopsis thaliana 
PLoS ONE  2013;8(5):e61075.
Life-history traits controlling the duration and timing of developmental phases in the life cycle jointly determine fitness. Therefore, life-history traits studied in isolation provide an incomplete view of the relevance of life-cycle variation for adaptation. In this study, we examine genetic variation in traits covering the major life history events of the annual species Arabidopsis thaliana: seed dormancy, vegetative growth rate and flowering time. In a sample of 112 genotypes collected throughout the European range of the species, both seed dormancy and flowering time follow a latitudinal gradient independent of the major population structure gradient. This finding confirms previous studies reporting the adaptive evolution of these two traits. Here, however, we further analyze patterns of co-variation among traits. We observe that co-variation between primary dormancy, vegetative growth rate and flowering time also follows a latitudinal cline. At higher latitudes, vegetative growth rate is positively correlated with primary dormancy and negatively with flowering time. In the South, this trend disappears. Patterns of trait co-variation change, presumably because major environmental gradients shift with latitude. This pattern appears unrelated to population structure, suggesting that changes in the coordinated evolution of major life history traits are adaptive. Our data suggest that A. thaliana provides a good model for the evolution of trade-offs and their genetic basis.
PMCID: PMC3662791  PMID: 23717385
12.  Genetic Architecture of Skin and Eye Color in an African-European Admixed Population 
PLoS Genetics  2013;9(3):e1003372.
Variation in human skin and eye color is substantial and especially apparent in admixed populations, yet the underlying genetic architecture is poorly understood because most genome-wide studies are based on individuals of European ancestry. We study pigmentary variation in 699 individuals from Cape Verde, where extensive West African/European admixture has given rise to a broad range in trait values and genomic ancestry proportions. We develop and apply a new approach for measuring eye color, and identify two major loci (HERC2[OCA2] P = 2.3×10⁻⁶², SLC24A5 P = 9.6×10⁻⁹) that account for both blue versus brown eye color and varying intensities of brown eye color. We identify four major loci (SLC24A5 P = 5.4×10⁻²⁷, TYR P = 1.1×10⁻⁹, APBA2[OCA2] P = 1.5×10⁻⁸, SLC45A2 P = 6×10⁻⁹) for skin color that together account for 35% of the total variance, but the genetic component with the largest effect (∼44%) is average genomic ancestry. Our results suggest that adjacent cis-acting regulatory loci for OCA2 explain the relationship between skin and eye color, and point to an underlying genetic architecture in which several genes of moderate effect act together with many genes of small effect to explain ∼70% of the estimated heritability.
Author Summary
Differences in skin and eye color are some of the most obvious traits that underlie human diversity, yet most of our knowledge regarding the genetic basis for these traits is based on the limited range of variation represented by individuals of European ancestry. We have studied a unique population in Cape Verde, an archipelago located off the West African coast, in which extensive mixing between individuals of Portuguese and West African ancestry has given rise to a broad range of phenotypes and ancestral genome proportions. Our results help to explain how genes work together to control the full range of pigmentary phenotypic diversity, provide new insight into the evolution of these traits, and provide a model for understanding other types of quantitative variation in admixed populations.
PMCID: PMC3605137  PMID: 23555287
13.  A mixed-model approach for genome-wide association studies of correlated traits in structured populations 
Nature Genetics  2012;44(9):1066-1071.
Genome-wide association studies (GWAS) are a standard approach for studying the genetics of natural variation. A major concern in GWAS is the need to account for the complicated dependence structure of the data, both between loci and between individuals. Mixed models have emerged as a general and flexible approach for correcting for population structure in GWAS. Here we extend this linear mixed model approach to carry out GWAS of correlated phenotypes, deriving a fully parameterized multi-trait mixed model (MTMM) that considers both the within-trait and between-trait variance components simultaneously for multiple traits. We apply this to human cohort data for correlated blood lipid traits from the Northern Finland Birth Cohort 1966, and demonstrate greatly increased power to detect pleiotropic loci that affect more than one blood lipid trait. We also apply this to an Arabidopsis dataset for flowering measurements in two different locations, identifying loci whose effect depends on the environment.
PMCID: PMC3432668  PMID: 22902788
14.  An efficient multi-locus mixed model approach for genome-wide association studies in structured populations 
Nature Genetics  2012;44(7):825-830.
Population structure causes genome-wide linkage disequilibrium between unlinked loci, leading to statistical confounding in genome-wide association studies. Mixed models have been shown to handle the confounding effects of a diffuse background of large numbers of loci of small effect well, but do not always account for loci of larger effect. Here we propose a multi-locus mixed model as a general method for mapping complex traits in structured populations. Simulations suggest that our method outperforms existing methods, in terms of power as well as false discovery rate. We apply our method to human and Arabidopsis thaliana data, identifying novel associations in known candidates as well as evidence for allelic heterogeneity. We also demonstrate how a priori knowledge from an A. thaliana linkage mapping study can be integrated into our method using a Bayesian approach. Our implementation is computationally efficient, making the analysis of large datasets (n > 10000) practicable.
PMCID: PMC3386481  PMID: 22706313
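The multi-locus idea, repeatedly adding the strongest SNP as a cofactor and re-scanning conditional on it, can be sketched with ordinary least squares. This deliberately drops the kinship random effect that makes the published method a mixed model, and it replaces a formal model-selection criterion with a simple stopping rule (stop when the best candidate explains less than `min_gain` of the null-model residual variation); all names and thresholds are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular design (monomorphic or collinear SNPs?)")
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def rss(y, cols):
    """Residual sum of squares of y regressed on an intercept plus the given columns."""
    X = [[1.0] + [col[i] for col in cols] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def forward_select(y, snps, max_steps=3, min_gain=0.05):
    """Greedily add the SNP that most reduces RSS, conditioning on SNPs already chosen."""
    chosen = []
    total = rss(y, [])  # intercept-only model
    current = total
    for _ in range(max_steps):
        best = None
        for j, snp in enumerate(snps):
            if j in chosen:
                continue
            try:
                value = rss(y, [snps[c] for c in chosen] + [snp])
            except ValueError:
                continue  # skip SNPs that are collinear with the current model
            if best is None or value < best[1]:
                best = (j, value)
        if best is None or total == 0 or (current - best[1]) / total < min_gain:
            break
        chosen.append(best[0])
        current = best[1]
    return chosen


# Hypothetical toy data: the phenotype depends on SNPs 0 and 2, not on SNP 1.
snp0 = [0, 0, 0, 0, 1, 1, 1, 1]
snp1 = [0, 1, 0, 1, 0, 1, 0, 1]
snp2 = [0, 0, 1, 1, 0, 0, 1, 1]
y = [3 * a + c for a, c in zip(snp0, snp2)]
print(forward_select(y, [snp0, snp1, snp2]))  # [0, 2]
```

The large-effect SNP is picked first; only after conditioning on it does the smaller effect stand out clearly, which is the motivation for a multi-locus rather than single-locus scan.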
15.  Genome-Wide Association Studies Identify Heavy Metal ATPase3 as the Primary Determinant of Natural Variation in Leaf Cadmium in Arabidopsis thaliana 
PLoS Genetics  2012;8(9):e1002923.
Understanding the mechanism of cadmium (Cd) accumulation in plants is important to help reduce its potential toxicity to both plants and humans through dietary and environmental exposure. Here, we report on a study to uncover the genetic basis underlying natural variation in Cd accumulation in a world-wide collection of 349 wild-collected Arabidopsis thaliana accessions. We identified a 4-fold variation (0.5–2 µg Cd g⁻¹ dry weight) in leaf Cd accumulation when these accessions were grown in a controlled common garden. By combining genome-wide association mapping, linkage mapping in an experimental F2 population, and transgenic complementation, we reveal that HMA3 is the sole major locus responsible for the variation in leaf Cd accumulation we observe in this diverse population of A. thaliana accessions. Analysis of the predicted amino acid sequence of HMA3 from 149 A. thaliana accessions reveals the existence of 10 major natural protein haplotypes. Association of these haplotypes with leaf Cd accumulation and genetic complementation experiments indicate that 5 of these haplotypes are active and 5 are inactive, and that elevated leaf Cd accumulation is associated with the reduced function of HMA3 caused by a nonsense mutation and polymorphisms that change two specific amino acids.
Author Summary
Cadmium (Cd) is a potentially toxic metal pollutant that threatens food quality and human health in many regions of the world. Plants have evolved mechanisms for the acquisition of essential metals such as zinc and iron from the soil. Though often quite specific, such mechanisms can also lead to the accumulation of Cd by plants. Understanding natural variation in the processes that contribute to Cd accumulation in food crops could help minimize the human health risk posed. We have discovered that DNA sequence changes at a single gene, which encodes the Heavy Metal ATPase 3 (HMA3), drive the variation in Cd accumulation we observe in a world-wide sample of Arabidopsis thaliana. We identified 10 major HMA3 protein variants, of which five contribute to reducing Cd accumulation in leaves of A. thaliana.
PMCID: PMC3435251  PMID: 22969436
16.  Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel 
Nature Genetics  2012;44(2):212-216.
Arabidopsis thaliana is native to Eurasia and naturalized across the world due to human disturbance. Its easy propagation and immense phenotypic variability make it an ideal model system for functional, ecological and evolutionary genetics. To date, analyses of its natural variation have involved small numbers of individuals or genetic markers. Here we genotype 1,307 world-wide accessions, including several regional samples, at 250K SNPs, enabling us to describe the global pattern of genetic variation with high resolution. Three complementary tests applied to these data reveal novel targets of selection. Furthermore, we characterize the pattern of historical recombination and observe an enrichment of hotspots in intergenic regions and repetitive DNA, consistent with the pattern observed for humans but strikingly different from other plant species. We are making seeds for this Regional Mapping (RegMap) panel publicly available; they comprise the largest genomic mapping resource available for a naturally occurring, non-human, species.
PMCID: PMC3267885  PMID: 22231484
17.  The Arabidopsis lyrata genome sequence and the basis of rapid genome size change 
Nature Genetics  2011;43(5):476-481.
We present the 207 Mb genome sequence of the outcrosser Arabidopsis lyrata, which diverged from the self-fertilizing species A. thaliana about 10 million years ago. It is generally assumed that the much smaller A. thaliana genome, which is only 125 Mb, constitutes the derived state for the family. Apparent genome reduction in this genus can be partially attributed to the loss of DNA from large-scale rearrangements, but the main cause lies in the hundreds of thousands of small deletions found throughout the genome. These occurred primarily in non-coding DNA and transposons, but protein-coding multi-gene families are smaller in A. thaliana as well. Analysis of deletions and insertions still segregating in A. thaliana indicates that the process of DNA loss is ongoing, suggesting pervasive selection for a smaller genome.
PMCID: PMC3083492  PMID: 21478890
18.  The Impact of Arabidopsis on Human Health: Diversifying Our Portfolio 
Cell  2008;133(6):939-943.
Studies of the model plant Arabidopsis thaliana may seem to have little impact on advances in medical research, yet a survey of the scientific literature shows that this is a misconception. Many discoveries with direct relevance to human health and disease have been elaborated using Arabidopsis, and several processes important to human biology are more easily studied in this versatile model plant.
PMCID: PMC3124625  PMID: 18555767
19.  High-Resolution Analysis of Parent-of-Origin Allelic Expression in the Arabidopsis Endosperm 
PLoS Genetics  2011;7(6):e1002126.
Genomic imprinting is an epigenetic phenomenon leading to parent-of-origin specific differential expression of maternally and paternally inherited alleles. In plants, genomic imprinting has mainly been observed in the endosperm, an ephemeral triploid tissue derived after fertilization of the diploid central cell with a haploid sperm cell. In an effort to identify novel imprinted genes in Arabidopsis thaliana, we generated deep sequencing RNA profiles of F1 hybrid seeds derived after reciprocal crosses of Arabidopsis Col-0 and Bur-0 accessions. Using polymorphic sites to quantify allele-specific expression levels, we could identify more than 60 genes with potential parent-of-origin specific expression. By analyzing the distribution of DNA methylation and epigenetic marks established by Polycomb group (PcG) proteins using publicly available datasets, we suggest that for maternally expressed genes (MEGs) repression of the paternally inherited alleles largely depends on DNA methylation or PcG-mediated repression, whereas repression of the maternal alleles of paternally expressed genes (PEGs) predominantly depends on PcG proteins. While maternal alleles of MEGs are also targeted by PcG proteins, such targeting does not cause complete repression. Candidate MEGs and PEGs are enriched for cis-proximal transposons, suggesting that transposons might be a driving force for the evolution of imprinted genes in Arabidopsis. In addition, we find that MEGs and PEGs are significantly faster evolving when compared to other genes in the genome. In contrast to the predominant location of mammalian imprinted genes in clusters, cluster formation was detected for only a few MEGs and PEGs, suggesting that clustering is not a major requirement for imprinted gene regulation in Arabidopsis.
Author Summary
Genomic imprinting violates the Mendelian rules of inheritance, which state that maternally and paternally inherited alleles are functionally equal. Imprinted genes are expressed depending on their parent of origin, implying an epigenetic asymmetry between maternal and paternal alleles. Genomic imprinting occurs in mammals and flowering plants. In both groups of organisms, nourishment of the progeny depends on ephemeral tissues, the placenta and the endosperm, respectively. In plants, genomic imprinting predominantly occurs in the endosperm, which is derived after fertilization of the diploid central cell with a haploid sperm cell. In this study we identify more than 60 potentially imprinted genes and show that there are different epigenetic mechanisms causing maternal and paternal-specific gene expression. We show that maternally expressed genes are regulated by DNA methylation or Polycomb group (PcG)-mediated repression, while paternally expressed genes are predominantly regulated by PcG proteins. From an evolutionary perspective, we also show that imprinted genes are associated with transposons and are more rapidly evolving than other genes in the genome. Many MEGs and PEGs encode transcriptional regulators, implying important functional roles of imprinted genes in endosperm and seed development.
PMCID: PMC3116908  PMID: 21698132
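The imprinting abstract above says allele-specific expression was quantified at polymorphic sites but does not state the statistical test. As an illustration only, the Python sketch below shows one common way to call parent-of-origin bias from reciprocal-cross read counts, assuming a binomial test against the 2:1 maternal:paternal genome dosage of triploid endosperm. The function names, the cut-offs `alpha` and `min_shift`, and the binomial model are hypothetical choices, not the authors' pipeline.

```python
from math import exp, lgamma, log, log1p

# Assumption: triploid endosperm carries two maternal genome copies and one
# paternal copy, so a non-imprinted gene should give ~2/3 maternal reads.
EXPECTED_MATERNAL = 2 / 3


def _binom_pmf(i: int, n: int, p: float) -> float:
    """Binomial probability computed in log space to avoid overflow at high depth."""
    return exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log1p(-p))


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(_binom_pmf(i, n, p) for i in range(k, n + 1))


def binom_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(_binom_pmf(i, n, p) for i in range(0, k + 1))


def call_imprinting(crosses, alpha=0.01, min_shift=0.15):
    """Classify one gene from reciprocal-cross read counts (hypothetical helper).

    crosses: (maternal_reads, paternal_reads) per reciprocal cross, with reads
             already assigned to a parent via accession-specific SNP alleles.
    alpha, min_shift: hypothetical significance and effect-size cut-offs.
    Returns "MEG", "PEG" or "not imprinted".
    """
    calls = set()
    for maternal, paternal in crosses:
        n = maternal + paternal
        if n == 0:
            raise ValueError("no allele-informative reads for this cross")
        frac = maternal / n
        if (frac >= EXPECTED_MATERNAL + min_shift
                and binom_upper_tail(maternal, n, EXPECTED_MATERNAL) < alpha):
            calls.add("MEG")  # maternal excess beyond the 2:1 dosage
        elif (frac <= EXPECTED_MATERNAL - min_shift
                and binom_lower_tail(maternal, n, EXPECTED_MATERNAL) < alpha):
            calls.add("PEG")  # paternal excess beyond the 2:1 dosage
        else:
            calls.add("not imprinted")
    # A parent-of-origin call requires the same bias in both cross directions;
    # a bias that flips with the cross direction points to an accession effect.
    return calls.pop() if len(calls) == 1 else "not imprinted"


if __name__ == "__main__":
    # Col-0 x Bur-0 and Bur-0 x Col-0 counts for an illustrative gene
    print(call_imprinting([(95, 5), (90, 10)]))
```

Requiring agreement across both crosses is what separates true parent-of-origin effects from accession-specific (for example cis-regulatory) expression differences, which is why reciprocal crosses are used in the first place.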
20.  Analysis and visualization of Arabidopsis thaliana GWAS using web 2.0 technologies 
With large-scale genomic data becoming the norm in biological studies, the storing, integrating, viewing and searching of such data have become a major challenge. In this article, we describe the development of an Arabidopsis thaliana database that hosts the geographic information and genetic polymorphism data for over 6000 accessions and genome-wide association study (GWAS) results for 107 phenotypes, representing the largest collection of Arabidopsis polymorphism data and GWAS results to date. Taking advantage of a series of the latest web 2.0 technologies, such as Ajax (Asynchronous JavaScript and XML), GWT (Google Web Toolkit), an MVC (Model-View-Controller) web framework and an Object-Relational Mapper (ORM), we have created a web-based application (web app) for the database, which offers an integrated and dynamic view of geographic information, genetic polymorphism and GWAS results. Essential search functionalities are incorporated into the web app to aid reverse genetics research. The database and its web app have proven to be a valuable resource to the Arabidopsis community. The whole framework serves as an example of how biological data, especially GWAS results, can be presented and accessed through the web. Finally, we illustrate the potential to gain new insights through the web app with two examples, showcasing how it can be used to facilitate forward and reverse genetics research. Database URL:
PMCID: PMC3243604  PMID: 21609965
21.  Major-Effect Alleles at Relatively Few Loci Underlie Distinct Vernalization and Flowering Variation in Arabidopsis Accessions 
PLoS ONE  2011;6(5):e19949.
We have explored the genetic basis of variation in vernalization requirement and response in Arabidopsis accessions, selected on the basis of their phenotypic distinctiveness. Phenotyping of F2 populations in different environments, together with fine mapping, indicated possible causative genes. Our data support the identification of FRI and FLC as candidates for the major-effect QTL underlying variation in vernalization response, and identify a weak FLC allele, caused by a Mutator-like transposon, that contributes to flowering time variation in two North American accessions. They also reveal a number of additional QTL that contribute to flowering time variation after saturating vernalization. One of these was the result of expression variation at the FT locus. Overall, our data suggest that distinct phenotypic variation in the vernalization and flowering response of Arabidopsis accessions is accounted for by variation that has arisen independently at relatively few major-effect loci.
PMCID: PMC3098857  PMID: 21625501
22.  Natural allelic variation underlying a major fitness tradeoff in Arabidopsis thaliana 
Nature  2010;465(7298):632-636.
Plants can defend themselves against a wide array of enemies, yet one of the most striking observations is the variability in the effectiveness of such defences, both within and between species. Some of this variation can be explained by conflicting pressures from pathogens with different modes of attack [1]. A second explanation comes from an evolutionary tug of war, in which pathogens adapt to evade detection until the plant has evolved new capabilities to recognize pathogen invasion [2-5]. If selection is sufficiently strong, however, susceptible hosts should remain rare. That this is not the case is best explained by the costs of constitutive defences in a pest-free environment [6-11]. Using a combination of forward genetics and genome-wide association analyses, we demonstrate that allelic diversity at a single locus, ACCELERATED CELL DEATH 6 (ACD6) [12,13], underpins dramatic pleiotropic differences in both vegetative growth and resistance to microbial infection and herbivory among natural Arabidopsis thaliana strains. A hyperactive ACD6 allele, compared to the reference allele, strongly enhances resistance to a broad range of pathogens from different phyla, but at the same time slows the production of new leaves and greatly reduces the biomass of mature leaves. This allele segregates at intermediate frequency both throughout the worldwide range of A. thaliana and within local populations, consistent with this allele providing substantial fitness benefits despite its drastic impact on growth.
PMCID: PMC3055268  PMID: 20520716
23.  Genome-wide association study of 107 phenotypes in a common set of Arabidopsis thaliana inbred lines 
Nature  2010;465(7298):627-631.
Although genome-wide association (GWA) studies were pioneered by human geneticists as a potential solution to the challenging problem of finding the genetic basis of common human diseases [1,2], advances in genotyping and sequencing technology have made them an obvious general approach for studying the genetics of natural variation and traits of agricultural importance. They are particularly useful when inbred lines are available, because once these lines have been genotyped they can be phenotyped multiple times, making it possible (as well as extremely cost-effective) to study many different traits in many different environments, while replicating the phenotypic measurements to reduce environmental noise. Here we demonstrate the power of this approach by carrying out a GWA study of 107 phenotypes in Arabidopsis thaliana, a widely distributed, predominantly selfing model plant known to harbor considerable genetic variation for many adaptively important traits [3]. Our results are dramatically different from those of human GWA studies in that we identify many common alleles with major effect, but they are also, in many cases, harder to interpret, because confounding by complex genetics and population structure makes it difficult to distinguish true from false associations. However, a priori candidates are significantly overrepresented among these associations, making many of them excellent candidates for follow-up experiments by the Arabidopsis community. Our study clearly demonstrates the feasibility of GWA studies in A. thaliana, and suggests that the approach will be appropriate for many other organisms.
PMCID: PMC3023908  PMID: 20336072
24.  PoolHap: Inferring Haplotype Frequencies from Pooled Samples by Next Generation Sequencing 
PLoS ONE  2011;6(1):e15292.
With the advance of next-generation sequencing (NGS) technologies, increasingly ambitious applications are becoming feasible. A particularly powerful one is the sequencing of polymorphic, pooled samples. The pool can be naturally occurring, as in the case of multiple pathogen strains in a blood sample, multiple types of cells in a cancerous tissue sample, or multiple isoforms of mRNA in a cell. In these cases, it is difficult or impossible to separate the subtypes experimentally before sequencing, so their frequencies must be inferred. In addition, investigators may sometimes want to artificially pool samples from a large number of individuals for reasons of cost-efficiency, e.g., when carrying out genetic mapping using bulked segregant analysis. Here we describe PoolHap, a computational tool for inferring haplotype frequencies from pooled samples when the haplotypes are known. The key insight into why PoolHap works is that the large number of SNPs that come with genome-wide coverage can compensate for uneven coverage across the genome. The performance of PoolHap is illustrated and discussed using simulated and real data. Using Arabidopsis thaliana whole-genome polymorphism data, we show that PoolHap can estimate the proportions of haplotypes in 34-strain mixtures sequenced at 2X total coverage with less than 2% error. This method should facilitate greater biological insight into heterogeneous samples that are difficult or impossible to isolate experimentally. Software and user's manual are freely available at
PMCID: PMC3016441  PMID: 21264334
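The PoolHap abstract above frames the task as estimating mixture proportions of known haplotypes from pooled reads, with many SNPs compensating for low, uneven per-site coverage. The Python sketch below illustrates that idea with a generic expectation-maximization (EM) estimator for mixture proportions; it is not taken from PoolHap, whose own estimator may differ. The function name, the 0/1 allele encoding, and the per-read error rate `eps` are assumptions made for illustration.

```python
def estimate_haplotype_proportions(haplotypes, counts, eps=0.01,
                                   max_iter=1000, tol=1e-10):
    """Generic EM estimate of known-haplotype proportions in a pooled sample.

    haplotypes: K equal-length sequences of 0/1 alleles, one per known haplotype.
    counts: per-SNP (reads_with_allele0, reads_with_allele1) from the pool.
    eps: assumed per-read sequencing error rate, 0 < eps < 0.5.
    """
    if not 0 < eps < 0.5:
        raise ValueError("eps must lie in (0, 0.5)")
    total_reads = sum(n0 + n1 for n0, n1 in counts)
    if total_reads == 0:
        raise ValueError("no reads in the pool")
    k = len(haplotypes)
    pi = [1.0 / k] * k  # start from equal proportions
    for _ in range(max_iter):
        expected = [0.0] * k  # expected number of reads drawn from each haplotype
        for j, (n0, n1) in enumerate(counts):
            for allele, n in ((0, n0), (1, n1)):
                if n == 0:
                    continue
                # E-step: posterior weight that a read showing `allele` at SNP j
                # came from haplotype h, allowing for sequencing error eps.
                w = [pi[h] * ((1 - eps) if haplotypes[h][j] == allele else eps)
                     for h in range(k)]
                z = sum(w)
                for h in range(k):
                    expected[h] += n * w[h] / z
        # M-step: new proportions are each haplotype's expected share of reads.
        new_pi = [e / total_reads for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi
```

Because every read contributes to the expected counts, a SNP with high coverage carries more weight and a SNP with few reads carries little, so evidence pooled over many sites can make up for shallow or uneven depth at any single site, in the spirit of the abstract's key insight.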
25.  A Coastal Cline in Sodium Accumulation in Arabidopsis thaliana Is Driven by Natural Variation of the Sodium Transporter AtHKT1;1 
PLoS Genetics  2010;6(11):e1001193.
The genetic model plant Arabidopsis thaliana, like many plant species, experiences a range of edaphic conditions across its natural habitat. Such heterogeneity may drive local adaptation, though the molecular genetic basis remains elusive. Here, we describe a study in which we used genome-wide association mapping, genetic complementation, and gene expression studies to identify cis-regulatory expression level polymorphisms at the AtHKT1;1 locus, encoding a known sodium (Na+) transporter, as being a major factor controlling natural variation in leaf Na+ accumulation capacity across the global A. thaliana population. A weak allele of AtHKT1;1 that drives elevated leaf Na+ in this population has been previously linked to elevated salinity tolerance. Inspection of the geographical distribution of this allele revealed its significant enrichment in populations associated with the coast and saline soils in Europe. The fixation of this weak AtHKT1;1 allele in these populations is genetic evidence supporting local adaptation to these potentially saline impacted environments.
Author Summary
For generations, the unusual geographical distribution of certain animal and plant species has posed puzzling questions to the scientific community about the interrelationship of evolutionary and geographic histories. With DNA sequencing, such puzzles now extend to the geographical distribution of genetic variation within a species. Here, we explain one such puzzle in the European population of Arabidopsis thaliana, where we find that a version of a gene encoding a sodium transporter with reduced function is found almost exclusively in populations of this plant growing close to the coast or on known saline soils. This version of the gene has previously been linked with elevated salinity tolerance, and its unusual distribution in populations of plants growing in coastal regions and on saline soils suggests that it plays a role in adapting these plants to the elevated salinity of their local environment.
PMCID: PMC2978683  PMID: 21085628
