High-density array-based genome-wide association studies (GWAS) are complemented by exome sequencing and whole-genome resequencing-based association studies. Here we present a composite resequencing-based genome-wide association study (CR-GWAS) strategy that systematically exploits collective biological information and analytical tools for a robust analysis. We showcased the utility of this strategy by using Arabidopsis (Arabidopsis thaliana) resequencing data. Bioinformatic predictions of biological function alteration at each locus were integrated into the process of association testing of both common and rare variants for complex traits with a suite of statistics. Significant signals were then filtered with a priori candidate loci generated from genome databases and gene network models to obtain a posteriori candidate loci. A probabilistic gene network (AraNet) that interrogates network neighborhoods of genes was then used to expand the filtering power to examine the significant testing signals. Using this strategy, we confirmed the known true positives and identified several new promising associations. Promising genes (AP1, FCA, FRI, FLC, FLM, SPL5, FY, and DCL2) were shown to control flowering time through either common variants or rare variants within a diverse set of Arabidopsis accessions. Although many of these candidate genes were cloned earlier with mutational studies, identifying the contribution of their allelic variation to overall phenotypic variation among diverse natural accessions is critical. Our rare allele testing established a greater number of connections than previous analyses in which this issue was not addressed. More importantly, our results demonstrated the potential of integrating various biological, statistical, and bioinformatic tools into complex trait dissection.
complex trait dissection; association mapping; rare allele; mixed model
Complex traits and other polygenic processes require coordinated gene expression. Co-expression networks model mRNA co-expression: the product of gene regulatory networks. To identify regulatory mechanisms underlying coordinated gene expression in a tissue-enriched context, ten Arabidopsis thaliana co-expression networks were constructed after manually sorting 4,566 RNA profiling datasets into aerial, flower, leaf, root, rosette, seedling, seed, shoot, whole plant, and global (all samples combined) groups. Collectively, the ten networks contained 30% of the measurable genes of Arabidopsis and were circumscribed into 5,491 modules. Modules were scrutinized for cis regulatory mechanisms putatively encoded in conserved non-coding sequences (CNSs) previously identified as remnants of a whole genome duplication event. We determined the non-random association of 1,361 unique CNSs to 1,904 co-expression network gene modules. Furthermore, the CNS elements were placed in the context of known gene regulatory networks (GRNs) by connecting 250 CNS motifs with known GRN cis elements. Our results provide support for a regulatory role of some CNS elements and suggest the functional consequences of CNS activation of co-expression in specific gene sets dispersed throughout the genome.
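As a rough illustration of how a co-expression network can be derived from expression profiles, the sketch below links two genes when the absolute Pearson correlation of their profiles reaches a cutoff. The abstract does not state which similarity measure or threshold the authors used, so the function names, the 0.8 cutoff, and the toy gene identifiers are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(profiles, threshold=0.8):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold.

    profiles: dict mapping a gene id to its expression values across samples.
    The threshold is a hypothetical choice, not a value taken from the study.
    """
    genes = sorted(profiles)
    edges = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(profiles[g1], profiles[g2])
            if abs(r) >= threshold:
                edges.append((g1, g2, r))
    return edges

# Toy data: three hypothetical genes measured in five samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # tracks geneA closely
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # unrelated pattern
}
edges = coexpression_edges(profiles)
```

On this toy input only the geneA/geneB pair passes the cutoff. Modules would then be obtained by clustering the resulting graph, a step this sketch leaves out.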
Organisms in the wild are subject to multiple, fluctuating environmental factors, and it is in complex natural environments that genetic regulatory networks actually function and evolve. We assessed genome-wide gene expression patterns in the wild in two natural accessions of the model plant Arabidopsis thaliana and examined the nature of transcriptional variation throughout its life cycle and gene expression correlations with natural environmental fluctuations. We grew plants in a natural field environment and measured genome-wide time-series gene expression from the plant shoot every three days, spanning the seedling to reproductive stages. We find that 15,352 genes were expressed in the A. thaliana shoot in the field, and accession and flowering status (vegetative versus flowering) were strong components of transcriptional variation in this plant. We identified between ∼110 and 190 time-varying gene expression clusters in the field, many of which were significantly overrepresented by genes regulated by abiotic and biotic environmental stresses. The two main principal components of vegetative shoot gene expression (PCveg) correlate to temperature and precipitation occurrence in the field. The largest PCveg axes included thermoregulatory genes while the second major PCveg was associated with precipitation and contained drought-responsive genes. By exposing A. thaliana to natural environments in an open field, we provide a framework for further understanding the genetic networks that are deployed in natural environments, and we connect plant molecular genetics in the laboratory to plant organismal ecology in the wild.
Plants in the real world are continuously exposed to multiple environmental signals and must respond appropriately to the dynamic conditions found in nature. Environmental signals can fluctuate during an individual's life cycle with varying degrees of predictability, and complex natural environments are where gene activity evolves. We grew two natural accessions of the model plant Arabidopsis thaliana in an open field in New York in the spring and examined genome-wide gene expression patterns in the wild. We find nearly 200 gene expression clusters in these field-grown plants, and many of these clusters were enriched in genes that had previously been shown to be associated with expression under various abiotic or biotic environmental stress conditions. Two major principal components of gene expression were associated with environmental fluctuations in temperature and rainfall, and we identified several genes (such as the thermoregulatory nucleosome occupancy gene ARP6 and the drought-sensitive hormone biosynthetic gene AAO3) that could be found in these principal components. By exploring genome-wide gene expression in plants in the wild, we were able to connect mechanistic aspects of plant molecular biology with ecological responses in nature and to begin to understand how organisms behave and adapt in their natural environments.
The model plant Arabidopsis thaliana (Arabidopsis) shows a wide range of genetic and trait variation among wild accessions. Because of its unparalleled biological and genomic resources, the potential of Arabidopsis for molecular genetic analysis of this natural variation has increased dramatically in recent years.
Advanced genomics has accelerated molecular phylogenetic analysis and gene identification by quantitative trait loci (QTL) mapping and/or association mapping in Arabidopsis. In particular, QTL mapping utilizing natural accessions is now becoming a major strategy of gene isolation, offering an alternative to artificial mutant lines. Furthermore, the genomic information is used by researchers to uncover the signature of natural selection acting on the genes that contribute to phenotypic variation. The evolutionary significance of such genes has been evaluated in traits such as disease resistance and flowering time. However, although molecular hallmarks of selection have been found for the genes in question, a corresponding ecological scenario of adaptive evolution has been difficult to prove. Ecological strategies, including reciprocal transplant experiments and competition experiments, and utilizing near-isogenic lines of alleles of interest will be a powerful tool to measure the relative fitness of phenotypic and/or allelic variants.
As the plant model organism, Arabidopsis provides a wealth of molecular background information for evolutionary genetics. Because genetic diversity between and within Arabidopsis populations is much higher than anticipated, combining this background information with ecological approaches might well establish Arabidopsis as a model organism for plant evolutionary ecology.
Arabidopsis; natural genetic variation; natural trait variation; QTL mapping; LD mapping; plant development; plant evolution; molecular ecology
The evolutionary origins of the multitude of duplicate genes in plant genomes are still incompletely understood. To gain an appreciation of the potential selective forces acting on these duplicates, we phylogenetically inferred the set of metabolic gene families from 10 flowering plant (angiosperm) genomes. We then compared the metabolic fluxes for these families, predicted using the Arabidopsis thaliana and Sorghum bicolor metabolic networks, with the families' duplication propensities. For duplications produced both by small-scale events (small-scale duplications) and by genome duplication (whole-genome duplications), there is a significant association between the flux and the tendency to duplicate. Following this global analysis, we made a more fine-scale study of the selective constraints observed on plant sodium and phosphate transporters. We find that the different duplication mechanisms give rise to differing selective constraints. However, the exact nature of this pattern varies between the gene families, and we argue that the duplication mechanism alone does not define a duplicated gene's subsequent evolutionary trajectory. Collectively, our results argue for the interplay of history, function, and selection in shaping duplicate gene evolution in plants.
dosage selection; genome duplication; gene duplication
PICARA is an analytical pipeline designed to systematically summarize observed SNP/trait associations identified by genome wide association studies (GWAS) and to identify candidate genes involved in the regulation of complex trait variation. The pipeline provides probabilistic inference about a priori candidate genes using integrated information derived from genome-wide association signals, gene homology, and curated gene sets embedded in pathway descriptions. In this paper, we demonstrate the performance of PICARA using data for flowering time variation in maize – a key trait for geographical and seasonal adaption of plants. Among 406 curated flowering time-related genes from Arabidopsis, we identify 61 orthologs in maize that are significantly enriched for GWAS SNP signals, including key regulators such as FT (Flowering Locus T) and GI (GIGANTEA), and genes centered in the Arabidopsis circadian pathway, including TOC1 (Timing of CAB Expression 1) and LHY (Late Elongated Hypocotyl). In addition, we discover a regulatory feature that is characteristic of these a priori flowering time candidates in maize. This new probabilistic analytical pipeline helps researchers infer the functional significance of candidate genes associated with complex traits and helps guide future experiments by providing statistical support for gene candidates based on the integration of heterogeneous biological information.
Light plays an important role in modulating seedling growth and flowering time [1]. We show that allelic variation at the PHYTOCHROME C (PHYC) photoreceptor locus affects both traits in natural populations of A. thaliana. Two functionally distinct PHYC haplotype groups are distributed in a FRIGIDA-dependent latitudinal cline that is stronger than the one reported for FLOWERING LOCUS C, which together with FRIGIDA explains a large portion of the variation in A. thaliana flowering time [2]. In a genome-wide scan for association of 65 loci with latitude, there was an excess of significant p-values, indicative of population structure. Nevertheless, PHYC was the most strongly associated locus across 163 strains, suggesting that PHYC alleles are under diversifying selection in A. thaliana. Our work, together with previous findings [3–6], suggests that photoreceptor genes are major agents of natural variation in plant flowering and growth response.
The Arabidopsis Gene Regulatory Information Server (AGRIS; http://arabidopsis.med.ohio-state.edu/) provides a comprehensive resource for gene regulatory studies in the model plant Arabidopsis thaliana. Three interlinked databases, AtTFDB, AtcisDB and AtRegNet, furnish comprehensive and updated information on transcription factors (TFs), predicted and experimentally verified cis-regulatory elements (CREs) and their interactions, respectively. In addition to significant contributions to the identification of the entire set of TF–DNA interactions, which are the key to understanding the gene regulatory networks that govern Arabidopsis gene expression, tools recently incorporated into AGRIS include the complete set of words of length 5–15 present in the Arabidopsis genome and the integration of AtRegNet with visualization tools, such as the recently developed ReIN application. All the information in AGRIS is publicly available and downloadable upon registration.
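To make the idea of a "complete set of words of length 5–15" concrete, the following minimal sketch counts every substring of those lengths in a DNA sequence. It is a naive illustration, not a description of how AGRIS computes its word set; the function name and the toy sequence are hypothetical, and a genome-scale run would need a more memory-efficient approach.

```python
from collections import Counter

def count_words(sequence, min_len=5, max_len=15):
    """Count every overlapping word (k-mer) with min_len <= k <= max_len."""
    seq = sequence.upper()
    counts = Counter()
    for k in range(min_len, max_len + 1):
        # range() is empty when the sequence is shorter than k
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts

# Toy sequence: two tandem copies of a 6-bp word.
counts = count_words("CACGTGCACGTG")
```

For this 12-bp toy sequence the 6-bp word `CACGTG` is counted twice, and no word longer than 12 bp is produced.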
The timing of flowering initiation is a fundamental trait for the adaptation of annual plants to different environments. Large amounts of intraspecific quantitative variation have been described for it among natural accessions of many species, but the molecular and evolutionary mechanisms underlying this genetic variation are mainly being determined in the model plant Arabidopsis thaliana. To find novel A. thaliana flowering QTL, we developed introgression lines from the Japanese accession Fuk, which was selected based on the substantial transgression observed in an F2 population with the reference strain Ler. Analysis of an early flowering line carrying a single Fuk introgression identified Flowering Arabidopsis QTL1 (FAQ1). We fine-mapped FAQ1 in an 11 kb genomic region containing the MADS transcription factor gene SHORT VEGETATIVE PHASE (SVP). Complementation of the early flowering phenotype of FAQ1-Fuk with a SVP-Ler transgene demonstrated that FAQ1 is SVP. We further proved by directed mutagenesis and transgenesis that a single amino acid substitution in SVP causes the loss of function and early flowering of the Fuk allele. Analysis of a worldwide collection of accessions detected the FAQ1/SVP-Fuk allele only in Asia, with the highest frequency appearing in Japan, where we could also detect a potential ancestral genotype of FAQ1/SVP-Fuk. In addition, we evaluated allelic and epistatic interactions of SVP natural alleles by analysing more than one hundred transgenic lines carrying Ler or Fuk SVP alleles in five genetic backgrounds. Quantitative analyses of these lines showed that FAQ1/SVP effects vary from large to small depending on the genetic background. These results support the view that the flowering repressor SVP has been recently selected in A. thaliana as a target for early flowering, and provide evidence for the relevance of genetic interactions for the intraspecific evolution of FAQ1/SVP and flowering time.
In many plant species, the timing of flowering initiation shows abundant quantitative variation among natural varieties, which reflects the importance of this trait for adaptation to different environments. Currently, a major goal in plant biology is to determine the molecular and evolutionary bases of this natural genetic variation. In this study we demonstrate that the central flowering regulator SHORT VEGETATIVE PHASE (SVP), encoding a MADS transcription factor, is involved in the flowering natural variation of the model organism Arabidopsis thaliana. In particular, we prove that a structural change caused by a single amino acid substitution generates a SVP early flowering allele that is distributed only in Asia. Furthermore, genetic interactions have been shown to be a component of the natural variation for many important adaptive traits. However, very few studies, either in animals or plants, have systematically addressed the extent of genetic interactions among specific alleles responsible for the natural variation of complex traits. Our study shows that the flowering effects of SVP natural alleles depend significantly on the genetic background; and, subsequently, we demonstrate the relevance of epistasis for the evolution of this crucial transcription factor and flowering time.
Correct daily phasing of transcription confers an adaptive advantage to almost all organisms, including higher plants. In this study, we describe a hypothesis-driven network discovery pipeline that identifies biologically relevant patterns in genome-scale data. To demonstrate its utility, we analyzed a comprehensive matrix of time courses interrogating the nuclear transcriptome of Arabidopsis thaliana plants grown under different thermocycles, photocycles, and circadian conditions. We show that 89% of Arabidopsis transcripts cycle in at least one condition and that most genes have peak expression at a particular time of day, which shifts depending on the environment. Thermocycles alone can drive at least half of all transcripts critical for synchronizing internal processes such as cell cycle and protein synthesis. We identified at least three distinct transcription modules controlling phase-specific expression, including a new midnight specific module, PBX/TBX/SBX. We validated the network discovery pipeline, as well as the midnight specific module, by demonstrating that the PBX element was sufficient to drive diurnal and circadian condition-dependent expression. Moreover, we show that the three transcription modules are conserved across Arabidopsis, poplar, and rice. These results confirm the complex interplay between thermocycles, photocycles, and the circadian clock on the daily transcription program, and provide a comprehensive view of the conserved genomic targets for a transcriptional network key to successful adaptation.
As the earth rotates, environmental conditions oscillate between illuminated warm days and dark cool nights. Plants have adapted to these changes by timing physiological processes to specific times of the day or night. Light and temperature signaling and the circadian clock regulate this adaptive response. To determine the contributions of each of these factors on gene regulation, we analyzed microarray time course experiments interrogating light, temperature, and circadian conditions. We discovered that almost all Arabidopsis genes cycle in at least one condition. From a signaling perspective, this suggests that light, temperature, and circadian clock play an important role in modulating many physiological pathways. To clarify the contribution of transcriptional regulation on this process, we mined the promoters of cycling genes to identify DNA elements associated with expression at specific times of day. This confirmed the importance of several DNA motifs such as the G-box and the evening element in the regulation of gene expression by light and the circadian clock, but also facilitated the discovery of new elements linked to a novel midnight regulatory module. Identification of orthologous promoter elements in rice and poplar revealed a conserved transcriptional regulatory network that allows global adaptation to the ever-changing daily environment.
Flowers are the most complex structures of plants. Studies of Arabidopsis thaliana, which has typical eudicot flowers, have been fundamental in advancing the structural and molecular understanding of flower development. The main processes and stages of Arabidopsis flower development are summarized to provide a framework in which to interpret the detailed molecular genetic studies of genes assigned functions during flower development, and this summary is extended to recent genomics studies uncovering the key regulatory modules involved. Computational models have been used to study the concerted action and dynamics of the gene regulatory module that underlies patterning of the Arabidopsis inflorescence meristem and specification of the primordial cell types during early stages of flower development. This includes the gene combinations that specify sepal, petal, stamen and carpel identity, and genes that interact with them. As a dynamic gene regulatory network this module has been shown to converge to stable multigenic profiles that depend upon the overall network topology and are thus robust, which can explain the canalization of flower organ determination and the overall conservation of the basic flower plan among eudicots. Comparative and evolutionary approaches derived from Arabidopsis studies pave the way to studying the molecular basis of diverse floral morphologies.
The onset of flowering is an important adaptive trait in plants. The small ephemeral species Arabidopsis thaliana grows under a wide range of temperature and day-length conditions across much of the Northern hemisphere, and a number of flowering-time loci that vary between different accessions have been identified before. However, only a few studies have addressed the species-wide genetic architecture of flowering-time control. We have taken advantage of a set of 18 distinct accessions that present much of the common genetic diversity of A. thaliana and mapped quantitative trait loci (QTL) for flowering time in 17 F2 populations derived from these parents. We found that the majority of flowering-time QTL cluster in as few as five genomic regions, which include the locations of the entire FLC/MAF clade of transcription factor genes. By comparing effects across shared parents, we conclude that in several cases there might be an allelic series caused by rare alleles. While this finding parallels results obtained for maize, in contrast to maize much of the variation in flowering time in A. thaliana appears to be due to large-effect alleles.
The eukaryotic cell cycle is a process controlled by protein assemblies, of which the key subunits are serine-threonine cyclin-dependent kinases (CDKs). Timely association and dissociation of these assemblies ensure that the cell division program is executed correctly. The challenge to unravel the rules of the plant cell cycle results from the multiplicity of the process-regulating genes that emerged through genome duplications during the evolution of flowering plants. Despite the increasing knowledge on the plant cell cycle control, little is known about the composition of the different CDK-Cyclin complexes and their spatiotemporal occurrence. The binary interactions of the previously annotated 58 Arabidopsis thaliana core cell cycle proteins were tested in two high-throughput protein-protein interaction (PPI) assays: the bimolecular fluorescence complementation (BiFC) and the yeast two-hybrid. The resulting PPI network was integrated with available cycle phase-dependent gene expression data and subcellular localization information, revealing distinct cell cycle clusters acting at different cell division stages. Additionally, the BiFC assay revealed that three D-type cyclins, CYCD4;1, CYCD4;2 and CYCD5;1, form active kinase complexes with CDKA;1 and CDKB1;1 in vivo because they induce cell divisions in differentiated tobacco (Nicotiana benthamiana) epidermal cells. We demonstrate that these complexes promote cell proliferation in Arabidopsis and we discuss their putative mode of action in plant development.
Arabidopsis; cell cycle; protein-protein interaction; cell division; meristem
Analysis of the genome-wide distribution patterns of histone H3 lysine4 methylation in Arabidopsis thaliana seedlings shows that it has widespread roles in regulating gene expression.
Post-translational modifications of histones play important roles in maintaining normal transcription patterns by directly or indirectly affecting the structural properties of the chromatin. In plants, methylation of histone H3 lysine 4 (H3K4me) is associated with genes and required for normal plant development.
We have characterized the genome-wide distribution patterns of mono-, di- and trimethylation of H3K4 (H3K4me1, H3K4me2 and H3K4me3, respectively) in Arabidopsis thaliana seedlings using chromatin immunoprecipitation and high-resolution whole-genome tiling microarrays (ChIP-chip). All three types of H3K4me are found to be almost exclusively genic, and two-thirds of Arabidopsis genes contain at least one type of H3K4me. H3K4me2 and H3K4me3 accumulate predominantly in promoters and 5' genic regions, whereas H3K4me1 is distributed within transcribed regions. In addition, H3K4me3-containing genes are highly expressed with low levels of tissue specificity, but H3K4me1 or H3K4me2 may not be directly involved in transcriptional activation. Furthermore, the preferential co-localization of H3K4me3 and H3K27me3 found in mammals does not appear to occur in plants at a genome-wide level, but H3K4me2 and H3K27me3 co-localize at a higher-than-expected frequency. Finally, we found that H3K4me2/3 and DNA methylation appear to be mutually exclusive, but surprisingly, H3K4me1 is highly correlated with CG DNA methylation in the transcribed regions of genes.
H3K4me plays widespread roles in regulating gene expression in plants. Although many aspects of the mechanisms and functions of H3K4me appear to be conserved among all three kingdoms, we observed significant differences in the relationship between H3K4me and transcription or other epigenetic pathways in plants and mammals.
An Arabidopsis thaliana transcriptional network reveals regulatory mechanisms for the control of genes related to stress adaptation.
Understanding the molecular mechanisms plants have evolved to adapt their biological activities to a constantly changing environment is an intriguing question and one that requires a systems biology approach. Here we present a network analysis of genome-wide expression data combined with reverse-engineering network modeling to dissect the transcriptional control of Arabidopsis thaliana. The regulatory network is inferred by using an assembly of microarray data containing steady-state RNA expression levels from several growth conditions, developmental stages, biotic and abiotic stresses, and a variety of mutant genotypes.
We show that the A. thaliana regulatory network has the characteristic properties of hierarchical networks. We successfully applied our quantitative network model to predict the full transcriptome of the plant for a set of microarray experiments not included in the training dataset. We also used our model to analyze the robustness in expression levels conferred by network motifs such as the coherent feed-forward loop. In addition, the meta-analysis presented here has allowed us to identify regulatory and robust genetic structures.
These data suggest that A. thaliana has evolved high connectivity in terms of transcriptional regulation among cellular functions involved in response and adaptation to changing environments, while gene networks constitutively expressed or less related to stress response are characterized by a lower connectivity. Taken together, these findings suggest conserved regulatory strategies that have been selected during the evolutionary history of this eukaryote.
The economic importance of cereals such as barley, and the demand for improved yield and quality require a better understanding of the genetic components that modulate biologically and commercially relevant traits. While Arabidopsis thaliana is the premiere model plant system, the spectrum of its traits cannot address all of the fundamental questions of crop plant development. Unlike Arabidopsis, barley is both a crop and a model system for scientific research, and it is increasingly being used for genetic and molecular investigations into the conserved biological processes of cereals. A common challenge in genetic studies in plants with large genomes arises from the very time-consuming work of associating mutant phenotypes with gene sequence information, especially if insertion mutagenesis is not routine, as in barley. Reverse genetics based on chemical mutagenesis represents the best solution to this obstacle.
In barley, we generated a new TILLING (Targeting Local Lesions IN Genomes) resource comprising 10,279 M2 mutants in the two-rowed malting cultivar 'Barke,' which has been used in the generation of other genomic resources in barley (~150,000 ESTs, DH mapping population). The value of this new resource was tested using selected candidate genes. An average frequency of approximately one mutation per 0.5 Mb was determined by screening ten fragments of six different genes. The ethyl methanesulphonate (EMS) mutagenesis efficiency was studied by recording and relating the mutagenesis-dependent effects found in the three mutant generations (M1-M3). A detailed analysis was performed for the homeodomain-leucine-zipper (HD-ZIP) gene HvHox1. Thirty-one mutations were identified by screening a 1,270-bp fragment in 7,348 M2 lines. Three of the newly identified mutants exhibited either a six-rowed or an intermedium-spike phenotype, and one mutant displayed a significantly altered spikelet morphology compared to that of the 'Barke' wild type. Our results indicate a bias in the frequency of independent functional mutations at specific base pair (bp) positions within the gene HvHox1.
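The HvHox1 figures above can be turned into a mutation density with a simple calculation: total base pairs screened (fragment length times number of lines) divided by the number of mutations found. The sketch below assumes this plain definition; the abstract does not say whether the authors applied any correction (for example, for regions near primers), and the function name is hypothetical.

```python
def mutation_density_kb(n_mutations, fragment_bp, n_lines):
    """Average number of screened kilobases per detected mutation."""
    screened_bp = fragment_bp * n_lines  # total sequence screened across all lines
    return screened_bp / n_mutations / 1000.0

# Values reported for the HvHox1 fragment: 31 mutations, 1,270 bp, 7,348 M2 lines.
density = mutation_density_kb(31, 1270, 7348)
```

With these inputs the result is roughly one mutation per 301 kb for this single fragment, which is denser than the reported average of about one per 0.5 Mb across the ten fragments of six genes.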
A new TILLING population was developed as a resource for high-throughput gene discovery in an alternative barley germplasm. Pilot screening demonstrated a similar or even slightly higher mutation frequency when compared to previously published barley TILLING populations that should allow for the identification of diverse allelic variation. Partial phenotypic evaluation of the M2 and M3 generations has revealed the presence of a wide spectrum of morphological diversity that highlights the great potential of this resource for use in forward genetic screens. Altogether, our study shows the efficiency of screening and the applicability of the new TILLING population for genetic studies in the barley crop model system.
Construction of transcriptional regulatory networks (TRNs) is a priority in systems biology. Numerous high-throughput approaches, including microarray and next-generation sequencing, are extensively adopted to examine transcriptional expression patterns on the whole-genome scale; those data are helpful in reconstructing TRNs. Identifying transcription factor binding sites (TFBSs) in a gene promoter is the initial step in elucidating the transcriptional regulation mechanism. Because transcription factors usually co-regulate a common group of genes by forming regulatory modules with similar TFBSs, the combinatorial interactions of transcription factors must be modeled to reconstruct the gene regulatory networks.
For systems biology applications, this work develops a novel database called Arabidopsis thaliana Promoter Analysis Net (AtPAN), capable of detecting TFBSs and their corresponding transcription factors (TFs) in a promoter or a set of promoters in Arabidopsis. For further analysis, the co-expressed TFs and their target genes can be retrieved from AtPAN according to microarray expression data and the literature. Additionally, proteins interacting with the co-expressed TFs are incorporated to reconstruct co-expressed TRNs. Moreover, combinatorial TFs can be detected by the frequency of TFBS co-occurrence in a group of gene promoters. TFBSs in the conserved regions between two input sequences or between homologous genes in Arabidopsis and rice are also provided in AtPAN. The output can also guide future wet-lab experiments.
AtPAN has a user-friendly input/output interface and provides a graphical view of the TRNs. This novel resource is freely available online at http://AtPAN.itps.ncku.edu.tw/.
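The co-occurrence idea mentioned above can be sketched as follows: for each pair of binding-site motifs, compute the fraction of promoters in the group that contain both. This is only an illustration under assumed inputs; the abstract does not specify the statistic AtPAN uses, and the function name, promoter identifiers, and motif names below are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_frequency(promoter_sites):
    """Fraction of promoters in which each pair of TFBS motifs appears together.

    promoter_sites: dict mapping a promoter id to the set of motif names found in it.
    """
    n = len(promoter_sites)
    if n == 0:
        return {}
    pair_counts = Counter()
    for sites in promoter_sites.values():
        # sorted() gives each pair a canonical order, e.g. ("motif_A", "motif_B")
        for pair in combinations(sorted(sites), 2):
            pair_counts[pair] += 1
    return {pair: count / n for pair, count in pair_counts.items()}

# Toy group of four promoters with hypothetical motif hits.
promoter_sites = {
    "promoter1": {"motif_A", "motif_B"},
    "promoter2": {"motif_A", "motif_B", "motif_C"},
    "promoter3": {"motif_C"},
    "promoter4": {"motif_A", "motif_B"},
}
freq = cooccurrence_frequency(promoter_sites)
```

In this toy group `motif_A` and `motif_B` co-occur in three of four promoters, which would flag their binding factors as candidate combinatorial partners. A real analysis would also compare such frequencies against a background expectation.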
Flowering time is a key life-history trait in the plant life cycle. Most studies to unravel the genetics of flowering time in Arabidopsis thaliana have been performed under greenhouse conditions. Here, we describe a study about the genetics of flowering time that differs from previous studies in two important ways: first, we measure flowering time in a more complex and ecologically realistic environment; and, second, we combine the advantages of genome-wide association (GWA) and traditional linkage (QTL) mapping. Our experiments involved phenotyping nearly 20,000 plants over 2 winters under field conditions, including 184 worldwide natural accessions genotyped for 216,509 SNPs and 4,366 RILs derived from 13 independent crosses chosen to maximize genetic and phenotypic diversity. Based on a photothermal time model, the flowering time variation scored in our field experiment was poorly correlated with the flowering time variation previously obtained under greenhouse conditions, reinforcing previous demonstrations of the importance of genotype by environment interactions in A. thaliana and the need to study adaptive variation under natural conditions. The use of 4,366 RILs provides great power for dissecting the genetic architecture of flowering time in A. thaliana under our specific field conditions. We describe more than 60 additive QTLs, all with relatively small to medium effects and organized in 5 major clusters. We show that QTL mapping increases our power to distinguish true from false associations in GWA mapping. QTL mapping also permits the identification of false negatives, that is, causative SNPs that are lost when applying GWA methods that control for population structure. Major genes underpinning flowering time in the greenhouse were not associated with flowering time in this study. Instead, we found a prevalence of genes involved in the regulation of the plant circadian clock. Furthermore, we identified new genomic regions lacking obvious candidate genes.
Dissecting the genetic bases of adaptive traits is of primary importance in evolutionary biology. In this study, we combined a genome-wide association (GWA) study with traditional linkage mapping in order to detect the genetic bases underlying natural variation in flowering time in ecologically realistic conditions in the plant Arabidopsis thaliana. Our study involved phenotyping nearly 20,000 plants over 2 winters under field conditions in a temperate climate. We show that combined linkage and association mapping clearly outperforms each method alone when it comes to identifying true associations. This highlights the utility of combining different methods to localize genes involved in complex trait natural variation. Most candidate genes found in this study are involved in the regulation of the plant circadian clock and, surprisingly, were not associated with flowering time scored under greenhouse conditions. While rapid advances have been made in high-throughput genotyping and sequencing, high-throughput phenotyping of complex traits under natural conditions will be the next challenge for dissecting the genetic bases of adaptive variation in “laboratory” model organisms.
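As a rough illustration of how linkage results can be used to sort association signals, the sketch below keeps GWA hits that fall inside a QTL support interval and sets aside those that do not. It is a minimal sketch under stated assumptions, not the procedure used in the study: the record types, the function names, the significance threshold, and all coordinates and p-values are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Snp(NamedTuple):
    chrom: int
    pos: int        # base-pair position (hypothetical values below)
    p_value: float  # GWA association p-value


class QtlInterval(NamedTuple):
    chrom: int
    start: int
    end: int


def in_any_interval(snp: Snp, intervals: List[QtlInterval]) -> bool:
    """True if the SNP lies inside at least one QTL support interval."""
    return any(
        q.chrom == snp.chrom and q.start <= snp.pos <= q.end
        for q in intervals
    )


def classify_hits(
    snps: List[Snp], intervals: List[QtlInterval], threshold: float
) -> Tuple[List[Snp], List[Snp]]:
    """Split significant GWA hits into QTL-supported and unsupported lists."""
    supported, unsupported = [], []
    for snp in snps:
        if snp.p_value > threshold:
            continue  # not significant in the GWA scan
        if in_any_interval(snp, intervals):
            supported.append(snp)
        else:
            unsupported.append(snp)
    return supported, unsupported


# Hypothetical example data, for illustration only.
gwa_hits = [
    Snp(1, 1_200_000, 1e-7),
    Snp(1, 9_500_000, 1e-6),
    Snp(4, 300_000, 1e-3),
]
qtl_intervals = [QtlInterval(1, 1_000_000, 1_500_000)]

supported, unsupported = classify_hits(gwa_hits, qtl_intervals, threshold=1e-5)
print(supported)
print(unsupported)
```

The same interval test could be run in the other direction, with a relaxed GWA threshold inside QTL intervals, to look for candidate false negatives.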
Plant growth promotion is a multigenic process under the influence of many factors; therefore an understanding of these processes and the functions regulated may have profound implications. The present study reports a microarray analysis of Arabidopsis thaliana plants inoculated with Pseudomonas putida MTCC5279 (MTCC5279), an inoculation that resulted in a significant increase in growth traits compared with the non-inoculated control. The gene expression changes, represented on an oligonucleotide array (24,652 genes), were studied to gain insight into MTCC5279-assisted plant growth promotion in Arabidopsis thaliana. Arabidopsis thaliana genes upregulated by MTCC5279 were found to be involved in maintenance of genome integrity (At5g20850), growth hormone (At3g23890 and At4g36110), amino acid synthesis (At5g63890), abscisic acid (ABA) signaling and ethylene suppression (At2g29090, At5g17850), Ca2+-dependent signaling (At3g57530), and induction of induced systemic resistance (At2g46370, At2g44840). The genes At3g32920 and At2g15890, which are suggested to act early in petal, stamen, and embryonic development, are among the downregulated genes. We report for the first time MTCC5279-assisted repression of At3g32920, a putative DNA repair protein involved in recombination and DNA strand transfer in a process of rapid meiotic and mitotic division.
Induced systemic resistance; Plant growth promoting bacteria
Arabidopsis thaliana is the model plant and is grown worldwide by plant biologists seeking to dissect the molecular underpinning of plant growth and development. Gene copy number variation (CNV) is a common form of genome natural diversity that is currently poorly studied in plants and may have broad implications for model organism research, evolutionary biology, and crop science. Herein, comparative genomic hybridization (CGH) was used to identify and interrogate regions of gene CNV across the A. thaliana genome. A common temperature condition used for growth of A. thaliana in our laboratory and many around the globe is 22 °C. The current study sought to test whether A. thaliana, grown under different temperature (16 and 28 °C) and stress regimes (salicylic acid spray) for five generations, selecting for fecundity at each generation, displayed any differences in CNV relative to a plant lineage growing under normal conditions. Three siblings from each alternative temperature or stress lineage were also compared with the reference genome (22 °C) by CGH to determine repetitive and nonrepetitive CNVs. Findings document exceptional rates of CNV in the genome of A. thaliana over immediate family generational scales. A propensity for duplication and nonrepetitive CNVs was documented in 28 °C CGH, which was correlated with the greatest plant stress and implies a potential CNV–environmental interaction. A broad diversity of gene species was observed within CNVs, but transposable elements and biotic stress response genes were notably overrepresented as a proportion of total genes and genes initiating CNVs. Results support a model whereby segmental CNV and the genes encoded within these regions contribute to adaptive capacity of plants through natural genome variation.
natural variation; genome duplication; gene copy number variation; comparative genomic hybridization; genome evolution; Arabidopsis
Receptor-like kinases (RLK) are among the largest gene families encoded by plant genomes. Common structural features of plant RLKs are an extracellular ligand binding domain, a membrane spanning domain, and an intracellular protein kinase domain. The largest subfamily of plant RLKs is characterized by extracellular leucine-rich repeat (LRR-RLK) structures that are known biochemical modules for mediating ligand binding and protein–protein interactions. In the frame of the Arabidopsis Functional Genomics Network initiative of the German Research Foundation (DFG) we have conducted a comprehensive survey for and functional characterization of LRR-RLKs potentially implicated in Arabidopsis thaliana immunity to microbial infection. Arabidopsis gene expression patterns suggested an important role of this class of proteins in biotic stress adaptation. Detailed biochemical and physiological characterization of the brassinosteroid insensitive 1-associated receptor kinase 1 (BAK1) revealed brassinolide-independent roles of this protein in plant immunity, in addition to its well-established function in plant development. The LRR-RLK BAK1 has further been shown to form heteromeric complexes with various other LRR-RLKs in a ligand-dependent manner, suggesting a role as adapter or co-receptor in plant receptor complexes. Here, we review the current status of BAK1 and BAK1-interacting LRR-RLKs in plant immunity.
plant innate immunity; LRR-RLKs; receptor complexes; BAK1
Over the past years, microarray databases have increased rapidly in size. While they offer a wealth of data, it remains challenging to integrate data arising from different studies. Here we propose an unsupervised approach to large-scale meta-analysis of Arabidopsis thaliana whole genome expression datasets to gain additional insights into the function and regulation of genes. Applying kernel principal component analysis and hierarchical clustering, we found three major groups of experimental contrasts sharing a common biological trait. Genes associated with two of these clusters are known to play an important role in indole-3-acetic acid (IAA) mediated plant growth and development or pathogen defense. Novel functions could be assigned to genes including a cluster of serine/threonine kinases that carry two uncharacterized domains (DUF26) in their receptor part implicated in host defense. With the approach shown here, hidden interrelations between genes regulated under different conditions can be unraveled.
Arabidopsis thaliana; microarray; unsupervised meta-analysis; function prediction; database; gene expression
Recent years have seen an explosion in plant genomics, as the difficulties inherent in sequencing and functionally analyzing these biologically and economically significant organisms have been overcome. Arabidopsis thaliana, a versatile model organism, represents an opportunity to evaluate the predictive power of biological network inference for plant functional genomics.
Here, we provide a compendium of functional relationship networks for Arabidopsis thaliana leveraging data integration based on over 60 microarray, physical and genetic interaction, and literature curation datasets. These include tissue, biological process, and development stage specific networks, each predicting relationships specific to an individual biological context. These biological networks enable the rapid investigation of uncharacterized genes in specific tissues and developmental stages of interest and summarize a very large collection of A. thaliana data for biological examination. We found validation in the literature for many of our predicted networks, including those involved in disease resistance, root hair patterning, and auxin homeostasis.
These context-specific networks demonstrate that highly specific biological hypotheses can be generated for a diversity of individual processes, developmental stages, and plant tissues in A. thaliana. All predicted functional networks are available online at http://function.princeton.edu/arathGraphle.
Heterosis is an important phenomenon in agriculture. However, heterosis often greatly varies among hybrids and among traits. To investigate heterosis across a large number of traits and numerous genotypes, we evaluated 12 life history traits on parents and hybrids derived from five Arabidopsis thaliana ecotypes (Col, Ler-0, Cvi, Ws, and C24) by using a complete diallel analysis containing 20 hybrids. Parental contributions to heterosis were hybrid and trait specific with a few reciprocal differences. Most notably, C24 generated hybrids with flowering time, biomass, and reproductive traits that often exceeded high-parent values. However, reproductive traits of C24 and Col hybrids and flowering time traits of C24 and Ler hybrids had no heterosis. We investigated whether allelic variation at flowering time genes FRIGIDA (FRI) and FLOWERING LOCUS C (FLC) could explain the genotype- and trait-specific contribution of C24 to hybrid traits. We evaluated both Col and Ler lines introgressed with various FRI and FLC alleles and hybrids between these lines and C24. Hybrids with functional FLC differed from hybrids with nonfunctional FLC for 21 of the 24 hybrid-trait combinations. In most crosses, heterosis was fully or partially explained by FRI and FLC. Our results describe the genetic diversity for heterosis within a sample of A. thaliana ecotypes and show that FRI and FLC are major factors that contribute to heterosis in a genotype and trait specific manner.
heterosis; FRIGIDA; FLOWERING LOCUS C; diallel; Arabidopsis thaliana
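The arithmetic behind a complete diallel and the usual heterosis measures can be sketched as follows. Five parents crossed in all ordered combinations, excluding selfs, give 5 × 4 = 20 hybrids, which matches the count in the abstract. The mid-parent and high-parent formulas are the standard textbook definitions and are not necessarily the exact statistics used in the study; the function names and trait values are hypothetical, and the high-parent measure assumes that larger trait values are "better", which need not hold for flowering time.

```python
from itertools import permutations

# The five ecotypes named in the abstract.
PARENTS = ["Col", "Ler-0", "Cvi", "Ws", "C24"]


def diallel_crosses(parents):
    """All ordered (female, male) pairs without selfs, reciprocals included."""
    return list(permutations(parents, 2))


def mid_parent_heterosis(f1, p1, p2):
    """Percent deviation of the hybrid from the mean of its two parents."""
    mid = (p1 + p2) / 2
    return (f1 - mid) / mid * 100


def high_parent_heterosis(f1, p1, p2):
    """Percent deviation of the hybrid from the higher-valued parent.

    Assumes larger trait values are the 'better' direction.
    """
    high = max(p1, p2)
    return (f1 - high) / high * 100


crosses = diallel_crosses(PARENTS)
print(len(crosses))  # 20 hybrids for 5 parents

# Hypothetical trait values (e.g. biomass in arbitrary units).
hybrid, parent_a, parent_b = 12.0, 8.0, 10.0
mph = mid_parent_heterosis(hybrid, parent_a, parent_b)
hph = high_parent_heterosis(hybrid, parent_a, parent_b)
print(round(mph, 2), round(hph, 2))
```

Keeping reciprocal crosses as separate ordered pairs is what allows reciprocal differences, such as those mentioned in the abstract, to be examined.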
Crop yield is a highly complex quantitative trait. Historically, successful breeding for improved grain yield has led to crop plants with improved source capacity, altered plant architecture, and increased resistance to abiotic and biotic stresses. To date, transgenic approaches towards improving crop grain yield have primarily focused on protecting plants from herbicide, insects, or disease. In contrast, we have focused on identifying genes that, when expressed in soybean, improve the intrinsic ability of the plant to yield more. Through the large scale screening of candidate genes in transgenic soybean, we identified an Arabidopsis thaliana B-box domain gene (AtBBX32) that significantly increases soybean grain yield year after year in multiple transgenic events in multi-location field trials. In order to understand the underlying physiological changes that are associated with increased yield in transgenic soybean, we examined phenotypic differences in two AtBBX32-expressing lines and found increases in plant height and node, flower, pod, and seed number. We propose that these phenotypic changes are likely the result of changes in the timing of reproductive development in transgenic soybean that lead to the increased duration of the pod and seed development period. Consistent with the role of BBX32 in A. thaliana in regulating light signaling, we show that the constitutive expression of AtBBX32 in soybean alters the abundance of a subset of gene transcripts in the early morning hours. In particular, AtBBX32 alters transcript levels of the soybean clock genes GmTOC1 and LHY-CCA1-like2 (GmLCL2). We propose that through the expression of AtBBX32 and modulation of the abundance of circadian clock genes during the transition from dark to light, the timing of critical phases of reproductive development is altered.
These findings demonstrate a specific role for AtBBX32 in modulating soybean development, and demonstrate the validity of expressing single genes in crops to deliver increased agricultural productivity.