Natural variation provides a valuable resource to study the genetic regulation of quantitative traits. In quantitative trait locus (QTL) analyses this variation, captured in segregating mapping populations, is used to identify the genomic regions affecting these traits. The identification of the causal genes underlying QTLs is a major challenge for which the detection of gene expression differences is of major importance. By combining genetics with large scale expression profiling (i.e. genetical genomics), resulting in expression QTLs (eQTLs), great progress can be made in connecting phenotypic variation to genotypic diversity. In this review we discuss examples from human, mouse, Drosophila, yeast and plant research to illustrate the advances in genetical genomics, with a focus on understanding the regulatory mechanisms underlying natural variation. With their tolerance to inbreeding, short generation time and ease to generate large families, plants are ideal subjects to test new concepts in genetics. The comprehensive resources which are available for Arabidopsis make it a favorite model plant but genetical genomics also found its way to important crop species like rice, barley and wheat. We discuss eQTL profiling with respect to cis and trans regulation and show how combined studies with other ‘omics’ technologies, such as metabolomics and proteomics may further augment current information on transcriptional, translational and metabolomic signaling pathways and enable reconstruction of detailed regulatory networks. The fast developments in the ‘omics’ area will offer great potential for genetical genomics to elucidate the genotype-phenotype relationships for both fundamental and applied research.
Ever since the paradigm that gene transcription precedes biological function became established, research on gene function has focused on expression studies. With the ever increasing availability of genomic sequences and the introduction of microarray technology, which enables the high-throughput analysis of gene expression, expression profiling has rapidly become a favorite tool for many researchers. In a typical microarray experiment specific conditions or developmental stages are studied by comparing expression profiles and determining differences in gene transcription. The object of profiling can be a single genotype showing phenotypic diversity in a spatial and temporal manner, e.g. in different tissues and developmental stages, or when exposed to different growing conditions. In this way, large compendia of expression data have been acquired, providing ontological information on genes involved in developmental control and environmental responses [3-5].
Although much about the relationship between the temporal expression of genes and their function can be learned from these analyses, they often provide no information about the genetic regulation of transcription, or about whether expression differences are a cause or a consequence of phenotypic differences. Therefore, instead of comparing different conditions of a single genotype, equivalent samples of different genotypes varying in the trait of interest are often analyzed. These can be natural variants within a species, artificial mutants or transgenics such as knockout and overexpression lines. Such analyses have proven to be extremely powerful in determining directionality in biological pathways, especially for qualitative traits [6, 7]. However, it becomes extremely difficult to define a proper experimental setup when the trait of interest is complex and has a quantitative character. Geneticists are used to dealing with this type of complex trait by using the power of natural variation within species.
In segregating populations, derived from crosses between distinct parents and genotyped with molecular markers, linkage is sought between variation in the trait of interest and genotypic diversity . For this purpose a broad range of software tools and statistical analyses are available. The identified genomic regions explaining the observed phenotypic variation are commonly referred to as quantitative trait loci (QTLs), which can subsequently be used for marker assisted breeding purposes without further knowledge of the underlying genes . Whenever the purpose is to identify the causal genes or even nucleotide polymorphisms (QTNs) underpinning a given QTL, one needs to invest in follow-up analyses for fine mapping and ultimately the cloning of a QTL . This approach however, is very labor intensive and time consuming and confines the classical QTL mapping to a low throughput technique.
Like for many physiological traits, variation in gene expression often shows a quantitative distribution, hence, all the classical statistical tools and concepts for QTL mapping can be applied for its genetic dissection. Thus, subjecting expression variation to linkage analysis identifies genetic regulatory loci, and ideally genes, explaining the observed variation. Knowing the position of genes and their corresponding expression QTLs (eQTLs) renders great opportunities for dissecting quantitative traits. This was first recognized by Jansen and Nap  who outlined a concept, coined ‘genetical genomics’, in which the combination of a genotyped segregating population (i.e. genetics) and genome-wide expression profiling (i.e. genomics) is used to formulate hypothetic regulatory pathways and unravel complex traits in a more high-throughput manner. Analogously, similar approaches can be followed for data derived from other ‘omic’ technologies such as proteomics (pQTLs) and metabolomics (mQTLs) .
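To make the mapping step concrete, the following is a minimal sketch of a single-marker scan for one expression trait in a RIL-type population, assuming two homozygous genotype classes per marker and no missing data. The function names (`lod_score`, `scan`), the marker names and the expression values are hypothetical and chosen for illustration only. The LOD score is computed as (n/2)·log10(RSS0/RSS1), comparing a single-mean model with a model that fits one mean per genotype class. Published studies typically rely on interval or multiple-QTL mapping with permutation-based thresholds, so this illustrates the principle rather than the procedure of any particular study.

```python
import math


def lod_score(expression, genotypes):
    """LOD for one marker: single-mean model versus one mean per genotype class."""
    n = len(expression)
    grand_mean = sum(expression) / n
    rss_null = sum((y - grand_mean) ** 2 for y in expression)
    if rss_null == 0.0:
        return 0.0  # no expression variation, so nothing to explain
    rss_alt = 0.0
    for allele in set(genotypes):
        group = [y for y, g in zip(expression, genotypes) if g == allele]
        group_mean = sum(group) / len(group)
        rss_alt += sum((y - group_mean) ** 2 for y in group)
    if rss_alt == 0.0:
        return float("inf")  # genotype explains the trait perfectly
    return (n / 2.0) * math.log10(rss_null / rss_alt)


def scan(expression, marker_genotypes):
    """Return a LOD score for every marker (hypothetical helper)."""
    return {m: lod_score(expression, g) for m, g in marker_genotypes.items()}


# Illustrative data: transcript level of one gene in eight inbred lines.
expression = [5.1, 4.9, 5.0, 5.2, 7.0, 7.2, 6.9, 7.1]
marker_genotypes = {
    "m1": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "m2": ["A", "A", "A", "B", "B", "B", "B", "B"],
    "m3": ["A", "B", "A", "B", "A", "B", "A", "B"],
}

lod_by_marker = scan(expression, marker_genotypes)
best_marker = max(lod_by_marker, key=lod_by_marker.get)
print(best_marker, round(lod_by_marker[best_marker], 2))
```

Running the same scan for every transcript on the array yields the genome-wide eQTL profile discussed in the text; in practice the significance threshold would have to account for the number of markers and the number of transcripts tested.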
The first study reporting a proof of principle of genetical genomics was performed in Saccharomyces cerevisiae . In a relatively small population of 40 haploid segregants from a cross between a laboratory and a wild type strain, it was shown that parental differences in gene expression were highly heritable and amenable to genetic mapping. This first report was quickly followed by more comprehensive eQTL studies in higher eukaryotes  and has now been applied in a broad range of taxonomic kingdoms including yeast [13, 15-17], nematodes , insects [19, 20], plants [21-24], rodents [25-27] and humans [28-33]. All studies demonstrated the power of combining gene expression and genetic analyses to refine molecular pathways involved in complex phenotypes and to identify key driver genes thereof. Moreover, they have shown general and conserved mechanisms of expression regulation which improved our understanding of adaptive strategies and evolutionary concepts [19, 34].
In this review we will discuss the genetic architecture of gene expression regulation, embarking on recent findings in the reference plant Arabidopsis thaliana, the implications of genetical genomics approaches for crop species and the impact of genetic analyses of ‘omics’ data on the construction of regulatory networks. We will discuss future prospects and speculate on the utilization of advancing technological developments for genetic studies.
The detection of eQTLs depends on a number of factors, which together determine the proportion of genetically regulated genes that can be observed. First, biological factors such as the assayed tissue, developmental stage or environmental conditions and the genotypic diversity present in the mapping population determine which genes are expressed and exhibit allelic variants, respectively. Second, statistical issues like population type and size, genetic map quality, measurement accuracy and the number of genes analyzed determine mapping power and detection thresholds. Because all these aspects vary between different experiments, reported fractions of regulated genes range from only a handful to over 50% of the total gene content.
Given the prerequisite of allelic variation, there can be many reasons why genes are differentially expressed in genotypically diverse individuals of a species. Well-known phenomena are allelic variants of transcription factors and other regulators, cis-elemental variation in promoter sequences, differences in mRNA stability, copy number variation and genomic rearrangements such as translocations, insertions and deletions. The latter include gene loss and duplication, resulting in neo- and sub-functionalization. Most of these variations in DNA structure will result in eQTLs but, depending on the position of the causal polymorphism, an important distinction is made between local and distant eQTLs Fig. (1). Local eQTLs can be the result of closely linked trans-acting factors but in the majority of cases result from cis-regulatory variation in the genes under study. By definition eQTLs acting in cis affect transcription initiation, rate and/or transcript stability in an allele-specific manner. In addition, cis-regulated genes might encode regulators affecting the expression of downstream target genes in trans. Although the exact proportion varies between studies, the occurrence of cis-acting eQTLs is substantial, ranging from one-third to half of the total number of eQTLs.
However, because of limitations in mapping resolution, eQTL support intervals may still contain multiple genes and as a result the classification of cis-eQTLs should be used with care. To discriminate true cis-regulatory polymorphisms from local trans-regulation, allele specific expression (ASE) assays can be performed . In such assays a transcribed polymorphism is used to enable discrimination between the parental transcripts and test for allele specific expression in an F1 hybrid. Because both parental alleles share the same genetic background in F1 hybrids, and therefore are equally exposed to trans-acting factors, any difference in expression can only be explained by true cis-acting variation. Usually, ASE-assays are performed by single gene qRT-PCR approaches but the recent development of whole genome SNP-tile microarrays (e.g. in Arabidopsis) enables the simultaneous testing of genome-wide ASE .
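The logic of an ASE assay can be sketched as a test of whether the two parental alleles contribute equally to the transcript pool of an F1 hybrid. The example below assumes that allele-specific measurements are available as counts (for instance, reads or clones carrying each parental variant of the transcribed polymorphism) and applies an exact two-sided binomial test against a 1:1 ratio. The function name `ase_two_sided_p` and the counts are hypothetical, and the binomial model is an assumption: real measurements are often overdispersed and are usually calibrated against genomic DNA from the same hybrid, so this is an illustration of the reasoning and not a recommended analysis pipeline.

```python
from math import comb


def ase_two_sided_p(count_a, count_b):
    """Exact two-sided binomial p-value for a 1:1 allelic ratio in an F1 hybrid."""
    n = count_a + count_b
    if n == 0:
        return 1.0  # no informative observations
    k = min(count_a, count_b)
    # Under p = 0.5 the distribution is symmetric, so double the lower tail.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * lower_tail)


# Hypothetical counts for two genes that both show a local eQTL.
p_balanced = ase_two_sided_p(50, 50)  # alleles equally expressed in the hybrid
p_skewed = ase_two_sided_p(90, 10)    # one parental allele dominates

for label, p in (("balanced", p_balanced), ("skewed", p_skewed)):
    verdict = "consistent with cis" if p < 0.01 else "no evidence for cis"
    print(label, verdict)
```

A gene with a local eQTL but a balanced allelic ratio in the hybrid would, under these assumptions, be a candidate for local trans-regulation rather than true cis-regulation.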
Although expression differences are treated as quantitative traits in mapping approaches, qualitative differences, characterized by a total lack of expression for one of the allelic variants, can also be observed. The variation in a measurable detection signal can be due to differences in hybridization efficiency, which can be confirmed with genomic DNA hybridization, or genuine loss of transcription. Hybridization efficiency differences are often caused by polymorphisms in the complementary sequences of the microarray probes or mRNA splice variation and are not necessarily accompanied by transcription differences. True transcription variation however, can be caused by strong polymorphisms in promoter regions, premature stop mutations and even the complete absence of genes in one of the parental lines . Both hybridization and true transcription variation will lead to strong cis-eQTLs which can subsequently be used as molecular markers, allowing the construction of high-resolution maps [40, 41].
The majority of differentially expressed genes will show a quantitative expression profile with complex inheritance patterns. This is because in general genes are regulated by many independent factors which can show up as trans-eQTLs. Because of the multiplicity of regulators and the often-observed epistasis between them, each trans-eQTL can have a relatively small effect. In addition, compared to the direct regulation of cis-eQTLs, the accumulation of stochastic variation in the expression of trans-regulated genes is indirectly also determined by the expression variation of one or more regulators. As a result the detected number of trans-eQTLs relative to the number of cis-eQTLs drops when the stringency for detection is increased .
Whereas cis-eQTLs are inherently associated with the gene in which they reside, a single gene can be responsible for the appearance of multiple trans-eQTLs throughout the genome. As a consequence the genome-wide distribution of cis-eQTLs is dependent on local gene density, although variation in chromatin structure can have an impact on the exposure of eQTLs. The distribution of trans-eQTLs however, can deviate substantially from what can be expected based on gene density. The identification of so-called hot spots, genomic regions with a high density of trans-eQTLs, can be explained by major regulators, e.g. transcription factors, which influence the expression of many downstream genes. In Arabidopsis this was illustrated by the large number of genes mapping to the ERECTA locus, a gene well-known for its pleiotropic effects on many morphological and developmental traits . These findings suggest that the effects of key-regulators in gene expression are progressed to the phenotypic level. This was recently confirmed in a QTL study comparing transcript, protein and metabolite data with phenotypic traits . Here, only a limited number of QTL hot spots with major, system-wide effects were detected, indicating that most of the genotypic variation is phenotypically buffered. These findings support the theory of biological robustness where hotspots indicate fragilities in this genetic buffering system . Until now only a few reported hotspots have been verified and the number of detected hotspots is far from consistent between different genetical genomics studies. The latter reflects differences in the analyzed populations, species and conditions used and additionally might be the consequence of different statistical procedures used to identify eQTLs . Because of the difficulties in cloning QTLs and the large biological relevance of hotspots, additional sources of information are often used to reduce the number of candidate genes or even predict the causal regulator. 
Such methods use information on gene ontology, (co-)expression, transcription factor binding sites and targets, ChIP-Seq and protein-protein interactions. Together with computational methods such as regulatory modeling, this can substantially reduce the number of candidate genes and prioritize the remaining candidates for further experimentation Fig. (2).
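One simple way to flag the trans-eQTL hot spots described above is to divide the genome into bins and ask which bins contain more trans-eQTLs than expected by chance. The sketch below assumes equally sized bins and a uniform Poisson null with a Bonferroni correction over bins; these are simplifying assumptions of this example, and, as noted above, different statistical procedures can lead to different hot spot calls. The names `poisson_sf` and `find_hotspots` and the bin assignments are hypothetical.

```python
import math
from collections import Counter


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the cumulative sum."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def find_hotspots(trans_eqtl_bins, n_bins, alpha=0.01):
    """Return bins whose trans-eQTL count exceeds a uniform Poisson expectation."""
    counts = Counter(trans_eqtl_bins)
    expected_per_bin = len(trans_eqtl_bins) / n_bins
    threshold = alpha / n_bins  # Bonferroni correction over all bins
    return sorted(
        b for b, c in counts.items() if poisson_sf(c, expected_per_bin) < threshold
    )


# Hypothetical data: the genome is split into 20 bins; each entry is the bin
# in which one trans-eQTL peak was located.
trans_eqtl_bins = [7] * 25 + list(range(15))
hotspots = find_hotspots(trans_eqtl_bins, n_bins=20)
print(hotspots)
```

A uniform null ignores local gene density and marker spacing, so a more careful analysis would use permutations of the genotype data to derive the expected number of eQTLs per bin.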
As discussed above, many principles of genetic regulation are shared among different phylogenetic taxa. Not all species, however, are equally suited for large-scale experimentation. Sometimes evolutionary distance hampers the translation of biologically relevant findings when mechanisms are less conserved, e.g. in yeast and Drosophila, or long generation times, inbreeding depression and moral and ethical issues hinder experimentation, e.g. in humans and other mammals. Plants, representing one of the largest kingdoms, are therefore often used to test concepts in genetic studies. The ease of generating large families from experimental crosses and the ability to store genotypes as seeds or through clonal propagation make plants ideal subjects to study the mechanistic basis of genetic regulation of traits.
The comprehensive resources which are available for Arabidopsis thaliana, such as a whole genome sequence, a large collection of natural variants and an ever-increasing number of molecular tools, made it the favorable model for genetical genomics research. As a non-obligate selfing species Arabidopsis combines the ability to cross-pollinate with high tolerance to inbreeding. Together with its short generation time and high reproductive success rate this enables the fast generation of large experimental populations such as Recombinant Inbred (RI) and Introgression Line (IL) populations. The availability and immortal character of such populations enable the accurate estimation of phenotypic values through replicated measurements and allows the testing of traits in different environments .
Traditionally QTL studies of ‘classical’ physiological traits in RIL populations are followed by mendelizing detected QTLs in near isogenic lines (NILs) for detailed analyses. By isolating QTLs from their genetic background it becomes much simpler to study their genetic effect and relate resulting phenotypes to other processes. Because it is expected that much of the phenotypic variation is the resultant of differences in gene expression and phenotypic perturbation in turn leads to transcriptional reprogramming, data mining for relationships between trait values and expression levels has become a common tool . Very often mutants, knockouts or over-expression lines are used for these purposes, in which the effect of a single gene perturbation is tested on both the phenotypic and the expression level. For complex traits however, the causal genes leading to altered phenotypes are often not known and QTL analyses only identify genomic regions containing such genes. Nevertheless, using RIL populations to identify QTLs for a phenotypic trait and subsequently analyzing NILs for expression differences can be a powerful alternative to explore the functional relationship between genotype and phenotype (e.g. [49, 50]). Although the regions spanned by NILs can still contain hundreds of genes, of which many may display allelic variation between accessions, the cis-regulated genes are strong candidates explaining phenotypic diversity. High detection stringency can limit the number of differentially expressed genes to a reasonable number of candidate genes with strong local eQTLs .
The availability of a whole genome sequence in Arabidopsis provides unique opportunities, especially when multiple (epistatic) phenotypic QTLs are detected. Knowing the position of genes allows the identification of strong cis-regulated genes collocating with phenotypic QTLs. An early eQTL study in Arabidopsis analyzed genome-wide gene expression under conditions mimicking shoot regeneration in a limited population of only 30 individuals. Two of the eQTL hotspots found coincided with shoot regeneration QTLs. The most significant eQTLs within these hotspot regions showed local chromosomal linkage with their corresponding genes but the majority acted distantly. These results suggest that heritable cis-regulated expression changes of key-regulators determine in trans the expression of many genes related to differences in shoot regeneration efficiency between accessions. They also indicate that a long signaling cascade may exist between the causal genotypic polymorphism and the eventual phenotype.
In contrast to the former study it is not always necessary to combine phenotypic measurements with expression analysis. Often, many genes are known to play a role in the exposure of certain traits without knowledge about the genetic regulation of these genes. Specific analysis of such genes can help to identify common regulators. In the first genome-wide eQTL study in Arabidopsis, using a complete RIL population (162 lines), this concept was used to predict possible key-regulators of flowering time and circadian rhythms . The benefits of using large populations for eQTL studies became also apparent in another study where expression analyses were performed in a RIL population of 211 individuals . Whereas in the majority of cases only a single QTL could be detected per differentially expressed gene in the aforementioned studies, here the expression of many genes was controlled by multiple eQTLs. Moreover, a much larger fraction of genetically regulated genes was identified with a higher proportion of trans-regulated genes of which the vast majority exhibited small effects.
The studies performed in Arabidopsis show that the statistical power to detect eQTLs depends largely on population size. Nonetheless, it can not be excluded that differences in the analyzed tissues, developmental stages and populations used, such as parental variation, linkage distortion and recombination frequency, are responsible for part of the observed differences. All studies however, clearly demonstrated that variation in gene expression is for a large part genetically controlled, with much stronger effects of cis-eQTLs compared to trans-eQTLs. In general, cis-eQTLs also exhibit much higher heritability values and are obvious candidates to act as causal regulators of genes showing trans-eQTLs in the hotspots that could be detected in each of the discussed studies. The detection of regulatory loci for gene expression and the elucidation of their interaction networks might therefore provide the research community with a powerful tool to unravel the complex nature of natural variation in quantitative traits.
Genetical genomics studies in Arabidopsis and other model species have shown the enormous benefits of the availability of an annotated genome sequence. However, until now fully annotated genome sequence information for agronomically important species is available for only a limited number of species, including Oryza sativa [51, 52], Populus trichocarpa, Vitis vinifera and papaya. This relatively low number of sequenced crop species can be explained by their often immense (polyploid) genome sizes and the highly repetitive nature of many crop genomes. Nevertheless, sequencing efforts for many more species are ongoing and the increasing power of next-generation sequencing will soon lead to an almost unrestricted availability of genomic sequence information. Although an annotated genome is a valuable resource for the comparison of the genomic position of genes and their respective eQTLs, for most crop species this is not feasible yet. Nonetheless, several studies in crops for which genetic maps are available have shown that comprehensive genetical genomics approaches are possible without the need for annotated genome sequences [57-62].
Illustratively, one of the first large genetical genomics experiments was performed in an economically important species, viz. Eucalyptus . QTL analysis of transcript levels of lignin-related genes showed that their mRNA abundance is regulated by two genetic loci coinciding with QTLs for stem diameter growth. Genetic mapping of some of the candidate genes showed that most of the lignin genes are under control of a trans eQTL hotspot which suggests that transcription of many of the genes in this pathway are under a higher level of coordinated control. A strong cis-regulated gene encoding S-adenosylmethionine synthase, collocating with the growth and transcription QTLs, was presented as the possible rate limiting step in lignin biosynthesis and as such a strong candidate for the observed QTLs .
In some crops the required availability of genomic sequence data for large-scale classification of cis/trans eQTLs can be circumvented by making use of synteny with other species. In wheat, synteny with rice was used to assist the physical mapping of wheat genes. A genetical genomics approach was conducted in a segregating population of 41 doubled haploid (DH) lines to study agronomically important seed quality parameters. Assuming that the most significantly differentially expressed genes were cis-regulated, a selection of genes was subjected to synteny analyses. This enabled the positioning of genes with biologically relevant linkage to phenotypic traits in a species for which a full genome sequence is not available yet.
In the absence of genome-wide micro-arrays, expressed sequence tag (EST) libraries allow the construction of species specific sub genome-scale microarrays. In maize, cell-wall digestibility, which is the major target for improving the feeding value of forage maize, was analyzed in a RIL population . In addition forty extreme RIL lines were hybridized on a small microarray with 439 preselected candidate ESTs for cell-wall digestibility genes for which 89 eQTLs could be mapped. One eQTL hotspot collocated with a cell-wall digestibility related QTL . The application of genetical genomics approaches can be of special interest here when the detection of eQTLs is combined with ASE assays. The thus identified cis-regulated genes can then be positioned on the genetic map where they may serve as candidate genes underpinning phenotypic QTLs.
An interesting alternative for species for which no (EST) sequence information is available at all, and hence no microarrays can be produced, is a gel-based cDNA-AFLP approach . Here AFLP band intensities, reflecting expression differences, are profiled for a large proportion of the transcribed gene pool enabling standard eQTL analyses procedures. AFLP bands showing significant eQTLs can subsequently be sequenced to obtain the identity of the gene from which the fragment derived. Additionally, the cDNA-AFLPs can be used to construct a genetic map.
The examples given above show that genetical genomics is not necessarily restricted to model species but can be applied to any species in which experimental crosses are possible, even in the absence of genomic sequence or genetic map information. The potential of combining phenotypic QTL analysis with gene expression traits has been shown in a number of economically important species, e.g. Populus, cotton, rice and sunflower. The application of genetical genomics is particularly promising in breeding programs of crops that take advantage of hybrid vigour. The eQTLs involved in heterosis will segregate consistently in an F1 backcross population, thereby identifying valuable targets for marker assisted breeding for the best combination of alleles in the parents of the hybrid.
Genetical genomics harbors the potential to dissect the genetic regulation of a specific biological process. Therefore, methods to reconstruct regulatory networks from eQTL data have obtained much attention. Prioritizing on cis-eQTLs that collocate with a phenotypic QTL is a valuable approach for causal gene discovery, but in many cases little is known about the global regulation, interaction and function of genes that control a biological process. Identification of a set of genes with a trans-eQTL at an identical position can help to dissect genetic variation that is influencing an entire pathway and can lead to the identification of initiating polymorphisms upstream in a network . Questions about the regulatory level at which trans polymorphisms act in the global gene expression network and what their effect is on phenotypic variation and heritability can only be addressed when eQTLs are further dissected.
With a genetical genomics approach one can use the natural genetic variation as a source of perturbations to elucidate the structure of networks. In a summation approach eQTLs for all genes in the analysis are simply superimposed to identify common regions which control many genes. Such an approach does not require any a priori network information but applies subsequent Gene Set Enrichment Analysis (GSEA) using gene ontology (GO) annotation or other descriptors to test whether selected genes share a common biological function. If the network under study is largely known or at least predicted, an a priori analysis can be performed. Here, the expression levels of individual genes in the network are converted into a common measure for the expression level of the entire network, which is then used as the trait for QTL analysis. This strategy was tested in an Arabidopsis RIL population for 20 gene expression networks and resulted in statistically significant network variation for 18 of the 20 predefined networks. Combining summation, GSEA and a priori network analyses allows the generation of a more specific hypothesis about phenotypic effects of network eQTLs. In a study using 175 genes, selected for their involvement in the regulation of flowering and circadian rhythms, 83 genes showed an eQTL. By combining co-expression analysis, which becomes feasible for microarray compendia of large populations, and positional information of genes and their eQTLs, it was possible to construct regulatory networks of key-regulators and their target genes, predicting unknown relationships and confirming common knowledge.
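The a priori strategy, in which the expression levels of all genes in a predefined network are condensed into one value per line, can be sketched as follows. The text does not specify how the common measure is computed, so the mean of per-gene z-scores used here is an assumption of this example, as are the function name `network_score`, the gene labels and the expression values. The resulting score per line can then be treated like any other quantitative trait in a QTL analysis.

```python
from statistics import mean, pstdev


def network_score(expression_by_gene):
    """Average per-gene z-scores into one network-level value per line."""
    z_rows = []
    for values in expression_by_gene.values():
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # an invariant gene carries no information on variation
        z_rows.append([(v - mu) / sd for v in values])
    if not z_rows:
        raise ValueError("no gene in the network varies across lines")
    n_lines = len(z_rows[0])
    return [mean(row[i] for row in z_rows) for i in range(n_lines)]


# Hypothetical network of three genes measured in four inbred lines.
network_expression = {
    "geneA": [1.0, 1.0, 3.0, 3.0],
    "geneB": [10.0, 10.0, 14.0, 14.0],
    "geneC": [5.0, 5.0, 5.0, 5.0],  # invariant, therefore skipped
}

scores = network_score(network_expression)
print(scores)
```

Standardizing each gene before averaging prevents highly expressed genes from dominating the network value; a weighted average or a first principal component would be alternative choices under the same general idea.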
Pre-selection of known pathways can obviously hinder the elucidation of novel networks in a species, for which much effort is made to develop methods to translate eQTL data into network information using an a posteriori approach. As the precise balance of active components within a tightly controlled biological pathway is in part maintained by coordinately regulated gene expression, this creates possibilities to model networks by exploring co-expression of untargeted genes. To validate this hypothesis, gene expression in liver from a population of 60 mice with variation in diabetes susceptibility was analyzed . The combination of correlation analysis across a genetic dimension and linkage mapping enabled the identification of regulatory networks, functional predictions for uncharacterized genes and characterization of novel members of known pathways. A similar approach in Drosophila, complemented with information about gene ontology, tissue specific expression and transcription factor binding sites, led to the construction of multiple interconnected networks with biological relevance for phenotypic traits .
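The idea of exploring co-expression across the genetic dimension can be illustrated with a small correlation network. The sketch below computes pairwise Pearson correlations between transcript levels measured over the individuals of a segregating population and keeps the gene pairs whose absolute correlation exceeds a cut-off. The threshold of 0.8, the function names and the data are assumptions made for illustration; the cited studies combined such correlations with linkage mapping and additional annotation, which is not reproduced here.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def coexpression_edges(expression_by_gene, min_abs_r=0.8):
    """List gene pairs whose expression is strongly correlated across lines."""
    edges = []
    for g1, g2 in combinations(sorted(expression_by_gene), 2):
        r = pearson(expression_by_gene[g1], expression_by_gene[g2])
        if abs(r) >= min_abs_r:
            edges.append((g1, g2, r))
    return edges


# Hypothetical transcript levels of three genes in five segregating individuals.
expression_by_gene = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],
}

edges = coexpression_edges(expression_by_gene)
print([(a, b) for a, b, _ in edges])
```

Correlation alone does not establish direction; combining the edges with eQTL positions, as described above, is what allows regulators to be separated from targets.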
Understanding the mechanisms underlying trait regulation requires the identification of specific causal polymorphisms. For this purpose sophisticated self-learning algorithms have been developed which make use of conservation, type and position of a particular SNP to prioritize causal regulators by estimating the likeliness that it plays a causal role in gene expression variation . Extending such approaches might also provide the means to distinguish whether variation in gene expression or a regulatory network is the cause or a consequence of an altered phenotype, resulting in the construction of probabilistic directional networks . Defining such causal networks is also known as reverse engineering, because it aims at understanding how the system works as an integrated whole instead of only defining the functionally related components.
Although phenotypic variation can be partly explained by genetic variation in gene expression, this alone does not fully cover the possible differences in the regulatory mechanisms of an organism. Similar transcript levels of allelic gene variants can still result in varying protein levels because of variance in translational activity, protein degradation and post-translational modifications . Furthermore, variation in coding sequences can alter protein function resulting in a flexible metabolome in terms of chemical structure and function . Integrating ‘omics’ data such as gene expression, SNPs, metabolomics and proteomics in genetic studies can therefore reduce the number of candidate genes for a given QTL from hundreds to a manageable list without excluding regulatory mechanisms a priori. Because of the analytical complexity in analyzing large numbers of protein samples, genetical proteomics studies are limited (e.g. ) but advances made in biochemical detection have already enabled the large-scale untargeted genetic analysis of metabolic content [76-78].
The complex relationship between different levels of regulation was illustrated in a study integrating parallel QTL analyses of the expression of genes, activity of encoded enzymes and metabolites involved in primary carbohydrate metabolism . It could be shown that regulation acted on each of the intermediary levels of the path from genotype to phenotype. Although seemingly specific independent regulation could be observed for each analyzed trait, a strong interconnectivity existed between them resulting in coherent systematic differences between population individuals.
The importance of the tight regulation of such an essential component in plant development as primary metabolism was also demonstrated in an Arabidopsis RIL population where plant biomass was related to the metabolic profile . Again, no relationship could be observed between individual metabolites and plant growth but a strong canonical correlation was observed between biomass and a specific combination of metabolites in central metabolism. The power of large-scale metabolomic profiling combined with detailed morphological analysis was also shown in tomato . Significant QTLs could be detected for the accumulation of a large number of primary metabolites together with loci that modify yield-associated traits. With this information a correlation network revealing associations between phenotype, metabolic content and nutritional value could be generated. These studies show that analyzing phenotypic traits and metabolic profiles in a genetic mapping population has great potential for the generation of biomarkers in breeding programs.
Whereas primary metabolites are essential in central metabolism governing growth and development, plants also accumulate large amounts of secondary metabolites. These are believed to be less essential but may play an important role in the adaptation of plants to local environments. Since Arabidopsis can be found in a wide variety of habitats, variation in secondary metabolism might explain much of the evolutionary success of the species. A large untargeted screen of variation in secondary metabolic composition indeed revealed a high proportion of genetically controlled compounds. The highly flexible nature of the metabolome was clearly shown by the fact that more than one-third of the compounds present in the RILs were not detected in either parent but were the result of recombination in biosynthesis pathways. The genetic information obtained from such studies is of great value for the construction of molecular biosynthesis networks, especially if it can be combined with expression data.
This strategy was applied in the genetic analysis of glucosinolate biosynthetic networks, which were studied at both the transcriptional and the metabolic level. In all cases, variation in gene expression also affected the accumulation of metabolites, but epistasis was detected more frequently for metabolic traits than for transcript traits. Within such an a priori defined framework it was possible to identify and unravel complex regulatory mechanisms such as metabolic feedback loops, in which metabolic content regulated gene expression and vice versa.
The examples discussed here highlight the technological advances made in high-throughput characterization of the transcriptome, the proteome and the metabolome, which enable an integrated multidisciplinary approach to unraveling the regulatory mechanisms involved in natural variation of complex traits.
Although much progress is being made in understanding the influence of genetic factors on a biological system, we still have limited understanding of the interplay between environmental and genetic factors. The discovery of molecular networks with genetical genomics approaches is often limited to a single experimental condition. An interesting concept, called generalized genetical genomics, combines controlled environmental perturbations with genetical genomics. This generalization of genetical genomics will detect how the response to environmental changes is influenced by the genotype (i.e. genotype x environment interactions). Here, spatial and temporal variation can also be regarded as different environments, since specific tissues and developmental stages often determine the biological context in which regulatory networks function.
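A genotype x environment interaction of this kind is commonly expressed as a term in a linear model. The following is only a minimal sketch, assuming a single marker and a set of discrete environments; the symbols are illustrative and are not taken from the study cited above.

```latex
y_{ijk} = \mu + G_i + E_j + (G \times E)_{ij} + \varepsilon_{ijk}
```

Here $y_{ijk}$ would be the transcript abundance measured in individual $k$ carrying marker genotype $i$ in environment $j$, $\mu$ the overall mean, $G_i$ the main effect of the genotype, $E_j$ the main effect of the environment, $(G \times E)_{ij}$ their interaction and $\varepsilon_{ijk}$ the residual error. Under these assumptions, a locus with only a non-zero $G_i$ term behaves as an eQTL whose effect is the same in every environment, whereas a non-zero interaction term indicates that the effect of the locus depends on the environment.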
Advances in next-generation sequencing technology will continue to produce huge amounts of sequence data. Good examples are the human 1000 and the 1001 Arabidopsis genome projects, which aim at resequencing over 1000 different humans and accessions, respectively. However, de novo sequencing of economically important or phylogenetically strategic species is of equal importance. The accumulation of genomic information, in combination with genetical genomics approaches, will enable the precise definition of functionally important polymorphisms and their role in adaptation to changing environments and in species formation. Having access to complete genome sequences also enables the generation of full-genome tiling arrays for different (crop) species, which have proven to be very useful for expression profiling [85, 86]. When used within a genetical genomics approach, such arrays offer unique opportunities to elucidate the genetics behind the mechanistic basis of transcriptional differences. For Arabidopsis, for instance, a SNPtile microarray was developed that carries tiling probes covering both strands of the genome and, in addition, probes for genome-wide detection of SNPs and CpG methylation. A properly designed genetical genomics study using such arrays might reveal genetic variation in gene expression, alternative splicing, regulation of cis-natural antisense transcripts, allele-specific expression and epigenetic regulation.
As a result of developments in SNP discovery and in platforms for genotyping large collections of individuals, the application of linkage disequilibrium (LD) mapping to complex traits has come within reach. LD mapping, or association mapping, detects the non-random inheritance of alleles at separate loci located on the same chromosome. In an experimental F2 or RIL population the genetic variation is limited to the natural variation present in the parental lines, and resolution depends on the recombination frequency within the population and on its size. In contrast, LD mapping makes use of large collections of natural (wild) accessions or elite breeding lines, sampling a much larger fraction of the natural variation present within a species. Moreover, it benefits from the much higher number of recombination events accumulated during the evolutionary history of a species, allowing higher-resolution mapping. The extent of LD varies between species and between the traits analyzed, but the gain in resolution relative to experimental populations is in the range of orders of magnitude, which equally increases the need for dense marker spacing to enable genome-wide scans. The high number of markers required has always been a major limitation for LD mapping, but next-generation sequencing will tremendously increase the number of available markers. Therefore, we see great potential for phenotyping and expression profiling of LD populations to detect causal genes for natural variation and to enable marker-assisted selection in breeding programs.
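To make the notion of non-random association between loci concrete, the sketch below computes two widely used LD measures for a pair of biallelic loci: the coefficient D (the observed frequency of a haplotype minus the frequency expected if the loci were independent) and the squared correlation r². It is only an illustration under stated assumptions: phased haplotypes are assumed to be available, alleles are coded 0/1, and the function name `ld_stats` and the example data are hypothetical rather than taken from any of the studies discussed here.

```python
from typing import Sequence, Tuple


def ld_stats(haplotypes: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (D, r^2) for two biallelic loci.

    Each haplotype is a pair (allele at locus 1, allele at locus 2),
    with alleles coded 0 or 1. Phased data are assumed.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")

    # Frequencies of allele 1 at each locus and of the 1-1 haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")

    # r^2: D squared, scaled by the allele frequencies at both loci.
    return d, d * d / denom


# Alleles always inherited together: complete LD (D = 0.25, r^2 = 1.0).
complete = [(1, 1), (1, 1), (0, 0), (0, 0)]
# All four haplotypes equally frequent: no LD (D = 0.0, r^2 = 0.0).
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_stats(complete))
print(ld_stats(independent))
```

In a genome-wide scan, r² between a marker and a causal polymorphism decays with the number of historical recombination events separating them, which is why populations with rapid LD decay offer higher resolution but require denser marker coverage.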
Since its introduction, the concept of genetical genomics has proven to be a powerful approach for dissecting genetic variation. Studies in crop species have revealed major cis-eQTLs that co-locate with important phenotypic traits and will therefore facilitate faster crop improvement. Genetical genomics studies in model species help to understand the extent of genetic variation, and much effort is being spent on developing statistical tools for building and elucidating causal networks. Recent developments in inexpensive high-throughput sequencing techniques and next-generation tiling microarrays will soon create opportunities to extend genetical genomics to unravel genetic variation in gene expression, alternative splicing, allele-specific expression and epigenetic polymorphisms. Similarly, continuing technological developments have increased the power of both proteomic and metabolomic approaches. Integration of phenotypic, genetic, transcriptomic, proteomic and metabolomic data will enable accurate and detailed network reconstruction. This will ultimately result in the elucidation of the molecular pathways involved in complex phenotypic traits.
This work was supported by grants from the Netherlands Organization for Scientific Research (STW 10027), VENI scheme (863.08.019) and the Centre for Biosystems Genomics (CBSG, Netherlands Genomics Initiative).