Large-scale surveys of gene expression variation in humans can provide important baseline information about 'normal' naturally-occurring variation among individuals. These data can be used to assess the significance of variation observed in experimental studies where groups of individuals (eg disease versus non-disease controls) are compared. In addition to being useful to the medical community, these studies will fundamentally increase our understanding of the causes of naturally-occurring gene expression variation. The total phenotypic variance among individuals (VP) for any trait can be broken down into a component of variance due to genotype (VG), a component due to environment (VE) and a component due to different genotypes in different environments (VGE), according to the following equation:
VP = VG + VE + VGE
The genetic component of phenotypic expression variation reflects interindividual genetic differences that result in interindividual expression differences.
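As an illustration of this partition, the following minimal sketch splits the variance of a small, balanced genotype-by-environment table into the three components named above. The table values and the function name `partition_variance` are hypothetical, chosen only for illustration. The components add up exactly only because the table is balanced (one mean per genotype-environment cell) and genotype and environment are uncorrelated; real survey data would rarely be this tidy.

```python
from statistics import mean, pvariance


def partition_variance(table):
    """Partition the variance of a balanced genotype-by-environment table.

    table[i][j] is the mean phenotype (e.g. transcript level) of genotype i
    in environment j. Returns (VP, VG, VE, VGE) as population variances
    taken over the cells of the table.
    """
    n_g, n_e = len(table), len(table[0])
    cells = [v for row in table for v in row]
    grand = mean(cells)

    # Main effect of each genotype: row mean minus grand mean.
    g_eff = [mean(row) - grand for row in table]
    # Main effect of each environment: column mean minus grand mean.
    e_eff = [mean(table[i][j] for i in range(n_g)) - grand for j in range(n_e)]
    # Interaction: what is left in each cell after removing both main effects.
    ge_eff = [table[i][j] - grand - g_eff[i] - e_eff[j]
              for i in range(n_g) for j in range(n_e)]

    v_g = mean(x * x for x in g_eff)
    v_e = mean(x * x for x in e_eff)
    v_ge = mean(x * x for x in ge_eff)
    v_p = pvariance(cells)
    return v_p, v_g, v_e, v_ge


# Hypothetical expression levels: 3 genotypes (rows) x 2 environments (columns).
example_table = [
    [10.0, 12.0],
    [14.0, 20.0],
    [9.0, 13.0],
]

v_p, v_g, v_e, v_ge = partition_variance(example_table)
print(f"VP={v_p:.3f}  VG={v_g:.3f}  VE={v_e:.3f}  VGE={v_ge:.3f}")
```

In this toy table most of the variance falls in VG, so most of the spread in expression tracks genotype rather than environment or their interaction.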
Little is known about the genetic basis of natural variation in gene expression. Open questions of fundamental biological importance include, but are not limited to:
• How much variation in gene expression (mRNA transcript levels) exists among individuals of a natural population of a single species?
• How much of this variation has a genetic component?
• How common are genetic polymorphisms in regulatory elements in natural populations, and what are the magnitudes of effect of these variants on mRNA levels?
• Are there 'regulatory hotspots', regions of the genome that affect transcription patterns of multiple genes?
• Is there interaction among loci, such that co-expressed gene complexes are observed?
Recent work in model organisms and humans has begun to address these and other questions.
Studies have documented significant, naturally-occurring variation in gene expression among individuals of model organisms, including yeast [26] and mouse [30], and additional studies have made similar observations in fish [34], primates [36] and humans [37]. As it has become accepted that naturally-occurring variation in gene expression among individuals is a common phenomenon, focus has shifted toward trying to quantify the contribution of genetic factors to that variation and to locate the responsible genomic regions.
Yan et al. were among the first to demonstrate a genetic component of expression variation in humans. For six of 13 loci surveyed in 96 Centre d'Etude du Polymorphisme Humain (CEPH) [43] individuals, they observed significant differences in mRNA transcript abundance for the two alleles of heterozygous individuals (allelic imbalance). Furthermore, when families of individuals exhibiting allelic expression differences were examined, one-third of them showed expression patterns consistent with underlying Mendelian inheritance of functional variants. Other recent studies of allelic imbalance in humans and mice provided similar evidence for a functional genetic influence separate from that attributable to imprinting [18]. In a large-scale microarray study, Cheung et al. provided further evidence of familial aggregation of expression profiles. The authors surveyed genome-wide patterns of gene expression in immortalised lymphoblastoid cells of humans and identified a set of genes whose transcript level varied greatly among 35 unrelated CEPH individuals. To determine whether the variation was influenced by genetic differences segregating among individuals, mRNA transcript levels of the most variable genes were quantified in several samples of individuals of different degrees of genetic inter-relatedness, including a sample of 49 unrelated CEPH individuals (the 35 individuals mentioned above plus an additional 14), offspring from five CEPH families and ten pairs of monozygotic twins. The authors observed that genes exhibited less variability in transcript abundance in more closely related individuals, suggesting a heritable component of gene expression variation among individuals.
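The logic of an allelic imbalance comparison, as described above for heterozygous individuals, can be sketched as a test of whether the two alleles contribute equally to the transcript pool. The example below is an assumption-laden illustration, not the procedure of the cited studies: it supposes that allele-specific transcript counts are available for one heterozygote (the counts shown are hypothetical) and applies an exact two-sided binomial test against a 1:1 expectation. The function name `binomial_two_sided_p` is likewise introduced here for illustration.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k out of n trials, under success probability p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are counted despite rounding error.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical allele-specific counts for one heterozygous individual.
allele_a_count = 30
allele_b_count = 10

p_value = binomial_two_sided_p(allele_a_count, allele_a_count + allele_b_count)
print(f"two-sided p-value for a 1:1 allelic ratio: {p_value:.4f}")
```

A plain binomial model ignores technical overdispersion, so in practice a balanced control (for example, genomic DNA from the same individual) would be needed before treating a small p-value as evidence of a functional regulatory variant.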
Some studies have gone a step further and used large-scale surveys to estimate the percentage of genes that exhibit significant heritability. In a study of gene expression in lymphoblastoid cell lines of CEPH pedigrees, Schadt et al. reported extensive differences in mRNA transcript levels among 56 individuals of four CEPH families, and through heritability analyses were able to estimate that approximately 29 per cent of these genes had a genetic component influencing their transcript levels. Monks et al. followed up this study with a massive survey of expression of 23,499 genes in 167 individuals of 15 CEPH families. Of the detected genes, 31 per cent exhibited significant heritability (false discovery rate 0.05), with a median heritability of 0.34.
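To make the notion of heritability concrete, the sketch below uses a simple textbook estimator, the regression of offspring phenotype on the mean of the two parents, whose slope estimates narrow-sense heritability under idealised assumptions (random mating, no shared environment). This is a swapped-in illustration, not the pedigree-based analysis of the studies above, and the trio values and the function name `midparent_heritability` are hypothetical.

```python
def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def midparent_heritability(trios):
    """Estimate heritability of one transcript from parent-offspring trios.

    trios is a list of (father, mother, offspring) expression values.
    The estimate is the slope of offspring value on midparent value.
    """
    midparent = [(father + mother) / 2 for father, mother, _ in trios]
    offspring = [child for _, _, child in trios]
    return regression_slope(midparent, offspring)


# Hypothetical expression values for a single gene in five trios.
example_trios = [
    (7.5, 8.5, 9.1),
    (9.5, 8.5, 9.3),
    (10.5, 9.5, 10.2),
    (10.0, 12.0, 10.3),
    (12.5, 11.5, 11.1),
]

h2 = midparent_heritability(example_trios)
print(f"estimated heritability: {h2:.2f}")
```

Repeating such an estimate for every detected gene and then controlling the false discovery rate across genes is the kind of genome-wide summary reported above, although with only a handful of families a single-gene estimate like this one would be very imprecise.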
The above studies in human and other species demonstrate that gene expression is an abundantly variable phenotype with a genetic component; thus, gene expression (that is, mRNA transcript level among individuals) can be considered a quantitative trait. In general, quantitative traits exhibit continuous phenotypic variation among individuals, and the genetic component of that variation is often due to contributions of more than one locus. By combining microarray quantification of gene expression among individuals with marker genotype data (eg single nucleotide polymorphisms; SNPs) for the same individuals, it has become possible to map the genomic regions containing factors responsible for natural variation in human gene expression by performing association analyses. In these analyses, first referred to as 'genetical genomics' [44], transcript abundance of each of thousands of genes is treated as a quantitative phenotype [9] that is under genetic control. Association analyses are used to map functional regulatory regions by associating genotype at an individual marker locus with the expression of each gene (Figure and ). These methods differ from family-based linkage analysis, which traces genotypes and phenotypes through related individuals, looking for polymorphisms that co-segregate with the phenotype (Figure ).
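The association step described above can be sketched as follows: the genotype at one marker is coded as the number of copies of one allele (0, 1 or 2), and the strength of its relationship with the transcript level of each gene is assessed in turn. This is a minimal sketch under stated assumptions rather than the method of any cited study: it uses the Pearson correlation with a permutation p-value, and the gene names, genotypes, expression values and function names are all hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p(genotypes, expression, n_perm=2000, seed=0):
    """Permutation p-value for association between genotype and expression.

    Expression values are shuffled across individuals to break any link
    with genotype; the p-value is the share of shuffles whose absolute
    correlation is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(genotypes, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(genotypes, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


def association_scan(genotypes, expression_by_gene, n_perm=2000):
    """Test one marker against the transcript level of every gene."""
    return {gene: permutation_p(genotypes, values, n_perm)
            for gene, values in expression_by_gene.items()}


# Hypothetical data: 12 individuals genotyped at one SNP (allele count 0/1/2).
snp_genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
expression = {
    "geneA": [5.1, 4.9, 5.3, 5.0, 6.0, 6.2, 5.8, 6.1, 7.0, 7.2, 6.9, 7.1],
    "geneB": [5.0, 6.1, 7.0, 5.2, 6.9, 5.1, 6.0, 7.1, 5.9, 5.0, 7.0, 6.1],
}

for gene, p_value in association_scan(snp_genotypes, expression).items():
    print(gene, f"p = {p_value:.4f}")
```

Because a genome-wide scan repeats this test for thousands of genes and many markers, the resulting p-values would need correction for multiple testing before any marker is declared associated with expression.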
Figure 2. (A) Structure of a hypothetical gene and the haplotype organisation of single nucleotide polymorphisms (SNPs) in the region. Vertical lines represent the location of SNPs, with two nucleotides of a single SNP shown. Horizontal blue lines represent the ...