The hypothesis that environmental factors alter somatically heritable epigenetic marks and change long-term patterns of gene expression is an exciting possibility in human disease research. Because most common diseases, and many quantitative traits, are influenced by both genetic and environmental factors, environmentally induced changes in epigenetic structures can provide a mechanistic link between genes and environment. We believe that inter-individual differences in the epigenetic modification of genes will explain a much greater fraction of inter-individual phenotypic variation than differences in genotype alone.
One of the long-awaited promises of human genome sequencing and whole genome association analysis was the identification of genes involved in common human diseases.1 In fact, human geneticists have been able to deliver on this promise in a number of cases, identifying genes involved in type 2 diabetes,2,3 asthma4,5 and many other diseases.6–8 In the minds of the public, the identification of disease genes was one step on the road to personalized medicine. Unfortunately, it has turned out to be a rather small step, in most instances.
The reason that most of us have not had our genomes sequenced9 (even if we could get our genomes sequenced for $1000) is that, in the case of type 2 diabetes, for example, the identification of “risk” alleles at the disease loci provides little predictive power; each genetic “risk” variant is associated with an average odds ratio of 1.18 for SNPs at ten loci with very strong association.10 Although that fact may not be very surprising to many geneticists, given the multifactorial, multigenic nature of common diseases, it raises the question of what type of information would add to the predictive power of genetic risk in determining phenotype. In this vein, there is potential for different measures of “epigenotype” (DNA methylation, histone methylation/acetylation/sumoylation, etc.), acting as “readouts” of the impact of the environment, to add significant predictive power to genetic information alone.11 Given this excitement, it is fair to evaluate the magnitude of the problem and to ask whether epigenetics, writ large, is up to the task.
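To see why an odds ratio of 1.18 carries so little predictive weight, it can help to convert it into absolute risk. The following is a minimal sketch, not a calculation taken from the cited studies: the 10% baseline risk is a hypothetical value chosen only for illustration, the function names are our own, and the ten-variant figure assumes independent, multiplicative effects on the odds scale, which real loci need not satisfy.

```python
def risk_after_odds_ratio(baseline_risk, odds_ratio):
    """Convert a baseline probability and an odds ratio into a new probability."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must lie strictly between 0 and 1")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


def combined_odds_ratio(per_variant_or, n_variants):
    """Odds ratio for carrying n risk variants.

    ASSUMPTION: every variant has the same effect and the effects multiply
    independently on the odds scale.
    """
    return per_variant_or ** n_variants


if __name__ == "__main__":
    baseline = 0.10        # hypothetical baseline risk, for illustration only
    per_variant_or = 1.18  # average odds ratio quoted in the text

    one_variant = risk_after_odds_ratio(baseline, per_variant_or)
    ten_variants = risk_after_odds_ratio(
        baseline, combined_odds_ratio(per_variant_or, 10)
    )
    print(f"baseline risk:          {baseline:.3f}")
    print(f"risk with one variant:  {one_variant:.3f}")
    print(f"risk with ten variants: {ten_variants:.3f}")
```

Under these assumptions, a single risk variant moves a 10% baseline risk by less than two percentage points, which is the sense in which each individual variant provides little predictive power.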
Inter-individual differences in phenotype, whether associated with disease or not, are generally assumed to reflect inter-individual differences in the expression of genes. Inter-individual differences in gene expression can be qualitative (functional product versus non-functional product, for example) or quantitative (amount of product or relative amounts of functional and non-functional products, etc.). In fact, one of the most surprising observations to emerge from human transcriptome profiling is the very high level of inter-individual variability found in steady-state mRNA levels of many genes. The inter-individual differences (often an order of magnitude) do not appear to be a technical artifact attributable to the arrays, because similarly large inter-individual differences for many genes have been validated by quantitative RT-PCR. Several examples from our own laboratory are shown in Figure 1, but additional examples may be found in reports from several groups.12–15
Many inter-individual differences in mRNA level appear heritable and can be treated as quantitative traits (so-called “expression QTLs” or “eQTLs”), in much the same way as height or blood pressure.16,17 Over the past few years, several groups have used genome-wide association approaches to map the genetic determinants of inter-individual differences in the level of specific mRNAs. Both cis- and trans-acting genetic factors have been identified18–21 although cis-acting factors are more likely to be identified for a number of reasons.12,22
The good news to come out of such analyses is that many of the SNPs associated with eQTLs are very close to the genes themselves and can be expected to represent promoter “strength” alleles, binding sites for repressors, or other easily envisioned functional categories, although this has not been proven in most cases.22 The bad news is that the associated SNPs explain only a small fraction of the variance observed in transcript level.14 So, we are left, once again, with the conundrum that a trait that shows moderate heritability (transcript level of a particular gene) and that can be mapped to specific sites in the genome refuses to behave in a predictable fashion, at least on an individual basis. It is enough to drive any geneticist testifying before a congressional committee on personalized medicine to distraction.
It is fair to ask whether measures of epigenotype would perform any better in explaining the level of variance observed in transcript levels. In theory, at least, it is possible for epigenetic measures that can be continuous variables (0–100% methylation of a particular CpG site, for example) to predict widely varying transcript levels better than genetic measures that tend to be trichotomous (AA, AB, BB), at best.
In a recent report, we examined inter-individual differences in DNA methylation and gene expression in children conceived in vivo or in vitro.23 While the design and scale of our experiment did not permit us to map epigenetic determinants of inter-individual differences in transcript level genome-wide, we did find that a fraction of genes that exhibited significant differences between groups in CpG site methylation also exhibited significant differences in transcript level.23 We hypothesize that many such correlations indicate cis-effects of DNA methylation on gene expression. Regression of transcript level on CpG methylation level in these cases can provide an estimate of the fraction of variance in transcript level that can be accounted for by cis-acting epigenetic factors.
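The "fraction of variance accounted for" in such a regression is the coefficient of determination, R². The following is a minimal sketch of that calculation for a single CpG site and a single transcript, using ordinary least squares. It is not the analysis pipeline of the study; the function name and the sample values are hypothetical and serve only to show the arithmetic.

```python
def variance_explained(methylation, transcript):
    """R^2 from a simple least-squares regression of transcript on methylation."""
    n = len(methylation)
    if n != len(transcript) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    mean_x = sum(methylation) / n
    mean_y = sum(transcript) / n
    sxx = sum((x - mean_x) ** 2 for x in methylation)
    syy = sum((y - mean_y) ** 2 for y in transcript)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(methylation, transcript))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary across individuals")

    # Fit the line, then compare residual variation with total variation.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(methylation, transcript))
    return 1.0 - ss_residual / syy


if __name__ == "__main__":
    # Hypothetical values: fractional methylation (0 to 1) at one CpG site
    # and relative transcript level in the same six individuals.
    cpg = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
    mrna = [2.1, 3.4, 1.2, 2.8, 1.0, 1.9]
    print(f"fraction of variance explained: {variance_explained(cpg, mrna):.2f}")
```

With a single predictor, R² equals the squared Pearson correlation, so the value is the same whichever variable is placed on which axis; the direction matters only for interpreting the slope.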
Two of the best examples derived from our study23 are shown in Figure 2. Methylation of a single CpG site adjacent to the CCAAT/enhancer binding protein alpha (CEBPA) locus accounts for approximately 10% of the inter-individual variance in cord blood transcript level, while the methylation of a single site adjacent to serpin peptidase inhibitor, clade F, member 1 (SERPINF1) accounts for approximately 5% of inter-individual differences in transcript level in placenta. While these numbers are nothing about which epigeneticists should puff up their chests, they compare favorably with estimates for the fraction of variation in transcript level explained by cis-acting genetic factors (approximately 5%).20
Given that there are many epigenetic modifications that could be assayed in a similar fashion, the addition of epigenetic factors can be expected to explain a greater and greater proportion of inter-individual variance. Taking into account the fact that the regressions shown in Figure 2 do not control for any inter-individual genetic differences that might confound the effect of methylation on transcript level, it is heartening that the magnitude of the epigenetic effect seen is within the range of those discovered in whole-genome association searches for genetic factors affecting mRNA levels.12,20 If trans-acting epigenetic factors account for a similar fraction of inter-individual variance as trans-acting genetic factors (approximately 38%),20 the combination of the two types of information should be powerful indeed.11 Even if genetic and epigenetic effects are only additive, rather than synergistic, the resulting increase in predictive power is likely to be large enough to fulfill the expectations of personalized medicine; i.e., patient-specific information that provides relative risk estimates that are large enough to affect patient behavior and/or standards of care.
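The additive scenario can be made concrete with simple arithmetic. The sketch below sums the approximate fractions quoted in the text. It assumes that the four components are independent and do not overlap, which the unmodeled genetic confounding noted above may well violate, and it treats the trans-acting epigenetic figure as the hypothetical "if" case rather than a measured value. The function and key names are our own.

```python
def combined_variance_explained(components):
    """Sum per-component fractions of variance, capped at 1.0.

    ASSUMPTION: components are independent and non-overlapping, so that
    their contributions simply add.
    """
    for name, fraction in components.items():
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"{name}: fraction must lie between 0 and 1")
    return min(sum(components.values()), 1.0)


if __name__ == "__main__":
    components = {
        "cis_genetic": 0.05,       # approximate figure quoted in the text
        "trans_genetic": 0.38,     # approximate figure quoted in the text
        "cis_epigenetic": 0.05,    # lower of the two Figure 2 examples
        "trans_epigenetic": 0.38,  # hypothetical: "if similar" to trans-genetic
    }
    total = combined_variance_explained(components)
    print(f"combined fraction of variance explained: {total:.2f}")
```

Any overlap between genetic and epigenetic effects would lower the total, so the sum is best read as an upper bound for the additive case.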
With respect to our studies of the possible effects of in vitro conception on epigenetic marks and patterns of gene expression,23 what could be the outcome of any validated effects? There is no question that the great majority of children conceived through the assisted reproductive technologies appear normal at birth.24 However, as a group, these children are of lower birth weight and are more often born prematurely (even when multiple pregnancies are taken into consideration).25,26 These issues put some of these children at risk for development of systemic diseases such as obesity, hypertension and/or cardiovascular disease in adulthood and possibly other ailments in later life.27,28 We should not forget that the oldest child conceived through in vitro fertilization is only 30 years old. Given all of the observations on the range of inter-individual variability observed, the challenge facing us is the identification of other “markers” that will accurately select which individuals among the whole cohort are at risk for the development of health problems later in life. Could specific epigenetic marks prove to be such “markers”? And, even better, could any marks altered by the process be re-altered, thus allowing for clinical intervention?
This work was supported by the National Institutes of Health (R01 HD048730 to C.S. and C.C.).