A number of challenges arise when conducting EWAS (Fig. ). I discuss different categories of these challenges below.
The main challenge in the EWAS field is the selection of samples. In epidemiologic studies, blood is the most common sample source. Ideally, the disease-relevant tissue should be analyzed because epigenetic signatures differ among cell types. However, tissue collection requires invasive procedures, and this makes it difficult to collect repeated samples from the same subjects for follow-up studies.
Populations are selected for epidemiologic studies based on the research questions. Low recruitment and dropout rates should be considered before planning a study with a large number of subjects. Obtaining an adequate number of high-quality samples remains a challenge in epidemiologic studies. Different kinds of biological samples have been used for epigenetic analysis (see Table ). Variation between persons and within a person is a further challenge because epigenetic-mediated gene regulation is dynamic. Unlike genomic information, which is static, epigenomic information changes with time.
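The recruitment and dropout consideration above can be made concrete with simple arithmetic. The following is a minimal sketch, not a method taken from the studies reviewed here: the function name `enrollment_target` and the example figures are hypothetical, and it assumes a single overall dropout percentage applied to the whole cohort.

```python
def enrollment_target(required_completers: int, dropout_percent: int) -> int:
    """Subjects to enroll so that roughly `required_completers` remain.

    Assumes one overall dropout percentage (hypothetical simplification);
    real studies may lose subjects at different rates per follow-up wave.
    """
    if required_completers < 0:
        raise ValueError("required_completers must be non-negative")
    if not 0 <= dropout_percent < 100:
        raise ValueError("dropout_percent must be in the range 0-99")
    remaining_percent = 100 - dropout_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    return (required_completers * 100 + remaining_percent - 1) // remaining_percent


if __name__ == "__main__":
    # Hypothetical example: 800 completers needed, 20% expected dropout.
    print(enrollment_target(800, 20))
```

Under these assumptions, the enrollment target grows quickly as expected dropout rises, which is one reason dropout is worth estimating before a large study is planned.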
Epigenetic profiles respond to environmental changes, and identifying disease-specific profiles requires knowledge of exposures, modifiable factors, and their effects. Limitations in exposure measurement and the resulting risk of misclassification present another major challenge, and improved exposure assessment techniques are needed. Several papers estimate the contribution of environmental factors to cancer etiology at 1 to 19% [35-40]. Alterations in epigenetic profiles may be due to specific genes or to a large section of the genome. The effects of gene-environment interactions are already taken into account because the epigenotype reflects the effects of both the environment and genes on the epigenome. Folic acid, choline, and methionine are known to directly affect epigenetic profiles [41-43].
After epigenetic profiles and markers are identified, they should be mapped, and their relationship should be established in large-scale studies. The large amount of data produced by each run (which may exceed 1 Gb) poses another challenge for EWAS. Thus far, genome-wide epigenetic profiling of LINE and SINE sequences has not been completed. When these data become available, integrating them with previously collected data and reaching useful conclusions will be challenging. Only a few laboratories have the expertise required to analyze all four components of epigenetics. Collaboration is needed among investigators with different expertise, such as in methylation, histone, chromatin, miRNA, epidemiology, bioinformatics, and statistics. Partnerships with industry may be helpful in this area. Histone profiling is based on ChIP-on-chip analysis, for which high-quality monoclonal antibodies should be used. Few commercially available monoclonal antibodies are of high quality, however. The NIH Epigenome Roadmap has initiated a program to generate monoclonal antibodies against all major histone modifications (http://commonfund.nih.gov/epigenomics/index.asp). These antibodies will be made readily available to investigators.
New tools are needed to fully utilize GWAS data and determine their correlation with EWAS data. These tools should facilitate the integration of genetic and environmental exposure data with EWAS data to enable the identification of populations at high risk of developing cancer. Data generated by the NIH Epigenome Roadmap program are maintained at the National Center for Biotechnology Information (NCBI) and dbGaP (the Database of Genotypes and Phenotypes). GWAS data are stored and maintained at dbGaP. Currently, however, no arrangements have been made for storing and maintaining EWAS data at a central location. A variety of designs have been adopted for epigenetic and epidemiologic studies, such as cohort studies, case-control studies, cross-sectional studies, intervention studies, family-based studies, twin cohorts, and birth cohorts, but the best approach has not been determined. High-throughput technologies are required for these studies. Thus far, most studies have been completed using differential methylation profiles.
Much of the genome consists of repeat sequences (Alu and LINE) that are methylated in the normal state [44]. The role of these repeat sequences has not been incorporated into the outcomes of most of the epidemiology studies conducted thus far. A systematic study of these sequences and their association should be conducted. These sequences are prone to integration by infectious agents and may contribute to the development of cancer.
In conclusion, there is a need to translate curiosity-driven basic research into patient-care-focused clinical research via epigenomics.