There is considerable evidence that inhaled toxicants such as cigarette smoke can cause both irreversible changes to the genetic material (DNA mutations) and putatively reversible changes to the epigenetic landscape (changes in DNA methylation and chromatin modification states). The diseases believed to involve genetic and epigenetic perturbations include lung cancer, chronic obstructive pulmonary disease (COPD), and cardiovascular disease (CVD), all of which are strongly linked epidemiologically to cigarette smoking. In this review, we highlight the significance of genomics and epigenomics in these major smoking-related diseases. We also summarize in vitro and in vivo findings on the specific perturbations that smoke and its constituent compounds can inflict on the genome, particularly in the pulmonary system. Finally, we review state-of-the-art genomics platforms, such as high-throughput sequencing and genome-wide chromatin assays, rapidly evolving techniques that allow epigenetic changes to be characterized genome-wide. These techniques have the potential to significantly improve our understanding of the specific mechanisms by which exposure to environmental chemicals causes disease. Such mechanistic knowledge provides a variety of opportunities for enhanced product safety assessment and the discovery of novel therapeutic interventions.
Key abbreviations used
Genomic and epigenomic perturbations produced by cigarette smoke
Lung cancer
Chronic obstructive pulmonary disease
Cardiovascular disease
DNA methylation signatures
Next generation genomics and epigenomics platforms
Declaration of interest
In humans, tobacco carcinogens are detoxified in two phases. Phase I xenobiotic-metabolizing enzymes transform oxidizable substrates into electrophiles, and phase II enzymes then act on the oxidized substrates through nucleophilic (conjugation) reactions, providing an efficient route for xenobiotic metabolism and excretion. Accordingly, the genes encoding the enzymes involved in xenobiotic metabolism are up-regulated in the bronchial epithelium of smokers (Spira et al., 2004; Beane et al., 2011) and in smoke-exposed rodent lung tissue (Gebel et al., 2004; Gebel et al., 2006). Some metabolites formed during detoxification can be highly reactive and may form covalent adducts at guanine and adenine bases. Adducts are normally removed by DNA excision repair, but if an adduct is still present during replication, the DNA polymerase may bypass the altered base and insert an incorrect nucleotide, leading to a mutation (Pfeifer et al., 2002; Hang, 2010).
The mutagenicity of tobacco smoke has been demonstrated in various assays, including the mouse lymphoma assay (MLA) (OECD, 1997). In a recent study, Guo et al. tested smoke condensates from 11 different cigarette brands and found that all of them were mutagenic in a dose-dependent manner (Guo et al., 2011). Moreover, the nicotine-derived tobacco-specific N-nitrosamine (TSNA) 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) induces lung tumors in all species tested (mice, rats, hamsters, and mink), regardless of the route of administration.
Some of these agents, alone or in combination, may also induce genomic aberrations in smokers. Genetic lesions such as loss of heterozygosity (LOH) and other chromosomal aberrations are found not only in dysplastic and hyperplastic but even in histologically normal bronchial epithelium of clinically cancer-free smokers (Franklin et al., 1997; Mao et al., 1997; Wistuba et al., 1997; Nelson et al., 1998). The overall frequency of LOH is higher in the lungs of smokers than in those of non-smokers, and LOH at 3p14 (the locus of the tumor suppressor gene FHIT) is less frequent in former smokers than in active smokers.
Although most compounds present in cigarette smoke and in various other environmental contaminants are non-mutagenic, some of these apparently non-mutagenic compounds can interfere with gene expression through epigenetic mechanisms. There is ample clinical evidence for smoking-induced epigenetic events. Hypermethylated promoters have been observed in both clinically cancer-free and cancer-bearing smokers and can be detected in cell samples from sources as diverse as bronchoscopy, sputum, bronchoalveolar lavage fluid (BALF), alveolar macrophages, and lymphoblasts (Belinsky et al., 1998; Palmisano et al., 2000; Lamy et al., 2002; Kim et al., 2004; Russo et al., 2005; Belinsky et al., 2005; Belinsky, 2005; Belinsky et al., 2006; Kerr et al., 2007; Baryshnikova et al., 2008; Monick et al., 2012). Even though promoter hypermethylation can persist for many years after smoking cessation (Zöchbauer-Müller et al., 2003; Bhutani et al., 2008), current smokers generally have a higher mean methylation index than former smokers, supporting the view that hypermethylation, unlike somatic mutation, is reversible (Yanagawa et al., 2011). Although methylation of retinoic acid receptor beta (RARB), p16, fragile histidine triad (FHIT), and RAS association domain family 1A (RASSF1A) correlates only weakly with smoking status, the degree of methylation increases with smoking intensity.
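As a minimal sketch of the methylation-index comparison above, assume the index is the fraction of tested genes whose promoter is called methylated. This is one common definition; the exact definition used by Yanagawa et al. is not stated here. The function names and the example calls are hypothetical.

```python
from typing import Mapping, Sequence


def methylation_index(calls: Mapping[str, bool]) -> float:
    """Fraction of tested genes whose promoter is called methylated.

    Assumed definition (hypothetical): methylated genes / genes tested.
    """
    if not calls:
        raise ValueError("at least one gene must be tested")
    return sum(calls.values()) / len(calls)


def mean_methylation_index(group: Sequence[Mapping[str, bool]]) -> float:
    """Mean of the per-subject indices for a group, e.g. current smokers."""
    if not group:
        raise ValueError("group must contain at least one subject")
    return sum(methylation_index(c) for c in group) / len(group)


# Hypothetical promoter calls for two current smokers and one former smoker
current = [
    {"RARB": True, "p16": True, "FHIT": False, "RASSF1A": True},
    {"RARB": True, "p16": False, "FHIT": False, "RASSF1A": False},
]
former = [{"RARB": False, "p16": True, "FHIT": False, "RASSF1A": False}]
print(mean_methylation_index(current), mean_methylation_index(former))
```

Averaging per-subject indices, rather than pooling all calls, keeps each subject's weight equal even if different numbers of genes were tested per subject.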
The smoking effect has even been detected in cell-free DNA in plasma. When methylation of at least one of KIF1A, DCC, RARB, or NISCH was scored as positive, methylation frequency was smoking-dependent: none of the light- or non-smoker controls showed plasma DNA methylation, whereas in cancer-free heavy smokers the methylation frequency correlated well with cumulative smoking dose (pack-years) (Ostrow et al., 2010).
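The two quantities used in this plasma study, cumulative smoking dose and an "at least one gene methylated" panel call, can be sketched as follows. This assumes a standard pack of 20 cigarettes; `pack_years` and `panel_positive` are illustrative names, not part of any published pipeline, and genes absent from the calls are treated as unmethylated.

```python
from typing import Mapping, Sequence

CIGARETTES_PER_PACK = 20  # assumption: standard 20-cigarette pack
PLASMA_PANEL = ("KIF1A", "DCC", "RARB", "NISCH")  # genes named in the text


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Cumulative smoking dose: packs smoked per day times years smoked."""
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked


def panel_positive(calls: Mapping[str, bool],
                   panel: Sequence[str] = PLASMA_PANEL) -> bool:
    """True if at least one panel gene is called methylated in plasma DNA.

    Genes missing from `calls` are treated as unmethylated (an assumption).
    """
    return any(calls.get(gene, False) for gene in panel)


print(pack_years(40, 15))  # 2 packs per day for 15 years
print(panel_positive({"DCC": True, "RARB": False}))
```

An "any of the panel" rule trades specificity for sensitivity: a single methylated gene is enough to flag a sample.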
The frequent hypermethylation in smokers' tissues has been attributed to high levels of DNA methyltransferase 1 (DNMT1), which correlated with smoking status in lung tumor samples (Lin et al., 2010). In vitro and murine in vivo experiments have further shown that NNK, acting through Akt signaling, inhibits DNMT1 protein degradation. The resulting accumulation of DNMT1 protein leads to increased hypermethylation of tumor suppressor genes (Damiani et al., 2008; Lin et al., 2010; Liu et al., 2010).
In an effort to develop biomarkers of exposure, recent work identified regions of altered DNA methylation in the lungs of SENCAR (SENsitive to CARcinogens) mice exposed to a single dose of 7,12-dimethylbenz[a]anthracene (DMBA), with or without cigarette smoke (Phillips and Goodman, 2009). The aberrant methylation was detected at very early time points, before any obvious lung histopathology. Based on their results, Phillips and Goodman (2009) suggested that regions of altered DNA methylation could serve as biomarkers of both exposure and effect.
Numerous in vitro studies have attempted to reproduce the smoke-induced epigenomic changes observed in the lungs of smokers (Mass and Wang, 1997; Liu et al., 2007; Liu et al., 2010). A recent study showed that gene-specific promoter methylation is induced in immortalized lung epithelial cells after prolonged (several months) exposure to cigarette smoke condensate (Liu et al., 2010). It is not yet known whether this methylation is reversible, nor what the time frame for demethylation would be after smoke exposure ends. Reversible smoke effects on DNA methylation have, however, been documented in cultured lung cancer cells. In A549 cells, the pro-metastatic oncogene synuclein-γ (SNCG) is silenced by methylation of its CpG island (a genomic region with a high frequency of CpG dinucleotides). Cigarette smoke extract (CSE) induced SNCG demethylation, accompanied by gene overexpression, within just 3 days of treatment. The demethylation was associated with a twofold decrease in DNMT3B mRNA. Withdrawal of the treatment resulted in recovery of DNMT3B expression and concomitant re-establishment of SNCG CpG methylation (Liu et al., 2007).
The various forms of lung cancer display phenotypically diverse cell types. According to the clonal evolution model, cancer arises from a single cell, and the extent of cumulative genomic instability dictates tumor progression (Nowell, 1976). More recent evidence suggests that only a small population of cells within a tumor has the self-renewing capacity needed to drive malignant growth (Eramo et al., 2008). According to this cancer stem cell theory (reviewed in Rivera et al., 2011), either restricted progenitors or more differentiated lung cells could convert to cancer stem cells with self-renewing capacity. Such cell populations have already been identified in both small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) (Eramo et al., 2008).
Lung cancer development involves various genomic perturbations, such as point mutations, deletions, and gene amplifications. The short arm of chromosome 3 contains several candidate tumor suppressor genes, and deletions within this region are detected in nearly 100% of lung cancers (Zabarovsky et al., 2002). Activating mutations in dominant oncogenes, inactivating mutations in recessive tumor suppressor genes, and chromosomal aberrations are generally regarded as late events in tumor development. However, in support of the field cancerization theory (Steiling et al., 2008; Gazdar and Minna, 2009), some mutations have been described in non-malignant lung tissue adjacent to resected lung tumors (Nelson et al., 1998; Zabarovsky et al., 2002).
A recent study confirmed that cigarette smoking is linked to specific copy number alterations in NSCLC. In early-stage tumors, heavy smokers (>60 pack-years) had more copy number gains than light smokers and non-smokers, and these gains predominantly affected oncogenes and genes associated with tumor growth. Interestingly, copy number losses located in intergenic regions were found in tumors from light smokers and non-smokers but not in tumors from heavy smokers (Huang et al., 2011).
Although adenocarcinoma is becoming the dominant lung cancer type among smokers, it is a heterogeneous subtype that also occurs in non-smokers (Subramanian and Govindan, 2010). However, lung tumors in non-smokers are genetically different from those in smokers (Sun et al., 2007). Mutations in the GTPase gene KRAS and in the tumor suppressor gene encoding p53 are more specific to smoking-related lung adenocarcinoma, whereas the sequence encoding the kinase domain of the epidermal growth factor receptor (EGFR) often becomes mutated very early in the development of smoking-independent lung adenocarcinoma (Herbst et al., 2008; Lantuéjoul et al., 2009; Dacic et al., 2010; Broët et al., 2011). KRAS and EGFR mutations have been found to be mutually exclusive, and the frequency of EGFR mutations correlated inversely with smoking status (number of pack-years and duration of smoking) (Yanagawa et al., 2011). Supported by the positive correlation between smoke-free years and the occurrence of EGFR mutations, it has been proposed that tobacco smoking suppresses EGFR mutation (Garinis et al., 2001; Dacic et al., 2010; Lee et al., 2010; Broët et al., 2011; Yanagawa et al., 2011). EGFR mutation has also been shown to correlate inversely with methylation of RASSF1A, FHIT, and runt-related transcription factor 3 (RUNX3) (Yanagawa et al., 2011).
The gene encoding the tumor suppressor p53 is commonly mutated in cancers and, in contrast to EGFR, p53 mutations occur more frequently in smokers than in non-smokers (Ryberg et al., 1994; Garinis et al., 2001; Pfeifer et al., 2002). Anna et al. showed that among lung cancer patients, a longer duration of smoking increased the frequency of p53 mutations: the mutation frequency was 14.3% in those who had smoked for less than 20 years, whereas half of those who had smoked for more than 20 years carried a mutation (Garinis et al., 2001; Anna et al., 2009). p53 mutations do not occur randomly; several ‘hot-spots’ were observed within the region encoding the DNA-binding domain of p53 (Denissenko et al., 1996). A subsequent study demonstrated that the locations of these mutation ‘hot-spots’ correlated with the methylation status of the adjacent CpG dinucleotides (Denissenko et al., 1997).
The Tumor Sequencing Project (TSP) studied 188 lung adenocarcinomas and identified a set of somatic mutations that might affect key pathways in adenocarcinoma development. The mutated genes included previously identified tumor suppressors, such as p53, CDKN2A, and STK11, and oncogenes, such as KRAS, EGFR, and NRAS. Several additional mutated genes were discovered, including putative tumor suppressor genes, such as ATM, NF1, RB1, and APC, and putative proto-oncogenes, such as ERBB4, KDR, FGFR4, and NTRK. Many of these genes were not only mutated; the same loci had also undergone copy number and/or gene expression changes. The authors concluded that, even though lung adenocarcinomas are highly heterogeneous, the affected pathways are most likely shared by the majority of subtypes. The dominant pathways affected included MAPK, p53, Wnt, cell cycle, and mTOR (Ding et al., 2008).
In cancer biology, the most relevant, or at least the best understood, epigenetic mechanism is hypermethylation of specific loci (CpG islands) in the promoter regions of known and presumptive tumor suppressor genes, a common hallmark of human tumors in general (Esteller, 2007) and of lung cancer in particular (Zöchbauer-Müller et al., 2002; Belinsky, 2005). Importantly, there is now strong evidence that aberrant promoter methylation, and the consequent silencing of tumor suppressor genes, is a critical step in tumor development and is frequently detected in so-called precursor (benign) lesions (Belinsky et al., 1998). Aberrant promoter methylation is therefore regarded as an early event in the multistep process of carcinogenesis.
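To make the notion of a CpG island concrete, the sketch below applies thresholds in the style of the widely used Gardiner-Garden and Frommer heuristic (length of about 200 bp or more, GC content above 50%, and an observed/expected CpG ratio above 0.6) to a single sequence. The default thresholds are an assumption chosen for illustration; stricter criteria exist, and genome annotations scan sliding windows rather than scoring one pre-cut sequence.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # CG dinucleotides cannot overlap, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Observed CpG count relative to the count expected from base composition
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply Gardiner-Garden and Frommer-style thresholds (assumed defaults)."""
    if len(seq) < min_len:
        return False
    gc, obs_exp = cpg_stats(seq)
    return gc > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CCGG" * 50))  # GC-rich and CpG-rich
print(looks_like_cpg_island("CAGG" * 50))  # GC-rich but CpG-depleted
```

The second example shows why the observed/expected ratio matters: a sequence can be GC-rich yet contain no CpG dinucleotides at all.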
Genes ‘targeted’ by promoter hypermethylation span a broad functional spectrum, including genes involved in cell cycle arrest, such as p16 (a major target) and p53, in DNA repair, such as MGMT (O6-methylguanine-DNA methyltransferase), and in growth arrest, such as RASSF1A. Importantly, although confirmation requires at least 5 years of follow-up, there is a strong indication that clinically cancer-free chronic smokers with hypermethylated promoters in actual and presumptive tumor suppressor genes are at elevated risk of developing lung cancer (Belinsky et al., 1998; Palmisano et al., 2000; Jarmalaite et al., 2003; Baryshnikova et al., 2008; Feng et al., 2008). The frequency of p16 hypermethylation, considered one of the prime aberrations in lung cancer development, increases with disease progression (Belinsky et al., 1998; Belinsky, 2005). In fact, p16 is frequently fully inactivated in lung cancer through loss of one allele and inactivation of the remaining allele by homozygous deletion, hypermethylation, or mutation (Cairns et al., 1995; Merlo et al., 1995; Rusin et al., 1996). The consequence of p16 inactivation is increased cyclin D-dependent kinase (CDK4/6) activity, which leads to persistent hyper-phosphorylation of the retinoblastoma protein (Rb) and resultant evasion of cell cycle arrest (Gautschi et al., 2007).
Although ‘targeted’ hypermethylation is better recognized in cancer development, global hypomethylation is almost always linked to malignancy; in normal human cells, more than twice as many cytosine residues are methylated as in tumor cell DNA. Nearly half of the genome consists of highly repeated DNA sequences, which account for the global hypomethylation, a common trait of cancers that can increase the likelihood of chromosomal rearrangements (Dunn, 2003). Proto-oncogenes, including c-myc, c-fos, and c-Ha-ras, can become abnormally expressed upon demethylation and set off uncontrolled cellular proliferation. Finally, integrated proviral sequences that are normally silenced by methylation may become demethylated, which can lead to their reactivation and increased infectivity (Dunn, 2003; Schär and Fritsch, 2011).
Loss of gene expression resulting from tobacco smoke-induced methylation has been detected in cell lines derived from mouse lung tumors induced by cigarette smoke or tobacco carcinogens, which showed reduced expression of death-associated protein (DAP) kinase. The loss of expression correlated with promoter methylation status and could be reversed by treatment with 5-aza-2′-deoxycytidine, an inhibitor of DNA methyltransferase. Aberrant DAP kinase methylation was linked to the earliest pre-neoplastic stage of lung adenocarcinoma, and the authors hypothesized that silencing this crucial component of several apoptotic pathways could allow clonal expansion of malignant cells, leading to lung carcinogenesis (Pulling et al., 2004).
The methylated-CpG island recovery assay (MIRA) has identified genes that are hypermethylated in lung cancer cells by comparing the methylation status of A549 lung cancer cells with that of normal human bronchial epithelial cells. Judging by the location of the differentially methylated CpG islands (close to the ends of known or predicted genes, or in exons/introns within genes), the authors hypothesized that these elements have a regulatory function. Notably, about one fifth of the top 50 genes identified in the screen belong to the homeobox gene family (LHX2, LHX4, PAX7, HOXB13, LBX1, SIX2, HOXD3, DLX1, HOXD1, ONECUT2, and PAX9) (Rauch et al., 2006). In addition to gene-specific methylation studies, a genome-wide screen was carried out to identify new cancer-specific methylation markers. The study included normal and malignant cell lines and used gene re-expression after 5-aza-2′-deoxycytidine treatment as the readout for promoter demethylation. Candidate cancer cell-specific promoter methylation sites were then compared with those found in primary epithelial tumors to establish novel malignancy-associated methylation markers (Shames et al., 2006). A comprehensive catalogue of methylation events in cancer is provided by the MethyCancer database, which documents the interplay between DNA methylation, gene expression, and cancer (He et al., 2008).
DNA methylation has also been detected in circulating cell-free DNA. A panel of six genes (APC, CDH1, MGMT, DCC, RASSF1A, and AIM1) was used to compare serum and tumor DNA methylation profiles: whenever a gene was methylated in serum, it was also methylated in the tumor tissue, whereas tumor methylation was not always mirrored in the serum sample (Begum et al., 2011). A progressive rise in plasma DNA methylation has also been detected in samples from patients with malignant lung tumors compared with subjects with normal computed tomography (CT) scans (Ostrow et al., 2010). These studies encourage the development of non-invasive approaches to the clinical management of lung cancer, for example to distinguish between cancerous and non-cancerous abnormal CT findings. Such methylation markers could be monitored early in heavy smokers, and individuals harboring aberrant methylation could be subjected to more intensive screening, resulting in earlier tumor detection and improved prognosis. Moreover, as Dolinoy and Jirtle have proposed, novel therapeutic intervention strategies targeting reversible, unfavorable epigenetic modifications could be developed (Dolinoy and Jirtle, 2008).
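The serum/tumor pattern described above amounts to two different concordance measures on paired calls: how often a serum-positive call is confirmed in the tumor, and how often a tumor-positive call is detected in serum. A minimal sketch, assuming one boolean methylation call per gene for serum and for the matched tumor; the function name and example pairs are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def paired_concordance(pairs: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Summarize (serum_methylated, tumor_methylated) calls for matched samples.

    Returns two rates (NaN when the denominator is zero):
      serum_confirmed_in_tumor: of serum-positive calls, fraction also tumor-positive
      tumor_detected_in_serum:  of tumor-positive calls, fraction also serum-positive
    """
    serum_pos = tumor_pos = both = 0
    for serum, tumor in pairs:
        serum_pos += int(serum)
        tumor_pos += int(tumor)
        both += int(serum and tumor)
    nan = float("nan")
    return {
        "serum_confirmed_in_tumor": both / serum_pos if serum_pos else nan,
        "tumor_detected_in_serum": both / tumor_pos if tumor_pos else nan,
    }


# Hypothetical gene-level calls: (methylated in serum, methylated in tumor)
pairs = [(True, True), (False, True), (False, True), (False, False)]
print(paired_concordance(pairs))
```

A pattern like the one reported corresponds to the first rate being 1.0 while the second is below 1.0, i.e. serum positivity is highly specific for tumor methylation but not fully sensitive.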
Both the methylation of certain individual genes and the overall number of methylated genes have been shown to increase as pre-neoplastic lesions progress towards invasive adenocarcinoma (Licchesi et al., 2008; Chung et al., 2011). Selamat et al. further mapped early, intermediate, and late methylation changes occurring during progression from histologically normal lung tissue (adjacent to the tumor) through atypical adenomatous hyperplasia and adenocarcinoma in situ to invasive lung adenocarcinoma. Unfortunately, the number of samples with smoking information was small, and no correlation could be drawn between smoking status and DNA methylation levels in the tissue samples examined. Aberrant hypomethylation was evident only in the invasive carcinoma samples, indicating that, unlike promoter hypermethylation, hypomethylation is a late event in lung adenocarcinoma development (Selamat et al., 2011). This is in line with a study showing that global DNA hypomethylation is strongly associated with tumor progression (Anisowicz et al., 2008). Although global hypomethylation, as a late event in the course of disease, may not directly aid disease prediction, this association is important for understanding tumor biology and may have implications for therapeutic intervention.
Clearly, epigenetics plays a major role in tumor progression; however, it might also be involved in the earlier stages of cancer development. According to the cancer stem cell theory described above, the capacity to self-renew could result from abnormal activation of developmental pathways in cells destined for differentiation (Rivera et al., 2011). Such pathway activation may involve epigenetic gene regulation, which is known to contribute to a cell's ability to express or silence diverse genes during development.
Both in vitro and genetic studies have documented the involvement of the histone code in lung carcinogenesis (Peters et al., 2001; Shan et al., 2008; Haberland et al., 2009; Chi et al., 2010). Histone deacetylase 6 (HDAC6) is required for the TGF-β1-induced, SMAD3-mediated epithelial-mesenchymal transition of lung cancer cells, a process linked to metastasis (Shan et al., 2008). Tumor cell lines engineered to carry conditional HDAC alleles have shown that HDAC1 and HDAC2 have redundant functions: although they are dispensable in post-mitotic cells, at least one of them is required for tumor cell survival (Haberland et al., 2009). Mutations and deregulation of histone lysine demethylases (HDMs) have been found in many types of cancer (Chi et al., 2010). Perturbed patterns of histone tail methylation are associated with increased chromosomal instability and tumor risk, and this applies to both ‘writing’ and ‘erasing’ of the methylation code. For example, lack of histone H3 lysine 9 methylation in histone methyltransferase (HMTase)-deficient mice leads to impaired genomic stability (Peters et al., 2001).
As lung cancer and COPD share many characteristics in the early stages of disease development, and since each condition increases the susceptibility to the other, it is reasonable to assume that they may be affected by similar genomic perturbations (Sundar et al., 2011; Yang et al., 2011).
In a recent study, white blood cell DNA isolated from subjects with COPD well defined by spirometry was used for an array-based DNA methylation screen. The association of 349 CpG sites with COPD severity was replicated in two family-based cohorts, and several of the 330 affected genes were related to immune and inflammatory processes. A significant association was found between hypomethylation of SERPINA1 (the gene encoding α1-antitrypsin) and low lung function (Qiu et al., 2012).
Cigarette smoke has considerable oxidative stress potential, leading to an imbalance between histone acetylation and deacetylation. This may account for the enhanced expression of inflammatory mediators, which in turn amplifies pulmonary inflammation (Rahman et al., 2002; Rahman, 2002). Cigarette smokers have increased acetylation of histones H3 and H4 (Szulakowski et al., 2006), and in C57BL/6J mice, cigarette smoke exposure increases the levels of acetylated histones H3 and H4 on lysine 9 after just 3 days of exposure (Yang et al., 2008). In rat lung, cigarette smoke alters histone acetylation and, as in COPD, the resulting excessive release of pro-inflammatory cytokines is insensitive to glucocorticoid treatment (Marwick et al., 2004). Mechanistically, the abnormal histone acetylation appears to result from cigarette smoke-induced phosphorylation and subsequent ubiquitination and proteasomal degradation of HDAC2, as documented in both in vitro (macrophages, human bronchial and primary small airway epithelial cells) and in vivo (mouse lung) smoke exposure models (Adenuga et al., 2009; Adenuga and Rahman, 2010). COPD patients have reduced HDAC activity in peripheral lung tissue, alveolar macrophages, and bronchial biopsy specimens (Ito et al., 2005). Lung tissue and peripheral blood mononuclear cells (PBMC) from COPD patients were also found to have reduced levels of sirtuin 1, a class III HDAC that is also classified as an anti-aging molecule (Rajendrasozhan et al., 2008; Ito and Barnes, 2009). Consistent with the reduced HDAC levels and activity, acetylation of histones H2A, H2B, H3, and H4 is increased in the lungs and alveolar macrophages of COPD patients (Chen et al., 2008).
Szulakowski et al. reported a correlation between COPD severity and decreased cytoplasmic HDAC2 levels, and between both cytoplasmic and nuclear HDAC2 levels and reduced lung function (Szulakowski et al., 2006). Oxidative stress is one of the major causative factors in COPD and most likely leads to the observed degradation of HDAC2, with a consequent increase in the expression of pro-inflammatory cytokines. It has been proposed that inhibition of HDAC2 activity also contributes to the glucocorticoid resistance of COPD inflammation (Barnes et al., 2005; Barnes, 2006; Barnes, 2009).
Among the many risk factors for cardiovascular disease (CVD), smoking is causally linked to both disease onset and progression. Vascular remodeling, a characteristic of CVD, involves chronic inflammation and the release of various cytokines and chemokines, which largely drive the initiation and progression of atherosclerotic lesions (Wierda et al., 2010). Lung inflammation, the major driver of both lung cancer and COPD, has also been linked to CVD. It is believed that when lung inflammation, or even acute lung injury, becomes systemic, it further stimulates events that lead to activation of the vascular endothelium, heart attack, and stroke (Van Eeden et al., 2012).
Direct DNA damage has been linked to cardiovascular disease; the frequency of micronuclei, an indicator of DNA damage, correlates with the severity of atherosclerosis. Genomic instability, in the form of LOH and microsatellite instability (MI), has been documented in smooth muscle cells of human plaques. The loci involved include the TGF-β1 receptor (MI), mismatch repair genes (LOH), and nitric oxide synthase (LOH). It is currently unclear what the source of DNA damage in the disease is, or whether the damage is a cause or a consequence of disease progression (Andreassi and Botto, 2003).
Several independent studies now show that DNA methylation is an important aspect of CVD pathology. Analyses of peripheral lymphocytes indicated that DNA from subjects with angiographically confirmed coronary artery disease (CAD) was more extensively methylated than that of healthy controls. Global DNA methylation also correlated significantly and positively with plasma homocysteine levels, an independent risk factor in CAD patients (Sharma et al., 2008). Kim et al. measured global genomic DNA methylation within Alu and Satellite 2 repetitive elements in peripheral blood lymphocytes (PBL) as a population-based CVD risk assessment. Their study showed a positive correlation between PBL DNA methylation and the prevalence of CVD or its risk factors, more pronounced in men than in women (Kim et al., 2010). Finally, in the apolipoprotein E (apoE) mutant mouse, an animal model of CVD, epigenetic changes in PBMC DNA were found within coding sequences as well as interspersed repeat sequences. Strikingly, the altered methylation pattern could be detected before any noticeable atherosclerotic lesions were present, suggesting that DNA methylation plays a causative role during CVD development (Lund et al., 2004). Epigenetic gene regulation has also been reported in platelets from smokers, in a study of monoamine oxidase B (MAOB) (Launay et al., 2009). These authors reported that “the methylation frequency of the MAOB gene promoter was markedly lower in smokers than in non-smokers, due to cigarette smoke-induced increase of nucleic acid demethylases activity”.
In addition to DNA methylation, vascular homeostasis and atherosclerosis biology depend on the histone code. Although some HDAC-mediated functions are cytoplasmic (i.e., not associated with chromatin), chromatin immunoprecipitation has shown that the FGF2 promoter is bound by HDAC5. The promoter is therefore presumed to be repressed by HDAC5 and kept transcriptionally inactive (Zhou et al., 2011). Post-translational histone modifications have also been implicated in the regulation of endothelial nitric oxide synthase (eNOS) in endothelial cells, an important feature of vascular biology. It has been proposed that histone modifications are essential for maintaining eNOS expression and that erasing these histone marks results in hypoxic repression of the eNOS gene (Fish et al., 2010).
Because DNA methylation plays a critical role in many cellular processes, including chronic inflammation, which underlies many smoking-related diseases such as COPD, lung cancer, and CVD, whole-genome methylation signatures could provide valuable information on perturbations resulting from cigarette smoking. DNA methylation signatures, and changes in them, complement conventional gene expression profiling in assessing the extent of damage that smoking causes to different organs. Li et al. carried out a whole-genome DNA methylation analysis of human PBMCs and characterized distinct methylation signatures for 20 genomic features spanning regulatory, coding, non-coding, RNA-coding, and repeat sequences. There was a considerable inverse correlation between allele-specific methylation (ASM) and allele-specific gene expression (ASE). As PBMCs are an important, non-invasively obtained sample source, risk assessment could benefit considerably from analysis of the PBMC DNA methylome (Li et al., 2010). As with gene expression analyses, the challenge is to categorize the individual methylated genes in a meaningful way to build a predictive classifier. Assessment of network perturbation amplitude, by applying systems biology data to causal biological networks, could identify specific signatures beyond single genes for each of the many smoking-related diseases (Martin et al., 2012).
Methylation analysis should be explored, alongside gene expression, as a way to improve the stratification of subjects. Gene expression will most likely correlate inversely with methylation levels. Furthermore, this work may pinpoint “interchromosomal networks” (chromosomal regions) that are subject to joint epigenetic mechanisms regulating gene expression (Zhao et al., 2006). A variety of DNA methylation biomarkers have been developed for tumor classification as well as for disease and therapy prognosis (Sandoval and Esteller, 2012).
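The expected inverse relationship between methylation and expression, as in the ASM/ASE comparison above, is typically quantified with a correlation coefficient. The sketch below computes a Spearman rank correlation from scratch, using average ranks for ties; it is a generic statistic chosen for illustration, not necessarily the one used by Li et al., and the methylation and expression values are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical promoter methylation fractions and expression values for 5 genes
methylation = [0.05, 0.20, 0.45, 0.70, 0.90]
expression = [12.0, 9.5, 9.8, 3.1, 1.2]
print(round(spearman(methylation, expression), 3))
```

A rank-based coefficient is used here because methylation and expression are rarely linearly related; it only asks whether higher methylation tends to accompany lower expression.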
As reviewed above, a large body of information exists on the genomic and epigenetic changes caused by smoking and other xenobiotic compounds. A more comprehensive picture of these perturbations can, however, be obtained with new and advanced technologies. Pleasance et al. (2010) assessed the genome-wide mutational load of an SCLC cell line using massively parallel sequencing. The authors presented a comprehensive profile of somatically acquired mutations, including base substitutions, insertions, and deletions, as well as copy number changes and genomic rearrangements. The 23,000 SCLC mutations could be classified into distinct mutation signatures. Overall, these remarkable mutational patterns in the SCLC genome highlight the utility of genome-wide analyses for obtaining true DNA signatures associated with lung cancer and carcinogen exposure; the patterns would not have been identified by sequencing only limited genomic regions (Pleasance et al., 2010).
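Mutation signatures are usually built on top of a simple substitution spectrum. As a minimal sketch (not the procedure of Pleasance et al., which also considered sequence context and other mutation classes), the code below collapses single-base substitutions onto the six pyrimidine-referenced classes and reports their proportions. An excess of C>A (equivalently G>T) changes is frequently reported in tobacco-associated tumors; the example input here is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

BASES = {"A", "C", "G", "T"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a single-base substitution onto the pyrimidine (C or T) strand.

    G>T on one strand is the same event as C>A on the other, so the 12 possible
    substitutions reduce to 6 classes: C>A, C>G, C>T, T>A, T>C, T>G.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: report the complementary strand
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{ref}>{alt}"


def substitution_spectrum(subs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of substitutions falling into each of the six classes."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in subs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no substitutions supplied")
    return {cls: counts[cls] / total for cls in sorted(counts)}


# Hypothetical (reference, alternate) base pairs from a variant list
print(substitution_spectrum([("G", "T"), ("C", "A"), ("C", "T"), ("A", "G")]))
```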
Several massively parallel sequencing technologies, including pyrosequencing, fluorescence-based sequencing-by-synthesis, sequencing-by-ligation, ion semiconductor sequencing, and single-molecule real-time (SMRT) sequencing, have been developed in the last decade. They offer read lengths and throughputs ranging from about 75,000 long single-end reads (~1 kb) to 6 billion 2×100 bp paired-end reads (Supplementary Material; Niedringhaus et al., 2011).
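The spread between these platforms can be made tangible with simple yield arithmetic: total bases = reads × read length, and mean coverage = total bases ÷ genome size. The sketch below assumes an approximate haploid human genome size of 3.1 Gb and treats the 6 billion figure as individual 100 bp reads; if it instead counts read pairs, the yield and coverage double. Constant and function names are illustrative.

```python
HUMAN_GENOME_BP = 3.1e9  # approximate haploid human genome size (assumption)


def run_yield(n_reads, read_length):
    """Total sequenced bases for a run."""
    return n_reads * read_length


def mean_coverage(total_bases, genome_size=HUMAN_GENOME_BP):
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


long_read_bases = run_yield(75_000, 1_000)          # 75 million bases
short_read_bases = run_yield(6_000_000_000, 100)    # 600 billion bases
print(round(mean_coverage(long_read_bases), 3))     # well below 1x of a human genome
print(round(mean_coverage(short_read_bases)))       # roughly 194x
```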
Regardless of the technology used, high-throughput sequencing has applications in both genomics and epigenomics. Apart from its use in de novo sequencing and re-sequencing of whole genomes, deep sequencing of a genome (DNA-seq) enables a wide range of analyses. Structural variations, copy number variations, single nucleotide polymorphisms (SNPs), and small insertions or deletions can be identified by comparing a re-sequenced genome with a reference genome. If the genomic regions of interest are known, targeted re-sequencing can be used to reduce cost, complexity, and time.
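Comparison of a re-sequenced genome against a reference can be illustrated with a deliberately naive single-nucleotide variant caller working on per-position base pileups. This is a sketch under stated assumptions: reads are already aligned, `pileup` maps a 0-based reference position to the bases observed there, and the depth and allele-fraction thresholds are hypothetical. Production variant callers additionally model base quality, mapping quality, and strand bias.

```python
from collections import Counter


def call_snvs(reference, pileup, min_depth=10, min_alt_fraction=0.2):
    """Return (position, ref, alt, alt_fraction) where a non-reference base is well supported.

    reference: reference sequence as a string.
    pileup: dict mapping 0-based position -> list of aligned read bases.
    """
    calls = []
    for pos in sorted(pileup):
        bases = pileup[pos]
        if len(bases) < min_depth:
            continue  # too few reads to call reliably
        ref = reference[pos]
        alt_counts = Counter(b for b in bases if b != ref)
        if not alt_counts:
            continue  # every read agrees with the reference
        alt, n_alt = alt_counts.most_common(1)[0]
        fraction = n_alt / len(bases)
        if fraction >= min_alt_fraction:
            calls.append((pos, ref, alt, fraction))
    return calls


# Hypothetical pileup over an 8 bp reference
reference = "ACGTACGT"
pileup = {
    2: ["G"] * 6 + ["T"] * 6,  # heterozygous-looking G>T
    5: ["C"] * 12,             # matches the reference
    6: ["G"] * 3 + ["A"] * 2,  # below minimum depth
}
print(call_snvs(reference, pileup))  # [(2, 'G', 'T', 0.5)]
```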
Whole-genome DNA methylation can be investigated either at single-nucleotide resolution, using shotgun bisulfite sequencing (MethylC-seq) and reduced representation bisulfite sequencing (RRBS), or at a resolution of a few tens of nucleotides, using methylated DNA immunoprecipitation (MeDIP-seq), methylated DNA capture by affinity purification (MethylCap-seq), methyl-CpG binding domain sequencing (MBD-seq), or methylation-sensitive restriction enzyme sequencing (MRE-seq) (Bock et al., 2010; Harris et al., 2010). MethylC-seq and RRBS identify methylated cytosines by comparing the bisulfite-treated sequence with the untreated sequence: bisulfite converts unmethylated cytosines to uracil, which is read as thymine, whereas methylated cytosines remain unchanged. With MeDIP-seq, MethylCap-seq, and MBD-seq, methylated genomic DNA fragments are enriched before sequencing; in contrast, MRE-seq enriches unmethylated fragments. These enrichment-based methods reduce the amount of sequencing needed but do not provide single-nucleotide resolution.
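The comparison underlying bisulfite sequencing can be sketched in a few lines: simulate conversion of unmethylated cytosines to thymine, then estimate methylation at each reference CpG as C / (C + T) across reads. This is a minimal sketch, assuming top-strand reads that are already aligned to start at reference position 0, with no sequencing errors and no C>T polymorphisms; dedicated bisulfite aligners (e.g. Bismark) handle both strands, alignment, and mismatches. The sequences and function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment of the top strand: unmethylated C reads as T."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def cpg_methylation(reference, reads):
    """Methylation level at each reference CpG, estimated as C / (C + T) across reads."""
    levels = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] != "CG":
            continue
        observed = [read[i] for read in reads if i < len(read)]
        c, t = observed.count("C"), observed.count("T")
        if c + t:
            levels[i] = c / (c + t)
    return levels


# Hypothetical reference with CpGs at positions 1 and 5
reference = "ACGTTCGA"
reads = [
    bisulfite_convert(reference, {1}),      # only CpG 1 methylated
    bisulfite_convert(reference, {1, 5}),   # both CpGs methylated
    bisulfite_convert(reference, set()),    # neither methylated
]
print(cpg_methylation(reference, reads))  # CpG 1 about 0.67, CpG 5 about 0.33
```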
Histone modifications can be studied by chromatin immunoprecipitation (ChIP) with antibodies specific for a given histone modification. After precipitation, the DNA bound to the precipitated chromatin can be hybridized to a microarray (ChIP-chip) or sequenced on a high-throughput platform (ChIP-seq). ChIP-chip requires a large number of arrays to cover an entire mammalian genome, whereas ChIP-seq provides genome-wide coverage even at low sequencing depth (Barski et al., 2007).
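A basic step in ChIP-seq analysis is deciding where immunoprecipitated reads exceed what the input (control) library predicts. The sketch below is a simplified Poisson test on binned read counts, in the spirit of common peak callers but not the algorithm of any specific tool. It assumes reads have already been counted into equal-sized genomic bins; the counts, significance threshold, and the floor of 1.0 on the expected count are hypothetical choices.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Adequate for moderate thresholds; precision is lost for extremely small p-values.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_bins(chip_counts, input_counts, alpha=1e-5):
    """Return (bin, chip_count, expected, p) for bins significantly enriched over input."""
    scale = sum(chip_counts) / sum(input_counts)  # normalize for library size
    hits = []
    for i, (chip, control) in enumerate(zip(chip_counts, input_counts)):
        expected = max(control * scale, 1.0)  # floor avoids a zero expectation
        p = poisson_sf(chip, expected)
        if p < alpha:
            hits.append((i, chip, expected, p))
    return hits


# Hypothetical read counts in five adjacent bins
chip = [10, 12, 80, 9, 11]
control = [10, 10, 10, 10, 10]
print([hit[0] for hit in enriched_bins(chip, control)])  # [2]
```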
With the development of platforms based on high-throughput sequencing, genomics and epigenomics analyses are freed from the requirement for prior knowledge inherent to array-based platforms; these platforms move closer to measuring what is in the sample rather than what is on the chip. Moreover, because high-throughput sequencing directly measures the DNA present in a sample, it can capture several modifications and alterations at once. The resulting data are therefore more flexible to analyze and provide a wider spectrum of genomic information. Computational resources and methods for analyzing large-scale epigenomics datasets have been developed in recent years; however, as noted by Huss, “custom tools are needed to optimally analyze ChIP-seq data on histone modification and BS-seq data on DNA methylation” (Huss, 2010). Realizing the full potential of whole-genome epigenomics studies is also still hindered by the lack of guidelines and repositories for data submission; establishing them would speed up research and enable advanced applications such as therapeutic interventions (Ongenaert, 2010).
These next-generation technology platforms should be the preferred methods for DNA and chromatin analyses. They have proven to be powerful tools and are clearly the future means of identifying genetic and epigenetic aberrations triggered by compounds in tobacco smoke. In their recent review, Pfeifer and Hainaut put this assertion into perspective: “the analysis of a single cancer genome may generate almost twice as much mutation data as the whole literature on sequencing the p53 gene accumulated during 20 years” (Pfeifer and Hainaut, 2011).
Better understanding of the mechanisms underlying the major smoking-related diseases was the main aim of a recent report (U.S. Department of Health and Human Services, 2010); however, the report offered little on how this understanding might actually be gained. Product safety assessment through clinical and epidemiological studies, generally conducted decades after the initial exposure, rarely explains the mechanisms that link biological perturbations to a given exposure. In contrast, predictive systems biology approaches offer a valuable means of assessing biological perturbations before phenotypic outcomes manifest, as well as of pinning down the mechanisms involved in the body’s response to potentially harmful substances. Global measurements from in vitro cellular and in vivo animal models can be causally linked to a biological network, which offers immense power to identify important pathways and targets for intervention (Hoeng et al., 2012).
A major component of systems biology is the identification of gene-related network perturbations caused by xenobiotics. For example, a constitutively increased growth pattern in gene expression profiles may stem from an activating mutation in a cell growth signaling gene such as KRAS or EGFR, or from an epigenetic change affecting the same pathway. While both may produce similar pictures at the gene expression level, their impact on disease progression is clearly different. Integrating epigenetics and genomics into the systems biology approach will lead to the identification of a full panel of biomarkers and a more reliable depiction of the DNA damage caused by cigarette smoke. As some epigenetic and genomic changes, e.g. DNA methylation, can be reversible, targeting them has been considered for therapy. In this context, it is also of interest to study these effects after smoking cessation, to determine how many of the changes presented above reverse completely and how long the reversal process takes.
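One simple way to quantify reversal after cessation is to express, for each smoking-associated site, how much of the shift between never smokers and current smokers has disappeared in former smokers. The sketch below assumes group mean methylation values per CpG site are already available; the site names, values, and the 0.9 cut-off for "reverted" are hypothetical. Applying it to former smokers grouped by time since quitting would address how long the reversal takes.

```python
def reversal_fraction(never, current, former):
    """Share of the smoking-associated shift that has reverted in former smokers.

    1.0 means former smokers match never smokers; 0.0 means they match current smokers.
    """
    shift = current - never
    if shift == 0:
        raise ValueError("no smoking-associated shift at this site")
    return (current - former) / shift


def summarize(sites, threshold=0.9):
    """Compute reversal fractions and list sites at or above a (hypothetical) cut-off."""
    fractions = {name: reversal_fraction(*values) for name, values in sites.items()}
    reverted = [name for name, f in fractions.items() if f >= threshold]
    return fractions, reverted


# Hypothetical group means (never, current, former) of methylation beta values
sites = {
    "cpg_a": (0.80, 0.60, 0.79),  # hypomethylated in smokers, nearly restored
    "cpg_b": (0.50, 0.70, 0.62),  # hypermethylated in smokers, partly restored
}
fractions, reverted = summarize(sites)
print(reverted)  # ['cpg_a']
```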
Given the very rapid increase in knowledge of the genomic effects of cigarette smoke, we recommend that this research be continued to definitively determine the pathways (i.e., mechanisms) by which smoke causes diseases such as those described here. This genomic investigation will spur the formulation of new testable hypotheses and provide mechanistic insight into how exposure to chemicals and mixtures is related to disease onset and progression, thus providing a basis for biomarker selection and justification. It might then be possible to link specific chemicals (or classes of chemicals) in smoke to specific diseases, a long-term goal (Wynder, 1980) that has so far met with only minimal success. Research could then be pursued on reducing or eliminating those smoke constituents that systems biology identifies as necessary to substantially perturb disease networks, as opposed to relying on the current assumption that ‘harmful and potentially harmful constituents’ (U.S. Department of Health and Human Services, 2011) of the smoke matrix can be reliably identified by traditional toxicological or epidemiological studies.
Our approach may be more robust than the simplistic application of genetic toxicology testing to either whole smoke or cigarette smoke condensate. Mutagenicity testing of highly complex mixtures seems scientifically irrational, yet it is often used when comparing cigarette types (DeMarini et al., 2008) and smokeless products (Johnson et al., 2009). It might also be possible to take the now-established mechanisms for the different diseases in humans and determine whether these mechanisms also exist (or could be improved) in animal species used as surrogates for humans (Schleef, 2006), with the aim of building better animal models (Coggins, 2010). Transgenic strains of laboratory animals are becoming increasingly popular and seemingly limitless in scope.
It is very likely that multiple redundant pathways exist for each of the disease states we have reviewed. A systems biology approach relying on results from high-throughput sequencing studies could therefore contribute significantly to elucidating these pathways and their interplay. We propose that systems biology can go beyond toxicological assessment and be applied in other areas such as drug development, pharmacology, and personalized medicine. These measurements should be considered in order to better understand the mechanisms of action of cigarette smoke in disease development, as also indicated by Hammons and Lyn-Cook in their recent review (Hammons and Lyn-Cook, 2011). Furthermore, genomic and epigenetic mechanisms may serve as causal links between exposure, dose, and duration. Recently, Hou et al. summarized how exposure to environmental chemicals can cause epigenetic changes in a dose-dependent manner (Hou et al., 2012). Moreover, epigenetic mechanisms can explain how certain chemical compounds initiate the biological perturbations that result in malignancy (Stein, 2012).
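A dose-dependent epigenetic change can be summarized, as a first approximation, by the least-squares slope of methylation against an exposure measure. The sketch below assumes a linear relationship and uses hypothetical pack-year and methylation values; real dose-response relationships may be nonlinear and would need covariate adjustment and uncertainty estimates.

```python
def ols_fit(doses, responses):
    """Ordinary least-squares slope and intercept of response on dose."""
    n = len(doses)
    mean_x = sum(doses) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in doses)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(doses, responses))
    if sxx == 0:
        raise ValueError("doses must vary to estimate a slope")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical exposure (pack-years) and methylation beta values at one CpG site
pack_years = [0, 10, 20, 30, 40]
methylation = [0.80, 0.76, 0.72, 0.68, 0.64]
slope, intercept = ols_fit(pack_years, methylation)
print(round(slope, 4), round(intercept, 2))  # about -0.004 per pack-year, intercept about 0.8
```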
The limitations of traditional risk assessment include the lack of mechanistic information, uncertainty about the toxicity pathways that might be affected, the relevance of toxicity endpoints, default assumptions for dose-response extrapolation, and uncertainty about intra- and inter-species “safety factors” (Barlow et al., 2006; Edwards and Preston, 2008; Hartung, 2009). In this review, we have focused on the genomic and epigenomic perturbations caused by tobacco smoke and on how the latest technological developments enable the research community to investigate them. In general, toxicogenomics is not yet fully accepted in the toxicological sciences, owing to regulatory concerns and the lack of published proof-of-concept studies (Mendrick, 2008). These issues are currently being addressed, and advances in molecular systems biology are unveiling the multifaceted nature of disease processes. The toxicology community should incorporate this emerging knowledge through an iterative approach of examining proofs of concept, assessing the value of new information, and developing decision rules. Programs such as NexGen (http://www.epa.gov/risk/nexgen/index.htm) from the US Environmental Protection Agency are guiding the community in this direction.
For a complete understanding of the effects of these perturbations, all components of network biology should be included (Figure 1). This enables mechanistic understanding of the biological responses that attempt to protect the organism from potentially harmful substances, as well as identification of predictive biomarkers for disease onset. Ultimately, combining quantifiable genomic and epigenomic perturbations will allow us to measure the genomic impact of a substance on a biological system.
The affiliations of the authors are shown on the cover page. The authors are all employees of Philip Morris International (PMI), with the exception of Christopher Coggins, who served as a paid consultant to PMI for the preparation of this review. PMI is one of the world’s largest producers and marketers of cigarettes. Christopher Coggins is an independent toxicology consultant specializing primarily in issues concerning the health impacts of airborne materials from environmental or occupational exposures or from use of consumer products such as tobacco-containing products. The views expressed in this paper are solely those of the authors, and the paper was prepared exclusively by the authors.