Esophageal squamous cell carcinoma (ESCC) is one of the most common malignant tumors worldwide and carries a poor prognosis. The poor prognosis reflects both the advanced stage at which most patients are diagnosed and the limits of clinical staging, which lacks molecular biomarkers that can effectively stratify patients for treatment. Because cancer is a disease of genome instability that results from the accumulation of genetic alterations, a growing number of chromosomal and genomic technologies have been developed, and they have progressed rapidly enough to characterize patients at the genomic level. In this review, we summarize the applications of these technologies and the research progress made at the chromosomal and genomic levels in ESCC.
Esophageal cancer (EC) is the eighth most common cancer and the sixth most common cause of cancer-related death in the world, with an estimated 456,000 new cases and 400,000 deaths in 2012 1. In China, however, the latest cancer statistics 2 rank EC as the third most commonly diagnosed cancer among men and the fifth among women, and as the fourth leading cause of cancer death in both sexes. EC can be divided into esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC), two subtypes that are completely distinct in their histopathological, epidemiologic and molecular features 3. ESCC accounts for about 90% of EC cases worldwide (Figure 1), and the 5-year survival rate for patients with ESCC, although it has improved during the past decade, remains generally poor 4. Two factors contribute to this poor prognosis: many patients already show lymph node metastasis and tumor invasion into adjacent organs at the time of diagnosis, and effective chemotherapeutic approaches for ESCC are lacking 5. Current clinical staging approaches are limited in their ability to stratify patients effectively for treatment options. The concept of precision medicine, which couples established clinical-pathological indexes with molecular profiling to create preventive, diagnostic, prognostic, and therapeutic strategies precisely tailored to each patient's requirements 6, 7, has been put forward in recent years. This concept gives access to personalized molecular information and may bring us closer to curing diseases.
In this review, we summarize the chromosomal and genomic technologies used in clinical practice and research on ESCC in recent decades. Our aim is to clarify the chromosomal and genomic characteristics and variations of ESCC and to give a direct view of precision medicine at the level of molecular profiling.
A range of chromosomal and genomic technologies have been used in clinical practice and research, including Southern blot analysis 8, Sanger sequencing 9, fluorescence in situ hybridization (FISH) 10, DNA microarray 11, PCR-based methods 12, comparative genomic hybridization (CGH) 13, spectral karyotyping (SKY) 14, next-generation sequencing (NGS) 15, and third-generation sequencing 16 (Figure 2, Table 1). We summarize the characteristics of these technologies in order to understand the chromosomal and genomic variations of ESCC in detail.
The Southern blot, named after its inventor, is a method for detecting a specific DNA sequence or identifying methylated sites in a sample. Using Southern blotting, coamplification of genes such as hst-1 and int-2 17, and MYEOV and CCND1 18, was observed in ESCCs. As a conventional method, the Southern blot has limitations, notably the harmful radiation from the radioisotopes needed to detect the DNA sequence. Nowadays it is used mainly as a validation tool to confirm the accuracy of other methods, such as quantitative polymerase chain reaction (qPCR) 19 and array-based CGH 20, 21.
Sanger sequencing, the most widely used direct DNA sequencing method, was developed in 1977 and is based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase. After PCR was developed in 1983, the two methods were combined, with sequencing typically following PCR amplification 22. Recently, Sanger sequencing has been partly supplanted by next-generation sequencing (NGS), owing to the high throughput and automation of NGS genome analysis. However, the Sanger method remains in wide use because of its accuracy, especially for validating NGS results 23 and for smaller-scale clinical projects.
FISH uses fluorescent probes that bind only to regions of the chromosome with a high degree of sequence complementarity. It has been widely used to detect amplification 24, deletion 25, and rearrangement 26 of targeted sequences in situ, with the signal captured by fluorescence microscopy. Its main limitations are the subjectivity of interpretation between operators and a resolution too low to reveal fine-scale variation.
DNA microarrays offer a high-throughput genomic approach for screening chromosomal alterations systematically. They can measure alterations in large numbers of genes simultaneously or genotype multiple regions of a genome to evaluate tumor heterogeneity 27. However, the complexity of designing and manufacturing DNA microarrays has limited the use of this technology.
PCR is a widely used technology that amplifies a particular DNA or cDNA sequence to generate thousands to millions of copies. It is the foundation of many applications, such as PCR-sequencing, PCR-restriction fragment length polymorphism (PCR-RFLP), PCR-single strand conformation polymorphism (PCR-SSCP), and amplification refractory mutation system PCR (ARMS-PCR). Its major limitation is that the target sequence must be known before primers can be designed. Moreover, because PCR is highly sensitive, any contamination of the sample can produce misleading results 28.
PCR-sequencing, which combines PCR amplification with sequencing, is widely used in scientific research and molecular pathology to detect single nucleotide polymorphisms (SNPs), mutations, and gene fusions in a particular region of one or more genes.
PCR-RFLP is a technique that exploits variations in homologous DNA sequences. After PCR amplification, the DNA products are digested with restriction enzymes, and the resulting restriction fragments are separated by length using gel electrophoresis. RFLP analysis has been an important tool in genome mapping, in localizing genes for genetic disorders, and in determining disease risk 12.
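The digestion step of PCR-RFLP can be illustrated in silico. The following Python sketch is only an illustration under stated assumptions: the two amplicon sequences are invented, the function name `digest` is hypothetical, and only the top strand is modelled. It uses the EcoRI recognition site (GAATTC, cut after the first G) to show how a single-base difference changes the fragment pattern that would appear on a gel.

```python
def digest(amplicon, site="GAATTC", cut_offset=1):
    """Return the fragment lengths produced by cutting at every `site`.

    cut_offset is the position within the recognition site where the
    strand is cut (EcoRI cuts G^AATTC, so the offset is 1).
    Simplification: only one strand is modelled.
    """
    fragments = []
    start = 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments


# Hypothetical amplicons: allele A carries an intact EcoRI site,
# allele B has a single-base change (A>G) that destroys it.
flank5 = "ATGCCGTAGCTAG"
flank3 = "CGATCGATTAGCCGTAGGCTA"
allele_a = flank5 + "GAATTC" + flank3
allele_b = flank5 + "GAGTTC" + flank3

print("allele A fragments:", digest(allele_a))  # two fragments
print("allele B fragments:", digest(allele_b))  # one uncut fragment
```

A heterozygous sample would be expected to show both patterns together, that is, the uncut band plus the two digestion products.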
SSCP refers to the conformational differences between single-stranded nucleotide sequences of identical length that arise from differences in their sequences under certain experimental conditions. This property allows sequences to be distinguished by gel electrophoresis, which separates the fragments according to their conformations 29.
ARMS-PCR rests on the observation that oligonucleotides with a mismatched 3'-residue will not function as PCR primers under appropriate conditions 30. It is simple, rapid and reliable, allowing direct analysis of any locus of interest, provided that sufficient sequence data are available.
Real-time qPCR builds on conventional PCR by monitoring the amplification signal of a targeted DNA molecule during the reaction. Two detection methods are common. (1) Non-specific fluorescent dyes, such as SYBR Green, intercalate with any double-stranded DNA. This is a low-cost and accurate way to detect gene copy number alteration (CNA), including copy number amplification 31 and deletion 32. (2) Sequence-specific DNA probes, such as TaqMan probes, consist of oligonucleotides labelled with a fluorescent reporter that permits detection only after the probe hybridizes with its complementary sequence. A series of TaqMan assays have been designed to detect SNPs 33, CNAs 34 and gene mutations. Because it uses a specific DNA probe, this method offers better specificity and sensitivity, wider application and higher accuracy than the SYBR Green method, but at a higher cost.
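To make the link between qPCR signal and copy number concrete, the sketch below applies the widely used comparative Ct (2^-ΔΔCt) calculation. This calculation is not described in the text above and is shown only as one common approach. It assumes roughly 100% amplification efficiency for both the target and the reference gene and a diploid (two-copy) normal sample. The function name and the Ct values are hypothetical.

```python
def relative_copy_number(ct_target_tumor, ct_ref_tumor,
                         ct_target_normal, ct_ref_normal,
                         normal_copies=2.0):
    """Estimate target gene copy number in a tumor by the comparative Ct method.

    Assumptions (not taken from the review): amplification efficiency is
    close to 100% for both assays, and the normal sample carries
    `normal_copies` copies of the target gene.
    """
    delta_tumor = ct_target_tumor - ct_ref_tumor      # normalize to reference gene
    delta_normal = ct_target_normal - ct_ref_normal
    delta_delta_ct = delta_tumor - delta_normal       # tumor relative to normal
    return normal_copies * 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the tumor than in the normal sample, relative to the reference gene.
copies = relative_copy_number(22.0, 24.0, 24.0, 24.0)
print(f"estimated copies in tumor: {copies:.1f}")
```

Each cycle of difference corresponds to a twofold change in starting template, so a shift of two cycles implies roughly four times the normal copy number under these assumptions.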
Digital PCR partitions a DNA sample into thousands of separate reactions, for example droplets within an emulsion, and amplifies them all in parallel. Counting the positive partitions allows quantitation of mutant alleles or of the copy number of a specific gene in the sample.
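Quantitation by digital PCR is usually based on counting positive partitions and applying a Poisson correction, because a positive droplet may contain more than one template molecule. The sketch below illustrates that calculation; it is a minimal example under the assumption that molecules distribute randomly and independently across partitions. The function names and droplet counts are hypothetical.

```python
import math


def copies_per_partition(positive, total):
    """Mean number of template copies per partition (Poisson-corrected).

    If templates are distributed randomly, the fraction of empty
    partitions is exp(-lambda), so lambda = -ln(1 - positive/total).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1.0 - positive / total)


def mutant_allele_fraction(mutant_positive, wildtype_positive, total):
    """Fraction of mutant copies among all copies of the locus."""
    lam_mut = copies_per_partition(mutant_positive, total)
    lam_wt = copies_per_partition(wildtype_positive, total)
    return lam_mut / (lam_mut + lam_wt)


# Hypothetical run: 20,000 droplets, 400 positive for the mutant probe
# and 6,000 positive for the wild-type probe.
fraction = mutant_allele_fraction(400, 6000, 20000)
print(f"mutant allele fraction: {fraction:.3f}")
```

The correction matters most when a large share of partitions is positive; at low occupancy the corrected value is close to the raw positive fraction.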
CGH is a powerful method that can survey the entire genome of tumor cells to detect DNA CNAs in a single hybridization experiment 35. It offers better resolution than the more traditional cytogenetic technique of FISH, which is limited by the resolution of the microscope used. Nowadays, CGH is combined with DNA microarrays to detect unbalanced chromosomal abnormalities.
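CGH data are commonly summarized as the log2 ratio of tumor signal to reference signal for each probe or region. The sketch below shows how such ratios might be turned into gain and loss calls. The thresholds of +0.3 and -0.3 are illustrative assumptions rather than values from the studies cited here, and the function name and signal intensities are hypothetical.

```python
import math


def call_copy_number(tumor_signal, reference_signal,
                     gain_threshold=0.3, loss_threshold=-0.3):
    """Classify one probe as 'gain', 'loss' or 'neutral'.

    The thresholds are illustrative; real analyses tune them to the
    platform, tumor purity and noise level, and usually segment
    neighbouring probes before calling.
    """
    log_ratio = math.log2(tumor_signal / reference_signal)
    if log_ratio >= gain_threshold:
        return "gain"
    if log_ratio <= loss_threshold:
        return "loss"
    return "neutral"


# Hypothetical probe intensities as (tumor, reference) pairs.
probes = {"probe_1": (2000.0, 1000.0),
          "probe_2": (1000.0, 1000.0),
          "probe_3": (500.0, 1000.0)}
for name, (tumor, reference) in probes.items():
    print(name, call_copy_number(tumor, reference))
```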
SKY visualizes all the chromosome pairs of an organism simultaneously in different colors, which helps to detect or define genomic changes such as derivative chromosomes or chromosomal rearrangements 35. However, its ability to resolve complex rearrangements is limited.
The high demand for low-cost sequencing has driven the development of NGS technologies that parallelize the sequencing process, producing thousands or millions of sequences concurrently 23, 36-38. With the wide use of NGS, the genetic landscapes of various diseases have been reported. However, without bioinformatic analysis, the costly results obtained from NGS can be worthless. In addition, sequences obtained from NGS need to be validated by other, classical methods.
Third-generation sequencing is characterized by single-molecule sequencing without the need for PCR amplification. This allows longer reads, faster sequencing and, in principle, greater accuracy. Because long sequences are read directly, data analysis is simpler and requires less assembly of sequence fragments. Moreover, RNA and methylated DNA sites can be sequenced directly.
Chromosomal variations are missing, extra, or irregular portions of chromosomal DNA. They can arise from an atypical number of chromosomes or from a structural abnormality in one or more chromosomes or chromosomal segments.
The most frequently detected chromosomal gains were found on 1p, 1q, 2p, 3q, 5p, 6p, 7p, 7q, 8q, 9q, 11q, 12p, 14q, 15q, 16p, 16q, 17q, 18p, 19q, 20p, 20q, 22q, and Xq, while the most frequent losses involved 3p, 3q, 4p, 4q, 5q, 6q, 7q, 9p, 13p, 13q, 18q and 19p 35, 39. Gain of 12p is indicative of poor prognosis after esophagectomy 39. However, it is worth pointing out that, given the limited technologies of that period, these results should be confirmed by further studies.
In recent years, high-resolution array-based CGH (aCGH) has been applied to identify target oncogenes and tumor suppressor genes (TSGs) by defining recurrent gains and losses in various cancers. Studies on ESCC samples revealed recurrent, high-level amplifications in 3q26.32-33, 3q27.1, 7p11, 7p22.3, 8p11.23, 8q21.11, 8q24.21, 11q13.2-11q13.3, 11q22, 12p12.1, 12q15-q21.1, 13q22.1, 14q11.2, 14q13.3, 18q11.2, and 19q13.11-q13.12, and homozygous deletions in 1p15.4, 2q22.1-22.2, 3p14.2, 4p16.1-p15.1, 4q34.3-q35.1, 5q12.1, 6p22.1, 9p21.3, 9p24.1, 13q14.2, 14q12, and 22q13.1 13, 36, 40, 41. Gain of 11q13.2 and loss of 7q34 and 18q21.1-q23 were associated with poor outcome 41.
When a chromosome's structure is altered, several atypical forms can appear, such as deletions, duplications, translocations, inversions, and insertions. These often lead to an increased tendency to develop certain types of malignancy. Caixia Cheng et al. 42 analyzed whole-genome sequencing (WGS) data from 31 ESCCs to predict somatic structural variations (SVs) and determine copy number changes. They found that deletions and translocations were the dominant SV types and that 16% of deletions were complex deletions. Chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally misarranged chromosomes, which occurred in 55% of ESCCs.
Genetic variation refers to variation in the alleles of genes in a gene pool. It results in phenotypic variation when a difference in DNA sequence changes the order of amino acids in the encoded protein. Here we focus on gene CNAs and mutations in ESCC.
Gene mutation is an important mechanism leading to the alteration or loss of gene function. The most frequently mutated genes in ESCC, including TP53 22, PIK3CA 43, BRCA2 44, EGFR 45, NRF2 46, 47, and CDKN2A 48, were detected by several groups using traditional methods such as PCR-sequencing or PCR-SSCP. The genetic landscape of ESCC nevertheless remains incomplete, and additional genes are likely to play a role in the disease and its progression. Recent advances in sequencing technology have overcome past limitations of scale, so that thousands of mutations can be identified in a single sample 49. Chinese researchers have conducted whole-genome or whole-exome sequencing on ESCC patients to define the mutational landscape of ESCC and to provide a molecular foundation for understanding esophageal tumors 23, 36-38, 50. The mutations discovered by traditional methods were also detected by NGS, and a number of novel mutated genes were revealed in ESCC for the first time. Table 2 lists the frequently mutated genes (≥5%) in two NGS studies. The most frequently mutated genes included TP53, TTN, MLL2, CDKN2A, PIK3CA, NOTCH1, NFE2L2, EP300, ADAM29, and FAM135B. These genes are mostly involved in epigenetic processes (MLL2, EP300, CREBBP, TET2), the cell cycle (TP53, CCND1, CDKN2A, FBXW7), and the NOTCH (NOTCH1, NOTCH3), WNT (FAT1, YAP1, AJUBA) and receptor tyrosine kinase-phosphoinositide 3-kinase (PIK3CA, EGFR, ERBB2) signaling pathways 51.
Ling Zhang et al. 38 described the mutational signatures of ESCC. Signature A was characterized by C>G, C>T, and C>A mutations at TpCpX trinucleotides and was associated with mutations in the APOBEC family of cytidine deaminases. Signature B was characterized by an enrichment of C>T mutations at XpCpG trinucleotides, attributed to an elevated rate of spontaneous 5-methyl-cytosine deamination. Genes involved in cell cycle and apoptosis regulation were mutated in 99% of cases, and mutations in genes that regulate histone modification were observed in about 63% of ESCCs 23.
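The trinucleotide contexts that define the two signatures can be expressed as a simple motif check. The sketch below only tests whether a single cytosine substitution falls in a context consistent with Signature A (TpCpX) or Signature B (C>T at XpCpG) as described above. It is not the method used in the cited study: signature extraction is normally performed statistically across many mutations and samples, for example by matrix factorization. The function name is hypothetical.

```python
def consistent_signatures(trinucleotide, alt):
    """Return the set of signature labels whose motif matches one mutation.

    trinucleotide: reference context with the mutated C in the middle,
                   written on the strand where the reference base is C.
    alt:           the mutant base (A, G or T).
    Note that a C>T change at TpCpG matches both motifs.
    """
    context = trinucleotide.upper()
    alt = alt.upper()
    if len(context) != 3 or context[1] != "C" or alt not in {"A", "G", "T"}:
        raise ValueError("expected a C-centred trinucleotide and alt in A/G/T")

    labels = set()
    if context[0] == "T":                     # TpCpX, any substitution of C
        labels.add("A")
    if alt == "T" and context[2] == "G":      # C>T at XpCpG
        labels.add("B")
    return labels


# Hypothetical mutations given as (context, mutant base) pairs.
for context, alt in [("TCA", "G"), ("ACG", "T"), ("TCG", "T"), ("ACA", "T")]:
    print(context, ">", alt, sorted(consistent_signatures(context, alt)))
```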
To determine the clinical implications of some significantly mutated genes, investigators have focused on the prognostic value of genes with deleterious recurrent mutations. Mutations in EP300 23, 51, TET2 51, and FAM135B 37 were each associated with poor survival. In contrast, patients with NOTCH1 mutations had better outcomes than those without deleterious mutations 50. These results need further validation in multicenter studies.
Gene amplification is one of the major causes of proto-oncogene activation. As mentioned above, high-level amplifications in 3q26 (P63), 3q26.32-33 (SOX2, PIK3CA), 3q27.1, 7p11.2 (EGFR), 7p22.3, 8p11.23 (FGFR1), 8q21.11, 8q24.21 (MYC), 11q13.2-11q13.3 (CCND1, FGF3/4/19, CTTN, CPT1A), 11q22, 12p12.1 (KRAS), 12q15-q21.1 (MDM2), 14q13.3 (NKX2-1), and 18q11.2 were detected (Figure 3) 36. The amplified genes may be key drivers of carcinogenesis.
Wang et al. 52 assessed CCND1 amplification in 100 ESCCs and 11 normal tissues using real-time qPCR and found that 41% of the patients had CCND1 amplification; these patients had shorter survival than patients without CCND1 amplification. Our group's FISH results were consistent with this finding. The amplification rate of EGFR ranges from 7% to 15% and correlates with poor prognosis in ESCC patients 24, 34, 53. In ESCC, the 3q amplification peak includes only one annotated gene, SOX2 54. A copy number gain of SOX2 was observed in 6 of 40 primary ESCCs (15%) 55. De-Chen Lin et al. 36 examined 59 tumors with aCGH and found that CCND1, EGFR, MYC, KRAS, and FGFR1 were frequently amplified in ESCC. The amplification of FGFR1 was validated by FISH, and high FGFR1 amplification is an independent poor prognostic factor in resected ESCC 56. Our group's results showed that high FGFR1 amplification is a delayed poor prognostic factor in stage I and II patients (unpublished data).
Gene deletions and losses are less common than amplifications in ESCC. Relatively high-level deletions in 2q22.1-22.2 (LRP1B), 9p21.3 (CDKN2A/B), 5q12.1 (PDE4D), 9p24.1 (PTPRD), and 3p14.2 (FHIT) were detected 36. Madiniyet et al. 25 examined 40 ESCC surgical specimens for TP53 gene deletion using FISH and found that the deletion rate was significantly higher in poorly differentiated cases. The TP53 deletion rate correlated with the level of differentiation and with lymph node metastasis in ESCC. CDKN2A is a frequently deleted gene, with a loss rate of 14.3% (3/21) in ESCC patients 57. Takehiko et al. 32 investigated FBXW7 copy number aberrations in 38 laser-microdissected ESCC specimens using aCGH analysis and found FBXW7 copy number loss in 44.7% (17/38) of the clinical samples.
Unlike ESCC tissue samples, ESCC cell lines consist purely of cancer cells and play an important role in research on the molecular mechanisms of ESCC. Studying the chromosomal and genomic variations of these cells provides valuable insight for future studies that use them as ESCC models.
KYSE 180 is an ESCC cell line. Loss of DNA copy number was observed at 4p, 5q, 6q, 9, 10p, 12p, 13, 14p, 15p, 18p, 18q, 20, 22, and Y. Chromosomal gains and translocations affected all or part of 1, 2p, 3, 4p, 5p, 5q, 6p, 7, 8, 10q, 11, 12q, 14q, 16, 17q, 19, and Xp. Seven derivative chromosomes (5, 8, 12, 14, 14, 14, and 17) presented complex translocations, each involving three or four chromosomes 58. KYSE 410-4 is also an ESCC cell line. Chromosomal gains occurred at 2q, 3, 8, 17p, and X. A total of 16 structural rearrangements were detected, including four derivative chromosomes. Rearrangements of the centromeric regions accounted for approximately 44% of all rearrangements 59. Jianming Ying et al. 60 profiled ten commonly used ESCC cell lines (EC1, EC18, EC109, HKESC1, HKESC2, HKESC3, SLMT1, KYSE70, KYSE410 and KYSE520) for whole-genome DNA copy number alterations using aCGH. Recurrent chromosomal gains were frequently detected on 3q26-27, 5p15-14, 8p12, 8p22-24, 11q13, 13q21-31, 18p11 and 20q11-13, and frequent losses were found on 8p23-22, 11q22, 14q32 and 18q11-23. Gao et al. 23 performed exome sequencing on 8 cell lines, comprising 7 from the KYSE series of ESCC cell lines 14 and 1 immortalized esophageal squamous epithelial cell line, Het-1A. Among the 8 cell lines, the total number of mutations varied from 315 to 754.
These data provide detailed information to guide the appropriate use of these ESCC cell lines in cytogenetic and molecular biological studies.
As innovation in NGS has driven prices down and throughput up, projects have been moving from exome sequencing to whole-genome sequencing of tumor and matched germline samples, facilitating the discovery of new biology in ESCC. However, as data from different projects began to be collected and centralized, it became apparent that teams differ markedly in how they generate and analyze WGS data. Benchmarking strategies need to be explored to standardize sequencing methods and data analysis 61.
Unlike conventional biomarkers, the big-data-based edge biomarker is a new concept that characterizes disease features from biomedical big data in a dynamic, network-based manner, and it provides alternative strategies for indicating disease status in single samples 62.
The carcinogenesis of ESCC is generally a multistep process reflecting cumulative chromosomal and genetic alterations. Moreover, a single gene, such as CDKN2A or FBXW7, may be affected by multiple types of genetic variation. Although much research has been done on ESCC, the disease still lacks specific driver genes, comparable to HER2 in breast cancer, KIT and PDGFRA in gastrointestinal stromal tumor, or EGFR, EML4-ALK and ROS in lung cancer, that can be used to diagnose the disease, stratify patients, predict prognosis or serve as therapeutic targets. The era of big data has arrived, and multicenter studies on large groups of ESCC patients are urgently needed. Precision medicine based on genomic data could lead to new approaches to the prevention, diagnosis and treatment of ESCC.
This work was supported by Shanghai Municipal Commission of Health and Family Planning, Key-developing disciplines (No.2015ZB0201).