|Home | About | Journals | Submit | Contact Us | Français|
The incidence of esophageal adenocarcinoma (EAC) has risen 600% over the last 30 years. With a five-year survival rate of 15%, identification of new therapeutic targets for EAC is greatly important. We analyze the mutation spectra from whole exome sequencing of 149 EAC tumors/normal pairs, 15 of which have also been subjected to whole genome sequencing. We identify a mutational signature defined by a high prevalence of A to C transversions at AA dinucleotides. Statistical analysis of exome data identified significantly mutated 26 genes. Of these genes, four (TP53, CDKN2A, SMAD4, and PIK3CA) have been previously implicated in EAC. The novel significantly mutated genes include chromatin modifying factors and candidate contributors: SPG20, TLR4, ELMO1, and DOCK2. Functional analyses of EAC-derived mutations in ELMO1 reveal increased cellular invasion. Therefore, we suggest a new hypothesis about the potential activation of the RAC1 pathway to be a contributor to EAC tumorigenesis.
In recent decades, the incidence of esophageal adenocarcinoma (EAC) has increased dramatically in the United States and other Western countries1,2. The increasing frequency and poor prognosis of these cancers is a substantial health concern. EAC does not develop from the naïve esophageal epithelium, but rather originates from intestinal metaplasia of the esophageal epithelium (Barrett’s esophagus) that develops in response to chronic gastroesophageal reflux. While the reason for the dramatic rise in these cancers is unknown, factors influencing the rising rates include gastroesophageal reflux disease (GERD), Barrett’s esophagus, and obesity3. There is great urgency to elucidate the genomic alterations underlying EAC in order to enhance understanding of these tumors, aid in early diagnosis, and identify therapeutic targets.
Knowledge of the somatic mutations in EAC has been limited to studies in small collections of tumors. These studies have identified frequent mutations in TP534 and CDKN2A5. Beyond these two genes, small focused studies have noted sporadic mutations in APC6, BRAF7, CDH18, CTNNB16, EGFR9,10, KRAS7, PIK3CA11, PTEN12, and SMAD412. While comparative whole exome sequencing has been reported for 11 EACs and esophageal squamous cell carcinomas, no clear contributors to EAC were identified at the gene level13.
Here, we describe the landscape and spectrum of genomic alterations in 149 fresh-frozen, surgically-resected cases of EAC including adenocarcinomas arising in the gastric-esophageal junction-GEJ not treated with chemotherapy or radiation prior to surgery. All cases were subjected to whole exome sequencing (WES) with 15 sample pairs also analyzed by whole-genome sequencing (WGS). Examination of the somatic alterations revealed a high frequency of mutations and rearrangements. Additionally, we identify a mutational signature defined by A>C transversions at AA dinucleotide sites (the latter adenine denotes the site of the mutation). Through systematic analysis of the mutated genes, we identify many genes not previously associated with this cancer. These include ELMO1 and DOCK2, upstream modulators of the RAC1 GTPase, and characterize the presence of mutations impacting signal transduction pathways. These results provide a foundation for further study and treatment of these cancers.
To identify somatic alterations in EAC, we performed WES on tumor-normal pairs from 149 patients and WGS on 16 pairs. Fifteen of the WGS samples have matched WES data, and 14 WGS samples were evaluated on mRNA expression arrays (Supplementary Fig. 1 and Supplementary Table 1). One tumor from which WGS was performed lacked matched WES data due to sequencing failure in this sample. Somatic mutations were identified using the MuTect and Indelocator tools14–16.
For WGS, tumors were sequenced to an average depth of 49X and the matched germline DNA samples were sequenced to 30X coverage with paired 101-basepair reads on Illumina HiSeq instruments (Supplementary Table 2). We identified a median of 26,161 genome-wide mutations per tumor (range 18,881–66,225) corresponding to a median mutation frequency of 9.9/Mb (range 7.1–25.2/Mb) relative to a haploid genome (Supplementary Table 3). The mutation frequency was highest in intergenic regions (17.0/Mb), lower in intronic regions (9.9/Mb) and lowest in coding exons (6.2/Mb) (Supplementary Table 4). This step-wise decrease in mutation frequency was consistently seen in other cancers16. Compared to other cancer types, this overall mutation frequency is high and exceeded only by lung cancer17,18 and melanoma19, diseases that emerge from clear mutagens. By contrast, analogous sequencing of colorectal adenocarcinomas (CRC) identified a mutation frequency of 5.6/Mb20 across the genome. The high mutation frequency in EAC suggests that these tumors may be exposed to significant mutagens, perhaps attributable to the harsh environment created by gastric refluxate and inflammation21.
We also analyzed the WGS data using the dRanger algorithm20 to identify chromosomal rearrangements. A total of 2,952 candidate rearrangements were identified with a median of 172 per tumor (range 77–402) (Supplementary Table 5). Consistent with array data showing a higher degree of structural alterations in EAC compared to CRC22, the number of rearrangements is much greater than observed with a comparable analysis of CRC genomes20. No correlation was observed between the number of mutations and rearrangements (R2=0.0046). Of the rearrangements identified, 20% were interchromosomal translocations. Among the intrachromosomal alterations identified, a majority (55%) involved aberrant fusions of two sequences located within 1 Mb of each other. To identify potential fusion gene products that may contribute to the pathogenesis of EAC, we examined data for predicted in-frame gene fusions. Thirty-eight such events were identified (Supplementary Table 6), but no recurrent gene fusions were detected.
Epithelial cancers often display variable mutation spectra pointing toward particular mutagenic stimuli. Therefore, we analyzed the spectrum of mutations in EAC detected by WGS. Earlier exome sequencing of EACs noted A>C transversions to be more common in EAC compared to squamous esophageal carcinoma13. Evaluating the WGS data, we found that A>C base changes comprised an average of 34% of total mutations (Supplementary Table 7). To comprehensively characterize the mutation spectra, we measured the frequencies of base mutations at different trinucleotide contexts and observed a preponderance of C>T transitions (39.2/Mb) as seen in most epithelial cancers. We further investigated the high frequency of A>C transversions (or equivalently T>G transversions on the complementary strand). These events showed preference (20.2/Mb) for the context AA – that is, at adenines flanked by a 5′ adenine and any 3′ nucleotide (Fig. 1a). In total, 84% of A>C mutations are flanked by a 5′ adenine. Expanding upon these findings, A>C transversions at AA dinucltoides were most pronounced when the 3′ base was a guanine (49.3/Mb) and lower when it was an adenine (8.0/Mb), cytosine (16.8/Mb), or thymidine (6.7/Mb) (Supplementary Table 8). To validate these results, genotyping of randomly selected AA>C mutations from intragenic regions showed a concordance rate of 100% (25/25). The high frequency of AA transversions appears to be unique to EAC as equivalent events have not been identified in other cancer types15–19,23,24.
In total, A to C transversions at AA sites accounted for 29% of the total mutations (Fig. 1a). Within individual tumors, these AA transversions accounted for 5–48% of mutations (Supplementary Table 8), and the event number was correlated with overall mutation frequency (R2=0.92) (Supplementary Fig. 2). When we exclude AA transversions, the mutation frequency is 8.5/Mb and still higher than most tumor types. Thus, AA>C mutations do not fully explain the elevated mutation frequency in EAC relative to other cancers.
We next characterized the distribution of mutations in genomic regions. While A>C transversions remained notable at the AA sites, the percentage of all mutations consisting of these transversions was significantly lower in exons (16%) than seen across the entire genome (AAG, P=0.001; AAT, P=0.0006; AAC, P=0.0007; AAA, P=0.0006; two-tailed Student’s t-test) (Fig. 1a). These results were consistent across the coding regions of all 16 cases evaluated by WGS (Supplementary Tables 9 and 10). In contrast, the attenuation of C>T transitions at CG dinucleotides in coding regions (39.2/Mb vs. 25.4/Mb) was smaller than that seen for AA transversions.
The reduction of AA transversions in coding areas relative to intergenic regions suggested that these mutations may be less likely to occur in transcribed regions or repaired effectively by transcription-coupled repair. To evaluate the potential impact of gene expression on mutation rates, we compared sample-specific frequencies of AA mutations within gene boundaries at varying levels of gene expression in 14 WGS samples from which mRNA was available for microarray expression profiling. Elevated expression was associated with lower global mutation frequency. Additionally, the impact of gene expression upon attenuating mutation rates was three-fold greater at AA sites than for mutations at other nucleotide contexts (Fig. 1b and Supplementary Table 11). This finding demonstrates a strong impact of local gene expression on the development of AA transversions in EAC.
Given the impact of transcription upon these mutations, we analyzed AA transversions for strand bias. The mutation rates for AA transversions in introns and exons were calculated separately depending upon if the adenine base is on the transcribed or non-transcribed strand. The results indicated that AA>C mutations are more common when the AA sites are located on the non-transcribed strand (12.4/Mb vs. 11.2/Mb; P = 0.0016, Student’s T-test, paired). When evaluating all other mutations, a strand bias was not detected (9.5/Mb vs. 9.5/Mb; P = 0.9086, Student’s T-test, paired) (Fig. 1c and Supplementary Table 12). These results suggest that AA transversions may be more effectively recognized and repaired when the mutated adenine is located on the transcribed strand.
We next analyzed WES data from 149 tumor/germline pairs (Supplementary Table 13). A mean coverage depth of 83.3x was achieved in neoplastic DNA and 85.9x in the non-cancerous tissue. 89% of exons were covered at 8x or greater depth for normal and at 14x for tumor, a threshold for which MuTect is powered to detect mutation above or equal to an allele fraction of 0.314,16,25. We evaluated mutation calling by comparing candidate coding mutations identified by WES to WGS calls from the same tumor. An 85.1% (2200/2585) concordance was observed for all events and 90% concordance at mutations present at greater than 0.1 allele fraction (Supplementary Table 14).
Four tumors had markedly higher coding mutation frequencies (14.6–50.9/Mb) than other cases. This pattern resembled that of CRC where a subset of tumors were hypermutated, largely attributable to microsatellite instability (MSI). Similarly, MSI-positive tumors have been reported to represent 7% of EAC26. These four cases with the highest mutation rates were found to be MSI-positive with the highest mutation frequency tumor having mutations in two mismatch repair genes MSH6 and MSH3 (Supplementary Table 15). By contrast, none of the 24 EAC samples with the next highest mutation frequency (greater than 5 mutations/Mb) scored positive for MSI. To avoid a potential confounding effect on statistical analysis, we omitted these MSI-positive cases from the final analysis, leaving 145 tumors.
A total of 17,383 mutations, consisting of 16,516 non-silent mutations and 1,954 insertion-deletion/null mutations were detected in the 145-sample cohort, for a median of 104 non-silent coding mutations per tumor (Fig. 2). The overall non-silent median mutation frequency was 3.51/Mb (range 0.97–10.8/Mb). We investigated whether the fraction of AA transversions was associated with clinical variables including age, stage, gender, and tumor location. Interestingly, a trend was seen wherein EACs developing within the tubular esophagus harbored a greater fraction of AA transversions compared to the tumor in the GEJ (P=0.076, two-tailed Student’s t-test); an intriguing result given the possibility of gastric refluxate into the lower esophagus to serve as a mutagenic insult (Fig. 2 and Supplementary Fig. 3). No other significant associations were identified.
We observed mutations in 8,331 genes, of which 3,639 were mutated in two or more samples (Supplementary Table 16). Among these, 199 genes were mutated in 5% or more of the tumors, including 33 genes mutated in over 10% of cases. To identify genes displaying evidence of positive selection for mutation, we used the mutation significance algorithm, MutSig14–16,25. This tool compares the mutation occurrence in each gene to that which would be expected by chance given a background mutation frequency model that factors in the mutation spectra, presence of silent mutations, mutation frequencies, and regional mutation frequencies along the genome18. We found 26 genes to be significantly mutated (FDR q<0.1) with two known EAC tumor suppressors, TP53 and CDKN2A, being the most significant (Fig. 2). With the exception of ARID1A, PIK3CA, and SMAD4, no other significantly mutated genes have been previously implicated in EAC although several had been implicated in other cancers.
Intriguingly, two significantly mutated genes, ELMO1 and DOCK2, are dimerization partners and intracellular mediators of the Rho family GTPase, RAC127,28. Because aberrant RAC1 activation has been implicated in malignant transformation of other cancer types, mainly by enhancing cellular motility29–33, recurrent mutations in these genes may be functionally important. While no RAC1 or RAC2 mutations were identified, ELMO1 or DOCK2 are mutated in 25 (17%) EAC samples with two samples having mutations in both factors and two samples have two independent mutations in DOCK2 (Fig. 3 and Supplementary Table 16). Notably, a single amino acid, p.K312 of ELMO1, is mutated in three tumors, which suggests a gain of function phenotype. DOCK2 is a guanine nucleotide exchange factor (GEF) that activates RAC1 directly through GTP loading27,34. To fully activate RAC1, DOCK2 and ELMO1 interact to relieve mutual autoinhibition28. In cancer models, ELMO1 and other DOCK family members have been associated with enhanced migration and invasion35,36. Mutations were also present in other RAC1 GEFs (TRIO, TIAM1, VAV2, and ECT2) (Supplementary Fig. 4). Furthermore, we previously observed focal copy-number gain of the 11q13 locus containing the serine/threonine kinase, PAK1, a principal downstream effector of RAC1 that has been shown to be oncogenic in breast cancer22,37. Taken together, the aberrant activation of genes related to RAC1 suggests that the motility pathway may be important to EAC.
To examine the significance of ELMO1 mutations, wild-type and mutant ELMO1 constructs were generated and introduced into NIH/3T3 cells. Based on studies in glioblastoma demonstrating correlative increase in cellular invasion with overexpression of wild-type ELMO136, we hypothesized that ELMO1 mutations would enhance cell invasion. Compared to GFP control, wild-type ELMO1 increased invasion by 7-fold (P = 0.0040, Student’s T-test, unpaired) (Fig. 3C). ELMO1 mutations (p.F59L, p.K312E, p.K312T, p.K349R, p.T421N) further resulted in a significant increase (2 to 7-fold) in invasion compared to wild-type ELMO1 (P-values in figure). These results suggest that ELMO1 mutations can increase invasiveness and potentially contribute to tumorigenesis in EAC.
Additional significantly mutated genes include members of the SWI/SNF family of chromatin-remodeling factors: ARID1A, SMARCA4, and ARID2. Together, these genes are mutated in 20% of tumors. The enzymatic subunit of the chromatin-remodeling complex, SMARCA4, has been established as a putative tumor suppressor14,38. Likewise, ARID1A and ARID2 have been implicated as tumor suppressors in cancers including gastric cancer39–42. Interestingly, a candidate protein fusion identified by WGS also targets SMARCA4. The predicted fusion between exon 11 of SMARCA4 and exon 14 of DNM2 might point to an alteration that results in a loss-of-function gene-phenotype (Supplementary Table 6). Mutations were also found in other chromatin modifying enzymes: PBRM143 and JARID2. Taken together, 24% (35/145) of EACs harbored mutations in genes encoding chromatin-modifying factors (Supplementary Figs. 4 and 5).
Another intriguing gene is SPG20, mutated in 7% of EACs with five of the mutations generated by AA transversions (Supplementary Fig. 6). Spartin, the gene product of SPG20, was reportedly mutated in Troyer syndrome44, a genetic disorder characterized by progressive muscle stiffness and limb paralysis. The functions of Spartin include endosomal trafficking of growth factor receptors, inhibition of bone morphogenic protein signaling, and ubiquitin targeting45. More recently, SPG20 hypermethylation has been linked to colon cancer progression.46
TLR4 was mutated in 6% of EACs. Germline polymorphisms in TLR4 correlate with risk of in Helicobacter pylori-mediated gastric carcinoma47. In lung models, TLR4 deficiency contributes to enhanced inflammation and tumorigenesis48. TLR4 activates the innate immune response to pathogen exposure through heterodimerization with MD-249. Notably, the mutations in TLR4 fall between amino acids p.D379 and p.F487, a region critical for MD-2 interaction50. One mutation impacts p.E439, a site essential for hydrogen bonding of TLR4 to MD-2 (Supplementary Fig. 6). These mutations suggest disruption of the TLR4/MD-2 complex as a potential driver of tumor progression in EAC.
We also identified other significant candidates, including the protein kinase A anchoring protein, AKAP6 (mutated in 8% of samples), E3 ubiquitin ligase, HECW1 (8%), and AJAP1 (6%), which mediates signaling at adherens junctions and increases invasiveness in cancer cell lines51. NUAK1 (ARK5) is mutated 3% of samples, which is notable given that MYC-overexpressing hepatocellular carcinoma models are dependent on NUAK152. The lysine acetyltransferase MYST3, recurrently targeted for translocation in leukemia53, was also mutated in seven specimens (5%) (Supplementary Fig. 6).
Beyond the genes mutated at a statistically significant frequency, we queried the data for mutations of biologic significance given their recurrence in other cancers. We identified mutations in EAC that had been seen two or more times across all cancers in COSMIC database54; we found 22 such genes (Supplementary Table 17). Additionally, ten genes were significantly mutated (FDR q<0.1) in limited analysis of COSMIC gene territory including KRAS, CTNNB1, and ERBB2 (Supplementary Table 18). These results indicate that genes not reaching statistical significance in the cohort may harbor mutations of biologic relevance in individual tumors.
We queried the data for mutations in genes encoding therapeutic targets with inhibitors approved for clinical use or in preclinical development55. Mutations in actionable genes were discovered in 23% of tumors with PIK3CA being the most frequently mutated (Fig. 4). When also evaluating amplification status of genes, 48% of tumors in this cohort have a genomic alteration in a gene with a targeted agent. The high frequency of focally amplified therapeutic targets22 exceeds that of mutation in these same genes in EAC. Therefore, determining how to effectively treat tumors with amplified targets, especially RTKs, should be considered a priority.
To explore the functional impact of the mutations, we performed unbiased, GO-term enrichment in the overall ranked MutSig list using the 8,356 genes with at least one non-silent mutation56,57. GO processes related to cell adhesion and chemotaxis ranked as enriched near the top of the list (Supplementary Table 19). These findings support the hypothesis that enhanced cellular motility and invasiveness plays an important role in EAC disease progression.
We also studied how cancer-associated pathways were disrupted by mutation in EAC. Cell-cycle control was altered by point mutation in 14% of EACs with most of the mutations occurring in CDKN2A (Fig. 5 and Supplementary Fig. 4). This process was also frequently affected by amplifications at the loci of CCND1, CCNE1, and CDK622. Although activation of β-catenin signaling is ubiquitous in CRC, mutations in this pathway were found in only 9% of EACs with two tumors having APC mutations that co-occur with either CDH1 or AXIN1 (Fig. 5 and Supplementary Fig. 4). Moreover, a potential AXIN1 fusion was identified by WGS in sample ESO-1060 spanning exon 5 of AXIN1 and exon 2 of GALNT7, which might alter normal gene function (Table 1). As in other cancer types, the TGFβ/SMAD signaling pathway was mutated in 18% of EAC tumors. The most recurrently altered gene in this pathway is SMAD4, which mutated in 10 samples and also subject to frequent copy-number loss (Fig. 5 and Supplementary Fig. 4).
We evaluated the frequency and manner of somatic alterations in mitogen-activated protein kinase (MAPK) and phosphatidylinositol 3-kinase (PI3K), two common pathways required for proliferation and survival of cancer cells. Unlike other epithelial tumor types, where such MAPK-pathway mutations are common, no BRAF mutations were observed, and NF1 and KRAS mutations were seen in only three (2%) and five tumors (3%), respectively. Three of the five KRAS mutations alter p.G12; however one EAC harbored a KRAS c.351A>C (p.K117N) event, a mutation caused by an AA transversion and previously observed in CRC58. The PI3K pathway was the most frequently altered oncogenic pathway altered by mutation (13%). PIK3CA was mutated in seven tumors, followed by PIK3R1 and PTEN in five and four tumors, respectively (Fig. 4b and Supplementary Fig. 4).
We explored mutations in the ErbB family of RTKs, which are important therapeutic targets in many cancer types. Although three samples harbored EGFR mutations, these alterations were not previously annotated in other tumors. Moreover, two of these alterations, p.S447Y and p.S1153I, were predicted by Polyphen-2 score59 to not be deleterious to normal function, and thus of questionable biologic significance. By contrast, ERBB2 mutations were present in five tumors. Three mutations were in the kinase domain including two c.351A>C (p.D769Y) mutations and one c.2327G>T (p.G776V). These alterations have been observed previously in other cancers60–62.
Here, through mutation analysis, we provide insight into the somatically-altered genes and signaling pathways as well as confirm the a high rate of A>C transversions in EAC13. We further establish that the rates of these mutations are highest in non-coding areas, and within coding areas are overrepresented in less-expressed genes. Additionally, we demonstrate context specificity showing that A>C transversions are most common when the mutated adenine follows a 5′ flanking adenine (AA) and especially at AAG trinucleotides.
This mutational spectrum appears to be unique to EAC suggesting that these mutations are attributable to gastroesophageal reflux, where the gastric and duodenal contents travel into the lower esophagus creating an environment of inflammation21. Prior studies have linked particular substances such as bile acids, nitrosamines, and reactive oxygen species to the development of metaplasia and carcinomas63, but the precise mutagen(s) remain poorly understood. Experiments in E. coli exploring the mutagenic potential of an oxidatively damaged DNA precursor, 8-hydroxydeoxyguanosine triphosphate, demonstrated that it preferentially induces A>C transversions64. These data suggest that A>C transversions in EAC may arise from oxidative damage induced by GERD; however, experimental evidence is necessary to identify a culprit stimulus. The identification of this mutational signature enables future studies to define specific carcinogen(s) that contribute to EAC and potentially aid in the explanation of the rising incidence.
Statistical analysis also enabled a comprehensive assessment of mutated genes in EAC and identified mutations in cancer-related genes such as TP53, CDKN2A, SMAD4, and PIK3CA. It was notable that most well-annotated cancer genes were not affected by AA transversions. In many cases, it is impossible to generate hotspot mutations such as KRAS p.G12 or PIK3CA p.E545 with an AA transversion. Additionally, given the base composition of stop codons it is difficult to generate nonsense events from AA transversions and impossible to create a stop mutation from such an A>C transversion when it occurs in an AAG trinucleotide context, the most common context for AA mutations in our dataset. Of the 2,570 coding mutations caused by these events, none is a predicted nonsense mutation. Moreover, the data suggest that AA transversions accumulate in lower expressed genes, thus reducing their prevalence in genes contributing to oncogenesis. Despite these caveats, it is likely that mutations caused by AA transversions do impact genes relevant for these tumors. For example, a known transforming mutation in KRAS (c.351A>C; p.K117N) is created by an AA transversion.
Consistent with previous reports38–43, loss-of-function mutations in chromatin-remodeling enzymes are common in EAC. Prior gene studies have also suggested frequent activation of the MAPK, PI3K, and β-catenin pathways. The data presented here verify the presence of frequent mutations in the PI3K cascade, but argue against wide-reaching mutations in these pathways, thus drawing contrasts between EAC and CRC, where β-catenin activation and missense mutations of KRAS and BRAF are highly prevalent65.
For the first time, we detected EAC mutations in regulators of invasion and motility including significantly recurrent mutations in DOCK2 and ELMO1. These mutations may increase tumor fitness through alteration of cytoskeletal structure, increased invasive properties, or mitogenesis. We demonstrate that ELMO1 mutations augment cellular invasiveness; thus, suggesting one mechanism by which these events contribute to tumorigenesis. Given that EAC is a highly-invasive tumor prone to early metastasis, alterations in the RAC1 pathway may contribute to this phenotype.
Although we identified potentially actionable genomic alterations in 48% of samples, the ERBB2 (HER2)-targeted antibody, trastuzumab, is the only targeted agent used in the treatment of EAC/GEJ adenocarcinomas with its use guided by overexpression and genomic amplification of ERBB266. Currently, ERBB2 mutation assessment is not performed for EAC, despite ERBB2 being altered by both co-occurring amplification and mutation in 3% of samples. These data point to a potential role of mutation as an additional biomarker to guide use of ERBB2 (HER2)-targeting agents.
The limited knowledge of the genomic aberrations underlying EAC has hindered the development of new therapies. Numerous candidates, not previously implicated in this disease, have emerged from this analysis. Functional study of these genes will be required to validate and understand their roles in tumorigenesis and to identify the etiology to the unique spectrum of the observed AA transversions. These data provide an enhanced roadmap for the study of EAC and the much-needed development of new therapies for these deadly cancers.
All samples were obtained under institutional IRB approval and with documented informed consent. All samples were fresh frozen primary resections from patients not treated with prior chemotherapy or radiation. Hematoxylin and eosin stained slides were examined by a board-certified pathologist to select cases with estimated carcinoma content >70%. DNA was extracted using salt precipitation or phenol chloroform extraction. DNA was quantified using Picogreen dsDNA Quantitation Reagent (Invitrogen).
Whole-exome capture libraries were constructed from 100ng of tumor and normal DNA following shearing, end repair, phosphorylation and ligation to barcoded sequencing adapters67. Ligated DNA was size-selected for lengths between 200–350bp and subjected to exonic hybrid capture using SureSelect v2 Exome bait (Agilent). Samples were multiplexed and sequenced on multiple Illumina HiSeq flowcells to average target exome coverage of 83.3x was achieved in neoplastic DNA and 85.9x in the non-cancerous tissue.
Whole-genome sequencing library construction was done with 500ng of native DNA from primary tumor and germline samples for each patient. The DNA was sheared to a range of 101–700 bp using the Covaris E210 Instrument, and then phosphorylated and adenylated according to the Illumina protocol. Adapter ligated purification was done by preparatory gel electrophoresis (4% agarose, 85 volts, 3 hours), and size was selected by excision of two bands (500–520 bp and 520–540 bp respectively) yielding two libraries per sample with average of 380 bp and 400 bp respectively15,16,20. Qiagen Min-Elute column based clean ups were performed after each step. For a subset of samples, gel electrophoresis and extraction was performed using the automated Pippin Prep system (Sage Science, Beverly MA). Libraries were then sequenced with the Illumina GA-II or Illumina HiSeq sequencer with 101 bp reads, achieving an average of ~30X coverage depth.
The dRanger algorithm was used to detect genomic rearrangements by identifying instances where the two read pairs map to distinct regions of the genome or map in a manner that suggests another structural event such as an inversion. Candidate somatic rearrangements were queried in both the matched normal genome and a panel of non-tumor genomes to remove germline events. The final scorings of these somatic reads were then calculated by multiplying the number of supporting read pairs by the estimated quality of the candidate rearrangement (0 to 1). This metric is generated by taking into account the ability to align of the two regions joined by the putative rearrangement and the chance of detecting such a read pair given the library fragment-size distribution. Events with scores ≥4 (observed in at least four read pairs) were included in this analysis.
A total of 45 intergenic AA>C mutations were selected validation in tumor and germline sample using mass spectrometry genotyping (Sequenom). Mutations were randomly selected across six samples and sites chosen all had estimated mutation allelic fractions exceeding 30% thus enabling mutation detection19,25,68. Of those assays performed, 25 yielded interpretable data with others failing due to lack of PCR amplification or probe hybridization in tumor and/or germline sample.
Exome and whole-genome sequence data processing and analysis were performed using Broad Institute pipelines15,16,25,68. A BAM file aligned to the hg19 human genome build was generated from Illumina sequencing reads for each tumor and normal sample by the “Picard” pipeline. The “Firehose” pipeline was used to manage input and output files and submit analyses for execution in GenePattern69.
Quality control modules in Firehose were used to compare genotypes derived from Affymetrix arrays and sequencing data to ensure concordance. Genotypes from SNP arrays were also used to monitor for low levels of cross-contamination between samples from different individuals in sequencing data using the ContEst algorithm70. One tumor/normal pair (ESO-774) analyzed by WGS was not included in the exome analysis as the exome sequencing from that case failed quality control metrics.
The MuTect algorithm was used to identify somatic mutations in targeted exons and whole-genome data14,16,25. MuTect identifies candidate somatic mutations by Bayesian statistical analysis of bases and their qualities in the tumor and normal BAMs at a given genomic locus. We required a minimum of 14 reads covering a site in the tumor and 8 in the normal for declaring a site is adequately covered for mutation calling. We determined the lowest allelic fraction at which somatic mutations could be detected on a per-sample basis, using estimates of cross-contamination from the ContEst pipeline70. Small somatic insertions and deletions were detected using the Indelocator algorithm after local realignment of tumor and normal sequences14. All somatic mutations detected by WES were analyzed for potential false-positive calls by performing a comparison to mutation calls from a panel of 2,500 germline DNA samples. Mutations found in 2% of the germline samples or 2% of sequencing reads were removed from analysis. MutSig significant mutations except for all TP53 mutants were reviewed manually in the respective BAM file using the Integrative Genomics Viewer.
Somatic point, insertion, and deletion mutations were annotated using information from publicly available databases, including the UCSC Genome Browser’s UCSC Genes track71, miRBase release 1572, dbSNP build 13273, UCSC Genome Browser’s ORegAnno track74, UniProt release 2011_0375, and COSMIC v5154.
For the purpose of discovering recurrently mutated genes, we used the MutSig algorithm, as described in the following studies18. In short, this method builds a background model of mutational processes, which takes into account the genome-wide variability in mutation rates. We achieve this by considering different covariates that have been shown to affect mutation rate: GC content (measured on 100kb windows), local relative replication time76,77, open vs. closed chromatin status as determined by “HiC” – fine-scale mapping of the three-dimensional DNA contacts in the nucleus78, gene expression16, and finally, the local gene density measured in a 1 Mb window. For each gene, we define a set of nearest neighbors according to these covariates, and estimate the background mutation rate from noncoding (flanking and introns) and silent mutations of these neighbors. We then assign a score based on the ratio between the number of nonsilent coding mutation rate of the gene and the noncoding and silent mutation rate of the given gene and its neighbors. Furthermore, in order to increase power in detecting known driver genes, we performed an independent significance analysis that is restricted to events that have been previously reported in the COSMIC database.
MSI (microsatellite instability) analysis was performed using 10 microsatellite markers (D2S123, D5S346, D17S250, BAT25, BAT26, BAT40, D18S55, D18S56, D18S67 and D18S487) as described previously22.
Copy-ratios were calculated as the ratio of tumor read-depth to the average read-depth observed in normal samples for that region using the CapSeg DNA-sequencing based tool (McKenna, et al. in preparation).
Raw data was processed using the gene chip robust multiarray averaging79 (RMA) approach to provide normalized expression data for each probe set on the arrays.
NIH/3T3 and 293 cells were obtained from American Type Culture Collection (Manassas, VA). All cells were maintained in DMEM plus 10% FBS at 37°C in 5% CO2.
The full-length ELMO1 cDNA was obtained from Open Biosystems-Thermo Scientific (Lafayette, CO) and cloned into the EcoRI site of pBabe(puro). Mutants were generated by site-directed mutagenesis using the Quikchange II Site-Directed Mutagenesis Kit (Agilent Technologies, Santa Clara, CA) according to the manufacturer’s instructions. All mutations were verified by sequencing.
Wild-type ELMO1, ELMO1 mutants, or GFP in the pBabe(puro) vector (1 μg) was co-transfected with 1 μg of pCL-Eco into 293 cells with Fugene HD (Roche, Indianapolis, IN) overnight. The growth media was replaced with new full serum medium after 24 h. After an additional 24 h, the retroviral supernatant was harvested and replaced fresh media. Retroviral supernatant was filtered and incubated with target NIH/3T3 cells in the presence of 5 μg/mL polybrene (hexadimethrine bromide). This procedure was repeated again after 24 h. Stably-infected cells were selected for under puromycin (1μg/mL) pressure for 2 weeks. Positive expression was confirmed by western blotting with an ELMO1 antibody-ab2239 (Abcam, Cambridge, MA).
Growth Factor Reduced Matrigel-coated Transwell chambers (BD Biosciences, San Jose, CA) were activated in serum-free media at 37°C for 2 h. NIH/3T3 cells (1 × 104) were plated in matrigel invasion chambers with full serum containing medium in the lower chamber only. After 24 h, non-invading cells in the top chamber were removed by cotton swab, and invading cells were fixed and stained using Diff-Quik staining solutions according to the manufacturer’s instructions (VWR International, Radnor, PA). The number of invading cells from each of four fields were counted at 20X magnification.
We thank Matthew Meyerson for helpful discussions and review of the manuscript and thank members of the Broad Institute Biological Samples Platform, Genetic Analysis Platform, and Genome Sequencing Platform for their assistance. We are also grateful for the physicians and hospital staff whose effort in collecting these samples is essential to this research. This work was supported by the US National Human Genome Research Institute (NHGRI) Large Scale Sequencing Program (U54 HG003067 to the Broad Institute, E.S.L.), National Cancer Institute (K08 CA134931 to A.J.B.), DeGregorio Family Foundation (A.J.B.), Karin Grunebaum Cancer Research Foundation (A.J.B), Target Cancer (A.J.B.), and Connecticut Conquers Cancer (A.J.B.). S.O. and Y.I. are supported by National Cancer Institute (R01 CA151993 to S.O.) and the Dana-Farber/Harvard Cancer Center GI Cancer SPORE (US National Institutes of Health P50 CA127003),. D.G.B is supported by grant CA163059 and CA46592.
Binary sequence alignment/map (BAM) files were deposited in the database of Genotypes and Phenotypes (phs000598.v1.p1). Raw data mRNA expression profiles on 14 EAC samples have been deposited at the Gene Expression Omnibus (GSE42363).
Author ContributionsP.S., S.P., M.S.L., C.F., C.S., S.E.S., A.M., K.C., A.S., S.L.C., G.S., D.V., A.H.R., and R.B. performed computational analyses. E.S., D.A., K.T., C.S., R.C.O., C.G., and S.B.G. processed samples and supervised exome sequencing. A.M.D., S.B., D.Z., L.L., J.L., R.R., A.C., J.D.L., A.P., D.G.B., T.E.G., and A.J.B. coordinated sample acquisition, processing, pathologic review, and analysis. Y.I. and S.O. performed MSI testing. A.M.D., P.S., T.R.G., S.B.G., E.S.L., G.G., and A.J.B. designed the study. A.M.D., P.S., S.P., M.S.L., G.G., and A.J.B. analyzed the data and wrote the manuscript.
MutSig Algorithm, http://confluence.broadinstitute.org/display/CGATools/MutSig; CCDS, http://www.ncbi.nlm.nih.gov/CCDS/; Broad Institute Picard Sequencing Pipeline, http://picard.sourceforge.net/; Broad Institute Firehose Pipeline, http://www.broadinstitute.org/cancer/cga; Supplementary Table 3: Whole genome sequencing mutation list, http://www.broadinstitute.org/~shouyong/eac/