All specimens were derived from standard colectomies performed for colon cancers previously proven by biopsy. Specimens were received fresh in the Department of Pathology at Case Medical Center. After fresh portions of cancer and normal mucosa had been harvested, the colectomy specimens were pinned onto a wax plate and immersed overnight in 10% neutral buffered formalin. The next day, routine sections of cancer, normal mucosa and adjacent lymph nodes were submitted for paraffin embedding, and H&E-stained sections were obtained. DNA for sequencing was isolated from these clinically derived specimens: eight 5 μm sections of formalin-fixed, paraffin-embedded cancer tissue were obtained per case. Cancers were microdissected to remove non-cancerous and necrotic tissue, yielding specimens that contained approximately 80% viable cancer material.
DNA Extraction from Formalin Fixed Paraffin Embedded Samples
The tissues were first deparaffinized by dipping the slides in a series of solutions: xylene, first for 4 minutes and then for 2 minutes; 100%, 95% and 70% ethanol for 2 minutes each; and finally two 2-minute rinses in 10 mM Tris. Tissue was then scraped into a polypropylene microcentrifuge tube, with sections from up to five slides per tube. DNA was extracted with the QIAamp DNA Micro Kit. For each slide scraped into a tube, 15 μl of buffer ATL and 10 μl of proteinase K (20 mg/ml) were added. Tubes were vortexed for 15 sec and incubated at 60°C for 8 days, with 1.5 μl of proteinase K (20 mg/ml) added to each tube daily.
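The per-tube reagent arithmetic for the digestion step can be sketched as follows. This is a minimal illustration, not part of the original protocol. The function name `plan_digestion` is hypothetical, and slides are assumed to be packed five per tube. The number of daily proteinase K additions is also an assumption: the sketch uses eight, one per incubation day, because the text does not say whether the first day is counted.

```python
import math

# Per-slide and per-tube volumes stated in the text.
ATL_PER_SLIDE_UL = 15.0       # buffer ATL per slide
PK_PER_SLIDE_UL = 10.0        # proteinase K (20 mg/ml) per slide
PK_DAILY_PER_TUBE_UL = 1.5    # proteinase K top-up per tube per day
MAX_SLIDES_PER_TUBE = 5       # sections from up to five slides per tube
INCUBATION_DAYS = 8


def plan_digestion(n_slides: int, daily_additions: int = INCUBATION_DAYS) -> list[dict]:
    """Split slides across tubes (at most five per tube) and total reagents per tube.

    daily_additions is an ASSUMPTION (one top-up per incubation day by default).
    """
    if n_slides < 1:
        raise ValueError("need at least one slide")
    tubes = []
    remaining = n_slides
    for _ in range(math.ceil(n_slides / MAX_SLIDES_PER_TUBE)):
        slides = min(remaining, MAX_SLIDES_PER_TUBE)  # fill each tube up to five slides
        remaining -= slides
        tubes.append({
            "slides": slides,
            "atl_ul": slides * ATL_PER_SLIDE_UL,
            "pk_initial_ul": slides * PK_PER_SLIDE_UL,
            "pk_topups_ul": daily_additions * PK_DAILY_PER_TUBE_UL,
        })
    return tubes
```

For eight slides, the sketch returns two tubes holding five and three slides, with 75 μl and 45 μl of buffer ATL respectively. Each tube receives 12 μl of proteinase K in top-ups over the incubation.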
After incubation, 25 μl of buffer ATL was added per slide scraped into the tube (for example, 100 μl if four slides were scraped). Buffer AL containing carrier RNA was then added at 50 μl per slide; the carrier RNA (stock, 1 μg/μl) had been premixed at 1 μl per 50 μl of buffer AL. The solution was vortexed for 15 sec and incubated at 70°C for 20 minutes. After a brief spin, 100% ethanol was added at 50 μl per slide, and the tube was vortexed for 15 sec, incubated at room temperature for 5 minutes and spun briefly. The contents were transferred to a spin column provided with the kit; up to five digestions may be combined on a single column. The column was centrifuged at 8,000 rpm for 1 minute and placed in a clean collection tube. Then 500 μl of buffer AW1 was added, and the column was centrifuged at 8,000 rpm for 1 minute. The column was moved to a clean collection tube, 500 μl of buffer AW2 was added, and the column was again centrifuged at 8,000 rpm for 1 minute. To dry the membrane, the column was placed in a clean collection tube and centrifuged at 14,000 rpm for 3 minutes. The column was then placed in a clean collection tube, and 25 μl of buffer AE was applied to the center of the membrane (50 μl if more than one digestion had been loaded onto the column). After 5 minutes at room temperature, the column was centrifuged at 14,000 rpm for 1 minute to elute the DNA. The elution was repeated with the same volume of buffer AE and the same 5-minute room-temperature incubation, without changing the collection tube. The eluted DNA was transferred to a closable tube for long-term storage.
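The per-slide scaling of the post-digestion buffers, together with the elution-volume rule, reduces to a short calculation. The helpers `tube_volumes` and `elution_plan` are hypothetical. They assume that each tube is scaled by the number of slides scraped into it and that the elution volume depends only on how many digestions share a column.

```python
def tube_volumes(n_slides: int) -> dict[str, float]:
    """Buffers added to one digestion tube after incubation, scaled per slide."""
    if not 1 <= n_slides <= 5:
        raise ValueError("up to five slides are scraped into one tube")
    al_ul = 50.0 * n_slides
    return {
        "atl_ul": 25.0 * n_slides,       # additional buffer ATL
        "al_ul": al_ul,                  # buffer AL premixed with carrier RNA
        "carrier_rna_ul": al_ul / 50.0,  # 1 ul of 1 ug/ul stock per 50 ul AL
        "ethanol_ul": 50.0 * n_slides,   # 100% ethanol
    }


def elution_plan(n_digestions_on_column: int) -> tuple[float, float]:
    """Return (buffer AE per elution, nominal total eluate for two elutions)."""
    if not 1 <= n_digestions_on_column <= 5:
        raise ValueError("at most five digestions are combined on one column")
    per_elution = 50.0 if n_digestions_on_column > 1 else 25.0
    return per_elution, 2 * per_elution
```

`tube_volumes(4)` reproduces the example in the text of 100 μl of buffer ATL for four slides. It also gives 200 μl of buffer AL carrying 4 μl of carrier RNA stock.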
Assessment of Sample Quality and Yield
To select samples for sequencing, each DNA sample isolated from formalin-fixed, paraffin-embedded tissue was assessed for yield and quality. DNA concentration was determined using a Qubit fluorometer [Invitrogen Qubit High Sensitivity dsDNA Assay]. A minimum of 3 μg of DNA was required to initiate library preparation; samples with insufficient yield were excluded from further analysis. DNA quality was assessed by the ability of the sample to produce a robust PCR amplicon of at least 420 bp. The formalin-fixed, paraffin-embedded DNA was used as a template for a series of 5 PCR assays designed to produce products ranging from 420 to 718 bp, and the resulting amplicons were examined on a 2% agarose gel. Samples that could not robustly produce amplicons of at least 420 bp in this PCR QC Assay were excluded from further analysis. The PCR primers used in the PCR QC Assay amplify different regions of the HLTF gene. These primer IDs and their sequences are depicted in .
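The two exclusion criteria can be combined into a single pass/fail check. In this sketch, yield is computed from a Qubit concentration and an eluate volume. Gel results are supplied as a mapping from amplicon size to whether a robust band was seen. The function names, this input format, and any intermediate amplicon sizes are hypothetical; only the 3 μg and 420 bp thresholds come from the text.

```python
MIN_YIELD_UG = 3.0     # minimum DNA required to start library preparation
MIN_AMPLICON_BP = 420  # smallest QC amplicon that must amplify robustly


def dna_yield_ug(conc_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA mass (ug) from a Qubit concentration and an eluate volume."""
    return conc_ng_per_ul * volume_ul / 1000.0


def passes_sample_qc(conc_ng_per_ul: float, volume_ul: float,
                     robust_bands: dict[int, bool]) -> bool:
    """robust_bands maps QC amplicon size (bp) to whether a robust band was seen."""
    if dna_yield_ug(conc_ng_per_ul, volume_ul) < MIN_YIELD_UG:
        return False  # insufficient yield
    # Quality: at least one robust amplicon of 420 bp or longer.
    return any(ok for size, ok in robust_bands.items() if size >= MIN_AMPLICON_BP)
```

Consider a sample at 60 ng/μl in a 50 μl eluate (3.0 μg) that shows a robust 420 bp band: it passes. With the same gel result at 40 ng/μl (2.0 μg), it fails on yield.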
depicts the primer combinations used in the 5 different PCR assays designed to produce a range of amplicon sizes. Each PCR amplification was performed in a 50 μl reaction volume with AmpliTaq Gold as the polymerase. An initial step at 95°C for 9 minutes was followed by 35 cycles of 95°C for 30 sec, 64°C for 45 sec and 72°C for 45 sec.
Primer Combinations for PCR QC Assays.
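Summing the programmed hold times gives the approximate length of each QC PCR run. The sketch below encodes the cycling program from the text as data. It ignores ramp times and any final hold, so actual instrument run times will be somewhat longer.

```python
# Cycling program from the text: one initial 95 C step, then 35 three-step cycles.
INITIAL_STEP = (95, 9 * 60)                 # (temperature in C, hold in seconds)
CYCLE = [(95, 30), (64, 45), (72, 45)]      # denature, anneal, extend
N_CYCLES = 35


def hold_time_seconds(initial=INITIAL_STEP, cycle=CYCLE, n_cycles=N_CYCLES) -> int:
    """Sum of programmed hold times, excluding ramps and any final hold."""
    return initial[1] + n_cycles * sum(seconds for _, seconds in cycle)


print(hold_time_seconds() / 60)  # 79.0 minutes of programmed hold time
```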
DNA samples that passed the Sample Quality and Yield screen were then prepared for library production, followed by targeted hybridization-based capture and sequencing on the Illumina Genome Analyzer. Each 3 μg DNA sample was first sheared with a Covaris S2 to a peak distribution of 150–200 bp, a size range optimal for SureSelect target enrichment. After shearing and purification, an Experion DNA 1K LabChip was used to confirm that the DNA had been sheared to the appropriate size. Library preparation followed the Agilent SureSelect Target Enrichment System for Illumina Multiplexed Sequencing protocol. The adapter-ligated library was amplified for 4 cycles of PCR, and Agencourt AMPure XP beads were used for all purification steps in library preparation. The amplified library was checked for both quantity and quality. Library concentration was measured with the Qubit High Sensitivity dsDNA Assay, and analysis on the Experion DNA 1K chip was used to confirm a peak size between 250 and 275 bp. To proceed, a sample was required to have a 260/280 ratio of 1.8 to 2.0, a minimum yield of 500 ng [147 ng/μl], and a single peak between 250 and 275 bp. Although sufficient total yield was required to proceed to capture, the libraries from all samples were too dilute and had to be concentrated in a speed-vac before enrichment.
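The pre-capture gate and the concentration target can be sketched as follows. The thresholds are those in the text. We read 147 ng/μl as 500 ng in about 3.4 μl (500 / 147 ≈ 3.40), and take this as the reason the dilute libraries had to be concentrated; that reading is our interpretation, not something the text states. The function names are hypothetical.

```python
def library_passes(a260_280: float, yield_ng: float, peaks_bp: list[float]) -> bool:
    """Pre-capture gate: purity ratio, total yield, and a single peak in range."""
    return (1.8 <= a260_280 <= 2.0
            and yield_ng >= 500.0
            and len(peaks_bp) == 1
            and 250.0 <= peaks_bp[0] <= 275.0)


def speedvac_target_volume_ul(mass_ng: float = 500.0,
                              target_ng_per_ul: float = 147.0) -> float:
    """Volume (ul) a library must be concentrated to in order to reach the target."""
    return mass_ng / target_ng_per_ul
```

With the defaults, `speedvac_target_volume_ul()` returns about 3.40 μl. A library that passes the gate but sits in tens of microliters would therefore still need concentration.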
Enrichment of CAN Gene Exons
The Agilent Technologies SureSelect Target Enrichment System (DNA capture) was used to enrich targeted regions of the genome for analysis on the Illumina sequencing platform. Through Agilent's Custom Design Portal, a customized SureSelect kit was designed to target exons from the 140 CAN genes, comprising 2,934 target exon regions. Only 31 additional exon regions (~1%) did not meet the criteria for bait design. The custom SureSelect kit designed for this project used custom oligonucleotides as capture probes.
Once size-selected libraries had been prepared and confirmed as detailed above, they were incubated with the custom-designed SureSelect baits for 24 hours. RNA bait/DNA hybrids were then isolated from the complex mixture with streptavidin-coated magnetic beads. After extensive washing, the RNA bait was digested, leaving only the targeted DNA of interest. After capture, the DNA was amplified for 14 cycles. The captured library was then analyzed on an Agilent 2100 Bioanalyzer High Sensitivity DNA chip to confirm a single peak in the 300 to 325 bp size range. DNA concentration was determined using Agilent's QPCR NGS Library Quantification Kit (for Illumina). This real-time PCR assay uses SYBR Green as the fluorescent indicator and was designed by Agilent to quantify index-tagged libraries. Samples meeting the size and concentration criteria were then pooled at equimolar concentrations and sequenced on the Illumina platform.
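Because 1 nM corresponds to 1 fmol/μl, equimolar pooling reduces to dividing a chosen amount per library by each library's molar concentration. The sketch assumes concentrations in nM, as reported by the qPCR assay. The mass-to-molarity helper uses the common approximation of 660 g/mol per base pair and is needed only for libraries quantified by mass. The per-library amount and all names are hypothetical. The five-library limit reflects the up-to-five-sample multiplexing used in this study.

```python
MAX_LIBRARIES_PER_LANE = 5  # up to five index-tagged samples per lane


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration to molarity (~660 g/mol per bp)."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def equimolar_pool(conc_nM: dict[str, float], fmol_each: float) -> dict[str, float]:
    """Volume (ul) of each library that contributes fmol_each to the pool.

    1 nM = 1 fmol/ul, so the required volume is fmol_each / concentration.
    """
    if len(conc_nM) > MAX_LIBRARIES_PER_LANE:
        raise ValueError("more libraries than the per-lane multiplexing limit")
    return {name: fmol_each / c for name, c in conc_nM.items()}
```

For two hypothetical libraries at 10 nM and 20 nM, pooling 100 fmol of each takes 10 μl and 5 μl respectively.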
Up to five samples with unique index-tag adapter sequences were combined for multiplex sequencing in a single lane on the Illumina Genome Analyzer IIx (GAIIx). Paired-end 72-base reads were collected, with an additional seven bases read to decode the index tag sequence of each read. Sequence data quality was assessed using FastQC. Capture efficiency was determined as the percentage of reads mapping to or near CAN gene target regions, using the HsMetrics component of the Picard toolkit.
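The on- or near-target fraction can be illustrated with a simplified interval-overlap count. This is not Picard's HsMetrics implementation. The 250 bp padding used as the "near" distance is an assumed value, coordinates are half-open, targets are assumed not to overlap one another, and each read span is counted once.

```python
from bisect import bisect_right


def on_or_near_target_fraction(read_spans, targets, pad=250):
    """Fraction of read spans overlapping any target widened by `pad` bp.

    read_spans, targets: lists of (chrom, start, end) in half-open coordinates.
    Targets must not overlap within a chromosome, so padded starts and ends
    are both sorted. pad=250 is an ASSUMED "near" distance.
    """
    padded = {}
    for chrom, start, end in sorted(targets):
        padded.setdefault(chrom, []).append((start - pad, end + pad))
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in padded.items()}

    hits = 0
    for chrom, start, end in read_spans:
        ivs = padded.get(chrom)
        if not ivs:
            continue
        # Last padded target that starts before the read ends.
        i = bisect_right(starts[chrom], end - 1) - 1
        # Padded ends are sorted, so this target reaches furthest right among
        # the candidates; if it ends at or before the read start, none overlap.
        if i >= 0 and ivs[i][1] > start:
            hits += 1
    return hits / len(read_spans) if read_spans else 0.0
```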
Three software systems were used to call single nucleotide variants in the Illumina sequence data: Atlas-SNP2 (12), SOAPsnp, and the Unified Genotyper component of the Genome Analysis Toolkit (UGT) (14, 15). UGT was also used to predict small indel variants. Reads were aligned to the human genome (hg18) using SOAP2 (for SOAPsnp) and BWA (for UGT and Atlas-SNP2), with default parameters for both aligners. For Atlas-SNP2, the minimum coverage was set to three reads. The maximum coverage was set to approximately the maximum coverage of each sample, based on the SOAP2 alignment results. Variants previously observed in other studies were identified by cross-reference with dbSNP version 130. As shown in , nearly all variants present in dbSNP were predicted by all three callers. A large majority of novel (non-dbSNP) variants were also predicted by all three callers ().
Overlap of SNP calls at positions of dbSNP variants
Overlap of SNP calls at positions without dbSNP variants
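The comparison summarized in these overlap figures amounts to grouping each called variant by the set of callers that reported it and by whether its position is in dbSNP. The sketch below keys variants as hypothetical (chromosome, position, alternate allele) tuples. Following the figure captions, it judges dbSNP membership by position alone.

```python
from collections import Counter

Variant = tuple[str, int, str]  # hypothetical key: (chrom, pos, alt)


def caller_overlap(calls: dict[str, set], dbsnp_positions: set) -> dict[str, Counter]:
    """Count variants by the combination of callers reporting them,
    separately for dbSNP and novel (non-dbSNP) positions."""
    out = {"dbSNP": Counter(), "novel": Counter()}
    for v in set().union(*calls.values()):
        combo = "+".join(sorted(name for name, s in calls.items() if v in s))
        group = "dbSNP" if v[:2] in dbsnp_positions else "novel"
        out[group][combo] += 1
    return out
```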
Validation of Unique Variants
Novel variants could be germline polymorphisms, somatic mutations, or sequencing errors. To distinguish among these possibilities, candidate variants not previously reported in dbSNP were validated by re-sequencing the corresponding exon in three DNA sources: the FFPE-derived DNA, DNA from a cell line derived from the same cancer, and the matched normal DNA. This sequencing was performed in the Genomics Core by conventional PCR amplification and Sanger sequencing on an Applied Biosystems 3730xl DNA sequencer.
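The logic of re-sequencing three DNA sources can be written as simple decision rules. These rules are our reading of the design, not rules stated by the study. Presence in matched normal DNA indicates a germline polymorphism, and presence in tumor but not normal indicates a somatic mutation. Failure to confirm the variant in any source suggests a sequencing error. How to treat a variant seen only in the cell line is left unresolved here.

```python
def classify_novel_variant(in_ffpe_tumor: bool, in_cell_line: bool,
                           in_matched_normal: bool) -> str:
    """Label a novel variant from Sanger results in three DNA sources.

    Hypothetical rules; a variant seen only in the cell line may have
    arisen in culture, so it is not called somatic here.
    """
    if in_matched_normal:
        return "germline polymorphism"
    if in_ffpe_tumor:
        return "somatic mutation"
    if in_cell_line:
        return "unresolved (cell line only)"
    return "not confirmed (possible sequencing error)"
```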