For the analysis of cancer-related genomic alterations, genomic DNA of the neuroblastoma cell lines IMR-32, SK-N-AS and SK-N-SH (all obtained from ATCC) was analyzed. SK-N-SH cells were propagated in two different laboratories for at least ten passages, and the resulting cell lines were analyzed as SK-N-SH/G and SK-N-SH/L. As a baseline for normal human individuals, genomic DNA was collected, after informed consent, from peripheral blood samples of four females (C2, C4, C5 and C6). Mouse genomic DNA was isolated from animals of the megabladder mouse strain, which arose from a mutation that occurred during the generation of transgenic mice. Genetic characterization of the megabladder mouse using BAC clones containing the transgene identified the site of the insertional mutation at approximately 26.4 Mb on chromosome 16. FISH analysis of metaphase chromosomes further showed that this region of chromosome 16 had been translocated to chromosome 11. Consequently, wild-type mice contain two copies of the genomic region surrounding 26.4 Mb on chromosome 16, heterozygous mutants contain three copies and homozygous mutants contain four copies. The megabladder mouse will be described in detail elsewhere.
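As a worked illustration, the expected log2 ratios of the chromosome 16 region in mutant versus wild-type mice follow directly from the copy numbers stated above. This assumes an idealized linear relationship between copy number and hybridization signal, so measured array ratios may be smaller than these theoretical values:

```latex
\log_2\frac{3}{2} \approx 0.585 \quad \text{(heterozygous vs. wild type)}, \qquad
\log_2\frac{4}{2} = 1 \quad \text{(homozygous vs. wild type)}
```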
Re-annotation of expression arrays
Probe sets were aligned to Build 35 of the human genome assembly by applying standalone BLAT [28] to "concatemers", formed by concatenating the non-overlapping portions of the individual 25-mer probe sequences of each probe set. If BLAT reported no match for the concatemer of a probe set, that probe set was excluded from further annotation. The homology of each alignment was computed as the percentage of concatemer bases matched, and the genomic location with the highest homology was used for annotation. The refFlat.txt.gz file [29], which contains the physical positions of genes in human genome assembly Build 35, was used to identify the probe sets interrogating genes. When the genomic location with the highest homology to a probe set overlapped a gene in this file, the probe set was annotated as measuring that gene. When multiple probe sets measured the same gene, the log2 copy number differences measured by the individual probe sets were averaged.
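The annotation steps above (concatemer construction, choice of the highest-homology alignment, gene overlap and per-gene averaging) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes probes are ordered 5' to 3' along the target, that overlaps are detected as suffix/prefix matches of at least `min_overlap` bases, and that BLAT hits and refFlat genes have already been parsed into simple tuples. All function names and the `min_overlap` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_concatemer(probes, min_overlap=8):
    """Concatenate probe sequences, keeping only the part of each probe that
    does not overlap the sequence built so far.

    Hypothetical assumptions: probes are ordered 5'->3' along the target, and an
    overlap is the longest suffix of the concatemer (>= min_overlap bases) that
    equals a prefix of the next probe.
    """
    concat = probes[0]
    for probe in probes[1:]:
        overlap = 0
        for k in range(min(len(concat), len(probe)), min_overlap - 1, -1):
            if concat.endswith(probe[:k]):
                overlap = k
                break
        concat += probe[overlap:]
    return concat


def annotate_probe_set(concat_len, hits, genes):
    """Pick the highest-homology alignment and return (gene or None, homology %).

    hits:  (chrom, start, end, matched_bases) tuples, e.g. parsed from BLAT output
    genes: (gene_name, chrom, tx_start, tx_end) tuples, e.g. parsed from refFlat
    Returns None when there are no hits, i.e. the probe set is dropped.
    """
    if not hits:
        return None
    chrom, start, end, matched = max(hits, key=lambda h: h[3])
    homology = 100.0 * matched / concat_len
    for name, g_chrom, g_start, g_end in genes:
        # half-open interval overlap test; the first overlapping gene wins
        if g_chrom == chrom and start < g_end and g_start < end:
            return name, homology
    return None, homology


def gene_level_log2(probe_to_gene, probe_log2):
    """Average the log2 copy number differences of all probe sets mapped to a gene."""
    per_gene = defaultdict(list)
    for probe_set, gene in probe_to_gene.items():
        if gene is not None and probe_set in probe_log2:
            per_gene[gene].append(probe_log2[probe_set])
    return {gene: mean(values) for gene, values in per_gene.items()}
```

Requiring a minimum overlap keeps a chance match of one or two bases from silently deleting sequence from the concatemer.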
Processing of genomic DNA for graCNV using expression arrays
Twenty micrograms of genomic DNA were digested with EcoRI (New England Biolabs). Fragmentation and biotin labeling with terminal transferase were performed using the GeneChip Mapping 10K Xba Assay Kit (Affymetrix). Human samples were hybridized to U133 Plus 2.0 GeneChips (Affymetrix), and mouse samples were hybridized to custom GeneChips (Affymetrix) containing 4,400 probe sets preferentially measuring genes located on chromosomes 11 and 16. Hybridization and other conditions were slightly modified from those suggested for 10K Mapping Arrays (Affymetrix), and washing was carried out as suggested for 100K Mapping Arrays. A detailed description of sample processing is available in Additional file 1.
Data analysis for expression arrays
CEL files were generated from scanned images (DAT files) using GCOS 1.4 software (Affymetrix). Probe set signals were generated either with the RMA algorithm in ArrayAssist 3.4 (Stratagene) or with the in-house WPP algorithm. WPP (Well behaved estimates of differential gene expression Plus probe-level p-values Plus extensible quantile scaling) is an enhanced version of RMA [17]. WPP adds the following procedures, which are designed to improve the reliability and interpretability of the calculated differentials: 1) probe-level nonparametric p-values are used to assess the statistical significance of individual differentials; 2) strictly monotonic quantile scaling is used to standardize PM and MM probe intensity distributions across arrays; 3) uninformative and misinformative probes are automatically excluded to increase the accuracy and precision of the calculated differentials. A detailed description of the WPP algorithm is available in Additional file 1. Measurements of the four normal human DNA samples served as the baseline for measuring copy number variation in the cell lines: CNVs of the cell lines were calculated relative to the median signal of the normal samples.
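The baseline step can be sketched as below. This is an illustration under stated assumptions, not WPP: the normalization shown is the standard quantile normalization used by RMA, not WPP's strictly monotonic quantile scaling, and ties are simply broken by order. The log2 copy number of each array is taken relative to the per-probe-set median of the normal samples, computed on the linear signal scale as the text describes. Function names are hypothetical.

```python
import numpy as np


def quantile_normalize(x):
    """Standard (RMA-style) quantile normalization of a probes x arrays matrix.

    Each value is replaced by the mean of the sorted columns at its rank;
    ties are broken by order of appearance (a simplification).
    """
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)
    return reference[ranks]


def log2_cn_vs_normals(signals, normal_cols):
    """log2 ratio of every array to the per-row median signal of the normal arrays."""
    signals = np.asarray(signals, dtype=float)
    baseline = np.median(signals[:, normal_cols], axis=1, keepdims=True)
    return np.log2(signals) - np.log2(baseline)
```

With four normal samples the median averages the two middle signals, which is why it is taken on the linear scale before the log2 transform.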
Principal component analysis and hierarchical cluster analysis
Hierarchical cluster analysis in SPSS was applied to log2-transformed copy number estimates of probe sets, using Pearson correlation as the similarity measure and furthest-neighbor (complete) linkage. To exclude sex-specific differences, probe sets for X-linked and Y-linked genes were omitted. Principal component analysis with Varimax rotation was applied in SPSS to the log2-transformed copy number estimates of the 2,000 autosomal probe sets with the widest range of values.
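For readers reproducing this outside SPSS, a rough equivalent can be sketched with NumPy and SciPy. It assumes that samples are the objects being clustered and that, in the PCA, samples are treated as variables and probe sets as cases; the original analysis may have been set up differently. The varimax routine is the common SVD-based formulation and omits the Kaiser normalization that SPSS applies by default. Function names and the chromosome labels "X"/"Y" are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def drop_sex_chromosomes(data, chroms):
    """Remove probe sets (rows) annotated to chromosome X or Y."""
    keep = ~np.isin(np.asarray(chroms), ["X", "Y"])
    return data[keep]


def top_range_rows(data, n=2000):
    """Keep the n probe sets with the widest range of log2 copy number values."""
    ranges = data.max(axis=1) - data.min(axis=1)
    return data[np.argsort(ranges)[::-1][:n]]


def cluster_samples(data):
    """Complete ("furthest-neighbor") linkage of samples on 1 - Pearson r."""
    # rows = probe sets, columns = samples; linkage clusters rows, so transpose
    return linkage(data.T, method="complete", metric="correlation")


def pca_loadings(data, n_components=2):
    """Principal-component loadings of samples (treated as variables)."""
    corr = np.corrcoef(data, rowvar=False)
    evals, evecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    return evecs[:, order] * np.sqrt(evals[order])


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (no Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        lam = loadings @ rotation
        target = lam ** 3 - (gamma / p) * lam @ np.diag(np.sum(lam ** 2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return loadings @ rotation
```

Because the rotation is orthogonal, each sample's communality (row sum of squared loadings) is unchanged by varimax; only the distribution of loadings across components changes.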
Construction of the Human BAC CGH array
We prepared DNA spotting solutions from sequence-connected RPCI-11 BAC clones by ligation-mediated PCR as described previously [30]. The array contained ~19,000 BAC clones chosen on the basis of their STS content, end sequences and association with cancer [15]. Each clone was spotted in duplicate on amino-silanated glass slides (Schott Nexterion typeA+) using a MicroGrid II TAS arrayer (Apogent Discoveries). The BAC DNA spots are ~80 μm in diameter with 150 μm center-to-center spacing, giving an array of ~39,000 elements. The printed slides were dried overnight and UV-crosslinked (350 mJ) in a Stratalinker 2400 (Stratagene) immediately before hybridization. A complete list of the RPCI-11 BAC clones spotted on the 19k array is available online [31].
Labeling and Hybridization of DNA for BAC aCGH
One μg each of reference genomic DNA (pooled genomic DNA of five individuals) and test genomic DNA was fluorescently labeled separately using the BioArray CGH Labeling System (Enzo Life Sciences). The DNA was first denatured in the presence of random primers at 99°C for 10 minutes in a thermal cycler, followed by a quick chill to 4°C. The tubes were then transferred to ice, and labeling was carried out by adding dNTP-cyanine 3 (or dNTP-cyanine 5) mix and Klenow enzyme. Samples were incubated overnight at 37°C in a thermal cycler. Unincorporated nucleotides were removed using a QIAquick PCR purification column (Qiagen), and the labeled probe was eluted with two 25 μl volumes. Prior to hybridization, the test and reference probes were resuspended in 110 μl SlideHyb Buffer #3 (Ambion) containing 5 μl of 20 μg/μl Cot-1 DNA and 5 μl of 100 μg/μl yeast tRNA (Invitrogen), heated to 95°C for 5 minutes and placed on ice. Hybridization to the 19k CGH arrays was performed for 16 hours at 55°C in a GeneTAC hybridization station (Genomic Solutions, Inc.) as described [32]. After hybridization, the slides were washed automatically in the GeneTAC station with decreasing concentrations of SSC and SDS.
Digital Data Acquisition and Analysis for BAC aCGH
The hybridized aCGH slides were scanned with a GenePix 4200A scanner (Molecular Devices) to generate high-resolution (5 μm) images of both the Cy3 (test) and Cy5 (control) channels. Image analysis was performed with ImaGene software (version 6.0.1; BioDiscovery, Inc.). For each clone, a loess-corrected log2 ratio of the background-subtracted test/control intensities was calculated to compensate for non-linearity in the raw aCGH profile of each sample. Mapping information was then added to the resulting log2 test/control values: the position of each BAC was obtained by querying the human genome sequence [33], and clones were examined for overlap with regions of large-scale variation (LSV) in the human genome [8].
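The loess correction can be illustrated with an intensity-dependent (MA-plot) normalization. This is a sketch under assumptions: the exact loess variant used by ImaGene is not described here, so the code fits the log2 ratio M against the mean log2 intensity A with a simple local linear smoother (tricube weights, no robustness iterations) and subtracts the fit. Background-subtracted intensities are floored at a hypothetical value of 1 to keep the logarithm defined. Function names and parameters are hypothetical.

```python
import numpy as np


def local_linear_lowess(x, y, frac=0.3):
    """Tricube-weighted local linear fit of y on x, evaluated at each x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighbourhood size
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = max(np.partition(dist, k - 1)[k - 1], 1e-12)  # bandwidth
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3
        design = np.column_stack([np.ones(n), x - x[i]])
        xtw = design.T * w  # weight each observation
        beta = np.linalg.lstsq(xtw @ design, xtw @ y, rcond=None)[0]
        fitted[i] = beta[0]  # intercept = fitted value at x[i]
    return fitted


def loess_log2_ratio(test_fg, test_bg, ref_fg, ref_bg, frac=0.3, floor=1.0):
    """Background-subtract, take log2(test/ref) and remove the intensity trend."""
    test = np.maximum(np.asarray(test_fg, dtype=float) - test_bg, floor)
    ref = np.maximum(np.asarray(ref_fg, dtype=float) - ref_bg, floor)
    m = np.log2(test) - np.log2(ref)          # log2 ratio
    a = 0.5 * (np.log2(test) + np.log2(ref))  # mean log2 intensity
    return m - local_linear_lowess(a, m, frac)
```

A purely intensity-dependent dye bias is removed entirely, while a change confined to a subset of clones survives as long as it is not itself correlated with intensity across the array.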
Comparison of copy number segmentation results from expression arrays and BAC arrays
Because both the BAC aCGH array and the graCNV array (U133 Plus 2.0 GeneChip) were annotated against Build 35 of the human genome assembly, the coordinates of copy number segments could be compared directly. Copy number segmentation of log2 ratios was performed in R using the DNAcopy package v1.8.1, which implements CBS (Circular Binary Segmentation) [36], reported to be among the best-performing segmentation algorithms [38]. The undo.splits option was set to "sdundo".
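To show what CBS does, the sketch below implements a simplified version in Python. It is not DNAcopy: it uses the circular (arc) statistic, which tests whether an interval x[i:j] has a different mean from the rest of the segment, but it replaces DNAcopy's permutation-based significance test with a fixed threshold `t_crit`. It also assumes a known or robustly estimated noise SD. The final loop mimics the "sdundo" idea by merging neighboring segments whose means differ by less than `undo_sd` noise SDs; `undo_sd=3` is chosen to mirror DNAcopy's documented default for undo.SD. All names and thresholds are hypothetical.

```python
import numpy as np


def robust_sigma(x):
    """Noise SD from first differences (scaled MAD); an assumption of this sketch."""
    d = np.diff(x)
    return 1.4826 * np.median(np.abs(d - np.median(d))) / np.sqrt(2)


def best_arc(x, sigma, min_len=2):
    """Return (t, i, j) for the arc x[i:j] whose mean differs most from the rest
    of the segment viewed as a circle (the CBS two-change-point statistic)."""
    n = len(x)
    s = np.concatenate([[0.0], np.cumsum(x)])
    best = (0.0, 0, n)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            k = j - i
            if n - k < min_len:  # the complementary arc must also be long enough
                continue
            inside = (s[j] - s[i]) / k
            outside = (s[n] - s[j] + s[i]) / (n - k)
            t = abs(inside - outside) / (sigma * np.sqrt(1.0 / k + 1.0 / (n - k)))
            if t > best[0]:
                best = (t, i, j)
    return best


def segment(x, sigma=None, t_crit=5.0, min_len=2, undo_sd=3.0):
    """Simplified CBS with a fixed t threshold, then sdundo-style merging.
    Returns segment boundaries [0, b1, ..., len(x)]."""
    x = np.asarray(x, dtype=float)
    sigma = robust_sigma(x) if sigma is None else sigma
    if sigma <= 0:
        raise ValueError("noise SD must be positive")
    cuts = set()

    def recurse(lo, hi):
        if hi - lo < 2 * min_len:
            return
        t, i, j = best_arc(x[lo:hi], sigma, min_len)
        if t < t_crit:
            return
        new = [c for c in (lo + i, lo + j) if lo < c < hi]
        cuts.update(new)
        bounds = [lo] + new + [hi]
        for a, b in zip(bounds[:-1], bounds[1:]):
            recurse(a, b)

    recurse(0, len(x))
    bounds = [0] + sorted(cuts) + [len(x)]
    # undo splits whose neighbouring segment means differ by < undo_sd * sigma
    while len(bounds) > 2:
        means = [x[a:b].mean() for a, b in zip(bounds[:-1], bounds[1:])]
        diffs = np.abs(np.diff(means))
        m = int(np.argmin(diffs))
        if diffs[m] >= undo_sd * sigma:
            break
        del bounds[m + 1]
    return bounds
```

The circular statistic lets a single pass find both edges of an interior gain or loss, which plain binary segmentation would need two separate splits to detect.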
Microarray expression analysis
For expression profiling, 25 ng of total RNA per sample was processed with the SPIA Biotin System for isothermal amplification (NuGEN Technologies, Inc.), and 2.2 μg of cDNA was hybridized per microarray. The microarrays were custom GeneChips (Affymetrix) containing probe sets for transcripts from mouse chromosomes 11 and 16. After 16 hours of hybridization at 45°C, the microarrays were washed and stained on a Fluidics Station 450 (Affymetrix) and scanned in a GeneChip Scanner 3000 (Affymetrix). CEL files were generated from DAT files using GCOS software (Affymetrix). All steps of sample and microarray processing were performed according to the manufacturer's recommendations. When more than one probe set was available for a gene, the log2 differential expression values of its probe sets were averaged to calculate differential gene expression.
Tail samples (<1 cm) were snipped from every animal in the megabladder mouse colony. Tails were digested and DNA was isolated using the Spin Doctor Genomic DNA Isolation kit (Gerard Biotech) according to the manufacturer's protocol, and the DNA was resuspended in the resuspension buffer supplied with the kit. DNA concentrations were determined with a Nanodrop ND1000 spectrophotometer (Nanodrop), and the 260/280 nm optical density ratios were evaluated. Genomic DNA was stored at 4°C until further use. Mutant mice carry an artificial transgene in addition to the extra copies of the specified region of chromosome 16. Mice were genotyped by quantitative PCR using transgene-specific primers (5'-CAACCGACTCTGCATTCATCTC-3' (forward) and 5'-CTCCAGTACAGCCCTCATGTTTG-3' (reverse)) and the probe 5'-6FAM AAGCTTGATATCGAATTC MGBNFQ-3'. The glucagon gene was used as an internal control with primers 5'-CACAACATCTCGTGCCAGTCA-3' (forward) and 5'-ATCTGCATGCAAAGCAATATAGCT-3' (reverse) and the probe 5'-VICT GGGATGTACAATTTCAA MGBNFQ-3'. Working concentrations of primers and probes were 18 μM and 5 μM, respectively. Multiplex PCR reactions were set up with 20 ng DNA and TaqMan Universal PCR Master Mix, No AmpErase UNG (Applied Biosystems), and were run in triplicate on an ABI 7500 Sequence Detection System (Applied Biosystems). Thermal cycling consisted of 2 min at 50°C and 10 min at 95°C (initial denaturation), followed by 40 cycles of 95°C for 15 sec and 60°C for 1 min. Amplification data were analyzed with ABI 7500 System Sequence Detection Software Version 1.2.3 (Applied Biosystems). Genotypes were assigned by the presence of 0, 1 or 2 copies of the transgene in wild-type, heterozygous and homozygous mice, respectively. Copy numbers of endogenous genes were determined using SYBR Green or TaqMan chemistry (both from Applied Biosystems).
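The text does not state how transgene copy number was derived from the Ct values. A common approach is the comparative Ct method, sketched below under two assumptions: roughly 100% amplification efficiency, and a known heterozygous (one-copy) animal used as calibrator. Wild-type animals are recognized by the absence of transgene amplification, represented here as `None` Ct values. Function names are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean target Ct minus mean reference Ct; None if the target never amplified."""
    valid = [c for c in target_cts if c is not None]
    if not valid:
        return None
    return mean(valid) - mean(reference_cts)


def call_transgene_genotype(target_cts, glucagon_cts, het_delta_ct):
    """Assign a genotype from triplicate transgene and glucagon Ct values.

    het_delta_ct: delta Ct (transgene - glucagon) of a one-copy heterozygous calibrator.
    Returns (genotype, estimated transgene copies).
    """
    dct = delta_ct(target_cts, glucagon_cts)
    if dct is None:
        return "wild type", 0.0
    copies = 2 ** (het_delta_ct - dct)  # 2^-ddCt relative to the 1-copy calibrator
    genotype = "heterozygous" if abs(copies - 1) < abs(copies - 2) else "homozygous"
    return genotype, copies
```

A homozygous animal carries twice as many transgene copies as the calibrator, so its transgene Ct is expected to be about one cycle earlier at equal glucagon Ct.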
Ten nanograms of genomic DNA were used per reaction. SYBR Green assays were cycled at 50°C for 2 min and 95°C for 10 min, followed by 40 cycles of 95°C for 15 sec, 54°C for 30 sec and 72°C for 35 sec, with data collected during the 72°C step. TaqMan data for glucagon were used for normalization. Primers were designed for the following genes: 2310061A09Rik (5'-GCCATCTGCATATTCTTTGCTAGCA-3' (forward) and 5'-ACATGGTTTAATGGTAGACTGGGCA-3' (reverse)); Cldn1 (5'-CTCAACCTCCCAACTGTTAAGATGA-3' (forward) and 5'-AACCTCTCCTATAACTGTCAGCTTC-3' (reverse)); Ostn (5'-GAGTGTTTGCTTCAACTGTGTCAGA-3' (forward) and 5'-AACAAGCCAGGCAGTAACTTCTTTT-3' (reverse)); Uts2d (5'-GAGTGTTTGCTTCAACTGTGTCAGA-3' (forward) and 5'-TAGGCTGGTAGAAGTAAACAAGCCA-3' (reverse)); 2610529H08Rik (5'-TGGCGTCTAGGGAACTGAGTTTCTT-3' (forward) and 5'-TGAGGAAACAGCAGTACACGATAAC-3' (reverse)); D16Bwg1543e (5'-GCTGGCTGCAGGGAACAATCTATTT-3' (forward) and 5'-GATGTAGACATATGAGTGGTAGTGA-3' (reverse)); B230343J05Rik (5'-TGTGATTCATCATCGCTACAGGGAA-3' (forward) and 5'-AACCTTCTCAAAAGCAAGGCCTTGT-3' (reverse)). Amplification conditions for TaqMan assays were as described above for genotyping. Commercial "Primer probe mixes" (Applied Biosystems) were used for Il1rap (ILRAP5-K1), Fgf12 (FGF12-1-A2) and Cldn16 (CLDN-I55S4).
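The conversion from Ct values to endogenous copy number is not spelled out in the text. The sketch below assumes the comparative Ct method with glucagon as the normalizer, wild-type mice as the two-copy calibrator, and equal, near-100% efficiency of the SYBR Green and TaqMan assays. Estimates are snapped to the nearest expected state of 2, 3 or 4 copies. Function names are hypothetical.

```python
from statistics import mean


def relative_copy_number(sample_target, sample_glucagon, wt_target, wt_glucagon):
    """Copy number of a gene relative to wild type (2 copies), comparative Ct method."""
    ddct = (mean(sample_target) - mean(sample_glucagon)) - (mean(wt_target) - mean(wt_glucagon))
    return 2 * 2 ** (-ddct)


def nearest_state(copies, states=(2, 3, 4)):
    """Snap a continuous copy number estimate to the closest expected state."""
    return min(states, key=lambda s: abs(s - copies))
```

A gene in the duplicated chromosome 16 region would be expected near 3 copies in heterozygous and near 4 copies in homozygous mice, while genes outside the region should stay near 2.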
Microarray data have been deposited in GEO under accession number GSE7364 [39].