Stromal contamination is one of the major confounding factors in the analysis of solid tumor samples by single nucleotide polymorphism (SNP) arrays. As we propose to use genome-wide SNP microarray analysis as a diagnostic platform for neuroblastoma, the sensitivity, specificity, and accuracy of these studies must be optimized. To investigate the effects of stromal contamination, we derived early-passage cell lines from nine primary tumors and compared their genomic signature with that of the primary tumors using 100K SNP arrays. The average concordance between tumor and cell line for raw loss of heterozygosity (LOH) calls was 96% (range, 91–99%) and for raw copy number alterations, 71% (range, 43–87%). In general, more LOH events were identified in the cell lines than in the matched tumor samples (mean increase, 3.2% ± 1.9%). We have developed an algorithm that shows that the presence of stroma contributes to under-reporting of LOH and copy number loss. Notable findings in this sample set were uniparental disomy of chromosome arms 11p, 1q, 14q, and 15q and a novel area of amplification on chromosome band 11p15. Our analysis shows that LOH was identified significantly more often in derived cell lines than in the original tumor samples. Although this may in part be due to clonal selection during adaptation to tissue culture, our study indicates that stromal contamination may be a major contributing factor in underestimation of LOH and copy number loss events.
Neuroblastoma is the most common extracranial solid tumor in children and accounts for over 15% of pediatric cancer-related deaths (1). The hallmark of neuroblastoma is its clinical heterogeneity, and seemingly similar tumors can have drastically different prognoses. Recently, we used the Affymetrix 10K SNP array to detect areas of loss of heterozygosity (LOH) and corresponding changes in copy number in paired blood and primary tumor samples from children with high-risk neuroblastomas (2). We detected regions of LOH with copy number loss (CNL) on chromosomes 1p, 3p, 4p, 11q, and 14q; LOH with no change in copy number, i.e., uniparental disomy (UPD), on chromosome 11p; and 17q LOH with concomitant copy number gain (CNG).
As genome-wide single nucleotide polymorphism (SNP) microarray analyses are being used as a platform for discovery of clinically relevant genetic alterations in neuroblastoma, it is important to validate the accuracy of these studies. Contamination by normal stromal elements may confound the detection of clonal tumor abnormalities, decreasing the sensitivity of identifying LOH or copy number alterations (CNA; refs. 3, 4). The primary goal of the current study was to determine whether or not contamination by normal stromal elements in the tumor samples could lead to under-reporting of LOH and CNA events. Here, we compare neuroblastoma primary tumors with derivative cell lines, the latter lacking contamination by the stromal component of the primary tumor. We show that contamination by normal elements can confound LOH and copy number determination. In addition, we confirm the previously observed genetic alterations of 11p UPD, 17q LOH with 17q gain, and the association of 1p LOH, MYCN amplification, and 17q gain with 10q loss. We also identified UPD in three new regions, 1q, 14q, and 15q, and a new region of amplification on chromosome 11p.
Samples were identified from the Children's Oncology Group Neuroblastoma Nucleic Acids Bank that had both constitutional DNA from the patient's peripheral blood mononuclear cells and an established low-passage cell line. Patients were staged according to the International Neuroblastoma Staging System (5), histology was analyzed using the Shimada classification (6), and MYCN gene amplification and DNA ploidy were determined as previously described (7, 8). This project was approved by the Institutional Review Boards of the Dana-Farber Cancer Institute, the Children's Hospital of Philadelphia, and Children's Hospital Los Angeles.
DNA was extracted from tumor tissue, cell lines, and blood samples using the Qiagen method. Cryopreserved cell lines, which had been established at the time of initial diagnosis as previously described (8) and passaged no more than 4 to 5 times, were thawed and grown in culture for at most 2 passages in RPMI with 10% fetal bovine serum. When 80% to 90% confluent, cells were collected by trypsinization and pelleted by centrifugation at 3,500 rpm for 3 min.
The Affymetrix 100K SNP array was used for all experiments, according to the methods described by the manufacturer. Briefly, 250 ng of tumor, cell line, and control DNA were digested with the appropriate restriction enzyme (HindIII/XbaI) and ligated to an adaptor sequence (Xba/Hind). For each sample, the ligated DNA was PCR amplified under recommended conditions, using primers complementary to the adaptors. Purified PCR products were fragmented and end labeled with biotinylated ddATP, using terminal deoxynucleotidyl transferase. The labeled DNA was hybridized to the 100K SNP chips, washed, incubated with streptavidin, and stained with biotinylated antistreptavidin antibody and a streptavidin R-phycoerythrin conjugate. Chips were scanned with an HP scanner (Hewlett-Packard) as per the manufacturer's recommendations. Affymetrix genotyping software (Affymetrix GeneChip 5.0) was used to examine the SNP hybridization patterns and to make SNP calls of all loci in each of the tumor samples and their corresponding matched controls.
The resulting data were analyzed with the dChip software package. An LOH call was assigned as follows: LOH (AB in blood, A/B in tumor/cell line; blue); ROH, or retention of heterozygosity (AB in both blood and tumor/cell line; yellow); noninformative (A/B in blood; gray); no call (SNP no-call in blood or tumor/cell line; white); or conflict (A/B in blood, B/A or AB in tumor/cell line; red; Supplementary Fig. S1). Larger areas of LOH were obtained by using hidden Markov models to infer the LOH status at noninformative SNPs (9, 10).
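The call categories above can be written as a small lookup. The following Python sketch is illustrative only and is not dChip's implementation; the function name `classify_loh` and the genotype encoding ("A", "B", "AB", or `None` for a no-call) are assumptions made for this example.

```python
from typing import Optional


def classify_loh(blood: Optional[str], sample: Optional[str]) -> str:
    """Assign an LOH category to one SNP from paired genotype calls.

    Genotypes are encoded as "A", "B", "AB", or None (no-call); this
    encoding is an assumption of the sketch, not part of the source.
    """
    if blood is None or sample is None:
        return "no_call"
    if blood == "AB":
        # Heterozygous in blood: the locus is informative.
        return "retention" if sample == "AB" else "loss"
    # Homozygous in blood: noninformative unless the sample disagrees.
    return "noninformative" if sample == blood else "conflict"
```

For example, `classify_loh("AB", "B")` gives `"loss"`, and `classify_loh("A", "AB")` gives `"conflict"`.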
Copy number changes were calculated by comparing the normal and tumor samples, using dChip (11). Patient blood samples were averaged for use as normal for copy number comparison. As in our previous work, CNAs were defined as follows: CNG, 2.8 to 5 copies; amplification, >5 copies; CNL, <1.2 copies.
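The copy number thresholds can be sketched as a simple categorization. This Python function is a hypothetical helper, not part of dChip; in particular, treating a value of exactly 2.8 as a gain is an assumption, since the text does not say how boundary values were handled.

```python
def classify_copy_number(copy_number: float) -> str:
    """Map an inferred copy number to the categories defined in the text.

    Thresholds: amplification > 5, gain 2.8 to 5, loss < 1.2.
    Boundary handling at exactly 2.8 is an assumption of this sketch.
    """
    if copy_number > 5.0:
        return "amplification"
    if copy_number >= 2.8:
        return "gain"
    if copy_number < 1.2:
        return "loss"
    return "normal"
```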
Exported LOH and inferred copy number data were further analyzed and visualized using custom-written Python scripts (available upon request).
To determine the contribution of normal elements to LOH and CNL detection in tumor specimens versus cell lines, the raw LOH and copy number data were examined. Overlapping areas of LOH were determined by querying the raw data for windows of 150 SNPs with >5% LOH in both the tumor and cell line. Within these common regions of LOH, the number of loci displaying LOH and ROH were tallied for both the tumor and cell line, and χ² analysis was used to determine the significance of the difference. For example, in sample #1355, there were 2 common blocks of LOH (1 shown in Fig. 1). Within these 2 blocks, there were 191 and 47 loci displaying retention in the tumor and cell line, respectively, and 408 and 730 loci displaying LOH, respectively. The χ² value was 157.9 (P < 0.0001). In other words, the difference between the amount of retention seen in the tumor sample compared with the cell line is highly significant and unlikely to be attributable to chance alone.
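The worked example for sample #1355 can be checked with a Pearson chi-square test on the 2 × 2 table of retention and LOH counts. The text does not state whether a continuity correction was applied; the sketch below assumes none, and with the counts given above the uncorrected statistic evaluates to about 157.9. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and the P value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: tumor, cell line; columns: retention (ROH), LOH.
chi2, p_value = chi_square_2x2(191, 408, 47, 730)
print(f"chi-square = {chi2:.1f}, P < 0.0001: {p_value < 0.0001}")
```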
Sample noise was determined by tallying the number of loci with either a “conflict” call or “no-call.” These calls were tallied for both the common areas of LOH as well as for all SNPs.
The average copy number was calculated over a sliding window of 150 SNPs, normalized against the blood sample. For areas where the copy number of the tumor sample or cell line was ≥0.8 less than that of the blood (CNL), the number of loci within these regions was tallied to determine how often the value for tumor was intermediate to that of blood and cell line (C<T<B) and vice versa (T<C<B).
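The windowed comparison can be sketched as follows. This is an illustration under stated assumptions rather than the scripts used in the study: the window is advanced one SNP at a time, tallies are counted per window, and the names `sliding_means` and `tally_intermediate` are hypothetical.

```python
def sliding_means(values, window=150):
    """Mean copy number of every consecutive window (step of one SNP)."""
    if len(values) < window:
        return []
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


def tally_intermediate(blood, tumor, cell, window=150, min_drop=0.8):
    """Count CNL windows where tumor (C<T<B) or cell line (T<C<B) is intermediate."""
    counts = {"C<T<B": 0, "T<C<B": 0}
    windows = zip(sliding_means(blood, window),
                  sliding_means(tumor, window),
                  sliding_means(cell, window))
    for b, t, c in windows:
        # Keep only windows where tumor or cell line is >= min_drop below blood.
        if min(t, c) > b - min_drop:
            continue
        if c < t < b:
            counts["C<T<B"] += 1
        elif t < c < b:
            counts["T<C<B"] += 1
    return counts
```

Under the stromal contamination hypothesis, the `"C<T<B"` count would be expected to exceed the `"T<C<B"` count.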
Blood and tumor samples from 10 patients with neuroblastoma, along with derivative cell lines, were analyzed by 100K SNP arrays. All but one patient had metastatic disease at diagnosis, and all met the Children's Oncology Group criteria for having high-risk disease. Initial evaluation of SNP genotype data showed that one of these samples was incorrectly paired with the corresponding control, and it was therefore left out of subsequent analyses (data not shown). All data shown are from the nine remaining samples, all of which were shown to be correctly paired, based on SNP identity between blood and tumor samples (71.1–97.4%; average, 91.6%) and blood and cell line samples (72.5–97.1%; average, 89.3%) for SNPs called in both blood and cell line.
The average genotype call rates for all samples were >94% (Supplementary Table S1). For the combined HIND/XBA set, the average call rates were 0.942 ± 0.047 for blood, 0.982 ± 0.003 for tumor, and 0.954 ± 0.033 for cell lines derived from tumors. An informative locus for LOH analysis was defined as having AB in the blood sample and anything except no-call in the tumor sample. An average of 26% and 25% of the loci were informative in tumor and cell lines, respectively. The ratio of loci with LOH to number of informative loci ranged from 0.9% to 7.9% (average, 3.6%) in the tumor and 2.1% to 14.8% (average, 6.7%) in the cell line samples. This wide variation in LOH/informative loci ratios is most likely due to the heterogeneity of the tumors. The LOH/informative loci ratio was higher in all cell lines compared with tumor samples (average, 45.7% higher) possibly due to contamination of tumor samples by normal stromal elements.
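As a minimal sketch of the LOH/informative loci ratio, assuming genotypes are encoded as "A", "B", "AB", or `None` for a no-call (an encoding chosen for this example, with a hypothetical function name):

```python
def loh_to_informative_ratio(blood, sample):
    """Fraction of informative loci that show LOH in the sample.

    A locus is informative when blood is "AB" and the sample has a call.
    Returns None when there are no informative loci.
    """
    informative = 0
    loh = 0
    for b, s in zip(blood, sample):
        if b != "AB" or s is None:
            continue  # homozygous or uncalled in blood, or no-call in sample
        informative += 1
        if s in ("A", "B"):
            loh += 1
    if informative == 0:
        return None
    return loh / informative
```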
For LOH, concordance was measured simply as a match of the raw SNP call between tumor and cell line (A, B, AB, no-call), as well as matching blood/tumor LOH and the blood/cell line LOH (loss, retention, noninformative, conflict for each SNP). For copy number concordance, each SNP was assigned a category (<1.2, loss; >2.8, gain; >5, amplification) and this category was compared between tumor and cell line. Inferred LOH was determined using dChip's hidden Markov model function (9, 10). For raw SNP calls, the average concordance was 94% (83–98%; Supplementary Table S2). For raw LOH calls, the average concordance was 96% (91–99%), whereas for inferred LOH calls, the average was 98% (90–100.0%). The high degree of concordance for LOH calls confirms that the cell lines were derived from the tumor samples and not from other surrounding elements (12). Copy number analysis revealed lower levels of concordance (raw, 43–87%; average, 71%; inferred, 60–91%; average, 82%).
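Concordance as described here is a per-SNP match rate between two call vectors. The sketch below assumes that no-call counts as its own category and stays in the denominator, which the text implies by listing it among the compared calls but does not state outright; the function name is hypothetical.

```python
def concordance(calls_a, calls_b):
    """Fraction of SNPs whose category matches between two samples."""
    if len(calls_a) != len(calls_b):
        raise ValueError("call vectors must cover the same SNPs")
    if not calls_a:
        raise ValueError("no SNPs to compare")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)
```

The same function applies to raw genotype calls, LOH categories, or copy number categories, as long as both vectors use the same labels.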
We hypothesized that contamination by normal stromal elements leads to an under-reporting of LOH and CNA events in tumor specimens. Within areas of LOH, stromal contamination should, in effect, “dilute” the LOH calls and copy numbers and manifest as ROH in the tumor sample and LOH in the derivative cell line. Outside areas of LOH, stromal contamination should be transparent, as normal elements do not affect the SNP calls. We observed this visually, and a representative section is shown in Fig. 1. In areas of LOH on chromosome 1p (Fig. 1A, red vertical bars), we observed more loci with loss in the cell lines than in the tumors, whereas outside the regions of LOH, this increase was not observed. We sought to quantify and compare the number of loci that displayed ROH within the areas of LOH in the tumor versus the derived cell line. We developed an algorithm to query the raw LOH data for windows of 150 SNPs containing at least 5% loci with LOH in both the tumor and the cell line. Analysis of the data confirmed that this algorithm indeed recognized all common blocks of LOH identified by visual inspection. Of note, the hidden Markov model inference algorithm in dChip failed to find some common regions of LOH, specifically those for which there were many loci that displayed ROH in the tumor and LOH in the cell line (Fig. 1B). Within these common regions of LOH, we tallied the number of loci that displayed ROH and compared this tally to the number of loci within these regions displaying LOH (Table 1). For eight of nine samples, there was a statistically significant increase in the number of loci with ROH in the tumor compared with the cell line. One sample (#5463) did not have a significant difference between tumor and derivative cell line, perhaps due to less stromal contamination in the tumor.
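The window query can be sketched as below. This is not the authors' script: it assumes the window slides one SNP at a time, that the 5% threshold is taken over all SNPs in the window and is inclusive, and that overlapping qualifying windows are merged into one block. The function names and the "loss"/"retention" labels are hypothetical.

```python
def common_loh_blocks(tumor, cell, window=150, min_fraction=0.05):
    """Find blocks (start, end-exclusive) with enough LOH in both samples."""
    if len(tumor) != len(cell):
        raise ValueError("samples must cover the same SNPs")
    n = len(tumor)
    if n < window:
        return []
    t_flags = [call == "loss" for call in tumor]
    c_flags = [call == "loss" for call in cell]
    t_count = sum(t_flags[:window])
    c_count = sum(c_flags[:window])
    threshold = min_fraction * window
    blocks = []
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window: add the entering SNP, drop the leaving one.
            t_count += t_flags[start + window - 1] - t_flags[start - 1]
            c_count += c_flags[start + window - 1] - c_flags[start - 1]
        if t_count >= threshold and c_count >= threshold:
            end = start + window
            if blocks and start <= blocks[-1][1]:
                blocks[-1] = (blocks[-1][0], end)  # merge with previous block
            else:
                blocks.append((start, end))
    return blocks


def tally_in_blocks(calls, blocks):
    """Count retention and loss calls inside the given blocks."""
    counts = {"retention": 0, "loss": 0}
    for start, end in blocks:
        for call in calls[start:end]:
            if call in counts:
                counts[call] += 1
    return counts
```

The per-sample tallies from `tally_in_blocks` form the 2 × 2 table of retention and LOH counts compared between tumor and cell line.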
Sample noise may manifest as a SNP being called as conflict or no-call. Visual inspection of the LOH data suggests that in two samples (#298 and #4793; Table 1B), there are many more loci displaying either conflict or no-call in the cell line versus the tumor sample. We quantified this by tallying the number of loci with either conflict or no-call both within areas of LOH and outside (Table 1B). Consistent with our visual observations, samples #298 and #4793 had significantly more loci with conflict or no-call in the cell line versus the tumor, both inside and outside regions of LOH. The remaining samples had either fewer conflict or no-call loci in the cell line (#400, #4030, #1355, #5134, #5463) or a small increase (#1038, #4903), although these latter two samples displayed a decrease in conflict/no-call within regions of LOH. Taken together, these data suggest that, at least for seven of the nine samples, sample noise does not account for the increased LOH seen in the cell line versus the tumor sample.
We next looked for the effect of stromal contamination on the determination of copy number. If normal elements are present in the tumor specimens, then in areas of CNL, the copy number for the tumor should be intermediate to that of the blood and cell line. Even at low magnification (Fig. 1C), it is evident that the CNL is more exaggerated in the cell line versus the tumor samples, whereas outside of the areas of CNL, there seems to be no difference. To quantify this, we calculated the average normalized copy number across the genome for a sliding window of 150 SNPs. For areas where the average copy number of the cell line or tumor sample was ≥0.8 less than normal (CNL), we compared the copy number of the blood, tumor, and cell line specimens (Table 1C). In eight of nine samples, there were more windows where the average copy number for the tumor was intermediate to that of the blood and the cell line.
We tabulated the LOH and CNAs observed over all samples (Supplementary Table S3; Fig. 2). The most common region of LOH was on chromosome 1p (seven of nine tumors). LOH of chromosome arm 10q was seen in two samples in both tumors and cell lines and in one sample in the cell line only. LOH was seen at 11p in one tumor and corresponding cell line. There was partial gain of chromosome arm 17q in eight of nine tumor samples and corresponding cell lines.
Most areas of LOH corresponded to areas of CNL (Supplementary Table S3A; Fig. 2A), as was seen in all cases of 1p LOH. UPD of 11p was seen in 4 of 22 neuroblastoma samples in our previous 10K study (2). Similarly, in this analysis, 11p LOH was accompanied by normal copy number in one tumor (#400; Fig. 3A). Novel areas of UPD were seen also on chromosome arms 14q, 15q, and 1q (Fig. 3B–D). As in our previous study, we again saw 17q gain accompanying LOH (Fig. 4A).
A novel area of amplification was found on chromosome 11p in 1 sample (Fig. 4B), with copy numbers >20. High copy numbers were also found on chromosome arm 2p around the MYCN oncogene (Supplementary Fig. S2B). Another discontinuous area of high copy number on chromosome arm 2p was found in 1 sample (#1355, smallest region of overlap, 30.15–30.17 Mb; data not shown), surrounding the ALK gene.
Of the seven samples demonstrating LOH on chromosome arm 1p, five showed amplification of chromosome 2p at the MYCN locus (Supplementary Fig. S2A–B), whereas the other two showed CNG on chromosome arm 2p at the same locus. LOH on chromosome arm 10q in two samples with 1p LOH and MYCN amplification was also seen (Supplementary Fig. S2C). In one sample (#5134), 10q LOH was only seen in the cell line and not in the tumor sample.
One of the goals of this study was to determine if cells cultured from tumors enable identification of genetic abnormalities that may be masked by stromal elements present in the primary tumor samples. We hypothesized that genome-wide studies of tumor samples may under-report genetic changes if the samples are contaminated with normal stromal tissue or heterogeneous tumor elements containing diploid DNA. To study this question, we performed parallel analyses of tumors and derivative cell lines. Care was taken to ensure that cell lines in this study were analyzed before multiple passages so as to minimize novel genetic abnormalities that may arise during culture. If the tumor samples were contaminated by normal stromal elements, we would expect to see more LOH in the cell line samples at a given locus because they would represent a more “pure” population, and indeed, this was seen to be the case. For instance, the ratio of loci with LOH to all informative loci was found to be consistently higher for cell line samples compared with tumor samples (Supplementary Table S1). Similarly, in eight of nine samples, findings of CNL were “muted” in the tumor samples compared with the cell lines. Our findings suggest that studies of tumor tissue may lead to an under-reporting of LOH and CNL. An alternative explanation for the relatively increased degree of LOH and CNL seen in the cell line samples is that the cell lines selectively maintained and intensified those anomalies that were essential for tumorigenesis and survival. This hypothesis would only hold for areas outside of blocks of LOH, as areas of LOH should be lost in whole and not one allele at a time. Outside areas of LOH, the cell line data mostly mirrored tumor data, but there were some areas of LOH that were seen in the cell lines only and not in the parent tumor. These few areas are likely new abnormalities acquired during cell culture despite keeping the number of passages to a minimum. 
For example, in one sample (#5134), 10q LOH was only seen in the cell line with no suggestion of LOH in the tumor sample. In contrast, although the inferred LOH at 1p was seen for samples #4030 and #1038 only in the cell line and not in the tumor samples, when the raw LOH calls are viewed (Fig. 1A), it is readily apparent that the inferred LOH under-represents the true extent of loss because of the increased retention calls in the tumor sample. This may be due to the masking of LOH by contamination by either normal elements or heterogeneous diploid tumor DNA.
An alternative hypothesis is that increased sample noise in the cell line could account for the increased number of loci displaying LOH. When we queried loci with either no-call or conflict as a measure of noise, we found that for five of nine samples, there was more noise in the tumor specimen. For two of the four samples in which there was slightly more noise in the cell line overall (#1038, #4903), the trend within the areas of LOH was the opposite, i.e., there were more loci with a call of no-call or conflict in the tumor samples. Thus for seven of nine samples, there is no evidence to support the theory that increased noise could account for the increased loci with LOH seen within blocks of LOH in the cell line samples. For the remaining two (#298, #4793), this possibility cannot be excluded.
These findings have several implications. First, it is common to study tumor tissue when available, rather than a derived cell line, so it is not surprising that normal elements in the surrounding tissue stroma can contaminate the findings of LOH and CNAs. In addition to effects of normal elements, it is also conceivable that other populations of tumor cells may complicate findings. One way to circumvent this problem is to use laser capture microdissection to isolate tumor cells before analysis (13, 14). Another approach is to derive cell lines from the tumor cells, as done in this study. One of the common criticisms of this approach is that there will be a selection for advantageous mutations in the tumor. Our data do not support this, and we feel that it is reasonable to culture tumor cells for a limited number of passages (~5) to obtain a more homogeneous population. Finally, an algorithm for genotype copy number correction based on aneuploidy that might also account for contamination by normal stromal elements was recently published (3).
We found LOH on chromosome 1p in almost all samples and 10q and 14q LOH in a subset. LOH at 10q has been reported in 18% to 53% of neuroblastoma tumors (15, 16) and is significantly associated with decreased survival, albeit in small numbers (16). There are conflicting reports on the clinical relevance of 14q deletion, with 14q LOH correlated both with advanced disease (17) and with low- and intermediate-risk tumors (16). Of note, our earlier work using the 10K array showed LOH at 11q in 68% of the samples, whereas in the current study, LOH at 11q was not seen. This is likely explained by the fact that all but two samples had MYCN amplification, which is negatively correlated with 11q LOH (18).
Gain of 17q, associated with a more aggressive phenotype (19), was seen in a majority of the samples as previously shown. The CNG seen on chromosome 1q has been reported in 20% of primary tumors and seems to be a marker for therapy failure (20–22). Although it has mainly been reported as a finding in cell lines (16), we found gain of 1q in two tumors and corresponding cell lines.
We found a novel area of 11p amplification in one sample (Fig. 4B). One of the genes found in this area, SOX6, has been associated with neuronal differentiation and tumor response to retinoic acid (23). In addition to amplification of the MYCN gene on 2p24 and another locus of amplification at 11p15, one sample also showed MYCN-discontinuous CNG at the site of the ALK gene (sample #1355; region not shown). ALK encodes a tyrosine kinase receptor and has been found to be amplified and mutated in neuroblastoma (24).
The mechanism of LOH in an area can be inferred by assessing the corresponding copy number change. LOH is defined as the loss of one parental allele, leaving the other allele unopposed, resulting in a decrease in copy number because half of the genetic material will be missing. Most areas of LOH seen in this study (e.g., 1p, 10q) correspond to areas of CNL. The presence of one or more tumor suppressor genes on 1p has been postulated, and several candidate genes have been identified, including KIF1B (25) and CHD5 (26). In this study, we found that all three samples with LOH and CNL on chromosome arm 10q also had LOH of 1p, MYCN amplification, and 17q gain. Whole chromosome 10 loss has been reported in ~50% of tumors with MYCN amplification and is significantly predictive of a poor outcome (16). It is very likely that 10q is the site of a tumor suppressor gene that acts in concert with the other three adverse prognostic factors to induce tumor progression.
UPD is the result of the loss of one parental allele with a duplication of the other allele, manifesting as LOH without a change in copy number. We found four loci of LOH with neutral copy number: 1q, 11p, 14q, and 15q. As in our previous study, we found UPD of 11p. One explanation for UPD is that the duplicated allele contains a mutated gene that is selective for tumor proliferation or progression. It is interesting that 1q is a region that is normally gained in neuroblastoma, and in this study, we see LOH without copy number loss. LOH of 14q has previously been reported in 22% of primary tumors (17), but this is the first reported occurrence of UPD involving 14q. Loss of 15q has been reported previously in one study and was associated with disease progression (27), whereas gain of 15q has also been reported in neuroblastoma (22). Larger studies are required to determine the frequency and relevance of 15q abnormalities in neuroblastoma.
Finally, there are instances where there is loss of one allele with a concomitant increase in copy number of the other allele. Our previous finding of 17q LOH with CNG was again seen in this study in both the tumor and derivative cell line. It is possible that one allele is lost early in neuroblastoma evolution leading to gain/amplification of the other allele.
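The three patterns discussed above (LOH with copy number loss, copy-neutral LOH or UPD, and LOH with copy number gain) can be summarized in a short sketch. The thresholds are the copy number cutoffs given in the methods; the function name, the returned labels, and the treatment of a value of exactly 2.8 as a gain are assumptions of this example.

```python
def loh_mechanism(has_loh: bool, copy_number: float) -> str:
    """Label a region by combining its LOH status with its copy number."""
    if not has_loh:
        return "no LOH"
    if copy_number < 1.2:
        return "LOH with copy number loss"
    if copy_number >= 2.8:
        return "LOH with copy number gain"
    # LOH with an unchanged copy number: one allele lost, the other duplicated.
    return "uniparental disomy"
```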
The main aim of this study was to show that interpretation of SNP data from tumor specimens might be affected by contamination by normal stromal elements or diploid DNA from heterogeneous tumor specimens. We studied the differences in LOH calls between tumors and derivative cell lines and found evidence that normal elements were “muting” the LOH data in tumor samples. Furthermore, we found that, in general, the increase in LOH in cell lines cannot be explained through increased mutations in passaged cell lines. Rather, we believe that our data show that stromal contamination is more likely the cause, suggesting that low-passage tumor-derived cell lines may be a better starting material for genome-wide SNP array studies.
Grant support: Children's Oncology Group Grant U10 CA98543, NIH grant CA87847 (J.M. Maris), Bear Necessities Pediatric Cancer Foundation (S.L. Volchenboum), and Children's Oncology Group Young Investigator Award and the Hope Street Kids Research Fellowship (R.E. George).