|Home | About | Journals | Submit | Contact Us | Français|
To define copy number alterations and gene expression signatures underlying pediatric high-grade glioma (HGG).
We conducted a high-resolution analysis of genomic imbalances in 78 de novo pediatric HGGs, including seven diffuse intrinsic pontine gliomas, and 10 HGGs arising in children who received cranial irradiation for a previous cancer using single nucleotide polymorphism microarray analysis. Gene expression was analyzed with gene expression microarrays for 53 tumors. Results were compared with publicly available data from adult tumors.
Significant differences in copy number alterations distinguish childhood and adult glioblastoma. PDGFRA was the predominant target of focal amplification in childhood HGG, including diffuse intrinsic pontine gliomas, and gene expression analyses supported an important role for deregulated PDGFRα signaling in pediatric HGG. No IDH1 hotspot mutations were found in pediatric tumors, highlighting molecular differences with adult secondary glioblastoma. Pediatric and adult glioblastomas were clearly distinguished by frequent gain of chromosome 1q (30% v 9%, respectively) and lower frequency of chromosome 7 gain (13% v 74%, respectively) and 10q loss (35% v 80%, respectively). PDGFRA amplification and 1q gain occurred at significantly higher frequency in irradiation-induced tumors, suggesting that these are initiating events in childhood gliomagenesis. A subset of pediatric HGGs showed minimal copy number changes.
Integrated molecular profiling showed substantial differences in the molecular features underlying pediatric and adult HGG, indicating that findings in adult tumors cannot be simply extrapolated to younger patients. PDGFRα may be a useful target for pediatric HGG, including diffuse pontine gliomas.
High-grade gliomas (HGGs) comprise 15% to 20% of all childhood tumors of the CNS, and 70% to 90% of patients die within 2 years of diagnosis. Consequently, improved understanding of pediatric HGG to identify relevant therapeutic targets is essential.1
The frequency, anatomic location, and pathologic spectrum of gliomas differ in children and adults, suggesting that the representation of progenitor and mature cell types, as well as the microenvironment within the developing brain, may influence the disease process. Glioblastomas dominate adult disease, whereas juvenile pilocytic astrocytomas are the most common brain tumors in children. Pediatric glioblastomas often arise in brain regions that are rarely targeted in adult disease. In adults, most low-grade diffuse gliomas undergo anaplastic progression to a high-grade tumor over time, but progression of pediatric low-grade diffuse gliomas is rare.2,3
Array-based studies of adult glioblastoma identified common regions of genomic gain and loss and gene expression signatures, allowing molecular subclassification of tumors.4–11 Comprehensive studies integrating copy number, gene expression, and mutation analyses reported that virtually all glioblastomas have disrupted the p53, PI3K/receptor tyrosine kinase (RTK), and RB pathways through various genetic mechanisms.12,13
By comparison, pediatric HGG is an understudied disease. Specific genetic alterations underlying pediatric HGG were defined primarily by directed analyses of genes that are mutated in the more common adult HGG. Mutations in TP53, CDKN2A, and PIK3CA are common in both adult and pediatric HGG.14–16 PTEN mutations and EGFR amplifications, which are frequent in adult primary glioblastoma, are less common in pediatric HGGs, which also arise de novo.15,17 Two disease subsets of pediatric glioblastoma with differential survival that were distinguishable from adult glioblastoma were identified based on expression signatures.18 Array-based copy number studies of pediatric HGG using relatively small sample sizes supported a difference between childhood and adult tumors.19,20
Here, we provide, to our knowledge, the first report of a high-resolution unbiased analysis of genomic imbalances and gene expression signatures in a large collection of pediatric HGGs. We show that HGGs in children and adults are a related spectrum of disease driven by significantly different frequencies of genomic alterations.
We analyzed snap-frozen HGG specimens from 78 pediatric patients (< 23 years old) from St Jude Children's Research Hospital (Memphis, TN) and the Children's Cancer and Leukemia Group in the United Kingdom (Data Supplement). Ethical review committee approval was obtained from each institution/consortium. All tumors were collected before adjuvant therapy for the glioma including 10 gliomas that arose in patients who previously received irradiation (IR) for a different cancer (IR-induced tumors). Sections from matched formalin-fixed paraffin-embedded tissue were reviewed by neuropathologists (D.W.E. and J.L.). DNA extraction and, when sufficient material was available, RNA extraction and tissue smears were performed as described.21
DNA was labeled and hybridized to Affymetrix 500K GeneChips (Affymetrix, Santa Clara, CA). Fifty-three tumor samples with qualified RNA were profiled using Affymetrix Human Genome U133 Plus 2.0 arrays. Details of single nucleotide polymorphism data analyses, validation by quantitative polymerase chain reaction (Data Supplement) and FISH, and expression and statistical analyses are provided in the Appendix (online only). Array data are deposited at the Gene Expression Omnibus Web site (http://www.ncbi.nlm.nih.gov/geo/, accession No. GSE19578).
We used Affymetrix 500K GeneChips to identify copy number imbalances in 58 WHO grade 4 tumors and 20 WHO grade 3 pediatric HGGs (Data Supplement). Genomic Identification of Significant Targets in Cancer (GISTIC)22 was used to identify significant copy number aberrations (Fig 1). We excluded the 10 IR-induced tumors from the GISTIC analysis. To define candidate pediatric HGG cancer genes, we mapped regions of focal high-level amplification (copy number > five) or likely homozygous deletion (focal loss with copy number < 0.8), after exclusion for normal copy number variation (Data Supplement).
Recurrent broad low-amplitude gains of chromosome 1q and focal high-amplitude gains encompassing PDGFRA were observed at the highest frequency (29% and 12%, respectively; Fig 1A). The minimal common region of focal amplification was restricted to PDGFRA, which was consistently overexpressed when amplified (Fig 2, Data Supplement). Other significant gains were found in only 1% to 4% of tumors, with significance scores driven by high-level amplification (Fig 1A). Additional amplified genes included those encoding cell cycle progression proteins (CCND2, CDK4, MYC, and MYCN), RTKs and ligands (EGFR, MET, PDGFB, and NRG1), members of the PI3K pathway (PIK3C2B, PIK3C2G, PIK3R5, KRAS, AKT1, and S6K1), and p53 pathway regulation (MDM4; Data Supplement).
In contrast, most of the significant losses involved broad regions including chromosomes 10q (38%), 13q (34%), and 14q (29%; Fig 1B). The only narrow peak of high significance identified focal homozygous deletion encompassing CDKN2A/CDKN2B in 19% of tumors and was associated with absent expression of both genes (Data Supplement). Additional homozygous deletions of tumor suppressor genes of known importance in glioma included CDKN2C, NF1, PTEN, RB1, TP53, and TP73, reflecting abrogation of common signaling pathways,12,13 and the tyrosine phosphatase PTPRD.23,24 Further candidates included other tyrosine phosphatases (PTPRE and PTPN2), DNA repair genes (ATR, TOPBP1, and KU80), and members of the Notch pathway (DLK1 and NEDD4L; Data Supplement).
Seven glioblastoma samples were diffuse intrinsic pontine gliomas, an understudied tumor type that is rarely biopsied.1 Focal amplification of PDGFRA was observed in two (29%) of seven of these tumors. Copy number imbalances in this subgroup were not significantly different from other glioblastomas (P > .1 for gains or losses of all individual chromosome arms, PDGFRA amplification, and CDKN2A deletion; Data Supplement).
Overall, there was an average of 5.7 large regions of copy number imbalance per tumor (median, four regions; range, zero to 23 regions), with no difference between histologies or grade of tumor (P > .3; Data Supplement). The total numbers of large-scale gains and losses per tumor were not significantly different in IR-induced tumors (P > .19, nonparametric Wilcoxon test). However, specific gains of chromosome 1q and 9q and losses of 13q and 1p were significantly more frequent in IR-induced tumors compared with the rest of the HGGs (Table 1; P = .03, P = .01, P = .04, and P = .01, respectively). All other copy number imbalances for individual chromosomal arms were not significantly different between IR-induced tumors and other HGGs (P > .1, Fisher's exact test). One hundred ninety-six regions with focal aberrations were observed, comprising 66 amplifications and 130 deletions. Focal amplifications of PDGFRA and deletions of CDKN2A/B were significantly more common in IR-induced tumors (Table 1; P = .01 and P = .05, respectively).
There was a significant association between 1q gain and decreased survival among patients with glioblastoma, excluding IR-induced tumors (P = .04; Data Supplement); however, we could not determine whether this effect was independent of treatment with the available sample size.
We compared imbalances in pediatric HGG with published findings on copy number changes in adult glioblastoma. We considered frequencies in all pediatric HGG and pediatric glioblastoma alone, excluding variant glioblastomas, for comparison with adult glioblastoma (Table 1 and Data Supplement). Pediatric glioblastomas were clearly distinguished from adult glioblastomas by frequent gain of chromosome 1q and the paucity of chromosome 7 gains and 10q losses. The most frequent focal amplifications differ, with PDGFRA and EGFR predominant in childhood and adult populations, respectively. In contrast, the frequencies of 13q and 14q loss were similar between pediatric and adult glioblastoma. Copy number imbalances were not significantly different between pediatric glioblastomas and all de novo pediatric HGG (P > .2; Table 1).
In adults, secondary glioblastomas show overexpression or amplification of PDGFRA but rarely contain EGFR amplification,3 suggesting that pediatric HGG with PDGFRA amplification may be molecularly similar to adult secondary glioblastoma. IDH1 mutations at codon 132 strongly distinguish adult secondary from primary glioblastoma, with frequencies of 85% compared with 5%, respectively.12,25–28 We sequenced IDH1 exon 4, containing codon 132, from 78 pediatric HGGs and 11 pediatric low-grade gliomas. No codon 132 mutations were detected, consistent with previous reports showing only rare IDH1 mutations in smaller collections of pediatric HGGs.26,28 The only IDH1 alterations found in our pediatric collection were in HGG153, which contained two missense mutations in trans, encoding R49C and G97D, which were not found in the matched germline DNA (Data Supplement), and focal homozygous deletion encompassing IDH1 in HGG10. The absence of hotspot mutations in IDH1 strongly distinguished pediatric HGG from adult secondary glioblastoma.
Although the majority of pediatric HGGs showed multiple genomic imbalances, 15 tumors in our collection showed relatively stable genomes (Data Supplement). Tissue smears from the frozen sample used for DNA and RNA extraction were available for nine of 15 stable samples, and seven of nine smears showed greater than 75% tumor cells (reviewed by D.W.E.), strongly suggesting that the tumor samples were of sufficient purity to detect copy number imbalances. Normal tissue within primary tumor samples can mask detection of copy number imbalances, especially homozygous deletions.29 Some of the stable cases showed conclusive evidence of minimal contaminating normal tissue by detection of homozygous deletions, loss of heterozygosity, and point mutations (Data Supplement). Thus, a subset of pediatric HGGs showed minimal copy number imbalances in relatively pure tumor samples.
We analyzed gene expression profiles from 53 of the tumors in our collection. Unsupervised hierarchical clustering identified three main tumor subgroups (HC1 to HC3; Data Supplement). Gene ontology analysis of upregulated genes that most discriminate each subgroup from the others (Data Supplement) showed the most significantly overexpressed genes were associated with cell cycle regulation in HC1, with neuronal differentiation in HC2, and with extracellular matrix–receptor interactions and cell adhesion in HC3. We used gene set enrichment analysis to show that these pediatric subgroups significantly recapitulated subgroups previously defined in adult HGG, termed proliferative (Prolif), proneural (PN), and mesenchymal (Mes)9 (Data Supplement).
We integrated genomic copy number imbalances with this expression subgroup classification (Fig 3). Most common abnormalities were distributed across subgroups. However, seven (88%) of eight of the amplifications targeting the PDGFR signaling cascade through PDGFRA and/or PDGFB were found in the Prolif/HC1 subgroup (association with this subgroup, odds ratio = 8.44, P = .05), implicating this pathway as a strong driver of proliferation in childhood tumors. Gain of 1q was found at high frequency in the Prolif/HC1 tumors (52%) and the PN/HC2 group (23%) but was significantly under-represented in the Mes/HC3 subclass (8%; odds ratio = 0.12; P = .04, Fisher's exact test).
Supervised comparison of glioblastomas to anaplastic astrocytomas showed significantly increased expression of genes associated with angiogenesis in glioblastoma, a strong molecular signature relating to the microvascular proliferation that is characteristic of these tumors (Data Supplement). Gene expression profiles were available from nine of 15 tumors showing overall genomic stability, and they were represented in each expression subgroup (Fig 3). As a group, tumors with a stable genome showed decreased expression of genes associated with cell cycle and DNA repair (Data Supplement).
Patients with HGG younger than age 3 years have a better prognosis than older children.30 None of the 11 infant HGGs showed chromosome 1q gain, a significant difference compared with HGGs from older children (P = .03). At the gene expression level, infant tumor gene expression profiles were heterogeneous, distributing among all three expression subclasses (Fig 3). Infant HGGs overexpressed genes involved in nervous system development and calcium ion binding and showed coordinated underexpression of multiple HOX genes when compared with HGGs from older children (Data Supplement).
As a group, IR-induced tumors significantly overexpressed genes relating to control of gene expression and metabolic processes compared with other pediatric HGGs (Data Supplement). There was significant over-representation of genes that mapped to chromosome 1q (P < .001) and the region of amplification encompassing PDGFRA (P < .001), reflecting the higher incidence of these copy number gains in IR-induced tumors.
We performed a principal component analysis (PCA) using data from our pediatric glioblastomas and adult glioblastomas that were analyzed on the same platform (n = 32,22,31 Fig 4). The first component of the PCA is predominantly associated with differences between the PN, Prolif, and Mes subgroups found in both pediatric and adult tumors. However, the second principal component shows a trend separating pediatric and adult tumors. Consistent with differences in frequencies of amplification, PDGFRA was significantly overexpressed (q = 0.000002) and EGFR was significantly repressed in pediatric tumors (q = 0.0003; Data Supplement). Gene ontology analysis highlighted over-represented pathways among the differentially expressed genes, including immune response, response to extracellular stimulus, cell adhesion, cytokine- and chemokine-mediated signaling pathways, and calcium-mediated pathways.
To determine whether gene expression signatures of pediatric HGG more closely resemble the smaller subset of adult glioblastomas with PDGFRA amplification, we identified gene sets distinguishing adult glioblastomas with PDGFRA amplification from those with EGFR amplification using published data13 (Data Supplement) and applied gene set enrichment analysis to the combined data from our pediatric collection and the adult glioblastoma data set used in Figure 4. Pediatric glioblastomas, regardless of PDGFRA amplification status, show significantly increased expression of the gene set upregulated in adult PDGFRA-amplified tumors (Fig 5A), whereas the gene set that is upregulated in adult EGFR-amplified tumors is downregulated in pediatric HGG as well as the adult PN subclass (Fig 5B). The adult PN subclass also shows overexpression of certain genes within the adult PDGFRA-amplified gene set, although these genes comprise a distinct subset compared with those upregulated in pediatric HGG (Fig 5A; Data Supplement). Notably, one adult tumor from a 53-year-old patient was similar to pediatric tumors in expression of the TCGA PDGFRA gene set and showed similar positioning to pediatric tumors in the PCA analysis (Figs 4 and and5).5). The subset of the PDGFRA-amplified gene set that showed the greatest enrichment in pediatric compared with adult glioblastoma was associated with cell cycle regulation and multicellular organismal development.
The high-resolution analysis of copy number and gene expression signatures reported here demonstrates that pediatric and adult HGGs represent a related spectrum of disease distinguished by differences in the frequency of copy number changes, in specific gene expression signatures, and by the absence of IDH1 hotspot mutations. In pediatric HGG, numerous genes within the p53, PI3K/RTK, and RB pathways are targeted by focal gain or loss (Data Supplement), but with the exception of PDGFRA and CDKN2A, other alterations were found only at low frequency.
Although the majority of pediatric HGGs in our series showed multiple genomic imbalances, a subgroup of tumors (15 of 78 tumors, 19%) lacked large-scale copy number changes. Other childhood tumors show subsets with balanced genomic profiles including ependymoma and CNS supratentorial primitive neuroectodermal tumors,32–34 Ewing sarcoma,35 and Wilms tumor.36 Tumors with balanced genomes may possess an inherited or acquired predisposition for generating copy neutral mutations, such as the subset of colorectal cancers that arise in the context of DNA mismatch repair.37 Alternatively, relatively fewer mutations may be required to drive the disease, as in pediatric acute myeloid leukemias, which show low frequency copy number imbalances and sequence alterations.38
Frequent gain of chromosome 1q clearly distinguished childhood from adult HGG and showed corresponding upregulation of gene expression involving the whole chromosomal arm, precluding identification of a focal target. A similar pattern of differential gain of 1q in childhood compared with adult brain tumors is seen in ependymoma,32,33,39 and gain of 1q is common in other pediatric malignancies.40–43
PDGFRA is the predominant target of focal amplification in pediatric HGG, in contrast to adult glioblastoma, where EGFR is the most common target. Previous studies have suggested that overexpression may be an alternative mechanism of activating EGFR in childhood glioblastoma.44–46 However, we found that EGFR was significantly underexpressed in pediatric compared with adult glioblastoma. The gene expression signature of adult tumors associated with EGFR amplification was relatively underexpressed in pediatric glioblastoma, whereas the gene expression signature associated with PDGFRA amplification was significantly overexpressed in pediatric glioblastomas, even in tumors that did not show amplification of the gene. Overall, both the copy number and gene expression analyses suggest that PDGFRα may be an important therapeutic target for pediatric HGG, including diffuse intrinsic pontine gliomas. A small subset of pediatric glioblastomas within the Mes subgroup appeared similar to adult tumors of the same subgroup when evaluated by PCA (Fig 4). The median age of onset for these tumors was 11.6 years, and one tumor was from an infant, so this similarity is independent of age. Interestingly, the gene expression signatures in this subset of pediatric tumors also showed the greatest similarity to adult glioblastomas with EGFR amplification (Fig 5B) and may indicate a small pediatric subgroup in which EGFR inhibitors may have a greater effect.
The preferential targeting of PDGFRA and EGFR in pediatric and adult HGG, respectively, suggests that the developing brain is more susceptible to oncogenic transformation triggered by aberrant PDGFR signaling. These differences may reflect changes in the populations of cell types expressing the growth factor receptors and their ligands during development and differentiation and complex regulation causing different cellular responses to activated signaling of the receptors. Both growth factor receptors play important roles in nervous system development and lineage commitment.47–49 Mouse models suggest that aberrant PDGF signaling, but not EGFR activation, was sufficient to trigger glioma formation.50–54 This is consistent with observations in human tumors where EGFR amplifications are rare in lower grade gliomas, whereas PDGF and PDGFR overexpression/amplification are found in both low-grade and high-grade astrocytomas, suggesting that activated PDGFR signaling is an early event in gliomagenesis.3
Mutations that cause tumor initiation will lead to tumor formation more rapidly by expanding the pool of available cells that may acquire additional mutations. This may be particularly relevant in early disease onset in children. Ten tumors in our study arose in patients previously treated for another cancer with cranial IR (IR induced), a mutagenic exposure that increases the risk of brain tumors.55 IR-induced tumors were similar to other pediatric HGGs, with gene expression profiles distinguishing them from adult glioblastomas by PCA (Fig 4) and in histopathologic features. The increased incidence of chromosome 1q gain and PDGFRA amplifications seen in IR-induced HGG may reflect radiation-induced initiating mutations that greatly increase the likelihood of developing childhood HGG. The same mutations also confer a strong selective advantage in pediatric HGG tumors arising spontaneously. Many pediatric HGG that lack amplification of PDGFRA or PDGFB still show strong expression of the gene signatures associated with PDGFRA amplification (Fig 5A), supporting the hypothesis that this pathway plays a central role in pediatric HGG and may be activated by multiple mechanisms.
Tumor samples were obtained from the St Jude Children's Research Hospital and Children's Cancer and Leukemia Group Tumor Banks. We thank Drs Marie-Anne Brundler, Keith Robson, and Safa Al-Sarraj for histopathology review; Dr Lisa Storer and Sarah Leigh-Nicholson for help with sample information; and Lara Perryman and Suzie Little for technical assistance.
The genotype calls from the dynamic model (Di X, Matsuzaki H, Webster TA, et al. 21:1958-1963, 2005) mapping algorithm of Affymetrix Genotyping Analysis Software (GTYPE version 4.0; Affymetrix, Santa Clara, CA) was improved using Bayesian Robust Linear Model with Mahalanobis (BRLMM) distance classifier algorithm (Rabbee N, Speed TP. Bioinformatics 22:7-12, 2006; http://www.biostat.jhsph.edu/~iruczins/teaching/misc/2008.140.668/papers/affymetrix2006.pdf). In addition to the samples collected in this study, we also included 43 normal samples from another study at St Jude Children's Research Hospital (Mullighan CG, Kennedy A, Zhou X, et al. Leukemia 21:2000-2009, 2007) for the estimation of genotype calls using multiple chip analysis approach with BRLMM, which is implemented in Affymetrix copy number pipeline software (v1.5.6). The call rates were improved significantly, with the majority of the chips having a call rate higher than 95%.
First, the copy number analysis was performed using dChipSNP (Zhao X, Li C, Paez JG, et al. Cancer Res 64:3060-3071, 2004; Lin M, Wei LJ, Sellers WR, et al. Bioinformatics 20:1233-1240, 2004) with the Affymetrix CEL files and BRLMM single nucleotide polymorphism (SNP) genotype call files used in this study. We used the standard dChipSNP normalization and model-based expression algorithms. Probe intensity data for each array were normalized to a baseline array with median signal intensity using the invariant set model. Model-based expression was performed using the perfect match/mismatch model to summarize signal intensities for each probe set. For copy number inference, raw copy number was calculated by comparing the signal intensity of each SNP probe set for each tumor sample against the median SNP data of the normal collections. Raw and median-smoothed copy number was examined to identify a set of normal (diploid) chromosomes.
We then used dChipSNP to compute the un-normalized signals for each marker probe set and normalize these signals with the reference normalization algorithm (Pounds S, Cheng C, Mullighan C, et al. Bioinformatics 25:315-321, 2009). Because the arrays were performed in two main batches, we applied the 5-nearest-neighbor normalization22 approach to remove the batch effect. For each tumor, the five most similar normal samples were identified among the entire 43 normal samples. In most cases, the five normal samples selected were from the same batch. We then calculated the log2 ratios for each sample using the sample signal and median of the five nearest normal sample signals and applied the circular binary segmentation algorithm (DNAcopy, Bioconductor; Olshen AB, Venkatraman ES, Lucito R, et al. Biostatistics 5:557-572, 2004). The segmentation on the smoothed log2 ratios was performed using a threshold P = .01 (significance level α = .01). We assigned gain or loss status to the segments with at least three SNPs and absolute log2 ratios greater than 0.2, with which the normal samples have few gain or loss regions.
Before applying Genomic Identification of Significant Targets in Cancer (GISTIC) analysis or focal lesion identification, we first removed the genomic regions that are associated with copy number variations (CNVs). Identification of excluded regions was similar to the approach used by the The Cancer Genome Atlas (TCGA).13 The regions included the following: CNVs found in a SNP6.0 analysis of all HapMap normals (McCarroll SA, Kuruvilla FG, Korn JM, et al. Nat Genet 40:1166-1174, 2008); CNVs identified in at least two independent publications listed in the Database of Genomic Variants (http://projects.tcga.ca/variation, version 3); and CNVs found in 23 matched normal and 20 unrelated normal samples by manual inspection. These regions are often either common among more than five samples or short segments (< 30 markers) appearing in more than one sample.
The segmented data were preprocessed by first removing the short fragments with less than four SNPs by merging to the neighbor fragments. We then merged neighbor fragments with log2 ratio values within 0.2 of each other. We applied the GISTIC algorithm22 to the processed segmented data. G-scores are plotted along each chromosome with both G-scores and q values labeled. Regions above q = 0.25 are considered significant.
We compared chromosome arm changes between pediatric high-grade gliomas (HGGs) and adult HGGs. If more than half of the markers on a chromosome arm had copy number gain or loss, then the entire arm was classified as gain or loss. For adult HGGs, we used level 3 (segmented) SNP6.0 TCGA data downloaded from the TCGA portal (http://cancergenome.nih.gov/dataportal/data/about/) in February 2009.
In addition to GISTIC, we also derived minimum common regions for focal amplification (copy number > five) or focal homozygous deletion (copy number < 0.8). Regions associated with known CNVs were removed as described earlier. All remaining regions were manually inspected for cancer/glioma-related genes, and candidate targets of amplification or deletion were selected based on careful manual evaluation of PubMed and GeneGo databases. Supporting publications for each candidate gene are listed in the Data Supplement. Additionally, microRNAs contained within the region or within 10 kilobases of the region were listed based on miRbase and the University of California, Santa Cruz genome browser.
Large regions of copy number imbalance were determined by merging the segmented data to the neighboring segment if the segment had less than four SNPs or the log2 ratio of the segment was within 0.5 of the neighboring segment. We then counted the number of fragments with log2 ratio less than –0.2 or more than 0.2 as large regions of copy number imbalance.
A group of tumors with stable genome were identified using the quality control approach as reported for adult glioblastoma.22 In brief, the log2 ratios were smoothed by taking the mean value across a running window of Hwindow (= 501) markers. The histogram was generated using a bin size Hbin (= 0.01) and smoothed by convoluting with Gaussian distribution that had a standard deviation of Hsigma (= 0.05). Samples with no detectable peaks were assigned as tumors with a stable genome.
We validated inferred copy number analyses using quantitative real-time polymerase chain reaction (PCR) for 41 loci (Data Supplement). Primers and probes (Data Supplement) were designed using Primer Express 3.0 (Applied Biosystems, Foster City, CA) or Primer 3 software (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi). GAPDH or FLJ16124 (chr 2) were used as internal standards to normalize the data. Two nanograms of DNA from the HGG samples or control normal DNA were amplified using a Taqman 7900 Real-Time PCR system and 7900 System Software (Applied Biosystems). Standard curves for each locus tested were generated from three-fold serial dilutions of control human WBC DNA. Quantitative real-time PCR for each primer set was performed in triplicate, and means are reported. For CDKN2A-exon3 and RB1-primer set 2, a Fast SYBR Green kit (Applied Biosystems) was used with the following PCR conditions: 95°C for 20 seconds, then 40 cycles of 95°C for 5 seconds and 60°C for 30 seconds. At the end of the PCR, samples were subjected to a melting analysis to confirm specificity of these amplicons. Standard-mode quantitative PCR using TaqMan Universal PCR Master Mix was performed (50°C for 2 minutes, followed by 95°C for 10 minutes, then 40 cycles of 95°C for 15 seconds and 60°C for 1 minute) for the following genes: BOC, chr 13q, CNTN2, DLK1, DLEU7, FAS, FBXL7, GALNT13, HTR2A, KLF12, KLF6, MRPL36, MYC, NEDD4L, PDGFB, SLITRK6, TP53, and TARP. For all other loci, fast-mode quantitative PCR was performed using TaqMan Fast Universal PCR Master Mix from Applied Biosystems (95°C for 20 seconds, then 40 cycles of 95°C for 5 seconds and 60°C for 30 seconds).
Where formalin-fixed paraffin-embedded material was available, fluorescent in situ hybridization for probes identifying PDGFRA (4q12), MET (7q31), CDKN2A/B (9p21), PTEN (10q23), RB (13q), and DLK1 (14q32) was performed as described (Lambros MB, Simpson PT, Jones C, et al. Lab Invest 86:398-408, 2006; McManamy CS, Pears J, Weston CL, et al. Brain Pathol 17:151-164, 2007). Probes were directed against PDGFRA (pool of BAC clones RP11-819D11 and RP11-58C6) and MET (Kreatech, Amsterdam, the Netherlands) and were labeled with Cy5 (GE Healthcare, Amersham, United Kingdom) and Platinumbright 495 (Kreatech), respectively, whereas chromosome 4 and 7 centromeric probes were labeled with fluorescein (GE Healthcare) and Platinumbright 550 (Kreatech), respectively. Additional probes were derived from BAC clones (Invitrogen, Carlsbad, CA) RP11-149I2 (CDKN2A), RP11-235C23 (9q31.2), CTD-2553L21 (PTEN), RP11-254A5/322I2 (10p11.21), RP11-1008O15 (KCNRG), RP11-936K15/539J14 (13q12.11), RP11-90G22 (DLK1), and RP11-659J20 (14q11.2) and labeled with either fluorescein isothiocyanate or rhodamine fluorochromes. For RB1, a commercial probe was used (Abbott Laboratories, Des Plaines, IL; Catalog No. 05J15-001). Images were captured using a cooled charge-coupled device camera.
Expression data were analyzed using Affymetrix Microarray Suite software. Gene expression signals were scaled to a target intensity of 500. Probe sets lacking present calls for any samples were excluded. Signals were then variance stabilized by adding 25 and log2 transformed for subsequent analysis.
We calculated the median absolute deviation score for each probe set using the log2 transformed data and selected the top 1,000 most variable probe sets for unsupervised hierarchical clustering analysis. Unsupervised hierarchical clustering analysis was carried out using GeneMaths software (Applied Maths, Austin, TX), with Pearson correlation as the similarity coefficient and Ward as the clustering method. Three major tumor subgroups were identified.
We then identified probe sets that are most upregulated in each subgroup by comparing one group against the other two. The LIMMA (Linear Models for Microarray Analysis; Smyth GK. Stat Appl Genet Mol Biol 3:Article3, 2004) and empirical Bayes t test implemented in Bioconductor (http://www.bioconductor.org; Gentleman RC, Carey VJ, Bates DM, et al. Genome Biol 5:R80, 2004) were used to identify differentially expressed probe sets at a Benjamini-Hochberg false discovery rate (q value) of less than 0.01 and fold change of more than 2. The heat map was generated using 350 probe sets from each group.
To characterize each of the subgroups, we performed gene ontology (GO) and pathway analysis using Database for Annotation, Visualization, and Integrated Discovery (DAVID) Bioinformatics Resources (http://david.abcc.ncifcrf.gov/home.jsp; Dennis G Jr, Sherman BT, Hosack DA, et al. Genome Biol 4:P3, 2003; Huang W, Sherman BT, Lempicki RA. Nat Protoc 4:44-57, 2009)16,17 with the upregulated genes specific to each group. The full list of probe sets and the results of GO and pathway analysis are provided in the Data Supplement.
Gene set enrichment analysis (GSEA) implemented in R (http://www.r-projects.org) was used to assess enrichment of previously identified adult HGG signature genes in pediatric HGG (Freije WA, Castro-Vargas FE, Fang Z, et al. Cancer Res 64:6503-6510, 2004).9 As opposed to the subgroups derived here using unsupervised analysis, the adult subgroups were derived initially from the comparison of tumor subgroups with different survival time. We generated seven gene sets using the signature genes from these subgroups and applied GSEA to the pediatric HGG, with one subgroup against the other two. All of the gene sets are significantly enriched in one of the tumor subgroups, with the gene sets generated from the Phillips et al9 subgroup scored on top of each pediatric subgroup.
There are a number of published studies including adult glioblastoma gene expression arrays (Freije WA, Castro-Vargas FE, Fang Z, et al. Cancer Res 64:6503-6510, 2004; Sun L, Hui AM, Su Q, et al. Cancer Cell 9:287-300, 2006).9,22,31 To minimize any cross-platform artifacts in our analysis, we restricted our comparison of pediatric glioblastoma to adult tumors profiled on the same platform as our study, Affymetrix Human Genome U133 Plus 2.0 arrays, as reported in Sun et al (Sun L, Hui AM, Su Q, et al. Cancer Cell 9:287-300, 2006), Beroukhim et al,22 and Lee et al.31 Nonetheless, because the array experiments were performed at different centers and at different times, we were still concerned with a possible batch effect of nonbiologic experimental variation in our data integration. However, because the batch effect and biologic effect (pediatric or adult) are confounded, we were unable to correct using a model-based method, such as Combat (Johnson WE, Li C, Rabinovic A. Biostatistics 8:118-127, 2007), or simply combining the z scores calculated within each data set, which assumes the means of the two data sets are the same. Instead, we selected a set of the least variable probe sets that are common between the two data sets to check the degree of any batch effect that may be present. First, we filtered all probe sets that are absent in all of the samples. Then we ranked the probe sets in each data set based on their median absolute deviation score, from least variable to most variable. Last, we chose probe sets that were within the top 1,000 in one data set and within the top 4,000 in the other data set as the common least variable probe sets.
Ten, 27, and 81 glioblastoma samples that are on HG-U133 Plus 2.0 arrays have been previously reported (Sun L, Hui AM, Su Q, et al. Cancer Cell 9:287-300, 2006).22,31 While combining the data, we noticed that the profiles of five arrays from Beroukhim et al22 were identical to ones included in the study from Lee et al31 (identical CEL files verified using md5sum). Therefore, we combined these two data sets together after removing the five duplicate arrays from Beroukhim et al22 (Set1, n = 32). There are 660 probe sets selected as the least variable ones between Set1 and our pediatric glioblastomas (n = 33). The P value for the two-sample t test is .7589, indicating no significant batch effect between Set1 and pediatric samples. Similarly, there are 650 probe sets selected as the least variable ones between Sun et al (Sun L, Hui AM, Su Q, et al. Cancer Cell 9:287-300, 2006) and pediatric glioblastomas (n = 33). However, the P value for the two-sample t test is less than .001, indicating significant batch effect. Because Set1 and pediatric glioblastomas have similar sample sizes and no significant batch effect, we chose Set1 as the adult glioblastoma data set for our comparisons.
We applied LIMMA/Bioconductor to identify differentially expressed genes between pediatric and adult glioblastoma. We selected 3,039 probe sets using a q value cutoff of less than 0.01 and minimum fold change of 2. To further characterize the set of genes, we mapped the genes to GeneGo MetaCore canonical gene pathway maps (Ekins S, Nikolsky Y, Bugrim A, et al. Methods Mol Biol 356:319-350, 2007) to identify the most significantly changed gene networks (Data Supplement).
More than 200 adult glioblastoma samples have been profiled by TCGA pilot project.13 From the data downloaded in February 2009 (http://tcga-data.nci.nih.gov/tcga/homepage.htm), we were able to select 14 tumors with PDGFRA amplification and 76 tumors with EGFR amplification. We applied LIMMA/Bioconductor to the downloaded log2-transformed probe set level (level 2) data from Affymetrix Human Genome HTS U133A 2.0 arrays. Differentially expressed genes between the two tumor groups were identified using a q value less than 0.01 and fold change of 2 as the cutoff. There were 693 probe sets selected (Data Supplement). Subsequently, two gene sets were generated comprising upregulated genes in PDGFRA-amplified tumors and upregulated genes in EGFR-amplified tumors.
Because EGFR is the most commonly amplified gene in adult glioblastoma and PDGFRA is the most commonly amplified gene in pediatric glioblastoma, we checked enrichment of EGFR-and PDGFRA-associated genes in adult glioblastomas (Set1) and pediatric glioblastomas using GSEA. GSEA is a computational method that determines whether an a priori–defined set of genes shows statistically significant, concordant differences between two biologic states. The EGFR- and PDGFRA-associated genes are defined as two separate gene sets, and pediatric and adult glioblastoma samples are defined as two biologic states.
We applied LIMMA/Bioconductor to identify differentially expressed genes. We used a cutoff value of P < .01 and minimum fold change of 2. To further characterize the set of genes, we performed GO analysis using DAVID Bioinformatics Resources (http://david.abcc.ncifcrf.gov/home.jsp). Upregulated and downregulated genes were evaluated separately and over-represented GO terms were selected using an Expression Analysis Systematic Explorer score cutoff of less than 0.01 (Data Supplement).
Irradiation (IR) -induced tumors contained a significantly higher frequency of 1q gains and PDGFRA amplifications compared with other HGGs. To determine whether the set of genes overexpressed in IR-induced tumors contained an overrepresentation of genes mapping to these regions, we determined the total number of transcripts selected by the analysis (n = 1,386) as well as covered by the profiling platform (n = 20,607). On the basis of Affymetrix annotation version NA27, probe sets with no chromosome location were not considered. On hundred thirty-nine (10%) of 1,386 overexpressed genes were from chromosome 1q, whereas 933 (4.5%) of 20,607 total transcripts on the platform were from chromosome 1; 15 (1.1%) of 1,386 overexpressed genes were from 4q12, whereas 48 (0.2%) of 20,607 total transcripts localized to 4q12. Therefore, transcripts overexpressed in IR-induced tumors contained a significant over-representation of transcripts from chromosome 1q (P < .001) and the region of amplification encompassing PDGFRA (P < .001; Fisher's exact test).
All statistical tests were performed in R2.8.0 (http://www.r-project.org/). All tests of association between categoric variables were performed using the Fisher's exact test; all tests were two-tailed, and P < .05 was considered statistically significant. GSEA implemented in R was used to compare coordinate expression of sets of genes between our gene expression data and published independent data sets. A nominal P < .25 was considered statistically significant for GSEA.
To determine whether any copy number imbalances were associated with long-term survivors (> 3 years, n = 5) or short-term survivors (< 1 year, n = 19) of glioblastomas arising outside of the brainstem, we evaluated the frequency of gain or loss of each chromosome arm, or focal gain or loss of PDGFRA or CDKN2A, respectively, in each of these groups. There were no significant associations (P > .1) between copy number imbalances and short- or long-term survivors, except for a higher frequency of 5p gain (two of five patients, 40%) in long-term survivors compared with short-term survivors (zero of 19 patients, 0%; P = .04, Fisher's exact test). Patients with short survival (< 1 year) and long survival (> 3 years) were also distributed among all three expression subclasses. We used LIMMA/Bioconductor to identify differentially expressed genes between these two groups. Because of the small sample size and significant heterogeneity, only one gene was identified with a false discovery rate less than 0.1.
Survival analyses were used to investigate the correlation between survival and 1q gain, 10q loss, 13q loss, 14q loss, PDGFRA amplification, or CDKN2A deletion among all HGGs or, for a more similar group of tumors, among all glioblastomas excluding IR-induced cases. The statistical tests for these comparisons were exploratory, and P values were not adjusted for multiple testing. The log-rank test shows that 1q gain is marginally related with poor survival in HGG patients (n = 69, P = .08), and the test becomes significant in the glioblastoma subgroup of patients (n = 44, P = .04). Further analysis was attempted to test whether the association of 1q gain with poor survival was independent of surgical resection or adjuvant therapy. However, the number of samples was unevenly distributed among subgroups when either of these two factors was included, precluding a meaningful survival analysis. Therefore, we could not conclude whether the association of 1q gain with poor survival was independent of surgical resection or adjuvant therapy.
See accompanying article on page 3054
Supported by grants from the Children's Brain Tumor Foundation (S.J.B.), Grant No. P01 CA096832 from the National Institutes of Health, The Sydney Schlobohm Chair of Research from the National Brain Tumor Society, the Ryan McGee Foundation, Musicians Against Childhood Cancer, the Noyes Brain Tumor Foundation, and American Lebanese Syrian Associated Charities. The Children's Brain Tumour Research Centre is supported by the Samantha Dickson Brain Tumour Trust, Air and Ground, The Connie and Albert Taylor Trust, and the Joe Foote Foundation. The Children's Cancer and Leukemia Group Tumour Bank is supported by Cancer Research UK. The Institute for Cancer Research is supported by Cancer Research UK and National Health Service funding to the National Institute for Health Research Biomedical Research Centre.
Presented in part at the Inaugural Pediatric Neuro-Oncology Basic and Translational Research Conference, Asheville, NC, October 2, 2009, and the American Association for Cancer Research Genetics and Biology of Brain Cancer Meeting, San Diego, CA, December 13-15, 2009.
Authors' disclosures of potential conflicts of interest and author contributions are found at the end of this article.
The author(s) indicated no potential conflicts of interest.
Conception and design: Barbara S. Paugh, Chunxu Qu, Zhaoli Liu, David W. Ellison, Richard G. Grundy, Suzanne J. Baker
Financial support: Amar Gajjar, Richard G. Grundy, Suzanne J. Baker
Administrative support: Amar Gajjar
Provision of study materials or patients: Chris Jones, Darren Hargrave, Amar Gajjar, Alberto Broniscer, David W. Ellison, Richard G. Grundy
Collection and assembly of data: Barbara S. Paugh, Zhaoli Liu, Martyna Adamowicz-Brice, Junyuan Zhang, Dorine A. Bax, Darren Hargrave, James Lowe, Alberto Broniscer, David W. Ellison, Richard G. Grundy
Data analysis and interpretation: Barbara S. Paugh, Chunxu Qu, Chris Jones, Zhaoli Liu, Martyna Adamowicz-Brice, Dorine A. Bax, Beth Coyle, Jennifer Barrow, James Lowe, Wei Zhao, David W. Ellison, Richard G. Grundy, Suzanne J. Baker
Manuscript writing: Barbara S. Paugh, Chunxu Qu, Chris Jones, David W. Ellison, Richard G. Grundy, Suzanne J. Baker
Final approval of manuscript: Barbara S. Paugh, Chunxu Qu, Chris Jones, Zhaoli Liu, Martyna Adamowicz-Brice, Junyuan Zhang, Dorine A. Bax, Beth Coyle, Jennifer Barrow, Darren Hargrave, James Lowe, Amar Gajjar, Wei Zhao, Alberto Broniscer, David W. Ellison, Richard G. Grundy, Suzanne J. Baker