Black patients with neuroblastoma have a higher prevalence of high-risk disease and worse outcome than white patients. We sought to investigate the relationship between genetic variation and the disparities in survival observed in neuroblastoma.
The analytic cohort was composed of 2709 patients. Principal components were used to assign patients to genomic ethnic clusters for survival analyses. Locus-specific ancestry was calculated for use in association analysis. The shorter spans of linkage disequilibrium in African populations may facilitate the fine mapping of causal variants in regions previously implicated by genome-wide association studies conducted primarily in patients of European descent. Thus, we evaluated 13 single nucleotide polymorphisms known to be associated with susceptibility to high-risk neuroblastoma from genome-wide association studies and all variants with highly divergent allele frequencies in reference African and European populations near the known susceptibility loci. All statistical tests were two-sided.
African genomic ancestry was associated with high-risk neuroblastoma (P = .007) and lower event-free survival (P = .04, hazard ratio = 1.4, 95% confidence interval = 1.05 to 1.80). rs1033069 within SPAG16 (sperm associated antigen 16) was determined to have a higher risk allele frequency in the African reference population and a statistically significant association with high-risk disease in patients of European and African ancestry (P = 6.42 × 10⁻⁵, false discovery rate < 0.0015) in the overall cohort. Multivariable analysis using an additive model demonstrated that the SPAG16 single nucleotide polymorphism contributes to the observed ethnic disparities in high-risk disease and survival.
Our study demonstrates that common genetic variation influences neuroblastoma phenotype and contributes to the ethnic disparities in survival observed and illustrates the value of trans-population mapping.
Neuroblastoma is the most common extracranial solid malignancy in children and is notable for its clinical heterogeneity. Clinical features, including age at diagnosis and stage of disease, and genetic tumor markers are strongly predictive of outcome and used to define risk groups for treatment stratification (1). High-risk neuroblastoma is defined largely by older patients (aged >18 months) with metastatic disease, although all patients with amplification of the MYCN oncogene, regardless of stage, are considered high risk. Ethnic disparities in outcome are well described for many cancers (2–5). However, until recently, little was known about associations between race/ethnicity and survival in neuroblastoma. In 2011, Henderson and colleagues analyzed a cohort of 3539 neuroblastoma patients with known self-reported race and outcome and demonstrated for the first time that black children have an increased prevalence of high-risk disease and inferior survival compared with white children (6).
Multiple factors are known to contribute to racial health disparities, including differences in access to healthcare and other environmental and socioeconomic factors. However, because low-risk neuroblastomas only rarely progress to high-risk disease over time, we hypothesized that genetic variation is largely responsible for the ethnic disparities in tumor phenotype and outcome observed in neuroblastoma. Indeed, germline DNA variants associated with susceptibility to high- or low-risk neuroblastoma have been identified in genome-wide studies conducted primarily in children of European descent (7–12). To investigate the role African genomic ancestry plays in determining neuroblastoma phenotype and to identify genetic variants that statistically account for the ethnic disparities in outcome, we analyzed the genotypes of an ethnically admixed population of neuroblastoma patients (N = 2709) accrued through Children’s Oncology Group neuroblastoma biology protocol ANBL00B1 between 2001 and 2009.
After institutional review board approval from participating institutions and informed consent were obtained, children diagnosed with neuroblastoma, ganglioneuroblastoma, or ganglioneuroma (maturing type) were enrolled in Children’s Oncology Group ANBL00B1 (NCT00904241) between 2001 and 2009. Those patients with available genotype and outcome data formed the analytic cohort (Table 1). Methods to confirm the diagnosis, assignment of stage, analysis of tumor biology (ploidy, MYCN amplification and histology), and assignment of self-reported race have been previously described (6).
DNA samples from patients enrolled in Children’s Oncology Group ANBL00B1 were genotyped using three Illumina platforms (550v1, 550v3, and Human610-Quad; Illumina, Inc., San Diego, CA), as previously described (7–9). Included in this analysis are those single nucleotide polymorphisms (SNPs) that were genotyped on all three platforms, had call rates greater than 95%, and had minor allele frequencies greater than 5%.
Supplementary Figure 1 (available online) is a diagram that illustrates the sample-based quality control pipeline, as previously described (13). Briefly, extensive quality control analyses were performed on the genotype data, including detection of sex incompatibilities, mis-specified relationships, and duplications.
Call rates were estimated by individual and by SNP to determine average heterozygosity (across all SNPs) for each sample and to evaluate genotype distributions for each SNP in relation to expected Hardy–Weinberg equilibrium (7–9,12).
Principal component analysis was conducted on the genotypes of 2919 individuals—2709 children with neuroblastoma plus 210 reference HapMap (14) samples from descendants of Northern Europeans (Utah residents with ancestry from Europe, CEU, n = 60), West Africans (Yoruba in Ibadan, Nigeria, YRI, n = 60) and East Asians (Han Chinese in Beijing, CHB, n = 45; Japanese in Tokyo, JPT, n = 45)—using EIGENSTRAT (15). The first principal component (PC1) separated patients with African ancestry from other patients. Principal component 2 (PC2) separated patients with Asian ancestry and patients with European ancestry. These leading principal components (PCs) provide the relative contribution of (geographically defined) reference populations.
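As a rough illustration of how a leading principal component can place admixed samples along an ancestry axis between two reference groups, the following minimal sketch runs a principal component analysis on a small simulated genotype matrix. It is not EIGENSTRAT: the function name `genotype_pca`, the simulated allele frequencies, and the sample sizes are all assumptions made for illustration.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Return per-sample scores on the top principal components.

    genotypes: (n_samples, n_snps) array of allele counts coded 0, 1, 2.
    """
    g = np.asarray(genotypes, dtype=float)
    # Centre each SNP and scale by its standard deviation; drop monomorphic SNPs.
    sd = g.std(axis=0)
    keep = sd > 0
    z = (g[:, keep] - g[:, keep].mean(axis=0)) / sd[keep]
    # SVD of the standardized matrix: u * s gives the sample scores on each PC.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Simulated data (hypothetical): two reference groups with divergent allele
# frequencies, plus admixed samples drawn from the average of the two
# frequency vectors (a deliberate simplification of real admixture).
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.05, 0.95, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.3, n_snps), 0.02, 0.98)
group_a = rng.binomial(2, freq_a, size=(40, n_snps))
group_b = rng.binomial(2, freq_b, size=(40, n_snps))
admixed = rng.binomial(2, 0.5 * freq_a + 0.5 * freq_b, size=(20, n_snps))

scores = genotype_pca(np.vstack([group_a, group_b, admixed]))
pc1 = scores[:, 0]
# Mean PC1 score per group; the admixed mean should fall between the other two.
print(pc1[:40].mean(), pc1[80:].mean(), pc1[40:80].mean())
```

The sign of a principal component is arbitrary, so only the relative ordering of the groups along PC1 is meaningful.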
In addition to principal component analysis, we used a model-based approach for ancestry quantification, as implemented in ADMIXTURE (17), assuming three founder populations. This approach simultaneously yields estimates of ancestry proportions and population allele frequencies in the samples. The latter enabled us to calculate the extent of divergence between the estimated ancestral populations through Wright's fixation index (F_ST). We calculated the Spearman correlation between PC1 and the proportion of African ancestry derived from ADMIXTURE to test the robustness of our ancestry estimation.
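To make the fixation index concrete, here is a minimal sketch of the textbook two-population form for a single biallelic SNP, F_ST = var(p) / (p̄(1 − p̄)), with equal population weights. This is an assumption chosen for illustration: the estimator applied to the ADMIXTURE output may differ, and the allele frequencies below are hypothetical rather than values from the study.

```python
def fst_two_populations(p1, p2):
    """Fixation index for one biallelic SNP from two population allele frequencies.

    Uses F_ST = var(p) / (p_bar * (1 - p_bar)) with equal population weights.
    """
    p_bar = (p1 + p2) / 2.0
    if p_bar in (0.0, 1.0):
        return 0.0  # allele absent or fixed in both populations: no divergence
    var_p = ((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2) / 2.0
    return var_p / (p_bar * (1.0 - p_bar))

# Hypothetical allele frequencies in two reference populations.
print(round(fst_two_populations(0.10, 0.80), 3))  # 0.495
print(round(fst_two_populations(0.50, 0.50), 3))  # 0.0
```

Under this simple form, a SNP with identical frequencies in both populations scores 0 and a SNP fixed for different alleles scores 1.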
PC1, which separated the CEU and YRI populations, was tested for association with neuroblastoma risk group using logistic regression and with event-free survival using a Cox regression model. The proportional hazards assumption of the Cox regression was validated using cox.zph in the R survival package (18). Furthermore, Kaplan–Meier event-free survival analyses were performed on the genomic ethnic clusters derived from the k-means clustering algorithm, and curves were compared using a log-rank test. Time to event was calculated as the time from diagnosis until the first occurrence of neuroblastoma relapse, progression, or death. Patients without an event were censored at the time of last contact.
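The time-to-event definition above maps directly onto the Kaplan–Meier product-limit estimator. The sketch below is a minimal implementation under that definition, not the code used in the study; the function name `kaplan_meier` and the six follow-up times are hypothetical.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of event-free survival.

    times:  time from diagnosis to first event or to last contact.
    events: 1 if relapse, progression, or death occurred; 0 if censored.
    Returns (event_times, survival) evaluated at each distinct event time.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)              # under observation just before t
        d = np.sum((times == t) & (events == 1))  # events occurring at t
        s *= 1.0 - d / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Hypothetical follow-up in months for six patients (0 = censored at last contact).
t, s = kaplan_meier([5, 8, 8, 12, 20, 24], [1, 1, 0, 1, 0, 0])
print(t)               # [ 5.  8. 12.]
print(np.round(s, 3))  # [0.833 0.667 0.444]
```

A patient censored at the same time as an event (month 8 here) is counted as still at risk at that time, which is the usual convention.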
Contrasting association results in samples across major geographic regions at known susceptibility loci may allow us to distinguish causal variants and identify variants that account for the observed ethnic disparities. Although the challenges of conducting genetic association studies in African populations are well recognized, the shorter spans of linkage disequilibrium (LD) and the greater levels of haplotype diversity in African populations may facilitate the fine mapping of causal variants responsible for the genome-wide associations first identified in samples of European ancestry (19).
Thus, variants within 2 Mb of all the replicated susceptibility loci on chromosomes 2, 6, and 11 [identified in all previous genome-wide association studies conducted primarily in children of European descent (7–9)] that show high divergence between the ancestral populations (ie, European and African), as quantified by Wright's fixation index (F_ST > 0.25), were prioritized for further analysis. These variants were tested through logistic regression analyses, with the proportion of African ancestry as a covariate, for association with high-risk neuroblastoma in a case-only study (high risk vs non–high risk). This association analysis was restricted to patients in genomic ethnic clusters 1, 2, and 3 (along the European–African ancestry axis; n = 2368 patients). Applying a Bonferroni adjustment for multiple testing, the level of statistical significance was set at P less than or equal to .0002. False discovery rate was computed using Storey’s Q value package (20).
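The case-only test described above can be sketched as a logistic regression of high-risk status on the number of risk alleles (additive coding 0, 1, 2) with ancestry proportion as a covariate. The example below is a minimal sketch on simulated data: the Newton-Raphson fitter `fit_logistic`, the effect sizes, and the allele frequencies are all assumptions for illustration and do not reproduce the study's software or results.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson.

    Returns (coefficients, standard errors); the intercept comes first.
    """
    X = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])       # observed information matrix
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))

# Simulated cohort (all values hypothetical).
rng = np.random.default_rng(1)
n = 2000
ancestry = rng.uniform(0.0, 1.0, n)              # proportion of African ancestry
risk_freq = 0.3 + 0.4 * ancestry                 # risk allele commoner with more African ancestry
genotype = rng.binomial(2, risk_freq)            # additive coding: 0, 1, or 2 risk alleles
logit = -1.0 + 0.4 * genotype                    # true per-allele log odds ratio of 0.4
high_risk = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

beta, se = fit_logistic(np.column_stack([genotype, ancestry]), high_risk)
z = beta / se
print("per-allele log odds ratio:", beta[1], "z:", z[1])
```

Including ancestry as a covariate means the genotype coefficient reflects association at a given ancestry proportion rather than a frequency difference between populations.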
To test whether our SNP associations were confounded by population stratification, we conducted a subgroup association analysis using only patients with genomic European ancestry. All statistical tests were two-sided.
In the principal component analysis, the European (CEU), African (YRI), and Asian (JPT+CHB) reference HapMap populations (14) clustered tightly when the first two PCs were plotted on an ancestry map. The samples of European and African descent in the study appeared along the “ancestry axis” between the CEU and YRI clusters. PC1 was associated with neuroblastoma risk group (P = .007), with African ancestry associated with high-risk disease. PC1 was also associated with outcome, with African ancestry associated with worse event-free survival (P = .04, hazard ratio = 1.4, 95% confidence interval = 1.05 to 1.80) in a Cox proportional hazards regression model. Adjustment for risk group in a Cox regression multivariable analysis abrogated the inferior outcome associated with PC1 (P = .38).
A k-means algorithm (16) was used to classify the patients into five genomic ethnic clusters for Kaplan–Meier survival analysis. As shown in Figure 1, event-free survival for clusters 1 (African–YRI genomic cluster) and 2 (African American genomic cluster), which consist of patients with the highest proportion of African ancestry, was statistically significantly worse than for the other ethnic clusters (log-rank P = .03).
In addition to principal component analysis, we used a model-based approach for ancestry estimation, as implemented in ADMIXTURE (17). Estimates of ancestry from both the ADMIXTURE and EIGENSTRAT methods showed a high degree of concordance (Spearman correlation = 0.997), which indicates the robustness of our estimates of ancestry. As expected from the near-perfect correlation, the estimate of the proportion of African ancestry from either method was associated with high-risk disease.
We evaluated risk allele frequencies of 13 known susceptibility SNPs in LINC00340 (7), LMO1 (8), and BARD1 (9) from previous genome-wide case–control studies using the HapMap CEU and YRI genotype data. Of the 13 candidate SNPs, six exhibited high population differentiation (F_ST > 0.25). All but one (rs17487792, an intronic SNP within the BARD1 locus) of these six SNPs showed higher risk allele frequencies in the reference African population than in the reference European population (Table 2).
To identify potential genetic-based mechanisms for the observed racial disparities in risk group classification and survival, we used a population genetics method applied to high-risk vs non–high-risk samples that cluster along the European–African ancestry axis (the European–CEU, African–YRI, and African American clusters from the k-means analysis; Figure 1), and prioritized variants near the known neuroblastoma susceptibility loci (7–9) with high population divergence (F_ST > 0.25). We tested each such SNP (n = 245 polymorphisms) near the known neuroblastoma susceptibility loci for association with high-risk phenotype using logistic regression with the proportion of African ancestry estimated by ADMIXTURE as a covariate. Table 3 lists the statistically significant associations from this analysis (P < .0002, Bonferroni method for multiple testing). The previously reported neuroblastoma susceptibility loci within BARD1 (rs17487792) and on chromosome 6p22 (rs6939340, rs9295536, and rs4712653) were found to be statistically significantly associated with high-risk disease (7,9). Of note, the previously identified neuroblastoma susceptibility SNPs within LMO1 (8) did not meet the statistical significance threshold in our analysis. Because the chromosome 6p22 SNPs do not lie in a protein-coding gene, they were excluded from further analyses and are the subject of ongoing investigation.
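The significance threshold follows from dividing a family-wise error rate by the number of SNPs tested. A short check, assuming the conventional family-wise rate of 0.05 (the rate itself is an assumption; the text states only the resulting threshold):

```python
n_tests = 245   # SNPs tested near the known susceptibility loci
alpha = 0.05    # assumed family-wise error rate
threshold = alpha / n_tests
print(f"{threshold:.6f}")  # 0.000204

p_rs1033069 = 6.42e-5      # P value reported in this article for rs1033069
print(p_rs1033069 <= threshold)  # True
```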
A novel SPAG16 (sperm associated antigen 16) SNP (rs1033069) (LD r² = 0.03 with rs17487792 in CEU) was determined to be statistically significantly associated with high-risk neuroblastoma (P = 6.42 × 10⁻⁵, false discovery rate < 0.0015). Importantly, this SNP shows a substantially higher risk allele frequency in the reference African vs European population. The risk allele frequency for the SPAG16 SNP (rs1033069) is 0.34 in CEU and 0.76 in YRI. Further, a higher ancestral risk allele frequency in sub–Saharan Africa relative to Europe, Asia, and the Americas is observed in the geographic distribution of the rs1033069 allele frequencies from the Human Genome Diversity Panel populations (Figure 2) (21–24). In contrast, the risk allele for the known susceptibility BARD1 SNP (rs17487792), located 878 kb from the SPAG16 SNP (rs1033069), is nearly absent in sub–Saharan Africa. Indeed, the risk allele frequency for this BARD1 SNP in the HapMap YRI population is 0.008. Thus, this genetic variant does not explain the higher prevalence of high-risk disease in children with African vs European ancestry. A nearby SNP in perfect LD with rs1033069 in both CEU and YRI, rs1033067, showed a similar level of statistical significance.
Log-rank comparison of event-free survival by SPAG16 SNP rs1033069 genotype (Figure 3) demonstrated that the number of risk alleles is statistically significantly associated with event-free survival (P = .007). This finding is consistent with the earlier survival analysis on the genomic ethnic clusters (Figure 1, B), which suggests that risk allele frequency differences at rs1033069 may be driving the observed disparities between the genomic ethnic clusters.
The SPAG16 SNP was also associated with high-risk disease and survival in a subgroup analysis using only patients with genomic European ancestry or patients with self-reported European ancestry, indicating that the SNP association was not confounded by population stratification (Supplementary Table 1, available online). The association between the SPAG16 SNP and high-risk disease/event-free survival in patients of European ancestry (n = 2068) becomes more statistically significant in the combined European and African sample set (n = 2368).
Because the risk allele for the SPAG16 SNP (rs1033069) is common (minor allele frequency > 0.05) in groups with African ancestry, we hypothesized that additional polymorphisms in LD with the SNP may be identified using the 1000 Genomes Project dataset (see Supplementary Methods, available online) (25). In particular, we were interested in an expanded list of variants at this locus that exhibited large frequency differences between Africans and Europeans. Supplementary Table 2 (available online) summarizes the results of this analysis with annotation of effect on DNA regulatory motifs and chromatin state when applicable (26,27).
We also evaluated SPAG16 mRNA abundance in lymphoblastoid cell lines derived from descendants of European ancestry (CEU) and African ancestry (YRI) (see Supplementary Methods, available online) and found higher levels of expression in the YRI cell lines (P = .04) (Supplementary Figure 2, available online).
The ADMIXTURE and EIGENSTRAT methods (see Patients and Methods) measure global ancestry across the genome and do not estimate ancestry at a given locus. Indeed, individuals with similar global ancestry profiles or similar principal components may still have distinct local ancestry patterns (28). To perform local ancestry inference in populations formed by two-way admixture, we employed HAPMIX (see Supplementary Methods, available online) (29). We inferred the number of copies of each ancestry at each examined locus, which was then used in the association analysis as a covariate. Supplementary Table 3 (available online) shows the robustness of the associations (P < .0002, Bonferroni adjustment) to the use of locus-specific ancestry.
Conditional analyses and a likelihood ratio approach (see Supplementary Methods, available online) show that the SPAG16 SNP rs1033069 accounts for the ethnic disparities in phenotype (Supplementary Table 4, available online). Adjustment for rs1033069 substantially reduced the statistical significance of the racial disparities in the prevalence of high-risk phenotype (P = .12 after conditioning on rs1033069). In contrast, conditional analyses with the nearby BARD1 SNP (rs17487792) did not statistically significantly alter the association between African ancestry and high-risk disease. Similarly, in a Cox proportional hazards regression model, the addition of the SPAG16 SNP rs1033069 substantially reduced the statistical significance of the racial disparities associated with event-free survival (P = .11 after conditioning on rs1033069). Finally, the addition of rs1033069 to African genomic ancestry (PC1) fits the data statistically significantly better (P = 6.4 × 10⁻⁵) in predicting high-risk disease, but the addition of PC1 to rs1033069 does not improve model fit (P = .12).
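The model comparison in the last sentence is a likelihood ratio test between nested models that differ by one term. A minimal sketch of that calculation, assuming a single added parameter (1 degree of freedom) and using hypothetical log-likelihoods rather than values from the fitted models:

```python
import math

def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for adding one parameter to a nested model.

    Returns (test statistic, P value) using a chi-square reference
    distribution with 1 degree of freedom.
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For 1 df, the chi-square survival function equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log-likelihoods: ancestry only vs ancestry plus one SNP.
stat, p = likelihood_ratio_test_1df(-1200.0, -1192.0)
print(stat)         # 16.0
print(f"{p:.2e}")   # 6.33e-05
```

Running the test in both directions (adding the SNP to the ancestry model, then adding ancestry to the SNP model) is what allows the asymmetry described above to be seen.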
Recently, racial disparities in outcome in neuroblastoma have been reported. A higher proportion of patients with high-risk disease was seen in self-reported black vs white populations, and in concordance with this disease presentation, black children had statistically significantly worse survival (6). We hypothesized that genetic variants in the genomic regions at known low-risk (11) or high-risk (7–9) neuroblastoma susceptibility loci (identified by previous genome-wide association study analyses performed primarily with patients of European descent) may statistically account for the observed racial disparities. To test this hypothesis, we evaluated the genotypes in an admixed population of children with neuroblastoma. The shorter spans of LD in African populations should facilitate the fine mapping of causal loci in previously implicated regions. We found that African genomic ancestry was statistically significantly associated with high-risk neuroblastoma phenotype and lower event-free survival. In a multivariable analysis that accounted for diagnostic risk group, the association between African ancestry and event-free survival no longer reached statistical significance, suggesting that ethnic disparities in survival are driven by ethnic disparities in risk group stratification at diagnosis. We evaluated the variants near the known neuroblastoma susceptibility SNPs with high population differentiation between the reference African and European populations. Five SNPs (three in LMO1 and two in LINC00340) known to be associated with susceptibility to high-risk neuroblastoma were found to have high population divergence, with higher risk allele frequencies in the reference African population. We expanded our association analysis using a novel case-only design and were able to replicate previous high-risk susceptibility variants within BARD1 and LINC00340 and find additional variants within SPAG16 associated with high-risk neuroblastoma.
In conditional analyses, the SPAG16 SNP, and not the nearby BARD1 SNP, abrogated the racial disparities in risk-group and survival, which indicates that this variant may contribute to the higher prevalence of high-risk disease and inferior survival observed in children with African ancestry.
Neuroblastoma offers a unique opportunity to dissect genetic and environmental factors involved in racial disparities in risk and survival. In the United States, racial and ethnic differences in socioeconomic standing contribute to poorer access to care, delays in diagnosis, and poorer overall health outcomes (30). However, for children with neuroblastoma, diagnostic delays have limited influence on the tumor phenotype and ultimate risk-group classification. Observation studies have shown that it is unlikely for low-risk, favorable-biology tumors to progress to metastatic, high-risk tumors over time (31). Furthermore, MYCN status in neuroblastoma tumors, a powerful genetic prognostic factor, does not change over time (32,33). In addition, medication compliance problems are reduced in neuroblastoma because the majority of treatment (surgery, chemotherapy, radiation, immunotherapy) is delivered in the inpatient setting. Taken together, these observations suggest that socioeconomic and environmental factors are likely to play a more limited role in contributing to racial disparities in neuroblastoma than in other cancers.
In contrast to previous case–control genome-wide association studies, our study was a case-only design with the goal of localizing risk variants relevant to the development of high-risk neuroblastoma. Our analysis recapitulated several findings from earlier studies of neuroblastoma susceptibility, including SNPs within BARD1 and on chromosome 6p22, but also identified novel high-risk neuroblastoma-associated SNPs at these loci in both the combined population and in samples of European descent.
These novel results implicate variants within the ubiquitously expressed SPAG16 (34) as important in the acquisition of high-risk neuroblastoma. This study highlights the importance of expanding genetic association studies to non-European populations to enable identification of novel disease variants.
One limitation of our study is a lack of a replication cohort. Given the rarity of neuroblastoma and the relatively small number of patients of African ancestry genotyped in this study, we felt that inclusion of all patients was warranted to maximize power to detect associations.
Our findings emphasize the key role genomic variation may play in predicting outcome in children with neuroblastoma. Efforts to understand the underlying biology of these variants associated with risk group classification and survival are underway and may ultimately lead to the identification of more effective treatment strategies.
This research was funded by the National Institutes of Health (U01 GM61393 [to MED and NJC], R01 MH090937 [to MED], U01 HG005773 [to NJC], R01 MH090937 [to NJC], and R01 CA078545 [to JMM]); Alex’s Lemonade Stand (to SLC); Children’s Neuroblastoma Cancer Foundation (to SLC); Elise Anderson Fund (to SLC); Neuroblastoma Children’s Cancer Society (to SLC); Little Heroes Cancer Research Foundation (to SLC); St. Baldrick’s Foundation (to NP); and Cancer Research Foundation (to NP).
E. R. Gamazon and N. Pinto contributed equally to this work.
Authors had full responsibility for the design of the study, the collection of the data, the analysis and interpretation of the data, the decision to submit the manuscript for publication, and the writing of the manuscript.
The authors have no conflict of interest to declare with respect to this manuscript.