|Home | About | Journals | Submit | Contact Us | Français|
NJD – carried out data analysis and organization, designed the algorithms, identified the 6-gene signature and wrote the paper. BAW – carried out lab work and analysis, co-designed the algorithms and wrote the paper. PEL, DCJ, MWJ and KB – carried out lab work, assisted in the design and wrote the paper. FMR – organised and correlated FISH and conventional cytogenetics on samples used in this analysis. AZ, JLB, WMG and FED - assisted in the design and edited the manuscript. GJM – obtained funding, managed the project, assisted in the design and wrote the paper.
Myeloma is a clonal malignancy of plasma cells. Poor prognosis risk is currently identified by clinical and cytogenetic features. However, these indicators do not capture all prognostic information. Gene expression analysis can be used to identify poor prognosis patients and this can be improved by combination with information about DNA level changes.
Using SNP-based gene mapping in combination with global gene expression analysis we have identified homozygous deletions in genes and networks that are relevant to myeloma pathogenesis and outcome.
We identified 170 genes with homozygous deletions and corresponding loss of expression. Deletion within the “Cell Death” network was over-represented and cases with these deletions have impaired overall survival. From further analysis of these events, we have generated an expression-based signature associated with shorter survival in 258 patients and confirmed this signature in data from 2 independent groups totalling 800 patients. We defined a gene expression signature of 97 cell death genes that reflects prognosis confirmed this in two independent data sets.
We developed a simple 6-gene expression signature from the 97-gene signature that can be used to identify poor prognosis myeloma in the clinical environment. The signature can form the basis of future trials aimed at improving the outcome of poor prognosis myeloma.
Multiple myeloma is an incurable clonal plasma cell malignancy that accounts for around 10% of all haematological cancers(1). While there have been significant improvements in the treatment of myeloma, there remains a clinical need to identify patients with a poor prognosis for whom alternate treatment strategies can be explored. The International Staging System (ISS) can achieve this for groups of patients and is based on clinical factors that are surrogates for disease biology. The integration of chromosomal analysis into the assessment of patients offers an additional strategy which has the potential to group patients into biologically relevant prognostically important groups(2-5). A number of chromosomal abnormalities have been identified, which can be used in this fashion, including a group of poor prognostic IgH translocations comprising t(4;14), t(14;20) and t(14;16) together the other poor prognostic markers: del(1p), gain(1q) and del(17p). Deletion of chromosome 13 has also been considered important prognostically when detected by cytogenetics. However, within each of the groups there is variability in clinical outcome, such that on their own they lack specificity in capturing all poor prognosis cases. More recently global gene expression analysis has been used for risk stratification and it is postulated that combining this information with cytogenetic changes associated with disease progression will improve prognostic grouping. Using high resolution array technologies, increasingly small regions of copy number alteration can be identified. We and others have used these technologies to characterize regions relevant to the pathogenesis of myeloma(2, 4, 6, 7). Based on the initial results of these studies we reasoned that identification of the full range of genes inactivated by the loss of genetic material during disease pathogenesis could be used as a means to identify expression signatures and specific genes with prognostic significance. Homozygous deletions (HZD) are particularly relevant in this respect as, by definition, they contain genes that are inactivated on both alleles(8, 9).
In this work, we used high density SNP arrays (median 2.5 kb resolution) to identify the location and frequency of HZD in samples collected from presenting patients with myeloma in a randomized clinical trial. In combination with global gene expression data the number of target genes was reduced with the aim of identifying key pathologically-relevant signatures and networks dysregulated in myeloma pathogenesis. Deletions affecting genes in the cell death network were able to identify a group with poor prognosis myeloma, i.e. shorter progression-free survival (PFS) and overall survival (OS), and we were able to use these data derived at a DNA-based level to define a more generally applicable cell death expression signature. The prognostic value of this expression signature was further validated in two additional large trial data sets that utilised different treatment regimens. In order to translate the signature into a readily applicable test the signature was developed such that, by using the expression ratio of only three pairs of genes, it is possible to identify cases with poor prognosis myeloma with high specificity.
Bone marrow aspirates were obtained from newly diagnosed patients with multiple myeloma entered into the MRC Myeloma IX study, after informed consent. The trial recruited 1966 patients and comprised two arms, the first for older and less fit patients and the second for younger fitter patients. All younger patients received autologous transplantation following induction with cyclophosphamide, thalidomide and dexamethasone (CTD) or cyclophosphamide, vincristine, adriamycin and dexamethasone (CVAD). Older patients were treated with either attenuated CTD (CTDa) or melphalan, prednisolone (MP). All patients were then randomized to thalidomide maintenance or no thalidomide maintenance. Patients in the analysis presented here were representative of the trial in general (Supplementary Table 1). The trial was approved by the MRC Leukaemia Data Monitoring and Ethics committee (MREC 02/8/95, ISRCTN68454111).
DNA and RNA were prepared for hybridization to the GeneChip Mapping 500K Array set and the U133 Plus 2.0 expression GeneChip, respectively (Affymetrix, Santa Clara, CA) according to manufacturers instructions, and have previously been described(4, 6, 10). There were 114 samples for which good quality DNA samples were available (of which 84 had matched non-tumor DNA) and 258 samples for which good quality RNA samples were available. All DNA samples had corresponding RNA samples. The samples with microarrays were selected randomly based only on the availability of sample. Analysis of mapping array data was performed as previously described using GCOS, GTYPE, dChip and CNAG(6). The expression levels were generated with dChip, using the default perfect match/mismatch calculations and median normalization. The microarray data have been deposited in NCBI’s Gene Expression Omnibus(11) and are accessible through GEO Series accession number GSE15695. All 258 expression samples were used to derive both the 97-gene and 6-gene signatures.
SNP genotypes and copy numbers were obtained using Affymetrix GCOS software (version 1.4.0) to obtain raw feature intensities which were then processed using Affymetrix software to derive SNP genotypes. The paired tumour-control copy number data were inferred using dChip(12). The dChip inferred copy number was used to identify HZD. These HZD were further filtered by integration with expression array data to identify those HZD that ablate expression of the genes with deletions (see Supplementary Methods).
Cancer associated changes at a DNA level are known to be associated with both pathologically and prognostically important effects and so can be used to define prognostically relevant gene expression signatures. In order to do this, we identified genes that were differentially expressed (p<0.05) between samples with HZD at the DNA level in the cell death network and those lacking such deletions. This analysis gave a list of genes that was subsequently used to perform hierarchical clustering of all of the 258 samples for which gene expression information was available. This list of genes was then used to derive a 6-gene test using a filtering process based on Cox regression, the independence and significance of expression ratios and specificity (Supplementary Methods).
Some networks of genes are well described and databases such as KEGG(13) and Biocarta4 provide well-curated sources of information. However, automatically identifying all biological relationships between genes is an intensive task and detailed manual annotation is not possible for large lists of genes. Ontologies such as the Gene Ontology project (GO)(14) and associated tools provide one method by which we can automatically identify a larger set of relationships between genes. GO terms are a hierarchical structure of semi-automatic annotations of genes. Broad annotation categories are sub-divided into informational categories and as the GO tree is descended they become more descriptive. However, at the more detailed levels there are fewer members of the annotation groups. The GO term “Cell Death” (GO term GO:008219) is a level 5 Biological Process, it includes both cytolysis and programmed cell death. Programmed cell death referred to by GO is an umbrella term that includes both apoptotic and non-apoptotic mechanisms. We will refer to this as the cell death network.
Kaplan-Meier survival curves were generated using R, Bioconductor and the survival package(15). The difference between curves was tested using the Logrank test, using R. The impact of the prognostic factors β-2-microglobulin (β2M), age, t(4;14), t(11;14), hyperdiploidy and del(13) was analyzed using univariate Cox regression. The independence of these factors and the prognostic signature was compared using multivariate Cox regression in R (Supplementary Methods).
HZD by definition inactivate the genes contained within them and have been shown to affect genes important in tumour progression and clinical outcome(16-18). Using the smoothing algorithm within dChip single examples of HZD, of at least 100 kb, were identified in 114 presenting cases with myeloma. These included FAM46C (1p), TSPYL4 (6q), PARK2 (6q), TLR4 (9q), RB1 (13q), WWOX (16q), CDH1 (16q), keratin locus (17q), GSK3A and neighboring genes (19q), UTX (Xp), CNKSR2 (Xp), and HDHD1A (Xp). Frequently occurring homozygous deletions were located at 1p32.3 (FAF1/CDKN2C), 11q (BIRC2 and BIRC3), 14q (TRAF3 and AMN), and 16q (CYLD). The majority of these genes have relevance to myeloma biology, i.e. CDKN2C and RB1 (cell-cycle regulation), TRAF3, BIRC2, BIRC3 and CYLD (NF-κB regulation)(6, 19, 20), WWOX (apoptosis)(6, 21), GSK3A (Wnt signaling)(22, 23), and CDH1 (frequently methylated)(24, 25).
However, this manual approach is not efficient and a more sensitive bioinformatic approach was designed to identify these deletions in a standardized manner (for details see Supplementary Methods). In this bioinformatic approach a custom algorithm was designed to maximize the detection of deletions within and including genes. Using this algorithm on the dChip-processed mapping data from 84 newly diagnosed myeloma patients with paired peripheral blood controls (Supplementary Table 1) identified 783 genes that contained an acquired HZD in more than 1 tumour sample and of these 170 were associated with loss of expression of that gene and represent true HZD events (Figure 1A). Only 29 genes had HZD in 5% or more samples and could be considered recurrent (Figure 1B and Table 1). All of the deletions identified by the manual dChip method were included in the list generated by the custom algorithm, cross validating the two approaches.
HZD tend to occur in the genomic regions where hemizygous deletions occur: 1p, 6q, 8p, 12p, 13q, 14q, 16q, 20p and 22 (Figure 1B) however, recurrent HZD are also seen at 11q, which is not affected by LOH(16, 18). The median size of the HZD was 37.5 kb with the smallest deletion identified in more than one patient being 13 bases and the largest 14.9 Mb. We show that the frequency of HZD is variable, with 56 out of 84 cases having at least 1 HZD, (median 6 HZD per case) making it important to define those which are driver and which are passenger lesions.
A subset of the HZD were confirmed by DNA-based quantitative PCR (qPCR) in order to validate the method (Supplementary Methods). All samples tested which were expected to have homozygous deletions validated by qPCR. Of the 29 homozygously deleted genes, 14 (48%) were located on chromosome 13, which has frequent hemizygous deletion. However, these genes are still altered in the myeloma tumour clone relative to their germline by HZD and may be relevant to disease pathology. Inherited copy number variation (CNV) can impact the copy number at any locus but the copy numbers inferred by dChip take this into account by comparison to matched control genomic DNA. In the list of genes with HZD there are 20 that are frequently affected by inherited CNVR5 (Table 1), however, these are not consistently either gains or losses and the HZD are not present in the germline DNA and so are potentially relevant to myeloma pathology.
Survival analyses were carried out on the 29 genes with HZD in 5% or more cases. Of these genes, deletion of CDKN2C/FAF1 at 1p32 was found to be associated with a shorter OS (p=0.029, median OS 20 months versus 46 months). These genes are grouped together because the deletion detected spans both genes in all of our samples. There were five genes deleted in at least 10% of cases DCLK1, ATP8A2, KLF12, PCDH9 and FGF14. Of these only samples with ATP8A2 HZD had a significantly shorter OS (p=0.018), median OS of 14 months versus 46 months, Supplementary Figure 1.
GO annotations were used to interpret the identified genes and define network-specific abnormalities present within the filtered list of 170 genes. There were 43 terms at GO level 5 assigned to the list and after consideration of the annotations we tested the prognostic significance of the following GO terms: regulation of progression through cell cycle probeset (GO:0000074), negative regulation of progression through cell cycle (GO:0045786), protein transport (GO:0015031), intracellular transport (GO:0046907) and the cell death network (GO: 008219), all of which were over-represented in the annotations of genes with HZD.
Within the list of genes with HZD we identified a significant enrichment of genes defined as the cell death network by GO. The genes significantly enriched included genes important in cell cycle regulation (CDKN2C, EMP1, PLAGL1), apoptosis (CTSB, BIRC2, BIRC3, TNFRSF10B, TNFRSF10D, FAF1, FGF14, SGK), and regulation of transcription (ESR1, FOXO1, TFDP1). While there were 15 genes in this cell death network there were only 11 distinct genetic regions, this difference being due to juxtaposed pairs of genes being deleted in the same cases: CDKN2C and FAF1 on 1p, SGK and ESR1 on 6q, TNFRSF10B and TNFRSF10D on 8p, and BIRC2 and BIRC3 on 11q. Deletion affecting any of the genes within the GO defined cell death network identified 25% of all cases of myeloma. Samples with a deletion of any gene annotated as cell death (n=24) had a significantly shorter OS (p= 0.015) and PFS (p <0.001); median OS was 32 months versus 48 months, Figure 2.
We examined other prognostic events associated with HZD and show that the t(4;14), t(14;16), t(14;20), high β2M, del(16q) or del(17p) were not over-represented. Examination of a prognostic expression-based proliferation index (PI)(26) within the cell death deleted cases shows that there was an over-representation of high PI samples (p=0.019). However, removal of cases with a high PI reveals that the residual cases still have a significantly shorter OS (p=0.029) indicating that the cell death pathway is an independent prognostic factor.
In order to link alterations of the cell death network detected at a DNA level to associated changes at the level of gene expression, an analysis of differential gene expression between the samples with and those without HZD of cell death genes was carried out. To do this mapping samples were divided into those with any cell death HZD and into those without; a T-test was carried out (2-tailed, unequal variance) with a p-value significance of p<0.001 threshold (uncorrected). This analysis generated a list of 97 genes annotated as cell death by GO (Supplementary Table 2). Unsupervised hierarchical clustering (centroid-based) was then performed on the 258 samples with expression data using the 97-gene list. This analysis revealed 2 clusters with distinct expression patterns. The samples that clustered in the same branch as the majority of cell death HZD samples had a significantly shorter OS (p<0.001), median 32 versus 48 months, and shorter PFS (p=0.019). The samples in this cell death expression class were checked for an over-representation of other known factors that would affect survival and they were representative of the set as a whole.
In order to validate this signature we repeated the expression-based hierarchical clustering analysis using the 97-gene list in two additional data sets (GSE2658(27) and GSE9782(28)). The raw data for these series were not available but using the same 97-gene list and performing hierarchical clustering in the same way again produced a distinct cluster with higher expression of the same genes as identified in our data. In each of these data series the samples in the higher expressing cell death cluster also had a significantly shorter OS than the other samples: in GSE2658 median OS was 46 months in the cell death cluster and >70 months (median not reached) outwith the cell death cluster (p<0.001); in GSE9782 median OS was 10 months in the cell death cluster and 22 months outwith the cluster (p<0.001). The higher expressing genes are identified in Supplementary Table 2. The GSE2658 data set comprised 559 patients derived from patients in the total therapy programmes TT2 and TT3; the first using thalidomide and the second thalidomide plus bortezomib. The GSE9782 data set comprised relapsed patients treated with bortezomib from the CREST, SUMMIT and APEX studies including the 040 companion study.
We went on to examine the associations with the 97-gene signature and showed it to be independent of other known prognostic factors including, β2M, serum albumin, and cytogenetic factors (del(13), t(4;14), t(11;14), del(17p), del(16q)), Table 2A. This suggests that the prognostic implication of dysregulation of the cell death network is independent of these cytogenetic changes and that identifying patients with this signature may be clinically useful.
The 97-gene signature identifies 34% of Myeloma IX patients as having a poor prognosis, but as it is expression microarray based its regular use in the clinical setting is currently impractical. As a consequence, a more readily applicable 6 gene Cell Death signature that identifies a similar set of patients with poor prognosis was derived. This was achieved using a univariate and multivariate regression analysis, comparing ratios of expression of genes from within the 97-gene signature (Supplementary Methods). As a result of this analysis we identified 3 pairs of genes whose relative expression provides a robust marker of prognosis, identifying 12% of patients as poor prognosis. This approach is based on the expression ratios of BUB1B (budding uninhibited by benzimidazoles 1 homolog beta; involved in spindle checkpoint) versus HDAC3 (histone deacetylase 3; involved in transcriptional repression), CDC2 (cell division cycle 2; G1/S to G2/M transition) versus FIS1 (fission 1 homolog; mitochondrial fission) and RAD21 (rad21 homolog; double-strand DNA break repair) versus ITM2B (integral membrane protein 2B; Alzheimer’s disease) and if any one of these pairs has a ratio ≥1 then the test is positive for poor prognosis. There is a clear divide between genes potentially involved in myeloma/cancer pathogenesis (BUB1B, CDC2 and RAD21) and those which are either potential tumor suppressor genes or uninvolved in cancer pathogenesis (HDAC3, FIS1 and ITM2B). Cases identified by these six genes have a median OS of 13 months with the signature and 45 months without, they also have a shorter PFS of 11 months in the 6-gene positive cases 22 months in the negative cases in the Myeloma IX data, Figure 3A. The 6-gene signature identifies 12% of cases in the Myeloma IX dataset but an average of 20% across all of the analysed data sets and in each case the patients with the 6-gene signature had a significantly shorter prognosis (Figure 3B and 3C). The specificity of this signature for identifying cases previously identified as being poor prognosis by the 97-gene signature is100%. The 6-gene signature is not intended to replace the biologically relevant 97-gene signature; it is used to provide a more specific signature of poor prognosis that could be used as a prognostic test. Multivariate Cox regression was used to compare the 6-gene signature as a prognostic factor versus other published signatures (Table 2B) and other conventional prognostic factors (Table 2C). This showed that the 6-gene signature is independent of the other prognostic factors.
We have carried out a comprehensive analysis of HZD in myeloma using Affymetrix SNP-based mapping data. In myeloma, in contrast to lymphoid tumours, there is marked genomic instability with a particular focus on a number of defined regions where these events are concentrated and within which HZD occur(29, 30). An exception to this is 11q which is frequently amplified by chromosomal gain and where HZD are also seen. Using a standard manual approach based on dChip we have identified a number genes affected by HZD. These data provide further support for our previous findings and those of others showing that the recurrent loss of regions containing negative regulators of the NF-κB pathway is an important recurrent event in myeloma. In addition, we have also gone on to identify novel deleted genes which also affect cell death pathways consistent with the importance of the upregulation of survival pathways leading to the immortalisation of the myeloma clone. We also identify genes affecting chromatin methylation status such as UTX, which has recently been shown to be mutated in myeloma cell lines(31).
We have examined the prognostic significance of the frequently occurring HZD but because of limited numbers failed to identify relevant effects. We reasoned that rather than individual lesions being important the deregulation of pathways that are important defines outcome. Thus, in order define the prognostic impact of pathway associated changes we wished to identify all potential HZD affecting gene expression. We, therefore, designed a sensitive and standardized approach to global HZD detection, using a custom algorithm with which we identified 170 genes with HZD and loss of expression. These genes were enriched for genes involved in a cell death network. Pathway analysis showed that deregulation of the cell death pathway at the DNA level was significantly associated with impaired clinical outcome. Detecting such changes in new patients could be important to define prognosis but detecting changes at the DNA level in clinical samples is difficult. In context it is readily possible to detect changes at an RNA level in routine diagnostic laboratories.
Consequently, we postulated that if we could develop an RNA-based expression signature focussed on the cell death network, then we would be able to define a biologically relevant and prognostically important set of genes. Thus the cases with a dysregulated cell death network at a DNA level were used to identify associated expression changes within cell death genes. This analysis generated a signature consisting of 97-genes associated with poor outcome (Table 1). Genes included in this comprised important genes in both the intrinsic and extrinsic apoptotic pathway as well as within the TRAIL/TNF/NF-κB signalling and PI3k pathways. We went on to test the signature for its validity at presentation, relapse and by treatment including thalidomide or bortezomib using publicly available data. The results of this analysis show that being in the poor prognosis group, as identified using hierarchical clustering by the expression of the 97-genes, is independent of either prognostic lesion or treatment used. Currently, the ISS and the FISH-based abnormalities del(1p), del(17p) and the poor prognosis immunoglobulin translocations can be used to define prognosis but even these variables do not capture all of the clinical variability. Thus, more variables are required in order to define risk for an individual patient and expression signatures are able to address at least some of this requirement. The most widely reported signature is the 70-gene signature reported by the University of Arkansas group that they have incorporated into their treatment strategies. More recently, the IFM group have reported a signature and including ours, we have tried to understand how the three signatures compare.
In order to understand how this expression signature anchored in changes within cell death genes at the DNA level compared with other prognostically important expression signatures we applied two other published prognostic signatures (Arkansas 70-gene(32) and Intergroupe Francophone du Myélome (IFM)(33) 15-genes) to our data set using their published methods. The genes in each of the signatures are different with no overlap apart from BIRC5. Using each of the signatures to predict poor prognosis in our data set we identified a core group of 37 cases (42% of poor prognosis cases and 14% of all myelomas) identified as having a poor prognosis by all of the signatures. However, our signature identifies a distinct set of 27 poor prognosis cases. In total, we identify 89 cases with poor prognosis but the IFM 15-gene signature identifies 64 of our cases. Our signature is almost equally as sensitive to identifying poor prognosis cases as the Arkansas 70-gene signature which identifies 90 cases compared with 89 using our signature (Figure 4).
The use of global gene expression arrays and data are mostly impractical within the clinical environment and consequently we developed a more limited signature which would give similar information. This test utilised information from 3 pairs of genes suitable for qPCR analysis, removing the need to derive a set of control genes for the analysis, since each member of the pair acts as a control for the other gene. While some of the expression ratio pairs from the 97-gene list were more sensitive to identifying poor prognosis than those finally selected for the 6-gene signature, those selected were, highly specific such that patients who were not poor prognosis risk would not be incorrectly classified as such. The 6-gene signature is a significant independent variable when other prognostic factors, including ISS, del(17p) and poor-prognosis IgH translocations are taken into account, Table 2. The 6 genes can all be described as biologically relevant, but since they were specifically selected for cell death annotations they are not necessarily involved in the biology of the poor prognosis. However, they are demonstrably markers of poor prognosis and the 6-gene signature that we develop here has high specificity for the identification of patients with poor prognosis myeloma, at both presentation and relapse, who are suitable candidates for alternate treatment approaches.
This research was supported by Myeloma UK, Cancer Research UK, Leukaemia Research Fund UK, the Bud Flanagan Leukaemia Fund, the Kay Kendall Leukaemia Fund and the Royal Marsden Hospital National Institute for Health Research (NIHR) Centre.
Data and samples were collected as part of the MRC Myeloma IX phase III trial – clinical trial registration (ISRCTN68454111). FISH and cytogenetics were carried out by Leukaemia Research Fund UK Myeloma Forum Cytogenetics Group, Wessex Regional Genetics Laboratory, Salisbury, UK. Overall trial statistics were carried out by the Clinical Trials Research Unit, University of Leeds, UK.
Statement of Translational Relevance
The research discussed in this paper provides new insights into prognosis in myeloma. Poor prognosis risk is currently identified by clinical and cytogenetic features. However, these indicators do not capture all prognostic information. Gene expression analysis can be used to identify patients with poor prognosis and this can be improved by combination with information about DNA-level changes. The biological information about homozygous deletions discussed here has been integrated with gene expression and then with clinical data. The 97-gene expression signature discussed in this paper has also been further developed into a 6-gene signature that is clinically applicable. This 6-gene signature is discussed in the paper and is aimed at producing a test that could be used directly with patients. In the future, this test and others like it have the potential to be used to tailor treatment of individual myeloma patients.