Because Topoisomerase I (TOP1) is critical for the relaxation of DNA supercoils and because it is the target for the anticancer activity of camptothecins, we assessed TOP1 transcript levels in the 60 cell line panel (the NCI-60) of the National Cancer Institute's anticancer drug screen. TOP1 expression levels varied over a 5.7-fold range across the NCI-60. HCT116 colon and MCF-7 breast cancer cells were the highest expressers; SK-MEL-28 melanoma and HS578T breast carcinoma cells were the lowest. TOP1 mRNA expression was highly correlated with Top1 protein levels, indicating that TOP1 transcripts could be conveniently used to monitor Top1 protein levels and activity in tissues. Assessment of the TOP1 locus by array comparative genomic hybridization across the NCI-60 showed copy numbers ranging from 1.71 to 4.13 and a statistically significant correlation with TOP1 transcript levels (p<0.01). Further analyses of TOP1 expression on an exon-specific basis revealed that exon 1 expression was generally higher, and less variable than expression of the other exons, suggesting some form of transcriptional pausing regulation between exons one and two. Accordingly, we found the presence of multiple evolutionarily-conserved potential G-quadruplex-forming sequences in the first TOP1 intron. Physico-chemical tests for actual quadruplex formation by several of those sequences yielded quadruplex formation for two of them and duplex formation for one. The observations reported here suggest the hypothesis that there is a conserved negative transcription regulator within intron 1 of the TOP1 gene associated with a quadruplex-prone region.
Top1 catalyzes DNA strand breakage through the reversible formation of covalent bonds between an enzyme tyrosyl oxygen and a DNA phosphorus (1). The strand breakage/religation allows relaxation of DNA supercoiling to facilitate the diverse processes of replication, transcription, repair, recombination, and chromatin remodeling (1-3). Top1 is therapeutically important because Top1 inhibitors derived from the plant alkaloid camptothecin (topotecan and irinotecan) are routinely used to treat colon, ovarian and lung cancers in adults, and neuroblastoma and sarcoma in children. Non-camptothecin Top1 inhibitors are in preclinical development (1, 4).
Top1 is expressed ubiquitously in eukaryotic cells including non-replicative and post-mitotic cells. Expression of the TOP1 gene is essential in animals, and its homozygous disruption is early-embryonic lethal (5). Even its under-expression leads to alterations in DNA replication and genomic organization (6). Reduced levels of Top1 may be rate-limiting for the relaxation of positive DNA supercoiling ahead of replication and transcription complexes (1, 6, 7). Accumulation of supercoiling may promote replication fork collapse and transcriptional R-loops (8). TOP1 over-expression is also toxic (9). Too much Top1 may promote the formation of Top1 cleavage complexes at endogenous DNA lesions (abasic sites, mismatches, oxidized bases, nicks, and DNA adducts) (1, 10). Top1 protein levels are also critical for responses to anticancer therapy; cell killing by Top1 inhibitors, including camptothecins and indenoisoquinolines, is positively correlated with Top1 expression (6, 11-15).
The 60 cancer cell lines (the NCI-60) of the NCI Developmental Therapeutics anticancer screen exhibit differential expression patterns, which is an asset in identifying relationships to general processes, such as alterations in DNA copy number and pharmacological response (16, 17). In a recent study, we showed that Top1 expression varies significantly across the NCI-60 and that Top1 protein expression is significantly correlated with TOP1 mRNA expression across those cells (18). The NCI-60 panel (19, 20) constitutes a unique database for pharmacological inquiry because it has been characterized more extensively at the DNA, RNA, protein, chromosomal, functional, and pharmacological levels than any other set of cells in existence1 (17, 20-24).
The aim of the present study was to take advantage of our multi-faceted molecular profiling of the NCI-60 to elucidate genetic mechanism(s) that regulate TOP1 transcript expression. For that purpose, we integrated data on: i) TOP1 transcript expression, using data from three different platforms, Human Genome U95 (HG-U95) (24), Human Genome U-133 (HG-U133) (24) and Human Genome U133 Plus 2.0 (HG-U133 Plus 2.0); ii) TOP1 transcript expression at the exon level for the 21 TOP1 exons, using Affymetrix GH Exon 1.0 ST Arrays; iii) TOP1 protein expression (18); and iv) DNA copy number, by array CGH using NimbleGen arrays from the NCI-60 screen. Those joint profiling data generated a hypothesis about the regulation of TOP1 expression that we then followed up with biochemical and biophysical studies.
As described previously (18), transcript expression data for the NCI-60 were obtained from our Affymetrix Human Genome U95 Set (HG-U95; ~60,000 features; Affymetrix Inc., Sunnyvale, CA) and Human Genome U133 (HG-U133a and b; ~44,000 features) (24). We also analyzed data for TOP1 from our Human Genome U133 Plus 2.0 Arrays (HG-U133 Plus 2.0; ~47,000 features). We used GC-RMA normalization for the HG-U95 and HG-U133 arrays, and RMA for the HG-U133 Plus 2.0 arrays (25). To be included in the calculation of TOP1 expression levels (Figure 1A), probesets were required to have an intensity range of ≥1.2 log2 and to be consistent with the pattern of expression of the other probesets, using a Pearson's correlation coefficient cutoff of ≥0.52. Probes that passed these quality controls (Figure 1A) were used to calculate the z-score2 average for TOP1: the z-score for each probe was first determined by subtracting the 60-cell mean and dividing by the standard deviation (the second-to-last and last rows of numbers in Figure 1A, respectively), and the resulting values were then averaged for each cell line. Data from the HG-U95 and HG-U133 microarrays can be accessed at our relational database, CellMiner3.
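The z-score averaging described above can be illustrated with a minimal sketch. It is not the code used in the study: the function names and toy intensities are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say whether the sample or population form was used.

```python
from statistics import mean, stdev

def probe_z_scores(intensities):
    """Z-score one probe's log2 intensities across the cell lines."""
    mu = mean(intensities)
    sd = stdev(intensities)  # assumption: sample standard deviation
    return [(x - mu) / sd for x in intensities]

def average_z_scores(probe_table):
    """Average the per-probe z-scores for each cell line.

    probe_table maps a probe name to its list of intensities; all lists
    are assumed to share the same cell-line order.
    """
    z_by_probe = [probe_z_scores(values) for values in probe_table.values()]
    # zip(*...) regroups the values so that each tuple holds one cell line
    return [mean(per_cell) for per_cell in zip(*z_by_probe)]

# Hypothetical toy data: two probes measured in three cell lines
toy_probes = {
    "probe_a": [8.0, 9.0, 10.0],
    "probe_b": [5.0, 7.0, 9.0],
}
avg_z = average_z_scores(toy_probes)
```

Because each probe is standardized before averaging, probes with different baseline intensities or dynamic ranges contribute equally to the per-cell-line value.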
The sequences within TOP1 to which the individual probes (that make up the probesets) from the HG-U95, HG-U133, and HG-U133 Plus 2.0 arrays hybridize were visualized (Figure 1C) using SpliceCenter4. SpliceCenter is a suite of user-friendly tools that evaluate the impact of gene splice variation on a variety of molecular biological techniques.
Transcript expression data for each of the 21 TOP1 exons were obtained using the Affymetrix GeneChip Human Exon 1.0 ST (GH Exon 1.0 ST) Array according to the manufacturer's instructions. We are grateful to John Ward, Thomas Gingeras and colleagues at Affymetrix for very kindly providing the chips for those studies. Assays were run at GeneLogic (under the aegis of E. Kaldjian) following the manufacturer's recommendations. Technical triplicate arrays were hybridized for each cell line, except for renal (RE) UO-31, which was done in duplicate, and central nervous system (CNS) SF-539, which was missing. Transcript data were normalized by RMA (25), using Partek Genomics Suite version 6.3.
To calculate mean-centered intensities across cell lines for each probe set, the technical triplicates for each cell line were averaged for each probe set. All probesets were then quality-controlled for either being at background levels for all cell lines (and thus uninformative about pattern), or being “dead” (unresponsive, although not at background). Probes that failed either of those criteria were dropped from further analysis. When there were multiple probesets for a single exon, averages were taken next to yield a single value for each exon.
Data for each exon were next mean-centered across the 59 cell line intensities available (CNS:SF-539 was unavailable). The mean-centered intensity values (M) for each exon were calculated as: M = intensity value for the exon - (average of intensity values for the exon across 59 cell lines).
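The mean-centering formula above, together with the per-exon standard deviation used later to compare variability between exons, can be sketched as follows. The exon labels and intensities are hypothetical toy values, not data from the study.

```python
from statistics import mean, stdev

def mean_center(exon_intensities):
    """M = intensity for the exon minus the exon's average across cell lines."""
    avg = mean(exon_intensities)
    return [x - avg for x in exon_intensities]

# Hypothetical log2 intensities for two exons across three cell lines
toy_exons = {
    "exon_1": [10.0, 10.2, 9.8],
    "exon_2": [8.0, 9.0, 7.0],
}

centered = {name: mean_center(values) for name, values in toy_exons.items()}

# Spread of the mean-centered values per exon; a smaller value means the
# exon varies less from cell line to cell line
spread = {name: stdev(values) for name, values in centered.items()}
```

Mean-centering removes the probe-specific offset of each exon, so that what remains is the pattern across cell lines rather than the absolute hybridization signal.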
The statistical significance (p=2.2×10⁻¹⁶) of differences in intensity between TOP1 exon 1 and exons 2-21 was calculated by Welch's t-test on the basis of measurements using 182 microarrays. On each array, there were four probes specific for exon 1 (yielding 4×182=728 measurements) and 33 probes specific for exons 2-21 (yielding 33×182=6006 measurements).
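For readers unfamiliar with Welch's t-test, the statistic and its Welch-Satterthwaite degrees of freedom can be computed as in the sketch below. The function name and toy samples are hypothetical, and the p-value step (which requires the t-distribution) is deliberately omitted rather than approximated.

```python
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical toy intensities standing in for exon 1 and exons 2-21 probes
t_stat, dof = welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

Welch's form is appropriate here because the two groups differ greatly in size and need not share a common variance.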
Human, mouse, and rat sequence information was obtained from the National Center for Biotechnology Information (NCBI)6. For Figure 3A, the exon sizes were derived from information obtained from locus NM_003286. The TOP1 intron 1 sizes and sequences were obtained from loci NC_000020, NC_000068, and NC_005102, for human, mouse, and rat, respectively. Fly sequences were obtained from FlyBase7. The complementary strand sequence was generated using the Molecular Biologist's Workbench's Data Manipulation tool8. Potential G-quadruplex regions were identified using default conditions at the QGRS (quadruplex forming G-rich sequences) Mapper9.
Oligonucleotides were synthesized by Eurogentec (Seraing, Belgium) at the 200-nmole scale. Concentrations were estimated using extinction coefficients provided by the manufacturer and calculated using a nearest-neighbor model (26) as described previously. Sequences (Figure 5A) are given in the 5′ to 3′ direction. Melting experiments were conducted as previously described (27, 28) by recording the absorbance at 295 nm (27, 29). Sequences were tested at least twice at 4-μM strand concentration. Thermal difference spectra were obtained from differences between the absorbance spectra from unfolded and folded oligonucleotides (as recorded above and below their Tm's) (30). Circular dichroism spectra were recorded on a JASCO-810 spectropolarimeter as described previously (28).
The NimbleGen 385,000-feature Human Whole-Genome array (HG17, Build 35) probe microarray (Varma, in preparation) yielded data from 15 probes specific for TOP1. They are 39095858, 39101405, 39106639, 39111664, 39117552, 39123350, 39128951, 39134754, 39146000, 39154112, 39159375, 39165420, 39170495, 39176079, and 39181834. Approximate mean DNA copy number was calculated as:
$\text{copy number} \approx C \times L^{\bar{I}}$
where $\bar{I}$ = the mean log intensity ratio of the TOP1-specific probes, C = (the correction for generating the intensities as a ratio of the cell line intensity to a normal, 2N, DNA) = 2, and L = the base of the logarithm of the intensity values = 2.
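The conversion from probe intensities to an approximate copy number can be sketched as follows. This is an assumption-laden illustration: it takes the formula to have the form C × L raised to the mean log ratio, with C = 2 and L = 2 as defined above, and the function name and probe values are hypothetical.

```python
from statistics import mean

def approx_copy_number(log_ratios, correction=2.0, log_base=2.0):
    """Convert mean log(cell line / normal 2N) ratios to a copy number.

    Assumed form: correction * log_base ** mean(log_ratios). A mean log
    ratio of 0 therefore maps to the normal diploid value of 2.
    """
    return correction * log_base ** mean(log_ratios)

# Hypothetical log2 ratios for a handful of probes
diploid_like = approx_copy_number([0.0, 0.0, 0.0])
amplified_like = approx_copy_number([1.0, 1.0, 1.0])
```

Under this assumed form, each unit increase in the mean log2 ratio doubles the estimated copy number.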
Genes are designated here by their HUGO (Human Genome Organization) names, as promulgated by the Gene Nomenclature Committee (HGNC)10.
To determine relative transcript levels of TOP1 in the NCI-60, we used six probes from three different microarray platforms, HG-U95 (24), HG-U133 (24) and HG-U133 Plus 2.0. Figure 1A displays these relative levels both as intensity values, and as the z-score average. The use of z-scores allows comparison of relative levels for data distributions with different means and/or standard deviations. Values obtained for probes from the three platforms were consistent with each other (mean Pearson's correlation coefficient = 0.72, with a range of 0.52 to 0.93). The six probes, two of which appear in both HG-U133 and HG-U133 Plus 2.0, hybridize to exons 1-6 and 18-21 (Figure 1C), with the majority of probes targeting the 3′ end of the gene. TOP1 expression in the NCI-60 varied over a 5.7 fold range, within one standard deviation of the average transcript variation (a 9.0 fold range) for 26 housekeeping genes (as defined in (31)).
Colon (CO) HCT116 and breast (BR) MCF-7 cells showed the highest TOP1 mRNA levels, and breast (BR) HS578T and melanoma (ME) SK-MEL-28 showed the lowest (Figures 1A and 2A). The six leukemia (LE) cell lines and six of the seven colon carcinoma cell lines consistently expressed high TOP1 mRNA (Figures 1A-B). Their average z-scores (0.86 and 0.89, respectively) were more than 3.7 times that of the next highest tissue of origin type, ovarian (OV) at 0.23. HCC-2998 cells were the only colon carcinoma cells with lower than average TOP1 transcript levels. The breast and prostate (PR) lines were the most variable in TOP1 expression, with standard deviations of 1.25 and 0.95, respectively. The renal (RE) lines were the lowest expressers (Figures 1A-B), with an average z-score of -0.83. Breast, central nervous system (CNS) and melanoma formed a second tier of negative expressers, with z-score averages of -0.16, -0.35, and -0.34, respectively. TOP1 transcript and protein levels correlated with one another at statistically significant levels (r=0.80, p<0.001) (18).
To evaluate whether DNA copy number contributes to TOP1 expression, we used array comparative genomic hybridization (aCGH) to determine average copy number levels for the TOP1 locus and 2 megabases of flanking region on both sides of it. Those levels were obtained from our studies using NimbleGen 385,000-feature Human Whole-Genome CGH arrays. Based on the average intensities of 15 tiled probes specific for the TOP1 locus, DNA copy number ranged from 4.13 for the breast cancer line MCF-7 to 1.71 for the melanoma SK-MEL-28 (Figure 2A, left panel). Comparison of the average transcript z-scores (Figure 1A, last column of the tabular data, and Figure 2A, right panel) with the average estimated DNA copy number indicates a statistically significant correlation (r=0.33, p<0.02, without multiple comparisons correction). Those results suggest that TOP1 amplification can contribute to increased expression levels in cancer cells.
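The correlation between average transcript z-scores and estimated copy number is a Pearson correlation, which can be computed as in the sketch below. The helper names and toy values are hypothetical, and the t-statistic shown is the standard transformation used to test a Pearson r against zero; the text does not state how the reported p-value was derived.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """t-statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))

# Hypothetical toy values standing in for z-scores and copy numbers
toy_z = [1.0, 2.0, 3.0, 4.0]
toy_copy_number = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(toy_z, toy_copy_number)
t = t_statistic(r, len(toy_z))
```

A modest r can still reach statistical significance when the number of cell lines is large, which is why the strength and the significance of the correlation are reported separately.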
Within the portion of chromosome 20 for which copy number was estimated (i.e., from nucleotides 37,103,339 to 41,095,315), 57 of the cell lines showed relative invariance in their average copy number (within the limits of reliability and variability of the individual probes). HS578T exemplifies that type of profile (Figure 2B, middle panel). Only two cell lines, breast cancer MCF7 and lung cancer NCI-H322M, showed variation around the TOP1 locus (Figure 2B, top and bottom panels, respectively). In MCF7, there is amplification of an ~1.7×10⁶ nucleotide region that contains TOP1 (red squares in Figure 2B). In the NCI-H322M cells, there is an ~2-fold increased copy number toward the p-end of chromosome 20 and a reduction in average copy number at the TOP1 locus.
To expand our transcript assessment and consider potential exon-specific variation, we next examined the intensity levels for each of the 21 TOP1 exons using the GH Exon 1.0 ST Array (see Supplemental Table 1). This platform was in agreement with the other three transcript platforms for TOP1 expression, with average correlations of 0.86, 0.82, and 0.73 (all significant with p<0.0001) for HG-U95, HG-U133, and HG-U133 Plus2, respectively. Analysis of the GH Exon 1.0 ST Array indicated that the probe intensities for exon 1 were higher on average (by ≥1.87 fold, linear) than those for exons 2-21, with the higher expresser cell lines such as HCT-116 and MCF7 (ranked in Figure 1A) having less of a drop-off (1.37 and 1.50 fold linear change, respectively) than the lower expresser cell lines such as HS578T and SK-MEL-28 (2.23 and 2.19 fold linear change, respectively). For all cell lines, the average of the exon 1 probesets minus the average of the exon 2 through 21 probesets is positive (Supplemental Table 1, last column), indicating the intensity reduction following intron 1.
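The relationship between a difference of log2 intensities and a linear fold change can be made explicit with a short sketch. It assumes that the fold values quoted above derive from the difference between mean log2 intensities for exon 1 and for exons 2-21; the function name and input values are hypothetical.

```python
from statistics import mean

def linear_fold_drop(exon1_log2, other_exons_log2):
    """Linear fold difference implied by a difference of mean log2 values."""
    log2_difference = mean(exon1_log2) - mean(other_exons_log2)
    return 2.0 ** log2_difference

# Hypothetical log2 intensities: exon 1 probes versus exons 2-21 probes
fold = linear_fold_drop([10.0, 10.0], [9.0, 9.0])
```

A positive log2 difference corresponds to a fold value above 1, that is, to higher signal for exon 1 than for the downstream exons.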
Next, to partially compensate for variations in probe set hybridization efficiency, we mean-centered the probe set average log2 intensity values for each exon across the NCI-60 (results in Table 1). Using the data in this form allows one to more clearly identify potential specific exon level fluctuations. Figure 3A shows a clustered image map visualization of these mean-centered intensity values. The vertical strip representing exon 1 stands out as having less color variation, and thus less mean-centered intensity variation, than do the other exons. The horizontal blocks of color indicate that most cell lines are otherwise consistent in their mean intensity levels across exons 2-21. Assessment of the mean-centered log2 intensities by cell line, including their 95% confidence intervals (Supplemental Figure 1A, B, and C), indicates that few examples of exon-specific transcript variation occur within exons 2-21 for TOP1.
The deviations from the (60 cell) mean for the Table 1 data for four individual exons are depicted quantitatively in Figure 3B, reflecting what was seen in Figure 3A. That is, from the lengths of the red bars, it is apparent that exon 1 has less deviation from the mean than do the other three exons shown. The same is true also for the other 17 exons (for all exons, see Supplemental Figure 1), substantiating the observation of reduced exon 1 variation from Figure 3A. That point is reinforced in tabular form as well by the standard deviations of the mean-centered log2 intensities for each exon across the NCI-60 (Table 1, second row from bottom). The standard deviation is the least for exon 1, at 0.20, with an average of 0.47 (with a range of 0.35 to 0.67) for exons 2 to 21.
Those results indicate that: i) TOP1 mRNA expression is generally less variable for exon 1 than for exons 2-21; ii) the TOP1 transcript levels generally decrease between the exons 1 and 2 probe sets; and iii) there is a greater drop-off in the lower expressers as compared with the higher expresser cell lines.
Because of these transcript variations between TOP1 exons 1 and 2, we took a detailed look at intron 1 for potential secondary structure forming motifs (Figure 4A). The first intron of human TOP1 is among the smallest in the TOP1 gene, consisting of 330 base pairs. Examination of the sequence composition of human intron 1 using the Quadruplex forming G-Rich Sequences (QGRS) Mapper11 algorithm revealed six and five potential quadruplex-forming G-rich sequences (depicted as the lighter bars in Figure 4B) within the coding and transcribed strands, respectively. Chimpanzee, mouse, and rat TOP1 intron 1 (see Figure 4B) are relatively conserved in both size and number of potential quadruplex-forming G-rich sequences. Drosophila is more disparate, with intron 1 containing 1346 nucleotides and four potential quadruplex-forming G-rich sequences in both its coding and transcribed strands (data not shown).
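A simple way to scan both strands of an intron for candidate quadruplex motifs is sketched below. It does not reproduce the QGRS Mapper algorithm or its scoring; it uses the simpler, widely used canonical pattern of four runs of at least three guanines separated by loops of one to seven nucleotides. The function names and the test sequence are illustrative, not taken from TOP1.

```python
import re

# Canonical motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (not QGRS Mapper's rules)
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def find_g4_candidates(seq):
    """Return (start, end, motif) for non-overlapping matches on one strand."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]

# Illustrative sequence (a telomere-like repeat with flanking bases)
example = "AAGGGTTAGGGTTAGGGTTAGGGAA"
hits_forward = find_g4_candidates(example)
hits_reverse = find_g4_candidates(reverse_complement(example))
```

Scanning the reverse complement as well as the given sequence is what allows candidates on the coding and transcribed strands to be counted separately.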
Of the potential quadruplex sequences identified by the QGRS Mapper algorithm on both DNA strands of intron 1, manual inspection suggested that the most stable quadruplexes were located on the transcribed strand. To confirm the propensity of these motifs to form quadruplexes, we synthesized three oligodeoxynucleotides (21, 25, and 30 bases long) that mimicked parts of intron 1 (located in Figure 4B beneath the human intron 1, with the sequence depicted in Figure 5A). Standard physico-chemical methods were used to reveal quadruplex formation (32). For the 21- and 25-mers, UV melting profiles showed a clear inverted transition at 295 nm, consistent with G4 quadruplex formation (Figure 5B) (27). The 21-mer was highly stable (Tm of 75°C in K+). Furthermore, this Tm depends on the nature of the cation (K+ > Na+: data not shown). The thermal difference spectra (TDS, Figure 5C) and circular dichroism spectra (CDS, Figure 5D) for both sequences were consistent with G4 formation. In contrast, the 30-mer, despite being very G-rich, did not exhibit the same behavior (no transition at 295 nm and different CDS and TDS). Altogether, those results indicate that two of the three sequences form bona fide quadruplexes whereas the longest one (30-mer) forms a duplex.
We then analyzed the molecularity of the 21-mer, which shows a fully reversible melting profile. We measured the Tm at different strand concentrations (between 2 and 30 μM). The Tm was concentration-independent (data not shown), confirming that the quadruplexes are intramolecular (33). We could not perform a similar experiment with the 25-mer because its melting was more complex, and it showed hysteresis, probably as a result of the formation of multiple quadruplex species of different molecularities.
The melting experiments indicated that at least two sequences in the transcribed strand tend to form quadruplex structures and that the one formed with the 21-mer is extremely thermally stable. Nevertheless, a Tm value is of limited value for assessing the “strength” of a quadruplex under physiological conditions; a free energy of transition would be more predictive. We therefore performed a van't Hoff analysis of the melting profiles and found a ΔG°37°C value of −8.6 kcal/mol. That value provides an indication of the high thermodynamic barrier for unfolding of the G4 structure to let the transcription machinery pass through this structure. However, the value of ΔG°37°C should be interpreted with caution because it relies on a number of assumptions (such as two-state behavior and temperature-independent enthalpy) and extrapolations with respect to the nature of the cellular environment (33).
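A van't Hoff analysis of an intramolecular two-state transition can be sketched as follows. This is an illustration under the same assumptions named above (two-state behavior, temperature-independent enthalpy), not the analysis performed in the study; the thermodynamic parameters are synthetic, and the helper names are hypothetical.

```python
from math import exp, log

R = 1.987e-3  # gas constant in kcal/(mol*K)

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def vant_hoff(temps_k, folded_fraction):
    """Fit ln K = -dH/(R*T) + dS/R, with K = theta / (1 - theta)."""
    inv_t = [1.0 / t for t in temps_k]
    ln_k = [log(theta / (1.0 - theta)) for theta in folded_fraction]
    slope, intercept = linear_fit(inv_t, ln_k)
    return -slope * R, intercept * R  # (dH, dS) for folding

def delta_g(d_h, d_s, temp_k=310.15):
    """Free energy of folding at temp_k (default 37 degrees C)."""
    return d_h - temp_k * d_s

# Synthetic folding parameters (kcal/mol and kcal/(mol*K)); not measured values
true_dh, true_ds = -60.0, -0.17
temps = [335.0, 340.0, 345.0, 350.0, 355.0, 360.0, 365.0]
theta = [1.0 / (1.0 + exp((true_dh - t * true_ds) / (R * t))) for t in temps]

fit_dh, fit_ds = vant_hoff(temps, theta)
dg_37 = delta_g(fit_dh, fit_ds)
```

In practice the folded fraction comes from baseline-corrected absorbance at 295 nm, and only points within the transition region are fitted; the extrapolation to 37°C is where the temperature-independent enthalpy assumption matters most.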
TOP1 is viewed as a housekeeping gene because its homozygous deletion is embryonic lethal (5) and its expression is restricted across cell cycle stages (34). In this study we used high throughput microarray analyses to elucidate regulatory genetic determinants of TOP1 transcript expression across the NCI-60 and explored two potential influences on TOP1 expression: transcriptional pausing and DNA copy number variation. The correlation between TOP1 expression and copy number of 0.33 was higher than prior multi-gene correlation averages of 0.29 and 0.23 (35), suggesting that TOP1 copy number alterations in cancers can influence the transcript level.
To compare TOP1 transcript levels across the NCI-60 cell lines and across our six Affymetrix probesets (on HG-U95, HG-U133, and HG-U133 Plus2 microarrays), we used average z-scores2. Z-scores facilitate the integration of data from multiple platforms because they are mean-centered and normalized with respect to standard deviation. Average z-scores for TOP1 expression provided a single value for each of the NCI-60 lines (see Figures 1A and B, and 2A). Those values were used to rank the cells, and for comparison to arrayCGH (Figure 2A). Use of z-scores, as opposed to individual probeset data, for TOP1 transcript levels yielded generally higher correlations with both TOP1 protein expression (18) and TOP1 DNA copy number (Figure 2A). The TOP1 transcript expression at relatively low levels in the renal lines and at high levels in the leukemia lines and in six of the seven colon carcinoma lines (Figures 1A and B, and 2A) was consistent with our recent results using an ELISA assay to measure Top1 protein levels (18).
Recent studies have shown that transcription can be regulated for many genes, after establishment of the pre-initiation complex (36, 37). Having found that TOP1 expression was variable from cell line to cell line across the NCI-60, we looked for differential expression across the 21 exons of the TOP1 gene using our data from the exon-specific Affymetrix GeneChip Human Exon 1.0 ST microarray. An asset in organizing and visualizing the exon array data was our SpliceCenter tool (38), which provided: i) automated visualizations of the translated and untranslated regions of the TOP1 gene; ii) the relative exon sizes and locations of introns; and iii) the portions of the exons being assessed by the individual probesets (examples in Figure 1C). Interpretation of inter-cell line transcript level variation was made possible by the relatively large number of arrays (156) used in this study. As probe set intensities are not directly interpretable with respect to differences among exons because of variability in hybridization efficiency between individual probes, we developed a mean-centering approach and looked for variation across cell lines for each individual exon. Mean-centered intensities, as seen in Figure 3 for exon 1, can provide indications of exon-specific transcript variations. Analogous transformation of exon-array data from genes other than TOP1 should likewise be useful for detection of exon-specific transcript variation.
Our analyses based on the GH Exon 1.0 ST microarray revealed only very limited variation in exon 1-specific transcript intensities across the NCI-60, as compared with differences across cell lines for the other 20 exons (see Figure 3). The cell lines with the highest TOP1 expression (such as colon carcinoma HCT-116) appear to allow the transcription process to pass through intron 1 more readily than do the low-TOP1 expressers (such as breast carcinoma HS578T). In the latter, transcription appears to be impeded within intron 1. Those observations suggested the existence of a negative transcription regulator within intron 1 of the TOP1 gene in humans. Further analysis demonstrated that intron 1 is relatively short and conserved among vertebrates (see Figure 4A and B), containing multiple potential quadruplex-forming G-rich sequences. This implies similar potential for the formation of secondary structures in these vertebrates.
Sequences that form quadruplexes have been generally found to regulate transcription within promoter regions. For instance, quadruplexes in the promoters of the HIF1A, KRAS, CMYB and CMYC genes (39-43) have been found to repress transcription initiation. Our findings for the TOP1 gene highlight the potential role of quadruplex-forming G-rich sequences for regulating transcription in the body of genes. They are consistent with a recent report showing the preferential enrichment of potential quadruplex sequences in the first introns of a large number of human genes (44). Potential quadruplexes were significantly over-represented on the non-template DNA strands (44). In the case of the TOP1 gene, potential quadruplex sequences in the first intron were present in both the transcribed (non-template) and coding strands, but the sequences with the highest potential were on the transcribed strand. Biochemical and biophysical evidence provided here indicates that the sequences with potential for G-quadruplex formation can actually form such structures, at least in vitro (see Figure 5).
To the best of our knowledge, our study is the first use of exon-specific expression data for recognition of potential intronic secondary structure leading to the restriction of transcript elongation. The approach, practiced across panels of cell lines such as the NCI-60, could be useful for detection of intronic transcription pausing in other genes.
In summary, the present study describes: i) the relative levels of TOP1 within the NCI-60; ii) a significant association between TOP1 expression and DNA copy number; iii) the existence of a reduction of TOP1 transcript level following exon 1; iv) the existence of reduced variability of exon 1 (as compared to the other 20 exons); v) the presence of multiple potential guanosine quadruplexes within intron 1; and vi) the verification of the guanosine quartet formation by two of these by physico-chemical methods. This form of exon-specific transcript variation analysis across the NCI-60 allows the recognition of inhibitory elements within the introns of genes. Based on this approach, we suggest, for the first time, the existence of a transcriptional pausing event in the first intron of TOP1, and provide a plausible mechanistic explanation based on the presence of G-quadruplex regions.
We wish to thank Dr. Thomas Gingeras, now at Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, and John Ward who provided the microarrays while at Affymetrix, Santa Clara, CA 95050. We also wish to thank Eric Kaldjian, who oversaw the data generation for the Affymetrix GH Exon 1.0 ST microarrays while at GeneLogic, Gaithersburg, MD 20879. This work was supported by the Intramural Program of the National Cancer Institute, Center for Cancer Research.