TOP1 is viewed as a housekeeping gene because its homozygous deletion is embryonic lethal (5) and its expression is constitutive across cell cycle stages (34). In this study, we used high-throughput microarray analyses to elucidate genetic determinants of TOP1 transcript expression across the NCI-60 and explored two potential influences on TOP1 expression: transcriptional pausing and DNA copy number variation. The correlation of 0.33 between TOP1 expression and copy number was higher than previously reported multi-gene correlation averages of 0.29 and 0.23 (35), suggesting that TOP1 copy number alterations in cancers can influence the transcript level.
To compare TOP1 transcript levels across the NCI-60 cell lines and across our six Affymetrix probesets (on the HG-U95, HG-U133, and HG-U133 Plus2 microarrays), we used average z-scores. Z-scores facilitate the integration of data from multiple platforms because they are mean-centered and normalized with respect to standard deviation. Averaging the z-scores for TOP1 expression provided a single value for each of the NCI-60 lines (see , and ). Those values were used to rank the cell lines and for comparison with arrayCGH (). Use of average z-scores, as opposed to individual probeset data, for TOP1 transcript levels yielded generally higher correlations with both TOP1 protein expression (18) and TOP1 DNA copy number (). TOP1 transcript expression was relatively low in the renal lines and high in the leukemia lines and in six of the seven colon carcinoma lines (, and ), consistent with our recent results using an ELISA to measure Top1 protein levels (18).
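The averaging of z-scores described above can be illustrated with a minimal sketch. It assumes log-scale intensities, one list per probeset with cell lines in the same order, and scaling by the sample standard deviation; the text does not specify these details, and the probeset names, cell line labels, and values below are hypothetical.

```python
from statistics import mean, stdev


def zscores(values):
    """Mean-center one probeset's intensities and divide by the sample SD."""
    mu = mean(values)
    sd = stdev(values)  # assumption: sample (n - 1) standard deviation
    if sd == 0:
        raise ValueError("probeset shows no variation across cell lines")
    return [(v - mu) / sd for v in values]


def average_zscores(probesets):
    """Average per-probeset z-scores into one value per cell line.

    probesets maps a probeset name to a list of intensities, with the
    cell lines in the same order in every list.
    """
    per_probeset = [zscores(vals) for vals in probesets.values()]
    return [mean(column) for column in zip(*per_probeset)]


# Hypothetical data: two probesets measured on four cell lines.
cell_lines = ["line_A", "line_B", "line_C", "line_D"]
probesets = {
    "probeset_1": [7.0, 8.0, 9.0, 10.0],
    "probeset_2": [5.5, 6.0, 6.5, 7.0],
}

avg_z = average_zscores(probesets)

# Rank cell lines from highest to lowest average z-score.
ranking = sorted(zip(cell_lines, avg_z), key=lambda pair: pair[1], reverse=True)
```

Because each probeset is standardized before averaging, probesets from platforms with different intensity scales contribute equally to the per-line value.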
Recent studies have shown that, for many genes, transcription can be regulated after establishment of the pre-initiation complex (36). Having found that TOP1 expression varied from cell line to cell line across the NCI-60, we looked for differential expression across the 21 exons of the TOP1 gene using our data from the exon-specific Affymetrix GeneChip Human Exon 1.0 ST microarray. An asset in organizing and visualizing the exon array data was our SpliceCenter tool (38), which provided: i) automated visualizations of the translated and untranslated regions of the TOP1 gene; ii) the relative exon sizes and locations of introns; and iii) the portions of the exons being assessed by the individual probesets (examples in ). Interpretation of inter-cell line transcript level variation was made possible by the relatively large number of arrays (156) used in this study. Because hybridization efficiency varies between individual probes, probe set intensities cannot be compared directly between exons. We therefore developed a mean-centering approach and looked for variation across cell lines within each individual exon. Mean-centered intensities, as seen in for exon 1, can provide indications of exon-specific transcript variations. Analogous transformation of exon-array data from genes other than TOP1 should likewise be useful for detection of exon-specific transcript variation.
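As an illustration of the mean-centering idea, the following minimal sketch subtracts each exon's mean intensity across cell lines and then summarizes the remaining spread per exon. The exon labels, the values, and the use of the population standard deviation as the spread measure are assumptions made for the example, not details taken from the study.

```python
from statistics import mean, pstdev


def mean_center(intensities):
    """Subtract the across-cell-line mean from one exon's intensities."""
    mu = mean(intensities)
    return [x - mu for x in intensities]


def center_by_exon(exon_table):
    """Mean-center every exon independently.

    exon_table maps an exon label to a list of log-scale intensities,
    one per cell line, in the same cell line order for every exon.
    """
    return {exon: mean_center(vals) for exon, vals in exon_table.items()}


def spread_by_exon(centered):
    """Summarize across-cell-line variation of each exon (illustrative choice)."""
    return {exon: pstdev(vals) for exon, vals in centered.items()}


# Hypothetical intensities for two exons on three cell lines.
exon_table = {
    "exon_1": [9.0, 9.1, 8.9],   # nearly constant across lines
    "exon_2": [6.0, 8.0, 10.0],  # varies strongly across lines
}

centered = center_by_exon(exon_table)
spread = spread_by_exon(centered)
```

Centering removes the probe-specific baseline, so the remaining values can be compared across cell lines within an exon even though raw intensities are not comparable between exons.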
Our analyses based on the GeneChip Human Exon 1.0 ST microarray revealed only very limited variation in exon 1-specific transcript intensities across the NCI-60, compared with the differences across cell lines for the other 20 exons (see ). The cell lines with the highest TOP1 expression (such as colon carcinoma HCT-116) appear to allow the transcription process to pass through intron 1 more readily than do the low-TOP1 expressers (such as breast carcinoma HS578T). In the latter, transcription appears to be impeded within intron 1. Those observations suggest the existence of a negative transcription regulator within intron 1 of the human TOP1 gene. Further analysis showed that intron 1 is relatively short and conserved among vertebrates (see ) and contains multiple potential quadruplex-forming G-rich sequences. This implies a similar potential for the formation of secondary structures in these vertebrates.
Sequences that form quadruplexes have generally been found to regulate transcription within promoter regions. For instance, quadruplexes in the promoters of the HIF1A, KRAS, CMYB and CMYC genes (39) have been found to repress transcription initiation. Our findings for the TOP1 gene highlight the potential role of quadruplex-forming G-rich sequences in regulating transcription within the body of genes. They are consistent with a recent report showing preferential enrichment of potential quadruplex sequences in the first introns of a large number of human genes (44). In that report, potential quadruplexes were significantly over-represented on the non-template DNA strands (44). In the case of the TOP1 gene, potential quadruplex sequences in the first intron were present in both the transcribed (template) and coding (non-template) strands, but the sequences with the highest potential were on the transcribed strand. The biochemical and biophysical evidence provided here indicates that the sequences with potential for G-quadruplex formation can actually form such structures, at least in vitro (see ).
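For readers who wish to screen an intronic sequence for potential quadruplex-forming motifs on both strands, a minimal sketch follows. It uses a widely used sequence heuristic (four runs of at least three guanines separated by loops of one to seven nucleotides). This is an assumption for illustration and is not necessarily the prediction method used in this study; the toy sequence is hypothetical.

```python
import re

# Heuristic motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (assumed, see lead-in).
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_g4_motifs(seq):
    """Report non-overlapping motif matches on both strands.

    Returns (strand, start, motif) tuples. For the "-" strand, start is
    the 0-based offset within the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for match in G4_PATTERN.finditer(s):
            hits.append((strand, match.start(), match.group()))
    return hits


# Hypothetical toy sequence containing one motif on the given strand.
toy = "ATGGGAGGGTTGGGCAGGGTA"
hits = find_g4_motifs(toy)
```

A sequence match of this kind only indicates potential; whether a structure actually forms has to be established experimentally, as was done here in vitro.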
To the best of our knowledge, our study is the first to use exon-specific expression data to recognize potential intronic secondary structure that restricts transcript elongation. The approach, applied across panels of cell lines such as the NCI-60, could be useful for detecting intronic transcription pausing in other genes.
In summary, the present study describes: i) the relative levels of TOP1 within the NCI-60; ii) a significant association between TOP1 expression and DNA copy number; iii) a reduction of TOP1 transcript level following exon 1; iv) reduced variability of exon 1 (as compared to the other 20 exons); v) the presence of multiple potential guanosine quadruplexes within intron 1; and vi) verification by physico-chemical methods that two of these sequences form guanosine quartets. This form of exon-specific transcript variation analysis across the NCI-60 allows the recognition of inhibitory elements within the introns of genes. Based on this approach, we suggest, for the first time, the existence of a transcriptional pausing event in the first intron of TOP1 and provide a plausible mechanistic explanation based on the presence of G-quadruplex regions.