The initial goal of our study was to perform comparative expression profiling of normal and malignant breast epithelia using three different approaches: two commonly used technologies that we had utilised previously [21] and a recently released microarray designed specifically for the investigation of breast cancer.
To ensure a fair comparison, a human transcriptome database was used as a common mapping point for the three technologies [22]. This produced similar numbers of mapped features for the Plus 2.0 and the Breast Cancer DSA and markedly fewer for MPSS. Although similar overall numbers of Plus 2.0 and DSA probesets mapped unambiguously to the HTR database, the two arrays differed once the orientation of alignment was considered: the Plus 2.0 showed slightly higher numbers of sense features, whereas the DSA showed significantly higher numbers of antisense transcripts. This is most likely explained by the Plus 2.0 design being focused on common protein-coding genes from older public data, while the DSA design employed tissue-specific sequencing and is therefore more likely to have discovered novel content. Therefore, whilst the Plus 2.0 could be expected to have more features mapping in the sense orientation, a proportion of these likely represent transcripts that are not necessarily of functional importance in breast cancer.
That MPSS produced fewer unambiguous mappings than either of the other two technologies reflects the short tag length used by the technology and the consequent increased likelihood of cross hybridisation. This highlights a major shortcoming of the MPSS approach. Whereas both microarray technologies use probesets that contain eleven 25-mer probes and are specifically designed to reflect a single transcript, the MPSS approach generates single tags only 21 nucleotides in length. This creates significant potential for incorrect mapping of tags and means that much of the initial data becomes unusable when a stringent mapping methodology is applied. It also serves as a general reminder of the care that must be taken when interpreting any data generated from short, single tags or probes.
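The effect of a stringent mapping rule on short tags can be illustrated with a minimal sketch. It assumes exact substring matching against a reference transcript set and keeps only tags that hit exactly one transcript; the function name `map_tags`, the transcript identifiers and the toy sequences are hypothetical and are not the mapping procedure used in this study.

```python
def map_tags(tags, transcripts):
    """Split tags into unique, ambiguous and unmapped by exact matching.

    tags        -- iterable of tag sequences (hypothetical input)
    transcripts -- dict of transcript id -> sequence (hypothetical reference)
    """
    unique, ambiguous, unmapped = {}, {}, []
    for tag in tags:
        # Collect every transcript whose sequence contains the tag.
        hits = sorted(tid for tid, seq in transcripts.items() if tag in seq)
        if len(hits) == 1:
            unique[tag] = hits[0]      # unambiguous: usable downstream
        elif hits:
            ambiguous[tag] = hits      # several candidate transcripts
        else:
            unmapped.append(tag)       # no match in the reference
    return unique, ambiguous, unmapped


# Toy reference; the sequences are invented for illustration only.
transcripts = {
    "T1": "ACGTACGTTTGA",
    "T2": "GGGACGTACCCA",
}
unique, ambiguous, unmapped = map_tags(["ACGTAC", "TTTGA", "CCCCC"], transcripts)
print(unique)
print(ambiguous)
print(unmapped)
```

Under a rule of this kind only the tags in `unique` would be carried forward, which is why a short single tag that happens to occur in more than one transcript is lost to the analysis.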
Detection analysis showed further advantages of the DSA approach, with higher numbers of present calls than the Plus 2.0 array in both orientations. That this trend was observed despite the Plus 2.0's higher number of sense-mapped features again suggests an advantage of the disease-focused approach used to generate the DSA: it would appear that a larger proportion of the Plus 2.0 content is not expressed in breast epithelia. MPSS again underperformed at this point, which is consistent with previous assessments of the technology [23].
Generic arrays have previously been suggested as a viable means of studying antisense transcription [8, 24]; however, the higher number of antisense transcripts and the higher detection levels on the DSA suggest that antisense transcription would be better studied using a focused approach such as the Breast Cancer DSA research tool. Furthermore, the DSA achieved greater concordance with the MPSS data than the Plus 2.0 did. This is noteworthy because our previous studies, conducted in the absence of the DSA, had identified the Affymetrix Plus 2.0 as the microarray platform with the highest concordance with the MPSS data set [20].
Our criteria for selecting a 'robust' set of antisense transcripts meant that a large proportion (~90%) of the DSA's antisense probesets were excluded from further analysis. It is likely that some of these antisense transcripts arose from experimental artefacts [21], and the use of actinomycin D during reverse transcription could have reduced the number of antisense transcripts, as seen in the study of Perocchi et al. [25]. Nevertheless, it is equally possible that many of these probesets target genuine antisense sequences and could have yielded useful data: 868 probesets on the DSA showed more than two-fold differential expression. However, in the absence of an extended validation of the antisense transcripts, it was felt that they should only be considered when confirmed by one of the other two technologies used in the study. This leaves a substantial subset of antisense transcripts whose expression in breast tissue remains to be validated by different technologies in the future.
The 257 robust sense-antisense pairs investigated on the DSA showed a high degree of novelty when compared with a recently created SAS database, suggesting that a large number of SAS pairs remain to be discovered and reported. Numerous SAS databases have been published by other researchers [7, 26], and comparison with these could form the basis of further studies. The large number of novel SAS pairs identified here is understandable, as the discovery of antisense transcripts and SAS pairs is still considered relatively new in many quarters and work in this area has yet to reach maturity. This further indicates the potential value of the antisense transcripts represented on the DSA but excluded from this study. The nature and function of the 431 'robust' antisense candidates, and of the subset of these forming the 257 SAS pairs, are currently unknown. As stated previously, the Breast Cancer DSA is a discovery platform containing many transcripts that have not yet been well characterised. Whilst we have demonstrated the expression of these antisense transcripts, elucidating their function would require extensive further validation, which falls outside the scope of the current study. Sequence alignment data for the SAS pairs are provided [see Additional file 5] and may prove a useful resource for future functional analysis.
SAS pairs have previously been classified as head-to-head, tail-to-tail or embedded, based on their pattern of overlap [6]. A limitation of the DSA technology is that it uses 3'-biased protocols, and therefore only the 3' ends of transcripts are interrogated. As a result, SAS pairs discovered using this technology will solely represent tail-to-tail overlap patterns. This also suggests that there may be a large body of SAS pairs with other overlap patterns still to be discovered by other experimental means.
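The three overlap classes can be made concrete with a short sketch. It assumes that each transcript is described by genomic start and end coordinates on a shared axis plus a strand, that head-to-head means the 5' ends overlap, that tail-to-tail means the 3' ends overlap, and that embedded means one transcript lies entirely within the other. The `Transcript` type and the function `classify_sas_overlap` are hypothetical names introduced for illustration; they are not taken from the cited classification.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    start: int   # leftmost genomic coordinate (inclusive)
    end: int     # rightmost genomic coordinate (inclusive)
    strand: str  # "+" or "-"


def classify_sas_overlap(a: Transcript, b: Transcript) -> str:
    """Classify the overlap pattern of two transcripts on opposite strands."""
    if a.strand == b.strand:
        raise ValueError("a sense-antisense pair must lie on opposite strands")
    plus, minus = (a, b) if a.strand == "+" else (b, a)

    if plus.end < minus.start or minus.end < plus.start:
        return "non-overlapping"

    plus_contains_minus = plus.start <= minus.start and minus.end <= plus.end
    minus_contains_plus = minus.start <= plus.start and plus.end <= minus.end
    if plus_contains_minus or minus_contains_plus:
        return "embedded"

    # Partial overlap. The plus-strand 3' end is plus.end and the
    # minus-strand 3' end is minus.start, so if the plus transcript lies
    # to the left, the shared region holds both 3' ends.
    if plus.start < minus.start:
        return "tail-to-tail"
    # Otherwise the shared region holds both 5' ends.
    return "head-to-head"


print(classify_sas_overlap(Transcript(100, 500, "+"), Transcript(400, 900, "-")))
print(classify_sas_overlap(Transcript(400, 900, "+"), Transcript(100, 500, "-")))
print(classify_sas_overlap(Transcript(100, 900, "+"), Transcript(300, 400, "-")))
```

Because a 3'-biased protocol interrogates only the region near each transcript's 3' end, only pairs falling in the tail-to-tail branch of a scheme like this would be expected to be detected on both strands.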
The finding that all SAS pairs differentially expressed between the normal and malignant settings showed positive correlation was surprising, as negative correlation has previously been reported in several studies [11]. This led us to attempt validation of the SAS expression in a range of malignant cell lines and solid tumours by means of strand-specific RT-PCR. The results of this approach largely agreed with those obtained on the DSA platform; however, negative correlation was observed in 13 of the 81 tested samples. Thus, while our pooled samples suggested positive correlation of differential expression for all SAS pairs between normal and malignant settings, individual assessment of a range of solid tumours and cell lines indicated the existence of alternative patterns of differential expression. While differential expression of the sense and antisense transcripts for MMP24 was more prominent in luminal breast cancer cell lines (3/5), significantly different expression levels of the DCBLD2 SAS pair were observed solely in two luminal, hormone receptor-positive breast cancer cell lines. These data might suggest that the level of expression of certain SAS pairs could be breast cancer subtype specific. Nonetheless, our studies suggest that coexpression of SAS pairs may be more prevalent than inverse expression. The differing patterns of differential expression between samples suggest a potential functional relevance of sense-antisense expression patterns, as has previously been reported [11], and serve to highlight the importance of SAS profiling in cancer research. Such knowledge could be beneficial in the elucidation of pathways in cancer and might be exploited in potential future treatments such as antisense therapy [27]. Aberrations in SAS expression patterns might well be indicative of disease or could prove useful in the sub-classification of a given disease, potentially aiding the development of targeted treatments.
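The distinction between positively and negatively correlated SAS pairs can be sketched as a comparison of the direction of change of the two transcripts. The example assumes expression values for a malignant and a normal sample, and reuses the two-fold cut-off mentioned above (a log2 ratio beyond ±1) as an illustrative threshold. The function names and the example values are hypothetical and do not reproduce the analysis performed in this study.

```python
import math


def log2_fold_change(malignant: float, normal: float) -> float:
    """Log2 ratio of malignant to normal expression (both must be > 0)."""
    return math.log2(malignant / normal)


def classify_pair(sense_lfc: float, antisense_lfc: float,
                  threshold: float = 1.0) -> str:
    """Label a sense-antisense pair by the direction of its expression change.

    threshold is the absolute log2 fold change that must be exceeded;
    1.0 corresponds to more than two-fold (an assumed cut-off).
    """
    if abs(sense_lfc) <= threshold or abs(antisense_lfc) <= threshold:
        return "not differential"
    same_direction = (sense_lfc > 0) == (antisense_lfc > 0)
    return "concordant" if same_direction else "discordant"


# Invented expression values, for illustration only.
sense = log2_fold_change(malignant=800.0, normal=100.0)       # up in malignant
antisense = log2_fold_change(malignant=50.0, normal=400.0)    # down in malignant
print(classify_pair(sense, antisense))
```

Applied per sample rather than to pooled material, a rule of this kind would separate samples showing coexpression of a pair from those showing inverse expression.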