In this study, we sequenced 691,390 SAGE tags from four libraries. Cervical L-SAGE libraries N1, N2, C1, and C2 were sequenced to 165,624, 181,224, 173,534, and 171,008 tags, respectively. Duplicate ditags were eliminated from the analysis, resulting in 136,276, 139,656, 154,828, and 136,386 useful tags, respectively, and a total of 24,058 unique tags (Figure ). Of the unique tags, 15,438 mapped to annotated UniGene identifiers. The raw sequence tag data have been made publicly available (Gene Expression Omnibus, series accession number GSE6252). We characterized the transcriptome of normal cervical tissue and evaluated the highly expressed genes in terms of tissue specificity, concordant expression among the normal libraries, and altered expression in CIN III lesions (Figure ).
Figure 1 Flow diagram of SAGE analysis and tag-to-gene mapping. A. Sequence tags yielded from the four SAGE libraries were categorized. Useful tags indicate all sequenced tags less duplicate ditags. B. The abundance and classification of unique tags in the SAGE (more ...)
Genes Highly Expressed in Normal Cervical Epithelium
118 unique tags were found to be highly expressed in the normal cervical epithelium (at >500 tpm in both normal libraries). 103 of these tags mapped to UniGene clusters and represent 100 unique genes and hypothetical proteins (Figure ). Manual examination of tags not mapped by SAGE Genie yielded three additional tags. This results in a total of 107 unique tag-to-gene mappings and 103 unique genes. The abundance of the 118 tags and the genes they represent are summarized in Table .
Tags expressed in normal cervical libraries at ≥500 tags per million.
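The selection of highly expressed tags can be illustrated with a minimal sketch. It assumes raw counts are scaled by the number of useful tags in each library and that a tag must reach the cutoff in both normal libraries; the tag names and counts below are hypothetical placeholders, and the function names are illustrative rather than part of any SAGE software.

```python
def tags_per_million(tag_counts, library_size):
    """Scale raw tag counts to tags per million (tpm) for one library."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {tag: count * 1_000_000 / library_size
            for tag, count in tag_counts.items()}


def highly_expressed(tpm_a, tpm_b, threshold=500.0):
    """Return the tags at or above the threshold in BOTH libraries, sorted."""
    shared = tpm_a.keys() & tpm_b.keys()
    return sorted(tag for tag in shared
                  if tpm_a[tag] >= threshold and tpm_b[tag] >= threshold)


# Useful-tag totals for N1 and N2 are taken from the text;
# the per-tag counts are made up for illustration only.
n1_tpm = tags_per_million({"TAG_A": 100, "TAG_B": 70, "TAG_C": 5}, 136_276)
n2_tpm = tags_per_million({"TAG_A": 120, "TAG_B": 60, "TAG_C": 4}, 139_656)

selected = highly_expressed(n1_tpm, n2_tpm)
```

In this toy input only TAG_A passes: TAG_B exceeds 500 tpm in the first library but falls below it in the second, so it is excluded by the requirement that both normal libraries agree.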
To determine cervical tissue-specific expression, we first investigated the expression of the 107 genes using expression data available at the National Center for Biotechnology Information (NCBI) UniGene database and the National Cancer Institute (NCI) Cancer Genome Anatomy Project (CGAP) SAGE Anatomical Viewer. Based on CGAP information, only four of the 107 genes were unique to cervical tissue: carcinoembryonic antigen-related cell adhesion molecule 7 (CEACAM7), keratin 6A (KRT6A), small proline-rich protein 3 (SPRR3) and S100 calcium binding protein A7 (S100A7). These genes were further investigated for expression by RT-PCR in 20 different tissue types and three normal cervical specimens (Figure ). CEACAM7 was expressed in colon, larynx, pancreas and two of the three normal cervical specimens. KRT6A expression was detected in placenta, thymus, tongue, prostate, larynx, colon, skin and all three normal cervical specimens. SPRR3 was strongly expressed in placenta, thymus, colon, tongue, larynx and all three normal cervical specimens. S100A7 was expressed in placenta, thymus, tongue and all three normal cervical specimens. All four genes were prominently expressed in the cervical epithelium, but the combination of all four was not expressed in any of the other tissues examined (Figure ).
Figure 2 Validation of tissue specificity of gene expression. Reverse transcriptase PCR of four genes in 20 tissue types and three normal cervical specimens. Heart (Ht), breast (Br), placenta (Pl), lung (Lg), liver (Lv), skeletal muscle (Sk), kidney (Kd), pancreas (more ...)
Disrupted Gene Expression in CIN III
All tags were assessed for altered expression in CIN III. Four hundred and seventy-six tags showed a greater than two-fold increase in CIN III and were expressed at greater than 15 tpm (see Additional file 1), while 315 tags were decreased in CIN III (see Additional file 2).
We determined whether the expression of the 107 unique tags that were highly expressed in normal cervical libraries (>500 tpm) was disrupted in CIN III. Comparison of expression levels in N1 and N2 to those in the CIN III libraries using the Z-test revealed five differentially expressed genes (Table ). Annexin 2 (ANXA2), galectin 7 (LGALS7) and connexin 43 (GJA1) exhibited decreased expression in CIN III (Z < -1.96), while aquaporin 3 (AQP3) and ribosomal-like protein 37 (RPL37) increased in expression (Z > 1.96). Real-time PCR was performed for all five genes on a panel of six new cervical specimens, three normal and three CIN III (Figure ). Expression results were normalized to the housekeeping genes ACTB and 18S (Figure and , respectively). The decrease in expression of ANXA2, LGALS7 and GJA1 in CIN III was confirmed, while the increase in expression of AQP3 and RPL37 was not.
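For readers unfamiliar with this kind of comparison, the following is a minimal sketch of a pooled two-proportion Z statistic, a form commonly applied to SAGE tag counts. It is an assumption that this matches the exact statistic used here, and pooling N1 with N2 and C1 with C2 before testing is likewise an assumption. The library totals are the useful-tag counts reported above; the per-tag counts are hypothetical.

```python
import math


def two_proportion_z(count_normal, total_normal, count_cin, total_cin):
    """Pooled two-proportion Z statistic.

    Positive values mean the tag makes up a larger fraction of the
    CIN III library than of the normal library; negative values mean
    a smaller fraction.
    """
    p_normal = count_normal / total_normal
    p_cin = count_cin / total_cin
    pooled = (count_normal + count_cin) / (total_normal + total_cin)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_normal + 1 / total_cin))
    if se == 0:
        return 0.0  # tag absent from (or saturating) both libraries
    return (p_cin - p_normal) / se


def classify(z, critical=1.96):
    """Label a Z score using the two-sided 5% critical value."""
    if z < -critical:
        return "decreased"
    if z > critical:
        return "increased"
    return "unchanged"


# Assumed pooling of the two normal and the two CIN III libraries.
NORMAL_TOTAL = 136_276 + 139_656
CIN_TOTAL = 154_828 + 136_386

# Hypothetical tag observed 300 times in normal and 150 times in CIN III.
z_example = two_proportion_z(300, NORMAL_TOTAL, 150, CIN_TOTAL)
label = classify(z_example)
```

With these made-up counts the statistic falls well below -1.96, so the tag would be labelled as decreased in CIN III. Because the 1.96 cutoff applies to each tag separately, screening many tags this way would normally call for independent validation, as was done here by real-time PCR.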
Highly expressed genes with altered expression in CIN III.
Figure 3 Summary of test panel quantitative PCR results of genes with altered expression in CIN III L-SAGE libraries. A panel of three new CIN III cases (CIN III A, CIN III B, CIN III C) was investigated for expression and compared to three new normal specimens. (more ...)
Viral (HPV 16) tags in L-SAGE libraries
HPV transcripts were also detected by L-SAGE. Tags from all four libraries were mapped against the genomes of HPV 16 and HPV 18. While no tags mapped to HPV 18, twelve tags from the CIN III libraries mapped to the more prevalent HPV 16 genome (Table ). The highest transcript counts of known genes belonged to E5 at 1,180 and 290 tpm and E2 at 240 and 20 tpm, in libraries C2 and C1, respectively. When compared by BLAST [8] against the RefSeq Genome collection, none of the twelve tags matched the human genome at 100% identity. All twelve tags were also mapped against human transcript sets (mitochondrial genome, RefSeq, UCSC gene set, UniGene, Ensembl, UCSC mRNA, UCSC EST, SAGEmap and SAGE Genie SAGE tag sets). No tags matched any of these transcript sets, with the exception of CATGCACGCTTTTTAATTACA and CATGTGTATGTATTAAAAATA, which mapped to human EST BF909200. The full-length EST sequence is 97% identical to the HPV 16 E5 gene and was likely amplified from HPV sequences in the originating uterine tumour lesion.
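The idea of mapping tags to a viral genome can be sketched as an exact-match lookup against every 21 bp window that starts at a CATG site on either strand of a reference sequence. This is a simplified illustration, not the mapping pipeline used in this study: the reference string below is a short made-up sequence rather than the HPV 16 genome, the function names are hypothetical, and wrap-around across the origin of a circular genome is ignored.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def virtual_tags(sequence, anchor="CATG", tag_length=21):
    """Index every full-length tag that begins at an anchor site.

    Returns {tag: [(strand, start), ...]}. Minus-strand starts are
    given in reverse-complement coordinates.
    """
    sequence = sequence.upper()
    tags = {}
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        start = seq.find(anchor)
        while start != -1:
            tag = seq[start:start + tag_length]
            if len(tag) == tag_length:  # skip sites too close to the end
                tags.setdefault(tag, []).append((strand, start))
            start = seq.find(anchor, start + 1)
    return tags


def map_tags(observed, reference_tags):
    """Keep only the observed tags with an exact match in the reference."""
    return {tag: reference_tags[tag]
            for tag in observed if tag in reference_tags}


# Toy reference: one tag quoted in the text, flanked by arbitrary bases.
toy_reference = "GGA" + "CATGCACGCTTTTTAATTACA" + "TTGC"
observed_tags = ["CATGCACGCTTTTTAATTACA", "CATGAAAAAAAAAAAAAAAAA"]

hits = map_tags(observed_tags, virtual_tags(toy_reference))
```

Exact matching of this kind only shows that a tag could have originated from the reference. Checking the same tags against human genome and transcript sets, as described above, is what rules out an alternative human origin.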
Tags mapping to the HPV 16 genome.