The highest rates of cervical cancer are found in developing countries. Frontline monitoring has reduced these rates in developed countries, and present-day screening programs primarily identify precancerous lesions termed cervical intraepithelial neoplasias (CIN). CIN lesions described as mild dysplasia (CIN I) are likely to regress spontaneously, whereas CIN III lesions (severe dysplasia) are likely to progress if untreated. Careful analysis of the gene expression changes that parallel progressive preinvasive neoplastic development should yield insight into the key causal events in cervical cancer development.
In this study, we identified gene expression changes across 16 cervical cases (CIN I, CIN II, CIN III and normal cervical epithelium) using the unbiased long serial analysis of gene expression (L-SAGE) method. The 16 L-SAGE libraries were sequenced to a combined total of 2,481,387 tags, creating the largest SAGE data collection for cervical tissue worldwide. We identified 222 genes differentially expressed between normal cervical tissue and CIN III. Many of these genes influence biological functions characteristic of cancer, such as cell death, cell growth/proliferation and cellular movement. Network interaction analysis of these genes identified multiple candidates that influence the regulation of cellular transcription through chromatin remodelling (SMARCC1, NCOR1, MRFAP1 and MORF4L2). Further, these expression events are concentrated at moderate dysplasia (CIN II), a critical junction in disease development, indicating a role for chromatin remodelling in cervical cancer development.
We have created a valuable, publicly available resource for the study of gene expression in precancerous cervical lesions. Our results indicate that deregulation of chromatin remodelling complex components and the factors influencing them occurs during the development of CIN lesions. Increased expression of the SWI/SNF-stabilizing molecule SMARCC1 and of other novel genes has not previously been described as an event in the early stages of dysplasia development; these findings therefore provide not only novel candidate markers for screening but also a biological function to target for treatment.
SAGE (serial analysis of gene expression) is a powerful method for analyzing gene expression across the entire transcriptome. Many well-developed SAGE tools are currently available. However, cross-comparison of different tissues is seldom addressed, which limits the identification of common and tissue-specific tumor markers.
To improve SAGE mining methods, we propose a novel function for cross-tissue comparison of SAGE data that combines set theory and logic with a unique “multi-pool method”, which analyzes multiple pools of pair-wise case-control comparisons individually. When all tissues are set to “inclusion”, the SAGE tag sequences common to all of them are mined. When one tissue type is set to “inclusion” and the other tissue types are not, the tag sequences specific to the selected tissue are generated. The results are reported as tags-per-million (TPM) and fold values and are also displayed visually on four color-gradient scales. The fold visualization display lists the top-scoring SAGE tag sequences, along with cluster plots. A user-defined matrix file supports cross-tissue comparison of libraries selected from publicly available databases or supplied by the user.
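The set-based mining described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the hSAGEing implementation: each "pool" is assumed to be the set of tags whose case/control TPM fold value reaches a cutoff, and the pseudocount, the 2-fold cutoff and all function names are hypothetical. "Inclusion" of every tissue then corresponds to a set intersection, and "inclusion" of one tissue with the others excluded corresponds to a set difference.

```python
# Hypothetical sketch of TPM/fold scoring and set-based cross-tissue tag mining.
# Not the hSAGEing code: pseudocount, fold cutoff and names are assumptions.
from typing import Dict, Set

Counts = Dict[str, int]  # SAGE tag sequence -> raw tag count in one library


def to_tpm(counts: Counts) -> Dict[str, float]:
    """Normalize raw tag counts to tags-per-million (TPM)."""
    total = sum(counts.values())
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


def fold_values(case: Counts, control: Counts, pseudo: float = 1.0) -> Dict[str, float]:
    """Case/control TPM ratio per tag; the pseudocount (an assumption) avoids division by zero."""
    case_tpm, ctrl_tpm = to_tpm(case), to_tpm(control)
    tags = set(case_tpm) | set(ctrl_tpm)
    return {t: (case_tpm.get(t, 0.0) + pseudo) / (ctrl_tpm.get(t, 0.0) + pseudo) for t in tags}


def pool(case: Counts, control: Counts, min_fold: float = 2.0) -> Set[str]:
    """One case-control pair ('pool'): tags up-regulated at least min_fold in the case."""
    return {t for t, f in fold_values(case, control).items() if f >= min_fold}


def common_tags(pools: Dict[str, Set[str]]) -> Set[str]:
    """Every tissue set to 'inclusion': tags present in all pools (intersection)."""
    return set.intersection(*pools.values())


def specific_tags(pools: Dict[str, Set[str]], tissue: str) -> Set[str]:
    """Only `tissue` included: its tags minus those of every other pool (difference)."""
    others = [tags for name, tags in pools.items() if name != tissue]
    return pools[tissue].difference(*others)


if __name__ == "__main__":
    pools = {
        "breast": {"AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG"},
        "colon": {"AAAAAAAAAA", "CCCCCCCCCC"},
        "lung": {"AAAAAAAAAA", "TTTTTTTTTT"},
    }
    print(common_tags(pools))              # {'AAAAAAAAAA'}
    print(specific_tags(pools, "breast"))  # {'GGGGGGGGGG'}
```

Keeping each pool as an independent case-control comparison means that a tag only has to pass the fold cutoff within its own pair before the set logic is applied across tissues.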
For the first time, the hSAGEing tool combines user-friendly cross-tissue analysis with an interface for comparing SAGE libraries. Several up- or down-regulated genes representing tissue-specific or common tumor markers and suppressors were identified computationally. The tool is useful and convenient for in silico cancer transcriptomic studies and is freely available at http://bio.kuas.edu.tw/hSAGEing.
Serial Analysis of Gene Expression (SAGE) is a powerful tool for determining gene expression profiles. SAGE libraries are classified as ShortSAGE or LongSAGE according to tag length (10 vs. 17 basepairs). LongSAGE libraries are thought to be more useful than ShortSAGE libraries, but the information content of the two types has not been widely compared. To dissect the differences between them, we used four libraries (two LongSAGE and two ShortSAGE) generated from the hippocampus of Alzheimer's disease (AD) and control samples. In addition, we generated two further short SAGE libraries, the truncated LongSAGE (tSAGE) libraries, from the LongSAGE libraries by deleting seven 5' basepairs from each LongSAGE tag.
One problem in SAGE studies is that, because tags are short, an individual tag may match multiple different genes. We found that a LongSAGE tag maps to at most 15 UniGene clusters, whereas ShortSAGE and tSAGE tags map to as many as 279 UniGene clusters. Both long and short SAGE libraries contain a large number of orphan tags (tags with no gene information in UniGene), reflecting limitations of the UniGene database. Among 100 orphan LongSAGE tags, the complete 17-basepair sequences of nine match 17 genomic sequences, and four of these orphan tags match a single genomic sequence. Our data show the potential to resolve 4–9% of orphan LongSAGE tags. Finally, among 400 tSAGE tags showing significant differential expression between AD and control, 79 tags (19.8%) were derived from multiple non-significant LongSAGE tags, indicating false-positive results.
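The tSAGE construction and the collapse of several LongSAGE tags onto one short tag can be sketched as follows. This is a hypothetical illustration: the function names are invented, tags are assumed to be stored as 17-base strings without the CATG anchoring site, and which end is trimmed depends on the orientation in which tags are recorded, so it is a parameter whose default follows the description above (seven bases removed from the 5' end). Summing the counts of LongSAGE tags that share a short tag shows how several distinct, individually non-significant LongSAGE tags can merge into a single tSAGE tag.

```python
# Hypothetical sketch: derive a truncated (tSAGE) library from LongSAGE counts.
# Assumes 17-base tags without the CATG anchor; the trimmed end is configurable.
from collections import Counter, defaultdict
from typing import Dict, List

LONG_LEN, TRIM = 17, 7


def truncate(tag: str, trim: int = TRIM, from_5prime: bool = True) -> str:
    """Drop `trim` bases from the 5' end (default) or the 3' end of one tag."""
    if len(tag) != LONG_LEN:
        raise ValueError(f"expected a {LONG_LEN}-base LongSAGE tag, got {tag!r}")
    return tag[trim:] if from_5prime else tag[:-trim]


def truncate_library(long_counts: Dict[str, int], from_5prime: bool = True) -> Dict[str, int]:
    """Sum the counts of all LongSAGE tags that collapse onto the same short tag."""
    short: Counter = Counter()
    for tag, n in long_counts.items():
        short[truncate(tag, from_5prime=from_5prime)] += n
    return dict(short)


def collapsed_sources(long_counts: Dict[str, int], from_5prime: bool = True) -> Dict[str, List[str]]:
    """Short tags derived from more than one distinct LongSAGE tag."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for tag in long_counts:
        groups[truncate(tag, from_5prime=from_5prime)].append(tag)
    return {s: sorted(v) for s, v in groups.items() if len(v) > 1}


if __name__ == "__main__":
    long_lib = {
        "AAAAAAACCCCCCCCCC": 3,
        "GGGGGGGCCCCCCCCCC": 2,
        "TTTTTTTGGGGGGGGGG": 4,
    }
    print(truncate_library(long_lib))   # {'CCCCCCCCCC': 5, 'GGGGGGGGGG': 4}
    print(collapsed_sources(long_lib))  # {'CCCCCCCCCC': ['AAAAAAACCCCCCCCCC', 'GGGGGGGCCCCCCCCCC']}
```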
Our data show that LongSAGE tags map to genes with higher specificity than ShortSAGE tags. LongSAGE tags also have an advantage over ShortSAGE tags in identifying novel genes by BLAST analysis. Most importantly, because of their lower mapping specificity, ShortSAGE libraries are more likely than LongSAGE libraries to yield false-positive results. We therefore recommend considering the number of UniGene clusters (genes or ESTs) to which a tag maps when prioritizing significant results.
To facilitate the identification of gene products important in regulating renal glomerular structure and function, we have produced an annotated transcriptome database for normal human glomeruli using the SAGE approach.
The database contains 22,907 unique SAGE tag sequences, with a total tag count of 48,905. For each SAGE tag, we calculated the ratio of its frequency in glomeruli to its frequency in 115 non-glomerular tissues or cells as a measure of transcript enrichment in glomeruli. A total of 133 SAGE tags representing well-characterized transcripts were enriched 10-fold or more in glomeruli compared with other tissues. Comparison with a previous human glomerular Sau3A-anchored SAGE library revealed that 47 of the highly enriched transcripts are common to both libraries, including the SAGE tags for many podocyte-predominant transcripts such as WT-1, podocin and synaptopodin. This enrichment of podocyte transcript tags indicates that other SAGE tags observed at much higher frequencies in this glomerular library than in non-glomerular SAGE libraries are likely to be glomerulus-predominant. For 19 transcripts represented by glomerulus-enriched SAGE tags, higher mRNA expression in glomeruli than in lung, liver and spleen was verified by RT-PCR.
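The enrichment measure can be illustrated with a short sketch. It assumes, hypothetically, that a tag's frequency is its count divided by library size, that the 115 reference libraries are pooled into a single background, and that a pseudocount of one tag is added to the background so that tags absent from every reference library still receive a finite ratio; the original analysis may have combined the reference libraries differently, and the tag names are placeholders.

```python
# Hypothetical sketch of a glomerular enrichment ratio for SAGE tags.
# Pooling of the reference libraries and the pseudocount are assumptions.
from collections import Counter
from typing import Dict, Iterable, Set


def enrichment_ratios(target: Dict[str, int],
                      references: Iterable[Dict[str, int]],
                      pseudo: float = 1.0) -> Dict[str, float]:
    """Frequency of each target tag divided by its frequency in the pooled references."""
    pooled: Counter = Counter()
    for lib in references:
        pooled.update(lib)  # add counts library by library
    target_total = sum(target.values())
    ref_total = sum(pooled.values())
    return {
        tag: (n / target_total) / ((pooled[tag] + pseudo) / ref_total)
        for tag, n in target.items()
    }


def enriched(ratios: Dict[str, float], min_ratio: float = 10.0) -> Set[str]:
    """Tags enriched at least `min_ratio`-fold (10-fold in the text above)."""
    return {tag for tag, r in ratios.items() if r >= min_ratio}


if __name__ == "__main__":
    glom = {"tagA": 50, "tagB": 50}
    others = [{"tagA": 1, "tagB": 99}, {"tagB": 100}]
    print(enriched(enrichment_ratios(glom, others)))  # {'tagA'}
```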
The database can be downloaded from, or queried online at, http://cgap.nci.nih.gov/SAGE. The annotated database is also provided as an additional file, with gene identifications for 9,022 tags and matches to the human genome or to transcript homologs in other species for 1,433 tags. It should be a useful tool for in silico mining of glomerular gene expression.
To develop large-scale, high-throughput annotation of the human macula transcriptome and to identify and prioritize candidate genes for inherited retinal dystrophies, based on ocular-expression profiles using serial analysis of gene expression (SAGE).
Two human retina and two retinal pigment epithelium (RPE)/choroid SAGE libraries, made from matched macula or midperipheral retina and adjacent RPE/choroid of morphologically normal 28- to 66-year-old donors, and a human central retina longSAGE library made from 41- to 66-year-old donors were generated. Their transcription profiles were entered into a relational database, EyeSAGE, which also includes microarray expression profiles of retina and publicly available normal human tissue SAGE libraries. EyeSAGE was used to identify retina- and RPE-specific and -associated genes, and candidate genes for retina and RPE disease loci. Differential and/or cell-type-specific expression was validated by quantitative and single-cell RT-PCR.
Cone photoreceptor-associated gene expression was elevated in the macula transcription profiles. Analysis of the longSAGE retina tags improved tag-to-gene mapping and revealed alternatively spliced genes. When the candidate gene expression table for the BBS5 disease region was analyzed, the already identified Bardet-Biedl syndrome gene BBS5 ranked as the top candidate. Compelling candidates for inherited retinal diseases were identified.
The EyeSAGE database, combining three different gene-profiling platforms including the authors’ multidonor-derived retina/RPE SAGE libraries and existing single-donor retina/RPE libraries, is a powerful resource for definition of the retina and RPE transcriptomes. It can be used to identify retina-specific genes, including alternatively spliced transcripts and to prioritize candidate genes within mapped retinal disease regions.
Exfoliated cervical cells are used in cytology-based cancer screening and may also be a source of molecular biomarkers indicative of neoplastic changes in the underlying tissue. However, because of keratinization and terminal differentiation, it is not clear whether these cells have an mRNA profile representative of cervical tissue, or whether that profile can distinguish the lesions targeted for early detection.
We used whole-genome microarrays (25,353 unique genes) to compare the transcription profiles of seven samples of normal exfoliated cells and one cervical tissue sample. We detected 10,158 genes in exfoliated cells, 14,544 in the tissue and 7,320 in both sample types. For both sample types, the genes grouped into the same major gene ontology (GO) categories in the same order, with exfoliated cells having on average 20% fewer genes in each category. We also compared microarray results from women with cervical intraepithelial neoplasia grade 3 (CIN3, n = 15) with those from age- and race-matched women without significant abnormalities (CIN1, CIN0; n = 15). We used three microarray-adapted statistical packages to identify differential gene expression. The six genes identified by all three packages were upregulated two- to four-fold in CIN3 samples. One of these genes, ubiquitin-conjugating enzyme E2 variant 1, participates in the degradation of p53 through interaction with the oncogenic HPV E6 protein.
The findings encourage further exploration of gene expression using exfoliated cells to identify and validate applicable biomarkers. We conclude that the gene expression profile of exfoliated cervical cells partially represents that of tissue and is complex enough to provide potential differentiation between disease and non-disease.
Sixteen longSAGE libraries from four different clinical stages of cervical intraepithelial neoplasia (CIN) have enabled us to identify novel cell-surface biomarkers indicative of CIN stage. By comparing gene expression profiles of cervical tissue at early and advanced stages of CIN, we identified several genes as novel genetic markers. We present fifty-six cell-surface gene products differentially expressed during progression of CIN. These cell-surface proteins are being examined to establish their capacity to bind optical contrast agents. Contrast agent visualization would allow real-time assessment of the physiological state of the disease process and could substantially benefit cancer care. The data discussed in this publication have been submitted to NCBI's Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) and are accessible through GEO Series accession number GSE6252.
longSAGE; cervical cancer; biomarker; optical imaging
Genomic and transcriptomic alterations affecting key cellular processes such as cell proliferation, differentiation and genomic stability are considered crucial for the development and progression of cancer. Most invasive breast carcinomas are known to derive from precursor in situ lesions. It has been proposed that major global expression abnormalities occur in the transition from normal to premalignant stages and in further progression to invasive stages. Serial analysis of gene expression (SAGE) was employed to generate a comprehensive global gene expression profile of the major changes occurring during the malignant evolution of breast cancer.
In the present study, we combined various normal and tumor SAGE libraries available in the public domain with sets of breast cancer SAGE libraries recently generated and sequenced in our laboratory. A recently developed modified t test was used to detect differentially expressed genes.
We accumulated a total of approximately 1.7 million breast tissue-specific SAGE tags and monitored the behavior of more than 25,157 genes during early breast carcinogenesis. We detected 52 transcripts commonly deregulated across the board when comparing normal tissue with ductal carcinoma in situ, and 149 transcripts when comparing ductal carcinoma in situ with invasive ductal carcinoma (P < 0.01).
A major novelty of our study was the use of a statistical method that correctly accounts for both within-library (intra-SAGE) and between-library (inter-SAGE) sources of variation. The most useful result of applying this beta-binomial modified t statistic is the identification of genes and gene families commonly deregulated across samples within each specific stage of the transition from normal to preinvasive and invasive stages of breast cancer development. Most of the gene expression abnormalities detected at the in situ stage involved specific genes that regulate the proper homeostasis between cell death and cell proliferation. The comparison of in situ lesions with fully invasive lesions, a much more heterogeneous group, clearly identified transcripts encoding various families of proteins involved in extracellular matrix remodeling, invasion and cell motility as the most strongly deregulated group.
breast cancer; gene expression profiling; serial analysis of gene expression
D2-40 has been shown to be a selective marker for lymphatic endothelium, but it also stains benign cervical basal cells. However, the use of D2-40 immunoreactivity in cervical basal cells to identify the grade of cervical intraepithelial neoplasia (CIN) has not been evaluated.
In this study, the immunoreactive patterns of D2-40 were compared with those of p16INK4A, which is currently considered a useful marker for cervical cancers and their precancerous lesions, in a total of 125 cervical specimens: 32 CIN1, 37 CIN2, 35 CIN3 and 21 normal cervical tissues. D2-40 and p16INK4A immunoreactivities were scored semiquantitatively according to the intensity and/or extent of staining.
Diffuse D2-40 expression with moderate-to-strong intensity was seen in all normal cervical epithelia (21/21, 100%), and a similar pattern of D2-40 immunoreactivity with weak-to-strong intensity was observed in CIN1 (31/32, 96.9%). In contrast, negative and/or focal D2-40 expression was found in CIN2 (negative: 20/37, 54.1%; focal: 16/37, 43.2%) and CIN3 (negative: 22/35, 62.9%; focal: 12/35, 34.3%). Diffuse immunostaining for p16INK4A, on the other hand, was seen in 37.5% of CIN1, 64.9% of CIN2 and 80.0% of CIN3. However, the D2-40 immunoreactive pattern was not associated with p16INK4A immunoreactivity.
Immunohistochemical analysis of D2-40 combined with p16INK4A may be useful in clinical practice for identifying the grade of cervical intraepithelial neoplasia more accurately, especially for distinguishing CIN1 from CIN2/3.
D2-40; cervical intraepithelial neoplasia; immunohistochemistry; p16INK4A
To identify the genes expressed in normal human trabecular meshwork tissue, a tissue critical to the pathogenesis of glaucoma.
Total RNA was extracted from human trabecular meshwork (HTM) harvested from 3 different donors. Extracted RNA was used to synthesize individual SAGE (serial analysis of gene expression) libraries using the I-SAGE Long kit from Invitrogen. Libraries were analyzed using SAGE 2000 software to extract the 17 base pair sequence tags. The extracted sequence tags were mapped to the genome using SAGE Genie map.
A total of 298,834 SAGE tags were identified from all HTM libraries (96,842, 88,126 and 113,866 tags, respectively). Collectively, there were 107,325 unique tags. Of these, 10,329 unique tags had a count of at least 2 in a single library, and these tags were mapped to known UniGene clusters. Approximately 29% of the tags (orphan tags) did not map to a known UniGene cluster, and 13% mapped to at least 2 UniGene clusters. Sequence tags from many glaucoma-related genes, including myocilin, optineurin and WD repeat domain 36, were identified.
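The filtering and mapping summary reported above can be sketched as follows. The tag-to-cluster dictionary stands in for a mapping such as the one SAGE Genie provides; its format, the function names and the cluster identifiers ("Hs.A" and so on are placeholders, not real UniGene IDs) are hypothetical. Note that the count threshold is applied within a single library rather than to counts summed across libraries, matching the criterion described in the text.

```python
# Hypothetical sketch: keep tags seen >= 2 times in at least one library,
# then count orphan, uniquely mapped and multiply mapped tags.
from typing import Dict, Iterable, List, Mapping, Set


def reproducible_tags(libraries: Iterable[Mapping[str, int]], min_count: int = 2) -> Set[str]:
    """Tags reaching `min_count` within a single library (counts are not summed across libraries)."""
    return {tag for lib in libraries for tag, n in lib.items() if n >= min_count}


def mapping_summary(tags: Iterable[str], tag_to_clusters: Mapping[str, List[str]]) -> Dict[str, int]:
    """Classify tags by how many UniGene clusters they map to."""
    summary = {"orphan": 0, "unique": 0, "multiple": 0}
    for tag in tags:
        k = len(tag_to_clusters.get(tag, []))
        if k == 0:
            summary["orphan"] += 1    # no known cluster
        elif k == 1:
            summary["unique"] += 1    # exactly one cluster
        else:
            summary["multiple"] += 1  # ambiguous: two or more clusters
    return summary


if __name__ == "__main__":
    libs = [{"tag1": 2, "tag2": 1}, {"tag2": 1, "tag3": 5}, {"tag4": 3}]
    clusters = {"tag1": ["Hs.A"], "tag3": ["Hs.B", "Hs.C"]}
    kept = reproducible_tags(libs)
    print(sorted(kept))                     # ['tag1', 'tag3', 'tag4']
    print(mapping_summary(kept, clusters))  # {'orphan': 1, 'unique': 1, 'multiple': 1}
```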
This is the first time SAGE analysis has been used to characterize the gene expression profile in normal HTM. SAGE analysis provides an unbiased sampling of gene expression of the target tissue. These data will provide new and valuable information to improve understanding of the biology of human aqueous outflow.
Serial Analysis of Gene Expression (SAGE) is becoming a widely used gene expression profiling method for the study of development, cancer and other human diseases. Investigators using SAGE rely heavily on the quantitative aspect of this method for cataloging gene expression and comparing multiple SAGE libraries. We have developed additional computational and statistical tools to assess the quality and reproducibility of a SAGE library. Using these methods, we identified a critical variable in the SAGE protocol that can bias the Tag distribution according to the GC content of the 10 bp SAGE Tag DNA sequence. We also detected this bias in a number of publicly available SAGE libraries. Notably, the GC content bias went undetected by the quality control procedures in the current SAGE protocol and was identified only with these statistical analyses, using as few as 750 SAGE Tags. Investigators should keep any solution of free DiTags on ice and should analyze GC content before sequencing large numbers of SAGE Tags, to be confident that their SAGE libraries are free from experimental bias.
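A simple GC-content check of the kind recommended above can be sketched as follows. It is not the statistical analysis used in this work: it only tabulates the fraction of sequenced Tags in each GC class (0 to 10 G or C bases in a 10 bp Tag) and the count-weighted mean GC fraction, so that a pilot sample of a few hundred Tags can be compared with a reference library before large-scale sequencing. The function names are hypothetical.

```python
# Hypothetical GC-content check for 10 bp SAGE Tags (not the authors' statistics).
from typing import Dict, List

TAG_LEN = 10


def gc_count(tag: str) -> int:
    """Number of G or C bases in one Tag."""
    return sum(base in "GC" for base in tag.upper())


def gc_profile(counts: Dict[str, int]) -> List[float]:
    """Fraction of all sequenced Tags falling in each GC class 0..TAG_LEN."""
    bins = [0] * (TAG_LEN + 1)
    for tag, n in counts.items():
        if len(tag) != TAG_LEN:
            raise ValueError(f"expected a {TAG_LEN} bp Tag, got {tag!r}")
        bins[gc_count(tag)] += n
    total = sum(bins)
    return [b / total for b in bins]


def mean_gc(counts: Dict[str, int]) -> float:
    """Count-weighted mean GC fraction of the library."""
    total = sum(counts.values())
    return sum(gc_count(t) * n for t, n in counts.items()) / (TAG_LEN * total)


if __name__ == "__main__":
    pilot = {"GGGGGCCCCC": 3, "AAAAATTTTT": 1}
    profile = gc_profile(pilot)
    print(profile[0], profile[10])  # 0.25 0.75
    print(mean_gc(pilot))           # 0.75
```

Comparing the per-class fractions of a pilot sample against those of a trusted library is one way to spot a shift toward GC-rich or GC-poor Tags early; a formal test of the difference would be a separate step.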
Toxoplasma gondii causes toxoplasmosis, one of the most prevalent parasitic diseases of animals and humans. Transformation of the tachyzoite stage into the latent bradyzoite-cyst form underlies chronic disease and leads to a lifetime risk of recrudescence in individuals whose immune system becomes compromised. Given the importance of tissue cyst formation, there has been intensive focus on developing methods to study bradyzoite differentiation, although the molecular basis of the developmental switch is still largely unknown.
We used serial analysis of gene expression (SAGE) to define the Toxoplasma gondii transcriptome of the intermediate-host life cycle that leads to the formation of the bradyzoite/tissue cyst. A broad view of gene expression is provided by >4-fold coverage from nine distinct libraries (~300,000 SAGE tags) representing key developmental transitions in primary parasite populations and in laboratory strains representing the three canonical genotypes. SAGE tags and their corresponding mRNAs were analyzed with respect to abundance, uniqueness, sense/antisense polarity, chromosomal distribution and developmental specificity.
This study demonstrates that phenotypic transitions during parasite development were marked by unique stage-specific mRNAs, which accounted for 18% of the total SAGE tags and ranged from 1–5% of the tags in each developmental stage. We also found that Toxoplasma mRNA pools have a unique parasite-specific composition, with 1 in 5 transcripts encoding Apicomplexa-specific genes that function in parasite invasion and transmission. Developmentally co-regulated genes were dispersed across all Toxoplasma chromosomes, as were tags representing each abundance class and a variety of biochemical pathways, indicating that trans-acting mechanisms likely control gene expression in this parasite. The specificity and expression levels of mRNAs in primary populations at day 6 after sporozoite infection, before the onset of bradyzoite development, showed distinct similarities that were uniquely shared with the virulent Type I-RH laboratory strain, suggesting that development of RH may be arrested. By contrast, the Type II-Me49B7 and Type III-VEGmsj strains contain SAGE tags corresponding to bradyzoite genes, suggesting that priming of developmental expression plays a role in the greater capacity of these strains to complete bradyzoite development.
Head and neck squamous cell carcinoma (HNSCC) is one of the most common malignancies in humans. Its average 5-year survival rate is one of the lowest among aggressive cancers and has shown no significant improvement in recent years. When detected early, HNSCC has a good prognosis, but most patients present with metastatic disease at the time of diagnosis, which significantly reduces survival. Despite extensive research, no molecular markers are currently available for diagnostic or prognostic purposes.
Aiming to identify differentially expressed genes involved in laryngeal squamous cell carcinoma (LSCC) development and progression, we generated individual Serial Analysis of Gene Expression (SAGE) libraries from a metastatic and a non-metastatic larynx carcinoma, as well as from a normal larynx mucosa sample. Approximately 54,000 unique tags were sequenced across the three libraries.
Statistical data analysis identified a subset of 1,216 differentially expressed tags between tumor and normal libraries, and 894 differentially expressed tags between metastatic and non-metastatic carcinomas. Three genes displaying differential regulation, one down-regulated (KRT31) and two up-regulated (BST2, MFAP2), as well as one with a non-significant differential expression pattern (GNA15) in our SAGE data were selected for real-time polymerase chain reaction (PCR) in a set of HNSCC samples. Consistent with our statistical analysis, quantitative PCR confirmed the upregulation of BST2 and MFAP2 and the downregulation of KRT31 when samples of HNSCC were compared to tumor-free surgical margins. As expected, GNA15 presented a non-significant differential expression pattern when tumor samples were compared to normal tissues.
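The statistical test used for the tag comparisons above is not named here. One common way to compare a single tag between two SAGE libraries is a 2×2 contingency test on the tag's count versus all other tags in each library; the sketch below implements a Pearson chi-square test with one degree of freedom (no continuity correction) using only the Python standard library. It is an illustration, not necessarily the authors' method, and a real analysis would also correct for multiple testing across thousands of tags.

```python
# Pearson chi-square test (1 df, no continuity correction) for one SAGE tag
# observed x1 times among n1 tags in library 1 and x2 times among n2 in library 2.
# Illustrative only; not necessarily the test used in the study.
import math
from typing import Tuple


def tag_chi2(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) for the 2x2 table of tag vs other tags."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    grand = n1 + n2
    if col_totals[0] == 0 or col_totals[1] == 0:
        return 0.0, 1.0  # tag absent from both libraries (or nothing else seen): no evidence
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Tag seen 50 times in 10,000 tumor tags vs 5 times in 10,000 normal tags.
    stat, p = tag_chi2(50, 10_000, 5, 10_000)
    print(f"chi2={stat:.1f}, p={p:.2g}")
```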
To the best of our knowledge, this is the first study reporting SAGE data in head and neck squamous cell tumors. Statistical analysis was effective in identifying differentially expressed genes reportedly involved in cancer development. The differential expression of a subset of genes was confirmed in additional larynx carcinoma samples and in carcinomas from a distinct head and neck subsite. This result suggests that common biomarkers for prognosis and targeted-therapy development may exist in this heterogeneous type of tumor.
Rice blast, caused by the fungal pathogen Magnaporthe grisea, is a devastating disease causing tremendous yield loss in rice production. The public availability of the complete genome sequence of M. grisea provides ample opportunities to understand the molecular mechanism of its pathogenesis on rice plants at the transcriptome level. To identify all the expressed genes encoded in the fungal genome, we have analyzed the mycelium and appressorium transcriptomes using massively parallel signature sequencing (MPSS), robust-long serial analysis of gene expression (RL-SAGE) and oligoarray methods.
The MPSS analyses identified 12,531 and 12,927 distinct significant tags from mycelia and appressoria, respectively, while the RL-SAGE analysis identified 16,580 distinct significant tags from the mycelial library. When these 12,531 mycelial and 12,927 appressorial significant tags were matched to the annotated coding sequences (CDS) and to the regions 500 bp upstream and 500 bp downstream of each CDS, 6,735 unique genes were identified in mycelia and 7,686 in appressoria. A total of 7,135 mycelium-specific and 7,531 appressorium-specific significant MPSS tags were identified, corresponding to 2,088 and 1,784 annotated genes, respectively, when matched to the same set of reference sequences. Nearly 85% of the significant MPSS tags from mycelia and appressoria and 65% of the significant tags from the RL-SAGE mycelium library matched the M. grisea genome. The MPSS and RL-SAGE methods supported the expression of more than 9,000 genes, representing over 80% of the predicted genes in M. grisea. About 40% of the MPSS tags and 55% of the RL-SAGE tags represent novel transcripts, since they had no matches in the existing M. grisea EST collections. Over 19% of the annotated genes were found to produce both sense and antisense tags in the protein-coding region. The oligoarray analysis identified the expression of 3,793 mycelium-specific and 4,652 appressorium-specific genes. A total of 2,430 mycelial genes and 1,886 appressorial genes were identified by both MPSS and oligoarray.
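The matching step described above, in which tags are assigned to an annotated CDS extended by 500 bp on each side and classified as sense or antisense, can be sketched as follows. The sketch assumes that each tag has already been placed at a single genomic position and strand, uses a linear scan for clarity, and relies on hypothetical gene records, names and a 1-based inclusive coordinate convention; a genome-scale analysis would use an interval index instead.

```python
# Hypothetical sketch: assign a genome-matched tag to annotated CDS regions
# extended by a flank on each side, and label it sense or antisense.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cds:
    name: str
    start: int   # first base of the CDS (1-based, inclusive; assumed convention)
    end: int     # last base of the CDS
    strand: str  # "+" or "-"


def assign_tag(pos: int, strand: str, genes: List[Cds], flank: int = 500) -> List[Tuple[str, str]]:
    """Return (gene, 'sense' or 'antisense') for every CDS whose flanked window contains pos."""
    hits = []
    for g in genes:
        # A symmetric flank covers 500 bp upstream and downstream on either strand.
        if g.start - flank <= pos <= g.end + flank:
            orientation = "sense" if strand == g.strand else "antisense"
            hits.append((g.name, orientation))
    return hits


if __name__ == "__main__":
    genes = [Cds("geneA", 1_000, 2_000, "+"), Cds("geneB", 5_000, 6_000, "-")]
    print(assign_tag(600, "+", genes))    # [('geneA', 'sense')]
    print(assign_tag(2_400, "-", genes))  # [('geneA', 'antisense')]
    print(assign_tag(3_000, "+", genes))  # []
```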
The comprehensive and deep transcriptome analysis by MPSS and RL-SAGE methods identified many novel sense and antisense transcripts in the M. grisea genome at two important growth stages. The differentially expressed transcripts that were identified, especially those specifically expressed in appressoria, represent a genomic resource useful for gaining a better understanding of the molecular basis of M. grisea pathogenicity. Further analysis of the novel antisense transcripts will provide new insights into the regulation and function of these genes in fungal growth, development and pathogenesis in the host plants.
Cervical cancer is the second most common female cancer worldwide. The ability to quantify physiological and morphological changes in the cervix is not only useful in diagnosing cervical precancers but also important for designing cost-effective detection systems for developing countries that lack well-established screening and diagnostic programs. We assessed the capability of a diffuse reflectance spectroscopy technique to identify contrasts in optical biomarkers that distinguish different grades of cervical intraepithelial neoplasia (CIN) from normal cervical tissues. The technology consists of an optical probe, an instrument (with a broadband light source, dispersive element and detector), and a Monte Carlo algorithm that extracts optical biomarker contributions, including total hemoglobin (Hb) concentration, Hb saturation and reduced scattering coefficient, from the measured spectra. Of 38 patients and 89 sites examined, 46 squamous normal sites, 18 CIN 1 sites and 15 CIN 2+ sites were included in the analysis. Total Hb was significantly higher in CIN 2+ (18.3 ± 3.6 µM, mean ± SE) than in normal tissue (9.58 ± 1.91 µM) and CIN 1 (12.8 ± 2.6 µM), whereas scattering was significantly reduced in CIN 1 (8.3 ± 0.8 cm⁻¹) and CIN 2+ (8.6 ± 1.0 cm⁻¹) compared with normal tissue (10.2 ± 1.1 cm⁻¹). Hb saturation was not significantly altered in CIN 2+ compared with normal tissue and CIN 1. The difference in total Hb likely reflects stromal angiogenesis, whereas the decreased scattering can be attributed to breakdown of the collagen network in the cervical stroma.
Cervical carcinoma is the second most frequent cancer in women. Much of the success in diagnosing this disease is due to the Pap test (cytological smear analysis). However, the Pap test gives a significant proportion of both false-positive and false-negative results, so improvements to the diagnostic procedure are desirable. The aetiological role of papillomaviruses in cervical cancer is established, whereas the role of cellular gene alterations in the course of tumor progression is less clear. Several research groups, including ours, have recently proposed the protein p16INK4a as a possible diagnostic marker of cervical cancer. To evaluate whether p16INK4a expression is specific enough to dysplastic and neoplastic cervical epithelium for such an application, we undertook a broader immunohistochemical study of this protein with a highly p16INK4a-specific monoclonal antibody.
Paraffin-embedded samples of diagnostic biopsies and surgical materials were used. The control group included vaginal smears from healthy women and biopsy samples from patients with cervical ectopia. We examined 197 samples in total, using the monoclonal antibody E6H4 (MTM Laboratories, Germany).
In control samples we did not find any p16INK4a-positive cells. Overexpression of p16INK4a was detected in samples of cervical dysplasia (CIN) and carcinoma. The proportion of p16INK4a-positive samples increased in the order CIN I, CIN II, CIN III, invasive carcinoma. At every stage, the samples were heterogeneous with respect to p16INK4a expression. One in three CIN III samples and one of the 21 invasive squamous cell carcinomas analyzed were negative.
Overexpression of the protein p16INK4a is typical of dysplastic and neoplastic epithelium of the cervix uteri. However, p16INK4a-negative CINs and carcinomas do exist, and all stages of CIN and carcinoma analyzed are heterogeneous with respect to p16INK4a expression. Therefore, p16INK4a negativity is not sufficient reason to exclude a patient from the high-risk group. Because normal cervical epithelium is p16INK4a-negative and the ratio of p16INK4a-positive to p16INK4a-negative samples increases at advanced stages, immunohisto-/cytochemical testing for p16INK4a may be regarded as a supplementary test for the early diagnosis of cervical cancer.
Deep transcriptome analysis will underpin a large fraction of post-genomic biology. 'Closed' technologies, such as microarray analysis, detect only the set of transcripts chosen for analysis, whereas 'open' technologies, e.g. tag-based methods, are capable of identifying all possible transcripts, including those that were previously uncharacterized. Although new technologies are now emerging, at present the major resources for open-type analysis are the many publicly available SAGE (serial analysis of gene expression) and MPSS (massively parallel signature sequencing) libraries. These technologies have never been compared for their utility in deep transcriptome mining.
We used a single LongSAGE library of 503,431 tags and a "classic" MPSS library of 1,744,173 tags, both prepared from the same T cell-derived RNA sample, to compare the ability of each method to probe, at considerable depth, a human cellular transcriptome. We show that even though LongSAGE is more error-prone than MPSS, our LongSAGE library nevertheless generated 6.3-fold more genome-matching (and therefore likely error-free) tags than the MPSS library. An analysis of a set of 8,132 known genes detectable by both methods, for which there is no ambiguity about tag matching, shows that MPSS detects only about half (54%) as many transcripts as SAGE (1,955 versus 3,617). Analysis of two additional MPSS libraries shows that each library samples a different subset of transcripts, and that in combination the three MPSS libraries (4,274,992 tags in total) still detect only 73% of the genes identified in our test set using SAGE. The fraction of transcripts detected by MPSS is likely to be even lower for uncharacterized transcripts, which tend to be more weakly expressed. The source of the loss of complexity in MPSS libraries compared with SAGE is unclear, but its effects become more severe with each sequencing cycle (i.e. as MPSS tag length increases).
We show that MPSS libraries are significantly less complex than much smaller SAGE libraries, revealing a serious bias in the generation of MPSS data unlikely to have been circumvented by later technological improvements. Our results emphasize the need for the rigorous testing of new expression profiling technologies.
Serial Analysis of Gene Expression (SAGE) is a new technique that provides detailed quantitative and qualitative knowledge of a gene expression profile without prior knowledge of the sequences of the genes analyzed. We applied a modification of the SAGE methodology (microSAGE), suited to the analysis of limited quantities of tissue, to normal human cervical tissue obtained from a donor without histopathological lesions. Cervical epithelium consists mainly of cervical keratinocytes, which are the targets of human papillomavirus (HPV); persistent HPV infection of the cervical epithelium is associated with an increased risk of developing cervical carcinoma (CC).
We report here a SAGE transcriptome analysis of cervical tissue, derived from 30,418 sequenced tags. These tags provide a wealth of information about the gene products involved in the physiology of normal cervical epithelium, as well as about genes involved in epidermal differentiation that had not previously been found in uterine cervix tissue.
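Sequenced SAGE tags are not read one by one. They are recovered from concatemers in which ditags are separated by the NlaIII site CATG. The sketch below shows that extraction step for classic 10-bp tags, reported with the CATG prefix as 14-bp tags. It assumes an accepted ditag length of 20 to 22 bp and treats repeated ditags (in either orientation) as PCR duplicates. These parameters are assumptions, and the abstract does not describe the microSAGE pipeline actually used. All names and sequences are hypothetical.

```python
"""Extract classic SAGE tags from sequenced concatemers (illustrative sketch)."""
from collections import Counter

ANCHOR = "CATG"               # NlaIII site that separates ditags in a concatemer
TAG_LEN = 10                  # classic SAGE tag length, not counting CATG
DITAG_LENGTHS = {20, 21, 22}  # assumed acceptable ditag lengths (bp)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(concatemers: list[str]) -> Counter:
    """Count 14-bp tags (CATG + 10 bp) from a list of concatemer reads."""
    tags: Counter = Counter()
    seen: set[str] = set()
    for read in concatemers:
        # Pieces between two CATG anchors are candidate ditags; the first and
        # last pieces are flanking sequence and are discarded.
        for ditag in read.upper().split(ANCHOR)[1:-1]:
            if len(ditag) not in DITAG_LENGTHS:
                continue
            # A ditag and its reverse complement are the same molecule read from
            # opposite strands; repeats are treated as PCR duplicates and skipped.
            key = min(ditag, revcomp(ditag))
            if key in seen:
                continue
            seen.add(key)
            tags[ANCHOR + ditag[:TAG_LEN]] += 1            # first tag, as read
            tags[ANCHOR + revcomp(ditag[-TAG_LEN:])] += 1  # second tag, opposite strand
    return tags


# Hypothetical concatemer containing the same ditag twice.
ditag = "GATTACAGGT" + revcomp("TTGCAAGCTA")
read = "TTT" + ANCHOR + ditag + ANCHOR + ditag + ANCHOR + "AA"
print(extract_tags([read]))
```

The second tag of each ditag is reverse-complemented because the two tags were ligated tail to tail. Reading it back from its own CATG anchor recovers the original tag sequence.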
This first comprehensive analysis of the uterine cervix transcriptome should be useful for identifying genes involved in normal uterine cervix function, as well as candidate genes associated with cervical carcinoma.
Objective: To explore the relationship between methylation of the interferon gamma (IFN-γ) gene and tumorigenesis in cervical cancer, biopsy specimens from patients with cervical cancer or cervical intraepithelial neoplasia (CIN I-III), as well as from normal controls, were collected and analyzed. Methods: Methylation of the IFN-γ gene was verified by methylation-specific PCR and DNA sequencing analysis, and IFN-γ mRNA expression levels were measured by quantitative real-time reverse transcriptase-polymerase chain reaction (qRT-PCR). Results: The methylation rates of the IFN-γ gene were significantly higher in cervical cancer tissues (15/43, 34.9%) than in CIN (3/23, 13.0% of CIN I; 6/39, 15.4% of CIN II/III) and normal cervical tissues (2/43, 4.7%) (P < 0.01). Furthermore, IFN-γ mRNA expression in cervical tumors with methylation (0.71 ± 0.13, n = 8) was lower than in those without methylation (1.58 ± 0.32, n = 27) (P < 0.05). Likewise, IFN-γ expression levels in CIN II/III tissues with methylation (0.87 ± 0.16, n = 5) were significantly lower (P < 0.01) than in those without methylation (2.12 ± 0.27, n = 32). Conclusion: Hypermethylation of the IFN-γ gene may be related to the tumorigenesis of cervical cancer.
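The percentages in the Results follow directly from the reported counts. The short check below recomputes them and also expresses the reported mean mRNA levels as methylated-to-unmethylated ratios. It is only arithmetic on the published numbers. It does not reproduce the significance tests, and the abstract does not name the test used.

```python
"""Recompute IFN-γ methylation rates and expression ratios from reported values."""

methylated = {  # group: (methylated samples, total samples), as reported
    "Cervical cancer": (15, 43),
    "CIN I": (3, 23),
    "CIN II/III": (6, 39),
    "Normal": (2, 43),
}


def rate(k: int, n: int) -> float:
    """Proportion of methylated samples in a group."""
    return k / n


for group, (k, n) in methylated.items():
    print(f"{group}: {k}/{n} = {rate(k, n):.1%}")

# Mean IFN-γ mRNA level with methylation divided by mean level without it.
tumour_ratio = 0.71 / 1.58
cin_ratio = 0.87 / 2.12
print(f"Tumours: {tumour_ratio:.2f}; CIN II/III: {cin_ratio:.2f}")
```

In both tumours and CIN II/III, methylated samples express roughly 40 to 45% of the IFN-γ mRNA level of unmethylated samples.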
To evaluate whether mandatory fortification of grain products with folic acid in the US is associated with changes in histone methylation in cells involved in cervical carcinogenesis.
Cervical specimens obtained before (1990 to 1992) and after (2000 to 2002) mandatory folic acid fortification were used to examine the degree of histone methylation (H3 Lys-9) by immunohistochemistry. Ninety-one women (51 before and 40 after fortification) were diagnosed with cervical intraepithelial neoplasia (CIN) grade 3 or carcinoma in situ (CIS); the sections used in the study also contained normal, reactive or metaplastic cervical epithelium, CIN 1 or CIN 2. Sixty-four women (34 before and 30 after fortification) were free of CIN, and their sections contained only normal or reactive cervical epithelium. Immunohistochemical staining for H3 Lys-9, its assessment in the different cell or lesion types, and data entry were all performed blind to fortification status. For each cell type or lesion category, we used PROC MIXED in SAS, with the specimen identifier as a random effect and the robust variance estimator, to estimate age- and race-adjusted mean intensity scores for H3 Lys-9 in the pre- and post-fortification periods.
The degree of H3 Lys-9 methylation was significantly higher (P < 0.0001) in ≥CIN 2 lesions (CIN 2, CIN 3 and CIS) than in ≤CIN 1 lesions (CIN 1, normal, reactive and metaplastic) in both pre- and post-fortification CIN 3/CIS specimens. The age- and race-adjusted mean H3 Lys-9 score was significantly higher in all cell or lesion types in CIN 3/CIS specimens obtained in the post-fortification period than in those obtained in the pre-fortification period (P < 0.05, all comparisons). In contrast, in specimens obtained from women free of CIN, Lys-9 methylation in normal/reactive cervical epithelium was significantly lower in post-fortification than in pre-fortification specimens (P = 0.03).
Higher levels of Lys-9 methylation in ≥CIN 2 than in ≤CIN 1 lesions suggest that higher Lys-9 methylation is associated with the progression of lower-grade to higher-grade CIN. Higher Lys-9 methylation in cervical tissues of women diagnosed with CIN 3 in the post-fortification period than in the pre-fortification period suggests that fortification may adversely affect histone methylation in already initiated cells. On the other hand, lower Lys-9 methylation in normal/reactive cervical cells of women free of CIN in the post-fortification period than in the pre-fortification period suggests that fortification is likely to protect against initiation of the carcinogenic process in the cervix. Taken together, these results suggest that mandatory folic acid fortification in the US may have different effects on cancer depending on the stage of carcinogenesis. Because this is the first study to report folic acid fortification-associated differences in histone methylation, and because of the limitations inherent in our approach to demonstrating these differences, the results need to be validated in other study populations or with other techniques for assessing histone methylation.
folic acid; fortification; histone methylation; cervix
Amplification of oncogenes initiated by high-risk human papillomavirus (HPV) infection is an early event in cervical carcinogenesis and can be used to diagnose cervical lesions. We measured the genomic amplification rates and patterns of the human telomerase RNA gene (TERC) and C-MYC in liquid-based cytological specimens to evaluate their diagnostic characteristics for the detection of high-grade cervical lesions.
Two hundred and forty-three residual cytological specimens were obtained from outpatients aged 25 to 64 years at Qilu Hospital, Shandong University. The specimens were evaluated by fluorescence in situ hybridization (FISH) using chromosome probes to TERC (3q26) and C-MYC (8q24). All of the patients underwent colposcopic examination and histological evaluation. A Chi-square test was used for categorical data analysis.
In the normal, cervical intraepithelial neoplasia grade 1 (CIN1), grade 2 (CIN2), grade 3 (CIN3) and squamous cervical cancer (SCC) cases, the TERC positive rates were 9.2%, 17.2%, 76.2%, 100.0% and 100.0%, respectively; the C-MYC positive rates were 20.7%, 31.0%, 71.4%, 81.8% and 100.0%, respectively. The TERC and C-MYC positive rates were higher in the CIN2+ (CIN2, CIN3 and SCC) cases than in the normal and CIN1 cases (p < 0.01). Compared with cytological analysis, the TERC test showed higher sensitivity (90.0% vs. 84.0%) and higher specificity (89.6% vs. 64.3%). The C-MYC test showed lower sensitivity (80.0% vs. 84.0%) and higher specificity (77.7% vs. 64.3%). Using a cut-off value of 5% or more aberrant cells, the TERC test showed the best combination of sensitivity and specificity. The CIN2+ group had more cells with high-level TERC gene copy number (GCN) than the normal/CIN1 group (p < 0.05). For C-MYC, no significant difference between the two histological categories was detected (p > 0.05).
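The sensitivity and specificity figures come from calling each specimen FISH-positive or negative at a threshold, then comparing the calls with histology (CIN2+ versus normal/CIN1). The sketch below encodes that logic with the 5% aberrant-cell cut-off. The `Specimen` record, its field names and the cell counts are hypothetical illustrations, not data from the study.

```python
"""FISH cut-off classification and diagnostic accuracy (illustrative sketch)."""
from dataclasses import dataclass
from typing import Iterable, Tuple

CUTOFF = 0.05  # positive if at least 5% of scored cells are aberrant


@dataclass
class Specimen:
    aberrant_cells: int  # cells showing TERC gain (hypothetical counts)
    scored_cells: int
    cin2_plus: bool      # histological reference: CIN2, CIN3 or SCC


def fish_positive(s: Specimen, cutoff: float = CUTOFF) -> bool:
    """Call a specimen positive when its aberrant-cell fraction reaches the cut-off."""
    if s.scored_cells <= 0:
        raise ValueError("no cells scored")
    return s.aberrant_cells / s.scored_cells >= cutoff


def sensitivity_specificity(specimens: Iterable[Specimen],
                            cutoff: float = CUTOFF) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of FISH calls against histology."""
    tp = fn = tn = fp = 0
    for s in specimens:
        positive = fish_positive(s, cutoff)
        if s.cin2_plus:
            if positive:
                tp += 1
            else:
                fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


cohort = [  # hypothetical specimens, 200 cells scored each
    Specimen(12, 200, True), Specimen(30, 200, True), Specimen(6, 200, True),
    Specimen(2, 200, False), Specimen(14, 200, False),
    Specimen(0, 200, False), Specimen(4, 200, False),
]
sens, spec = sensitivity_specificity(cohort)
print(f"Sensitivity {sens:.1%}, specificity {spec:.1%}")
```

Re-running `sensitivity_specificity` with other `cutoff` values is how one would look for the threshold that balances the two measures.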
The TERC test is highly sensitive and is therefore suitable for cervical cancer screening. The C-MYC test is not suitable for cancer screening because of its lower sensitivity. The amplification patterns of TERC become more diverse and complex as the severity of cervical diseases increases, whereas for C-MYC, the amplification patterns are similar between the normal/CIN1 and CIN2+ groups.
The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/1308004512669913.
Uterine cervical neoplasia; Oncogenes; Fluorescence in situ hybridization; Telomerase RNA gene; C-MYC; Human papillomavirus
Serial Analysis of Gene Expression (SAGE) is a method of large-scale gene expression analysis that can potentially generate the full list of mRNAs present in a cell population at a given time, together with their frequencies. An essential step in SAGE library analysis is the unambiguous assignment of each 14-bp tag to the transcript from which it was derived. This process, called tag-to-gene mapping, still needs improvement. Indeed, the existing web sites that provide correspondences between tags and transcripts do not cover all species for which numerous ESTs and cDNAs have already been sequenced.
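Tag-to-gene mapping usually starts from "virtual" tags predicted from known transcript sequences. In SAGE, a transcript's tag is the CATG anchoring site closest to its 3' end plus the following 10 bases. The sketch below extracts such virtual tags and builds a tag-to-transcript index, which makes ambiguous tags (those shared by several transcripts) easy to spot. It is simplified. Transcripts are assumed to be given 5' to 3' on the sense strand, and a site too close to the 3' end is simply treated as yielding no tag. Real tools also handle internal sites, orientation and sequencing errors. All names and sequences are hypothetical.

```python
"""Virtual SAGE tag extraction and a tag-to-transcript index (illustrative sketch)."""
from collections import defaultdict
from typing import Dict, List, Optional

ANCHOR = "CATG"  # NlaIII anchoring-enzyme site
TAG_LEN = 10     # bases after CATG in a classic 14-bp SAGE tag


def virtual_tag(transcript: str, tag_len: int = TAG_LEN) -> Optional[str]:
    """Return CATG plus the next `tag_len` bases at the 3'-most CATG, or None."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)  # 3'-most anchoring site
    if pos == -1:
        return None          # no NlaIII site: no tag can be produced
    tag = seq[pos:pos + len(ANCHOR) + tag_len]
    if len(tag) < len(ANCHOR) + tag_len:
        return None          # assumption: site too close to the 3' end
    return tag


def build_tag_index(transcripts: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each virtual tag to every transcript that produces it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for tx_id, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            index[tag].append(tx_id)
    return dict(index)


transcripts = {  # hypothetical sequences
    "tx1": "GGCATGAAACCCGGGTTTCATGACGTACGTACTTTT",
    "tx2": "TTTTCATGAAAA",
    "tx3": "AAAAAAAAAA",
    "tx4": "CCCATGACGTACGTACGG",
}
index = build_tag_index(transcripts)
ambiguous = {tag: ids for tag, ids in index.items() if len(ids) > 1}
print(index)
print(ambiguous)
```

Here `tx1` and `tx4` yield the same virtual tag, so an observed copy of that tag cannot be assigned unambiguously to one transcript.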
We therefore designed and implemented Identitag, a freely available tool for tag identification that can be used in any species for which transcript sequences are available. Identitag is built on a relational database structure, which allows rapid and easy storage and updating of data and, most importantly, precise definition of identification parameters. This structure can be viewed as three interconnected modules: the first stores virtual tags extracted from a given list of transcript sequences, the second stores experimental tags observed in SAGE experiments, and the third allows the annotation of the transcript sequences used for virtual tag extraction. Identitag therefore connects an observed tag to a virtual tag and to the sequence it comes from, and then to its functional annotation when available. Databases built for different species can be connected through orthology relationships, allowing SAGE libraries to be compared between species. We successfully used Identitag to identify tags from our chicken SAGE libraries and for interspecies comparison of chicken and human SAGE tags. The Identitag source code is freely available from the Identitag web site.
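The three-module design described above maps naturally onto three joined tables. The sketch below shows the idea with Python's built-in `sqlite3`: observed tag to virtual tag to transcript to annotation, keeping unidentified tags and unannotated transcripts visible as NULLs. The table and column names, and the data, are hypothetical. This is not Identitag's actual schema, and the orthology links between species databases are omitted.

```python
"""Three-module tag identification schema in SQLite (illustrative sketch)."""
import sqlite3

SCHEMA = """
CREATE TABLE virtual_tag  (tag TEXT NOT NULL, transcript_id TEXT NOT NULL);
CREATE TABLE observed_tag (library TEXT NOT NULL, tag TEXT NOT NULL,
                           tag_count INTEGER NOT NULL);
CREATE TABLE annotation   (transcript_id TEXT PRIMARY KEY, gene_symbol TEXT);
"""
UNKNOWN_TAG = "CATG" + "G" * 10  # hypothetical tag with no virtual match

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO virtual_tag VALUES (?, ?)",
                 [("CATGACGTACGTAC", "tx1"), ("CATGTTGCAAGCTA", "tx2")])
conn.executemany("INSERT INTO annotation VALUES (?, ?)", [("tx1", "GENE_A")])
conn.executemany("INSERT INTO observed_tag VALUES (?, ?, ?)",
                 [("lib1", "CATGACGTACGTAC", 12),
                  ("lib1", "CATGTTGCAAGCTA", 3),
                  ("lib1", UNKNOWN_TAG, 1)])

# Observed tag -> virtual tag -> transcript -> annotation (when available).
# LEFT JOINs keep unidentified tags and unannotated transcripts as NULLs.
rows = conn.execute("""
    SELECT o.library, o.tag, o.tag_count, v.transcript_id, a.gene_symbol
    FROM observed_tag AS o
    LEFT JOIN virtual_tag AS v ON v.tag = o.tag
    LEFT JOIN annotation  AS a ON a.transcript_id = v.transcript_id
    ORDER BY o.library, o.tag
""").fetchall()
for row in rows:
    print(row)
```

Keeping the modules in separate tables means the virtual tag set can be regenerated from updated transcript sequences without touching the stored experimental counts.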
Identitag is a flexible and powerful tool for tag identification in any single species and for interspecies comparison of SAGE libraries. It opens the way to comparative transcriptomic analysis, an emerging branch of biology.
To describe the long-term (≥ 10 years) benefits of clinical human papillomavirus (HPV) DNA testing for cervical precancer and cancer risk prediction.
Cervicovaginal lavages collected from 19,512 women attending a health maintenance program were retrospectively tested for HPV using a clinical test. HPV-positive samples were tested for HPV16 and HPV18 individually using a research test. A Papanicolaou (Pap) result classified as atypical squamous cells of undetermined significance (ASC-US) or more severe was considered abnormal. Women were followed prospectively with routine annual Pap testing for up to 18 years. Cumulative incidence rates (CIRs) of grade 3 or more severe cervical intraepithelial neoplasia (CIN3+) or cancer were calculated according to the enrollment test results.
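The abstract does not say how the cumulative incidence rates were estimated. For long follow-up with censoring (women leaving follow-up before any event), one standard approach is 1 minus the Kaplan-Meier survival estimate. The sketch below implements that estimator on hypothetical follow-up records. The function name and data are assumptions, not the study's method or results.

```python
"""Cumulative incidence as 1 - Kaplan-Meier survival (illustrative sketch)."""
from typing import List, Tuple


def cumulative_incidence(follow_up: List[Tuple[float, bool]], horizon: float) -> float:
    """Estimate cumulative incidence of the event by `horizon` years.

    follow_up: one (years_followed, had_event) pair per woman; had_event=False
    means she was censored (left follow-up event-free) at that time.
    """
    survival = 1.0
    event_times = sorted({t for t, event in follow_up if event and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for ti, _ in follow_up if ti >= t)
        events = sum(1 for ti, event in follow_up if event and ti == t)
        survival *= 1 - events / at_risk
    return 1 - survival


# Hypothetical cohort: CIN3+ at years 2 and 12, everyone else censored.
cohort = [(2.0, True), (5.0, False), (8.0, False), (12.0, True)] + [(18.0, False)] * 6
print(f"CIR by 10 years: {cumulative_incidence(cohort, 10):.1%}")
print(f"CIR by 18 years: {cumulative_incidence(cohort, 18):.1%}")
```

Women censored at 5 and 8 years still count in the risk set for the first event, but not for the event at year 12. That is why the later event raises the estimate by more than one-tenth of the cohort.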
A negative baseline HPV test provided greater reassurance against CIN3+ over the 18-year follow-up than a normal Pap (CIR, 0.90% v 1.27%). Although both baseline Pap and HPV tests predicted who would develop CIN3+ within the first 2 years of follow-up, only HPV testing predicted who would develop CIN3+ 10 to 18 years later (P = .004). HPV16- and HPV18-positive women with a normal Pap were at elevated risk of CIN3+ compared with other HPV-positive women with a normal Pap, and their risk of CIN3+ was similar to that of women with a low-grade squamous intraepithelial lesion (LSIL) Pap.
An algorithm that uses HPV testing to rule out cervical disease, followed by Pap testing, possibly combined with detection of HPV16 and HPV18 among HPV-positive women to identify those at immediate risk of CIN3+, would be efficient for cervical cancer screening, especially in women aged 30 years or older.
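The screening algorithm proposed above can be written as a small decision function. In this sketch, a negative HPV test rules out disease. Among HPV-positive women, either HPV16/18 positivity or an abnormal Pap (ASC-US or worse) flags immediate risk. The three outcome labels, especially the "repeat testing" branch for HPV-positive women with neither finding, are hypothetical placeholders. The abstract does not specify what management each branch should receive.

```python
"""HPV-first cervical screening triage (illustrative sketch)."""
from enum import Enum


class Action(Enum):
    # Hypothetical outcome labels; the abstract does not define management steps.
    ROUTINE_SCREENING = "return to routine screening"
    IMMEDIATE_EVALUATION = "evaluate for CIN3+ now"
    REPEAT_TESTING = "repeat testing at a shorter interval"


def triage(hpv_positive: bool, hpv16_or_18: bool, pap_abnormal: bool) -> Action:
    """Apply HPV testing first, then HPV16/18 genotyping and Pap among positives.

    pap_abnormal means ASC-US or a more severe Pap result.
    """
    if not hpv_positive:
        return Action.ROUTINE_SCREENING     # negative HPV test rules out disease
    if hpv16_or_18 or pap_abnormal:
        return Action.IMMEDIATE_EVALUATION  # at immediate risk of CIN3+
    return Action.REPEAT_TESTING


print(triage(hpv_positive=True, hpv16_or_18=True, pap_abnormal=False))
```

Placing the HPV test first reflects the finding that a negative HPV result gave longer-lasting reassurance than a normal Pap.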
Cervical cancer is the most common cancer among Indian women. This cancer has well-defined precancerous stages and evolves over 10-15 years or more. This study was undertaken to identify genes differentially expressed among normal cervix, dysplasia and invasive cervical cancer.
Materials and methods
A total of 28 invasive cervical cancers, 4 CIN3/CIS, 4 CIN1/CIN2 and 5 normal cervix samples were studied. We used microarray analysis, followed by validation of the significant genes by relative quantitation with TaqMan Low Density Array real-time PCR. Immunohistochemistry was used to study protein expression of MMP3, UBE2C and p16 in normal cervix, dysplasia and cervical cancer. The effect of dominant-negative UBE2C on the growth of SiHa cells was assessed using an MTT assay.
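Relative quantitation by real-time PCR is commonly done with the comparative Ct (2^-ΔΔCt) method. The abstract does not say which quantitation model was applied, so treat this as an assumption. The target gene's Ct is normalized to a reference gene in both the sample and a calibrator, and the difference is converted to a fold change. The Ct values below are hypothetical. They were chosen so that a ΔΔCt of -2 corresponds to a four-fold difference, like the UBE2C level reported for SiHa relative to HEK293 cells.

```python
"""Comparative Ct (2^-ΔΔCt) relative quantitation (illustrative sketch)."""


def relative_quantity(ct_target: float, ct_ref: float,
                      ct_target_cal: float, ct_ref_cal: float,
                      efficiency: float = 2.0) -> float:
    """Fold change of a target gene in a sample relative to a calibrator.

    Assumes target and reference assays amplify with the same efficiency
    (2.0 means perfect doubling per cycle).
    """
    delta_sample = ct_target - ct_ref              # normalize sample to reference gene
    delta_calibrator = ct_target_cal - ct_ref_cal  # normalize calibrator likewise
    ddct = delta_sample - delta_calibrator
    return efficiency ** (-ddct)


# Hypothetical Ct values: target and reference gene in SiHa (sample)
# and HEK293 (calibrator) cells.
fold = relative_quantity(22.0, 18.0, 24.0, 18.0)
print(f"Fold change: {fold:.1f}")
```

Because Ct is logarithmic in template amount, each cycle of difference corresponds to a doubling under the ideal-efficiency assumption.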
Our study has, for the first time, identified 20 genes that are up-regulated and 14 that are down-regulated in cervical cancers, and 5 that are up-regulated in CIN3. In addition, 26 genes previously reported by other studies to play a role in cervical cancer were also confirmed in our study. UBE2C, CCNB1, CCNB2, PLOD2, NUP210, MELK and CDC20 were overexpressed in tumours and in CIN3/CIS relative to both normal and CIN1/CIN2 samples, suggesting that they could play a role in the early phase of tumorigenesis. IL8, INDO, ISG15, ISG20, AGRN, DTXL, MMP1, MMP3, CCL18, TOP2A and STAT1 were upregulated in tumours. Using immunohistochemistry, we showed overexpression of MMP3, UBE2C and p16 in cancers compared with normal cervical epithelium and varying grades of dysplasia. A dominant-negative UBE2C inhibited the growth of SiHa cells, which express UBE2C at four-fold higher levels than HEK293 cells.
Several novel genes were found to be differentially expressed in cervical cancer. MMP3, UBE2C and p16 protein overexpression in cervical cancers was confirmed by immunohistochemistry. These will need to be validated further in a larger series of samples. UBE2C could be evaluated further to assess its potential as a therapeutic target in cervical cancer.
Worldwide, gastric carcinoma shows marked geographical variation, with worse outcomes in patients from the West than in those from the East. Although these differences have been explained by better diagnostic criteria, improved staging methods and more radical surgery, emerging evidence supports the concept that gene expression differences associated with ethnicity might contribute to this disparate outcome. Here, we collected datasets from 4 normal and 11 gastric carcinoma Serial Analysis of Gene Expression (SAGE) libraries from two different ethnicities. All normal SAGE libraries, as well as 7 tumor libraries, were from the West, and 4 tumor libraries were from the East. We compared these datasets by Correspondence Analysis and Support Tree analysis, and identified specific differences in tag expression by Significance Analysis of Microarrays (SAM). Tag-to-gene assignments were performed with CGAP-SAGE Genie or TAGmapper. The global transcriptome analysis showed a clear separation between normal and tumor libraries, with 90 tags differentially expressed. A clear separation was also found between the West and East tumor libraries, with 54 tags differentially expressed. Tag-to-gene assignment identified 15 genes, 5 of which showed significantly higher expression in the West libraries than in the East libraries. qRT-PCR in cell lines of Western and Eastern origin confirmed these differences. Interestingly, two of these genes (COL1A1 and KLK10) have been associated with aggressiveness. In conclusion, in silico analysis of SAGE libraries from two different ethnicities reveals differences in gene expression profiles, and these differences might help explain the disparate outcomes between the West and the East.
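Comparing SAGE libraries of different sizes requires normalizing raw tag counts first. The sketch below shows only that preliminary step: conversion to tags per million, then a log2 fold change between two groups of libraries using a pseudocount. It is not the Correspondence Analysis, Support Tree or SAM procedures used in the study. The tag names, counts and the pseudocount of 1 are hypothetical assumptions.

```python
"""Normalize SAGE tag counts and compare two library groups (illustrative sketch)."""
import math
from typing import Dict, List


def per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw tag counts to tags per million so libraries are comparable."""
    total = sum(counts.values())
    return {tag: 1e6 * n / total for tag, n in counts.items()}


def mean_level(libraries: List[Dict[str, float]], tag: str) -> float:
    """Mean normalized level of a tag across libraries (absent tag = 0)."""
    return sum(lib.get(tag, 0.0) for lib in libraries) / len(libraries)


def log2_fold_change(group_a: List[Dict[str, float]], group_b: List[Dict[str, float]],
                     tag: str, pseudocount: float = 1.0) -> float:
    """log2 of mean level in group A over group B; pseudocount avoids log(0)."""
    a = mean_level(group_a, tag)
    b = mean_level(group_b, tag)
    return math.log2((a + pseudocount) / (b + pseudocount))


# Hypothetical raw counts for libraries of different sizes.
west = [per_million(c) for c in ({"TAG_A": 40, "TAG_B": 960},
                                 {"TAG_A": 120, "TAG_B": 2880})]
east = [per_million(c) for c in ({"TAG_A": 10, "TAG_B": 990},
                                 {"TAG_A": 20, "TAG_B": 1980})]
print(f"TAG_A log2 fold change (West/East): {log2_fold_change(west, east, 'TAG_A'):.2f}")
```

Normalizing before averaging keeps a large library from dominating its group: the two Western libraries differ three-fold in size but give TAG_A the same per-million level.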