The highest rates of cervical cancer are found in developing countries. Frontline monitoring has reduced these rates in developed countries, and present-day screening programs primarily identify precancerous lesions termed cervical intraepithelial neoplasias (CIN). CIN lesions described as mild dysplasia (CIN I) are likely to regress spontaneously, while CIN III lesions (severe dysplasia) are likely to progress if untreated. Thoughtful consideration of gene expression changes paralleling the progressive preinvasive neoplastic development will yield insight into the key causal events involved in cervical cancer development.
In this study, we have identified gene expression changes across 16 cervical cases (CIN I, CIN II, CIN III and normal cervical epithelium) using the unbiased long serial analysis of gene expression (L-SAGE) method. The 16 L-SAGE libraries were sequenced to the level of 2,481,387 tags, creating the largest SAGE data collection for cervical tissue worldwide. We have identified 222 genes differentially expressed between normal cervical tissue and CIN III. Many of these genes influence biological functions characteristic of cancer, such as cell death, cell growth/proliferation and cellular movement. Evaluation of these genes through network interactions identified multiple candidates that influence regulation of cellular transcription through chromatin remodelling (SMARCC1, NCOR1, MRFAP1 and MORF4L2). Further, these expression events are focused at the critical junction in disease development of moderate dysplasia (CIN II) indicating a role for chromatin remodelling as part of cervical cancer development.
We have created a valuable publicly available resource for the study of gene expression in precancerous cervical lesions. Our results indicate that deregulation of the chromatin remodelling complex components and its influencing factors occurs in the development of CIN lesions. The increase in the SWI/SNF stabilizing molecule SMARCC1 and other novel genes has not previously been shown to occur in the early stages of dysplasia development, and thus provides not only novel candidate markers for screening but also a biological function for targeting treatment.
Serial Analysis of Gene Expression (SAGE) is a powerful tool to determine gene expression profiles. Two types of SAGE libraries, ShortSAGE and LongSAGE, are classified based on the length of the SAGE tag (10 vs. 17 basepairs). LongSAGE libraries are thought to be more useful than ShortSAGE libraries, but their information content has not been widely compared. To dissect the differences between these two types of libraries, we utilized four libraries (two LongSAGE and two ShortSAGE libraries) generated from the hippocampus of Alzheimer and control samples. In addition, we generated two additional short SAGE libraries, the truncated long SAGE libraries (tSAGE), from LongSAGE libraries by deleting seven 5' basepairs from each LongSAGE tag.
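The construction of truncated libraries described above can be illustrated with a minimal sketch. It assumes tag counts are held in a dictionary keyed by the 17-basepair tag sequence; the function name `truncate_tags`, the `trim_end` parameter and the toy tags are hypothetical and are not taken from the study. The text states that seven 5' basepairs are deleted, so that is the default here, with the other end available as an option.

```python
from collections import Counter

def truncate_tags(long_counts, keep=10, trim_end="5prime"):
    """Collapse LongSAGE tag counts into truncated (tSAGE-like) tag counts.

    `trim_end` is a hypothetical parameter naming the end from which bases
    are removed so that `keep` bases remain.
    """
    short_counts = Counter()
    for tag, count in long_counts.items():
        # Keep the last `keep` bases when trimming the 5' end, else the first.
        short = tag[-keep:] if trim_end == "5prime" else tag[:keep]
        # Distinct long tags that share a truncated sequence are merged.
        short_counts[short] += count
    return short_counts

# Toy data (invented): the first two tags differ only in the trimmed bases.
long_counts = {
    "AAAAAAACCCCCCCCCC": 3,
    "GGGGGGGCCCCCCCCCC": 4,
    "TTTTTTTACGTACGTAC": 5,
}
tsage = truncate_tags(long_counts)
```

In this toy case two distinct LongSAGE tags collapse into one truncated tag whose count is the sum of both, which is the mechanism by which several individually non-significant long tags can produce one apparently significant short tag.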
One problem in SAGE studies is that an individual tag may match multiple different genes because of its short length. We found that a LongSAGE tag maps to up to 15 UniGene clusters, while ShortSAGE and tSAGE tags map to up to 279 UniGene clusters. Both long and short SAGE libraries exhibit a large number of orphan tags (no gene information in UniGene), implying a limitation of the UniGene database. Among 100 orphan LongSAGE tags, the complete sequences (17 basepairs) of nine orphan tags match 17 genomic sequences; four of the orphan tags match a single genomic sequence. Our data show the potential to resolve 4–9% of orphan LongSAGE tags. Finally, among 400 tSAGE tags showing significant differential expression between AD and control, 79 tags (19.8%) were derived from multiple non-significant LongSAGE tags, implying that these are false positive results.
Our data show that LongSAGE tags have high specificity in gene mapping compared to ShortSAGE tags. LongSAGE tags show an advantage over ShortSAGE tags in identifying novel genes by BLAST analysis. Most importantly, the chances of obtaining false positive results are higher for ShortSAGE than for LongSAGE libraries because of the lower specificity of short tags in gene mapping. Therefore, we recommend considering the number of UniGene clusters (genes or ESTs) corresponding to a tag when prioritizing significant results.
To facilitate the identification of gene products important in regulating renal glomerular structure and function, we have produced an annotated transcriptome database for normal human glomeruli using the SAGE approach.
The database contains 22,907 unique SAGE tag sequences, with a total tag count of 48,905. For each SAGE tag, the ratio of its frequency in glomeruli relative to that in 115 non-glomerular tissues or cells, a measure of transcript enrichment in glomeruli, was calculated. A total of 133 SAGE tags representing well-characterized transcripts were enriched 10-fold or more in glomeruli compared to other tissues. Comparison of data from this study with a previous human glomerular Sau3A-anchored SAGE library reveals that 47 of the highly enriched transcripts are common to both libraries. Among these are the SAGE tags representing many podocyte-predominant transcripts such as WT-1, podocin and synaptopodin. Enrichment of podocyte transcript tags in this SAGE library indicates that other SAGE tags observed at much higher frequencies in this glomerular library than in non-glomerular SAGE libraries are likely to be glomerulus-predominant. A higher level of mRNA expression for 19 transcripts represented by glomerulus-enriched SAGE tags was verified by RT-PCR comparing glomeruli to lung, liver and spleen.
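The enrichment measure described above can be sketched as a ratio of normalized tag frequencies. This is only an illustration under stated assumptions: the abstract does not say how the 115 comparison libraries were pooled or how tags absent from them were handled, so the pooled total, the example counts, the pseudocount and the function names below are all hypothetical. Only the glomerular total of 48,905 tags comes from the text.

```python
def tags_per_million(count, library_total):
    """Normalize a raw tag count to tags per million."""
    return count * 1_000_000 / library_total

def enrichment_ratio(glom_count, glom_total, other_count, other_total,
                     pseudocount=1):
    """Ratio of a tag's normalized frequency in glomeruli to that in a pool.

    A tag absent from the comparison pool is treated as having a count of
    `pseudocount` to avoid division by zero (an assumption of this sketch).
    """
    glom_tpm = tags_per_million(glom_count, glom_total)
    other_tpm = tags_per_million(max(other_count, pseudocount), other_total)
    return glom_tpm / other_tpm

# Hypothetical tag: 10 copies among the 48,905 glomerular tags, and 50 copies
# in an invented pool of 5,000,000 non-glomerular tags.
ratio = enrichment_ratio(10, 48_905, 50, 5_000_000)
is_enriched = ratio >= 10  # the 10-fold cutoff mentioned in the text
```

Normalizing both counts before taking the ratio keeps the result independent of library size, which matters when a small library is compared with a much larger pool.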
The database can be retrieved from, or interrogated online at, http://cgap.nci.nih.gov/SAGE. The annotated database is also provided as an additional file with gene identification for 9,022 tags, and matches to the human genome or transcript homologs in other species for 1,433 tags. It should be a useful tool for in silico mining of glomerular gene expression.
SAGE (serial analysis of gene expression) is a powerful method of analyzing gene expression for the entire transcriptome. There are currently many well-developed SAGE tools. However, the cross-comparison of different tissues is seldom addressed, thus limiting the identification of common- and tissue-specific tumor markers.
To improve the SAGE mining methods, we propose a novel function for cross-tissue comparison of SAGE data that combines mathematical set theory and logic with a unique “multi-pool method” that analyzes multiple pools of pair-wise case controls individually. When all the settings are in “inclusion”, the common SAGE tag sequences are mined. When one tissue type is in “inclusion” and the other types of tissues are not in “inclusion”, the selected tissue-specific SAGE tag sequences are generated. The results are displayed as tags-per-million (TPM) and fold values, and are also visualized in four kinds of scales in a color gradient pattern. In the fold visualization display, the top-scoring SAGE tag sequences are provided, along with cluster plots. A user-defined matrix file is designed for cross-tissue comparison by selecting libraries from publicly available databases or user-defined libraries.
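The inclusion logic described above maps naturally onto set intersection and difference. The following is a minimal sketch, not the tool's actual implementation: it assumes each tissue pool has already been reduced to the set of tags selected from its case-control comparison, and the function name `select_tags`, the tissue names and the tag labels are hypothetical.

```python
def select_tags(pools, included, excluded=()):
    """Return tags present in every included pool and in no excluded pool."""
    result = set.intersection(*(set(pools[name]) for name in included))
    for name in excluded:
        result -= set(pools[name])
    return result

# Toy pools (invented): tags selected per tissue from case-control pairs.
pools = {
    "breast": {"TAG_A", "TAG_B", "TAG_C"},
    "colon": {"TAG_A", "TAG_C", "TAG_D"},
    "lung": {"TAG_A", "TAG_E"},
}

# All tissues set to "inclusion": tags common to every pool.
common = select_tags(pools, included=["breast", "colon", "lung"])

# One tissue in "inclusion", the others not: tissue-specific tags.
breast_specific = select_tags(pools, included=["breast"],
                              excluded=["colon", "lung"])
```

Here `common` contains only the tag shared by all three pools, and `breast_specific` contains only the tag found in the breast pool and nowhere else.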
The hSAGEing tool provides a combination of friendly cross-tissue analysis and an interface for comparing SAGE libraries for the first time. Some up- or down-regulated genes with tissue-specific or common tumor markers and suppressors are identified computationally. The tool is useful and convenient for in silico cancer transcriptomic studies and is freely available at http://bio.kuas.edu.tw/hSAGEing
To develop large-scale, high-throughput annotation of the human macula transcriptome and to identify and prioritize candidate genes for inherited retinal dystrophies, based on ocular-expression profiles using serial analysis of gene expression (SAGE).
Two human retina and two retinal pigment epithelium (RPE)/choroid SAGE libraries made from matched macula or midperipheral retina and adjacent RPE/choroid of morphologically normal 28- to 66-year-old donors and a human central retina longSAGE library made from 41- to 66-year-old donors were generated. Their transcription profiles were entered into a relational database, EyeSAGE, including microarray expression profiles of retina and publicly available normal human tissue SAGE libraries. EyeSAGE was used to identify retina- and RPE-specific and -associated genes, and candidate genes for retina and RPE disease loci. Differential and/or cell-type specific expression was validated by quantitative and single-cell RT-PCR.
Cone photoreceptor-associated gene expression was elevated in the macula transcription profiles. Analysis of the longSAGE retina tags enhanced tag-to-gene mapping and revealed alternatively spliced genes. Analysis of the candidate gene expression table for the Bardet-Biedl syndrome 5 (BBS5) disease region yielded the previously identified BBS5 gene as the top candidate. Compelling candidates for inherited retina diseases were identified.
The EyeSAGE database, combining three different gene-profiling platforms including the authors’ multidonor-derived retina/RPE SAGE libraries and existing single-donor retina/RPE libraries, is a powerful resource for definition of the retina and RPE transcriptomes. It can be used to identify retina-specific genes, including alternatively spliced transcripts and to prioritize candidate genes within mapped retinal disease regions.
Cervical intraepithelial neoplasia (CIN), also known as transformation and dysplasia of cervical intraepithelial cells, is the precancerous lesion of squamous cell carcinoma. CXC chemokine receptor 7 (CXCR7) has been implicated in tumor development and metastasis of multiple malignancies and precancerous lesions. However, the protein expression and function of CXCR7 in different stages of human CIN remain unclear. The present study examined CXCR7 protein expression in cervical tissue samples from 34 patients, including 7 patients with normal cervical tissues (negative control), 10 patients with stage I of CIN (CIN I), 8 patients with CIN II and 9 patients with CIN III. Receiver operating characteristic (ROC) curves were established to evaluate the prognostic value of CXCR7 in differentiating various stages of CIN. Immunohistochemical staining showed that protein expression of CXCR7 was higher in CIN tissues than in the normal cervical epithelium (P<0.05). High-grade CIN tissues expressed a higher level of CXCR7 than low-grade samples. The ROC curve of integral optical density analysis showed that CXCR7 could discriminate CIN I–III from normal cervical epithelium with 88.9% sensitivity and 71.4% specificity, and CIN II–III from the negative control and CIN I with 92.7% sensitivity and 50.0% specificity. The ROC curve of area analysis also showed that CXCR7 could discriminate CIN I–III from normal cervical epithelium with 70.4% sensitivity and 100.0% specificity, and CIN II–III from the negative control and CIN I with 50.0% sensitivity and 90.0% specificity. An increase in CXCR7 expression may represent a novel predictor of CIN. The wide expression of CXCR7 in CIN also supports the assumption that CXCR7 plays a role in precancerous lesion progression, as well as proliferation, migration and angiogenesis.
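Sensitivity and specificity at a single ROC cut-off, as reported above, can be computed as in the following minimal sketch. The scores and the threshold are invented for illustration. The split of 24 of 27 CIN cases above the cut-off and 5 of 7 controls below it is an assumption chosen only because it is arithmetically consistent with the reported 88.9% and 71.4%; the abstract does not state these counts.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive and compare with true labels."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= threshold)
    fn = sum(1 for s, y in pairs if y and s < threshold)
    tn = sum(1 for s, y in pairs if not y and s < threshold)
    fp = sum(1 for s, y in pairs if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical staining scores: 27 CIN cases (True) and 7 normal controls.
scores = [1.0] * 24 + [0.0] * 3 + [0.0] * 5 + [1.0] * 2
labels = [True] * 27 + [False] * 7

sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
```

An ROC curve is obtained by repeating this calculation over every candidate threshold and plotting sensitivity against one minus specificity.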
cervical intraepithelial neoplasia; CXC chemokine receptor 7; receiver operating characteristics curve analysis; immunohistochemistry
To identify the genes expressed in normal human trabecular meshwork tissue, a tissue critical to the pathogenesis of glaucoma.
Total RNA was extracted from human trabecular meshwork (HTM) harvested from 3 different donors. Extracted RNA was used to synthesize individual SAGE (serial analysis of gene expression) libraries using the I-SAGE Long kit from Invitrogen. Libraries were analyzed using SAGE 2000 software to extract the 17 base pair sequence tags. The extracted sequence tags were mapped to the genome using SAGE Genie map.
A total of 298,834 SAGE tags were identified from all HTM libraries (96,842, 88,126, and 113,866 tags, respectively). Collectively, there were 107,325 unique tags. There were 10,329 unique tags with a minimum of 2 counts from a single library. These tags were mapped to known unique Unigene clusters. Approximately 29% of the tags (orphan tags) did not map to a known Unigene cluster. Thirteen percent of the tags mapped to at least 2 Unigene clusters. Sequence tags from many glaucoma-related genes, including myocilin, optineurin, and WD repeat domain 36, were identified.
This is the first time SAGE analysis has been used to characterize the gene expression profile in normal HTM. SAGE analysis provides an unbiased sampling of gene expression of the target tissue. These data will provide new and valuable information to improve understanding of the biology of human aqueous outflow.
Sixteen longSAGE libraries from four different clinical stages of cervical intraepithelial neoplasia have enabled us to identify novel cell-surface biomarkers indicative of CIN stage. By comparing gene expression profiles of cervical tissue at early and advanced stages of CIN, several genes were identified as novel genetic markers. We present fifty-six cell-surface gene products differentially expressed during progression of CIN. These cell-surface proteins are being examined to establish their capacity for optical contrast agent binding. Contrast agent visualization will allow real-time assessment of the physiological state of the disease process, bringing vast benefit to cancer care. The data discussed in this publication have been submitted to NCBI's Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) and are accessible through GEO Series accession number GSE6252.
longSAGE; cervical cancer; biomarker; optical imaging
Exfoliated cervical cells are used in cytology-based cancer screening and may also be a source for molecular biomarkers indicative of neoplastic changes in the underlying tissue. However, because of keratinization and terminal differentiation it is not clear that these cells have an mRNA profile representative of cervical tissue, and that the profile can distinguish the lesions targeted for early detection.
We used whole genome microarrays (25,353 unique genes) to compare the transcription profiles from seven samples of normal exfoliated cells and one cervical tissue. We detected 10,158 genes in exfoliated cells, 14,544 in the tissue and 7,320 genes in both samples. For both sample types the genes grouped into the same major gene ontology (GO) categories in the same order, with exfoliated cells having on average 20% fewer genes in each category. We also compared microarray results of samples from women with cervical intraepithelial neoplasia grade 3 (CIN3, n = 15) to those from age- and race-matched women without significant abnormalities (CIN1, CIN0; n = 15). We used three microarray-adapted statistical packages to identify differential gene expression. The six genes identified in common were two- to four-fold upregulated in CIN3 samples. One of these genes, the ubiquitin-conjugating enzyme E2 variant 1, participates in the degradation of p53 through interaction with the oncogenic HPV E6 protein.
The findings encourage further exploration of gene expression using exfoliated cells to identify and validate applicable biomarkers. We conclude that the gene expression profile of exfoliated cervical cells partially represents that of tissue and is complex enough to provide potential differentiation between disease and non-disease.
Toxoplasma gondii gives rise to toxoplasmosis, among the most prevalent parasitic diseases of animals and man. Transformation of the tachyzoite stage into the latent bradyzoite-cyst form underlies chronic disease and leads to a lifetime risk of recrudescence in individuals whose immune system becomes compromised. Given the importance of tissue cyst formation, there has been intensive focus on the development of methods to study bradyzoite differentiation, although the molecular basis for the developmental switch is still largely unknown.
We have used serial analysis of gene expression (SAGE) to define the Toxoplasma gondii transcriptome of the intermediate-host life cycle that leads to the formation of the bradyzoite/tissue cyst. A broad view of gene expression is provided by >4-fold coverage from nine distinct libraries (~300,000 SAGE tags) representing key developmental transitions in primary parasite populations and in laboratory strains representing the three canonical genotypes. SAGE tags, and their corresponding mRNAs, were analyzed with respect to abundance, uniqueness, antisense/sense polarity, chromosome distribution and developmental specificity.
This study demonstrates that phenotypic transitions during parasite development were marked by unique stage-specific mRNAs that accounted for 18% of the total SAGE tags and varied from 1–5% of the tags in each developmental stage. We have also found that Toxoplasma mRNA pools have a unique parasite-specific composition with 1 in 5 transcripts encoding Apicomplexa-specific genes functioning in parasite invasion and transmission. Developmentally co-regulated genes were dispersed across all Toxoplasma chromosomes, as were tags representing each abundance class, and a variety of biochemical pathways indicating that trans-acting mechanisms likely control gene expression in this parasite. We observed distinct similarities in the specificity and expression levels of mRNAs in primary populations (Day-6 post-sporozoite infection) that occur prior to the onset of bradyzoite development that were uniquely shared with the virulent Type I-RH laboratory strain suggesting that development of RH may be arrested. By contrast, strains from Type II-Me49B7 and Type III-VEGmsj contain SAGE tags corresponding to bradyzoite genes, which suggests that priming of developmental expression likely plays a role in the greater capacity of these strains to complete bradyzoite development.
Rice blast, caused by the fungal pathogen Magnaporthe grisea, is a devastating disease causing tremendous yield loss in rice production. The public availability of the complete genome sequence of M. grisea provides ample opportunities to understand the molecular mechanism of its pathogenesis on rice plants at the transcriptome level. To identify all the expressed genes encoded in the fungal genome, we have analyzed the mycelium and appressorium transcriptomes using massively parallel signature sequencing (MPSS), robust-long serial analysis of gene expression (RL-SAGE) and oligoarray methods.
The MPSS analyses identified 12,531 and 12,927 distinct significant tags from mycelia and appressoria, respectively, while the RL-SAGE analysis identified 16,580 distinct significant tags from the mycelial library. When matching these 12,531 mycelial and 12,927 appressorial significant tags to the annotated CDS, 500 bp upstream and 500 bp downstream of CDS, 6,735 unique genes in mycelia and 7,686 unique genes in appressoria were identified. A total of 7,135 mycelium-specific and 7,531 appressorium-specific significant MPSS tags were identified, which correspond to 2,088 and 1,784 annotated genes, respectively, when matching to the same set of reference sequences. Nearly 85% of the significant MPSS tags from mycelia and appressoria and 65% of the significant tags from the RL-SAGE mycelium library matched to the M. grisea genome. MPSS and RL-SAGE methods supported the expression of more than 9,000 genes, representing over 80% of the predicted genes in M. grisea. About 40% of the MPSS tags and 55% of the RL-SAGE tags represent novel transcripts since they had no matches in the existing M. grisea EST collections. Over 19% of the annotated genes were found to produce both sense and antisense tags in the protein-coding region. The oligoarray analysis identified the expression of 3,793 mycelium-specific and 4,652 appressorium-specific genes. A total of 2,430 mycelial genes and 1,886 appressorial genes were identified by both MPSS and oligoarray.
The comprehensive and deep transcriptome analysis by MPSS and RL-SAGE methods identified many novel sense and antisense transcripts in the M. grisea genome at two important growth stages. The differentially expressed transcripts that were identified, especially those specifically expressed in appressoria, represent a genomic resource useful for gaining a better understanding of the molecular basis of M. grisea pathogenicity. Further analysis of the novel antisense transcripts will provide new insights into the regulation and function of these genes in fungal growth, development and pathogenesis in the host plants.
Genomic and transcriptomic alterations affecting key cellular processes such as cell proliferation, differentiation and genomic stability are considered crucial for the development and progression of cancer. Most invasive breast carcinomas are known to derive from precursor in situ lesions. It is proposed that major global expression abnormalities occur in the transition from normal to premalignant stages and further progression to invasive stages. Serial analysis of gene expression (SAGE) was employed to generate a comprehensive global gene expression profile of the major changes occurring during breast cancer malignant evolution.
In the present study we combined various normal and tumor SAGE libraries available in the public domain with sets of breast cancer SAGE libraries recently generated and sequenced in our laboratory. A recently developed modified t test was used to detect the genes differentially expressed.
We accumulated a total of approximately 1.7 million breast tissue-specific SAGE tags and monitored the behavior of more than 25,157 genes during early breast carcinogenesis. We detected 52 transcripts commonly deregulated across the board when comparing normal tissue with ductal carcinoma in situ, and 149 transcripts when comparing ductal carcinoma in situ with invasive ductal carcinoma (P < 0.01).
A major novelty of our study was the use of a statistical method that correctly accounts for the intra-SAGE and inter-SAGE library sources of variation. The most useful result of applying this modified t statistics beta binomial test is the identification of genes and gene families commonly deregulated across samples within each specific stage in the transition from normal to preinvasive and invasive stages of breast cancer development. Most of the gene expression abnormalities detected at the in situ stage were related to specific genes in charge of regulating the proper homeostasis between cell death and cell proliferation. The comparison of in situ lesions with fully invasive lesions, a much more heterogeneous group, clearly identified as the most importantly deregulated group of transcripts those encoding for various families of proteins in charge of extracellular matrix remodeling, invasion and cell motility functions.
breast cancer; gene expression profiling; serial analysis of gene expression
Deep transcriptome analysis will underpin a large fraction of post-genomic biology. 'Closed' technologies, such as microarray analysis, only detect the set of transcripts chosen for analysis, whereas 'open' e.g. tag-based technologies are capable of identifying all possible transcripts, including those that were previously uncharacterized. Although new technologies are now emerging, at present the major resources for open-type analysis are the many publicly available SAGE (serial analysis of gene expression) and MPSS (massively parallel signature sequencing) libraries. These technologies have never been compared for their utility in the context of deep transcriptome mining.
We used a single LongSAGE library of 503,431 tags and a "classic" MPSS library of 1,744,173 tags, both prepared from the same T cell-derived RNA sample, to compare the ability of each method to probe, at considerable depth, a human cellular transcriptome. We show that even though LongSAGE is more error-prone than MPSS, our LongSAGE library nevertheless generated 6.3-fold more genome-matching (and therefore likely error-free) tags than the MPSS library. An analysis of a set of 8,132 known genes detectable by both methods, and for which there is no ambiguity about tag matching, shows that MPSS detects only about half (54%) the number of transcripts identified by SAGE (1,955 versus 3,617). Analysis of two additional MPSS libraries shows that each library samples a different subset of transcripts, and that in combination the three MPSS libraries (4,274,992 tags in total) still only detect 73% of the genes identified in our test set using SAGE. The fraction of transcripts detected by MPSS is likely to be even lower for uncharacterized transcripts, which tend to be more weakly expressed. The source of the loss of complexity in MPSS libraries compared to SAGE is unclear, but its effects become more severe with each sequencing cycle (i.e. as MPSS tag length increases).
We show that MPSS libraries are significantly less complex than much smaller SAGE libraries, revealing a serious bias in the generation of MPSS data unlikely to have been circumvented by later technological improvements. Our results emphasize the need for the rigorous testing of new expression profiling technologies.
Serial Analysis of Gene Expression (SAGE) is becoming a widely used gene expression profiling method for the study of development, cancer and other human diseases. Investigators using SAGE rely heavily on the quantitative aspect of this method for cataloging gene expression and comparing multiple SAGE libraries. We have developed additional computational and statistical tools to assess the quality and reproducibility of a SAGE library. Using these methods, a critical variable in the SAGE protocol was identified that has the potential to bias the Tag distribution relative to the GC content of the 10 bp SAGE Tag DNA sequence. We also detected this bias in a number of publicly available SAGE libraries. It is important to note that the GC content bias went undetected by quality control procedures in the current SAGE protocol and was only identified with the use of these statistical analyses on as few as 750 SAGE Tags. In addition to keeping any solution of free DiTags on ice, an analysis of the GC content should be performed before sequencing large numbers of SAGE Tags to be confident that SAGE libraries are free from experimental bias.
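A GC content check of the kind recommended above can start from a simple summary of the tag population. The sketch below only tabulates the GC distribution of a list of 10 bp tags; the statistical comparison against an unbiased reference library is not described in the abstract and is not attempted here. The function names and the example tags are hypothetical.

```python
from collections import Counter

def gc_count(tag):
    """Number of G or C bases in a tag."""
    return sum(base in "GC" for base in tag.upper())

def gc_histogram(tags):
    """Count how many tags contain 0, 1, 2, ... G/C bases."""
    return Counter(gc_count(tag) for tag in tags)

def mean_gc_fraction(tags):
    """Average fraction of G/C bases per tag."""
    return sum(gc_count(tag) / len(tag) for tag in tags) / len(tags)

# Toy tags (invented), one entry per sequenced tag occurrence.
tags = ["ACGTACGTAC", "GGGGCCCCGG", "ATATATATAT"]
histogram = gc_histogram(tags)
mean_gc = mean_gc_fraction(tags)
```

A histogram that is shifted toward AT-rich or GC-rich tags relative to a trusted library from comparable material would be a signal to investigate before committing to large-scale sequencing.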
Serial Analysis of Gene Expression (SAGE) is a new technique that provides detailed quantitative and qualitative knowledge of the gene expression profile without prior knowledge of the sequences of the analyzed genes. We applied a modification of the SAGE methodology (microSAGE), useful for the analysis of limited quantities of tissue samples, to normal human cervical tissue obtained from a donor without histopathological lesions. Cervical epithelium consists mainly of cervical keratinocytes, which are the targets of human papilloma virus (HPV); persistent HPV infection of cervical epithelium is associated with an increased risk of developing cervical carcinoma (CC).
We report here a transcriptome analysis of cervical tissue by SAGE, derived from 30,418 sequenced tags that provide a wealth of information about the gene products involved in normal cervical epithelium physiology, as well as genes not previously found in uterine cervix tissue involved in the process of epidermal differentiation.
This first comprehensive and in-depth analysis of the uterine cervix transcriptome should be useful for the identification of genes involved in normal uterine cervix function, and of candidate genes associated with cervical carcinoma.
Head and neck squamous cell carcinoma (HNSCC) is one of the most common malignancies in humans. The average 5-year survival rate is one of the lowest among aggressive cancers, showing no significant improvement in recent years. When detected early, HNSCC has a good prognosis, but most patients present metastatic disease at the time of diagnosis, which significantly reduces survival rate. Despite extensive research, no molecular markers are currently available for diagnostic or prognostic purposes.
Aiming to identify differentially-expressed genes involved in laryngeal squamous cell carcinoma (LSCC) development and progression, we generated individual Serial Analysis of Gene Expression (SAGE) libraries from a metastatic and non-metastatic larynx carcinoma, as well as from a normal larynx mucosa sample. Approximately 54,000 unique tags were sequenced in three libraries.
Statistical data analysis identified a subset of 1,216 differentially expressed tags between tumor and normal libraries, and 894 differentially expressed tags between metastatic and non-metastatic carcinomas. Three genes displaying differential regulation, one down-regulated (KRT31) and two up-regulated (BST2, MFAP2), as well as one with a non-significant differential expression pattern (GNA15) in our SAGE data were selected for real-time polymerase chain reaction (PCR) in a set of HNSCC samples. Consistent with our statistical analysis, quantitative PCR confirmed the upregulation of BST2 and MFAP2 and the downregulation of KRT31 when samples of HNSCC were compared to tumor-free surgical margins. As expected, GNA15 presented a non-significant differential expression pattern when tumor samples were compared to normal tissues.
To the best of our knowledge, this is the first study reporting SAGE data in head and neck squamous cell tumors. Statistical analysis was effective in identifying differentially expressed genes reportedly involved in cancer development. The differential expression of a subset of genes was confirmed in additional larynx carcinoma samples and in carcinomas from a distinct head and neck subsite. This result suggests the existence of potential common biomarkers for prognosis and targeted-therapy development in this heterogeneous type of tumor.
Serial Analysis of Gene Expression (SAGE) is a method of large-scale gene expression analysis that has the potential to generate the full list of mRNAs present within a cell population at a given time and their frequency. An essential step in SAGE library analysis is the unambiguous assignment of each 14 bp tag to the transcript from which it was derived. This process, called tag-to-gene mapping, represents a step that has to be improved in the analysis of SAGE libraries. Indeed, the existing web sites providing correspondence between tags and transcripts do not concern all species for which numerous EST and cDNA have already been sequenced.
For this reason we designed and implemented a freely available tool called Identitag for tag identification that can be used in any species for which transcript sequences are available. Identitag is based on a relational database structure in order to allow rapid and easy storage and updating of data and, most importantly, to make it possible to define identification parameters precisely. This structure can be viewed as three interconnected modules: the first stores virtual tags extracted from a given list of transcript sequences, the second stores experimental tags observed in SAGE experiments, and the third allows the annotation of the transcript sequences used for virtual tag extraction. It therefore connects an observed tag to a virtual tag and to the sequence it comes from, and then to its functional annotation when available. Databases made from different species can be connected according to orthology relationships, thus allowing the comparison of SAGE libraries between species. We successfully used Identitag to identify tags from our chicken SAGE libraries and for chicken-to-human interspecies comparison of SAGE tags. Identitag sources are freely available on the web site.
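The link between virtual tags and observed tags can be illustrated with a small in-memory sketch. This is not Identitag's code or schema: it assumes the common SAGE convention of an NlaIII anchoring site (CATG) with the tag taken at the 3'-most site, uses a dictionary in place of the relational database, and all function names and sequences are hypothetical.

```python
from collections import defaultdict

ANCHOR = "CATG"   # assumed anchoring-enzyme site (NlaIII); not stated in the text
TAG_LENGTH = 14   # anchor site plus 10 bases, matching the 14 bp tag in the text

def virtual_tag(transcript, anchor=ANCHOR, tag_length=TAG_LENGTH):
    """Return the tag starting at the 3'-most anchor site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(anchor)
    # No site, or too little sequence after the site to give a full tag.
    if pos == -1 or pos + tag_length > len(seq):
        return None
    return seq[pos:pos + tag_length]

def build_index(transcripts):
    """Map each virtual tag to the transcript identifiers that produce it."""
    index = defaultdict(list)
    for transcript_id, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            index[tag].append(transcript_id)
    return index

# Toy transcript sequences (invented).
transcripts = {
    "tx1": "GGCATGAAACCCGGGTTTCATGACGTACGTACTTTT",
    "tx2": "AAACCCGGG",  # no anchor site, so no virtual tag
}
index = build_index(transcripts)

# Look up observed tags; an empty list marks an unidentified (orphan) tag.
identified = index.get("CATGACGTACGTAC", [])
orphan = index.get("CATGTTTTTTTTTT", [])
```

Keeping a list of transcripts per tag makes ambiguous tags visible, since a tag that maps to more than one transcript cannot be assigned unambiguously.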
Identitag is a flexible and powerful tool for tag identification in any single species and for interspecies comparison of SAGE libraries. It opens the way to comparative transcriptomic analysis, an emerging branch of biology.
D2-40 has been shown to be a selective marker for lymphatic endothelium, and it is also expressed in benign cervical basal cells. However, the use of D2-40 immunoreactivity in cervical basal cells for identifying the grade of cervical intraepithelial neoplasia (CIN) has not been evaluated.
In this study, the immunoreactive patterns of D2-40 were compared with those of p16INK4A, which is currently considered a useful marker for cervical cancers and their precancerous diseases, in a total of 125 cervical specimens: 32 CIN1, 37 CIN2, 35 CIN3, and 21 normal cervical tissues. D2-40 and p16INK4A immunoreactivities were scored semiquantitatively according to the intensity and/or extent of the staining.
Diffuse D2-40 expression with moderate-to-strong intensity was seen in all the normal cervical epithelia (21/21, 100%), and a similar pattern of D2-40 immunoreactivity with weak-to-strong intensity was observed in CIN1 (31/32, 97.2%). However, negative and/or focal D2-40 expression was found in CIN2 (negative: 20/37, 54.1%; focal: 16/37, 43.2%) and CIN3 (negative: 22/35, 62.8%; focal: 12/35, 34.3%). In contrast, diffuse immunostaining for p16INK4A was seen in 37.5% of CIN1, 64.9% of CIN2, and 80.0% of CIN3. The immunoreactive pattern of D2-40 was not associated with p16INK4A immunoreactivity.
Immunohistochemical analysis of D2-40 combined with p16INK4A may have a significant implication in clinical practice for better identifying the grade of cervical intraepithelial neoplasia, especially for distinguishing CIN1 from CIN2/3.
D2-40; cervical intraepithelial neoplasia; immunohistochemistry; p16INK4A
Worldwide, gastric carcinoma shows marked geographical variation, with worse outcomes in patients from the West than in those from the East. Although these differences have been explained by better diagnostic criteria, improved staging methods and more radical surgery, emerging evidence supports the concept that gene expression differences associated with ethnicity might contribute to this disparate outcome. Here, we collected datasets from 4 normal and 11 gastric carcinoma Serial Analysis of Gene Expression (SAGE) libraries from two different ethnicities. All normal SAGE libraries as well as 7 tumor libraries were from the West and 4 tumor libraries were from the East. We compared these datasets by Correspondence Analysis and Support Tree analysis, and specific differences in tag expression were identified by Significance Analysis of Microarrays. Tag-to-gene assignments were performed with CGAP-SAGE Genie or TAGmapper. The analysis of the global transcriptome shows a clear separation between normal and tumor libraries, with 90 tags differentially expressed. A clear separation was also found between the West and the East tumor libraries, with 54 tags differentially expressed. Tag-to-gene assignments identified 15 genes, 5 of them with significantly higher expression in the West libraries than in the East libraries. qRT-PCR in cell lines of Western and Eastern origin confirmed these differences. Interestingly, two of these genes have been associated with aggressiveness (COL1A1 and KLK10). In conclusion, we found that in silico analysis of SAGE libraries from two different ethnicities reveals differences in gene expression profile. These expression differences might help explain the disparate outcome between the West and the East.
Cervical cancer is the second most common female cancer worldwide. The ability to quantify physiological and morphological changes in the cervix is not only useful in the diagnosis of cervical precancers but also important in aiding the design of cost-effective detection systems for use in developing countries that lack well-established screening and diagnostic programs. We assessed the capability of a diffuse reflectance spectroscopy technique to identify contrasts in optical biomarkers that distinguish different grades of cervical intraepithelial neoplasia (CIN) from normal cervical tissues. The technology consists of an optical probe, an instrument (with broadband light source, dispersive element, and detector), and a Monte Carlo algorithm to extract optical biomarker contributions, including total hemoglobin (Hb) concentration, Hb saturation, and reduced scattering coefficient, from the measured spectra. Among 38 patients and 89 sites examined, 46 squamous normal sites, 18 CIN 1, and 15 CIN 2+ sites were included in the analysis. Total Hb was significantly higher in CIN 2+ (18.3 ± 3.6 µM, mean ± SE) compared with normal (9.58 ± 1.91 µM) and CIN 1 (12.8 ± 2.6 µM), whereas scattering was significantly reduced in CIN 1 (8.3 ± 0.8 cm⁻¹) and CIN 2+ (8.6 ± 1.0 cm⁻¹) compared with normal (10.2 ± 1.1 cm⁻¹). Hemoglobin saturation was not significantly altered in CIN 2+ compared with normal and CIN 1. The difference in total Hb is likely due to stromal angiogenesis, whereas decreased scattering can be attributed to breakdown of the collagen network in the cervical stroma.
Five species of the genus Schistosoma, a parasitic trematode flatworm, are causative agents of Schistosomiasis, a disease that is endemic in a large number of developing countries, affecting millions of patients around the world. By using SAGE (Serial Analysis of Gene Expression) we describe here the first large-scale quantitative analysis of the Schistosoma mansoni transcriptome, one of the most epidemiologically relevant species of this genus.
After extracting mRNA from pooled male and female adult worms, a SAGE library was constructed and sequenced, generating 68,238 tags that covered more than 6,000 genes expressed in this developmental stage. An analysis of the ordered tag list shows the genes of F10 eggshell protein, pol-polyprotein, HSP86, 14-3-3 and a transcript yet to be identified to be the five most abundant genes in pooled adult worms. Whereas only 8% of the 100 most abundant tags found in adult worms of S. mansoni could not be assigned to transcripts of this parasite, 46.9% of the total ditags could not be mapped, demonstrating that the 3′ sequences of most of the rarest transcripts are still to be identified. Mapping of our SAGE tags to S. mansoni genes suggested the occurrence of alternative polyadenylation in at least 13 gene transcripts. Most of these events seem to shorten the 3′ UTR of the mRNAs, which may have consequences for their stability and regulation.
SAGE revealed the frequency of expression of the majority of the S. mansoni genes. Transcriptome data suggest that alternative polyadenylation is likely to be used in the control of mRNA stability in this organism. When the transcriptome was compared with the available proteomic data, we observed a correlation of about 50%, suggesting that both transcriptional and post-transcriptional regulation are important for determining protein abundance in S. mansoni. The generation of SAGE tags from other life-cycle stages should help reveal the dynamics of gene expression in this important parasite.
Neural tube defects (NTDs) are common human birth defects with a complex etiology. To develop a comprehensive knowledge of the genes expressed during normal neurulation, we established transcriptomes from human neural tube fragments during and after neurulation using long Serial Analysis of Gene Expression (long-SAGE).
Rostral and caudal neural tubes were dissected from normal human embryos aged between 26 and 32 days of gestation. Tissues from the same region and Carnegie stage were pooled (n ≥ 4) and total RNA was extracted to construct four long-SAGE libraries. Tags were mapped using the UniGene Homo sapiens 17 bp tag-to-gene best mapping set. Differentially expressed genes were identified by chi-square or Fisher’s exact test, and a subset of those transcripts was validated using in situ hybridization. In silico analyses were performed with BinGO and EXPANDER.
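As an illustration of the kind of test named above, the following is a minimal sketch of a two-sided Fisher's exact test applied to the count of one tag in two SAGE libraries. It is not the authors' pipeline: the 2x2 layout (tag count versus all other tags, per library), the function names and the example counts are assumptions made for illustration. Log-gamma is used so that large library totals do not overflow.

```python
import math


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def log_p(x):
        return (_log_comb(row1, x) + _log_comb(row2, col1 - x)
                - _log_comb(n, col1))

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    log_observed = log_p(a)
    total = 0.0
    for x in range(lowest, highest + 1):
        lp = log_p(x)
        if lp <= log_observed + 1e-7:  # small tolerance for floating-point ties
            total += math.exp(lp)
    return min(1.0, total)


def tag_p_value(count1, total1, count2, total2):
    """p-value for one tag seen count1 times among total1 tags in library 1
    and count2 times among total2 tags in library 2 (hypothetical layout)."""
    return fisher_exact_two_sided(count1, total1 - count1,
                                  count2, total2 - count2)


# Hypothetical counts: a tag seen 30 times in 50,000 tags vs 8 in 48,000.
p = tag_p_value(30, 50000, 8, 48000)
```

In practice one p-value is computed per tag, so some correction for multiple testing would normally follow; the abstract does not say which, if any, was used.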
We observed most genes to be similarly regulated in rostral and caudal regions, but expression profiles differed during and after closure. In silico analysis found similar enrichments in both regions for biological process terms, transcription factor binding and miRNA target motifs. Twelve genes potentially expressing alternate isoforms by region or developmental stage, and the miRNAs miR-339-5p, miR-141/200a, miR-23ab, and miR-129/129-5p, are among several potential candidates identified here for future research.
Time appears to influence gene expression in the developing central nervous system more than location. These data provide a novel complement to traditional strategies of identifying genes associated with human NTDs, and offer unique insight into the genes associated with normal human neurulation.
gene expression; Homo sapiens; long-SAGE; neurulation; neural tube defects
Cervical carcinoma is the second most frequent type of cancer in women. Success in diagnosing this disease is due to the use of the Pap test (cytological smear analysis). However, the Pap test gives a significant proportion of both false-positive and false-negative results, so improvements to the diagnostic procedure are desirable. The aetiological role of papillomaviruses in cervical cancer is established, while the role of cellular gene alterations in the course of tumor progression is less clear. Several research groups, including ours, have recently proposed the protein p16INK4a as a possible diagnostic marker of cervical cancer. To evaluate whether the specificity of p16INK4a expression in dysplastic and neoplastic cervical epithelium is sufficient for such an application, we undertook a broader immunohistochemical survey of this protein with a highly p16INK4a-specific monoclonal antibody.
Paraffin-embedded samples of diagnostic biopsies and surgical materials were used. The control group included vaginal smears of healthy women and biopsy samples from patients with cervical ectopia. We examined 197 samples in total. Monoclonal antibody E6H4 (MTM Laboratories, Germany) was used.
In control samples we did not find any p16INK4a-positive cells. Overexpression of p16INK4a was detected in samples of cervical dysplasia (CINs) and carcinomas. The proportion of p16INK4a-positive samples increased in the order CIN I, CIN II, CIN III, invasive carcinoma. At all stages the samples were heterogeneous with respect to p16INK4a expression. One third of CIN III lesions and one invasive squamous cell carcinoma (out of 21 analyzed) were negative.
Overexpression of the protein p16INK4a is typical of dysplastic and neoplastic epithelium of the cervix uteri. However, p16INK4a-negative CINs and carcinomas do exist. All stages of CIN and carcinoma analyzed are heterogeneous with respect to p16INK4a expression, so p16INK4a negativity is not a sufficient reason to exclude a patient from the high-risk group. Because normal cervical epithelium is p16INK4a-negative and the ratio of p16INK4a-positive to p16INK4a-negative samples increases at advanced stages, an immunohisto-/cytochemical test for p16INK4a may be regarded as a supplementary test for the early diagnosis of cervical cancer.
Co-factors for cervical cancer, including oral contraceptive (OC) use, smoking and multiparity have been identified; however, the stage at which they act in cervical carcinogenesis is not clear. We compared established risk factors among women with CIN2 and CIN3 to evaluate the heterogeneity of these factors in precancer and also assessed their role during cervical carcinogenesis.
The current analysis included 2783 women with various stages of cervical disease who were enrolled in the Study to Understand Cervical Cancer Early Endpoints and Determinants (SUCCEED) and the Biopsy Study. Associations of co-factors within cervical precancer and at different stages of cervical carcinogenesis were estimated using logistic regression.
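For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the following is a minimal sketch for the simplest case of a single binary exposure, where the unadjusted logistic regression estimate equals the cross-product ratio of a 2x2 table and the Wald interval is computed on the log scale. It is not the study's model, which may include adjustment for other covariates; the function name and the counts are hypothetical.

```python
import math


def odds_ratio_wald_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    With z=1.96 the interval is an approximate 95% CI. All four cells
    must be positive, otherwise the estimate is undefined.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts (not from the study): here "cases" would be women
# with CIN3 and "controls" women with CIN2, split by exposure status.
or_, low, high = odds_ratio_wald_ci(20, 80, 10, 90)
```

An interval that excludes 1 corresponds to an association that is significant at roughly the 5% level.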
Long-term OC use (10+ years vs. never: OR=2.42, 95% CI: [1.13–5.15]), multiparity (3+ births vs. nulliparous: OR=1.54 [1.04–2.28]), smoking (ever vs. never: OR=1.95 [1.48–2.58]), and no Pap test in the previous five years (2.05 [1.32–3.17]) were positively associated with CIN3 compared to CIN2. We observed that long-term OC use, parity and smoking were associated with an increased risk of CIN3 compared to
Differences in established risk factors suggest that CIN3 is a more specific definition of precancer than CIN2. Hormonally-related factors and smoking play a role in the transition from human papillomavirus infection to precancer.
The retinoblastoma gene was the first tumour suppressor gene to be identified; it is altered not only in retinoblastomas but also in a wide variety of human neoplasms. The retinoblastoma gene encodes a nuclear phosphoprotein that, in its hypophosphorylated state, plays an important role in regulating the cell cycle, thus preventing tumour formation. Expression of the retinoblastoma gene protein product (pRB) was investigated in 118 formalin-fixed, paraffin-embedded cervical tissues by immunohistochemistry using a commercially available antibody directed against the RB protein. Ten normal ectocervical epithelia, 16 cervical intraepithelial neoplasia (CIN) I, 13 CIN II, 14 CIN III, 53 invasive squamous cell carcinomas, 11 adenocarcinomas and 1 small cell carcinoma were selected for this study. The proportions of pRB-positive cells as well as the extent of pRB expression in ectocervical squamous epithelium were assessed and compared among the lesions. pRB expression was observed in 100% of normal ectocervical epithelia (n=10), 100% of CIN lesions (n=43) and 98.5% of invasive carcinomas of the uterine cervix (n=65), and the differences were statistically significant when CIN or CIN/invasive cases were compared to normal cases (P < 0.01 and P < 0.05, respectively). Among invasive squamous cell carcinomas (SCC), pRB-positive cells were found in much higher percentages in well differentiated SCC (81.8%, 9/11) than in moderately differentiated (64.3%, 18/28) or poorly differentiated SCC (7.1%, 1/14) (P < 0.01). The results of this study suggest that loss of RB protein expression is rare in carcinoma of the uterine cervix and that this protein may be important in the pathogenesis of cervical carcinoma.
Uterine cervix; pRB; immunohistochemistry