Relative quantification by normalization against a stably expressed reference gene is a widely used data analysis method in microarray and quantitative real-time polymerase chain reaction (qRT-PCR) platforms; however, recent evidence suggests that many commonly utilized reference genes are unstable in certain experimental systems and situations. The primary aim of this study, therefore, was to screen and identify stably expressed reference genes in a well-established rat model of vocal fold mucosal injury. We selected and evaluated the expression stability of nine candidate reference genes. Ablim1, Sptbn1 and Wrnip1 were identified as stably expressed in a model-specific microarray dataset and were further validated as suitable reference genes in an independent qRT-PCR experiment using 2^-ΔCT and pairwise comparison-based (geNorm) analyses. Parallel analysis of six commonly used reference genes identified Sdha as the only stably expressed candidate in this group. Sdha, Sptbn1 and the geometric mean of Sdha and Sptbn1 each provided accurate normalization of target gene Tgfb1; Gapdh, the least stable candidate gene in our dataset, provided inaccurate normalization and an invalid experimental result. The stable reference genes identified here are suitable for accurate normalization of target gene expression in vocal fold mucosal injury experiments.
gene expression; geNorm; housekeeping gene; inflammation; larynx; tissue repair; transcription; wound healing
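The 2^-ΔCT normalization against a single reference gene, or against the geometric mean of two reference genes, can be sketched as follows. This is a minimal illustration and not the authors' analysis code: it assumes roughly 100% amplification efficiency (one doubling per cycle), and the CT values are hypothetical, not taken from the study.

```python
import math

def geometric_mean(values):
    """Geometric mean of a list of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def relative_expression(ct_target, ct_refs):
    """Return 2^-dCT for a target gene against one or more reference genes.

    Each CT is converted to a relative quantity 2^-CT (assumes ~100%
    amplification efficiency); the target quantity is divided by the
    geometric mean of the reference quantities.
    """
    ref_quantity = geometric_mean([2.0 ** -ct for ct in ct_refs])
    return (2.0 ** -ct_target) / ref_quantity

# Hypothetical CT values for one sample (illustrative only).
ct = {"Tgfb1": 24.0, "Sdha": 22.0, "Sptbn1": 20.0}

single_ref = relative_expression(ct["Tgfb1"], [ct["Sdha"]])
two_refs = relative_expression(ct["Tgfb1"], [ct["Sdha"], ct["Sptbn1"]])
print(single_ref, two_refs)
```

Taking the geometric mean of the reference quantities is equivalent to averaging the reference CT values arithmetically, which is why an unstable reference such as Gapdh in the abstract above shifts every normalized value by the same factor as its own drift.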
Serial analysis of gene expression (SAGE) is a powerful technique that can be used for global analysis of gene expression. Its chief advantage over other methods is that SAGE does not require prior knowledge of the genes of interest and provides quantitative and qualitative data of potentially every transcribed sequence in a particular tissue or cell type. Furthermore, SAGE can quantify low-abundance transcripts and reliably detect relatively small differences in transcript abundance between cell populations. However, SAGE demands high input levels of mRNA which are often unavailable, particularly when studying human disease. To overcome this limitation, we have developed a modification of SAGE that allows detailed global analysis of gene expression in extremely small quantities of tissue or cultured cells. We have called this approach 'SAGE-Lite'. This technique was used for the global analysis of transcription in samples of normal and pathological human cerebrovasculature to study the molecular pathology of intracranial aneurysms. These samples, which are obtained during operative surgical repair, are typically no bigger than 1 or 2 mm and yield <100 ng of total RNA. In addition, we show that SAGE-Lite allows simple and rapid isolation of long cDNAs from short (15 bp) SAGE sequence tags.
Reference genes are widely used to normalise transcript abundance data determined by quantitative RT-PCR and microarrays. However, the approaches taken to define reference genes can be variable. Although Oryza sativa (rice) is a widely used model plant and important crop species, there has been no comprehensive analysis carried out to define superior reference genes.
Analysis of 136 Affymetrix transcriptome datasets comprising 373 genome microarrays from studies in rice that encompass tissue, developmental, abiotic, biotic and hormonal transcriptome datasets identified 151 genes whose expression was considered relatively stable under all conditions. A subset of 12 of these genes was validated by quantitative RT-PCR and found to be stable under a number of conditions. All but one of the genes previously proposed as stably expressed in rice were observed to change significantly under some treatment.
A new set of reference genes that are stable across tissue, development, stress and hormonal treatments has been identified in rice. This provides a superior set of reference genes for future studies in rice. It confirms the approach of mining large-scale datasets as a robust method to define reference genes, but cautions against using gene orthology or counterparts of reference genes in other plant species as a means of defining reference genes.
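One simple way to mine a collection of arrays for stably expressed genes is to rank genes by their coefficient of variation across all arrays. The sketch below is only an assumption about how such a screen could look; the abstract above does not state which stability criterion was used, and the gene names and values are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)

def rank_by_stability(expression):
    """Sort genes from most to least stable.

    expression: dict mapping gene name -> list of expression values,
    one value per array. A lower coefficient of variation is treated
    as more stable (an assumed criterion, not the study's).
    """
    return sorted(expression,
                  key=lambda g: coefficient_of_variation(expression[g]))

# Hypothetical expression values across four arrays.
expression = {
    "geneA": [100, 102, 98, 101],
    "geneB": [50, 200, 80, 400],
    "geneC": [10, 12, 9, 11],
}
ranking = rank_by_stability(expression)
print(ranking)
```

Candidates ranked this way would still need validation by quantitative RT-PCR, as was done for the 12-gene subset.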
Quantification and normalization of RT-qPCR data depend critically on the expression of so-called reference genes. Our goal was to develop a strategy for the selection of reference genes that utilizes microarray data analysis and combines known approaches for gene stability evaluation, and to use this strategy to select a set of appropriate reference genes for research and clinical analysis of breast samples with different receptor and cancer status.
A preliminary search of reference genes was based on high-throughput analysis of microarray datasets. The final selection and validation of the candidate genes were based on the RT-qPCR data analysis using several known methods for expression stability evaluation: comparative ∆Ct method, geNorm, NormFinder and Haller equivalence test.
A set of five reference genes was identified: ACTB, RPS23, HUWE1, EEF1A1 and SF3A1. The initial selection was based on the analysis of publicly available, well-annotated microarray datasets containing different breast cancers and normal breast epithelium from breast cancer patients and epithelium from cancer-free patients. The final selection and validation were performed using RT-qPCR data from 39 breast cancer biopsy samples. Three genes from the final set were identified by means of microarray analysis and were novel in the context of breast cancer assays. We showed that the selected set of reference genes is more stable not only than individual genes, but also than the system of reference genes used in the commercial OncotypeDX test.
Selection of reference genes for RT-qPCR can be performed efficiently by combining a preliminary search based on high-throughput analysis of microarray datasets with a final selection and validation based on the analysis of RT-qPCR data, with simultaneous examination of different expression stability measures. The identified set of reference genes proved to be less variable and thus potentially more efficient for research and clinical analysis of breast samples compared with individual genes and the set of reference genes used in the OncotypeDX assay.
Reference genes; Microarrays; Reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR); Gene expression; Breast cancer
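Of the stability measures named in this abstract, the geNorm measure M is straightforward to sketch. For each candidate gene, M is the average, over all other candidates, of the standard deviation across samples of the log2 expression ratio between the two genes. The code below is a minimal sketch that assumes 100% amplification efficiency, so that log2 ratios reduce to CT differences; the gene names and CT values are hypothetical.

```python
from statistics import stdev

def genorm_m(ct):
    """Compute the geNorm stability measure M for each gene.

    ct: dict mapping gene name -> list of CT values, with the same
    sample order for every gene. Lower M means more stable.
    """
    genes = list(ct)
    m_values = {}
    for j in genes:
        pairwise_sds = []
        for k in genes:
            if k == j:
                continue
            # log2(expr_j / expr_k) per sample; with 100% efficiency
            # expr = 2^-CT, so the log ratio is CT_k - CT_j.
            log_ratios = [ck - cj for cj, ck in zip(ct[j], ct[k])]
            pairwise_sds.append(stdev(log_ratios))
        m_values[j] = sum(pairwise_sds) / len(pairwise_sds)
    return m_values

# Hypothetical CT values for three candidates in four samples.
ct = {
    "refA": [20, 21, 20, 21],
    "refB": [22, 23, 22, 23],  # tracks refA exactly
    "refC": [25, 22, 27, 21],  # varies independently
}
m = genorm_m(ct)
print(m)
```

In this toy input refA and refB move together and so receive equal, lower M values, while refC is flagged as the least stable. The full geNorm procedure also removes the worst gene stepwise and recomputes M, which is not shown here.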
High-throughput genomic technologies (HGTs), including next-generation DNA sequencing (NGS), microarray, and serial analysis of gene expression (SAGE), have become effective experimental tools for cancer genomics to identify cancer-associated somatic genomic alterations and genes. The main hurdle in cancer genomics is to identify the real causative mutations or genes out of many candidates from an HGT-based cancer genomic analysis. One useful approach is to refer to known cancer genes and associated information. The list of known cancer genes can be used to determine candidates of cancer driver mutations, while cancer gene-related information, including gene expression, protein-protein interaction, and pathways, can be useful for scoring novel candidates. Some cancer gene or mutation databases exist for this purpose, but few specialized tools exist for an automated analysis of a long gene list from an HGT-based cancer genomic analysis. This report presents a new web-accessible bioinformatic tool, called CaGe, a cancer genome annotation system for the assessment of candidates of cancer genes from HGT-based cancer genomics. The tool provides users with information on cancer-related genes, mutations, pathways, and associated annotations through annotation and browsing functions. With this tool, researchers can classify their candidate genes from cancer genome studies into either previously reported or novel categories of cancer genes and gain insight into underlying carcinogenic mechanisms through a pathway analysis. We show the usefulness of CaGe by assessing its performance in annotating somatic mutations from a published small cell lung cancer study.
annotation; cancer gene; high-throughput genomic technology; mutation; next-generation sequencing; pathway
Serial Analysis of Gene Expression (SAGE) is becoming a widely
used gene expression profiling method for the study of development,
cancer and other human diseases. Investigators using SAGE rely heavily
on the quantitative aspect of this method for cataloging gene expression
and comparing multiple SAGE libraries. We have developed additional
computational and statistical tools to assess the quality and reproducibility
of a SAGE library. Using these methods, a critical variable in the
SAGE protocol was identified that has the potential to bias the
Tag distribution relative to the GC content of the 10 bp SAGE Tag
DNA sequence. We also detected this bias in a number of publicly
available SAGE libraries. It is important to note that the GC content bias
went undetected by quality control procedures in the current SAGE
protocol and was only identified with the use of these statistical
analyses on as few as 750 SAGE Tags. In addition to keeping any
solution of free DiTags on ice, an analysis of the GC content should
be performed before sequencing large numbers of SAGE Tags to be
confident that SAGE libraries are free from experimental bias.
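A basic GC content summary of a small set of 10 bp tags can be computed as sketched below. This is an illustrative sketch only: the abstract does not describe the statistical tools themselves, so the code simply tabulates GC content and leaves out any comparison against an expected distribution. The tags and counts are hypothetical.

```python
from collections import Counter

def gc_count(tag):
    """Number of G or C bases in a tag."""
    return sum(base in "GC" for base in tag.upper())

def gc_content(tag):
    """Fraction of G or C bases in a tag."""
    return gc_count(tag) / len(tag)

def gc_histogram(tag_counts):
    """Total tag count observed at each GC base count.

    tag_counts: dict mapping tag sequence -> number of occurrences.
    """
    hist = Counter()
    for tag, count in tag_counts.items():
        hist[gc_count(tag)] += count
    return hist

def mean_gc(tag_counts):
    """Count-weighted mean GC fraction of a library."""
    total = sum(tag_counts.values())
    return sum(gc_content(t) * c for t, c in tag_counts.items()) / total

# Hypothetical 10 bp tags with occurrence counts.
tags = {"AAAAAAAAAA": 3, "GGGGGCCCCC": 1, "ACGTACGTAC": 2}
hist = gc_histogram(tags)
print(sorted(hist.items()), mean_gc(tags))
```

A histogram skewed toward high or low GC counts relative to a trusted library would be the signal to investigate before sequencing further.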
Genome-wide techniques such as microarray analysis, Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS), linkage analysis and association studies are used extensively in the search for genes that cause diseases, and often identify many hundreds of candidate disease genes. Selection of the most probable of these candidate disease genes for further empirical analysis is a significant challenge. Additionally, identifying the genes that cause complex diseases is problematic due to low penetrance of multiple contributing genes. Here, we describe a novel bioinformatic approach that selects candidate disease genes according to their expression profiles. We use the eVOC anatomical ontology to integrate text-mining of biomedical literature and data-mining of available human gene expression data. To demonstrate that our method is successful and widely applicable, we apply it to a database of 417 candidate genes containing 17 known disease genes. We successfully select the known disease gene for 15 out of 17 diseases and reduce the candidate gene set to 63.3% (±18.8%) of its original size. This approach facilitates direct association between genomic data describing gene expression and information from biomedical texts describing disease phenotype, and successfully prioritizes candidate genes according to their expression in disease-affected tissues.
Gene expression data are a rich source of information about the transcriptional dysregulation of genes in cancer. Genes that display differential regulation in cancer are a subtype of cancer biomarkers.
We present an approach to mine expressed sequence tags to discover cancer biomarkers. A false discovery rate analysis suggests that the approach generates less than 22% false discoveries when applied to combined human and mouse whole genome screens. With this approach, we identify the 200 genes most consistently differentially expressed in cancer (called HM200) and proceed to characterize these genes. When used for prediction in a variety of cancer classification tasks (in 24 independent cancer microarray datasets, 59 classifications total), we show that HM200 and the shorter gene list HM100 are very competitive cancer biomarker sets. Indeed, when compared to 13 published cancer marker gene lists, HM200 achieves the best or second best classification performance in 79% of the classifications considered.
These results indicate the existence of at least one general cancer marker set whose predictive value spans several tumor types and classification types. Our comparison with other marker gene lists shows that HM200 markers are mostly novel cancer markers. We also identify the previously published Pomeroy-400 list as another general cancer marker set. Strikingly, Pomeroy-400 has 27 genes in common with HM200. Our data suggest that a core set of genes are responsive to the deregulation of pathways involved in tumorigenesis in a variety of tumor types and that these genes could serve as transcriptional cancer markers in applications of clinical interest. Finally, our study suggests new strategies to select and evaluate cancer biomarkers in microarray studies.
Serial analysis of gene expression (SAGE) is a powerful approach for the identification of differentially expressed genes, providing comprehensive and quantitative gene expression profiles in the form of short tag sequences. Each tag represents a unique transcript, and the relative frequencies of tags in the SAGE library are equal to the relative proportions of the transcripts they represent. One of the major obstacles in the preparation of SAGE libraries from microorganisms is the requirement for large amounts of starting material (i.e., mRNA). Here, we present a novel approach for the construction of SAGE libraries from small quantities of total RNA by using Y linkers to selectively amplify 3′ cDNA fragments. To validate this method, we constructed comprehensive gene expression profiles of the toxic dinoflagellate Pfiesteria shumwayae. SAGE libraries were constructed from an actively toxic fish-fed culture of P. shumwayae and from a recently toxic alga-fed culture. P. shumwayae-specific gene transcripts were identified by comparison of tag sequences in the two libraries. Representative tags with frequencies ranging from 0.026 to 3.3% of the total number of tags in the libraries were chosen for further analysis. Expression of each transcript was confirmed in separate control cultures of toxic P. shumwayae. The modified SAGE method described here produces gene expression profiles that appear to be both comprehensive and quantitative, and it is directly applicable to the study of gene expression in other environmentally relevant microbial species.
Serial analysis of gene expression (SAGE) is a powerful quantification technique for gene expression data. The huge
amount of tag data in SAGE libraries of samples is difficult to analyze with current SAGE analysis tools. Data is often not
provided in a biologically significant way for cross‐analysis and ‐comparison, thus limiting its application.
Hence, an integrated software platform that can perform such a complex task is required. Here, we implement set theory for
cross‐analyzing gene expression data among different SAGE libraries of tissue sources; up‐ or down‐regulated
tissue‐specific tags can be identified computationally. Extract‐SAGE employs a genetic algorithm (GA) to reduce the
number of genes among the SAGE libraries. Its representative tag mining will facilitate the discovery of the candidate genes with
discriminating gene expression.
This software and user manual are freely available at
SAGE; genetic algorithm; set theory; software
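The set-theoretic cross-analysis of SAGE libraries can be illustrated with ordinary set operations. The sketch below is not the Extract-SAGE implementation; the function names, the `min_count` threshold, and the libraries are hypothetical, and the genetic algorithm step is not shown.

```python
def expressed_tags(library, min_count=1):
    """Set of tags observed at least min_count times in a library."""
    return {tag for tag, count in library.items() if count >= min_count}

def common_tags(libraries):
    """Tags expressed in every library (set intersection)."""
    tag_sets = [expressed_tags(lib) for lib in libraries.values()]
    return set.intersection(*tag_sets)

def tissue_specific_tags(libraries, tissue):
    """Tags expressed in one tissue and in no other (set difference)."""
    others = set()
    for name, lib in libraries.items():
        if name != tissue:
            others |= expressed_tags(lib)
    return expressed_tags(libraries[tissue]) - others

# Hypothetical libraries: tissue -> {tag: count}.
libraries = {
    "brain": {"AAAAAAAAAA": 5, "CCCCCCCCCC": 2},
    "liver": {"AAAAAAAAAA": 1, "GGGGGGGGGG": 7},
    "kidney": {"AAAAAAAAAA": 3, "CCCCCCCCCC": 1},
}
print(common_tags(libraries))
print(tissue_specific_tags(libraries, "liver"))
```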
The serial analysis of gene expression (SAGE) method is based on
the isolation of unique sequence tags from individual transcripts
and concatenation of tags serially into long DNA molecules. SAGE
is an innovative technique that offers the potential of
cataloging both the identity and relative frequencies of mRNA
transcripts in a given RNA preparation. It can quantify
low-abundance transcripts and reliably detect relatively small
differences in transcript abundance between cell populations.
SAGE data can be used to complement studies in cases where other
gene expression methods may be more convenient or
efficient. SAGE can be used in a wide variety of applications to
identify disease-related genes, to analyze the effect of drugs on
tissues, and to provide insights into the disease pathways. The
most important application of SAGE is the identification of
differentially expressed genes. In this review, we describe
various applications of this powerful technology in malarial
parasite, yeast, plant, and animal systems.
Microarray-based gene expression measurement is one of the major methods for transcriptome analysis. However, current microarray data are substantially affected by microarray platforms and RNA references because the microarray method can provide only the relative amounts of gene expression levels. Therefore, valid comparisons of the microarray data require standardized platforms, internal and/or external controls and complicated normalizations. These requirements impose limitations on the extensive comparison of gene expression data. Here, we report an effective approach to removing these limitations by measuring the absolute amounts of gene expression levels on common DNA microarrays. We have developed a multiplex cDNA quantification method called GEP-DEAN (Gene expression profiling by DCN-encoding-based analysis). The method was validated by using chemically synthesized DNA strands of known quantities and cDNA samples prepared from mouse liver, demonstrating that the absolute amounts of cDNA strands were successfully measured with a sensitivity of 18 zmol in a highly multiplexed manner in 7 h.
Reverse transcription quantitative real-time PCR (RT-qPCR) is widely used in microRNA (miRNA) expression studies on cancer. To compensate for the analytical variability produced by the multiple steps of the method, relative quantification of the measured miRNAs is required, which is based on normalization to endogenous reference genes. No study has been performed so far on reference miRNAs for normalization of miRNA expression in urothelial carcinoma. The aim of this study was to identify suitable reference miRNAs for miRNA expression studies by RT-qPCR in urothelial carcinoma.
Candidate reference miRNAs were selected from 24 urothelial carcinoma and normal bladder tissue samples by miRNA microarrays. The usefulness of these candidate reference miRNAs, together with the small nuclear RNAs commonly used for normalization (RNU6B, RNU48, and Z30), was thereafter validated by RT-qPCR in 58 tissue samples and analyzed by the algorithms geNorm, NormFinder, and BestKeeper.
Based on the miRNA microarray data, a total of 16 miRNAs were identified as putative reference genes. After validation by RT-qPCR, miR-101, miR-125a-5p, miR-148b, miR-151-5p, miR-181a, miR-181b, miR-29c, miR-324-3p, miR-424, miR-874, RNU6B, RNU48, and Z30 were used for geNorm, NormFinder, and BestKeeper analyses that gave different combinations of recommended reference genes for normalization.
The present study provided the first systematic analysis for identifying suitable reference miRNAs for miRNA expression studies of urothelial carcinoma by RT-qPCR. Different combinations of reference genes resulted in reliable expression data for both strongly and less strongly altered miRNAs. Notably, RNU6B, which is the most frequently used reference gene for miRNA studies, gave inaccurate normalization. The combination of four (miR-101, miR-125a-5p, miR-148b, and miR-151-5p) or three (miR-148b, miR-181b, and miR-874) reference miRNAs is recommended for normalization.
Oligoarrays have become an accessible technique for exploring the transcriptome, but it is presently unclear how absolute transcript data from this technique compare to the data achieved with tag-based quantitative techniques, such as massively parallel signature sequencing (MPSS) and serial analysis of gene expression (SAGE). By use of the TransCount method we calculated absolute transcript concentrations from spotted oligoarray intensities, enabling direct comparisons with tag counts obtained with MPSS and SAGE. The tag counts were converted to number of transcripts per cell by assuming that the sum of all transcripts in a single cell was 5·10^5. Our aim was to investigate whether the less resource demanding and more widespread oligoarray technique could provide data that were correlated to and had the same absolute scale as those obtained with MPSS and SAGE.
A total of 1,777 unique transcripts were detected in common by the three technologies and served as the basis for our analyses. The correlations involving the oligoarray data were not weaker than, but similar to, the correlation between the MPSS and SAGE data, both when the entire concentration range was considered and at high concentrations. The data sets were more strongly correlated at high transcript concentrations than at low concentrations. On an absolute scale, the number of transcripts per cell and gene was generally higher based on oligoarrays than on MPSS and SAGE, and ranged from 1.6 to 9,705 for the 1,777 overlapping genes. The MPSS data were on the same scale as the SAGE data, ranging from 0.5 to 3,180 (MPSS) and 9 to 1,268 (SAGE) transcripts per cell and gene. The sum of all transcripts per cell for these genes was 3.8·10^5 (oligoarrays), 1.1·10^5 (MPSS) and 7.6·10^4 (SAGE), whereas the corresponding sum for all detected transcripts was 1.1·10^6 (oligoarrays), 2.8·10^5 (MPSS) and 3.8·10^5 (SAGE).
The oligoarrays and TransCount provide quantitative transcript concentrations that are correlated to MPSS and SAGE data, but the absolute scale of the measurements differs across the technologies. The discrepancy raises the question of whether the sum of all transcripts within a single cell might be higher than the 5·10^5 suggested in the literature and used to convert tag counts to transcripts per cell. If so, this may explain the apparently higher transcript detection efficiency of the oligoarrays, and has to be clarified before absolute transcript concentrations can be interchanged across the technologies. The ability to obtain transcript concentrations from oligoarrays opens up the possibility of efficient generation of universal transcript databases with low resource demands.
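The conversion of tag counts to transcripts per cell used for the MPSS and SAGE data amounts to scaling each tag's share of the library by an assumed total of 500,000 (5·10^5) transcripts per cell. A minimal sketch follows; the function name and the example counts are hypothetical.

```python
import math

# Assumed total number of transcripts in a single cell (5 x 10^5),
# the literature value cited in the abstract.
TRANSCRIPTS_PER_CELL = 5e5

def tags_to_transcripts_per_cell(tag_count, total_tags,
                                 transcripts_per_cell=TRANSCRIPTS_PER_CELL):
    """Scale a tag's fraction of the library to transcripts per cell."""
    return tag_count / total_tags * transcripts_per_cell

# Hypothetical example: a tag seen 20 times in a library of 100,000 tags.
estimate = tags_to_transcripts_per_cell(20, 100_000)
# The same tag if the true total were 10^6 transcripts per cell.
estimate_high = tags_to_transcripts_per_cell(20, 100_000, 1e6)
print(estimate, estimate_high)
```

Because the estimate is linear in the assumed total, doubling that total doubles every converted value, which is why the choice of 5·10^5 matters when comparing absolute scales across technologies.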
The somatic embryogenesis tissue culture process has been utilized to propagate high yielding oil palm. Due to the low callogenesis and embryogenesis rates, molecular studies were initiated to identify genes regulating the process, and their expression levels are usually quantified using reverse transcription quantitative real-time PCR (RT-qPCR). With the recent release of oil palm genome sequences, it is crucial to establish a proper strategy for gene analysis using RT-qPCR. Selection of the most suitable reference genes should be performed for accurate quantification of gene expression levels.
In this study, eight candidate reference genes selected from a cDNA microarray study and literature review were evaluated comprehensively across 26 tissue culture samples using RT-qPCR. These samples were collected from two tissue culture lines and media treatments, and consisted of leaf explant cultures, callus and embryoids from consecutive developmental stages. Three statistical algorithms (geNorm, NormFinder and BestKeeper) confirmed that the expression stability of novel reference genes (pOP-EA01332, PD00380 and PD00569) outperformed classical housekeeping genes (GAPDH, NAD5, TUBULIN, UBIQUITIN and ACTIN). PD00380 and PD00569 were identified as the most stably expressed genes across all samples and within the MA2 and MA8 tissue culture lines. Their applicability to validate the expression profiles of a putative ethylene-responsive transcription factor 3-like gene demonstrated the importance of using the geometric mean of two genes for normalization.
Systematic selection of the most stably expressed reference genes for RT-qPCR was established in oil palm tissue culture samples. PD00380 and PD00569 were selected for accurate and reliable normalization of gene expression data from RT-qPCR. These data will be valuable to the research associated with the tissue culture process. Also, the method described here will facilitate the selection of appropriate reference genes in other oil palm tissues and in the expression profiling of genes relating to yield, biotic and abiotic stresses.
Serial Analysis of Gene Expression (SAGE) is a DNA sequencing-based method for large-scale gene expression profiling that provides an alternative to microarray analysis. Most analyses of SAGE data aimed at identifying co-expressed genes have been accomplished using various versions of clustering approaches that often result in a number of false positives.
Here we explore the use of seriation, a statistical approach for ordering sets of objects based on their similarity, for large-scale expression pattern discovery in SAGE data. For this specific task we implement a seriation heuristic we term ‘progressive construction of contigs’ that constructs local chains of related elements by sequentially rearranging margins of the correlation matrix. We apply the heuristic to the analysis of simulated and experimental SAGE data and compare our results to those obtained with a clustering algorithm developed specifically for SAGE data. We show using simulations that the performance of seriation compares favorably to that of the clustering algorithm on noisy SAGE data.
We explore the use of a seriation approach for visualization-based pattern discovery in SAGE data. Using both simulations and experimental data, we demonstrate that seriation is able to identify groups of co-expressed genes more accurately than a clustering algorithm developed specifically for SAGE data. Our results suggest that seriation is a useful method for the analysis of gene expression data whose applicability should be further pursued.
In the last decade, genome-wide gene expression data have been collected from a large number of cancer specimens. In many studies utilizing either microarray-based or knowledge-based gene expression profiling, both the validation of candidate genes and the identification and inclusion of biomarkers in prognosis modeling have employed real-time quantitative PCR on reverse transcribed mRNA (qRT-PCR) because of its inherent sensitivity and quantitative nature. In qRT-PCR data analysis, an internal reference gene is used to normalize the variation in input sample quantity. The relative quantification method used in current real-time qRT-PCR analysis fails to ensure the data comparability that is pivotal in the identification of prognostic biomarkers. By employing an absolute qRT-PCR system that uses a single standard for marker and reference genes (SSMR) to achieve absolute quantification, we showed that the normalized gene expression data are comparable and independent of variations in the quantities of sample as well as the standard used for generating standard curves. We compared two sets of normalized gene expression data with the same histological diagnosis of brain tumor from two labs using relative and absolute real-time qRT-PCR. Base-10 logarithms of the gene expression ratio relative to ACTB were evaluated for statistical equivalence between tumors processed by two different labs. The results showed an approximate comparability for normalized gene expression quantified using an SSMR-based qRT-PCR. Incomparable results were seen for the gene expression data using relative real-time qRT-PCR, due to inequality in molar concentration of two standards for marker and reference genes. Overall results show that SSMR-based real-time qRT-PCR ensures the comparability of gene expression data that is much needed in the establishment of prognostic/predictive models for cancer patients, a process that requires large sample sizes obtained by combining independent sets of data.
gene expression; quantification; qRT-PCR; biomarkers
As a growing number of complementary transcripts, likely to exert various regulatory functions, are being found in eukaryotes, high-throughput analytical methods are needed to investigate their expression in multiple biological samples. Serial Analysis of Gene Expression (SAGE), based on the enumeration of directionally reliable short cDNA sequences (tags), is capable of revealing antisense transcripts. We initially detected them by observing tags that mapped on to the reverse complement of known mRNAs. The presence of such tags in individual SAGE libraries suggested that SAGE datasets contain latent information on antisense transcripts. We raised a collection of virtual tags for mining these data. Tag pairs were assembled by searching for complementarities between 24-nt long sequences centered on the potential SAGE-anchoring sites of well-annotated human expressed sequences. An analysis of their presence in a large collection of published SAGE libraries revealed transcripts expressed at high levels from both strands of two adjacent, oppositely oriented, transcription units. In other cases, the respective transcripts of such cis-oriented genes displayed a mutually exclusive expression pattern or were co-expressed in a small number of libraries. Other tag pairs revealed overlapping transcripts of trans-encoded unique genes. Finally, we isolated a group of tags shared by multiple transcripts. Most of them mapped on to retroelements, essentially represented in humans by Alu sequences inserted in opposite orientations in the 3′UTR of otherwise different mRNAs. Registering these tags in separate files makes possible computational searches focused on unique sense–antisense pairs. The method developed in the present work shows that SAGE datasets constitute a major resource for rapidly investigating, with high sensitivity, the expression of antisense transcripts, so that a single tag may be detected in one library when screening a large number of biological samples.
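The construction of a sense/antisense virtual tag pair from a 24-nt window can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: it assumes the anchoring site is CATG (the NlaIII site, which is its own reverse complement), that tags are the 10 nt adjacent to it, and that the window is laid out as 10 nt + CATG + 10 nt. The example sequence is hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def tag_pair(window, anchor="CATG", tag_length=10):
    """Return (sense_tag, antisense_tag) for a window centered on the anchor.

    The sense tag is the tag_length bases downstream of the anchor.
    The antisense tag is read from the opposite strand, i.e. the
    reverse complement of the tag_length bases upstream of the anchor.
    """
    window = window.upper()
    start = tag_length
    end = tag_length + len(anchor)
    if len(window) != 2 * tag_length + len(anchor) or window[start:end] != anchor:
        raise ValueError("window must be tag + anchor + tag, centered on the anchor")
    sense = window[end:]
    antisense = reverse_complement(window[:start])
    return sense, antisense

# Hypothetical 24-nt window: 10 nt upstream, CATG, 10 nt downstream.
window = "AACCGGTTAC" + "CATG" + "TTGACCAGTA"
sense, antisense = tag_pair(window)
print(sense, antisense)
```

Finding both tags of a pair in the same library would then suggest transcription from both strands at that site.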
Serial Analysis of Gene Expression (SAGE) and Massively Parallel Signature Sequencing (MPSS) are powerful techniques for gene expression analysis. A crucial step in analyzing SAGE and MPSS data is the assignment of experimentally obtained tags to a known transcript. However, tag to transcript assignment is not a straightforward process since alternative tags for a given transcript can also be experimentally obtained. Here, we have evaluated the impact of Single Nucleotide Polymorphisms (SNPs) on the generation of alternative SAGE and MPSS tags. This was achieved through the construction of a reference database of SNP-associated alternative tags, which has been integrated with SAGE Genie. A total of 2020 SNP-associated alternative tags were catalogued in our reference database and at least one SNP-associated alternative tag was observed for ∼8.6% of all known human genes. A significant fraction (61.9%) of these alternative tags matched a list of experimentally obtained tags, validating their existence. In addition, the origin of four out of five SNP-associated alternative MPSS tags was experimentally confirmed through the use of the GLGI-MPSS protocol (Generation of Long cDNA fragments for Gene Identification). The availability of our SNP-associated alternative tag database will certainly improve the interpretation of SAGE and MPSS experiments.
Gene expression analysis has emerged as a major biological research area, with real-time quantitative reverse transcription PCR (RT-QPCR) being one of the most accurate and widely used techniques for expression profiling of selected genes. In order to obtain results that are comparable across assays, a stable normalization strategy is required. In general, the normalization of PCR measurements between different samples uses one to several control genes (e.g. housekeeping genes), from which a baseline reference level is constructed. Thus, the choice of the control genes is of utmost importance, yet there is not a generally accepted standard technique for screening a large number of candidates and identifying the best ones.
We propose a novel approach for scoring and ranking candidate genes for their suitability as control genes. Our approach relies on publicly available microarray data and allows the combination of multiple data sets originating from different platforms and/or representing different pathologies. The use of microarray data allows the screening of tens of thousands of genes, producing very comprehensive lists of candidates. We also provide two lists of candidate control genes: one which is breast cancer-specific and one with more general applicability. Two genes from the breast cancer list which had not been previously used as control genes are identified and validated by RT-QPCR. Open source R functions are available at
We proposed a new method for identifying candidate control genes for RT-QPCR which was able to rank thousands of genes according to some predefined suitability criteria and we applied it to the case of breast cancer. We also empirically showed that translating the results from microarray to PCR platform was achievable.
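As a rough illustration of ranking candidates across several data sets, the sketch below scores each gene by its coefficient of variation within each data set and then averages the per-data-set ranks. The abstract does not state its suitability criteria, so the coefficient of variation, the rank averaging, and all names and values here are assumptions made for the example, not the authors' method.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def rank_within_dataset(dataset):
    """Map each gene to its rank (1 = least variable) within one data set."""
    ordered = sorted(dataset, key=lambda g: coefficient_of_variation(dataset[g]))
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def combined_ranking(datasets):
    """Order genes present in every data set by their mean rank."""
    common = set.intersection(*(set(d) for d in datasets))
    per_dataset = [
        rank_within_dataset({g: d[g] for g in common}) for d in datasets
    ]
    mean_rank = {
        g: statistics.mean([ranks[g] for ranks in per_dataset]) for g in common
    }
    # Gene name breaks ties so that the output order is deterministic.
    return sorted(common, key=lambda g: (mean_rank[g], g))


# Hypothetical expression values from two platforms.
datasets = [
    {"gene_a": [10, 10, 11], "gene_b": [5, 9, 2], "gene_c": [7, 8, 9]},
    {"gene_a": [100, 101, 99], "gene_b": [50, 10, 90], "gene_c": [20, 30, 40]},
]
print(combined_ranking(datasets))
```

Working with ranks rather than raw scores is one simple way to combine platforms whose expression scales are not directly comparable.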
SAGE (serial analysis of gene expression) is a powerful method of analyzing gene expression for the entire transcriptome. There are currently many well-developed SAGE tools. However, the cross-comparison of different tissues is seldom addressed, thus limiting the identification of common- and tissue-specific tumor markers.
To improve SAGE mining methods, we propose a novel function for cross-tissue comparison of SAGE data that combines mathematical set theory and logic with a unique “multi-pool method”, which analyzes multiple pools of pair-wise case controls individually. When all the settings are in “inclusion”, the common SAGE tag sequences are mined. When one tissue type is in “inclusion” and the other tissue types are not, the SAGE tag sequences specific to the selected tissue are generated. They are displayed as tags-per-million (TPM) and fold values, and are also shown visually in four kinds of scales in a color gradient pattern. In the fold visualization display, the top-scoring SAGE tag sequences are provided, along with cluster plots. A user-defined matrix file is designed for cross-tissue comparison by selecting libraries from publicly available databases or user-defined libraries.
The hSAGEing tool provides, for the first time, a combination of user-friendly cross-tissue analysis and an interface for comparing SAGE libraries. Several up- or down-regulated genes are identified computationally as tissue-specific or common tumor markers and suppressors. The tool is useful and convenient for in silico cancer transcriptomic studies and is freely available at http://bio.kuas.edu.tw/hSAGEing
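The inclusion logic described above maps naturally onto set operations. The following is a minimal sketch, not the hSAGEing implementation: the function names, the fold threshold, the pseudocount, and the tag labels are all hypothetical.

```python
def tags_per_million(counts):
    """Scale raw tag counts of one library to tags-per-million (TPM)."""
    total = sum(counts.values())
    return {tag: 1_000_000 * n / total for tag, n in counts.items()}


def upregulated_tags(case, control, min_fold=2.0, pseudocount=1.0):
    """Tags whose TPM in the case library is at least min_fold times control.

    The pseudocount (an assumption) avoids division by zero for tags that
    are absent from the control library.
    """
    case_tpm = tags_per_million(case)
    control_tpm = tags_per_million(control)
    selected = set()
    for tag, value in case_tpm.items():
        fold = (value + pseudocount) / (control_tpm.get(tag, 0.0) + pseudocount)
        if fold >= min_fold:
            selected.add(tag)
    return selected


def cross_tissue(pools, included):
    """Tags found in every included tissue pool and in no other pool.

    pools maps a tissue name to the set of tags selected in that tissue's
    own case-control comparison; included is the set of tissue names
    switched to "inclusion".
    """
    inside = [tags for tissue, tags in pools.items() if tissue in included]
    outside = [tags for tissue, tags in pools.items() if tissue not in included]
    if not inside:
        return set()
    selected = set.intersection(*inside)
    for tags in outside:
        selected -= tags
    return selected


# Hypothetical per-tissue results.
pools = {
    "breast": {"tag_1", "tag_2", "tag_3"},
    "colon": {"tag_1", "tag_2"},
    "lung": {"tag_1", "tag_4"},
}
print(cross_tissue(pools, {"breast", "colon", "lung"}))  # common tags
print(cross_tissue(pools, {"breast"}))                   # breast-specific tags
```

Each tissue pool is evaluated against its own control before the sets are combined, which mirrors the idea of analyzing pair-wise case controls individually.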
In the investigation of the expression levels of target genes, reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) is the most accurate and widely used method. However, a normalization step is a prerequisite to obtain accurate quantification results from RT-qPCR data. Therefore, many studies regarding the selection of reference genes have been carried out. Recently, these studies have involved large-scale gene analysis methods such as microarray and next generation sequencing. In our previous studies, we analyzed large amounts of transcriptome data from the cynomolgus monkey. Using a modification of this large-scale transcriptome sequencing dataset, we selected 12 novel candidate reference genes (ARFGAP2, ARL1, BMI1, CASC3, DDX3X, MRFAP1, ORMDL1, RSL24D1, SAR1A, USP22, ZC3H11A, and ZRANB2) and 4 traditionally used reference genes (ACTB, GAPDH, RPS19, and YWHAZ) and compared them in 13 different whole-body tissues using 3 well-known programs: geNorm, NormFinder, and BestKeeper. Combined analysis by these 3 programs showed that ADP-ribosylation factor GTPase activating protein 2 (ARFGAP2), morf4 family associated protein 1 (MRFAP1), and ADP-ribosylation factor-like 1 (ARL1) are the most appropriate reference genes for accurate normalization. Interestingly, the 4 traditionally used reference genes were the least stably expressed in this study. This result shows that the selection of appropriate reference genes is vitally important and that large-scale analysis is a good method for finding new candidate reference genes. Our results could provide reliable reference gene lists for future studies on the expression of various target genes in the cynomolgus monkey.
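For readers unfamiliar with pairwise stability measures, the geNorm M value of a gene can be sketched as the average, over all other candidates, of the standard deviation of the log2 expression ratio across samples. The sketch below assumes the inputs are already relative quantities on a linear scale; the gene names and values are hypothetical, and the iterative removal of the least stable gene that geNorm also performs is omitted.

```python
import math
import statistics


def genorm_m(expression):
    """Return the stability measure M for each candidate gene.

    expression maps a gene name to a list of relative quantities, with
    samples in the same order for every gene. Lower M means more stable.
    """
    genes = list(expression)
    m_values = {}
    for gene in genes:
        pairwise_variation = []
        for other in genes:
            if other == gene:
                continue
            log_ratios = [
                math.log2(a / b)
                for a, b in zip(expression[gene], expression[other])
            ]
            pairwise_variation.append(statistics.stdev(log_ratios))
        m_values[gene] = statistics.mean(pairwise_variation)
    return m_values


# Hypothetical relative quantities in three samples.
expression = {
    "ref_a": [1.0, 2.0, 4.0],
    "ref_b": [2.0, 4.0, 8.0],   # constant ratio to ref_a
    "ref_c": [1.0, 1.0, 8.0],   # ratio to the others varies
}
for gene, m in sorted(genorm_m(expression).items(), key=lambda item: item[1]):
    print(gene, round(m, 3))
```

Because the measure is built from ratios between candidates, two co-regulated genes can appear stable together, which is one reason studies such as this one combine several programs.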
Advances in high-throughput technologies and bioinformatics have transformed gene expression profiling methodologies. The results of microarray experiments are often validated using reverse transcription quantitative PCR (RT-qPCR), which is the most sensitive and reproducible method to quantify gene expression. Appropriate normalisation of RT-qPCR data using stably expressed reference genes is critical to ensure accurate and reliable results. Mi(cro)RNA expression profiles have been shown to be more accurate in disease classification than mRNA expression profiles. However, few reports have detailed a robust identification and validation strategy for suitable reference genes for normalisation in miRNA RT-qPCR studies.
We adopt and report a systematic approach to identify the most stable reference genes for miRNA expression studies by RT-qPCR in colorectal cancer (CRC). High-throughput miRNA profiling was performed on ten pairs of CRC and normal tissues. By using the mean expression value of all expressed miRNAs, we identified the most stable candidate reference genes for subsequent validation. The stability of a panel of miRNAs was then examined in 35 tumour and 39 normal tissues. The effects of normalisers on the relative quantity of established oncogenic (miR-21 and miR-31) and tumour suppressor (miR-143 and miR-145) target miRNAs were assessed.
In the array experiment, miR-26a, miR-345, miR-425 and miR-454 were identified as having expression profiles closest to the global mean. From a panel of six miRNAs (let-7a, miR-16, miR-26a, miR-345, miR-425 and miR-454) and two small nucleolar RNA genes (RNU48 and Z30), miR-16 and miR-345 were identified as the most stably expressed reference genes. The combined use of miR-16 and miR-345 to normalise expression data enabled detection of a significant dysregulation of all four target miRNAs between tumour and normal colorectal tissue.
Our study demonstrates that the top six most stably expressed miRNAs (let-7a, miR-16, miR-26a, miR-345, miR-425 and miR-454) described herein should be validated as suitable reference genes in both high-throughput and lower throughput RT-qPCR colorectal miRNA studies.
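One way to find miRNAs whose expression profile lies closest to the global mean is sketched below. It assumes Cq values as input and scores each miRNA by the standard deviation of its difference from the per-sample mean of all miRNAs; this scoring rule, the function name, and the values are assumptions for illustration and may differ from the procedure used in the study.

```python
import statistics


def rank_by_global_mean(cq):
    """Order miRNAs from closest to farthest from the global mean profile.

    cq maps a miRNA name to its Cq values, with samples in the same order
    for every miRNA.
    """
    n_samples = len(next(iter(cq.values())))
    # Mean Cq of all expressed miRNAs in each sample.
    global_mean = [
        statistics.mean([values[i] for values in cq.values()])
        for i in range(n_samples)
    ]
    # A constant offset from the global mean is harmless; what matters is
    # how much the offset varies between samples.
    spread = {
        name: statistics.stdev([v - g for v, g in zip(values, global_mean)])
        for name, values in cq.items()
    }
    return sorted(spread, key=spread.get)


# Hypothetical Cq values in three samples.
cq = {
    "mir_x": [20.0, 21.0, 23.0],
    "mir_y": [25.0, 26.0, 27.0],
    "mir_z": [30.0, 28.0, 34.0],
}
print(rank_by_global_mean(cq))
```

The top-ranked miRNAs from such a screen are only candidates; as in the study above, they still need validation by RT-qPCR in an independent set of samples.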
Quantitative real-time RT-PCR (RT-qPCR) is widely used in microRNA expression research. However, few reports have detailed a robust identification and validation strategy for suitable reference genes for normalisation in microRNA RT-qPCR studies. The aim of this study was to identify the most stable reference gene(s) for quantification of microRNA expression in uterine cervical tissues. A microarray analysis was performed on 6 pairs of uterine cervical tissues to identify the candidate reference genes. The stability of candidate reference genes was assessed by RT-qPCR in 23 pairs of uterine cervical tissues. The most stable reference genes identified were further validated in another cohort of 108 clinical uterine cervical samples (HR-HPV- normal, n = 21; HR-HPV+ normal, n = 19; cervical intraepithelial neoplasia [CIN], n = 47; cancer, n = 21), and the effects of normalizers on the relative quantity of target miR-424 were assessed. In the array experiment, miR-26a, miR-23a, miR-200c, let-7a, and miR-1979 were identified as candidate reference genes for subsequent validation. MiR-23a was identified as the most reliable reference gene, followed by miR-191. The use of miR-23a and miR-191 to normalize expression data enabled detection of a significant deregulation of miR-424 between normal, CIN and cancer tissue. Our results suggest that miR-23a and miR-191 are the optimal reference microRNAs for normalization in profiling studies of cervical tissues; miR-23a is a novel microRNA normalizer.
gene expression profiling; microRNAs; real-time polymerase chain reaction; uterine cervical neoplasms
Accurate interpretation of quantitative PCR (qPCR) data requires normalization using constitutively expressed reference genes. Ribosomal RNA is often used as a reference gene for transcriptional studies in E. coli. However, the choice of reliable reference genes has not been systematically validated. The objective of this study is to identify a set of reliable reference genes for transcription analysis in recombinant protein over-expression studies in E. coli.
In this study, a meta-analysis of 240 sets of single-channel Affymetrix microarray data representing over-expression of 63 distinct recombinant proteins in various E. coli strains identified twenty candidate reference genes that were stably expressed across all conditions. The expression of these twenty genes and two commonly used reference genes, rrsA (encoding 16S ribosomal RNA) and ihfB, was quantified by qPCR in E. coli cells over-expressing four genes of the 1-Deoxy-D-Xylulose 5-Phosphate pathway. From these results, two independent statistical algorithms identified three novel reference genes, cysG, hcaT, and idnT, but not rrsA and ihfB, as highly invariant in two E. coli strains across different growth temperatures and induction conditions. Transcriptomic data normalized by the geometric average of these three genes demonstrated that genes of the lycopene synthetic pathway maintained steady expression upon enzyme over-expression. In contrast, the use of rrsA or ihfB as reference genes led to the misinterpretation that lycopene pathway genes were regulated during enzyme over-expression.
This study identified cysG/hcaT/idnT to be reliable novel reference genes for transcription analysis in recombinant protein producing E. coli.
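Normalization by the geometric average of several reference genes can be sketched as follows. The example assumes that a Cq value converts to a relative quantity as efficiency^(-Cq), with a default amplification efficiency of 2 (perfect doubling per cycle); the function names and Cq values are hypothetical.

```python
import statistics


def relative_quantity(cq, efficiency=2.0):
    """Convert a Cq value to a relative quantity on a linear scale."""
    return efficiency ** (-cq)


def normalized_expression(target_cq, reference_cqs, efficiency=2.0):
    """Target quantity divided by the geometric mean of reference quantities."""
    factor = statistics.geometric_mean(
        [relative_quantity(cq, efficiency) for cq in reference_cqs]
    )
    return relative_quantity(target_cq, efficiency) / factor


# Hypothetical Cq values for one target and three reference genes.
control = normalized_expression(22.0, [20.0, 21.0, 22.0])
induced = normalized_expression(22.5, [20.5, 21.5, 22.5])
print(control, induced, induced / control)
```

In this example all Cq values shift by the same amount between the two conditions, so the normalized expression is unchanged. The geometric mean is generally preferred to the arithmetic mean here because it is less sensitive to one reference gene with a much higher abundance than the others.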