The explosion of available microarray data on human cancer increases the urgency of developing methods for effectively sharing these data among clinical cancer investigators. The lack of a smooth interface between the databases and statistical analysis tools limits the potential benefits of sharing publicly available microarray data. To facilitate the efficient sharing and use of publicly available microarray data among cancer investigators, we have built a BRB-ArrayTools Data Archive that includes over one hundred human cancer microarray projects covering 28 cancer types. Expression array data and clinical descriptors have been imported into BRB-ArrayTools and are stored as BRB-ArrayTools project folders on the archive. The data archive can be accessed from http://linus.nci.nih.gov/~brb/DataArchive.html. Our BRB-ArrayTools data archive and GEO importer represent ongoing efforts to provide effective tools for efficiently sharing and utilizing human cancer microarray data.
Helicobacter pylori infection reprograms host gene expression and influences various cellular processes, which have been investigated by cDNA microarray using in vitro cultured cells and in vivo gastric biopsies from patients with chronic abdominal complaints. To further explore the effects of H. pylori infection on host gene expression, we collected gastric antral mucosa samples from 6 untreated patients with gastroscopic and pathologic confirmation of chronic superficial gastritis. Three of these patients were infected with H. pylori and the other three were not. The samples were analyzed on a microarray chip containing 14,112 cloned cDNAs, and the microarray data were analyzed with BRB ArrayTools software and the Ingenuity Pathways Analysis (IPA) website. Of the 38 differentially expressed genes regulated by H. pylori infection, 34 had been annotated. The annotated genes were involved in protein metabolism, inflammatory and immunological reactions, signal transduction, gene transcription, trace element metabolism, and other processes. Of these genes, 82% (28/34) were categorized in three molecular interaction networks involved in gene expression, cancer progression, antigen presentation and inflammatory response. The array hybridization expression data were confirmed by quantitative real-time PCR assays. Taken together, these data indicate that H. pylori infection could alter cellular gene expression processes, escape host defense mechanisms, increase inflammatory and immune responses, activate the NF-κB and Wnt/β-catenin signaling pathways, disturb metal ion homeostasis, and induce carcinogenesis. These findings may help to explain the pathogenic mechanism of H. pylori and the gastroduodenal pathogenesis induced by H. pylori infection.
Identification of gene expression profiles of cancer stem cells may have significant implications for the understanding of tumor biology and for the design of novel treatments targeted toward these cells. Here we report a potential ovarian cancer stem cell gene expression profile from the isolated side population of fresh ascites obtained from women with high-grade, advanced-stage papillary serous ovarian adenocarcinoma. Affymetrix U133 Plus 2.0 microarrays were used to interrogate the differentially expressed genes between side population (SP) and main population (MP), and the results were analyzed by paired t-test using BRB-ArrayTools. We identified 138 up-regulated and 302 down-regulated genes that were differentially expressed between all 10 SP/MP pairs. Microarray data were validated using qRT-PCR, and 17/19 (89.5%) genes showed robust correlations between microarray and qRT-PCR expression data. Pathway Studio analysis identified several genes involved in cell survival, differentiation, proliferation, and apoptosis that are unique to SP cells, and identified a mechanism for the activation of Notch signaling. To validate these findings, we identified and isolated SP cells enriched for cancer stem cells from human ovarian cancer cell lines. The SP cells had a higher colony-forming efficiency than their MP counterparts and were also capable of sustained expansion and differentiation into SP and MP phenotypes. An injection of 50,000 SP cells produced tumors in nude mice, whereas the same number of MP cells failed to produce any tumor by 8 weeks after injection. The SP cells demonstrated a dose-dependent sensitivity to specific γ-secretase inhibitors, implicating the Notch signaling pathway in SP cell survival. Furthermore, the generated SP gene list was found to be enriched in recurrent ovarian cancer tumors.
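The paired comparison of SP and MP samples described above can be illustrated with a minimal sketch of a per-gene paired t-statistic. This is not the BRB-ArrayTools implementation; the function name `paired_t` and the intensity values are hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import mean, stdev


def paired_t(sp, mp):
    """Return (t statistic, degrees of freedom) for matched SP/MP values."""
    if len(sp) != len(mp) or len(sp) < 2:
        raise ValueError("need at least two matched pairs")
    # Work on within-pair differences so between-patient variation cancels.
    diffs = [a - b for a, b in zip(sp, mp)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical log2 intensities for one gene across four SP/MP pairs.
sp = [8.1, 7.9, 8.4, 8.0]
mp = [7.0, 7.1, 7.2, 7.1]
t_stat, df = paired_t(sp, mp)
```

The statistic would then be compared against a t distribution with `df` degrees of freedom, once per gene, with a multiple-testing correction applied across genes.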
Krüppel-like factor KLF4 plays a crucial role in the development and maintenance of the mouse cornea. Here, we have compared the wild type (WT) and Klf4-conditional null (Klf4CN) corneal gene expression patterns to understand the molecular basis of the Klf4CN corneal phenotype.
Expression of more than 22,000 genes in 10 WT and Klf4CN corneas was compared by microarrays, analyzed using BRB ArrayTools and validated by Q-RT-PCR. Transient cotransfections were employed to test if KLF4 activates the aquaporin-3, Aldh3a1 and TKT promoters.
Scatter plot analysis identified 740 and 529 genes up- and down-regulated by more than 2-fold, respectively, in the Klf4CN corneas. Cell cycle activators were upregulated while the inhibitors were downregulated, consistent with the increased Klf4CN corneal epithelial cell proliferation. Desmosomal components were downregulated, consistent with the Klf4CN corneal epithelial fragility. Downregulation of aquaporin-3, detected by microarray, was confirmed by immunoblot and immunohistochemistry. Aquaporin-3 promoter activity was stimulated 7–10 fold by cotransfection with pCI-KLF4. Corneal crystallins Aldh3A1 and TKT were downregulated in the Klf4CN cornea and their respective promoter activities were upregulated 16- and 9-fold by pCI-KLF4 in co-transfections. Expression of epidermal keratinocyte differentiation markers was affected in the Klf4CN cornea. While the cornea specific keratin-12 was downregulated, most other keratins were upregulated, suggesting hyperkeratosis.
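As an illustration of the 2-fold cutoff used above, the following minimal sketch splits genes into up- and down-regulated sets from log2 expression ratios. The gene labels and ratio values are hypothetical, and the assumption that ratios are expressed as log2(Klf4CN/WT) is ours rather than stated in the text.

```python
import math


def split_by_fold_change(log2_ratios, threshold=2.0):
    """Split genes into (up, down) lists given log2(Klf4CN / WT) ratios."""
    # A 2-fold change corresponds to an absolute log2 ratio of 1.
    cutoff = math.log2(threshold)
    up = sorted(g for g, r in log2_ratios.items() if r > cutoff)
    down = sorted(g for g, r in log2_ratios.items() if r < -cutoff)
    return up, down


# Hypothetical log2 ratios for four placeholder genes.
example = {"geneA": 1.8, "geneB": -2.3, "geneC": 0.4, "geneD": -0.9}
up, down = split_by_fold_change(example)
```

Working on the log2 scale keeps the cutoff symmetric for increases and decreases.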
We have identified functionally diverse candidate KLF4 target genes, revealing the molecular basis of the diverse aspects of the Klf4CN corneal phenotype. These results establish KLF4 as an important node in the genetic network of transcription factors regulating the corneal homeostasis.
Cornea; Development; KLF4; Microarray
Separation of the neurosensory retina from the retinal pigment epithelium (RPE) yields many morphologic and functional consequences, including death of the photoreceptor cells, Müller cell hypertrophy, and inner retinal rewiring. Many of these changes are due to the separation-induced activation of specific genes. In this work, we define the gene transcription profile within the retina as a function of time after detachment. We also define the early activation of kinases that might be responsible for the detachment-induced changes in gene transcription.
Separation of the retina from the RPE was induced in Brown-Norway rats by the injection of 1% hyaluronic acid into the subretinal space. Retinas were harvested at 1, 7, and 28 days after separation. Gene transcription profiles for each time point were determined using the Affymetrix Rat 230A gene microarray chip. Transcription levels in detached retinas were compared to those of nondetached retinas with BRB-ArrayTools version 3.6.0 using a random variance analysis of variance (ANOVA) model. Confirmation of the significant transcriptional changes for a subset of the genes was performed using microfluidic quantitative real-time polymerase chain reaction (qRT-PCR) assays. Kinase activation was explored using Western blot analysis to look for early phosphorylation of any of the 3 main families of mitogen-activated protein kinases (MAPK): the p38 family, the c-Jun N-terminal kinase (JNK) family, and the p42/p44 family.
Retinas separated from the RPE showed extensive alterations in their gene transcription profile. Many of these changes were initiated as early as 1 day after separation, with significant increases by 7 days. ANOVA identified 144 genes that had significantly altered transcription levels as a function of time after separation at a false discovery rate of ≤0.1. Confirmatory qRT-PCR was performed on 51 of these 144 genes. Differential transcription detected on the microarray chip was confirmed by qRT-PCR for all 51 genes. Western blot analysis showed that the p42/p44 family of MAPK was phosphorylated within 2 hours of retinal-RPE separation. This phosphorylation was detachment-induced and could be inhibited by specific inhibitors of MAPK phosphorylation.
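To make the false discovery rate threshold concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. The text does not say which FDR method BRB-ArrayTools applied, so the choice of Benjamini-Hochberg is an assumption, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.1):
    """Return sorted indices of hypotheses rejected at the given FDR level."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical per-gene p-values from a time-course ANOVA.
pvals = [0.001, 0.30, 0.02, 0.04, 0.75]
selected = benjamini_hochberg(pvals, fdr=0.1)
```

With these values the genes at indices 0, 2 and 3 are selected. The threshold bounds the expected proportion of false positives among the selected genes, not the error for any single gene.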
Separation of the retina from the RPE induces significant alteration in the gene transcription profile within the retina. These profiles are not static, but change as a function of time after detachment. These gene transcription changes are preceded by the activation of the p42/p44 family of MAPK. This altered transcription may serve as the basis for many of the morphologic, biochemical, and functional changes seen within the detached retina.
This study evaluated the effects of black raspberries (BRBs) on biomarkers of tumor development in the human colon and rectum including methylation of relevant tumor suppressor genes, cell proliferation, apoptosis, angiogenesis and expression of Wnt pathway genes.
Biopsies of adjacent normal tissues and colorectal adenocarcinomas were taken from 20 patients before and after oral consumption of BRB powder (60 g/day) for 1-to-9 wks. Methylation status of promoter regions of five tumor suppressor genes was quantified. Protein expression of DNA methyltransferase 1 (DNMT1) and of genes associated with cell proliferation, apoptosis, angiogenesis, and Wnt signaling was measured.
Methylation of three Wnt inhibitors that act upstream in the Wnt pathway (SFRP2, SFRP5, and WIF1) and of PAX6a, a developmental regulator, was modulated in a protective direction by BRBs in normal tissues and in colorectal tumors, but only in patients who received an average of 4 wks of BRB treatment and not across all 20 patients with 1-to-9 wks of BRB treatment. This was associated with decreased expression of DNMT1. BRBs modulated expression of genes associated with the Wnt pathway, proliferation, apoptosis and angiogenesis in a protective direction.
These data provide evidence of the ability of BRBs to demethylate tumor suppressor genes and to modulate other biomarkers of tumor development in the human colon and rectum. While demethylation of genes did not occur in colorectal tissues from all treated patients, the positive results with the secondary endpoints suggest that additional studies of BRBs for the prevention of colorectal cancer in humans now appear warranted.
Idiopathic Pulmonary Fibrosis (IPF) is characterized by profound changes in the lung phenotype including excessive extracellular matrix deposition, myofibroblast foci, alveolar epithelial cell hyperplasia and extensive remodeling. The role of epigenetic changes in determining the lung phenotype in IPF is unknown. In this study we determine whether IPF lungs exhibit an altered global methylation profile.
Immunoprecipitated methylated DNA from 12 IPF lungs, 10 lung adenocarcinomas and 10 normal histology lungs was hybridized to Agilent human CpG Islands Microarrays, and data analysis was performed using the BRB-ArrayTools and DAVID Bioinformatics Resources software packages. Array results were validated using the EpiTYPER MassARRAY platform for 3 CpG islands. A total of 625 CpG islands were differentially methylated between IPF and control lungs with an estimated false discovery rate of less than 5%. The genes associated with the differentially methylated CpG islands are involved in regulation of apoptosis, morphogenesis and cellular biosynthetic processes. The expression of three genes (STK17B, STK3 and HIST1H2AH) with hypomethylated promoters was increased in IPF lungs. Comparison of IPF methylation patterns to lung cancer or control samples revealed that IPF lungs display an intermediate methylation profile, partly similar to lung cancer and partly similar to control, with 402 differentially methylated CpG islands overlapping between IPF and cancer. Despite their similarity to cancer, IPF lungs did not exhibit hypomethylation of the long interspersed nuclear element 1 (LINE-1) retrotransposon while lung cancer samples did, suggesting that the global hypomethylation observed in cancer is not typical of IPF.
Our results provide evidence that epigenetic changes in IPF are widespread and potentially important. The partial similarity to cancer may signify similar pathogenetic mechanisms while the differences constitute IPF or cancer specific changes. Elucidating the role of these specific changes will potentially allow better understanding of the pathogenesis of IPF.
Numerous microarray analysis programs have been created through the efforts of Open Source software development projects. Providing browser-based interfaces that allow these programs to be executed over the Internet enhances the applicability and utility of these analytic software tools.
Here we present ArrayQuest, a web-based DNA microarray analysis process controller. Key features of ArrayQuest are that (1) it is capable of executing numerous analysis programs such as those written in R, BioPerl and C++; (2) new analysis programs can be added to ArrayQuest Methods Library at the request of users or developers; (3) input DNA microarray data can be selected from public databases (i.e., the Medical University of South Carolina (MUSC) DNA Microarray Database or Gene Expression Omnibus (GEO)) or it can be uploaded to the ArrayQuest center-point web server into a password-protected area; and (4) analysis jobs are distributed across computers configured in a backend cluster. To demonstrate the utility of ArrayQuest we have populated the methods library with methods for analysis of Affymetrix DNA microarray data.
ArrayQuest enables browser-based implementation of DNA microarray data analysis programs that can be executed on a Linux-based platform. Importantly, ArrayQuest is a platform that will facilitate the distribution and implementation of new analysis algorithms and is therefore of use to both developers and users of analysis applications. ArrayQuest is freely available for use.
The inner blood-retinal barrier (BRB) is a gliovascular unit in which macroglial cells surround capillary endothelial cells and regulate retinal capillaries by paracrine interactions. The purpose of the present study was to identify genes of retinal capillary endothelial cells whose expression is modulated by Müller glial cell-derived factors.
Conditionally immortalized rat retinal capillary endothelial (TR-iBRB2) and Müller (TR-MUL5) cell lines were chosen as an in vitro model. TR-iBRB2 cells were incubated with conditioned medium of TR-MUL5 (MUL-CM) for 24 h and subjected to microarray and quantitative real-time PCR analysis.
TR-MUL5 cell-derived factors increased alkaline phosphatase activity in TR-iBRB2 cells, indicating that paracrine interactions occurred between TR-iBRB2 and TR-MUL5 cells. Microarray analysis demonstrated that MUL-CM treatment modulates several genes in TR-iBRB2 cells, including induction of plasminogen activator inhibitor 1 (PAI-1) and suppression of inhibitor of DNA binding 2 (Id2). Treatment with TGF-β1, which is present in MUL-CM, also resulted in induction of PAI-1 and suppression of Id2 in TR-iBRB2 cells.
This in vitro inner BRB model study revealed that Müller glial cell-derived factors modulate endothelial cell functions, including the induction of anti-angiogenic PAI-1 and the suppression of pro-angiogenic Id2. Therefore, Müller cells appear to be one of the modulators of retinal angiogenesis.
The high-density oligonucleotide microarray (GeneChip) is an important tool for molecular biological research aiming at large-scale detection of single nucleotide polymorphisms in DNA and genome-wide analysis of mRNA concentrations. Local array data management solutions are instrumental for efficient processing of the results and for subsequent uploading of data and annotations to a global certified data repository at the EBI (ArrayExpress) or the NCBI (Gene Expression Omnibus).
To facilitate and accelerate annotation of high-throughput expression profiling experiments, the Microarray Information Management and Annotation System (MIMAS) was developed. The system is fully compliant with the Minimal Information About a Microarray Experiment (MIAME) convention. MIMAS provides life scientists with a highly flexible and focused GeneChip data storage and annotation platform essential for subsequent analysis and interpretation of experimental results with clustering and mining tools. The system software can be downloaded for academic use upon request.
MIMAS implements a novel concept for nation-wide GeneChip data management whereby a network of facilities is centered on one data node directly connected to the European certified public microarray data repository located at the EBI. The solution proposed may serve as a prototype approach to array data management between research institutes organized in a consortium.
Cervical cancer is the most common cancer among Indian women. The current recommendations are to treat stage IIB, IIIA, IIIB and IVA disease with radical radiotherapy and weekly cisplatin-based chemotherapy. However, radiotherapy alone can help cure more than 60% of stage IIB and up to 40% of stage IIIB patients.
Archival RNA samples from 15 patients who had achieved complete remission and stayed disease free for more than 36 months (No Evidence of Disease or NED group) and 10 patients who had failed radical radiotherapy (Failed group) were included in the study. The RNA was amplified, labelled and hybridized to Stanford microarray chips and analyzed using BRB Array Tools software and Significance Analysis of Microarrays (SAM). Twenty genes were selected for further validation using a Relative Quantitation (RQ) Taqman assay in a Taqman Low-Density Array (TLDA) format. The RQ value was calculated using each of the NED samples once as a calibrator. A scoring system was developed based on the RQ values for the genes.
Using a seven gene based scoring system, it was possible to distinguish between the tumours which were likely to respond to the radiotherapy and those likely to fail. The mean score ± 2 SE (standard error of mean) was used and at a cut-off score of greater than 5.60, the sensitivity, specificity, Positive predictive value (PPV) and Negative predictive value (NPV) were 0.64, 1.0, 1.0, 0.67, respectively, for the low risk group.
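The four performance measures reported above follow directly from a 2x2 table of predicted versus observed outcomes. The sketch below shows the standard definitions; the counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, PPV and NPV from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "ppv": tp / (tp + fp),          # true positives among predicted positives
        "npv": tn / (tn + fn),          # true negatives among predicted negatives
    }


# Hypothetical counts chosen only to illustrate the formulas.
metrics = diagnostic_metrics(tp=8, fp=0, tn=10, fn=4)
```

Note that whenever there are no false positives, specificity and PPV are both exactly 1.0, which matches the pattern of the values reported for the low risk group.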
We have identified a 7 gene signature which could help identify patients with cervical cancer who can be treated with radiotherapy alone. However, this needs to be validated in a larger patient population.
Summary: Microarrays are commonly used to detect changes in gene expression between different biological samples. For this purpose, many analysis tools have been developed that offer visualization, statistical analysis and more sophisticated analysis methods. Most of these tools are designed specifically for messenger RNA microarrays. However, today, more and more different microarray platforms are available. Changes in DNA methylation, microRNA expression or even protein phosphorylation states can be detected with specialized arrays. For these microarray technologies, the number of available tools is small compared with mRNA analysis tools. Especially, a joint analysis of different microarray platforms that have been used on the same set of biological samples is hardly supported by most microarray analysis tools. Here, we present InCroMAP, a tool for the analysis and visualization of high-level microarray data from individual or multiple different platforms. Currently, InCroMAP supports mRNA, microRNA, DNA methylation and protein modification datasets. Several methods are offered that allow for an integrated analysis of data from those platforms. The available features of InCroMAP range from visualization of DNA methylation data over annotation of microRNA targets and integrated gene set enrichment analysis to a joint visualization of data from all platforms in the context of metabolic or signalling pathways.
Availability: InCroMAP is freely available as Java™ application at www.cogsys.cs.uni-tuebingen.de/software/InCroMAP, including a comprehensive user’s guide and example files.
DNA microarrays provide data for genome-wide patterns of expression between observation classes. Microarray studies often have small sample sizes, however, due to cost constraints or specimen availability. This can lead to poor random error estimates and inaccurate statistical tests of differential expression. We compare the performance of the standard t-test, fold change, and four small n statistical test methods designed to circumvent these problems. We report results of various normalization methods for empirical microarray data and of various random error models for simulated data.
Three Empirical Bayes methods (CyberT, BRB, and limma t-statistics) were the most effective statistical tests across simulated and both 2-colour cDNA and Affymetrix experimental data. The CyberT regularized t-statistic in particular was able to maintain expected false positive rates with simulated data showing high variances at low gene intensities, although at the cost of low true positive rates. The Local Pooled Error (LPE) test introduced a bias that lowered false positive rates below theoretically expected values and had lower power relative to the top performers. The standard two-sample t-test and fold change were also found to be sub-optimal for detecting differentially expressed genes. The generalized log transformation was shown to be beneficial in improving results with certain data sets, in particular high variance cDNA data.
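The idea shared by the Empirical Bayes tests discussed above is to stabilise each gene's variance estimate by shrinking it toward a prior value before forming the t-statistic. The sketch below shows this generic shrinkage in the weighted-average form used for moderated t-statistics. It is not the CyberT, BRB or limma implementation: those methods estimate the prior from the data, whereas here `prior_var` and `prior_df` are hypothetical fixed inputs.

```python
import math
from statistics import mean, variance


def moderated_t(group1, group2, prior_var, prior_df):
    """Two-sample t-statistic with the pooled variance shrunk toward a prior."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    # Weighted average of prior and observed variance, weighted by their df.
    shrunk = (prior_df * prior_var + df * pooled) / (prior_df + df)
    se = math.sqrt(shrunk * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se


# Hypothetical log2 intensities for one gene with an unusually small variance.
a = [5.0, 5.1, 4.9]
b = [4.0, 4.1, 3.9]
plain = moderated_t(a, b, prior_var=0.0, prior_df=0)    # ordinary two-sample t
shrunk = moderated_t(a, b, prior_var=0.25, prior_df=4)  # variance pulled upward
```

With few replicates a gene can show a tiny variance by chance, which inflates the ordinary t-statistic. Shrinkage toward the prior tempers it, which is why such tests tend to control false positives better in small n studies.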
Pre-processing of data influences performance and the proper combination of pre-processing and statistical testing is necessary for obtaining the best results. All three Empirical Bayes methods assessed in our study are good choices for statistical tests for small n microarray studies for both Affymetrix and cDNA data. Choice of method for a particular study will depend on software and normalization preferences.
Though microarray experiments are very popular in life science research, managing and analyzing microarray data are still challenging tasks for many biologists. Most microarray programs require users to have sophisticated knowledge of mathematics, statistics and computer skills for usage. With accumulating microarray data deposited in public databases, easy-to-use programs to re-analyze previously published microarray data are in high demand.
EzArray is a web-based Affymetrix expression array data management and analysis system for researchers who need to organize microarray data efficiently and get data analyzed instantly. EzArray organizes microarray data into projects that can be analyzed online with predefined or custom procedures. EzArray performs data preprocessing and detection of differentially expressed genes with statistical methods. All analysis procedures are optimized and highly automated so that even novice users with limited pre-knowledge of microarray data analysis can complete initial analysis quickly. Since all input files, analysis parameters, and executed scripts can be downloaded, EzArray provides maximum reproducibility for each analysis. In addition, EzArray integrates with Gene Expression Omnibus (GEO) and allows instantaneous re-analysis of published array data.
EzArray is a novel Affymetrix expression array data analysis and sharing system. EzArray provides easy-to-use tools for re-analyzing published microarray data and will help both novice and experienced users perform initial analysis of their microarray data from the location of data storage. We believe EzArray will be a useful system for facilities with microarray services and laboratories with multiple members involved in microarray data analysis. EzArray is freely available.
The development of DNA microarrays has facilitated the generation of hundreds of thousands of transcriptomic datasets. The use of a common reference microarray design allows existing transcriptomic data to be readily compared and re-analysed in the light of new data, and the combination of this design with large datasets is ideal for 'systems'-level analyses. One issue is that these datasets are typically collected over many years and may be heterogeneous in nature, containing different microarray file formats and gene array layouts, dye-swaps, and varying scales of log2-ratios of expression between microarrays. Excellent software exists for the normalisation and analysis of microarray data, but many data have yet to be analysed because existing methods struggle with heterogeneous datasets; options include normalising microarrays on an individual or experimental group basis. Our solution was to develop the Batch Anti-Banana Algorithm in R (BABAR), an algorithm and software package that uses cyclic loess to normalise across the complete dataset. We have already used BABAR to analyse the function of Salmonella genes involved in the process of infection of mammalian cells.
The only input required by BABAR is unprocessed GenePix or BlueFuse microarray data files. BABAR provides a combination of 'within' and 'between' microarray normalisation steps and diagnostic boxplots. When applied to a real heterogeneous dataset, BABAR normalised the dataset to produce a comparable scaling between the microarrays, with the microarray data in excellent agreement with RT-PCR analysis. When applied to a real non-heterogeneous dataset and a simulated dataset, BABAR's performance in identifying differentially expressed genes showed some benefits over standard techniques.
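Two-colour normalisation of the kind described above typically operates on per-spot M (log ratio) and A (average log intensity) values. The sketch below computes them for a single spot and reverses the sign of M for a dye-swapped array so that all ratios share one orientation. It does not reproduce BABAR's internals or its cyclic loess step; the function name and intensities are hypothetical.

```python
import math


def ma_values(red, green, dye_swap=False):
    """Return (M, A) for one spot; M is sign-flipped for dye-swapped arrays."""
    m = math.log2(red / green)            # log2 ratio of the two channels
    a = 0.5 * math.log2(red * green)      # average log2 intensity
    return (-m if dye_swap else m), a


# Hypothetical intensities: the same sample pair measured on a forward
# array and on a dye-swapped array, where the channel roles are reversed.
m_fwd, a_fwd = ma_values(red=800.0, green=200.0)
m_swap, a_swap = ma_values(red=200.0, green=800.0, dye_swap=True)
```

After the sign flip both arrays report the same M value, so they can be normalised together on a common scale.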
BABAR is an easy-to-use software tool, simplifying the simultaneous normalisation of heterogeneous two-colour common reference design cDNA microarray-based transcriptomic datasets. We show BABAR transforms real and simulated datasets to allow for the correct interpretation of these data, and is the ideal tool to facilitate the identification of differentially expressed genes or network inference analysis from transcriptomic datasets.
Regulation of gene expression is relevant to many areas of biology and medicine, in the study of treatments, diseases, and developmental stages. Microarrays can be used to measure the expression level of thousands of mRNAs at the same time, allowing insight into or comparison of different cellular conditions. The data derived from microarray experiments are high-dimensional and often noisy, and interpretation of the results can be intricate. Although programs for the statistical analysis of microarray data exist, most of them do not integrate analysis results with biological interpretation.
We have developed GEPAT, the Genome Expression Pathway Analysis Tool, which offers analysis of gene expression data in a genomic, proteomic and metabolic context. We provide an integration of statistical methods for data import and data analysis together with a biological interpretation for subsets of probes or single probes on the chip. GEPAT imports various oligonucleotide and cDNA array data formats. Different normalization methods can be applied to the data, after which data annotation is performed. After import, GEPAT offers various statistical data analysis methods, such as hierarchical, k-means and PCA clustering, a linear model-based t-test, and chromosomal profile comparison. The results of the analysis can be interpreted by enrichment of biological terms, pathway analysis or interaction networks. Different biological databases are included to provide information for each probe on the chip. GEPAT does not impose a linear workflow, but allows any subset of probes and samples to be used as the starting point for a new data analysis. GEPAT relies on established data analysis packages, offers a modular approach for easy extension, and can be run on a computer grid to serve a large number of users. It is freely available under the LGPL open source license for academic and commercial users.
GEPAT is a modular, scalable and professional-grade software integrating analysis and interpretation of microarray gene expression data. An installation is available for academic users.
Microarray data are often used for patient classification and gene selection. An appropriate tool for end users and biomedical researchers should combine user friendliness with statistical rigor, including carefully avoiding selection biases and allowing analysis of multiple solutions, together with access to additional functional information of selected genes. Methodologically, such a tool would be of greater use if it incorporates state-of-the-art computational approaches and makes source code available.
We have developed GeneSrF, a web-based tool, and varSelRF, an R package, that implement, in the context of patient classification, a validated method for selecting very small sets of genes while preserving classification accuracy. Computation is parallelized, making it possible to take advantage of multicore CPUs and clusters of workstations. Output includes bootstrapped estimates of prediction error rate and assessments of the stability of the solutions. Clickable tables link to additional information for each gene (GO terms, PubMed citations, KEGG pathways), and output can be sent to PaLS for examination of PubMed references, GO terms, and KEGG and Reactome pathways characteristic of sets of genes selected for class prediction. The full source code is available, so the software can be extended. The web-based application is available online. All source code is available from Bioinformatics.org or The Launchpad. The R package is also available from CRAN.
varSelRF and GeneSrF implement a validated method for gene selection including bootstrap estimates of classification error rate. They are valuable tools for applied biomedical researchers, especially for exploratory work with microarray data. Because of the underlying technology used (a combination of parallelization with a web-based application), they are also of methodological interest to bioinformaticians and biostatisticians.
Although they have become a widely used experimental technique for identifying differentially expressed (DE) genes, DNA microarrays are notorious for generating noisy data. A common strategy for mitigating the effects of noise is to perform many experimental replicates. This approach is often costly and sometimes impossible given limited resources; thus, analytical methods are needed which increase accuracy at no additional cost. One inexpensive source of microarray replicates comes from prior work: to date, data from hundreds of thousands of microarray experiments are in the public domain. Although these data assay a wide range of conditions, they cannot be used directly to inform any particular experiment and are thus ignored by most DE gene methods. We present the SVD Augmented Gene expression Analysis Tool (SAGAT), a mathematically principled, data-driven approach for identifying DE genes. SAGAT increases the power of a microarray experiment by using observed coexpression relationships from publicly available microarray datasets to reduce uncertainty in individual genes' expression measurements. We tested the method on three well-replicated human microarray datasets and demonstrate that use of SAGAT increased effective sample sizes by as many as 2.72 arrays. We applied SAGAT to unpublished data from a microarray study investigating transcriptional responses to insulin resistance, resulting in a 50% increase in the number of significant genes detected. We evaluated 11 (58%) of these genes experimentally using qPCR, confirming the directions of expression change for all 11 and statistical significance for three. Use of SAGAT revealed coherent biological changes in three pathways: inflammation, differentiation, and fatty acid synthesis, furthering our molecular understanding of a type 2 diabetes risk factor. 
We envision SAGAT as a means to maximize the potential for biological discovery from subtle transcriptional responses, and we provide it as a freely available software package that is immediately applicable to any human microarray study.
Though the use of microarrays to identify differentially expressed (DE) genes has become commonplace, it is still not a trivial task. Microarray data are notorious for being noisy, and current DE gene methods do not fully utilize pre-existing biological knowledge to help control this noise. One such source of knowledge is the vast number of publicly available microarray datasets. To leverage this information, we have developed the SVD Augmented Gene expression Analysis Tool (SAGAT) for identifying DE genes. SAGAT extracts transcriptional modules from publicly available microarray data and integrates this information with a dataset of interest. We explore SAGAT's ability to improve DE gene identification on simulated data, and we validate the method on three highly replicated biological datasets. Finally, we demonstrate SAGAT's effectiveness on a novel human dataset investigating the transcriptional response to insulin resistance. Use of SAGAT leads to an increased number of insulin resistant candidate genes, and we validate a subset of these with qPCR. We provide SAGAT as an open source R package that is applicable to any human microarray study.
Affymetrix microarrays are widely used to predict genome-wide gene expression and genome-wide genetic polymorphisms from RNA and genomic DNA hybridization experiments, respectively. It has recently been proposed to integrate the two predictions by use of RNA microarray data only. Although the ability to detect single feature polymorphisms (SFPs) from RNA microarray data has many practical implications for genome study in both sequenced and unsequenced species, it raises enormous challenges for statistical modelling and analysis of microarray gene expression data for this objective. Several methods have been proposed to predict SFPs from the gene expression profile. However, their performance is highly vulnerable to differential expression of genes, so the SFPs they predict may reflect differentially expressed genes rather than genuine sequence polymorphisms. To address the problem, we developed a novel statistical method to estimate separately, from perfect-match hybridization values of Affymetrix gene expression data, the binding affinity between a transcript and its targeting probe and the parameter measuring transcript abundance. We implemented a Bayesian approach to detect SFPs and to genotype a segregating population at the detected SFPs. Based on analysis of three Affymetrix microarray datasets, we demonstrated that the present method confers significantly improved robustness and accuracy in detecting the SFPs that carry genuine sequence polymorphisms when compared to its rivals in the literature. The method developed in this paper will provide experimental genomicists with advanced analytical tools for appropriate and efficient analysis of their microarray experiments and biostatisticians with insightful interpretation of Affymetrix microarray data.
One of the ultimate goals of genomics is to explore structural and functional variations of all genes in a genome. High-density oligo-microarray techniques enable prediction of genome-wide gene expression and genome-wide genetic polymorphisms using RNA and genomic DNA samples, respectively. A recent proposal to integrate the two predictions by use of RNA microarray data alone has great practical implications in genomics. However, it is essential but very challenging to develop an appropriate analytical method for detecting single feature polymorphisms (SFPs) from RNA expression data, which are inherently coupled with various sources of biological and technical variation. This paper presents a novel statistical approach to detect SFPs from gene expression data. We demonstrated that the new method is significantly more robust to variation due to differential expression of genes, and more reliable in calling SFPs that bear genuine sequence polymorphisms, than five other methods in the mainstream literature on SFP prediction from microarray data. The improved predictability of detecting SFPs not only confers accuracy in evaluating gene expression from microarray information, but also opens up an opportunity to integrate structural and functional analyses by using only one set of microarray data.
Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data.
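The two input summaries contrasted above can be illustrated with a minimal sketch. It assumes that the integrated intensity of a spot is the product of its average pixel intensity and its pixel count, and that the median intensity is the plain median of the spot's pixel values; the function names and the pixel values are hypothetical and are not taken from TM4 or Expresso.

```python
from statistics import mean, median

def integrated_intensity(average_intensity, pixel_count):
    # Assumption: IIV = average pixel intensity x number of pixels in the spot.
    return average_intensity * pixel_count

def median_intensity(pixel_values):
    # MIV: the median of the pixel intensities within the spot.
    return median(pixel_values)

# Hypothetical spot with one very bright outlier pixel.
spot_pixels = [100, 110, 120, 130, 5000]

iiv = integrated_intensity(mean(spot_pixels), len(spot_pixels))
miv = median_intensity(spot_pixels)

print("IIV:", iiv)
print("MIV:", miv)
```

In this toy spot the single bright pixel dominates the integrated value, whereas the median is unaffected by it. That is a general property of the two summaries and is not offered as an explanation of the comparison results reported here.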
The Expresso analysis using MIV data consistently identifies more genes as differentially expressed, when compared to Expresso analysis with IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data.
The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results. In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, as it yields a greater number of statistically significant differentially expressed genes; TM4 does not support the choice of MIV input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate analytical results, when judged by the yardstick of qRT-PCR data, in the context of an experimental design of modest complexity.
Hepatitis C virus (HCV) infection is a major cause of hepatocellular carcinoma (HCC) worldwide. The molecular mechanisms of HCV-induced hepatocarcinogenesis are not yet fully elucidated. Besides indirect effects such as tissue inflammation and regeneration, a more direct oncogenic activity of HCV can be postulated, in which early HCV proteins alter the expression of cellular genes. In the present study, gene expression patterns in liver biopsies from HCV-positive HCC patients and HCV-negative controls were compared by microarray analysis.
Gene expression profiling of liver tissues was performed using a high-density microarray containing 36,000 oligos, representing 90% of the human genes. Samples were obtained from 14 patients affected by HCV-related HCC and 7 HCV-negative non-liver-cancer patients, enrolled at INT in Naples. Transcriptional profiles identified in liver biopsies from HCC nodules and paired non-adjacent non-HCC liver tissue of the same HCV-positive patients were compared to those from HCV-negative controls by the Cluster program. The pathway analysis was performed using BRB-ArrayTools, based on the "Ingenuity System Database". The significance threshold of the t-test was set at 0.001.
Significant differences were found between the expression patterns of several genes falling into different metabolic and inflammation/immunity pathways in HCV-related HCC tissues, as well as in the non-HCC counterpart, compared to normal liver tissues. Only a few genes were found to be differentially expressed between HCV-related HCC tissues and the paired non-HCC counterpart.
In this study, informative data have been obtained on the global gene expression pattern of HCV-related HCC and its non-HCC counterpart, as well as on how these patterns differ from the one observed in normal liver tissues. These results may lead to the identification of specific biomarkers relevant to the development of tools for detection, diagnosis, and classification of HCV-related HCC.
Microarray experimentation requires the application of complex analysis methods as well as the use of non-trivial computer technologies to manage the resultant large data sets. This, together with the proliferation of tools and techniques for microarray data analysis, makes it very challenging for a laboratory scientist to keep up-to-date with the latest developments in this field. Our aim was to develop a distributed e-support system for microarray data analysis and management.
EMAAS (Extensible MicroArray Analysis System) is a multi-user rich internet application (RIA) providing simple, robust access to up-to-date resources for microarray data storage and analysis, combined with integrated tools to optimise real time user support and training. The system leverages the power of distributed computing to perform microarray analyses, and provides seamless access to resources located at various remote facilities. The EMAAS framework allows users to import microarray data from several sources to an underlying database, to pre-process, quality assess and analyse the data, to perform functional analyses, and to track data analysis steps, all through a single easy to use web portal. This interface offers distance support to users both in the form of video tutorials and via live screen feeds using the web conferencing tool EVO. A number of analysis packages, including R-Bioconductor and Affymetrix Power Tools have been integrated on the server side and are available programmatically through the Postgres-PLR library or on grid compute clusters. Integrated distributed resources include the functional annotation tool DAVID, GeneCards and the microarray data repositories GEO, CELSIUS and MiMiR. EMAAS currently supports analysis of Affymetrix 3' and Exon expression arrays, and the system is extensible to cater for other microarray and transcriptomic platforms.
EMAAS enables users to track and perform microarray data management and analysis tasks through a single easy-to-use web application. The system architecture is flexible and scalable to allow new array types, analysis algorithms and tools to be added with relative ease and to cope with large increases in data volume.
Successful delivery of compounds to the brain and retina is a challenge in the development of therapeutic drugs and imaging agents. This challenge arises because internalization of compounds into the brain and retina is restricted by the blood–brain barrier (BBB) and blood-retinal barrier (BRB), respectively. Simple and reliable in vivo assays are necessary to identify compounds that can easily cross the BBB and BRB.
We developed six fluorescent indoline derivatives (IDs) and examined their ability to cross the BBB and BRB in zebrafish by in vivo fluorescence imaging. These fluorescent IDs were administered to live zebrafish by immersing the zebrafish larvae at 7-8 days post fertilization in medium containing the ID, or by intracardiac injection. We also examined the effect of multidrug resistance proteins (MRPs) on the permeability of the BBB and BRB to the ID using MK571, a selective inhibitor of MRPs.
The permeability of these barriers to fluorescent IDs administered by simple immersion was comparable to that observed after intracardiac injection. This finding supports the validity of drug administration by simple immersion for the assessment of BBB and BRB permeability to fluorescent IDs. Using this zebrafish model, we demonstrated that the length of the methylene chain in these fluorescent IDs significantly affected their ability to cross the BBB and BRB via MRPs.
We demonstrated that in vivo assessment of the permeability of the BBB and BRB to fluorescent IDs could be simply and reliably performed using zebrafish. The structure of fluorescent IDs can be flexibly modified and, thus, the permeability of the BBB and BRB to a large number of IDs can be assessed using this zebrafish-based assay. The large amount of data acquired might be useful for in silico analysis to elucidate the precise mechanisms underlying the interactions between chemical structure and the efflux transporters at the BBB and BRB. In turn, understanding these mechanisms may lead to the efficient design of compounds targeting the brain and retina.
Blood-brain barrier; Blood-retinal barrier; Zebrafish; Fluorescent indoline derivatives; Transporters
Microarray-based pooled DNA experiments that combine the merits of DNA pooling and gene chip technology constitute a pivotal advance in biotechnology. This new technique uses pooled DNA, thereby reducing costs associated with the typing of DNA from numerous individuals. Moreover, use of an oligonucleotide gene chip reduces costs related to processing various DNA segments (e.g., primers, reagents). Thus, the technique provides an overall cost-effective solution for large-scale genomic/genetic research. However, few publicly shared tools are available to systematically analyze the rapidly accumulating volume of whole-genome pooled DNA data.
We propose a generalized concept of pooled DNA and present a user-friendly tool named Microarray Pooled DNA Analyzer (MPDA) that we developed to analyze hybridization intensity data from microarray-based pooled DNA experiments. MPDA enables whole-genome DNA preferential amplification/hybridization analysis, allele frequency estimation, association mapping, allelic imbalance detection, and permits integration with shared data resources online. Graphic and numerical outputs from MPDA support global and detailed inspection of large amounts of genomic data. Four whole-genome data analyses are used to illustrate the major functionalities of MPDA. The first analysis shows that MPDA can characterize genomic patterns of preferential amplification/hybridization and provide calibration information for pooled DNA data analysis. The second analysis demonstrates that MPDA can accurately estimate allele frequencies. The third analysis indicates that MPDA is cost-effective and reliable for association mapping. The final analysis shows that MPDA can identify regions of chromosomal aberration in cancer without paired-normal tissue.
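The link between calibration for preferential amplification/hybridization and allele frequency estimation can be sketched as follows. This is a minimal illustration of a correction commonly used for pooled DNA, in which the ratio of the two allele signals in heterozygous individuals serves as a calibration factor k and the pooled frequency of allele A is taken as A / (A + k·B). It is not claimed to be the estimator implemented in MPDA; the function names and intensity values are hypothetical.

```python
def estimate_calibration_factor(het_a_signals, het_b_signals):
    # In heterozygotes both alleles are present in equal amounts, so the
    # average A/B signal ratio measures preferential amplification/hybridization.
    ratios = [a / b for a, b in zip(het_a_signals, het_b_signals)]
    return sum(ratios) / len(ratios)

def pooled_allele_frequency(pool_a_signal, pool_b_signal, k=1.0):
    # Frequency of allele A in the pool; k = 1.0 means no correction.
    return pool_a_signal / (pool_a_signal + k * pool_b_signal)

# Hypothetical heterozygote signals: allele A reads twice as strongly as B.
k = estimate_calibration_factor([2000.0, 1800.0], [1000.0, 900.0])

naive = pooled_allele_frequency(600.0, 200.0)
corrected = pooled_allele_frequency(600.0, 200.0, k)

print("calibration factor:", k)
print("naive estimate:", naive)
print("corrected estimate:", corrected)
```

With these made-up numbers the uncorrected estimate overstates the frequency of the allele that hybridizes more strongly, and the calibration factor pulls the estimate back down.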
MPDA, the software that integrates pooled DNA association analysis and allelic imbalance analysis, provides a convenient analysis system for extensive whole-genome pooled DNA data analysis. The software, user manual and illustrated examples are freely available online at the MPDA website listed in the Availability and requirements section.
High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (termed "contrast data" for short) in a comparable manner and to extract gene sets, in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental, etc.).
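The notion of contrast data described above can be made concrete with a small sketch: one record per gene holding the differential expression, t-statistic and p-value for a contrast, plus a function that extracts up- and down-regulated gene sets. The field names, cutoffs and gene identifiers are hypothetical choices for illustration and do not describe any particular storage format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContrastRow:
    gene: str
    log_fold_change: float  # differential expression for the contrast
    t_statistic: float
    p_value: float

def extract_gene_sets(rows, p_cutoff=0.05, min_abs_lfc=1.0):
    # Split significant genes into up- and down-regulated sets.
    up, down = set(), set()
    for row in rows:
        if row.p_value <= p_cutoff and abs(row.log_fold_change) >= min_abs_lfc:
            (up if row.log_fold_change > 0 else down).add(row.gene)
    return {"up": up, "down": down}

# Hypothetical contrast: every gene is kept, not only the significant ones.
contrast = [
    ContrastRow("GENE_A", 2.1, 6.3, 0.0004),
    ContrastRow("GENE_B", -1.7, -5.1, 0.002),
    ContrastRow("GENE_C", 0.2, 0.6, 0.55),
    ContrastRow("GENE_D", 1.4, 1.9, 0.08),
]

gene_sets = extract_gene_sets(contrast)
print(gene_sets)
```

Keeping the full gene list rather than only the thresholded genes means the gene sets can be re-derived later with different cutoffs.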
To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are currently integrated as an analysis module as well as additional tools to support biological interpretation. Confero is a standalone system that also integrates with Galaxy, an open-source workflow management and data integration system. To illustrate Confero platform functionality we walk through major aspects of the Confero workflow and results using the Bioconductor estrogen package dataset.
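Over-representation analysis is commonly computed as a one-sided hypergeometric test: given a universe of measured genes, it gives the probability that a list of selected genes overlaps a gene set at least as much as observed. The sketch below shows that general calculation using only the Python standard library; the function name and gene identifiers are hypothetical, and it is not presented as the implementation used in Confero.

```python
from math import comb

def ora_p_value(universe, gene_set, selected):
    # Upper-tail hypergeometric probability P(overlap >= observed).
    universe = set(universe)
    gene_set = set(gene_set) & universe
    selected = set(selected) & universe
    n_universe = len(universe)
    n_set = len(gene_set)
    n_selected = len(selected)
    observed = len(gene_set & selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_set, i) * comb(n_universe - n_set, n_selected - i)
        for i in range(observed, min(n_set, n_selected) + 1)
    )
    return tail / total

# Hypothetical universe of 20 genes, a 5-gene set, and 4 selected genes,
# 3 of which belong to the set.
universe = [f"G{i}" for i in range(1, 21)]
gene_set = ["G1", "G2", "G3", "G4", "G5"]
selected = ["G1", "G2", "G3", "G10"]

p = ora_p_value(universe, gene_set, selected)
print("ORA p-value:", p)
```

For this toy input the tail probability is 155/4845, about 0.032. In practice the p-values from many gene sets would also need multiple-testing correction, which is omitted here.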
Confero provides a unique and flexible platform to support downstream computational analysis facilitating biological interpretation. The system has been designed in order to provide the researcher with a simple, innovative, and extensible solution to store and exploit analyzed data in a sustainable and reproducible manner thereby accelerating knowledge-driven research. Confero source code is freely available from http://sourceforge.net/projects/confero/.
Gene expression; Contrast data; Gene set; Gene set enrichment; Omics; Microarray; Next-generation sequencing; Reproducible research system; Knowledge acquisition