A data set was generated to examine global changes in gene expression in rat liver over time in response to a single bolus dose of methylprednisolone. Four control animals and 43 drug-treated animals were humanely killed at 16 different time points following drug administration. Total RNA preparations from the livers of these animals were hybridized to 47 individual Affymetrix RU34A gene chips, generating data for 8799 different probe sets for each chip. Data mining techniques applicable to gene array time series data sets were applied to this data set to identify drug-regulated changes in gene expression. A series of 4 sequentially applied filters was developed to eliminate probe sets that were not expressed in the tissue, were not regulated by the drug treatment, or did not meet defined quality control standards. These filters eliminated 7287 probe sets of the 8799 total (82%) from further consideration. Application of judiciously chosen filters is an effective tool for data mining of time series data sets. The remaining data can then be further analyzed by clustering and mathematical modeling techniques.
Data mining; gene arrays; glucocorticoids; mathematical modeling; pharmacogenomics
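The sequential filtering described in the abstract above can be sketched as a chain of keep/reject tests applied in order, with a count of how many probe sets each step removes. This is a minimal sketch only: the abstract does not give the filter definitions, so the three tests, the thresholds (`MIN_SIGNAL`, `MIN_FOLD`, `MIN_POINTS`, `MAX_CV`), and the profile layout are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical thresholds; the abstract does not state the actual criteria.
MIN_SIGNAL = 100.0   # minimum mean signal to call a probe set "expressed"
MIN_FOLD = 1.5       # fold deviation from control counted as a change
MIN_POINTS = 2       # time points that must show such a change
MAX_CV = 0.5         # maximum average coefficient of variation among replicates


def time_means(profile):
    """Mean signal at each time point (one list of replicates per time point)."""
    return [mean(reps) for reps in profile["treated"]]


def is_expressed(profile):
    return max(time_means(profile) + [mean(profile["control"])]) >= MIN_SIGNAL


def is_regulated(profile):
    base = mean(profile["control"])
    hits = [m for m in time_means(profile)
            if m >= base * MIN_FOLD or m <= base / MIN_FOLD]
    return len(hits) >= MIN_POINTS


def passes_qc(profile):
    cvs = [pstdev(reps) / mean(reps) for reps in profile["treated"] if len(reps) > 1]
    return not cvs or mean(cvs) <= MAX_CV


def apply_filters(profiles, filters):
    """Apply filters in order; return survivors and the number removed by each filter."""
    surviving = dict(profiles)
    removed = {}
    for name, keep in filters:
        kept = {pid: p for pid, p in surviving.items() if keep(p)}
        removed[name] = len(surviving) - len(kept)
        surviving = kept
    return surviving, removed


# Made-up profiles: control replicates plus replicate signals at three time points.
profiles = {
    "absent":  {"control": [10, 12],   "treated": [[11, 9], [10, 12], [8, 10]]},
    "flat":    {"control": [500, 520], "treated": [[505, 515], [490, 530], [500, 510]]},
    "noisy":   {"control": [200, 200], "treated": [[100, 900], [50, 1150], [200, 210]]},
    "induced": {"control": [200, 210], "treated": [[600, 640], [900, 880], [220, 210]]},
}
filters = [("expressed", is_expressed), ("regulated", is_regulated), ("quality", passes_qc)]
surviving, removed = apply_filters(profiles, filters)
```

Because the filters run in sequence, each count in `removed` refers only to probe sets that survived the earlier steps, which is how a cumulative elimination figure such as the one quoted in the abstract would be assembled.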
Kidney is a major target for adverse effects associated with corticosteroids. A microarray dataset was generated to examine changes in gene expression in rat kidney in response to methylprednisolone. Four control and 48 drug-treated animals were killed at 16 times after drug administration. Kidney RNA was used to query 52 individual Affymetrix chips, generating data for 15,967 different probe sets for each chip. Mining techniques applicable to time series data that identify drug-regulated changes in gene expression were applied. Four sequential filters eliminated probe sets that were not expressed in the tissue, not regulated by drug, or did not meet defined quality control standards. These filters eliminated 14,890 probe sets (94%) from further consideration. Application of judiciously chosen filters is an effective tool for data mining of time series datasets. The remaining data can then be further analyzed by clustering and mathematical modeling. Initial analysis of this filtered dataset identified a group of genes whose pattern of regulation was highly correlated with prototype corticosteroid enhanced genes. Twenty genes in this group, as well as selected genes exhibiting either downregulation or no regulation, were analyzed for 5′ GRE half-sites conserved across species. In general, the results support the hypothesis that the existence of conserved DNA binding sites can serve as an important adjunct to purely analytic approaches to clustering genes into groups with common mechanisms of regulation. This dataset, as well as similar datasets on liver and muscle, is available online in a format amenable to further analysis by others.
data mining; gene arrays; glucocorticoids; pharmacogenomics; evolutionary conservation
Elevated systemic levels of glucocorticoids are causally related to peripheral insulin resistance. The pharmacological use of synthetic glucocorticoids (corticosteroids) often results in insulin resistance/type II diabetes. Skeletal muscle is responsible for close to 80% of the insulin-induced systemic disposal of glucose and is a major target for glucocorticoid-induced insulin resistance. We used Affymetrix gene chips to profile the dynamic changes in mRNA expression in rat skeletal muscle in response to a single bolus dose of the synthetic glucocorticoid methylprednisolone. Temporal expression profiles (analyzed on individual chips) were obtained from tissues of 48 drug-treated animals encompassing 16 time points over 72 h following drug administration along with four vehicle-treated controls. Data mining identified 653 regulated probe sets out of 8799 present on the chip. Of these 653 probe sets we identified 29, representing 22 gene transcripts, that were associated with the development of insulin resistance. These 29 probe sets were regulated in three fundamental temporal patterns. Sixteen probe sets coding for 12 different genes had a profile of enhanced expression. Ten probe sets coding for eight different genes showed decreased expression, and three probe sets coding for two genes showed biphasic temporal signatures. These transcripts were grouped into four general functional categories: signal transduction, transcription regulation, carbohydrate/fat metabolism, and regulation of blood flow to the muscle. The results demonstrate the polygenic nature of transcriptional changes associated with insulin resistance that can provide a temporal scaffolding for translational and post-translational data as they become available.
Chronopharmacology is an important but under-explored aspect of therapeutics. Rhythmic variations in biological processes can influence drug action, including pharmacodynamic responses, due to circadian variations in the availability or functioning of drug targets. We hypothesized that global gene expression analysis can be useful in the identification of circadian regulated genes involved in drug action. Circadian variations in gene expression in rat liver were explored using Affymetrix gene arrays. A rich time series involving animals analyzed at 18 time points within the 24 hour cycle was generated. Of the more than 15,000 probe sets on these arrays, 265 exhibited oscillations with a 24 hour frequency. Cluster analysis yielded 5 distinct circadian clusters, with approximately two-thirds of the transcripts reaching maximum expression during the animal’s dark/active period. Of the 265 probe sets, 107 of potential therapeutic importance were identified. The expression levels of clock genes were also investigated in this study. Five clock genes exhibited circadian variation in liver, and data suggest that these genes may also be regulated by corticosteroids.
One of the challenges in constructing biological models involves resolving meaningful data patterns from which the mathematical models will be generated. For models that describe the change of mRNA in response to drug administration, questions exist whether the correct genes have been selected given the myriad transcriptional effects that may occur. Oftentimes, different algorithms will select or cluster different groups of genes from the same data set. A new approach was developed that focuses on identifying the underlying global dynamics of the system instead of selecting individual genes. The procedure was applied to microarray genomic data obtained from rat liver after a large single dose of methylprednisolone in 52 adrenalectomized rats. Twelve clusters of at least 30 genes each were selected, reflecting the major changes over time. In addition to isolating the underlying dynamics of the system, this method extracts and clusters the genes that make up this global dynamic, so that the contributions of specific mechanisms affected by the drug can be analyzed further.
High-throughput data collection using gene microarrays has great potential as a method for addressing the pharmacogenomics of complex biological systems. Similarly, mechanism-based pharmacokinetic/pharmacodynamic modeling provides a tool for formulating quantitative testable hypotheses concerning the responses of complex biological systems. As the response of such systems to drugs generally entails cascades of molecular events in time, a time series design provides the best approach to capturing the full scope of drug effects. A major problem in using microarrays for high-throughput data collection is sorting through the massive amount of data in order to identify probe sets and genes of interest. Due to its inherent redundancy, a rich time series containing many time points and multiple samples per time point allows for the use of less stringent criteria of expression, expression change and data quality for initial filtering of unwanted probe sets. The remaining probe sets can then become the focus of more intense scrutiny by other methods, including temporal clustering, functional clustering and pharmacokinetic/pharmacodynamic modeling, which provide additional ways of identifying the probes and genes of pharmacological interest.
corticosteroids; data mining; expression profiling; gene chips; methylprednisolone; microarrays; modeling; pharmacodynamics; skeletal muscle; time series
Corticosteroids (CS) regulate many enzymes at both mRNA and protein levels. This study used microarrays to broadly assess regulation of various genes related to the greater urea cycle and employed pharmacokinetic/pharmacodynamic (PK/PD) modeling to quantitatively analyze and compare the temporal profiles of these genes during acute and chronic exposure to methylprednisolone (MPL). One group of adrenalectomized male Wistar rats received an intravenous bolus dose (50 mg/kg) of MPL, whereas a second group received MPL by a subcutaneous infusion (Alzet osmotic pumps) at a rate of 0.3 mg/kg/hr for seven days. The rats were sacrificed at various time points over 72 hours (acute) or 168 hours (chronic) and livers were harvested. Total RNA was extracted and Affymetrix® gene chips (RG_U34A for acute and RAE 230A for chronic) were used to identify genes regulated by CS. Besides five primary urea cycle enzymes, many other genes related to the urea cycle showed substantial changes in mRNA expression. Some genes that were simply up- or down-regulated after acute MPL showed complex biphasic patterns upon chronic infusion, indicating involvement of secondary regulation. For the simplest patterns, indirect response models were used to describe the nuclear steroid-bound receptor mediated increase or decrease in gene transcription (e.g. tyrosine aminotransferase, glucocorticoid receptor). For the biphasic profiles, involvement of a secondary biosignal was assumed (e.g. ornithine decarboxylase, CCAAT/enhancer binding protein) and more complex models were derived. Microarrays were used successfully to explore CS effects on various urea cycle enzyme genes. PD models presented in this report describe testable hypotheses regarding molecular mechanisms and quantitatively characterize the direct or indirect regulation of various genes by CS.
urea cycle; corticosteroids; methylprednisolone; pharmacodynamics; genomics
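The "simplest pattern" mentioned in the abstract above, an indirect response model in which a drug-related signal stimulates mRNA production, can be illustrated with a generic textbook form: dR/dt = k_in (1 + S · D(t)) − k_out · R, starting from the baseline R0 = k_in / k_out. The sketch below is not the fitted model from the study. The driver D(t) is a hypothetical mono-exponential stand-in for the nuclear drug-receptor complex, and every parameter value is made up for illustration.

```python
import math


def driver(t, k_el=0.5):
    """Hypothetical stand-in for the nuclear drug-receptor signal (decays with rate k_el per hour)."""
    return math.exp(-k_el * t)


def simulate_indirect_response(k_in, k_out, s_max, t_end, dt=0.01):
    """Euler integration of dR/dt = k_in*(1 + s_max*driver(t)) - k_out*R from baseline."""
    r = k_in / k_out          # baseline mRNA level, where production equals loss
    t = 0.0
    times, values = [t], [r]
    steps = int(round(t_end / dt))
    for i in range(steps):
        r += dt * (k_in * (1.0 + s_max * driver(t)) - k_out * r)
        t = (i + 1) * dt
        times.append(t)
        values.append(r)
    return times, values


# Illustrative parameters only: baseline 50 units, followed for 72 h as in the acute design.
times, values = simulate_indirect_response(k_in=10.0, k_out=0.2, s_max=5.0, t_end=72.0)
peak_time = times[values.index(max(values))]
```

With these assumed values the simulated mRNA rises to a single peak a few hours after dosing and relaxes back to baseline by 72 h. A down-regulated gene could be sketched the same way by replacing the stimulation term with an inhibition term such as `k_in * (1 - i_max * driver(t))`, with `i_max` between 0 and 1.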
Corticosteroid (CS) effects on insulin resistance-related genes in rat skeletal muscle were studied. In our acute study, adrenalectomized (ADX) rats were given single doses of 50 mg/kg methylprednisolone (MPL) intravenously. In our chronic study, ADX rats were implanted with Alzet mini-pumps giving zero-order release rates of 0.3 mg/kg/h MPL and sacrificed at various times up to 7 days. Total RNA was extracted from gastrocnemius muscles and hybridized to Affymetrix GeneChips. Data mining and literature searches identified 6 insulin resistance-related genes which exhibited complex regulatory pathways. Insulin receptor substrate-1 (IRS-1), uncoupling protein 3 (UCP3), pyruvate dehydrogenase kinase isoenzyme 4 (PDK4), fatty acid translocase (FAT) and glycerol-3-phosphate acyltransferase (GPAT) dynamic profiles were modeled with mutual effects by calculated nuclear drug-receptor complex (DR(N)) and transcription factors. The oscillatory feature of endothelin-1 (ET-1) expression was depicted by a negative feedback loop. These integrated models provide testable quantitative hypotheses for these regulatory cascades.
corticosteroid; glucocorticoid; microarrays; mathematical modeling; insulin resistance
Blood leukocytes play a major role in mediating local and systemic inflammation during acute pancreatitis. We hypothesize that peripheral blood mononuclear cells (PBMC) in circulation exhibit unique changes in gene expression, and could provide a “reporter” function that reflects the inflammatory response in the pancreas during acute pancreatitis.
To determine specific changes in blood leukocytes during acute pancreatitis, we studied the gene transcription profile of peripheral blood mononuclear cells (PBMC) in a rat model of experimental pancreatitis (sodium taurocholate). Normal rats, saline controls and a model of septic shock were used as controls. cRNA obtained from PBMC of each group (n = 3) was applied to Affymetrix rat genome DNA Gene Chip Arrays.
From the 8,799 rat genes analyzed, 140 genes showed unique significant changes in their expression in PBMC during the acute phase of pancreatitis, but not in sepsis. Among the 140 genes, 57 were upregulated, while 69 were downregulated. Platelet-derived growth factor receptor, prostaglandin E2 receptor and phospholipase D1 are among the top upregulated genes. Others include genes involved in G protein-coupled receptor and TGF-β-mediated signaling pathways, while genes associated with apoptosis, glucocorticoid receptors and even the cholecystokinin receptor are downregulated.
Microarray analysis in transcriptional profiling of PBMC showed that genes that are uniquely related to molecular and pancreatic function display differential expression in acute pancreatitis. Profiling genes obtained from an easily accessible source during severe pancreatitis may identify surrogate markers for disease severity.
Acute pancreatitis; oligonucleotide microarray; abdominal sepsis; peripheral blood mononuclear cells; PBMC
Affymetrix GeneChips can be re-annotated at the probe-level by breaking up the original probe-sets and recomposing new probe-sets based on up-to-date genomic knowledge, such as available in Entrez Gene. This results in custom Chip Description Files (CDF). Using these custom CDFs improves the quality of the data and thus the results of related gene expression studies. However, 44–71% of the probes on a GeneChip are lost in this re-annotation process. Although generally aimed at less known genes, losing these probes obviously means a substantial loss of expensive experiment data. Biologists are therefore very reluctant to adopt this approach.
We aimed to re-introduce the non-affected Affymetrix probe-sets after these re-annotation procedures. For this, we developed an algorithm (CDF-Merger) and applied it to standard Affymetrix CDFs and custom Brainarray CDFs to obtain Hybrid CDFs. Thus, salvaging lost Affymetrix probes with our CDF-Merger restored probe content up to 94%. Because the salvaged probes (up to 54% of the probe content on the arrays) represent less-reliable probe-sets, we made the origin of all probe-set definitions traceable, so biologists can choose at any time in their analyses, which subset of probe-sets they want to use.
The availability of up-to-date Hybrid CDFs plus an R environment allows for easy implementation of our approach.
The interpretability of microarray data can be affected by sample quality. To systematically explore how RNA quality affects microarray assay performance, a set of rat liver RNA samples with a progressive change in RNA integrity was generated by thawing frozen tissue or by ex vivo incubation of fresh tissue over a time course.
Incubation of tissue at 37°C for several hours had little effect on RNA integrity, but did induce changes in the transcript levels of stress response genes and immune cell markers. In contrast, thawing of tissue led to a rapid loss of RNA integrity. Probe sets identified as most sensitive to RNA degradation tended to be located more than 1000 nucleotides upstream of their transcription termini, similar to the positioning of control probe sets used to assess sample quality on Affymetrix GeneChip® arrays. Samples with RNA integrity numbers less than or equal to 7 showed a significant increase in false positives relative to undegraded liver RNA and a reduction in the detection of true positives among probe sets most sensitive to sample integrity for in silico modeled changes of 1.5-, 2-, and 4-fold.
Although moderate levels of RNA degradation are tolerated by microarrays with 3'-biased probe selection designs, in this study we identify a threshold beyond which decreased specificity and sensitivity can be observed that closely correlates with average target length. These results highlight the value of annotating microarray data with metrics that capture important aspects of sample quality.
The use of gene expression profiling in both clinical and laboratory settings would be enhanced by better characterization of variance due to individual, environmental, and technical factors. Meta-analysis of microarray data from untreated or vehicle-treated animals within the control arm of toxicogenomics studies could yield useful information on baseline fluctuations in gene expression, although control animal data have not been available on a scale and in a form well suited to data mining.
A dataset of control animal microarray expression data was assembled by a working group of the Health and Environmental Sciences Institute's Technical Committee on the Application of Genomics in Mechanism Based Risk Assessment in order to provide a public resource for assessments of variability in baseline gene expression. Data from over 500 Affymetrix microarrays from control rat liver and kidney were collected from 16 different institutions. Thirty-five biological and technical factors were obtained for each animal, describing a wide range of study characteristics, and a subset were evaluated in detail for their contribution to total variability using multivariate statistical and graphical techniques.
The study factors that emerged as key sources of variability included gender, organ section, strain, and fasting state. These and other study factors were identified as key descriptors that should be included in the minimal information about a toxicogenomics study needed for interpretation of results by an independent source. Genes that are the most and least variable, gender-selective, or altered by fasting were also identified and functionally categorized. Better characterization of gene expression variability in control animals will aid in the design of toxicogenomics studies and in the interpretation of their results.
PPARα is a ligand-activated transcription factor involved in the regulation of nutrient metabolism and inflammation. Although much is already known about the function of PPARα in hepatic lipid metabolism, many PPARα-dependent pathways and genes have yet to be discovered. In order to obtain an overview of PPARα-regulated genes relevant to lipid metabolism, and to probe for novel candidate PPARα target genes, livers from several animal studies in which PPARα was activated and/or disabled were analyzed by Affymetrix GeneChips. Numerous novel PPARα-regulated genes relevant to lipid metabolism were identified. Out of this set of genes, eight genes were singled out for study of PPARα-dependent regulation in mouse liver and in mouse, rat, and human primary hepatocytes, including thioredoxin interacting protein (Txnip), electron-transferring-flavoprotein β polypeptide (Etfb), electron-transferring-flavoprotein dehydrogenase (Etfdh), phosphatidylcholine transfer protein (Pctp), endothelial lipase (EL, Lipg), adipose triglyceride lipase (Pnpla2), hormone-sensitive lipase (HSL, Lipe), and monoglyceride lipase (Mgll). Using an in silico screening approach, one or more PPAR response elements (PPREs) were identified in each of these genes. Regulation of Pnpla2, Lipe, and Mgll, which are involved in triglyceride hydrolysis, was studied under conditions of elevated hepatic lipids. In wild-type mice fed a high fat diet, the decrease in hepatic lipids following treatment with the PPARα agonist Wy14643 was paralleled by significant up-regulation of Pnpla2, Lipe, and Mgll, suggesting that induction of triglyceride hydrolysis may contribute to the anti-steatotic role of PPARα. Our study illustrates the power of transcriptional profiling to uncover novel PPARα-regulated genes and pathways in liver.
To identify early diabetes-related alterations in gene expression in bladder and erectile tissue that would provide novel diagnostic and therapeutic treatment targets to prevent, delay or ameliorate the ensuing bladder and erectile dysfunction.
MATERIALS AND METHODS
The RG-U34A rat GeneChip® (Affymetrix Inc., Sunnyvale, CA, USA) oligonucleotide microarray (containing ≈8799 genes) was used to evaluate gene expression in corporal and male bladder tissue excised from rats 1 week after confirmation of a diabetic state, but before demonstrable changes in organ function in vivo. A conservative analytical approach was used to detect alterations in gene expression, and gene ontology (GO) classifications were used to identify biological themes/pathways involved in the aetiology of the organ dysfunction.
In all, 320 and 313 genes were differentially expressed in bladder and corporal tissue, respectively. GO analysis in bladder tissue showed prominent increases in biological pathways involved in cell proliferation, metabolism, actin cytoskeleton and myosin, as well as decreases in cell motility, and regulation of muscle contraction. GO analysis in corpora showed increases in pathways related to ion channel transport and ion channel activity, while there were decreases in collagen I and actin genes.
The changes in gene expression in these initial experiments are consistent with the pathophysiological characteristics of the bladder and erectile dysfunction seen later in the diabetic disease process. Thus, the observed changes in gene expression might be harbingers or biomarkers of impending organ dysfunction, and could provide useful diagnostic and therapeutic targets for a variety of progressive urological diseases/conditions (e.g. lower urinary tract symptoms related to benign prostatic hyperplasia, erectile dysfunction).
diabetes mellitus; streptozotocin; erectile dysfunction; bladder dysfunction; microarray analysis; gene chips; genomics
Effects of high fat diet (HFD) on obesity and, subsequently, on diabetes are highly variable and modulated by genetics in both humans and rodents. In this report, we characterized the response of Goto-Kakizaki (GK) rats, a spontaneous polygenic model for lean diabetes and healthy Wistar-Kyoto (WKY) controls, to high fat feeding from weaning to 20 weeks of age. Animals fed either normal diet or HFD were sacrificed at 4, 8, 12, 16 and 20 weeks of age and a wide array of physiological measurements were made along with gene expression profiling using Affymetrix gene array chips. Mining of the microarray data identified differentially regulated genes (involved in inflammation, metabolism, transcription regulation, and signaling) in diabetic animals, as well as the response of both strains to HFD. Functional annotation suggested that HFD increased inflammatory differences between the two strains. Chronic inflammation driven by heightened innate immune response was identified to be present in GK animals regardless of diet. In addition, compensatory mechanisms by which WKY animals on HFD resisted the development of diabetes were identified, thus illustrating the complexity of diabetes disease progression.
diabetes; high fat diet; gene expression; microarray
We have previously discovered that probes containing runs of four or more contiguous guanines are not reliable for measuring gene expression in the Human HG_U133A Affymetrix GeneChip data. These probes are not correlated with other members of their probe set, but they are correlated with each other. We now extend our analysis to different 3′ GeneChip designs of mouse, rat, and human. We find that, in all these chip designs, the G-stack probes (probes with a run of exactly four consecutive guanines) are correlated highly with each other, indicating that such probes are not reliable measures of gene expression in mammalian studies. Furthermore, there is no specific position of G-stack where the correlation is highest in all the chips. We also find that the latest designs of rat and mouse chips have significantly fewer G-stack probes compared to their predecessors, whereas there has not been a similar reduction in G-stack density across the changes in human chips. Moreover, we find significant changes in RMA values (after removing G-stack probes) as the number of G-stack probes increases.
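Flagging G-stack probes as defined in the abstract above reduces to finding maximal runs of guanines in each probe sequence. The sketch below is an illustration, not the authors' code: the function names and example sequences are hypothetical, and the `exact` flag switches between the two definitions that appear in the text (a run of exactly four guanines, or four or more).

```python
import re

G_RUN = re.compile(r"G+")  # matches each maximal run of consecutive guanines


def g_runs(seq):
    """Return (start, length) for every maximal run of G in the probe sequence."""
    return [(m.start(), len(m.group())) for m in G_RUN.finditer(seq.upper())]


def g_stack_positions(seq, length=4, exact=True):
    """0-based start positions of G-stacks.

    exact=True  -> runs of exactly `length` guanines
    exact=False -> runs of `length` or more guanines
    """
    hits = []
    for start, run_len in g_runs(seq):
        if run_len == length or (not exact and run_len > length):
            hits.append(start)
    return hits


# Made-up probe sequences for illustration.
probe_four = "ATCGGGGTACCATGCATGCATGCAT"    # one run of exactly four G
probe_five = "ATCGGGGGTACATGCATGCATGCAT"    # one run of five G
probe_three = "ATCGGGTACCATGCATGCATGCATA"   # longest run is three G
```

Returning the start position rather than a yes/no flag keeps the information needed to ask whether the position of the G-stack within the probe matters, which is one of the questions the abstract raises.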
Severe trauma, including burns, triggers a systemic response that significantly impacts on the liver, which plays a key role in the metabolic and immune responses aimed at restoring homeostasis. While many of these changes are likely regulated at the gene expression level, there is a need to better understand the dynamics and expression patterns of burn injury-induced genes in order to identify potential regulatory targets in the liver. Herein we characterized the response within the first 24 h in a standard animal model of burn injury using a time series of microarray gene expression data.
Rats were subjected to a full thickness dorsal scald burn injury covering 20% of their total body surface area while under general anesthesia. Animals were saline resuscitated and sacrificed at defined time points (0, 2, 4, 8, 16, and 24 h). Liver tissues were explanted and analyzed for their gene expression profiles using microarray technology. Sham controls consisted of animals handled similarly but not burned. After identifying differentially expressed probesets between sham and burn conditions over time, the concatenated data sets corresponding to these differentially expressed probesets in burn and sham groups were combined and analyzed using a “consensus clustering” approach.
The clustering method of expression data identified 621 burn-responsive probesets in 4 different co-expressed clusters. Functional characterization revealed that these 4 clusters are mainly associated with pro-inflammatory response, anti-inflammatory response, lipid biosynthesis, and insulin-regulated metabolism. Cluster 1 pro-inflammatory response is rapidly up-regulated (within the first 2 h) following burn injury, while Cluster 2 anti-inflammatory response is activated later on (around 8 h post burn). Cluster 3 lipid biosynthesis is downregulated rapidly following burn, possibly indicating a shift in the utilization of energy sources to produce acute phase proteins which serve the anti-inflammatory response. Cluster 4 insulin-regulated metabolism was down-regulated late in the observation window (around 16 h postburn), which suggests a potential mechanism to explain the onset of hypermetabolism, a delayed but well-known response that is characteristic of severe burns and trauma with potential adverse outcome.
Simultaneous analysis and comparison of gene expression profiles for both burn and sham control groups provided a more accurate estimation of the activation time, expression patterns, and characteristics of a given burn-induced response, on the basis of which the cause-effect relationships among responses were revealed.
Burn; gene expression; microarray; inflammation; liver
Affymetrix three-prime expression microarrays contain thousands of redundant probe sets that interrogate different regions of the same gene. Differential expression analysis methods rarely consider probe redundancy, which can lead to inaccurate inference about overall gene expression or cause investigators to overlook potentially valuable information about differential regulation of variant mRNA products. We investigated the behaviour and consistency of redundant probe sets in a publicly-available data set containing samples from mouse brain amygdala and hippocampus and asked how applying filtering methods to the data affected consistency of results obtained from redundant probe sets. A genome-based filter that screens and groups probe sets according to their overlapping genomic alignments significantly improved redundant probe set consistency. Screening based on qualitative Present-Absent calls from MAS5 also improved consistency. However, even after applying these filters, many redundant probe sets showed significant fold-change differences relative to each other, suggesting differential regulation of alternative transcript production. Visual inspection of these loci using an interactive genome visualization tool (igb.bioviz.org) exposed thirty putative examples of differential regulation of alternative splicing or polyadenylation across brain regions in mouse. This work demonstrates how P/A-call and genome-based filtering can improve consistency among redundant probe sets while at the same time exposing possible differential regulation of RNA processing pathways across sample types.
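The consistency check described above can be sketched as follows: keep only probe sets that are mostly called Present, compute a log2 fold change between the two brain regions for each, and flag genes whose redundant probe sets disagree. This is a simplified illustration, not the published method, which relied on statistical comparison and genome alignments. The thresholds `min_present` and `max_spread`, the record layout, and the example values are all hypothetical.

```python
from collections import defaultdict
from math import log2
from statistics import mean


def log2_fold_change(a, b):
    """log2 ratio of mean signal in sample group a over group b."""
    return log2(mean(a) / mean(b))


def inconsistent_genes(probe_sets, min_present=0.5, max_spread=1.0):
    """Genes whose Present-filtered redundant probe sets differ by more than max_spread in log2 fold change."""
    by_gene = defaultdict(list)
    for ps in probe_sets:
        calls = ps["calls"]                      # MAS5-style "P"/"A" calls, one per array
        if calls.count("P") / len(calls) >= min_present:
            fc = log2_fold_change(ps["amygdala"], ps["hippocampus"])
            by_gene[ps["gene"]].append(fc)
    return {gene: fcs for gene, fcs in by_gene.items()
            if len(fcs) > 1 and max(fcs) - min(fcs) > max_spread}


# Made-up records: two signal values per region and one detection call per array.
probe_sets = [
    {"gene": "GeneA", "calls": "PPPP", "amygdala": [400, 400], "hippocampus": [100, 100]},
    {"gene": "GeneA", "calls": "PPPP", "amygdala": [200, 200], "hippocampus": [200, 200]},
    {"gene": "GeneB", "calls": "PPPA", "amygdala": [200, 200], "hippocampus": [100, 100]},
    {"gene": "GeneB", "calls": "PPPP", "amygdala": [600, 600], "hippocampus": [300, 300]},
    {"gene": "GeneC", "calls": "PPPP", "amygdala": [400, 400], "hippocampus": [100, 100]},
    {"gene": "GeneC", "calls": "AAAA", "amygdala": [50, 50],   "hippocampus": [50, 50]},
]
flagged = inconsistent_genes(probe_sets)
```

In this toy input only GeneA is flagged: GeneB's probe sets agree, and GeneC's apparent disagreement disappears once the Absent probe set is filtered out, which mirrors the abstract's point that P/A filtering improves consistency while the remaining discrepancies may reflect alternative transcripts.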
The Affymetrix GeneChip is a widely used gene expression profiling platform. Since the chips were originally designed, the genome databases and gene definitions have been considerably updated. Thus, more accurate interpretation of microarray data requires parallel updating of the specificity of GeneChip probes. We propose a new probe remapping protocol, using the zebrafish GeneChips as an example, by removing nonspecific probes, and grouping the probes into transcript level probe sets using an integrated zebrafish genome annotation. This genome annotation is based on combining transcript information from multiple databases. This new remapping protocol, especially the new genome annotation, is shown here to be an important factor in improving the interpretation of gene expression microarray data.
Transcript data from the RefSeq, GenBank and Ensembl databases were downloaded from the UCSC genome browser, and integrated to generate a combined zebrafish genome annotation. Affymetrix probes were filtered and remapped according to the new annotation. The influence of transcript collection and gene definition methods was tested using two microarray data sets. Compared to remapping using a single database, this new remapping protocol results in up to 20% more probes being retained in the remapping, leading to approximately 1,000 more genes being detected. The differentially expressed gene lists are consequently increased by up to 30%. We are also able to detect up to three times more alternative splicing events. A small number of the bioinformatics predictions were confirmed using real-time PCR validation.
By combining gene definitions from multiple databases, it is possible to greatly increase the numbers of genes and splice variants that can be detected in microarray gene expression experiments.
Intensity values measured by Affymetrix microarrays have to be both normalized, to be able to compare different microarrays by removing non-biological variation, and summarized, generating the final probe set expression values. Various pre-processing techniques, such as dChip, GCRMA, RMA and MAS have been developed for this purpose. This study assesses the effect of applying different pre-processing methods on the results of analyses of large Affymetrix datasets. By focusing on practical applications of microarray-based research, this study provides insight into the relevance of pre-processing procedures to biology-oriented researchers.
Using two publicly available datasets, i.e., gene-expression data of 285 patients with Acute Myeloid Leukemia (AML, Affymetrix HG-U133A GeneChip) and 42 samples of tumor tissue of the embryonal central nervous system (CNS, Affymetrix HuGeneFL GeneChip), we tested the effect of the four pre-processing strategies mentioned above, on (1) expression level measurements, (2) detection of differential expression, (3) cluster analysis and (4) classification of samples. In most cases, the effect of pre-processing is relatively small compared to other choices made in an analysis for the AML dataset, but has a more profound effect on the outcome of the CNS dataset. Analyses on individual probe sets, such as testing for differential expression, are affected most; supervised, multivariate analyses such as classification are far less sensitive to pre-processing.
Using two experimental datasets, we show that the choice of pre-processing method is of relatively minor influence on the final analysis outcome of large microarray studies whereas it can have important effects on the results of a smaller study. The data source (platform, tissue homogeneity, RNA quality) is potentially of bigger importance than the choice of pre-processing method.
To elucidate the mechanisms underlying peripheral neuropathic pain in the context of HIV infection and antiretroviral therapy, we measured gene expression in dorsal root ganglia (DRG) of rats subjected to systemic treatment with the antiretroviral agent ddC (zalcitabine) and concomitant delivery of HIV-gp120 to the rat sciatic nerve. L4 and L5 DRGs were collected at day 14 (time of peak behavioural change) and changes in gene expression were measured using Affymetrix whole genome rat arrays. Conventional analysis of this data set and Gene Set Enrichment Analysis (GSEA) were performed to discover biological processes altered in this model. Transcripts associated with G protein-coupled receptor signalling and cell adhesion were enriched in the treated animals, while ribosomal proteins and proteasome pathways were associated with gene down-regulation. To identify genes that are directly relevant to neuropathic mechanical hypersensitivity, as opposed to epiphenomena associated with other aspects of the response to a sciatic nerve lesion, we compared the gp120 + ddC-evoked gene expression with that observed in a model of traumatic neuropathic pain (L5 spinal nerve transection), where hypersensitivity to a static mechanical stimulus is also observed. We identified 39 genes/expressed sequence tags that are differentially expressed in the same direction in both models. Most of these have not previously been implicated in mechanical hypersensitivity and may represent novel targets for therapeutic intervention. As an external control, the RNA expression of three genes was examined by RT-PCR, while the protein levels of two were studied using western blot analysis.
Neuropathic pain; HIV; Mechanical hypersensitivity; Microarray
The Remote Analysis Computation for gene Expression data (RACE) suite is a collection of bioinformatics web tools designed for the analysis of DNA microarray data. RACE performs probe-level data preprocessing, extensive quality checks, data visualization and data normalization for Affymetrix GeneChips. In addition, it offers differential expression analysis on normalized expression levels from any array platform. RACE estimates the false discovery rates of lists of potentially regulated genes and provides a Gene Ontology-term analysis tool for GeneChip data to support the biological interpretation and annotation of results. The analysis is fully automated but can be customized by flexible parameter settings. To offer a convenient starting point for subsequent analyses, and to provide maximum transparency, the R scripts used to generate the results can be downloaded along with the output files. RACE is freely available for use at .
Current methods of analyzing Affymetrix GeneChip® microarray data require the estimation of probe set expression summaries, followed by application of statistical tests to determine which genes are differentially expressed. The S-Score algorithm described by Zhang and colleagues is an alternative method that allows tests of hypotheses directly from probe level data. It is based on an error model in which the detected signal is proportional to the probe pair signal for highly expressed genes, but approaches a background level (rather than 0) for genes with low levels of expression. This model is used to calculate relative change in probe pair intensities that converts probe signals into multiple measurements with equalized errors, which are summed over a probe set to form the S-Score. Assuming no expression differences between chips, the S-Score follows a standard normal distribution, allowing direct tests of hypotheses to be made. Using spike-in and dilution datasets, we validated the S-Score method against comparisons of gene expression utilizing the more recently developed methods RMA, dChip, and MAS5.
The S-Score showed excellent sensitivity and specificity in detecting low-level gene expression changes. Rank ordering of S-Score values reflected known fold-change values more accurately than the other algorithms did.
The S-Score method, which uses probe-level data directly, offers significant advantages over comparisons using only probe set expression summaries.
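The idea of summing error-equalized probe pair differences into a statistic that is standard normal under the null can be sketched as follows. This is a simplified illustration and not the published S-Score formula: the error term, the parameter names `gamma` (multiplicative error) and `sigma` (background noise) and their default values are assumptions chosen only to mimic the error model described above, in which error is proportional to signal at high intensity and approaches a background level at low intensity.

```python
import math


def s_score_sketch(chip1, chip2, gamma=0.1, sigma=10.0):
    """Simplified S-Score-like statistic for one probe set on two chips.

    chip1, chip2: lists of (PM, MM) intensity pairs, one per probe pair.
    gamma, sigma: hypothetical error-model parameters (assumed, not estimated).
    """
    if len(chip1) != len(chip2) or not chip1:
        raise ValueError("both chips need the same non-zero number of probe pairs")

    total = 0.0
    for (pm1, mm1), (pm2, mm2) in zip(chip1, chip2):
        d1 = pm1 - mm1  # probe pair signal on chip 1
        d2 = pm2 - mm2  # probe pair signal on chip 2
        # Error grows with signal (gamma term) but never falls below background (sigma term).
        error = math.sqrt(gamma ** 2 * (d1 ** 2 + d2 ** 2) + 2 * sigma ** 2)
        total += (d1 - d2) / error
    # Dividing by sqrt(N) keeps the sum at unit variance if each term has unit variance.
    return total / math.sqrt(len(chip1))


def two_sided_p(score):
    """Two-sided p-value for a standard normal statistic."""
    return math.erfc(abs(score) / math.sqrt(2))


baseline = [(1000, 200), (800, 100), (50, 40)]
induced = [(2000, 200), (1500, 100), (90, 40)]
s = s_score_sketch(induced, baseline)
print(s, two_sided_p(s))
```

In practice the error-model parameters would have to be estimated from the data; the fixed values here are placeholders.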
The goal of these studies was to characterize the transcriptional network regulating changes in gene expression in the remnant liver of the rat after 70% partial hepatectomy (PHx) during the early phase response including the transition of hepatocytes from the quiescent (G0) state and the onset of the G1 phase of the cell cycle.
The transcriptome of remnant livers was monitored at 1, 2, 4, and 6 hours after PHx using cDNA microarrays. Differentially regulated genes were grouped into six clusters according to their temporal expression profiles. Promoter regions of genes in these clusters were examined for shared transcription factor binding sites (TFBS) by comparing enrichment of each TFBS relative to a reference set using the Promoter Analysis and Interaction Network Toolset (PAINT).
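Enrichment of a binding site in a cluster relative to a reference set is commonly assessed with a hypergeometric tail probability. The sketch below shows that calculation; the text above does not state which statistic PAINT applies, so the test, the function name `enrichment_p_value` and the example counts are assumptions made for illustration, and the cluster is assumed to be a subset of the reference set.

```python
from math import comb


def enrichment_p_value(cluster_hits, cluster_size, reference_hits, reference_size):
    """P(X >= cluster_hits) under the hypergeometric distribution.

    Models the cluster as cluster_size promoters drawn without replacement from
    reference_size promoters, reference_hits of which contain the binding site.
    """
    if not 0 <= reference_hits <= reference_size:
        raise ValueError("reference_hits must lie between 0 and reference_size")
    if not 0 <= cluster_size <= reference_size:
        raise ValueError("cluster_size must lie between 0 and reference_size")

    max_hits = min(cluster_size, reference_hits)
    total = comb(reference_size, cluster_size)
    tail = sum(
        comb(reference_hits, k) * comb(reference_size - reference_hits, cluster_size - k)
        for k in range(cluster_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 3 of 3 cluster promoters carry a site found in 3 of 10 reference promoters.
print(enrichment_p_value(3, 3, 3, 10))
```

Because many binding sites are tested across several clusters, the resulting p-values would normally be corrected for multiple testing before any site is called enriched.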
Analysis of the gene expression time series data using ANOVA resulted in a total of 309 genes significantly up- or down-regulated at any of the four time points at a 20% FDR threshold. Sham-operated animals showed no significant differential expression. A subset of the differentially expressed genes was validated using quantitative RT-PCR. Distinct sets of TFBS could be identified that were significantly enriched in each one of the different temporal gene expression clusters. These included binding sites for transcription factors that had previously been recognized as contributing to the onset of regeneration, including NF-κB, C/EBP, HNF-1, CREB, as well as factors, such as ATF, AP-2, LEF-1, GATA and PAX-6, that had not yet been recognized to be involved in this process. A subset of these candidate TFBS was validated by measuring activation of corresponding transcription factors (HNF-1, NF-κB, CREB, C/EBP-α and C/EBP-β, GATA-1, AP-2, PAX-6) in nuclear extracts from the remnant livers.
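Selecting genes at a fixed FDR threshold from per-gene p-values can be sketched with the Benjamini-Hochberg step-up procedure. The text above reports a 20% FDR threshold but does not name the procedure used, so Benjamini-Hochberg is an assumption here, as are the function name `benjamini_hochberg` and the example p-values.

```python
def benjamini_hochberg(p_values, fdr=0.20):
    """Return the indices of hypotheses rejected at the given false discovery rate."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is under its threshold.
        if p_values[index] <= rank / m * fdr:
            cutoff_rank = rank

    # Everything ranked at or below the cutoff is rejected, even if it missed its own threshold.
    return sorted(order[:cutoff_rank])


# Hypothetical per-gene ANOVA p-values.
p_values = [0.01, 0.04, 0.03, 0.5, 0.9]
print(benjamini_hochberg(p_values, fdr=0.20))
```

The step-up behaviour matters: a gene whose p-value exceeds its own rank threshold is still selected when a gene ranked after it passes.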
This analysis revealed multiple candidate transcription factors activated in the remnant livers, some known to be involved in the early phase of liver regeneration, and several not previously identified. The study describes the predominant temporal and functional elements to which these factors contribute and demonstrates the potential of this novel approach to define the functional correlates of the transcriptional regulatory network driving the early response to partial hepatectomy.
Affymetrix microarrays have become a standard experimental platform for studies of mRNA expression profiling. Their success is due, in part, to the multiple oligonucleotide features (probes) against each transcript (probe set). This multiple testing allows for more robust background assessments and gene expression measures, and has permitted the development of many computational methods to translate image data into a single normalized "signal" for mRNA transcript abundance. Many probe set algorithms have now been developed, with a gradual movement away from chip-by-chip methods (MAS5) to project-based model-fitting methods (dCHIP, RMA, others). Data interpretation is often profoundly changed by the choice of algorithm, leaving disoriented biologists to question what the "accurate" interpretation of their experiment is. Here, we summarize the debate concerning probe set algorithms. We provide examples of how changes in mismatch weight, normalizations, and construction of expression ratios each dramatically change data interpretation. All interpretations can be considered computationally appropriate, but with varying biological credibility. We also illustrate the performance of two new hybrid algorithms (PLIER, GC-RMA) relative to more traditional algorithms (dCHIP, MAS5, Probe Profiler PCA, RMA) using an interactive power analysis tool. PLIER appears superior to other algorithms in avoiding false positives with poorly performing probe sets. Based on our interpretation of the literature, and examples presented here, we suggest that the variability in performance of probe set algorithms is more dependent upon assumptions regarding "background" than on calculations of "signal". We argue that "background" is an enormously complex variable that can only be vaguely quantified, and thus the "best" probe set algorithm will vary from project to project.