Related Articles

1.  Development and production of an oligonucleotide MuscleChip: use for validation of ambiguous ESTs 
BMC Bioinformatics  2002;3:33.
We describe the development, validation, and use of a highly redundant 120,000 oligonucleotide microarray (MuscleChip) containing 4,601 probe sets representing 1,150 known genes expressed in muscle and 2,075 EST clusters from a non-normalized subtracted muscle EST sequencing project (28,074 EST sequences). This set included 369 novel EST clusters showing no match to previously characterized proteins in any database. Each probe set was designed to contain 20–32 25-mer oligonucleotides (10–16 paired perfect match and mismatch probe pairs per gene), with each probe evaluated for hybridization kinetics (Tm) and similarity to other sequences. The 120,000 oligonucleotides were synthesized by photolithography and light-activated chemistry on each microarray.
Hybridization of human muscle cRNAs to this MuscleChip (33 samples) showed a correlation of 0.6 between the number of ESTs sequenced in each cluster and hybridization intensity. Out of 369 novel EST clusters not showing any similarity to previously characterized proteins, we focused on 250 EST clusters that were represented by robust probe sets on the MuscleChip fulfilling all stringent rules. 102 (41%) were found to be consistently "present" by analysis of hybridization to human muscle RNA, of which 40 ESTs (39%) could be genome anchored to potential transcription units in the human genome sequence. 19 ESTs of the 40 ESTs were furthermore computer-predicted as exons by one or more than three gene identification algorithms.
Our analysis found 40 transcriptionally validated, genome-anchored novel EST clusters to be expressed in human muscle. As most of these ESTs were low copy clusters (duplex and triplex) in the original 28,000 EST project, the identification of these as significantly expressed is a robust validation of the transcript units that permits subsequent focus on the novel proteins encoded by these genes.
PMCID: PMC137597  PMID: 12456269
Expression profiling; oligonucleotide microarrays; Affymetrix; muscle; EST
2.  Assessment of the relationship between pre-chip and post-chip quality measures for Affymetrix GeneChip expression data 
BMC Bioinformatics  2006;7:211.
Gene expression microarray experiments are expensive to conduct and guidelines for acceptable quality control at intermediate steps before and after the samples are hybridised to chips are vague. We conducted an experiment hybridising RNA from human brain to 117 U133A Affymetrix GeneChips and used these data to explore the relationship between 4 pre-chip variables and 22 post-chip outcomes and quality control measures.
We found that the pre-chip variables were significantly correlated with each other but that this correlation was strongest between measures of RNA quality and cRNA yield. Post-mortem interval was negatively correlated with these variables. Four principal components, reflecting array outliers, array adjustment, hybridisation noise and RNA integrity, explain about 75% of the total post-chip measure variability. Two significant canonical correlations existed between the pre-chip and post-chip variables, derived from MAS 5.0, dChip and the Bioconductor packages affy and affyPLM. The strongest (CANCOR 0.838, p < 0.0001) correlated RNA integrity and yield with post chip quality control (QC) measures indexing 3'/5' RNA ratios, bias or scaling of the chip and scaling of the variability of the signal across the chip. Post-mortem interval was relatively unimportant. We also found that the RNA integrity number (RIN) could be moderately well predicted by post-chip measures B_ACTIN35, GAPDH35 and SF.
We have found that the post-chip variables having the strongest association with quantities measurable before hybridisation are those reflecting RNA integrity. Other aspects of quality, such as noise measures (reflecting the execution of the assay) or measures reflecting data quality (outlier status and array adjustment variables) are not well predicted by the variables we were able to determine ahead of time. There could be other variables measurable pre-hybridisation which may be better associated with expression data quality measures. Uncovering such connections could create savings on costly microarray experiments by eliminating poor samples before hybridisation.
PMCID: PMC1524996  PMID: 16623940
3.  Quality control in microarray assessment of gene expression in human airway epithelium 
BMC Genomics  2009;10:493.
Microarray technology provides a powerful tool for defining gene expression profiles of airway epithelium that lend insight into the pathogenesis of human airway disorders. The focus of this study was to establish rigorous quality control parameters to ensure that microarray assessment of the airway epithelium is not confounded by experimental artifact. Samples (total n = 223) of trachea, large and small airway epithelium were collected by fiberoptic bronchoscopy of 144 individuals and hybridized to Affymetrix microarrays. The pre- and post-chip quality control (QC) criteria established were: (1) RNA quality, assessed by RNA Integrity Number (RIN) ≥ 7.0; (2) cRNA transcript integrity, assessed by signal intensity ratio of GAPDH 3' to 5' probe sets ≤ 3.0; and (3) the multi-chip normalization scaling factor ≤ 10.0.
Of the 223 samples, all three criteria were assessed in 191; of these 184 (96.3%) passed all three criteria. For the remaining 32 samples, the RIN was not available, and only the other two criteria were used; of these 29 (90.6%) passed these two criteria. Correlation coefficients for pairwise comparisons of expression levels for 100 maintenance genes in which at least one array failed the QC criteria (average Pearson r = 0.90 ± 0.04) were significantly lower (p < 0.0001) than correlation coefficients for pairwise comparisons between arrays that passed the QC criteria (average Pearson r = 0.97 ± 0.01). Inter-array variability was significantly decreased (p < 0.0001) among samples passing the QC criteria compared with samples failing the QC criteria.
Based on the aberrant maintenance gene data generated from samples failing the established QC criteria, we propose that the QC criteria outlined in this study can accurately distinguish high quality from low quality data, and can be used to delete poor quality microarray samples before proceeding to higher-order biological analyses and interpretation.
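The three QC thresholds quoted in this abstract can be expressed as a simple filter. The following is a minimal sketch only: the class, field, and function names are hypothetical, and the handling of a missing RIN (fall back to the two remaining criteria) is an assumption based on how the abstract describes the 32 samples without a RIN.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds as quoted in the abstract.
MIN_RIN = 7.0
MAX_GAPDH_3_5_RATIO = 3.0
MAX_SCALING_FACTOR = 10.0


@dataclass
class ArrayQC:
    """QC measurements for one array (hypothetical container)."""
    sample_id: str
    rin: Optional[float]       # None when the RIN was not measured
    gapdh_3_5_ratio: float     # GAPDH 3'/5' probe set signal ratio
    scaling_factor: float      # multi-chip normalization scaling factor


def passes_qc(qc: ArrayQC) -> bool:
    """Return True if the array meets every criterion available for it."""
    if qc.rin is not None and qc.rin < MIN_RIN:
        return False
    return (qc.gapdh_3_5_ratio <= MAX_GAPDH_3_5_RATIO
            and qc.scaling_factor <= MAX_SCALING_FACTOR)


def pass_rate(n_passed: int, n_assessed: int) -> float:
    """Percentage of assessed arrays that passed, to one decimal place."""
    return round(100.0 * n_passed / n_assessed, 1)
```

With the counts reported above, `pass_rate(184, 191)` and `pass_rate(29, 32)` reproduce the 96.3% and 90.6% figures.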
PMCID: PMC2774870  PMID: 19852842
4.  Comparison of Affymetrix Gene Array with the Exon Array shows potential application for detection of transcript isoform variation 
BMC Genomics  2009;10:519.
The emergence of isoform-sensitive microarrays has helped fuel in-depth studies of the human transcriptome. The Affymetrix GeneChip Human Exon 1.0 ST Array (Exon Array) has been previously shown to be effective in profiling gene expression at the isoform level. More recently, the Affymetrix GeneChip Human Gene 1.0 ST Array (Gene Array) has been released for measuring gene expression and interestingly contains a large subset of probes from the Exon Array. Here, we explore the potential of using Gene Array probes to assess expression variation at the sub-transcript level. Utilizing datasets of the high quality Microarray Quality Control (MAQC) RNA samples previously assayed on the Exon Array and Gene Array, we compare the expression measurements of the two platforms to determine the performance of the Gene Array in detecting isoform variations.
Overall, we show that the Gene Array is comparable to the Exon Array in making gene expression calls. Moreover, to examine expression of different isoforms, we modify the Gene Array probe set definition file to enable summarization of probe intensity values at the exon level and show that the expression profiles between the two platforms are also highly correlated. Next, expression calls of previously known differentially spliced genes were compared and also show concordant results. Splicing index analysis, representing estimates of exon inclusion levels, shows a lower but good correlation between platforms. As the Gene Array contains a significant subset of probes from the Exon Array, we note that, in comparison, the Gene Array overlaps with fewer but still a high proportion of splicing events annotated in the Known Alt Events UCSC track, with abundant coverage of cassette exons. We discuss the ability of the Gene Array to detect alternative splicing and isoform variation and address its limitations.
The Gene Array is an effective expression profiling tool at gene and exon expression level, the latter made possible by probe set annotation modifications. We demonstrate that the Gene Array is capable of detecting alternative splicing and isoform variation. As expected, in comparison to the Exon Array, it is limited by reduced gene content coverage and is not able to detect as wide a range of alternative splicing events. However, for the events that can be monitored by both platforms, we estimate that the selectivity and sensitivity levels are comparable. We hope our findings will shed light on the potential extension of the Gene Array to detect alternative splicing. It should be particularly suitable for researchers primarily interested in gene expression analysis, but who may be willing to look for splicing and isoform differences within their dataset. However, we do not suggest it to be an equivalent substitute to the more comprehensive Exon Array.
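The abstract does not state how the splicing index was computed. A commonly used definition normalizes each exon-level signal by its gene-level signal and takes the log2 ratio between two samples; the sketch below assumes that definition, and the function names are illustrative rather than taken from the paper.

```python
import math


def normalized_intensity(exon_signal: float, gene_signal: float) -> float:
    """Exon-level signal relative to the gene-level signal of its gene."""
    return exon_signal / gene_signal


def splicing_index(exon_a: float, gene_a: float,
                   exon_b: float, gene_b: float) -> float:
    """log2 ratio of gene-normalized exon intensities, sample A vs sample B.

    Values near 0 suggest similar exon inclusion in both samples; large
    positive or negative values suggest differential inclusion.
    """
    ni_a = normalized_intensity(exon_a, gene_a)
    ni_b = normalized_intensity(exon_b, gene_b)
    return math.log2(ni_a / ni_b)
```

For example, an exon at 200 units in a gene at 1000 units in sample A, against 400 units in a gene at 1000 units in sample B, gives a splicing index of -1, i.e. the exon is included at roughly half the relative level in A.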
PMCID: PMC2780461  PMID: 19909511
5.  Transcript profiling of common bean (Phaseolus vulgaris L.) using the GeneChip® Soybean Genome Array: optimizing analysis by masking biased probes 
BMC Plant Biology  2010;10:85.
Common bean (Phaseolus vulgaris L.) and soybean (Glycine max) both belong to the Phaseoleae tribe and share significant coding sequence homology. This suggests that the GeneChip® Soybean Genome Array (soybean GeneChip) may be used for gene expression studies using common bean.
To evaluate the utility of the soybean GeneChip for transcript profiling of common bean, we hybridized cRNAs purified from nodule, leaf, and root of common bean and soybean in triplicate to the soybean GeneChip. Initial data analysis showed a decreased sensitivity and accuracy of measuring differential gene expression in common bean cross-species hybridization (CSH) GeneChip data compared to that of soybean. We employed a method that masked putative probes targeting inter-species variable (ISV) regions between common bean and soybean. A masking signal intensity threshold was selected that optimized both sensitivity and accuracy of measuring differential gene expression. After masking for ISV regions, the number of differentially-expressed genes identified in common bean was increased by 2.8-fold reflecting increased sensitivity. Quantitative RT-PCR (qRT-PCR) analysis of 20 randomly selected genes and purine-ureide pathway genes demonstrated an increased accuracy of measuring differential gene expression after masking for ISV regions. We also evaluated masked probe frequency per probe set to gain insight into the sequence divergence pattern between common bean and soybean. The sequence divergence pattern analysis suggested that the genes for basic cellular functions and metabolism were highly conserved between soybean and common bean. Additionally, our results show that some classes of genes, particularly those associated with environmental adaptation, are highly divergent.
The soybean GeneChip is a suitable cross-species platform for transcript profiling in common bean when used in combination with the masking protocol described. In addition to transcript profiling, CSH of the GeneChip in combination with masking probes in the ISV regions can be used for comparative ecological and/or evolutionary genomics studies.
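The masking step described above can be illustrated as follows. This is a sketch under stated assumptions, not the authors' protocol: it assumes probes are masked when their cross-species hybridization signal falls below a chosen intensity threshold, that the remaining probes are summarized as a mean of log2 intensities, and that a probe set is dropped when too few probes survive. The function names and the `min_probes` parameter are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Optional


def mask_probes(probe_intensities: Dict[str, float],
                threshold: float) -> Dict[str, float]:
    """Keep only probes whose signal reaches the masking threshold."""
    return {pid: v for pid, v in probe_intensities.items() if v >= threshold}


def summarize_probe_set(probe_intensities: Dict[str, float],
                        threshold: float,
                        min_probes: int = 3) -> Optional[float]:
    """Mean log2 intensity of unmasked probes, or None if too few remain."""
    kept = mask_probes(probe_intensities, threshold)
    if len(kept) < min_probes:
        return None
    return mean([math.log2(v) for v in kept.values()])
```

Raising the threshold removes more probes that likely target inter-species variable regions but leaves fewer probes per probe set, which is the trade-off the abstract refers to when it describes selecting a threshold that balances sensitivity and accuracy.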
PMCID: PMC3017814  PMID: 20459672
6.  The effect of oligonucleotide microarray data pre-processing on the analysis of patient-cohort studies 
BMC Bioinformatics  2006;7:105.
Intensity values measured by Affymetrix microarrays have to be both normalized, to be able to compare different microarrays by removing non-biological variation, and summarized, generating the final probe set expression values. Various pre-processing techniques, such as dChip, GCRMA, RMA and MAS have been developed for this purpose. This study assesses the effect of applying different pre-processing methods on the results of analyses of large Affymetrix datasets. By focusing on practical applications of microarray-based research, this study provides insight into the relevance of pre-processing procedures to biology-oriented researchers.
Using two publicly available datasets, i.e., gene-expression data of 285 patients with Acute Myeloid Leukemia (AML, Affymetrix HG-U133A GeneChip) and 42 samples of tumor tissue of the embryonal central nervous system (CNS, Affymetrix HuGeneFL GeneChip), we tested the effect of the four pre-processing strategies mentioned above, on (1) expression level measurements, (2) detection of differential expression, (3) cluster analysis and (4) classification of samples. In most cases, the effect of pre-processing is relatively small compared to other choices made in an analysis for the AML dataset, but has a more profound effect on the outcome of the CNS dataset. Analyses on individual probe sets, such as testing for differential expression, are affected most; supervised, multivariate analyses such as classification are far less sensitive to pre-processing.
Using two experimental datasets, we show that the choice of pre-processing method is of relatively minor influence on the final analysis outcome of large microarray studies whereas it can have important effects on the results of a smaller study. The data source (platform, tissue homogeneity, RNA quality) is potentially of bigger importance than the choice of pre-processing method.
PMCID: PMC1481623  PMID: 16512908
7.  Intratumor Heterogeneity and Precision of Microarray-Based Predictors of Breast Cancer Biology and Clinical Outcome 
Journal of Clinical Oncology  2010;28(13):2198-2206.
Identifying sources of variation in expression microarray data and the effect of variance in gene expression measurements on complex predictive and diagnostic models is essential when translating microarray-based experimental approaches into clinical assays. The technical reproducibility of microarray platforms is well established. Here, we investigate the additional impact of intratumor heterogeneity, a largely unstudied component of variance, on the performance of several microarray-based assays in breast cancer.
Patients and Methods
Genome-wide expression profiling was performed on 50 core needle biopsies from 18 breast cancer patients using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays. Global profiles of expression were characterized using unsupervised clustering methods and variance components models. Array-based measures of estrogen receptor (ER) and progesterone receptor (PR) status were compared with immunohistochemistry. The precision of genomic predictors of ER pathway status, recurrence risk, and sensitivity to chemotherapeutics was evaluated by intraclass correlation.
Global patterns of gene expression demonstrated that intratumor variation was substantially less than the total variation observed across the patient population. Nevertheless, a fraction of genes exhibited significant intratumor heterogeneity in expression. A high degree of reproducibility was observed in single-gene predictors of ER (intraclass correlation coefficient [ICC] = 0.94) and PR expression (ICC = 0.90), and in a multigene predictor of ER pathway activation (ICC = 0.98) with high concordance with immunohistochemistry. Substantial agreement was also observed for multigene signatures of cancer recurrence (ICC = 0.71) and chemotherapeutic sensitivity (ICC = 0.72 and 0.64).
Intratumor heterogeneity, although present at the level of individual gene expression, does not preclude precise microarray-based predictions of tumor behavior or clinical outcome in breast cancer patients.
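For readers unfamiliar with the intraclass correlation coefficient, the sketch below computes a one-way random-effects ICC from replicate measurements per patient. It assumes a balanced design (the same number of biopsies for every patient), which the study itself did not have (50 biopsies from 18 patients), so it illustrates the statistic rather than reproducing the paper's analysis. The function name is hypothetical.

```python
from statistics import mean
from typing import List


def icc_oneway(groups: List[List[float]]) -> float:
    """One-way random-effects ICC for a balanced design.

    Each inner list holds the replicate measurements (e.g. a predictor
    score from each biopsy) for one patient.
    """
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("balanced design with at least 2 replicates required")
    n = len(groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between-patient and within-patient mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means)
                    for x in g) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

An ICC close to 1 means that almost all of the variance lies between patients rather than between biopsies of the same tumor, which is the sense in which the single-gene and multigene predictors above are described as reproducible.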
PMCID: PMC2860437  PMID: 20368555
8.  Use of bioanalyzer electropherograms for quality control and target evaluation in microarray expression profiling studies of ocular tissues 
Expression profiling with DNA microarrays has been used to examine the transcriptome of a wide spectrum of vertebrate cells and tissues. The sensitivity and accuracy of the data generated is dependent on the quality and composition of the input RNA. In this report, we examine the quality and array performance of over 200 total RNA samples extracted from ocular tissues and cells that have been processed in a microarray core laboratory over a 7-year period. Total RNA integrity and cRNA target size distribution were assessed using the 2100 Bioanalyzer. We present Affymetrix GeneChip array performance metrics for different ocular samples processed according to a standard microarray assay workflow including several quality control checkpoints. Our review of ocular sample performance in the microarray assay demonstrates the value of considering tissue-specific characteristics in evaluating array data. Specifically, we show that Bioanalyzer electropherograms reveal highly abundant mRNAs in lacrimal gland targets that are correlated with variation in array assay performance. Our results provide useful benchmarks for other gene expression studies of ocular systems.
PMCID: PMC2816811  PMID: 20157354
DNA microarrays; Gene expression; mRNA; RNA quality; Capillary electrophoresis; Lacrimal gland
10.  Expression Profiling Smackdown: Human Transcriptome Array HTA 2.0 vs. RNA-Seq 
The advent of both microarrays and massively parallel sequencing has revolutionized high-throughput analysis of the human transcriptome. Due to limitations in microarray technology, detecting and quantifying coding transcript isoforms, in addition to non-coding transcripts, has been challenging. As a result, RNA-Seq has been the preferred method for characterizing the full human transcriptome, until now. A new high-resolution array from Affymetrix, GeneChip Human Transcriptome Array 2.0 (HTA 2.0), has been designed to interrogate all transcript isoforms in the human transcriptome with >6 million probes targeting coding transcripts, exon-exon splice junctions, and non-coding transcripts. Here we compare expression results from GeneChip HTA 2.0 and RNA-Seq data using identical RNA extractions from three samples each of healthy human mesothelial cells in culture, LP9-C1, and healthy mesothelial cells treated with asbestos, LP9-A1. For GeneChip HTA 2.0 sample preparation, we chose to compare two target preparation methods, NuGEN Ovation Pico WTA V2 with the Encore Biotin Module versus Affymetrix's GeneChip WT PLUS with the WT Terminal Labeling Kit, on identical RNA extractions from both untreated and treated samples. These same RNA extractions were used for the RNA-Seq library preparation. All analyses were performed in Partek Genomics Suite 6.6. Expression profiles for control and asbestos-treated mesothelial cells prepared with NuGEN versus Affymetrix target preparation methods (GeneChip HTA 2.0) are compared to each other as well as to RNA-Seq results.
PMCID: PMC4162206
11.  Reproducibility of oligonucleotide arrays using small samples 
BMC Genomics  2003;4:4.
Low RNA yields from small tissue samples can limit the use of oligonucleotide microarrays (Affymetrix GeneChips®). Methods using less cRNA for hybridization or amplifying the cRNA have been reported to reduce the number of transcripts detected, but the effect on realistic experiments designed to detect biological differences has not been analyzed. We systematically explore the effects of using different starting amounts of RNA on the ability to detect differential gene expression.
The standard Affymetrix protocol can be used starting with only 2 micrograms of total RNA, with results equivalent to the recommended 10 micrograms. Biological variability is much greater than the technical variability introduced by this change. A simple amplification protocol described here can be used for samples as small as 0.1 micrograms of total RNA. This amplification protocol allows detection of a substantial fraction of the significant differences found using the standard protocol, despite an increase in variability and the 5' truncation of the transcripts, which prevents detection of a subset of genes.
Biological differences in a typical experiment are much greater than differences resulting from technical manipulations in labeling and hybridization. The standard protocol works well with 2 micrograms of RNA, and with minor modifications could allow the use of samples as small as 1 micrograms. For smaller amounts of starting material, down to 0.1 micrograms RNA, differential gene expression can still be detected using the single cycle amplification protocol. Comparisons of groups of four arrays detect many more significant differences than comparisons of three arrays.
PMCID: PMC150597  PMID: 12594857
12.  The PEPR GeneChip data warehouse, and implementation of a dynamic time series query tool (SGQT) with graphical interface 
Nucleic Acids Research  2004;32(Database issue):D578-D581.
Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic research (see, with integrated genomic DNA, gene structure, EST/splicing and cross-species ortholog data. DNA databases have relatively low dimensionality; the genome is a linear code that anchors all associated data. In contrast, RNA expression and protein databases need to be able to handle very high dimensional data, with time, tissue, cell type and genes as interrelated variables. The high dimensionality of microarray expression profile data and the lack of a standard experimental platform have complicated the development of web-accessible databases and analytical tools. We have designed and implemented a public resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip expression profiles, generated in the same laboratory, and subject to the same quality and procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation of graphs and spreadsheets showing the action of any transcript of interest over time. In this report, we demonstrate the utility of this tool using a 27 time point, in vivo muscle regeneration series. This data warehouse and its associated analysis tools provide access to multidimensional microarray data through web-based interfaces, both for download of all types of raw data for independent analysis, and also for straightforward gene-based queries. Planned implementations of PEPR will include web-based remote entry of projects adhering to quality control and standard operating procedure (QC/SOP) criteria, and automated output of alternative probe set algorithms for each project (see
PMCID: PMC308738  PMID: 14681485
13.  Computational method for reducing variance with Affymetrix microarrays 
BMC Bioinformatics  2002;3:23.
Affymetrix microarrays are used by many laboratories to generate gene expression profiles. Generally, only large differences (> 1.7-fold) between conditions have been reported. Computational methods to reduce inter-array variability might be of value when attempting to detect smaller differences. We examined whether inter-array variability could be reduced by using data based on the Affymetrix algorithm for pairwise comparisons between arrays (ratio method) rather than data based on the algorithm for analysis of individual arrays (signal method). Six HG-U95A arrays that probed mRNA from young (21–31 yr old) human muscle were compared with six arrays that probed mRNA from older (62–77 yr old) muscle.
Differences in mean expression levels of young and old subjects were small, rarely > 1.5-fold. The mean within-group coefficient of variation for 4629 mRNAs expressed in muscle was 20% according to the ratio method and 25% according to the signal method. The ratio method yielded more differences according to t-tests (124 vs. 98 differences at P < 0.01), rank sum tests (107 vs. 85 differences at P < 0.01), and the Significance Analysis of Microarrays method (124 vs. 56 differences with false detection rate < 20%; 20 vs. 0 differences with false detection rate < 5%). The ratio method also improved consistency between results of the initial scan and results of the antibody-enhanced scan.
The ratio method reduces inter-array variance and thereby enhances statistical power.
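The within-group coefficient of variation quoted above (20% for the ratio method versus 25% for the signal method) is the standard deviation divided by the mean, averaged over transcripts. The sketch below shows that calculation under the assumption that the sample standard deviation is used; the abstract does not say which estimator was applied, and the function names are illustrative.

```python
from statistics import mean, stdev
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


def mean_within_group_cv(expression_by_gene: Dict[str, List[float]]) -> float:
    """Average the per-gene CV across all genes within one group of arrays."""
    return mean([coefficient_of_variation(v)
                 for v in expression_by_gene.values()])
```

A lower mean CV within each age group means less inter-array scatter around the group mean, which is why the same fold difference between young and old muscle is easier to detect with the ratio method.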
PMCID: PMC126253  PMID: 12204100
14.  What Your Blood Has to Say: Amplifying Blood RNA for the Affymetrix GeneChip® Platform 
Gene expression measurements from human blood RNA have become an increasingly important area of research. A reliable and consistent workflow to obtain RNA measurements from blood would serve to open the clinical arena to gene expression studies as potential diagnostic indicators. The challenge lies in isolating high quality RNA from human whole blood at room temperature without compromising gene expression profiles. To address this challenge, we report a workflow using newly developed MagMax™ RNA isolation kits for Tempus™ stabilized blood and PAXgene® stabilized blood. These two stabilization methods are currently the most widely used methods for blood collection and stabilization in clinics in the United States. High quality RNA generated from these two RNA isolation processes was amplified using the Ambion MessageAmp™ Premier RNA Amplification Kit and hybridized to Affymetrix GeneChip® microarrays. This platform and these experimental tools are used in combination to demonstrate high quality gene expression profiles from human blood RNA. The reported experimental design includes human whole blood obtained from two donors, collected and stabilized in either Tempus™ or PAXgene® tubes. RNA was isolated using new MagMax™ RNA isolation kits for Tempus™ and PAXgene® stabilized blood. Isolated total RNA was then processed through the GLOBINclear™-Human kit and amplified with the MessageAmp™ Premier kit, creating a library for hybridization to the Affymetrix Human U133A 2.0 expression arrays. Affymetrix GeneChip® microarray analysis showed parameters within normal limits for expressed genes. The reported results of this controlled experiment support a validated gene expression workflow for blood on Affymetrix expression arrays using a combination of commercial kits for RNA purification from stabilized blood and library amplification optimized for hybridization and analysis on expression microarrays.
PMCID: PMC3186520
15.  Identification and utilization of inter-species conserved (ISC) probesets on Affymetrix human GeneChip® platforms for the optimization of the assessment of expression patterns in non human primate (NHP) samples 
BMC Bioinformatics  2004;5:165.
While researchers have utilized versions of the Affymetrix human GeneChip® for the assessment of expression patterns in non human primate (NHP) samples, there has been no comprehensive sequence analysis study undertaken to demonstrate that the probe sequences designed to detect human transcripts are reliably hybridizing with their orthologs in NHP. By aligning probe sequences with expressed sequence tags (ESTs) in NHP, inter-species conserved (ISC) probesets, which have two or more probes complementary to ESTs in NHP, were identified on human GeneChip® platforms. The utility of human GeneChips® for the assessment of NHP expression patterns can be effectively evaluated by analyzing the hybridization behaviour of ISC probesets. Appropriate normalization methods were identified that further improve the reliability of human GeneChips® for interspecies (human vs NHP) comparisons.
ISC probesets in each of the seven Affymetrix GeneChip® platforms (U133Plus2.0, U133A, U133B, U95Av2, U95B, Focus and HuGeneFL) were identified for both monkey and chimpanzee. Expression data was generated from peripheral blood mononuclear cells (PBMCs) of 12 human and 8 monkey (Indian origin Rhesus macaque) samples using the Focus GeneChip®. Analysis of both qualitative detection calls and quantitative signal intensities showed that intra-species reproducibility (human vs. human or monkey vs. monkey) was much higher than interspecies reproducibility (human vs. monkey). ISC probesets exhibited higher interspecies reproducibility than the overall expressed probesets. Importantly, appropriate normalization methods could be leveraged to greatly improve interspecies correlations. The correlation coefficients between human (average of 12 samples) and monkey (average of 8 Rhesus macaque samples) are 0.725, 0.821 and 0.893 for MAS5.0 (Microarray Suite version 5.0), dChip and RMA (Robust Multi-chip Average) normalization method, respectively.
It is feasible to use Affymetrix human GeneChip® platforms to assess the expression profiles of NHP for intra-species studies. Caution must be taken in interspecies studies, since unsuitable probesets will result in spurious differentially regulated genes between human and NHP. The RMA normalization method and ISC probesets are recommended for interspecies studies.
PMCID: PMC526766  PMID: 15507140
16.  Cross-species analysis of gene expression in non-model mammals: reproducibility of hybridization on high density oligonucleotide microarrays 
BMC Genomics  2007;8:89.
Gene expression profiles of non-model mammals may provide valuable data for biomedical and evolutionary studies. However, due to lack of sequence information of other species, DNA microarrays are currently restricted to humans and a few model species. This limitation may be overcome by using arrays developed for a given species to analyse gene expression in a related one, an approach known as "cross-species analysis". In spite of its potential usefulness, the accuracy and reproducibility of the gene expression measures obtained in this way are still open to doubt. The present study examines whether or not hybridization values from cross-species analyses are as reproducible as those from same-species analyses when using Affymetrix oligonucleotide microarrays.
The reproducibility of the probe data obtained by hybridizing deer, Old-World primate, and human RNA samples to the Affymetrix human GeneChip® U133 Plus 2.0 was compared. The results show that cross-species hybridization affected neither the distribution of the hybridization reproducibility among different categories nor the reproducibility values of the individual probes. Our analyses also show that 0.5% of the probes analysed in the U133 Plus 2.0 GeneChip are significantly associated with un-reproducible hybridizations. Such probes (called un-reproducible probe sequences in the text) do not increase in number in cross-species analyses.
Our study demonstrates that cross-species analyses do not significantly affect hybridization reproducibility of GeneChips, at least within the range of the mammal species analysed here. The differences in reproducibility between same-species and cross-species analyses observed in previous studies were probably caused by the analytical methods used to calculate the gene expression measures. Together with previous observations on the accuracy of GeneChips for cross-species analysis, our analyses demonstrate that cross-species hybridizations may provide useful gene expression data. However, the reproducibility and accuracy of these measures largely depend on the use of appropriate algorithms to derive the gene expression data from the probe data. Also, the identification in the studied GeneChip of probes associated with un-reproducible hybridizations, which are useless for gene expression analyses, stresses the need for a re-evaluation of the probes' performance.
PMCID: PMC1853087  PMID: 17407579
17.  Relative impact of key sources of systematic noise in Affymetrix and Illumina gene-expression microarray experiments 
BMC Genomics  2011;12:589.
Systematic processing noise, which includes batch effects, is very common in microarray experiments but is often ignored despite its potential to confound or compromise experimental results. Compromised results are most likely when re-analysing or integrating datasets from public repositories due to the different conditions under which each dataset is generated. To better understand the relative noise contributions of various factors in experimental design, we assessed several Illumina and Affymetrix datasets for technical variation between replicate hybridisations of Universal Human Reference RNA (UHRR) and individual or pooled breast-tumour RNA.
A varying degree of systematic noise was observed in each of the datasets; however, in all cases the relative amount of variation between standard control RNA replicates was found to be greatest at earlier points in the sample-preparation workflow. For example, 40.6% of the total variation in reported expression was attributed to replicate extractions, compared to 13.9% due to amplification/labelling and 10.8% between replicate hybridisations. Deliberate probe-wise batch-correction methods were effective in reducing the magnitude of this variation, although the level of improvement was dependent on the sources of noise included in the model. Systematic noise introduced at the chip, run, and experiment levels of a combined Illumina dataset was found to be highly dependent upon the experimental design. Both UHRR and pools of RNA, which were derived from the samples of interest, modelled technical variation well, although the pools were significantly better correlated (4% average improvement) and better emulated the effects of systematic noise, over all probes, than the UHRRs. The effect of this noise was not uniform over all probes, with low GC-content probes found to be more vulnerable to batch variation than probes with a higher GC-content.
The magnitude of systematic processing noise in a microarray experiment is variable across probes and experiments; however, it is generally the case that procedures earlier in the sample-preparation workflow are liable to introduce the most noise. Careful experimental design is important to protect against noise, detailed meta-data should always be provided, and diagnostic procedures should be routinely performed prior to downstream analyses for the detection of bias in microarray studies.
PMCID: PMC3269440  PMID: 22133085
18.  In vitro identification and in silico utilization of interspecies sequence similarities using GeneChip® technology 
BMC Genomics  2005;6:62.
Genomic approaches in large animal models (canine, ovine, etc.) are challenging due to insufficient genomic information for these species and the lack of corresponding microarray platforms. To address this problem, we speculated that conserved interspecies genetic sequences can be experimentally detected by cross-species hybridization. The Affymetrix platform probe redundancy offers flexibility in selecting individual probes with high sequence similarities between related species for gene expression analysis.
Gene expression profiles of 40 canine samples were generated using the human HG-U133A GeneChip (U133A). Due to interspecies genetic differences, only 14 ± 2% of canine transcripts were detected by U133A probe sets whereas profiling of 40 human samples detected 49 ± 6% of human transcripts. However, when these probe sets were deconstructed into individual probes and the performance of each probe was examined, we found that 47% of human probes were able to find their targets in canine tissues and generate a detectable hybridization signal. Therefore, we restricted gene expression analysis to these probes and observed a 60% increase in the number of identified canine transcripts. These results were validated by comparison of transcripts identified by our restricted analysis of cross-species hybridization with transcripts identified by hybridization of total canine lung mRNA to the new Affymetrix Canine GeneChip®.
The experimental identification and restriction of gene expression analysis to probes with detectable hybridization signal drastically increases transcript detection in canine-human hybridization, suggesting the possibility of broad utilization of cross-hybridizations of related species using GeneChip technology.
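The probe-level restriction described in this abstract can be illustrated with a short sketch. The abstract does not state the detection criterion, so the rule used here (a probe is kept when its intensity exceeds a background value in at least a given fraction of samples) and the median summary are assumptions, as are all names and numbers.

```python
from statistics import median

def detectable_probes(probe_signals, background, min_fraction=0.5):
    """Return IDs of probes whose signal exceeds `background` in at least
    `min_fraction` of the samples (an assumed, illustrative criterion).

    `probe_signals` maps probe ID -> list of intensities, one per sample.
    """
    keep = []
    for probe_id, values in probe_signals.items():
        fraction = sum(v > background for v in values) / len(values)
        if fraction >= min_fraction:
            keep.append(probe_id)
    return keep

def summarize(probe_signals, probes, sample_index):
    """Summarize one probe set in one sample as the median of the kept probes."""
    return median([probe_signals[p][sample_index] for p in probes])

# Toy intensities for one human probe set hybridized with three canine samples.
signals = {
    "p1": [520, 610, 580],
    "p2": [40, 55, 35],
    "p3": [300, 90, 410],
    "p4": [60, 45, 70],
}

kept = detectable_probes(signals, background=100)
print(kept, summarize(signals, kept, 0))
```

Summarizing from the kept probes only means that probes with no cross-species signal no longer pull the probe-set value toward background.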
PMCID: PMC1156887  PMID: 15871745
19.  puma 3.0: improved uncertainty propagation methods for gene and transcript expression analysis 
BMC Bioinformatics  2013;14:39.
Microarrays have been a popular tool for gene expression profiling at genome-scale for over a decade due to the low cost, short turn-around time, excellent quantitative accuracy and ease of data generation. The Bioconductor package puma incorporates a suite of analysis methods for determining uncertainties from Affymetrix GeneChip data and propagating these uncertainties to downstream analysis. As isoform level expression profiling has received increasing interest within genomics in recent years, exon microarray technology offers an important tool to quantify the expression level of the majority of exons and makes it possible to measure isoform level expression. However, puma does not include methods for the analysis of exon array data. Moreover, the current expression summarisation method for Affymetrix 3’ GeneChip data suffers from instability for low expression genes. For the downstream analysis, the method for differential expression detection is computationally intensive and the original expression clustering method does not consider the variance across the replicated technical and biological measurements. It is therefore necessary to develop improved uncertainty propagation methods for gene and transcript expression analysis.
We extend the previously developed Bioconductor package puma with a new method especially designed for GeneChip Exon arrays and a set of improved downstream approaches. The improvements include: (i) a new gamma model for exon arrays which calculates isoform and gene expression measurements and a level of uncertainty associated with the estimates, using the multi-mappings between probes, isoforms and genes, (ii) a variant of the existing approach for the probe-level analysis of Affymetrix 3’ GeneChip data to produce more stable gene expression estimates, (iii) an improved method for detecting differential expression which is computationally more efficient than the existing approach in the package and (iv) an improved method for robust model-based clustering of gene expression, which takes technical and biological replicate information into consideration.
With the extensions and improvements, the puma package is now applicable to the analysis of both Affymetrix 3’ GeneChips and Exon arrays for gene and isoform expression estimation. It propagates the uncertainty of expression measurements into more efficient and comprehensive downstream analysis at both gene and isoform level. Downstream methods are also applicable to other expression quantification platforms, such as RNA-Seq, when uncertainty information is available from expression measurements. puma is available through Bioconductor and can be found at
PMCID: PMC3626802  PMID: 23379655
20.  Effect of RNA quality on transcript intensity levels in microarray analysis of human post-mortem brain tissues 
BMC Genomics  2008;9:91.
Large-scale gene expression analysis of post-mortem brain tissue offers unique opportunities for investigating genetic mechanisms of psychiatric and neurodegenerative disorders. On the other hand, microarray data analysis associated with these studies is a challenging task. In this publication we address the issue of low RNA quality data and corresponding data analysis strategies.
A detailed analysis of the effects of post-chip RNA quality on the measured abundance of transcripts is presented. Overall, Affymetrix GeneChip data (HG-U133_AB and HG-U133_Plus_2.0) derived from ten different brain regions were investigated. Post-chip RNA quality, assessed by the 5'/3' ratio of housekeeping genes, was found to introduce pronounced systematic noise into the measured transcript expression levels. According to this study, RNA quality effects have: 1) a "random" component which is introduced by the technology and 2) a systematic component which depends on the features of the transcripts and probes. Random components mainly account for numerous negative correlations of low-abundant transcripts. These negative correlations are not reproducible and are mainly introduced by an increased relative level of noise. Three major contributors to the systematic noise component were identified: the first is the probe set distribution, the second is the length of mRNA species, and the third is the stability of mRNA species. Positive correlations reflect the 5'-end to 3'-end direction of mRNA degradation whereas negative correlations result from the compensatory increase in stable and 3'-end probed transcripts. Systematic components affect the expressed transcripts by introducing irrelevant gene correlations and can strongly influence the results of the main experiment. A linear model correcting the effect of RNA quality on measured intensities was introduced.
In addition, the contribution of a number of pre-mortem and post-mortem attributes to the overall detected RNA quality effect was investigated. Brain pH, duration of agonal stage, post-mortem interval before sampling and donor's age at death within the considered limits were found to have no significant contribution.
Basic conclusions for data analysis in expression profiling studies are as follows: 1) testing for RNA quality dependency should be included in the preprocessing of the data; 2) investigating inter-gene correlation without regard to RNA quality effects could be misleading; 3) data normalization procedures relying on housekeeping genes either do not influence the correlation structure (if 3'-end intensities are used) or increase it for negatively correlated transcripts (if 5'-end or median intensities are included in the normalization procedure); 4) sample sets should be matched with regard to RNA quality; 5) RMA preprocessing is more sensitive to the RNA quality effect than MAS 5.0.
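The abstract mentions a linear model that corrects measured intensities for RNA quality but does not give its form. The sketch below therefore assumes the simplest possible version: for one transcript, log intensity is regressed on the per-sample 5'/3' ratio by ordinary least squares, and the fitted quality trend is removed while the value at the mean ratio is preserved. All names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def correct_for_quality(ratio, intensity):
    """Remove the linear dependence of `intensity` on the 5'/3' `ratio`.

    Each value is shifted to what the fit predicts at the mean ratio,
    so the overall expression level of the transcript is retained.
    """
    slope, _ = fit_line(ratio, intensity)
    mean_ratio = sum(ratio) / len(ratio)
    return [y - slope * (r - mean_ratio) for r, y in zip(ratio, intensity)]

# Toy data: one transcript measured in four samples of varying RNA quality.
ratio = [1.0, 2.0, 3.0, 4.0]          # 5'/3' ratio per sample (illustrative)
log_intensity = [10.0, 12.0, 14.0, 16.0]

corrected = correct_for_quality(ratio, log_intensity)
print(corrected)
```

Testing whether the fitted slope differs from zero for each transcript is one way to carry out the RNA quality dependency check listed as conclusion 1.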
PMCID: PMC2268927  PMID: 18298816
21.  Comparative microRNA Profiling by TaqMan Low Density Arrays and Affymetrix GeneChip miRNA 2.0 Arrays: An Evaluation of Platform Performance 
Genome-wide miRNA level profiling platforms can be applied to measure aberrant post-transcriptional gene regulation in human diseases including cancer. To determine the relative miRNA abundance in paired total RNA samples from tumors and normal tissues from patients with aristolochic acid nephropathy (AAN) with upper urinary tract urothelial carcinomas (UUCs), we performed a systematic performance comparison between two prominent platforms: 1) the high-capacity quantitative PCR using Applied Biosystems megaplex RT primer pools for 754 human miRNAs from the miRBase v14 contents, analyzed by microfluidics TaqMan Low Density Arrays (TLDA), and 2) the semiquantitative Affymetrix GeneChip miRNA 2.0 arrays that cover 100% of miRBase v15 contents, with multi-species coverage corresponding to miRNAs of 131 organisms detected by 15,644 probe sets. Systematic data analysis of the TLDA results revealed a signature of 50 miRNAs differentially modulated (19 elevated and 31 reduced) in tumors versus unaffected tissues, detected at a reasonable Pearson correlation by both profiling platforms, while there were also platform-specific discrepancies, especially when relative levels and the rank of all human miRNAs were compared. We conclude that for miRNA abundance profiling studies, the performance of these platforms is largely comparable, and we also address the possible sources of the observed discrepancies. Next, we discuss the utility of the additional information provided by the broader Affymetrix array content, such as the value of cross-reactivity among orthologous probes for non-human species, the value of additional probes for snoRNAs and scaRNAs and, most importantly, the extensive sets unique to pre-miRNA hairpins. Our study can serve as a useful starting point for specific platform recommendations offered to clients by a genomics shared resource laboratory.
PMCID: PMC3630612
22.  The Mayo Clinic Advanced Genomic Technology Center Microarray Shared Resource 
The Mayo Clinic Advanced Genomic Technology Center Microarray Shared Resource provides a full range of services from RNA quality assessment to whole genome transcript measurement for researchers both inside and outside of the Mayo Health System. Created in 2000, the facility offers technical services and support for both basic and clinical research programs for more than 80 principal investigators annually. Specializing in high-density microarray and real-time PCR-based analyses, the lab provides instrumentation and technical expertise to perform genomic studies on a wide variety of sample types, including, but not limited to, RNA samples derived from fresh frozen tissues, formalin-fixed paraffin-embedded tissues, Laser Capture Microdissected cells, and cultured cell lines. Instrument platforms present in the Microarray Shared Resource include Affymetrix GeneChip™, Illumina BeadChip™, Applied Biosystems 7900 HT Sequence Detection Systems, Fluidigm Biomark™ and Agilent 2100 Bioanalyzers. Affymetrix GeneChip™ analyses are performed routinely on all human, mouse, and rat whole transcriptome arrays, as well as Gene 1.0, Exon, Tiling, and miRNA GeneChip™ products. In addition, Illumina BeadChip™ analyses are conducted with human and mouse Whole Genome 6, HT-12, Ref8, and Whole Genome DASL assays. As genomic technologies continue to emerge and evolve, the Microarray Shared Resource also seeks to expand its capabilities to include gene expression profiling in single cells as well as mRNA and miRNA analyses using Fluidigm high throughput chips and Next Generation sequencing technologies.
PMCID: PMC2918225
23.  Microarray analysis of retinal gene expression in chicks during imposed myopic defocus 
Molecular Vision  2008;14:1589-1599.
The retina plays an important regulatory role in ocular growth. To screen for new retinal candidate genes that could be involved in the inhibition of ocular growth, we used chick microarrays to analyze the changes in retinal mRNA expression after myopic defocus was imposed by positive lens wear.
Four male white leghorn chicks, aged nine days, wore +6.9D spectacle lenses over both eyes for 24 h. Four untreated age-matched male chicks from the same batch served as controls. The chicks were euthanized, and retinas from both eyes of each chick were pooled. RNA was isolated and labeled cRNA was prepared. These samples were hybridized to Affymetrix GeneChip Chicken Genome arrays with more than 28,000 characterized genes. After comparison of multiple normalization methods, GC-RMA and a false-discovery rate of 6% were chosen for normalization of the data. The expression of 16 candidate genes was further studied, using semiquantitative real-time RT–PCR. In addition, the expression of the mRNA of some of these candidate genes was assessed in chicks that wore either +6.9D lenses for 4 h or −7D lenses for 24 h.
123 transcripts were found to be differentially expressed (p<0.05; at least 1.5-fold change in expression level), with an absolute mean fold-change of 1.97±1.16 (mean±standard deviation). Nine of the sixteen genes that were examined by real-time RT–PCR were validated. Regardless of whether positive or negative lenses were worn, six of these nine genes were regulated in the same direction after 24 h: arginyltransferase 1 (ATE1), E74-like factor 1 (ELF1), growth factor receptor-bound protein 2 (GRB2), SHQ1 homolog (S. cerevisiae) (SHQ1), spectrin, beta, non-erythrocytic 1 (SPTBN1), and prepro-urotensin II-related peptide (pp-URP). Three genes responded differently to positive and negative lens treatment after 24 h: ATP-binding cassette, sub-family C, member 10 (ABCC10), CD226 molecule (CD226) and oxysterol binding protein 2 (OSBP2).
The validated genes that were regulated only by myopic defocus may represent elements in a pathway generating a “stop-signal” for eye growth. Some of the genes identified in this study have so far not been described in the retina. Further investigation of their function may improve the understanding of the signaling cascades in emmetropization. More generally, published microarray data are variable among different animal models (mouse, chick, monkeys), tissues (retina, retina/retinal pigment epithelium), treatments (diffusers, lenses, lid-suture), as well as different treatment durations (hours, days), and comparisons remain difficult. That only a small number of common genes were found emphasizes the need for careful normalization of the experimental parameters.
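The selection rule quoted in this abstract (p<0.05 and at least a 1.5-fold change in either direction) can be written as a small filter. This is only a sketch of the thresholding step: the gene names, means and p-values are invented for illustration, and the statistical test that produced the p-values in the study is not reproduced here.

```python
def differentially_expressed(rows, p_cutoff=0.05, min_fold=1.5):
    """Select transcripts passing both the p-value and fold-change cutoffs.

    `rows` holds (gene, mean_treated, mean_control, p_value) tuples with
    means on the linear scale. Returns (gene, fold) pairs, where fold is
    treated / control, so values below 1 indicate down-regulation.
    """
    hits = []
    for gene, treated, control, p_value in rows:
        fold = treated / control
        # Treat a 2-fold decrease (0.5) like a 2-fold increase (2.0).
        abs_fold = fold if fold >= 1 else 1 / fold
        if p_value < p_cutoff and abs_fold >= min_fold:
            hits.append((gene, fold))
    return hits

# Hypothetical summary rows; none of these values come from the study.
rows = [
    ("geneA", 300.0, 150.0, 0.01),   # 2-fold up, significant
    ("geneB", 120.0, 100.0, 0.001),  # significant but only 1.2-fold
    ("geneC", 50.0, 100.0, 0.03),    # 2-fold down, significant
    ("geneD", 400.0, 100.0, 0.2),    # 4-fold up, not significant
]

print(differentially_expressed(rows))
```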
PMCID: PMC2528026  PMID: 18769560
24.  The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis 
BMC Medical Genomics  2008;1:42.
The number of gene expression studies in the public domain is rapidly increasing, representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at the raw transcript level, even when the RNA is from comparable sources and has been processed on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined for meaningful meta-analyses.
A series of validation datasets comparing breast cancer and normal breast cell lines (MCF7 and MCF10A) was generated to examine the variability between datasets generated using different amounts of starting RNA, alternative protocols, and different generations of Affymetrix GeneChip or scanning hardware. We demonstrate that systematic, multiplicative biases are introduced at the RNA, hybridization and image-capture stages of a microarray experiment. Simple batch mean-centering was found to significantly reduce the level of inter-experimental variation, allowing raw transcript levels to be compared across datasets with confidence. By accounting for dataset-specific bias, we were able to assemble the largest gene expression dataset of primary breast tumours to date (1107), from six previously published studies. Using this meta-dataset, we demonstrate that combining greater numbers of datasets or tumours leads to a greater overlap in differentially expressed genes and more accurate prognostic predictions. However, this is highly dependent upon the composition of the datasets and patient characteristics.
Multiplicative, systematic biases are introduced at many stages of microarray experiments. When these are reconciled, raw data can be directly integrated from different gene expression datasets leading to new biological findings with increased statistical power.
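Batch mean-centering, the correction named in this abstract, can be sketched in a few lines. The version below assumes log2-scale expression values, so that a multiplicative bias becomes an additive per-gene offset within each dataset; the data layout, function name and numbers are hypothetical.

```python
def mean_center(batch):
    """Subtract each gene's within-batch mean from its values.

    `batch` maps gene -> list of log2 expression values, one per sample
    in a single dataset.
    """
    centered = {}
    for gene, values in batch.items():
        gene_mean = sum(values) / len(values)
        centered[gene] = [v - gene_mean for v in values]
    return centered

# Toy example: dataset B carries a gene-specific multiplicative bias
# relative to dataset A (a constant shift on the log2 scale).
dataset_a = {"g1": [7.0, 9.0], "g2": [4.0, 5.0]}
dataset_b = {"g1": [8.0, 10.0], "g2": [6.0, 7.0]}

centered_a = mean_center(dataset_a)
centered_b = mean_center(dataset_b)
print(centered_a, centered_b)
```

After centering, the two toy datasets are identical, which is the sense in which the offset has been removed. Because centering also removes genuine differences in average expression between batches, it relies on the batches having comparable sample composition, in line with the abstract's caveat about dataset composition.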
PMCID: PMC2563019  PMID: 18803878
25.  Allelic imbalance analysis by high-density single-nucleotide polymorphic allele (SNP) array with whole genome amplified DNA 
Nucleic Acids Research  2004;32(9):e69.
Besides their use in mRNA expression profiling, oligonucleotide microarrays have also been applied to single-nucleotide polymorphism (SNP) and loss of heterozygosity (LOH) or allelic imbalance studies. In this report, we evaluate the reliability of using whole genome amplified DNA for analysis with an oligonucleotide microarray containing 11 560 SNPs to detect allelic imbalance and chromosomal copy number abnormalities. Whole genome SNP analyses were performed with DNA extracted from osteosarcoma tissues and patient-matched blood. SNP calls were then generated by Affymetrix® GeneChip® DNA Analysis Software. In two osteosarcoma cases, using unamplified DNA, we identified 793 and 1070 SNP loci with allelic imbalance, respectively. In a parallel experiment with amplified DNA, 78% and 83% of these SNP loci with allelic imbalance were detected, respectively. The average false-positive rate was 13.8%. Furthermore, using the Affymetrix® GeneChip® Chromosome Copy Number Tool to analyze the SNP array data, we were able to detect identical chromosomal regions with gain or loss in both amplified and unamplified DNA at cytoband resolution.
PMCID: PMC419627  PMID: 15148342
