1.  Development and production of an oligonucleotide MuscleChip: use for validation of ambiguous ESTs 
BMC Bioinformatics  2002;3:33.
We describe the development, validation, and use of a highly redundant 120,000 oligonucleotide microarray (MuscleChip) containing 4,601 probe sets representing 1,150 known genes expressed in muscle and 2,075 EST clusters from a non-normalized subtracted muscle EST sequencing project (28,074 EST sequences). This set included 369 novel EST clusters showing no match to previously characterized proteins in any database. Each probe set was designed to contain 20–32 25 mer oligonucleotides (10–16 paired perfect match and mismatch probe pairs per gene), with each probe evaluated for hybridization kinetics (Tm) and similarity to other sequences. The 120,000 oligonucleotides were synthesized by photolithography and light-activated chemistry on each microarray.
Hybridization of human muscle cRNAs to this MuscleChip (33 samples) showed a correlation of 0.6 between the number of ESTs sequenced in each cluster and hybridization intensity. Out of 369 novel EST clusters not showing any similarity to previously characterized proteins, we focused on 250 EST clusters that were represented by robust probe sets on the MuscleChip fulfilling all stringent rules. 102 (41%) were found to be consistently "present" by analysis of hybridization to human muscle RNA, of which 40 ESTs (39%) could be genome anchored to potential transcription units in the human genome sequence. 19 ESTs of the 40 ESTs were furthermore computer-predicted as exons by one or more than three gene identification algorithms.
Our analysis found 40 transcriptionally validated, genome-anchored novel EST clusters to be expressed in human muscle. As most of these ESTs were low copy clusters (duplex and triplex) in the original 28,000 EST project, the identification of these as significantly expressed is a robust validation of the transcript units that permits subsequent focus on the novel proteins encoded by these genes.
PMCID: PMC137597  PMID: 12456269
Expression profiling; oligonucleotide microarrays; Affymetrix; muscle; EST
2.  Assessment of the relationship between pre-chip and post-chip quality measures for Affymetrix GeneChip expression data 
BMC Bioinformatics  2006;7:211.
Gene expression microarray experiments are expensive to conduct and guidelines for acceptable quality control at intermediate steps before and after the samples are hybridised to chips are vague. We conducted an experiment hybridising RNA from human brain to 117 U133A Affymetrix GeneChips and used these data to explore the relationship between 4 pre-chip variables and 22 post-chip outcomes and quality control measures.
We found that the pre-chip variables were significantly correlated with each other but that this correlation was strongest between measures of RNA quality and cRNA yield. Post-mortem interval was negatively correlated with these variables. Four principal components, reflecting array outliers, array adjustment, hybridisation noise and RNA integrity, explain about 75% of the total post-chip measure variability. Two significant canonical correlations existed between the pre-chip and post-chip variables, derived from MAS 5.0, dChip and the Bioconductor packages affy and affyPLM. The strongest (CANCOR 0.838, p < 0.0001) correlated RNA integrity and yield with post chip quality control (QC) measures indexing 3'/5' RNA ratios, bias or scaling of the chip and scaling of the variability of the signal across the chip. Post-mortem interval was relatively unimportant. We also found that the RNA integrity number (RIN) could be moderately well predicted by post-chip measures B_ACTIN35, GAPDH35 and SF.
We have found that the post-chip variables having the strongest association with quantities measurable before hybridisation are those reflecting RNA integrity. Other aspects of quality, such as noise measures (reflecting the execution of the assay) or measures reflecting data quality (outlier status and array adjustment variables) are not well predicted by the variables we were able to determine ahead of time. There could be other variables measurable pre-hybridisation which may be better associated with expression data quality measures. Uncovering such connections could create savings on costly microarray experiments by eliminating poor samples before hybridisation.
PMCID: PMC1524996  PMID: 16623940
3.  Transcript profiling of common bean (Phaseolus vulgaris L.) using the GeneChip® Soybean Genome Array: optimizing analysis by masking biased probes 
BMC Plant Biology  2010;10:85.
Common bean (Phaseolus vulgaris L.) and soybean (Glycine max) both belong to the Phaseoleae tribe and share significant coding sequence homology. This suggests that the GeneChip® Soybean Genome Array (soybean GeneChip) may be used for gene expression studies using common bean.
To evaluate the utility of the soybean GeneChip for transcript profiling of common bean, we hybridized cRNAs purified from nodule, leaf, and root of common bean and soybean in triplicate to the soybean GeneChip. Initial data analysis showed a decreased sensitivity and accuracy of measuring differential gene expression in common bean cross-species hybridization (CSH) GeneChip data compared to that of soybean. We employed a method that masked putative probes targeting inter-species variable (ISV) regions between common bean and soybean. A masking signal intensity threshold was selected that optimized both sensitivity and accuracy of measuring differential gene expression. After masking for ISV regions, the number of differentially-expressed genes identified in common bean was increased by 2.8-fold reflecting increased sensitivity. Quantitative RT-PCR (qRT-PCR) analysis of 20 randomly selected genes and purine-ureide pathway genes demonstrated an increased accuracy of measuring differential gene expression after masking for ISV regions. We also evaluated masked probe frequency per probe set to gain insight into the sequence divergence pattern between common bean and soybean. The sequence divergence pattern analysis suggested that the genes for basic cellular functions and metabolism were highly conserved between soybean and common bean. Additionally, our results show that some classes of genes, particularly those associated with environmental adaptation, are highly divergent.
The soybean GeneChip is a suitable cross-species platform for transcript profiling in common bean when used in combination with the masking protocol described. In addition to transcript profiling, CSH of the GeneChip in combination with masking probes in the ISV regions can be used for comparative ecological and/or evolutionary genomics studies.
PMCID: PMC3017814  PMID: 20459672
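The masking step described in the abstract above can be illustrated with a minimal sketch. It assumes that a probe is masked when its signal stays below the chosen intensity threshold on every cross-species array, and that a probe set is kept only if enough probes survive. The abstract does not specify either rule, so both are assumptions, and the function name, the `min_probes` parameter, and the data layout are hypothetical.

```python
def mask_probes(probe_intensities, threshold, min_probes=1):
    """Drop probes that never reach `threshold` on any array.

    probe_intensities: {probe_set_id: {probe_id: [intensity per array]}}
    Assumption: a probe whose signal stays below the threshold on all
    cross-species arrays is treated as targeting an inter-species
    variable (ISV) region and is masked.
    """
    kept = {}
    for probe_set, probes in probe_intensities.items():
        retained = {
            probe_id: values
            for probe_id, values in probes.items()
            if max(values) >= threshold
        }
        # Keep the probe set only if enough probes survive masking.
        if len(retained) >= min_probes:
            kept[probe_set] = retained
    return kept


# Hypothetical intensities for two probe sets on two arrays.
example = {
    "setA": {"p1": [50.0, 60.0], "p2": [500.0, 450.0]},
    "setB": {"p3": [10.0, 20.0]},
}
print(mask_probes(example, threshold=100.0))
```

Raising the threshold masks more probes, which removes more poorly hybridizing probes but leaves fewer probes (and fewer probe sets) for summarization. The abstract describes choosing the threshold to balance sensitivity against accuracy.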
4.  Comparison of Affymetrix Gene Array with the Exon Array shows potential application for detection of transcript isoform variation 
BMC Genomics  2009;10:519.
The emergence of isoform-sensitive microarrays has helped fuel in-depth studies of the human transcriptome. The Affymetrix GeneChip Human Exon 1.0 ST Array (Exon Array) has been previously shown to be effective in profiling gene expression at the isoform level. More recently, the Affymetrix GeneChip Human Gene 1.0 ST Array (Gene Array) has been released for measuring gene expression and interestingly contains a large subset of probes from the Exon Array. Here, we explore the potential of using Gene Array probes to assess expression variation at the sub-transcript level. Utilizing datasets of the high quality Microarray Quality Control (MAQC) RNA samples previously assayed on the Exon Array and Gene Array, we compare the expression measurements of the two platforms to determine the performance of the Gene Array in detecting isoform variations.
Overall, we show that the Gene Array is comparable to the Exon Array in making gene expression calls. Moreover, to examine expression of different isoforms, we modify the Gene Array probe set definition file to enable summarization of probe intensity values at the exon level and show that the expression profiles between the two platforms are also highly correlated. Next, expression calls of previously known differentially spliced genes were compared and also show concordant results. Splicing index analysis, representing estimates of exon inclusion levels, shows a lower but good correlation between platforms. As the Gene Array contains a significant subset of probes from the Exon Array, we note that, in comparison, the Gene Array overlaps with fewer but still a high proportion of splicing events annotated in the Known Alt Events UCSC track, with abundant coverage of cassette exons. We discuss the ability of the Gene Array to detect alternative splicing and isoform variation and address its limitations.
The Gene Array is an effective expression profiling tool at gene and exon expression level, the latter made possible by probe set annotation modifications. We demonstrate that the Gene Array is capable of detecting alternative splicing and isoform variation. As expected, in comparison to the Exon Array, it is limited by reduced gene content coverage and is not able to detect as wide a range of alternative splicing events. However, for the events that can be monitored by both platforms, we estimate that the selectivity and sensitivity levels are comparable. We hope our findings will shed light on the potential extension of the Gene Array to detect alternative splicing. It should be particularly suitable for researchers primarily interested in gene expression analysis, but who may be willing to look for splicing and isoform differences within their dataset. However, we do not suggest it to be an equivalent substitute to the more comprehensive Exon Array.
PMCID: PMC2780461  PMID: 19909511
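The abstract above does not spell out how the splicing index is computed. A commonly used definition, assumed here, normalizes each exon-level signal by the gene-level signal of its parent gene and then takes the log2 ratio of these normalized intensities between two samples. The function names below are hypothetical.

```python
import math


def normalized_intensity(exon_signal, gene_signal):
    """Exon signal expressed relative to its gene-level signal."""
    if exon_signal <= 0 or gene_signal <= 0:
        raise ValueError("signals must be positive")
    return exon_signal / gene_signal


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """log2 ratio of normalized exon intensities, sample A versus sample B.

    Assumed definition: a value near 0 means the exon tracks the gene
    equally in both samples; a positive value suggests higher relative
    inclusion in sample A.
    """
    ni_a = normalized_intensity(exon_a, gene_a)
    ni_b = normalized_intensity(exon_b, gene_b)
    return math.log2(ni_a / ni_b)


# Hypothetical signals: same gene-level expression, exon halved in sample B.
print(splicing_index(exon_a=200.0, gene_a=400.0, exon_b=100.0, gene_b=400.0))
```

Dividing by the gene-level signal separates a change in exon usage from a change in overall gene expression, which is why exon-level summarization is needed before this step.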
5.  Computational method for reducing variance with Affymetrix microarrays 
BMC Bioinformatics  2002;3:23.
Affymetrix microarrays are used by many laboratories to generate gene expression profiles. Generally, only large differences (> 1.7-fold) between conditions have been reported. Computational methods to reduce inter-array variability might be of value when attempting to detect smaller differences. We examined whether inter-array variability could be reduced by using data based on the Affymetrix algorithm for pairwise comparisons between arrays (ratio method) rather than data based on the algorithm for analysis of individual arrays (signal method). Six HG-U95A arrays that probed mRNA from young (21–31 yr old) human muscle were compared with six arrays that probed mRNA from older (62–77 yr old) muscle.
Differences in mean expression levels of young and old subjects were small, rarely > 1.5-fold. The mean within-group coefficient of variation for 4629 mRNAs expressed in muscle was 20% according to the ratio method and 25% according to the signal method. The ratio method yielded more differences according to t-tests (124 vs. 98 differences at P < 0.01), rank sum tests (107 vs. 85 differences at P < 0.01), and the Significance Analysis of Microarrays method (124 vs. 56 differences with false detection rate < 20%; 20 vs. 0 differences with false detection rate < 5%). The ratio method also improved consistency between results of the initial scan and results of the antibody-enhanced scan.
The ratio method reduces inter-array variance and thereby enhances statistical power.
PMCID: PMC126253  PMID: 12204100
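The within-group coefficient of variation reported in the abstract above can be sketched as follows. This is a simplified illustration, assuming the CV is the sample standard deviation divided by the mean for each mRNA within one group, averaged across mRNAs; the abstract does not give the exact procedure, and the gene names and values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(values) / m


def mean_within_group_cv(expression_by_gene):
    """Average the per-gene CV over all genes for one group of arrays.

    expression_by_gene: {gene_id: [expression value per array]}
    """
    return mean(coefficient_of_variation(v) for v in expression_by_gene.values())


# Hypothetical expression values for two genes on three arrays.
group = {
    "geneA": [100.0, 120.0, 80.0],
    "geneB": [50.0, 55.0, 45.0],
}
print(mean_within_group_cv(group))
```

Running the same function on values derived from the ratio method and from the signal method would give the kind of comparison the abstract reports (20% versus 25%).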
6.  A genomic approach to myoblast fusion in Drosophila 
We have developed an integrated genetic, genomic and computational approach to identify and characterize genes involved in myoblast fusion in Drosophila. We first used fluorescence activated cell sorting to purify mesodermal cells both from wild-type embryos and from twelve variant genotypes in which muscle development is perturbed in known ways. Then, we obtained gene expression profiles for the purified cells by hybridizing isolated mesodermal RNA to Affymetrix GeneChip arrays. These data were subsequently compounded into a statistical meta-analysis that predicts myoblast subtype-specific gene expression signatures that were later validated by in situ hybridization experiments. Finally, we analyzed the myogenic functions of a subset of these myoblast genes using a double-stranded RNA interference assay in living embryos expressing green fluorescent protein under control of a muscle-specific promoter. This experimental strategy led to the identification of several previously uncharacterized genes required for myoblast fusion in Drosophila.
PMCID: PMC3190861  PMID: 18979251
cell-cell fusion; myoblast; mesoderm; myogenesis; muscle development; Drosophila; genomics; gene expression profiling
7.  Use of bioanalyzer electropherograms for quality control and target evaluation in microarray expression profiling studies of ocular tissues 
Expression profiling with DNA microarrays has been used to examine the transcriptome of a wide spectrum of vertebrate cells and tissues. The sensitivity and accuracy of the data generated is dependent on the quality and composition of the input RNA. In this report, we examine the quality and array performance of over 200 total RNA samples extracted from ocular tissues and cells that have been processed in a microarray core laboratory over a 7-year period. Total RNA integrity and cRNA target size distribution were assessed using the 2100 Bioanalyzer. We present Affymetrix GeneChip array performance metrics for different ocular samples processed according to a standard microarray assay workflow including several quality control checkpoints. Our review of ocular sample performance in the microarray assay demonstrates the value of considering tissue-specific characteristics in evaluating array data. Specifically, we show that Bioanalyzer electropherograms reveal highly abundant mRNAs in lacrimal gland targets that are correlated with variation in array assay performance. Our results provide useful benchmarks for other gene expression studies of ocular systems.
PMCID: PMC2816811  PMID: 20157354
DNA microarrays; Gene expression; mRNA; RNA quality; Capillary electrophoresis; Lacrimal gland
9.  An optimized workflow for improved gene expression profiling for formalin-fixed, paraffin-embedded tumor samples 
Whole genome microarray gene expression profiling is the ‘gold standard’ for the discovery of prognostic and predictive genetic markers for human cancers. However, suitable research material is lacking as most diagnostic samples are preserved as formalin-fixed, paraffin-embedded tissue (FFPET). We tested a new workflow and data analysis method optimized for use with FFPET samples.
Sixteen breast tumor samples were split into matched pairs and preserved as FFPET or fresh-frozen (FF). Total RNA was extracted and tested for yield and purity. RNA from FFPET samples was amplified using three different commercially available kits in parallel, and hybridized to Affymetrix GeneChip® Human Genome U133 Plus 2.0 Arrays. The array probe set was optimized in silico to exclude misdesigned and misannotated probes.
FFPET samples processed using the WT-Ovation™ FFPE System V2 (NuGEN) provided 80% specificity and 97% sensitivity compared with FF samples (assuming values of 100%). In addition, in silico probe set redesign improved sequence detection sensitivity and, thus, may rescue potentially significant small-magnitude gene expression changes that could otherwise be diluted by the overall probe set background.
In conclusion, our FFPET-optimized workflow enables the detection of more genes than previous, nonoptimized approaches, opening new possibilities for the discovery, validation, and clinical application of mRNA biomarkers in human diseases.
PMCID: PMC3660273  PMID: 23641797
Biomarker; Breast cancer; Gene; HER2; Microarray
10.  Gene expression profile of bladder tissue of patients with ulcerative interstitial cystitis 
BMC Genomics  2009;10:199.
Interstitial cystitis (IC), a chronic bladder disease with an increasing incidence, is diagnosed using subjective symptoms in combination with cystoscopic and histological evidence. By cystoscopic examination, IC can be classified into an ulcerative and a non-ulcerative subtype. To better understand this debilitating disease on a molecular level, a comparative gene expression profile of bladder biopsies from patients with ulcerative IC and control patients has been performed.
Gene expression profiles from bladder biopsies of five patients with ulcerative IC and six control patients were generated using Affymetrix GeneChip expression arrays (Affymetrix – GeneChip® Human Genome U133 Plus 2.0). More than 31,000 of > 54,000 tested probe sets were present (detection p-value < 0.05). The difference between the two groups was significant for over 3,500 signals (t-test p-value < 0.01), and approximately 2,000 of the signals (corresponding to approximately 1,000 genes) showed an IC-to-healthy expression ratio greater than two. The IC pattern had similarities to patterns from immune system, lymphatic, and autoimmune diseases. The dominant biological processes were the immune and inflammatory responses. Many of the up-regulated genes were expressed in leukocytes, suggesting that leukocyte invasion into the bladder wall is a dominant feature of ulcerative IC. Histopathological data supported these findings.
GeneChip expression arrays present a global picture of ulcerative IC and provide us with a series of marker genes characteristic for this subtype of the disease. Evaluation of biopsies from other bladder patients with similar symptoms (e.g. patients with non-ulcerative IC) will further indicate whether the data presented here will be valuable for the diagnosis of IC.
PMCID: PMC2686735  PMID: 19400928
11.  Cross-species analysis of gene expression in non-model mammals: reproducibility of hybridization on high density oligonucleotide microarrays 
BMC Genomics  2007;8:89.
Gene expression profiles of non-model mammals may provide valuable data for biomedical and evolutionary studies. However, due to lack of sequence information of other species, DNA microarrays are currently restricted to humans and a few model species. This limitation may be overcome by using arrays developed for a given species to analyse gene expression in a related one, an approach known as "cross-species analysis". In spite of its potential usefulness, the accuracy and reproducibility of the gene expression measures obtained in this way are still open to doubt. The present study examines whether or not hybridization values from cross-species analyses are as reproducible as those from same-species analyses when using Affymetrix oligonucleotide microarrays.
The reproducibility of the probe data obtained by hybridizing deer, Old-World primate, and human RNA samples to the Affymetrix human GeneChip® U133 Plus 2.0 was compared. The results show that cross-species hybridization affected neither the distribution of the hybridization reproducibility among different categories nor the reproducibility values of the individual probes. Our analyses also show that 0.5% of the probes analysed in the U133 Plus 2.0 GeneChip are significantly associated with un-reproducible hybridizations. Such probes (called un-reproducible probe sequences in the text) do not increase in number in cross-species analyses.
Our study demonstrates that cross-species analyses do not significantly affect hybridization reproducibility of GeneChips, at least within the range of the mammal species analysed here. The differences in reproducibility between same-species and cross-species analyses observed in previous studies were probably caused by the analytical methods used to calculate the gene expression measures. Together with previous observations on the accuracy of GeneChips for cross-species analysis, our analyses demonstrate that cross-species hybridizations may provide useful gene expression data. However, the reproducibility and accuracy of these measures largely depend on the use of appropriate algorithms to derive the gene expression data from the probe data. Also, the identification in the studied GeneChip of probes associated with un-reproducible hybridizations, which are useless for gene expression analyses, stresses the need for a re-evaluation of the probes' performance.
PMCID: PMC1853087  PMID: 17407579
12.  The chemiluminescence based Ziplex® automated workstation focus array reproduces ovarian cancer Affymetrix GeneChip® expression profiles 
As gene expression signatures may serve as biomarkers, there is a need to develop technologies based on mRNA expression patterns that are adaptable for translational research. Xceed Molecular has recently developed the Ziplex® technology, which can assay the expression of a discrete number of genes as a focused array. The present study evaluated the reproducibility of the Ziplex system as applied to ovarian cancer research, using genes shown to exhibit distinct expression profiles initially assessed by Affymetrix GeneChip® analyses.
The new chemiluminescence-based Ziplex® gene expression array technology was evaluated for the expression of 93 genes selected based on their Affymetrix GeneChip® profiles as applied to ovarian cancer research. Probe design was based on the Affymetrix target sequence that favors the 3' UTR of transcripts in order to maximize reproducibility across platforms. Gene expression analysis was performed using the Ziplex Automated Workstation. Statistical analyses were performed to evaluate reproducibility of both the magnitude of expression and differences between normal and tumor samples by correlation analyses, fold change differences and statistical significance testing.
Expressions of 82 of 93 (88.2%) genes were highly correlated (p < 0.01) in a comparison of the two platforms. Overall, 75 of 93 (80.6%) genes exhibited consistent results in normal versus tumor tissue comparisons for both platforms (p < 0.001). The fold change differences were concordant for 87 of 93 (94%) genes, where there was agreement between the platforms regarding statistical significance for 71 (76%) of 87 genes. There was a strong agreement between the two platforms as shown by comparisons of log2 fold differences of gene expression between tumor versus normal samples (R = 0.93) and by Bland-Altman analysis, where greater than 90% of expression values fell within the 95% limits of agreement.
Overall concordance of gene expression patterns based on correlations, statistical significance between tumor and normal ovary data, and fold changes was consistent between the Ziplex and Affymetrix platforms. The reproducibility and ease-of-use of the technology suggests that the Ziplex array is a suitable platform for translational research.
PMCID: PMC2724495  PMID: 19580657
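The Bland-Altman analysis mentioned in the abstract above compares two platforms through the differences between paired measurements. The sketch below assumes the usual form of the 95% limits of agreement, the mean difference plus or minus 1.96 sample standard deviations; the function names and the example values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman_limits(platform_a, platform_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(platform_a) != len(platform_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(platform_a, platform_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def fraction_within_limits(platform_a, platform_b):
    """Share of paired differences that fall inside the limits of agreement."""
    _, lower, upper = bland_altman_limits(platform_a, platform_b)
    diffs = [a - b for a, b in zip(platform_a, platform_b)]
    return sum(lower <= d <= upper for d in diffs) / len(diffs)


# Hypothetical log2 expression values for four genes on two platforms.
a = [1.0, 2.0, 3.0, 4.0]
b = [1.5, 1.5, 3.5, 3.5]
print(bland_altman_limits(a, b))
print(fraction_within_limits(a, b))
```

A bias near zero with most differences inside the limits is the pattern the abstract describes, where more than 90% of expression values fell within the 95% limits of agreement.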
13.  Transporter and ion channel gene expression after caco-2 cell differentiation using 2 different microarray technologies 
The AAPS Journal  2004;6(3):35-44.
mRNA expression profiles had previously been measured in Caco-2 cells (human colonic carcinoma cells) using either custom-designed spotted oligonucleotide arrays or Affymetrix GeneChip oligonucleotide arrays. The Caco-2 cells used were from different clones and were examined under slightly different culture conditions commonly encountered when Caco-2 cells are used as a model tissue for studying intestinal transport and metabolism in different laboratories. In this study, we compared gene expression profiles of Caco-2 cells generated with different arrays to assess the validity of conclusions derived from the 2 independent studies, with a focus on changes in transporter and ion channel mRNA expression levels on Caco-2 cell differentiation. Significant changes in expression levels upon differentiation were observed with 78 genes, with probes common to both arrays. Of these, 18 genes were upregulated and 36 genes were downregulated. The 2 arrays yielded discrepant results for 24 genes, showing significant changes upon differentiation. The results from the 2 arrays correlated well for genes expressed above average levels (r = 0.75, P < 0.01, n = 25) and poorly for genes expressed at low levels (r = 0.08, P > 0.05, n = 25). Overall correlation across the 2 platforms was r = 0.45 (P < 0.01) for the 78 genes, with similar results from both arrays. Despite differences in experimental conditions and array technology, similar results were obtained for most genes.
PMCID: PMC2751246  PMID: 15760106
microarrays; Caco-2 cells; transporter; ion channel; platform comparison
14.  Expression profiling of human renal carcinomas with functional taxonomic analysis 
BMC Bioinformatics  2002;3:26.
Molecular characterization has contributed to the understanding of the inception, progression, treatment and prognosis of cancer. Nucleic acid array-based technologies extend molecular characterization of tumors to thousands of gene products. To effectively discriminate between tumor sub-types, reliable laboratory techniques and analytic methods are required.
We derived mRNA expression profiles from 21 human tissue samples (eight normal kidneys and 13 kidney tumors) and two pooled samples using the Affymetrix GeneChip platform. A panel of ten clustering algorithms combined with four data pre-processing methods identified a consensus cluster dendrogram in 18 of 40 analyses and of these 16 used a logarithmic transformation. Within the consensus dendrogram the expression profiles of the samples grouped according to tissue type; clear cell and chromophobe carcinomas displayed distinctly different gene expression patterns. By using a rigorous statistical selection based method we identified 355 genes that showed significant (p < 0.001) gene expression changes in clear cell renal carcinomas compared to normal kidney. These genes were classified with a tool to conceptualize expression patterns called "Functional Taxonomy". Each tumor type had a distinct "signature," with a high number of genes in the categories of Metabolism, Signal Transduction, and Cellular and Matrix Organization and Adhesion.
Affymetrix GeneChip profiling differentiated clear cell and chromophobe carcinomas from one another and from normal kidney cortex. Clustering methods that used logarithmic transformation of data sets produced dendrograms consistent with the sample biology. Functional taxonomy provided a practical approach to the interpretation of gene expression data.
PMCID: PMC130042  PMID: 12356337
15.  Segregation of Regulatory Polymorphisms with Effects on the Gluteus Medius Transcriptome in a Purebred Pig Population 
PLoS ONE  2012;7(4):e35583.
The main goal of the present study was to analyse the genetic architecture of mRNA expression in muscle, a tissue of the utmost economic importance for pig breeders. Previous studies have used F2 crosses to detect porcine expression QTL (eQTL), so they contributed data that mostly represent the between-breed component of eQTL variation. Here, we have analysed eQTL segregation in an outbred Duroc population using two groups of animals with divergent fatness profiles. This approach is particularly suitable for analysing the within-breed component of eQTL variation, with a special emphasis on loci involved in lipid metabolism.
Methodology/Principal Findings
GeneChip Porcine Genome arrays (Affymetrix) were used to determine the mRNA expression levels of gluteus medius samples from 105 Duroc barrows. A whole-genome eQTL scan was carried out with a panel of 116 microsatellites. Results allowed us to detect 613 genome-wide significant eQTL unevenly distributed across the pig genome. A clear predominance of trans- over cis-eQTL was observed. Moreover, 11 trans-regulatory hotspots affecting the expression levels of four to 16 genes were identified. A Gene Ontology study showed that regulatory polymorphisms affected the expression of muscle development and lipid metabolism genes. A number of positional concordances between eQTL and lipid trait QTL were also found, whereas limited evidence of a linear relationship between muscle fat deposition and mRNA levels of eQTL-regulated genes was obtained.
Our data provide substantial evidence that there is a remarkable amount of within-breed genetic variation affecting muscle mRNA expression. Most of this variation acts in trans and influences biological processes related to muscle development, lipid deposition and energy balance. Identifying the underlying causal mutations and ascertaining their effects on phenotypes would provide a fundamental perspective on how complex traits are built at the molecular level.
PMCID: PMC3335821  PMID: 22545120
16.  Intratumor Heterogeneity and Precision of Microarray-Based Predictors of Breast Cancer Biology and Clinical Outcome 
Journal of Clinical Oncology  2010;28(13):2198-2206.
Identifying sources of variation in expression microarray data and the effect of variance in gene expression measurements on complex predictive and diagnostic models is essential when translating microarray-based experimental approaches into clinical assays. The technical reproducibility of microarray platforms is well established. Here, we investigate the additional impact of intratumor heterogeneity, a largely unstudied component of variance, on the performance of several microarray-based assays in breast cancer.
Patients and Methods
Genome-wide expression profiling was performed on 50 core needle biopsies from 18 breast cancer patients using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays. Global profiles of expression were characterized using unsupervised clustering methods and variance components models. Array-based measures of estrogen receptor (ER) and progesterone receptor (PR) status were compared with immunohistochemistry. The precision of genomic predictors of ER pathway status, recurrence risk, and sensitivity to chemotherapeutics was evaluated by intraclass correlation.
Global patterns of gene expression demonstrated that intratumor variation was substantially less than the total variation observed across the patient population. Nevertheless, a fraction of genes exhibited significant intratumor heterogeneity in expression. A high degree of reproducibility was observed in single-gene predictors of ER (intraclass correlation coefficient [ICC] = 0.94) and PR expression (ICC = 0.90), and in a multigene predictor of ER pathway activation (ICC = 0.98) with high concordance with immunohistochemistry. Substantial agreement was also observed for multigene signatures of cancer recurrence (ICC = 0.71) and chemotherapeutic sensitivity (ICC = 0.72 and 0.64).
Intratumor heterogeneity, although present at the level of individual gene expression, does not preclude precise microarray-based predictions of tumor behavior or clinical outcome in breast cancer patients.
PMCID: PMC2860437  PMID: 20368555
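The intraclass correlation coefficients quoted in the abstract above measure how much of the total variance in a predictor lies between patients rather than between biopsies of the same tumor. The abstract does not state which ICC form was used. The sketch below assumes the one-way random-effects form for a balanced design (the same number of biopsies per patient), which is a simplification, since 50 biopsies from 18 patients cannot be balanced. The function name and data are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way random-effects ICC for a balanced design.

    groups: one list of repeated measurements per subject, all of
    equal length k >= 2. Computed from the between-subject (MSB) and
    within-subject (MSW) mean squares as
    (MSB - MSW) / (MSB + (k - 1) * MSW).
    """
    n = len(groups)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need the same number (>= 2) of measurements per subject")
    grand_mean = mean(v for g in groups for v in g)
    group_means = [mean(g) for g in groups]
    msb = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    msw = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical predictor scores: three patients, two biopsies each.
scores = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(icc_oneway(scores))
```

Values close to 1 indicate that repeated biopsies from one tumor give nearly the same score, which corresponds to the high precision reported for the ER and PR predictors.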
17.  Analysis of probe level patterns in Affymetrix microarray data 
BMC Bioinformatics  2007;8:146.
Microarrays have been used extensively to analyze the expression profiles for thousands of genes in parallel. Most of the widely used methods for analyzing Affymetrix Genechip microarray data, including RMA, GCRMA and Model Based Expression Index (MBEI), summarize probe signal intensity data to generate a single measure of expression for each transcript on the array. In contrast, other methods are applied directly to probe intensities, negating the need for a summarization step.
In this study, we used the Affymetrix rat genome Genechip to explore variability in probe response patterns within transcripts. We considered a number of possible sources of variability in probe sets including probe location within the transcript, middle base pair of the probe sequence, probe overlap, sequence homology and affinity. Although affinity, middle base pair and probe location effects may be seen at the gross array level, these factors only account for a small proportion of the variation observed at the gene level. A BLAST search and the presence of probe by treatment interactions for selected differentially expressed genes showed high sequence homology for many probes to non-target genes.
We suggest that examination and modeling of probe level intensities can be used to guide researchers in refining their conclusions regarding differentially expressed genes. We discuss implications for probe sequence selection for confirmatory analysis using real time PCR.
PMCID: PMC1884176  PMID: 17480226
18.  Noncoder: a web interface for exon array-based detection of long non-coding RNAs 
Nucleic Acids Research  2012;41(1):e20.
Due to recent technical developments, a high number of long non-coding RNAs (lncRNAs) have been discovered in mammals. Although it has been shown that lncRNAs are regulated differently among tissues and disease statuses, the functions of these transcripts are still unknown in most cases. GeneChip Exon 1.0 ST Arrays (exon arrays) from Affymetrix, Inc. have been used widely to profile genome-wide expression changes and alternative splicing of protein-coding genes. Here, we demonstrate that re-annotation of exon array probes can be used to profile the expression of tens of thousands of lncRNAs. With this annotation, a detailed inspection of lncRNAs and their isoforms is possible. To make this approach generally available to the research community, we developed a user-friendly web interface called ‘noncoder’. After CEL files from exon arrays are uploaded and a few parameters are set, the exon array data are normalized and analysed to identify differentially expressed lncRNAs. Noncoder provides detailed annotation information for lncRNAs and is equipped with unique features that allow an efficient search for interesting lncRNAs to be studied further. The web interface is available at
PMCID: PMC3592461  PMID: 23012263
19.  Reproducibility of oligonucleotide arrays using small samples 
BMC Genomics  2003;4:4.
Low RNA yields from small tissue samples can limit the use of oligonucleotide microarrays (Affymetrix GeneChips®). Methods using less cRNA for hybridization or amplifying the cRNA have been reported to reduce the number of transcripts detected, but the effect on realistic experiments designed to detect biological differences has not been analyzed. We systematically explore the effects of using different starting amounts of RNA on the ability to detect differential gene expression.
The standard Affymetrix protocol can be used starting with only 2 micrograms of total RNA, with results equivalent to the recommended 10 micrograms. Biological variability is much greater than the technical variability introduced by this change. A simple amplification protocol described here can be used for samples as small as 0.1 micrograms of total RNA. This amplification protocol allows detection of a substantial fraction of the significant differences found using the standard protocol, despite an increase in variability and the 5' truncation of the transcripts, which prevents detection of a subset of genes.
Biological differences in a typical experiment are much greater than differences resulting from technical manipulations in labeling and hybridization. The standard protocol works well with 2 micrograms of RNA, and with minor modifications could allow the use of samples as small as 1 microgram. For smaller amounts of starting material, down to 0.1 micrograms RNA, differential gene expression can still be detected using the single cycle amplification protocol. Comparisons of groups of four arrays detect many more significant differences than comparisons of three arrays.
PMCID: PMC150597  PMID: 12594857
20.  Temporal profiling of the transcriptional basis for the development of corticosteroid-induced insulin resistance in rat muscle 
The Journal of endocrinology  2005;184(1):219-232.
Elevated systemic levels of glucocorticoids are causally related to peripheral insulin resistance. The pharmacological use of synthetic glucocorticoids (corticosteroids) often results in insulin resistance/type II diabetes. Skeletal muscle is responsible for close to 80% of the insulin-induced systemic disposal of glucose and is a major target for glucocorticoid-induced insulin resistance. We used Affymetrix gene chips to profile the dynamic changes in mRNA expression in rat skeletal muscle in response to a single bolus dose of the synthetic glucocorticoid methyl-prednisolone. Temporal expression profiles (analyzed on individual chips) were obtained from tissues of 48 drug-treated animals encompassing 16 time points over 72 h following drug administration along with four vehicle-treated controls. Data mining identified 653 regulated probe sets out of 8799 present on the chip. Of these 653 probe sets we identified 29, which represented 22 gene transcripts, that were associated with the development of insulin resistance. These 29 probe sets were regulated in three fundamental temporal patterns. 16 probe sets coding for 12 different genes had a profile of enhanced expression. 10 probe sets coding for eight different genes showed decreased expression and three probe sets coding for two genes showed biphasic temporal signatures. These transcripts were grouped into four general functional categories: signal transduction, transcription regulation, carbohydrate/fat metabolism, and regulation of blood flow to the muscle. The results demonstrate the polygenic nature of transcriptional changes associated with insulin resistance that can provide a temporal scaffolding for translational and post-translational data as they become available.
PMCID: PMC2574435  PMID: 15642798
21.  The PEPR GeneChip data warehouse, and implementation of a dynamic time series query tool (SGQT) with graphical interface 
Nucleic Acids Research  2004;32(Database issue):D578-D581.
Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic research (see, with integrated genomic DNA, gene structure, EST/splicing and cross-species ortholog data. DNA databases have relatively low dimensionality; the genome is a linear code that anchors all associated data. In contrast, RNA expression and protein databases need to be able to handle very high dimensional data, with time, tissue, cell type and genes as interrelated variables. The high dimensionality of microarray expression profile data and the lack of a standard experimental platform have complicated the development of web-accessible databases and analytical tools. We have designed and implemented a public resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip expression profiles, generated in the same laboratory, and subject to the same quality and procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation of graphs and spreadsheets showing the action of any transcript of interest over time. In this report, we demonstrate the utility of this tool using a 27 time point, in vivo muscle regeneration series. This data warehouse and its associated analysis tools provide access to multidimensional microarray data through web-based interfaces, both for download of all types of raw data for independent analysis, and also for straightforward gene-based queries. Planned implementations of PEPR will include web-based remote entry of projects adhering to quality control and standard operating procedure (QC/SOP) criteria, and automated output of alternative probe set algorithms for each project (see
PMCID: PMC308738  PMID: 14681485
22.  A statistical method for predicting splice variants between two groups of samples using GeneChip® expression array data 
Alternative splicing of pre-messenger RNA results in RNA variants with combinations of selected exons. It is one of the essential biological functions and regulatory components in higher eukaryotic cells. Some of these variants are detectable with the Affymetrix GeneChip® that uses multiple oligonucleotide probes (i.e. a probe set), since the target sequences for the multiple probes are adjacent within each gene. Hybridization intensity from a probe correlates with abundance of the corresponding transcript. Although the multiple-probe feature in the current GeneChip® was designed to assess expression values of individual genes, it also measures transcriptional abundance for a sub-region of a gene sequence. This additional capacity motivated us to develop a method to predict alternative splicing, taking advantage of extensive repositories of GeneChip® gene expression array data.
We developed a two-step approach to predict alternative splicing from GeneChip® data. First, we clustered the probes from a probe set into pseudo-exons based on similarity of probe intensities and physical adjacency. A pseudo-exon is defined as a sequence in the gene within which multiple probes have comparable probe intensity values. Second, for each pseudo-exon, we assessed the statistical significance of the difference in probe intensity between two groups of samples. Differentially expressed pseudo-exons are predicted to be alternatively spliced. We applied our method to empirical data generated from GeneChip® Hu6800 arrays, which include 7129 probe sets and twenty probes per probe set. The dataset consists of sixty-nine medulloblastoma (27 metastatic and 42 non-metastatic) samples and four cerebellum samples as normal controls. We predicted that 577 genes would be alternatively spliced when we compared normal cerebellum samples to medulloblastomas, and predicted that thirteen genes would be alternatively spliced when we compared metastatic medulloblastomas to non-metastatic ones. We checked the consistency of some of our findings with information in UCSC Human Genome Browser.
The two-step approach described in this paper is capable of predicting some alternative splicing from multiple oligonucleotide-based gene expression array data with GeneChip® technology. Our method employs the extensive repositories of gene expression array data available and generates alternative splicing hypotheses, which can be further validated by experimental studies.
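The two-step approach can be sketched in a few lines. The abstract does not give the clustering rule or the test statistic, so everything specific here is an assumption: adjacent probes are merged when their mean log-intensities differ by at most a hypothetical `max_gap`, and each pseudo-exon is scored with a Welch t statistic between the two sample groups. The function names and data layout are illustrative, not the authors' implementation.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def pseudo_exons(probe_means, max_gap=0.5):
    """Step 1: merge physically adjacent probes (listed in transcript order)
    whose mean intensity differs from the previous probe by <= max_gap."""
    if not probe_means:
        return []
    groups = [[0]]
    for i in range(1, len(probe_means)):
        if abs(probe_means[i] - probe_means[i - 1]) <= max_gap:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def welch_t(a, b):
    """Welch t statistic for two groups, each with at least two values."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def score_pseudo_exons(intensity, group_a, group_b, max_gap=0.5):
    """Step 2: compare the two sample groups within each pseudo-exon.

    intensity[p][s] is the log-intensity of probe p in sample s;
    group_a and group_b are lists of sample indices.
    Returns a list of (probe_indices, t_statistic) pairs.
    """
    probe_means = [mean(row) for row in intensity]
    n_samples = len(intensity[0]) if intensity else 0
    results = []
    for probes in pseudo_exons(probe_means, max_gap):
        # Average the probes of this pseudo-exon within each sample.
        per_sample = [mean([intensity[p][s] for p in probes]) for s in range(n_samples)]
        a = [per_sample[s] for s in group_a]
        b = [per_sample[s] for s in group_b]
        results.append((probes, welch_t(a, b)))
    return results
```

A pseudo-exon with a large absolute t statistic while its neighbours stay near zero would be a candidate for alternative splicing under these assumptions. Turning the statistic into a p-value requires a t distribution, which is left out to keep the sketch dependency-free.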
PMCID: PMC1502129  PMID: 16603076
23.  Transcriptomes of human prostate cells 
BMC Genomics  2006;7:92.
The gene expression profiles of most human tissues have been studied by determining the transcriptome of whole tissue homogenates. Due to the solid composition of tissues it is difficult to study the transcriptomes of individual cell types that compose a tissue. To overcome the problem of heterogeneity we have developed a method to isolate individual cell types from whole tissue that are a source of RNA suitable for transcriptome profiling.
Using monoclonal antibodies specific for basal (integrin β4), luminal secretory (dipeptidyl peptidase IV), stromal fibromuscular (integrin α 1), and endothelial (PECAM-1) cells, respectively, we separated the cell types of the prostate with magnetic cell sorting (MACS). Gene expression of MACS-sorted cell populations was assessed with Affymetrix GeneChips. Analysis of the data provided insight into gene expression patterns at the level of individual cell populations in the prostate.
In this study, we have determined the transcriptome profile of a solid tissue at the level of individual cell types. Our data will be useful for studying prostate development and cancer progression in the context of single cell populations within the organ.
PMCID: PMC1553448  PMID: 16638148
24.  Age and Diet Affect Gene Expression Profile in Canine Skeletal Muscle 
PLoS ONE  2009;4(2):e4481.
We evaluated gene transcription in canine skeletal muscle (biceps femoris) using microarray analysis to identify effects of age and diet on gene expression. Twelve female beagles were used (six 1-year olds and six 12-year olds) and they were fed one of two experimental diets for 12 months. One diet contained primarily plant-based protein sources (PPB), whereas the second diet contained primarily animal-based protein sources (APB). Affymetrix GeneChip Canine Genome Arrays were used to hybridize extracted RNA. Age had the greatest effect on gene transcription (262 differentially expressed genes), whereas the effect of diet was relatively small (22 differentially expressed genes). Effects of age (regardless of diet) were most notable on genes related to metabolism, cell cycle and cell development, and transcription function. All these genes were predominantly down-regulated in geriatric dogs. Age-affected genes that were differentially expressed on only one of two diets were primarily noted in the PPB diet group (144/165 genes). Again, genes related to cell cycle (22/35) and metabolism (15/19) had predominantly decreased transcription in geriatric dogs, but 6/8 genes related to muscle development had increased expression. Effects of diet on muscle gene expression were mostly noted in geriatric dogs, but no consistent patterns in transcription were observed. The insight these data provide into gene expression profiles of canine skeletal muscle as affected by age could serve as a foundation for future research pertaining to age-related muscle diseases.
PMCID: PMC2637985  PMID: 19221602
25.  Effect of Diet Supplementation on the Expression of Bovine Genes Associated with Fatty Acid Synthesis and Metabolism 
Conjugated linoleic acids (CLA) are of important nutritional and health benefit to humans. Food products of animal origin are their major dietary source and their concentration increases with high concentrate diets fed to animals. To examine the effects of diet supplementation on the expression of genes related to lipid metabolism, 28 Angus steers were fed either pasture only, pasture with soybean hulls and corn oil, pasture with corn grain, or high concentrate diet. At slaughter, samples of subcutaneous adipose tissue were collected, from which RNA was extracted. Relative abundance of gene expression was measured using Affymetrix GeneChip Bovine Genome array. An ANOVA model nested within gene was used to analyze the background adjusted, normalized average difference of probe-level intensities. To control experiment-wise error, a false discovery rate of 0.01 was imposed on all contrasts. Expression of several genes involved in the synthesis of enzymes related to fatty acid metabolism and lipogenesis, such as stearoyl-CoA desaturase (SCD), fatty acid synthetase (FASN), lipoprotein lipase (LPL) and fatty-acyl elongase (LCE), along with several transcription factors and co-activators involved in lipogenesis, was found to be differentially expressed. Confirmatory RT-qPCR was done to validate the microarray results, which showed satisfactory correspondence between the two platforms. Results show that increasing dietary energy intake through high concentrate supplementation affects the transcription of genes encoding enzymes involved in fat metabolism, which in turn affects fatty acid content in the carcass tissue as well as carcass quality. Corn supplementation either as oil or grain appeared to significantly alter the expression of genes directly associated with fatty acid synthesis.
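The abstract says a false discovery rate of 0.01 was imposed on all contrasts but does not name the procedure. A common choice is the Benjamini-Hochberg step-up rule, sketched below purely as an assumption; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.01):
    """Return a list of booleans, True where the contrast is declared
    significant under the Benjamini-Hochberg step-up rule at the given FDR."""
    m = len(pvalues)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank whose p-value is at or below rank * fdr / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * fdr / m:
            cutoff_rank = rank

    # Every contrast ranked at or below the cutoff is rejected.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

With four contrasts and p-values `[0.0001, 0.5, 0.004, 0.03]`, the per-rank thresholds at an FDR of 0.01 are 0.0025, 0.005, 0.0075 and 0.01, so only the first and third contrasts pass.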
PMCID: PMC2865165  PMID: 20448844
Affymetrix bovine array; fatty acid metabolism; RT-qPCR; conjugated linoleic acid; stearoyl-CoA desaturase; fatty acid synthase; acetyl-coenzyme A carboxylase

