Microarray platforms and collaborating laboratories
Eight sites were involved in testing the mixed tissue RNA reference material (MTRRM); the results are reported in anonymized form. The samples were run on Affymetrix GeneChip® RAE230A arrays at three sites (sites 1, 2 and 3), on GE Healthcare CodeLink UniSet Rat I arrays at two sites (sites 4 and 5), and on Agilent G4130A arrays at four sites (sites 3, 7, 8 and 9). Site 6 used the RM to help calibrate their in-house platform (data not shown). Data from this study are available at EBI ArrayExpress (http://www.ebi.ac.uk/arrayexpress/) under accession number E-TABM-16.
Animal care and procedures were approved by the Institutional Animal Care and Use Committee at the US FDA. Male Sprague-Dawley (SD) rats were ordered from Charles River Laboratories (Product no. Crl:CD(SD)IGS) or from Harlan Laboratories (Product no. Hsd:SD) at 6 weeks of age. Three separate shipments of eight rats were ordered and received for this study within a 2-month time frame. Each shipment was processed and pooled separately to create three biological replicate sets. Shipments 1 and 2 consisted of rats ordered from Charles River; Harlan was the source of shipment 3. The rats received certified rodent diet #5002C (Purina Mills Inc.) ad libitum and drinking water purified by reverse osmosis. The animals were acclimated for 6 days before euthanasia. The animals were on a 12 h light/dark cycle and euthanasia was performed consistently within 4 to 6 h after the start of the light cycle. The average weight at sacrifice (7 weeks of age) was 223 ± 9 g.
After euthanasia in a slow-charged CO2 chamber, the rats were immediately decapitated to allow for rapid access to the brain. Four organs were quickly removed in the following order: brain, liver, kidneys and testes. The whole brain, including brain stem but excluding pituitary gland, was collected. Tissues were quickly dissected into 0.5 cm sections while submerged in RNAlater (Ambion) in sterile petri plates and placed into 50 ml tubes containing RNAlater at a ratio of 10 ml per 1 g of tissue. Tissues were stored at 4°C for a minimum of 24 h and a maximum of 72 h. All tissue RNA was isolated using a Tempest rotor-stator homogenizer (VirTis) and QIAGEN RNA isolation kits, following the manufacturer's protocol. Brain RNA was isolated using QIAzol reagent, followed by a clean-up step with an RNeasy Maxi kit. Kidney, liver and testes were homogenized in 15 ml QIAGEN RNeasy Lysis buffer (RLT) per mg of tissue, diluted to 30 ml RLT per mg, and RNA was isolated on RNeasy Maxi columns using 15 ml homogenate per column. After an additional clean-up step on RNeasy Maxi columns, RNA was aliquoted and stored at −70°C. The integrity of each RNA sample was assessed on an Agilent 2100 Bioanalyzer (Agilent Technologies). Total RNA was quantitated by ultraviolet (UV)/visible wavelength spectrophotometry in TNE [40 mM Tris–HCl (pH 7.5), 1 mM EDTA (pH 8.0), 150 mM NaCl]. For each tissue, equal amounts of RNA were pooled from each of the eight animals in the same shipment to create tissue- and shipment-specific pools.
SD rat RNA from two commercial sources was also tested. Total RNA isolated from rat brain (Catalog no. 7912), kidney (Catalog no. 7926), liver (Catalog no. 7910) and testicle (Catalog no. 7934) was obtained from Ambion. Total RNA isolated from brain (Catalog no. 737001), kidney (Catalog no. 737007), liver (Catalog no. 737009) and testes (Catalog no. 737023) was obtained from Stratagene. One lot of Stratagene brain RNA (Lot no. 0610696) could not be used to make the MTRRM because it behaved on microarrays like RNA from a tissue other than brain. MTRRM batches 4 and 5 were prepared from the Ambion and Stratagene RNA, respectively, in proportions based on the RNA concentrations provided by the supplier. An independent batch (batch 6) of the MTRRM was prepared at site 3 from a set of eight rats under the same protocol used to prepare batches 1–3.
MTRRM batch preparation
After pooling, the RNA was quantitated by measuring OD260 on a UV/visible wavelength spectrophotometer in TNE, checked for purity by OD260/OD280 ratio, and checked for RNA integrity on the Agilent 2100 Bioanalyzer. RNA quantitation was confirmed on a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies). Accurate RNA quantitation in TNE was found to be important for replication of the results with the MTRRM. The average OD260/OD280 ratio for each tissue pool across the 3 shipments was as follows: brain, 2.09 ± 0.01; kidney, 2.03 ± 0.04; liver, 2.07 ± 0.13 and testis, 2.09 ± 0.03. Each pool of same-tissue RNA from each shipment was run on Affymetrix RAE230A arrays (Affymetrix, Inc.) using the protocols associated with site 1 below. Two mg each of two mixtures (Mix1 and Mix2) were prepared for each shipment from the same-tissue RNA pools to make MTRRM batches 1–3 from shipments 1–3, respectively. Mix1 consisted of 200 µg testis RNA, 600 µg liver RNA, 800 µg brain RNA and 400 µg kidney RNA. Mix2 consisted of 800 µg testis RNA, 400 µg liver RNA, 400 µg brain RNA and 400 µg kidney RNA. Aliquots of 50 µg of each mixture for each of the 3 batches were frozen at −70°C. The 3 batches of Mix1 and Mix2 RNA samples were run on microarrays at eight anonymized sites. For site 7, 100 µg of Mix1 and Mix2 were treated with deoxyribonuclease I (E.C. 3.1.21.1) (DNA-free, Ambion), and diluted to a concentration of 0.2 µg/µl.
Gene expression measurement on Affymetrix RAE230A arrays
Sites 1–3 ran 3 batches of the MTRRM on Affymetrix RAE230A arrays using either the standard or the alternate protocol for labeling and processing specified by the manufacturer (http://www.affymetrix.com/support/technical/manual/expression_manual.affx). Standardized amounts of input total RNA per labeling reaction (5 µg), labeled cRNA target per array (15 µg) and hybridization volume per array (200 µl) were used by the 3 sites. Labeling and processing conditions at sites 1 and 3 included the use of the T7-Oligo(dT) promoter primer kit (Affymetrix Part no. 900375), reagents for cDNA synthesis from Invitrogen, cDNA and cRNA clean-up using the Sample Clean-up Module (Affymetrix Part no. 900371), and synthesis of biotin-labeled cRNA using an Enzo kit (Affymetrix Part no. 900182). Site 2 used the alternate protocol for cDNA clean-up, which includes phenol/chloroform extraction with Phase-Lock gels, and the alternate protocol for cRNA clean-up, which uses the RNeasy kit (QIAGEN Part no. 74103). At sites 1 and 2, microarrays were stained and washed on an Affymetrix GeneChip® Fluidics Station 400 using the EukGE-WS2 protocol. The arrays were scanned on an Affymetrix GeneChip® Scanner 2500 using default settings. Site 3 used an Affymetrix GeneChip® Fluidics Station 450 with the EukGE-WS2v4 protocol and an Affymetrix GeneChip® Scanner 3000 with default settings. Affymetrix MAS 5.0 software was used to calculate signal values for tissue-selectivity determinations and for intra-site and cross-site comparisons. All data was globally scaled to a target intensity of 500. For some applications, Mix1 array data were normalized to a selected probe set on the Mix2 array (the 10% trimmed mean of kidney-selective probes listed in Supplementary Table 1). Gene summary values were also calculated from CEL files using the Probe Logarithmic Intensity Error Estimation (PLIER) algorithm. In contrast to the MAS 5.0 algorithm, which applies a one-step Tukey's biweight estimate to produce a robust weighted mean signal for each probe set, the PLIER algorithm uses maximum likelihood type estimates in a model-based framework for finding probe expression estimates. PLIER signal calculations for body map data were performed using default settings (quantile normalization, mismatch background estimation, perfect match minus mismatch and full optimization). An affinity model for the PLIER analysis was constructed from the four same-tissue RNA pools from each shipment that were run on 12 Affymetrix RAE230A arrays. Normalization between Mix1 and Mix2 was performed either on signals or on ratios, where indicated. If PLIER signal estimates were not quantile normalized, Mix1 signal data was normalized by the 10% trimmed mean Mix2 signal of kidney-selective analytes. If PLIER signals were quantile normalized, Mix1:Mix2 ratios were normalized by dividing each ratio by the 10% trimmed mean ratio of the subset of kidney-selective reference probes.
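The two kidney-referenced normalization routes described above can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the text: the "10% trimmed mean" is taken to drop 10% of values from each tail, signal normalization is taken to mean rescaling Mix1 so that its kidney-selective trimmed mean matches that of Mix2, and all function and variable names are hypothetical.

```python
def trimmed_mean(values, proportion=0.10):
    """Mean after dropping `proportion` of values from each tail (assumed convention)."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def normalize_signals(mix1, mix2, kidney_probes):
    """Rescale Mix1 signals so kidney-selective probes match Mix2 on trimmed mean."""
    factor = (trimmed_mean([mix2[p] for p in kidney_probes])
              / trimmed_mean([mix1[p] for p in kidney_probes]))
    return {probe: signal * factor for probe, signal in mix1.items()}


def normalize_ratios(mix1, mix2, kidney_probes):
    """Divide each Mix1:Mix2 ratio by the trimmed mean ratio of the kidney-selective probes."""
    ratios = {probe: mix1[probe] / mix2[probe] for probe in mix1}
    reference = trimmed_mean([ratios[p] for p in kidney_probes])
    return {probe: ratio / reference for probe, ratio in ratios.items()}
```

Kidney RNA makes up the same mass fraction of Mix1 and Mix2, so kidney-selective probes are expected to give a ratio near 1, which is presumably why they serve as the reference subset.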
Gene expression measurements on CodeLink UniSet Rat I arrays
At sites 4 and 5, the MTRRM was run on CodeLink RU1 arrays using a standardized amount of input total RNA per labeling reaction (2 µg), labeled cRNA target per array (10 µg) and hybridization volume per array (250 µl) (http://www5.amershambiosciences.com/aptrix/upp01077.nsf/Content/codelink_user_protocols). At site 4, target labeling was performed using the manufacturer's manual labeling cDNA target preparation protocol; site 5 used the manufacturer's automated target preparation protocol. Site 4 used the manufacturer's recommended hybridization and detection protocols. A few modifications to this protocol were made at site 5 (21). As a secondary label, site 4 used Cy5-Streptavidin and site 5 used Alexa Fluor 647-Streptavidin. Both sites used Axon 4000B scanners at settings defined in the user manual. Version 2.3 of the CodeLink Expression analysis software was used for feature extraction. A global normalization of each array by the median normalized intensity was performed. For some applications, this step was followed by normalization of Mix1 to Mix2 using the 10% trimmed mean signal of the kidney-selective probe subset. Alternatively, Mix1:Mix2 ratios were normalized by dividing each ratio by the 10% trimmed mean ratio of the subset of kidney-selective reference probes.
Gene expression measurements on Agilent G4130A arrays
The four sites running the MTRRM on Agilent arrays used either standardized protocols for sample labeling and hybridization or a proprietary in-house method. Sites 3, 8 and 9 used the Agilent Low Input RNA Fluorescent Linear Amplification Kit (Part no. 5184-3523) for target labeling and the Agilent 60mer microarray processing protocol version 2.0 (Part no. G4140-90030). All sites used the Agilent DNA microarray scanner (Part no. G2565BA). Site 7 used DNase-treated MTRRM and proprietary protocols for target labeling, hybridization and washing. Mix1 and Mix2 samples were run on the same array, in dye-swap replicate experiments. For this phase of the project, the data was extracted and processed using a standard method. The TIFF images from all four sites were processed with Agilent Feature Extraction software version A.7.4.47 (a prerelease version that is algorithmically identical to v 7.5.1) using default settings. Adjustment for local variations in background signal was performed using a spatial detrending algorithm. To normalize and correct for dye bias, a combined method of linear scaling in each channel followed by LOWESS curve fitting (‘Linear&LOWESS’ option in version A.7.5) was used. Mix1 and Mix2 signals were calculated from the average of dye-swap replicates. For some applications, this step was followed by normalization of Mix1 to Mix2 using the 10% trimmed mean signal of the kidney-selective probe subset. Alternatively, Mix1:Mix2 ratios were normalized by dividing each ratio by the 10% trimmed mean ratio of the subset of kidney-selective reference probes. Features flagged as outliers by the feature extraction software were not removed from the analysis for this study.
To identify the probes on three commercial rat expression array platforms (Affymetrix RAE230A, Agilent G4130A and CodeLink UniSet Rat I) that were potential reporters of expression levels for the same gene transcripts, probe annotation data were intersected by GenBank accession number and/or UniGene identifier using the annotation files supplied by the manufacturers that were available in June 2003. Approximately 6300 probes were identified that could be intersected by annotation on all three platforms. This number includes duplicate listings when a probe on one platform could be linked to multiple probes on a second platform.
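The annotation intersection can be sketched in Python as follows. The sketch assumes each platform's annotation file has already been parsed into a mapping from probe ID to a set of GenBank accession numbers and/or UniGene identifiers; that data structure and the function names are hypothetical, and the actual matching rules used in the study may have differed.

```python
from collections import defaultdict
from itertools import product


def index_by_identifier(annotation):
    """Invert {probe_id: {identifiers}} into {identifier: {probe_ids}}."""
    index = defaultdict(set)
    for probe, identifiers in annotation.items():
        for identifier in identifiers:
            index[identifier].add(probe)
    return index


def intersect_platforms(*annotations):
    """Return sorted probe tuples (one probe per platform) that share an identifier."""
    indexes = [index_by_identifier(a) for a in annotations]
    shared = set(indexes[0]).intersection(*indexes[1:])
    links = set()
    for identifier in shared:
        # One row per probe combination, so one-to-many links yield duplicate listings.
        links.update(product(*(sorted(ix[identifier]) for ix in indexes)))
    return sorted(links)
```

Because a probe linked to several probes on another platform produces one row per combination, the row count includes duplicate listings of the kind noted above.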
Tissue-selectivity was determined using body map data, i.e. signal values averaged across multiple control animal samples for the individual tissues in the MTRRM. For each probe on each of the three platforms, a tissue-selective index (TSI) was determined as follows: the average signal value in a selected tissue was divided by the maximum average signal value among the other three non-selected tissues.
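As a minimal illustration of the TSI definition above, the following Python sketch computes the index for one probe from its four average body map signals. The function name and the example signal values are hypothetical.

```python
def tissue_selective_index(avg_signal, tissue):
    """Average signal in `tissue` divided by the highest average signal among the other tissues."""
    others = [signal for name, signal in avg_signal.items() if name != tissue]
    return avg_signal[tissue] / max(others)


# Hypothetical body map averages for a single probe.
example = {"brain": 900.0, "kidney": 100.0, "liver": 300.0, "testis": 50.0}
brain_tsi = tissue_selective_index(example, "brain")  # 900 / 300
```

A TSI above 1 means that the selected tissue gives the highest average signal for that probe; larger values indicate stronger selectivity.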
Body map data on Affymetrix RAE230A arrays was generated from the pooled tissue samples that are components of the MTRRM, as described above. Each sample was composed of RNA pooled from brain, kidney, liver or testes samples across eight male SD rats that were in the same shipment cohort. Using these samples from three biological replicate experiments, an average signal value was determined for each probe in each of the four tissues.
Body map data on CodeLink UniSet Rat I arrays was derived from individual control animal data from vehicle-treated male SD rats (vehicle not specified) provided by Iconix Pharmaceuticals. Data was excluded for probes which showed identified associations with process drift due to array protocol changes over time. An average signal value for each of 8565 probes in brain, kidney and liver RNA was calculated across 25 control animal datasets. Six control animal datasets were available to calculate an average signal in testes RNA.
Body map data on Agilent G4130A arrays was derived from individual control animal data generated through a collaboration between NIEHS and Iconix. Brain, kidney, liver and testis RNA samples from three SD rats that received a 0.5% carboxymethyl cellulose (CMC) vehicle treatment for 5 days were run on Agilent G4130A arrays. Each tissue sample was run once on each of the Cy3 and Cy5 channels in a dye-swap with the Iconix Reference RNA on the other channel. The Iconix Reference RNA is a pooled RNA extracted from an equal tissue mixture of 7 rat tissues taken from 10 male SD rats, vehicle-treated with 0.5% CMC for 3 days. The signal channel corresponding to the control tissue sample was separated from the reference sample channel for each dye-swap pair, resulting in one Cy3 and one Cy5 signal value for each probe for each of three control animal samples per tissue. An average signal value was calculated for each probe in each tissue from these six signal values.
Relative signal intensity in the MTRRM
Using data generated by participating labs, an average signal value in the MTRRM was calculated for each probe on each of the three microarray platforms. For each site, the average signal value across three replicate sets of Mix1 and Mix2 experiments was determined for each probe, expressed as a percent of the average maximal signal (%Max) in the same experimental set, and then averaged across all sites using the same microarray platform in the study (n = 3 for RAE230A, n = 2 for RU1, n = 4 for G4130A). The average %Max was used to sort probes into nine exponentially distributed bins as follows: <0.4; 0.4–0.8; 0.8–1.6; 1.6–3.2; 3.2–6.4; 6.4–12.8; 12.8–25.6; 25.6–51.2 and >51.2%Max, respectively.
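The %Max calculation and binning can be sketched in Python as below. The text does not state how values falling exactly on a bin boundary were assigned, so the sketch assumes that each bin includes its lower edge; the function names are hypothetical.

```python
from bisect import bisect_right

# Upper edges of the first eight bins, in %Max: 0.4, 0.8, 1.6, ..., 51.2.
BIN_EDGES = [0.4 * 2 ** i for i in range(8)]


def percent_of_max(avg_signals):
    """Express each probe's average signal as a percentage of the largest average signal."""
    top = max(avg_signals.values())
    return {probe: 100.0 * signal / top for probe, signal in avg_signals.items()}


def bin_index(pct_max):
    """Return 0 for <0.4 %Max up to 8 for the top bin (lower edges inclusive, an assumption)."""
    return bisect_right(BIN_EDGES, pct_max)
```

Each edge is twice the previous one, so the nine bins are evenly spaced on a log2 scale of relative signal intensity.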
MTRRM reference probe selection
For probes that could be linked across three platforms by annotation, tissue-selectivity and relative signal intensity were weighed together to derive a list of candidate probes for the MTRRM. Probes were first sorted into nine exponentially spaced bins based upon their %Max on the Affymetrix platform. From each bin, 5–8 probes with the highest combined TSI values for each platform were chosen in order to select ~200 probes in total (~50 per tissue). The selected probes were then binned according to their %Max on the Agilent and CodeLink arrays and reselected, if necessary, to achieve a similar distribution on each platform. Probes for tissue-selective gene transcripts that did not receive a MAS 5.0 present call on all of the selective tissue samples run on Affymetrix RAE230A for this study were not chosen for the analyte list. Testes-selective genes with signal intensities <0.8%Max were also excluded because, in this intensity range, the contribution of non-selective signal to selective signal greatly attenuated the observed ratio for these probes.
To confirm that the candidate MTRRM analytes measured the same gene transcripts on each platform, probes were sequence mapped to a common exemplar. Using annotations from UniGene Build 135, probes were aligned against the corresponding NCBI Reference Sequence database (RefSeq) sequence (22) or, if not available, a common mRNA or EST sequence. For a few exemplars that were not RefSeqs, probes aligned to the reverse complemented strand of the GenBank sequence. Cross-platform intersected probes that could not be mapped to a common exemplar sequence were filtered from the list. For one of the analytes, a single exemplar sequence was not found that contained the probe sequences for all three platforms, so two overlapping exemplars are listed in Supplementary Table 1. Gene symbol and RefSeq status were updated for all exemplars using the information available in the NCBI public databases in November 2005.
In silico modeling of microarray ratio measurements
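An average signal intensity ($I_{\mathrm{Tissue}}$) for each probe in each of the four tissues was calculated from body map data available for each platform and used to calculate a modeled ratio ($R_{\mathrm{Analyte}}$) for each analyte based upon tissue RNA proportions in Mix1 and Mix2 using the following formula:

\[
R_{\mathrm{Analyte}} = \frac{0.1\,I_{\mathrm{Testis}} + 0.3\,I_{\mathrm{Liver}} + 0.4\,I_{\mathrm{Brain}} + 0.2\,I_{\mathrm{Kidney}}}{0.4\,I_{\mathrm{Testis}} + 0.2\,I_{\mathrm{Liver}} + 0.2\,I_{\mathrm{Brain}} + 0.2\,I_{\mathrm{Kidney}}}
\]

where the coefficients are the mass fractions of each tissue RNA in Mix1 (numerator) and Mix2 (denominator). An average ratio for each set of tissue-selective analytes was calculated from 46–55 individual $R_{\mathrm{Analyte}}$ values.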
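The modeled ratio can be sketched in Python as the quotient of two mass-fraction-weighted sums of body map signals. The µg amounts are those given for Mix1 and Mix2 in the batch preparation section. The assumptions that signal scales linearly with the RNA mass fraction and that the model is a simple weighted sum are ours, and the function names are hypothetical.

```python
# Tissue RNA masses (µg) in each 2 mg mixture, from the batch preparation section.
MIX1_UG = {"testis": 200, "liver": 600, "brain": 800, "kidney": 400}
MIX2_UG = {"testis": 800, "liver": 400, "brain": 400, "kidney": 400}


def mass_fractions(mix_ug):
    """Return each tissue's share of the total RNA mass in a mixture."""
    total = sum(mix_ug.values())
    return {tissue: ug / total for tissue, ug in mix_ug.items()}


def modeled_ratio(body_map_signal):
    """Mix1:Mix2 ratio predicted from per-tissue average signals (assumes linear mixing)."""
    f1 = mass_fractions(MIX1_UG)
    f2 = mass_fractions(MIX2_UG)
    mix1 = sum(f1[t] * body_map_signal[t] for t in f1)
    mix2 = sum(f2[t] * body_map_signal[t] for t in f2)
    return mix1 / mix2
```

Under these assumptions, a transcript detected only in brain gives a modeled ratio of 2, one detected only in testis gives 0.25, and any signal from the non-selective tissues pulls the ratio toward 1. This is consistent with the attenuation noted above for low-intensity testis-selective probes.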
Quantitative reverse transcription polymerase chain reaction (qRT–PCR)
Relative gene transcript levels between Mix1 and Mix2 were determined using qRT–PCR. cDNA was generated from total RNA using random hexamer primers and Superscript II reverse transcriptase (Invitrogen). qRT–PCR was performed using SYBR Green PCR Master Mix reagents (Applied Biosystems) on the ABI Prism 7900HT sequence detection system as described in User Bulletin #2 (updated 10/2001). Seven 2-fold serial dilutions were used to prepare relative standard curves for each of the targets and their endogenous reference (18S rRNA). Gene expression data were normalized by dividing the amount of target mRNA by the amount of the endogenous reference. Relative changes in gene expression were calculated by dividing the normalized amount of target mRNA in Mix1 by the normalized amount in Mix2.
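The relative standard curve calculation can be sketched in Python as follows. This is an illustration under assumptions rather than the procedure used in the study: the standard curve is fitted as a least-squares line of threshold cycle (Ct) against log10 of relative input, and the function names and the layout of the Ct inputs are hypothetical.

```python
import math


def fit_standard_curve(relative_inputs, ct_values):
    """Least-squares fit of Ct = slope * log10(input) + intercept."""
    xs = [math.log10(q) for q in relative_inputs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct, curve):
    """Interpolate a relative quantity from a Ct value using a fitted curve."""
    slope, intercept = curve
    return 10 ** ((ct - intercept) / slope)


def relative_change(target_ct, reference_ct, target_curve, reference_curve):
    """Mix1:Mix2 ratio of target quantity after normalizing to the endogenous reference."""
    normalized = {
        mix: quantity_from_ct(target_ct[mix], target_curve)
        / quantity_from_ct(reference_ct[mix], reference_curve)
        for mix in ("Mix1", "Mix2")
    }
    return normalized["Mix1"] / normalized["Mix2"]
```

With seven 2-fold dilutions and ideal amplification, the Ct rises by one cycle per dilution step, so a target that appears one cycle earlier in Mix1 than in Mix2, at equal 18S rRNA Ct, corresponds to a relative change of about 2.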
High-performance liquid chromatography (HPLC)-purified oligonucleotide primers were obtained from BioServe Biotechnologies. The UniGene name, symbol, sequence accession number and primer sequences for qRT–PCR for each target transcript are provided below.
The three brain-selective targets were chromogranin B (Chgb, NM_012526.1, forward primer: GGAAAAGTTCAGCCAGCGG, reverse primer: CAGCGAATGGCTCGTCTCTC), neurofilament 3, medium (Nef3, NM_017029.1, forward primer: TGTACCTAGGGAATTTGCCAGTTT, reverse primer: CGAGTGCCCCTCTTTCAACA), and neurofilament, light polypeptide (Nfl, NM_031783, forward primer: GACCTCCTCAATGTCAAGATGG, reverse primer: TCGCCTTCCAAGAGTTTCCT). The kidney-selective targets were kidney-specific membrane protein (Tmem27, NM_020976.1, forward primer: GAAATTTCCCACGTCCTGCTTT, reverse primer: GCACTGTTGATCCGTTTCCTGT) and trefoil factor 3 (Tff3, NM_013042.1, forward primer: AGCTCCACACCCTGGACTCTT, reverse primer: TGAGTGTTACCCTGGGCCAC). The two liver-selective targets were hepatic lipase (Lipc, NM_012597, forward primer: GCTCCCATCCACTTGTCATGA, reverse primer: TTTCTAGCAAGCCATCCACCG) and complement component 9 (C9, NM_057146.1, forward primer: CATGTCAAAACGGAGGCACA, reverse primer: TGCACTGTTGATCCGTTTCCT). The three testis-selective targets were phosphorylase kinase, gamma 2 (Phkg2, M73808, forward primer: AACTGTGCCTTCCGGCTCTA, reverse primer: CTGCTGCTCCCCCTTCTTC), A kinase anchor protein 4 (Akap4, NM_024402, forward primer: AAACAAGACCAGCCTAAGACGG, reverse primer: GAGGAGCCAGTTGAGGACACTT), and ATPase, Na+/K+ transporting, alpha 4 polypeptide (Atp1a4, NM_022848, forward primer: TGGATGAGCTGAGTGCCAAGT, reverse primer: CGTCTGTGACGCTAAGACCCTT). Nfl, Lipc, Akap4 and Atp1a4 were not selected to be on the final cross-platform list of MTRRM analytes.
Two ANOVA models were applied to identify the major sources of variability within the MTRRM data collected on three platforms from eight sites using three biological replicate batches. A two-way ANOVA, using S-PLUS®, was used to study the tissue and gene effects as well as their interactions. A mixed-model three-way ANOVA was performed using Partek® software (Copyright, Partek Inc.). Partek and all other Partek Inc. product or service names are registered trademarks or trademarks of Partek Inc. This model was used to determine the contribution to variation by the platform, batch and site effects. The input data for these models are the batch-specific ratios for 199 tissue-selective analytes without the Mix1:Mix2 normalization step. Affymetrix signal estimates were calculated using PLIER. Four probes were excluded from the analysis because they were masked as suboptimal probes in the Manufacturing Slide Report files for the lots of CodeLink arrays used in this study.
One-sample t-tests comparing replicate Mix1:Mix2 ratios (for single analytes or across sets of analytes) to a theoretical mean were performed using GraphPad Prism version 3.03 for Windows (GraphPad Software, San Diego, CA). Two-sample t-tests were performed in Excel and Partek software. ANOVA within body map data was performed using Partek Genomics Solutions software version 6.1. Multiple comparison corrections were performed using the Benjamini-Hochberg False Discovery Rate procedure within Partek software.
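For readers unfamiliar with the Benjamini-Hochberg correction, the standard step-up procedure can be sketched in Python as below. This is a generic textbook formulation with a hypothetical function name; it is not a description of the Partek implementation, whose details are not given here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then called significant at a chosen false discovery rate when its adjusted p-value falls at or below that rate.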