Evaluation of array design and manufacture parameters with test oligonucleotide microarray
A custom MA comprised of 3,900 oligonucleotide probes was developed to evaluate the impact of design parameters (for details see Additional file 1
, Additional file 2
: Figure S1 and Additional file 3
: Figure S2). It contained multiple probes for 929 distinct genes as well as cDNA amplicons for 96 of those genes. The parameters tested include oligonucleotide lengths of 50, 60 and 70 nucleotides, the position of probes within the transcript, the impact of SNPs and indels. The presence of one or three SNP mismatches (distributed throughout the oligonucleotide probes) had a small effect on hybridization signal intensities and in many cases the expression ratio between tissues was largely conserved (See Additional file 2
: Figure S1). In contrast, the presence of seven SNP mismatches in the probe had a large effect on intensities and expression ratios. Based on these data, 70-mer probes that vary by up to three SNPs distributed throughout the probe (95.7% sequence identity) are expected to hybridize to the same RNA and on average produce a signal of similar strength, where as probes with seven SNPs or more (less than 90% sequence identity) give very little cross hybridization. These observations establish thresholds of sensitivity to sequence variation, i.e. up to three mismatches have little impact on sensitivity, and specificity, i.e. specificity is achieved with seven mismatches or more to other sequences. The cut-off is situated between four and six SNPs, i.e. between 95% and 91% sequence identity. The presence of short insertions and deletions (up to six nucleotides) located at the center of the 70-mer probes had a small impact on probe performance. The impacts of other parameters tested were generally small and less predictable.
The impact of different spotting buffers and surface chemistries used to manufacture the microarray were also assessed in regard to image quality and data reproducibility. For details on methods and findings, see Additional file 1
. We found that the optimum conditions were obtained by using aminosilane coated slides and 3X SSC without betaine as a spotting buffer.
Microarray design and manufacture
The sequences included in the microarray were selected on the basis of reproducible sequence quality from all of the ESTs and FL-cDNA described for P. glauca in Rigault et al. [13
] and P. sitchensis in Ralph et al. [14
] (Table ). To obtain a robust probe set, we selected sequences that were either detected in the two species, verified with two technologies (Sanger and 454) or were derived from a FL-cDNA (Table ). The probes were 70 nucleotides in length, and were designed to minimize similarity with other sequences in the dataset. Sequence similarity between the probes and known sequences in P. sitchensis and P. glauca was determined from sequence alignments. The microarray contains 25,045 probes that match with known P. glauca sequences; however, they represent 23,853 unique genes based on the most recent clustering [13
], such that 1,017 genes are represented by more than one probe.
The microarray consists of 33,984 spotted features including 33,024 sample spots (Invitrogen, Carlsbad, CA, USA), 240 negative buffer spots, and 480 Spot Report Alien Oligos (Agilent Technologies, Inc., Santa Clara, CA, USA). Oligonucleotides were consolidated into 384-well plates, lyophilized by speed-Vac, and resuspended in 3X SSC to a printing concentration of 30 μM. Oligos were printed on aminosilane slides (Erie, Hudson, NH, USA) with a QArrayMax microarray printer (Genetix Limited, Hampshire, UK) using 946MP2 microarray pins (ArrayIt Corp, Sunnyvale, CA, USA) in a 48-pin tool depositing ~0.5 nL per spot onto the slide. The resulting microarrays had a 4 x 12 subgrid layout with 708 spots per subgrid, each spot having approximate diameter and pitch of 90 μm and 160 μm, respectively. A 280-bp GFP (green fluorescent protein) oligonucleotide was printed in subgrid corners to assist in grid alignment during image processing. The slides were crosslinked in a UV Stratalinker 2400 (Agilent Technologies, Inc.) at 300 mJ. Array quality was assessed by visual inspection and hybridization of representative slides from a print run by dye-labeled random 9-mer oligonucleotides. The quality control images were acquired via the GenePix 4200AL scanner (Molecular Devices, CA, USA) at a 10 micron resolution and quantified with the Imagene 8.0 software suite. Oligonucleotide library management, printing of microarrays and quality control was performed by the Genome BC Microarray Platform (Vancouver, BC, Canada).
The array design details are available in the Gene Expression Omnibus (accession No. GPL15033). All of the transcript profiling data is also deposited in GEO (available upon acceptance of this manuscript for publication).
The origin, genotypes and method of collection of each type of material are described in Table , except for details described here. All of the tissue samples were frozen in liquid nitrogen immediately after removal from the trees, the seed or tissue culture vessels, and stored at −80°C until further use.
•Vegetative Buds: Buds were collected from branches of several clonal replicates of 9-year-old trees of P. glauca regenerated from two genetically distinct somatic embryogenesis lines as described [37
] during the mid-Spring when the buds were just beginning to grow. For each genotype, five biological samples each consisting of 6 buds (approximately 80 mg fresh weight) were used for analysis.
•Secondary xylem, phelloderm (including phloem), young needles of juvenile trees: Nursery planting stock (from open-pollinated seed lots) were obtained as 3-year-old seedlings of Picea glauca, Picea mariana, Picea abies, Picea Sitchensis, Pinus strobus, Pinus resinosa and Larix laricina, were transferred to 8-inch pots and grown in a greenhouse under natural light conditions. Sampling of tissues was timed with the ending of primary shoot elongation, i.e. after 6 to 8 weeks of growth, and was as described [22
]. For each tissue type five biological samples were prepared by pooling 6 independent trees within each species. These materials were used for interspecific comparisons (all 6 species) and for the development of PiceaGenExpress (P. glauca only).
•Megagametophytes: Control-pollinated seed (cross C962856) were obtained from P. glauca tree (80112) described in [38
], were surface-sterilized for 1 minute in 70% EtOH and 10 minutes in 3% Na-hypochlorite, washing the seeds with sterile water three times between and after treatments. The seeds were then immersed in 4°C sterile water for 24 hours and stratified at 4°C for 28 days. Next, the seeds were moved to 26°C on petri dishes with a moist paper and kept in the dark to start the germination process. After 4 hours of incubation the seeds were opened and the embryo was separated from the megagametophyte under a dissecting microscope. A total of 3 biological samples comprised of a single megagametophyte were used for analysis.
•Adventitious roots: Norway spruce (P. abies) seedlings were grown in the experimental nursery of Finnish Forest Research Institute for about one and a half years before sampling. They were grown in standard nursery growing media, light Sphagnum peat, and fertilized with mineral nutrients. Roots were washed under tap water to remove surrounding peat and approximately 1 cm of the root ends was collected for analysis. Between two and four biological samples comprised of several root ends were analyzed for each of eight different genotypes.
RNA extraction, labelling and hybridization
Total RNA was extracted following Chang et al. [39
] as described in Pavy et al. [22
] for all of the sample types, except for megagametophytes where poly A
RNA was extracted directly by using polyT coated magnetic beads (Dynal). One μg of total RNA was amplified for each sample replicates, except for the megagametophyte samples where 10 ng of poly A
RNA was used, with the Amino Allyl MessageAmp II aRNA Amplification Kit (Applied Biosystems by Life Technologies, Carlsbad, CA, USA) according to manufacturer’s instructions. Five μg of amplified RNA (aRNA) was then labelled with Alexa Fluor 555 or 647 dyes (Invitrogen, Carlsbad, CA, USA) and purified as per the manufacturer’s instructions. Dye incorporation efficiency was determined by using a Nanodrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) following the manufacturer’s instructions. Depending on the experiment, each microarray was hybridized with one labeled aRNA sample or two samples labelled with different dyes. The sample(s) to be hybridized to a microarray were mixed and the volume was reduced to ~10 μl by evaporating excess water in a DNA 120 speedvac (Thermo Fisher Scientific). Labelled aRNAs were fragmented for 15 minutes at 70°C using Ambion’s ”RNA Fragmentation Reagents“ (Applied Biosystems), placed on ice for 1 minute, denatured for 2 minutes at 95°C, put on ice for 2 min and resuspended in 120 μl hybridization buffer (50% formamide, 5X SSC, 0,1% SDS, 0,1 mg/mL Herring sperm DNA) preheated to 55°C. Samples were kept in a heating block at 50°C until hybridization.
Hybridizations were performed in HS400Pro hybridization stations (Tecan Group Ltd., Männedorf, Switzerland). The slides were heated at 80°C for 10 minutes, then washed once at 37°C with 0.5X SSC, 0.1% SDS for 20 seconds and once at 50°C with 2X SSC, 0.5% SDS for 20 seconds, and prehybridized for 1 hour at 65°C in 5X SSC, 0.1% SDS, 0.1 mg/ml BSA, 0.1 mg/ml Herring Sperm DNA. Next the slides were washed at 55°C with 2X SSC, 0.5% SDS for 1 minute with a 30 second soak and washed again at 45°C for 1 minute with the same solution. The resuspended labeled targets were injected into the chambers and hybridized for 16 hours at 45°C with sample agitation. The slides were then washed as follows: 2 times 1 minute 30 seconds at 45°C with 30 seconds soaking in 2X SSC, 0.5% SDS, 1 time 1 minute at 45°C in 2X SSC, 0.5% SDS, 2 times 1 minute 30 seconds at 45°C with 30 seconds soaking in 0.5X SSC, 0.1% SDS, 1 time 1 minute at 37°C with 20 seconds soaking in 0.5X SSC, 0.1% SDS, 1 time 1 minute at 23°C with 20 seconds soaking in 0.5X SSC, 0.1% SDS, 1 time 1 minute 30 seconds at 23°C with 30 seconds soaking in 0.1X SSC, 1 time 30 seconds at 23°C in 0.1X SSC and 2 times 30 seconds at 23°C in milliQ filtered water. Finally slides were dried for 2 minutes 30 seconds with nitrogen gaz. Slide scanning and image processing were performed as described in Table .
Microarray data processing and analysis
A procedure was developed to process and analyze microarray intensity data from single dyes, as opposed to fold change methods routinely used for spotted arrays. Data analyses were performed using customized scripts for R and Bioconductor (http://www.r-project.org
). Spots that were flagged as presenting abnormal morphology during the image processing were replaced by mean value of the remaining spots of the same probe from the other slides from the same sample type. Background intensities were subtracted from the foreground intensities. Background-subtracted data were log2-transformed and normalized using quantile correction approach.
A filtering step was applied to select positive genes to be used for further analysis. The mean intensity of spots containing buffer only was calculated for each row of sub-grids, and was taken as the minimum intensity of probes for that subgrid. A probe was called positive (detected above background) when its signal intensity was above the buffer intensity on at least 50% of slides within a given sample type. When determining differential expression, positive probes were probes that were detected according to this criterion in at least one of the tissues (e.g. phelloderm and xylem juvenile). Mean probe intensity was determined for genes represented by more than one positive probe. All microarray experiment data has been submitted to the Gene Expression Omnibus (GEO) under accession numbers GSE35624, GSE35847 and GSE35922.
Statistical testing for differential gene expression used the linear modeling approach and the empirical Bayes statistics [40
]; the p-values were adjusted for multiple testing according to Benjamini and Hochberg [41
]. Differential genes were those meeting an adjusted p-value ≤0.01, unless stated otherwise.
The PiceaGenExpress database
The PiceaGenExpress database was developed from transcript profiles obtained for eight different tissue types coming from five independent experiments (see Table ). For this, the average signal intensity was determined from all of the slides available for a sample type. The genes were then ranked based on their average signal intensities within a tissue type and equally divided into 10 separate classes according to their signal intensity. Genes from class 1 or class 10 were the 10% with lowest and highest signal intensities, respectively (for the number of genes per class in each type, see Additional file 6
: Table S2 and Additional file 8
: Figure S5). Functional annotations were based on the matches with Arabidopsis proteins (TAIR 9 release) with E-value <1e-10 and on the detection of Pfam domains as described [13
]. PiceaGenExpress is made available as a flat file (Additional file 5
: Table S1), which may be uploaded to any type of data processing or spreadsheet platform.
Agglomerative hierarchical clustering with the complete linkage method was performed using hclust function in R [42
] on the expression levels for two gene families, i.e. dehydrins and LEA proteins. This approach used a similarity matrix based on Euclidean distance. A smaller distance means that two genes or clusters have more similar expression levels in the tissues analyzed. First, the clustering analysis placed each gene into its own singleton group or cluster. Second, the closest clusters were iteratively joined together until all genes were merged into a single cluster based upon similarity/distance measures between clusters. Dendrograms showing clusters of genes were drawn (Additional file 9
: Figure S6).
RNA-Seq data processing and analysis
Two composite samples were analysed by RNA- Sequencing for validation and comparison to MA results. Samples were prepared by combining equal molar amounts of the P. glauca RNAs isolated from secondary xylem (juvenile trees) and phelloderm. These samples were also used in the MA profiling of each of the tissues. RNA-Sequencing, filtering of quality reads and mapping of reads onto the cDNA clusters were described [13
]. The number of sequences matching each cDNA cluster was normalized by transforming the data to the number of reads per kilobase of exon model per million mapped reads (RPKM) following the method of Mortazavi et al. [24
]. Unless specified otherwise, an RPKM >1 was used as a minimum threshold of detection for RNA-Seq. Differential expression of genes was determined by using the Chi-squared test corrected for multiple testing according to Benjamini and Hochberg [41