Blood Collection and RNA Extraction
This study was approved by Institutional Review Boards of the University of Queensland and the Wesley Research Institute, and was conducted with permission of the Red Cross Society of Australia. Written informed consent was obtained from all participants and informed assent was provided by parents of newborn children. Blood was collected by venipuncture (or extraction from the umbilical cord immediately after birth). Within 2 minutes of sampling, between 5 and 10 ml of blood were washed through a LeukoLock size-fractionation filtration system (Ambion/Applied Biosystems, Austin TX) that traps leukocytes while allowing most platelets, erythrocytes, and serum to pass through. Following manufacturer's recommendations, the filters were immediately infiltrated with RNAlater solution and stored frozen at −20°C until preparation of RNA.
RNA samples for the mother-newborn study were collected following informed consent from pregnant mothers enrolled as patients in the medical practice of Graham Tronc, ObGyn, at Brisbane Private Hospital. All pregnant mothers in this study were encouraged to take a vitamin supplement rich in folic acid, either Blackmore's Gold Pregnancy, Elevit, or Fefol, but no data on intake is available. Maternal samples were collected between the 30th and 36th week of pregnancy during regular visits and were processed through the LeukoLocks by staff of Gribbles Pathology Associates. Umbilical cord blood samples were processed at birth by GG or EM, with the majority of deliveries by caesarian section arranged in advance between the hours of 8am and 11am. All samples for this study were collected between August 2008 and February 2009. In order to avoid batch effects, we did not extract RNA until immediately prior to microarray hybridizations after all samples had been collected. At this time, we discovered that a large number of the samples had been affected by significant RNA degradation with the result that only 56 of the 90 originally collected samples yielded RNA of minimal acceptable quality for hybridizations. The cause of the degradation is unknown, but we suspect a change in the pH of the PBS or RNAlater solutions used for washes over several months and/or variation in the sealing of the filters before storage by multiple individuals engaged in the processing. As a consequence, only 16 mother-newborn pairs were included in the study, with the remaining 24 samples from only the mother or baby.
Blood samples for the cross-sectional study were collected with permission of the Red Cross Society of Australia and under informed consent from 100 participants (50 men and 50 women) between the ages of 18 and 68 (mean 44). BMI ranged from 17.7 to 51.3 with a mean of 26.6. Collection was carried out by EM at mobile Red Cross vans at 10 locations (Bellbowrie, Capalaba, Dayboro, Eagle Farm, Kenmore, Mount Ommaney, South Brisbane, St Lucia, Virginia, and Woolloongabba) distributed across the city, most with multiple sampling days and 5–10 samples per day, between March and June 2009. Each of these samples was also tested for the presence of antibodies against EBV and CMV using a rapid ImmunoDOT Mono-G test-strip assay (GenBio, San Diego CA). Individuals also provided a limited amount of information by written survey regarding their location of residence, age, gender, weight and height, and recent pharmaceutical usage. Sample features are listed in Table S2.
Following extraction of total RNA from the LeukoLock filters according to manufacturer-recommended protocols, the RNA concentration was estimated with a NanoDrop 1000 spectrophotometer and quality was assessed with an Agilent Bioanalyzer. RIN (RNA Integrity Number) scores for the samples included in the mother-newborn study ranged from 3.2 to 9.3 (median 6.75; 23 samples greater than 7.0). RIN scores for the samples included in the Red Cross study ranged from 4.9 to 9.5 (median 8.75; 66 samples greater than 8.0).
Flow Cytometry
In order to estimate the relative abundance of cells, flow cytometry was performed on a BD LSR-II at the Queensland Brain Institute. EDTA-treated whole blood was incubated in FACS Lysing Solution (BD Biosciences, San Jose, CA) and stained with diagnostic monoclonal antibodies (BD Biosciences) against seven different cell-surface cluster of differentiation (CD) antigens: CD3, CD4, CD8, CD16, CD19, CD45, and CD123. Each batch of samples was kept at room temperature and processed the day immediately following blood collection, such that no samples were frozen or stored for longer than 24 h at room temperature. An unstained control was also tested for each sample, as well as positive controls for each of the seven antibodies from one of the samples in each batch.
Cell counts were analyzed using Weasel 2.6.1 software developed by the Walter and Eliza Hall Institute (Melbourne, Australia) downloaded from http://en.bio-soft.net/other/WEASEL.html. Contour plots for pair-wise comparisons of channel intensities were drawn, and 91.67% densities selected manually. Samples were included where the sum of CD4+ and CD8+ was within 85% of the CD3+ count, and where the sum of CD3+, CD16+ and CD19+ cells was greater than 80%. CD4 averaged 14.3% with a standard deviation of 4.0%, and CD8 averaged 6.7% with a standard deviation of 3.7%. Similar results were obtained using the automated gating options with BD's FACSDiva software.
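The sample inclusion rule can be written as a simple filter. The sketch below is illustrative only and rests on two assumptions that the text does not spell out: that all values are percentages of the same gated population, and that "within 85%" means the CD4+ plus CD8+ total is at least 85% of the CD3+ value. The function name and the example values are hypothetical.

```python
def passes_gating_qc(cd3, cd4, cd8, cd16, cd19,
                     t_cell_ratio=0.85, coverage_min=80.0):
    """Return True if a sample meets both inclusion criteria.

    All inputs are assumed to be percentages of the same gated population.
    Criterion 1 (assumed reading): CD4+ + CD8+ is at least 85% of CD3+.
    Criterion 2: CD3+ + CD16+ + CD19+ exceeds 80%.
    """
    t_cells_consistent = (cd4 + cd8) >= t_cell_ratio * cd3
    coverage_ok = (cd3 + cd16 + cd19) > coverage_min
    return t_cells_consistent and coverage_ok


# Hypothetical samples, not data from the study.
good = passes_gating_qc(cd3=22.0, cd4=14.3, cd8=6.7, cd16=55.0, cd19=5.0)
bad = passes_gating_qc(cd3=30.0, cd4=14.3, cd8=6.7, cd16=40.0, cd19=5.0)
```

Under a different reading of "within 85%" (for example, a two-sided tolerance around the CD3+ count), only the first comparison would need to change.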
Gene Expression Profiling
Each RNA sample was reverse transcribed and labeled with Cy3 dye according to standard Illumina protocols (using 500 ng of RNA and 14 h incubation for the IVT reaction), and hybridized to an Illumina HT-12 bead array with over 48,000 probes, each represented by between 20 and 80 individual beads. Two blocks of hybridizations were performed, one for each study, a couple of months apart. Within each study, samples were randomized across the arrays with respect to maternal or newborn origin of the sample, or with respect to collection site and location of residence. Arrays were scanned on an Illumina BeadArray Scanner and data was extracted with Genome Studio Software. Standard array quality measures indicated high quality hybridization and all samples were taken forward for further analysis. The mother-newborn study included one technical replicate (both samples cluster adjacent to one another in all subsequent analyses), and the Red Cross study included six technical replicates (four of which cluster adjacent to one another, one clusters very close, and the remaining one was disparate but in the same broad profile group).
Statistical Analyses
Raw probe summary data was exported into Microsoft Excel and transformed on the log base 2 scale. Each study was analyzed separately. In order to reduce the dataset to include only genes that are expressed above background, we first computed the average expression level for each probe across all of the samples, and plotted these averages in rank order. The inflection point of this curve suggested a conservative cutoff including 15,000 probes for each study, of which 13,715 (91.4%) representing 10,987 different genes were included in both analyses. The log2 values for each of the 15,000 probes were then imported into JMP Genomics 4.0 (SAS Institute, Cary NC) for all subsequent analyses. These values are available as Table S3, and the raw array measures are available at the Gene Expression Omnibus (GEO) repository as series GSE21345 with sub-series GSE21311 and GSE21342 for the Red Cross and Mother-Newborn studies respectively.
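The probe filtering step can be illustrated with a short sketch. This is not the spreadsheet procedure used in the study; it only shows the same logic (log2 transform, mean per probe across samples, rank, keep the top N) on toy data. The function name, probe identifiers and intensities are hypothetical, and the cutoff `n_keep` stands in for the 15,000 probes chosen by inspecting the rank curve.

```python
import math


def filter_expressed_probes(raw, n_keep):
    """Keep the n_keep probes with the highest mean log2 intensity.

    raw: dict mapping probe id -> list of raw intensities, one per sample.
    Returns a dict mapping the retained probe ids -> log2 intensities.
    """
    log2_values = {p: [math.log2(v) for v in vals] for p, vals in raw.items()}
    mean_log2 = {p: sum(v) / len(v) for p, v in log2_values.items()}
    # Rank probes from highest to lowest average expression.
    ranked = sorted(mean_log2, key=mean_log2.get, reverse=True)
    return {p: log2_values[p] for p in ranked[:n_keep]}


# Toy data with hypothetical probe names; the study retained 15,000 probes.
raw = {
    "probe_a": [1024.0, 2048.0, 4096.0],
    "probe_b": [64.0, 64.0, 64.0],
    "probe_c": [256.0, 512.0, 256.0],
    "probe_d": [70.0, 60.0, 80.0],
}
expressed = filter_expressed_probes(raw, n_keep=2)
```

Applying the same function to each study separately and intersecting the retained probe ids would give the overlap between the two probe sets.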
JMP Genomics provides a versatile analysis environment with workflows for performing quality control and data normalization, supervised and unsupervised clustering, and analysis of variance of gene expression profiles. Initial exploratory analyses indicated that the raw data is influenced by a variety of technical artifacts, principally RNA quality (samples below RIN 6.5 cluster quite distinctly from those above RIN 6.5), and in the Red Cross study an array effect that significantly differentiated 5 of the arrays from the other 4 (each array was hybridized with 12 samples). Three individuals (mother-newborn) and two individuals (Red Cross) were removed since they had outlying profiles, and after exclusion of one of the technical replicates where present, 56 and 100 samples were available for analysis.
The following statistical pipeline was adopted. First, the complete profiles of 15,000 probes per sample were inter-quartile transformed (we also explored other transformations but did not observe gross distortion of the conclusions). We then performed hierarchical clustering and observed that while RNA quality was a major influence, consistent structure could be seen within the high and low quality groups, so we decided to remove this effect statistically. We fit an analysis of variance model to each probe, modeling expression as a function of RNA quality with four approximately equal-sized categorical levels in the mother-newborn study (RIN<5.3; 5.3<RIN<6.5; 6.5<RIN<7.7; RIN>7.6) or three categorical levels in the Red Cross study (RIN<8.0; 7.9<RIN<9; RIN>8.9), as well as the dichotomous array effect in the Red Cross study. The standardized residuals from this model were carried forward for all subsequent modeling.
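The adjustment for RNA quality can be sketched as follows. This is a minimal illustration rather than the JMP Genomics implementation: the inter-quartile transform is assumed here to mean centring each sample on its median and scaling by its inter-quartile range, the model is reduced to a single categorical factor (the array effect is omitted), and the cut points, function names and data are hypothetical.

```python
import statistics

# Assumed cut points, approximating the bin edges quoted for the mother-newborn study.
MN_RIN_CUTS = (5.3, 6.5, 7.7)


def iqr_transform(sample):
    """Centre one sample's log2 values on the median and scale by the IQR."""
    q1, q2, q3 = statistics.quantiles(sample, n=4)
    return [(v - q2) / (q3 - q1) for v in sample]


def rin_category(rin, cuts):
    """Bin index for a RIN score: the number of cut points it meets or exceeds."""
    return sum(rin >= c for c in cuts)


def standardized_residuals(values, groups):
    """Residuals of a one-way model (value minus group mean) divided by the residual SD."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    resid = [v - means[g] for v, g in zip(values, groups)]
    dof = len(values) - len(means)  # residual degrees of freedom
    sd = (sum(r * r for r in resid) / dof) ** 0.5
    return [r / sd for r in resid]


# Toy example: one probe measured in six samples with hypothetical RIN scores.
rins = [4.8, 5.0, 6.0, 6.2, 8.1, 8.4]
probe = [7.0, 7.4, 8.0, 8.2, 9.1, 9.5]
groups = [rin_category(r, MN_RIN_CUTS) for r in rins]
adjusted = standardized_residuals(probe, groups)
```

In this sketch `iqr_transform` would be applied to each sample across all probes first, and `standardized_residuals` then applied to each probe across samples.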
Subsequently, we calculated the principal components of the gene expression profiles, retaining the first 5 PCs in each analysis as documented in the text. One of the outputs of the JMP Genomics expression workflow is an estimate of the proportion of specified experimental factors that is explained by each PC. This revealed that PC2MN in the first study precisely corresponds to the distinction between mother and newborn, and that PC3RC in the second study reflects in part the distinction between inner city and suburban residents. It also showed that neither RNA quality (RIN score) nor array effects contribute to the variation in the transformed data, and that body mass index was a very minor component of the variation in either study.
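A sketch of this step, assuming NumPy is available, is given below. It computes principal components by singular value decomposition and then asks how much of a component's variance is attributable to a categorical factor, using the R-squared of a one-way model. This is an analogous quantity chosen for illustration, not necessarily the statistic that JMP Genomics reports, and the function names and simulated data are hypothetical.

```python
import numpy as np


def principal_components(X, n_pc=5):
    """X: samples x probes. Returns (scores, fraction of variance per PC)."""
    Xc = X - X.mean(axis=0)  # centre each probe
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_pc] * s[:n_pc]
    explained = (s ** 2 / np.sum(s ** 2))[:n_pc]
    return scores, explained


def variance_explained_by_factor(score, labels):
    """Share of one PC's variance accounted for by a categorical factor (one-way R^2)."""
    score = np.asarray(score, dtype=float)
    labels = np.asarray(labels)
    grand = score.mean()
    ss_total = np.sum((score - grand) ** 2)
    ss_between = sum(
        np.sum(labels == g) * (score[labels == g].mean() - grand) ** 2
        for g in np.unique(labels)
    )
    return float(ss_between / ss_total)


# Simulated data: 16 samples, 50 probes, with a group shift in the first 20 probes.
rng = np.random.default_rng(0)
labels = np.array(["mother"] * 8 + ["newborn"] * 8)
X = rng.normal(size=(16, 50))
X[8:, :20] += 4.0
scores, explained = principal_components(X, n_pc=5)
r2_pc1 = variance_explained_by_factor(scores[:, 0], labels)
```

In the simulated example the group shift dominates the first component, so `r2_pc1` is close to 1; in real data the same calculation would be repeated for each retained component and each recorded factor.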
Gene significance was evaluated with a combination of ANOVA and multiple linear regression. For the mother-newborn study, probe intensity was modeled as a function of the fixed categorical contrasts BP1A against BP1B and of Mother against Newborn, with PC3MN, PC4MN and PC5MN as covariates. For the Red Cross study, probe intensity was modeled as a function of PC1RC, PC2RC and PC3RC using the ANOVA routine in JMP Genomics in the absence of a fixed categorical effect. Axes from each study were compared simply by correlating the estimated regression slopes from these analyses for all probes.
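The comparison of axes between studies can be sketched as follows, assuming NumPy is available. Each probe is regressed on a set of covariates by ordinary least squares, and the resulting per-probe slopes from the two studies are then correlated. This is a simplified stand-in for the JMP Genomics routine: the categorical contrasts are omitted, and the function names and simulated data are hypothetical.

```python
import numpy as np


def probe_slopes(Y, covariates):
    """Least-squares slopes for every probe.

    Y: samples x probes matrix of expression values.
    covariates: samples x k matrix (for example, PC scores).
    Returns a k x probes array of slopes; the intercept is fitted but dropped.
    """
    design = np.column_stack([np.ones(len(covariates)), covariates])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[1:]


def axis_correlation(slopes_a, slopes_b):
    """Pearson correlation between two vectors of per-probe slopes."""
    return float(np.corrcoef(slopes_a, slopes_b)[0, 1])


# Simulated example: two studies sharing the same per-probe loadings on one axis.
rng = np.random.default_rng(1)
loadings = rng.normal(size=200)
axis_a = rng.normal(size=30)
axis_b = rng.normal(size=40)
Y_a = np.outer(axis_a, loadings) + 0.1 * rng.normal(size=(30, 200))
Y_b = np.outer(axis_b, loadings) + 0.1 * rng.normal(size=(40, 200))

slopes_a = probe_slopes(Y_a, axis_a[:, None])[0]
slopes_b = probe_slopes(Y_b, axis_b[:, None])[0]
r = axis_correlation(slopes_a, slopes_b)
```

A high correlation between the two slope vectors indicates that the same probes load on the compared axes in the same direction in both studies.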
Appendix S1 describes a series of additional analyses that were performed to assess the impact of the RNA integrity transformation on the principal component analysis. In short, instead of adjusting the transcript abundance measures for RIN, we performed two parallel analyses to control for RNA integrity (i) by only including high RNA quality samples, and (ii) by removing all probes from the analysis that had a significant RIN effect at p<0.05. Although the percent variation explained by the top 3 PCs changed slightly, all were clearly conserved across the three analyses and identified the same gene sets as the major components of variation. Furthermore, we also conducted an aggressive normalization including hybridization chip as well as RIN, and this also reduced the percent variation explained by PC1MN, but retained it as one of the top three components of the variance with the same transcripts contributing.
Gene Set Enrichment
Gene set enrichment analyses were performed using a combination of DAVID functional annotation (reported in ), KEGG Pathway queries, and Ingenuity Pathway Analysis. In each case we set thresholds for inclusion of between 500 and 800 unique genes, noting that agreement between replicate probes for the approximately 10% of genes was complete with respect to sign of effect, and almost always estimated similar magnitudes of differential expression. Bonferroni adjustments for multiple comparison testing were used to assess the DAVID pathways, whereas the KEGG analysis relied on simple gene counts. Additionally, analysis of transcription factor and miRNA binding site enrichment, along with representation of biological pathways and drug targets, was performed with ToppFun in the ToppGene suite [22] and is reported in Table S4.
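For readers unfamiliar with over-representation testing, the sketch below shows a generic hypergeometric upper-tail test followed by a Bonferroni adjustment. It is not the statistic computed by DAVID, Ingenuity or ToppFun, each of which uses its own procedure; the function names, pathway names and counts are hypothetical and serve only to illustrate the adjustment.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the gene set."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: (genes from the list in the pathway, pathway size).
N_BACKGROUND = 1000  # hypothetical number of genes in the background
N_SELECTED = 100     # hypothetical number of genes in the list under test
pathways = {"pathway_x": (20, 50), "pathway_y": (6, 50), "pathway_z": (3, 40)}

raw_p = [hypergeom_upper_tail(k, K, N_SELECTED, N_BACKGROUND)
         for k, K in pathways.values()]
adjusted_p = dict(zip(pathways, bonferroni(raw_p)))
```

The adjusted value for each pathway is the raw p-value multiplied by the number of pathways tested, so only strongly over-represented sets remain significant.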