A total of 223 samples of airway epithelium were obtained by bronchial brushing from three locations (trachea, large airway, small airway) in 144 subjects with five pulmonary phenotypes (healthy non-smokers, healthy smokers, symptomatic smokers, smokers with lone emphysema with normal spirometry, and smokers with COPD; Table ). The mean ages varied from 36 to 52 yr, and males represented the majority in all but one group. The ancestries included European, Hispanic, Asian and African. The lung function fit the criteria for each group. Between 4.4 and 7.6 × 10^6 cells were recovered from trachea, large airway and small airway in all five pulmonary phenotypic groups, and cell counts did not depend on the phenotype of the subject or the site of bronchial brushing (p > 0.05 by ANOVA). From all locations, an average of 99 to 100% of all cells recovered were epithelial, with less than 1% contamination by non-epithelial cells. The cell differentials varied depending on location, as previously described [4]. The average yield of extracted RNA was 25.3 ± 10 μg, varying from 3.5 to 53.9 μg.
Demographic of the Study Population and Biologic Samples1
Establishment and Testing of Quality Control Criteria
The overall strategy was to use the data from 223 samples to establish prospectively applicable QC criteria that would ensure high-quality expression microarray data for biological interpretation in our ongoing studies. Rigorous, objective QC metrics were selected at three distinct stages of the microarray workflow and applied to all 223 samples hybridized to microarrays in this study. The RIN assessment was limited to n = 191, because 32 samples were hybridized to microarrays before the Bioanalyzer RIN software was developed and were therefore unavailable for RIN analysis. For the GAPDH 3'/5' signal intensity ratio and scaling factor criteria, all 223 samples were included.
Of the 223 samples, all three criteria were assessed in 191; of these 184 (96.3%) passed all three criteria. For the remaining 32 samples, the RIN was not available, and only the other two criteria were used; of these 29 (90.6%) passed these two criteria. Only 10 (4.5%) failed at least one QC criterion, and were therefore considered to have failed QC. The overall breakdown of samples failing QC was: 2 large airway samples (1 healthy non-smoker and 1 healthy smoker) and 8 small airway samples (1 healthy smoker, 4 symptomatic smokers, and 3 smokers with COPD). The greatest source of failure was the scaling factor criterion, which contributed to 70% of the overall failures. All of the 10 samples failing the QC criteria failed the RIN and/or scaling factor criterion, indicating that these metrics may be the most sensitive to technical variance, and therefore are central to assessing overall array quality. While 7 samples failed by one criterion each, 1 sample failed by both the RIN and GAPDH 3'/5' ratio criteria, and 2 samples failed by both the RIN and scaling factor criteria, suggesting that the quality control parameters exert correlated effects on array performance.
The RNA quality was examined by the Bioanalyzer-generated RIN score in the 191 samples for which data were available (see above). Based on published data [26], samples with a RIN ≥ 7.0 were designated to have passed QC (Figure ). Five out of the 191 samples (2.6%) had RIN scores <7.0. The RIN values were not significantly dependent upon the phenotype or biologic origin of the RNA sample (p > 0.1 by ANOVA), with n = 4 small airway samples (1 healthy smoker, 2 symptomatic smokers, 1 smoker with COPD) and n = 1 large airway sample (healthy nonsmoker) failing on the basis of RIN <7.0.
Figure 1 Assessment of RNA quality in airway epithelial samples. Integrity of 180 RNA samples was scored using the RNA Integrity Number (RIN) generated by Agilent 2100 Bioanalyzer Software (1 = highly degraded; 10 = intact). Samples are grouped by phenotype as (more ...)
GAPDH 3'/5' Signal Intensity Ratio
As a metric for the efficiency of transcription and amplification of antisense cRNA from the cDNA derivative of the starting RNA material, the signal intensities for the probe sets for GAPDH residing at the 5' end and within the 600 nucleotides most proximal to the priming site at the 3' end of the transcript were compared. For all samples hybridized to microarrays, the 3' and 5' probe set intensities for the GAPDH gene were extracted to compute the 3'/5' signal intensity ratio. Based on published data [16], the criterion for passing QC was established as a GAPDH 3'/5' ratio ≤ 3.0 (Figure ). By this criterion, only 1 small airway sample from a symptomatic smoker failed QC. The Affymetrix expression microarray also returns 3'/5' ratios for other genes, including β-actin. However, because the 3'/5' ratios for β-actin and GAPDH were strongly correlated (r² = 0.92; p < 0.0001), applying additional cutoff criteria beyond GAPDH was considered redundant. In the context of airway epithelium, although GAPDH is not an ideal "housekeeping" gene, as its expression may vary under different conditions, this does not interfere with its use in assessing cRNA quality [37].
Figure 2 Assessment of GAPDH 3'/5' and Chip scaling factor. Ratios of signal intensities for GAPDH 3' and 5' probe sets for 223 samples were extracted from the GeneChip Operating Software (GCOS) Quality Report and plotted against the Scaling Factors analyzed with (more ...)
Multi-chip Normalization Scaling Factor
The scaling factor was used as an overall index of the microarray hybridization, washing, and scanning process. Scaling factor values for all 223 samples computed at a target intensity value of 500 were examined. The criterion of scaling factor values ≤ 10.0 was established (Figure ). Seven out of the 223 samples (3.1%) had scaling factor values above the acceptable cutoff. The scaling factor values were not significantly dependent upon the phenotype or biologic origin of the sample (p > 0.1 by ANOVA), with n = 5 small airway samples (1 healthy smoker, 2 symptomatic smokers, 2 smokers with COPD) and n = 2 large airway samples (1 healthy nonsmoker, 1 healthy smoker) failing on the basis of scaling factor >10.0.
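The three pass/fail rules described above can be sketched in a few lines of code. This is a minimal illustration, not the software used in the study: the `Sample` class, the function name, and the example values are hypothetical, and only the three cutoffs (RIN <7.0, GAPDH 3'/5' ratio >3.0, scaling factor >10.0 as failures) come from the text. A sample without a RIN is judged on the other two criteria only, as was done for the 32 samples lacking RIN data.

```python
from dataclasses import dataclass
from typing import List, Optional

# Cutoffs as stated in the text; the constant names are illustrative.
RIN_MIN = 7.0              # RIN below 7.0 fails
GAPDH_RATIO_MAX = 3.0      # GAPDH 3'/5' ratio above 3.0 fails
SCALING_FACTOR_MAX = 10.0  # scaling factor above 10.0 fails


@dataclass
class Sample:
    sample_id: str
    rin: Optional[float]   # None when no RIN was recorded
    gapdh_ratio: float
    scaling_factor: float


def failed_criteria(s: Sample) -> List[str]:
    """Return the names of the QC criteria that a sample fails."""
    failures = []
    if s.rin is not None and s.rin < RIN_MIN:
        failures.append("RIN")
    if s.gapdh_ratio > GAPDH_RATIO_MAX:
        failures.append("GAPDH 3'/5'")
    if s.scaling_factor > SCALING_FACTOR_MAX:
        failures.append("scaling factor")
    return failures


# Made-up values for illustration only; these are not study data.
samples = [
    Sample("A", 8.9, 1.2, 3.4),
    Sample("B", 6.1, 1.4, 12.7),
    Sample("C", None, 3.6, 4.0),
]
qc_report = {s.sample_id: failed_criteria(s) for s in samples}
for sid, fails in qc_report.items():
    print(sid, "PASS" if not fails else "FAIL: " + ", ".join(fails))
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it straightforward to tabulate which criteria fail together.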
The interdependence of failing different QC criteria was assessed (Table ). Of the total of 5 samples that failed RIN, three also failed one of the other QC criteria, with 1 failing the GAPDH 3'/5' test and 2 failing the scaling factor test. There was no pattern of repeated QC failure by a single subject sampled on more than one occasion, nor was there any correlation of failure with cell differential or % non-epithelial contamination.
Classification of Quality Control Failures by Criterion 1
Maintenance Gene Expression Levels
To assess whether gene expression data derived from samples that passed all of the QC criteria were more robust than data derived from samples that failed one or more criteria, expression levels for the 100 maintenance genes were extracted for every sample, regardless of QC metric values. For the 10 samples failing QC criteria and 24 randomly selected samples passing the QC criteria, the expression profile for all 100 genes was compared. Pearson's correlation was calculated for all pairwise comparisons (i.e., 24 × 24 comparison of samples both passing QC, 24 × 10 between samples passing QC and samples failing QC, and 10 × 10 comparison of samples both failing QC). Correlation coefficient values indicated that samples passing QC criteria were highly correlated with other samples passing QC criteria (average Pearson r = 0.97), while samples failing QC criteria showed lower correlations with all other samples (average Pearson r = 0.90; Figure ). The range of correlation coefficient values obtained for pairwise correlations of samples passing QC criteria was 0.92 to 0.99. In contrast, when comparing samples failing QC criteria with all other samples, the range of correlation coefficient values was 0.76 to 0.97. There was no difference in the correlation coefficient values for samples failing QC for the RIN criterion versus other causes (p > 0.4). The distribution of correlation coefficients for the pairwise comparisons of samples passing QC criteria was significantly different from the distribution of values for pairwise comparisons where at least one sample failed the QC criteria (p < 0.0001, Mann-Whitney U test; Figure ).
Figure 3 Pairwise correlations of expression levels for 100 maintenance genes. Expression levels for 100 maintenance genes were determined for 34 airway epithelial samples of which 24 randomly selected samples passed the pre-determined QC criteria and 10 failed (more ...)
Figure 4 Frequency distribution of correlation coefficients calculated for pairwise comparisons. Shaded dark grey region represents pairwise comparisons (n = 285) where at least 1 sample failed the QC criteria. Light grey region represents pairwise comparisons (more ...)
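The pairwise comparison can be illustrated with a short sketch. It assumes each sample is represented by a vector of 100 maintenance gene expression levels, and it uses synthetic profiles rather than study data. The function names and the noise levels are hypothetical. With 24 passing and 10 failing samples, the grouping yields 276 pass/pass pairs and 285 pairs in which at least one sample failed, the latter matching the n = 285 given for Figure 4.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_pairwise(profiles, passed):
    """Correlate every pair of samples and group r by the pair's QC status."""
    pass_pass, any_fail = [], []
    for a, b in combinations(profiles, 2):
        r = pearson(profiles[a], profiles[b])
        (pass_pass if passed[a] and passed[b] else any_fail).append(r)
    return pass_pass, any_fail


# Synthetic data: 100 "genes", 24 passing samples with little added noise
# and 10 failing samples with more. These are not study data.
rng = random.Random(0)
base = [rng.uniform(6, 14) for _ in range(100)]
profiles, passed = {}, {}
for i in range(34):
    ok = i < 24
    noise = 0.3 if ok else 1.5
    profiles[f"s{i}"] = [v + rng.gauss(0, noise) for v in base]
    passed[f"s{i}"] = ok

pass_pass, any_fail = split_pairwise(profiles, passed)
print(len(pass_pass), len(any_fail))  # 276 285
```

The two resulting lists are the inputs one would pass to a rank-based test such as the Mann-Whitney U test to compare the distributions.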
Of the 24 samples passing QC criteria that were used for the correlation matrix analysis, 10 samples matched in airway location with the 10 samples failing QC criteria were selected to assess coefficient of variation of each of the 100 maintenance genes. Expression levels for the 100 maintenance genes showed significantly greater variability among the 10 samples failing QC criteria ("fail" data set) than among the 10 samples passing QC criteria ("pass" data set, Figure ). Across the "pass" data set, the median coefficient of variation for the maintenance genes was 21.7% (5th to 95th percentile 13.0 to 31.0%). By contrast, across the "fail" data set, the median coefficient of variation for the 100 genes was 35.7% (5th to 95th percentile 21.8 to 52.5%; p < 0.0001, Mann-Whitney U test).
Figure 5 Variability in maintenance gene expression levels in samples that pass or fail QC criteria. The coefficients of variation for each of the 100 maintenance genes were calculated across 2 data sets: a data set of 10 samples failing QC criteria (red squares), (more ...)
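As a minimal sketch of the variability measure, the coefficient of variation of one gene across a data set is the standard deviation divided by the mean, expressed as a percentage, and the summary statistic is the median over all genes. The function names and the gene values below are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation (sample SD / mean) as a percentage."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(expr_by_gene):
    """Median CV across genes; expr_by_gene maps gene -> per-sample levels."""
    return median(cv_percent(v) for v in expr_by_gene.values())


# Made-up expression levels for three hypothetical genes in two data sets.
pass_set = {"g1": [90, 100, 110], "g2": [480, 500, 520], "g3": [45, 50, 55]}
fail_set = {"g1": [60, 100, 140], "g2": [350, 500, 650], "g3": [30, 50, 70]}

print(median_cv(pass_set), median_cv(fail_set))
```

Because the coefficient of variation is scaled by the mean, genes with very different expression levels can be compared on the same axis, which is what allows a single median to summarize 100 genes.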
Similarly, the coefficient of variation for all probe sets was greater for microarrays that failed QC than for microarrays that passed. Two data sets of 9 microarrays each were compared, giving a mean coefficient of variation of 34 ± 0.1% for the arrays that passed QC and 43 ± 0.1% for the arrays that failed QC. The impact on discovery of biological differences (for example, the impact of smoking on the gene expression profile [12]) was assessed by power calculations. If two groups of 15 smokers and 15 non-smokers were compared, the true difference of means required for detection at p < 0.05 with a power of 0.95 rises from 0.46 with arrays that pass QC to 0.58 with arrays that fail QC (i.e., small biological effects become more difficult to detect).
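The direction and approximate size of this effect can be reproduced with a simple sketch. It rests on two assumptions that are not stated in the text: that the coefficients of variation (0.34 and 0.43) serve as the standard deviation on a mean-normalized scale, and that a normal approximation to the two-sample comparison is adequate. The function name is hypothetical, and this is not necessarily the calculation the authors performed.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(sd, n_per_group, alpha=0.05, power=0.95):
    """Smallest true difference of means detectable by a two-sided,
    two-sample comparison with equal group sizes (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance quantile
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return (z_alpha + z_power) * sd * sqrt(2 / n_per_group)


# Assumption: SD taken as the reported coefficient of variation.
delta_pass = detectable_difference(sd=0.34, n_per_group=15)
delta_fail = detectable_difference(sd=0.43, n_per_group=15)
print(round(delta_pass, 2), round(delta_fail, 2))  # 0.45 0.57
```

Under these assumptions the sketch gives roughly 0.45 and 0.57, slightly below the reported 0.46 and 0.58, as would be expected when normal quantiles stand in for t quantiles at this sample size.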
To examine potential causes of the variation in maintenance gene expression levels unrelated to the QC criteria, differences among the subjects were assessed. The 223 airway epithelial samples acquired for this study were derived from 144 individuals, as it was possible for a single individual to undergo bronchial brushing at one or more of the three target sites: trachea, large airway, and small airway. By independent linear regression, there was no correlation of gene expression level for the 100 maintenance genes (r² < 0.05 for all genes) with age (average 45 ± 8.8) across the 144 individuals from whom airway epithelium was derived. None of the genes showed strong correlation (r² < 0.15) with smoking history (average pack-yr 30 ± 18). Correlation analysis of expression levels with pulmonary function parameters showed no relationship (r² < 0.09 for all genes with all parameters).
Impact of QC Failures on Global Lung Biology
In order to assess the functional consequences of the QC criteria on the gene expression data, a Principal Components Analysis (PCA) was used to compare samples that passed QC with those that failed. For this analysis, an independent set of microarray data that failed QC was available from a technician training program in the Weill Cornell Medical College Department of Genetic Medicine. From this training program, 11 microarrays that failed QC were available from small airway epithelium samples collected from individuals with COPD (n = 1 failed the RIN criterion; n = 6 failed the GAPDH criterion; n = 3 failed the scaling factor criterion; and n = 1 failed both the GAPDH and scaling factor criteria). The data from these 11 samples were compared to microarray data from n = 11 samples (matched for ancestry, age, gender, pack-years and pulmonary function test results) from the small airway epithelium of individuals with COPD that passed all QC criteria (see Additional file 1 for demographics of the 2 groups). The PCA revealed broad, global differences in genome-wide expression levels in the small airway epithelium of individuals with COPD between samples that passed QC and those that failed (Figure ). Using the criteria of P call of "Present" in 20% of samples, magnitude of fold-change in passed vs failed samples >1.5, and p < 0.01 using a t test with a Benjamini-Hochberg correction to limit the false positive rate, a total of 888 probe sets were differentially expressed between the 2 groups (Additional file 2), indicating that data from microarrays that fail QC criteria are not merely more variable or "noisy," but in fact differ significantly as biological data from data obtained from samples that pass QC criteria.
Figure 6 Principal components analysis of genome-wide gene transcriptome data in failed and passed COPD subjects. The axes have been rotated presenting a top view to highlight the 2 standard deviation ovoid clustering of expression from failed and passed COPD (more ...)
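The three-part filter used to call differentially expressed probe sets can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names and the row layout are hypothetical, the "Present" threshold is read as "at least 20% of samples", fold change is treated as a ratio whose magnitude counts in either direction, and the Benjamini-Hochberg adjustment is applied to all probe sets before filtering (the text does not specify the order).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_probe_sets(rows, min_present=0.20, min_fold=1.5, max_adj_p=0.01):
    """rows: dicts with 'id', 'present_fraction', 'fold_change' (ratio), 'p'."""
    adj = benjamini_hochberg([r["p"] for r in rows])
    return [
        r["id"]
        for r, q in zip(rows, adj)
        if r["present_fraction"] >= min_present
        and max(r["fold_change"], 1 / r["fold_change"]) > min_fold
        and q < max_adj_p
    ]


# Made-up rows for illustration only; these are not study data.
rows = [
    {"id": "A", "present_fraction": 0.9, "fold_change": 2.0, "p": 0.0001},
    {"id": "B", "present_fraction": 0.1, "fold_change": 3.0, "p": 0.0002},
    {"id": "C", "present_fraction": 0.8, "fold_change": 2.0, "p": 0.5},
    {"id": "D", "present_fraction": 0.5, "fold_change": 0.5, "p": 0.0003},
]
selected = select_probe_sets(rows)
print(selected)
```

In this toy input, B is excluded by the detection call, C by its p-value, and D is retained because a ratio of 0.5 corresponds to a two-fold change in the opposite direction.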