Demographic and physiologic characteristics of the 11 patients enrolled in this study are reported in Table . Each patient underwent either a medically-indicated surgical lung biopsy or medically-indicated lung transplantation surgery; remnants of the biopsy sample or pieces of the explanted lung were preserved for microarray analysis. Physiologic measurements were made prior to surgery. When we compared biopsy to explant, we found no differences in the average age of patients (60.67 ± 2.72 to 66.6 ± 0.68); the proportion of males (83% versus 60%); or the forced vital capacity (65.17 ± 5.75 to 56.8 ± 5.54). However, diffusing capacity for carbon monoxide was decreased in patients undergoing lung transplantation surgery (61.83 ± 6.38 to 29.2 ± 4.19, p-value < 0.01) which is statistically significant in this patient cohort.
Global Analysis of Gene Expression
To explore gene expression differences (and similarities) between all of the samples, we carried out an unsupervised hierarchical cluster of the entire dataset (Figure ). The dataset contains gene expression from 23 samples: 17 samples of IPF from 11 different patients (6 pairs of samples from upper and lower lobes; and 5 samples of single lobes); and 6 samples from normal lung donors. Examining the hierarchical dendrogram (Figure ), we found a natural separation between IPF samples and normal lung samples (normals are found on the left-hand side of the figure; IPF samples fall in the middle and on the right-hand side of the dendrogram), with the exception of one outlier, a sample of normal lung (Normal_C) which falls among the IPF samples.
Figure 1 Unsupervised cluster of the complete dataset (training cohort). Samples include normal lung (black) and IPF (brown). IPF is divided into biopsy (red) and explant (green). Samples are also identified by their lobe of origin: upper lobe (orange), lower (more ...)
We further observed that pairs of samples from the upper and lower lobes have similar global gene expression profiles, such that each pair forms its own node in the hierarchical cluster. In order to meet the assumptions of independent and identically distributed samples for developing signatures of IPF, we chose to use only one sample (the upper lobe, when available) per patient in the subsequent analyses.
Finally, we observed that explanted samples and biopsied samples largely segregate in the hierarchical clusters with the exceptions of: one pair of biopsied samples (Biopsy_159U and Biopsy_159L) and one normal sample (Normal_C) falling in the explant cluster; and a pair of explants (Explant_152U and Explant_152L) which fall in the biopsy cluster.
To further evaluate global differences in gene expression, we decomposed the high-dimensional gene expression data using principal component analysis (PCA), whereby 47% of the variance in this dataset is captured within the first two principal components for all 23 samples. Again, we found that normal and IPF samples are distinctive (Figure ). Furthermore, a separation was seen between the biopsied IPF samples and the explants. Meanwhile, the upper/lower lobe pairs showed strong similarity (average Pearson correlation of 0.929) as compared to unmatched pairs (average Pearson correlation of 0.781).
Comparing Gene Expression from the Upper and Lower Lobes
To further characterize the upper/lower lobe pairs, we decomposed the gene expression data for pairs alone by PCA. This analysis captured 74% of the variance within the first three principal components. We plotted the upper/lower lobe pairs according to expression of the first three principal components (Figure ) and found that clusters were not determined by lobe, but rather by the patient (intraclass correlation coefficient = 0.474, p-value = 0.02 [for the first principal component]).
Figure 2 Comparison of samples from different lobes; comparison of samples from biopsy and explant. In panels A and B, we compare samples from the upper (orange) and lower lobes (blue). In panels C and D, we compare samples obtained by biopsy (red) versus explant (more ...)
To identify genes that might be differentially expressed between the upper and lower lobes, we performed a paired LIMMA test [21
] as an empirical Bayesian approach to analyzing microarray data that uses hierarchical linear models to improve estimates of variance. First, we excluded unannotated and lowly expressed genes. Then we plotted the unadjusted p-values for all tests on a frequency histogram and note that the frequency of nominally significant p-values (< 0.05) is no greater than that expected by chance alone (Figure ). This suggests that greater differences in expression are observed across subjects than between upper and lower lobe, as supported by serial 2-way ANOVA (data not shown), and the hierarchical cluster in Figure where 5 of 6 pairs are noted to be most similar. Therefore, a single sample from each patient was selected for further analysis regardless of lobe.
Comparing Gene Expression from Biopsies and Explants
To investigate the difference between biopsies and explants, we selected the data from this subset of samples (excluding lobar replicates) and decomposed the data by PCA such that 68% of the variance was captured within the first three principal components. The samples were plotted according to expression of the first three principal components (Figure ). Here, we could appreciate a distinct separation between IPF biopsies and IPF explants.
Next, we carried out the LIMMA test to identify genes that were differentially expressed between biopsy and explant. Before adjusting p-values, we plotted the results on a frequency histogram. We noted that the frequency of nominally significant p-values (< 0.05) was greater than expected by chance alone (Figure ). After adjusting the p-values with the Benjamini-Hochberg step-down method to control the false discovery rate (FDR) [23
], 13 probesets (corresponding to 11 unique genes) were identified as statistically significant using a FDR threshold of 10% (Additional file 7
, Table S4).
Approach to Developing Gene Expression Signatures
A schematic diagram illustrates the process by which we develop genomic signatures using BPR models (Figure ). The first step is to select, as the training dataset, a collection of samples that represent two distinct phenotypes. Prior to analysis, the training dataset is filtered to exclude unannotated and lowly expressed genes, without regard to phenotypic information.
Schematic diagram of the workflow.
Because there is no prior knowledge on which to base the number of genes included in the model, we propose an iterative data-driven approach to model-fitting. We propose using the "sum of deviances" between observed and predicted phenotypes, coupled with the "misclassification rate" under a leave-one-out process, to determine the optimal size of our BPR model (i.e., the number of genes to include in the regression equation). Once the number of genes is selected, the model is summarized by the gene annotation and the average of the posterior distribution of the linear predictor under the Bayesian model. The gene signature is visualized by a heatmap that shows normalized expression values of the selected genes (rows) over the set of samples (columns).
Finally, a second set of samples is used to test the performance of the tuned model. This represents an independent validation. Because the validation dataset is derived on a different microarray platform, expression values need to be mapped and normalized in a merged dataset to account for differences in batch and the information content of each array. Then, each sample in the validation dataset is applied to the Bayesian regression model in order to generate a predictive probability (from 0.0 to 1.0) as a relative score indicating the likelihood of one phenotype over the other. Given information regarding the true phenotype of each validation sample, it is possible to construct a receiver-operating characteristic (ROC) curve for the predictive value of the gene signature.
Binary Classification for Signature Development
We chose to develop three separate models for the classification of IPF; we planned to test each model for diagnostic accuracy (i.e., functional validity) in an independent dataset. We developed the first model from all IPF samples (excluding lobar replicates) versus normal controls. This training dataset is summarized in an unsupervised hierarchical cluster (Figure ) of the genes showing the largest coefficient of variation (CoV).
Figure 4 The training sets. All IPF (brown), IPF biopsy (red), IPF explant (green) and normal lung (black). (A) Unsupervised hierarchical clustering of 11 IPF samples (6 biopsies and 5 explants) and 6 normals. (B) Unsupervised hierarchical clustering of 6 IPF (more ...)
Since we identified differential gene expression between IPF biopsies and IPF explants, we chose to separately develop diagnostic signatures from each class, as compared to normal controls. For the IPF biopsy samples, the training dataset is summarized in an unsupervised hierarchical cluster (Figure ). Likewise, for the subset of IPF explants, the training dataset is summarized in an unsupervised hierarchical cluster (Figure ).
The three training datasets are each decomposed by PCA and the samples are plotted with regard to the first two principal components (Figures and ).
Model Parameterization for Signature Development
For all signatures, the top two factors from singular value decomposition were used to fit independent terms to the BPR models. The "misclassification rate" and "sum of deviance" were used to determine the number of genes in each model, as described in Additional file 1
(also see Additional file 2
, Figure S1). We determined that 151 genes were needed to optimize the "All IPF" model; 153 genes were needed to optimize the "IPF Biopsy" model; and 70 genes were needed to optimize the "IPF Explant" model.
BPR was performed on each training dataset. Each model was visualized with a heatmap (Figure ). To illustrate that each training dataset produces a unique set of predictors, we list the top 10 gene predictors alongside each model. The complete gene list for each signature is supplied in the additional files (see Additional files 8
, Tables S5-S7).
Figure 5 Gene signatures. (A) A heatmap displays the normalized expression values of 151 genes that comprise the All IPF model, derived from 6 normals and 11 IPF samples (rows = genes; columns [left to right] = 6 normals, 6 biopsies and 5 explants). A partial (more ...)
Independent Validation of Gene Signatures
We used the GSE10667 dataset to test each gene signature. By using the same dataset to validate all three signatures, we were able to make a direct comparison between the models.
First we mapped the features of the Agilent microarray GSE10667 dataset to the corresponding features in our Affymetrix training datasets. We found that 148 features of the GSE10667 dataset mapped to features of the "All IPF" model (out of a possible 151 features, 98.0%); 151 features were mapped to the "IPF Biopsy" model (out of 153 possible features, 98.7%); and 69 features were mapped to the "IPF Explant" model (out of 70 possible features, 98.6%). After features were mapped, we merged the training and validation datasets. Gene expression was normalized across the merged datasets.
Then, each model was used in turn to predict the phenotype of each sample in the validation cohort (Figure and ). Predicted probabilities indicate the likelihood of IPF. The true phenotype of each validation sample is shown in color (blue for normal and red for IPF). Correct predictions are indicated with a solid marker while incorrect predictions are indicated with an open marker. The Youden index was used to compute cut points that maximize linear combinations of sensitivity and specificity for each model in this cohort, run on Agilent arrays. Evaluation of the quality of these thresholds would require additional validation on the Agilent platform as part of future investigations.
Figure 6 Validation tests. Each sample of the GSE10667 cohort is assigned a probability of IPF. Cutoffs were determined by calculating the Youden index. The true phenotype of each sample is indicated in color (15 normals [blue] and 31 IPF [red]). (A) The All IPF (more ...)
ROC curves are drawn on a single graph to facilitate comparison (Figure ). Area under the curve, sensitivity, specificity, positive and negative predictive values and overall predictive accuracy are reported in Table . Wilcoxon rank sum was performed on each signature to test the general association of predictions and phenotypes. Interestingly, the "IPF Explant" model outperforms the "All IPF" and "IPF Biopsy" models.
ROC curves. All IPF (brown), IPF Biopsy (red) and IPF Explant (green) are shown for comparison. Optimal cutoff points are circled.
Operating Characteristics of the Gene Signatures