Acute respiratory infections (ARI) are a common reason for seeking medical attention and the threat of pandemic influenza will likely add to these numbers. Using human viral challenge studies with live rhinovirus, respiratory syncytial virus, and influenza A, we developed peripheral blood gene expression signatures that distinguish individuals with symptomatic ARI from uninfected individuals with > 95% accuracy. We validated this “acute respiratory viral” signature - encompassing genes with a known role in host defense against viral infections - across each viral challenge. We also validated the signature in an independently acquired dataset for influenza A and classified infected individuals from healthy controls with 100% accuracy. In the same dataset, we could also distinguish viral from bacterial ARIs (93% accuracy). These results demonstrate that ARIs induce changes in human peripheral blood gene expression that can be used to diagnose a viral etiology of respiratory infection and triage symptomatic individuals.
Acute respiratory infections (ARI) are among the most common reasons for seeking medical attention in the United States (Hong et al., 2004; Johnstone et al., 2008). Rhinovirus (HRV), influenza, and respiratory syncytial virus (RSV) are recognized as leading etiologies of ARI in adults (Peltola et al., 2008). Viral ARIs are generally self-limited but can exacerbate disease in individuals with prior pulmonary disease (Johnston, 1995; Rakes et al., 1999). Most adults experience at least one HRV infection per year (Arruda et al., 1997; Schaller et al., 2006). Adult RSV infections may be self-limited or lead to airway obstruction and morbidity (Falsey et al., 2005). Influenza infection remains common, with significant associated health-care and societal costs (Gums et al., 2008). Early detection of influenza A can guide individual treatment decisions and provide early data for forecasting an epidemic or pandemic (Memoli et al., 2008).
HRV, RSV, and influenza are all spread by droplet inhalation. Upon contact with the respiratory epithelium, these viruses initiate a cytokine and chemokine response that orchestrates proliferation, chemotaxis, and amplification of inflammatory cells (Bhoj et al., 2008; Kirchberger et al., 2007). The nasal epithelial inflammation produced on contact with virus triggers a coordinated host response, whether the infection remains limited to the upper respiratory tract or spreads to the lower respiratory tract, causing bronchiolitis and pneumonia. Understanding the host response to these common infections will improve our understanding of disease pathobiology and provide a basis for novel diagnostic methods that distinguish viral respiratory infection from respiratory disease caused by other common pathogens.
Peripheral blood leukocytes are a reservoir and migration point for cells representing all aspects of the host immune response. Gene expression patterns obtained from these cells can discriminate between complex physiologic states (Aziz et al., 2007), exposures to pathogens (Ramilo et al., 2007; Simmons et al., 2007), immune modifiers (e.g., LPS) (Boldrick et al., 2002; Kobayashi et al., 2003), and environmental exposures (Dressman et al., 2007; Meadows et al., 2008; Wang et al., 2005). While current infectious disease diagnostics rely on pathogen-based detection (Chiarini et al., 2008; Lambert et al., 2008; Robinson et al., 2008), reproducible methods for extracting RNA from whole blood, coupled with advanced statistical methods for analyzing complex datasets, now make it possible to classify infections by host gene expression profiling that reveals pathogen-specific signatures of disease.
Realizing the potential of genome-scale information requires a paradigm shift in the way complex, large-scale data are viewed, analyzed, and used. The biology of infection, the host response, and the ensuing disease process are highly complex. Our previous work defining the complexity of the cancer phenotype with gene expression analysis established approaches that successively sub-categorize patients according to combinations of clinical and genomic risk factors, highlighting the predictive value of multiple genomic patterns (Acharya et al., 2008; Garman et al., 2008; Xu et al., 2008). Formal statistical models that incorporate, evaluate, and weigh multiple gene expression patterns are fundamental to this methodology. We have shown that specific classes of statistical tree models can perform such synthesis and improve prediction and classification for individual patients. A core methodology underlying our comprehensive models is the statistical prediction tree model, into which the expression data enter as signatures (estimated “factors”) that serve as candidate predictors. This approach to molecular characterization and candidate gene identification has provided significant value in recent work (Acharya et al., 2008; Garman et al., 2008; Lucas et al., 2006; Meadows et al., 2008; Seo et al., 2006), uncovering non-linear associations between gene expression and phenotypic outcomes (Breiman, 2001; Kooperberg et al., 2001; Ruczinski, 2003).
Using three human viral challenge cohorts for HRV, RSV, and influenza A, we developed a robust blood mRNA expression signature that classifies symptomatic human respiratory viral infection. Factor analysis (Carvalho et al., 2008) of mRNA expression data revealed a pattern of gene expression common to symptomatic individuals from all viral challenges. We termed this the “acute respiratory viral” bio-signature of disease; it encompasses transcripts of genes known to be related to viral infection and the overall immune response. This signature also accurately classified influenza A infection in an independent community-based cohort. To serve as a useful diagnostic indicator of viral respiratory infection for clinical triage and treatment decisions, the signature must be distinct from the host response to bacterial respiratory tract infections. An analysis of publicly available peripheral blood gene expression data from patients with bacterial infection indicated that the acute respiratory viral signature was specific to viral infection and could distinguish patients with viral infection from those with bacterial infection and from healthy controls. This work emphasizes the important concept that capturing the human host response to pathogen exposure may serve both as the basis for diagnostic testing and as a window into the fundamental biology of infection.
Organization and data flow are shown in Figure 1. Exposures were performed on independent cohorts and datasets combined for analysis.
In the HRV challenge, the attack rate was 50%: ten of the 20 inoculated subjects developed ARI-like symptoms and had confirmed viral shedding (Table 1; Supplemental Figures 1 and 2). Peak symptoms occurred at 48 hours (n=2), 72 hours (n=4), or 96 hours (n=4) post inoculation (median 72 hours).
In the RSV challenge, the attack rate was 45%: nine of the 20 inoculated subjects developed ARI-like symptoms and had confirmed viral shedding (Table 1; Supplemental Figures 1 and 2). One subject (RSV020) had late symptoms and uninterpretable culture data and was excluded. Among the remaining symptomatic subjects, peak symptoms occurred at 93.5 hours (n=1), 117.5 hours (n=1), 141.5 hours (n=5), and 165.5 hours (n=1) post inoculation (median 141.5 hours).
In the influenza A challenge, the attack rate was 53%: nine of the 17 inoculated subjects developed ARI-like symptoms and had confirmed viral shedding (Table 1; Supplemental Figures 1 and 2). Peak symptoms occurred at 50 hours (n=1), 62 hours (n=2), 74 hours (n=2), 86 hours (n=2), 98 hours (n=1), and 110 hours (n=1) post inoculation (median 80 hours).
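The attack rates and median peak times above follow from simple tallies. The Python sketch below reproduces that arithmetic from the reported counts, applying the median calculation to the HRV and RSV (hours: subjects) tallies; the function names are illustrative and not part of any published analysis code.

```python
from statistics import median

def attack_rate(n_symptomatic, n_inoculated):
    """Percentage of inoculated subjects who became symptomatic and shed virus."""
    return 100.0 * n_symptomatic / n_inoculated

def median_peak_time(peak_counts):
    """Median time to peak symptoms from a {hours: number_of_subjects} tally."""
    # Expand the tally into one entry per subject, then take the median.
    times = [hours for hours, n in sorted(peak_counts.items()) for _ in range(n)]
    return median(times)

# Tallies as reported for the HRV and RSV challenges.
hrv_peaks = {48: 2, 72: 4, 96: 4}
rsv_peaks = {93.5: 1, 117.5: 1, 141.5: 5, 165.5: 1}

print(attack_rate(10, 20), attack_rate(9, 20), round(attack_rate(9, 17)))
print(median_peak_time(hrv_peaks), median_peak_time(rsv_peaks))
```

With an even number of subjects, `statistics.median` averages the two middle values, which is why both medians come out as floats.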
We first combined the data from all three challenges and analyzed them as a single dataset. Eighty-four timepoints were included in the analysis (HRV: 10 baseline, 10 symptomatic, 10 matched-timepoint asymptomatic; RSV: 10 baseline, 9 symptomatic, 10 matched-timepoint asymptomatic; influenza: 8 baseline, 9 symptomatic, 8 matched-timepoint asymptomatic). Twenty factors were derived from all available probes, and a single factor (Factor 16) best discriminated symptomatic (infected) subjects (HRV, RSV, or influenza A) from asymptomatic (uninfected) individuals. Baseline (pre-inoculation) gene expression was indistinguishable from that at the matched timepoint in asymptomatic subjects (Figure 2), and baseline gene expression in subjects who became symptomatic was indistinguishable from that in subjects who remained asymptomatic (data not shown). The top 30 predictive genes in Factor 16 are known to characterize the host response to viral infection (Supplemental Table 1). These 30 genes were used as features in a sparse probit regression model, with leave-one-out cross validation and an ROC curve (Figure 2) used to estimate model performance. Leave-one-out cross validation correctly classified 96.5% of samples (misclassification rate 3.5%, 3/84). These data, from three distinct viral challenge experiments, demonstrate a clear acute respiratory viral response factor as a common feature of peak infection.
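To make the evaluation loop concrete, the following Python sketch runs leave-one-out cross validation of a probit classifier and summarizes the held-out predictions as a misclassification count and an ROC area. It is an assumption-laden illustration rather than the authors' model: sparsity comes from a swapped-in L1 penalty fitted by proximal gradient ascent, not the sparse probit regression used in the study, and the two-feature data are hypothetical stand-ins for the 30 selected genes.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_l1_probit(X, y, lam=0.05, lr=0.05, n_iter=2000):
    """Fit P(y=1 | x) = Phi(b + w.x) by proximal gradient ascent on the
    average probit log-likelihood with an L1 penalty on w (b is unpenalized)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            cdf = min(max(norm_cdf(z), 1e-12), 1.0 - 1e-12)
            pdf = norm_pdf(z)
            g = yi * pdf / cdf - (1 - yi) * pdf / (1.0 - cdf)  # d loglik / dz
            grad_b += g
            for j in range(p):
                grad_w[j] += g * xi[j]
        b += lr * grad_b / n
        w = [soft_threshold(wj + lr * gj / n, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def predict_prob(model, x):
    w, b = model
    return norm_cdf(b + sum(wj * xj for wj, xj in zip(w, x)))

def loocv(X, y, **fit_kwargs):
    """Leave-one-out: refit without sample i, then score the held-out sample i."""
    probs = []
    for i in range(len(X)):
        model = fit_l1_probit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **fit_kwargs)
        probs.append(predict_prob(model, X[i]))
    return probs

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pair-counting identity."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[2.1, 0.3], [1.8, -0.2], [2.5, 0.1], [1.6, 0.4], [2.2, -0.5], [1.9, 0.0],
     [-0.2, 0.2], [0.3, -0.4], [-0.5, 0.1], [0.1, 0.5], [-0.1, -0.3], [0.4, 0.0]]
y = [1] * 6 + [0] * 6

probs = loocv(X, y)
errors = sum((p >= 0.5) != (label == 1) for p, label in zip(probs, y))
print(f"misclassified {errors}/{len(y)}, ROC AUC {auc(probs, y):.2f}")
```

Refitting inside every fold is the essential point: the held-out sample never influences the model that scores it, so the misclassification rate estimates out-of-sample performance.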
To further validate the acute respiratory viral response signature, we next analyzed each dataset (HRV, RSV, and influenza A) separately to identify a factor characterizing symptomatic viral infection in that dataset (Supplemental Figure 3). We performed sparse probit regression on the 30 genes with the highest factor loadings in each factor and used leave-one-out cross validation and ROC curves to estimate factor performance. The p-values associated with each factor (i.e., the probability that the observed grouping of genes would arise by chance) were 2.33 × 10⁻⁵ (HRV), 2.29 × 10⁻⁷ (RSV), and 4.95 × 10⁻¹³ (influenza) (www.gather.duke.edu). The challenge-specific factors were then used as de facto “training sets” to classify subjects from the other challenges. As shown in Supplemental Figure 4 and Table 2, when the model was trained on any individual dataset, accuracy in predicting symptomatic versus asymptomatic status exceeded 96%. This supports the conclusion that, at peak symptoms of viral respiratory infection, the host response converges on a gene expression program highly characteristic of the response to viral infection. Supplemental Figure 5 shows the overlap between genes in the factors predictive for the individual viruses. Most genes in the individual virus factors were also present in the acute respiratory viral factor. Genes unique to an individual virus factor include SOCS1 (HRV) and FCGR1A, GBP1, LAP3, ETV7, and FCGR1B (RSV). Complete gene lists for the individual virus factors and the acute respiratory viral factor are given in Supplemental Table 1. Genes in these factors were highly representative of the host response to viral infection, including RSAD2, interferon response elements, and the OAS gene family.
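The cross-challenge design (train on one challenge, classify the others) can be sketched as a simple loop. For brevity, this Python sketch assumes each cohort is reduced to a single hypothetical per-sample factor score and uses a midpoint threshold between class means as the classifier, a deliberately simple stand-in for the sparse probit model; the cohort names match the text, but every score is invented for illustration.

```python
from statistics import mean

def fit_threshold(scores, labels):
    """Midpoint between the mean scores of symptomatic (1) and asymptomatic (0) samples."""
    pos = mean(s for s, l in zip(scores, labels) if l == 1)
    neg = mean(s for s, l in zip(scores, labels) if l == 0)
    return (pos + neg) / 2.0

def accuracy(threshold, scores, labels):
    """Fraction of samples whose thresholded call matches the label."""
    hits = sum((s > threshold) == (l == 1) for s, l in zip(scores, labels))
    return hits / len(labels)

def cross_cohort(cohorts):
    """Train on each cohort in turn and evaluate on every other cohort."""
    results = {}
    for train_name, (train_scores, train_labels) in cohorts.items():
        threshold = fit_threshold(train_scores, train_labels)
        for test_name, (test_scores, test_labels) in cohorts.items():
            if test_name != train_name:
                results[(train_name, test_name)] = accuracy(threshold, test_scores, test_labels)
    return results

# Hypothetical factor scores: three symptomatic (1) then three asymptomatic (0) per cohort.
labels = [1, 1, 1, 0, 0, 0]
cohorts = {
    "HRV": ([2.0, 1.7, 2.4, 0.1, -0.2, 0.3], labels),
    "RSV": ([1.9, 2.2, 1.5, 0.0, 0.4, -0.1], labels),
    "influenza A": ([2.6, 1.8, 2.1, 0.2, -0.3, 0.1], labels),
}
for (train, test), acc in cross_cohort(cohorts).items():
    print(f"train {train:>11} -> test {test:<11} accuracy {acc:.0%}")
```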
Given the strong viral response signature that distinguished symptomatic HRV, RSV, and influenza infection from uninfected subjects, we sought to confirm that this response also characterizes viral infection diagnosed in a community setting. We used two approaches to validate our acute respiratory viral signature, both based on microarray datasets derived from PBMC mRNA in a published study (Ramilo et al., 2007) of a cohort of pediatric patients with microbiologically proven influenza A infection and linked gene expression data. First, we used the acute respiratory viral classifier built on the three combined challenge datasets to predict disease state (uninfected vs. influenza A infection) in the literature cohort (Figure 3). Despite differences in subject ascertainment between the experimental and literature cohorts, and other potential confounders such as age and demographics, we accurately classified subjects in the literature cohort as influenza A infected versus uninfected: 100% (23/23) were classified correctly (Figure 3b). Predicting viral infection in a pre-existing dataset using genes identified as discriminative in an experimental dataset reinforces the robustness of both the methodology and the classifier.
In the second approach, we re-analyzed the raw gene expression data from the literature dataset using the same methods used to generate the HRV, RSV, and influenza expression signatures. As in our analysis of the challenge cohorts, twenty factors were built from the entire gene set for all persons in the literature cohort (Supplemental Figure 6). These factors were used to build a classifier that distinguished persons with influenza A (n = 18) from healthy controls (n = 6 pediatric subjects hospitalized for elective surgery). The top 30 genes in the discriminating factor were used as features for the sparse probit regression model, with leave-one-out cross validation and ROC curves used to estimate algorithm performance. Leave-one-out cross validation correctly identified all 24 individuals in this dataset. Of the 27 unique genes in the literature cohort factor, 20 were also present in the acute respiratory viral factor derived from the experimental cohorts; conversely, of the 28 unique genes in the acute respiratory viral factor, 20 were also present in the literature cohort factor. The probit function was also used to discriminate between influenza A infection and bacterial infection, with cross validation correctly classifying 90/97 subjects (misclassification rate 7%). These findings further support the conclusion that the acute respiratory viral factor derived above is a robust disease signature at the time of peak symptoms. Supplemental Figure 7 shows the predictive performance of each gene in the probit function generated from the acute respiratory viral factor for predicting pathogen class in the independent dataset.
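The reciprocal overlap counts reported above (20 shared genes out of 27 and out of 28) can be summarized with a few set operations. In this Python sketch the gene identifiers are hypothetical placeholders (G0, G1, ...) chosen only to reproduce the reported set sizes; they are not the actual gene lists in Supplemental Table 1.

```python
def overlap_summary(a, b):
    """Shared-gene count, fraction of each set that is shared, and Jaccard index."""
    shared = a & b
    return {
        "shared": len(shared),
        "fraction_of_a": len(shared) / len(a),
        "fraction_of_b": len(shared) / len(b),
        "jaccard": len(shared) / len(a | b),
    }

# Placeholder sets sized like the reported factors: 27 and 28 genes, 20 in common.
literature_factor = {f"G{i}" for i in range(27)}       # G0..G26
respiratory_factor = {f"G{i}" for i in range(7, 35)}   # G7..G34

summary = overlap_summary(literature_factor, respiratory_factor)
for key, value in summary.items():
    print(key, round(value, 3))
```

With these sizes, the shared genes make up about 74% and 71% of the two factors, and the Jaccard index (shared over union) is 20/35, or about 0.57.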
We next sought to show that our acute respiratory viral gene expression factor was specific for viral infections. We used microarray datasets available in the literature, derived from PBMC mRNA from a cohort of pediatric patients with microbiologically proven S. pneumoniae (n=13), S. aureus (n=31), or E. coli (n=29) infection. We used the acute respiratory viral classifier built on the three combined challenge datasets to predict disease state (influenza A infection versus bacterial infection) in the literature cohort (Figure 4). Classification was highly accurate: 80% (73/91) for influenza infection versus any bacterial infection (Figure 4) and 93% (31/33) for influenza infection versus pneumococcal infection (data not shown). This analysis confirms that the viral infection signature distinguishes subjects with acute respiratory viral infection not only from uninfected subjects but also from subjects with acute bacterial infections, including bacterial respiratory infection. Ultimately, the most clinically valuable distinction may be between the host response to viral respiratory tract infection and that to bacterial pneumonia (e.g., S. pneumoniae infection). Thus, despite inherent differences in sample acquisition and study design between the experimental HRV, RSV, and influenza cohorts and the literature cohort, these analyses confirm the robustness of gene expression signatures that differentiate subjects with respiratory viral infection from subjects with bacterial infections, including pneumococcal infection, and from healthy subjects.
We performed three independent human viral challenge studies (HRV, RSV, and influenza) to define host-based peripheral blood gene expression patterns characteristic of response to viral respiratory infection. The results provide clear evidence that a unique biologically relevant peripheral blood gene expression signature classifies respiratory viral infection with a remarkable degree of accuracy. These findings underscore the conserved nature of the host response to viral infection, which is also evident in the cross-validation between experimental cohorts. The “acute respiratory viral” gene expression signature derived from these cohorts was validated in an independently derived external dataset, and, importantly, can distinguish respiratory viral infection from bacterial infection. These findings provide compelling evidence that peripheral blood gene expression can function as a biomarker for specific classes of infectious pathogens and may potentially serve as a useful diagnostic for triaging treatment decisions for ARI.
Discrimination between infectious causes of illness is a critical component of acute care of medical patients, as such distinctions facilitate both triage and treatment decisions. While traditional culture, antigen-based, and PCR-based diagnostics are useful for pathogen classification, these assays have limitations (Bryant et al., 2004; Campbell and Ghazal, 2004). Current rapid diagnostic methods either lack sensitivity, as with influenza and RSV tests (e.g., BinaxNOW antigen testing) that report sensitivities of 53-80% (Jonathan, 2006; Landry et al., 2008; Rahman et al., 2008), or are labor-intensive, as with direct fluorescent antibody (DFA) testing. Categorizing infection based on the host response is an emerging approach that not only enhances our diagnostic capabilities but may also provide insight into the pathobiology of infection. We have identified gene expression patterns that characterize the host response to viral infection and identify infected individuals with a high degree of accuracy. Several lines of evidence validate our findings, including internal cross validation between exposure cohorts and validation in the free-living pediatric cohort with influenza A and bacterial infection (Ramilo et al., 2007). Other investigators have identified host gene expression patterns in nasal epithelium that are associated with viral infection. Genes differentially expressed in nasal epithelium exposed to HRV-16 (in vitro and in experimentally infected subjects) were similar to those found in peripheral blood in the current study (Proud et al., 2008). In particular, RSAD2 (viperin), a potential antiviral molecule (Chin and Cresswell, 2001; Jiang et al., 2008; Wang et al., 2007b), was the most highly differentially expressed gene in nasal epithelium between infected and uninfected individuals at 48 hours post inoculation.
Our HRV (HRV-16) predictive factor included RSAD2 (viperin) and the probit regression model selected it as the key differentially expressed gene in blood for determining infected state in the HRV cohort. Whole blood gene expression studies looking at RSV infection in hospitalized infants shared differentially expressed genes with the RSV factor found in our study, with a predominance of interferon-response elements, FCγ1AR, and OAS3 (Fjaerli et al., 2006). Finally, data from the naturally-occurring influenza A/bacterial infection study (Ramilo et al., 2007) confirmed a distinct host response signature to viral infection occurring both in this cohort and our experimentally infected cohorts. Taken together, this provides strong evidence for highly accurate in vivo detection of human viral respiratory infection through analysis of peripheral blood gene expression. Notably, different peripheral blood immune cell types induce varying gene expression programs in response to pathogen exposure. Thus, the peripheral blood gene expression signatures derived and validated in these cohorts may only be applicable to individuals without underlying immune deficiencies. Additional studies in immune deficient populations will be needed to generalize the current findings to these rare but clinically important patient subsets.
As the genes in each factor show, the signatures that discriminate subjects with symptomatic respiratory viral infection from healthy subjects and from subjects with bacterial infection contain biologically plausible gene networks involved in the host viral response. The acute respiratory viral factor was most heavily represented by genes in the interferon signaling canonical pathway (p = 9.75 × 10⁻⁹) and the pattern recognition pathway for bacteria and viruses (p = 5.67 × 10⁻⁵). This over-representation of interferon response elements remained when the individual viral challenges were analyzed separately (HRV p = 1.38 × 10⁻¹⁰, RSV p = 2.25 × 10⁻⁹, influenza p = 1.25 × 10⁻⁷) (www.ingenuity.com). Overlap between the genes defining each factor (discriminating symptomatic from asymptomatic individuals, or viral respiratory infection from bacterial infection) was strong. Baseline gene expression was similar among all challenge subjects and indistinguishable from later timepoints in asymptomatic subjects, and classification of subjects in one cohort based on the other cohorts was remarkably accurate. Discovery of discriminant factors such as this one is inherently blind to biology, as the factor model is not aware of the data labels. Despite differences in study design, commonalities between experimentally infected adults with HRV, RSV, or influenza A and community-infected children with influenza A predominated over virus-specific aspects of each signature. However, when selecting the gene or genes with the greatest discriminating power for leave-one-out cross validation, the model chose different genes for each viral illness (HRV: RSAD2; RSV: RTP4; influenza A: ISG15; viral vs. bacterial: IFI27, RSAD2, IFI6, CXCL10, FLJ20035, GBP1, and SIGLEC1; viral vs. S. pneumoniae: RSAD2).
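The pathway p-values above come from Ingenuity's analysis, whose exact computation is not described here. A common way to compute such over-representation p-values is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): with N genes in the background, K of them in a pathway, and a factor of n genes of which k fall in the pathway, the p-value is P(X ≥ k) = Σ_{i=k}^{min(K,n)} C(K,i)·C(N−K, n−i) / C(N,n). The Python sketch below implements that formula; the example counts are hypothetical.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes in the pathway,
    n: genes in the factor, k: factor genes that fall in the pathway.
    """
    if not (0 <= k <= n <= N and K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of drawing k, k+1, ... pathway genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Hypothetical: a 30-gene factor containing 8 genes from a 40-gene pathway,
# against a background of 12,000 genes.
print(f"{enrichment_p(8, 30, 40, 12000):.2e}")
```

Python's exact integer arithmetic keeps the binomial coefficients precise even for large backgrounds, so the tail sum is not affected by floating-point cancellation.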
Thus, with careful exploration of disease biology or additional validation cohorts, disease-specific markers of infection may emerge and add specificity to the diagnostic signatures. Overlap is minimal between these genes and those differentially expressed in other studies of peripheral blood responses to environmental stress, including exposure to ionizing radiation, the genotoxic stress of chemotherapy, and LPS (Dressman et al., 2007; Meadows et al., 2008), which makes it less likely that these genes belong to a generalized response program inherent to immune effector cells.
Despite differences in data acquisition and processing, gene expression patterns derived from publicly available microarray data for individuals with influenza A infection were similar to those of subjects with experimentally acquired symptomatic HRV, RSV, or influenza A infection. Genes characterizing the response to respiratory viral infection in our cohorts overlap with genes found in many gene expression studies of the host response to viral infection, both in vivo (Bhoj et al., 2008; Proud et al., 2008; Ramilo et al., 2007) and in vitro (Jenner and Young, 2005). This generalizability has two implications. First, the host response to respiratory viral infection is robust and conserved, such that it can be discerned in divergent patient populations (healthy adult volunteers experimentally infected with HRV or RSV and children hospitalized with influenza A). Second, a pathogen-specific response dominates over a generalized “infection” response at the time of peak symptoms, as discrimination between viral and bacterial infection is possible. The ability of these signatures to differentiate between pathogen classes (viral versus bacterial) clearly distinguishes these findings from current methods of classifying infectious or inflammatory illness (e.g., peripheral white blood cell count or measurement of inflammatory markers such as C-reactive protein). The sensitivity and specificity of these markers, both in our experimental setting and when applied to a cohort from the literature, represent an improvement over current rapid diagnostics (e.g., rapid antigen testing) as well as current culture-based diagnostics. A combination of these tests may ultimately offer the best sensitivity and specificity for disease diagnosis.
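Sensitivity and specificity, the quantities used above to compare these markers with rapid antigen tests, come from a 2 × 2 confusion matrix, whereas the overall accuracies reported earlier pool both classes. The Python sketch below makes that distinction explicit; the counts are hypothetical and chosen only to illustrate the calculation.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and overall accuracy from confusion-matrix counts.

    tp/fn: infected subjects called positive/negative;
    tn/fp: comparison subjects (uninfected or bacterial) called negative/positive.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Hypothetical counts: an imbalanced cohort in which accuracy hides weak sensitivity.
print(diagnostic_metrics(tp=6, fn=4, tn=88, fp=2))
```

With these made-up counts, accuracy is 94% even though sensitivity is only 60%, which is why sensitivity and specificity are worth reporting separately when the classes are imbalanced.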
These data provide important support for the concept that host peripheral blood gene expression may be a valuable tool, alone or in conjunction with standard microbiologic testing, for diagnosing infectious diseases. Validation in an additional community-based cohort, as well as development of signatures to diagnose pre-symptomatic viral respiratory infection, is desirable.
An important question is whether the changes in host gene expression described here occur before peak symptoms. While still preliminary, we have time course data on subsets of these cohorts. The factor analysis was applied to the RSV, HRV, and influenza data from all samples at all times, from which the factor discussed above (Factor 16) was constructed. Figure 5 plots the factor score (strength) of the discriminative factor as a function of time. The two curves represent average factor scores, computed separately for subjects who would eventually become symptomatic and for those who would not. The difference in factor scores between individuals who remain asymptomatic and those who become symptomatic reaches statistical significance (p = 0.028) at 45.5 hours after inoculation, so the factor is detectable before peak symptoms among symptomatic individuals. Thus, using the host response as the diagnostic paradigm, presymptomatic diagnosis may be possible.
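The text does not state which test produced p = 0.028 at 45.5 hours. As one assumption-laden illustration, the Python sketch below compares the mean factor scores of eventually symptomatic and asymptomatic subjects at a single time point with an exact two-sided permutation test; the scores are hypothetical.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Enumerate every way of relabeling n_a of the pooled scores as group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # tolerance for float ties
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical factor scores at one time point after inoculation.
symptomatic = [2.1, 2.5, 1.9, 2.8, 2.2]
asymptomatic = [0.1, -0.3, 0.4, 0.0, 0.2]
print(exact_permutation_p(symptomatic, asymptomatic))
```

With five subjects per group and complete separation, the smallest attainable two-sided p-value is 2/252 (about 0.008), which this example reaches; exact permutation tests therefore have limited resolution in cohorts this small.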
Signature validation across experimentally infected cohorts illustrates the robust nature of the host response to viral infection. Additional validation of the gene expression signatures in other community-based cohorts could elevate these findings to a true diagnostic test that could enhance or supersede traditional microbiologic diagnostics. Such data would be especially valuable if they could be used to diagnose infection class before standard microbiologic studies are available (i.e., in the early phases of disease) or to indicate prognosis following disease acquisition or therapeutic intervention. In our study, we used an easily obtained sample (peripheral blood) to characterize the response to a respiratory infection. Although the data generated in this study are not yet sufficient to develop a diagnostic test that uses host gene expression to characterize or predict infectious diseases, this work represents an important advance in showing that peripheral blood gene expression can characterize the host response to infection.
All exposures were approved by the relevant institutional review boards (IRBs) and conducted according to the Declaration of Helsinki. Funding for this study was provided by the US Defense Advanced Research Projects Agency (DARPA) through contract N66001-07-C-2024.
We recruited healthy volunteers via advertisement to participate in the HRV challenge study through an active screening protocol at the University of Virginia (Charlottesville, VA). Subjects who met inclusion criteria provided informed consent and were pre-screened for serotype-specific anti-HRV antibodies approximately two weeks before the study start date. On the day before inoculation, subjects underwent repeat HRV antibody testing as well as baseline laboratory studies, including complete blood count, serum chemistries, and hepatic enzymes. On the day of inoculation, 10⁶ TCID50 of GMP HRV serotype 39 (Charles River Laboratories, Malvern, PA) was inoculated intranasally according to published methods (Drake et al., 2000; Gwaltney et al., 1992; Turner, 2001). Subjects remained in the quarantine facility for 48 hours following HRV inoculation. Blood was sampled into PAXgene™ RNA collection tubes (PreAnalytiX; Franklin Lakes, NJ) at predetermined intervals post inoculation. Nasopharyngeal (NP) lavage samples were obtained from each subject daily for HRV titers to gauge the success and timing of the HRV inoculation. After the 48th hour post inoculation, subjects were released from quarantine and returned on three consecutive mornings for sample acquisition and symptom score ascertainment.
A healthy volunteer intranasal challenge with RSV A was performed in a manner similar to the HRV challenge. The RSV challenge was performed by Retroscreen Virology, Ltd (London, UK) in 20 pre-screened volunteers who provided informed consent. On the day of inoculation, a dose of 10⁴ TCID50 RSV (serotype A), manufactured and processed under current good manufacturing practices (cGMP) by Meridian Life Sciences, Inc. (Memphis, TN, USA), was inoculated intranasally per standard methods. Blood and NP lavage collection methods were similar to those of the HRV cohort but continued throughout the quarantine. Because of the incubation period of RSV A, subjects were not released from quarantine until after the 288th hour and until they were negative by rapid RSV antigen detection (BinaxNow Rapid RSV Antigen; Inverness Medical Innovations, Inc).
A healthy volunteer intranasal challenge with influenza A/Wisconsin/67/2005 (H3N2) was performed at Retroscreen Virology, Ltd (Brentwood, UK) in 17 pre-screened volunteers who provided informed consent. On the day of inoculation, a dose of 10⁶ TCID50 influenza A, manufactured and processed under current good manufacturing practices (cGMP) by Baxter BioScience (Vienna, Austria), was diluted and inoculated intranasally per standard methods at varying dilutions (1:10, 1:100, 1:1000, 1:10000), with four to five subjects receiving each dose. Because of the incubation period, subjects were not released from quarantine until after the 168th hour. Blood and NP lavage collection continued throughout the quarantine. All subjects received oral oseltamivir (Roche Pharmaceuticals) 75 mg twice daily starting on day 6 after inoculation and were negative by rapid antigen detection (BinaxNow Rapid Influenza Antigen; Inverness Medical Innovations, Inc) at the time of discharge.
Symptoms were recorded twice daily using standardized symptom scoring (Jackson et al., 1958). The modified Jackson score requires subjects to rate symptoms of upper respiratory infection (stuffy nose, scratchy throat, headache, cough, etc.) on a 0-3 scale: “no symptoms”, “just noticeable”, “bothersome but can still do activities”, and “bothersome and cannot do daily activities”. Modified Jackson scores were tabulated to determine whether subjects became symptomatic after viral challenge. A modified Jackson score of ≥ 6 over the quarantine period was the primary indicator of successful viral infection (Turner, 2001), and subjects with this score were denoted “symptomatic, infected”. Viral titers from daily nasopharyngeal washes, measured by quantitative culture, were used as corroborative evidence of successful infection (Barrett et al., 2006; Jackson et al., 1958; Turner, 2001).
Subjects were classified as “asymptomatic, not infected” if the Jackson score was less than 6 over the five days of observation and viral shedding was not documented after the first 24 hours following inoculation. Standardized symptom scores were tabulated at the end of each study to determine the attack rate and the time of maximal symptoms (time “T”).
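The classification rules above can be written as a small function. This Python sketch assumes a subject's cumulative modified Jackson score is the sum of all 0-3 symptom ratings across the records supplied, applies the 24-hour shedding cutoff to both groups, and labels subjects who meet neither rule as indeterminate; the function names and input layout are hypothetical.

```python
VALID_RATINGS = {0, 1, 2, 3}  # "no symptoms" ... "bothersome and cannot do daily activities"

def cumulative_jackson(records):
    """Sum 0-3 symptom ratings across all score records (e.g. one dict per day)."""
    total = 0
    for record in records:
        for symptom, rating in record.items():
            if rating not in VALID_RATINGS:
                raise ValueError(f"{symptom}: rating {rating} is outside 0-3")
            total += rating
    return total

def classify_subject(records, shedding_hours, threshold=6):
    """Label a subject from symptom records and the hours post inoculation
    at which virus was detected in nasopharyngeal washes."""
    score = cumulative_jackson(records)
    shed_after_24h = any(h > 24 for h in shedding_hours)
    if score >= threshold and shed_after_24h:
        return "symptomatic, infected"
    if score < threshold and not shed_after_24h:
        return "asymptomatic, not infected"
    return "indeterminate"

days = [{"stuffy nose": 1}, {"stuffy nose": 2, "scratchy throat": 1}, {"stuffy nose": 2, "cough": 1}]
print(classify_subject(days, shedding_hours=[48, 72]))
```

Returning an explicit "indeterminate" label, rather than forcing every subject into one of the two groups, mirrors the study's practice of excluding subjects whose symptom and shedding data disagree.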
Subjects had the following samples taken 24 hours before inoculation with virus (baseline), immediately before inoculation (pre-challenge), and at set intervals following challenge: peripheral blood for serum and plasma, peripheral blood for PAXgene™ RNA, NP wash for viral culture/PCR, urine, and exhaled breath condensate (EBC). For the HRV challenge, peripheral blood was taken at baseline, then at 4 hour intervals for the first 24 hours, 6 hour intervals for the next 24 hours, 8 hour intervals for the next 24 hours, and 24 hour intervals for the remaining 3 days of the study. For the RSV and influenza challenges, peripheral blood was taken at baseline, then at 8 hour intervals for the first 120 hours and at 24 hour intervals for two further days. For all cohorts, NP washes, urine, and EBC were taken at baseline and every 24 hours. Samples were aliquoted and frozen immediately at −80°C. This study focuses on comparing baseline samples with PAXgene™ RNA samples taken at the time of peak symptoms. PAXgene™ RNA from the timepoint of maximal symptoms was hybridized to Affymetrix U133a human microarrays for further analysis. For all results reported, gene expression signatures were evaluated at the time of maximal symptoms following viral inoculation for symptomatic subjects and at a matched timepoint for asymptomatic subjects. Baseline (pre-inoculation) samples were also analyzed.
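The blood-sampling schedules above can be generated from (interval, block length) pairs. The Python sketch below encodes one reading of the text: each block starts where the previous one ended, and the baseline (−24 h) and pre-challenge (0 h) draws are included; the function name and block encoding are illustrative.

```python
def sampling_schedule(blocks, pre_inoculation=(-24, 0)):
    """Hours relative to inoculation at which blood is drawn.

    blocks: consecutive (interval_hours, block_length_hours) pairs after inoculation.
    """
    times = list(pre_inoculation)
    block_start = 0
    for interval, length in blocks:
        t = block_start + interval
        while t <= block_start + length:
            times.append(t)
            t += interval
        block_start += length
    return times

HRV_BLOCKS = [(4, 24), (6, 24), (8, 24), (24, 72)]  # 4 h, 6 h, 8 h, then daily for 3 days
RSV_FLU_BLOCKS = [(8, 120), (24, 48)]               # 8 h for 120 h, then daily for 2 days

print(sampling_schedule(HRV_BLOCKS))
print(sampling_schedule(RSV_FLU_BLOCKS))
```

Under this reading the HRV schedule ends at 144 hours (6 days) and the RSV/influenza schedule at 168 hours, which matches the 168-hour influenza quarantine.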
Raw data from Ramilo et al. (Ramilo et al., 2007) were obtained from the public GEO database (www.ncbi.nlm.nih.gov/geo; accession GSE6269) and analyzed independently using the methods described below.
RNA was extracted from whole blood at Expression Analysis (Durham, NC) using the PAXgene™ 96 Blood RNA Kit (PreAnalytiX, Valencia, CA) according to the manufacturer’s recommended protocol. Complete methodology is given in the Supplementary Methods. Hybridization and microarray data collection were performed at Expression Analysis (Durham, NC) using the GeneChip® Human Genome U133A 2.0 Array (Affymetrix, Santa Clara, CA).
Using only the data from the influenza challenge, we tested each probe for differential expression between subjects who were sick versus healthy at time T using the Kruskal-Wallis test. Owing to the small sample size, no probe showed a significant association after Bonferroni correction for multiple hypotheses. We then jointly analyzed the results from all three trials in an ANOVA framework. In addition to the intercept term, the design matrix included indicators for sick versus healthy, for t0 versus tmax, and for each of rhinovirus and RSV, plus interaction terms for rhinovirus × sick and RSV × sick (Supplemental Analysis).
Following RMA normalization of raw probe data, sparse latent factor regression analysis was applied to each dataset (Aziz et al., 2007; Carvalho et al., 2008; Lucas et al., 2006; Wang et al., 2007a). This reduces the dimensionality of the complex gene expression dataset, on the assumption that many of the probe sets on the array are highly interrelated (targeting the same genes or genes in the same pathways). Dimension reduction is performed by constructing factors (groups of genes with related expression values), which are used in a sparse linear regression framework to explain the variation seen across all of the probe sets. By default, most of the coefficients in this regression are zero, so a small number of factors (e.g. 20) explains the variation seen in any single dataset. Factor loadings are defined as the coefficients of the factor regression; to explore the biological relevance of any particular factor, we examine the genes that are “in” that factor, i.e. the genes with significantly non-zero factor loadings. Factor scores are defined as the vector that best describes the co-expression of the genes in a particular factor. Factor loadings and factor scores are fit to the data concurrently.
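The per-probe test and the joint design matrix described above can be sketched with NumPy and SciPy's `scipy.stats.kruskal`. This is an illustration under stated assumptions, not the authors' pipeline: the helpers `kruskal_bonferroni` and `anova_design` are hypothetical, the input is assumed to be an already RMA-normalized probes × subjects matrix, and influenza is taken as the reference virus level (so only HRV and RSV receive indicator columns). The sparse latent factor model itself is not reproduced here.

```python
import numpy as np
from scipy.stats import kruskal


def kruskal_bonferroni(expr, sick, alpha=0.05):
    """Per-probe Kruskal-Wallis test of sick vs healthy, Bonferroni-corrected.

    expr: (n_probes, n_subjects) array of normalized expression values.
    sick: boolean array of length n_subjects.
    Returns raw p-values and a mask of probes with p < alpha / n_probes.
    """
    expr = np.asarray(expr, dtype=float)
    sick = np.asarray(sick, dtype=bool)
    pvals = np.array([kruskal(row[sick], row[~sick]).pvalue for row in expr])
    significant = pvals < alpha / expr.shape[0]
    return pvals, significant


def anova_design(sick, at_tmax, virus):
    """Design matrix with columns:
    intercept, sick, tmax, HRV, RSV, HRV x sick, RSV x sick.
    Influenza is the (assumed) reference level for the virus indicators.
    """
    sick = np.asarray(sick, dtype=float)
    tmax = np.asarray(at_tmax, dtype=float)
    virus = np.asarray(virus)
    hrv = (virus == "HRV").astype(float)
    rsv = (virus == "RSV").astype(float)
    return np.column_stack(
        [np.ones_like(sick), sick, tmax, hrv, rsv, hrv * sick, rsv * sick]
    )


# Per-probe effects could then be estimated by ordinary least squares, e.g.
# coef, *_ = np.linalg.lstsq(X, expr.T, rcond=None)
```

For example, with 5 sick and 5 healthy subjects, even perfect separation gives a Kruskal-Wallis p-value of roughly 0.009 under the chi-square approximation, far above a Bonferroni threshold of 0.05 divided by the roughly 22,000 probe sets on the array, which illustrates why small challenge cohorts struggle to reach significance probe by probe.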
While 20 factors were used for the results reported here, we also considered 30 and 40 factors, with minimal effect on the significant factor loadings. The initial models were derived using an unsupervised process (Acharya et al., 2008); that is, the model classified subjects based on gene expression pattern alone, without a priori knowledge of infection status. The top 30 genes in each factor were used as features for a sparse probit regression model, which was evaluated by leave-one-out cross-validation to generate ROC curves and estimate the performance of the algorithm. The probit regression model selects the “top” predictive gene from the gene set for sample classification and generation of an ROC curve. The factor most discriminative between the asymptomatic and symptomatic states was then validated using labeled data. Validation between datasets (HRV, RSV, and influenza A) was performed by training the regression model on one dataset (i.e. one viral exposure) and using it to predict health or disease in a different dataset (i.e. a different viral exposure). Validation against the publicly available dataset was performed by running the joint factor analysis on the viral exposure datasets (HRV, RSV, and influenza), building a probit classifier from the top 30 genes of the most predictive factor, and applying this classifier to the publicly available dataset to estimate the predictive performance of the acute respiratory viral classifier.
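A leave-one-out ROC evaluation of a probit classifier can be sketched as follows. This is a simplified stand-in, not the authors' method: the sparse probit model is replaced by an L2-penalized probit fit with `scipy.optimize.minimize`, the feature matrix `X` is assumed to already hold the selected features (for example, the top 30 genes of one factor), and the helpers `fit_probit`, `predict_proba`, `loocv_scores`, and `auc` are hypothetical names. The area under the ROC curve is computed from the Mann-Whitney rank statistic, which equals the AUC obtained by tracing the full ROC curve.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, rankdata


def fit_probit(X, y, lam=1.0):
    """L2-penalized probit regression; returns (intercept, coefficients)."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    Xd = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    penalty = np.r_[0.0, np.full(X.shape[1], lam)]  # intercept unpenalized

    def objective(beta):
        z = Xd @ beta
        loglik = np.sum(y * norm.logcdf(z) + (1 - y) * norm.logcdf(-z))
        # Derivatives of log Phi(z) and log Phi(-z), computed in log space.
        r1 = np.exp(norm.logpdf(z) - norm.logcdf(z))
        r0 = np.exp(norm.logpdf(z) - norm.logcdf(-z))
        grad = Xd.T @ (y * r1 - (1 - y) * r0)
        value = -loglik + 0.5 * np.sum(penalty * beta**2)
        return value, -grad + penalty * beta

    res = minimize(objective, np.zeros(Xd.shape[1]), jac=True, method="L-BFGS-B")
    return res.x[0], res.x[1:]


def predict_proba(model, X):
    """Probit probability of the positive (symptomatic) class."""
    b0, b = model
    return norm.cdf(b0 + np.asarray(X, float) @ b)


def loocv_scores(X, y, lam=1.0):
    """Score each subject with a model trained on all other subjects."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    scores = np.empty(len(y))
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        model = fit_probit(X[train], y[train], lam)
        scores[i] = predict_proba(model, X[i : i + 1])[0]
    return scores


def auc(scores, y):
    """Area under the ROC curve via the Mann-Whitney rank statistic."""
    y = np.asarray(y, bool)
    ranks = rankdata(scores)  # average ranks handle tied scores
    n1, n0 = y.sum(), (~y).sum()
    return (ranks[y].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)
```

Cross-dataset validation follows the same pattern without the leave-one-out loop: fit `fit_probit` on one challenge (say, HRV) and score another (say, RSV) with `predict_proba`, provided both matrices contain the same genes in the same column order.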
The authors wish to acknowledge Daphne Jones, Stephanie Dobos and Kyle Breitschwerdt for data management; L. Brett Caram, MD, for protocol design; and Anil Potti, MD, for critical review of the manuscript. This work was supported by funding from the Defense Advanced Research Projects Agency (DARPA), contract N66001-07-C-0092 (G.S.G.).
Aimee K. Zaas, Division of Infectious Diseases and International Health; Department of Medicine; Institute for Genome Sciences and Policy; Duke University School of Medicine; Durham, NC.
Minhua Chen, Department of Electrical and Computer Engineering; Duke University, Durham, NC.
Jay Varkey, Division of Infectious Diseases and International Health; Department of Medicine; Institute for Genome Sciences and Policy; Duke University School of Medicine; Durham, NC.
Timothy Veldman, Institute for Genome Sciences and Policy; Duke University; Durham, NC.
Alfred O. Hero, III, Department of Electrical Engineering and Computer Science; University of Michigan, Ann Arbor, MI.
Joseph Lucas, Institute for Genome Sciences and Policy; Duke University; Durham, NC.
Yongsheng Huang, Department of Electrical Engineering and Computer Science; University of Michigan, Ann Arbor, MI.
Ronald Turner, University of Virginia School of Medicine; Charlottesville, VA.
Anthony Gilbert, Retroscreen Virology; Brentwood, UK.
Robert Lambkin-Williams, Retroscreen Virology; Brentwood, UK.
N. Christine Øien, Institute for Genome Sciences and Policy; Duke University; Durham, NC.
Bradly Nicholson, Division of Infectious Diseases; Durham Veterans Affairs Medical Center; Durham, NC.
Stephen Kingsmore, National Center for Genome Research; Santa Fe, NM.
Lawrence Carin, Department of Electrical and Computer Engineering; Duke University, Durham, NC.
Christopher W. Woods, Division of Infectious Diseases and International Health; Department of Medicine; Institute for Genome Sciences and Policy; Duke University School of Medicine; Durham, NC.
Geoffrey S. Ginsburg, Institute for Genome Sciences and Policy; Duke University; Durham, NC.