

HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal.
Curr Opin Immunol. Author manuscript; available in PMC 2013 June 1.
Published in final edited form as:
PMCID: PMC3383371

Identifying gnostic predictors of the vaccine response


Molecular predictors of the response to vaccination could transform vaccine development. They would allow larger numbers of vaccine candidates to be rapidly screened, shortening the development time for new vaccines. Gene-expression based predictors of vaccine response have shown early promise. However, a limitation of gene-expression based predictors is that they often fail to reveal the mechanistic basis for their ability to classify response. Linking predictive signatures to the function of their component genes would advance basic understanding of vaccine immunity and also improve the robustness of outcome classification. New analytic tools now allow more biological meaning to be extracted from predictive signatures. Functional genomic approaches to perturb gene expression in mammalian cells permit the function of predictive genes to be surveyed in highly parallel experiments. The challenge for vaccinologists is therefore to use these tools to embed mechanistic insights into predictors of vaccine response.


Vaccination remains one of the most effective methods to prevent human disease, yet we understand very little about the precise mechanisms that lead to protective immunity in vaccinated individuals. The consequences of this knowledge gap are profound. Effective vaccines are lacking for pathogens such as HIV and malaria that, in aggregate, afflict billions of people [1]. And although vaccines against influenza virus, pneumococcus and varicella zoster virus are very effective in healthy adults, they fail to elicit protective immunity in a substantial proportion of the very young and the elderly, who are precisely the people most susceptible to infection.

One obstacle in the development of more effective vaccines is the lack of predictors of vaccine efficacy [2]. Ideally, it would be possible to take a blood sample from a vaccinated individual shortly after vaccination and measure informative parameters that would predict whether that individual would go on to develop protective immunity or not. Predictors of vaccine efficacy would allow a larger number of vaccine candidates to be rapidly evaluated in a series of short clinical trials, accelerating the identification of the most promising approaches and reducing drug development cycles by years.

To develop predictive measures of immunity, immunologists have started to apply an armamentarium of large-scale, highly parallel assays to identify predictive features in biological samples from vaccinees [3,4]. Tools to measure genome-wide profiles of transcript abundance, splice-variant representation, non-coding RNA levels, metabolites, protein abundance, epigenetic changes and germline polymorphisms have become widely accessible and frequently used. Indeed, recent studies have demonstrated that transcriptional profiling of peripheral blood mononuclear cells (PBMCs) a few days after vaccination reveals distinctive patterns of gene expression, and mathematical models based on these gene expression signatures can predict the subsequent development of protective immunity [5–7].

However, the challenge for systems vaccinology is that identifying gene expression signatures that predict vaccine outcome is not the same as understanding how that vaccine elicits immunity. In this review, we will argue that systems biology approaches to find molecular predictors of outcome must be twinned with efforts to extend the data to discover how vaccines work. Pursuing “gnostic” predictors, whose component genes have a known role in the vaccine response, is not only important for advancing understanding of vaccine immunology but will also improve the classification of vaccine outcome.

We will review the types of information that can be learned through molecular predictor development in vaccine research, and why the computational algorithms and experimental designs currently used to identify predictors may not by themselves be sufficient to drive basic science discovery. Finally, we will discuss potential ways in which the development of predictors can be linked to increased knowledge of how effective vaccines work.

Building predictors of vaccine response

Predictors of the vaccine response are needed because individuals within an apparently homogeneous cohort can respond heterogeneously to vaccination: not everyone who is vaccinated will mount a protective T or B cell response. The rationale underlying the development of predictors is that this diversity in response will be accompanied by a correspondingly diverse set of changes in the cellular compartments of the immune system shortly after vaccination. The early changes that best correspond to the eventual response can be used as pharmacodynamic markers of vaccine efficacy. However, the rational selection of such correlates of immune protection has not been straightforward [2,3]. Gene expression profiling has been used increasingly for this purpose because the complex biological processes that underlie an effective vaccine response are likely to involve not single genes but many genes acting in combination.

The initial steps in generating a predictor involve measuring gene expression profiles in a relevant tissue in a cohort of individuals for whom the outcome of vaccination (such as antibody titer) is known. In cancer genomics, predictors of outcome have been based on gene expression profiles of the cancer cells themselves. In the immune system, however, there is no analogous single tissue from which to sample cells for dissecting biology and creating predictors. The immune system spans multiple lineages, is anatomically distributed and is highly inter-regulated, so sampling all of its cellular components and assaying their gene expression profiles is not feasible. However, two critical features of the immune response provide an opportunity to apply genomic approaches to the study of vaccine responses.

First, cells of the immune system are easily accessible in peripheral blood. Each blood sample provides a snapshot of many lineages and dozens of differentiation states within the immune system. Moreover, because migration and trafficking are ongoing features of the immune response, peripheral blood leukocytes include recent emigrants from peripheral tissues, including vaccine sites. Second, cells of the immune system are uniquely sensitive to perturbation: vaccination, infection and inflammation cause marked changes in the gene expression profiles of peripheral blood leukocytes [8–11]. Thus the population of immune cells in the peripheral blood can serve as a sensitive bellwether of localized or systemic immunologic events. For this reason, efforts to date have focused on analyzing gene expression profiles of PBMC from vaccinated individuals.

First, the tens of thousands of genes profiled in post-vaccination PBMC are surveyed for the individual genes that correspond most closely to the outcome of interest (here, antibody titer), a process known as feature selection. These features are then compiled into a mathematical model that predicts whether a given sample came from an individual who will go on to have a good or a poor vaccine response. Many variants of feature selection and predictive modeling have been developed; however, none is clearly superior in all cases [12].
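As an illustration of these two steps, the sketch below ranks genes by a Welch t statistic between high and low responders and feeds the top-ranked genes into a nearest-centroid classifier. This is only one simple choice among the many variants mentioned above, not the method of any of the cited studies; the function names and the toy data are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic for group a (responders) versus group b (non-responders)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def select_features(X, y, k):
    """Return the indices of the k genes with the largest |t| between classes.

    X: list of samples, each a list of expression values (one per gene).
    y: class labels, 1 = high responder, 0 = low responder.
    """
    scores = []
    for g in range(len(X[0])):
        hi = [x[g] for x, lab in zip(X, y) if lab == 1]
        lo = [x[g] for x, lab in zip(X, y) if lab == 0]
        scores.append((abs(welch_t(hi, lo)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]

def fit_centroids(X, y, genes):
    """Nearest-centroid model: per-class mean of each selected gene."""
    return {lab: [mean([x[g] for x, l in zip(X, y) if l == lab]) for g in genes]
            for lab in set(y)}

def predict(centroids, genes, x):
    """Assign sample x to the class with the closest centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((x[g] - cg) ** 2 for g, cg in zip(genes, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

# Toy data (hypothetical): gene 0 separates responders from non-responders.
X = [[5.0, 1.0, 2.0], [5.2, 1.1, 2.1], [4.8, 0.9, 1.9],
     [1.0, 1.0, 2.0], [1.2, 0.9, 2.1], [0.8, 1.1, 1.9]]
y = [1, 1, 1, 0, 0, 0]
genes = select_features(X, y, k=1)
model = fit_centroids(X, y, genes)
print(genes, predict(model, genes, [4.5, 1.0, 2.0]))
```

In real data k would be tens of genes rather than one, and the classifier could be swapped for any of the alternatives compared in [12].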

The next step is to determine the predictor's accuracy, for which there are two main approaches. In the first, one or more samples are withheld from the dataset before feature selection and model building, the accuracy of the resulting model is tested on the withheld samples, and the process is repeated until every sample has been withheld in turn (cross-validation; when one sample is withheld at a time, this is leave-one-out cross-validation). The second approach is more rigorous: it uses one dataset to build the model (a training set) and a second dataset to test it (a test set, ideally from an independent source such as a second cohort of vaccinees). The end result is a group of genes (usually around 10–100) and a mathematical model that can be used to predict the outcome of vaccination [13].
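A minimal sketch of leave-one-out cross-validation follows, assuming a signal-to-noise gene ranking and a nearest-centroid rule (both chosen here only for brevity; names and data are hypothetical). The point it illustrates is that feature selection has to be repeated inside every fold: choosing genes once on the full dataset and then cross-validating lets the held-out sample influence the model and inflates the apparent accuracy.

```python
from statistics import mean, stdev

def snr(values, labels):
    """Signal-to-noise score (mean1 - mean0) / (sd1 + sd0) for a single gene."""
    a = [v for v, l in zip(values, labels) if l == 1]
    b = [v for v, l in zip(values, labels) if l == 0]
    denom = stdev(a) + stdev(b)
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0

def train(X, y, k):
    """Select the k genes with the largest |SNR|, then store class centroids for them."""
    genes = sorted(range(len(X[0])),
                   key=lambda g: -abs(snr([x[g] for x in X], y)))[:k]
    cents = {lab: [mean([x[g] for x, l in zip(X, y) if l == lab]) for g in genes]
             for lab in (0, 1)}
    return genes, cents

def classify(model, x):
    """Nearest-centroid prediction using only the selected genes."""
    genes, cents = model
    return min(cents, key=lambda lab: sum((x[g] - c) ** 2
                                          for g, c in zip(genes, cents[lab])))

def loocv_accuracy(X, y, k):
    """Leave-one-out cross-validation. Feature selection is redone inside every fold,
    so the held-out sample never influences which genes are chosen."""
    correct = 0
    for i in range(len(X)):
        model = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:], k)
        correct += classify(model, X[i]) == y[i]
    return correct / len(X)

# Toy data (hypothetical): four high and four low responders, three genes.
X = [[5.0, 1.0, 2.0], [5.3, 1.2, 2.1], [4.7, 0.8, 1.9], [5.1, 1.1, 2.2],
     [1.0, 1.0, 2.0], [1.3, 0.9, 2.1], [0.7, 1.2, 1.8], [1.1, 0.8, 2.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(loocv_accuracy(X, y, k=1))
```

For the second, independent-cohort approach, one would call `train` once on the training cohort and apply `classify` to every sample of the test cohort.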

Function-agnostic gene expression based predictors

Previous experience with gene expression-based predictors has been complicated by a lack of consistency between predictors developed in the same disease, and the difficulty in reproducing the results on fresh data sets [14]. These problems have induced a degree of skepticism about the general applicability of gene expression predictors of clinical outcome.

However, a larger problem is that gene expression-based predictors often fail to provide a biological mechanism to explain their predictive power. In cancer biology, for instance, several studies have identified signatures that predict outcome in breast cancer. Based on a priori knowledge of the genes in these signatures, hypotheses about the underlying mechanisms were proposed, including wound response [15], hypoxia [16] and invasiveness [17]. In most cases, however, the studies provided no evidence that mechanisms related to these hypotheses actually dictated outcome [18].

Pursuing mechanistic understanding from signatures predictive of vaccine response is more than simply biological purism. Gene expression based predictors should reflect phenotypic distinctions underlying the phenomenon (in this case vaccine response) under study. It is reasonable to assume that the closer one can get to identifying the molecular features of the underlying mechanisms that drive the difference in vaccine outcome, the more robust and generalizable a predictor will be.

The corollary is that predictors based on features only distantly related to the biological basis of the vaccine response are likely to be variable and hard to reproduce. Indeed, one concern about gene expression predictors of outcome in breast cancer is that the predictive signatures identified in different studies have dismayingly few genes in common [18]. Subsequent re-analyses of these signatures have revealed that although the specific genes selected in each were different, they share a common feature: overrepresentation of genes related to the cell cycle [19]. This raises the possibility that each predictor selected different genes that detected the same biology, namely the relationship between increased tumor proliferation and poor prognosis. The challenge for vaccinologists, therefore, is to entwine predictor development with mechanistic discovery in order to develop the most robust and reproducible signatures.
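Overrepresentation of a functional category in a gene list is commonly tested with a one-sided hypergeometric test. The sketch below is a generic version of that test, not necessarily the analysis used in [19], and the gene counts are hypothetical. It shows how two signatures that share few genes can each still be strongly enriched for the same category.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(overlap >= k) when a signature of n genes is drawn at random from a
    universe of N genes, K of which belong to the gene set of interest."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

# Hypothetical numbers: 20,000 profiled genes, 500 annotated as cell-cycle genes,
# and a 70-gene prognostic signature that contains 20 of them.
N, K, n, k = 20000, 500, 70, 20
expected = n * K / N  # overlap expected by chance alone
p = hypergeom_sf(k, N, K, n)
print(f"expected {expected:.2f}, observed {k}, p = {p:.2e}")
```

Running the same test on a second signature with a different 20 cell-cycle genes would give a similarly small P value, even though the two gene lists barely overlap.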

Knowledge-based gene expression predictors

The selection of features from microarray-based measures of gene expression is usually done on a gene-by-gene basis, with genes selected to provide non-redundant information about the differences between the classes to be distinguished. Two factors, however, limit this approach. First, the genes that are most differentially expressed (i.e., those with the largest fold change or the smallest adjusted P value) are not necessarily those most important in causing the difference between responders and non-responders. Second, organisms rarely effect changes in cell state through radical changes in small numbers of genes. Rather, alterations in biological state are often reflected in broad changes across coordinated networks of genes, even though the change in each individual gene may be relatively small [20].
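A back-of-the-envelope calculation shows why many small, coordinated changes can be easier to detect as a set than gene by gene. Assume, for illustration only, that each of the n genes in a set S is shifted by δ standard deviations between responders and non-responders, that m samples are available per group, and that the genes carry independent noise of unit variance. The two-sample z statistic for one gene and for the per-sample mean of the set are then approximately:

```latex
z_g \approx \frac{\delta}{\sqrt{2/m}} = \delta\sqrt{\frac{m}{2}}
\qquad \text{(one gene)}

\bar{x}_S = \frac{1}{n}\sum_{g \in S} x_g, \quad
\operatorname{Var}(\bar{x}_S) = \frac{1}{n}
\;\Longrightarrow\;
z_S \approx \frac{\delta}{\sqrt{2/(nm)}} = \delta\sqrt{\frac{nm}{2}} = \sqrt{n}\, z_g
\qquad \text{(gene-set mean)}
```

With δ = 0.3 and m = 10, a single gene gives z_g ≈ 0.67, far from significant, whereas a set of n = 50 such genes gives z_S ≈ 4.7. In real data, positive correlation among genes in a set raises the variance of the set mean above 1/n, so the actual gain is smaller than √n.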

Because of these limitations of gene-by-gene analysis, several groups have developed analytic approaches designed to detect coordinated changes in groups of biologically related genes distributed across a transcriptional network. One such approach, gene set enrichment analysis (GSEA) [21], and its variants [22] have been used to identify significant biological differences between sets of samples that manifest as coordinated up- or down-regulation of biologically related genes. For instance, this approach detected enrichment of a set of genes involved in oxidative phosphorylation in profiles of diabetic muscle compared with normal controls, even though the change in each individual gene was less than 10% [21]. Enrichment algorithms have been strengthened by the development of curated libraries of gene signatures related to pathways [23,24] and derived from experimental data [25]. In immunology, this approach has been used for mechanistic discovery in a wide variety of biological settings (reviewed in [20]), and it is likely to become even more powerful as libraries of immunology-specific gene sets are developed.
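The core of GSEA is a running-sum statistic computed along a ranked gene list. The sketch below follows the weighted running sum described by Subramanian et al. [21] under simplifying assumptions: genes are already ranked by a score such as the correlation with responder status, and the permutation step that GSEA uses to assess significance and normalize for set size is not shown. Gene names and scores are hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score in the style of GSEA [21].

    ranked: list of (gene, score) pairs sorted from most up- to most down-regulated
            in responders versus non-responders.
    gene_set: set of gene names.
    p: weight exponent; p = 0 gives an unweighted Kolmogorov-Smirnov-like statistic.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(s) ** p for (_, s), h in zip(ranked, hits) if h)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must overlap the ranked list but not cover all of it")
    running, es = 0.0, 0.0
    for (_, s), h in zip(ranked, hits):
        # Step up by the gene's weighted score for a hit, down by a fixed amount for a miss.
        running += abs(s) ** p / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es

ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0), ("F", -3.0)]
print(enrichment_score(ranked, {"A", "B"}))  # set concentrated at the top of the list
print(enrichment_score(ranked, {"E", "F"}))  # set concentrated at the bottom of the list
```

A positive score indicates coordinated up-regulation in responders and a negative score coordinated down-regulation, regardless of how small each gene's individual change is.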

More recently, gene set enrichment methods have been applied to the development of gene expression-based predictors [26]. In this approach, the definition of the “features” that distinguish between classes of samples is broadened to include not just individual genes but sets of biologically related genes that are coordinately up- or down-regulated. An aggregate measure of gene set enrichment is calculated for each sample, and the gene sets that best distinguish between the classes are selected as individual features of a predictive model. Some data suggest that gene set approaches make predictors more accurate than single-gene predictors, although it remains to be seen how general that benefit is [27].
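The sketch below shows the general idea of turning gene sets into per-sample features. As a deliberately simple stand-in for the enrichment measures used in [26] and [27], each set is scored in each sample as the mean z-score of its member genes, and sets are then ranked by how well that score separates the classes. The gene names, set names and values are hypothetical.

```python
from statistics import mean, pstdev

def zscore_genes(expr):
    """Standardize each gene across samples. expr: dict gene -> list of per-sample values."""
    z = {}
    for gene, vals in expr.items():
        mu, sd = mean(vals), pstdev(vals)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in vals]
    return z

def gene_set_scores(expr, gene_sets):
    """Per-sample score for each gene set: the mean z-score of its member genes.
    Sets with no members present in expr are skipped."""
    z = zscore_genes(expr)
    n_samples = len(next(iter(z.values())))
    scores = {}
    for name, members in gene_sets.items():
        present = [g for g in members if g in z]
        if present:
            scores[name] = [mean([z[g][j] for g in present]) for j in range(n_samples)]
    return scores

def rank_sets(scores, labels):
    """Order gene sets by the absolute difference in mean score between classes 1 and 0."""
    def gap(vals):
        hi = [v for v, l in zip(vals, labels) if l == 1]
        lo = [v for v, l in zip(vals, labels) if l == 0]
        return abs(mean(hi) - mean(lo))
    return sorted(scores, key=lambda name: gap(scores[name]), reverse=True)

# Hypothetical expression for four samples (two responders, then two non-responders).
expr = {"G1": [2, 2, 0, 0], "G2": [3, 3, 1, 1], "G3": [1, 0, 1, 0], "G4": [0, 1, 0, 1]}
sets = {"set_up": ["G1", "G2"], "set_noise": ["G3", "G4"]}
labels = [1, 1, 0, 0]
scores = gene_set_scores(expr, sets)
print(rank_sets(scores, labels))
```

The top-ranked set scores can then be passed to any classifier in place of single-gene features; in a real analysis the ranking would be redone inside each cross-validation fold.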

The real advantage of gene set-based predictors for biologists, however, is that the features that distinguish the classes are much more biologically interpretable. For instance, one study in cancer genomics sought to develop outcome predictors in pediatric medulloblastoma, a disease in which amplification of the Myc gene is a known prognostic marker [26]. When gene set-based predictors were developed, one of the features that best predicted outcome was enrichment of a set of Myc target genes, even in samples in which Myc was not amplified. This suggests that Myc can be activated by pathways other than gene amplification.

Validating the function of predictive genes

Powerful as genome-wide transcriptional profiling is, it remains better suited to generating hypotheses than to testing them, and the best way to determine whether genes or pathways contained in a predictor are central to the vaccine response is to perturb their expression experimentally.

In our recent study, gene expression profiles of PBMC from volunteers vaccinated with the inactivated seasonal influenza vaccine were analyzed to identify predictors of the antibody response to vaccination [6]. A group of 44 genes was identified that, when measured on day 3 or day 7 after vaccination, could predict with high accuracy whether an individual would respond with a high or a low hemagglutination-inhibition (HAI) titer [6]. Some of these genes had obvious relevance to the vaccine response, such as TNFRSF17, a molecule involved in plasma cell differentiation; however, many had no known role in regulating immunity. One gene, CAMKIV, was selected for functional validation in an animal model because its expression 3 days after vaccination was negatively correlated with antibody titers 28 days later. Mice deficient in CAMKIV showed a significantly increased antibody response to vaccination, suggesting that this predictor gene is centrally involved in regulating the B cell response. Although CAMKIV had no more statistical power than any of the other genes to predict outcome, its functional validation in the vaccine model provides a strong rationale for including it in predictive signatures in future studies.
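A negative correlation of the kind described for CAMKIV can be quantified with a rank-based statistic, which is insensitive to the highly skewed, stepwise distribution of HAI titers. The sketch below computes a Spearman correlation; it is an illustration only and does not reproduce the statistics used in [6]. The expression values and titers are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical cohort: day-3 expression of one gene and day-28 HAI titers.
day3_expr = [5.1, 4.2, 3.9, 2.5, 1.8]
day28_titer = [40, 80, 160, 320, 640]
print(spearman(day3_expr, day28_titer))
```

Here higher day-3 expression always goes with a lower day-28 titer, so the correlation is -1; real cohorts give intermediate values that need a significance test.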

Challenges to the functional validation of vaccine predictive signatures

The role of CAMKIV as both a predictive gene and a regulator of the vaccine antibody response suggests that functional validation of predictor genes should be an important component of predictor development. However, several obstacles must be overcome before this approach can be generalized. The first is that predictive signatures are often generated from PBMC, a complex mixture of many cell types. This makes it difficult to prioritize any single gene for functional follow-up, because upregulation of a gene in PBMC expression profiles from vaccine responders may reflect an increase in transcript abundance within a fixed frequency of cells, an increased frequency of cells with a fixed transcript abundance, or both. This problem can be partially offset by analytic strategies that deconvolve expression profiles from heterogeneous cell mixtures into their component cell type-specific signatures [28]. Whether predictors would be better if developed from individual immune cell lineages rather than from mixtures of cells remains an open question.
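Deconvolution approaches of the kind cited in [28] rest on a linear-mixture model: each sample's PBMC expression of a gene is treated as the sum of cell type-specific expression values weighted by the fraction of each cell type in that sample. The sketch below fits that model for one gene by least squares, assuming the cell-type proportions are already known (for example, from flow cytometry or a blood count). It is not the published csSAM implementation, which adds a comparison between groups and a significance estimate; the proportions and expression values are hypothetical.

```python
def solve(A, b):
    """Solve the square system A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v

def deconvolve_gene(proportions, mixed):
    """Least-squares estimate of one gene's expression in each cell type.

    Model: mixed[i] = sum_c proportions[i][c] * expr[c] for every sample i,
    solved through the normal equations (W^T W) expr = W^T mixed.
    """
    k = len(proportions[0])
    WtW = [[sum(w[a] * w[b] for w in proportions) for b in range(k)] for a in range(k)]
    Wtx = [sum(w[a] * x for w, x in zip(proportions, mixed)) for a in range(k)]
    return solve(WtW, Wtx)

# Hypothetical: fractions of two cell types in four PBMC samples, and one gene's
# PBMC expression generated from true cell type-specific values of 10.0 and 2.0.
proportions = [[0.1, 0.9], [0.3, 0.7], [0.5, 0.5], [0.2, 0.8]]
mixed = [sum(w * e for w, e in zip(row, [10.0, 2.0])) for row in proportions]
print(deconvolve_gene(proportions, mixed))
```

Fitting the model separately in responders and non-responders would show whether a PBMC-level difference reflects a change within one cell type or merely a shift in cell-type frequencies.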

The second challenge is that predictive signatures often include a relatively large number of candidate genes; CAMKIV, for instance, was only one of 44 genes predictive of influenza vaccine response. Because the function of candidate genes is conventionally evaluated one at a time in individual knockout models, it is impractical to tackle more than one or two leading candidates in this way. In recent years, however, tools for highly parallel assessment of gene function have been developed that make it feasible to assign functional roles to large numbers of genes. Libraries of shRNA vectors or pools of siRNA molecules have been constructed that allow silencing of virtually every human or mouse gene [29]. In addition, cDNA libraries encoding tens of thousands of genes can be used to perturb gene expression in high throughput [30]. These technologies now allow the function of large numbers of genes to be surveyed in parallel.

Although these functional genomics tools have not yet been applied to a list of candidate genes from a predictive vaccine signature, they have been used to survey the function of genes identified by analysis of gene expression profiles. For instance, one recent study identified patterns of genes upregulated in bone marrow-derived dendritic cells (DCs) following ligation of different TLRs [31]. Although computational analysis could generate a list of candidate regulators, the function of each in the TLR response was not clear. To address this, the investigators used RNAi to silence each of 144 candidate genes and measured the effect on the pattern of gene expression triggered by TLR stimulation. They recovered dozens of known regulatory molecules and also identified 63 novel regulators of DC function [31]. A second study used a similar approach to identify PLK2 as a novel signaling intermediate downstream of TLRs [32]. These studies illustrate an experimental workflow in which large numbers of potentially important genes could be screened for their role in regulating the key cellular compartments that dictate the response to vaccination [33].
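Once each candidate has been silenced, its effect must be separated from screen noise. The sketch below shows one generic hit-calling scheme, not the analysis used in [31] or [32]: each knockdown's effect is summarized as a single number (for example, the mean log fold change of a TLR-induced signature relative to control knockdowns, which collapses the multi-gene readout into one score), scored against non-targeting controls with a robust z-score, and called a hit above a chosen threshold. Gene names, effect sizes and the threshold of 3 are hypothetical.

```python
from statistics import median

def robust_z(effects, controls):
    """Score each knockdown against non-targeting controls using the median and the
    median absolute deviation (MAD, scaled by 1.4826 to approximate an SD for normal data)."""
    med = median(controls)
    mad = 1.4826 * median([abs(c - med) for c in controls])
    if mad == 0:
        raise ValueError("control effects show no spread")
    return {gene: (e - med) / mad for gene, e in effects.items()}

def call_hits(effects, controls, threshold=3.0):
    """Return candidate genes whose knockdown shifts the readout by |z| >= threshold."""
    return sorted(g for g, z in robust_z(effects, controls).items() if abs(z) >= threshold)

# Hypothetical screen: effects of non-targeting controls and of three candidate knockdowns.
controls = [0.0, 0.1, -0.1, 0.05, -0.05]
effects = {"GENE_A": -1.2, "GENE_B": 0.02, "GENE_C": 0.9}
print(call_hits(effects, controls))
```

Median and MAD are used instead of mean and standard deviation so that a few strong outliers among the controls do not mask real hits.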


As in many areas of systems biology, we are deluged by data, yet real insights into the mechanisms that control biological systems have been much slower to emerge [34]. Molecular predictors of vaccine outcome have the potential to profoundly change the process by which new vaccines are discovered, and embedding mechanistic insights into gnostic predictors of vaccine response may be the best way to make good on that promise.


  • Gene-expression based predictors of response to vaccination have been developed and show promise for influenza and yellow fever vaccines.
  • Gene-expression based predictors often fail to provide mechanistic insight into the basis of the vaccine response.
  • New analytic and experimental tools now allow the function of individual genes in predictive signatures to be identified.
  • Enriching predictive signatures with functional information about their constituent genes will make gene-expression based predictors more robust.


U19AI090023, HHSN266200700006C, U54AI057157, R37AI48638, R01DK057665, and N01 AI50025 to B.P.

U19AI082630, R01AI091493 to W. N. H.

U19AI057266 to B. P. and W. N. H.


Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


1. Rappuoli R, Aderem A. A 2020 vision for vaccines against HIV, tuberculosis and malaria. Nature. 2011;473:463–469.
2. Plotkin SA. Correlates of protection induced by vaccination. Clin Vaccine Immunol. 2010;17:1055–1065.
3. Pulendran B, Li S, Nakaya HI. Systems vaccinology. Immunity. 2010;33:516–529.
4. Pulendran B, Ahmed R. Immunological mechanisms of vaccination. Nat Immunol. 2011;12:509–517.
*5. Querec TD, Akondy RS, Lee EK, Cao W, Nakaya HI, Teuwen D, Pirani A, Gernert K, Deng J, Marzolf B, et al. Systems biology approach predicts immunogenicity of the yellow fever vaccine in humans. Nat Immunol. 2009;10:116–125. First proof-of-concept demonstration of the utility of systems approaches in predicting the immunogenicity of a vaccine.
*6. Nakaya HI, Wrammert J, Lee EK, Racioppi L, Marie-Kunze S, Haining WN, Means AR, Kasturi SP, Khan N, Li G-M, et al. Systems biology of vaccination for seasonal influenza in humans. Nat Immunol. 2011;12:786–795. Identification of predictive signatures of antibody response to influenza vaccination, and functional validation of predictive signature genes.
7. Gaucher D, Therrien R, Kettaf N, Angermann B, Boucher G, Filali-Mouhim A, Moser J, Mehta R, Drake D, Castro E, et al. Yellow fever vaccine induces integrated multilineage and polyfunctional immune responses. J Exp Med. 2008. doi:10.1084/jem.20082292.
8. Allantaz F, Chaussabel D, Stichweh D, Bennett L, Allman W, Mejias A, Ardura M, Chung W, Wise C, Palucka K, et al. Blood leukocyte microarrays to diagnose systemic onset juvenile idiopathic arthritis and follow the response to IL-1 blockade. J Exp Med. 2007;204:2131–2144.
9. Ramilo O, Allman W, Chung W, Mejias A, Ardura M, Glaser C, Wittkowski KM, Piqueras B, Banchereau J, Palucka AK, et al. Gene expression patterns in blood leukocytes discriminate patients with acute infections. Blood. 2007;109:2066–2077.
10. Chaussabel D, Quinn C, Shen J, Patel P, Glaser C, Baldwin N, Stichweh D, Blankenship D, Li L, Munagala I, et al. A modular analysis framework for blood genomics studies: application to systemic lupus erythematosus. Immunity. 2008;29:150–164.
*11. Berry MPR, Graham CM, McNab FW, Xu Z, Bloch SAA, Oni T, Wilkinson KA, Banchereau R, Skinner J, Wilkinson RJ, et al. An interferon-inducible neutrophil-driven blood transcriptional signature in human tuberculosis. Nature. 2010;466:973–977. PBMC signatures that correlate with disease state in TB.
12. Shi L, Campbell G, Jones WD, Campagne F, Wen Z, Walker SJ, Su Z, Chu T-M, Goodsaid FM, Pusztai L, et al. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol. 2010;28:827–838.
13. Simon R. Using DNA microarrays for diagnostic and prognostic prediction. Expert Rev Mol Diagn. 2003;3:587–595.
14. Dupuy A, Simon R. Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst. 2007;99:147–157.
15. Chang HY, Sneddon JB, Alizadeh AA, Sood R, West RB, Montgomery K, Chi J-T, van de Rijn M, Botstein D, Brown PO. Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds. PLoS Biol. 2004;2:E7.
16. Chi J-T, Wang Z, Nuyten DSA, Rodriguez EH, Schaner ME, Salim A, Wang Y, Kristensen GB, Helland A, Børresen-Dale A-L, et al. Gene expression programs in response to hypoxia: cell type specificity and prognostic significance in human cancers. PLoS Med. 2006;3:e47.
17. Liu R, Wang X, Chen GY, Dalerba P, Gurney A, Hoey T, Sherlock G, Lewicki J, Shedden K, Clarke MF. The prognostic role of a gene signature from tumorigenic breast-cancer cells. N Engl J Med. 2007;356:217–226.
18. Massague J. Sorting out breast-cancer gene signatures. N Engl J Med. 2007;356:294–297.
19. Mosley JD, Keri RA. Cell cycle correlated genes dictate the prognostic power of breast cancer gene lists. BMC Med Genomics. 2008;1:11.
20. Haining WN, Wherry EJ. Integrating genomic signatures for immunologic discovery. Immunity. 2010;32:152–161.
21. Subramanian A, Tamayo P, Mootha V, Mukherjee S, Ebert BL, Gillette M, Paulovich A, Pomeroy S, Golub TR, Lander E, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005;102:15545–15550.
22. Tilford CA, Siemers NO. Gene set enrichment analysis. Methods Mol Biol. 2009;563:99–121.
23. Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, Davis A, Dolinski K, Dwight S, Eppig J, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25:25–29.
24. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999;27:29–34.
25. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP. Molecular Signatures Database (MSigDB) 3.0. Bioinformatics. 2011. doi:10.1093/bioinformatics/btr260.
**26. Tamayo P, Cho Y-J, Tsherniak A, Greulich H, Ambrogio L, Schouten-van Meeteren N, Zhou T, Buxton A, Kool M, Meyerson M, et al. Predicting relapse in patients with medulloblastoma by integrating evidence from clinical and genomic features. J Clin Oncol. 2011;29:1415–1423. Clinical characteristics and pathway signature features are integrated into a Bayesian predictive model of prognosis in medulloblastoma.
27. Lee E, Chuang H-Y, Kim J-W, Ideker T, Lee D. Inferring pathway activity toward precise disease classification. PLoS Comput Biol. 2008;4:e1000217.
*28. Shen-Orr SS, Tibshirani R, Khatri P, Bodian DL, Staedtler F, Perry NM, Hastie T, Sarwal MM, Davis MM, Butte AJ. Cell type-specific gene expression differences in complex tissues. Nat Methods. 2010;7:287–289. Deconvolving lineage-specific signatures from gene expression profiles of mixtures of cells.
29. Boehm JS, Hahn WC. Towards systematic functional characterization of cancer genomes. Nat Rev Genet. 2011;12:487–498.
30. Yang X, Boehm JS, Yang X, Salehi-Ashtiani K, Hao T, Shen Y, Lubonja R, Thomas SR, Alkan O, Bhimdi T, et al. A public genome-scale lentiviral expression library of human ORFs. Nat Methods. 2011. doi:10.1038/nmeth.1638.
31. Amit I, Garber M, Chevrier N, Leite AP, Donner Y, Eisenhaure T, Guttman M, Grenier JK, Li W, Zuk O, et al. Unbiased reconstruction of a mammalian transcriptional network mediating pathogen responses. Science. 2009;326:257–263.
**32. Chevrier N, Mertins P, Artyomov MN, Shalek AK, Iannacone M, Ciaccio MF, Gat-Viks I, Tonti E, DeGrace MM, Clauser KR, et al. Systematic discovery of TLR signaling components delineates viral-sensing circuits. Cell. 2011;147:853–867. Perturbing the expression of multiple candidate regulatory genes in parallel uncovers novel regulators of TLR signaling.
33. Amit I, Regev A, Hacohen N. Strategies to discover regulatory circuits of the mammalian immune system. Nat Rev Immunol. 2011;11:873–880.
34. Brenner S. Sequences and consequences. Philos Trans R Soc Lond B Biol Sci. 2010;365:207–212.