Pathogen genes that are up-regulated during infection and/or essential for microorganism survival or pathogenesis can be identified using transcriptomics, i.e., the analysis of a near complete set of RNA transcripts expressed by the pathogen under a specified condition. Comprehensive DNA-based microarray chips (probed with cDNA generated from RNA by reverse transcription) [24] and ultra-high-throughput sequencing technologies that allow rapid sequencing and direct quantification of cDNA [25] make it possible to characterize the transcriptome of a pathogen and to identify particular types of gene product. For example, genes involved in the hyperinfectious state of Vibrio cholerae, which arises after passage through the human gastrointestinal tract, were identified by comparing the transcriptome of bacteria isolated directly from stool samples of cholera patients with that of V. cholerae grown in vitro [26]. Similarly, analysis of the transcription profile of M. tuberculosis during early infection in immunocompetent (BALB/c) and severe combined immunodeficient (SCID) mice revealed a set of 67 genes activated exclusively in response to the host immune system [27].
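At its simplest, a comparison such as in vivo (stool) versus in vitro (culture) expression reduces to a per-gene fold change between two expression profiles. The sketch below is a minimal illustration under stated assumptions: it takes RNA-seq-style read counts keyed by hypothetical gene IDs, normalizes each profile to counts per million, and reports genes at least two-fold higher in vivo. Real transcriptome studies typically add biological replicates and statistical testing, which this sketch deliberately omits.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so each profile sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_changes(in_vivo, in_vitro, pseudocount=1.0):
    """log2(in vivo / in vitro) per gene, computed on CPM-normalized counts.

    The pseudocount keeps genes with zero reads in one condition finite.
    """
    vivo = counts_per_million(in_vivo)
    vitro = counts_per_million(in_vitro)
    return {
        gene: math.log2((vivo[gene] + pseudocount) / (vitro[gene] + pseudocount))
        for gene in vivo.keys() & vitro.keys()
    }


def upregulated_in_vivo(in_vivo, in_vitro, min_log2fc=1.0):
    """Genes at least 2**min_log2fc-fold higher in vivo, largest change first."""
    fc = log2_fold_changes(in_vivo, in_vitro)
    hits = [gene for gene, value in fc.items() if value >= min_log2fc]
    return sorted(hits, key=lambda gene: fc[gene], reverse=True)


# Hypothetical read counts for three genes (not data from any cited study).
stool = {"geneA": 900, "geneB": 50, "geneC": 50}
culture = {"geneA": 100, "geneB": 450, "geneC": 450}
print(upregulated_in_vivo(stool, culture))  # ['geneA']
```

The two-fold cutoff (`min_log2fc=1.0`) and the pseudocount of 1 are illustrative choices, not values taken from [26] or [27].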
Functional genomics, which links genotype to phenotype through transcriptomics and proteomics, has been applied to many pathogens to identify genes essential for survival or virulence that may be valid vaccine candidates. DNA microarrays can be used to screen comprehensive libraries of pathogen mutants for attenuated clones by comparing bacterial isolates before and after passage through animal models or exposure to compound libraries [28]–[30]. For example, these methods have been used to identify 65 novel MenB genes that are required for the pathogen to cause septicemia in infant rats [31], 47 genes essential for gastric colonization of the gerbil by H. pylori [32], and genes contributing to the persistence of M. tuberculosis in the host [33].
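In such screens each mutant can be tracked by a unique tag whose abundance is read from the microarray, and mutants that are depleted from the population after passage are candidate attenuated clones. The following is a minimal sketch under stated assumptions: hypothetical mutant IDs map to normalized array intensities from the input (pre-passage) and output (post-passage) pools, and the 10-fold depletion cutoff and minimum input signal are illustrative, not values from [28]–[33].

```python
def attenuated_mutants(input_signal, output_signal, max_ratio=0.1, min_input=100.0):
    """Flag mutants whose relative tag abundance falls sharply after passage.

    input_signal / output_signal map a mutant ID to its normalized array
    intensity before and after passage through the host. Mutants with weak
    input signal are skipped because their ratio would be unreliable.
    Returns {mutant: output/input ratio}, most depleted first.
    """
    in_total = sum(input_signal.values())
    out_total = sum(output_signal.values())
    hits = {}
    for mutant, before in input_signal.items():
        if before < min_input:
            continue
        after = output_signal.get(mutant, 0.0)
        # Compare each mutant's share of its pool rather than raw intensities,
        # so overall differences between the two hybridizations cancel out.
        ratio = (after / out_total) / (before / in_total)
        if ratio <= max_ratio:
            hits[mutant] = ratio
    return dict(sorted(hits.items(), key=lambda item: item[1]))


# Hypothetical intensities: mut2 is nearly lost after passage; mut3 is too faint to call.
before = {"mut1": 1000, "mut2": 1000, "mut3": 50}
after = {"mut1": 1900, "mut2": 50, "mut3": 50}
print(attenuated_mutants(before, after))  # only mut2 is flagged
```

Working with each mutant's share of its pool is a design choice in this sketch; it makes the ratio insensitive to differences in total labeling or hybridization efficiency between the two arrays.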
Analysis of a pathogen's proteome (the near complete set of proteins expressed under a specified condition) can reveal potential vaccine and drug candidates and add significant value to in silico approaches [34]. High-throughput proteomic analyses can be performed using mass spectrometry (MS), chromatographic techniques, and protein microarrays [35]. A novel proteome-based approach has been applied to identify the surface proteins of GAS: proteolytic enzymes are used to “shave” the bacterial surface, releasing peptides from the exposed regions of fully and partially exposed surface proteins. Seventeen surface proteins of a virulent GAS strain were identified in this way by combining MS with genome sequence analysis. Their location on the pathogen surface was confirmed by flow cytometry, and one of them provided protective immunity in a mouse model of the disease [36].
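The "MS plus genome sequence analysis" step can be illustrated by digesting genome-predicted proteins in silico and matching the resulting peptide masses against masses observed by MS. This is a simplified peptide-mass-matching sketch, not the procedure of [36]: it assumes trypsin as the shaving enzyme, no missed cleavages or modifications, and observed values already converted to neutral monoisotopic masses (real spectra report m/z of charged ions, and identification usually relies on MS/MS database searches). The protein names and sequences are hypothetical.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def trypsin_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    cut_sites = [i + 1 for i in range(len(protein) - 1)
                 if protein[i] in "KR" and protein[i + 1] != "P"]
    bounds = [0] + cut_sites + [len(protein)]
    return [protein[start:end] for start, end in zip(bounds, bounds[1:])]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_observed_masses(proteome, observed, tol=0.01):
    """Map protein name -> tryptic peptides within tol Da of any observed mass."""
    hits = {}
    for name, sequence in proteome.items():
        for pep in trypsin_digest(sequence):
            mass = peptide_mass(pep)
            if any(abs(mass - obs) <= tol for obs in observed):
                hits.setdefault(name, []).append(pep)
    return hits


# Hypothetical two-protein "genome" and one observed peptide mass.
proteome = {"protX": "AKPLRGK", "protY": "MWWWW"}
print(trypsin_digest("AKPLRGK"))                   # ['AKPLR', 'GK']
print(match_observed_masses(proteome, [203.127]))  # {'protX': ['GK']}
```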
The proteome of a pathogen can also be screened to identify the immunome (the near complete set of pathogen proteins or epitopes that interact with the host immune system) using in vitro or in silico techniques [37], [38]. In vitro identification and screening of the immunome rest on the idea that the antibodies in serum from a host that has been exposed to a pathogen represent a molecular “imprint” of the pathogen's immunogenic proteins and can therefore be used to identify vaccine candidates. Accordingly, several techniques have been developed for the high-throughput display of pathogen proteins and the subsequent screening for proteins that bind antibodies in sera. Immunogenic surface proteins of several organisms have been identified, including those of S. aureus using 2D-PAGE, membrane blotting, and MS [39]; S. agalactiae, S. pyogenes, and Streptococcus pneumoniae using phage- or E. coli-based comprehensive genomic peptide expression libraries [38], [40]; and Francisella tularensis (the causative agent of tularemia, or rabbit fever) [41] and V. cholerae using protein microarray chips [42]. Protein microarrays, in which proteins from the pathogen are spotted onto a chip, can also be used to characterize protein–drug interactions, as well as protein–protein, protein–nucleic acid, ligand–receptor, and enzyme–substrate interactions [43].
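When a protein microarray is probed with sera, a common way to call an antigen "seroreactive" is to require that its signal with immune sera clearly exceed its signal with control sera. The sketch below assumes one such convention: a per-antigen cutoff of the control mean plus k standard deviations, with k = 3. Both this rule and the antigen names are illustrative assumptions rather than the criteria used in [41] or [42].

```python
from statistics import mean, stdev


def seroreactive_antigens(immune, control, k=3.0):
    """Flag antigens whose mean signal with immune sera exceeds a control cutoff.

    immune / control map an antigen ID to its array signals across several sera
    (at least two control sera per antigen, so a standard deviation exists).
    The cutoff for each antigen is mean + k * SD of its signals with control sera.
    """
    hits = []
    for antigen, signals in immune.items():
        ctrl = control[antigen]
        cutoff = mean(ctrl) + k * stdev(ctrl)
        if mean(signals) > cutoff:
            hits.append(antigen)
    return sorted(hits)


# Hypothetical spot intensities for two antigens probed with three sera each.
immune_sera = {"antigen1": [5000, 5200, 4800], "antigen2": [300, 320, 310]}
naive_sera = {"antigen1": [300, 320, 280], "antigen2": [290, 310, 300]}
print(seroreactive_antigens(immune_sera, naive_sera))  # ['antigen1']
```

Lowering `k` trades specificity for sensitivity. With `k=0.5`, for instance, the weakly elevated `antigen2` would also pass.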
The ability to predict in silico which pathogen epitopes will be recognized by B cells or T cells has greatly improved in recent years [44]. Large-scale screens for B cell and T cell epitopes are currently under way for pathogens including HIV, Bacillus anthracis, M. tuberculosis, F. tularensis, Yersinia pestis (the causative agent of bubonic plague), flaviviruses, and influenza [45], [46]. Although epitope prediction is not foolproof, it can serve as a guide for further biological evaluation. T cell epitopes are presented on the surface of antigen-presenting cells by MHC molecules (HLA in humans), which vary considerably between hosts; this variability complicates functional epitope prediction. B cell epitopes, in turn, can be either linear or conformational, and conformational epitopes are difficult to predict from sequence alone. The ultimate aim of researchers in this field is to engineer a single peptide that combines defined epitopes from a protein or organism, thereby overcoming the genetic variability of both pathogen and host [44].
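One of the earliest and simplest approaches to linear B cell epitope prediction scores a sequence with a sliding-window hydrophilicity scale, on the premise that hydrophilic stretches tend to be surface exposed. The sketch below uses the Hopp-Woods scale with a six-residue window and an illustrative threshold of 1.0. It is not the method behind the screens in [45] or [46], which rely on more sophisticated predictors, and, as noted above, a sequence-only scan like this cannot capture conformational epitopes. The input sequence is hypothetical, and non-standard residue letters would raise a `KeyError`.

```python
# Hopp-Woods (1981) hydrophilicity values for the 20 standard amino acids.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(sequence, window=6):
    """Mean Hopp-Woods value of each window; entry i covers sequence[i:i + window]."""
    values = [HOPP_WOODS[aa] for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_linear_epitopes(sequence, window=6, threshold=1.0):
    """(start, peptide, score) for windows at least as hydrophilic as threshold."""
    profile = hydrophilicity_profile(sequence, window)
    return [(i, sequence[i:i + window], round(score, 2))
            for i, score in enumerate(profile) if score >= threshold]


# Hypothetical sequence: a hydrophobic stretch followed by a charged one.
for hit in candidate_linear_epitopes("LLLLLLKKKKKK"):
    print(hit)
```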
Structural genomics, the study of the three-dimensional structures of the proteins produced by a species, is increasingly being applied to vaccine and drug development as a result of the explosion of genome and proteome data and continuing improvements in protein expression, purification, and structure determination [47]. The structure-based design of antiviral therapeutics has led to the development of drugs directed at the active sites of the HIV-1 protease [48] and influenza neuraminidase [49]. More than 45,000 high-resolution protein structures are available in public databases (see http://www.wwpdb.org/stats.html), and several initiatives have been established to pursue high-throughput characterization of protein structures on a genome-wide scale [50]. These initiatives focus on determining the structural basis of immunodominant and immunorecessive antigens, as well as of protein active sites and potential drug-binding sites [51], [52]. For example, structural characterization of the HIV envelope proteins gp120 and gp41 has revealed mechanisms used by the virus to evade host antibody responses, many of which involve hypervariability in immunodominant epitopes [53], [54]. Based on this information, immune refocusing (e.g., by retargeted glycosylation, deletion, and/or substitution of amino acids) has been used to dampen the response to variable immunodominant epitopes of the envelope glycoprotein gp160, enabling the host to respond to previously subdominant epitopes [55]. High-throughput modification of proteins, together with screening of the modified proteins for immunogenicity and for interactions with antimicrobials, is expected to become more common as techniques evolve [51].
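Hypervariable positions such as those in immunodominant envelope epitopes are often located by measuring sequence variability across many isolates before structural interpretation. A common measure is the per-column Shannon entropy of a multiple sequence alignment. The sketch below is an illustration under stated assumptions: the sequences are hypothetical, already aligned to equal length, and the 1-bit cutoff is arbitrary. It is not presented as the method used in [53]–[55], which relied on structural characterization.

```python
import math
from collections import Counter


def column_entropy(alignment):
    """Shannon entropy (bits) of each column of an equal-length alignment.

    Gap characters are treated as just another symbol.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    n = len(alignment)
    entropies = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        # H = sum p * log2(1/p); a fully conserved column gives exactly 0.0.
        entropies.append(sum((c / n) * math.log2(n / c) for c in counts.values()))
    return entropies


def hypervariable_positions(alignment, min_bits=1.0):
    """0-based alignment columns whose entropy is at least min_bits."""
    return [i for i, h in enumerate(column_entropy(alignment)) if h >= min_bits]


# Hypothetical aligned fragments from four isolates.
isolates = ["ACDA", "ACEC", "ACFD", "ACGE"]
print(column_entropy(isolates))           # [0.0, 0.0, 2.0, 2.0]
print(hypervariable_positions(isolates))  # [2, 3]
```

With 20 amino acids the maximum possible entropy per column is log2(20), about 4.3 bits, so any cutoff should be chosen with the number of sequences and the alignment's overall diversity in mind.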