Immunogens. Plasmid DNAs containing five different modifications of the canonical Env gene were generated for each of the antigen sets used in this study: full-length Env proteins (“gp160”) and four variants with short deletions. Deletion variants were as follows: (i) the full-length Env protein with variable loops 1, 2, 4, and 5 deleted (Env gp160ΔVs); (ii) the full-length Env protein with deletion of the fusion domain and the cleavage domains and a shortened interspace between heptad 1 (H1) and heptad 2 (H2) (4) (Env gp160ΔCFI); (iii) the full-length Env protein with deletion of the fusion domain and the cleavage domains, a shortened interspace between heptad 1 (H1) and heptad 2 (H2), and also with variable loops 1, 2, 4, and 5 deleted (Env gp160ΔCFIΔVs); and (iv) the Env protein without the cytoplasmic domain, with deletion of the fusion domain and the cleavage domains, with a shortened interspace between heptad 1 (H1) and heptad 2 (H2), and also with variable loops 1, 2, 4, and 5 deleted (Env gp145ΔCFIΔVs). All modified HIV Env genes were synthesized using human-preferred codons (GeneArt, Regensburg, Germany) (15) or by preparation of oligonucleotides of 75 bp overlapping by 25 bp or of 60 bp overlapping by 20 bp and were assembled with Pwo (Boehringer Mannheim) and Turbo Pfu (Stratagene) as described previously (4, 14). All deletions or other modifications were generated by site-directed mutagenesis using a QuikChange kit (Stratagene, La Jolla, CA). The cDNAs were cloned into a plasmid expression vector, pCMV/R, which mediates high-level expression and immunogenicity in vivo (2, 38).
Mosaic proteins were designed using the methods described by Fischer et al. (6); a web-based suite of tools that enables generation of candidate mosaic sequences for any set of variable pathogen proteins and epitope length sequence coverage comparison of different vaccine antigen candidates is now available (34). Mosaics are optimized as a set for a particular size of cocktail and so were designed separately for the one-, two-, and three-antigen combinations (i.e., the single mosaic is not found in the two- or three-mosaic set). The input data were an unaligned version of the full Env M group alignment from the Los Alamos National Laboratory HIV database, as of July 2006 (restricted to include a single sequence per person). Sequences were generated as recombinants of that set and optimized for 9-mer coverage of that set. Unnatural breakpoints were excluded. We also selected the three natural sequences that in combination provided the optimal 9-mer coverage of that same data set, either with or after exclusion of the V-loops, using the same software suite (9, 34) (http://www.hiv.lanl.gov/content/sequence/MOSAIC). The length of nine amino acids was selected for the optimization criteria because it is the most common length of optimal CD8 epitopes; nearby lengths (8, 10, 11, 12, etc.) also get greatly enhanced coverage through the process of optimizing on 9-mers (data not shown). The full-length Env protein amino acid sequences of the three sets of mosaics, gp160 and various mutants, are shown in Fig. S1 in the supplemental material.
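The 9-mer coverage criterion can be illustrated with a minimal sketch. This is not the mosaic design software cited above; the function names `kmers` and `kmer_coverage` are hypothetical, and the sketch assumes that coverage means the fraction of all 9-mers in the input sequences (counted with multiplicity) that match some 9-mer in the candidate cocktail exactly.

```python
from collections import Counter


def kmers(seq, k=9):
    """Return all overlapping k-mers of a protein sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_coverage(cocktail, population, k=9):
    """Fraction of population k-mers (with multiplicity) found in the cocktail.

    Assumption: exact matches only; near matches are not credited.
    """
    covered = set()
    for seq in cocktail:
        covered.update(kmers(seq, k))

    counts = Counter()
    for seq in population:
        counts.update(kmers(seq, k))

    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for kmer, n in counts.items() if kmer in covered)
    return hits / total
```

A candidate set of one, two, or three sequences could then be scored with `kmer_coverage(candidates, input_sequences)`, and competing sets compared by that score.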
Vaccine antigen comparisons. The basic antigen design strategies included the following sets: three natural strains that have been previously studied as a polyvalent vaccine in the modified form gp145ΔCFI, each from a different clade (Env ABC); one, two, or three gp160 mosaics (mos.1, mos.2, and mos.3); and three natural strains selected either to provide, in combination, optimal M group coverage of gp160 9-mers (nat.3) (CRF01AE, FIN92168 AF219267; clade B, QH0908 AF277072; and clade C, 93IN101 AB023804; listed as clade, sequence name, and accession number) or to provide optimal M group coverage if the V regions were excluded (natΔV.3) (clade C, 99BW46424 AF443084; clade B, QH0908 AF277072; and clade A, KNH1088 AF457063). These baseline sets were further modified to enable direct comparisons of T-cell responses of the full gp160 proteins to previously studied envelope modifications; thus, gp160 responses for a given antigen set were compared to gp145ΔCFI and gp160ΔCFI modifications. A negative control (Control) consisting only of the CMV/R vector was included.
Splenocytes from immunized mice were analyzed by intracellular cytokine staining (ICS) for tumor necrosis factor alpha (TNF-α) and gamma interferon (IFN-γ) T-cell responses against the approximately 100 different peptide pools described below. Responses from CD4+ and CD8+ T lymphocytes were measured separately. The data were analyzed for the magnitude of overall response (strength) and the number of positive responses (breadth). Both TNF-α and IFN-γ responses were measured and analyzed. Interleukin-2 (IL-2) responses were also measured, but there was very little signal and the measurements were dominated by noise, and so these were not included in further analysis. Between 7 and 10 vaccine antigen/protein modification (vac/mod) configurations were tested on 12 separate days. The 12 sets of experiments were grouped into six pairs; in each pair, the same set of antigen configurations was tested. The magnitude of the overall responses varied by a factor of up to about 6 on different days, and this effect was corrected through statistical methods, as described below. Not all configurations of vaccine plus modification were tested, and the number of times a particular configuration was repeat tested ranged from 2 to 12. In Table S1 in the supplemental material, we indicate the number of microtiter plates measured for each vector.
Statistical methods. The objective of the analysis is to compare the strength and breadth of different vaccine strategies and envelope modifications. A statistical model was used that enabled us to control for the variability between assays done on different days (we will call this the “date effect”) and so to assess the contribution of the vaccine strategy (the “vaccine effect”) to the outcome. These effects should be independent, since the date effect will depend on the measurement process and the variation between mice and the vaccine effect will depend on the vaccine that was given. The usual procedure for dealing with such independent effects is to adopt a balanced experimental design, so that the measurement of a particular vaccine is randomized over the different dates. However, the adoption of such a design was inconsistent with the exploratory manner in which the data were acquired, and in any case, the significance of the date effect was not fully recognized in advance. It turns out that the strengths of the different vaccines vary substantially, by a factor of about 6, but the date effect is roughly comparable in magnitude, complicating the assessment of both strength and breadth.
While measurements with some vaccines were repeated many times, other vaccines were measured on only a few days, often only 2 days. If, for example, the overall response to the vaccine was low, it was not clear whether this was due to the vaccine or to the day on which it was measured. The date effect also complicated the assessment of breadth. If we had used a fixed threshold to assess positivity, as is customary, we would have missed positive responses on days when the overall response was low and would have interpreted random noise as a positive response on days when the overall response was high. We tested this approach and found great variation in the number of positive responses for the same vac/mod on different days. A routine analysis, not correcting for the date effect, would have led to greatly increased noise in the breadth assessments and would have prevented us from making meaningful comparisons.
To deal with this problem, we adopted a statistical model that enabled us to correct for the date effect. We call the corrected data the “date-corrected” data. Using the date-corrected data, we can compare vaccine strengths directly and use a common threshold for assessing positivity. Because the date effect is uncertain, the date-corrected data acquires some additional uncertainty, but the results are nevertheless highly significant. Intuitively, what makes this approach possible is that some of the vaccines were tested on most or all of the dates, and the difference in their responses provides the necessary information about the date effect.
In order to account for the date effect, we modeled the logarithm of the vaccine strength, rather than the strength itself; this converts the multiplicative variation into an additive variation that can then be estimated using a linear model. Accordingly, the following “two-way layout” was adopted:

$$l_{ij} = v_i + d_j + \epsilon_{ij} \tag{1}$$

where $l_{ij}$ is the (natural) logarithm of the strength of the responses for the vac/mod $i$ on day $j$; $v_i$ and $d_j$ are quantifications of the vaccine and date effects, respectively, in this model; and the $\epsilon_{ij}$ are independent and identically distributed Gaussian random errors, to account for natural mouse-to-mouse variation and other stochastic effects. We describe how we determined $l_{ij}$ below. Note that the log response is additive in $v_i$ and $d_j$, which reflects the independence of the date and vaccine effects.
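As a rough illustration of how an additive model of vaccine and date effects on the log strength can be fitted when not every vaccine was tested on every day, the sketch below uses ordinary least squares with NumPy. It is a sketch under stated assumptions, not the authors' code: the function name `fit_two_way` is hypothetical, and the model is parameterized with an intercept plus effects measured relative to a reference vaccine and a reference date, which is one common way to make the additive effects identifiable.

```python
import numpy as np


def fit_two_way(obs, ref_vaccine, ref_date):
    """Least-squares fit of log strength = intercept + vaccine effect + date effect.

    obs: list of (vaccine, date, log_strength) tuples; the design may be unbalanced.
    Effects are reported relative to ref_vaccine and ref_date (both fixed at 0).
    Assumption: the design is connected, so that all effects are estimable.
    """
    vaccines = sorted({v for v, _, _ in obs} - {ref_vaccine})
    dates = sorted({d for _, d, _ in obs} - {ref_date})
    col_v = {v: 1 + k for k, v in enumerate(vaccines)}
    col_d = {d: 1 + len(vaccines) + k for k, d in enumerate(dates)}

    # Indicator (dummy-variable) design matrix with an intercept column.
    X = np.zeros((len(obs), 1 + len(vaccines) + len(dates)))
    X[:, 0] = 1.0
    for row, (v, d, _) in enumerate(obs):
        if v in col_v:
            X[row, col_v[v]] = 1.0
        if d in col_d:
            X[row, col_d[d]] = 1.0
    y = np.array([l for _, _, l in obs], dtype=float)

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    v_hat = {ref_vaccine: 0.0, **{v: float(beta[c]) for v, c in col_v.items()}}
    d_hat = {ref_date: 0.0, **{d: float(beta[c]) for d, c in col_d.items()}}
    return v_hat, d_hat
```

Vaccines that were tested on many dates (here, whichever vaccine links the days together) are what allow the date effects to be separated from the vaccine effects.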
We use the data $l_{ij}$ to make estimates, $\hat{v}_i$ and $\hat{d}_j$, for the vaccine and date effects. The interpretation of these numbers, roughly speaking, is the following. If vaccines 1 and 2 are measured on the same day, then we expect the response to vaccine 1 to be $\exp(\hat{v}_1)/\exp(\hat{v}_2)$ times larger than the response to vaccine 2. Similarly, if the same vaccine is measured on day 1 and day 2, then we expect the response on day 1 to be $\exp(\hat{d}_1)/\exp(\hat{d}_2)$ times larger than the response on day 2. The analysis gives only ratios of the strengths (or differences in the log strengths). Thus, we measure all vaccine strengths relative to the negative control and all date effects relative to an arbitrarily chosen fiducial date.
The date-corrected log strength is $\tilde{l}_{ij} = l_{ij} + \hat{d}_0 - \hat{d}_j$; this is the log strength that would have been expected had the data been measured on the fiducial date. The expected difference in the date-corrected log strengths, for two different vaccines, depends only on the vaccine, not on the day: $E(\tilde{l}_{ij} - \tilde{l}_{i'j}) = \hat{v}_i - \hat{v}_{i'}$.
The date-corrected responses to individual peptide pools are obtained by multiplying the data on day $j$ by the factor $\exp(\hat{d}_0)/\exp(\hat{d}_j)$, where $\hat{d}_0$ is the fiducial date effect. These are the data that we would expect had the data been measured on the fiducial date. (Note that this is a slight approximation, in that the factor should strictly be the expectation of $\exp(d_0)/\exp(d_j)$, but the difference should be small.)
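The rescaling of raw pool responses to the fiducial date can be sketched as follows. The function name `date_correct` and the dictionary layout of the estimated date effects are hypothetical choices made for illustration; the sketch simply multiplies each response by the ratio of the exponentiated fiducial and measured-day date effects, as described above.

```python
import math


def date_correct(responses, date, d_hat, fiducial_date):
    """Rescale raw pool responses measured on `date` to the fiducial date.

    d_hat: dict mapping each date to its estimated (log-scale) date effect.
    The factor exp(d_hat[fiducial]) / exp(d_hat[date]) is applied to every pool.
    """
    factor = math.exp(d_hat[fiducial_date] - d_hat[date])
    return [r * factor for r in responses]
```

For example, if the estimated date effect on a given day is log 2 above the fiducial date, every response measured on that day is halved.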
To assess uncertainties in vaccine strength, we calculate the variance of $v_i - v_0$, where $v_0$ is the vaccine effect for the negative control. These uncertainties are easily determined from the linear model (equation 1), using standard methods.
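One standard way to obtain such a variance from a fitted linear model is through the estimated coefficient covariance matrix. The sketch below is an illustration under assumptions, not the authors' procedure: the function name `contrast_variance` is hypothetical, the design matrix is assumed to have full column rank with more observations than parameters, and the error variance is estimated from the residual sum of squares.

```python
import numpy as np


def contrast_variance(X, y, c):
    """Estimate a linear contrast c'beta and its variance from an OLS fit.

    X: (n, p) design matrix of full column rank, with n > p.
    y: (n,) observed log strengths.
    c: (p,) contrast vector, e.g. +1 for vaccine i and -1 for the control.
    Returns (estimate, variance), with variance = sigma^2 * c' (X'X)^-1 c.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.asarray(c, dtype=float)
    n, p = X.shape

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # unbiased error-variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the coefficients
    return float(c @ beta), float(c @ cov @ c)
```

When the control is used as the reference level of the design, the contrast for a vaccine reduces to a unit vector selecting that vaccine's coefficient.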
The date effects also depended somewhat on the T-cell type and cytokine, so separate models were fitted to all combinations of CD4 and CD8 and of IFN-γ and TNF-α. However, for the same T-cell type, cytokine, and date, different parts of the data gave very similar estimates for dj.
The log strength, lij, was computed in three ways: (i) averaging the logarithm of the largest 10 responses from the PTE pools for the given experiment (the average was restricted to the top 10 responses in order to reduce the effect of noise, which dominates the smaller responses); (ii) averaging the logarithm of responses from the four PTE superpools; and (iii) averaging the logarithm of the responses for the Env A, Env B, and Env C pools. Note that in all cases, lij is the logarithm of the raw measurement, including the noise, which is assumed to scale in the same way as the signal. We thus avoid singularities that would occur if the measurement went to zero. In fact, we estimate the background from the unstimulated counts and replace any smaller measurement by this estimate. The uncertainty in l will increase as the measurements get close to zero, but we do not attempt to model this.
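The first of these log-strength definitions can be sketched in a few lines. The function name `log_strength_top` is hypothetical, and the sketch assumes a strictly positive background estimate so that every logarithm is defined; measurements below the background are replaced by it before the top responses are averaged.

```python
import math


def log_strength_top(responses, background, top_n=10):
    """Mean natural log of the top_n largest responses.

    Any response smaller than `background` is replaced by `background`
    (assumed > 0), which avoids taking the logarithm of zero.
    """
    floored = [max(r, background) for r in responses]
    top = sorted(floored, reverse=True)[:top_n]
    return sum(math.log(r) for r in top) / len(top)
```

The superpool and Env A, B, and C variants would differ only in which responses are passed in and in averaging over all of them rather than the top 10.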
To assess breadth, we counted the number of positive responses to the 78 PTE pools. The data were corrected for the date effect, as described above, using strength estimates based on the PTE pools. A pool was judged to produce a positive response if the date-corrected response exceeded a threshold, which was set separately for each combination of T-cell type and cytokine but was otherwise held constant. When combining data from identical experiments on different days, the combined data were deemed to produce a positive response if the median response exceeded the threshold. The threshold was chosen by examining the PTE pool responses. There were two patterns of response: (i) pools that were clearly positive, in that they consistently showed elevated responses, and (ii) pools that showed either no responses above background or sporadic responses that might have resulted from random noise or rare actual responses. The threshold was chosen to discriminate between these two patterns. A separate threshold was determined for each T-cell type/cytokine combination.
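The breadth count with a median rule for repeated experiments might look like the sketch below. The function name `breadth` and the input layout (one list of date-corrected pool responses per repeat day, with pools in the same order) are assumptions made for illustration; the threshold itself would be chosen separately for each T-cell type and cytokine, as described above.

```python
from statistics import median


def breadth(corrected_by_day, threshold):
    """Count pools whose median date-corrected response exceeds the threshold.

    corrected_by_day: list with one entry per repeat day; each entry is a list
    of date-corrected responses, one per peptide pool, in a fixed pool order.
    """
    # zip(*...) regroups the data so that each tuple holds one pool's repeats.
    return sum(median(pool) > threshold for pool in zip(*corrected_by_day))
```

With a single day of data the median is just the measured value, so the same function covers unrepeated experiments.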
We did measure background counts on each microtiter plate, as well as responses to the negative control. However, these measurements were too noisy to be useful. We instead used the threshold to assess positive response directly, without first subtracting estimated background.
To assess functionality, we computed a matrix whose rows denote particular experiments and whose columns denote small peptide pools. For each element of the matrix, we assigned the number 0, 1, or 2, depending on the number of positive responses for TNF-α and IFN-γ observed for the corresponding experiment and peptide pool. Some experiments were also performed testing IL-2 responses, but the results were weak and sporadic and thus were excluded from further analyses. We then used a standard agglomerative clustering algorithm (35), using Euclidean distances, to cluster the experiments (row vectors) and the peptide pools (column vectors) (http://www.hiv.lanl.gov/content/sequence/HEATMAP/heatmap.html, based on the R package heatmap.2). These cluster patterns are shown on the margins of the heat maps (see Fig. ), which were generated by color coding the responses to indicate those that generated no response (pale yellow), one response to either TNF-α or IFN-γ (orange), or responses to both (red). Statistical support for the various clusters is indicated on the dendrogram branch points, based on the approximately unbiased test of multistep-multiscale bootstraps (33).
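The construction of the 0/1/2 functionality matrix, and the Euclidean distance that a clustering routine would then operate on, can be sketched as below. This does not reproduce the heat map tool or the bootstrap support values; the function names `functionality_matrix` and `euclidean` are hypothetical, and the inputs are assumed to be two equally shaped tables of positive/negative calls (rows are experiments, columns are peptide pools).

```python
import math


def functionality_matrix(tnf_pos, ifn_pos):
    """Score each (experiment, pool) cell as 0, 1, or 2 positive cytokines.

    tnf_pos, ifn_pos: lists of rows of booleans with identical shapes.
    """
    return [
        [int(t) + int(g) for t, g in zip(t_row, g_row)]
        for t_row, g_row in zip(tnf_pos, ifn_pos)
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Distances between rows would feed the clustering of experiments, and distances between columns (the transposed matrix) would feed the clustering of peptide pools.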
The data in these experiments came from a total of 352 microtiter plates, each of which measured IFN-γ or TNF-α responses of CD4+ or CD8+ T cells for a particular vaccine modality (vac/mod) on a particular day. By vac/mod we refer to the DNA vaccine antigen cocktails (including one, two, or three mosaics; three natural strains selected to provide in combination optimal 9-mer coverage; and three natural strains, one each from clades A, B, and C) and the Env modifications (including gp160, gp145ΔCFI, gp160ΔCFI, and gp160ΔCFIΔV and gp145ΔCFIΔV, where ΔV refers to removal of the hypervariable loops and ΔCFI refers to deletions of the cleavage site, fusogenic domain, and spacing of heptad repeats 1 and 2) (4). In some cases, all or part of the data from a given plate were clearly affected by systematic error, as indicated by trends or consistently elevated responses from pools in contiguous regions of the plate. Such plates, of which 17 involved CD8 and two CD4, were left out of the analysis. Thus, a total of 333 plates were used. Among these plates, there was also a very small fraction of small peptide pools (0.3%) for which data were unavailable. We did not try to estimate the missing data.