Adapting Phage Display to Deep Sequencing
Over the past decade we have used the chimeric phage system to produce libraries of random peptides ranging in length from 6 to 12 amino acids, with and without flanking disulfides to produce constrained looped peptides. To dramatically increase the peptide database in such experiments, we adapted our Protein 8 phage display system to next-generation high-throughput sequencing. In most next-generation deep sequencing methods, the target DNA to be sequenced is flanked by adaptor sequences compatible with the specific system being used. As illustrated in the corresponding figure, we modified the fth1 phage display vector to contain the Illumina adaptor sequences upstream and downstream of the SfiI cloning sites of the recombinant Protein 8 gene, thus generating fth1-dp. The DNA adaptors were selected so as to generate compatible peptide compositions and avoid stop codons.
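The adaptor constraint can be checked mechanically. Any sequence flanking the random insert is translated as part of recombinant Protein 8, so it must not encode a stop codon in the reading frame it ends up in. The minimal sketch below translates a candidate sequence in all three frames using the standard genetic code and reports which frames are stop-free. The helper names (`translate`, `stop_free_frames`) and the example sequence are hypothetical. The actual fth1-dp adaptor sequences, and their register relative to the Protein 8 gene, are not given here.

```python
# Standard genetic code; codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA from offset `frame` (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper().replace("U", "T")
    # Only complete codons are translated; leftover trailing bases are ignored.
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(frame, len(dna) - 2, 3))


def stop_free_frames(adaptor: str) -> list:
    """Return the reading frames (0-2) in which the sequence encodes no stop codon."""
    return [f for f in range(3) if "*" not in translate(adaptor, f)]


if __name__ == "__main__":
    candidate = "TAAGCTGCT"  # hypothetical sequence, not an actual Illumina adaptor
    print(translate(candidate, 0))       # -> *AA (stop codon in frame 0)
    print(stop_free_frames(candidate))   # -> [1, 2]
```

Once the adaptor is cloned, only the frame in register with the Protein 8 coding sequence matters. Scanning all three frames simply shows which placements would be safe.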
To test the fth1-dp system, we cloned oligonucleotides corresponding to the linear peptide epitope of the murine mAb GV4H3 (221-AGFAIL-226), derived from HIV-1 gp120, into the modified fth1-dp vector. The chimeric phages successfully displayed the insert and were selectively bound by the mAb, indicating that the recombinant Protein 8 was assembled and amenable to affinity selection (Figure S1).
How Random are Peptide Libraries?
As a case in point, using the fth1-dp vector we constructed a random peptide library consisting of a total of 2×10⁹ random 7-mer linear peptides (NNK codons were used to avoid the UAA and UGA stop codons). A sample of the library was subjected to direct PCR amplification using primers corresponding to the upstream and downstream adaptor sequences, generating amplified DNA segments ready for Illumina deep sequencing. The PCR product was quantified, and a small quantity was added to the phi-X control lane of an Illumina GAIIx DNA sequencer (single-read mode, 54 bases). A total of 155,241 DNA sequences were obtained, of which 132,887 were unique (see the pie charts). Translating the inserts was revealing: more than 37,000 sequences contained UAA or UGA stop codons, which truncate Protein 8 and thus generate phenotypically wild-type phages. This is curious, because NNK excludes A from the third position of the codon. The vast majority of these UAA/UGA-containing phages turned out to be aberrations resulting from defective oligonucleotide insertions, which introduce frame shifts that disrupt the intended reading frame. Only half of these stop-codon-containing phages were unique; the rest appeared in multiple copies, and the most prevalent insert was found 17,044 times. This indicates that truncation of recombinant Protein 8 confers a selective advantage: phage assembly and incorporation of recombinant Protein 8 appear to be more demanding than those of wild-type Protein 8. A corrected pie chart, from which all frame-shifted inserts were removed, is also shown (note that 1% of the remaining inserts still contain UAA or UGA stop codons with no apparent frame shift in the 54-base read).
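The stop-codon bookkeeping above can be expressed as a per-read classification. The minimal sketch below assumes that each read has already been trimmed to its 21-nt (7-codon) random insert. The trimming step depends on the adaptor layout and is not shown. The function names and the toy reads are hypothetical. Flagging TAA/TGA as a likely frame shift is a heuristic: as noted above, about 1% of such inserts show no apparent frame shift.

```python
from collections import Counter

# The three stop codons as they appear in sequenced DNA (T in place of U).
STOP_NAMES = {"TAA": "ochre", "TAG": "amber", "TGA": "opal"}


def codons(insert: str) -> list:
    """Split an insert into consecutive in-frame codons (a partial codon is dropped)."""
    return [insert[i:i + 3] for i in range(0, len(insert) - 2, 3)]


def classify(insert: str) -> str:
    """Classify a random insert by the in-frame stop codons it carries."""
    found = {STOP_NAMES[c] for c in codons(insert.upper()) if c in STOP_NAMES}
    if not found:
        return "no_stop"
    if found == {"amber"}:
        # TAG is allowed by NNK (K = G/T) and is read as Gln in supE hosts.
        return "amber_only"
    # TAA/TGA cannot arise from in-frame NNK codons: likely a frame shift.
    return "ochre_or_opal"


def summarize(inserts: list) -> dict:
    """Total reads, distinct inserts, and the number of reads in each class."""
    classes = Counter(classify(s) for s in inserts)
    return {"total": len(inserts), "unique": len(set(inserts)), **classes}


if __name__ == "__main__":
    reads = [  # toy 7-codon inserts, not data from the library
        "GCTGCTGCTGCTGCTGCTGCT",
        "GCTGCTGCTGCTGCTGCTGCT",
        "GCTTAGGCTGCTGCTGCTGCT",
        "GCTTAAGCTGCTGCTGCTGCT",
    ]
    print(summarize(reads))
```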
Pie charts depicting the proportion of unique peptides in phage display libraries.
A second surprising observation was that 79% of the non-frame-shifted inserts contained at least one UAG stop codon, whereas the theoretically expected frequency of UAG-containing phages is about 20%. UAG-containing phages could gain a transient selective advantage during the initial construction of the library, which is performed in MC1061 cells chosen for their high electroporation efficiency. Because MC1061 cells do not suppress UAG, this codon functions as a stop codon during the first 24 hours of library preparation, leading to the observed over-representation of phages that carry it in their recombinant Protein 8. The library is then amplified and maintained in DH5alpha cells, which carry the supE44 suppressor gene that translates UAG as glutamine, thereby circumventing premature termination and ensuring the production of functional recombinant Protein 8.
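The ~20% expectation follows from the NNK design. Of its 4 × 4 × 2 = 32 codons, only TAG is a stop codon, so each position carries UAG with probability 1/32, and a 7-codon insert avoids it with probability (31/32)^7. The short check below assumes independent, equiprobable NNK codons at every position. Real oligonucleotide synthesis is not perfectly uniform, so this is an idealized baseline.

```python
from fractions import Fraction


def p_contains_amber(n_codons: int = 7) -> Fraction:
    """Probability that an NNK insert of n_codons carries at least one TAG (UAG)."""
    # NNK: N = A/C/G/T, K = G/T, giving 4 * 4 * 2 = 32 codons; only TAG is a stop.
    p_amber = Fraction(1, 32)
    return 1 - (1 - p_amber) ** n_codons


if __name__ == "__main__":
    expected = float(p_contains_amber(7))
    observed = 0.79  # fraction reported above for non-frame-shifted inserts
    print(f"expected {expected:.1%}, observed {observed:.0%}, "
          f"{observed / expected:.1f}-fold over-representation")
```

This gives an expected value of about 19.9%, so the observed 79% is roughly a four-fold excess.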
To test the hypothesis that the over-abundance of UAG-containing phages was due to the lack of suppression in MC1061 cells, another library was constructed, this time performing the initial electroporation in DH5alpha cells from the start. Although the transformation efficiency was markedly lower, as expected (total complexity ca. 10⁸ phages), the profile of inserts was dramatically improved. Fifty-three percent of the sequences had no stop codons and were virtually all unique. Forty-five percent of the library contained UAG as the only one of the three possible stop codons, so the bias toward UAG was not completely resolved; it is currently under further investigation.
Next, we turned to the copy number of the most prevalent peptides in each library and asked whether such multiplicity could be a random event or rather indicative of peptides with some selective advantage. To this end, we performed a simulation study, as described in the Methods section, examining the distribution of the most common peptide in a naïve library, i.e., a library that had not been exposed to any prior affinity selection. In all 100 simulations, the most frequent peptide never exceeded 4 copies, from which we conclude that peptides appearing 4 or fewer times are expected by chance. For the library presented above, the 38 most frequent unique peptides (205 peptides in total) occurred in 5–8 copies each. Thus the copy number of 99.99% of the peptides in this naïve library is as expected from a truly random library. We conclude that if some source of selection operates in naïve libraries, it affects only a tiny portion of the peptides, and even these are amplified only to a very limited extent. While this demonstrates relatively little bias towards common peptides, the expressed peptide profile does have a distinct bias towards glutamine-containing peptides, resulting from suppression of the over-represented UAG codon. In view of this, the following analyses were conducted using only peptides devoid of any stop codon.
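The logic of the chance threshold can be sketched as a Monte Carlo null. In the sketch, many naïve samples are simulated, the maximum copy number in each is recorded, and only observed peptides above the largest simulated maximum are treated as beyond chance. The model below is a deliberate simplification and is not the one in the Methods. It assumes an effectively infinite library, reads drawn independently, and each codon drawn uniformly from the 32 NNK codons. It ignores finite clone number and amplification or PCR bias, so its numbers need not reproduce the 4-copy threshold reported above. All function names are hypothetical.

```python
import random
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Residues encoded by the 32 NNK codons: third base restricted to T (index 0)
# or G (index 3). The single NNK stop codon, TAG, appears once as '*'.
NNK_RESIDUES = [
    AMINO[16 * i + 4 * j + k] for i in range(4) for j in range(4) for k in (0, 3)
]


def max_copy_number(n_reads: int, rng: random.Random, length: int = 7) -> int:
    """Copy number of the most frequent peptide among n_reads random NNK peptides."""
    counts = Counter(
        "".join(rng.choices(NNK_RESIDUES, k=length)) for _ in range(n_reads)
    )
    return max(counts.values())


def null_max_distribution(n_reads: int, n_sims: int = 100, seed: int = 0) -> list:
    """Maximum copy number observed in each of n_sims simulated naive samples."""
    rng = random.Random(seed)
    return [max_copy_number(n_reads, rng) for _ in range(n_sims)]


def beyond_chance(observed: Counter, threshold: int) -> dict:
    """Peptides whose observed copy number exceeds the chance threshold."""
    return {pep: n for pep, n in observed.items() if n > threshold}


if __name__ == "__main__":
    null = null_max_distribution(n_reads=20_000, n_sims=10)
    print("chance threshold:", max(null))
```

In pure Python, 100 simulations at the read depths reported here (~150,000 reads each) will take a while. A vectorized version would be preferable for real use.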
Following the Dynamics of Deep Panning
The composition of polyclonal serum is in essence a complex mixture of mAbs, each with its signature specificity embodied by the collection of peptides it binds. The spectrum of peptides recognized by the ensemble of antibodies in polyclonal serum is therefore expected to be extremely complex. Before embarking on the analysis of polyclonal serum, we thus first tested Deep Panning on a model mAb.
The murine mAb GV4H3 (mentioned above) was used to pan 10¹¹ phages of a 7-mer random peptide library to produce three samples: the first affinity capture (Capture #1), followed by two consecutive rounds of biopanning (amplification and capture, i.e., Captures #2 and #3). For each sample, the captured phages were eluted and directly amplified by PCR. Each of these DNA samples (the PCR products) was added to the phi-X control lane of a GAIIx flow cell, and the raw data were filtered to exclude DNA reads corresponding to peptides containing any of the three stop codons. The 20 most frequent peptides of each capture are listed in the corresponding table, together with the top peptides of the naïve library for comparison. Note that the copy number of each peptide in the different samples simply reflects its relative concentration after random sampling of the eluted phages, PCR amplification, and the incidental amount of DNA used to spike the Illumina flow cell in each case. The total number of reads and the number of unique peptides are given for each sample.
Three rounds of panning with mAb GV4H3.
Of the 183,451 peptides sequenced from the naïve library, 92% were unique, and less than one percent (a total of 451 peptides) appeared in more than 4 copies, suggesting that the library closely matches the expectation for a naïve library. Notably, the most frequent peptide (RIRSEEL) occurred in 24 copies, more than two orders of magnitude fewer than the most common affinity-purified peptide in Capture #1. Of the top 20 peptides of Capture #1, only the top three were further amplified and found among the top 20 peptides of Captures #2 and #3. Of the remaining 17, only one is found among the 4,824 unique peptides of Capture #3. This illustrates that most of the peptides sampled in Capture #1 are non-specific background, “laced” here and there with peptides genuinely affinity-captured by GV4H3. After even a single round of amplification, however, the situation is markedly different: of the top 20 peptides of Capture #2, twelve are also among the top 20 of Capture #3, and all of the remaining 8 are found within the top 100 peptides of Capture #3. Hence, there is clear evidence that the most frequent peptides obtained after Deep Panning are indeed affinity selected.
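The carry-over comparison between captures (how many of one capture's top 20 rank within the top 20, or top 100, of the next) reduces to a rank-set intersection. The minimal sketch below takes per-capture peptide counts such as those produced by a `Counter`. The function name and the toy counts are hypothetical. Ties at the cut-off are broken by `Counter.most_common` in first-seen order, so borderline cases deserve a manual check.

```python
from collections import Counter


def carried_over(earlier: Counter, later: Counter, top_n: int = 20, within: int = 20) -> list:
    """Peptides in the top_n of `earlier` that also rank within the top `within` of `later`."""
    later_top = {pep for pep, _ in later.most_common(within)}
    return [pep for pep, _ in earlier.most_common(top_n) if pep in later_top]


if __name__ == "__main__":
    capture_a = Counter({"A": 5, "B": 4, "C": 3})   # toy counts
    capture_b = Counter({"B": 10, "D": 9, "A": 1})  # toy counts
    print(carried_over(capture_a, capture_b, top_n=3, within=2))  # -> ['B']
```

For the comparison in the text, one would call it once with `within=20` and once with `within=100` on the Capture #2 and Capture #3 counts.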
Deep sequencing of the phages obtained at the various steps of the experiment illustrates the trend of affinity selection and amplification of phages, at the expense of a marked reduction in the complexity of the random peptides present in the naïve library. As shown in Figure S2, the vast majority of peptides in the naïve library are unique, and the total copies of the 20 most frequent peptides comprise an insignificant proportion (<0.15%). After two rounds of biopanning, the top 20 peptides represent 74% of the 118,548 peptides sequenced, while the percentage of unique peptides drops to 4%. Thus Deep Panning provides a quantitative depiction of the biopanning process and of the enrichment of affinity-selected phages through serial rounds of panning.
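The two summary statistics used above (the share of reads taken by the top 20 peptides and the unique fraction) can be computed directly from a sample's peptide counts. The minimal sketch below treats "percent unique" as distinct peptides divided by total reads. That reading is an assumption, and an alternative would be the fraction of singleton peptides. The function name and toy counts are hypothetical.

```python
from collections import Counter


def enrichment_summary(counts: Counter, top_n: int = 20) -> dict:
    """Share of reads taken by the top_n peptides, plus the distinct-peptide fraction."""
    total = sum(counts.values())
    top_reads = sum(n for _, n in counts.most_common(top_n))
    return {
        "total": total,
        "top_share": top_reads / total,
        # Assumption: "percent unique" = distinct peptides / total reads.
        "distinct_fraction": len(counts) / total,
    }


if __name__ == "__main__":
    toy = Counter({"A": 6, "B": 2, "C": 1, "D": 1})  # toy counts
    print(enrichment_summary(toy, top_n=1))
```

Tracking these two numbers across the naïve library and successive captures gives the opposing trends described above: rising top-20 share and falling diversity.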
Epitope mapping is based on the hypothesis that the peptides affinity-selected via panning reflect the structure of the epitope bound by the antibody being scanned. B-cell epitopes are typically conformational and discontinuous, comprising some 15–20 contact residues harbored within 2–3 segments of the antigen that are brought together by folding. Clearly a 7-mer peptide cannot be expected to represent an entire epitope, nor need it correspond to a linear segment of the antigen in order to be recognized. Rather, the panel of affinity-selected peptides collectively represents the epitope and can be used as a dataset for computational algorithms designed to predict conformational B-cell epitopes.
Mapitope is one such predictive algorithm. It identifies significant amino acid pairs present in the phage-displayed peptides that have been affinity enriched by the antibody used to pan the random peptide library. These pairs are then used to identify surface-accessible residue clusters on the antigen, in this case HIV-1 gp120, and these clusters are predicted to constitute the epitope of the antibody being studied. The output of the algorithm is a ranked list of the 5 best predictions based on the panel of peptides. As illustrated in the corresponding figure, the top 20 peptides of Capture #2 predict a single cluster that coincides precisely with the GV4H3 epitope. Success in predicting the correct epitope supports the conclusion that the most amplified peptides are indeed the product of antibody-driven affinity selection and amplification.
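To illustrate what "significant amino acid pairs" means operationally, the sketch below scores every unordered residue pair co-occurring within a peptide (at any separation). It compares observed counts with the count expected if residues were placed independently according to their overall frequency in the panel. This is a simplified observed/expected ratio written for illustration. It is not Mapitope's actual pair definition, statistic or significance threshold, and it omits the step of mapping pairs onto surface-accessible residues of the antigen structure. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_enrichment(peptides: list) -> dict:
    """Observed/expected ratio for each unordered residue pair within peptides."""
    residues = Counter("".join(peptides))
    n_res = sum(residues.values())
    freq = {aa: n / n_res for aa, n in residues.items()}

    observed = Counter()
    n_position_pairs = 0
    for pep in peptides:
        # Every pair of positions in the peptide, regardless of spacing.
        for a, b in combinations(pep, 2):
            observed[tuple(sorted((a, b)))] += 1
            n_position_pairs += 1

    scores = {}
    for (a, b), obs in observed.items():
        # Chance that a position pair holds {a, b} if residues are independent.
        p = freq[a] * freq[b] * (1 if a == b else 2)
        scores[(a, b)] = obs / (n_position_pairs * p)
    return scores


if __name__ == "__main__":
    toy_panel = ["ADGIVGW", "AGFAILW", "KDGIAGL"]  # toy peptides
    ranked = sorted(pair_enrichment(toy_panel).items(), key=lambda kv: -kv[1])
    print(ranked[:5])
```

Ratios well above 1 flag pairs that co-occur more often than residue composition alone predicts. A real analysis would add a significance test and a minimum-count filter.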
Motif Analysis: Multiple Patterns of mAb Recognition
A total of 4,823 unique peptides were obtained in Capture #3. To determine whether these represent different patterns of affinity recognition by mAb GV4H3, the motif search algorithm MEME was applied to the entire dataset. Four clear motifs were identified. The two main motifs are clearly related to the core of the GV4H3 epitope; in addition, two minor motifs were also found. Interestingly, the weakest motif (ADGIGGG) is actually the closest to the most amplified peptide of the experiment (ADGIVGW). This illustrates that the most frequent and enriched peptide does not necessarily correspond best to the bona fide epitope of the antigen; rather, it may complement the paratope of the mAb most efficiently.
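MEME discovers motifs in unaligned sequences by expectation maximization, and the sketch below does not reimplement it. It only shows how a motif, once its supporting peptides are aligned, is summarized as a position frequency matrix and a consensus string. That is also the form in which "closeness" between a motif and a peptide such as ADGIVGW can be judged. The function names and the toy alignment are hypothetical. Consensus ties are resolved by first-seen residue.

```python
from collections import Counter


def position_frequency_matrix(aligned: list) -> list:
    """Per-position residue frequencies for equal-length, pre-aligned peptides."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("peptides must be pre-aligned to equal length")
    return [
        {aa: n / len(aligned) for aa, n in Counter(column).items()}
        for column in zip(*aligned)
    ]


def consensus(pfm: list) -> str:
    """Most frequent residue at each position (ties: first residue seen)."""
    return "".join(max(col, key=col.get) for col in pfm)


if __name__ == "__main__":
    toy_alignment = ["ADGIVGW", "ADGIGGG", "ADGIGGW"]  # toy input, not MEME output
    pfm = position_frequency_matrix(toy_alignment)
    print(consensus(pfm))   # -> ADGIGGW
    print(pfm[4])           # column 5: G in 2/3 of sites, V in 1/3
```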
Deep Panning HIV+ Polyclonal Serum
The situation for polyclonal serum is markedly more complex than for mAb analyses. Polyclonal serum is a composite of numerous mAbs, some of which may share a common target, such as a specific pathogen or epitope, while others may be totally unrelated. Each antibody binds its own set of peptides, contributing to an extensive mixture of peptides representing the ensemble of antibodies active in the serum sample. Hence the profile of isolated peptides can be extremely diverse and complicated. To simplify matters, three consecutive rounds of biopanning were performed before deep sequencing, so as to considerably reduce the amount of irrelevant background.
The phage display 7-mer library was used to biopan a sample of purified human IgG obtained from HIV-1+ individuals (HIVIG, Nabi, Inc., Rockville, MD). After three rounds of biopanning against HIVIG, a total of 163,400 peptides were obtained, of which 7,799 were unique sequences. Can one identify any correspondence between the most frequent peptides and HIV? Is a pathogen-related response recognizable in the peptide sequences obtained? To address this, we asked whether any of these peptides could be aligned by BLASTP against HIV-1 HXB2 gp160, which would indicate some HIV specificity.
As illustrated in the corresponding figure, 18 (8%) of the top 223 peptides (all peptides with ≥5 copies) could in fact be aligned to HIV gp160. To evaluate whether this finding is significant, the same alignment was performed against 1,000 different scrambled gp160 sequences, yielding an average of 2.5% ± 2.2% (s.d.) hits. This is statistically distinct from the result obtained with the native HXB2 sequence (Z-score p-value = 0.006). This further substantiates the hypothesis that the HIVIG-captured peptides truly represent regions of the viral gp160. Furthermore, identical analyses were conducted using the spike proteins of eleven other RNA viruses; they revealed no significant similarity between the peptides and these other viral proteins.
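The significance figure can be reproduced with a one-line Z-score. The observed hit rate (18/223) is compared with the mean and standard deviation of the hit rates from the scrambled-sequence controls, and the Z-score is converted to a one-sided upper-tail p-value. The normal approximation of the null distribution is an assumption. The empirical alternative included in the sketch, the fraction of scrambled controls scoring at least as high as the native sequence with the usual +1 correction, avoids that assumption but needs the individual control results, which are not listed here. Function names are hypothetical.

```python
import math


def z_and_p(observed: float, null_mean: float, null_sd: float) -> tuple:
    """Z-score of an observation against a null, with its one-sided upper-tail p-value."""
    z = (observed - null_mean) / null_sd
    # Upper tail of the standard normal: 1 - Phi(z) = erfc(z / sqrt(2)) / 2
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


def empirical_p(observed: float, null_values: list) -> float:
    """Permutation-style p-value: share of controls >= observed, with +1 correction."""
    hits = sum(v >= observed for v in null_values)
    return (1 + hits) / (1 + len(null_values))


if __name__ == "__main__":
    z, p = z_and_p(observed=18 / 223, null_mean=0.025, null_sd=0.022)
    print(f"Z = {z:.2f}, one-sided p = {p:.4f}")
```

With the reported numbers this gives Z ≈ 2.5 and a one-sided p of roughly 0.006, consistent with the value quoted above.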
BLASTP analysis of HIVIG-captured peptides against viral coat proteins.
MEME analysis of all peptides with ≥2 copies (648 unique peptides) identified 10 distinct motifs, each based on 14–162 unique peptides. As illustrated in the corresponding figure, all 18 hits from the preceding BLASTP analysis could be assigned to 5 of the 10 motifs. This indicates that none of the 18 peptides was selected through an accidental alignment. Rather, each is part of a true motif together with many similar peptides, all selected owing to their correspondence to the same linear segments of the gp160 envelope protein. This result clearly illustrates that Deep Panning of polyclonal serum produces families of affinity-selected peptides that define disease-related motifs capable of revealing meaningful epitopes of the pathogen. This has potential applications in the development of diagnostics and vaccines, as discussed below.
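Checking whether every BLASTP hit falls within some motif is a membership test over the motif set. The sketch below summarizes each motif as a regular expression and reports which motifs each peptide matches, plus any peptides left unassigned. Regular expressions are a coarse stand-in chosen for brevity: MEME itself scores sites with position-specific probability matrices, which this sketch does not reproduce. The motif patterns, peptides and function names are all hypothetical.

```python
import re


def assign_to_motifs(peptides, motifs: dict) -> dict:
    """Map each peptide to the names of the motif patterns found within it."""
    compiled = {name: re.compile(pattern) for name, pattern in motifs.items()}
    return {
        pep: [name for name, rx in compiled.items() if rx.search(pep)]
        for pep in peptides
    }


def unassigned(assignment: dict) -> list:
    """Peptides that match none of the motifs."""
    return [pep for pep, names in assignment.items() if not names]


if __name__ == "__main__":
    toy_motifs = {"motif_1": "GI[GV]G", "motif_2": "^[KR]I"}  # hypothetical patterns
    toy_hits = ["ADGIVGW", "KIAAAAA", "PPPPPPP"]              # toy peptides
    assignment = assign_to_motifs(toy_hits, toy_motifs)
    print(assignment)
    print("unassigned:", unassigned(assignment))
```

An empty `unassigned` list for the real hit set would correspond to the result reported above, where every BLASTP hit belongs to a motif.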
Assignment of MEME motifs within HIV gp160.