There is great demand for high-throughput methods to characterize ligand affinity. By combining mRNA display with next-generation sequencing, we determined the kinetic on- and off-rates for over twenty thousand ligands, without the need for synthesis or purification of individual members. Our results are reproducible and as accurate as other methods of affinity measurement.
We were able to obtain the kinetic on- and off-rates of over twenty thousand individual ligands for their target protein, without the need to synthesize each ligand separately. We accomplished this by combining mRNA display and high-throughput DNA sequencing.
Various in vitro selection techniques (e.g., phage display, ribosome display, and mRNA display) facilitate the generation of polypeptide ligands against targets of interest. Recent advances combining in vitro selection with high-throughput sequencing have greatly accelerated the process of generating large lists of potential ligands. The challenge, increasingly, is to rank the molecules based on desirable properties, chiefly, their affinity for their targets.
Initially, we hypothesized that the affinity of a ligand would correlate directly with its frequency rank in an affinity-enriched pool of ligands. Although we have shown with mRNA display that higher-ranked sequences do exhibit functionality, we observe that a sequence’s rank does not accurately predict binding affinity. One source of this variation is that all sequences above the threshold of each enrichment step will show near-quantitative pull-down. The enrichment efficiency of high-affinity vs. ultrahigh-affinity sequences can thus depend on other factors during selection, such as PCR bias, transcription bias, ligation bias, translation efficiency, and the efficiency of fusion formation. As a result, although the most highly represented sequences in a converged pool exhibit functionality, their rank order correlates poorly with their affinity.
Due to this effect, there is a great need for methods that evaluate the affinity of a ligand for its target in a high throughput manner. Advances in the field have increased the throughput of Kd measurements using radioactivity,[5a] SPR or fluorescent microarrays, and ELISA assays.[5b] However, all these methods require individually expressed and purified ligands, greatly reducing their throughput. Measuring the Kd for thousands of potential ligands simultaneously has not yet been achieved.
In this work, we combined high-throughput DNA sequencing with mRNA display to obtain kinetic on- and off-rates, and consequently Kd values, for tens of thousands of ligands simultaneously. To demonstrate our method, we chose two enriched pools from our selection against B-cell lymphoma extra-large protein (Bcl-xL) (Takahashi and Roberts, manuscript in preparation). The first pool is the final enriched pool from a selection that yielded 21-amino-acid peptide ligands against Bcl-xL (extension selection). The second pool is the final enriched pool of a doped (biased) selection, based on one of the top sequences (E1) from the extension selection, that was performed to further optimize binding. mRNA from both pools was ligated to a 3′ DNA linker attached to puromycin, in vitro translated, purified, and reverse transcribed. A small fraction of each pool was also translated using radiolabeled methionine to track pool binding.
To obtain on-rates by High Throughput Sequencing Kinetics (HTSK), a library of mRNA-peptide fusions was first mixed with Bcl-xL immobilized on magnetic beads. A portion of the beads was removed at various time points, washed, PCR amplified, and sent for next-generation sequencing. A sample calculation is shown for peptide sequence E5 in Figure 1a. High throughput sequencing of each time point allowed identification of all the ligands bound to the beads at that point as well as each sequence’s frequency. We were able to calculate each sequence’s fractional composition by dividing the sequence frequency by the total number of sequences at each time point (Figure 1b, left panel). Sequences with fast on-rates bind to the target quickly; therefore, they have a high fractional composition at early time points. As time passes, more slow on-rate ligands bind to the beads, reducing the composition fraction of the fast on-rate ligands.
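The fractional-composition step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function name `fractional_composition` and the read labels `FAST` and `SLOW` are hypothetical placeholders, and real input would be peptide sequences parsed from the sequencing output for one time point.

```python
from collections import Counter

def fractional_composition(reads):
    """Return {sequence: count / total reads} for a single time point."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}

# Hypothetical reads recovered from the beads at an early and a late time point.
early_reads = ["FAST"] * 8 + ["SLOW"] * 2
late_reads = ["FAST"] * 5 + ["SLOW"] * 5

frac_early = fractional_composition(early_reads)
frac_late = fractional_composition(late_reads)
```

In this toy input the fast on-rate sequence makes up 80% of the early reads but only 50% of the late reads, which mirrors the dilution effect described in the text.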
Separately, using the radiolabeled samples, we measured the total amount of peptide bound to the beads at each time point (Figure 1b, middle panel). The amount of radioactivity at each time point represents the sum of all the peptides bound to the beads at that point. For ligand pools of low diversity, where the composition of the pool can vary significantly during the association and dissociation phases, it is prudent to normalize the radiolabeled signal to the average number of methionine residues per peptide in each pool to obtain a more accurate measure of the total peptide bound to the beads. This normalization is needed only when radiolabeled binding is used to measure the total peptide bound at each time point and diversity is low. Radiolabeled binding is not the only way to measure the total amount of peptide bound to the beads. Similar calculations can be performed using immunosorbent or fluorescence assays, or simply by quantitating the amount of DNA/RNA bound to the beads. To obtain the binding curve needed for each ligand's kinetic on-rate, we multiplied each ligand’s fractional composition by the total amount of peptide bound to the beads. This results in a measure of binding for each sequence as a function of time (Figure 1b, right panel).
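The multiplication of fractional composition by total bound peptide can be sketched as follows. The function name `per_sequence_binding`, the sequence labels, and all numbers are hypothetical; the totals are taken here to be the fraction of input peptide bound to the beads, which is an assumption about units rather than something the text specifies.

```python
def per_sequence_binding(fractions_by_time, total_bound_by_time):
    """Combine per-time-point fractions with the total amount bound.

    fractions_by_time: list of {sequence: fractional composition}, one per time point.
    total_bound_by_time: list of total peptide bound, one value per time point.
    Returns {sequence: [amount bound at each time point]}.
    """
    n_points = len(total_bound_by_time)
    curves = {}
    for i, (fractions, total) in enumerate(zip(fractions_by_time, total_bound_by_time)):
        for seq, frac in fractions.items():
            # Sequences absent from a time point keep a value of 0.0 there.
            curves.setdefault(seq, [0.0] * n_points)[i] = frac * total
    return curves

# Hypothetical values: sequence A drops from 80% to 50% of the reads,
# while the total bound peptide rises from 0.10 to 0.40 of the input.
fractions = [{"A": 0.8, "B": 0.2}, {"A": 0.5, "B": 0.5}]
totals = [0.10, 0.40]
curves = per_sequence_binding(fractions, totals)
```

Although the fractional composition of sequence A falls between the two time points, its bound amount still rises (0.08 to 0.20 in this toy example), which is why the fractional data alone cannot be fitted directly.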
Using this analysis, and knowing the concentration of immobilized Bcl-xL, we obtained the kinetic on-rate for each sequence by fitting the binding data to a simple kinetic on-rate equation (see Figure S1 in the Supporting Information). Since the concentration of Bcl-xL (7 nM) was much higher than the concentration of the mRNA-peptide fusion molecules (<1 nM), the fraction of ligand bound is not a function of ligand concentration. The dissociation term was omitted from the binding equation because, on the short time scale of this experiment (~45 minutes) and given the slow off-rates of the sequences tested (2 × 10⁻⁶ s⁻¹ on average), its contribution is minimal. This allows for independent calculation of on- and off-rates (see Figure S1 in the Supporting Information).
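As an illustration of such a fit, the sketch below assumes the standard pseudo-first-order association form with dissociation neglected, f(t) = 1 − exp(−kon·[Bcl-xL]·t), where f is the fraction of a ligand's input that is bound. Figure S1 is not reproduced here, so this form, the linearized least-squares fit, the function name `fit_on_rate`, and the synthetic on-rate of 10⁵ M⁻¹ s⁻¹ are all assumptions made for the example; only the 7 nM target concentration comes from the text.

```python
import math

def fit_on_rate(times_s, fraction_bound, target_conc_molar):
    """Estimate kon (M^-1 s^-1) from fraction-bound data.

    Linearizes f(t) = 1 - exp(-kon * [target] * t) as
    -ln(1 - f) = kon * [target] * t and fits a line through the origin.
    """
    ys = [-math.log(1.0 - f) for f in fraction_bound]
    slope = sum(t * y for t, y in zip(times_s, ys)) / sum(t * t for t in times_s)
    return slope / target_conc_molar

# Synthetic, noise-free data generated from an assumed kon of 1e5 M^-1 s^-1.
target_conc = 7e-9                   # 7 nM immobilized Bcl-xL, as stated in the text
true_k_on = 1e5                      # hypothetical value used only to build the data
times = [300.0, 900.0, 1800.0, 2700.0]   # seconds, up to ~45 minutes
fractions = [1.0 - math.exp(-true_k_on * target_conc * t) for t in times]

k_on_fit = fit_on_rate(times, fractions, target_conc)
```

A nonlinear least-squares fit of the untransformed curve would weight the data points differently; the linearized form is used here only because it needs nothing beyond the standard library.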
To obtain the HTSK off-rates, we followed a similar approach. After the kinetic on-rate experiment, the remaining beads were washed and excess Bcl-xL was added in solution to prevent rebinding of dissociated ligands to the beads under pseudo-first order binding conditions. A small fraction of beads was removed at various time points, washed, PCR amplified, and sent for next-generation sequencing (Figure 1c, left panel). By multiplying each sequence’s fractional composition by the total radiolabeled peptides still bound at each time point (Figure 1c, middle panel), we were able to obtain the amount of each peptide still bound as a function of time. A simple exponential decay fit was then used to calculate the kinetic off-rate (Figure 1c, right panel). The data for generating Figure 1c is presented in Table S1 in the Supporting Information. We were also able to obtain the pool’s on- and off-rates by quantitating the amount of DNA on the beads at each time point. The kinetic rates obtained by DNA quantitation matched the rates obtained by radiolabeled binding (<30% deviation, Table S2 in the Supporting Information).
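The exponential decay fit can be sketched in the same spirit. The example below assumes single-exponential dissociation, B(t) = B0·exp(−koff·t), and fits it by linear regression on ln B(t). The function name `fit_off_rate`, the sampling times, and the starting amount are hypothetical; the off-rate of 2 × 10⁻⁶ s⁻¹ used to generate the synthetic data is the average quoted in the text.

```python
import math

def fit_off_rate(times_s, amount_bound):
    """Estimate koff (s^-1) and B0 by least-squares regression of ln(B) on t."""
    ys = [math.log(a) for a in amount_bound]
    n = len(times_s)
    t_mean = sum(times_s) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_s, ys))
    sxx = sum((t - t_mean) ** 2 for t in times_s)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return -slope, math.exp(intercept)

# Synthetic, noise-free dissociation data sampled over four days.
true_k_off = 2e-6                             # s^-1, average off-rate from the text
times = [0.0, 86400.0, 172800.0, 345600.0]    # seconds (hypothetical sampling times)
bound = [0.3 * math.exp(-true_k_off * t) for t in times]

k_off_fit, b0_fit = fit_off_rate(times, bound)
```

With noisy data, a log-transformed fit gives more weight to late, low-signal points than a direct nonlinear fit would, so the choice of fitting method is worth checking against real measurements.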
Figure 2a shows the Kd obtained for the 50 highest-frequency ligands in each tested pool. As expected, the ligands in the doped pool exhibit a higher affinity on average than the ligands in the extension pool. It is also clear that frequency rank correlates poorly with sequence affinity. To show the reproducibility of the obtained kinetic constants, we compared the values obtained for the 40 ligands that appeared in both the extension and doped pools (Figure 2b). The results show that the HTSK values are highly reproducible between the two pools.
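The Kd values follow from the two fitted rates through the standard relation Kd = koff / kon. A minimal sketch of the conversion is given below; the function name `kd_picomolar` and the on-rate of 10⁵ M⁻¹ s⁻¹ are hypothetical, while the off-rate is the average quoted earlier in the text.

```python
def kd_picomolar(k_off_per_s, k_on_per_molar_s):
    """Equilibrium dissociation constant Kd = koff / kon, converted from M to pM."""
    return k_off_per_s / k_on_per_molar_s * 1e12

# Average off-rate from the text combined with a hypothetical on-rate.
example_kd_pm = kd_picomolar(2e-6, 1e5)
```

For these inputs the result is 20 pM, which is in the range of the picomolar affinities reported for the pools.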
To check the validity of the obtained results, we tested the off-rates of several ligands using in vitro translated radiolabeled peptides. The peptide ligands were made with a C-terminal HA tag and affinity purified. The off-rate of the radiolabeled peptides was then measured. Figure 2c shows the HTSK vs. radiolabeled peptide off-rates. The HTSK off-rates correlate very well with the radiolabeled peptide off-rates; however, there is a consistent bias between the two methods. The measured bias is ~7-fold for the fastest off-rate clone, and less than 2-fold for the slowest off-rate clones. This bias is relatively small in comparison to biases measured between other established methods for affinity measurement, which frequently vary by as much as 60-fold.[6b, 8] One contributing factor to this difference could be the context of binding. The HTSK results are obtained for mRNA-DNA-peptide fusion molecules, whereas the radiolabeled koff values are for the peptide with a short C-terminal HA tag. To further demonstrate the accuracy of this assay, we compared the HTSK Kd values for two of the peptides with previously published results. The E1 peptide has a Kd value of 39 ± 6 pM by ELISA and 23 ± 2 pM by HTSK, and the D1 peptide has a Kd value of 9 ± 2 pM by ELISA and 15 pM by HTSK (see Table S3 in the Supporting Information).
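The size of the disagreement between two methods can be expressed as a fold difference. The short sketch below applies this to the ELISA and HTSK Kd values quoted above; the function name `fold_difference` is hypothetical, and the calculation ignores the reported uncertainties.

```python
def fold_difference(value_a, value_b):
    """Ratio of the larger to the smaller of two positive measurements."""
    return max(value_a, value_b) / min(value_a, value_b)

# Kd values in pM taken from the text (ELISA vs. HTSK).
e1_fold = fold_difference(39.0, 23.0)
d1_fold = fold_difference(9.0, 15.0)
```

Both peptides come out at roughly 1.7-fold, well inside the 60-fold spread cited for comparisons between other established methods.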
Using HTSK, we identified peptide D79 (frequency rank of 79 in the doped selection pool) with a koff value of 5.9 × 10⁻⁷, over three times slower than the previously identified slowest off-rate peptide ligand (D1) or the biotin-streptavidin interaction (Figure 2d). We also identified peptide E1452 (frequency rank of 1452 in the extension selection pool) with a koff value of 8.5 × 10⁻⁷, over twofold slower than D1 (see Figure S2 in the Supporting Information). Indeed, even at this modest chain length (21 amino acids), HTSK identified thousands of sequences with a Kd of 10 pM or better (~2,600 in the Supporting Dataset). The presence of rare sequences with higher affinities than the most frequent sequences suggests the need to test lower-abundance sequences for functionality. While this is not practical when individual sequences must be synthesized and tested, our HTSK method provides a viable approach to testing thousands of sequences simultaneously.
While we used mRNA display for determining HTSK of an in vitro library, the HTSK approach is directly transferable to aptamer selection techniques or any monomeric genotype-phenotype linked display system (e.g., ribosome display). The results from such a high-throughput analysis could be used not only to find the highest affinity binders, but also to obtain structural information from the mutational analysis of a protein. Here, we have shown our HTSK method to be reproducible and accurate, and have identified the highest affinity peptide-protein interaction yet discovered.
This work was supported by NIH grants R01AI085583 (R.W.R.) and R01CA170820 (R.W.R. and T.T.T.) and the Ming Hsieh Institute for Research on Engineering-Medicine for Cancer (R.W.R.). The authors made use of the USC Nanobiophysics core and the USC Genome & Cytometry Core as part of this work. We thank Mehmet Cetin for providing the initial version of the Python code used in the analysis of high-throughput DNA sequencing results.
Supporting information for this article is given via a link at the end of the document.
Dr. Farzad Jalali-Yazdi, Department of Chemical Engineering and Materials Science.
Lan Huong Lai, Department of Chemistry.
Prof. Terry T. Takahashi, Department of Chemistry.
Prof. Richard W. Roberts, Department of Chemical Engineering and Materials Science. Department of Chemistry. Department of Molecular Computational Biology, USC Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90089 (USA)