An overview of IVV-HiTSeq and its two major parts are shown in . The first part is the in vitro selection, which follows the procedure of the previously reported mRNA display method using IVV7. The second part comprises the NGS procedure and the subsequent in silico analysis. Because the sequenced sample is a mixture of selection libraries and yields a large number of reads, RT-PCR amplification was performed with 4-base barcoded primers specific to each selection library. The barcoded RT-PCR products allowed in silico quantitative analysis of interaction sequence tags in each round of selection. As a negative control, the same procedure was conducted in the absence of bait protein [bait(−)]. Finally, the bait(+), bait(−) and pre-selection (initial library) samples were sequenced separately on the 454 sequencer.
Overview of IVV-HiTSeq as a completely cell-free system for detecting interactors of a target bait protein.
To demonstrate the IVV-HiTSeq method, the above procedure was iterated for four rounds to enrich prey proteins that interacted with mouse interferon regulatory factor 7 (Irf7) from a randomly fragmented cDNA library created from mouse spleen. The primary sequence data comprised 206,322 reads for the bait(+), 304,504 reads for the bait(−) and 277,833 reads for the initial library samples (see Supplementary Table S1). After erroneous reads were eliminated, selection-round information was assigned to each read based on its round-specific barcode sequence. This process yielded 177,935, 278,816 and 238,683 reads for the bait(+), bait(−) and initial libraries, respectively (see Supplementary Table S1). These sets of reads were then mapped to the genomic sequences (see Supplementary Table S1), which yielded 47,849, 63,306 and 102,092 post-mapping reads for the bait(+), bait(−) and initial libraries, respectively. These post-mapping reads formed the datasets used in the subsequent in silico analysis to identify true positives without the need for real-time PCR verification assays.
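The assignment of selection-round information from barcodes can be sketched as follows. This is a minimal Python illustration, not the pipeline used in this study: the four barcode sequences, the assumption that the barcode occupies the first four bases of each read, and the function names `assign_round` and `demultiplex` are all hypothetical.

```python
# Hypothetical barcode-to-round table; the actual 4-base barcodes are not given in the text.
BARCODES = {
    "ACGT": "round1",
    "TGCA": "round2",
    "GATC": "round3",
    "CTAG": "round4",
}
BARCODE_LEN = 4


def assign_round(read, barcodes=BARCODES):
    """Return (round, trimmed_read), or (None, read) if the prefix matches no barcode."""
    prefix = read[:BARCODE_LEN].upper()
    label = barcodes.get(prefix)
    if label is None:
        return None, read
    return label, read[BARCODE_LEN:]


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by selection round and count reads whose barcode is not recognized."""
    by_round = {label: [] for label in barcodes.values()}
    unassigned = 0
    for read in reads:
        label, trimmed = assign_round(read, barcodes)
        if label is None:
            unassigned += 1
        else:
            by_round[label].append(trimmed)
    return by_round, unassigned


# Toy reads for illustration only.
reads = ["ACGTTTGACC", "TGCAGGATTC", "ACGTCCCAAA", "NNNNACGTAC"]
by_round, unassigned = demultiplex(reads)
```

Exact prefix matching is the simplest choice; a real pipeline might also tolerate sequencing errors within the barcode, which this sketch does not attempt.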
To validate the accuracy of the in silico analysis of IVV-HiTSeq, we compared its results with real-time PCR assays for 21 interacting regions (IRs) that were randomly selected from all the IRs (including false-positive candidates) that overlapped with sequences in the NCBI RefSeq database. Full details of this comparison can be found in Supplementary Table S2 and Supplementary Fig. S1. First, read frequencies in each IR per selection round were calculated from the number of aligned reads in each IR. Two examples of the comparison between the numbers of reads obtained by NGS and the numbers of molecules quantified by real-time PCR assays are shown in . The correlation between the NGS and real-time PCR datasets was then calculated, which confirmed a strong positive correlation (Pearson's correlation coefficient = 0.92). Using this quantitative ability of IVV-HiTSeq, we determined whether each of the 21 IRs was a true IR with statistical significance (P < 0.001). P values were calculated using Fisher's exact probability test for 2 × 2 contingency tables. Each contingency table consisted of the numbers of reads at a given region for a given round of the bait(+) and bait(−) experiments, and the total numbers of reads for the corresponding experiments in the selection round being compared. Differences between the initial and given rounds were compared in the same manner.
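As an illustration of the test described above, the following Python sketch computes a Fisher's exact P value for one 2 × 2 table using only the standard library. Several points are assumptions rather than details from this study: the table layout (reads inside the IR versus all remaining reads, for bait(+) and bait(−)), the use of a two-sided test, the function names, and the read counts in the usage lines, which are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def ir_p_value(ir_plus, total_plus, ir_minus, total_minus):
    """Assumed layout: reads inside the IR vs. all other reads, bait(+) vs. bait(-)."""
    return fisher_exact_two_sided(ir_plus, total_plus - ir_plus,
                                  ir_minus, total_minus - ir_minus)


# Hypothetical counts for one IR in one selection round.
p = ir_p_value(ir_plus=120, total_plus=40000, ir_minus=15, total_minus=50000)
is_enriched = p < 0.001
```

The same function could be applied to the comparison between the initial library and a given round by substituting the initial-library counts for the bait(−) counts.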
Comparison between real-time PCR data and the read frequency of 454 sequencing.
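The correlation between read counts and real-time PCR quantities reported above is a Pearson coefficient, which can be computed as in the following sketch. The function name `pearson_r` and the paired values are hypothetical and serve only to show the calculation; they are not data from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-round values for one IR: NGS read counts and real-time PCR molecule counts.
ngs_reads = [12, 85, 410, 1630]
qpcr_molecules = [3.1e3, 2.4e4, 9.8e4, 4.2e5]
r = pearson_r(ngs_reads, qpcr_molecules)
```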
Of the true positives identified by the statistical test in the in silico analysis, 88% (7/8) were also positive in the real-time PCR assays, indicating that IVV-HiTSeq is highly reliable. Furthermore, 89% (8/9) of the positives recognized by the real-time PCR assays were correctly recognized as positives in the in silico analysis, indicating that IVV-HiTSeq also had high coverage. When the in silico procedure was applied to all the data in the datasets, 110 enriched IRs were identified that overlapped with protein-coding regions in 106 RefSeq genes (the equivalent of 106 protein-protein interactions; see Supplementary Table S3).
IVV-HiTSeq was compared with the conventional method using Sanger sequencing for the same prey library and bait: 640 sequences (87%) determined by Sanger sequencing were also obtained by IVV-HiTSeq, whereas most of the sequences (99.7%) obtained by IVV-HiTSeq were new and not found by Sanger sequencing. Moreover, 88% (7/8) of the candidates selected by IVV-HiTSeq, including the in silico analysis, were positive in the real-time PCR assays, whereas only 43% (9/21) of the samples chosen randomly from the raw reads were positive.
Overlap between IVV-HiTSeq and Sanger sequencing data.
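The two overlap percentages reported above are fractions of a shared set relative to each method's own total. A minimal Python sketch of that calculation follows. It assumes that sequences from the two methods can be reduced to comparable identifiers; the function name `overlap_summary` and the identifiers are hypothetical.

```python
def overlap_summary(reference, candidate):
    """Summarize the overlap between two collections of sequence identifiers."""
    reference, candidate = set(reference), set(candidate)
    if not reference or not candidate:
        raise ValueError("both collections must be non-empty")
    shared = reference & candidate
    return {
        "shared": len(shared),
        # Share of the reference (e.g. Sanger) set also found by the candidate method.
        "fraction_of_reference_recovered": len(shared) / len(reference),
        # Share of the candidate (e.g. NGS) set absent from the reference set.
        "fraction_of_candidate_novel": 1 - len(shared) / len(candidate),
    }


# Toy identifiers for illustration only.
sanger = {"s1", "s2", "s3", "s4"}
hitseq = {"s1", "s2", "s3", "s5", "s6", "s7", "s8", "s9"}
summary = overlap_summary(sanger, hitseq)
```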