We describe a novel approach for the detection of high-quality Y2H PPIs using DNA microarrays and quantitative statistics. The proof-of-concept study presented here takes full advantage of the established tools for the analysis of DNA microarray data and could have important implications for how future research on protein interactomes is conducted.
We concentrated our proof-of-principle experiments on the HTT and ATXN1 proteins, both of which are neurotoxic upon polyglutamine repeat expansion (18). The approach was validated by the generation of a set of high-confidence PPIs for the HTT protein, which were based on microarray data after multiple testing for significance. These results were benchmarked against sets of known positive PPIs using a quantitative sampling strategy. F-statistics based on precision-recall distributions were used to determine automated cutoffs for high-confidence interactions. PPIs were further restricted by applying two distinct background controls (pool and vector), which allow the simultaneous selection of Y2H positives and the filtering of unspecific autoactivators. Notably, almost two-thirds of the final high-confidence PPIs for an HTT bait protein were known positives or validated by a modified LUMIER assay. Hence, by using quantitative benchmarking and F-statistics, we established a microarray-based Y2H screening method for the high-confidence mapping of PPI networks. However, we also advocate that results may be interpreted with different procedures, depending on the overall screening performance, the availability of sets of known positives and the specific aims of individual researchers (see Supplementary information).
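As an illustration of how a precision-recall-based cutoff can be chosen automatically, the following minimal Python sketch scans every candidate score threshold and keeps the one that maximizes the F-measure against a benchmark set. It is not the pipeline used in this study: the function name `best_f_cutoff`, the prey identifiers and the scores are hypothetical, and treating every prey absent from the benchmark set as a negative is a simplifying assumption.

```python
from typing import Dict, Set, Tuple


def best_f_cutoff(
    scores: Dict[str, float],
    known_positives: Set[str],
    beta: float = 1.0,
) -> Tuple[float, float, float, float]:
    """Return (cutoff, F, precision, recall) for the cutoff maximizing F_beta.

    A prey is 'selected' when its score is >= cutoff. Preys not listed in
    known_positives are counted as negatives (a simplifying assumption).
    """
    total_pos = sum(1 for prey in scores if prey in known_positives)
    if total_pos == 0:
        raise ValueError("no known positives among the scored preys")

    # Rank preys from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    best = None
    true_pos = 0
    for i, (prey, score) in enumerate(ranked):
        if prey in known_positives:
            true_pos += 1
        # Evaluate a cutoff only once all preys tied at this score are included.
        if i + 1 < len(ranked) and ranked[i + 1][1] == score:
            continue
        if true_pos == 0:
            continue  # F is zero here; it cannot be the maximum
        precision = true_pos / (i + 1)
        recall = true_pos / total_pos
        f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
        if best is None or f > best[1]:
            best = (score, f, precision, recall)
    return best


# Hypothetical example: five preys, three of them in the benchmark set.
example_scores = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0}
example_known = {"A", "B", "D"}
cutoff, f, precision, recall = best_f_cutoff(example_scores, example_known)
print(f"cutoff={cutoff}, F={f:.3f}, precision={precision:.2f}, recall={recall:.2f}")
```

With `beta = 1` precision and recall are weighted equally; a value below 1 would favor precision, which may suit screens where high-confidence calls matter more than coverage.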
Besides the mapping of individual high-confidence PPIs, microarray Y2H screening data can be more broadly interpreted for enrichments of pathways and functional associations. This may be important when addressing biological consequences of mutations that alter structural properties in proteins and thus underlie global perturbations in PPI networks and potentially influence the outcome of disease (39). Specifically, we addressed here potential differences in PPI patterns between protein isoforms (ATXN1-Q32 and ATXN1-Q79, containing short and expanded polyQ tracts, respectively). In this assay, ATXN1-Q79 exhibits more and stronger Y2H interactions than ATXN1-Q32.
On the other hand, our data analysis also shows that the overall PPI patterns of wild-type and mutant ATXN1 are not radically different, suggesting that ATXN1 pathology results from abnormally strong interactions with its biological partners. Although resulting from a screening effort in a heterologous system (yeast), this finding is consistent with previously observed effects of expanded polyQ tracts in ATXN1 and other polyQ disease proteins (20). This example demonstrates how microarray-based Y2H procedures can be used in conjunction with extensive data-mining strategies to predict the biological consequences of altered proteins.
While DNA microarrays were used to address Y2H results in an earlier study (40), a quantitative procedure such as the one presented here, with large-scale pooling of a prey library, unbiased selection by competitive growth and systematic control measurements, has not been attempted before. This approach has two major advantages over matrix-based Y2H screenings. First, PPIs are characterized as scores with different parameters (ratios, P-values, etc.) over a wide dynamic range, instead of being simple counts from identifications in replicate screens. Repetitive sampling strategies and the application of two background controls (pool and pBTM comparisons) have the important consequence that potential false-positive interactions can be addressed and eliminated (see Supplementary information for discussion of false-positive interactions). Because false positives are sometimes estimated to account for up to 50% of all reported interactions (41), their minimization would constitute a major advantage for the mapping of high-confidence PPIs and would also reduce the need for confirmation with orthogonal assays. Second, smaller volumes of medium for yeast mating and selection, as well as the efficient readout provided by DNA microarrays, greatly reduce labor and material costs. Simplifying the screening procedure increases potential throughput, so that larger numbers of Y2H screens can be performed in parallel. However, while our system is superior to the ‘classical’ Y2H method with respect to quantitative measurements, it also has some limitations. First, ‘color’-based scoring of interactions via lacZ activation is not possible for the pool-based screening scheme. Second, some ORFs may not undergo proper PCR amplification, which could leave a fraction of putative PPIs undetectable in microarray-Y2H assays. Indeed, a bias against longer DNA sequences is evidenced by the lower representation of ORFs >2 kb in size on the microarray (Supplementary Figure S2). Third, prey proteins in the complex pool that occur as different isoforms or with individual mutations may be indistinguishable on the DNA microarray. Hence, for optimal coverage of potential PPIs, DNA microarray and matrix-based robotic Y2H procedures should be envisioned as complementary approaches.
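The use of two background controls discussed above can be pictured with a short Python sketch. It assumes that each prey already carries a log2 ratio and an adjusted P-value for both comparisons (bait selection versus the unselected pool, and bait selection versus the empty pBTM vector). The `PreyScore` record, the `high_confidence_preys` function, the thresholds and the example values are all hypothetical and are not taken from this study.

```python
from typing import Dict, List, NamedTuple


class PreyScore(NamedTuple):
    """Hypothetical per-prey summary of the two microarray comparisons."""
    log2_vs_pool: float    # bait selection vs. unselected prey pool
    p_vs_pool: float       # adjusted P-value for the pool comparison
    log2_vs_vector: float  # bait selection vs. empty-vector (pBTM) selection
    p_vs_vector: float     # adjusted P-value for the vector comparison


def high_confidence_preys(
    scores: Dict[str, PreyScore],
    min_log2: float = 1.0,  # assumed threshold, for illustration only
    max_p: float = 0.05,    # assumed threshold, for illustration only
) -> List[str]:
    """Keep preys that are enriched against BOTH background controls.

    Enrichment over the pool marks a Y2H positive; enrichment over the
    empty vector rules out preys that grow regardless of the bait
    (autoactivators).
    """
    kept = []
    for prey, s in scores.items():
        vs_pool = s.log2_vs_pool >= min_log2 and s.p_vs_pool <= max_p
        vs_vector = s.log2_vs_vector >= min_log2 and s.p_vs_vector <= max_p
        if vs_pool and vs_vector:
            kept.append(prey)
    return sorted(kept)


# Hypothetical example values.
example = {
    "PREY_A": PreyScore(3.2, 0.001, 2.9, 0.002),  # enriched vs. both controls
    "PREY_B": PreyScore(3.5, 0.001, 0.1, 0.800),  # autoactivator-like profile
    "PREY_C": PreyScore(0.2, 0.600, 0.3, 0.500),  # not enriched
}
print(high_confidence_preys(example))
```

Requiring both comparisons to pass is the strictest way to combine the controls; a ranking that keeps both scores side by side would be an alternative when recall matters more than precision.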