Understanding how proteins are able to form stable complexes is of fundamental interest from the perspective of protein structure and function. Here we show that λ repressor fusions can be used to identify and characterize homotypic interaction domains encoded by the genome of Saccharomyces cerevisiae, using a selection for polypeptides that can drive the assembly of the DNA binding domain of bacteriophage λ repressor. Three high complexity libraries were constructed by cloning random fragments of S. cerevisiae DNA as λ repressor fusions. Repressor fusions encoding homotypic interactions were recovered, identifying oligomerization units in 35 yeast proteins. Seventeen of these interaction domains have not been previously reported, while the other 18 represent homotypic interactions that have been characterized at varying levels of detail. The novel interactions include several predicted coiled-coils as well as domains of unknown structure. With the availability of genomic sequences it should be possible to apply this approach, which provides information about protein-protein interactions that is complementary to that obtained from yeast two-hybrid screens, on a genome-wide scale in yeast or other organisms where large-scale protein-protein interaction data is not available.
There is broad interest in the development of genome-wide methods for identifying protein-protein interactions. Recently, several large-scale yeast two-hybrid screens have been used to generate protein interaction maps for Saccharomyces cerevisiae (Fromont-Racine et al., 1997; Ito et al., 2000; Uetz et al., 2000; Ito et al., 2001; Schwikowski et al., 2000). The λ repressor fusion system is well suited for a complementary interaction hunt focused on identifying homotypic interaction domains. Bacteriophage λ repressor requires its C-terminal dimerization domain for proper biological activity. Removing the C-terminal domain inactivates the repressor. However, a heterologous oligomerization domain fused to the native DNA binding domain can reconstitute the activity of the repressor (Hu et al., 1990). Since its original description as a system to study the oligomerization properties of the Gcn4p leucine zipper (Hu et al., 1990), the bacteriophage λ repressor fusion system has been used to map and characterize oligomerization domains present in a large number of proteins from a variety of biological sources (reviewed in Mariño-Ramírez and Hu, 2001).
The widespread use of the repressor system suggested that it could also be used to identify new homotypic interactions on a genomic scale. However, initial efforts to use the repressor system to find homotypic oligomerization domains from S. cerevisiae (Zhang et al., 1999) and E. coli (Jappelli and Brenner, 1999) were discouraging, due to high backgrounds of self-assembling peptides that did not correspond to annotated open reading frames (ORFs).
Here, we describe the use of a modified version of the repressor system in a pilot screen to identify homotypic interaction domains encoded by the S. cerevisiae genome. The modified repressor fusion system uses a weak constitutive promoter to drive the expression of the fusions as well as an amber mutation at position 103 of λcI to allow rapid screening for insert dependence (Mariño-Ramírez and Hu, 2001). We show that our modified system recovers both known and previously unidentified interaction domains, and that the background of non-ORF encoded self-assembling peptides has been substantially reduced.
The vectors used for these studies have three modifications compared to pJH391, the plasmid used in most published repressor fusion studies. First, we deleted a fragment containing the rop gene to increase the plasmid copy number and increase the yield of DNA for sequencing. Second, we replaced the lacUV5 promoter with the P7107 promoter (Zeng et al., 1997), a weak constitutive promoter, to decrease the expression of the fusion protein and eliminate the background of host mutations in the lacI gene. Third, we introduced an amber mutation at position 103 of λcI to facilitate testing the fusion constructs for insert dependence. Three plasmid vectors (pLM99-101) were constructed, allowing in-frame fusion in all three forward reading frames (Figure 2).
Our general strategy is to start with a library of genomic DNA fragments cloned downstream of the repressor DNA-binding domain, select for those that confer immunity to phage infection, and then screen the survivors for those where the immune phenotype requires expression of the insert. The initial selection is done in a strain containing an amber suppressor (supF), while the screening for insert-dependence is done by comparing the ability of repressor fusions to repress a λPL-cat reporter in the presence and absence of the amber suppressor. In strains carrying this reporter, repressor activity turns off chloramphenicol resistance.
To test the feasibility of the approach we performed two reconstruction experiments. First, we constructed a library using genomic DNA from bacteriophage λ as the source of inserts. As expected, we were able to recover the C-terminal domain (amino acids 136-237) of λ repressor as well as inserts from several other λ genes, including the putative self-assembly domain from λ P (amino acids 39-233), which has previously been shown to form homodimers (Zylicz et al., 1984). Second, we tested the ability of well-characterized oligomeric proteins from yeast to drive sufficient self-assembly of repressor fusions to confer phage immunity. We cloned two known dimeric yeast proteins, full-length Tpi1p and Gcn4p, as repressor fusions. Both reconstituted the activity of λ repressor in an insert-dependent manner.
We constructed libraries in each of our three vectors using quasi-random genomic DNA fragments of the S. cerevisiae strain BY4741, an S288C derivative (Brachmann et al., 1998). We estimate that each library contains ~10^6 independent inserts; 95% of the clones contained a single genomic insert, with an average insert size of 1000±500 base pairs.
Each of the three repressor fusion libraries was then subjected to selection for phage immunity. Survivors were screened for insert-dependence and the positive clones were identified by DNA sequencing. Figure 1 shows a flow chart for the processing of the clones through each step in the screen. The positive clones identified fall into two categories: ORF-encoded and non-ORF-encoded. We identified 180 ORF-encoded interacting sequence tags (ISTs). These ISTs were clustered into families of overlapping fragments, identifying potential homotypic interactions in 35 yeast proteins (Table 1).
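The clustering of ISTs into families of overlapping fragments can be sketched as a simple interval-merging step. This is only an illustration, not the procedure actually used in this study: the function name, the tuple layout, and the example ORF names and coordinates are all hypothetical, and amino-acid coordinates are assumed to be inclusive.

```python
from collections import defaultdict


def cluster_ists(ists):
    """Group ISTs into families of overlapping fragments, one ORF at a time.

    `ists` is an iterable of (orf_name, start, end) tuples using inclusive
    amino-acid coordinates (a hypothetical layout chosen for this sketch).
    Returns a dict mapping each ORF name to a list of families, where each
    family is a list of (start, end) fragments that overlap one another.
    """
    by_orf = defaultdict(list)
    for orf, start, end in ists:
        by_orf[orf].append((start, end))

    families = {}
    for orf, spans in by_orf.items():
        spans.sort()
        groups = [[spans[0]]]
        reach = spans[0][1]  # last residue covered by the current family
        for start, end in spans[1:]:
            if start <= reach:
                # Overlaps the current family: extend it.
                groups[-1].append((start, end))
                reach = max(reach, end)
            else:
                # No overlap: start a new family for this ORF.
                groups.append([(start, end)])
                reach = end
        families[orf] = groups
    return families


# Hypothetical coordinates, for illustration only.
example = [
    ("ORF_A", 1, 119),
    ("ORF_A", 12, 150),
    ("ORF_B", 40, 90),
    ("ORF_A", 300, 360),
]
families = cluster_ists(example)
```

In this toy input, the two overlapping ORF_A fragments fall into one family, while the distant ORF_A fragment and the single ORF_B fragment each form their own.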
We also identified 335 non-ORF ISTs, which cluster into 23 unique sequences. All of these contain runs of 5-31 contiguous cysteines, where the shorter oligocysteines are part of much longer Cys-rich peptides. These non-ORF peptides are derived from the antisense strands of 20 different annotated ORFs containing poly Q, N or S sequences (see Table 2). Based on their simple sequences, these peptides could be easily identified and discarded.
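Because the non-ORF peptides are defined by runs of contiguous cysteines, a filter for them could be as simple as the sketch below. The threshold of five residues is taken from the lower bound of the runs reported above; the function names and the example peptides are hypothetical, and this is not necessarily how the peptides were flagged in this study.

```python
import re


def longest_cys_run(peptide):
    """Return the length of the longest run of contiguous cysteines (C)."""
    runs = re.findall(r"C+", peptide.upper())
    return max((len(run) for run in runs), default=0)


def is_oligocysteine(peptide, min_run=5):
    """Flag a translated insert that contains at least `min_run` contiguous Cys.

    min_run=5 mirrors the shortest run reported in the text; treating it as a
    cutoff is an assumption of this sketch.
    """
    return longest_cys_run(peptide) >= min_run


# Hypothetical peptides, for illustration only.
flagged = is_oligocysteine("MKCCCCCCLA")  # contains a run of six cysteines
kept = not is_oligocysteine("MKCCLACA")   # longest run is two cysteines
```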
The ISTs identify not only those genes that encode proteins that could form homotypic oligomers, but also the regions within the genes that encode sequences that are sufficient to drive oligomerization. For seven of the 35 ORFs with ISTs, we found more than one fragment that encoded an IST. In each case the fragments were overlapping. The sizes of the shortest ISTs for each gene range from 26 amino acids (aa) for Cat8p to 400 aa for Fap1p.
We performed literature and database searches (Cherry et al., 1998; Hodges et al., 1999) to determine which of the ISTs corresponded to interactions that had been observed previously by other methods. Among the 35 proteins identified here, homotypic interactions have been previously demonstrated for 17 of them by biochemical or genetic methods (Table 1). The evidence for interaction ranges from crystal structures of the self-assembling domain to positive results in yeast two-hybrid assays. In addition, in the cases where the oligomerization domain has been mapped (Hsp42p, Wotton et al., 1996; Hsp82p, Nemoto et al., 1995; Pho4p, Shimizu et al., 1997; Rep2, Sengupta et al., 2001; Tup1p, Varanasi et al., 1996; Yel015wp, Fromont-Racine et al., 2000), our ISTs contain the sequences shown to be needed for self-assembly.
To determine whether the ISTs recovered represented known or novel structures, we performed BLAST searches comparing our ISTs to all of the polypeptide sequences of proteins whose 3-D structures have been deposited in the Protein Data Bank (PDB). Figure 2 shows the structures of three ISTs where structures are known or can be inferred from homology. In only one case, Pho4p, did we find both a structure and an IST for the same protein from S. cerevisiae. The Pho4p IST encodes a homodimeric basic helix-loop-helix region required for Pho4p activity.
In two other cases, clear homology was found to one or more proteins in the PDB. KGD2 encodes dihydrolipoamide succinyltransferase, the E2 component of the yeast α-ketoglutarate dehydrogenase complex. As an essential component of the TCA cycle, ketoglutarate dehydrogenase subunits are found throughout evolution, and structures of subdomains of the E2 complex from bacterial orthologues are found in the PDB. The Kgd2p IST is 57% identical to the corresponding sequence within its E. coli orthologue (PDB Accession No. 1C4T), and 37% identical to the E2 component of pyruvate dehydrogenase in Bacillus stearothermophilus (PDB Accession No. 1B5S). As in the related pyruvate dehydrogenase complex, the E2 component of ketoglutarate dehydrogenase forms trimers. In the E. coli E2, these trimers form the vertices of a cube with 24 subunits (Knapp et al., 1998) while in B. stearothermophilus the trimers assemble into a dodecahedron with 60 subunits (Izard et al., 1999). Thus, although Kgd2p is known to form homo-oligomers in vivo (Repetto and Tzagoloff, 1991), it is unclear what oligomeric form we are detecting with the λ repressor fusions.
GLR1 encodes the yeast thioredoxin-dependent glutathione reductase. Glr1p is known to function as a homodimer. The IST from Glr1p, which encompasses the C-terminal 132 aa of the protein, has 56% identity with residues 321-450 at the C-terminal end of the E. coli orthologue (PDB Accession No. 1GER) and 58% identity with residues 349-478 of the human orthologue (PDB Accession No. 3GRS). This segment of E. coli and human glutathione reductase forms a homodimeric core with a mixed α/β structure and is located at the dimer interface (Mittl and Schulz, 1994).
We also examined ISTs for their propensity to form coiled-coils, which are commonly found in protein-protein interaction interfaces. The COILS algorithm originally developed by Lupas et al. (1991) predicts coiled-coils with probabilities >80% in nine of the proteins containing ISTs identified in this study (Figure 3). In each case the IST covers part or all of the predicted coiled-coil region. The predicted coiled-coil in Tup1p has been demonstrated experimentally to be helical and sufficient to direct assembly of homotetramers (Jabet et al., 2000).
Using λ repressor fusions we were able to identify potential homotypic interactions in 35 proteins encoded by the yeast genome, including one protein from the 2μ plasmid present in the strain used to make the libraries. About half of the ISTs represent previously identified interactions, while the rest have not been described before. The ISTs we identify also represent a combination of proteins of known structure, those for which structures can be predicted with reasonably high confidence, and proteins of unknown structure.
In principle, all of the identities of yeast proteins capable of self-assembly should show up in an all vs. all interaction screen, such as the large-scale two-hybrid studies being undertaken by several laboratories. What, then, is the benefit of using the repressor fusion approach? Figure 4 shows a Venn diagram representation of the homotypic interactions found in this study and in three different two-hybrid studies: two using full-length ORFs from Ito et al. (2001) and Uetz et al. (2000) and one using predicted coiled-coil domains from Newman et al. (2000). There is minimal overlap between our results and the three large-scale yeast two-hybrid studies. The overlap among the yeast two-hybrid datasets is similarly small. Thus, most of the homotypic interactions we have found were not found in the earlier studies.
As noted by Hazbun and Fields (2001), despite the efforts to make each of the studies comprehensive, many interactions known from biochemical data have not been found by any of the large-scale interaction screens. In the study reported here we are clearly far from saturation. Interactions known to be detectable in reconstruction experiments, most notably the Gcn4p leucine zipper, have not yet been found. Although it is likely that additional screening of existing libraries will yield new ISTs, our libraries are also likely to be biased by the non-random cleavage of CviTI sites in our partial digests. New libraries based on other ways to fragment the target DNA may be a richer source of new ISTs.
In practice, comprehensive identification of protein-protein interactions will involve complementary information from a variety of genetic and biochemical approaches. Among the genetic approaches, repressor fusions are well suited to identify homotypic interactions. Newman et al. (2000) have argued that homotypic interactions, especially those involving homodimers, are likely to be underrepresented in yeast two-hybrid screens due to preferential interaction of baits within a dimeric DNA binding protein over preys coming from solution. A wide variety of technical limitations will affect the recovery of ISTs from yeast two-hybrid pairs, repressor fusions or both; for example, post-translational modifications required for folding or assembly in yeast are unlikely to be recapitulated in E. coli. Nevertheless, both two-hybrid methods and repressor fusions can clearly provide identities of many proteins involved in homotypic interactions.
A genome-wide survey of protein-protein interactions should provide two kinds of information: not only what proteins can interact, but also what parts of the proteins are involved in the interactions. One of the most useful kinds of information provided by repressor fusions is the localization of oligomerization domains on a genome-wide scale. Because repressor fusions require only single libraries of hybrid proteins to identify homotypic ISTs, the number of subdomains that can be tested scales linearly with the number of clones that can be subjected to selection for repressor activity. By contrast, detecting a homotypic interaction in a two-hybrid system requires that both the bait and prey be present in the same cell. This means that the number of protein fragments that can be tested scales only as the square root of the library size.
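The scaling argument above can be made concrete with a small idealized calculation. The sketch assumes that every fragment is equally represented, that each screened clone or cell is an independent draw, and that a homotypic hit in a two-hybrid screen requires the same fragment as both bait and prey, so n fragments need on the order of n × n bait-prey combinations. The function names and library sizes are hypothetical.

```python
import math


def fragments_single_library(clones_screened):
    """Repressor fusions: each clone tests one fragment against itself,
    so the number of fragments testable grows linearly with clones screened."""
    return clones_screened


def fragments_paired_library(cells_screened):
    """Two-hybrid: n fragments give about n * n bait-prey combinations, so
    the number of fragments testable grows as the square root of cells screened."""
    return math.isqrt(cells_screened)


# Hypothetical screen sizes, for illustration only.
for screened in (10**4, 10**6, 10**8):
    print(
        screened,
        fragments_single_library(screened),
        fragments_paired_library(screened),
    )
```

Under these simplifying assumptions, screening 10^6 clones would sample on the order of 10^6 fragments in a single-library selection but only about 10^3 fragments as matched bait-prey pairs.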
These considerations, along with the higher transformation efficiency of E. coli, allowed us to use random fragments of genomic DNA instead of the full-length ORFs favored by the large-scale yeast two-hybrid approaches. Thus, our ISTs provide mapping information about the location of the oligomerization domains within proteins as well as the identities of the proteins involved in self-assembly. In general, the ISTs we find are much smaller than the proteins that contain them. Where different, overlapping ISTs are recovered from the same protein (as for Hsp26p, Mdj1p, Not5p, Skn7p, Srl2p, Tup1p and Yap5p) the endpoints of the ISTs can be used to delimit the minimal region required for oligomerization. In the case of Tup1p, amino acids 1-72 have been shown to be sufficient for oligomerization (Tzamarias and Struhl, 1994; Varanasi et al., 1996). The shortest IST we found covered amino acids 1-119, while the overlap between two ISTs suggested that residues 12-119 might be sufficient to form an oligomer.
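Delimiting a minimal region from overlapping ISTs amounts to taking the intersection of their coordinate ranges. The sketch below illustrates this with the Tup1p numbers quoted above. The shortest IST (residues 1-119) is from the text, but the coordinates of the second IST are not given there, so the second fragment is a hypothetical placeholder chosen only so that the overlap begins at residue 12.

```python
def minimal_region(ists):
    """Return the (start, end) segment shared by all ISTs, or None if the
    fragments have no residues in common. Coordinates are inclusive."""
    start = max(s for s, _ in ists)
    end = min(e for _, e in ists)
    if start > end:
        return None
    return (start, end)


# (1, 119) is the shortest Tup1p IST reported in the text; (12, 200) is a
# hypothetical second fragment used only to illustrate the calculation.
tup1_region = minimal_region([(1, 119), (12, 200)])
```

The shared segment is only a candidate minimal region: it is the part common to all recovered fragments, and whether it is sufficient on its own would still need to be tested directly.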
Self-assembling domains derived from IST analysis will expand our understanding of the many ways nature builds protein complexes. The domains may provide more tractable targets for structure determination than the intact proteins from which they come. Additionally, isolated interaction domains may provide useful tools for functional genomics; expression of the domains in yeast could yield dominant negative phenotypes. While this may not provide much new information in a genetically well-characterized system like S. cerevisiae, a similar approach may be useful for a variety of genetically less tractable organisms. In cases where assembly domains prove to be involved in an important cellular function, the repressor fusions themselves can provide screens for drug discovery.
Finally, detailed study of homotypic interaction domains is likely to identify new structural motifs that will be found in other proteins. Although the interactions we identify are homotypic, it is likely that in many cases, evolutionarily related structures are also used for heterotypic interactions. Examples of structures used in both homotypic and heterotypic interactions include the HLH (Robinson and Lopes, 2000) and leucine zipper (Hurst, 1994) motifs. Other interaction domains that function in both homo- and hetero-oligomers may provide additional mechanisms of regulation by combinatorial assembly of different subunits.
All the strains used in this study are derivatives of AG1688 [F’128 lacIq lacZ::Tn5/araD139, Δ(ara-leu)7697, Δ(lac)X74, galE15, galK16, rpsL(StrR), hsdR2, mcrA, mcrB1] (Hu et al., 1993). The repressor fusion libraries were transformed into JH787 [AG1688 (ϕ80 Su-3)]. The screening for insert dependence was done on LM58 [JH787 (λLM58)] and LM59 [AG1688 (λLM58)]. λLM58 carries a PL-cat reporter. The repressor fusion vectors (Figure 5) used to generate the libraries were pLM99 (GenBank Accession No. AF308739), pLM100 (GenBank Accession No. AF308740) and pLM101 (GenBank Accession No. AF308741). These vectors contain an amber mutation at position 103 in the repressor, between the DNA binding domain and the DNA insert (Mariño-Ramírez and Hu, 2001).
We prepared yeast nuclei and extracted genomic DNA from S. cerevisiae BY4741 as described (Shimizu et al., 1991). The DNA was partially digested with CviTI (Megabase Research) to generate blunt ends. The DNA was cloned into the SmaI site of pLM99, pLM100 and pLM101 to generate three libraries in different reading frames to increase genome coverage. Inserts from 60 randomly chosen clones were examined by PCR amplification to establish the percentage of recombinants and average fragment size. Amplification reactions were done by PCR using Taq DNA polymerase (Promega) and two flanking primers: the cI primer (5′-AGGGATGTTCTCACCTAAGCT-3′) and T-phi primer (5′-CTCAGCGGTGGCAGCAGCCAA-3′).
Detailed procedures for selection and screening have been described (Mariño-Ramírez and Hu, 2001). Briefly, selection of immune clones was done by plating ~10^7 JH787 cells containing amplified fusion libraries on LB-ampicillin-kanamycin plates seeded with 10^8 pfu/plate λKH54 and λKH54h80. The amber suppressor in JH787 allows the expression of full-length fusions. M13 transducing stocks from the surviving colonies were prepared and used to individually transduce the repressor fusions to suppressor (supF; LM58) and non-suppressor (sup0; LM59) strains. In clones where the active repressor phenotype is dependent on self-assembly of the insert-encoded domain, phage immunity is dependent on suppression of the amber mutation. Clones identified as insert-dependent by differential repression of PL-cat in supF and sup0 strains were picked for further study; any clone where immunity was not insert-dependent was discarded.
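The insert-dependence screen reduces to a two-condition comparison, which can be written out as the decision logic below. The sketch assumes, as stated for the PL-cat reporter, that an active repressor turns off chloramphenicol resistance, so a chloramphenicol-sensitive phenotype is read as repression. The function name, argument names and category labels are hypothetical and are not taken from the published protocol.

```python
def classify_clone(cm_resistant_supF, cm_resistant_sup0):
    """Classify a clone from its chloramphenicol phenotype in the two strains.

    cm_resistant_supF: True if the clone is Cm-resistant in the suppressor
        strain (LM58), where the full-length fusion is expressed.
    cm_resistant_sup0: True if the clone is Cm-resistant in the non-suppressor
        strain (LM59), where translation stops at the amber codon.
    """
    repressed_supF = not cm_resistant_supF
    repressed_sup0 = not cm_resistant_sup0

    if repressed_supF and not repressed_sup0:
        # Repression requires read-through into the insert: keep this clone.
        return "insert-dependent"
    if repressed_supF and repressed_sup0:
        # Repression occurs even without the insert: discard.
        return "insert-independent"
    # No repression even when the full-length fusion is expressed.
    return "no repression with suppressor"


# Expected pattern for a clone worth sequencing: Cm-sensitive in supF,
# Cm-resistant in sup0.
result = classify_clone(cm_resistant_supF=False, cm_resistant_sup0=True)
```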
Plasmid DNA was extracted from the positive clones and the inserts were identified by automated dye-terminator DNA sequencing from the cI and T-phi primers. DNA sequencing reactions were done using the ABI Big Dye terminator kit (Applied Biosystems) and sequences were obtained at the Gene Technologies Laboratory in the Department of Biology at Texas A&M University. The sequences were identified by BLAST (Altschul et al., 1997) searches to the yeast protein database (NCBI) to identify the open reading frame (ORF) containing a homotypic interaction. The sequences of the interacting sequence tags (ISTs) encoding self-interaction domains were inferred from the reference yeast genome sequence.
The authors thank Mike Cherry for making the homotypic interaction data available in the Saccharomyces Genome Database (SGD). We thank Svenja Simon-Marshall and W. Brian Hatten for invaluable help with library screening. We thank Peter Uetz and Stan Fields for sharing data prior to publication. We would also like to thank the members of the Hu laboratory, Debby Siegele, Wei Wang, Tom Kodadek and Stan Fields, for critical comments on the manuscript. This work was supported by funding from the National Science Foundation (MCB-9808474), the Robert A. Welch Foundation (A-1354) and the Advanced Research Program of the Texas Higher Education Coordinating Board (Award 999902-116). L.M. was supported by a fellowship from Fulbright/Colciencias/IIE.