COBARDE was originally tested on TEM-1 β-lactamase with interesting results [14]. There were clear indications that this enzyme was able to tolerate even long internal deletions [15], and this was confirmed by the systematic introduction of deletions. GFP is, however, completely different because no active internal deletions have yet been reported. We thought an excellent test bed for COBARDE was to attempt to shorten this already rather rigid and structurally compromised protein.
We selected the region located between residues 129–142 as the target for mutagenesis for three reasons: 1) it is the longest loop of the protein; 2) two previous attempts at deletions in this area failed to produce fluorescent proteins [10, 11]; 3) published sequence alignments of GFP versus GFP-like proteins of anthozoans suggest that GFP may tolerate deletion of either G138 [16] or H139 [17] (G139 and H140, respectively, in sgGFP).
Experimental work started with synthesis of the oligonucleotide library. One current limitation of Fmoc-based mutagenesis methods is depurination of benzoylated deoxyadenosines (dAbzs), which gives rise to a high rate of backbone cleavage (our own unpublished results). This depurination problem is magnified if the target sequence is dA-enriched at the 3' end, because synthesis proceeds in the 3' to 5' direction. The severity of the problem prevented a successful synthesis of the coding strand for the targeted region. Thus, we resorted to synthesizing a complementary sequence, further modified to reduce even more the content of dAbzs near the 3' end (indicated in bold face): 3' ctc gac ttt cca TAG CTG AAG TTC CTT CTG CCG TTG TAG GAC CCT GTG TTT GAC ctt atg ttg ata ttg 5'. This sequence contains 17 fewer dAbzs than the original sequence and was successfully assembled by COBARDE. The oligonucleotide was used as a PCR template with two partially complementary primers to generate a 148 bp double-stranded fragment that included the Mlu I and Acc I restriction sites as shown in Figure . The product was digested and ligated to the kanamycin-carrying cloning vector pT4GFPMlu (see M&M for preparation of this recipient plasmid). The ligation mixture was transformed into XL1-Blue cells to give a library of 2 × 10^6 variants. Analysis of colonies grown on plates for 24 h at 37°C revealed that more than 99% of the transformants were non-fluorescent to the naked eye, indicating that most of the deletions perturbed protein structure and/or function. Plasmid DNA from 40 randomly chosen fluorescent clones was obtained and sequenced, revealing that 14 of the samples corresponded to re-ligated vector due to incomplete Mlu I/Acc I digestion; 22 corresponded to wild-type sgGFP created with the wild-type oligonucleotide generated in the library; and only 4 of the clones were mutants that retained fluorescence.
These mutants corresponded to single amino acid deletions of isoleucine 129 (sgGFP-Δ I129) and aspartate 130 (sgGFP-Δ D130). Each mutant was found twice.
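As a consistency check on the oligonucleotide sequence quoted above, the short sketch below complements the 3'→5' strand to recover the coding strand and translates it with the standard genetic code. It assumes that the uppercase codons in the quoted sequence mark the 14 targeted positions (an inference from the text, not something stated explicitly); the function names are illustrative only.

```python
# Oligonucleotide exactly as quoted in the text, written 3' -> 5'
# (this is the strand complementary to the coding strand).
OLIGO_3_TO_5 = (
    "ctc gac ttt cca "
    "TAG CTG AAG TTC CTT CTG CCG TTG TAG GAC CCT GTG TTT GAC "
    "ctt atg ttg ata ttg"
)

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Case-preserving complement map; spaces are left untouched.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def coding_strand(oligo_3_to_5):
    """Complement base by base.

    A strand written 3'->5', complemented in the same written order,
    reads as the partner strand in the 5'->3' direction (no reversal needed).
    """
    return oligo_3_to_5.translate(COMPLEMENT)


def translate(dna_5_to_3):
    """Translate a 5'->3' coding sequence, ignoring spaces and case."""
    seq = dna_5_to_3.replace(" ", "").upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def target_codons(spaced_sequence):
    """Return the uppercase codons, assumed here to be the targeted region."""
    return [codon for codon in spaced_sequence.split() if codon.isupper()]


coding = coding_strand(OLIGO_3_TO_5)
target = target_codons(coding)
print("coding strand 5'->3':", coding)
print("target peptide      :", translate("".join(target)))
```

Under these assumptions the uppercase region translates to the 14 residues named throughout this section (I129 through L142 in sgGFP numbering), which supports reading the quoted sequence as the complement of the targeted loop.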
On the other hand, the DNA sequence analysis of 33 non-fluorescent colonies (Table ) gave an estimation of the quality of the library and provided insights into the kind of mutations that destroy fluorescence. From the data shown in Table we draw the following conclusions:
| Table 1. DNA sequence of non-fluorescent clones chosen randomly from the library generated with COBARDE |
1) Successful mutagenesis (with an average mutagenesis rate of 50%) was achieved on the target region. It is clear from Table that amino acid deletions were well spread and represented along the target, except for the first codon (encoding I129), which was mutated at a rate of only 2% because the Fmoc-Cl delivery line was not properly primed. However, this failure was corrected from the second codon on.
2) Six out of 33 clones (clones 28–33) contained either single nucleotide deletions or insertions that change the open reading frame of the genes. Although this ratio of undesired variants is apparently high (18%), it is within the error range found in conventional oligonucleotides, as has been observed during assembly of synthetic genes [18-20]. Single nucleotide deletions usually occur because of an incomplete capping step during each synthesis cycle. This chemical imperfection may be significantly reduced with the use of UNICAP [21], a potent capping reagent that recently became commercially available. However, the remaining 1.68 × 10^6 useful variants (82%) are enough to represent the complete set of 16384 (2^14) possible deletion variants. Considering an average mutagenesis rate of 0.5 per codon, each of the mutants should be represented with the same frequency, and we only need a library of 75492 clones to find the least represented variant with 99% confidence [22]. Further, since the wild-type clone was found several times in the fluorescence screening, it can be concluded that all mutants were well represented in the experimental library.
3) The library follows a roughly binomial distribution. Mutants carrying 6, 7, 8 and 9 deletions were the most frequent.
4) Most of the deletions in the explored loop destroy GFP fluorescence. This result agrees with those found by Li et al. [10] and Kitamura et al. [11]. Li removed the region comprising amino acids 132–139 of GFP by site-directed mutagenesis, whereas Kitamura randomly removed tri-peptide blocks in the region 125–142. Both studies found the deletions to cause a complete loss of fluorescence.
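The library-size and distribution arguments in conclusions 2 and 3 can be illustrated with a small calculation. The sketch below assumes a simple model: 14 independent codons, each deleted with probability 0.5, and clones sampled with replacement. The coverage formula used here (smallest N such that 1 − (1 − p)^N reaches the desired confidence) is a common textbook estimate and may not be the exact expression of reference 22; under it the required library size comes out at roughly 7.5 × 10^4 clones, the same order as the figure quoted above.

```python
import math

N_CODONS = 14    # codons 129-142 targeted for deletion
P_DELETE = 0.5   # average per-codon deletion rate reported in the text


def n_variants(n_codons):
    """Each codon is either present or deleted, giving 2**n combinations."""
    return 2 ** n_codons


def clones_needed(variant_prob, confidence):
    """Smallest N with 1 - (1 - p)**N >= confidence.

    Assumes independent sampling with replacement; this is an assumed
    model, not necessarily the formula of the cited reference.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-variant_prob))


def deletion_count_pmf(n_codons, p):
    """Binomial probability of observing k deleted codons, k = 0..n."""
    return [
        math.comb(n_codons, k) * p ** k * (1.0 - p) ** (n_codons - k)
        for k in range(n_codons + 1)
    ]


variants = n_variants(N_CODONS)
# With p = 0.5 every variant is equally likely, so p_variant = 1 / 2**14.
needed = clones_needed(1.0 / variants, 0.99)
pmf = deletion_count_pmf(N_CODONS, P_DELETE)

print("possible deletion variants:", variants)
print("clones needed for 99% confidence:", needed)
for k, prob in enumerate(pmf):
    print(f"{k:2d} deletions: {prob:.4f}")
```

In this idealized model the distribution is symmetric around 7 deletions, so clones with 5 and 9 deletions are expected equally often; the observed predominance of 6 to 9 deletions in a sample of 33 clones is therefore only roughly binomial, as stated.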
Our sample of 33 sequenced non-fluorescent mutants included only one single deletion mutant, sgGFP-Δ K141, yet two single deletions, sgGFP-Δ I129 and sgGFP-Δ D130, conserved fluorescence. To make sure that our fluorescence screening was able to pick up all active robust mutants, we decided to individually create the remaining 11 single deletion mutants and the double mutant that combines Δ I129 and Δ D130 by site-directed mutagenesis using the specific oligonucleotides shown in Figure .
Confirming the validity of the library screening, none of the E. coli cells expressing these mutants displayed a green-fluorescent phenotype on plates after incubation at 37°C for 24 h. Fluorescence scanning of cultures containing each of the fourteen single deletion mutants and the double mutant, grown for 12 h at 37°C, confirmed the results observed on plates. These experiments also ruled out the hypothesis that sgGFP-ΔG139 and sgGFP-ΔH140 may be functional, as suggested by the alignments of GFP versus GFP-like proteins.
Other alignments based on three-dimensional structures of GFP versus GFP-like proteins suggest that region 128–141 does not tolerate deletions and that GFP should tolerate deletion of Y143 [23, 24]. To test the reliability of these 3D alignments for protein engineering, we removed the equivalent residue (Y144) in sgGFP by site-directed mutagenesis, and the fluorescence was completely lost. The conclusion from these alignments is clear: no reliable prediction can be made when the sequence identity between the compared proteins is so low. The sequence identity between GFPs and GFP-like proteins is around 25%.
Additional characterization of whole cells containing the mutants sgGFP-Δ I129 and sgGFP-Δ D130 revealed that both proteins showed a blue shift of two nanometers in their emission maximum and that their fluorescence intensity was reduced to 21% and 17%, respectively, relative to wt sgGFP. This result did not correlate with the phenotype observed on plates, where the green color of the mutants was only slightly less intense than that of the wild-type protein. We then decided to measure the quantum yield of the mutant proteins, which turned out to be 31% and 21% smaller than that of the parent protein, respectively. Because the quantum yield decrement of the mutants did not fully account for the fluorescence loss, we turned our attention towards protein concentration in the cells, another factor that affects fluorescence intensity. The amount of soluble and non-soluble protein for each mutant was analyzed by western blotting as shown in Figure , using an anti-GFP antibody for detection. This experiment clearly revealed that the main reason for the reduction or loss of fluorescence of the mutants was their low concentration which, in turn, could also be due to low stability or incorrect folding [25]. Not surprisingly, sgGFP-Δ I129 and sgGFP-Δ D130 were the best-expressed mutants. To assess whether the proteins were inactivated by improper folding, we grew the mutants at 30°C. At this lower temperature, the fluorescence of sgGFP-Δ I129 increased from 21% to 46%, whereas that of sgGFP-Δ D130 increased from 17% to 116%, as compared to wt sgGFP. These results indicated that both deletion mutants are thermosensitive and, moreover, that at 30°C sgGFP-Δ D130 is more fluorescent than the wild-type protein. Lower temperatures frequently favor appropriate folding of mutants [13]. Western blotting of the mutants grown at 30°C, shown in Figure , confirmed that the protein concentration was increased.
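The reasoning that the quantum yield decrement "did not fully account for the fluorescence loss" can be made explicit with a back-of-the-envelope estimate. The sketch below assumes that whole-cell fluorescence scales as (amount of fluorescent protein) × (quantum yield) and that the extinction coefficient of the mutants is unchanged; both are simplifying assumptions introduced here for illustration, not measurements from this work.

```python
# Values at 37°C taken from the text, relative to wt sgGFP.
REL_FLUORESCENCE_37C = {"I129": 0.21, "D130": 0.17}  # whole-cell intensity
QY_DECREASE = {"I129": 0.31, "D130": 0.21}           # fractional drop in quantum yield


def implied_relative_protein(rel_fluorescence, qy_decrease):
    """Relative amount of fluorescent protein implied by the two measurements.

    Assumes fluorescence ~ concentration x quantum yield, with an unchanged
    extinction coefficient (an assumption, not a measured quantity).
    """
    return rel_fluorescence / (1.0 - qy_decrease)


implied = {
    mutant: implied_relative_protein(REL_FLUORESCENCE_37C[mutant], QY_DECREASE[mutant])
    for mutant in REL_FLUORESCENCE_37C
}

for mutant, fraction in implied.items():
    print(f"sgGFP-delta-{mutant}: implied fluorescent protein = {fraction:.2f} x wt")
```

Under these assumptions the quantum yield change alone would leave both mutants at roughly 70–80% of wild-type intensity, so most of the observed loss would have to come from a lower amount of correctly folded protein, in line with the western blot result.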
It is worth mentioning that plated colonies expressing the other single deletion mutants remained non-fluorescent at both 30°C and 22°C over 15 days of growth.
Temperature denaturation curves (see Figure ) for the active purified mutants sgGFP-Δ I129 and sgGFP-Δ D130 demonstrated that these proteins are less heat stable than the parent protein, but not enough to account for the significant reduction in protein at 37°C. Therefore, these amino acids are essential for good folding, especially D130.
In the case of some non-functional mutants such as sgGFP-Δ L138 and sgGFP-ΔI129/ΔD130, low protein concentration was not the only explanation for their loss of fluorescence. These two mutants gave rise to significant inclusion bodies, but a considerable amount of protein still remained in solution, which would be expected to give a signal if the proteins were fluorescent per se. We believe these mutants are correctly folded but maturation of the chromophore is impaired, in a manner similar to the colorless GFP isolated from Aequorea coerulescens (acGFP) or the enhanced mutant aceGFP-G222E [26]. More biophysical and biochemical assays are needed to elucidate which process(es) are affected: cyclization, oxidation or dehydration.
The most important conclusion resulting from the deletion studies reported here is the key role of residues 131–142 (130–141 in GFP) for appropriate folding of the protein. This result agrees with results reported by Baird et al. [13] working with permutations. They found that GFP can be opened in different locations only after residue N144, but they did not explain the absence of openings in the first half of the protein. Therefore, the region 131–142 seems to be acting as a bridge that joins two independently folded parts of the protein.
To further explore how essential the sequence of the sub-region 131–142 is, we decided to subject each of the twelve positions to single site-saturation mutagenesis (see M&M for details) using the degenerate oligonucleotides shown in Figure . To our surprise, most of the variants (55%) found in the libraries of substitutions displayed a green-fluorescent phenotype on plates after 24 h of growth at 37°C, showing that substitutions are tolerated where deletions are not. DNA sequence data from several randomly chosen fluorescent and non-fluorescent colonies (as they appeared in the plate assay) are summarized in Table . The data show that G135 is the least tolerant amino acid, with allowed replacements of this residue only producing pale green-fluorescent colonies (due to diminished soluble protein in the cells; data not shown). This buried amino acid forms part of a short α-helix located at the center of the loop. Apparently, the major function of this α-helix is to position I137 towards the heart of the barrel in order to fix part of the loop.
| Table 2. Analysis of fluorescent and non-fluorescent mutants carrying single amino acid substitutions |
Positions 131, 137, 138 and 142 only tolerated conservative replacements with hydrophobic residues. Because F131, L138 and L142 are also buried in the core of the protein, these amino acids are likely to be important for fixation of the loop. The non-fluorescent mutant F130A also revealed that the size of the hydrophobic side-chain is important for appropriate packing of the protein. On the other hand, residues H140 and K141 were replaced only with hydrophilic amino acids, suggesting ionic or H-bond interactions with neighboring amino acids. Finally, residues K132, E133, D134, N136 and G139, whose side-chains are exposed to the solvent, tolerated any amino acid substitution.
Our results with substitutions confirmed the scope of the scanning mutagenesis approach to identify buried and exposed amino acids in proteins of unknown structure, as proposed by Bajaj et al. [27]. For instance, substitution of buried amino acids with charged residues, as in mutants F131D, F131K, I137D, I137E, I137R and L142D, destroyed fluorescence, apparently because of protein instability. However, it is important to note that some mutants (labeled with an asterisk in Table ), initially non-fluorescent in the plate assay, turned pale green-fluorescent after 3–5 additional days of growth at room temperature, suggesting slow maturation of the chromophore as in the case of acGFP [26].