Reconstructing biological systems in vitro is one of the many interesting challenges in synthetic biology (16). Compared with an in vivo host, an in vitro system sometimes offers unique advantages. For instance, it is ideal for toxic genes such as restriction endonucleases, for which approaches using living hosts have proven difficult (2). Minimal in vitro systems are in general more configurable and their reaction intermediates are more accessible, making them more amenable to engineering. For the genetic selection of DNA or RNA, without the barrier of a cell membrane and the limitations of transformation, an in vitro system is capable of exploring even larger libraries and functionalities.
In this article, by using in vitro compartmentalization (IVC) to generate myriad aqueous droplets in oil as artificial cells, we are able to selectively amplify restriction endonuclease (RE) genes from bacterial genomes. The so-called ‘artificial cells’ themselves do not undergo Darwinian selection, but simply provide a means to link genotype and phenotype. The selection itself requires a method to distinguish those genotypes that have been changed from those that have not. With over 100-fold enrichment in each round of selection, three rounds are typically needed to enrich a specific gene to ‘homogeneity’ from a bacterial genome. In the genomic selection of PstI, we observe that most of the contaminating genes have in fact been removed from the library after the second round of selection. We used inverse PCR and DNA sequencing to determine the ends of the selected genomic fragments containing the PstI gene after the second round of selection, and found that only one variant was present in the library at that point, which was then amplified in the third round (data not shown). This contradicted our intuition that many different templates encompassing the PstI gene would be selected and that the final result would be a DNA smear on the gel. It can be explained from several perspectives. First, there seems to be strong selection pressure on the translation efficiency of the DNA templates, which gives a selective advantage to templates with ribosome-binding sites very close to the start codon of the target gene. In the genomic selection of the PstI gene, one end of the selected DNA fragment is just 3 nt upstream of the start codon. On the other hand, the ‘substrate’ ends of the templates have little influence on translation efficiency and are presumably less strictly selected. This is supported by the TspMI selection, in which the selected genomic fragments end at variable points after the stop codon, as far as ~300 nt apart. However, we have no explanation for the lack of variability observed in the PstI genomic selection. Second, it may result from the non-randomness of the shearing process using the nebulizer, in which strand breakage is heavily influenced by the local AT content of the genomic DNA (YZ and Chudi Guan, to be published).
Doi et al. have previously applied IVC to the selection of RE genes (4). Their method uses a DNA polymerase to incorporate dUTP-biotin into the sticky ends generated by the restriction endonuclease, permitting affinity-based purification of the genes. Using this method, they obtained a selection efficiency of only ~10-fold in a single round. This is likely because any DNA fragments resulting from non-specific cleavage in the compartments could also have become labeled and hence selected. Owing to this relatively low efficiency, more iterations are required to recover the desired genotype, which severely limits the potential applications. For instance, six rounds of selection were needed to select an active FokI gene from a FokI library randomized at three codon positions (expected library complexity of ~8000). In contrast, our method exploits the full potential of the sequence specificity available at the sticky end by requiring ligation of the adaptor. Non-specific cleavage products or damaged ends do not result in amplification. This results in much higher enrichment during each round of selection, and experimentally we find that greater than 100-fold enrichment can be obtained. This has allowed us to select genes from genomic libraries and would also permit a much greater sampling of sequence space during the selection of mutants.
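The gap between ~10-fold and >100-fold enrichment per round compounds quickly over successive rounds. The following is a minimal sketch in Python, assuming an idealized model that is not taken from the article: each round multiplies the odds of the desired gene relative to all other library members by a constant factor. The function name `rounds_to_purity`, the 99% purity target and the starting fraction of 1 in 8000 are illustrative assumptions.

```python
def rounds_to_purity(initial_fraction, fold_enrichment, target_fraction=0.99):
    """Count selection rounds until the desired gene reaches target_fraction.

    Idealized assumption (not from the article): every round multiplies the
    odds (desired : undesired) by the same fold_enrichment.
    """
    if not 0.0 < initial_fraction < 1.0:
        raise ValueError("initial_fraction must be between 0 and 1")
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    if fold_enrichment <= 1.0:
        raise ValueError("fold_enrichment must exceed 1")

    # Work with odds rather than fractions so that enrichment is a
    # simple multiplication and the fraction can never exceed 1.
    odds = initial_fraction / (1.0 - initial_fraction)
    target_odds = target_fraction / (1.0 - target_fraction)

    rounds = 0
    while odds < target_odds:
        odds *= fold_enrichment
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Hypothetical library in which 1 member in 8000 is the desired gene.
    for fold in (10, 100):
        n = rounds_to_purity(1 / 8000, fold)
        print(f"{fold}-fold per round: {n} rounds to reach 99%")
```

With these illustrative inputs the sketch returns 6 rounds at 10-fold and 3 rounds at 100-fold. The exact counts depend on the chosen purity target, and real enrichment factors need not be constant from round to round.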
Conventional methylase selection often requires a purified endonuclease, and the selection process targets the DNA methylase. By comparison, the in vitro method directly targets the RE genes in the library and merely requires a specified recognition site and a set of DNA adaptors. Thus, it should provide a possible route for searching environmental DNA samples, which in principle contain much greater genetic diversity from many co-existing microbial species, for RE genes with desired specificities.
There are possible limitations to the current approach. Because ligation is the key step in the selection, it may be less effective for selecting endonucleases that generate shorter overhangs or blunt cuts. This may be alleviated by placing the recognition site of a nicking enzyme close to the blunt cut site and using the nicking enzyme to convert the blunt end into a suitable sticky end (17). Frequent cutters, such as those recognizing 4-base sites, sometimes fall outside the application range since they tend to destroy their own genes. These enzymes are well tolerated in living bacteria since there is always a companion DNA methyltransferase to protect the host. Nevertheless, it appears that the selective disadvantage of having self-destructing sites has driven a significant proportion of frequent cutters to lose the recognition sites within their genes. The table below lists the statistics of those RE genes having their own recognition sites within their genes. For example, for a gene of 1 kb in size, the probability that it does not have a particular 4-base site is ~0.0004 [i.e. $(1 - 1/128)^{1000}$]. This sharply contrasts with the observation that over half of the 4-base-recognizing RE genes do not have their own sites in their coding sequences.
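The ~0.0004 estimate above can be checked numerically. The sketch below assumes, as the bracketed expression in the text does, that each position in the gene independently matches the site with probability 1/128. The function names and the default parameter are illustrative, not part of the article.

```python
def prob_site_absent(gene_length, per_position_prob=1 / 128):
    """Probability that a random sequence carries no copy of a given site.

    Assumes independent positions, each matching the site with
    per_position_prob (1/128 is the value used in the text).
    """
    if gene_length < 0:
        raise ValueError("gene_length must be non-negative")
    if not 0.0 <= per_position_prob <= 1.0:
        raise ValueError("per_position_prob must be between 0 and 1")
    return (1.0 - per_position_prob) ** gene_length


def expected_site_count(gene_length, per_position_prob=1 / 128):
    """Expected number of site occurrences under the same assumptions."""
    return gene_length * per_position_prob


if __name__ == "__main__":
    length = 1000  # a gene of about 1 kb, as in the text
    print(f"P(no site in {length} nt) = {prob_site_absent(length):.5f}")
    print(f"Expected sites in {length} nt = {expected_site_count(length):.2f}")
```

Under this simple model a 1 kb gene is expected to carry several copies of any given 4-base site, which is why a gene lacking its own site is so unlikely to arise by chance.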
Statistics of restriction endonuclease genes having their own recognition sites within their coding sequences*
The in vitro method described here has the potential to be extended to other applications, the most relevant being the directed evolution of genes encoding enzymes (4). This requires both extreme specificity and sensitivity in the selection method to allow an efficient search of the vast sequence space. In fact, even with the ability to select from a library of $10^{10}$ members, one can vary at most 6–7 codons with saturation. Nonetheless, numerous directed evolution experiments suggest that sometimes only a few amino acid substitutions can bring about considerable changes in biochemical properties. For highly diverse libraries, theoretical derivations under simplified assumptions in this article support a reasonable strategy of using relatively large amounts of template in the early rounds of selection, sacrificing specificity but presumably retaining high sensitivity, and decreasing amounts of the library in later rounds to improve specificity.
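The 6–7 codon limit follows from how fast the number of variants grows with each randomized position. The sketch below counts how many positions can be fully enumerated within a library of a given size. The article does not state which randomization scheme it assumes, so the three alphabets used here (20 amino acids, 32 NNK codons, 64 codons) and the function name `max_saturated_positions` are illustrative assumptions.

```python
def max_saturated_positions(library_size, variants_per_position):
    """Largest n such that variants_per_position ** n <= library_size.

    Uses integer arithmetic so the comparison is exact.
    """
    if library_size < 1:
        raise ValueError("library_size must be at least 1")
    if variants_per_position < 2:
        raise ValueError("variants_per_position must be at least 2")

    positions = 0
    total_variants = 1
    while total_variants * variants_per_position <= library_size:
        total_variants *= variants_per_position
        positions += 1
    return positions


if __name__ == "__main__":
    library_size = 10**10
    # Assumed alphabets; the article does not specify one.
    schemes = {"20 amino acids": 20, "32 NNK codons": 32, "64 codons": 64}
    for name, size in schemes.items():
        n = max_saturated_positions(library_size, size)
        print(f"{name}: up to {n} fully randomized positions")
```

This counts only the number of distinct variants. Sampling every variant at least once in practice would require a library several times larger than that number, so these figures are upper bounds.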
The method we describe in this article depends on our ability to select the desired genotype based on a specific alteration caused by its translated phenotype. The reconstituted in vitro transcription/translation system plays a crucial role: it is free of many unwanted constraints that are often lethal when using living hosts, and it offers considerable modularity for potential engineering. A similar approach should allow the isolation of mutants with specifically desired properties, such as altered cleavage positions that might result in novel sticky ends, improved thermal stability obtained by selecting at a desired temperature, and numerous other properties that might increase the practical utility of these enzymes. In all of these cases, the selections can be done on bulk preparations thanks to the DNA-modifying nature of these enzymes. For a wide variety of other useful enzymes, it is desirable to be able to interrogate individual droplets in a high-throughput way, either by using novel emulsion formulations (18) or by flow cytometry or microfluidics (8), both of which have shown promise.