Chem Biol. Author manuscript; available in PMC Mar 27, 2010.
Published in final edited form as:
PMCID: PMC2728443
NIHMSID: NIHMS106594
Putting Synthesis into Biology – A Viral View of Genetic Engineering Through de novo Gene and Genome synthesis
Steffen Mueller, J. Robert Coleman, and Eckard Wimmer
Department of Molecular Genetics and Microbiology, Stony Brook University, Stony Brook, New York, United States of America
Department of Molecular Genetics and Microbiology, Life Sciences Building, Stony Brook University, Stony Brook, NY 11794-5222, USA ; smueller/at/ms.cc.sunysb.edu
The rapid improvements in DNA synthesis technology hold the potential to revolutionize biosciences in the near future. Traditional genetic engineering methods are template dependent and make extensive but laborious use of site-directed mutagenesis to explore the impact of small variations on an existing sequence “theme”. De novo gene and genome synthesis frees the investigator from the restrictions of the pre-existing template and allows for the rational design of any conceivable new sequence theme.
Viruses, being amongst the simplest replicating entities, have been at the forefront of the advancing biosciences since the dawn of molecular biology. Viral genomes, especially those of RNA viruses, are relatively short, often less than 10,000 bases long, making them amenable to whole genome synthesis with the currently available technology. For this reason viruses are once again poised to lead the way in the budding field of synthetic biology – for better or worse.
The chemical synthesis of nucleotide chains took its first infant steps soon after the discovery of the DNA double helix. The race to elucidate the genetic code was driven by the use of triplet sequences of ribonucleotides synthesized by liquid-phase chemistry. Depending on their sequence, these triplets selectively interacted with aminoacylated tRNA (the codon:anticodon recognition) (Nirenberg and Leder, 1964; Soll et al., 1965), which led to the assignment of codons to their respective amino acids, and to much-deserved Nobel Prizes for these heroic efforts in the earliest days of synthetic biology. Khorana’s group “raced” to synthesize the first DNA copy of the 75 base pair long tRNA^Ala in 1970 (Agarwal et al., 1970), a monumental task requiring 20 man-years of labor, only to outdo itself in 1979 with a 207 bp DNA cassette containing the tyrosine suppressor tRNA gene (Khorana, 1979).
The innovations of synthesizing DNA oligonucleotides (“oligos”) on solid supports (Letsinger and Mahadevan, 1965) combined with new activated phosphoramidite nucleosides (Caruthers et al., 1987) led to steady improvements in the availability of quality oligos up to 100 bases long. This resulted in a boost in gene synthesis activity throughout the 1990s that continues unabated today. Some of the most notable synthesis achievements are summarized in Figure 1 (Agarwal et al., 1970; Becker et al., 2008; Blight, Kolykhalov, and Rice, 2000; Cello, Paul, and Wimmer, 2002; Chan, Kosuri, and Endy, 2005; Edge et al., 1981; Ferretti et al., 1986; Gibson et al., 2008; Gupta et al., 1968; Kalman et al., 1990; Khorana, 1979; Kodumal et al., 2004; Nirenberg and Leder, 1964; Pan et al., 1999; Soll et al., 1965; Stemmer et al., 1995; Tian et al., 2004). Significant landmarks include the synthesis of an entire 2.7 kb plasmid sequence (Stemmer et al., 1995), the 4.9 kb MSP-1 gene of Plasmodium (Pan et al., 1999), the 7.5 kb poliovirus genome as the first synthetic self-replicating organism (Cello, Paul, and Wimmer, 2002), and the 32 kb polyketide synthase gene cluster (Kodumal et al., 2004). The trend has culminated in the recent synthesis of 582,970 base pairs corresponding to the first artificial bacterial genome by the group of Craig Venter (Gibson et al., 2008). Starting with 101 prefabricated segments of 5–7 kb in length (purchased from commercial vendors), Gibson et al. used state-of-the-art methods and brute force to assemble larger and larger DNA pieces, at first by recombination in bacteria, and finally in yeast (Gibson et al., 2008). Alas, the synthetic genome was not, or could not be, “booted” to life by transplanting it into an “empty” chassis, as the group had shown previously with a natural genome (Lartigue et al., 2007). Therefore, the first synthetic autonomous life form is still just below the horizon.
Figure 1
Pushing the limits - a historical progression of notable achievements in gene synthesis with references. Each point represents a report of an individual gene synthesis accomplishment with respect to the length of the synthetic sequence and the year it (more ...)
It is not yet possible to synthesize entire genes as long continuous strands of DNA from scratch. Rather, all synthetic genes are assembled from short custom-made single-stranded DNA oligonucleotides or “oligos”, which are literally strings of a few nucleotides. Oligos are by and large still synthesized the same way as they were 15 or 20 years ago. Through incremental improvements in instrumentation and higher throughput, oligos have become a cheap commodity for use in standard recombinant DNA technologies. But, more than anything else, great demand and even greater competition among manufacturers have driven oligo prices down by about 10-fold over the past 15 years (Figure 2). In comparison, the prices of finished, sequence-confirmed gene synthesis by commercial gene foundries have plummeted 50-fold in only 10 years (Figure 2). As a reference point, at the outset of the poliovirus synthesis project in 1999, commercial gene synthesis was simply unheard of. As recently as 2000, after much searching, we found a vendor who agreed to synthesize parts of the genome by special arrangement at a price of $12/bp (Cello, Paul, and Wimmer, 2002).
Figure 2
Price development of oligonucleotide synthesis and de novo gene synthesis. Shown are the approximate end user prices per base for oligonucleotides (desalted, non-purified) or per base pair for synthetic genes (below 3kb, sequence guaranteed). The data (more ...)
In an ideal world, an efficient and economical de novo gene synthesis platform would combine cheap error-free oligo synthesis with accurate assembly methods. Neither is currently available. There are two dramatically different methods of synthesizing oligos. In the traditional, time-proven method of solid-phase oligo synthesis, each oligo is synthesized individually, on a separate small column or in a well of a multiwell plate. The method is high yielding but costly ($0.10–0.20 per nucleotide synthesis cost), which is a critical aspect if the oligos are needed for the assembly of long DNA sequences. The price given above translates into an oligonucleotide cost of approximately $200–400 for a 1 kb DNA sequence, and that is for the raw material only.
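The $200–400 estimate can be reproduced with a short calculation. The sketch below assumes that the oligos cover both strands of the target in full (about 2,000 nucleotides for a 1 kb duplex); the text does not state this, but it is the assumption under which the quoted per-nucleotide price yields the quoted total. The function name and parameters are illustrative only.

```python
def oligo_cost_range(length_bp, price_low=0.10, price_high=0.20, strands=2):
    """Return the (low, high) raw oligo cost in dollars for a duplex target.

    Assumes every base of every strand must be bought as oligo
    (strands=2 means both strands are fully covered).
    """
    nucleotides = length_bp * strands
    return nucleotides * price_low, nucleotides * price_high


low, high = oligo_cost_range(1000)
print(f"Raw oligo cost for 1 kb: ${low:.0f}-{high:.0f}")
```

Designs with gaps between oligos on one strand would need fewer nucleotides, so the figure is best read as an upper estimate of the raw material cost.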
The development of optical deprotection chemistries heralded a new era of parallel synthesis methods on micro biochips (Fodor et al., 1991) that can be used for either oligo or peptide synthesis. Depending on the chip platform being used, several thousand to hundreds of thousands of distinct oligonucleotides can theoretically be synthesized on a single chip.
In an ingenious extension, Tian and colleagues (Tian et al., 2004) mated the light-induced deprotection chemistry with microfluidic technology that allows the programmable synthesis of thousands of individual oligonucleotides on a tiny chip (Figure 3A). At the heart of this method is the Digital Light Processing (DLP) technology that was developed for digital projectors and high-definition projection TV sets. On a microfluidic chip containing a labyrinth of thousands of connected tiny reaction chambers (Figure 3C), each chamber is computer-addressable by a light beam generated on a digital micromirror device (Singh-Gasson et al., 1999) (akin to the individual color light spots making up the projection-TV picture). A DNA synthesis mixture containing the first nucleotide (A, for instance) is pumped through the system. Here, A only “sticks” to the chambers that call for an A at the specific position in their sequence, which are the ones being illuminated at that time (Figure 3A). Although all chambers receive the same synthesis mixture at any given time, no reaction occurs in the chambers that are “left in the dark” (in the example above, the ones that need a C, G, or T at their corresponding position). After the first reaction, the A-mix is washed out, the next reaction mix containing the next nucleotide is pumped in, and the process is repeated, four times in total. After all four nucleotide reaction mixes have gone through the chip, the oligonucleotide chain in each chamber has grown by at least one nucleotide of the desired sequence.
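The cycle logic described above can be illustrated with a small simulation. This is a minimal sketch, not a model of the actual chemistry or instrument: each "chamber" is a string, a fixed flow order of A, C, G, T is assumed, and "illumination" is reduced to checking whether the next base a chamber needs matches the nucleotide currently being pumped through.

```python
def synthesize_on_chip(targets, flow_order="ACGT"):
    """Simulate light-directed parallel synthesis.

    targets: desired oligo sequences, one per reaction chamber.
    Returns (grown sequences, number of complete four-mix rounds).
    """
    for t in targets:
        if set(t) - set(flow_order):
            raise ValueError(f"unsupported base in {t!r}")
    grown = ["" for _ in targets]
    rounds = 0
    while any(len(g) < len(t) for g, t in zip(grown, targets)):
        rounds += 1
        for base in flow_order:  # every chamber sees the same mix
            for i, target in enumerate(targets):
                pos = len(grown[i])
                # Only "illuminated" chambers react: those whose next
                # required base is the one in the current mix.
                if pos < len(target) and target[pos] == base:
                    grown[i] += base
    return grown, rounds


chip_targets = ["ACGT", "TTTT", "GA"]
chip_result, chip_rounds = synthesize_on_chip(chip_targets)
print(chip_result, chip_rounds)
```

In this toy model a chamber can gain more than one base per round when its sequence happens to follow the flow order (as with "ACGT"), which matches the statement that each chain grows by at least one nucleotide per round.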
Figure 3
Microfluidic chip technology coupled with light-activated chemistries holds great promise for the massively parallel synthesis of oligonucleotides. (A) On an array of tiny flippable mirrors, each mirror can be separately computer-controlled (flipped to an (more ...)
At the end of the reaction the oligonucleotides are eluted from the chambers as a single pool. Each of the oligo sequences is only present in minute quantities. This may present a challenge in further increasing the throughput by increasing the number of reaction chambers per chip, while decreasing chip size. Tian et al. demonstrated the potential power of this technology for the synthesis of large numbers of oligonucleotides to be used in synthetic gene assembly (Tian et al., 2004).
Companies already offer parallel on-chip-synthesized custom oligo mixtures that are amenable to gene synthesis (LC Sciences, Houston, Texas). Currently the price of a pool of 3,912 90-mers is approximately $1000. This technology is still very much in the exploratory stage. One inherent difficulty of the method is that all oligos are released from the chip as a mixture. The low yields of oligos that come off the chip (10^7–10^8 molecules per sequence) are insufficient to drive a gene assembly reaction, which mandates a post-synthesis PCR amplification step before the oligos can be used. For this purpose each oligo is synthesized with two flanking generic adaptor sequences, which allow amplification of all oligos in parallel in a single PCR reaction using the corresponding adaptor primer pair (Figure 4) (Tian et al., 2004). Using distinct sets of adaptors on distinct subsets of oligos in the same chip-synthesis reaction allows the subsequent selective amplification of a desired subset of oligos, for instance the set necessary for the assembly of one particular gene. A different set of oligos can then be amplified from the same chip-eluted oligo mix in a separate reaction. Fractionating the entire oligo pool into gene-specific subsets in this way reduces the complexity of the mixture, increases the concentration of each specific oligo, and reduces potential interference or cross-hybridization from other oligos in the pool. This will be especially useful as the number of individual sequences synthesized on the chip increases. The higher the number of discrete oligo sequences synthesized per chip, the lower the absolute yield per oligonucleotide (sub-fmol range), because the total yield of DNA is a direct function of the total reaction surface on the chip. With more distinct oligos, the potential for unwanted cross-hybridization during the gene assembly step also increases.
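The idea of pulling a gene-specific subset out of a mixed pool can be sketched in a few lines. The adaptor sequences and payloads below are hypothetical placeholders, PCR is reduced to a string match on the flanking adaptors, and the adaptors are simply trimmed off afterwards; how they are removed in practice is not covered here.

```python
def amplify_subset(pool, fwd_adaptor, rev_adaptor):
    """Return the inner sequences of pool oligos flanked by the given adaptors.

    Both adaptors are given as they appear on the synthesized strand.
    Oligos carrying a different adaptor pair are left behind.
    """
    selected = []
    for oligo in pool:
        if oligo.startswith(fwd_adaptor) and oligo.endswith(rev_adaptor):
            selected.append(oligo[len(fwd_adaptor):len(oligo) - len(rev_adaptor)])
    return selected


# Hypothetical adaptor pairs for two genes sharing one chip-eluted pool
A_FWD, A_REV = "AAGG", "CCTT"
B_FWD, B_REV = "TTCC", "GGAA"
pool = [
    A_FWD + "ATGCATGC" + A_REV,
    B_FWD + "GGGTTTAA" + B_REV,
    A_FWD + "CCCGGGTT" + A_REV,
]
gene_a = amplify_subset(pool, A_FWD, A_REV)
gene_b = amplify_subset(pool, B_FWD, B_REV)
print(gene_a, gene_b)
```

Each call plays the role of one PCR reaction with one adaptor primer pair, so the same pool can be interrogated repeatedly for different genes.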
Figure 4
Assembly of gene sequences from chip-synthesized oligonucleotides. The pool of overlapping oligos in minute amounts is released from the microchip, followed by PCR amplification with universal adapter primers. Double strand copies produced in this way (more ...)
The second drawback of chip-based oligo synthesis is that the PCR-amplified oligos are now in double-stranded form. The presence of a perfectly matched antisense strand may reduce the efficiency of the subsequent assembly of these oligos into larger genes. The assembly reaction depends on the complementarity of the overlapping “construction” oligos, those designed to build the gene, and the antisense oligos are likely to compete more effectively for the same hybridization partner. To overcome this problem, the desired single-stranded construction oligos can be selectively enriched by specific hybridization to antisense selection oligos affixed to a column, followed by elution (Tian et al., 2004). When done under sufficiently stringent conditions, this procedure also eliminates a significant fraction of error-containing oligos, as they produce mismatches with the selection oligo and consequently elute from the column at a lower temperature. On the downside, this method requires twice as many selection oligos as construction oligos. In other words, to produce one chip’s worth of oligos one needs two additional chips’ worth of selection oligos, tripling the cost of synthesis (Tian et al., 2004). This brings the current “rock-bottom” cost of the final construction oligos before gene assembly to about $0.03/bp.
While these new multiplex synthesis systems are technically feasible, it is our understanding that the major suppliers of large synthetic DNA continue, for now, to assemble genes from individually synthesized overlapping oligonucleotides by traditional methods.
The sheer number of different oligonucleotides synthesized on a chip mandates the use of new software programs to handle the complexity of possible interactions of the various oligo sequences in the mix (Czar et al., 2009). Several software programs are freely available to design optimal sets of assembly oligonucleotides. The basic tasks that successful software needs to perform are:
  • Breaking down the target sequences to be synthesized into suitable overlapping oligos.
  • Designing hybridization units, the overlapping portion between two oligos, with the same melting temperature.
  • Ensuring hybridization specificity of each oligo pair to eliminate potential cross-hybridization by choosing the best possible breaking points between oligos for a particular gene, and by altering synonymous codons.
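The first two tasks can be sketched as follows. This is a simplified illustration, not one of the published design programs: it uses the rough Wallace rule (2 °C per A/T, 4 °C per G/C) as a stand-in for a proper melting temperature model, grows each overlap until it reaches a target temperature, and alternates strands. The function names, the 40-base oligo length, the 50 °C target, and the example sequence are all assumptions. Specificity checks and synonymous codon changes (the third task) are left out.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def wallace_tm(seq):
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def design_oligos(target, oligo_len=40, target_tm=50):
    """Tile `target` with overlapping oligos on alternating strands.

    Returns a list of (start, end, oligo_sequence) tuples, where start/end
    index the top strand and odd-numbered oligos are reverse-complemented.
    """
    if oligo_len < 4:
        raise ValueError("oligo_len is too short for overlapping tiles")
    oligos = []
    start = 0
    while True:
        end = min(start + oligo_len, len(target))
        seq = target[start:end]
        if len(oligos) % 2 == 1:
            seq = reverse_complement(seq)  # bottom-strand oligo
        oligos.append((start, end, seq))
        if end == len(target):
            break
        # Grow the overlap leftwards from `end` until it reaches target_tm.
        overlap = 1
        while (wallace_tm(target[end - overlap:end]) < target_tm
               and overlap < oligo_len - 1):
            overlap += 1
        start = end - overlap
    return oligos


target = (
    "ATGGCGTACC" "TGAAAGGCTT" "CGATCGTACG" "CTAGGATCCG"
    "ATTACGCTAG" "GCTAACGTTA" "GCGATCGGCT" "AACGGATCGT"
)
oligos = design_oligos(target)
for s, e, seq in oligos:
    print(s, e, seq)
```

Because overlaps are extended base by base, GC-rich overlaps come out shorter than AT-rich ones, which is how the hybridization units end up with similar melting temperatures despite differing lengths.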
There are two basic methods available for assembling long DNA sequences, such as virus genomes, from short overlapping synthetic oligonucleotides: direct assembly PCR, and ligase chain reaction (LCR) followed by fusion PCR with flanking primers.
Assembly PCR
Assembly PCR is based on the principle of stepwise elongation of the amplicon, a piece of DNA formed in an amplification event, by one oligonucleotide at each end of the growing amplicon with each PCR cycle (Stemmer et al., 1995), and on the ability of intermediate products to act as overlapping megaprimers to assemble even larger amplicons (Figure 4). Theoretically, the reaction continues until the two outermost oligos are incorporated to give the full-length product. The full-length product is subsequently amplified with an excess of the two flanking PCR primers. Practically, obtaining large DNA fragments in a single assembly reaction is exceedingly difficult. For this reason, and for error-management purposes, it is generally necessary to first synthesize, clone, and verify the sequence of several intermediate-size sub-fragments (500–1000 bp). These can then be linked by fusion PCR or by standard cloning methods to form larger genes.
Ligase chain reaction (LCR) followed by fusion PCR with flanking primers
The ligase chain reaction (LCR) is similar in that it uses overlapping oligos. But unlike with PCR assembly, oligos for LCR have to be designed to anneal without gaps between them, head to tail, forming annealed stretches of DNA which are then ligated using a thermostable DNA ligase (Barany, 1991). In contrast to PCR assembly, where a single oligo is added at each end of a synthon in each cycle, during LCR several overlapping oligos can be ligated to one another. Owing to the thermostability of the ligase, LCR can be cycled like a PCR reaction, leading to the assembly of longer and longer chains, but no net amplification. The desired product is finally amplified by PCR using gene-flanking primers.
Regardless of the many variations on the theme of how to assemble a large synthetic DNA, at the core of all current methods are chemically synthesized oligonucleotides. The downward price trend for oligos has slowed significantly over the past 5 years and appears to be bottoming out (currently in the $0.10–0.20/base range). As the price gap, and therefore the profit margin, between finished synthetic genes and their oligo building blocks is narrowing, it can be expected that oligo-based gene synthesis prices will soon follow. For long DNA synthesis to become economical, radically new technologies need to be developed that either reduce the errors in run-of-the-mill oligos by orders of magnitude, or allow de novo gene synthesis independent of the error-prone oligonucleotide chemistry, perhaps by developing enzyme-based synthesis of long accurate polynucleotides. Barring such a breakthrough, the routine synthesis of bacterial or larger genomes will likely remain prohibitively expensive for some time to come. As a case in point, the recent synthesis of the Mycoplasma genome (Gibson et al., 2008) cost an estimated $10 million (Herper, M. 2007, http://www.forbes.com/2007/06/28/venter-synthetic-bacteria-tech-science-cx_mh_0628venter.html). At the research level, on the other hand, once gene synthesis hits the $0.10–0.20/bp price range, synthesis will very likely replace the traditional recombinant DNA methods for many smaller-scale cloning projects within the next few years.
A major problem with genes assembled from overlapping oligos is the inherent error rate of about 1% during the chemical synthesis of the oligos themselves. The most frequent error is the failure to incorporate a base, due to less than perfect deprotection of the reactive groups or incomplete coupling of the incoming nucleotide. There appears to be a rather hard limit on improving oligo accuracy during the synthesis step much beyond 1 error in 100 bases. Therefore several techniques are being employed, often in combination, to improve the accuracy of oligos and the assembled DNA intermediates.
  • Keeping the oligos and the overlapping regions between them short (40–50 bases) not only reduces the number of errors per oligo but also increases the disruptive effect of mismatches between annealed oligos. Using stringent hybridization conditions thus reduces the chance that incorrect oligos take part in the assembly reaction (Young and Dong, 2004).
  • A common approach is to gel purify oligos before the assembly reaction, which helps eliminate many of the shorter aberrant oligo species. This reduces the error rate to about 1 in 500. At this error rate, short, several hundred base pairs long, intermediate assembly products are cloned by traditional recombinant DNA methods and sequence verified. The vetted sequence segments are then either combined by further rounds of cloning, or by assembly PCR. The need for gel purification is another reason to keep oligo length limited, as oligos that are too long can no longer be effectively separated from the most troublesome offender, the N–1-mer. If all construction oligos for one specific synthesis project are kept the same length, the gel purification can be done by combining all oligos in one sample, much reducing time and cost (Smith et al., 2003).
  • Another approach relies on the selective hybridization of the construction oligos to a column of immobilized selection oligos (Tian et al., 2004), as noted above.
  • Finally, a second tier of error correction can be implemented after the LCR or PCR assembly of gene fragments. It is based on the enzymatic activity of T7 endonuclease, which recognizes and specifically cleaves dsDNA at mismatched nucleotide pairs (Picksley et al., 1990; Young and Dong, 2004). Following the final PCR amplification, the DNA amplicon is heat denatured and re-annealed. Since mutations in the original construction oligo sequences are distributed randomly, the probability that two hybridizing strands carry a mutation on one strand and the corresponding compensatory mutation on the other is minuscule. It can therefore be expected that virtually every mutation in every oligo that participates in the assembly reaction will create a mismatch. Similarly, error correction by mismatch-binding proteins, such as MutS of Thermus aquaticus, can be employed; the MutS-bound mismatched DNA is then separated from the correct DNA by gel electrophoresis (Carr et al., 2004).
The quality of the oligos critically determines the practical size of the synthesis intermediates that need to be cloned and sequence verified (Carr et al., 2004). If sequence errors follow a normal (Gaussian) distribution along the length of the DNA, an error rate of 1 in 600 would make it impractical to assemble a DNA longer than 1–2 kb in a single reaction without intermediate sequence verification (Figure 5).
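A back-of-the-envelope calculation shows why length is so limiting. The sketch below uses a simpler model than the one behind Figure 5: it assumes that every base is wrong independently with the same probability, so the chance that a clone of length L is error-free is (1 − p)^L. That independence assumption and the function names are ours, not the authors'.

```python
def fraction_error_free(length_bp, error_rate):
    """Probability that a clone of length_bp has no errors,
    assuming independent errors at a uniform per-base rate."""
    return (1.0 - error_rate) ** length_bp


def expected_clones_to_screen(length_bp, error_rate):
    """Average number of clones sequenced to find one correct clone."""
    return 1.0 / fraction_error_free(length_bp, error_rate)


for length in (500, 1000, 2000):
    p_ok = fraction_error_free(length, 1 / 600)
    n = expected_clones_to_screen(length, 1 / 600)
    print(f"{length} bp: {p_ok:.1%} correct, about {n:.0f} clones per hit")
```

Under this model the fraction of correct clones falls off exponentially with length, which is consistent with the practice of verifying intermediates of a few hundred base pairs before combining them.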
Figure 5
The impact of oligonucleotide error rate on the accuracy of assembled synthetic genes. The various curves assume error rates in the construction oligonucleotides typically achieved after different error correction methods used to assemble a target sequence (more ...)
Codon Optimization
In many cases it is desirable to express a gene of interest (often a human gene) in a heterologous, more economical expression system, such as bacteria or yeast. All too often, however, the codon usage within the gene is at odds with the codon usage of the new host species. As a result the gene expresses poorly. Thus, the need for “codon optimization” was born (Itakura et al., 1977). During codon optimization the codon usage of the gene is altered to reflect that of the host species by replacing suboptimal codons with preferred synonymous codons. Since this often involves many simultaneous sequence changes, it is best done by de novo gene synthesis. Probably the best known example of codon optimization is the “humanization” of the Green Fluorescent Protein (GFP) of the jellyfish Aequorea victoria (Zolotukhin et al., 1996). Codon optimization is currently still the most prevalent reason for de novo gene synthesis (Gustafsson, Govindarajan, and Minshull, 2004).
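In its simplest form, codon optimization replaces each codon with the host's most frequently used synonym. The sketch below illustrates that idea under stated assumptions: the usage frequencies in `HOST_USAGE` are made-up illustrative numbers rather than measured values for any organism, the input length must be a multiple of three, and real optimization tools weigh many further factors.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def codon_optimize(seq, host_usage):
    """Swap each codon for the synonym with the highest host usage.

    Codons whose amino acid has no entry in host_usage are kept as-is.
    """
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE[codon]
        synonyms = [c for c in host_usage if CODON_TABLE[c] == aa]
        out.append(max(synonyms, key=host_usage.get) if synonyms else codon)
    return "".join(out)


# Hypothetical host usage frequencies (illustrative only)
HOST_USAGE = {"CTG": 0.50, "CTA": 0.04, "AAA": 0.74, "AAG": 0.26,
              "GGC": 0.37, "GGA": 0.13}
gene = "ATGCTAAAGGGATAA"
optimized = codon_optimize(gene, HOST_USAGE)
print(optimized, translate(gene) == translate(optimized))
```

The final comparison checks the defining constraint of the method: the nucleotide sequence changes while the encoded protein does not.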
In some instances gene synthesis has been used to recreate a DNA sequence from a publicly available sequence database in an effort to sidestep licensing, patenting or material transfer issues.
Creating new chassis for protein engineering
It is theoretically possible to synthesize a bacterial genome in which the redundancy of the genetic code is eliminated, such that each amino acid in every bacterial protein is represented by exactly one codon only. Thus, only 20 codons plus one Stop codon would be needed to synthesize all the bacteria’s own genes. At the same time, the remaining 43 “orphaned” codons could be freed up to specify non-natural amino acids. Bacteria with such an expanded genetic code could one day become a powerful chassis for the production of artificial proteins (2006; Carr and Isaacs, 2006).
Viral Gene and Genome Synthesis
Viruses are amongst the simplest replicating genetic systems. For this reason they have been at the forefront of the advancing biosciences since the dawn of molecular biology. Their small genome sizes (most RNA virus genomes are 10 ± 5 kb) make them amenable to whole genome synthesis with the currently available technology, and viruses are therefore poised to lead the way in the budding field of synthetic biology.
A significant use for genome synthesis is the recreation of viruses, or perhaps other organisms in the future, for which no intact natural template is available. The synthesis of the 1918 flu virus was accomplished by piecing together sequence fragments recovered from victims buried in the Alaskan permafrost and from archived tissue samples (Tumpey et al., 2005). The creation of bat SARS coronavirus (Becker et al., 2008) and of HIV from chimpanzee feces (Takehisa et al., 2007) also falls into this category. A clever extension of this idea has been the resurrection of live infectious retroviruses assembled from a consensus of ancient remnants that are endogenous to the human genome, and which have perhaps been inactive for millions of years (Dewannieux et al., 2006; Lee and Bieniasz, 2007). Once the stuff of science fiction movies, these “Jurassic Parkesque” projects are likely to be just the teaser trailers of the coming attractions in the budding synthetic technology.
Through the process of natural selection, evolution favors systems that work, especially those that work better than their direct predecessors and competitors. This selection process however does not follow what humans would consider a logical design process. Evolutionary changes are small and incremental following a one-directional ratchet that does not move backward. There is no “reset” button that allows evolution to jump back to an earlier version and try again. De novo gene and genome synthesis provides this virtual reset button by allowing the creation of any conceivable genome at will and at once, no matter how different from its predecessor.
One recurring theme in viral genomes is the evolution of overlapping reading frames. This space-saving measure allows a virus to encode portions of two proteins on the same stretch of genome sequence, but in two different reading frames. Studying individual genes and proteins of such a virus genetically and biochemically poses a problem for the experimenter, since manipulating one protein inadvertently changes the other. To simplify these interdependencies in the genome, Chan and colleagues redesigned and synthesized parts of the bacteriophage T7 genome, eliminating the overlapping reading frames (Chan, Kosuri, and Endy, 2005). In the resulting virus, the individual genes could then be manipulated and studied independently, a process they called “refactoring” in analogy to the process of redesigning and improving computer code while retaining its basic function.
Exploiting the intrinsic sequence biases of the human genome for the generation of synthetic virus vaccines
The basic mechanism of mRNA translation is preserved from the simplest virus to the most complex organism. Viruses, just like human cells, need to produce mRNA molecules, which are used to convert their genetic information into proteins. Different viruses have devised different strategies to accomplish this, and have different ways to store this genetic information in their genome. Invariably, however, viruses need to divert the host’s cellular machinery for the translation of their proteins, as they themselves cannot execute this function. The degeneracy of the genetic code (several synonymous codons specify the same amino acid) gives an organism the flexibility to encode a given protein sequence in its genome in an unimaginably large number of ways. The poliovirus polyprotein, for instance, could be encoded by a staggering 10^1100 different mRNA sequences, all of them specifying the same protein sequence (for comparison, the number of atoms in the observable universe is estimated to be on the order of 10^80). This raises the question of to what extent the natural encoding of a gene is optimal or special. The cell’s preference for one synonymous codon over another to specify the same amino acid is termed “codon bias”. It is thought that codon bias is correlated with the abundance of the corresponding cognate tRNAs in the cell. Consequently, rare codons are associated with suboptimal translation of an mRNA. In addition, the frequencies with which two codons occur next to one another in the genome are not what is statistically expected from the frequencies of the two codons that make up the pair, a phenomenon called “codon-pair bias”. There are codon-pair combinations that are statistically greatly underrepresented while others are greatly overrepresented. The significance of codon-pair bias has been largely unknown and underappreciated.
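The size of this encoding space follows directly from the genetic code: the number of possible mRNAs for a protein is the product, over all residues, of the number of synonymous codons for each amino acid. The sketch below computes the base-10 logarithm of that product so that very large numbers stay manageable. The function name and example peptides are illustrative.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Number of synonymous codons per amino acid (e.g. 6 for Leu, 1 for Met)
DEGENERACY = Counter(CODON_TABLE.values())


def log10_encodings(protein):
    """log10 of the number of distinct coding sequences for `protein`."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


for peptide in ("MW", "LSR", "MKTAYIAKQR"):
    print(peptide, f"10^{log10_encodings(peptide):.2f} encodings")
```

Since most amino acids have between two and six codons, the exponent grows roughly in proportion to protein length, which is why a polyprotein of a few thousand residues reaches such astronomical numbers.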
We have recently shown that it is possible to exploit the codon-pair bias phenomenon for the synthesis of novel live attenuated forms of viruses with remarkable properties (Coleman et al., 2008). By large-scale computer-aided redesign of the viral genome we engineered hundreds of silent mutations into poliovirus. These mutations were targeted to introduce a maximum number of unfavorable synonymous codon pairs, without changing codon bias or protein sequence. By forcing a virus to “make do” with this heavily biased synthetic genome, we showed that viral protein translation is greatly reduced. Thus, codon-pair deoptimized viruses cannot reproduce their genetic information as quickly as their wild-type cousins, which puts them at a decisive disadvantage against the host’s innate and immune defenses. One of the major benefits of the whole-genome deoptimization strategy is that the resulting attenuated viruses are phenotypically and genotypically extremely stable. The attenuation (att) phenotype depends on many hundreds, even thousands, of silent mutations, each by itself virtually inconsequential: “death by a thousand cuts”. Therefore, the fitness gain from reverting individual mutations appears to be too small to drive genetic selection, and thus reversion apparently does not occur (Coleman et al., 2008). We termed this process of perturbing intrinsic viral genome biases by synthetic genome redesign SAVE, for Synthetic Attenuated Virus Engineering (Figure 6).
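The constraint described here (worse codon pairs, same codon bias, same protein) can be met by rearranging the existing codons rather than introducing new ones. The sketch below is a toy illustration of that principle and not the SAVE algorithm itself: it swaps synonymous codons between positions and keeps swaps that lower a summed pair score. The pair scores in `HYPOTHETICAL_SCORES`, the random-swap search, and the function names are all assumptions made for the example.

```python
import random
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def split_codons(seq):
    """Split a coding sequence into a list of codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def pair_score(codons, scores):
    """Sum of scores over adjacent codon pairs; unlisted pairs count as 0."""
    return sum(scores.get((a, b), 0.0) for a, b in zip(codons, codons[1:]))


def deoptimize(seq, scores, iterations=2000, seed=0):
    """Lower the summed codon-pair score by swapping synonymous codons.

    Swapping two codons for the same amino acid leaves both the protein
    and the overall codon counts (codon bias) unchanged.
    """
    rng = random.Random(seed)
    codons = split_codons(seq)
    best = pair_score(codons, scores)
    for _ in range(iterations):
        i, j = rng.randrange(len(codons)), rng.randrange(len(codons))
        if i == j or CODON_TABLE[codons[i]] != CODON_TABLE[codons[j]]:
            continue
        codons[i], codons[j] = codons[j], codons[i]
        new = pair_score(codons, scores)
        if new <= best:
            best = new
        else:
            codons[i], codons[j] = codons[j], codons[i]  # undo the swap
    return "".join(codons)


# Hypothetical pair scores: negative means underrepresented (unfavorable)
HYPOTHETICAL_SCORES = {
    ("GCT", "GAA"): 0.4, ("GCC", "GAG"): 0.3,
    ("GCT", "GAG"): -0.5, ("GCC", "GAA"): -0.6,
}
wild_type = "GCTGAAGCCGAGGCTGAA"
recoded = deoptimize(wild_type, HYPOTHETICAL_SCORES)
print(pair_score(split_codons(wild_type), HYPOTHETICAL_SCORES),
      pair_score(split_codons(recoded), HYPOTHETICAL_SCORES))
```

A real design would draw its pair scores from genome-wide statistics of the host and would use a more thorough search, but the invariants checked here (identical protein, identical codon counts, lower pair score) are the ones the text describes.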
Figure 6
Recoding of viral genomes according to the SAVE method (Synthetic Attenuated Virus Engineering). A. Example of the level of sequence alteration after codon reassignment of the poliovirus capsid gene (Mueller et al., 2006). PV(M), part of the wild type (more ...)
SAVE attacks a virus at one of the most fundamental processes common to all living systems, the translation of protein, for which viruses depend on the host cell’s machinery. Thus it can be predicted that SAVE may work on most, if not all, viruses. The rational genetic changes imposed on SAVE-designed viral genomes are completely independent of protein sequence. The viral protein sequences, and therefore their functions, remain 100% preserved in the recoding process. An understanding of the proteins’ function is therefore not necessary, which sidesteps the need for much of classical virology and makes it possible to produce an attenuated vaccine candidate in a very short time, with a predictable degree of attenuation, in virtually any virus system. Viruses live lives of genetic austerity, and therefore don’t usually carry unnecessary genes around. By that rationale most viral gene products can be considered essential. Depending upon the virus system, interfering with the synthesis of several of those gene products just a little bit turns out to pack a great punch against the overall fitness of the virus (Coleman et al., 2008; Mueller et al., 2006).
Using the SAVE method we can profit from these genomic biases that have arisen over evolutionary time-scales and turn them upside down and inside out, undoing eons of viral evolution. If we think of evolution as “walking” along a dirt path, SAVE allows us to “leap” across the evolutionary universe at warp speed. Since it is evident that many viruses have actively selected against the occurrence of certain sequence features, such as unfavorable codons, codon pairs, and other sequence motifs, the whole genome recoding approach by de novo synthesis will very likely have a profound effect on any virus.
General requirements for the application of SAVE to a virus system
Since SAVE targets a virus at the level of protein translation, a function elementary to all viruses, we believe this approach is applicable to many virus systems for which the following basic requirements are met:
  • A target virus has a known genome sequence, preferably available online.
  • The desired de-optimized genome sequence is prepared by computer-aided redesign using the SAVE algorithm.
  • De novo synthesis of the artificial viral genome is performed according to the design specifications, usually outsourced to a commercial vendor.
  • A reverse genetics system is employed to boot the artificial genome to life and make a virus. This is decidedly simple for many human viruses. Often a genome-length copy of the DNA itself, or an RNA transcript of that DNA, is infectious upon transfection into susceptible cells.
  • A method to screen for viruses of desired phenotype has to be available. An initial screen in susceptible cell culture will yield valuable information as to the viability of various deoptimized virus designs. Clearly the virus still must be able to replicate at least at a low level in order to be useful as a live vaccine.
  • A suitable animal model to test attenuation and immune response is required.
Provided the above requirements are met, the SAVE strategy can be employed successfully for the redesign and synthesis of viruses.
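The central property claimed for the recoding step, namely that the nucleotide sequence changes while the encoded protein remains 100% preserved, can be illustrated with a toy sketch. The SAVE algorithm itself is not specified in this section, so the code below is not that algorithm: it simply shuffles existing codons among positions that encode the same amino acid, which leaves both the protein and the overall codon usage unchanged while altering which codons sit next to each other. The function names (`split_codons`, `translate`, `shuffle_synonymous`) and the short example sequence are hypothetical and chosen only for illustration; no scoring of codon pairs is attempted here.

```python
import random
from collections import defaultdict

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def split_codons(seq):
    """Split an in-frame DNA coding sequence into a list of codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a DNA coding sequence into one-letter amino acids ('*' = stop)."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(seq))


def shuffle_synonymous(seq, seed=0):
    """Illustrative only (hypothetical helper, not the SAVE algorithm).

    Permutes the existing codons among positions that encode the same
    amino acid, so the protein and the codon usage are both preserved.
    """
    rng = random.Random(seed)
    codons = split_codons(seq)

    # Group codon positions by the amino acid they encode.
    positions = defaultdict(list)
    for index, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(index)

    # Shuffle the codons within each synonymous group.
    recoded = list(codons)
    for indices in positions.values():
        pool = [codons[i] for i in indices]
        rng.shuffle(pool)
        for i, codon in zip(indices, pool):
            recoded[i] = codon
    return "".join(recoded)


if __name__ == "__main__":
    original = "ATGGCTGCCGCAGCGTTATTGCTTCTCTAA"  # made-up 10-codon example
    recoded = shuffle_synonymous(original, seed=1)
    print(original, translate(original))
    print(recoded, translate(recoded))
```

Checking `translate(recoded) == translate(original)` after any redesign step is a cheap safeguard that the recoding has not altered the protein sequence.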
Synthetic virology, i.e. the redesign and synthesis of custom-tailored whole virus genomes, has become economically feasible with recent rapid improvements in DNA synthesis technology. This holds the potential to revolutionize the way virology and vaccinology are done. Viral genomes, especially those of RNA viruses and retroviruses, are short enough to make them amenable to whole genome synthesis with currently available technology. Such freedom of design could provide tremendous power to perform large-scale redesign of DNA/RNA coding sequences and to study the impact of large-scale changes in codon bias, codon-pair bias, dinucleotide biases, GC content, RNA secondary structures, and other sequence signatures on viral fitness, with the aim of developing a new platform for vaccine design and genetic engineering.
What is synthetic biology? It is neither a field in its own right, nor a separate science. It is perhaps best described as an improvement of existing enabling technologies that are beginning to penetrate mainstream sciences, as they become more and more economical. This has led to an “organized” crossover of different scientific fields (e.g. biology, chemistry, mathematics, engineering etc.) that promises to yield organisms with useful biochemical pathways never seen before.
The new reality of synthetic genes and genomes calls for a fundamental revision of the ways biology is taught to students. The Johns Hopkins University has already embraced these cutting-edge developments and now offers an undergraduate course in which the students work collaboratively toward synthesizing the yeast genome. Impressively, within only one year this unified effort resulted in the synthesis of hundreds of 750 bp cassettes amounting to the 280 kb of yeast chromosome III (Dymond et al., 2009). An equally imaginative and playful introduction to the engineering of biological systems is fostered by the International Genetically Engineered Machine Competition (iGEM; http://www.igem.org), organized by synthetic biologists at MIT. Here undergraduate teams compete in designing and building genetic circuits and systems from an ever-expanding toolkit of standard genetic parts, or “BioBricks” (Goodman, 2008).
However, although the excitement about synthetic biology is substantial, it faces equally substantial scepticism and “fear of the new” in our society. The tendency of some researchers in the “synthetic biology field” to overvalue its novelty and uniqueness is perhaps a disservice to their own science. The most commonly cited public concerns with regard to synthetic biology are probably the ethical implications connected with the creation of “new life forms” and the fear of synthetic “killer viruses”. These sentiments are often picked up and fuelled by the media, potentiating the perceived fear of the uncertain.
Virtually every organism ever modified in molecular or genetic research is by definition a new life form. This definition could be expanded to all naturally occurring organisms that differ genetically from their parents, in other words, to all living creatures. Why would an organism created by synthetic methods be qualitatively different? The question presents itself: “Why do we, as a society, worry more about the possibility of a synthetic designer pathogen, when some of the worst pathogens known to mankind are still raging?” Measles virus, as a case in point, is one of the most contagious viruses of humans. As recently as 2000, approximately 777,000 people died per year from measles, and in third world countries with poor health care systems the fatality rate can be as high as 28% (Perry and Halsey, 2004). Annually, 250,000–500,000 people die from complications of the flu (WHO, 2003). Additionally, only a few critical mutations in the H5N1 bird flu virus separate us from a virus that can easily spread amongst humans and lead to an influenza pandemic. The AIDS pandemic, caused by primate viruses that jumped the species barrier to humans, claims approximately 2 million lives annually (http://www.avert.org/worldstats.htm). In 2003, the world barely escaped a pandemic caused by a SARS-coronavirus now thought to have jumped from bats to humans (Becker et al., 2008, and references therein).
Although, in theory at least, we have the capacity to generate any genetic sequence that we can conceive, what we can do with this capacity is in fact quite limited. While it is easy to think up fantastic and scary scenarios of synthetic killer viruses wiping out mankind, bio-terrorists and the brightest scientific thinkers alike would be hard pressed to say what such a designer super-pathogen would look like. In reality, all that can be accomplished via synthesis for now, and for some time to come, is to emulate, copy and recreate what mother nature has brought forth and thrown at us incessantly throughout our history on this planet. It is possible to produce variations on an existing theme. It is not possible, as yet, to design from scratch a qualitatively new pathogen that is completely different from any organism that exists now or has existed in the past. The level of abstraction required to “piece together” qualitatively new life forms from defined off-the-shelf parts (genes) is far from being realized (Goler, Bramlett, and Peccoud, 2008). It is probably this misconception, trumpeted by the media, that strikes a chord of fear in the general population. Cases in point:
  • The 2002 poliovirus synthesis (Cello, Paul, and Wimmer, 2002), the first synthesis of a pathogen, caught the world off guard and ignited a heated debate in its aftermath. All we had done was to recreate an exact synthetic copy of the poliovirus genome, except for some genetic “watermarks” to prove the authenticity of the synthetic genome. The resulting virus was at the protein level 100% identical to the wild type virus used in countless laboratories around the world, a virus that even now circulates naturally in several countries and that is available for purchase at repositories such as the American Type Culture Collection (ATCC). Because the virus is an exact antigenic match to the currently available poliovirus vaccine, an overwhelming proportion of the world population is immune to it. Worldwide vaccine coverage against poliovirus is arguably the greatest of any vaccine-preventable disease. This is hardly a blueprint for an imminent bioterrorist attack. But it suddenly became clear that viruses can never be regarded as extinct as long as their genome sequence information is preserved, be it on a government-sponsored online database, in a 29-year-old Nature journal (Kitamura et al., 1980) gathering dust in libraries across the world, or just written down on a smudgy piece of paper forgotten in a desk drawer. That information is sufficient to re-create a virus at any point, even long after any traces of its natural presence have vanished. It is this uncomfortable realization that brought about the level of public discussion that followed the original poliovirus synthesis. The publication was intended not only to herald a new era in the study of organisms but also as a “wake-up call” regarding dual use technology.
  • The recreation of the highly pathogenic 1918 flu virus (Tumpey et al., 2005) out of sequences extracted from influenza victims preserved in the northern permafrost also met with criticism, although no one had maligned the publication of the genome sequence as much as 8 years earlier (Taubenberger et al., 1997). In fact, the synthesis of the 1918 virus brought critical new insight into the pathogenesis of influenza, and it is a prerequisite for the production of an adequate vaccine should such a need ever arise. Isn’t society in the long run much better off with this knowledge than without it, understanding the 1918 flu virus in detail rather than hoping that something like the 1918 flu will never happen again? The criticism is even more inappropriate given the looming threat of an H5N1 bird flu pandemic.
  • Over 30 years of random, “unenlightened” genetic manipulation of viral genomes through recombinant DNA technology by countless laboratories around the world has produced no evidence that researchers would accidentally, and unbeknownst to them, create a human super-virus. Whole genome synthesis will be no different.
  • The adaptation of a human pathogen to an experimental animal species by repeated passaging through that species (a decidedly “pre-synthetic era” method) has been employed ever since viruses were discovered. It leads to increased pathogenicity in the new species compared to the wild type virus. These host-adapted models have greatly facilitated the study of viruses and the diseases they cause. Equally important, these experiments resulted in the development of some of the most successful vaccines ever produced (polio, measles, mumps, rubella, and smallpox). As it turned out, passaging these viruses through diverse animal species led to the mitigation of their disease-causing potential for humans, a process termed “attenuation”.
All the above considerations notwithstanding, de novo genome synthesis, like many technologies in the past, does hold a potential for dual use. And unlike many technologies before it, nuclear technology for instance, which require immense resources that cannot escape detection, the intentional misuse of genome synthesis technologies will become increasingly undetectable. It seems next to impossible that genome synthesis can ever be regulated effectively by governments. The technology and its components are already too ubiquitous, and too easy to jury-rig from off-the-shelf parts. The nature of genome synthesis is such that in the very near future pathogens can, and perhaps will, be synthesized in the proverbial hobbyist’s basement, in a high school science lab, or by a bio-terrorist organization. These possibilities are not an academic’s hyperbole either. In fact, the grass roots “bio-hacker” culture is already flourishing outside the realm of academia, industry and government oversight (Nair, 2009). When considering these issues, our society would be prudent to shift its focus from preventing such dual use proliferation to preparing for it. The latter may include the development of new vaccines and/or the stockpiling of available vaccines against the most likely bio-terrorist agents.
References
  • The Economist. Life 2.0. The Economist. 2006:67–70. print edition 09/02/2006.
  • Agarwal KL, Buchi H, Caruthers MH, Gupta N, Khorana HG, Kleppe K, Kumar A, Ohtsuka E, Rajbhandary UL, Van de Sande JH, Sgaramella V, Weber H, Yamada T. Total synthesis of the gene for an alanine transfer ribonucleic acid from yeast. Nature. 1970;227(5253):27–34.
  • Barany F. Genetic disease detection and DNA amplification using cloned thermostable ligase. Proc Natl Acad Sci U S A. 1991;88(1):189–93.
  • Becker MM, Graham RL, Donaldson EF, Rockx B, Sims AC, Sheahan T, Pickles RJ, Corti D, Johnston RE, Baric RS, Denison MR. Synthetic recombinant bat SARS-like coronavirus is infectious in cultured cells and in mice. Proc Natl Acad Sci U S A. 2008;105(50):19944–9.
  • Blight KJ, Kolykhalov AA, Rice CM. Efficient initiation of HCV RNA replication in cell culture. Science. 2000;290(5498):1972–4.
  • Carr PA, Isaacs F. E.coli: Whole Genome Engineering; Presented at “Synthetic Biology 2.0”; May 20–22; Berkeley, CA. 2006.
  • Carr PA, Park JS, Lee YJ, Yu T, Zhang S, Jacobson JM. Protein-mediated error correction for de novo DNA synthesis. Nucleic Acids Res. 2004;32(20):e162.
  • Caruthers MH, Barone AD, Beaucage SL, Dodds DR, Fisher EF, McBride LJ, Matteucci M, Stabinsky Z, Tang JY. Chemical synthesis of deoxyoligonucleotides by the phosphoramidite method. Methods Enzymol. 1987;154:287–313.
  • Cello J, Paul AV, Wimmer E. Chemical synthesis of poliovirus cDNA: generation of infectious virus in the absence of natural template. Science. 2002;297(5583):1016–1018.
  • Chan LY, Kosuri S, Endy D. Refactoring bacteriophage T7. Mol Syst Biol. 2005;1:0018.
  • Coleman JR, Papamichail D, Skiena S, Futcher B, Wimmer E, Mueller S. Virus attenuation by genome-scale changes in codon pair bias. Science. 2008;320(5884):1784–7.
  • Czar MJ, Anderson JC, Bader JS, Peccoud J. Gene synthesis demystified. Trends Biotechnol. 2009;27(2):63–72.
  • Dewannieux M, Harper F, Richaud A, Letzelter C, Ribet D, Pierron G, Heidmann T. Identification of an infectious progenitor for the multiple-copy HERV-K human endogenous retroelements. Genome Res. 2006;16(12):1548–56.
  • Dymond JS, Scheifele LZ, Richardson S, Lee P, Chandrasegaran S, Bader JS, Boeke JD. Teaching synthetic biology, bioinformatics and engineering to undergraduates: the interdisciplinary Build-a-Genome course. Genetics. 2009;181(1):13–21.
  • Edge MD, Green AR, Heathcliffe GR, Meacock PA, Schuch W, Scanlon DB, Atkinson TC, Newton CR, Markham AF. Total synthesis of a human leukocyte interferon gene. Nature. 1981;292(5825):756–62.
  • Ferretti L, Karnik SS, Khorana HG, Nassal M, Oprian DD. Total synthesis of a gene for bovine rhodopsin. Proc Natl Acad Sci U S A. 1986;83(3):599–603.
  • Fodor SP, Read JL, Pirrung MC, Stryer L, Lu AT, Solas D. Light-directed, spatially addressable parallel chemical synthesis. Science. 1991;251(4995):767–73.
  • Gibson DG, Benders GA, Andrews-Pfannkoch C, Denisova EA, Baden-Tillson H, Zaveri J, Stockwell TB, Brownley A, Thomas DW, Algire MA, Merryman C, Young L, Noskov VN, Glass JI, Venter JC, Hutchison CA, Smith HO. Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science. 2008;319(5867):1215–20.
  • Goler JA, Bramlett BW, Peccoud J. Genetic design: rising above the sequence. Trends Biotechnol. 2008;26(10):538–44.
  • Goodman C. Engineering ingenuity at iGEM. Nat Chem Biol. 2008;4(1):13.
  • Gupta NK, Ohtsuka E, Sgaramella V, Buchi H, Kumar A, Weber H, Khorana HG. Studies on polynucleotides, 88. Enzymatic joining of chemically synthesized segments corresponding to the gene for alanine-tRNA. Proc Natl Acad Sci U S A. 1968;60(4):1338–44.
  • Gustafsson C, Govindarajan S, Minshull J. Codon bias and heterologous protein expression. Trends Biotechnol. 2004;22(7):346–53.
  • Itakura K, Hirose T, Crea R, Riggs AD, Heyneker HL, Bolivar F, Boyer HW. Expression in Escherichia coli of a chemically synthesized gene for the hormone somatostatin. Science. 1977;198(4321):1056–63.
  • Kalman M, Cserpan I, Bajszar G, Dobi A, Horvath E, Pazman C, Simoncsits A. Synthesis of a gene for human serum albumin and its expression in Saccharomyces cerevisiae. Nucleic Acids Res. 1990;18(20):6075–81.
  • Khorana HG. Total synthesis of a gene. Science. 1979;203(4381):614–25.
  • Kitamura N, Adler C, Martinko J, Nathenson S, Wimmer E. The genome-linked protein of picornaviruses. VII. Genetic mapping of poliovirus VPg by protein and RNA sequence studies. Cell. 1980;21:295–302.
  • Kodumal SJ, Patel KG, Reid R, Menzella HG, Welch M, Santi DV. Total synthesis of long DNA sequences: synthesis of a contiguous 32-kb polyketide synthase gene cluster. Proc Natl Acad Sci U S A. 2004;101(44):15573–8.
  • Lartigue C, Glass JI, Alperovich N, Pieper R, Parmar PP, Hutchison CA 3rd, Smith HO, Venter JC. Genome transplantation in bacteria: changing one species to another. Science. 2007;317(5838):632–8.
  • Lee YN, Bieniasz PD. Reconstitution of an infectious human endogenous retrovirus. PLoS Pathog. 2007;3(1):e10.
  • Letsinger RL, Mahadevan V. Oligonucleotide Synthesis on a Polymer Support. J Am Chem Soc. 1965;87:3526–7.
  • Mueller S, Papamichail D, Coleman JR, Skiena S, Wimmer E. Reduction of the rate of poliovirus protein synthesis through large-scale codon deoptimization causes attenuation of viral virulence by lowering specific infectivity. J Virol. 2006;80(19):9687–96.
  • Nair P. Straight talk with...Mac Cowell and Jason Bobe. Nat Med. 2009;15(3):230–1.
  • Nirenberg M, Leder P. RNA Codewords and Protein Synthesis. The Effect of Trinucleotides upon the Binding of sRNA to Ribosomes. Science. 1964;145:1399–407.
  • Pan W, Ravot E, Tolle R, Frank R, Mosbach R, Turbachova I, Bujard H. Vaccine candidate MSP-1 from Plasmodium falciparum: a redesigned 4917 bp polynucleotide enables synthesis and isolation of full-length protein from Escherichia coli and mammalian cells. Nucleic Acids Res. 1999;27(4):1094–103.
  • Perry RT, Halsey NA. The clinical significance of measles: a review. J Infect Dis. 2004;189(Suppl 1):S4–16.
  • Picksley SM, Parsons CA, Kemper B, West SC. Cleavage specificity of bacteriophage T4 endonuclease VII and bacteriophage T7 endonuclease I on synthetic branch migratable Holliday junctions. J Mol Biol. 1990;212(4):723–35.
  • Singh-Gasson S, Green RD, Yue Y, Nelson C, Blattner F, Sussman MR, Cerrina F. Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror array. Nat Biotechnol. 1999;17(10):974–8.
  • Smith HO, Hutchison CA 3rd, Pfannkoch C, Venter JC. Generating a synthetic genome by whole genome assembly: phiX174 bacteriophage from synthetic oligonucleotides. Proc Natl Acad Sci U S A. 2003;100(26):15440–5.
  • Soll D, Ohtsuka E, Jones DS, Lohrmann R, Hayatsu H, Nishimura S, Khorana HG. Studies on polynucleotides, XLIX. Stimulation of the binding of aminoacyl-sRNA’s to ribosomes by ribotrinucleotides and a survey of codon assignments for 20 amino acids. Proc Natl Acad Sci U S A. 1965;54(5):1378–85.
  • Stemmer WP, Crameri A, Ha KD, Brennan TM, Heyneker HL. Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. Gene. 1995;164(1):49–53.
  • Takehisa J, Kraus MH, Decker JM, Li Y, Keele BF, Bibollet-Ruche F, Zammit KP, Weng Z, Santiago ML, Kamenya S, Wilson ML, Pusey AE, Bailes E, Sharp PM, Shaw GM, Hahn BH. Generation of infectious molecular clones of simian immunodeficiency virus from fecal consensus sequences of wild chimpanzees. J Virol. 2007;81(14):7463–75.
  • Taubenberger JK, Reid AH, Krafft AE, Bijwaard KE, Fanning TG. Initial genetic characterization of the 1918 “Spanish” influenza virus. Science. 1997;275(5307):1793–6.
  • Tian J, Gong H, Sheng N, Zhou X, Gulari E, Gao X, Church G. Accurate multiplex gene synthesis from programmable DNA microchips. Nature. 2004;432(7020):1050–4.
  • Tumpey TM, Basler CF, Aguilar PV, Zeng H, Solorzano A, Swayne DE, Cox NJ, Katz JM, Taubenberger JK, Palese P, Garcia-Sastre A. Characterization of the reconstructed 1918 Spanish influenza pandemic virus. Science. 2005;310(5745):77–80.
  • WHO. Fact Sheet No. 211. Influenza World Health Organization; Geneva: 2003.
  • Young L, Dong Q. Two-step total gene synthesis method. Nucleic Acids Res. 2004;32(7):e59.
  • Zolotukhin S, Potter M, Hauswirth WW, Guy J, Muzyczka N. A “humanized” green fluorescent protein cDNA adapted for high-level expression in mammalian cells. J Virol. 1996;70(7):4646–54.