In 1828, Friedrich Wöhler synthesized urea from inorganic sources (ref. 6), striking a heavy blow to the doctrine of vitalism (ref. 7). DNA was discovered as a chemical substance in 1869 (ref. 8), but it took decades to solve the structural configuration of polynucleotides
(refs. 9,10). In keeping with their tradition, chemists began to synthesize DNA as soon as DNA structures had been published. The most ambitious of these early ventures was Khorana’s synthesis of a 75-base-pair (bp) double-stranded DNA that encoded the nucleotide sequence of yeast tRNA^Ala, published in 1970 (ref.
11). This was followed by the chemical and enzymatic synthesis of the first man-made functional gene, the 207-bp DNA of Escherichia coli tyrosine suppressor tRNA (ref. 12).
These early landmarks consumed enormous resources, in the case of tRNA^Ala some “20 man-years of effort” (ref. 13). In the 1980s, however, DNA synthesis went through a rapid transformation, with the introduction of novel activated nucleosides that allowed fully automated 3′-to-5′ synthesis of oligodeoxynucleotides (oligos) on solid supports
(refs. 13,14). In particular, phosphoramidites (that is, nucleotides that carry protective groups on the reactive hydroxyl and phosphate groups of the ribose and the amine of the base) have been the building blocks of choice. During the past 20 years, numerous DNA synthesis companies have been established in response to an exploding demand for oligos (~15–80 nucleotides (nt)) that are used for genetic analyses, PCR, diagnostic assays, sequence determination or other procedures. The turnaround time for an order of a 75-bp DNA, corresponding to yeast tRNA^Ala, with extra base pairs at each end encoding restriction sites for subcloning, is currently less than 1 week, a fraction of the time and effort expended originally in Khorana’s laboratory.
The assembly of larger DNA segments representing genes or entire genomes, however, is still tedious and costly. It requires many oligos that must be purified, because their chemical synthesis is error prone (none of the successive chemical reactions during 3′-to-5′ chain elongation proceeds with 100% yield). For this reason, the building blocks for the assembly of large polynucleotides are generally no longer than 40–80 nt. Different approaches have been used to assemble oligos into large polynucleotides, although all have in common the processes of enzymatic chain elongation and/or ligation of hybridized overlapping oligos (refs. 14,15). Examples are the 2.7-kbp plasmid containing the β-lactamase gene
(ref. 16) and the 4,917-bp gene encoding the merozoite surface protein (MSP-1) of Plasmodium falciparum (ref. 17). Currently, synthesizing genes or genomes is most cost efficient when done in part by commercial facilities, where the cost per base pair of finished and sequence-confirmed DNA is now as low as $0.39 (E.W. and S.M., based on information obtained from an informal web survey).
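The length limit on oligos mentioned above follows from simple arithmetic on stepwise yields. As a rough sketch, if every coupling step is assumed to succeed with the same probability, the fraction of chains that reach full length falls geometrically with oligo length. The per-step efficiency of 99% used below is an illustrative assumption, not a figure taken from the studies cited here, and the function name is hypothetical.

```python
def full_length_yield(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of chains that reach full length after stepwise synthesis.

    An oligo of length_nt nucleotides needs (length_nt - 1) coupling steps,
    because the first nucleoside is already attached to the solid support.
    Assumes every step has the same, independent success probability.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must lie in [0, 1]")
    return coupling_efficiency ** (length_nt - 1)


# Hypothetical per-step efficiency, chosen only for illustration.
ASSUMED_EFFICIENCY = 0.99

for n in (20, 40, 80, 200):
    fraction = full_length_yield(n, ASSUMED_EFFICIENCY)
    print(f"{n:>4} nt: {fraction:.1%} of chains are full length")
```

Under this assumed efficiency, fewer than half of the chains of an 80-nt oligo are full length, which is one way to see why longer building blocks are impractical and why purification is needed.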
Work in one of our groups (E.W. and colleagues) (ref. 18) led to the first chemical synthesis of a DNA (7,500 bp) corresponding to the entire genome of an infectious organism, poliovirus, published in 2002. At the time of its publication, the poliovirus-specific DNA was the largest DNA ever synthesized. This milestone was subsequently dwarfed in scale by the synthesis of the 582,970-bp genome of Mycoplasma genitalium in 2008 (ref.
19). Although this synthetic bacterial genome has not yet been ‘booted’ to life, the assembly of such a large DNA molecule bears witness to the vast possibilities that DNA synthesis will ultimately offer in engineering bacteria or viruses.
Although the mechanics of constructing genes or genomes from oligos are still being refined, DNA synthesis is making rapid progress and is likely to fundamentally change research in molecular biology (ref. 14). In 2004, Tian
et al. (ref. 20) published a massively parallel microchip-based DNA synthesis approach that they predicted “might increase yields in oligo synthesis from 9 bp per dollar to 20 kbp per dollar.” Once this or related strategies have matured and reached commercialization, the synthesis of small viral DNA genomes (for example, the 3,215-bp genome of hepatitis B virus (HBV)) could be accomplished for less than $100. At so low a cost, who would then construct an HBV mutant by such classic methods as site-directed mutagenesis? All current gene synthesis methods, whether practiced or just conceived, still depend on relatively short oligos as their basic building blocks. But further progress in synthetic biology will require accurate synthesis of long, continuous DNA sequences.
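The cost argument for the HBV genome can be checked with back-of-the-envelope arithmetic. The sketch below uses only figures quoted in this article (the HBV genome length, the commercial price of $0.39 per bp, and the oligo yields of 9 bp and 20 kbp per dollar). It assumes, for simplicity, that cost scales linearly with length, and it ignores oligo overlap, assembly and sequence confirmation. The function and constant names are illustrative.

```python
# Figures quoted in the text.
HBV_GENOME_BP = 3_215
COMMERCIAL_USD_PER_BP = 0.39          # finished, sequence-confirmed DNA
CURRENT_OLIGO_BP_PER_USD = 9          # oligo yield before the microchip approach
PROJECTED_OLIGO_BP_PER_USD = 20_000   # predicted yield of the microchip approach


def cost_from_price_per_bp(length_bp: int, usd_per_bp: float) -> float:
    """Cost in USD when the price is quoted in dollars per base pair."""
    return length_bp * usd_per_bp


def cost_from_bp_per_usd(length_bp: int, bp_per_usd: float) -> float:
    """Cost in USD when the yield is quoted in base pairs per dollar."""
    return length_bp / bp_per_usd


# Simplifying assumption: linear scaling, no overlap or assembly overhead.
finished = cost_from_price_per_bp(HBV_GENOME_BP, COMMERCIAL_USD_PER_BP)
oligos_now = cost_from_bp_per_usd(HBV_GENOME_BP, CURRENT_OLIGO_BP_PER_USD)
oligos_projected = cost_from_bp_per_usd(HBV_GENOME_BP, PROJECTED_OLIGO_BP_PER_USD)

print(f"Finished DNA at commercial price: ${finished:,.2f}")
print(f"Raw oligos at 9 bp per dollar:    ${oligos_now:,.2f}")
print(f"Raw oligos at 20 kbp per dollar:  ${oligos_projected:,.2f}")
```

On these assumptions, the raw oligos for an HBV genome would cost well under a dollar at the projected yield, so most of a sub-$100 total would go to assembly and verification rather than to oligo synthesis itself.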
Synthesizing large DNA molecules would be of only limited value if new methods of DNA sequencing had not kept pace with the advances in synthesis. In fact, the advances in DNA sequencing have dwarfed current DNA synthesis technology.
The first sequence of a naturally occurring polynucleotide, yeast tRNA^Ala, was deciphered by R. Holley and colleagues
(ref. 21) in 1965. The project was initiated in 1958, and its most difficult task was to isolate 1 g of highly purified tRNA^Ala from 140 kg of bakers’ yeast; the 76 ribonucleotides were then sequenced in 2.5 years (the sequence was later revised slightly) (ref. 22). Since then, however, two phases of technological innovation in sequencing have led to rapid progress (ref. 23).
The first phase was based on generating radioactive, sequence-specific fragments of DNA and separating them by polyacrylamide gel electrophoresis (PAGE) (refs. 24,25). Sanger’s method of producing fragments enzymatically by chain termination with dideoxynucleoside triphosphates proved to be more practical than the chemical method of Maxam and Gilbert. Subsequently, gels were replaced by capillaries, and radioactive labels by four-color fluorescence. The process was automated and streamlined, but the dideoxy method remains, to this day, the most widely used platform for DNA sequencing.
The second phase, still in its infancy, falls under the rubric of a single paradigm, termed ‘cyclic array sequencing’. “Cyclic array platforms achieve low cost by simultaneously decoding a two-dimensional array bearing millions (potentially billions) of distinct sequence features” (ref. 23). Such instruments, which differ slightly in their technologies, are already commercially available from several companies. Other methods, such as single-molecule sequencing, sequencing by microelectrophoresis, sequencing by mass spectrometry or sequencing by squeezing DNA through tiny nanopores (reviewed in ref.
23) are being tested, but have yet to mature into commercially useful techniques.
The strong progress in sequencing technologies is evident in the reduction in time and costs of human genome projects. The sequence of the ‘inaugural human genomes’ (3 × 10^9 bp), published in 2001 (refs.
26–28), was determined over a period of roughly 10 years at a cost of $3 billion, and it was incomplete (ref. 27). In contrast, the complete sequence of Jim Watson’s genome was determined in 4 months at a cost of less than $1 million (ref. 29). Currently, the price has dropped further to below $50,000 (ref.
30), and there is reason to believe that the number of solved human sequences will exceed 1,000 in the near future.
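Dividing each project cost by the genome size makes the trend explicit. The sketch below uses the figures given above. Because two of the costs are stated only as upper bounds (“less than $1 million” and “below $50,000”), the per-base values for those entries are upper bounds as well, and the fold reductions are lower bounds. The names are illustrative.

```python
HUMAN_GENOME_BP = 3e9  # genome size quoted in the text

# Total project costs in USD, as quoted in the text.
PROJECT_COSTS_USD = {
    "inaugural human genomes (2001)": 3e9,
    "Watson genome": 1e6,    # stated as "less than", so an upper bound
    "current price": 5e4,    # stated as "below", so an upper bound
}


def usd_per_bp(total_cost_usd: float, genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Average sequencing cost per base pair of genome."""
    return total_cost_usd / genome_bp


baseline = usd_per_bp(PROJECT_COSTS_USD["inaugural human genomes (2001)"])

for name, cost in PROJECT_COSTS_USD.items():
    per_bp = usd_per_bp(cost)
    ratio = baseline / per_bp
    print(f"{name}: {per_bp:.2e} USD per bp, baseline/cost ratio {ratio:,.0f}")
```

By this measure, the cost per base pair fell by at least three orders of magnitude between the inaugural genomes and the Watson genome, and by more than an order of magnitude again since.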