The Microfluidic PicoArray Synthesis
In the programmable PicoArray synthesis method, a new microfluidic device is used as a miniaturized synthesizer for solid-support parallel synthesis of oligonucleotides. This device is a multiplexing reactor configured with massively parallel picoliter-scale reaction chambers, as illustrated in Figure A. The microfluidic synthesis device is programmable and is readily adapted to existing automated synthesizers. During operation, fluid driven by pressure flows into the device through an entrance through-hole, splits evenly into the entry binary distribution channels, flows down the tapered fluid channels, passes through the reaction chambers in parallel, enters the tapered fluid channels on the other side of the reaction chambers, emerges in the exit binary distribution channels, and exits through an exit through-hole. The entirely sealed, positive-pressure mode of operation helps to isolate the reaction chambers from cross-interactions, such as diffusion of in situ generated acid (see next paragraph), and avoids trapping of gas bubbles, solvent evaporation, air-borne contamination and air oxidation. The synthesis device is compatible with the organic solvents used for oligonucleotide synthesis.
The PicoArray is designed to enhance the performance of a novel PGA chemistry (16), which was demonstrated for in situ synthesis of DNA microarray chips on non-wetting film patterned glass plates (17). The synthesis process was carried out on a regular DNA synthesizer equipped with a programmable digital light projector, as described previously (17). The PicoArray microfluidic device was connected to the synthesizer in the same way as a regular synthesis column, and the synthesis was similar to the process described previously (17). In a typical experiment, oligonucleotide synthesis started on silane linker derivatized surfaces that contained DMT-protected hydroxyl groups. The reagent flow rate and the synthesis protocols were optimized for high yield and minimal reaction time. Multiple oligonucleotides were synthesized in parallel by gating the reaction with PGA: in each reaction cycle, light irradiation of selected reaction chambers generated PGA, which removed the DMT group at those sites and thereby allowed nucleophosphoramidite coupling (chain growth). The PicoArray microfluidic device exhibits excellent mass transfer properties. For instance, complete deprotection of the acid-labile group from the 5′-terminus of the growing chain took 2–3 s, compared with the 60 s required for a confined droplet on a glass plate in our previous studies (17). Under the current reaction conditions, reaction cross-talk between adjacent cells is minimal and the composition integrity (sequence fidelity) inside an individual reaction chamber is maintained. The PicoArray synthesis device has a flow-through volume of only 10 μl, and thus the chemical consumption for synthesizing an array of oligonucleotides is comparable with that of a regular DNA oligonucleotide synthesis of a single sequence. Because thousands of oligonucleotides are made simultaneously in a PicoArray synthesis device, this translates into micro-amounts of reagents and solvents on a per-sequence basis.
Modeling of the synthesis
To better understand the results and enhance the quality of the PicoArray synthesis, a computational model was created to simulate the product distribution of oligonucleotides as a function of the efficiencies of three individual reaction steps: deprotection using acid, coupling using nucleophosphoramidite, and capping using acetic anhydride (Scheme ). The synthesis of oligonucleotides consists of repeated cycles, with each cycle performing the three individual reaction steps plus the oxidation reaction. The products considered by the model include the full-length sequence, the sequences terminated with OH (uncapped), the sequences terminated with the protecting group (DMT) and the capped sequences [Cap-O(n − 1), Cap-O(n − 2), etc.], where each category is a mixture of different lengths accumulated over multiple reaction cycles. The model reveals distinct patterns of product distribution as a function of the efficiency of the individual reaction steps (Figure ). These results are used to isolate the factors that most affect the quality of synthesis, especially those that influence the formation of n-1mer oligonucleotides. This is particularly important because n-1mer oligonucleotides are difficult to discriminate from nmer oligonucleotides, which most probably makes them the primary source of errors in the subsequent experiments. The model predicted that high-yield stepwise synthesis is critical to the final amount of full-length sequences (Figure A versus B), but that high-efficiency deprotection is most critical in reducing n-1 and other undesirable impurity sequences, the presence of which significantly reduces the quality of the synthesis. Low-efficiency deprotection would shift the product distribution toward shorter sequences.
For example, a reaction condition with coupling and capping efficiencies of 99.5% and a moderate deprotection efficiency of 97.0% would result in n-1 versus nmer ratios of 0.50 and 1.24 for the synthesis of 16mer and 40mer sequences, respectively, rendering the process unsuitable for making 40mer sequences (Figure E and D). Experimental data shown in Figure A validated the model (Figure F) when poor deprotection efficiency was deliberately created in a synthesis. On the other hand, lower coupling efficiency may be compensated by efficient capping; such a synthesis results in fewer full-length sequences (24.2% compared with 67.0% for an ideal synthesis) but without a significant increase in the n-1 versus nmer ratio (0.21 compared with 0.20 for an ideal synthesis) (Figure B and A). These comparisons clearly illustrate the importance of efficient capping for reducing the formation of n-1mer sequences (Figure C). Using the modeling and product profile analysis as a guide, we optimized the synthesis conditions in the PicoArray reaction device and routinely achieved a stepwise yield of 98.8% or better (Figure B). At this stepwise yield, 58% of 45mer or 70% of 30mer sequences would be full length. Oligonucleotides >100 nt long (30% full length by prediction) were synthesized using the PicoArray device, cloned and verified by sequencing to contain the correct sequences (data not shown).
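The product distributions discussed above can be explored with a small bookkeeping simulation. The following is a minimal sketch, not the authors' program, and its rules are assumptions: every strand starts DMT-protected at length 0; each cycle applies deprotection, coupling and capping in that order with fixed per-step efficiencies; strands that fail to couple and escape capping keep a free OH and may couple in a later cycle; oxidation is treated as quantitative and omitted; and the n-1 versus nmer ratio counts only uncapped strands of length n-1. The function names are illustrative.

```python
from collections import defaultdict


def simulate(n_cycles, deprotect, couple, cap):
    """Return {(length, status): fraction} after n_cycles synthesis cycles.

    status is "DMT" (protected), "OH" (free, uncapped) or "CAP" (capped).
    """
    pop = {(0, "DMT"): 1.0}
    for _ in range(n_cycles):
        nxt = defaultdict(float)
        for (length, status), frac in pop.items():
            if status == "CAP":
                # Capped strands are dead ends.
                nxt[(length, "CAP")] += frac
                continue
            if status == "DMT":
                # Strands that are not deprotected sit out this cycle.
                nxt[(length, "DMT")] += frac * (1 - deprotect)
                free = frac * deprotect
            else:
                free = frac  # already has a free OH
            # Coupling adds one DMT-protected nucleotide.
            nxt[(length + 1, "DMT")] += free * couple
            failed = free * (1 - couple)
            # Capping blocks most failures; the rest stay as free OH.
            nxt[(length, "CAP")] += failed * cap
            nxt[(length, "OH")] += failed * (1 - cap)
        pop = nxt
    return dict(pop)


def full_length(pop, n):
    """Fraction of strands that reached length n."""
    return sum(f for (length, _), f in pop.items() if length == n)


def n_minus_1_ratio(pop, n):
    """Ratio of uncapped (n-1)mers to full-length nmers."""
    short = sum(f for (length, s), f in pop.items()
                if length == n - 1 and s != "CAP")
    return short / full_length(pop, n)


def full_length_from_stepwise_yield(stepwise_yield, n):
    """Full-length fraction when each cycle succeeds with one overall yield."""
    return stepwise_yield ** n


if __name__ == "__main__":
    for n, d, c, a in [(16, 0.97, 0.995, 0.995), (40, 0.97, 0.995, 0.995),
                       (40, 0.995, 0.995, 0.995), (40, 0.995, 0.97, 0.995)]:
        pop = simulate(n, d, c, a)
        print(n, d, c, a, round(full_length(pop, n), 3),
              round(n_minus_1_ratio(pop, n), 2))
```

Under these assumptions the sketch gives values close to those quoted in the text, for example an n-1 versus nmer ratio near 1.24 for a 40mer at 97.0% deprotection, and `full_length_from_stepwise_yield(0.988, 45)` is roughly 0.58. The agreement should be read as a consistency check on the assumed rules, not as a reconstruction of the published model.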
Post-synthesis analysis and reverse hybridization of oligonucleotides
The synthesis results were analyzed using several methods, including CE as described above. Using real-time PCR, we selectively measured several sequences in the cleaved oligonucleotide mixture and derived an average quantity of 3 × 10^9 molecules (or 5 fmol) per reaction chamber. This suggests that a total of 20 pmol of full-length sequences was recovered from each PicoArray synthesis. The overall presence of the synthesized oligonucleotides in the cleaved mixture was verified by hybridizing the oligonucleotide mixture to a detection DNA array chip containing complementary sequences (i.e. oligonucleotide point check), for which the probes have comparable duplex melting temperatures (Tm). In the experiment shown in Figure B, we amplified the cleaved oligonucleotide mixture using PCR (the sequences contain priming sites on both ends) and hybridized the PCR products of both the sense strand (labeled with Cy5 and shown in red) and the antisense strand (labeled with Cy3 and shown in green) to the detection DNA array chip, which contained 1011 probes in duplicate. In a separate experiment, the target duplexes were products of PCR and T7 in vitro transcription of the oligonucleotides synthesized and recovered from the PicoArray reactor. The presence of complete sets of both strands was confirmed by the observation of the expected red and green chessboard pattern, which alternated between hybridized sense and antisense sequences. Nearly 100% of the expected hybridization sites of the cleaved oligonucleotides had intensities above the average background plus three times the standard deviation of the background. The integrity of the individual sequences obtained from PCR was also assessed by hybridization. The complementary sequences (perfect match) were easily distinguished, by their higher intensities, from counterparts that contained mismatched or deleted bases. In separate experiments, the hybridized oligonucleotides can be recovered from the surface under denaturing conditions and used in ligation and PCR reactions. These results demonstrate the potential of massively parallel purification of Tm-equalized oligonucleotides by hybridization selection of high-fidelity sequences under optimized experimental conditions. In a separate paper, Church and co-workers provide further experimental details on the oligonucleotide point check by FAM labeling and on the use of the oligonucleotides in PCR and multiplex gene synthesis; they demonstrate that hybridization purification using specific probes synthesized on the PicoArray reactor efficiently reduces the error rate from ~1/160 to 1/1394 (G. Church, J. Tian, H. Gong, N. Sheng, X. Gao, X. Zhou and E. Gulari, manuscript submitted).
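The quantities in this paragraph are linked by simple arithmetic, which the following sketch makes explicit. It is illustrative only: the helper names are hypothetical, the chamber count is merely what the two quoted amounts imply rather than a figure stated in the text, and the use of the sample standard deviation for the background is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_to_fmol(n_molecules):
    """Convert a molecule count to femtomoles."""
    return n_molecules / AVOGADRO * 1e15


def implied_chambers(total_pmol, per_chamber_fmol):
    """Number of chambers implied by a total and a per-chamber amount."""
    return total_pmol * 1e3 / per_chamber_fmol


def detection_threshold(background):
    """Mean background plus three standard deviations (sample stdev assumed)."""
    return mean(background) + 3 * stdev(background)


if __name__ == "__main__":
    print(molecules_to_fmol(3e9))        # per-chamber amount in fmol
    print(implied_chambers(20, 5))       # chambers implied by 20 pmol total
    # Hypothetical background readings, for illustration only.
    print(detection_threshold([10, 12, 11, 13, 9]))
```

A hybridization site would then be counted as detected when its intensity exceeds the value returned by `detection_threshold` for the background pixels of the same chip.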
Assembling of oligonucleotides
The general strategy of DNA synthesis used in this work is depicted in Figure D; it involves ligation of oligonucleotides containing cohesive ends, followed by fusion PCR. Multiple DNA constructs (200–2000 bp in total) were assembled from mixtures of 30mer or 45mer oligonucleotides obtained using the PicoArray synthesis method. An example of a 1 kb gene sequence and its oligonucleotides is given in the Supplementary Material. The PicoArray oligonucleotides were collected in a 20 μl volume, and the material produced from one synthesis was sufficient for 2–4 ligation reactions. The ligation products were then PCR amplified and visualized by agarose gel electrophoresis (Figure A). The longest DNA constructs assembled directly by ligation were the 714 bp EGFP and 712 bp EYFP DNA fragments. A 1040 bp EGFP gene (Supplementary Material) plus flanking sequences was assembled from overlapping 480 and 580 bp DNA ligation fragments, which were assembled simultaneously and then joined by fusion PCR. Simultaneous assembling of five DNA constructs of a total of 1.2 kb from 80 oligonucleotides from the same PicoArray synthesis was also demonstrated; the longest overall length of these DNA fragments was 2 kb. To explore the potential of large DNA synthesis by the ligation–fusion PCR strategy, experiments were performed using ~700 bp fragments (15 in total: 5 from LacZ, 4 from luciferase, 2 from tubulin, 1 from BGH, 1 from Amp, 1 from YFP and 1 from RsRed) to assemble 2 kb DNA constructs and then a 10 kb DNA construct, which was validated by gel electrophoresis (data not shown). A known difficult gene, DNA gyrase subunit B (1.9 kb, 67.8% AT), was synthesized by ligation and fusion PCR. The synthesized genes discussed here were validated by conventional sequencing.
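To illustrate how a duplex can be divided into oligonucleotides whose annealed products carry cohesive ends, the sketch below tiles a sequence into top-strand oligonucleotides and offset bottom-strand oligonucleotides. This is a generic scheme shown under assumed parameters (30 nt oligonucleotides, 15 nt offset) and with hypothetical function names; it is not the design used for the genes in this work, which is given in the Supplementary Material.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_for_ligation(seq, oligo_len=30, offset=15):
    """Split a duplex into top-strand and offset bottom-strand oligos.

    Top oligos are consecutive segments of `seq`. Bottom oligos cover the
    same duplex shifted by `offset`, so each one bridges a junction between
    two neighbouring top oligos.
    """
    top = [seq[i:i + oligo_len] for i in range(0, len(seq), oligo_len)]
    bounds = [0] + list(range(offset, len(seq), oligo_len)) + [len(seq)]
    bottom = [reverse_complement(seq[a:b])
              for a, b in zip(bounds, bounds[1:]) if b > a]
    return top, bottom


if __name__ == "__main__":
    gene = "ACGT" * 22 + "AC"  # arbitrary 90 nt sequence for illustration
    top, bottom = tile_for_ligation(gene)
    print(len(top), "top oligos;", len(bottom), "bottom oligos")
```

Because every junction on one strand lies inside an oligonucleotide of the other strand, annealing holds neighbouring oligonucleotides in register for ligation. A practical design would also balance the melting temperatures of the overlaps, which this sketch ignores.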
To avoid cross-contamination associated with the PCR reactions used in DNA synthesis, silent mutations of the genetic code were incorporated at various positions of the DNA sequences so that the products of different synthesis experiments could be distinguished from one another. Figure B shows the expression of EGFP from synthetic genes assembled from commercial or PicoArray-synthesized oligonucleotides; 26.8 and 30.0% of the colonies, respectively, contained functional full-length genes and expressed EGFP. The sequencing results gave an error rate of 1.6‰ (‰ is per thousand) for the EGFP gene derived from PicoArray-synthesized oligonucleotides, which is comparable to the 1.7‰ error rate of the synthetic gene assembled from commercial oligonucleotides. On average, the error rate of the synthetic genes based on PicoArray oligonucleotides was found to be between 1 and 5‰ when no post-synthesis purification was applied. In most cases, the error types ranked by frequency were deletion > substitution > insertion, and the errors were distributed more or less randomly. Synthesis efficiency is another source of error; further understanding of the sources of sequence errors would require a large amount of sequencing data accumulated from DNA synthesis. Whereas the attained error rate is sufficient for 1 kb gene synthesis, it must be reduced by a factor of 10 for the synthesis of larger genes.
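The need for a lower error rate at larger gene sizes can be seen from a rough estimate of the fraction of error-free clones. The sketch below assumes that errors occur independently at a uniform per-base rate, which is a simplification of the "somewhat random" distribution reported above; the function name is illustrative.

```python
def error_free_fraction(error_rate, length_bp):
    """Expected fraction of constructs with no error.

    Assumes independent errors at a uniform per-base rate.
    """
    return (1.0 - error_rate) ** length_bp


if __name__ == "__main__":
    for rate, length in [(1.6e-3, 1000), (1.6e-3, 10000), (1.6e-4, 10000)]:
        print(rate, length, error_free_fraction(rate, length))
```

Under this assumption, an error rate of 1.6‰ leaves about one in five 1 kb constructs error-free but almost no 10 kb constructs, whereas a tenfold lower rate restores roughly the same one-in-five fraction at 10 kb.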