Marine cyanobacteria of the genus Prochlorococcus represent numerically dominant photoautotrophs residing throughout the euphotic zones in the open oceans and are major contributors to the global carbon cycle. Prochlorococcus has remained a genetically intractable bacterium due to slow growth rates and low transformation efficiencies using standard techniques. Our recent successes in cloning and genetically engineering the AT-rich, 1.1 Mb Mycoplasma mycoides genome in yeast encouraged us to explore similar methods with Prochlorococcus. Prochlorococcus MED4 has an AT-rich genome, with a GC content of 30.8%, similar to that of Saccharomyces cerevisiae (38%), and contains abundant yeast replication origin consensus sites (ACS) evenly distributed around its 1.66 Mb genome. Unlike Mycoplasma cells, which use the UGA codon for tryptophan, Prochlorococcus uses the standard genetic code. Despite this, we observed no toxic effects of several partial and 15 whole Prochlorococcus MED4 genome clones in S. cerevisiae. Sequencing of a Prochlorococcus genome purified from yeast identified 14 single base pair missense mutations, one frameshift, one single base substitution to a stop codon and one dinucleotide transversion compared to the donor genomic DNA. We thus provide evidence of transformation, replication and maintenance of this 1.66 Mb intact bacterial genome in S. cerevisiae.
Cyanobacteria of the genus Prochlorococcus are estimated to be the most abundant photosynthetic organisms on the planet; they are commonly distributed throughout the euphotic zone at cell concentrations ranging from 10⁴ to 10⁵ cells/ml and are a major contributor to the global carbon cycle. Prochlorococcus species have adapted to nutrient-poor environments and can thrive in high-light as well as low-light conditions. Although only 0.5–1 µm in diameter, it is estimated that Prochlorococcus populations constitute 30–60% of total chlorophyll biomass between latitudes 40°N and 40°S (1–3). Prochlorococcus marinus MED4 has been described as a high-light adapted ecotype (4). It contains the smallest genome of any known oxygenic phototroph [1 657 990 bp encoding 1716 annotated genes].
Genetic tools for manipulation of Prochlorococcus are largely lacking. There are no known natural plasmids and thus far transformation methods such as Polyethylene Glycol (PEG)-mediated transformation or electroporation have not been developed. Lateral gene transfer among Prochlorococcus strains has been observed frequently when mediated by cyanophages and appears to be an important evolutionary mechanism (5). There is one report of insertion of transposable elements into the Prochlorococcus MIT9313 genome by plasmid conjugation; however, Prochlorococcus strains have by and large remained genetically intractable (6–9).
The MED4 strain grows in artificial sea water (Pro99 medium) with a minimal doubling time of 24 h under bright light. Consequently, colonies derived from individual cells develop only after several weeks. The native Prochlorococcus MED4 strain was sequenced by the Joint Genome Institute in August 2003 and deposited as an axenic culture at the National Center for Marine Algae and Microbiota (NCMA, formerly the CCMP) in February 2004. Since then MED4 cultures have not been cryopreserved, but continuously passaged in Pro99 medium (personal communication with NCMA/CCMP curator), thus the MED4 strain has not been maintained in a way that guarantees clonal purity.
Our recent achievements in cloning Mycoplasma genomes into Saccharomyces cerevisiae by joining them to a yeast vector containing a centromere (CEN) and a yeast selectable marker, our ability to then genetically engineer the cloned genome using yeast genetic tools and finally our ability to transplant the engineered genome back into a recipient Mycoplasma cell to bring the engineered genome back to life (10–15) have led us to attempt a similar approach with Prochlorococcus. We describe in this article the successful cloning of the entire P. marinus MED4 genome in yeast and detailed sequence analysis of the bacterial circular chromosome purified from yeast. This success has depended primarily on two properties of the Prochlorococcus genome, AT richness, which results in a large number of consensus yeast replication origins, and lack of toxic gene products, presumably due to the differences in expression mechanisms between yeast and bacteria.
Yeast strains W303-1A (MATa leu2-3,112 trp1-1 can1-100 ura3-1 ade2-1 his3-11,15 ybp1-1) and VL6-48 (MATα, his3-Δ1, trp1-Δ1, ura3-52, lys2, ade2-101, met14) were described previously (14,16). Prochlorococcus marinus MED4 (CCMP1986) was continuously grown in Pro99 medium at 18 ± 1°C on a 14:10 light:dark cycle at 20 µmol Q m−2 s−1 from cool white, fluorescent bulbs. Approximately every 2 weeks, 3 ml of culture was diluted into 25 ml of fresh medium to maintain the strains. Growth of Prochlorococcus cultures was monitored by fluorometric detection of bulk chlorophyll autofluorescence (Turner Design Fluorometer 10-AU, excitation, 436 nm; emission, 680 nm). Cell concentrations of Prochlorococcus cultures were determined by flow cytometry using SYBR Green I fluorescence in a BD FACS Aria-II (488 nm; emission filter at 512 nm).
MED4 cells were harvested from a 30-ml culture grown to a bulk chlorophyll autofluorescence of 50 Relative Fluorescence Units (RFUs) by centrifugation, washed twice with 10 mM Tris, 0.5 M sucrose pH 6.5 buffer, resuspended in 500 µl of buffer, equilibrated for 5 min in a 50°C water bath and combined with 500 µl pre-melted 2.2% low melting point (LMP) agarose at 50°C. An equivalent of 1.5 × 10⁸ cells was divided into 10 plugs, resulting in ~1.5 × 10⁷ cells per plug. Intact genomes were isolated in agarose plugs according to the clamped homogeneous electric fields (CHEF)-DR III Manual (BioRad). After additional washes with 0.1× wash buffer (2 mM Tris, 5 mM EDTA, pH 8) and NEB buffer 2, plugs were digested with SgrDI or RsrII at 37°C overnight (80 U enzyme per plug). Prior to transformation, plugs were melted in 1× TE at 65°C for 10 min and treated with β-agarase (NEB) at 42°C overnight. Concentrations of melted plug DNA were determined to be 20–25 ng/µl by gel electrophoresis of sheared plug DNA and comparison to quantitative ladders. For transformation, 40 µl genomic DNA (equivalent to 0.8–1 µg) was used. For sequencing of wild-type MED4 genomic DNA, plugs were phenol-extracted after melting and agarase treatment.
The 10-kb trishuttle pmycYACTn vector (GU593054) contains a yeast centromere (CEN6), a yeast selectable marker (HIS3) and the yeast replication origin (ARSH4) (14). We amplified this vector with primers that contained 20 bp of sequence flanking the replication origin (so as to exclude it from the PCR product) and 60 bp of sequence to either side of the unique genomic SgrDI restriction site. The PCR product was 9862 bp long; 10 ng of vector was used as template in a 100-µl reaction with ExTaq (TaKaRa) at final concentrations of 5 U DNA polymerase, 2 mM MgCl2, 0.2 µM each primer and 0.2 mM of each dNTP. Cycling conditions were as follows: 94°C for 3 min; followed by five cycles of 94°C for 30 s, 50°C for 30 s and 68°C for 8 min; followed by 30 cycles of 94°C for 30 s, 62°C for 30 s and 68°C for 8 min, with a final extension of 68°C for 5 min. Each PCR vector product was gel purified on 0.8% LMP agarose gels using β-agarase, phenol extraction and ethanol precipitation to remove remaining background circular vector template. Later experiments included transformations with vector DNA products treated with DpnI prior to transformation to digest any remaining GATC-methylated circular vector DNA derived from dam+ Escherichia coli stocks of plasmid pmycYACTn. In this case, PCR products were first digested with 20 U DpnI for 1 h prior to gel purification. For insertion at the single-cut site SgrDI [nt 318371], primers with the following sequences were used (the 3′-terminal 20 nt of each primer are homologous to the vector template): TCTAAGTCTCTAAAGTCACTAAAGGTTGTTCCCCCTGCACCAATACTTTTTTTTAAAACGTTCCATCATTAAAAGATACG and CTTTTAGAGAGGCTAGAACAATTAATAAAAATGAAATAAAAAGACTTAGAAGAGCTGTCGAGCAAGATAAAAGGTAGTAT.
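The primer layout described above (60 nt of genomic homology followed by 20 nt that anneal to the vector template) can be checked directly against the sequences listed for the SgrDI site. The following is a minimal Python sketch; the dictionary keys and the helper `split_primer` are illustrative names and are not part of the original protocol.

```python
# Primer sequences copied from the text (SgrDI insertion site).
# The keys "SgrDI_p1" / "SgrDI_p2" are illustrative labels (order as listed).
PRIMERS = {
    "SgrDI_p1": "TCTAAGTCTCTAAAGTCACTAAAGGTTGTTCCCCCTGCACCAATACTTTTTTTTAAAACGTTCCATCATTAAAAGATACG",
    "SgrDI_p2": "CTTTTAGAGAGGCTAGAACAATTAATAAAAATGAAATAAAAAGACTTAGAAGAGCTGTCGAGCAAGATAAAAGGTAGTAT",
}

GENOME_ARM = 60  # nt homologous to the Prochlorococcus genome (from the text)
VECTOR_ARM = 20  # nt annealing to the pmycYACTn template (from the text)


def split_primer(primer):
    """Split an 80-nt primer into (genomic homology arm, vector-annealing 3' end)."""
    if len(primer) != GENOME_ARM + VECTOR_ARM:
        raise ValueError("expected an 80-nt primer")
    return primer[:GENOME_ARM], primer[GENOME_ARM:]


for name, seq in PRIMERS.items():
    genome_arm, vector_arm = split_primer(seq)
    print(f"{name}: genome arm starts {genome_arm[:12]}..., vector arm {vector_arm}")
```

Only the 60-nt genomic arm changes from one insertion site to another; the 20-nt 3′ end that primes on the vector stays the same for a given side of the vector.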
For insertion at RsrII sites, pmycYACTn was amplified using primers ATCCAACTATTGTCCAAGTCTGATAAGTTCTTGACTGCTGCCCAACCCATGTGCCAGTCGTTCCATCATTAAAAGATACG and AATCTTATTGGTGTCAACTTAATCAATTACCAAAACAAGAGTGGGCAGAATATTTTGACGAGCAAGATAAAAGGTAGTAT for RsrII site 1 [nt 311110] and primers CACATGCTCCAAGGCTTTTAAGTTTTAGGATTGATCCAGAACTTATAGGTACTGTTATCGTTCCATCATTAAAAGATACG and CTTCAATATCTATTTTTGTATTAGTTCTTTCAGTAATTCCTTTAATAGTTCTCCCTCCCGAGCAAGATAAAAGGTAGTAT for RsrII site 2 [nt 1138584].
Yeast transformations were performed by a spheroplast PEG transformation protocol as described (16) except cultures were grown to OD600 1.5. Per 1 µg genomic DNA isolated from agarose plugs (1 µg ≈ 1 fmol Prochlorococcus genome), a 30× molar amount of linear vector PCR product (200 ng ≈ 30 fmol vector) was added for transformations of whole genomes in a final volume of 40 µl. Transformants were plated and incubated for 2–3 days at 30°C on selective plates (lacking histidine). Colonies were picked and repatched onto fresh minus histidine plates to avoid false-positive PCR results due to high amounts of genomic DNA on the original plates. DNA was extracted from ~10⁷ repatched yeast cells suspended in 250 µl P1 buffer as described previously (17,18). The DNA was screened by PCR amplification of the vector insertion site. Primers were designed to give a 500-bp product at the 5′-vector junction. The 3′-vector junction was confirmed by primers yielding a 1000-bp product (Supplementary Table S8, Supplementary Figure S2). Clones were confirmed for completeness by multiplex PCR. Additionally, three amplicons were designed to test for the HIS3, lacZ and TetM sequences of the yeast vector (multiplex primers are listed in Supplementary Tables S9–S11).
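The mass-to-mole conversion behind the 1:30 genome:vector ratio can be reproduced with a short calculation. This is a sketch only: it assumes an average molar mass of 650 g/mol per base pair of double-stranded DNA (a common approximation, not a value given in the text), and the function name `dna_fmol` is illustrative.

```python
# Assumption: average molar mass of one base pair of dsDNA, in g/mol.
AVG_BP_MASS = 650.0


def dna_fmol(mass_ng, length_bp, bp_mass=AVG_BP_MASS):
    """Convert a mass of double-stranded DNA (ng) into femtomoles of molecules."""
    grams = mass_ng * 1e-9
    moles = grams / (length_bp * bp_mass)
    return moles * 1e15


genome_fmol = dna_fmol(1000, 1_657_990)  # 1 µg of the 1 657 990 bp MED4 genome
vector_fmol = dna_fmol(200, 9862)        # 200 ng of the 9862 bp vector PCR product

print(f"genome: {genome_fmol:.2f} fmol")
print(f"vector: {vector_fmol:.1f} fmol")
print(f"vector:genome molar ratio: {vector_fmol / genome_fmol:.0f}:1")
```

Under this approximation the two amounts come to roughly 0.9 fmol of genome and 31 fmol of vector, in line with the rounded figures of 1 fmol and 30 fmol quoted above.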
Total DNA from individual yeast clones containing Prochlorococcus genomes was isolated in agarose plugs as described (11,17). Plugs were first digested with AscI, FseI, MreI and SfiI. These enzymes cleave yeast chromosomes but do not have recognition sites in the Prochlorococcus genome. To remove linear yeast chromosomal DNA fragments, plugs were pre-electrophoresed at constant voltage (5–6 V/cm) for 5 h at 4°C. Plugs were extracted from the first gel and then digested at the single-cut NotI site (only present in the vector) or the two RsrII sites for 4 h, and sized by pulsed-field electrophoresis. Genomic DNA for sequencing of MED4 clone 2–19 was derived from 10 melted and phenol-extracted yeast plugs after digestion of yeast DNA and purification by electrophoresis as described above.
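Whether an enzyme cuts a circular genome can be checked in silico by counting its recognition sites. The sketch below is illustrative only: the recognition sequences are the ones commonly catalogued for these enzymes (an assumption drawn from general knowledge, not from the text), the helper `count_sites` is a hypothetical name, and the toy sequence stands in for a real genome file.

```python
import re

# IUPAC nucleotide codes translated to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

# Assumption: recognition sequences as commonly catalogued for these enzymes.
# All are palindromic, so scanning one strand is sufficient.
ENZYMES = {
    "AscI": "GGCGCGCC",
    "FseI": "GGCCGGCC",
    "MreI": "CGCCGGCG",
    "SfiI": "GGCCNNNNNGGCC",
    "NotI": "GCGGCCGC",
    "RsrII": "CGGWCCG",
    "SgrDI": "CGTCGACG",
}


def count_sites(sequence, site, circular=True):
    """Count (possibly overlapping) occurrences of an IUPAC site in a sequence."""
    seq = sequence.upper()
    if circular:
        # Append the first len(site)-1 bases so sites spanning the origin are found.
        seq += seq[: len(site) - 1]
    pattern = "(?=" + "".join(IUPAC[base] for base in site) + ")"
    return sum(1 for _ in re.finditer(pattern, seq))


# Toy circular sequence: one RsrII site inside, one NotI site spanning the origin.
toy = "GCCGCAAACGGACCGAAAGCG"
for enzyme, site in ENZYMES.items():
    print(enzyme, count_sites(toy, site))
```

Applied to a real genome sequence, a count of zero for AscI, FseI, MreI and SfiI and a count of two for RsrII would match the digestion strategy described above.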
Both genomes were sequenced by 454 FLX pyrosequencing and Illumina Solexa sequencing at the Joint Technology Center, J. Craig Venter Institute, Rockville, MD. Approximately 5 µg of genomic DNA was used for library construction. Sequence reads from each genome were directly mapped to the 2003 reference genome (BX548174) using CLC Bio Genomics Workbench software version 5.1. To exclude any large insertions or gaps in the sequencing results of the MED4 genomes, de novo assembly of the native genomic DNA reads and the yeast clone reads from both platforms was carried out using MUMmer 3.23 (19). Single nucleotide polymorphism (SNP) and DIP (deletion/insertion polymorphism) detection with both software packages resulted in the same variants described below. For further SNP analysis, primers were designed to amplify open reading frames (ORFs) of genes PMM0039, PMM0214, PMM0844, PMM1600 and PMM1294 from genomic DNA isolated from yeast clone agarose plugs. PCR products were purified using PureLink® PCR Purification Kits (Invitrogen™) and screening of these candidate genes from 15 whole-genome clones in yeast was subsequently carried out by Sanger sequencing using either the 5′- or 3′-primer used for amplification.
Doubling times were determined by measuring the OD600 of exponentially growing cultures in media lacking histidine at 1-h intervals for at least three generations. Growth rate measurements were repeated three times for each culture. Standard deviations were calculated from the average doubling time differences obtained from each day. To compare growth of whole Prochlorococcus genome clones in yeast, clones 1–13 (first transformation), 2–2 and 2–19 (second transformation) were grown in parallel to a W303-1A strain containing only the cloning vector pmycYACTn (Supplementary Chart S1). The same four yeast clones were spotted as equal cell amounts on -HIS, -HIS-LYS, -HIS-URA, -HIS-LEU and -HIS-TRP drop-out plates to determine whether the bacterial genome would complement the yeast host marker mutations of strain W303-1A (Supplementary Figure S6).
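A doubling time can be estimated from hourly OD600 readings by fitting a straight line to the logarithm of the readings. The sketch below assumes ordinary least squares on ln(OD600) versus time, which is one common approach; the text does not state which fitting method was used, and the readings shown are synthetic values chosen for illustration.

```python
import math


def doubling_time(times_h, od600):
    """Estimate doubling time (h) from a least-squares fit of ln(OD600) against time."""
    logs = [math.log(value) for value in od600]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_l = sum(logs) / n
    slope = (
        sum((t - mean_t) * (l - mean_l) for t, l in zip(times_h, logs))
        / sum((t - mean_t) ** 2 for t in times_h)
    )
    # slope is the specific growth rate (per hour); doubling time = ln 2 / rate
    return math.log(2) / slope


# Synthetic example: a culture starting at OD600 0.1 that doubles every 1.5 h,
# sampled at 1-h intervals.
times = [0, 1, 2, 3, 4, 5]
readings = [0.1 * 2 ** (t / 1.5) for t in times]
print(f"doubling time: {doubling_time(times, readings):.2f} h")
```

Only readings from the exponential phase should be passed in, since the log-linear relationship breaks down as the culture approaches saturation.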
Replication origins in S. cerevisiae are termed autonomously replicating sequence (ARS), and contain ARS consensus sequences (ACS) that are essential for binding of origin replication complexes. An 11-bp ACS consensus sequence (A/T)TTTA(T/C)(A/G)TTT(A/T) has been proposed (20). Additionally, a 17-bp consensus sequence or extended ACS (EACS) was determined by microarray analysis which combined evolutionary conserved ARS sequences as (A/T)(A/T)(A/T)(A/T)TTTA(T/C)(A/G)TTT(A/T)GTT (21–23). Although yeast replication has been studied extensively, it is difficult to fully define a functional replication origin. While a functional ARS generally contains an exact match, or is very similar to the 11-bp consensus sites, and contains at least one A/T-rich region of DNA functioning as an unwinding element, specific binding of replication complexes relies on varying neighboring sequence modules (24,25). We expected that the AT-rich (30.8% GC-content) Prochlorococcus genome would contain a high number of ACS sites. Using motif searches, we identified 328 11-bp ACS and 4 17-bp ACS sites in the genome. The maximum gap between two sites was 34.5 kb (Figure 1). We consider that this type of motif search resulting in an even distribution of ACS sites might be used as a proxy to determine whether a bacterial chromosome will be maintained in yeast. Analysis of previously cloned genomes of Mycoplasma genitalium, Mycoplasma pneumoniae M129, Mycoplasma mycoides (14) and Acholeplasma laidlawii PG-8A (18) showed similar distributions of ACS sites (Supplementary Figure S1). A more recent study showed that fragments with higher GC-content (55%) derived from Synechococcus elongatus PCC7942 and lacking ACS sites could be stably maintained as a 454-kb plasmid in yeast after ARS sequences were inserted at intervals of <150 kb (26).
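A motif search of this kind can be sketched with regular expressions. The example below encodes the two consensus sequences quoted above, scans both strands of a circular sequence and reports the largest gap between neighboring sites. It is a sketch under stated assumptions: the text does not say which software was used or whether both strands were searched, the function names are illustrative, and the toy sequence stands in for the real genome.

```python
import re

# Consensus motifs from the text, written as regular expressions with their lengths.
MOTIFS = {
    "ACS_11": ("[AT]TTTA[TC][AG]TTT[AT]", 11),
    "EACS_17": ("[AT]{4}TTTA[TC][AG]TTT[AT]GTT", 17),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(genome, pattern, length):
    """Return sorted top-strand start positions of motif matches on either strand
    of a circular genome (assumption: both strands are searched)."""
    genome = genome.upper()
    # Extend by length-1 bases so matches spanning the origin are found.
    extended = genome + genome[: length - 1]
    lookahead = re.compile("(?=" + pattern + ")")
    starts = {m.start() for m in lookahead.finditer(extended)}
    rc = reverse_complement(extended)
    # Convert bottom-strand match positions back to top-strand coordinates.
    starts |= {len(extended) - m.start() - length for m in lookahead.finditer(rc)}
    return sorted(starts)


def max_circular_gap(starts, genome_length):
    """Largest distance between consecutive sites, including the wrap-around gap."""
    if not starts:
        return genome_length
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    gaps.append(starts[0] + genome_length - starts[-1])
    return max(gaps)


# Toy circular sequence with one ACS on each strand.
site = "ATTTATGTTTA"
toy_genome = "G" * 20 + site + "C" * 30 + reverse_complement(site) + "G" * 10
pattern, length = MOTIFS["ACS_11"]
sites = find_sites(toy_genome, pattern, length)
print(sites, max_circular_gap(sites, len(toy_genome)))
```

For a real genome, the number of sites and the maximum gap returned by such a scan are the two quantities reported above (328 sites and 34.5 kb for the 11-bp motif), although exact counts depend on the strand and overlap conventions chosen.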
We initially attempted cloning SgrDI linearized whole-genome fragments into yeast using gel-purified genomic DNA. Agarose plugs digested with SgrDI were run on LMP agarose CHEF gels and 1.66 Mb bands were isolated for transformation. Various genomic DNA:vector ratios were tested and although up to 19 colonies were obtained per transformation reaction none yielded Prochlorococcus DNA clones (Supplementary Table S4).
Subsequent transformations used genomic DNA digested with RsrII or SgrDI while in the plug. The cleaved DNA was then recovered from the plug and used directly for transformations. This yielded RsrII half genomes (clone 1–17, 675 kb and clone 1–V12, 580 kb) and whole-genome clones derived from SgrDI digests. Additionally, these only appeared in our screen for transformations using a genomic DNA:vector molar ratio of 1:30, while 1:3 and 1:10 ratios yielded no Prochlorococcus clones (Figure 2, Supplementary Table S5).
All yeast clones that showed evidence of one or more multiplex amplicons specific for MED4 were cultured to saturation in selective medium. DNA was isolated in agarose plugs and examined for size by restriction digestion followed by CHEF gel electrophoresis (Figure 3). We presumed some clones recombined into smaller circles due to fragmented genomic DNA derived from the Prochlorococcus agarose plugs (Supplementary Figures S2 and S3).
A third transformation experiment yielded a number of partial clones containing a range of multiplex amplicons as well as two more whole-genome clones (Supplementary Figures S3 and S4). Transformations with DpnI-digested vector product seemed to significantly reduce the background (Supplementary Table S6).
A fourth transformation experiment yielded a higher cloning efficiency resulting in 12 whole-genome clones out of 16 yeast colonies. All transformations using a DpnI-treated vector resulted in whole-genome clones (Supplementary Table S7, Supplementary Figure S5). Some background colonies can arise from recircularization of the linear vector PCR product itself or of the linear vector possibly inserting into yeast genomic DNA to generate histidine prototrophy. Though each transformation was carried out with a 1:30 gDNA:vector ratio, it is difficult to predict the number of intact genomes per plug that contribute to a higher yield of whole clones in yeast. As such, these ratios are only a general approximation to follow when cloning genomic DNA.
The native Prochlorococcus MED4 strain was sequenced by the Joint Genome Institute in August 2003 and was deposited as an axenic culture at the NCMA in February 2004. Since then, MED4 cultures have not been cryopreserved, but have been continuously passaged in Pro99 medium (personal communication with NCMA/CCMP curator). We sequenced the native strain obtained from the culture collection to determine any spontaneous mutations that may have occurred and been fixed at high frequency in the population over the last 8 years. Of 108 496 sequencing reads obtained by 454 sequencing, 108 310 reads mapped to cover the 1 657 990 bp native GenBank reference sequence (BX548174). Of the subsequent 35 672 668 Illumina reads, 35 530 402 matched to the GenBank reference. Combined with de novo assembly of all reads, we identified 27 mutations relative to the GenBank reference sequence, 24 of which were single base pair variants (Supplementary Table S1).
A putative sulfate transporter gene (PMM0214) showed four missense mutations. Other non-synonymous SNPs were found in the transcription factor NtcA, the (ppGpp)ase SpoT, the Photosystem II gene PsbC, cytochrome cM, a Na+/H+ antiporter and molecular chaperones DnaJ and DnaK. Osburne et al. (27) resequenced a wild-type Prochlorococcus MED4 strain in 2008. The Illumina sequencing results of that study showed 17 variants compared to the 2003 reference sequence. Six of these were identical to variants found in our sequencing results (Supplementary Table S1). Serial passaging of the native MED4 strain over the 8 years since its deposit at the NCMA/CCMP in 2004 has produced a remarkably small number of variations. We assume that the native cultures analyzed in the above sequencing projects are a mix of clones and therefore one cannot attribute any single or all identified mutations to one clone.
Sequencing data from the Prochlorococcus genome cloned in yeast (clone 2–19) showed a high yeast genome background even after gel purification of the linear yeast chromosomes from the circular bacterial genome topologically trapped in genomic DNA agarose plugs. Of the 792 302 total reads obtained by 454 sequencing, 288 853 reads mapped to the bacterial genome with an average coverage of 50× along the complete 1 657 990 bp GenBank reference. Nearly all of the 503 449 unmapped reads can be attributed to the high yeast genomic DNA background: 492 710 reads (about 62% of all reads) mapped to S. cerevisiae chromosomes. Of the remaining 10 739 reads, 1927 mapped to the cloning vector sequence inserted at the SgrDI site as designed. With Illumina sequencing, 3 350 748 out of 40 370 844 reads mapped to the Prochlorococcus GenBank reference genome and covered it completely, with an average 250× coverage.
The inserted vector sequence was found to contain 40 single base pair alterations, presumably introduced during PCR amplification rather than during transformation into yeast. Among SNPs identified, the ampicillin resistance gene showed one glutamine to glycine mutation, lacZ had three amino acid substitutions and the tetracycline resistance gene had 12 amino acid substitutions (Supplementary Table S2).
The cloned genome contained 27 mutational changes (24 were SNPs) compared to our sequence of the native Prochlorococcus genome. Differences are summarized in Supplementary Table S3. Nine SNPs represent positions at which the yeast clone is identical to the original 2003 reference genome sequence. Our yeast MED4 clone 2–19 represents a single genome from the MED4 culture at the time of harvest. Therefore, these nine SNPs most likely represent sequence heterogeneity at these positions within the culture, rather than reverse mutations to the original 2003 reference genome. Sixteen further single base pair variations were identified in the 2–19 yeast clone. Six represent silent mutations or were found in intergenic regions, eight SNPs cause amino acid substitutions, one is an ORF frameshift mutation and one is a mutation to a stop codon which causes an ORF truncation. An additional dinucleotide substitution (switching threonine to valine) and one amino acid change induced by a SNP occurred in the previously described sulfate transporter gene (PMM0214), which was found to be mutated at four different sites in the native genome. Another transporter gene (PMM1600, a putative Na+/H+ antiporter), mutated in the native genome analysis (two SNPs, see Supplementary Table S1), was also mutated in the yeast clone. Further missense point mutations were identified in a possible type I restriction–modification enzyme, a putative modulator of DNA gyrase (TldD), a seryl-tRNA synthetase (SerS) and in the RNA polymerase β-subunit (RpoB).
Each of the cloning experiments described above was derived from a separate genomic DNA preparation of the native culture. We obtained 1 whole-genome clone in our first successful transformation (clone 1–13), 2 more in our second attempt (clones 2–2 and 2–19) and 12 more in our third attempt (clones 3–2, 3–3, 3–7 to 3–16). These clones were screened for some of the SNPs identified in the whole-genome sequencing of yeast clone 2–19. Five candidate genes were chosen that showed mutations in clone 2–19 compared to the native sequencing data (highlighted in Supplementary Table S3). PCR amplification from genomic DNA prepared from the native Prochlorococcus culture and the 15 different MED4 whole-genome yeast clones was carried out for genes PMM0039, PMM0214, PMM0844, PMM1600 and PMM1294 (Table 1, Supplementary Figure S5). Sanger sequencing of the respective PCR products from the native culture detected the same genotype as that identified by 454 and Illumina sequencing of the initial native bacterial culture DNA preparation. The five PCR products from yeast clone 2–19 also showed the same mutations as the whole-genome sequencing data (compare Supplementary Table S3 and Table 1). The remaining yeast clones (1–13, 2–2, 3–2, 3–3, 3–7 to 3–16) contained, at each locus, either the original native sequence or the mutation identified in clone 2–19 (Table 1). Therefore, the identified insertions, transversions or SNPs do not represent loss-of-function mutations arising upon cloning into the yeast host cells.
Our results are consistent with the existence of a mixture of clones in the original MED4 culture. Prior analysis of yeast clone sequences from native M. mycoides (24% GC-content) or M. genitalium (32% GC-content) genome assemblies transformed into yeast using the same protocol showed no mutations compared to the donor DNA. Furthermore, the M. mycoides genome isolated out of yeast was transplanted and, aside from changes that were engineered in yeast, its sequence was identical to that of the molecule that was initially cloned into yeast (12,13).
The three whole-genome Prochlorococcus yeast clones 1–13, 2–2 and 2–19 were stored frozen in 25% glycerol at −80°C. They were repatched on selective plates (lacking histidine), passaged in liquid medium (lacking histidine) and analyzed by multiplex PCR and CHEF gels. The banding patterns were the same for all three clones (Supplementary Figures S3 and S4). Growth rates of the whole-genome clones were compared in three separate experiments to the wild-type yeast W303-1A strain carrying only the circular cloning vector. Doubling times of the three clones compared to the plasmid clone were nearly identical (Supplementary Chart S1). Additionally, Prochlorococcus genomes were stably maintained not only in cultures grown in liquid media (lacking histidine) to late stationary phase but were also detectable by multiplex or pulsed-field gel analysis in cells grown in non-selective yeast extract peptone dextrose (YPD) complex medium to OD600 cell densities of 2–2.2 (Supplementary Figure S4B).
Cell spotting of a yeast control strain containing only the pmycYACTnNotI vector and spotting of Prochlorococcus yeast clones 1–13, 2–2 and 2–19 on plates lacking histidine, and uracil, leucine or tryptophan (standard marker mutations in the host strain W303-1A) yielded no growth. Growth was normal on plates lacking only histidine or histidine and lysine, thus the bacterial genes for uracil, leucine and tryptophan synthesis do not complement the yeast host genotype (Supplementary Figure S6).
To our knowledge this is the first report of the cloning of a gram-negative bacterial genome in S. cerevisiae and the second report of cloning a complete circular genome that uses the standard genetic code (18). DNA sequence analysis showed that the complete MED4 genome was present in the yeast clones. We observed a few mutational differences (primarily single base substitutions) between two sequences of the native Prochlorococcus MED4 genome and a yeast clone of that genome. However, because MED4 is maintained by serial passage without clonal isolation, there is no reason to believe that any of these differences arose by mutation during propagation in yeast.
There are two major reasons that the Prochlorococcus genome may be sustainable in yeast. First, it has a high incidence of ACS sites due to its high AT content, and these may be active as yeast origins of replication. Second, it is likely that most of the Prochlorococcus genes are not accurately expressed in yeast due to differences between the mechanisms of prokaryotic and eukaryotic transcription and translation. Prokaryotic promoters generally contain defined −35 and −10 regions, typically within 50 bp of the start of a gene. However, transcription in cyanobacteria may be more complicated due to additional RNA polymerase subunits γ and δ. Analysis of transcription start sites in Prochlorococcus MED4 has shown similar upstream elements to those of yeast (e.g. TATAAT or TATTAT) (28–30). Thus, transcription initiation of the prokaryotic genome in yeast might be possible. However, other studies involving vaccine development in yeast have shown that transcripts of foreign AT-rich genes are prematurely polyadenylated at unexpected sites (31). The native 1400 bp fragment C gene, a subunit of the tetanus toxin derived from the bacterium Clostridium tetani (GC content 28.6%), could not be expressed in its entirety from a yeast GAL promoter until a synthetic version was engineered with a GC content of 47%. Judging from previous studies expressing constructs up to 3.5 kb, sequences with a GC content below 40% resulted in truncated mRNA transcripts (32).
Translation of Prochlorococcus transcripts would require a high degree of similarity between the ribosome binding sites (Shine–Dalgarno sequences) and the yeast translation initiator consensus sites. This does not seem to be the case for cyanobacteria and yeast. Shine–Dalgarno (S–D) sequences typically have a consensus sequence GGAGG or a degenerate version of this which is within 5–10 bp upstream of the translation start codon. Cyanobacterial and chloroplast genes show very few ribosome binding sites in their 5′-UTR regions that can be classified as S–D consensus sequences even though the 16S rRNA in these organisms still retains the sequence (CCUCC) complementary to the S–D consensus sequence (33,34). While the translation initiation site in polycistronic prokaryotic mRNAs is usually selected by base pairing of the S–D sequence with ribosomal RNA, initiation in eukaryotes is facilitated by a scanning mechanism whereby the small 40S ribosomal subunit and additional co-factors (eIF2, eIF3, eIF4C, Met-tRNAi and guanosine-5′-triphosphate (GTP)) bind the 5′-cap structure of the mRNA and then migrate down the untranslated leader sequence scanning for the first AUG codon (35,36). Translation initiation of prokaryotic mRNA in yeast could therefore be misplaced toward downstream AUG codons.
Additionally, there may be translational differences due to differences in codon usage between Prochlorococcus and yeast. Although the initiation step of mRNA translation is considered rate-limiting, codon usage is known to affect elongation rate and can be a limiting factor in product yield. Codon usage tables showed a bias in codon usage between Prochlorococcus MED4 and S. cerevisiae. For example, MED4 uses the glycine codon GGA and the leucine codon TTA twice as often as yeast does (Supplementary Figure S7).
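A codon usage comparison of this kind amounts to counting codons in annotated coding sequences and normalizing the counts. The sketch below computes frequencies per 1000 codons and the ratio between two organisms for a given codon. The function names are illustrative and the two short sequences are invented toy inputs, not real MED4 or yeast genes.

```python
from collections import Counter


def codon_frequencies(cds_list):
    """Return codon frequencies per 1000 codons over in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        counts.update(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


def usage_ratio(freq_a, freq_b, codon):
    """Ratio of a codon's frequency in organism A to that in organism B."""
    a = freq_a.get(codon, 0.0)
    b = freq_b.get(codon, 0.0)
    return a / b if b else float("inf")


# Invented toy coding sequences, for illustration only.
toy_a = ["ATGGGAGGATTATAA"]      # codons: ATG GGA GGA TTA TAA
toy_b = ["ATGGGAGGTTTGTTATAA"]   # codons: ATG GGA GGT TTG TTA TAA

freq_a = codon_frequencies(toy_a)
freq_b = codon_frequencies(toy_b)
for codon in ("GGA", "TTA"):
    print(codon, round(usage_ratio(freq_a, freq_b, codon), 2))
```

With genome-wide sets of coding sequences as input, a ratio near 2 for GGA and TTA would correspond to the bias described above.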
We did not observe a toxic effect as far as growth rates of these yeast clones are concerned. Deep sequencing analysis confirmed the presence of all Prochlorococcus genes inside the yeast host; however, several bacterial genes did not complement the yeast host genotype. We conclude that there is limited expression of Prochlorococcus genes in yeast and that the Prochlorococcus genome is stably maintained under selective conditions as observed for other bacterial genomes cloned in yeast thus far. We see these results as a proof of concept that bacteria using the standard genetic code which remain difficult to isolate or transform can be cloned and maintained in yeast. With the growing number of whole-genome sequences currently available, expanding the range of bacteria that can be co-transformed into yeast is a valuable stepping stone to the study of genetically intractable species.
Supplementary Data are available at NAR Online: Supplementary Tables 1–11, Supplementary Figures 1–7, Supplementary Chart 1 and Supplementary References [37,38].
Synthetic Genomics Inc. (SGI); all authors were supported by SGI (in part); Natural Sciences and Engineering Research Council of Canada (NSERC, Postdoctoral Fellowship) and SGI [to B.J.K.]. Funding for open access charge: JCVI.
Conflict of interest statement. J.C.V. is a Chief Executive Officer and Co-Chief Scientific Officer of Synthetic Genomics Inc. (SGI). H.O.S. is a Co-Chief Scientific Officer and a member of the Board of Directors of SGI. C.A.H. is a Chairman of the SGI Scientific Advisory Board. D.G.G. is a Principal Scientist at SGI. All four of these authors and the J. Craig Venter Institute hold SGI stock.
We thank Yo Suzuki and Philip D. Weyman for critically reading the article and Sanjay Vashee, Mikkel Algire, Radha Krishnakumar, Vladimir Noskov and Chuck Merryman for helpful discussions and suggestions.