As part of a broader research program in insect glycobiology, we undertook efforts to clone a core α1,3 fucosyltransferase gene from the lepidopteran insect, Trichoplusia ni. A previously described degenerate PCR approach [16] was used to successfully isolate an internal fragment of this gene (R.L. Harrison and D.L. Jarvis, unpublished results), which was designated TnFT3. This fragment was cloned and sequenced and the nucleotide sequence data were used to design subsequent 3′-RACE experiments, which yielded a putative 3′ nucleotide sequence for the TnFT3 gene (data not shown). In stark contrast, however, multiple attempts to use a variety of previously described 5′-RACE methods failed to amplify the 5′ end of this gene.
Fortunately, some of these latter experiments yielded partial 5′-RACE products, which extended the TnFT3 sequence by a small distance in the 5′ direction. These results revealed that the presumed gene-specific sequence ended abruptly with short, highly GC-rich sequences, followed by sequences that bore no recognizable resemblance to known core α1,3 fucosyltransferase sequences (data not shown). These results suggested that the 5′ end of the TnFT3 gene might have an extremely high GC content, which allowed the mRNA template to form a highly stable hairpin structure(s) that was not accessible to the reverse transcriptase under any conditions utilized to this point in our study. Thus, the reverse transcriptase would copy the TnFT3 mRNA until encountering the hairpin(s), at which point it would jump to template sequences located upstream of the hairpin(s). Presumably, these latter template sequences would be far enough upstream of the coding and 5′-untranslated regions tied up in secondary structure(s) to be unrecognizable as core α1,3 fucosyltransferase sequences in our data analyses.
In theory, 5′-RACE can fail when applied to extremely GC-rich genes due to problems with either the first-strand cDNA synthesis reaction or the subsequent PCR amplification. In practice, first strand cDNA synthesis problems are usually assumed to be the major reason for 5′-RACE failures, and this was the focus of our troubleshooting efforts. Because conventional reverse transcriptases are not thermostable, the reverse transcription reactions are usually performed at a relatively low temperature (42 °C), which cannot melt secondary structures that can form in GC-rich mRNAs. We attempted to address this problem by using recently commercialized thermostable reverse transcriptases for first strand cDNA synthesis reactions at 50 °C (SuperScriptIII™) or 65 °C (Thermo-X™), but with no success (data not shown). Thus, we turned to homoectoine, which has an exceptional ability to decrease the melting temperature of DNA and has been used as a potent PCR enhancer [17]. It seemed likely that homoectoine might be able to decrease the melting temperature of our extremely GC-rich mRNA template and, if it could do so without inactivating the reverse transcriptase, it should facilitate the reverse transcription step in our 5′-RACE method.
We examined this possibility by performing a first strand cDNA synthesis reaction using Thermo-X™ reverse transcriptase at 70 °C in the presence of 0.4875 M homoectoine, as described in Materials and Methods and illustrated in . To anchor the 3′ end of the first strand cDNA for subsequent PCR amplification steps, we used an adaptor ligation approach similar to a previously described single-strand linker ligation method (), which had been originally developed for full-length cDNA library construction [18]. The sense strand of the adaptor (SA) included a 3′ overhang sequence (GNNNNN) designed to anneal to the 3′ end of the first strand cDNA and the antisense strand of the adaptor (ASA) included a free 5′ phosphate for subsequent ligation of the adaptor molecule to the first strand cDNA. It should be noted that this method was not designed to selectively amplify only full-length templates, as the adaptor could theoretically be linked to any degraded first strand cDNA molecule with a 3′-terminal deoxycytidine residue. We minimized this possibility by using great care to prevent degradation during the isolation of mRNA. Expecting an extremely high GC content in the 3′ end of the first strand cDNA product, we also incorporated special denaturation measures into our method prior to the adaptor addition. Specifically, the first strand cDNA was heated for 20 min at 65 °C in a solution of 0.3 N NaOH. The denatured cDNA was then ethanol precipitated, directly re-dissolved in the adaptor solution, and the DNAs were ligated, as described in Materials and Methods and illustrated in .
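The annealing requirement described above can be illustrated with a short sketch. It assumes that the overhang is written 5′→3′ as GNNNNN, so that its G pairs with a 3′-terminal deoxycytidine on the first strand cDNA and each N pairs with any base. The function names and the example sequences are hypothetical and are not taken from our protocol.

```python
# Watson-Crick complements; N (any base) is treated as its own complement.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def can_anneal(cdna, overhang="GNNNNN"):
    """Return True if the 3' end of `cdna` could base-pair with `overhang`.

    Both sequences are written 5'->3'. N in the overhang matches any base.
    Assumption: the overhang anneals antiparallel to the last
    len(overhang) nucleotides of the cDNA.
    """
    target = reverse_complement(overhang)  # "NNNNNC" for the default overhang
    tail = cdna.upper()[-len(overhang):]
    if len(tail) < len(overhang):
        return False  # cDNA is too short to pair with the whole overhang
    return all(t == "N" or t == c for t, c in zip(target, tail))


# Hypothetical cDNA 3' ends: only the one ending in C can pair with GNNNNN.
print(can_anneal("ATGGCCGTAC"))  # ends in C
print(can_anneal("ATGGCCGTAG"))  # ends in G
```

Under these assumptions, any cDNA fragment ending in C passes the check, whether full-length or degraded, which is why the adaptor cannot by itself select for full-length templates.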
We were concerned that secondary structure also might adversely impact the PCR steps in our 5′-RACE procedure. Previous reports have shown that complete denaturation of DNA at 98 °C is essential for successful amplification of highly GC-rich sequences [19, 20]. Other previous reports have shown that 5% DMSO and 1 M betaine can be used together to reduce stable secondary structures in DNA templates [21]. Thus, we used a denaturation temperature of 98 °C and included both 5% DMSO and 1 M betaine in our PCRs to eliminate putative secondary structures and facilitate 5′-RACE of the adaptor-ligated cDNA preparation (). Under these conditions, we found that neither Taq nor Pfu DNA polymerases produced specific amplification products (data not shown). Conversely, Phusion™ DNA polymerase was able to produce specific amplification products under these conditions. Finally, the PCR conditions developed as part of our new 5′-RACE method included unusual extension steps with progressively increasing times and temperatures, as described in Materials and Methods. This was designed to progressively melt any secondary structures that might have formed during the preceding annealing step.
A stepwise description of our new 5′-RACE method for extremely GC-rich genes is given in Materials and Methods and illustrated in . The results obtained when we used this new 5′-RACE method to try to isolate the 5′ end of the TnFT3 gene are shown in . Whereas all other conditions utilized in this study failed to produce a full-length 5′-RACE product, the new method described herein produced four major DNA fragments ranging from about 450 to 800 bp in length, each of which was large enough to accommodate the expected size of the open reading frames of the known core α1,3 fucosyltransferases. Direct sequencing of the largest and smallest of these 5′-RACE products, followed by BLAST-P [22] and CLUSTAL-W [23] analyses of the theoretical translation products, revealed that both were specific, as both translation products were highly similar to known core α1,3 fucosyltransferases (data not shown). The 800 bp fragment extended the putative TnFT3 sequence from the internal primer through a putative translational initiation codon located at about the expected distance upstream of that primer, and well into the 5′ untranslated region. The 450 bp fragment included an identical 3′ sequence, but had a shorter 5′ untranslated region. The two fragments of intermediate size appeared to be specific, as they did not appear in the single primer control lanes, but were not sequenced because they probably arose from mRNA species of intermediate size with 5′-terminal deoxycytidine residues.
A detailed description of the putative TnFT3 gene, together with a functional analysis of the gene product, is in progress and will be described elsewhere. However, in the context of this methodological report, it is important to discuss its GC content (data not shown). The overall GC content of the full-length TnFT3 cDNA turned out to be 48% over 3263 bp, which is not significantly biased. On the other hand, the 5′ end of the open reading frame included extremely GC-rich sequences, as predicted. In fact, the TnFT3 sequence begins with a 104 bp sequence in the 5′ untranslated region with a GC content of 68%. This is followed by a 213 bp sequence with a GC content of 78%, which includes the putative initiation codon. Within this latter 213 bp sequence, there are two smaller nucleotide sequences with even higher GC levels. The first is a 63 bp sequence with a GC content of 89% and the second is a 61 bp sequence with a GC content of 87%. Thus, it is likely that the extremely high GC content at the 5′ end of the TnFT3 mRNA and first strand cDNA resulted in the formation of extensive secondary structures that interfered with our ability to isolate the 5′ end for sequence analysis by conventional 5′-RACE methods.
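The contrast between an unremarkable overall GC content and extremely GC-rich local stretches can be made concrete with a sliding-window calculation. The following is a minimal sketch, not the analysis used in this study. The function names, the window size, the threshold, and the toy sequence are all illustrative assumptions, and the TnFT3 sequence itself is not reproduced here.

```python
def gc_content(seq):
    """Return the percentage of G and C bases in `seq` (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def gc_rich_windows(seq, window=60, threshold=80.0):
    """Return (start, end, gc_percent) for every window at or above `threshold`.

    `start` is 0-based and `end` is exclusive. The window advances one base
    at a time, so overlapping windows are reported separately.
    """
    hits = []
    for start in range(0, len(seq) - window + 1):
        pct = gc_content(seq[start:start + window])
        if pct >= threshold:
            hits.append((start, start + window, pct))
    return hits


# Toy sequence (not TnFT3): a 60 bp all-GC block flanked by AT-only sequence.
toy = "AT" * 30 + "GC" * 30 + "AT" * 30
print(round(gc_content(toy), 1))                          # 33.3
print(gc_rich_windows(toy, window=60, threshold=100.0))   # [(60, 120, 100.0)]
```

In the toy example, the overall GC content is low even though one 60 bp window is entirely G and C, which mirrors the situation described above for the 5′ end of TnFT3.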
At this point in our study, we believed we had determined the correct 5′ and 3′ ends of the putative TnFT3 gene. Our next goal was to characterize the biochemical properties of the gene product. Thus, we designed PCRs to isolate the sequences encoding the full-length enzyme and its computer-predicted soluble domain for enzyme expression, purification, and activity assays. To avoid potential problems associated with the presence of extremely GC-rich sequences in the TnFT3 gene, we produced the first strand cDNAs and performed the PCRs using the new conditions we had developed for 5′-RACE, as detailed in , except we used gene-specific primers and performed only a single round of PCR. This approach yielded amplification products of the expected size (data not shown), which were cloned and sequenced. To our surprise, however, the sequencing results revealed that the 3′ end of the TnFT3 open reading frame included an additional 73 nucleotides, relative to the sequence that we had originally determined using a conventional 3′-RACE method (data not shown). This new sequence had a GC content of 82% and was embedded within a larger sequence of 162 nucleotides with a GC content of 80% (data not shown). Thus, it appeared that we had isolated the incorrect 3′ end of the TnFT3 gene in our original 3′-RACE experiments due, once again, to the presence of an extremely GC-rich sequence. This led us to perform an experiment in which we compared the results obtained using a “standard” 3′-RACE method or a new method using the conditions we had developed for 5′-RACE (), as described in Materials and Methods.
Analysis of the products of these 3′-RACE reactions on an agarose gel with ethidium bromide staining showed that the established 3′-RACE method yielded no detectable product, whereas the new conditions outlined in yielded a 3′-RACE product of about the expected size (1.9 kb; ). Direct sequence analysis of this product confirmed that it was TnFT3-specific and that it contained the 73 bp GC-rich sequence that had apparently been deleted in our original 3′-RACE experiment (data not shown). It is noteworthy that the standard 3′-RACE conditions used in this experiment failed to yield a product, which was inconsistent with the fact that we had obtained a TnFT3-specific product, albeit with a 73 bp deletion, in our initial “standard” 3′-RACE experiments. This apparent discrepancy is easily explained by the fact that different amplification conditions were used for our original experiment and the experiment shown in . Specifically, the original 3′-RACE method involved secondary PCR with nested primers, whereas both of the 3′-RACE methods included in the controlled experiment shown in involved only primary PCRs. Secondary PCRs were not done in the latter experiment because this was not required to produce the correct 3′ end of the TnFT3 gene using the new 3′-RACE method.
Finally, we note that in addition to using the new RACE method described in this paper to successfully isolate the extremely GC-rich ends of the TnFT3 gene, we have also applied the conditions from the cDNA synthesis portion of our new method to successfully amplify another problematic insect gene. This latter gene included a 125 bp sequence that is 84% GC, which was consistently deleted when the full-length coding sequence was amplified using conventional reverse transcription and PCR methods (data not shown). Thus, we anticipate that the new RACE method described in this report, as well as the conditions used for the PCR steps, will be widely applicable for the isolation of genes with extremely high GC content.