We have developed a new primer design strategy for PCR amplification of distantly related gene sequences based on consensus-degenerate hybrid oligonucleotide primers (CODEHOPs). An interactive program has been written to design CODEHOP PCR primers from conserved blocks of amino acids within multiply-aligned protein sequences. Each CODEHOP consists of a pool of related primers containing all possible nucleotide sequences encoding 3–4 highly conserved amino acids within a 3′ degenerate core. A longer 5′ non-degenerate clamp region contains the most probable nucleotide predicted for each flanking codon. CODEHOPs are used in PCR amplification to isolate distantly related sequences encoding the conserved amino acid sequence. The primer design software and the CODEHOP PCR strategy have been utilized for the identification and characterization of new gene orthologs and paralogs in different plant, animal and bacterial species. In addition, this approach has been successful in identifying new pathogen species. The CODEHOP designer (http://blocks.fhcrc.org/codehop.html) is linked to BlockMaker and the Multiple Alignment Processor within the Blocks Database World Wide Web (http://blocks.fhcrc.org).
Sequence comparison of multiple members of protein families has proven to be an important approach in the analysis of protein structure and function. The identification of blocks of conserved amino acid sequences has revealed the presence of protein motifs and domains that play important roles in protein function. However, such sequence analysis is limited by the number of individual protein sequences available for comparison. Even with the recent advances in whole genome sequencing, the acquisition of new members of a targeted protein family is limited. Previous methods to isolate unknown family members by PCR have relied on either degenerate primers consisting of a pool of primers containing most or all of the possible nucleotide sequences encoding a conserved amino acid motif or consensus primers consisting of a single primer containing the most common nucleotide at each codon position within the motif. Although these strategies have been successful in isolating closely related sequences, they have generally failed when sequences were more distantly related or were in low copy number. We have developed the COnsensus-DEgenerate Hybrid Oligonucleotide Primer (CODEHOP) PCR strategy for the identification of new members of protein families, which overcomes problems inherent in both degenerate and consensus methods for primer design (1).
Short regions of proteins with high levels of conservation can be represented as ungapped blocks of multiply aligned protein sequences (Fig. 1) (2). CODEHOPs are derived from these conserved sequence blocks (http://blocks.fhcrc.org/codehop.html), and are used in PCR to amplify the region between them. A CODEHOP PCR primer consists of a pool of primers each containing a different sequence in the 3′ degenerate core region where each primer provides one of the possible codon combinations encoding a targeted 3–4 conserved amino acid motif within the sequence block (Fig. 2). In addition, each primer in the pool has an identical 5′ consensus clamp region derived from the most probable nucleotide at each position encoding the conserved amino acids flanking the targeted motif. Amplification initiates by annealing and extension of primers in the pool with the most similarity in the 3′ degenerate core to the target template (Fig. 3). Annealing is stabilized by the 5′ consensus clamp which partially matches the target template. Once the primer is incorporated, it becomes the template for subsequent amplification cycles. Because all primers are identical in the 5′ consensus clamp region, they all will anneal at high stringency during subsequent rounds of amplification. This increases the efficiency of the PCR amplification and differentiates the CODEHOP technique from the less efficient consensus PCR or degenerate PCR techniques. The CODEHOP technique has been validated by the successful amplification of new members of protein families that have proven challenging using conventional methods (1).
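The pool structure described above can be illustrated with a minimal Python sketch. It is not the CODEHOP designer's code: the function name `expand_codehop` and the example clamp and core sequences are hypothetical, chosen only to show how a single 5′ clamp is shared by every expansion of a 3′ degenerate core written in standard IUPAC ambiguity codes.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_codehop(clamp, core):
    """Return every primer in the pool: the fixed 5' clamp followed by
    each possible expansion of the 3' degenerate core."""
    choices = [IUPAC[base] for base in core.upper()]
    return [clamp + "".join(combo) for combo in product(*choices)]

# Hypothetical sequences, for illustration only.
clamp = "GGCTACGACCTGCTG"   # non-degenerate 5' consensus clamp
core = "GAYTGGAARCA"        # degenerate 3' core (Y = C/T, R = A/G)

pool = expand_codehop(clamp, core)
for primer in pool:
    print(primer)
```

Here the core contains two twofold-degenerate positions, so the pool holds four primers that differ only in the core and are identical across the clamp.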
The CODEHOP program consists of the following nine steps.
Examples of CODEHOP PCR primers designed from Blocks D and E of the cytosine DNA methyltransferases are provided in Figure 4.
The CODEHOP designer can be biased towards selected sequences within a block of multiply aligned sequences. This is useful for targeting specific orthologous or paralogous members of a protein family. The set of blocks returned by BlockMaker for CODEHOP input may be analyzed phylogenetically to permit convenient selection of a subset of related blocks using the ProWeb TreeViewer link from the BlockMaker output. To emphasize or de-emphasize a particular sequence, the weight provided for each sequence segment in the BlockMaker output (5) can be manually altered, thus changing its contribution to the primer design (Fig. 1). Low-abundance nucleotides within the 3′ degenerate codon positions can be excluded by increasing the degeneracy ‘strictness’ value in the range from 0 to 1. Also, members of a protein family from a specific organism or genome can be targeted by selecting the applicable codon usage table for that genome. Finally, the most common codon encoding the consensus amino acid determined for the 5′ consensus clamp region can replace the most favored nucleotide chosen from the DNA position-specific scoring matrix (PSSM).
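The effect of the strictness value on a single degenerate position can be sketched as follows. The exact rule used by the CODEHOP designer is not given here, so the sketch assumes one plausible reading: a nucleotide is kept if its weighted frequency in the column is non-zero and at least `strictness` times the frequency of the most common nucleotide. The function name `degenerate_base` and the example frequencies are hypothetical.

```python
# Map from an alphabetically sorted set of bases to its IUPAC ambiguity code.
CODE_FOR = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}

def degenerate_base(freqs, strictness=0.0):
    """Choose an IUPAC code for one column of nucleotide frequencies.

    Assumed rule (not confirmed by the source): keep a base when its
    frequency is > 0 and >= strictness * (frequency of the top base).
    """
    top = max(freqs.values())
    kept = sorted(b for b, f in freqs.items() if f > 0 and f >= strictness * top)
    return CODE_FOR["".join(kept)]

# Hypothetical weighted frequencies for one core position.
column = {"A": 0.05, "C": 0.55, "G": 0.0, "T": 0.40}

for s in (0.0, 0.5, 1.0):
    print(s, degenerate_base(column, s))
```

Under this assumed rule, a strictness of 0 keeps every observed nucleotide (A, C and T, code H), 0.5 drops the rare A (code Y), and 1 keeps only the most common nucleotide (C). Changing the sequence weights would change the column frequencies that feed into this step.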
Other user-defined parameters are available to alter the primer design and output. First, the specificity and function of a CODEHOP PCR primer may be altered by changing the length of the 5′ consensus clamp region through an annealing temperature parameter (default = 60°C). The presence of nucleotide runs within the 5′ consensus clamp region can be limited through a polynucleotide parameter (default = 5), and the core/clamp boundary may be restricted to a codon boundary. Finally, the program output can show all possible primers or only the single most degenerate primer in each region.
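How an annealing temperature target and a polynucleotide limit could shape the clamp is sketched below. Two points are assumptions rather than the designer's documented behaviour: the melting temperature is estimated with the simple Wallace rule (2°C per A/T, 4°C per G/C), and the polynucleotide parameter is read as the longest run of one nucleotide that is still allowed. The names `wallace_tm`, `longest_run` and `choose_clamp`, and the consensus sequence, are hypothetical.

```python
from itertools import groupby

def wallace_tm(seq):
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def longest_run(seq):
    """Length of the longest stretch of a single repeated nucleotide."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)

def choose_clamp(consensus, target_tm=60, max_run=5):
    """Grow the clamp 5'-ward from the core boundary (the 3' end of
    `consensus`) until the estimated Tm reaches target_tm.

    Returns None if the target is never reached or if the resulting
    clamp contains a run longer than max_run.
    """
    for length in range(1, len(consensus) + 1):
        clamp = consensus[-length:]
        if wallace_tm(clamp) >= target_tm:
            return clamp if longest_run(clamp) <= max_run else None
    return None

# Hypothetical consensus sequence upstream of the degenerate core.
consensus = "ATGGCTACGACCTGCTGAAGGAC"
clamp = choose_clamp(consensus)
print(clamp, wallace_tm(clamp), longest_run(clamp))
```

Raising `target_tm` lengthens the clamp, and lowering it shortens the clamp, which is the trade-off the annealing temperature parameter exposes.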
Selection of the optimal CODEHOP PCR primer is primarily based on minimal degeneracy across the 3′ degenerate core and secondarily on the clamp score, which indicates the quality of the match between the 5′ non-degenerate clamp and the sequence block given a codon usage table. If no primer is identified from an input sequence block, the block may be biased using the methods described above to remove or de-emphasize the most distantly related sequences. Conversely, the default degeneracy limit of 128 may be increased to 256 or higher, which may identify highly conserved motifs that contain amino acids with higher codon degeneracies.
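The selection order described above (degeneracy first, clamp score second, subject to a degeneracy limit) can be sketched as a simple filter and sort. The sketch assumes that a higher clamp score means a better match, which the text implies but does not state; the names `degeneracy` and `rank_primers` and the candidate list are hypothetical.

```python
from math import prod

# Number of nucleotides represented by each IUPAC code.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}

def degeneracy(core):
    """Number of distinct sequences encoded by a degenerate core."""
    return prod(IUPAC_SIZE[base] for base in core.upper())

def rank_primers(candidates, max_degeneracy=128):
    """Drop cores above the degeneracy limit, then sort by ascending
    degeneracy and, within ties, by descending clamp score (assumed:
    higher score = better clamp)."""
    allowed = [c for c in candidates if degeneracy(c[0]) <= max_degeneracy]
    return sorted(allowed, key=lambda c: (degeneracy(c[0]), -c[1]))

# Hypothetical (core, clamp score) candidates.
candidates = [
    ("GAYTGGAARCA", 70),
    ("TGGATGCAYGG", 65),
    ("NNNNCCNGG", 90),
    ("CAYATGTGGGG", 80),
]

ranked = rank_primers(candidates)
for core, score in ranked:
    print(core, degeneracy(core), score)
```

With the default limit, the core with five fully degenerate positions (degeneracy 1024) is excluded regardless of its clamp score; passing a larger `max_degeneracy` admits it, mirroring the option of raising the limit when no primer is found.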
Since publication of the original CODEHOP manuscript (1) and implementation of the CODEHOP designer program on the WWW in 1998, more than 70 studies have been published in which CODEHOP PCR primers have been designed and successfully utilized to amplify distantly related sequences from organisms as diverse as fish, frog, protozoa, plants, viruses and bacteria (http://courses.washington.edu/bioinfo/CODEHOP/Codehop%20Genes.html). The methodology has been further expanded in the exploration of conservation and diversity of gene families in higher plants (6) and the characterization of viral genomes (7,8). Useful features of the CODEHOP designer include reweighting sequences in order to bias the design of CODEHOPs towards targeted protein families and the ProWeb TreeViewer selection of input sequences based on their phylogenetic relationship. The CODEHOP web site provides information for ‘Getting Started’, a detailed ‘Help’ page, and a description of the ‘CODEHOP Algorithm’. We are currently working on enhancements to the CODEHOP PCR strategy and the output of the designer program.