We have investigated intragenic recombination in Block 2 of the merozoite surface protein-1 (MSP-1) gene, in which three allele-specific families (K1, Mad20, and RO33) were previously known. Using parasites from western Kenya, we have found a fourth Block 2 allele type, which is a recombinant between Mad20 and RO33 alleles. These recombinant alleles, which we have termed MR, contain sequence from the 5′ region of Mad20 and the 3′ region of RO33. The results of this study provide new data on the complexity of the MSP-1 gene, which encodes a candidate vaccine antigen, and further support the importance of intragenic recombination in generating genetic variability in Plasmodium falciparum parasites in nature.
With 300–500 million clinical cases per year resulting in over 1 million deaths, malaria continues to be one of the leading causes of childhood morbidity and mortality in sub-Saharan Africa. Malaria infection during pregnancy also poses a serious health problem for women of child-bearing age and is associated with adverse effects on fetal development. Four species of Plasmodium are known to cause disease in humans, but almost all malaria-attributable mortality is due to Plasmodium falciparum. The parasite has a complex life cycle that alternates between the human host and the mosquito vector, and its stages exhibit extensive genetic and antigenic diversity.
Genetic diversity of the parasites contributes to the failure of malaria control measures. For instance, the use of antimalarial drugs has led to the emergence and spread of drug-resistant parasites that render the drugs ineffective. Similarly, antigenic diversity allows the parasite to evade natural immune responses and may jeopardize the effectiveness of vaccines.
Over the past several years, parallel to vaccine development and testing efforts, several molecular epidemiologic studies have been conducted to provide information on the nature and extent of genetic diversity of promising candidate vaccine antigen genes [2–4]. Two major factors have been considered to explain the observed genetic diversity in a specific gene: natural selection and intragenic recombination [5–7].
Given that the malaria parasite’s life cycle includes a sexual stage, intragenic recombination during meiosis has been proposed as an important mechanism for generating novel genetic variants in malarial antigens [8–11]. However, finding direct evidence for intragenic recombination is hampered by the slow accumulation of synonymous substitutions, which makes it difficult to distinguish recombination from convergence due to positive natural selection. The importance of intragenic recombination in the generation of new alleles has remained an elusive issue in the context of malarial vaccine antigens.
The merozoite surface protein-1 (MSP-1) is a promising candidate vaccine antigen. It is the most abundant surface protein on the merozoite of P. falciparum, and it is thought to play a role in erythrocyte invasion. The gene consists of both conserved and polymorphic regions, the most polymorphic of which is the 100–400 base pair (bp) Block 2 repetitive region.
Because MSP-1 Block 2 has been shown to play a role in the development of immunity, it is an important region of the protein for studying diversity and intragenic recombination. MSP-1 Block 2 alleles have been divided into three allele-specific families: K1, Mad20, and RO33. Alleles in the K1 and Mad20 families contain different numbers of unique tripeptide repeats. K1 and Mad20 are the most diverse families in terms of the number of alleles identified by Block 2 fragment-length polymorphism. RO33 is monomorphic, lacking the tripeptide repeats observed in the other two families. Several studies have used MSP-1 Block 2-based PCR genotyping to estimate the number of genetically distinct parasite genotypes within an infection [14–16]. One study suggested the presence of Block 2 alleles that are recombinants between two different main families, but only PCR-based typing methods were used. Another study, using parasite specimens from Mali, also indicated an apparent Mad20-RO33 recombinant based on PCR amplification of MSP-1 Block 2 with Mad20 and RO33 primers. One limitation of PCR-based genetic analysis is that it cannot detect novel sequences that may exist in natural populations. More important in the context of studies of intragenic recombination, PCR-based methods can detect false recombinants arising from nonspecific amplification.
Direct evidence of intragenic recombination in the MSP-1 gene of malaria parasites was provided by laboratory cross studies. Studies of recombination in MSP-1 have largely focused on regions other than Block 2 [9,12,19–22]. Four studies have included MSP-1 Block 2 in the regions analyzed [14,23–25]. Studies of the pattern of linkage disequilibrium among polymorphic sites have suggested a high recombination rate in MSP-1 [23,24]. However, this kind of analysis does not distinguish convergent mutations due to selection from intragenic recombination in malarial vaccine antigens.
We are exploring the role of intragenic recombination in the generation of new alleles in P. falciparum. In this study, we have investigated genetic diversity of the Block 2 region of the MSP-1 gene using parasites from a birth cohort in western Kenya. We used allele-specific and mismatched amplifying primers in PCR assays and sequence-characterized the amplified fragments to determine the nature of the Block 2 region. We have found a novel fourth allele family of Block 2. This new allele family is a recombinant between the Mad20 and RO33 allele families and is found in more than 25% of the Kenyan field samples studied, showing that intragenic recombination is a factor in the generation of diversity in the MSP-1 antigen gene.
The samples used in this study came from participants in a birth cohort study (Asembo Bay Cohort Project) conducted in an area of high malaria transmission in western Kenya. We extracted DNA from 173 parasitized blood samples collected from 37 different infants between 1992 and 1994. All infants were parasitemia-positive by microscopy. DNA was extracted from approximately 300 μl of parasitized red blood cells (pRBCs) using the PureGene extraction method (Gentra Systems).
We used a nested PCR method in which the first PCR was done with the external 5′ (AAGCTTTGAAGATGCAGTATTGAC) and 3′ (ATTCATTAATTTCTTCATATCCATC) primers, which amplify Block 2 plus some of the flanking regions. The second PCR was done with allele-specific primers (K1: 5′ GAAATTACTACAAAAGGTGCAAGTG and 3′ AGATGAAGTATTTGAACGAGGTAAAGTG; M20: 5′ GCTGTTACAACTAGTACACC and 3′ TGAATTATCTGAATTTGTACGTCTTGA; RO33: 5′ GCAAATACTCAAGTTGTTGCAAAGC and 3′ AGGATTTGCAGCACCTGGAGATCT) in three different non-recombinant primer combinations (K1 5′ and 3′, M20 5′ and 3′, and RO33 5′ and 3′) and six different recombinant primer combinations (K1 5′ and M20 3′, M20 5′ and K1 3′, K1 5′ and RO33 3′, RO33 5′ and K1 3′, M20 5′ and RO33 3′, and RO33 5′ and M20 3′). For the PCR amplification mixture, 5 μl of each of the appropriate 5′ and 3′ primers (40 ng μl⁻¹) were added to 12 μl of dNTP mixture (Promega, 107 μmol per reaction), 3 μl of MgCl2 (25 mM), 10 μl of 10× PCR buffer containing 15 mM MgCl2 (Perkin-Elmer), 0.5 μl Taq polymerase, and 59.5 μl double-distilled water. For the first PCR, 5 μl of DNA (adjusted to a concentration of 150 ng μl⁻¹) was added to the 95 μl reaction mixture and amplified for 25 cycles at a 55 °C annealing temperature. For each internal PCR, 5 μl of the first (external) PCR product was added to the 95 μl reaction mixture and amplified for 30 cycles at a 64 °C annealing temperature. Each internal PCR product was electrophoresed on a 3% Agarose-1000 gel (Gibco) to determine the alleles present. We could consistently detect length differences of 10 bp.
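As a minimal sketch (not part of the original protocol), the primer design and reaction recipe above can be checked programmatically: pairing each family-specific 5′ primer with each family-specific 3′ primer gives the three non-recombinant and six recombinant combinations, and the listed components sum to the 95 μl mix. The final Mg²⁺ figure is an assumption: it treats the 15 mM MgCl2 as the 10× buffer concentration and the final reaction volume as 100 μl. The names `FAMILIES` and `MASTER_MIX_UL` are illustrative.

```python
from itertools import product

# Family-specific internal primers named in the text (M20 = Mad20).
FAMILIES = ["K1", "M20", "RO33"]

def primer_combinations(families):
    """Split all 5'/3' primer pairings into non-recombinant and recombinant sets."""
    same, mixed = [], []
    for five, three in product(families, repeat=2):
        (same if five == three else mixed).append((five, three))
    return same, mixed

# Per-reaction components (microlitres), as listed in the text.
MASTER_MIX_UL = {
    "5' primer (40 ng/ul)": 5.0,
    "3' primer (40 ng/ul)": 5.0,
    "dNTP mixture": 12.0,
    "MgCl2 (25 mM)": 3.0,
    "10x PCR buffer": 10.0,
    "Taq polymerase": 0.5,
    "ddH2O": 59.5,
}
TEMPLATE_UL = 5.0  # genomic DNA (first PCR) or external PCR product (internal PCR)

same, mixed = primer_combinations(FAMILIES)
mix_ul = sum(MASTER_MIX_UL.values())
reaction_ul = mix_ul + TEMPLATE_UL

# Assumption: 15 mM MgCl2 is the 10x buffer concentration; add the 3 ul of 25 mM stock.
final_mg_mM = (15.0 * 10.0 + 25.0 * 3.0) / reaction_ul

print(len(same), len(mixed))   # 3 6
print(mix_ul, reaction_ul)     # 95.0 100.0
print(final_mg_mM)             # 2.25
```

Enumerating the pairings with `itertools.product` makes it easy to see that the six recombinant combinations are exactly the ordered pairs of distinct families, so no combination is missed or duplicated.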
We sequenced products obtained from the recombinant PCR representing a range of sizes and band intensities to confirm whether or not the products were actual recombinants or PCR artifacts. Products were cut from 3% Agarose-1000 gels using a clean sterile scalpel, and the DNA was purified using the Qiaquick Gel Extraction Kit (Qiagen). The purified products were sequenced on an ABI automated sequencer.
One hundred and seventy-three samples taken from 37 parasitemic infants were genotyped using Block 2 family-specific primers for K1, Mad20, and RO33. Of the 173 samples genotyped, 97%, 76%, and 85% contained one or more K1, Mad20, or RO33 alleles, respectively. Each sample contained an average of 2.35 K1 alleles (S.E. = 0.09) and 1.31 Mad20 alleles (S.E. = 0.08). Because RO33 is monomorphic, it was scored simply as present or absent in the samples tested. The frequencies of the K1, Mad20, and RO33 alleles were similar to those observed in other samples from this area of western Kenya. We sequenced products representing a range of sizes and band intensities, and most of the non-recombinant products were ‘real’ (i.e. not PCR artifacts).
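The per-sample averages above are reported as means with standard errors. A minimal sketch of that calculation, assuming the usual standard error of the mean (sample standard deviation with n − 1, divided by √n); the allele counts below are hypothetical and are not the study's data.

```python
import math
import statistics

def mean_and_se(counts):
    """Return the mean and the standard error of the mean of per-sample allele counts."""
    if len(counts) < 2:
        raise ValueError("need at least two samples for a standard error")
    mean = statistics.mean(counts)
    se = statistics.stdev(counts) / math.sqrt(len(counts))  # stdev uses n - 1
    return mean, se

# Hypothetical numbers of distinct K1 alleles scored per sample (illustration only).
k1_alleles_per_sample = [2, 3, 1, 2, 4, 2, 3, 2]
mean, se = mean_and_se(k1_alleles_per_sample)
print(f"mean = {mean:.2f}, S.E. = {se:.2f}")
```

Because repeated samples from the same infant are not independent, a standard error computed this way treats each sample as a separate observation; that is a simplifying assumption of the sketch.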
The same 173 samples that were genotyped using the non-recombinant primer combinations were also genotyped using the six recombinant primer combinations (K1-Mad20, Mad20-K1, K1-RO33, RO33-K1, Mad20-RO33, and RO33-Mad20). Most of the products observed with these combinations were faint, spurious bands that on sequence analysis proved to be PCR artifacts rather than recombinant alleles. There was, however, one recombinant type, made up of sequence from the 5′ end of Mad20 and the 3′ end of RO33, which we have termed ‘MR’. The most common MR allele detected was 140 bp in length, containing 72 bp of Mad20 sequence followed by 69 bp of RO33 sequence. Six additional sizes of MR alleles, ranging from approximately 130 to 220 bp, were also detected; these differed only in the number of copies of the sequence GGTTCAGGT in the Mad20 region of the recombinant (Fig. 1A). The 200 bp MR allele also contains an extra 9 bp in the RO33 region. The amino acid sequences of the recombinants are shown in Fig. 1B, and those of one representative of each non-recombinant family (K1, Mad20, and RO33) are shown in Fig. 1C. Comparison of Fig. 1B and C shows that the 5′ end of MR is the same as Mad20 and the 3′ end of MR is the same as RO33, whereas K1 differs from all the other sequences. Excluding artifacts identified by sequencing of the PCR products, 48 (28%) of the 173 samples contained one or more MR alleles. Of the 37 infants tested, 21 (57%) had at least one infection containing an MR allele. To examine whether recrudescence affected the MR frequency estimate, we examined the 81 samples (of the original 173) that were preceded by aparasitemia, due either to successful treatment or to self-clearance of parasites. Of these 81 samples, 21 (26%) contained one or more MR alleles. These results suggest that the MR frequency estimate is not strongly affected by recrudescence.
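The MR prevalence figures above are simple proportions. The sketch below recomputes them from the reported counts and, as an addition not used in the original analysis, attaches Wilson score 95% intervals to give a sense of their sampling uncertainty; the function name `wilson_interval` is illustrative.

```python
import math

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z = 1.96 gives ~95%)."""
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

# (positive, total) counts reported in the text.
MR_COUNTS = {
    "samples with >= 1 MR allele": (48, 173),
    "infants with >= 1 MR-positive infection": (21, 37),
    "samples preceded by aparasitemia": (21, 81),
}

for label, (k, n) in MR_COUNTS.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {100 * k / n:.0f}% "
          f"(Wilson 95% CI {100 * low:.0f}-{100 * high:.0f}%)")
```

Two caveats apply. The 81 samples are a subset of the 173, so their intervals are not independent and any overlap should be read descriptively rather than as a formal test. In addition, samples are clustered within infants, which a plain binomial interval ignores.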
Although these MR alleles appear to be genuine recombinant alleles based on their DNA sequence, false recombinant alleles can be generated during PCR as a result of template switching [27–32]. Saiki et al. suggest that PCR-mediated recombinants can arise when incompletely extended products act as primers by annealing to other allelic templates. Odelberg et al. suggest that such recombinants can be generated when either the polymerase or the nascent strand switches from the original template to a secondary template during DNA synthesis. Regardless of the mechanism, PCR-mediated recombinants can only contain sequence from templates present in the reaction. Consequently, for the observed MR alleles to result from template switching, both Mad20 and RO33 DNA would need to be present in the PCR reaction. Therefore, an MR product detected in a sample that lacks Mad20 or RO33 alleles would not be expected to result from in vitro PCR-mediated recombination.
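The exclusion logic in this paragraph reduces to a simple rule: an MR band is open to a template-switching explanation only if both parental families were detected in the same sample. A minimal sketch with hypothetical function names; it assumes the family-specific typing is accurate, which is not guaranteed (the S13A clone described below shows that an MR template can be mistaken for Mad20).

```python
def template_switching_possible(families_detected, parents=("Mad20", "RO33")):
    """PCR-mediated recombination can only join templates present in the reaction."""
    return all(parent in families_detected for parent in parents)

def interpret_mr_band(families_detected):
    """Interpret an MR-sized product given the non-recombinant families typed in a sample."""
    if template_switching_possible(families_detected):
        return "ambiguous: genuine MR or template-switching artifact"
    return "not explained by template switching"

for sample in ({"K1", "Mad20", "RO33"}, {"K1", "Mad20"}, set()):
    print(sorted(sample), "->", interpret_mr_band(sample))
```

Samples in the last two categories are the informative ones, which is why samples lacking Mad20 and RO33 alleles were used to test for template switching.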
To determine whether the observed MR alleles were the result of template switching, we performed nested PCR using the Mad20 5′ primer and the RO33 3′ primer on 37 additional samples lacking both Mad20 and RO33 alleles. These samples were obtained from another study conducted in the same region of western Kenya. They were used only to rule out template switching as an explanation for the presence of MR alleles and were not included in the MR frequency estimate. We detected 140 bp recombinant alleles in two samples lacking all three main families of non-recombinant alleles (i.e. K1, Mad20, and RO33 alleles were not present). These 140 bp recombinants were identical in sequence to the 140 bp recombinants observed previously.
To further test the possibility of template switching, we made experimental mixtures of Mad20 and RO33 DNA from whole-MSP-1 clones that had been generated for another study from five infected Kenyan children. These clones contain the entire MSP-1 gene sequence, amplified using primers located outside the MSP-1 Block 2 region. In the other study, 15 clones were identified as having different MSP-1 Block 2 genotypes. These clones were used to make experimental mixtures of Mad20 and RO33 DNA to determine whether such mixtures would yield MR-positive PCR genotyping. The Mad20 Block 2 alleles used in the experimental mixtures were the same size as the alleles present in the majority of the samples containing MR recombinant alleles. Each clone contained only one insert, and therefore only one type of Block 2 allele. We made 2:1, 1:1, and 1:2 mixtures of Mad20 DNA to RO33 DNA and then performed nested PCR using the Mad20 5′ primer and the RO33 3′ primer with the experimental mixtures as template. Because no recombinant alleles were expected in the original template, any recombinant allele detected in this PCR would have resulted from template switching. The PCR was repeated ten times for each experimental mixture. PCR genotyping of one of the experimental mixtures yielded an MR PCR product, which we initially suspected to be a PCR artifact. However, on sequencing the MSP-1 clones that made up this mixture, we found that the clone thought to contain a Mad20 Block 2 (Clone S13A) actually contained an MR Block 2 region. Sequencing of this entire MSP-1 clone provided further evidence for this fourth type of Block 2. No other recombinant products were generated from PCR genotyping of the experimental mixtures, which provides further evidence against template switching.
We next investigated the origin of the MR recombinant. Based on comparison of the sequences, a recombination event between Mad20 and RO33 appears to be the most likely explanation for the emergence of the MR allele. The parental alleles of this recombinant would need to have sequences similar to the 5′ end (Mad20 progenitor) and the 3′ end (RO33 progenitor) of MR. All of the MR alleles are identical in the RO33 region, with the exception of the 200 bp MR allele (extra 9 bp in the RO33 region) and the S13A clone (Fig. 1). The S13A clone has a single nucleotide substitution in the region spanned by the 3′ RO33 primer. Because this substitution lies within the primer region, we cannot determine whether it is present in the alleles amplified with this primer. Only one of the 15 full-length Mad20 MSP-1 sequences in GenBank contains this same nucleotide substitution (data not shown). Because almost all of the RO33 sequences obtained in this study and recorded in GenBank are identical to the 3′ end of the MR alleles detected, any one of them could be the RO33 progenitor of MR. The Mad20 regions of the MR alleles differ only in the number of copies of the sequence GGTTCAGGT. The S13A clone and one other sample also carry a single A-to-G nucleotide substitution (Fig. 1).
We examined the available sequences from GenBank and this study for a Mad20 allele resembling the Mad20 portion of MR (Table 1). Such a sequence would carry CAAAG preceding the MR repeat region and at least two copies of GGTTCAGGT to explain the length polymorphism. In addition, it would need to be similar in length to the RO33 allele for successful recombination. Using these criteria, the 45 Mad20 sequences could be divided into 21 types (Table 1). Of these, only the MSP-1 clone from Kenya sequenced in this study (Clone S49-3), which carries a 150 bp Mad20 Block 2 allele, was consistent with the Mad20 portion of MR.
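The screening criteria above can be expressed as a small filter over candidate Mad20 sequences. This is a sketch under stated assumptions: the repeat block is taken to start immediately after CAAAG, and ‘similar in length’ is approximated by a hypothetical tolerance (`max_len_diff`) that the text does not specify. The toy sequences are illustrative, not real MSP-1 alleles.

```python
REPEAT = "GGTTCAGGT"  # 9-bp unit whose copy number varies among MR alleles
ANCHOR = "CAAAG"      # sequence assumed to sit immediately upstream of the repeat block

def tandem_copies(seq, unit, start):
    """Count back-to-back copies of `unit` beginning at position `start`."""
    n = 0
    while seq.startswith(unit, start + n * len(unit)):
        n += 1
    return n

def max_copies_after_anchor(seq):
    """Largest tandem REPEAT run that directly follows any occurrence of ANCHOR."""
    best, pos = 0, seq.find(ANCHOR)
    while pos != -1:
        best = max(best, tandem_copies(seq, REPEAT, pos + len(ANCHOR)))
        pos = seq.find(ANCHOR, pos + 1)
    return best

def is_candidate_progenitor(seq, ro33_len, max_len_diff=30):
    """Apply the criteria: anchor, >= 2 repeat copies, length close to RO33 (hypothetical tolerance)."""
    seq = seq.upper()
    return max_copies_after_anchor(seq) >= 2 and abs(len(seq) - ro33_len) <= max_len_diff

# Toy sequences (hypothetical): two repeat copies vs. one.
two_copies = "TTCAAAG" + REPEAT * 2 + "ACTGA"
one_copy = "TTCAAAG" + REPEAT + "ACTGA"
print(is_candidate_progenitor(two_copies, ro33_len=35))  # True
print(is_candidate_progenitor(one_copy, ro33_len=35))    # False
```

Scanning every occurrence of the anchor, rather than only the first, avoids missing the repeat block when CAAAG also appears elsewhere in the sequence.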
Alignment of RO33 with a hypothetical Mad20 progenitor showed how the 130 bp MR allele might have arisen (Fig. 2a). Alignment of S49-3 with an RO33 sequence likewise showed the potential for recombination to yield a 130 bp MR allele (Fig. 2b). Alignments of RO33 with both the hypothetical progenitor and the Kenya Mad20-150 clone S49-3 placed the predicted recombination site at CAGGTG (Fig. 2a and b). This recombination site is similar to the recombination hotspot Chi (GCTGGTGG) documented in bacteria.
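Inferring a crossover point from an alignment, as in Fig. 2, rests on a simple principle: the breakpoint must lie between the last informative site (where the two parents differ) at which the recombinant matches the 5′ parent and the first informative site at which it matches the 3′ parent. A minimal sketch, assuming gap-free, equal-length aligned sequences and a single crossover; the toy sequences are hypothetical and only mimic a CAGGTG junction; they are not the Fig. 2 alignment.

```python
def locate_crossover(parent_5p, parent_3p, recombinant):
    """Bound a single crossover from the 5' parent to the 3' parent.

    Returns (start, end, segment): the crossover lies somewhere in
    parent_5p[start:end], a stretch where both parents are identical.
    """
    if not (len(parent_5p) == len(parent_3p) == len(recombinant)):
        raise ValueError("sequences must be aligned to equal length")
    last_5p, first_3p = -1, None
    for i, (a, b, r) in enumerate(zip(parent_5p, parent_3p, recombinant)):
        if a == b:
            continue  # uninformative: both parents share this base
        if r == a and first_3p is None:
            last_5p = i  # still copying the 5' parent
        elif r == b:
            if first_3p is None:
                first_3p = i  # first base clearly from the 3' parent
        else:
            raise ValueError(f"site {i} does not fit a single crossover")
    if last_5p == -1 or first_3p is None:
        raise ValueError("recombinant does not contain sequence from both parents")
    return last_5p + 1, first_3p, parent_5p[last_5p + 1:first_3p]

# Toy aligned sequences (hypothetical) sharing CAGGTG between divergent flanks.
mad20_like = "AAAAACAGGTGAAAAA"
ro33_like = "TTTTTCAGGTGTTTTT"
mr_like = "AAAAACAGGTGTTTTT"
print(locate_crossover(mad20_like, ro33_like, mr_like))  # (5, 11, 'CAGGTG')
```

The function can only bound the crossover to the shared stretch; it cannot say where inside that stretch strand exchange occurred, which is why a predicted site is reported as a short motif rather than a single base.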
The first evidence of intragenic recombination in malaria parasites was provided by laboratory cross studies. Since then, several studies have suggested intragenic recombination as an important factor in the generation of new alleles in malarial parasites [8–11]. As we have shown in this study, recombinants reported solely on the basis of PCR fragment analysis could simply be amplification artifacts. We have investigated recombination in the Block 2 region of MSP-1 and provide evidence of a novel, fourth allele family resulting from intragenic recombination between the Mad20 and RO33 alleles.
This Block 2 allele, which we have termed MR, is composed of Mad20 sequence at the 5′ end and RO33 sequence at the 3′ end. The information on the diversity of the Block 2 region, together with the results of recent studies of diversity in the C-terminal region of the protein [3,9], provides a better picture of the extent of genetic diversity in this important vaccine antigen gene (Fig. 3).
Two lines of evidence support the conclusion that MR is a genuine recombinant. First, the MR allele could not be explained by template switching during PCR. Second, we obtained a full-length MSP-1 clone containing a 150 bp MR Block 2 region from a parasite sample from western Kenya.
With very few exceptions, noted in the Results, these alleles differ only in the number of copies of the sequence GGTTCAGGT in the Mad20 region of the recombinant. This suggests that the MR alleles arose from a single recombination event and then diversified through expansion or contraction of the repeats, either through slipped-strand mismatch repair or through recombination between MR alleles. We suggest that the Mad20-RO33 recombination was a single event, given that a crossover occurs in the P. falciparum genome on average once every 1.67 × 10⁶ bp. Although recombination between two MR alleles, or between an MR allele and a Mad20 allele, might later have contributed to MR repeat copy-number diversity, slipped-strand mismatch repair is a more parsimonious explanation, for two reasons. First, given the frequency of MR alleles detected and the P. falciparum crossover rate, the probability of two MR alleles being present in an infection and recombining in a mosquito would be less than 10⁻⁵ per meiosis, substantially lower than the estimated rate of slipped-strand mismatch repair (10⁻³ per meiosis or mitosis). Second, we did not detect any MR diversity indicative of recombination between Mad20 and MR alleles. We therefore suggest that an initial recombination between Mad20 and RO33 gave rise to MR, and that MR repeat copy-number diversity subsequently arose through slipped-strand mismatch repair.
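The order-of-magnitude comparison in this paragraph can be reproduced with a back-of-envelope calculation. The text does not spell out how the bound was derived, so the following is only one plausible version under labeled assumptions: the chance that an infection carries two MR alleles is approximated as the MR sample frequency squared (48/173, about 0.28, assuming independent co-occurrence), and the crossover target is taken to be the 140 bp length of the most common MR allele. Both are illustrative assumptions rather than figures from the study.

```python
BP_PER_CROSSOVER = 1.67e6   # from the text: ~1 crossover per 1.67 x 10^6 bp per meiosis
SLIPPAGE_RATE = 1e-3        # from the text: per meiosis or mitosis

MR_FREQUENCY = 48 / 173     # assumption: MR sample frequency used as a proxy (~0.28)
REGION_BP = 140             # assumption: crossover target = most common MR allele length

def mr_by_mr_probability(freq, region_bp, bp_per_crossover):
    """Rough chance per meiosis that two MR alleles co-occur and cross over in the region."""
    p_two_mr = freq ** 2                        # assumes independent co-occurrence
    p_crossover = region_bp / bp_per_crossover  # uniform crossover density along the genome
    return p_two_mr * p_crossover

p = mr_by_mr_probability(MR_FREQUENCY, REGION_BP, BP_PER_CROSSOVER)
print(f"MR x MR recombination: {p:.1e} per meiosis")
print(f"slipped-strand events are ~{SLIPPAGE_RATE / p:.0f}x more frequent")
```

This simplified estimate ignores further requirements, such as both MR alleles being transmitted to the same mosquito and pairing during meiosis, each of which would lower the probability further and so only strengthen the comparison.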
The age of the MR allele in relation to other alleles cannot be established at this time; however, it does not appear to be a ‘recent’ event in relation to the P. falciparum population expansion. The reason for this speculation is that the MR allele is also found in Asia and the Americas (S. Takala et al., unpublished), in addition to its presence in Kenya and Mali.
The predicted recombination site between Mad20 and RO33 occurs at the sequence CAGGTG, underlined in Fig. 2. The absence of this site in the K1 allelic form of the gene may explain the lack of observed recombination between K1 and the other allele families. The predicted recombination site is similar to the Chi sequence (GCTGGTGG), which locally increases recombination in Escherichia coli. Similar crossover hot spot instigator sequences occur in other bacteria. In addition to its presence in the MSP-1 gene, this putative recombination site is present in several other P. falciparum protein-coding genes distributed throughout chromosomes 2 and 3 (GenBank). Interestingly, these chromosomes contain variable genes (e.g. var and rifin genes) that contain Chi and/or Chi-like sequences.
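Screening sequences for the putative recombination site and for Chi, as described above, amounts to exact motif matching on both DNA strands. A minimal sketch; the function names are hypothetical and the toy sequence is illustrative, not an MSP-1 or GenBank sequence.

```python
CHI = "GCTGGTGG"       # E. coli Chi sequence
CHI_LIKE = "CAGGTG"    # putative MSP-1 recombination site

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_motif(seq, motif):
    """Return sorted (position, strand) hits of motif on both strands, overlaps allowed.

    Positions are 0-based on the given (+) strand; a '-' hit means the reverse
    complement of the motif occurs at that position.
    """
    seq, motif = seq.upper(), motif.upper()
    targets = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double-counting palindromic motifs
        targets.append(("-", rc))
    hits = []
    for strand, target in targets:
        pos = seq.find(target)
        while pos != -1:
            hits.append((pos, strand))
            pos = seq.find(target, pos + 1)
    return sorted(hits)

toy = "TTCAGGTGAACACCTGTT"  # hypothetical sequence with one hit per strand
print(find_motif(toy, CHI_LIKE))  # [(2, '+'), (10, '-')]
print(find_motif(toy, CHI))       # []
```

Searching the reverse complement matters because a site on the opposite strand appears as CACCTG when reading the given strand.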
Though less prevalent than K1, Mad20, and RO33, MR was still detected in more than one-fourth of the samples we genotyped. Of the 45 sequences available in GenBank or determined in this study, only one, Kenya Mad20-150 S49-3, had a sequence and length consistent with the hypothetical Mad20 progenitor of the MR allele family. If the Mad20 allelic diversity detected today is representative of the Mad20 allelic diversity at the time MR arose, then the recombination between the RO33 and Mad20 progenitors was followed by expansion of the MR recombinant in the population. It has been proposed that the allele families in P. falciparum MSP-1 are maintained by balancing selection, so the high frequency of this novel allele family in the population under study may reflect a selective advantage that has allowed it to persist and spread. Further investigations will be needed to determine the worldwide frequency of this new allele family, whether these alleles are under positive selection, and, if so, which factors, such as differential recognition by the host immune system, drive that selection.
In conclusion, we have provided evidence that intragenic recombination has generated a novel, fourth allelic form of the Block 2 region of the P. falciparum MSP-1 antigen. We have also identified a putative recombination site in the MSP-1 gene that resembles a prokaryotic recombination hot spot. This new information on the genetic diversity of the MSP-1 gene should be useful for vaccine development and testing efforts as well as for molecular epidemiologic investigations. This study also provides another example of intragenic recombination as an important factor in generating genetic variability in P. falciparum parasites in nature.
We thank the study participants for their willingness to participate in the study, the CDC/KEMRI ABCP staff for their support, and A. Barskey for his technical support. We thank one of the reviewers for bringing up several important points for inclusion in the manuscript. We also thank the Director of the Kenya Medical Research Institute (KEMRI) for approving the publication of this research. Shannon Takala was supported by the Emerging Infectious Diseases Fellowship Program administered through APHL and CDC. Ananias A. Escalante is supported by the grant NIH R01 GM60740.