Alteration in gene copy number provides a simple way to change expression levels and alter phenotype. This was fully appreciated by bacteriologists more than 25 years ago, but the extent and implications of copy number polymorphism (CNP) have only recently become apparent in other organisms. New methods demonstrate the ubiquity of CNPs in eukaryotes and their medical importance in humans. CNP is also widespread in the Plasmodium falciparum genome and has an important and underappreciated role in determining phenotype. In this review, we summarize the distribution of CNP, its evolutionary dynamics within populations, its functional importance and its mode of evolution.
Two people can do more work than one. The same rationale applies to genes. Increasing gene copy number (CN) provides a simple way to increase gene expression without requiring a change in sequence. A flood of studies over the past five years has demonstrated that copy number polymorphism (CNP) is ubiquitous in eukaryotic genomes, explains a notable proportion (~17%) of expression variation and accounts for much of the genomic difference among individuals [1–4]. This revolution in our understanding of genetic variation has been driven by new technologies, such as microarrays, that enable direct assessment of CNP at a genome-wide scale. These methods reveal that CNP underlies many human pathologies and is involved in adaptive evolution in some cases [5,6]. As a consequence, there has been a shift in how biologists think about genetic variation. Although ten years ago most genetic and phenotypic variation was assumed to be due to point mutations, single-nucleotide polymorphisms (SNPs) and CNP are now considered on a more equal footing.
Work on CNP in malaria parasites started in the 1980s, but then went strangely silent. The importance of CN was first appreciated with the discovery that a candidate drug-resistance gene, Plasmodium falciparum multidrug-resistance 1 gene (pfmdr1), showed elevated CN in Southeast Asian parasites and was associated with elevated resistance to a variety of antimalarial drugs. Elegant laboratory experiments showed that chromosome 5 could be expanded and contracted by selection with mefloquine and chloroquine. Similarly, deletions on chromosomes 2 and 9 were shown to be associated with cell cytoadherence [9,10]. At the same time, pulsed-field gels revealed that malaria parasite chromosomes are extremely polymorphic in size, suggesting large-scale structural rearrangements. Interest in this topic then waned for two decades, probably reflecting the constraints of available technology. Before real-time PCR and microarrays, determining CN was challenging. By contrast, working with sequences was relatively easy. Malaria biologists shared the expectations of those working on other organisms that sequence data alone would explain most phenotypic variation.
The aims of this review are to summarize our current understanding of CNP in malaria parasites, highlight research that has driven this field forward and stimulate future work on this topic. Here, we (i) summarize the genomic distribution of CNP in P. falciparum, (ii) ask whether CNP influences parasite phenotype, (iii) describe what is known about the genetics and population genetics of CNP, (iv) discuss the implications of CNP for genome-wide association studies and (v) highlight questions about CNP of particular interest for Plasmodium. CNP is an emerging field in parasite genetics, so our aim is to raise questions rather than to provide answers.
Methods for detecting CNP are based on comparing relative amounts of DNA present in different parts of the genome. For example, early studies examined the strength of hybridization of pfmdr1 relative to a gene that was assumed to have a single copy in the P. falciparum genome . Current methods follow the same principle, but the scale and precision of the data generated greatly simplify such studies. Real-time PCR is the method of choice for determining CN when a small number of genes are examined in large numbers of samples. However, microarray approaches or next-generation sequencing provide a more cost-effective solution for genome-wide evaluation of CN (Box 1).
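The principle of comparing a target gene with a single-copy reference can be illustrated for real-time PCR with a short sketch. It assumes the widely used comparative threshold-cycle (ΔΔCt) calculation, a calibrator sample known to carry one copy of the target, and near-perfect amplification efficiency; the function name, parameter names and example Ct values are hypothetical and are not taken from the studies cited here.

```python
def copy_number_ddct(ct_target, ct_reference,
                     cal_ct_target, cal_ct_reference,
                     efficiency=2.0):
    """Estimate target-gene CN relative to a single-copy calibrator sample.

    ct_target / ct_reference: threshold cycles for the test sample.
    cal_ct_target / cal_ct_reference: threshold cycles for the calibrator,
    which is assumed to carry exactly one copy of the target gene.
    efficiency: fold increase per PCR cycle (2.0 assumes perfect doubling).
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = cal_ct_target - cal_ct_reference
    delta_delta = delta_sample - delta_calibrator
    # Each cycle earlier corresponds to one more doubling of template.
    return efficiency ** (-delta_delta)


# Hypothetical values: the target crosses threshold one cycle earlier
# than in the calibrator, which suggests about two copies.
estimate = copy_number_ddct(24.0, 25.0, 25.0, 25.0)
print(estimate)
```

In practice, replicate reactions and measured primer efficiencies would be needed before rounding such an estimate to an integer CN.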
Genomic DNA from two parasite lines is labeled with either Cy3 or Cy5 fluorophore dyes and equal quantities are hybridized to the microarray (two-color hybridization), consisting of unique oligonucleotide probes representing the 3D7 reference genome. Hybridization of a single parasite line (single-color hybridization) is also used on some platforms (Affymetrix). When there is no variation in sequence and/or CN, the two different samples hybridize equally and the log2 ratio of the signals from the two dyes is near zero. If a tandem duplication is present in sample A (red dye), the signal from the probes in the duplicated region will be approximately twice that of sample B (green dye) (log2 ratio = ~1) (Figure I). Similarly, a deletion in sample B would cause a displacement towards the red dye; however, the ratio would be much stronger (log2 ratio >4). In practice, CN reduction and deletion might be difficult to distinguish. Similarly, highly polymorphic genes might hybridize poorly and can be confused with deletions. An alternative analysis approach does not require a reference genome and relies on calculating average or median values for blocks of probes.
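The log2-ratio logic described above can be sketched in a few lines. This is a minimal illustration, not the algorithm used by any particular analysis package: the probe intensities are invented, the block size is arbitrary and the thresholds in `classify` are hypothetical cut-offs chosen only to separate a ratio near 0 from a ratio near 1.

```python
import math
from statistics import median


def log2_ratios(intensity_a, intensity_b):
    """Per-probe log2(A/B) for two-color hybridization signals."""
    return [math.log2(a / b) for a, b in zip(intensity_a, intensity_b)]


def block_medians(values, block_size):
    """Median of consecutive, non-overlapping blocks of probes."""
    return [median(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def classify(log2_ratio, gain=0.6, loss=-0.6):
    """Label a block using hypothetical thresholds (assumptions)."""
    if log2_ratio >= gain:
        return "more signal in A"
    if log2_ratio <= loss:
        return "more signal in B"
    return "no difference"


# Invented intensities: the last four probes are duplicated in sample A.
sample_a = [1000, 1000, 2000, 2000, 2000, 2000]
sample_b = [1000, 1000, 1000, 1000, 1000, 1000]

blocks = block_medians(log2_ratios(sample_a, sample_b), block_size=2)
labels = [classify(value) for value in blocks]
print(blocks, labels)
```

Summarizing blocks of probes by their median, rather than calling each probe separately, reduces the influence of single probes that hybridize poorly because of sequence polymorphism.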
A variety of algorithms are available for analyzing CNP from microarray data, implemented in software such as NimbleScan and Nexus Copy Number 3.0 (BioDiscovery, Inc.; El Segundo, California). When genomic DNA is limiting, whole genome amplified material can also be used with minimal loss of resolution.
Microarrays were originally printed on glass slides. Chip-based synthesis of oligo probes provides better quality control, and companies such as Affymetrix and Nimblegen-Roche use different synthesis strategies to generate high-density microarrays containing millions of features. An attractive feature of Nimblegen-Roche and Agilent arrays is that single custom arrays can be ordered, tested and redesigned iteratively, giving researchers control over the design process. The high density of Affymetrix chips (>6 million probes) provides advantages for some applications (Table I).
Short-read sequencing on platforms such as the Illumina Genome Analyzer has many potential advantages for CNP detection and could replace microarrays as the method of choice. In this case, CN can be inferred from read depth comparisons, rather than hybridization. Paired-end reads enable identification of chromosome breakpoints and determination of the orientation and position of gene copies in the genome [62,63]. Furthermore, deletions can be distinguished from highly polymorphic genes and additional DNA can be detected that is not present in the reference sequence. Finally, SNPs can be directly sequenced on the same platform, rather than indirectly inferred from hybridization.
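Inference of CN from read depth can be sketched as follows. The example assumes that reads have already been aligned and counted in fixed windows, that most of the genome is present at a single copy (so that the median window depth represents one copy) and that biases such as base composition have been corrected; the function name and depth values are hypothetical.

```python
from statistics import median


def copy_number_from_depth(window_depths, baseline_copies=1):
    """Estimate integer CN per window from read depth.

    The median depth across all windows is taken to represent
    `baseline_copies` (an assumption that holds only if most
    windows are not amplified or deleted).
    """
    baseline_depth = median(window_depths)
    if baseline_depth <= 0:
        raise ValueError("median depth must be positive")
    return [round(baseline_copies * depth / baseline_depth)
            for depth in window_depths]


# Invented depths: two windows at about twice the typical depth
# and one window with no reads.
depths = [50, 52, 48, 101, 99, 51, 0]
estimates = copy_number_from_depth(depths)
print(estimates)
```

A window with zero depth is reported as zero copies here, but as noted above, paired-end information or the reads themselves would be needed to separate a true deletion from a highly divergent sequence.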
Three microarray surveys represent our current understanding of the distribution of CNP in Plasmodium. Kidgell et al.  used an Affymetrix array containing 298 752 25-mer oligos to examine CNP in 14 single-clone parasites from four continents. Ribacke et al.  used glass slides printed with 6850 70-mer oligos to examine nine parasite isolates. Jiang et al.  used an Affymetrix array containing 2.56 million 25-mer probes (http://www.sanger.ac.uk/Projects/P_falciparum/news.shtml) tiled across the genome to examine four parasite isolates. The data from these studies [13–15] were collected on different microarrays using different parasite isolates and differing statistical criteria. Nevertheless, they provide a qualitative overview of CNP in P. falciparum (Figure 1).
In general, there is good agreement between the studies. For example, Jiang et al. identified 181 amplified genes, of which 74 (41%) had been reported previously [13,14]. Kidgell et al. described an average of 20.5 genes (range 1–63) different in CN between isolates, equivalent to 72 kb (range 0.8–251 kb), whereas Ribacke et al. found that 29 genes (range 6–54) differ between parasites, equivalent to 130 kb (34–282 kb). The data are broadly consistent and suggest that 0.3–1.0% of the genome differs in CN between parasites. These data correspond well to patterns of CNP in the human genome: McCarroll et al. found that, on average, 5.9 Mb or 0.2% of the ~3000 Mb genome differed in CN among individuals.
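The percentages quoted above follow from dividing the amount of DNA that differs in CN by genome size. The sketch below reproduces that arithmetic. The P. falciparum genome size of about 23 Mb is an assumption introduced here (it is not stated in the text); the human figures are those given above.

```python
def percent_of_genome(variable_kb, genome_kb):
    """Percentage of the genome that differs in CN."""
    return 100.0 * variable_kb / genome_kb


# Assumption: P. falciparum nuclear genome of roughly 23 Mb.
PF_GENOME_KB = 23_000
HUMAN_GENOME_KB = 3_000_000  # ~3000 Mb, as stated in the text

kidgell_mean = percent_of_genome(72, PF_GENOME_KB)
ribacke_mean = percent_of_genome(130, PF_GENOME_KB)
human_mean = percent_of_genome(5_900, HUMAN_GENOME_KB)

for label, value in [("Kidgell et al. mean", kidgell_mean),
                     ("Ribacke et al. mean", ribacke_mean),
                     ("McCarroll et al. mean", human_mean)]:
    print(f"{label}: {value:.2f}%")
```

Under this assumed genome size, the two mean values for P. falciparum fall within the 0.3–1.0% range given in the text.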
Most of the parasites examined in these three studies had been adapted to long-term culture. One concern is that the CNPs observed might have originated in the laboratory and might not be representative of naturally occurring parasites. The two Ugandan field samples included in the study by Ribacke et al. show limited CNP, although further data are required to determine whether these are representative.
In addition to CNPs on chromosomes 5 and 12 (see below), numerous other CNPs with potential impact on parasite phenotype were observed. All three studies identified a CNP on chromosome 4 containing pfRH1 (PFD0110w), a reticulocyte-binding-like protein. Amplification of this locus is correlated with overexpression and mediates sialic-acid-dependent invasion of erythrocytes. Intriguingly, the two parasites with high CN in the dataset of Ribacke et al. also show the highest growth rates in culture, consistent with a functional relationship. CNPs containing genes that potentially influence other phenotypes (including sexual differentiation, cell-cycle regulation and metabolism) are also observed, providing new targets for functional studies.
Because CNP alters gene dosage, we might expect it to alter levels of gene expression. This is clearly the case for genes in the chromosome 12 amplicon that segregates in the genetic cross. Progeny inheriting multiple copies of this amplicon from a multidrug-resistant parent (Dd2) express these genes at higher levels than progeny inheriting a single copy from the drug-sensitive (Hb3) parent at all 14 genes  (Figure 2a). Similarly, expression scales with CN in samples carrying different CNPs at the GTP-cyclohydrolase 1 (gch1) locus . More surprisingly, the chromosome 5 amplicon results in upregulation of multiple genes elsewhere in the parasite genome (Figure 2b). Gonzalez et al.  examined quantitative trait loci (QTLs) for expression variation in the progeny of the Dd2 × Hb3 genetic cross. The most prominent trans-regulatory locus, influencing 269 transcripts, coincides with the chromosome 5 amplification event carrying pfmdr1 and 13 other genes. 85% of transcripts (228/269) were upregulated in the Dd2 parent (or progeny) carrying the Dd2 alleles at these loci. These data demonstrate how drug selection on the Dd2 parental clone led not only to a CN change in the pfmdr1 gene but also to increased copies of putative neighboring regulatory factors that fundamentally alter this parasite’s transcriptional network.
Direct evidence for fitness costs of CNP comes from laboratory competition experiments: two mefloquine-selected clones carrying elevated pfmdr1 CN showed multiplication rates that were 6–9% lower than the sensitive clone from which they were derived, in the absence of selection. Indirect evidence of fitness costs comes from work on the population genetics of CNP. Although large regions spanning >100 kb of chromosome 5 containing pfmdr1 were initially observed to be amplified in Southeast Asia, recent studies have revealed that the amplicons present are considerably smaller, ranging from 15 to 49 kb with the predominant amplicon type measuring 16 kb. These data suggest that either amplified regions have been reduced in size or newly arisen CNPs with small amplicons have outcompeted large deleterious amplicons. At the gch1 locus on chromosome 12, parasites bearing amplicons measuring >11, 8.7 and 2.3 kb have the same genetic background, suggesting that progressive streamlining of amplicons has been operating. In this example, minimal amplicons containing only the gch1 locus predominate in Thailand, which is consistent with strong selection for small amplicons.
CNPs that are under positive selection are of particular interest because they play an important part in parasite survival and adaptation. We can utilize patterns of genetic variation to provide an indirect approach to identify CNPs that are under positive selection and, therefore, have a functional role in parasite biology. To illustrate this approach, we summarize work on CN variation at the gch1 gene on chromosome 12. This gene encodes the enzyme that catalyzes the first step in the folate biosynthesis pathway. Enzymes downstream in this pathway are targeted by the antifolate compounds sulfadoxine and pyrimethamine. Kidgell et al. first described this CNP and speculated that increased CN at gch1 could be selected either directly or indirectly by treatment with antifolate drugs. Patterns of genetic variation in natural parasite populations strongly support these conclusions (Figure 3). Nair et al. compared gch1 CNPs in parasites from Thailand (strong historical antifolate selection) with those from neighboring Laos (weak antifolate selection). Although 72% of chromosomes carried multiple (2–11) copies in Thailand, just 2% had amplified CNs in Laos. The high level of geographical differentiation exceeded that observed at 73 synonymous SNPs, strongly suggesting the action of selection. Furthermore, microsatellite variation was reduced and linkage disequilibrium (LD) increased in a 900-kb region flanking gch1 in parasites from Thailand, consistent with rapid recent spread of chromosomes carrying multiple copies of gch1. Two other features of the data also strongly suggest natural selection. There were five amplicon types containing 1 to >6 genes and spanning 1–11 kb. These were found on three different genetic backgrounds, consistent with parallel evolution of this CNP.
Finally, parasites bearing dhfr-164L, which causes high-level resistance to antifolate drugs, carry significantly (P = 0.00003) higher CN of gch1 than parasites bearing 164I do, indicating functional association between genes located on different chromosomes but linked in the same biochemical pathway. It is still unclear whether direct selection by antifolate drugs or indirect compensation for mutations elsewhere in the parasite genome is the selective force driving CN change.
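Geographical differentiation of the kind described for gch1 in Thailand and Laos is commonly summarized with the fixation index FST. The sketch below is an illustration only and is not the calculation used in the cited study: it collapses CN into two classes (amplified versus single copy), weights the two populations equally and ignores sample size, all of which are simplifying assumptions.

```python
def fst_biallelic(freq_pop1, freq_pop2):
    """Wright's FST for one biallelic locus in two equally weighted populations.

    freq_pop1 / freq_pop2: frequency of one allele class (here, the
    proportion of chromosomes carrying an amplified locus).
    """
    mean_freq = (freq_pop1 + freq_pop2) / 2
    # Expected heterozygosity in the pooled population.
    h_total = 2 * mean_freq * (1 - mean_freq)
    # Mean expected heterozygosity within populations.
    h_within = (2 * freq_pop1 * (1 - freq_pop1)
                + 2 * freq_pop2 * (1 - freq_pop2)) / 2
    if h_total == 0:
        return 0.0  # locus is monomorphic in both populations
    return (h_total - h_within) / h_total


# Frequencies of amplified gch1 from the text: 72% (Thailand), 2% (Laos).
gch1_fst = fst_biallelic(0.72, 0.02)
print(round(gch1_fst, 2))
```

A value obtained this way would be compared with the distribution of FST at putatively neutral markers, such as the synonymous SNPs mentioned above, to judge whether differentiation at the CNP is unusually high.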
For pfmdr1 (chromosome 5), there is direct evidence that CNP is driven by drug pressure. Field surveys show strong associations between high CN and increased drug resistance to mefloquine, quinine and artemisinin, and lower resistance to chloroquine [23,24]. These data are consistent with selection experiments in which CNP is amplified under mefloquine selection and de-amplified under chloroquine selection [8,25]. Finally, elegant transfection experiments demonstrate that reduction of CN results in higher resistance to chloroquine and lower resistance to mefloquine, halofantrine and artemisinin [26,27]. Recently, CNP has been demonstrated in the Plasmodium vivax homologue pvmdr1, where it is also associated with multidrug resistance [28,29]. Interestingly, CNP involving pcmdr1 underlies mefloquine resistance in the rodent malaria Plasmodium chabaudi, but gene copies in this case are found on different chromosomes, rather than arranged in tandem .
Other CNPs might also be involved in drug resistance. In laboratory drug-selection experiments, Jiang et al. observed deletions of 15 genes on chromosome 10 that might contain loci involved in adaptation to drug treatment. Dharia et al. generated parasite clones resistant to fosmidomycin, an inhibitor of isoprenoid synthesis. Two independently derived genetic mutants showed approximately threefold amplification of a ~100 kb region containing 23 genes, including the gene encoding the enzyme targeted by this drug. Similarly, parasites showing 200-fold resistance to cysteine protease inhibitors have been selected in the laboratory; these parasites showed five- to sixfold amplification of a chromosome 11 region containing falcipain-2 and -3. Up to 44-fold amplification of genome regions on chromosome 4 containing dhfr has been observed in two drug-selection experiments [34,35]. In nature, specific SNPs confer pyrimethamine resistance, although few field surveys have examined CNP at this locus. One possibility is that CNP is a transient response to selection at this locus and that mutation in one of the copies subsequently enables deamplification. An attractive feature of this model is that amplification raises the mutation rate at the amplified locus. Similar amplification–mutation–deamplification models [36,37] might explain the claims of ‘directed’ mutation in bacterial systems.
Culture adaptation results in genetic changes in the parasite genome, just as domestication alters the genomes of crops and farm animals. There have been no systematic attempts to understand genome-wide changes occurring during culture adaptation, but the available data suggest that substantial CN change occurs. Subtelomeric deletions on chromosomes 2 and 9 repeatedly arise during culture adaptation. The chromosome 2 deletion includes the knob-associated histone-rich protein gene and is associated with loss of cytoadherence, whereas the chromosome 9 event involves loss of erythrocyte membrane protein genes and influences both cytoadherence and gametocytogenesis [10,38]. These deletions are evident in the microarray surveys [13,14,39] and sound a caution that studies of population genetics of CN should use material directly from patients to avoid spurious results associated with laboratory adaptation. Microarray analyses of clones derived from parasite clone P1B5 enable fine mapping of the functional genes. Deletions on chromosome 9 containing three genes and associated with gametocytogenesis have been previously described [38,40,41]. In the gametocytogenesis-defective clones analyzed by Carret et al., only the breakpoint open reading frame gene is deleted, which narrows the range of candidate loci for this phenotype.
In mammals, CNPs have high mutation rates. Estimates range from 1.1 × 10⁻² to 3.6 × 10⁻³ per generation in inbred mouse lineages, 2 × 10⁻⁴ per generation in human Y chromosomes and 10⁻⁴ per generation for genetic disorders involving structural change. These estimates are up to 5–7 orders of magnitude higher than estimates of the SNP mutation rate (~10⁻⁹). Measurement of asexual mutation rate for Plasmodium CNP could be achieved using mutation accumulation lines or by examining the rate of evolution of particular CNPs underlying drug resistance in selection experiments. Using the second approach, Preechapornkul et al. estimated that pfmdr1 duplications arise once in 10⁸ parasites during laboratory selection experiments and amplification from two to three copies occurred in one in 1000 parasites. Similarly, Jiang et al. examined genome-wide changes in CNP arising during selection of laboratory cultures. They found deletions of 15 genes on chromosome 10 and amplification of chromosome 5 regions containing pfmdr1 during drug-selection experiments. Each experiment involved ~5 × 10⁸ parasites, so the upper limit for the mutation rate is 2 × 10⁻⁹.
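The rate figures in this paragraph rest on simple division, which the sketch below makes explicit. It assumes that a single CN change is counted per selection experiment and that every parasite in the culture had an equal chance of carrying it; the function name is hypothetical.

```python
import math


def events_per_parasite(events, parasites_screened):
    """Frequency of a CN change per parasite screened."""
    return events / parasites_screened


# One event among ~5 x 10^8 parasites per selection experiment.
selection_bound = events_per_parasite(1, 5e8)

# pfmdr1 duplication: once in 10^8 parasites.
duplication_rate = events_per_parasite(1, 1e8)

# Further amplification from two to three copies: one in 1000 parasites.
reamplification_rate = events_per_parasite(1, 1000)

# How many orders of magnitude separate the two pfmdr1 steps?
orders_apart = math.log10(reamplification_rate / duplication_rate)

print(f"{selection_bound:.0e}", f"{duplication_rate:.0e}", orders_apart)
```

The comparison highlights that, on these estimates, the initial duplication is the rare step, whereas further amplification of an existing duplication is about five orders of magnitude more frequent.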
Chromosome breakage sites for the pfmdr1 amplicon (chromosome 5) and the gch1 amplicons (chromosome 12) commonly contain long monomeric A/T tracts or microsatellite repeat sequences [19,22,47]. In the case of the pfmdr1 amplicon, the sites involved in breakage have longer arrays of A/T monomers than elsewhere in the genome, suggesting that such sites are prone to chromosome breakage. These data parallel findings from Drosophila and other organisms and support a mechanism of slip-strand mispairing during DNA replication. It is not clear whether CNP arises in nature during asexual mitotic replication in blood-stage parasites or during meiosis after gamete fusion in the mosquito midgut. The fact that CNP has been documented during culture adaptation and in numerous selection experiments clearly demonstrates that CN mutation can occur during asexual growth. Analysis of CNPs occurring in laboratory crosses should also be possible and will reveal whether meiotic division is also an important source of CNPs. The dynamics of CNPs within individual infections are also poorly understood. Analyses of CNP by microarray or real-time PCR methods examine populations of parasites and provide estimates of the average CN at any locus. In reality, if asexual mutation rate is high, we might expect parasites within the bloodstream to show a spectrum of CNP at any particular locus. Cloning of parasites from single infections or direct visualization of CN on individual chromosomes by methods such as fiber fluorescent in situ hybridization (FISH) could aid understanding of the dynamics of CNP within infections.
SNP mutation is generally considered to be unidirectional, with a very low probability of reversion to the ancestral state. By contrast, CN is expected to both increase and decrease, and reversion to single-copy state can occur. Reversion to single-copy state provides the simplest explanation for patterns of variation observed around well-studied CNPs on chromosomes 5 and 12 [19,22]. In both cases, microsatellite haplotypes flanking tandemly repeated amplicons are also commonly found on chromosomes with single-copy status. In these two examples, there is clear evidence for multiple independent origins of amplification. In the case of pfmdr1, 15 different amplicon types are observed in parasites sampled from a single clinic, and these fall into five groups based on flanking microsatellite markers  (Figure 4). Hence, there are between 5 and 15 independent origins of CN amplification. In the case of gch1, five different amplicons with at least three different flanking haplotypes were found, consistent with a minimum of three amplification events . An important point is that amplicons of different size might have the same origin and amplicons of the same size might have different origins. Therefore, amplicon size data should be treated with caution when investigating the number of independent origins of a particular CNP . Similar complex patterns of recurrent amplification have been observed in both bacteria and inbred mouse lineages [42,51].
There is currently considerable interest in mapping genes underlying important phenotypic traits such as virulence and drug resistance. Following the model of human association mapping, dense SNP maps are being constructed by resequencing parasites from different geographical regions [53,54]. There are several concerns about the efficiency of using SNP maps to detect phenotypic variation resulting from CNPs. First, genome regions containing CNPs might be underrepresented on SNP maps because SNPs in these regions can show unusual segregation. For example, if a SNP differs in state on two different copies of an amplicon, the SNP will appear heterozygous and will not be assayed. Similarly, SNPs occurring in genome regions containing polymorphic deletions will be inconsistently scored and will be removed from genotyping panels. Second, CNPs might show weak linkage disequilibrium with flanking SNPs. This can result from the rapid mutation of CNP and from frequent reversion to single-copy status. Furthermore, phenotypes determined by CNP might show multiple evolutionary origins, once again reducing the power to detect association with flanking SNPs. The extent to which SNPs can tag CNP is not known. However, the ease with which CNP can be directly determined suggests that optimal strategies for locating underlying phenotypic variation in malaria parasites should involve measurement of both CN and SNP variation, rather than reliance on LD with flanking SNPs. Fortunately, the same microarray platforms can be used to document both SNP and CNP variation [13,16,56]. A strong argument for direct genotyping of CN mutations, rather than reliance on linkage disequilibrium with flanking SNPs, stems from the fact that comparative genomic hybridization (cGH) studies can rapidly identify genes that might be functional. The example of gch1 illustrates this point. Kidgell et al. observed extensive amplification around this locus and were intrigued because this locus encodes the first enzyme in the folate pathway. Subsequent work has shown strong evidence that this locus is under selection, most likely as a consequence of treatment with antifolate drugs. Hence, cGH on a limited number of isolates enabled rapid identification of a CNP that has been selected by drug treatment.
We predict that CNP will be found to underlie a number of important phenotypes in malaria parasites in coming years. An exciting possibility is that complex phenotypes such as virulence might result from a constellation of highly mutable CNP changes, as envisaged for some complex diseases in humans. Understanding the functional role of CNP will require development of efficient methods for manipulation of CN experimentally. Inserting additional gene copies using ‘piggyback’ vectors is one possible approach, although overexpression of single genes using strong promoters, such as hrp3 or hsp86, could be used to mimic dosage effects of gene amplification. The availability of genome sequences for parasites infecting rodents and primates will enable CNP studies to be extended to multiple malaria parasite species, improving understanding of CNP evolution [59,60]. In particular, further information is required on the organization of CNP within the genome. CNP is enriched in regions of segmental duplications in mammalian genomes: it will be of interest to see whether similar patterns are observed in Plasmodium. Next-generation sequencing methods, in combination with older methods such as FISH and pulsed-field gel electrophoresis, should complement microarray approaches by revealing whether CNPs are arranged as tandem or inverted repeats or translocated to different sites in the genome. We expect that future studies of CNP in Plasmodium will prove to be both biologically interesting and biomedically important.
This work is funded by NIH R01 AI075145 and AI48071 (T.J.C.A.) and AI055035 (M.T.F.). We thank John and Asako Tan and Shalini Nair for comments and assistance in preparing figures.