This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
A number of molecular tools have been developed to monitor the emergence and spread of resistance of Plasmodium falciparum to anti-malarial drugs. One of the major obstacles to the wider implementation of these tools is the absence of practical methods enabling high-throughput analysis. Here a new Zip-code array called FlexiChip is described, linked to a dedicated software program, which largely overcomes this problem.
Previously published microarray probes detecting single-nucleotide polymorphisms (SNP) associated with parasite resistance to anti-malarial drugs (ResMalChip) were adapted for a universal microarray FlexiChip format. To evaluate the overall sensitivity of the FlexiChip package (microarray + software), the results of FlexiChip were compared to ResMalChip microarray, using the same extension probes and with the same PCR products. In both cases, sequence results were used as gold standard to calculate sensitivity and specificity. FlexiChip results obtained with a set of field isolates were then compared to those assessed in an independent reference laboratory.
The FlexiChip package gave results identical to the ResMalChip results in 92.7% of samples (kappa coefficient 0.8491, with a standard error 0.021) and had a sensitivity of 95.88% and a specificity of 97.68% compared to the sequencing as the reference method. Moreover the method performed well compared to the results obtained in the reference laboratories, with 99.7% of identical results (kappa coefficient 0.9923, S.E. 0.0523).
Microarrays could be employed to monitor P. falciparum drug resistance markers with greater cost-effectiveness and with the possibility of high-throughput analysis. The FlexiChip package is a promising tool for use in resource-poor settings of malaria-endemic countries.
Anti-malarial drugs play a pivotal role in malaria control, but only a limited number of new drugs are under development. Resistance of malaria parasites to commonly used anti-malarial drugs is also a global challenge. Thus, there is a need to optimize the use of existing treatments and to monitor the emergence and spread of drug-resistant malaria parasites, in particular Plasmodium falciparum, which is responsible for the vast majority of malaria deaths [1-4]. Typing the known genetic markers of drug resistance is among the strategies currently used for monitoring resistance of P. falciparum. Single nucleotide polymorphisms (SNPs) related to anti-malarial drug resistance are found in five major genes: Pfdhfr and Pfdhps for pyrimethamine and sulphadoxine resistance, Pfcrt and Pfmdr1 for chloroquine resistance and, more recently but not yet confirmed by field studies, serca/atpase6 for artemisinin resistance. Different molecular tools have been developed, including the PCR-RFLP method [5-8], real-time PCR for assessing gene copy number, sequence analysis, the heteroduplex tracking assay, and PCR amplification of the SNP-containing fragments followed by single base extension (SBE) of an elongation primer with fluorescent ddNTPs.
DNA microarray-based SNP genotyping has developed considerably in recent years. Surveying SNPs is an important tool in epidemiological studies on parasite resistance, but the currently available methods to identify resistance all have important drawbacks, including a focus limited to the five genes mentioned above and the absence of a high-throughput format. Several systems have been proposed [13,14], mainly based on PCR amplification of the SNP-containing fragments followed by SBE of an elongation primer with fluorescent ddNTPs. Recently, a genotyping array called ResMalChip has been developed to monitor 34 SNPs in five genes of P. falciparum that either confer or increase resistance to anti-malarial drugs. The ResMalChip method also has two major drawbacks. First, the content of the microarray is designed only for a specific objective (typing SNPs related to resistance) in a specific organism. Therefore, using this microarray to survey other SNPs in any other gene or organism requires a new array design and production. This implies a large number of tests to adapt the system to new markers: the capture oligonucleotides must be designed to be specific to the elongation primers, and the experimental hybridization conditions must be compatible with all pairs of capture oligonucleotide and elongation primer. Second, no standard software has been developed for data analysis. A new Zip-code array called FlexiChip, associated with dedicated software, has been designed to address these problems (see Figure 1). This array contains oligonucleotides (Zip-codes) that are not complementary to any sequence in any known organism and have been designed to have the same thermodynamic properties. Therefore, various assays can be performed using a single protocol from SNP discovery to hybridization. The target probes contain the sequences complementary to the Zip-codes, linked to the elongation primers. FlexiChip can in principle be used to test any SNP.
Furthermore, an analysis algorithm based on a mixture model and allowing accurate SNP identification has been developed. This algorithm does not require any prior threshold determination and provides results in a simple Excel file format. To evaluate the overall sensitivity of the FlexiChip package (microarray + software), ResMalChip and FlexiChip data sets were analysed with this software and tested against sequence analysis. FlexiChip results were then compared to results obtained with ResMalChip using the same PCR products. For a set of 50 field isolates, FlexiChip results were also compared to those obtained in another molecular laboratory (MORU) acting as external quality control.
As part of the activities related to the assessment of drug efficacy in uncomplicated malaria in Cambodia, 263 P. falciparum isolates were collected from consenting patients from 2001 to 2004. Blood samples were kept at -20°C at the Institut Pasteur du Cambodge until use. DNA was extracted from blood samples using the QIAamp® DNA Mini Kit (Cat. No. 51306, QIAgen®, Germany), according to the manufacturer's procedure. All studies were conducted following good clinical practice, and ethical clearance was obtained from the National Ethics Committee of Cambodia.
A Zip-code as used in the present study is defined as an artificial sequence composed of 24 bases. All the Zip-codes have similar melting temperature (Tm) values. A set of 96 24-mer Zip-code oligonucleotides was designed using a dedicated algorithm developed in Visual Basic (Microsoft®) (see Table S1 in Additional file 1). They were tested to avoid self-pairing and hairpin formation (FastPCR, Institute of Biotechnology, University of Helsinki). Reverse complement oligonucleotides (cZip) were synthesized with an amino C-7 linker at the 3' end, used for attachment to the slide. The cZip were then spotted onto aldehydesilane-coated slides with a 12-well format (AL MPX slides, Schott) using a VersArray ChipWriter Pro system (Bio-Rad Laboratories, Hercules, CA). For spotting, the cZip were resuspended at 50 μM in phosphate buffer. The FlexiChip spotting pattern of the 96 cZip, together with the Cy3 and Cy5 prelabelled anchor oligonucleotides and six negative controls, is presented in Table S2A in Additional file 2. Each oligonucleotide was spotted in triplicate. A total of 12 independent hybridizations can be performed in parallel on a single slide. ResMalChip arrays were produced as described previously.
To evaluate possible cross-hybridization between Zip-codes, each Zip-code associated with its primer was labelled using the SBE protocol (see below) and hybridized individually on the microarray. Cross-hybridization was considered significant when the average fluorescent signal intensity of the spots of a non-tested Zip-code was above 10% of the average positive signal of the tested Zip-code spots. This was observed for only two of the 96 Zip-codes, and these two Zip-codes were discarded from the analysis (see Table S1 in Additional file 1).
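The 10% criterion can be illustrated with a short sketch. Python is used here purely for illustration; the function name, the data layout (a dictionary of replicate intensities per Zip-code) and the intensity values are hypothetical and are not taken from the study.

```python
from statistics import mean

def cross_hybridizing_zipcodes(signals, tested, threshold=0.10):
    """Return the non-tested Zip-codes whose mean signal exceeds
    `threshold` times the mean signal of the tested Zip-code.

    signals: dict mapping Zip-code id -> list of replicate spot intensities
    tested:  id of the single Zip-code that was labelled and hybridized
    """
    positive = mean(signals[tested])
    return sorted(
        zip_id
        for zip_id, spots in signals.items()
        if zip_id != tested and mean(spots) > threshold * positive
    )

# Hypothetical intensities for one single-Zip-code hybridization (triplicates).
example = {
    1: [20000, 21000, 19000],  # tested Zip-code, mean 20000
    2: [500, 600, 400],        # mean 500, below 10% of 20000
    3: [2500, 2600, 2400],     # mean 2500, above 10% of 20000
}
flagged = cross_hybridizing_zipcodes(example, tested=1)
print(flagged)
```

Running one such check per tested Zip-code and pooling the flagged identifiers would give the list of Zip-codes to discard.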
Ten nested PCR were used to amplify DNA sequences of genes including SNPs associated with drug resistance as reported previously: pfmdr1 N86Y, Y184F, S1034C, N1042D and D1246Y (two nested PCR); pfcrt C72S, K76T, H97Q, T152A, S163R, A220S, Q271E, N326D/S, I356L/T and R371I (five nested PCR); pfdhfr A16V, N51I, C59R, S108N/T and I164L (one nested PCR); pfdhps S436A, A437G, K540E, A581G, A613T/S, I640F, and H645P (one nested PCR); and pfATPase6 S538R, Q574P, A623E, N683K, and S769N (one nested PCR).
Remaining free dNTPs were removed using shrimp alkaline phosphatase (SAP). Briefly, 5 μl of the ten nested PCR products were mixed with 2 U of SAP (Amersham Biosciences, Freiburg, Germany) and incubated for 1 h at 37°C. For each sample, two reactions were performed using two combinations of Cy3- and Cy5-labelled ddNTPs (Perkin Elmer, Schwerzenbach, Switzerland). The Sequenase (Termipol®, Solis, Tartu, Estonia) extension reaction, reaction mixture and final denaturation were done for ResMalChip and FlexiChip as described by Crameri et al. Extended primers with cyanine labelling were hybridized onto the microarray. With this experimental design on FlexiChip, two samples can be processed per spotting area. As 40 positions are needed per sample, one set of extension primers can be associated with Zip-codes 1 to 40 while the second set can be associated with Zip-codes 49 to 88 (the remaining positions 41 to 48 and 89 to 96 were not used).
Briefly, extended primers associated with a Zip-code were resuspended in 6 μl of 20 × SSC (1 × SSC = 0.15 M NaCl, 0.015 M sodium citrate, pH 7.2) and hybridized on the array. Microarrays were then incubated for 60 min at 50°C in a humid chamber and subsequently washed in 2 × SSC and 0.2% SDS for 20 min and in 2 × SSC for 20 min. Microarrays were spun for 5 min at 3000 g to dry. During hybridization, extended primers linked to their specific Zip-code hybridized to the FlexiChip cZip pattern (see Table S2B in Additional file 2).
Hybridized microarrays were scanned at 635 nm and 532 nm using an Axon 4100A fluorescence scanner (Axon, Bucher Biotec AG, Basel, Switzerland) and Axon GenePix® Pro (version 6.0) software. The PMT (photomultiplier tube) setting was 550 at 532 nm and 500 at 635 nm.
All the data analyses were performed using the R software and packages. The allele identification algorithm was written in R and was applied independently to each array. The aim of this algorithm is to classify each spot of the array into one of the "green", "red" or "indeterminate" classes, and then to convert the spot colour into the corresponding SNP sequence. ResMalChip and FlexiChip raw data were first corrected for background using the limma package (version 2.12.0) according to a two-step procedure. A modified version of the "movingmin" option of the background correction function (called "bgCorrect") was first applied to the data. This option smooths the background on the basis of a 3 × 3 moving window but, unlike the original version, the modified version does not subtract the smoothed background. Then the normexp procedure was applied. According to this procedure, the observed signal is modelled as the convolution of a true signal and a background signal, where the true signal follows an exponential distribution and the background follows a Gaussian distribution.
This two-step process was derived because of a high background level observed with respect to the signal, especially on ResMalChip data. Spots that still had a signal to noise ratio lower than one after background correction were flagged "bg" (where "bg" stands for "background").
Data from negative and positive control spots were then excluded from the data set. An intensity threshold IT was computed on the remaining spots for each slide as the median of pooled "red" and "green" intensities. A log2 ratio of the "red" intensity over the "green" intensity was computed for each of the 1440 (three replicates of 40 SNPs spots for 12 samples) remaining spots.
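As an illustration of this step, the sketch below computes the slide threshold IT and the log2 ratios in Python rather than in the R code actually used. The function names and the intensity values are hypothetical, and intensities are assumed to be strictly positive after background correction.

```python
import math
from statistics import median

def intensity_threshold(red, green):
    """IT: median of the pooled "red" and "green" intensities of one slide."""
    return median(list(red) + list(green))

def log2_ratios(red, green):
    """log2 of the "red" intensity over the "green" intensity, spot by spot."""
    return [math.log2(r / g) for r, g in zip(red, green)]

# Hypothetical background-corrected intensities for four spots.
red = [4000.0, 250.0, 8000.0, 300.0]
green = [500.0, 2000.0, 1000.0, 280.0]

it = intensity_threshold(red, green)   # median of the eight pooled values
ratios = log2_ratios(red, green)       # positive -> "red", negative -> "green"
print(it, ratios)
```

A spot with a strongly positive ratio leans towards the "red" class and a strongly negative one towards the "green" class; ratios near zero are the ones the mixture model has to arbitrate.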
A two-component Gaussian mixture model was fitted to each dataset, using a different procedure for ResMalChip and FlexiChip. For the ResMalChip dataset, the two-component Gaussian mixture model was computed using the Mclust function from the mclust package with the modelNames parameter set to "E" (Gaussian functions with the same variance). These two estimated Gaussian functions are estimates of the class-conditional probability density functions f(x/ω1) and f(x/ω2) that describe the distribution of log ratios within the classes ω1 and ω2 (these two classes are respectively associated with "green" and "red" spots). For the FlexiChip dataset, the model was built in two steps. A first optimal mixture model was computed using the Mclust function with default parameters (modelNames = c("E","V")). In most cases a mixture model with three or more components was obtained. To get a two-component mixture model, these components were grouped according to the sign (positive or negative) of their mean, and a mixture model was derived from each of these two groups. These two "sub"-models were then used as estimates of the class-conditional probability density functions (see Figure 2).
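To make the equal-variance case concrete, the following sketch fits a one-dimensional, two-component Gaussian mixture with a shared variance by a plain expectation-maximization loop. It is not the mclust implementation and does not reproduce the two-step FlexiChip procedure; the function name, the initialization and the log-ratio values are assumptions made for illustration.

```python
import math

def fit_two_gaussians_equal_variance(x, n_iter=200):
    """EM for a 1-D mixture of two Gaussians sharing one variance.

    Returns (means, variance, weights), component 0 being the lower one.
    """
    xs = sorted(x)
    half = len(xs) // 2
    # Initial means: averages of the lower and upper halves of the data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    grand = sum(xs) / len(xs)
    var = sum((v - grand) ** 2 for v in xs) / len(xs)
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each point. The Gaussian
        # normalizing constants cancel because the variance is shared.
        resp = []
        for v in x:
            d0 = weights[0] * math.exp(-((v - mu[0]) ** 2) / (2 * var))
            d1 = weights[1] * math.exp(-((v - mu[1]) ** 2) / (2 * var))
            resp.append(d0 / (d0 + d1))
        # M-step: update weights, means and the shared variance.
        n0 = sum(resp)
        n1 = len(x) - n0
        mu = [sum(r * v for r, v in zip(resp, x)) / n0,
              sum((1 - r) * v for r, v in zip(resp, x)) / n1]
        var = sum(r * (v - mu[0]) ** 2 + (1 - r) * (v - mu[1]) ** 2
                  for r, v in zip(resp, x)) / len(x)
        var = max(var, 1e-12)  # guard against a degenerate zero variance
        weights = [n0 / len(x), n1 / len(x)]
    return mu, var, weights

# Hypothetical log2 ratios: five "green" spots near -3, five "red" spots near +3.
sample = [-3.2, -2.9, -3.1, -2.8, -3.0, 3.1, 2.9, 3.0, 2.7, 3.3]
mu, var, weights = fit_two_gaussians_equal_variance(sample)
print(mu, var, weights)
```

On well-separated data such as this, the two fitted means settle on the centres of the negative and positive clusters, which then play the role of the "green" and "red" class densities.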
The remainder of the base-calling algorithm was then identical for the datasets of both chips. Conditional posterior probabilities P(ω1/x) and P(ω2/x) were computed according to Bayes' theorem:
P(ωi/x) = f(x/ωi) · P(ωi) / [f(x/ω1) · P(ω1) + f(x/ω2) · P(ω2)], i = 1, 2
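A minimal numerical sketch of this formula is given below, assuming Gaussian class densities. The means, standard deviations and prior probabilities are hypothetical values chosen for illustration, not estimates from the study.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posteriors(x, params, priors):
    """Return [P(w1/x), P(w2/x)] by Bayes' theorem.

    params: [(mean, sd) of class w1, (mean, sd) of class w2]
    priors: [P(w1), P(w2)]
    """
    joint = [gaussian_pdf(x, m, s) * p for (m, s), p in zip(params, priors)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical model: "green" class centred at -3, "red" class at +3.
params = [(-3.0, 1.0), (3.0, 1.0)]
priors = [0.5, 0.5]

print(posteriors(-3.0, params, priors))  # almost certainly "green"
print(posteriors(0.0, params, priors))   # equidistant from both classes
```

A log ratio lying exactly between two symmetric classes receives equal posterior probabilities, which is the kind of ambiguous spot the rejection class is meant to capture.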
A third class called ω0 was created between ω1 and ω2. Its boundaries were defined using a tunable parameter called ambiguity rejection threshold and denoted Cr. This class contained data from spots that had a probability lower than Cr of belonging to one of the "red" and "green" classes and was used to exclude data having a low probability of good classification, i.e. lower than Cr.
Each spot on the array was first classified within one of the "green" (ω1)/"red" (ω2)/rejection (ω0)/weak signal/background (bg) classes according to the following decision rules:
• P(ω1/x) > P(ω2/x) and P(ω1/x) > Cr and IS > IT and bg = FALSE → d(x) = ω1
• P(ω2/x) > P(ω1/x) and P(ω2/x) > Cr and IS > IT and bg = FALSE → d(x) = ω2
• max(P(ωi/x)) ≤ Cr, i = 1,2 and IS > IT and bg = FALSE → d(x) = ω0
• IS > IT and bg = TRUE → d(x) = bg
• IS ≤ IT → d(x) = weak signal
where x is the log ratio associated with the spot, d(x) is the decision associated with x, ISR and ISG are respectively the "red" and "green" intensities measured on the spot, and IS = max(ISR, ISG) is the maximum of both intensities for this spot.
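The five rules above can be written as a single function, sketched here in Python. The function name, the argument layout and the returned labels are hypothetical. The rules do not state what happens when both posteriors are equal and above Cr; that case is sent to the rejection class here as an assumption. Since the two posteriors sum to one, the rejection rule as literally written can only trigger for Cr of at least 0.5; how Cr is scaled in the authors' software is not specified here, so the value used below is purely illustrative.

```python
def classify_spot(p1, p2, red, green, it, cr, bg):
    """Apply the decision rules to one spot.

    p1, p2: posterior probabilities of the "green" (w1) and "red" (w2) classes
    red, green: ISR and ISG, the intensities measured on the spot
    it: intensity threshold IT of the slide
    cr: ambiguity rejection threshold Cr
    bg: True if the spot was flagged "bg" after background correction
    """
    signal = max(red, green)          # IS = max(ISR, ISG)
    if signal <= it:
        return "weak signal"
    if bg:
        return "bg"
    if max(p1, p2) <= cr:
        return "rejected"             # class w0
    if p1 > p2:
        return "green"                # class w1
    if p2 > p1:
        return "red"                  # class w2
    return "rejected"                 # exact tie: not covered by the rules (assumption)

# Illustrative calls with a hypothetical threshold Cr = 0.9 and IT = 750.
print(classify_spot(0.95, 0.05, red=200.0, green=5000.0, it=750.0, cr=0.9, bg=False))
print(classify_spot(0.60, 0.40, red=3000.0, green=2500.0, it=750.0, cr=0.9, bg=False))
```

Checking the intensity first mirrors the last rule, under which a spot at or below IT is a weak signal whatever its background flag.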
A final decision was taken for each SNP on the basis of its three replicate spots as follows: if at least two of the three replicates belonged to the same class, the SNP was assigned to this class; otherwise it was declared "indeterminate" and no further interpretation was performed.
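This two-out-of-three rule can be sketched as follows; the function name and the class labels are hypothetical.

```python
from collections import Counter

def call_snp(replicates):
    """Return the class shared by at least two of the three replicate spots,
    or "indeterminate" when all three replicates disagree."""
    label, count = Counter(replicates).most_common(1)[0]
    return label if count >= 2 else "indeterminate"

print(call_snp(["red", "red", "weak signal"]))
print(call_snp(["red", "green", "bg"]))
```

Note that under this rule a SNP whose replicates are mostly "weak signal" or "bg" is itself assigned to that class rather than to "indeterminate".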
Allele identification was done using a pre-defined table that describes the expected signal for each allele of each SNP (see Table 1). This table was fully derived from the design of the experiment. As an example, according to this table a "red" signal (Cy5) is expected for spots associated with the RES16 SNP if the allele in the studied sample is a mutant, and a "green" signal (Cy3) otherwise. Three possible scenarios are encountered, depending on the number of different probes associated with the SNP. In the first scenario, SNPs were represented by only one probe, meaning that only two different alleles were known for them. This was the most common case. For SNPs that had been classified in ω1 or ω2, allele identification then came straight from Table 2. For instance, if a field sample was studied using FlexiChip or ResMalChip and the hybridization signal for the RES16 SNP was found to belong to the "red" class, the Pfdhfr gene from this sample was identified as mutant at position 16. The second scenario refers to SNPs that had only two known alleles but were represented by two different probes on the slide, in order to strengthen the identification process. In that case, if one of the probes was classified as "weak signal" or "bg", the result of the other probe was taken into account. If both probe signals were valid (d(x) = ω1 or d(x) = ω2), the coherence between the probes was checked, and in case of conflicting results the SNP was declared "indeterminate". The last scenario refers to the situation where more than two different alleles exist for a given SNP. Thus, three or four different alleles must be discriminated with two colours only. For these particular SNPs, two different probes were designed and the corresponding targets were labelled with two different combinations of Cy3- and Cy5-labelled ddNTPs, as explained in the experimental protocol section. For example, position 108 on the Pfdhfr gene is represented by two probes on the array, RES108 and RES108B.
The first probe distinguishes between the wild-type allele and either mutantA or mutantB. The second probe distinguishes between mutantA and either wild type or mutantB. In such a case, allele identification was resolved according to the combination of both probe results. If one or both probe signals were classified as "weak signal", "bg" or "indeterminate", the SNP was declared "indeterminate"; otherwise it was determined according to Table 1. Mutually exclusive results for two such complementary probes led the associated SNP to be declared "indeterminate". As an example, this would be the case for the SNP RES108 if both probes gave "red" signals.
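The combination logic for two complementary probes can be sketched as a set intersection. The colour-to-allele assignments below are hypothetical: they are chosen only so that two "red" signals are mutually exclusive, as in the RES108 example, and the actual assignments are those of Table 1, which is not reproduced here.

```python
# Hypothetical expected-signal table for a three-allele SNP with two probes.
PROBE_1 = {"red": {"wild type"}, "green": {"mutantA", "mutantB"}}
PROBE_2 = {"red": {"mutantA"}, "green": {"wild type", "mutantB"}}

def identify_allele(call_1, call_2):
    """Combine the calls of two complementary probes into one allele.

    Any call other than "red" or "green" (weak signal, bg, indeterminate)
    makes the SNP indeterminate, as do mutually exclusive calls.
    """
    if call_1 not in PROBE_1 or call_2 not in PROBE_2:
        return "indeterminate"
    candidates = PROBE_1[call_1] & PROBE_2[call_2]
    if len(candidates) != 1:
        return "indeterminate"
    return next(iter(candidates))

for calls in [("red", "green"), ("green", "red"), ("green", "green"), ("red", "red")]:
    print(calls, identify_allele(*calls))
```

Representing each probe result as the set of alleles it is compatible with makes the contradiction explicit: an empty intersection means the two probes cannot both be right.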
A set of samples was sequenced for the Pfdhfr, Pfcrt, Pfmdr1 and PfATPase6 genes. PCR products were purified using a P-100 Gel Fine solution (Bio-Rad) and the Multiscreen MAVN45 kit system (Millipore). Sequencing reactions were performed on both strands using internal primers and ABI Prism BigDye Terminator chemistry. Sequencing reactions were run on an ABI Prism 3100 Genetic Analyzer (Applied Biosystems) at the Plate-Forme Génomique of Institut Pasteur in Paris, and analysed with Seqscape software v.2.0 (Applied Biosystems).
Fifty P. falciparum isolates from Cambodia were tested blindly at the Mahidol Oxford Research Unit (MORU) according to its own protocols. Briefly, five SNPs of Pfmdr1 (positions 86, 184, 1034, 1042 and 1246) and one SNP of Pfcrt (position 76) were screened with restriction fragment length polymorphism methods [6,7]. The Pfserca/Pfatpase6 gene was sequenced (4068 bp). Results were compared to the four SNPs tested with FlexiChip.
For ResMalChip, twenty-five gpr files generated by the Axon GenePix® Pro software were analysed. They included data from 10520 SNPs corresponding to 263 samples tested for 40 positions on five genes. The best compromise between the number of ambiguity rejections and the number of misclassifications was obtained with a rejection threshold of Cr = 0.2. Of the 10520 SNP data handled by the algorithm, 1396 (13.3%) were classified as "weak signal" and 905 (8.6%) were rejected for ambiguity (they belong to ω0, the intermediate class between the "red" and "green" classes). Among the 1642 SNP data that could be compared to the sequence, 218 (13.3%) were classified as "weak signal" and 109 (6.6%) as "indeterminate" (inconsistency between replicates or ambiguity). Compared to sequencing, considered the "gold standard", good agreement was found, with a sensitivity of 96.63% and a specificity of 95.74%.
For FlexiChip, six gpr files were analysed, corresponding to 5000 SNP data from 125 samples (part of the 263 samples previously analysed with ResMalChip) tested for 40 positions on five genes. The optimal rejection threshold value Cr was also 0.2. Of the 5000 SNP data analysed by the algorithm, 332 (6.6%) were classified as "weak signal" and 222 (4.4%) as "indeterminate", i.e. two to three times fewer than with the ResMalChip array. Among the 1215 SNP data for which sequences were available, 28 (2.3%) and 38 (3.1%) were respectively considered as "weak signal" and "indeterminate". Sensitivity and specificity were 95.88% and 97.68% respectively.
A total of 3,078 SNP data corresponding to 81 samples and 38 positions on five genes were available in both the ResMalChip and FlexiChip datasets. Among them, 2195 SNP data were interpretable ("red" or "green" signal) with both techniques. An identical diagnosis was found for 2,034 (92.7%) of the SNP data (kappa coefficient 0.8491 with a standard error of 0.0213). When the results were considered on a gene-by-gene basis (Table 2), very good agreement was found for the dhfr, crt, atpase and mdr1 genes. The main discrepancies were observed for the dhps gene.
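For readers unfamiliar with these agreement statistics, the sketch below shows how Cohen's kappa, sensitivity and specificity are computed from a cross-tabulation. The counts are hypothetical and do not reproduce the study data; treating the mutant allele as the "positive" class is likewise an assumption, since the text does not state which allele was taken as positive.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j]: number of items assigned to class i by method A
    and to class j by method B.
    """
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)

def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical 2 x 2 table: rows = method A (wild type, mutant),
# columns = method B (wild type, mutant).
table = [[40, 10],
         [5, 45]]
kappa = cohen_kappa(table)

# Hypothetical counts against a reference method, mutant taken as positive.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=40, fp=10)
print(kappa, sens, spec)
```

Kappa corrects the raw proportion of identical calls for the agreement expected by chance, which is why it is lower than the percentage of identical results when one allele dominates the dataset.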
Fifty isolates were tested for eight SNPs in parallel at MORU with standard methods and with the FlexiChip. Among the 400 SNP data, 34 (8.5%) were classified as "weak signal" or "indeterminate". Among the 366 remaining SNP data, results were identical with both techniques in 365 cases (99.7% specificity, 91.5% sensitivity, kappa coefficient 0.9923 with a standard error of 0.0523).
Molecular tools are essential for monitoring the emergence and spread of anti-malarial drug resistance and are part of the strategies described by the World Wide Anti-malarial Resistance Network (WWARN) consortium [22,23]. Correlation of molecular markers with in vivo and in vitro drug resistance has been clearly established for dhfr/dhps (sulphadoxine-pyrimethamine), pfcrt (chloroquine), mdr1 (chloroquine, mefloquine) and cytochrome b (atovaquone) mutations. The microarray method described in this paper makes it possible to implement molecular monitoring on a large scale, because the results can be analysed and interpreted automatically.
The aim of this project was to evaluate the flexible microarray under practical conditions using field isolates, in which multiple infections are frequently observed. Without any dye bias on the array, spots associated with mixed alleles should exhibit a "yellow" signal corresponding to a mix of red and green signals. In the framework of the proposed mathematical model, these spots should then fall in the intermediate class ω0. Thus, this class could be used to detect mixed infections rather than indeterminate results. However, this mathematical property of the model could not be fully validated, for several reasons. First, in the current study some SNPs showed no polymorphism in the processed samples. Indeed, field samples were sequenced for 20 of the 40 SNPs that were genotyped on the array. Among these 20 sequenced SNPs, only five showed polymorphism, with only one having both alleles in (almost) equal amounts. Therefore, a dye bias on the signals measured on FlexiChip cannot be excluded, and such a bias would prevent mixed signals from behaving as expected by the mathematical model. Second, the gold standard against which FlexiChip results were compared is sequencing. This method may not be the best one in the case of mixed alleles, because chromatograms may be difficult to interpret, leading to erroneous sequences. The parameters of the mathematical model are derived on an array-by-array basis in order to adapt to possible technical variability between arrays. They therefore also depend on the proportion of single and mixed infections hybridized on the array. As most of the field samples analysed in this study were not polymorphic, the behaviour of the model in the case of a majority of mixed alleles cannot be predicted. It will undoubtedly have to be adapted to match the data distribution in that particular case.
Finally, the current design of FlexiChip makes it non-exhaustive, as the use of only two colours for most of the monitored SNPs makes it unable to detect all the mutations. The use of two mix combinations for the SNP located at position 108 on the pfdhfr gene led to a correct classification rate of 100%. Extending the concept to the whole set of SNPs would increase the reliability of the base-calling process, even in the case of mixed infections. Nevertheless, the ResMalChip microarray has already been used in an environment of complex malaria infections.
Combined with the FlexiChip microarray, the software provided a sensitivity and specificity of 95.88% and 97.68% respectively when compared to sequencing as the reference method. Moreover, the method performed well when compared to results obtained in a reference laboratory, with 99.7% concordance (kappa coefficient 0.9923 with a standard error of 0.0523).
The proposed package can be useful for epidemiological surveys and can give information on the dynamics of emergence and spread of genetic markers in time and/or in space. However, the method cannot be used as an immediate diagnostic tool for individual samples, because the format requires a high number of samples tested at one time to be cost effective.
In contrast to previous methods, FlexiChip is no longer dedicated to a single set of genes and/or organisms. Thanks to its flexibility, integrating new SNPs linked to anti-malarial drug resistance is simpler, and the addition of species identification is now possible. It is easy to adapt to other loci, in particular for SNP detection in other organisms such as HIV or multidrug-resistant tuberculosis strains. Moreover, the FlexiChip package is ready for use and adaptable to large-scale studies to validate new candidate molecular markers.
One of the major obstacles for implementation of molecular monitoring of resistance lies in the absence of practical tools for high throughput analysis. Universal microarrays such as FlexiChip could help to change this, as they are adapted to processing of numerous samples and easily adaptable to new markers. Furthermore, they are well suited for molecular biology laboratories from endemic countries, which need a robust and simple tool that could be easily adapted to a specific epidemiological situation.
The authors declare that they have no competing interests.
NS, AC, JYC, OMP, HPB and FA were involved in the conception and design of the microarray, MAD developed the software, NS, NK, OS, SC, PL, CB, MI, AMD, DS managed experimental procedure and performed field and laboratory work, NS, MAD, CR and FA participated in the statistical analysis, NS, MAD, CR, JYC and FA drafted and critically revised the manuscript. All authors read and approved the manuscript.
Table S1: List of cZip codes spotted on FlexiChip. *: discarded from the analysis (numbers 49 and 61)
Table S2: Scheme of the spotted oligonucleotides and expected hybridization. A) Spotting pattern of cZip on FlexiChip (Neg = spot of water; Cy3 and Cy5 = anchor oligonucleotides prelabelled with Cy3 or Cy5). B) Example of a FlexiChip used with SNPs associated with parasite resistance to anti-malarial drugs (blue and white areas correspond to the two samples that can be tested at the same time; Zip-codes 41 to 48 and 89 to 96 are not used on this microarray)
The authors thank the staff of the National Center for Parasitology, Entomology and Malaria Control as well as the staff of the European Commission National Malaria Control Programme for sample collection. The work was funded through a project in the 5th FP of the European Community (ResMalChip), the Global Malaria Programme, World Health Organization (Bill and Melinda Gates Foundation grant nr. 48821) and a grant from Institut Pasteur (Modipop project).