Clinical P. falciparum samples and DNA extraction
As part of the assessment of drug efficacy in uncomplicated malaria in Cambodia, 263 P. falciparum isolates were collected from consenting patients from 2001 to 2004. Blood samples were kept at -20°C at the Institut Pasteur du Cambodge until use. DNA was extracted from blood samples using the QIAamp® DNA Mini Kit (Cat. No. 51306, Qiagen®, Germany), according to the manufacturer's procedure. All studies were conducted following good clinical practice, and ethical clearance was obtained from the National Ethical Committee of Cambodia.
Zip code design and microarray production
A Zip-code, as used in the present study, is defined as an artificial sequence of 24 bases. All the Zip-codes have similar melting temperature (Tm) values. A set of 96 24-mer Zip-code oligonucleotides was designed using a dedicated algorithm developed in Visual Basic (Microsoft®) (see Table S1 in Additional file 1). They were tested to avoid self-pairing and hairpin formation (FastPCR, Institute of Biotechnology, University of Helsinki [18]). Reverse complement oligonucleotides (cZip) were synthesized with an amino C-7 linker at the 3' end, used for attachment to the slide. The cZip were then spotted onto aldehydesilane-coated slides with a 12-well format (AL MPX slides, Schott) using a VersArray ChipWriter Pro system (Bio-Rad Laboratories, Hercules, CA). For spotting, the cZip were resuspended at 50 μM in phosphate buffer. The FlexiChip spotting pattern of the 96 cZip, with Cy3 and Cy5 prelabelled anchor oligonucleotides and six negative controls, is presented in additional figure 2A. Each oligonucleotide was spotted in triplicate. A total of 12 independent hybridizations can be performed in parallel on a single slide. ResMalChip arrays were produced as described in [16].
To evaluate possible cross-hybridization between Zip-codes, each Zip-code associated with its primer was labelled using the SBE protocol (see below) and hybridized individually on the microarray. Cross-hybridization was considered significant when the average fluorescent signal intensity of non-tested Zip-code spots was above 10% of the average positive signal of the tested Zip-code spot. This was observed for only two spots out of 96, and the corresponding two Zip-codes were discarded from the analysis (see Table S1 in Additional file 1).
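The 10% criterion can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the dictionary layout of the spot intensities and the example values are assumptions made for illustration only.

```python
from statistics import mean

def cross_hybridizing_zipcodes(signals, tested, threshold=0.10):
    """Return the non-tested Zip-codes whose mean spot signal exceeds
    `threshold` times the mean signal of the tested Zip-code.

    signals: dict mapping a Zip-code id to its replicate spot intensities
             (hypothetical data layout).
    tested:  id of the Zip-code that was labelled and hybridized.
    """
    positive = mean(signals[tested])
    return sorted(
        zip_id
        for zip_id, spots in signals.items()
        if zip_id != tested and mean(spots) > threshold * positive
    )

# Illustrative intensities only: Zip-code 3 exceeds 10% of the positive signal.
signals = {1: [1000, 1100, 900], 2: [50, 60, 40], 3: [150, 140, 160]}
print(cross_hybridizing_zipcodes(signals, tested=1))
```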
Microarray protocol (Figure )
DNA amplification to genotype SNPs associated with anti-malarial drug resistance
Ten nested PCRs were used to amplify DNA sequences of genes including SNPs associated with drug resistance, as reported previously [16]: pfmdr1 N86Y, Y184F, S1034C, N1042D and D1246Y (two nested PCRs); pfcrt C72S, K76T, H97Q, T152A, S163R, A220S, Q271E, N326D/S, I356L/T and R371I (five nested PCRs); pfdhfr A16V, N51I, C59R, S108N/T and I164L (one nested PCR); pfdhps S436A, A437G, K540E, A581G, A613T/S, I640F, and H645P (one nested PCR); and pfATPase6 S538R, Q574P, A623E, N683K, and S769N (one nested PCR).
Single base extension, SBE
Remaining free dNTPs were removed using shrimp alkaline phosphatase (SAP). Briefly, 5 μl of the ten nested PCR products were mixed with 2 U of SAP (Amersham Biosciences, Freiburg, Germany) and incubated for 1 h at 37°C. For each sample, two reactions were performed using two combinations of Cy3- and Cy5-labelled ddNTPs (Perkin Elmer, Schwerzenbach, Switzerland). The Sequenase (Termipol®, Solis, Tartu, Estonia) extension reaction, reaction mixture and final denaturation were done for ResMalChip and FlexiChip as described by Crameri et al. Extended primers with cyanine labelling were hybridized onto the microarray. With this experimental design on FlexiChip, two samples can be processed per spotting area. As 40 positions are needed per sample, one set of extension primers can be associated with Zip-codes 1 to 40 while the second set can be associated with Zip-codes 49 to 88 (the remaining positions 41 to 48 and 89 to 96 were not used).
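The allocation of Zip-codes to the two samples of a spotting area can be written as a small mapping. The following Python sketch is only an illustration of the numbering given above; the function name and the 0/1 indexing of the two samples are assumptions.

```python
def zipcode_for(sample_slot, position):
    """Map one of the 40 SNP positions of a sample to its Zip-code number.

    sample_slot: 0 for the first sample of a spotting area, 1 for the second
                 (hypothetical indexing).
    position:    SNP position index, from 1 to 40.
    """
    if sample_slot not in (0, 1):
        raise ValueError("sample_slot must be 0 or 1")
    if not 1 <= position <= 40:
        raise ValueError("position must be between 1 and 40")
    # First sample uses Zip-codes 1-40, second sample uses Zip-codes 49-88.
    return position + 48 * sample_slot

used = {zipcode_for(s, p) for s in (0, 1) for p in range(1, 41)}
unused = sorted(set(range(1, 97)) - used)
print(unused)  # Zip-codes 41-48 and 89-96
```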
Briefly, extended primers associated with a Zip-code were resuspended in 6 μl of 20 × SSC (1 × SSC = 0.15 M NaCl, 0.015 M sodium citrate, pH 7.2) and hybridized on the array. Microarrays were then incubated for 60 min at 50°C in a humid chamber and subsequently washed in 2 × SSC and 0.2% SDS for 20 min and in 2 × SSC for 20 min. Microarrays were spun for 5 min at 3000 g to dry. During hybridization, extended primers linked to their specific Zip-code hybridized to the FlexiChip cZip pattern (see Table S2B in Additional file 2).
Hybridized microarrays were scanned at 635 nm and 532 nm using an Axon 4100A fluorescence scanner (Axon, Bucher Biotec AG, Basel, Switzerland) and Axon GenePix® Pro (version 6.0) software. The PMT (photomultiplier tube) was 550 at 532 nm and 500 at 635 nm.
Data analysis and allele identification
All the data analyses were performed using the R software [19] and packages. The allele identification algorithm was written in R and applied independently to each array. The aim of this algorithm is to classify each spot of the array into one of the "green", "red", or "indeterminate" classes, and then convert the spot colour into the corresponding SNP sequence. ResMalChip and FlexiChip raw data were first corrected for background using the limma package [20] (version 2.12.0) according to a two-step procedure. A modified version of the "movingmin" option of the background correction function (called "bgCorrect") was first applied to the data. This option smooths the background on the basis of a 3 × 3 moving window but, unlike the original version, the modified version does not subtract the smoothed background. Then the normexp procedure was applied. According to this procedure, the observed signal is modeled as the sum of a true signal and a background signal, so that its distribution is the convolution of an exponential distribution (true signal) and a Gaussian distribution (background).
This two-step process was derived because of a high background level observed with respect to the signal, especially on ResMalChip data. Spots that still had a signal to noise ratio lower than one after background correction were flagged "bg" (where "bg" stands for "background").
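The first step of the background correction can be illustrated with a minimal Python sketch (the authors worked in R with limma; this is not their code). It replaces each spot's background estimate with the minimum over its 3 × 3 neighbourhood and, as in the modified version described above, does not subtract it from the signal. The function name, the grid layout and the handling of edge spots (window clipped to the array) are assumptions; the normexp step is not shown.

```python
def moving_min_3x3(background):
    """Smooth a 2-D grid of background estimates with a 3 x 3 moving minimum.

    background: list of rows, one value per spot (hypothetical layout).
    Edge spots use the part of the window that lies inside the grid.
    The smoothed background is returned, not subtracted from any signal.
    """
    rows, cols = len(background), len(background[0])
    smoothed = []
    for r in range(rows):
        smoothed_row = []
        for c in range(cols):
            window = [
                background[i][j]
                for i in range(max(0, r - 1), min(rows, r + 2))
                for j in range(max(0, c - 1), min(cols, c + 2))
            ]
            smoothed_row.append(min(window))
        smoothed.append(smoothed_row)
    return smoothed

# Illustrative background values only.
grid = [[5, 6, 7, 8],
        [8, 1, 9, 9],
        [7, 6, 5, 9]]
print(moving_min_3x3(grid))
```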
Data from negative and positive control spots were then excluded from the data set. An intensity threshold IT was computed on the remaining spots for each slide as the median of the pooled "red" and "green" intensities. A log2 ratio of the "red" intensity over the "green" intensity was computed for each of the 1440 remaining spots (three replicates of 40 SNP spots for 12 samples).
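These two quantities can be sketched in a few lines of Python (the original analysis was done in R). The function names and example intensities are assumptions, and the sketch assumes strictly positive background-corrected intensities.

```python
from math import log2
from statistics import median

def intensity_threshold(red, green):
    """Slide-level threshold IT: median of the pooled red and green intensities."""
    return median(list(red) + list(green))

def log_ratios(red, green):
    """Per-spot log2 ratio of the red intensity over the green intensity."""
    return [log2(r / g) for r, g in zip(red, green)]

# Illustrative intensities for three spots.
red = [800, 100, 400]
green = [200, 400, 400]
print(intensity_threshold(red, green))
print(log_ratios(red, green))
```

With this convention, a positive log ratio indicates a predominantly "red" spot and a negative one a predominantly "green" spot.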
A two-component Gaussian mixture model was fitted differently to the ResMalChip and FlexiChip datasets. For the ResMalChip dataset, the two-component Gaussian mixture model was computed using the Mclust function from the mclust package [21] with the modelNames parameter set to "E" (Gaussian functions with the same variance). These two estimated Gaussian functions are estimates of the conditional prior probability functions f(x/ω1) and f(x/ω2) that describe the distribution of log ratios within the classes ω1 and ω2 (these two classes are respectively associated with "green" and "red" spots). For the FlexiChip dataset, the model was built in two steps. A first optimal mixture model was computed using the Mclust function with default parameters (modelNames = c("E","V")). In most cases a mixture model with three or more components was obtained. To get a two-component mixture model, these components were grouped according to the sign (positive or negative) of their mean, and a mixture model was derived from each of these two groups. These two "sub"-models were then used as estimates of the conditional prior probability functions (see Figure 2).
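The grouping step used for FlexiChip can be sketched as follows in Python. This is an illustration under stated assumptions, not the authors' R code: the components are taken as already fitted (weight, mean, standard deviation) triples, components with a negative mean are assigned to the "green" class because the log ratio is red over green, and each sub-model is renormalized by the total weight of its group.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def split_by_sign(components):
    """Group fitted components (weight, mean, sd) by the sign of their mean."""
    green = [c for c in components if c[1] < 0]   # assumed class omega_1
    red = [c for c in components if c[1] >= 0]    # assumed class omega_2
    return green, red

def class_density(x, group):
    """Class-conditional density f(x/omega_i) as a renormalized sub-mixture."""
    total_weight = sum(w for w, _, _ in group)
    return sum(w * normal_pdf(x, m, s) for w, m, s in group) / total_weight

# Hypothetical three-component fit.
components = [(0.3, -2.0, 0.5), (0.2, -1.0, 0.5), (0.5, 1.5, 0.4)]
green, red = split_by_sign(components)
print(class_density(-1.5, green), class_density(-1.5, red))
```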
Figure 2. The mixture model (FlexiChip). "Green": Gaussian components of the "green" class; "red": Gaussian components of the "red" class; thick "green": prior conditional probability density function f(x/ω1); thick "red": prior conditional probability ...
The remainder of the base-calling algorithm was then identical for the datasets of both chips. Conditional posterior probabilities P(ω1/x) and P(ω2/x) were computed according to Bayes' theorem: P(ωi/x) = f(x/ωi)·P(ωi) / [f(x/ω1)·P(ω1) + f(x/ω2)·P(ω2)], i = 1, 2.
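As a worked illustration of this formula, the following Python sketch computes both posterior probabilities from the two class-conditional densities evaluated at a given log ratio. The function name and the default equal priors are assumptions; the text does not state how P(ω1) and P(ω2) were estimated.

```python
def posteriors(f1, f2, p1=0.5, p2=0.5):
    """Return (P(omega_1/x), P(omega_2/x)) from Bayes' theorem.

    f1, f2: class-conditional densities f(x/omega_1) and f(x/omega_2) at x.
    p1, p2: class priors P(omega_1) and P(omega_2); equal priors are an
            assumption of this sketch.
    """
    evidence = f1 * p1 + f2 * p2
    if evidence == 0:
        raise ValueError("both class densities are zero at this log ratio")
    return f1 * p1 / evidence, f2 * p2 / evidence

post_green, post_red = posteriors(0.3, 0.1)
print(post_green, post_red)
```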
A third class called ω0 was created between ω1 and ω2. Its boundaries were defined using a tunable parameter called ambiguity rejection threshold and denoted Cr. This class contained data from spots that had a probability lower than Cr of belonging to one of the "red" and "green" classes and was used to exclude data having a low probability of good classification, i.e. lower than Cr.
Each spot on the array was first classified within one of the "green" (ω1)/"red" (ω2)/rejection (ω0)/weak signal/background (bg) classes according to the following decision rules:
• P(ω1/x) > P(ω2/x) and P(ω1/x) > Cr and IS > IT and bg = FALSE → d(x) = ω1
• P(ω2/x) > P(ω1/x) and P(ω2/x) > Cr and IS > IT and bg = FALSE → d(x) = ω2
• max(P(ωi/x)) ≤ Cr, i = 1,2 and IS > IT and bg = FALSE → d(x) = ω0
• IS > IT and bg = TRUE → d(x) = bg
• IS ≤ IT → d(x) = weak signal
where x is the log ratio associated with the spot, d(x) is the decision associated with x, ISR and ISG are respectively the "red" and "green" intensities measured on the spot, and IS = max(ISR, ISG) is the maximum of both intensities for this spot.
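The decision rules above translate directly into a short function. The Python sketch below is illustrative only (the authors' implementation was in R); the function name, the class labels and the example value of Cr are assumptions. A tie P(ω1/x) = P(ω2/x) above Cr is not covered by the rules and is treated here as a rejection.

```python
def classify_spot(p1, p2, isr, isg, it, cr, bg):
    """Classify one spot according to the decision rules.

    p1, p2:   posterior probabilities P(omega_1/x) and P(omega_2/x).
    isr, isg: red and green intensities of the spot.
    it:       slide-level intensity threshold IT.
    cr:       ambiguity rejection threshold Cr.
    bg:       True if the spot was flagged "bg" after background correction.
    """
    signal = max(isr, isg)        # IS = max(ISR, ISG)
    if signal <= it:
        return "weak signal"
    if bg:
        return "bg"
    if p1 > p2 and p1 > cr:
        return "green"            # omega_1
    if p2 > p1 and p2 > cr:
        return "red"              # omega_2
    return "rejected"             # omega_0 (also used for ties, by assumption)

# Illustrative call with a hypothetical Cr of 0.8.
print(classify_spot(0.9, 0.1, isr=100, isg=900, it=500, cr=0.8, bg=False))
```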
A final decision was taken for each SNP on the basis of its three replicate spots as follows: if at least two of the three replicates belonged to the same class, the SNP was associated with this class; otherwise it was declared "indeterminate" and no further interpretation was performed.
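This replicate rule amounts to a majority vote over three spots, which can be sketched in Python as follows (function name and class labels are assumptions made for illustration).

```python
from collections import Counter

def call_snp(replicates):
    """Return the class shared by at least two of the three replicate spots,
    or "indeterminate" if no class reaches two votes."""
    label, count = Counter(replicates).most_common(1)[0]
    return label if count >= 2 else "indeterminate"

print(call_snp(["red", "red", "bg"]))
print(call_snp(["red", "green", "weak signal"]))
```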
Allele identification was done using a pre-defined table that describes the expected signal for each allele of each SNP (see the table of expected spot signals). This table was fully derived from the design of the experiment. As an example, according to this table a "red" signal (Cy5) is expected for spots associated with the RES16 SNP if the allele in the studied sample is a mutant, and a "green" signal (Cy3) otherwise. Three possible scenarios are encountered, depending on the number of different probes associated with the SNP.
In the first scenario, SNPs were represented by only one probe, meaning that only two different alleles were known for them. This was the most common case. For SNPs that had been classified in ω1 or ω2, allele identification came straight from this table. If a field sample was studied using FlexiChip or ResMalChip and the hybridization signal for the RES16 SNP was found to belong to the "red" class, the Pfdhfr gene from this sample was identified as mutant at position 16.
The second scenario refers to SNPs that had only two known alleles but were represented by two different probes on the slide in order to strengthen the identification process. If one of the probes was classified as "weak signal" or "bg", the result of the other probe was taken into account. If both probe signals were valid (d(x) = ω1 or d(x) = ω2), the coherence between the probes was checked, and in case of conflicting results the SNP was declared "indeterminate".
The last scenario refers to SNPs for which more than two different alleles exist, so that three or four different alleles must be discriminated with only two colours. For these particular SNPs, two different probes were designed and the corresponding targets were labelled with two different combinations of Cy3- and Cy5-labelled ddNTPs, as explained in the experimental protocol section. For example, position 108 of the Pfdhfr gene is represented by two probes on the array, RES108 and RES108B. The first probe distinguishes between the wild type allele and either mutantA or mutantB. The second probe distinguishes between mutantA and either wild type or mutantB. In such a case, allele identification was resolved according to the combination of both probe results. If one or both probe signals were classified as "weak signal", "bg" or "indeterminate", the SNP was declared "indeterminate"; otherwise it was determined according to the same table. Mutually exclusive results for two such complementary probes led the associated SNP to be declared "indeterminate". As an example, this would be the case for the SNP RES108 if both probes gave "red" signals.
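The third scenario can be sketched as an intersection of the allele sets compatible with each probe colour. The Python example below is a hedged illustration, not the authors' R code: the lookup table is hypothetical, and its colour assignments were chosen only so that two "red" signals are mutually exclusive, as stated in the text.

```python
# Hypothetical expected-signal table for position 108 of Pfdhfr.
EXPECTED = {
    ("RES108", "red"): {"WT"},
    ("RES108", "green"): {"mutantA", "mutantB"},
    ("RES108B", "red"): {"mutantA"},
    ("RES108B", "green"): {"WT", "mutantB"},
}

def resolve_alleles(calls):
    """Combine the colours of two complementary probes into one allele call.

    calls: dict mapping a probe name to its class ("red", "green", or any
           other label such as "bg", "weak signal" or "indeterminate").
    """
    candidates = None
    for probe, colour in calls.items():
        if colour not in ("red", "green"):
            return "indeterminate"
        alleles = EXPECTED[(probe, colour)]
        candidates = alleles if candidates is None else candidates & alleles
    if candidates is not None and len(candidates) == 1:
        return next(iter(candidates))
    return "indeterminate"  # mutually exclusive probe results

print(resolve_alleles({"RES108": "green", "RES108B": "green"}))
print(resolve_alleles({"RES108": "red", "RES108B": "red"}))
```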
Expected spot signals for SNP positions processed by the algorithm (Results are given as a comparison with 3D7 genotype; WT = Wild type, MUT = Mutation)
Comparison between ResMalChip and FlexiChip results.
Direct sequencing of PCR products
A set of samples was sequenced for the Pfdhfr, Pfcrt, Pfmdr1 and PfATPase6 genes. PCR products were purified using a P-100 Gel Fine solution (Biorad) and the Multiscreen MAVN45 kit system (Millipore). Sequencing reactions were performed on both strands using internal primers and ABI Prism BigDye Terminator chemistry. Sequencing reactions were run on an ABI Prism 3100 Genetic Analyzer (Applied Biosystems) at the Plate-Forme Génomique of Institut Pasteur in Paris, and analysed with Seqscape software v2.0 (Applied Biosystems).
External quality control
Fifty P. falciparum isolates from Cambodia were tested blindly at the Mahidol Oxford Research Unit according to its own protocols. Briefly, five SNPs of the Pfmdr1 gene (positions 86, 184, 1034, 1042 and 1246) and one SNP of the Pfcrt gene (position 76) were screened with restriction fragment length polymorphism methods [6]. The Pfserca/Pfatpase6 gene was sequenced (4068 bp). Results were compared to the four SNPs tested with FlexiChip.