VNTR variation and discriminative power
The MLVA data of 61 7th pandemic isolates, including the O139 derivative, and 5 genome-sequenced strains from Grim et al.
] are presented as repeat numbers for each locus (Table
). Additionally, 3 pre-7th pandemic isolates were included for comparison but were excluded from the calculation of the diversity statistics below. The 66 7th pandemic isolates were resolved into 60 MLVA profiles. Each MLVA profile was represented by a single isolate, except for 4 profiles that were represented by 4, 2, 2 and 2 isolates respectively. An MLVA profile lists the repeat numbers for the following loci, in order: vc0147, vc0437, vc1457, vc1650, vca0171 and vca0283. Two of the shared profiles belonged to SNP group II and had the allelic profiles 9-6-4-7-26-14 and 9-6-4-7-25-13. The remaining two were within SNP group VI and differed at vca0171 by only one repeat, with the profiles 10-7-3-9-(22/23)-11.
Details of Vibrio cholerae strains used and their MLVA profiles*
The level of variation differed across the six VNTRs analysed. In total, 7, 6, 3, 5, 19 and 24 alleles were observed for vc0147, vc0437, vc1457, vc1650, vca0171 and vca0283 respectively. Notably, the 2 most variable VNTRs are located on the small chromosome, while the other 4, less variable VNTRs are on the large chromosome. Additionally, one isolate (M542) amplified two products for vc1457 that differed by one repeat, which has been observed previously [16]. However, for phylogenetic analysis and scoring of alleles, only the fragment with the strongest signal was recorded. This VNTR is located within the cholera toxin subunit A promoter region, which may have contributed to its decreased variation [18].
The discriminatory power of each VNTR and of all 6 VNTRs combined was measured by Simpson’s Index of Diversity (D). The highest D value, 0.957, was recorded for vca0283. Except for vca0283 and vca0171, all D values were lower than previously reported. Our focus on 7th pandemic isolates, which have been shown to be highly homogeneous, may have contributed to these lower D values. VNTR vc1457 had the lowest D value of 0.437, which was lower than previously reported (D value
]. The combined D value of 7th pandemic isolates for all 6 VNTRs in this study was 0.995. We also calculated D values from previous studies by excluding MLVA data of environmental and non-7th pandemic isolates [19] and found that the D values were similar and ranged from 0.962 to 0.990 [19], when only 7th pandemic isolates were analysed. Analysis using the two most variable VNTRs, vca0171 and vca0283, produced comparable D values, which could potentially reduce the need to use the other markers. This would be particularly useful in outbreak situations where there is limited time and resources available to type isolates. However, typing the isolates in this study using only two loci would not reveal any useful relationships.
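As an illustration of how D can be computed, the following is a minimal sketch that assumes the Hunter-Gaston form of Simpson's index, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates with the j-th type and N is the total number of isolates. The text does not state which form was used, so this choice is an assumption, as are the function name and the placeholder profile labels. The input reproduces only the profile multiplicities reported above (56 profiles with one isolate each, plus 4 profiles with 4, 2, 2 and 2 isolates).

```python
from collections import Counter


def simpson_diversity(types):
    """Simpson's index of diversity, Hunter-Gaston form (an assumed choice)."""
    counts = Counter(types)
    n_total = sum(counts.values())
    if n_total < 2:
        raise ValueError("at least two isolates are required")
    # Probability that two isolates drawn without replacement share a type.
    same_type = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type / (n_total * (n_total - 1))


# Placeholder labels that reproduce only the reported multiplicities:
# 4 profiles shared by 4, 2, 2 and 2 isolates, and 56 single-isolate profiles.
profile_sizes = [4, 2, 2, 2] + [1] * 56
labels = [
    f"profile_{index}"
    for index, size in enumerate(profile_sizes)
    for _ in range(size)
]

combined_d = simpson_diversity(labels)
print(round(combined_d, 4))  # 0.9958
```

Under this assumption the 66 isolates in 60 profiles give a value close to the combined D reported for the 6 VNTRs. Applying the same function to the alleles of a single locus would give the per-locus D values.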
Phylogenetic analysis using MLVA
We analysed the MLVA data using eBURST [23]. Using 5 out of 6 identical loci as the definition of a clonal complex, 26 MLVA profiles were grouped into 7 clonal complexes, with 37 singletons. For the 7 clonal complexes, a minimum spanning network (MSN) was constructed to show the relationships of the MLVA profiles (Figure
A). Many nodes in the 2 largest clonal complexes showed multiple alternative connections. There were 27 possible node connections differing by 1 locus: 4 were due to differences in vc0147 and the other 23 were due to the VNTR loci on chromosome II. Of the 23 single-locus differences in the 2 chromosome II VNTRs, the majority (57%) also differed by the gain or loss of a single repeat unit. Thus a change of 1 repeat was the most frequent for the VNTRs on both chromosomes. It has been shown previously that a VNTR locus is more likely to differ by the gain or loss of a single repeat unit, as seen in E. coli
] and we have also found this to be the case in V. cholerae. We then used the MLVA data for all 7th pandemic isolates to construct a minimum spanning tree (Additional file 1: Figure S1A). For nodes where alternative connections of equal minimal distance were present, we selected the connection using priority rules in the following order: between nodes within the same SNP group, between nodes differing by 1 repeat, and between nodes with the closest geographical or temporal proximity. The majority of isolates differed by either 1 or 2 loci, which is attributable to vca0171 and vca0283 being the 2 most variable loci. It should be noted that node connections differing by more than one VNTR locus are less reliable, as there were more alternatives.
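The grouping criterion of 5 out of 6 identical loci can be illustrated with a simplified sketch. This is not the eBURST implementation: it only links profiles that differ at a single locus and reports the connected groups, and the function names are hypothetical. The four example profiles are the shared profiles quoted earlier in this section.

```python
from itertools import combinations


def loci_differing(profile_a, profile_b):
    """Number of loci at which two MLVA profiles carry different alleles."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def group_profiles(profiles, max_diff=1):
    """Link unique profiles differing at <= max_diff loci; return (complexes, singletons)."""
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(profiles)), 2):
        if loci_differing(profiles[i], profiles[j]) <= max_diff:
            parent[find(i)] = find(j)

    groups = {}
    for i, profile in enumerate(profiles):
        groups.setdefault(find(i), []).append(profile)
    complexes = [members for members in groups.values() if len(members) > 1]
    singletons = [members[0] for members in groups.values() if len(members) == 1]
    return complexes, singletons


# Loci in order: vc0147, vc0437, vc1457, vc1650, vca0171, vca0283.
profiles = [
    (9, 6, 4, 7, 26, 14),
    (9, 6, 4, 7, 25, 13),
    (10, 7, 3, 9, 22, 11),
    (10, 7, 3, 9, 23, 11),
]
complexes, singletons = group_profiles(profiles)
print(complexes)   # the two group VI profiles, which differ only at vca0171
print(singletons)  # the two group II profiles, which differ at 2 loci
```

In this toy input the two SNP group II profiles remain singletons because they differ at both chromosome II loci, which shows how strongly vca0171 and vca0283 influence the grouping.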
Figure 1 eBURST analysis and minimum spanning networks of 7th pandemic V. cholerae isolates based on MLVA. A) MLVA using 6 VNTR loci and B) MLVA using 4 VNTR loci from chromosome I. Each circle represents a unique MLVA profile, with the isolate number/s belonging ...
Since the 2 VNTRs on chromosome II were highly variable, excluding them may increase the reliability of the minimum spanning tree (MST) (Kendall et al. [21]). With only the 4 chromosome I loci, the number of unique MLVA profiles was reduced from 60 to 32. Nine profiles had multiple isolates, of which 5 contained isolates from 2 different SNP groups. eBURST analysis using only the 4 chromosome I VNTR loci grouped the majority of the 4-loci MLVA profiles into one clonal complex linked by single-locus differences. Two MLVA profiles (represented by M543 and M714) were singletons and another 2 (M640 and M2316) formed a clonal complex by themselves. Among the 37 nodes connected by a 1-locus difference, the alleles differed by the gain or loss of 1 to 11 repeats. The majority (19 events, 51%) differed by a single repeat unit, followed by 2 and 3 units with 7 and 6 events respectively. Gain or loss of 5 and of 11 repeats was each seen in only one node. The MSN for the larger clonal complex showed many alternative connections between the nodes (Figure
B). Using the same priority rules as above to resolve alternative connections of equal minimum distance, an MST was constructed to display the relationships of these MLVA profiles and the 4 more distantly related MLVA profiles, as shown in Additional file 1.
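The priority rules used to choose between alternative connections of equal minimum distance can be sketched as a sort key. This is only an illustration under stated assumptions: the class, field names and candidate values are hypothetical, and proximity is reduced to a gap in years of isolation, whereas the rule described above also considers geography.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One alternative connection of equal locus distance (hypothetical fields)."""
    target: str              # label of the node that could be connected
    same_snp_group: bool     # rule 1: both nodes are in the same SNP group
    repeat_difference: int   # rule 2: size of the repeat change at the differing locus
    year_gap: int            # rule 3: stand-in for geographical/temporal proximity


def pick_connection(candidates):
    """Apply the rules in priority order; False sorts before True in each slot."""
    return min(
        candidates,
        key=lambda c: (not c.same_snp_group, c.repeat_difference != 1, c.year_gap),
    )


# Invented candidates, used only to exercise the three rules.
candidates = [
    Candidate("A", same_snp_group=False, repeat_difference=1, year_gap=0),
    Candidate("B", same_snp_group=True, repeat_difference=3, year_gap=2),
    Candidate("C", same_snp_group=True, repeat_difference=1, year_gap=10),
    Candidate("D", same_snp_group=True, repeat_difference=1, year_gap=4),
]
chosen = pick_connection(candidates)
print(chosen.target)  # D
```

Candidate A is rejected first because it crosses SNP groups, B because its repeat change is larger than 1, and D is preferred over C because it is closer in time.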
A previous SNP analysis of the same isolates had shown that 7th pandemic cholera had undergone stepwise evolution [13]. None of the SNP groups were clearly distinct in either the 4-loci or 6-loci MLVA MST, aside from SNP group VI, which consists of O139 isolates (Figure
). However, a distinctive pattern can be seen when the consensus alleles within a SNP group are compared as shown in Table
. We allocated a consensus allele if more than half of the MLVA profiles in the SNP group carried a given allele; if there was no consensus, the allele was represented by an x for the discussion below. The 2 most variable VNTRs (vca0171 and vca0283) had no consensus alleles within any of the SNP groups, except vca0171 in group VI. The allelic profile that initiated the 7th pandemic was likely to be 8-6-4-7-x-x, based on the allelic profiles of the prepandemic strains, which is also consistent with the profile of the earliest 7th pandemic isolate, M793 from Indonesia. Group I had an 8-6-4-7-x-x allelic profile, which evolved into 9-6-4-7-x-x in group II. By changing the 2nd VNTR allele from 6 to 7, groups III and IV had consensus profiles of 9-7-4-7-x-x and 9-7-4-x-20-x respectively, with the latter being most likely a 9-7-4-8-20-x profile (see Table
). In group V, the first VNTR allele reverted to 8, giving an 8-7-4-8-x-x profile. SNP group VI showed the most allele changes, with a 10-7-3-9-23-x profile compared with 8, 7, -, 8, 21/22, 23/16 from Stine et al.
]. Although vca0171 and vca0283 offered no group consensus alleles, the number of repeats tended to increase over time for vca0171 and to decrease for vca0283 (Table
). Each SNP group is most likely to have arisen once, with a single MLVA type as the founder; identical VNTR alleles between SNP groups are therefore most likely due to reverse/parallel changes. Such changes have also contributed to the inability of MLVA to resolve relationships. The comparison of the SNP and MLVA data allowed us to see the reverse/parallel changes of VNTR alleles within known genetically related groups. However, the rate of such changes is difficult to quantify with the current data set.
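The consensus rule used in this comparison (an allele is assigned only if more than half of the MLVA profiles in a SNP group carry it, otherwise an x is recorded) can be written as a short sketch. The function name is hypothetical. The first two input profiles are the SNP group II profiles quoted earlier; the third is invented purely to make the example work and is not from the data set.

```python
from collections import Counter


def consensus_profile(profiles):
    """Per-locus consensus: the allele carried by more than half of the profiles, else 'x'."""
    consensus = []
    for alleles in zip(*profiles):  # iterate locus by locus
        allele, count = Counter(alleles).most_common(1)[0]
        consensus.append(str(allele) if count > len(profiles) / 2 else "x")
    return "-".join(consensus)


# Loci in order: vc0147, vc0437, vc1457, vc1650, vca0171, vca0283.
example_group = [
    (9, 6, 4, 7, 26, 14),
    (9, 6, 4, 7, 25, 13),
    (9, 6, 4, 7, 24, 15),  # hypothetical profile, for illustration only
]
group_consensus = consensus_profile(example_group)
print(group_consensus)  # 9-6-4-7-x-x
```

The strict "more than half" threshold means that a locus split evenly between two alleles is also reported as x.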
To resolve isolates within the established SNP groups of the 7th pandemic, all 6 VNTR loci were used to construct an MST for each SNP profile containing more than 2 isolates. Six separate MSTs were constructed and assigned to their respective SNP profiles as shown in Figure
. The largest VNTR difference within a SNP group was 5 loci, which was seen between two sequenced strains, CIRS101 and B33. In contrast, several sets of MLVA profiles within the MSTs differed by only one VNTR locus, indicating that they were the most closely related. The first set consisted of 5 MLVA profiles from six isolates within SNP group II, all of which were earlier African isolates. The root of group II was M810, an Ethiopian isolate from 1970, which was consistent with previous results using AFLP [7] and SNPs [13]. However, the later African and Latin American isolates were not clearly resolved. We previously proposed, based on SNP analysis, that Latin American cholera originated from Africa, which was further supported by the clustering of the recently sequenced strain C6706 from Peru [25]. Note that C6706 is not on Figure
as we could not extract VNTR data from the incomplete genome sequence. M2314 and M830, from Peru and French Guiana, were the most closely related, with 2 VNTR differences; however, the remaining isolates in this subgroup were more diverse than the earlier isolates. The second set of MLVA profiles differing by one locus consisted of all O139 isolates in SNP group VI except M834, which was separated by two VNTR loci. This finding is similar to that of a study by Ghosh et al.
], who found that isolates collected within a year differed at only one locus, while isolates from later years differed at more than one locus. A similar trend was also seen between closely related samples taken from the same household or same individual [21].
Figure 2 Composite tree of 7th pandemic V. cholerae isolates. Isolates were separated into six groups according to Single Nucleotide Polymorphism (SNP) typing. Isolates with identical SNP profiles were further separated using Multilocus Variable number tandem repeat ...
Isolates from SNP group V were collected from Thailand and 3 regions of Africa, and the group included 3 genome-sequenced strains, MJ-1236, B33 and CIRS101, from Mozambique and Bangladesh [17]. These isolates were shown to be identical based on 30 SNPs [13]. The genetic relatedness of these isolates was also reflected by their MLVA profiles, which differed by only 2 loci. The consensus alleles for SNP group V were 8, 7, 4, 8, x, x, which were identical to the consensus alleles of MLVA group I (8, 7, -, 8, x, x) according to a 5-loci study by Choi et al.
No other consensus alleles of MLVA groups matched the current SNP group consensus alleles. However, 2 isolates from Africa in this study (M823 and M826), with the profiles 10, 6, -, 7/8, x, x, matched 2 MLVA profiles of isolates from MLVA group III from Vietnam in Choi et al.
]. These African isolates were collected in 1984 and 1990, while the isolates from Choi et al.
] were collected between 2002 and 2008. It is unlikely that the isolates from these two studies are epidemiologically linked. This further highlights the need for SNP analysis to resolve evolutionary relationships before MLVA can be applied for further differentiation.
Based on a 5-loci MLVA study performed by Ali et al.
] the ancestral profile of the 2010 Haitian outbreak isolates was determined to be 8, 4, -, 6, 13, 36. Nine MLVA profiles differing by 1 locus were found in total and were mapped against our SNP study.
A previous study showed that the 2010 Haitian cholera outbreak strain belongs to SNP group V [25]. However, based on the ancestral profile of the Haitian isolates, only the first locus was shared with our group V consensus alleles, and no other Haitian alleles were found in any of the group V isolates. Thus, no relationship could be established between the group V isolates and the Haitian outbreak strains. Similarly, in another 5-loci MLVA study of 7th pandemic isolates sampled from 2002 to 2005 in Bangladesh [21], no MLVA profiles were found to be identical to our MLVA profiles at more than 2 loci. Therefore, while MLVA may be highly discriminatory, it may not be reliable for longer-term epidemiology and evolutionary relationships. Our studies of Salmonella enterica serovar Typhi also reached a similar conclusion [28]. However, it should be noted that although our isolates are representative of the spread of the 7th cholera pandemic, our sample size is relatively small. A study with a much larger sample may be useful to confirm this conclusion.