Escherichia coli and Shigella O antigens can be inferred using the rfb-restriction fragment length polymorphism (RFLP) molecular test. We present herein software based on a dynamic programming algorithm that compares the rfb-RFLP patterns of clinical isolates with those in a database containing the 171 previously published patterns corresponding to all known E. coli/Shigella O antigens.
Classical Escherichia coli/Shigella O serogrouping is expensive, labor-intensive, and susceptible to errors due to cross-reactivity between adsorbed O-antigen rabbit antisera, as reviewed in reference 2. Furthermore, corrupted expression of genes involved in O-antigen synthesis renders some strains nontypeable (“rough”). Importantly, classical serotyping does not detect new O antigens. These drawbacks are surmounted by the rfb-restriction fragment length polymorphism (RFLP) test (1, 2). Briefly, the rfb locus containing most of the genes involved in O-antigen synthesis is amplified by PCR and digested with MboII, and products are resolved by electrophoresis. A database with 171 rfb-RFLP patterns representing each known Shigella and E. coli O antigen has been published and used to type reference and clinical strains, including an isolate that had become rough in the lab, with 100% specificity and sensitivity (1, 2). We present herein a web-based software program to compare the rfb-RFLP patterns of clinical isolates with those of known E. coli and Shigella O antigens.
For the purpose of this work, the concepts of similarity and alignment between two rfb-RFLP patterns were adopted from Needleman and Wunsch's dynamic programming algorithm (7), which detects insertions, deletions, and substitutions as changes in the strings that represent nucleic acid or protein sequences. By analogy, restriction patterns represented by ordered fragment sizes can be aligned and their similarity can be calculated as the sum of penalties for the edit operations that transform one pattern into the other. These edit operations are deletions (missing bands) and transformations (errors in fragment sizing). A scoring function assigns a penalty to each transformation, and a dynamic programming algorithm computes the best editing scores between two patterns, producing an editing matrix. For two patterns, a = a1a2···am and b = b1b2···bn, where ai is the fragment size at position i of test pattern a and bj is the fragment size at position j of reference pattern b, the editing matrix has (m + 1) × (n + 1) positions, where m is the number of fragments in pattern a and n is the number of fragments in pattern b. The matrix is initialized, filling the first column with each fragment of pattern a and the first row with each fragment of pattern b. The aim is to move across the matrix from the first column and first row to the last column and last row. At each position, it is possible to move only one step either in the diagonal to align two fragments and advance one position in both patterns or in the horizontal (or vertical) to acknowledge a missing fragment in one pattern and advance one position in the other. Sij is the cumulative score at position i in pattern a and position j in pattern b:
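Given the three moves described above, and because scores are penalties with lower values being better, the cumulative score presumably follows the standard Needleman-Wunsch recurrence written as a minimization. The form below is a reconstruction under that assumption; it does not show how the threshold σ enters the scoring function s, which cannot be recovered from the text.

```latex
S_{i,j} = \min
\begin{cases}
S_{i-1,j-1} + s(a_i, b_j) & \text{(diagonal: align } a_i \text{ with } b_j\text{)} \\
S_{i-1,j} + w             & \text{(vertical: } a_i \text{ has no partner in } b\text{)} \\
S_{i,j-1} + w             & \text{(horizontal: } b_j \text{ has no partner in } a\text{)}
\end{cases}
```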
s(ai, bj) is the score for aligning the fragments ai and bj, and w is the penalty for a missing fragment at position i of pattern a or at position j of pattern b; σ is a threshold defined by the equation σ = −5.82 × 10^−6 × ai + 0.04451.
The scoring function tolerates a variable sizing error between two identical fragments. This error cannot exceed the penalty for a band deletion (w) which, as defined by the variable threshold σ, corresponds to a maximal error in band sizing ranging linearly from 7.0% at 0.5 kbp to 3.5% at 4 kbp. The editing score between two patterns is the sum of the penalties of all edit operations required to transform one pattern into another. It is found in the intersection of the last row and last column of the editing matrix. The corresponding global alignment is then extracted by tracing back the editing matrix.
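The alignment procedure described above can be sketched in Python as follows. This is a minimal illustration, not the MST implementation. Several details are assumptions, because the text does not specify them: `pair_score` takes s(ai, bj) to be the relative sizing error and forbids alignments whose error exceeds σ; the gap penalty `w` defaults to a hypothetical 0.05; and the first row and column of the matrix hold cumulative gap penalties, as in the standard global alignment scheme.

```python
import math

def sigma(size_bp):
    """Size-dependent tolerance from the text: sigma = -5.82e-6 * a_i + 0.04451."""
    return -5.82e-6 * size_bp + 0.04451

def pair_score(a, b):
    """Assumed form of s(a_i, b_j): relative sizing error, or None if above sigma."""
    err = abs(a - b) / a
    return err if err <= sigma(a) else None

def align(a, b, w=0.05):
    """Globally align two ordered fragment-size lists.

    Returns (editing_score, pairs); a gap is marked by None in a pair.
    The default w is a hypothetical value chosen to exceed the largest sigma.
    """
    m, n = len(a), len(b)
    S = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # leading fragments of a left unmatched
        S[i][0] = S[i - 1][0] + w
    for j in range(1, n + 1):          # leading fragments of b left unmatched
        S[0][j] = S[0][j - 1] + w
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = pair_score(a[i - 1], b[j - 1])
            diag = S[i - 1][j - 1] + s if s is not None else math.inf
            S[i][j] = min(diag, S[i - 1][j] + w, S[i][j - 1] + w)

    # Trace back from the last row and column to recover the alignment.
    pairs, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = pair_score(a[i - 1], b[j - 1])
            if s is not None and math.isclose(S[i][j], S[i - 1][j - 1] + s):
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and math.isclose(S[i][j], S[i - 1][j] + w):
            pairs.append((a[i - 1], None))   # fragment of a without partner
            i -= 1
        else:
            pairs.append((None, b[j - 1]))   # fragment of b without partner
            j -= 1
    pairs.reverse()
    return S[m][n], pairs
```

With these assumptions, `align([1000, 2000, 3000], [1000, 3000])` pairs the 1000 bp and 3000 bp fragments, reports the 2000 bp fragment as missing from the second pattern, and returns an editing score equal to one gap penalty.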
A global score (Gs) is calculated using the editing score, the number of nonaligned fragments, and the total number of fragments in the two patterns: Gs = editing score × number of gaps × 100/(number of bands)².
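As a worked illustration of the formula, the short function below computes Gs from its three inputs. The function and parameter names are hypothetical, and the example values are invented for the arithmetic only; they are not data from the study.

```python
def global_score(editing_score, n_gaps, n_bands):
    """Gs = editing score x number of gaps x 100 / (number of bands)^2.

    n_bands is the total number of fragments in the two patterns.
    """
    if n_bands <= 0:
        raise ValueError("n_bands must be positive")
    return editing_score * n_gaps * 100 / n_bands ** 2

# Invented example: editing score 0.05, one nonaligned fragment, five bands in total.
example_gs = global_score(0.05, 1, 5)   # 0.05 * 1 * 100 / 25, about 0.2
```

Note that, as the formula is written, an alignment with no gaps yields Gs = 0 whatever the editing score, so sizing errors alone do not raise the global score.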
Gs is more influenced by failing to match two identical bands than by errors in band sizing, which are the most common artifact affecting reproducibility of RFLP-based methods.
MST (Molecular Serotyping Tool) (http://www.cebio.org/mst) iterates over the 171 reference Shigella and E. coli rfb-RFLP patterns previously published (1, 2), searching for the reference pattern with the lowest Gs to the test pattern provided by the user. If a match is found with Gs under a threshold (default = 1.5), the output is a schematic representation of the two patterns displayed side by side where two corresponding bands are linked by a line and the absence of a band is represented by a blank space in the respective pattern (Fig. 1).
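The search step can be sketched as a loop over the reference patterns that keeps the lowest score and applies the threshold. This is a hedged outline rather than MST's code: `best_match`, `toy_score`, and the reference names and fragment sizes are all hypothetical, and `toy_score` is only a stand-in for the Gs computation so that the example runs on its own.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

Pattern = Sequence[float]

def best_match(
    test: Pattern,
    references: Mapping[str, Pattern],
    score_fn: Callable[[Pattern, Pattern], float],
    threshold: float = 1.5,
) -> Optional[Tuple[str, float]]:
    """Return (name, score) of the lowest-scoring reference, or None if the
    best score is not under the threshold (1.5 is the default given in the text)."""
    best_name, best_score = None, float("inf")
    for name, ref in references.items():
        score = score_fn(test, ref)
        if score < best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score < threshold:
        return best_name, best_score
    return None

def toy_score(a: Pattern, b: Pattern) -> float:
    """Stand-in scorer (NOT Gs): counts fragment sizes present in only one pattern."""
    return float(len(set(a) ^ set(b)))

# Made-up reference patterns, for illustration only.
refs = {"ref_A": [1000, 2000], "ref_B": [1000, 2500, 3000]}
hit = best_match([1000, 2000], refs, toy_score)
miss = best_match([400, 700, 900], refs, toy_score)
```

Here `hit` is `("ref_A", 0.0)` and `miss` is `None`, because every toy score for the second query is at or above the threshold. In practice `score_fn` would wrap the alignment and Gs calculation.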
MST was validated by searching the database with previously published rfb-RFLP patterns of 24 Shigella and 14 E. coli clinical isolates with known O antigens determined by rfb-RFLP and classical serotyping (1, 2). In all cases, MST accurately identified the O antigen from the rfb-RFLP pattern (Table 1) (100% specificity and sensitivity; discriminatory power = 1; scores ranging from 0 to 1.0, median = 0). The 171 patterns in the reference database for E. coli and Shigella were shuffled to generate 1,000 random patterns, which were then compared to the database with MST. Seventeen matches (1.7%) were found with scores ranging from 0 to 1.5 (median = 1.2). The median scores of the 38 clinical isolates and the 17 random patterns with matches in the database were significantly different (P < 0.0001) by the Mann-Whitney test (GraphPad Prism, version 5.0; GraphPad Software Inc., San Diego, CA).
Three new Shigella serotypes and an O148 Shiga toxin-producing E. coli strain have been described using the rfb-RFLP test (3-6). The possibility of searching the database of reference rfb-RFLP patterns using MST might further contribute to the epidemiology of E. coli and Shigella.
R.S.C. and G.C.O. are research fellows from CDTS-FIOCRUZ/CAPES and CNPq, respectively. This work was supported by FAPEMIG grant CBB-1181/0 and NIH-Fogarty grant TW007012.
We acknowledge Eric Aguiar for technical assistance.
Published ahead of print on 3 March 2010.