Diversity in genomic content among strains of the same pathogen species plays an important role in their evolution and could increase the antigenic repertoire of organisms to overcome host immune defenses. Genomic rearrangements, including insertions/deletions and inversions, are a common source of this diversity, which is well documented in members of the M. avium complex, including M. ap [13]. With the availability of complete genome sequences, pathogen diversity is now usually analyzed at the whole-genome level. Unfortunately, high-throughput sequencing projects are subject to errors and need continuous improvement as technology progresses. One of the most frequently encountered sequencing errors is the single-nucleotide miscall, which can result in frameshifts or in the addition or deletion of ORFs [15-17]. Other errors, such as inversions and translocations, are usually introduced during assembly of the whole sequence. In this study, we analyzed assembly-associated errors in the M. ap K-10 genome.
Earlier microarray analysis of strains belonging to the M. avium complex identified large insertions/deletions [18], in addition to three large inverted regions between the M. av and M. ap genomes [2]. Here, we used optical mapping [11] to examine genomic rearrangements between M. ap K-10, the recently sequenced strain, and ATCC 19698, the type strain of M. ap. Physical mapping has previously been used to compare the genomes of M. tuberculosis and M. bovis [19] and of M. tuberculosis and M. leprae [20]. Taking advantage of current technology, we were able to generate a high-resolution physical map of M. ap. Optical mapping has also been used for large-scale comparative analysis of several enteric pathogen genomes, revealing loci responsible for serotype conversion [11]. Comparison of the genomic maps of M. ap ATCC 19698 and M. ap K-10 showed that the two genomes share significant identity on a genome-wide scale at the resolution of the current optical mapping system [11]. In fact, the combined estimated size of the M. ap ATCC 19698 genome is only 6.8 kb (about 0.1%) larger than that of the sequenced M. ap K-10 genome.
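The size comparison above amounts to summing the restriction-fragment sizes in the optical map and comparing the total with the length of the reference sequence. The Python sketch below illustrates only that arithmetic: the function name `map_size_difference` is hypothetical, and the example uses a single placeholder "fragment" rather than real optical map data. Taking the published K-10 chromosome length of 4,829,781 bp, a 6.8 kb excess works out to about 0.14%, consistent with the rounded figure of about 0.1%.

```python
def map_size_difference(fragment_sizes_bp, reference_length_bp):
    """Return (difference_bp, percent_difference) of an optical map vs. a reference.

    Hypothetical helper: the optical-map genome size is estimated by summing
    its restriction-fragment sizes (a circular chromosome, so every base
    falls in exactly one fragment).
    """
    if reference_length_bp <= 0:
        raise ValueError("reference length must be positive")
    map_length_bp = sum(fragment_sizes_bp)
    difference_bp = map_length_bp - reference_length_bp
    percent = 100.0 * difference_bp / reference_length_bp
    return difference_bp, percent


if __name__ == "__main__":
    k10_length_bp = 4_829_781  # published K-10 chromosome length
    # Placeholder map: one "fragment" equal to the reference plus a 6.8 kb excess.
    diff, pct = map_size_difference([k10_length_bp + 6_800], k10_length_bp)
    print(diff, round(pct, 2))  # 6800 0.14
```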
Surprisingly, comparing the generated optical map with the restriction map of the M. ap K-10 genome sequence revealed an inversion of a large DNA segment (648 kb). This inversion lies close to an inverted region (inversion fragment III, 863.8 kb) that was identified earlier when the M. ap and M. ah genomes were compared [2]. Southern blotting, PCR and sequencing analyses did not confirm this difference between the genomes of the two strains. This leaves two possibilities. The first is that the assembly of the published M. ap K-10 genome sequence contains an error and should be corrected to reflect the data in this report. The second is that the changes reflect real mutational differences that arose during propagation of M. ap ATCC 19698 in the laboratory. Notably, the K-10 strain in our laboratory was obtained from Dr. Raul Barletta, the same source as the K-10 strain used in the genome project (personal communication). We therefore performed PCR analysis on strains K-10 and ATCC 19698 maintained in another laboratory, as well as on clinical isolates from different sources (human and cow). This analysis confirmed the optical mapping data and yielded a revised segment flanked by IS1311. Because repeated insertion sequences at both ends of a segment allow it to be assembled in either orientation, this arrangement suggests that an assembly error is the reason for the inversion.
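The inversion was detected by comparing the order of restriction fragments in the optical map with the restriction map of the K-10 sequence. The Python sketch below is a deliberately simplified illustration of that comparison, not necessarily how the maps were aligned in this study; practical optical-map aligners score fragment-size agreement with dynamic programming. It assumes two maps with the same number of fragments, a shared starting cut site, inversion breakpoints that coincide with cut sites, and a hypothetical 10% sizing tolerance. The function names and fragment sizes are illustrative only.

```python
def sizes_match(a, b, rel_tol=0.1):
    """True if two fragment sizes agree within a relative tolerance (sizing error)."""
    return abs(a - b) <= rel_tol * max(a, b)


def find_single_inversion(optical, reference, rel_tol=0.1):
    """Locate one inverted block between two ordered restriction maps.

    Simplifying assumptions (hypothetical): equal fragment counts, a shared
    starting cut site, and at most one inversion whose breakpoints fall on
    cut sites. Returns (start, end) fragment indices of the inverted block
    (end exclusive), None if the maps are collinear, and raises ValueError
    if the difference is not a single inversion.
    """
    if len(optical) != len(reference):
        raise ValueError("maps must contain the same number of fragments")
    n = len(reference)
    # Extend the collinear prefix shared by both maps.
    start = 0
    while start < n and sizes_match(optical[start], reference[start], rel_tol):
        start += 1
    if start == n:
        return None  # collinear maps, no rearrangement
    # Extend the collinear suffix shared by both maps.
    end = n
    while end > start and sizes_match(optical[end - 1], reference[end - 1], rel_tol):
        end -= 1
    # The remaining optical block should match the reference block read backwards.
    block_ok = all(
        sizes_match(o, r, rel_tol)
        for o, r in zip(optical[start:end], reversed(reference[start:end]))
    )
    if not block_ok:
        raise ValueError("difference is not explained by a single inversion")
    return start, end


if __name__ == "__main__":
    # Toy fragment sizes (kb), not real M. ap data.
    print(find_single_inversion([10, 40, 30, 20, 50], [10, 20, 30, 40, 50]))  # (1, 4)
```

Boundary fragments that happen to match in both orientations are absorbed into the collinear flanks, so the returned block can be slightly narrower than the true inverted segment.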
Optical mapping was previously used to aid assembly of the Y. pestis genome [8], a strategy that was not applied in the M. ap K-10 sequencing project [1]. Interestingly, the inverted region includes the MAP0001 gene and the origin of replication, which are conventionally used to set the orientation of genes in the genome. However, the oriC region and the conserved sequences involved in replication (e.g., DnaA boxes) [21] remain intact in the revised sequence; therefore, inversion of this region should not interfere with DNA replication. We suggest keeping the existing gene identification numbers in the inverted region to avoid the confusion that renumbering would cause. However, the genome sequence web portal should contain all of the information gathered from optical mapping (see supplementary Table Two, Additional file 1). Annotators and investigators interested in genomic synteny should be aware of the inversion in the assembled genome of the M. ap K-10 strain. Alternatively, all locus tags could be reassigned with a distinct prefix to reflect the revised gene order and orientation, although this would require renaming every gene in the M. ap K-10 genome. Further inspection of the corrected assembly identified two additional copies of genes that are paralogues of known genes in the M. ap K-10 genome, suggesting a gene duplication or transposition event. The importance of such gene duplication events remains to be analyzed.
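Keeping the existing locus tags means that only the coordinates and strand of genes inside the inverted segment change: a position p in a segment spanning a to b (1-based, inclusive) moves to a + b - p, and each gene's strand flips. The Python sketch below shows that remapping. The `Gene` record, the function name and the coordinates in the example are hypothetical, not actual K-10 annotations. It also assumes the segment does not wrap around the coordinate origin; if it does, as it may here since it includes MAP0001, coordinates would first need to be rotated so that the segment is contiguous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    locus_tag: str  # kept unchanged after the correction
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    strand: str     # "+" or "-"


def remap_through_inversion(gene, inv_start, inv_end):
    """Return the gene's coordinates after inverting [inv_start, inv_end] (1-based, inclusive)."""
    if gene.end < inv_start or gene.start > inv_end:
        return gene  # outside the inverted segment: unchanged
    if gene.start < inv_start or gene.end > inv_end:
        raise ValueError(f"{gene.locus_tag} spans an inversion breakpoint")
    # Position p maps to inv_start + inv_end - p, so start and end swap roles.
    flipped = "-" if gene.strand == "+" else "+"
    return Gene(
        gene.locus_tag,
        inv_start + inv_end - gene.end,
        inv_start + inv_end - gene.start,
        flipped,
    )


if __name__ == "__main__":
    # Hypothetical gene and segment coordinates, for illustration only.
    print(remap_through_inversion(Gene("MAP_HYPO1", 100, 200, "+"), 50, 1000))
    # Gene(locus_tag='MAP_HYPO1', start=850, end=950, strand='-')
```

Because the mapping is its own inverse, applying it twice restores the original coordinates, which makes it easy to translate annotations between the published and the revised assembly in either direction.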