The genetic similarity between Mycobacterium avium subsp. paratuberculosis and other mycobacterial species has confounded the development of M. avium subsp. paratuberculosis-specific diagnostic reagents. Random shotgun sequencing of the M. avium subsp. paratuberculosis genome in our laboratories has shown >98% sequence identity with Mycobacterium avium subsp. avium in some regions. However, an in silico comparison of the largest annotated M. avium subsp. paratuberculosis contigs, totaling 2,658,271 bp, with the unfinished M. avium subsp. avium genome has revealed 27 predicted M. avium subsp. paratuberculosis coding sequences that do not align with M. avium subsp. avium sequences. BLASTP analysis of the 27 predicted coding sequences (genes) shows that 24 do not match sequences in public sequence databases, such as GenBank. These novel sequences were examined by PCR amplification with genomic DNA from eight mycobacterial species and ten independent isolates of M. avium subsp. paratuberculosis. From these analyses, 21 genes were found to be present in all M. avium subsp. paratuberculosis isolates and absent from all other mycobacterial species tested. One region of the M. avium subsp. paratuberculosis genome contains a cluster of eight genes, arranged in tandem, that is absent in other mycobacterial species. This region spans 4.4 kb and is separated from other predicted coding regions by 1,408 bp upstream and 1,092 bp downstream. The gene upstream of this eight-gene cluster has strong similarity to mycobacteriophage integrase sequences. The GC content of this 4.4-kb region is 66%, which is similar to the rest of the genome, indicating that this region was not horizontally acquired recently. Southern hybridization analysis confirmed that this gene cluster is present only in M. avium subsp. paratuberculosis. Collectively, these studies suggest that a genomics approach will help in identifying novel M. avium subsp. paratuberculosis genes as candidate diagnostic sequences.
Paratuberculosis, or Johne's disease, is a granulomatous enteritis of ruminant animals that may be prevalent in approximately 35% of United States dairy herds (7, 24). Diarrhea, reduced feed intake, weight loss, and eventual death characterize this intestinal disorder in cattle. Based upon prevalence figures and information from animal producers, economic losses for the dairy industry exceed 200 million dollars annually (18). Mycobacterium avium subsp. paratuberculosis is the etiologic agent of this economically significant disease. This veterinary pathogen has also been implicated as the etiologic agent of Crohn's disease (15), leading researchers to speculate on a potential pathogenic role for this organism in humans.
The control of Johne's disease is severely hampered by inadequate diagnostic tools (26). The prolonged incubation time and presence of subclinical cases permit infected animals to shed large amounts of bacilli in their feces before detection (21). Culture of M. avium subsp. paratuberculosis from feces has been the most reliable method for identifying infected animals; however, the slow growth of this organism results in a minimum of 6 weeks before culture data are available. Research on the pathogenesis and immunology of M. avium subsp. paratuberculosis infections of cattle will allow the design of better diagnostic and control procedures. New approaches that yield improved diagnostic tests will enable early detection and removal of subclinically infected animals. This will effectively reduce the incidence of Johne's disease in beef and dairy herds.
With the availability of over 60 published microbial genomes, some of which are in the same genus or even species, the age of comparative genomics has arrived. This approach is particularly useful in the genus Mycobacterium due to the number of sequenced species. M. tuberculosis and M. leprae genomes have been published (5, 6), and projects are under way for M. bovis, M. avium subsp. avium, M. smegmatis, and M. avium subsp. paratuberculosis (3). Comparative mycobacterial genomic approaches have been used to identify small-scale genomic deletions among M. tuberculosis isolates (12). Furthermore, large genome rearrangements (2) as well as deleted regions (14) were identified in studies comparing the M. bovis BCG vaccine strain with M. tuberculosis. Genome-wide comparisons in this genus will lead to an increased understanding of the genes required for pathogenicity and will highlight the sequences that make each species distinct.
Our laboratories have been actively engaged in sequencing the genome of M. avium subsp. paratuberculosis in order to reveal diagnostic sequences and/or antigens as well as to better understand the pathogenesis of Johne's disease. The strong nucleotide identity between M. avium subsp. avium and M. avium subsp. paratuberculosis (11, 22) has prevented the development of M. avium subsp. paratuberculosis-specific DNA sequences or antigens. To date, the only routinely used diagnostic sequence is that of the insertion element IS900, which is present in multiple copies in the M. avium subsp. paratuberculosis genome (9). In this study, we performed a partial genome comparison between the largest annotated contiguous DNA fragments (contigs) of M. avium subsp. paratuberculosis and the genetically similar M. avium subsp. avium genome. Sequences present in M. avium subsp. paratuberculosis but not M. avium subsp. avium were further analyzed by PCR with genomic DNA from several mycobacterial species. From these analyses, 21 unique M. avium subsp. paratuberculosis predicted coding sequences were identified. These unique sequences may be used to develop improved diagnostic reagents.
Mycobacteria used in this study are listed in Table 1. All mycobacteria were cultured in Middlebrook 7H9 medium with 0.05% Tween 80 and oleic acid-albumin-dextrose-complex (Becton Dickinson Microbiology, Sparks, Md.). Cultures containing M. avium subsp. paratuberculosis isolates were supplemented with 2 mg of ferric mycobactin J (Allied Monitor Inc., Fayette, Mo.) per liter. All growth flasks were incubated at 37°C without shaking.
The sequencing and assembly strategies used here will be described elsewhere. For these studies, we chose assembled M. avium subsp. paratuberculosis contig fragments greater than 10 kb. Predicted coding sequences (genes) were identified with ARTEMIS software (http://www.sanger.ac.uk/Software/) and TB-parse, a program used to identify coding sequences in the M. tuberculosis genome (5). The results were compared and verified manually in ARTEMIS. A putative ribosome binding site was also evaluated for each coding sequence: the presence of an AG-rich sequence approximately 30 bp upstream of the start codon was scored as a putative ribosome binding site. Similarities were identified by BLASTP searches against GenBank and a local database constructed by the Computational Biology Center at the University of Minnesota (http://www.cbc.umn.edu).
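The ribosome binding site criterion above can be illustrated with a short Python sketch. This is not the ARTEMIS or TB-parse implementation; the 30-bp search window, the 6-nt scan, and the five-purine threshold used to define "AG-rich" are hypothetical parameters chosen only for illustration, and genes on the reverse strand would need to be reverse-complemented first.

```python
def has_putative_rbs(seq: str, start: int, upstream: int = 30,
                     kmer_len: int = 6, min_purines: int = 5) -> bool:
    """Flag an AG-rich stretch in the region just 5' of a start codon.

    seq: forward-strand sequence; start: 0-based index of the start codon.
    Hypothetical rule: any kmer_len-nt window within the `upstream` bases
    before the start codon that contains at least min_purines A or G bases.
    """
    window = seq[max(0, start - upstream):start].upper()
    for i in range(len(window) - kmer_len + 1):
        kmer = window[i:i + kmer_len]
        if sum(base in "AG" for base in kmer) >= min_purines:
            return True
    return False


# Toy sequences (not from the study); the start codon begins at index 30.
with_motif = "C" * 10 + "AGGAGG" + "C" * 14 + "ATGAAA"
without_motif = "C" * 30 + "ATGAAA"
print(has_putative_rbs(with_motif, 30), has_putative_rbs(without_motif, 30))  # True False
```

A purine-count window is used here because the text specifies only "AG-rich"; a Shine-Dalgarno consensus match would be an equally plausible, and equally assumed, alternative.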
Sequence alignments of M. avium subsp. paratuberculosis and M. avium subsp. avium were compared and visualized with ACT software (http://www.sanger.ac.uk/Software/). M. avium subsp. avium is being sequenced by The Institute for Genomic Research (TIGR; http://www.tigr.org/cgi-bin/BlastSearch/blast.cgi?organism=m_avium). Sequence alignments used to produce illustrations were made with AssemblyLIGN software (Accelrys, Princeton, N.J.).
Genomic DNA was extracted from several species of mycobacteria by a method modified from that described by Whipple et al. (25). One liter of Middlebrook 7H9-cultured mycobacteria was incubated at 37°C until an optical density at 540 nm between 0.50 and 0.56 was attained. D-Cycloserine was added to the medium at a final concentration of 0.5 mg/ml, and the culture was incubated for an additional 24 h. Mycobacteria were harvested by centrifugation at 9,950 × g for 15 min, and the pellet was resuspended in 11 ml of Qiagen buffer B1 containing 1 mg of Qiagen RNase A per ml. Lipase (450,000 U; catalog no. L4384; Sigma, St. Louis, Mo.) was added to digest mycobacterial cell wall lipids. Following a 2-h incubation at 37°C, 20 mg of lysozyme was added, and incubation proceeded for an additional 3 h at 37°C. Qiagen proteinase K (500 μl; 20 mg/ml) was then added, and the mixture was incubated for 1.5 h at 37°C. Qiagen buffer B2 (4 ml) was added, and the slurry was mixed and incubated for 16 h at 50°C. The remaining cellular debris was removed by centrifugation at 12,100 × g for 20 min. The supernatant was poured over a preequilibrated Qiagen 500/G genomic tip, and the loaded column was washed and processed according to the manufacturer's instructions.
For Southern hybridization, PstI-digested genomic DNA fragments were separated on a 1% agarose gel. DNA-containing gels were depurinated, denatured, and neutralized as described by Sambrook et al. (19). DNA was transferred by capillary action (20) to BrightStar-Plus membranes (Ambion, Austin, Tex.), and probes were labeled with [α-32P]dCTP (ICN, Costa Mesa, Calif.) by random priming. Hybridization was performed in an Autoblot hybridization oven (Bellco Biotechnology, Vineland, N.J.) at 45°C for 16 h in ExpressHyb hybridization solution (Clontech, Palo Alto, Calif.). Probed blots were washed sequentially with solutions of increasing stringency as described previously (20). Detection was by autoradiography using BioMax MR film (Kodak, Rochester, N.Y.).
Primers listed in Table 2 were designed from M. avium subsp. paratuberculosis-specific sequences to amplify mycobacterial genomic DNA. Reaction mixtures containing nucleotides, buffer, primers, template, and DNA polymerase were standard except for the addition of 5% dimethyl sulfoxide (Sigma). Amplification conditions included a 5-min denaturation step at 94°C and 35 cycles of 45 s at 94°C, 1 min at 55°C, and 2 min at 72°C. High-fidelity Pwo polymerase (Boehringer Ingelheim Pharmaceutical Inc., Ridgefield, Conn.) was used to generate probes for Southern hybridization experiments. All other amplifications used Taq DNA polymerase (Roche Molecular Biochemicals, Indianapolis, Ind.). The primers used to amplify the no. 7 sequence as a probe for Southern hybridization were 5′-ATCAGGCTGACGGGATTGCCC-3′ and 5′-TCAACGAGTGCACGGGAACC-3′.
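As a rough sanity check on the no. 7 probe primers and the cycling program described above, the following Python sketch computes primer GC content, a Wallace-rule melting temperature (2°C per A or T plus 4°C per G or C), and the total programmed hold time. The Wallace rule is a crude approximation that the study does not mention, and ramp times and any final extension step are not described in the text, so they are ignored here.

```python
# no. 7 probe primers from the Methods text, written 5' to 3'
PRIMERS = {
    "no7_forward": "ATCAGGCTGACGGGATTGCCC",
    "no7_reverse": "TCAACGAGTGCACGGGAACC",
}

# (step, temperature in °C, hold in s) as described in the Methods text
INITIAL_DENATURATION = ("denature", 94, 5 * 60)
CYCLE = [("denature", 94, 45), ("anneal", 55, 60), ("extend", 72, 120)]
N_CYCLES = 35


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in a primer."""
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)


def wallace_tm(primer: str) -> int:
    """Crude melting-temperature estimate: 2 °C per A/T plus 4 °C per G/C."""
    primer = primer.upper()
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


def hold_time_minutes() -> float:
    """Total programmed hold time; ramp times are ignored (an assumption)."""
    seconds = INITIAL_DENATURATION[2] + N_CYCLES * sum(hold for _, _, hold in CYCLE)
    return seconds / 60


for name, seq in PRIMERS.items():
    print(name, len(seq), f"{gc_percent(seq):.1f}% GC", f"Tm ~{wallace_tm(seq)} °C")
print(f"{hold_time_minutes():.2f} min of hold time")  # 136.25 min of hold time
```

Both crude estimates (68°C and 64°C) lie above the 55°C annealing step; nearest-neighbor methods would give more reliable values for primers of this length.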
The nucleotide sequences of all M. avium subsp. paratuberculosis genes described in this study were deposited in the GenBank/EMBL nucleotide sequence data library under accession numbers AF445420 through AF445446.
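The deposited accession range can be expanded programmatically. It contains 27 numbers, the same as the number of genes listed in Table 3, which is consistent with one accession per gene (an inference from the counts, not a statement in the text). The helper below is a hypothetical illustration and assumes both accessions share the same letter prefix and digit width.

```python
import re


def expand_accession_range(first: str, last: str) -> list:
    """Expand an accession range such as AF445420..AF445446 into a list.

    Assumes both accessions share the same alphabetic prefix and digit width.
    """
    pattern = re.compile(r"([A-Z]+)(\d+)$")
    m1, m2 = pattern.match(first), pattern.match(last)
    if not (m1 and m2) or m1.group(1) != m2.group(1):
        raise ValueError("accessions must share a letter prefix")
    prefix, width = m1.group(1), len(m1.group(2))
    start, end = int(m1.group(2)), int(m2.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


accessions = expand_accession_range("AF445420", "AF445446")
print(len(accessions))  # 27
```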
Our laboratories are sequencing the complete genome of M. avium subsp. paratuberculosis K-10, a field isolate recovered from a cow with clinical Johne's disease (http://www.cbc.umn.edu/ResearchProjects/AGAC/Mptb/Mptbhome.html). The genome size is estimated to be >5 Mb based on assembled sequence data, and at the time of this analysis (July 2001), 2.65 Mb was contained in contig fragments greater than 10 kb. Contigs that are above 10 kb were annotated with ARTEMIS and represent 48% of the total genome. The average size of the annotated contigs is 25 kb, with one contig over 70 kb. Each gene within the annotated contig set was also checked manually and confirmed by TB-parse. These contigs were aligned with M. avium subsp. avium sequence data generated at TIGR. TIGR has 612 contigs that total 5,867,714 bp in the 8 July 2001 data set.
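The coverage figures in this paragraph can be cross-checked with simple arithmetic. Because 48% is itself a rounded figure, the genome size implied by the annotated total is only approximate; it agrees with the >5 Mb estimate given above.

```python
# Figures quoted in this study (July 2001 data sets)
ANNOTATED_BP = 2_658_271      # MAP K-10 contigs > 10 kb (abstract)
ANNOTATED_FRACTION = 0.48     # share of the genome these contigs represent
TIGR_CONTIGS = 612            # M. avium subsp. avium contigs, 8 July 2001
TIGR_TOTAL_BP = 5_867_714

implied_genome_mb = ANNOTATED_BP / ANNOTATED_FRACTION / 1e6
mean_tigr_contig_bp = TIGR_TOTAL_BP / TIGR_CONTIGS
print(f"implied MAP genome size: {implied_genome_mb:.2f} Mb")     # 5.54 Mb
print(f"mean TIGR contig length: {mean_tigr_contig_bp:,.0f} bp")  # 9,588 bp
```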
M. avium subsp. avium and M. avium subsp. paratuberculosis display a high degree of similarity at the nucleotide level as well as local conservation of gene order. An analysis of an 11-kb region surrounding the origin of replication in each of these genomes shows 98% nucleotide identity (Q. Zhang, E. Baechler, L. Li, J. P. Bannantine, and V. Kapur, unpublished data). The sequence similarity between orthologs in M. avium subsp. paratuberculosis and M. avium subsp. avium was greater than that between M. avium subsp. paratuberculosis and other mycobacterial species. A more global comparison shows that these strong nucleotide identities are present throughout both genomes. Despite this strong genetic similarity, a total of 27 genes from the annotated M. avium subsp. paratuberculosis contigs were identified that did not align with the unfinished M. avium subsp. avium genome in computerized alignments. These unique M. avium subsp. paratuberculosis sequences are listed in Table 3 along with some of their sequence characteristics. Of these, three showed weak similarity to proteins in other mycobacterial species or to proteins in GenBank (Table 3). This leaves 24 genes with no significant similarity to any known proteins. Since only approximately half of the M. avium subsp. paratuberculosis genome was used in these analyses, a complete genome analysis may reveal an estimated 50 unique M. avium subsp. paratuberculosis genes.
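The in silico screen described here can be pictured as an interval-overlap test: a predicted gene is flagged when none of it is covered by regions of the M. avium subsp. paratuberculosis contig that align to M. avium subsp. avium. The study compared alignments in ACT; this Python sketch swaps in a simple coverage rule, and the gene names and coordinates are hypothetical. The `extrapolate` helper reproduces the back-of-envelope genome-wide estimate, assuming unique genes are evenly distributed: 27 genes in 48% of the genome gives about 56, of the same order as the estimate of 50.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on a MAP contig


def merge_intervals(hits: List[Interval]) -> List[Interval]:
    """Merge overlapping alignment hits so coverage is not double counted."""
    merged: List[List[int]] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def unaligned_genes(genes: Dict[str, Interval], hits: List[Interval],
                    max_covered: float = 0.0) -> List[str]:
    """Return genes whose covered fraction is at most max_covered (0.0 = no overlap)."""
    merged = merge_intervals(hits)
    result = []
    for name, (g_start, g_end) in genes.items():
        covered = sum(max(0, min(g_end, e) - max(g_start, s)) for s, e in merged)
        if covered / (g_end - g_start) <= max_covered:
            result.append(name)
    return result


def extrapolate(unique_found: int, fraction_analysed: float) -> float:
    """Naive linear extrapolation; assumes unique genes are spread evenly."""
    return unique_found / fraction_analysed


# Hypothetical toy coordinates, not data from this study.
genes = {"geneA": (0, 900), "geneB": (1200, 2100), "geneC": (2500, 3400)}
hits = [(0, 1000), (2400, 3000), (2900, 3500)]
print(unaligned_genes(genes, hits))              # ['geneB']
print(f"{extrapolate(27, 0.48):.0f} genes")      # 56 genes
```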
Some M. avium subsp. paratuberculosis sequences that did not align with M. avium subsp. avium, either in silico or experimentally, contain similarity to other mycobacterial species. One such sequence, designated no. 7, was tested by PCR and Southern hybridization with two M. avium subsp. avium isolates and two M. avium subsp. paratuberculosis strains (Fig. 1). An amplified PCR fragment was produced only with M. avium subsp. paratuberculosis genomic DNA as the template (Fig. 1A). Likewise, DNA hybridization on Southern blots detected only M. avium subsp. paratuberculosis sequences, not M. avium subsp. avium (Fig. 1B). However, BLASTP analysis of the no. 7 sequence revealed strong similarity to hypothetical proteins in the M. tuberculosis genome. Therefore, caution must be used in determining whether a sequence is truly unique to M. avium subsp. paratuberculosis. More comprehensive experiments using additional mycobacterial species are necessary before such conclusions can be made.
PCR amplification was performed on several mycobacterial species, strains, and isolates to determine experimentally the specificity of 26 of the 27 sequences (Table 4). Gene 128 was not included in these analyses because it had the lowest expect value (highest similarity to a sequence in GenBank) of the 27 sequences by BLASTP analysis (Table 3). These data show that primers designed from all 26 M. avium subsp. paratuberculosis K-10 genes produced an amplified product with all 10 M. avium subsp. paratuberculosis strains or isolates tested. However, despite the absence of any homologous sequences in public databases, PCR products of the correct size were obtained for five genes by using templates from other mycobacterial species. After these five genes were excluded, a core group of 21 genes present only in M. avium subsp. paratuberculosis remained (Table 4).
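The PCR screen amounts to a two-sided filter: keep a gene only if it amplifies from every M. avium subsp. paratuberculosis template and from no other species. The Python sketch below expresses that rule; the gene and template labels are hypothetical, since the per-template results in Table 4 are not reproduced here.

```python
from typing import Dict, Set


def specific_genes(results: Dict[str, Dict[str, bool]], target: Set[str]) -> Set[str]:
    """Genes amplified from every target template and from no other template.

    results[gene][template] is True when a PCR product of the expected size
    was observed with that template.
    """
    keep = set()
    for gene, by_template in results.items():
        on_target = all(by_template[t] for t in target)
        off_target = any(hit for t, hit in by_template.items() if t not in target)
        if on_target and not off_target:
            keep.add(gene)
    return keep


# Hypothetical gene and template labels; not the data in Table 4.
target = {"MAP_K10", "MAP_isolate2"}
results = {
    "g1": {"MAP_K10": True, "MAP_isolate2": True, "M_avium": False, "M_bovis": False},
    "g2": {"MAP_K10": True, "MAP_isolate2": True, "M_avium": True, "M_bovis": False},
    "g3": {"MAP_K10": True, "MAP_isolate2": False, "M_avium": False, "M_bovis": False},
}
print(specific_genes(results, target))  # {'g1'}
```

Applied to the study's counts, this rule removes the five cross-reacting genes from the 26 tested, leaving the core group of 21.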
Table 3 lists eight genes present on contig fragment 1614. These eight genes are arranged in tandem, span a total of 4.4 kb at the end of contig 1614 (Fig. 2), and are present only in M. avium subsp. paratuberculosis (Table 4). Located 1,408 bp upstream of gene 250 is an integrase gene with similarity to other mycobacteriophage integrases. As larger contiguous fragments were assembled during the gap closure phase of the M. avium subsp. paratuberculosis genome project, a search was performed to define the ends of the 4.4-kb sequence not present in M. avium subsp. avium. This 4.4-kb segment containing genes 250 to 257, herein termed no. 481, is located at the end of the 46-kb contig 1614 and was found to align with the 94-kb contig 1398 present in a more recent contig assembly data set (Fig. 2). The no. 481 sequence aligned near the center of the 94-kb contig, at approximately 35 to 45 kb. A trimmed portion of contig 1398 is shown in the alignment in Fig. 2. This analysis further extended the region of the no. 481 sequence to 9.4 kb, none of which aligns with the M. avium subsp. avium sequence in silico.
A TBLASTX analysis was performed on the 9.4-kb sequence (designated contig 1398-trimmed in Fig. 2). This analysis revealed that, although no sequences aligned with M. avium subsp. avium, the ends of contig 1398-trimmed align with sequences in M. tuberculosis (Table 5). The open reading frames designated by a question mark in Table 5 are present on contig 1398, which has not yet been annotated. This again leaves a core sequence of eight open reading frames, comprising the no. 481 sequence, that are present only in M. avium subsp. paratuberculosis. This core sequence is flanked by 1,408 bp of noncoding sequence upstream and 1,092 bp of noncoding sequence downstream (Fig. 2). Therefore, this novel core sequence is well separated from other predicted open reading frames.
To confirm experimentally that no. 481 is present only in M. avium subsp. paratuberculosis, three arbitrarily chosen genes of the no. 481 sequence (251, 253, and 255) were radiolabeled and used as probes in DNA hybridization with several mycobacterial species, including M. fortuitum, M. bovis, M. intracellulare, M. avium subsp. avium, and M. avium subsp. paratuberculosis (Fig. 3). Only an M. avium subsp. paratuberculosis fragment greater than 9.5 kb was detected by each of the three gene probes.
A major research effort in the study of M. avium subsp. paratuberculosis has been directed at unraveling the complexities surrounding diagnosis of infected animals. However, no DNA sequence besides the IS900 element has been routinely used to detect the presence of M. avium subsp. paratuberculosis (9, 16, 17). IS900 is a repeated sequence in M. avium subsp. paratuberculosis, present in 14 copies in strain K-10 (V. Kapur, L. Li, Q. Zhang, and J. P. Bannantine, unpublished data). The results of this study reveal an initial list of 27 sequences that are likely specific to M. avium subsp. paratuberculosis, as determined by an in silico comparison with M. avium subsp. avium. Subsequent analysis by PCR amplification has trimmed this list down to 21 M. avium subsp. paratuberculosis-specific sequences. Nearly one half of the genome was analyzed; therefore, the list reported here will likely expand when the genome sequence is completed. These novel sequences provide investigators with a list of potential diagnostic candidate sequences that can be applied in a multiplex PCR format to better diagnose Johne's disease in cattle.
A surprising finding revealed by this comparative genomic approach was the presence of M. avium subsp. paratuberculosis sequences that contain similarity to M. tuberculosis but are absent in M. avium subsp. avium. This was observed for portions of contig 1398-trimmed and the no. 7 sequence, which is not listed in Table 3. Because M. avium subsp. avium is most closely related to M. avium subsp. paratuberculosis, it is the genome of choice for initial screening of novel M. avium subsp. paratuberculosis sequences. However, each sequence must be subsequently evaluated for specificity experimentally with a complete panel of Mycobacterium sp. DNA before specificity is concluded.
One of the annotated contigs (1614) contained 8 of the 27 M. avium subsp. paratuberculosis sequences not present in M. avium subsp. avium. This region of eight genes (no. 481 sequence) was examined further, as it seems likely to be cotranscribed from one or a small number of promoters, although this was not shown experimentally. The function of this novel gene cluster is not known, although it is possible that the no. 481 sequence represents a cryptic prophage or prophage remnant. The presence of an upstream coding sequence with strong identity to mycobacteriophage integrases supports this hypothesis. The selective presence of a mycobacteriophage in certain mycobacterial species is not unprecedented. The prophage phiRv1 is present in M. tuberculosis and some strains of M. bovis but is missing from all M. bovis BCG genomes (1, 2, 14). However, other mycobacteriophages do not have the genomic structure reported for the no. 481 sequence. For example, the genomes of the D29 and L5 mycobacteriophages are much larger, at 50 kb, with the integrase present near the middle of the bacteriophage genome next to an attP attachment site (8). Furthermore, the integrase gene is separated from the rest of the no. 481 sequence by 1.4 kb, and farther upstream of the putative integrase gene on contig 1614, another 2.3 kb of sequence separates it from the next predicted coding sequence. This arrangement differs from the high coding density seen in bacteriophages (8). Finally, sequencing of the M. tuberculosis genome shows that there are several segments containing phage-related genes (3). One of these appears to be small and contains part of a phage-like integrase gene and a putative excisionase gene but no other functions that are obviously phage related (5).
Horizontally transferred DNA segments that may correspond to pathogenicity islands can often be identified by differences in their GC content (10, 13). The 9.0-kb GS element in M. avium subsp. paratuberculosis (4, 23), for example, has an average GC content of 57.1% (4), significantly lower than the 69.31% average for the M. avium subsp. paratuberculosis genome. Although the source of this low-GC island has not been identified, the element is bounded by short inverted repeats, further suggesting its acquisition by horizontal transfer. The fact that the no. 481 sequence is adjacent to a putative integrase may suggest that it is part of a horizontally acquired element. However, the GC content of the no. 481 region (66%) is similar to that of the rest of the genome (69.31%), which argues against recent horizontal transfer from a species with a different GC content. In M. tuberculosis, the average GC content is 65.6%, although some areas show dramatic differences in GC content. Regions that were unusually GC rich or GC poor were found to correspond to the novel PE-PGRS gene family or to genes encoding polyketide synthases or transmembrane proteins (5).
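The GC-content argument can be made concrete with a sliding-window scan that flags windows deviating from the genome average. The window size, step, and 5-percentage-point cut-off below are hypothetical, and published analyses typically use statistical tests rather than a fixed threshold. With such a cut-off, the GS element (57.1%, about 12.2 points below 69.31%) would be flagged, whereas the no. 481 region (66%, about 3.3 points below) would not.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_outlier_windows(seq: str, window: int, step: int,
                       genome_gc: float, max_dev: float) -> List[Tuple[int, float]]:
    """Return (start, GC%) for windows whose GC differs from genome_gc by > max_dev points."""
    outliers = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_content(seq[start:start + window])
        if abs(gc - genome_gc) > max_dev:
            outliers.append((start, gc))
    return outliers


# Synthetic example (not study data): a 50% GC block between two 70% GC blocks.
hi = "GGGCCCCATA" * 10  # 100 bp at 70% GC
lo = "GCGCGATATA" * 10  # 100 bp at 50% GC
print(gc_outlier_windows(hi + lo + hi, window=100, step=100,
                         genome_gc=69.31, max_dev=5.0))  # [(100, 50.0)]
```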
One of the primary goals set when our laboratories undertook the sequencing of the M. avium subsp. paratuberculosis genome was to identify novel sequences with potential diagnostic utility. This communication represents our initial efforts to achieve this goal. Heterologous expression of these genes is in progress. The resulting purified proteins will then be used in studies to determine if they are recognized by sera from cattle with Johne's disease. These findings may have significant application in the development of new diagnostic tests to identify cattle infected with M. avium subsp. paratuberculosis.
We thank individuals at TIGR for their work on sequencing the M. avium subsp. avium genome. We thank Chad Reinke and Janis Hansen for technical assistance.
This work was funded by USDA-NRI grant 00-02215 to V.K. and J.P.B. Portions of this work were also supported by the Agricultural Research Service.