We report the first whole-genome sequences for five strains, two carriage and three pathogenic, of the emerging pathogen Haemophilus haemolyticus. Preliminary analyses indicate that these genome sequences encode markers that distinguish H. haemolyticus from its closest Haemophilus relatives and provide clues to the identity of its virulence factors.
The bacterium Haemophilus haemolyticus is a Gram-negative, facultative anaerobe that colonizes the human respiratory tract (10). H. haemolyticus is one of eight Haemophilus species and is a sister taxon to the most pathogenic member of the genus, H. influenzae. Until very recently, H. haemolyticus was regarded as a strict human commensal that only rarely causes invasive disease, such as meningitis and bacteremia (1, 4, 9; R. D. Mair, X. Wang, E. Briere, L. S. Katz, A. Cohn, and L. W. Mayer, unpublished data). However, five cases of invasive disease detected in 2009 and 2010 that were originally attributed to nontypeable H. influenzae were later confirmed to be caused by H. haemolyticus (Mair et al., unpublished). The nature of its virulence determinants is unknown, and unambiguously differentiating H. haemolyticus from H. influenzae is challenging because the two species are closely related genetically. Genome sequence analysis may provide important clues to both of these open questions. Here, we report genome sequences for five strains of H. haemolyticus; these are the first reported whole-genome sequences for this species.
Two carriage H. haemolyticus strains were taken from a 2009 carriage study in Minnesota (M19107 and M19501) (6), and three pathogenic strains were isolated from patients in Georgia (M21127), Texas (M21621), and Illinois (M21639) in 2010. Sequencing of these five H. haemolyticus strains was performed at the Centers for Disease Control and Prevention (CDC) Biotechnology Core Facility using Roche Applied Science 454 pyrosequencing with the GS FLX Titanium platform. The genomes were sequenced at 13.2× to 69.3× coverage (average, 37.8×), and the predicted genome sizes ranged from 1.89 to 2.33 Mb (average, 2.03 Mb). Genome assembly and annotation were performed by the Computational Genomics Group at the Georgia Institute of Technology using a modified version of the CG-Pipeline annotation platform (freely available at http://sourceforge.net/projects/cg-pipeline/) (5). For each strain, independent genome assemblies were first constructed using the Newbler (7) and MIRA (3) assemblers and then merged using Minimus (12); genes were predicted ab initio with GeneMarkS (2). Contig numbers ranged from 22 to 123 (average, 53), and there were 1,886 to 2,782 predicted genes per genome (average, 2,190).
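As a rough illustration of how summary figures such as contig number, assembled size, and coverage relate to the underlying data, the following minimal Python sketch computes them from a multi-FASTA assembly. It is not part of CG-Pipeline; the function names, the toy input, and the definition of coverage used here (total sequenced bases divided by assembled size) are assumptions made for the example.

```python
def parse_fasta_lengths(text):
    """Return the length of each sequence in FASTA-formatted text."""
    lengths = []
    current = None  # running length of the record being read
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_summary(contig_lengths, total_read_bases):
    """Summarize an assembly (hypothetical helper, illustration only).

    Coverage is estimated as total sequenced bases / assembled size,
    which is an assumption of this sketch.
    """
    genome_size = sum(contig_lengths)
    coverage = total_read_bases / genome_size if genome_size else 0.0
    return {
        "contigs": len(contig_lengths),
        "genome_size_bp": genome_size,
        "coverage": coverage,
    }


# Toy input, not real data: two short contigs.
toy_fasta = ">contig1\nACGT\nAC\n>contig2\nGGGG\n"
summary = assembly_summary(parse_fasta_lengths(toy_fasta), total_read_bases=100)
```

For a real assembly, the FASTA text would be read from the merged contig file and the read-base total taken from the sequencing run report.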
H. haemolyticus is most closely related to H. influenzae, and there is currently no molecular typing scheme that can unambiguously distinguish the two species (8, 11). For example, neither 16S rRNA gene sequences nor multilocus sequence typing can be used to unambiguously separate H. haemolyticus from nontypeable H. influenzae strains. This poses a fundamental challenge to surveillance, and indeed, invasive cases of H. haemolyticus have been erroneously attributed to H. influenzae (Mair et al., unpublished). Complete genome sequences can provide a source of novel genetic markers that may be able to distinguish between these closely related species. Preliminary comparative analysis of the five H. haemolyticus genome sequences characterized here with 19 complete H. influenzae genome sequences revealed 54 clusters of orthologous genes that are present exclusively among H. haemolyticus bacteria and absent from H. influenzae bacteria. These gene sequences provide potential markers that can be used as the basis of future molecular typing assays, and the presence of such lineage-specific genes also suggests the possibility of a specific genomic basis of virulence for pathogenic H. haemolyticus strains.
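The selection of lineage-specific clusters described above can be sketched as a simple set filter. This is a minimal illustration, not the method used in this study: it assumes orthologous clusters have already been computed by some other tool and are available as a mapping from cluster identifier to the set of genomes with at least one member gene. The function name, the cluster identifiers, and the H. influenzae genome labels (Hi_1, Hi_2) are hypothetical. Whether a cluster must be present in all five H. haemolyticus genomes or in only some of them is left as a parameter, because the text does not specify it.

```python
def lineage_specific_clusters(clusters, target, outgroup, require_all_targets=False):
    """Return sorted IDs of clusters found in target genomes but in no outgroup genome.

    clusters: dict mapping cluster ID -> iterable of genome IDs with a member gene
    target: genome IDs of the lineage of interest
    outgroup: genome IDs that must not contain the cluster
    require_all_targets: if True, keep only clusters present in every target genome
    """
    target = set(target)
    outgroup = set(outgroup)
    selected = []
    for cluster_id, genomes in clusters.items():
        genomes = set(genomes)
        if genomes & outgroup:
            continue  # present in at least one outgroup genome
        hits = genomes & target
        if not hits:
            continue  # absent from the target lineage
        if require_all_targets and hits != target:
            continue  # missing from at least one target genome
        selected.append(cluster_id)
    return sorted(selected)


# Toy presence/absence data, not real results.
hh_strains = {"M19107", "M19501", "M21127", "M21621", "M21639"}
hi_genomes = {"Hi_1", "Hi_2"}  # hypothetical labels
toy_clusters = {
    "c1": {"M19107", "M21127"},
    "c2": {"M19107", "Hi_1"},
    "c3": {"Hi_1", "Hi_2"},
    "c4": set(hh_strains),
}
any_strain = lineage_specific_clusters(toy_clusters, hh_strains, hi_genomes)
all_strains = lineage_specific_clusters(
    toy_clusters, hh_strains, hi_genomes, require_all_targets=True
)
```

Requiring presence in every target genome gives stricter candidate markers for typing, whereas the looser setting also retains genes carried by only some strains, which may be more relevant when looking for virulence-associated genes.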
The five H. haemolyticus whole-genome sequence assemblies and their annotations were deposited in GenBank under the accession numbers AFQN00000000 (M19107), AFQO00000000 (M19501), AFQP00000000 (M21127), AFQQ00000000 (M21621), and AFQR00000000 (M21639).
This work was supported by an Alfred P. Sloan Research Fellowship in Computational and Evolutionary Molecular Biology (grant BR-4839 to I.K.J.) and by the Georgia Research Alliance (GRA.VAC09.O to I.K.J. and L.W.M.).
We acknowledge the support of the Georgia Tech graduate programs in bioinformatics. We thank the Active Bacterial Core surveillance team, Roman Golash from the Illinois Department of Health, and the Texas Department of State Health Services for providing strains.