Campylobacter is among the most common worldwide causes of bacterial gastroenteritis. This organism is part of the commensal microbiota of numerous host species, including livestock, and these animals constitute potential sources of human infection. Molecular typing approaches, especially multilocus sequence typing (MLST), have been used to attribute the source of human campylobacteriosis by quantifying the relative abundance of alleles at seven MLST loci among isolates from animal reservoirs and human infection, implicating chicken as a major infection source. The increasing availability of bacterial genomes provides data on allelic variation at loci across the genome, providing the potential to improve the discriminatory power of data for source attribution. Here we present a source attribution approach based on the identification of novel epidemiological markers among a reference pan-genome list of 1,810 genes identified by gene-by-gene comparison of 884 genomes of Campylobacter jejuni isolates from animal reservoirs, the environment, and clinical cases. Fifteen loci involved in metabolic activities, protein modification, signal transduction, and stress response or coding for hypothetical proteins were selected as host-segregating markers and used to attribute the source of 42 French and 281 United Kingdom clinical C. jejuni isolates. Consistent with previous studies of British campylobacteriosis, analyses performed using STRUCTURE software attributed 56.8% of British clinical cases to chicken, emphasizing the importance of this host reservoir as an infection source in the United Kingdom. However, among French clinical isolates, approximately equal proportions of isolates were attributed to chicken and ruminant reservoirs, suggesting possible differences in the relative importance of animal host reservoirs and indicating a benefit for further national-scale attribution modeling to account for differences in production, behavior, and food consumption.
IMPORTANCE Accurately quantifying the relative contribution of different host reservoirs to human Campylobacter infection is an ongoing challenge. This study, based on the development of a novel source attribution approach, provides the first results of source attribution in Campylobacter jejuni in France. A systematic analysis using gene-by-gene comparison of 884 genomes of C. jejuni isolates, with a pan-genome list of genes, identified 15 novel epidemiological markers for source attribution. The different proportions of French and United Kingdom clinical isolates attributed to each host reservoir illustrate a potential role for local/national variations in C. jejuni transmission dynamics.
Campylobacter spp. are among the main causes of foodborne bacterial gastroenteritis worldwide with nearly 236,000 reported cases in Europe in 2014 (1). Campylobacteriosis, mostly caused by Campylobacter jejuni and Campylobacter coli (2), is characterized by acute diarrhea, abdominal pain, headache, and nausea (3) and can lead to Guillain-Barré syndrome (4) and inflammatory bowel syndromes, including Crohn's disease (5, 6). Despite the improvement of surveillance services, the incidence of campylobacteriosis is underestimated. In France, the European Food Safety Authority (EFSA) reported nearly 5,000 cases annually between 2011 and 2014 (1), while the actual burden of disease is thought to be much higher at around 500,000 cases each year (7).
The ubiquity of Campylobacter, as part of the commensal microbiota of various animals, contributes to the threat this organism poses to humans. C. jejuni is commonly isolated from the digestive tracts of many mammals and wild and domestic birds (8). However, factors including the ability to form biofilms (2, 9) and colonize protozoa (10) mean that Campylobacter can be isolated from sources outside the host gut, such as food and water sources (11–14). Humans are usually infected by handling, preparation, or consumption of meat contaminated during slaughter, including pork, beef, and especially poultry (15–17). Consumption of raw milk or untreated water and contact with animals are also potential infection sources (18–20), and quantifying the relative contribution of different infection sources remains an important aim in public health.
Molecular typing methods, such as multilocus sequence typing (MLST) (21), have shown Campylobacter populations to be highly structured, providing a better understanding of how lineage clusters relate to ecology (22, 23). In particular, this has revealed the existence of host-associated genotypes that are more commonly isolated from, for example, chickens or cattle, as well as generalist genotypes that are commonly isolated from multiple hosts. This means that genotyped clinical isolates can be assigned to the reservoir host population from which they most likely originated, based on genotype and allele frequencies, allowing the investigation of the source of human infection. This source attribution approach has been applied to Campylobacter, mainly using MLST data, emphasizing a significant role for the chicken reservoir in human infection (12, 24–29).
The increasing use of whole-genome sequencing (WGS) data sets is enhancing understanding of the genetic basis of Campylobacter host ecology and transmission (9, 30–32). In terms of source attribution, data on allelic variation at loci across the genome have considerable potential to improve sensitivity, particularly for assigning the origin of isolates from generalist clonal complexes, such as sequence type 21 (ST-21) and ST-45 complexes, where seven-locus MLST has been of limited use in identifying the source population (23, 33). The evolution of genomic signatures of host association requires that strains are genetically isolated in a specific host niche for long enough for adaptation and genetic drift to lead to host-associated sequence variation in the genome. Therefore, while targeting genetic variation across the genome increases the chances of identifying variation that segregates by host, rapid zoonotic transmission may erode host association signatures. However, even with estimates of host transitions occurring as frequently as every 2 years for isolates from the ST-21 and ST-45 complexes (34), in rapidly recombining species, such as Campylobacter spp., adaptation may occur fast enough to generate variation that provides a basis for source attribution.
In this study, we take a systematic gene-by-gene approach (35, 36) to mine the entire pan-genome of a defined genomic data set of C. jejuni isolates from several animal reservoirs to identify novel epidemiological markers for source attribution. To achieve this, alleles at loci across the genomes of isolates from known hosts were probabilistically assigned to several host populations. Loci where this “self-attribution” gave a high probability of assigning isolates to the correct host population were considered good candidates for source attribution. These host-segregating marker loci were then used to attribute the origin of human clinical cases from France and the United Kingdom, as well as isolates from French pets, to host reservoir populations based on the polymorphism at these 15 loci.
The principle of MLST was extended to the whole genomes of 884 C. jejuni isolates using the 1,810 loci of the pan-genome, allowing definition of the core, soft-core, and accessory genomes of the C. jejuni isolates in this study. A core set of 472 genes was universally present within the 884 C. jejuni genomes. A further set of 953 genes was found in at least 95% of the 884 isolates and constituted the soft-core genome of our population. Finally, 385 genes were present in fewer than 95% of the isolates and formed the accessory genome of the population. Across these 1,810 loci, the number of alleles per locus ranged from 1 to 430 (see Table S3 in the supplemental material).
A core genome genealogical tree was reconstructed using the approximate maximum likelihood algorithm implemented in FastTree2 to compare the population structure of chicken and ruminant isolates in different countries (Fig. 1). French agricultural C. jejuni isolates did not form distinct isolated clusters that were separate from agricultural isolates originating outside France. As in other studies (37, 38), this shows that host-associated genetic variation in C. jejuni is distributed across national boundaries and allows the use of existing reference training data sets to assign isolates from France. French isolates from chicken belonged to the ST-21 complex (n = 32), ST-45 complex (n = 9), ST-48 complex (n = 7), ST-206 complex (n = 7), ST-353 complex (n = 5), and ST-464 complex (n = 5). The French isolates from ruminants belonged to the ST-21 complex (n = 4), ST-42 complex (n = 1), ST-45 complex (n = 3), ST-48 complex (n = 3), ST-403 complex (n = 1), and ST-586 (n = 1). All the C. jejuni isolates from the different agricultural sources in France clustered with agricultural isolates from other countries except for one French isolate from cattle belonging to the ST-403 complex. Isolate structuring mirrored the clonal complex structure based on MLST designations. The cattle-associated ST-42 complex (39) was present among French isolates as were isolates belonging to the chicken specialist ST-353 complex (39).
To identify potentially suitable epidemiological markers for source attribution, we assessed the host-segregating power of the 1,810 loci by quantifying their accuracy for each source in self-attribution tests. The correct self-attribution rate of chicken isolates was generally lower than that for ruminant isolates. This difference was significant (P < 0.001 by t test) using the 472 genes belonging to the core genome or the 953 genes belonging to the soft-core genome (Fig. 2). These two data sets of core and soft-core genes allowed a correct self-attribution to the host of 92% and 92.5% of ruminant isolates, respectively, and 77% and 73% of chicken isolates, respectively. Thus, depending on the set of genes used to perform the source attribution, 23% to 27% of chicken isolates were wrongly attributed to the ruminant reservoir, while fewer than 10% of ruminant isolates were misattributed. When the genes belonging to the accessory genome were used for source attribution analyses, 87% of ruminant isolates were correctly associated with their host compared to 76.5% in the chicken population. This difference in locus segregating power between sources can bias source assignment and lead to an overestimation of the ruminant involvement in campylobacteriosis. Moreover, it has been observed that chicken isolates may less often be assigned to the right host than ruminant isolates in self-attribution tests performed using the STRUCTURE software program (26). Thus, to account for this, we focused on epidemiological marker loci allowing equivalent or higher segregation of chicken isolates compared to ruminant isolates with a minimum threshold fixed at 60% (Fig. S1).
In total, 17 core genes, 20 soft-core genes, and one accessory gene demonstrated ≥60% correct host self-attribution for both chicken and cattle populations, constituting candidates for host-segregating markers. From these 38 candidates, the 15 loci that showed the most accurate self-attribution were selected as host-segregating markers. These genes are putatively involved in metabolic activities such as amino acid or vitamin biosynthesis and energy metabolism, in protein modification, in signal transduction, and in the stress response to heat shock (Table 1). The correct host attribution rate for these loci ranged from 70% to 90% in chicken and 60.5% to 78.5% in ruminants (Table 1 and Fig. 3). These 15 loci thus allowed an average correct host attribution of 80.7% in chicken populations and 68.2% in ruminant populations, which corresponded to a difference of 12.5% between chicken and ruminant populations. The correct host attribution rate for MLST was good for ruminant isolates (90% to 96%), but for isolates from chicken, it ranged from 66.5% to 75% (Fig. 3). The greater difference in correct host attribution with MLST, observed between chicken and ruminant (22.3%), could lead to a bias in host assignment with the overestimation of attribution to a particular reservoir.
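The two-step selection described above (a minimum correct self-attribution rate in both hosts, then retention of the best loci) can be illustrated with a minimal Python sketch. The function name `select_markers`, the toy rates, and the locus names are hypothetical. Ranking by the mean of the two host rates is an assumption, since the text does not state the exact ranking criterion.

```python
from typing import Dict, List, Tuple


def select_markers(
    rates: Dict[str, Tuple[float, float]],
    threshold: float = 0.60,
    n_markers: int = 15,
) -> List[str]:
    """Return up to n_markers loci that pass the threshold in both hosts.

    rates maps locus -> (chicken_rate, ruminant_rate), each in [0, 1].
    """
    # Keep loci reaching the minimum correct self-attribution in BOTH hosts.
    candidates = {
        locus: (chicken, ruminant)
        for locus, (chicken, ruminant) in rates.items()
        if chicken >= threshold and ruminant >= threshold
    }
    # Assumption: rank candidates by the mean of the two host rates.
    ranked = sorted(
        candidates,
        key=lambda locus: sum(candidates[locus]) / 2,
        reverse=True,
    )
    return ranked[:n_markers]


# Invented toy rates, for illustration only.
toy_rates = {
    "locusA": (0.90, 0.62),
    "locusB": (0.55, 0.95),  # fails the chicken threshold
    "locusC": (0.70, 0.785),
    "locusD": (0.75, 0.61),
}
print(select_markers(toy_rates, n_markers=2))
```

Requiring the threshold in both hosts, rather than ranking on a single host, reflects the stated aim of limiting attribution bias toward one reservoir.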
A total of 506 agricultural or environmental C. jejuni isolates were used as the reference data set to assign the source of French (n = 42) and British (n = 281) clinical isolates and those from French pets (n = 55). These isolates were attributed probabilistically using the STRUCTURE software program to each potential host source population using allele information at 15 host-segregating loci (Fig. 4). A total of 56.8% of British human cases were attributed to chicken, while 37.1% of cases were attributed to ruminants and 6.1% to the environment. The same analysis applied to French clinical cases attributed an approximately equivalent proportion of cases to chicken (45.8%) and ruminants (46.9%), with 7.3% of cases attributed to environmental/wild bird sources. Analysis of the French pet isolate population revealed an equivalent attribution between the three host populations: 30.7% of the pet isolate population was attributed to chicken, 35.5% to ruminants, and 33.8% to the environment. Consistent with previous work, there was relatively low attribution to environment/wild bird sources in both countries (26). There was some evidence for differences in the contribution of chicken and cattle reservoirs of infection, although there were relatively few French samples.
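One common way to turn per-isolate assignment probabilities into population-level percentages such as those above is to average the probabilities over isolates. The sketch below assumes that convention, which the text does not state explicitly. The function name `attribution_proportions` and the toy probabilities are hypothetical.

```python
from typing import Dict, List


def attribution_proportions(posteriors: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-isolate source probabilities into population proportions."""
    if not posteriors:
        raise ValueError("at least one isolate is required")
    sources = list(posteriors[0].keys())
    n_isolates = len(posteriors)
    # Assumption: each isolate contributes its full probability vector,
    # rather than a single hard call to its most likely source.
    return {s: sum(p[s] for p in posteriors) / n_isolates for s in sources}


# Invented toy probabilities for three isolates.
toy_posteriors = [
    {"chicken": 0.9, "ruminant": 0.1, "environment": 0.0},
    {"chicken": 0.2, "ruminant": 0.7, "environment": 0.1},
    {"chicken": 0.6, "ruminant": 0.3, "environment": 0.1},
]
print(attribution_proportions(toy_posteriors))
```

Averaging probabilities keeps the uncertainty of ambiguous isolates in the final proportions, whereas hard assignment would discard it.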
Accurately quantifying the relative contribution of different host reservoirs to human Campylobacter infection is an ongoing challenge. Probabilistic attribution based on seven-locus MLST has provided valuable information and implicated poultry as an important source (12, 24, 27, 40). However, these techniques can act only on host association signals in seven genes. This is particularly limiting when assigning the origin of lineages that have switched hosts relatively recently and therefore have had limited time for host-associated signatures to evolve in these genes. One potential way to improve power is to target signatures at other loci across the genome. While there is host-associated genetic variation, even in the genomes of host generalist C. jejuni lineages (30), using whole-genome MLST (35) data in an existing attribution model provided little additional power over seven-locus MLST (34). One explanation for this is the relative scarcity of host-segregating markers. For example, here we found that nearly 31% of core gene alleles present in more than one ST-21 and ST-45 complex isolate genome were present in isolates from both cattle and chickens. In part because of this, signals of host association may be masked in conventional attribution models by signals of numerous non-host-segregating loci present in the population. In this study, 1,772 loci (constituting 98% of pan-genome loci) did not meet our criteria for host segregation. To account for this, we took the alternative approach of conducting gene-by-gene analysis of the genome and defining a panel of host-segregating loci.
Self-attribution tests, using STRUCTURE software, quantified the probability of correct host assignment for each locus across the genome. Consistent with analysis based on seven-locus MLST (26), fewer chicken isolates were correctly self-attributed compared to those from ruminants, with 73% to 77% and 87% to 92.5% correct self-attribution, respectively, using the different sets of genes (core genes, soft-core genes, and accessory genes). While alleles at some loci gave up to 100% correct self-attribution in one host, it was essential that selection of host-segregating marker loci was based upon the proportion of correct host segregation in both chicken and ruminant reservoirs to reduce attribution bias. Gene-by-gene assessment of the probability of correct self-attribution identified seven core genes, seven soft-core genes, and one accessory gene as candidate host-segregating epidemiological markers. The 15 chosen marker loci had various putative functions, with six loci encoding hypothetical proteins and the remaining loci involved in metabolic activities such as amino acid biosynthesis and energy metabolism, protein modification, signal transduction, and stress response. Multiple factors associated with differences in animal husbandry and host physiology make it difficult to assign a biological basis for host segregation of alleles among the marker loci. However, some of the genes are involved in acid stress response (groES) (41) or are organized in the same operon as flagellar proteins (flgJ) or as proteins involved in oxidative stress response (Cj1169c) (42).
While it is not necessary to define epidemiological markers based on functional differences, it is interesting to speculate how host colonization factors may have influenced genomic signatures of host association. There are numerous differences between the chicken and ruminant digestive tracts, including body temperature, 41°C in chicken and 38.6°C in cattle, and pH (43, 44). This can be used as a context for considering the sequence variation at host-segregating loci in this study. For example, groES, which is involved in the heat shock response (45), has been shown to contribute to the protection of C. jejuni against pH acidity during transit in the stomach (41). In addition, differences between chicken and cattle husbandry may apply different selection pressures to genes associated with survival of oxygen-intolerant Campylobacter outside the host. For example, depletion of the hypothetical protein Cj1169c and the protein Cj1170c (OMP50) in C. jejuni mutants has been shown to result in reduced colonization of chicken and higher sensitivity to oxygen (42, 46). Considering the cause of sequence variation at host-segregating loci is purely speculative in this study, but it is interesting to note that many of these loci are within the core (soft-core) genome. This is consistent with homologous sequence variation having a role in host adaptation, as previously described for C. coli (47), and not just acquisition of new genes conferring a specific functional advantage.
Consistent with previous studies (26, 48), source attribution of British and French clinical isolates using the 15 host-segregating markers in this study indicated a relatively small contribution of environmental and wild bird reservoirs as human infection sources. As in seven-locus MLST studies of human cases in the United Kingdom, New Zealand, and Switzerland (26, 27, 49), the majority (56.8%) of British clinical cases in this study were attributed to the chicken reservoir, with 37.1% attributed to cattle. Among the most interesting findings was the higher attribution to the ruminant reservoir (46.9%) among French clinical cases, which was approximately equivalent to the contribution from chicken (45.8%). While the number of clinical isolates from France was relatively small, increasing the possibility of sampling bias, the elevated attribution to the ruminant reservoir was consistent with the role of cattle as an infection source among rural children in northeastern Scotland (40). Cultural and dietary differences could influence the relative contribution of sources of foodborne disease in France and the United Kingdom. For example, chicken consumption is higher in the United Kingdom (30 kg/person/year) than in France (25 kg/person/year) (50), where other known infection sources, including ruminant offal and veal (51–54), form a greater proportion of the diet. Factors associated with food preparation may also be significant, but analysis of a larger data set of French clinical isolates would be necessary to achieve a more representative description of human C. jejuni contamination routes in France.
A collection of 212 French C. jejuni isolates was sequenced, including isolates from clinical cases, chicken, cattle, pets, and the environment. The human isolates (n = 40) were from campylobacteriosis cases occurring in 2009 in France in regions with a significant broiler chicken meat consumption pattern. Chicken isolates were collected in 2008 and 2009 during two monitoring surveys designed to be representative of the broiler chicken production in France. The first sampling survey allowed the collection of isolates from ceca (n = 11) and carcasses (n = 21) collected from 425 batches of broiler chickens slaughtered in 58 French slaughterhouses over a 12-month period in 2008 (55). During the second monitoring survey, retail meat isolates (n = 33) were collected from broiler meat sampled in retail outlets over a 6-month period in 2009 in geographic areas representing the most significant broiler meat consumption patterns in France (14, 56). Cattle isolates were collected from dairy cow feces sampled during a local survey in 10 farms located in France in 2013 (n = 13) (57). The pet isolates (n = 55) were collected from 304 pet feces sampled in four veterinary clinics, two kennels, and individuals owning pets in Brittany in France in December 2014 and during a 6-month period from April to October 2015. Finally, the environmental C. jejuni isolates were collected from seawater (n = 3), freshwater (n = 30), sediments (n = 3), or mussels (n = 3) in France between 2013 and 2015 (see Table S1 in the supplemental material).
Isolates were subcultured onto Campylobacter selective blood-free agar (Karmali; Oxoid) in microaerophilic conditions (85% N2, 10% CO2, and 5% O2) at 42°C for 48 h. The genomic DNA was extracted from 1-day single-colony cultures incubated at 37°C using the QiaAMP DNA minikit (Qiagen) and quantified using the Qubit 2.0 fluorometer and the Qubit dsDNA (double-stranded DNA) HS (high-sensitivity) assay kit (Invitrogen). Genomes were sequenced using the Ion Torrent technology (Life Technologies). Libraries were prepared using the Ion Xpress Plus fragment library kit fragmentation (Life Technologies), cleaned using Agencourt AMPure XP (Beckman Coulter), and enriched after the size selection performed on a 2% E-Gel SizeSelect (Invitrogen). Emulsion PCR on Ion OneTouch 2 system and subsequent enrichment of template particles on Ion OneTouch ES system were both performed using the Ion PI template OT2 200 kit v3 (Life Technologies). The samples were loaded on a P1 chip and sequenced with an Ion Torrent Proton machine (Life Technologies). When needed for SPAdes assembly and torrent mapping alignment program (TMAP) alignment, the reads were downsampled to fit a maximal coverage depth of 80 (coverage depth evaluated on TMAP alignment, Torrent Suite v.4.0.2) before cleaning with Trimmomatic 0.32 software (58). Assemblies were produced by either MIRA version 4.0rc1 (59) or SPAdes 3.1.1 (60). The k-mer size used by MIRA was deduced by using kmergenie version 1.5658 (61). An average of 138 contigs was obtained for the 212 C. jejuni sequenced genomes, with a median value of 71 contigs. The average total assembled sequence length was 1,708,807 bp (Table S2).
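The downsampling step mentioned above can be sketched as follows. This is only an illustration under assumptions: the text does not say which tool or sampling scheme was used, so uniform random subsampling is assumed here, and the names `keep_fraction` and `downsample` are hypothetical.

```python
import random
from typing import List, Sequence


def keep_fraction(observed_depth: float, max_depth: float = 80.0) -> float:
    """Fraction of reads to keep so that coverage depth does not exceed max_depth."""
    if observed_depth <= 0:
        raise ValueError("observed_depth must be positive")
    # No downsampling is needed when the data set is already below the cap.
    return min(1.0, max_depth / observed_depth)


def downsample(reads: Sequence[str], fraction: float, seed: int = 0) -> List[str]:
    """Assumed scheme: uniform random subsample of reads, reproducible via seed."""
    n_keep = round(len(reads) * fraction)
    rng = random.Random(seed)
    return rng.sample(list(reads), n_keep)


# Toy example: 100 placeholder reads at an observed depth of 200x.
toy_reads = [f"read_{i}" for i in range(100)]
subset = downsample(toy_reads, keep_fraction(200.0))
print(len(subset))
```

A fixed seed keeps the subsample reproducible, which matters if assemblies need to be regenerated later.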
Isolates sequenced in this study were augmented with 672 genomes of C. jejuni isolated from clinical cases, chickens, ruminants, environmental water, and wild birds in different countries and published in previous studies (30, 32, 39, 62). This gave a total of 884 C. jejuni genomes in our study data set (Table S1).
The genomes sequenced in this study were stored on a web-based archive based on the BIGSdb software (36). The BLAST algorithm was used to perform gene-by-gene alignment and whole-genome MLST on the 884 C. jejuni genomes using a reference pan-genome approach (35, 63, 64). The reference pan-genome included 1,810 unique loci and was obtained from four available genomes, C. jejuni strains NCTC11168 (65), 81-176 (66), 81116 (67), and M1 (68) using a previously described method (64). Orthologous genes at these loci were defined as present in the 884 genomes if the sequence was present with >70% nucleotide identity over ≥50% of the sequence length. Individual gene sequences were aligned using MAFFT (69) and concatenated into contiguous sequence for each isolate including gaps for missing nucleotides or entire genes. An approximation of the maximum likelihood algorithm implemented in FastTree2 software (70) was used to reconstruct a phylogeny of core genome alignments, and the tree was visualized and annotated using MEGA6 software (71). Genetic variation at pan-genome loci was investigated in a presence/absence matrix with allelic diversity (64). The number of missing or incomplete genes for each locus was calculated to define the core, soft-core, and accessory genome of the C. jejuni population in this study. The core genome was defined as genes shared by all the isolates, including incomplete genes, which are a technical artifact due to the use of draft genomes. Genes shared in at least 95% of the isolates constituted the soft-core genome (72, 73), and the remaining genes constituted the accessory genome.
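The presence criterion and the core, soft-core, and accessory partition described above can be expressed as a short Python sketch. The function names `is_present` and `classify_loci`, the input layout (locus mapped to the set of isolates carrying it), and the toy data are hypothetical. This is not the BIGSdb implementation.

```python
from typing import Dict, Set


def is_present(identity_pct: float, aligned_length: int, locus_length: int) -> bool:
    """Apply the stated rule: >70% nucleotide identity over >=50% of the locus length."""
    return identity_pct > 70.0 and aligned_length / locus_length >= 0.5


def classify_loci(
    presence: Dict[str, Set[str]],
    n_isolates: int,
    soft_core_threshold: float = 0.95,
) -> Dict[str, str]:
    """Label each locus as 'core', 'soft-core', or 'accessory'."""
    labels = {}
    for locus, isolates in presence.items():
        if len(isolates) == n_isolates:
            labels[locus] = "core"  # shared by all isolates
        elif len(isolates) / n_isolates >= soft_core_threshold:
            labels[locus] = "soft-core"  # shared by at least 95% of isolates
        else:
            labels[locus] = "accessory"
    return labels


# Toy data set of 20 isolates, for illustration only.
all_isolates = {f"iso{i}" for i in range(20)}
toy_presence = {
    "locus1": set(all_isolates),
    "locus2": all_isolates - {"iso0"},
    "locus3": {f"iso{i}" for i in range(10)},
}
print(classify_loci(toy_presence, n_isolates=20))
```

In this sketch the soft-core label excludes core loci, which matches the way the three categories sum to the 1,810 pan-genome loci in the results.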
Loci where alleles segregate by host represent useful epidemiological markers for source attribution. To identify these loci, we assigned alleles at all loci in isolates of known host origin to host source training data sets and recorded the probability of assignment to the correct host population in self-attribution, as in previous studies using seven MLST genes (26). Self-attribution tests focused on chicken isolates (n = 352) and ruminant isolates (n = 59) as the source of the majority of isolates in this study and major reservoirs of human infection by C. jejuni. Furthermore, generalist ST-21 and ST-45 clonal complexes are common in these hosts but have been difficult to attribute to source using seven-locus MLST (23, 26, 29, 34, 49). Host attribution was performed using STRUCTURE software, a Bayesian model-based clustering method designed to infer population structure and attribute individuals to populations using multilocus genotype data (74). Probabilistic assignment was carried out using 1,810 pan-genome loci using the “No Admixture” model with uncorrelated allele frequency model, assuming that each isolate originated in one of the putative source populations each with its own characteristic set of allelic frequencies (75). Analyses were performed with 10,000 burn-in cycles followed by 10,000 iterations, using source population information (USEPOPINFO), with test isolates differentiated from the training data set using POPFLAG. Random subsets of 20 isolates from each host species were assigned to the training data, and self-attribution was performed 10 times for core, soft-core, and accessory genome loci separately.
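The idea behind no-admixture assignment, in which each isolate comes from exactly one source with its own allele frequencies, can be illustrated with a deliberately simplified Python sketch. This is not STRUCTURE and does not run MCMC. It estimates smoothed allele frequencies directly from the training isolates and applies Bayes' rule with a uniform prior over sources. The function name `assign_source`, the pseudocount, and the toy profiles are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Profile = Sequence[Hashable]  # one allele identifier per locus


def assign_source(
    profile: Profile,
    training: Dict[str, List[Profile]],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Posterior probability of each source for one allelic profile."""
    log_lik = {source: 0.0 for source in training}
    for j, allele in enumerate(profile):
        # Alleles seen at locus j in any source, plus the query allele.
        observed = {p[j] for profiles in training.values() for p in profiles}
        observed.add(allele)
        k = len(observed)
        for source, profiles in training.items():
            counts = Counter(p[j] for p in profiles)
            # Smoothed frequency, so unseen alleles never get probability zero.
            freq = (counts[allele] + pseudocount) / (len(profiles) + pseudocount * k)
            log_lik[source] += math.log(freq)
    # Uniform prior over sources: normalise the likelihoods (stable softmax).
    peak = max(log_lik.values())
    weights = {s: math.exp(v - peak) for s, v in log_lik.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


# Invented two-locus toy training set.
toy_training = {
    "chicken": [(1, 1), (1, 2), (1, 1)],
    "ruminant": [(2, 3), (2, 3), (2, 1)],
}
print(assign_source((1, 1), toy_training))
```

Loci are treated as independent given the source, and missing alleles are not handled. Both are simplifications relative to the software used in the study.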
Based on the self-attribution tests, the segregating power of each locus was calculated as the average probability of allele assignment to the correct host. Loci strongly contributing to correct assignments to chicken and ruminant populations constitute potential candidates for host-segregating epidemiological markers.
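As a minimal sketch of the calculation described above, the segregating power of a locus can be computed as the mean probability of assignment to the correct host, kept separately for each host. The record layout (locus, true host, probability assigned to the true host), the function name `segregating_power`, and the toy values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (locus, true_host, probability assigned to the true host)
Record = Tuple[str, str, float]


def segregating_power(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Mean probability of correct host assignment, per locus and per host."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for locus, host, prob in records:
        sums[locus][host] += prob
        counts[locus][host] += 1
    return {
        locus: {host: sums[locus][host] / counts[locus][host] for host in sums[locus]}
        for locus in sums
    }


# Invented toy self-attribution results.
toy_records = [
    ("locus1", "chicken", 0.9),
    ("locus1", "chicken", 0.7),
    ("locus1", "ruminant", 0.6),
    ("locus2", "chicken", 0.5),
]
print(segregating_power(toy_records))
```

Keeping one value per host, rather than a single pooled mean, makes it possible to require good segregation in both chicken and ruminant populations.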
Source attribution of the human isolates from France and the United Kingdom was performed using allelic profiles of host-segregating loci. The source of pet contamination was also investigated because of the potential role of pets as vectors in Campylobacter transmission to humans (19, 76, 77). Assignment analyses were carried out separately for 42 French and 281 British clinical isolates (32) and 55 isolates from French pets. The data set used as a reference to probabilistically attribute the sources of clinical and pet isolates comprised 352 chicken isolates, 59 ruminant isolates, and 95 wild bird and environmental isolates. The same settings were used as for self-attribution tests except for the burn-in period and the number of iterations, which were both set at 100,000, consistent with published work (49).
Genome sequences generated as part of this study were deposited in SRA (SRR5123296 to SRR5123507; see Table S1), and assemblies are available in Dryad (http://datadryad.org/) at https://doi.org/10.5061/dryad.m86k3. The assemblies of genomes sequenced in earlier studies can be found in Dryad (https://doi.org/10.5061/dryad.8t80s and https://doi.org/10.5061/dryad.28n35) and NCBI (BioProject PRJNA312235).
A.T. was funded by a Ph.D. studentship from the French Ministry of Defence (Direction Générale de l'Armement) and the Conseil General des Côtes d'Armor (CG22) and by a short-term mission grant from the MedVetNet European program. G.M. was supported by a NISCHR Health Research Fellowship (HF-14-13). S.K.S. was funded by Biotechnology and Biological Sciences Research Council (BBSRC) grant BB/I02464X/1, Medical Research Council (MRC) grant MR/L015080/1, and Wellcome Trust grant 088786/C/09/Z. We thank Francis Mégraud, head of the National Reference Center for Campylobacter and Helicobacter in France, and Michèle Gourmelon from the French Research Institute for Exploitation of the Sea (IFREMER) for kindly providing the French clinical and environmental isolates of C. jejuni.
Supplemental material for this article may be found at https://doi.org/10.1128/AEM.03085-16.