Currently, genetic typing of microorganisms is widely used in several major fields of microbiological research. Taxonomy, research aimed at elucidation of evolutionary dynamics or phylogenetic relationships, population genetics of microorganisms, and microbial epidemiology all rely on genetic typing data for discrimination between genotypes. Apart from being an essential component of these fundamental sciences, microbial typing clearly affects several areas of applied microbiological research. The epidemiological investigation of outbreaks of infectious diseases and the measurement of genetic diversity in relation to relevant biological properties such as pathogenicity, drug resistance, and biodegradation capacities are obvious examples. The diversity among nucleic acid molecules provides the basic information for all fields described above. However, researchers in various disciplines tend to use different vocabularies, a wide variety of different experimental methods to monitor genetic variation, and sometimes widely differing modes of data processing and interpretation. The aim of the present review is to summarize the technological and fundamental concepts used in microbial taxonomy, evolutionary genetics, and epidemiology. Information on the nomenclature used in the different fields of research is provided, descriptions of the diverse genetic typing procedures are presented, and examples of both conceptual and technological research developments for Escherichia coli are included. Recommendations for unification of the different fields through standardization of laboratory techniques are made.
“As molecular microbiologists and ecologists learn each other's language and modes of thought, and are ready to share technologies and ideas, they will fulfill the yearning for a deeper understanding of cellular functions.”
From Schaechter et al. (94a)
The ability to discriminate between genomes is essential to several disciplines of microbiology research, including taxonomy, studies of evolutionary mechanisms and phylogenetic relationships, population genetics of microorganisms, and microbial epidemiology. Genetic typing is the means by which the microbiologist is provided with the ability to discriminate between and catalogue microbial nucleic acid molecules. Since genetic characterization forms the basis that allows researchers to classify isolates of microorganisms, the most detailed form of typing, full-genome sequencing, should essentially integrate taxonomy, evolutionary and phylogenetic studies, population genetics, and epidemiology. Once full-genome sequences are available for multiple isolates of a single bacterial species, all genetic variables can be catalogued. The nature of the mutations thus identified can help elucidate the relatedness between these isolates. Similarly, if multiple isolates from multiple species have been sequenced in full, the data collection will also define the relatedness, or lack thereof, between microbial species and genera. To illustrate, more than 30 genome sequences are available for bacteriophages, which parasitize different bacterial hosts. These sequences allow the calculation of relationships that have led to the recent suggestion that today's worldwide phage populations share a common ancestry. It has been proposed that modern phages are mosaics, generated through their access to a common pool of bacteriophage genes (39). Differences between phages originate from horizontal gene exchange and other forms of molecular evolution that depend on a large array of environmental (i.e., host and medium) influences.
Although this sequence-based model requires further verification by the inclusion of additional full-genome sequences, it nicely illustrates the type of taxonomic, evolutionary, and population genetics information that can be obtained from detailed experimental genetic identification. Similar studies of prokaryotes are urgently required, although they are more difficult to perform because prokaryotic cells are genetically more complex than phages (see, for example, reference 62 for a single-species review). Currently, full-genome sequences for multiple isolates of a single microbial species are rare, which means that genetic typing is still performed by methods that are inherently suboptimal.
In addition to the shortcomings of current genetic typing methods, researchers in the aforementioned fields of microbiology who use these methods to characterize organisms genetically tend to use different vocabularies, experimental methodologies, and modes of data processing and interpretation. This is a problem because the general lack of standardized genetic typing procedures obstructs the communication of data. Except for primary DNA sequences, typing data frequently suffer from limited interlaboratory reproducibility. In light of these issues, an integrated experimental and theoretical approach to taxonomy, evolutionary and population genetics, and epidemiological typing of microorganisms is vital. Therefore, the aim of this review is to integrate the major concepts common to each of these disciplines. The important concepts include the identification of adequate taxonomic targets, the detection of genetic variation, natural selection, the development of integrated nomenclature, and the analysis of mechanisms of mutation and recombination. The field of microbial species identification and classification, i.e., taxonomy, will be introduced first, to highlight the various definitions of taxonomic units already in use within this single field of microbial identification research. Subsequently, fundamental aspects of evolutionary genetics will be described, after which definitions and concepts derived from epidemiologic typing will be surveyed. After a discussion of current investigations at the interface of taxonomy, evolutionary genetics, and epidemiology, examples of integrated molecular typing studies will be presented to show how molecular typing can describe the importance of genetic variability within a single microbial species.
Overall, nomenclature will be compared, current research foci covering different concepts will be reviewed, and recommendations for technological standardization and selection of a (molecular) technique of choice for answering certain fundamental questions will be discussed.
Taxonomy, also known as (bio)systematics, gathers organisms into defined groups, provides appropriate nomenclature for the different groups, and is involved in the identification of previously unknown microorganisms. To establish these groups, microbial typing data are mandatory for the definition of so-called species. The definition of species is a primary underlying concept; however, it is controversial and is undergoing continuous refinement (132). Incorrect incorporation of evolutionary history can lead to misclassification of putative ancestors. For this reason, phylogeny, the science that defines groups at a higher level of divergence (by comparing and establishing taxa), will not be discussed here since phylogeneticists try to develop theoretical frameworks that are essentially based on these complex evolutionary relationships. The interested reader is referred to two excellent reviews of bacterial phylogeny based on comparative sequence analysis (mainly rRNA, elongation factor Tu, and ATPase gene sequences) that were recently published by Ludwig et al. (59) and by Huelsenbeck and Rannala (41).
Across the diverse microbial world, many species of microorganisms have been described on the basis of phenotypic characteristics without any information on their phylogenetic status. For instance, most protozoan parasites have a clonal population structure. This implies that propagation occurs through simple genome copying during cell division and that intermolecular recombination is infrequent or even absent. The combination of genomes from different cells through frequent genetic exchange (the “sexual cycle”) has hardly been documented. Even for protozoan species that do have a documented sexual cycle, sex occurs only during a certain stage of the life cycle; therefore, for these microparasites, a species concept involving the combination of genetic material from different cells is difficult to apply (20).
For bacteria, a species was defined as an entity in which members have a DNA-DNA homology value of at least 70% (125). Consequently, species are defined on the basis of a primitive form of genetic typing: resemblance between genomes is arbitrarily characterized on the basis of a technology that only superficially scans for genome identity. This genospecies concept can be replaced by the alternative but more general phylogenetic species concept proposed by Cracraft (17). According to this concept, a species is a “group with a common origin that is composed of the smallest diagnosable cluster of individual organisms within which there is a parental pattern of ancestry and descent.” It has been proposed that a microbial species should correspond to a discrete typing unit to be valid (109). In contrast to the genospecies concept, the discrete typing unit concept does not imply any given level of phylogenetic divergence and refers only to the criterion of genetic discreteness. Consequently, this approach relies heavily on the quality of genetic typing data. The criterion of genetic discreteness can be adhered to in a strict manner, in which case the species definition is clear-cut. On the other hand, species can be considered “condensed nodes” in an “otherwise cloudy, confluent taxonomic space” (120). In the latter option, taxonomy provides a framework for the nodes and allows for shared “node-membership” for the biological entities in the space between nodes depending on common phenotypic traits and ecological niches.
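The 70% genospecies criterion can be applied mechanically to a table of pairwise DNA-DNA relatedness values. The sketch below is a minimal illustration under a stated assumption: strains linked by any value of at least 70% are placed in the same group (single linkage). The function name, strain labels, and values are hypothetical, and real genospecies delineation weighs reciprocal hybridizations and other criteria as well.

```python
from itertools import combinations

DDH_SPECIES_THRESHOLD = 70.0  # percent DNA-DNA relatedness (genospecies cutoff)

def genospecies_groups(strains, ddh):
    """Group strains into putative genospecies (hypothetical helper).

    `ddh` maps frozenset({a, b}) -> relatedness in percent; missing pairs count as 0.
    Any pair at or above the threshold is merged (single linkage, via union-find).
    """
    parent = {s: s for s in strains}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving keeps the trees shallow
            s = parent[s]
        return s

    for a, b in combinations(strains, 2):
        if ddh.get(frozenset((a, b)), 0.0) >= DDH_SPECIES_THRESHOLD:
            parent[find(a)] = find(b)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())

# Hypothetical relatedness values for four strains
values = {
    frozenset(("S1", "S2")): 85.0,
    frozenset(("S2", "S3")): 40.0,
    frozenset(("S3", "S4")): 75.0,
}
print(genospecies_groups(["S1", "S2", "S3", "S4"], values))
```

Single linkage can chain strains whose direct relatedness is below 70% into one group through an intermediate strain, which is one concrete way in which a fixed threshold imposes divisions on what may be a continuum.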
Taxonomists using the strict species definition have been accused by population geneticists of arbitrarily imposing divisions in a continuum (66). This raises the suggestion that taxonomists artificially separate sets of microorganisms that may be related. Although the microbiological taxonomist and molecular geneticist use overlapping genetic typing technologies for determining relatedness between strains and should be speaking the same language, the manner in which the data are interpreted differs. Vandamme et al. (120) proposed that data analysis must be improved in the field of taxonomy. Although Vandamme et al. advocate so-called polyphasic taxonomy (taxonomy based on a combination of data obtained by various laboratory techniques), they also emphasize that new mathematical and information strategies for adequate data integration are required. Whether these novel information and communication technologies will always provide adequate consensus outcomes needs to be assessed in future studies. However, it is believed that the multidisciplinary polyphasic approach may be worth implementing as an essential component of future taxonomic studies (120). Moreover, the polyphasic approach could provide the foundation for molecular genetic typing studies in microbiology in general.
Although taxonomy profits from the data generated by using multiple markers to assess the genetic profile of a given microbial isolate, many of these markers may be informative above the species level or, conversely, may be subspecies or isolate specific. Variation in a genetic marker whose level of species, subspecies, or isolate restriction has been defined can also be used to estimate the degree of shared evolutionary history between populations. Depending on how a population is structured, this may lead to taxonomically invalid conclusions. On a theoretical basis, four different types of population structure have been proposed by Maynard Smith et al. (68). At one extreme there is clonal evolution, in which genetic exchange is not frequent enough to prevent the individualization of discrete evolutionary lines. This involves a lack of gene segregation and so-called strong linkage disequilibrium among descendants (nonrandom association of genotypes occurring at different loci). At the opposite end of the spectrum there is panmixia, in which no lineages can be identified because, through essentially sexual multiplication, genetic loci are continuously and fully randomly reassorted, which blurs the vertical lineages. Between these two extremes lie the intermediates: cryptic speciation and epidemic clonality. In the case of cryptic speciation, the species under study is subdivided into two or more biological species, each of which is panmictic in its own ecological niche, so that separate lineages cannot be identified within any of them. Consequently, specific identification of different strains is difficult. With epidemic clonality, sudden clonal expansion of a relatively short-lived type is occasionally observed in a species that otherwise replicates in a sexual mode.
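Linkage disequilibrium across many loci is often summarized with the index of association, I_A = V_D/V_E − 1, where K is the number of loci at which two isolates differ, V_D is the observed variance of K over all isolate pairs, and V_E = Σ_j h_j(1 − h_j) is its expectation under linkage equilibrium, with h_j the (sample-size corrected) gene diversity at locus j. The sketch below assumes the population variance of K over pairs; published implementations differ slightly in normalization and assess significance by permutation, which is omitted here. The function name and allele profiles are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import pvariance

def index_of_association(profiles):
    """I_A for multilocus allele profiles given as equal-length tuples."""
    n = len(profiles)
    n_loci = len(profiles[0])
    # Observed: K = number of differing loci for every pair of isolates
    k_values = [sum(x != y for x, y in zip(p, q))
                for p, q in combinations(profiles, 2)]
    v_d = pvariance(k_values)
    # Expected variance of K if loci were associated at random
    v_e = 0.0
    for j in range(n_loci):
        counts = Counter(p[j] for p in profiles)
        h = (1 - sum((c / n) ** 2 for c in counts.values())) * n / (n - 1)
        v_e += h * (1 - h)
    if v_e == 0:
        raise ValueError("all loci are monomorphic; I_A is undefined")
    return v_d / v_e - 1

# Two hypothetical clonal groups: alleles always travel together
clonal = [(1, 1, 1), (1, 1, 1), (2, 2, 2), (2, 2, 2)]
# Every combination of two biallelic loci present once: no association
reassorted = [(1, 1), (1, 2), (2, 1), (2, 2)]
print(index_of_association(clonal), index_of_association(reassorted))
```

Values well above 0 point toward clonal structure (strong linkage disequilibrium), whereas values near 0 are expected under panmixia; with very small samples such as these, the estimate can even fall below 0.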
In conclusion, taxonomy will increasingly be based on more detailed phenotypic and genotypic data sets in the future. Whereas simple DNA-DNA genome homology previously provided the guidelines for species identification, the unprecedented detail in which microbial isolates can now be typed will strengthen taxonomic coherence and at the same time bridge the gap between taxonomy on the one hand and evolutionary genetics or microbial epidemiology on the other.
As already exemplified in the preceding section, genetic typing techniques with different levels of resolution, applied to the same set of strains, are likely to reveal different degrees of genetic variability. To better comprehend this variation, it is necessary not only to standardize the technology used but also to study the evolutionary processes responsible for it; this is the domain of evolutionary genetics. Table 1 defines the terms used by evolutionary geneticists. At least four fundamental mechanisms shape the genetic variation studied by evolutionary genetics: mutation, hypermutation, genetic recombination, and selection. The role of each of these mechanisms in the generation of genetic variation will be considered below.
Mutation is the source of all genetic variation and includes all heritable changes in a single replicating genome, whether they are caused by DNA replication errors such as base pair substitutions, insertions, and deletions or by the activity of transposons and insertion sequence elements that can move around (and replicate) in the genome independently of the replication cycle of the host. Mutations are believed to be random with respect to their phenotypic effect. Although mutations can have a negative effect on the fitness of the organism, many will have no measurable effect (e.g., synonymous base pair substitutions and mutations in nontranscribed regions of the genome), and only very few are beneficial. A specific mutation does not occur more often in an environment in which it has a beneficial effect on an organism's performance than in an environment in which its effect is negative (see references 12 and 56 for a more detailed discussion). Knowledge of the rate at which mutations occur is a prerequisite for interpreting the degree of variation observed. This is complicated, however, by evidence accumulated in recent years that microbial genomes have far greater mutational flexibility than was previously assumed: microbial mutation rates vary not only among species but also among different genes of the same individual and even within the same gene at different time points (22, 74).
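One classical way to estimate a bacterial mutation rate is the p0 method of Luria-Delbrück fluctuation analysis: if a fraction p0 of parallel cultures contains no mutants, the expected number of mutational events per culture is m = −ln(p0), and the rate per cell per generation is approximately m/N, where N is the final number of cells per culture (used here as a stand-in for the number of cell divisions). This is a minimal sketch under those assumptions; the function name and mutant counts are hypothetical.

```python
import math

def mutation_rate_p0(mutant_counts, final_cells):
    """Mutation rate per cell per generation by the p0 method.

    mutant_counts: number of mutants observed in each parallel culture
    final_cells:   approximate number of cells per culture at plating
    """
    n_cultures = len(mutant_counts)
    n_zero = sum(1 for c in mutant_counts if c == 0)
    if n_zero == 0:
        raise ValueError("the p0 method needs at least one culture without mutants")
    m = -math.log(n_zero / n_cultures)  # expected mutational events per culture
    return m / final_cells

# Hypothetical experiment: 20 cultures of 1e8 cells, 10 without mutants
counts = [0] * 10 + [1, 3, 2, 7, 1, 15, 4, 2, 1, 60]
print(mutation_rate_p0(counts, 1e8))
```

The estimate uses only whether a culture has mutants, not how many, so the rare "jackpot" cultures (such as the one with 60 mutants) that arise from early mutations do not distort it.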
So-called “mutators” display higher mutation rates than do wild-type organisms. The main mechanism behind this phenomenon is probably a lack of DNA mismatch repair. Defects in the wild-type methyl-directed mismatch repair route affect mutation and recombination frequencies, thereby enhancing the “mutability” of a cell (124a). In bacteria, mutators were first detected in the 1950s on the basis of highly variable phenotypic traits (115). Not only may mutator strains mutate their genes more rapidly, but they also exchange these genes at a higher rate, thereby neutralizing the potential divergence caused by increased mutation. The combination of rapid mutation and promiscuous exchange of DNA has led to the hypothesis that mutators may be crucial in the process of speciation (53, 86). Mutator strains have been found among pathogenic (53) and nonpathogenic (65) bacterial isolates, as well as in experimental populations of Escherichia coli (98). The results of mixed-cultivation experiments with wild-type and mutator strains (14) and of sequential mutant selection experiments (61) suggest that mutators may have a selective advantage when populations adapt to a new environment. However, this advantage may be limited (15), and the impact of mutators on molecular evolution may be smaller than previously thought (123). Higher mutation rates can also be induced physiologically by a variety of mechanisms, producing transient mutator phenotypes (92). Increased activity of insertion sequence elements or activation of otherwise lysogenic prophages may also be responsible for the elevated mutation rates in resting or starving populations of bacteria (3, 72, 80).
Among pathogens, particular genes involved in interactions with the host have been found to be inherently hypermutable because of mechanisms such as slipped-strand mispairing at loci harboring short sequence repeats. DNA strand slippage, and the resulting insertion or deletion of repeat units during replication, is caused by the formation of illegitimate hybrids between neighboring repeat units, which establishes aberrant tertiary structures in the DNA molecule. Another mechanism for generating mutations is cassette switching mediated by insertion sequences. Switching can drastically affect the expression of genes encoding surface components or virulence factors; the synthesis of the intercellular adhesin of coagulase-negative staphylococci is regulated in this way (134). Genes whose expression is controlled by either of these random mechanisms have been called contingency genes (see below), because they are critical to the responsiveness of the organism to its environment (77). Thus, information on mutational mechanisms is indispensable for accurate interpretation of the genetic diversity revealed. The adaptive significance of such hypermutable mechanisms has been hypothesized, and a general theoretical framework is being developed to advance our understanding of the evolution of mutation rates (22, 77, 104). The implication is that once improved insight into (adaptive) mutation rates is achieved, expression of the genes involved can be monitored experimentally under different conditions. This may allow us to predict bacterial responses to environmental stimuli and help us to identify important determinants for the control of infections. Although clearly related, variation in genotype (mutation) and variation in expression (regulation) are independent phenomena; studies of each may provide distinct, albeit interrelated, data sets and insights.
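Why slippage at short sequence repeats so often switches a gene on or off can be shown with simple arithmetic: gaining or losing repeat units inside a coding region shifts the downstream reading frame unless the total change in length is a multiple of 3. The sketch below counts the longest tandem run of a repeat unit and checks for a frameshift; the function names and the CAAT-containing sequence are hypothetical illustrations, not taken from a specific gene.

```python
def longest_tandem_run(seq, unit):
    """Largest number of consecutive copies of `unit` found anywhere in `seq`."""
    k = len(unit)
    best = 0
    for start in range(len(seq) - k + 1):
        copies = 0
        # str.startswith(prefix, start) tests for `unit` at the given offset
        while seq.startswith(unit, start + copies * k):
            copies += 1
        best = max(best, copies)
    return best

def slippage_shifts_frame(unit, units_gained_or_lost=1):
    """True if adding (positive) or removing (negative) repeat units changes the frame."""
    return (len(unit) * units_gained_or_lost) % 3 != 0

# Hypothetical coding-region fragment with four CAAT tetranucleotide repeats
seq = "ATG" + "CAAT" * 4 + "GG"
print(longest_tandem_run(seq, "CAAT"))   # repeat-copy number (a typing marker)
print(slippage_shifts_frame("CAAT", 1))  # one extra 4-bp unit: frameshift
print(slippage_shifts_frame("CAG", 1))   # trinucleotide unit: frame preserved
```

Counting repeat copies in this way is also the basis of repeat-number (VNTR-type) typing, which is why such loci evolve quickly as markers.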
Once mutation has generated variation in a population, horizontal transfer of nucleic acid moieties and genetic recombination may also directly affect the level of variation. Although this is perhaps counterintuitive, genetic recombination can decrease the level of genetic variation under some conditions (5). Only if beneficial alleles are not randomly distributed among population members but are found in different individuals more often than expected by chance (i.e., negative linkage disequilibrium) can sex and recombination increase the genetic variance. Since the actual rate of recombination affects the variation, the question arises of how true clonal evolution can be estimated. True clones are descendants of the same ancestors and, in the absence of recombination with other lineages, should have independent evolutionary histories, giving rise to distinct genotypes (38, 68). In the case of panmixia (frequent recombination between genomes), no lineages can be discerned because of a very high degree of genetic variability among different isolates of a single bacterial species. Most species probably range between these two extremes, which emphasizes the need to determine the actual rate of recombination in order to derive their evolutionary histories and interpret the genetic differences observed.
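The sign of the linkage disequilibrium coefficient D = p_AB − p_A·p_B makes the condition above concrete: D < 0 means that two alleles (here, hypothetical beneficial alleles "A" and "B") occur together less often than expected by chance, so recombination can bring them into the same genome. The sketch below computes D for two biallelic loci; the function name and haplotypes are hypothetical.

```python
def linkage_disequilibrium(haplotypes, allele_a, allele_b):
    """D = p_AB - p_A * p_B for two loci; haplotypes are (locus1, locus2) pairs."""
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h[0] == allele_a and h[1] == allele_b for h in haplotypes) / n
    return p_ab - p_a * p_b

# Beneficial alleles A and B always sit in different individuals
repulsion = [("A", "b"), ("a", "B"), ("A", "b"), ("a", "B")]
# Beneficial alleles always found together
coupling = [("A", "B"), ("a", "b"), ("A", "B"), ("a", "b")]
print(linkage_disequilibrium(repulsion, "A", "B"))  # negative D
print(linkage_disequilibrium(coupling, "A", "B"))   # positive D
```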
Although mutation and recombination are considered separate biological processes, the enzymes involved (for instance, those that take part in the replication process) partially overlap (51). Moreover, strains lacking efficient DNA repair are known to facilitate recombination between divergent chromosomes, making them more “promiscuous” (86). Genetic recombination in prokaryotes also affects the horizontal transfer of plasmid-borne genes that are important for fitness in particular environments. Typical plasmid-borne genes include those involved in resistance to bacteriophages and antibiotics and those encoding virulence factors. Plasmids can be acquired readily if they are beneficial and cured (lost) when the environment changes in such a way that they incur a cost to the host. In this way, horizontal transfer can be more efficient and less costly than conventional (random) recombination between two chromosomes. The clustering of virulence genes in horizontally transferable (e.g., via conjugation) pathogenicity islands is another example of a mechanism by which pathogens adapt to a mode of reproduction that facilitates the horizontal transfer of important genes (36). The documented transfer of genes and other DNA elements between plasmids and chromosomes shows that horizontal transfer may ultimately also lead to vertical transmission of genetic traits (129a).
The role of natural selection in genetic variation is as important as that of the two evolutionary mechanisms discussed above. For instance, if the new genotypes generated by mutation or recombination have deleterious fitness effects, natural selection will tend to remove them from the population and no genetic variation will be observed. The role of selection depends largely on two variables: (i) the effect on fitness of the new genotypes generated by mutation or recombination, which can be neutral, deleterious, or beneficial; and (ii) the size of the population. The size of the population is important: the more cells, the more likely it is that variant genotypes will be present. Both variables affect the rate at which genetic variation accumulates by natural selection and therefore are important in rapidly adapting pathogens.
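The effect of population size on the presence of variants can be written down directly. Assuming that each of N cells independently carries a particular new variant with probability μ (a simplifying assumption that ignores variants inherited from earlier generations), the probability that at least one variant is present is:

```latex
P(\text{at least one variant}) = 1 - (1 - \mu)^{N} \approx 1 - e^{-N\mu}

% Hypothetical values, mu = 10^{-8}:
%   N = 10^{6}: N\mu = 0.01 \Rightarrow P \approx 1 - e^{-0.01} \approx 0.010
%   N = 10^{9}: N\mu = 10   \Rightarrow P \approx 1 - e^{-10}   \approx 0.99995
```

Under these assumptions, a thousandfold larger population turns a rare event into a near certainty.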
If the generated variation is neutral, as in the case of silent mutations that do not change the amino acid sequence of a given gene product, selection will have no effect on the fate of these variants. Stable future maintenance of the mutation depends entirely on the chance of being transmitted to the next generation (genetic drift). Theory predicts that neutral genetic changes accumulate roughly at the rate at which they occur in an asexual population (50). Thus, natural selection and the size of the population may not affect the rate at which independent clonal populations evolve through neutral variation. However, in small populations, the chances are increased that given clones are fixed or eliminated through random processes (also see below).
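The theoretical prediction that neutral changes accumulate at the rate at which they arise follows from a short calculation, sketched here for a haploid asexual population of constant size N with a neutral mutation rate μ per genome per generation (standard textbook assumptions):

```latex
k \;=\; \underbrace{N\mu}_{\text{new neutral mutations per generation}}
\;\times\; \underbrace{\frac{1}{N}}_{\text{fixation probability of each (its initial frequency)}}
\;=\; \mu
```

Because N cancels, the substitution rate k is independent of population size, which is why neutral variation is attractive as a molecular clock.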
When the effect of new genotypes on fitness is deleterious, as with frameshift mutations, natural selection will tend to remove these variants from the population and prevent the accumulation of genetic variation between independently evolving populations. However, whether selection is able to remove deleterious variants depends on the size of the population. Selection is much less effective in small populations, where chance plays a relatively important role in determining the fate of genotypes; this is in contrast to large populations, where the effect of various chance events is averaged. Therefore, in small populations, deleterious mutations can accumulate as a result of repeated loss of the class of least-mutated individuals, a process called Muller's ratchet (78). This process may be of particular importance for the survival and persistence of pathogens in which small populations are likely to be common due to the small number of organisms entering a new host (75).
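Muller's ratchet can be illustrated with a small stochastic simulation. The sketch below assumes an asexual Wright-Fisher population in which each genome carries a count of deleterious mutations, fitness is multiplicative ((1 − s)^k for k mutations), new mutations arrive as a Poisson number with mean U per replication, and there is no back mutation or recombination. Under these assumptions the least-loaded class can only be lost, never regained, so its mutation count never decreases; each increase is one "click" of the ratchet. Haigh's approximation for the equilibrium size of the least-loaded class, N·e^(−U/s), is included for comparison. All names and parameter values are hypothetical.

```python
import math
import random

def _poisson(lam, rng):
    # Knuth's multiplication method; adequate for small per-genome rates
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_ratchet(pop_size, u, s, generations, seed=0):
    """Return the mutation count of the least-loaded genome in each generation."""
    rng = random.Random(seed)
    loads = [0] * pop_size  # deleterious mutations carried by each genome
    least_loaded = []
    for _ in range(generations):
        weights = [(1 - s) ** k for k in loads]                  # multiplicative fitness
        parents = rng.choices(loads, weights=weights, k=pop_size)  # selection plus drift
        loads = [k + _poisson(u, rng) for k in parents]            # no back mutation
        least_loaded.append(min(loads))
    return least_loaded

def expected_least_loaded_size(pop_size, u, s):
    """Haigh's approximation for the size of the least-loaded class."""
    return pop_size * math.exp(-u / s)

print(simulate_ratchet(50, 0.1, 0.02, 100, seed=1)[-1])
print(expected_least_loaded_size(50, 0.1, 0.02))
```

With these hypothetical values the expected least-loaded class holds well under one genome, so it is easily lost by chance; running the simulation with a much larger population or a larger s should make clicks rarer.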
Newly generated genotypes can also have a beneficial effect on fitness. Again, the size of the population in which these new genotypes arise determines their clonal expansion. In small populations selection is less efficient, so that beneficial mutants behave as if they were neutral, whereas in large populations the role of selection increases. Thus, large populations will maintain new beneficial variants more efficiently because of at least two factors: a larger number of mutations (14) and more efficient selection. However, a recent study shows that when the number of beneficial variants in a population is very large, the rate at which beneficial variants are fixed in a clonal population becomes independent of population size and reaches its maximum (19). Therefore, for genetic variation generated at extremely mutation-prone loci, such as some of those determining virulence in pathogens, information about the size of the population is necessary to interpret the variation observed.
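The interplay between population size and selection can be quantified with the fixation probability of a single new mutant. The sketch below uses the textbook Moran model, swapped in here for illustration (it is not the model of the study cited above): a mutant with relative fitness r in a population of n fixes with probability (1 − 1/r)/(1 − 1/r^n), or 1/n if neutral. Comparing this probability with the neutral value 1/n shows how selection becomes ineffective in small populations. The function name and parameter values are hypothetical.

```python
def moran_fixation_probability(r, n):
    """Fixation probability of one mutant with relative fitness r in a Moran population of size n.

    Plain floating-point formula; keep n * |ln r| well below ~700 to avoid overflow.
    """
    if r == 1.0:
        return 1.0 / n  # neutral mutant: fixation by drift alone
    return (1 - 1 / r) / (1 - 1 / r ** n)

for n in (10, 10_000):
    for r in (1.01, 0.99):  # 1% beneficial or 1% deleterious
        rho = moran_fixation_probability(r, n)
        print(f"n={n:>6} r={r}: fixation / neutral expectation = {rho * n:.3g}")
```

In a population of 10, both the beneficial and the deleterious mutant fix with close to the neutral probability, whereas in a population of 10,000 the beneficial mutant is roughly a hundred times more likely than a neutral one to fix and the deleterious one essentially never fixes.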
The role of natural selection depends on the interactions between an organism and its environment, i.e., the ecological factors that determine fitness. One such factor is the size of the population in which the organism resides. Other ecological factors include the degree of variability of the environment, which may be high for pathogenic microbes due to the variability of host defense mechanisms. Whether the environment changes at a constant pace or shows high variability alternating with periods of stasis is relevant to whether evolution proceeds gradually or in a punctuated fashion (26). Another relevant ecological factor is whether the fitness of an organism depends on its frequency relative to other variants in the population. A negative frequency dependence, so that fitness is high when a genotype is present at low frequency and low when it is present at high frequency, causes different genotypes to be maintained in the population for extended periods, during which no new variants can invade and take over the population. Negative frequency dependence is thought to be common among pathogens due to the advantages that genotypes have in their interaction with host defense mechanisms. Finally, a heterogeneous environment may allow the simultaneous evolution of several locally adapted variants, while a homogeneous environment will allow only a single variant to evolve (85). Information about the ecology of internal host environments is needed to understand the ecological factors relevant for pathogens. The within-host population dynamics of pathogen proliferation is extremely complex in cases of infectious diseases. These dynamics are steered by a multitude of physical, immunological, and chemical factors (57).
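Negative frequency dependence can be illustrated with a deterministic two-genotype model. The sketch assumes a hypothetical linear fitness rule in which each genotype's fitness declines with its own frequency (w_A = 1 − s·p, w_B = 1 − s·(1 − p), with 0 < s < 1) and updates the frequency p of genotype A each generation by standard selection dynamics, p' = p·w_A/w̄. Whichever genotype is rare is fitter, so both are maintained.

```python
def simulate_negative_fd(p0, s, generations):
    """Frequency of genotype A after selection with negative frequency dependence."""
    p = p0
    for _ in range(generations):
        w_a = 1 - s * p          # A is fitter when A is rare
        w_b = 1 - s * (1 - p)    # same rule applied to B
        mean_w = p * w_a + (1 - p) * w_b
        p = p * w_a / mean_w     # selection changes genotype frequencies
    return p

print(simulate_negative_fd(0.1, 0.5, 200))
print(simulate_negative_fd(0.9, 0.5, 200))
```

Starting from either a rare or a common genotype A, the frequency converges to the same intermediate value (0.5 under this symmetric rule), so neither genotype takes over the population.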
It is interesting that the concepts and events introduced above are continuously brought into practice for microbial strain identification purposes. This is most clear in microbial typing for epidemiologic purposes, as discussed in the subsequent sections.
To set the stage for microbial typing, the source of the organisms to be studied should be well defined. The “natural habitat” from which the cells are isolated is important for interpreting the results of molecular typing and drawing valid conclusions. The following nomenclature for use in molecular typing was proposed by Struelens et al. (101). A microbial isolate or stock is a collection of cells derived as a monoculture from a primary colony growing on a solid medium on which the source of the isolate was inoculated. Bacteria grown in liquid cultures and obligate intracellular parasites do not meet this criterion. The source of an isolate can be as diverse as a clinical specimen from a patient or a soil sample. A strain represents an isolate or group of isolates displaying specific genetic or phenotypic characteristics that set it apart from other isolates belonging to the same species. A reference strain is a well-characterized strain that is preserved and included in further studies for comparison purposes. A clone (or clonal group of isolates or strains) comprises organisms descending from a common ancestor through a direct chain of replication. Identification of clones must be based on thorough monitoring of several (molecular) markers of sufficient discriminatory power; what counts as “sufficient” is sometimes the subject of intense debate. A genetic clone comprises a group of strains that are completely identical at the DNA level. These five definitions form the basis for staging typing procedures. Although initial species identification is common practice, isolates not yet identified to the species level can be characterized as well. The laboratory techniques available to date are numerous and have been reviewed in detail in several publications (see, e.g., reference 64). Some general considerations about the qualities and applications of these systems are summarized below.
Typing systems are used to define specific characteristics of the object under study. The procedures are specific for different phenotypic or genetic parameters and can be general (i.e., applicable to any microbial species) or species or genus specific (110). Since typing procedures are the basis for the integration of epidemiology, taxonomy, and evolutionary genetics, these procedures are included in Table 2. This table is far from complete since modifications have been developed for most of the techniques described. For additional information, the reader is referred to reviews on the specific procedures (for PCR-mediated typing, see, e.g., reference 84). It cannot be overemphasized that limitations exist in both the applicability of the different techniques and their results when they are used to categorize microbial isolates. For example, plasmid profiling is adequate only for organisms possessing these extrachromosomal elements. Obviously, typing data should reveal differences not only between strains belonging to different species but also between strains belonging to a single species.
Optimal typeability, a high degree of reproducibility, adequate stability, and maximal resolving power characterize the “gold standard” typing technique. Several of these characteristics, such as the index of discriminatory power, can be expressed as a number, enabling comparative monitoring of the quality of a typing procedure (42). In addition, the procedures should not be too costly or complicated and should be easily accessible. Table 2 also includes some of the qualities of molecular and phenotypic typing techniques (63, 64).
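The index of discriminatory power is commonly calculated with Simpson's index of diversity in the form popularized by Hunter and Gaston: D = 1 − [1/(N(N − 1))] Σ n_j(n_j − 1), where N is the number of unrelated isolates tested and n_j the number assigned to type j. D is the probability that two isolates drawn at random are assigned to different types. A minimal sketch, with a hypothetical function name and type assignments:

```python
from collections import Counter

def discriminatory_index(type_assignments):
    """Simpson's index of diversity for the types assigned to N isolates."""
    n = len(type_assignments)
    if n < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in Counter(type_assignments).values())
    return 1 - same_type_pairs / (n * (n - 1))

# Hypothetical typing result for four unrelated isolates
print(discriminatory_index(["A", "A", "B", "C"]))
```

A value of 1 means every isolate received its own type; a value of 0 means the method placed all isolates in one type.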
Table 3 summarizes the suitability of typing techniques from a theoretical perspective. A careful attempt has been made to correlate the usefulness of a procedure with specific classes of genomic changes. These changes have been categorized as replication phenomena versus acquisition or loss of extraneous elements. Again, this table reflects our opinion, and we suggest that these data be interpreted with caution. Table 3 includes techniques that are capable of efficiently monitoring all types of mutations; such techniques should be preferred for interdisciplinary studies. Currently, nucleic acid-mediated techniques are more frequently applied and better appreciated than phenotypically oriented approaches in taxonomy, epidemiology, and evolutionary studies. Multilocus enzyme electrophoresis (MLEE) (97) is the exception, having been the technique of choice for many microbial population genetic studies for more than three decades.
Furthermore, space and time need to be considered when selecting the optimal molecular marker(s). Small-scale epidemiological studies require different approaches from the analysis of a worldwide or nationwide spread of certain microbes. The speed at which the genome alters (molecular clock) influences the data that are generated on the basis of a certain molecular marker, and thus the molecular marker should be selected in accordance with the scope of the study. Since genomic instability depends on the region within a given chromosome (i.e., repetitive DNA domains alter at a relatively high rate), the selection of a region for genetic characterization should be done with care. For international long-term screening of dissemination or for phylogenetic studies (around and below the species level), highly variable markers that define a maximum number of individual types and have a so-called high molecular clock speed are essentially not suitable.
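Why a fast molecular clock is a handicap for long-term studies can be made concrete with a simple substitution model. The sketch below assumes the Jukes-Cantor model (our choice; the text prescribes no model) and two hypothetical substitution rates. The expected fraction of differing sites rises with time but saturates at 0.75, after which a marker no longer separates old lineages from very old ones.

```python
import math


def expected_p_distance(rate_per_site: float, time: float) -> float:
    """Expected fraction of differing sites between two lineages that split
    `time` units ago, under the Jukes-Cantor model.

    Each lineage accumulates substitutions at `rate_per_site`, so the branch
    length separating the two lineages is 2 * rate_per_site * time.
    """
    d = 2.0 * rate_per_site * time
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# Hypothetical rates (substitutions per site per year), for illustration only.
markers = {"slow marker": 1e-8, "fast marker": 1e-4}
for years in (10, 1_000, 100_000, 10_000_000):
    row = ", ".join(
        f"{name}: {expected_p_distance(rate, years):.4f}"
        for name, rate in markers.items()
    )
    print(f"{years:>10} years -> {row}")
```

With these assumed rates, the slow marker is still far from saturation after 10 million years, whereas the fast marker is already saturated after about 100,000 years and carries no further information on deeper relationships.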
The statistical methods used greatly influence the outcome of comparative studies (112). Depending on the statistical method applied after a variety of diverse molecular markers have been scored, results can differ drastically from one study to another. It is reassuring that modern computer programs suited for translating typing data into coherent genetic profiles can combine different sets of typing data. For instance, Bionumerics, a program developed by Applied Maths (Ghent, Belgium), can use both genetic and phenotypic data to calculate interstrain relatedness in a combinatorial manner.
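How the choice of coefficient alone can change a conclusion is shown below for hypothetical band presence/absence profiles. The Dice coefficient and the simple matching coefficient are two standard similarity measures chosen for illustration; we make no claim about which measures particular programs use by default.

```python
from typing import Sequence


def band_counts(x: Sequence[int], y: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (a, b, c, d): bands shared, only in x, only in y, absent from both."""
    if len(x) != len(y):
        raise ValueError("profiles must score the same band positions")
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if j and not i)
    return a, b, c, len(x) - a - b - c


def dice(x: Sequence[int], y: Sequence[int]) -> float:
    """Dice coefficient: ignores positions where both profiles lack a band."""
    a, b, c, _ = band_counts(x, y)
    denominator = 2 * a + b + c
    return 1.0 if denominator == 0 else 2 * a / denominator


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching coefficient: shared absences count as agreement."""
    a, _, _, d = band_counts(x, y)
    return (a + d) / len(x)


# Hypothetical presence (1) / absence (0) scores at 10 band positions.
x = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
z = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
for name, other in (("y", y), ("z", z)):
    print(name, round(dice(x, other), 3), round(simple_matching(x, other), 3))
```

Dice places strain y closest to x, whereas simple matching places z closest, because z shares many band absences with x and simple matching counts those as agreement.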
In conclusion, there are many different methods for determination of genetic variation among microbial isolates. Each of these methods has its technical and nucleic acid target-dependent limitations, which should be taken into account when performing molecular typing studies and subsequently calculating strain relatedness. The method chosen is determined by the nature of the microbiological questions to be answered.
Microbial epidemiologists monitor the spread of viruses, bacteria, fungi, and protozoan parasites associated with human or animal infectious diseases at levels ranging from a single host or ecosystem to the worldwide environment. On the basis of epidemiological investigation, public health risks can be determined, and interventions in the spread of disease can be designed and their efficacy assessed. Because the data are generally neither confirmed by in-depth studies of the mechanisms by which the mutations arise nor complemented by data on the frequencies at which the mutations occur under natural conditions, such studies are descriptive by nature and nonexplanatory with respect to strain evolution. For instance, outbreak-related strains in a medical setting are expected to share a certain set of characteristics that differ from those of epidemiologically unrelated isolates or strains arising from individual sources. As time passes and organisms spread, divergence may increase but should still be of limited significance. Based on these assumptions, empiric interpretation guidelines can be proposed. These guidelines generally permit a certain level of flexibility in the experimental data: small differences between data sets need not always identify novel genetic types but could be due to experimental inaccuracy. If correctly implemented, accurate experimental data can also be shared among different research centers (106). Microbial epidemiology is useful for the worldwide tracking of pathogenic or multiresistant microbes as part of surveillance systems.
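The flexibility built into empiric interpretation guidelines can be sketched in two steps: bands are matched within a sizing tolerance, so that small measurement errors are not scored as differences, and the number of unmatched bands is then translated into a category. The 1.5% tolerance and the category thresholds below are assumptions modeled on commonly used pulsed-field gel electrophoresis criteria; the fragment sizes are hypothetical, and the greedy matching is a simplification of what dedicated software does.

```python
from typing import Sequence


def matched_band_count(a: Sequence[float], b: Sequence[float], tol: float = 0.015) -> int:
    """Greedily pair fragment sizes whose relative difference is within `tol`."""
    unused = sorted(b)
    matches = 0
    for size in sorted(a):
        for k, other in enumerate(unused):
            if abs(size - other) <= tol * max(size, other):
                matches += 1
                del unused[k]  # each band may be matched only once
                break
    return matches


def band_differences(a: Sequence[float], b: Sequence[float], tol: float = 0.015) -> int:
    """Number of bands present in one profile without a partner in the other."""
    m = matched_band_count(a, b, tol)
    return (len(a) - m) + (len(b) - m)


def categorize(n_diff: int) -> str:
    """Illustrative thresholds only (an assumption, not a published standard)."""
    if n_diff == 0:
        return "indistinguishable"
    if n_diff <= 3:  # a single difference is grouped here as well
        return "closely related"
    if n_diff <= 6:
        return "possibly related"
    return "unrelated"


# Hypothetical fragment sizes (kb).
outbreak = [450.0, 380.0, 300.0, 220.0, 150.0, 90.0]
isolate_1 = [452.0, 380.0, 300.0, 220.0, 150.0, 90.0]  # 2-kb sizing error
isolate_2 = [450.0, 380.0, 260.0, 220.0, 150.0, 90.0, 40.0]
for isolate in (isolate_1, isolate_2):
    n = band_differences(outbreak, isolate)
    print(n, categorize(n))
```

The first isolate differs only by a 2-kb sizing error and is scored as indistinguishable; the second lacks one band and gains two, which is scored as closely related.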
It is important to realize that monitoring of either genome or gene polymorphism has implications for the correct identification of genetic types. Methicillin-resistant Staphylococcus aureus (MRSA) strains will be used as an example of how molecular typing has been used to define modes of spread of bacterial types versus individual microbial genes. Molecular evolution of MRSA, which has been under way for more than 40 years, involves both horizontal gene transfer (79) and clonal dissemination of certain strains (52). The latter is of primary importance for monitoring short-term nosocomial spread, whereas horizontal gene transfer of methicillin resistance has been documented by studies covering longer periods and greater geographic areas (40, 45, 122). It has to be reemphasized that short- and long-term evolution cannot be distinguished in a straightforward manner because both depend on the intricate interplay between the microbe and environmental influences exerted by chemicals or other species of organisms. Analysis of a large number of diverse MRSA strains revealed that horizontal spread was either rare or more frequent, depending on the penicillin-binding protein 2′-encoding region examined. These differences are related to the horizontal transmissibility of the genomic region involved (103). Consequently, analysis of the penicillin-binding protein 2′ gene may be of limited value to those who want to monitor a nosocomial outbreak.
In conclusion, short-term outbreak investigation requires approaches that are essentially different from those that can be successfully used for large-scale longitudinal surveillance studies. The resolving power of the typing strategy applied may differ, as may the numbers of discriminative genetic markers produced. However, the conceptual frameworks for both studies are closely related (for recent position papers, see references 101 and 102) and are similar to the frameworks described in the previous sections on taxonomy and evolutionary genetics.
Epidemiologists, taxonomists, and molecular geneticists use molecular typing data for comparative genome analysis. The way in which variability occurs, whether and how it is regulated, and how it affects the coding potential of the organism's genome are studied in depth in other fields, some of which are briefly mentioned below.
Evolutionary studies address how genetic changes are induced, persist, and become fixed. The environmental influences on the rate and nature of these changes, and how one can adequately measure (or even predict) the changes themselves and the mechanisms responsible for them, are also matters for investigation. Evolutionary processes can be studied in the laboratory by semicontinuous growth of bacterial cultures over many thousands of generations, the so-called serial-passage experiments (SPE) (54). By monitoring phenotypic or genotypic changes over time, insight into basic aspects of genetic adaptation or evolution in an artificial model system can be obtained. In 1996, Elena et al. (26) reported that beneficial mutations conferring a selective advantage on the recipient cell to such an extent that they become fixed in the population are rare events (21). According to these authors, the rarity of these selective sweeps leads to so-called punctuated evolution, in which the fitness of the population increases suddenly and stepwise after relatively quiescent periods of stasis. This study was performed with a single starting strain, without very strong selective pressures, and with asexual reproduction (16). Other studies of evolving populations of Escherichia coli in simple chemically defined growth media have addressed many different issues, including the genetic targets of adaptation (73, 94, 113, 124), the relative roles of chance and evolutionary history in subsequent evolution (113, 114), and the evolutionary origin of high mutation rates (98) and their effect on the rate of adaptation (19). Recently, a number of experimental evolution studies using other organisms, such as the unicellular alga Chlamydomonas (6), bacteriophage φX174 (129), and Saccharomyces cerevisiae (28), have been described.
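Part of the reason that fixation of a beneficial mutation is rare is that most new beneficial mutants are lost by chance while still present in only a few cells. The sketch below uses the Moran model of a constant-size haploid population (our choice of model, not necessarily the one used in the cited studies), in which a single mutant with relative fitness r = 1 + s fixes with probability (1 - 1/r)/(1 - r^-N). The population size and selective advantages are hypothetical.

```python
import math


def fixation_probability(relative_fitness: float, population_size: int) -> float:
    """Probability that one mutant takes over a haploid Moran population of size N.

    Uses the standard result (1 - 1/r) / (1 - r**-N); r = 1 + s for a
    mutation with selective advantage s.
    """
    r, n = relative_fitness, population_size
    if r <= 0 or n < 1:
        raise ValueError("fitness must be positive and N at least 1")
    if r == 1.0:
        return 1.0 / n  # neutral mutation: fixation by drift alone
    if r < 1.0 and -n * math.log(r) > 700:
        return 0.0  # r**-N would overflow a float; the probability is negligible
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))


# Hypothetical population size and selective advantages.
N = 10_000
for s in (0.01, 0.05, 0.10):
    print(f"s = {s:.2f}: P(fix) = {fixation_probability(1 + s, N):.4f}")
```

For large N the probability is close to s/(1 + s), so even a mutation with a 5% advantage is lost in roughly 95 of 100 cases.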
Additional SPEs are still urgently required to analyze other species, preferably in multispecies or multistrain systems and under various selective pressures (antibiotics, pH, nutrient limitation, etc.). Because current SPEs are performed in relatively simple systems, genuine “serial-passage infection” models should also be used with susceptible animals or during nosocomial or public outbreaks of infections such that the effects on the bacterial genotype of moving from one host to another may be carefully assessed. In this respect it is interesting that cystic fibrosis patients colonized with Haemophilus influenzae may already represent such an in vivo SPE-like model. During persistent pulmonary colonization of cystic fibrosis patients, variability can be assessed at the single H. influenzae virulence gene level; DNA repeats present in some of these genes are clearly variable, despite conservation of the overall genotype of the strain involved (88). Apparently, waves of bacterial variants reflect displacement of genetic variants in the in vivo situation. In the end, detailed SPEs may also resolve the controversy about directed or induced mutagenesis (55).
Genome plasticity is defined by the accumulation of changes in a genome. Various molecular processes such as transposition, transformation, and mutation lead to (epi)genomic changes and depend on the primary structure of the nucleic acid molecule. The molecular mechanisms that model genomic variability have been studied by many authors (see, e.g., reference 2 for a review). Genome variability as a consequence of the actions of insertion elements is high (80), and movement of genetic elements in general has phenotypic consequences. If pathogenicity islands (36) or simple genetic cassettes such as iterons (11, 30) are transferred, novel microbial types are generated with different abilities to cause harm to the host in which the strain was previously residing. For example, the transfer of genetic material between bacteria in real time during infection of an individual host has been documented for vancomycin-resistant enterococci (129a). It was shown that in certain clinical settings, outbreaks of vancomycin-resistant enterococcus colonization and infection were not due to clonal dissemination of a certain strain but were caused by transmission of the resistance-encoding transposon Tn1546. Many bacterial species have also developed specific systems for trapping genes, such as the integron, which functions as a collector of various genes mostly involved in antibiotic or antiseptic resistance (87). In addition, Vibrio cholerae can acquire and maintain heterologous genes by another mechanism that specifically captures these genes in regions of repetitive DNA by use of specific integrase enzymes (69). In this way, cassette-organized regions containing virulence-related genes or genes encoding products involved in antimicrobial resistance arise.
Other structural features of DNA molecules determine DNA plasticity. DNA primary structure is not homogeneous, and local differences in nucleotide composition and primary sequence motifs can be assessed by computerized scanning (49). These differences accommodate various functional and structural elements in the DNA molecule and as such should survive or overcome selection. The spatial arrangement of certain elements enables specific proteins to bind and affect gene expression by structuring and partitioning the genome (131). Processes involving mutation, deletion, acquisition, or exchange of DNA lead to new three-dimensional structures and make large contributions to genome plasticity. Studies relating DNA structure to mutagenicity are sparse, and additional research is clearly required. In addition, there are clear topological differences in mutation and recombination rates in different regions of the chromosome (58, 91).
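Computerized scanning of local nucleotide composition can be as simple as sliding a window along a sequence and flagging windows whose GC content departs strongly from the average, one common way to spot regions of atypical composition. The window size, step, and z-score cutoff below are arbitrary assumptions, the sequence is synthetic, and this is not presented as the specific method of reference 49.

```python
import statistics


def gc_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start, GC fraction) for every full window along `seq`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        result.append((start, sum(base in "GC" for base in chunk) / window))
    return result


def flag_outliers(windows: list[tuple[int, float]], z_cutoff: float = 2.0) -> list[int]:
    """Start positions of windows whose GC fraction lies > z_cutoff SDs from the mean."""
    values = [gc for _, gc in windows]
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [start for start, gc in windows if abs(gc - mean) / sd > z_cutoff]


# Synthetic "genome": a 50% GC backbone interrupted by a 100-bp AT-rich insert.
genome = "GCAT" * 250 + "AATT" * 25 + "GCAT" * 250
print(flag_outliers(gc_windows(genome, window=100, step=100)))  # [1000]
```

Real genomes would call for overlapping windows and a cutoff calibrated on the genome under study.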
Many studies have documented and located genetic changes in the microbial genome which seem to occur for a certain purpose. These genetic events provide the bacterial cell with a specific, advantageous phenotypic trait that is either missing from or different in cells of the same descent. The genetic basis of this phenotypic diversity frequently involves hypermutable loci that mutate randomly in time but at a fixed location (76). These sequences are called contingency loci to emphasize their involvement in the adaptation of bacterial cells to sudden changes in the environment. Variation in stretches of repetitive DNA due to slipped-strand mispairing, polymerase inefficiency, and sometimes lack of DNA mismatch repair is a major contingency effector in both eukaryotes and prokaryotes (27, 29, 47, 76). Some authors have suggested that contingency behavior could also occur in the absence of DNA replication (32, 55, 93); this hypothesis is subject to continuing discussion (56). Contingency loci are often associated with virulence genes, which would facilitate accurate adaptation to host environments. Detecting repeat variability could shed light on issues as complex as population mating structure, codon usage in relation to repeat expansion or shrinking, and involvement of these repeats (also known as microsatellites) in contingency behavior. In general, the mechanisms maintaining the capacity to react genetically to environmental stimuli are poorly understood. An important fact to consider is that bacterial populations do not seem to evolve toward occupancy of a given niche by a “single most fit clone” (117). Even in a simple environment such as a stationary batch culture, bacteria appear to maintain an interactive polymorphic state (85). In conclusion, certain aspects of genetic variation have an immediate effect on the potential survival of bacterial cells in a hostile environment.
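Variation in repeat copy number at a contingency locus can be monitored by counting the longest tandem run of the repeat motif in each isolate. The sketch below assumes a tetranucleotide motif (CAAT, chosen for illustration) and hypothetical sequences. Because each copy adds four bases, a change in copy number that is not a multiple of three shifts the downstream reading frame, one well-known way in which repeat variation can switch gene expression on or off.

```python
import re


def longest_tandem_run(seq: str, motif: str) -> int:
    """Copies of `motif` in the longest uninterrupted tandem run within `seq`."""
    motif = motif.upper()
    runs = re.findall(f"(?:{re.escape(motif)})+", seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def relative_frame_shift(copies: int, reference_copies: int, motif_length: int) -> int:
    """Reading-frame offset (0, 1, or 2) relative to a reference repeat length."""
    return ((copies - reference_copies) * motif_length) % 3


# Hypothetical variants of one locus carrying a tetranucleotide repeat.
variants = {
    "variant_1": "ATG" + "CAAT" * 22 + "GGCTAA",
    "variant_2": "ATG" + "CAAT" * 23 + "GGCTAA",
    "variant_3": "ATG" + "CAAT" * 25 + "GGCTAA",
}
reference = longest_tandem_run(variants["variant_1"], "CAAT")
for name, seq in variants.items():
    copies = longest_tandem_run(seq, "CAAT")
    print(name, copies, relative_frame_shift(copies, reference, 4))
```

Variant 2 (one extra copy) is shifted out of frame relative to variant 1, whereas variant 3 (three extra copies) is back in frame.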
The mechanisms that explain these changes are being uncovered and are deepening our insight into host-pathogen interplay during infections. This is best exemplified by describing how typing is used in taxonomic, evolutionary genetic, and epidemiological studies of a single bacterial species such as E. coli (see below).
E. coli is by far the most extensively studied model microorganism with respect to genetic variation. Various experimental tools have been developed, and new techniques are being described frequently (4). These techniques not only allow rapid monitoring of outbreak situations in clinical settings (1) but also can be used for mapping the (inter)national dissemination of particularly pathogenic clones such as E. coli O157, an important emerging pathogen (116, 130). Recently, the nucleotide sequence of the entire E. coli chromosome was determined by Blattner et al. (9), and this led to the identification of novel cryptic coliform phages and new insertion elements (60, 100). In addition, linkage maps describing the genomic organization of various E. coli strains have been published since the 1970s (8). The availability of these maps and the full chromosome sequence may facilitate the screening of large portions of the DNA sequence from a large number of strains (comparative genomics) with the help of DNA chip technology and sequencing by hybridization. This enables molecular typing to be performed at an unprecedented level of detail.
Many types of genetic variability have been documented for E. coli. Repetitive DNA fluctuates in size and nucleotide sequence motif within and among strains. Sequence variability in individual genes recovered from strains with a different ancestry has been demonstrated by comparative DNA sequencing. Moreover, comparative analysis enabled the detection of major differences in genomic size (90). By random amplification of polymorphic DNA, evidence was provided for the existence of a large pool of strain-specific genes, whose origin could even lie outside Escherichia (43). This implies that intensive interspecies DNA exchange is generally important in bacterial evolution. Hot spots, comprising just a few chromosomal loci, were identified for the presence of such strain-specific elements, and there appeared to be a clear association between the presence or absence of certain elements and bacterial virulence (90). Nucleic acid transfer has also occurred through mechanisms such as conjugation, transduction, and transformation. Furthermore, basic processes giving rise to localized sequence variability, such as homologous recombination, DNA repair, chromosomal rearrangement, transposition, and site-specific recombination, have been analyzed in a large number of studies with E. coli strains (44). Gene sequence comparisons have illustrated that insertion of foreign DNA into the E. coli chromosome occurs regularly. These regions stand out on the basis of increased sequence divergence when sequences obtained from multiple strains are analyzed (23, 24, 99).
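Regions that stand out by increased divergence among strains can be located by computing, for consecutive windows of an alignment, the mean pairwise proportion of differing sites. The alignment below is hypothetical, the window size is an arbitrary assumption, and gap handling is deliberately simple.

```python
from itertools import combinations


def window_divergence(aligned: list[str], window: int) -> list[float]:
    """Mean pairwise proportion of differing sites in consecutive windows.

    Alignment columns with a gap ('-') in either sequence of a pair are
    skipped for that pair.
    """
    length = len(aligned[0])
    if len(aligned) < 2 or any(len(s) != length for s in aligned):
        raise ValueError("need at least two sequences aligned to equal length")
    result = []
    for start in range(0, length - window + 1, window):
        per_pair = []
        for s1, s2 in combinations(aligned, 2):
            cols = [
                (a, b)
                for a, b in zip(s1[start:start + window], s2[start:start + window])
                if a != "-" and b != "-"
            ]
            if cols:
                per_pair.append(sum(a != b for a, b in cols) / len(cols))
        result.append(sum(per_pair) / len(per_pair) if per_pair else float("nan"))
    return result


# Hypothetical 30-bp alignment of three strains; the middle window is divergent.
strains = [
    "ACGTACGTAC" + "GGGGGGGGGG" + "ACGTACGTAC",
    "ACGTACGTAC" + "CCCCCCCCCC" + "ACGTACGTAC",
    "ACGTACGTAC" + "TTTTTTTTTT" + "ACGTACGTAT",
]
print([round(v, 3) for v in window_divergence(strains, window=10)])
```

The conserved flanking windows give values near zero, while the middle window, where every pair differs at every site, gives 1.0.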
Many studies on the in vitro evolution of E. coli grown in batch cultures for many generations have been published. Chromosomal changes occur during these periods of prolonged cultivation (7), and their effects on fitness can be documented. Many of the genetic changes taking place were associated with rearrangements of insertion elements (83) and allowed for the detection of putatively beneficial mutations. Over time, differences from the ancestral strain increased, as did the genetic diversity encountered among specific cells in the population. Whether starvation leads to an accelerated increase in population diversity is still subject to debate, but some studies suggest that this is quite likely (31).
A mutator is a bacterial isolate that shows significantly increased mutation rates compared to the wild-type isolates. Analysis of the mutator phenotype is facilitated by the versatile genetic cloning and modification systems that exist for E. coli, and direct agar medium-based selection of mutators is now feasible (71). Culture-based identification of mutator cells allows the precise determination of mutation frequencies in these strains and of the environmental effects leading to variation in the frequency. In some model systems, E. coli mutators are capable of overcoming multiple growth-limiting barriers (71). High mutation rates can apparently arise spontaneously; this may be clinically important since accelerated evolution and/or adaptation to the host environment is facilitated (98). Mutators may be involved in the current expansion of the pathogenic E. coli O157 (128). The first outbreak of infections caused by this strain was observed in 1982 (89). This E. coli lineage is probably an example of a strain that acquired a Shiga-like toxin gene as a secondary virulence factor, possibly through bacteriophage transduction into an already cytoadherent type of E. coli (81, 82). The toxin gene was later encountered in diverse lineages, further substantiating the possibility of horizontal transmission (127). E. coli O157 is an example of an organism that became virulent and widespread through the acquisition of a novel genetic element.
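The text does not specify how mutation rates are estimated. One classical option for comparing a candidate mutator with a wild-type isolate is fluctuation analysis with the p0 (zero-class) method of Luria and Delbrück, sketched below; it is generally considered reliable only when roughly 10 to 70% of parallel cultures contain no mutants. Note that it estimates a mutation rate (per cell per generation), not a mutant frequency (the proportion of mutants in a culture). All culture counts and cell numbers are hypothetical.

```python
import math


def mutation_rate_p0(cultures_without_mutants: int, total_cultures: int,
                     cells_per_culture: float) -> float:
    """Luria-Delbrueck p0 estimate of the mutation rate per cell per generation.

    The number of mutational events per culture is Poisson distributed with
    mean m, so the fraction of cultures with no mutants is p0 = exp(-m). The
    rate is approximated as m divided by the final number of cells per culture.
    """
    if not 0 < cultures_without_mutants < total_cultures:
        raise ValueError("the p0 method needs some, but not all, cultures without mutants")
    p0 = cultures_without_mutants / total_cultures
    m = -math.log(p0)
    return m / cells_per_culture


# Hypothetical fluctuation tests (20 parallel cultures each).
wild_type = mutation_rate_p0(14, 20, cells_per_culture=1e9)
candidate = mutation_rate_p0(2, 20, cells_per_culture=1e8)
print(f"wild type: {wild_type:.2e}, candidate: {candidate:.2e}, "
      f"fold increase: {candidate / wild_type:.0f}")
```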
In terms of population evolution, the genetic variability of E. coli appears to be shaped by both clonal evolution and recombination. Pioneering population genetic studies based on MLEE have suggested that E. coli exhibits a clonal population structure, as shown in particular by strong levels of linkage disequilibrium (70, 95, 96, 126). However, convergent evidence also suggests that the rate of recombination in natural populations of this bacterium is not negligible. This is shown by the discrepancy between phylogenetic trees constructed from the sequences of different genes (25) and by the existence of mosaic genes whose sequences originate from several parental clones through a process of intragenic recombination (67). Direct comparison with the same techniques and the same statistics clearly shows that the impact of genetic recombination is notably greater in E. coli than in Trypanosoma cruzi, the agent of Chagas' disease (111). Many procedures developed for the study of E. coli strains have been applied to other microorganisms as well. This has led to increased recognition that typing procedures have contributed to the understanding of microbial genome variability.
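Linkage disequilibrium in multilocus data is often summarized by the index of association (I_A) of Maynard Smith and colleagues, which compares the observed variance V_D of the number of loci at which pairs of isolates differ with the variance V_E expected if alleles at different loci were associated at random. The sketch below computes V_D as the plain variance over all isolate pairs; published implementations differ slightly in normalization, and the allele profiles are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import pvariance


def index_of_association(profiles: list[tuple]) -> float:
    """I_A = V_D / V_E - 1 for multilocus allele profiles (one tuple per isolate).

    Expected to be near 0 under linkage equilibrium (free recombination) and
    clearly positive when alleles are associated, as in clonal populations.
    """
    n = len(profiles)
    if n < 2:
        raise ValueError("need at least two isolates")
    n_loci = len(profiles[0])
    # Observed: variance of K, the number of loci at which two isolates differ.
    k_values = [sum(a != b for a, b in zip(p, q)) for p, q in combinations(profiles, 2)]
    v_d = pvariance(k_values)
    # Expected under linkage equilibrium: sum over loci of h(1 - h), where h is
    # the unbiased genetic diversity at each locus.
    v_e = 0.0
    for j in range(n_loci):
        counts = Counter(p[j] for p in profiles)
        h = n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))
        v_e += h * (1.0 - h)
    if v_e == 0:
        raise ValueError("no locus varies informatively; I_A is undefined")
    return v_d / v_e - 1.0


# Hypothetical electrophoretic types (allele numbers at three enzyme loci).
clonal = [(1, 1, 1), (1, 1, 1), (2, 2, 2), (2, 2, 2)]
print(round(index_of_association(clonal), 2))
```

This perfectly clonal example gives I_A = 2.0, well above the value near zero expected under free recombination.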
Different groups of microbiologists use the same technology for assessing genomic diversity among strains of microorganisms. Medical microbiological epidemiologists are generally involved in monitoring the short-term dissemination of microbes in a single institution. Studies focusing on microbial taxonomy, evolutionary genetics, and population dynamics differ in the sense that efforts are being made to use the genetic data obtained to define species concepts or to understand the evolutionary mechanisms leading to genetic change. Frameworks that facilitate communication between the different categories of microbiologists through common technological approaches to typing of microbes should be designed, and data should be interpreted according to unified guidelines. Reference collections of microorganisms for interlaboratory standardization and intralaboratory optimization should be established. These issues and basic research implications are discussed in more detail below.
Basic to all studies are the typing procedures and the interpretation and categorization of the experimental results. Guidelines relating data on genomic diversity to the relatedness of organisms should be established; i.e., the data should be assessed quantitatively. This is by no means a simple task, but a consensus set of interpretation guidelines should enable data integration. For some common typing procedures, standardization has already been attained (121) or proposed (106, 119). For Mycobacterium tuberculosis typing, an international database has been established to integrate new typing data (48, 121). Because databases like the one for M. tuberculosis survey genetic variability on a worldwide level and without limitations in time, these fingerprint repositories are of great value. Development of similar databases for other microbial species is essential to generate comparable frameworks.
The technique used to provide answers to a specific question should be chosen with care. Table 2 summarizes the average accuracy of the procedures available for monitoring certain genetic events. However, the appropriateness of a procedure must be validated experimentally for each microbial species (100, 105, 119). In addition, data exchange should be feasible among researchers and institutions, preferably through electronic communication. It is for this reason that systems with a digital output (numbers or characters) are preferred over those that produce banding patterns, such as restriction fragment length polymorphism and PCR fingerprinting analyses (48). There are ongoing discussions on the usefulness of the currently available software for the computerized interpretation of banding patterns (13), but comprehensive programs are undergoing development and validation (33). In the end, DNA sequencing might turn out to be optimal, either through standard protocols involving chemical or enzymatic reactions or, more likely, with the help of DNA chip technology (15). Whatever technique is ultimately selected, it is extremely important to carefully match this technology with the speed at which the molecular changes occur. In this respect, even DNA sequencing requires careful consideration: in gene families such as the one encoding the serum albumins, for instance, phylogenetic analyses documented different mutation frequencies for different but clearly similar members of the family (34). Even separate domains within a single gene, which originated from triplication of an ancestral unit, evolved at different rates, indicating that interpreting comparative DNA sequencing data in a uniform manner carries a substantial risk of error that is difficult to predict. Selecting DNA target sequences whose variability is appropriate for the information required remains a challenge.
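As an example of a compact numerical code that can be exchanged electronically, the sketch below converts a Mycobacterium tuberculosis spoligotype (presence or absence of 43 spacers) into the 15-digit octal notation commonly used in spoligotype databases. We do not claim that this is the format of the database cited above.

```python
def spoligotype_to_octal(pattern: str) -> str:
    """Convert a 43-spacer spoligotype into its 15-digit octal code.

    `pattern` lists spacers 1-43 in order ('1' = present, '0' = absent).
    Spacers 1-42 are read in consecutive groups of three as binary numbers
    (one octal digit each); spacer 43 is appended as a final 0 or 1.
    """
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected 43 characters, each '0' or '1'")
    digits = [str(int(pattern[i:i + 3], 2)) for i in range(0, 42, 3)]
    digits.append(pattern[42])
    return "".join(digits)


print(spoligotype_to_octal("1" * 43))            # 777777777777771
print(spoligotype_to_octal("0" * 34 + "1" * 9))  # 000000000003771
```

A pattern with only spacers 35 to 43 present, typical of the Beijing family, thus reduces to the string 000000000003771, which can be compared across laboratories without any image analysis.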
Reference collections of microbes are further assets in our attempt to integrate the various disciplines. Such collections would enable “internal standardization” and detailed comparison of the inter- and intralaboratory reproducibility of various test systems. The value of these collections is increasingly clear, but the criteria for strain selection and the qualities that make a robust, useful collection are still not well defined. To illustrate, the quality of various typing techniques for Staphylococcus aureus was tested in several multicenter studies with a single collection of strains. The studies involved pulsed-field gel electrophoresis and random amplification of polymorphic DNA and have led to an enhanced insight into the technical possibilities and limitations of exchanging genetic typing data obtained in different laboratories (18, 105, 118, 119).
Last but certainly not least, our knowledge at the molecular level of the mechanisms leading to type changes requires urgent expansion. In both the natural setting and laboratory-based experiments, we need a deeper understanding of the way in which genetic changes take place. The only route to understanding the implications and significance of changes in a DNA fingerprint will be through detailed studies of the various modes of DNA reshuffling and mutagenesis. Examples of how that can be achieved have been presented in the preceding text, but our current understanding of the molecular basis of genetic flexibility, be it random or specifically driven by internal or environmental influences, represents only the tip of the iceberg. Additional basic research efforts in computer science and data analysis are required. A recent theoretical analysis of longitudinal epidemiological data concerning pathogens that do or do not elicit immune responses preventing efficient transmission of the pathogen to a new host explains strain structure in antigenically diverse microbial species (37). The intricate relationship between multiple strains in a single ecosystem, the selective pressure exerted by the host's (lack of) immune response, and the antigenic structure of the pathogen population was determined, and evolutionary dynamics could be explained on the basis of model calculations. These sophisticated calculations of strain instability over time and of antigenic repertoire illustrate that much is to be gained from additional developments in computer-mediated data analysis. The development of algorithms suited for detailed analysis of the whole-genome sequences that are being produced at high speed will shed light on the genetic elements that best suit typing purposes (49). Computational tools enabling the prediction of nucleic acid function on the basis of primary-structure information are required as well; such programs may become available in the not too distant future (10).
Using whole-genome sequences and the current computer programs, it has been demonstrated recently that there appears to be a clear difference in horizontal transfer rates of operational and informational genes (46). This is probably due to the increased complexity of the multigene organization of the informational genes.
Techniques for monitoring microbial variability need to be standardized to enable efficient data exchange between researchers in the various areas. Once all disciplines use similar procedures, thereby generating congruent data sets, communication among disciplines will be greatly facilitated. Once the mathematical and statistical procedures are standardized as well, barriers between microbial epidemiologists, population geneticists, and taxonomists will disappear. The adequate development of these procedures requires centralized facilities where people from the different disciplines can interact on a day-to-day basis and that provide the logistics needed to pursue the above goals. Moreover, this type of facility would enable concerted research on microbial diversity in the context of the prevention and control of (re)emerging infectious diseases. Likewise, the use of genetically modified microorganisms in agriculture or industry and its potential ecological impact could be carefully assessed. The establishment of large databases on relevant taxonomic and epidemiological issues would be another advantage of an internationally operated institution. The first initiatives to establish centralized facilities in Europe are being made public and are based on the successful model provided by the Centers for Disease Control and Prevention (CDC) (Atlanta, Ga.) (107, 108). This move has met with clear opposition: Giesecke and Weinberg have suggested that supporting the already existing electronic networks (the virtual European CDC, as they call it) is more likely to “deliver public health action and benefit than building with bricks and mortar” (35). Through these networks, a guideline for the pan-European surveillance of communicable diseases has already been issued.
Finally, we are convinced that microorganisms present themselves primarily as proposed by Vandamme et al. (120): within a continuum of different entities, with species concepts providing a theoretical framework for the classification of major groupings only. The real issue is the presence of chimeric, complex microorganisms that do not belong specifically to any of these groups but share characteristics with some or many of them. Genetic (in)stability and the possibility of variation will in the end maintain the genetic continuum. It is for this reason that we would like to gather microbial taxonomy, epidemiology, and population genetics under a single universal conceptual and technical umbrella, with each discipline retaining its specific characteristics. As measured with the current and future armamentarium of molecular techniques, genomic evolution provides the common denominator that will unite all those concerned with its various implications. Ultimately, the genomic evolution biologist will unite the characteristics that may now be artificially dispersed among workers in various disciplines.
The first drafts of the manuscript were prepared during a short 1998 summer sabbatical of the first author in the laboratory of E. Richard Moxon (Institute of Molecular Medicine [IMM], Molecular Infectious Disease Group, Headington, Oxford, United Kingdom). The hospitality of the entire “Moxon crew” is gratefully acknowledged.
Sphaero Q (Leiden, The Netherlands), GenProbe (Liège, Belgium), and Oxoid (Haarlem, The Netherlands) are thanked for sharing some of the costs involved in the 3-month stay in the IMM laboratory. Arjan de Visser was supported by a fellowship from the Netherlands Organization for Scientific Research (NWO).