In this study, we show the utility of 454 pyrosequencing technology and a ‘split mapping’ computational method for investigating SGRs in bacterial populations. Massively parallel paired-end sequencing has been used extensively to identify genome rearrangements in cancer genomes, where putative rearrangements were suggested by discordantly mapping reads and then confirmed experimentally by PCR amplification of the breakpoints in tumor and normal DNA. However, the frequencies of non-selected SGRs in bacterial populations are usually very low, which renders PCR unreliable for verifying putative SGRs. Therefore, we used 454 pyrosequencing to obtain whole-genome sequences in relatively long reads (300 nucleotides on average), determined the breakpoints of the putative SGRs to base-pair resolution via a ‘split mapping’ computational method, and employed a new technique, padlock probe hybridization, to experimentally verify the junction sequences of putative SGRs. Using this strategy, we were able to identify and experimentally confirm junction sequences caused by SGRs in a S. typhimurium population and to determine how fast SGRs approach their steady-state frequency by examining the frequency of SGRs at three time points (generations 48, 144 and 240) in cells from a chemostat-grown population. We classified the identified putative rearrangements into duplications, inversions and small deletions based on the relative chromosomal locations and orientations of the prefix and suffix of a read sampled across a putative rearrangement. Note that junctions caused by translocations would also be identified as duplication or inversion junctions, but since translocations are rare rearrangement events in S. typhimurium, it is likely that they made only a small contribution to the junction sequences in the chromosome.
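The classification rule described above (relative location and orientation of a read's prefix and suffix) can be illustrated with a minimal sketch. This is not the pipeline used in this study: the `Segment` type, the `classify_junction` function, the label for unclassified junctions and the `max_small_deletion` cutoff of 1000 bp are all hypothetical names and values chosen for illustration, and the sketch ignores junction microhomology and the circular chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned part (prefix or suffix) of a split-mapped read."""
    start: int   # leftmost reference coordinate of the aligned part
    end: int     # rightmost reference coordinate (inclusive)
    strand: str  # '+' or '-'


def classify_junction(prefix, suffix, max_small_deletion=1000):
    """Classify a split read from the placement of its prefix and suffix.

    The cutoff `max_small_deletion` is an assumed, illustrative value.
    """
    # Prefix and suffix on opposite strands: inversion junction.
    if prefix.strand != suffix.strand:
        return "inversion"

    # Order the two parts as they would appear along the forward strand.
    # For a reverse-strand read the suffix lies upstream of the prefix.
    if prefix.strand == "+":
        first, second = prefix, suffix
    else:
        first, second = suffix, prefix

    # The downstream part maps at or before the upstream part:
    # consistent with the novel junction of a duplication.
    if second.start <= first.start:
        return "duplication"

    # The downstream part maps after a gap: reference bases are missing.
    gap = second.start - first.end - 1
    if 0 < gap <= max_small_deletion:
        return "small deletion"

    # Adjacent, overlapping or very distant parts are left unclassified here.
    return "unclassified"
```

As noted in the text, a rule of this kind cannot distinguish translocation junctions, which would be labelled as duplication or inversion junctions.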
Based on the verification results, the frequency of expected true SGRs at generation 48 was calculated to be approximately 20%, 20% and 40% for duplications, inversions and small deletions, respectively (as estimated from dataset gen48). SGRs reached steady state within 48 generations, as judged by the absence of any significant difference between the three datasets (gen48, gen144 and gen240) in the frequency of expected true SGRs. Previous estimates suggest that at least 10% of cells in a growing S. typhimurium population contain a duplication somewhere in the genome. The frequency of spontaneous duplications (20%) deduced from this study is in good agreement with the previous estimate (10%), considering that the previous calculation was based on a subset of spontaneous duplications, which could bias the estimate. Our results suggest that spontaneous duplications are more frequent than previously estimated, even though the most frequent rearrangements (spontaneous duplications between rRNA operons) are not detectable in this study. Thus, the frequency of duplications is likely to exceed 20% of the cells in the population. To our knowledge, neither inversion nor deletion frequency has previously been measured on a genome-wide scale in a bacterial genome, because detection usually relies on observable phenotypes generated by these two types of rearrangements, which is difficult to do on a large scale across the chromosome. Most previous work on measuring inversion frequencies was based on placing sequences in inverse order at known chromosomal positions and examining inversions formed at these specific sequences. Furthermore, most studies measuring deletion frequencies either were performed in the same manner, by placing sequences in direct order at known positions, or focused on deletions occurring within a specific sequence context. Thus, the results of our study provide new insights into the frequencies of SGRs in bacterial populations.
Despite the strength of this new strategy in detecting and validating low-abundance SGRs on a genome-wide scale in bacterial populations, a few limitations should be noted. (1) Because the frequencies of most SGRs are relatively low in a bacterial population, it is not possible to isolate individual cells with a particular genome rearrangement and study it in detail; instead, we have to rely on identifying the unique junction sequences generated by the SGRs and deduce the structures of the rearrangements from them. (2) SGRs formed between long repetitive sequences are undetectable because of the limited read length, which could lead to underestimation of rearrangement frequencies. (3) Although the padlock probe hybridization technique has been used to detect low-abundance DNA sequences with extraordinary sensitivity and precision, we cannot completely rule out artifacts that give rise to false positives. One possibility is that, when detecting a small deletion junction sequence, the wild-type sequence of the deleted region forms a hairpin loop structure that could juxtapose the padlock probes bound to the flanking regions and create a substrate for DNA ligation. However, we were unable to find any strong palindromic sequences in the small deletions identified in this work, which makes this possibility less likely. (4) The padlock probe hybridization technique is not applicable to rearrangements with >30 bp of junction microhomology because of the limited length of the padlock probe. Therefore, verification of rearrangements with long junction microhomology can rely only on PCR. The deduced frequencies of inversions (20%) and deletions (40%) are higher than expected considering the irreversibility of deletions and the low reversibility of inversions.
If all the identified inversion and deletion junction sequences came from genomes of viable cells, and these cells could form colonies of normal size on plates, one would expect 60% of randomly picked colonies to contain an inversion or a deletion somewhere in the genome, assuming that these rearrangements were evenly distributed among cells. However, this deduction is unlikely to be true, because such a high frequency would have been noticed in whole-genome re-sequencing work. The discrepancy can be explained by the fact that the examination of SGRs in this work was based on the detection of rearrangement junction sequences rather than on the isolation of mutants with selectable phenotypes, as in most previous work on this subject. Firstly, inversions and deletions are likely to be costly for the cells that carry them, so these cells could be very slow-growing or nonviable and unable to form full-size colonies. Secondly, irreversible rearrangements could accumulate in a small subpopulation of cells that each contain many different rearrangements, which would lead to two possible outcomes: (i) cells with multiple rearrangements cannot form full-size colonies because of synthetic sickness or lethality; (ii) the small size of the subpopulation with rearrangements makes it difficult for small-scale whole-genome re-sequencing work to find clones derived from cells with rearrangements. For example, if only 1% of cells in the population accumulate rearrangements, on the order of 100 independent clones, each derived from a single cell, would have to be re-sequenced to expect to find even one such clone.
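The sampling argument in outcome (ii) can be made concrete with a short calculation. The sketch below is illustrative only. It assumes that clones are sampled independently and that each has the same probability of deriving from a rearrangement-carrying cell (a simple binomial model that is not stated in the text), and the function names `prob_detect` and `clones_needed` are hypothetical.

```python
import math


def prob_detect(fraction, n_clones):
    """Probability that at least one of n_clones independently sampled
    clones derives from the subpopulation making up `fraction` of cells."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return 1.0 - (1.0 - fraction) ** n_clones


def clones_needed(fraction, confidence):
    """Smallest number of clones giving at least `confidence` probability
    of sampling one or more clones from the subpopulation."""
    if not 0.0 < fraction < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("fraction and confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))


if __name__ == "__main__":
    # A 1% subpopulation, as in the example given in the text.
    print(f"P(detect) with 100 clones: {prob_detect(0.01, 100):.3f}")
    print(f"Clones needed for 95% detection: {clones_needed(0.01, 0.95)}")
```

Under these assumptions, re-sequencing 100 clones would recover at least one clone from a 1% subpopulation only about 63% of the time, and roughly 300 clones would be needed for a 95% chance of detection, which reinforces the point that small-scale re-sequencing studies could easily miss such a subpopulation.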
In summary, by using the strategy described in this work, we have taken three “snapshots” of the genomic sequences of a growing bacterial population at three transient states (generations 48, 144 and 240) and revealed the footprints (junction sequences) left by chromosomal rearrangements at each of these states. These footprints might, however, disappear under certain conditions, such as when cells are required to form visible colonies on plates. Finally, we think that complete characterization of all types of SGRs in unselected bacterial populations will require combining the strategy described in this study with paired-end sequencing of libraries with a large insert size (10 kb), which can be used to detect rearrangements between large repeats (rRNA operons, IS elements). With the fast development of sequencing technologies, one would expect more rapid and accurate estimates of the frequency of SGRs in bacterial populations under any defined genetic background or growth conditions, which could greatly facilitate the examination of genome stability and studies of bacterial genome evolution.