Completion of the whole-genome sequencing of many organisms, ranging from bacteria to humans, has transformed the way in which biological research is conducted. Genome sequencing is mostly used as a resource to obtain the reference sequence information of laboratory species, and its full applications in genetic research remain unexplored because it is time-consuming and expensive. These problems can potentially be circumvented using next-generation sequencing platforms such as 454, Solexa, and SOLiD, which perform cost-effective, high-throughput sequencing, thus making sequencing of individual isolates a feasible option. For example, with the Solexa platform, a large number of DNA fragments are immobilized on a solid surface and read simultaneously with fluorescence-labeled nucleotides. Millions of reads, each 36–50 base pairs long, can be obtained from each sample lane at a cost of less than $1000. The deep sampling of DNA fragments allows rapid procurement of high-coverage genome sequence information. These new, powerful sequencing technologies will be widely accessible in the near future, and have the potential to revolutionize the way in which current research is conducted.
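To make the link between read number and coverage concrete, the following is a minimal sketch of the standard mean-coverage estimate, C = N × L / G, together with the Poisson estimate of the fraction of bases left unsampled. The read count and the genome size of roughly 4.2 Mb are illustrative assumptions, not values taken from this study, and the function names are hypothetical.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean fold coverage C = N * L / G."""
    return num_reads * read_length / genome_size


def fraction_uncovered(coverage: float) -> float:
    """Fraction of bases with zero reads, assuming reads land uniformly
    at random so that depth is Poisson-distributed with mean C."""
    return math.exp(-coverage)


# Assumed, illustrative inputs (not measurements from this work).
GENOME_SIZE = 4_200_000   # approximate size of a B. subtilis genome, in bp
NUM_READS = 5_000_000     # hypothetical number of reads from one lane
READ_LENGTH = 36          # shortest read length mentioned in the text

coverage = expected_coverage(NUM_READS, READ_LENGTH, GENOME_SIZE)
print(f"mean coverage: {coverage:.1f}x")
print(f"expected uncovered fraction: {fraction_uncovered(coverage):.2e}")
```

Real libraries do not sample a genome uniformly, so the Poisson figure should be read as an optimistic lower bound on the fraction of uncovered bases.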
Genetic studies with model organisms are often conducted with multiple laboratory strains without detailed information on how these strains differ from one another. The observation of several ‘strain-specific phenotypes’ suggests underlying differences in their genomes. Further, multiple isolates are used for each strain, in most cases without an objective measure of their isogenicity. Direct sequencing has the potential to identify such unknown differences to inform experimental design and analysis, and to reveal avenues for reverse genetic studies. Another potentially tremendous benefit of knowing complete and precise genome sequences is the direct identification of suppressor mutations. Traditional genetic mapping to identify suppressors is a time-consuming process, which can be further complicated by unstable strains, dominant alleles, and multiple suppressors occurring in a single strain. Epistatic interactions are commonly studied between pairs of relevant genes, and suppressor mapping is often designed to reveal two-locus genetic interactions. Although multi-component genetic interactions are potentially prevalent in organisms, they are difficult to identify with traditional genetic approaches. Whole-genome sequencing, however, can circumvent these difficulties by identifying multiple mutations in a given strain in a single step.
The Gram-positive bacterium Bacillus subtilis is an ideal system for a ‘proof-of-principle’ study of the applications of whole-genome sequencing. Being an excellent model for investigating the mechanisms of gene regulation, differentiation, and metabolism, B. subtilis has been extensively studied in hundreds of laboratories world-wide for more than half a century using a variety of laboratory strains. However, the laboratory strain 168 is the only B. subtilis strain with known genomic sequence, obtained through an extensive collaboration more than ten years ago. Strain 168 was generated by mutagenic X-ray and UV treatment of the wild type B. subtilis (Marburg) strain, resulting in the requirement for externally added tryptophan for growth, and the inability to produce a secreted antibiotic surfactin, due to mutations in the genes trpC , respectively. Another broadly studied strain, JH642, which was obtained by multiple gene exchange experiments (James Hoch, personal communication), further differs from 168, including mutations in the genes pheA that lead to phenylalanine requirement and cold sensitivity, respectively. On the other hand, some laboratory strains (such as NCIB 3610 and SMY) do not have these phenotypes and are proposed to be true wild type strains. Thus, obtaining the genome information of these different laboratory strains and their independent isolates would aid in understanding the reproducibility of results between strains and the molecular bases of strain-specific phenotypes, as well as in defining the ‘isogenicity’ of isolates.
In this work, we used the Solexa Genome Analyzer method to sequence the related laboratory strains 168, NCIB 3610, SMY and JH642. Based on our results, we provide an updated draft of the 168 reference sequence. In addition, we found that independent isolates of the same strain differ by as few as 6 base pairs, while the difference between laboratory strains is larger. We confirmed multiple genome variations reported in the literature, and verified selected additional base variations by Sanger sequencing. Further, by correlating the genotypes with the phenotypes, we experimentally uncovered a hidden phenotype of the laboratory strain JH642 due to a defect in its two-component histidine kinase sensor responsible for citrate import. Finally, we identified the multiple causal nucleotide alterations in a single suppressor strain of a relA deletion mutant. The RelA enzyme is crucial for modulating the level of the small nucleotide (p)ppGpp, which is central in mounting the bacterial starvation response, the stringent response. We identified mutations in two small homologs of relA that were independently shown to have (p)ppGpp synthesis activities and found that mutations in each of these genes lead to partial suppression of the relA-associated growth defect. As a result, multiple types of suppressor mutations are generated in these genes in response to deletion of relA, making their identification difficult with traditional genetic mapping. Hence whole-genome sequencing enables the identification of individual nodes of multi-component genetic interaction networks simultaneously, and maps evolutionary pathways that can promote the growth of a genetically compromised strain. Our results offer strong proof that the Solexa method can be used to rapidly reveal multiple aspects of genomic content and organization, especially base substitutions, which greatly simplifies experimental design and facilitates our understanding of the biology of model organisms. This method can be applied broadly, including to similar studies with other bacterial and higher organisms.
B. subtilis strains sequenced and the number of Solexa sequencing reads at each genomic position for each strain.
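The per-position read counts described above can be computed from read alignments in a single pass. The sketch below is one way to do so, not the pipeline used in this work: it assumes each aligned read is summarized as a hypothetical `(start, length)` pair with a 0-based start, and it treats the chromosome as linear for simplicity, so reads running past the end are truncated rather than wrapped around the circular genome.

```python
from itertools import accumulate
from typing import Iterable, List, Tuple


def per_position_depth(
    genome_length: int, alignments: Iterable[Tuple[int, int]]
) -> List[int]:
    """Return the number of reads covering each genomic position.

    Uses a difference array: +1 where a read starts, -1 just past its end,
    then a running sum gives the depth at every position.
    """
    diff = [0] * (genome_length + 1)
    for start, length in alignments:
        if start < 0 or start >= genome_length or length <= 0:
            continue  # skip reads that do not map inside the reference
        end = min(start + length, genome_length)  # truncate at the end
        diff[start] += 1
        diff[end] -= 1
    return list(accumulate(diff[:-1]))


# Toy example on a 10 bp reference with three 4 bp reads.
depth = per_position_depth(10, [(0, 4), (2, 4), (8, 4)])
print(depth)  # [1, 1, 2, 2, 1, 1, 0, 0, 1, 1]

# Positions with no reads are candidates for deletions or unsampled regions.
uncovered = [pos for pos, d in enumerate(depth) if d == 0]
print(uncovered)  # [6, 7]
```

The difference-array approach keeps the cost proportional to the number of reads plus the genome length, which matters when millions of short reads are mapped to a genome of several megabases.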