Here we report a functional and structural genomics effort that applied saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq) to identify essential genes in B. thailandensis, followed by high-throughput structure determination. We used an “ortholog rescue” approach to maximize structural coverage of these gene families, which are likely to be essential not only in B. thailandensis but also in related, more virulent Burkholderia species such as B. pseudomallei. A large fraction of the genes that we identified (83%, 336/406) have homologs previously identified as essential in B. cenocepacia, in P. aeruginosa, or in other prokaryotes listed in the Database of Essential Genes. Of the remaining 70, some are likely to be essential but have not been identified previously, as there had been no experimental genome-wide essentiality studies in Burkholderia prior to this study. A small percentage of our putative essential genes may be false positives, that is, genes wrongly identified as essential. These are most likely to be small genes, which because of their size may have eluded mutagenesis, or genes with insertion densities close to the threshold of three insertions per kb in the 5–90% portion of the ORF (in two independent mutant pools) (Table S1). This threshold was chosen based on a survey of genes thought to be essential based on annotated function, in which small numbers of insertions were detected, and was used to reduce false negatives; for example, rare insertions in transiently duplicated genes or within intra-domain regions may not fully abrogate essential function. False negatives are still possible, and are most likely to be genes that possess nonessential domains tolerant of transposon insertions.
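The insertion-density criterion described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in the study: the `Gene` record, the function names, and the rule that a gene is called putatively essential when it falls below the threshold in both mutant pools are assumptions made here for illustration. The 5–90% window is assumed to be measured from the start codon, so it is mirrored for genes on the minus strand.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int   # lower genomic coordinate of the ORF
    end: int     # upper genomic coordinate of the ORF
    strand: str  # "+" or "-"


def core_window(gene, lo=0.05, hi=0.90):
    """Return the genomic bounds of the 5-90% portion of the ORF.

    Assumption: the fractions are measured from the start codon, so the
    window is mirrored for minus-strand genes.
    """
    length = gene.end - gene.start + 1
    if gene.strand == "+":
        return gene.start + lo * length, gene.start + hi * length
    return gene.end - hi * length, gene.end - lo * length


def insertions_per_kb(gene, positions):
    """Insertion density (per kb) inside the 5-90% window for one pool."""
    w_lo, w_hi = core_window(gene)
    hits = sum(1 for p in positions if w_lo <= p <= w_hi)
    return hits / ((w_hi - w_lo) / 1000.0)


def is_putative_essential(gene, pool_a, pool_b, threshold=3.0):
    """Assumed rule: density below the threshold in both independent pools."""
    return all(insertions_per_kb(gene, pool) < threshold
               for pool in (pool_a, pool_b))
```

Under these assumptions, a 1 kb gene with a single insertion inside the window in each pool would be called putatively essential, while three insertions inside the window in either pool would exceed the threshold. The sketch also illustrates why small genes are prone to false positives: a short window may receive no insertions by chance alone.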
The number of essential genes identified, 406, falls within the range of values estimated for other bacteria using experimental approaches such as genome-wide gene disruption or mutagenesis. Experimentally determined estimates of the number of essential genes in pathogenic bacteria range from <200 to >600. By comparing the genomes of all 51 species in the order Burkholderiales and clustering using OrthoMCL, Juhas et al. identified 610 ortholog groups conserved among all 51 species (the “core genome”), corresponding to 649 genes in B. cenocepacia. Of these 649 genes, 454 had homologs in the Database of Essential Genes (DEG). However, both computational gene conservation analysis and experimental methods that use lower mutation rates per gene (upon which much of the DEG is based) are likely to overestimate the number of essential genes.
By using an ortholog rescue strategy for insoluble or difficult-to-crystallize targets, we increased our structural coverage of B. thailandensis essential genes from 31/406 (7.6%) to 49/406 (12.1%) (Table S2). Such an approach has been used previously in high-throughput structure determination efforts to improve the overall gene-to-structure efficiency for closely related protein sequences. In Plasmodium, the ortholog rescue approach improved the protein solubility rate to 229/468 target genes (49%), resulting in 32 structures (6.8%). SSGCID has also improved the gene-to-structure rate from 11% for Mycobacterium tuberculosis targets to 36% by using orthologs from nine other Mycobacterium species [manuscript in preparation]. However, the underlying rationale for this approach, that ortholog structures are sufficiently similar to serve as surrogates in drug design, has rarely been verified with experimental data. For the seven pairs of ortholog structures (with no bound ligand) solved in this study, the average overall Cα RMSD was 1.5 ± 0.5 Å (Table S4), indicating a high degree of structural similarity. This structural similarity suggests that the ortholog approach is an efficient method to obtain usable structures from otherwise intractable targets, thereby lowering the barrier to structure-based drug design targeting infectious organisms. Ortholog structures may also be useful in designing broad-spectrum antibiotics with cross-species activity and, because they represent a variety of functionally conservative point mutations in the active site, may be useful in developing drugs less susceptible to mutations that cause drug resistance.
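For readers unfamiliar with the metric, the Cα RMSD comparison above can be illustrated with a minimal Python sketch. It assumes that the two structures have already been superposed and that their Cα atoms have been matched residue by residue; both steps are normally done by structure-alignment software and are not shown. The function names are hypothetical, and the use of the sample standard deviation for the ± value is an assumption, since the text does not state which estimator was used.

```python
import math
from statistics import mean, stdev


def ca_rmsd(coords_a, coords_b):
    """RMSD (in the input units, e.g. Å) between matched C-alpha coordinates.

    Assumes both lists hold (x, y, z) tuples for equivalent residues and
    that the structures are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def summarize(rmsd_values):
    """Return (mean, sample standard deviation) over ortholog pairs."""
    return mean(rmsd_values), stdev(rmsd_values)
```

Calling `ca_rmsd` once per ortholog pair and passing the resulting values to `summarize` would give a mean ± SD summary of the kind reported in Table S4.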
Of the 56 Burkholderia protein targets with a solved structure, 25 possess the properties of a potential antimicrobial drug target: they were experimentally identified as an essential gene product or are a close ortholog of one; they are members of a metabolic pathway containing at least two essential enzymes (as listed in the DEG); they possess a deep, druggable pocket large enough to envelop a compound of at least six non-hydrogen atoms; and they lack a close human homolog, reducing the chance of host toxicity. These 25 Burkholderia proteins therefore appear worthy of further validation as drug targets, including chemical validation to determine whether blocking the target affects cell growth and viability in vivo.
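The four selection criteria listed above amount to a simple conjunctive filter, sketched below in Python. The `Target` record and its field names are hypothetical and chosen here for illustration; in particular, reducing pocket depth and druggability to a single heavy-atom capacity, and counting essential enzymes per pathway as an integer, are simplifying assumptions rather than the procedure used in the study.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical annotation record for one solved protein target."""
    name: str
    essential_or_close_ortholog: bool   # essential in the screen, or a close ortholog
    essential_enzymes_in_pathway: int   # essential enzymes in the target's pathway
    pocket_capacity_heavy_atoms: int    # largest ligand (non-hydrogen atoms) the pocket envelops
    has_close_human_homolog: bool


def is_candidate(t, min_pathway_essentials=2, min_pocket_atoms=6):
    """True only if all four criteria from the text are met."""
    return (
        t.essential_or_close_ortholog
        and t.essential_enzymes_in_pathway >= min_pathway_essentials
        and t.pocket_capacity_heavy_atoms >= min_pocket_atoms
        and not t.has_close_human_homolog
    )


def select_candidates(targets):
    """Return the names of the targets that pass every criterion."""
    return [t.name for t in targets if is_candidate(t)]
```

Because the criteria are combined with a logical AND, failing any single one (for example, having a close human homolog) excludes a target regardless of how well it scores on the others.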
We have combined an experimental genome-wide essentiality screen in B. thailandensis, using a high rate of insertions per gene, with high-throughput structure determination and an ortholog rescue approach to achieve substantial structural coverage of essential genes. Using only seven Burkholderia species to select orthologs of essential genes, we solved structures for 49/406 essential gene families, and for 56 total Burkholderia protein targets (including seven ortholog replicates). Of these 56 targets, 25 satisfied the criteria for a potential antimicrobial drug target. By increasing the number of species used to select orthologs, future efforts may come closer to complete coverage of the essential structomes of other infectious organisms. The resulting collection of structures, together with information about target essentiality and solubility, provides a resource for the development of new antibiotics to treat Burkholderia-related infectious diseases.