virtual screening techniques are valuable computational tools for the discovery of new small molecules that can bind to a specific target [Kinnings et al., 2009]. Indeed, computational methods have been integrated into the discovery process for over 50 compounds, including candidates in clinical trials and marketed drugs [Jorgensen, 2004]. The benefits of in silico virtual screening are its speed, enrichment rates, and affordability, which circumvent the often laborious, slow, and expensive process of experimental screening and library synthesis or collection. Moreover, in silico screening explores a much larger chemical space and increases the chance of discovering new compounds with novel chemical scaffolds. It is estimated that up to 20% of new drugs will be found by virtual screening methods in 2010 [Kapetanovic et al., 2008].
There are several considerations when performing virtual screening, including the choice between structure-based and ligand-based approaches, the in silico chemical library, and the computational resources required for the screening studies [Dror et al., 2009]. Structure-based virtual screening methods require a structure of the target, which is usually obtained by high-resolution NMR, X-ray crystallography, or homology modeling. Many online resources provide such structures, including the RCSB Protein Data Bank (PDB) and the Nucleic Acid Database (NDB). These noncommercial databases are useful because the structures can be downloaded directly and prepared for virtual screening experiments. Structure-based virtual screening uses software to screen millions of compounds and determine how well each compound can fit into an active site, or inhibit a partner interaction, on the three-dimensional target of interest [Zoete et al., 2009]. Structure-based virtual screening by molecular docking involves two steps: “docking” of the compounds to the target to determine how well each small molecule fits the target site, and “scoring” of the poses to determine which of the top poses produced by the software is “correct” [Rester, 2008]. The “scoring” and ranking of the top poses of each ligand in the binding pocket of the target is one of the most challenging aspects of docking [Kinnings et al., 2009]. Molecular docking with programs such as DOCK, Autodock, Ludi, FlexX and Surflex-Dock has been used to find many lead molecules against a variety of targets, such as thymidylate synthase, retinoic acid receptor, kinases, estrogen receptor and thrombin [Ekins et al., 2007; Schapira et al., 2000; Tondi et al., 1999; Baxter et al., 2000]. The use of molecular docking appears well entrenched in academia and industry, and its use will likely increase as virtual databases of small molecules and drug targets continue to expand.
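The two-step "dock, then score" workflow described above can be illustrated with a minimal sketch. It assumes the poses and their scores have already been produced by some docking program. The `Pose` record, the function names, and the `lower_is_better` switch are hypothetical and do not correspond to the interface of any of the packages named in the text. Score conventions differ between programs, so the direction of "better" is left as a parameter.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Pose:
    """One docked pose of one ligand (hypothetical record, not a real file format)."""
    ligand_id: str
    pose_id: int
    score: float


def _sort_key(pose: Pose, lower_is_better: bool) -> float:
    # Map every scoring convention onto "smaller key = better pose".
    return pose.score if lower_is_better else -pose.score


def best_pose_per_ligand(poses: Iterable[Pose],
                         lower_is_better: bool = True) -> Dict[str, Pose]:
    """Scoring step: keep only the top-scoring pose of each ligand."""
    best: Dict[str, Pose] = {}
    for pose in poses:
        current = best.get(pose.ligand_id)
        if current is None or (_sort_key(pose, lower_is_better)
                               < _sort_key(current, lower_is_better)):
            best[pose.ligand_id] = pose
    return best


def rank_ligands(poses: Iterable[Pose], top_n: int,
                 lower_is_better: bool = True) -> List[Pose]:
    """Ranking step: order ligands by their best pose and return the top_n."""
    best = best_pose_per_ligand(poses, lower_is_better)
    ordered = sorted(best.values(), key=lambda p: _sort_key(p, lower_is_better))
    return ordered[:top_n]


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    example = [
        Pose("lig_a", 0, -7.2), Pose("lig_a", 1, -8.1),
        Pose("lig_b", 0, -6.5),
        Pose("lig_c", 0, -9.0), Pose("lig_c", 1, -4.0),
    ]
    for pose in rank_ligands(example, top_n=2):
        print(pose.ligand_id, pose.pose_id, pose.score)
```

The sketch treats the score as given. In practice the difficulty noted in the text lies in the scoring function itself, which this example does not attempt to model.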
A second type of virtual screening is the ligand-based approach which, in contrast to structure-based virtual screening, requires knowledge only of the structure of a biologically active ligand. The structure of the active compound is compared to millions of other chemical compounds to check for chemical and morphological similarity. If the structure of the test compound is similar to that of the known active compound, then the test compound may possess similar biological activity [Rester, 2008]. If multiple small molecules are known to possess similar biological activity, a “pharmacophore” can be constructed that describes the chemical properties that are necessary for a ligand to interact with its target. This “pharmacophore” modeling can be particularly useful for detecting a wide range of compounds with diverse chemical features [Dror et al., 2005]. One consideration with ligand-based virtual screening is that it does not require knowledge of the structure of the target. This can be advantageous, particularly when controversy surrounds which target structure is “correct” because different methods (NMR versus X-ray crystallography versus the in vivo state) were used to determine the structure. On the other hand, it is disadvantageous in that important interactions of the active compound with the target may not be effectively visualized and assessed, because they are not represented in the small molecule(s) alone. Ligand-based virtual screening has gained increased acceptance as a method to search for derivatives of known biologically active compounds. This approach has also been used to enrich databases for possible selection of lead compounds [Rester, 2008]. Programs such as FlexS and Surflex-Sim have previously been used with success for ligand-similarity based searches [Dror et al., 2005].
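The idea of comparing a known active compound against a library can be sketched with a deliberately simple stand-in technique: the Tanimoto coefficient on binary structural fingerprints, each represented here as a set of "on" bit indices. This is not the three-dimensional or morphological comparison performed by the programs named above. The fingerprints, compound names, and the similarity threshold are hypothetical and serve only to illustrate the workflow.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(query: Fingerprint,
                      library: Dict[str, Fingerprint],
                      threshold: float) -> List[Tuple[str, float]]:
    """Return library compounds at least `threshold` similar to the query, best first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, sim) for name, sim in scored if sim >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Made-up fingerprints for a known active compound and three library compounds.
    active = frozenset({1, 2, 3, 4})
    library = {
        "cmpd_x": frozenset({1, 2, 3, 4, 5}),
        "cmpd_y": frozenset({1, 9}),
        "cmpd_z": frozenset({1, 2, 3, 4}),
    }
    for name, sim in similarity_search(active, library, threshold=0.5):
        print(f"{name}: {sim:.2f}")
```

Any similarity measure could be substituted for `tanimoto` without changing the surrounding search logic, which is the part that mirrors the screening procedure described in the text.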
Another aspect of virtual screening is the selection of a database of small molecules for screening experiments. The number of compounds for virtual screening has increased dramatically in recent years, with tens of millions of compounds currently available in multiple databases [Rester, 2008]. In our own experience, one of the ZINC databases used for virtual screening experiments increased from approximately 2.7 million compounds in 2007 to approximately 11.3 million compounds in 2010, the majority of which are available from vendors worldwide. Other databases contain even more compounds, such as PubChem, which had an estimated 37 million compounds in 2009. The value of such large compound databases is the expanded chemical space that can be explored. This vastly increases the number of small molecules considered as possible lead candidates, which compares favorably with the relatively few molecules that are evaluated by actual chemical synthesis and other drug discovery techniques. Additionally, many of the in silico libraries have been filtered based on specific criteria to increase the chance that the molecules are “drug-like” in behavior. One such filter is “Lipinski’s Rule of 5”, which was derived from a structural and statistical analysis of a large library of drugs that were either currently marketed or in clinical trials. The “Rule of 5” characterizes a small molecule as “drug-like” if it has ≤5 hydrogen bond donors, ≤10 hydrogen bond acceptors, a molecular weight of ≤500 daltons, and an octanol-water partition coefficient (Log P) of ≤5 [Lipinski et al., 2001]. Taken in total, virtual screening against large databases of compounds is a way to discover new lead candidate small molecules against a target of interest.
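The four “Rule of 5” criteria as stated above translate directly into a library filter. The following is a minimal sketch that assumes the four descriptors have already been computed for each compound by some cheminformatics tool. The `Descriptors` record, the function names, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties of one compound (hypothetical record)."""
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: float  # daltons
    log_p: float             # octanol-water partition coefficient


def passes_rule_of_5(d: Descriptors) -> bool:
    """True if all four criteria, as stated in the text, are satisfied."""
    return (d.h_bond_donors <= 5
            and d.h_bond_acceptors <= 10
            and d.molecular_weight <= 500.0
            and d.log_p <= 5.0)


def filter_library(library: Dict[str, Descriptors]) -> List[str]:
    """Return the names of the compounds that pass the filter, in input order."""
    return [name for name, d in library.items() if passes_rule_of_5(d)]


if __name__ == "__main__":
    # Made-up descriptor values, for illustration only.
    library = {
        "cmpd_small": Descriptors(2, 5, 320.4, 2.1),
        "cmpd_heavy": Descriptors(3, 8, 712.9, 3.0),
        "cmpd_greasy": Descriptors(1, 4, 410.0, 6.3),
    }
    print(filter_library(library))
```

This version requires every criterion to hold, following the wording in the text. A filter that tolerates a single violation would need only a count of failed criteria in place of the `and` chain.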
Another consideration when performing virtual screening experiments is the computational requirement. Given the sheer size of the in silico libraries and the complexity of the virtual screening software, larger computer resources are becoming essential to perform virtual screening experiments effectively. In collaboration with the Kentucky Dataseam Initiative (http://www.kydataseam.com/index2.htm), we use a “grid” of >10,000 computer processors on a network that can be used for in silico screening of large chemical compound libraries. This grid can reduce computational docking time from decades to a few days and is a useful resource for our current drug discovery efforts. Benchmarking of candidate software is another consideration, because many of the screening software packages can be operated with a multitude of options that can substantially affect docking accuracy and speed. For example, in some recent docking experiments, the software Surflex-Dock was an order of magnitude faster than Autodock even though the software performance (docking accuracy and ranking) was comparable [Holt et al., 2008]. Thus, a researcher should consider the software, the number and speed of available computers, and the in silico library size when performing virtual screening experiments.
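The reduction from decades to days can be checked with a back-of-the-envelope estimate. The sketch below takes the library size (about 11.3 million compounds) and processor count (10,000) from the text. The docking time of 60 seconds per compound and the parallel efficiency are assumptions chosen only for illustration, and the function name is hypothetical.

```python
def docking_wall_time_days(n_compounds: int,
                           seconds_per_compound: float,
                           n_processors: int,
                           parallel_efficiency: float = 1.0) -> float:
    """Estimate wall-clock days to dock a library, one compound per processor at a time.

    parallel_efficiency is the assumed fraction of ideal speed-up actually
    achieved (1.0 = perfect scaling); it is not a measured value.
    """
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    if not 0.0 < parallel_efficiency <= 1.0:
        raise ValueError("parallel_efficiency must be in (0, 1]")
    cpu_seconds = n_compounds * seconds_per_compound
    wall_seconds = cpu_seconds / (n_processors * parallel_efficiency)
    return wall_seconds / 86_400.0  # seconds per day


if __name__ == "__main__":
    library_size = 11_300_000   # from the text
    per_compound = 60.0         # assumed seconds per compound
    single = docking_wall_time_days(library_size, per_compound, 1)
    grid = docking_wall_time_days(library_size, per_compound, 10_000,
                                  parallel_efficiency=0.25)  # assumed efficiency
    print(f"single processor: {single / 365.0:.1f} years")
    print(f"10,000-processor grid: {grid:.1f} days")
```

Under these assumptions a single processor would need roughly two decades, while the grid would finish in a few days even at modest efficiency. Because the estimate scales linearly with the per-compound time, an order-of-magnitude speed difference between programs changes the result by the same factor.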
While the use of virtual screening for the discovery of new ligands that target proteins is well established, very few studies have been performed with nucleic acids [Detering et al., 2004; Hurley, 2001]. This may be partly because almost all virtual screening software has been parameterized for proteins and may not account for characteristics that are particularly important to nucleic acids, such as their distinct geometrical symmetry and the electrostatic effects of the phosphate backbone. Moreover, there are few published reports of the use of these programs to target nucleic acids [Chen et al., 1997; Evans et al., 2006]. The most notable attempt at screening against DNA morphologies is an early study using the first version of DOCK (1.0) by Kuntz, Kollman et al. [Grootenhuis et al., 1994]. It should be noted that DOCK is now five generations further refined (6.4) and that several alternatives, such as Autodock and Surflex, are now available. In their pioneering study [Grootenhuis et al., 1994], Kollman et al. used the DOCK algorithm to study the interaction of 10,000 compounds from the Cambridge Crystallographic Database with A-, B-, and Z-form DNA [Kuntz et al., 1982]. Despite this success, perhaps the greatest gap in knowledge in this area is the lack of a systematic study to determine whether docking software can accurately reproduce known structures of ligands bound to nucleic acids and also predict the binding mechanisms of small molecules to nucleic acids. We recently addressed this deficiency for the software Surflex-Dock and Autodock and demonstrated unambiguously that these packages can reproduce known structures of small molecules bound to DNA [Holt et al., 2008]. In total, this work suggests that in silico approaches can be applied to DNA to discover new small molecules that bind to a specific target with a known binding mechanism.
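Whether a docking program “reproduces” a known ligand-DNA structure is commonly judged by the root-mean-square deviation (RMSD) between the docked pose and the experimentally determined pose. The text does not state which measure was used, so the sketch below is an assumption about how such a check might look, not a description of the cited work. It assumes both poses list the same atoms in the same order and in the same coordinate frame (that of the target), and it ignores symmetry-equivalent atoms. The 2.0 Å cutoff is a widely used convention, not a value taken from the text.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in angstroms


def pose_rmsd(docked: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD between two poses with matched atom order; no superposition is applied."""
    if len(docked) != len(reference) or len(docked) == 0:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = sum((xd - xr) ** 2 + (yd - yr) ** 2 + (zd - zr) ** 2
                  for (xd, yd, zd), (xr, yr, zr) in zip(docked, reference))
    return math.sqrt(squared / len(docked))


def reproduces_reference(docked: Sequence[Coord],
                         reference: Sequence[Coord],
                         cutoff: float = 2.0) -> bool:
    """True if the docked pose lies within `cutoff` angstroms RMSD of the reference."""
    return pose_rmsd(docked, reference) <= cutoff


if __name__ == "__main__":
    # Made-up coordinates: the docked pose is shifted 1 angstrom along z.
    reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
    print(pose_rmsd(docked, reference), reproduces_reference(docked, reference))
```

No superposition step is included because a docked pose and its reference share the frame of the target structure. Aligning the ligand onto itself first would hide exactly the placement errors the check is meant to detect.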