Search tips
Search criteria

Results 1-6 (6)

Clipboard (0)

Select a Filter Below

more »
Year of Publication
Document Types
1.  Draft Genome Sequence of a Clinical Isolate of Multidrug-Resistant Mycobacterium tuberculosis East African Indian Strain OSDD271 
Genome Announcements  2013;1(4):e00541-13.
We describe the genome sequencing and analysis of a clinical isolate of Mycobacterium tuberculosis East African Indian (EAI) strain OSDD271 from India.
PMCID: PMC3731838  PMID: 23908284
2.  Predictive modeling of anti-malarial molecules inhibiting apicoplast formation 
BMC Bioinformatics  2013;14:55.
Malaria is a major healthcare problem worldwide resulting in an estimated 0.65 million deaths every year. It is caused by the members of the parasite genus Plasmodium. The current therapeutic options for malaria are limited to a few classes of molecules, and are fast shrinking due to the emergence of widespread resistance to drugs in the pathogen. The recent availability of high-throughput phenotypic screen datasets for antimalarial activity offers a possibility to create computational models for bioactivity based on chemical descriptors of molecules with potential to accelerate drug discovery for malaria.
In the present study, we have used high-throughput screen datasets for the discovery of apicoplast inhibitors of the malarial pathogen as assayed from the delayed death response. We employed machine learning approach and developed computational predictive models to predict the biological activity of new antimalarial compounds. The molecules were further evaluated for common substructures using a Maximum Common Substructure (MCS) based approach.
We created computational models using state-of-the-art machine learning algorithms. The models were evaluated based on multiple statistical criteria. We found Random Forest based approach provides for better accuracy as assessed from ROC curve analysis. We further evaluated the active molecules using a substructure based approach to identify common substructures enriched in the active set. We argue that the computational models generated could be effectively used to screen large molecular datasets to prioritize them for phenotypic screens, drastically reducing cost while improving the hit rate.
PMCID: PMC3599641  PMID: 23419172
3.  Computational analysis and predictive modeling of small molecule modulators of microRNA 
MicroRNAs (miRNA) are small endogenously transcribed regulatory RNA which modulates gene expression at a post transcriptional level. These small RNAs have now been shown to be critical regulators in a number of biological processes in the cell including pathophysiology of diseases like cancers. The increasingly evident roles of microRNA in disease processes have also motivated attempts to target them therapeutically. Recently there has been immense interest in understanding small molecule mediated regulation of RNA, including microRNA.
We have used publicly available datasets of high throughput screens on small molecules with potential to inhibit microRNA. We employed computational methods based on chemical descriptors and machine learning to create predictive computational models for biological activity of small molecules. We further used a substructure based approach to understand common substructures potentially contributing to the activity.
We generated computational models based on Naïve Bayes and Random Forest towards mining small RNA binding molecules from large molecular datasets. We complement this with substructure based approach to identify and understand potentially enriched substructures in the active dataset. We use this approach to identify miRNA binding potential of a set of approved drugs, suggesting a probable novel mechanism of off-target activity of these drugs. To the best of our knowledge, this is the first and most comprehensive computational analysis towards understanding RNA binding activities of small molecules and predictive modeling of these activities.
PMCID: PMC3466443  PMID: 22889302
microRNA; Machine learning; Maximum common substructure (MCS)
4.  Computational models for in-vitro anti-tubercular activity of molecules based on high-throughput chemical biology screening datasets 
BMC Pharmacology  2012;12:1.
The emergence of Multi-drug resistant tuberculosis in pandemic proportions throughout the world and the paucity of novel therapeutics for tuberculosis have re-iterated the need to accelerate the discovery of novel molecules with anti-tubercular activity. Though high-throughput screens for anti-tubercular activity are available, they are expensive, tedious and time-consuming to be performed on large scales. Thus, there remains an unmet need to prioritize the molecules that are taken up for biological screens to save on cost and time. Computational methods including Machine Learning have been widely employed to build classifiers for high-throughput virtual screens to prioritize molecules for further analysis. The availability of datasets based on high-throughput biological screens or assays in public domain makes computational methods a plausible proposition for building predictive models. In addition, this approach would save significantly on the cost, effort and time required to run high throughput screens.
We show that by using four supervised state-of-the-art classifiers (SMO, Random Forest, Naive Bayes and J48) we are able to generate in-silico predictive models on an extremely imbalanced (minority class ratio: 0.6%) large dataset of anti-tubercular molecules with reasonable AROC (0.6-0.75) and BCR (60-66%) values. Moreover, these models are able to provide 3-4 fold enrichment over random selection.
In the present study, we have used the data from in-vitro screens for anti-tubercular activity from a high-throughput screen available in public domain to build highly accurate classifiers based on molecular descriptors of the molecules. We show that Machine Learning tools can be used to build highly effective predictive models for virtual high-throughput screens to prioritize molecules from large molecular libraries.
PMCID: PMC3342097  PMID: 22463123
5.  De novo identification of viral pathogens from cell culture hologenomes 
BMC Research Notes  2012;5:11.
Fast, specific identification and surveillance of pathogens is the cornerstone of any outbreak response system, especially in the case of emerging infectious diseases and viral epidemics. This process is generally tedious and time-consuming thus making it ineffective in traditional settings. The added complexity in these situations is the non-availability of pure isolates of pathogens as they are present as mixed genomes or hologenomes. Next-generation sequencing approaches offer an attractive solution in this scenario as it provides adequate depth of sequencing at fast and affordable costs, apart from making it possible to decipher complex interactions between genomes at a scale that was not possible before. The widespread application of next-generation sequencing in this field has been limited by the non-availability of an efficient computational pipeline to systematically analyze data to delineate pathogen genomes from mixed population of genomes or hologenomes.
We applied next-generation sequencing on a sample containing mixed population of genomes from an epidemic with appropriate processing and enrichment. The data was analyzed using an extensive computational pipeline involving mapping to reference genome sets and de-novo assembly. In depth analysis of the data generated revealed the presence of sequences corresponding to Japanese encephalitis virus. The genome of the virus was also independently de-novo assembled. The presence of the virus was in addition, verified using standard molecular biology techniques.
Our approach can accurately identify causative pathogens from cell culture hologenome samples containing mixed population of genomes and in principle can be applied to patient hologenome samples without any background information. This methodology could be widely applied to identify and isolate pathogen genomes and understand their genomic variability during outbreaks.
PMCID: PMC3284880  PMID: 22226071
Epidemics; Mixed Population Genomes; Hologenome; De novo assembly; Japanese encephalitis; Next generation sequencing
6.  Predictive models for anti-tubercular molecules using machine learning on high-throughput biological screening datasets 
BMC Research Notes  2011;4:504.
Tuberculosis is a contagious disease caused by Mycobacterium tuberculosis (Mtb), affecting more than two billion people around the globe and is one of the major causes of morbidity and mortality in the developing world. Recent reports suggest that Mtb has been developing resistance to the widely used anti-tubercular drugs resulting in the emergence and spread of multi drug-resistant (MDR) and extensively drug-resistant (XDR) strains throughout the world. In view of this global epidemic, there is an urgent need to facilitate fast and efficient lead identification methodologies. Target based screening of large compound libraries has been widely used as a fast and efficient approach for lead identification, but is restricted by the knowledge about the target structure. Whole organism screens on the other hand are target-agnostic and have been now widely employed as an alternative for lead identification but they are limited by the time and cost involved in running the screens for large compound libraries. This could be possibly be circumvented by using computational approaches to prioritize molecules for screening programmes.
We utilized physicochemical properties of compounds to train four supervised classifiers (Naïve Bayes, Random Forest, J48 and SMO) on three publicly available bioassay screens of Mtb inhibitors and validated the robustness of the predictive models using various statistical measures.
This study is a comprehensive analysis of high-throughput bioassay data for anti-tubercular activity and the application of machine learning approaches to create target-agnostic predictive models for anti-tubercular agents.
PMCID: PMC3228709  PMID: 22099929

Results 1-6 (6)