1.  CancerDR: Cancer Drug Resistance Database 
Scientific Reports  2013;3:1445.
Cancer therapies are limited by the development of drug resistance, and mutations in drug targets are one of the main reasons for acquired resistance. Adequate knowledge of these mutations would help in designing effective personalized therapies. Keeping this in mind, we have developed a database, “CancerDR”, which provides information on 148 anti-cancer drugs and their pharmacological profiling across 952 cancer cell lines. CancerDR provides comprehensive information about each drug target, including (i) sequence of natural variants, (ii) mutations, (iii) tertiary structure, and (iv) alignment profile of mutants/variants. A number of web-based tools have been integrated into CancerDR. This database will be very useful for identifying genetic alterations in genes encoding drug targets and, in turn, the residues responsible for drug resistance. CancerDR allows users to identify promiscuous drug molecules that can kill a wide range of cancer cells. CancerDR is freely accessible at http://crdd.osdd.net/raghava/cancerdr/
doi:10.1038/srep01445
PMCID: PMC3595698  PMID: 23486013
2.  Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information 
BMC Bioinformatics  2013;14:44.
Background
Vitamins are important cofactors in various enzymatic reactions. In the past, many inhibitors have been designed against vitamin-binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin-interacting residues in a protein. Vitamin-binding pockets can be detected on a protein if its tertiary structure is known. Unfortunately, tertiary structures are available for only a limited number of proteins. Therefore, it is important to develop in silico models for predicting vitamin-interacting residues in a protein from its primary structure.
Results
In this study, we first compared the protein-interacting residues of vitamins with those of other ligands using Two Sample Logo (TSL). ATP, GTP, NAD, FAD and mannose preferred {G,R,K,S,H}, {G,K,T,S,D,N}, {T,G,Y}, {G,Y,W} and {Y,D,W,N,E} residues respectively, whereas vitamins preferred {Y,F,S,W,T,G,H} residues for interaction with proteins. Furthermore, compositional information on preferred and non-preferred residues, along with pattern specificity, was also observed within different vitamin classes. Vitamins A, B and B6 preferred {F,I,W,Y,L,V}, {S,Y,G,T,H,W,N,E} and {S,T,G,H,Y,N} interacting residues respectively. This suggested that the protein-binding patterns of vitamins differ from those of other ligands, and motivated us to develop separate predictors for vitamins and their sub-classes. Four prediction modules have been developed: (i) vitamin interacting residues (VIRs), (ii) vitamin-A interacting residues (VAIRs), (iii) vitamin-B interacting residues (VBIRs) and (iv) pyridoxal-5-phosphate (vitamin B6) interacting residues (PLPIRs). We applied various machine learning classifiers, including SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest and IBk, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences. Finally, we selected the best-performing SVM modules and obtained the highest MCC values of 0.53, 0.48, 0.61 and 0.81 for VIRs, VAIRs, VBIRs and PLPIRs respectively, using PSSM-based evolutionary information. All the modules developed in this study were trained and tested on non-redundant datasets and evaluated using five-fold cross-validation. Performance was also evaluated on balanced and on different independent datasets.
Conclusions
This study demonstrates that it is possible to predict VIRs, VAIRs, VBIRs and PLPIRs from the evolutionary information of a protein sequence. To serve the scientific community, we have developed a web server and standalone software, VitaPred (http://crdd.osdd.net/raghava/vitapred/).
doi:10.1186/1471-2105-14-44
PMCID: PMC3577447  PMID: 23387468
Vitamin-interacting residue; Pyridoxal-5-phosphate; SVM; PSSM; VitaPred
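The VitaPred abstract mentions binary features computed on sequence patterns. As a rough illustration only, and not the authors' code, the sketch below cuts a protein sequence into fixed-length windows centred on each residue and one-hot encodes every window. The window size, the padding character `X`, and the function names `windows` and `binary_profile` are all assumptions made for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for window positions beyond the termini


def windows(seq: str, size: int) -> list[str]:
    """Return one window of odd length `size` centred on each residue of `seq`."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so that a central residue exists")
    half = size // 2
    padded = PAD * half + seq.upper() + PAD * half
    return [padded[i:i + size] for i in range(len(seq))]


def binary_profile(pattern: str) -> list[int]:
    """One-hot encode a window: 21 bits per position (20 amino acids plus padding)."""
    alphabet = AMINO_ACIDS + PAD
    vector = []
    for residue in pattern:
        # A residue outside the alphabet yields an all-zero block for that position.
        vector.extend(1 if residue == symbol else 0 for symbol in alphabet)
    return vector
```

Each window would then be labelled by whether its central residue interacts with the ligand, and the resulting vectors passed to a classifier of choice.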
3.  NPACT: Naturally Occurring Plant-based Anti-cancer Compound-Activity-Target database 
Nucleic Acids Research  2012;41(D1):D1124-D1129.
Plant-derived molecules have been highly valued by biomedical researchers and pharmaceutical companies for developing drugs, as they are thought to be optimized during evolution. Therefore, we have collected and compiled a central resource, the Naturally Occurring Plant-based Anti-cancer Compound-Activity-Target database (NPACT, http://crdd.osdd.net/raghava/npact/), that gathers information on experimentally validated plant-derived natural compounds exhibiting anti-cancerous activity (in vitro and in vivo), to complement other databases. It currently contains 1574 compound entries, and each record provides information on the compound's structure, manually curated published data on in vitro and in vivo experiments along with references for users' referral, inhibitory values (IC50/ED50/EC50/GI50), properties (physical, elemental and topological), cancer types, cell lines, protein targets, commercial suppliers and drug likeness. NPACT can easily be browsed or queried using various options, and an online similarity tool has also been made available. Further, to facilitate retrieval of existing data, each record is hyperlinked to similar databases such as SuperNatural, Herbal Ingredients’ Targets, Comparative Toxicogenomics Database, PubChem and NCI-60 GI50 data.
doi:10.1093/nar/gks1047
PMCID: PMC3531140  PMID: 23203877
4.  Predicting Turns in Proteins with a Unified Model 
PLoS ONE  2012;7(11):e48389.
Motivation
Turns are a critical element of protein structure; they play a crucial role in loops, folds, and interactions. Current methods are well developed for the prediction of individual turn types, including α-turns, β-turns, and γ-turns. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously.
Results
In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP uses predicted secondary structures and predicted shape strings, both obtained with greater accuracy from methods developed by our group. Sequence and structural evolution features (profiles of sequence, of secondary structures and of shape strings) are then generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded most state-of-the-art predictors of individual turn types. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests, and the results were outstanding for turn prediction and confirmed the good performance of TurnP for practical applications.
doi:10.1371/journal.pone.0048389
PMCID: PMC3492357  PMID: 23144872
5.  GlycoPP: A Webserver for Prediction of N- and O-Glycosites in Prokaryotic Protein Sequences 
PLoS ONE  2012;7(7):e40155.
Glycosylation is one of the most abundant post-translational modifications (PTMs) required for various structure/function modulations of proteins in a living cell. Although elucidated only recently in prokaryotes, this type of PTM is present across all three domains of life. In prokaryotes, two types of protein-glycan linkages are more widespread, namely N-linked, where a glycan moiety is attached to the amide group of Asn, and O-linked, where a glycan moiety is attached to the hydroxyl group of Ser/Thr/Tyr. Because of their biologically ubiquitous nature, significance, and technology applications, the study of prokaryotic glycoproteins is a fast-emerging area of research. Here we describe new Support Vector Machine (SVM) based algorithms (models) developed for predicting glycosylated residues (glycosites) with high accuracy in prokaryotic protein sequences. The models are based on binary profile of patterns, composition profile of patterns, and position-specific scoring matrix profile of patterns as training features. The study employs an extensive dataset of 107 N-linked and 116 O-linked glycosites extracted from 59 experimentally characterized glycoproteins of prokaryotes. This dataset includes validated N-glycosites from phyla Crenarchaeota, Euryarchaeota (domain Archaea), Proteobacteria (domain Bacteria) and validated O-glycosites from phyla Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria (domain Bacteria). In view of the current understanding that glycosylation occurs on folded proteins in bacteria, hybrid models have been developed using information on predicted secondary structures and accessible surface area in various combinations with training features. Using these models, N-glycosites and O-glycosites could be predicted with an accuracy of 82.71% (MCC 0.65) and 73.71% (MCC 0.48), respectively. An evaluation of the best-performing models with 28 independent prokaryotic glycoproteins confirms the suitability of these models for predicting N- and O-glycosites in potential glycoproteins from the aforementioned organisms with reasonably high confidence. A web server, GlycoPP, implementing these models is freely available at http://www.imtech.res.in/raghava/glycopp/.
doi:10.1371/journal.pone.0040155
PMCID: PMC3392279  PMID: 22808107
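The GlycoPP abstract states that N-linked glycans attach to Asn and O-linked glycans to Ser/Thr/Tyr. A minimal sketch of the first step such a predictor would need, listing candidate residues before any model scores them, could look like this. The function name `candidate_glycosites` and the returned dictionary layout are assumptions for illustration and are not taken from GlycoPP itself.

```python
def candidate_glycosites(seq: str) -> dict[str, list[int]]:
    """Return 1-based positions of residues that could carry N- or O-linked glycans."""
    seq = seq.upper()
    return {
        # N-linked: glycan attached to the amide group of Asn
        "N": [i + 1 for i, residue in enumerate(seq) if residue == "N"],
        # O-linked: glycan attached to the hydroxyl group of Ser, Thr or Tyr
        "O": [i + 1 for i, residue in enumerate(seq) if residue in "STY"],
    }
```

Only a fraction of these candidates are actual glycosites, which is why a trained classifier is still required.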
6.  TumorHoPe: A Database of Tumor Homing Peptides 
PLoS ONE  2012;7(4):e35187.
Background
Cancer is responsible for millions of premature deaths every year and is an economic burden on developing countries. One of the major challenges at present is to design drugs that specifically target tumor cells but not normal cells. In this context, tumor homing peptides have drawn much attention. These peptides play a vital role in delivering drugs to tumor tissues with high specificity. To serve the scientific community, we have developed a database of tumor homing peptides called TumorHoPe.
Description
TumorHoPe is a manually curated database of experimentally validated tumor homing peptides that specifically recognize tumor cells and the tumor-associated microenvironment, i.e., angiogenesis. These peptides were collected and compiled from published papers, patents and databases. The current release of TumorHoPe contains 744 peptides. Each entry provides comprehensive information on a peptide, including its sequence, target tumor, target cell, techniques of identification, peptide receptor, etc. In addition, we have derived various types of information from these peptide sequences, including secondary/tertiary structure, amino acid composition, and physicochemical properties. Peptides in this database have been found to target different types of tumors, including breast, lung, prostate, melanoma, colon, etc. These peptides share some common motifs, including RGD (Arg-Gly-Asp) and NGR (Asn-Gly-Arg), which specifically recognize tumor angiogenic markers. TumorHoPe has been integrated with many web-based tools such as simple/complex search, database browsing and peptide mapping. These tools allow a user to search tumor homing peptides based on their amino acid composition, charge, polarity, hydrophobicity, etc.
Conclusion
TumorHoPe is a unique database of its kind, which provides comprehensive information about experimentally validated tumor homing peptides and their target cells. This database will be very useful in designing peptide-based drugs and drug-delivery systems. It is freely available at http://crdd.osdd.net/raghava/tumorhope/.
doi:10.1371/journal.pone.0035187
PMCID: PMC3327652  PMID: 22523575
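The TumorHoPe abstract highlights the RGD and NGR motifs. A small sketch of how such motifs could be located in a peptide sequence is shown below. It is not the database's own search tool; the function name `find_motifs` and the output format are assumptions.

```python
import re

MOTIFS = ("RGD", "NGR")  # motifs named in the abstract


def find_motifs(peptide: str) -> list[tuple[str, int]]:
    """Return (motif, 1-based start position) for every motif occurrence."""
    peptide = peptide.upper()
    hits = []
    for motif in MOTIFS:
        # A lookahead match lets overlapping occurrences be reported as well.
        for match in re.finditer(f"(?={motif})", peptide):
            hits.append((motif, match.start() + 1))
    hits.sort(key=lambda hit: hit[1])
    return hits
```

For example, `find_motifs("CNGRCRGD")` reports NGR at position 2 and RGD at position 6.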
7.  PolysacDB: A Database of Microbial Polysaccharide Antigens and Their Antibodies 
PLoS ONE  2012;7(4):e34613.
Vaccines based on microbial cell surface polysaccharides have long been considered an attractive means to control infectious diseases. To realize this goal, detailed systematic information about the antigenic polysaccharide is necessary. However, only a few databases that provide limited knowledge in this area are available. This paper describes PolysacDB, a manually curated database of antigenic polysaccharides. We collected and compiled comprehensive information from literature and web resources about antigenic polysaccharides of microbial origin. The current version of the database has 1,554 entries of 149 different antigenic polysaccharides from 347 different microbes. Each entry provides comprehensive information about an antigenic polysaccharide, i.e., its origin, function, protocols for its conjugation to carriers, antibodies produced, details of assay systems, specificities of antibodies, proposed epitopes involved and antibody utilities. For the user's convenience, we have integrated a web interface for searching, advanced searching and browsing data in the database. This database will be useful for researchers working on polysaccharide-based vaccines. It is freely available from the URL: http://crdd.osdd.net/raghava/polysacdb/.
doi:10.1371/journal.pone.0034613
PMCID: PMC3324500  PMID: 22509333
8.  ccPDB: compilation and creation of data sets from Protein Data Bank 
Nucleic Acids Research  2011;40(D1):D486-D489.
ccPDB (http://crdd.osdd.net/raghava/ccpdb/) is a database of data sets compiled from the literature and Protein Data Bank (PDB). First, we collected and compiled data sets from the literature used for developing bioinformatics methods to annotate the structure and function of proteins. Second, data sets were derived from the latest release of PDB using standard protocols. Third, we developed a powerful module for creating a wide range of customized data sets from the current release of PDB. This is a flexible module that allows users to create data sets using a simple six-step procedure. In addition, a number of web services have been integrated into ccPDB, including submission of jobs on PDB-based servers, annotation of protein structures and generation of patterns. This database maintains >30 types of data sets such as secondary structure, tight turns, nucleotide-interacting residues, metal-interacting residues, DNA/RNA-binding residues and so on.
doi:10.1093/nar/gkr1150
PMCID: PMC3245168  PMID: 22139939
9.  ProGlycProt: a repository of experimentally characterized prokaryotic glycoproteins 
Nucleic Acids Research  2011;40(D1):D388-D393.
ProGlycProt (http://www.proglycprot.org/) is an open access, manually curated, comprehensive repository of bacterial and archaeal glycoproteins with at least one experimentally validated glycosite (glycosylated residue). To facilitate maximum information at one point, the database is arranged under two sections: (i) ProCGP—the main data section consisting of 95 entries with experimentally characterized glycosites and (ii) ProUGP—a supplementary data section containing 245 entries with experimentally identified glycosylation but uncharacterized glycosites. Every entry in the database is fully cross-referenced and enriched with available published information about source organism, coding gene, protein, glycosites, glycosylation type, attached glycan, associated oligosaccharyl/glycosyl transferases (OSTs/GTs), supporting references, and applicable additional information. Interestingly, ProGlycProt contains as many as 174 entries for which information is unavailable or the characterized glycosites are unannotated in Swiss-Prot release 2011_07. The website supports a dedicated structure gallery of homology models and crystal structures of characterized glycoproteins in addition to two new tools developed in view of emerging information about prokaryotic sequons (conserved sequences of amino acids around glycosites) that are never or rarely seen in eukaryotic glycoproteins. ProGlycProt provides an extensive compilation of experimentally identified glycosites (334) and glycoproteins (340) of prokaryotes that could serve as an information resource for research and technology applications in glycobiology.
doi:10.1093/nar/gkr911
PMCID: PMC3245024  PMID: 22039152
10.  HIVsirDB: A Database of HIV Inhibiting siRNAs 
PLoS ONE  2011;6(10):e25917.
Background
Human immunodeficiency virus (HIV) is responsible for millions of deaths every year. The current treatment involves the use of multiple antiretroviral agents that may harm patients due to their toxic nature. RNA interference (RNAi), a potent candidate for the future treatment of HIV, uses short interfering RNA (siRNA/shRNA) for silencing HIV genes. In this study, we have created HIVsirDB, a database of siRNAs responsible for silencing HIV genes.
Descriptions
HIVsirDB is a manually curated database of HIV inhibiting siRNAs that provides comprehensive information about each siRNA or shRNA. Information was collected and compiled from literature and public resources. This database contains around 750 siRNAs, including 75 partially complementary siRNAs differing by one or more bases from the target sites and over 100 escape mutant sequences. The HIVsirDB structure contains sixteen fields, including siRNA sequence, HIV strain, targeted genome region, efficacy and conservation of target sequences. To assist users, several tools have been integrated into this database, including: (i) siRNAmap for mapping siRNAs on a target sequence, (ii) HIVsirblast for BLAST search against the database, and (iii) siRNAalign for aligning siRNAs.
Conclusion
HIVsirDB is a freely accessible database of siRNAs that can silence or degrade HIV genes. It covers 26 types of HIV strains and 28 cell types. This database will be very useful for developing models for predicting the efficacy of HIV inhibiting siRNAs. In summary, this is a useful resource for researchers working in the field of siRNA-based HIV therapy. HIVsirDB is accessible at http://crdd.osdd.net/raghava/hivsir/.
doi:10.1371/journal.pone.0025917
PMCID: PMC3191155  PMID: 22022467
11.  Identification of Mannose Interacting Residues Using Local Composition 
PLoS ONE  2011;6(9):e24039.
Background
Mannose binding proteins (MBPs) play a vital role in several biological functions such as defense mechanisms. These proteins bind to mannose on the surface of a wide range of pathogens and help in eliminating these pathogens from our body. Thus, it is important to identify mannose interacting residues (MIRs) in order to understand the mechanism by which MBPs recognize pathogens.
Results
This paper describes modules developed for predicting MIRs in a protein. Support vector machine (SVM) based models have been developed on 120 mannose binding protein chains, where no two chains have more than 25% sequence similarity. SVM models were developed on two types of datasets: (1) a main dataset consisting of 1029 mannose interacting and 1029 non-interacting residues, and (2) a realistic dataset consisting of 1029 mannose interacting and 10320 non-interacting residues. First, we developed standard modules using binary and PSSM profiles of patterns and obtained a maximum MCC of around 0.32. Second, we developed SVM modules using the composition profile of patterns and achieved a maximum MCC of around 0.74 with an accuracy of 86.64% on the main dataset. Third, we developed a model on the realistic dataset and achieved a maximum MCC of 0.62 with an accuracy of 93.08%. Based on this study, a standalone program and web server have been developed for predicting mannose interacting residues in proteins (http://www.imtech.res.in/raghava/premier/).
Conclusions
Compositional analysis of mannose interacting and non-interacting residues shows that certain types of residues are preferred in mannose interaction. It was also observed that the neighbourhood of mannose interacting residues is enriched in certain types of residues. The composition of patterns/peptides/segments was used for predicting MIRs and achieved reasonably high accuracy. This novel strategy may also be effective for predicting other types of interacting residues. This study will be useful in annotating the function of proteins as well as in understanding the role of mannose in the immune system.
doi:10.1371/journal.pone.0024039
PMCID: PMC3172211  PMID: 21931639
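The abstract above reports both accuracy and the Matthews correlation coefficient (MCC) on a balanced and a realistic (imbalanced) dataset. The sketch below uses the standard MCC definition to show why the two measures can disagree on imbalanced data. The class sizes 1029 and 10320 come from the abstract; the "predict everything as non-interacting" baseline is a hypothetical classifier introduced only for illustration.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical baseline on the realistic dataset: every residue is called
# non-interacting, so all 1029 interacting residues are missed.
baseline = dict(tp=0, tn=10320, fp=0, fn=1029)
baseline_accuracy = accuracy(**baseline)  # high despite finding no true site
baseline_mcc = mcc(**baseline)            # zero: no better than chance
```

The baseline reaches roughly 91% accuracy while its MCC is zero, which is why MCC is usually the more informative figure on the realistic dataset.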
12.  Designing of Highly Effective Complementary and Mismatch siRNAs for Silencing a Gene 
PLoS ONE  2011;6(8):e23443.
In the past, numerous methods have been developed for predicting the efficacy of short interfering RNA (siRNA). However, these methods were developed for predicting the efficacy of fully complementary siRNA against a gene. To the best of the authors' knowledge, no method has been developed for predicting the efficacy of mismatch siRNA against a gene. In this study, a systematic attempt has been made to identify highly effective complementary as well as mismatch siRNAs for silencing a gene.
Support vector machine (SVM) based models have been developed for predicting the efficacy of siRNAs using composition, binary and hybrid patterns of siRNAs. We achieved a maximum correlation of 0.67 between predicted and actual efficacy of siRNAs using the hybrid model. All models were trained and tested on a dataset of 2182 siRNAs, and performance was evaluated using five-fold cross-validation. The performance of our method, desiRm, is comparable to that of other well-known methods. In this study, an attempt has been made for the first time to design mutant siRNAs (mismatch siRNAs). In this approach, we mutated a given siRNA at all possible sites/positions with all possible nucleotides. The efficacy of each mutated siRNA is predicted using desiRm. It is well known from the literature that mismatches between siRNA and target affect silencing efficacy. Thus, we have incorporated rules derived from experimental base-mismatch data to determine the overall efficacy of mutated or mismatch siRNAs. Finally, we developed a web server, desiRm (http://www.imtech.res.in/raghava/desirm/), for designing highly effective siRNAs for silencing a gene. This tool will be helpful in designing siRNAs that degrade the disease isoform of a gene carrying a heterozygous single nucleotide polymorphism without depleting the wild-type protein.
doi:10.1371/journal.pone.0023443
PMCID: PMC3154470  PMID: 21853133
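The desiRm abstract describes mutating a given siRNA at all possible positions with all possible nucleotides. The enumeration step can be sketched as follows, under the assumption that one position is mutated at a time; the abstract does not say whether multi-site mutants are also generated. The function name `single_mismatch_variants` and the tuple layout are illustrative, and the efficacy prediction itself is not reproduced here.

```python
NUCLEOTIDES = "ACGU"


def single_mismatch_variants(sirna: str) -> list[tuple[int, str, str, str]]:
    """Enumerate every single-nucleotide substitution of an siRNA.

    Returns tuples of (1-based position, original base, new base, mutated sequence).
    """
    sirna = sirna.upper()
    if any(base not in NUCLEOTIDES for base in sirna):
        raise ValueError("siRNA must contain only A, C, G and U")
    variants = []
    for pos, original in enumerate(sirna):
        for base in NUCLEOTIDES:
            if base != original:
                mutated = sirna[:pos] + base + sirna[pos + 1:]
                variants.append((pos + 1, original, base, mutated))
    return variants
```

A sequence of length L yields 3 × L variants, so a 19-mer produces 57 candidates to score.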
13.  Bridging Innate and Adaptive Antitumor Immunity Targeting Glycans 
Effective immunotherapy for cancer depends on cellular responses to tumor antigens. The role of major histocompatibility complex (MHC) in T-cell recognition and T-cell receptor repertoire selection has become a central tenet in immunology. Structurally, this does not contradict earlier findings that T-cells can differentiate between small hapten structures like simple glycans. Understanding T-cell recognition of antigens as defined genetically by MHC and combinatorially by T cell receptors led to the “altered self” hypothesis. This notion reflects a more fundamental principle underlying immune surveillance and integrating evolutionarily and mechanistically diverse elements of the immune system. Danger associated molecular patterns, including those generated by glycan remodeling, represent an instance of altered self. A prominent example is the modification of the tumor-associated antigen MUC1. Similar examples emphasize glycan reactivity patterns of antigen receptors as a phenomenon bridging innate and adaptive but also humoral and cellular immunity and providing templates for immunotherapies.
doi:10.1155/2010/354068
PMCID: PMC2896669  PMID: 20617150
14.  CyclinPred: A SVM-Based Method for Predicting Cyclin Protein Sequences 
PLoS ONE  2008;3(7):e2605.
Functional annotation of protein sequences with low similarity to well-characterized protein sequences is a major challenge of computational biology in the post-genomic era. The cyclin protein family is one such important family: its members share low sequence similarity, which makes the discovery of novel cyclins and the establishment of orthologous relationships amongst cyclins a difficult task. The currently identified cyclin motifs and cyclin-associated domains do not represent all of the identified and characterized cyclin sequences. We describe a Support Vector Machine (SVM) based classifier, CyclinPred, which can predict cyclin sequences with high efficiency. The SVM classifier was trained with features of selected cyclin and non-cyclin protein sequences. The training features include amino acid composition, dipeptide composition, secondary structure composition and PSI-BLAST generated Position Specific Scoring Matrix (PSSM) profiles. Results obtained from leave-one-out cross-validation (jackknife test), self-consistency and holdout tests show that the SVM classifier trained with PSSM profile features was more accurate than classifiers based on any of the other features alone or on hybrids of these features. A cyclin prediction server, CyclinPred, has been set up based on the SVM model trained with PSSM profiles. CyclinPred prediction results show that the method may be used as a cyclin prediction tool, complementing conventional cyclin prediction methods.
doi:10.1371/journal.pone.0002605
PMCID: PMC2435623  PMID: 18596929
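Amino acid composition and dipeptide composition, two of the feature types named in the CyclinPred abstract, turn a sequence of any length into a fixed-length vector. The sketch below shows one common way to compute them. Expressing the values as fractions (rather than percentages) and the function names are assumptions, not details taken from the paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> list[float]:
    """20-dimensional vector: fraction of each standard amino acid in `seq`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(residue) / len(seq) for residue in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """400-dimensional vector: fraction of each ordered pair of adjacent residues."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]
```

Because both vectors have a fixed length, sequences of different sizes can be fed to the same SVM.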
15.  PRRDB: A comprehensive database of Pattern-Recognition Receptors and their ligands 
BMC Genomics  2008;9:180.
Background
A number of recent studies have demonstrated that the innate immune system does not merely act as the first line of defense but also provides critical signals for the development of a specific adaptive immune response. The innate immune system employs a set of receptors called pattern recognition receptors (PRRs) that recognize evolutionarily conserved patterns from pathogens called pathogen associated molecular patterns (PAMPs). In order to assist the scientific community, a database, PRRDB, has been developed that provides extensive information about pattern recognition receptors and their ligands.
Results
The current version of the database contains around 500 pattern-recognition receptors from 77 distinct organisms ranging from insects to humans. These include 177 Toll-like receptors, 124 scavenger receptors and 67 nucleotide binding site-leucine-rich repeat receptors. The database also provides information about 266 ligands, including carbohydrates, proteins, nucleic acids, glycolipids, glycoproteins and lipopeptides. A number of web tools have been integrated into PRRDB to provide the following services: i) searching on any field; ii) database browsing; and iii) BLAST search against the pattern-recognition receptors. PRRDB also provides external links to standard databases such as Swiss-Prot and PubMed.
Conclusion
PRRDB is a unique database of its kind, which provides comprehensive information about innate immunity. This database will be very useful in designing effective adjuvants for subunit vaccines and in understanding the role of innate immunity. The database is available from the URLs in the Availability and requirements section.
doi:10.1186/1471-2164-9-180
PMCID: PMC2346480  PMID: 18423032
16.  Predicting Chemical Toxicity Effects Based on Chemical-Chemical Interactions 
PLoS ONE  2013;8(2):e56517.
Toxicity is a major contributor to the high attrition rates of new chemical entities in drug discovery. In this study, an order-classifier was built to predict a series of toxic effects based on data concerning chemical-chemical interactions, under the assumption that interacting compounds are more likely to share similar toxicity profiles. According to their interaction confidence scores, the order from the most likely toxicity to the least was obtained for each compound. Ten test groups, each containing one training dataset and one test dataset, were constructed from a benchmark dataset consisting of 17,233 compounds. By a jackknife test on each of these test groups, the 1st order prediction accuracies of the training dataset and the test dataset were all approximately 79.50%, substantially higher than the rate of 25.43% achieved by random guesses. Encouraged by the promising results, we expect that our method will become a useful tool in screening out drugs with high toxicity.
doi:10.1371/journal.pone.0056517
PMCID: PMC3574107  PMID: 23457578
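The abstract above ranks toxicities for a compound according to interaction confidence scores. One plausible reading is sketched below: each toxicity class is scored by the highest confidence among the query's interacting partners that carry that class, and classes are then sorted by that score. This scoring rule, the data layout, and the compound and toxicity names are assumptions for illustration; the paper may define the ordering differently.

```python
def rank_toxicities(
    query: str,
    interactions: dict[tuple[str, str], float],
    labels: dict[str, set[str]],
) -> list[str]:
    """Order toxicity classes for `query` from most to least likely.

    interactions: (compound_a, compound_b) -> interaction confidence score
    labels: compound -> set of known toxicity classes
    """
    best: dict[str, float] = {}
    for (a, b), score in interactions.items():
        if a == query:
            partner = b
        elif b == query:
            partner = a
        else:
            continue  # this interaction does not involve the query compound
        for toxicity in labels.get(partner, ()):
            best[toxicity] = max(best.get(toxicity, 0.0), score)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(best, key=lambda toxicity: (-best[toxicity], toxicity))
```

The first element of the returned list corresponds to what the abstract calls the 1st order prediction.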
17.  Assessing the Primary Data Hosted by the Spanish Node of the Global Biodiversity Information Facility (GBIF) 
PLoS ONE  2013;8(1):e55144.
In order to effectively understand and cope with the current ‘biodiversity crisis’, having large-enough sets of qualified data is necessary. Information facilitators such as the Global Biodiversity Information Facility (GBIF) are ensuring increasing availability of primary biodiversity records by linking data collections spread over several institutions that have agreed to publish their data in a common access schema. We have assessed the primary records that one such publisher, the Spanish node of GBIF (GBIF.ES), hosts on behalf of a number of institutions, considered to be a highly representative sample of the total mass of available data for a country, in order to determine the quantity and quality of the information made available. Our results may provide an indication of the overall fitness-for-use of these data. We have found a number of patterns in the availability and accrual of data that seem to arise naturally from the digitization processes. Knowing these patterns and features may help in deciding when and how these data can be used. Broadly, the error level seems low. The available data may be of capital importance for the development of biodiversity research, both locally and globally. However, wide swaths of records lack data elements such as georeferencing or taxonomical levels. Although the remaining information is ample and fit for many uses, improving the completeness of the records would likely increase the usability span for these data.
doi:10.1371/journal.pone.0055144
PMCID: PMC3555939  PMID: 23372828
18.  Crumple: A Method for Complete Enumeration of All Possible Pseudoknot-Free RNA Secondary Structures 
PLoS ONE  2012;7(12):e52414.
The diverse landscape of RNA conformational space includes many canyons and crevices that are distant from the lowest minimum free energy valley and remain unexplored by traditional RNA structure prediction methods. A complete description of the entire RNA folding landscape can facilitate identification of biologically important conformations. The Crumple algorithm rapidly enumerates all possible non-pseudoknotted structures for an RNA sequence without consideration of thermodynamics while filtering the output with experimental data. The Crumple algorithm provides an alternative approach to traditional free energy minimization programs for RNA secondary structure prediction. A complete computation of all non-pseudoknotted secondary structures can reveal structures that would not be predicted by methods that sample the RNA folding landscape based on thermodynamic predictions. The free energy minimization approach is often successful but is limited by not considering RNA tertiary and protein interactions and the possibility that kinetics rather than thermodynamics determines the functional RNA fold. Efficient parallel computing and filters based on experimental data make practical the complete enumeration of all non-pseudoknotted structures. Efficient parallel computing for Crumple is implemented in a ring graph approach. Filters for experimental data include constraints from chemical probing of solvent accessibility, enzymatic cleavage of paired or unpaired nucleotides, phylogenetic covariation, and the minimum number and lengths of helices determined from crystallography or cryo-electron microscopy. The minimum number and length of helices has a significant effect on reducing conformational space. Pairing constraints reduce conformational space more than single nucleotide constraints. 
Examples with Alfalfa Mosaic Virus RNA and Trypanosoma brucei guide RNA demonstrate the importance of evaluating all possible structures when pseudoknots, RNA-protein interactions, and metastable structures are important for biological function. Crumple software is freely available at http://adenosine.chem.ou.edu/software.html.
doi:10.1371/journal.pone.0052414
PMCID: PMC3531468  PMID: 23300665
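The enumeration idea in the Crumple abstract above can be illustrated with a minimal sketch. This is not the Crumple implementation: it assumes Watson-Crick and G-U pairs only, assumes a hypothetical minimum hairpin loop of three unpaired nucleotides, and applies none of the experimental filters or parallelism the abstract describes. It simply lists every pseudoknot-free structure in dot-bracket notation, with no thermodynamics involved.

```python
from typing import Iterator

# Assumption: only canonical pairs plus the G-U wobble pair are allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def enumerate_structures(seq: str, min_loop: int = 3) -> Iterator[str]:
    """Yield every pseudoknot-free structure of seq as a dot-bracket string.

    min_loop is an assumed minimum number of unpaired bases in a hairpin.
    """

    def fold(i: int, j: int) -> Iterator[str]:
        # Structures for the subsequence seq[i..j], inclusive.
        if i > j:
            yield ""
            return
        # Case 1: position i is unpaired.
        for rest in fold(i + 1, j):
            yield "." + rest
        # Case 2: position i pairs with some k; the regions inside and
        # after the pair fold independently, so no pseudoknots can arise.
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                for inner in fold(i + 1, k - 1):
                    for outer in fold(k + 1, j):
                        yield "(" + inner + ")" + outer

    yield from fold(0, len(seq) - 1)


if __name__ == "__main__":
    for structure in enumerate_structures("GGAAACC"):
        print(structure)
```

Because each structure is decomposed by the fate of its first nucleotide, every structure is produced exactly once. The number of structures grows very quickly with sequence length, which is why the abstract stresses filtering and parallel computing.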
19.  Screening of Dengue Virus Antiviral Activity of Marine Seaweeds by an In Situ Enzyme-Linked Immunosorbent Assay 
PLoS ONE  2012;7(12):e51089.
Dengue is a significant public health problem worldwide. Despite the important social and clinical impact, there is no vaccine or specific antiviral therapy for prevention and treatment of dengue virus (DENV) infection. Considering the above, drug discovery research for dengue is of utmost importance; in addition, natural marine products provide diverse and novel chemical structures with potent biological activities that must be evaluated. In this study we propose a target-free approach for dengue drug discovery based on a novel, rapid, and economic in situ enzyme-linked immunosorbent assay and the screening of a panel of marine seaweed extracts. The in situ ELISA was standardized and validated for the Huh7.5 cell line infected with all four serotypes of DENV, among them clinical isolates and a laboratory strain. Statistical analysis showed an average S/B of 7.2 and Z-factor of 0.62, demonstrating assay consistency and reliability. A panel of fifteen seaweed extracts was then screened at the maximum non-toxic dose previously determined by the MTT and Neutral Red cytotoxicity assays. Eight seaweed extracts were able to reduce DENV infection of at least one serotype tested. Four extracts (Phaeophyta: Canistrocarpus cervicornis, Padina gymnospora; Rhodophyta: Palisada perforata; Chlorophyta: Caulerpa racemosa) were chosen for further evaluation, and time-of-addition studies indicate that they might act at an early stage of the viral infection cycle, such as binding or internalization.
doi:10.1371/journal.pone.0051089
PMCID: PMC3515490  PMID: 23227238
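The two assay-quality statistics quoted in the dengue screening abstract above, S/B and Z-factor, can be computed from control wells as sketched below. The sketch assumes the usual definitions: S/B as the ratio of mean signal to mean background, and Z-factor as 1 - 3(sd_signal + sd_background) / |mean_signal - mean_background|. The well readings are invented for illustration, and the choice of sample standard deviation is an assumption; the abstract does not say how the authors computed either value.

```python
import statistics
from typing import Sequence


def signal_to_background(signal: Sequence[float], background: Sequence[float]) -> float:
    """Ratio of the mean signal-control reading to the mean background reading."""
    return statistics.mean(signal) / statistics.mean(background)


def z_factor(signal: Sequence[float], background: Sequence[float]) -> float:
    """Z-factor from control wells, using sample standard deviations (assumed)."""
    spread = statistics.stdev(signal) + statistics.stdev(background)
    window = abs(statistics.mean(signal) - statistics.mean(background))
    return 1.0 - 3.0 * spread / window


if __name__ == "__main__":
    # Hypothetical absorbance readings, not data from the study.
    infected = [1.42, 1.38, 1.51, 1.45, 1.40]
    mock = [0.20, 0.22, 0.19, 0.21, 0.18]
    print(f"S/B = {signal_to_background(infected, mock):.2f}")
    print(f"Z-factor = {z_factor(infected, mock):.2f}")
```

A Z-factor between 0.5 and 1 is conventionally read as a well-separated assay, which is consistent with the abstract's interpretation of 0.62.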
20.  GUItars: A GUI Tool for Analysis of High-Throughput RNA Interference Screening Data 
PLoS ONE  2012;7(11):e49386.
Background
High-throughput RNA interference (RNAi) screening has become a widely used approach to elucidating gene functions. However, analysis and annotation of large data sets generated from these screens has been a challenge for researchers without a programming background. Over the years, numerous data analysis methods were produced for plate quality control and hit selection and implemented by a few open-access software packages. Recently, strictly standardized mean difference (SSMD) has become a widely used method for RNAi screening analysis mainly due to its better control of false negative and false positive rates and its ability to quantify RNAi effects with a statistical basis. We have developed GUItars to enable researchers without a programming background to use SSMD as both a plate quality and a hit selection metric to analyze large data sets.
Results
The software is accompanied by an intuitive graphical user interface for easy and rapid analysis workflow. SSMD analysis methods have been provided to the users along with traditionally-used z-score, normalized percent activity, and t-test methods for hit selection. GUItars is capable of analyzing large-scale data sets from screens with or without replicates. The software is designed to automatically generate and save numerous graphical outputs known to be among the most informative high-throughput data visualization tools capturing plate-wise and screen-wise performances. Graphical outputs are also written in HTML format for easy access, and a comprehensive summary of screening results is written into tab-delimited output files.
Conclusion
With GUItars, we demonstrated a robust SSMD-based analysis workflow on a 3840-gene small interfering RNA (siRNA) library and identified 200 siRNAs that increased and 150 siRNAs that decreased the assay activities with moderate to stronger effects. GUItars enables rapid analysis and illustration of data from large- or small-scale RNAi screens using SSMD and other traditional analysis methods. The software is freely available at http://sourceforge.net/projects/guitars/.
doi:10.1371/journal.pone.0049386
PMCID: PMC3502531  PMID: 23185323
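For readers unfamiliar with the SSMD metric named in the GUItars abstract above, the following is a minimal sketch of its basic form for two independent groups: the difference of the group means divided by the square root of the sum of the group variances. It is not taken from GUItars; the function name, the use of sample variances, and the example readings are assumptions, and the estimators GUItars applies to screens with or without replicates may differ.

```python
import math
import statistics
from typing import Sequence


def ssmd(sample: Sequence[float], reference: Sequence[float]) -> float:
    """Strictly standardized mean difference between two independent groups.

    Uses sample means and sample variances (a simple method-of-moments
    estimate, assumed here for illustration).
    """
    mean_diff = statistics.mean(sample) - statistics.mean(reference)
    pooled_spread = math.sqrt(statistics.variance(sample) + statistics.variance(reference))
    return mean_diff / pooled_spread


if __name__ == "__main__":
    # Hypothetical plate controls, not data from the study.
    positive_control = [95.0, 102.0, 98.0, 101.0]
    negative_control = [20.0, 25.0, 22.0, 19.0]
    print(f"SSMD = {ssmd(positive_control, negative_control):.2f}")
```

The same function can serve as a plate-quality metric (positive versus negative controls) or as a hit-selection score (an siRNA's wells versus the negative reference), which matches the two uses of SSMD the abstract mentions.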
21.  GenoSets: Visual Analytic Methods for Comparative Genomics 
PLoS ONE  2012;7(10):e46401.
Many important questions in biology are, fundamentally, comparative, and this extends to our analysis of a growing number of sequenced genomes. Existing genomic analysis tools are often organized around literal views of genomes as linear strings. Even when information is highly condensed, these views grow cumbersome as larger numbers of genomes are added. Data aggregation and summarization methods from the field of visual analytics can provide abstracted comparative views, suitable for sifting large multi-genome datasets to identify critical similarities and differences. We introduce a software system for visual analysis of comparative genomics data. The system automates the process of data integration, and provides the analysis platform to identify and explore features of interest within these large datasets. GenoSets borrows techniques from business intelligence and visual analytics to provide a rich interface of interactive visualizations supported by a multi-dimensional data warehouse. In GenoSets, visual analytic approaches are used to enable querying based on orthology, functional assignment, and taxonomic or user-defined groupings of genomes. GenoSets links this information together with coordinated, interactive visualizations for both detailed and high-level categorical analysis of summarized data. GenoSets has been designed to simplify the exploration of multiple genome datasets and to facilitate reasoning about genomic comparisons. Case examples are included showing the use of this system in the analysis of 12 Brucella genomes. GenoSets software and the case study dataset are freely available at http://genosets.uncc.edu. We demonstrate that the integration of genomic data using a coordinated multiple view approach can simplify the exploration of large comparative genomic data sets, and facilitate reasoning about comparisons and features of interest.
doi:10.1371/journal.pone.0046401
PMCID: PMC3463605  PMID: 23056299
22.  Computational Prediction of Conformational B-Cell Epitopes from Antigen Primary Structures by Ensemble Learning 
PLoS ONE  2012;7(8):e43575.
Motivation
Conformational B-cell epitopes are the specific sites on antigens that have immune functions. The identification of conformational B-cell epitopes is of great importance to immunologists for facilitating the design of peptide-based vaccines. To narrow the search for experimental validation, various computational models have been developed for epitope prediction using antigen structures. However, the application of these models is undermined by the limited number of available antigen structures. In contrast to most available structure-based methods, we here attempt to accurately predict conformational B-cell epitopes from antigen sequences.
Methods
In this paper, we explore various sequence-derived features that have been observed to be associated with the location of epitopes or have previously been used in similar tasks. These features are evaluated and ranked by their discriminative performance on the benchmark datasets. From the perspective of information science, a combination of features usually leads to better results than individual features. To build a robust model, we adopt an ensemble learning approach to incorporate various features, and develop an ensemble model to predict conformational epitopes from antigen sequences.
Results
Evaluated by leave-one-out cross-validation, the proposed method achieves mean AUC scores of 0.687 and 0.651 on two datasets compiled from bound structures and unbound structures, respectively. When compared with publicly available servers on an independent dataset, our method yields better or comparable performance. The results demonstrate that the proposed method is useful for sequence-based conformational epitope prediction.
Availability
The web server and datasets are freely available at http://bcell.whu.edu.cn.
doi:10.1371/journal.pone.0043575
PMCID: PMC3424238  PMID: 22927994
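The AUC figures reported in the epitope-prediction abstract above summarize how well predicted scores rank epitope residues above non-epitope residues. The sketch below computes AUC from its rank interpretation: the fraction of (positive, negative) pairs in which the positive scores higher, with ties counted as half. The function name and the example scores are hypothetical, and nothing here reproduces the authors' ensemble model or their cross-validation procedure.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels use 1 for positives (for example, epitope residues) and 0 for negatives.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical per-residue scores for one antigen.
    residue_scores = [0.9, 0.8, 0.3, 0.1]
    residue_labels = [1, 0, 1, 0]
    print(f"AUC = {auc(residue_scores, residue_labels):.2f}")
```

A mean AUC across a dataset would then be the average of this value over the antigens, which is one plausible reading of the "mean AUC scores" in the abstract.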
23.  SIMPLEX: Cloud-Enabled Pipeline for the Comprehensive Analysis of Exome Sequencing Data 
PLoS ONE  2012;7(8):e41948.
In recent studies, exome sequencing has proven to be a successful screening tool for the identification of candidate genes causing rare genetic diseases. Although underlying targeted sequencing methods are well established, necessary data handling and focused, structured analysis still remain demanding tasks. Here, we present a cloud-enabled autonomous analysis pipeline, which comprises the complete exome analysis workflow. The pipeline combines several in-house developed and published applications to perform the following steps: (a) initial quality control, (b) intelligent data filtering and pre-processing, (c) sequence alignment to a reference genome, (d) SNP and DIP detection, (e) functional annotation of variants using different approaches, and (f) detailed report generation during various stages of the workflow. The pipeline connects the selected analysis steps, exposes all available parameters for customized usage, performs required data handling, and distributes computationally expensive tasks either on a dedicated high-performance computing infrastructure or on the Amazon cloud environment (EC2). The presented application has already been used in several research projects including studies to elucidate the role of rare genetic diseases. The pipeline is continuously tested and is publicly available under the GPL as a VirtualBox or Cloud image at http://simplex.i-med.ac.at; additional supplementary data is provided at http://www.icbi.at/exome.
doi:10.1371/journal.pone.0041948
PMCID: PMC3411592  PMID: 22870267
24.  Influenza A Virus Coding Regions Exhibit Host-Specific Global Ordered RNA Structure 
PLoS ONE  2012;7(4):e35989.
Influenza A is a significant public health threat, partially because of its capacity to readily exchange gene segments between different host species to form novel pandemic strains. An understanding of the fundamental factors providing species barriers between different influenza hosts would facilitate identification of strains capable of leading to pandemic outbreaks and could also inform vaccine development. Here, we describe the difference in predicted RNA secondary structure stability that exists between avian, swine and human coding regions. The results predict that global ordered RNA structure exists in influenza A segments 1, 5, 7 and 8, and that ranges of free energies for secondary structure formation differ between host strains. The predicted free energy distributions for strains from avian, swine, and human species suggest criteria for segment reassortment and strains that might be ideal candidates for viral attenuation and vaccine development.
doi:10.1371/journal.pone.0035989
PMCID: PMC3338493  PMID: 22558296
