PMCC PMCC

Search tips
Search criteria

Advanced
Results 1-25 (939449)

Clipboard (0)
None

Related Articles

1.  The core and unique proteins of haloarchaea 
BMC Genomics  2012;13:39.
Background
Since the first genome of a halophilic archaeon was sequenced in 2000, biologists have been advancing the understanding of genomic characteristics that allow for survival in the harsh natural environments of these organisms. An increase in protein acidity and GC-bias in the genome have been implicated as factors in tolerance to extreme salinity, desiccation, and high solar radiation. However, few previous attempts have been made to identify novel genes that would permit survival in such extreme conditions.
Results
With the recent release of several new complete haloarchaeal genome sequences, we have conducted a comprehensive comparative genomic analysis focusing on the identification of unique haloarchaeal conserved proteins that likely play key roles in environmental adaptation. Using bioinformatic methods, we have clustered 31,312 predicted proteins from nine haloarchaeal genomes into 4,455 haloarchaeal orthologous groups (HOGs). We assigned likely functions by association with established COG and KOG databases in NCBI. After identifying homologs in four additional haloarchaeal genomes, we determined that there were 784 core haloarchaeal protein clusters (cHOGs), of which 83 clusters were found primarily in haloarchaea. Further analysis found that 55 clusters were truly unique (tucHOGs) to haloarchaea and qualify as signature proteins while 28 were nearly unique (nucHOGs), the vast majority of which were coded for on the haloarchaeal chromosomes. Of the signature proteins, only one example with any predicted function, Ral, involved in desiccation/radiation tolerance in Halobacterium sp. NRC-1, was identified. Among the core clusters, 33% was predicted to function in metabolism, 25% in information transfer and storage, 10% in cell processes and signaling, and 22% belong to poorly characterized or general function groups.
Conclusion
Our studies have established conserved groups of nearly 800 protein clusters present in all haloarchaea, with a subset of 55 which are predicted to be accessory proteins that may be critical or essential for success in an extreme environment. These studies support core and signature genes and proteins as valuable concepts for understanding phylogenetic and phenotypic characteristics of coherent groups of organisms.
doi:10.1186/1471-2164-13-39
PMCID: PMC3287961  PMID: 22272718
2.  SoyDB: a knowledge database of soybean transcription factors 
BMC Plant Biology  2010;10:14.
Background
Transcription factors play the crucial rule of regulating gene expression and influence almost all biological processes. Systematically identifying and annotating transcription factors can greatly aid further understanding their functions and mechanisms. In this article, we present SoyDB, a user friendly database containing comprehensive knowledge of soybean transcription factors.
Description
The soybean genome was recently sequenced by the Department of Energy-Joint Genome Institute (DOE-JGI) and is publicly available. Mining of this sequence identified 5,671 soybean genes as putative transcription factors. These genes were comprehensively annotated as an aid to the soybean research community. We developed SoyDB - a knowledge database for all the transcription factors in the soybean genome. The database contains protein sequences, predicted tertiary structures, putative DNA binding sites, domains, homologous templates in the Protein Data Bank (PDB), protein family classifications, multiple sequence alignments, consensus protein sequence motifs, web logo of each family, and web links to the soybean transcription factor database PlantTFDB, known EST sequences, and other general protein databases including Swiss-Prot, Gene Ontology, KEGG, EMBL, TAIR, InterPro, SMART, PROSITE, NCBI, and Pfam. The database can be accessed via an interactive and convenient web server, which supports full-text search, PSI-BLAST sequence search, database browsing by protein family, and automatic classification of a new protein sequence into one of 64 annotated transcription factor families by hidden Markov models.
Conclusions
A comprehensive soybean transcription factor database was constructed and made publicly accessible at http://casp.rnet.missouri.edu/soydb/.
doi:10.1186/1471-2229-10-14
PMCID: PMC2826334  PMID: 20082720
3.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2001;29(1):11-16.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap’99, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheri­tance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
PMCID: PMC29800  PMID: 11125038
4.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2000;28(1):10-14.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval and resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing pages, GeneMap’99, Davis Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP) pages, Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP) pages, SAGEmap, Online Mendelian Inheritance in Man (OMIM) and the Molecular Modeling Database (MMDB). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov
PMCID: PMC102437  PMID: 10592169
5.  Database resources of the National Center for Biotechnology Information: 2002 update 
Nucleic Acids Research  2002;30(1):13-16.
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, Human¡VMouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov.
PMCID: PMC99094  PMID: 11752242
6.  Fungal genome resources at NCBI 
Mycology  2011;2(3):142-160.
The National Center for Biotechnology Information (NCBI) is well known for the nucleotide sequence archive, GenBank and sequence analysis tool BLAST. However, NCBI integrates many types of biomolecular data from variety of sources and makes it available to the scientific community as interactive web resources as well as organized releases of bulk data. These tools are available to explore and compare fungal genomes. Searching all databases with Fungi [organism] at http://www.ncbi.nlm.nih.gov/ is the quickest way to find resources of interest with fungal entries. Some tools though are resources specific and can be indirectly accessed from a particular database in the Entrez system. These include graphical viewers and comparative analysis tools such as TaxPlot, TaxMap and UniGene DDD (found via UniGene Homepage). Gene and BioProject pages also serve as portals to external data such as community annotation websites, BioGrid and UniProt. There are many different ways of accessing genomic data at NCBI. Depending on the focus and goal of research projects or the level of interest, a user would select a particular route for accessing genomic databases and resources. This review article describes methods of accessing fungal genome data and provides examples that illustrate the use of analysis tools.
doi:10.1080/21501203.2011.584576
PMCID: PMC3379888  PMID: 22737589
Bioinformatics; fungal genomics; comparative genomics; protein clusters
7.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2012;41(Database issue):D8-D20.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page.
doi:10.1093/nar/gks1189
PMCID: PMC3531099  PMID: 23193264
8.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2013;42(Database issue):D7-D17.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, PubReader, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Primer-BLAST, COBALT, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, ClinVar, MedGen, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page.
doi:10.1093/nar/gkt1146
PMCID: PMC3965057  PMID: 24259429
9.  Database resources of the National Center for Biotechnology Information: update 
Nucleic Acids Research  2004;32(Database issue):D35-D40.
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI’s website. NCBI resources include Entrez, PubMed, PubMed Central, LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SARS Coronavirus Resource, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
doi:10.1093/nar/gkh073
PMCID: PMC308807  PMID: 14681353
10.  Database resources of the National Center for Biotechnology 
Nucleic Acids Research  2003;31(1):28-33.
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, PubMed, PubMed Central (PMC), LocusLink, the NCBITaxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR (e-PCR), Open Reading Frame (ORF) Finder, References Sequence (RefSeq), UniGene, HomoloGene, ProtEST, Database of Single Nucleotide Polymorphisms (dbSNP), Human/Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker (MM), Evidence Viewer (EV), Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
PMCID: PMC165480  PMID: 12519941
11.  Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data 
BMC Bioinformatics  2011;12:282.
Background
Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data.
Description
The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes on KEGG pathway maps and batch gene identifier conversion.
Conclusions
The Algal Functional Annotation Tool aims to provide an integrated data-mining environment for algal genomics by combining data from multiple annotation databases into a centralized tool. This site is designed to expedite the process of functional annotation and the interpretation of gene lists, such as those derived from high-throughput RNA-seq experiments. The tool is publicly available at http://pathways.mcdb.ucla.edu.
doi:10.1186/1471-2105-12-282
PMCID: PMC3144025  PMID: 21749710
12.  NemaPath: online exploration of KEGG-based metabolic pathways for nematodes 
BMC Genomics  2008;9:525.
Background
Nematode.net is a web-accessible resource for investigating gene sequences from parasitic and free-living nematode genomes. Beyond the well-characterized model nematode C. elegans, over 500,000 expressed sequence tags (ESTs) and nearly 600,000 genome survey sequences (GSSs) have been generated from 36 nematode species as part of the Parasitic Nematode Genomics Program undertaken by the Genome Center at Washington University School of Medicine. However, these sequencing data are not present in most publicly available protein databases, which only include sequences in Swiss-Prot. Swiss-Prot, in turn, relies on GenBank/Embl/DDJP for predicted proteins from complete genomes or full-length proteins.
Description
Here we present the NemaPath pathway server, a web-based pathway-level visualization tool for navigating putative metabolic pathways for over 30 nematode species, including 27 parasites. The NemaPath approach consists of two parts: 1) a backend tool to align and evaluate nematode genomic sequences (curated EST contigs) against the annotated Kyoto Encyclopedia of Genes and Genomes (KEGG) protein database; 2) a web viewing application that displays annotated KEGG pathway maps based on desired confidence levels of primary sequence similarity as defined by a user. NemaPath also provides cross-referenced access to nematode genome information provided by other tools available on Nematode.net, including: detailed NemaGene EST cluster information; putative translations; GBrowse EST cluster views; links from nematode data to external databases for corresponding synonymous C. elegans counterparts, subject matches in KEGG's gene database, and also KEGG Ontology (KO) identification.
Conclusion
The NemaPath server hosts metabolic pathway mappings for 30 nematode species and is available on the World Wide Web at . The nematode source sequences used for the metabolic pathway mappings are available via FTP , as provided by the Genome Center at Washington University School of Medicine.
doi:10.1186/1471-2164-9-525
PMCID: PMC2588608  PMID: 18983679
13.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2007;36(Database issue):D13-D21.
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
doi:10.1093/nar/gkm1000
PMCID: PMC2238880  PMID: 18045790
14.  Protein Information Resource: a community resource for expert annotation of protein data 
Nucleic Acids Research  2001;29(1):29-32.
The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-Inter­national databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.
PMCID: PMC29802  PMID: 11125041
15.  Comprehensive DNA Signature Discovery and Validation 
PLoS Computational Biology  2007;3(5):e98.
DNA signatures are nucleotide sequences that can be used to detect the presence of an organism and to distinguish that organism from all other species. Here we describe Insignia, a new, comprehensive system for the rapid identification of signatures in the genomes of bacteria and viruses. With the availability of hundreds of complete bacterial and viral genome sequences, it is now possible to use computational methods to identify signature sequences in all of these species, and to use these signatures as the basis for diagnostic assays to detect and genotype microbes in both environmental and clinical samples. The success of such assays critically depends on the methods used to identify signatures that properly differentiate between the target genomes and the sample background. We have used Insignia to compute accurate signatures for most bacterial genomes and made them available through our Web site. A sample of these signatures has been successfully tested on a set of 46 Vibrio cholerae strains, and the results indicate that the signatures are highly sensitive for detection as well as specific for discrimination between these strains and their near relatives. Our approach, whereby the entire genomic complement of organisms are compared to identify probe targets, is a promising method for diagnostic assay development, and it provides assay designers with the flexibility to choose probes from the most relevant genes or genomic regions. The Insignia system is freely accessible via a Web interface and has been released as open source software at: http://insignia.cbcb.umd.edu.
Author Summary
Now that the genome sequences of hundreds of bacteria and viruses are known, we can design tests that will rapidly detect the presence of these species based solely on their DNA. Such tests have a wide range of applications, from diagnosing infections to detecting harmful microbes in a water supply. These tests can detect a pathogen in a complex mixture of organic material by recognizing short, distinguishing sequences—called DNA signatures—that occur in the pathogen and not in any other species. We present Insignia, a new computational system that identifies DNA signatures of any length in bacterial and viral genomes. Insignia uses highly efficient algorithms to compare sequenced bacterial and viral genomes against each other and to additional background genomes including plants, animals, and human. These comparisons are stored in a database and used to rapidly compute signatures for any particular target species. To maximize its utility for the community, we have made Insignia available as free, open-source software and as a Web application. We have also validated 50 Insignia-designed assays on a panel of 46 strains of Vibrio cholerae, and our results show that the signatures are both sensitive and specific.
doi:10.1371/journal.pcbi.0030098
PMCID: PMC1868776  PMID: 17511514
16.  GTOP: a database of protein structures predicted from genome sequences 
Nucleic Acids Research  2002;30(1):294-298.
Large-scale genome projects generate an unprecedented number of protein sequences, most of them are experimentally uncharacterized. Predicting the 3D structures of sequences provides important clues as to their functions. We constructed the Genomes TO Protein structures and functions (GTOP) database, containing protein fold predictions of a huge number of sequences. Predictions are mainly carried out with the homology search program PSI-BLAST, currently the most popular among high-sensitivity profile search methods. GTOP also includes the results of other analyses, e.g. homology and motif search, detection of transmembrane helices and repetitive sequences. We have completed analyzing the sequences of 41 organisms, with the number of proteins exceeding 120 000 in total. GTOP uses a graphical viewer to present the analytical results of each ORF in one page in a ‘color-bar’ format. The assigned 3D structures are presented by Chime plug-in or RasMol. The binding sites of ligands are also included, providing functional information. The GTOP server is available at http://spock.genes.nig.ac.jp/~genome/gtop.html.
PMCID: PMC99104  PMID: 11752318
17.  The GTOP database in 2009: updated content and novel features to expand and deepen insights into protein structures and functions 
Nucleic Acids Research  2008;37(Database issue):D333-D337.
The Genomes TO Protein Structures and Functions (GTOP) database (http://spock.genes.nig.ac.jp/~genome/gtop.html) freely provides an extensive collection of information on protein structures and functions obtained by application of various computational tools to the amino acid sequences of entirely sequenced genomes. GTOP contains annotations of 3D structures, protein families, functions, and other useful data of a protein of interest in user-friendly ways to give a deep insight into the protein structure. From the initial 1999 version, GTOP has been continually updated to reap the fruits of genome projects and augmented to supply novel information, in particular intrinsically disordered regions. As intrinsically disordered regions constitute a considerable fraction of proteins and often play crucial roles especially in eukaryotes, their assignments give important additional clues to the functionality of proteins. Additionally, we have incorporated the following features into GTOP: a platform independent structural viewer, results of HMM searches against SCOP and Pfam, secondary structure predictions, color display of exon boundaries in eukaryotic proteins, assignments of gene ontology terms, search tools, and master files.
doi:10.1093/nar/gkn855
PMCID: PMC2686575  PMID: 18987007
18.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2011;40(Database issue):D13-D25.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
doi:10.1093/nar/gkr1184
PMCID: PMC3245031  PMID: 22140104
19.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2009;38(Database issue):D5-D16.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
doi:10.1093/nar/gkp967
PMCID: PMC2808881  PMID: 19910364
20.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2006;35(Database issue):D5-D12.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link(BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace and Assembly Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Viral Genotyping Tools, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at .
doi:10.1093/nar/gkl1031
PMCID: PMC1781113  PMID: 17170002
21.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2010;39(Database issue):D38-D51.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
doi:10.1093/nar/gkq1172
PMCID: PMC3013733  PMID: 21097890
22.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2008;37(Database issue):D5-D15.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications is custom implementation of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
doi:10.1093/nar/gkn741
PMCID: PMC2686545  PMID: 18940862
23.  The Protein Information Resource: an integrated public resource of functional annotation of proteins 
Nucleic Acids Research  2002;30(1):35-37.
The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases).
PMCID: PMC99125  PMID: 11752247
24.  CASCAD: a database of annotated candidate single nucleotide polymorphisms associated with expressed sequences 
BMC Genomics  2005;6:10.
Background
With the recent progress made in large-scale genome sequencing projects a vast amount of novel data is becoming available. A comparative sequence analysis, exploiting sequence information from various resources, can be used to uncover hidden information, such as genetic variation. Although there are enormous amounts of SNPs for a wide variety of organisms submitted to NCBI dbSNP and annotated in most genome assembly viewers like Ensembl and the UCSC Genome Browser, these platforms do not easily allow for extensive annotation and incorporation of experimental data supporting the polymorphism. However, such information is very important for selecting the most promising and useful candidate polymorphisms for use in experimental setups.
Description
The CASCAD database is designed for presentation and query of candidate SNPs that are retrieved by in silico mining of high-throughput sequencing data. Currently, the database provides collections of laboratory rat (Rattus norvegicus) and zebrafish (Danio rerio) candidate SNPs. The database stores detailed information about raw data supporting the candidate, extensive annotation and links to external databases (e.g. GenBank, Ensembl, UniGene, and LocusLink), verification information, and predictions of a potential effect for non-synonymous polymorphisms in coding regions. The CASCAD website allows search based on an arbitrary combination of 27 different parameters related to characteristics like candidate SNP quality, genomic localization, and sequence data source or strain. In addition, the database can be queried with any custom nucleotide sequences of interest. The interface is crosslinked to other public databases and tightly coupled with primer design and local genome assembly interfaces in order to facilitate experimental verification of candidates.
Conclusions
The CASCAD database discloses detailed information on rat and zebrafish candidate SNPs, including the raw data underlying its discovery. An advanced web-based search interface allows universal access to the database content and allows various queries supporting many types of research utilizing single nucleotide polymorphisms.
doi:10.1186/1471-2164-6-10
PMCID: PMC548278  PMID: 15676075
25.  PeanutDB: an integrated bioinformatics web portal for Arachis hypogaea transcriptomics 
BMC Plant Biology  2012;12:94.
Background
The peanut (Arachis hypogaea) is an important crop cultivated worldwide for oil production and food sources. Its complex genetic architecture (e.g., the large and tetraploid genome possibly due to unique cross of wild diploid relatives and subsequent chromosome duplication: 2n = 4x = 40, AABB, 2800 Mb) presents a major challenge for its genome sequencing and makes it a less-studied crop. Without a doubt, transcriptome sequencing is the most effective way to harness the genome structure and gene expression dynamics of this non-model species that has a limited genomic resource.
Description
With the development of next generation sequencing technologies such as 454 pyro-sequencing and Illumina sequencing by synthesis, the transcriptomics data of peanut is rapidly accumulated in both the public databases and private sectors. Integrating 187,636 Sanger reads (103,685,419 bases), 1,165,168 Roche 454 reads (333,862,593 bases) and 57,135,995 Illumina reads (4,073,740,115 bases), we generated the first release of our peanut transcriptome assembly that contains 32,619 contigs. We provided EC, KEGG and GO functional annotations to these contigs and detected SSRs, SNPs and other genetic polymorphisms for each contig. Based on both open-source and our in-house tools, PeanutDB presents many seamlessly integrated web interfaces that allow users to search, filter, navigate and visualize easily the whole transcript assembly, its annotations and detected polymorphisms and simple sequence repeats. For each contig, sequence alignment is presented in both bird’s-eye view and nucleotide level resolution, with colorfully highlighted regions of mismatches, indels and repeats that facilitate close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors.
Conclusion
As a public genomic database that integrates peanut transcriptome data from different sources, PeanutDB (http://bioinfolab.muohio.edu/txid3818v1) provides the Peanut research community with an easy-to-use web portal that will definitely facilitate genomics research and molecular breeding in this less-studied crop.
doi:10.1186/1471-2229-12-94
PMCID: PMC3444431  PMID: 22712730
Peanut; Arachis hypogaea; Transcriptome sequencing; Transcriptome assembly; Database; PeanutDB; SNP; SSR; Functional annotation

Results 1-25 (939449)