PMCC
Results 1-19 (19)
1.  Phenotype–Genotype Integrator (PheGenI): synthesizing genome-wide association study (GWAS) data with existing genomic resources 
Rapidly accumulating data from genome-wide association studies (GWASs) and other large-scale studies are most useful when synthesized with existing databases. To address this opportunity, we developed the Phenotype–Genotype Integrator (PheGenI), a user-friendly web interface that integrates various National Center for Biotechnology Information (NCBI) genomic databases with association data from the National Human Genome Research Institute GWAS Catalog and supports downloads of search results. Here, we describe the rationale for and development of this resource. Integrating over 66 000 association records with extensive single nucleotide polymorphism (SNP), gene, and expression quantitative trait loci data already available from the NCBI, PheGenI enables deeper investigation and interrogation of SNPs associated with a wide range of traits, facilitating the examination of the relationships between genetic variation and human diseases.
doi:10.1038/ejhg.2013.96
PMCID: PMC3865418  PMID: 23695286
database; data integration; genome sequence; genome-wide association study; phenotype; single nucleotide polymorphism
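PheGenI's core operation, as the abstract describes it, is linking GWAS association records to existing SNP and gene annotations so that users can search by gene or trait. The sketch below shows a gene-centric lookup of that kind. It assumes a hypothetical in-memory schema: the `Association` record, the `snp_to_genes` mapping and the `search_by_gene` helper are illustrative names, not PheGenI's implementation. The default cut-off of 5e-8 is the conventional genome-wide significance threshold, used here only as an assumed default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One GWAS association record (hypothetical schema)."""
    rsid: str       # dbSNP reference SNP identifier
    trait: str      # reported phenotype
    p_value: float  # reported association p-value


def search_by_gene(gene, associations, snp_to_genes, max_p=5e-8):
    """Return associations whose SNP maps to `gene` and pass the p-value cut-off.

    `snp_to_genes` maps an rsID to the set of gene symbols it falls in or near
    (an assumed, precomputed annotation). Results are sorted strongest-first.
    """
    hits = [
        a for a in associations
        if gene in snp_to_genes.get(a.rsid, ()) and a.p_value <= max_p
    ]
    return sorted(hits, key=lambda a: a.p_value)
```

With toy records, where `rs1` to `rs3` are placeholders rather than real SNPs, `search_by_gene("GENE1", assocs, snp_to_genes)` returns only the two associations below 5e-8, ordered by p-value.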
2.  Data Use under the NIH GWAS Data Sharing Policy and Future Directions 
Nature genetics  2014;46(9):934-938.
In 2007, the National Institutes of Health (NIH) introduced the Genome-Wide Association Studies (GWAS) Policy and the database of Genotypes and Phenotypes (dbGaP) to facilitate “controlled” access to GWAS data based on participants’ informed consent. dbGaP has provided 2,221 investigators access to 304 studies, resulting in 924 publications and significant scientific advances. Following this success, the 2014 Genomic Data Sharing Policy will extend the GWAS Policy to additional data types.
doi:10.1038/ng.3062
PMCID: PMC4182942  PMID: 25162809
3.  Characterizing Genetic Variants for Clinical Action 
Genome-wide association studies, DNA sequencing studies, and other genomic studies are finding an increasing number of genetic variants associated with clinical phenotypes that may be useful in developing diagnostic, preventive, and treatment strategies for individual patients. However, few common variants have been integrated into routine clinical practice. The reasons for this are several, but two of the most significant are limited evidence about the clinical implications of the variants and a lack of a comprehensive knowledge base that captures genetic variants, their phenotypic associations, and other pertinent phenotypic information that is openly accessible to clinical groups attempting to interpret sequencing data. As the field of medicine begins to incorporate genome-scale analysis into clinical care, approaches need to be developed for collecting and characterizing data on the clinical implications of variants, developing consensus on their actionability, and making this information available for clinical use. The National Human Genome Research Institute (NHGRI) and the Wellcome Trust thus convened a workshop to consider the processes and resources needed to: 1) identify clinically valid genetic variants; 2) decide whether they are actionable and what the action should be; and 3) provide this information for clinical use. This commentary outlines the key discussion points and recommendations from the workshop.
doi:10.1002/ajmg.c.31386
PMCID: PMC4158437  PMID: 24634402
genomic medicine; clinical actionability; database; electronic health records (EHR); pharmacogenomics; DNA sequencing
4.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2013;42(Database issue):D7-D17.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, PubReader, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Primer-BLAST, COBALT, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, ClinVar, MedGen, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page.
doi:10.1093/nar/gkt1146
PMCID: PMC3965057  PMID: 24259429
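The Entrez Programming Utilities (E-utilities) listed above expose NCBI databases such as dbSNP over plain HTTP requests. A minimal sketch, assuming the public ESearch endpoint under `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/`, builds a search URL without sending it. The helper name `esearch_url` and the example query term are illustrative; `db`, `term`, `retmax` and `retmode` are standard ESearch parameters.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20, retmode: str = "json") -> str:
    """Build an ESearch request URL for one Entrez database.

    db      -- Entrez database name, e.g. "snp" (dbSNP) or "pubmed"
    term    -- Entrez query string; urlencode escapes brackets and spaces
    retmax  -- maximum number of UIDs to return
    retmode -- response format ("json" or "xml")
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": retmode}
    return f"{EUTILS_BASE}esearch.fcgi?{urlencode(params)}"
```

The returned URL can be fetched with `urllib.request.urlopen`. NCBI documents request-rate limits for E-utilities, so batch queries should be throttled.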
5.  The 1000 Genomes Project: Data Management and Community Access 
Nature methods  2012;9(5):459-462.
The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology. In addition to the primary scientific goals of creating both a deep catalogue of human genetic variation and extensive methods to accurately discover and characterize variation using new sequencing technologies, the project makes all of its data publicly available for community use. The project data coordination center has developed and deployed several tools to enable widespread data access.
doi:10.1038/nmeth.1974
PMCID: PMC3340611  PMID: 22543379
6.  A Mathematical Approach to the Analysis of Multiplex DNA Profiles 
Bulletin of mathematical biology  2010;73(8):1909-1931.
Multiplex DNA profiles are used extensively for biomedical and forensic purposes. However, while DNA profile data generation is automated, human analysis of those data is not, and the need for speed combined with accuracy demands a computer-automated approach to sample interpretation and quality assessment. In this paper, we describe an integrated mathematical approach to modeling the data and extracting the relevant information, while rejecting noise and sample artifacts. We conclude with examples showing the effectiveness of our algorithms.
doi:10.1007/s11538-010-9598-0
PMCID: PMC3150588  PMID: 21103945
DNA; STR; multiplex; electrophoresis; cubic spline; inner product; norm; Gaussian; OSIRIS; forensics; profile; biomedical
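The keywords above (Gaussian, inner product, norm) point to modelling electropherogram peaks as Gaussian shapes. The sketch below uses a simpler, swapped-in technique rather than the paper's OSIRIS algorithm, which also uses cubic splines and is not reproduced here. It slides a unit-norm Gaussian template along the trace, scores each window by its normalized inner product (cosine similarity) with the template, and reports local score maxima. The peak width `sigma`, window `half_width` and `min_score` threshold are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_template(half_width, sigma):
    """Unit-norm Gaussian kernel sampled at integer offsets -half_width..half_width."""
    x = np.arange(-half_width, half_width + 1, dtype=float)
    g = np.exp(-0.5 * (x / sigma) ** 2)
    return g / np.linalg.norm(g)


def match_scores(signal, half_width=6, sigma=2.0):
    """Cosine similarity between each signal window and the Gaussian template."""
    signal = np.asarray(signal, dtype=float)
    template = gaussian_template(half_width, sigma)
    scores = np.zeros(len(signal))
    for i in range(half_width, len(signal) - half_width):
        window = signal[i - half_width : i + half_width + 1]
        norm = np.linalg.norm(window)
        if norm > 0:  # an all-zero window has no defined shape; leave score at 0
            scores[i] = float(window @ template) / norm
    return scores


def find_peaks(signal, half_width=6, sigma=2.0, min_score=0.9, min_height=0.0):
    """Indices where the trace is Gaussian-shaped (score >= min_score) and locally best."""
    signal = np.asarray(signal, dtype=float)
    scores = match_scores(signal, half_width, sigma)
    peaks = []
    for i in range(1, len(signal) - 1):
        if (scores[i] >= min_score and signal[i] > min_height
                and scores[i] >= scores[i - 1] and scores[i] > scores[i + 1]):
            peaks.append(i)
    return peaks
```

Because the score is normalized by the window norm, it measures shape rather than height; `min_height` is the hook for separately rejecting low-amplitude noise or artifacts.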
7.  Assessing and Managing Risk when Sharing Aggregate Genetic Variant Data 
Nature Reviews. Genetics  2011;12(10):730-736.
Preface
Access to genetic data across studies is an important aspect of identifying new genetic associations through genome-wide association studies (GWAS). Meta-analysis across multiple GWAS with combined cohort sizes of tens of thousands of individuals often uncovers many more genome-wide associated loci than the original individual studies, which emphasizes the importance of tools and mechanisms for data sharing. However, even sharing summary-level data, such as allele frequencies, inherently carries some degree of privacy risk to study participants. Here we discuss mechanisms and resources for sharing data from GWAS, particularly focusing on approaches for assessing and quantifying privacy risks to participants from sharing of summary-level data.
doi:10.1038/nrg3067
PMCID: PMC3349221  PMID: 21921928
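A well-known illustration of why summary-level allele frequencies carry privacy risk is the distance statistic of Homer et al. (2008). For each SNP j it compares an individual's allele frequency y_j (0, 0.5 or 1) with a reference population frequency and with the published cohort frequency: D_j = |y_j - pop_j| - |y_j - mix_j|. It then applies a one-sample t statistic to the mean of D. The sketch below follows that idea as we understand it. It is not the method proposed in this review, and the function names are illustrative. Real analyses use very large numbers of SNPs, not the handful in the test.

```python
import math


def homer_d(y, pop, mix):
    """Per-SNP distances D_j = |y_j - pop_j| - |y_j - mix_j|.

    y   -- individual's allele frequency at each SNP (0, 0.5 or 1)
    pop -- reference-population allele frequencies
    mix -- published cohort (summary-level) allele frequencies
    """
    if not (len(y) == len(pop) == len(mix)):
        raise ValueError("inputs must cover the same SNPs")
    return [abs(a - p) - abs(a - m) for a, p, m in zip(y, pop, mix)]


def membership_t(y, pop, mix):
    """One-sample t statistic for mean(D) > 0.

    Large positive values suggest the individual is closer to the cohort
    than to the reference population, i.e. may be a cohort member.
    """
    d = homer_d(y, pop, mix)
    n = len(d)
    if n < 2:
        raise ValueError("need at least two SNPs")
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    if var == 0:
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)
```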
8.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2011;40(Database issue):D13-D25.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
doi:10.1093/nar/gkr1184
PMCID: PMC3245031  PMID: 22140104
9.  The variant call format and VCFtools 
Bioinformatics  2011;27(15):2156-2158.
Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.
Availability: http://vcftools.sourceforge.net
Contact: rd@sanger.ac.uk
doi:10.1093/bioinformatics/btr330
PMCID: PMC3137218  PMID: 21653522
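A VCF file is tab-delimited text. Meta-information lines start with `##`, a `#CHROM` header line names the columns, and each data line begins with eight fixed fields: CHROM, POS (1-based), ID, REF, ALT, QUAL, FILTER and INFO. Optional FORMAT and per-sample columns follow. The minimal sketch below reads only those fixed fields from uncompressed text. It is not VCFtools' API; the `VcfRecord` type and helper names are illustrative. Because bgzip output is gzip-compatible, compressed files could be opened with `gzip.open(path, "rt")`, but indexed region queries are out of scope here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Union


@dataclass
class VcfRecord:
    chrom: str
    pos: int                         # 1-based position on the reference
    id: str                          # e.g. a dbSNP rs ID, or "." if missing
    ref: str
    alt: List[str]                   # ALT alleles, split on ","
    qual: Optional[float]            # None when QUAL is "."
    filter: str
    info: Dict[str, Union[str, bool]]


def parse_info(text: str) -> Dict[str, Union[str, bool]]:
    """Split INFO on ';'; a key without '=' is a flag and maps to True."""
    if text == ".":
        return {}
    info: Dict[str, Union[str, bool]] = {}
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def parse_line(line: str) -> VcfRecord:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("VCF data lines need at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]
    return VcfRecord(
        chrom=chrom, pos=int(pos), id=vid, ref=ref, alt=alt.split(","),
        qual=None if qual == "." else float(qual),
        filter=flt, info=parse_info(info),
    )


def read_records(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Yield one record per data line, skipping ## meta lines and the #CHROM header."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_line(line)
```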
10.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2010;39(Database issue):D38-D51.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
doi:10.1093/nar/gkq1172
PMCID: PMC3013733  PMID: 21097890
11.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2009;38(Database issue):D5-D16.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
doi:10.1093/nar/gkp967
PMCID: PMC2808881  PMID: 19910364
14.  Supplementing High-Density SNP Microarrays for Additional Coverage of Disease-Related Genes: Addiction as a Paradigm 
PLoS ONE  2009;4(4):e5225.
Commercial SNP microarrays now provide comprehensive and affordable coverage of the human genome. However, some diseases have biologically relevant genomic regions that may require additional coverage. Addiction, for example, is thought to be influenced by complex interactions among many relevant genes and pathways. We have assembled a list of 486 biologically relevant genes nominated by a panel of experts on addiction. We then added 424 genes that showed evidence of association with addiction phenotypes through mouse QTL mappings and gene co-expression analysis. We demonstrate that there are a substantial number of SNPs in these genes that are not well represented by commercial SNP platforms. We address this problem by introducing a publicly available SNP database for addiction. The database is annotated using numeric prioritization scores indicating the extent of biological relevance. The scores incorporate a number of factors such as SNP/gene functional properties (including synonymy and promoter regions), data from mouse systems genetics and measures of human/mouse evolutionary conservation. We then used HapMap genotyping data to determine if a SNP is tagged by a commercial microarray through linkage disequilibrium. This combination of biological prioritization scores and LD tagging annotation will enable addiction researchers to supplement commercial SNP microarrays to ensure comprehensive coverage of biologically relevant regions.
doi:10.1371/journal.pone.0005225
PMCID: PMC2668711  PMID: 19381300
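Deciding whether a SNP is "tagged" by an array SNP through linkage disequilibrium usually comes down to the pairwise r² statistic. For two biallelic sites with allele-1 frequencies p_A and p_B and haplotype frequency p_AB, D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)). The sketch below assumes phased haplotypes coded 0/1 and an r² ≥ 0.8 tagging threshold. Both are common conventions but assumptions here, since the abstract does not state the authors' exact procedure.

```python
from typing import Iterable, Sequence


def r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Pairwise LD r^2 between two biallelic SNPs from phased 0/1 haplotypes."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("need two equal-length, non-empty haplotype vectors")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:  # a monomorphic site: r^2 is undefined, treat as no LD
        return 0.0
    return d * d / denom


def is_tagged(target: Sequence[int], array_snps: Iterable[Sequence[int]],
              threshold: float = 0.8) -> bool:
    """True if any array SNP reaches r^2 >= threshold with the target SNP."""
    return any(r_squared(target, snp) >= threshold for snp in array_snps)
```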
15.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2008;37(Database issue):D5-D15.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
doi:10.1093/nar/gkn741
PMCID: PMC2686545  PMID: 18940862
17.  Database resources of the National Center for Biotechnology Information 
Nucleic Acids Research  2005;34(Database issue):D173-D180.
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, SAGEmap, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of the resources can be accessed through the NCBI home page.
doi:10.1093/nar/gkj158
PMCID: PMC1347520  PMID: 16381840
18.  dbSNP: a database of single nucleotide polymorphisms 
Nucleic Acids Research  2000;28(1):352-355.
In response to a need for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, the National Center for Biotechnology Information (NCBI) has established the dbSNP database. Submissions to dbSNP will be integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data. The complete contents of dbSNP are available to the public at http://www.ncbi.nlm.nih.gov/SNP. Submitted SNPs can also be downloaded via anonymous FTP at ftp://ncbi.nlm.nih.gov/snp/.
PMCID: PMC102496  PMID: 10592272
19.  Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones 
Imanishi, Tadashi | Itoh, Takeshi | Suzuki, Yutaka | O'Donovan, Claire | Fukuchi, Satoshi | Koyanagi, Kanako O | Barrero, Roberto A | Tamura, Takuro | Yamaguchi-Kabata, Yumi | Tanino, Motohiko | Yura, Kei | Miyazaki, Satoru | Ikeo, Kazuho | Homma, Keiichi | Kasprzyk, Arek | Nishikawa, Tetsuo | Hirakawa, Mika | Thierry-Mieg, Jean | Thierry-Mieg, Danielle | Ashurst, Jennifer | Jia, Libin | Nakao, Mitsuteru | Thomas, Michael A | Mulder, Nicola | Karavidopoulou, Youla | Jin, Lihua | Kim, Sangsoo | Yasuda, Tomohiro | Lenhard, Boris | Eveno, Eric | Suzuki, Yoshiyuki | Yamasaki, Chisato | Takeda, Jun-ichi | Gough, Craig | Hilton, Phillip | Fujii, Yasuyuki | Sakai, Hiroaki | Tanaka, Susumu | Amid, Clara | Bellgard, Matthew | de Fatima Bonaldo, Maria | Bono, Hidemasa | Bromberg, Susan K | Brookes, Anthony J | Bruford, Elspeth | Carninci, Piero | Chelala, Claude | Couillault, Christine | de Souza, Sandro J. | Debily, Marie-Anne | Devignes, Marie-Dominique | Dubchak, Inna | Endo, Toshinori | Estreicher, Anne | Eyras, Eduardo | Fukami-Kobayashi, Kaoru | R. Gopinath, Gopal | Graudens, Esther | Hahn, Yoonsoo | Han, Michael | Han, Ze-Guang | Hanada, Kousuke | Hanaoka, Hideki | Harada, Erimi | Hashimoto, Katsuyuki | Hinz, Ursula | Hirai, Momoki | Hishiki, Teruyoshi | Hopkinson, Ian | Imbeaud, Sandrine | Inoko, Hidetoshi | Kanapin, Alexander | Kaneko, Yayoi | Kasukawa, Takeya | Kelso, Janet | Kersey, Paul | Kikuno, Reiko | Kimura, Kouichi | Korn, Bernhard | Kuryshev, Vladimir | Makalowska, Izabela | Makino, Takashi | Mano, Shuhei | Mariage-Samson, Regine | Mashima, Jun | Matsuda, Hideo | Mewes, Hans-Werner | Minoshima, Shinsei | Nagai, Keiichi | Nagasaki, Hideki | Nagata, Naoki | Nigam, Rajni | Ogasawara, Osamu | Ohara, Osamu | Ohtsubo, Masafumi | Okada, Norihiro | Okido, Toshihisa | Oota, Satoshi | Ota, Motonori | Ota, Toshio | Otsuki, Tetsuji | Piatier-Tonneau, Dominique | Poustka, Annemarie | Ren, Shuang-Xi | Saitou, Naruya | Sakai, Katsunaga | Sakamoto, Shigetaka | Sakate, Ryuichi | Schupp, Ingo | Servant, Florence | Sherry, Stephen | Shiba, Rie | Shimizu, Nobuyoshi | Shimoyama, Mary | Simpson, Andrew J | Soares, Bento | Steward, Charles | Suwa, Makiko | Suzuki, Mami | Takahashi, Aiko | Tamiya, Gen | Tanaka, Hiroshi | Taylor, Todd | Terwilliger, Joseph D | Unneberg, Per | Veeramachaneni, Vamsi | Watanabe, Shinya | Wilming, Laurens | Yasuda, Norikazu | Yoo, Hyang-Sook | Stodolsky, Marvin | Makalowski, Wojciech | Go, Mitiko | Nakai, Kenta | Takagi, Toshihisa | Kanehisa, Minoru | Sakaki, Yoshiyuki | Quackenbush, John | Okazaki, Yasushi | Hayashizaki, Yoshihide | Hide, Winston | Chakraborty, Ranajit | Nishikawa, Ken | Sugawara, Hideaki | Tateno, Yoshio | Chen, Zhu | Oishi, Michio | Tonellato, Peter | Apweiler, Rolf | Okubo, Kousaku | Wagner, Lukas | Wiemann, Stefan | Strausberg, Robert L | Isogai, Takao | Auffray, Charles | Nomura, Nobuo | Gojobori, Takashi | Sugano, Sumio
PLoS Biology  2004;2(6):e162.
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
An international team has systematically validated and annotated just over 21,000 human genes using full-length cDNA, thereby providing a valuable new resource for the human genetics community.
doi:10.1371/journal.pbio.0020162
PMCID: PMC393292  PMID: 15103394
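The H-InvDB abstract counts coding-region SNPs as nonsynonymous or nonsense. Such labels follow from translating the reference and variant codons with the standard genetic code. The sketch below assumes the reference codon has already been extracted on the coding strand (5' to 3') and that the variant is a single-base substitution. The helper `classify_snv` and its category strings are illustrative; this is not H-InvDB's annotation pipeline.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1): first base varies slowest,
# third base fastest, each in T, C, A, G order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(codon: str, position: int, alt: str) -> str:
    """Classify a single-base substitution at `position` (0, 1 or 2) of `codon`."""
    codon = codon.upper()
    if position not in (0, 1, 2) or len(codon) != 3:
        raise ValueError("expected a 3-base codon and a position in 0..2")
    mutant = codon[:position] + alt.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # a new premature stop codon
    if ref_aa == "*":
        return "stop-loss"  # a stop codon becomes a sense codon
    return "nonsynonymous"
```

For example, `TGG` (Trp) changed to `TGA` is a stop codon and so classifies as nonsense, while `CTG` to `TTG` keeps leucine and is synonymous.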