Results 1-16 (16)
 

1.  Semantically enabling a genome-wide association study database 
Background
The amount of data generated from genome-wide association studies (GWAS) has grown rapidly, but considerations for GWAS phenotype data reuse and interchange have not kept pace. This impacts on the work of GWAS Central – a free and open access resource for the advanced querying and comparison of summary-level genetic association data. The benefits of employing ontologies for standardising and structuring data are widely accepted. The complex spectrum of observed human phenotypes (and traits), and the requirement for cross-species phenotype comparisons, calls for reflection on the most appropriate solution for the organisation of human phenotype data. The Semantic Web provides standards for the possibility of further integration of GWAS data and the ability to contribute to the web of Linked Data.
Results
A pragmatic consideration when applying phenotype ontologies to GWAS data is the ability to retrieve all data, at the most granular level possible, from querying a single ontology graph. We found the Medical Subject Headings (MeSH) terminology suitable for describing all traits (diseases and medical signs and symptoms) at various levels of granularity and the Human Phenotype Ontology (HPO) most suitable for describing phenotypic abnormalities (medical signs and symptoms) at the most granular level. Diseases within MeSH are mapped to HPO to infer the phenotypic abnormalities associated with diseases. Building on the rich semantic phenotype annotation layer, we are able to make cross-species phenotype comparisons and publish a core subset of GWAS data as RDF nanopublications.
Conclusions
We present a methodology for applying phenotype annotations to a comprehensive genome-wide association dataset and for ensuring compatibility with the Semantic Web. The annotations are used to assist with cross-species genotype and phenotype comparisons. However, further processing and deconstructions of terms may be required to facilitate automatic phenotype comparisons. The provision of GWAS nanopublications enables a new dimension for exploring GWAS data, by way of intrinsic links to related data resources within the Linked Data web. The value of such annotation and integration will grow as more biomedical resources adopt the standards of the Semantic Web.
doi:10.1186/2041-1480-3-9
PMCID: PMC3579732  PMID: 23244533
Ontology; Phenotype; GWAS; RDF
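The mapping step described in the Results above, in which diseases annotated with MeSH are mapped to HPO to infer the associated phenotypic abnormalities, can be sketched roughly as follows. This is a minimal illustration only, not the GWAS Central implementation: the identifiers are placeholders rather than real MeSH or HPO terms, and the names `MESH_TO_HPO`, `infer_phenotypes`, and `studies_sharing_phenotype` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical disease-to-phenotype mapping.
# The IDs are placeholders, not real MeSH or HPO terms.
MESH_TO_HPO: Dict[str, Set[str]] = {
    "MESH:disease_A": {"HP:abnormality_1", "HP:abnormality_2"},
    "MESH:disease_B": {"HP:abnormality_2", "HP:abnormality_3"},
}


def infer_phenotypes(study_annotations: List[str]) -> Set[str]:
    """Return the union of HPO terms mapped from a study's MeSH disease annotations."""
    inferred: Set[str] = set()
    for mesh_id in study_annotations:
        # Unmapped MeSH terms contribute nothing rather than raising an error.
        inferred |= MESH_TO_HPO.get(mesh_id, set())
    return inferred


def studies_sharing_phenotype(studies: Dict[str, List[str]], hpo_id: str) -> List[str]:
    """List the names of studies whose inferred phenotypes include the given HPO term."""
    return sorted(
        name for name, annotations in studies.items()
        if hpo_id in infer_phenotypes(annotations)
    )
```

With a lookup of this kind, a query for one phenotypic abnormality could return studies annotated only at the disease level, which is the practical benefit the abstract attributes to the mapping.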
2.  Toward a roadmap in global biobanking for health 
European Journal of Human Genetics  2012;20(11):1105-1111.
Biobanks can have a pivotal role in elucidating disease etiology, translation, and advancing public health. However, meeting these challenges hinges on a critical shift in the way science is conducted and requires biobank harmonization. There is growing recognition that a common strategy is imperative to develop biobanking globally and effectively. To help guide this strategy, we articulate key principles, goals, and priorities underpinning a roadmap for global biobanking to accelerate health science, patient care, and public health. The need to manage and share very large amounts of data has driven innovations on many fronts. Although technological solutions are allowing biobanks to reach new levels of integration, increasingly powerful data-collection tools, analytical techniques, and the results they generate raise new ethical and legal issues and challenges, necessitating a reconsideration of previous policies, practices, and ethical norms. These manifold advances and the investments that support them are also fueling opportunities for biobanks to ultimately become integral parts of health-care systems in many countries. International harmonization to increase interoperability and sustainability are two strategic priorities for biobanking. Tackling these issues requires an environment favorably inclined toward scientific funding and equipped to address socio-ethical challenges. Cooperation and collaboration must extend beyond systems to enable the exchange of data and samples to strategic alliances between many organizations, including governmental bodies, funding agencies, public and private science enterprises, and other stakeholders, including patients. A common vision is required and we articulate the essential basis of such a vision herein.
doi:10.1038/ejhg.2012.96
PMCID: PMC3477856  PMID: 22713808
3.  Finding and sharing: new approaches to registries of databases and services for the biomedical sciences 
The recent explosion of biological data and the concomitant proliferation of distributed databases make it challenging for biologists and bioinformaticians to discover the best data resources for their needs, and the most efficient way to access and use them. Despite a rapid acceleration in uptake of syntactic and semantic standards for interoperability, it is still difficult for users to find which databases support the standards and interfaces that they need. To solve these problems, several groups are developing registries of databases that capture key metadata describing the biological scope, utility, accessibility, ease-of-use and existence of web services allowing interoperability between resources. Here, we describe some of these initiatives including a novel formalism, the Database Description Framework, for describing database operations and functionality and encouraging good database practice. We expect such approaches will result in improved discovery, uptake and utilization of data resources.
Database URL: http://www.casimir.org.uk/casimir_ddf
doi:10.1093/database/baq014
PMCID: PMC2911849  PMID: 20627863
4.  The Human Variome Project 
Science (New York, N.Y.)  2008;322(5903):861-862.
An ambitious plan to collect, curate, and make accessible information on genetic variations affecting human health is beginning to be realized.
doi:10.1126/science.1167363
PMCID: PMC2810956  PMID: 18988827
5.  Genetic Structures of Copy Number Variants Revealed by Genotyping Single Sperm 
PLoS ONE  2009;4(4):e5236.
Background
Copy number variants (CNVs) occupy a significant portion of the human genome and may have important roles in meiotic recombination, human genome evolution and gene expression. Many genetic diseases may be underlain by CNVs. However, because of the presence of their multiple copies, variability in copy numbers and the diploidy of the human genome, detailed genetic structure of CNVs cannot be readily studied by available techniques.
Methodology/Principal Findings
Single sperm samples were used as the primary subjects for the study so that CNV haplotypes in the sperm donors could be studied individually. Forty-eight CNVs characterized in a previous study were analyzed using a microarray-based high-throughput genotyping method after multiplex amplification. Seventeen single nucleotide polymorphisms (SNPs) were also included as controls. Two single-base variants, either allelic or paralogous, could be discriminated for all markers. Microarray data were used to resolve SNP alleles and CNV haplotypes, to quantitatively assess the numbers and compositions of the paralogous segments in each CNV haplotype.
Conclusions/Significance
This is the first study of the genetic structure of CNVs on a large scale. Resulting information may help understand evolution of the human genome, gain insight into many genetic processes, and discriminate between CNVs and SNPs. The highly sensitive high-throughput experimental system with haploid sperm samples as subjects may be used to facilitate detailed large-scale CNV analysis.
doi:10.1371/journal.pone.0005236
PMCID: PMC2668179  PMID: 19384415
6.  HGVbaseG2P: a central genetic association database 
Nucleic Acids Research  2008;37(Database issue):D797-D802.
The Human Genome Variation database of Genotype to Phenotype information (HGVbaseG2P) is a new central database for summary-level findings produced by human genetic association studies, both large and small. Such a database is needed so that researchers have an easy way to access all the available association study data relevant to their genes, genome regions or diseases of interest. Such a depository will allow true positive signals to be more readily distinguished from false positives (type I error) that fail to consistently replicate. In this paper we describe how HGVbaseG2P has been constructed, and how its data are gathered and organized. We present a range of user-friendly but powerful website tools for searching, browsing and visualizing G2P study findings. HGVbaseG2P is available at http://www.hgvbaseg2p.org.
doi:10.1093/nar/gkn748
PMCID: PMC2686551  PMID: 18948288
7.  Polymorphisms associated with asthma are inversely related to glioblastoma multiforme 
Cancer research  2005;65(14):6459-6465.
doi:10.1158/0008-5472.CAN-04-3728
PMCID: PMC1762912  PMID: 16024651
Asthma; polymorphisms; glioblastoma multiforme; GBM, glioblastoma multiforme; IL, interleukin; COX-2, cyclooxygenase 2; OR, odds ratio; CI, confidence interval; SNP, single nucleotide polymorphism; CRP, C-reactive protein
8.  Lower rate of genomic variation identified in the trans-membrane domain of monoamine sub-class of Human G-Protein Coupled Receptors: The Human GPCR-DB Database 
BMC Genomics  2004;5:91.
Background
We have surveyed, compiled and annotated nucleotide variations in 338 human 7-transmembrane receptors (G-protein coupled receptors). In a sample of 32 chromosomes from a Nordic population, we attempted to determine the allele frequencies of 80 non-synonymous SNPs, and found 20 novel polymorphic markers. GPCRs of physiological and clinical importance were prioritized for statistical analysis. Natural variation and rare mutation information were merged and presented online in the Human GPCR-DB database.
Results
The average number of SNPs per kilobase of exonic sequence was found to be about twice the average number of SNPs per kilobase of intronic sequence (2.2 versus 1.0). Of the 338 genes, 111 were single-exon genes, that is, were intronless. The average number of exonic SNPs per single-exon gene was 3.5 (n = 395) while that for multi-exon genes was 0.8 (n = 1176). The average number of variations within the different protein domains (N-terminus, internal and external loops, trans-membrane region, C-terminus) indicates a lower rate of variation in the trans-membrane region of Monoamine GPCRs, as compared to the Chemokine- and Peptide-receptor sub-classes of GPCRs.
Conclusions
Single-exon GPCRs on average have approximately three times the number of SNPs as compared to GPCRs with introns. Among the various functional classes of GPCRs, Monoamine GPCRs have a lower number of natural variations within the trans-membrane domain, indicating evolutionary selection against non-synonymous changes within the membrane-localizing domain of this sub-class of GPCRs.
doi:10.1186/1471-2164-5-91
PMCID: PMC538281  PMID: 15579207
9.  HGBASE: a database of SNPs and other variations in and around human genes 
Nucleic Acids Research  2000;28(1):356-360.
Human genome polymorphism is expected to play a key role in defining the etiologic basis of phenotypic differences between individuals in aspects such as drug responses and common disease predisposition. Relevant functional DNA changes will probably be located in or near to transcribed sequences, and include many single nucleotide polymorphisms. To aid the future analysis of such genome variation, HGBASE (Human Genic Bi-Allelic SEquences) was constructed as a means to gather human gene-linked polymorphisms from all possible public sources, and show these as a non-redundant set of records in a standardized and user-friendly database endowed with text and sequence based search facilities. After 1 year of presence on the WWW, the HGBASE project has compiled data for over 22 000 records, and this number continues to triple every 6–12 months with data harvested or submitted from all major public genome databases and published literature from the previous decade. Extensive annotation enhancement, internal consistency checking and manual review of every record are undertaken to address potential errors and deficiencies sometimes present in the original source data. The fully polished and comprehensive database is made freely available to all at http://hgbase.cgr.ki.se
PMCID: PMC102467  PMID: 10592273
10.  Targeted enrichment of genomic DNA regions for next-generation sequencing 
Briefings in Functional Genomics  2011;10(6):374-386.
In this review, we discuss the latest targeted enrichment methods and aspects of their utilization along with second-generation sequencing for complex genome analysis. In doing so, we provide an overview of issues involved in detecting genetic variation, for which targeted enrichment has become a powerful tool. We explain how targeted enrichment for next-generation sequencing has made great progress in terms of methodology, ease of use and applicability, but emphasize the remaining challenges such as the lack of even coverage across targeted regions. Costs are also considered versus the alternative of whole-genome sequencing which is becoming ever more affordable. We conclude that targeted enrichment is likely to be the most economical option for many years to come in a range of settings.
doi:10.1093/bfgp/elr033
PMCID: PMC3245553  PMID: 22121152
targeted enrichment; next-generation sequencing; genome partitioning; exome; genetic variation
11.  VarioML framework for comprehensive variation data representation and exchange 
BMC Bioinformatics  2012;13:254.
Background
Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement.
Results
The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants, with a toolkit for adapting the specification into one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described, without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs) e.g., the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easily integrated into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components.
Conclusions
VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease, clarity, and without ambiguity.
doi:10.1186/1471-2105-13-254
PMCID: PMC3507772  PMID: 23031277
LSDB; Variation database curation; Data collection; Distribution
12.  A mechanistic basis for amplification differences between samples and between genome regions 
BMC Genomics  2012;13:455.
Background
For many analytical methods the efficiency of DNA amplification varies across the genome and between samples. The most affected genome regions tend to correlate with high C + G content; however, this relationship is complex and does not explain why the direction and magnitude of effects vary considerably between samples.
Results
Here, we provide evidence that sequence elements that are particularly high in C + G content can remain annealed even when aggressive melting conditions are applied. In turn, this behavior creates broader ‘Thermodynamically Ultra-Fastened’ (TUF) regions characterized by incomplete denaturation of the two DNA strands, so reducing amplification efficiency throughout these domains.
Conclusions
This model provides a mechanistic explanation for why some genome regions are particularly difficult to amplify and assay in many procedures, and importantly it also explains inter-sample variability of this behavior. That is, DNA samples of varying quality will carry more or fewer nicks and breaks, and hence their intact TUF regions will have different lengths and so be differentially affected by this amplification suppression mechanism – with ‘higher’ quality DNAs being the most vulnerable. A major practical consequence of this is that inter-region and inter-sample variability can be largely overcome by employing routine fragmentation methods (e.g. sonication or restriction enzyme digestion) prior to sample amplification.
doi:10.1186/1471-2164-13-455
PMCID: PMC3469336  PMID: 22950736
DNA amplification; DNA denaturation; C + G; Illumina infinium
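The abstract above ties amplification difficulty to sequence elements that are particularly high in C + G content. As a rough illustration of how such elements might be located in a sequence, the sketch below computes C + G fractions over sliding windows. It is an assumption-laden example, not the authors' method: it does not model denaturation or TUF regions, and the function names, window size, and threshold are hypothetical choices rather than values from the paper.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are C or G (case-insensitive); 0.0 for an empty string."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("C") + upper.count("G")) / len(upper)


def high_gc_windows(seq: str, window: int = 10, threshold: float = 0.8) -> List[Tuple[int, float]]:
    """Return (start, gc_fraction) for every full window at or above the threshold.

    window and threshold are illustrative defaults, not values from the paper.
    """
    hits: List[Tuple[int, float]] = []
    for start in range(0, len(seq) - window + 1):
        fraction = gc_fraction(seq[start:start + window])
        if fraction >= threshold:
            hits.append((start, fraction))
    return hits
```

Overlapping hits could then be merged into candidate C + G-rich elements; deciding which of those behave as the abstract describes would need the experimental evidence the paper reports.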
13.  The MOLGENIS toolkit: rapid prototyping of biosoftware at the push of a button 
BMC Bioinformatics  2010;11(Suppl 12):S12.
Background
There is a huge demand on bioinformaticians to provide their biologists with user friendly and scalable software infrastructures to capture, exchange, and exploit the unprecedented amounts of new *omics data. We here present MOLGENIS, a generic, open source, software toolkit to quickly produce the bespoke MOLecular GENetics Information Systems needed.
Methods
The MOLGENIS toolkit provides bioinformaticians with a simple language to model biological data structures and user interfaces. At the push of a button, MOLGENIS’ generator suite automatically translates these models into a feature-rich, ready-to-use web application including database, user interfaces, exchange formats, and scriptable interfaces. Each generator is a template of SQL, Java, R, or HTML code that would require much effort to write by hand. This ‘model-driven’ method ensures reuse of best practices and improves quality because the modeling language and generators are shared between all MOLGENIS applications, so that errors are found quickly and improvements are shared easily by a re-generation. A plug-in mechanism ensures that both the generator suite and generated product can be customized just as much as hand-written software.
Results
In recent years we have successfully evaluated the MOLGENIS toolkit for the rapid prototyping of many types of biomedical applications, including next-generation sequencing, GWAS, QTL, proteomics and biobanking. Writing 500 lines of model XML typically replaces 15,000 lines of hand-written programming code, which allows for quick adaptation if the information system is not yet to the biologist’s satisfaction. Each application generated with MOLGENIS comes with an optimized database back-end, user interfaces for biologists to manage and exploit their data, programming interfaces for bioinformaticians to script analysis tools in R, Java, SOAP, REST/JSON and RDF, a tab-delimited file format to ease upload and exchange of data, and detailed technical documentation. Existing databases can be quickly enhanced with MOLGENIS generated interfaces using the ‘ExtractModel’ procedure.
Conclusions
The MOLGENIS toolkit provides bioinformaticians with a simple model to quickly generate flexible web platforms for all possible genomic, molecular and phenotypic experiments with a richness of interfaces not provided by other tools. All the software and manuals are available free as LGPLv3 open source at http://www.molgenis.org.
doi:10.1186/1471-2105-11-S12-S12
PMCID: PMC3040526  PMID: 21210979
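The 'model-driven' approach described in the abstract above, where a compact model is translated by generators into much longer code, can be illustrated with a toy generator. This is only a sketch of the general idea: it uses a Python dictionary rather than MOLGENIS model XML, it emits only SQL, and the `Model` type, `generate_sql` function, and `Sample` entity are hypothetical and not part of MOLGENIS.

```python
from typing import Dict, List, Tuple

# Hypothetical model format: entity name -> list of (field name, SQL type).
# This is not the MOLGENIS modelling language.
Model = Dict[str, List[Tuple[str, str]]]


def generate_sql(model: Model) -> str:
    """Render one CREATE TABLE statement per entity in the model."""
    statements = []
    for entity, fields in model.items():
        columns = ",\n".join(f"  {name} {sql_type}" for name, sql_type in fields)
        statements.append(f"CREATE TABLE {entity} (\n{columns}\n);")
    return "\n\n".join(statements)


example_model: Model = {
    "Sample": [("id", "INTEGER PRIMARY KEY"), ("name", "TEXT")],
}
print(generate_sql(example_model))
```

Because every output is derived from the same model, changing a field in one place and regenerating updates all generated artefacts together, which is the reuse benefit the abstract describes.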
14.  Locus Reference Genomic sequences: an improved basis for describing human DNA variants 
Genome Medicine  2010;2(4):24.
As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specific purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-file record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)-approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants affecting human health. Further information can be found on the LRG web site: http://www.lrg-sequence.org.
doi:10.1186/gm145
PMCID: PMC2873802  PMID: 20398331
16.  Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones 
Imanishi, Tadashi | Itoh, Takeshi | Suzuki, Yutaka | O'Donovan, Claire | Fukuchi, Satoshi | Koyanagi, Kanako O | Barrero, Roberto A | Tamura, Takuro | Yamaguchi-Kabata, Yumi | Tanino, Motohiko | Yura, Kei | Miyazaki, Satoru | Ikeo, Kazuho | Homma, Keiichi | Kasprzyk, Arek | Nishikawa, Tetsuo | Hirakawa, Mika | Thierry-Mieg, Jean | Thierry-Mieg, Danielle | Ashurst, Jennifer | Jia, Libin | Nakao, Mitsuteru | Thomas, Michael A | Mulder, Nicola | Karavidopoulou, Youla | Jin, Lihua | Kim, Sangsoo | Yasuda, Tomohiro | Lenhard, Boris | Eveno, Eric | Suzuki, Yoshiyuki | Yamasaki, Chisato | Takeda, Jun-ichi | Gough, Craig | Hilton, Phillip | Fujii, Yasuyuki | Sakai, Hiroaki | Tanaka, Susumu | Amid, Clara | Bellgard, Matthew | de Fatima Bonaldo, Maria | Bono, Hidemasa | Bromberg, Susan K | Brookes, Anthony J | Bruford, Elspeth | Carninci, Piero | Chelala, Claude | Couillault, Christine | de Souza, Sandro J. | Debily, Marie-Anne | Devignes, Marie-Dominique | Dubchak, Inna | Endo, Toshinori | Estreicher, Anne | Eyras, Eduardo | Fukami-Kobayashi, Kaoru | R. Gopinath, Gopal | Graudens, Esther | Hahn, Yoonsoo | Han, Michael | Han, Ze-Guang | Hanada, Kousuke | Hanaoka, Hideki | Harada, Erimi | Hashimoto, Katsuyuki | Hinz, Ursula | Hirai, Momoki | Hishiki, Teruyoshi | Hopkinson, Ian | Imbeaud, Sandrine | Inoko, Hidetoshi | Kanapin, Alexander | Kaneko, Yayoi | Kasukawa, Takeya | Kelso, Janet | Kersey, Paul | Kikuno, Reiko | Kimura, Kouichi | Korn, Bernhard | Kuryshev, Vladimir | Makalowska, Izabela | Makino, Takashi | Mano, Shuhei | Mariage-Samson, Regine | Mashima, Jun | Matsuda, Hideo | Mewes, Hans-Werner | Minoshima, Shinsei | Nagai, Keiichi | Nagasaki, Hideki | Nagata, Naoki | Nigam, Rajni | Ogasawara, Osamu | Ohara, Osamu | Ohtsubo, Masafumi | Okada, Norihiro | Okido, Toshihisa | Oota, Satoshi | Ota, Motonori | Ota, Toshio | Otsuki, Tetsuji | Piatier-Tonneau, Dominique | Poustka, Annemarie | Ren, Shuang-Xi | Saitou, Naruya | Sakai, Katsunaga | Sakamoto, Shigetaka | Sakate, Ryuichi | Schupp, Ingo | Servant, Florence | Sherry, Stephen | Shiba, Rie | Shimizu, Nobuyoshi | Shimoyama, Mary | Simpson, Andrew J | Soares, Bento | Steward, Charles | Suwa, Makiko | Suzuki, Mami | Takahashi, Aiko | Tamiya, Gen | Tanaka, Hiroshi | Taylor, Todd | Terwilliger, Joseph D | Unneberg, Per | Veeramachaneni, Vamsi | Watanabe, Shinya | Wilming, Laurens | Yasuda, Norikazu | Yoo, Hyang-Sook | Stodolsky, Marvin | Makalowski, Wojciech | Go, Mitiko | Nakai, Kenta | Takagi, Toshihisa | Kanehisa, Minoru | Sakaki, Yoshiyuki | Quackenbush, John | Okazaki, Yasushi | Hayashizaki, Yoshihide | Hide, Winston | Chakraborty, Ranajit | Nishikawa, Ken | Sugawara, Hideaki | Tateno, Yoshio | Chen, Zhu | Oishi, Michio | Tonellato, Peter | Apweiler, Rolf | Okubo, Kousaku | Wagner, Lukas | Wiemann, Stefan | Strausberg, Robert L | Isogai, Takao | Auffray, Charles | Nomura, Nobuo | Gojobori, Takashi | Sugano, Sumio
PLoS Biology  2004;2(6):e162.
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
An international team has systematically validated and annotated just over 21,000 human genes using full-length cDNA, thereby providing a valuable new resource for the human genetics community.
doi:10.1371/journal.pbio.0020162
PMCID: PMC393292  PMID: 15103394
