HGVbase input data is acquired from multiple sources, including public databases, the literature, our own and collaborative discovery efforts, and direct submissions provided by the community. In addition to the variations themselves, allele frequency data in different populations (even for previously known and well-studied variations) is highly valuable information, and its supply is very much appreciated. We make no claim of ownership of any underlying HGVbase data, though our specific compilation and representation of it are subject to our copyleft and ownership claims (ftp://ftp.ebi.ac.uk/pub/databases/variantdbs/hgvbase/LICENSE). These are designed to ensure that this resource remains freely available to everyone for research purposes.
A great deal of HGVbase data originates from various public polymorphism and mutation databases. Approval is always sought before data is harvested from any other depository. To date, without exception, every public resource we have approached has been willing and proactive in helping us to access and process their information. No attempt has yet been made to acquire any private-domain lists of sequence variations. Bidirectional data exchange with dbSNP (1) was established at the end of the year 2000, though we only incorporate those records that succeed in passing all of our quality requirements summarized above.
Beyond the harvesting of large public datasets, individual researchers make frequent submissions to HGVbase. Smaller lists of variants (up to tens of records) may be submitted as Microsoft Excel or Microsoft Word submission sheets that are filled in locally and then emailed to us. For larger submissions (up to a few thousand records) we will work with the submitter to create purpose-built software tailored towards extracting data from whatever format is convenient for the submitter. In both of these cases, we strive to manually curate (with the aid of established purpose-built visualization and data processing software) all aspects of the supplied data to ensure absolute data consistency and full coverage according to our list of record features summarized below. For even larger datasets, and where regular submissions might be anticipated (e.g. dbSNP downloads), we would establish fully automated curation procedures.
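The three submission routes described above can be summarized as a simple decision rule. The following Python sketch is purely illustrative and is not part of the HGVbase software: the function name `choose_route`, the `recurring` flag, and the numeric cut-offs are hypothetical, chosen only to stand in for "up to tens" and "up to a few thousand" records. It also assumes that either a very large dataset or an anticipated regular submission is enough to justify automated curation.

```python
from enum import Enum


class CurationRoute(Enum):
    SUBMISSION_SHEET = "Excel/Word submission sheet, manually curated"
    CUSTOM_EXTRACTION = "purpose-built extraction software, manually curated"
    AUTOMATED = "fully automated curation procedure"


# Hypothetical cut-offs: the text says only "up to tens" and
# "up to a few thousand" records, so these numbers are assumptions.
SHEET_MAX_RECORDS = 99
CUSTOM_MAX_RECORDS = 5000


def choose_route(record_count: int, recurring: bool = False) -> CurationRoute:
    """Pick a submission route from the dataset size.

    `recurring` marks sources from which regular submissions are
    anticipated (for example, periodic dbSNP downloads).
    """
    if record_count < 1:
        raise ValueError("a submission must contain at least one record")
    # Assumption: very large OR regularly repeated submissions are automated.
    if recurring or record_count > CUSTOM_MAX_RECORDS:
        return CurationRoute.AUTOMATED
    if record_count <= SHEET_MAX_RECORDS:
        return CurationRoute.SUBMISSION_SHEET
    return CurationRoute.CUSTOM_EXTRACTION
```

Under these assumed cut-offs, `choose_route(12)` selects the submission sheet, `choose_route(2500)` selects purpose-built extraction, and any recurring source is routed to automated curation regardless of size.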
From 1998 to early 2001 we worked hard to manually extract new variations and related information from the published literature every week. Recent dramatic expansion of research in the field has put this task beyond the scope of our available manpower, and so we are now working to establish more automated procedures whereby authors of such papers will automatically be contacted and asked to make pre-formatted data submissions of the information they have published. These submissions will then be automatically validated as much as possible, and entered into HGVbase.
Details of data sources (and submitter contact details where appropriate) are provided within each HGVbase record. Existing HGVbase records as of September 2001 are a composite of information from 791 different sources that provided data as follows: 714 publications, 142 batch submissions, plus information from 30 web databases (AD Study Results Database, Albinism Database, ALFRED Database, Androgen Receptor Mutation Database, Ataxia-Telangiectasia Mutation Database, Breast Cancer Mutation Database, Canvas Database, CGAP-GAI Database, Cystic Fibrosis Mutation Database, Cytokine Gene Polymorphism Database, dbSNP Database, EGP Database, Factor VIII Database, Fanconi Anemia Mutation Database, GM2 Gangliosidoses Database, GSD II Database, Human Gene Mutation Database, Human Ornithine Transcarbamylase Database, Human Type I and Type III Collagen Mutation Database, Hypertension Candidate Gene SNP Database, ICG-HNPCC Database, IMS-JST Database, Leiden Muscular Dystrophy Database, Neuronal Ceroid Lipofuscinoses NCL Mutations Database, Online Mendelian Inheritance in Man Database, p53 Database, Phenylalanine Hydroxylase Locus Database, von Willebrand Factor Database, Whitehead SNP Database, WRN Mutations Database).