In addition to disease-causing mutations, HGMD seeks to include polymorphic DNA sequence variants that are either disease-associated and of direct functional significance, or of clear functional significance even though an associated clinical phenotype has yet to be identified. At present, these polymorphic variants comprise about 5% of HGMD data and approximately 55% of these are 'disease-associated'. The remainder represent variants that, despite manifesting no demonstrable disease association, have nevertheless been shown to significantly alter the expression of a gene or the structure/function of the gene product. Although functional polymorphisms with no known disease association do not have any immediate clinical relevance, these data are potentially very valuable in terms of understanding inter-individual differences in disease susceptibility.
Although the vast majority of polymorphic variants in HGMD are single-nucleotide polymorphisms (SNPs), a small number are of the insertion/deletion type. The polymorphic variants logged in HGMD are generally located in either the gene promoter or coding regions. However, it should be noted that SNPs occurring outside of these regions may nevertheless still have consequences for gene expression, splicing or transcription-factor binding. Polymorphic variants affecting individual drug response [14], patient survival times after diagnosis and responses to surgical intervention are not generally included in HGMD. Studies that simply report SNPs [15] in association with disease (and hence are likely to represent merely a linkage disequilibrium effect), but with no additional evidence of direct functional involvement of the variants in question, are also not included. Reports of haplotypes associated with an increased risk of disease are not included unless there is some indication as to precisely which variant(s) within the haplotype is/are responsible for the disease association or functional effect.
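The inclusion and exclusion rules described above can be summarized as a small decision function. The following Python sketch is illustrative only: the class, field and label names are hypothetical and do not correspond to any actual HGMD schema or software, and the rules are deliberately simplified (for example, the text says out-of-scope phenotypes are "not generally" included, whereas the sketch always excludes them).

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    # Hypothetical labels, not HGMD's own variant classes.
    INCLUDE_DISEASE_ASSOCIATED = "include: disease-associated functional polymorphism"
    INCLUDE_FUNCTIONAL_ONLY = "include: functional polymorphism, no known disease association"
    EXCLUDE = "exclude"


@dataclass
class PolymorphismReport:
    # Hypothetical flags a curator might record for one published report.
    has_functional_evidence: bool           # alters gene expression or product structure/function
    has_disease_association: bool           # reported in association with a clinical phenotype
    causal_variant_unresolved: bool = False  # haplotype reported, responsible variant(s) not pinpointed
    phenotype_out_of_scope: bool = False     # drug response, survival time or response to surgery


def screen(report: PolymorphismReport) -> Decision:
    """Apply the simplified inclusion rules to a single report."""
    if report.phenotype_out_of_scope:
        return Decision.EXCLUDE
    if report.causal_variant_unresolved:
        return Decision.EXCLUDE
    if not report.has_functional_evidence:
        # Association alone may merely reflect linkage disequilibrium.
        return Decision.EXCLUDE
    if report.has_disease_association:
        return Decision.INCLUDE_DISEASE_ASSOCIATED
    return Decision.INCLUDE_FUNCTIONAL_ONLY
```

In practice such decisions rest on curator judgment; the sketch only makes the order in which the criteria are applied explicit.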
In some instances, the above criteria may be only partially satisfied, such that the HGMD curators remain unconvinced as to the clinical phenotypic relevance of the reported polymorphic variant. In such cases, the polymorphism may nevertheless still be included (i) as a result of supporting information becoming available subsequent to the publication of the original report, or (ii) because the associated gene/disease state was deemed to be of sufficient importance for it to warrant further study. Such variants are generally ascribed the descriptor 'association with?' to indicate that some degree of uncertainty is involved. The difficulty inherent in making decisions regarding the inclusion or exclusion of variants that have potential disease associations highlights the need for a methodical and methodologically uniform approach to assessing such reports as they appear in the literature [16].
Several other databases [17] have attempted to collate known polymorphism-disease associations but have met with only partial success owing to an over-reliance on computerized search procedures and automated data collection. This methodology tends to result in a database that comprises either verbatim, and often inconsistent, records of the disease-associated variants, or merely a list of PubMed citations rather than the actual variants in question. Polymorphism-disease association data collated in this way are also likely to comprise markers that occur in linkage disequilibrium with the presumed disease-associated/functional variants rather than being of functional significance themselves. We on the HGMD team believe that a manually curated database provides a better solution. Indeed, HGMD is currently the only database that focuses specifically on the collation of functional/disease-associated polymorphic variants to the exclusion of linkage markers.
A current limitation with regard to recording disease-associated polymorphic variants of functional significance within HGMD is the inclusion of only a single literature reference for each variant. A large proportion of those papers reporting a novel association between a disease and a polymorphic variant do not include functional data on that variant. HGMD will in the future address this by implementing a dual referencing system for polymorphisms: reference 1 will correspond to the first report demonstrating a functional effect (or disease-association) that meets the HGMD inclusion criteria, whereas reference 2 will (where appropriate) provide evidence of the first disease-association (or functional effect) of the polymorphism.
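As an illustration of the planned dual referencing system, the sketch below models a polymorphism entry that carries two references with complementary evidence types. It is a hypothetical data structure, not HGMD's actual implementation: the names `Citation`, `PolymorphismEntry`, `source_id` and `add_complementary` are invented for this example, and it assumes that candidate citations are offered in order of publication, so that the first one accepted is the earliest.

```python
from dataclasses import dataclass
from typing import Optional

EVIDENCE_TYPES = ("functional", "disease-association")


@dataclass(frozen=True)
class Citation:
    source_id: str  # placeholder identifier for a literature report
    evidence: str   # one of EVIDENCE_TYPES

    def __post_init__(self) -> None:
        if self.evidence not in EVIDENCE_TYPES:
            raise ValueError(f"unknown evidence type: {self.evidence!r}")


@dataclass
class PolymorphismEntry:
    variant: str
    reference_1: Citation                   # first report meeting the inclusion criteria
    reference_2: Optional[Citation] = None  # first report of the complementary evidence

    def add_complementary(self, citation: Citation) -> None:
        """Record reference 2 if it supplies the evidence type reference 1 lacks."""
        if citation.evidence == self.reference_1.evidence:
            raise ValueError("reference 2 must supply the complementary evidence type")
        if self.reference_2 is None:  # keep only the first such report
            self.reference_2 = citation


# Usage with placeholder values (not real HGMD data):
entry = PolymorphismEntry("example-variant", Citation("report-A", "functional"))
entry.add_complementary(Citation("report-B", "disease-association"))
```

Keeping `reference_2` optional reflects the "where appropriate" qualification: a functional polymorphism with no known disease association would carry only reference 1.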