In the present article, we report an update of the HmtDB genomic resource (8), a database intended to support both population geneticists and clinicians who need to assess the pathogenicity of specific mtDNA mutations. The major improvements reported here concern the query system and the classifier tools. The increasing number of complete and fragmented human mitochondrial genomes being published requires updated and well-structured resources that give real-time access to newly produced data. In this context, the Phylotree web system, integrated with HmtDB, provides valid support, making the haplogroup prediction of both newly produced and previously published genomes fine-grained and reliable. Indeed, once new genomes are sequenced, browsing the database and applying the classifiers become pivotal steps in verifying the quality of the sequence. In attempting to assign a haplogroup to the submitted genome, the classify tool may return information that points to an inappropriate haplogroup, and the reason may lie in a misreading of the sequence. In light of the many cases of misinterpretation due to contamination by nuclear mitochondrial sequences (NumtS), phantom mutations and recombination artifacts, as raised in Refs (22–25), it is reasonable to wonder whether the growing number of human mitochondrial genomes substantially advances knowledge or merely adds noise (22). A critical examination of the classify results may highlight the absence or presence of alleles that would allow the genome to be assigned unequivocally to an appropriate haplogroup. This aspect strengthens the value of applying the HmtDB classifier tools both to complete genomes and to fragments. In the latter case, however, the reliability of the predicted haplogroup depends on the length of the fragment and on the region of the genome to which it maps, because the haplogroup-defining sites are spread along the genome, albeit not homogeneously. Thus, one of our main goals is to identify the most informative regions on the basis of the Phylotree data. Even so, the fragment classifier has proved to be a powerful tool when applied to the HmtDB partial genomes (Supplementary File 3). Its application has allowed us to update the haplogroup assignments, taking into account the more detailed current classification, and to revise incorrect attributions. Examples of human mtDNA data revised through the direct use of HmtDB tools are abundant in the recent literature (6, 7, 26, 27). The query system helps researchers to easily retrieve data on variants not listed in other databases (e.g. MITOMAP) when the pathogenic potential of mtDNA variants needs to be predicted. The availability of site variability data also contributes both to the definition of new haplogroups and to the recognition of private variants or mutations with a potential pathogenic role. With respect to amino acid changes, estimating the functional effect with external systems such as PolyPhen (28) or SIFT (29) may further help to evaluate the quality of the sequence as well as the impact on the phenotype. However, disease-associated mutations are not located only in protein-coding loci. Various mutations that map within RNA genes (both tRNAs and rRNAs) have been described as responsible for, or associated with, disease. In these cases, examining the effect on the secondary structure and comparing the site variability within the inter-vertebrate MA ought to be compulsory validation steps for estimating pathogenicity.
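The dependence of fragment-based haplogroup prediction on fragment length and genomic location can be illustrated with a minimal sketch. It is not the HmtDB fragment classifier: the haplogroup names and defining positions below are hypothetical placeholders rather than Phylotree data, and the function names are assumptions made for illustration. The only fixed value is the length of the rCRS reference sequence (16569 bp).

```python
from collections import Counter

GENOME_LENGTH = 16569  # length of the rCRS reference sequence, in bp

# Hypothetical haplogroup-defining positions (placeholders, NOT Phylotree data).
DEFINING_SITES = {
    "HgA": {150, 3000, 9000, 16200},
    "HgB": {150, 7000, 16100, 16300},
}


def covers(start, end, pos):
    """Return True if 1-based position pos lies inside the fragment [start, end].

    A fragment with start > end is assumed to wrap around the circular genome.
    """
    if start <= end:
        return start <= pos <= end
    return pos >= start or pos <= end


def covered_fraction(start, end, sites=DEFINING_SITES):
    """Fraction of each haplogroup's defining sites spanned by the fragment."""
    return {
        hg: sum(covers(start, end, p) for p in positions) / len(positions)
        for hg, positions in sites.items()
    }


def sites_per_window(window=1000, sites=DEFINING_SITES):
    """Count distinct defining positions in consecutive windows of the genome."""
    distinct = set().union(*sites.values())
    return Counter((p - 1) // window for p in distinct)


if __name__ == "__main__":
    # A short fragment near the end of the genome sees only part of the sites.
    print(covered_fraction(16000, GENOME_LENGTH))
    # Window index with the largest number of defining positions.
    print(sites_per_window().most_common(1))
```

With real defining sites, a fragment that covers only a small fraction of the positions defining a haplogroup cannot support a confident assignment, and the window counts give a rough view of which regions are most informative.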
The quality and correct interpretation of the collected data are of utmost importance when estimating pathogenicity. In clinical studies, HmtDB may represent an advanced tool on the way to tailored databases that include all relevant data on genotype, phenotype, family history, healthy controls and functional studies, allowing a more accurate interpretation for clinicians and their patients. To start on this ambitious task, the first issue we intend to tackle is the individual type annotation. In this context, the clear distinction between the healthy and pathologic categories adds important value to the database. Genomes are separated into the two categories according to what is reported in the paper from which they are extracted. Nonetheless, the definition of pathologic as opposed to healthy may be ambiguous, because ‘healthy’ may simply indicate that the subject does not show the pathology for which he/she serves as a control, as usually occurs in case–control studies. Within HmtDB, the pathologic subset indicates that the sequence comes either from a pathologic somatic tissue or from the blood of affected individuals, who in many cases have been screened following a suspicion or a diagnosis of mitochondrial disease. When the mtDNA is sequenced from a pathologic somatic tissue, it is worth underlining that such sequences are not necessarily different from the constitutive mtDNA, unless they harbor proven somatic mutations. Similarly, sequences obtained from the constitutive mtDNA of pathologic individuals may not always harbor germ-line mutations. The healthy category, in turn, includes all mtDNAs for which the health status of the subject is not specified, a common feature of genomes extracted from population-based studies. These genomes are not necessarily devoid of private or pathologic mutations, although the fact that they have mostly been sequenced from blood (or saliva) makes the absence of such mutations very likely. Ultimately, there is still a need for more precise annotation. The lack of specifications, together with the plethora of as yet undiscovered mitochondria-related pathologic conditions, makes it virtually impossible to classify mtDNAs correctly within the two categories, with the exception of a few certain cases. Nevertheless, this division is so far, in our opinion, the most functional one for choosing a population of controls, which should still be carefully selected on the basis of the variability data HmtDB reports.
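The annotation rule described above can be summarised in a short sketch. The record fields and the function name are hypothetical and do not reflect the actual HmtDB schema; the point is only to make explicit that a genome whose source paper does not report the health status falls into the healthy category.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceRecord:
    """What the source paper reports about a sequenced genome.

    Hypothetical fields for illustration; not the HmtDB schema.
    """
    from_pathologic_tissue: bool = False
    donor_affected: Optional[bool] = None  # None: health status not reported


def individual_type(rec: SourceRecord) -> str:
    """Assign the two-way individual type following the rule in the text."""
    if rec.from_pathologic_tissue or rec.donor_affected is True:
        return "pathologic"
    # Covers both reported controls and genomes with unspecified health status.
    return "healthy"
```

Because unspecified and explicitly healthy subjects end up with the same label, anyone selecting controls from the healthy subset would still need to inspect the variability data, as noted above.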
HmtDB is not the only resource where data concerning human mtDNA variability are stored. The MITOMAP home page provides links to the most relevant databases. Nonetheless, upon accessing some of these resources, it becomes evident that they are no longer updated, whereas others host data confined to the HVS1 and HVS2 regions. Only MitoTool (30) presents aspects that overlap with the features implemented in HmtDB. All the others, such as Zaramit (31), Phylotree (10), EMPOP (32) and MITOMAP (11), are either incomplete or provide complementary information. The combined use of all these resources, integrated with HmtDB, has the potential to offer the scientific community a true human ‘mitochondriome’ portal.