The Genographic Project is studying the genetic signatures of ancient human migrations and creating an open-source research database. It allows members of the public to participate in a real-time anthropological genetics study by submitting personal samples for analysis and donating the genetic results to the database. We report our experience from the first 18 months of public participation in the Genographic Project, during which we have created the largest standardized human mitochondrial DNA (mtDNA) database ever collected, comprising 78,590 genotypes. Here, we detail our genotyping and quality assurance protocols including direct sequencing of the mtDNA HVS-I, genotyping of 22 coding-region SNPs, and a series of computational quality checks based on phylogenetic principles. This database is very informative with respect to mtDNA phylogeny and mutational dynamics, and its size allows us to develop a nearest neighbor–based methodology for mtDNA haplogroup prediction based on HVS-I motifs that is superior to classic rule-based approaches. We make available to the scientific community and general public two new resources: a periodically updated database comprising all data donated by participants, and the nearest neighbor haplogroup prediction tool.
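The nearest neighbor idea mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the Genographic Project's actual tool: the reference motifs, the labels `HG_ONE` and `HG_TWO`, the function names, the distance measure (count of differing variants) and the choice of `k` are all assumptions made for the example.

```python
from collections import Counter

def motif_distance(a, b):
    """Number of variants present in one HVS-I motif but not in the other."""
    return len(set(a) ^ set(b))

def predict_haplogroup(query, reference, k=3):
    """Return the majority label among the k reference motifs closest to the query."""
    ranked = sorted(reference, key=lambda entry: motif_distance(query, entry[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented toy reference set: (HVS-I motif, haplogroup label) pairs.
# Neither the motifs nor the labels are taken from a real database.
REFERENCE = [
    ({"16111T", "16223T", "16290T"}, "HG_ONE"),
    ({"16111T", "16223T", "16290T", "16319A"}, "HG_ONE"),
    ({"16223T", "16290T", "16319A"}, "HG_ONE"),
    ({"16189C", "16217C"}, "HG_TWO"),
    ({"16189C", "16217C", "16261T"}, "HG_TWO"),
]

query = {"16111T", "16223T", "16319A"}
print(predict_haplogroup(query, REFERENCE))
```

Unlike a rule-based classifier, which needs a fixed set of diagnostic variants per haplogroup, this approach only needs labeled reference motifs, so it can benefit directly from a larger database.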
The Genographic Project was launched in 2005 to address anthropological questions on a global scale using genetics as a tool. Samples are collected in two ways. First, the project comprises a consortium of ten scientific teams from around the world, united by a core ethical and scientific framework, each responsible for sample collection and analysis in its respective region. Second, the project promotes public participation in countries around the world, and anyone can participate by purchasing a participation kit (Video S1). The mitochondrial DNA (mtDNA), typed in female participants, is inherited from the mother without recombining and is therefore particularly informative with respect to maternal ancestry. Over the first 18 months of public participation in the project we have built up the largest database of mtDNA variants to date, containing 78,590 entries from around the world. Here, we describe the procedures used to generate, manage, and analyze the genetic data, and the first insights from them. These data let us understand new aspects of the structure of the mtDNA tree and develop much better ways of classifying mtDNA. We therefore now release this dataset and the new methods we have developed, and will continue to update them as more people join the Genographic Project.
The assignment of haplogroups to mitochondrial DNA haplotypes contributes substantial value for quality control, not only in forensic genetics but also in population and medical genetics. The availability of Phylotree, a widely accepted phylogenetic tree of human mitochondrial DNA lineages, led to the development of several (semi-)automated software solutions for haplogrouping. However, currently existing haplogrouping tools only make use of haplogroup-defining mutations, whereas private mutations (beyond the haplogroup level) can be additionally informative, allowing for enhanced haplogroup assignment. This is especially relevant in the case of (partial) control region sequences, which are mainly used in forensics. The present study makes three major contributions toward a more reliable, semi-automated estimation of mitochondrial haplogroups. First, a quality-controlled database consisting of 14,990 full mtGenomes downloaded from GenBank was compiled. Together with Phylotree, these mtGenomes serve as a reference database for haplogroup estimates. Second, the concept of fluctuation rates, i.e. a maximum likelihood estimation of the stability of mutations based on 19,171 full control region haplotypes for which raw lane data are available, is presented. Finally, an algorithm for estimating the haplogroup of an mtDNA sequence based on the combined database of full mtGenomes and Phylotree, which also incorporates the empirically determined fluctuation rates, is brought forward. On the basis of examples from the literature and EMPOP, the algorithm is validated, and both the strength of this approach and its utility for quality control of mitochondrial haplotypes are demonstrated.
mtDNA; Haplogroup; EMPOP; Fluctuation rates; Phylotree
Insights into the human mitochondrial phylogeny have been primarily achieved by sequencing full mitochondrial genomes (mtGenomes). In forensic genetics (partial) mtGenome information can be used to assign haplotypes to their phylogenetic backgrounds, which may, in turn, have characteristic geographic distributions that would offer useful information in a forensic case. In addition and perhaps even more relevant in the forensic context, haplogroup-specific patterns of mutations form the basis for quality control of mtDNA sequences. The current method for establishing (partial) mtDNA haplotypes is Sanger-type sequencing (STS), which is laborious, time-consuming, and expensive. With the emergence of Next Generation Sequencing (NGS) technologies, the body of available mtDNA data can potentially be extended much more quickly and cost-efficiently. Customized chemistries, laboratory workflows and data analysis packages could support the community and increase the utility of mtDNA analysis in forensics. We have evaluated the performance of mtGenome sequencing using the Personal Genome Machine (PGM) and compared the resulting haplotypes directly with conventional Sanger-type sequencing. A total of 64 mtGenomes (>1 million bases) were established that yielded high concordance with the corresponding STS haplotypes (<0.02% differences). About two-thirds of the differences were observed in or around homopolymeric sequence stretches. In addition, the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of alignment software would be desirable to facilitate the application of NGS in mtDNA forensic genetics.
Next Generation Sequencing; mtDNA genomes; Heteroplasmy; Sanger-type sequencing; PGM; Forensic science
Genetic variation on the non-recombining portion of the Y chromosome contains information about the ancestry of male lineages. Because of their low rate of mutation, single nucleotide polymorphisms (SNPs) are the markers of choice for unambiguously classifying Y chromosomes into related sets of lineages known as haplogroups, which tend to show geographic structure in many parts of the world. However, performing the large number of SNP genotyping tests needed to properly infer haplogroup status is expensive and time consuming. A novel alternative for assigning a sampled Y chromosome to a haplogroup is presented here. We show that by applying modern machine-learning algorithms we can infer with high accuracy the proper Y chromosome haplogroup of a sample by scoring a relatively small number of Y-linked short tandem repeats (STRs). Learning is based on a diverse ground-truth data set comprising pairs of SNP test results (haplogroup) and corresponding STR scores. We apply several independent machine-learning methods in tandem to learn formal classification functions. The result is an integrated high-throughput analysis system that automatically classifies large numbers of samples into haplogroups in a cost-effective and accurate manner.
The Y chromosome is passed on from father to son as a nearly identical copy. Occasionally, small random changes occur in the Y DNA sequences that are passed forward to the next generation. There are two kinds of changes that may occur, and they both provide vital information for the study of human ancestry. Of the two kinds, one is a single letter change, and the other is a change in the number of short tandemly repeating sequences. The single-letter changes can be laborious to test, but they provide information on deep ancestry. Measuring the number of sequence repeats at multiple places in the genome simultaneously is efficient, and provides information about recent history at a modest cost. We present the novel approach of training a collection of modern machine-learning algorithms with these sequence repeats to infer the single-letter changes, thus assigning the samples to deep ancestry lineages.
Hypertrophic cardiomyopathy (HCM) is a genetic disorder caused by mutations in genes coding for proteins involved in sarcomere function. The disease is associated with mitochondrial dysfunction. Evolutionarily developed variation in mitochondrial DNA (mtDNA), defining mtDNA haplogroups and haplogroup clusters, is associated with functional differences in mitochondrial function and susceptibility to various diseases, including ischemic cardiomyopathy. We hypothesized that mtDNA haplogroups, in particular H, J and K, might modify disease susceptibility to HCM. Mitochondrial DNA, isolated from blood, was sequenced and haplogroups identified in 91 probands with HCM. The association with HCM was ascertained using two Danish control populations. Haplogroup H was more prevalent in HCM patients, 60% versus 46% (p = 0.006) and 41% (p = 0.003), in the two control populations. Haplogroup J was less prevalent, 3% vs. 12.4% (p = 0.017) and 9.1% (p = 0.06). Likewise, the UK haplogroup cluster was less prevalent in HCM, 11% vs. 22.1% (p = 0.02) and 22.8% (p = 0.04). These results indicate that haplogroup H constitutes a susceptibility factor and that haplogroup J and haplogroup cluster UK are protective factors in the development of HCM. Thus, constitutive differences in mitochondrial function may influence the occurrence and clinical presentation of HCM. This could explain some of the phenotypic variability in HCM. The fact that haplogroups H and J are also modifying factors in ischemic cardiomyopathy suggests that mtDNA haplotypes may be of significance in determining whether a physiological hypertrophy develops into myopathy. mtDNA haplotypes may have the potential of becoming significant biomarkers in cardiomyopathy.
Although the functional consequences of mitochondrial DNA (mtDNA) genetic backgrounds (haplotypes, haplogroups) have been demonstrated by both disease association studies and cell culture experiments, it is not clear which of the mutations within the haplogroup carry functional implications and which are “evolutionary silent hitchhikers”. We set forth to study the functionality of haplogroup-defining mutations within the mtDNA transcription/replication regulatory region by in vitro transcription, hypothesizing that haplogroup-defining mutations occurring within regulatory motifs of mtDNA could affect these processes. We thus screened >2500 complete human mtDNAs representing all major populations worldwide for natural variation in experimentally established protein binding sites and regulatory regions comprising a total of 241 bp in each mtDNA. Our screen revealed 77/241 sites showing point mutations that could be divided into non-fixed (57/77, 74%) and haplogroup/sub-haplogroup-defining changes (i.e., population fixed changes, 20/77, 26%). The variant defining Caucasian haplogroup J (C295T) increased the binding of TFAM (Electro Mobility Shift Assay) and the capacity of in vitro L-strand transcription, especially of a shorter transcript that maps immediately upstream of conserved sequence block 1 (CSB1), a region associated with RNA priming of mtDNA replication. Consistent with this finding, cybrids (i.e., cells sharing the same nuclear genetic background but differing in their mtDNA backgrounds) harboring haplogroup J mtDNA had a >2 fold increase in mtDNA copy number, as compared to cybrids containing haplogroup H, with no apparent differences in steady state levels of mtDNA-encoded transcripts. Hence, a haplogroup J regulatory region mutation affects mtDNA replication or stability, which may partially account for the phenotypic impact of this haplogroup. 
Our analysis thus demonstrates, for the first time, the functional impact of particular mtDNA haplogroup-defining control region mutations, paving the path towards assessing the functionality of both fixed and un-fixed genetic variants in the mitochondrial genome.
Mitochondria, the ‘power plant’ of the cell, have their own distinct genome (mtDNA), whose sequence varies among individuals around the globe. This variation, which was formed by the accumulation of mutations (variants) during the course of evolution, appears to alter the susceptibility to common complex diseases (such as Parkinson's disease and diabetes). However, since the accumulation of mtDNA mutations over time results in the formation of new combinations (genetic backgrounds), it is not clear which of the mutations are functional and which are “evolutionary silent hitchhikers”. Thus we aimed at assessing the functionality of mtDNA genetic variants, focusing on variants within the mtDNA regulatory region, hypothesizing that they could affect mtDNA activity and maintenance. We found that a variant defining mtDNA genetic background ‘J’ significantly increased the transcriptional efficiency and elevated mtDNA copy numbers in cells, as compared to other genetic backgrounds. Hence, mtDNA regulatory region variants can affect mtDNA maintenance, which may partially account for the involvement of this genetic background in disease susceptibility. Our analysis demonstrates, for the first time, the functional impact of a particular mtDNA variant that was fixed during evolution. Moreover, our findings underline the functionality of mtDNA variants in the evolutionary variable regulatory region.
Recent studies have shown that mtDNA background could affect the clinical expression of Leber hereditary optic neuropathy (LHON). We analyzed the mitochondrial DNA (mtDNA) variation of 304 Chinese patients with m.11778G>A (sample #1) and of 843 suspected LHON patients who lack the three primary mutations (sample #2) to discern the effect of mtDNA haplogroup on disease onset. Haplogroup frequencies in the patient groups were compared to frequencies in the general Han Chinese population (n = 1,689; sample #3). The overall matrilineal composition of the suspected LHON population resembles that of the general Han Chinese population, suggesting no association with mtDNA haplogroup. In contrast, analysis of the LHON patients with m.11778G>A confirms an mtDNA haplogroup effect on LHON. Specifically, the LHON sample differs significantly from the general Han Chinese and suspected LHON populations by harboring a much lower frequency of haplogroup R9, in particular of its main sub-haplogroup F (#1 vs. #3, P-value = 1.46 × 10^-17, OR = 0.051, 95% CI: 0.016–0.162; #1 vs. #2, P-value = 4.44 × 10^-17, OR = 0.049, 95% CI: 0.015–0.154; in both cases, adjusted P-value < 10^-5) and higher frequencies of M7b (#1 vs. #3, adjusted P-value = 0.001 and #1 vs. #2, adjusted P-value = 0.004). Our results show that mtDNA background affects LHON in Chinese patients with m.11778G>A but not in suspected LHON patients. Haplogroup F has a protective effect against LHON, while M7b is a risk factor.
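To make the reported odds ratios and confidence intervals concrete, the sketch below computes an odds ratio with a Wald (Woolf logit) 95% interval from a 2×2 table. The abstract does not give the underlying counts or say which interval method was used, so the counts and the function name `odds_ratio_ci` are hypothetical and serve only to show the arithmetic.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf logit) confidence interval for a 2x2 table.

    a = cases carrying the haplogroup,    b = cases not carrying it
    c = controls carrying the haplogroup, d = controls not carrying it
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, NOT taken from the study.
or_value, lower, upper = odds_ratio_ci(a=10, b=90, c=50, d=150)
print(f"OR = {or_value:.3f}, 95% CI: {lower:.3f}-{upper:.3f}")
```

An interval that lies entirely below 1, as reported for haplogroup F, is what supports reading a haplogroup as protective; an interval entirely above 1 would indicate a risk factor.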
To perform a genetic characterization of 7 skeletons from the medieval age found in a burial site in the Aragonese Pyrenees.
Allele frequencies of autosomal short tandem repeats (STR) loci were determined by 3 different STR systems. Mitochondrial DNA (mtDNA) and Y-chromosome haplogroups were determined by sequencing of the hypervariable segment 1 of mtDNA and typing of phylogenetic Y chromosome single nucleotide polymorphisms (Y-SNP) markers, respectively. Possible familial relationships were also investigated.
Complete or partial STR profiles were obtained in 3 of the 7 samples. Mitochondrial DNA haplogroup was determined in 6 samples, with 5 of them corresponding to the haplogroup H and 1 to the haplogroup U5a. Y-chromosome haplogroup was determined in 2 samples, corresponding to the haplogroup R. In one of them, the sub-branch R1b1b2 was determined. mtDNA sequences indicated that some of the individuals could be maternally related, while STR profiles indicated no direct family relationships.
Despite the antiquity of the samples and the great difficulty that such genetic analyses entail, the combined use of autosomal STR markers, Y-chromosome informative SNPs, and mtDNA sequences allowed us to genotype a group of skeletons from the medieval age.
Mitochondrial DNA (mtDNA) variation has recently been suggested to be associated with the risk of various cancers, including prostate cancer, in human populations. Since mtDNA is haploid and lacks recombination, specific mutations in the mtDNA genome associated with human diseases arise and remain in particular genetic backgrounds referred to as haplogroups. To assess the possible contribution of mtDNA haplogroup-specific mutations to the occurrence of prostate cancer, we performed a population-based study of prostate cancer cases and corresponding controls from the Korean population. No statistically significant difference in the distribution of mtDNA haplogroup frequencies was observed between the case and control groups of Koreans. Thus, our data imply that specific mtDNA mutations/lineages do not appear to have a significant effect on predisposition to prostate cancer in the Korean population, although larger sample sizes are necessary to validate our results.
Mitochondrial DNA (mtDNA) is widely being used for population genetics, forensic DNA fingerprinting and clinical disease association studies. The recent past has uncovered severe problems with mtDNA genotyping, not only due to the genotyping method itself, but mainly to the post-lab transcription, storage and report of mtDNA genotypes.
eCOMPAGT, a system to store, administer and connect phenotype data to all kinds of genotype data, is now enhanced by the possibility of storing mtDNA profiles and allowing their validation, linking to phenotypes and export in numerous formats. mtDNA profiles can be imported from different sequence evaluation programs and compared between evaluations, and their haplogroup affiliations can be stored. Furthermore, eCOMPAGT has been improved in its sophisticated transparency (support of MySQL and Oracle), security aspects (by using database technology) and the option to import, manage and store genotypes derived from various genotyping methods (SNPlex, TaqMan, and STRs). It is a software solution designed for project management, laboratory work and the evaluation process all-in-one.
The extended mtDNA version of eCOMPAGT was designed to enable error-free post-laboratory data handling of human mtDNA profiles. This software is suited for small to medium-sized human genetic, forensic and clinical genetic laboratories. The direct support of MySQL and the improved database security options render eCOMPAGT a powerful tool to build an automated workflow architecture for several genotyping methods. eCOMPAGT is freely available at http://dbis-informatik.uibk.ac.at/ecompagt.
The assignment of DNA samples to coarse population groups can be a useful but difficult task. One such example is the inference of coarse ethnic groupings for forensic applications. Ethnicity plays an important role in forensic investigation and can be inferred with the help of genetic markers. Because it is maternally inherited, present in high copy number, and robustly persistent in degraded samples, mitochondrial DNA may be useful for inferring coarse ethnicity. In this study, we compare the performance of methods for inferring ethnicity from the sequence of the hypervariable region of the mitochondrial genome.
We present the results of comprehensive experiments conducted on datasets extracted from the mtDNA population database, showing that ethnicity inference based on support vector machines (SVM) achieves an overall accuracy of 80-90%, consistently outperforming nearest neighbor and discriminant analysis methods previously proposed in the literature. We also evaluate methods of handling missing data and characterize the most informative segments of the hypervariable region of the mitochondrial genome.
Support vector machines can be used to infer coarse ethnicity from a small region of mitochondrial DNA sequence with surprisingly high accuracy. In the presence of missing data, utilizing only the regions common to the training sequences and a test sequence proves to be the best strategy. Given these results, SVM algorithms are likely to also be useful in other DNA sequence classification applications.
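The missing-data strategy described above, using only the regions common to the training sequences and the test sequence, can be sketched as a preprocessing step. This is an illustrative assumption about how such a restriction might be implemented, not the study's code: the toy sequences, the `N` marker for missing positions, and the helper names `common_columns` and `project` are invented for the example.

```python
def common_columns(train_seqs, test_seq, missing="N"):
    """Indices of aligned columns observed in every training sequence and in the test sequence."""
    all_seqs = list(train_seqs) + [test_seq]
    length = len(test_seq)
    if any(len(seq) != length for seq in all_seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if all(seq[i] != missing for seq in all_seqs)]

def project(seq, columns):
    """Keep only the selected alignment columns of a sequence."""
    return "".join(seq[i] for i in columns)

# Invented toy alignment; "N" marks positions that were not sequenced.
train = ["ACGTACGT", "NCGTACGA", "ACGTTCGT"]
test = "ACGTACNN"

cols = common_columns(train, test)
train_restricted = [project(seq, cols) for seq in train]
test_restricted = project(test, cols)
print(cols, train_restricted, test_restricted)
```

The restricted sequences can then be encoded as features and passed to any classifier, so the model is trained and evaluated on exactly the positions that the test sequence actually covers.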
The Norris Farms No. 36 cemetery in central Illinois has been the subject of considerable archaeological and genetic research. Both mitochondrial DNA (mtDNA) and nuclear DNA have been examined in this 700-year-old population. DNA preservation at the site was good, with about 70% of the samples producing mtDNA results and approximately 15% yielding nuclear DNA data. All four of the major Amerindian mtDNA haplogroups were found, in addition to a fifth haplogroup. Sequences of the first hypervariable region of the mtDNA control region revealed a high level of diversity in the Norris Farms population and confirmed that the fifth haplogroup associates with Mongolian sequences and hence is probably authentic. Other than a possible reduction in the number of rare mtDNA lineages in many populations, it does not appear as if European contact significantly altered patterns of Amerindian mtDNA variation, despite the large decrease in population size that occurred. For nuclear DNA analysis, a novel method for DNA-based sex identification that uses nucleotide differences between the X and Y copies of the amelogenin gene was developed and applied successfully in approximately 20 individuals. Despite the well-known problems of poor DNA preservation and the ever-present possibility of contamination with modern DNA, genetic analysis of the Norris Farms No. 36 population demonstrates that ancient DNA can be a fruitful source of new insights into prehistoric populations.
Five haplogroups have been identified in domestic sheep through global surveys of mitochondrial (mt) sequence variation; however, these group classifications are often based on small fragments of the complete mtDNA sequence, such as the partial control region or the cytochrome b gene. This study presents the complete mitogenome from representatives of each haplogroup identified in domestic sheep, plus a sample of their wild relatives. Comparison of the sequences successfully resolved the relationships between the haplogroups and provided insight into their relationship with wild sheep. The five haplogroups were characterised as branching independently, a radiation that shared a common ancestor 920 000 ± 190 000 years ago based on protein coding sequence. The utility of various mtDNA components to inform the true relationship between sheep was also examined with Bayesian, maximum likelihood and partitioned Bremer support analyses. The control region was found to be the mtDNA component that contributed the highest amount of support to the tree generated using the complete data set. This study provides the nucleus of an mtDNA mitogenome panel, which can be used to assess additional mitogenomes and serve as a reference set to evaluate small fragments of the mtDNA.
Ovis aries; domestication; mitochondria; genome; diversity
MtDNA haplogroups could have important implications for understanding the relationship between mutations of the mitochondrial genome and disease. The distribution of a variety of diseases among these haplogroups has shown that some mitochondrial haplogroups are predisposed to disease. To examine the susceptibility of mtDNA haplogroups to ROU, we sequenced the mtDNA HV1, HV2 and HV3 regions in Chinese ROU patients.
MtDNA haplogroups were analyzed in 249 ROU patients and 237 healthy controls by means of primer extension analysis and DNA sequencing. Haplogroups G1 and H were found to be significantly more abundant in ROU patients than in healthy persons, while haplogroups D5 and R showed a trend toward a higher frequency in controls than in patients. The C-stretch sequence polymorphisms in the mtDNA HV1, HV2 and HV3 regions were found to be diverse.
For the first time, the relationship between mtDNA haplogroups and ROU in Chinese was investigated. Our results indicated that mtDNA haplogroups G1 and H might constitute risk factors for ROU, possibly increasing susceptibility to ROU. Meanwhile, haplogroups D5 and R were indicated as protective factors for ROU. The C-stretch sequence polymorphisms might be unstable and might influence mtDNA replication fidelity.
The Maldives are an 850 km-long string of atolls located centrally in the northern Indian Ocean basin. Because of this geographic situation, the present-day Maldivian population has potential for uncovering genetic signatures of historic migration events in the region. We therefore studied autosomal DNA-, mitochondrial DNA-, and Y-chromosomal DNA markers in a representative sample of 141 unrelated Maldivians, with 119 from six major settlements. We found a total of 63 different mtDNA haplotypes that could be allocated to 29 mtDNA haplogroups, mostly within the M, R, and U clades. We found 66 different Y-STR haplotypes in 10 Y-chromosome haplogroups, predominantly H1, J2, L, R1a1a, and R2. Parental admixture analysis for mtDNA- and Y-haplogroup data indicates a strong genetic link between the Maldive Islands and mainland South Asia, and excludes significant gene flow from Southeast Asia. Paternal admixture from West Asia is detected, but cannot be distinguished from admixture from South Asia. Maternal admixture from West Asia is excluded. Within the Maldives, we find a subtle genetic substructure in all marker systems that is not directly related to geographic distance or linguistic dialect. We found reduced Y-STR diversity and reduced male-mediated gene flow between atolls, suggesting independent male founder effects for each atoll. Detected reduced female-mediated gene flow between atolls confirms a Maldives-specific history of matrilocality. In conclusion, our new genetic data agree with the commonly reported Maldivian ancestry in South Asia, but furthermore suggest multiple, independent immigration events and asymmetrical migration of females and males across the archipelago. Am J Phys Anthropol 151:58–67, 2013. © 2013 Wiley Periodicals, Inc.
Y chromosome; mitochondrial DNA; migration; Indo-Aryan languages; South Asia
An ensemble classifier approach for microRNA precursor (pre-miRNA) classification was
proposed based upon combining a set of heterogeneous algorithms including support vector
machine (SVM), k-nearest neighbors (kNN) and random forest (RF), then aggregating their
predictions through a voting system. Additionally, in the proposed algorithm, the
classification performance was improved using discriminative features,
self-containment and its derivatives, which have shown the unique structural robustness
characteristics of pre-miRNAs. These are applicable across different species. By applying
preprocessing methods—both a correlation-based feature selection (CFS) with genetic
algorithm (GA) search method and a modified-Synthetic Minority Oversampling Technique
(SMOTE) bagging rebalancing method—improvement in the performance of this ensemble
was observed. The overall prediction accuracy obtained via 10 runs of 5-fold cross
validation (CV) was 96.54%, with sensitivity of 94.8% and specificity of
98.3%; this is a better trade-off between sensitivity and specificity than
that of other state-of-the-art methods. The ensemble model was applied to animal, plant
and virus pre-miRNA and achieved high accuracy, >93%. Exploiting the
discriminative set of selected features also suggests that pre-miRNAs possess high
intrinsic structural robustness as compared with other stem loops. Our heterogeneous
ensemble method gave a relatively more reliable prediction than those using single
classifiers. Our program is available at http://ncrna-pred.com/premiRNA.html.
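The voting step of such a heterogeneous ensemble, together with the sensitivity and specificity measures quoted above, can be sketched as follows. This is a minimal illustration under stated assumptions: the three threshold rules are hypothetical stand-ins for trained SVM, kNN and RF models, and the feature vectors and labels are invented rather than taken from the study.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Return the label predicted by most of the base classifiers for sample x."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical base classifiers: simple rules over a two-feature vector.
CLASSIFIERS = [
    lambda x: 1 if x[0] < -20 else 0,
    lambda x: 1 if x[1] > 0.4 else 0,
    lambda x: 1 if x[0] < -25 and x[1] > 0.3 else 0,
]

# Invented samples (feature vectors) and labels: 1 = pre-miRNA, 0 = other stem loop.
X = [(-30, 0.5), (-22, 0.35), (-10, 0.6), (-5, 0.2)]
y = [1, 1, 0, 0]

predictions = [majority_vote(CLASSIFIERS, x) for x in X]
print(predictions, sensitivity_specificity(y, predictions))
```

Using an odd number of base classifiers on a binary task avoids tied votes; with an even number, a tie-breaking rule would be needed.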
Geographic distribution of the genetic diversity in domestic animals, particularly mitochondrial DNA, has often been used to infer centers of domestication. The underlying presumption is that phylogeographic patterns among domesticates were established during, or shortly after, domestication. Human activities are assumed not to have altered the haplogroup frequencies to any great extent. We studied this hypothesis by analyzing 24 mtDNA sequences in ancient Scandinavian dogs. Breeds originating in northern Europe are characterized by a high frequency of mtDNA sequences belonging to a haplogroup that is rare in other populations (HgD). This has been suggested to indicate a possible origin of the haplogroup (perhaps even a separate domestication) in central or northern Europe.
The sequences observed in the ancient samples do not include the haplogroup indicative of northern European breeds (HgD). Instead, several of them correspond to haplogroups that are uncommon in the region today and that are thought to have an Asian origin.
We find no evidence for local domestication. We conclude that interpretation of the processes responsible for current domestic haplogroup frequencies should be carried out with caution if based only on contemporary data. They do not only tell their own story, but also that of humans.
The Koreans are generally considered a northeast Asian group because of their geographical location. However, recent findings from Y chromosome studies showed that the Korean population contains lineages from both southern and northern parts of East Asia. To understand the genetic history and relationships of Korea more fully, additional data and analyses are necessary.
Methodology and Results
We analyzed mitochondrial DNA (mtDNA) sequence variation in the hypervariable segments I and II (HVS-I and HVS-II) and haplogroup-specific mutations in coding regions in 445 individuals from seven east Asian populations (Korean, Korean-Chinese, Mongolian, Manchurian, Han (Beijing), Vietnamese and Thais). In addition, published mtDNA haplogroup data (N = 3307), mtDNA HVS-I sequences (N = 2313), Y chromosome haplogroup data (N = 1697) and Y chromosome STR data (N = 2713) were analyzed to elucidate the genetic structure of East Asian populations. All the mtDNA profiles studied here were classified into subsets of haplogroups common in East Asia, with just two exceptions. In general, the Korean mtDNA profiles revealed similarities to other northeastern Asian populations through analysis of individual haplogroup distributions, genetic distances between populations or an analysis of molecular variance, although a minor southern contribution was also suggested. Reanalysis of Y-chromosomal data confirmed both the overall similarity to other northeastern populations, and also a larger paternal contribution from southeastern populations.
The present work provides evidence that peopling of Korea can be seen as a complex process, interpreted as an early northern Asian settlement with at least one subsequent male-biased southern-to-northern migration, possibly associated with the spread of rice agriculture.
Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction.
The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the Java interface using standard Swing libraries.
We used sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1–BRCA2 samples with the RBF kernel of SVM.
We have developed a Java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance.
The SVM Classifier is available at .
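The radial basis function (RBF) kernel named above can be illustrated with a short sketch. This is not code from SVM Classifier or LIBSVM; the function names `rbf_kernel` and `gram_matrix` and the default `gamma` value are assumptions chosen for illustration. It only shows the standard kernel form K(x, y) = exp(-gamma * ||x - y||^2), which an SVM evaluates between pairs of samples such as expression profiles.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance).

    `gamma` is a free parameter; 0.5 is an arbitrary illustrative default.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(samples, gamma=0.5):
    """Pairwise kernel values for a list of equal-length feature vectors."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]
```

Identical samples give a kernel value of 1, and the value decays toward 0 as samples move apart, with `gamma` controlling how quickly. In practice `gamma` is usually tuned together with the SVM cost parameter.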
To provide a screening tool to reduce time and sample consumption when attempting mtDNA haplogroup typing.
A single base primer extension assay was developed to enable typing, in a single reaction, of twelve mtDNA haplogroup-specific polymorphisms. For validation purposes a total of 147 samples were tested including 73 samples successfully haplogroup typed using mtDNA control region (CR) sequence data, 21 samples inconclusively haplogroup typed by CR data, 20 samples previously haplogroup typed using restriction fragment length polymorphism (RFLP) analysis, and 31 samples of known ancestral origin without previous haplogroup typing. Additionally, two highly degraded human bones embalmed and buried in the early 1950s were analyzed using the single nucleotide polymorphism (SNP) multiplex.
When the SNP multiplex was used to type the 96 previously CR sequenced specimens, an increase in haplogroup or macrohaplogroup assignment relative to conventional CR sequence analysis was observed. The single base extension assay was also successfully used to assign a haplogroup to decades-old, embalmed skeletal remains dating to World War II.
The SNP multiplex was successfully used to obtain haplogroup status of highly degraded human bones, and demonstrated the ability to eliminate possible contributors. The SNP multiplex provides a low-cost, high throughput method for typing of mtDNA haplogroups A, B, C, D, E, F, G, H, L1/L2, L3, M, and N that could be useful for screening purposes for human identification efforts and anthropological studies.
We analyzed the frequency of four mitochondrial DNA haplogroups in 424 individuals from 21 Colombian Amerindian tribes. Our results showed a high degree of mtDNA diversity and genetic heterogeneity. Frequencies of mtDNA haplogroups A and C were high in the majority of populations studied. The distribution of these four mtDNA haplogroups from Amerindian populations was different in the northern region of the country compared to those in the south. Haplogroup A was more frequently found among Amerindian tribes in northern Colombia, while haplogroup D was more frequent among tribes in the south. Haplogroups A, C and D have clinal tendencies in Colombia and South America in general. Populations belonging to the Chibcha linguistic family of Colombia and other countries nearby showed a strong genetic differentiation from the other populations tested, thus corroborating previous findings. Genetically, the Ingano, Paez and Guambiano populations are more closely related to other groups of south eastern Colombia, as also inferred from other genetic markers and from archeological data. Strong evidence for a correspondence between geographical and linguistic classification was found, and this is consistent with evidence that gene flow and the exchange of customs and knowledge and language elements between groups is facilitated by close proximity.
mitochondrial DNA; Amerindian; Colombia; Chibcha; genetic relationships
For the past few years, scientific controversy has surrounded the large number of errors in forensic and literature mitochondrial DNA (mtDNA) data. However, recent research has shown that using mtDNA phylogeny and referring to known mtDNA haplotypes can be useful for checking the quality of sequence data.
We developed a Web-based bioinformatics resource "mtDNAmanager" that offers a convenient interface supporting the management and quality analysis of mtDNA sequence data. The mtDNAmanager performs computations on mtDNA control-region sequences to estimate the most-probable mtDNA haplogroups and retrieves similar sequences from a selected database. By the phased designation of the most-probable haplogroups (both expected and estimated haplogroups), mtDNAmanager enables users to systematically detect errors whilst allowing for confirmation of the presence of clear key diagnostic mutations and accompanying mutations. The query tools of mtDNAmanager also facilitate database screening with two options of "match" and "include the queried nucleotide polymorphism". In addition, mtDNAmanager provides Web interfaces for users to manage and analyse their own data in batch mode.
The mtDNAmanager will provide systematic routines for mtDNA sequence data management and analysis via easily accessible Web interfaces, and thus should be very useful for population, medical and forensic studies that employ mtDNA analysis. mtDNAmanager can be accessed at .
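The two database query options described above, "match" and "include the queried nucleotide polymorphism", can be sketched as set operations. This is a hypothetical illustration and not mtDNAmanager's implementation: the reading of "match" as exact equality and "include" as containment is an assumption, and `parse_haplotype`, `query` and the records in `TOY_DATABASE` are invented for the example.

```python
def parse_haplotype(text):
    """Turn a whitespace-separated string of polymorphisms into a frozenset."""
    return frozenset(text.split())


# Invented toy records for illustration only; not real database entries.
TOY_DATABASE = {
    "S1": parse_haplotype("16223T 16290T 16319A 16362C"),
    "S2": parse_haplotype("16223T 16362C"),
    "S3": parse_haplotype("16189C 16217C"),
}


def query(database, queried, mode="match"):
    """Return sorted sample IDs whose haplotype satisfies the query.

    mode="match":   haplotype is exactly the queried set (assumed meaning).
    mode="include": haplotype contains every queried polymorphism.
    """
    wanted = parse_haplotype(queried)
    if mode == "match":
        def keep(haplotype):
            return haplotype == wanted
    elif mode == "include":
        def keep(haplotype):
            return wanted <= haplotype
    else:
        raise ValueError("mode must be 'match' or 'include'")
    return sorted(sid for sid, hap in database.items() if keep(hap))
```

With the toy records, `query(TOY_DATABASE, "16223T 16362C", "match")` returns only `S2`, while the same query in `"include"` mode also returns `S1`, whose haplotype carries additional polymorphisms.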
When domestic taurine cattle diffused from the Fertile Crescent, local wild aurochsen (Bos primigenius) were still numerous. Moreover, aurochsen and introduced cattle often coexisted for millennia, thus providing potential conditions not only for spontaneous interbreeding, but also for pastoralists to create secondary domestication centers involving local aurochs populations. Recent analyses of mitochondrial genomes revealed that not all modern taurine mtDNAs belong to the shallow macro-haplogroup T of Near Eastern origin, as demonstrated by the detection of three branches (P, Q and R) radiating prior to the T node in the bovine phylogeny. These uncommon haplogroups represent excellent tools to evaluate whether sporadic interbreeding or even additional events of cattle domestication occurred.
The survey of the mitochondrial DNA (mtDNA) control-region variation of 1,747 bovine samples (1,128 new and 619 from previous studies) belonging to 37 European breeds allowed the identification of 16 novel non-T mtDNAs, which after complete genome sequencing were confirmed as members of haplogroups Q and R. These mtDNAs were then integrated in a phylogenetic tree encompassing all available P, Q and R complete mtDNA sequences.
Phylogenetic analyses of 28 mitochondrial genomes belonging to haplogroups P (N = 2), Q (N = 16) and R (N = 10) together with an extensive survey of all previously published mtDNA datasets revealed major similarities between haplogroups Q and T. Therefore, Q most likely represents an additional minor lineage domesticated in the Near East together with the founders of the T subhaplogroups. In contrast, haplogroup R is found, at least for the moment, only in Italy and nowhere else, either in modern or ancient samples, thus supporting an origin from European aurochsen. Haplogroup R could have been acquired through sporadic interbreeding of wild and domestic animals, but our data do not rule out the possibility of a local and secondary event of B. primigenius domestication in Italy.
The genetics and pathophysiology of Alzheimer Disease (AD) and Parkinson Disease (PD) appear complex. However, mitochondrial dysfunction is a common observation in these and other neurodegenerative diseases.
Scope of Review
We argue that the available data on AD and PD can be incorporated into a single integrated paradigm based on mitochondrial genetics and pathophysiology.
Rare chromosomal cases of AD and PD can be interpreted as affecting mitochondrial function, quality control, and mitochondrial DNA (mtDNA) integrity. mtDNA lineages (haplogroups), such as haplogroup H5a, which harbors the mtDNA tRNAGln A8336G variant, are important risk factors for AD and PD. Somatic mtDNA mutations are elevated in AD, PD, and Down Syndrome and Dementia (DSAD), both in brains and systemically. AD, DS, and DSAD brains also have reduced mtDNA ND6 mRNA levels, altered mtDNA copy number, and perturbed Aβ metabolism. Classical AD genetic changes incorporated into the 3XTg-AD (APP, Tau, PS1) mouse result in reduced forebrain size, life-long reduced mitochondrial respiration in 3XTg-AD males, and initially elevated respiration and complex I and IV activities in 3XTg-AD females, which markedly decline with age.
Therefore, mitochondrial dysfunction provides a unifying genetic and pathophysiological explanation for AD, PD, and other neurodegenerative diseases.
Alzheimer Disease; Parkinson Disease; Mitochondria; mtDNA; 3XTg-AD Mouse; oxidative phosphorylation
Mitochondrial disease can be attributed to both mitochondrial and nuclear gene mutations. It has a heterogeneous clinical and biochemical profile, which is compounded by the diversity of the genetic background. Disease-based epidemiological information has expanded significantly in recent decades, but little is known that clarifies the aetiology in African patients. The aim of this study was to investigate mitochondrial DNA variation and pathogenic mutations in the muscle of diagnosed paediatric patients from South Africa. A cohort of 71 South African paediatric patients was included and a high-throughput nucleotide sequencing approach was used to sequence full-length muscle mtDNA. The average coverage of the mtDNA genome was 81±26 per position. After assigning haplogroups, it was determined that although the nature of non-haplogroup-defining variants was similar in African and non-African haplogroup patients, the number of substitutions was significantly higher in African patients. We describe previously reported disease-associated and novel variants in this cohort. We observed a general lack of commonly reported syndrome-associated mutations, which supports clinical observations and confirms general observations in African patients when using single mutation screening strategies based on (predominantly non-African) mtDNA disease-based information. We conclude that this first extensive report on muscle mtDNA sequences in African paediatric patients highlights the need for a full-length mtDNA sequencing strategy, which applies to all populations where specific mutations are not present. This, in addition to nuclear DNA gene mutation and pathogenicity evaluations, will be required to better unravel the aetiology of these disorders in African patients.
mitochondrial DNA; mitochondrial diseases; paediatrics; Africa; high-throughput nucleotide sequencing