Motivation: Y-chromosome short tandem repeats (Y-STRs) are widely used for population studies, forensic purposes and, potentially, the study of disease; knowledge of their mutation rates is therefore valuable. Here we present a novel method for estimating site-specific Y-STR mutation rates from partial phylogenetic information within a maximum likelihood framework.
Results: Given Y-STR data classified into haplogroups, we describe the likelihood of the observed data and develop optimization strategies for deriving maximum likelihood estimates of mutation rates. We apply our method to Y-STR data from two recent papers. We show that our estimates are comparable to, and often more accurate than, those obtained in familial studies, although our data sample is much smaller and was not collected specifically for our study. Furthermore, we obtain mutation rate estimates for DYS388, DYS426 and DYS457, three STRs for which there were no mutation rate measures until now.
The PCR amplification of tetranucleotide short tandem repeat (STR) loci typically produces a minor product band 4 bp shorter than the corresponding main allele band; this is referred to as the stutter band. Sequence analysis of the main and stutter bands for two sample alleles of the STR locus vWA reveals that the stutter band lacks one repeat unit relative to the main allele. Sequencing results also indicate that the number and location of the different 4 bp repeat units vary between samples containing a typical versus low proportion of stutter product. The results also suggest that the proportion of stutter product relative to the main allele increases as the number of uninterrupted core repeat units increases. The sequence analysis and results obtained using various DNA polymerases appear to support the slipped strand displacement model as a potential explanation for how these stutter products are generated.
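The "proportion of stutter product relative to the main allele" described above can be sketched as a simple ratio. This is a minimal illustration, assuming the two products are quantified as peak heights (or band intensities) on a common scale; the function name and the example values are hypothetical and are not taken from the study.

```python
def stutter_ratio(main_height: float, stutter_height: float) -> float:
    """Return the stutter signal as a fraction of the main allele signal.

    Assumption: both signals are measured on the same scale, for example
    peak heights in relative fluorescence units.
    """
    if main_height <= 0:
        raise ValueError("main allele signal must be positive")
    if stutter_height < 0:
        raise ValueError("stutter signal cannot be negative")
    return stutter_height / main_height


# Hypothetical signals for one allele and its stutter product (4 bp shorter).
ratio = stutter_ratio(main_height=2000.0, stutter_height=150.0)
print(f"stutter proportion: {ratio:.1%}")
```

Comparing this ratio across alleles with different numbers of uninterrupted core repeats is one way to examine the trend the abstract reports.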
A feature of Haemophilus influenzae genomes is the presence of several loci containing tracts of six or more identical tetranucleotide repeat units. These repeat tracts are unstable and mediate high-frequency, reversible alterations in the expression of surface antigens. This process, termed phase variation (PV), enables H. influenzae to rapidly adapt to fluctuations in the host environment. Perturbation of lagging strand DNA synthesis is known to destabilize simple sequence repeats in yeast and Escherichia coli. By using a chromosomally located reporter construct, we demonstrated that the mutation of an H. influenzae rnhA (encoding RNase HI) homologue increases the mutation rates of tetranucleotide repeats ∼3-fold. Additionally, deletion of the Klenow domain of DNA polymerase I (PolI) resulted in a ∼35-fold increase in tetranucleotide repeat-mediated PV rates. Deletion of the PolI 5′→3′ exonuclease domain appears to be lethal. The phenotypes of these mutants suggest that delayed or mutagenic Okazaki fragment processing destabilizes H. influenzae tetranucleotide repeat tracts.
The objective of this study was to investigate the quantitative characteristics of short tandem repeat (STR) variations deduced on the basis of the number of STRs that are beneficial for human survival. The longevity group included 60 nonagenarian subjects, and the control group included 250 reference adults (age, 20–50 years). Alleles of 15 Combined DNA Index System STR loci were determined using a commercial polymerase chain reaction kit. An STR with the highest frequency distribution in a population (control group) was considered as a conservative STR, and the number of core unit repeats of this STR allele was considered as the median repeat number in the STR locus (STRm). The absolute difference between the STRm and the number of core unit repeats of other STR alleles can be considered as the quantitative marker of variation for that particular STR allele (M value). The mean M values of CSF1TPO in the longevity group were significantly higher than those in the control group (P < 0.05). These findings appear to suggest that at least one of the STR loci may be associated with longevity. The M value of STR may be a new and high-efficacy genetic marker.
Longevity; STR; Genetic marker; Variation
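The M value defined in the abstract above can be sketched in a few lines. Note that the abstract calls STRm the "median repeat number" but defines it as the repeat number of the most frequent allele in the control group, so the sketch uses the mode. The allele lists are hypothetical, and resolving ties by taking the smallest repeat number is an assumption, since the abstract does not say how ties are handled.

```python
from collections import Counter


def reference_repeat_number(control_alleles):
    """Return STRm: the repeat number of the most frequent control allele.

    Assumption: ties are broken by taking the smallest repeat number.
    """
    counts = Counter(control_alleles)
    top = max(counts.values())
    return min(allele for allele, n in counts.items() if n == top)


def m_values(alleles, strm):
    """Return the M value |repeat number - STRm| for each allele."""
    return [abs(allele - strm) for allele in alleles]


# Hypothetical repeat numbers at one locus (not data from the study).
control = [10, 11, 11, 12, 11, 12, 10, 11]
longevity = [9, 11, 13, 12]

strm = reference_repeat_number(control)
ms = m_values(longevity, strm)
print("STRm:", strm)
print("M values:", ms)
print("mean M:", sum(ms) / len(ms))
```

The group comparison reported in the abstract would then be a test on the mean M values per locus; the choice of test is not stated there and is left out here.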
A tandem repeat’s (TR) propensity to mutate increases with repeat number, and can become very pronounced beyond a critical boundary, transforming it into a microsatellite (MS). However, a clear understanding of the mutational behavior of different TR classes and motifs and related mechanisms is lacking, as is a consensus on the existence of a boundary separating short TRs (STRs) from MSs. This hinders our understanding of MSs’ mutational properties and their effective use as genetic markers. Using indel calls for 179 individuals from 1000 Genomes Pilot-1 Project, we determined polymorphism incidence for four major TR classes, and formalized its varying relationship with repeat number using segmented regression. We observed a biphasic regime with a transition from a faster to a slower exponential growth at 9, 5, 4, and 4 repeats for mono-, di-, tri-, and tetranucleotide TRs, respectively. We used an in vitro mutagenesis assay to evaluate the contribution of strand slippage errors to mutability. STRs and MSs differ in their absolute polymorphism levels, but more importantly in their rates of mutability growth. Although strand slippage is a major factor driving mononucleotide polymorphism incidence, dinucleotide polymorphism incidence is greater than that expected due to strand slippage alone, indicating that additional cellular factors might be driving dinucleotide mutability in the human genome. Leveraging on hundreds of human genomes, we present the first comprehensive, genome-wide analysis of TR mutational behavior, encompassing several motif sizes and compositions.
tandem repeats; short tandem repeats; microsatellites; replication slippage; segmented regression; change point
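The segmented regression mentioned in the abstract above can be illustrated with a deliberately simplified sketch: fit two independent least-squares lines to log-scale incidence on either side of every candidate breakpoint, and keep the split with the smallest total squared error. This is an assumption-laden stand-in, not the authors' procedure; in particular it does not force the two segments to join at the breakpoint, and the data are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def fit_two_segments(xs, ys, min_points=3):
    """Return (breakpoint, left_slope, right_slope) minimizing total SSE.

    xs must be sorted in increasing order. The breakpoint is reported as
    the last x value that belongs to the left segment.
    """
    best = None
    for i in range(min_points, len(xs) - min_points + 1):
        left = fit_line(xs[:i], ys[:i])
        right = fit_line(xs[i:], ys[i:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, xs[i - 1], left[0], right[0])
    return best[1], best[2], best[3]


# Synthetic log-incidence: slope 1.0 up to 5 repeats, slope 0.3 afterwards.
repeats = list(range(1, 13))
log_incidence = [float(r) if r <= 5 else 5.5 + 0.3 * (r - 5.5) for r in repeats]

breakpoint_, slope_before, slope_after = fit_two_segments(repeats, log_incidence)
print(breakpoint_, round(slope_before, 3), round(slope_after, 3))
```

Because the fit is done on a log scale, the two slopes correspond to two exponential growth rates, which is the sense in which the abstract describes a transition from faster to slower exponential growth.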
1.33 Mb of sequence from the human Y chromosome was searched for tri- to hexanucleotide microsatellites. Twenty loci containing a stretch of eight or more repeat units with complete repeat sequence homogeneity were found, 18 of which were novel. Six loci (one tri-, four tetra- and one pentanucleotide) were assembled into a single multiplex reaction and their degree of polymorphism was investigated in a sample of 278 males from Pakistan. Diversities of the individual loci ranged from 0.064 to 0.727 in Pakistan, while the haplotype diversity was 0.971. One population, the Hazara, showed particularly low diversity, with predominantly two haplotypes. As the sequence builds up in the databases, direct methods such as this will replace more biased and technically demanding indirect methods for the isolation of microsatellites.
To determine the human Y-chromosome haplogroup backgrounds of intermediate-sized variant alleles displayed by short tandem repeat (STR) loci DYS392, DYS449, and DYS385, and to evaluate the potential of each intermediate variant to elucidate new phylogenetic substructure within the human Y-chromosome haplogroup tree.
Molecular characterization of lineages was achieved using a combination of Y-chromosome haplogroup defining binary polymorphisms and up to 37 short tandem repeat loci. DNA sequencing and median-joining network analyses were used to evaluate Y-chromosome lineages displaying intermediate variant alleles.
We show that DYS392.2 occurs on a single haplogroup background, specifically I1*-M253, and likely represents a new phylogenetic subdivision in this European haplogroup. Intermediate variants DYS449.2 and DYS385.2 both occur on multiple haplogroup backgrounds, and when evaluated within specific haplogroup contexts, delineate new phylogenetic substructure, with DYS449.2 being informative within haplogroup A-P97 and DYS385.2 in haplogroups D-M145, E1b1a-M2, and R1b*-M343. Sequence analysis of variant alleles observed within the various haplogroup backgrounds showed that the nature of the intermediate variant differed, confirming the mutations arose independently.
Y-chromosome short tandem repeat intermediate variant alleles, while relatively rare, typically occur on multiple haplogroup backgrounds. This distribution indicates that such mutations arise at a rate generally intermediate to those of binary markers and Y-STR loci. As a result, intermediate-sized Y-STR variants can reveal phylogenetic substructure within the Y-chromosome phylogeny not currently detected by either binary or Y-STR markers alone, but only when such variants are evaluated within a haplogroup context.
In the human genome, short tandem repetitive (STR) DNA sequences often show restriction fragment length polymorphisms (RFLPs) due to variation in the number of copies of the repeat unit. For a subset of these sequences known as minisatellites or variable number tandem repeat loci (VNTR), it has been proposed that a homologous "core" sequence of 10-12 nucleotides is involved in the mechanism(s) generating the polymorphism. In our present study we have prepared oligonucleotide probes complementary to one or two repeat units of several VNTR loci. Under stringent hybridization and wash conditions these probes hybridize locus specifically thus allowing the evaluation of the intrinsic polymorphism of individual loci. Our results indicate that not all of the loci having STR DNA sequences are polymorphic despite the fact that they share the "core" sequence. This suggests that more than the DNA sequence of the locus is involved in the mechanism(s) generating the polymorphism.
In forensic casework, Y chromosome short tandem repeat markers (Y-STRs) are often used to identify a male donor DNA profile in the presence of excess quantities of female DNA, such as is found in many sexual assault investigations. Commercially available Y-STR multiplexes incorporating 12–17 loci are currently used in forensic casework (Promega's PowerPlex® Y and Applied Biosystems' AmpFlSTR® Yfiler®). Despite the robustness of these commercial multiplex Y-STR systems and the ability to discriminate two male individuals in most cases, the coincidence match probabilities between unrelated males are modest compared with the standard set of autosomal STR markers. Hence there is still a need to develop new multiplex systems to supplement these for those cases where additional discriminatory power is desired or where there is a coincidental Y-STR match between potential male participants. Over 400 Y-STR loci have been identified on the Y chromosome. While these have the potential to increase the discrimination potential afforded by the commercially available kits, many have not been well characterized. In the present work, 91 loci were tested for their relative ability to increase the discrimination potential of the commonly used ‘core’ Y-STR loci. The result of this extensive evaluation was the development of an ultra high discrimination (UHD) multiplex DNA typing system that allows for the robust co-amplification of 14 non-core Y-STR loci. Population studies with a mixed African American and American Caucasian sample set (n = 572) indicated that the overall discriminatory potential of the UHD multiplex was superior to all commercial kits tested. The combined use of the UHD multiplex and the Applied Biosystems' AmpFlSTR® Yfiler® kit resulted in 100% discrimination of all individuals within the sample set, which presages its potential to maximally augment currently available forensic casework markers. 
It could also find applications in human evolutionary genetics and genetic genealogy.
To perform a genetic characterization of 7 skeletons from medieval age found in a burial site in the Aragonese Pyrenees.
Allele frequencies of autosomal short tandem repeats (STR) loci were determined by 3 different STR systems. Mitochondrial DNA (mtDNA) and Y-chromosome haplogroups were determined by sequencing of the hypervariable segment 1 of mtDNA and typing of phylogenetic Y chromosome single nucleotide polymorphisms (Y-SNP) markers, respectively. Possible familial relationships were also investigated.
Complete or partial STR profiles were obtained in 3 of the 7 samples. Mitochondrial DNA haplogroup was determined in 6 samples, with 5 of them corresponding to the haplogroup H and 1 to the haplogroup U5a. Y-chromosome haplogroup was determined in 2 samples, corresponding to the haplogroup R. In one of them, the sub-branch R1b1b2 was determined. mtDNA sequences indicated that some of the individuals could be maternally related, while STR profiles indicated no direct family relationships.
Despite the antiquity of the samples and great difficulty that genetic analyses entail, the combined use of autosomal STR markers, Y-chromosome informative SNPs, and mtDNA sequences allowed us to genotype a group of skeletons from the medieval age.
Here we describe a new panel of short tandem repeats (STRs) for a novel exact typing assay that can be used to discriminate between Aspergillus fumigatus isolates. A total of nine STR markers were selected from available genomic A. fumigatus sequences and were divided into three multicolor multiplex PCRs. Each multiplex reaction amplified three di-, tri-, or tetranucleotide repeats, respectively. All nine STR markers were used to analyze 100 presumably unrelated A. fumigatus isolates. For each marker, between 11 and 37 alleles were found in this population. One isolate proved to be a mixture of at least two different isolates. With the remaining 99 isolates, 96 different fingerprinting profiles were obtained. The Simpson's diversity index for the individual markers ranged from 0.77 to 0.97. The diversity index for the multiplex combination of di-, tri-, and tetranucleotide repeats ranged from 0.9784 to 0.9968. The combination of all nine markers yielded a Simpson's diversity index of 0.9994, indicative of the high discriminatory power of these new loci. In theory, this panel of markers is able to discriminate between no less than 27 × 10⁹ different genotypes. The multicolor multiplex approach allows large numbers of markers to be tested in a short period of time. The exact nature of the assay combines high reproducibility with the easy exchange of results and makes it a very suitable tool for large-scale epidemiological studies.
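As an illustration of the diversity figures quoted above, the following sketch computes Simpson's diversity index in the unbiased form often used for typing schemes, D = 1 − Σ nⱼ(nⱼ − 1) / (N(N − 1)). Both the choice of this form and the way the 99 isolates are distributed over the 96 profiles (three profiles shared by two isolates each) are assumptions; the abstract states neither.

```python
from collections import Counter


def simpson_diversity(profiles):
    """Simpson's diversity index, D = 1 - sum(n_j (n_j - 1)) / (N (N - 1)).

    profiles: one hashable typing profile per isolate.
    """
    n_total = len(profiles)
    if n_total < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(profiles)
    shared = sum(n * (n - 1) for n in counts.values())
    return 1 - shared / (n_total * (n_total - 1))


# Assumed distribution: 93 unique profiles plus 3 profiles seen twice each,
# giving 96 distinct profiles among 99 isolates.
isolates = [f"profile_{i}" for i in range(93)]
isolates += ["shared_a"] * 2 + ["shared_b"] * 2 + ["shared_c"] * 2

print(len(isolates), len(set(isolates)), round(simpson_diversity(isolates), 4))
```

Under that assumed distribution the index rounds to 0.9994, which agrees with the value reported for the nine-marker combination; other distributions of the duplicates would give slightly lower values.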
To evaluate the novel triplex polymerase chain reaction (PCR) assay for the analysis of polymorphic Y-chromosomal short tandem repeat loci (Y-STR).
A total of 14 Y-STR loci were analyzed. Allele frequencies for 3 tetrameric Y-STR loci (DYS449, DYS456, and DYS458) and extended haplotype loci typed by the Y-PLEX™ 12 system were investigated in a sample of 50 unrelated healthy Czech male donors. We computed the relevant intra-population statistical parameters for our data (gene diversity, average gene diversity over loci, and mean number of pairwise differences) and compared our sample set with other Central European populations using RST pairwise genetic distances.
We focused on the comparison of genetic diversity between the Y-STR extended haplotype loci and that of the 3 additional loci, and on the benefit of using DYS449, DYS456, and DYS458 in forensic and population genetics applications. Total gene diversity in our sample set was 0.998367 when using all 14 loci. Our data analysis revealed very high genetic diversity at the DYS449 locus (0.876735), which surpasses even the diversity at DYS385a/b (0.819592). Population comparison showed no difference between the Czech, Bavarian, Austrian, and Saxon sample sets. A minor difference was found between the Czech and Polish sample sets.
Typing of 3 Y-chromosomal microsatellite polymorphisms may provide a useful complement to already established sets of Y-STRs.
The dynamics of microsatellites, or short tandem repeats (STRs), are well documented for long, polymorphic loci, but much less is known about shorter ones. For example, the issue of a minimum threshold length for DNA slippage remains contentious. Model-fitting methods have generally concluded that slippage only occurs over a threshold length of about eight nucleotides, in contradiction with some direct observations of tandem duplications at shorter repeated sites. Using a comparative analysis of the human and chimpanzee genomes, we examined the mutation patterns at microsatellite loci with lengths as short as one period plus one nucleotide. We found that the rates of tandem insertions and deletions at microsatellite loci strongly deviated from background rates in other parts of the human genome and followed an exponential increase with STR size. More importantly, we detected no lower threshold length for slippage. The rate of tandem duplications at unrepeated sites was higher than expected from random insertions, providing evidence for genome-wide action of indel slippage (an alternative mechanism generating tandem repeats). The rate of point mutations adjacent to STRs did not differ from that estimated elsewhere in the genome, except around dinucleotide loci. Our results suggest that the emergence of STRs depends on DNA slippage, indel slippage, and point mutations. We also found that the dynamics of tandem insertions and deletions differed in both rates and size at which these mutations take place. We discuss these results in both evolutionary and mechanistic terms.
tandem repeats; comparative genomics; microsatellite emergence; DNA slippage; indel slippage; point mutations; human
Epstein-Barr virus (EBV)-transformed lymphoblastoid cell lines (LCLs) are a useful biological resource; however, genomic variations can arise during the generation and immortalization of LCLs. The purpose of this study was to identify genomic variations in LCL DNA compared with matched blood DNA using short tandem repeat (STR) analysis.
We analyzed 15 STRs with blood DNA and their matched LCL DNA samples from 6645 unrelated healthy individuals.
Mutations (such as repeat variations and triallelic patterns) of 15 STR loci were detected in 612 LCL DNAs (9.2% of total) without mutations in their matched blood DNA. The repeat variations of 15 STRs were detected in 526 LCL DNAs (mutation rate = 0.0792) and triallelic patterns were identified in 123 (mutation rate = 0.0185). Among 15 STRs, the most common repeat variations (n = 214, mutation rate = 0.0322) and triallelic patterns (n = 17, mutation rate = 0.0026) were found at FGA locus.
Our study shows that mutations in STRs can occur during generation and immortalization of LCLs.
lymphoblastoid cell lines; short tandem repeats
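The mutation rates quoted in the results above appear to be the number of affected LCL DNAs divided by the 6645 individuals analyzed. The short check below reproduces them under that assumption; the dictionary keys are labels chosen here for illustration.

```python
TOTAL_SAMPLES = 6645  # matched blood/LCL pairs reported in the abstract

# Counts of affected LCL DNAs as reported in the abstract.
affected = {
    "any_mutation": 612,
    "repeat_variation": 526,
    "triallelic_pattern": 123,
    "fga_repeat_variation": 214,
    "fga_triallelic_pattern": 17,
}

# Assumed definition: rate = affected LCL DNAs / total samples.
rates = {name: count / TOTAL_SAMPLES for name, count in affected.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.4f}")

# The two mutation categories sum to more than the number of mutated LCLs,
# so some cell lines must show both kinds of change.
overlap = (
    affected["repeat_variation"]
    + affected["triallelic_pattern"]
    - affected["any_mutation"]
)
print("LCL DNAs with both mutation types:", overlap)
```

The overlap figure follows from inclusion-exclusion and assumes that every mutated LCL DNA falls into at least one of the two reported categories.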
Recently, the Combined DNA Index System (CODIS) Core Loci Working Group established by the US Federal Bureau of Investigation (FBI) reviewed and recommended changes to the CODIS core loci. The Working Group identified 20 short tandem repeat (STR) loci (composed of the original CODIS core set loci (minus TPOX), four European recommended loci, PentaE, and DYS391) plus the Amelogenin marker as the new core set. Before selecting and finalizing the core loci, some evaluations are needed to provide guidance for the best options of core selection.
The performance of the current and newly proposed CODIS core loci sets was evaluated with simplified analyses for adventitious hit rates in reasonably large datasets under single-source profile comparisons, mixture comparisons and kinship searches, and for international data sharing. Informativeness (for example, match probability, average kinship index (AKI)) and mutation rates of each locus were some of the criteria to consider for loci selection. However, the primary factor was performance with challenged forensic samples.
The current battery of loci provided in already validated commercial kits meet the needs for single-source profile comparisons and international data sharing, even with relatively large databases. However, the 13 CODIS core loci are not sufficiently powerful for kinship analyses and searching potential contributors of mixtures in larger databases; 19 or more autosomal STR loci perform better. Y-chromosome STR (Y-STR) loci are very useful to trace paternal lineage, deconvolve female and male mixtures, and resolve inconsistencies with Amelogenin typing. The DYS391 locus is of little theoretical or practical use. Combining five or six Y-chromosome STR loci with existing autosomal STR loci can produce better performance than the same number of autosomal loci for kinship analysis and still yield a sufficiently low match probability for single-source profile comparisons.
A more comprehensive study should be performed to provide the necessary information to decision makers and stakeholders about the construction of a new set of core loci for CODIS. Finally, selection of loci should be driven by the concept that the needs of casework should be supported by the processes of CODIS (or for that matter any forensic DNA database).
Polymorphic short tandem repeat (STR), or microsatellite, loci have been widely used to analyze chimerism status following allogeneic hematopoietic stem cell transplantation (HSCT). The presence of a patient’s DNA, as identified by STR analysis, may indicate residual or recurrent malignant disease or may represent normal hematopoiesis of patient origin. The ratio of patient-derived to donor-derived alleles is used to calculate the relative amount of patient cells (both benign and malignant) to donor cells. STRs on chromosomes known to be gained or lost in a patient’s tumor are generally ignored because it is difficult to perform meaningful calculations of mixed chimerism. However, in this report, we present evidence that STR loci on gained or lost chromosomes are useful in distinguishing the benign or malignant nature of chimeric DNA. In the peripheral blood or bone marrow of four HSCT patients with leukemia or lymphoma, we identified tumor DNA on the basis of STR loci showing copy number alteration. We propose that a targeted evaluation of STR loci showing altered copy number in post-transplant chimerism analysis can provide evidence of residual cancer cells.
short tandem repeat; microsatellite; allogeneic hematopoietic stem cell transplantation; chimerism; leukemia relapse
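The ratio-based chimerism calculation mentioned in the abstract above can be sketched as follows. This is a commonly used simplification and an assumption here: it applies to one informative locus where patient and donor share no alleles, and it treats summed allele peak areas as proportional to the amount of DNA from each source. The function name and the values are hypothetical.

```python
def percent_patient(patient_signal: float, donor_signal: float) -> float:
    """Percentage of patient-derived DNA at one informative STR locus.

    Assumption: patient and donor alleles at this locus are all distinct,
    and each signal is the summed peak area of that person's alleles.
    """
    total = patient_signal + donor_signal
    if patient_signal < 0 or donor_signal < 0 or total <= 0:
        raise ValueError("signals must be non-negative with a positive sum")
    return 100.0 * patient_signal / total


# Hypothetical summed peak areas at one locus after transplantation.
print(percent_patient(patient_signal=500.0, donor_signal=4500.0))
```

At a locus on a chromosome gained or lost in the tumor, this ratio no longer reflects cell numbers alone, which is why such loci are usually excluded from the calculation and why the abstract proposes using their altered allele balance as evidence of tumor DNA instead.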
To investigate the distribution of 17 Y-short tandem repeat (STR) loci in the population of the Cukurova region of Turkey.
In the period between 2009 and 2010, we investigated the distribution of 17 Y-STRs in a sample of 249 unrelated healthy men from the Cukurova region of Turkey. Genomic DNA was extracted with InstaGene matrix and Y-STRs were determined using the AmpFlSTR Yfiler PCR amplification kit. Gene and haplotype diversity values were estimated using the Arlequin software. To compare our data to other populations, population pairwise genetic distances and associated probability values were calculated using the Y Chromosome Haplotype Reference Database Web site software.
At 17 Y-STR loci we detected 148 alleles. The lowest gene diversity in this region was 0.51 for DYS391 and the highest 0.95 for DYS385a/b. Haplotype diversity was 0.9997 ± 0.0004. We compared our data with haplotype data of other Turkish populations and no significant differences were found, except with Ankara population (Φst = 0.025, P = 0.018). Comparisons were also made with the neighboring populations using analysis of molecular variance of the Y-STR loci genetic structure and our population was nearest to Lenkoran-Azerbaijani (Φst = 0.012, P = 0.068) and Iranian Ahvaz population (Φst = 0.007, P = 0.173), followed by Greek (Φst = 0.026, P = 0.000) and Russian (Φst = 0.048, P = 0.000) population. Other countries like Portugal, Spain, Italy, Egypt, Israel (Palestinian Authority Area), and Taiwan showed a high genetic distance from our population.
Our study showed that Y-STR polymorphisms were a powerful discrimination tool for routine forensic applications and could be used in genealogical investigations.
To analyze the haplotype of the Ezhava population of Kerala, south India, using 8 short tandem repeat (STR) loci on the Y chromosome and trace the paternal genetic lineage of the population.
Whole blood samples (n = 104) were collected from unrelated healthy men of the Ezhava population over a period of one year from October 2009. Genomic DNA was extracted by the salting-out method. All samples were genotyped for the 8 Y-STR loci using the AmpFlSTR Yfiler PCR Amplification Kit. The haplotype and allele frequencies were determined by direct counting and analyzed using Arlequin 3.1 software, and molecular variance was calculated with the Y-chromosome haplotype reference database online analysis tool, www.yhrd.org.
Among the 104 examined haplotypes, we found 98 unique ones. The average gene diversity was 0.669, with the highest diversity of 0.9462 observed for the biallelic Y-STR marker DYS 385. The allele frequency among DYS loci varied between 0.0096 and 0.75. Out of the 104 haplotypes, 10 were identical to the Jat Sikh population of Punjab, which is the greatest number among the Indian populations, and 4 to the Turkish population, which is the greatest number among the European populations. According to the allele frequency of Y-STR, the Ezhavas were genetically more similar to the Europeans (60%) than to the East Asians (40%).
The vast majority of haplotypes were observed only once, reflecting the enormous genetic heterogeneity of the Ezhavas. Based on the genotype, the Ezhavas showed more resemblance to Jat Sikh population of Punjab and the Turkish populations than to the East Asians, hence indicating a paternal lineage of European origin.
Microsatellite instability (MSI) at tri- or tetranucleotide repeat markers (elevated microsatellite alterations at selected tetranucleotide repeat, EMAST) has recently been described, but the underlying genetic mechanism of EMAST is unclear. This study aimed to investigate the prevalence of EMAST in type I endometrial carcinoma and to determine the correlation between MSI status and mismatch repair (MMR) genes or p53.
We examined 3 mono-, 3 di-, and 6 tetranucleotide repeat markers by PCR in 39 cases of type I endometrial carcinoma and performed immunohistochemistry for the hMSH2, hMLH1, and p53 proteins.
MSI at more than two mono- and dinucleotide repeat markers was noted in 8 cases (MSI-H, 20.5%). MSI at a tetranucleotide repeat was detected in 15 cases (EMAST, 38.5%). In the remaining 16 cases, no MSI was observed (MSS, 42.1%). MSI status was not associated with FIGO stage, grade or depth of invasion. Absence of expression of either or both of hMSH2 and hMLH1 was noted in seven (87.5%) of eight MSI-H tumors, one (6.3%) of 16 MSS tumors, and five (33.3%) of 15 EMAST tumors (p = 0.010). Expression of p53 protein was found in one (12.5%) of eight MSI-H tumors, five (31.3%) of 16 MSS tumors, and seven of 15 EMAST tumors (p = 0.247).
Our results showed that about 38.5% of type I endometrial carcinomas exhibited EMAST, and that EMAST was rarely associated with alteration of hMSH2 or hMLH1.
Tumor cell fusion with motile bone marrow-derived cells (BMDCs) has long been posited as a mechanism for cancer metastasis. While there is much support for this from cell culture and animal studies, it has yet to be confirmed in human cancer, as tumor and marrow-derived cells from the same patient cannot be easily distinguished genetically.
We carried out genotyping of a metastatic melanoma to the brain that arose following allogeneic bone-marrow transplantation (BMT), using forensic short tandem repeat (STR) length-polymorphisms to distinguish donor and patient genomes. Tumor cells were isolated free of leucocytes by laser microdissection, and tumor and pre-transplant blood lymphocyte DNAs were analyzed for donor and patient alleles at 14 autosomal STR loci and the sex chromosomes.
All alleles in the donor and patient pre-BMT lymphocytes were found in tumor cells. The alleles showed disproportionate relative abundances in similar patterns throughout the tumor, indicating the tumor was initiated by a clonal fusion event.
Our results strongly support fusion between a BMDC and a tumor cell playing a role in the origin of this metastasis. Depending on the frequency of such events, the findings could have important implications for understanding the generation of metastases, including the origins of tumor initiating cells and the cancer epigenome.
Koreans are generally considered a Northeast Asian group, thought to be related to Altaic-language-speaking populations. However, recent findings have indicated that the peopling of Korea might have been more complex, involving dual origins from both southern and northern parts of East Asia. To understand the male lineage history of Korea, more data from informative genetic markers from Korea and its surrounding regions are necessary. In this study, 25 Y-chromosome single nucleotide polymorphism markers and 17 Y-chromosome short tandem repeat (Y-STR) loci were genotyped in 1,108 males from several populations in East Asia.
In general, we found East Asian populations to be characterized by male haplogroup homogeneity, showing major Y-chromosomal expansions of haplogroup O-M175 lineages. Interestingly, a high frequency (31.4%) of haplogroup O2b-SRY465 (and its sublineage) is characteristic of male Koreans, whereas the haplogroup distribution elsewhere in East Asian populations is patchy. The ages of the haplogroup O2b-SRY465 lineages (~9,900 years) and the pattern of variation within the lineages suggested an ancient origin in a nearby part of northeastern Asia, followed by an expansion in the vicinity of the Korean Peninsula. In addition, the coalescence time (~4,400 years) for the age of haplogroup O2b1-47z, and its Y-STR diversity, suggest that this lineage probably originated in Korea. Further studies with sufficiently large sample sizes to cover the vast East Asian region and using genomewide genotyping should provide further insights.
These findings are consistent with linguistic, archaeological and historical evidence, which suggest that the direct ancestors of Koreans were proto-Koreans who inhabited the northeastern region of China and the Korean Peninsula during the Neolithic (8,000-1,000 BC) and Bronze (1,500-400 BC) Ages.
Tandem repeats (TRs) are unstable regions commonly found within genomes that have consequences for evolution and disease. In humans, polymorphic TRs are known to cause neurodegenerative and neuromuscular disorders and are associated with complex diseases such as diabetes and cancer. If present in upstream regulatory regions, TRs can modify chromatin structure and affect transcription, resulting in altered gene expression and protein abundance. The most common TRs are short tandem repeats (STRs), or microsatellites. Promoter-located STRs are considerably more polymorphic than coding region STRs. As such, they may be a common driver of phenotypic variation. To study STRs located in regulatory regions, we have performed a genome-wide analysis to identify all STRs present in a region that is 2 kilobases upstream and 1 kilobase downstream of the transcription start sites of genes.
The Short Tandem Repeats in Regulatory Regions Table, STaRRRT, contains the results of the genome-wide analysis, outlining the characteristics of 5,264 STRs present in the upstream regulatory region of 4,441 human genes. Gene set enrichment analysis has revealed significant enrichment for STRs in cellular, transcriptional and neurological system gene promoters and genes important in ion and calcium homeostasis. The set of enriched terms has broad similarity to that seen in coding regions, suggesting that regulatory region STRs are subject to similar evolutionary pressures as STRs in coding regions and may, like coding region STRs, have an important role in controlling gene expression.
STaRRRT is a readily-searchable resource for investigating potentially polymorphic STRs that could influence the expression of any gene of interest. The processes and genes enriched for regulatory region STRs provide potential novel targets for diagnosing and treating disease, and support a role for these STRs in the evolution of the human genome.
Short tandem repeats; STR; Microsatellites; Simple sequence repeats; SSR; Promoter; Regulatory region; Neurological disease; Neural genes; Evolution
Genotyping of highly polymorphic short tandem repeat (STR) markers is widely used for the genetic identification of individuals in forensic DNA analyses and in paternity disputes. The National DNA Profile Databank recently established by the DNA Identification Act in Korea contains the computerized STR DNA profiles of individuals convicted of crimes. For the establishment of a large autosomal STR loci population database, 1805 samples were obtained at random from Korean individuals and 15 autosomal STR markers were analyzed using the AmpFlSTR Identifiler PCR Amplification kit. For the 15 autosomal STR markers, no deviations from the Hardy-Weinberg equilibrium were observed. The most informative locus in our data set was the D2S1338 with a discrimination power of 0.9699. The combined matching probability was 1.521 × 10⁻¹⁷. This large STR profile dataset including atypical alleles will be important for the establishment of the Korean DNA database and for forensic applications.
autosomal STRs; DNA profile data bank; Korean; microvariant; population database
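The discrimination power and combined matching probability reported above can be related through a short sketch. It assumes the conventional definitions: the matching probability of a locus is the sum of squared genotype frequencies, the discrimination power is one minus that value, and the combined matching probability is the product over loci treated as independent. The abstract does not spell these formulas out, and the genotype data below are hypothetical.

```python
import math
from collections import Counter


def matching_probability(genotypes):
    """Sum of squared genotype frequencies at one locus."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    counts = Counter(genotypes)
    return sum((c / n) ** 2 for c in counts.values())


def discrimination_power(genotypes):
    """Probability that two random individuals differ at this locus."""
    return 1.0 - matching_probability(genotypes)


def combined_matching_probability(per_locus_pm):
    """Product of per-locus matching probabilities (loci assumed independent)."""
    return math.prod(per_locus_pm)


# Hypothetical genotypes at one locus, written as sorted allele pairs.
locus = [(10, 11), (10, 11), (11, 12), (12, 12)]
pm = matching_probability(locus)
print(pm, discrimination_power(locus))

# Combining this locus with a second, hypothetical locus whose PM is 0.5.
print(combined_matching_probability([pm, 0.5]))
```

Under these definitions, the reported discrimination power of 0.9699 for D2S1338 corresponds to a single-locus matching probability of about 0.0301, and multiplying fifteen values of that order is what drives the combined figure down to the 10⁻¹⁷ range.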
There is a lack of information on how individual microsatellite loci differ with respect to their mutation properties. Such variation will have an important bearing on our understanding of the ubiquitous occurrence of simple repeat sequences in eukaryotic genomes and on deriving proper mutation models that can be incorporated into genetic distance estimates. We genotyped ∼100 families of the bird barn swallow (Hirundo rustica) for two hypervariable (heterozygosity >95%) microsatellite markers: HrU6, an (AAAG)n tetranucleotide repeat, and HrU10, an (AAGAG)n pentanucleotide repeat. A total of 27 germline mutation events were documented, corresponding to mutation rates of 0.57% (HrU6) and 1.56% (HrU10). The mutation rate increased with allele size, at ∼0.1% per repeat unit over the observed range of allele sizes (∼10–100 repeat units). Single repeat unit changes dominated, with 21/27 mutations representing the gain or loss of one repeat unit. There was no clear difference in the number of gains versus losses nor was there an effect of allele size on the magnitude or direction of mutation. Unexpectedly, the mutation rate of females (maternally transmitted mutations) was 2.5–5 times higher than that of males. Contrasting these observations with mutation data from other microsatellite loci reveals differences not only in the mutation rate, but also in the magnitude, direction and effect of sex on mutation. Thus, microsatellite mutation and evolution may be viewed as a dynamic and variable process.
Short tandem repeat (STR) polymorphisms have been firmly established as standard DNA marker systems for more than 15 years, both in forensic stain typing and in paternity and kinship testing. However, when analyzing genetic relationships in deficiency cases, STRs have several disadvantages: their biostatistical efficiency is sometimes poor, and one or more genetic inconsistencies may be observed that could also be explained by mutational events. In such situations, additional robust markers with negligible mutation rates, such as single nucleotide polymorphisms (SNPs) and insertion/deletion markers (indels), can be used as adjuncts to provide decisive genetic information in favor of or against the assumed relationship. Both SNPs and indels can now be typed more easily using multiplexes of up to 50 loci based on fragment length analysis on instruments available in all routine forensic and paternity testing laboratories, thus making it possible to extend the range of markers beyond the currently used STRs.
Short tandem repeat systems; Single nucleotide polymorphism; Insertion/deletion polymorphism; Kinship testing; Forensic DNA analysis