Genetic information in forensic studies is largely limited to CODIS data and the ability to match samples and assign them to an individual. However, there are circumstances in which a given DNA sample does not match anyone in the CODIS database, and no other information about the donor is available. In this study, we genotyped 75 SNPs in 24 genes (previously implicated in human or animal pigmentation studies) for the analysis of single- and multi-locus associations with hair, skin, and eye color in 789 individuals of various ethnic backgrounds. Using multiple linear regression modeling, we found that five SNPs in five genes account for large proportions of pigmentation variation in hair, skin, and eyes in our across-population analyses. Thus, these models may be of predictive value to determine an individual’s pigmentation type from a forensic sample, independent of ethnic origin.
FBI CODIS statistics showed that DNA forensic profiles increased exponentially from 2001–2006 (http://www.fbi.gov/hq/lab/codis/clickmap.htm). However, the DNA forensic profile hits increased linearly, suggesting that the discrepancy between unmatched DNA profiles and hits will continue to increase, as the CODIS DNA database increases. As a means of reducing the pool of suspects of an unmatched DNA profile, ancestry informative markers (AIMs) can indicate the ethnicity of an unknown sample. However, admixed samples are problematic, in that AIMs cannot categorize them into a particular ethnic group. Conceivably, an individual may have a majority of AIMs of one ethnic group, but depending on the amount of admixture, their physical appearance may be different than what might be expected.
While AIMs can enable inferences to be made about the ethnicity of an unknown sample, they do not enable identification of an individual based on physical characteristics within an ethnic group. Hence, as suggested by Tully (1), phenotype prediction based on genetic tests may be a useful tool in forensic analysis. In this study, we assayed genetic markers associated with pigmentation genes (some of which are also AIMs) to determine how much variation in human pigmentation they account for. These markers have been previously studied within specific ethnic groups. We demonstrate that in ethnically mixed populations, these same markers account for a significant fraction of the pigmentation phenotype.
There are two types of melanin, eumelanin (brown/black) and pheomelanin (yellow/red) that differ in sulfur content (2). Variation in human pigmentation results from differences in the type of melanin and amount of melanin synthesized in specialized vesicles (melanosomes) within pigment cells (melanocytes) and in the size, shape, and export of those melanosomes to the hair and skin (3). Although the rate of synthesis of melanin is much lower in the adult eye than in the skin and hair, additional background color is contained in the iris, making eye color a more complex trait. Genes previously implicated in mediating pigment variation include the melanocortin-1 receptor (MC1R) gene and a gene encoding its inhibitor, agouti signaling protein (ASIP); two genes associated with oculocutaneous albinism, P (OCA2) and SLC45A2 (OCA4, formerly named MATP); and most recently, SLC24A5, the human orthologue of the zebrafish golden gene.
Multiple polymorphisms in the MC1R gene have been linked to red hair and fair skin (4–8), and the ASIP gene, encoding an inhibitor of the MC1R ligand, α-MSH, has been linked to skin pigmentation (9–12). A wide range of OCA phenotypes has been noted for various mutations of OCA2 (13) and SLC45A2, formerly named MATP (14–16). Similarly, a wide range of coat color phenotypes is seen in mice with various mutations in their respective orthologous genes (p [17,18] and Slc45a2, formerly named uw). These observations suggested that variations in these genes may be associated with variation in the normal range of human pigmentation. Indeed, gene(s) associated with brown eyes and brown hair were found to map to chromosome 15q, with the OCA2 gene as a prime candidate (20). Population studies have shown that specific polymorphisms in two genes associated with albinism, the OCA2 gene and the SLC45A2 gene, are strongly associated with variations in normal pigmentation of the hair (20–22), skin (22–25), and eyes (22,26–31).
In addition to its effects on pigmentation variation, the SLC45A2 polymorphism rs16891982 (L374F) is a useful marker of population origin (32,33). Another marker of population origin that plays an important role in pigmentation, SLC24A5 (or NCKX5, the human orthologue of the zebrafish golden gene), has been recently identified, with a coding polymorphism divergent between European/Caucasians and other human populations (32,34). SLC24A5 has been shown to biologically affect pigmentation in zebrafish, Danio rerio (34), murine, and cultured human epidermal melanocytes (35).
Alleles of other pigmentation genes such as tyrosinase-related protein 1 (TYRP1) and dopachrome tautomerase (DCT) have been statistically associated with human iris pigmentation (28). In association with certain alleles of other genes, specific alleles of agouti signaling protein have also been associated with human iris color (28) as well as skin color (9–11).
Previous studies have focused on the effects of a limited number of genes on hair or skin color (4,5,24,34). The most comprehensive studies to date have focused on genome-wide-association of SNPs with pigmentation within specific populations (8,12,36,37). In this report, we set out to determine the markers predictive for human pigmentation, independent of ethnic origin. We assayed 75 polymorphisms in 24 genes that were previously implicated in human or animal pigmentation studies for the analysis of single- and multi-locus associations with hair, skin, and eye color in 789 individuals of various ethnic backgrounds. Multiple linear regression (MLR) modeling revealed that a surprisingly small number of markers account for large proportions of pigmentation variation in hair, skin, and eyes in our across-population analyses.
Informed consent was obtained from 791 participants recruited at the University of Arizona. Participants were between the ages of 18 and 40, had no gray hair, and had at least 1 inch (measured from the roots) of un-dyed scalp hair. Participants in this study roughly mirror the ethnic composition of the student population. Phenotype data, hair samples, and buccal cell samples were collected from each participant following an Institutional Review Board-approved protocol. Participants’ hair and eye color were independently scored by an investigator; participants also indicated other relevant information, such as tanning response and ethnicity. Buccal cell samples were collected using Catch-All Sample Collection Swabs (Epicentre, Madison, WI) and processed for DNA according to the manufacturer’s protocol.
Approximately 300 scalp hairs (1 cm at the base) were collected from each subject. Hair samples from 186 randomly selected participants were analyzed for both total melanin (combined amount of eumelanin and pheomelanin) content and the two subtypes of melanin, eumelanin and pheomelanin, following a previously published protocol (38,39). As not all genotypes were determined for all SNPs for all individuals, subsets of the 186 samples (54–185) were used in generating the multiple linear regression models discussed later.
Skin reflectance was measured, as in previous work (40), with a portable spectrophotometer (Mercury 1000, Datacolor International, Lawrenceville, NJ) fitted with a 15-mm aperture. This device measures in the visible light range of 400–700 nm, at intervals of 20 nm. Three independent reflectance measurements (measured as CIEL, the L [lightness] scale of the International Commission on Illumination) of the inner aspect of the upper arm were recorded and averaged for each participant.
Eye color was measured by matching subjects’ eye color to the Kolberg Iris Color Chart® (ocularistsupplies.com) and recorded. Measurements were binned into six different color categories based on another study (41) that correlated pigmentation content to color. Categories were binned one through six (1 = blue, 2 = yellow brown, 3 = green, 4 = packets of brown + blue/green, 5 = brown, and 6 = dark brown/black), where bin 1 corresponded to the least amount of pigmentation, and 6 corresponded to the highest pigmentation.
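The six-bin eye color scale described above can be represented as a simple ordinal lookup. The sketch below is illustrative only: the dictionary and the helper `eye_color_score` are hypothetical names, not part of the study's software, and the category labels are copied from the text.

```python
# Ordinal eye-color bins as listed in the text (1 = least pigment, 6 = most).
EYE_COLOR_BINS = {
    1: "blue",
    2: "yellow brown",
    3: "green",
    4: "packets of brown + blue/green",
    5: "brown",
    6: "dark brown/black",
}


def eye_color_score(label: str) -> int:
    """Return the ordinal bin (1-6) for a category label (hypothetical helper)."""
    lookup = {name: score for score, name in EYE_COLOR_BINS.items()}
    return lookup[label.strip().lower()]


print(eye_color_score("Green"))
```

Treating the bins as an ordered numeric score is what allows eye color to be used as the response variable in a regression model.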
Some SNPs were determined by sequencing the PCR products (Fig. 1). In this case, each PCR consisted of the final concentrations/quantities of the following: 1× PCR buffer, 1.5 mM MgCl2, 10 pmol of each primer (forward and reverse; Table 1), 0.25 mM dNTPs, 1 U Taq, and ddH2O to 20 μL. PCR amplification was performed using a PTC-200 Thermal Cycler (MJ Research, Watertown, MA). The thermal cycle program was as follows: 3 min at 95°C, 34 cycles of 30 sec at each temperature setting (95°C, 55°C, and 72°C), and a final extension of 5 min at 72°C.
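As a quick check on the thermal cycle program, the programmed hold times can be summed. This is a minimal sketch using only the durations stated above; it ignores ramping time between temperatures, which depends on the instrument, and the constant and function names are hypothetical.

```python
# Thermal-cycle program from the text; all durations in seconds.
INITIAL_DENATURATION = 3 * 60                  # 3 min at 95 C
CYCLE_STEPS = [(95, 30), (55, 30), (72, 30)]   # (temperature in C, seconds) per cycle
N_CYCLES = 34
FINAL_EXTENSION = 5 * 60                       # 5 min at 72 C


def programmed_hold_seconds() -> int:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION + N_CYCLES * per_cycle + FINAL_EXTENSION


total = programmed_hold_seconds()
print(f"{total} s = {total / 60:.0f} min")
```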
Sequencing was performed by the Genomic Analysis and Technology Core at the University of Arizona on a 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA). In addition, the first 287 DNA samples collected were genotyped at 75 SNPs in 24 genes implicated in melanin biosynthesis by DNA Print Genomics (Sarasota, FL) following a previously published protocol (27). The remaining 504 individuals were also genotyped by DNA Print Genomics (Sarasota, FL) for 37 of the 75 SNPs that were statistically significant.
SNP rs12913832 (HERC2) was genotyped using TaqMan assay C__30724404 (Applied Biosystems).
Statistical analysis was performed using SAS (version 9.1) and JMP IN (release 5.1) statistical analysis software (SAS Institute, Cary, NC). The pool of SNPs was reduced from 75 (Fig. 1; Table 2) to 40 (within 15 genes) by choosing SNPs that were statistically significant (p < 0.05, Table 2) by ANOVA (n = 287 samples). Finally, rs1426654 (SLC24A5), rs6058017 (ASIP), and rs12913832 (HERC2) were genotyped for all individuals (n = 791 samples). Two samples (102 and 352) were dropped from the study because of bookkeeping inconsistencies. In accordance with standard statistical procedure, the natural log of the ratio of eumelanin-to-pheomelanin was used to normalize the data.
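The two computations in this paragraph, the log transform of the eumelanin-to-pheomelanin ratio and a one-way ANOVA of a trait grouped by genotype, can be sketched as follows. This is not the SAS/JMP IN procedure used in the study: the function names and the toy genotype and trait values are hypothetical, and only the F statistic is computed. Converting F to a p-value requires the F-distribution CDF from a statistics package.

```python
import math
from collections import defaultdict


def log_ratio(eumelanin: float, pheomelanin: float) -> float:
    """Natural log of the eumelanin-to-pheomelanin ratio."""
    return math.log(eumelanin / pheomelanin)


def anova_f(genotypes, values):
    """One-way ANOVA F statistic for a trait grouped by SNP genotype.

    Returns (F, df_between, df_within).
    """
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, values):
        groups[genotype].append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values()
    )
    ss_within = sum(
        (v - sum(vs) / len(vs)) ** 2 for vs in groups.values() for v in vs
    )
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Toy data (hypothetical): six individuals, three genotype classes.
toy_genotypes = ["CC", "CC", "CG", "CG", "GG", "GG"]
toy_trait = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f_stat, df_b, df_w = anova_f(toy_genotypes, toy_trait)
print(f_stat, df_b, df_w)
```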
A total of 40 SNPs (Fig. 1; Table 2) were used to build MLR models of three SNPs. Initially, forward, backward, and mixed stepwise regression methods were used to trim the number of SNPs, but different models were obtained depending on the method used. To determine the SNPs that were most significant in a three-SNP MLR model, we used SAS to generate all possible models of three SNPs from the pool of 40 significant SNPs. All models were then plotted by coefficient of determination (R2) in descending order of value, and inflections in the resulting curves were noted. To find the basis of these inflections, we used JMP IN to plot histograms of all SNPs that appeared in the models up to the R2 inflections (along the steepest initial slopes). In doing so, it became obvious which SNPs were predominantly responsible for the inflections. This method was used to determine the SNPs for each trait.
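The exhaustive search described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' SAS procedure: the data are synthetic, genotypes are coded 0/1/2 (an assumed additive coding), the names `solve`, `r_squared`, and `snp0` through `snp7` are hypothetical, and a fixed cut-off of the top six models stands in for the visually identified first inflection of the R2 curve.

```python
import random
from collections import Counter
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R2 of an ordinary least-squares fit of y on the columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data: 8 SNPs, 60 individuals; the trait depends on snp0 and snp3 only.
random.seed(0)
n_samples = 60
snps = {f"snp{j}": [random.choice((0, 1, 2)) for _ in range(n_samples)] for j in range(8)}
trait = [
    2.0 * snps["snp0"][i] + 1.5 * snps["snp3"][i] + random.gauss(0, 0.5)
    for i in range(n_samples)
]

# Fit every three-SNP model and sort by descending R2.
models = sorted(
    ((r_squared([snps[name] for name in combo], trait), combo)
     for combo in combinations(snps, 3)),
    reverse=True,
)

# Count how often each SNP appears among the top models (assumed cut-off: 6).
top = models[:6]
counts = Counter(name for _, combo in top for name in combo)
print(counts.most_common(3))
```

Counting SNP frequency among the best models, rather than picking the single highest-R2 model, is what makes the selection robust when many top models have nearly identical R2 values.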
Our primary aim was to develop a forensic DNA test predictive for pigmentation phenotype. Therefore, we analyzed the data across all populations of the study. The pool of SNPs was reduced from 75 to 40 SNPs by ANOVA (p < 0.05) (Fig. 1; Table 2). To determine the most likely SNPs for a given trait, all possible combinations of three-SNP models (40 choose 3) were generated, sorted by descending R2 value, and graphed. The three most frequent SNPs found in the range from the highest R2 model to the first inflection of the graph were used to construct an MLR model for each trait. These models accounted for significant variation in each of the four measured traits: scalp-hair total melanin (76.3%), natural log of the ratio of eumelanin-to-pheomelanin (43.2%), skin reflectance (45.7%), and eye color (74.8%) (Table 3).
For scalp-hair total melanin, the R2 curve generated from the three-SNP models showed SNPs rs16891982 (SLC45A2), rs1426654 (SLC24A5), and rs12913832 (HERC2) to be the most frequent SNPs (310 or 3.1%) above the first inflection of the R2 curve (Fig. 2A). Together these three SNPs yielded an R2 of 76.3% (n = 143 samples).
For the natural log of the ratio of eumelanin-to-pheomelanin in hair, the R2 curve generated from the three-SNP models showed SNPs rs16891982 (SLC45A2), rs1426654 (SLC24A5), and rs1805007 (MC1R) to be the most frequent SNPs (430 or 4.4%) above the first inflection of the R2 curve (Fig. 2B). Together these three SNPs yielded an R2 of 43.2% (n = 162 samples).
For skin reflectance, the R2 curve generated from the three-SNP models showed SNPs rs16891982 (SLC45A2), rs1426654 (SLC24A5), and rs2424984 (ASIP) to be the most frequent SNPs (750 or 7.6%) above the first inflection of the R2 curve (Fig. 2C). Together these three SNPs yielded an R2 value of 45.7% (n = 447 samples). Adding an interaction term between rs2424984 (ASIP) and rs16891982 (SLC45A2) increased the R2 value of the model by approximately 4 percentage points, to 49.6% (n = 447 samples).
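An interaction term in a regression model is the product of two predictors, added as an extra column alongside the main effects. The sketch below shows how one row of such a design matrix could be built for the skin reflectance model. The 0/1/2 genotype coding is an assumption (the text does not state the coding used), and `design_row` is a hypothetical helper.

```python
def design_row(asip: int, slc45a2: int, slc24a5: int) -> list:
    """Predictors for one individual: intercept, three main effects, one interaction.

    Genotypes are coded as 0, 1, or 2 copies of a reference allele
    (an assumed additive coding, not confirmed by the text).
    """
    return [1, asip, slc45a2, slc24a5, asip * slc45a2]


# Three hypothetical individuals as (ASIP, SLC45A2, SLC24A5) genotype codes.
rows = [design_row(a, b, c) for a, b, c in [(0, 2, 2), (1, 1, 2), (2, 2, 0)]]
for row in rows:
    print(row)
```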
For eye color, the R2 curve generated from the three-SNP models showed SNP rs12913832 (HERC2) to be the most frequent SNP (710 or 7.2%) above the first inflection of the R2 curve (Fig. 2D). Alone, rs12913832 yielded an R2 value of 74.8% (n = 484 samples). The next two inflection points showed SNPs rs12913832 (HERC2), rs16891982 (SLC45A2), and rs1426654 (SLC24A5) to be the most frequent SNPs. Together these three SNPs yielded an R2 value of 76.4% (n = 353 samples).
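The counts and percentages quoted in the four results paragraphs above appear to be expressed relative to the total number of three-SNP models that can be drawn from 40 SNPs. That reading is an inference from the numbers, not something the text states, and the short check below only verifies that the arithmetic is consistent with it.

```python
import math

# All three-SNP subsets of the 40 significant SNPs.
total_models = math.comb(40, 3)

# Counts quoted in the text for each trait.
reported = {
    "hair total melanin": 310,
    "ln(eumelanin/pheomelanin)": 430,
    "skin reflectance": 750,
    "eye color": 710,
}

for trait, count in reported.items():
    print(f"{trait}: {count}/{total_models} = {100 * count / total_models:.1f}%")
```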
Using MLR, we have shown that a large proportion of variation in hair, skin, and eye color across diverse human populations can be accounted for by a small number of SNPs. Starting from an initial candidate group of 75 SNPs, we found by ANOVA that 40 of these SNPs were significantly (p < 0.05) associated with hair pigmentation, skin reflectance, or eye color phenotype. Initially, we trimmed the number of SNPs by forward, backward, and mixed stepwise regression methods. However, each method yielded a different model. In an attempt to circumvent this problem, we modeled all possible three-SNP combinations of the significant SNPs. The R2 values of the top models differed on average by only 3–10 thousandths of a percent or less, so the question arose as to which was the best model. To answer this question, we chose the SNPs that were most frequent in the highest R2 three-SNP models. This method ultimately trimmed the SNPs to five SNPs (three coding and two noncoding) in five genes that accounted for most of the variance (76.3% for hair total melanin; 43.2% for hair eumelanin-to-pheomelanin ratio; 45.7% for skin CIEL; and 74.8% for eye color).
Previous studies have examined the overall color of hair in relationship to various genetic markers. Overall hair color is the result of at least two parameters: total melanin and the ratio of eumelanin-to-pheomelanin. These can be measured objectively by chemical analysis (2). We found that although both parameters are associated with SNPs from SLC45A2 and SLC24A5, they differ in the third most significant genetic contributor (HERC2 for total hair melanin and MC1R for the ratio of eumelanin-to-pheomelanin).
A high proportion of phenotypic variance of total hair melanin (76.3%) can be accounted for by three SNPs: rs1426654 (SLC24A5), rs16891982 (SLC45A2), and rs12913832 (HERC2) (Fig. 2A). Although SLC24A5 is an AIM, it also plays an important role in pigmentation (34). In studies carefully controlled for population stratification, the coding SNP rs1426654 is indeed a determinant for normal human pigmentation variation (42). Similarly, rs16891982 (SLC45A2) has been shown to be an AIM and to be associated with pigmentation variation in mice (19) and humans (22,32) in studies controlled for population stratification (42).
For the analysis of total hair melanin, SNP rs12913832 (HERC2) was the third most significant contributor in a three-SNP model. Although this SNP lies within the HERC2 gene, it may be part of a promoter region for the adjacent OCA2 gene (37,43–46). OCA2 has been associated with albinism in diverse populations (13) and has also been associated with hair color by linkage analysis (20).
For the natural log of the ratio of eumelanin-to-pheomelanin, a high proportion of phenotypic variance (43.2%) can be accounted for by three SNPs. Two of these are in common with total hair melanin: rs1426654 (SLC24A5) and rs16891982 (SLC45A2) (Fig. 2B). SNP rs1805007 (MC1R) was the third most significant contributor in a three-SNP model. MC1R has been shown to be a major determinant in whether eumelanin or pheomelanin is produced (21,47). MC1R variants that decrease the protein’s functionality have been shown to be associated with an increased incidence of red hair color in humans (4,5,48). Chemically, this translates to increased pheomelanin relative to eumelanin production.
Different studies have analyzed MC1R in different ways. Many analyzed specific populations to determine which SNPs were associated with variation in hair color, whereas others focused on specific populations and on red hair. Moreover, the statistical analyses employed were different, and different variants have been studied. In this study, we analyzed for significance by (i) one-way ANOVA and (ii) a three-SNP model based on the frequency of contributors in the models with the highest R2 values (Table 4).
Although we did not examine all known SNPs of MC1R, some of the SNPs we examined have been studied in relationship to red hair color. Valverde et al. (1995) found rs2228479 (V60L), in combination with other nonsynonymous MC1R SNPs, to be associated with red hair color in British and Irish individuals. In contrast, two studies (5,7) did not find an association of rs2228479 with red hair. Looking across populations of all hair colors, we did not find rs2228479 to be significant by ANOVA. In addition, Flanagan et al. (2000) and Branicki et al. (2007) found rs1805007 (R151C) in combination with rs1805008 (R160W) to be associated with red hair color. Moreover, Sulem et al. (2007) found both of these SNPs to be associated with hair color in Icelandic and Dutch populations. In our three-SNP MLR model, rs1805007 (R151C) was the third most important genetic contributor for the ratio of eumelanin-to-pheomelanin in hair. We note that our analysis of this ratio did not focus on any particular hair color.
A high proportion of phenotypic variance of skin reflectance (45.7%) can be accounted for by three SNPs: rs1426654 (SLC24A5), rs16891982 (SLC45A2), and rs2424984 (ASIP) (Fig. 2C). Studies have examined SNPs rs1426654 (SLC24A5) and rs16891982 (SLC45A2) and found both to be associated with normal human pigmentation within various ethnic groups and to differ in allele frequency across various ethnic groups (24,32,34,42). Thus, rs1426654 (SLC24A5) and rs16891982 (SLC45A2) mediate pigmentation variation, and they are also AIMs. Lamason et al. (34) found SNP rs1426654 to be an AIM in determining European vs. non-European ethnic origin. Stokowski et al. (2007) controlled for population stratification and showed rs1426654 (SLC24A5) to be significantly associated with a dichotomously defined skin reflectance in South Asians. Although rs1426654 is an AIM, it does not clearly distinguish between Europeans and Sri Lankans (32). In contrast, SNP rs16891982 (SLC45A2) does distinguish between Europeans and Sri Lankans (32).
The third most significant genetic contributor was SNP rs2424984 (ASIP). ASIP has been shown to be associated with skin pigmentation (12), namely for rs6058017 (9,10). Although we found rs6058017 to be significant by ANOVA, we did not find it to be a better predictor in skin reflectance than rs2424984. For both a single SNP and a three-SNP model, rs2424984 was a better predictor for skin reflectance.
Other studies have found MC1R (21,25,43,47), OCA2, and DCT (49) to be associated with normal skin pigmentation. Similarly, we found SNPs within the above genes to be associated by ANOVA with normal skin pigmentation. However, none were found to be significant contributors in a three-SNP MLR model. This does not imply that these genes are not important in pigmentation; it simply means that our method was unable to detect their significance. This may be attributed to a variety of factors such as sampling error because of small sample size, different populations studied, and/or different methods of measuring skin reflectance.
A high proportion of phenotypic variance of eye color (74.8%) can be accounted for by a single SNP, rs12913832 (HERC2) (Fig. 2D). Previous studies have examined eye color in relationship to various genetic markers. OCA2 (8,20,26,28,29), SLC45A2 (22), and MC1R (8,50) have been statistically associated with variation in eye color.
Duffy et al. (2007) found three SNPs within intron 1 of OCA2 that when considered as a haplotype–diplotype explained about 74% of eye color variation. However, there are significant differences between our results and theirs. Among the differences were populations studied, SNPs genotyped, and binning of eye colors. The population they studied was Northern European, whereas we analyzed across various ethnic populations, and we genotyped additional SNPs within genes other than OCA2.
Most recently, genome-wide-association studies have shown that intronic SNPs of HERC2, a gene 5′ to OCA2, give the highest association with eye color (37,43,45). Studies suggest that HERC2 contains a promoter region for OCA2 (37,43,45,46,51). To date, SNP rs12913832 (HERC2) in intron 86 has shown the highest association and is the most likely causative SNP in determining eye color (45), explaining 68% of the variance between blue eye and brown eye color in an Anglo-Celtic population. Therefore, rs12913832 is likely the causative SNP for European/Caucasians. We note that rs12913832 varies in Caucasian/European populations (where eye color is varied), but only one allele is found in non-Caucasian/European populations (where most individuals have brown/dark eye color). Regardless of whether rs12913832 is causative or an AIM, it is extremely predictive for eye color both within and across populations.
A three-SNP model of rs12913832 (HERC2), rs16891982 (SLC45A2), and rs1426654 (SLC24A5) is only marginally better at explaining the variance (76%) than rs12913832 (HERC2) alone. Like rs12913832 (HERC2), SNPs rs16891982 (SLC45A2) and rs1426654 (SLC24A5) are AIMs that distinguish European/Caucasians from non-European/Caucasians. However, individually rs16891982 (SLC45A2) accounts for 38% of eye color variance, and rs1426654 (SLC24A5) accounts for 34% of eye color variance, significantly less than rs12913832 (HERC2), which accounts for 74% of the variance.
Statistical interaction of SNPs rs12910433 (OCA2) and rs2228479 (MC1R) has been associated with skin reflectance in a Tibetan population (40). We did not find this interaction; however, we did find an interaction between SNPs rs16891982 (SLC45A2) and rs2424984 (ASIP) for the across-population skin reflectance MLR model. These differences may reflect our use of an ethnically diverse sample; thus, the statistical product of these two genes might be more predictive of skin reflectance across populations. Other interactions, such as between rs12913832 (HERC2) and rs1800407 (OCA2), may exist in explaining variation in eye color (45), but we cannot confirm this from our data.
We found five SNPs in five genes that were informative for normal human pigmentation (SLC45A2, SLC24A5, MC1R, HERC2, and ASIP). Three of these (SLC45A2, SLC24A5, and MC1R) were coding. Clearly, variations within these proteins can result in functional variation that contributes to the pigmentation variation. In contrast, the most significant SNPs found in OCA2-HERC2 and ASIP were noncoding. We genotyped most of the known coding SNPs within these genes that showed allele frequency differences across populations, and none were as significantly correlated with pigmentation as the informative noncoding SNPs. This suggests that the regulation of the expression of OCA2 and ASIP may underlie pigmentation variation. Although our results demonstrate that relatively few SNPs in relatively few genes control a significant proportion of normal human pigmentation variation, it is certain that additional polymorphisms in these and other genes account for the remaining variation.
The polymorphisms that we report are in genes known to regulate pigmentation. Some or all of the variance in pigmentation explained by them may be because they are also AIMs. Nevertheless, they are predictive markers for normal human pigmentation variation across various ethnic backgrounds. We note that these models of human constitutive pigmentation phenotype have significant implications for forensic science. These results suggest that assays can be developed, independent of ethnicity, to predict hair, skin, and eye color from DNA samples. Preliminary analysis of an independent sample set (n = 261 samples) has validated the predictive utility of these models (Valenzuela et al., in preparation). In addition, recent genome-wide-association studies have uncovered a variety of other candidate SNPs that contribute to phenotypic variation in pigmentation. These include SNPs rs12896399 (SLC24A4), rs12203592 (IRF4), rs1540771 (6p25.3), and rs35264875 (TPCN2) (12,43,46). Future studies are needed to evaluate these and other SNPs as they are uncovered, to further refine these predictive models such that they can accurately predict the pigmentation phenotype of the donors of otherwise unknown forensic samples.
We thank Domonique Smith and Benjamin Metelits for their technical help with this project.
*Funded by the National Institute of Justice (2002-1J-CX-K010).
Conflict of Interest Statement
Dr. Tony Frudakis is and Dr. Matthew Thomas was an employee and shareholder of a company, DNAPrint, which performs forensic DNA analysis.