In support of the long-held idea that cone ratio is genetically determined by variation linked to the X-chromosome opsin gene locus, the present study identified nucleotide differences in DNA segments containing regulatory regions of the L and M opsin genes that are associated with significant differences in the relative number of L versus M cones. Specific haplotypes (combinations of genetic differences) were identified that correlated with high versus low L:M cone ratio. These findings are consistent with the biological principle that DNA sequence variations affect binding affinities for protein components of complexes that influence the relative probability that an L versus M opsin gene will be silenced during development, and in turn, produce variation in the proportion of L to M cones.
The relative number of long-wavelength-sensitive (L) to middle-wavelength-sensitive (M) cones is highly variable among men with normal color vision (DeVries, 1946; Rushton & Baker, 1964; Dartnall et al., 1983; Pokorny et al., 1991; Roorda & Williams, 1999; Carroll et al., 2002; Hofer et al., 2005), and it has long been suspected that variation in L:M cone ratio is X-linked (DeVries, 1946; Smallwood et al., 2002; McMahon, 2004). L and M photoreceptors represent a cell type distinct from S cones (Bumsted et al., 1997), and evidence has accumulated recently indicating that L and M cones represent essentially a single cell type differentiated only by which X-chromosome opsin gene is selected for expression (Smallwood et al., 2002). In humans and other Old World primates the selection mechanism has been proposed to involve a competition between activation and silencing of adjacent cone opsin genes (Knoblauch et al., 2006). During development, all but one of the opsin genes are silenced so that mature adult L and M cones express a single opsin gene (Hagstrom et al., 2000). In a previous study, the flicker photometric electroretinogram was used to estimate the percentage of cones that are L (defined as 100*L/(L+M)) in a sample of males with normal color vision (Carroll et al., 2002), and the distribution of the percentage of L cones passed a statistical test for normality (Kolmogorov-Smirnov Test). A characteristic of inherited quantitative traits such as the percentage of L cones is that the greater the number of genetic loci that contribute to the phenotype, the more closely the distribution of the trait will approach a normal distribution (Fisher, 1918; Strachan & Read, 2000). The distribution of percent L cones thus is consistent with the cumulative effects of multiple genetic alterations contributing to the relative proportion of L to M cones.
Polymorphisms at the X-chromosome cone opsin locus emerge as the most likely candidates for causing variation in the relative numbers of L and M cones by virtue of their potential to influence the relative strengths of activation versus silencing of the opsin genes and their ability to cumulatively produce a relatively broad, continuous range of proportions of L to M cones. We tested the hypothesis that genetic variation in the ~8,000 base pair region adjacent to the protein coding sequence for the L opsin gene on the X-chromosome affects the relative proportion of L to M cones.
All experiments involving human subjects were conducted in accordance with the Declaration of Helsinki, and were approved by the Institutional Review Board at the Medical College of Wisconsin. Subjects were 94 males who had previously passed several standard tests for color vision deficiency including Ishihara's 24-plate test, Richmond HRR 2002 edition, and the Neitz Test of Color Vision. Subjects identified themselves as being either of Caucasian descent (n = 67) or of African descent (n = 27). The proportion of L cones, expressed as the percentage of L plus M cones that were L, had been determined previously for each subject from a combination of genetics and the flicker photometric electroretinogram (FP-ERG) (Carroll et al., 2002; McMahon et al., 2008). Here we applied a correction factor to the estimated proportion of L cones to account for the 1.5 times greater contribution to the FP-ERG signal for each M cone compared to each L cone (Hofer et al., 2005).
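The effect of such a correction can be illustrated with a short sketch. The exact formula applied in the study is not given here, so the model below is an assumption: the uncorrected FP-ERG estimate is treated as the fraction L/(L + k*M), where k = 1.5 is the relative signal contribution of each M cone, and this is solved for the true fraction L/(L + M). The function name and parameter names are hypothetical.

```python
def correct_percent_l(apparent_percent_l: float, m_weight: float = 1.5) -> float:
    """Convert an uncorrected percent-L estimate into a corrected one.

    Assumed model (not taken from the source): the uncorrected estimate
    equals 100 * L / (L + m_weight * M), i.e. each M cone contributes
    m_weight times as much signal as each L cone.
    """
    if not 0.0 <= apparent_percent_l <= 100.0:
        raise ValueError("percent L must lie between 0 and 100")
    p = apparent_percent_l / 100.0
    # From p = L / (L + k*M) it follows that L/M = k*p / (1 - p),
    # so L / (L + M) = k*p / (k*p + (1 - p)).
    weighted_l = m_weight * p
    return 100.0 * weighted_l / (weighted_l + (1.0 - p))


if __name__ == "__main__":
    for apparent in (40.0, 50.0, 60.0):
        print(apparent, round(correct_percent_l(apparent), 1))
```

Under this assumed model the corrected percentage is always at least as large as the uncorrected one, because part of the signal attributed to M cones reflects their larger per-cone contribution rather than their number.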
Nucleic acid was isolated from buccal swabs or whole blood using PureGene DNA extraction kits. A series of polymerase chain reactions (PCRs) were designed to provide full coverage of the genomic DNA region upstream of the L opsin gene (amplicons 1-31, Table 1) and for the region upstream of the M opsin gene (amplicon 32, Table 1). X-chromosome nucleotides 152,926,644 through 152,930,676 (amplicons 13-31, Table 1) and 152,968,633 through 152,969,085 (amplicon 32) were sequenced for all 94 subjects (nucleotide numbers correspond to the May 2004 assembly of the Human Genome) as described below under DNA sequencing. The full reference sequence of the region examined can be accessed at http://genome.ucsc.edu. Nucleotides 152,923,038 through 152,926,801 (amplicons 1-13, Table 1) were analyzed for a subset of subjects, most of whom had a percentage of L cones that deviated from the mean for their ethnic group by more than one standard deviation, and thus represent the extremes for high and low percentage of L cones in our subject pool. For reference, the L opsin gene coding sequence starts at nucleotide 152,930,605 (note: amplicon 31 extends into the L opsin protein coding region), and the M opsin coding sequence starts at nucleotide 152,969,104.
As indicated in Table 1, the polymerase was either AmpliTaq Gold (0.25 μl), rTth DNA Polymerase XL (1 μl), Takara (0.5 μl), or a mixture of AmpliTaq Gold and Pfu in a 9:1 volume:volume ratio (0.25 μl) (Gunther et al., 2006). Final reaction volumes were 100 μl for reactions with rTth XL enzyme and 50 μl for all other reactions. Final concentrations for standard reaction conditions were 200 μM of each of dATP, dTTP, dCTP and dGTP, 1.4 mM of either MgCl2 (AmpliTaq Gold) or Mg(OAc)2 (rTth XL), 0.25 μl of AmpliTaq Gold or 1 μl of rTth XL enzyme, and 1 μl genomic DNA. In order to improve the yield of PCR product, we used non-standard conditions for amplicons 20, 21, 24, and 29 (Table 1). The final concentration of each of dATP, dTTP, dCTP, and dGTP was 400 μM for amplicon number 20 and 100 μM for amplicons 21, 24, and 29. The final concentration of MgCl2 was 1.0 mM for amplicons 21, 22, 23, and 29, and 3 mM for amplicon 24. For amplicon number 20, the Takara enzyme was used with the manufacturer-provided buffer containing MgCl2.
Standard thermal cycling conditions for reactions using AmpliTaq Gold included an initial 9 minute incubation at 95°C to activate the enzyme, followed by 37 cycles of 45 seconds at 94°C, 45 seconds at the annealing temperature given in Table 1, and 1 minute at 72°C. A final incubation was done for 7 minutes at 72°C. Standard thermal cycling conditions for reactions using the rTth XL enzyme included an initial incubation for 1 minute at 94°C, followed by 35 cycles of 15 seconds at 94°C, 30 seconds at the annealing temperature given in Table 1, and 1 minute at 72°C. Reactions were then incubated for 10 minutes at 72°C. Standard thermal cycling conditions for reactions using AmpliTaq Gold mixed with Pfu included an initial 10 minute incubation at 94 °C followed by 34 cycles of 20 seconds at 94°C, 1 minute at a changing annealing temperature, and 1 minute at 72°C. In the first cycle, the annealing temperature was 2°C above the annealing temperature given in Table 1, and it was decreased by 0.5°C for each successive cycle through cycle 14. Cycles 15 through 34 used the same temperature as in cycle 14 and this was 5°C below the temperature given in Table 1. After cycle 34, reactions were incubated for 10 minutes at 72°C.
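The stepped annealing profile used with the AmpliTaq Gold and Pfu mixture can be written out as a short sketch. The function and parameter names are hypothetical, and the 60°C input is only an example value, not a temperature from Table 1. The sketch follows the per-cycle description literally (start 2°C above the Table 1 temperature, drop 0.5°C per cycle through cycle 14, then hold); taken that way, the plateau works out to 4.5°C below the Table 1 temperature, so the parameters are left adjustable in case the stated 5°C plateau is the intended value.

```python
def touchdown_annealing_schedule(
    table_temp_c: float,
    start_offset_c: float = 2.0,
    step_c: float = 0.5,
    touchdown_cycles: int = 14,
    total_cycles: int = 34,
) -> list[float]:
    """Return the annealing temperature (deg C) for each cycle, 1-indexed order."""
    schedule = []
    for cycle in range(1, total_cycles + 1):
        # The temperature stops decreasing once the last touchdown cycle is reached.
        steps_taken = min(cycle, touchdown_cycles) - 1
        schedule.append(table_temp_c + start_offset_c - step_c * steps_taken)
    return schedule


if __name__ == "__main__":
    temps = touchdown_annealing_schedule(60.0)  # 60.0 is an illustrative value
    for cycle, temp in enumerate(temps, start=1):
        print(f"cycle {cycle:2d}: {temp:.1f} C")
```

Printing the schedule for a given amplicon makes it easy to check the plateau temperature against the intended protocol before programming the thermal cycler.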
Amplicons 14, 15, 16, 17, 19, 22, 26, 28, and 32 were screened for sequence variations by analyzing heteroduplex formation in dHPLC on the Wave-MD (Transgenomic, Omaha, NE). Heteroduplexes form when two strands of DNA differ by as little as a single nucleotide; however, heteroduplexes should not form in the samples from individual male subjects because they should have one X-chromosome. To screen for DNA polymorphisms by heteroduplex formation, an amplicon from each subject was mixed with the same amplicon from control subject 015, and in a separate tube, with the same amplicon from control subject 005. The nucleotide sequences of amplicons from control subjects were determined directly, as described below. To facilitate heteroduplex formation, amplicon mixtures were incubated in a thermal cycler with an initial temperature of 95°C, then decreased to 48°C in 0.5°C increments at 20 second intervals, then held at room temperature until assayed. Heteroduplex analysis was performed on the Wave-MD according to the manufacturer's instructions, using one or two temperatures. Two temperatures were used for amplicons lacking known polymorphic positions in order to increase our chances of detecting polymorphisms. The amplicons and temperatures used for dHPLC analysis were as follows: amplicon 14, 64.7°C and 64.8°C; amplicon 15, 61.5°C and 61.8°C; amplicon 16, 63.0°C and 63.3°C; amplicon 17, 64.5°C and 64.7°C; amplicon 19, 61.6°C and 61.8°C; amplicon 22, 58.4°C; amplicon 26, 61.6°C and 61.9°C; amplicon 28, 65.2°C; and amplicon 32, 62.5°C. DNA polymorphisms discovered by this method were verified by direct sequencing for all subjects who had the low frequency variant and for a subset of subjects with the high frequency variant.
Direct sequencing was used to determine the sequences of the two dHPLC control subjects (015 and 005), to spot-check dHPLC results, and to screen for polymorphisms in all amplicons in Table 1 except those listed above for which dHPLC screening was performed. Amplicons were sequenced with the primers indicated in Table 1 that were used in the initial PCR amplification. Sequencing reactions were done using the AmpliTaq FS kit (Applied Biosystems, Foster City, California) following the manufacturer's recommendations and analyzed on an ABI 3100 XL or an ABI 3100 Avant.
Amplicons 2 and 7 (Table 1) each contained homopolymeric regions (extended regions of a single repeated nucleotide) whose sequences could not be determined unambiguously by direct sequencing. Fragment analysis was used instead to evaluate the homopolymeric regions for variations in length across subjects. Amplicons 2 and 7 were amplified as described above except that one primer in each pair was 5'-end labeled with the fluorescent dye 6FAM (Table 1). PCR products were purified using Microcon YM-30 Centrifugal Filter Devices (Millipore), diluted 1:2, 1:10, and 1:100 with water, and 0.5 μl of the diluted PCR product was mixed with 9.25 μl formamide and 0.25 μl GeneScan LIZ500 size standard. Mixtures were heat denatured for 4 minutes at 95°C, cooled on ice for 5 minutes, run on an ABI 3130XL and analyzed using GeneMapper software.
Twenty-eight single nucleotide polymorphisms (SNPs) and four insertion/deletions (indels) were identified within the 7,567 base pair region upstream of the L opsin coding sequence in our subject population (Table 2). Another three polymorphic positions, two SNPs and one insertion/deletion, occurred in the 452 base pairs upstream of the M opsin coding sequence (Table 2). For each subject, we constructed a complete haplotype (a combination of polymorphisms) composed of all of the polymorphic positions identified in this study, and we constructed two partial haplotypes. Haplotypes are given in Table 3. The complete haplotype for each subject was generated by stringing together the nucleotides present at each of the polymorphic positions listed in the first column of Table 2, and partial haplotypes were constructed by stringing together the nucleotides present at the polymorphic positions for two subregions. One subregion includes the known transcriptional control elements for the L and M opsin genes and extends from 152,926,644 to 152,930,676 (amplicons 13-31, Table 1) and from 152,968,633 to 152,969,085 (amplicon 32, Table 1), henceforth referred to as the proximal subregion. The second subregion spans a DNA segment that was previously reported to contain an SNP that correlated with the ratio of L:M opsin mRNA in donor retinas (Stamatoyannopoulos et al., 2005). This segment extends from nucleotide 152,923,038 to 152,926,801, and will be referred to as the distal subregion. The rationale for splitting the haplotypes into two subregions for further analysis is twofold. First, as is evident in Table 3, when all of the polymorphic positions in the complete haplotype are evaluated, the number of different haplotypes is large with very low representation for each individual haplotype, and this is not amenable to statistical analysis.
Second, the proximal subregion contains the known DNA regulatory elements that are involved in controlling transcription of the opsin genes and polymorphisms in this region are likely candidates for influencing the relative number of L versus M cones.
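The haplotype construction described above, stringing together the nucleotides at the polymorphic positions in coordinate order, can be sketched as follows. This is an illustration only: the function names are hypothetical, the subregion boundaries are the sequenced ranges given in the methods, and the example subject is invented (the positions are ones named in this paper, but the alleles assigned at the two distal positions are placeholders, not data from Table 2 or Table 3).

```python
# Sequenced coordinate ranges from the methods (May 2004 assembly).
PROXIMAL_RANGES = [(152_926_644, 152_930_676), (152_968_633, 152_969_085)]
DISTAL_RANGES = [(152_923_038, 152_926_801)]


def in_ranges(position: int, ranges: list[tuple[int, int]]) -> bool:
    """True if the position falls inside any of the inclusive ranges."""
    return any(start <= position <= end for start, end in ranges)


def build_haplotype(genotype: dict[int, str], positions: list[int]) -> str:
    """Concatenate the alleles at the given positions in coordinate order."""
    return "".join(genotype[pos] for pos in sorted(positions))


def split_haplotypes(genotype: dict[int, str]) -> dict[str, str]:
    """Return the complete, proximal, and distal haplotype strings."""
    positions = sorted(genotype)
    proximal = [p for p in positions if in_ranges(p, PROXIMAL_RANGES)]
    distal = [p for p in positions if in_ranges(p, DISTAL_RANGES)]
    return {
        "complete": build_haplotype(genotype, positions),
        "proximal": build_haplotype(genotype, proximal),
        "distal": build_haplotype(genotype, distal),
    }


# Invented subject: alleles at the first two positions are placeholders.
example_subject = {
    152_923_125: "A",
    152_923_166: "G",
    152_928_238: "C",
    152_929_731: "A",
}

if __name__ == "__main__":
    print(split_haplotypes(example_subject))
```

Because a male subject carries a single X chromosome, one allele per position is sufficient and no phasing step is needed in this sketch.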
In the complete haplotypes (Table 3), the thirty-two polymorphic positions defined eight Caucasian and eleven African haplotypes. Africans and Caucasians shared the same most frequent haplotype, but its frequency was much lower in the African sample, only about 25%, compared to 50% of the Caucasian sample. No other overlap between African and Caucasian haplotypes occurred. Similar to other reports (e.g., Verrelli & Tishkoff, 2004), there was more diversity among the complete African haplotypes than among the Caucasian haplotypes, with nine of the fifteen African subjects having a unique haplotype whereas only six of eighteen Caucasian subjects had a unique haplotype (Table 3).
To evaluate the relationship between cone ratio and haplotype we generated relatively high frequency haplotypes for each ethnic group by evaluating the proximal and distal subregions separately. Any two individuals will differ on average at one in every 1,200 nucleotides, representing the accumulation of rare, random mutations (Venter et al., 2001). By reducing the size of the region and considering only the polymorphic sites that occur at relatively high frequency, we generated a data set that is amenable to statistical analysis. In these comparisons, nucleotide positions at which polymorphisms occurred in fewer than three people were excluded because the small numbers preclude statistical analysis (Table 3). For each ethnic group, this resulted in three proximal subregion haplotypes with at least five people per haplotype and three haplotypes with fewer than five people per haplotype. Further reduction of the proximal segment haplotypes excluding all but the six highest frequency polymorphisms in the proximal subregion for Africans and Caucasians combined allowed all but one Caucasian to be categorized into three proximal subregion haplotypes, and all Africans to be categorized into four haplotypes. These reduced proximal subregion haplotypes were extracted from the data in Table 3 by stringing together only the underlined nucleotides (the six highest frequency polymorphisms) for the proximal subregion. For each ethnic group, the frequency of each of the reduced haplotypes was calculated and a histogram of the data is shown in Figure 1.
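The reduction step can be sketched in a few lines. The sketch reads "polymorphisms that occurred in fewer than three people" as positions where fewer than three subjects carry anything other than the most common nucleotide, which is an interpretation rather than a rule stated in the text. The function names, the positions 10, 20, and 30, and the eight subjects are all invented for illustration.

```python
from collections import Counter


def informative_positions(genotypes: list[dict[int, str]], min_carriers: int = 3) -> list[int]:
    """Keep positions where at least min_carriers subjects carry a non-major allele."""
    positions = sorted(genotypes[0])  # assumes every subject is typed at the same positions
    kept = []
    for pos in positions:
        counts = Counter(subject[pos] for subject in genotypes)
        variant_carriers = len(genotypes) - max(counts.values())
        if variant_carriers >= min_carriers:
            kept.append(pos)
    return kept


def haplotype_frequencies(genotypes: list[dict[int, str]], min_carriers: int = 3) -> Counter:
    """Count reduced haplotypes built only from the retained positions."""
    kept = informative_positions(genotypes, min_carriers)
    return Counter("".join(subject[pos] for pos in kept) for subject in genotypes)


# Invented data: position 20 varies in only one subject and is therefore dropped.
subjects = [
    {10: "C", 20: "G", 30: "A"},
    {10: "C", 20: "G", 30: "A"},
    {10: "C", 20: "G", 30: "G"},
    {10: "C", 20: "G", 30: "G"},
    {10: "C", 20: "A", 30: "A"},
    {10: "T", 20: "G", 30: "A"},
    {10: "T", 20: "G", 30: "G"},
    {10: "T", 20: "G", 30: "G"},
]

if __name__ == "__main__":
    print(informative_positions(subjects))
    print(haplotype_frequencies(subjects))
```

Dropping the rare position merges subjects who would otherwise each define their own haplotype, which is what makes the remaining groups large enough to compare.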
Inspection of Figure 1A shows substantial differences in the distributions of the percentage of L cones for the different Caucasian haplotypes. The mean percent L cones for individuals with the GTCGGA haplotype is 64.6%, the mean for the GTCGGG haplotype is about 8 percentage points higher (72.6% L cones), and the mean for the GTTGGA haplotype is about 6 percentage points higher than that (78.4% L cones). Thus, the GTCGGA (low %L haplotype) and GTTGGA (high %L haplotype) groups differ by a striking 13.8 percentage points in percent L cones, and this difference tests as significant with p = 0.0194 (Kruskal-Wallis nonparametric ANOVA) and p<0.05 following the application of Dunn's correction for multiple comparisons. This difference of 13.8 percentage points corresponds to the difference between a ratio of about 2 L cones for every M cone and nearly 4 L cones for every M cone.
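The L:M ratios quoted above follow directly from the definition of percent L as 100*L/(L+M). A minimal sketch of the conversion, with a hypothetical function name and the three haplotype means from the text as inputs:

```python
def l_to_m_ratio(percent_l: float) -> float:
    """Convert percent L, defined as 100*L/(L+M), into the ratio L/M."""
    if not 0.0 <= percent_l < 100.0:
        raise ValueError("percent L must be at least 0 and below 100")
    return percent_l / (100.0 - percent_l)


# Mean percent L cones for the three Caucasian haplotypes reported in the text.
haplotype_means = {"GTCGGA": 64.6, "GTCGGG": 72.6, "GTTGGA": 78.4}

if __name__ == "__main__":
    for haplotype, percent_l in haplotype_means.items():
        print(f"{haplotype}: {percent_l}% L, L:M = {l_to_m_ratio(percent_l):.2f}:1")
```

The conversion is nonlinear, so equal steps in percent L correspond to progressively larger steps in L:M ratio as percent L increases.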
Figure 1B shows that the distribution of the percent L cones among African haplotypes is more overlapping compared to Caucasians. Consistent with this, the differences among African haplotypes do not test as being statistically different (p = 0.6607, Kruskal-Wallis nonparametric ANOVA). However, as we have reported earlier (McMahon et al., 2008), there are striking differences in cone ratio between Africans and Caucasians. Part of the difference can now be explained by the fact that the Caucasian “high %L haplotype” GTTGGA is absent from the African sample, while the Caucasian “low %L haplotype” GTCGGA is highly represented among the Africans; yet, these two differences do not completely explain the differences in cone ratio between the Africans and Caucasians because, in our sample, Africans versus Caucasians who share the most common haplotype in the proximal subregion (GGCTCGGGGGGGGCCC(), Table 3) still differ significantly in mean percentage of L cones (unpaired t test, p = 0.0011; 72.6% and 59.5%L cones for Caucasian and African, respectively). This suggests that additional nucleotide differences, such as those that occur in the distal subregion, also tune the relative numbers of L and M cones. Incidentally, one other proximal subregion haplotype is shared by Caucasians and Africans, GGCTCGGGGGAGGCC(), but the median percentage of L cones did not differ between these two relatively small groups (p = 0.7427; 64.6% versus 64.2%L cones).
The distal subregion spans a segment of DNA that was reported previously to contain a polymorphism that was significantly associated with the ratio of L:M cones as estimated from the ratio of L versus M opsin messenger RNA isolated from donor retinas (Stamatoyannopoulos et al., 2005). The polymorphism was reported to lie 7.4 kb upstream of the L opsin gene and to occur at high frequency in the population. We found two polymorphic nucleotide positions, 152,923,125 and 152,923,166, in the distal subregion that lie 7.4 kb upstream of the L opsin gene and that occur at high frequency (Table 2); however, the difference in the mean percentage of L cones for the subjects that differ at these two positions was not statistically significant in our sample (Mann-Whitney U test, p = 0.2786).
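The distances involved can be checked from the coordinates given in the methods. The sketch below assumes that "upstream of the L opsin gene" is measured from the start of the L opsin coding sequence at nucleotide 152,930,605; the earlier report may have used a different reference point, such as the transcription start site. The constant and function names are hypothetical.

```python
L_OPSIN_CODING_START = 152_930_605  # from the methods (May 2004 assembly)


def distance_upstream(position: int, coding_start: int = L_OPSIN_CODING_START) -> int:
    """Number of base pairs between a position and the coding start."""
    return coding_start - position


if __name__ == "__main__":
    for position in (152_923_125, 152_923_166):
        bp = distance_upstream(position)
        print(f"{position:,}: {bp:,} bp upstream ({bp / 1000:.2f} kb)")
    # The most distal nucleotide examined, for comparison with the region size.
    print(distance_upstream(152_923_038))
```

Under this assumption the two positions lie 7,480 and 7,439 base pairs upstream, both within 100 base pairs of the reported 7.4 kb, and the most distal nucleotide examined lies 7,567 base pairs upstream, matching the size of the region surveyed.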
How might the differences among haplotypes affect the percentage of L cones in the retina? Polymorphisms in the regions examined in this study have a high potential for altering the binding affinity for proteins involved in regulating transcription and thereby in the choice made in each photoreceptor of which X-chromosome opsin gene to express. Differences among the haplotypes may tip the balance more or less in favor of expressing L opsin, and thus are likely candidates for producing subtle differences such as the difference between the intermediate versus the high and low % L cones phenotypes. Recently, evidence has accumulated indicating that the difference between L versus M cones is decided simply by which of the X-chromosome cone opsin genes is chosen for expression (Smallwood et al., 2002; Knoblauch et al., 2006). DNA polymorphisms in the region of the opsin genes containing elements involved in controlling gene expression and silencing can influence the transcriptional output of each of the X-chromosome opsin genes. Since each polymorphism can act neutrally, synergistically, or counter to each other polymorphism, the contribution of individual polymorphisms can be difficult to detect and to quantify, especially if the contribution is relatively minor. Difficulties in detection are compounded by the relatively low frequency of each haplotype.
In this study, after correcting for multiple comparisons, one polymorphism was detected as significantly correlated with the percentage of L cones. For the Caucasian proximal region haplotypes, pairwise comparisons showed a significant difference in the mean percentage of L cones between GTCGGA (relatively low % L cones) versus GTTGGA (relatively high % L cones). The nucleotide position that differs between these haplotypes is C versus T at 152,928,238. This serves as an example of an association between a nucleotide difference and cone ratio that is predicted by the hypothesis that nucleotide polymorphisms upstream of the X-chromosome opsin gene array tune L:M cone ratio.
In summary, this study provides insight into the complex nature of the biological control of the relative number of L versus M cones. The one statistically significant difference identified here supports the principle that the nucleotide differences in the region of the opsin genes are responsible for determining cone ratio. Because of the high degree of variability, identifying which other nucleotides are involved will require testing many more subjects. The present analysis did identify one other candidate nucleotide. High and low % L cones haplotypes differed from the intermediate %L cones haplotype in an A versus G nucleotide at position 152,929,731. After correcting for multiple comparisons in our data set the difference in the mean percent L cones was not quite statistically significant between the GTCGGG haplotype associated with the intermediate %L cones and either the high (GTTGGA) or low (GTCGGA) %L cones haplotypes; however, this nucleotide merits further study in future efforts to identify genetic changes associated with variability in cone ratio.
This work was supported by National Eye Institute Grants F32EY014789 (KLG), T32EY014537, R01EY09303, and P30EY01961, and by an unrestricted Research to Prevent Blindness (RPB) grant to the department of ophthalmology. Jay Neitz was the recipient of an RPB Senior Scientific Investigator Award.