To present results from a nested association study of the complement factor H (CFH) gene region using a novel methodology that uses a high‐resolution genetic linkage disequilibrium map to estimate a point location for a causal mutation.
Age‐related macular degeneration (AMD) case–control data from a genomewide single‐nucleotide polymorphism (SNP) panel were used to identify the target interval to be genotyped at higher density in a second independent panel. The pattern of linkage disequilibrium (LD) and segmental duplications across this region are described in detail.
Data were consistent with other studies in that strong association between the Y402H variant and AMD is observed. However, composite likelihood analysis, which combines association data from all SNPs in the region, and uses genetic locations on a high‐resolution LD map, gave a point location for a causal variant between exons 1 and 2 of the CFH gene.
The findings are consistent with evidence that, in addition to the widely described Y402H variant, there is at least one and, most probably, several other mutations in the CFH gene which determine disease manifestation in AMD. A genetic model in which multiple mutations contribute to a varying degree to disease aetiology has been previously well described in ophthalmic genetics, and is typified by the COL2A1 and ABCA4 genes.
The complement factor H (CFH) gene region has recently been implicated in age‐related macular degeneration (AMD) in a number of independent analyses.1,2,3,4,5 The genomewide association, candidate region and candidate gene analyses have each demonstrated that many single‐nucleotide polymorphism (SNP) markers which localise to this gene region show significant association, but rs1061170, through its conferral of an amino acid change at position 402 in the CFH protein, has received particular attention. Y402H has now been shown in various populations to predict increased risk for AMD, and functional evidence is now emerging in support of a causal mechanism.6 Positive staining for complement components in the drusen of patients with AMD supports an argument implicating an aberrant inflammatory process in disease progression.7 The CFH gene is localised within a cluster of complement genes, many of which share high genomic sequence identity that could extend to shared molecular functionality.8 Given the putative location of a causal determinant for AMD within a family of related genes, we assess the evidence for association with the CFH gene in an independent sample using a novel methodological approach that incorporates information on the local pattern of linkage disequilibrium (LD).
The sequencing of the human genome and biotechnological advances in high‐throughput genotyping represent recent milestones in association mapping. Nevertheless, lessons might still need to be learned from the early linkage studies. Simon et al9 reported positive linkage between the haemochromatosis gene (HFE) and human leucocyte antigen‐A (HLA‐A) in 1976, but it was only when Feder et al10 incorporated evidence from the linkage map that the HFE locus was correctly identified 20 years later. This lag was a consequence of investigators inching stepwise along the physical map from HLA‐A towards the HFE gene, a distance of 4.6 Mb. The linkage map, however, showed little evidence for recombination between HLA‐A and HFE, placed them <1 cM apart and provided the key to isolating the causal locus. A better understanding of the highly non‐linear relationship between the physical and genetic maps could have accelerated progress.
Recent advances have enabled the development of a novel type of genetic map with much greater resolution than the linkage map (in cM), which was so important for mapping of major disease genes. Linkage disequilibrium maps are analogous to the cM maps, but reflect historical recombination by using high‐density marker data from independent subjects within a population rather than rare meiotic events within a few families. One set of such maps, developed by Tapper et al,11 has shown excellent concordance with the most recent linkage maps, but with ~1500‐fold higher resolution. The power of these maps to identify recombination hotspots has been further verified by direct observation of meiotic events in sperm.12
We describe a nested association study for AMD within the CFH gene region using genetic distances from LD maps rather than physical locations in kilobases (kb). Evidence for association is tested by modelling association with multiple SNPs rather than using single‐SNP analyses, which necessitate a heavy multiple testing correction. This analytical approach has shown a significant gain in power for association studies.13 The nested study design involves a first round of analysis in a conservatively large genomic region which has been relatively coarsely genotyped to target a smaller region for a second round of analysis using higher density data in an independent sample.
The nested strategy firstly involved detailed examination of the evidence for association of AMD with the CFH gene region using a subset of the genomewide association data described by Klein et al.5 These data represent one of the first whole‐genome SNP studies using the Affymetrix 100 K genotyping array collected for 96 patients with AMD and 50 controls. The second stage of high‐density SNP typing, with a locally ascertained case–control sample, was approved by the Southampton and Southwest Hants Local Research Ethics Committee (approval no. 347/02/t) and followed the tenets of the Declaration of Helsinki. Informed written consent was secured for 100 Caucasian subjects aged >55 years, with a diagnosis of AMD, and for 100 normal Caucasian controls aged >55 years recruited from ophthalmology clinics at the Southampton Eye Unit, Southampton, UK. Patients for the study underwent a detailed ophthalmic examination to characterise AMD phenotypes. Stereoscopic fundus photographs and fluorescein angiograms were recorded using a Topcon digital retinal camera (model TRC50IX; Topcon GB Ltd, Berkshire, UK). These photographs were graded by a masked observer into four groups of increasing disease severity as described in the Age‐Related Eye Disease Study.14 A 10 ml peripheral blood sample was collected from these patients and DNA was extracted according to the salting out method15 and stored at −20°C.
Genotyping of the locally ascertained case–control sample was outsourced to Ellipsis Biotherapeutics, Toronto, Canada. Samples were shipped as 50 μl aliquots of DNA at a concentration of 2 ng/μl and included six randomly selected duplicates. Multiplex PCR and SNP analyses were performed using the GenomeLab SNPstream Genotyping System (Beckman Coulter, Fullerton, California, USA) and its accompanying automated SNPstream software suite. Primers for the multiplex PCR and single‐base extension reactions were optimally designed using web‐based software provided at http://www.autoprimer.com (Beckman Coulter). The high‐throughput SNP genotyping assay was performed according to methods described previously.16
Association analyses were undertaken using the LOCATE17 and CHROMSCAN18 programs. Briefly, these methods compute association (z) and corresponding information (Kz) between each SNP and the AMD phenotype, and model the decline of association with genetic distance using composite likelihood. A location for a disease locus is estimated and its 95% CI is computed. The CHROMSCAN program computes the probability value (p value) through a permutation test which randomly shuffles case–control status to create multiple replicates for analysis under the assumption of no association (H0). The resultant probability distribution from multiple replicates (eg, 10 000) enables the computation of p values under H1 that are not distorted by the autocorrelation created by a large number of markers in high LD.
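As an illustration of how a point location can be estimated by modelling the decline of association with genetic distance, the following minimal Python sketch assumes the Malecot model, z = (1 − L)·M·exp(−ε·d) + L, a formulation commonly used in the LD‐mapping literature. Whether LOCATE and CHROMSCAN use exactly this parameterisation is an assumption here, and the function names, the fixed values of M and L and the simple grid search are hypothetical simplifications (in practice, such parameters would normally be estimated jointly with the location). No CI is computed in this sketch.

```python
import math


def malecot_expected(d, M, L, epsilon=1.0):
    """Expected association at genetic distance d (in LDU) from the causal site.

    Assumed Malecot form: (1 - L) * M * exp(-epsilon * d) + L.
    """
    return (1.0 - L) * M * math.exp(-epsilon * d) + L


def composite_loss(S, snps, M, L, epsilon=1.0):
    """Information-weighted sum of squared residuals for a candidate location S.

    snps is a list of (location_ldu, z, K_z) tuples; smaller values fit better.
    """
    return sum(
        k * (z - malecot_expected(abs(loc - S), M, L, epsilon)) ** 2
        for loc, z, k in snps
    )


def estimate_location(snps, M, L, epsilon=1.0, step=0.01):
    """Grid search over the mapped interval for the location minimising the loss.

    M and L are held fixed here purely to keep the sketch short.
    """
    lo = min(loc for loc, _, _ in snps)
    hi = max(loc for loc, _, _ in snps)
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda S: composite_loss(S, snps, M, L, epsilon))
```

Because distances are taken on the LDU scale rather than in kb, markers separated by a long physical stretch of high LD contribute as near neighbours, which is the motivation for using the LD map.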
All SNPs lying within a 2 Mb interval centred on the CFH gene were identified from the Klein et al5 sample and subjected to quality‐control checks. All duplicate results were concordant. Of the 82 SNPs within this interval, 27 with minor allele frequency <5% and one with a significant deviation from Hardy–Weinberg equilibrium were excluded from further analyses. The remaining 54 SNPs were assigned linkage disequilibrium unit (LDU) locations19 using a chromosome‐1 LD map from the haplotype map of the human genome (HapMap) in the Caucasian (CEU) population (individuals of northern and western European ancestry; http://cedar.genetics.soton.ac.uk/public_html/LDB2000/). Analysis of these data indicated a 95% CI of ~220 kb extending from 193338 to 193560 kb encompassing the CFH, complement factor H‐like 3 (CFHL3) and CFHL1 genes. This constituted the target interval for the second stage of the nested design using high‐density genotyping in an independent sample. Like many of the genes within this genomic region, CFHL3 and CFHL1 seem to have arisen as a result of genomic duplication and form members of the same family of complement proteins. Although much less is known about the functional roles of CFHL3 and CFHL1 gene products, their proteins are rich in the same Sushi domains or complement control protein modules found in CFH, and as such their cellular roles and targets might be similar to or even overlap with CFH.20 These domains are known to be involved in many recognition processes, including the binding of several complement factors to fragments C3b and C4b of the complement system.21
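The quality‐control step described above (exclusion of SNPs with minor allele frequency <5% or with significant deviation from Hardy–Weinberg equilibrium) can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the Hardy–Weinberg threshold of 3.84 (χ² with 1 df at p=0.05) is an assumption, as the significance level applied in the study is not stated.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1.0 - p)


def hwe_chi2(n_aa, n_ab, n_bb):
    """Pearson chi-square (1 df) comparing observed genotype counts with
    Hardy-Weinberg expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip cells with zero expectation (monomorphic SNPs).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def passes_qc(counts, maf_min=0.05, hwe_chi2_max=3.84):
    """True if a SNP meets both criteria.

    maf_min follows the 5% cut-off in the text; hwe_chi2_max is an assumed
    threshold, not one reported by the study.
    """
    n_aa, n_ab, n_bb = counts
    if minor_allele_frequency(n_aa, n_ab, n_bb) < maf_min:
        return False
    return hwe_chi2(n_aa, n_ab, n_bb) <= hwe_chi2_max
```

For example, `passes_qc((81, 18, 1))` accepts a SNP with a minor allele frequency of 10% in exact Hardy–Weinberg proportions, whereas a SNP with no heterozygotes among 100 subjects, such as `(50, 0, 50)`, is rejected.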
Figure 1 shows the LD map for the 2 Mb region and indicates the limits of the CI. The region as a whole is shorter on the LD scale than the genome average (typically 20 LDUs/Mb in the CEU population), indicating a particularly high level of LD in the critical region.
Initially, SNPs chosen for high‐density genotyping across the target region were enriched for non‐synonymous mutations, potential splice variants and coding SNPs. Preliminary genotyping results indicated a high failure rate among this panel. Investigation using the segmental duplication database22 (http://humanparalogy.gs.washington.edu/) showed a high level of genomic duplication, which is likely to have contributed to genotyping failure (fig 2). Replacement SNPs were selected to achieve uniform density across the target tract from those verified by the HapMap project to have a minor allele frequency >5% in the CEU population.23
A total of 37 SNPs were successfully typed, achieving an average density of ~1 SNP per 6 kb (table 1). The CHROMSCAN program was used to assess the significance of the association between the entire tract and AMD. Random shuffling of case–control affection status provided 10 000 replicates. These simulation tests (which account for multiple testing and autocorrelation of marker data) established the significance of association between our target region and AMD under the alternative hypothesis as p=1.16×10⁻⁵. SNPs from across the region exhibit significant association with the disease phenotype, and, consistent with other studies, the Y402H (rs1061170) variant at 193391 kb has the maximum χ² when SNPs are examined individually (χ²₁=29.34). Composite likelihood analysis, which combines weighted information for all markers, estimates a maximum likelihood point location for the disease locus at 12.44 LDUs, which corresponds to a location at 193364 kb, ~25 kb from Y402H but still within the CFH gene. However, the 95% CI is wide at 12.0051–12.8749 LDUs, reflecting the high level of LD across the whole region.
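The logic of a permutation test based on shuffling case–control status can be sketched in Python as below. This is not the CHROMSCAN implementation: the region‐wide statistic used here (the sum of single‐SNP allelic χ² values) is a hypothetical stand‐in for the composite likelihood statistic, and the function names and the (exceed + 1)/(replicates + 1) estimator are assumptions made for illustration. Because each replicate keeps every subject's genotypes intact across all SNPs, the correlation between markers in high LD is preserved under H0.

```python
import random


def allele_chi2(case_alleles, control_alleles):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts.

    Each argument is a (minor_allele_count, major_allele_count) pair.
    """
    a, b = case_alleles
    c, d = control_alleles
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def region_statistic(genotypes, status):
    """Hypothetical region-wide statistic: sum of single-SNP chi-squares.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of subject i at SNP j;
    status[i] is 1 for a case and 0 for a control.
    """
    n_case = sum(status)
    n_ctrl = len(status) - n_case
    total = 0.0
    for j in range(len(genotypes[0])):
        case_minor = sum(g[j] for g, s in zip(genotypes, status) if s == 1)
        ctrl_minor = sum(g[j] for g, s in zip(genotypes, status) if s == 0)
        total += allele_chi2(
            (case_minor, 2 * n_case - case_minor),
            (ctrl_minor, 2 * n_ctrl - ctrl_minor),
        )
    return total


def permutation_p_value(genotypes, status, n_replicates=10000, seed=0):
    """Empirical p value from shuffling case-control labels under H0."""
    rng = random.Random(seed)
    observed = region_statistic(genotypes, status)
    shuffled = list(status)
    exceed = 0
    for _ in range(n_replicates):
        rng.shuffle(shuffled)  # genotypes stay fixed, so LD structure is kept
        if region_statistic(genotypes, shuffled) >= observed:
            exceed += 1
    # Adding 1 to numerator and denominator avoids reporting p = 0.
    return (exceed + 1) / (n_replicates + 1)
```

With this estimator, the smallest p value that can be reported from 10 000 replicates is about 1×10⁻⁴, so the number of replicates bounds the resolution of the empirical p value.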
The region surrounding the CFH gene on chromosome 1 has been shown in many studies to have significant association with AMD. Particular attention has been given to the Y402H variant, owing to its particularly high degree of association in single SNP analyses, combined with putative functional relevance of the amino acid replacement it confers. This attention is sustained, despite observation of stronger association between the disease and other non‐coding SNPs in the vicinity.24 The current study adopted a nested approach to examine the evidence for association between AMD and this genomic region. We first analysed low‐density SNP data over a 2 Mb segment extracted from a genomewide association dataset in order to identify a target region before a second round of higher density mapping in an independent sample. The analytical method incorporated a high‐resolution linkage disequilibrium map by assigning LDU locations to informative marker data. The method further contrasts with preceding studies in that a model using composite likelihood is used to assess the association data from all marker loci in one analysis. This method is attractive for a number of reasons: it removes the need for a heavy multiple testing correction; it provides a point estimate for the location of a causal mutation; and it does not rely on arbitrary assumptions regarding haplotype block boundaries.
Our findings from the initial stage of the analysis indicated a 95% CI for association of AMD with a region encompassing the CFH gene and also two other members of the complement gene family—CFHL3 and CFHL1. LD maps created using comprehensive SNP data from the HapMap project show that, although this target interval can be split into a number of unique blocks of LD, historical recombinants which delimit these blocks have been infrequent. Furthermore, this pattern of extended LD observed in the CEU population corresponds very well with the LD map created using an independent sample of African ethnicity from the HapMap dataset.25
The composite likelihood analysis of high‐density SNP data from our locally ascertained case–control panel yielded a point estimate for a causal mutation between exons 1 and 2 of the CFH gene. This estimate is ~25 kb from the Y402H variant and lies within an LD block encompassing the 5′ end of the CFH gene, whereas Y402H lies within an extensive adjacent block which includes the CFHL3 gene. The discrepancy between the point estimate and the maximum single χ² statistics in the region is interesting but not unprecedented. Previous association studies using case–control data have shown that the location of maximal χ² statistics for single SNPs can be very misleading in efforts to identify a causal locus. An illustration is an analysis using composite likelihood17 as well as other coalescent‐based methods,26 which showed accuracy and power in identifying a locus with known location causing altered drug response. In both analytical approaches, the authors reported CIs for association, which excluded SNPs exhibiting the maximal χ²₁ >> 100 but included the known location.
Our finding compares well with those of a recent study in a Japanese AMD sample, in which variants towards the 5′ end of CFH and not Y402H exhibited strongest evidence for association.27 Similarly, an association study of variants across the CFH gene in a Chinese population identified an SNP located in the promoter region which conferred significantly increased likelihood of exudative AMD.28 However, one limitation of the composite approach is the simplifying assumption of a single causal variant in the region, whereas, in reality, multiple mutations which are detrimental to gene productivity or function are likely to be present. Indeed, within ophthalmic genetics, there are well‐documented examples of multiple adverse mutations—for example, in the ABCA4 and COL2A1 genes underlying Stargardt disease and Stickler syndrome, respectively.29,30 Furthermore, the pattern of LD surrounding the CFH gene is consistent in all populations examined, indicating that there has been little historical recombination within the region. This lack of recombination hampers fine‐scale association mapping, as neutral variants which are physically remote but genetically linked to the causal mutation will “hitchhike” on the same haplotype and exhibit evidence for association. The outcome of this is that CIs are necessarily large, as we have found in this study, and the emphasis will be on functional tests to determine the relative importance of different polymorphisms across the region in determining disease.
This study represents the first composite likelihood analysis of association between AMD and the CFH gene region. Our data indicate a point estimate for the causal locus at the 5′ end of the CFH gene, but show that a high degree of linkage disequilibrium exists across the whole region, which makes further fine‐mapping studies difficult. These results suggest that there could be other important sequence variations relevant to AMD at the 5′ end of the CFH gene in addition to the well‐described Y402H variant. Further confirmation of these results will require functional studies.
We thank all study participants for their time and cooperation, and the Southampton Wellcome Trust Clinical Research Facility, for its support in enrolling patients. This study was funded by the Macular Vision Research Foundation, the Wellcome Trust, the Macula Society, the Macula Disease Society, Lord Sandberg and the British Council for Prevention of Blindness.
AMD - age‐related macular degeneration
CEU - Caucasian
CFH - complement factor H
HapMap - haplotype map of the human genome
LD - linkage disequilibrium
LDU - linkage disequilibrium unit
SNP - single‐nucleotide polymorphism
Competing interests: None declared.