Childhood absence epilepsy (CAE) is an idiopathic generalised epilepsy characterised by absence seizures manifested by transitory loss of awareness with 2.5–4 Hz spike-wave complexes on ictal EEG. A genetic component to aetiology is established, but the mechanism of inheritance and the genes involved are not fully defined. Available evidence suggests that genes encoding brain-expressed voltage-gated calcium channels, including CACNG3 on chromosome 16p12-p13.1, may represent susceptibility loci for CAE. The aim of this work was to further evaluate CACNG3 as a susceptibility locus by linkage and association analysis. Assuming locus heterogeneity, a significant HLOD score (HLOD=3.54, α=0.62) was obtained for markers encompassing CACNG3 in 65 nuclear families with a proband with CAE. The maximum NPL score was 2.87 (p<0.002). Re-sequencing of the coding exons in 59 patients did not identify any putative causal variants. A linkage disequilibrium (LD) map of CACNG3 was constructed using 23 single nucleotide polymorphisms (SNPs). Transmission disequilibrium was sought using individual SNPs and SNP-based haplotypes with the pedigree disequilibrium test in 217 CAE trios and the 65 nuclear pedigrees.
Evidence for transmission disequilibrium (p≤0.01) was found for SNPs within a ~35kb region of high LD encompassing the 5′UTR, exon 1 and part of intron 1 of CACNG3. Re-sequencing of this interval was undertaken in 24 affected individuals. Seventy-two variants were identified: 45 upstream, two in the 5′UTR and 25 intronic. No coding sequence variants were identified, although four variants are predicted to affect exonic splicing.
This evidence supports CACNG3 as a susceptibility locus in a subset of CAE patients.
The absence epilepsies are a group of idiopathic generalised epilepsies (IGEs) which vary in their age of onset, seizure frequency and pattern of evolution. The typical absence seizure is manifested as a transitory loss of awareness with 2.5–4 Hz spike-wave complexes on ictal EEG. The International League Against Epilepsy (ILAE) classification recognises a number of distinct absence epilepsy syndromes including childhood absence epilepsy (CAE), juvenile absence epilepsy (JAE), epilepsy with myoclonic absences, eyelid myoclonus with absences and juvenile myoclonic epilepsy (JME)1. However, it is uncertain whether they represent a ‘biological continuum’ or distinct entities. There is some evidence that CAE and JAE share a close genetic relationship, allowing them to be considered as one phenotype in genetic studies2,3.
Twin studies demonstrate that the IGEs have a significant heritability4, with regard to both occurrence and type of seizure and syndrome5. The molecular genetic basis of CAE in humans is presently unknown, but studies on the mechanism by which spike-wave seizures are generated, isolation of genes causing spike-wave seizures in rodents and initial linkage and association studies in humans have allowed candidate genes and chromosomal regions to be identified.
Four mouse models of spike-wave epilepsy are caused by mutations in genes encoding different subunits of voltage-gated calcium channels (VGCCs): tottering tg, Cacna1a6; lethargic lh, Cacnb47; ducky du, Cacna2d28; and stargazer stg, Cacng29. There is some evidence that the gamma subunits may function as transmembrane AMPA receptor regulatory proteins (TARPs)10, involved in AMPA receptor trafficking11.
Genome-wide linkage analysis of IGE-multiplex families has demonstrated evidence for susceptibility loci on chromosomes 3q26, 14q23 and 2q3612. Furthermore, loci for three similar forms of absence epilepsy have been identified on chromosomes 8q24 (ECA1), 5q31.1 (ECA2) and 3q26 (ECA3)13,14,15,16. An association in humans has been documented between polymorphisms in CACNA1A (chromosome 19p13.2-p13.1) and IGE including CAE17. Finally, 12 missense mutations in CACNA1H (chromosome 16p13.3) have been found in 14 sporadic Chinese Han patients with CAE but not in any of 230 unrelated controls18. However, Heron et al. screened exons 9 to 11 of CACNA1H (in which 75% of the missense mutations were found) in 192 patients with IGE or generalised epilepsy with febrile seizures plus but did not find any of those identified by Chen et al.19. Furthermore, we did not find any of the 12 missense mutations in our resource of CAE families and trios; nor did we find any evidence for linkage to the CACNA1H locus20.
Previous analysis of 33 nuclear families with CAE under the assumption of heterogeneity produced evidence supportive of linkage to the CACNG3 locus on 16p12-p13.121, with an HLOD score of 0.55 (α=0.35) and an NPL score of 1.21. Although the HLOD score did not reach statistical significance, this may reflect the lack of power in the family resource and locus heterogeneity. This gene was prioritised for further analysis because it had the second most positive HLOD score, albeit not statistically significant, and because it is a compelling candidate on biological grounds. The GABA cluster on chromosome 15q, which had the most positive HLOD score, has also been investigated further but the results are not presented here. Those genes with HLOD scores of zero and without any supportive evidence from NPL analysis were not pursued in the larger resource.
Available evidence therefore suggests that genes encoding brain-expressed voltage-gated calcium channels, including CACNG3, may contribute to the aetiology of CAE. The aim of this work was to test this hypothesis in two ways: by linkage analysis using microsatellite loci spanning CACNG3 in a resource of 65 nuclear families, each with a proband with CAE (expanded from the original resource of 33 families); and by association analysis using 23 SNPs distributed across CACNG3 in the nuclear families and 217 parent-affected child trios.
The 217 trios (affected child and both parents) and 65 nuclear pedigrees (with a total of 145 AE cases, including 25 of the pedigrees originally used by Robinson et al.), were all of Caucasian origin and ascertained from European populations including the UK, France, Germany, Austria, the Netherlands, Denmark, Sweden, Finland and Italy. Clinical data on subjects categorised as affected are provided in the supplementary data. Appropriate informed consent was obtained from all participants. Diagnostic criteria based on the ILAE classification of absence epilepsies were applied as described in the supplementary data1.
Genomic DNA was extracted from whole blood or cheek swab samples according to standard protocols.
Linkage analysis was performed using three fluorescently labelled microsatellite markers: D16S420, situated 5′ of the gene; URB036, within the gene; and a novel marker, UCL1032122, situated 3′ of the gene (chromosome 16: 24416648-24416860). Together they span a genetic distance of 0.83 cM and a physical distance of 273 kb (Figure 1). This interval is smaller than that originally tested for linkage by Robinson et al., because the earlier marker set was chosen to cover CACNA1H as well, whereas the present study specifically targets CACNG3.
Genotyping was performed on the ABI 373 Sequence Analyser using the Genescan® and Genotyper® software. All pedigrees were checked for Mendelian inheritance using the PedCheck program23. Any pedigrees which failed this test were re-genotyped. Multipoint linkage analysis was performed using GeneHunter 2.124. Parametric analysis was performed under the assumption of autosomal dominant inheritance with a penetrance of 50%. A disease allele frequency of 0.01 and a phenocopy rate of 0.0001 were assumed. These values are compatible with the observed population prevalence and sibling recurrence risk ratio attributable to the locus, based on the original calculations of Risch25. HLOD scores were calculated together with an estimate of α, the proportion of pedigrees consistent with linkage at a specific locus. The nonparametric linkage (NPL) statistic and its corresponding significance level were also calculated by GeneHunter. The NPL statistic assesses allele sharing among affected individuals only; because it is a “model-free” analysis, it avoids problems inherent in parametric analysis, such as misspecification of model parameters.
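To make the heterogeneity statistic concrete, the admixture model behind an HLOD score can be written as HLOD(α) = Σ_i log10[α·10^LOD_i + (1 − α)], maximised over α in [0, 1], where LOD_i is the LOD score of family i at a given map position. The Python sketch below assumes the per-family LOD scores have already been computed (GeneHunter does this internally for multipoint data); the function names, the example LOD values and the grid search over α are illustrative choices, not GeneHunter's implementation.

```python
import math


def hlod(family_lods, alpha):
    """Heterogeneity LOD at one map position for admixture proportion alpha.

    family_lods: per-family LOD scores at that position (assumed precomputed).
    alpha: proportion of families assumed to be linked (0 <= alpha <= 1).
    """
    return sum(math.log10(alpha * 10 ** lod + (1 - alpha)) for lod in family_lods)


def max_hlod(family_lods, steps=1000):
    """Maximise the HLOD over alpha on an evenly spaced grid in [0, 1].

    Returns (hlod, alpha). alpha = 0 always gives HLOD = 0, so the
    maximum is never negative.
    """
    best_score, best_alpha = 0.0, 0.0
    for k in range(steps + 1):
        alpha = k / steps
        score = hlod(family_lods, alpha)
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha


if __name__ == "__main__":
    # Hypothetical per-family LOD scores: some families support linkage, some do not.
    lods = [1.2, 0.9, 0.7, -0.8, -1.1, -0.6]
    score, alpha = max_hlod(lods)
    print(f"HLOD = {score:.2f} at alpha = {alpha:.2f}")
```

With α = 1 the HLOD reduces to the ordinary summed LOD score, so under this model the HLOD can only equal or exceed the homogeneity LOD; it gains most when linked and unlinked families are mixed, as assumed in this study.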
Bi-directional re-sequencing of the coding exons (chromosome 16: 24174862-24175823; 24265106-24266056; 24273263-24274226; and 24279814-24281595) was undertaken in 59 cases taken from the 65 nuclear pedigrees. This re-sequencing was performed in conjunction with ABC at Imperial College, London, using standard Sanger dideoxy protocols.
Genomic DNA was typed for 23 SNPs by KBiosciences using both the Amplifluor™ and TaqMan™ chemistries. SNPs were chosen at ~10kb intervals, encompassing the putative promoter region through to the 3′ UTR (Figure 2). The SNPs have been numbered from 1 to 23 for ease of reference. One of these is a novel SNP identified through previous sequencing of a subset of the nuclear pedigrees; the remaining 22 are listed in the NCBI SNP database (Table 1).
These SNPs were typed in the entire resource and the genotypes were used to construct LD blocks with Haploview 3.226. Blocks were defined as a solid spine of LD, i.e. the first and last markers in a block are in strong LD with all intermediate markers (one slight mismatch is allowed by the program), but these intermediate markers are not necessarily in LD with one another. A minimum D′ of 0.7 was used as the cut-off for strong LD. The program's standard colour scheme was used: pairwise D′ values less than 1 are displayed, and pink/red shading indicates a pairwise LOD ≥ 2. GeneHunter was used to construct haplotypes based on the largest blocks identified. Intrafamilial association analysis was performed on individual SNPs using the PDT27. The PDT produces two measures of association, the PDT-AVE and the PDT-SUM. The former gives all families equal weight in the analysis, whereas the latter gives more weight to more informative families28. Association analysis was also performed on the SNP haplotypes. Each haplotype was assigned a single number so that the analysis could be performed essentially as though each haplotype were an allele at a single multi-allelic locus. This is necessary because the PDT cannot analyse multiple loci simultaneously.
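The two PDT variants differ only in how each family's contribution is combined. The following is a minimal Python sketch for one biallelic SNP, assuming genotypes are coded as counts (0, 1, 2) of the tested allele and that each family has already been broken into parent-affected-child triads and discordant sibling pairs; Mendelian checks and multi-allelic loci are omitted, and the `Family` structure is hypothetical. Each informative triad scores (tested alleles transmitted minus not transmitted) and each discordant sib pair scores (tested alleles in the affected minus the unaffected sib). PDT-AVE averages these within a family and PDT-SUM adds them; in both cases the statistic (Σ D_j)² / Σ D_j² is referred to a χ² distribution with 1 df.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Family:
    """Hypothetical container: genotypes are counts (0, 1, 2) of the tested allele."""
    triads: list = field(default_factory=list)  # (father, mother, affected_child)
    dsps: list = field(default_factory=list)    # (affected_sib, unaffected_sib)


def triad_score(father, mother, child):
    """Tested alleles transmitted minus not transmitted; None if uninformative."""
    if father != 1 and mother != 1:
        return None  # at least one heterozygous parent is required
    transmitted = child
    not_transmitted = father + mother - child
    return transmitted - not_transmitted


def dsp_score(affected, unaffected):
    """Tested alleles in the affected minus the unaffected sib; None if uninformative."""
    if affected == unaffected:
        return None  # identical genotypes carry no information
    return affected - unaffected


def family_d(fam, average):
    """Family contribution D_j: mean (PDT-AVE) or sum (PDT-SUM) of informative scores."""
    scores = [triad_score(*t) for t in fam.triads] + [dsp_score(*p) for p in fam.dsps]
    scores = [s for s in scores if s is not None]
    if not scores:
        return None  # family is uninformative for this SNP
    total = sum(scores)
    return total / len(scores) if average else total


def pdt(families, average=True):
    """Return (chi2, p) with chi2 = (sum D_j)^2 / sum D_j^2 on 1 df."""
    d = [x for x in (family_d(f, average) for f in families) if x is not None]
    denom = sum(x * x for x in d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = sum(d) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


if __name__ == "__main__":
    fams = [
        Family(triads=[(1, 0, 1)]),
        Family(triads=[(1, 1, 2)]),
        Family(triads=[(1, 2, 2)], dsps=[(2, 1)]),
    ]
    print("PDT-AVE", pdt(fams, average=True))
    print("PDT-SUM", pdt(fams, average=False))
```

In the toy example the third family has two informative units, so PDT-SUM counts it twice as heavily as PDT-AVE does, which is the weighting difference described above.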
The block structure of the CACNG3 locus was also determined using the HapMap genotyped SNPs (see supplementary data).
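The solid-spine block definition used for the LD map can be sketched in Python. D′ is computed from a two-locus haplotype frequency and the two allele frequencies, and blocks are then found greedily as runs in which the first and last markers are each in strong LD (D′ ≥ 0.7, the cut-off used here) with every marker in between. This is a simplified, hypothetical version: it omits Haploview's single-mismatch allowance and its own block-selection order, and it takes a precomputed symmetric matrix of |D′| values as input.

```python
def d_prime(p_ab, p_a, p_b):
    """|D'| for alleles A and B given haplotype frequency p_ab and allele frequencies."""
    d = p_ab - p_a * p_b
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max


def is_solid_spine(dp, i, j, threshold):
    """True if markers i and j are each in strong LD with every marker from i to j."""
    first_ok = all(dp[i][k] >= threshold for k in range(i + 1, j + 1))
    last_ok = all(dp[k][j] >= threshold for k in range(i, j))
    return first_ok and last_ok


def solid_spine_blocks(dp, threshold=0.7):
    """Greedy left-to-right partition of markers into solid-spine blocks.

    dp: symmetric matrix of |D'| values; returns (first, last) marker indices.
    """
    n = len(dp)
    blocks, i = [], 0
    while i < n:
        ends = [j for j in range(i + 1, n) if is_solid_spine(dp, i, j, threshold)]
        if ends:
            last = max(ends)
            blocks.append((i, last))
            i = last + 1
        else:
            i += 1
    return blocks


if __name__ == "__main__":
    # Hypothetical |D'| matrix for five markers.
    dp = [
        [1.0, 0.9, 0.8, 0.2, 0.1],
        [0.9, 1.0, 0.95, 0.3, 0.2],
        [0.8, 0.95, 1.0, 0.1, 0.1],
        [0.2, 0.3, 0.1, 1.0, 0.85],
        [0.1, 0.2, 0.1, 0.85, 1.0],
    ]
    print(solid_spine_blocks(dp))
```

Because only the two end markers must be in strong LD with the interior markers, a block can contain intermediate marker pairs in weak LD with one another, matching the definition in the methods.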
Bi-directional direct re-sequencing of ~35kb of genomic DNA (chromosome 16, 24155960-24190949; accession number GI 51511732, NCBI Nucleotide Database) encompassing SNPs 1-9 was performed in 24 affected individuals. Cases were chosen from families compatible with linkage to the CACNG3 locus and included individuals whose haplotypes demonstrated the most significant disease association. This re-sequencing work was performed by Polymorphic DNA Technologies Inc., using standard Sanger dideoxy sequencing protocols. The potential functional effect of all identified variants was assessed by searching for predicted regulatory motifs contained within the TransFac and Biobase databases via the Softberry NSITE portal. This website also hosts the FPROM program, which predicts the position of potential promoters and enhancers. GeneSplicer30 was used to predict whether any variant might affect splicing of the gene by identifying and scoring exon-intron boundaries. ESEfinder31 was used to predict the presence of any exonic splicing enhancers in exon 1. This program identifies putative binding sites for four SR proteins thought to be involved in the control of splicing. Prediction is based on a scoring system derived from weighted matrices for each motif consensus sequence; a motif is recognised when its score reaches a set threshold. The default values suggested by the program authors were used throughout. Standard BLAST analyses were performed to check for sequence conservation between species.
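Motif scanners of the kind described (weighted matrices plus a score threshold) can be illustrated with a short Python sketch. The three-position matrix and the threshold below are invented for the example and are not ESEfinder's matrices or default thresholds; the point is only to show how a single-nucleotide change can move a window's score across the threshold, creating or abolishing a predicted site.

```python
# Hypothetical 3-position weight matrix with integer weights that favours the
# toy motif "CAG"; it is NOT an ESEfinder matrix.
TOY_MATRIX = [
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]


def scan_pwm(seq, matrix, threshold):
    """Return (position, score) for every window whose summed weight reaches threshold."""
    seq = seq.upper()
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(matrix[k][seq[i + k]] for k in range(width))
        if score >= threshold:
            hits.append((i, score))
    return hits


def motif_change(ref_seq, alt_seq, matrix, threshold):
    """Compare predicted sites between reference and variant sequences."""
    ref_hits = set(scan_pwm(ref_seq, matrix, threshold))
    alt_hits = set(scan_pwm(alt_seq, matrix, threshold))
    return {"lost": sorted(ref_hits - alt_hits), "gained": sorted(alt_hits - ref_hits)}


if __name__ == "__main__":
    ref = "TTCAGTT"
    alt = "TTCATTT"  # hypothetical G>T change at position 4
    print(motif_change(ref, alt, TOY_MATRIX, threshold=5))
```

Here the variant lowers the only qualifying window below the threshold, so the predicted site is lost; real tools apply the same logic with empirically derived matrices and thresholds.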
The maximum HLOD score was 3.54 (α=0.62) located 0.15cM upstream of the distal marker, UCL10321. The non-parametric analysis is also statistically significant: maximum NPL statistic of 2.87 (p<0.002) occurring at UCL10321 (Figure 1).
Bi-directional re-sequencing of the coding exons and surrounding intron-exon boundaries in 59 cases identified 34 variants: four were upstream of CACNG3; six in the 5′UTR; five in intron 1; five in intron 2; nine in intron 3; one synonymous SNP in exon 4 (A2121G, Pro307Pro); two in the 3′UTR; and two downstream of CACNG3.
Analysis of LD based on the whole resource identified five LD ‘blocks’ (Figure 2). The LD block structure predicted from the HapMap project genotyped SNPs (based on CEPH Caucasian data only) identified 11 blocks of LD across the same region (see supplementary data).
Three SNPs showed significant transmission disequilibrium (p≤0.01) with at least one of the test statistics: SNP3, SNP7 and SNP8 (Table 2). SNP3 is located approximately 2kb upstream of CACNG3, whilst SNPs 7 and 8 are both located in intron 1. All three SNPs are in the first block of LD (Figure 2).
Block-based haplotype association analysis was performed on the entire data set using the PDT. No single complete haplotype within a block was sufficiently common to allow demonstration of disease association on the global level; however, if a ‘sliding window’ approach was used on each block, associated haplotypes were identified. Using this approach there are 13 haplotypes in Block 1, composed of combinations of SNPs 2-8, which demonstrate overtransmission and disease association (p≤0.05; Table 3). The individual haplotypes which are overtransmitted within each window together form a larger haplotype composed of the alleles 2211122. This haplotype has a frequency of 26.4% in our parental population.
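The sliding-window analysis, together with the recoding of each window's haplotypes into a single multi-allelic locus described in the methods, can be sketched in Python as follows. Haplotypes are assumed to be already phased strings of allele codes such as "2211122" for SNPs 2-8; the window widths and example haplotypes are hypothetical.

```python
def sliding_windows(n_snps, width):
    """(start, end) index pairs, end exclusive, for every run of `width` adjacent SNPs."""
    return [(s, s + width) for s in range(n_snps - width + 1)]


def recode_window(haplotypes, start, end):
    """Assign each distinct sub-haplotype in the window an integer 'allele' code.

    Returns the per-chromosome codes and the code table, so the window can be
    analysed like a single multi-allelic locus.
    """
    table = {}
    codes = []
    for hap in haplotypes:
        sub = hap[start:end]
        table.setdefault(sub, len(table) + 1)
        codes.append(table[sub])
    return codes, table


if __name__ == "__main__":
    # Hypothetical phased haplotypes over SNPs 2-8 (allele codes 1/2).
    haps = ["2211122", "2211122", "1122211", "2212122"]
    for width in (2, 3):
        for start, end in sliding_windows(len(haps[0]), width):
            codes, table = recode_window(haps, start, end)
            print(f"SNPs {start + 2}-{end + 1}: {codes} {table}")
```

For seven SNPs, windows of two to seven markers give 21 sub-haplotype loci in total; a common extended haplotype such as 2211122 can then be read off as the overlap of the overtransmitted window haplotypes.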
The sliding window approach also produces some significant results in Block 2 which runs from SNPs 10-13, although these data are not as significant as for Block 1 (see supplementary data for details).
Intra-familial association analysis suggested that any functional variant underlying the observed transmission disequilibrium was most likely to be found between SNPs 1 and 9. Consequently, re-sequencing of the ~35kb of genomic sequence in this region was undertaken. Of the 48 chromosomes from 24 affecteds that were sequenced, 19 were of the most common haplotype, 2211122, which also shows the greatest evidence for disease association. The remaining 25 chromosomes which were sequenced were composed of a variety of different haplotypes. A total of 72 sequence variants were identified, including the 9 previously typed (Figure 3; full details can be found in the supplementary information). Forty-five of these are within 20kb upstream of the gene, two in the 5′UTR and the remaining 25 are in intron 1.
An initial assessment of which identified variants were most likely to be causal was based on whether the minor allele frequency (MAF) in the 24 sequenced cases differed from that quoted in the NCBI database (where that information was available). Variants for which this appeared to be the case were typed in our entire resource so that intrafamilial association analysis could be performed. Three variants (rs392728, rs11860647 and rs8048987) were genotyped across the resource for this reason. However, intrafamilial association analysis with the PDT did not provide any evidence for preferential transmission of either allele (data not shown).
Bioinformatics tools were also used to ascertain which of these 72 variants might be functional. Those considered most likely to have a functional effect are summarised in Table 4 (see supplementary information for full details). Of these, rs2021512 and rs1494550 are conserved at the nucleotide level in the chimpanzee (see supplementary data). rs11646957 was typed in our resource of pedigrees and trios and intrafamilial association analysis was performed; the results were not significant (data not shown). Intrafamilial association analysis had already been performed on rs1494550 and n20, as they are SNPs 5 and 6 of the original 23 typed. Neither demonstrated any disease association in these analyses (see supplementary information). However, rs2021512 did demonstrate significant transmission disequilibrium (SUM PDT χ2(1)=7.91, p=0.005; AVE PDT χ2(1)=4.90, p=0.027), with the reference allele being overtransmitted to cases (457 transmitted:422 not transmitted).
Optimal strategies for the genetic analysis of ‘complex’ traits such as childhood absence epilepsy have been much discussed in the recent literature and the potential problems are well recognised. These include uncertainties surrounding the delineation of a categorical phenotype and the unpredictable relationship between phenotype and genotype exemplified by current ignorance of the genetic architecture of most ‘complex’ traits, and the recognition that even epilepsy phenotypes demonstrating simple Mendelian inheritance display immense heterogeneity and variable expressivity. All these uncertainties preclude confident predictions about the likely power and validity of any strategy selected, but these data arise from approaches which have some reasonable degree of support and expectation of success given certain assumptions.
Childhood absence epilepsy is a fairly well defined and homogeneous phenotype with an electrophysiological hallmark and clear evidence of a genetic aetiology. Although there is some debate as to whether a narrower phenotype definition should be adopted – in part to allow clearer prognostic predictions – the clinical criteria adopted here provide a reasonable expectation that the patients ascertained represent a homogeneous clinical phenotype. It is known that a variety of IGE phenotypes may cluster in families with a proband with absence epilepsy, but analysis reveals an increased clustering of CAE and JAE2,3, suggesting that they may share susceptibility loci. For this reason the minority of pedigrees in which first degree relatives of a proband with CAE had a diagnosis of JAE were included, and such individuals were categorised as affected.
A further advantage of this phenotype for genetic analysis lies in the existing level of understanding of the molecular neurophysiological basis of the ‘spike-wave’ seizures which are their hallmark1. A substantial body of evidence implicates voltage-gated calcium channel genes in the aetiology of spike-wave seizures in rodents and absence seizures in humans. In particular, the stargazer phenotype arises from mutations in Cacng2, one of a family of so-called gamma-subunit genes which have been further defined as a family of transmembrane AMPA receptor regulatory proteins (TARPs)10 that mediate surface expression of AMPA receptors. Preliminary analysis in a limited family resource provided support for CACNG3 as a CAE susceptibility locus. It is noteworthy that the expression pattern of γ3 is specific to the cortex and hippocampus with low levels in the cerebellum, consistent with a role in epileptogenesis. A candidate gene approach therefore seems justified.
The relative merits of linkage and association in the analysis of a complex trait depend on the genetic architecture of the trait, which cannot of course be known in advance. Linkage can detect a locus of moderate effect in a set of small nuclear pedigrees or sib-pairs provided the proportion of linked families is adequate. It is robust to allelic heterogeneity. Association has more power to detect loci of small effect, but depends on allelic homogeneity and on allele frequencies conferring adequate power. Both parametric and non-parametric linkage analysis provided significant evidence for linkage, indicating that CACNG3 is a susceptibility locus for CAE in a subset of the 65 nuclear pedigrees analysed. Sequencing of coding regions did not, however, identify any plausible causal sequence variants.
The role of CACNG3 was therefore further analysed by intra-familial association analysis using the indirect approach based on genotyping of a set of common single nucleotide polymorphisms spanning the CACNG3 gene. The pattern of LD across CACNG3 was established and confirmed by HapMap data. A number of assumptions underlie this strategy and restrict its power to identify causal sequence variants. It is assumed that the actual SNPs typed are not causal but are in sufficiently tight LD with causal SNPs of adequately matching allele frequency to permit their detection. Power diminishes rapidly if these conditions are not fulfilled32. It is also assumed that any causal SNPs will be within the associated interval, although in practice the interval is difficult to define33.
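The dependence of indirect association on LD and allele-frequency matching can be illustrated with the pairwise r² between a typed marker and a causal variant. Under a common approximation (assumed here, not taken from the cited work), a typed marker needs roughly 1/r² times the sample size that would be needed to detect the causal variant directly, and r² is bounded by the two allele frequencies, so poorly matched frequencies cap the achievable LD even when D′ = 1. The function names and example frequencies are hypothetical.

```python
def r_squared(p_ab, p_a, p_b):
    """Pairwise r^2 from haplotype frequency p_ab and allele frequencies p_a, p_b."""
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def max_r_squared(p_a, p_b):
    """Upper bound on r^2 for positively associated alleles (D > 0) of frequencies p_a, p_b."""
    lo, hi = sorted((p_a, p_b))
    return (lo * (1 - hi)) / ((1 - lo) * hi)


def sample_size_inflation(r2):
    """Approximate factor by which the sample must grow when testing a marker instead of the causal variant."""
    return 1.0 / r2


if __name__ == "__main__":
    # Hypothetical: a causal allele at 5% tagged by a marker allele at 25%.
    bound = max_r_squared(0.05, 0.25)
    print(f"max r^2 = {bound:.3f}; ~{sample_size_inflation(bound):.1f}x more samples needed")
```

With matched frequencies the bound reaches 1, whereas a 5% causal allele tagged by a 25% marker allele can never exceed an r² of about 0.16 under these assumptions, which is one reason power falls off so quickly.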
Three SNPs in the first ‘block’ of LD (SNPs 3, 7 and 8) showed significant transmission disequilibrium using the pedigree disequilibrium test in the entire patient resource. Using a ‘sliding window’ approach, 13 haplotypes comprising SNPs 2-8 within this ‘block’ demonstrated transmission disequilibrium. Together these form an extended haplotype composed of the alleles 2211122. Only two of these SNPs (SNPs 5 and 6) could be implicated on functional grounds. In each case the minor allele is predicted to lead to creation of a splice acceptor site. However, neither SNP showed significant transmission disequilibrium either alone or in combination. A correction for multiple testing was not applied to these analyses: although methods for calculating the effective number of independent tests have been developed34, their validity in the presence of haplotype block structure has been questioned35 and these methods are not yet established tools. The association evidence described must therefore be considered tentative and requires independent replication.
These observations suggested that causal variants underlying the observed transmission disequilibrium might lie within the genomic region between SNP1 and SNP9 and were most likely to be found on chromosomes of the haplotype 2211122. Re-sequencing of 35kb of genomic sequence in 48 chromosomes identified a total of 72 sequence variants. Evaluation of these variants encompassed the use of bioinformatics tools to determine any potential functionality and genotyping in the entire resource to investigate disease association. Four SNPs were predicted to have some functional effect by disrupting exonic splicing enhancer binding motifs or creating novel acceptor sites. One of these, rs2021512, demonstrated significant transmission disequilibrium with the reference sequence allele (G) being overtransmitted in preference to the variant allele (A). This suggests that the variant form is protective. Analysis indicated that the variant allele could potentially create an acceptor splice site although it is unclear how this might affect the function of the protein as rs2021512 is non-genic and approximately 14kb upstream of CACNG3. However, it is possible that this SNP has a subtle regulatory effect which was not identified with the bioinformatics used. Indeed, a paper earlier this year demonstrated that a non-genic variant can have a gain-of-function effect on another gene by creating a new transcriptional promoter36. This is not necessarily what is occurring in this situation but it is clear that variants some distance from a gene can still exert a powerful effect on them. Furthermore, it is still possible that rs2021512 is not a causal variant but is in LD with an unidentified causal variant.
It is possible that the linkage observed is spurious and CACNG3 is not a susceptibility locus for the CAE trait. A false positive result is of course feasible even with the fairly stringent threshold for significance utilised. The transmission disequilibrium observed could be a false positive result, although not due to population substructure. Alternatively the observed association is real, but driven by causal variants outside the sequenced region or too infrequent and heterogeneous to be detected in the limited number of chromosomes sequenced. It has been demonstrated that long range LD can exist generating ‘genetically indistinguishable SNPs’ (giSNPs) which are many kilobases apart37. The power to detect a homogeneous causal variant with a population frequency of 5% is approximately 92% when 48 chromosomes are sequenced but of course a heterogeneous collection of low frequency variants might go undetected. Finally it is possible that the observed SNPs demonstrating transmission disequilibrium have functional consequences which are not apparent.
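The detection probability quoted above can be approximated by assuming that each of the n sequenced chromosomes independently carries a variant of population frequency f, so that at least one copy is seen with probability 1 − (1 − f)^n. The Python sketch below uses that simple binomial assumption; for f = 0.05 and n = 48 it gives about 0.91, close to the approximate figure in the text, and it shows how quickly power falls for rarer variants.

```python
def detection_power(freq, n_chromosomes):
    """P(at least one of n independently sampled chromosomes carries the variant).

    Simple binomial assumption: each chromosome carries the variant with
    probability `freq`, independently of the others.
    """
    return 1.0 - (1.0 - freq) ** n_chromosomes


if __name__ == "__main__":
    for f in (0.05, 0.02, 0.01):
        print(f"freq {f:.2f}: power with 48 chromosomes = {detection_power(f, 48):.3f}")
```

Under this model a variant carried on 1% of chromosomes would be seen in fewer than half of such sequencing experiments, which is why a heterogeneous set of rare variants could be missed.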
In conclusion, these observations provide genetic evidence that CACNG3 is a susceptibility locus for childhood absence epilepsy. Common variants showing transmission disequilibrium have been identified. Definitive evidence to confirm or exclude this locus will require re-sequencing across an extended genomic region encompassing CACNG3 in a larger number of patients. Replication studies in similar resources of CAE patients would demonstrate whether rs2021512 is associated in other patient groups, and functional work to establish what the exact biological mechanism could be is needed.
This work was supported by the MRC (UK), Wellcome Trust, Action Medical Research and Epilepsy Research Foundation. We are very grateful to the families for participating in this study and to all our 142 collaborating clinicians, including Dr Lina Nashef. We would like to thank Généthon for their assistance in collecting the French samples and Richard Sharp for his technical help. Austrian financial support came from the Austrian Research Foundation (awarded to Harald Aschauer, MD), grant number P10460-MED. Thomas Sander, MD, was awarded grants by the Deutsche Forschungsgemeinschaft (Sa434/3-1) and the German National Genome Research Network (01GS0479). Dutch financial support came from the Netherlands Organisation for Health, Research and Development (ZonMW, 940-33-030) and the Dutch National Epilepsy Fund – ‘The power of the small’ (NEF – ‘De macht van het kleine’). Danish support (Mogens Friis, MD, and Marianne Kjeldsen, MD) came from NINDS grant NS-31564.
The URLs for data presented herein are as follows: