Optimal strategies for the genetic analysis of ‘complex’ traits such as childhood absence epilepsy have been much discussed in the recent literature, and the potential problems are well recognised. These include uncertainty over how to delineate a categorical phenotype; the unpredictable relationship between phenotype and genotype, exemplified by current ignorance of the genetic architecture of most ‘complex’ traits; and the recognition that even epilepsy phenotypes demonstrating simple Mendelian inheritance display immense heterogeneity and variable expressivity. These uncertainties preclude confident predictions about the likely power and validity of any strategy selected, but the data presented here arise from approaches which have a reasonable degree of support and expectation of success given certain assumptions.
Childhood absence epilepsy is a fairly well defined and homogeneous phenotype with an electrophysiological hallmark and clear evidence of a genetic aetiology. Although there is some debate as to whether a narrower phenotype definition should be adopted (in part to allow clearer prognostic predictions), the clinical criteria adopted here provide a reasonable expectation that the patients ascertained represent a homogeneous clinical phenotype. It is known that a variety of IGE phenotypes may cluster in families with a proband with absence epilepsy, but analysis reveals an increased clustering of CAE and JAE [2,3], suggesting that they may share susceptibility loci. For this reason the minority of pedigrees in which first degree relatives of a proband with CAE had a diagnosis of JAE were included, and such individuals were categorised as affected.
A further advantage of this phenotype for genetic analysis lies in the existing level of understanding of the molecular neurophysiological basis of the ‘spike-wave’ seizures which are its hallmark [1]. A substantial body of evidence implicates voltage-gated calcium channel genes in the aetiology of spike-wave seizures in rodents and absence seizures in humans. In particular, the stargazer phenotype arises from mutations in Cacng2, one of a family of so-called gamma-subunit genes which have been further defined as a family of transmembrane AMPA receptor regulatory proteins (TARPs) [10] that mediate surface expression of AMPA receptors. Preliminary analysis in a limited family resource provided support for CACNG3 as a CAE susceptibility locus. It is noteworthy that expression of γ3 is largely confined to the cortex and hippocampus, with low levels in the cerebellum, consistent with a role in epileptogenesis. A candidate gene approach therefore seems justified.
The relative merits of linkage and association in the analysis of a complex trait depend on the genetic architecture of the trait, which cannot of course be known in advance. Linkage can detect a locus of moderate effect in a set of small nuclear pedigrees or sib-pairs provided the proportion of linked families is adequate, and it is robust to allelic heterogeneity. Association has more power to detect loci of small effect, but depends on allelic homogeneity and on allele frequencies that confer adequate power. Both parametric and non-parametric linkage analysis provided significant evidence for linkage, indicating that CACNG3 is a susceptibility locus for CAE in a subset of the 65 nuclear pedigrees analysed. Sequencing of coding regions did not, however, identify any plausible causal sequence variants.
The role of CACNG3 was therefore further analysed by intra-familial association analysis using the indirect approach, based on genotyping a set of common single nucleotide polymorphisms spanning the CACNG3 gene. The pattern of LD across CACNG3 was established and confirmed by HapMap data. A number of assumptions underlie this strategy and restrict its power to identify causal sequence variants. It is assumed that the SNPs actually typed are not causal but are in sufficiently tight LD with causal SNPs of adequately matching allele frequency to permit their detection. Power diminishes rapidly if these conditions are not fulfilled [32]. It is also assumed that any causal SNPs will lie within the associated interval, although in practice the interval is difficult to define [33].
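The dependence of the indirect approach on both tight LD and matching allele frequencies can be illustrated with the standard pairwise LD measures D, D' and r². The following is a minimal sketch only: the function name and the haplotype and allele frequencies are hypothetical and are not taken from the CACNG3 data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D_prime, r2) for two biallelic SNPs.

    p_ab : frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a  : frequency of allele A at SNP 1
    p_b  : frequency of allele B at SNP 2
    All values are hypothetical inputs supplied by the caller.
    """
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b
    # D' rescales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between the alleles at the two SNPs.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical causal allele (frequency 0.2) always found on the marker allele.
matched = ld_stats(p_ab=0.2, p_a=0.2, p_b=0.2)     # marker frequency also 0.2
mismatched = ld_stats(p_ab=0.2, p_a=0.2, p_b=0.5)  # marker frequency 0.5
print("matched:    D'=%.2f r2=%.2f" % (matched[1], matched[2]))
print("mismatched: D'=%.2f r2=%.2f" % (mismatched[1], mismatched[2]))
```

In both hypothetical cases D' is 1, but r² falls from 1 to 0.25 when the marker allele is more common than the causal allele, which is one way of seeing why power falls off when allele frequencies are poorly matched.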
Three SNPs in the first ‘block’ of LD (SNPs 3, 7 and 8) showed significant transmission disequilibrium using the pedigree disequilibrium test in the entire patient resource. Using a ‘sliding window’ approach, 13 haplotypes comprising SNPs 2-8 within this ‘block’ demonstrated transmission disequilibrium. Together these form an extended haplotype composed of the alleles 2211122. Only two of these SNPs (SNPs 5 and 6) could be implicated on functional grounds: in each case the minor allele is predicted to create a splice acceptor site. However, neither SNP showed significant transmission disequilibrium, either alone or in combination. A correction for multiple testing was not applied to these analyses because, although methods for calculating the effective number of independent tests have been developed [34], their validity in the presence of haplotype block structure has been questioned [35] and these methods are not yet established tools. The association evidence described must therefore be considered tentative and requires independent replication.
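For readers unfamiliar with transmission disequilibrium, the idea can be sketched with the basic transmission disequilibrium test (TDT) for parent-offspring trios. This is deliberately the simpler trio statistic, not the pedigree disequilibrium test used in the analysis above, which extends the same idea to general pedigrees. The function name and the transmission counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Basic TDT for one allele, counted in heterozygous parents only.

    transmitted   : times the allele was passed to an affected child
    untransmitted : times the other allele was passed instead
    Returns (chi2, p_value); chi2 has 1 degree of freedom.
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    # McNemar-type statistic: under no linkage or association,
    # each allele is transmitted with probability 1/2.
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times.
chi2, p = tdt(60, 40)
print("chi2 = %.2f, p = %.4f" % (chi2, p))
```

Because only transmissions within families are compared, a statistic of this kind is not inflated by population substructure, which is the main attraction of intra-familial designs.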
These observations suggested that causal variants underlying the observed transmission disequilibrium might lie within the genomic region between SNP1 and SNP9 and were most likely to be found on chromosomes carrying the haplotype 2211122. Re-sequencing of 35 kb of genomic sequence in 48 chromosomes identified a total of 72 sequence variants. These variants were evaluated using bioinformatics tools to predict any potential functionality, and by genotyping in the entire resource to investigate disease association. Four SNPs were predicted to have some functional effect by disrupting exonic splicing enhancer binding motifs or creating novel acceptor sites. One of these, rs2021512, demonstrated significant transmission disequilibrium, with the reference sequence allele (G) being overtransmitted in preference to the variant allele (A). This suggests that the variant form is protective. Analysis indicated that the variant allele could potentially create an acceptor splice site, although it is unclear how this might affect the function of the protein as rs2021512 is non-genic and approximately 14 kb upstream of CACNG3. However, it is possible that this SNP has a subtle regulatory effect which was not identified by the bioinformatics tools used. Indeed, a paper earlier this year demonstrated that a non-genic variant can have a gain-of-function effect on another gene by creating a new transcriptional promoter [36]. This is not necessarily what is occurring in this situation, but it is clear that a variant some distance from a gene can still exert a powerful effect on it. Furthermore, it is still possible that rs2021512 is not a causal variant but is in LD with an unidentified causal variant.
It is possible that the linkage observed is spurious and CACNG3 is not a susceptibility locus for the CAE trait. A false positive result is of course feasible even with the fairly stringent threshold for significance utilised. The transmission disequilibrium observed could also be a false positive result, although not one due to population substructure, since the association analysis was intra-familial. Alternatively, the observed association is real but driven by causal variants that lie outside the sequenced region or are too infrequent and heterogeneous to be detected in the limited number of chromosomes sequenced. It has been demonstrated that long range LD can exist, generating ‘genetically indistinguishable SNPs’ (giSNPs) which are many kilobases apart [37]. The power to detect a homogeneous causal variant with a population frequency of 5% is approximately 92% when 48 chromosomes are sequenced, but of course a heterogeneous collection of low frequency variants might go undetected. Finally, it is possible that the observed SNPs demonstrating transmission disequilibrium have functional consequences which are not apparent.
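The detection power quoted above can be approximated with a simple calculation: the probability that a variant of frequency f appears at least once among n chromosomes is 1 - (1 - f)^n. The sketch below assumes the sequenced chromosomes are an independent random sample from the population, which is a simplification (chromosomes selected for a particular haplotype, or drawn from relatives, are not independent), and the function name is hypothetical. It is not necessarily the calculation used to obtain the figure in the text.

```python
def detection_probability(allele_freq, n_chromosomes):
    """Probability of seeing a variant at least once among sampled chromosomes.

    Assumes each chromosome independently carries the variant with
    probability allele_freq (a simplifying assumption).
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must lie between 0 and 1")
    if n_chromosomes < 0:
        raise ValueError("n_chromosomes must be non-negative")
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


# A single homogeneous variant at 5% frequency, 48 chromosomes sequenced.
print("5%% variant: %.3f" % detection_probability(0.05, 48))
# One member of a heterogeneous set of rarer variants, here at 1% frequency.
print("1%% variant: %.3f" % detection_probability(0.01, 48))
```

Under these assumptions the 5% case gives roughly 0.915, in line with the approximately 92% quoted, while any individual 1% variant would be seen in fewer than 40% of such samples, illustrating how a heterogeneous collection of low frequency variants could go undetected.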
In conclusion, these observations provide genetic evidence that CACNG3 is a susceptibility locus for childhood absence epilepsy. Common variants showing transmission disequilibrium have been identified. Definitive evidence to confirm or exclude this locus will require re-sequencing across an extended genomic region encompassing CACNG3 in a larger number of patients. Replication studies in similar resources of CAE patients would show whether rs2021512 is associated in other patient groups, and functional work is needed to establish the exact biological mechanism.