5HTTLPR, the length polymorphism repeat in the promoter region of the serotonin transporter gene (5HTT, renamed SLC6A4), is one of the most studied polymorphisms for association with a range of psychiatric and personality phenotypes. However, the original 5HTTLPR assay is prone to bias toward short allele calling.
We designed new assays for the 5HTTLPR suitable for large scale genotyping projects and we genotyped 13 SNPs in a 38kb region around the 5HTTLPR including SNP rs25531, a polymorphism of the 5HTTLPR long allele. Association analysis was conducted for major depression and/or anxiety disorder in unrelated cases (N = 1161) and controls (N = 1051) identified through psychiatric interviews administered to a large population sample of Australian twin families. Participants had been scored for personality traits of neuroticism, extraversion and harm avoidance several years earlier (N ≥ 2643 unrelated individuals).
Using the linkage disequilibrium (LD) between markers we identified a two SNP haplotype proxy for 5HTTLPR; the CA haplotype of SNPs rs4251417 and rs2020934 is coupled with the short allele of 5HTTLPR (r2 = 0.72). We found evidence for association (p=0.0062, after accounting for multiple testing) for SLC6A4 SNPs rs6354 and rs2020936 (positioned in a different LD block about 15.5kb from 5HTTLPR) with anxiety and/or depression and neuroticism, with the strongest association for recurrent depression with onset in young adulthood (OR = 1.55, 95% CI 1.16–2.06).
The associated SNPs are in the same LD block as the VNTR STin2, for which association has previously been reported.
Serotonergic neurotransmission affects a wide range of behaviours including cognition and emotion (1, 2), and drugs targeting serotonin reuptake are clinically effective antidepressants (3). As a result, one of the most studied polymorphisms for association with a broad range of psychiatric and personality phenotypes is the length polymorphism repeat (LPR) in the promoter region of the serotonin transporter gene (5HTT, renamed SLC6A4). The 5HTTLPR polymorphism comprises a 43-base pair (4–8) insertion or deletion (long, ‘L’ with 16 repeat units, or short, ‘S’ with 14 repeat units, alleles respectively). The S allele (frequency in Caucasians ~0.45 (9)) reduces transcriptional efficiency, resulting in decreased SLC6A4 expression and function (10). Association studies and subsequent meta-analyses have produced conflicting results (Table 1) (11–15) on the association between the S allele and anxiety, depression and the personality trait neuroticism (a measure of emotional stability that is genetically correlated with both anxiety and depression (16–18)). Conflicting results have dogged candidate gene association studies for many complex disorders and are attributable to small sample sizes of both primary and replication studies, heterogeneous subject populations, association dependent on environmental conditions such as stressful life events (19), and differing instruments for assessment of phenotypic traits and statistical methods, e.g., (20, 21). However, an additional problem specific to 5HTTLPR relates to the genotyping assay, which has caused considerable bias towards S allele identification (22, 23). Furthermore, association may have been compromised by the presence of an A/G single-nucleotide polymorphism (SNP), rs25531, that lies within the L allele of 5HTTLPR (5, 8); the L allele with the rarer G allele of rs25531 (denoted LG) is functionally equivalent to the S allele, because this SNP alters the AP2 transcription factor binding site (5, 8).
The aim of this study was to investigate the association between the 5HTTLPR polymorphism including the rs25531 polymorphism and psychiatric and personality phenotypes in a large cohort. As part of our study, we designed new assays for the 5HTTLPR suitable for large scale genotyping projects and we genotyped tagging SNPs in a 38kb region around the 5HTTLPR (Figure 1A) to determine whether any SNP or combination of SNPs could be used as proxies for the difficult and time consuming 5HTTLPR assay. Our study design allows us to examine association in multiple traits within the same cohort and consistency across independent sub-cohorts. Within our study sample we can identify subsets of cases which are predicted to be genetically more homogenous. For example, the relative risk (RR) to first degree relatives is reported to be higher for recurrent early-onset depression (RR ≈ 4–5) (reviewed in (24)) compared to major depression (RR ≈ 2–3). Similarly, although estimates for RR are not available, anxiety co-morbid with depression is also considered to be a genetically more homogeneous group (reviewed in (25)). The optimum balance of sample size versus sample homogeneity cannot be predicted since the true genetic aetiology is unknown. However, the availability of a depth of phenotypic information allows us to investigate associations through case subsets.
All participants were adult twins and their families recruited through the Australian Twin Registry and provided informed consent under study protocols approved by the Queensland Institute of Medical Research and Washington University Human Research Ethics Committees. During the period 1988–1990 study participants from two twin birth cohorts (born 1890–1964, and 1964–1971, respectively) were mailed an extensive Health and Lifestyle Questionnaire (‘1989 Questionnaire Survey’(26, 27)). This included the shortened revised 48-item Eysenck Personality Questionnaire (EPQ) (28) and a short-form 54-item version of the Temperament and Character Inventory (26, 29), the Temperament Personality Questionnaire (TPQ). The EPQ measures four dimensions of personality including neuroticism and extraversion and the TPQ measures three dimensions of temperament including harm avoidance. Between 1992–1994 twins from the older cohort (N=5,995) (30) were interviewed by telephone using the SSAGA (Semi-Structured Assessment for the Genetics of Alcoholism) instrument, a comprehensive psychiatric interview designed to assess psychiatric disorders in adults (31) according to DSM-IIIR and subsequently updated to DSM-IV criteria (32) that was modified for use as a telephone survey instrument in Australia (SSAGA-OZ). Of these participants, N = 4,597 have subsequently provided a blood sample (or rarely a buccal or saliva sample) for genotyping (33). Over the period 1996–9, sibling pairs participating in the 1989 questionnaire survey that were either concordant or discordant for extreme EPQ neuroticism scores (one sibling in the top or bottom decile, the other sibling in the top or bottom quintile, and allowing inclusion of multiple siblings) were recruited (35–37). Participants (N = 2456) completed the shortened Composite International Diagnostic Interview (34) (CIDI), which provides DSM-IV (35) life-time diagnoses of depression (including recurrent major depression) and anxiety disorders. 
This extreme discordant and concordant (36) design is a cost efficient strategy for obtaining an informative data set for genetic studies (37). Of these participants 2213 provided DNA samples, plus 837 parents. Full details of the recruitment procedure for the study, including response rates and incidence of DSM-IV diagnoses for anxiety and depression related disorders, are given elsewhere (38–40). Finally, between 2001–2005 some participants from the earlier studies were reinterviewed using an adaptation of the SSAGA; see (41, 42) for details. Participants reported ancestry of all four grandparents and a small number with two or more grandparents with known non-European ancestry were excluded from the analysis. Selection of individuals used in the association analyses is described in the Association Analysis section below.
Genomic DNA was extracted from blood (or buccal) samples using standard protocols (43). Samples were plated in 384 well plates in two study sample sets comprising i) MZ and DZ twins from the 1992–2000 SSAGA interview studies and ii) participants and their parents from the CIDI interview. Some 764 DNA samples are included in both sets, providing an opportunity for extensive quality control checking.
The original assay (44) for the 5HTTLPR used PCR primers in the non-repetitive sequences that flanked the 16 repeat elements, which are each comprised of between 19 and 23 base pairs (bp) (10, 44) (Figure 1B). This assay proved less than ideal: the PCR is difficult because of the very high GC content and the long length of the PCR products. In pilot studies we found that these difficulties cause considerable bias towards S allele identification, since the L allele signal is weak so that heterozygotes are frequently mis-scored as S homozygotes (e.g., see Figure 1D of (44)). This observation has also been made by others, e.g., (22, 23). We improved on the assay by redesigning the PCR primers; full details are provided in Supplement 1. The final assay comprised a multiplexing scheme of PCR primers L3+R3 and L3+R4 located within the repeat elements (Figure 1B). This choice of primers provided two PCR products which are semi-independent (L3 is shared) and which could be analysed together on a gel. The primer products were pooled (one DNA sample from each primer product) and run on a gel together (twofold multiplexing). Further, fourfold multiplexing was achieved by loading a second pair of samples on to the same gel (Figure 1C). In operation, a 384-well plate of DNAs can be amplified for both products and pooled in a way to allow duplicate scoring for each individual on two gels. Our pilot experiments showed that this was an efficient and robust system to genotype this difficult polymorphism.
The genotyping assay for SNP rs25531 [A/G] is described in the Supplement 1. DNA samples were genotyped for 13 additional SNPs (see Figure 1) in the 38kb region surrounding the 5HTTLPR. Our motivation was firstly, to determine whether there is a SNP-based proxy for 5HTTLPR which would be easier to genotype in large sample sets, and secondly, to see if any SNPs in this region show stronger association with any of the phenotypic measures than 5HTTLPR. SNPs were selected to represent the linkage disequilibrium landscape of the region bounded by rs140700 (intron 6) and rs7214991 (promoter) and preference was given to SNPs used in association studies of autism (45, 46). All non-genotyped HapMap (build 35) SNPs with minor allele frequency > 0.1 were represented by single SNP genotyped proxies with r2 > 0.8. Three non-synonymous SNPs were also included: rs28914832 (isoleucine/valine exon 10, also known as SERT Ile425Val), rs6355 (glycine/alanine exon 3, also known as Gly56Ala) and rs1050565 (isoleucine/valine exon 12 of BLMH 13kb from 5HTTLPR).
In pilot experiments we attempted to genotype the 17 bp variable number tandem repeat (VNTR) of SLC6A4 (Figure 1A), known as STin2. However, genotyping results did not pass our quality control checks (including Mendelian error checking) and so STin2 was not genotyped on the full study sample; problems with the assay have been found by others e.g. (22).
Our study design allowed us to undertake quality control checks often not possible in association studies (see Supplement 1). As the rare LG alleles are functionally equivalent to S alleles (4, 8), we constructed a 5HTTLPR+rs25531 (Marker 12) variable with three levels used in the analysis: SS&SLG, SLA&LGLA and LALA. After exclusion of identifiable errors, genotyping call rates ranged from 97.6% to 99.2% for the 13 SNPs excluding rs25531, but were only 96.9% for 5HTTLPR and 95.4% for 5HTTLPR+rs25531; these latter rates are lower than normal for our laboratory, but nonetheless much improved over the original assay for 5HTTLPR. Tests of deviation from Hardy-Weinberg equilibrium (HWE) and estimates of linkage disequilibrium were calculated in Haploview (47) using the SSAGA sample (a population sample) with one individual randomly selected per family (n = 2341). Within Haploview, the Tagger (48) option was used to force exclusion of 5HTTLPR to see if any one, two or three marker combination could be used to predict 5HTTLPR genotype. Similarly, we used Tagger to predict rs25531 genotype from other genotyped SNPs in those with LL 5HTTLPR genotypes.
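The three-level 5HTTLPR+rs25531 variable can be illustrated with a minimal sketch. It assumes each individual's two alleles have already been resolved to one of S, LG or LA (that is, the rs25531 allele carried by each L allele is known). The function name and the handling of the rare LG/LG genotype, which the text does not list and which is placed here in the lowest class, are assumptions made for illustration.

```python
# Sketch only: S and LG are grouped as functionally equivalent alleles,
# as described in the text; LA is the remaining L allele.
S_LIKE = {"S", "LG"}
LA_ALLELE = "LA"

def functional_class(allele1: str, allele2: str) -> str:
    """Return the three-level 5HTTLPR+rs25531 class for one individual."""
    alleles = (allele1, allele2)
    for a in alleles:
        if a not in S_LIKE and a != LA_ALLELE:
            raise ValueError(f"unknown allele: {a!r}")
    # Count LA alleles: 0, 1 or 2 maps onto the three levels.
    n_la = sum(a == LA_ALLELE for a in alleles)
    # Assumption: LG/LG (zero LA alleles) falls in the first class.
    return {0: "SS&SLG", 1: "SLA&LGLA", 2: "LALA"}[n_la]

if __name__ == "__main__":
    for pair in [("S", "S"), ("S", "LG"), ("LG", "LA"), ("LA", "LA")]:
        print(pair, "->", functional_class(*pair))
```

Coding the variable as a count of LA alleles keeps it compatible with an additive genetic model.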
Sum scores of 12 item responses in each EPQ domain resulted in quantitative scores for neuroticism and extraversion. Harm avoidance comprised sum scores of 18 items from the TPQ. Scores for MZ twins were averaged. Sum scores were transformed using the averaged angular transformation (49). Scores used for analysis were standardized residuals (standardised separately for each sex) after regression of the transformed scores to remove effects of sex and age. This standardisation was conducted using the population sample from the full study sample (n > 20,000), not just those genotyped (see (40)). Association analysis for quantitative traits used one individual per family selecting the individual that deviated most from the population mean, and were conducted in PLINK (50); in order to account for potential bias in the ascertainment of samples for analysis, permutation p-values are reported.
Cases were selected as unrelated individuals with a DSM-IV (51) life-time diagnosis of major depression or an anxiety disorder. Early age of onset for depression was defined as those with self-report age of onset of less than 30 years old. Controls were selected as unrelated individuals, from families in which no siblings who completed the questionnaires (including those not supplying a DNA sample) received a diagnosis of either major depression or of an anxiety disorder from any of the surveys. Details of selection of individuals for analysis within families are provided in Supplement 1. SSAGA controls were only selected if their neuroticism scores were less than the population average. Neuroticism scores of CIDI controls reflected the ascertainment into the study (Table 3). Cases and controls were divided into two independent samples according to the interview instrument. Families who participated in both studies were allocated to the CIDI case-control sample, unless the only anxiety or depression diagnoses for the family were allocated from the SSAGA instrument (2.7% of those measured on both instruments). Thus, the CIDI and SSAGA case control data sets (Table 2) are independent samples from the same population. An additional 133 cases were identified by the SSAGA reinterview. Association analysis was conducted for cases with major depression disorder (Depression), any anxiety disorder (Anxiety), or Anxiety and/or Depression (if they qualify for a diagnosis of any anxiety disorder and/or major depression). Analyses were also conducted for the case subset groups thought to represent genetically more homogeneous groups: Anxiety with Depression (if they qualify for both diagnoses) and recurrent, early age of onset depression (Table 2). Following up on literature reports discussed in the Introduction, association analysis of all markers was also conducted for Panic/Agoraphobia and OCD. 
However, for the other specific anxiety disorders, follow-up analyses were only conducted for SNPs showing association in the primary analyses. Logistic regression association analysis was conducted using PLINK (50) under a model that assumes additivity of allelic effects on the log(risk) scale. Power calculations and details of the permutation test used to determine the empirical significance of the pattern of association observed across data sets and phenotypes are provided in Supplement 1.
The numbers of genotyped individuals for the two independent samples used in the association analysis are shown in Table 2. The ascertainment of the CIDI sample means that both the cases and controls are likely to be more extreme than those of the SSAGA sample, and this is reflected in the summary statistics based on EPQ neuroticism and harm avoidance (Table 3) collected several years earlier.
None of the markers showed evidence for Hardy-Weinberg disequilibrium. Figure 2 shows the linkage disequilibrium as both |D’| and r2 measures; although the |D’| values are high, the corresponding r2 values are lower reflecting differences in allele frequencies (52). The best prediction of 5HTTLPR is provided by rs4251417 and rs2020934 (Figure 2b), where the CA haplotype is coupled with the S allele and the CG haplotype is coupled with the L allele of 5HTTLPR (r2 = 0.72). No useful prediction could be made of 5HTTLPR-rs25531 or of rs25531 within LL individuals. Genotype data are available from the corresponding author on request.
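The contrast between high |D’| and lower r2 can be made concrete with a short sketch using the standard definitions of the two measures for a pair of biallelic loci. The function name and the example frequencies are illustrative assumptions, not values from this study.

```python
from typing import Tuple

def ld_measures(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of alleles A and B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    # D is scaled by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

if __name__ == "__main__":
    # Hypothetical example: a rare allele (0.1) found only on haplotypes
    # carrying a common allele (0.5) at the second locus.
    d_prime, r2 = ld_measures(p_ab=0.1, p_a=0.1, p_b=0.5)
    print(f"|D'| = {d_prime:.2f}, r2 = {r2:.2f}")
```

In this hypothetical case |D’| is 1 while r2 is only about 0.11, because r2 can approach 1 only when the two loci have similar allele frequencies.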
SNPs rs6354 and rs2020936 (markers 4 and 6) show association (p < 0.05) with all of the depression and anxiety phenotypes presented in Figure 1, with the T and A alleles of these SNPs, respectively, being more frequent in cases than controls (0.81 vs 0.78, odds ratio (OR) = 1.19 (1.03–1.38)). These SNPs are in high LD (r2 = 0.98) and so represent a single association signal. Although the associations for the two independent data sets are only significant for some analyses, the direction of the association is the same, so that the association in the combined data set is significant. The association with Depression is marginally significant, OR = 1.17 (1.01 – 1.26). However, the two subsets of cases thought to represent genetically more homogeneous groups (24, 25) both show higher odds ratios: recurrent, early onset depression (Figures 3e and 4), OR = 1.55 (1.16–2.06), and the co-morbid group of Anxiety with Depression (Figures 3d and 4), OR = 1.31 (1.02–1.69). Permutation analysis showed that this pattern of association occurred only 62 times in 10,000 permutations across all markers under the null hypothesis, resulting in an empirical p-value, which accounts for multiple testing, of 0.0062. Odds ratios of association with rs6354 for the different anxiety disorders are presented in Figure 4. These results must be interpreted recognising the constraints on possible diagnostic classes from the questionnaires. Necessarily, these results should be considered as hypothesis generating for future studies. The association p-values for rs6354 and rs2020936 with neuroticism were 0.019 and 0.015 respectively (Figure 3f), but there is no evidence for association of these SNPs with either harm avoidance or extraversion, which had phenotypic correlations of 0.66 and −0.25, respectively, with neuroticism. Association (p < 0.05) is seen for rs4251417 and harm avoidance. 
Haplotype analysis of combinations of SNPs revealed no associations more significant than those of the individual genotyped markers.
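The logic of an empirical p-value that accounts for testing multiple markers can be sketched as follows. This is a generic max-statistic label permutation on toy data, not the permutation procedure used in this study (which is described in Supplement 1); the function names, the test statistic and the data layout are assumptions made for illustration.

```python
import random

def max_stat(genotypes, labels):
    """Largest absolute case-control difference in mean allele count over markers."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    best = 0.0
    for m in range(len(genotypes[0])):
        case_mean = sum(g[m] for g in cases) / len(cases)
        ctrl_mean = sum(g[m] for g in controls) / len(controls)
        best = max(best, abs(case_mean - ctrl_mean))
    return best

def permutation_p(genotypes, labels, n_perm=1000, seed=0):
    """Fraction of label permutations whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = max_stat(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if max_stat(genotypes, shuffled) >= observed:
            hits += 1
    return hits / n_perm

if __name__ == "__main__":
    # The proportion reported in the text: 62 of 10,000 permutations.
    print(62 / 10_000)
```

Because each permutation takes the maximum over all markers, the resulting proportion is already adjusted for the number of markers tested.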
For the association analysis in the CIDI sample we observed association (p <0.05) for the 5HTTLPR (L allele), most significantly with the Depression cases. We interpret this as a chance result as the association is not replicated in the SSAGA sample for which any trend in association is with the S allele. A detailed investigation is provided in Supplement 1. We find no evidence for association of 5HTTLPR (with or without rs25531) with panic/agoraphobia, OCD, neuroticism, extraversion or harm avoidance (Figure 3 e and f). As a consequence, our results provide no evidence, either way, for the usefulness of genotyping of SNP rs25531 to subclassify the L allele. A full report of allele and genotype frequencies and p-values is available online (Supplement 2).
Preferential amplification of one allele over another is recognised as a systematic problem in PCR assays for variable repeat polymorphisms (22). In pilot studies using the standard 5HTTLPR assay (44) we found overamplification of the S allele, as also reported by others (22, 23). Yonan et al. (23) found that the assay was sensitive to the concentration of MgCl2, that Hardy-Weinberg equilibrium genotype frequencies could only be achieved at lower concentrations of MgCl2, and that a disease association with 5HTTLPR disappeared when the revised assay was used. Although genotyping errors could lead to false detection of association, consistent association with one allele detected across studies would require a mechanism by which a sub-optimal assay systematically biases genotypes of cases versus controls. Such a mechanism is hard to contemplate, but ascertainment, age and storage of DNA samples can often differ between case and control sample sets. Our concern with the 5HTTLPR assay cannot conclusively undermine the meta-analysis results of a large number of studies in which the odds ratio for S vs L alleles was estimated to be 1.12 from 24 studies (11), as it is possible that optimal genotyping conditions in each study had been achieved. However, it is noteworthy that many of the contributing studies had too few samples to detect anything but extreme deviation from HWE. Furthermore, potential problems with the 5HTTLPR assay are often overlooked in discussions of results of 5HTTLPR associations, e.g., (53).
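The point about sample size and HWE can be illustrated with a sketch of the standard one-degree-of-freedom chi-square test for deviation from HWE. The genotype counts below are hypothetical: they assume a locus with equal allele frequencies at which one in five heterozygotes is mis-scored as a homozygote. These figures are chosen for illustration and are not estimates from this study.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic (1 df) for deviation from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

CRITICAL_1DF_05 = 3.84  # chi-square critical value, 1 df, alpha = 0.05

if __name__ == "__main__":
    # True counts 25/50/25 per 100, with 10 heterozygotes per 100
    # mis-scored as homozygotes for the first allele (hypothetical).
    for counts in [(35, 40, 25), (350, 400, 250)]:
        chi2 = hwe_chi_square(*counts)
        print(counts, round(chi2, 2), chi2 > CRITICAL_1DF_05)
```

With identical genotype proportions the statistic scales with sample size, so the same mis-scoring rate goes undetected at n = 100 but is clearly detected at n = 1000.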
We have developed a new assay for 5HTTLPR based on two pairs of PCR primers which bind directly to the repeat sequences and which can be multiplexed into a single assay. Further, fourfold multiplexing was achieved by delayed double loading of sample replicates onto the same gel. In operation our protocol took only one hour to genotype 384 different DNA samples, providing replication of two assays for each sample. Genotyping of 13 SNPs within a 38kb region flanking 5HTTLPR has enabled us to examine linkage disequilibrium landscape of the region and identify a two SNP haplotype proxy for it; the CA haplotype of rs4251417 and rs2020934 is coupled with the short allele of 5HTTLPR (r2 = 0.72) (Figure 2). Ideally, a higher r2 is desirable for SNP proxies (usually an r2 threshold of 0.8 is set for selection of tagging SNPs), but with large sample sizes genotyping this two SNP proxy could usefully replace the time consuming 5HTTLPR assay in association studies, i.e. a sample size increased by a factor of 1/0.72 is required when genotyping the two SNP proxy to achieve the same power as direct genotyping of 5HTTLPR (54). Others (45, 46, 55) have genotyped SNPs in the 38kb region surrounding 5HTTLPR as part of association studies for autism or bipolar disorder. However, none of these studies (45) genotyped SNP rs4251417 whose minor T allele is useful in splitting the majority of L alleles that couple with the A allele of rs2020934. Despite the large number of association studies for depression and anxiety related traits that have genotyped 5HTTLPR none (to our knowledge) has genotyped more than a couple of additional SNPs in the region. Unfortunately, rs2020934 has not been genotyped as part of the HapMap project and has not been included on any of the genome-wide SNP platforms. rs4251417 is included on the Illumina 610K and 1M chips, but on its own it is not a useful proxy for 5HTTLPR (r2 = 0.06).
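The 1/r2 relationship quoted above can be worked through in a short sketch. The function name and the direct-genotyping sample size of 1000 are illustrative assumptions; the r2 values are those reported in the text.

```python
import math

def proxy_sample_size(n_direct: int, r2: float) -> int:
    """Sample size needed with a proxy marker to match the power of
    direct genotyping of n_direct individuals, using the 1/r^2 rule."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return math.ceil(n_direct / r2)

if __name__ == "__main__":
    # Two SNP haplotype proxy (r2 = 0.72) versus rs4251417 alone (r2 = 0.06).
    print(proxy_sample_size(1000, 0.72))
    print(proxy_sample_size(1000, 0.06))
```

For a hypothetical 1000 directly genotyped individuals, the two SNP proxy needs about 1389 individuals, whereas rs4251417 alone would need more than 16,000, which is why it is not a useful proxy on its own.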
We have conducted an association study of 5HTTLPR, including the rs25531 polymorphism found on the L allele, as well as an additional 13 SNPs in a 38 kb region surrounding the 5HTTLPR using a large population cohort rich in psychiatric and personality phenotypic information. Our cases and controls represent two independent study samples. The ascertainment for the sample completing the CIDI questionnaire is likely to have identified more extreme cases and controls and this is quantified through personality measures collected several years prior to administration of the psychiatric interviews (Table 3). Although we might expect to observe more extreme associations with the CIDI cohort, we would expect the direction of the association to be the same in the two cohorts if a true association exists. We found no consistent evidence of association with 5HTTLPR, nor with functional genotype classes of 5HTTLPR with SNP rs25531. However, we did find consistent evidence of association with SNPs rs6354 and rs2020936 in both independent samples. These two SNPs are in high LD (r2 = 0.95) and so represent a single association signal; the LD with 5HTTLPR for these SNPs is r2 = 0.01 and |D’| = 0.20. The association was upheld across multiple (correlated) phenotypic measures, with larger differences in allele frequencies occurring with more extreme, homogeneous phenotypes (although because of the smaller sample sizes, association p-values were not always smaller), e.g., association p-values with SNP rs6354 for Depression and/or Anxiety, co-morbid Anxiety with Depression, and recurrent, early onset depression are 0.021, 0.032 and 0.0027 respectively, reflecting control vs case frequencies of the T allele of 0.78 vs 0.81, 0.83 and 0.85, and generating allelic OR (95% CI) of 1.19 (1.03–1.38), 1.31 (1.02–1.69) and 1.55 (1.16–2.06) respectively. 
The higher OR for subsets of cases which are likely to be genetically more homogeneous provide support that this association is not a false positive. Indeed, permutation testing confirmed this, generating a significance of p = 0.0062, which includes correction for testing of multiple markers. The association was supported through analysis of the quantitative trait neuroticism with measures available on more than 2600 individuals collected several years prior to the diagnostic psychiatric interviews, p = 0.019 with rs6354, reflecting a standardized difference between homozygotes of d = 0.2, with a higher score associated with their T and A alleles respectively. There was no evidence for association with extraversion or harm avoidance with these SNPs, and within the anxiety disorders, no evidence was found for association with OCD, although the number of cases is low.
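The link between the T allele frequencies and the allelic odds ratios quoted above can be checked with a short sketch. It uses the rounded frequencies from the text, so its output only approximates the published ORs, which were obtained by logistic regression on the full data. The function name is an assumption made for illustration.

```python
def allelic_odds_ratio(p_case: float, p_control: float) -> float:
    """Odds ratio for carrying an allele, from case and control allele frequencies."""
    if not (0 < p_case < 1 and 0 < p_control < 1):
        raise ValueError("frequencies must be strictly between 0 and 1")
    odds_case = p_case / (1 - p_case)
    odds_control = p_control / (1 - p_control)
    return odds_case / odds_control

if __name__ == "__main__":
    control = 0.78  # T allele frequency in controls, as reported
    for label, case in [("Depression and/or Anxiety", 0.81),
                        ("Anxiety with Depression", 0.83),
                        ("Recurrent, early onset depression", 0.85)]:
        print(f"{label}: OR ~ {allelic_odds_ratio(case, control):.2f}")
```

The crude values (roughly 1.20, 1.38 and 1.60) rise across the three case definitions in the same order as the reported ORs of 1.19, 1.31 and 1.55; the differences reflect rounding of the frequencies and the regression-based estimation.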
SNPs rs6354 and rs2020936 are positioned in the 15.5kb region that separates the 5HTTLPR and STin2 (they are 1.3 and 2.2 kb from STin2 respectively). These SNPs are not in LD with 5HTTLPR (Figure 2). Using CEU genotypes downloaded from the HapMap (56) website and haplotype information (57) we deduce that rs6354 and rs2020936 are in the same haplotype block as STin2 but we are unable to deduce the likely coupling of alleles between these polymorphisms (see Supplement 1). As for 5HTTLPR, mixed association results have been reported for STin2. A meta-analysis of 8 association studies of STin2 and major depression(11), reported an OR of 1.33 for the rare allele 9 vs 12, but this failed to reach significance because of small sample size, low frequency of the risk allele and heterogeneity between studies. Their result for allele 10 vs 12 was not significant. However, a recent association study of panic and social anxiety disorder reported significant association with rs140701 (|D’|=1, r2 ≈ 0.2 with rs6354), and no association with 5HTTLPR (58). Furthermore, in a study of allelic expression of variants within SLC6A4 and its promoter, the most significant correlation was found with two SNPs in intron 1 (rs16965628 and rs2020933, r2 between them of 0.79) which are not in LD with 5HTTLPR. Taken together these studies provide evidence for a role for SLC6A4, but not necessarily 5HTTLPR, in the etiology of depression and anxiety disorder, although whether these results reflect single or multiple causal variants is unclear.
This research was supported by grants to Nicholas G. Martin from the Australia National Health and Medical Research Council (NHMRC; 941177, 971232, 339450 and 443011) and by grants to Michele L. Pergadia (DA019951), Andrew C Heath (AA07535, AA07728 & AA10249), Patrick F Sullivan (MH059160), Pamela AF Madden (DA12854) and Richard D Todd (AA13320). We would like to thank David Smyth for computer support and our laboratory staff, especially Megan Campbell, Anjali Henders and Leanne McNeil. Lastly, this research would not be possible without the willing co-operation of twins and their families who participate in the Australian Twin Registry studies.