Obsessive compulsive disorder (OCD) has a complex etiology involving both genetic and environmental factors. However, the genetic causes of OCD are largely unknown, despite the identification of several promising candidate genes and linkage regions.
Our objective was to conduct genetic linkage studies of the type of OCD thought to have the strongest genetic etiology (i.e., childhood-onset OCD), in 33 Caucasian families with ≥2 childhood-onset OCD-affected individuals from the United States (US) (N=245 individuals with genotype data). Parametric and non-parametric genome-wide linkage analyses were conducted with Morgan and Merlin in these families using a selected panel of single nucleotide polymorphisms (SNPs) from the Illumina 610-Quad BeadChip. The initial analyses were followed by fine-mapping analyses in genomic regions with initial heterogeneity LOD (HLOD) scores of ≥2.0.
We identified five areas of interest (HLOD score ≥2) on chromosomes 1p36, 2p14, 5q13, 6p25, and 10p13. The strongest result was on chromosome 1p36.33-p36.32 (HLOD=3.77, suggestive evidence for linkage after fine-mapping). At this location, several of the families showed haplotypes co-segregating with OCD.
The results of this study represent the strongest linkage finding for OCD in a primary analysis to date, and suggest that chromosome 1p36, and possibly several other genomic regions, may harbor susceptibility loci for OCD. Multiple brain-expressed genes lie under the primary linkage peak (approximately 4 Mb in size). Follow-up studies, including replication in additional samples and targeted sequencing of the areas of interest, are needed to confirm these findings and to identify specific OCD risk variants.
Obsessive compulsive disorder (OCD) [MIM 164230] is a common neuropsychiatric disorder consisting of repeated, distressing ego-dystonic thoughts (obsessions) and behaviors (compulsions) with a world-wide prevalence of 1–3%(1–4). Family, segregation, and twin studies clearly demonstrate that OCD is familial, with estimated heritabilities for obsessive-compulsive (OC) symptoms of 27% to 47% for adults and 45% to 65% for children(5–18). Genetic epidemiological studies also indicate that OCD with symptom onset before age 18 has a stronger genetic contribution than OCD with later onset, with a doubling of the OCD risk for first-degree family members in families of probands with childhood-onset compared to adult-onset of symptoms(11–18).
Although several promising genes and genomic regions of interest have been identified, clear susceptibility genes for OCD have not yet been demonstrated. Most studies have focused on candidate gene approaches, although four primary genome-wide linkage studies have also been conducted(15, 19–28). These studies have identified eleven genomic regions with LOD scores of ≥1.4 on chromosomes 1q, 3q, 6p, 6q, 7p, 9p, 10p, 11p, 14q, 15q, and 19q, most with a broad definition of the OCD phenotype(15, 19–28). The strongest linkage finding in a primary analysis of OCD reported to date was on chromosome 15q14 in three Costa Rican families (LOD score=3.13); this region was also previously identified in a Caucasian sample (25, 28). Other than 15q14 and the 9p region identified by Hanna et al, and subsequently examined as a targeted replication in a separate sample(26, 27), no linkage region has been identified in more than one study. Genome-wide association studies (GWAS) for OCD and Tourette Syndrome (TS), a related disorder, have recently been completed. However, in the context of genetic and environmental heterogeneity, multiple approaches are appropriate, and linkage studies continue to play an important role. While GWAS are useful for the identification of common variants with relatively small effect sizes, linkage studies of multiplex families are particularly useful for the identification of rare variants with larger effect sizes that are increasingly believed to underlie a substantial proportion of the risk for complex disorders(29). Individual variants identified via linkage approaches may be family specific, each accounting for a small proportion of the overall variance; however, the genes implicated by these variants will be of interest in multiple samples and populations, likely accounting for a much larger proportion of the overall OCD risk, in addition to providing insights about the biology of OCD.
The aim of this study was to search for genomic regions that potentially harbor OCD susceptibility genes using genome-wide linkage approaches in Caucasian families with childhood-onset OCD.
The sample consisted of 33 families (245 individuals with genotype data) ascertained for ongoing genetic studies of OCD in the US (Table 1). We included families for whom phenotype data were available for ≥2 OCD-affected individuals (broad or narrow phenotype) and who had ≥1 affected (narrow phenotype) and ≥1 unaffected individual with genotype data. Families ranged in size from 4 individuals in two generations to 58 individuals in four generations (examples in Figure 1). Families were ascertained via probands with DSM-IV OCD whose symptoms began before age 18 and who did not have a pervasive developmental disorder, bipolar disorder, schizophrenia, or a primary psychotic disorder. All families were of Caucasian ethnicity and European (primarily Northern European) descent. Families were ascertained and collected at the University of California, San Diego and subsequently the University of California, San Francisco (CAM), the University of Michigan (GLH), and the University of Minnesota (SWK). Genome-wide linkage analyses using a less dense set of microsatellite markers in 17 of the 18 Michigan families have been previously reported(24, 26). The study was approved by the Institutional Review Boards of the participating sites. After complete discussion of the study with the participants, written informed consent or assent was obtained; parental permission was also obtained for participants under age 18.
Clinical assessments at UCSF/UCSD and Minnesota were conducted by psychiatrists or PhD-level psychologists specializing in OCD and trained in the research instruments. Clinical assessments at Michigan were conducted by interviewers with at least a master’s degree plus clinical training who were trained to ≥90% diagnostic agreement with the assessment instruments. The primary assessment instruments for all sites included the adult and child versions of the Yale-Brown Obsessive Compulsive Scale (Y-BOCS and CY-BOCS, respectively) (UCSF/UCSD and Minnesota) or the Schedule for Tourette and Other Behavioral Syndromes (STOBS), which includes a modified version of the Y-BOCS (Michigan), complemented with the Diagnostic Interview for Genetics Studies (DIGS) (UCSF/UCSD) or the Structured Clinical Interview for DSM-IV Axis I diagnoses (SCID) (Michigan and Minnesota) for adults, and the Schedule for Affective Disorders and Schizophrenia for School-Age Children (KSADS) (all sites) (30–35) (Additional details in Supplement). Because phenotypic data were collected independently and at different time periods by the sites, clinical data collection could not be standardized prospectively. Instead, phenotypes were standardized at the diagnostic level, using a common phenotype matrix across all sites that included two OCD phenotypes, an unaffected phenotype, and an unknown phenotype (see below). Concordance between sites was achieved via a best estimate (BE) consensus approach(36). This approach, which uses all available sources of information (e.g., medical records, clinical interviews, self-report questionnaires, and family history interviews), requires 100% concordance on all elements of the diagnostic criteria, and reduces the phenotypic heterogeneity that may arise from the use of different assessment instruments (details in Supplement).
Two OCD diagnoses were assigned: narrow and broad OCD. A narrow OCD diagnosis was given if the individual met all DSM-IV criteria for OCD. The broad OCD diagnosis encompassed both DSM-IV OCD and subclinical OCD, which was considered present if the individual had clear obsessions and/or compulsions, but did not quite meet the impairment or distress criteria (e.g., OC symptoms taking less than an hour and causing mild rather than moderate to severe distress and/or impairment). The broad definition was designed to capture a robust phenotype that is likely to be etiologically related to OCD, but was not severe enough to meet strict DSM-IV criteria. Participants were considered unknown for both phenotypes if there was a history of thoughts or behaviors suggestive of OC symptoms that met most, but not all criteria for subclinical OCD, or if they were under age 40 and did not have OC symptoms. Individuals with subclinical OCD who were coded as affected for the broad analyses were coded as unknown in the narrow analyses. Participants with no history of any OC symptoms who were ≥40 years old at the time of the interview were classified as unaffected. The mean age of onset of OC symptoms was 8.7 years, and the mean lifetime worst-ever Y-BOCS/CY-BOCS severity score was 24.0 for the broad phenotype and 24.9 for the narrow phenotype (Table 2). Fifteen percent had a co-occurring chronic tic disorder (Tourette Syndrome, chronic motor or vocal tic disorder), and 4% had a co-occurring eating disorder.
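The coding rules above can be summarized as a small decision function. The following Python sketch is illustrative only: the symptom-level labels (`dsm_iv`, `subclinical`, `partial`, `none`) and the function name are hypothetical and were not part of the study's phenotype matrix; the age cutoff of 40 years is taken from the text.

```python
from enum import Enum


class Status(Enum):
    AFFECTED = "affected"
    UNAFFECTED = "unaffected"
    UNKNOWN = "unknown"


def code_phenotypes(symptom_level, age_at_interview):
    """Return (narrow, broad) affection status for one participant.

    symptom_level is a hypothetical label for the best-estimate diagnosis:
      "dsm_iv"      - meets all DSM-IV criteria for OCD
      "subclinical" - clear obsessions/compulsions, below impairment/distress criteria
      "partial"     - suggestive symptoms meeting most but not all subclinical criteria
      "none"        - no history of OC symptoms
    """
    if symptom_level == "dsm_iv":
        return Status.AFFECTED, Status.AFFECTED
    if symptom_level == "subclinical":
        # Affected for the broad analyses, unknown for the narrow analyses.
        return Status.UNKNOWN, Status.AFFECTED
    if symptom_level == "partial":
        return Status.UNKNOWN, Status.UNKNOWN
    if symptom_level == "none":
        # Only participants aged 40 or older are coded as unaffected.
        if age_at_interview >= 40:
            return Status.UNAFFECTED, Status.UNAFFECTED
        return Status.UNKNOWN, Status.UNKNOWN
    raise ValueError(f"unrecognized symptom level: {symptom_level!r}")
```

Under this sketch, a symptom-free 25-year-old is coded as unknown rather than unaffected for both phenotypes, which reflects the conservative treatment of individuals under age 40 described above.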
DNA extraction was performed from blood or immortalized lymphoblastoid cell lines according to standard procedures. A small number of individuals from the UCSF sample were genotyped using the Illumina Linkage Panel IVb at the UCSF Genome Core Facility (UCSF GCF). The rest were genotyped using the Illumina Human 610-Quad BeadChip at the Broad Institute (Massachusetts General Hospital). Data were analyzed for quality control and Mendel errors using GenomeStudio software (Illumina). A total of 540,123 SNPs were retained for analysis in the samples genotyped with the 610-Quad BeadChip. The samples that were genotyped on the Illumina Linkage Panel IVb had 2,157 markers that overlapped with the Human 610-Quad BeadChip; genotypes for the additional markers were coded as missing.
Pedigree relationships were confirmed prior to analysis using PREST and PLINK(37, 38). In two families, pedigree structures were altered to incorporate non-paternities that were identified through these assessments. Parametric and nonparametric linkage analyses were conducted using Morgan version 3.0 and Merlin (details in Supplement)(39, 40). We chose to use both Merlin and Morgan because of the size and complexity of the pedigrees combined with the number of genetic markers available. While Morgan can analyze very large pedigrees, the number of markers that can be analyzed is limited, and the markers must be in linkage equilibrium. In contrast, Merlin controls for the effects of linkage disequilibrium between markers, utilizing all available genotype information, but cannot use all individuals due to the size and complexity of the largest families. PedShrink was used to trim the pedigrees as needed for the Merlin analysis, with priority on trimming uninformative individuals(41).
We used a model-based (dominant and recessive) approach because simulation studies show that formulating a genetic model that approximates the true inheritance may have more power than nonparametric analyses, in part because parametric models can utilize information about unaffected individuals, which is not the case for nonparametric analyses(42). We also conducted non-parametric analyses (details in Supplement) because, for loci with high frequencies and low penetrances, non-parametric models may be more powerful. Because we had three different analytic approaches, each with different strengths, we were able to compare results across approaches, identifying and prioritizing those that were consistent across analyses as the most likely to represent true linkage regions.
The linkage parameters (see Supplement) were chosen to model a relatively rare locus with a large effect size and to reduce the risk of false positives due to phenocopies, given the high degree of bilineality that is seen in OCD families, including ours. We note that power to detect linkage is not sensitive to misspecification of penetrance or allele frequency, but instead is most sensitive to degree of dominance(43). Heterogeneity LOD (HLOD) scores were calculated by allowing the proportion of linked families to vary and estimating the proportion that gave the highest LOD scores for a given region.
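As an illustration of the HLOD calculation described above, the following Python sketch assumes the standard admixture formulation, in which a proportion α of families is linked and HLOD(α) = Σ log10(α·10^LOD_i + 1 − α), summed over families at a fixed map position. The exact formula and optimizer used by Morgan and Merlin are not given in the text, so the grid search and the function name `hlod` are assumptions made for illustration.

```python
import math


def hlod(family_lods, grid_steps=1000):
    """Return (maximum HLOD, estimated proportion of linked families).

    family_lods: per-family LOD scores at a single map position.
    A simple grid search over alpha in [0, 1] stands in for whatever
    optimizer a linkage package would actually use.
    """
    best_score, best_alpha = 0.0, 0.0  # alpha = 0 always gives HLOD = 0
    for step in range(grid_steps + 1):
        alpha = step / grid_steps
        score = sum(
            math.log10(alpha * 10.0 ** lod + (1.0 - alpha))
            for lod in family_lods
        )
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha
```

Because α = 0 yields a score of zero, the HLOD is never negative, whereas the summed LOD across all families (the α = 1 case) can be strongly negative when only a subset of families is linked.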
Fine-mapping using additional SNP markers from the Illumina 610-Quad BeadChip was conducted on chromosomal regions where the HLOD scores were ≥2.0 using the model and phenotype that showed the strongest evidence of linkage. All SNPs from the 610-Quad marker panel that were under the linkage peak of interest and had a minor allele frequency (MAF) of >0.25 were identified and used for the Merlin analyses; for the Morgan analyses, this marker set was then pruned for linkage disequilibrium so that only SNPs with a pairwise r² <0.1 were included.
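The two-stage marker selection described above (a MAF filter for the Merlin set, followed by LD pruning for the Morgan set) can be sketched as follows. This is a simplified illustration, not the procedure actually used: it assumes genotypes coded as 0, 1, or 2 copies of one allele (with `None` for missing), computes r² as the squared Pearson correlation of those allele counts, and prunes greedily in map order. The function names are hypothetical.

```python
def maf(genotypes):
    """Minor allele frequency from 0/1/2 allele counts (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def r_squared(a, b):
    """Squared Pearson correlation of allele counts over jointly called samples."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return sxy * sxy / (sxx * syy)


def select_markers(snps, min_maf=0.25, max_r2=0.1):
    """snps: list of (name, genotypes) in map order.

    Returns (maf_filtered_names, ld_pruned_names): the first list mirrors
    the set used for Merlin, the second the pruned set used for Morgan.
    """
    common = [(name, g) for name, g in snps if maf(g) > min_maf]
    kept = []
    for name, g in common:
        # Keep a SNP only if it is in low LD with every SNP already kept.
        if all(r_squared(g, kept_g) < max_r2 for _, kept_g in kept):
            kept.append((name, g))
    return [name for name, _ in common], [name for name, _ in kept]
```

A greedy pass keeps the first marker of any correlated cluster; other pruning strategies (for example, windowed pruning) would retain a different but comparably independent subset.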
Haplotypes of SNPs pruned for linkage disequilibrium were generated for all linked pedigrees in the genomic region with the highest HLOD score using the “haplotype analysis” command in Simwalk2snp and visualized using Haplopainter(44). Haplotypes that were inherited identical by descent and co-segregated with the OCD phenotype were assessed within each family.
Estimates of genome-wide significance values, incorporating both the markers used in the original linkage analyses and those used for fine-mapping, were calculated using 1) simulations and 2) the autoregressive method described by Bacanu(45). Permutations were performed using gene-dropping simulations, as implemented in Morgan and Merlin. For both Morgan and Merlin, 1000 replicates were simulated and each replicate was analyzed with both parametric models and both affection statuses. The significance for each LOD score was assessed by: 1) counting the number of replicates (nr) in which the maximum LOD score exceeded the observed LOD score; and 2) calculating the p-value as (nr + 1)/1001. The threshold for genome-wide significant linkage was taken to be the 49th highest LOD score of the 1000 replicates. The criterion for significant genome-wide linkage (occurring in 5% of genome scans by chance) was determined to be a LOD score of 2.8–2.9 for a single analysis in Morgan (3.1–3.3 in Merlin), and 3.3 considering all four parametric analyses (3.8 in Merlin). It is likely that the thresholds are higher for Merlin because many more markers were analyzed. In comparison, Lander and Kruglyak suggested that a LOD score of 3.3 for a parametric linkage analysis of an “infinitely dense” map be considered the threshold for a genome-wide significant result(46). We also used the autoregressive method to generate genome-wide significance thresholds from the data and made a (conservative) Bonferroni correction for the number of genome scans that were done (both parametric and non-parametric). The range of LOD score thresholds for suggestive linkage using this approach was 3.1 to 3.5, and the range of LOD score thresholds for significant linkage was 4.1 to 4.8 (Table S1 in the Supplement).
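The simulation-based significance calculation can be written compactly. The Python sketch below assumes that the maximum genome-wide LOD score from each gene-dropping replicate has already been collected into a list; the function names are hypothetical, and the default rank of 49 is taken directly from the text.

```python
def empirical_p_value(observed_lod, replicate_max_lods):
    """p = (nr + 1) / (R + 1), where nr counts the replicates whose maximum
    LOD score exceeds the observed LOD score and R is the number of replicates."""
    n_exceed = sum(1 for lod in replicate_max_lods if lod > observed_lod)
    return (n_exceed + 1) / (len(replicate_max_lods) + 1)


def genome_wide_threshold(replicate_max_lods, rank=49):
    """Return the rank-th highest replicate maximum LOD score.

    With 1000 replicates, rank=49 follows the threshold definition
    stated in the text.
    """
    if not 1 <= rank <= len(replicate_max_lods):
        raise ValueError("rank must be between 1 and the number of replicates")
    ranked = sorted(replicate_max_lods, reverse=True)
    return ranked[rank - 1]
```

Adding one to both the numerator and the denominator keeps the estimated p-value away from zero, so that with 1000 replicates the smallest reportable value is 1/1001.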
We identified eleven chromosomal regions with HLOD scores ≥1.5, a threshold commonly used to identify linkage regions of interest, and five with a HLOD score ≥2 (Table 3). Figures S1–S3 in the Supplement show the results of the genome-wide parametric and non-parametric analyses. The region with the highest HLOD score was on chromosome 1p36, with a maximum HLOD score of 2.96 using Merlin under the dominant model and broad phenotype (LOD score without correction for heterogeneity = −3.88) and a maximum HLOD score of 2.88 with the dominant model and narrow phenotype (LOD score without correction for heterogeneity = −2.74). The maximum HLOD score in this region with Morgan was 2.66 under a dominant model using the narrow phenotype, and the maximum LOD score in this region using the nonparametric approach was 0.87 at marker rs2377041 with Morgan and 0.93 at markers rs6676961 to rs6677984 with Morgan. Note that the difference between the parametric and nonparametric LOD scores is most likely due to the added information provided by the unaffected individuals in the parametric analyses.
Seventeen of the 33 families showed LOD scores >0 in this region. Only one of the 11 identified linkage regions has been previously reported as potentially harboring OCD susceptibility genes; the linkage region on chromosome 6p25, at ~3Mb, had a maximum HLOD score of 2.56, and is near the 6p25 region identified by Hanna et al at ~5Mb(26). When the six pedigrees that were linked in both the previous and the current studies were excluded, the evidence for linkage in this region remained, although somewhat diminished (Table 4).
Individual family LOD scores for the genomic regions with HLODs ≥2 are shown in Table S2 in the Supplement. The strongest genome-wide individual family LOD scores for the three largest families (shown in Figure 1) were 3.4 for family 1 on chromosome 1p36 (broad phenotype, dominant inheritance), 2.0 on chromosome 18q22.1 for family 2 (narrow phenotype, dominant inheritance), and 1.4 on chromosome 1q31 for family 3 (narrow phenotype, dominant inheritance).
Fine-mapping analyses were conducted on chromosomes 1p36, 2p14, 5q13, 6p25, and 10p13 (Figure 2). For chromosome 1p36, which had similar high HLOD scores for both the broad and narrow phenotypes in Merlin, we conducted fine-mapping analyses for the narrow phenotype in both Morgan and Merlin and for the broad phenotype in Merlin only, as Morgan gave a HLOD score of only 1.35 for the broad phenotype. For chromosome 2, the HLOD score decreased with the inclusion of additional markers, and for chromosomes 5 and 10, the HLOD score increased in one of the two analyses only. For all other genomic regions, the HLOD scores increased with the inclusion of additional markers for both analyses (Table 4). As in the genome-wide analysis, the highest overall HLOD score was obtained at chromosome 1p36.33 to 1p36.32, with a maximum HLOD score of 3.77 at marker rs897615 using Merlin and 3.08 at marker rs884080 using Morgan (dominant model, narrow phenotype in both) (Figure 2). The confidence interval for this linkage peak using HLOD>2.0 as the cutoff was bounded by SNPs rs884080 to rs7518255 for the Morgan analysis and by SNPs rs4475691 to rs1874266 for the Merlin analysis.
We examined the haplotypes generated by Simwalk2snp in all families linked to chromosome 1p36 using the LD-pruned SNP set. We identified haplotypes that co-segregated with OCD, encompassing the region with the highest genome-wide HLOD scores, in the majority of the linked families. In the largest family, which had a LOD score of 2.9 under the narrow phenotype (LOD = 3.4 under the broad phenotype), 11 of the 14 individuals with the narrow OCD phenotype carried a common haplotype inherited from the founder, along with all four individuals with the broad OCD phenotype and one obligate carrier. In the next largest family, which had a LOD score of 0.8 under the narrow phenotype, five of the seven biologically related individuals with the narrow OCD diagnosis (including the founder) carry a shared haplotype, along with three of the four biologically related individuals with the broad OCD phenotype, and two obligate carriers (Figure S4 in the Supplement). While there was haplotype sharing within each family, we did not identify a haplotype that was shared between families.
In this study we report the results of a genome-wide linkage analysis in multiply-affected pedigrees with childhood-onset OCD. We identified several genomic regions of potential interest for OCD on chromosomes 1p36, 2p14, 5q13, 6p25, and 10p13. The linkage region on chromosome 1p36.33-1p36.32, which meets genome-wide criteria for suggestive linkage after fine-mapping based on our calculated significance thresholds (HLOD=3.77) and spans 4 Mb, is the strongest linkage finding for OCD reported to date, and the most interesting region that we identified. The majority of the linkage signal on 1p36 comes from our largest family; however, 16 other families also contributed to the LOD score and had haplotypes that co-segregated with either the narrow or the broad OCD phenotype, suggesting that the finding is not specific to a single family.
Although this is the first reported linkage for OCD on chromosome 1p36, several other neuropsychiatric disorders have been linked to the 1p36 region or nearby, including major depressive disorder (MDD), eating disorders (ED), childhood-onset mood disorders, and childhood epilepsy(47–50). A whole genome linkage scan of recurrent MDD identified a maximum LOD score of 3.03 in females (there was no evidence of linkage in males) at 1p36.23 to 1p36.22 (7.6 Mb to 12.3 Mb), a region that adjoins our linkage region(47). Major depression co-occurs with OCD in about 50% of cases, including in our families, and a recent family study has suggested that childhood-onset OCD with MDD may represent a distinct etiological syndrome(51).
Similarly, a whole-genome linkage scan for ED identified a linkage region on chromosome 1p33 to 1p36 (fine mapping peak multipoint NPL score of 3.45, restricting subtype of anorexia nervosa), just centromeric to our linkage region (1p36.33 to 1p36.32, or 0 to 4 Mb)(49, 50). As with MDD, there is evidence that OCD and ED show substantial phenotypic and etiological overlap(49, 52–55). A recent comprehensive review of epidemiological, longitudinal, and family studies suggests a much higher rate of co-occurrence of ED and OCD than expected by chance, as well as a clear etiological relationship between these disorders. The data suggest that the two most likely models are 1) OCD and ED are alternate expressions (or different phases) of the same underlying etiological risk factors, or 2) OCD, which often has an earlier age of onset than ED, is a risk factor for the development of an ED(56). Rates of ED were very low in our families (~4%), and were not concentrated in the families that were linked to chromosome 1, suggesting that the second model is not likely for our sample. However, both models are consistent with shared genetic risk factors, and therefore candidate genomic regions that are identified in both OCD and ED are of increased interest.
The 1p36 region has also been implicated in a deletion syndrome (1p36 syndrome) with a complex phenotype characterized by intellectual disability and multiple system anomalies(57). Behavioral disorders, including self-biting, temper tantrums, reduced social interactions, stereotypies or other repetitive movements, and hyperphagia have been reported in approximately half of the deletion 1p36 cases(57). OCD symptoms have not specifically been reported, but given the phenomenological similarities between stereotypies and compulsive behaviors, and the role of the striatum in both phenotypes, the overlap of this known deletion syndrome with our primary linkage region strengthens the hypothesis that it may harbor susceptibility genes for OCD and perhaps for other related disorders of childhood, as do the linkage findings for MDD and ED.
The major strengths of our study are the size of the sample and the informativeness of the families for linkage analyses. We believe that the use of both parametric and nonparametric analyses is also a strength; this approach was chosen to maximize the information available from complex pedigrees that are not easily accommodated by a single phenotype or model in a disorder with an unknown mode of inheritance. Nevertheless, we recognize that multiple analyses can lead to an inflation of the LOD scores. We have corrected for this by calculating the relevant genome-wide significance thresholds using the actual data, and by examining the evidence for linkage across the multiple analyses, as well as by examining haplotypes and segregation patterns in our families. The results of the simulations indicate that our LOD scores are not artificially inflated. We believe that, for chromosome 1 at least, the convergence of results across analyses strengthens rather than weakens the evidence for linkage.
The primary limitation of this study relates to heterogeneity. For example, there may be phenotypic heterogeneity among families ascertained from the different sites due to the nonuniformity of clinical assessments used (although all sites used a version of the Y-BOCS or CY-BOCS and an additional semi-structured, well-validated instrument for clinical assessment). We have addressed this by conducting BE diagnoses and requiring 100% concordance on all of the diagnostic criteria for phenotypic assignment of both OCD (narrow phenotype) and subclinical OCD (broad phenotype). We believe that this rigorous approach minimizes the problem of potential phenotypic heterogeneity.
There is also genetic heterogeneity in the study population, perhaps due to subtle ethnic variation (e.g., northern vs southern European descent). However, the use of an HLOD approach, which identifies and incorporates heterogeneity in the linkage analysis, addresses this concern. Genetic heterogeneity is also evident in the observation that, while we identified haplotypes on chromosome 1p36 that co-segregate with OCD within families, we did not identify a consistent haplotype that co-segregated with OCD across families. This is not surprising, given that the families in our sample are from outbred Caucasian populations rather than from a genetic isolate, and does not necessarily reduce our confidence in the results. It does highlight the need for further investigation of this region and the other genomic regions of interest identified by our study, however, as discussed below.
In conclusion, this work identifies a new region of interest for OCD on chromosome 1p36. Despite meeting suggestive rather than significant linkage criteria based on our simulations, this is the strongest linkage result reported to date for this complex disorder. This region has previously been associated with several neuropsychiatric phenotypes that are related to OCD, including eating and mood disorders, and the 1p36 deletion syndrome. As with all genetic investigations, follow-up studies are needed to validate and extend these findings, including replication of the linkage findings, and sequencing of the region to identify possible functional variants within particular gene(s).
Although linkage studies were for a time supplanted by case-control or trio-based approaches such as GWAS, the advent of high-throughput sequencing technology has renewed interest in linkage studies for complex traits. Such studies are needed to help determine which of the many rare, potentially deleterious variants identified through sequencing co-segregate with disease. Genome-wide linkage studies such as this one can help to identify and prioritize genomic regions of interest, and can also help to identify the most informative individuals within linked pedigrees for either targeted or complete genome sequencing.
In addition, although the common disease/common variant approach has driven the interest in GWAS, rare variants have also been shown to be important in the etiology of common disease (e.g., Crohn’s disease)(58). Variants identified through family-based approaches, such as linkage and translocation studies, while individually affecting only a small proportion of families or individuals, have led to the identification of biologically relevant genes, gene clusters, or gene networks that are involved in the pathogenesis of Alzheimer disease (presenilin genes) and schizophrenia (DISC1 and 2)(59, 60). As with these complex traits, in order to more fully understand the biological underpinnings of OCD, multiple approaches, including (but likely not limited to) linkage, GWAS, whole-genome sequencing, and animal model studies, will ultimately be required.
This research was supported by grants to CAM from the National Center for Research Resources (K23 RR015533), the National Alliance for Research on Schizophrenia and Affective Disorders, the Obsessive Compulsive Foundation, and the Althea Foundation, by grants to GLH from the National Institute of Mental Health (K20 MH 01065 and R01 MH 58376) and the Obsessive Compulsive Foundation, and to DAC from the National Institute of Mental Health (K01 MH072952).
FINANCIAL DISCLOSURES: The authors report no biomedical financial interests or potential conflicts of interest.