To identify rare variants contributing to multiple sclerosis (MS) susceptibility in a previously reported family with up to 15 individuals affected across 4 generations.
We performed exome sequencing in a subset of affected individuals to identify novel variants contributing to MS risk within this unique family. The candidate variant was genotyped in a validation cohort of 2,104 MS trio families.
Four family members with MS were sequenced and 21,583 variants were found to be shared among these individuals. Refining the variants to those 1) with a predicted loss of function and 2) present within regions of modest haplotype sharing identified 1 novel mutation (rs55762744) in the tyrosine kinase 2 (TYK2) gene. A different polymorphism within this gene has been shown to be protective in genome-wide association studies. In contrast, the TYK2 variant identified here is a novel missense mutation and was found to be present in 10/14 (71%) cases and 28/60 (47%) of the unaffected family members. Genotyping an additional 2,104 trio families showed the variant to be transmitted preferentially from heterozygous parents (transmitted 16: not transmitted 5; χ2 = 5.76, p = 0.016).
Rs55762744 is a rare variant of modest effect on MS risk affecting a subset of patients (0.8%). Within this pedigree, rs55762744 is common and appears to be a modifier of modest risk effect. Exome sequencing is a quick and cost-effective method and we show here the utility of sequencing a few cases from a single, unique family to identify a novel variant. The sequencing of additional family members or other families may help identify other variants important in MS.
Multiple sclerosis (MS) is a common, complex, neurologic condition with both environmental and genetic factors contributing to risk. Epidemiologic studies have highlighted the significant influence of latitude on MS prevalence, and Epstein-Barr virus infection, smoking, and low vitamin D levels are strongly implicated environmental risk factors.1,2 Genetic contributions are clear from family studies, with 15%–20% of individuals with MS having an affected relative.3 Furthermore, monozygotic twins have a 25% concordance rate as compared to the 5% seen in dizygotic twins.4 These and other familial studies have demonstrated that genes contribute significantly to the familial aggregation of MS.
Despite a strong genetic component, it is extremely rare to find families with 5 or more individuals with MS in successive generations. We previously reported a family with up to 15 individuals with MS.5 MS was present in 4 generations and penetrance was relatively consistent between these generations. The family appeared to implicate a monogenic form of MS but a genome-wide screen for linkage showed no significant linkage.6 There were a number of regions that showed mild, nonsignificant, evidence for linkage in this family. We speculated that the presence of phenocopies or genetic heterogeneity may hinder the ability to identify a significantly linked locus in this unique family.6
Massively parallel (next-generation) sequencing has been responsible for the identification of several autosomal recessive and dominant genes in Mendelian disease.7 Given the relatively high penetrance and apparent autosomal dominant segregation pattern observed in the pedigree, we applied next-generation sequencing technology to 4 individuals with MS in the pedigree.
The institutional review board of the University of Western Ontario approved this study.
Ascertainment and a detailed clinical description of the family have been reported elsewhere.5,6 There is a wide range of age at disease onset, with an average onset age of 27 years (range 17–49 years). The female to male ratio is 1.8:1 and all forms of MS are represented in the pedigree (i.e., relapsing-remitting, primary progressive, and secondary progressive). The family is of German ancestry and there is no consanguinity. The pedigree is shown in figure 1.
DNA was extracted by standard methods. Four individuals were randomly selected for exome sequencing and are shown in the pedigree (figure 1). Exome capture was performed with the SureSelect Human All Exon kit (Agilent Technologies). Exon-enriched DNA was sequenced on the Illumina Genome Analyzer II platform following the manufacturer's instructions (Illumina). Raw image files were processed by the Illumina pipeline (version 1.3.4) for base calling with default parameters. The sequencing reads were aligned to the NCBI human reference genome (NCBI36.3) using bowtie 0.12.7 (reference 8; options “-a --best --strata”). Single nucleotide polymorphisms (SNPs) were subsequently called using SAMTools (v0.1.8 [r613]9) with the following quality criteria: ≥20× coverage, a Phred-like10 consensus quality of ≥30, and a SNP quality score of ≥100. Variants were defined as heterozygous when ≥25% of all nucleotides at the position showed nonreference bases and as homozygous according to the initial SAMTools classification (using default parameters). We used PolyPhen2 to assess nonsynonymous variants for a likely functional impact11; both the HumDiv and HumVar datasets were screened.
Reads for which bowtie was unsuccessful in identifying an ungapped alignment were mapped to the human reference genome with BWA,12 using default parameters. Indels of up to 4 bases were subsequently called using SAMTools (v0.1.8 [r613]9) with the following quality criteria: ≥20× coverage, a Phred-like10 consensus quality of ≥50, and a SNP quality score of ≥100. Variants were defined as heterozygous when ≥25% of all nucleotides at the position showed nonreference bases and as homozygous according to the initial SAMTools classification (using default parameters).
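The calling thresholds described above can be expressed as a small filter. The following is a minimal Python sketch, not the pipeline actually used: the `Call` record, its field names, and the `caller_homozygous` flag are hypothetical stand-ins for values parsed from the caller's output, and the order in which the homozygous and heterozygous rules are applied is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    """Hypothetical per-site summary parsed from a variant caller's output."""
    depth: int                       # reads covering the position
    consensus_q: int                 # Phred-like consensus quality
    variant_q: int                   # SNP quality score
    nonref_fraction: float           # share of bases differing from the reference
    caller_homozygous: bool = False  # caller's own homozygous classification


def passes_filters(call: Call, min_consensus_q: int = 30) -> bool:
    """Apply the stated thresholds: >=20x coverage, consensus quality at or
    above the cutoff (30 for SNPs, 50 for indels), and quality score >=100."""
    return (
        call.depth >= 20
        and call.consensus_q >= min_consensus_q
        and call.variant_q >= 100
    )


def genotype(call: Call, min_consensus_q: int = 30) -> Optional[str]:
    """Return 'hom', 'het', or None when the site is rejected."""
    if not passes_filters(call, min_consensus_q):
        return None
    if call.caller_homozygous:        # keep the caller's homozygous label
        return "hom"
    if call.nonref_fraction >= 0.25:  # >=25% nonreference bases
        return "het"
    return None
```

For example, `genotype(Call(40, 35, 120, 0.45))` is accepted as heterozygous under the SNP cutoff, while the same record is rejected when the indel cutoff is passed as `min_consensus_q=50`.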
Files for linkage analyses were prepared with MEGA2 (reference 13) and parametric linkage was performed with SIMWALK 2.0 (reference 14). The disease allele frequency was set at 0.001, the phenocopy rate was 0.001, and the penetrance was set low (0.01) to permit a more inclusive, albeit less powerful, screen for linkage. The analyses tested linkage to 681 genotyped markers with mapped distances taken from the Marshfield Genotyping Centre Database described in detail elsewhere.6 The analyses also included genotypes from the HLA-DRB1 locus presented in the original publication.5
An additional cohort of 2,104 parents–affected child trios and 1,543 healthy controls was available for confirmation studies. Genotyping of additional families was performed using TaqMan (Applied Biosystems) assays.
Paired, 76 base pair (bp) reads from postenrichment shotgun libraries were aligned to the reference genome. On average, 3.1 gigabases (Gb) of mapped sequence was generated per individual and 85% of reads were mapped. The average coverage of each exome was 70-fold. On average, 57,750 SNPs were called per individual (range 53,417–62,098), of which 64% were already annotated in a public database (dbSNP v131). A total of 19,743 SNPs were in common among all 4 patients and 7,701 of these were novel. A total of 396 SNPs were predicted to be damaging (loss of function) by PolyPhen2, of which 240 had a dbSNP ID. On average, 5,038 indels were called per individual, of which 13.5% were previously annotated in dbSNP v131. A total of 1,840 indels were in common among all 4 patients, 1,562 were novel (i.e., not present in dbSNP), and 367 were exonic. In order to refine our list of variants, we included only those variants that were frameshift indels or PolyPhen2 predicted to be loss of function (damaging) and were common to all 4 affected individuals and present in regions that showed evidence for linkage under a “rare variant” model of multipoint linkage analyses (figure 2; table).
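The refinement step described above amounts to intersecting the calls of the sequenced individuals and then applying two further filters. The sketch below illustrates that logic in Python under stated assumptions: variants are represented as hypothetical `(chromosome, position, allele)` tuples, the set of damaging variants is supplied by the caller, and the coordinates in the usage example are invented for illustration and are not taken from the study.

```python
def shared_candidates(calls_by_person, damaging, linked_regions):
    """Keep variants that are (1) called in every sequenced individual,
    (2) flagged as damaging (e.g., frameshift or predicted loss of function),
    and (3) located inside a region with evidence for linkage.

    calls_by_person: dict mapping an individual ID to a set of variant tuples
    damaging:        set of variant tuples flagged as damaging
    linked_regions:  list of (chromosome, start, end) intervals, inclusive
    """
    # Variants common to all individuals
    shared = set.intersection(*calls_by_person.values())

    def in_linked_region(variant):
        chrom, pos = variant[0], variant[1]
        return any(
            c == chrom and start <= pos <= end
            for c, start, end in linked_regions
        )

    return sorted(v for v in shared if v in damaging and in_linked_region(v))


# Toy data only; positions and alleles are made up.
calls = {
    "A": {("19", 100, "T"), ("1", 500, "G"), ("2", 50, "C")},
    "B": {("19", 100, "T"), ("1", 500, "G")},
    "C": {("19", 100, "T"), ("1", 500, "G"), ("3", 7, "A")},
    "D": {("19", 100, "T"), ("1", 500, "G")},
}
damaging = {("19", 100, "T"), ("1", 500, "G"), ("2", 50, "C")}
regions = [("19", 1, 1000)]

candidates = shared_candidates(calls, damaging, regions)
```

In this toy input the chromosome 1 variant is shared and damaging but lies outside the linked interval, and the chromosome 2 variant is damaging but not shared, so only the chromosome 19 variant survives. Applying the filters in this order keeps the expensive functional annotation confined to the shared set.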
A single SNP in the TYK2 gene on chromosome 19p13 was the only variant identified using these filter conditions (figure 2; table). The variant encodes a missense mutation in exon 3 of TYK2 that changes an alanine to threonine (A53T) and it is predicted to impact protein function by PolyPhen2 and SIFT.15 The genomic evolutionary rate profiling (GERP) score was +4.20. The GERP scores for rs897738 and rs6427384, the 2 “possibly” damaging variants, were +2.68 and −1.62, respectively (table). The alanine residue at position 53 is highly conserved between species.
The variant at TYK2 has previously been reported once in dbSNP by a study on cancer genomes and has an ID of rs55762744. The SNP was not reported by the 1000 Genomes project. MS has previously been shown to be associated with another nonsynonymous SNP in TYK2 (rs34536443; P1104A), which lies over 25 kb downstream of rs55762744. The 4 individuals sequenced did not carry the rs34536443 MS-associated SNP. There were no obvious differences in MS phenotype or HLA genotype between carriers and noncarriers of the TYK2 A53T variant.
The variant was genotyped in the remaining affected family members. Of the affected individuals with DNA available, a total of 10/14 (71%) were positive for the TYK2 mutation. The unaffected family members were also genotyped and a total of 28/60 (47%) were positive for the variant. When a parent was a heterozygous carrier for the mutation, the parent transmitted to an affected offspring 10 times and did not transmit 2 times (χ2 = 5.33, p = 0.02). We then genotyped 2,104 MS “trio” families and 16 cases were positive for the TYK2 variant (0.8%). Twenty-one unaffected parents were heterozygous for rs55762744 and the risk allele was transmitted preferentially: it was transmitted to 16 cases and not transmitted to 5 cases (χ2 = 5.76, p = 0.016). A control sample was also genotyped for the variant and 10 of 1,543 individuals (0.6%) were positive for the T allele, a frequency not significantly different from that in affected individuals.
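The two χ2 values reported above are consistent with the standard transmission disequilibrium test statistic, (b − c)²/(b + c), where b and c are the numbers of transmissions and nontransmissions from heterozygous parents, compared against a χ2 distribution with 1 degree of freedom. The following Python sketch reproduces the arithmetic; the function name `tdt` is ours, and whether the authors used exactly this uncorrected form is an assumption, although the reported figures match it.

```python
import math


def tdt(transmitted: int, not_transmitted: int):
    """McNemar-type TDT statistic with its 1-df chi-square p value.

    chi2 = (b - c)^2 / (b + c); for 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(chi2 / 2)).
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - not_transmitted) ** 2 / total
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Within the pedigree: 10 transmitted, 2 not transmitted
chi2_family, p_family = tdt(10, 2)   # chi2 ≈ 5.33, p ≈ 0.02

# Trio cohort: 16 transmitted, 5 not transmitted
chi2_trios, p_trios = tdt(16, 5)     # chi2 ≈ 5.76, p ≈ 0.016
```

Only heterozygous parents are informative for this test, which is why the counts are small relative to the 2,104 trios genotyped.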
We also tested for rare variants in the other known MS susceptibility genes identified by the MS GWA studies that lie outside of the linkage regions. These genes included IL2RA, IL7R, CLEC16A, CD58, TNFRSF1A, IRF8, EVI5, KIF1B, KIF21B, CD40, STAT3, CBLB, CD6, CD226, and GPC5. There were no frameshift insertions, deletions, or damaging SNPs identified in any of the 4 sequenced individuals at these genes.
The family presented here is unique for its number of affected individuals with MS in multiple generations, the extent of clinical investigation performed, and the advantage of 20 years of routine follow-up and prospectively identified cases. The family appeared to be segregating MS as an autosomal dominant condition, which suggested a priori that it may harbor a dominant-acting mutation and be amenable to a resequencing strategy.
The sequencing of 4 individuals from a single family resulted in the identification of a great number of shared variants (>20,000). The selection of cases is an important step in family studies of this type. Given the absence of significant linkage (lod > 3.3) we opted to select the individuals at random. If linkage had been observed, the best individuals to sequence would have been those sharing a common haplotype. Alternatively, sequencing family members that are relatively more distantly related may be overly conservative given the likely genetic heterogeneity present in a family with no significant linkage. The filtering of the variants can be a delicate process; the challenge is to reduce the number of variants to a manageable number of candidates and yet not to lose a possible causative variant in the process. Including the linkage results under a “rare” disease allele model was integral to the filtering process and has been used with success in Mendelian disease.16 Because we did not observe significant linkage, we assessed regions with mild to moderate evidence of haplotype sharing (i.e., lod ≥1). We were confident of the result for several reasons. TYK2 is a gene that has recently been shown to be implicated in MS, carrying a protective allele (odds ratio = 0.8), in several large association studies.17–19 Furthermore, the observation of a GERP score of +4.20 was in keeping with a deleterious genetic change.20 PolyPhen2 and SIFT also showed evidence of an impact on protein function and, finally, there was significant transmission disequilibrium from parents to affected offspring (16 times transmitted vs 5 times not transmitted). A large, family-based TDT cohort for follow-up was necessary as the T allele was relatively common in our control sample (0.6%) and a case-control approach would not have easily identified this allele as a susceptibility variant.
There may be other rare variants at TYK2 that modify risk and resequencing of the entire gene in hundreds to thousands of cases may prove useful in their identification. It is very likely that our conservative filtering approach missed other non-TYK2 risk alleles segregating within this family. The resequencing of the exome of other or all family members may be of benefit; in particular, those lacking the TYK2 risk variant.
TYK2 encodes a protein kinase that phosphorylates proteins in the JAK-STAT3 immune pathway and regulates downstream immune cytokines.21 Analysis of protein–protein interactions by STRING22 highlighted an interaction of TYK2 with many of the associated MS genes identified by GWA studies (figure 3), including IL7R and STAT3, lending further support for the involvement of TYK2 in MS risk, likely acting through an immune mechanism. It is intriguing that the previously known MS-associated SNP, rs34536443, is considered a protective allele, whereas we provide evidence that rs55762744 confers risk to MS within this family. A recent study has shown the protective allele to alter tyrosine kinase activity and modify the cytokine profile to a TH2 phenotype.21 It may be common to see both protective and susceptibility alleles at the GWAS-associated genes, akin to the variants observed at the MHC. An example would be HLA-DRB1*15, which acts as a susceptibility allele, while HLA-DRB1*14 acts as a protective allele.23 How the newly identified variant acts to increase MS risk is not known, and functional studies are necessary and currently underway to clarify its role.
The most appropriate method for the identification of rare variants in complex, qualitative disease is being debated.24 A suggested approach involves the exome or genome sequencing of many affected individuals and healthy controls. Having enough power and resources may be problematic at this time. Alternatively, the enrichment and resequencing of select regions that have been implicated by GWA studies may be fruitful. This latter strategy would have worked for the present study with (or without) the modest linkage information included as a filter. From this report, it is clear that a rare variant can have low penetrance even in a seemingly autosomal dominant family and this can confound linkage analysis and subsequent filtering if unaffected individuals are included. In our pedigree 75% of carriers of the TYK2 mutation did not have MS and the linkage was not significant (lod = 1.75), although 19p13 was the region with the strongest evidence of haplotype sharing in a scan of the entire genome. While linkage worked for this investigation, we also benefited from previous GWA studies that identified TYK2 as the site of a resistance allele.
We report here one of the first exome sequencing strategies to identify a novel variant for a common complex disease and we believe this approach will be of significant utility for many other traits in the near future.
Editorial, page 396
D. Dyment: drafting/revising the manuscript, study concept or design, analysis or interpretation of data, statistical analysis, study supervision, obtaining funding. M.Z. Cader: drafting/revising the manuscript, study concept or design, analysis or interpretation of data. M. Chao: drafting/revising the manuscript, acquisition of data. M. Lincoln: analysis or interpretation of data, acquisition of data. K. Morrison: drafting/revising the manuscript, contribution of vital reagents/tools/patients, study supervision. G. Disanto: analysis or interpretation of data. J. Morahan: drafting/revising the manuscript. G. DeLuca: drafting/revising the manuscript. A.D. Sadovnick: drafting/revising the manuscript, study concept or design, acquisition of data, study supervision, obtaining funding. P. Lepage: analysis or interpretation of data, acquisition of data. A. Montpetit: drafting/revising the manuscript, acquisition of data, study supervision. G. Ebers: drafting/revising the manuscript, study concept or design, analysis or interpretation of data, contribution of vital reagents/tools/patients, study supervision. S. Ramagopalan: drafting/revising the manuscript, study concept or design, analysis or interpretation of data, acquisition of data, statistical analysis, study supervision.
D. Dyment receives support from the CIHR Institute of Genetics Clinical Investigatorship Award. M.Z. Cader receives research support from the Medical Research Council of the United Kingdom. M. Chao, M. Lincoln, K. Morrison, G. Disanto, J. Morahan, and G. De Luca report no disclosures. A.D. Sadovnick has received funding for travel from Bayer Canada, the European Charcot Foundation, the MS Society of Canada, the National MS Society, and the Consortium of MS Clinics; has received speaker honoraria from Teva Scientific and the MS Society of Canada; and has received research support from the MS Society of Canada Scientific Research Foundation (Co-PI) and the CIHR (Co-PI). P. Lepage and A. Montpetit report no disclosures. G. Ebers has received a speaker honorarium from Roche and a consultation fee from UCB Pharma and receives research support from Bayer Schering Pharma, the Multiple Sclerosis Society of the United Kingdom (grant 865_07 [PI] and grant 875_07 [PI]), and the Multiple Sclerosis Society of Canada Scientific Research Foundation (PI and co-PI). S. Ramagopalan receives research support from the Multiple Sclerosis Society of Canada Scientific Research Foundation and the Multiple Sclerosis Society of the United Kingdom. Go to Neurology.org for full disclosures.