1.  Candidate genes and functional noncoding variants identified in a canine model of obsessive-compulsive disorder 
Genome Biology  2014;15(3):R25.
Obsessive-compulsive disorder (OCD), a severe mental disease manifested in time-consuming repetition of behaviors, affects 1 to 3% of the human population. Although OCD is highly heritable, its complex genetics has hampered attempts to elucidate its etiology. Dogs suffer from naturally occurring compulsive disorders that closely model human OCD, manifested as an excessive repetition of normal canine behaviors that only partially responds to drug therapy. The limited genetic diversity within dog breeds makes identifying the underlying genetic factors easier.
We use a genome-wide association study of 87 Doberman Pinscher cases and 63 controls to identify genomic loci associated with OCD and sequence these regions in 8 affected dogs from high-risk breeds and 8 breed-matched controls. We find 119 variants in evolutionarily conserved sites that are specific to dogs with OCD. These case-only variants are significantly more common in high OCD risk breeds compared to breeds with no known psychiatric problems. Four genes, all with synaptic function, have the most case-only variation: neuronal cadherin (CDH2), catenin alpha2 (CTNNA2), ataxin-1 (ATXN1), and plasma glutamate carboxypeptidase (PGCP). In the 2 Mb gene desert between the cadherin genes CDH2 and DSC3, we find two different variants, present only in dogs with OCD, that disrupt the same highly conserved regulatory element. These variants cause significant changes in gene expression in a human neuroblastoma cell line, likely due to disrupted transcription factor binding.
The limited genetic diversity of dog breeds facilitates identification of genes, functional variants and regulatory pathways underlying complex psychiatric disorders that are mechanistically similar in dogs and humans.
PMCID: PMC4038740  PMID: 24995881
2.  Canine Hereditary Ataxia in Old English Sheepdogs and Gordon Setters Is Associated with a Defect in the Autophagy Gene Encoding RAB24 
PLoS Genetics  2014;10(2):e1003991.
Old English Sheepdogs and Gordon Setters suffer from a juvenile-onset, autosomal recessive form of canine hereditary ataxia primarily affecting the Purkinje neurons of the cerebellar cortex. The clinical and histological characteristics are analogous to hereditary ataxias in humans. Linkage and genome-wide association studies on a cohort of related Old English Sheepdogs identified a region on CFA4 strongly associated with the disease phenotype. Targeted sequence capture and next-generation sequencing of the region identified an A to C single nucleotide polymorphism (SNP) located at position 113 in exon 1 of an autophagy gene, RAB24, that segregated with the phenotype. Genotyping of six additional breeds of dogs affected with hereditary ataxia identified the same polymorphism in affected Gordon Setters that segregated perfectly with phenotype. The other breeds tested did not have the polymorphism. Genome-wide SNP genotyping of Gordon Setters identified a 1.9 Mb region with a haplotype identical to that of affected Old English Sheepdogs. Histopathology, immunohistochemistry and ultrastructural evaluation of the brains of affected dogs from both breeds identified dramatic Purkinje neuron loss with axonal spheroids, accumulation of autophagosomes, ubiquitin-positive inclusions and a diffuse increase in cytoplasmic neuronal ubiquitin staining. These findings recapitulate the changes reported in mice with induced neuron-specific autophagy defects. Taken together, our results suggest that a defect in RAB24, a gene associated with autophagy, is highly associated with and may contribute to canine hereditary ataxia in Old English Sheepdogs and Gordon Setters. This finding suggests that detailed investigation of autophagy pathways should be undertaken in human hereditary ataxia.
Author Summary
Neurodegenerative diseases are one of the most important causes of decline in an aging population. The hereditary ataxias, an important subset of these diseases, are familial neurodegenerative diseases that affect the cerebellum, causing progressive gait disturbance in both humans and dogs. We identified a mutation in RAB24, a gene associated with autophagy, in Old English Sheepdogs and Gordon Setters with hereditary ataxia. Autophagy is a process by which cell proteins and organelles are removed and recycled, and its critical role in maintaining the continued health of cells is becoming clear. We evaluated the brains of affected dogs and identified accumulations of autophagosomes within the cerebellum, suggesting a defect in the autophagy pathway. Our results suggest that a defect in the autophagy pathway results in neuronal death in a naturally occurring disease in dogs. The autophagy pathway should be investigated in human hereditary ataxia and may represent a therapeutic target in neurodegenerative diseases.
PMCID: PMC3916225  PMID: 24516392
3.  Mutations causing medullary cystic kidney disease type 1 (MCKD1) lie in a large VNTR in MUC1 missed by massively parallel sequencing 
Nature Genetics  2013;45(3):299-303.
While genetic lesions responsible for some Mendelian disorders can be rapidly discovered through massively parallel sequencing (MPS) of whole genomes or exomes, not all diseases readily yield to such efforts. We describe the illustrative case of the simple Mendelian disorder medullary cystic kidney disease type 1 (MCKD1), mapped more than a decade ago to a 2-Mb region on chromosome 1. Ultimately, only by cloning, capillary sequencing, and de novo assembly did we find that each of six MCKD1 families harbors an equivalent, but apparently independently arising, mutation in sequence dramatically underrepresented in MPS data: the insertion of a single C in one copy (but a different copy in each family) of the repeat unit comprising the extremely long (~1.5-5 kb), GC-rich (>80%), coding VNTR in the mucin 1 gene. The results provide a cautionary tale about the challenges in identifying genes responsible for Mendelian, let alone more complex, disorders through MPS.
PMCID: PMC3901305  PMID: 23396133
4.  Genome-wide analyses implicate 33 loci in heritable dog osteosarcoma, including regulatory variants near CDKN2A/B 
Genome Biology  2013;14(12):R132.
Canine osteosarcoma is clinically nearly identical to the human disease, but is common and highly heritable, making genetic dissection feasible.
Through genome-wide association analyses in three breeds (greyhounds, Rottweilers, and Irish wolfhounds), we identify 33 inherited risk loci explaining 55% to 85% of phenotype variance in each breed. The greyhound locus exhibiting the strongest association, located 150 kilobases upstream of the genes CDKN2A/B, is also the most rearranged locus in canine osteosarcoma tumors. The top germline candidate variant is found at a >90% frequency in Rottweilers and Irish wolfhounds, and alters an evolutionarily constrained element that we show has strong enhancer activity in human osteosarcoma cells. In all three breeds, osteosarcoma-associated loci and regions of reduced heterozygosity are enriched for genes in pathways connected to bone differentiation and growth. Several pathways, including one of genes regulated by miR124, are also enriched for somatic copy-number changes in tumors.
Mapping a complex cancer in multiple dog breeds reveals a polygenic spectrum of germline risk factors pointing to specific pathways as drivers of disease.
PMCID: PMC4053774  PMID: 24330828
5.  Copy number expansion of the STX17 duplication in melanoma tissue from Grey horses 
BMC Genomics  2012;13:365.
Greying with age in horses is an autosomal dominant trait, associated with loss of hair pigmentation, melanoma and vitiligo-like depigmentation. We recently identified a 4.6 kb duplication in STX17 to be associated with the phenotype. The aims of this study were to investigate whether the duplication in Grey horses shows copy number variation and to exclude the possibility that any other polymorphism is uniquely associated with the Grey mutation.
We found little evidence for copy number expansion of the duplicated sequence in blood DNA from Grey horses. In contrast, clear evidence of copy number expansion was found in five of the eight tested melanoma tissues or melanoma cell lines, and copy numbers tended to be higher in more aggressive tumours. Massively parallel resequencing of the ~350 kb Grey haplotype did not reveal any additional mutations perfectly associated with the phenotype, confirming the duplication as the true causative mutation. We identified three SNP alleles that were present in a subset of Grey haplotypes within the 350 kb region that shows complete linkage disequilibrium with the causative mutation. Thus, these three nucleotide substitutions must have occurred subsequent to the duplication, consistent with our interpretation that the Grey mutation arose more than 2,000 years before present.
These results suggest that the mutation acts as a melanoma-driving regulatory element. The elucidation of the mechanistic features of the duplication will be of considerable interest for the characterization of these horse melanomas as well as for the field of human melanoma research.
PMCID: PMC3443021  PMID: 22857264
STX17; Melanoma; Hair greying; Copy number variation; Melanocytes
6.  Genetic variants and disease-associated factors contribute to enhanced IRF-5 expression in blood cells of systemic lupus erythematosus patients 
Arthritis and Rheumatism  2010;62(2):562-573.
Genetic variants of the interferon (IFN) regulatory factor 5 (IRF5) gene are associated with systemic lupus erythematosus (SLE) susceptibility. The contribution of these variants to IRF-5 expression in primary blood cells of SLE patients has not been addressed, nor has the role of type I IFN. The aim of this study was to determine the association between increased IRF-5 expression and the IRF5 risk haplotype in SLE patients.
IRF-5 transcript and protein levels in 44 Swedish patients with SLE and 16 healthy controls were measured by quantitative real-time PCR, minigene assay, and flow cytometry. The SNPs rs2004640, rs10954213 and rs10488631 and the CGGGG indel were genotyped in these patients. Genotypes of these polymorphisms defined common risk and protective haplotypes.
IRF-5 expression and alternative splicing were significantly upregulated in SLE patients versus healthy donors. Enhanced transcript and protein levels were associated with the risk haplotype of IRF5; rs10488631 gave the only significant independent association that correlated with increased transcription from non-coding exon 1C. Minigene experiments demonstrated an important role for rs2004640 and the CGGGG indel, along with type I IFNs in regulating IRF-5 expression.
This study provides the first formal proof that IRF-5 expression and alternative splicing are significantly upregulated in primary blood cells of SLE patients. The risk haplotype is associated with enhanced IRF-5 transcript and protein expression in SLE patients.
PMCID: PMC3213692  PMID: 20112383
7.  Identification of Genomic Regions Associated with Phenotypic Variation between Dog Breeds using Selection Mapping 
PLoS Genetics  2011;7(10):e1002316.
The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease.
Author Summary
There are hundreds of dog breeds that exhibit massive differences in appearance and behavior sculpted by tightly controlled selective breeding. This large-scale natural experiment has provided an ideal resource that geneticists can use to search for genetic variants that control these differences. With this goal, we developed a high-density array that surveys variable sites at more than 170,000 positions in the dog genome and used it to analyze genetic variation in 46 breeds. We identify 44 chromosomal regions that are extremely variable between breeds and are likely to control many of the traits that vary between them, including curly tails and sociality. Many other regions also bear the signature of strong artificial selection. We characterize one such region, known to associate with body size and ear type, in detail using “next-generation” sequencing technology to identify candidate mutations that may control these traits. Our results suggest that artificial selection has targeted genes involved in development and metabolism and that it may have increased the incidence of disease in dog breeds. Knowledge of these regions will be of great importance for uncovering the genetic basis of variation between dog breeds and for finding mutations that cause disease.
PMCID: PMC3192833  PMID: 22022279
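Entry 7 scans single breeds for signatures of selective sweeps, characterized in part by long regions of reduced heterozygosity. The sketch below illustrates only that one signal as a minimal sliding-window scan. It is not the authors' pipeline, which combined multiple test statistics. It assumes genotypes coded as 0/1/2 copies of an alternate allele, sorted SNP positions in base pairs, and hypothetical function names and toy data.

```python
from typing import List, Sequence, Tuple


def observed_heterozygosity(genotypes: Sequence[int]) -> float:
    """Fraction of dogs that are heterozygous (coded 1) at a single SNP."""
    return sum(1 for g in genotypes if g == 1) / len(genotypes)


def scan_heterozygosity(
    positions: Sequence[int],
    genotype_matrix: Sequence[Sequence[int]],
    window: int,
    step: int,
) -> List[Tuple[int, float, int]]:
    """Slide a window of `window` bp along one chromosome in `step` bp increments.

    positions: sorted SNP coordinates (bp).
    genotype_matrix: one row per SNP, one 0/1/2 genotype per dog in the breed.
    Returns (window_start, mean_heterozygosity, n_snps) for non-empty windows.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    results: List[Tuple[int, float, int]] = []
    if not positions:
        return results
    start = positions[0]
    while start <= positions[-1]:
        end = start + window
        hets = [
            observed_heterozygosity(row)
            for pos, row in zip(positions, genotype_matrix)
            if start <= pos < end
        ]
        if hets:
            results.append((start, sum(hets) / len(hets), len(hets)))
        start += step
    return results


# Toy data (illustrative only): four SNPs genotyped in four dogs of one breed.
positions = [100, 200, 300, 400]
genotypes = [[1, 1, 0, 2], [1, 0, 1, 1], [0, 0, 0, 0], [2, 2, 2, 2]]
windows = scan_heterozygosity(positions, genotypes, window=200, step=200)
low_het = [w for w in windows if w[1] < 0.1]  # hypothetical cutoff for candidate sweeps
print(windows)  # [(100, 0.625, 2), (300, 0.0, 2)]
```

The 0.1 cutoff is arbitrary. A real scan would typically judge windows against the genome-wide distribution for the breed and would need to separate selection from genetic drift, as the abstract notes.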
8.  Mutation discovery in mice by whole exome sequencing 
Genome Biology  2011;12(9):R86.
We report the development and optimization of reagents for in-solution, hybridization-based capture of the mouse exome. By validating this approach in multiple inbred strains and in novel mutant strains, we show that whole exome sequencing is a robust approach for discovery of putative mutations, irrespective of strain background. We found strong candidate mutations for the majority of mutant exomes sequenced, including new models of orofacial clefting, urogenital dysmorphology, kyphosis and autoimmune hepatitis.
PMCID: PMC3308049  PMID: 21917142
9.  A candidate gene study of the type I interferon pathway implicates IKBKE and IL8 as risk loci for SLE 
Systemic Lupus Erythematosus (SLE) is a systemic autoimmune disease in which the type I interferon pathway has a crucial role. We have previously shown that three genes in this pathway, IRF5, TYK2 and STAT4, are strongly associated with risk for SLE. Here, we investigated 78 genes involved in the type I interferon pathway to identify additional SLE susceptibility loci. First, we genotyped 896 single-nucleotide polymorphisms in these 78 genes and 14 other candidate genes in 482 Swedish SLE patients and 536 controls. Genes with P<0.01 in the initial screen were then followed up in 344 additional Swedish patients and 1299 controls. SNPs in the IKBKE, TANK, STAT1, IL8 and TRAF6 genes gave nominal signals of association with SLE in this extended Swedish cohort. To replicate these findings, we extracted data from a genome-wide association study on SLE performed in a US cohort. Combined analysis of the Swedish and US data, comprising a total of 2136 cases and 9694 controls, implicates IKBKE and IL8 as SLE susceptibility loci (Pmeta=0.00010 and Pmeta=0.00040, respectively). STAT1 was also associated with SLE in this cohort (Pmeta=3.3 × 10^-5), but this association signal appears to be dependent on that previously reported for the neighbouring STAT4 gene. Our study implicates additional genes from the type I interferon system in SLE and highlights genes in this pathway for further functional analysis.
PMCID: PMC3060320  PMID: 21179067
systemic lupus erythematosus; type I interferon system; candidate gene study; single nucleotide polymorphism; IKBKE; IL8
10.  Performance of Microarray and Liquid Based Capture Methods for Target Enrichment for Massively Parallel Sequencing and SNP Discovery 
PLoS ONE  2011;6(2):e16486.
Targeted sequencing is a cost-efficient way to obtain answers to biological questions in many projects, but the choice of the enrichment method to use can be difficult. In this study we compared two hybridization methods for target enrichment for massively parallel sequencing and single nucleotide polymorphism (SNP) discovery, namely Nimblegen sequence capture arrays and the SureSelect liquid-based hybrid capture system. We prepared sequencing libraries from three HapMap samples using both methods, sequenced the libraries on the Illumina Genome Analyzer, mapped the sequencing reads back to the genome, and called variants in the sequences. In the SureSelect libraries, 74–75% of the sequence reads originated from the targeted region, compared with 41–67% in the Nimblegen libraries. We could sequence up to 99.9% and 99.5% of the regions targeted by capture probes from the SureSelect libraries and from the Nimblegen libraries, respectively. The Nimblegen probes covered 0.6 Mb more of the original 3.1 Mb target region than the SureSelect probes. In each sample, we called more SNPs and detected more novel SNPs from the libraries that were prepared using the Nimblegen method. Thus the Nimblegen method gave better results when judged by the number of SNPs called, but this came at the cost of more oversampling.
PMCID: PMC3036585  PMID: 21347407
11.  Identification of the Bovine Arachnomelia Mutation by Massively Parallel Sequencing Implicates Sulfite Oxidase (SUOX) in Bone Development 
PLoS Genetics  2010;6(8):e1001079.
Arachnomelia is a monogenic recessive defect of skeletal development in cattle. The causative mutation was previously mapped to a ∼7 Mb interval on chromosome 5. Here we show that array-based sequence capture and massively parallel sequencing technology, combined with the typical family structure in livestock populations, facilitates the identification of the causative mutation. We re-sequenced the entire critical interval in a healthy partially inbred cow carrying one copy of the critical chromosome segment in its ancestral state and one copy of the same segment with the arachnomelia mutation, and we detected a single heterozygous position. The genetic makeup of several partially inbred cattle provides extremely strong support for the causality of this mutation. The mutation represents a single base insertion leading to a premature stop codon in the coding sequence of the SUOX gene and is perfectly associated with the arachnomelia phenotype. Our findings suggest an important role for sulfite oxidase in bone development.
Author Summary
Arachnomelia is a defect in skeletal development of cattle. Affected calves are born dead with elongated limbs and facial deformities. The causative mutation for this recessive condition had previously been mapped to a ∼7 Mb interval. We exploited the special structure of cattle families to identify the causative mutation by a purely genetic approach. The rich pedigree records in cattle breeding allowed us to identify the founder animal of arachnomelia, a Brown Swiss bull born in 1957. A few generations later, due to inbreeding, several cattle received two copies of the same chromosome segment from the father of this bull. One copy was passed through the founder animal and acquired the causative mutation, while the other copy was transmitted through a different line of animals and stayed in its ancestral state. Using next-generation sequencing, we sequenced the entire critical interval in one of these inbred animals. As expected, we found only a single heterozygous position, which consequently represents the causative mutation for arachnomelia. The mutation affects the gene for sulfite oxidase, thus indicating a previously unrecognized important role for this enzyme in bone development. Our findings can immediately be applied to remove this deleterious mutation from the cattle breeding population.
PMCID: PMC2928811  PMID: 20865119
12.  A risk haplotype of STAT4 for systemic lupus erythematosus is over-expressed, correlates with anti-dsDNA and shows additive effects with two risk alleles of IRF5 
Human Molecular Genetics  2008;17(18):2868-2876.
Systemic lupus erythematosus (SLE) is the prototype autoimmune disease where genes regulated by type I interferon (IFN) are over-expressed and contribute to the disease pathogenesis. Because signal transducer and activator of transcription 4 (STAT4) plays a key role in type I IFN receptor signaling, we performed a candidate gene study of a comprehensive set of single nucleotide polymorphisms (SNPs) in STAT4 in Swedish patients with SLE. We found that 10 of the 53 analyzed SNPs in STAT4 were associated with SLE, with the strongest signal of association (P = 7.1 × 10^-8) for two perfectly linked SNPs, rs10181656 and rs7582694. The risk alleles of these 10 SNPs form a common risk haplotype for SLE (P = 1.7 × 10^-5). According to conditional logistic regression analysis, either rs10181656 or rs7582694 accounts for all of the observed association signal. By quantitative analysis of the allelic expression of STAT4, we found that the risk allele of STAT4 was over-expressed in primary human cells of mesenchymal origin, but not in B-cells, and that it was over-expressed (P = 8.4 × 10^-5) in cells carrying the risk haplotype for SLE compared with cells with a non-risk haplotype. The risk allele of the SNP rs7582694 in STAT4 correlated with production of anti-dsDNA (double-stranded DNA) antibodies and displayed a multiplicatively increased, 1.82-fold risk of SLE with two independent risk alleles of the IRF5 (interferon regulatory factor 5) gene.
PMCID: PMC2525501  PMID: 18579578
13.  Quantitative evaluation by minisequencing and microarrays reveals accurate multiplexed SNP genotyping of whole genome amplified DNA 
Nucleic Acids Research  2003;31(21):e129.
Whole genome amplification (WGA) procedures such as primer extension preamplification (PEP) or multiple displacement amplification (MDA) have the potential to provide an unlimited source of DNA for large-scale genetic studies. We have performed a quantitative evaluation of PEP and MDA for genotyping single nucleotide polymorphisms (SNPs) using multiplex, four-color fluorescent minisequencing in a microarray format. Forty-five SNPs were genotyped and the WGA methods were evaluated with respect to genotyping success, signal-to-noise ratios, power of genotype discrimination, yield and imbalanced amplification of alleles in the MDA product. Both PEP and MDA products provided genotyping results with a high concordance to genomic DNA. For PEP products the power of genotype discrimination was lower than for MDA due to a 2-fold lower signal-to-noise ratio. MDA products were indistinguishable from genomic DNA in all aspects studied. To obtain faithful representation of the SNP alleles, at least 0.3 ng of DNA should be used per MDA reaction. We conclude that the use of WGA, and MDA in particular, is a highly promising procedure for producing DNA in sufficient amounts even for genome-wide SNP mapping studies.
PMCID: PMC275486  PMID: 14576329
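Entry 13 evaluates WGA products by signal-to-noise ratio and power of genotype discrimination. As a rough illustration of how a lower signal-to-noise ratio costs genotype calls, the sketch below classifies a biallelic SNP from two allele-specific fluorescence signals. The function name and the thresholds (`min_snr`, `low`, `high`) are hypothetical, not values from the study.

```python
from typing import Optional


def call_genotype(
    signal_a: float,
    signal_b: float,
    background: float,
    min_snr: float = 3.0,
    low: float = 0.2,
    high: float = 0.8,
) -> Optional[str]:
    """Call a biallelic SNP from two allele-specific fluorescence signals.

    Returns 'AA', 'AB' or 'BB', or None when the stronger signal does not
    exceed the background by at least `min_snr`-fold (no call).
    """
    if background <= 0:
        raise ValueError("background must be positive")
    snr = max(signal_a, signal_b) / background
    if snr < min_snr:
        return None  # too close to the noise level to discriminate genotypes
    fraction_b = signal_b / (signal_a + signal_b)  # share of signal from allele B
    if fraction_b < low:
        return "AA"
    if fraction_b > high:
        return "BB"
    return "AB"


print(call_genotype(1000, 50, 20))  # AA
print(call_genotype(500, 480, 20))  # AB
print(call_genotype(30, 20, 20))    # None (signal-to-noise ratio 1.5)
```

Imbalanced amplification of alleles in an MDA product would show up here as heterozygotes whose `fraction_b` drifts away from 0.5 toward the `low` or `high` cutoffs.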
14.  Multiplex SNP genotyping in pooled DNA samples by a four-colour microarray system 
Nucleic Acids Research  2002;30(14):e70.
We selected 125 candidate single nucleotide polymorphisms (SNPs) in genes belonging to the human type 1 interferon (IFN) gene family and the genes coding for proteins in the main type 1 IFN signalling pathway by screening databases and by in silico comparison of DNA sequences. Using quantitative analysis of pooled DNA samples by solid-phase mini-sequencing, we found that only 20% of the candidate SNPs were polymorphic in the Finnish and Swedish populations. To allow more effective validation of candidate SNPs, we developed a four-colour microarray-based mini-sequencing assay for multiplex, quantitative allele frequency determination in pooled DNA samples. We used cyclic mini-sequencing reactions with primers carrying 5′-tag sequences, followed by capture of the products on microarrays by hybridisation to complementary tag oligonucleotides. Standard curves prepared from mixtures of known amounts of SNP alleles demonstrated the applicability of the system to quantitative analysis, and showed that for about half of the tested SNPs the limit of detection for the minority allele was below 5%. The microarray-based genotyping system established here is universally applicable for genotyping and quantification of any SNP, and the validated system for SNPs in type 1 IFN-related genes should find many applications in genetic studies of this important immunoregulatory pathway.
PMCID: PMC135771  PMID: 12136118

