1.  Detecting Large Copy Number Variants Using Exome Genotyping Arrays In a Large Swedish Schizophrenia Sample 
Molecular psychiatry  2013;18(11):1178-1184.
Although copy number variants (CNVs) are important in genomic medicine, CNVs have not been systematically assessed for many complex traits. Several large rare CNVs increase risk for schizophrenia (SCZ) and autism and often demonstrate pleiotropic effects; however, their frequencies in the general population and other complex traits are unknown. Genotyping large numbers of samples is essential for progress. Large cohorts from many different diseases are being genotyped using exome-focused arrays designed to detect uncommon or rare protein-altering sequence variation. Although these arrays were not designed for CNV detection, the hybridization intensity data generated in each experiment could, in principle, be used for gene-focused CNV analysis. Our goal was to evaluate the extent to which CNVs can be detected using data from one particular exome array (the Illumina Human Exome Bead Chip). We genotyped 9,100 Swedish subjects (3,962 cases with SCZ and 5,138 controls) using both standard GWAS arrays and exome arrays. In comparison to CNVs detected using GWAS arrays, we observed high sensitivity and specificity for detecting genic CNVs ≥400 kb, including known pathogenic CNVs, and we replicated the literature finding that cases with SCZ had greater enrichment for genic CNVs. Our data confirm the association of SCZ with 16p11.2 duplications and 22q11.2 deletions and suggest a novel association with deletions at 11q12.2. Our results suggest the utility of exome-focused arrays in surveying large genic CNVs in very large samples and thereby open the door for new opportunities such as conducting well-powered CNV assessment and comparisons between different diseases. The use of a single platform also minimizes potential confounding factors that could impact accurate detection.
PMCID: PMC3966073  PMID: 23938935
schizophrenia; copy number variation; structural variation; genotyping; Illumina; exome array
2.  Endogenous retinoids in the pathogenesis of alopecia areata 
Alopecia areata (AA) is an autoimmune disease that attacks anagen hair follicles. Gene array in graft-induced C3H/HeJ mice revealed that genes involved in retinoic acid (RA) synthesis were increased, while RA degradation genes were decreased in AA compared to sham controls. This was confirmed by immunohistochemistry in biopsies from patients with AA and both mouse and rat AA models. RA levels were also increased in C3H/HeJ mice with AA. C3H/HeJ mice were fed a purified diet containing one of four levels of dietary vitamin A or an unpurified diet two weeks before grafting, and disease progression was followed. High vitamin A accelerated AA, while mice fed no vitamin A had more severe disease by the end of the study. More hair follicles were in anagen in mice fed high vitamin A. Both the number and localization of granzyme B positive cells were altered by vitamin A. IFNG was also lowest and IL13 highest in mice fed high vitamin A. Other cytokines were reduced and chemokines increased as the disease progressed, but no additional effects of vitamin A were seen. Combined, these results suggest that vitamin A regulates both the hair cycle and immune response to alter the progression of AA.
PMCID: PMC3546144  PMID: 23014334
3.  Improving detection of copy-number variation by simultaneous bias correction and read-depth segmentation 
Nucleic Acids Research  2012;41(3):1519-1532.
Structural variation is an important class of genetic variation in mammals. High-throughput sequencing (HTS) technologies promise to revolutionize copy-number variation (CNV) detection but present substantial analytic challenges. Converging evidence suggests that multiple types of CNV-informative data (e.g. read-depth, read-pair, split-read) need to be considered, and that sophisticated methods are needed for more accurate CNV detection. We observed that various sources of experimental bias in HTS confound read-depth estimation, and note that bias correction has not been adequately addressed by existing methods. We present a novel read-depth–based method, GENSENG, which uses a hidden Markov model and negative binomial regression framework to identify regions of discrete copy-number changes while simultaneously accounting for the effects of multiple confounders. Based on extensive calibration using multiple HTS data sets, we conclude that our method outperforms existing read-depth–based CNV detection algorithms. The concept of simultaneous bias correction and CNV detection can serve as a basis for combining read-depth with other types of information such as read-pair or split-read in a single analysis. A user-friendly and computationally efficient implementation of our method is freely available.
PMCID: PMC3561969  PMID: 23275535
4.  Association of Candidate Genes with Phenotypic Traits Relevant to Anorexia Nervosa 
European Eating Disorders Review  2011;19(6):487-493.
This analysis is a follow-up to an earlier investigation of 182 genes selected as likely candidate genetic variations conferring susceptibility to anorexia nervosa (AN). As those initial case-control results revealed no statistically significant differences in single nucleotide polymorphisms, herein we investigate alternative phenotypes associated with AN. In 1762 females using regression analyses we examined: (1) lowest illness-related attained body mass index; (2) age at menarche; (3) drive for thinness; (4) body dissatisfaction; (5) trait anxiety; (6) concern over mistakes; and (7) the anticipatory worry and pessimism vs. uninhibited optimism subscale of the harm avoidance scale. After controlling for multiple comparisons, no statistically significant results emerged. Although results must be viewed in the context of limitations of statistical power, the approach illustrates a means of potentially identifying genetic variants conferring susceptibility to AN because less complex phenotypes associated with AN are more proximal to the genotype and may be influenced by fewer genes.
PMCID: PMC3261131  PMID: 21780254
covariates; eating disorders; association studies; personality; genetic
5.  Patterns of Recombination Activity on Mouse Chromosome 11 Revealed by High Resolution Mapping 
PLoS ONE  2010;5(12):e15340.
The success of high-resolution genetic mapping of disease predisposition and quantitative trait loci in humans and experimental animals depends on the positions of key crossover events around the gene of interest. In mammals, the majority of recombination occurs at highly delimited 1–2 kb long sites known as recombination hotspots, whose locations and activities are distributed unevenly along the chromosomes and are tightly regulated in a sex-specific manner. The factors determining the location of hotspots started to emerge with the finding of PRDM9 as a major hotspot regulator in mammals; however, additional factors modulating hotspot activity and sex specificity are yet to be defined. To address this limitation, we have collected and mapped the locations of 4829 crossover events occurring on mouse chromosome 11 in 5858 meioses of male and female reciprocal F1 hybrids of C57BL/6J and CAST/EiJ mice. This chromosome was chosen for its medium size and high gene density and provided a comparison with our previous analysis of recombination on the longest mouse chromosome 1. Crossovers were mapped to an average resolution of 127 kb, and thirteen hotspots were mapped to <8 kb. Most crossovers occurred in a small number of the most active hotspots. Females had a higher recombination rate than males as a consequence of differences in crossover interference and regional variation of sex-specific rates along the chromosome. Comparison with chromosome 1 showed that recombination events tend to be positioned in similar fashion along the centromere-telomere axis but independently of the local gene density. It appears that mammalian recombination is regulated on at least three levels, chromosome-wide, regional, and at individual hotspots, and these regulation levels are influenced by sex and genetic background but not by gene content.
PMCID: PMC2999565  PMID: 21170346
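The crossover and meiosis counts quoted in the abstract above allow a rough back-of-envelope estimate of the sex-averaged genetic length of chromosome 11 in this cross. The sketch below is illustrative only: the function name `map_length_cm` is hypothetical, the result is not a figure reported in the abstract, and it assumes that each scored meiosis corresponds to one gamete and that every crossover was detected.

```python
def map_length_cm(crossovers: int, meioses: int) -> float:
    """Approximate genetic map length in centimorgans (cM).

    Assumption (not from the abstract): map length in Morgans equals the
    mean number of crossovers observed per scored meiosis.
    """
    if meioses <= 0:
        raise ValueError("meioses must be positive")
    return 100.0 * crossovers / meioses


# Counts taken from the abstract: 4829 crossovers in 5858 meioses.
chr11_cm = map_length_cm(4829, 5858)
print(f"Chromosome 11 (sex-averaged, approximate): {chr11_cm:.1f} cM")
```

Under these assumptions the estimate comes out at roughly 82 cM; sex-specific lengths would need the male and female counts separately, which the abstract does not give.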
6.  A survey of airway responsiveness in 36 inbred mouse strains facilitates gene mapping studies and identification of quantitative trait loci 
Airway hyper-responsiveness (AHR) is a critical phenotype of human asthma and animal models of asthma. Other studies have measured AHR in nine mouse strains, but only six strains have been used to identify genetic loci underlying AHR. Our goals were to increase the genetic diversity of available strains by surveying 27 additional strains, to apply haplotype association mapping to the 36-strain survey, and to identify new genetic determinants for AHR. We derived AHR from the increase in airway resistance in females subjected to increasing concentrations of methacholine. We used haplotype association mapping to identify associations between AHR and haplotypes on chromosomes 3, 5, 8, 12, 13, and 14. We also used bioinformatics techniques to narrow the identified region on chromosome 13, reducing the region to 29 candidate genes, with 11 of considerable interest. Our combined use of haplotype association mapping with bioinformatics tools is the first study of its kind for AHR on these 36 strains of mice. Our analyses have narrowed the possible QTL genes and will facilitate the discovery of novel genes that regulate AHR in mice.
PMCID: PMC2885868  PMID: 20143096
Mice; Genetics; Asthma
7.  CGDSNPdb: a database resource for error-checked and imputed mouse SNPs 
The Center for Genome Dynamics Single Nucleotide Polymorphism Database (CGDSNPdb) is an open-source value-added database with more than nine million mouse single nucleotide polymorphisms (SNPs), drawn from multiple sources, with genotypes assigned to multiple inbred strains of laboratory mice. All SNPs are checked for accuracy and annotated for properties specific to the SNP as well as those implied by changes to overlapping protein-coding genes. CGDSNPdb serves as the primary interface to two unique data sets, the ‘imputed genotype resource’ in which a Hidden Markov Model was used to assess local haplotypes and the most probable base assignment at several million genomic loci in tens of strains of mice, and the Affymetrix Mouse Diversity Genotyping Array, a high density microarray with over 600 000 SNPs and over 900 000 invariant genomic probes. CGDSNPdb is accessible online through either a web-based query tool or a MySQL public login.
Database URL:
PMCID: PMC2911843  PMID: 20624716
8.  Otitis media: a genome-wide linkage scan with evidence of susceptibility loci within the 17q12 and 10q22.3 regions 
BMC Medical Genetics  2009;10:85.
Otitis media (OM) is a common worldwide pediatric health care problem that is known to be influenced by genetics. The objective of our study was to use linkage analysis to map possible OM susceptibility genes.
Using a stringent diagnostic model in which only those who underwent tympanostomy tube insertion at least once for recurrent/persistent OM are considered affected, we have carried out a genome-wide linkage scan using the 10K Affymetrix SNP panel. We genotyped 403 Caucasian families containing 1,431 genotyped individuals and 377 genotyped affected sib pairs, and 26 African American families containing 75 genotyped individuals and 27 genotyped affected sib pairs. After careful quality control, non-parametric linkage analysis was carried out using 8,802 SNPs.
In the Caucasian-only data set, our most significant linkage peak is on chromosome 17q12 at rs226088 with a p-value of 0.00007. Other peaks of potential interest are on 10q22.3 (0.00181 at rs1878001), 7q33 (0.00105 at rs958408), 6p25.1 (0.00261 at rs554653), and 4p15.2 (0.00301 at rs2133507). In the combined Caucasian and African American dataset, the 10q22.3 peak becomes more significant, with a minimal p-value of 0.00026 at rs719871. Family-based association testing reveals signals near previously implicated genes: 513 kb from SFTPA2 (10q22.3), 48 kb from IFNG (12q14), and 870 kb from TNF (6p21.3).
Our scan does not provide evidence for linkage in the previously reported regions of 10q26.3 and 19q13.43. Our best-supported linkage regions may contain susceptibility genes that influence the risk for recurrent/persistent OM. Plausible candidates in 17q12 include AP2B1, CCL5, and a cluster of other CCL genes, and in 10q22.3, SFTPA2.
PMCID: PMC2751750  PMID: 19728873
9.  An imputed genotype resource for the laboratory mouse 
We have created a high-density SNP resource encompassing 7.87 million polymorphic loci across 49 inbred strains of the laboratory mouse by combining data available from public databases and training a hidden Markov model to impute missing genotypes in the combined data. The strong linkage disequilibrium found in dense sets of SNP markers in the laboratory mouse provides the basis for accurate imputation. Using genotypes from eight independent SNP resources, we empirically validated the quality of the imputed genotypes and demonstrated that they are highly reliable for most inbred strains. The imputed SNP resource will be useful for studies of natural variation and complex traits. It will facilitate association study designs by providing high-density SNP genotypes for large numbers of mouse strains. We anticipate that this resource will continue to evolve as new genotype data become available for laboratory mouse strains. The data are available for bulk download or query at
PMCID: PMC2725522  PMID: 18301946
mouse; SNP; hidden Markov model; missing data
10.  The Recombinational Anatomy of a Mouse Chromosome 
PLoS Genetics  2008;4(7):e1000119.
Among mammals, genetic recombination occurs at highly delimited sites known as recombination hotspots. They are typically 1–2 kb long and vary as much as 1,000-fold or more in recombination activity. Although much is known about the molecular details of the recombination process itself, the factors determining the location and relative activity of hotspots are poorly understood. To further our understanding, we have collected and mapped the locations of 5,472 crossover events along mouse Chromosome 1 arising in 6,028 meioses of male and female reciprocal F1 hybrids of C57BL/6J and CAST/EiJ mice. Crossovers were mapped to a minimum resolution of 225 kb, and those in the telomere-proximal 24.7 Mb were further mapped to resolve individual hotspots. Recombination rates were evolutionarily conserved on a regional scale, but not at the local level. There was a clear negative-exponential relationship between the relative activity and abundance of hotspot activity classes, such that a small number of the most active hotspots account for the majority of recombination. Females had 1.2× higher overall recombination than males did, although the sex ratio showed considerable regional variation. Locally, entirely sex-specific hotspots were rare. The initiation of recombination at the most active hotspot was regulated independently on the two parental chromatids, and analysis of reciprocal crosses indicated that parental imprinting has subtle effects on recombination rates. It appears that the regulation of mammalian recombination is a complex, dynamic process involving multiple factors reflecting species, sex, individual variation within species, and the properties of individual hotspots.
Author Summary
In most eukaryotic organisms, recombination—the exchange of genetic information between homologous chromosomes—ensures the proper recognition and segregation of chromosomes during meiosis. Recombination events in mammals are not randomly positioned along the chromosomes but occur in preferential 1–2-kilobase sequences termed hotspots. Different species such as humans and mice do not share hotspots, although the same principles almost certainly regulate their placement in the genome. Hotspot positions and activities depend on genetic background and show sex-specific differences. In this study, we present a detailed analysis of recombination activity along the largest mouse chromosome, finding that recombination is regulated on multiple levels, including regional positioning relative to the chromosomal ends, local gene content, sex-specific mechanisms of hotspot recognition, and parental origin. Our results will contribute to further understanding of one of the most fundamental biological processes and are likely to cast light on several aspects of population genetics and evolutionary biology, as well as enhance our practical ability to define the genetic components of human disease.
PMCID: PMC2440539  PMID: 18617997

