1.  Strategies to Design and Analyze Targeted Sequencing Data: The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Targeted Sequencing Study 
Genome-wide association studies (GWAS) have identified thousands of genetic variants that influence a variety of diseases and health-related quantitative traits. However, the causal variants underlying the majority of genetic associations remain unknown. The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Targeted Sequencing Study aims to follow up GWAS signals and identify novel associations of the allelic spectrum of identified variants with cardiovascular related traits.
Methods and Results
The study included 4,231 participants from three CHARGE cohorts: the Atherosclerosis Risk in Communities Study, the Cardiovascular Health Study, and the Framingham Heart Study. We used a case-cohort design in which we selected both a random sample of participants and participants with extreme phenotypes for each of 14 traits. We sequenced and analyzed 77 genomic loci, which had previously been associated with one or more of 14 phenotypes. A total of 52,736 variants were characterized by sequencing and passed our stringent quality control criteria. For common variants (minor allele frequency ≥1%), we performed unweighted regression analyses to obtain p-values for associations and weighted regression analyses to obtain effect estimates that accounted for the sampling design. For rare variants, we applied two approaches: collapsed aggregate statistics and joint analysis of variants using the Sequence Kernel Association Test.
We sequenced 77 genomic loci in participants from three cohorts. We established a set of filters to identify high-quality variants and implemented statistical and bioinformatics strategies to analyze the sequence data and identify potentially functional variants within GWAS loci.
PMCID: PMC4176824  PMID: 24951659
genetics; epidemiology; CHARGE; sampling; targeted sequencing
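The weighted analysis mentioned in the abstract above can be illustrated with a small sketch. It assumes inverse-probability weighting (each participant weighted by one over their probability of being sampled) and a single-variant linear model. The abstract does not state the actual weights or model, so the function name `weighted_slope` and its inputs are hypothetical.

```python
def weighted_slope(dosages, phenotypes, sampling_probs):
    """Weighted least-squares fit of phenotype on allele dosage.

    Assumption (not from the abstract): each participant is weighted by
    1 / sampling probability, so over-sampled extreme-phenotype
    participants count for less than randomly sampled ones.
    Returns (intercept, slope).
    """
    weights = [1.0 / p for p in sampling_probs]
    total = sum(weights)
    # Weighted means of the predictor and the outcome.
    g_mean = sum(w * g for w, g in zip(weights, dosages)) / total
    y_mean = sum(w * y for w, y in zip(weights, phenotypes)) / total
    # Weighted cross-product and weighted sum of squares.
    sxy = sum(w * (g - g_mean) * (y - y_mean)
              for w, g, y in zip(weights, dosages, phenotypes))
    sxx = sum(w * (g - g_mean) ** 2 for w, g in zip(weights, dosages))
    slope = sxy / sxx
    intercept = y_mean - slope * g_mean
    return intercept, slope


# Toy data: phenotype is exactly 2.0 + 0.5 * dosage.
toy_dosages = [0, 1, 2, 0, 1]
toy_phenotypes = [2.0, 2.5, 3.0, 2.0, 2.5]
toy_probs = [1.0, 0.2, 1.0, 0.2, 1.0]  # hypothetical sampling probabilities
intercept, slope = weighted_slope(toy_dosages, toy_phenotypes, toy_probs)
```

An unweighted fit is the special case in which every sampling probability is equal, which is consistent with the abstract's use of unweighted models for p-values and weighted models for effect estimates.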
2.  Whole-Genome Sequencing in a Patient with Charcot–Marie–Tooth Neuropathy 
The New England journal of medicine  2010;362(13):1181-1191.
Whole-genome sequencing may revolutionize medical diagnostics through rapid identification of alleles that cause disease. However, even in cases with simple patterns of inheritance and unambiguous diagnoses, the relationship between disease phenotypes and their corresponding genetic changes can be complicated. Comprehensive diagnostic assays must therefore identify all possible DNA changes in each haplotype and determine which are responsible for the underlying disorder. The high number of rare, heterogeneous mutations present in all humans and the paucity of known functional variants in more than 90% of annotated genes make this challenge particularly difficult. Thus, the identification of the molecular basis of a genetic disease by means of whole-genome sequencing has remained elusive. We therefore aimed to assess the usefulness of human whole-genome sequencing for genetic diagnosis in a patient with Charcot–Marie–Tooth disease.
We identified a family with a recessive form of Charcot–Marie–Tooth disease for which the genetic basis had not been identified. We sequenced the whole genome of the proband, identified all potential functional variants in genes likely to be related to the disease, and genotyped these variants in the affected family members.
We identified and validated compound heterozygous causative alleles in SH3TC2 (the SH3 domain and tetratricopeptide repeats 2 gene), involving two mutations, in the proband and in family members affected by Charcot–Marie–Tooth disease. Separate subclinical phenotypes segregated independently with each of the two mutations; heterozygous mutations confer susceptibility to neuropathy, including carpal tunnel syndrome.
As shown in this study of a family with Charcot–Marie–Tooth disease, whole-genome sequencing can identify clinically relevant variants and provide diagnostic information to inform the care of patients.
PMCID: PMC4036802  PMID: 20220177
3.  Low-frequency and rare exome chip variants associate with fasting glucose and type 2 diabetes susceptibility 
Wessel, Jennifer | Chu, Audrey Y. | Willems, Sara M. | Wang, Shuai | Yaghootkar, Hanieh | Brody, Jennifer A. | Dauriz, Marco | Hivert, Marie-France | Raghavan, Sridharan | Lipovich, Leonard | Hidalgo, Bertha | Fox, Keolu | Huffman, Jennifer E. | An, Ping | Lu, Yingchang | Rasmussen-Torvik, Laura J. | Grarup, Niels | Ehm, Margaret G. | Li, Li | Baldridge, Abigail S. | Stančáková, Alena | Abrol, Ravinder | Besse, Céline | Boland, Anne | Bork-Jensen, Jette | Fornage, Myriam | Freitag, Daniel F. | Garcia, Melissa E. | Guo, Xiuqing | Hara, Kazuo | Isaacs, Aaron | Jakobsdottir, Johanna | Lange, Leslie A. | Layton, Jill C. | Li, Man | Zhao, Jing Hua | Meidtner, Karina | Morrison, Alanna C. | Nalls, Mike A. | Peters, Marjolein J. | Sabater-Lleal, Maria | Schurmann, Claudia | Silveira, Angela | Smith, Albert V. | Southam, Lorraine | Stoiber, Marcus H. | Strawbridge, Rona J. | Taylor, Kent D. | Varga, Tibor V. | Allin, Kristine H. | Amin, Najaf | Aponte, Jennifer L. | Aung, Tin | Barbieri, Caterina | Bihlmeyer, Nathan A. | Boehnke, Michael | Bombieri, Cristina | Bowden, Donald W. | Burns, Sean M. | Chen, Yuning | Chen, Yii-Der I. | Cheng, Ching-Yu | Correa, Adolfo | Czajkowski, Jacek | Dehghan, Abbas | Ehret, Georg B. | Eiriksdottir, Gudny | Escher, Stefan A. | Farmaki, Aliki-Eleni | Frånberg, Mattias | Gambaro, Giovanni | Giulianini, Franco | Goddard, William A. III | Goel, Anuj | Gottesman, Omri | Grove, Megan L. | Gustafsson, Stefan | Hai, Yang | Hallmans, Göran | Heo, Jiyoung | Hoffmann, Per | Ikram, Mohammad K. | Jensen, Richard A. | Jørgensen, Marit E. | Jørgensen, Torben | Karaleftheri, Maria | Khor, Chiea C. | Kirkpatrick, Andrea | Kraja, Aldi T. | Kuusisto, Johanna | Lange, Ethan M. | Lee, I.T. | Lee, Wen-Jane | Leong, Aaron | Liao, Jiemin | Liu, Chunyu | Liu, Yongmei | Lindgren, Cecilia M. | Linneberg, Allan | Malerba, Giovanni | Mamakou, Vasiliki | Marouli, Eirini | Maruthur, Nisa M. | Matchan, Angela | McKean, Roberta | McLeod, Olga | Metcalf, Ginger A. | Mohlke, Karen L. | Muzny, Donna M. | Ntalla, Ioanna | Palmer, Nicholette D. | Pasko, Dorota | Peter, Andreas | Rayner, Nigel W. | Renström, Frida | Rice, Ken | Sala, Cinzia F. | Sennblad, Bengt | Serafetinidis, Ioannis | Smith, Jennifer A. | Soranzo, Nicole | Speliotes, Elizabeth K. | Stahl, Eli A. | Stirrups, Kathleen | Tentolouris, Nikos | Thanopoulou, Anastasia | Torres, Mina | Traglia, Michela | Tsafantakis, Emmanouil | Javad, Sundas | Yanek, Lisa R. | Zengini, Eleni | Becker, Diane M. | Bis, Joshua C. | Brown, James B. | Cupples, L. Adrienne | Hansen, Torben | Ingelsson, Erik | Karter, Andrew J. | Lorenzo, Carlos | Mathias, Rasika A. | Norris, Jill M. | Peloso, Gina M. | Sheu, Wayne H.-H. | Toniolo, Daniela | Vaidya, Dhananjay | Varma, Rohit | Wagenknecht, Lynne E. | Boeing, Heiner | Bottinger, Erwin P. | Dedoussis, George | Deloukas, Panos | Ferrannini, Ele | Franco, Oscar H. | Franks, Paul W. | Gibbs, Richard A. | Gudnason, Vilmundur | Hamsten, Anders | Harris, Tamara B. | Hattersley, Andrew T. | Hayward, Caroline | Hofman, Albert | Jansson, Jan-Håkan | Langenberg, Claudia | Launer, Lenore J. | Levy, Daniel | Oostra, Ben A. | O'Donnell, Christopher J. | O'Rahilly, Stephen | Padmanabhan, Sandosh | Pankow, James S. | Polasek, Ozren | Province, Michael A. | Rich, Stephen S. | Ridker, Paul M | Rudan, Igor | Schulze, Matthias B. | Smith, Blair H. | Uitterlinden, André G. | Walker, Mark | Watkins, Hugh | Wong, Tien Y. | Zeggini, Eleftheria | Generation Scotland | Laakso, Markku | Borecki, Ingrid B. | Chasman, Daniel I. | Pedersen, Oluf | Psaty, Bruce M. | Tai, E. Shyong | van Duijn, Cornelia M. | Wareham, Nicholas J. | Waterworth, Dawn M. | Boerwinkle, Eric | Kao, WH Linda | Florez, Jose C. | Loos, Ruth J.F. | Wilson, James G. | Frayling, Timothy M. | Siscovick, David S. | Dupuis, Josée | Rotter, Jerome I. | Meigs, James B. | Scott, Robert A. | Goodarzi, Mark O.
Nature communications  2015;6:5897.
Fasting glucose (FG) and insulin are intermediate traits for type 2 diabetes (T2D). Here we explore the role of coding variation on these traits by analysis of variants on the HumanExome BeadChip in 60,564 non-diabetic individuals and in 16,491 T2D cases and 81,877 controls. We identify a novel association of a low-frequency nonsynonymous SNV in GLP1R (A316T; rs10305492; MAF=1.4%) with lower FG (β=-0.09±0.01 mmol/L, p=3.4×10^−12), T2D risk (OR[95%CI]=0.86[0.76-0.96], p=0.010), early insulin secretion (β=-0.07±0.035 pmol insulin per mmol glucose, p=0.048), but higher 2-h glucose (β=0.16±0.05 mmol/L, p=4.3×10^−4). We identify a gene-based association with FG at G6PC2 (pSKAT=6.8×10^−6) driven by four rare protein-coding SNVs (H177Y, Y207S, R283X and S324P). We identify rs651007 (MAF=20%) in the first intron of ABO at the putative promoter of an antisense lncRNA, associating with higher FG (β=0.02±0.004 mmol/L, p=1.3×10^−8). Our approach identifies novel coding variant associations and extends the allelic spectrum of variation underlying diabetes-related quantitative traits and T2D susceptibility.
PMCID: PMC4311266  PMID: 25631608
4.  Lucilia cuprina genome unlocks parasitic fly biology to underpin future interventions 
Nature Communications  2015;6:7344.
Lucilia cuprina is a parasitic fly of major economic importance worldwide. Larvae of this fly invade their animal host, feed on tissues and excretions and progressively cause severe skin disease (myiasis). Here we report the sequence and annotation of the 458-megabase draft genome of Lucilia cuprina. Analyses of this genome and the 14,544 predicted protein-encoding genes provide unique insights into the fly's molecular biology, interactions with the host animal and insecticide resistance. These insights have broad implications for designing new methods for the prevention and control of myiasis.
Lucilia cuprina is a parasitic blowfly of major economic importance worldwide that feeds on the tissues of animals such as sheep. Here, the authors sequence the genome of L. cuprina and provide insights into the fly's molecular biology, interactions with the host animal and insecticide resistance.
PMCID: PMC4491171  PMID: 26108605
Neuro-Oncology  2014;16(Suppl 3):iii24.
BACKGROUND: Additional insight into the molecular alterations driving pediatric central nervous system (CNS) tumors is urgently needed, given the significant morbidity and mortality associated with these cancers and the relative paucity of effective chemotherapeutic options. Advances in sequencing technologies now allow for provision of genome-scale data to oncologists caring for pediatric cancer patients but current experience with the clinical application of genomic sequencing is limited. The goal of the BASIC3 (Baylor Advancing Sequencing into Childhood Cancer Care) study is to determine the clinical impact of incorporating CLIA-certified tumor and constitutional whole exome sequencing (WES) into the care of children with newly diagnosed solid tumors. METHODS: The study follows pediatric patients with newly diagnosed CNS and non-CNS solid tumors (target enrollment n = 280) at Texas Children's Cancer Center for two years after performing CLIA-certified whole exome sequencing (WES) of blood and frozen tumor samples. Results are deposited into the electronic medical record and disclosed to families by their oncologist and a genetic counselor. The potential impact of tumor exome findings on clinical decision-making is assessed through review of the medical record and through surveys of the oncologists regarding prioritization of treatment options in the hypothetical event of tumor recurrence. RESULTS: To date, 133 subjects have been enrolled, including 47 patients with CNS tumors (35%) comprising a diverse representation of diagnoses. Despite limited diagnostic biopsies in many patients, tumor samples adequate for WES were obtained from 33/47 (70%) patients. Tumor WES results have been reported for the first 22 CNS tumors, revealing a median of 7 (range of 0 to 25) protein-altering mutations per tumor, including alterations of known cancer genes such as ARID1A, SMARCA4, BRAF, CTNNB1, DDX3X, NF2, FANCA, and NOTCH3. 
Notably, 12/22 (55%) tumors were found to harbour mutations only in genes not known to be recurrently altered in human cancers. CONCLUSIONS: These results demonstrate the feasibility of routine tumor WES in the pediatric neuro-oncology clinic. Potentially clinically relevant mutations can be identified in a substantial proportion of patients, but early results suggest that integration of parallel genomic technologies (e.g. RNAseq) to identify genetic alterations not detectable by WES will be necessary; such studies are ongoing. Orthotopic xenograft models and cell lines are being established to allow in vitro and in vivo analysis of tumors containing alterations of interest. Data further assessing the clinical utility of the tumor exomes are under study. Supported by NHGRI/NCI 1U01HG006485. SECONDARY CATEGORY: Neuropathology & Tumor Biomarkers.
PMCID: PMC4144597
6.  Secondary findings and carrier test frequencies in a large multiethnic sample 
Genome Medicine  2015;7(1):54.
Besides its growing importance in clinical diagnostics and understanding the genetic basis of Mendelian and complex diseases, whole exome sequencing (WES) is a rich source of additional information of potential clinical utility for physicians, patients and their families. We analyzed the frequency and nature of single nucleotide variants (SNVs) considered secondary findings and recessive disease allele carrier status in the exomes of 8554 individuals from a large, randomly sampled cohort study and 2514 patients from a study of presumed Mendelian disease having undergone WES.
We used the same sequencing platform and data processing pipeline to analyze all samples and characterized the distributions of reported pathogenic (ClinVar, Human Gene Mutation Database (HGMD)) and predicted deleterious variants in the pre-specified American College of Medical Genetics and Genomics (ACMG) secondary findings and recessive disease genes in different ethnic groups.
In the 56 ACMG secondary findings genes, the average number of predicted deleterious variants per individual was 0.74, and the mean number of ClinVar reported pathogenic variants was 0.06. We observed an average of 10 deleterious and 0.78 ClinVar reported pathogenic variants per individual in 1423 autosomal recessive disease genes. By repeatedly sampling pairs of exomes, 0.5 % of the randomly generated couples were at 25 % risk of having an affected offspring for an autosomal recessive disorder based on the ClinVar variants.
By investigating reported pathogenic and novel, predicted deleterious variants we estimated the lower and upper limits of the population fraction for which exome sequencing may reveal additional medically relevant information. We suggest that the observed wide range for the lower and upper limits of these frequency numbers will be gradually reduced due to improvement in classification databases and prediction algorithms.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-015-0171-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4507324  PMID: 26195989
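The couple-sampling step described in the secondary findings abstract above can be sketched as a simple Monte Carlo estimate. The sketch assumes a couple counts as being at 25 % risk when both partners carry a reported pathogenic allele in the same autosomal recessive gene. That rule, the function name `at_risk_couple_fraction`, and the toy carrier sets are illustrative assumptions, not the study's actual procedure or data.

```python
import random


def at_risk_couple_fraction(carrier_genes, n_pairs=10000, seed=0):
    """Estimate the fraction of random couples sharing a carrier gene.

    carrier_genes: list with one set per individual, holding the recessive
    disease genes in which that individual carries a pathogenic allele.
    Assumption: a couple is "at risk" if the two sets overlap.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    at_risk = 0
    for _ in range(n_pairs):
        # Draw two distinct individuals to form one hypothetical couple.
        first, second = rng.sample(carrier_genes, 2)
        if first & second:
            at_risk += 1
    return at_risk / n_pairs


# Toy cohort with hypothetical carrier sets (gene symbols for illustration).
toy_cohort = [{"CFTR"}, set(), {"HBB"}, {"CFTR", "GJB2"}, set(), {"GJB2"}]
estimate = at_risk_couple_fraction(toy_cohort, n_pairs=2000)
```

Sampling without regard to sex or ancestry, as done here, is a simplification; a fuller analysis would likely restrict pairs and account for variant-level evidence.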
7.  ADAM19 and HTR4 Variants and Pulmonary Function: The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Targeted Sequencing Study 
The pulmonary function measures of forced expiratory volume in one second (FEV1) and its ratio to forced vital capacity (FVC) are used in the diagnosis and monitoring of lung diseases and predict cardiovascular mortality in the general population. Genome wide association studies (GWAS) have identified numerous loci associated with FEV1 and FEV1/FVC but the causal variants remain uncertain. We hypothesized that novel or rare variants poorly tagged by GWAS may explain the significant associations between FEV1/FVC and two genes: ADAM19 and HTR4.
Methods and Results
We sequenced ADAM19 and its promoter region along with the approximately 21 kb portion of HTR4 harboring GWAS SNPs for pulmonary function and analyzed associations with FEV1/FVC among 3,983 participants of European ancestry from Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE). Meta-analysis of common variants in each region identified statistically significant associations (316 tests, P < 1.58×10^−4) with FEV1/FVC for 14 ADAM19 SNPs and 24 HTR4 SNPs. After conditioning on the sentinel GWAS hit in each gene [ADAM19 rs1422795, minor allele frequency (MAF)=0.33 and HTR4 rs11168048, MAF=0.40] one SNP remained statistically significant (ADAM19 rs13155908, MAF = 0.12, P = 1.56×10^−4). Analysis of rare variants (MAF < 1%) using Sequence Kernel Association Test did not identify associations with either region.
Sequencing identified one common variant associated with FEV1/FVC independently of the sentinel ADAM19 GWAS hit and supports the original HTR4 GWAS findings. Rare variants do not appear to underlie GWAS associations with pulmonary function for common variants in ADAM19 and HTR4.
PMCID: PMC4136502  PMID: 24951661
genetic polymorphism; lung; population studies; DNA sequencing; Genome Wide Association Study
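The significance threshold quoted in the ADAM19 and HTR4 abstract above is consistent with a Bonferroni correction over 316 tests. The sketch below reproduces that arithmetic, assuming a family-wise error rate of 0.05, which the abstract does not state explicitly.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumed family-wise alpha of 0.05; 316 tests as reported in the abstract.
threshold = bonferroni_threshold(0.05, 316)
print(f"{threshold:.2e}")  # 1.58e-04

# The conditional P value reported for ADAM19 rs13155908 falls just below it.
conditional_p = 1.56e-4
print(conditional_p < threshold)  # True
```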
8.  Global transcriptional disturbances underlie Cornelia de Lange syndrome and related phenotypes 
Cornelia de Lange syndrome (CdLS) is a genetically heterogeneous disorder that presents with extensive phenotypic variability, including facial dysmorphism, developmental delay/intellectual disability (DD/ID), abnormal extremities, and hirsutism. About 65% of patients harbor mutations in genes that encode subunits or regulators of the cohesin complex, including NIPBL, SMC1A, SMC3, RAD21, and HDAC8. Wiedemann-Steiner syndrome (WDSTS), which shares CdLS phenotypic features, is caused by mutations in lysine-specific methyltransferase 2A (KMT2A). Here, we performed whole-exome sequencing (WES) of 2 male siblings clinically diagnosed with WDSTS; this revealed a hemizygous, missense mutation in SMC1A that was predicted to be deleterious. Extensive clinical evaluation and WES of 32 Turkish patients clinically diagnosed with CdLS revealed the presence of a de novo heterozygous nonsense KMT2A mutation in 1 patient without characteristic WDSTS features. We also identified de novo heterozygous mutations in SMC3 or SMC1A that affected RNA splicing in 2 independent patients with combined CdLS and WDSTS features. Furthermore, in families from 2 separate world populations segregating an autosomal-recessive disorder with CdLS-like features, we identified homozygous mutations in TAF6, which encodes a core transcriptional regulatory pathway component. Together, our data, along with recent transcriptome studies, suggest that CdLS and related phenotypes may be “transcriptomopathies” rather than cohesinopathies.
PMCID: PMC4319410  PMID: 25574841
9.  Microcephaly, Epilepsy and Neonatal Diabetes Due to Compound Heterozygous Mutations in IER3IP1: Insights into the Natural History of a Rare Disorder 
Pediatric diabetes  2013;15(3):252-256.
Neonatal diabetes mellitus is known to have over 20 different monogenic causes. A syndrome of permanent neonatal diabetes along with primary microcephaly with simplified gyral pattern, associated with severe infantile epileptic encephalopathy was recently described in two independent reports in which disease-causing homozygous mutations were identified in the immediate early response-3 interacting protein-1 (IER3IP1) gene. We report here an affected male born to a non-consanguineous couple who was noted to have insulin-requiring permanent neonatal diabetes, microcephaly, and generalized seizures. He was also found to have cortical blindness, severe developmental delay and numerous dysmorphic features. He experienced a slow improvement but not abrogation of seizure frequency and severity on numerous anti-epileptic agents. His clinical course was further complicated by recurrent respiratory tract infections and he died at 8 years of age.
Whole exome sequencing was performed on DNA from the proband and parents. He was found to be a compound heterozygote with two different mutations in IER3IP1: p.Val21Gly (V21G) and a novel frameshift mutation p.Phe27fsSer*25. IER3IP1 is a highly conserved protein with marked expression in the cerebral cortex and in beta cells. This is the first reported case of compound heterozygous mutations within IER3IP1 resulting in neonatal diabetes. The triad of microcephaly, generalized seizures and permanent neonatal diabetes should prompt screening for mutations in IER3IP1. Since mutations in genes such as NEUROD1 and PTF1A could cause a similar phenotype, next-generation sequencing approaches – such as exome sequencing reported here – may be an efficient means of uncovering a diagnosis in future cases.
PMCID: PMC3994177  PMID: 24138066
10.  Human CLP1 mutations alter tRNA biogenesis affecting both peripheral and central nervous system function 
Cell  2014;157(3):636-650.
CLP1 is an RNA kinase involved in tRNA splicing. Recently, CLP1 kinase-dead mice were shown to display a neuromuscular disorder with loss of motor neurons and muscle paralysis. Human genome analyses have now identified a CLP1 homozygous missense mutation (p.R140H) in five unrelated families, leading to a loss of CLP1 interaction with the tRNA splicing endonuclease (TSEN) complex, largely reduced pre-tRNA cleavage activity, and accumulation of linear tRNA introns. The affected individuals develop severe motor-sensory defects, cortical dysgenesis and microcephaly. Mice carrying kinase-dead CLP1 also displayed microcephaly and reduced cortical brain volume due to the enhanced cell death of neuronal progenitors that is associated with reduced numbers of cortical neurons. Our data elucidate a novel neurological syndrome defined by CLP1 mutations that impair tRNA splicing. Reduction of a founder mutation to homozygosity illustrates the importance of rare variations in disease and supports the clan genomics hypothesis.
PMCID: PMC4146440  PMID: 24766809
11.  Assessing structural variation in a personal genome—towards a human reference diploid genome 
BMC Genomics  2015;16(1):286.
Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods.
We demonstrate Parliament’s efficacy via integrated analyses of data from whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific BioSciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus.
HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1479-3) contains supplementary material, which is available to authorized users.
PMCID: PMC4490614  PMID: 25886820
Structural variation; Long-read sequencing; SV software
12.  Compound Heterozygous CORO1A Mutations in Siblings with a Mucocutaneous-Immunodeficiency Syndrome of Epidermodysplasia Verruciformis-HPV, Molluscum Contagiosum and Granulomatous Tuberculoid Leprosy 
Journal of clinical immunology  2014;34(7):871-890.
Coronin-1A deficiency is a recently recognized autosomal recessive primary immunodeficiency caused by mutations in CORO1A (OMIM 605000) that results in T-cell lymphopenia and is classified as T−B+NK+ severe combined immunodeficiency (SCID). Only two other CORO1A-kindred are known to date, thus the defining characteristics are not well delineated. We identified a unique CORO1A-kindred.
We captured a 10-year analysis of the immune-clinical phenotypes in two affected siblings from disease debut of age 7 years. Target-specific genetic studies were pursued but unrevealing. Telomere lengths were also assessed. Whole exome sequencing (WES) uncovered the molecular diagnosis and Western blot validated findings.
We found the compound heterozygous CORO1A variants: c.248_249delCT (p.P83RfsX10) and a novel mutation c.1077delC (p.Q360RfsX44) (NM_007074.3) in two affected non-consanguineous siblings that manifested as absent CD4CD45RA+ (naïve) T and memory B cells, low NK cells and abnormally increased double-negative (DN) γδ T-cells. Distinguishing characteristics were late clinical debut with an unusual mucocutaneous syndrome of epidermodysplasia verruciformis-human papilloma virus (EV-HPV), molluscum contagiosum and oral-cutaneous herpetic ulcers; the older female sibling also had a disfiguring granulomatous tuberculoid leprosy. Both had bilateral bronchiectasis and the female died of EBV+ lymphomas at age 16 years. The younger surviving male, without malignancy, had reproducibly very short telomere lengths, not before appreciated in CORO1A mutations.
We reveal the third CORO1A-mutated kindred, with the immune phenotype of abnormal naïve CD4 and DN T-cells and newfound characteristics of a late/hypomorphic-like SCID of an EV-HPV mucocutaneous syndrome with also B and NK defects and shortened telomeres. Our findings contribute to the elucidation of the CORO1A-SCID-CID spectrum.
PMCID: PMC4386834  PMID: 25073507
CORO1A (Coronin-1A deficiency); SCID-CID; HPV-epidermodysplasia verruciformis; mucocutaneous; molluscum; leprosy; telomere; WES (whole exome sequencing)
13.  Rise and fall of subclones from diagnosis to relapse in pediatric B-acute lymphoblastic leukaemia 
Nature Communications  2015;6:6604.
There is incomplete understanding of genetic heterogeneity and clonal evolution during cancer progression. Here we use deep whole-exome sequencing to describe the clonal architecture and evolution of 20 pediatric B-acute lymphoblastic leukaemias from diagnosis to relapse. We show that clonal diversity is comparable at diagnosis and relapse and clonal survival from diagnosis to relapse is not associated with mutation burden. Six pathways were frequently mutated, with NT5C2, CREBBP, WHSC1, TP53, USH2A, NRAS and IKZF1 mutations enriched at relapse. Half of the leukaemias had multiple subclonal mutations in a pathway or gene at diagnosis, but mostly with only one, usually minor clone, surviving therapy to acquire additional mutations and become the relapse founder clone. Relapse-specific mutations in NT5C2 were found in nine cases, with mutations in four cases being in descendants of the relapse founder clone. These results provide important insights into the genetic basis of treatment failure in ALL and have implications for the early detection of mutations driving relapse.
Genetic heterogeneity and clonal evolution contribute to cancer progression. Here Ma et al. use deep whole-exome sequencing to identify recurrently mutated pathways and clonal architecture in pediatric acute lymphoblastic leukaemia, shedding light on the evolutionary trajectory from diagnosis to relapse.
PMCID: PMC4377644  PMID: 25790293
14.  PacBio-LITS: a large-insert targeted sequencing method for characterization of human disease-associated chromosomal structural variations 
BMC Genomics  2015;16(1):214.
Generation of long (>5 Kb) DNA sequencing reads provides an approach for interrogation of complex regions in the human genome. Currently, large-insert whole genome sequencing (WGS) technologies from Pacific Biosciences (PacBio) enable analysis of chromosomal structural variations (SVs), but the cost to achieve the required sequence coverage across the entire human genome is high.
We developed a method (termed PacBio-LITS) that combines oligonucleotide-based DNA target-capture enrichment technologies with PacBio large-insert library preparation to facilitate SV studies at specific chromosomal regions. PacBio-LITS provides deep sequence coverage at the specified sites at substantially reduced cost compared with PacBio WGS. The efficacy of PacBio-LITS is illustrated by delineating the breakpoint junctions of low copy repeat (LCR)-associated complex structural rearrangements on chr17p11.2 in patients diagnosed with Potocki–Lupski syndrome (PTLS; MIM#610883). We successfully identified previously determined breakpoint junctions in three PTLS cases, and also were able to discover novel junctions in repetitive sequences, including LCR-mediated breakpoints. The new information has enabled us to propose mechanisms for formation of these structural variants.
The new method leverages the cost efficiency of targeted capture-sequencing as well as the mappability and scaffolding capabilities of long sequencing reads generated by the PacBio platform. It is therefore suitable for studying complex SVs, especially those involving LCRs, inversions, and the generation of chimeric Alu elements at the breakpoints. Other genomic research applications, such as haplotype phasing and small insertion and deletion validation could also benefit from this technology.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1370-2) contains supplementary material, which is available to authorized users.
PMCID: PMC4376517  PMID: 25887218
Targeted sequencing; Single molecule sequencing; Complex genomic rearrangement
15.  Gibbon genome and the fast karyotype evolution of small apes 
Carbone, Lucia | Harris, R. Alan | Gnerre, Sante | Veeramah, Krishna R. | Lorente-Galdos, Belen | Huddleston, John | Meyer, Thomas J. | Herrero, Javier | Roos, Christian | Aken, Bronwen | Anaclerio, Fabio | Archidiacono, Nicoletta | Baker, Carl | Barrell, Daniel | Batzer, Mark A. | Beal, Kathryn | Blancher, Antoine | Bohrson, Craig L. | Brameier, Markus | Campbell, Michael S. | Capozzi, Oronzo | Casola, Claudio | Chiatante, Giorgia | Cree, Andrew | Damert, Annette | de Jong, Pieter J. | Dumas, Laura | Fernandez-Callejo, Marcos | Flicek, Paul | Fuchs, Nina V. | Gut, Marta | Gut, Ivo | Hahn, Matthew W. | Hernandez-Rodriguez, Jessica | Hillier, LaDeana W. | Hubley, Robert | Ianc, Bianca | Izsvák, Zsuzsanna | Jablonski, Nina G. | Johnstone, Laurel M. | Karimpour-Fard, Anis | Konkel, Miriam K. | Kostka, Dennis | Lazar, Nathan H. | Lee, Sandra L. | Lewis, Lora R. | Liu, Yue | Locke, Devin P. | Mallick, Swapan | Mendez, Fernando L. | Muffato, Matthieu | Nazareth, Lynne V. | Nevonen, Kimberly A. | O'Bleness, Majesta | Ochis, Cornelia | Odom, Duncan T. | Pollard, Katherine S. | Quilez, Javier | Reich, David | Rocchi, Mariano | Schumann, Gerald G. | Searle, Stephen | Sikela, James M. | Skollar, Gabriella | Smit, Arian | Sonmez, Kemal | Hallers, Boudewijn ten | Terhune, Elizabeth | Thomas, Gregg W.C. | Ullmer, Brygg | Ventura, Mario | Walker, Jerilyn A. | Wall, Jeffrey D. | Walter, Lutz | Ward, Michelle C. | Wheelan, Sarah J. | Whelan, Christopher W. | White, Simon | Wilhelm, Larry J. | Woerner, August E. | Yandell, Mark | Zhu, Baoli | Hammer, Michael F. | Marques-Bonet, Tomas | Eichler, Evan E. | Fulton, Lucinda | Fronick, Catrina | Muzny, Donna M. | Warren, Wesley C. | Worley, Kim C. | Rogers, Jeffrey | Wilson, Richard K. | Gibbs, Richard A.
Nature  2014;513(7517):195-201.
Gibbons are small arboreal apes that display an accelerated rate of evolutionary chromosomal rearrangement and occupy a key node in the primate phylogeny between Old World monkeys and great apes. Here we present the assembly and analysis of a northern white-cheeked gibbon (Nomascus leucogenys) genome. We describe the propensity for a gibbon-specific retrotransposon (LAVA) to insert into chromosome segregation genes and alter transcription by providing a premature termination site, suggesting a possible molecular mechanism for the genome plasticity of the gibbon lineage. We further show that the gibbon genera (Nomascus, Hylobates, Hoolock and Symphalangus) experienced a near-instantaneous radiation ~5 million years ago, coincident with major geographical changes in Southeast Asia that caused cycles of habitat compression and expansion. Finally, we identify signatures of positive selection in genes important for forelimb development (TBX5) and connective tissues (COL1A1) that may have been involved in the adaptation of gibbons to their arboreal habitat.
PMCID: PMC4249732  PMID: 25209798
16.  Low-frequency and rare exome chip variants associate with fasting glucose and type 2 diabetes susceptibility 
Wessel, Jennifer | Chu, Audrey Y | Willems, Sara M | Wang, Shuai | Yaghootkar, Hanieh | Brody, Jennifer A | Dauriz, Marco | Hivert, Marie-France | Raghavan, Sridharan | Lipovich, Leonard | Hidalgo, Bertha | Fox, Keolu | Huffman, Jennifer E | An, Ping | Lu, Yingchang | Rasmussen-Torvik, Laura J | Grarup, Niels | Ehm, Margaret G | Li, Li | Baldridge, Abigail S | Stančáková, Alena | Abrol, Ravinder | Besse, Céline | Boland, Anne | Bork-Jensen, Jette | Fornage, Myriam | Freitag, Daniel F | Garcia, Melissa E | Guo, Xiuqing | Hara, Kazuo | Isaacs, Aaron | Jakobsdottir, Johanna | Lange, Leslie A | Layton, Jill C | Li, Man | Hua Zhao, Jing | Meidtner, Karina | Morrison, Alanna C | Nalls, Mike A | Peters, Marjolein J | Sabater-Lleal, Maria | Schurmann, Claudia | Silveira, Angela | Smith, Albert V | Southam, Lorraine | Stoiber, Marcus H | Strawbridge, Rona J | Taylor, Kent D | Varga, Tibor V | Allin, Kristine H | Amin, Najaf | Aponte, Jennifer L | Aung, Tin | Barbieri, Caterina | Bihlmeyer, Nathan A | Boehnke, Michael | Bombieri, Cristina | Bowden, Donald W | Burns, Sean M | Chen, Yuning | Chen, Yii-Der I | Cheng, Ching-Yu | Correa, Adolfo | Czajkowski, Jacek | Dehghan, Abbas | Ehret, Georg B | Eiriksdottir, Gudny | Escher, Stefan A | Farmaki, Aliki-Eleni | Frånberg, Mattias | Gambaro, Giovanni | Giulianini, Franco | Goddard, William A | Goel, Anuj | Gottesman, Omri | Grove, Megan L | Gustafsson, Stefan | Hai, Yang | Hallmans, Göran | Heo, Jiyoung | Hoffmann, Per | Ikram, Mohammad K | Jensen, Richard A | Jørgensen, Marit E | Jørgensen, Torben | Karaleftheri, Maria | Khor, Chiea C | Kirkpatrick, Andrea | Kraja, Aldi T | Kuusisto, Johanna | Lange, Ethan M | Lee, I T | Lee, Wen-Jane | Leong, Aaron | Liao, Jiemin | Liu, Chunyu | Liu, Yongmei | Lindgren, Cecilia M | Linneberg, Allan | Malerba, Giovanni | Mamakou, Vasiliki | Marouli, Eirini | Maruthur, Nisa M | Matchan, Angela | McKean-Cowdin, Roberta | McLeod, Olga | Metcalf, Ginger A | Mohlke, Karen L | Muzny, Donna M | Ntalla, Ioanna | Palmer, Nicholette D | Pasko, Dorota | Peter, Andreas | Rayner, Nigel W | Renström, Frida | Rice, Ken | Sala, Cinzia F | Sennblad, Bengt | Serafetinidis, Ioannis | Smith, Jennifer A | Soranzo, Nicole | Speliotes, Elizabeth K | Stahl, Eli A | Stirrups, Kathleen | Tentolouris, Nikos | Thanopoulou, Anastasia | Torres, Mina | Traglia, Michela | Tsafantakis, Emmanouil | Javad, Sundas | Yanek, Lisa R | Zengini, Eleni | Becker, Diane M | Bis, Joshua C | Brown, James B | Adrienne Cupples, L | Hansen, Torben | Ingelsson, Erik | Karter, Andrew J | Lorenzo, Carlos | Mathias, Rasika A | Norris, Jill M | Peloso, Gina M | Sheu, Wayne H.-H. | Toniolo, Daniela | Vaidya, Dhananjay | Varma, Rohit | Wagenknecht, Lynne E | Boeing, Heiner | Bottinger, Erwin P | Dedoussis, George | Deloukas, Panos | Ferrannini, Ele | Franco, Oscar H | Franks, Paul W | Gibbs, Richard A | Gudnason, Vilmundur | Hamsten, Anders | Harris, Tamara B | Hattersley, Andrew T | Hayward, Caroline | Hofman, Albert | Jansson, Jan-Håkan | Langenberg, Claudia | Launer, Lenore J | Levy, Daniel | Oostra, Ben A | O'Donnell, Christopher J | O'Rahilly, Stephen | Padmanabhan, Sandosh | Pankow, James S | Polasek, Ozren | Province, Michael A | Rich, Stephen S | Ridker, Paul M | Rudan, Igor | Schulze, Matthias B | Smith, Blair H | Uitterlinden, André G | Walker, Mark | Watkins, Hugh | Wong, Tien Y | Zeggini, Eleftheria | Laakso, Markku | Borecki, Ingrid B | Chasman, Daniel I | Pedersen, Oluf | Psaty, Bruce M | Shyong Tai, E | van Duijn, Cornelia M | Wareham, Nicholas J | Waterworth, Dawn M | Boerwinkle, Eric | Linda Kao, W H | Florez, Jose C | Loos, Ruth J.F. | Wilson, James G | Frayling, Timothy M | Siscovick, David S | Dupuis, Josée | Rotter, Jerome I | Meigs, James B | Scott, Robert A | Goodarzi, Mark O
Nature Communications  2015;6:5897.
Fasting glucose (FG) and insulin are intermediate traits for type 2 diabetes (T2D). Here we explore the role of coding variation on these traits by analysis of variants on the HumanExome BeadChip in 60,564 non-diabetic individuals and in 16,491 T2D cases and 81,877 controls. We identify a novel association of a low-frequency nonsynonymous SNV in GLP1R (A316T; rs10305492; MAF=1.4%) with lower FG (β=−0.09±0.01 mmol l⁻¹, P=3.4 × 10⁻¹²), T2D risk (OR[95%CI]=0.86[0.76–0.96], P=0.010), early insulin secretion (β=−0.07±0.035 pmol insulin mmol glucose⁻¹, P=0.048), but higher 2-h glucose (β=0.16±0.05 mmol l⁻¹, P=4.3 × 10⁻⁴). We identify a gene-based association with FG at G6PC2 (pSKAT=6.8 × 10⁻⁶) driven by four rare protein-coding SNVs (H177Y, Y207S, R283X and S324P). We identify rs651007 (MAF=20%) in the first intron of ABO at the putative promoter of an antisense lncRNA, associating with higher FG (β=0.02±0.004 mmol l⁻¹, P=1.3 × 10⁻⁸). Our approach identifies novel coding variant associations and extends the allelic spectrum of variation underlying diabetes-related quantitative traits and T2D susceptibility.
Both rare and common variants contribute to the aetiology of complex traits such as type 2 diabetes (T2D). Here, the authors examine the effect of coding variation on glycaemic traits and T2D, and identify low-frequency variation in GLP1R significantly associated with these traits.
PMCID: PMC4311266  PMID: 25631608
17.  16S gut community of the Cameron County Hispanic Cohort 
Microbiome  2015;3:7.
Obesity and type 2 diabetes (T2D) are major public health concerns worldwide, and their prevalence has only increased in recent years. Mexican Americans are disproportionately afflicted by obesity and T2D, and rates are even higher in the United States-Mexico border region. To determine the factors associated with the increased risk of T2D, obesity, and other diseases in this population, the Cameron County Hispanic Cohort was established in 2004.
In this study, we characterized the 16S gut community of a subset of 63 subjects from this unique cohort. We found that these communities, when compared to Human Microbiome Project subjects, exhibit community shifts often observed in obese and T2D individuals in published studies. We also examined microbial network relationships between operational taxonomic units (OTUs) in the Cameron County Hispanic Cohort (CCHC) and three additional datasets. We identified a group of seven genera that form a tightly interconnected network present in all four tested datasets, dominated by butyrate producers, which are often increased in obese individuals while being depleted in T2D patients.
Through a combination of increased disease prevalence and relatively high gut microbial homogeneity in the subset of CCHC members we examined, we believe that the CCHC may represent an ideal community to dissect mechanisms underlying the role of the gut microbiome in human health and disease. The lack of CCHC subject gut community segregation based on all tested metadata suggests that the community structure we observe in the CCHC likely occurs early in life, and endures. This persistent ‘disease’-related gut microbial community in CCHC subjects may enhance existing genetic or lifestyle predispositions to the prevalent diseases of the CCHC, leading to increased attack rates of obesity, T2D, non-alcoholic fatty liver disease, and others.
Electronic supplementary material
The online version of this article (doi:10.1186/s40168-015-0072-y) contains supplementary material, which is available to authorized users.
PMCID: PMC4355967  PMID: 25763184
18.  Molecular Findings Among Patients Referred for Clinical Whole-Exome Sequencing 
JAMA  2014;312(18):1870-1879.
Clinical whole-exome sequencing is increasingly used for diagnostic evaluation of patients with suspected genetic disorders.
To perform clinical whole-exome sequencing and report (1) the rate of molecular diagnosis among phenotypic groups, (2) the spectrum of genetic alterations contributing to disease, and (3) the prevalence of medically actionable incidental findings such as FBN1 mutations causing Marfan syndrome.
Observational study of 2000 consecutive patients with clinical whole-exome sequencing analyzed between June 2012 and August 2014. Whole-exome sequencing tests were performed at a clinical genetics laboratory in the United States. Results were reported by clinical molecular geneticists certified by the American Board of Medical Genetics and Genomics. Tests were ordered by the patient’s physician. The patients were primarily pediatric (1756 [88%]; mean age, 6 years; 888 females [44%], 1101 males [55%], and 11 fetuses [1% gender unknown]), demonstrating diverse clinical manifestations most often including nervous system dysfunction such as developmental delay.
Whole-exome sequencing diagnosis rate overall and by phenotypic category, mode of inheritance, spectrum of genetic events, and reporting of incidental findings.
A molecular diagnosis was reported for 504 patients (25.2%) with 58% of the diagnostic mutations not previously reported. Molecular diagnosis rates for each phenotypic category were 143/526 (27.2%; 95% CI, 23.5%–31.2%) for the neurological group, 282/1147 (24.6%; 95% CI, 22.1%–27.2%) for the neurological plus other organ systems group, 30/83 (36.1%; 95% CI, 26.1%–47.5%) for the specific neurological group, and 49/244 (20.1%; 95% CI, 15.6%–25.8%) for the nonneurological group. The Mendelian disease patterns of the 527 molecular diagnoses included 280 (53.1%) autosomal dominant, 181 (34.3%) autosomal recessive (including 5 with uniparental disomy), 65 (12.3%) X-linked, and 1 (0.2%) mitochondrial. Of 504 patients with a molecular diagnosis, 23 (4.6%) had blended phenotypes resulting from 2 single gene defects. About 30% of the positive cases harbored mutations in disease genes reported since 2011. There were 95 medically actionable incidental findings in genes unrelated to the phenotype but with immediate implications for management in 92 patients (4.6%), including 59 patients (3%) with mutations in genes recommended for reporting by the American College of Medical Genetics and Genomics.
Whole-exome sequencing provided a potential molecular diagnosis for 25% of a large cohort of patients referred for evaluation of suspected genetic conditions, including detection of rare genetic events and new mutations contributing to disease. The yield of whole-exome sequencing may offer advantages over traditional molecular diagnostic approaches in certain patients.
PMCID: PMC4326249  PMID: 25326635
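The abstract above reports each diagnostic rate as a count, a percentage, and a 95% confidence interval, but does not say how the intervals were computed. As a minimal sketch, assuming a Wilson score interval (our choice for illustration, not necessarily the method the authors used), the calculation for the neurological group (143 of 526 patients) looks like this. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Return the (lower, upper) Wilson score interval for a proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    )
    return center - half_width, center + half_width


# Counts quoted in the abstract for the neurological group.
lower, upper = wilson_interval(143, 526)
print(f"{143 / 526:.1%} (95% CI, {lower:.1%} to {upper:.1%})")
```

Under this assumption the interval comes out close to, but not exactly equal to, the published 23.5%–31.2%, which suggests the article used a different interval method.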
19.  Targeted Sequencing in Chromosome 17q Linkage Region Identifies Familial Glioma Candidates in the Gliogene Consortium 
Scientific Reports  2015;5:8278.
Glioma is a rare, but highly fatal, cancer that accounts for the majority of malignant primary brain tumors. Inherited predisposition to glioma has been consistently observed within non-syndromic families. Our previous studies, which involved non-parametric and parametric linkage analyses, both yielded significant linkage peaks on chromosome 17q. Here, we use data from next generation and Sanger sequencing to identify familial glioma candidate genes and variants on chromosome 17q for further investigation. We applied a filtering schema to narrow the original list of 4830 annotated variants down to 21 very rare (<0.1% frequency), non-synonymous variants. Our findings implicate the MYO19 and KIF18B genes and rare variants in SPAG9 and RUNDC1 as candidates worthy of further investigation. Burden testing and functional studies are planned.
PMCID: PMC4317686  PMID: 25652157
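The filtering schema in the abstract above keeps only very rare (<0.1% frequency), non-synonymous variants. The sketch below illustrates those two criteria only; the article's full schema is not described here and may include further steps. The `Variant` fields, the consequence labels, and the toy records are all hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str                # placeholder gene label
    consequence: str         # hypothetical annotation label
    allele_frequency: float  # population frequency as a fraction (0.001 = 0.1%)


# Assumed set of labels treated as non-synonymous for this illustration.
NONSYNONYMOUS = {"missense", "nonsense", "frameshift"}
MAX_FREQUENCY = 0.001  # the "<0.1% frequency" threshold quoted in the abstract


def filter_candidates(variants):
    """Keep variants that are both very rare and non-synonymous."""
    return [
        v
        for v in variants
        if v.allele_frequency < MAX_FREQUENCY and v.consequence in NONSYNONYMOUS
    ]


# Toy input, invented purely to exercise the filter.
toy_variants = [
    Variant("GENE_A", "missense", 0.0004),    # kept: rare and non-synonymous
    Variant("GENE_B", "synonymous", 0.0002),  # dropped: synonymous
    Variant("GENE_C", "missense", 0.0300),    # dropped: too common
    Variant("GENE_D", "nonsense", 0.0000),    # kept: absent from reference panel
]

candidates = filter_candidates(toy_variants)
print([v.gene for v in candidates])
```

Expressing the thresholds as named constants makes it easy to see how the candidate list changes if the frequency cutoff is relaxed or tightened.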
20.  The Common Marmoset Genome Provides Insight into Primate Biology and Evolution 
Worley, Kim C. | Warren, Wesley C. | Rogers, Jeffrey | Locke, Devin | Muzny, Donna M. | Mardis, Elaine R. | Weinstock, George M. | Tardif, Suzette D. | Aagaard, Kjersti M. | Archidiacono, Nicoletta | Rayan, Nirmala Arul | Batzer, Mark A. | Beal, Kathryn | Brejova, Brona | Capozzi, Oronzo | Capuano, Saverio B. | Casola, Claudio | Chandrabose, Mimi M. | Cree, Andrew | Dao, Marvin Diep | de Jong, Pieter J. | del Rosario, Ricardo Cruz-Herrera | Delehaunty, Kim D. | Dinh, Huyen H. | Eichler, Evan | Fitzgerald, Stephen | Flicek, Paul | Fontenot, Catherine C. | Fowler, R. Gerald | Fronick, Catrina | Fulton, Lucinda A. | Fulton, Robert S. | Gabisi, Ramatu Ayiesha | Gerlach, Daniel | Graves, Tina A. | Gunaratne, Preethi H. | Hahn, Matthew W. | Haig, David | Han, Yi | Harris, R. Alan | Herrero, Javier M. | Hillier, LaDeana W. | Hubley, Robert | Hughes, Jennifer F. | Hume, Jennifer | Jhangiani, Shalini N. | Jorde, Lynn B. | Joshi, Vandita | Karakor, Emre | Konkel, Miriam K. | Kosiol, Carolin | Kovar, Christie L. | Kriventseva, Evgenia V. | Lee, Sandra L. | Lewis, Lora R. | Liu, Yih-shin | Lopez, John | Lopez-Otin, Carlos | Lorente-Galdos, Belen | Mansfield, Keith G. | Marques-Bonet, Tomas | Minx, Patrick | Misceo, Doriana | Moncrieff, J. Scott | Morgan, Margaret B. | Muthuswamy, Raveendran | Nazareth, Lynne V. | Newsham, Irene | Nguyen, Ngoc Bich | Okwuonu, Geoffrey O. | Prabhakar, Shyam | Perales, Lora | Pu, Ling-Ling | Puente, Xose S. | Quesada, Victor | Ranck, Megan C. | Raney, Brian J. | Deiros, David Rio | Rocchi, Mariano | Rodriguez, David | Ross, Corinna | Ruffier, Magali | Ruiz, San Juana | Sajjadian, S. | Santibanez, Jireh | Schrider, Daniel R. | Searle, Steve | Skaletsky, Helen | Soibam, Benjamin | Smit, Arian F. A. | Tennakoon, Jayantha B. | Tomaska, Lubomir | Ullmer, Brygg | Vejnar, Charles E. | Ventura, Mario | Vilella, Albert J. | Vinar, Tomas | Vogel, Jan-Hinnerk | Walker, Jerilyn A. | Wang, Qing | Warner, Crystal M. | Wildman, Derek E. | Witherspoon, David J. | Wright, Rita A. | Wu, Yuanqing | Xiao, Weimin | Xing, Jinchuan | Zdobnov, Evgeny M. | Zhu, Baoli | Gibbs, Richard A. | Wilson, Richard K.
Nature genetics  2014;46(8):850-857.
A first analysis of the genome sequence of the common marmoset (Callithrix jacchus), assembled using traditional Sanger methods and Ensembl annotation, has permitted genomic comparison with apes and Old World monkeys and the identification of specific molecular features that may contribute to the unique biology of this diminutive primate. The common marmoset has a rapid reproductive capacity partly due to the prevalence of dizygotic twins. Remarkably, these twins share placental circulation and exchange hematopoietic stem cells in utero, resulting in adults that are hematopoietic chimeras.
We observed positive selection or non-synonymous substitutions for genes encoding growth hormone / insulin-like growth factor (growth pathways), respiratory complex I (metabolic pathways), immunobiology, and proteases (reproductive and immunity pathways). In addition, both protein-coding and microRNA genes related to reproduction exhibit rapid sequence evolution. This New World monkey genome sequence enables significantly increased power for comparative analyses among available primate genomes and facilitates biomedical research application.
PMCID: PMC4138798  PMID: 25038751
21.  The Sheep Genome Illuminates Biology of the Rumen and Lipid Metabolism 
Science (New York, N.Y.)  2014;344(6188):1168-1173.
Sheep (Ovis aries) are a major source of meat, milk and fiber in the form of wool, and represent a distinct class of animals that have a specialized digestive organ, the rumen, which carries out the initial digestion of plant material. We have developed and analyzed a high quality reference sheep genome and transcriptomes from 40 different tissues. We identified highly expressed genes encoding keratin cross-linking proteins associated with rumen evolution. We also identified genes involved in lipid metabolism that had been amplified and/or had altered tissue expression patterns. This may be in response to changes in the barrier lipids of the skin, an interaction between lipid metabolism and wool synthesis, and an increased role of volatile fatty acids in ruminants, compared to non-ruminant animals.
PMCID: PMC4157056  PMID: 24904168
22.  The First Myriapod Genome Sequence Reveals Conservative Arthropod Gene Content and Genome Organisation in the Centipede Strigamia maritima 
Chipman, Ariel D. | Ferrier, David E. K. | Brena, Carlo | Qu, Jiaxin | Hughes, Daniel S. T. | Schröder, Reinhard | Torres-Oliva, Montserrat | Znassi, Nadia | Jiang, Huaiyang | Almeida, Francisca C. | Alonso, Claudio R. | Apostolou, Zivkos | Aqrawi, Peshtewani | Arthur, Wallace | Barna, Jennifer C. J. | Blankenburg, Kerstin P. | Brites, Daniela | Capella-Gutiérrez, Salvador | Coyle, Marcus | Dearden, Peter K. | Du Pasquier, Louis | Duncan, Elizabeth J. | Ebert, Dieter | Eibner, Cornelius | Erikson, Galina | Evans, Peter D. | Extavour, Cassandra G. | Francisco, Liezl | Gabaldón, Toni | Gillis, William J. | Goodwin-Horn, Elizabeth A. | Green, Jack E. | Griffiths-Jones, Sam | Grimmelikhuijzen, Cornelis J. P. | Gubbala, Sai | Guigó, Roderic | Han, Yi | Hauser, Frank | Havlak, Paul | Hayden, Luke | Helbing, Sophie | Holder, Michael | Hui, Jerome H. L. | Hunn, Julia P. | Hunnekuhl, Vera S. | Jackson, LaRonda | Javaid, Mehwish | Jhangiani, Shalini N. | Jiggins, Francis M. | Jones, Tamsin E. | Kaiser, Tobias S. | Kalra, Divya | Kenny, Nathan J. | Korchina, Viktoriya | Kovar, Christie L. | Kraus, F. Bernhard | Lapraz, François | Lee, Sandra L. | Lv, Jie | Mandapat, Christigale | Manning, Gerard | Mariotti, Marco | Mata, Robert | Mathew, Tittu | Neumann, Tobias | Newsham, Irene | Ngo, Dinh N. | Ninova, Maria | Okwuonu, Geoffrey | Ongeri, Fiona | Palmer, William J. | Patil, Shobha | Patraquim, Pedro | Pham, Christopher | Pu, Ling-Ling | Putman, Nicholas H. | Rabouille, Catherine | Ramos, Olivia Mendivil | Rhodes, Adelaide C. | Robertson, Helen E. | Robertson, Hugh M. | Ronshaugen, Matthew | Rozas, Julio | Saada, Nehad | Sánchez-Gracia, Alejandro | Scherer, Steven E. | Schurko, Andrew M. | Siggens, Kenneth W. | Simmons, DeNard | Stief, Anna | Stolle, Eckart | Telford, Maximilian J. | Tessmar-Raible, Kristin | Thornton, Rebecca | van der Zee, Maurijn | von Haeseler, Arndt | Williams, James M. | Willis, Judith H. | Wu, Yuanqing | Zou, Xiaoyan | Lawson, Daniel | Muzny, Donna M. | Worley, Kim C. | Gibbs, Richard A. | Akam, Michael | Richards, Stephen
PLoS Biology  2014;12(11):e1002005.
Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologues of genes conserved from the bilaterian ancestor that have been lost in insects. Our analysis locates many genes in conserved macro-synteny contexts, and many small-scale examples of gene clustering. We describe several examples where S. maritima shows different solutions from insects to similar problems. The insect olfactory receptor gene family is absent from S. maritima, and olfaction in air is likely effected by expansion of other receptor gene families. For some genes S. maritima has evolved paralogues to generate coding sequence diversity, where insects use alternate splicing. This is most striking for the Dscam gene, which in Drosophila generates more than 100,000 alternate splice forms, but in S. maritima is encoded by over 100 paralogues. We see an intriguing linkage between the absence of any known photosensory proteins in a blind organism and the additional absence of canonical circadian clock genes. The phylogenetic position of myriapods allows us to identify where in arthropod phylogeny several particular molecular mechanisms and traits emerged. For example, we conclude that juvenile hormone signalling evolved with the emergence of the exoskeleton in the arthropods and that RR-1 containing cuticle proteins evolved in the lineage leading to Mandibulata. We also identify when various gene expansions and losses occurred. The genome of S. maritima offers us a unique glimpse into the ancestral arthropod genome, while also displaying many adaptations to its specific life history.
Author Summary
Arthropods are the most abundant animals on earth. Among them, insects clearly dominate on land, whereas crustaceans hold the title for the most diverse invertebrates in the oceans. Much is known about the biology of these groups, not least because of genomic studies of the fruit fly Drosophila, the water flea Daphnia, and other species used in research. Here we report the first genome sequence from a species belonging to a lineage that has previously received very little attention—the myriapods. Myriapods were among the first arthropods to invade the land over 400 million years ago, and survive today as the herbivorous millipedes and venomous centipedes, one of which—Strigamia maritima—we have sequenced here. We find that the genome of this centipede retains more characteristics of the presumed arthropod ancestor than other sequenced insect genomes. The genome provides access to many aspects of myriapod biology that have not been studied before, suggesting, for example, that they have diversified receptors for smell that are quite different from those used by insects. In addition, it shows specific consequences of the largely subterranean life of this particular species, which seems to have lost the genes for all known light-sensing molecules, even though it still avoids light.
PMCID: PMC4244043  PMID: 25423365
23.  Exonic duplication CNV of NDRG1 associated with autosomal-recessive HMSN-Lom/CMT4D 
Copy-number variations as a mutational mechanism contribute significantly to human disease. Approximately one-half of the patients with Charcot–Marie–Tooth (CMT) disease have a 1.4 Mb duplication copy-number variation as the cause of their neuropathy. However, non-CMT1A neuropathy patients rarely have causative copy-number variations, and to date, autosomal-recessive CMT disease has not been associated with copy-number variation as a mutational mechanism.
We performed Agilent 8 × 60K array comparative genomic hybridization on DNA from 12 recessive Turkish families with CMT disease. Additional molecular studies were conducted to detect breakpoint junctions and to evaluate gene expression levels in a family in which we detected an intragenic duplication copy-number variation.
We detected an ~6.25 kb homozygous intragenic duplication in NDRG1, a gene known to be causative for recessive HMSNL/CMT4D, in three individuals from a Turkish family with CMT neuropathy. Further studies showed that this intragenic copy-number variation resulted in a homozygous duplication of exons 6–8 that caused decreased mRNA expression of NDRG1.
Exon-focused high-resolution array comparative genomic hybridization enables the detection of copy-number variation carrier states in recessive genes, particularly small copy-number variations encompassing or disrupting single genes. In families for whom a molecular diagnosis has not been elucidated by conventional clinical assays, an assessment for copy-number variations in known CMT genes might be considered.
PMCID: PMC4224029  PMID: 24136616
autosomal recessive; Charcot–Marie–Tooth disease; CMT4D; CNV; NDRG1
24.  Exome sequencing identification of a GJB1 missense mutation in a kindred with X-linked spinocerebellar ataxia (SCA-X1) 
Human Molecular Genetics  2013;22(21):4329-4338.
We undertook a gene identification and molecular characterization project in a large kindred originally clinically diagnosed with SCA-X1. While presenting with ataxia, this kindred also had some unique peripheral nervous system features. The implicated region on the X chromosome was delineated using haplotyping. Large deletions and duplications were excluded by array comparative genomic hybridization. Exome sequencing was undertaken in two affected subjects. The single identified X chromosome candidate variant was then confirmed to co-segregate appropriately in all affected, carrier and unaffected family members by Sanger sequencing. The variant was confirmed to be novel by comparison with dbSNP, and filtering for a minor allele frequency of <1% in 1000 Genomes project, and was not present in the NHLBI Exome Sequencing Project or a local database at the BCM HGSC. Functional experiments on transfected cells were subsequently undertaken to assess the biological effect of the variant in vitro. The variant identified consisted of a previously unidentified non-synonymous variant, GJB1 p.P58S, in the Connexin 32/Gap Junction Beta 1 gene. Segregation studies with Sanger sequencing confirmed the presence of the variant in all affected individuals and one known carrier, and the absence of the variant in unaffected members. Functional studies confirmed that the p.P58S variant reduced the number and size of gap junction plaques, but the conductance of the gap junctions was unaffected. Two X-linked ataxias have been associated with genetic loci, with the first of these recently characterized at the molecular level. This represents the second kindred with molecular characterization of X-linked ataxia, and is the first instance of a previously unreported GJB1 mutation with a dominant and permanent ataxia phenotype, although different CNS deficits have previously been reported. This pedigree has also been relatively unique in its phenotype due to the presence of central and peripheral neural abnormalities. Other X-linked SCAs with unique features might therefore also potentially represent variable phenotypic expression of other known neurological entities.
PMCID: PMC3792691  PMID: 23773993
25.  Whole Exome Sequencing Identifies Novel Genes for Fetal Hemoglobin Response to Hydroxyurea in Children with Sickle Cell Anemia 
PLoS ONE  2014;9(10):e110740.
Hydroxyurea has proven efficacy in children and adults with sickle cell anemia (SCA), but with considerable inter-individual variability in the amount of fetal hemoglobin (HbF) produced. Sibling and twin studies indicate that some of that drug response variation is heritable. To test the hypothesis that genetic modifiers influence pharmacological induction of HbF, we investigated phenotype-genotype associations using whole exome sequencing of children with SCA treated prospectively with hydroxyurea to maximum tolerated dose (MTD). We analyzed 171 unrelated patients enrolled in two prospective clinical trials, all treated with dose escalation to MTD. We examined two MTD drug response phenotypes: the change in HbF (final %HbF minus baseline %HbF), and final %HbF. Analyzing individual genetic variants, we identified multiple low frequency and common variants associated with HbF induction by hydroxyurea. A validation cohort of 130 pediatric sickle cell patients treated to MTD with hydroxyurea was genotyped for 13 non-synonymous variants with the strongest association with HbF response to hydroxyurea in the discovery cohort. A coding variant in Spalt-like transcription factor, or SALL2, was associated with higher final HbF in this second independent replication sample and SALL2 represents an outstanding novel candidate gene for further investigation. These findings may help focus future functional studies and provide new insights into the pharmacological HbF upregulation by hydroxyurea in patients with SCA.
PMCID: PMC4215999  PMID: 25360671

