StellaBase, the Nematostella vectensis Genomics Database, is a web-based resource that will facilitate desktop and bench-top studies of the starlet sea anemone. Nematostella is an emerging model organism that has already proven useful for addressing fundamental questions in developmental evolution and evolutionary genomics. StellaBase allows users to query the assembled Nematostella genome, a confirmed gene library, and a predicted genome using both keyword and homology-based search functions. Data provided by these searches will elucidate gene family evolution in early animals. Unique research tools, including a Nematostella genetic stock library, a primer library, a literature repository and a gene expression library, will provide support to the burgeoning Nematostella research community. The development of StellaBase accompanies significant upgrades to CnidBase, the Cnidarian Evolutionary Genomics Database. With the completion of the first sequenced cnidarian genome, genome comparison tools have been added to CnidBase. In addition, StellaBase provides a framework for the integration of additional species-specific databases into CnidBase. StellaBase is available at .
doi:10.1093/nar/gkj020
PMCID: PMC1347383
PMID: 16381919
Large data sets on human genetic variation have been collected recently, but their usefulness for learning about history and natural selection has been limited by biases in the ways polymorphisms were chosen. We report large subsets of SNPs from the International HapMap Project1,2 that allow us to overcome these biases and to provide accurate measurement of a quantity of crucial importance for understanding genetic variation: the allele frequency spectrum. Our analysis shows that East Asian and northern European ancestors shared the same population bottleneck expanding out of Africa but that both also experienced more recent genetic drift, which was greater in East Asians.
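To make the quantity named in this abstract concrete, the allele frequency spectrum can be tabulated as the number of SNPs at which the derived allele is observed k times in a sample of n chromosomes. The following is a minimal sketch under stated assumptions: the function name and the toy counts are hypothetical and are not drawn from the HapMap data or from the authors' analysis.

```python
from collections import Counter


def allele_frequency_spectrum(derived_counts, n_chromosomes):
    """Return a list s where s[k] is the number of SNPs whose derived
    allele was observed exactly k times among n_chromosomes."""
    tally = Counter(derived_counts)
    spectrum = [0] * (n_chromosomes + 1)
    for k, n_sites in tally.items():
        if not 0 <= k <= n_chromosomes:
            raise ValueError(f"count {k} outside 0..{n_chromosomes}")
        spectrum[k] = n_sites
    return spectrum


# Toy input (hypothetical): derived-allele counts at six SNPs,
# each typed in a sample of 4 chromosomes.
toy_counts = [1, 1, 2, 1, 3, 2]
spectrum = allele_frequency_spectrum(toy_counts, 4)
print(spectrum)
```

A spectrum built this way is only as unbiased as the set of SNPs fed into it, which is the ascertainment issue the abstract addresses.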
doi:10.1038/ng2116
PMCID: PMC3586588
PMID: 17828266
Cheng, Linzhao | Hansen, Nancy F. | Zhao, Ling | Du, Yutao | Zou, Chunlin | Donovan, Frank X. | Chou, Bin-Kuan | Zhou, Guangyu | Li, Shijie | Dowey, Sarah N. | Ye, Zhaohui | Chandrasekharappa, Settara C. | Yang, Huanming | Mullikin, James C. | Liu, P. Paul
Summary
The utility of induced pluripotent stem cells (iPSCs) as models to study diseases and as sources for cell therapy depends on the integrity of their genomes. Despite recent publications of DNA sequence variations in the iPSCs, the true scope of such changes for the entire genome is not clear. Here we report the whole-genome sequencing of three human iPSC lines derived from two cell types of an adult donor by episomal vectors. The vector sequence was undetectable in the deeply sequenced iPSC lines. We identified 1058–1808 heterozygous single nucleotide variants (SNVs), but no copy number variants, in each iPSC line. Six to twelve of these SNVs were within coding regions in each iPSC line, but ~50% of them are synonymous changes and the remaining are not selectively enriched for known genes associated with cancers. Our data thus suggest that episome-mediated reprogramming is not inherently mutagenic during integration-free iPSC induction.
doi:10.1016/j.stem.2012.01.005
PMCID: PMC3298448
PMID: 22385660
Human iPS cells; Reprogramming; Episomal vectors; Integration-free; Genetic mutations; Whole Genome Sequencing
Pierson, Tyler Mark | Adams, David | Bonn, Florian | Martinelli, Paola | Cherukuri, Praveen F. | Teer, Jamie K. | Hansen, Nancy F. | Cruz, Pedro | Mullikin, James C. | Blakesley, Robert W. | Golas, Gretchen | Kwan, Justin | Sandler, Anthony | Fuentes Fajardo, Karin | Markello, Thomas | Tifft, Cynthia | Blackstone, Craig | Rugarli, Elena I. | Langer, Thomas | Gahl, William A. | Toro, Camilo
PLoS Genetics
2013;9(2):10.1371/annotation/273d7d98-3a1b-494b-839e-de31a0f33d28.
doi:10.1371/annotation/273d7d98-3a1b-494b-839e-de31a0f33d28
PMCID: PMC3586582
PMID: 23460765
Summary: VarSifter is a graphical software tool for desktop computers that allows investigators of varying computational skills to easily and quickly sort, filter, and sift through sequence variation data. A variety of filters and a custom query framework allow filtering based on any combination of sample and annotation information. By simplifying visualization and analyses of exome-scale sequence variation data, this program will help bring the power and promise of massively-parallel DNA sequencing to a broader group of researchers.
Availability and Implementation: VarSifter is written in Java, and is freely available in source and binary versions, along with a User Guide, at http://research.nhgri.nih.gov/software/VarSifter/.
Contact: mullikin@mail.nih.gov
Supplementary Information: Additional figures and methods available online at the journal's website.
doi:10.1093/bioinformatics/btr711
PMCID: PMC3278764
PMID: 22210868
Alhaddad, Hasan | Khan, Razib | Grahn, Robert A. | Gandolfi, Barbara | Mullikin, James C. | Cole, Shelley A. | Gruffydd-Jones, Timothy J. | Häggström, Jens | Lohi, Hannes | Longeri, Maria | Lyons, Leslie A. | Ellegren, Hans
Domestic cats have a unique breeding history and can be used as models for human hereditary and infectious diseases. In the current era of genome-wide association studies, insights regarding linkage disequilibrium (LD) are essential for efficient association studies. The objective of this study is to investigate the extent of LD in the domestic cat, Felis silvestris catus, particularly within its breeds. A custom Illumina GoldenGate Assay consisting of 1536 single nucleotide polymorphisms (SNPs) equally divided over ten 1 Mb chromosomal regions was developed and genotyped across 18 globally recognized cat breeds and two distinct random bred populations. The pair-wise LD descriptive measure (r²) was calculated between the SNPs in each region and within each population independently. LD decay was estimated by fitting all pair-wise estimates as a function of distance by non-linear least squares using established models. The point of 50% decay of r² was used to compare the extent of LD between breeds. The longest extent of LD was observed in the Burmese breed, where the distance at which r² ≈ 0.25 was ∼380 kb, comparable to several horse and dog breeds. The shortest extent of LD was found in the Siberian breed, with an r² ≈ 0.25 at approximately 17 kb, comparable to random bred cats and human populations. A comprehensive haplotype analysis was also conducted. The haplotype structure of each region within each breed mirrored the LD estimates. The LD of cat breeds largely reflects the breeds’ population history and breeding strategies. Understanding LD in diverse populations will contribute to an efficient use of the newly developed SNP array for the cat in the design of genome-wide association studies, as well as to the interpretation of results for the fine mapping of disease and phenotypic traits.
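The pair-wise LD measure used in this abstract is the squared correlation between alleles at two SNPs, r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA pB. The sketch below computes it from phased two-locus haplotypes. Note the assumption: phased haplotypes coded 0/1 are used here for simplicity, whereas the study worked from array genotypes, and the function name and toy data are hypothetical.

```python
def r_squared(haplotypes):
    """Squared allelic correlation between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, alleles coded 0/1 at SNP A and SNP B
    (phased data is an assumption of this sketch).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n       # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n       # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    return d * d / denom


# Toy data (hypothetical): complete LD versus no LD.
complete_ld = [(1, 1), (1, 1), (0, 0), (0, 0)]
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(r_squared(complete_ld), r_squared(no_ld))
```

Values computed for many SNP pairs can then be plotted against physical distance to estimate the decay described above.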
doi:10.1371/journal.pone.0053537
PMCID: PMC3538540
PMID: 23308248
Prüfer, Kay | Munch, Kasper | Hellmann, Ines | Akagi, Keiko | Miller, Jason R. | Walenz, Brian | Koren, Sergey | Sutton, Granger | Kodira, Chinnappa | Winer, Roger | Knight, James R. | Mullikin, James C. | Meader, Stephen J. | Ponting, Chris P. | Lunter, Gerton | Higashino, Saneyuki | Hobolth, Asger | Dutheil, Julien | Karakoç, Emre | Alkan, Can | Sajjadian, Saba | Catacchio, Claudia Rita | Ventura, Mario | Marques-Bonet, Tomas | Eichler, Evan E. | André, Claudine | Atencia, Rebeca | Mugisha, Lawrence | Junhold, Jörg | Patterson, Nick | Siebauer, Michael | Good, Jeffrey M. | Fischer, Anne | Ptak, Susan E. | Lachmann, Michael | Symer, David E. | Mailund, Thomas | Schierup, Mikkel H. | Andrés, Aida M. | Kelso, Janet | Pääbo, Svante
Nature
2012;486(7404):527-531.
Two African apes are the closest living relatives of humans: the chimpanzee (Pan troglodytes) and the bonobo (Pan paniscus). Although they are similar in many respects, bonobos and chimpanzees differ strikingly in key social and sexual behaviours1–4, and for some of these traits they show more similarity with humans than with each other. Here we report the sequencing and assembly of the bonobo genome to study its evolutionary relationship with the chimpanzee and human genomes. We find that more than three per cent of the human genome is more closely related to either the bonobo or the chimpanzee genome than these are to each other. These regions allow various aspects of the ancestry of the two ape species to be reconstructed. In addition, many of the regions that overlap genes may eventually help us understand the genetic basis of phenotypes that humans share with one of the two apes to the exclusion of the other.
doi:10.1038/nature11128
PMCID: PMC3498939
PMID: 22722832
Recent advances in sequencing technology have led to a rapid accumulation of mitochondrial DNA (mtDNA) sequences, which now represent the wide spectrum of animal diversity. However, one animal phylum – Ctenophora – has, to date, remained completely unsampled. Ctenophores, a small group of marine animals, are of interest due to their unusual biology, controversial phylogenetic position, and devastating impact as an invasive species. Using data from the Mnemiopsis leidyi genome sequencing project, we PCR amplified and analyzed its complete mitochondrial (mt-) genome. At just over 10kb, the mt-genome of M. leidyi is the smallest animal mtDNA ever reported and is among the most derived. It has lost at least 25 genes, including atp6 and all tRNA genes. We show that atp6 has been relocated to the nuclear genome and has acquired introns and a mitochondrial targeting presequence, while tRNA genes have been genuinely lost, along with nuclear-encoded mt-aminoacyl tRNA synthetases. The mt-genome of M. leidyi also displays extremely high rates of sequence evolution, which likely led to the degeneration of both protein and rRNA genes. In particular, encoded rRNA molecules possess little similarity with their homologues in other organisms and have highly reduced secondary structures. At the same time, nuclear encoded mt-ribosomal proteins have undergone expansions, probably to compensate for the reductions in mt-rRNA. The unusual features identified in M. leidyi mtDNA make this organism an interesting system for the study of various aspects of mitochondrial biology, particularly protein and tRNA import and mt-ribosome structures, and add to its value as an emerging model species. Furthermore, the fast-evolving M. leidyi mtDNA should be a convenient molecular marker for species- and population-level studies.
doi:10.3109/19401736.2011.624611
PMCID: PMC3313829
PMID: 21985407
Ctenophora; comparative genomics; cytonuclear coevolution
Zhu, Jiang | O’Dell, Sijy | Ofek, Gilad | Pancera, Marie | Wu, Xueling | Zhang, Baoshan | Zhang, Zhenhai | Mullikin, James C. | Simek, Melissa | Burton, Dennis R. | Koff, Wayne C. | Shapiro, Lawrence | Mascola, John R. | Kwong, Peter D.
Select HIV-1-infected individuals develop sera capable of neutralizing diverse viral strains. The molecular basis of this neutralization is currently being deciphered by the isolation of HIV-1-neutralizing antibodies. In one infected donor, three neutralizing antibodies, PGT135–137, were identified by assessment of neutralization from individually sorted B cells and found to recognize an epitope containing an N-linked glycan at residue 332 on HIV-1 gp120. Here we use next-generation sequencing and bioinformatics methods to interrogate the B cell record of this donor to gain a more complete understanding of the humoral immune response. PGT135–137-gene family specific primers were used to amplify heavy-chain and light-chain variable-domain sequences. Pyrosequencing produced 141,298 heavy-chain sequences of IGHV4-39 origin and 87,229 light-chain sequences of IGKV3-15 origin. A number of heavy and light-chain sequences of ∼90% identity to PGT137, several to PGT136, and none of high identity to PGT135 were identified. After expansion of these sequences to include close phylogenetic relatives, a total of 202 heavy-chain sequences and 72 light-chain sequences were identified. These sequences were clustered into populations of 95% identity comprising 15 for heavy chain and 10 for light chain, and a select sequence from each population was synthesized and reconstituted with a PGT137-partner chain. Reconstituted antibodies showed varied neutralization phenotypes for HIV-1 clade A and D isolates. Sequence diversity of the antibody population represented by these tested sequences was notably higher than observed with a 454 pyrosequencing-control analysis on 10 antibodies of defined sequence, suggesting that this diversity results primarily from somatic maturation. 
Our results thus provide an example of how pathogens like HIV-1 are opposed by a varied humoral immune response, derived from intrinsic mechanisms of antibody development, and embodied by somatic populations of diverse antibodies.
doi:10.3389/fmicb.2012.00315
PMCID: PMC3441199
PMID: 23024643
antibody bioinformatics; high-throughput sequencing; HIV-1; immunity; N-linked glycan
Rees, Matthew G. | Ng, David | Ruppert, Sarah | Turner, Clesson | Beer, Nicola L. | Swift, Amy J. | Morken, Mario A. | Below, Jennifer E. | Blech, Ilana | Mullikin, James C. | McCarthy, Mark I. | Biesecker, Leslie G. | Gloyn, Anna L. | Collins, Francis S.
Defining the genetic contribution of rare variants to common diseases is a major basic and clinical science challenge that could offer new insights into disease etiology and provide potential for directed gene- and pathway-based prevention and treatment. Common and rare nonsynonymous variants in the GCKR gene are associated with alterations in metabolic traits, most notably serum triglyceride levels. GCKR encodes glucokinase regulatory protein (GKRP), a predominantly nuclear protein that inhibits hepatic glucokinase (GCK) and plays a critical role in glucose homeostasis. The mode of action of rare GCKR variants remains unexplored. We identified 19 nonsynonymous GCKR variants among 800 individuals from the ClinSeq medical sequencing project. Excluding the previously described common missense variant p.Pro446Leu, all variants were rare in the cohort. Accordingly, we functionally characterized all variants to evaluate their potential phenotypic effects. Defects were observed for the majority of the rare variants after assessment of cellular localization, ability to interact with GCK, and kinetic activity of the encoded proteins. Comparing the individuals with functional rare variants to those without such variants showed associations with lipid phenotypes. Our findings suggest that, while nonsynonymous GCKR variants, excluding p.Pro446Leu, are rare in individuals of mixed European descent, the majority do affect protein function. In sum, this study utilizes computational, cell biological, and biochemical methods to present a model for interpreting the clinical significance of rare genetic variants in common disease.
doi:10.1172/JCI46425
PMCID: PMC3248284
PMID: 22182842
Davis, Erica E. | Zhang, Qi | Liu, Qin | Diplas, Bill H. | Davey, Lisa M. | Hartley, Jane | Stoetzel, Corinne | Szymanska, Katarzyna | Ramaswami, Gokul | Logan, Clare V. | Muzny, Donna M. | Young, Alice C. | Wheeler, David A. | Cruz, Pedro | Morgan, Margaret | Lewis, Lora R. | Cherukuri, Praveen | Maskeri, Baishali | Hansen, Nancy F. | Mullikin, James C. | Blakesley, Robert W. | Bouffard, Gerard G. | Gyapay, Gabor | Reiger, Susanne | Tönshoff, Burkhard | Kern, Ilse | Soliman, Neveen A. | Neuhaus, Thomas J. | Swoboda, Kathryn J. | Kayserili, Hulya | Gallagher, Tomas E. | Lewis, Richard A. | Bergmann, Carsten | Otto, Edgar A. | Saunier, Sophie | Scambler, Peter J. | Beales, Philip L. | Gleeson, Joseph G. | Maher, Eamonn R. | Attié-Bitach, Tania | Dollfus, Hélène | Johnson, Colin A. | Green, Eric D. | Gibbs, Richard A. | Hildebrandt, Friedhelm | Pierce, Eric A. | Katsanis, Nicholas
Ciliary dysfunction leads to a broad range of overlapping phenotypes, termed collectively as ciliopathies. This grouping is underscored by genetic overlap, where causal genes can also contribute modifying alleles to clinically distinct disorders. Here we show that mutations in TTC21B/IFT139, encoding a retrograde intraflagellar transport (IFT) protein, cause both isolated nephronophthisis (NPHP) and syndromic Jeune Asphyxiating Thoracic Dystrophy (JATD). Moreover, although systematic medical resequencing of a large, clinically diverse ciliopathy cohort and matched controls showed a similar frequency of rare changes, in vivo and in vitro evaluations unmasked a significant enrichment of pathogenic alleles in cases, suggesting that TTC21B contributes pathogenic alleles to ∼5% of ciliopathy patients. Our data illustrate how genetic lesions can be both causally associated with diverse ciliopathies, as well as interact in trans with other disease-causing genes, and highlight how saturated resequencing followed by functional analysis of all variants informs the genetic architecture of disorders.
doi:10.1038/ng.756
PMCID: PMC3071301
PMID: 21258341
ClinSeq is a large-scale medical sequencing (LSMS) project at the National Institutes of Health (NIH), the goal of which is to pilot the feasibility of using high throughput genome sequencing for clinical research and eventually to improve the delivery of healthcare. In phase one, 1000 participants are being clinically evaluated for cardiovascular phenotypes and DNA is being collected for sequencing of 400 candidate genes to identify genetic variants that may predispose to the early development of atherosclerosis. We report on an individual with familial hypercholesterolemia (OMIM #143890) who has a novel mutation, c.261_262invGA, that predicts a premature stop (p.Trp87X) in the LDLR gene. Although the p.Trp87X predicted protein mutation has been reported, c.261_262invGA is distinct from mutations reported in prior families and emphasizes the importance of describing mutations at the DNA level. It is important to describe mutations according to the underlying DNA change, as multiple nucleotide changes may underlie a single predicted protein change.
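The closing point, that several nucleotide changes can underlie one predicted protein change, can be illustrated with the standard genetic code: tryptophan is encoded only by TGG, and two different single-base substitutions in that codon each produce a stop codon. The sketch below is illustrative only. It uses a four-entry excerpt of the standard codon table, a hypothetical helper name, and does not model the LDLR sequence or the c.261_262invGA variant itself.

```python
STOP = "X"  # stop symbol as written in the abstract (p.Trp87X)

# Excerpt of the standard genetic code; only the codons needed here.
CODON_TABLE = {"TGG": "Trp", "TGA": STOP, "TAG": STOP, "TAA": STOP}


def protein_change(ref_codon, alt_codon, position):
    """Format the predicted protein-level change for one codon substitution."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return f"p.{ref_aa}{position}{alt_aa}"


# Two distinct DNA-level changes to a Trp codon at residue 87.
changes = {alt: protein_change("TGG", alt, 87) for alt in ("TGA", "TAG")}
print(changes)
```

Both substitutions collapse to the same protein-level label, which is why the DNA-level description carries information that the protein-level one does not.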
doi:10.1016/j.atherosclerosis.2010.04.011
PMCID: PMC2914107
PMID: 20452591
Background
Nuclear receptors (NRs) are an ancient superfamily of metazoan transcription factors that play critical roles in regulation of reproduction, development, and energetic homeostasis. Although the evolutionary relationships among NRs are well-described in two prominent clades of animals (deuterostomes and protostomes), comparatively little information has been reported on the diversity of NRs in early diverging metazoans. Here, we identified NRs from the phylum Ctenophora and used a phylogenomic approach to explore the emergence of the NR superfamily in the animal kingdom. In addition, to gain insight into conserved or novel functions, we examined NR expression during ctenophore development.
Results
We report the first described NRs from the phylum Ctenophora: two from Mnemiopsis leidyi and one from Pleurobrachia pileus. All ctenophore NRs contained a ligand-binding domain and grouped with NRs from the subfamily NR2A (HNF4). Surprisingly, all the ctenophore NRs lacked the highly conserved DNA-binding domain (DBD). NRs from Mnemiopsis were expressed in different regions of developing ctenophores. One was broadly expressed in the endoderm during gastrulation. The second was initially expressed in the ectoderm during gastrulation, in regions corresponding to the future tentacles; subsequent expression was restricted to the apical organ. Phylogenetic analyses of NRs from ctenophores, sponges, cnidarians, and a placozoan support the hypothesis that expansion of the superfamily occurred in a step-wise fashion, with initial radiations in NR family 2, followed by representatives of NR families 3, 6, and 1/4 originating prior to the appearance of the bilaterian ancestor.
Conclusions
Our study provides the first description of NRs from ctenophores, including the full complement from Mnemiopsis. Ctenophores have the least diverse NR complement of any animal phylum with representatives that cluster with only one subfamily (NR2A). Ctenophores and sponges have a similarly restricted NR complement supporting the hypothesis that the original NR was HNF4-like and that these lineages are the first two branches from the animal tree. The absence of a zinc-finger DNA-binding domain in the two ctenophore species suggests two hypotheses: this domain may have been secondarily lost within the ctenophore lineage or, if ctenophores are the first branch off the animal tree, the original NR may have lacked the canonical DBD. Phylogenomic analyses and categorization of NRs from all four early diverging animal phyla compared with the complement from bilaterians suggest the rate of NR diversification prior to the cnidarian-bilaterian split was relatively modest, with independent radiations of several NR subfamilies within the cnidarian lineage.
doi:10.1186/2041-9139-2-3
PMCID: PMC3038971
PMID: 21291545
Background
Intercellular signaling pathways are a fundamental component of the integrating cellular behavior required for the evolution of multicellularity. The genomes of three of the four early branching animal phyla (Cnidaria, Placozoa and Porifera) have been surveyed for key components, but not the fourth (Ctenophora). Genomic data from ctenophores could be particularly relevant, as ctenophores have been proposed to be one of the earliest branching metazoan phyla.
Results
A preliminary assembly of the lobate ctenophore Mnemiopsis leidyi genome generated using next-generation sequencing technologies was searched for components of a developmentally important signaling pathway, the Wnt/β-catenin pathway. Molecular phylogenetic analysis shows four distinct Wnt ligands (MlWnt6, MlWnt9, MlWntA and MlWntX), and most, but not all, components of the receptor and intracellular signaling pathway were detected. In situ hybridization of the four Wnt ligands showed that they are expressed in discrete regions associated with the aboral pole, tentacle apparati and apical organ.
Conclusions
Ctenophores show a minimal (but not obviously simple) complement of Wnt signaling components. Furthermore, it is difficult to compare the Mnemiopsis Wnt expression patterns with those of other metazoans. mRNA expression of Wnt pathway components appears later in development than expected, and zygotic gene expression does not appear to play a role in early axis specification. Notably absent in the Mnemiopsis genome are most major secreted antagonists, which suggests that complex regulation of this secreted signaling pathway probably evolved later in animal evolution.
doi:10.1186/2041-9139-1-10
PMCID: PMC2959043
PMID: 20920349
Background
The much-debated phylogenetic relationships of the five early branching metazoan lineages (Bilateria, Cnidaria, Ctenophora, Placozoa and Porifera) are of fundamental importance in piecing together events that occurred early in animal evolution. Comparisons of gene content between organismal lineages have been identified as a potentially useful methodology for phylogenetic reconstruction. However, these comparisons require complete genomes that, until now, did not exist for the ctenophore lineage. The homeobox superfamily of genes is particularly suited for these kinds of gene content comparisons, since it is large, diverse, and features a highly conserved domain.
Results
We have used a next-generation sequencing approach to generate a high-quality rough draft of the genome of the ctenophore Mnemiopsis leidyi and subsequently identified a set of 76 homeobox-containing genes from this draft. We phylogenetically categorized this set into established gene families and classes and then compared this set to the homeodomain repertoire of species from the other four early branching metazoan lineages. We have identified several important classes and subclasses of homeodomains that appear to be absent from Mnemiopsis and from the poriferan Amphimedon queenslandica. We have also determined that, based on lineage-specific paralog retention and average branch lengths, it is unlikely that these missing classes and subclasses are due to extensive gene loss or unusually high rates of evolution in Mnemiopsis.
Conclusions
This paper provides a first glimpse of the first sequenced ctenophore genome. We have characterized the full complement of Mnemiopsis homeodomains and have compared them to those of species from other early branching lineages. Our results suggest that Porifera and Ctenophora were the first two extant lineages to diverge from the rest of animals. Based on this analysis, we also propose a new name - ParaHoxozoa - for the remaining group that includes Placozoa, Cnidaria and Bilateria.
doi:10.1186/2041-9139-1-9
PMCID: PMC2959044
PMID: 20920347
BACKGROUND
Stuttering is a disorder of unknown cause characterized by repetitions, prolongations, and interruptions in the flow of speech. Genetic factors have been implicated in this disorder, and previous studies of stuttering have identified linkage to markers on chromosome 12.
METHODS
We analyzed the chromosome 12q23.3 genomic region in consanguineous Pakistani families, some members of which had nonsyndromic stuttering, and in unrelated case and control subjects from Pakistan and North America.
RESULTS
We identified a missense mutation in the N-acetylglucosamine-1-phosphate transferase gene (GNPTAB), which encodes the alpha and beta catalytic subunits of GlcNAc-phosphotransferase (GNPT [EC 2.7.8.15]), that was associated with stuttering in a large, consanguineous Pakistani family. This mutation occurred in the affected members of approximately 10% of Pakistani families studied, but it occurred only once in 192 chromosomes from unaffected, unrelated Pakistani control subjects and was not observed in 552 chromosomes from unaffected, unrelated North American control subjects. This and three other mutations in GNPTAB occurred in unrelated subjects with stuttering but not in control subjects. We also identified three mutations in the GNPTG gene, which encodes the gamma subunit of GNPT, in affected subjects of Asian and European descent but not in control subjects. Furthermore, we identified three mutations in the NAGPA gene, which encodes the so-called uncovering enzyme, in other affected subjects but not in control subjects. These genes encode enzymes that generate the mannose-6-phosphate signal, which directs a diverse group of hydrolases to the lysosome. Deficits in this system are associated with the mucolipidoses, rare lysosomal storage disorders that are most commonly associated with bone, connective tissue, and neurologic symptoms.
CONCLUSIONS
Susceptibility to nonsyndromic stuttering is associated with variations in genes governing lysosomal metabolism.
doi:10.1056/NEJMoa0902630
PMCID: PMC2936507
PMID: 20147709
The development of massively parallel sequencing technologies, coupled with new massively parallel DNA enrichment technologies (genomic capture), has allowed the sequencing of targeted regions of the human genome in rapidly increasing numbers of samples. Genomic capture can target specific areas in the genome, including genes of interest and linkage regions, but this limits the study to what is already known. Exome capture allows an unbiased investigation of the complete protein-coding regions in the genome. Researchers can use exome capture to focus on a critical part of the human genome, allowing larger numbers of samples than are currently practical with whole-genome sequencing. In this review, we briefly describe some of the methodologies currently used for genomic and exome capture and highlight recent applications of this technology.
doi:10.1093/hmg/ddq333
PMCID: PMC2953745
PMID: 20705737
Mullikin, James C | Hansen, Nancy F | Shen, Lei | Ebling, Heather | Donahue, William F | Tao, Wei | Saranga, David J | Brand, Adrianne | Rubenfield, Marc J | Young, Alice C | Cruz, Pedro | Driscoll, Carlos | David, Victor | Al-Murrani, Samer WK | Locniskar, Mary F | Abrahamsen, Mitchell S | O'Brien, Stephen J | Smith, Douglas R | Brockman, Jeffrey A
Background
The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models as well as the occurrence of several deadly feline viruses (feline leukemia virus, FeLV; feline coronavirus, FECV; feline immunodeficiency virus, FIV) that are homologues of human scourges (cancer, SARS, and AIDS respectively). However, to realize this biomedical potential, a high density single nucleotide polymorphism (SNP) map is required in order to accomplish disease and phenotype association discovery.
Description
To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats, and combined these data with the publicly available 2X cat whole genome sequence. All sequence reads were assembled together to form a 3X whole genome assembly allowing the discovery of over three million SNPs. To reduce potential false positive SNPs due to the low coverage assembly, a low upper-limit was placed on sequence coverage and a high lower-limit on the quality of the discrepant bases at a potential variant site. Domestic cats of different breeds (female Abyssinian, female American shorthair, male Cornish Rex, female European Burmese, female Persian, female Siamese, and male Ragdoll) and a female African wildcat were sequenced lightly. We report a total of 964 k common SNPs suitable for a domestic cat SNP genotyping array and an additional 900 k SNPs detected between African wildcat and domestic cat breeds. An empirical sampling of 94 discovered SNPs was tested in the sequenced cats, resulting in a SNP validation rate of 99%.
Conclusions
These data provide a large collection of mapped feline SNPs across the cat genome that will allow for the development of SNP genotyping platforms for mapping feline diseases.
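The false-positive control described in this abstract combines two cut-offs: an upper limit on sequence coverage at a site and a lower limit on the quality of the discrepant bases. A minimal sketch of such a filter follows. The threshold values, field names, and example sites are hypothetical, since the abstract does not state them, and the sketch is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    name: str
    coverage: int           # number of reads spanning the site
    min_discrepant_q: int   # lowest base quality among the discrepant bases


# Hypothetical thresholds; the abstract gives no numeric values.
MAX_COVERAGE = 8
MIN_QUALITY = 30


def passes_filters(site, max_cov=MAX_COVERAGE, min_q=MIN_QUALITY):
    """Keep a site only if coverage is not excessive (a possible sign of
    collapsed repeats) and every discrepant base is of high quality."""
    return site.coverage <= max_cov and site.min_discrepant_q >= min_q


sites = [
    CandidateSite("siteA", coverage=3, min_discrepant_q=40),
    CandidateSite("siteB", coverage=20, min_discrepant_q=45),  # too deep
    CandidateSite("siteC", coverage=4, min_discrepant_q=15),   # low quality
]
kept = [s.name for s in sites if passes_filters(s)]
print(kept)
```

Requiring both conditions trades sensitivity for specificity, which suits a low-coverage assembly where each candidate rests on few reads.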
doi:10.1186/1471-2164-11-406
PMCID: PMC2996934
PMID: 20576142
Microsatellite length mutations are often modeled using the generalized stepwise mutation process, which is a type of random walk. If this model is sufficiently accurate, one can estimate the coalescence time between alleles of a locus after a mathematical transformation of the allele lengths. When large-scale microsatellite genotyping first became possible, there was substantial interest in using this approach to make inferences about time and demography, but that interest has waned because it has not been possible to empirically validate the clock by comparing it with data in which the mutation process is well understood. We analyzed data from 783 microsatellite loci in human populations and 292 loci in chimpanzee populations, and compared them with up to one gigabase of aligned sequence data, where the molecular clock based upon nucleotide substitutions is believed to be reliable. We empirically demonstrate a remarkable linearity (r² > 0.95) between the microsatellite average square distance statistic and sequence divergence. We demonstrate that microsatellites are accurate molecular clocks for coalescent times of at least 2 million years (My). We apply this insight to confirm that the African populations San, Biaka Pygmy, and Mbuti Pygmy have the deepest coalescent times among populations in the Human Genome Diversity Project. Furthermore, we show that microsatellites support unbiased estimates of population differentiation (FST) that are less subject to ascertainment bias than single nucleotide polymorphism (SNP) FST. These results raise the prospect of using microsatellite data sets to determine parameters of population history. When genotyped along with SNPs, microsatellite data can also be used to correct for SNP ascertainment bias.
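The average square distance (ASD) statistic mentioned in this abstract is commonly defined as the squared difference in allele length between two sampled alleles, averaged over allele pairs and then over loci. The sketch below computes it between two samples. It assumes alleles are coded as repeat counts and uses hypothetical function names and toy data, so it illustrates the statistic rather than the authors' exact estimator.

```python
from itertools import product
from statistics import mean


def asd_locus(alleles_x, alleles_y):
    """Mean squared repeat-count difference over all between-sample pairs."""
    return mean((x - y) ** 2 for x, y in product(alleles_x, alleles_y))


def asd(loci_x, loci_y):
    """Average the per-locus statistic across loci (same locus order in both)."""
    return mean(asd_locus(x, y) for x, y in zip(loci_x, loci_y))


# Toy data (hypothetical): two loci, two alleles sampled per population.
pop_x = [[10, 12], [20, 20]]
pop_y = [[11, 13], [22, 22]]
print(asd(pop_x, pop_y))
```

Under a stepwise mutation model the expected ASD grows with time since coalescence, which is what lets it serve as a clock when it tracks sequence divergence linearly.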
doi:10.1093/molbev/msp025
PMCID: PMC2734136
PMID: 19221007
microsatellite evolution; molecular clocks; coalescent time; average square distance; FST; SNP ascertainment bias
A comparative analysis of SNPs and their exonic and intronic environments identifies the features predictive of splice affecting variants.
Background
Single point mutations at both synonymous and non-synonymous positions within exons can have severe effects on gene function through disruption of splicing. Predicting these mutations in silico purely from the genomic sequence is difficult due to an incomplete understanding of the multiple factors that may be responsible. In addition, little is known about which computational prediction approaches, such as those involving exonic splicing enhancers and exonic splicing silencers, are most informative.
Results
We assessed the features of single-nucleotide genomic variants verified to cause exon skipping and compared them to a large set of coding SNPs common in the human population, which are likely to have no effect on splicing. Our findings implicate a number of features important for their ability to discriminate splice-affecting variants, including the naturally occurring density of exonic splicing enhancers and exonic splicing silencers of the exon and intronic environment, extensive changes in the number of predicted exonic splicing enhancers and exonic splicing silencers, proximity to the splice junctions and evolutionary constraint of the region surrounding the variant. By extending this approach to additional datasets, we also identified relevant features of variants that cause increased exon inclusion and ectopic splice site activation.
Conclusions
We identified a number of features that have statistically significant representation among exonic variants that modulate splicing. These analyses highlight putative mechanisms responsible for splicing outcome and emphasize the role of features important for exon definition. We developed a web-tool, Skippy, to score coding variants for these relevant splice-modulating features.
doi:10.1186/gb-2010-11-2-r20
PMCID: PMC2872880
PMID: 20158892
Lagresle-Peyrou, Chantal | Six, Emmanuelle M. | Picard, Capucine | Rieux-Laucat, Frédéric | Michel, Vincent | Ditadi, Andrea | Chappedelaine, Corinne Demerens-de | Morillon, Estelle | Valensi, Françoise | Simon-Stoos, Karen L. | Mullikin, James C. | Noroski, Lenora M. | Besse, Céline | Wulffraat, Nicolas M. | Ferster, Alina | Abecasis, Manuel M. | Calvo, Fabien | Petit, Christine | Candotti, Fabio | Abel, Laurent | Fischer, Alain | Cavazzana-Calvo, Marina
Reticular dysgenesis (RD) is an autosomal recessive form of human Severe Combined Immunodeficiency, characterized by an early differentiation arrest in the myeloid lineage and impaired lymphoid maturation. In addition, affected newborns have bilateral sensorineural deafness. We have identified biallelic mutations in the adenylate kinase 2 (AK2) gene in seven patients affected with RD. These mutations resulted in the absence of, or a strong decrease in, protein expression. We then demonstrated that restoration of AK2 expression in the bone marrow cells of RD patients overcomes the neutrophil differentiation arrest, underlining its specific requirement in the development of a restricted set of haematopoietic lineages. Lastly, we established that AK2 is specifically expressed in the stria vascularis region of the inner ear, which provides an explanation for the sensorineural deafness. These results suggest a novel mechanism that regulates haematopoietic cell differentiation and is involved in one of the most severe human immunodeficiency syndromes.
doi:10.1038/ng.278
PMCID: PMC2612090
PMID: 19043416
Wu, Xueling | Zhou, Tongqing | Zhu, Jiang | Zhang, Baoshan | Georgiev, Ivelin | Wang, Charlene | Chen, Xuejun | Longo, Nancy S. | Louder, Mark | McKee, Krisha | O’Dell, Sijy | Perfetto, Stephen | Schmidt, Stephen D. | Shi, Wei | Wu, Lan | Yang, Yongping | Yang, Zhi-Yong | Yang, Zhongjia | Zhang, Zhenhai | Bonsignori, Mattia | Crump, John A. | Kapiga, Saidi H. | Sam, Noel E. | Haynes, Barton F. | Simek, Melissa | Burton, Dennis R. | Koff, Wayne C. | Doria-Rose, Nicole A. | Connors, Mark | Mullikin, James C. | Nabel, Gary J. | Roederer, Mario | Shapiro, Lawrence | Kwong, Peter D. | Mascola, John R.
Antibody VRC01 is a human immunoglobulin that neutralizes about 90% of HIV-1 isolates. To understand how such broadly neutralizing antibodies develop, we used X-ray crystallography and 454 pyrosequencing to characterize additional VRC01-like antibodies from HIV-1–infected individuals. Crystal structures revealed a convergent mode of binding for diverse antibodies to the same CD4-binding-site epitope. A functional genomics analysis of expressed heavy and light chains revealed common pathways of antibody heavy-chain maturation, confined to the IGHV1-2*02 lineage, involving dozens of somatic changes, and capable of pairing with different light chains. Broadly neutralizing HIV-1 immunity associated with VRC01-like antibodies thus involves the evolution of antibodies to a highly affinity-matured state required to recognize an invariant viral structure, with lineages defined from thousands of sequences providing a genetic roadmap of their development.
doi:10.1126/science.1207532
PMCID: PMC3516815
PMID: 21835983
Comparisons of chromosome X and the autosomes can illuminate differences in the histories of males and females as well as the forces of natural selection. We compared the patterns of variation in these parts of the genome using two data sets that we assembled for this study that are both genomic in scale. Three independent analyses show that around the time of the dispersal of modern humans out of Africa, chromosome X experienced much more genetic drift than is expected from the pattern on the autosomes. This is not predicted by known episodes of demographic history, and we found no similar patterns associated with the dispersals into East Asia and Europe. We conclude that a gender-biased process that reduced the female effective population size, or an episode of natural selection unusually affecting chromosome X, was associated with the founding of non-African populations.
doi:10.1038/ng.303
PMCID: PMC2612098
PMID: 19098910
Eichler, Evan E. | Nickerson, Deborah A. | Altshuler, David | Bowcock, Anne M. | Brooks, Lisa D. | Carter, Nigel P. | Church, Deanna M. | Felsenfeld, Adam | Guyer, Mark | Lee, Charles | Lupski, James R. | Mullikin, James C. | Pritchard, Jonathan K. | Sebat, Jonathan | Sherry, Stephen T. | Smith, Douglas | Valle, David | Waterston, Robert H.
Nature
2007;447(7141):161-165.
doi:10.1038/447161a
PMCID: PMC2685471
PMID: 17495918
Two sequences of major histocompatibility complex (MHC) regions in the domestic cat, 2.976 and 0.362 Mbp, which were separated by an ancient chromosome break (55–80 MYA) followed by a chromosomal inversion, were annotated in detail. Gene annotation of this MHC was completed and identified 183 possible coding regions (147 human homologues, possible functional genes, and 36 pseudo/unidentified genes) using the GENSCAN, BLASTN, BLASTP and RepeatMasker programs. The first region spans 2.976 Mbp of sequence, which encodes six classical class II antigens (three DRA and three DRB antigens) lacking the functional DP and DQ regions, nine antigen processing molecules (DOA/DOB, DMA/DMB, TAPASIN, LMP2/LMP7 and TAP1/TAP2), 52 class III genes, and nineteen class I genes/gene fragments (FLAI-A to FLAI-S). Three class I genes (FLAI-H, I-K, I-E) may encode functional classical class I antigens based on deduced amino acid sequence and promoter structure. The second region spans 0.362 Mbp of sequence encoding no class I genes and 18 cross-species conserved genes, excluding class I, II and their functionally related/associated genes, namely framework genes, including three olfactory receptor genes. One previously identified feline endogenous retrovirus, a baboon retrovirus derived sequence (ECE1) and two new endogenous retrovirus sequences similar to brown bat endogenous retrovirus (FERVmlu1, FERVmlu2) were found within a 140 Kbp interval in the middle of the class I region. MHC SNPs were examined by comparing this BAC sequence with MHC homozygous 1.9× WGS sequences; the comparison identified 11,654 SNPs in 2.84 Mbp (0.00411 SNP per bp), a rate 2.4 times higher than that of the average heterozygous region in the WGS (0.0017 SNP per bp genome) and slightly higher than the SNP rate observed in human MHC (0.00337 SNP per bp).
doi:10.1371/journal.pone.0002674
PMCID: PMC2453318
PMID: 18629345
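The SNP-rate comparison in the cat MHC abstract above is simple arithmetic, sketched below from the rounded figures quoted there. The variable and function names are hypothetical. Because the compared span is given only as 2.84 Mbp, the recomputed density (about 0.0041 SNP per bp) may differ in the last digit from the reported 0.00411.

```python
def snp_density(snp_count, span_bp):
    """SNPs per base pair over a compared span."""
    if span_bp <= 0:
        raise ValueError("span_bp must be positive")
    return snp_count / span_bp


# Figures quoted in the abstract (rounded as published).
mhc_snps = 11_654
mhc_span_bp = 2_840_000          # 2.84 Mbp
reported_mhc_rate = 0.00411      # SNP per bp, cat MHC
reported_wgs_rate = 0.0017       # SNP per bp, average heterozygous WGS region
reported_human_mhc_rate = 0.00337

recomputed_mhc_rate = snp_density(mhc_snps, mhc_span_bp)

# Fold differences relative to the reported cat MHC rate.
fold_vs_wgs = reported_mhc_rate / reported_wgs_rate
fold_vs_human_mhc = reported_mhc_rate / reported_human_mhc_rate

print(f"recomputed cat MHC rate: {recomputed_mhc_rate:.5f} SNP/bp")
print(f"cat MHC vs WGS average:  {fold_vs_wgs:.2f}x")
print(f"cat MHC vs human MHC:    {fold_vs_human_mhc:.2f}x")
```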