We previously reported a human-specific gene conversion of
SIGLEC11 by an adjacent paralogous pseudogene
(SIGLEC16P), generating a uniquely human form of the Siglec-11 protein,
which is expressed in the human brain. Here, we show that Siglec-11 is expressed
exclusively in microglia in all human brains studied—a finding of potential
relevance to brain evolution, as microglia modulate neuronal survival, and Siglec-11
recruits SHP-1, a tyrosine phosphatase that modulates microglial biology. Following the
recent finding of a functional SIGLEC16 allele in human populations,
further analysis of the human SIGLEC11 and
SIGLEC16/P sequences revealed an unusual series of
gene conversion events between two loci. Two tandem and likely simultaneous gene
conversions occurred from SIGLEC16P to SIGLEC11 with a
potentially deleterious intervening short segment happening to be excluded. One of the
conversion events also changed the 5′ untranslated sequence, altering predicted
transcription factor binding sites. Both of the gene conversions have been dated to
∼1–1.2 Ma, after the emergence of the genus Homo, but prior to
the emergence of the common ancestor of Denisovans and modern humans about 800,000 years
ago, thus suggesting involvement in later stages of hominin brain evolution. In keeping
with this, recombinant soluble Siglec-11 binds ligands in the human brain. We also address
a second, more recent round of gene conversion, from SIGLEC11 to
SIGLEC16, with the latter showing an allele frequency of
∼0.1–0.3 in a worldwide population study. Initial pseudogenization of
SIGLEC16 was estimated to occur at least 3 Ma, which thus preceded the
gene conversion of SIGLEC11 by SIGLEC16P. As gene
conversion usually disrupts the converted gene, the fact that ORFs of
hSIGLEC11 and hSIGLEC16 have been maintained after an
unusual series of very complex gene conversion events suggests that these events may have
been subject to hominin-specific selection forces.
pseudogene; gene conversion; human evolution; human brain; microglia
Summary: VarSifter is a graphical software tool for desktop computers that allows investigators of varying computational skills to easily and quickly sort, filter, and sift through sequence variation data. A variety of filters and a custom query framework allow filtering based on any combination of sample and annotation information. By simplifying visualization and analyses of exome-scale sequence variation data, this program will help bring the power and promise of massively-parallel DNA sequencing to a broader group of researchers.
Availability and Implementation: VarSifter is written in Java, and is freely available in source and binary versions, along with a User Guide, at http://research.nhgri.nih.gov/software/VarSifter/.
Supplementary Information: Additional figures and methods available online at the journal's website.
The kallikrein (KLK) gene family comprises the largest uninterrupted locus of serine proteases in the human genome and represents a notable case for studying the evolutionary fate of duplicated genes. In primates, a recent duplication event gave rise to KLK2 and KLK3, both encoding essential proteins for the cascade of seminal plasma liquefaction. We reconstructed the evolutionary history of KLK2 and KLK3 by comparative analysis of the orthologous sequences from 22 primate species, calculated dN/dS ratios, and addressed the hypothesis of coevolution with their substrates, the semenogelins (SEMG1 and SEMG2). Our findings support the placement of the KLK2–KLK3 duplication in the Catarrhini ancestor and unveil the frequent loss of KLK2 throughout primate evolution by different genomic mechanisms, including unequal crossing-over, deletions, and pseudogenization. We provide evidence for adaptive evolution of KLK3 toward an expanded enzymatic spectrum, with an effect on the hydrolysis of semen coagulum. Furthermore, we found associations between mating system, the number of SEMG repeat units, and the number of functional KLK2 and KLK3 genes, suggesting complex evolutionary dynamics shaped by reproductive biology.
serine proteases; adaptive evolution; mating system; semen coagulation; semenogelins
Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events. We developed a computational method for automatically mapping both types of orthology on a per-nucleotide basis in gene cluster regions studied by comparative sequencing, and we make this mapping accessible by visualizing the output. All of these steps are incorporated into our newly extended CHAP 2 package. We evaluate our method using both simulated data and real gene clusters (including the well-characterized α-globin and β-globin clusters). We also illustrate use of CHAP 2 by analyzing four more loci: CCL (chemokine ligand), IFN (interferon), CYP2abf (part of cytochrome P450 family 2), and KIR (killer cell immunoglobulin-like receptors). These new methods facilitate and extend our understanding of evolution at these and other loci by adding automated accurate evolutionary inference to the biologist's toolkit. The CHAP 2 package is freely available from http://www.bx.psu.edu/miller_lab.
gene clusters; orthology; conversion; evolutionary inference; KIR
Reports of frequent loss of heterozygosity (LOH) of markers on human chromosome 7q in malignant myeloid disorders as well as breast, prostate, ovarian, colon, head and neck, gastric, pancreatic, and renal cell carcinomas suggest the presence of a tumor suppressor gene (TSG). Functional assays have demonstrated that the introduction of an intact copy of human chromosome 7 (hchr7) can restore senescence to immortalized human fibroblast cell lines having LOH of markers within 7q31-q32 and can inhibit the tumorigenic phenotype of a murine squamous cell carcinoma cell line. To facilitate the cloning of the putative TSG, we have constructed a high-resolution physical map of this region of hchr7, specifically that encompassing the markers D7S522 and D7S677 within 7q31.1-q31.2. By using a lower resolution yeast artificial chromosome-based map as a starting framework, we established complete clone coverage of the implicated critical region in bacterial artificial chromosomes (BACs) and P1-derived artificial chromosomes (PACs). The resulting BAC/PAC-based contig map has provided suitable clones for the systematic sequencing of the entire interval. In addition, we have already identified 29 clusters of overlapping expressed-sequence tags (ESTs) and 4 known genes contained within these clones. Together, the physical map reported here, coupled with the evolving sequence and gene maps, should hasten the identification of the putative TSG residing within this region of hchr7.
tumor suppressor gene; human chromosome 7; physical mapping; expressed-sequence tag; bacterial artificial chromosome
Gene conversion events are often overlooked in analyses of genome evolution. In a conversion event, an interval of DNA sequence (not necessarily containing a gene) overwrites a highly similar sequence. The event creates relationships among genomic intervals that can confound attempts to identify orthologs and to transfer functional annotation between genomes. Here we examine 1,616,329 paralogous pairs of mouse genomic intervals, and detect conversion events in about 7.5% of them. Properties of the putative gene conversions are analyzed, such as the lengths of the paralogous pairs and the spacing between their sources and targets. Our approach is illustrated using conversion events in primate CCL gene clusters. Source code for our program is included in the 3SEQ_2D package, which is freely available at www.bx.psu.edu/miller_lab/.
algorithms; computational molecular biology; evolution
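As a toy illustration of the conversion event described in the abstract above (an interval of one sequence overwriting the corresponding interval of a highly similar paralog), the following Python sketch may help. It is not the detection method of the 3SEQ_2D package; the function names `convert` and `mismatches`, the example sequences, and the assumption that the two paralogs are already aligned without gaps are all hypothetical.

```python
def convert(source: str, target: str, start: int, end: int) -> str:
    """Return a copy of target whose interval [start, end) is overwritten by source.

    Hypothetical helper: assumes both sequences are pre-aligned and gap-free.
    """
    if len(source) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    if not 0 <= start <= end <= len(source):
        raise ValueError("interval out of range")
    return target[:start] + source[start:end] + target[end:]


def mismatches(a: str, b: str) -> list[int]:
    """Return the positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Made-up paralogous pair that differs at three positions.
source = "ACGTACGTACGT"
target = "ACCTACTTACGA"

before = mismatches(source, target)
converted = convert(source, target, 4, 9)  # overwrite positions 4..8
after = mismatches(source, converted)

print("mismatches before conversion:", before)
print("converted target:            ", converted)
print("mismatches after conversion: ", after)
```

After the simulated event, the mismatch inside the converted interval disappears while those outside it remain. A local jump in similarity of this kind is the sort of signal a detection method could exploit, although real data also require a statistical test to separate conversion from chance similarity.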
Rapid evolution is a hallmark of centromeric DNA in eukaryotic genomes. Yet, the centromere itself has a conserved functional role that is mediated by the kinetochore protein complex. To broaden our understanding about both the DNA and proteins that interact at the functional centromere, we sought to gain a detailed view of the evolutionary events that have shaped the primate kinetochore. Specifically, we performed comparative mapping and sequencing of the genomic regions encompassing the genes encoding three foundation kinetochore proteins: Centromere Proteins A, B, and C (CENP-A, CENP-B, and CENP-C). A histone H3 variant, CENP-A provides the foundation of the centromere-specific nucleosome. Comparative sequence analyses of the CENP-A gene in 14 primate species revealed encoded amino-acid residues within both the histone-fold domain and the N-terminal tail that are under strong positive selection. Similar comparative analyses of CENP-C, another foundation protein essential for centromere function, identified amino-acid residues throughout the protein under positive selection in the primate lineage, including several in the centromere localization and DNA-binding regions. Perhaps surprisingly, the gene encoding CENP-B, a kinetochore protein that binds specifically to alpha-satellite DNA, was not found to be associated with signatures of positive selection. These findings point to important and distinct evolutionary forces operating on the DNA and proteins of the primate centromere.
kinetochore; selection; evolution; centromere
Patients with cystic fibrosis (CF) manifest a multisystem disease due to deleterious mutations in each gene encoding the cystic fibrosis transmembrane conductance regulator (CFTR). However, the role of dysfunctional CFTR is uncertain in individuals with mild forms of CF (ie, pancreatic sufficiency) and mutation in only one CFTR gene.
Eleven pancreatic sufficient (PS) CF patients with only one CFTR mutation identified after mutation screening (three patients), mutation scanning (four patients) or DNA sequencing (four patients) were studied. Bi-directional sequencing of the coding region of CFTR was performed in patients who had mutation screening or scanning. If a second CFTR mutation was not identified, CFTR mRNA transcripts from nasal epithelial cells were analysed to determine if any PS-CF patients harboured a second CFTR mutation that altered RNA expression.
Sequencing of the coding regions of CFTR identified a second deleterious mutation in five of the seven patients who previously had mutation screening or mutation scanning. Five of the remaining six patients with only one deleterious mutation identified in the coding region of one CFTR gene had a pathologic reduction in the amount of RNA transcribed from their other CFTR gene (8.4–16% of wild type).
These results show that sequencing of the coding region of CFTR followed by analysis of CFTR transcription could be a useful diagnostic approach to confirm that patients with mild forms of CF harbour deleterious alterations in both CFTR genes.
In Gaucher disease (GD), the inherited deficiency of glucocerebrosidase results in the accumulation of glucocerebroside within lysosomes. Although almost 300 mutations in the glucocerebrosidase gene (GBA) have been identified, the ability to predict phenotype from genotype is quite limited. In this study, we sought to examine potential GBA transcriptional regulatory elements for variants that contribute to phenotypic diversity. Specifically, we generated the genomic sequence for the orthologous genomic region (~39.4 kb) encompassing GBA in eight non-human mammals. Computational comparisons of the resulting sequences, using human sequence as the reference, allowed the identification of multi-species conserved sequences (MCSs). Further analyses predicted the presence of two putative clusters of transcriptional regulatory elements upstream and downstream of GBA, containing five and three transcription factor-binding sites (TFBSs), respectively. A firefly luciferase (Fluc) reporter construct containing sequence flanking the GBA gene was used to test the functional consequences of altering these conserved sequences. The predicted TFBSs were individually altered by targeted mutagenesis, resulting in enhanced Fluc expression for one site and decreased expression for the seven other sites. Gel-shift assays confirmed the loss of nuclear-protein binding for several of the mutated constructs. These conserved non-coding sequences flanking GBA could play a role in the transcriptional regulation of the gene, contributing to the complexity underlying the phenotypic diversity seen in GD.
Multi-species sequence comparisons; glucocerebrosidase; Gaucher disease; transcriptional regulation; luciferase assays
Balancing selection is potentially an important biological force for maintaining advantageous genetic diversity in populations, including variation that is responsible for long-term adaptation to the environment. By serving as a means to maintain genetic variation, it may be particularly relevant to maintaining phenotypic variation in natural populations. Nevertheless, its prevalence and specific targets in the human genome remain largely unknown. We have analyzed the patterns of diversity and divergence of 13,400 genes in two human populations using an unbiased single-nucleotide polymorphism data set, a genome-wide approach, and a method that incorporates demography in neutrality tests. We identified an unbiased catalog of genes with signatures of long-term balancing selection, which includes immunity genes as well as genes encoding keratins and membrane channels; the catalog also shows enrichment in functional categories involved in cellular structure. Patterns are mostly concordant in the two populations, with a small fraction of genes showing population-specific signatures of selection. Power considerations indicate that our findings represent a subset of all targets in the genome, suggesting that although balancing selection may not have an obvious impact on a large proportion of human genes, it is a key force affecting the evolution of a number of genes in humans.
overdominance; frequency-dependent selection; heterosis; human evolution; population genetics; human diversity
Clusters of genes that evolved from single progenitors via repeated segmental duplications present significant challenges to the generation of a truly complete human genome sequence. Such clusters can confound both accurate sequence assembly and downstream computational analysis, yet they represent a hotbed of functional innovation, making them of extreme interest. We have developed an algorithm for reconstructing the evolutionary history of gene clusters using only human genomic sequence data, which allows the tempo of large-scale evolutionary events in human gene clusters to be estimated. We further propose an extension of the method to simultaneously reconstructing the evolutionary histories of orthologous gene clusters in multiple primates, which will facilitate primate comparative sequencing studies that aim to reconstruct their evolutionary history more fully.
alignment; computational molecular biology; genetic mapping; haplotypes; Markov chains
Although protein was long considered to be the building block of life, it is now apparent that it is only one of many functional products generated by the eukaryotic genome. Indeed, more of the human genome is transcribed into noncoding sequence than into protein-coding sequence. Nevertheless, whilst we have developed a deep understanding of the relationships between evolutionary constraint and function for protein-coding sequence, little is known about these relationships for non-coding transcribed sequence. This dearth of information is partially attributable to a lack of established non-protein-coding RNA (ncRNA) orthologs among birds and mammals within sequence and expression databases.
Here, we performed a multi-disciplinary study of four highly conserved and brain-expressed transcripts selected from a list of mouse long intergenic noncoding RNA (lncRNA) loci that generally show pronounced evolutionary constraint within their putative promoter regions and across exon-intron boundaries. We identify some of the first lncRNA orthologs present in birds (chicken), marsupial (opossum), and eutherian mammals (mouse), and investigate whether they exhibit conservation of brain expression. In contrast to conventional protein-coding genes, the sequences, transcriptional start sites, exon structures, and lengths for these non-coding genes are all highly variable.
The biological relevance of lncRNAs would be highly questionable if they were limited to closely related phyla. Instead, their preservation across diverse amniotes, their apparent conservation in exon structure, and similarities in their pattern of brain expression during embryonic and early postnatal stages together indicate that these are functional RNA molecules, of which some have roles in vertebrate brain development.
Human skin is a large, heterogeneous organ that protects the body from pathogens while sustaining microorganisms that influence human health and disease. Our analysis of 16S ribosomal RNA gene sequences obtained from 20 distinct skin sites of healthy humans revealed that physiologically comparable sites harbor similar bacterial communities. The complexity and stability of the microbial community are dependent on the specific characteristics of the skin site. This topographical and temporal survey provides a baseline for studies that examine the role of bacterial communities in disease states and the microbial interdependencies required to maintain healthy skin.
Expression profile analysis clusters Gpnmb with known pigment genes, Tyrp1, Dct, and Si. During development, Gpnmb is expressed in a pattern similar to Mitf, Dct and Si with expression vastly reduced in Mitf mutant animals. Unlike Dct and Si, Gpnmb remains expressed in a discrete population of caudal melanoblasts in Sox10-deficient embryos. To understand the transcriptional regulation of Gpnmb we performed a whole genome annotation of 2,460,048 consensus MITF binding sites, and cross-referenced this with evolutionarily conserved genomic sequences at the GPNMB locus. One conserved element, GPNMB-MCS3, contained two MITF consensus sites, significantly increased luciferase activity in melanocytes and was sufficient to drive expression in melanoblasts in vivo. Deletion of the 5’-most MITF consensus site dramatically reduced enhancer activity indicating a significant role for this site in Gpnmb transcriptional regulation. Future analysis of the Gpnmb locus will provide insight into the transcriptional regulation of melanocytes and Gpnmb expression can be used as a marker for analyzing melanocyte development and disease progression.
Comparative analysis of gene expression profiles using melanocyte lines derived from mice provides a powerful resource to explore genetic components of melanocyte development and pigment cell function. Using expression data, we identified Gpnmb as a new marker for early melanoblast development. We show that Gpnmb is dependent on Mitf for in vivo expression and marks a unique set of Sox10-independent melanoblasts. We identified an 89 basepair evolutionarily conserved genomic sequence at the Gpnmb locus that can enhance expression in melanocytes and tested MITF E-box consensus sequences for their involvement in melanocyte-restricted expression. Gpnmb and the panel of genes identified in this study will be valuable resources for understanding the genetic components involved in melanocyte development and diseases.
Gpnmb; Mitf; Sox10; melanoblast; melanocyte; melanoma
Large-scale genome rearrangements brought about by chromosome breaks underlie numerous inherited diseases, initiate or promote many cancers and are also associated with karyotype diversification during species evolution. Recent research has shown that these breakpoints are nonrandomly distributed throughout the mammalian genome and many, termed "evolutionary breakpoints" (EB), are specific genomic locations that are "reused" during karyotypic evolution. When the phylogenetic trajectory of orthologous chromosome segments is considered, many of these EB are coincident with ancient centromere activity as well as new centromere formation. While EB have been characterized as repeat-rich regions, it has not been determined whether specific sequences have been retained during evolution that would indicate previous centromere activity or a propensity for new centromere formation. Likewise, the conservation of specific sequence motifs or classes at EBs among divergent mammalian taxa has not been determined.
To define conserved sequence features of EBs associated with centromere evolution, we performed comparative sequence analysis of more than 4.8 Mb within the tammar wallaby, Macropus eugenii, derived from centromeric regions (CEN), euchromatic regions (EU), and an evolutionary breakpoint (EB) that has undergone convergent breakpoint reuse and past centromere activity in marsupials. We found a dramatic enrichment for long interspersed nucleotide elements (LINE1s) and endogenous retroviruses (ERVs) and a depletion of short interspersed nucleotide elements (SINEs) shared between CEN and EBs. We analyzed the orthologous human EB (14q32.33), known to be associated with translocations in many cancers including multiple myelomas and plasma cell leukemias, and found a conserved distribution of similar repetitive elements.
Our data indicate that EBs tracked within the class Mammalia harbor sequence features retained since the divergence of marsupials and eutherians that may have predisposed these genomic regions to large-scale chromosomal instability.
Previously we have shown that nonsyndromic cleft lip with or without cleft palate (NSCL/P) [1] is strongly associated with SNPs in Interferon Regulatory Factor 6 (IRF6) [2]. Here, multispecies sequence comparisons identify a common SNP (rs642961, G>A) in a novel IRF6 enhancer. The A allele is significantly overtransmitted (P = 1×10⁻¹¹) in families with NSCL/P, in particular with cleft lip (CL) but not cleft palate. Further, there is a dosage effect of the A allele, with the relative risk for CL 1.68 for the AG genotype and 2.40 for the AA genotype. EMSA and ChIP assays demonstrate that the risk allele disrupts the binding site of transcription factor AP-2α and expression analysis in the mouse localizes the enhancer activity to craniofacial and limb structures. Our findings place IRF6 and AP-2α in the same developmental pathway and identify a high frequency variant in a regulatory element contributing substantially to a common, complex disorder.
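To make the reported dosage effect concrete, the short Python sketch below compares the observed AA relative risk with what two simple textbook models would predict from the AG value alone. The two models (multiplicative and additive on the relative-risk scale) and the variable names are assumptions added here for illustration; the abstract does not state which model, if any, was fitted.

```python
# Relative risks for cleft lip reported in the abstract (GG genotype = 1.0).
rr_ag = 1.68
rr_aa = 2.40

# Assumed model 1: each A allele multiplies the risk by the same factor.
multiplicative_aa = rr_ag ** 2

# Assumed model 2: each A allele adds the same excess relative risk.
additive_aa = 1 + 2 * (rr_ag - 1)

print(f"observed AA relative risk:        {rr_aa:.2f}")
print(f"multiplicative prediction for AA: {multiplicative_aa:.2f}")
print(f"additive prediction for AA:       {additive_aa:.2f}")
```

Under these assumptions the reported AA value lies between the two predictions and closer to the additive one. This is arithmetic on the reported point estimates only and ignores their confidence intervals.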
Numerous genetic association studies have implicated the KIAA0319 gene on human chromosome 6p22 in dyslexia susceptibility. The causative variant(s) remains unknown but may modulate gene expression, given that (1) a dyslexia-associated haplotype has been implicated in the reduced expression of KIAA0319, and (2) the strongest association has been found for the region spanning exon 1 of KIAA0319. Here, we test the hypothesis that variant(s) responsible for reduced KIAA0319 expression resides on the risk haplotype close to the gene's transcription start site. We identified seven single-nucleotide polymorphisms on the risk haplotype immediately upstream of KIAA0319 and determined that three of these are strongly associated with multiple reading-related traits. Using luciferase-expressing constructs containing the KIAA0319 upstream region, we characterized the minimal promoter and additional putative transcriptional regulator regions. This revealed that the minor allele of rs9461045, which shows the strongest association with dyslexia in our sample (max p-value = 0.0001), confers reduced luciferase expression in both neuronal and non-neuronal cell lines. Additionally, we found that the presence of this rs9461045 dyslexia-associated allele creates a nuclear protein-binding site, likely for the transcriptional silencer OCT-1. Knocking down OCT-1 expression in the neuronal cell line SHSY5Y using an siRNA restores KIAA0319 expression from the risk haplotype to nearly that seen from the non-risk haplotype. Our study thus pinpoints a common variant as altering the function of a dyslexia candidate gene and provides an illustrative example of the strategic approach needed to dissect the molecular basis of complex genetic traits.
Dyslexia, or reading disability, is a common disorder caused by both genetic and environmental factors. Genetic studies have implicated a number of genes as candidates for playing a role in dyslexia. We functionally characterized one such gene (KIAA0319) to identify variant(s) that might affect gene expression and contribute to the disorder. We discovered a variant residing outside of the protein-coding region of KIAA0319 that reduces expression of the gene. This variant creates a binding site for the transcription factor OCT-1. Previous studies have shown that OCT-1 binding to a specific DNA sequence upstream of a gene can reduce the expression of that gene. In this case, reduced KIAA0319 expression could lead to improper development of regions of the brain involved in reading ability. This is the first study to identify a functional variant implicated in dyslexia. More broadly, our study illustrates the steps that can be utilized for identifying mutations causing other complex genetic disorders.
The mammalian vomeronasal organ (VNO) expresses two G-protein coupled receptor gene families that mediate pheromone responses, the V1R and V2R receptor genes. In rodents, there are ~150 V1R genes comprising 12 subfamilies organized in gene clusters at multiple chromosomal locations. Previously, we showed that several of these subfamilies had been extensively modulated by gene duplications, deletions, and gene conversions around the time of the evolutionary split of the mouse and rat lineages, consistent with the hypothesis that V1R repertoires might be involved in reinforcing speciation events. Here, we generated genome sequence for one large cluster containing two V1R subfamilies in Mus spretus, a closely related and sympatric species to Mus musculus, and investigated evolutionary change in these repertoires along the two mouse lineages.
We describe a comparison of spretus and musculus with respect to genome organization and synteny, as well as V1R gene content and phylogeny, with reference to previous observations made between mouse and rat. Unlike the mouse-rat comparisons, synteny seems to be largely conserved between the two mouse species. Disruption of local synteny is generally associated with differences in repeat content, although these differences appear to arise more from deletion than new integrations. Even though unambiguous V1R orthology is evident, we observe dynamic modulation of the functional repertoires, with two of seven V1Rb and one of eleven V1Ra genes lost in spretus, two V1Ra genes becoming pseudogenes in musculus, two additional orthologous pairs apparently subject to strong adaptive selection, and another divergent orthologous pair that apparently was subjected to gene conversion.
Therefore, eight of the 18 (~44%) presumptive V1Ra/V1Rb genes in the musculus-spretus ancestor appear to have undergone functional modulation since these two species diverged. As compared to the rat-mouse split, where modulation is evident by independent expansions of these two V1R subfamilies, divergence between musculus and spretus has arisen more by mutations within coding sequences. These results support the hypothesis that adaptive changes in functional V1R repertoires contribute to the delineation of very closely related species.
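The tally behind the ~44% figure can be reproduced from the counts given in the results above. This is a minimal Python sketch; the dictionary keys are descriptive labels chosen here, and it assumes that each listed event affects a distinct ancestral gene and that each orthologous pair counts once.

```python
# Counts taken from the results paragraph; labels are illustrative only.
modulation_events = {
    "V1Rb genes lost in spretus": 2,
    "V1Ra genes lost in spretus": 1,
    "V1Ra genes pseudogenized in musculus": 2,
    "orthologous pairs under strong adaptive selection": 2,
    "orthologous pair subject to gene conversion": 1,
}

# Presumptive ancestral repertoire: seven V1Rb plus eleven V1Ra genes.
ancestral_genes = 7 + 11

modulated = sum(modulation_events.values())
fraction = modulated / ancestral_genes

print(f"{modulated} of {ancestral_genes} genes modulated ({fraction:.1%})")
```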
Sox10 is a dynamically regulated transcription factor gene that is essential for the development of neural crest–derived and oligodendroglial populations. Developmental genes often require multiple regulatory sequences that integrate discrete and overlapping functions to coordinate their expression. To identify Sox10 cis-regulatory elements, we integrated multiple model systems, including cell-based screens and transposon-mediated transgenesis in zebrafish, to scrutinize mammalian conserved, noncoding genomic segments at the mouse Sox10 locus. We demonstrate that eight of 11 Sox10 genomic elements direct reporter gene expression in transgenic zebrafish similar to patterns observed in transgenic mice, despite an absence of observable sequence conservation between mice and zebrafish. Multiple segments direct expression in overlapping populations of neural crest derivatives and glial cells, ranging from pan-Sox10 and pan-neural crest regulatory control to the modulation of expression in subpopulations of Sox10-expressing cells, including developing melanocytes and Schwann cells. Several sequences demonstrate overlapping spatial control, yet direct expression in incompletely overlapping developmental intervals. We were able to partially explain neural crest expression patterns by the presence of head-to-head SoxE family binding sites within two of the elements. Moreover, we were able to use this transcription factor binding site signature to identify the corresponding zebrafish enhancers in the absence of overall sequence homology. We demonstrate the utility of zebrafish transgenesis as a high-fidelity surrogate in the dissection of mammalian gene regulation, especially for genes with dynamically controlled developmental expression.
The neural crest is a population of embryonic migratory stem cells. These cells form atop the future spinal cord, migrate throughout developing embryos, and give rise to many different cell types, including the epidermal pigment cells, bone cells in the head, and nerve cells of the peripheral nervous system. In this study, we examined the genomic elements responsible for expression of SOX10, a dynamically expressed gene that is essential for neural crest development. We isolated candidate regulatory elements for SOX10 by identifying the small percentage of genomic DNA around the gene that did not vary as avian and mammalian genomes changed through evolution. We tested these fragments for their ability to regulate gene expression in zebrafish, a model system that is highly efficient for DNA-mediated expression studies and embryology. We found that even though the genome sequences were not similar to the SOX10 gene in fish, the genomic fragments were able to recapitulate the dynamic expression of SOX10 during development. Through computational analysis of the sequences, we identified a transcription factor binding site signature that identified the corresponding zebrafish SOX10 regulatory elements. This study describes a paradigm for dissecting regulation of essential genes that display complex expression patterns during development.
The low Ca2+ concentration of mammalian endolymph in the inner ear is required for normal hearing and balance. We reported [Yamauchi et al. Biochem Biophys Res Commun, 2005] that the epithelial Ca2+ channels TRPV5 and TRPV6 are expressed in the vestibular system and that TRPV5 expression is stimulated by 1,25-dihydroxyvitamin D3 (1,25(OH)2D3), as also reported in kidney. TRPV5/6 channels are known to be inhibited by extracellular acidic pH. Endolymphatic pH, [Ca2+] and transepithelial potential of the utricle (UP) were measured in Cl−/HCO3− exchanger pendrin (SLC26A4) knockout mice in vivo. Slc26a4-/- mice exhibit reduced pH and UP and increased [Ca2+]. Monolayers of primary cultures of rat semicircular canal duct (SCCD) cells were grown on permeable supports and cellular uptake of 45Ca2+ was measured individually from the apical and basolateral sides. Net uptake of 45Ca2+ was greater after incubation with 1,25(OH)2D3. Net 45Ca2+ absorption was dramatically inhibited by low apical pH and was stimulated by apical alkaline pH. Gadolinium, lanthanum and ruthenium red reduced apical uptake. These observations support the notion that one aspect of vestibular dysfunction in Pendred syndrome is a pathological elevation of endolymphatic [Ca2+] due to luminal acidification and consequent inhibition of TRPV5/6-mediated Ca2+ absorption.
Epithelial Ca Channel; vitamin D; SLC26A4; HCO3- secretion; TRPV5; TRPV6
The ongoing generation of prodigious amounts of genomic sequence data from myriad vertebrates is providing unparalleled opportunities for establishing definitive phylogenetic relationships among species. The size and complexities of such comparative sequence data sets not only allow smaller and more difficult branches to be resolved but also present unique challenges, including large computational requirements and the negative consequences of systematic biases. To explore these issues and to clarify the phylogenetic relationships among mammals, we have analyzed a large data set of over 60 megabase pairs (Mb) of high-quality genomic sequence, which we generated from 41 mammals and 3 other vertebrates. All sequences are orthologous to a 1.9-Mb region of the human genome that encompasses the cystic fibrosis transmembrane conductance regulator gene (CFTR). To understand the characteristics and challenges associated with phylogenetic analyses of such a large data set, we partitioned the sequence data in several ways and utilized maximum likelihood, maximum parsimony, and Neighbor-Joining algorithms, implemented in parallel on Linux clusters. These studies yielded well-supported phylogenetic trees, largely confirming other recent molecular phylogenetic analyses. Our results provide support for rooting the placental mammal tree between Atlantogenata (Xenarthra and Afrotheria) and Boreoeutheria (Euarchontoglires and Laurasiatheria), illustrate the difficulty in resolving some branches even with large amounts of data (e.g., in the case of Laurasiatheria), and demonstrate the valuable role that very large comparative sequence data sets can play in refining our understanding of the evolutionary relationships of vertebrates.
Placentalia; Eutheria; Mammalia; mammalian phylogeny; phylogenomics; Atlantogenata; molecular systematics
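Distance-based methods such as Neighbor-Joining take a matrix of pairwise distances between aligned sequences as input. The sketch below computes the simplest such measure, the uncorrected proportion of differing sites (p-distance), for a toy alignment. It is not the pipeline used in the study, which relied on established phylogenetic software. The function names and the example sequences are hypothetical, and skipping any column with a gap or ambiguous base is an assumed convention.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or a non-ACGT character
    are ignored (an assumed convention for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return a dict mapping (name_a, name_b) to p-distance, both orders."""
    names = sorted(alignment)
    matrix = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[(a, b)] = matrix[(b, a)] = d
    return matrix


if __name__ == "__main__":
    # Made-up toy alignment, not data from the study.
    toy = {"taxon1": "ACGTACGT", "taxon2": "ACGTACGA", "taxon3": "ACG-ACTA"}
    for pair, dist in sorted(distance_matrix(toy).items()):
        print(pair, round(dist, 3))
```

For real data, uncorrected distances underestimate divergence when sites have changed more than once, so a substitution-model correction is normally applied before tree building.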
Otopetrin 1 (Otop1) encodes a multi-transmembrane domain protein with no homology to known transporters, channels, exchangers, or receptors. Otop1 is necessary for the formation of otoconia and otoliths, calcium carbonate biominerals within the inner ear of mammals and teleost fish that are required for the detection of linear acceleration and gravity. Vertebrate Otop1 and its paralogues Otop2 and Otop3 define a new gene family with homology to the invertebrate Domain of Unknown Function 270 genes (DUF270; pfam03189).
Multi-species comparison of the predicted primary sequences and predicted secondary structures of 62 vertebrate otopetrin, and arthropod and nematode DUF270 proteins, has established that the genes encoding these proteins constitute a single family that we renamed the Otopetrin Domain Protein (ODP) gene family. Signature features of ODP proteins are three "Otopetrin Domains" that are highly conserved between vertebrates, arthropods and nematodes, and a highly constrained predicted loop structure.
Our studies suggest a refined topologic model, comprising 12 transmembrane domains, for ODP insertion into the lipid bilayer, and highlight conserved amino-acid residues that will aid in the biochemical examination of ODP family function. The high degree of sequence and structural similarity of the ODP proteins may suggest a conserved role in the intracellular trafficking of calcium and the formation of biominerals.
Pendred syndrome, an autosomal-recessive disorder characterized by deafness and goiter, is caused by a mutation of SLC26A4, which codes for the anion exchanger pendrin. We investigated the relationship between pendrin expression and deafness using mice that have (Slc26a4+/+ or Slc26a4+/-) or lack (Slc26a4-/-) a complete Slc26a4 gene. Previously, we reported that stria vascularis of adult Slc26a4-/- mice is hyperpigmented and that marginal cells appear disorganized. Here we determine the time course of hyperpigmentation and marginal cell disorganization, and test the hypothesis that inflammation contributes to this tissue degeneration.
Slc26a4-/- and age-matched control (Slc26a4+/+ or Slc26a4+/-) mice were studied at four postnatal (P) developmental stages: before and after the age that marks the onset of hearing (P10 and P15, respectively), after weaning (P28-41) and adult (P74-170). Degeneration and hyperpigmentation of stria vascularis were evaluated by confocal microscopy. Gene expression in stria vascularis was analyzed by microarray and quantitative RT-PCR. In addition, the expression of a select group of genes was quantified in spiral ligament, spleen and liver to evaluate whether expression changes seen in stria vascularis are specific for stria vascularis or systemic in nature.
Degeneration of stria vascularis, defined as hyperpigmentation and marginal cell disorganization, was not seen at P10 or P15, but occurred after weaning and was associated with staining for CD68, a marker for macrophages. Marginal cells in Slc26a4-/- mice, however, had a larger apical surface area at P10 and P15. No difference in the expression of Lyzs, C3 and Cd45 was found in stria vascularis of P15 Slc26a4+/- and Slc26a4-/- mice. However, differences in expression were found after weaning and in adult mice. No difference in the expression of markers for acute inflammation, including Il1a, Il6, Il12a, Nos2 and Nos3, was found at P15, after weaning or in adults. The expression of macrophage markers including Ptprc (= Cd45), Cd68, Cd83, Lyzs, Lgals3 (= Mac2 antigen), Msr2, Cathepsins B, S, and K (Ctsb, Ctss, Ctsk) and complement components C1r, C3 and C4 was significantly increased in stria vascularis of adult Slc26a4-/- mice compared to Slc26a4+/+ mice. Expression of macrophage markers Cd45 and Cd84 and complement components C1r and C3 was increased in stria vascularis but not in spiral ligament, liver or spleen of Slc26a4-/- compared to Slc26a4+/- mice. The expression of Lyzs was increased in stria vascularis and spiral ligament but not in liver or spleen.
The data demonstrate that hyperpigmentation of stria vascularis and marginal cell reorganization in Slc26a4-/- mice occur after weaning, coinciding with an invasion of macrophages. The data suggest that macrophage invasion contributes to tissue degeneration in stria vascularis, and that macrophage invasion is restricted to stria vascularis and is not systemic in nature. The delayed onset of degeneration of stria vascularis suggests that a window of opportunity exists to restore/preserve hearing in mice and therefore possibly in humans suffering from Pendred syndrome.