Little is known about pre-mRNA splicing in Dictyostelium discoideum although its genome has been completely sequenced. Our analysis suggests that pre-mRNA splicing plays an important role in D. discoideum gene expression as two thirds of its genes contain at least one intron. Ongoing curation of the genome to date has revealed 40 genes in D. discoideum with clear evidence of alternative splicing, supporting the existence of alternative splicing in this unicellular organism. We identified 160 candidate U2-type spliceosomal proteins and related factors in D. discoideum based on 264 known human genes involved in splicing. Spliceosomal small ribonucleoproteins (snRNPs), PRP19 complex proteins and late-acting proteins are highly conserved in D. discoideum and throughout the metazoa. In non-snRNP and hnRNP families, D. discoideum orthologs are closer to those in A. thaliana, D. melanogaster and H. sapiens than to their counterparts in S. cerevisiae. Several splicing regulators, including SR proteins and CUG-binding proteins, were found in D. discoideum, but not in yeast. Our comprehensive catalog of spliceosomal proteins provides useful information for future studies of splicing in D. discoideum where the efficient genetic and biochemical manipulation will also further our general understanding of pre-mRNA splicing.
pre-mRNA splicing; spliceosomal genes; Dictyostelium discoideum; comparative genomics; splicing regulators
Little is known about cell–substrate adhesion and how motile and adhesive forces work together in moving cells. The ability to rapidly screen a large number of insertional mutants prompted us to perform a genetic screen in Dictyostelium to isolate adhesion-deficient mutants. The resulting substrate adhesion–deficient (sad) mutants grew in plastic dishes without attaching to the substrate. The cells were often larger than their wild-type parents and displayed a rough surface with many apparent blebs. One of these mutants, sadA−, completely lacked substrate adhesion in growth medium. The sadA− mutant also showed slightly impaired cytokinesis, an aberrant F-actin organization, and a phagocytosis defect. Deletion of the sadA gene by homologous recombination recreated the original mutant phenotype. Expression of sadA–GFP in sadA-null cells restored the wild-type phenotype. In sadA–GFP-rescued mutant cells, sadA–GFP localized to the cell surface, appropriate for an adhesion molecule. SadA contains nine putative transmembrane domains and three conserved EGF-like repeats in a predicted extracellular domain. The EGF repeats are similar to corresponding regions in proteins known to be involved in adhesion, such as tenascins and integrins. Our data combined suggest that sadA is the first substrate adhesion receptor to be identified in Dictyostelium.
Dictyostelium; cell–substrate adhesion; EGF-like repeats; phagocytosis; cytokinesis
Nonmuscle myosin II plays a crucial role in a variety of cellular processes (e.g., polarity formation, cell motility, and cytokinesis). It is composed of two heavy chains, two regulatory light chains and two essential light chains. The ATPase activity of the myosin II motor domain is regulated through phosphorylation of the regulatory light chain (RLC) by myosin light chain kinase. To study myosin function and localization in cellular processes, GFP-fused RLCs are widely used; however, the exact kinetic properties of myosins with bound GFP-RLC are poorly described. More importantly, it has not been shown that a regulatory light chain fused at its N-terminus with GFP can maintain the normal phosphorylation-dependent regulation of nonmuscle myosin or serve as a substrate for myosin light chain kinase. We coexpressed N-terminal GFP-RLC with a heavy meromyosin (HMM)-like fragment of nonmuscle myosin IIA and essential light chain to characterize the phosphorylation dynamics and in vitro kinetic properties of the resulting HMM. Myosin light chain kinase phosphorylates the GFP-RLC bound to HMM IIA with the same Vmax as it does the wild-type RLC bound to HMM IIA, but the Km is about twofold higher for the GFP fusion protein, meaning that it is a somewhat poorer substrate. The steady-state actin-activated MgATPase activity of the GFP-RLC HMM is very low in the absence of phosphorylation, demonstrating that the GFP moiety does not prevent formation of the off state. The actin-activated MgATPase activity of phosphorylated GFP-RLC-HMM is about half that of wild-type phosphorylated HMM. The ability of phosphorylated GFP-RLC-HMM to move actin filaments in the actin gliding assay is also slightly compromised. These data indicate that, despite some kinetic differences, the N-terminal GFP fusion to the regulatory light chain is a reasonable model system for studying myosin function in vivo.
GFP; Nonmuscle myosin; Regulatory light chain; Enzymatic activity; In vitro motility
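The kinase comparison in the preceding abstract (same Vmax, roughly twofold higher Km for the GFP fusion) can be illustrated with the standard Michaelis-Menten rate law. The following is a minimal sketch only: the numeric values of `VMAX`, `KM_WILD_TYPE` and `KM_GFP` are hypothetical, unitless placeholders, not measurements from the study, and the function names are introduced here for illustration.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    if substrate < 0 or km <= 0:
        raise ValueError("substrate must be >= 0 and km must be > 0")
    return vmax * substrate / (km + substrate)


# Hypothetical, unitless values chosen only to mirror the stated relationship:
# identical Vmax, Km about twofold higher for the GFP fusion.
VMAX = 1.0
KM_WILD_TYPE = 1.0
KM_GFP = 2.0 * KM_WILD_TYPE


def relative_rate(substrate):
    """Rate for the GFP-fused substrate divided by the wild-type rate."""
    if substrate <= 0:
        raise ValueError("substrate must be > 0 to form a ratio")
    gfp = michaelis_menten_rate(substrate, VMAX, KM_GFP)
    wild_type = michaelis_menten_rate(substrate, VMAX, KM_WILD_TYPE)
    return gfp / wild_type


if __name__ == "__main__":
    for s in (0.01, 1.0, 100.0):
        print(f"[S] = {s:>6}: relative rate = {relative_rate(s):.3f}")
```

Under these assumptions the ratio approaches one half at low substrate concentration and approaches one near saturation, which is one way to read "a somewhat poorer substrate" when Vmax is unchanged.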
dictyBase (http://dictybase.org), the model organism database for Dictyostelium discoideum, includes the complete genome sequence and expression data for this organism. Relevant literature is integrated into the database, and gene models and functional annotation are manually curated from experimental results and comparative multigenome analyses. dictyBase has recently expanded to include the genome sequences of three additional Dictyostelids, and has added new software tools to facilitate multigenome comparisons. The Dicty Stock Center, a strain and plasmid repository for Dictyostelium research, relocated to Northwestern University in 2009. This allowed us to integrate all Dictyostelium resources to better serve the research community. In this chapter, we describe how to navigate the website and highlight some of our newer improvements.
Dictyostelium discoideum; database; genomic sequence; multigenome; genome browser; Blast; gene page; functional annotation; strains; phenotypes
Clinical data in Electronic Medical Records (EMRs) is a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics Network or eMERGE investigates whether data captured through routine clinical care using EMRs can identify disease phenotypes with sufficient positive and negative predictive values for use in genome wide association studies (GWAS). Using data from five different sets of EMRs, we have identified five disease phenotypes with positive predictive values of 73–98% and negative predictive values of 98–100%. A majority of EMRs captured key information (diagnoses, medications, laboratory tests) used to define phenotypes in a structured format. We identified natural language processing as an important tool to improve case identification rates. Efforts and incentives to increase the implementation of interoperable EMRs will markedly improve the availability of clinical data for genomics research.
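For readers less familiar with the metrics reported above, positive and negative predictive values can be computed from chart-review counts as in the following minimal sketch. The function name and the example counts are hypothetical and are not taken from the eMERGE data.

```python
def predictive_values(true_pos, false_pos, true_neg, false_neg):
    """Return (PPV, NPV) for a phenotype algorithm checked against chart review.

    PPV = TP / (TP + FP): fraction of algorithm-identified cases that are real cases.
    NPV = TN / (TN + FN): fraction of algorithm-identified controls that are real controls.
    """
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("need at least one predicted case and one predicted control")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical validation sample: 100 predicted cases, 200 predicted controls.
    ppv, npv = predictive_values(true_pos=90, false_pos=10, true_neg=196, false_neg=4)
    print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```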
dictyBase (http://dictybase.org) is the model organism database for the social amoeba Dictyostelium discoideum. This contribution updates previously published descriptions of dictyBase. During the past 3 years, dictyBase has taken significant strides toward becoming a genome portal for the whole Amoebozoa clade. In its latest release, dictyBase has scaled up to host multiple Dictyostelids, including Dictyostelium purpureum [Sucgang, Kuo, Tian, Salerno, Parikh, Feasley, Dalin, Tu, Huang, Barry et al. (2011) (Comparative genomics of the social amoebae Dictyostelium discoideum and Dictyostelium purpureum. Genome Biol., 12, R20)], Dictyostelium fasciculatum and Polysphondylium pallidum [Heidel, Lawal, Felder, Schilde, Helps, Tunggal, Rivero, John, Schleicher, Eichinger et al. (2011) (Phylogeny-wide analysis of social amoeba genomes highlights ancient origins for complex intercellular communication. Genome Res., 21, 1882–1891)]. The new release includes a new Genome Browser with RNAseq expression, interspecies Basic Local Alignment Search Tool (BLAST) alignments and a unified BLAST search for cross-species comparisons.
Previous work from our laboratory showed that the Dictyostelium discoideum SadA protein plays a central role in cell-substrate adhesion. SadA null cells exhibit a loss of adhesion, a disrupted actin cytoskeleton, and a cytokinesis defect. How SadA mediates these phenotypes is unknown. This work addresses the mechanism of SadA function, demonstrating an important role for the C-terminal cytoplasmic tail in SadA function. We found that a SadA tailless mutant was unable to rescue the sadA adhesion deficiency, and overexpression of the SadA tail domain reduced adhesion in wild-type cells. We also show that SadA is closely associated with the actin cytoskeleton. Mutagenesis studies suggested that four serine residues in the tail, S924/S925 and S940/S941, may regulate association of SadA with the actin cytoskeleton. Glutathione S-transferase pull-down assays identified at least one likely interaction partner of the SadA tail, cortexillin I, a known actin bundling protein. Thus, our data demonstrate an important role for the carboxy-terminal cytoplasmic tail in SadA function and strongly suggest that a phosphorylation event in this tail regulates an interaction with cortexillin I. Based on our data, we propose a model for the function of SadA.
The eMERGE (electronic MEdical Records and GEnomics) Network is an NHGRI-supported consortium of five institutions to explore the utility of DNA repositories coupled to Electronic Medical Record (EMR) systems for advancing discovery in genome science. eMERGE also includes a special emphasis on the ethical, legal and social issues related to these endeavors.
The five sites are supported by an Administrative Coordinating Center. Setting of network goals is initiated by working groups: (1) Genomics, (2) Informatics, and (3) Consent & Community Consultation, which also includes active participation by investigators outside the eMERGE funded sites, and (4) Return of Results Oversight Committee. The Steering Committee, composed of site PIs and representatives and NHGRI staff, meets three times per year, once per year with the External Scientific Panel.
The primary site-specific phenotypes for which samples have undergone genome-wide association study (GWAS) genotyping are cataract and HDL, dementia, electrocardiographic QRS duration, peripheral arterial disease, and type 2 diabetes. A GWAS is also being undertaken for resistant hypertension in ≈2,000 additional samples identified across the network sites, to be added to data available for samples already genotyped. Funded by ARRA supplements, secondary phenotypes have been added at all sites to leverage the genotyping data, and hypothyroidism is being analyzed as a cross-network phenotype. Results are being posted in dbGaP. Other key eMERGE activities include evaluation of the issues associated with cross-site deployment of common algorithms to identify cases and controls in EMRs, data privacy of genomic and clinically-derived data, developing approaches for large-scale meta-analysis of GWAS data across five sites, and a community consultation and consent initiative at each site.
Plans are underway to expand the network in diversity of populations and incorporation of GWAS findings into clinical care.
By combining advanced clinical informatics, genome science, and community consultation, eMERGE represents a first step in the development of data-driven approaches to incorporate genomic information into routine healthcare delivery.
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources, and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
dictyBase (http://www.dictybase.org), the model organism database for Dictyostelium, aims to provide the broad biomedical research community with well integrated, high quality data and tools for Dictyostelium discoideum and related species. dictyBase houses the complete genome sequence, ESTs, and the entire body of literature relevant to Dictyostelium. This information is curated to provide accurate gene models and functional annotations, with the goal of fully annotating the genome to provide a ‘reference genome’ in the Amoebozoa clade. We highlight several new features in the present update: (i) new annotations; (ii) improved interface with web 2.0 functionality; (iii) the initial steps towards a genome portal for the Amoebozoa; (iv) ortholog display; and (v) the complete integration of the Dicty Stock Center with dictyBase.
dictyBase (http://dictybase.org) is the model organism database for Dictyostelium discoideum. It houses the complete genome sequence, ESTs and the entire body of literature relevant to Dictyostelium. This information is curated to provide accurate gene models and functional annotations, with the goal of fully annotating the genome. This dictyBase update describes the annotations and features implemented since 2006, including improved strain and phenotype representation, integration of predicted transcriptional regulatory elements, protein domain information, biochemical pathways, improved searching and a wiki tool that allows members of the research community to provide annotations.
Dictyostelium discoideum is a model system for studying many important physiological processes including chemotaxis, phagocytosis, and signal transduction. The recent sequencing of the genome has revealed the presence of over 12,500 protein-coding genes. The model organism database dictyBase hosts the genome sequence as well as a large amount of manually curated information.
We present here an anatomy ontology for Dictyostelium based upon the life cycle of the organism.
Anatomy ontologies are necessary to annotate species-specific events such as phenotypes, and the Dictyostelium anatomy ontology provides an essential tool for curation of the Dictyostelium genome.
xanthusBase is the official model organism database (MOD) for the social bacterium Myxococcus xanthus. In many respects, M. xanthus represents the pioneer model organism (MO) for studying the genetic, biochemical, and mechanistic basis of prokaryotic multicellularity, a topic that has garnered considerable attention due to the significance of biofilms in both basic and applied microbiology research. To facilitate its utility, the design of xanthusBase incorporates open-source software, leveraging the cumulative experience made available through the Generic Model Organism Database (GMOD) project, MediaWiki, and dictyBase, to create a MOD that is both highly useful and easily navigable. In addition, we have incorporated a unique Wikipedia-style curation model which exploits the internet's inherent interactivity, thus enabling M. xanthus and other myxobacterial researchers to contribute directly toward the ongoing genome annotation.
dictyBase is the model organism database (MOD) for the social amoeba Dictyostelium discoideum. The unique biology and phylogenetic position of Dictyostelium offer a great opportunity to gain knowledge of processes not characterized in other organisms. The recent completion of the 34 Mb genome sequence, together with the sizable scientific literature using Dictyostelium as a research organism, provided the necessary tools to create a well-annotated genome. dictyBase has leveraged software developed by the Saccharomyces Genome Database and the Generic Model Organism Database project. This has reduced the time required to develop a full-featured MOD and greatly facilitated our ability to focus on annotation and providing new functionality. We hope that manual curation of the Dictyostelium genome will facilitate the annotation of other genomes.
Dictyostelium discoideum is a powerful and genetically tractable model system used for the study of numerous cellular molecular mechanisms including chemotaxis, phagocytosis and signal transduction. The past 2 years have seen a significant expansion in the scope and accessibility of online resources for Dictyostelium. Recent advances have focused on the development of a new comprehensive online resource called dictyBase (http://dictybase.org). This database not only provides access to genomic data including functional annotation of genes, gene products and chromosomal mapping, but also to extensive biological information such as mutant phenotypes and corresponding reference material. In conjunction with additional sites (http://genome.imb-jena.de/dictyostelium/, http://dictyensembl.bioch.bcm.tmc.edu and http://www.sanger.ac.uk/Projects/D_discoideum/) from the genome sequencing and assembly centers, these improvements have expanded the scope of the Dictyostelium databases making them accessible and useful to any researcher interested in comparative and functional genomics in metazoan organisms.
Approaches with high spatial and temporal resolution are required to understand the regulation of nonmuscle myosin II in vivo. Using fluorescence resonance energy transfer we have produced a novel biosensor allowing simultaneous determination of myosin light chain kinase (MLCK) localization and its [Ca2+]4/calmodulin-binding state in living cells. We observe transient recruitment of diffuse MLCK to stress fibers and its in situ activation before contraction. MLCK is highly active in the lamella of migrating cells, but not at the retracting tail. This unexpected result highlights a potential role for MLCK-mediated myosin contractility in the lamella as a driving force for migration. During cytokinesis, MLCK was enriched at the spindle equator during late metaphase, and was maximally activated just before cleavage furrow constriction. As furrow contraction was completed, active MLCK was redistributed to the poles of the daughter cells. These results show that MLCK is a myosin regulator in the lamella and contractile ring, and pinpoint sites where myosin function may be mediated by other kinases.
myosin light chain kinase; myosin light chains; phosphorylation; cell division; FRET
Cytoplasmic dynein intermediate chain (IC) mediates dynein–dynactin interaction in vitro (Karki, S., and E.L. Holzbaur. 1995. J. Biol. Chem. 270:28806–28811; Vaughan, K.T., and R.B. Vallee. 1995. J. Cell Biol. 131:1507–1516). To investigate the physiological role of IC and dynein–dynactin interaction, we expressed IC truncations in wild-type Dictyostelium cells. ICΔC associated with dynactin but not with dynein heavy chain, whereas ICΔN truncations bound to dynein but bound dynactin poorly. Both mutations resulted in abnormal localization to the Golgi complex, confirming dynein function was disrupted. Striking disorganization of interphase microtubule (MT) networks was observed when mutant expression was induced. In a majority of cells, the MT networks collapsed into large bundles. We also observed cells with multiple cytoplasmic asters and MTs lacking an organizing center. These cells accumulated abnormal DNA content, suggesting a defect in mitosis. Striking defects in centrosome morphology were also observed in IC mutants, mostly larger than normal centrosomes. Ultrastructural analysis of centrosomes in IC mutants showed interphase accumulation of large centrosomes typical of prophase as well as unusually paired centrosomes, suggesting defects in centrosome replication and separation. These results suggest that dynactin-mediated cytoplasmic dynein function is required for the proper organization of interphase MT network as well as centrosome replication and separation in Dictyostelium.
dynein function; intermediate chain; dynein–dynactin interaction; microtubule organization; centrosome replication and separation
Although the potential for genomics to contribute to clinical care has long been anticipated, the pace of defining the risks and benefits of incorporating genomic findings into medical practice has been relatively slow. Several institutions have recently begun genomic medicine programs, encountering many of the same obstacles and developing the same solutions, often independently. Recognizing that successful early experiences can inform subsequent efforts, the National Human Genome Research Institute brought together a number of these groups to describe their ongoing projects and challenges, identify common infrastructure and research needs, and outline an implementation framework for investigating and introducing similar programs elsewhere. Chief among the challenges were limited evidence and consensus on which genomic variants were medically relevant; lack of reimbursement for genomically driven interventions; and burden to patients and clinicians of assaying, reporting, intervening, and following up genomic findings. Key infrastructure needs included an openly accessible knowledge base capturing sequence variants and their phenotypic associations and a framework for defining and cataloging clinically actionable variants. Multiple institutions are actively engaged in using genomic information in clinical care. Much of this work is being done in isolation and would benefit from more structured collaboration and sharing of best practices.
Genet Med 2013:15(4):258–267
medical genomics; practice standards
Electrocardiographic QRS duration, a measure of cardiac intraventricular conduction, varies ~2-fold in individuals without cardiac disease. Slow conduction may promote reentrant arrhythmias.
Methods and Results
We performed a genome-wide association study (GWAS) to identify genomic markers of QRS duration in 5,272 individuals without cardiac disease selected from electronic medical record (EMR) algorithms at five sites in the Electronic Medical Records and Genomics (eMERGE) network. The most significant loci were evaluated within the CHARGE consortium QRS GWAS meta-analysis. Twenty-three single nucleotide polymorphisms in 5 loci, previously described by CHARGE, were replicated in the eMERGE samples; 18 SNPs were in the chromosome 3 SCN5A and SCN10A loci, where the most significant SNPs were rs1805126 in SCN5A with p=1.2×10^−8 (eMERGE) and p=2.5×10^−20 (CHARGE) and rs6795970 in SCN10A with p=6×10^−6 (eMERGE) and p=5×10^−27 (CHARGE). The other loci were in NFIA, near CDKN1A, and near C6orf204. We then performed phenome-wide association studies (PheWAS) on variants in these five loci in 13,859 European Americans to search for diagnoses associated with these markers. PheWAS identified atrial fibrillation and cardiac arrhythmias as the most common associated diagnoses with SCN10A and SCN5A variants. SCN10A variants were also associated with subsequent development of atrial fibrillation and arrhythmia in the original 5,272 “heart-healthy” study population.
We conclude that DNA biobanks coupled to EMRs provide a platform not only for GWAS but may also allow broad interrogation of the longitudinal incidence of disease associated with genetic variants. The PheWAS approach implicated sodium channel variants modulating QRS duration in subjects without cardiac disease as predictors of subsequent arrhythmias.
cardiac conduction; QRS duration; atrial fibrillation; genome-wide association study; phenome-wide association study; electronic medical records
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10^−6 (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
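The abstract above reports a false discovery rate threshold but does not say which procedure produced it. As an illustration of how an FDR cutoff can be applied across many SNP-phenotype tests, the following minimal sketch uses the Benjamini-Hochberg step-up procedure. That choice is an assumption made here for illustration, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return one boolean per input p-value: True if the test is declared significant.

    Benjamini-Hochberg step-up rule: sort the m p-values, find the largest
    rank k with p_(k) <= k * fdr / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for four SNP-phenotype pairs.
    example = [0.001, 0.04, 0.03, 0.5]
    print(benjamini_hochberg(example, fdr=0.1))
```

In a scan of this kind the list would hold one p-value per tested SNP-phenotype pair; the per-test threshold then depends on the rank of each p-value rather than being a single fixed cutoff.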
Type 2 diabetes (T2D) is a complex metabolic disease that disproportionately affects African Americans. Genome-wide association studies (GWAS) have identified several loci that contribute to T2D in European Americans, but few studies have been performed in admixed populations. We first performed a GWAS of 1,563 African Americans from the Vanderbilt Genome-Electronic Records Project and Northwestern University NUgene Project as part of the electronic Medical Records and Genomics (eMERGE) network. We successfully replicate an association in TCF7L2, previously identified by GWAS, in this African American dataset. We were unable to identify novel associations at p<5.0×10^−8 by GWAS. Using admixture mapping as an alternative method for discovery, we performed a genome-wide admixture scan that suggests multiple candidate genes associated with T2D. One finding, TCIRG1, is a T-cell immune regulator expressed in the pancreas and liver that has not been previously implicated in T2D. We performed subsequent fine-mapping to further assess the association between TCIRG1 and T2D in >5,000 African Americans. We identified 13 independent associations between TCIRG1, CHKA, and ALDH3B1 genes on chromosome 11 and T2D. Our results suggest a novel region on chromosome 11 identified by admixture mapping is associated with T2D in African Americans.
The Electronic Medical Records and Genomics (eMERGE) Network is a National Human Genome Research Institute (NHGRI)-funded consortium engaged in the development of methods and best-practices for utilizing the Electronic Medical Record (EMR) as a tool for genomic research. Now in its sixth year, its second funding cycle and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from EMRs can be used successfully for genomic research. Current work is advancing knowledge in multiple disciplines at the intersection of genomics and healthcare informatics, particularly electronic phenotyping, genome-wide association studies, genomic medicine implementation and the ethical and regulatory issues associated with genomics research and returning results to study participants. Here we describe the evolution, accomplishments, opportunities and challenges of the network since its inception as a five-group consortium focused on genotype-phenotype associations for genomic discovery to its current form as a nine-group consortium pivoting towards implementation of genomic medicine.
electronic medical records; personalized medicine; genome-wide association studies; genetics and genomics; collaborative research