Nonmuscle myosin II plays a crucial role in a variety of cellular processes (e.g., polarity formation, cell motility, and cytokinesis). It is composed of two heavy chains, two regulatory light chains and two essential light chains. The ATPase activity of the myosin II motor domain is regulated through phosphorylation of the regulatory light chain (RLC) by myosin light chain kinase. To study myosin function and localization in cellular processes, GFP-fused RLCs are widely used; however, the exact kinetic properties of myosins with bound GFP-RLC are poorly described. More importantly, it has not been shown that a regulatory light chain fused at its N-terminus with GFP can maintain the normal phosphorylation-dependent regulation of nonmuscle myosin or serve as a substrate for myosin light chain kinase. We coexpressed N-terminal GFP-RLC with a heavy meromyosin (HMM)-like fragment of nonmuscle myosin IIA and essential light chain to characterize the phosphorylation dynamics and in vitro kinetic properties of the resulting HMM. Myosin light chain kinase phosphorylates the GFP-RLC bound to HMM IIA with the same Vmax as it does the wild type RLC bound to HMM IIA, but the Km is about twofold higher for the GFP fusion protein, meaning that it is a somewhat poorer substrate. The steady-state actin-activated MgATPase activity of the GFP-RLC HMM is very low in the absence of phosphorylation, demonstrating that the GFP moiety does not prevent formation of the off state. The actin-activated MgATPase activity of phosphorylated GFP-RLC-HMM is about half that of wild type phosphorylated HMM. The ability of phosphorylated GFP-RLC-HMM to move actin filaments in the actin gliding assay is also slightly compromised. These data indicate that, despite some kinetic differences, the N-terminal GFP fusion to the regulatory light chain is a reasonable model system for studying myosin function in vivo.
GFP; Nonmuscle myosin; Regulatory light chain; Enzymatic activity; In vitro motility
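The kinetic comparison in the abstract above (same Vmax, roughly twofold higher Km) can be illustrated with a minimal sketch. It assumes simple Michaelis-Menten kinetics; the numerical values of `VMAX` and `KM_WT` are hypothetical placeholders in arbitrary units, not measured values from the study.

```python
def michaelis_menten_rate(substrate: float, vmax: float, km: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    if substrate < 0 or km <= 0:
        raise ValueError("substrate must be >= 0 and km must be > 0")
    return vmax * substrate / (km + substrate)


# Hypothetical parameters in arbitrary units (assumptions for illustration only).
VMAX = 1.0            # identical for both substrates, as reported
KM_WT = 1.0           # placeholder Km for wild type RLC-HMM
KM_GFP = 2.0 * KM_WT  # Km about twofold higher for GFP-RLC-HMM


def relative_rate(substrate: float) -> float:
    """Rate for the GFP fusion divided by the wild type rate at the same [S]."""
    if substrate <= 0:
        raise ValueError("substrate must be > 0 to form a ratio")
    return (michaelis_menten_rate(substrate, VMAX, KM_GFP)
            / michaelis_menten_rate(substrate, VMAX, KM_WT))


if __name__ == "__main__":
    for s in (0.1, 1.0, 10.0, 100.0):
        print(f"[S] = {s:6.1f}  GFP/WT rate ratio = {relative_rate(s):.3f}")
```

Under these assumptions the ratio approaches 0.5 at low substrate concentration and 1.0 at saturation, which is one way to read "a somewhat poorer substrate" when Vmax is unchanged.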
dictyBase (http://dictybase.org), the model organism database for Dictyostelium discoideum, includes the complete genome sequence and expression data for this organism. Relevant literature is integrated into the database, and gene models and functional annotation are manually curated from experimental results and comparative multigenome analyses. dictyBase has recently expanded to include the genome sequences of three additional Dictyostelids, and has added new software tools to facilitate multigenome comparisons. The Dicty Stock Center, a strain and plasmid repository for Dictyostelium research, relocated to Northwestern University in 2009. This allowed us to integrate all Dictyostelium resources to better serve the research community. In this chapter, we describe how to navigate the website and highlight some of our newer improvements.
Dictyostelium discoideum; database; genomic sequence; multigenome; genome browser; Blast; gene page; functional annotation; strains; phenotypes
Clinical data in Electronic Medical Records (EMRs) is a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics Network or eMERGE investigates whether data captured through routine clinical care using EMRs can identify disease phenotypes with sufficient positive and negative predictive values for use in genome wide association studies (GWAS). Using data from five different sets of EMRs, we have identified five disease phenotypes with positive predictive values of 73–98% and negative predictive values of 98–100%. A majority of EMRs captured key information (diagnoses, medications, laboratory tests) used to define phenotypes in a structured format. We identified natural language processing as an important tool to improve case identification rates. Efforts and incentives to increase the implementation of interoperable EMRs will markedly improve the availability of clinical data for genomics research.
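As a reminder of how the predictive values quoted above are defined, the following minimal sketch computes them from the counts of a chart-review validation. The function name and the example counts are hypothetical and are not taken from the study.

```python
from typing import Tuple


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): fraction of algorithm-identified cases that are true cases.
    NPV = TN / (TN + FN): fraction of algorithm-identified controls that are true controls.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one predicted case and one predicted control")
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Hypothetical validation counts, for illustration only.
    ppv, npv = predictive_values(tp=98, fp=2, tn=100, fn=0)
    print(f"PPV = {ppv:.2%}, NPV = {npv:.2%}")
```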
Little is known about cell–substrate adhesion and how motile and adhesive forces work together in moving cells. The ability to rapidly screen a large number of insertional mutants prompted us to perform a genetic screen in Dictyostelium to isolate adhesion-deficient mutants. The resulting substrate adhesion–deficient (sad) mutants grew in plastic dishes without attaching to the substrate. The cells were often larger than their wild-type parents and displayed a rough surface with many apparent blebs. One of these mutants, sadA−, completely lacked substrate adhesion in growth medium. The sadA− mutant also showed slightly impaired cytokinesis, an aberrant F-actin organization, and a phagocytosis defect. Deletion of the sadA gene by homologous recombination recreated the original mutant phenotype. Expression of sadA–GFP in sadA-null cells restored the wild-type phenotype. In sadA–GFP-rescued mutant cells, sadA–GFP localized to the cell surface, appropriate for an adhesion molecule. SadA contains nine putative transmembrane domains and three conserved EGF-like repeats in a predicted extracellular domain. The EGF repeats are similar to corresponding regions in proteins known to be involved in adhesion, such as tenascins and integrins. Our data combined suggest that sadA is the first substrate adhesion receptor to be identified in Dictyostelium.
Dictyostelium; cell–substrate adhesion; EGF-like repeats; phagocytosis; cytokinesis
dictyBase (http://dictybase.org) is the model organism database for the social amoeba Dictyostelium discoideum. This contribution provides an update on dictyBase that has been previously presented. During the past 3 years, dictyBase has taken significant strides toward becoming a genome portal for the whole Amoebozoa clade. In its latest release, dictyBase has scaled up to host multiple Dictyostelids, including Dictyostelium purpureum [Sucgang, Kuo, Tian, Salerno, Parikh, Feasley, Dalin, Tu, Huang, Barry et al.(2011) (Comparative genomics of the social amoebae Dictyostelium discoideum and Dictyostelium purpureum. Genome Biol., 12, R20)], Dictyostelium fasciculatum and Polysphondylium pallidum [Heidel, Lawal, Felder, Schilde, Helps, Tunggal, Rivero, John, Schleicher, Eichinger et al. (2011) (Phylogeny-wide analysis of social amoeba genomes highlights ancient origins for complex intercellular communication. Genome Res., 21, 1882–1891)]. The new release includes a new Genome Browser with RNAseq expression, interspecies Basic Local Alignment Search Tool alignments and a unified Basic Local Alignment Search Tool search for cross-species comparisons.
Previous work from our laboratory showed that the Dictyostelium discoideum SadA protein plays a central role in cell-substrate adhesion. SadA null cells exhibit a loss of adhesion, a disrupted actin cytoskeleton, and a cytokinesis defect. How SadA mediates these phenotypes is unknown. This work addresses the mechanism of SadA function, demonstrating an important role for the C-terminal cytoplasmic tail in SadA function. We found that a SadA tailless mutant was unable to rescue the sadA adhesion deficiency, and overexpression of the SadA tail domain reduced adhesion in wild-type cells. We also show that SadA is closely associated with the actin cytoskeleton. Mutagenesis studies suggested that four serine residues in the tail, S924/S925 and S940/S941, may regulate association of SadA with the actin cytoskeleton. Glutathione S-transferase pull-down assays identified at least one likely interaction partner of the SadA tail, cortexillin I, a known actin bundling protein. Thus, our data demonstrate an important role for the carboxy-terminal cytoplasmic tail in SadA function and strongly suggest that a phosphorylation event in this tail regulates an interaction with cortexillin I. Based on our data, we propose a model for the function of SadA.
The eMERGE (electronic MEdical Records and GEnomics) Network is an NHGRI-supported consortium of five institutions to explore the utility of DNA repositories coupled to Electronic Medical Record (EMR) systems for advancing discovery in genome science. eMERGE also includes a special emphasis on the ethical, legal and social issues related to these endeavors.
The five sites are supported by an Administrative Coordinating Center. Setting of network goals is initiated by working groups: (1) Genomics; (2) Informatics; (3) Consent & Community Consultation, which also includes active participation by investigators outside the eMERGE-funded sites; and (4) the Return of Results Oversight Committee. The Steering Committee, composed of site PIs and representatives and NHGRI staff, meets three times per year, once per year with the External Scientific Panel.
The primary site-specific phenotypes for which samples have undergone genome-wide association study (GWAS) genotyping are cataract and HDL, dementia, electrocardiographic QRS duration, peripheral arterial disease, and type 2 diabetes. A GWAS is also being undertaken for resistant hypertension in ≈2,000 additional samples identified across the network sites, to be added to data available for samples already genotyped. Funded by ARRA supplements, secondary phenotypes have been added at all sites to leverage the genotyping data, and hypothyroidism is being analyzed as a cross-network phenotype. Results are being posted in dbGaP. Other key eMERGE activities include evaluation of the issues associated with cross-site deployment of common algorithms to identify cases and controls in EMRs, data privacy of genomic and clinically-derived data, developing approaches for large-scale meta-analysis of GWAS data across five sites, and a community consultation and consent initiative at each site.
Plans are underway to expand the network in diversity of populations and incorporation of GWAS findings into clinical care.
By combining advanced clinical informatics, genome science, and community consultation, eMERGE represents a first step in the development of data-driven approaches to incorporate genomic information into routine healthcare delivery.
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources, and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
dictyBase (http://www.dictybase.org), the model organism database for Dictyostelium, aims to provide the broad biomedical research community with well integrated, high quality data and tools for Dictyostelium discoideum and related species. dictyBase houses the complete genome sequence, ESTs, and the entire body of literature relevant to Dictyostelium. This information is curated to provide accurate gene models and functional annotations, with the goal of fully annotating the genome to provide a ‘reference genome’ in the Amoebozoa clade. We highlight several new features in the present update: (i) new annotations; (ii) improved interface with web 2.0 functionality; (iii) the initial steps towards a genome portal for the Amoebozoa; (iv) ortholog display; and (v) the complete integration of the Dicty Stock Center with dictyBase.
dictyBase (http://dictybase.org) is the model organism database for Dictyostelium discoideum. It houses the complete genome sequence, ESTs and the entire body of literature relevant to Dictyostelium. This information is curated to provide accurate gene models and functional annotations, with the goal of fully annotating the genome. This dictyBase update describes the annotations and features implemented since 2006, including improved strain and phenotype representation, integration of predicted transcriptional regulatory elements, protein domain information, biochemical pathways, improved searching and a wiki tool that allows members of the research community to provide annotations.
Dictyostelium discoideum is a model system for studying many important physiological processes including chemotaxis, phagocytosis, and signal transduction. The recent sequencing of the genome has revealed the presence of over 12,500 protein-coding genes. The model organism database dictyBase hosts the genome sequence as well as a large amount of manually curated information.
We present here an anatomy ontology for Dictyostelium based upon the life cycle of the organism.
Anatomy ontologies are necessary to annotate species-specific events such as phenotypes, and the Dictyostelium anatomy ontology provides an essential tool for curation of the Dictyostelium genome.
xanthusBase is the official model organism database (MOD) for the social bacterium Myxococcus xanthus. In many respects, M. xanthus represents the pioneer model organism (MO) for studying the genetic, biochemical, and mechanistic basis of prokaryotic multicellularity, a topic that has garnered considerable attention due to the significance of biofilms in both basic and applied microbiology research. To facilitate its utility, the design of xanthusBase incorporates open-source software, leveraging the cumulative experience made available through the Generic Model Organism Database (GMOD) project, MediaWiki, and dictyBase, to create a MOD that is both highly useful and easily navigable. In addition, we have incorporated a unique Wikipedia-style curation model which exploits the internet's inherent interactivity, thus enabling M. xanthus and other myxobacterial researchers to contribute directly toward the ongoing genome annotation.
dictyBase is the model organism database (MOD) for the social amoeba Dictyostelium discoideum. The unique biology and phylogenetic position of Dictyostelium offer a great opportunity to gain knowledge of processes not characterized in other organisms. The recent completion of the 34 Mb genome sequence, together with the sizable scientific literature using Dictyostelium as a research organism, provided the necessary tools to create a well-annotated genome. dictyBase has leveraged software developed by the Saccharomyces Genome Database and the Generic Model Organism Database project. This has reduced the time required to develop a full-featured MOD and greatly facilitated our ability to focus on annotation and providing new functionality. We hope that manual curation of the Dictyostelium genome will facilitate the annotation of other genomes.
Dictyostelium discoideum is a powerful and genetically tractable model system used for the study of numerous cellular molecular mechanisms including chemotaxis, phagocytosis and signal transduction. The past 2 years have seen a significant expansion in the scope and accessibility of online resources for Dictyostelium. Recent advances have focused on the development of a new comprehensive online resource called dictyBase (http://dictybase.org). This database not only provides access to genomic data including functional annotation of genes, gene products and chromosomal mapping, but also to extensive biological information such as mutant phenotypes and corresponding reference material. In conjunction with additional sites (http://genome.imb-jena.de/dictyostelium/, http://dictyensembl.bioch.bcm.tmc.edu and http://www.sanger.ac.uk/Projects/D_discoideum/) from the genome sequencing and assembly centers, these improvements have expanded the scope of the Dictyostelium databases making them accessible and useful to any researcher interested in comparative and functional genomics in metazoan organisms.
Approaches with high spatial and temporal resolution are required to understand the regulation of nonmuscle myosin II in vivo. Using fluorescence resonance energy transfer we have produced a novel biosensor allowing simultaneous determination of myosin light chain kinase (MLCK) localization and its [Ca2+]4/calmodulin-binding state in living cells. We observe transient recruitment of diffuse MLCK to stress fibers and its in situ activation before contraction. MLCK is highly active in the lamella of migrating cells, but not at the retracting tail. This unexpected result highlights a potential role for MLCK-mediated myosin contractility in the lamella as a driving force for migration. During cytokinesis, MLCK was enriched at the spindle equator during late metaphase, and was maximally activated just before cleavage furrow constriction. As furrow contraction was completed, active MLCK was redistributed to the poles of the daughter cells. These results show that MLCK is a myosin regulator in the lamella and contractile ring, and pinpoint sites where myosin function may be mediated by other kinases.
myosin light chain kinase; myosin light chains; phosphorylation; cell division; FRET
Cytoplasmic dynein intermediate chain (IC) mediates dynein–dynactin interaction in vitro (Karki, S., and E.L. Holzbaur. 1995. J. Biol. Chem. 270:28806–28811; Vaughan, K.T., and R.B. Vallee. 1995. J. Cell Biol. 131:1507–1516). To investigate the physiological role of IC and dynein–dynactin interaction, we expressed IC truncations in wild-type Dictyostelium cells. ICΔC associated with dynactin but not with dynein heavy chain, whereas ICΔN truncations bound to dynein but bound dynactin poorly. Both mutations resulted in abnormal localization to the Golgi complex, confirming dynein function was disrupted. Striking disorganization of interphase microtubule (MT) networks was observed when mutant expression was induced. In a majority of cells, the MT networks collapsed into large bundles. We also observed cells with multiple cytoplasmic asters and MTs lacking an organizing center. These cells accumulated abnormal DNA content, suggesting a defect in mitosis. Striking defects in centrosome morphology were also observed in IC mutants, mostly larger than normal centrosomes. Ultrastructural analysis of centrosomes in IC mutants showed interphase accumulation of large centrosomes typical of prophase as well as unusually paired centrosomes, suggesting defects in centrosome replication and separation. These results suggest that dynactin-mediated cytoplasmic dynein function is required for the proper organization of interphase MT network as well as centrosome replication and separation in Dictyostelium.
dynein function; intermediate chain; dynein–dynactin interaction; microtubule organization; centrosome replication and separation
The Electronic Medical Records and Genomics (eMERGE) Network is a National Human Genome Research Institute (NHGRI)-funded consortium engaged in the development of methods and best-practices for utilizing the Electronic Medical Record (EMR) as a tool for genomic research. Now in its sixth year, its second funding cycle and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from EMRs can be used successfully for genomic research. Current work is advancing knowledge in multiple disciplines at the intersection of genomics and healthcare informatics, particularly electronic phenotyping, genome-wide association studies, genomic medicine implementation and the ethical and regulatory issues associated with genomics research and returning results to study participants. Here we describe the evolution, accomplishments, opportunities and challenges of the network since its inception as a five-group consortium focused on genotype-phenotype associations for genomic discovery to its current form as a nine-group consortium pivoting towards implementation of genomic medicine.
electronic medical records; personalized medicine; genome-wide association studies; genetics and genomics; collaborative research
Only one LDL-C GWAS has been reported in African Americans. We performed a GWAS of LDL-C in African Americans using data extracted from electronic medical records (EMR) in the eMERGE network. African Americans were genotyped on the Illumina 1M chip. All LDL-C measurements, prescriptions, and diagnoses of concomitant disease were extracted from EMR. We created two analytic datasets: one dataset having median LDL-C calculated after the exclusion of some lab values based on co-morbidities and medication (n = 618) and another dataset having median LDL-C calculated without any exclusions (n = 1249). The SNP rs7412 in APOE was strongly associated with LDL-C at levels of GWAS significance in both datasets (p < 5 × 10⁻⁸). In the dataset with exclusions, a decrease of 20.0 mg/dl per minor allele was observed. The effect size was attenuated (12.3 mg/dl) in the dataset without any lab values excluded. Although other signals in APOE have been detected in previous GWAS, this large and important SNP association has not been well detected in large GWAS because rs7412 was not included on many genotyping arrays. Use of median LDL-C extracted from EMR after exclusions for medications and co-morbidities increased the percentage of trait variance explained by genetic variation.
GWAS; LDL; electronic medical records
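The two analytic datasets described above differ only in whether some lab values are dropped before the per-person median is taken. A minimal sketch of that step follows; the `LdlMeasurement` type and its `on_lipid_lowering_drug` flag are hypothetical stand-ins for the study's medication and co-morbidity exclusion rules, which are not specified here.

```python
from statistics import median
from typing import List, NamedTuple, Optional


class LdlMeasurement(NamedTuple):
    value_mg_dl: float
    # Hypothetical flag standing in for the study's exclusion criteria.
    on_lipid_lowering_drug: bool


def median_ldl(measurements: List[LdlMeasurement],
               apply_exclusions: bool) -> Optional[float]:
    """Median LDL-C for one person, optionally dropping flagged lab values.

    Returns None when no measurement survives, i.e. the person would drop
    out of the dataset built with exclusions.
    """
    values = [m.value_mg_dl for m in measurements
              if not (apply_exclusions and m.on_lipid_lowering_drug)]
    return median(values) if values else None


if __name__ == "__main__":
    # Hypothetical lab history for a single person.
    history = [
        LdlMeasurement(160.0, False),
        LdlMeasurement(150.0, False),
        LdlMeasurement(155.0, False),
        LdlMeasurement(100.0, True),
        LdlMeasurement(95.0, True),
    ]
    print("without exclusions:", median_ldl(history, apply_exclusions=False))
    print("with exclusions:   ", median_ldl(history, apply_exclusions=True))
```

People whose every measurement is excluded return `None`, which is consistent with the dataset with exclusions being smaller than the one without.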
Genome-wide association studies (GWAS) require high specificity and large numbers of subjects to identify genotype–phenotype correlations accurately. The aim of this study was to identify type 2 diabetes (T2D) cases and controls for a GWAS, using data captured through routine clinical care across five institutions using different electronic medical record (EMR) systems.
Materials and Methods
An algorithm was developed to identify T2D cases and controls based on a combination of diagnoses, medications, and laboratory results. The performance of the algorithm was validated at three of the five participating institutions compared against clinician review. A GWAS was subsequently performed using cases and controls identified by the algorithm, with samples pooled across all five institutions.
The algorithm achieved 98% and 100% positive predictive values for the identification of diabetic cases and controls, respectively, as compared against clinician review. By standardizing and applying the algorithm across institutions, 3353 cases and 3352 controls were identified. Subsequent GWAS using data from five institutions replicated the TCF7L2 gene variant (rs7903146) previously associated with T2D.
By applying stringent criteria to EMR data collected through routine clinical care, cases and controls for a GWAS were identified that subsequently replicated a known genetic variant. The use of standard terminologies to define data elements enabled pooling of subjects and data across five different institutions to achieve the robust numbers required for GWAS.
An algorithm using commonly available data from five different EMR systems can accurately identify T2D cases and controls for genetic study across multiple institutions.
Analytics; application of biological knowledge to clinical care; bioinformatics; biomedical informatics; clinical phenotyping; controlled terminologies and vocabularies; data mining; EHR; EMR secondary and meaningful use; genetic epidemiology; genetics; genome-wide association studies; genomics; HIT data standards; improving the education and skills training of health professionals; infection control; information retrieval; knowledge representations; linking the genotype and phenotype; medical informatics; modeling; natural-language processing; ontologies; pharmacogenomics; phenotyping; reuseability; translational research
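The abstract above describes an algorithm that combines diagnoses, medications, and laboratory results to label T2D cases and controls. The sketch below shows the general shape of such a rule-based classifier. It is not the published algorithm: the vocabularies, the HbA1c cut-off of 6.5%, and the decision rules are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative vocabularies and threshold; all are assumptions,
# not the criteria used in the study.
T2D_DIAGNOSES = {"type 2 diabetes"}
T2D_MEDICATIONS = {"metformin"}
HBA1C_THRESHOLD_PERCENT = 6.5


@dataclass
class PatientRecord:
    diagnoses: List[str] = field(default_factory=list)
    medications: List[str] = field(default_factory=list)
    hba1c_percent: List[float] = field(default_factory=list)


def classify(record: PatientRecord) -> str:
    """Return 'case', 'control', or 'excluded' for one patient record."""
    has_diagnosis = any(d in T2D_DIAGNOSES for d in record.diagnoses)
    has_medication = any(m in T2D_MEDICATIONS for m in record.medications)
    has_abnormal_lab = any(v >= HBA1C_THRESHOLD_PERCENT
                           for v in record.hba1c_percent)

    # Case: a diagnosis supported by at least one other line of evidence.
    if has_diagnosis and (has_medication or has_abnormal_lab):
        return "case"
    # Control: no evidence of T2D and at least one (normal) lab on file.
    no_evidence = not (has_diagnosis or has_medication or has_abnormal_lab)
    if no_evidence and record.hba1c_percent:
        return "control"
    # Everything ambiguous is left out rather than risk misclassification.
    return "excluded"


if __name__ == "__main__":
    print(classify(PatientRecord(["type 2 diabetes"], ["metformin"], [7.2])))
    print(classify(PatientRecord([], [], [5.1])))
    print(classify(PatientRecord(["type 2 diabetes"], [], [])))
```

Requiring corroborating evidence for cases and excluding ambiguous records trades sample size for specificity, in the spirit of the "stringent criteria" mentioned above.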
The social amoebae (Dictyostelia) are a diverse group of Amoebozoa that achieve multicellularity by aggregation and undergo morphogenesis into fruiting bodies with terminally differentiated spores and stalk cells. There are four groups of dictyostelids, with the most derived being a group that contains the model species Dictyostelium discoideum.
We have produced a draft genome sequence of another group dictyostelid, Dictyostelium purpureum, and compare it to the D. discoideum genome. The assembly (8.41 × coverage) comprises 799 scaffolds totaling 33.0 Mb, comparable to the D. discoideum genome size. Sequence comparisons suggest that these two dictyostelids shared a common ancestor approximately 400 million years ago. In spite of this divergence, most orthologs reside in small clusters of conserved synteny. Comparative analyses revealed a core set of orthologous genes that illuminate dictyostelid physiology, as well as differences in gene family content. Interesting patterns of gene conservation and divergence are also evident, suggesting function differences; some protein families, such as the histidine kinases, have undergone little functional change, whereas others, such as the polyketide synthases, have undergone extensive diversification. The abundant amino acid homopolymers encoded in both genomes are generally not found in homologous positions within proteins, so they are unlikely to derive from ancestral DNA triplet repeats. Genes involved in the social stage evolved more rapidly than others, consistent with either relaxed selection or accelerated evolution due to social conflict.
The findings from this new genome sequence and comparative analysis shed light on the biology and evolution of the Dictyostelia.
Background
Vast sample sizes are often essential in the quest to disentangle the complex interplay of the genetic, lifestyle, environmental and social factors that determine the aetiology and progression of chronic diseases. The pooling of information between studies is therefore of central importance to contemporary bioscience. However, there are many technical, ethico-legal and scientific challenges to be overcome if an effective, valid, pooled analysis is to be achieved. Perhaps most critically, any data that are to be analysed in this way must be adequately ‘harmonized’. This implies that the collection and recording of information and data must be done in a manner that is sufficiently similar in the different studies to allow valid synthesis to take place.
Methods
This conceptual article describes the origins, purpose and scientific foundations of the DataSHaPER (DataSchema and Harmonization Platform for Epidemiological Research; http://www.datashaper.org), which has been created by a multidisciplinary consortium of experts that was pulled together and coordinated by three international organizations: P3G (Public Population Project in Genomics), PHOEBE (Promoting Harmonization of Epidemiological Biobanks in Europe) and CPT (Canadian Partnership for Tomorrow Project).
Results
The DataSHaPER provides a flexible, structured approach to the harmonization and pooling of information between studies. Its two primary components, the ‘DataSchema’ and ‘Harmonization Platforms’, together support the preparation of effective data-collection protocols and provide a central reference to facilitate harmonization. The DataSHaPER supports both ‘prospective’ and ‘retrospective’ harmonization.
Conclusion
It is hoped that this article will encourage readers to investigate the project further: the more the research groups and studies are actively involved, the more effective the DataSHaPER programme will ultimately be.
Data synthesis; data quality; data pooling; harmonization; meta-analysis; DataSHaPER; prospective harmonization; retrospective harmonization
The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases.
We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations.
The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease-containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of the human genome.
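For reference, recall and precision of a set of predicted gene-disease annotations against a curated reference set can be computed as in the minimal sketch below. The gene and disease identifiers are placeholders, not entries from GeneRIF, OMIM, or the validation collection.

```python
from typing import Set, Tuple

Annotation = Tuple[str, str]  # (gene identifier, disease term)


def precision_recall(predicted: Set[Annotation],
                     reference: Set[Annotation]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted annotations against a reference.

    precision = correct predictions / all predictions
    recall    = correct predictions / all reference annotations
    """
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Placeholder annotations for illustration only.
    predicted = {("GENE_A", "disease_1"), ("GENE_B", "disease_2"),
                 ("GENE_C", "disease_3")}
    reference = {("GENE_A", "disease_1"), ("GENE_B", "disease_2"),
                 ("GENE_D", "disease_4"), ("GENE_E", "disease_5")}
    precision, recall = precision_recall(predicted, reference)
    print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

A source can have high precision and low recall at the same time, as reported above for OMIM: nearly everything it annotates is correct, but it covers only a small share of the reference annotations.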