The concept of topographic mapping is central to the understanding of the visual system at many levels, from the developmental to the computational. It is important to be able to relate different coordinate systems, e.g. maps of the visual field and maps of the retina. Retinal maps are frequently based on flat-mount preparations. These use dissection and relaxing cuts to render the quasi-spherical retina into a 2D preparation. The variable nature of relaxing cuts and associated tears limits quantitative cross-animal comparisons. We present an algorithm, “Retistruct,” that reconstructs retinal flat-mounts by mapping them into a standard, spherical retinal space. This is achieved by: stitching the marked-up cuts of the flat-mount outline; dividing the stitched outline into a mesh whose vertices are then mapped onto a curtailed sphere; and finally moving the vertices so as to minimise a physically-inspired deformation energy function. Our validation studies indicate that the algorithm can estimate the position of a point on the intact adult retina to within 8° of arc (3.6% of the nasotemporal axis). The coordinates in reconstructed retinae can be transformed to visuotopic coordinates. We use Retistruct to investigate the organisation of the adult mouse visual system. We orient the retina relative to the nictitating membrane and compare this orientation to eye muscle insertions. To align the retinotopic and visuotopic coordinate systems in the mouse, we utilised the geometry of binocular vision. In standard retinal space, the composite decussation line for the uncrossed retinal projection is located 64° away from the retinal pole. Projecting anatomically defined uncrossed retinal projections into visual space gives binocular congruence if the optical axis of the mouse eye is oriented at 64° azimuth and 22° elevation, in concordance with previous results. Moreover, using these coordinates, the dorsoventral boundary for S-opsin-expressing cones closely matches the horizontal meridian.
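The final reconstruction step minimises a deformation energy over the mesh. The abstract does not give its functional form, so the sketch below assumes one common physically inspired choice: each mesh edge acts like a spring whose energy grows with the squared relative change between its length in the flat-mount and its great-circle length on the sphere. The function names and this particular penalty are illustrative assumptions, not Retistruct's implementation, and the optimiser that would move the spherical vertices to reduce the energy is omitted.

```python
import math

def great_circle_length(p, q, radius=1.0):
    """Arc length between two points given as (latitude, longitude) in radians."""
    lat1, lon1 = p
    lat2, lon2 = q
    # Spherical law of cosines; clamp to [-1, 1] to guard against rounding error.
    c = (math.sin(lat1) * math.sin(lat2)
         + math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    return radius * math.acos(max(-1.0, min(1.0, c)))

def deformation_energy(edges, flat_xy, sphere_latlon, radius=1.0):
    """Sum over mesh edges of the squared relative change in edge length.

    edges: list of (i, j) vertex-index pairs
    flat_xy: (x, y) vertex positions in the flat-mount
    sphere_latlon: (lat, lon) vertex positions on the sphere, in radians
    """
    energy = 0.0
    for i, j in edges:
        l_flat = math.dist(flat_xy[i], flat_xy[j])
        l_sphere = great_circle_length(sphere_latlon[i], sphere_latlon[j], radius)
        energy += ((l_sphere - l_flat) / l_flat) ** 2
    return energy
```

An edge whose spherical length equals its flat length contributes nothing; stretching or compressing it by a given fraction costs the same regardless of the edge's absolute size.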
How should funding agencies enable researchers to explore high-risk but potentially high-reward science? One model that appears to work is the NSF-funded synthesis center, an incubator for community-led, innovative science.
The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available for the community to update with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity.
Building on the planning efforts of the RCN4GSC project, a workshop was convened in San Diego to bring together experts from genomics and metagenomics, biodiversity, ecology, and bioinformatics with the charge to identify potential for positive interactions and progress, especially building on successes at establishing data standards by the GSC and by the biodiversity and ecological communities. Until recently, the contribution of microbial life to the biomass and biodiversity of the biosphere was largely overlooked (because it was resistant to systematic study). Now, emerging genomic and metagenomic tools are making investigation possible. Initial research findings suggest that major advances are in the offing. Although different research communities share some overlapping concepts and traditions, they differ significantly in sampling approaches, vocabularies and workflows. Likewise, their definitions of ‘fitness for use’ for data differ significantly, as this concept stems from the specific research questions of most importance in the different fields. Nevertheless, there is little doubt that there is much to be gained from greater coordination and integration. As a first step toward interoperability of the information systems used by the different communities, participants agreed to conduct a case study on two of the leading data standards from the two formerly disparate fields: (a) GSC’s standard checklists for genomics and metagenomics and (b) TDWG’s Darwin Core standard, used primarily in taxonomy and systematic biology.
We have created a statistically grounded tool for determining the correlation of genome-wide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets rather than to provide immediate answers. The software enables several biologically motivated approaches to these data, and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and, instead of running analyses with predefined metrics, computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets.
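To make the idea of a spatial-association statistic concrete, here is a minimal Python sketch of a relative-distance measure between two sets of point positions on one chromosome. It assumes the statistic compares each query point with its two flanking reference points; it is written for illustration only, is not the GenometriCorr R API, handles points rather than intervals, and the function name is hypothetical.

```python
import bisect

def relative_distances(query, reference):
    """For each query position lying between two reference positions, return
    the distance to the nearer flanking reference point divided by the gap
    between the two flanking points. Values fall in [0, 0.5]."""
    ref = sorted(reference)
    result = []
    for q in query:
        k = bisect.bisect_left(ref, q)  # ref[k-1] < q <= ref[k]
        if k == 0 or k == len(ref):
            continue  # outside the span of reference points: no flanking pair
        left, right = ref[k - 1], ref[k]
        result.append(min(q - left, right - q) / (right - left))
    return result
```

If the two sets were positioned independently, these values would be expected to spread roughly uniformly over [0, 0.5]; an excess near 0 would suggest that query points sit closer to reference points than chance alone predicts.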
Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor.
Academic researchers and many in industry often lack the financial resources available to scientists working in “big pharma.” High costs include those associated with high-throughput screening and chemical synthesis. To address these challenges, many researchers have in part turned to alternative methodologies. Virtual screening, for example, often substitutes for high-throughput screening, and click chemistry makes chemical synthesis fast, cheap, and comparatively easy. Though both in silico screening and click chemistry seek to make drug discovery more feasible, it is not yet routine to couple these two methodologies. We here present a novel computer algorithm, called AutoClickChem, capable of performing many click-chemistry reactions in silico. AutoClickChem can be used to produce large combinatorial libraries of compound models for use in virtual screens. As the compounds of these libraries are constructed according to the reactions of click chemistry, they can be easily synthesized for subsequent testing in biochemical assays. Additionally, in silico modeling of click-chemistry products may prove useful in rational drug design and drug optimization. AutoClickChem is based on the pymolecule toolbox, a framework that may facilitate the development of future Python-based programs that require the manipulation of molecular models. Both the pymolecule toolbox and AutoClickChem are released under the GNU General Public License version 3 and are available for download from http://autoclickchem.ucsd.edu.
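As a rough illustration of why click-chemistry libraries grow quickly, the sketch below pairs every azide with every alkyne building block, mirroring an azide-alkyne cycloaddition, and records only product labels. The building-block names and the function are hypothetical; AutoClickChem itself builds molecular models of the products, which this sketch does not attempt.

```python
from itertools import product

def enumerate_triazole_library(azides, alkynes):
    """Pair every azide with every alkyne building block, as in an
    azide-alkyne cycloaddition, and return a label for each hypothetical
    triazole product. Only the pairing is tracked, not any structure."""
    return [f"triazole({az}+{al})" for az, al in product(azides, alkynes)]
```

The library size is the product of the two input list sizes, so 1,000 azides and 1,000 alkynes would yield 1,000,000 candidate products.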
Chronic fatigue syndrome/myalgic encephalomyelitis (CFS/ME) is a severely debilitating disease of unknown pathogenesis characterized by a variety of symptoms, including severe fatigue. The objective of the study was to examine the efficacy and safety of a TLR-3 agonist, rintatolimod (Poly I:C12U), in patients with debilitating CFS/ME.
Methods and Findings
A Phase III prospective, double-blind, randomized, placebo-controlled trial comparing twice-weekly IV rintatolimod with placebo was conducted in 234 subjects with long-standing, debilitating CFS/ME at 12 sites. The primary endpoint was the intra-patient change from baseline at Week 40 in exercise tolerance (ET). Secondary endpoints included concomitant drug usage, the Karnofsky Performance Score (KPS), Activities of Daily Living (ADL), and Vitality Score (SF-36). In an intention-to-treat analysis, subjects receiving rintatolimod for 40 weeks showed an intra-patient, placebo-adjusted improvement in ET of 21.3% from baseline (p = 0.047). Correction for subjects with reduced dosing compliance increased the placebo-adjusted ET improvement to 28% (p = 0.022). The observed improvement represents approximately twice the minimum considered medically significant by regulatory agencies. Compared with placebo, the rintatolimod cohort also reduced its dependence on drugs commonly used by patients in an attempt to alleviate the symptoms of CFS/ME (p = 0.048). Placebo subjects who crossed over to receive rintatolimod demonstrated an intra-patient improvement in ET performance of 39% at 24 weeks (p = 0.04). Rintatolimod at 400 mg twice weekly was generally well tolerated.
Rintatolimod produced objective improvement in ET and a reduction in CFS/ME-related concomitant medication usage, as well as improvements in other secondary outcomes.
In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages used commonly in evolutionary informatics, and (3) input–output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.
Data standards; evolutionary informatics; interoperability; phyloinformatics; semantic web; syntax format
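For readers unfamiliar with the format, the following sketch uses Python's standard `xml.etree.ElementTree` to write a tiny NeXML-style document: an OTU block and a star tree whose leaf nodes point to those OTUs. The element and attribute names reflect our reading of NeXML examples (namespace `http://www.nexml.org/2009`, `xsi:type="nex:FloatTree"`, version `0.9`), and the output has not been validated against the NeXML schema; the function name and identifier scheme are hypothetical.

```python
import xml.etree.ElementTree as ET

NEX = "http://www.nexml.org/2009"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("nex", NEX)
ET.register_namespace("xsi", XSI)

def star_tree_nexml(leaves):
    """Serialize a star tree as a minimal NeXML-style document.

    leaves: list of (taxon_label, branch_length) pairs. Each taxon becomes an
    OTU, and each OTU is linked to a leaf node joined to a single root node.
    """
    doc = ET.Element(f"{{{NEX}}}nexml", {"version": "0.9"})
    otus = ET.SubElement(doc, f"{{{NEX}}}otus", {"id": "otus1"})
    trees = ET.SubElement(doc, f"{{{NEX}}}trees", {"id": "trees1", "otus": "otus1"})
    tree = ET.SubElement(trees, f"{{{NEX}}}tree",
                         {"id": "tree1", f"{{{XSI}}}type": "nex:FloatTree"})
    ET.SubElement(tree, f"{{{NEX}}}node", {"id": "root", "root": "true"})
    for i, (label, _) in enumerate(leaves):
        ET.SubElement(otus, f"{{{NEX}}}otu", {"id": f"t{i}", "label": label})
        ET.SubElement(tree, f"{{{NEX}}}node", {"id": f"n{i}", "otu": f"t{i}"})
    # Write all nodes before any edges, following the ordering in NeXML examples.
    for i, (_, length) in enumerate(leaves):
        ET.SubElement(tree, f"{{{NEX}}}edge",
                      {"id": f"e{i}", "source": "root", "target": f"n{i}",
                       "length": str(length)})
    return ET.tostring(doc, encoding="unicode")
```

Because OTUs and tree nodes are separate elements linked by identifiers, annotations can attach to either without duplicating taxon information.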
Sleep apnea is characterized by increased sympathetic activity and is associated with systemic hypertension. Angiotensin (Ang) peptides have previously been shown to participate in the regulation of sympathetic tone and arterial pressure by neurons of the hypothalamic paraventricular nucleus (PVN). We investigated the role of endogenous Ang peptides within the PVN in controlling blood pressure in a rat model of sleep apnea-induced hypertension. Male Sprague-Dawley rats (250 g), instrumented with bilateral guide cannulae targeting the PVN, received chronic infusion of Ang antagonists (A-779, an Ang-(1-7) antagonist; losartan and ZD7155, AT1 receptor antagonists; PD123319, an AT2 receptor antagonist) or saline vehicle. A separate group received an infusion of the GABAA receptor agonist muscimol to inhibit PVN neuronal activity independently of angiotensin receptors. After cannula placement, rats were exposed during their sleep period to eucapnic intermittent hypoxia (IH; nadir 5% O2, 5% CO2; peak 21% O2, 0% CO2) at 20 cycles/hour, 7 hours/day, for 14 days while mean arterial pressure (MAP) was measured by telemetry. In rats receiving saline, IH exposure significantly increased MAP (+12±2 mmHg vs. Sham −2±1 mmHg; P<0.01). Inhibition of PVN neurons with muscimol reversed the increase in MAP in IH rats (MUS: −9±4 mmHg vs. vehicle +12±2 mmHg; P<0.01). Infusion of any of the Ang antagonists also prevented the rise in MAP induced by IH (A-779: −5±1 mmHg, losartan: −9±4 mmHg, ZD7155: −11±4 mmHg, and PD123319: −4±3 mmHg; P<0.01). Our results suggest that endogenous Ang peptides acting in the PVN contribute to the IH-induced increases in MAP observed in this rat model of sleep apnea-induced hypertension.
hypothalamic-pituitary-adrenal regulation; peptides: anatomy and physiology; cardiovascular regulation; sympathetic activity; intermittent hypoxia; hypertension; renin-angiotensin system; angiotensin-(1-7)
Linking phenotypic with genotypic diversity has become a major requirement for basic and applied genome-centric biological research. To meet this need, a comprehensive database backend for efficiently storing, querying and analyzing large experimental data sets is necessary. Chado, a generic, modular, community-based database schema, is widely used in the biological community to store information associated with genome sequence data. To accommodate large-scale phenotyping and genotyping projects as well, a new Chado module called Natural Diversity has been developed. The module strictly adheres to the Chado remit of being generic and ontology-driven. The flexibility of the new module is demonstrated by its capacity to store any type of experiment that either uses or generates specimens or stock organisms. Experiments may be grouped or structured hierarchically, and any kind of biological entity can be stored as the observed unit, from a specimen to be used in genotyping or phenotyping experiments to a group of species collected in the field that will undergo further lab analysis. We describe details of the Natural Diversity module, including the design approach, the relational schema and use cases implemented in several databases.
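The sketch below illustrates, in SQLite through Python's `sqlite3` module, the kind of generic, ontology-typed design described above: experiments that can nest under a parent, stocks typed by an ontology term, and a link table recording whether a stock was used or generated. The table and column names are simplified, hypothetical stand-ins and do not reproduce the actual Chado Natural Diversity DDL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE stock (
    stock_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL          -- ontology term, e.g. specimen or population
);
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    type TEXT NOT NULL,         -- ontology term naming the experiment type
    parent_id INTEGER REFERENCES experiment(experiment_id)  -- optional grouping
);
CREATE TABLE experiment_stock (
    experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
    stock_id INTEGER NOT NULL REFERENCES stock(stock_id),
    role TEXT NOT NULL          -- ontology term, e.g. input used or output generated
);
"""

def stocks_for_experiment_tree(conn, root_id):
    """Return sorted names of stocks linked to an experiment or any descendant."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(experiment_id) AS (
            SELECT ?
            UNION ALL
            SELECT e.experiment_id FROM experiment e
            JOIN subtree t ON e.parent_id = t.experiment_id
        )
        SELECT DISTINCT s.name FROM subtree
        JOIN experiment_stock es ON es.experiment_id = subtree.experiment_id
        JOIN stock s ON s.stock_id = es.stock_id
        ORDER BY s.name
    """, (root_id,))
    return [name for (name,) in rows]
```

The recursive query shows one payoff of hierarchical grouping: asking for everything linked to a parent experiment (say, a field collection) also returns material from its child experiments (say, genotyping of the collected samples).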
The human cytomegalovirus (HCMV) UL77 gene encodes the essential protein pUL77, whose function is characterized in the present study. Immunoprecipitation identified monomeric and oligomeric pUL77 in HCMV-infected cells. Immunostaining of purified virions and subviral fractions showed that pUL77 is a structural protein associated with capsids. In silico analysis revealed a coiled-coil motif (CCM) at the N-terminus of pUL77. Chemical cross-linking of either wild-type pUL77 or a CCM deletion mutant (pUL77ΔCCM) indicated that the CCM is critical for oligomerization of pUL77. Furthermore, co-immunoprecipitations in infected and transfected cells demonstrated that pUL77 interacts with the capsid-associated DNA packaging motor components pUL56 and pUL104, as well as with the major capsid protein. The ability of pUL77 to bind dsDNA was shown by an in vitro assay. DNA binding was further confirmed by an assay using biotinylated 36-, 250-, 500- and 1000-mer dsDNA and a 966-mer HCMV-specific dsDNA designed for this study. The binding efficiency (BE) was determined with an image-processing program, with values above 1.0 defined as positive. Whereas the BE of pUL56 binding to the 36-mer bio-pac1, which contains a packaging signal, was 10.0±0.63, that of pUL77 was only 0.2±0.03. In contrast, the BE of pUL77 binding to bio-500 bp or bio-1000 bp was 2.2±0.41 and 4.9±0.71, respectively. Experiments with pUL77ΔCCM demonstrated that this mutant could not bind dsDNA. These data indicate that (i) pUL77 can form homodimers, (ii) the CCM of pUL77 is crucial for oligomerization, and (iii) pUL77 can bind dsDNA in a sequence-independent manner.
The Feeding Experiments End-user Database (FEED) is a research tool developed by the Mammalian Feeding Working Group at the National Evolutionary Synthesis Center that permits synthetic, evolutionary analyses of the physiology of mammalian feeding. The tasks of the Working Group are to compile physiologic data sets into a uniform digital format stored at a central source, develop a standardized terminology for describing and organizing the data, and carry out a set of novel analyses using FEED. FEED contains raw physiologic data linked to extensive metadata. It serves as an archive for a large number of existing data sets and a repository for future data sets. The metadata are stored as text and images that describe experimental protocols, research subjects, and anatomical information. The metadata incorporate controlled vocabularies to allow consistent use of the terms used to describe and organize the physiologic data. The planned analyses address long-standing questions concerning the phylogenetic distribution of phenotypes involving muscle anatomy and feeding physiology among mammals, the presence and nature of motor pattern conservation in the mammalian feeding muscles, and the extent to which suckling constrains the evolution of feeding behavior in adult mammals. We expect FEED to be a growing digital archive that will facilitate new research into understanding the evolution of feeding anatomy.
The importance of data archiving, data sharing, and public access to data has received considerable attention. Awareness is growing among scientists that collaborative databases can facilitate these activities.
We provide a detailed description of the collaborative life history database developed by our Working Group at the National Evolutionary Synthesis Center (NESCent) to address questions about life history patterns and the evolution of mortality and demographic variability in wild primates.
Examples from each of the seven primate species included in our database illustrate the range of data incorporated and the challenges, decision-making processes, and criteria applied to standardize data across diverse field studies. In addition to the descriptive and structural metadata associated with our database, we also describe the process metadata (how the database was designed and delivered) and the technical specifications of the database.
Our database provides a useful model for other researchers interested in developing similar types of databases for other organisms, while our process metadata may be helpful to other groups of researchers interested in developing databases for other types of collaborative analyses.
population ecology; evolutionary biology; bioinformatics; data archiving; data sharing; database development
Motivation to change substance use behavior is an important component of the recovery process that has usually been studied at entry into treatment. Less studied, but equally important, is the measurement of motivation over time and the role motivation plays in subsequent substance use.
The present study sought to examine longitudinal motivation toward sobriety among residents of sober living houses.
Sober living residents (n = 167) were followed at 6-month intervals over an 18-month period and assessed for motivation and substance use outcomes at each study interview. Motivation was measured using the costs and benefits subscales of the Alcohol and Drug Consequences Questionnaire (ADCQ) and substance use outcomes included the Addiction Severity Index (ASI) alcohol scale, ASI drug scale, and peak density of substance use (number of days of most use in a month).
Participants reported higher benefits than costs of sobriety or cutting down substance use at every study time point. In lagged generalized estimating equation models, the ADCQ costs subscale predicted increased alcohol severity, drug severity, and peak density, whereas the benefits subscale predicted decreased drug severity and peak density.
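The lagged models relate motivation at one interview to substance use at the next. The sketch below shows only the data-preparation step, pairing each participant's motivation score at wave t-1 with the outcome at wave t. It assumes consecutive integer wave numbers (0 to 3 for interviews 6 months apart) and a single motivation score per wave, and the function name is hypothetical.

```python
from collections import defaultdict

def lagged_rows(records):
    """Pair motivation at wave t-1 with the outcome at wave t for each participant.

    records: iterable of (participant_id, wave, motivation, outcome), with waves
    numbered consecutively (0, 1, 2, 3 for interviews 6 months apart).
    Returns (participant_id, wave, prior_motivation, outcome) tuples; waves with
    no preceding interview for that participant are skipped.
    """
    by_person = defaultdict(dict)
    for pid, wave, motivation, outcome in records:
        by_person[pid][wave] = (motivation, outcome)
    rows = []
    for pid, waves in by_person.items():
        for wave in sorted(waves):
            if wave - 1 in waves:
                prior_motivation = waves[wave - 1][0]
                rows.append((pid, wave, prior_motivation, waves[wave][1]))
    return rows
```

The resulting rows would then be fitted with a GEE routine that clusters observations by participant, such as the `GEE` class in statsmodels, so that repeated measures from the same person are not treated as independent.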
Longitudinal measurement of motivation can be a useful clinical tool to understand later substance use problems.
Given the mixed findings from prior studies on the effects of baseline motivation, a shift toward examining longitudinal measures of motivation at proximal and temporal intervals is indicated.
motivation; recovery; sober living; alcohol; drug
This report summarizes the proceedings of the one-day BioSharing meeting held at the Intelligent Systems for Molecular Biology (ISMB) 2010 conference in Boston, MA, USA. This inaugural BioSharing event was hosted by the Genomic Standards Consortium as part of its M3 & BioSharing special interest group (SIG) workshop. The BioSharing event included invited talks from a range of community leaders and a panel discussion at the end of the day. The panel session led to a formal agreement among community leaders to join together to promote cross-community knowledge exchange and collaborations. A key focus of the newly formed BioSharing community will be linking up resources to promote real-world data sharing (a virtuous cycle of data) and supporting compliance with data policies through the creation of a one-stop portal of information. Further information about the newly established BioSharing effort can be found at http://biosharing.org.
Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not require transferring entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate them into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers in emerging areas where a standard data exchange format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and the Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security, are discussed. As a result, we improved the interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies.
The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies.
We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators.
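The EQ formalism can be pictured as a small data structure. In the sketch below, each character state maps to one or more Entity-Quality statements (the counts above, 12,861 EQ statements for 10,512 states, imply that some states need more than one), and a character-by-taxon matrix is translated into per-taxon annotations that can then be queried. The function names are hypothetical, and the term IDs in the example are placeholders rather than real TAO or PATO identifiers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EQ:
    """An Entity-Quality statement: an anatomical entity and a quality it bears."""
    entity: str   # anatomy ontology term ID (placeholder IDs in the example)
    quality: str  # quality ontology term ID (placeholder IDs in the example)

def annotate_taxa(matrix, state_to_eq):
    """Translate a character-by-taxon matrix into EQ annotations per taxon.

    matrix: {taxon: {character: state}}
    state_to_eq: {(character, state): [EQ, ...]}; one state may need several EQs
    """
    return {
        taxon: [eq for char, state in states.items()
                for eq in state_to_eq.get((char, state), [])]
        for taxon, states in matrix.items()
    }

def taxa_with(annotations, eq):
    """Return the taxa annotated with a given EQ statement, sorted by name."""
    return sorted(t for t, eqs in annotations.items() if eq in eqs)

# Hypothetical example: one character with two states and placeholder term IDs.
state_to_eq = {
    ("c1", "0"): [EQ("ENTITY:0001", "QUALITY:present")],
    ("c1", "1"): [EQ("ENTITY:0001", "QUALITY:absent")],
}
matrix = {"Taxon A": {"c1": "0"}, "Taxon B": {"c1": "1"}, "Taxon C": {"c1": "0"}}
annotations = annotate_taxa(matrix, state_to_eq)
```

Once states are expressed this way, a question such as "which taxa have this structure present?" becomes a lookup over shared term IDs rather than a reading of free-text character descriptions.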
The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes using ontologies. This is because an ontological representation of the detailed variations in phenotype, whether between mutant and wild type, among individual humans, or across the diversity of species, requires a process by which a precise combination of terms from domain ontologies is selected and organized according to logical relations. The efficiencies that we have developed in this process will be useful for any attempt to annotate complex phenotypic descriptions using ontologies. We also discuss some ramifications of EQ representation for the domain of systematics.
Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. As such, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge.
Here we describe Phenex, a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic similarities and differences using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Phenex can be configured to load only those ontologies pertinent to a taxonomic group of interest. The graphical user interface was optimized for evolutionary biologists accustomed to working with lists of taxa, characters, character states, and character-by-taxon matrices.
Annotation of phenotypic data using ontologies and globally unique taxonomic identifiers will allow biologists to integrate phenotypic data from different organisms and studies, leveraging decades of work in systematics and comparative morphology.
The rich knowledge of morphological variation among organisms reported in the systematic literature has remained in free-text format, impractical for use in large-scale synthetic phylogenetic work. This noncomputable format has also precluded linkage to the large knowledgebase of genomic, genetic, developmental, and phenotype data in model organism databases. We have undertaken an effort to prototype a curated, ontology-based evolutionary morphology database that maps to these genetic databases (http://kb.phenoscape.org) to facilitate investigation into the mechanistic basis and evolution of phenotypic diversity. Among the first requirements in establishing this database was the development of a multispecies anatomy ontology with the goal of capturing anatomical data in a systematic and computable manner. An ontology is a formal representation of a set of concepts with defined relationships between those concepts. Multispecies anatomy ontologies in particular are an efficient way to represent the diversity of morphological structures in a clade of organisms, but they present challenges in their development relative to single-species anatomy ontologies. Here, we describe the Teleost Anatomy Ontology (TAO), a multispecies anatomy ontology for teleost fishes derived from the Zebrafish Anatomical Ontology (ZFA) for the purpose of annotating varying morphological features across species. To facilitate interoperability with other anatomy ontologies, TAO uses the Common Anatomy Reference Ontology as a template for its upper level nodes, and TAO and ZFA are synchronized, with zebrafish terms specified as subtypes of teleost terms. We found that the details of ontology architecture have ramifications for querying, and we present general challenges in developing a multispecies anatomy ontology, including refinement of definitions, taxon-specific relationships among terms, and representation of taxonomically variable developmental pathways.
Bioinformatics; devo-evo; fish; morphology; ontology; Teleostei
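Querying is where ontology architecture matters: because zebrafish terms are specified as subtypes of teleost terms, a query on a general term can retrieve annotations made with more specific ones. The sketch below computes this by walking `is_a` links transitively. The hierarchy is a hypothetical fragment with plain-text term names rather than real TAO or ZFA identifiers, and the function names are illustrative.

```python
from collections import deque

def subtypes(is_a, term):
    """Return every term that is a direct or indirect subtype of `term`.

    is_a: {child_term: [parent_term, ...]}
    """
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found, queue = set(), deque([term])
    while queue:  # breadth-first walk down the is_a hierarchy
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found

def annotations_under(is_a, annotations, term):
    """Keep (subject, entity) annotations whose entity is `term` or a subtype of it."""
    wanted = subtypes(is_a, term) | {term}
    return [a for a in annotations if a[1] in wanted]

# Hypothetical fragment: a zebrafish term nested under its teleost counterpart.
is_a = {
    "zebrafish opercle": ["teleost opercle"],
    "teleost opercle": ["dermal bone"],
    "teleost dentary": ["dermal bone"],
}
annotations = [
    ("mutant X", "zebrafish opercle"),
    ("Taxon Y", "teleost opercle"),
    ("Taxon Z", "cartilage element"),
]
```

Removing the zebrafish-to-teleost link from `is_a` would make the zebrafish annotation invisible to a query on "teleost opercle" or "dermal bone", which is one concrete way the structure of an ontology changes query results.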
Peer acceptance and friendships were examined as moderators in the link between family adversity and child externalizing behavioral problems. Data on family adversity (i.e., ecological disadvantage, violent marital conflict, and harsh discipline) and child temperament and social information processing were collected during home visits from 585 families with 5-year-old children. Children's peer acceptance, friendship, and friends' aggressiveness were assessed with sociometric methods in kindergarten and grade 1. Teachers provided ratings of children's externalizing behavior problems in grade 2. Peer acceptance served as a moderator for all three measures of family adversity, and friendship served as a moderator for harsh discipline. Examination of regression slopes indicated that family adversity was not significantly associated with child externalizing behavior at high levels of positive peer relationships. These moderating effects generally were not qualified by child gender, ethnicity, or friends' aggressiveness, nor were they accounted for by child temperament or social information-processing patterns. The need for process-oriented studies of risk and protective factors is stressed.
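Moderation of this kind is commonly probed by fitting a regression with an interaction term, y = b0 + b1*X + b2*M + b3*X*M, and computing simple slopes of y on X at chosen moderator values (for example, one standard deviation above and below the mean of a centred M). The abstract does not state the authors' exact procedure, so the sketch below is a generic illustration; the function name and the example coefficients are hypothetical, with coefficient variances and covariance assumed to come from a fitted model.

```python
import math

def simple_slope(b1, b3, m, var_b1, var_b3, cov_b1_b3):
    """Simple slope of y on X at moderator value m, and its standard error,
    for the interaction model y = b0 + b1*X + b2*M + b3*X*M.

    Var(b1 + b3*m) = Var(b1) + 2*m*Cov(b1, b3) + m**2 * Var(b3).
    """
    slope = b1 + b3 * m
    se = math.sqrt(var_b1 + 2 * m * cov_b1_b3 + m ** 2 * var_b3)
    return slope, se

# Hypothetical coefficients; M standardised, so -1 and +1 are one SD from the mean.
for m in (-1.0, 1.0):
    slope, se = simple_slope(0.40, -0.35, m, 0.010, 0.008, -0.002)
    print(f"M = {m:+.0f} SD: slope = {slope:.2f}, z = {slope / se:.2f}")
```

A slope divided by its standard error gives a test statistic; a non-significant slope at high values of M would correspond to the pattern reported above, where family adversity was not associated with externalizing behavior among children with strong peer relationships.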
The receptor-binding domain of Plasmodium vivax Duffy binding protein, region II (PvRII), is an attractive candidate for a vaccine against P. vivax malaria. Here, we have studied the safety and immunogenicity of recombinant PvRII in Macaca mulatta (rhesus monkeys). Recombinant PvRII with a C-terminal 6-histidine tag was expressed in E. coli, recovered from inclusion bodies, refolded into its functional conformation, purified to homogeneity and formulated with three adjuvants, namely, Alhydrogel, Montanide ISA 720 and the GSK proprietary Adjuvant System AS02A for use in immunogenicity studies. All the PvRII vaccine formulations tested were safe and highly immunogenic. The overall magnitude of the antibody response was significantly higher for both Montanide ISA 720 and AS02A formulations in comparison with Alhydrogel. Furthermore, there was a significant correlation between antibody recognition titers by ELISA and binding inhibition titers in in vitro binding assays. The PvRII vaccine formulations also induced IFN-γ recall responses that were identified using ex vivo ELISPOT assays. These results provide support for further clinical development of a vaccine for P. vivax malaria based on recombinant PvRII.
malaria vaccine; Duffy Binding Protein; Plasmodium vivax; adjuvant formulation
The SICAvar gene family, expressed at the surface of infected erythrocytes, is critical for antigenic variation in Plasmodium knowlesi. When this family was discovered, a prototypic SICAvar gene was characterized and defined by a 10-exon structure. The predicted 205-kDa protein lacked a convincing signal peptide but included a series of variable cysteine-rich modules, a transmembrane domain encoded by the penultimate exon, and a cytoplasmic domain encoded by the highly conserved final exon. The 205 SICAvar gene and its family, with up to 108 possible members, were identified before the P. knowlesi genome was sequenced. However, in the published P. knowlesi database this gene remains split into five fragments. This study addresses a number of structural and functional questions that are critical for understanding SICAvar gene expression.
Database mining, bioinformatics, and traditional genomic and post-genomic experimental methods including proteomic technologies are used here to confirm the genomic context and expressed structure of the prototype 205 SICAvar gene.
This study reveals that the 205 SICAvar gene reported previously to have a 10-exon expressed gene structure has, in fact, 12 exons, with an unusually large and repeat-laden intron separating two newly defined upstream exons and the bona fide 5'UTR from the remainder of the gene sequence. The initial exon encodes a PEXEL motif, which may function to localize the SICA protein in the infected erythrocyte membrane. This newly defined start of the 205 SICAvar sequence is positioned on chromosome 5, over 340 kb upstream from the rest of the telomerically positioned SICAvar gene sequence in the published genome assembly. This study, however, verifies the continuity of these sequences, a 9.5 kb transcript, and provides evidence that the 205 SICAvar gene is located centrally on chromosome 5.
The prototype 205 SICAvar gene has been redefined to have a 12-exon structure. These data are important because they 1) address questions raised in the P. knowlesi genome database regarding SICAvar gene fragments, numbers and structures, 2) show that this prototype gene encodes a PEXEL motif, 3) emphasize the need for further refinement of the P. knowlesi genome data, and 4) retrospectively, provide evidence for recombination within centrally located SICAvar sequences.
Plasmodium knowlesi is an intracellular malaria parasite whose natural vertebrate host is Macaca fascicularis (the ‘kra’ monkey); however, it is now increasingly recognized as a significant cause of human malaria, particularly in Southeast Asia [1,2]. Plasmodium knowlesi was the first malaria parasite species in which antigenic variation was demonstrated [3], and it has a close phylogenetic relationship to Plasmodium vivax [4], the second most important species of human malaria parasite (reviewed in ref. 4). Despite their relatedness, there are important phenotypic differences between them, such as host blood cell preference, absence of a dormant liver stage or ‘hypnozoite’ in P. knowlesi, and length of the asexual cycle (reviewed in ref. 4). Here we present an analysis of the P. knowlesi (H strain, Pk1(A+) clone [5]) nuclear genome sequence. This is the first monkey malaria parasite genome to be described, and it provides an opportunity for comparison with the recently completed P. vivax genome [4] and other sequenced Plasmodium genomes [6-8]. In contrast to other Plasmodium genomes, putative variant antigen families are dispersed throughout the genome and are associated with intrachromosomal telomere repeats. One of these families, the KIRs [9], contains sequences that collectively match over one-half of the host CD99 extracellular domain, which may represent an unusual form of molecular mimicry.