Protein phosphorylation is a key mechanism to regulate protein functions. However, the contribution of this protein modification to species divergence is still largely unknown. Here, we studied the evolution of mammalian phosphoregulation by comparing the human and mouse phosphoproteomes. We found that 84% of the positions that are phosphorylated in one species or the other are conserved at the residue level. Twenty percent of these conserved sites are phosphorylated in both species. This proportion is 2.5 times higher than expected by chance alone, suggesting that purifying selection is preserving phosphoregulation. However, we show that the majority of the sites that are conserved at the residue level are differentially phosphorylated between species. These sites likely result from false-negative identifications due to incomplete experimental coverage, false-positive identifications and non-functional sites. In addition, our results suggest that at least 5% of them are likely to be true differentially phosphorylated sites and may thus contribute to the divergence in phosphorylation networks between mouse and human, despite residue conservation between orthologous proteins. We also show that evolutionary turnover of phosphosites at adjacent positions (in a distance range of up to 40 amino acids) in human or mouse leads to an overestimation of the divergence in phosphoregulation between these two species. These sites tend to be phosphorylated by the same kinases, supporting the hypothesis that they are functionally redundant. Our results support the hypothesis that the evolutionary turnover of phosphorylation sites contributes to the divergence in phosphorylation profiles while preserving phosphoregulation. Overall, our study provides advanced analyses of mammalian phosphoproteomes and a framework for the study of their contribution to phenotypic evolution.
Understanding how differences in cellular regulation lead to phenotypic differences between species remains an open challenge in evolutionary genetics. The extensive phosphorylation data currently available allow us to compare the human and mouse phosphoproteomes and to measure changes in their phosphoregulation. We found a general conservation of phosphorylation sites between these two species. However, a fraction of sites are conserved at the sequence level (the same amino acid is present in both species) but differ in their phosphorylation status. These sites represent candidate sites that have the potential to explain differences between human and mouse signalling networks that do not depend on the divergence of orthologous residues. Furthermore, we identified several sites where a phosphorylation site in one species corresponds to a non-phosphorylatable residue in the other. These cases represent clear differences in protein regulation. Recent studies suggest that phosphorylation sites can shift position during evolution, leading to configurations in which pairs of divergent phosphorylation sites are functionally redundant. We identified more than 100 such putative cases, suggesting that divergence in amino acid does not necessarily imply functional divergence when comparing phosphoproteomes. Overall, our study provides new key concepts and data for the study of how regulatory differences may be linked to phenotypic ones at the network level.
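To make the chance expectation mentioned in the abstract above concrete, the following is a minimal sketch. It assumes that phosphorylation occurs independently in each species across a shared pool of conserved phosphorylatable residues. This null model, the function names, and all counts are illustrative assumptions and are not taken from the study, whose own randomization procedure may differ.

```python
def expected_shared_sites(n_conserved, n_human, n_mouse):
    """Expected number of sites phosphorylated in both species if
    phosphorylation were independent between species (assumed null model)."""
    if n_conserved <= 0:
        raise ValueError("n_conserved must be positive")
    return n_human * n_mouse / n_conserved


def fold_enrichment(n_conserved, n_human, n_mouse, n_both):
    """Observed shared sites divided by the expectation under independence."""
    return n_both / expected_shared_sites(n_conserved, n_human, n_mouse)


def shared_fraction(n_human, n_mouse, n_both):
    """Fraction of sites phosphorylated in either species that are
    phosphorylated in both."""
    n_either = n_human + n_mouse - n_both
    return n_both / n_either


# Hypothetical counts for illustration only (not data from the study).
N_CONSERVED, N_HUMAN, N_MOUSE, N_BOTH = 10_000, 2_000, 1_500, 600

print("expected shared:", expected_shared_sites(N_CONSERVED, N_HUMAN, N_MOUSE))
print("fold enrichment:", fold_enrichment(N_CONSERVED, N_HUMAN, N_MOUSE, N_BOTH))
print("shared fraction:", shared_fraction(N_HUMAN, N_MOUSE, N_BOTH))
```

A fold enrichment above 1 in such a calculation would point in the same direction as the reported 2.5-fold excess, although the value depends entirely on how the pool of candidate residues is defined.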
Nerve cells and spontaneous coordinated behavior first appeared near the base of animal evolution in the common ancestor of cnidarians and bilaterians. Experiments on the cnidarian Hydra have demonstrated that nerve cells are essential for this behavior, although nerve cells in Hydra are organized in a diffuse network and do not form ganglia. Here we show that the gap junction protein innexin-2 is expressed in a small group of nerve cells in the lower body column of Hydra and that an anti-innexin-2 antibody binds to gap junctions in the same region. Treatment of live animals with innexin-2 antibody eliminates gap junction staining and reduces spontaneous body column contractions. We conclude that a small subset of nerve cells, connected by gap junctions and capable of synchronous firing, acts as a pacemaker to coordinate the contraction of the body column in the absence of ganglia.
The use of autozygosity as a mapping tool in the search for autosomal recessive disease genes is well established. We hypothesized that autozygosity not only unmasks the recessiveness of disease-causing variants, but can also reveal natural knockouts of genes with less obvious phenotypic consequences. To test this hypothesis, we exome sequenced 77 well-phenotyped individuals born to first-cousin parents in search of genes that are biallelically inactivated. Using a very conservative estimate, we show that each of these individuals carries biallelic inactivation of 22.8 genes on average. For many of the 169 genes that appear to be biallelically inactivated, available data support involvement in modulating metabolism, immunity, perception, external appearance and other phenotypic aspects, and these genes therefore appear to contribute to human phenotypic variation. Other genes with biallelic inactivation may contribute through as yet unknown mechanisms or may be on their way to conversion into pseudogenes due to true recent dispensability. We conclude that sequencing the autozygome is an efficient way to map the contribution of genes to human phenotypic variation that goes beyond the classical definition of disease.
Identification of disease-causing gene variants by taking advantage of autozygosity mapping in consanguineous pedigrees is well established. However, autozygous intervals can also render homozygous those loss-of-function variants in genes whose complete knockout may not result in a discernible phenotype. The advent of next-generation sequencing makes it possible to systematically sequence all autozygous intervals per individual (the autozygome) and uncover all apparent homozygous loss-of-function variants therein. By applying this approach to well-phenotyped offspring of first-cousin marriages, we were able to uncover >160 genes that appear to be completely inactivated, and we show that the apparent lack of phenotype may be context-dependent. This work expands the spectrum of phenotypic consequences of human knockouts to include an apparent lack of discernible phenotypes.
It is common practice in genome-wide association studies (GWAS) to focus on the relationship between disease risk and genetic variants one marker at a time. When relevant genes are identified it is often possible to implicate biological intermediates and pathways likely to be involved in disease aetiology. However, single genetic variants typically explain small amounts of disease risk. Our idea is to construct allelic scores that explain greater proportions of the variance in biological intermediates, and subsequently use these scores to data mine GWAS. To investigate the approach's properties, we indexed three biological intermediates where the results of large GWAS meta-analyses were available: body mass index, C-reactive protein and low-density lipoprotein levels. We generated allelic scores in the Avon Longitudinal Study of Parents and Children, and in publicly available data from the first Wellcome Trust Case Control Consortium. We compared the explanatory ability of allelic scores in terms of their capacity to proxy for the intermediate of interest, and the extent to which they associated with disease. We found that allelic scores derived from known variants and allelic scores derived from hundreds of thousands of genetic markers explained significant portions of the variance in biological intermediates of interest, and many of these scores showed expected correlations with disease. Genome-wide allelic scores, however, tended to lack specificity, suggesting that they should be used with caution and perhaps only to proxy biological intermediates for which there are no known individual variants. Power calculations confirm the feasibility of extending our strategy to the analysis of tens of thousands of molecular phenotypes in large genome-wide meta-analyses.
We conclude that our method represents a simple way in which potentially tens of thousands of molecular phenotypes could be screened for causal relationships with disease without having to expensively measure these variables in individual disease collections.
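As an illustration of the allelic-score idea described above, here is a minimal sketch. It assumes a score is the (optionally weighted) sum of effect-allele counts across markers, and that its explanatory ability is summarized by the squared Pearson correlation with the intermediate trait. The function names, weights, dosages, and trait values are hypothetical and do not come from the study.

```python
def allelic_score(dosages, weights=None):
    """Sum of effect-allele counts (0, 1 or 2 per marker) for one individual,
    optionally weighted by per-marker effect sizes."""
    if weights is None:
        weights = [1.0] * len(dosages)  # unweighted score: plain allele count
    if len(weights) != len(dosages):
        raise ValueError("one weight per marker is required")
    return sum(w * d for w, d in zip(weights, dosages))


def r_squared(x, y):
    """Squared Pearson correlation: share of variance in y linearly
    explained by x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return sxy * sxy / (sxx * syy)


# Hypothetical data: four individuals, three markers, made-up effect sizes.
weights = [0.10, 0.25, 0.05]
genotypes = [[0, 1, 2], [1, 1, 0], [2, 2, 1], [0, 0, 0]]
trait = [22.1, 23.0, 27.5, 20.4]  # e.g. an intermediate trait such as BMI

scores = [allelic_score(g, weights) for g in genotypes]
print("scores:", scores)
print("variance explained:", r_squared(scores, trait))
```

In practice the weights would come from an independent GWAS of the intermediate trait, so that the score is not fitted to the same data in which it is evaluated.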
The standard approach in genome-wide association studies is to analyse the relationship between genetic variants and disease one marker at a time. Significant associations between markers and disease are then used as evidence to implicate biological intermediates and pathways likely to be involved in disease aetiology. However, single genetic variants typically only explain small amounts of disease risk. Our idea is to construct allelic scores that explain greater proportions of the variance in biological intermediates than single markers, and then use these scores to data mine genome-wide association studies. We show how allelic scores derived from known variants as well as allelic scores derived from hundreds of thousands of genetic markers across the genome explain significant portions of the variance in body mass index, levels of C-reactive protein, and LDL cholesterol, and many of these scores show expected correlations with disease. Power calculations confirm the feasibility of scaling our strategy to the analysis of tens of thousands of molecular phenotypes in large genome-wide meta-analyses. Our method represents a simple way in which tens of thousands of molecular phenotypes could be screened for potential causal relationships with disease.
Nori, a marine red alga, is one of the most profitable mariculture crops in the world. However, the biological properties of this macroalga are poorly understood at the molecular level. In this study, we determined the draft genome sequence of susabi-nori (Pyropia yezoensis) using next-generation sequencing platforms. For sequencing, thalli of P. yezoensis were washed to remove bacteria attached to the cell surface and enzymatically prepared as purified protoplasts. The assembled contig size of the P. yezoensis nuclear genome was approximately 43 megabases (Mb), which is an order of magnitude smaller than the previously estimated genome size. A total of 10,327 gene models were predicted; about 60% of the validated genes lack introns, and the remaining genes have shorter introns than those of large-genome algae, which is consistent with the compact size of the P. yezoensis genome. A sequence homology search showed that 3,611 genes (35%) are functionally unknown and only 2,069 gene groups are in common with those of the unicellular red alga, Cyanidioschyzon merolae. As color trait determinants of red algae, light-harvesting genes involved in the phycobilisome were predicted from the P. yezoensis nuclear genome. In particular, we found a second homolog of a phycobilisome-degradation gene, which is usually chloroplast-encoded, possibly providing a novel target for color fading of susabi-nori in aquaculture. These findings shed light on unexplained features of macroalgal genes and genomes, and suggest that the genome of P. yezoensis is a promising model genome of marine red algae.
Endogenous retroviruses (ERVs) comprise a significant percentage of the mammalian genome, and it is poorly understood whether they will remain as inactive genomes or emerge as infectious retroviruses. Although several types of ERVs are present in domestic cats, infectious ERVs have not been demonstrated. Here, we report a previously uncharacterized class of endogenous gammaretroviruses, termed ERV-DCs, that is present and hereditary in the domestic cat genome. We have characterized a subset of ERV-DC proviral clones, which are numbered according to their genomic insertions. One of these, ERV-DC10, located in the q12-q21 region on chromosome C1, is an infectious gammaretrovirus capable of infecting a broad range of cells, including human cells. Our studies indicate that ERV-DC10 entered the genome of domestic cats in the recent past and appears to have translocated to or reintegrated at a distinct locus as infectious ERV-DC18. Insertional polymorphism analysis revealed that 92 of 244 domestic cats had ERV-DC10 on a homozygous or heterozygous locus. ERV-DC-like sequences were found in primate and rodent genomes, suggesting that these ERVs, and recombinant viruses such as RD-114 and BaEV, originated from an ancestor of ERV-DC. We also found that a novel recombinant virus, feline leukemia virus subgroup D (FeLV-D), was generated by ERV-DC env transduction into feline leukemia virus in domestic cats. Our results indicate that ERV-DCs behave as donors and/or acceptors in the generation of infectious, recombinant viruses. The presence of such infectious endogenous retroviruses, which could be harmful or beneficial to the host, may affect veterinary medicine and public health.
Proliferative vitreoretinopathy (PVR) is a destructive complication of retinal detachment and vitreoretinal surgery which can lead to severe vision reduction by tractional retinal detachments. The purpose of this study was to determine the gene expression profile of epiretinal membranes (ERMs) associated with a PVR (PVR-ERM) and to compare it to the expression profile of less-aggressive secondary ERMs.
A PCR-amplified complementary DNA (cDNA) library was constructed using the RNAs isolated from ERMs obtained during vitrectomy. The sequence from the 5′ end was obtained for randomly selected clones and used to generate expressed sequence tags (ESTs). We obtained 1116 nonredundant clusters representing individual genes expressed in PVR-ERMs, and 799 clusters representing the genes expressed in secondary ERMs. The transcriptome of the PVR-ERMs was subdivided by functional subsets of genes related to metabolism, cell adhesion, cytoskeleton, signaling, and other functions, by FatiGo analysis. The genes highly expressed in PVR-ERMs were compared to those expressed in the secondary ERMs, and these were subdivided by cell adhesion, proliferation, and other functions. Querying 10 cell adhesion-related genes against the STRING database yielded 70 possible physical relationships to other genes/proteins, which included an additional 60 genes that were not detected in the PVR-ERM library. Of these, soluble CD44 and soluble vascular cellular adhesion molecule-1 were significantly increased in the vitreous of patients with PVR.
Our results support an earlier hypothesis that a PVR-ERM, even from a genomic point of view, is an aberrant form of wound healing response. Genes preferentially expressed in PVR-ERMs may play an important role in the progression of PVR and could serve as therapeutic targets.
In the evolution of mammals, some genes essential for placental development are known to be of retroviral origin; for example, syncytin-1, derived from an envelope (env) gene of an endogenous retrovirus (ERV), aids cell fusion in the human placenta. Although the placenta serves the same function in all placental mammals, env-derived genes responsible for trophoblast cell fusion and maternal immune tolerance differ among species and remain largely unidentified in the bovine species. To comprehensively examine env-derived genes that play a role in bovine placental development, we determined the transcriptomic profiles of bovine conceptuses during three crucial windows of the implantation period using a high-throughput sequencer. The sequence reads were mapped to the bovine genome, in which ERV candidates were annotated using RetroTector© (7,624 and 1,542 for ERV-derived and env-derived genes, respectively). The mapped reads showed that approximately 18% (284 genes) of env-derived genes in the genome were expressed during placenta formation, and approximately 4% (63 genes) were detected for all days examined. We verified by polymerase chain reaction that three env-derived genes are expressed in trophoblast cells. Of these three, the env-derived gene with the longest open reading frame (named BERV-P env) was found to show high expression levels in trophoblast cell lines and to be similar in sequence to the syncytin-Car1 genes found in dogs and cats, despite their disparate origins. These results suggest that placentation depends on various retrovirus-derived genes that could have replaced endogenous predecessors during evolution.
endogenous retrovirus; RNA-seq; syncytin; envelope; cow
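The proportions quoted in the bovine abstract above can be checked with a short calculation. The counts (1,542 env-derived genes, 284 expressed, 63 detected on all days) are taken from the abstract; the helper name `percent` is introduced here only for illustration.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


ENV_DERIVED_TOTAL = 1542   # env-derived genes annotated in the genome
EXPRESSED = 284            # expressed during placenta formation
EXPRESSED_ALL_DAYS = 63    # detected on all days examined

print(f"expressed: {percent(EXPRESSED, ENV_DERIVED_TOTAL):.1f}%")
print(f"all days:  {percent(EXPRESSED_ALL_DAYS, ENV_DERIVED_TOTAL):.1f}%")
```

Both values round to the approximate figures given in the text (18% and 4%).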
The osteoblast-lineage consists of cells at various stages of maturation that are essential for skeletal development, growth, and maintenance. Over the past decade, many of the signaling cascades that regulate this lineage have been elucidated; however, little is known of the networks that coordinate, modulate, and transmit these signals. Here, we identify a gene network specific to the osteoblast-lineage through the reconstruction of a bone co-expression network using microarray profiles collected on 96 Hybrid Mouse Diversity Panel (HMDP) inbred strains. Of the 21 modules that comprised the bone network, module 9 (M9) contained genes that were highly correlated with prototypical osteoblast marker genes and were more highly expressed in osteoblasts relative to other bone cells. In addition, the M9 contained many of the key genes that define the osteoblast-lineage, which together suggested that it was specific to this lineage. To use the M9 to identify novel osteoblast genes and highlight its biological relevance, we knocked down the expression of its two most connected “hub” genes, Maged1 and Pard6g. Their perturbation altered both osteoblast proliferation and differentiation. Furthermore, we demonstrated that mice deficient in Maged1 had decreased bone mineral density (BMD). We also discovered that a local expression quantitative trait locus (eQTL) regulating the Wnt signaling antagonist Sfrp1 was a key driver of the M9. We also show that the M9 is associated with BMD in the HMDP and is enriched for genes implicated in the regulation of human BMD through genome-wide association studies. In conclusion, we have identified a physiologically relevant gene network and used it to discover novel genes and regulatory mechanisms involved in the function of osteoblast-lineage cells. Our results highlight the power of harnessing natural genetic variation to generate co-expression networks that can be used to gain insight into the function of specific cell-types.
The osteoblast-lineage consists of a range of cells from osteogenic precursors that mature into bone-forming osteoblasts to osteocytes that are entombed in bone. Each cell in the lineage serves a number of distinct and critical roles in the growth and maintenance of the skeleton, as well as many extra-skeletal functions. Over the last decade, many of the major regulatory pathways governing the differentiation and activity of these cells have been discovered. In contrast, little is known regarding the composition or function of gene networks within the lineage. The goal of this study was to increase our understanding of how genes are organized into networks in osteoblasts. Towards this goal, we used microarray gene expression profiles from bone to identify a group of genes that formed a network specific to the osteoblast-lineage. We used the knowledge of this network to identify novel genes that are important for regulating various aspects of osteoblast function. These data improve our understanding of the gene networks operative in cells of the osteoblast-lineage.
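The notion of a most connected "hub" gene in a co-expression module can be sketched as follows. This is a minimal illustration, assuming connectivity is the sum of absolute Pearson correlations (raised to a power `beta`) between a gene and every other gene in the module. The study's actual network construction is not described in the abstract, and the gene labels and expression values below are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def connectivity(expression, beta=1):
    """Map each gene to the sum of |correlation|**beta with all other genes.

    expression: dict of gene -> list of expression values across strains.
    beta is an assumed soft-threshold exponent; beta=1 uses |r| directly.
    """
    genes = list(expression)
    return {
        g: sum(
            abs(pearson(expression[g], expression[h])) ** beta
            for h in genes
            if h != g
        )
        for g in genes
    }


def hub_gene(expression, beta=1):
    """Gene with the highest connectivity in the module."""
    scores = connectivity(expression, beta)
    return max(scores, key=scores.get)


# Hypothetical module: three genes measured in four strains.
module = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [1.0, 2.0, 4.0, 3.0],
    "gene_c": [1.0, 3.0, 2.0, 4.0],
}
print(connectivity(module))
print("hub:", hub_gene(module))
```

Ranking genes this way is one common heuristic for choosing perturbation targets; other definitions of connectivity would give different rankings.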
H-InvDB (http://www.h-invitational.jp/) is a comprehensive human gene database started in 2004. In the latest version, H-InvDB 8.0, a total of 244 709 human complementary DNAs were mapped onto the hg19 reference genome and 43 829 gene loci, including nonprotein-coding ones, were identified. Of these loci, 35 631 were identified as potential protein-coding genes, and 22 898 of these were identical to known genes. In our analysis, 19 309 annotated genes were specific to H-InvDB and not found in RefSeq and Ensembl. In fact, 233 of the 19 309 genes were assigned protein functions in this version of H-InvDB; they had been annotated as having unknown protein functions in the previous version. Furthermore, 11 genes were identified as known Mendelian disorder genes. An advantage of H-InvDB is that many biologically functional genes remain hidden among its unique genes. As large-scale proteomic projects have been conducted to elucidate the functions of all human proteins, we have enhanced the proteomic information with an advanced protein view and a new subdatabase of protein complexes (Protein Complex Database with quality index). We propose that H-InvDB is an important resource for finding novel candidate targets for medical care and drug development.
The relationship between sequence polymorphisms and human disease has been studied mostly in terms of effects of single nucleotide polymorphisms (SNPs) leading to single amino acid substitutions that change protein structure and function. However, less attention has been paid to more drastic sequence polymorphisms which cause premature termination of a protein’s sequence or large changes, insertions, or deletions in the sequence. We have analyzed a large set (n = 512) of insertions and deletions (indels) and single nucleotide polymorphisms causing premature termination of translation in disease-related genes. Prediction of protein-destabilization effects was performed by graphical presentation of the locations of polymorphisms in the protein structure, using the Genomes TO Protein (GTOP) database, and manual annotation with a set of specific criteria. Protein destabilization was predicted for 44.4% of the nonsense SNPs, 32.4% of the frameshifting indels, and 9.1% of the non-frameshifting indels. A prediction of nonsense-mediated decay allowed us to infer which truncated proteins would actually be translated as defective proteins. These cases included proteins linked to dominantly inherited diseases, suggesting a relation between these diseases and toxic aggregation. Our approach would be useful in identifying potentially aggregation-inducing polymorphisms that may have pathological effects.
Complement C3 and C4 play key roles in the main physiological activities of the complement system, and their deficiency or over-expression is associated with many infectious or immune diseases. A two-stage genome-wide association study (GWAS) was performed for serum levels of C3 and C4. The first stage was conducted in 1,999 healthy Chinese men, and the second stage was performed in an additional 1,496 subjects. We identified two SNPs, rs3753394 in the CFH gene and rs3745567 in the C3 gene, that are significantly associated with serum C3 levels at a genome-wide significance level (P = 7.33×10^−11 and P = 1.83×10^−9, respectively). For C4, one large genomic region on chromosome 6p21.3 is significantly associated with serum C4 levels. Two SNPs (rs1052693 and rs11575839) were located in the MHC class I region, which includes the HLA-A, HLA-C, and HLA-B genes. Two SNPs (rs2075799 and rs2857009) were located 5′ and 3′ of the C4 gene. The other four SNPs, rs2071278, rs3763317, rs9276606, and rs241428, were located in the MHC class II region, which includes the HLA-DRA, HLA-DRB, and HLA-DQB genes. The combined P-values for those eight SNPs ranged from 3.19×10^−22 to 5.62×10^−97. HBsAg-positive subjects have significantly lower C3 and C4 protein concentrations compared with HBsAg-negative subjects (P<0.05). Our study is the first GWAS to show that genetic components influence the levels of complement C3 and C4. Our findings provide novel insights into related autoimmune and infectious diseases and their molecular mechanisms.
The complement system plays important roles in innate and adaptive immune functions. C3 and C4 participate in almost all physiological activities and activated pathways as key complement members and host defense proteins. Identifying the genes that influence serum levels of C3 and C4 may help to elucidate the factors and mechanisms underlying the complement system. Genome-wide association studies (GWAS) have shown great success in revealing robust associations in both quantitative and qualitative traits. In this study, we performed a two-stage GWAS in a large cohort from the Chinese male population to examine the roles of common genetic variants in serum C3 and C4 levels. Our research identified genetic determinants associated with the quantitative levels of C3 and C4. Overall, our study highlights an intricate regulation of complement levels and potentially reveals novel mechanisms that may be followed up with additional functional studies.
We sequenced the genome of Theileria orientalis, a tick-borne apicomplexan protozoan parasite of cattle. The focus of this study was a comparative genome analysis of T. orientalis relative to other highly pathogenic Theileria species, T. parva and T. annulata. T. parva and T. annulata induce transformation of infected cells of lymphocyte or macrophage/monocyte lineages; in contrast, T. orientalis does not induce uncontrolled proliferation of infected leukocytes and multiplies predominantly within infected erythrocytes. While synteny across homologous chromosomes of the three Theileria species was found to be well conserved overall, subtelomeric structures were found to differ substantially, as T. orientalis lacks the large tandemly arrayed subtelomere-encoded variable secreted protein-encoding gene family. Moreover, expansion of particular gene families by gene duplication was found in the genomes of the two transforming Theileria species, most notably, the TashAT/TpHN and Tar/Tpr gene families. Gene families that are present only in T. parva and T. annulata and not in T. orientalis, Babesia bovis, or Plasmodium were also identified. Identification of differences between the genome sequences of Theileria species with different abilities to transform and immortalize bovine leukocytes will provide insight into proteins and mechanisms that have evolved to induce and regulate this process. The T. orientalis genome database is available at http://totdb.czc.hokudai.ac.jp/.
Cancer-like growth of leukocytes infected with malignant Theileria parasites is a unique cellular event, as it involves the transformation and immortalization of one eukaryotic cell by another. In this study, we sequenced the whole genome of a nontransforming Theileria species, Theileria orientalis, and compared it to the published sequences representative of two malignant, transforming species, T. parva and T. annulata. The genome-wide comparison of these parasite species highlights significant genetic diversity that may be associated with evolution of the mechanism(s) deployed by an intracellular eukaryotic parasite to transform its host cell.
Numerous cultivars of Japanese flowering cherry (Prunus subgenus Cerasus) are recognized, but in many cases they are difficult to distinguish morphologically. Therefore, we evaluated the clonal status of 215 designated cultivars using 17 SSR markers. More than half the cultivars were morphologically distinct and had unique genotypes. However, 22 cultivars were found to consist of multiple clones, which probably originated from chance seedlings, suggesting that their unique characteristics have not been maintained through propagation by grafting alone. We also identified 23 groups consisting of two or more cultivars with identical genotypes. Most members of these groups were putatively synonymous and morphologically identical. However, some of them were probably derived from bud sport mutants and had distinct morphologies. SSR marker analysis provided useful insights into the clonal status of the examined Japanese flowering cherry cultivars and proved to be a useful tool for cultivar characterization.
Cerasus; clone identification; cultivars; Prunus; SSR; microsatellite; taxonomy
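The grouping of cultivars with identical SSR genotypes described above can be sketched as a simple lookup by multilocus profile. This is an illustrative sketch only: the data layout (one pair of allele sizes per locus), the function name, and the cultivar labels are assumptions, and real analyses usually also allow for scoring error and missing data.

```python
from collections import defaultdict


def identical_genotype_groups(profiles):
    """Return groups of two or more cultivars sharing the same genotype.

    profiles: dict of cultivar name -> sequence of (allele, allele) pairs,
    one pair per SSR locus, in the same locus order for every cultivar.
    """
    groups = defaultdict(list)
    for cultivar, loci in profiles.items():
        # Sort alleles within each locus so (150, 154) equals (154, 150).
        key = tuple(tuple(sorted(pair)) for pair in loci)
        groups[key].append(cultivar)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Hypothetical two-locus profiles (allele sizes in base pairs).
profiles = {
    "cultivar_A": [(150, 154), (200, 200)],
    "cultivar_B": [(154, 150), (200, 200)],
    "cultivar_C": [(150, 150), (200, 204)],
}
print(identical_genotype_groups(profiles))
```

Cultivars that fall into the same group are candidates for synonyms or bud sport mutants; telling the two apart still requires morphological comparison.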
Kleptoplastidy is the retention of plastids obtained from ingested algal prey, which may remain temporarily functional and be used for photosynthesis by the predator. We showed that the marine dinoflagellate Dinophysis mitra has great kleptoplastid diversity. We obtained 308 plastid rbcL sequences by gene cloning from 14 D. mitra cells and 102 operational taxonomic units (OTUs). Most sequences were new in the genetic database and positioned within Haptophyceae (227 sequences [73.7%], 80 OTUs [78.4%]), particularly within the genus Chrysochromulina. Others were closely related to Prasinophyceae (16 sequences [5.2%], 5 OTUs [4.9%]), Dictyochophyceae (14 sequences [4.5%], 5 OTUs [4.9%]), Pelagophyceae (14 sequences [4.5%], 1 OTU [1.0%]), Bolidophyceae (3 sequences [1.0%], 1 OTU [1.0%]), and Bacillariophyceae (1 sequence [0.3%], 1 OTU [1.0%]); however, 33 sequences (10.8%) as 9 OTUs (8.8%) were not closely clustered with any particular group. Only six sequences were identical to those of Chrysochromulina simplex, Chrysochromulina hirta, Chrysochromulina sp. TKB8936, Micromonas pusilla NEPCC29, Micromonas pusilla CCMP491, and an unidentified diatom. Thus, we detected >100 different plastid sequences from 14 D. mitra cells, strongly suggesting kleptoplastidy and the need for mixotrophic prey such as Laboea, Tontonia, and Strombidium-like ciliates, which retain numerous symbiotic plastids from different origins, for propagation and plastid sequestration.
The Targeted Proteins Research Program (TPRP), promoted by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, is phase II of a structural biology project (2007–2011) following the Protein 3000 Project (2002–2006) in Japan. While the phase I Protein 3000 Project put partial emphasis on the construction and maintenance of pipelines for structural analyses, the TPRP is dedicated to revealing the structures and functions of the targeted proteins that have great importance in both basic research and industrial applications. To pursue this objective, 35 Targeted Proteins (TP) Projects, selected in the three areas of fundamental biology, medicine and pharmacology, and food and environment, collaborate closely with 10 Advanced Technology (AT) Projects in the four fields of protein production, structural analyses, chemical library and screening, and information platform. Here, the outlines and achievements of the 35 TP Projects are summarized in the system named TP Atlas. Progress in the diversified areas is described in the modules of Graphical Summary, General Summary, Tabular Summary, and Structure Gallery of the TP Atlas in a standard and unified format. Advances in TP Projects owing to novel technologies stemming from AT Projects and to collaborative research among TP Projects are illustrated as a hallmark of the Program. The TP Atlas can be accessed at http://net.genes.nig.ac.jp/tpatlas/index_e.html.
Electronic supplementary material
The online version of this article (doi:10.1007/s10969-012-9139-1) contains supplementary material, which is available to authorized users.
Structural biology; National project; Research dissemination; Targeted Proteins Research Program; Protein 3000 Project
The industrially important food-yeast Candida utilis is a Crabtree effect-negative yeast used to produce valuable chemicals and recombinant proteins. In the present study, we conducted whole genome sequencing and phylogenetic analysis of C. utilis, which showed that this yeast diverged long before the formation of the CUG and Saccharomyces/Kluyveromyces clades. In addition, we performed comparative genome and transcriptome analyses using next-generation sequencing, which resulted in the identification of genes important for characteristic phenotypes of C. utilis such as those involved in nitrate assimilation, in addition to the gene encoding the functional hexose transporter. We also found that an antisense transcript of the alcohol dehydrogenase gene, which in silico analysis did not predict to be a functional gene, was transcribed in the stationary-phase, suggesting a novel system of repression of ethanol production. These findings should facilitate the development of more sophisticated systems for the production of useful reagents using C. utilis.
A mechanistic understanding of robust self-assembly and repair capabilities of complex systems would have enormous implications for basic evolutionary developmental biology as well as for transformative applications in regenerative biomedicine and the engineering of highly fault-tolerant cybernetic systems. Molecular biologists are working to identify the pathways underlying the remarkable regenerative abilities of model species that perfectly regenerate limbs, brains, and other complex body parts. However, a profound disconnect remains between the deluge of high-resolution genetic and protein data on pathways required for regeneration, and the desired spatial, algorithmic models that show how self-monitoring and growth control arise from the synthesis of cellular activities. This barrier to progress in the understanding of morphogenetic controls may be breached by powerful techniques from the computational sciences—using non-traditional modeling approaches to reverse-engineer systems such as planaria: flatworms with a complex bodyplan and nervous system that are able to regenerate any body part after traumatic injury. Currently, the involvement of experts from outside of molecular genetics is hampered by the specialist literature of molecular developmental biology: impactful collaborations across such different fields require that review literature be available that presents the key functional capabilities of important biological model systems while abstracting away from the often irrelevant and confusing details of specific genes and proteins. To facilitate modeling efforts by computer scientists, physicists, engineers, and mathematicians, we present a different kind of review of planarian regeneration. Focusing on the main patterning properties of this system, we review what is known about the signal exchanges that occur during regenerative repair in planaria and the cellular mechanisms that are thought to underlie them. 
Establishing an engineering-like style for reviews of the molecular developmental biology of biomedically important model systems will allow new collaborations between biology and the information sciences to develop significant fresh insights and quantitative computational models.
Eukaryote genomes contain many noncoding regions and are quite complex. To understand this complexity, we constructed the Genome Composition Database, which holds whole-genome composition statistics for 101 eukaryote genomes as well as more than 1,000 prokaryote genomes. Frequencies of all possible oligonucleotides of length 1 to 10 were counted for each genome, and these observed values were compared with expected values computed from the observed frequencies of oligonucleotides of length 1–4. Deviations from expected values were much larger for eukaryotes than for prokaryotes, except for fungal genomes. Mammalian genomes showed the largest deviation among animals. The results of the comparison are available online at http://esper.lab.nig.ac.jp/genome-composition-database/.
GCD; oligonucleotide frequency; alignment-free sequence comparison
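The observed-versus-expected comparison described for the Genome Composition Database can be illustrated with a minimal sketch. It is not the database's actual pipeline. It assumes the simplest case, in which expected counts are derived from length-1 (single-base) frequencies only, treating bases as independent; the abstract also mentions expectations based on lengths 2–4, which are not reproduced here. The function names `kmer_counts`, `expected_counts` and `obs_exp_ratio` are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in seq, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def expected_counts(seq: str, k: int) -> dict:
    """Expected k-mer counts under an assumed independent-base model.

    Each k-mer's probability is the product of the observed single-base
    frequencies, scaled by the number of valid k-mer windows.
    """
    base = kmer_counts(seq, 1)
    total = sum(base.values())
    freq = {b: base[b] / total for b in "ACGT"}
    n_windows = sum(kmer_counts(seq, k).values())
    expected = {}
    for kmer in map("".join, product("ACGT", repeat=k)):
        p = 1.0
        for b in kmer:
            p *= freq[b]
        expected[kmer] = p * n_windows
    return expected


def obs_exp_ratio(seq: str, k: int) -> dict:
    """Observed/expected ratio per k-mer; k-mers with zero expectation are omitted."""
    obs = kmer_counts(seq, k)
    exp = expected_counts(seq, k)
    return {kmer: obs[kmer] / e for kmer, e in exp.items() if e > 0}
```

A ratio far from 1 for a given oligonucleotide indicates over- or under-representation relative to this simple model. Enumerating all 4^k k-mers is practical for small k only; for k near 10 on whole genomes a more memory-conscious approach would be needed.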
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the ‘DDBJ Omics Archive’ (DOR; http://trace.ddbj.nig.ac.jp/dor) and BioProject (http://trace.ddbj.nig.ac.jp/bioproject). DOR is an archival database of functional genomics data generated by microarrays and highly parallel new-generation sequencers. Data are exchanged between ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework for accessing metadata about research projects and the data from those projects that are deposited in different databases. In this article, we describe major changes and improvements introduced to the DDBJ services, and the launch of two new resources: DOR and BioProject.