Nori, a marine red alga, is one of the most profitable mariculture crops in the world. However, the biological properties of this macroalga are poorly understood at the molecular level. In this study, we determined the draft genome sequence of susabi-nori (Pyropia yezoensis) using next-generation sequencing platforms. For sequencing, thalli of P. yezoensis were washed to remove bacteria attached to the cell surface and enzymatically prepared as purified protoplasts. The assembled contig size of the P. yezoensis nuclear genome was approximately 43 megabases (Mb), which is an order of magnitude smaller than the previously estimated genome size. A total of 10,327 gene models were predicted. About 60% of the validated genes lack introns, and the other genes have shorter introns than those of large-genome algae, which is consistent with the compact size of the P. yezoensis genome. A sequence homology search showed that 3,611 genes (35%) are functionally unknown and only 2,069 gene groups are in common with those of the unicellular red alga, Cyanidioschyzon merolae. As color trait determinants of red algae, light-harvesting genes involved in the phycobilisome were predicted from the P. yezoensis nuclear genome. In particular, we found a second homolog of a phycobilisome-degradation gene, which is usually chloroplast-encoded, possibly providing a novel target for color fading of susabi-nori in aquaculture. These findings shed light on unexplained features of macroalgal genes and genomes, and suggest that the genome of P. yezoensis is a promising model genome of marine red algae.
Endogenous retroviruses (ERVs) comprise a significant percentage of the mammalian genome, and it is poorly understood whether they will remain as inactive genomes or emerge as infectious retroviruses. Although several types of ERVs are present in domestic cats, infectious ERVs have not been demonstrated. Here, we report a previously uncharacterized class of endogenous gammaretroviruses, termed ERV-DCs, that is present and hereditary in the domestic cat genome. We have characterized a subset of ERV-DC proviral clones, which are numbered according to their genomic insertions. One of these, ERV-DC10, located in the q12-q21 region on chromosome C1, is an infectious gammaretrovirus capable of infecting a broad range of cells, including human. Our studies indicate that ERV-DC10 entered the genome of domestic cats in the recent past and appeared to translocate to or reintegrate at a distinct locus as infectious ERV-DC18. Insertional polymorphism analysis revealed that 92 of 244 domestic cats had ERV-DC10 on a homozygous or heterozygous locus. ERV-DC-like sequences were found in primate and rodent genomes, suggesting that these ERVs, and recombinant viruses such as RD-114 and BaEV, originated from an ancestor of ERV-DC. We also found that a novel recombinant virus, feline leukemia virus subgroup D (FeLV-D), was generated by ERV-DC env transduction into feline leukemia virus in domestic cats. Our results indicate that ERV-DCs behave as donors and/or acceptors in the generation of infectious, recombinant viruses. The presence of such infectious endogenous retroviruses, which could be harmful or beneficial to the host, may affect veterinary medicine and public health.
Proliferative vitreoretinopathy (PVR) is a destructive complication of retinal detachment and vitreoretinal surgery which can lead to severe vision reduction by tractional retinal detachments. The purpose of this study was to determine the gene expression profile of epiretinal membranes (ERMs) associated with a PVR (PVR-ERM) and to compare it to the expression profile of less-aggressive secondary ERMs.
A PCR-amplified complementary DNA (cDNA) library was constructed using the RNAs isolated from ERMs obtained during vitrectomy. The sequence from the 5′ end was obtained for randomly selected clones and used to generate expressed sequence tags (ESTs). We obtained 1116 nonredundant clusters representing individual genes expressed in PVR-ERMs, and 799 clusters representing the genes expressed in secondary ERMs. The transcriptome of the PVR-ERMs was subdivided by functional subsets of genes related to metabolism, cell adhesion, cytoskeleton, signaling, and other functions, by FatiGo analysis. The genes highly expressed in PVR-ERMs were compared to those expressed in the secondary ERMs, and these were subdivided by cell adhesion, proliferation, and other functions. Querying 10 cell adhesion-related genes against the STRING database yielded 70 possible physical relationships to other genes/proteins, which included an additional 60 genes that were not detected in the PVR-ERM library. Of these, soluble CD44 and soluble vascular cellular adhesion molecule-1 were significantly increased in the vitreous of patients with PVR.
Our results support an earlier hypothesis that a PVR-ERM, even from a genomic point of view, is an aberrant form of wound healing response. Genes preferentially expressed in PVR-ERMs may play an important role in the progression of PVR and could serve as therapeutic targets.
In mammalian evolution, some genes essential for placental development are known to be of retroviral origin; for example, syncytin-1, derived from an envelope (env) gene of an endogenous retrovirus (ERV), aids cell fusion in the human placenta. Although the placenta serves the same function in all placental mammals, env-derived genes responsible for trophoblast cell fusion and maternal immune tolerance differ among species and remain largely unidentified in the bovine species. To comprehensively examine env-derived genes that play a role in bovine placental development, we determined the transcriptomic profiles of bovine conceptuses during three crucial windows of the implantation period using a high-throughput sequencer. The sequence reads were mapped onto the bovine genome, in which ERV candidates were annotated using RetroTector© (7,624 and 1,542 for ERV-derived and env-derived genes, respectively). The mapped reads showed that approximately 18% (284 genes) of env-derived genes in the genome were expressed during placenta formation, and approximately 4% (63 genes) were detected on all days examined. We verified by polymerase chain reaction that three env-derived genes are expressed in trophoblast cells. Of these three, the env-derived gene with the longest open reading frame (named BERV-P env) showed high expression levels in trophoblast cell lines, and its sequence was similar to those of syncytin-Car1 genes found in dogs and cats, despite their disparate origins. These results suggest that placentation depends on various retrovirus-derived genes that could have replaced endogenous predecessors during evolution.
endogenous retrovirus; RNA-seq; syncytin; envelope; cow
The osteoblast-lineage consists of cells at various stages of maturation that are essential for skeletal development, growth, and maintenance. Over the past decade, many of the signaling cascades that regulate this lineage have been elucidated; however, little is known of the networks that coordinate, modulate, and transmit these signals. Here, we identify a gene network specific to the osteoblast-lineage through the reconstruction of a bone co-expression network using microarray profiles collected on 96 Hybrid Mouse Diversity Panel (HMDP) inbred strains. Of the 21 modules that comprised the bone network, module 9 (M9) contained genes that were highly correlated with prototypical osteoblast marker genes and were more highly expressed in osteoblasts relative to other bone cells. In addition, the M9 contained many of the key genes that define the osteoblast-lineage, which together suggested that it was specific to this lineage. To use the M9 to identify novel osteoblast genes and highlight its biological relevance, we knocked down the expression of its two most connected “hub” genes, Maged1 and Pard6g. Their perturbation altered both osteoblast proliferation and differentiation. Furthermore, we demonstrated that mice deficient in Maged1 had decreased bone mineral density (BMD). We also discovered that a local expression quantitative trait locus (eQTL) regulating the Wnt signaling antagonist Sfrp1 was a key driver of the M9. We also show that the M9 is associated with BMD in the HMDP and is enriched for genes implicated in the regulation of human BMD through genome-wide association studies. In conclusion, we have identified a physiologically relevant gene network and used it to discover novel genes and regulatory mechanisms involved in the function of osteoblast-lineage cells. Our results highlight the power of harnessing natural genetic variation to generate co-expression networks that can be used to gain insight into the function of specific cell-types.
The osteoblast-lineage consists of a range of cells from osteogenic precursors that mature into bone-forming osteoblasts to osteocytes that are entombed in bone. Each cell in the lineage serves a number of distinct and critical roles in the growth and maintenance of the skeleton, as well as many extra-skeletal functions. Over the last decade, many of the major regulatory pathways governing the differentiation and activity of these cells have been discovered. In contrast, little is known regarding the composition or function of gene networks within the lineage. The goal of this study was to increase our understanding of how genes are organized into networks in osteoblasts. Towards this goal, we used microarray gene expression profiles from bone to identify a group of genes that formed a network specific to the osteoblast-lineage. We used the knowledge of this network to identify novel genes that are important for regulating various aspects of osteoblast function. These data improve our understanding of the gene networks operative in cells of the osteoblast-lineage.
H-InvDB (http://www.h-invitational.jp/) is a comprehensive human gene database started in 2004. In the latest version, H-InvDB 8.0, a total of 244 709 human complementary DNAs were mapped onto the hg19 reference genome and 43 829 gene loci, including nonprotein-coding ones, were identified. Of these loci, 35 631 were identified as potential protein-coding genes, and 22 898 of these were identical to known genes. In our analysis, 19 309 annotated genes were specific to H-InvDB and not found in RefSeq and Ensembl. In fact, 233 of these 19 309 genes turned out to have protein functions in this version of H-InvDB; they had been annotated as having unknown protein functions in the previous version. Furthermore, 11 genes were identified as known Mendelian disorder genes. An advantage of H-InvDB is therefore that many biologically functional genes are hidden among its unique genes. As large-scale proteomic projects have been conducted to elucidate the functions of all human proteins, we have enhanced the proteomic information with an advanced protein view and a new subdatabase of protein complexes (Protein Complex Database with quality index). We propose that H-InvDB is an important resource for finding novel candidate targets for medical care and drug development.
The relationship between sequence polymorphisms and human disease has been studied mostly in terms of effects of single nucleotide polymorphisms (SNPs) leading to single amino acid substitutions that change protein structure and function. However, less attention has been paid to more drastic sequence polymorphisms which cause premature termination of a protein’s sequence or large changes, insertions, or deletions in the sequence. We have analyzed a large set (n = 512) of insertions and deletions (indels) and single nucleotide polymorphisms causing premature termination of translation in disease-related genes. Prediction of protein-destabilization effects was performed by graphical presentation of the locations of polymorphisms in the protein structure, using the Genomes TO Protein (GTOP) database, and manual annotation with a set of specific criteria. Protein-destabilization was predicted for 44.4% of the nonsense SNPs, 32.4% of the frameshifting indels, and 9.1% of the non-frameshifting indels. A prediction of nonsense-mediated decay allowed us to infer which truncated proteins would actually be translated as defective proteins. These cases included proteins linked to dominantly inherited diseases, suggesting a relation between these diseases and toxic aggregation. Our approach would be useful in identifying potentially aggregation-inducing polymorphisms that may have pathological effects.
Complement C3 and C4 play key roles in the main physiological activities of the complement system, and their deficiency or over-expression is associated with many infectious or immune diseases. A two-stage genome-wide association study (GWAS) was performed for serum levels of C3 and C4. The first stage was conducted in 1,999 healthy Chinese men, and the second stage was performed in an additional 1,496 subjects. We identified two SNPs, rs3753394 in the CFH gene and rs3745567 in the C3 gene, that are significantly associated with serum C3 levels at a genome-wide significance level (P = 7.33×10⁻¹¹ and P = 1.83×10⁻⁹, respectively). For C4, one large genomic region on chromosome 6p21.3 is significantly associated with serum C4 levels. Two SNPs (rs1052693 and rs11575839) were located in the MHC class I region that includes the HLA-A, HLA-C, and HLA-B genes. Two SNPs (rs2075799 and rs2857009) were located 5′ and 3′ of the C4 gene. The other four SNPs, rs2071278, rs3763317, rs9276606, and rs241428, were located in the MHC class II region that includes the HLA-DRA, HLA-DRB, and HLA-DQB genes. The combined P-values for those eight SNPs ranged from 3.19×10⁻²² to 5.62×10⁻⁹⁷. HBsAg-positive subjects had significantly lower C3 and C4 protein concentrations than HBsAg-negative subjects (P<0.05). Our study is the first GWAS report showing that genetic components influence the levels of complement C3 and C4. Our findings provide novel insights into the related autoimmune and infectious diseases and their molecular mechanisms.
The complement system plays important roles in innate and adaptive immune functions. C3 and C4 participate in almost all physiological activities and activated pathways as key complement members and host defense proteins. Identifying the genes that influence serum levels of C3 and C4 may help to elucidate the factors and mechanisms underlying the complement system. Genome-wide association studies (GWAS) have shown great success in revealing robust associations in both quantitative and qualitative traits. In this study, we performed a two-stage GWAS in a large cohort from the Chinese male population to examine the effects of common genetic variants on serum C3 and C4 levels. Our research identified genetic determinants associated with the quantitative levels of C3 and C4. Overall, our study highlights an intricate regulation of complement levels and potentially reveals novel mechanisms that may be followed up with additional functional studies.
We sequenced the genome of Theileria orientalis, a tick-borne apicomplexan protozoan parasite of cattle. The focus of this study was a comparative genome analysis of T. orientalis relative to other highly pathogenic Theileria species, T. parva and T. annulata. T. parva and T. annulata induce transformation of infected cells of lymphocyte or macrophage/monocyte lineages; in contrast, T. orientalis does not induce uncontrolled proliferation of infected leukocytes and multiplies predominantly within infected erythrocytes. While synteny across homologous chromosomes of the three Theileria species was found to be well conserved overall, subtelomeric structures were found to differ substantially, as T. orientalis lacks the large tandemly arrayed subtelomere-encoded variable secreted protein-encoding gene family. Moreover, expansion of particular gene families by gene duplication was found in the genomes of the two transforming Theileria species, most notably, the TashAT/TpHN and Tar/Tpr gene families. Gene families that are present only in T. parva and T. annulata and not in T. orientalis, Babesia bovis, or Plasmodium were also identified. Identification of differences between the genome sequences of Theileria species with different abilities to transform and immortalize bovine leukocytes will provide insight into proteins and mechanisms that have evolved to induce and regulate this process. The T. orientalis genome database is available at http://totdb.czc.hokudai.ac.jp/.
Cancer-like growth of leukocytes infected with malignant Theileria parasites is a unique cellular event, as it involves the transformation and immortalization of one eukaryotic cell by another. In this study, we sequenced the whole genome of a nontransforming Theileria species, Theileria orientalis, and compared it to the published sequences representative of two malignant, transforming species, T. parva and T. annulata. The genome-wide comparison of these parasite species highlights significant genetic diversity that may be associated with evolution of the mechanism(s) deployed by an intracellular eukaryotic parasite to transform its host cell.
Numerous cultivars of Japanese flowering cherry (Prunus subgenus Cerasus) are recognized, but in many cases they are difficult to distinguish morphologically. Therefore, we evaluated the clonal status of 215 designated cultivars using 17 SSR markers. More than half the cultivars were morphologically distinct and had unique genotypes. However, 22 cultivars were found to consist of multiple clones, which probably originated from chance seedlings, suggesting that their unique characteristics have not been maintained through propagation by grafting alone. We also identified 23 groups consisting of two or more cultivars with identical genotypes. Most members of these groups were putatively synonymously related and morphologically identical. However, some of them were probably derived from bud sport mutants and had distinct morphologies. SSR marker analysis provided useful insights into the clonal status of the examined Japanese flowering cherry cultivars and proved to be a useful tool for cultivar characterization.
Cerasus; clone identification; cultivars; Prunus; SSR; microsatellite; taxonomy
Kleptoplastidy is the retention of plastids obtained from ingested algal prey, which may remain temporarily functional and be used for photosynthesis by the predator. We showed that the marine dinoflagellate Dinophysis mitra has great kleptoplastid diversity. By gene cloning from 14 D. mitra cells, we obtained 308 plastid rbcL sequences representing 102 operational taxonomic units (OTUs). Most sequences were new to the genetic database and positioned within Haptophyceae (227 sequences [73.7%], 80 OTUs [78.4%]), particularly within the genus Chrysochromulina. Others were closely related to Prasinophyceae (16 sequences [5.2%], 5 OTUs [4.9%]), Dictyochophyceae (14 sequences [4.5%], 5 OTUs [4.9%]), Pelagophyceae (14 sequences [4.5%], 1 OTU [1.0%]), Bolidophyceae (3 sequences [1.0%], 1 OTU [1.0%]), and Bacillariophyceae (1 sequence [0.3%], 1 OTU [1.0%]); however, 33 sequences (10.8%) in 9 OTUs (8.8%) did not cluster closely with any particular group. Only six sequences were identical to those of Chrysochromulina simplex, Chrysochromulina hirta, Chrysochromulina sp. TKB8936, Micromonas pusilla NEPCC29, Micromonas pusilla CCMP491, and an unidentified diatom. Thus, we detected >100 different plastid sequences from 14 D. mitra cells, strongly suggesting kleptoplastidy and the need for mixotrophic prey such as Laboea, Tontonia, and Strombidium-like ciliates, which retain numerous symbiotic plastids from different origins, for propagation and plastid sequestration.
The Targeted Proteins Research Program (TPRP) promoted by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan is phase II of a structural biology project (2007–2011), following the Protein 3000 Project (2002–2006) in Japan. While the phase I Protein 3000 Project put partial emphasis on the construction and maintenance of pipelines for structural analyses, the TPRP is dedicated to revealing the structures and functions of the targeted proteins that have great importance in both basic research and industrial applications. To pursue this objective, 35 Targeted Proteins (TP) Projects selected in the three areas of fundamental biology, medicine and pharmacology, and food and environment collaborate closely with 10 Advanced Technology (AT) Projects in the four fields of protein production, structural analyses, chemical library and screening, and information platform. Here, the outlines and achievements of the 35 TP Projects are summarized in the system named TP Atlas. Progress in the diversified areas is described in the modules of Graphical Summary, General Summary, Tabular Summary, and Structure Gallery of the TP Atlas in a standard and unified format. Advances in TP Projects owing to novel technologies stemming from AT Projects and to collaborative research among TP Projects are illustrated as a hallmark of the Program. The TP Atlas can be accessed at http://net.genes.nig.ac.jp/tpatlas/index_e.html.
Electronic supplementary material
The online version of this article (doi:10.1007/s10969-012-9139-1) contains supplementary material, which is available to authorized users.
Structural biology; National project; Research dissemination; Targeted Proteins Research Program; Protein 3000 Project
The industrially important food-yeast Candida utilis is a Crabtree effect-negative yeast used to produce valuable chemicals and recombinant proteins. In the present study, we conducted whole genome sequencing and phylogenetic analysis of C. utilis, which showed that this yeast diverged long before the formation of the CUG and Saccharomyces/Kluyveromyces clades. In addition, we performed comparative genome and transcriptome analyses using next-generation sequencing, which resulted in the identification of genes important for characteristic phenotypes of C. utilis such as those involved in nitrate assimilation, in addition to the gene encoding the functional hexose transporter. We also found that an antisense transcript of the alcohol dehydrogenase gene, which in silico analysis did not predict to be a functional gene, was transcribed in the stationary-phase, suggesting a novel system of repression of ethanol production. These findings should facilitate the development of more sophisticated systems for the production of useful reagents using C. utilis.
A mechanistic understanding of robust self-assembly and repair capabilities of complex systems would have enormous implications for basic evolutionary developmental biology as well as for transformative applications in regenerative biomedicine and the engineering of highly fault-tolerant cybernetic systems. Molecular biologists are working to identify the pathways underlying the remarkable regenerative abilities of model species that perfectly regenerate limbs, brains, and other complex body parts. However, a profound disconnect remains between the deluge of high-resolution genetic and protein data on pathways required for regeneration, and the desired spatial, algorithmic models that show how self-monitoring and growth control arise from the synthesis of cellular activities. This barrier to progress in the understanding of morphogenetic controls may be breached by powerful techniques from the computational sciences—using non-traditional modeling approaches to reverse-engineer systems such as planaria: flatworms with a complex bodyplan and nervous system that are able to regenerate any body part after traumatic injury. Currently, the involvement of experts from outside of molecular genetics is hampered by the specialist literature of molecular developmental biology: impactful collaborations across such different fields require that review literature be available that presents the key functional capabilities of important biological model systems while abstracting away from the often irrelevant and confusing details of specific genes and proteins. To facilitate modeling efforts by computer scientists, physicists, engineers, and mathematicians, we present a different kind of review of planarian regeneration. Focusing on the main patterning properties of this system, we review what is known about the signal exchanges that occur during regenerative repair in planaria and the cellular mechanisms that are thought to underlie them. 
By establishing an engineering-like style for reviews of the molecular developmental biology of biomedically important model systems, significant fresh insights and quantitative computational models will be developed by new collaborations between biology and the information sciences.
Eukaryote genomes contain many noncoding regions, and they are quite complex. To understand these complexities, we constructed a database, Genome Composition Database, of whole-genome composition statistics for 101 eukaryote genomes, as well as more than 1,000 prokaryote genomes. Frequencies of all possible oligonucleotides of length 1 to 10 were counted for each genome, and these observed values were compared with expected values computed from the observed frequencies of oligonucleotides of length 1–4. Deviations from expected values were much larger for eukaryotes than for prokaryotes, except for fungal genomes. Mammalian genomes showed the largest deviation among animals. The results of the comparison are available online at http://esper.lab.nig.ac.jp/genome-composition-database/.
GCD; oligonucleotide frequency; alignment-free sequence comparison
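The observed-versus-expected comparison described in the Genome Composition Database abstract can be illustrated with a minimal sketch. The abstract does not spell out how expected values are derived, so the code below assumes the simplest case: bases are treated as independent and the expected count of an oligonucleotide is computed from the observed mononucleotide (length 1) frequencies only. The function names `kmer_counts` and `observed_over_expected` are hypothetical and are not taken from the database.

```python
from collections import Counter
from math import prod


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows that contain non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def observed_over_expected(seq, k):
    """Ratio of observed k-mer counts to counts expected from base composition.

    Assumption (not from the abstract): an order-0 model in which the expected
    frequency of a word is the product of its mononucleotide frequencies.
    """
    mono = kmer_counts(seq, 1)
    total_bases = sum(mono.values())
    observed = kmer_counts(seq, k)
    n_windows = sum(observed.values())
    ratios = {}
    for word, count in observed.items():
        expected = n_windows * prod(mono[base] / total_bases for base in word)
        ratios[word] = count / expected
    return ratios
```

A ratio far from 1 marks an oligonucleotide that is over- or under-represented relative to base composition. Expectations built from length 2 to 4 frequencies would need a Markov-chain model, which this sketch does not attempt.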
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the ‘DDBJ Omics Archive’ (DOR; http://trace.ddbj.nig.ac.jp/dor) and BioProject (http://trace.ddbj.nig.ac.jp/bioproject). DOR is an archival database of functional genomics data generated by microarray and highly parallel new generation sequencers. Data are exchanged between the ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and the data from the projects that are deposited into different databases. In this article, we describe major changes and improvements introduced to the DDBJ services, and the launch of two new resources: DOR and BioProject.
Duplogs, or intraspecies paralogs, constitute an important portion of eukaryote genomes and serve as a major source of functional innovation. We conducted detailed analyses of recently emerged animal duplogs. Genome data of three vertebrate species (Homo sapiens, Mus musculus, and Danio rerio), Caenorhabditis elegans, and two Drosophila species (Drosophila melanogaster and D. pseudoobscura) were used. Duplication events were divided into six age-groups according to the synonymous distance (dS) up to 0.6. Duplogs were classified into four equal-sized classes on physical distances and into three classes on relative orientations. We observed the following shared characteristics among intrachromosomal multiexon duplogs: 1) inverted duplogs account for 20–50% overall and about half of the physically most distant 25%; 2) except for C. elegans, the composition of physical distances, that of relative orientations, and the proportion of inverted duplogs in each physical distance category are more or less uniform across age-groups; 3) except for C. elegans, the characteristics of the youngest (dS < 0.01) duplogs are similar to the overall characteristics of the entire set. These results suggest that intrachromosomal duplogs with fairly long physical distances were generated at once, rather than resulting from tandem duplications and subsequent genomic rearrangements. This is different from the three well-known modes of gene duplication: tandem duplication, retrotransposition, and genome duplication. We term this new mode “drift” duplication. Drift duplication has been producing duplicate copies at a pace comparable with that of tandem duplication since the common ancestor of vertebrates, and it may have already operated in the common ancestor of bilateral animals.
duplog; paralog; gene duplication; physical distance; transcriptional orientation; animals; genome-wide analysis; cross-sectional analysis
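The three classifications used in the duplog analysis (age-group by dS, equal-sized physical distance classes, and relative orientation) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: only the dS limits 0.01 and 0.6 come from the abstract, the intermediate age cut-offs are hypothetical placeholders, and the three orientation labels (direct, convergent, divergent) are one plausible reading of "three classes on relative orientations". The `DuplogPair` record and all function names are invented for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from statistics import quantiles

# Upper bounds of six dS age-groups. Only 0.01 and 0.6 appear in the abstract;
# the intermediate cut-offs are hypothetical placeholders.
AGE_BOUNDS = (0.01, 0.05, 0.1, 0.2, 0.4, 0.6)


@dataclass
class DuplogPair:
    start_a: int   # start coordinate of the upstream copy
    strand_a: str  # "+" or "-"
    start_b: int   # start coordinate of the downstream copy
    strand_b: str  # "+" or "-"
    ds: float      # synonymous distance between the two copies


def age_group(ds):
    """Return 0 (youngest, dS < 0.01) to 5, or None when dS >= 0.6."""
    idx = bisect_right(AGE_BOUNDS, ds)
    return idx if idx < len(AGE_BOUNDS) else None


def orientation(pair):
    """Three-way relative orientation, assuming copy a lies upstream of copy b."""
    if pair.strand_a == pair.strand_b:
        return "direct"
    return "convergent" if pair.strand_a == "+" else "divergent"


def distance_classes(pairs):
    """Assign each pair to a distance quartile: 1 (closest) to 4 (most distant)."""
    dists = [abs(p.start_b - p.start_a) for p in pairs]
    q1, q2, q3 = quantiles(dists, n=4)
    return [1 + (d > q1) + (d > q2) + (d > q3) for d in dists]


def inverted_fraction(pairs, classes, wanted_class):
    """Fraction of pairs in one distance class whose copies lie on opposite strands."""
    chosen = [p for p, c in zip(pairs, classes) if c == wanted_class]
    inverted = sum(orientation(p) != "direct" for p in chosen)
    return inverted / len(chosen)
```

Calling `inverted_fraction(pairs, distance_classes(pairs), 4)` gives the share of inverted duplogs among the physically most distant quarter, the quantity the abstract reports as about one half.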
We propose an innovative, integrated, cost-effective health system to combat major non-communicable diseases (NCDs), including cardiovascular, chronic respiratory, metabolic, rheumatologic and neurologic disorders and cancers, which together are the predominant health problem of the 21st century. This proposed holistic strategy involves comprehensive patient-centered integrated care and multi-scale, multi-modal and multi-level systems approaches to tackle NCDs as a common group of diseases. Rather than studying each disease individually, it will take into account their intertwined gene-environment, socio-economic interactions and co-morbidities that lead to individual-specific complex phenotypes. It will implement a road map for predictive, preventive, personalized and participatory (P4) medicine based on a robust and extensive knowledge management infrastructure that contains individual patient information. It will be supported by strategic partnerships involving all stakeholders, including general practitioners associated with patient-centered care. This systems medicine strategy, which will take a holistic approach to disease, is designed to allow the results to be used globally, taking into account the needs and specificities of local economies and health systems.
MicroRNAs (miRNAs) are small non-coding RNAs that act as regulators of gene expression in eukaryotes, modulating a large diversity of biological processes. The discovery of miRNAs has provided new opportunities to understand the biology of a number of species. The cattle tick, Rhipicephalus (Boophilus) microplus, causes significant economic losses in cattle production worldwide, which motivates efforts to better understand its biology so that effective control measures can be developed. To provide new insights into the biology of cattle ticks and to expand the repertoire of tick miRNAs, we utilized Illumina technology to sequence the small RNA transcriptomes derived from various life stages and selected organs of R. microplus.
To discover and profile cattle tick miRNAs we employed two complementary approaches, one aiming to find evolutionarily conserved miRNAs and another focused on the discovery of novel cattle-tick specific miRNAs. We found 51 evolutionarily conserved R. microplus miRNA loci, with 36 of these previously found in the tick Ixodes scapularis. The majority of the R. microplus miRNAs are perfectly conserved throughout evolution with 11, 5 and 15 of these conserved since the Nephrozoan (640 MYA), Protostomian (620 MYA) and Arthropoda (540 MYA) ancestor, respectively. We then employed a de novo computational screen for novel tick miRNAs using the draft genome of I. scapularis and genomic contigs of R. microplus as templates. This identified 36 novel R. microplus miRNA loci of which 12 were conserved in I. scapularis. Overall, we found 87 R. microplus miRNA loci, 15 of which showed the expression of both miRNA and miRNA* sequences. R. microplus miRNAs showed a variety of expression profiles, with the evolutionarily conserved miRNAs mainly expressed in all life stages at various levels, while the expression of novel tick-specific miRNAs was mostly limited to particular life stages and/or tick organs.
Anciently acquired miRNAs in the R. microplus lineage not only tend to accumulate fewer nucleotide substitutions than recently acquired miRNAs, but also show ubiquitous expression profiles throughout tick life stages and organs, in contrast to the restricted expression profiles of novel tick-specific miRNAs.
Although structural domains in proteins (SDs) are important, half of the regions in the human proteome are currently left with no SD assignments. These unassigned regions consist not only of novel SDs, but also of intrinsically disordered (ID) regions since proteins, especially those in eukaryotes, generally contain a significant fraction of ID regions. As ID regions can be inferred from amino acid sequences, a method that combines SD and ID region assignments can determine the fractions of SDs and ID regions in any proteome.
In contrast to other available ID prediction programs that merely identify likely ID regions, the DICHOT system we previously developed classifies the entire protein sequence into SDs and ID regions. Application of DICHOT to the human proteome revealed that residue-wise ID regions constitute 35%, SDs with similarity to PDB structures comprise 52%, while SDs with no similarity to PDB structures account for the remaining 13%. The last group consists of novel structural domains, termed cryptic domains, which serve as good targets of structural genomics. The DICHOT method applied to the proteomes of other model organisms indicated that eukaryotes generally have high ID contents, while prokaryotes do not. In human proteins, ID contents differ among subcellular localizations: nuclear proteins had the highest residue-wise ID fraction (47%), while mitochondrial proteins exhibited the lowest (13%). Phosphorylation and O-linked glycosylation sites were found to be located preferentially in ID regions. As O-linked glycans are attached to residues in the extracellular regions of proteins, the modification is likely to protect the ID regions from proteolytic cleavage in the extracellular environment. Alternative splicing events tend to occur more frequently in ID regions. We interpret this as evidence that natural selection is operating at the protein level in alternative splicing.
We classified all regions of proteins into two categories, SDs and ID regions, and thereby obtained various kinds of complete genome-wide statistics. The results of the present study provide important basic information for understanding protein structural architectures and have been made publicly available at http://spock.genes.nig.ac.jp/~genome/DICHOT.
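As an illustration of what "residue-wise" fractions mean in the statistics above, the following is a minimal sketch, not the DICHOT implementation. It assumes each protein is already annotated as a list of non-overlapping (start, end, label) regions with 1-based inclusive coordinates. The function name, the labels, and the two toy proteins are hypothetical; the toy lengths were chosen only so that the totals echo the 52%, 35% and 13% reported for the human proteome.

```python
from collections import Counter

def residue_fractions(proteins):
    """Return the fraction of all residues carrying each label.

    `proteins` is a list of proteins; each protein is a list of
    (start, end, label) regions with 1-based inclusive coordinates.
    Residues are pooled across proteins, so long proteins weigh more
    than short ones (residue-wise rather than protein-wise).
    """
    counts = Counter()
    for regions in proteins:
        for start, end, label in regions:
            counts[label] += end - start + 1  # inclusive region length
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical toy annotations (made-up data, 100 residues in total).
toy_proteome = [
    [(1, 40, "SD_pdb"), (41, 60, "ID")],
    [(1, 12, "SD_pdb"), (13, 27, "ID"), (28, 40, "SD_cryptic")],
]

fractions = residue_fractions(toy_proteome)
for label in ("SD_pdb", "ID", "SD_cryptic"):
    print(f"{label}: {fractions[label]:.2f}")
```

Pooling residues before dividing is the design choice that makes the statistic residue-wise; averaging per-protein fractions instead would give a protein-wise figure and generally a different number.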
Having the ability to coordinate the behavior of stem cells to induce regeneration of specific large-scale structures would have far-reaching consequences in the treatment of degenerative diseases, acute injury, and aging. Thus, identifying and learning to manipulate the sequential steps that determine the fate of new tissue within the overall morphogenetic program of the organism is fundamental. We identified novel early signals, mediated by the central nervous system and 3 innexin proteins, which determine the fate and axial polarity of regenerated tissue in planarians. Modulation of gap junction-dependent and neural signals specifically induces ectopic anterior regeneration blastemas in posterior and lateral wounds. These ectopic anterior blastemas differentiate new brains that establish permanent primary axes, which are re-established during subsequent rounds of unperturbed regeneration. These data reveal powerful novel controls of pattern formation and suggest a constructive model linking nervous inputs and polarity determination in early stages of regeneration.
gap junctions; neural signals; regeneration; polarity; planaria
Preterm birth is the major cause of neonatal death and serious morbidity. Most preterm births are due to spontaneous onset of labor without a known cause or effective prevention. Both maternal and fetal genomes influence the predisposition to spontaneous preterm birth (SPTB), but the susceptibility loci remain to be defined. We utilized a combination of unique population structures, family-based linkage analysis, and subsequent case-control association to identify a susceptibility haplotype for SPTB. Clinically well-characterized SPTB families from northern Finland, a subisolate founded by a relatively small founder population that has subsequently experienced a number of bottlenecks, were selected for the initial discovery sample. Genome-wide linkage analysis using a high-density single-nucleotide polymorphism (SNP) array in seven large northern Finnish non-consanguineous families identified a locus on 15q26.3 (HLOD 4.68). This region contains the IGF1R gene, which encodes the type 1 insulin-like growth factor receptor IGF-1R. Haplotype segregation analysis revealed that a 55 kb 12-SNP core segment within the IGF1R gene was shared identical-by-state (IBS) in five families. A follow-up case-control study in an independent sample representing the more general Finnish population showed an association of a 6-SNP IGF1R haplotype with SPTB in the fetuses, providing further evidence for IGF1R as an SPTB predisposition gene (frequency in cases versus controls 0.11 versus 0.05, P = 0.001, odds ratio 2.3). This study demonstrates the identification of a predisposing, low-frequency haplotype in a multifactorial trait using a well-characterized population and a combination of family and case-control designs. Our findings support the identification of the novel susceptibility gene IGF1R for predisposition by the fetal genome to being born preterm.
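The reported odds ratio can be checked approximately against the two haplotype frequencies given above. The sketch below assumes the standard definition of an odds ratio applied directly to the rounded frequencies (0.11 in cases, 0.05 in controls). The published value was presumably computed from the underlying counts, which are not given here, so only agreement to one decimal place should be expected. The function name is hypothetical.

```python
def odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds ratio for carrying a haplotype, from its frequency in cases and controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)           # odds of the haplotype among cases
    odds_controls = freq_controls / (1.0 - freq_controls)  # odds of the haplotype among controls
    return odds_cases / odds_controls

# Rounded frequencies as quoted in the text above.
print(round(odds_ratio(0.11, 0.05), 1))  # 2.3
```

With these inputs the unrounded ratio is about 2.35, consistent with the reported odds ratio of 2.3.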
Preterm birth is the major cause of infant deaths and life-long neurologic and cardiopulmonary morbidity. More than 10% of births in the United States occur prematurely, and the rate is increasing without known effective prevention. Previous premature birth increases the risk 3-fold in subsequent pregnancies. We report here, for the first time to our knowledge, a genome-wide study on susceptibility to spontaneous preterm birth in singleton pregnancies. To detect novel regions of the genome associated with preterm birth, we performed linkage analysis on seven carefully selected large families with recurrent spontaneous premature births. When we studied the fetuses, evidence was found for linkage of a region on chromosome 15 with spontaneous preterm birth, with the highest linkage signals occurring within a single gene, IGF1R. Evidence of the involvement of this gene in the etiology of preterm birth was further strengthened by subsequent haplotype segregation analysis and case-control analysis of an independent patient population. The IGF1R gene encodes insulin-like growth factor receptor 1 (IGF-1R), an important protein that potentially regulates signaling cascades involved in the onset of labor. Our analyses are unique in providing evidence that fetal IGF1R influences the risk of spontaneous preterm labor, leading to preterm birth.