1.  Dopamine Receptor Genes and Evolutionary Differentiation in the Domestication of Fighting Cocks and Long-Crowing Chickens 
PLoS ONE  2014;9(7):e101778.
The chicken domestication process represents a typical model of artificial selection, and gives significant insight into the general understanding of the influence of artificial selection on recognizable phenotypes. Two Japanese domesticated chicken varieties, the fighting cock (Shamo) and the long-crowing chicken (Naganakidori), have been selectively bred for dramatically different phenotypes. The former has been selected exclusively for aggressiveness and the latter for long crowing with an obedient sitting posture. To understand the particular mechanism behind these genetic changes during domestication, we investigated the degree of genetic differentiation in the aforementioned chickens, focusing on dopamine receptor D2, D3, and D4 genes. We studied other ornamental chickens such as Chabo chickens as a reference for comparison. When genetic differentiation was measured by an index of nucleotide differentiation (NST) newly devised in this study, we found that the NST value of DRD4 for Shamo (0.072) was distinctively larger than those of the other genes among the three populations, suggesting that aggressiveness has been selected for in Shamo by collecting a variety of single nucleotide polymorphisms. In addition, we found that in DRD4 in Naganakidori, there is a deletion variant of one proline at the 24th residue in the repeat of nine prolines of exon 1. We thus conclude that artificial selection has operated on these different kinds of genetic variation in the DRD4 genes of Shamo and Naganakidori so strongly that the two domesticated varieties have differentiated to obtain their present opposite features in a relatively short period of time.
PMCID: PMC4117491  PMID: 25078403
2.  Coevolution of Axon Guidance Molecule Slit and Its Receptor Robo 
PLoS ONE  2014;9(5):e94970.
Coevolution is important for the maintenance of the interaction between a ligand and its receptor during evolution. The interaction between the axon guidance molecule Slit and its receptor Robo is critical for axon repulsion in neural tissues and is evolutionarily conserved from planarians to humans. However, the mechanism of coevolution between Slit and Robo remains unclear. In this study, by comparing the amino acids at the interacting sites of Slit and Robo among different organisms, we found that coordinated amino acid changes took place at these sites. In addition, a high correlation between the evolutionary rates of Slit and Robo was identified in vertebrates. Furthermore, sites under positive selection in slit and robo were detected in the same lineages, such as mosquito and teleost. Overall, our results provide evidence for the coevolution between Slit and Robo.
PMCID: PMC4011710  PMID: 24801615
4.  Innexin gap junctions in nerve cells coordinate spontaneous contractile behavior in Hydra polyps 
Scientific Reports  2014;4:3573.
Nerve cells and spontaneous coordinated behavior first appeared near the base of animal evolution in the common ancestor of cnidarians and bilaterians. Experiments on the cnidarian Hydra have demonstrated that nerve cells are essential for this behavior, although nerve cells in Hydra are organized in a diffuse network and do not form ganglia. Here we show that the gap junction protein innexin-2 is expressed in a small group of nerve cells in the lower body column of Hydra and that an anti-innexin-2 antibody binds to gap junctions in the same region. Treatment of live animals with innexin-2 antibody eliminates gap junction staining and reduces spontaneous body column contractions. We conclude that a small subset of nerve cells, connected by gap junctions and capable of synchronous firing, act as a pacemaker to coordinate the contraction of the body column in the absence of ganglia.
PMCID: PMC3882753  PMID: 24394722
7.  The First Symbiont-Free Genome Sequence of Marine Red Alga, Susabi-nori (Pyropia yezoensis) 
PLoS ONE  2013;8(3):e57122.
Nori, a marine red alga, is one of the most profitable mariculture crops in the world. However, the biological properties of this macroalga are poorly understood at the molecular level. In this study, we determined the draft genome sequence of susabi-nori (Pyropia yezoensis) using next-generation sequencing platforms. For sequencing, thalli of P. yezoensis were washed to remove bacteria attached on the cell surface and enzymatically prepared as purified protoplasts. The assembled contig size of the P. yezoensis nuclear genome was approximately 43 megabases (Mb), which is an order of magnitude smaller than the previously estimated genome size. A total of 10,327 gene models were predicted and about 60% of the genes validated lack introns and the other genes have shorter introns compared to large-genome algae, which is consistent with the compact size of the P. yezoensis genome. A sequence homology search showed that 3,611 genes (35%) are functionally unknown and only 2,069 gene groups are in common with those of the unicellular red alga, Cyanidioschyzon merolae. As color trait determinants of red algae, light-harvesting genes involved in the phycobilisome were predicted from the P. yezoensis nuclear genome. In particular, we found a second homolog of phycobilisome-degradation gene, which is usually chloroplast-encoded, possibly providing a novel target for color fading of susabi-nori in aquaculture. These findings shed light on unexplained features of macroalgal genes and genomes, and suggest that the genome of P. yezoensis is a promising model genome of marine red algae.
PMCID: PMC3594237  PMID: 23536760
8.  Infectious Endogenous Retroviruses in Cats and Emergence of Recombinant Viruses 
Journal of Virology  2012;86(16):8634-8644.
Endogenous retroviruses (ERVs) comprise a significant percentage of the mammalian genome, and it is poorly understood whether they will remain as inactive genomes or emerge as infectious retroviruses. Although several types of ERVs are present in domestic cats, infectious ERVs have not been demonstrated. Here, we report a previously uncharacterized class of endogenous gammaretroviruses, termed ERV-DCs, that is present and hereditary in the domestic cat genome. We have characterized a subset of ERV-DC proviral clones, which are numbered according to their genomic insertions. One of these, ERV-DC10, located in the q12-q21 region on chromosome C1, is an infectious gammaretrovirus capable of infecting a broad range of cells, including human. Our studies indicate that ERV-DC10 entered the genome of domestic cats in the recent past and appeared to translocate to or reintegrate at a distinct locus as infectious ERV-DC18. Insertional polymorphism analysis revealed that 92 of 244 domestic cats had ERV-DC10 on a homozygous or heterozygous locus. ERV-DC-like sequences were found in primate and rodent genomes, suggesting that these ERVs, and recombinant viruses such as RD-114 and BaEV, originated from an ancestor of ERV-DC. We also found that a novel recombinant virus, feline leukemia virus subgroup D (FeLV-D), was generated by ERV-DC env transduction into feline leukemia virus in domestic cats. Our results indicate that ERV-DCs behave as donors and/or acceptors in the generation of infectious, recombinant viruses. The presence of such infectious endogenous retroviruses, which could be harmful or beneficial to the host, may affect veterinary medicine and public health.
PMCID: PMC3421742  PMID: 22674983
9.  Comparison of Gene Expression Profile of Epiretinal Membranes Obtained from Eyes with Proliferative Vitreoretinopathy to That of Secondary Epiretinal Membranes 
PLoS ONE  2013;8(1):e54191.
Proliferative vitreoretinopathy (PVR) is a destructive complication of retinal detachment and vitreoretinal surgery which can lead to severe vision reduction by tractional retinal detachments. The purpose of this study was to determine the gene expression profile of epiretinal membranes (ERMs) associated with a PVR (PVR-ERM) and to compare it to the expression profile of less-aggressive secondary ERMs.
Methodology/Principal Findings
A PCR-amplified complementary DNA (cDNA) library was constructed using the RNAs isolated from ERMs obtained during vitrectomy. The sequence from the 5′ end was obtained for randomly selected clones and used to generate expressed sequence tags (ESTs). We obtained 1116 nonredundant clusters representing individual genes expressed in PVR-ERMs, and 799 clusters representing the genes expressed in secondary ERMs. The transcriptome of the PVR-ERMs was subdivided by functional subsets of genes related to metabolism, cell adhesion, cytoskeleton, signaling, and other functions, by FatiGo analysis. The genes highly expressed in PVR-ERMs were compared to those expressed in the secondary ERMs, and these were subdivided by cell adhesion, proliferation, and other functions. Querying 10 cell adhesion-related genes against the STRING database yielded 70 possible physical relationships to other genes/proteins, which included an additional 60 genes that were not detected in the PVR-ERM library. Of these, soluble CD44 and soluble vascular cellular adhesion molecule-1 were significantly increased in the vitreous of patients with PVR.
Our results support an earlier hypothesis that a PVR-ERM, even from a genomic point of view, is an aberrant form of wound healing response. Genes preferentially expressed in PVR-ERMs may play an important role in the progression of PVR and could serve as therapeutic targets.
PMCID: PMC3553111  PMID: 23372684
10.  Dynamic Evolution of Endogenous Retrovirus-Derived Genes Expressed in Bovine Conceptuses during the Period of Placentation 
Genome Biology and Evolution  2013;5(2):296-306.
In the evolution of mammals, some genes essential for placental development are known to be of retroviral origin; for example, syncytin-1, derived from an envelope (env) gene of an endogenous retrovirus (ERV), aids in cell fusion in the human placenta. Although the placenta serves the same function in all placental mammals, env-derived genes responsible for trophoblast cell fusion and maternal immune tolerance differ among species and remain largely unidentified in the bovine species. To comprehensively examine env-derived genes playing a role in bovine placental development, we determined the transcriptomic profiles of bovine conceptuses during three crucial windows of the implantation period using a high-throughput sequencer. The sequence reads were mapped onto the bovine genome, in which ERV candidates were annotated using RetroTector© (7,624 and 1,542 for ERV-derived and env-derived genes, respectively). The mapped reads showed that approximately 18% (284 genes) of env-derived genes in the genome were expressed during placenta formation, and approximately 4% (63 genes) were detected on all days examined. We verified by polymerase chain reaction that three env-derived genes are expressed in trophoblast cells. Of these three, the env-derived gene with the longest open reading frame (named BERV-P env) was found to show high expression levels in trophoblast cell lines and to be similar in sequence to the syncytin-Car1 genes found in dogs and cats, despite their disparate origins. These results suggest that placentation depends on various retrovirus-derived genes that could have replaced endogenous predecessors during evolution.
PMCID: PMC3590765  PMID: 23335121
endogenous retrovirus; RNA-seq; syncytin; envelope; cow
11.  H-InvDB in 2013: an omics study platform for human functional gene and transcript discovery 
Nucleic Acids Research  2012;41(Database issue):D915-D919.
H-InvDB is a comprehensive human gene database started in 2004. In the latest version, H-InvDB 8.0, a total of 244 709 human complementary DNAs were mapped onto the hg19 reference genome and 43 829 gene loci, including nonprotein-coding ones, were identified. Of these loci, 35 631 were identified as potential protein-coding genes, and 22 898 of these were identical to known genes. In our analysis, 19 309 annotated genes were specific to H-InvDB and not found in RefSeq and Ensembl. In fact, 233 of these 19 309 genes were assigned protein functions in this version of H-InvDB; they had been annotated as having unknown protein functions in the previous version. Furthermore, 11 genes were identified as known Mendelian disorder genes. It is an advantage that many biologically functional genes are hidden among the H-InvDB unique genes. As large-scale proteomic projects have been conducted to elucidate the functions of all human proteins, we have enhanced the proteomic information with an advanced protein view and a new subdatabase of protein complexes (Protein Complex Database with quality index). We propose that H-InvDB is an important resource for finding novel candidate targets for medical care and drug development.
PMCID: PMC3531145  PMID: 23197657
12.  Prediction of Protein-Destabilizing Polymorphisms by Manual Curation with Protein Structure 
PLoS ONE  2012;7(11):e50445.
The relationship between sequence polymorphisms and human disease has been studied mostly in terms of effects of single nucleotide polymorphisms (SNPs) leading to single amino acid substitutions that change protein structure and function. However, less attention has been paid to more drastic sequence polymorphisms which cause premature termination of a protein’s sequence or large changes, insertions, or deletions in the sequence. We have analyzed a large set (n = 512) of insertions and deletions (indels) and single nucleotide polymorphisms causing premature termination of translation in disease-related genes. Prediction of protein-destabilization effects was performed by graphical presentation of the locations of polymorphisms in the protein structure, using the Genomes TO Protein (GTOP) database, and manual annotation with a set of specific criteria. Protein destabilization was predicted for 44.4% of the nonsense SNPs, 32.4% of the frameshifting indels, and 9.1% of the non-frameshifting indels. A prediction of nonsense-mediated decay allowed us to infer which truncated proteins would actually be translated as defective proteins. These cases included proteins linked to dominantly inherited diseases, suggesting a relation between these diseases and toxic aggregation. Our approach would be useful in identifying potentially aggregation-inducing polymorphisms that may have pathological effects.
PMCID: PMC3506574  PMID: 23189203
13.  GBE Editor’s Report 
Genome Biology and Evolution  2012;4(10):1031-1032.
PMCID: PMC3490415
14.  Comparative Genome Analysis of Three Eukaryotic Parasites with Differing Abilities To Transform Leukocytes Reveals Key Mediators of Theileria-Induced Leukocyte Transformation 
mBio  2012;3(5):e00204-12.
We sequenced the genome of Theileria orientalis, a tick-borne apicomplexan protozoan parasite of cattle. The focus of this study was a comparative genome analysis of T. orientalis relative to other highly pathogenic Theileria species, T. parva and T. annulata. T. parva and T. annulata induce transformation of infected cells of lymphocyte or macrophage/monocyte lineages; in contrast, T. orientalis does not induce uncontrolled proliferation of infected leukocytes and multiplies predominantly within infected erythrocytes. While synteny across homologous chromosomes of the three Theileria species was found to be well conserved overall, subtelomeric structures were found to differ substantially, as T. orientalis lacks the large tandemly arrayed subtelomere-encoded variable secreted protein-encoding gene family. Moreover, expansion of particular gene families by gene duplication was found in the genomes of the two transforming Theileria species, most notably, the TashAT/TpHN and Tar/Tpr gene families. Gene families that are present only in T. parva and T. annulata and not in T. orientalis, Babesia bovis, or Plasmodium were also identified. Identification of differences between the genome sequences of Theileria species with different abilities to transform and immortalize bovine leukocytes will provide insight into proteins and mechanisms that have evolved to induce and regulate this process. The T. orientalis genome database is available at
Cancer-like growth of leukocytes infected with malignant Theileria parasites is a unique cellular event, as it involves the transformation and immortalization of one eukaryotic cell by another. In this study, we sequenced the whole genome of a nontransforming Theileria species, Theileria orientalis, and compared it to the published sequences representative of two malignant, transforming species, T. parva and T. annulata. The genome-wide comparison of these parasite species highlights significant genetic diversity that may be associated with evolution of the mechanism(s) deployed by an intracellular eukaryotic parasite to transform its host cell.
PMCID: PMC3445966  PMID: 22951932
15.  Clone identification in Japanese flowering cherry (Prunus subgenus Cerasus) cultivars using nuclear SSR markers 
Breeding Science  2012;62(3):248-255.
Numerous cultivars of Japanese flowering cherry (Prunus subgenus Cerasus) are recognized, but in many cases they are difficult to distinguish morphologically. Therefore, we evaluated the clonal status of 215 designated cultivars using 17 SSR markers. More than half the cultivars were morphologically distinct and had unique genotypes. However, 22 cultivars were found to consist of multiple clones, which probably originated from chance seedlings, suggesting that their unique characteristics have not been maintained through propagation by grafting alone. We also identified 23 groups consisting of two or more cultivars with identical genotypes. Most members of these groups were putative synonyms and morphologically identical. However, some of them were probably derived from bud sport mutants and had distinct morphologies. SSR marker analysis provided useful insights into the clonal status of the examined Japanese flowering cherry cultivars and proved to be a useful tool for cultivar characterization.
PMCID: PMC3501942  PMID: 23226085
Cerasus; clone identification; cultivars; Prunus; SSR; microsatellite; taxonomy
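The grouping step described in this abstract (finding sets of cultivars whose multilocus SSR genotypes are identical) can be illustrated with a minimal sketch. It is not the authors' procedure: the function name, the cultivar names, the marker names, and the allele sizes below are all hypothetical, and the sketch assumes exact matching at every marker with no allowance for scoring error or missing data.

```python
from collections import defaultdict

def group_identical_genotypes(genotypes):
    """Group samples that share an identical multilocus genotype.

    genotypes: dict mapping sample name -> dict of marker -> (allele, allele).
    Returns a list of groups (sorted name lists) with two or more members.
    """
    groups = defaultdict(list)
    for name, loci in genotypes.items():
        # Sort alleles within a locus and loci within a sample so that
        # the key does not depend on the order in which data were entered.
        key = tuple(sorted((marker, tuple(sorted(alleles)))
                           for marker, alleles in loci.items()))
        groups[key].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)

# Hypothetical data: three cultivars scored at two made-up markers.
example = {
    "cultivar_A": {"ssr1": (120, 134), "ssr2": (201, 201)},
    "cultivar_B": {"ssr1": (134, 120), "ssr2": (201, 201)},  # same as A
    "cultivar_C": {"ssr1": (120, 136), "ssr2": (201, 205)},
}
identical_groups = group_identical_genotypes(example)
```

With the made-up data above, cultivar_A and cultivar_B fall into one group because their genotypes match once allele order is ignored, while cultivar_C remains unique.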
16.  Multiple Plastids Collected by the Dinoflagellate Dinophysis mitra through Kleptoplastidy 
Kleptoplastidy is the retention of plastids obtained from ingested algal prey, which may remain temporarily functional and be used for photosynthesis by the predator. We showed that the marine dinoflagellate Dinophysis mitra has great kleptoplastid diversity. We obtained 308 plastid rbcL sequences by gene cloning from 14 D. mitra cells and 102 operational taxonomic units (OTUs). Most sequences were new in the genetic database and positioned within Haptophyceae (227 sequences [73.7%], 80 OTUs [78.4%]), particularly within the genus Chrysochromulina. Others were closely related to Prasinophyceae (16 sequences [5.2%], 5 OTUs [4.9%]), Dictyochophyceae (14 sequences [4.5%], 5 OTUs [4.9%]), Pelagophyceae (14 sequences [4.5%], 1 OTU [1.0%]), Bolidophyceae (3 sequences [1.0%], 1 OTU [1.0%]), and Bacillariophyceae (1 sequence [0.3%], 1 OTU [1.0%]); however, 33 sequences (10.8%) as 9 OTUs (8.8%) were not closely clustered with any particular group. Only six sequences were identical to those of Chrysochromulina simplex, Chrysochromulina hirta, Chrysochromulina sp. TKB8936, Micromonas pusilla NEPCC29, Micromonas pusilla CCMP491, and an unidentified diatom. Thus, we detected >100 different plastid sequences from 14 D. mitra cells, strongly suggesting kleptoplastidy and the need for mixotrophic prey such as Laboea, Tontonia, and Strombidium-like ciliates, which retain numerous symbiotic plastids from different origins, for propagation and plastid sequestration.
PMCID: PMC3264124  PMID: 22101051
17.  TP Atlas: integration and dissemination of advances in Targeted Proteins Research Program (TPRP)—structural biology project phase II in Japan 
The Targeted Proteins Research Program (TPRP), promoted by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, is phase II of the structural biology project (2007–2011) in Japan, following the Protein 3000 Project (2002–2006). While the phase I Protein 3000 Project put partial emphasis on the construction and maintenance of pipelines for structural analyses, the TPRP is dedicated to revealing the structures and functions of targeted proteins that have great importance in both basic research and industrial applications. To pursue this objective, 35 Targeted Proteins (TP) Projects, selected in the three areas of fundamental biology, medicine and pharmacology, and food and environment, collaborate closely with 10 Advanced Technology (AT) Projects in the four fields of protein production, structural analyses, chemical library and screening, and information platform. Here, the outlines and achievements of the 35 TP Projects are summarized in a system named TP Atlas. Progress in these diverse areas is described in the Graphical Summary, General Summary, Tabular Summary, and Structure Gallery modules of the TP Atlas in a standard, unified format. Advances in TP Projects owing to novel technologies stemming from AT Projects and to collaborative research among TP Projects are illustrated as a hallmark of the Program. The TP Atlas can be accessed at
Electronic supplementary material
The online version of this article (doi:10.1007/s10969-012-9139-1) contains supplementary material, which is available to authorized users.
PMCID: PMC3414706  PMID: 22644393
Structural biology; National project; Research dissemination; Targeted Proteins Research Program; Protein 3000 Project
18.  Genome and Transcriptome Analysis of the Food-Yeast Candida utilis 
PLoS ONE  2012;7(5):e37226.
The industrially important food-yeast Candida utilis is a Crabtree effect-negative yeast used to produce valuable chemicals and recombinant proteins. In the present study, we conducted whole genome sequencing and phylogenetic analysis of C. utilis, which showed that this yeast diverged long before the formation of the CUG and Saccharomyces/Kluyveromyces clades. In addition, we performed comparative genome and transcriptome analyses using next-generation sequencing, which resulted in the identification of genes important for characteristic phenotypes of C. utilis such as those involved in nitrate assimilation, in addition to the gene encoding the functional hexose transporter. We also found that an antisense transcript of the alcohol dehydrogenase gene, which in silico analysis did not predict to be a functional gene, was transcribed in the stationary-phase, suggesting a novel system of repression of ethanol production. These findings should facilitate the development of more sophisticated systems for the production of useful reagents using C. utilis.
PMCID: PMC3356342  PMID: 22629373
19.  A New Database (GCD) on Genome Composition for Eukaryote and Prokaryote Genome Sequences and Their Initial Analyses 
Genome Biology and Evolution  2012;4(4):501-512.
Eukaryote genomes contain many noncoding regions and are quite complex. To understand these complexities, we constructed the Genome Composition Database (GCD), a database of whole-genome composition statistics for 101 eukaryote genomes as well as more than 1,000 prokaryote genomes. Frequencies of all possible oligonucleotides of length 1 to 10 were counted for each genome, and these observed values were compared with expected values computed from the observed frequencies of oligonucleotides of length 1–4. Deviations from expected values were much larger for eukaryotes than for prokaryotes, except for fungal genomes. Mammalian genomes showed the largest deviation among animals. The results of comparison are available online at
PMCID: PMC3342873  PMID: 22417913
GCD; oligonucleotide frequency; alignment-free sequence comparison
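The observed-versus-expected comparison described in this abstract can be sketched for the simplest case only. The sketch assumes that the expected frequency of an oligonucleotide is the product of the observed single-base (length 1) frequencies; the abstract states that expectations were computed from frequencies of length 1–4 but does not give the formula, so this is an illustration and not the GCD method. The function names and the toy sequence are made up.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def kmer_counts(seq, k):
    """Count overlapping k-mers made up only of A, C, G and T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(BASES):
            counts[kmer] += 1
    return counts

def observed_over_expected(seq, k):
    """Observed k-mer frequency divided by the frequency expected if
    bases were independent (assumption: length-1 frequencies only)."""
    mono = kmer_counts(seq, 1)
    total_mono = sum(mono.values())
    observed = kmer_counts(seq, k)
    total_k = sum(observed.values())
    if total_mono == 0 or total_k == 0:
        return {}
    base_freq = {b: mono[b] / total_mono for b in BASES}
    ratios = {}
    for kmer in map("".join, product(BASES, repeat=k)):
        expected = 1.0
        for b in kmer:
            expected *= base_freq[b]
        if expected > 0:  # skip k-mers containing a base absent from seq
            ratios[kmer] = (observed[kmer] / total_k) / expected
    return ratios

toy_sequence = "ACGTACGTAACCGGTT"  # made-up sequence, not real data
dinucleotide_ratios = observed_over_expected(toy_sequence, 2)
```

A ratio above 1 marks an oligonucleotide that is over-represented relative to the independence assumption, and a ratio below 1 marks one that is under-represented. A fuller treatment would replace the independence model with expectations built from longer oligonucleotides, as the abstract indicates.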
20.  The DNA Data Bank of Japan launches a new resource, the DDBJ Omics Archive of functional genomics experiments 
Nucleic Acids Research  2011;40(Database issue):D38-D42.
The DNA Data Bank of Japan (DDBJ) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the ‘DDBJ Omics Archive’ (DOR) and BioProject. DOR is an archival database of functional genomics data generated by microarray and highly parallel new generation sequencers. Data are exchanged between the ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and the data from the projects that are deposited into different databases. In this article, we describe major changes and improvements introduced to the DDBJ services, and the launch of two new resources: DOR and BioProject.
PMCID: PMC3244990  PMID: 22110025
21.  Introduction to the Special Series 
PMCID: PMC3227408
22.  Evolutionary Patterns of Recently Emerged Animal Duplogs 
Genome Biology and Evolution  2011;3:1119-1135.
Duplogs, or intraspecies paralogs, constitute an important portion of eukaryote genomes and serve as a major source of functional innovation. We conducted detailed analyses of recently emerged animal duplogs. Genome data of three vertebrate species (Homo sapiens, Mus musculus, and Danio rerio), Caenorhabditis elegans, and two Drosophila species (Drosophila melanogaster and D. pseudoobscura) were used. Duplication events were divided into six age groups according to the synonymous distance (dS), up to 0.6. Duplogs were classified into four equal-sized classes by physical distance and into three classes by relative orientation. We observed the following shared characteristics among intrachromosomal multiexon duplogs: 1) inverted duplogs account for 20–50% overall, and for about half of the physically most distant 25%; 2) except for C. elegans, the composition of physical distances, that of relative orientations, and the proportion of inverted duplogs in each physical distance category are more or less uniform; 3) except for C. elegans, the characteristics of the youngest (dS < 0.01) duplogs are similar to the overall characteristics of the entire set. These results suggest that intrachromosomal duplogs with fairly long physical distances were generated at once, rather than resulting from tandem duplications and subsequent genomic rearrangements. This is different from the three well-known modes of gene duplication: tandem duplication, retrotransposition, and genome duplication. We termed this new mode “drift” duplication. Drift duplication has been producing duplicate copies at a pace comparable with that of tandem duplication since the common ancestor of vertebrates, and it may have already operated in the common ancestor of bilateral animals.
PMCID: PMC3194840  PMID: 21859807
duplog; paralog; gene duplication; physical distance; transcriptional orientation; animals; genome-wide analysis; cross-sectional analysis
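The classification described in this abstract (four equal-sized physical-distance classes, with the proportion of inverted duplogs computed per class) can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the function names and the example pairs are hypothetical, "inverted" is assumed to mean that the two copies lie on opposite strands, and leftover pairs are simply spread across classes when the total is not divisible by four.

```python
def distance_quartiles(pairs):
    """Split duplog pairs into four (near) equal-sized classes by distance.

    pairs: list of (distance_bp, strand_a, strand_b) tuples.
    Returns four lists ordered from the nearest to the most distant class.
    """
    ranked = sorted(pairs, key=lambda pair: pair[0])
    n = len(ranked)
    return [ranked[i * n // 4:(i + 1) * n // 4] for i in range(4)]

def inverted_fraction(pairs):
    """Fraction of pairs whose copies lie on opposite strands
    (assumed definition of 'inverted'); None for an empty class."""
    if not pairs:
        return None
    return sum(1 for _, a, b in pairs if a != b) / len(pairs)

# Hypothetical pairs: (distance in bp, strand of copy 1, strand of copy 2).
example_pairs = [
    (1_000, "+", "+"), (2_500, "+", "+"),
    (8_000, "-", "-"), (15_000, "+", "-"),
    (60_000, "+", "+"), (90_000, "-", "+"),
    (400_000, "+", "-"), (2_000_000, "-", "-"),
]
classes = distance_quartiles(example_pairs)
fractions = [inverted_fraction(c) for c in classes]
```

Comparing the per-class fractions is the kind of summary behind statements such as the proportion of inverted duplogs in the most distant 25%; the numbers produced from the made-up pairs above carry no biological meaning.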
23.  Systems medicine and integrated care to combat chronic noncommunicable diseases 
Genome Medicine  2011;3(7):43.
We propose an innovative, integrated, cost-effective health system to combat major non-communicable diseases (NCDs), including cardiovascular, chronic respiratory, metabolic, rheumatologic and neurologic disorders and cancers, which together are the predominant health problem of the 21st century. This proposed holistic strategy involves comprehensive patient-centered integrated care and multi-scale, multi-modal and multi-level systems approaches to tackle NCDs as a common group of diseases. Rather than studying each disease individually, it will take into account their intertwined gene-environment, socio-economic interactions and co-morbidities that lead to individual-specific complex phenotypes. It will implement a road map for predictive, preventive, personalized and participatory (P4) medicine based on a robust and extensive knowledge management infrastructure that contains individual patient information. It will be supported by strategic partnerships involving all stakeholders, including general practitioners associated with patient-centered care. This systems medicine strategy, which will take a holistic approach to disease, is designed to allow the results to be used globally, taking into account the needs and specificities of local economies and health systems.
PMCID: PMC3221551  PMID: 21745417
24.  Evolutionary conserved microRNAs are ubiquitously expressed compared to tick-specific miRNAs in the cattle tick Rhipicephalus (Boophilus) microplus 
BMC Genomics  2011;12:328.
MicroRNAs (miRNAs) are small non-coding RNAs that act as regulators of gene expression in eukaryotes modulating a large diversity of biological processes. The discovery of miRNAs has provided new opportunities to understand the biology of a number of species. The cattle tick, Rhipicephalus (Boophilus) microplus, causes significant economic losses in cattle production worldwide and this drives us to further understand their biology so that effective control measures can be developed. To be able to provide new insights into the biology of cattle ticks and to expand the repertoire of tick miRNAs we utilized Illumina technology to sequence the small RNA transcriptomes derived from various life stages and selected organs of R. microplus.
To discover and profile cattle tick miRNAs we employed two complementary approaches, one aiming to find evolutionarily conserved miRNAs and another focused on the discovery of novel cattle tick-specific miRNAs. We found 51 evolutionarily conserved R. microplus miRNA loci, with 36 of these previously found in the tick Ixodes scapularis. The majority of the R. microplus miRNAs are perfectly conserved throughout evolution, with 11, 5 and 15 of these conserved since the Nephrozoan (640 MYA), Protostomian (620 MYA) and Arthropoda (540 MYA) ancestor, respectively. We then employed a de novo computational screening for novel tick miRNAs using the draft genome of I. scapularis and genomic contigs of R. microplus as templates. This identified 36 novel R. microplus miRNA loci, of which 12 were conserved in I. scapularis. Overall we found 87 R. microplus miRNA loci; of these, 15 showed the expression of both miRNA and miRNA* sequences. R. microplus miRNAs showed a variety of expression profiles, with the evolutionarily conserved miRNAs mainly expressed in all life stages at various levels, while the expression of novel tick-specific miRNAs was mostly limited to particular life stages and/or tick organs.
Anciently acquired miRNAs in the R. microplus lineage not only tend to accumulate fewer nucleotide substitutions than recently acquired miRNAs, but also show ubiquitous expression profiles throughout tick life stages and organs, in contrast with the restricted expression profiles of novel tick-specific miRNAs.
PMCID: PMC3141673  PMID: 21699734
25.  Binary classification of protein molecules into intrinsically disordered and ordered segments 
Although structural domains in proteins (SDs) are important, half of the regions in the human proteome are currently left with no SD assignments. These unassigned regions consist not only of novel SDs, but also of intrinsically disordered (ID) regions since proteins, especially those in eukaryotes, generally contain a significant fraction of ID regions. As ID regions can be inferred from amino acid sequences, a method that combines SD and ID region assignments can determine the fractions of SDs and ID regions in any proteome.
In contrast to other available ID prediction programs that merely identify likely ID regions, the DICHOT system we previously developed classifies the entire protein sequence into SDs and ID regions. Application of DICHOT to the human proteome revealed that residue-wise ID regions constitute 35%, SDs with similarity to PDB structures comprise 52%, while SDs with no similarity to PDB structures account for the remaining 13%. The last group consists of novel structural domains, termed cryptic domains, which serve as good targets of structural genomics. The DICHOT method applied to the proteomes of other model organisms indicated that eukaryotes generally have high ID contents, while prokaryotes do not. In human proteins, ID contents differ among subcellular localizations: nuclear proteins had the highest residue-wise ID fraction (47%), while mitochondrial proteins exhibited the lowest (13%). Phosphorylation and O-linked glycosylation sites were found to be located preferentially in ID regions. As O-linked glycans are attached to residues in the extracellular regions of proteins, the modification is likely to protect the ID regions from proteolytic cleavage in the extracellular environment. Alternative splicing events tend to occur more frequently in ID regions. We interpret this as evidence that natural selection is operating at the protein level in alternative splicing.
We classified entire regions of proteins into the two categories, SDs and ID regions and thereby obtained various kinds of complete genome-wide statistics. The results of the present study are important basic information for understanding protein structural architectures and have been made publicly available at
PMCID: PMC3199747  PMID: 21693062

