1.  Coupling Deep Transcriptome Analysis with Untargeted Metabolic Profiling in Ophiorrhiza pumila to Further the Understanding of the Biosynthesis of the Anti-Cancer Alkaloid Camptothecin and Anthraquinones 
Plant and Cell Physiology  2013;54(5):686-696.
The Rubiaceae species, Ophiorrhiza pumila, accumulates camptothecin, an anti-cancer alkaloid with a potent DNA topoisomerase I inhibitory activity, as well as anthraquinones that are derived from the combination of the isochorismate and hemiterpenoid pathways. The biosynthesis of these secondary products is active in O. pumila hairy roots yet very low in cell suspension culture. Deep transcriptome analysis was conducted in O. pumila hairy roots and cell suspension cultures using the Illumina platform, yielding a total of 2 Gb of sequence for each sample. We generated a hybrid transcriptome assembly of O. pumila using the Illumina-derived short read sequences and conventional Sanger-derived expressed sequence tag clones derived from a full-length cDNA library constructed using RNA from hairy roots. Among 35,608 non-redundant unigenes, 3,649 were preferentially expressed in hairy roots compared with cell suspension culture. Candidate genes involved in the biosynthetic pathway for the monoterpenoid indole alkaloid camptothecin were identified; specifically, genes involved in post-strictosamide biosynthetic events and genes involved in the biosynthesis of anthraquinones and chlorogenic acid. Untargeted metabolomic analysis by Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) indicated that most of the proposed intermediates in the camptothecin biosynthetic pathway accumulated in hairy roots in a preferential manner compared with cell suspension culture. In addition, a number of anthraquinones and chlorogenic acid preferentially accumulated in hairy roots compared with cell suspension culture. These results suggest that deep transcriptome and metabolome data sets can facilitate the identification of genes and intermediates involved in the biosynthesis of secondary products including camptothecin in O. pumila.
PMCID: PMC3653139  PMID: 23503598
Anthraquinone; Camptothecin; Hairy root; Metabolome; Ophiorrhiza pumila; Transcriptome
2.  Identification of an Imprinted Gene Cluster in the X-Inactivation Center 
PLoS ONE  2013;8(8):e71222.
Mammalian development is strongly influenced by the epigenetic phenomenon called genomic imprinting, in which either the paternal or the maternal allele of imprinted genes is expressed. Paternally expressed Xist, an imprinted gene, has been considered as a single cis-acting factor to inactivate the paternally inherited X chromosome (Xp) in preimplantation mouse embryos. This means that X-chromosome inactivation also entails gene imprinting at a very early developmental stage. However, the precise mechanism of imprinted X-chromosome inactivation remains unknown and there is little information about imprinted genes on X chromosomes. In this study, we examined whether there are imprinted genes other than Xist that are expressed from the inactive paternal X chromosome in female embryos at the preimplantation stage. We focused on small RNAs and compared their expression patterns between sexes by tagging the female X chromosome with green fluorescent protein. As a result, we identified two microRNAs (miRNAs), miR-374-5p and miR-421-3p, mapped adjacent to Xist that were predominantly expressed in female blastocysts. Allelic expression analysis revealed that these miRNAs were indeed imprinted and expressed from the Xp. Further analysis of the imprinting status of adjacent loci led to the discovery of a large cluster of imprinted genes expressed from the Xp: Jpx, Ftx and Zcchc13. To our knowledge, this is the first identified cluster of imprinted genes in the cis-acting regulatory region termed the X-inactivation center. This finding may help in understanding the molecular mechanisms regulating imprinted X-chromosome inactivation during early mammalian development.
PMCID: PMC3735490  PMID: 23940725
3.  ATF6α/β-mediated adjustment of ER chaperone levels is essential for development of the notochord in medaka fish 
Molecular Biology of the Cell  2013;24(9):1387-1395.
The endoplasmic reticulum (ER) membrane-bound transcription factors ATF6α and ATF6β mediate adjustment of chaperone levels to increased demands in the ER, which is essential for development of the notochord; the latter synthesizes and secretes large amounts of extracellular matrix proteins to serve as the body axis before formation of the vertebra.
ATF6α and ATF6β are membrane-bound transcription factors activated by regulated intramembrane proteolysis in response to endoplasmic reticulum (ER) stress to induce various ER quality control proteins. ATF6α- and ATF6β single-knockout mice develop normally, but ATF6α/β double knockout causes embryonic lethality, the reason for which is unknown. Here we show in medaka fish that ATF6α is primarily responsible for transcriptional induction of the major ER chaperone BiP and that ATF6α/β double knockout, but not ATF6α- or ATF6β single knockout, causes embryonic lethality, as in mice. Analyses of ER stress reporters reveal that ER stress occurs physiologically during medaka early embryonic development, particularly in the brain, otic vesicle, and notochord, resulting in ATF6α- and ATF6β-mediated induction of BiP, and that knockdown of the α1 chain of type VIII collagen reduces such ER stress. The absence of transcriptional induction of several ER chaperones in ATF6α/β double knockout causes more profound ER stress and impaired notochord development, which is partially rescued by overexpression of BiP. Thus ATF6α/β-mediated adjustment of chaperone levels to increased demands in the ER is essential for development of the notochord, which synthesizes and secretes large amounts of extracellular matrix proteins to serve as the body axis before formation of the vertebra.
PMCID: PMC3639050  PMID: 23447699
4.  Development and Characterization of cDNA Resources for the Common Marmoset: One of the Experimental Primate Models 
The common marmoset is a new world monkey, which has become a valuable experimental animal for biomedical research. This study developed cDNA libraries for the common marmoset from five different tissues. A total of 290 426 high-quality EST sequences were obtained, where 251 587 sequences (86.5%) had homology (1E−100) with the Refseqs of six different primate species, including human and marmoset. In parallel, 270 673 sequences (93.2%) were aligned to the human genome. When 247 090 sequences were assembled into 17 232 contigs, most of the sequences (218 857 or 15 089 contigs) were located in exonic regions, indicating that these genes are expressed in human and marmoset. The other 5578 sequences (or 808 contigs) mapping to the human genome were not located in exonic regions, suggesting that they are not expressed in human. Furthermore, a different set of 118 potential coding sequences were not similar to any Refseqs in any species, and, thus, may represent unknown genes. The cDNA libraries developed in this study are available through RIKEN Bio Resource Center. A Web server for the marmoset cDNAs is available at, where each marmoset EST sequence has been annotated by reference to the human genome. These new libraries will be a useful genetic resource to facilitate research in the common marmoset.
PMCID: PMC3686431  PMID: 23543116
common marmoset; cDNA; gene resource
5.  Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development 
Renfree, Marilyn B | Papenfuss, Anthony T | Deakin, Janine E | Lindsay, James | Heider, Thomas | Belov, Katherine | Rens, Willem | Waters, Paul D | Pharo, Elizabeth A | Shaw, Geoff | Wong, Emily SW | Lefèvre, Christophe M | Nicholas, Kevin R | Kuroki, Yoko | Wakefield, Matthew J | Zenger, Kyall R | Wang, Chenwei | Ferguson-Smith, Malcolm | Nicholas, Frank W | Hickford, Danielle | Yu, Hongshi | Short, Kirsty R | Siddle, Hannah V | Frankenberg, Stephen R | Chew, Keng Yih | Menzies, Brandon R | Stringer, Jessica M | Suzuki, Shunsuke | Hore, Timothy A | Delbridge, Margaret L | Mohammadi, Amir | Schneider, Nanette Y | Hu, Yanqiu | O'Hara, William | Al Nadaf, Shafagh | Wu, Chen | Feng, Zhi-Ping | Cocks, Benjamin G | Wang, Jianghui | Flicek, Paul | Searle, Stephen MJ | Fairley, Susan | Beal, Kathryn | Herrero, Javier | Carone, Dawn M | Suzuki, Yutaka | Sugano, Sumio | Toyoda, Atsushi | Sakaki, Yoshiyuki | Kondo, Shinji | Nishida, Yuichiro | Tatsumoto, Shoji | Mandiou, Ion | Hsu, Arthur | McColl, Kaighin A | Lansdell, Benjamin | Weinstock, George | Kuczek, Elizabeth | McGrath, Annette | Wilson, Peter | Men, Artem | Hazar-Rethinam, Mehlika | Hall, Allison | Davis, John | Wood, David | Williams, Sarah | Sundaravadanam, Yogi | Muzny, Donna M | Jhangiani, Shalini N | Lewis, Lora R | Morgan, Margaret B | Okwuonu, Geoffrey O | Ruiz, San Juana | Santibanez, Jireh | Nazareth, Lynne | Cree, Andrew | Fowler, Gerald | Kovar, Christie L | Dinh, Huyen H | Joshi, Vandita | Jing, Chyn | Lara, Fremiet | Thornton, Rebecca | Chen, Lei | Deng, Jixin | Liu, Yue | Shen, Joshua Y | Song, Xing-Zhi | Edson, Janette | Troon, Carmen | Thomas, Daniel | Stephens, Amber | Yapa, Lankesha | Levchenko, Tanya | Gibbs, Richard A | Cooper, Desmond W | Speed, Terence P | Fujiyama, Asao | M Graves, Jennifer A | O'Neill, Rachel J | Pask, Andrew J | Forrest, Susan M | Worley, Kim C
Genome Biology  2011;12(8):R81.
We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development.
The genome has been sequenced to 2 × coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements.
Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution.
PMCID: PMC3277949  PMID: 21854559
6.  Biochemical Characterization of a Novel Indole Prenyltransferase from Streptomyces sp. SN-593
Journal of Bacteriology  2010;192(11):2839-2851.
Genome sequencing of Streptomyces species has highlighted numerous potential genes of secondary metabolite biosynthesis. The mining of cryptic genes is important for exploring chemical diversity. Here we report the metabolite-guided genome mining and functional characterization of a cryptic gene by biochemical studies. Based on systematic purification of metabolites from Streptomyces sp. SN-593, we isolated a novel compound, 6-dimethylallylindole (DMAI)-3-carbaldehyde. Although many 6-DMAI compounds have been isolated from a variety of organisms, an enzyme catalyzing the transfer of a dimethylallyl group to the C-6 indole ring has not been reported so far. A homology search using known prenyltransferase sequences against the draft sequence of the Streptomyces sp. SN-593 genome revealed the iptA gene. The IptA protein showed 27% amino acid identity to cyanobacterial LtxC, which catalyzes the transfer of a geranyl group to (−)-indolactam V. A BLAST search against IptA revealed much-more-similar homologs at the amino acid level than LtxC, namely, SAML0654 (60%) from Streptomyces ambofaciens ATCC 23877 and SCO7467 (58%) from S. coelicolor A3(2). Phylogenetic analysis showed that IptA was distinct from bacterial aromatic prenyltransferases and fungal indole prenyltransferases. Detailed kinetic analyses of IptA showed the highest catalytic efficiency (6.13 min−1 μM−1) for l-Trp in the presence of dimethylallyl pyrophosphate (DMAPP), suggesting that the enzyme is a 6-dimethylallyl-l-Trp synthase (6-DMATS). Substrate specificity analyses of IptA revealed promiscuity for indole derivatives, and its reaction products were identified as novel 6-DMAI compounds. Moreover, ΔiptA mutants abolished the production of 6-DMAI-3-carbaldehyde as well as 6-dimethylallyl-l-Trp, suggesting that the iptA gene is involved in the production of 6-DMAI-3-carbaldehyde.
PMCID: PMC2876496  PMID: 20348259
7.  Small RNA class transition from siRNA/piRNA to miRNA during pre-implantation mouse development 
Nucleic Acids Research  2010;38(15):5141-5151.
Recent studies showed that small interfering RNAs (siRNAs) and Piwi-interacting RNA (piRNA) in mammalian germ cells play important roles in retrotransposon silencing and gametogenesis. However, subsequent contribution of those small RNAs to early mammalian development remains poorly understood. We investigated the expression profiles of small RNAs in mouse metaphase II oocytes, 8–16-cell stage embryos, blastocysts and the pluripotent inner cell mass (ICM) using high-throughput pyrosequencing. Here, we show that during pre-implantation development a major small RNA class changes from retrotransposon-derived small RNAs containing siRNAs and piRNAs to zygotically synthesized microRNAs (miRNAs). Some siRNAs and piRNAs are transiently upregulated and directed against specific retrotransposon classes. We also identified miRNAs expression profiles characteristic of the ICM and trophectoderm (TE) cells. Taken together, our current study reveals a major reprogramming of functional small RNAs during early mouse development from oocyte to blastocyst.
PMCID: PMC2926599  PMID: 20385573
8.  Ontogeny of Circadian Organization in the Rat 
Journal of biological rhythms  2009;24(1):55-63.
The mammalian circadian system is orchestrated by a master pacemaker in the brain but many peripheral tissues also contain independent or quasi-independent circadian oscillators. The adaptive significance of clocks in these structures must lie, in large part, in the phase relationships between the constituent oscillators and their micro- and macro-environments. To examine the relationship between postnatal development, which is dependent on endogenous programs and maternal/environmental influences, and the phase of circadian oscillators, we assessed the circadian phase of pineal, liver, lung, adrenal, and thyroid tissues cultured from Period 1-luciferase (Per1-luc) rat pups of various postnatal ages. The liver, thyroid, and pineal were rhythmic at birth, but the phases of their Per1-luc expression rhythms shifted remarkably during development. To determine if the timing of the phase shift in each tissue could be the result of changing environmental conditions, we monitored the behavior of pups and their mothers. We found that the circadian phase of the liver shifted from the day to night around postnatal day (P) 22 as the pups nursed less during the light and instead ate solid food during the dark. Furthermore, the phase of Per1-luc expression in liver cultures from nursing neonates could be shifted experimentally from the day to the night by allowing pups access to the dam only during the dark. Peak Per1-luc expression also shifted from mid-day to early night in thyroid cultures at about P20, concurrent with the shift in eating times. The phase of Per1-luc expression in the pineal gland shifted from day to night coincident with its sympathetic innervation at around P5. Per1-luc expression was rhythmic in adrenal cultures and peaked around the time of lights-off throughout development; however, the amplitude of the rhythm increased at P25. Lung cultures were completely arrhythmic until P12 when the pups began to leave the nest.
Taken together, our data suggest that the molecular machinery that generates circadian oscillations matures at different rates in different tissues and that the phase of at least some peripheral organs is malleable and may shift as the organ's function changes during development.
PMCID: PMC2665126  PMID: 19150929
development; circadian rhythms; suprachiasmatic nucleus; peripheral clocks; Period1; luciferase reporter; mammalian
9.  Ligand-specific sequential regulation of transcription factors for differentiation of MCF-7 cells 
BMC Genomics  2009;10:545.
Sharing a common ErbB/HER receptor signaling pathway, heregulin (HRG) induces differentiation of MCF-7 human breast cancer cells while epidermal growth factor (EGF) elicits proliferation. Although the cell fates resulting from the action of these ligands are completely different, the respective gene expression profiles in early transcription are qualitatively similar, suggesting that gene expression during late transcription, but not early transcription, may reflect ligand specificity. In this study, based on both the data from time-course quantitative real-time PCR on over 2,000 human transcription factors and microarray of all human genes, we identified a series of transcription factors which may control HRG-specific late transcription in MCF-7 cells.
We predicted that four transcription factors (EGR4, FRA-1, FHL2, and DIPA) are responsible for regulating MCF-7 cell differentiation. Validation analysis suggested that one member of the activator protein 1 (AP-1) family, FOSL-1 (FRA-1 gene), which appeared immediately following c-FOS expression, might be responsible for expression of transcription factor FHL2 through activation of the AP-1 complex. Furthermore, RNAi gene silencing of FOSL-1 and FHL2 resulted in an increase in extracellular signal-regulated kinase (ERK) phosphorylation, the duration of which was sustained by HRG stimulation.
Our analysis indicated that a time-dependent transcriptional regulatory network including c-FOS, FRA-1, and FHL2 is vital in controlling the ERK signaling pathway through a negative feedback loop for MCF-7 cell differentiation.
PMCID: PMC2785842  PMID: 19925682
10.  The identification and functional implications of human-specific "fixed" amino acid substitutions in the glutamate receptor family 
The glutamate receptors (GluRs) play a vital role in the mediation of excitatory synaptic transmission in the central nervous system. To clarify the evolutionary dynamics and mechanisms of the GluR genes in the lineage leading to humans, we determined the complete sequences of the coding regions and splice sites of 26 chimpanzee GluR genes.
We found that all of the reading frames and splice sites of these genes reported in humans were completely conserved in chimpanzees, suggesting that there were no gross structural changes in humans after their divergence from the human-chimpanzee common ancestor. We observed low KA/KS ratios in both humans and chimpanzees, and we found no evidence of accelerated evolution. We identified 30 human-specific "fixed" amino acid substitutions in the GluR genes by analyzing 80 human samples of seven different populations worldwide. Grantham's distance analysis showed that GRIN2C and GRIN3A are the most and the second most diverged GluR genes between humans and chimpanzees. However, most of the substitutions are non-radical and are not clustered in any particular region. Protein motif analysis assigned 11 out of these 30 substitutions to functional regions. Two out of these 11 substitutions, D71G in GRIN3A and R727H in GRIN3B, caused differences in the functional assignments of these genes between humans and other apes.
We conclude that the GluR genes did not undergo drastic changes such as accelerated evolution in the human lineage after the divergence of chimpanzees. However, there remains a possibility that two human-specific "fixed" amino acid substitutions, D71G in GRIN3A and R727H in GRIN3B, are related to human-specific brain function.
PMCID: PMC2753569  PMID: 19737383
11.  Assessment of adaptive evolution between wheat and rice as deduced from full-length common wheat cDNA sequence data and expression patterns 
BMC Genomics  2009;10:271.
Wheat is an allopolyploid plant that harbors a huge, complex genome. Therefore, accumulation of expressed sequence tags (ESTs) for wheat is becoming particularly important for functional genomics and molecular breeding. We prepared a comprehensive collection of ESTs from the various tissues that develop during the wheat life cycle and from tissues subjected to stress. We also examined their expression profiles in silico. As full-length cDNAs are indispensable to certify the collected ESTs and annotate the genes in the wheat genome, we performed a systematic survey and sequencing of the full-length cDNA clones. This sequence information is a valuable genetic resource for functional genomics and will enable carrying out comparative genomics in cereals.
As part of the functional genomics and development of genomic wheat resources, we have generated a collection of full-length cDNAs from common wheat. By grouping the ESTs of recombinant clones randomly selected from the full-length cDNA library, we were able to sequence 6,162 independent clones with high accuracy. About 10% of the clones were wheat-unique genes, without any counterparts within the DNA database. Wheat clones that showed high homology to those of rice were selected in order to investigate their expression patterns in various tissues throughout the wheat life cycle and in response to abiotic-stress treatments. To assess the variability of genes that have evolved differently in wheat and rice, we calculated the substitution rate (Ka/Ks) of the counterparts in wheat and rice. Genes that were preferentially expressed in certain tissues or treatments had higher Ka/Ks values than those in other tissues and treatments, which suggests that the genes with the higher variability expressed in these tissues are under adaptive selection.
We have generated a high-quality full-length cDNA resource for common wheat, which is essential for continuation of the ongoing curation and annotation of the wheat genome. The data for each clone's expression in various tissues and stress treatments and its variability in wheat and rice as a result of their diversification are valuable tools for functional genomics in wheat and for comparative genomics in cereals.
PMCID: PMC2703658  PMID: 19534823
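The Ka/Ks comparison described in the abstract above can be illustrated with a minimal sketch. The abstract does not state which estimator the authors used; the version below assumes a simple counting approach with a Jukes-Cantor correction, and the function names and input counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Correct an observed proportion of differing sites for multiple hits.

    Assumption: the Jukes-Cantor model, d = -3/4 * ln(1 - 4p/3).
    """
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return Ka/Ks from counts of differences and sites (hypothetical inputs).

    Ka: nonsynonymous substitutions per nonsynonymous site.
    Ks: synonymous substitutions per synonymous site.
    """
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka / ks


# Illustrative counts only, not taken from the study.
ratio = ka_ks(nonsyn_diffs=5, nonsyn_sites=300, syn_diffs=20, syn_sites=100)
print(f"Ka/Ks = {ratio:.3f}")
```

Under this sketch, a ratio well below 1 is usually read as purifying selection, and higher ratios as weaker constraint or possible adaptive change, which is the comparison the abstract draws between gene groups.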
12.  Analysis of Multiple Occurrences of Alternative Splicing Events in Arabidopsis thaliana Using Novel Sequenced Full-Length cDNAs 
Alternative splicing (AS) is a mechanism by which multiple types of mature mRNAs are generated from a single pre-mature mRNA. In this study, we completely sequenced 1800 full-length cDNAs from Arabidopsis thaliana, which had 5′ and/or 3′ sequences that were previously found to have AS events or alternative transcription start sites. Unexpectedly, these sequences gave us further evidence of AS, as 601 out of 1800 transcripts showed novel AS events. We focused on the combination patterns of multiple AS events within individual genes. Interestingly, some specific AS event combination patterns tended to appear more frequently than expected. The two most common patterns were: (i) alternative donor–0∼12 times of exon skips–alternative acceptor and (ii) several times (∼8) of retained introns. We also found that multiple AS events in a transcript tend to have the same effects concerning the length of the mature mRNA. Our current results are consistent with our previous observations, which showed changes in AS profiles under different conditions, and suggest the involvement of hypothetical cis- and trans-acting factors in the regulation of AS events.
PMCID: PMC2695776  PMID: 19423640
Arabidopsis; alternative splicing; bioinformatics; full-length cDNA
13.  Sequencing and Analysis of Approximately 40 000 Soybean cDNA Clones from a Full-Length-Enriched cDNA Library 
A large collection of full-length cDNAs is essential for the correct annotation of genomic sequences and for the functional analysis of genes and their products. We obtained a total of 39 936 soybean cDNA clones (GMFL01 and GMFL02 clone sets) in a full-length-enriched cDNA library which was constructed from soybean plants that were grown under various developmental and environmental conditions. Sequencing from 5′ and 3′ ends of the clones generated 68 661 expressed sequence tags (ESTs). The EST sequences were clustered into 22 674 scaffolds involving 2580 full-length sequences. In addition, we sequenced 4712 full-length cDNAs. After removing overlaps, we obtained 6570 new full-length sequences of soybean cDNAs so far. Our data indicated that 87.7% of the soybean cDNA clones contain complete coding sequences in addition to 5′- and 3′-untranslated regions. All of the obtained data confirmed that our collection of soybean full-length cDNAs covers a wide variety of genes. Comparative analysis between the derived sequences from soybean and Arabidopsis, rice or other legumes data revealed that some specific genes were involved in our collection and a large part of them could be annotated to unknown functions. A large set of soybean full-length cDNA clones reported in this study will serve as a useful resource for gene discovery from soybean and will also aid a precise annotation of the soybean genome.
PMCID: PMC2608845  PMID: 18927222
EST; full-length cDNA; functional annotation; legume; soybean
14.  Absolute quantification of the budding yeast transcriptome by means of competitive PCR between genomic and complementary DNAs 
BMC Genomics  2008;9:574.
An ideal format to describe a transcriptome would be its composition measured on the scale of absolute numbers of individual mRNAs per cell. It would help not only to precisely grasp the structure of the transcriptome but also to accelerate data exchange and integration.
We conceived an idea of competitive PCR between genomic DNA and cDNA. Since the former contains every gene exactly at the same copy number, it can serve as an ideal normalization standard for the latter to obtain stoichiometric composition data of the transcriptome. This data can then be easily converted to absolute quantification data provided with an appropriate calibration. To implement this idea, we improved adaptor-tagged competitive PCR, originally developed for relative quantification of the 3'-end restriction fragment of each cDNA, such that it can be applied to any restriction fragment. We demonstrated that this "generalized" adaptor-tagged competitive PCR (GATC-PCR) can be performed between genomic DNA and cDNA to accurately measure absolute expression level of each mRNA in the budding yeast Saccharomyces cerevisiae. Furthermore, we constructed a large-scale GATC-PCR system to measure absolute expression levels of 5,038 genes to show that the yeast contains more than 30,000 copies of mRNA molecules per cell.
We developed a GATC-PCR method to accurately measure absolute expression levels of mRNAs by means of competitive amplification of genomic and cDNA copies of each gene. A large-scale application of GATC-PCR to the budding yeast transcriptome revealed that it is twice or more as large as previously estimated. This method is flexibly applicable to both targeted and genome-wide analyses of absolute expression levels of mRNAs.
PMCID: PMC2612024  PMID: 19040753
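The normalization idea in the abstract above (genomic DNA as an equal-copy standard for cDNA, followed by calibration to absolute numbers) can be sketched as follows. This is only an illustration of the arithmetic: the signal values, gene names, function names, and the calibration factor are hypothetical and are not taken from the paper.

```python
def stoichiometric_ratios(cdna_signal, gdna_signal):
    """Divide each gene's cDNA-derived signal by its genomic-DNA-derived signal.

    Because genomic DNA carries every gene at the same copy number, the
    resulting ratios are comparable across genes.
    """
    ratios = {}
    for gene, cdna in cdna_signal.items():
        gdna = gdna_signal[gene]
        if gdna <= 0:
            raise ValueError(f"genomic signal for {gene} must be positive")
        ratios[gene] = cdna / gdna
    return ratios


def to_copies_per_cell(ratios, copies_per_unit_ratio):
    """Scale ratios to absolute mRNA copies per cell.

    copies_per_unit_ratio is an assumed calibration constant that would have
    to be determined experimentally.
    """
    return {gene: r * copies_per_unit_ratio for gene, r in ratios.items()}


# Hypothetical competitive PCR signals for two genes.
cdna = {"GENE_A": 300.0, "GENE_B": 50.0}
gdna = {"GENE_A": 100.0, "GENE_B": 100.0}

ratios = stoichiometric_ratios(cdna, gdna)
copies = to_copies_per_cell(ratios, copies_per_unit_ratio=2.0)
print(copies)
```

Keeping the ratio step separate from the calibration step mirrors the abstract's description: the composition data are obtained first and only then converted to absolute quantities.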
15.  Large-scale collection and annotation of full-length enriched cDNAs from a model halophyte, Thellungiella halophila 
BMC Plant Biology  2008;8:115.
Thellungiella halophila (also known as Thellungiella salsuginea) is a model halophyte with a small plant size, short life cycle, and small genome. It easily undergoes genetic transformation by the floral dipping method used with its close relative, Arabidopsis thaliana. Thellungiella genes exhibit high sequence identity (approximately 90% at the cDNA level) with Arabidopsis genes. Furthermore, Thellungiella not only shows tolerance to extreme salinity stress, but also to chilling, freezing, and ozone stress, supporting the use of Thellungiella as a good genomic resource in studies of abiotic stress tolerance.
We constructed a full-length enriched Thellungiella (Shan Dong ecotype) cDNA library from various tissues and whole plants subjected to environmental stresses, including high salinity, chilling, freezing, and abscisic acid treatment. We randomly selected about 20 000 clones and sequenced them from both ends to obtain a total of 35 171 sequences. CAP3 software was used to assemble the sequences and cluster them into 9569 nonredundant cDNA groups. We named these cDNAs "RTFL" (RIKEN Thellungiella Full-Length) cDNAs. Information on functional domains and Gene Ontology (GO) terms for the RTFL cDNAs were obtained using InterPro. The 8289 genes assigned to InterPro IDs were classified according to the GO terms using Plant GO Slim. Categorical comparison between the whole Arabidopsis genome and Thellungiella genes showing low identity to Arabidopsis genes revealed that the population of Thellungiella transport genes is approximately 1.5 times the size of the corresponding Arabidopsis genes. This suggests that these genes regulate a unique ion transportation system in Thellungiella.
As the number of Thellungiella halophila (Thellungiella salsuginea) expressed sequence tags (ESTs) was 9388 in July 2008, the number of ESTs has increased to approximately four times the original value as a result of this effort. Our sequences will thus contribute to correct future annotation of the Thellungiella genome sequence. The full-length enriched cDNA clones will enable the construction of overexpressing mutant plants by introduction of the cDNAs driven by a constitutive promoter, the complementation of Thellungiella mutants, and the determination of promoter regions in the Thellungiella genome.
PMCID: PMC2621223  PMID: 19014467
16.  Identification and classification of genes regulated by phosphatidylinositol 3-kinase- and TRKB-mediated signalling pathways during neuronal differentiation in two subtypes of the human neuroblastoma cell line SH-SY5Y 
BMC Research Notes  2008;1:95.
SH-SY5Y cells exhibit a neuronal phenotype when treated with all-trans retinoic acid (RA), but the molecular mechanism of activation in the signalling pathway mediated by phosphatidylinositol 3-kinase (PI3K) is unclear. To investigate this mechanism, we compared the gene expression profiles in SK-N-SH cells and two subtypes of SH-SY5Y cells (SH-SY5Y-A and SH-SY5Y-E), each of which shows a different phenotype during RA-mediated differentiation.
SH-SY5Y-A cells differentiated in the presence of RA, whereas RA-treated SH-SY5Y-E cells required additional treatment with brain-derived neurotrophic factor (BDNF) for full differentiation. After exposing cells to a PI3K inhibitor, LY294002, we identified 386 genes and categorised these genes into two clusters dependent on the PI3K signalling pathway during RA-mediated differentiation in SH-SY5Y-A cells. Transcriptional regulation of the gene cluster, including 158 neural genes, was greatly reduced in SK-N-SH cells and partially impaired in SH-SY5Y-E cells, which is consistent with a defect in the neuronal phenotype of these cells. Additional stimulation with BDNF induced a set of neural genes that were down-regulated in RA-treated SH-SY5Y-E cells but were abundant in differentiated SH-SY5Y-A cells.
We identified gene clusters controlled by PI3K- and TRKB-mediated signalling pathways during the differentiation of two subtypes of SH-SY5Y cells. The TRKB-mediated bypass pathway compensates for impaired neural function generated by defects in several signalling pathways, including PI3K in SH-SY5Y-E cells. Our expression profiling data will be useful for further elucidation of the signal transduction-transcriptional network involving PI3K or TRKB.
PMCID: PMC2615028  PMID: 18957096
17.  Characterization of expressed sequence tags from a full-length enriched cDNA library of Cryptomeria japonica male strobili 
BMC Genomics  2008;9:383.
Cryptomeria japonica D. Don is one of the most commercially important conifers in Japan. However, the allergic disease caused by its pollen is a severe public health problem in Japan. Since large-scale analysis of expressed sequence tags (ESTs) in the male strobili of C. japonica should help us to clarify the overall expression of genes during the process of pollen development, we constructed a full-length enriched cDNA library that was derived from male strobili at various developmental stages.
We obtained 36,011 expressed sequence tags (ESTs) from either one or both ends of 19,437 clones derived from the cDNA library of C. japonica male strobili at various developmental stages. The 19,437 cDNA clones corresponded to 10,463 transcripts. Approximately 80% of the transcripts resembled ESTs from Pinus and Picea, while approximately 75% had homologs in Arabidopsis. An analysis of homologies between ESTs from C. japonica male strobili and known pollen allergens in the Allergome Database revealed that products of 180 transcripts exhibited significant homology. Approximately 2% of the transcripts appeared to encode transcription factors. We identified twelve genes for MADS-box proteins among these transcription factors. The twelve MADS-box genes were classified as DEF/GLO/GGM13-, AG-, AGL6-, TM3- and TM8-like MIKCC genes and type I MADS-box genes.
Our full-length enriched cDNA library derived from C. japonica male strobili provides information on the expression of genes during the development of male reproductive organs. We identified potential allergens in C. japonica and provided new information about transcription factors, including MADS-box genes, expressed in male strobili of C. japonica. Large-scale gene discovery using full-length cDNAs is a valuable tool for studies of gymnosperm species.
PMCID: PMC2568000  PMID: 18691438
18.  Integrative Genome-Wide Expression Analysis Bears Evidence of Estrogen Receptor-Independent Transcription in Heregulin-Stimulated MCF-7 Cells 
PLoS ONE  2008;3(3):e1803.
Heregulin β-1 (HRG) is an extracellular ligand that activates mitogen-activated protein kinase (MAPK) and phosphatidylinositol-3-OH kinase (PI3K)/Akt signaling pathways through ErbB receptors. MAPK and Akt have been shown to phosphorylate the estrogen receptor (ER) at Ser-118 and Ser-167, respectively, thereby mimicking the effects of estrogenic activity such as estrogen responsive element (ERE)-dependent transcription. In the current study, integrative analysis was performed using two tiling array platforms, comprising histone H3 lysine 9 (H3K9) acetylation and RNA mapping, together with array comparative genomic hybridization (CGH) analysis in an effort to identify HRG-regulated genes in ER-positive MCF-7 breast cancer cells. Through application of various threshold settings, 333 (326 up-regulated and 7 down-regulated) HRG-regulated genes were detected. Prediction of upstream transcription factors (TFs) and pathway analysis indicated that 21% of HRG-induced gene regulation may be controlled by the MAPK cascade, while only 0.6% of the gene expression is controlled by ERE. A comparison with previously reported estrogen (E2)-regulated gene expression data revealed that only 12 common genes were identified between the 333 HRG-regulated (3.6%) and 239 E2-regulated (5.0%) gene groups. However, with respect to enriched upstream TFs, 4 common TFs were identified in the 14 HRG-regulated (28.6%) and 13 E2-regulated (30.8%) gene groups. These results indicated that while E2 and HRG may induce common TFs, the regulatory mechanisms that govern HRG- and E2-induced gene expression differ.
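The overlap percentages quoted in this abstract follow directly from the reported counts. The short Python sketch below reproduces that arithmetic; the function name is hypothetical and is introduced only for illustration.

```python
def overlap_percentages(n_common, n_a, n_b):
    """Express the shared items as a percentage of each group (one decimal)."""
    return round(100 * n_common / n_a, 1), round(100 * n_common / n_b, 1)

# Counts taken from the abstract: 12 common genes among 333 HRG-regulated
# and 239 E2-regulated genes; 4 common TFs among 14 and 13 enriched TFs.
genes = overlap_percentages(12, 333, 239)
tfs = overlap_percentages(4, 14, 13)

print(genes)  # (3.6, 5.0)
print(tfs)    # (28.6, 30.8)
```

The contrast between a small gene-level overlap and a larger TF-level overlap is what the abstract's final sentence refers to.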
PMCID: PMC2266794  PMID: 18350142
19.  Sequencing analysis of 20,000 full-length cDNA clones from cassava reveals lineage specific expansions in gene families related to stress response 
BMC Plant Biology  2007;7:66.
Cassava, an allotetraploid known for its remarkable tolerance to abiotic stresses, is an important source of energy for humans and animals and a raw material for many industrial processes. A full-length cDNA library of cassava plants under normal, heat, drought, aluminum and post-harvest physiological deterioration conditions was built; 19968 clones were sequence-characterized using expressed sequence tags (ESTs).
The ESTs were assembled into 6355 contigs and 9026 singletons that were further grouped into 10577 scaffolds; we found 4621 new cassava sequences and 1521 sequences with no significant similarity to plant protein databases. Transcripts of 7796 distinct genes were captured and we were able to assign a functional classification to 78% of them while finding more than half of the enzymes annotated in metabolic pathways in Arabidopsis. The annotation of sequences that were not paired to transcripts of other species included many stress-related functional categories showing that our library is enriched with stress-induced genes. Finally, we detected 230 putative gene duplications that include key enzymes in reactive oxygen species signaling pathways and could play a role in cassava stress response features.
The cassava full-length cDNA library here presented contains transcripts of genes involved in stress response as well as genes important for different areas of cassava research. This library will be an important resource for gene discovery, characterization and cloning; in the near future it will aid the annotation of the cassava genome.
PMCID: PMC2245942  PMID: 18096061
20.  Functional annotation of 19,841 Populus nigra full-length enriched cDNA clones 
BMC Genomics  2007;8:448.
Populus is one of the favorable model plants because of its small genome. Structural genomics of Populus has reached a turning point now that the nucleotide sequence of the entire genome has been determined. With the arrival of the post-genome era, functional genomics of Populus is becoming more important for a comprehensive understanding of plant science. Development of bioresources serving functional genomics is making rapid progress. Large efforts have resulted in deposits of expressed sequence tags (ESTs) for various plant species, consequently accelerating functional analysis of genes. ESTs from full-length cDNA clones are especially powerful for accurate molecular annotation. We promoted collection and annotation of the ESTs from Populus full-length enriched cDNA clones as part of functional genomics of tree species.
We have been collecting the full-length enriched cDNA of the female poplar (Populus nigra var. italica) for years. By sequencing P. nigra full-length (PnFL) cDNA libraries, we generated about 116,000 5'-end or 3'-end ESTs corresponding to 19,841 nonredundant PnFL clones. This population of PnFL cDNA clones represents 44% of the predicted genes in the Populus genome.
Our resource of P. nigra full-length enriched clones is expected to provide valuable tools to gain further insight into genome annotation and functional genomics in Populus.
PMCID: PMC2222646  PMID: 18053163
21.  Comparative Metagenomics Revealed Commonly Enriched Gene Sets in Human Gut Microbiomes 
Numerous microbes inhabit the human intestine, many of which are uncharacterized or uncultivable. They form a complex microbial community that deeply affects human physiology. To identify the genomic features common to all human gut microbiomes as well as those variable among them, we performed a large-scale comparative metagenomic analysis of fecal samples from 13 healthy individuals of various ages, including unweaned infants. We found that, while the gut microbiota from unweaned infants were simple and showed a high inter-individual variation in taxonomic and gene composition, those from adults and weaned children were more complex but showed a high functional uniformity regardless of age or sex. In searching for the genes over-represented in gut microbiomes, we identified 237 gene families commonly enriched in adult-type and 136 families in infant-type microbiomes, with a small overlap. An analysis of their predicted functions revealed various strategies employed by each type of microbiota to adapt to its intestinal environment, suggesting that these gene sets encode the core functions of adult and infant-type gut microbiota. By analysing the orphan genes, 647 new gene families were identified to be exclusively present in human intestinal microbiomes. In addition, we discovered a conjugative transposon family explosively amplified in human gut microbiomes, which strongly suggests that the intestine is a ‘hot spot’ for horizontal gene transfer between microbes.
PMCID: PMC2533590  PMID: 17916580
metagenomics; human gut microbiota; gene family; conjugative transposon
22.  An integrative in silico approach for discovering candidates for drug-targetable protein-protein interactions in interactome data 
BMC Pharmacology  2007;7:10.
Protein-protein interactions (PPIs) are challenging but attractive targets for small chemical drugs. Whole sets of PPIs, called the 'interactome', have emerged for several organisms, including human, based on the recent development of high-throughput screening (HTS) technologies. Individual PPIs have been targeted by small drug-like chemicals (SDCs); however, interactome data have not been fully utilized for exploring drug targets because a comprehensive methodology for utilizing these data is lacking. Here we propose an integrative in silico approach for discovering candidates for drug-targetable PPIs in interactome data.
Our novel in silico screening system comprises three independent assessment procedures: i) detection of protein domains responsible for PPIs, ii) finding SDC-binding pockets on protein surfaces, and iii) evaluating similarities in the assignment of Gene Ontology (GO) terms between specific partner proteins. We discovered six candidates for drug-targetable PPIs by applying our in silico approach to original human PPI data composed of 770 binary interactions produced by our HTS yeast two-hybrid (HTS-Y2H) assays. Among them, we further examined two candidates, RXRA/NRIP1 and CDK2/CDKN1A, with respect to their biological roles, PPI network around each candidate, and tertiary structures of the interacting domains.
An integrative in silico approach for discovering candidates for drug-targetable PPIs was applied to original human PPI data. The system excludes false positive interactions and selects reliable PPIs as drug targets. Its effectiveness was demonstrated by the discovery of the six promising candidate target PPIs. Inhibition or stabilization of the two interactions may have potential therapeutic effects against human diseases.
PMCID: PMC2045083  PMID: 17705877
23.  Generation of medaka gene knockout models by target-selected mutagenesis 
Genome Biology  2006;7(12):R116.
A reverse genetics approach for the routine generation of medaka (Oryzias latipes) gene knockouts is described and applied to create a cryopreserved resource containing knockouts for most medaka genes.
We have established a reverse genetics approach for the routine generation of medaka (Oryzias latipes) gene knockouts. A cryopreserved library of N-ethyl-N-nitrosourea (ENU) mutagenized fish was screened by high-throughput resequencing for induced point mutations. Nonsense and splice site mutations were retrieved for the Blm, Sirt1, Parkin and p53 genes and functional characterization of p53 mutants indicated a complete knockout of p53 function. The current cryopreserved resource is expected to contain knockouts for most medaka genes.
PMCID: PMC1794429  PMID: 17156454
24.  Assessment of clusters of transcription factor binding sites in relationship to human promoter, CpG islands and gene expression 
BMC Genomics  2004;5:16.
Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To identify functional regulatory elements, computer searching can predict TF binding sites (TFBS) using position weight matrices (PWMs) that represent positional base frequencies of collected experimentally determined TFBS. A disadvantage of this approach is the large output of results for genomic DNA. One strategy to identify genuine TFBS is to utilize local concentrations of predicted TFBS. It is unclear whether there is a general tendency for TFBS to cluster at promoter regions, although this is the case for certain TFBS. Also unclear is the identification of TFs that have TFBS concentrated in promoters and to what level this occurs. This study hopes to answer some of these questions.
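As a minimal illustration of how a PWM can be used to predict binding sites, the Python sketch below converts base counts into log-odds scores and slides the matrix along a sequence. The count matrix, pseudocount, uniform background, threshold and test sequence are all hypothetical values chosen for illustration; they are not taken from the study or from any real PWM collection.

```python
import math

# Hypothetical base counts for a 4-position motif (illustrative only).
COUNTS = [
    {"A": 0, "C": 1, "G": 1, "T": 8},
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 1, "T": 8},
    {"A": 8, "C": 1, "G": 1, "T": 0},
]
BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 1.0   # assumed smoothing so zero counts stay finite

def log_odds_matrix(counts):
    """Turn per-position base counts into log2(frequency / background)."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * PSEUDOCOUNT
        matrix.append({
            base: math.log2(((column[base] + PSEUDOCOUNT) / total) / BACKGROUND)
            for base in "ACGT"
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = sum(matrix[j][sequence[start + j]] for j in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits

pwm = log_odds_matrix(COUNTS)
hits = scan("GGTATAGG", pwm, threshold=3.0)
print(hits)
```

On genomic-scale input a low threshold yields many matches, which is the "large output" problem the abstract mentions and the motivation for looking at local clusters of predicted sites.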
We developed the cluster score measure to evaluate the correlation between predicted TFBS clusters and promoter sequences for each PWM. Non-promoter sequences were used as a control. Using the cluster score, we identified a PWM group called PWM-PCP, in which TFBS clusters positively correlate with promoters, and another PWM group called PWM-NCP, in which TFBS clusters negatively correlate with promoters. The PWM-PCP group comprises 47% of the 199 vertebrate PWMs, while the PWM-NCP group comprises 11%. After reducing the effect of CpG islands (CGI) on the clusters using partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster), we identified two PWM groups: those strongly correlated with CGI and those not correlated with CGI.
Not all PWMs predict TFBS correlated with human promoter sequences. Two main PWM groups were identified: (1) those that show TFBS clustered in promoters associated with CGI, and (2) those that show TFBS clustered in promoters independent of CGI. Assessment of PWM matches will allow more positive interpretation of TFBS in regulatory regions.
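The partial correlation step mentioned in this abstract can be sketched with the standard first-order formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The Python example below is only an illustration under stated assumptions: the three vectors are synthetic toy data standing in for TFBS cluster, promoter and CGI signals, and the authors' actual cluster score is not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Synthetic toy data: both signals are driven by the same third variable.
cgi = [1, 1, 1, 1, -1, -1, -1, -1]        # stand-in for CGI
tfbs_cluster = [2, 2, 0, 0, 0, 0, -2, -2]  # cgi plus independent variation
promoter = [2, 0, 2, 0, 0, -2, 0, -2]      # cgi plus independent variation

raw = pearson(tfbs_cluster, promoter)
adjusted = partial_corr(tfbs_cluster, promoter, cgi)
print(raw, adjusted)
```

In this toy case the raw correlation is 0.5, but it vanishes (up to floating-point error) once the shared dependence on the third variable is removed. A PWM whose correlation with promoters survives this adjustment would fall into the CGI-independent group.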
PMCID: PMC375527  PMID: 15053842
promoter; tissue-specific gene expression; position weight matrix; regulatory motif
25.  Whole-genome screening indicates a possible burst of formation of processed pseudogenes and Alu repeats by particular L1 subfamilies in ancestral primates 
Genome Biology  2003;4(11):R74.
The first comprehensive analysis of human processed pseudogenes (PPs) using all known human genes as queries is presented. The data suggest a nearly simultaneous burst of PP and Alu formation in the genomes of ancestral primates.
Abundant pseudogenes are a feature of mammalian genomes. Processed pseudogenes (PPs) are reverse transcribed from mRNAs. Recent molecular biological studies show that mammalian long interspersed element 1 (L1)-encoded proteins may have been involved in PP reverse transcription. Here, we present the first comprehensive analysis of human PPs using all known human genes as queries.
The human genome was queried and 3,664 candidate PPs were identified. The most abundant were copies of genes encoding keratin 18, glyceraldehyde-3-phosphate dehydrogenase and ribosomal protein L21. A simple method was developed to estimate the level of nucleotide substitutions (and therefore the age) of PPs. A Poisson-like age distribution was obtained with a mean age close to that of the Alu repeats, the predominant human short interspersed elements. These data suggest a nearly simultaneous burst of PP and Alu formation in the genomes of ancestral primates. The peak period of amplification of these two distinct retrotransposons was estimated to be 40-50 million years ago. Concordant amplification of certain L1 subfamilies with PPs and Alus was observed.
We suggest that a burst of formation of PPs and Alus occurred in the genome of ancestral primates. One possible mechanism is that proteins encoded by members of particular L1 subfamilies acquired an enhanced ability to recognize cytosolic RNAs in trans.
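The abstract does not describe the authors' method for converting nucleotide substitution levels into ages, so the following Python sketch is only one common way to make that link, not their procedure. It assumes the Jukes-Cantor correction for multiple substitutions and a constant substitution rate supplied by the caller; the aligned sequences and the function names are hypothetical.

```python
import math

def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor estimate of substitutions per site from observed difference p."""
    return -0.75 * math.log(1 - 4 * p / 3)

def age_in_years(substitutions_per_site, rate_per_site_per_year):
    """Age under an assumed constant rate; the rate must come from the caller."""
    return substitutions_per_site / rate_per_site_per_year

# Hypothetical aligned fragments of a pseudogene and its parent gene.
p = p_distance("ACGTACGTAC", "ACGTACGTAA")
k = jukes_cantor(p)
print(p, round(k, 4))
```

Applying such an estimate to every candidate PP gives an age distribution of the kind the abstract compares with that of Alu repeats; the absolute ages depend entirely on the rate that is assumed.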
PMCID: PMC329124  PMID: 14611660

