Results 1-7 (7)

1.  Identification of cucurbitacins and assembly of a draft genome for Aquilaria agallocha 
BMC Genomics  2014;15(1):578.
Agarwood is derived from Aquilaria trees, the trade of which has come under strict control with a listing in Appendix II of the Convention on International Trade in Endangered Species of Wild Fauna and Flora. Many secondary metabolites of agarwood are known to have medicinal value to humans, including compounds that have been shown to elicit sedative effects and exhibit anti-cancer properties. However, little is known about the genome, transcriptome, and the biosynthetic pathways responsible for producing such secondary metabolites in agarwood.
In this study, we present a draft genome and a putative pathway for cucurbitacin E and I, compounds with known medicinal value, from in vitro Aquilaria agallocha agarwood. DNA and RNA data are used to annotate many genes and protein functions in the draft genome. The expression changes for cucurbitacin E and I are shown to be consistent with known responses of A. agallocha to biotic stress, and a set of homologous genes in Arabidopsis thaliana related to cucurbitacin biosynthesis is presented and validated through qRT-PCR.
This study is the first attempt to identify cucurbitacin E and I from in vitro agarwood and the first draft genome for any species of Aquilaria. The results of this study will aid in future investigations of secondary metabolite pathways in Aquilaria and other non-model medicinal plants.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-578) contains supplementary material, which is available to authorized users.
PMCID: PMC4108785  PMID: 25005802
Agarwood; Cucurbitacin; Aquilaria; Genome
2.  Analysis of New Functional Profiles of Protein Isoforms Yielded by Ds Exonization in Rice 
Insertion of transposable elements (TEs) into introns can lead to their activation as alternatively spliced cassette exons, an event called exonization. Exonization can enrich the complexity of transcriptomes and proteomes. Previously, we performed a genome-wide computational analysis of Ds exonization events in the monocot Oryza sativa (rice). The insertion patterns of Ds increased the number of transcripts and the resulting protein isoforms, which were classified as interior and C-terminal variants. In this study, these variants were scanned against the PROSITE database to identify new functional profiles (domains) relative to their reference proteins. The new profiles were expected to confer a selective advantage on the variants, and more than 70% of the variants did gain new profiles. The new functional profiles could be contributed by an exon–intron junction, an intron alone, an intron–TE junction, or a TE alone. A Ds-inserted intron may yield 167 new profiles on average, while some cases can yield thousands, most of which come from C-terminal variants. Additionally, more than 90% of the TE-inserted genes were found to gain novel functional profiles in each intron via exonization. Therefore, new functional profiles yielded by exonization may occur in many local regions of the reference protein.
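The comparison described in this abstract, finding profiles present in a variant but absent from its reference protein, can be sketched as a set difference. This is a minimal illustration under stated assumptions: it assumes the PROSITE scan has already been run and reduced to a list of profile accessions per protein, it counts each accession once (the study's own counting of new profiles may differ), and the function names, accessions and variant names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def gained_profiles(reference_hits: Iterable[str], variant_hits: Iterable[str]) -> Set[str]:
    """Profile accessions found in a variant but absent from its reference protein."""
    return set(variant_hits) - set(reference_hits)


def fraction_with_gain(reference_hits: Iterable[str], variants: Dict[str, List[str]]) -> float:
    """Share of variants that gain at least one profile relative to the reference."""
    if not variants:
        return 0.0
    reference = set(reference_hits)
    gained = sum(1 for hits in variants.values() if set(hits) - reference)
    return gained / len(variants)


if __name__ == "__main__":
    # Hypothetical scan results: accessions are illustrative, not real rice data.
    reference = ["PS50011", "PS00107"]
    variants = {
        "interior_variant": ["PS50011", "PS00107"],               # no new profile
        "c_terminal_variant": ["PS50011", "PS00108", "PS50294"],  # two new profiles
    }
    print(sorted(gained_profiles(reference, variants["c_terminal_variant"])))  # ['PS00108', 'PS50294']
    print(fraction_with_gain(reference, variants))  # 0.5
```

Treating profiles as sets keeps the sketch independent of where a profile falls (exon–intron junction, intron, intron–TE junction or TE); attributing each gain to one of those regions would additionally need the hit coordinates.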
PMCID: PMC3795530  PMID: 24137048
Ac/Ds transposon; exonization; PROSITE; protein isoforms
3.  Discovery of Genes Related to Insecticide Resistance in Bactrocera dorsalis by Functional Genomic Analysis of a De Novo Assembled Transcriptome 
PLoS ONE  2012;7(8):e40950.
Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantification of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species were largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST- and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes.
Identified sequence motifs were also analyzed to characterize putative polypeptide translational products and associate them with specific genes and protein functions.
PMCID: PMC3413685  PMID: 22879883
4.  DBD2BS: connecting a DNA-binding protein with its binding sites 
Nucleic Acids Research  2012;40(Web Server issue):W173-W179.
By binding to short and highly conserved DNA sequences in genomes, DNA-binding proteins initiate, enhance or repress biological processes. Accurately identifying such binding sites, often represented by position weight matrices (PWMs), is an important step in understanding the control mechanisms of cells. Given the coordinates of a DNA-binding domain (DBD) bound to DNA, a potential function can be used to estimate the change in binding affinity after base substitutions, and these changes can be summarized as a PWM. This technique provides an effective alternative when chromatin immunoprecipitation data are unavailable for PWM inference. To facilitate predicting PWMs from protein–DNA complexes, or even from structures of the unbound state, this study presents the web server DBD2BS. DBD2BS uses an atom-level knowledge-based potential function to predict PWMs characterizing the sequences to which the query DBD structure can bind. For unbound queries, a list of 1066 DBD–DNA complexes (including 1813 protein chains) is compiled for use as templates for synthesizing bound structures. DBD2BS provides an easy-to-use interface for visualizing the PWMs predicted from different templates and the spatial relationships among the query protein, the DBDs and the DNAs. DBD2BS is the first attempt to predict PWMs of DBDs from unbound structures rather than from bound ones. This approach increases the number of existing protein structures that can be exploited when analyzing protein–DNA interactions. In a recent study, the authors showed that the kernel adopted by DBD2BS can generate PWMs consistent with those obtained from experimental data. PWMs predicted by DBD2BS can be combined with sequence-based methods to discover binding sites in genome-wide studies.
Available at:,, and
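One common way to summarize per-position binding-energy changes as a PWM is to normalize each column with Boltzmann-style weights. The sketch below assumes that conversion; the abstract does not specify the exact procedure DBD2BS uses, and the energy values, the `scale` parameter and the function names are hypothetical.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def energies_to_pwm(delta_e: List[Dict[str, float]], scale: float = 1.0) -> List[Dict[str, float]]:
    """Turn per-position binding-energy changes into a position weight matrix.

    delta_e[i][b] is the (assumed) change in binding energy when position i of
    the site carries base b; lower values mean more favourable binding.
    Each column is normalised with Boltzmann-style weights exp(-dE / scale),
    an assumed conversion rather than DBD2BS's documented one.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    pwm = []
    for column in delta_e:
        weights = {b: math.exp(-column[b] / scale) for b in BASES}
        total = sum(weights.values())
        pwm.append({b: w / total for b, w in weights.items()})
    return pwm


def consensus(pwm: List[Dict[str, float]]) -> str:
    """Most probable base at each position."""
    return "".join(max(column, key=column.get) for column in pwm)


if __name__ == "__main__":
    # Hypothetical energy changes for a 3-bp site; the native base has dE = 0.
    delta_e = [
        {"A": 0.0, "C": 1.5, "G": 1.2, "T": 2.0},
        {"A": 1.0, "C": 0.0, "G": 0.8, "T": 1.1},
        {"A": 2.2, "C": 1.9, "G": 0.0, "T": 0.3},
    ]
    pwm = energies_to_pwm(delta_e)
    print(consensus(pwm))  # ACG
    for i, column in enumerate(pwm):
        print(i, {b: round(p, 3) for b, p in column.items()})
```

A smaller `scale` sharpens each column toward its lowest-energy base, while a larger one flattens it toward uniform, so in this sketch the choice of `scale` controls how specific the predicted motif looks.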
PMCID: PMC3394304  PMID: 22693214
5.  Predicting Target DNA Sequences of DNA-Binding Proteins Based on Unbound Structures 
PLoS ONE  2012;7(2):e30446.
DNA-binding proteins such as transcription factors use DNA-binding domains (DBDs) to bind to specific sequences in the genome to initiate many important biological functions. Accurate prediction of such target sequences, often represented by position weight matrices (PWMs), is an important step to understand many biological processes. Recent studies have shown that knowledge-based potential functions can be applied to protein-DNA co-crystallized structures to generate PWMs that are largely consistent with experimental data. However, this success has not been extended to DNA-binding proteins lacking co-crystallized structures. This study investigates whether the DNA sequences bound by DNA-binding proteins can be predicted from the proteins' unbound structures (structures of the unbound state). Given an unbound query protein and a template complex, the proposed method first employs structure alignment to generate synthetic protein-DNA complexes for the query protein. Once a complex is available, an atomic-level knowledge-based potential function is employed to predict PWMs characterizing the sequences to which the query protein can bind. The proposed method is evaluated on seven DNA-binding proteins, each of which has structures of both DNA-bound and unbound forms for prediction as well as annotated PWMs for validation. Since this work is the first attempt to predict target sequences of DNA-binding proteins from their unbound structures, three types of structural variations that presumably influence the prediction accuracy were examined and discussed. Based on the analyses conducted in this study, the conformational change of proteins upon binding DNA was shown to be the key factor.
This study sheds light on the challenge of predicting the target DNA sequences of a protein lacking co-crystallized structures and encourages further work on structure alignment-based approaches, in addition to docking- and homology modeling-based approaches, for generating synthetic complexes.
PMCID: PMC3270014  PMID: 22312425
6.  seeMotif: exploring and visualizing sequence motifs in 3D structures 
Nucleic Acids Research  2009;37(Web Server issue):W552-W558.
Sequence motifs are important in the study of molecular biology. Motif discovery tools efficiently deliver many function-related signatures of proteins and greatly facilitate sequence annotation. As increasing numbers of motifs are detected experimentally or predicted computationally, characterizing the functional roles of motifs and identifying the potential synergistic relationships between them are important next steps. A good way to investigate novel motifs is to use the abundant 3D structures that have also accumulated at an astounding rate in recent years. This article reports the development of the web service seeMotif, which provides users with an interactive interface for visualizing sequence motifs on protein structures from the Protein Data Bank (PDB). Researchers can quickly see the locations and conformation of multiple motifs among a number of related structures simultaneously. Because PDB sequences are usually shorter than those in sequence databases and/or may have missing residues, seeMotif has two complementary approaches for selecting structures and mapping motifs to protein chains in structures. As more structures belonging to previously uncharacterized protein families become available, combining sequence and structure information offers good opportunities to understand protein functions in large-scale genome projects. Available at:, or
PMCID: PMC2703912  PMID: 19477961
7.  E1DS: catalytic site prediction based on 1D signatures of concurrent conservation 
Nucleic Acids Research  2008;36(Web Server issue):W291-W296.
Large-scale automatic annotation of protein sequences remains challenging in the postgenomic era. E1DS is designed to annotate enzyme sequences based on a repository of 1D signatures. The sequence signatures are derived using a novel pattern mining approach that discovers long motifs consisting of several sequential blocks (conserved segments). Each sequential block is well conserved among the protein members of an EC group. Moreover, a signature includes at least three sequential blocks that are concurrently conserved, i.e. frequently observed together in sequences. In other words, a sequence signature consists of residues from multiple regions of the protein sequence, which echoes the observation that an enzyme catalytic site is usually made up of residues that are widely separated in the sequence. E1DS currently contains 5421 sequence signatures that together cover 932 four-digit EC numbers. E1DS is evaluated on a collection of enzymes with catalytic sites annotated in the Catalytic Site Atlas. Compared with the well-known pattern database PROSITE, predictions based on E1DS signatures are considered more sensitive in identifying catalytic sites and the involved residues. E1DS is available at and a mirror site can be found at
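The idea of a signature built from at least three concurrently conserved blocks can be illustrated by checking whether the blocks occur in order along a sequence, with gaps of any length between them. This is a simplified sketch, not E1DS's own matching or scoring procedure: it assumes each block is a fixed-length regular expression (PROSITE-like), requires exact pattern matches, and uses a hypothetical function name, sequence and signature.

```python
import re
from typing import List, Optional


def match_signature(sequence: str, blocks: List[str], min_blocks: int = 3) -> Optional[List[int]]:
    """Return the start index of each block if all blocks occur in order, else None.

    Each block is assumed to be a fixed-length regular expression for a
    conserved segment; gaps of any length are allowed between consecutive
    blocks. Taking the leftmost match of each fixed-length block is enough to
    find an in-order placement whenever one exists.
    """
    if len(blocks) < min_blocks:
        raise ValueError(f"a signature needs at least {min_blocks} blocks")
    starts: List[int] = []
    pos = 0
    for block in blocks:
        match = re.compile(block).search(sequence, pos)
        if match is None:
            return None
        starts.append(match.start())
        pos = match.end()  # the next block must start after this one ends
    return starts


if __name__ == "__main__":
    # Hypothetical sequence and signature, for illustration only.
    seq = "MKTHGDLLSSGESAAWKRPYCE"
    signature = ["H.D", "G[ES]S", "C[DE]"]
    print(match_signature(seq, signature))                  # [3, 10, 20]
    print(match_signature(seq, list(reversed(signature))))  # None
```

The returned start positions show how residues from widely separated regions of the sequence jointly satisfy one signature, which is the property the abstract links to catalytic sites.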
PMCID: PMC2447799  PMID: 18524800
