1.  A General Pairwise Interaction Model Provides an Accurate Description of In Vivo Transcription Factor Binding Sites 
PLoS ONE  2014;9(6):e99015.
The identification of transcription factor binding sites (TFBSs) on genomic DNA is crucial for understanding and predicting regulatory elements in gene networks. TFBS motifs are commonly described by Position Weight Matrices (PWMs), in which each DNA base pair contributes independently to transcription factor (TF) binding. However, this description ignores correlations between nucleotides at different positions and is generally inaccurate: analysing fly and mouse in vivo ChIP-seq data, we show that in most cases the PWM model fails to reproduce the observed statistics of TFBSs. To overcome this issue, we introduce the pairwise interaction model (PIM), a generalization of the PWM model. The model is based on the principle of maximum entropy and explicitly describes pairwise correlations between nucleotides at different positions, while being otherwise as unconstrained as possible. It is mathematically equivalent to a TF-DNA binding energy that depends additively on the nucleotide identity at each position in the TFBS, as in the PWM model, but also additively on pairs of nucleotides. We find that the PIM significantly improves on the PWM model and even provides an optimal description of TFBS statistics within statistical noise. The PIM generalizes previous approaches to interdependent positions: it accounts for co-variation of two or more base pairs and predicts secondary motifs, while outperforming multiple-motif models consisting of mixtures of PWMs. We analyse the structure of pairwise interactions between nucleotides and find that they are sparse and located predominantly between consecutive base pairs in the flanking regions of TFBSs. Nonetheless, interactions between non-consecutive nucleotides play a significant role in achieving this accurate description of TFBS statistics.
The PIM is computationally tractable, and provides a general framework that should be useful for describing and predicting TFBSs beyond PWMs.
PMCID: PMC4057186  PMID: 24926895
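The energy decomposition described in this abstract can be illustrated with a minimal Python sketch. It assumes a binding energy E(s) = Σᵢ hᵢ(sᵢ) + Σᵢ<ⱼ Jᵢⱼ(sᵢ, sⱼ) and a Boltzmann-like probability P(s) ∝ exp(−E(s)); the names `h`, `J`, `pim_energy` and the toy parameter values are hypothetical, and the maximum-entropy fitting of the parameters to ChIP-seq sites is not shown.

```python
import math
from itertools import combinations, product

BASES = "ACGT"


def pwm_energy(site, h):
    """PWM-style energy: each position contributes independently via h[i][base]."""
    return sum(h[i][base] for i, base in enumerate(site))


def pim_energy(site, h, J):
    """PIM-style energy: the PWM terms plus one term per pair of positions.

    J maps a position pair (i, j) with i < j to {(base_i, base_j): value};
    absent pairs contribute zero, mimicking sparse interactions.
    """
    energy = pwm_energy(site, h)
    for i, j in combinations(range(len(site)), 2):
        energy += J.get((i, j), {}).get((site[i], site[j]), 0.0)
    return energy


def site_probabilities(h, J, length):
    """P(s) proportional to exp(-E(s)), normalised by brute force (short sites only)."""
    weights = {}
    for bases in product(BASES, repeat=length):
        site = "".join(bases)
        weights[site] = math.exp(-pim_energy(site, h, J))
    z = sum(weights.values())
    return {site: w / z for site, w in weights.items()}


# Toy 3-bp model: 'A' favoured at position 0, plus an A-C coupling between positions 0 and 1.
h = [{b: 0.0 for b in BASES} for _ in range(3)]
h[0]["A"] = -1.0
J = {(0, 1): {("A", "C"): -0.5}}

print(pwm_energy("ACG", h))  # -1.0: the PWM ignores the coupling
print(pim_energy("ACG", h, J))  # -1.5: the pairwise term adds -0.5
```

Setting `J = {}` makes `pim_energy` identical to `pwm_energy`, which is the sense in which the PIM generalizes the PWM; brute-force normalisation over all 4^L sites is only practical for very short toy motifs.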
2.  Gene Expression Profiling Reveals Large Regulatory Switches between Succeeding Stipe Stages in Volvariella volvacea 
PLoS ONE  2014;9(5):e97789.
The edible mushroom Volvariella volvacea is an important crop in Southeast Asia and is predominantly harvested in the egg stage. One of the main factors that negatively affect its yield and value is the rapid transition from the egg to the elongation stage, which has lower commodity value and shorter shelf life. To improve our understanding of the changes during stipe development, and the transition from egg to elongation stage in particular, we analyzed gene transcription in stipe tissue of V. volvacea using 3′-tag-based digital expression profiling. Stipe development turned out to be fairly complex, with high numbers of expressed genes, and stage differences are regulated mainly through changes in gene expression levels rather than on/off switching. The most pronounced changes are the strong up-regulation of cell division from button to egg and its very strong down-regulation from egg to elongation, which continues into the maturation stage. Button and egg share cell division as a means of growth; this is followed by a major developmental shift towards rapid stipe elongation based on cell extension, as demonstrated by the inactivation of cell division throughout elongation and maturation. Examination of regulatory genes up-regulated from egg to elongation identified three potential high-level upstream regulators of this switch. These new insights into stipe dynamics, together with a series of new target genes, will provide a sound basis for further studies on the developmental mechanisms of mushroom stipes, and on the switch from egg to elongation in V. volvacea in particular.
PMCID: PMC4035324  PMID: 24867220
3.  Characterization of Withania somnifera Leaf Transcriptome and Expression Analysis of Pathogenesis – Related Genes during Salicylic Acid Signaling 
PLoS ONE  2014;9(4):e94803.
Withania somnifera (L.) Dunal is a valued medicinal plant with pharmaceutical applications. The present study was undertaken to analyze the salicylic acid (SA)-induced leaf transcriptome of W. somnifera. A total of 45.6 million reads were generated, and de novo assembly yielded 73,523 transcript contigs with an average length of 1,620 bp. A total of 71,062 transcripts were annotated, and 53,424 of them were assigned GO terms. Mapping of transcript contigs to biological pathways revealed the presence of 182 pathways. Seventeen genes representing 12 pathogenesis-related (PR) families were mined from the transcriptome data, and their expression patterns 17 and 36 hours after salicylic acid treatment were documented. All PR gene families except WsPR10 were significantly up-regulated by 36 hours post-treatment. The relative fold expression of transcripts ranged from 1-fold to 6,532-fold. The two peroxidase families, the lignin-forming anionic peroxidase (WsL-PRX) and the suberization-associated anionic peroxidase (WsS-PRX), showed the highest expression, at 377-fold and 6,532-fold respectively, while WsPR10 was down-regulated 14-fold. Additionally, the most stable reference gene for normalization of qRT-PCR data was identified. The effect of SA on the accumulation of major secondary metabolites of W. somnifera, including withanoside V, withaferin A and withanolide A, was also analyzed, and the content of all three metabolites increased. This is the first report on expression patterns of PR genes during salicylic acid signaling in W. somnifera.
PMCID: PMC3989240  PMID: 24739900
4.  Identification of Thyroid Carcinoma Related Genes with mRMR and Shortest Path Approaches 
PLoS ONE  2014;9(4):e94022.
Thyroid cancer is a malignant neoplasm originating from thyroid cells. Its subtypes include papillary carcinomas (PTCs) and anaplastic carcinomas (ATCs). Although ATCs are very aggressive and cause more deaths than PTCs, the differences between the two are poorly understood at the molecular level. In this study, we focus on transcriptome differences among PTCs, ATCs and normal tissue in a published dataset comprising 45 normal tissues, 49 PTCs and 11 ATCs. Applying a machine learning method, maximum relevance minimum redundancy (mRMR), we identified 9 genes (BCL2, MRPS31, ID4, RASAL2, DLG2, MYO1B, ZBTB5, PRKCQ and PPP6C) and 1 miscRNA (miscellaneous RNA, LOC646736) as important candidates involved in the progression of thyroid cancer. We further identified the protein-protein interaction (PPI) subnetwork formed by the shortest paths among the 9 genes in a PPI network constructed from the STRING database. Our results may provide insights into the molecular mechanisms of thyroid cancer progression.
PMCID: PMC3981740  PMID: 24718460
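As an illustration of the mRMR criterion named above, here is a small greedy sketch in Python. It assumes expression values have already been discretised (for example into low/high) and uses the difference form of mRMR (relevance minus mean redundancy, both measured by mutual information); the study's exact variant, discretisation and parameters are not stated here, and the gene names and data below are toy values.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items()
    )


def mrmr(features, labels, k):
    """Greedy mRMR (difference form): maximise relevance minus mean redundancy.

    features: {gene_name: list of discretised expression values}
    labels:   class label per sample (e.g. 'normal', 'PTC', 'ATC')
    """
    relevance = {g: mutual_information(v, labels) for g, v in features.items()}
    selected = []

    def score(g):
        if not selected:
            return relevance[g]
        redundancy = sum(
            mutual_information(features[g], features[s]) for s in selected
        ) / len(selected)
        return relevance[g] - redundancy

    while len(selected) < min(k, len(features)):
        # Sorting makes tie-breaking deterministic (alphabetical by gene name).
        candidates = sorted(g for g in features if g not in selected)
        selected.append(max(candidates, key=score))
    return selected


labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "g1": [0, 0, 0, 1, 1, 1, 1, 1],  # strongly informative
    "g2": [0, 0, 0, 1, 1, 1, 1, 1],  # exact copy of g1: redundant
    "g3": [0, 0, 0, 0, 0, 0, 1, 1],  # weaker, but partly complementary
}
print(mrmr(features, labels, 2))  # ['g1', 'g3']
```

The copy `g2` is as relevant as `g1` but contributes no new information once `g1` is chosen, so the weaker but complementary `g3` is preferred; this redundancy penalty is what distinguishes mRMR from ranking genes by relevance alone.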
5.  Invasion Biology Meets Parasitology: A Case Study of Parasite Spill-Back with Egyptian Fasciola gigantica in the Invasive Snail Pseudosuccinea columella 
PLoS ONE  2014;9(2):e88537.
The liver fluke Fasciola gigantica is a trematode parasite of ruminants and humans that occurs naturally in Africa and Asia. Cases of human fascioliasis, attributable at least in part to F. gigantica, have increased significantly over the last decades. The introduced snail species Galba truncatula has already been identified as an important intermediate host for this parasite, and the efficient invader Pseudosuccinea columella is another suspect. We therefore investigated snails collected in irrigation canals in Fayoum governorate, Egypt, for the prevalence of trematodes, with a focus on P. columella and its role in the transmission of F. gigantica. Species were identified morphologically and by partial sequencing of the cytochrome oxidase subunit I (COI) gene. Among all 689 snails found at the 21 sampling sites, P. columella was the most abundant species, with 296 individuals (42.96%), and it was also the dominant species at 10 sites; it was absent from 8 sites. Molecular detection by PCR and sequencing of the ITS1-5.8S-ITS2 region of the ribosomal DNA (rDNA) revealed that P. columella was infected with F. gigantica (3.38%), Echinostoma caproni (2.36%) and another echinostome (7.09%) that could not be identified further from its sequence. Trematode infection did not depend on snail size. The high abundance of P. columella in the Fayoum irrigation system, together with its common infection with F. gigantica, might represent a case of parasite spill-back (increased prevalence in local final hosts due to a highly susceptible introduced intermediate host species) from the introduced P. columella to the human population, explaining at least partly the observed increase in reported fascioliasis cases in Egypt. Eichhornia crassipes, the invasive water hyacinth that covers huge areas of the irrigation canals, offers safe refuges for the amphibious P. columella during molluscicide application. As a consequence, this snail dominates snail communities and efficiently transmits F. gigantica.
PMCID: PMC3921205  PMID: 24523913
6.  De Novo Sequencing, Assembly, and Analysis of the Root Transcriptome of Persea americana (Mill.) in Response to Phytophthora cinnamomi and Flooding 
PLoS ONE  2014;9(2):e86399.
Avocado is a diploid angiosperm with 24 chromosomes and a genome estimated at around 920 Mb. It is an important fruit crop worldwide but is susceptible to root rot caused by the ubiquitous oomycete Phytophthora cinnamomi. Phytophthora root rot (PRR) damages the feeder roots of trees, causing necrosis. This leads to branch dieback and eventual tree death, resulting in severe production losses. Control strategies are limited, and at present an integrated approach involving phosphite, tolerant rootstocks and proper nursery management has shown the best results. Disease progression of PRR is accelerated under high soil moisture or flooding. In addition, avocado is highly susceptible to flooding, with even short periods causing significant losses. Despite the commercial importance of avocado, limited genomic resources are available. Next-generation sequencing has made it possible to generate sequence data at relatively low cost, making it an attractive option for non-model organisms such as avocado. The aims of this study were to generate sequence data for the avocado root transcriptome and to identify stress-related genes. Tissue was isolated from avocado infected with P. cinnamomi, avocado exposed to flooding, and avocado exposed to a combination of these two stresses. Three separate sequencing runs were performed on the Roche 454 platform and produced approximately 124 Mb of data. This was assembled into 7,685 contigs, with 106,448 sequences remaining as singletons. Genes involved in defence pathways, such as the salicylic acid and jasmonic acid pathways, as well as genes associated with the response to low oxygen caused by flooding, were identified. This is the most comprehensive study to date of transcripts derived from avocado root tissue and will provide a useful resource for future studies.
PMCID: PMC3919710  PMID: 24563685
7.  miR-342 Regulates BRCA1 Expression through Modulation of ID4 in Breast Cancer 
PLoS ONE  2014;9(1):e87039.
miRNA profiling of a group of familial and sporadic breast cancers showed that miR-342 was significantly associated with estrogen receptor (ER) levels. To investigate the functional role of miR-342 in the pathogenesis of breast cancer, we focused on its in silico predicted target gene ID4, a transcription factor of the helix-loop-helix protein family whose expression is inversely correlated with that of ER. ID4 is expressed in breast cancer and can negatively regulate BRCA1 expression. Our results showed an inverse correlation between ID4 and miR-342 expression, as well as between ID4 and BRCA1 expression. We functionally validated the interaction between ID4 and miR-342 in a luciferase reporter system. Based on these findings, we hypothesized that miR-342-mediated regulation of ID4 could be involved in the pathogenesis of breast cancer by downregulating BRCA1 expression. We functionally demonstrated the interactions between miR-342, ID4 and BRCA1 in the ER-negative MDA-MB-231 breast cancer cell line, which expresses high levels of ID4. Overexpression of miR-342 in these cells reduced ID4 and increased BRCA1 expression, supporting a possible role of this mechanism in breast cancer. In the ER-positive MCF7 and the BRCA1-mutant HCC1937 cell lines, miR-342 overexpression only reduced ID4. In the patient cohort we studied, a correlation between miR-342 and BRCA1 expression was found in the ER-negative cases. As ER-negative cases were mainly BRCA1-mutant, we speculate that the mechanism we demonstrated could be involved in the decreased BRCA1 expression frequently observed in non-BRCA1-mutant breast cancers and could be a causal factor in some of the familial cases grouped in the heterogeneous class of non-BRCA1/BRCA2-mutant cases (BRCAx). To validate this hypothesis, the study should be extended to a larger cohort of ER-negative cases, including those in the BRCAx class.
PMCID: PMC3903605  PMID: 24475217
8.  Is Gene Transcription Involved in Seed Dry After-Ripening? 
PLoS ONE  2014;9(1):e86442.
Orthodox seeds are living organisms that survive anhydrobiosis and may display dormancy, an inability to germinate at harvest. Seeds can acquire germination potential during a prolonged period of dry storage called after-ripening. The aim of this work was to determine whether gene transcription is an underlying regulatory mechanism for dormancy alleviation during after-ripening. To identify changes in gene transcription strictly associated with the acquisition of germination potential and not with storage itself, we used seed storage at low relative humidity, which maintains dormancy, as a control. Transcriptome profiling with DNA microarrays was used to compare changes in transcript abundance between dormant (D), after-ripened non-dormant (ND) and after-ripened dormant seeds (control, C). Quantitative real-time polymerase chain reaction (qPCR) was used to confirm gene expression. Comparison between D and ND showed differential expression of 115 probesets at a two-fold change cut-off (p<0.05). Comparing both D and C with ND showed that only 13 of these 115 transcripts could be specific to dormancy alleviation. qPCR confirmed the expression pattern of these transcripts but showed no significant variation between conditions. Here we show that sunflower seed dormancy alleviation in the dry state is not related to regulated changes in gene expression.
PMCID: PMC3896479  PMID: 24466101
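The two-fold cut-off and the D/C/ND comparison described above can be sketched as a simple filter followed by a set intersection. This is a minimal illustration, assuming per-probeset means on a linear scale and precomputed p-values; the input format and probeset IDs are hypothetical, the statistical test that produces the p-values is not shown, and intersecting the two comparisons is one plausible reading of how dormancy-specific candidates were selected.

```python
import math


def differentially_expressed(probesets, fold_cutoff=2.0, alpha=0.05):
    """IDs whose fold change is at least fold_cutoff in either direction and p < alpha.

    probesets: iterable of (probeset_id, mean_condition_1, mean_condition_2, p_value),
    with means on a linear (non-log) scale and greater than zero.
    """
    threshold = math.log2(fold_cutoff)
    return [
        pid
        for pid, mean_1, mean_2, p in probesets
        if abs(math.log2(mean_2 / mean_1)) >= threshold and p < alpha
    ]


d_vs_nd = [
    ("ps1", 100.0, 250.0, 0.01),   # 2.5-fold up, significant -> kept
    ("ps2", 100.0, 150.0, 0.001),  # 1.5-fold, below cut-off -> dropped
    ("ps3", 200.0, 80.0, 0.03),    # 2.5-fold down, significant -> kept
    ("ps4", 100.0, 400.0, 0.2),    # 4-fold but not significant -> dropped
]
c_vs_nd = [
    ("ps1", 90.0, 250.0, 0.02),
    ("ps3", 200.0, 190.0, 0.50),
]

hits_d = differentially_expressed(d_vs_nd)
hits_c = differentially_expressed(c_vs_nd)
# Candidates linked to dormancy alleviation rather than storage: changed in both comparisons.
print(sorted(set(hits_d) & set(hits_c)))  # ['ps1']
```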
9.  Deep insights into Dictyocaulus viviparus transcriptomes provides unique prospects for new drug targets and disease intervention 
Biotechnology advances  2010;29(3):10.1016/j.biotechadv.2010.11.005.
The lungworm Dictyocaulus viviparus causes parasitic bronchitis in cattle and is responsible for substantial economic losses in temperate regions of the world. Here, we undertake the first large-scale exploration of available transcriptomic data for this lungworm, examine differences in transcription between developmental stages and between sexes, and identify and prioritize essential molecules linked to fundamental metabolic pathways, which could represent novel drug targets. Approximately 3 million expressed sequence tags (ESTs), generated by 454 sequencing from third-stage larvae (L3) as well as adult females and males of D. viviparus, were assembled and annotated. The assembly yielded ~61,000 contigs, of which relatively large proportions encoded collagens (4.3%), ubiquitins (2.1%) and serine/threonine protein kinases (1.9%). Subtractive analysis in silico identified 6,928 nucleotide sequences as uniquely transcribed in L3, and 5,203 and 7,889 transcripts as exclusive to the adult female and male, respectively. The most common peptides predicted from the conceptual translations were nucleoplasmins (L3), serine/threonine protein kinases (female) and major sperm proteins (male). Additional analyses allowed the prediction of three drug target candidates whose Caenorhabditis elegans homologues are linked to a lethal RNA interference phenotype. This detailed exploration, combined with future transcriptomic sequencing of all developmental stages of D. viviparus, will facilitate genomic sequencing and future investigations of the molecular biology of this parasitic nematode. These advances will underpin the discovery of new drug and/or vaccine targets, focused on biotechnological outcomes.
PMCID: PMC3827682  PMID: 21182926
Dictyocaulus viviparus; Bovine lungworm; Next-generation sequencing; Bioinformatics; Transcriptome; Ancylostoma-secreted proteins; Drug target prediction
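The in silico subtractive analysis mentioned in this abstract amounts, at its simplest, to set differences between stage-specific transcript collections. The sketch below assumes contigs have already been clustered so that a transcript shared between stages carries the same ID; in practice such comparisons rely on sequence similarity searches, which are not shown, and the stage names and IDs are hypothetical.

```python
def stage_exclusive(transcripts_by_stage):
    """Map each stage to the transcripts seen in that stage and in no other stage."""
    exclusive = {}
    for stage, ids in transcripts_by_stage.items():
        # Union of every other stage's transcripts (empty set if there is only one stage).
        others = set().union(
            *(v for other, v in transcripts_by_stage.items() if other != stage)
        )
        exclusive[stage] = set(ids) - others
    return exclusive


# Hypothetical contig IDs, assumed already clustered so that shared transcripts share an ID.
stages = {
    "L3": {"c1", "c2", "c3"},
    "female": {"c2", "c4"},
    "male": {"c3", "c5", "c6"},
}
for stage, ids in stage_exclusive(stages).items():
    print(stage, sorted(ids))
# L3 ['c1']
# female ['c4']
# male ['c5', 'c6']
```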
10.  InCoB2013 introduces Systems Biology as a major conference theme 
BMC Systems Biology  2013;7(Suppl 3):S1.
The Asia-Pacific Bioinformatics Network (APBioNet) held the first International Conference on Bioinformatics (InCoB) in Bangkok in 2002 to promote North-South networking. Beginning as a forum for Asia-Pacific researchers to interact with and learn from scientists of developed countries, InCoB has become a major regional bioinformatics conference, with participants from the region as well as North America and Europe. Since 2006, InCoB has selected the best submissions for publication in BMC Bioinformatics. In response to the growth and maturation of data-driven approaches, InCoB added BMC Genomics to its journal choices for submitting authors in 2009 and, with this conference supplement, BMC Systems Biology. Co-hosting InCoB2013 with the second International Conference for Translational Bioinformatics (ICTBI) reflects InCoB's support for the current trend of taking bioinformatics to the bedside, along with a systems approach to solving biological problems.
PMCID: PMC3816296  PMID: 24555777
11.  APBioNet—Transforming Bioinformatics in the Asia-Pacific Region 
PLoS Computational Biology  2013;9(10):e1003317.
PMCID: PMC3814852  PMID: 24204244
12.  Transcriptome Analysis of Androgenic Gland for Discovery of Novel Genes from the Oriental River Prawn, Macrobrachium nipponense, Using Illumina Hiseq 2000 
PLoS ONE  2013;8(10):e76840.
The oriental river prawn, Macrobrachium nipponense, is an important aquaculture species in China and throughout Asia. The androgenic gland produces hormones that play crucial roles in male sexual differentiation. This study is the first de novo M. nipponense transcriptome analysis using cDNA prepared from mRNA isolated from the androgenic gland. Sequencing was performed on the Illumina/Solexa platform.
Methodology and Principal Findings
The total amount of RNA was more than 5 µg. We generated 70,853,361 high-quality reads after removing adapter sequences and filtering out low-quality reads. A total of 78,408 isosequences were obtained by clustering and assembly of the clean reads, producing 57,619 non-redundant transcripts with an average length of 1,244.19 bp. In total, 70,702 isosequences were matched to the Nr database; additional GO (33,203), KEGG (17,868) and COG (13,817) analyses were performed to identify potential genes and their functions. A total of 47 sex-determination-related gene families were identified from the M. nipponense androgenic gland transcriptome, based on functional annotation of the non-redundant transcripts and comparisons with the published literature. Furthermore, 40 candidate novel genes were found that may contribute to sex determination, based on their extremely high expression levels in the androgenic gland compared with other sex glands. Further, 437 SSRs and 65,535 high-confidence SNPs were identified in this EST dataset, from which 14 EST-SSR markers have been isolated.
Our study provides new sequence information for M. nipponense, which will serve as a basis for further genetic studies on decapod crustaceans. More importantly, this study substantially improves understanding of sex-determination mechanisms and advances sex-determination research in crustaceans more broadly. The large number of potential SSR and SNP markers isolated from the transcriptome may shed light on research in many fields, including the evolution and molecular ecology of Macrobrachium species.
PMCID: PMC3810145  PMID: 24204682
13.  Simple re-instantiation of small databases using cloud computing 
BMC Genomics  2013;14(Suppl 5):S13.
Small bioinformatics databases, unlike institutionally funded large databases, are vulnerable to discontinuation, and many that have been reported in publications are no longer accessible. This leads to irreproducible scientific work and redundant effort, impeding the pace of scientific progress.
We describe a Web-accessible system, available online at, for archiving small databases and re-instantiating them on demand within minutes. Depositors can rebuild their databases by downloading a Linux live operating system (, preinstalled with bioinformatics and UNIX tools. The database and its dependencies can be compressed into an ".lzm" file for deposition. End-users can search for archived databases and activate them on dynamically re-instantiated BioSlax instances, run as virtual machines on either of two popular full-virtualization cloud-computing platforms, Xen Hypervisor or vSphere. The system can scale to increasing demand for disk storage or computational load, and it allows database developers to use the re-instantiated databases for integration and development of new databases.
Herein, we demonstrate that a relatively inexpensive solution can be implemented for archival of bioinformatics databases and their rapid re-instantiation should the live databases disappear.
PMCID: PMC3852246  PMID: 24564380
Database archival; Re-instantiation; Cloud computing; BioSLAX; biodb100; MIABi
14.  Transcriptome Analysis of Litopenaeus vannamei in Response to White Spot Syndrome Virus Infection 
PLoS ONE  2013;8(8):e73218.
Pacific white shrimp (Litopenaeus vannamei) is the most extensively farmed crustacean species in the world. White spot syndrome virus (WSSV) is one of the major pathogens of cultured shrimp. However, the molecular mechanisms of the host-virus interaction remain largely unknown. In this study, the impact of WSSV infection on host gene expression in the hepatopancreas of L. vannamei was investigated by 454 pyrosequencing-based RNA-Seq of cDNA libraries from WSSV-challenged shrimp and normal controls. By comparing the two cDNA libraries, we show that 767 host genes are significantly up-regulated and 729 genes are significantly down-regulated by WSSV infection. KEGG analysis indicated that the up- and down-regulated genes are distributed quite differently across pathways. Several of the differentially expressed genes are involved in animal defense against pathogens, including apoptosis, mitogen-activated protein kinase (MAPK) signaling, toll-like receptor (TLR) signaling, Wnt signaling, and antigen processing and presentation pathways. The present study provides valuable information on differential expression of L. vannamei genes following WSSV infection and improves our current understanding of this host-virus interaction. In addition, the large number of transcripts obtained in this study provides a strong basis for future genomic research on shrimp.
PMCID: PMC3753264  PMID: 23991181
15.  Nitric Oxide Mediates Root K+/Na+ Balance in a Mangrove Plant, Kandelia obovata, by Enhancing the Expression of AKT1-Type K+ Channel and Na+/H+ Antiporter under High Salinity 
PLoS ONE  2013;8(8):e71543.
It is well known that nitric oxide (NO) enhances salt tolerance in glycophytes. However, the effect of NO on ionic balance in halophytes is not well understood. This study focuses on the role of NO in mediating K+/Na+ balance in a mangrove species, Kandelia obovata Sheue, Liu and Yong. We first analyzed the effects of sodium nitroprusside (SNP), an NO donor, on ion content and ion flux in the roots of K. obovata under high salinity. Treatment with 100 μM SNP significantly increased K+ content and Na+ efflux, but decreased Na+ content and K+ efflux. These effects were reversed by a specific NO synthesis inhibitor and an NO scavenger, confirming the role of NO in retaining K+ and reducing Na+ in K. obovata roots. Using western blot analysis, we found that NO increased the protein expression of plasma membrane (PM) H+-ATPase and vacuolar Na+/H+ antiporter, which are crucial proteins for ionic balance. To further clarify the molecular mechanism of NO-modulated K+/Na+ balance, partial cDNA fragments of an inward-rectifying K+ channel, PM Na+/H+ antiporter, PM H+-ATPase, vacuolar Na+/H+ antiporter and vacuolar H+-ATPase subunit c were isolated. Quantitative real-time PCR showed that NO increased the relative expression levels of these genes, and this increase was blocked by NO synthesis inhibitors and scavenger. These results indicate that NO contributes greatly to K+/Na+ balance in high salinity-treated K. obovata roots by activating the AKT1-type K+ channel and Na+/H+ antiporter, which are critical components of the K+/Na+ transport system.
PMCID: PMC3747234  PMID: 23977070
16.  First transcriptomic analysis of the economically important parasitic nematode, Trichostrongylus colubriformis, using a next-generation sequencing approach 
Trichostrongylus colubriformis (Strongylida), a small intestinal nematode of small ruminants, is a major cause of production and economic losses in many countries. The aims of the present study were to define the transcriptome of the adult stage of T. colubriformis using 454 sequencing technology and bioinformatic analyses, and to predict the main pathways to which key groups of molecules are linked in this nematode. A total of 21,259 contigs were assembled from the sequence data produced from a normalized cDNA library; 7,876 of these contigs had known orthologues in the free-living nematode Caenorhabditis elegans and encoded, amongst others, proteins with ‘transthyretin-like’ (8.8%), ‘RNA recognition’ (8.4%) and ‘metridin-like ShK toxin’ (7.6%) motifs. Bioinformatic analyses inferred that relatively high proportions of the C. elegans homologues are involved in biological pathways linked to ‘peptidases’ (4%), ‘ribosome’ (3.6%) and ‘oxidative phosphorylation’ (3%). Highly represented were peptides predicted to be associated with the nervous system, digestion of host proteins or inhibition of host proteases. Probabilistic functional gene networking of the complement of C. elegans orthologues (n = 2,126) assigned significance to particular subsets of molecules, such as protein kinases and serine/threonine phosphatases. The present study provides the first comprehensive insight into the transcriptome of adult T. colubriformis, which lays a foundation for fundamental studies of the molecular biology and biochemistry of this parasitic nematode and offers prospects for identifying targets for novel nematocides. Future investigations should compare the transcriptomes of different developmental stages, both sexes and various tissues of this parasitic nematode to predict essential genes/gene products that are specific to nematodes.
PMCID: PMC3666958  PMID: 20692378
Trichostrongylus colubriformis; Transcriptome; Next-generation sequencing; Bioinformatics; Peptidases; Ancylostoma-secreted proteins
17.  CLCAs - A Family of Metalloproteases of Intriguing Phylogenetic Distribution and with Cases of Substituted Catalytic Sites 
PLoS ONE  2013;8(5):e62272.
The zinc-dependent metalloproteases with the His-Glu-x-x-His (HExxH) active-site motif, zincins, are a broad group of proteins involved in many metabolic and regulatory functions and found in all forms of life. The human genome contains more than 100 genes encoding proteins with known zincin-like domains. A survey of all proteins containing the HExxH motif shows that approximately 52% of HExxH occurrences fall within known protein structural domains (as defined in the Pfam database). Domain families in which the majority of members possess a conserved HExxH motif include, not surprisingly, many known and putative metalloproteases. Furthermore, several HExxH-containing protein domains identified in this way can be confidently predicted to be putative peptidases of the zincin fold; we thus predict a zincin-like fold for eight uncharacterised Pfam families. Besides domains with a strictly conserved HExxH motif and those with sporadic occurrences, we identify intermediate families that contain some members with a conserved HExxH motif but also many homologues with substitutions at the conserved positions. Such substitutions can be evolutionarily conserved and non-random, yet the functional roles of these inactive zincins are not known. The CLCAs are a novel zincin-like protease family with many cases of substituted active sites. We show that this supposedly metazoan family has a number of bacterial and archaeal members. An extremely patchy phylogenetic distribution of CLCAs in prokaryotes and their conserved protein domain composition strongly suggest an evolutionary scenario of horizontal gene transfer (HGT) from multicellular eukaryotes to bacteria, providing an example of eukaryote-derived xenologues in bacterial genomes. Additionally, in a protein family identified here as closely homologous to CLCA, the CLCA_X (CLCA-like) family, a number of proteins are found in phages and plasmids, supporting the HGT scenario.
PMCID: PMC3650047  PMID: 23671590
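The motif survey described above starts from locating HExxH occurrences and checking whether they fall inside annotated domains. A minimal Python sketch follows, assuming domain boundaries are supplied as 0-based, end-exclusive (start, end) pairs; this is a hypothetical input format, not the Pfam file format, and the sequence is a toy example.

```python
import re

# 'x' in HExxH stands for any residue; the zero-width lookahead lets overlapping
# occurrences (e.g. in HEHEHEH) all be reported.
HEXXH = re.compile(r"(?=HE..H)")


def find_hexxh(sequence):
    """0-based start positions of every HExxH occurrence in a protein sequence."""
    return [m.start() for m in HEXXH.finditer(sequence.upper())]


def fraction_in_domains(positions, domains, motif_len=5):
    """Fraction of motif occurrences lying entirely inside any (start, end) domain.

    Domain coordinates are 0-based and end-exclusive (hypothetical input format).
    """
    if not positions:
        return 0.0
    inside = sum(
        any(start <= p and p + motif_len <= end for start, end in domains)
        for p in positions
    )
    return inside / len(positions)


hits = find_hexxh("MAHELGHKHEAAH")
print(hits)  # [2, 8]
print(fraction_in_domains(hits, [(0, 7)]))  # 0.5
```

Using a zero-width lookahead rather than a plain `HE..H` pattern matters because `re.finditer` otherwise reports only non-overlapping matches.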
18.  Identification of ovarian cancer associated genes using an integrated approach in a Boolean framework 
BMC Systems Biology  2013;7:12.
Cancer is a complex disease whose molecular mechanisms remain elusive. A systems approach is needed that integrates diverse biological information for prognosis and therapy risk assessment, using mechanistic knowledge of gene interactions in pathways and networks, together with functional attributes, to unravel the biological behaviour of tumors.
We weighted functional attributes according to various functional properties that the literature reports as differing between cancerous and non-cancerous genes. This weighting scheme was then encoded in a Boolean logic framework to rank differentially expressed genes. We identified 17 differentially expressed genes out of a total of 11,173 genes; ten of these are reported to be down-regulated via epigenetic inactivation, and seven are up-regulated. Here, we report that the overexpressed genes IRAK1, CHEK1 and BUB1 may play an important role in ovarian cancer. We also show that these 17 genes can be used to form an ovarian cancer signature that distinguishes normal from ovarian cancer subjects, and that a set of three genes, CHEK1, AR and LYN, can be used to classify tumors with good and poor prognosis.
We provide a workflow that uses a Boolean logic scheme to identify differentially expressed genes by integrating diverse biological information. This integrated approach identified potential biomarkers for ovarian cancer.
PMCID: PMC3605242  PMID: 23383610
19.  Effects of Warm Ischemic Time on Gene Expression Profiling in Colorectal Cancer Tissues and Normal Mucosa 
PLoS ONE  2013;8(1):e53406.
Genome-wide gene expression analyses of tumors are a powerful tool for identifying gene signatures associated with biologically and clinically relevant characteristics, and for several tumor types such signatures are undergoing clinical validation in prospective trials. However, the handling and processing of clinical specimens may significantly affect the molecular data obtained from their analysis. We studied the effects of tissue handling time on gene expression in normal and tumor colon tissues from patients undergoing routine surgical procedures.
RNA extracted from specimens of 15 patients at four time points after surgery (180 samples in total) was analyzed for gene expression on high-density oligonucleotide microarrays. A mixed-effects model was used to identify probes whose mean expression differed across the four time points, and the model's p-values were adjusted with the Bonferroni method.
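The Bonferroni adjustment step can be sketched in a few lines of Python. This is an illustration only: it assumes the raw per-probe p-values have already been produced by the mixed-effects model (not shown) and uses a significance level of 0.05; the p-values below are hypothetical, not data from the study.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each raw p-value by the number
    of tests m and cap the result at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values for five probe sets (illustrative only).
raw = [0.0001, 0.004, 0.02, 0.2, 0.5]
adjusted = bonferroni(raw)

# Probe sets that remain significant at an assumed alpha of 0.05.
significant = [i for i, p in enumerate(adjusted) if p < 0.05]
print(significant)
```

Because the correction scales every p-value by the total number of probes tested, it is conservative on genome-wide arrays, which is consistent with only a few dozen probe sets surviving.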
Thirty-two probe sets associated with tissue handling time were identified in the tumor specimens, and thirty-one in the normal tissues. Most genes exhibited moderate changes in expression over the time points analyzed; however, four of them were oncogenes, and independent validation confirmed the effect of tissue handling for two of them.
Our results suggest that 60 minutes at room temperature is a critical time point for tissue handling in colon. Although we identified few time-dependent genes, the three genes that already showed changes at this time point in tumor samples were all oncogenes. This argues for standardized tissue-handling protocols and for efforts to reduce the time from specimen removal to snap freezing, to account for warm ischemia in this tumor type.
PMCID: PMC3538764  PMID: 23308215
20.  T-Cell Receptors Binding Orientation over Peptide/MHC Class I Is Driven by Long-Range Interactions 
PLoS ONE  2012;7(12):e51943.
Crystallographic data on T-cell receptor–peptide–major histocompatibility complex class I (TCRpMHC) interactions have revealed extremely diverse TCR binding modes that trigger antigen recognition. Understanding the molecular basis of TCR orientation over pMHC remains a considerable challenge. We present a simplified rigid approach applied to all available non-redundant TCRpMHC crystal structures. The CHARMM force field, in combination with the FACTS implicit solvation model, is used to study the role of long-distance interactions between the TCR and pMHC. We demonstrate that the sum of the Coulomb interactions and the electrostatic solvation energies is sufficient to identify two orientations corresponding to energetic minima at 0° and 180° from the native orientation. Interestingly, these results are robust to small structural variations of the TCR, such as changes induced by molecular dynamics simulations, suggesting that shape complementarity is not required to obtain a reliable signal. Accurate energy minima are also identified when unbound TCR crystal structures are confronted with pMHC. Furthermore, we decompose the electrostatic energy into residue contributions to estimate their role in the overall orientation. The results show that most of the driving force leading to complex formation comes from CDR1,2/MHC interactions. This long-distance contribution appears to be independent of the binding process itself, since it is reliably identified without considering either short-range energy terms or CDR induced fit upon binding. Finally, we present an attempt to predict the TCR/pMHC binding mode for a TCR structure obtained by homology modeling. The simplicity of the approach and the absence of any fitted parameters also make it easily applicable to other types of macromolecular protein complexes.
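A minimal Python sketch of scanning rigid orientations for an electrostatic energy minimum. It assumes point charges, a constant dielectric and rotation about a single axis, and computes only the pairwise Coulomb term between the two bodies; the FACTS solvation energy and the CHARMM parameters used in the study are not reproduced, and the toy charges, coordinates and the conversion constant below are assumptions for illustration.

```python
import math

# Approximate conversion factor e^2/(4*pi*eps0) in kcal*Å/(mol*e^2);
# force fields use slightly different values around 332 (assumption).
COULOMB_KCAL = 332.06

def coulomb_energy(charges_a, coords_a, charges_b, coords_b, dielectric=1.0):
    """Sum of pairwise Coulomb terms between two rigid bodies (kcal/mol).

    Only inter-body pairs are counted; intra-body terms are constant under
    rigid motion and do not affect which orientation is lowest.
    """
    energy = 0.0
    for qa, ra in zip(charges_a, coords_a):
        for qb, rb in zip(charges_b, coords_b):
            energy += COULOMB_KCAL * qa * qb / (dielectric * math.dist(ra, rb))
    return energy

def rotate_z(coords, degrees):
    """Rigidly rotate points about the z axis by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

# Hypothetical toy system: body A (a stand-in for a TCR) sits 5 Å above body B.
charges_a, coords_a = [1.0, -1.0], [(1.0, 0.0, 5.0), (-1.0, 0.0, 5.0)]
charges_b, coords_b = [-1.0, 1.0], [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]

# Scan rigid orientations of A around the z axis in 30-degree steps.
energies = {
    angle: coulomb_energy(charges_a, rotate_z(coords_a, angle), charges_b, coords_b)
    for angle in range(0, 360, 30)
}
best_angle = min(energies, key=energies.get)
print(best_angle, round(energies[best_angle], 2))
```

In this toy system opposite charges face each other at 0°, so the scan picks 0° as the minimum and 180° as the least favourable orientation. A real TCRpMHC calculation would use force-field charges for thousands of atoms and add the solvation term.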
PMCID: PMC3522592  PMID: 23251658
21.  Helminth secretome database (HSD): a collection of helminth excretory/secretory proteins predicted from expressed sequence tags (ESTs) 
BMC Genomics  2012;13(Suppl 7):S8.
Helminths are organisms of major socio-economic importance, causing serious parasitic infections in humans, other animals and plants. These infections impose a significant public health and economic burden globally. Exceptionally, some helminths, such as Caenorhabditis elegans, are free-living and serve as model organisms for studying parasitic infections. Excretory/secretory (ES) proteins play an important role in parasitic helminth infections, which makes them attractive targets for therapeutic use. For helminths, a large volume of expressed sequence tags (ESTs) has been generated to understand parasitism at the molecular level and to predict ES proteins for developing novel strategies to tackle parasitic infections. However, most predicted ES proteins are not available for further analysis, and no repository exists for them. Furthermore, predictions have mainly focused on classical secretory pathways, although it is well established that helminth parasites also utilise non-classical secretory pathways.
We developed a free Helminth Secretome Database (HSD), a repository for ES proteins predicted via classical and non-classical secretory pathways from EST data for 78 helminth species (64 nematodes, 7 trematodes and 7 cestodes), ranging from parasitic to free-living organisms. Approximately 0.9 million ESTs compiled from dbEST, the largest EST database, were cleaned, assembled and analysed with different computational tools in our bioinformatics pipeline, and the predicted ES proteins were submitted to HSD.
We report the large-scale prediction and analysis of classically and non-classically secreted ES proteins from diverse helminth organisms. All Unigenes (contigs and singletons) and ES protein datasets generated from this analysis are freely available. A BLAST server for checking the sequence similarity of new protein sequences against predicted helminth ES proteins is available at
PMCID: PMC3546426  PMID: 23281827
22.  TranSeqAnnotator: large-scale analysis of transcriptomic data 
BMC Bioinformatics  2012;13(Suppl 17):S24.
The transcriptome of an organism can be studied by analysing expressed sequence tag (EST) data sets, a rapid and cost-effective approach supported by several new and updated bioinformatics tools for assembly and annotation. Combined with genome and proteome analyses, such comprehensive analyses help characterize an organism. With the advent of large-scale sequencing projects and the generation of sequence data at the protein and cDNA levels, an automated analysis pipeline is necessary to store, organize and annotate ESTs.
TranSeqAnnotator is a workflow for large-scale analysis of transcriptomic data that uses the most appropriate bioinformatics tools for data management and analysis. The pipeline automatically cleans, clusters and assembles sequences, generates consensus sequences, conceptually translates these into possible protein products and assigns putative functions based on various DNA and protein similarity searches. Excretory/secretory (ES) proteins inferred from ESTs/short reads are also identified. TranSeqAnnotator accepts raw and quality EST files in FASTA format, along with protein and short-read sequences, which are analysed with user-selected programs. After pre-processing and assembly, the dataset is annotated at the nucleotide, protein and ES protein levels.
TranSeqAnnotator was developed on a Linux cluster to perform an exhaustive and reliable analysis and provide detailed annotation. It outputs Gene Ontology terms and functional identifications of proteins in terms of mappings to protein domains and metabolic pathways. The pipeline has been applied to annotate large EST datasets, identifying several novel and known genes with therapeutic experimental validations that could serve as potential targets for parasite intervention. TranSeqAnnotator is freely available to the scientific community at
PMCID: PMC3521237  PMID: 23282024
23.  An analysis of the transcriptome of Teladorsagia circumcincta: its biological and biotechnological implications 
BMC Genomics  2012;13(Suppl 7):S10.
Teladorsagia circumcincta (order Strongylida) is an economically important parasitic nematode of small ruminants (including sheep and goats) in temperate climatic regions of the world. Improved insights into the molecular biology of this parasite could underpin alternative methods required to control this and related parasites, in order to circumvent major problems associated with anthelmintic resistance. The aims of the present study were to define the transcriptome of the adult stage of T. circumcincta and to infer the main pathways linked to molecules known to be expressed in this nematode. Since sheep develop acquired immunity against T. circumcincta, there is some potential for the development of a vaccine against this parasite. Hence, we infer excretory/secretory molecules for T. circumcincta as possible immunogens and vaccine candidates.
A total of 407,357 ESTs were assembled, yielding 39,852 putative gene sequences. Conceptual translation predicted 24,013 proteins, which were then subjected to detailed annotation, including pathway mapping of the predicted proteins (among them 112 excreted/secreted [ES] and 226 transmembrane peptides); domain analysis and GO annotation were carried out using InterProScan along with BLAST2GO. Proteins were further analysed for secretory signal peptides using SignalP and for the non-classical secretory pathway using SecretomeP.
For ES proteins, key pathways, including Fc epsilon RI, T cell receptor and chemokine signalling as well as leukocyte transendothelial migration, were inferred to be linked to immune responses, along with other pathways related to neurodegenerative and infectious diseases, which warrant detailed future studies. KAAS identified new and updated pathways, such as phagosome and protein processing in the endoplasmic reticulum. Domain analysis of the assembled dataset revealed families of serine, cysteine and proteinase inhibitors, which might represent targets for parasite intervention. InterProScan identified GO terms pertaining to the extracellular region. Important domain families identified included the SCP-like extracellular proteins, which belong to the pathogenesis-related proteins (PRPs) superfamily, along with C-type lectins and saposin-like proteins. The 'extracellular region' term corresponding to allergen V5/Tpx-1-related proteins, considered important in parasite-host interactions, was also identified.
Other important findings included six-cysteine motif (SXC1) proteins, transthyretin proteins, C-type lectins and activation-associated secreted proteins (ASPs), which could represent potential candidates for developing novel anthelmintics or vaccines. In addition, SXC1, a protein kinase domain-containing protein, a trypsin family protein, a trypsin-like protease family member (TRY-1), a putative major allergen and a putative lipid binding protein were identified that have not been reported in the published proteomic analysis of T. circumcincta.
Detailed analysis of 6,058 raw EST sequences from dbEST revealed 315 putatively secreted proteins. Amongst them, C-type single domain activation associated secreted protein ASP3 precursor, activation-associated secreted proteins (ASP-like protein), cathepsin B-like cysteine protease, cathepsin L cysteine protease, cysteine protease, TransThyretin-Related and Venom-Allergen-like proteins were the key findings.
We have annotated a large EST dataset of T. circumcincta and undertaken detailed comparative bioinformatics analyses. The results provide comprehensive insight into the molecular biology of this parasite and into disease manifestation, offering a potential focal point for future research. We identified a number of pathways responsible for immune responses. This type of large-scale computational scanning could be coupled with proteomic and metabolomic studies of this parasite, leading to novel therapeutic intervention and disease control strategies. We have also demonstrated the use of bioinformatics tools for the study of ESTs, which could now serve as a benchmark for the development of new computational EST analysis pipelines.
PMCID: PMC3521389  PMID: 23282110
24.  Automated Analysis and Reannotation of Subcellular Locations in Confocal Images from the Human Protein Atlas 
PLoS ONE  2012;7(11):e50514.
The Human Protein Atlas contains immunofluorescence images showing subcellular locations for thousands of proteins. These are currently annotated by visual inspection. In this paper, we describe automated approaches to analyze the images and their use to improve annotation. We began by training classifiers to recognize the annotated patterns. By ranking proteins according to the confidence of the classifier, we generated a list of proteins that were strong candidates for reexamination. In parallel, we applied hierarchical clustering to group proteins and identified proteins whose annotations were inconsistent with the remainder of the proteins in their cluster. These proteins were reexamined by the original annotators, and a significant fraction had their annotations changed. The results demonstrate that automated approaches can provide an important complement to visual annotation.
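The cluster-consistency check described here might be sketched as follows. This assumes hypothetical 2D feature vectors in place of real image features, single-linkage agglomerative clustering with a hand-picked distance threshold (the abstract does not specify the linkage or cutoff), and flags a protein when its annotation differs from a strict majority of its cluster. The classifier-confidence ranking is not shown.

```python
import math
from collections import Counter

def single_linkage_clusters(features, threshold):
    """Agglomerative (single-linkage) clustering: repeatedly merge the two
    closest clusters until the closest pair is farther apart than threshold."""
    clusters = [[i] for i in range(len(features))]

    def linkage(a, b):
        # Single linkage: distance between the closest members of two clusters.
        return min(math.dist(features[i], features[j]) for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters

def flag_inconsistent(clusters, annotations):
    """Return proteins whose annotation disagrees with their cluster's majority."""
    flagged = []
    for members in clusters:
        label, n = Counter(annotations[i] for i in members).most_common(1)[0]
        if n * 2 <= len(members):  # no strict majority: nothing to compare against
            continue
        flagged.extend(i for i in members if annotations[i] != label)
    return sorted(flagged)

# Hypothetical feature vectors and visual annotations for seven proteins.
features = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
annotations = ["nucleus"] * 3 + ["mitochondria"] * 4
clusters = single_linkage_clusters(features, threshold=1.0)
flagged = flag_inconsistent(clusters, annotations)
print(flagged)  # candidates to send back for reexamination
```

Protein 3 groups with three nuclear proteins but carries a mitochondrial label, so it is the one flagged for the annotators to recheck.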
PMCID: PMC3511558  PMID: 23226299
25.  A Complex Set of Sex Pheromones Identified in the Cuttlefish Sepia officinalis 
PLoS ONE  2012;7(10):e46531.
The cephalopod mollusk Sepia officinalis can be considered a relevant model for studying reproductive strategies associated with seasonal migrations. Using transcriptomic and peptidomic approaches, we aim to identify peptide sex pheromones that are thought to induce the aggregation of mature cuttlefish in their egg-laying areas.
To facilitate the identification of sex pheromones, 576 5′-expressed sequence tags (ESTs) were sequenced from a single cDNA library generated from accessory sex glands of female cuttlefish. Our analysis yielded 223 unique sequences, composed of 186 singletons and 37 contigs. Three major redundant ESTs, called SPα, SPα′ and SPβ, were identified as good candidates for putative sex pheromone transcripts; they belong to the 87 unique sequences classified as unknown. The alignment of translated SPα and SPα′ revealed a high level of conservation, with 98.4% identity. Translation led to a 248-amino acid precursor containing six peptides with multiple putative disulfide bonds. The alignment of SPα-α′ with SPβ revealed partial structural conservation, with 37.3% identity. Translation of SPβ led to a 252-amino acid precursor containing five peptides. The presence of a signal peptide on SPα, SPα′ and SPβ indicated that the peptides are secreted. RT-PCR and mass spectrometry analyses revealed co-localization of transcripts and expression products in the oviduct gland. Preliminary in vitro experiments performed on gills and penises revealed target organs involved in mating and ventilation.
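Percent identity figures such as 98.4% and 37.3% are derived from pairwise alignments. The sketch below assumes one common convention (identical residues divided by alignment columns in which neither sequence has a gap); other conventions divide by the full alignment length or by the shorter sequence, so values can differ. The sequences are hypothetical fragments, not SPα or SPβ.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over alignment columns where
    neither sequence has a gap (one convention among several)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)

# Hypothetical pre-aligned peptide fragments (8 identical of 9 ungapped columns).
print(round(percent_identity("MKTAYIAKQR", "MKTAHIAK-R"), 1))
```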
The analysis of the accessory sex gland transcriptome of Sepia officinalis led to the identification of peptidic sex pheromones. Although preliminary functional tests suggested the involvement of the α3 and β2 peptides in ventilation and mating stimulation, further functional investigations will make it possible to identify the complete set of biological activities expected from waterborne pheromones.
PMCID: PMC3484142  PMID: 23118854