
Results 1-8 (8)

1.  Phylogenetic Analysis and Classification of the Fungal bHLH Domain 
Molecular Biology and Evolution  2011;29(5):1301-1318.
The basic Helix-Loop-Helix (bHLH) domain is an essential, highly conserved DNA-binding domain found in many transcription factors in all eukaryotic organisms. The bHLH domain has been well studied in the Animal and Plant Kingdoms but has yet to be characterized within Fungi. Herein, we obtained and evaluated the phylogenetic relationship of 490 fungal-specific bHLH-containing proteins from 55 whole genome projects composed of 49 Ascomycota and 6 Basidiomycota organisms. We identified 12 major groupings within Fungi (F1–F12), along with conserved motifs and functions specific to each group. Several classification models were built to distinguish the 12 groups and elucidate the most discerning sites in the domain. Performance testing on these models, for correct group classification, resulted in a maximum sensitivity and specificity of 98.5% and 99.8%, respectively. We identified 12 highly discerning sites and incorporated those into a set of rules (simplified model) to classify sequences into the correct group. Conservation of amino acid sites and phylogenetic analyses established that, like plant bHLH proteins, fungal bHLH-containing proteins are most closely related to animal Group B. The models used in these analyses were incorporated into a software package, the source code for which is available at
PMCID: PMC3339315  PMID: 22114358
bHLH; fungal; phylogeny; discriminant; analysis
2.  Spectral Analysis of Sequence Variability in Basic-Helix-loop-helix (bHLH) Protein Domains 
The basic helix-loop-helix (bHLH) family of transcription factors is used as a paradigm to explore structural implications of periodicity patterns in amino acid sequence variability. A Boltzmann-Shannon entropy profile represents site-by-site amino acid variation in the bHLH domain. Spectral analysis of almost 200 bHLH sequences documents the periodic nature of the bHLH sequence variation. Spectral analyses provide strong evidence that the patterns of amino acid variation in large numbers of sequences conform to the classical α-helix three-dimensional structure periodicity of 3.6 amino acids per turn. Multivariate indices of amino acid physicochemical attributes derived from almost 500 amino acid attributes are used to provide information regarding the underlying causal components of the bHLH sequence variability. Five multivariate attribute indices are used that reflect patterns in i) polarity - hydrophobicity - accessibility, ii) propensity for secondary structures, iii) molecular volume, iv) codon composition and v) electrostatic charge. Multiple regression analyses, with the entropy values as dependent variables and the factor score means and variances as independent variables, are used to partition variation in entropy values into their underlying causal structural components.
PMCID: PMC2674655  PMID: 19455213
Spectral analysis; bHLH proteins; entropy; factor analysis; molecular architecture
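The entropy profile and periodicity check described in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' code: it assumes a plain per-column Shannon entropy in bits and a simple FFT periodogram, and the function names `entropy_profile` and `power_spectrum`, the toy alignment, and the synthetic signal are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

import numpy as np


def entropy_profile(sequences):
    """Shannon entropy (in bits) of each column of an ungapped alignment."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for column in zip(*sequences):
        counts = Counter(column)
        total = len(column)
        # H = -sum(p * log2(p)) over the residues observed in this column
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        profile.append(entropy)
    return np.array(profile)


def power_spectrum(profile):
    """Periodogram of a mean-centered profile; frequencies in cycles per site."""
    centered = profile - profile.mean()
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(len(profile), d=1.0)
    return freqs, power


# Hypothetical toy alignment: columns 1 and 4 are invariant, 2 and 3 vary.
toy_alignment = ["ARKL", "ARRL", "AKKL", "AKRL"]
profile = entropy_profile(toy_alignment)

# Synthetic profile with an exact 3.6-site period (10 cycles over 36 sites),
# standing in for variability that follows the alpha-helix repeat.
synthetic = 1.0 + np.cos(2 * np.pi * np.arange(36) / 3.6)
freqs, power = power_spectrum(synthetic)
peak_period = 1.0 / freqs[np.argmax(power)]
print(profile, peak_period)
```

On real data the dominant period would be read from the spectrum of the observed entropy profile rather than from a synthetic signal, and gap handling would need to be decided separately.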
3.  A Novel N-Terminal Domain May Dictate the Glucose Response of Mondo Proteins 
PLoS ONE  2012;7(4):e34803.
Glucose is a fundamental energy source for both prokaryotes and eukaryotes. The balance between glucose utilization and storage is integral for proper energy homeostasis, and defects are associated with several diseases, e.g. type II diabetes. In vertebrates, the transcription factor ChREBP is a major component in glucose metabolism, while its ortholog MondoA is involved in glucose uptake. Both MondoA and ChREBP contain five Mondo conserved regions (MCRI-V) that affect their cellular localization and transactivation ability. While phosphorylation has been shown to affect ChREBP function, the mechanisms controlling glucose response of both ChREBP and MondoA remain elusive. By incorporating sequence analysis techniques, structure predictions, and functional annotations, we synthesized data surrounding Mondo family proteins into a cohesive, accurate, and general model involving the MCRs and two additional domains that determine ChREBP and MondoA glucose response. Most notably, we identified a conserved motif within the transactivation region of Mondo family proteins and propose that this motif interacts with the phosphorylated form of glucose. In addition, we discovered a putative nuclear receptor box in non-vertebrate Mondo and vertebrate ChREBP sequences that reveals a potentially novel interaction with nuclear receptors. These interactions are likely involved in altering ChREBP and MondoA conformation to form an active complex and induce transcription of genes involved in glucose metabolism and lipogenesis.
PMCID: PMC3323566  PMID: 22506051
4.  Brain cancer prognosis: independent validation of a clinical bioinformatics approach 
Translational and evidence-based medicine can take advantage of biotechnology advances that offer a fast-growing variety of high-throughput data for screening molecular activities of genomic, transcriptional, post-transcriptional and translational observations. The clinical information hidden in these data can be clarified with clinical bioinformatics approaches. We have recently proposed a method to analyze different layers of high-throughput (omic) data to preserve the emergent properties that appear in the cellular system when all molecular levels are interacting. We show here that this method applied to brain cancer data can uncover properties (i.e. molecules related to protective versus risky features in different types of brain cancers) that have been independently validated as survival markers, with potentially important applications in clinical practice.
PMCID: PMC3296594  PMID: 22297051
glioblastoma; survival; system; emergent property; high-throughput biology
5.  Evolution of the Max and Mlx Networks in Animals 
Transcription factors (TFs) are essential for the regulation of gene expression and often form emergent complexes to perform vital roles in cellular processes. In this paper, we focus on the parallel Max and Mlx networks of TFs because of their critical involvement in cell cycle regulation, proliferation, growth, metabolism, and apoptosis. A basic-helix-loop-helix-zipper (bHLHZ) domain mediates the competitive protein dimerization and DNA binding among Max and Mlx network members to form a complex system of cell regulation. To understand the importance of these network interactions, we identified the bHLHZ domain of Max and Mlx network proteins across the animal kingdom and carried out several multivariate statistical analyses. The presence and conservation of Max and Mlx network proteins in animal lineages stemming from the divergence of Metazoa indicate that these networks have ancient and essential functions. Phylogenetic analysis of the bHLHZ domain identified clear relationships among protein families with distinct points of radiation and divergence. Multivariate discriminant analysis further isolated specific amino acid changes within the bHLHZ domain that classify proteins, families, and network configurations. These analyses on Max and Mlx network members provide a model for characterizing the evolution of TFs involved in essential networks.
PMCID: PMC3177325  PMID: 21859806
protein evolution; basic-helix-loop-helix-leucine zipper (bHLHZ) domain; Myc/Max/Mad network; Mlx and Mondo Network; phylogenetic tree; discriminant analysis
6.  Joint analysis of transcriptional and post-transcriptional brain tumor data: searching for emergent properties of cellular systems 
BMC Bioinformatics  2011;12:86.
Advances in biotechnology offer a fast-growing variety of high-throughput data for screening molecular activities of genomic, transcriptional, post-transcriptional and translational observations. However, to date, most computational and algorithmic efforts have been directed at mining data from each of these molecular levels (genomic, transcriptional, etc.) separately. In view of the rapid advances in technology (new generation sequencing, high-throughput proteomics) it is important to address the problem of analyzing these data as a whole, i.e. preserving the emergent properties that appear in the cellular system when all molecular levels are interacting. We analyzed one of the (currently) few datasets that provide both transcriptional and post-transcriptional data of the same samples to investigate whether a joint analysis approach can extract more information.
We use Factor Analysis coupled with pre-established knowledge as a theoretical base to achieve this goal. Our intention is to identify structures that contain information from both mRNAs and miRNAs, and that can explain the complexity of the data. Despite the small sample available, we can show that this approach permits identification of meaningful structures, in particular two polycistronic miRNA genes related to transcriptional activity and likely to be relevant in the discrimination between gliosarcomas and other brain tumors.
This suggests the need to develop methodologies to simultaneously mine information from different levels of biological organization, rather than linking separate analyses performed in parallel.
PMCID: PMC3078861  PMID: 21450054
7.  Site-specific evolutionary rates in proteins are better modeled as non-independent and strictly relative 
Bioinformatics  2008;24(19):2177-2183.
Motivation: In a nucleotide or amino acid sequence, not all sites evolve at the same rate, due to differing selective constraints at each site. Currently in computational molecular evolution, models incorporating rate heterogeneity always share two assumptions. First, the rate of evolution at each site is assumed to be independent of every other site. Second, the values of these rates are assumed to be drawn from a known prior distribution. Although often assumed to be small, the actual effect of these assumptions has not been previously quantified in the literature.
Results: Herein we describe an algorithm to simultaneously infer the set of n−1 relative rates that parameterize the likelihood of an n-site alignment. Unlike previous work (a) these relative rates are completely identifiable and distinct from the branch-length parameters, and (b) a far more general class of rate priors can be used, and their effects quantified. Although described in a Bayesian framework, we discuss a future maximum likelihood extension.
Conclusions: Using both synthetic data and alignments from the Myc, Max and p53 protein families, we find that inferring relative rather than absolute rates has several advantages. First, both empirical likelihoods and Bayes factors show strong preference for the relative-rate model, with a mean Δ ln P=−0.458 per alignment site. Second, the computed likelihoods and Bayes factors were essentially independent of the relative-rate prior, indicating that good estimates of the posterior rate distribution are not required a priori. Third, a novel finding is that rates can be accurately inferred even when up to ≈4 substitutions per site have occurred. Thus biologically relevant putative hypervariable sites can be identified as easily as conserved sites. Lastly, our model treats rates and tree branch-lengths as completely identifiable, allowing for the first time coherent simultaneous inference of branch-lengths and site-specific evolutionary rates.
Availability: Source code for the utility described is available under a BSD-style license at
Supplementary information: Supplementary data is available at Bioinformatics online.
PMCID: PMC2553437  PMID: 18662926
8.  Gaussian Quadrature Formulae for Arbitrary Positive Measures 
We present computational methods and subroutines to compute Gaussian quadrature integration formulas for arbitrary positive measures. For expensive integrands that can be factored into well-known forms, Gaussian quadrature schemes allow for efficient evaluation of high-accuracy and -precision numerical integrals, especially compared to general ad hoc schemes. In addition, for certain well-known density measures (the normal, gamma, log-normal, Student’s t, inverse-gamma, beta, and Fisher’s F) we present exact formulae for computing the respective quadrature scheme.
PMCID: PMC2674649  PMID: 19455218
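As an illustration of the kind of computation the abstract above describes, here is a minimal Python sketch of a Gauss rule for the standard normal density. It is not the paper's subroutines: it assumes the well-known Golub-Welsch eigenvalue approach and the three-term recurrence of the probabilists' Hermite polynomials, and the function name `gauss_quadrature_standard_normal` is hypothetical.

```python
import numpy as np


def gauss_quadrature_standard_normal(n):
    """Nodes and weights of an n-point Gauss rule for the N(0, 1) density."""
    # Probabilists' Hermite recurrence: He_{k+1}(x) = x*He_k(x) - k*He_{k-1}(x),
    # so the Jacobi matrix has a zero diagonal and sqrt(k) on the off-diagonals.
    off_diagonal = np.sqrt(np.arange(1, n))
    jacobi = np.diag(off_diagonal, 1) + np.diag(off_diagonal, -1)
    nodes, vectors = np.linalg.eigh(jacobi)
    # Weights are mu_0 times the squared first components of the normalized
    # eigenvectors; mu_0 = 1 because the density integrates to one.
    weights = vectors[0, :] ** 2
    return nodes, weights


nodes, weights = gauss_quadrature_standard_normal(5)

# A 5-point rule integrates polynomials up to degree 9 exactly, so it should
# reproduce the second and fourth moments of N(0, 1).
second_moment = np.sum(weights * nodes**2)
fourth_moment = np.sum(weights * nodes**4)
print(second_moment, fourth_moment)
```

For a different positive measure, only the recurrence coefficients and the total mass mu_0 would change in this sketch; obtaining those coefficients is the measure-specific part of the problem.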
