Results 1-6 (6)

1.  dissectHMMER: a HMMER-based score dissection framework that statistically evaluates fold-critical sequence segments for domain fold similarity 
Biology Direct  2015;10:39.
Annotation transfer for function and structure within the sequence homology concept essentially requires protein sequence similarity for the secondary structural blocks forming the fold of a protein. A simplistic similarity approach in the case of non-globular segments (coiled coils, low complexity regions, transmembrane regions, long loops, etc.) is not justified and is a pertinent source of mistaken homologies. Such spurious similarity is due either to positional sequence conservation resulting from a very simple, physically induced pattern or to integral sequence properties that are critical for function. Furthermore, because the number of well-studied proteins continues to grow only slowly, a search methodology must dive deeper into the sequence similarity space to connect unknown sequences to well-studied, albeit more distant, ones in order to postulate biological function.
Based on our previous work of dissecting the hidden Markov model (HMMER)-based similarity score into fold-critical and non-globular contributions to improve homology inference, we propose a framework, dissectHMMER, that identifies more fold-related domain hits from standard HMMER searches. Subsequent statistical stratification of the fold-related hits into cohorts of functionally related domains allows function postulation for the query sequence. Briefly, this work addresses the technical problems of how to recognize non-globular parts in the domain model, resolve contradictory HMMER2/HMMER3 results and evaluate fold-related domain hits for homology. The framework is benchmarked against a set of SCOP-to-Pfam domain models. Despite being a sequence-to-profile method, dissectHMMER performs favorably against a profile-to-profile method, HHsuite/HHsearch. Examples of function annotation using dissectHMMER, including the function discovery of an uncharacterized membrane protein Q9K8K1_BACHD (WP_010899149.1) as a lactose/H+ symporter, are presented. Finally, the dissectHMMER webserver is made publicly available at
The proposed framework, dissectHMMER, is faithful to the original inception of the sequence homology concept while improving upon the existing HMMER search tool through the rescue of statistically evaluated false-negative yet fold-related domain hits to the query sequence. Overall, this translates into an opportunity for any novel protein sequence to be functionally characterized.
This article was reviewed by Masanori Arita, Shamil Sunyaev and L. Aravind.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0068-3) contains supplementary material, which is available to authorized users.
PMCID: PMC4521371  PMID: 26228544
Sequence homology; Hidden Markov model; Sequence similarity search; Fold-critical sequence segment; Non-globular sequence segment; Similarity score dissection
2.  A new piece in the puzzle of the novel avian-origin influenza A (H7N9) virus 
Biology Direct  2013;8:26.
This article was reviewed by Prof Xiufan Liu (nominated by Dr Purificacion Lopez-Garcia) and Prof Sandor Pongor.
Using phylogenetic analysis on newly available sequences, we characterize A/chicken/Jiangsu/RD5/2013(H10N9) as the currently closest precursor strain for the NA segment in the novel avian-origin H7N9 virus responsible for an outbreak in China. We also show that the internal segments of this precursor strain are closely related to those of the presumed precursor for the HA segment, A/duck/Zhejiang/12/2011(H7N3). This indicates that the sources of both HA and NA donors for the reassortant virus are of regional and not migratory-bird origin, and it highlights the role of chickens already in the early reassortment events.
PMCID: PMC4016609  PMID: 24160334
Avian influenza; Zoonotic infections; Phylogeny; Reassortment history
3.  Not all transmembrane helices are born equal: Towards the extension of the sequence homology concept to membrane proteins 
Biology Direct  2011;6:57.
Sequence homology considerations widely used to transfer functional annotation to uncharacterized protein sequences require special precautions in the case of non-globular sequence segments including membrane-spanning stretches composed of non-polar residues. Simple, quantitative criteria are desirable for identifying transmembrane helices (TMs) that must be included into or should be excluded from start sequence segments in similarity searches aimed at finding distant homologues.
We found that there are two types of TMs in membrane-associated proteins. On the one hand, there are so-called simple TMs with elevated hydrophobicity, low sequence complexity and extraordinary enrichment in long aliphatic residues. They merely serve as membrane-anchoring devices. In contrast, so-called complex TMs have lower hydrophobicity, higher sequence complexity and some functional residues. These TMs have additional roles besides membrane anchoring, such as intra-membrane complex formation, ligand binding or a catalytic role. Simple and complex TMs can occur both in single- and multi-membrane-spanning proteins in essentially any type of topology. Whereas simple TMs have the potential to confuse searches for sequence homologues and to generate unrelated hits with seemingly convincing statistical significance, complex TMs contain essential evolutionary information.
For extending the homology concept onto membrane proteins, we provide a necessary quantitative criterion to distinguish simple TMs (and a sufficient criterion for complex TMs) in query sequences prior to their usage in homology searches based on assessment of hydrophobicity and sequence complexity of the TM sequence segments.
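As a rough illustration of the two quantities named above, the following sketch scores a TM segment by mean hydrophobicity and by sequence complexity. It is not the criterion from the article: the Kyte-Doolittle hydropathy scale and Shannon entropy are stand-ins chosen here as assumptions, the function names are hypothetical, and no decision threshold is implied.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale (an assumed stand-in; the article's own
# hydrophobicity measure may differ).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(segment: str) -> float:
    """Average Kyte-Doolittle hydropathy over the residues of a segment."""
    return sum(KD[aa] for aa in segment) / len(segment)


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition; low values mean
    low sequence complexity."""
    n = len(segment)
    counts = Counter(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical segments: a leucine-rich anchor-like stretch and a more
    # varied stretch containing polar residues.
    for name, seg in [("anchor-like", "LLLLLLLLLLALLLLLLLLL"),
                      ("varied", "GYFSTWNAIMCLVPQHTGSA")]:
        print(name, round(mean_hydrophobicity(seg), 2),
              round(shannon_entropy(seg), 2))
```

In this toy comparison, a segment in the spirit of a simple TM would show high mean hydrophobicity together with low entropy, whereas a segment resembling a complex TM would show the opposite tendency.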
This article was reviewed by Shamil Sunyaev, L. Aravind and Arcady Mushegian.
PMCID: PMC3217874  PMID: 22024092
4.  Mapping the sequence mutations of the 2009 H1N1 influenza A virus neuraminidase relative to drug and antibody binding sites 
Biology Direct  2009;4:18.
In this work, we study the consequences of sequence variations of the "2009 H1N1" (swine or Mexican flu) influenza A virus strain neuraminidase for drug treatment and vaccination. We find that it is phylogenetically more closely related to European H1N1 swine flu and H5N1 avian flu than to the H1N1 counterparts in the Americas. Homology-based 3D structure modeling reveals that the novel mutations are preferentially located at the protein surface and do not interfere with the active site. The latter is the binding cavity for 3 currently used neuraminidase inhibitors: oseltamivir (Tamiflu®), zanamivir (Relenza®) and peramivir; thus, the drugs should remain effective for treatment. However, the antigenic regions of the neuraminidase relevant for vaccine development, serological typing and passive antibody treatment can differ from those of previous strains and already vary among patients.
This article was reviewed by Sandor Pongor and L. Aravind.
PMCID: PMC2691737  PMID: 19457254
5.  On the necessity of different statistical treatment for Illumina BeadChip and Affymetrix GeneChip data and its significance for biological interpretation 
Biology Direct  2008;3:23.
The original spotted array technology with competitive hybridization of two experimental samples and measuring relative expression levels is increasingly displaced by more accurate platforms that allow determining absolute expression values for a single sample (for example, Affymetrix GeneChip and Illumina BeadChip). Unfortunately, cross-platform comparisons show a disappointingly low concordance between lists of regulated genes between the latter two platforms.
Whereas expression values determined with a single Affymetrix GeneChip represent single measurements, the expression results obtained with Illumina BeadChip are essentially statistical means from several dozen identical probes. In the case of multiple technical replicates, the data therefore require different statistical treatment depending on the platform. The key is the computation of the squared standard deviation within replicates in the case of the Illumina data as the weighted mean of the squares of the standard deviations of the individual experiments. With an Illumina spike experiment, we demonstrate dramatically improved significance of spiked genes over all relevant concentration ranges. The re-evaluation of two published Illumina datasets (membrane type-1 matrix metalloproteinase expression in mammary epithelial cells by Golubkov et al. Cancer Research (2006) 66, 10460; spermatogenesis in normal and teratozoospermic men, Platts et al. Human Molecular Genetics (2007) 16, 763) identified significantly more biologically relevant genes as transcriptionally regulated targets and, thus, additional biological pathways involved.
The results in this work show that it is important to process Illumina BeadChip data with a modified statistical procedure and to compute the standard deviation in experiments with technical replicates from the standard errors of individual BeadChips. This change also leads to an improved concordance with Affymetrix GeneChip results, as the spermatogenesis dataset re-evaluation demonstrates.
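The idea of deriving a within-replicate variance from per-BeadChip standard errors can be sketched as follows. This is only a minimal illustration under stated assumptions: each BeadChip is taken to report a bead count and a standard error of the mean for a probe, the standard deviation is recovered as SE times the square root of the bead count, and the weights are the degrees of freedom (bead count minus one). The exact weighting used in the article is not given in the abstract, and the function name is hypothetical.

```python
import math


def pooled_within_variance(bead_counts, standard_errors):
    """Weighted mean of per-chip variances for one probe.

    bead_counts[i]     : number of beads measured on BeadChip i
    standard_errors[i] : standard error of the mean reported for BeadChip i

    Assumption: weights are degrees of freedom (n_i - 1); the article may
    use a different weighting.
    """
    weighted_sum = 0.0
    total_dof = 0
    for n, se in zip(bead_counts, standard_errors):
        variance = (se ** 2) * n          # sd^2 = SE^2 * n
        weighted_sum += (n - 1) * variance
        total_dof += n - 1
    return weighted_sum / total_dof


if __name__ == "__main__":
    # Hypothetical numbers for three technical replicates of one probe.
    counts = [30, 28, 35]
    ses = [0.12, 0.15, 0.10]
    var = pooled_within_variance(counts, ses)
    print("pooled variance:", var, "pooled sd:", math.sqrt(var))
```

The design point is that the bead-level scatter on each chip carries information that is lost if the chip means are treated as single measurements.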
This article was reviewed by I. King Jordan, Mark J. Dunning and Shamil Sunyaev.
PMCID: PMC2453111  PMID: 18522715
6.  pkaPS: prediction of protein kinase A phosphorylation sites with the simplified kinase-substrate binding model 
Biology Direct  2007;2:1.
Protein kinase A (cAMP-dependent kinase, PKA) is a serine/threonine kinase, for which ca. 150 substrate proteins are known. Based on a refinement of the recognition motif using the available experimental data, we wished to apply the simplified substrate protein binding model for accurate prediction of PKA phosphorylation sites, an approach that was previously successful for the prediction of lipid posttranslational modifications and of the PTS1 peroxisomal translocation signal.
Approximately 20 sequence positions flanking the phosphorylated residue on both sides have been found to be restricted in their sequence variability (region -18...+23 with the site at position 0). The conserved physical pattern can be rationalized in terms of a qualitative binding model with the catalytic cleft of protein kinase A. Positions -6...+4 surrounding the phosphorylation site are influenced to varying degrees by direct interaction with the kinase. This sequence stretch is embedded in an intrinsically disordered region composed preferentially of hydrophilic residues with flexible backbones and small side chains. This knowledge has been incorporated into a simplified analytical model of productive binding of substrate proteins with PKA.
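To make the -18...+23 region concrete, the sketch below extracts that window around every serine or threonine in a sequence. It covers only the windowing step and is not the pkaPS scoring function; the padding character, the function name and the example sequence are assumptions made for illustration.

```python
UPSTREAM = 18    # positions -18 ... -1
DOWNSTREAM = 23  # positions +1 ... +23


def candidate_windows(sequence: str, pad: str = "-"):
    """Yield (index, window) for each Ser/Thr in the sequence.

    Each window spans positions -18 ... +23 relative to the candidate site,
    so the site itself sits at window[UPSTREAM]. Positions beyond the
    sequence termini are filled with the pad character (an assumption).
    """
    padded = pad * UPSTREAM + sequence + pad * DOWNSTREAM
    width = UPSTREAM + 1 + DOWNSTREAM
    for i, residue in enumerate(sequence):
        if residue in "ST":
            yield i, padded[i:i + width]


if __name__ == "__main__":
    # Hypothetical peptide containing one serine and one threonine.
    for index, window in candidate_windows("MKRRASLAT"):
        print(index, window)
```

A predictor built on such windows would then score each one; that scoring step is deliberately left out here.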
The scoring function of the pkaPS predictor can confidently discriminate PKA phosphorylation sites from serines/threonines with non-permissive sequence environments (sensitivity of ~96% at a specificity of ~94%). The tool "pkaPS" has been applied to the whole human proteome. Among the newly predicted PKA targets, there are entirely uncharacterized protein groups as well as apparently well-known families such as those of the ribosomal proteins L21e, L22 and L6.
The supplementary data as well as the prediction tool as WWW server are available at .
This article was reviewed by Erik van Nimwegen (Biozentrum, University of Basel, Switzerland), Sandor Pongor (International Centre for Genetic Engineering and Biotechnology, Trieste, Italy) and Igor Zhulin (University of Tennessee, Oak Ridge National Laboratory, USA).
PMCID: PMC1783638  PMID: 17222345
