The endolysin LysK, derived from staphylococcal phage K, has previously been shown to have two enzymatic domains, one of which is an N-acetylmuramoyl-L-alanine amidase and the other a cysteine/histidine-dependent amidohydrolase/peptidase designated CHAPk. The latter, when cloned as a single-domain truncated enzyme, is conveniently overexpressed in a highly soluble form. This enzyme was shown to be highly active in vitro against live cell suspensions of S. aureus. In the current study, the IVIS imaging system was used to demonstrate the effective elimination of a lux-labeled S. aureus strain from the nares of BALB/c mice.
Staphylococcus; decolonization; lysin; bacteriophage; nasal
New antibacterial agents are urgently needed for the elimination of biofilm-forming bacteria that are highly resistant to traditional antimicrobial agents. Proliferation of such bacteria can lead to significant economic losses in the agri-food sector. This study demonstrates the potential of the bacteriophage-derived peptidase, CHAPK, as a biocidal agent for the rapid disruption of biofilm-forming staphylococci, commonly associated with bovine mastitis. Purified CHAPK applied to biofilms of Staphylococcus aureus DPC5246 completely eliminated the staphylococcal biofilms within 4 h. In addition, CHAPK was able to prevent biofilm formation by this strain. The CHAPK lysin also reduced S. aureus in a skin decolonization model. Our data demonstrate the potential of CHAPK as a biocidal agent for the prevention and treatment of biofilm-associated staphylococcal infections or as a decontaminating agent in the food and healthcare sectors.
This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides easy and intuitive access to the most popular functions of the package. These include the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode that allows combining the output of any other aligners, and template-based modes of T-Coffee that deliver high-accuracy alignments while using structural or homology-derived templates. The three available template modes are Expresso for the alignment of proteins with a known 3D structure, R-Coffee to align RNA sequences with conserved secondary structures, and PSI-Coffee to accurately align distantly related sequences using homology extension. The new server benefits from recent improvements of the T-Coffee algorithm, can align up to 150 sequences as long as 10 000 residues, and is available from both http://www.tcoffee.org and its main mirror http://tcoffee.crg.cat.
Expresso is a multiple sequence alignment server that aligns sequences using structural information. The user only needs to provide sequences. The server runs BLAST to identify close homologues of the sequences within the PDB database. These PDB structures are used as templates to guide the alignment of the original sequences using structure-based sequence alignment methods like SAP or Fugue. The final result is a multiple sequence alignment of the original sequences based on the structural information of the templates. An advanced mode makes it possible to either upload private structures or specify which PDB templates should be used to model each sequence. Provided that suitable structural information is available, Expresso delivers sequence alignments with accuracy comparable with structure-based alignments. The server is available on .
A truncated derivative of the phage endolysin LysK containing only the CHAP (cysteine- and histidine-dependent amidohydrolase/peptidase) domain exhibited lytic activity against live clinical staphylococcal isolates, including methicillin-resistant Staphylococcus aureus. This is the first known report of a truncated phage lysin which retains high lytic activity against live staphylococcal cells.
The Streptococcus agalactiae bacteriophage B30 endolysin contains three domains: cysteine, histidine-dependent amidohydrolase/peptidase (CHAP), Acm glycosidase, and the SH3b cell wall binding domain. Truncations and point mutations indicated that the Acm domain requires the SH3b domain for activity, while the CHAP domain is responsible for nearly all the cell lysis activity.
Toxoplasma gondii is a protozoan parasite capable of infecting humans and animals. The surface antigen glycoproteins SAG2C, -2D, -2X, and -2Y are expressed on the surface of bradyzoites. These antigens have been shown to protect bradyzoites against immune responses during chronic infections. We studied the structures of the SAG2C, -2D, -2X, and -2Y proteins using bioinformatics methods. The protein sequence alignment was performed using the T-Coffee method. Secondary structures and functional domains were predicted using PSIPRED v3.0 and SMART, and 3D models of the proteins were constructed and compared using the I-TASSER server, VMD, and SWISS-spdbv. Our results showed that SAG2C, -2D, -2X, and -2Y are highly homologous proteins. They share the same conserved peptides and HLA-I restricted epitopes. The similarity in structure and domains indicated putative common functions that might stimulate similar immune responses in hosts. The conserved peptides and HLA-restricted epitopes could provide important insights for vaccine studies and the diagnosis of this disease.
Protein structure prediction provides valuable insights into function, and comparative modeling is one of the most reliable methods to predict 3D structures directly from amino acid sequences. However, critical problems arise during the selection of the correct templates and the alignment of query sequences therewith. We have developed an automatic protein structure prediction server, (PS)2, which uses an effective consensus strategy both in template selection, which combines PSI-BLAST and IMPALA, and in target–template alignment, which integrates PSI-BLAST, IMPALA and T-Coffee. (PS)2 was evaluated on 47 comparative modeling targets in CASP6 (Critical Assessment of Techniques for Protein Structure Prediction). For the benchmark dataset, the predictive performance of (PS)2, based on the mean GDT_TS score, was superior to that of 10 other automatic servers. Our method is based solely on the consensus sequence and thus is considerably faster than other methods that rely on the additional structural consensus of templates. Our results show that (PS)2, coupled with suitable consensus strategies and a new similarity score, can significantly improve structure prediction. Our approach should be useful in structure prediction and modeling. The (PS)2 is available through the website at .
Consensus is a server developed to produce high-quality alignments for comparative modeling, and to identify the alignment regions reliable for copying from a given template. This is accomplished even when target–template sequence identity is as low as 5%. Combining the output from five different alignment methods, the server produces a consensus alignment, with a reliability measure indicated for each position and a prediction of the regions suitable for modeling. Models built using the server predictions are typically within 3 Å rms deviations from the crystal structure. Users can upload a target protein sequence and specify a template (PDB code); if no template is given, the server will search for one. The method has been validated on a large set of homologous protein structure pairs. The Consensus server should prove useful for modelers for whom the structural reliability of the model is critical in their applications. It is currently available at http://structure.bu.edu/cgi-bin/consensus/consensus.cgi.
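The idea of combining several alignments into one consensus with a per-position reliability can be illustrated with a simple majority vote. The sketch below is only an illustration under stated assumptions: the abstract does not describe how the Consensus server actually combines its five methods, and the function name `consensus_alignment`, the list-of-template-positions input format, and the use of the agreement fraction as the reliability measure are all hypothetical.

```python
from collections import Counter


def consensus_alignment(alignments):
    """Majority-vote consensus over several target-template alignments.

    Hypothetical input format: each alignment is a list with one entry per
    target residue, holding the template position it is aligned to, or None
    for a gap. Returns a list of (template_position, reliability) pairs,
    where reliability is the fraction of methods agreeing with the winner.
    """
    if not alignments:
        raise ValueError("at least one alignment is required")
    length = len(alignments[0])
    if any(len(a) != length for a in alignments):
        raise ValueError("all alignments must cover the same target residues")

    consensus = []
    for column in zip(*alignments):
        # Count how many methods propose each template position.
        winner, votes = Counter(column).most_common(1)[0]
        consensus.append((winner, votes / len(alignments)))
    return consensus


# Five illustrative alignments of a three-residue target (made-up data).
methods = [
    [10, 11, 12],
    [10, 11, 13],
    [10, 12, 13],
    [10, 11, None],
    [10, 11, 13],
]
result = consensus_alignment(methods)
```

Positions on which all methods agree receive a reliability of 1.0; a threshold on this value is one plausible way to mark regions as suitable for modeling.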
Virion-associated peptidoglycan hydrolases have potential as antimicrobial agents due to their ability to lyse Gram-positive bacteria on contact. In this work, our aim was to improve the lytic activity of HydH5, a virion-associated peptidoglycan hydrolase from the Staphylococcus aureus bacteriophage vB_SauS-phiIPLA88. Full-length HydH5 and two truncated derivatives containing only the CHAP (cysteine, histidine-dependent amidohydrolase/peptidase) domain exhibited high lytic activity against live S. aureus cells. In addition, three different fusion proteins were created between lysostaphin and HydH5, each of which showed higher staphylolytic activity than the parental enzyme or its deletion construct. Both parental and fusion proteins lysed S. aureus cells in zymograms and plate lysis and turbidity reduction assays. In plate lysis assays, HydH5 and its derivative fusions lysed bovine and human S. aureus strains, the methicillin-resistant S. aureus (MRSA) strain N315, and human Staphylococcus epidermidis strains. Several nonstaphylococcal bacteria were not affected. HydH5 and its derivative fusion proteins displayed antimicrobial synergy with the endolysin LysH5 in vitro, suggesting that the two enzymes have distinct cut sites and, thus, may be more efficient in combination for the elimination of staphylococcal infections.
Homology models of amidase-03 from Bacillus anthracis were constructed using Modeller (9v2). Modeller constructs protein models using an automated approach for comparative protein structure modeling by the satisfaction of spatial restraints. A template structure of Listeria monocytogenes bacteriophage PSA endolysin PlyPSA (PDB ID: 1XOV) was selected from the Protein Data Bank (PDB) using BLASTp with the BLOSUM62 sequence alignment scoring matrix. We generated five models using the Modeller default routine in which initial coordinates are randomized and evaluated by pseudo-energy parameters. The protein models were validated using PROCHECK and energy minimized using the steepest descent method in GROMACS 3.2 (flexible SPC water model in cubic box of size 1 Å instead of rigid SPC model). We used the G43a1 force field in GROMACS for energy calculations, and the generated structure was subsequently analyzed using the VMD software for stereochemistry, atomic clashes and misfolding. A detailed analysis of the amidase-03 model structure from Bacillus anthracis will provide insight into the molecular design of suitable inhibitors as drug candidates.
Homology modeling; modeller; amidase-03; hydrolase enzyme; Bacillus anthracis
Staphylococci cause bovine mastitis, with Staphylococcus aureus being responsible for the majority of the mastitis-based losses to the dairy industry (up to $2 billion/annum). Treatment is primarily with antibiotics, which are often ineffective and potentially contribute to resistance development. Bacteriophage endolysins (peptidoglycan hydrolases) present a promising source of alternative antimicrobials. Here we evaluated two fusion proteins consisting of the streptococcal λSA2 endolysin endopeptidase domain fused to staphylococcal cell wall binding domains from either lysostaphin (λSA2-E-Lyso-SH3b) or the staphylococcal phage K endolysin, LysK (λSA2-E-LysK-SH3b). We demonstrate killing of 16 different S. aureus mastitis isolates, including penicillin-resistant strains, by both constructs. At 100 μg/ml in processed cow milk, λSA2-E-Lyso-SH3b and λSA2-E-LysK-SH3b reduced the S. aureus bacterial load by 3 and 1 log units within 3 h, respectively, compared to a buffer control. In contrast to λSA2-E-Lyso-SH3b, however, λSA2-E-LysK-SH3b permitted regrowth of the pathogen after 1 h. In a mouse model of mastitis, infusion of 25 μg of λSA2-E-Lyso-SH3b or λSA2-E-LysK-SH3b into mammary glands reduced S. aureus CFU by 0.63 or 0.81 log units, compared to >2 log for lysostaphin. Both chimeras were synergistic with lysostaphin against S. aureus in plate lysis checkerboard assays. When tested in combination in mice, λSA2-E-LysK-SH3b and lysostaphin (12.5 μg each/gland) caused a 3.36-log decrease in CFU. Furthermore, most protein treatments reduced gland wet weights and intramammary tumor necrosis factor alpha (TNF-α) concentrations, which serve as indicators of inflammation. Overall, our animal model results demonstrate the potential of fusion peptidoglycan hydrolases as antimicrobials for the treatment of S. aureus-induced mastitis.
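The reductions above are reported in log units, that is, as the base-10 logarithm of the ratio between control and treated CFU counts. A minimal sketch of that arithmetic follows; the function names and the example CFU counts are illustrative assumptions, not values from the study.

```python
import math


def log_reduction(control_cfu, treated_cfu):
    """Log10 reduction of treated bacterial counts relative to a control."""
    if control_cfu <= 0 or treated_cfu <= 0:
        raise ValueError("CFU counts must be positive")
    return math.log10(control_cfu / treated_cfu)


def fold_reduction(log_units):
    """Convert a log10 reduction back into a fold change."""
    return 10 ** log_units


# Illustrative counts only: 1e6 CFU in the control, 1e3 CFU after treatment.
example = log_reduction(1e6, 1e3)

# The 3.36-log decrease reported for the combination treatment corresponds
# to roughly a 2,300-fold drop in CFU.
combo_fold = fold_reduction(3.36)
```

Seen this way, the 0.63 and 0.81 log reductions for the single chimeras are about 4-fold and 6-fold decreases, which makes the gap to the combination treatment easier to appreciate.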
HHpred is a fast server for remote protein homology detection and structure prediction and is the first to implement pairwise comparison of profile hidden Markov models (HMMs). It can search a wide choice of databases, such as the PDB, SCOP, Pfam, SMART, COGs and CDD. It accepts a single query sequence or a multiple alignment as input. Within only a few minutes it returns the search results in a user-friendly format similar to that of PSI-BLAST. Search options include local or global alignment and scoring secondary structure similarity. HHpred can produce pairwise query-template alignments, multiple alignments of the query with a set of templates selected from the search results, as well as 3D structural models that are calculated by the MODELLER software from these alignments. A detailed help facility is available. As a demonstration, we analyze the sequence of SpoVT, a transcriptional regulator from Bacillus subtilis. HHpred can be accessed at .
Template selection and target-template alignment are critical steps for template-based modeling (TBM) methods. Identifying templates in the twilight zone of 15~25% sequence similarity between targets and templates is still difficult for template-based protein structure prediction. This study presents the (PS)2-v2 server, based on our original server with numerous enhancements and modifications, to improve reliability and applicability.
To detect homologous proteins with remote similarity, the (PS)2-v2 server utilizes the S2A2 matrix, which is a 60 × 60 substitution matrix using the secondary structure propensities of 20 amino acids, and the position-specific sequence profile (PSSM) generated by PSI-BLAST. In addition, our server uses multiple templates and multiple models to build and assess models. Our method was evaluated on the Lindahl benchmark for fold recognition and the ProSup benchmark for sequence alignment. Evaluation results indicated that our method outperforms sequence-profile approaches and has performance comparable to that of structure-based methods on these benchmarks. Finally, we tested our method using the 154 TBM targets of the CASP8 (Critical Assessment of Techniques for Protein Structure Prediction) dataset. Experimental results show that (PS)2-v2 is ranked 6th among 72 servers and is faster than the five top-ranked servers, which utilize ab initio methods.
Experimental results demonstrate that (PS)2-v2 with the S2A2 matrix is useful for template selections and target-template alignments by blending the amino acid and structural propensities. The multiple-template and multiple-model strategies are able to significantly improve the accuracies for target-template alignments in the twilight zone. We believe that this server is useful in structure prediction and modeling, especially in detecting homologous templates with sequence similarity in the twilight zone.
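A 60 × 60 matrix over 20 amino acids suggests that each residue is paired with one of three secondary structure states (20 × 3 = 60). The sketch below shows how such a combined alphabet could be indexed. It is an assumption-laden illustration: the three-state alphabet (H, E, C), the ordering of residues and states, and the function name `s2a2_index` are hypothetical, and no actual S2A2 scores are reproduced here.

```python
# Hypothetical orderings; the real S2A2 matrix may order its rows differently.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # 20 standard residues
SS_STATES = "HEC"                     # assumed: helix, strand, coil


def s2a2_index(residue, ss_state):
    """Map an (amino acid, secondary structure) pair to a row index 0..59."""
    if residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {residue!r}")
    if ss_state not in SS_STATES:
        raise ValueError(f"unknown secondary structure state: {ss_state!r}")
    return SS_STATES.index(ss_state) * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)


def substitution_score(matrix, query, template):
    """Look up the score for two (residue, state) pairs in a 60 x 60 matrix."""
    return matrix[s2a2_index(*query)][s2a2_index(*template)]


# Placeholder matrix of zeros, standing in for real substitution scores.
placeholder = [[0] * 60 for _ in range(60)]
score = substitution_score(placeholder, ("A", "H"), ("V", "C"))
```

With this layout, an alanine in a helix and an alanine in a strand occupy different rows, which is what lets a single matrix blend amino acid and structural propensities.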
Genome sequencing projects have deciphered millions of protein sequences, which require knowledge of their structure and function to improve the understanding of their biological role. Although experimental methods can provide detailed information for a small fraction of these proteins, computational modeling is needed for the majority of protein molecules, which are experimentally uncharacterized. The I-TASSER server is an on-line workbench for high-resolution modeling of protein structure and function. Given a protein sequence, a typical output from the I-TASSER server includes secondary structure prediction, predicted solvent accessibility of each residue, homologous template proteins detected by threading and structure alignments, up to five full-length tertiary structural models, and structure-based functional annotations for enzyme classification, Gene Ontology terms and protein-ligand binding sites. All the predictions are tagged with a confidence score that estimates how accurate the predictions are in the absence of experimental data. To accommodate special requests of end users, the server provides channels to accept user-specified inter-residue distance and contact maps to interactively change the I-TASSER modeling; it also allows users to specify any protein as a template, or to exclude any template proteins during the structure assembly simulations. The structural information can be collected by the users based on experimental evidence or biological insights with the purpose of improving the quality of I-TASSER predictions. The server was ranked as the best program for protein structure and function prediction in the recent community-wide CASP experiments. There are currently >20,000 registered scientists from over 100 countries who are using the on-line I-TASSER server.
I-TASSER is an automated pipeline for protein tertiary structure prediction using multiple threading alignments and iterative structure assembly simulations. In CASP9 experiments, two new algorithms, QUARK and FG-MD, were added to the I-TASSER pipeline for improving the structural modeling accuracy. QUARK is a de novo structure prediction algorithm used for structure modeling of proteins that lack detectable template structures. For distantly homologous targets, QUARK models are found useful as a reference structure for selecting good threading alignments and guiding the I-TASSER structure assembly simulations. FG-MD is an atomic-level structural refinement program that uses structural fragments collected from the PDB structures to guide molecular dynamics simulation and improve the local structure of predicted model, including hydrogen-bonding networks, torsion angles and steric clashes. Despite considerable progress in both the template-based and template-free structure modeling, significant improvements on protein target classification, domain parsing, model selection, and ab initio folding of beta-proteins are still needed to further improve the I-TASSER pipeline.
protein structure prediction; threading; contact prediction; ab initio folding; CASP
We introduce BAR-PLUS (BAR+), a web server for functional and structural annotation of protein sequences. BAR+ is based on a large-scale genome cross comparison and a non-hierarchical clustering procedure characterized by a metric that ensures a reliable transfer of features within clusters. In this version, the method takes advantage of a large-scale pairwise sequence comparison of 13 495 736 protein chains also including 988 complete proteomes. Available sequence annotation is derived from UniProtKB, GO, Pfam and PDB. When PDB templates are present within a cluster (with or without their SCOP classification), profile Hidden Markov Models (HMMs) are computed on the basis of sequence to structure alignment and are cluster-associated (Cluster-HMM). Therefrom, a library of 10 858 HMMs is made available for aligning even distantly related sequences for structural modelling. The server also provides pairwise query sequence–structural target alignments computed from the correspondent Cluster-HMM. BAR+ in its present version allows three main categories of annotation: PDB [with or without SCOP (*)] and GO and/or Pfam; PDB (*) without GO and/or Pfam; GO and/or Pfam without PDB (*) and no annotation. Each category can further comprise clusters where GO and Pfam functional annotations are or are not statistically significant. BAR+ is available at http://bar.biocomp.unibo.it/bar2.0.
Template-based protein structure modeling is commonly used for protein structure prediction. Based on the observation that multiple template-based methods often perform better than single template-based methods, we further explore the use of a variable number of multiple templates for a given target in the latest variant of TASSER, TASSERVMT. We first develop an algorithm that improves the target-template alignment for a given template. The improved alignment, called the SP3 alternative alignment, is generated by a parametric alignment method coupled with short TASSER refinement on models selected using knowledge-based scores. The refined top model is then structurally aligned to the template to produce the SP3 alternative alignment. Templates identified using SP3 threading are combined with the SP3 alternative and HHSEARCH alignments to provide target alignments to each template. These template models are then grouped into sets containing a variable number of template/alignment combinations. For each set, we run short TASSER simulations to build full-length models. Then, the models from all sets of templates are pooled, and the top 20–50 models are selected using the FTCOM ranking method. These models are then subjected to a single longer TASSER refinement run for final prediction. We benchmarked our method by comparison with our previously developed approach, pro-sp3-TASSER, on a set with 874 Easy and 318 Hard targets. The average GDT-TS score improvements for the first model are 3.5% and 4.3% for Easy and Hard targets, respectively. When tested on the 112 CASP9 targets, our method improves the average GDT-TS scores as compared to pro-sp3-TASSER by 8.2% and 9.3% for the 80 Easy and 32 Hard targets, respectively. It also shows slightly better results than the top-ranked CASP9 Zhang-Server, QUARK and HHpredA methods. The program is available for download at http://cssb.biology.gatech.edu/.
template-based modeling; threading; alignment; SP3; TASSER
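The GDT-TS score used in the benchmark above is conventionally the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose Cα atom lies within the cutoff of its position in the native structure. The sketch below is a simplified illustration: it assumes the per-residue distances have already been obtained from a single superposition, whereas the full GDT procedure searches for the best superposition at each cutoff. The function name and the example distances are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS from per-residue model-to-native distances (in Å).

    Assumes a single, fixed superposition; the real GDT algorithm maximises
    the residue count separately for each cutoff.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    n = len(distances)
    # Fraction of residues within each cutoff, averaged and scaled to 0-100.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Made-up distances for a five-residue toy model.
toy_score = gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0])
```

Because the score averages over loose and tight cutoffs, it rewards models that get the overall fold right even when some regions deviate by several ångströms.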
SWISS-MODEL (http://swissmodel.expasy.org) is a server for automated comparative modeling of three-dimensional (3D) protein structures. It pioneered the field of automated modeling starting in 1993 and is the most widely used free web-based automated modeling facility today. In 2002 the server computed 120 000 user requests for 3D protein models. SWISS-MODEL provides several levels of user interaction through its World Wide Web interface: in the ‘first approach mode’ only an amino acid sequence of a protein is submitted to build a 3D model. Template selection, alignment and model building are performed fully automatically by the server. In the ‘alignment mode’, the modeling process is based on a user-defined target-template alignment. Complex modeling tasks can be handled with the ‘project mode’ using DeepView (Swiss-PdbViewer), an integrated sequence-to-structure workbench. All models are sent back via email with a detailed modeling report. WhatCheck analyses and ANOLEA evaluations are provided optionally. The reliability of SWISS-MODEL is continuously evaluated in the EVA-CM project. The SWISS-MODEL server is under constant development to improve the successful implementation of expert knowledge into an easy-to-use server.
The ProDom database contains protein domain families generated from the SWISS-PROT database by automated sequence comparisons. It can be searched on the World Wide Web (http://protein.toulouse.inra.fr/prodom.html) or by E-mail (firstname.lastname@example.org) to study domain arrangements within known families or new proteins. Strong emphasis has been put on the graphical user interface which allows for interactive analysis of protein homology relationships. Recent improvements to the server include: ProDom search by keyword; links to PROSITE and PDB entries; more sensitive ProDom similarity search with BLAST or WU-BLAST; alignments of query sequences with homologous ProDom domain families; and links to the SWISS-MODEL server (http://www.expasy.ch/swissmod/SWISS-MODEL.html) for homology based 3-D domain modelling where possible.
Predicting 3-dimensional protein structures from amino-acid sequences is an important unsolved problem in computational structural biology. The problem becomes relatively easier if close homologous proteins have been solved, as high-resolution models can be built by aligning target sequences to the solved homologous structures. However, for sequences without similar folds in the Protein Data Bank (PDB) library, the models have to be predicted from scratch. Progress in ab initio structure modeling is slow. The aim of this study was to extend the TASSER (threading/assembly/refinement) method for ab initio modeling and examine systematically its ability to fold small single-domain proteins.
We developed I-TASSER by iteratively implementing the TASSER method and tested its folding ability on three benchmarks of small proteins. First, data on 16 small proteins (< 90 residues) were used to generate I-TASSER models, which had an average Cα-root mean square deviation (RMSD) of 3.8 Å, with 6 of them having a Cα-RMSD < 2.5 Å. The overall result was comparable with the all-atomic ROSETTA simulation, but the central processing unit (CPU) time used by I-TASSER was much shorter (5 CPU hours vs. 150 CPU days). Second, data on 20 small proteins (< 120 residues) were used. I-TASSER folded four of them with a Cα-RMSD < 2.5 Å. The average Cα-RMSD of the I-TASSER models was 3.9 Å, whereas it was 5.9 Å using TOUCHSTONE-II software. Finally, 20 non-homologous small proteins (< 120 residues) were taken from the PDB library. An average Cα-RMSD of 3.9 Å was obtained for the third benchmark, with seven cases having a Cα-RMSD < 2.5 Å.
Our simulation results show that I-TASSER can consistently predict the correct folds and sometimes high-resolution models for small single-domain proteins. Compared with other ab initio modeling methods such as ROSETTA and TOUCHSTONE II, the average performance of I-TASSER is either much better or similar at a lower computational cost. These data, together with the significant performance of the automated I-TASSER server (the Zhang-Server) in the 'free modeling' section of the recent Critical Assessment of Structure Prediction (CASP)7 experiment, demonstrate new progress in automated ab initio model generation. The I-TASSER server is freely available for academic users .
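The benchmarks above are summarised by the Cα-RMSD between a model and the native structure, i.e. the square root of the mean squared distance between matched Cα atoms. A minimal sketch of that calculation follows. It assumes the two coordinate sets are already optimally superimposed and matched residue by residue; a real comparison would first apply a superposition step (for example the Kabsch algorithm), which is omitted here. The function name and the toy coordinates are illustrative.

```python
import math


def ca_rmsd(model, native):
    """RMSD (Å) between two pre-superimposed lists of (x, y, z) Cα coordinates."""
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(squared / len(model))


# Toy two-residue example: one atom matches, the other is displaced by 2 Å.
toy_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_native = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
toy_rmsd = ca_rmsd(toy_model, toy_native)
```

Because squared deviations are averaged, a few badly placed residues can dominate the value, which is one reason the abstracts also report how many targets fall below a 2.5 Å threshold.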
The I-TASSER server is an integrated platform for automated protein structure and function prediction based on the sequence-to-structure-to-function paradigm. Starting from an amino acid sequence, I-TASSER first generates three-dimensional atomic models from multiple threading alignments and iterative structural assembly simulations. The function of the protein is then inferred by structurally matching the 3D models with other known proteins. The output from a typical server run contains full-length secondary and tertiary structure predictions, and functional annotations on ligand-binding sites, Enzyme Commission numbers and Gene Ontology terms. An estimate of accuracy of the predictions is provided based on the confidence score of the modeling. This protocol provides new insights and guidelines for designing of on-line server systems for the state-of-the-art protein structure and function predictions. The server is available at http://zhang.bioinformatics.ku.edu/I-TASSER.
I-TASSER; protein structure prediction; protein function prediction
In a variety of threading methods, poorly ranked (low z-score) templates often have good alignments. Here, a new method, TASSER_low-zsc, that identifies these low z-score ranked templates to improve protein structure prediction accuracy is described. The approach consists of clustering of threading templates by affinity propagation on the basis of structural similarity (thread_cluster) followed by TASSER modeling, with final models selected using a TASSER_QA variant. To establish generality of the approach, templates provided by two threading methods, SP3 and SPARKS2, are examined. The SP3 and SPARKS2 benchmark datasets consist of 351 and 357 medium/hard proteins (those with moderate to poor quality templates and/or alignments) of length ≤ 250 residues, respectively. For SP3 medium and hard targets, using thread_cluster, the TM-scores of the best template improve by ~4% and ~9% over the original set (without low z-score templates), respectively; after TASSER modeling/refinement and ranking, the best model improves by ~7% and ~9% over the best model generated with the original template set. Moreover, TASSER_low-zsc generates 22% (43%) more foldable medium (hard) targets. Similar improvements are observed with low ranked templates from SPARKS2. The template clustering approach could be applied to other modeling methods that utilize multiple templates to improve structure prediction.
Structure prediction; threading; TASSER; tertiary structure
The I-TASSER algorithm for protein 3D structure prediction was tested in CASP8, with the procedure fully automated in both the Server and Human sections. The quality of the server models is close to that of human ones, but incorporating more diverse templates from other servers improves the results of human predictions in the distant homology category. For the first time, the sequence-based contact predictions from machine learning techniques are found helpful for both template-based modeling (TBM) and template-free modeling (FM). In TBM, although the average accuracy of the sequence-based contact predictions is lower than that from template-based ones, the novel contacts in the sequence-based predictions, which are complementary to the threading templates in the weakly aligned or unaligned regions, are important to improve the global and local packing of these regions. Moreover, the newly developed atomic structural refinement algorithm was tested in CASP8 and found to improve the hydrogen-bonding networks and the overall TM-score, which is mainly due to its ability to remove steric clashes so that the models can be generated from cluster centroids. Nevertheless, one of the major issues of the I-TASSER pipeline is model selection, where the best models could not be appropriately recognized when the correct templates are detected by only a minority of the threading algorithms. There are also problems related to domain splitting and mirror-image recognition, which mainly influence the performance of I-TASSER modeling in the FM-based structure predictions.
Protein structure prediction; threading; I-TASSER; CASP8; contact prediction; free modeling
The SUPERFAMILY database contains a library of hidden Markov models representing all proteins of known structure. The database is based on the SCOP ‘superfamily’ level of protein domain classification which groups together the most distantly related proteins which have a common evolutionary ancestor. There is a public server at http://supfam.org which provides three services: sequence searching, multiple alignments to sequences of known structure, and structural assignments to all complete genomes. Given an amino acid or nucleotide query sequence the server will return the domain architecture and SCOP classification. The server produces alignments of the query sequences with sequences of known structure, and includes multiple alignments of genome and PDB sequences. The structural assignments are carried out on all complete genomes (currently 59) covering approximately half of the soluble protein domains. The assignments, superfamily breakdown and statistics on them are available from the server. The database is currently used by this group and others for genome annotation, structural genomics, gene prediction and domain-based genomic studies.