Myosins form one of the largest protein superfamilies, with 24 classes. They share conserved structural features and catalytic domains, yet vary considerably in other domains, which gives rise to a variety of functions. Myosins drive many kinds of cellular processes and motility, up to the level of whole organisms. They are ATPases that use the chemical energy released by ATP hydrolysis to bring about the conformational changes that produce motor function. Myosins are involved in almost all cellular activities, from cell division to transcriptional regulation. They are also implicated in many congenital diseases characterized by muscular malfunction, cardiac disease, deafness, and neural and immunological dysfunction, many of which lead to death at an early age. We present Myosinome, a database of selected myosin classes (myosin II, V, and VI) from five model organisms. This knowledge base provides the sequences, phylogenetic clustering and domain architectures of myosins, together with molecular models, structural analyses and relevant literature for their coiled-coil domains. The current version of Myosinome presents information on 71 myosin sequences belonging to three myosin classes (myosin II, V, and VI) in five model organisms (Homo sapiens, Mus musculus, D. melanogaster, C. elegans and S. cerevisiae), identified through bioinformatics surveys. Several of these sequences have yet to be functionally characterized. Because these proteins are involved in congenital diseases, such a database would be useful for short-listing candidates for gene therapy and drug development. The database can be accessed from http://caps.ncbs.res.in/myosinome.
myosin; Myosinome; myosin II; myosin V; myosin VI; myosin database
3D domain swapping is an oligomerization process in which structural elements are exchanged between subunits. The mechanism has attracted the interest of many researchers because of its association with neurodegenerative diseases such as Alzheimer's disease and spongiform encephalopathy. Despite this biomedical relevance, the mechanism remains poorly understood. The search for the principles governing this curious phenomenon, which could enable its early prediction, provided the impetus for our bioinformatics studies.
A novel method, HIDE, has been developed to find non-domain-swapped homologues and to identify hinge regions in domain-swapped oligomers. Non-domain-swapped homologues were identified in the protein structure databank for the majority of the domain-swapped entries, and hinge boundaries could be recognised automatically by successive superposition. Various sequence and structural features of domain-swapped proteins and their related proteins have also been analysed.
The HIDE algorithm identified the hinge region in 83% of cases. Sequence and structural analyses of hinges and interfaces, comparing the domain-swapped and non-domain-swapped states, reveal amino acid preferences and specific residue conformations at hinge regions. Interactions differ significantly between regular dimeric interfaces and the interfaces formed at the site of domain swapping. Such preferences of residues, conformations and interactions could be of predictive value.
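HIDE finds hinges by successive superposition, and the details of that procedure are not given here. The sketch below is a minimal stand-in that avoids superposition altogether. It is a difference-distance scan of our own choosing, not the HIDE algorithm. It takes Cα coordinates of the same chain in two states and returns the split index that leaves both flanking segments closest to rigid. The residue at that index marks the putative hinge boundary.

```python
import math
from itertools import combinations


def segment_deviation(state_a, state_b, start, end):
    """Sum of squared changes in intra-segment CA-CA distances between two states."""
    total = 0.0
    for i, j in combinations(range(start, end), 2):
        total += (math.dist(state_a[i], state_a[j]) - math.dist(state_b[i], state_b[j])) ** 2
    return total


def find_hinge(state_a, state_b, min_len=3):
    """Return the index k that splits the chain into the two most rigid segments.

    Residues [0, k) and [k, n) are each assumed to move as a rigid body between
    the two states; k marks the putative hinge boundary (hypothetical heuristic).
    """
    n = len(state_a)
    best_k, best_score = None, math.inf
    for k in range(min_len, n - min_len + 1):
        score = (segment_deviation(state_a, state_b, 0, k)
                 + segment_deviation(state_a, state_b, k, n))
        if score < best_score:
            best_k, best_score = k, score
    return best_k


# Toy chain: the last four residues shift rigidly by 10 Å in the second state.
closed = [(3.8 * i, 0.0, 0.0) for i in range(8)]
swapped = [(3.8 * i, 0.0, 0.0) for i in range(4)] + [(3.8 * i, 10.0, 0.0) for i in range(4, 8)]
print(find_hinge(closed, swapped))  # 4
```

Real proteins would need a tolerance on the deviation and a check for more than one hinge. The sketch handles neither.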
Olfactory receptors are key components of olfactory signal transduction. Mutations in olfactory receptors alter the odor response, which is a fundamental response of organisms to their immediate environment. Understanding the relationship between odorant response and mutations in olfactory receptors is an important problem in bioinformatics and computational biology. In this work, we have systematically analyzed how various physical, chemical, energetic and conformational properties of amino acid residues relate to the change in odor response upon amino acid substitution. Odor response is expressed here as compound potency, measured as the half-maximal effective concentration (EC50).
We observed that both the characteristics of the odorant molecule (ligand) and the amino acid properties are important for odor response and EC50. Including information on the residues neighboring and surrounding the mutation site improved the correlation between amino acid properties and EC50. We then combined amino acid properties systematically using multiple regression and obtained correlations of 0.90-0.98 with the odor response/EC50 of goldfish, mouse and human olfactory receptors. In addition, we used machine learning methods to distinguish mutants that increase EC50 upon mutation from those that decrease it. These methods achieved accuracies of 93% and 79% in self-consistency and jack-knife tests, respectively.
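The two steps described above can be sketched briefly: combining amino acid properties by multiple regression, then evaluating the model with a jack-knife (leave-one-out) test. The property and EC50 values below are hypothetical placeholders. We also assume ordinary least squares via the normal equations, which may not be the exact regression form used in the study.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(features, targets):
    """Ordinary least squares via the normal equations; returns [intercept, w1, w2, ...]."""
    rows = [[1.0] + list(f) for f in features]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, feature):
    return coef[0] + sum(w * v for w, v in zip(coef[1:], feature))


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def jackknife_predictions(features, targets):
    """Leave-one-out: refit without each mutant, then predict that mutant."""
    preds = []
    for i in range(len(targets)):
        coef = fit_linear(features[:i] + features[i + 1:], targets[:i] + targets[i + 1:])
        preds.append(predict(coef, features[i]))
    return preds


# Hypothetical per-mutant property changes (e.g. hydrophobicity, volume) and log EC50 shifts.
props = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
log_ec50 = [0.5 + 2 * a - 1.5 * b for a, b in props]
r = pearson(jackknife_predictions(props, log_ec50), log_ec50)
```

Each jack-knife fold refits the model without one mutant and then predicts that mutant. The resulting correlation therefore reflects performance on unseen data rather than self-consistency.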
Our analysis provides insight into the odor response of olfactory receptor mutants, and the present method could be used to identify mutants with enhanced specificity.
Since the discovery of the complete repertoire of D. melanogaster Olfactory Receptors (ORs), candidate ORs have been identified in at least 12 insect species from four orders (Coleoptera, Lepidoptera, Diptera, and Hymenoptera), including species of economic or medical importance. All ORs share the same G-protein coupled receptor structure with seven transmembrane domains. Even so, they have poor sequence identity within and between species and have been identified mainly through genomic data analyses. To date, D. melanogaster remains the only insect species in which ORs have been studied extensively, from the establishment of expression patterns to functional investigations. These studies have confirmed several observations made in vertebrates: one OR type is selectively expressed in a subtype of olfactory receptor neurons, and each olfactory neuron expresses only one type of OR. The olfactory mechanism also appears to be conserved between insects and vertebrates. Understanding the function of insect ORs will contribute greatly to understanding insect chemical communication, particularly in agricultural pests and disease vectors, and could lead to strategies that reduce their negative effects. In this study, we propose molecular models for the insect olfactory co-receptor OR83b and its possible functional oligomeric states. We exploited the functional similarity of OR83b to GPCRs and ion channels to understand its structure. We observe that the C-terminal region (TM4-7) of OR83b is involved in homodimer formation and in heterodimer formation with OR22a. This may explain why the C-termini of insect ORs are highly conserved across species. Based on an analysis of the electrostatic distribution of the pore-forming domain, we also propose two possible ion channel pathways in OR83b. One is formed by the TM4-5 region with an intracellular pore-forming domain, and the other by TM5-6 with an extracellular pore-forming domain.
olfaction; homology modeling; ion channels; heterodimers; distant relationships
The myosin superfamily is a versatile group of molecular motors involved in the transport of specific biomolecules, vesicles and organelles in eukaryotic cells. Processive movement of myosins along actin filaments and the transport of intracellular ‘cargo’ are achieved by converting the chemical energy of ATP into physical force through appropriate conformational changes. A typical myosin has a head domain, which harbors an ATP binding site, an actin binding site and a light-chain-bound ‘lever arm’. The head is often followed by a coiled coil domain and a cargo binding domain. Myosins arose with the evolution of eukaryotes, and S. cerevisiae is the simplest organism known to contain these molecular motors. The coiled coil domains of myosin classes II, V and VI, surveyed across the whole genomes of several model organisms, differ in length and in the strength of interactions at the coiled coil interface. Myosin II sequences have long coiled coil regions that are predicted to form highly stable dimeric interfaces. These regions are, however, interrupted by segments predicted to be unstable. Such interruptions suggest possible alternate conformations, association into thick filaments, and interactions with other molecules. Myosin V sequences retain intermittent regions of strong and weak interactions, whereas myosin VI sequences are relatively devoid of strong coiled coil motifs. Structural deviations in coiled coil regions could be important for the normal biological function of these proteins.
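The coiled coil comparisons above rest on the heptad repeat (positions abcdefg), in which positions a and d usually carry hydrophobic residues that form the dimer interface. The hypothetical sketch below scores each of the seven registers by the fraction of a/d residues that are hydrophobic. It then reports the best score in each sliding window, so stable stretches show up as high values and interruptions as low ones. This crude heuristic is chosen only for illustration and is not the prediction method used in the study. The hydrophobic set and window length are assumptions.

```python
HYDROPHOBIC = set("LIVMF")  # assumed hydrophobic set for core positions


def register_score(seq, offset):
    """Fraction of residues at heptad positions a and d that are hydrophobic.

    Residue i sits at heptad position (i - offset) % 7, with a = 0 and d = 3.
    """
    core = [aa for i, aa in enumerate(seq) if (i - offset) % 7 in (0, 3)]
    if not core:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in core) / len(core)


def best_register(seq):
    """Return (offset, score) for the best-scoring of the seven registers."""
    return max(((o, register_score(seq, o)) for o in range(7)), key=lambda t: t[1])


def window_profile(seq, window=21):
    """Best register score in each sliding window (three heptads by default)."""
    return [best_register(seq[s:s + window])[1] for s in range(len(seq) - window + 1)]


segment = "IEELKKK" * 3 + "GPGPGPG" * 3  # toy: a heptad-rich stretch, then a breaker
profile = window_profile(segment)
```

In the toy segment, the first window scores 1.0 and the last scores 0.0. This mimics a stable stretch followed by an interruption.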
myosin structure; myosin domain architecture; coiled coil
Accurate structure-based sequence alignments of distantly related proteins are crucial for gaining insight into the protein domains that belong to a superfamily. The PASS2 database provides alignments of proteins that are related at the superfamily level and characterized by low sequence identity. We report PASS2.4, an automated, updated version of this superfamily alignment database. It consists of 1961 superfamilies and 10 569 protein domains in direct correspondence with the SCOP (1.75) database. This update focuses on database organization, on improved methods for efficient structure-based sequence alignment, and on the analysis of extremely distantly related proteins within superfamilies. These alignments also allow family-specific functional residues to be aligned, as shown for one example superfamily. The database of alignments and related features can be accessed at http://caps.ncbs.res.in/pass2/.
Cytoplasmic class XI myosins are the fastest processive motors known. This class drives high-velocity cytoplasmic streaming in various plant cells, from algae to angiosperms. Class XI myosins move about ten times faster than their closest homologues, the class V myosins.
We compared sequences from myosin classes V and XI by Evolutionary Trace (ET) analysis. The aim was to identify sequence determinants of this fast myosin and to provide a structural rationale for its molecular mechanism. The current study identifies class-specific residues of myosin XI spread over the actin binding site, the ATP binding site and the light-chain-binding neck region. Sequences for ET analysis were collected from six plant genomes using literature-based text searches and sequence searches. They were then subjected to triple validation: CDD search, string-based searches and phylogenetic clustering. Sequence searches identified nine myosin XI genes in sorghum and seven in grape. Each of these two plants also possesses one gene product belonging to myosin class VIII. During this process, we redefined the gene boundaries of three sorghum myosin XI genes using the fgenesh program.
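In its simplest form, the class-specific residues sought by Evolutionary Trace are alignment columns that are invariant within each class but differ between classes. The sketch below applies that rule to two hypothetical pre-aligned groups, one for class V and one for class XI. Full ET ranks columns over the partitions of a phylogenetic tree, so this two-group comparison is a simplification.

```python
def class_specific_columns(class_a, class_b):
    """Alignment columns invariant within each class but different between the classes.

    class_a, class_b: lists of equal-length aligned sequences (gaps as '-').
    """
    columns = []
    for col in range(len(class_a[0])):
        residues_a = {seq[col] for seq in class_a}
        residues_b = {seq[col] for seq in class_b}
        if (len(residues_a) == 1 and len(residues_b) == 1
                and residues_a != residues_b and "-" not in residues_a | residues_b):
            columns.append(col)
    return columns


# Hypothetical toy alignment fragments.
myosin_v = ["ACDKE", "ACDRF"]
myosin_xi = ["GCDKE", "GCSRF"]
print(class_specific_columns(myosin_v, myosin_xi))  # [0]
```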
Molecular modelling and subsequent analysis of putative interactions involving these class-specific residues suggest a structural basis for the molecular mechanism behind high velocity of plant myosin XI. We propose a model of a more flexible switch I region that contributes to faster ADP release leading to high velocity movement of the algal myosin XI.
Signaling mechanisms involving protein tyrosine phosphatases govern several cellular and developmental processes. These enzymes are regulated by several mechanisms which include variation in the catalytic turnover rate based on redox stimuli, subcellular localization or protein-protein interactions. In the case of Receptor Protein Tyrosine Phosphatases (RPTPs) containing two PTP domains, phosphatase activity is localized in their membrane-proximal (D1) domains, while the membrane-distal (D2) domain is believed to play a modulatory role. Here we report our analysis of the influence of the D2 domain on the catalytic activity and substrate specificity of the D1 domain using two Drosophila melanogaster RPTPs as a model system. Biochemical studies reveal contrasting roles for the D2 domain of Drosophila Leukocyte antigen Related (DLAR) and Protein Tyrosine Phosphatase on Drosophila chromosome band 99A (PTP99A). While D2 lowers the catalytic activity of the D1 domain in DLAR, the D2 domain of PTP99A leads to an increase in the catalytic activity of its D1 domain. Substrate specificity, on the other hand, is cumulative, whereby the individual specificities of the D1 and D2 domains contribute to the substrate specificity of these two-domain enzymes. Molecular dynamics simulations on structural models of DLAR and PTP99A reveal a conformational rationale for the experimental observations. These studies reveal that concerted structural changes mediate inter-domain communication resulting in either inhibitory or activating effects of the membrane distal PTP domain on the catalytic activity of the membrane proximal PTP domain.
Accurate functional annotation of protein sequences is hampered by two important factors. Sequence search methods can fail to identify relationships, and proteins related at low sequence similarity can be inherently diverse in function. Earlier, we used an intermediate sequence search approach to establish new domain relationships in the unassigned regions of gene products at the whole-genome level, taking the Mycoplasma gallisepticum genome as a specific example. In this paper, we report a detailed comparison of how well these newly predicted domains, and the domain architectures of the gene products that bear them, are conserved across 14 other Mycoplasma genomes. We also discuss the probable implications for these organisms. Some of the domain associations observed in Mycoplasma species that afflict humans and non-human primates are involved in regulating solute transport and DNA binding, which suggests specific modes of host-pathogen interaction.
Protein domains are the fundamental units of protein structure, function and evolution. Delineating the domains in a protein is therefore important for classification and for understanding its structure, function and evolution. Domains within a polypeptide chain, particularly at the genome scale, can be delineated in several ways, but delineation may remain problematic in many instances. It becomes difficult to identify the domain content of a sequence when the query has no homologues of experimentally determined structure and searches against sequence domain databases yield only insignificant matches. Identifying domains under low sequence identity and in the absence of structural homologues is therefore crucial, especially at the genomic scale.
We have developed a new method that identifies domains in unassigned regions through indirect connections. We scaled it up to analyse 434 unassigned regions in 726 protein sequences of the Mycoplasma gallisepticum genome. Through intermediate sequences, we established 71 new domain relationships and 63 probable new domain families in the unassigned regions. These represent an overall 10% increase in Pfam-A domain annotation over direct assignment in this genome.
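Linking an unassigned region to a known domain family through intermediate sequences can be pictured as a shortest-path search over significant hits. In this hypothetical sketch, `hits` is a precomputed mapping from each sequence to the sequences or family profiles it matches significantly. How those hits are generated is left outside the code and is an assumption, for example PSI-BLAST with some chosen E-value cut-off.

```python
from collections import deque


def find_connection(query, hits, families):
    """Shortest chain query -> intermediates -> known family, or None.

    hits: dict mapping a sequence ID to the IDs it hits significantly (hypothetical, precomputed).
    families: set of IDs that stand for known domain families.
    """
    parent = {query: None}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if node in families:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in hits.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Toy example: the unassigned region has no direct family hit.
hits = {"region_1": ["seq_A"], "seq_A": ["seq_B", "region_1"], "seq_B": ["PF_example"]}
path = find_connection("region_1", hits, {"PF_example"})
```

Breadth-first search returns the chain with the fewest intermediates. This is one reasonable preference when each extra link adds uncertainty.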
The systematic analysis of the unassigned regions in the Mycoplasma gallisepticum genome has provided some insight into the possible new domain relationships and putative new domain families. Further investigation of these predicted new domains may prove beneficial in improving the existing domain prediction algorithms.
Identifying the key players in the molecular mechanisms that mediate complex stress responses in plants is an important step towards developing improved, stress-tolerant crop varieties. Understanding the effects of different types of biotic and abiotic stress is a rapidly emerging area of plant research with this goal. A better understanding of the abiotic stress response will come from information on transcription factors and their binding sites, and on the functional annotation of proteins encoded by genes expressed during the response. Examples of such stresses include drought, cold, salinity, excess light, abscisic acid and oxidative stress. STIFDB is a database of abiotic stress-responsive genes and their predicted abiotic stress-responsive transcription factor binding sites in Arabidopsis thaliana. We integrated 2269 genes upregulated in different stress-related microarray experiments. We then surveyed their 1000 bp and 100 bp upstream regions and their 5′UTR regions using the STIF algorithm. The putative abiotic stress-responsive transcription factor binding sites identified in this way are compiled in STIFDB. STIFDB provides extensive information about various stress-responsive genes and stress-inducible transcription factors of Arabidopsis thaliana. It will be a useful resource for researchers seeking to understand the abiotic stress regulome and transcriptome of this important model plant.
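The STIF algorithm itself is not described here. The code below is a minimal sketch of the general task of scanning upstream regions for a known transcription factor binding site. It searches both strands of a sequence for an IUPAC-coded motif. The DRE/CRT core motif RCCGAC serves purely as an example, and the motif set and any scoring used by STIF are not reproduced.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_regex(motif):
    """Lookahead pattern so that overlapping occurrences are all reported."""
    return re.compile("(?=(" + "".join(IUPAC[base] for base in motif) + "))")


def scan_both_strands(seq, motif):
    """Return sorted (start, strand) hits; starts are 0-based on the forward strand."""
    seq = seq.upper()
    pattern = motif_regex(motif)
    n, k = len(seq), len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    revcomp = seq.translate(COMPLEMENT)[::-1]
    # A hit at index j of the reverse complement covers forward positions [n - j - k, n - j).
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(revcomp)]
    return sorted(hits)


DRE_CORE = "RCCGAC"  # example motif only
print(scan_both_strands("TTACCGACATTT", DRE_CORE))  # [(2, '+')]
print(scan_both_strands("TTGTCGGTTT", DRE_CORE))    # [(2, '-')]
```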
Disulphide bridges are well known to play key roles in the stability, folding and function of proteins. Introducing or deleting disulphides by site-directed mutagenesis has produced varying effects on stability and folding, depending on the protein and on the location of the disulphide in the 3-D structure. Given this incomplete understanding, it is worthwhile to analyse the extent to which disulphides are conserved in homologous proteins. We have also asked which structural interactions replace, in one homologue, a disulphide present in another.
We analysed a dataset of 34,752 pairwise comparisons of homologous protein domains from 300 protein domain families of known 3-D structure. From it, we provide a comprehensive analysis of how far disulphide bridges and their structural features are conserved. Only 54% of all the disulphide bonds compared between homologous pairs are conserved, although a small fraction of the non-conserved disulphides do occur in cytoplasmic proteins. Only about one fourth of the distinct disulphides are conserved in all members of a protein family. Disulphide conservation is common in many families, yet disulphide bond mutations are quite prevalent. Interestingly, there is no clear relationship between the sequence identity of two homologous proteins and disulphide bond conservation. We also examined structural features at sites where the cysteines forming a disulphide in one homologue are replaced by non-Cys residues in another. This analysis shows that losing a disulphide need not always be accompanied by stabilizing interactions between the equivalent residues.
We observe that disulphide bonds in homologous proteins are conserved only to a modest extent. Very interestingly, the extent of disulphide conservation is unrelated to the overall sequence identity between homologues. The non-conserved disulphides are often associated with variable structural features that may have been recruited for the differentiation or specialisation of protein function.
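Disulphide conservation between two homologues can be checked by mapping each bonded cysteine pair onto alignment columns. A disulphide counts as conserved if the same column pair is bonded in both proteins. The sketch below does this for one hypothetical pairwise alignment, with 0-based residue indices. We report shared bonds as a fraction of all distinct bonds in the pair. This counting rule is our assumption and not necessarily the definition used in the analysis.

```python
def residue_to_column(aligned):
    """Map 0-based residue index -> alignment column for one gapped sequence."""
    mapping, residue = {}, 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            mapping[residue] = col
            residue += 1
    return mapping


def conserved_fraction(aln_a, bonds_a, aln_b, bonds_b):
    """Fraction of distinct disulphides (as column pairs) present in both homologues."""
    col_a, col_b = residue_to_column(aln_a), residue_to_column(aln_b)
    pairs_a = {frozenset((col_a[i], col_a[j])) for i, j in bonds_a}
    pairs_b = {frozenset((col_b[i], col_b[j])) for i, j in bonds_b}
    union = pairs_a | pairs_b
    if not union:
        return None
    return len(pairs_a & pairs_b) / len(union)


# Hypothetical pair: protein B replaces one bonded Cys of protein A by Ser.
print(conserved_fraction("CACKCAC", [(0, 2), (4, 6)], "CAC-SAC", [(0, 2)]))  # 0.5
```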
Serine proteases are one of the most abundant groups of proteolytic enzymes and are found in all kingdoms of life. Studies have established significant roles for many prokaryotic serine proteases in physiological processes such as metabolism, cell signalling, defense response and development. Even so, the functions of a large number of prokaryotic serine proteases remain largely unknown. The current analysis aims to understand the distribution and probable biological functions of selected serine proteases encoded in representative prokaryotic organisms.
A total of 966 putative serine proteases, belonging to five families, were identified in the 91 prokaryotic genomes using various sensitive sequence search techniques. Phylogenetic analysis reveals several species-specific clusters of serine proteases suggesting their possible involvement in organism-specific functions. Atypical phylogenetic associations suggest an important role for lateral gene transfer events in facilitating the widespread distribution of the serine proteases in the prokaryotes. Domain organisations of the gene products were analysed, employing sensitive sequence search methods, to infer their probable biological functions. Trypsin, subtilisin and Lon protease families account for a significant proportion of the multi-domain representatives, while the D-Ala-D-Ala carboxypeptidase and the Clp protease families are mostly single-domain polypeptides in prokaryotes. Regulatory domains for protein interaction, signalling, pathogenesis, cell adhesion etc. were found tethered to the serine protease domains. Some domain combinations (such as S1-PDZ; LON-AAA-S16 etc.) were found to be widespread in the prokaryotic lineages suggesting a critical role in prokaryotes.
Domain architectures of many serine proteases and their homologues identified in prokaryotes are very different from those observed in eukaryotes, suggesting distinct roles for serine proteases in prokaryotes. Many domain combinations were found to be unique to specific prokaryotic species, suggesting functional specialisation in various cellular and physiological processes.
Structural motifs are important for the integrity of a protein fold and can be used to design and rationalize protein engineering and folding experiments. Such segments represent the conserved core of a family or superfamily and can be crucial for recognizing potential new members in sequence and structure databases. We present MegaMotifBase, a database that compiles important structural segments, or motifs, of protein structures. Motifs are recognized on the basis of both sequence conservation and the preservation of important structural features. These features include amino acid preference, solvent accessibility, secondary structure, hydrogen-bonding pattern and residue packing. The database provides the 3D orientation patterns of the identified motifs in terms of inter-motif distances and torsion angles. Applications of structural motifs are also provided in several areas, such as sequence and structure similarity searches, multiple sequence alignment and homology modeling. MegaMotifBase can be a useful resource for understanding the structure-function relationships of proteins. The database can be accessed from the URL http://caps.ncbs.res.in/MegaMotifbase/index.html
Odorant binding proteins (OBPs) are believed to shuttle odorants from the environment to the underlying odorant receptors, for which they could potentially serve as odorant presenters. Although several sequence based search methods have been exploited for protein family prediction, less effort has been devoted to the prediction of OBPs from sequence data and this area is more challenging due to poor sequence identity between these proteins.
In this paper, we propose a new algorithm that predicts odorant-binding proteins using a Regularized Least Squares Classifier (RLSC) together with multiple physicochemical properties of amino acids. We applied the algorithm to a dataset derived from the Pfam and GenDiS databases and obtained an overall prediction accuracy of 97.7%. The accuracy was 94.5% for the positive class and 98.4% for the negative class.
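A regularized least squares classifier can be trained by solving the linear system (K + λnI)c = y. Here K is a kernel matrix over the training feature vectors and y holds ±1 labels. New sequences are then classified by the sign of the kernel expansion. The sketch below assumes a Gaussian (RBF) kernel and precomputed two-dimensional property vectors. The kernel, γ, λ and the toy data are illustrative assumptions, not the settings used in the study.

```python
import math


def rbf(x, z, gamma=1.0):
    """Gaussian kernel between two property vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def solve(a, b):
    """Gaussian elimination with partial pivoting for a x = b."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_rlsc(xs, ys, lam=0.01, gamma=1.0):
    """Solve (K + lam * n * I) c = y for the expansion coefficients c."""
    n = len(xs)
    k = [[rbf(xi, xj, gamma) + (lam * n if i == j else 0.0) for j, xj in enumerate(xs)]
         for i, xi in enumerate(xs)]
    return solve(k, [float(y) for y in ys])


def classify(coef, xs, x, gamma=1.0):
    """+1 (predicted OBP) or -1 (non-OBP) from the sign of the kernel expansion."""
    score = sum(c * rbf(xi, x, gamma) for c, xi in zip(coef, xs))
    return 1 if score >= 0 else -1


# Hypothetical two-property vectors for three OBPs (+1) and three non-OBPs (-1).
train_x = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (-1.0, -1.1), (-0.8, -1.0), (-1.2, -0.9)]
train_y = [1, 1, 1, -1, -1, -1]
coef = train_rlsc(train_x, train_y)
```

Training reduces to a single linear solve, which keeps RLSC simple compared with iterative classifiers. The cost grows cubically with the number of training sequences.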
Our study suggests that RLSC is potentially useful for predicting odorant binding proteins from sequence-derived properties, irrespective of sequence similarity. Our method correctly predicts 92.8% of 56 odorant binding proteins that are non-homologous to any protein in the Swiss-Prot database. It also correctly predicts 97.1% of the 414 proteins in an independent dataset. These results support the usefulness of RLSC for predicting odorant binding proteins from sequence information.
Bcl-2 family proteins are key regulators of mitochondrial integrity and comprise both pro- and anti-apoptotic proteins. Bax, a pro-apoptotic member, localizes as monomers in the cytosol of healthy cells and accumulates as oligomers in the mitochondria of apoptotic cells. The Bcl-2 homology-3 (BH3) domain regulates interactions within the family, but regions other than BH3 are also critical for Bax function. For instance, the N-terminus has been variously implicated in targeting to mitochondria, in interactions with BH3-only proteins and in conformational changes linked to Bax activation. The transmembrane (TM) domains of Bax are implicated in localization to mitochondria and in triggering cytotoxicity. These are the α5-α6 helices in the core and the α9 helix at the C-terminus. Here we have investigated how the N-terminus modulates TM function in the context of regulation by the anti-apoptotic protein Bcl-xL.
Deletion of 29 amino acids in the Bax N-terminus (Bax 30–192) caused constitutive accumulation at mitochondria and triggered high levels of cytotoxicity that were not inhibited by Bcl-xL. Removal of the TM domains (Bax 30–105) abrogated mitochondrial localization. However, it resulted in Bcl-xL-regulated activation of endogenous Bax and in Bax-Bak-dependent apoptosis. Including the α5-α6 helices/TM1 domain (Bax 30–146) phenocopied Bax 30–192. This construct restored mitochondrial localization and Bcl-xL-independent cytotoxicity, and did not depend on endogenous Bax-Bak. Inhibition of function and localization by Bcl-xL was restored in Bax 1–146, which included the TM1 domain. Regardless of regulation by Bcl-xL, all N-terminal deletion constructs immunoprecipitated Bcl-xL and converged on caspase-9-dependent apoptosis. This is consistent with mitochondrial involvement in the apoptotic cascade. Sub-optimal sequence alignments of Bax and Bcl-xL indicated sequence similarity between the α5–α6 helices of Bax and Bcl-xL. Alanine substitutions of three residues in the N-terminus (T14A-S15A-S16A; Bax-Ala3) attenuated regulation by the serine-threonine kinase Akt/PKB but not by Bcl-xL, indicative of distinct regulatory mechanisms.
Collectively, the analysis of Bax deletion constructs indicates that the N-terminus drives conformational changes facilitating inhibition of cytotoxicity by Bcl-xL. We speculate that the TM1 helices may serve as 'structural antagonists' for BH3-Bcl-xL interactions, with this function being regulated by the N-terminus in the intact protein.
Serine proteases are one of the largest groups of proteolytic enzymes found across all kingdoms of life and are associated with several essential physiological pathways. The availability of Arabidopsis thaliana and rice (Oryza sativa) genome sequences has permitted the identification and comparison of the repertoire of serine protease-like proteins in the two plant species.
Despite the differences in genome size between Arabidopsis and rice, we identified a very similar number of serine protease-like proteins in the two plant species (206 and 222, respectively). Nearly 40% of these sequences were identified as potential orthologues. Atypical members of the Deg, Clp, Lon and rhomboid proteases could be identified in the plant genomes. Species-specific members were observed for the highly populated subtilisin and serine carboxypeptidase families, suggesting multiple lateral gene transfers. DegP proteases, prolyl oligopeptidases, Clp proteases and rhomboids show a significantly higher percentage of orthology between the two genomes. This indicates that substantial evolutionary divergence had been established prior to speciation. Several putative subtilisins, serine carboxypeptidases and rhomboids have single-domain architectures and paralogues. This suggests that they may have been recruited for additional roles in secondary metabolism, with spatial and temporal regulation. The analysis reveals some domain architectures unique to one or both of the plant species. It also reveals some inactive proteases, such as certain rhomboids and Clp proteases, that could be involved in chaperone function.
The systematic analysis of the serine protease-like proteins in the two plant species has provided some insight into the possible functional associations of previously uncharacterised serine protease-like proteins. Further investigation of these aspects may prove beneficial in our understanding of similar processes in commercially significant crop plant species.
Owing to high evolutionary divergence, it is not always possible to identify distantly related protein domains by sequence search techniques. Intermediate sequences possess sequence features of more than one protein and facilitate the detection of remotely related proteins. We have recently demonstrated Cascade PSI-BLAST, in which PSI-BLAST is run for many ‘generations’ and searches are also initiated from newly found homologues. Such rigorous propagation through generations of PSI-BLAST effectively exploits intermediates to detect distant similarities between proteins. This approach has been tested on a large number of folds. Its performance in detecting superfamily-level relationships is ∼35% better than that of simple PSI-BLAST searches. We present a web server for this search method that permits users to perform Cascade PSI-BLAST searches against the Pfam, SCOP and SwissProt databases. The URL for this server is .
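The generational logic of Cascade PSI-BLAST, in which every new homologue seeds the next round of searches, can be sketched as follows. The `search` callable is a hypothetical stand-in for a PSI-BLAST run and is implemented here as a dictionary lookup. The real server's databases, E-value thresholds and iteration settings are not modelled.

```python
def cascade_search(query, search, max_generations=3):
    """Return {hit: generation first found}; every new hit seeds the next generation.

    search: callable returning the significant hits for one query (hypothetical
    stand-in for a PSI-BLAST run).
    """
    found = {}
    seen = {query}
    frontier = [query]
    for generation in range(1, max_generations + 1):
        next_frontier = []
        for q in frontier:
            for hit in search(q):
                if hit not in seen:
                    seen.add(hit)
                    found[hit] = generation
                    next_frontier.append(hit)
        if not next_frontier:
            break
        frontier = next_frontier
    return found


# Toy hit lists standing in for PSI-BLAST results.
toy_hits = {"query": ["a", "b"], "a": ["c"], "b": ["a"], "c": ["d"], "d": []}
result = cascade_search("query", lambda q: toy_hits.get(q, []))
```

Only sequences found for the first time are searched again, so each homologue seeds at most one search and the cascade terminates. Here `d` is reached only in the third generation, which illustrates how intermediates extend the reach of the original query.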
Protein structure validation is an important step in computational modeling and structure determination. Stereochemical assessment of protein structures examines internal parameters such as bond lengths and Ramachandran (φ,ψ) angles. Some procedures can produce incorrect models owing to misfolds or to mistracing of electron density maps. These include gross structure prediction methods such as inverse folding, and structure determination, especially at low resolution. Such errors are not reflected as strain in internal parameters. HARMONY is a procedure that examines the compatibility between the sequence and the structure of a protein. It assigns scores to individual residues and their amino acid exchange patterns after considering their local environments. Local environments are described by backbone conformation, solvent accessibility and hydrogen bonding pattern. We now provide HARMONY through a web server, to which users can submit their protein structure files and, if required, an alignment of homologous sequences. Scores are mapped onto the structure for subsequent examination, which also helps to recognize regions of possible local errors in protein structures. HARMONY server is located at
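The residue-level scoring described above can be illustrated with an environment-specific lookup. In this hypothetical sketch, each residue's local environment is reduced to three features: secondary structure, a burial class and a hydrogen-bond flag. Scores come from a small made-up table, and windows whose mean score falls below a threshold are flagged as possible local errors. HARMONY's actual environment definitions and substitution tables are not reproduced.

```python
def environment(ss, rel_accessibility, side_chain_hbond):
    """Hypothetical environment class: secondary structure, burial, H-bond flag."""
    burial = "buried" if rel_accessibility < 0.2 else "exposed"  # assumed 20% cut-off
    return (ss, burial, side_chain_hbond)


def residue_scores(seq, envs, table, default=0.0):
    """Look up a compatibility score for each residue in its environment."""
    return [table.get(env, {}).get(aa, default) for aa, env in zip(seq, envs)]


def flag_low_windows(scores, window=3, threshold=0.0):
    """Start indices of windows whose mean score is below the threshold."""
    return [s for s in range(len(scores) - window + 1)
            if sum(scores[s:s + window]) / window < threshold]


# Made-up scores: Leu fits buried helix positions, Lys fits exposed ones.
table = {("H", "buried", False): {"L": 1.0, "K": -1.5},
         ("H", "exposed", False): {"L": -0.5, "K": 0.8}}
envs = [environment("H", 0.1, False)] * 3 + [environment("H", 0.6, False)] * 2
scores = residue_scores("LKKKK", envs, table)
print(flag_low_windows(scores))  # [0, 1]
```

The flagged windows cover the lysines placed in buried helical positions. This kind of sequence-structure mismatch does not show up as strain in bond lengths or angles.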
Identifying the conserved residues that represent a protein family is crucial for a clearer understanding of biological function and for better recognition of additional members in sequence databases. Functionally important residues are readily recognized owing to their high degree of conservation in closely related sequences, and they are annotated in functional motif databases. Structural motifs are central to the integrity of the fold and require careful analysis for their identification. We report the availability of a database of spatially interacting motifs. It covers motifs within single protein structures as well as motifs shared among distantly related protein structures that belong to a superfamily. Spatial interactions among conserved motifs are measured automatically using sequence similarity scores and distance calculations. Interactions between pairs of conserved motifs are described in the form of pseudoenergies. The iMOTdb database provides information on 854 488 motifs corresponding to 60 849 protein structural domains and 22 648 protein structural entries.
Establishment of similarities between proteins is very important for the study of the relationship between sequence, structure and function and for the analysis of evolutionary relationships. Motif-based search methods play a crucial role in establishing the connections between proteins that are particularly useful for distant relationships. This paper reports SCANMOT, a web-based server that searches for similarities between proteins by simultaneous matching of multiple motifs. SCANMOT searches for similar sequences in entire sequence databases using multiple conserved regions and utilizes inter-motif spacing as restraints. The SCANMOT server is available via .
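Simultaneous matching of multiple motifs under inter-motif spacing restraints can be sketched as a search for ordered motif hits whose gaps fall within allowed ranges. Here motifs are plain regular expressions, and each gap is counted from the end of one motif to the start of the next. SCANMOT's actual motif representation and scoring are not reproduced, and the motifs in the example are arbitrary.

```python
import re


def motif_hits(seq, pattern):
    """All (start, end) occurrences of a regex motif, including overlapping ones."""
    rx = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.start() + len(m.group(1))) for m in rx.finditer(seq)]


def scan_motifs(seq, motifs, spacings):
    """Ordered hits for every motif with gaps inside the allowed ranges.

    spacings[i] = (min_gap, max_gap) between the end of motif i and the start of motif i + 1.
    """
    hits = [motif_hits(seq, p) for p in motifs]
    matches = []

    def extend(idx, chain):
        if idx == len(motifs):
            matches.append(list(chain))
            return
        for start, end in hits[idx]:
            if chain:
                low, high = spacings[idx - 1]
                if not low <= start - chain[-1][1] <= high:
                    continue
            chain.append((start, end))
            extend(idx + 1, chain)
            chain.pop()

    extend(0, [])
    return matches


seq = "AAGKSAAAAHRDAAADFGAA"
print(scan_motifs(seq, ["GKS", "HRD", "DFG"], [(2, 6), (1, 5)]))
# [[(2, 5), (9, 12), (15, 18)]]
```

Requiring all motifs together with their spacings is far more selective than matching any one motif alone. This selectivity is what makes multi-motif searches useful for distant relationships.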
Functional selection and three-dimensional structural constraints of proteins relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor overall sequence identity and evolutionary pressures. We report the availability of ‘iMOT’ (interacting MOTif) server, an interactive package for the automatic identification of spatially interacting motifs among distantly related proteins sharing similar folds and possessing common ancestral lineage. Spatial interactions between conserved stretches of a protein are evaluated by calculations of pseudo-potentials that describe the strength of interactions. Such an evaluation permits the automatic identification of highly interacting conserved regions of a protein. Interacting motifs have been shown to be useful in searching for distant homologues and establishing remote homologies among the largely unassigned sequences in genome databases. Information on such motifs should also be of value in protein folding, modelling and engineering experiments. The iMOT server can be accessed from http://www.ncbs.res.in/~faculty/mini/imot/iMOTserver.html. Supplementary Material can be accessed from: http://www.ncbs.res.in/~faculty/mini/imot/supplementary.html.
DSDBASE is a database of disulphide bonds in proteins that provides information on native disulphides and on those that are stereochemically possible between pairs of residues in all known protein structural entries. Disulphides have been modelled with MODIP by identifying residue pairs that can accommodate a covalent cross-link without strain. We also assess the stereochemical quality of each modelled cross-link and grade it accordingly. One potential use of the database is to design site-directed mutants that enhance the thermal stability of a protein. The proposed mutation sites can be viewed specifically with respect to enzyme active sites and across physiological dimers. Including both native and modelled disulphides enlarges the database considerably. The database can also be used to propose three-dimensional models of disulphide-rich short polypeptides. The database can be accessed from http://www.ncbs.res.in/~faculty/mini/dsdbase/dsdbase.html. Supplementary information can be accessed from http://www.ncbs.res.in/~faculty/mini/dsdbase/nar/suppl.htm.
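One plausible first step in proposing stereochemically possible disulphides is finding residue pairs that are close enough to be bridged. The sketch below is a simplified stand-in for that step only, assuming C-alpha coordinates are available per residue. The 4.0 to 7.0 Å window and the minimum sequence separation of 3 are assumed illustrative values, not MODIP's criteria, and MODIP's modelling of the sulphur positions and its quality grading are not reproduced. The function name is hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def candidate_disulphide_pairs(
    ca_coords: Dict[int, Coord],
    min_dist: float = 4.0,   # assumed lower C-alpha distance bound (angstroms)
    max_dist: float = 7.0,   # assumed upper C-alpha distance bound (angstroms)
    min_separation: int = 3,  # assumed minimum gap in residue numbering
) -> List[Tuple[int, int, float]]:
    """Residue pairs whose C-alpha separation falls in [min_dist, max_dist].

    This is only a coarse geometric pre-filter: it does not build side chains
    or check disulphide stereochemistry the way MODIP does."""
    pairs = []
    for i, j in combinations(sorted(ca_coords), 2):
        if j - i < min_separation:  # skip residues too close in sequence
            continue
        d = math.dist(ca_coords[i], ca_coords[j])
        if min_dist <= d <= max_dist:
            pairs.append((i, j, d))
    return pairs


# Made-up coordinates keyed by residue number.
toy = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 10: (-5.5, 0.0, 0.0), 20: (0.0, 20.0, 0.0)}
print(candidate_disulphide_pairs(toy))
```

On this made-up input only residues 1 and 10, 5.5 Å apart, pass the filter. Pairs that survive such a filter would still need full stereochemical checks before being proposed as mutation sites.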
PASS2 is a nearly automated version of CAMPASS and contains sequence alignments of proteins grouped at the level of superfamilies. The database has been created to correspond with the SCOP database (release 1.53) and currently consists of 110 multi-member superfamilies and 613 single-member superfamilies. In multi-member superfamilies, only protein chains sharing no more than 25% sequence identity have been considered for the alignment, and hence the database aims to address sequence alignments that represent 26 219 protein domains in SCOP release 1.53. Structure-based sequence alignments have been obtained with COMPARER; the initial equivalences are provided automatically from a MALIGN alignment and subsequently augmented using STAMP4.0. The final sequence alignments have been annotated for structural features using JOY4.0. Links are provided to other related databases and to genome sequence relatives. The availability of reliable sequence alignments of distantly related proteins, despite poor sequence identity, together with single-member superfamilies, permits better sampling of structures in libraries for fold recognition of new sequences and a better understanding of the structure–function relationships of individual superfamilies. The database can be queried by keywords and by sequence search through a PSI-BLAST interface. Structure-annotated sequence alignments and several structural accessory files can be retrieved for all superfamilies, including alignments with the user-input sequence. The database can be accessed from http://www.ncbs.res.in/%7Efaculty/mini/campass/pass.html.
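The 25% identity threshold used for multi-member superfamilies can be illustrated with a greedy filter. This is a sketch under assumptions: chains are supplied already aligned to a common length, identity is counted over gap-free columns, and chains are considered in the given order. The abstract does not say how PASS2 actually chooses representatives, and the function names and toy chains are hypothetical.

```python
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    same = sum(1 for x, y in columns if x == y)
    return 100.0 * same / len(columns)


def select_low_identity(
    aligned: List[Tuple[str, str]], max_identity: float = 25.0
) -> List[str]:
    """Greedily keep each chain unless it exceeds `max_identity` percent
    identity to a chain that has already been kept."""
    kept: List[Tuple[str, str]] = []
    for name, seq in aligned:
        if all(percent_identity(seq, other) <= max_identity for _, other in kept):
            kept.append((name, seq))
    return [name for name, _ in kept]


chains = [("d1", "ACDEFGHIKL"), ("d2", "ACDEFGHIKM"), ("d3", "WWWWWWWWWW")]
print(select_low_identity(chains))
```

Here `d2` is 90% identical to `d1` and is dropped, so the call returns `['d1', 'd3']`. Because the filter is greedy, a different input order can yield a different, equally valid set.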
Members of a superfamily of proteins could result from divergent evolution of homologues with insignificant similarity in their amino acid sequences. A superfamily relationship is commonly detected after the three-dimensional structures of the proteins are determined by X-ray analysis or NMR. The SUPFAM database described here relates two homologous protein families in a multiple sequence alignment database of either known or unknown structure. The present release (1.1), which is the first version of the SUPFAM database, has been derived by analysing Pfam, one of the commonly used databases of multiple sequence alignments of homologous proteins. The first step in establishing SUPFAM is to relate Pfam families to the families in PALI, an alignment database of homologous proteins of known structure that is derived largely from SCOP. The second step involves relating to one another the Pfam families that could not be associated reliably with a protein superfamily of known structure. The profile-matching procedure IMPALA was used in both steps. The first step identified 1280 Pfam families (out of 2697, i.e. 47%) that are related to a SCOP family, either by a close homologous connection or by a distant relationship, potentially forming new superfamily connections. Using the profiles of the 1417 Pfam families with apparently no structural information, an all-against-all comparison involving sequence-profile matching with IMPALA clustered 67 homologous Pfam families into 28 potential new superfamilies. Expanding groups of related proteins of as yet unknown structure, as proposed in SUPFAM, should help in identifying ‘priority proteins’ for structure determination in structural genomics initiatives, and so expand the coverage of structural information in protein sequence space.
For example, we could assign 858 distinct Pfam domains in 2203 of the gene products in the genome of Mycobacterium tuberculosis. Fifty-one of these Pfam families of unknown structure could be clustered into 17 potentially new superfamilies, which form good targets for structural genomics. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam.
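Clustering Pfam families into potential new superfamilies from an all-against-all profile comparison can be sketched as single-linkage grouping with a union-find structure. The input format (pairs of family names with an E-value), the 1e-4 cutoff and the family names are assumptions for illustration, not IMPALA's output format or SUPFAM's thresholds.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_families(
    families: Iterable[str],
    hits: Iterable[Tuple[str, str, float]],
    evalue_cutoff: float = 1e-4,  # assumed significance threshold
) -> List[List[str]]:
    """Group families linked by any hit at or below `evalue_cutoff`
    (single linkage); return only groups with two or more members."""
    parent: Dict[str, str] = {f: f for f in families}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue <= evalue_cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for f in parent:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical family names and E-values.
families = ["famA", "famB", "famC", "famD", "famE"]
hits = [("famA", "famB", 1e-6), ("famB", "famC", 1e-5), ("famD", "famE", 0.5)]
print(cluster_families(families, hits))
```

With these toy hits, `famA`, `famB` and `famC` form one cluster through shared links, while `famD` and `famE` stay apart because their match is above the cutoff. Single linkage can chain together families that never match each other directly, so a stricter rule (for example, requiring reciprocal matches) may be preferable in practice.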