|Home | About | Journals | Submit | Contact Us | Français|
Detailed structural analysis of protein necessitates investigation at primary, secondary and tertiary levels, respectively. Insight into protein secondary structures pave way for understanding the type of secondary structural elements involved (α-helices, β-strands etc.), the amino acid sequence that encode the secondary structural elements, number of residues, length and, percentage composition of the respective elements in the protein. Here we present a standalone tool entitled “ExSer” which facilitate an automated extraction of the amino acid sequence that encode for the secondary structural regions of a protein from the protein data bank (PDB) file.
ExSer is freely downloadable from http://code.google.com/p/tool-exser/
Protein structures are determined by experimental methods which include NMR, Xray and Electronic Microscopy. The experimentally solved protein structures are archived at the publicly available databases such as, Protein Data Bank (PDB) . Protein Data Bank (PDB) is an internationally referred protein structure database, it comprises of the atomic coordinates of the three dimensional (3D) structure of proteins . The coordinate file is organized in a common file format, called the PDB format . Amino acid sequence, secondary structural information and atom positions are organized systematically in the PDB file enabling machine learning . Many molecular visualization tools (RasMol, JMol, PyMOL etc.) have been developed based on the PDB format leading to visualization of the 3D structure of proteins from the atomic coordinates. The information pertaining to the fold class at the tertiary level and the arrangement of secondary structural elements (motifs) at the lower levels in the structural hierarchy could directly be inferred from the 3D structure. However,information that are embedded in the PDB file such as the amino acid sequence of the secondary structural elements viz., αhelices and βstrands, is of significant interest to the scientific community. Knowing the amino acid sequence of the secondary structures throw light on the biochemical and structural properties of the residues which inturn governs the formation of αhelices and βstrands in proteins. It furthers understanding on the relationship between proteins and their possible evolutionary origins, the relationship between the amino acid sequence and the tertiary structure and, structurefunction analysis. To the best of our understanding, the amino acid sequence is manually extracted based on the identifier(s)that are associated with the secondary structural elements in PDB format. Towards an automated extraction of the amino acid sequence encoding the secondary structural regions, we developed a standalone tool entitled ExSer (Extract the amino acid Sequences of secondary structures). ExSer decipher the amino acid sequence encoding the secondary structural elements (helices and strands), number of residues, position and their percentage composition, given the PDB coordinate file as query.
ExSer was developed using Practical Extraction and Report Language (PERL)  and Tool kit (Tk) . PERL is used for scripting and Tk for designing graphical user interface (GUI). ExSer provide the option to either select a PDB file via the browse button or to paste the structural information in the PDB format. Firstly, the tool extract the entire amino acid sequence of the polypeptide chain(s) based on the identifier “SEQRES” defined in the PDB format (latest ver. 3.20) . Simultaneously it picks up the start and end positions of the helices and strands using “HELIX” and “SHEET” identifiers mentioned in the PDB file. Finally, based on the above positions the amino acid sequences of helices and strands were extracted from the PDB coordinate file (Figure 1). The ExSer output comprise of, the complete amino acid sequence of the polypeptide chain(s) (likely A, B, C etc.) that make the protein structure, the amino acid sequences of secondary structural regions (helices and strands), the number of residues involved, the start and end positions of the secondary structural regions and their percentage composition with respect to the total protein. Subclassification in the helices such as righthanded alpha, omega, pi, gamma, 310 and lefthanded alpha,omega,gamma etc., finds mention in the output. Similarly, the parallel and antiparallel nature of the strands with respect to the previous strand is indicated. The extracted amino acid sequence finally appears in the FASTA format as output and can be saved in a separate text file, as well.
ExSer had been validated with more than 300 coordinate files downloaded from PDB. Since, turns finds no explicit mention in the PDB format (ver. 3.20), the tool cannot retrieve the amino acid sequence corresponding to the turns in protein structure. This tool can work on all windows operating systems including windows vista. In future, ExSer will be available for Linux and Macintosh operating systems.
This tool was developed at the Bioinformatics centre, Tamil Nadu Agricultural University with the financial support of the Department of Biotechnology (DBT), Government of India under the Biotechnology Information System (BTIS) program.
Citation:Vignesh et al, Bioinformation 4(9): 390-391 (2010)