The present study focused on sequence, structural and
comparative genomics analysis of the M. genitalium hypothetical
protein MG_237. ProtParam was used to analyze different
physicochemical properties from the amino acid sequence. The
hypothetical protein MG_237 was predicted to be 294 amino
acids long, with a molecular weight of 34572.1 Daltons and an
isoelectric point of 7.69. An isoelectric point above 7 indicates a
protein that carries a net positive charge at neutral pH, and an instability index of 28.33
(below the ProtParam stability threshold of 40) suggests a stable protein. The negative GRAVY index of -0.235
is indicative of a hydrophilic and soluble protein. The protein
sequence was found to be rich in the amino acid leucine,
suggesting a preference for alpha-helices in 3D structure.
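As a rough illustration of two of the sequence-derived properties discussed above, the following minimal sketch computes a GRAVY value (the mean Kyte-Doolittle hydropathy over all residues) and the percentage composition of a sequence. The function names and the toy sequence are assumptions made for illustration; the toy sequence is not MG_237, and the values reported in this study came from ProtParam, not from this code.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: sum of residue values / sequence length."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def composition_percent(sequence):
    """Percentage of each residue type present in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


if __name__ == "__main__":
    toy_sequence = "MKLLEELLKKAGLS"  # hypothetical example, not MG_237
    print("GRAVY:", round(gravy(toy_sequence), 3))
    print("Leucine %:", round(composition_percent(toy_sequence)["L"], 1))
```

A negative result from `gravy` corresponds to a net hydrophilic sequence, which is the interpretation applied to MG_237 above.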
Secondary structure analysis was performed using ESPript,
and the protein was predicted to contain several
helices, consistent with the ProtParam results. The
high proportion of helices in the structure may make the protein
more flexible during folding, which might in turn increase protein
interactions. Subcellular localization is a key functional attribute
of a protein. Cellular functions are often localized in specific
compartments; therefore, predicting the subcellular localization
of unknown proteins could be used to obtain useful information
about their functions, and to select proteins for further study.
Moreover, studying the subcellular localization of proteins is
also helpful in understanding disease mechanisms and
developing novel drugs [22]. The consensus protein subcellular
localization predictions suggest that MG_237 is a cytoplasmic
protein.
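A consensus call of this kind can be sketched as a simple majority vote over the labels returned by several predictors. The predictor names and labels below are hypothetical placeholders, and the tie-handling rule is an assumption; the sketch does not reproduce the tools or criteria actually used in this study.

```python
from collections import Counter


def consensus_call(predictions):
    """Return (label, votes) for the most frequent label.

    `predictions` maps a predictor name to its predicted compartment.
    If the top two labels are tied, (None, votes) is returned so that
    the ambiguity stays visible instead of being resolved arbitrarily.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    ranked = Counter(predictions.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None, ranked[0][1]
    return ranked[0]


if __name__ == "__main__":
    # Hypothetical predictor outputs, for illustration only.
    example = {
        "predictor_a": "cytoplasmic",
        "predictor_b": "cytoplasmic",
        "predictor_c": "membrane",
    }
    print(consensus_call(example))
```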
Homology modeling of MG_237:
Protein 3D structures can provide us with precise information
on how proteins interact and localize in their stable
conformation. Homology or comparative modeling is one of
the most common structure prediction methods in structural
genomics and proteomics. Numerous online servers and tools
have become available for homology or comparative modeling
of proteins in recent years. Despite minor differences between them, an
initial step common to all modeling tools and servers is
to find the best-matching template by performing a sequence
homology search with BLASTP. Templates are experimentally
determined 3D structures of proteins that share sequence
similarity with the query sequence. The template sequence and
the protein sequence whose structure is to be determined are
aligned using multiple sequence alignment algorithms [23]. A
well-defined alignment is very important for the prediction of a
reliable 3D structure. The genome of M. genitalium encodes 94
hypothetical proteins without any known function or structure.
A BLASTP search was performed for each protein sequence
against the PDB to identify templates for homology modeling.
MG_237 was selected for homology modeling as it showed
the highest sequence identity to its template, 1TD6_A, which is an X-ray diffraction
structure of an M. pneumoniae hypothetical protein. The query
sequence and template ID were then given as input to the (PS)²
server for homology modeling using MODELLER.
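The template-selection step described above amounts to ranking candidate templates by sequence identity to the query. A minimal sketch follows, assuming pairwise alignments are already available as gapped strings; identity is counted as identical columns over the alignment length, which is the convention BLAST uses when reporting identities. The template IDs and sequences in the example are hypothetical, and the function names are illustrative.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent identity over the full alignment length (gap columns included)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment must not be empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def best_template(hits):
    """Pick the template with the highest identity.

    `hits` maps a template ID to an (aligned_query, aligned_subject) pair.
    """
    if not hits:
        raise ValueError("no candidate templates supplied")
    scored = {tid: percent_identity(q, s) for tid, (q, s) in hits.items()}
    top = max(scored, key=scored.get)
    return top, scored[top]


if __name__ == "__main__":
    # Hypothetical alignments, for illustration only.
    hits = {
        "tmplA": ("ACDE", "ACDF"),
        "tmplB": ("ACDE", "AGHF"),
    }
    print(best_template(hits))
```

In practice, identity is usually weighed together with alignment coverage and E-value, so a ranking on identity alone should be read as a simplification.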
Energy minimization, quality assessment and visualization:
The predicted 3D structure of MG_237 is shown in .
Even though there were no steric clashes in the structure
generated, it was still subjected to energy minimization and
assessed for both geometric and energetic quality. The positioning
of alpha-helices and beta-sheets was then compared using
ESPript 2.2. Secondary structure elements were found to be
comparable to those of the template. Eleven helices
and two beta sheets were predicted in the 3D structure of
MG_237, which implies that it is rich in helical structures.
Several structure assessment methods, including
RMSD, Z-scores, and Ramachandran plots, were used to check
the reliability of the predicted 3D model. The RMSD value indicates
the degree to which two 3D structures are similar. The lower
the value, the more similar the structures. Both template and
query structures were superimposed for the calculation of
RMSD. The RMSD value obtained from
superimposition of MG_237 and 1TD6 in UCSF Chimera was
found to be 0.213 Å, suggesting a reliable 3D structure.
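For reference, the RMSD between two structures is the square root of the mean squared distance between paired atoms. The sketch below assumes the two coordinate sets are already superimposed and listed in matching order; it does not perform the superposition itself and is not the procedure UCSF Chimera applies internally. The coordinates in the example are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Assumes both structures are already superimposed and that atom i in
    `coords_a` corresponds to atom i in `coords_b`.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Made-up coordinates in angstroms, for illustration only.
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.1, 0.0)]
    template = [(0.1, 0.0, 0.0), (1.5, 0.1, 0.0), (3.0, 0.0, 0.1)]
    print(round(rmsd(model, template), 3))
```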
The Z-score is indicative of overall model quality and is used to check
whether the input structure is within the range of scores
typically found for native proteins of similar size. Z-scores of
the template and query model were obtained from PROSAweb.
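In general terms, a Z-score of this kind expresses how far the energy of a structure lies from the mean of a reference energy distribution, in units of that distribution's standard deviation. The following is a minimal sketch of that calculation only; the reference energies are made-up numbers, the use of the population standard deviation is an assumption, and the knowledge-based energy function used by the server is not reproduced here.

```python
import statistics


def z_score(model_energy, reference_energies):
    """Deviation of `model_energy` from the reference mean, in standard deviations."""
    mean = statistics.mean(reference_energies)
    spread = statistics.pstdev(reference_energies)  # population SD (assumption)
    if spread == 0:
        raise ValueError("reference energies have zero spread")
    return (model_energy - mean) / spread


if __name__ == "__main__":
    # Made-up energies in arbitrary units, for illustration only.
    reference = [-2.0, 0.0, 1.0, 3.0, -1.0, 2.0]
    print(round(z_score(-12.0, reference), 2))
```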
The template Z-score was 7.97, and for the
MG_237 homology model it was 7.42, suggesting
similarity between the template and query structures. Finally, the
Ramachandran plots were obtained for both the homology
model and the template as a quality assessment. PROCHECK
displayed 91% of residues in the most favored regions, with
7.6%, 0.7%, and 0.7% residues in additionally allowed,
generously allowed and disallowed regions, respectively.
This indicated that the backbone dihedral angles,
phi and psi, in the MG_237 3D model, were reasonably
accurate. The Ramachandran plot for the template structure
showed 84.2%, 14.0%, 1.9% and
0.0% of residues in the most favored, additionally allowed, generously
allowed and disallowed regions, respectively (data not shown).
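The phi and psi values plotted in a Ramachandran diagram are dihedral angles defined by four consecutive backbone atoms: C(i-1), N, CA, C for phi, and N, CA, C, N(i+1) for psi. The sketch below computes a signed dihedral angle from four points using the standard atan2 formulation; the coordinates in the example are made up, and the region boundaries PROCHECK uses to classify residues are not reproduced.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    For phi pass (C of previous residue, N, CA, C);
    for psi pass (N, CA, C, N of next residue).
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal to the plane of p0, p1, p2
    n2 = _cross(b1, b2)  # normal to the plane of p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up points: the fourth atom is rotated a quarter turn about the z axis.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```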
The comparable Ramachandran plot characteristics, RMSD
values, and Z-scores support the quality of the homology model
of MG_237. The final protein structure was deposited in PMDB
and is available under ID: PM0077727.
Functional annotation and comparative genomics analysis of MG_237:
Currently, no function of MG_237 is known. In
the present study, a systematic workflow consisting of several
bioinformatics tools and databases was defined and used with
the goal of performing functional annotation of MG_237. Three
web tools were used to search the conserved domains and
potential function of MG_237. Based on consensus predictions
made by Pfam, NCBI-CDD and InterProScan, it is suggested
that MG_237 contains DUF3196 domains and is currently
classified as a protein of unknown function. Once the functional
annotation of the hypothetical protein was performed, we applied
a comparative genomics approach to further characterize
MG_237. This involved a search against the human proteome,
essentiality estimation, and assessment of involvement in metabolic
pathways. First, a BLASTP search against the human proteome
was performed to identify whether MG_237 has any human
homologues. MG_237 was found to be a unique protein
of M. genitalium and showed no homology to any human
protein. Proteins with no homology to human proteins are
attractive drug targets, as targeting them
is less likely to cause side effects in the host. Identification of proteins that
regulate key factors, such as nutrient uptake, virulence and
pathogenicity, is of great importance for disrupting pathogen
function and survival. Such proteins are termed essential
for the pathogen. However, not all essential proteins are
non-homologous to host proteins. Therefore, pathogen proteins that fulfil
the criteria of being unique and essential at the same time
represent more attractive drug targets. Information about
the essential genes of M. genitalium was retrieved from the DEG
database. A microbial BLASTP search, performed according to the selection criteria
described in the Materials and Methods section, suggested that MG_237 is a
non-essential protein. Finally, KEGG was used to identify the
involvement of MG_237 in M. genitalium metabolic pathways.
Based on a search performed via KAAS, MG_237 was found to be
involved in four metabolic pathways, namely biosynthesis of
secondary metabolites, microbial metabolism in diverse
environments, glycolysis / gluconeogenesis, and amino sugar
and nucleotide sugar metabolism.
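The target-prioritization reasoning used in this section (uniqueness with respect to the human proteome combined with essentiality) can be summarized as a small decision rule. The sketch below is illustrative only: the class, field names and category labels are assumptions, and the two boolean flags stand in for the outcomes of the BLASTP and DEG searches described above without reproducing their thresholds.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    has_human_homolog: bool  # outcome of the search against the human proteome
    is_essential: bool       # outcome of the essentiality search


def triage(candidate):
    """Classify a pathogen protein by uniqueness and essentiality."""
    if candidate.has_human_homolog:
        return "excluded: homologous to a human protein"
    if candidate.is_essential:
        return "priority target: unique and essential"
    return "unique but non-essential"


if __name__ == "__main__":
    # Flags follow the findings reported above for MG_237.
    mg_237 = Candidate("MG_237", has_human_homolog=False, is_essential=False)
    print(mg_237.name, "->", triage(mg_237))
```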