The Joint Center for Structural Genomics high-throughput structural biology pipeline has delivered more than 1000 structures to the community over the past ten years and has made a significant contribution to the overall goal of the NIH Protein Structure Initiative (PSI) of expanding structural coverage of the protein universe.
The Joint Center for Structural Genomics high-throughput structural biology pipeline has delivered more than 1000 structures to the community over the past ten years. The JCSG has made a significant contribution to the overall goal of the NIH Protein Structure Initiative (PSI) of expanding structural coverage of the protein universe, as well as making substantial inroads into structural coverage of an entire organism. Targets are processed through an extensive combination of bioinformatics and biophysical analyses to efficiently characterize and optimize each target prior to selection for structure determination. The pipeline uses parallel processing methods at almost every step in the process and can adapt to a wide range of protein targets from bacterial to human. The construction, expansion and optimization of the JCSG gene-to-structure pipeline over the years have resulted in many technological and methodological advances and developments. The vast number of targets and the enormous amounts of associated data processed through the multiple stages of the experimental pipeline required the development of a variety of valuable resources that, wherever feasible, have been converted to free-access web-based tools and applications.
structural genomics; Joint Center for Structural Genomics; Protein Structure Initiative
Specific use cases of TOPSAN, an innovative collaborative platform for creating, sharing and distributing annotations and insights about protein structures, such as those determined by high-throughput structural genomics in the Protein Structure Initiative (PSI), are described. TOPSAN is the main annotation platform for JCSG structures and serves as a conduit for initiating collaborations with the biological community, as illustrated in this special issue of Acta Crystallographica Section F. Developed at the JCSG with the goal of opening a dialogue on the novel protein structures with the broader biological community, TOPSAN is a unique tool for fostering distributed collaborations and provides an efficient pathway to peer-reviewed publications.
The NIH Protein Structure Initiative centers, such as the Joint Center for Structural Genomics (JCSG), have developed highly efficient technological platforms that are capable of experimentally determining the three-dimensional structures of hundreds of proteins per year. However, the overwhelming majority of the almost 5000 protein structures determined by these centers have yet to be described in the peer-reviewed literature. In a high-throughput structural genomics environment, the process of structure determination occurs independently of any associated experimental characterization of function, which creates a challenge for the annotation and analysis of structures and the publication of these results. This challenge has been addressed by developing TOPSAN (‘The Open Protein Structure Annotation Network’), which enables the generation of knowledge via collaborations among globally distributed contributors supported by automated amalgamation of available information. TOPSAN currently provides annotations for all protein structures determined by the JCSG in addition to preliminary annotations on a large number of structures from the other PSI production centers. TOPSAN-enabled collaborations have resulted in insightful structure–function analysis for many proteins and have led to numerous peer-reviewed publications, as exemplified by the articles included in this issue of Acta Crystallographica Section F.
collaborative annotations; structural genomics; Protein Structure Initiative
Tools for systematic comparisons of NMR and crystal structures developed by the JCSG were applied to two proteins with known functions: the T. maritima anti-σ factor antagonist TM1081 and the mouse γ-glutamylamine cyclotransferase A2LD1 (gi:13879369). In an attempt to exploit the complementarity of crystal and NMR data, the combined use of the two structure-determination techniques was explored for the initial steps in the challenge of searching proteins of unknown functions for putative active sites.
The JCSG has recently developed a protocol for systematic comparisons of high-quality crystal and NMR structures of proteins. In this paper, the extent to which this approach can provide function-related information on the two functionally annotated proteins TM1081, a Thermotoga maritima anti-σ factor antagonist, and A2LD1 (gi:13879369), a mouse γ-glutamylamine cyclotransferase, is explored. The NMR structures of the two proteins have been determined in solution at 313 and 298 K, respectively, using the current JCSG protocol based on the software package UNIO for extensive automation. The corresponding crystal structures were solved by the JCSG at 100 K and 1.6 Å resolution and at 100 K and 1.9 Å resolution, respectively. The NMR and crystal structures of the two proteins share the same overall molecular architectures. However, the precision of the structure determination along the amino-acid sequence varies over a significantly wider range in the NMR structures than in the crystal structures. Thereby, in each of the two NMR structures about 65% of the residues have displacements below the average and in both proteins the less well ordered residues include large parts of the active sites, in addition to some highly solvent-exposed surface areas. Whereas the latter show increased disorder in the crystal and in solution, the active-site regions display increased displacements only in the NMR structures, where they undergo local conformational exchange on the millisecond time scale that appears to be frozen in the crystals. These observations suggest that a search for molecular regions showing increased structural disorder and slow dynamic processes in solution while being well ordered in the corresponding crystal structure might be a valid initial step in the challenge of identifying putative active sites in functionally unannotated proteins with known three-dimensional structure.
Thermotoga maritima anti-σ factor antagonist; mouse γ-glutamylamine cyclotransferase; NMR and crystal structure comparison; active-site conformation
The crystal structure of tryptophanyl-tRNA synthetase from T. maritima unexpectedly revealed an iron–sulfur cluster bound to the tRNA anticodon-binding region.
A novel aminoacyl-tRNA synthetase that contains an iron–sulfur cluster in the tRNA anticodon-binding region and efficiently charges tRNA with tryptophan has been found in Thermotoga maritima. The crystal structure of TmTrpRS (tryptophanyl-tRNA synthetase; TrpRS; EC 6.1.1.2) reveals an iron–sulfur [4Fe–4S] cluster bound to the tRNA anticodon-binding (TAB) domain and an l-tryptophan ligand in the active site. None of the other T. maritima aminoacyl-tRNA synthetases (AARSs) contain this [4Fe–4S] cluster-binding motif (C-x2-C). It is speculated that the iron–sulfur cluster contributes to the stability of TmTrpRS and could play a role in the recognition of the anticodon.
TM0492; tryptophanyl-tRNA ligase; tryptophanyl-tRNA synthetase class I; iron–sulfur clusters; structural genomics
The crystal structure of BT_3984, a SusD-family protein, reveals a TPR N-terminal region providing support for a loop-rich C-terminal subdomain and suggests possible interfaces involved in sus complex formation.
The crystal structure of the Bacteroides thetaiotaomicron protein BT_3984 was determined to a resolution of 1.7 Å and was the first structure to be determined from the extensive SusD family of polysaccharide-binding proteins. SusD is an essential component of the sus operon that defines the paradigm for glycan utilization in dominant members of the human gut microbiota. Structural analysis of BT_3984 revealed an N-terminal region containing several tetratricopeptide repeats (TPRs), while the signature C-terminal region is less structured and contains extensive loop regions. Sequence and structure analysis of BT_3984 suggests the presence of binding interfaces for other proteins from the polysaccharide-utilization complex.
structural genomics; starch-utilization system; gut microbiome; metagenomics
NMR structures of the proteins TM1112 and TM1367 solved by the JCSG in solution at 298 K could be superimposed with the corresponding crystal structures at 100 K with r.m.s.d. values of <1.0 Å for the backbone heavy atoms. For both proteins the structural differences between multiple molecules in the asymmetric unit of the crystals correlated with structural variations within the bundles of conformers used to represent the NMR solution structures. A recently introduced JCSG NMR structure-determination protocol, which makes use of the software package UNIO for extensive automation, was further evaluated by comparison of the TM1112 structure obtained using these automated methods with another NMR structure that was independently solved in another PSI center, where a largely interactive approach was applied.
The NMR structures of the TM1112 and TM1367 proteins from Thermotoga maritima in solution at 298 K were determined following a new protocol which uses the software package UNIO for extensive automation. The results obtained with this novel procedure were evaluated by comparison with the crystal structures solved by the JCSG at 100 K to 1.83 and 1.90 Å resolution, respectively. In addition, the TM1112 solution structure was compared with an NMR structure solved by the NESG using a conventional largely interactive methodology. For both proteins, the newly determined NMR structure could be superimposed with the crystal structure with r.m.s.d. values of <1.0 Å for the backbone heavy atoms, which provided a starting platform to investigate local structure variations, which may arise from either the methods used or from the different chemical environments in solution and in the crystal. Thereby, these comparative studies were further explored with the use of reference NMR and crystal structures, which were computed using the NMR software with input of upper-limit distance constraints derived from the molecular models that represent the results of structure determination by NMR and by X-ray diffraction, respectively. The results thus obtained show that NMR structure calculations with the new automated UNIO software used by the JCSG compare favorably with those from a more labor-intensive and time-intensive interactive procedure. An intriguing observation is that the ‘bundles’ of two TM1112 or three TM1367 molecules in the asymmetric unit of the crystal structures mimic the behavior of the bundles of 20 conformers used to represent the NMR solution structures when comparing global r.m.s.d. values calculated either for the polypeptide backbone, the core residues with solvent accessibility below 15% or all heavy atoms.
NMR and crystal structure comparison; structure-determination software; reference structures; Thermotoga maritima
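The backbone r.m.s.d. after superposition that is quoted in the abstract above can be illustrated with a minimal sketch. This is not the UNIO software or any JCSG tool: the function name `superposed_rmsd` and its helper are hypothetical, and the optimal rotation is found here with Horn's quaternion method (largest eigenvalue of a 4×4 matrix, obtained by shifted power iteration) simply because it needs only the Python standard library. Inputs are assumed to be two equal-length lists of matched (x, y, z) coordinates, for example backbone heavy atoms of the same residues in an NMR conformer and in the crystal structure.

```python
import math


def _centered(coords):
    """Translate a list of (x, y, z) tuples so that its centroid is at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def superposed_rmsd(mobile, target, iterations=500):
    """Minimum r.m.s.d. between matched atoms after optimal rigid-body superposition."""
    if not mobile or len(mobile) != len(target):
        raise ValueError("need two non-empty coordinate lists of equal length")
    a, b = _centered(mobile), _centered(target)
    n = len(a)
    # Sums of squared distances from the centroids (inner products).
    ga = sum(x * x + y * y + z * z for x, y, z in a)
    gb = sum(x * x + y * y + z * z for x, y, z in b)
    # Cross-covariance matrix S[i][j] = sum over atoms of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its largest eigenvalue gives the best overlap.
    nmat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shifting by the Frobenius norm makes all eigenvalues non-negative, so
    # power iteration converges towards the eigenvector of the largest one.
    shift = math.sqrt(sum(v * v for row in nmat for v in row))
    if shift == 0.0:
        return math.sqrt((ga + gb) / n)
    v = [1.0, 0.5, 0.25, 0.125]  # arbitrary asymmetric starting vector
    for _ in range(iterations):
        w = [sum(nmat[i][j] * v[j] for j in range(4)) + shift * v[i] for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unshifted matrix estimates the largest eigenvalue.
    lam = sum(v[i] * nmat[i][j] * v[j] for i in range(4) for j in range(4))
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / n))
```

A fixed number of power iterations is a simplification; for nearly degenerate coordinate sets a library eigenvalue solver would be the more robust choice. Restricting the input to backbone atoms, to core residues or to all heavy atoms gives the three kinds of global r.m.s.d. value that the abstract compares.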
The crystal structure of a putative NTP pyrophosphohydrolase, YP_001813558.1 from E. sibiricum, reveals a novel segment-swapped linked-dimer assembly.
The crystal structure of a putative NTPase, YP_001813558.1 from Exiguobacterium sibiricum 255-15 (PF09934, DUF2166) was determined to 1.78 Å resolution. YP_001813558.1 and its homologs (dimeric dUTPases, MazG proteins and HisE-encoded phosphoribosyl ATP pyrophosphohydrolases) form a superfamily of all-α-helical NTP pyrophosphatases. In dimeric dUTPase-like proteins, a central four-helix bundle forms the active site. However, in YP_001813558.1, an unexpected intertwined swapping of two of the helices that compose the conserved helix bundle results in a ‘linked dimer’ that has not previously been observed for this family. Interestingly, despite this novel mode of dimerization, the metal-binding site for divalent cations, such as magnesium, that are essential for NTPase activity is still conserved. Furthermore, the active-site residues that are involved in sugar binding of the NTPs are also conserved when compared with other α-helical NTPases, but those that recognize the nucleotide bases are not conserved, suggesting a different substrate specificity.
structural genomics; putative NTP pyrophosphohydrolase; MazG nucleotide pyrophosphohydrolase; dUTPases
The crystal structure of BT2081 from B. thetaiotaomicron reveals a two-domain protein with a putative carbohydrate-binding site in the C-terminal domain.
BT2081 from Bacteroides thetaiotaomicron (GenBank accession code NP_810994.1) is a member of a novel protein family consisting of over 160 members, most of which are found in the different classes of Bacteroidetes. Genome-context analysis lends support to the involvement of this family in carbohydrate metabolism, which plays a key role in B. thetaiotaomicron as a predominant bacterial symbiont in the human distal gut microbiome. The crystal structure of BT2081 at 2.05 Å resolution represents the first structure from this new protein family. BT2081 consists of an N-terminal domain, which adopts a β-sandwich immunoglobulin-like fold, and a larger C-terminal domain with a β-sandwich jelly-roll fold. Structural analyses reveal that both domains are similar to those found in various carbohydrate-active enzymes. The C-terminal β-jelly-roll domain contains a potential carbohydrate-binding site that is highly conserved among BT2081 homologs and is situated in the same location as the carbohydrate-binding sites that are found in structurally similar glycoside hydrolases (GHs). However, in BT2081 this site is partially occluded by surrounding loops, which results in a deep solvent-accessible pocket rather than a shallower solvent-exposed cleft.
gut microbiome; sugars; structural genomics; immunoglobulin-like fold; jelly-roll fold
The first structures from the FmdE Pfam family (PF02663) reveal that some members of this family form tightly intertwined dimers consisting of two domains (N-terminal α+β core and C-terminal zinc-finger domains), whereas others contain only the core domain. The presence of the zinc-finger domain suggests that some members of this family may perform functions associated with transcriptional regulation, protein–protein interaction, RNA binding or metal-ion sensing.
Examination of the genomic context for members of the FmdE Pfam family (PF02663), such as the protein encoded by the fmdE gene from the methanogenic archaeon Methanobacterium thermoautotrophicum, indicates that 13 of them are co-transcribed with genes encoding subunits of molybdenum formylmethanofuran dehydrogenase (EC 1.2.99.5), an enzyme that is involved in microbial methane production. Here, the first crystal structures from PF02663 are described, representing two bacterial and one archaeal species: B8FYU2_DESHY from the anaerobic dehalogenating bacterium Desulfitobacterium hafniense DCB-2, Q2LQ23_SYNAS from the syntrophic bacterium Syntrophus aciditrophicus SB and Q9HJ63_THEAC from the thermoacidophilic archaeon Thermoplasma acidophilum. Two of these proteins, Q9HJ63_THEAC and Q2LQ23_SYNAS, contain two domains: an N-terminal thioredoxin-like α+β core domain (NTD) consisting of a five-stranded, mixed β-sheet flanked by several α-helices and a C-terminal zinc-finger domain (CTD). B8FYU2_DESHY, on the other hand, is composed solely of the NTD. The CTD of Q9HJ63_THEAC and Q2LQ23_SYNAS is best characterized as a treble-clef zinc finger. Two significant structural differences between Q9HJ63_THEAC and Q2LQ23_SYNAS involve their metal binding. First, zinc is bound to the putative active site on the NTD of Q9HJ63_THEAC, but is absent from the NTD of Q2LQ23_SYNAS. Second, whereas the structure of the CTD of Q2LQ23_SYNAS shows four Cys side chains within coordination distance of the Zn atom, the structure of Q9HJ63_THEAC is atypical for a treble-clef zinc finger in that three Cys side chains and an Asp side chain are within coordination distance of the zinc.
Pfam family PF02663; metalloproteins; domain swapping; structural genomics; methanogenesis
The crystal structure of a novel MACPF protein, which may play a role in the adaptation of commensal bacteria to host environments in the human gut, was determined and analyzed.
Membrane-attack complex/perforin (MACPF) proteins are transmembrane pore-forming proteins that are important in both human immunity and the virulence of pathogens. Bacterial MACPFs are found in diverse bacterial species, including most human gut-associated Bacteroides species. The crystal structure of a bacterial MACPF-domain-containing protein BT_3439 (Bth-MACPF) from B. thetaiotaomicron, a predominant member of the mammalian intestinal microbiota, has been determined. Bth-MACPF contains a membrane-attack complex/perforin (MACPF) domain and two novel C-terminal domains that resemble ribonuclease H and interleukin 8, respectively. The entire protein adopts a flat crescent shape, characteristic of other MACPF proteins, that may be important for oligomerization. This Bth-MACPF structure provides new features and insights not observed in two previous MACPF structures. Genomic context analysis indicates that Bth-MACPF may be involved in a novel protein-transport or nutrient-uptake system, suggesting an important role for these MACPF proteins, which were likely to have been inherited from eukaryotes via horizontal gene transfer, in the adaptation of commensal bacteria to the host environment.
MACPF; membrane-attack complexes; perforins; transmembrane pores; pathogenesis
The crystal structure of the prephenate dehydrogenase component of the bifunctional H. influenzae TyrA reveals unique structural differences between bifunctional and monofunctional TyrA enzymes.
Chorismate mutase/prephenate dehydrogenase from Haemophilus influenzae Rd KW20 is a bifunctional enzyme that catalyzes the rearrangement of chorismate to prephenate and the NAD(P)+-dependent oxidative decarboxylation of prephenate to 4-hydroxyphenylpyruvate in tyrosine biosynthesis. The crystal structure of the prephenate dehydrogenase component (HinfPDH) of the TyrA protein from H. influenzae Rd KW20 in complex with the inhibitor tyrosine and cofactor NAD+ has been determined to 2.0 Å resolution. HinfPDH is a dimeric enzyme, with each monomer consisting of an N-terminal α/β dinucleotide-binding domain and a C-terminal α-helical dimerization domain. The structure reveals key active-site residues at the domain interface, including His200, Arg297 and Ser179 that are involved in catalysis and/or ligand binding and are highly conserved in TyrA proteins from all three kingdoms of life. Tyrosine is bound directly at the catalytic site, suggesting that it is a competitive inhibitor of HinfPDH. Comparisons with its structural homologues reveal important differences around the active site, including the absence of an α–β motif in HinfPDH that is present in other TyrA proteins, such as Synechocystis sp. arogenate dehydrogenase. Residues from this motif are involved in discrimination between NADP+ and NAD+. The loop between β5 and β6 in the N-terminal domain is much shorter in HinfPDH and an extra helix is present at the C-terminus. Furthermore, HinfPDH adopts a more closed conformation compared with TyrA proteins that do not have tyrosine bound. This conformational change brings the substrate, cofactor and active-site residues into close proximity for catalysis. An ionic network consisting of Arg297 (a key residue for tyrosine binding), a water molecule, Asp206 (from the loop between β5 and β6) and Arg365′ (from the additional C-terminal helix of the adjacent monomer) is observed that might be involved in gating the active site.
tyrosine biosynthesis; prephenate; chorismate; Haemophilus influenzae; structural genomics
The crystal structure of the highly specific γ-d-glutamyl-l-diamino acid endopeptidase YkfC from Bacillus cereus in complex with l-Ala-γ-d-Glu reveals the structural basis for the substrate specificity of NlpC/P60-family cysteine peptidases.
Dipeptidyl-peptidase VI from Bacillus sphaericus and YkfC from Bacillus subtilis have both previously been characterized as highly specific γ-d-glutamyl-l-diamino acid endopeptidases. The crystal structure of a YkfC ortholog from Bacillus cereus (BcYkfC) at 1.8 Å resolution revealed that it contains two N-terminal bacterial SH3 (SH3b) domains in addition to the C-terminal catalytic NlpC/P60 domain that is ubiquitous in the very large family of cell-wall-related cysteine peptidases. A bound reaction product (l-Ala-γ-d-Glu) enabled the identification of conserved sequence and structural signatures for recognition of l-Ala and γ-d-Glu and, therefore, provides a clear framework for understanding the substrate specificity observed in dipeptidyl-peptidase VI, YkfC and other NlpC/P60 domains in general. The first SH3b domain plays an important role in defining substrate specificity by contributing to the formation of the active site, such that only murein peptides with a free N-terminal alanine are allowed. A conserved tyrosine in the SH3b domain of the YkfC subfamily is correlated with the presence of a conserved acidic residue in the NlpC/P60 domain and both residues interact with the free amine group of the alanine. This structural feature allows the definition of a subfamily of NlpC/P60 enzymes with the same N-terminal substrate requirements, including a previously characterized cyanobacterial l-alanine-γ-d-glutamate endopeptidase that contains the two key components (an NlpC/P60 domain attached to an SH3b domain) for assembly of a YkfC-like active site.
γ-d-glutamyl-l-diamino acid endopeptidase; cell-wall recycling; NlpC/P60; SH3b; cysteine peptidases; enzyme specificity
The crystal structure of the first representative of the DUF364 family reveals a combination of enolase N-terminal-like and C-terminal Rossmann-like folds. Analysis of the interdomain cleft, combined with sequence and genome-context conservation among homologs, suggests a unique catalytic site that is likely to be involved in the synthesis of a flavin or pterin derivative.
The crystal structure of Dhaf4260 from Desulfitobacterium hafniense DCB-2 was determined by single-wavelength anomalous diffraction (SAD) to a resolution of 2.01 Å using the semi-automated high-throughput pipeline of the Joint Center for Structural Genomics (JCSG) as part of the NIGMS Protein Structure Initiative (PSI). This protein structure is the first representative of the PF04016 (DUF364) Pfam family and reveals a novel combination of two well known domains (an enolase N-terminal-like fold followed by a Rossmann-like domain). Structural and bioinformatic analyses reveal partial similarities to Rossmann-like methyltransferases, with residues from the enolase-like fold combining to form a unique active site that is likely to be involved in the condensation or hydrolysis of molecules implicated in the synthesis of flavins, pterins or other siderophores. The genome context of Dhaf4260 and homologs additionally supports a role in heavy-metal chelation.
structural genomics; domains of unknown function; rare metals; siderophores; pterins
The types and frequency of ligands that are bound to PSI structures are surveyed and their utility in the functional annotation of previously uncharacterized proteins is analyzed.
Approximately 65% of PSI structures report some type of ligand(s) that is bound in the crystal structure. Here, a description is given of how such ligands are handled and analyzed at the JCSG and a survey of the types, variety and frequency of ligands that are observed in the PSI structures is also compiled and analyzed, including illustrations of how these bound ligands have provided functional clues for annotation of proteins with little or no previous experimental characterization. Furthermore, a web server was developed as a tool to mine and analyze the PSI structures for bound ligands and other identifying features.
structural genomics; ligands; PSI; protein–ligand complexes; data mining
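As a rough illustration of the kind of ligand survey described in the abstract above, the following minimal sketch counts distinct bound ligands per residue name from the HETATM records of a PDB-format file. It is not the JCSG web server or its method; the function name `count_ligands` and the `exclude` parameter are hypothetical, and the column positions follow the fixed-width PDB coordinate format. Waters are skipped by default. Modified residues such as selenomethionine (MSE) are also written as HETATM records, so they would need to be added to `exclude` to avoid counting them as ligands.

```python
from collections import Counter


def count_ligands(pdb_text, exclude=("HOH",)):
    """Count distinct HETATM residues per residue name in PDB-format text.

    Each ligand instance is identified by (residue name, chain, residue
    number, insertion code), so multi-atom ligands are counted once.
    """
    seen = set()
    for line in pdb_text.splitlines():
        # Only heteroatom records that are long enough to hold the residue fields.
        if not line.startswith("HETATM") or len(line) < 26:
            continue
        res_name = line[17:20].strip()   # columns 18-20: residue name
        if res_name in exclude:
            continue
        chain = line[21]                 # column 22: chain identifier
        res_seq = line[22:26].strip()    # columns 23-26: residue sequence number
        i_code = line[26:27].strip()     # column 27: insertion code (may be absent)
        seen.add((res_name, chain, res_seq, i_code))
    return Counter(name for name, _, _, _ in seen)
```

Calling `count_ligands(text, exclude=("HOH", "MSE"))` on the contents of a coordinate file would, under these assumptions, return a `Counter` mapping each ligand identifier to the number of copies observed, which could then be aggregated over many structures to estimate ligand frequencies.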
Comparison of the NMR and crystal structures of a protein determined using largely automated methods has enabled the interpretation of local differences in the highly similar structures. These differences are found in segments of higher B values in the crystal and correlate with dynamic processes on the NMR chemical shift timescale observed in solution.
The NMR structure of the protein NP_247299.1 in solution at 313 K has been determined and is compared with the X-ray crystal structure, which was also solved in the Joint Center for Structural Genomics (JCSG) at 100 K and at 1.7 Å resolution. Both structures were obtained using the current largely automated crystallographic and solution NMR methods used by the JCSG. This paper assesses the accuracy and precision of the results from these recently established automated approaches, aiming for quantitative statements about the location of structure variations that may arise from either one of the methods used or from the different environments in solution and in the crystal. To evaluate the possible impact of the different software used for the crystallographic and the NMR structure determinations and analysis, the concept is introduced of reference structures, which are computed using the NMR software with input of upper-limit distance constraints derived from the molecular models representing the results of the two structure determinations. The use of this new approach is explored to quantify global differences that arise from the different methods of structure determination and analysis versus those that represent interesting local variations or dynamics. The near-identity of the protein core in the NMR and crystal structures thus provided a basis for the identification of complementary information from the two different methods. It was thus observed that locally increased crystallographic B values correlate with dynamic structural polymorphisms in solution, including that the solution state of the protein involves a slow dynamic equilibrium on a time scale of milliseconds or slower between two ensembles of rapidly interchanging conformers that contain, respectively, the cis or trans form of the C-terminal proline and represent about 25 and 75% of the total protein.
structure comparison in crystals and in solution; structure-determination software; reference structures; nitrogenase iron–molybdenum cofactor
The crystal structure of SSO2064, the first structural representative of Pfam family PF01796 (DUF35), reveals a two-domain architecture comprising an N-terminal zinc-ribbon domain and a C-terminal OB-fold domain. Analysis of the domain architecture, operon organization and bacterial orthologs combined with the structural features of SSO2064 suggests a role involving acyl-CoA binding for this family of proteins.
SSO2064 is the first structural representative of PF01796 (DUF35), a large prokaryotic family with a wide phylogenetic distribution. The structure reveals a novel two-domain architecture comprising an N-terminal, rubredoxin-like, zinc ribbon and a C-terminal, oligonucleotide/oligosaccharide-binding (OB) fold domain. Additional N-terminal helical segments may be involved in protein–protein interactions. Domain architectures, genomic context analysis and functional evidence from certain bacterial representatives of this family suggest that these proteins form a novel fatty-acid-binding component that is involved in the biosynthesis of lipids and polyketide antibiotics and that they possibly function as acyl-CoA-binding proteins. This structure has led to a re-evaluation of the DUF35 family, which has now been split into two entries in the latest Pfam release (v.24.0).
structural genomics; domains of unknown function; acyl-carrier proteins; acyl-coA; polyketide biosynthesis
The crystal structures of SPO0140 and Sbal_2486 revealed a two-domain structure that adopts a novel fold. Analysis of the interdomain cleft suggests a nucleotide-based ligand with a genome context indicating signaling as a possible role for this family.
The crystal structures of SPO0140 and Sbal_2486 were determined using the semiautomated high-throughput pipeline of the Joint Center for Structural Genomics (JCSG) as part of the NIGMS Protein Structure Initiative (PSI). The structures revealed a conserved core with domain duplication and a superficial similarity of the C-terminal domain to pleckstrin homology-like folds. The conservation of the domain interface indicates a potential binding site that is likely to involve a nucleotide-based ligand, with genome-context and gene-fusion analyses additionally supporting a role for this family in signal transduction, possibly during oxidative stress.
structural genomics; domain of unknown function; domain duplication; signaling; oxidative stress
The crystal structure of the BVU2987 gene product from B. vulgatus (UniProt A6L4L1) reveals that members of the new Pfam family PF11396 (domain of unknown function; DUF2874) are similar to β-lactamase inhibitor protein and YpmB.
Proteins that contain the DUF2874 domain constitute a new Pfam family PF11396. Members of this family have predominantly been identified in microbes found in the human gut and oral cavity. The crystal structure of one member of this family, BVU2987 from Bacteroides vulgatus, has been determined, revealing a β-lactamase inhibitor protein-like structure with a tandem repeat of domains. Sequence analysis and structural comparisons reveal that BVU2987 and other DUF2874 proteins are related to β-lactamase inhibitor protein, PepSY and SmpA_OmlA proteins and hence are likely to function as inhibitory proteins.
BVU2987; DUF2874; PF11396; human gut microbiome; β-lactamase inhibitor protein-like fold; putative inhibitor proteins
NE1406, the first structural representative of PF09410, reveals a lipocalin-like fold with features that suggest involvement in lipid metabolism. In addition, NE1406 provides potential structural templates for two other protein families (PF07143 and PF08622).
The first structural representative of the domain of unknown function DUF2006 family, also known as Pfam family PF09410, comprises a lipocalin-like fold with domain duplication. The finding of the calycin signature in the N-terminal domain, combined with remote sequence similarity to two other protein families (PF07143 and PF08622) implicated in isoprenoid metabolism and the oxidative stress response, supports an involvement in lipid metabolism. Clusters of conserved residues that interact with ligand mimetics suggest that the binding and regulation sites map to the N-terminal domain and to the interdomain interface, respectively.
structural genomics; domains of unknown function; calycin; lipocalin; fatty-acid binding proteins
The crystal structures of two orthologous proteins from different Shewanella species have uncovered a resemblance to CRAL-TRIO carrier proteins, which suggests that they function as transporters of small nonpolar molecules. One protein adopts an open conformation, while the other adopts a closed structure that may act as a conformational switch in the transport of ligands at the membrane surface.
The crystal structures of the proteins encoded by the YP_749275.1 and YP_001095227.1 genes from Shewanella frigidimarina and S. loihica have been determined at 1.8 and 2.25 Å resolution, respectively. These proteins are members of a novel family of bacterial proteins that adopt the α/β SpoIIAA-like fold found in STAS and CRAL-TRIO domains. Despite sharing 54% sequence identity, these two proteins adopt distinct conformations arising from different dispositions of their α2 and α3 helices. In the ‘open’ conformation (YP_001095227.1), these helices are 15 Å apart, leading to the creation of a deep nonpolar cavity. In the ‘closed’ structure (YP_749275.1), the helices partially unfold and rearrange, occluding the cavity and decreasing the solvent-exposed hydrophobic surface. These two complementary structures are reminiscent of the conformational switch in CRAL-TRIO carriers of hydrophobic compounds. It is suggested that both proteins may associate with the lipid bilayer in their ‘open’ monomeric state by inserting their amphiphilic helices, α2 and α3, into the lipid bilayer. These bacterial proteins may function as carriers of nonpolar substances or as interfacially activated enzymes.
YP_001095227.1; YP_749275.1; SpoIIAA-like proteins
The crystal structure of the NGO1945 gene product from N. gonorrhoeae (UniProt Q5F5I0) reveals that the N-terminal domain assigned as a domain of unknown function (DUF2063) is likely to bind DNA and that the protein may be involved in transcriptional regulation.
Proteins with the DUF2063 domain constitute a new Pfam family, PF09836. The crystal structure of a member of this family, NGO1945 from Neisseria gonorrhoeae, has been determined and reveals that the N-terminal DUF2063 domain is likely to be a DNA-binding domain. In conjunction with the rest of the protein, NGO1945 is likely to be involved in transcriptional regulation, which is consistent with genomic neighborhood analysis. Of the 216 currently known proteins that contain a DUF2063 domain, the most significant sequence homologs of NGO1945 (∼40–99% sequence identity) are from various Neisseria and Haemophilus species. As these are important human pathogens, NGO1945 represents an interesting candidate for further exploration via biochemical studies and possible therapeutic intervention.
NGO1945; PF09836; DUF2063; putative DNA-binding proteins; putative transcription regulators; structural genomics
The crystal structure of the first representative of the Pfam PF07336 (DUF1470) family reveals a two-domain organization that contains a new fold, termed the ABATE domain, at the N-terminus and a treble-clef zinc finger that is likely to bind DNA at the C-terminus.
The crystal structure of Jann_2411 from Jannaschia sp. strain CCS1, a member of the Pfam PF07336 family classified as a domain of unknown function (DUF1470), was solved to a resolution of 1.45 Å by multiple-wavelength anomalous dispersion (MAD). This protein is the first structural representative of the DUF1470 Pfam family. Structural analysis revealed a two-domain organization, with the N-terminal domain presenting a new fold called the ABATE domain that may bind an as yet unknown ligand. The C-terminal domain forms a treble-clef zinc finger that is likely to be involved in DNA binding. Analysis of the Jann_2411 protein and the broader ABATE-domain family suggests a role as stress-induced transcriptional regulators.
structural genomics; environmental stress; domains of unknown function; Pfam; bound metal identification
The first structural representative of the PF08866 (DUF1831) protein family reveals a potential new α+β fold and indicates a possible involvement in amino-acid metabolism.
The structure of LP2179, a member of the PF08866 (DUF1831) family, suggests a novel α+β fold comprising two β-sheets packed against a single helix. A remote structural similarity to two other uncharacterized protein families specific to the Bacillus genus (PF08868 and PF08968), as well as to prokaryotic S-adenosylmethionine decarboxylases, is consistent with a role in amino-acid metabolism. Genomic neighborhood analysis of LP2179 supports this functional assignment, which might also then be extended to PF08868 and PF08968.
structural genomics; DUFs; S-adenosylmethionine decarboxylase; amino-acid metabolism; probiotics
PA1994, a Pfam PF06475 (DUF1089) family homolog from P. aeruginosa, reveals remote similarities to lipoprotein localization factors and a conserved putative glycolipid-binding site.
The crystal structure of PA1994 from Pseudomonas aeruginosa, a member of the Pfam PF06475 family classified as a domain of unknown function (DUF1089), reveals a novel fold comprising a 15-stranded β-sheet wrapped around a single α-helix that assembles into a tight dimeric arrangement. The remote structural similarity to lipoprotein localization factors, in addition to the presence of an acidic pocket that is conserved in DUF1089 homologs, phospholipid-binding and sugar-binding proteins, indicates a role for PA1994 and the DUF1089 family in glycolipid metabolism. Genome-context analysis lends further support to the involvement of this family of proteins in glycolipid metabolism and indicates possible activation of DUF1089 homologs under conditions of bacterial cell-wall stress or host–pathogen interactions.
structural genomics; DUFs; glycolipids; osmotic stress; host–pathogen interactions
The crystal structure of an essential bacterial protein, YeaZ, from T. maritima identifies an interface that potentially mediates protein–protein interaction.
YeaZ is involved in a protein network that is essential for bacteria. The crystal structure of YeaZ from Thermotoga maritima was determined to 2.5 Å resolution. Although this protein belongs to a family of ancient actin-like ATPases, it appears that it has lost the ability to bind ATP since it lacks some key structural features that are important for interaction with ATP. A conserved surface was identified, supporting its role in the formation of protein complexes.
YgjD; YeaZ; TM0874; essential genes; protein complexes