Sequence similarity between the KH domains of hnRNP K and ribosomal
protein S3, first described in 1993 (17),
can be detected by PSI-BLAST. Even the gapped BLAST program finds
S3 when hnRNP K is used as a query and vice versa. For example,
gapped BLAST aligns the first KH domain of human hnRNP K [NCBI
database gene identification number (GI) 585911, residues 35–95] taken
as a query with the ribosomal protein S3 from
Deinococcus
radiodurans (GI:7473848) detected in the nr protein sequence
database (November 2000, 582 290 sequences; 183 345 511 letters).
The alignment spans 64 residues, which constitute virtually
the entire KH motif, and displays 32% identity (47% similarity,
score 31.4 bits, E-value 1.3). The BLAST alignment of hnRNP K (GI:585911 residues
35–95) and H.halobium S3 (GI:133930) spans
36 residues, giving 38% identity (63% similarity, no gaps,
score 31 bits, E-value 1.6). In this alignment, a nine-residue segment
in the KH signature region is invariant between the two sequences:
VIGKGGKNI (GI:585911 residues 57–65, GI:133930 residues
54–62). Conversely, when
D.radiodurans S3
(GI:7473848 residues 63–126) is taken as a gapped BLAST
query, the first KH domain of human hnRNP K (GI:585911) is found
with a score of 34 bits (E-value 0.19). Additionally, the KH domain
of GTPase ERA from
Mycobacterium leprae is found
with a score of 34.3 bits (E-value 0.16), 31% identity
(50% similarity) in a 47-residue alignment.
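As a rough cross-check of the scores quoted above, the standard Karlin-Altschul relation E = m·n·2^(−S′) converts a bit score S′ into an expected number of chance hits for a query of length m searched against a database of n letters. The sketch below is only an illustration: the function name is hypothetical, and it uses raw lengths, whereas BLAST applies an edge-effect correction to both, so the estimate comes out a few-fold larger than the reported E-values while staying in the same order of magnitude.

```python
def expected_hits(bit_score: float, query_len: int, db_letters: int) -> float:
    """Karlin-Altschul estimate E = m * n * 2**(-S') with raw, uncorrected lengths.

    Assumption: no effective-length (edge-effect) correction is applied,
    so this overestimates the E-value that BLAST itself reports.
    """
    return query_len * db_letters * 2.0 ** (-bit_score)


# Values taken from the text: nr database of November 2000 and the
# hnRNP K query covering residues 35-95.
DB_LETTERS = 183_345_511
QUERY_LEN = 95 - 35 + 1  # 61 residues

# hnRNP K versus D.radiodurans S3, score 31.4 bits (reported E-value 1.3)
e_drad = expected_hits(31.4, QUERY_LEN, DB_LETTERS)

# hnRNP K versus H.halobium S3, score 31 bits (reported E-value 1.6)
e_halo = expected_hits(31.0, QUERY_LEN, DB_LETTERS)

print(f"uncorrected estimates: {e_drad:.2f} and {e_halo:.2f}")
```

Because the estimate halves with every additional bit, the one-bit-scale differences between the alignments above translate directly into the roughly twofold differences in their E-values.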
When the KH motif was first described (17),
no spatial structure of a KH-containing protein had been determined.
By now, we have several KH domain structures in hand (
18,
38–
44), including those detected by sequence similarity in the original paper that
identified the motif (
17): hnRNP
K and ribosomal protein S3. The structure of the C-terminal KH domain
of human hnRNP K has been determined by NMR spectroscopy (Fig. B) (
40) and the coordinates for S3 became available recently following the
solution of the X-ray structure of the entire 30S ribosome subunit
from
Thermus thermophilus (Fig. E) (
44). Was the prediction of structural similarity between hnRNP K and
S3 based on sequence similarity in the KH motif region fulfilled?
Yes and no. The conformations of residues in and around the KH consensus
VIGXXGXXI are indeed very similar between the two structures (Figs B and E and A). Near the consensus, the protein chain is folded as two α-helices,
A and
B (Fig. ), arranged at an angle of 100–120° to each
other. A two-residue protruding turn connects the α-helices
A and
B (Figs B and E and A). The two largely invariant glycines
separated by two variable residues in the turn (GXXG) serve as C-
and N-caps of the two α-helices
A and
B. The side chains of residues around the consensus are conformationally similar (Fig. B and
E) and are likely to play the same functional role. The KH consensus
sequence has been implicated in direct contact with nucleic acids (
17,
19,
21,
22,
45) and the recent crystal structure of the Nova-2 KH domain bound to a 20mer RNA hairpin (
43)
confirmed this hypothesis (Fig. B). The α-helix
A, the following turn and the β-strand
b (Fig. ) are involved in extensive contacts with RNA.
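The consensus VIGXXGXXI, with X standing for any residue, maps directly onto a regular expression. The sketch below (the helper name is hypothetical) checks it against the nine-residue segment VIGKGGKNI that is invariant between hnRNP K and H.halobium S3, and extracts the GXXG turn. Since the glycines are described only as largely invariant, a strict pattern of this kind is an illustration and not a substitute for profile-based searches.

```python
import re

# KH consensus VIGXXGXXI written as a regular expression ('.' = any residue).
KH_CONSENSUS = re.compile(r"VIG..G..I")


def find_kh_core(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for each consensus match."""
    return [(m.start() + 1, m.group()) for m in KH_CONSENSUS.finditer(seq)]


# Segment shared by hnRNP K (residues 57-65) and H.halobium S3 (residues 54-62).
segment = "VIGKGGKNI"
hits = find_kh_core(segment)

# The GXXG turn occupies consensus positions 3-6: two glycines that cap
# the flanking helices, separated by two variable residues.
gxxg_turn = segment[2:6]

print(hits, gxxg_turn)
```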
Thus, the local motif identified by statistically supported sequence
similarity is folded the same way in the hnRNP K and S3 structures,
and is likely to bind nucleic acids by the same mechanism. But are
the global folds of the two proteins similar? The first spatial
structure of a KH motif protein, the sixth KH domain of vigilin
(Fig. A), revealed the presence of a compact
domain. In addition to the motif sequence covering the βααβ unit
(Fig. ,
a,
A,
B and
b), the KH domain included a βα unit at the C-terminus that
is inherently important for its structural integrity (
18).
Indeed, the β-strand
c is
the central element of the three-stranded anti-parallel β-sheet
(Fig. A and B). The α-helix
C (Fig. A and B) completes the hydrophobic core of the protein and the KH domain is unable
to fold when this α-helix is deleted
(
18). The vigilin KH domain can
be described as an α+β two-layer
sandwich with α-β plate topology
(
9,
10).
This topology is also known as the ‘ferredoxin-like’ protein
fold (
11,
12)
(the last strand of the ferredoxin common fold is missing in the
KH domain). An example of a protein with α-β plate topology that does not share sequence similarity with the KH domain, namely the C-terminal
domain of the
Escherichia coli arginine repressor (
46), is illustrated in Figure C.
The structure of the vigilin domain led to a redefinition of the
KH motif boundaries to cover helix C (18), making the domain approximately 70 residues long. However, several KH sequences lack the
C helix. These include ribosomal protein S3, amongst others. The shorter KH sequences that match the original definition
of the KH motif (
17) were termed ‘mini-KH’,
in contrast to typical ‘vigilin-like’ ‘maxi-KH’ domains
(
18). Surprisingly, the structure
of the ribosomal protein S3 N-terminal domain (
44)
revealed that the β-sheet topology of
the mini-KH domain is drastically different from the one established
for maxi-KH (Fig. E). Indeed, not only
the α-helix
C, but
also the central β-strand
c,
which seemed to be crucially important for the fold, is lacking
in the S3 structure (Fig. E). Instead,
another β-strand (
a′) and α-helix (
A′) donated by the
N-terminal part of the domain complete the hydrophobic core of the
mini-KH. Such an arrangement results in architectural similarity
between maxi- and mini-KH: both domains are built from a three-stranded β-sheet with three α-helices packed on one side of it (Figs and A). The difference is topological: while in maxi-KH the β-sheet is anti-parallel,
in mini-KH it is mixed. Parallel β-strands
a and
b that were included in the original definition of the KH motif (
17,
19) form hydrogen bonds with each other in the S3 structure (Fig. E), but are separated
by the β-strand
c in
maxi-KH (Fig. B). Another structure of
a mini-KH domain-containing protein, GTPase ERA (
41),
displays significant topological similarity to S3 (Fig. D) and thus confirms that the structure of S3 is not an exception, but a template for mini-KH domains. Structures
topologically similar to the mini-KH domain are known among proteins
that do not contain the KH motif. For example, the C-terminal domain
of
E.coli GMP synthetase (
47)
is shown in Figure F.
Global structure similarity search programs such as DALI (
30–
32), VAST (
33,
34)
and CE (
35) find significant similarity within the
mini- and maxi-KH classes, but concur on the global structural differences
between the two classes. For example, DALI finds the structures
of two mini-KH domains similar: the KH domains of S3 (PDB entry
1FJF chain C) and GTPase ERA (1EGA chain A, C-terminal domain) are
aligned with a z-score of 4.1, a root mean-squared deviation
(RMSD) of 4.3 Å and 7% sequence identity over an
alignment of 89 residues. DALI does not report similarity of these
proteins to any of the maxi-KH domains, implying that the corresponding
z-scores are <2.0.
The analysis presented forces us to return to the original definition
of the KH motif boundaries that include only the βααβ unit
shared between maxi- and mini-KH domains (Fig. G).
In addition to this shared KH motif element, maxi- and mini-KH domains
contain C- and N-terminal extensions, respectively. Therefore in
terms of the overall domain size, the mini-KH domain is not smaller
than the maxi-KH domain: both comprise approximately 70 residues.
The mini-/maxi-KH terminology was originally meaningful.
The mini-KH domain does not contain the C-terminal β-strand
and α-helix (Fig. A and
B,
c and
C) of maxi-KH that were
included in the modified KH domain definition (
18).
Prior to mini-KH structure determination it was not known that sequence
segments upstream of the N-terminal boundary set by maxi-KH would
be part of the hydrophobic core of the mini-KH domain, thus the
mini-KH domain appeared to be shorter than the maxi-KH domain. However,
since the crystal structures reveal no chain length difference between the two domains,
the mini-/maxi-KH terminology
loses its meaning. We suggest naming the two topologically different
KH domains KH type I for the KH domain with the C-terminal βα extension [maxi-KH,
its structure was determined first (
18)],
and KH type II for the KH domain with N-terminal αβ extension
(mini-KH).
It is clear that the type I and II KH domains belong to different
protein folds (Fig. A, B, D and E). It
is also clear that they share the same KH motif (Fig. G). What is the evolutionary connection between the two different KH domains with the same KH motif? The simplest,
and well-documented, mechanism of topological changes in protein
evolution (
48–
50), circular permutation, is not possible in this case since the order of secondary structural elements differs:
a βα unit is present at the C-terminus
of the type I KH, but an αβ unit
starts type II KH. It is therefore likely that type I and II KH
domains are not homologous throughout their entire length. Theoretically,
four evolutionary scenarios are possible. First, local sequence, structural
and functional similarities in the KH motif region were acquired
independently by type I and II KH domains and thus are convergent.
Second, the element of the local sequence similarity (minimally,
the sequence segment around the turn between the α-helices
A and
B; Fig. ) was inserted in two different structural templates: type I and II
KH domains. Third, the homology region covers the entire βααβ unit,
which represents a ‘primordial’ KH domain. This
domain was expanded by the C-terminal extension to form a type I
KH domain fold or by the N-terminal extension to form a type II
KH domain fold. Fourth, one of the two types represents the ancestral
form and the other type evolved through N- or C-terminal extension,
and displacement and deletion at the other end.
It appears that the third and fourth scenarios offer the simplest
explanations for the available data. Indeed, insertions, deletions
and terminal extensions are very common events in protein evolution
(
51,
52).
It has also been argued, and is largely accepted, that statistically significant
similarity detected from the sequence alone (without consideration
of spatial structure) reflects descent from a common ancestor,
i.e. homology (
16,
53,
54). Programs that are routinely used for sequence similarity searches, such as PSI-BLAST (
24,
25), are based on amino acid similarity matrices which are derived under evolutionary models (
55)
or computed from the aligned homologous sequences (
26)
and thus are intended to find homologs. Therefore, convergent origin
of KH domains appears unlikely due to their highly significant sequence
similarity (
17–
19). At present, it is hard to discriminate between the third and fourth scenarios. The third scenario might
seem unrealistic, since it assumes the existence of a putative primordial βααβ domain, which might not be stable in the absence of the N- or C-terminal α-helix
to pack against the β-sheet. However,
it is likely that primordial proteins existed in tight contact
with RNA and might not be foldable in the absence of RNA molecules.
It is also reasonable to assume that primordial proteins were significantly
shorter than average present-day domains. The fourth scenario offers
a physically realistic model that might pass through an intermediate
protein containing both N- and C-terminal extensions before one
of the extensions was eliminated. There is a chance that such a ‘hybrid’ protein
still exists in nature. Thus, the discovery of a KH motif-containing
protein with topology αββααββα (a
combination of both type I and II domains: a four-stranded β-sheet
with four α-helices on one side; Fig. A, B, D and E) would be an argument favoring the fourth scenario.
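The topology bookkeeping behind this discussion can be sketched by writing each domain as a string of secondary structure elements in chain order. The example below is a minimal illustration under that assumption; the variable and function names are hypothetical. It builds the type I, type II and hybrid topologies from the shared βααβ unit, then tests whether type II is a rotation of type I, which is a necessary condition for circular permutation.

```python
KH_MOTIF = "βααβ"                 # shared KH motif unit (a, A, B, b)

TYPE_I = KH_MOTIF + "βα"          # C-terminal βα extension
TYPE_II = "αβ" + KH_MOTIF         # N-terminal αβ extension
HYBRID = "αβ" + KH_MOTIF + "βα"   # hypothetical protein carrying both extensions


def is_circular_permutation(x: str, y: str) -> bool:
    """True if y is a rotation of x, i.e. the same cyclic order of elements.

    Only element types are compared, so a True result would be necessary
    but not sufficient evidence for a real circular permutation.
    """
    return len(x) == len(y) and y in x + x


# Type I and type II contain the same elements in different cyclic order.
related_by_rotation = is_circular_permutation(TYPE_I, TYPE_II)

print(TYPE_I, TYPE_II, HYBRID, related_by_rotation)
```

Moving the C-terminal βα unit of type I to the N-terminus would give βαβααβ, which is a rotation of type I but not the αββααβ order of type II.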
Interestingly, the C-terminal extension
cC (Fig. A and B) in the type I KH domains required rearrangement of the β-sheet: hydrogen-bonding
between β-strands
a and
b of the putative ‘primordial’ KH domain must have been broken to accommodate the central β-strand
c. Typically, terminal extensions do not disrupt the β-sheet topology, but add to
the existing structural core, like the N-terminal extension
a′
A′ (Fig. D and E) in the type II KH domain. However,
the KH domain is not the first example for which the rearrangement
of β-sheet topology has been suggested.
Serine protease inhibitors, serpins, are known to undergo a conformational
change during which one of the β-strands
is inserted between the two hydrogen-bonded parallel β-strands
(
56). P-loop ATPases that display
statistically significant sequence similarity in Walker A and B
motifs (
57) are known to possess
several distinct topologies that can be transformed into one another
through the β-sheet rearrangement (
58,
59). β-Sheet rearrangement was also postulated
for triabin, which shares sequence similarity with lipocalins
but possesses a distinct topology (
14).
In summary, analysis of available spatial structures revealed that
there are two different KH domains that belong to different protein
folds, but share a single KH motif. The KH motif is folded into
a βααβ unit.
In addition to the motif core, type II KH domains (e.g. ribosomal
protein S3) include an N-terminal αβ extension and
type I KH domains (e.g. hnRNP K) contain a C-terminal βα extension. A β-strand of this extension in type I KH is inserted into the β-sheet
formed by the KH motif βααβ unit,
offering a clear example of a rare structural rearrangement. KH
domains demonstrate how proteins can change fold in the course of
evolution.