The analysis of thermal unfolding MD trajectories provides insight to reveal the key phenomena of the first steps of protein unfolding. The study of structural aspects of states along the unfolding pathway may reveal the basis of wild type PHD2 malfunction in some types of tumors.
Variations in protein dimension and native contacts are indicators of protein structural changes during thermal denaturation. We compute the radius of gyration and the fraction of long-range native contacts for the structures of PHD2 species along 25 ns thermal unfolding simulations (). The corruption of long-range native contacts is apparent by the decrease in Qsh value for the a-PHD2 (nascent form of PHD2), f-PHD2 (PHD2 with iron in active site) and fh-PHD2 (f-PHD2 with protonated histidine residues) structures upon heating. Also, an increase in protein radius of gyration indicates that the structures swell as a consequence of partial thermal unfolding. Changes in these metrics indicate that, despite identical protein sequence and initial structure, these three different species of PHD2 exhibit different properties upon unfolding.
Representation of unfolding process metrics.
A suitable reaction coordinate is necessary to study the unfolding pathway chronologically. The average distance of query structure’s properties from the native state properties (d
) is a suitable reaction coordinate to analyze protein unfolding 
. We measure the distance (d
) of PHD2 unfolded structures from the crystal structure. This metric assigns a value of zero to native structure and one to the most unfolded structures. Probability densities of d
indicate that, while most structures are a large distance from the native state and therefore unfolded, the distributions of populated structures vary between the different PHD2 species ( and Tables S1
). This finding implies the possibility of various unfolding pathways for PHD2. For example, f-PHD2 has a clear population when it loses 45% of its similarity to its native state properties (d
0.45) while other species do not have such population. On the other hand, the most populated structure of fh-PHD2 appears at d
0.7 and other two species have the most populated structures in other regions of reaction coordinate.
The distributions of d metric as a reaction coordinate upon thermal unfolding.
To confirm the distance (d
) distributions of structure populations, we perform PCA with the properties used to construct d
as the reaction coordinate. PCA reduces the multidimensionality of property space 
. The first two components of PCA cover over 95% of data variance. The 2D kernel density maps of principal components 1 and 2 indicate different highly populated regions for partial unfolding process of PHD2 species (). Because the 2D kernel density of PCA components is a population density (Xi
), it is possible to consider it as 2D potential of mean force (PMF) map for unfolding trajectories (PMF
2D kernel densities of PCA for the PHD2 species unfolding states.
The appearance of highly populated regions in the 2D PCA maps (darker regions) demonstrates that three species of PHD2 have different populated states during thermal unfolding, implying a multi-state model of unfolding. The populated regions of such maps are in good consistency with d kernel density distributions (). The 2D PCA map for a-PHD2 indicates this species has four populated states in the most unfolded region of its trajectory (). The coordination of Fe (II) to the active site lumen of a-PHD2 creates the f-PHD2 species, decreasing the number of states to two. Protonation of histidine residues in f-PHD2 (fh-PHD2) reduces its unfolded populations to a single populated state ().
What are the properties of the thermal unfolding states of the various PHD2 species? How do the addition of iron and protons to the PHD2 structure change the prominent states of thermal unfolding? In the following section we attempt to answer these questions and reveal the consequences of PHD2 species partial unfolding.
Notably, PCA presumes a linear relationship between variables, and thus ignores non-linear interrelationships between variables. To compensate for this effect, we perform non-metric multidimensional scaling (n-MDS) for the properties used to construct the d
. We use the first two-fitted n-MDS configuration vectors to cluster trajectory structures into separate populations. Although data compressing methods such as PCA and n-MDS reveal separate populations for PHD2 species unfolding, they are unable to find the border and a representative structure for each population. To provide these data, we utilize exemplar-based affinity propagation (AP) clustering 
. The AP clustering of n-MDS outputs reveals the same populations as when using the d
metric distribution and 2D-PCA maps for thermal unfolding of PHD2 species (). The representative structures are the centroids of the clusters.
The results of structures clustering by Affinity propagation (AP) method for PHD2 species.
Now, we are able to compute various structural and thermodynamical properties of each state (AP cluster) along the unfolding pathway (Tables S4
). To simplify the representation of at least 20 different parameters for 7 states of each PHD2 species, we compute the information content of each averaged parameter for all states of each species. A variable with high information content is likely to be important in the process of unfolding. By comparing the information content of each property between different PHD2 species, we capture which properties are more influenced by the addition of iron or protons to a-PHD2 ().
Comparing the percentage of PHD2 species contribution to each variable during partial unfolding.
The stand-out variables in information content graph are the accessible surface area of tryptophan residues (s.Wxxx.ASA). Tryptophan 258 (W258), which resides at the entrance of the PHD2 active site lumen, is the first tryptophan residue that reaches its maximal exposure in the a-PHD2 species, which occurs when the protein reaches a state that is only 40% unfolded (d=
0.4), compared to other tryptophan residues in a-PHD2 that do not become exposed until the last steps of protein unfolding (Table S4
). Two tryptophan residues (W334 and W389) located along the active site lumen reach their maximal solvent accessibility when fh-PHD2 becomes 60% unfolded. Tryptophan 389 (W389) reaches its maximal accessibility in f-PHD2 when the protein is in 70% unfolded structure. These observations indicate that the coordination of the Fe (II) ion to the PHD2 active site sensitizes tryptophan 389 to structural changes before the completion of thermal unfolding. The acidic environment in fh-PHD2 induces the selective exposure of tryptophan 334 upon PHD2 partial unfolding. Knowledge of the effects of selective exposure of tryptophan residues as the consequence of iron or proton addition are helpful for designing single molecule fluorescent labels for further experimental study of PHD2 unfolding pathway.
Another parameter whose information content varies between the three PHD2 species is the change in heat capacity (ΔCp) along the unfolding pathway ( and Table S4
). The ΔCp reaches its maximum value in f-PHD2, a-PHD2, and fh-PHD2 sequentially. This results in the f-PHD2 hydrophobic surface becoming accessible to solvent in the early steps of protein unfolding.
By scrutinizing the decay rate of strand structures (l.E%) (Table S4
), we observe that f-PHD2 strand structures are more stable than other PHD2 species, although they reside at the same distance from the native structure. The structure of f-PHD2 retains 50% of its strand structures even though the protein is far from the native state (d
0.8). The amount of hydrophobic surface exposure and the percent of remaining strand structures suggest that the common version of cellular PHD2, f-PHD2, may be vulnerable to aggregation upon partial denaturation 
. These observations provide misfolding and aggregation as a possible mechanism for wild type PHD2 malfunction in normoxic tumor cells.
For further analysis of the aggregation propensity of the states of the various PHD2 species, we compute changes in the protein hydration free energy (ΔG(h)
) for the representative structures of three species of PHD2 using the APBS software. This software is able to compute the hydration free energy of the polar and apolar parts of each structure (). A positive ΔG(h)
indicates an unfavorable hydration upon unfolding 
. The a-PHD2 species has an unfavorable total hydration free energy when it passes 50 or 90% of the way along its unfolding pathway. Therefore, considering ΔG(h)
, a-PHD2 is aggregation prone in most unfolded states. In contrast, fh-PHD2 is hydrated all over the unfolding pathway (). The functional form of PHD2, f-PHD2, does not hydrate along the unfolding pathway. The change in total hydration free energy, the strand structural content, and the ΔCp suggest that f-PHD2 is the aggregation prone species of PHD2.
The changes of protein solvation free energy upon unfolding.
The quantity of the polar hydration free energy also indicates an odd behavior of the f-PHD2: that the amounts of polar solvation of the f-PHD2 structures do not facilitate f-PHD2 unfolding (the polar part of ΔG(h) is usually negative because polar regions become hydrated easily). It is possible that the iron in the f-PHD2 structure causes gathering of polar regions around some nuclei while allowing hydrophobic regions to become exposed during the protein partial unfolding. Therefore, Fe (II) possibly contributes to f-PHD2 aggregation.
Because the iron atom plays critical role in PHD2 function and also is a risk factor for possible PHD2 aggregation, we study in detail the iron experienced events during PHD2 unfolding.
There is an arginine residue (R383) at the end of PHD2 active site lumen which is essential for substrate binding. The position of R383 does not change severely during PHD2 partial unfolding. The measurement of iron – R383 distance indicates that the distance between iron and R383 is increased suddenly in fh-PHD2 unfolding () while such distance increment did not observe for f-PHD2 unfolding. Possibly, such Fe atom jump is necessary to alleviate harsh conditions during fh-PHD2 unfolding.
The consequences of Fe atom detaching from PHD2 active site lumen.
The study of Fe atom – basic residues interactions indicates the repulsion between the fh-PHD2 Fe atom and basic residues decreases and reaches to a minimum during first 5 ns where the distance between Fe atom and active site lumen is increased (1). On the other hand, the interaction of Fe atom and acidic residues is attractive for f-PHD2 (Figure S2
). There are two important strand structures in PHD2 active site lumen. A strand is made by residues from 308 to 320 (strand HD). This strand carries D315 and H313 residues. These two residues with an additional histidine (H374) which resides in the second important strand (residues 372 to 377, strand H) make PHD2 active sites. The strand H and HD cover the floor of active site. The study of interaction energy between strand H and HD indicates that these two strands interaction energy is decreased gradually during f-PHD2 unfolding. It means the floor of f-PHD2 active site is disrupted upon unfolding while the interaction energy between fh-PHD2 strand H and HD does not change critically ().
We propose the protonation of histidine residues makes additional electrostatic attractions between strand H and HD in fh-PHD2. Also it is concluded the Fe atom is pushed out from fh-PHD2 active site lumen therefore it does not make trouble for active site floor integrity. Such Fe atom pushing out causes severe rupture of f-PHD2 active site floor (Figures S3
). These observations denote that D315 traps the Fe atom during PHD2 unfolding. D315 acts as a relay between strand HD and the outside of active site lumen. In this study, it is inferred that D315 is free to guide Fe atom from fh-PHD2 active site lumen to outside. While the mentioned aspartate residue is not free to relay Fe ion movement in f-PHD2 therefore the Fe-rooted repulsion between strand H and HD increases. Such Fe residence in f-PHD2 structure possibly prepares conditions for starting protein assembly and seeding the aggregation.
To estimate the total stability of each PHD2 species, we need a structural criterion to derive protein stability from a single temperature protein unfolding trajectory. To derive such a criterion, we return to the basic definition of the transition temperature (Tm), where the populations of native and unfold structures are in equilibrium.
For a structure x
residing at dx
along the unfolding pathway, we compute the fraction of structures within 3 angstroms RMSD of x
, and the fraction of structures within 3 angstroms RMSD of x
. The point at which these two fractions become equal represents the transition point and its d
value (). The junction in this graph represents the position of the main transition point of PHD2 species thermal unfolding. We set the cutoff of pairwise RMSD to 3 angstroms because the average RMSD value at d
0.5 with respect to the native state is 3 angstroms. At d=0.5
the PHD2 structures are not completely folded nor completely unfolded, and hence do not have a bias toward either state.
The maximal stability of PHD2 species.
The junction point appears in d
0.6 for a-PHD2, while the transition points of f-PHD2 and fh-PHD2 appear at d
0.72 and 0.70 respectively. Therefore, the coordination of Fe (II) to PHD2 structure improves the protein stability, but at the same time prominently enhances the protein aggregation propensity in response to partial unfolding. Possibly, the coordination of the Fe (II) ion to PHD2 stabilizes the states that are reactive for aggregation.
We conducted MD simulations of the partial thermal unfolding pathways of three species of the PHD2 protein. We clustered structures along the MD trajectories and characterized their corresponding unfolded states.
Partial unfolding is a critical step for protein misfolding and aggregation. We inferred that immature PHD2 (a-PHD2) may aggregate at the last stages of unfolding. However, the functional form of PHD2 is aggregation prone even in the first stages of unfolding.
The addition of the Fe (II) and protons to PHD2 structure severely changes its unfolding pathway states. Among the PHD2 species, f-PHD2 unfolding states may be more susceptible for misfolding and aggregation. Such misfolding propensity arises from positive ΔG(h), high content of strand structures as aggregation triggering structures, Fe ion derived rupture of structure and high exposed hydrophobic area during unfolding. It may provide explanation for PHD2 malfunctions in some normoxic tumor cells.