The sequences of rSK1 (gi 9506831), rSK2 (gi 9506833), rSK3 (gi 31543039), Cav1.2 (gi 158186633), α-actinin (gi 1142640), AMDAR (gi 167001419) and NMDAR (gi 11038637) were taken from the NCBI protein database in FASTA format. Subsequently, using the NCBI protein BLAST service and the BLOSUM62 matrix [44], we identified sequences homologous to these proteins in the non-redundant protein database within Archaea (taxid 2157) and Eubacteria (taxid 2), as well as vertebrate classes including Fishes (taxid 7898), Amphibia (taxid 8292), Aves (taxid 8782) and Mammalia (taxid 40674) (Table S1).
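As an illustration of how such a taxon-restricted search could be scripted, the following minimal sketch uses Biopython's `NCBIWWW.qblast` with an Entrez organism filter. This is not the procedure used here, which relied on the NCBI web service; the helper names `taxon_filter` and `blast_in_taxon` are hypothetical, and the remote call requires Biopython and network access.

```python
# Taxonomy identifiers as listed in the text.
TAXA = {
    "Archaea": 2157,
    "Eubacteria": 2,
    "Fishes": 7898,
    "Amphibia": 8292,
    "Aves": 8782,
    "Mammalia": 40674,
}


def taxon_filter(taxid):
    """Build an Entrez organism filter that restricts hits to one taxon."""
    return f"txid{taxid}[ORGN]"


def blast_in_taxon(fasta_text, taxid):
    """Run a remote blastp search against nr, limited to one taxon.

    Hypothetical helper: returns a handle to the XML results from NCBI.
    """
    from Bio.Blast import NCBIWWW  # imported lazily; needs Biopython

    return NCBIWWW.qblast(
        "blastp",
        "nr",
        fasta_text,
        matrix_name="BLOSUM62",
        entrez_query=taxon_filter(taxid),
    )


if __name__ == "__main__":
    for name, taxid in TAXA.items():
        print(name, taxon_filter(taxid))
```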
We calculate the thermodynamic, biophysical, and structural parameters ΔCp (change in specific heat), ΔCp(hyd) (change in hydration specific heat), ΔG(hyd) (change in Gibbs energy of hydration), ΔG(oct) (change in free energy of transfer from water to octanol), ΔG(wif) (change in free energy of transfer from water to POPC interface), ΔΔG(α-helix), GG4Br, ΔH(hyd) (change in enthalpy of hydration) and kProt for the sequences obtained from the BLAST searches. We consider ΔCp, ΔG(hyd) and ΔH(hyd) as parameters characterizing protein properties in the water phase. ΔG(oct), ΔG(wif) and ΔΔG(α-helix) play a role in the transition of proteins from the aqueous phase to the lipid phase. Finally, ΔCp(hyd), GG4Br, and kProt describe the behavior of proteins in the lipid phase.
We perform this calculation using the Hamid, Ali akbar, Maryam Data Analyser Machine (HAMDAM) software (freely available upon request). We calculate the hydration (hyd) parameters ΔCp(hyd), ΔG(hyd) and ΔH(hyd) of each sequence using the following equations [45]–[47]:
Where ΔX refers to the change in X from the native state to the unfolded state, ΔF(hyd) represents each of the three parameters, j is the residue position, ASA stands for the accessible surface area, and n represents the total number of residues in each sequence.
We obtain ΔCp from the following equation [48]:
To calculate ΔG(oct) [49], ΔG(wif) [50], [51], ΔΔG(α-helix) [52], GG4Br [53], and kProt [54] (all indicated with a “W” after the parameter name in figures), we employ the Sliding Window Recognizer (SWR) procedure [55]. This procedure reads the protein sequence within a window of a given number of residues and computes the parameters for the amino acids within that window, then slides forward one residue and repeats the process. We choose a window of 10 residues and calculate the parameter average for each window. We then report the average of these window averages over all windows. In the case of the ΔΔG(α-helix) parameter, although proline residues are generally considered helix breakers, their behavior differs in membrane proteins [56], which led us to treat proline as a helix maker within this subset of proteins. For the GG4Br parameter, the number of GXXXG[I/V] motifs is counted in each window. We perform ANOVA and PCA using the free software PSPP (http://www.gnu.org/software/pspp).
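The sliding-window procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the HAMDAM implementation: the function names are hypothetical, the per-residue scale in the usage example is a toy scale rather than values from the cited references, and counting overlapping motif occurrences is an assumption.

```python
import re
from statistics import mean

# GXXXG[I/V]: G, any three residues, G, then I or V.
# The lookahead lets overlapping occurrences be counted (an assumption).
MOTIF = re.compile(r"(?=G...G[IV])")


def windows(seq, width=10):
    """Yield every window of `width` residues, sliding by one residue."""
    if len(seq) < width:
        raise ValueError("sequence is shorter than the window")
    for start in range(len(seq) - width + 1):
        yield seq[start:start + width]


def window_averages(seq, scale, width=10):
    """Mean per-residue scale value within each window."""
    return [mean(scale[aa] for aa in win) for win in windows(seq, width)]


def average_of_averages(seq, scale, width=10):
    """Single value per sequence: the mean of all window means."""
    return mean(window_averages(seq, scale, width))


def motif_counts(seq, width=10):
    """Number of GXXXG[I/V] motifs found inside each window."""
    return [len(MOTIF.findall(win)) for win in windows(seq, width)]


if __name__ == "__main__":
    toy_scale = {"A": 1.0, "L": 0.0}  # illustrative values only
    print(average_of_averages("AALL", toy_scale, width=2))
    print(motif_counts("GAAAGVGAAAGI"))
```

Because windows near the sequence ends overlap fewer neighbours, terminal residues contribute to fewer window averages than central residues, so the average of averages is not identical to a plain per-residue mean.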
To produce the alkaline phosphatase (APHO) sequences APHO18A3L, APHO16A5L and APHO14A7L, we generate three peptide constructs with the 18A3L, 16A5L and 14A7L amino acid compositions. To sample different sequences for each of the three amino acid compositions, we generate 2000 random sequences for each peptide and insert them into the corresponding site in alkaline phosphatase.
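A minimal sketch of this random-sequence generation is given below. It assumes that a label such as 18A3L denotes 18 alanine and 3 leucine residues, and it draws each sequence by shuffling that fixed composition. The host sequence and insertion position in the usage example are hypothetical placeholders, not the actual alkaline phosphatase sequence or site.

```python
import random

# Assumed reading of the composition labels (e.g. 18A3L = 18 Ala + 3 Leu).
COMPOSITIONS = {
    "18A3L": {"A": 18, "L": 3},
    "16A5L": {"A": 16, "L": 5},
    "14A7L": {"A": 14, "L": 7},
}


def random_peptides(composition, count=2000, seed=None):
    """Return `count` random orderings of a fixed amino acid composition."""
    rng = random.Random(seed)
    residues = [aa for aa, n in composition.items() for _ in range(n)]
    peptides = []
    for _ in range(count):
        rng.shuffle(residues)  # in-place shuffle keeps the composition fixed
        peptides.append("".join(residues))
    return peptides


def insert_segment(host, segment, position):
    """Insert `segment` into `host` before index `position`."""
    return host[:position] + segment + host[position:]


if __name__ == "__main__":
    host = "MKQSTIALALLPLLFTPVTKA"  # placeholder host, for illustration only
    for label, composition in COMPOSITIONS.items():
        peptides = random_peptides(composition, count=2000, seed=1)
        constructs = [insert_segment(host, p, 10) for p in peptides]
        print(label, len(constructs), len(set(peptides)))
```

Shuffling samples with replacement from the set of distinct orderings, so repeated sequences can occur, particularly for compositions with few leucines.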