|Home | About | Journals | Submit | Contact Us | Français|
Myosin performs ATP free energy transduction into mechanical work in the motor domain of the myosin heavy chain (MHC). Energy transduction is the definitive systemic feature of the myosin motor performed by coordinating in a time ordered sequence: ATP hydrolysis at the active site, actin affinity modulation at the actin binding site, and the lever-arm rotation of the power stroke. These functions are carried out by several conserved sub-domains within the motor domain. Single nucleotide polymorphisms (SNPs) affect the MHC sequence of many isoforms expressed in striated muscle, smooth muscle, and non-muscle tissue. The purpose of this work is to provide a rationale for using SNPs as a functional genomics tool to investigate structurefunction relationships in myosin. In particular, to discover SNP distribution over the conserved sub-domains and surmise what it implies about sub-domain stability and criticality in the energy transduction mechanism.
An automated routine identifying human nonsynonymous SNP amino acid missense substitutions for any MHC gene mined the NCBI SNP data base. The routine tested 22 MHC genes coding muscle and non-muscle isoforms and identified 89 missense mutation positions in the motor domain with 10 already implicated in heart disease and another 8 lacking sequence homology with a skeletal MHC isoform for which a crystallographic model is available. The remaining 71 SNP substitutions were found to be distributed over MHC with 22 falling outside identified functional sub-domains and 49 in or very near to myosin sub-domains assigned specific crucial functions in energy transduction. The latter includes the active site, the actin binding site, the rigid lever-arm, and regions facilitating their communication. Most MHC isoforms contained SNPs somewhere in the motor domain.
Several functional-crucial sub-domains are infiltrated by a large number of SNP substitution sites suggesting these domains are engineered by evolution to be too-robust to be disturbed by otherwise intrusive sequence changes. Two functional sub-domains are SNP-free or relatively SNP-deficient but contain many disease implicated mutants. These sub-domains are apparently highly sensitive to any missense substitution suggesting they have failed to evolve a robust sequence paradigm for performing their function.
Single nucleotide polymorphisms (SNPs) are common single base DNA sequence variants that account for a sizable portion of the genetic variability between individuals. Some SNPs are common and have minor allele frequencies that approach 50%, while others are found much less frequently. There is some conceptual overlap between rare SNPs (with minor allele frequencies of less than 1%) and disease implicated mutations, but in common usage the term polymorphism is restricted to non-pathogenic sequence changes. Genome SNP patterns are fingerprints identifying subpopulations with common heritage that potentially instigate subpopulation specific traits exploitable for individualized treatment of disease. For the purposes of this discussion, it is implicit that the SNPs accumulate in the genome because they negligibly affect survival, however, we recognize that some SNPs may in fact have deleterious effects on the expression or function of protein products.
Another distinction that is germane to the present work is between synonymous and nonsynonymous SNPs. SNPs that lie within the coding region are synonymous if they do not change the amino acid specified by the codon and are nonsynonymous when they do. Even though synonymous SNPs will not change protein sequence they might still impact expression of a protein product by influencing the subcellular localization, stability or translational efficiency of mRNA [1,2]. The nonsynonymous SNP introduces a single amino acid change into the protein sequence. Assessing the impact of such protein sequence changes on protein function can be difficult. For most SNPs whose minor allele frequency is at least 5%, it is likely that their impact on genetic fitness is minor . SNPs that clearly have harmful functional consequences only rarely become more common than this 5% level in the population as a whole. For less common SNPs, homozygous individuals that have two copies of the SNP are rare and deleterious consequences can often be masked by the presence of the more common functional allele. Another factor in deciding the significance of the nonsynonymous SNPs is genetic redundancy. The affected gene could be one copy among several present in the genome. These factors decide the prevalence of the affected protein in the tissues and may be more important for survival than how the SNP affects protein function. Furthermore, a SNP induced loss of functionality can sometimes be compensated by redundant functionality in the larger physiological system. An example is skeletal muscle where the relative composition of the myosin isoforms depends on environmental factors such as exercise [4-6] or zero gravity . The muscle tissue may adapt to loss of function in the SNP affected isoform to assure survival. Finally, while SNPs accumulated in the data base are identified world wide without apparent prejudice, they are not necessarily a random sampling of the genome. SNP frequency in different gene regions could reflect uneven DNA sequencing accuracy and efficiency or other unknown biases inadvertently built into the data base. Our results are implications from the current data base that changes and grows daily.
Here we identify and discuss the myosin heavy chain nonsynonymous SNPs. We will include even scarce SNPs where homozygous individuals are rare. We do so in the spirit of being comprehensive, with the full understanding that neutrality of function cannot be assumed in these cases. Additionally, some SNPs discussed appear in the database, but have not been confirmed by other groups. Our purpose is to provide a rationale for using these reports to investigate structure-function relationships in myosin. Functional sub-domains within myosin have emerged as disturbance propagators essential to transduction. They are structural entities in which to identify SNP substitution sites and focus speculation on how SNPs influence myosin function.
The myosin heavy chain (MHC) consists of a globular head domain called subfragment 1 (S1) and C-terminal tail responsible for dimerization and cargo binding. In muscle, myosin dimers form (thick) filaments with interacting tails and head domains projecting outward from the filament to associate with complementary actin (thin) filaments in a muscle sarcomere . Cellular myosins have several forms [9-12]. The cellular myosin V dimer is a processive motor utilizing the two heads cooperatively to move along actin filaments .
S1 contains functional sub-domains that retain their structure, but move relatively, during transduction . ATPase transduction starts in the active site composed in part by the P-loop, Switch 1, and Switch 2 polypeptides (linked to a 7-stranded β-sheet separating the active site from the actin binding site) that sense the γ-phosphate position and coordination. ATP hydrolysis in the active site is not followed immediately by phosphate release because the Switch 1 R246 and Switch 2 E469 salt-bridge "back door" inhibits phosphate release until actin binds. With actin binding, the back door opens to release phosphate and the Switch 2 α-helix (or relay helix) transmits linear force originating from the active site to a converter domain converting linear force into torque to rotate the lever-arm. The lever-arm α-helix, rigidified by the bound myosin light chains (MLCs), amplifies displacement and impels myosin relative to actin [15,16]. A region between the switch 2 helix and the converter domain, the SH2/SH1 hinge, undergoes substantial conformation change during the transduction. The 7-stranded β-sheet, switch 2 helix, converter, lever-arm, and SH2/SH1 hinge are the sub-domains identified in the skeletal myosin crystal structure depicted in Figure Figure11.
Multiple peptides within S1 (C-loop, Myopathy loop, and others) constitute the actin binding site. This site alternates between weak and strong actin binding affinity in coordination with the lever-arm swing. Phosphate release in muscle myosin ATPase (or the opening of the R246/E469 back door salt-bridge) is rate limiting due to its inhibition by a large entropic energy barrier in the absence of actin. Phosphate release initiates lever-arm swing hence the rate limiting step inhibits progress through the cycle until actin binds and a work producing lever-arm power stroke is possible. Coupling actin binding to reducing the energy barrier to phosphate release is an allosteric effect predicted for Myopathy or C-loop structured surface loops [18,19]. We showed the C-loop is the allosteric actin contact sensor mediating bidirectionally between the actin-binding and active sites . The C-loop lowers the energy barrier to phosphate release by coupling structural transitions within the S1 that allow phosphate release without inducing strain in the 7-stranded β-sheet. Several actin binding myosin surface loops are identified in Figure Figure11.
In this paper we locate nonsynonymous SNPs within the functional sub-domains and evaluate their expected functional impact based on: i) how well the native residue is conserved among isoforms, ii) comparison of general physical characteristics (size, charge, hydrophobicity, etc.) between the native and substituted side chain, and iii) (when possible) the hypothetical role of the native residue in transduction and how this role could be affected by the substituted side chain. We find that SNPs infiltrate nearly every corner of the myosin head and that their significance may lie in where they are tolerated in contrast to where they are excluded.
Twenty-two human myosin heavy chain genes MYH1-4, MYH6-15, MYH7B, MYOIE, MYO6, MYO7A and 7B, MYO9A and 9B, and MYO10 were searched for nonsynonymous SNPs as described in METHODS. We confined the search to SNPs in the S1 domain. Table Table11 summarizes results with the SNP reference number locating it in the human genome and in the NCBI data base, the sequence position relative to the skeletal myosin 2× sequence, the MHC gene and myosin isoform it encodes, the position for the SNP within myosin domains, the average percentage of heterozygous genes (one allele normal the other SNP substituted) in the standard sub-populations, whether (Yes or No) there are SNP substituted homozygous genes (both alleles SNP substituted) in the standard sub-populations or observed in screened individuals, and the number of individuals screened. Missing data is denoted with a hyphen.
Most MHC isoforms contribute SNPs including the fast skeletal muscle isoforms 2×, 2a, 2b, embryonic, and perinatal; α- and β-cardiac myosins; smooth muscle variants sm1A and sm2B; myosin V; super-fast myosin; two isoforms of MYH14; MYH15; and other unconventional myosins including IE, VI, VIIA, VIIB, IXA, IXB, and X. Myosin 2b (MYH4) is a fast skeletal myosin isoform apparently expressed in a skeletal jaw closing (masseter) muscle . The embryonic and perinatal isoforms (MYH3 and 8) are developmentally regulated [22,23]. The embryonic isoform also expresses in regenerating muscle and in extraocular muscles (EOM) . A mutation in perinatal myosin is implicated in a rare genetic disease causing distal arthrogryposis syndrome . The α-cardiac isoform (MYH6) is predominantly expressed in the atrium . The β-cardiac isoform (MYH7) is predominantly expressed in the heart ventriculum and in slow or type 1 skeletal muscle fibers. Numerous heart disease implicated mutations occur in the β-cardiac MHC . Another myosin isoform (MYH7B/MYH14) on chromosome 20 is called a β-cardiac myosin with 85% homology with the β-cardiac myosin coded by MYH7.
The MYH11 gene encodes the 4 smooth muscle myosin isoforms that differ near the C-terminus (isoforms 1 and 2) or at Loop 1 where a seven amino acid peptide is omitted (isoform A) or inserted (isoform B) . The Loop 1 insert correlates with tissue localization  and enzyme kinetics [30,31]. Nucleotide-induced fluorescence intensity increase in the highly conserved W511 occurs in all but one myosin isoform implying it results from common origin within the conserved motor core. The smA isoform has nucleotide-induced tryptophan fluorescence enhancement like other myosin isoforms but we showed with site-directed mutagenesis that the contributing tryptophan is probably not W511. These results suggested that smA and smB myosin isoforms have different influence propagating pathways emanating from the active site through the switch 2 helix containing W511 to the converter domain .
Processive myosin V (MYH12) translates over actin filaments using a hand-over-hand mechanism [33-35]. The super-fast isoform (MYH13) expresses in EOMs performing diverse functions including eye movement . The EOMs are more complex than limb muscles with 9 or more different MHCs expressed in adult muscle including the fast skeletal, β-cardiac, and embryonic MHC. The super-fast myosin supports higher velocity and lower tension contractions . The MYH14 gene codes for 2 myosin isoforms (iso1/iso2 in Table Table1)1) expressed in muscle and non muscle tissue. The MYH14 gene is implicated in hearing disorders . SNP variants in MYH15 are associated with elevated risk for heart disease .
Myosin IE (MYOIE) is a cellular myosin with an extended C-loop thought to interact with tropomyosin in tropomyosin-containing actin [39,40]. Myosin VI is a reversedirection motor with core structure resembling forward-direction motors except for two inserts (1 & 2). Insert 1, near the active site and Switch 1, is probably partially responsible for the slow ADP release (A·M^·D→A·M in Scheme Scheme1,1, METHODS). Insert 2 modifies the converter domain to re-direct lever-arm swing for reverse-directed motility [41,42]. Mutations in myosin VII Aassociate with Us her syndrome that in its most severe phenotype is a deafness-blindness disorder [43,44]. Myosin IXB is a single headed processive motor that is unique because its rate limiting step is not ADP release but ATP hydrolysis (M*·T→M**·D·P in Scheme Scheme1)1) suggesting it remains actin bound even in a weak actin binding state . Slow dissociation from actin is attributed to tethering by a unique Loop 2 insertion. Myosin X is a membrane associated motor involved in filopodia motility  that has novel monomeric and dimeric conformations at physiological protein concentrations . Calmodulin or calmodulin like protein (CLP) are the lever-arm bound myosin light chains in myosin X . Myosin X is believed to be processive consistent with its function in the cell.
Three SNPs, R246H, E469Q (Figure (Figure2)2) and G701R (Figure (Figure3),3), are remarkable because of the functional significance of the residues they modify. R246 and E469 form the Switch 1/Switch 2 salt-bridge at the γ-phosphate in bound ATP preventing product release following hydrolysis in the active site of S1. Site-directed mutagenesis of Dictyostelium discoideum (Dicty) myosin produced R246E, E469R, and the double mutant R246E/E469R reversing the polarity of the salt-bridge . Only the double mutant remained functional but with altered ATPase kinetics. Similar experiments using expressed smooth muscle myosin suggested R246 is significant for back door closure after ATP binding to the active site and that E469 is important for ATP hydrolysis by its positioning of a water molecule for nucleophilic attack on the γ-phosphate . The E469Q SNP substitution does not necessarily prohibit the latter mechanism although E469A/R mutants eliminated the phosphate burst and actin-activated ATPase. The R246H SNP substitution alters size but not charge of the side chain while R246A/E mutants likewise eliminated the phosphate burst and actin-activated ATPase. The loss of the phosphate burst and actin-activated ATPase indicates the phosphate release step is no longer rate limiting undermining the efficient energy transduction in muscle because the ATPase cycle disengages from the work producing interaction with actin. The SNPs suggests it would be interesting to test the R246H and E469Q substitutions in an in vitro system. R246H occurs in myosin VIIA where ATPase is rate limited by ADP release rather than phosphate release possibly explaining its tolerance to substitution. E469Q occurs in β-cardiac myosin where ATPase is rate limited by phosphate release. R246H and E469Q occur with unknown or apparently low probability for expression, respectively.
G701R in super-fast myosin replaces the glycine swivel with a residue containing a bulky side chain inhibiting free rotation in the Ramachandran angles that is characteristic only to glycine . The highly conserved G701 pivot was found to be essential for motor activity in skeletal  and Dicty  myosins. S1 crystal structures depicting myosin conformation in M**  and M*  lever-arm orientation states indicate G701 swivels 40-50 deg about two axes with the lever-arm swing . Unlike nearby swivel-candidate residues, G705 and G712, the G701 maintains conformations in the pre- and post-powerstroke states that only glycine can readily accommodate. If the G701R substituted protein truly functions in healthy individuals, it suggests the super-fast myosin has a substantially modified transduction mechanism to accomplish its high velocity contraction. The G701R substitution occurs with an average heterozygous population in percent (AHP) of 2.8% and with homozygous individuals identified.
Figure Figure22 shows the 7-stranded β-sheet separating the active from the actin binding sites and numerous residue positions with SNP substitutions. Substitutions H170Y (R170 in MYH1), D171E (E171 in MYH1), and N172D modify β4; L463P modifies β5; while I253V and E257V (T257 in MYH1) modify β6 within the β-sheet structure. SNP substitution R675Q modifying β3 is implicated distal arthrogryposis syndrome  and in heart disease together with SNP substitution R252Q modifying β6 . The R/Q substitution replaces a large +charged side chain with a polar, neutral, and smaller side chain where total surface area changes by ~52 Å2. I/V, D/E, H/Y and N/D substitutions are conservative of charge, size, and polarity. E/V substitution is not conservative of charge, size, or polarity. The L/P substitution could be the most perturbative due to proline's backbone restrictive geometry inducing strain in the β-sheet. Distortions in this β-sheet accompany events associated with ATP hydrolysis and product release since the P-loop, Switch 1, and Switch 2 are closely linked with the β4, β6, and β5 strands, respectively . Furthermore, crystal structures of myosin V without bound nucleotide models the actin-attached rigor conformation and suggests strong binding with actin closes a cleft within the 50 kDa domain of S1  causing distortions in the β1, β2, and β3 strands . Thus many aspects of energy transduction impact the 7-stranded β-sheet and in particular, strands 3-6 containing the SNP substitutions. Our MCMD simulation suggests distortion of the β-sheet in the β5 and β7 strands provides a substantial part of the entropy dominated energy barrier to product release in the absence of actin . In particular, β5 residues Y458 and I460 (Figure (Figure2)2) have solvent accessible surface areas (ASA) that undergo dramatic increases with barrier transition contributing significantly to the entropy dominated free energy barrier. Simulation suggests residues on β5 and β7 should more likely perturb the contraction mechanism implying the SNP substitution L463P modifying β5 may again have special significance. These SNPs occur in a variety of myosin genes and isoforms but most (including L463P) are relatively new entries that have not undergone further verification to test their presence in sub-population data bases.
Figure Figure22 also shows Switch 1 and the R240W and R246H SNP sites adjacent to the 7-stranded β-sheet, E681K adjacent to β3, Switch 2 and the E469Q SNP site adjacent to β5, T258K appearing between the β6 (AA248-257) and β7 (AA265-271) strands, and D108E (E108 in MYH1) upstream from β1 and adjacent to the 7-stranded β-sheet. Additionally, I88M, E89Q, and H98Q are within a few angstroms of β1, while R445Q and N447K are within a few angstroms of β7. The R240W substitution provides an opportunity to insert a harmless spectroscopic probe (i.e., W240) into the Switch 1 peptide to sense directly the back-door dynamics and induced strain in the 7-stranded β-sheet. The D108E SNP reiterates the natural cardiac/skeletal myosin isoform sequence substitution. I88M, E89Q, and H98Q connect the 7-stranded β-sheet to SH3. N447K appears within a cluster of heart disease linked mutations . R445Q has an AHP of 18% and comes from MYH15. The next largest AHP SNP at 2.5% from around the 7-stranded β-sheet is T258K in super-fast myosin. Seventeen SNP substitutions (excluding disease linked R252Q and R675Q) in, or proximal to, the 58 residue 7-stranded β-sheet is unexpected given the significance attributed to the region.
Figure Figure44 shows the Switch 2 helix underlying the 7-stranded β-sheet where β5 and the Switch 2 helix are linked by Switch 2, and sites for the eight SNP substitutions, Q478H, Y483H (F483 in MYH1), N485I, H494D, H495Y, F497L, and K505N, modifying the Switch 2 helix. SNP substitution E502K is implicated in heart disease . The Switch 2 α-helix transmits linear force originating from the active site to the converter domain. Helix rigidity could be essential to force transmission. Q478 is a highly conserved residue at the beginning of the helix where it could maximally affect force transmission efficiency by reducing rigidity. The Q478H substitution tests this hypothesis because the Q/H substitution is helix destabilizing  suggesting helix stability may not be crucial. Other SNP substitutions, N485I, H495Y, F497L and E502K, are helix stabilizing while Y483H, H494D and K505N are destabilizing. K505N is at the C-terminal end of the helix where destabilization would be least detrimental. H495 is a potentially sensitive point where the helix develops a kink in the pre-power stroke M** conformation . The H495Y substitution from MYH15 has an AHP of 38% and homozygous individuals were identified. A H495W mutation might sense Switch 2 helix movement by tryptophan fluorescence changes and represents a desirable signal donor in the transduction mechanism. Highly conserved residues on the tip of the switch 2 helix, Y504 and I509, are thought to mediate interaction with the converter domain but are not affected by known SNPs.
Figure Figure33 shows the SNP substitutions sites for the SH2/SH1 hinge [M688V, V694E (L694 in MYH1), R695H (H695 in MYH1), G701R, R707S, R710S], converter [R743Q (K743 in MYH1)], and lever-arm [L785M and D787N (Q787 in MYH1)]. SNP substitutions V720I, Y723C, and P735S in the converter, and A801T in the lever-arm, are implicated in heart disease . The converter domain position P735 is implicated is heart disease by a P735L substitution in β-cardiac MHC . The P735S SNP replacement is homozygous in the 4 subpopulation groups tested suggesting the reference sequence is incorrect.
The converter domain has been implicated as the element undergoing elastic distortion in the myosin accompanying force development prior to mechanical work production [58,59]. Site directed mutagenesis in the SH2/SH1 hinge--converter interface, and in the converter domain proper, provided clues for how it performs the conversion of linear translation in the Switch 2 helix to the rotation of the lever-arm . Mutations F713A or F768A completely eliminated or sharply inhibited the ability of smooth muscle myosin to function mechanically. Standard kinetics experiments looking at the ATP binding, hydrolysis, product release, and the fluorescence from the ATP sensitive tryptophan (W511) indicate active site kinetics and transmission of force through the Switch 2 helix are fully intact in the mutants. The findings suggest that the mutations perturb only the converter function. Other experiments, specifically fluorescence resonance energy transfer (FRET), actin sliding assay, and single myosin force development show that both mutants are compromised mechanically. The F713A mutant causes a catastrophic mechanical failure due to the loss of the hydrophobic contact between the SH2/SH1 hinge and converter. The F768A mutant looses the rigidity of the Switch 2 helix/converter contact at F768 making a motor that can rotate its lever-arm but with less force. The SH2/SH1 hinge has a high density of SNP substitutions with 6 occurring in a 28 residue peptide. SNP sites at R707 and R710 are highly conserved among myosin isoforms but do not appear to enter into the crucial interaction with the converter domain. M688 is not conserved among the myosin isoforms.
The sole converter domain SNP substitution not implicated in disease is R743Q that is also common with an AHP of 13.5%. The SNP occurs in myosin X that has a low converter domain sequence homology with myosin 2×. Myosin VI, VII, and X have a unique structure at the head-tail junction wherein a monomeric single α-helix (SAH) competes with the dimeric coiled-coil [9-11]. The SAH domain might function as an elastic element undermining converter domain significance in this isoform. Hypothetically, the converter domain is sensitive to mutation because it usually needs to perform the two functions of the linear-to-rotary motion converter and elastic element.
The L785M, D787N (Q787 in MYH1), and A801T SNP substitutions modify the leverarm. Mutation in this region could interfere with essential light chain (ELC) binding. The loss of light chains, either regulatory (RLC) or ELC, affects myosin morphology and its motor function probably by lowering the rigidity of the lever-arm domain . Either substitution could also directly perturb the lever-arm structure and rigidity also leading to loss of function. None of these possibilities may be happening due to the lever-arm SNP substitutions because their consequences would be severe. Aside from G701R and R743Q, SNPs shown in Figure Figure33 affect a small or unknown percentage of the population.
Figure Figure55 shows the SH3 domain (AA30-80) and the sites for the P31T, G57R, V60I, and K68N (A68 in MYH1) substitutions. The SH3 domain is a β-barrel located close to the myosin active site but with uncertain function. The SH3 domain undergoes significant conformational change during ATP hydrolysis including at the G57 site where Ramachandran angle changes consistent with a glycine residue will be inhibited by steric clash with the arginine side chain. Smooth muscle myosin contains tryptophan residues in SH3 and its vicinity that sense conformational change in the N-terminal domain with nucleotide binding to the active site . N-terminus truncated constructs of the Dicty S1 showed SH3 played a role is structural stabilization of S1 and in communication among functional domain within the myosin head .
The essential light chain (ELC) appears to couple SH3 and lever-arm sub-domains in S1. LC1 and LC3 are ELC isoforms expressed in skeletal muscle with the S1 heavy chain binding LC1/RLC or LC3/RLC pairs in equal amounts. LC1 differs from LC3 by a 40 residue extension at the N-terminus. Cardiac myosin has only LC1. The flexible LC1 N-terminus is not resolved in S1 crystal structures and is not shown in Figure Figure1.1. Electron cryomicroscopy reconstructions suggest SH3 and the N-terminus of LC1 interact in skeletal muscle myosin . There it is suggested that the SH3/LC1 interaction could facilitate the (N-term)LC1/(N-term)actin interaction known to affect the overall actomyosin interaction in skeletal and cardiac myosins [65,66] and that SH3 could contain an LC1/active-site path of influence known to modulate myosin ATPase . The P31T substitution in MYH14 and V60I in myosin X are common.
Protein-protein contacts in actomyosin were identified by experimental structural studies combined with simulated docking of the myosin and actin crystal structures [68-72], by detecting changes in actin binding strength, actin-activated myosin ATPase, and in vitro motility caused by the mutation of small peptide segments or individual residues in myosin [73-81]. Primary hydrophobic actin contacts are helical segments, AA529-560 and AA652-661, while the unstructured surface Loop 2 (AA626-651) maintains ionic interactions with the actin N-terminus. Secondary sites are an unstructured surface loop, Loop 3 (AA568-580), and the structured Myopathy loop (AA404-417)  also on the S1 surface. Figure Figure66 shows the actin binding myosin surface loops including Loops 2, 3, Myopathy and C-loop. Also shown are actin binding site SNP substitutions F534L and F655S that fall within a primary hydrophobic actin contact.
Experimental work comparing tertiary structure of skeletal and β-cardiac MHC identified the C-loop (AA363-377), Figures Figures11 and and6,6, having influence on active site coupling to actin and possibly a direct interaction with actin . Limited proteolysis of skeletal S1 cleaves the heavy chain at Loop 1 (AA200-220) and Loop 2 producing 25, 50, and 20 kDa molecular mass fragments . These loops are involved in regulation of substrate release (Loop 1) and in actin binding and regulation of actin-activated ATPase (Loop 2) [75,78,84-86]. Limited proteolysis of βS1 cleaves the heavy chain at equivalent points and at the C-loop within the 50 kDa fragment . The C-loop cleavage dramatically affects βS1 Mg++ATPase suggesting it participates in energy transduction . Actin binding protected Loop 2 from proteolysis in skeletal S1 indicating Loop 2 involvement in actin binding [87,88]. Actin binding to βS1 fails to inhibit Loop 2 cleavage [82,89] but does inhibit C-loop cleavage . These observations highlight differences in skeletal S1 and βS1 conformation and identify the C-loop as a possible actin binding site. The C-loop was proposed as a site of actin binding in skeletal S1 [71,72], in myosin V , and in myosin IE where it is called Loop 4 . We constructed single site mutations in the smooth muscle myosin C-loop and chimeric proteins containing C-loop sequences from βS1 and skeletal S1 to investigate how sequence perturbs C-loop function [20,91]. Based on this work we proposed that the C-loop is an allosteric actin contact sensor initiating actin-activation of the myosin ATPase.
The C-loop SNP substitutions, R370H (K370 in MYH1), Q371P, and P378Q, occur in the myosin VIIA, super-fast myosin and α-cardiac myosin, respectively. In super-fast myosin, the Q/P substitution can be expected to perturb and rigidify the C-loop structure with the proline. We proposed structural rigidity in the C-loop is probably necessary for its effectiveness  but overall structure also plays a role and impacts myosin kinetics . Each of these substitutions is expected to affect myosin kinetics related to its interaction with actin, e.g. actin-activated ATPase and motility. Heterozygous Q371P alleles are significantly abundant although no homozygous individuals were detected. Abundance of the other SNPs is untested.
Loop 2 and near Loop 2 SNP substitutions are A621V, T631N (E631 in MYH1), A637V (G637 in MYH1), and E643K (K643 in MYH1). A/V substitution replaces the apolar alanine with the larger apolar valine, T/N exchanges two neutral polar groups, and E/K reverses charge. The effect of Loop 2 on myosin ATPase and myosin interaction with actin has been thoroughly studied with site directed mutagenesis. In Dicty myosin II, Loop 2 tolerates significant sequence changes without affecting function. Function seems to depend on charged or apolar residue distribution but not on specific structural features because the loop is flexible. A/V and T/N substitutions are conservative and unlikely to impact function. The A637V substitution is from perinatal myosin and abundant in this isoform although no homozygous individuals were detected. Abundance of the other Loop 2 SNPs is untested.
The Myopathy Loop is a structured surface loop containing an unusual clustering of heart disease implicated mutations including R406Q/W/L, V407M, V409M, and G410V . It was shown to be an actin binding site [69,92] and modulator of actin activated ATPase [81,93]. At R406, the smooth muscle heavy meromyosin (HMM) model for β-cardiac myosin has largely reiterated the clinical phenotype . The G410V mutant shows modest changes in actomyosin affinity possibly indicating a generally degraded fit between myosin and actin like the R406 mutants . The R406W mutant appears in the SNP data base in apparently healthy individuals. R408C (K408 in MYH1) has not been implicated in disease. None of the Myopathy loop SNPs have significant abundance although homozygous individuals carrying R406W were identified.
The Loop 3 SNP substitutions, R570K (K570 in MYH1) and V572A/D (A572 in MYH1), and the hydrophobic actin binding site residue substitutions, F534L and F655S, are conservative except for F655S. Among the myosin isoforms, all of these residues are variable. The heterozygous allele for F534L in skeletal myosin 2b has significant abundance (2.8%) but no homozygous individuals were identified.
SNPs appearing in Table Table11 but not discussed in the context of Figures Figures2,2, ,3,3, ,4,4, ,5,5, and and66 locate to the N-terminus (AA1-219), the upper 50 k (AA220-468) and lower 50 k (AA469-620). The upper 50 k and lower 50 k domains (U50 and L50) designate peptides within the 50 k of S1 on either side of the cleft that closes on strong binding to actin. N-term SNP substitutions include D4A in MYH7 and D4E in MYH13, R18C, R24P/L, T25M (I25 in MYH1), M142I (A142 in MYH1), R145G (G145 in MYH1), I148T (R148 in MYH1), and L199F (I199 in MYH1). The U50 SNP substitutions include H287R, E300D (L300 in MYH1), T307P, P320A (E320 in MYH1), S323C (V323 in MYH1), I326T, V335I (T335 in MYH1), R336H (D335 in MYH1), T345A, V350D/A from myosin 2b and R350W (V350 in MYH1) from myosin X. L50 SNP substitutions are I514T and A594T. SNP substitutions Q329R and I514T are disease implicated. SNPs in the "other myosin S1" category do not fall within, or are not spatially proximal to, presently recognized S1 functional elements directly participating in transduction. The average heterozygous SNP allele frequency for this set ranges from 0 to 17.2% (17.2% for E389D in myosin V with homozygous individuals identified) with most heterozygous substitutions having significant abundance and some untested.
Figure Figure77 shows the amino acid sequence of myosin 2× vs AHP for the SNPs in Table Table1.1. Myosin regions or domains identified with specific functions in energy transduction are identified by the broken red line and with a vertical label. Many SNPs have unknown or zero AHP that are both plotted as zero. The remaining substitution sites have a significant-to-large AHP and are identified with a horizontal label. Healthy individuals with homozygous SNP substituted alleles are implied by the significant heterozygous populations. Figure Figure7A,7A, specifying SNPs in the AA1-400 portion of the MHC show numerous substitutions at the N-terminus and in the AA280-360 peptide segments. These regions are not directly related to core function because they do not fall into the identified functional subdomains. The C-loop and β6-β7 junction are functional elements that are the exceptions in this region because they are SNP modified. Figure Figure7B,7B, specifying SNPs in the AA401-813 portion of the MHC, shows a somewhat different picture with all but one significant SNP substitution modifying a functional sub-domain. The high AHP sites are candidates for modification by mutagenesis to introduce probes that monitor motor dynamical structure without altering native behavior. Most crucial regions are covered where in introduction of Trp or Cys would facilitate direct (Trp) or indirect, through a specific modification by an extrinsic probe (Cys), observation of myosin dynamics. Remarkably, the G701R substitution in the SH2/SH1 hinge falls into this category.
SNPs account for about 75% of the number of sequence differences between individuals in a population . They have significant utility in gene mapping studies and there are an increasing number of SNPs that have been shown to have a significant influence on an individual's susceptibility to disease and response to various drug therapies. In some well-studied genes and gene families, patterns of SNPs within the coding sequences have also provided some insight into protein function. Perhaps the best known example of this is the SNP (rs334) that leads to sickle cell anemia. This Glu6Val change in the protein sequence has an allele frequency of about 25% in Yoruban populations and creates well-known negative and presumed positive consequences for the individuals that carry this sequence variant. Myosin SNPs could likewise create negative and positive consequences for certain populations with specific functional requirements that should be explored experimentally. Nonsynonymous SNPs that have truly neutral functional consequences also provide important information about protein function, because they highlight regions of protein structure that are tolerant of structural variation. We discuss three scenarios for interpreting SNP data denoted, the function-neutral, the too-robust, and the too-sensitive.
SNPs reside in healthy people, otherwise they are deleted by natural selection, hence it might be hypothesized that the amino acid substitution is function-neutral. In this case, SNPs identify residues or components of the motor not directly related to core function making them candidate sites for adventitious modification. An example is the R240W substitution that places a spectroscopic probe in the 7-stranded β-sheet at the interface of the active and actin-binding sites. It is thought that coordination of ATP hydrolysis with actin binding and release is mediated by this β-sheet and an optical probe of its structure would be valuable. R240 is not highly conserved among species but infrequently substituted by aromatic residues. Its negligible AHP raises uncertainty about potential use although homozygous individuals were reported. G701 is the pivot for the lever-arm swing and thought to be necessary for myosin function. The G701R SNP substitution has 2.8% AHP but it is difficult to understand how this substitution could be function-neutral in the swinging lever-arm picture for force production. Other unlikely function-neutral candidates are the R246H and E469Q substitutions with unknown and insignificant AHP, respectively. R246 and E469 form the SW1/SW2 salt-bridge regulating phosphate release in ATPase in the rate limiting step for muscle myosins. R246H originates from the cellular myosin VII isoform where the native rate limiting ADP release minimizes R246 significance. E469Q originates from β-cardiac myosin using the phosphate release mechanism. Under the function-neutral hypothesis, the collection of SNPs at R246, E469 and the vicinity of the 7-stranded β-sheet would imply a diminished role for this structure in energy transduction.
Figure Figure77 identifies the most likely candidates for the function-neutral SNPs because they have significant to large AHP implying the presence of healthy individuals carrying homozygous substituted alleles. Here SNPs cluster in a few regions of the myosin head not identified with core functionality but also cover most function-crucial regions in the S1. SNPs in Figure Figure77 may be most useful as leads for the introduction of probes that monitor motor dynamical structure without altering native behavior.
An alternative to the function-neutral hypothesis is that SNP substitutions modify functional structures too-robust to be disturbed by otherwise intrusive sequence changes. Excluding for a moment the E469Q and G701R special cases, several structures in S1 fulfill expectations implied by this hypothesis. The 7-stranded β-sheet and vicinity collected 17 SNP substitution sites (excluding known disease related sites, see Table Table1)1) over 14 of the 22 myosin genes where SNPs were identified. The too-robust hypothesis implies that the energy transduction mechanism contained in the 7-stranded β-sheet is too well designed by evolution to be compromised by any of the SNP substitutions. Actin binding by S1, modified by SNPs in the main hydrophobic binding site (F534L and F655S), in Loop 3 (R570K and V572A/D), Loop 2 (A621V, T631N, A637V, and E643K), the Myopathy loop (R408C, not including the disease implicated R406W), and the C-loop (R370H, Q371P and P378Q), is another potential too-robust site. The switch 2 helix, charged with propagating linear motion from the active site to the converter domain, suffers seven SNP substitutions and functions while the SH2/SH1 hinge and lever-arm successfully manage impacts of six and two SNP substitutions, respectively. Returning to E469Q and G701R, the functional impact of E469Q (7-stranded β-sheet) has not been evaluated in vitro, however, other substitutions at this site were disruptive to muscle myosin function except when the polarity of the R246/E469 salt-bridge was reversed. Then the mutant S1 functioned near normally. The E469Q substitution seems likely to disrupt the interaction with R246 thereby altering specific rates in the ATPase cycle but potentially within boundaries that do not overly impact function. The G701R substitution is difficult to rationalize even in the too-robust hypothesis. The functional impact of G701R has likewise not been evaluated in vitro but a glycine swivel is disrupted by the presence of any side chain with a β-carbon. How super-fast myosin could function with G701R needs clarification.
The too-robust hypothesis is often the implicit working hypothesis for those utilizing site directed mutagenesis to study protein mechanisms. It may sometimes be too optimistic an assumption for interpreting SNPs. The SNP substitution may affect protein function compensated by redundant functionality outside the affected protein in the larger physiological system. Different SNPs will have different applicable hypotheses and the challenge is to identify the correct one to apply.
We anticipated SNP clustering in a few regions of the myosin head identifying the expendable components of the system unrelated to core functionality. This occurred to some extent within the narrow context of the significant AHP SNPs. Residues 1-400 in MHC show significant AHP SNP substitutions at the N-terminus and in the AA280-360 peptide segments that do not fall into the identified functional sub-domains. However, residues 401-812 show a different picture with all but one significant AHP SNP substitution modifying functional sub-domains. Overviewing the total set of substitutions, the S1 SNPs are distributed spatially over the protein more uniformly than expected. Figure Figure88 shows the annotated S1 from Figure Figure11 with the SNP substitution sites identified by the solid atom rendering in yellow. Most SNPs locate within, or to the vicinity of, domains intimately connected to the myosin core functionality. They do not seem to select any one structural feature for a diminished role in the core functionality except possibly the 7-stranded β-sheet that attracted so many SNP substitutions.
The Myopathy loop and the converter domain nearly escaped modification by SNPs not already disease related. They each have one SNP substitution not known to be disease implicated. As suggested by its name, known mutations in the Myopathy loop accompany heart disease . SNPs do not appear in the Myopathy loop because most substitutions cause serious functional deficit. The same appears to be true for the converter where there are 10 or more heart disease linked mutations. These structures are too-sensitive to substitution to be a target for SNPs. The large AHP for the SNP substitution R743Q in the converter domain (Figure (Figure7)7) appears to have identified a unique site in the domain that can be safely modified. At least for myosin, regions or domains excluded from SNP substitution may be the most precisely defined. These high sensitivity regions are anti-targets for SNPs because they are not functionally robust (i.e. too-sensitive) rather than because they are the only domains necessary for function.
Single nucleotide polymorphisms of the myosin heavy chain were mined from the NCBI data base using computer programs written in perl that identified human nonsynonymous SNP amino acid missense substitutions for any MHC gene. Twenty-two MHC genes were searched including muscle and non-muscle myosin isoforms. Excluding disease implicated residues and sites without homology with skeletal myosin 2×, the 71 SNPs identified were distributed spatially homogeneously over MHC with 22 falling outside specific functional domains. The remaining 49 SNPs were spatially related to the 7-stranded β-sheet mediating active site and actin binding site communication (17 SNPs), the actin binding peptides (12 SNPs), switch 2 helix (7 SNPs), SH2/SH1 hinge (6 SNPs), SH3 (4 SNPs), lever-arm (2 SNPs), and converter (1 SNP). Two functional elements were almost devoid of SNPs, the converter domain and the Myopathy loop. These elements are burdened with an unusually high number of disease implicated mutations. The SNP substitution sites in MHC suggest they can infiltrate domains that are engineered by evolution to be "too-robust" to be disturbed by otherwise intrusive sequence changes, and, "function-neutral" sites where substitution is immaterial. Disease implicated mutations and the absence of SNP substitutions map to regions "too-sensitive" to be modified because they have not evolved a robust sequence paradigm for performing their function.
SNP data records were retrieved using a computer program written in perl. Perl was chosen because of its text-parsing and internet capabilities; and because it is widely available. The basic distribution of perl is available for use with various operating systems, including Microsoft Windows, linux/unix, and Macintosh. The first program (get_snps_flt.pl) uses NCBI eUtils to search for missense SNPs in the human myosin genes. The results are saved locally as flat text files then parsed into tables using a second perl program (make_table.pl). The get_snps_flt.pl and make_table.pl programs are listed in the additional file 1 where there are instructions for their use.
The NCBI SNP database stores individual SNP submissions and assigns them unique identifiers that are prefixed with ss (submitter sequence). These individual records are eventually assigned to groups so that all SNPs that occur at a particular position in a gene are in the same cluster. Each cluster has a unique numeric identifier that is prefixed with rs (reference sequence). Table Table22 shows a representative tabular output from make_table.pl. In every cluster, the reference (or "normal") allele is compared to the SNP substituted allele to determine the missense mutation in the expressible protein. Up to three SNPs may be reported in a cluster, corresponding to the three possible non-reference bases. The residue sequence in Table Table22 is the sequence of the gene product, or myosin isoform, in which the SNP is detected. Sequence alignment with BLAST positioned the SNP substitutions on the myosin crystal structure.
The dominant kinetic pathway for the actomyosin ATPase cycle is summarized by solid arrows in Scheme Scheme11.
M, M*, M**, and M^ are intermediates corresponding to distinct myosin conformations, A is actin, T is ATP, D is ADP, and P is inorganic phosphate. M* and M** weakly bind actin while M and M^ strongly bind actin. The broken arrow (···>) in the bottom row indicates the rate limiting product release step in the absence of actin. Work production occurs when M** ·D·P weakly binds actin (vertical transition), releases product P with the opening of the active site "back door" (R246/E469 salt-bridge), and forms the strong actin bond A·M^·D. The lever-arm rotates to impel actin ending in the low free energy rigor state A.M. Cross-bridge repriming begins with ATP binding to A.M initiating dissociation from actin, ATP hydrolysis, and reversal of the lever-arm power stroke rotation. The power stroke, envisioned by comparison of the M** and M* crystal structures, rotates the lever-arm through ~70 degrees [17,53]. We performed a dynamics simulation for myosin joining the known M** and M* transient intermediate structures in the Scheme 1pathway using a non-equilibrium Monte Carlo molecular dynamics (MCMD) simulation of the M** → M* conformation trajectory .
The lever-arm swing conformation change in myosin occurs on the millisecond time domain and involves many atoms. We developed the new dynamics simulation strategy because system complexity excludes use of standard molecular dynamics methods. All conformation trajectories confirm that during ATPase, an entropy dominated structural transition in the actin binding domain of S1 is a free energy barrier ensuring that product release is rate limiting in the absence of actin and prior to the lever-arm swing in agreement with the conventional myosin mechanism. MCMD simulation of the power stroke indicated that two complementary peptides, designated U50a and U50b (AA145-361 and AA362-462 peptides), mainly in the 50 kDa actin binding domain encompass the free energy barrier to product release and the anti-barrier, respectively. Simulation shows that coupling the U50a and U50b conformation transitions removes the free energy barrier to product release. The C-loop is a structured surface loop linking U50a and U50b whose structure would be perturbed with actin binding. These circumstances suggest that perturbation of the C-loop with actin binding couples U50a and U50b transitions causing product release and power stroke initiation.
Homology modeling uses a crystal structure template to constrain a target sequence of unknown structure. The template is the chicken skeletal myosin S1 (2mys, ) and the target sequence is the human fast skeletal MHC isoform 2× (MYH1) and essential light chain 3 (ELC3). Homology modeling was done with Modeller 9.4 . All structures shown depict S1 from myosin 2×.
Accurate model building depends on target and template sequence similarity and alignment. Template and target sequences for the MHC and ELC are practically identical (>91% sequence identity) suggesting Modeller produces a reliable structure . We did alignment using the NCBI BLAST 2 sequences protocol http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi. Unstructured surface loops on myosin were not resolved in the crystal structures but were added using the appropriate Modeller routine. Homology model accuracy was evaluated using the Discrete Optimized Protein Energy (DOPE) score as suggested in the Modeller tutorial. The DOPE score is proportional to potential energy per residue, smoothed over a 15 residue window, and normalized by the number of restraints acting on each residue. It indicates problem regions in the homology model when profiles are compared from target and template sequences. The DOPE profile showed no unusual problem regions over the entire peptide.
(AA): amino acid; (AHP): average heterozygous population; (ASA): accessible surface area; (βS1): β-ventricular myosin subfragment 1; (Dicty): Dictyostelium discoideum; (DOPE): discrete optimized protein energy; (ELC): myosin essential light chain; (EOM): extraocular muscle; (L50): lower 50 k domain of S1 containing AA469-620; (LC1): long ELC isoform; (LC3)short ELC isoform; (MHC): myosin heavy chain; (MLC): myosin light chain; (NCBI): National Center for Biotechnology Information; (RLC): myosin regulatory light chain; (S1): myosin subfragment 1; (SAH): single α-helix; (SNP): single nucleotide polymorphism; (U50): upper 50 k domain of S1 containing AA220-468; (U50a): U50 sub-domain (AA145-361): containing free energy barrier to phosphate release in myosin ATPase; (U50b): U50 sub-domain (AA362-462): containing free energy anti-barrier to phosphate release in myosin ATPase
TPB performed the SNP searches, aligned protein sequences, and drafted the manuscript. KLN wrote the perl software and adapted it for use on the project. EDW and KA provided conceptual guidance and helped draft the manuscript. All authors read and approved the final manuscript.
Instructions for retrieving SNPs. Two perl programs provide automated retrieval and organization of SNP data from the NCBI database. Additional file 1 contains the perl programs and instructions for their use.
This work was supported by NIH-NIAMS grant R01AR049277 and by the Mayo Clinic Rochester. We thank Dr Alan R. Penheiter at the Mayo Molecular Medicine Program for guiding us at the start of this project.