|Home | About | Journals | Submit | Contact Us | Français|
Computational simulation of pandemic diseases provides important insight into many disease features that may benefit public health. This is especially true for the influenza virus, a continuing global pandemic threat. Molecular or atomic-level investigation of influenza has predominantly focused on the two major virus glycoproteins, neuraminidase (NA) and hemagglutinin (HA). In this chapter, we walk the readers through major considerations for studying pandemic influenza glycoproteins, from choosing the most useful choice of system(s) to avoiding common pitfalls in experimental design and execution. While a brief discussion of several potential simulation and docking techniques is presented, we emphasize molecular dynamics (MD) and Brownian dynamics (BD) simulation techniques and molecular docking, within the context of biologically outstanding questions in influenza research.
Computational investigations of influenza have repeatedly shown that they are capable of adding significant insight into the structural dynamics and function of major components of the influenza virus. They have also been extensively used in the rational design of the two clinically administered antiviral drugs, oseltamivir (Tamiflu) and zanamivir (Relenza), and a number of other inhibitors (1, 2). The availability of numerous high-resolution x-ray crystallographic structures for both of the virus’s major glycoproteins, neuraminidase (NA) and hemagglutinin (HA), make these enzymes well-suited to investigation with atomic-level approaches. In addition, the time- and length-scales of biologically and medically relevant motions in these systems are accessible by several simulation techniques, such as classical (3–10), steered (11) and generalized Born molecular dynamics simulations (12, 13), Brownian dynamics simulations (14), virtual screening (15–19), and a range of free energy techniques, from MM-PBSA and related approaches (12, 20–25) to thermodynamic integration (26) and free energy perturbation (27) (see Note 1).
In this chapter, we outline the major critical system setup considerations when attempting to perform atomistic simulations (e.g. molecular or Brownian dynamics simulations) and docking calculations for both NA and HA. We focus exclusively on the influenza type A pathogens, which are responsible for most of the seasonal, epidemic, and pandemic disease outbreaks in humans. Pandemic events are classified according to the World Health Organization 6-stage scale (see Note 2), and there have been four major pandemic influenza outbreaks in recent history since the first recorded event in 1918. The causative strains include H1N1, which caused the “Spanish Flu” pandemic in 1918 and the “Swine Flu” pandemic in 2009; H2N2, which caused “Asian Flu” in 1957; and H3N2, which caused “Hong Kong Flu” in 1968 (28). H5N1, which caused “Bird Flu” in 2004, H1N2, H7N2, H7N3, H7N7, H9N2, and H10N7 are other influenza A serotypes that have been found in humans but have not caused any pandemics (see Note 3).
Neuraminidase (NA) and hemagglutinin (HA) are the two major glycoproteins in influenza virions, which are present in the host-derived lipid envelope in a HA:NA ratio of approximately 4–5:1 (29, 30). Together they perform a delicate balancing act between host cell sialic acid receptor binding, performed by HA, and sialic acid receptor cleavage, performed by NA, which facilitates viral shedding (31). The choices of systems to investigate, i.e. computational “starting materials,” are numerous considering the wide array of high-resolution atomic level structures presently available.
As of December 2010, there are 77 publicly available influenza A neuraminidase structures deposited in the protein databank (32), consisting of NAs from both phylogenetically-determined group-1 (N1, N4, N5, N8) and group-2 (N2, N3, N6, N7, N9) serotypes (33) (see Note 4). Structures from human and avian species are represented (see Note 5), as are N1 structures from the 1918 and 2009 pandemic strains (see Note 6), and drug resistant mutants from both group-1 and group-2 strains (see Note 7).
Aside from the protein itself, one may endeavor to simulate complexes of NA with various ligands. The natural substrates of NA are glycosides with α-linked terminal sialic acid (a.k.a. sialosides). Sialic acid has been resolved in complex with group-2 NAs, as have the clinically administered drugs (oseltamivir and zanamivir), potential drug candidates (peramivir or BCX-1812), and various lead compounds (many of which are sialic acid analogue compounds) (see Note 8). Antibody-bound structures with wild-type and escape-mutant NAs have also been deposited (see Note 9).
At the time of writing, there are 75 hemagglutinin (HA) structures from the Influenza A virus (IAV) deposited in the PDB, using the Blast sequence search tool with the 2009 H1 chain A (HA1) sequence from 3MLH (34), with low complexity region masked, and an expectation value of 0.001. HA, like NA, are classified into two groups based upon phylogenetic analysis (35): so-called “group-1,” which consists of H1, H2, H5, H6, H8, H9, H11–H13, H16, and “group-2,” constituted by H3, H4, H7, H10, H14, H15. Of these, only three HA strains, H1, H2, and H3, are known to infect humans, however small human outbreaks from avian subtypes H5, H7, and H9 have also been recorded (36). Over 40 crystal structures of non-pandemic HA strains are deposited in the PDB database (see Note 10), representing mouse, avian, swine, and human species (see Note 11). Currently, there are 26 publicly available crystal structures of pandemic strain HAs, representing all of the twentieth century pandemic events (see Note 12).
In addition, a number of crystal structures of HA with their human or avian receptor analogues have also been isolated. These receptors include LSTa, LSTc, 6-SLN, 3-SLN, or monosaccharides, such as sialic acid, which interact with the receptor binding domain, as well as those which interact with the fusion domain, such as tert-butyl hydroquinone (see Note 13). Finally, six deposited HA structures are co-complexed with antibody fragments, which provide platforms for potential vaccine design (see Note 14).
After the specific NA and HA strains are chosen for investigation, a number of structural features must be considered prior to molecular-level simulation or docking. In this section, we outline these considerations in detail for both major glycoproteins. The goal is to facilitate the setup of the best possible biophysical systems for use in all-atom investigations, and present potential known pitfalls where appropriate.
The biologically relevant form of NA is a tetramer. The symmetric 4-subunit head, which contains the sialidase active sites, sits atop a long, variable “stalk,” which is anchored to the viral particle membrane via a hydrophobic segment of residues (37). Low-resolution cryo-electron microscopy images suggest that the stalk length is in the range of 100 angstroms (29). Although at least two experimental studies have shown that stalk length is relevant to NA function (38, 39), the lack of high-resolution structural information for the stalk and sheer system-size has, to date, prohibited the all-atom investigation of the full-NA structure. Instead, most computational investigations have focused on either tetramer or monomer forms of the head group itself (see Note 15).
NA requires calcium to function (40), although the calcium ion is not required for the actual enzymatic glycosidase activity. The calcium ion, which has been crystallized in a number of structures, has been shown to play an important structural role, as evidenced by its location behind a loop adjacent to the sialic acid binding site. Free energy calculations performed with and without the bound ion indicate that stability of residue Y347 and binding of oseltamivir is affected by the presence of the calcium (41). It is therefore critical to include the calcium ion in all-atom investigations of NA (see Note 16).
In addition to the calcium ions, the inclusion of explicit water molecules is a major consideration for the investigator. This is especially important in the large sialic acid binding cavity, which, if left unoccupied, could undergo non-realistic structural rearrangements. Several crystal structures show numerous buried water molecules in the NA enzyme (see Note 17), and these should be considered for use in homology modeling of the water molecules into structures without such information. Alternately, one can use water prediction programs, such as DOWSER (42), to attempt to address where buried or bound water molecules may reside based on water-protein interaction energies (see Note 18). A further alternative is to restrain the protein during initial dynamics so that the water molecules can penetrate into their proper positions, before the protein is allowed to move and adapt its shape artifactually in the absence of such stabilizing waters. This last alternative is probably the least favored for atomic-level modeling since it may be especially difficult for buried water molecules to reach their proper positions in a reasonable simulation time scale.
Many studies wish to explore the basis of molecular recognition for bound substrates or small molecule ligands. In these cases, one hopes there is available crystal structure information with resolved electron density for the ligand(s) of interest (see Note 19); in cases where there is no such information, docking programs can be used to predict docked poses (see Note 20). A common ligand of interest is oseltamivir (also called GS4071 or Tamiflu), which is a pro-drug that is metabolized in the body after administration to its active form (see Note 21). It is the only orally-available, clinically used antiviral currently on the market, and therefore, frequently included in all-atom simulations of NA. With regard to ligand-bound simulations, it is also important to consider the effect of explicit solvent or if implicit solvent treatment could be employed. Ligands that are known to coordinate to the NA active site through hydrogen bonds with water are not well-represented with implicit solvent (see Note 22), and thus should be avoided when continuum representations of solvent are desired (13, 20).
As with any atomic-level simulation, protonation states of titratable residues must be determined. The protonation states of most residues can be assigned with standard protonation state prediction programs, e.g. WHATIF (43) or PROPKA (44) (see Note 23). Residue H274, which is a commonly found in drug-resistant mutated strains to be H274Y, may be of special interest and if so, should be treated more carefully (see Note 24).
Certainly one of the better-known major pitfalls with studying the group-1 enzymes involves the selection of the starting crystal structure with regard to the topology of the active site area, and especially, the conformation of the so-called 150-loop. The first crystallographically resolved N1 structures (of the non-pandemic H5N1) exhibited an altogether new cavity adjacent to the sialic acid binding pocket, which was formed by an “open” conformation of the 150-loop (33) (see Note 25). The same study also crystallized the H5N1 with a closed 150-loop conformation when the enzyme was soaked with high concentrations of oseltamivir or for longer soaking times. Very recent structural evidence of the 2009 pandemic H1N1 clearly showed that it lacked the 150-cavity, despite being a classified as a group-1 NA (45). The co-complex structural elucidation of a new inhibitor designed to target the N1 150-cavity indirectly confirms, however, that the 2009 pandemic H1N1 is indeed susceptible to ligands that target the open conformation of the 150-loop (46). Clearly, a better understanding of the atomic-level control mechanisms for the structural dynamics of the 150-loop is warranted; in the meantime, atomic-level investigations should carefully choose which loop configuration is appropriate for any particular study (see Note 26). Along these lines, NA has been used as a model system for the development of ensemble-based virtual screening experiments (15) (see Note 27) in a procedure known as the relaxed complex scheme (47, 48). Similarly, higher-level binding free energy calculations, such as free energy perturbation or thermodynamic integration, will be heavily impacted by motion in this loop (see Note 28), and as such, extra caution in choosing the initial starting structural configuration of the 150-loop is warranted.
Glycosylation is another possible consideration that we present. Although glycosylation is generally believed to play a larger role in HA function, N2 is known to have four glycosylation sites per monomer (49), including a residue on the 150-loop (see Note 29). Curiously, glycosylation at the 150-loop site has been shown to play a role in neurovirulence in mice (50). To the best of our knowledge, there have been no atomic-level computational investigations to date that include bound sialoglycans on NA, yet, the development of an improved generalizable carbohydrate force field, GLYCAM06 (51, 52), makes such studies more accessible.
Hemagglutinin (HA) is involved in the attachment of viral particles to sialosaccharides on host cell membrane lipids or surface proteins. The receptor binding domain (RBD) of HA is made up of the 190-helix (HA1 188 to 190), 130-loop (HA1 134 to 138) and the 220-loop (HA1 221 to 228), with a number of conserved residues for receptor binding and species specificity (53). The terminal sialic acid are often linked through α-2,3 or α-2,6 linkage to the galactose, with the latter thought to be recognized by human influenza HA. Glycan receptor binding affinity to HA is in the millimolar range, but compensated by multivalent interactions (avidity) between multiple HA’s and glycan receptors (54). Crystal structural studies have revealed that sialic acid makes contact with several conserved residues, e.g., Y98, S/T136, W153, H183, L/I194 (H3 numbering), which exhibit more variations in HA’s from humans than from birds (55, 56). Furthermore, it is believed that the larger RBD size of human H3, compared to RBDs from avian H1 or H5, may be required to accommodate the larger α-2,6 linked glycan receptors (57).
There is a growing recognition that the optimization of molecular interactions in the HA systems may require significant conformational adjustments of the participating proteins, ligands, or substrates and carbohydrates (55, 56). Unfortunately, extensive large-scale conformational changes are very difficult, if not impossible, to sample in all-atom molecular level investigations of HA. However, more local aspects of molecular recognition are indeed tractable with such methods. In fact, while superimpositions of pentasaccharides using known crystal structures offer potential clues as to why α-2,3 or α-2,6 linkages may be preferred for particular species (33, 58), the flexibility of both HA protein and the bound glycans is largely undetectable in crystallography studies. Atomic-level simulation techniques have the opportunity to make significant contributions in the exploration of such areas.
A major outstanding question in HA biology pertains to how HAs differentially recognize different host cell glycan receptors. The pentasaccharides LSTa and LSTc, natural sialosides from human milk, are convenient avian and human receptor analogues to employ in such studies (55, 57, 58) (see Note 30). These two sialosides are often found in complex glycans on cell surfaces and contain lactosamine (Gal2-GlcNAc3) and lactose (Gal4-Glc5) units (see Note 31). Significant advances in carbohydrate force field development have been made over the years, primarily by GLYCAM06 (52, 59) for the AMBER force field (60), and CSFF (61) and others (62–64) for the CHARMM force field (65). The inherent flexibility that challenged crystallography and limited earlier computational studies to short di- or tri-saccharides (66–68) can now be examined in atomic detail.
The analysis of glycan structural dynamics must also be considered carefully and in a manner that is slightly different than the typical small-molecule ligand. In the context of glycan-HA binding interactions, most of the glycan conformational changes occur relative to their sialic-acid-1 (Sia1) units. Furthermore, it has been observed that Sia1 placement is relatively stable in the HA RBD. Consequently, utilizing a global RMSD alignment to analyze the glycan conformations is obviously inappropriate. Instead, aligning the glycan trajectory frames on the heavy atoms (C and O) of the Sia1 pyranose ring in order to remove the overall rotation and translation was shown to be a viable approach (3, 69) (see Note 32). As part of the glycan analysis, clustering of the resulting glycan structures can also be considered (see Note 33).
Previously reported HA affinity for sialyl oligosaccharides usually has dissociation constants in the millimolar range (70–77), a behavior often attributed to an enthalpy-entropy compensation phenomenon (78, 79). The loss of entropy which offsets enthalpic gain is interpreted in terms of conformational distortion and freezing of flexible oligosaccharide ligands as well as solvent reorganization accompanying binding. While the solvent associated entropy contribution is still the least-understood aspect, MD simulations of the free and bound glycans make it possible to explore the conformational energetics of the glycan-HA binding interactions. We urge the reader to include estimates for entropic terms in their studies wherever possible (see Note 34).
The biologically-relevant form of HA is a trimer of heterodimers. The configuration of the trimer indicates that the individual monomer units strongly interact with eachother, and this feature essentially requires all-atom studies to utilize the complete trimer structure (see Note 35). Although studies of one monomer of the RBD alone could be employed, they would neglect likely important stabilizing interactions from neighboring units present biologically. Protonation states of the protein residues must be treated prior to simulation and this can be performed in an identical manner as described for NA (see Note 22).
As evidenced by the large number of published computational molecular-level studies of the influenza glycoproteins (see Note 1), there are numerous options for simulation and docking that can be pursued. Unfortunately, it is not possible to list each program’s input parameters with full detail in this chapter. To provide both simplicity and usefulness, we present a brief overview of the available computational techniques and refer the reader to the individual references cited herein for explicit methodological details.
All-atom, explicitly solvated molecular dynamics simulations for pandemic and potentially-pandemic NA and HA complexes have been carried out using a number of simulation software packages, including NAMD2 (80), AMBER (81), GROMOS (82), GROMACS (83), and DESMOND (84) (see Note 36). Recently published manuscripts employing these programs provide explicit methodological details which can be referenced by the reader, and include Xu et al. (69) and Amaro et al. (4), Chachra et al. (21), Lawrenz et al. (26), Kasson et al. (85), and Wereszczynski and McCammon (27), respectively. In all cases, periodic boundary conditions were applied in conjunction with particle-mesh Ewald (PME) summation (86) to treat long-range electrostatics. Non-equilibrium steered molecular dynamics simulations of unbinding events in NA have been carried out using NAMD2 as well (11). Implicit solvent generalized Born simulations of the NA monomer and tetramer have been carried out using AMBER (13). Brownian dynamics simulations to determine rates of association between NA and sialic acid or oseltamivir (14) have been carried out using the Brownian dynamics simulation package SDA (87).
As neuraminidase is one of the major antiviral drug targets in influenza, it has been used extensively in docking and free energy of binding studies, starting from the early 1990s (see Note 34). Nearly every docking program available has published examples of neuraminidase compounds, and the system is generally considered among benchmark sets for molecular docking; examples of docking procedures for NA include AutoDock (15), GOLD (88), DOCK (89), Surflex-Dock (90), and LigandFit (91), among others (see Note 37). In addition to docking, binding free energy estimates have been obtained using high-accuracy alchemical free energy methods (26, 27) as well as less accurate, hybrid techniques, such as MM-PB(GB)SA (13, 20, 21, 69) and linear interaction energy (92) approaches.
This work was funded by the National Institutes of Health (NIH) through the NIH Director's New Innovator Award Program, 1-DP2-OD007237 and a NIH Career Transition Award 1-K22-AI081901 to R.E.A. W.W.L. is funded in part by NIH P41 RR08605.
1Searching only the American Chemical Society’s journal database for “neuraminidase molecular dynamics” retrieves over 255 articles; a search for “hemagglutinin molecular dynamics” retrieves an additional 274. Given the wide range of NA and HA atomic-level computational investigations, we fully acknowledge that the references cited here are just a small sampling of the rather sizeable number of publications in the literature, and we apologize to the authors whose work we have not been able to explicitly cite. We stress that citations provided here are merely examples of the various computational techniques that have been explored, and that we do not intend to be comprehensive.
4PDBs available at time of writing are: Group-1, pandemic (N1): 3CYE, 3NSS, 3B7E, 3BEQ; Group-1, non-pandemic (N4, N8): 2HT5, 2HT7, 2HT8, 2HTQ, 2HTR, 2HTU, 2HTV, 2HTW, 2HTY, 2HU0, 2HU4, 3CKZ, 3CL0, 3CL2; Group-2, non-pandemic (N2, N6, and N9): 1BJI, 1F8B, 1F8C, 1F8D, 1F8E, 1ING, 1INH, 1INW, 1INX, 1INY, 1IVC, 1IVD, 1IVE, 1IVF, 1IVG, 1L7F, 1L7G, 1L7H, 1MWE, 1NCA, 1NCB, 1NCC, 1NCD, 1NMA, 1NMB, 1NMC, 1NN2, 1NNA, 1NNB, 1NNC, 1V0Z, 1W1X, 1W20, 1W21, 1XOE, 1XOG, 2AEP, 2AEQ, 2B8H, 2BAT, 2C4A, 2C4L, 2CML, 2QWA, 2QWB, 2QWC, 2QWD, 2QWE, 2QWF, 2QWG, 2QWH, 2QWI, 2QWJ, 2QWK, 3NN9, 4NN9, 5NN9, 6NN9, 7NN9.
5One must be careful when selecting species-specific strains to investigate. For example, representative N2 structures are available from both Tern (avian, PDB identifier 1QWK) and human (PDB identifier: 1NN2) isolates. Adaptation through sequence mutations and alterations in glycosylation patterns of NAs and HAs occur over time in the host organism; as all influenza infections in human are believed to be derived from avian progenitors, this process is commonly known as “human adaptation” (93). N1 and N2 are the only subtypes of NA currently known to circulate widely in humans.
6Structures of the 1918 “Spanish flu” A/Brevig Mission/1/1918 H1N1 are available as 3B7E, 3BEQ, 3CYE, and a structure of the 2009 “Swine flu” A/California/04/2009 is available as 3NSS.
7Drug-resistant mutants of NA have been found to occur over time in the population due to selective pressure. Structural representations of select common drug-resistant mutations in non-pandemic strains are: 3CKZ and 3CL0 (H274Y mutant in H5N1 NA), 3CL2 (N294S mutant in H5N1 NA); 2QWJ, 2QWH, 2QWG, 2QWF, 2QWE, 2QWD, 2QWC, 2QWB, 2QWA, 1L7H (R292K mutant in Tern N9); and 1L7G (E119G in Tern N9).
8A large number of ligand-bound NA structures are available in the PDB database. Group-2 sialic acid bound structures: 1MWE, 1W1X, 1W20, 1W21, 2BAT, 2C4A, 2C4L, 2QWB; Group-1 oseltamivir-bound structures: 2HT7, 2HT8, 2HU0, 2HU4, 3CL0, 3CL2; Group-2 oseltamivir-bound structures: 2QWH, 2QWK; zanamivir-bound structures: 2CML (group-2), 3B7E (pandemic 1918 N1), 2HTQ/3CKZ (H5N1); Group-1 peramivir-bound structure: 2HTU; Group-2 peramivir-bound structures: 1L7F, 1L7G, 1L7H; Group-2 NAs with other lead compounds or sialic acid analogues: 1BJI, 1F8B, 1F8C, 1F8D, 1F8E, 1ING, 1INH, 1INW, 1INX, 1INY, 1IVC, 1IVD, 1IVE, 1IVF, 1IVG, 1NNA, 1NNB, 1NNC, 1XOE, 1XOG, 2QWC, 2QWD, 2QWE, 2QWF, 2QWG, 2QWI, 2QWJ. Group-1 NA with other lead compounds: 2HTW.
9Antibody-bound NA structures are: 2AEP, 2AEQ, 1NCA, 1NCB, 1NCC, 1NCD, 1NMA, 1NMB, AND 1NMC.
10HA PDBs of non-pandemic strains available at the time of writing are, Group-1 (H1, H5): 1RU7, 1RVX, 1RVZ, 1RUY, 1RV0, 1RVT, 3HTO, 3HTP, 3HTQ, 3HTT, 2WRH, 2FK0, 2IBX, 3GBM, 3FKU; Group-2 (H3, H7, H14): 2VIR, 2VIS, 2VIT, 2VIU, 1HA0, 1HGD, 1HGE, 1HGF, 1HGG, 1HGH, 1HGI, 1HGJ, 1EO8, 1KEN, 1QFU, 3EYM, 1MQL, 1MQM, 1MQN, 1T18, 3M5G, 3M5H, 3M5I, 3M5J, 3EYJ, 3EYK.
11As with NA, one must be careful in the selection of particular strains to investigate due to the fact that isolates from several species may have crystal structures. For example, H7 has been isolated from both human (3M5G representing A/New York/107/2003) and avian (1T18 representing A/turkey/Italy/02) species. As species-specific features are likely to be present in each of the structures (including specific residue mutations and glycosylation sites), careful consideration of the actual strain and/or structural source must be exercised prior to system setup.
12HA crystal structures from pandemic strains include: 1RD8, 1RUZ, 2WRG, 3GBN, 3LZF for 1918 pandemic H1; 2WR1, 2WR2, 2WR3, 2WR4, 2WR5, 2WR7, 2WRB, 2WRC, 2WRD, 2WRE, 2WRF, and 3KU3, 3KU6, and 3KU5 (very high resolution structures representing A/Japan/305/57) for 1957 pandemic H2; 2HMG (representing A/HongKong/19/68(H3N2)) and 3HMG, 4HMG, 5HMG (representing A/Aichi/2/1968(H3N2); 3M6S representing A/Darwin/2001/2009 and 3LZG and 3AL4 representing A/California/04/2009 for the 2009 H1 pandemic.
13Co-complexes with various ligands are available in the PDB, including, LSTa (1RVX, 1RV0, 3HTP, 1MQM, 2WR3, 2WRB); LSTc (1RVZ, 1RVT, 3HTQ, 1MQN, 2WR7, 2WRF); 2,3 sialyllactose (3HTT); tert-butyl hydroquinone (3EYM); sialyl-N-acetyllactosamine (3M5H, 3M5I); and sialic acid (4HMG, 5HMG).
14Co-crystal structures of HA with antibody fragments are, recombinant X31 H3: 1EO8, 1KEN, 1QFU; H5: 3FKU; and 1918 H1: 3GBN, 3LZF.
15Ref. (13) shows that protein instabilities can arise when studying NA in the monomer form as opposed to the tetramer, due to the loss of stabilizing inter-subunit contacts. All-atom investigations using the monomer NA are likely sufficient when exploring dynamics on short time scales (less than 10 ns of classical molecular dynamics simulations) or docking calculations, which generally focus on a particular binding site. When performing longer time scale or implicit solvent simulations, the tetramer form of NA should be utilized to avoid the introduction of non-biological structural artifacts.
16Many of the NA x-ray crystallographic structures deposited in the protein data bank do not have density information for the bound calcium ion. High-resolution structures of group-1 and group-2 NAs with the bound calcium ions (e.g. group-1: 2HTY, group-2: 2QWK) should be used to model the calcium by homology when it is missing in a particular structure of interest.
17A representative structure of a group-1 NA with explicitly resolved water molecules is 2HTY, and for group-2, 2QWK. These structures can be used to homology-model explicit water molecules. The group-2 2QWK structure has a bound oseltamivir ligand in the active site, and 4 water molecules are shown to coordinate within 3 angstroms of the ligand. The 2HTY structure is without substate or bound ligand, and therefore there are some differences in the water positions in the sialic acid binding pocket. If homology modeling of the water molecules is pursued in conjunction with a bound ligand, one must remove water molecules that sterically overlap with the atoms of the ligand.
18DOWSER program information can be found at: http://hekto.med.unc.edu:8080/HERMANS/software/DOWSER/
19Structures with bound ligands have been presented in Note 8
21One should be careful not to confuse the pro-drug form (the ethyl ester) with the active compound (acid). If studying the drug in complex with NA, the active, acid form of oseltamivir should be selected.
22Sialic acid, as well as 2-deoxy-2,3-didehydro-N-acetylneuraminic acid (DANA) and zanamivir, have been shown to depend on explicit water molecules in order to stabilize the bound conformation, and thus treatment with implicit solvent conditions is ill-advised (20) The same study showed that oseltamivir is able to retain the correct bound pose without explicit water molecules.
23A useful webservice to predict relevant protonation states of the protein residues at a user-defined pH is maintained by the National Biomedical Computation Resource (NBCR) PDB2PQR web service (http://nbcr.sdsc.edu/pdb2pqr) (96).
24In most simulations with standard treatment, this residue is predicted to be neutral and to have a proton on its epsilon nitrogen.
25It is standard practice in the NA field to use N2 numbering. The 150-loop has been defined as residues 146/147–152.
26The open 150-loop non-pandemic H5N1 structures are: 2HTY (ligand-free), 2HU0 (oseltamivir-bound). 2HU4 is the same system with a closed 150-loop, which was determined during longer time soaks or under higher oseltamivir concentrations. 3NSS presents the 2009 pandemic H1N1 crystal structure with a closed 150-loop. 3O9J and 3O9K present N8 with an open 150-loop, in complex with two inhibitors that bind to the 150-cavity.
27In Cheng et al. (15), the resulting snapshots from both the apo and oseltamivir-bound all-atom MD trajectories of N1 were clustered according to root-mean-square-deviation (RMSD) of a subset of 62 residues lining the binding site area using the GROMOS++ analysis software (97). The exact residues used in the clustering were (N1 numbering): 117–119, 133–138, 146–152, 156, 179, 180, 196–200, 223–228, 243–247, 277, 278, 293, 295, 344–347, 368, 401, 402, and 426–441. When performing RMSD-based clustering, several RMSD thresholds must be tested in order to determine the optimal clustering cutoff, which is generally chosen after evaluating the dependence of the number of clusters on the cutoff values. An additional metric that the user can examine is hydrogen bonding within the cluster groupings; this adds “physical insight” into the choice of cutoff and can also be used as a metric for guiding the clustering cutoff choice. For the NA system, a range of RMSD-threshold values from 1.0–1.5 Å were tested and 1.3 Å was ultimately chosen as the final cutoff value. The exact choice of cutoff is up to the user and may also depend on other factors; e.g. if one plans to perform ensemble-based virtual screening experiments with the resulting cluster representative structures, the user may desire to choose the number of clusters such that the majority of the trajectory is contained in some computationally tractable number of structures. In the case of the Cheng et al. study, the top 3 most dominant clusters for both the apo and oseltamivir-bound simulations represented over 60% of the trajectories.
28Two recent papers have investigated the use of new methodologies to effectively increase the sampling of the 150-loop with respect to accurate ligand free energy of binding estimates. Lawrenz et al. utilized a novel “independent trajectories” approach to thermodynamic integration calculations, in order to better account for 150-loop motion in N1-peramivir co-complexes (26). Similarly, selectively applied accelerated molecular dynamics was employed by Wereszczynski and McCammon in conjunction with alchemical free energy transformation techniques to enhance sampling of the 150-loop and improve binding affinity estimates for an N1-oseltamivir co-complex (27). Such novel approaches highlight the need to address 150-loop sampling before reliable binding free estimates can be obtained.
29Asn residues 86, 146, 200 and 234 are glycosylated. Carbohydrate units attached to Asn146 are of the complex type, containing N-acetylglucosamine, N-acetylgalactosamine, mannose, galactose, and fructose (49).
30Complete LSTc can be extracted from the LSTc-H9 crystal structure complex (PDB Code: 1JSI). LSTa is currently only available in its trisaccharide (Sia1-Gal2-GlcNAc3) form in the protein data bank crystal structures. The missing Gal4 and Glc5 units can be added using the freely-available Maestro GUI (Schrödinger Inc) or GlyProt (98).
31It should be noted that although the final sialic acid linkage presents an obvious difference between host species, there are likely several other features that are relevant to species specificity. For example, based on a survey of available crystal structures, a new parameter to define the topology adopted by the long α-2,3 or α-2,6 linked glycans was suggested plays a crucial role in HA-glycan specificity of recognition (56).
32In the Xu et al. and Newhouse et al. studies, the RMSD was measured on the heavy atoms of Gal2-Glc5 6-member pyranose rings between the individual glycan trajectory and the glycan MD starting structure.
33Clustering of glycan structural dynamics using hierarchical average linkage clustering has been carried out by Xu et al. and Newhouse et al. (3, 69). The glycan trajectories were aligned via the Sia1-aligned residues and concatenated into a single trajectory. In this case, the hierarchical average linkage clustering method was chosen because of its superior performance in producing clusters with the smallest within-cluster variance and large between-cluster separation compared to many other clustering algorithms (99). In Xu et al. and Newhouse et al., a 3-cluster solution was selected for the sake of simplicity (3, 69). The structures of the glycan cluster representatives and the cluster percent population can then be extracted and analyzed. In addition, the agglomerative clustering process and the RMS distances at which clusters were merged can be illustrated through dendrograms.
34Docking investigations and studies pursuing quantifiable free energies of binding are much less common for HA and glycan systems, likely owing to the many degrees of freedom in the glycan receptor molecules. In fact, the few studies that have utilized such techniques for oligosaccharide systems concluded that entropic considerations of the glycans to estimates of binding cannot be neglected (3, 69, 100).
35If only the heterodimer is present in the HA crystal structure of choice, a number of programs can be used to build the full oligomeric state priot to simulation. For example, Chimera (101) and VMD (102) are freely-available programs that can both be used to perform symmetry transformations from monomer to trimer configuration using crystal record information.
36Of the available simulation packages, we note that NAMD2, GROMACS, and DESMOND are made freely available to academic researchers. In particular, NAMD2 has been designed specifically for the simulation of large biomolecular complexes, and thus often offers benchmark advantages when parallel computing resources can be utilized. Since the tetramer NA and trimer HA systems, when fully solvated, contain over 120,000 atoms, this can be an important factor in simulation software selection.
37It should be noted that the choice of any molecular docking package may be appropriate, as long as docking control experiments with known actives are provided in each case.