The interactive chromatin modeling web server (ICM Web) is an interactive tool that allows users to rapidly assess nucleosome stability and fold sequences of DNA into putative chromatin templates. ICM Web takes a sequence composed of As, Cs, Gs, and Ts as input and generates (i) a nucleosome energy level diagram, (ii) coarse-grained representations of free DNA and chromatin and (iii) plots of the helical parameters (Tilt, Roll, Twist, Shift, Slide and Rise) as a function of position. The user can select from several different energy models, nucleosome structures and methods for placing nucleosomes in the energy landscape. Alternatively, if nucleosome footprints are known from experiment, ICM Web can use these positions to create a nucleosome array. The default energy model achieves a correlation coefficient of 0.7 with 100 experimentally determined values of stability and properly predicts the location of six positioned nucleosomes in the mouse mammary tumor virus (MMTV) promoter. ICM Web is suitable for interactively investigating nucleosome stability and chromatin folding for sequences up to tens of kilobases in length. No login is required to use ICM Web.
Experimentally determined genome wide maps of nucleosome positions are becoming readily available (1–3). These data have prompted renewed interest in models of nucleosome positioning (4). Most models of nucleosome positioning rely on the idea that nucleosome stability is dependent upon the ability of a sequence of DNA to assume the structure observed in the nucleosome, as summarized in ref. (5). Low-energy, stable nucleosomes are formed when the DNA sequence exhibits a flexibility that enables it to more readily assume the nucleosome conformation or when the sequence exhibits deformations that match the nucleosome’s geometry. Little attention has been devoted to determining if the experimentally determined irregularly preferred positions can actually be mapped into three-dimensional structures. This is likely due to the fact that sequence-dependent nucleosome positioning is not the sole determinant of chromatin organization and that not all preferred nucleosome positions will be occupied simultaneously. It is well known that histone tails (6–8), linker histones (9) and many other factors play a role in determining the organization of chromatin. Models that account for such extra-nucleosomal interactions are being developed (10–12). The structure of chromatin is further complicated by the fact that nucleosomes exhibit a family of structures (13).
Here, we introduce a web server that can rapidly fold tens of thousands of base pairs into a nucleosome array or putative chromatin template using either an automated placement procedure or a list of defined nucleosome positions. The automated procedure uses a nucleosome stability model to predict the lowest energy conformation, while the list-based approach allows direct input of experimentally determined positions. By comparing the two methods of chromatin folding, one can assess the role of sequence-dependent nucleosome stability in determining chromatin architecture.
The web server offers a choice between several different nucleosome stability models. All models assume elastic deformations of individual base pairs in the nucleosome. The default provides a correlation of 0.7 with experimentally determined values of nucleosome stability for over 100 sequences of DNA (14). The mouse mammary tumor virus promoter sequence (MMTV) is used here as an example to demonstrate how users can employ these tools to investigate promoter function. The default settings properly predict the known positioning in the MMTV.
Below we follow the workflow in ICM Web as a means of introducing the underlying methods while also describing the server functionality. We conclude with an application of these tools to the MMTV.
Workflow in ICM Web begins at http://dna.ccs.tulane.edu/icm . The user inputs a sequence, selects an energy model, specifies the occupancy and provides a temperature (see Supplementary Figure S1). Clicking ‘Go’ produces the output: a 2D energy level diagram, 3D structures of free DNA and the nucleosome array and a 6D helix parameter model (see Supplementary Figure S2).
The sequence of interest is entered into the primary text box on the main page as a string of As, Cs, Gs or Ts. Any other characters will generate an error code when the ‘Go’ button is activated. A set of more than 100 DNA sequences that have been extensively studied by our group (14) and others (15–17) is provided for convenience. A link to PubMed’s Entrez Nucleotide Database is also provided. FASTA formatted sequences without the header lines are suitable input.
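A sequence can be checked locally before it is pasted into the form. The following is a minimal sketch; the helper name `clean_sequence` is hypothetical, and the handling of whitespace (line breaks from a FASTA body are stripped) is an assumption about what the server tolerates, not a description of its parser.

```python
def clean_sequence(text: str) -> str:
    """Return the sequence with whitespace removed, or raise ValueError.

    ASSUMPTION: line breaks and spaces are harmless (FASTA body without header);
    anything other than upper-case A, C, G or T is rejected.
    """
    seq = "".join(text.split())
    bad = sorted(set(seq) - set("ACGT"))
    if not seq or bad:
        raise ValueError(f"sequence must contain only A, C, G, T; found {bad}")
    return seq


if __name__ == "__main__":
    print(clean_sequence("ACGT\nTTGA\n"))
```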
The energy model is an elastic energy model that utilizes the inter-base-pair DNA helical parameters (Tilt, Roll, Twist, Shift, Slide and Rise) (18). Let values for the six helical parameters of free DNA be denoted by the list XDNA. Since there are 16 canonical dinucleotide steps AA, AC, …, TG, TT, the lists are indexed as XDNA(j), 1≤j≤16. Similarly, the helical parameters describing the conformation of nucleosomal DNA with nnuc base pairs are denoted by the list Xnuc(i), where 1≤i≤nnuc−1. For a sequence of length nseq>nnuc base pairs, the energy associated with deforming any subset of nnuc adjacent base pairs into the geometry required for a nucleosome is modeled as:
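Written out with the symbols defined in this section, and assuming the conventional prefactor of 1/2 for a harmonic energy (an assumption here, as is the exact index convention), the energy of a footprint starting at position s takes the form:

```latex
E_{nuc}(s) = \frac{1}{2} \sum_{i=1}^{n_{nuc}-1}
  \left[ X_{nuc}(i) - X_{DNA}(j_i) \right]^{T}
  K(j_i)
  \left[ X_{nuc}(i) - X_{DNA}(j_i) \right]
```

In this form j_i is the dinucleotide identity of the i-th step inside the window that begins at s, so the dependence on s enters only through the sequence.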
Here K(ji) denotes a 6×6 matrix of base pair step specific stiffness constants, and ji is the sequence identity at step i. In the simplest model of DNA elasticity K is diagonal and reduces to independent linear constitutive relations for each helical parameter. For a more realistic description of DNA, K is not such a sparse matrix; there exist couplings between deformations of the various helical parameters. Sequence-specific values for K are available from refs. 19 and 20. For a sequence of length nseq there are nseq−nnuc values of Enuc(s). Each one represents the energy associated with a footprint of the histone octamer on the DNA that begins at s and extends to s+nnuc.
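The sliding-window calculation can be sketched in a few lines. This is an illustration under stated assumptions, not the server's code: the function names are hypothetical, a prefactor of 1/2 is assumed, and the loop enumerates every window that fits in the sequence, which may differ by one from the count quoted above depending on the indexing convention.

```python
from itertools import product

# The 16 canonical dinucleotide steps: AA, AC, ..., TG, TT.
STEPS = ["".join(p) for p in product("ACGT", repeat=2)]


def step_energy(x_nuc, x_dna, k):
    """0.5 * d^T K d for one step, with d = x_nuc - x_dna (6-vectors, 6x6 K)."""
    d = [a - b for a, b in zip(x_nuc, x_dna)]
    return 0.5 * sum(d[r] * k[r][c] * d[c] for r in range(6) for c in range(6))


def energy_landscape(seq, x_nuc, x_dna, k):
    """Deformation energy for every footprint start s.

    x_nuc: list of 6-vectors, one per base-pair step of the nucleosome template.
    x_dna, k: dicts keyed by dinucleotide, giving the rest state and stiffness.
    """
    n_steps = len(x_nuc)      # n_nuc - 1 steps for n_nuc base pairs
    n_nuc = n_steps + 1
    energies = []
    for s in range(len(seq) - n_nuc + 1):
        total = 0.0
        for i in range(n_steps):
            step = seq[s + i : s + i + 2]   # sequence identity at this step
            total += step_energy(x_nuc[i], x_dna[step], k[step])
        energies.append(total)
    return energies
```

With a diagonal K the double sum in `step_energy` reduces to six independent quadratic terms, which is the simplest model described above; off-diagonal entries supply the couplings between helical parameters.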
Currently, ICM Web allows four different choices for the conformation of free DNA, XDNA, and an accompanying set of stiffnesses, K. These are labeled XrayB, the B-DNA values from ref. 19; XrayP, the P–DNA (protein–DNA) values from ref. 19; MD-B, the values in ref. 20; and IdealB, an idealized set of B-DNA values. The values from ref. 19 are based on analysis of X-ray crystallographic structures of B-form DNA or of protein-bound DNA, while the values from ref. 20 were obtained from analysis of molecular dynamics simulations of free DNA in B-form conformation.
The values in IdealB represent an idealized conformation of free DNA with a Twist of 35.4°/bp and Rise of 3.32Å. The other helical parameters in the IdealB set are zero; thus, the intrinsic conformation of B-form DNA is straight for this set. It has no intrinsic bend or shear deformations. The matrix of stiffness constants for this idealized case corresponds to gross measures of stiffness that are sequence independent and exhibit no coupling, i.e. K is diagonal. This set is determined by averaging the sequence dependent values in ref. 20. Isotropic bend and shear values are determined as harmonic averages of the anisotropic values. The IdealB set (K and XDNA) will not capture any of the known sequence-specific properties of DNA or sequence-specific nucleosome stability measures. The energy landscape is flat (i.e. constant) so it cannot be used to predict nucleosome positioning. However, the IdealB parameters sets, K or XDNA, are useful for determining the role of intrinsic conformation, nucleosome conformation, Xnuc or sequence specificity, K, when mixed with other parameter sets.
The choices for conformation of the nucleosome, Xnuc, are more numerous. There are currently some 30 X-ray structures of the nucleosomes. Helical parameter sets for 20 of these are provided as possible choices. The choices on the web site correspond to the protein databank id tags (http://www.rcsb.org). Recently, Bishop conducted a study of all available X-ray structures to identify the necessary and sufficient distribution of helical parameter values required to achieve high-resolution models of nucleosomal DNA geometry (21). This was accomplished using Fourier filtering of the helical parameters to create so-called knockouts and knockins. The knockout and knockin series obtained from the crystal structure 1kx3 are provided. These lists of helical parameters are indicated as ko.NN.par or ki.NN.par where NN represents the knockout wave number or the total number of knockins (see ref. 21 for complete details). These structures do not necessarily provide realistic models of the nucleosome's 3D structure. For some X-ray structures a ‘min’ selection is also offered. These structures contain only the necessary and sufficient distribution of helical parameter values required to achieve a high-resolution reconstruction of the original X-ray structure, i.e. within 3 Å RMSD, as determined in ref. 21. Since the helical parameters in the sets denoted with a min have been smoothed by Fourier filtering, the energy landscape predicted with these data sets is also smooth compared to the corresponding non-filtered data set. Additionally, the energy values are lower.
The default settings for K, Xnuc and XDNA provide energy values that have a 0.7 correlation with over 100 experimentally determined values of nucleosome stability (14), 84 from refs. 15 and 22 and 33 from ref. 16.
For DNA free in solution, random thermal fluctuations cause its conformation to deviate from the intrinsic helical parameters, XDNA. In nucleosomes, the gross superhelical structure of DNA must be preserved. Therefore, thermal motion of nucleosomal DNA is not entirely random. ICM Web allows the user to introduce thermal fluctuations into the model of free DNA and the linker DNA between nucleosomes. For this purpose, a temperature must be given. Temperature serves as a scale factor for the Gaussian noise added to each base pair step parameter in the free DNA or linker DNA regions as the 3D models are constructed. For each helical parameter, a value is drawn from a routine that provides a Gaussian distribution with mean value μ and deviation σ. Here, T represents the temperature in Kelvin specified on the web form and σABC represents the sequence independent fluctuation values for Tilt, Roll, Twist, Shift, Slide and Rise obtained from molecular dynamics simulations of B-form DNA free in solution (23). A temperature of zero thus provides no variation from the XDNA set of helical parameter values selected as input. The random seed is fixed so that the results are reproducible.
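A minimal sketch of temperature-scaled noise with a fixed seed follows. The scaling law is an assumption made for illustration: the deviation is taken as σABC multiplied by sqrt(T/298), so that T = 0 gives no noise. The σ values below are placeholders, not the values from ref. 23, and the function name is hypothetical.

```python
import math
import random

# Order: Tilt, Roll, Twist (degrees), Shift, Slide, Rise (angstroms).
IDEAL_B = [0.0, 0.0, 35.4, 0.0, 0.0, 3.32]          # IdealB rest state from the text
SIGMA_PLACEHOLDER = [4.0, 5.0, 6.0, 0.5, 0.5, 0.3]  # PLACEHOLDER fluctuations


def thermal_step(x_dna, sigma, temperature, rng, t_ref=298.0):
    """Return one base-pair step perturbed by Gaussian noise.

    ASSUMPTION: sigma applies at t_ref and scales as sqrt(T / t_ref).
    """
    scale = math.sqrt(temperature / t_ref)
    return [rng.gauss(mu, s * scale) for mu, s in zip(x_dna, sigma)]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed, so repeated runs give the same model
    print(thermal_step(IDEAL_B, SIGMA_PLACEHOLDER, 298.0, rng))
```

Passing an explicitly seeded `random.Random` instance, rather than relying on the module-level generator, keeps the reproducibility local to the model being built.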
There are two placement options available. One defines the starting coordinate of each nucleosome footprint. The other allows ICM Web to automatically place nucleosomes in the energy landscape based on occupancy criteria.
We define occupancy as the number and location of nucleosomes on a segment of DNA. Free DNA corresponds to an occupancy of zero. The maximum number of nucleosomes that can be placed on a segment of DNA is nmax=nseq/(nnuc+nlk), where nseq is the length of the input sequence, nnuc is the number of base pairs in the nucleosome and nlk is the minimum allowed length of linker between nucleosomes. Occupancy is controlled on the server by specifying nocc and nlk. If nocc>1, an attempt will be made to place this many nucleosomes on the sequence of DNA. If 0<nocc<1, nocc is interpreted as a fraction of nmax. In practice, we find that 70–80% occupancies can be readily achieved with our simple population method described below. Higher occupancies cannot usually be realized with this method. To achieve occupancies approaching 100%, the user has to specify nucleosome start sites.
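The interpretation of nocc can be illustrated with a short sketch. The function names are hypothetical; rounding nmax and the fractional target down to whole nucleosomes, and treating nocc = 1 as a count of one, are assumptions that the text does not settle.

```python
def max_nucleosomes(n_seq, n_nuc=147, n_lk=0):
    """n_max = n_seq / (n_nuc + n_lk), rounded down (ASSUMPTION)."""
    return n_seq // (n_nuc + n_lk)


def target_count(n_occ, n_seq, n_nuc=147, n_lk=0):
    """Number of nucleosomes the automated placement would aim for."""
    n_max = max_nucleosomes(n_seq, n_nuc, n_lk)
    if n_occ >= 1:                 # explicit count, capped at n_max
        return min(int(n_occ), n_max)
    if 0 < n_occ < 1:              # fraction of n_max
        return int(n_occ * n_max)
    raise ValueError("n_occ <= 0 selects uniform spacing, not a count")


if __name__ == "__main__":
    # 1500 bp with a 30 bp minimum linker: 1500 // 177 = 8 nucleosomes at most.
    print(max_nucleosomes(1500, 147, 30), target_count(0.75, 1500, 147, 30))
```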
The automated placement option utilizes the energy landscape to populate the landscape as follows: the global minimum of the energy landscape is determined by sorting all possible locations and a nucleosome is placed at this position. This nucleosome’s footprint occludes nnuc base pairs from the landscape. The sort and locate process repeats until the desired occupancy is achieved. The specification of a nonzero linker length extends the nucleosome footprint by nlk/2. This is an ad hoc method of avoiding steric overlap between neighboring nucleosomes without including an explicit nucleosome-nucleosome interaction potential. This simple occupancy algorithm does not necessarily guarantee the lowest total energy conformation for the nucleosome array. More sophisticated algorithms are being considered, but for web functionality they must also be fast.
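The procedure is a greedy search, and a minimal sketch of it is given here. It is not the server's implementation: the function name is hypothetical, and the footprint is assumed to be extended by nlk/2 on each side, so that two start sites conflict when they are closer than nnuc+nlk.

```python
def place_greedy(energies, n_target, n_nuc=147, n_lk=0):
    """Greedy placement: take the lowest-energy start that is still free.

    energies[s] is the energy of a footprint starting at s.
    ASSUMPTION: starts closer than n_nuc + n_lk are mutually exclusive.
    """
    min_gap = n_nuc + n_lk
    # Sorting once is equivalent to re-sorting after each placement, because
    # placing a nucleosome only removes positions; it never changes energies.
    order = sorted(range(len(energies)), key=energies.__getitem__)
    placed = []
    for s in order:
        if len(placed) == n_target:
            break
        if all(abs(s - p) >= min_gap for p in placed):
            placed.append(s)
    return sorted(placed)


if __name__ == "__main__":
    toy = [5, 1, 4, 0, 3, 2, 6, 1.5]   # toy landscape, 2 bp "nucleosomes"
    print(place_greedy(toy, 3, n_nuc=2, n_lk=0))
```

Because each choice is made without regard to later ones, the result can miss the arrangement with the lowest total energy, which is the limitation noted above; an exact alternative would need a search over combinations of footprints.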
Since nocc≤0 is a non-physical occupancy, we utilize such values to trigger alternate methods of populating the segment of DNA with nucleosomes. If nocc=0, then nucleosomes are located at regular intervals separated by a linker of length nlk, i.e. at 0, nnuc+nlk, 2(nnuc+nlk), …. If nocc<0, the nucleosomes are spaced at regular intervals as in the nocc=0 case but the positions of all nucleosomes are shifted by |nocc| base pairs. For example, if the occupancy is −5 the footprints begin at 5, nnuc+nlk+5, ….
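Uniform spacing reduces to an arithmetic progression of start sites. The sketch below assumes a constant period of nnuc+nlk between successive starts and keeps only footprints that fit entirely on the sequence; the function name is hypothetical.

```python
def uniform_starts(n_occ, n_seq, n_nuc=147, n_lk=0):
    """Footprint starts for n_occ <= 0: regular spacing shifted by |n_occ| bp."""
    if n_occ > 0:
        raise ValueError("uniform spacing is selected by n_occ <= 0")
    shift = abs(int(n_occ))
    period = n_nuc + n_lk          # ASSUMPTION: constant start-to-start distance
    # Last admissible start is n_seq - n_nuc, so the footprint stays on the DNA.
    return list(range(shift, n_seq - n_nuc + 1, period))


if __name__ == "__main__":
    print(uniform_starts(0, 600, 147, 30))    # starts with no phase shift
    print(uniform_starts(-5, 600, 147, 30))   # same spacing, shifted by 5 bp
```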
Instead of using our automated procedure, a list of nucleosome positions can be specified directly. In this case, an energy landscape is calculated and reported, but only the specified positions are used to determine nucleosome start sites. This allows users to utilize their own algorithms for determining nucleosome positioning or to use experimentally determined values as compiled in ref. 3 to generate a putative chromatin fold. On the ICM input page links are provided to other web sites that predict nucleosome positions.
The ‘Reset’ button clears all forms and sets all variables to default values. The ‘Go’ button initiates three calculations and generates the results page. The first calculation determines the energy landscape. The second determines nucleosome footprints via the iterative search criteria mentioned above or checks the validity of the user defined start positions before placing nucleosomes at the indicated positions. Two sets of helical parameters are then generated: one describes free DNA; the other describes the putative chromatin template. The third calculation converts the helical parameter representations to Cartesian coordinate representations of the DNA using the algorithm defined in ref. 24.
The data from these calculations are presented on the results page (Supplementary Figure S2), which contains an energy landscape and footprints displayed as a 2D nucleosome energy level diagram; DNA helical parameters displayed as a set of six plots, a 6D model; and two 3D models, one for free DNA and one for the nucleosome array. Since nucleosome positions are calculated based only on nucleosome-DNA interactions (i.e. in helical parameter space) or obtained as a list of start sites, there is no a priori check to identify unfavorable steric interactions. For this reason, a distance calculation is conducted on the 3D nucleosome array to identify such steric clashes. The minimal nucleosome–nucleosome, nucleosome–DNA and DNA–DNA distances are reported.
The 3D models are 4 points-per-base pair models of DNA in xyz format. The four atoms define the centerline (atom name CA) and the local helical axis frame (atom names H1, H2 and H3). For the chromatin model, an additional atom is located at the center of geometry of each nucleosome (atom name OC for octasome center). The 3D models are displayed using Jmol: an open-source Java viewer for chemical structures in 3D (http://www.jmol.org/) (25). Jmol displays a sphere of diameter 2nm every 5bp along the DNA and a sphere of diameter 8nm for each histone octamer. Spheres in close contact are colored red. Close contacts are defined as <110Å for nucleosome-nucleosome spacing, <65Å for nucleosome-DNA spacing, and <20Å for DNA–DNA spacing. The user has full control over the display in Jmol and can color base pairs to display biological information (Figure 3). A ‘Save Data’ link allows for download of all data. The helical parameter files can be converted into all-atom models by 3DNA (26) or a coarse-grain representation with a VMD plug-in that we developed, VDNA (27).
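The close-contact criteria can be applied to any set of bead coordinates. The sketch below is illustrative only: the bead labels `"nuc"` and `"dna"` and the function names are hypothetical, coordinates are assumed to be in ångströms, and the caller is assumed to have already excluded DNA beads that are neighbors along the chain, which are necessarily closer than 20 Å.

```python
import math
from itertools import combinations

# Close-contact thresholds in angstroms, as quoted in the text.
THRESHOLDS = {("nuc", "nuc"): 110.0, ("nuc", "dna"): 65.0, ("dna", "dna"): 20.0}


def min_distances(beads):
    """Smallest distance per pair class; beads is a list of (kind, (x, y, z))."""
    best = {}
    for (kind_a, pos_a), (kind_b, pos_b) in combinations(beads, 2):
        # Reverse sort puts "nuc" before "dna", matching the THRESHOLDS keys.
        key = tuple(sorted((kind_a, kind_b), reverse=True))
        d = math.dist(pos_a, pos_b)
        best[key] = min(d, best.get(key, math.inf))
    return best


def flagged(best):
    """True for each pair class whose minimum distance is a close contact."""
    return {key: d < THRESHOLDS[key] for key, d in best.items()}


if __name__ == "__main__":
    beads = [("nuc", (0, 0, 0)), ("nuc", (100, 0, 0)),
             ("dna", (0, 70, 0)), ("dna", (0, 70, 30))]
    print(flagged(min_distances(beads)))
```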
As noted in methods, if XDNA and K are chosen to be IdealB then the total energy is constant, regardless of sequence. The only determinant of energy in this case is Xnuc, the conformation of the nucleosome. If Xnuc is set to 1kx5.min the energy required to deform DNA into this nucleosome is 151kcal/mol-nuc ~ 1.7kbT/bps. For Xnuc=1kx5 the value is 417kcal/mol-nuc ~ 4.8kbT/bps. The corresponding results for pdb entry 1kx3 are 173kcal/mol-nuc ~ 2.0kbT/bps and 502kcal/mol-nuc ~ 5.8kbT/bps (1kx3 has 146 instead of 147bp). These results agree well with physical expectations. For a simple homogeneous model of DNA, ~2kbT/bps is needed to wrap DNA around the histone octamer (28). Our smoothed nucleosomal DNA super-helices, 1kx5.min or 1kx3.min, yield this result. The helical parameters obtained directly from X-ray structures, 1kx5 or 1kx3, have not been smoothed so the energy is higher. Since there are 6 degrees of freedom in our model of DNA we expect the effects of thermal motion to contribute an extra 3kbT/bps over the minimum energy conformation. Again, our results agree with this expectation. Similar energy values are obtained if the sequence specific parameter set denoted MD-B is used.
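The unit conversions quoted here can be checked directly. The sketch assumes a temperature of 298K and division by the number of base pairs (147 for 1kx5, 146 for 1kx3); dividing by the number of steps instead changes the results only in the second decimal place. The function name is hypothetical.

```python
K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def kbt_per_bp(e_kcal_per_mol, n_bp, temperature=298.0):
    """Convert a per-nucleosome energy in kcal/mol to kBT per base pair."""
    return e_kcal_per_mol / (K_B * temperature * n_bp)


if __name__ == "__main__":
    for label, energy, n_bp in [("1kx5.min", 151, 147), ("1kx5", 417, 147),
                                ("1kx3.min", 173, 146), ("1kx3", 502, 146)]:
        print(label, round(kbt_per_bp(energy, n_bp), 1))
```

The difference between each smoothed and unsmoothed pair is about 3 to 4 kBT per base pair, comparable to the 6 × ½ kBT = 3 kBT expected from equipartition over six harmonic degrees of freedom.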
Using the sequences in refs. 16 and 17 as a test set we have demonstrated that the default energy options (K=MD-B.dat, Xnuc=1kx5.min.par, and XDNA=MD-B.par) achieve a correlation of 0.7 with the experimentally determined values of ΔΔG associated with these sequences of DNA. Results using the XrayB and XrayP sets are not so well correlated. A complete analysis has been reported elsewhere (14).
Thermal fluctuations strongly affect the 3D conformations of the free DNA model and nucleosome array (Figure 1). For the free DNA conformation thermal motion introduces a random distribution about the intrinsic conformation of each base pair step. For nucleosome arrays, thermal fluctuations are only introduced into the linker regions. If the temperature is set to 0K, then the equilibrium conformation is defined only by the choice of helical parameters XDNA. A zero temperature model is useful for identifying intrinsic bends or other deformations in the free DNA and linker regions. A nonzero temperature is useful for investigating effects of flexibility on overall conformation.
The choice of occupancies also strongly affects the overall conformation of the nucleosome array. We provide a series of images to demonstrate various occupancies (Figure 2). All of the images have been assigned a temperature of 298K. The images include uniform nucleosome spacing (i.e. constant linker) with different linker lengths and irregular nucleosome spacing determined by our minimum energy criteria with different percent occupancies.
We utilize the MMTV long terminal repeat promoter as a sample sequence because it is widely used as an experimental construct to regulate gene expression. It has been extensively characterized (29–33). The sequence GR-MMTV (34) used here can be obtained from PubMed as entry V01175. The C3H strain of the MMTV, J02274 (35) is also available from PubMed. Differences between these sequences are discussed in ref. 36.
This promoter is steroid activated but some basal transcription occurs even in the inactive state. This promoter is known to position six nucleosomes, denoted [A–F], that assemble into an organized chromatin structure (30,37). The locations of these nucleosomes are indicated in Figure 3 with A +70, B −127, C −415, D −672, E −918 and F −1139. (The reported positions are for the nucleosome dyad position relative to the transcription start. Our Enuc uses the footprint start site so subtract 73 from these values.) The promoter contains four binding sites for the glucocorticoid receptor (GR), two octamer transcription factor 1 binding sites (Oct) and a nuclear factor 1 (NF1) binding site (34). Nucleosomes A and B are of particular interest because nucleosome A overlaps with the TATA element, while nucleosome B overlaps with the NF1 and four GR binding sites. The rotational phasing of B leaves the GR binding sites open to binding (38). Steroid activation alters the organization of nucleosomes B and C (39,40), and establishes a specific translational phasing that does not exist in the inactive state (41). There also appears to be a boundary effect resulting from the stability of nucleosomes A and D, while the positioning of C is influenced by the more proximal region of the promoter (41). Recent models for MMTV structure and mechanism of action are presented in (42) and (43), respectively. A bent secondary chromatin structure is proposed in figure 7 of ref. 42. Given this information, we can readily interpret the nucleosome energy level diagram and the models predicted by ICM Web.
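The dyad-to-footprint conversion and the resulting linker lengths can be tabulated from the six positions quoted above. This is a minimal sketch assuming a 147bp footprint with the dyad 73bp from its start; the names are hypothetical, and all values remain relative to the transcription start because the index of that site within a given sequence file is not stated here.

```python
# Dyad positions relative to the transcription start, as quoted in the text.
DYADS = {"A": 70, "B": -127, "C": -415, "D": -672, "E": -918, "F": -1139}

HALF_FOOTPRINT = 73   # dyad offset from the footprint start
N_NUC = 147           # ASSUMPTION: 147 bp footprint


def footprint_starts(dyads, half=HALF_FOOTPRINT):
    """Footprint start for each nucleosome, relative to the transcription start."""
    return {name: dyad - half for name, dyad in dyads.items()}


def linker_lengths(dyads, n_nuc=N_NUC):
    """Free DNA between successive footprints, ordered from distal to proximal."""
    ordered = sorted(dyads.values())
    return [b - a - n_nuc for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    print(footprint_starts(DYADS))
    print(linker_lengths(DYADS))   # linkers F-E, E-D, D-C, C-B, B-A
```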
The energy landscape predicted with the default energy options produces the locations of six positioned nucleosomes in the MMTV (37,40) with a high degree of accuracy. Nucleosome D is the most stable and is separated from A, B and C by a region of high energy where nucleosomes are least likely to form. We propose that this feature corresponds to the observed boundary effect. The models returned in the Jmol viewer clearly indicate that the MMTV with six positioned nucleosomes is a much more open structure than depicted in the literature (42). For this array, the minimum nucleosome-nucleosome distance is nearly 24nm, thus the histone tails from different nucleosomes cannot interact. The TATA box (green) and most distal GR binding site (red) are just outside the nucleosome footprint and are in close proximity to one another as the linker enters and exits nucleosome B. The Oct 1 binding site (orange) closest to the TATA box has a GR binding site adjacent to it, while the NF1 binding site (blue) is directed away from the entire nucleosome array.
It is not likely that the length of free DNA between nucleosomes will assume the extended conformation indicated by ICM Web (Left image in Figure 3); rather, extra-nucleosomal interactions and thermal fluctuations will cause the linker DNA to deviate from its equilibrium conformation. However, the fact that six positioned nucleosomes leave significant lengths of DNA exposed has not been specifically addressed in the MMTV literature. For highly purified in vitro systems, extra-nucleosomal interactions may cause the linker to self-interact or bend and enable the MMTV to further condense. For in vivo systems of chromatin, not only are there spatial restrictions on the chromatin, but there are also numerous other factors that come into play.
For these reasons, we allow the user to override the energy landscape and specify a uniform distribution of nucleosomes or an arbitrary set of nucleosome locations. A model of condensed chromatin can be obtained by setting the number of nucleosomes, nocc, to zero (uniform spacing) and the linker length, nlk, to 30bp (Right image in Figure 3). The minimum energy for the condensed array is obtained when all nucleosomes are positioned with a phase shift of 6 (nocc=−6), i.e. the first nucleosome starts at position 6. This positioning locates nucleosome B such that the TATA box (green) is outside of the footprint of nucleosome B (right image in Figure 3). The TATA box also appears on the surface of the fiber. At the other end of nucleosome B, the GR binding sites are shifted into the linker regions and are located deep within the chromatin fiber. However, as the right image in Figure 3 indicates, even when the GR binding sites are located deep within the fiber they are still very accessible. This result agrees well with observations of fast on-off rates for GR in vivo (44). ICM Web also predicts a bent chromatin structure. Since all nucleosomes are identical, the bend in chromatin arises only from deformations in the linker DNA.
We have demonstrated how ICM can be achieved in real time using a simple elastic model to automatically place nucleosomes. We have also introduced the idea of a nucleosome energy level diagram as an intuitive means of tracking energy differences associated with different nucleosome footprint configurations. Our present model of nucleosome stability is overly simplistic. Clearly, a more complete description of chromatin will account for nucleosome–nucleosome, nucleosome–linker, linker–linker interactions and all other types of extra-nucleosomal interactions. Our ‘linker’ length is an ad hoc method of providing some of this functionality. There are a number of efforts underway to investigate nucleosome positioning, see for example, ref. (45). Such effects will be included in future versions of our tools.
Since a unique structure for chromatin may not exist, we regard ICM Web, not so much as a predictive tool, but as a tool for rapidly assembling models of chromatin that can be employed to rationalize biophysical data, especially spatial relations. Such a tool is crucial for investigating molecular mechanisms of chromatin. To our knowledge, there is no such tool available. ICM Web provides this functionality.
Funding for open access charge: National Institutes of Health (R01GM076356 to T.B.).
Conflict of interest statement. None declared.
Supplementary Data are available at NAR Online.
We wish to acknowledge Gregory Foster for his efforts in creating an initial version of this web page. Carleigh Hebbard, Austin Griffith, and Yuriy Sereda are acknowledged for testing the web page and other contributions to our efforts.