The concept of RNA secondary structure began with the work of Doty and Fresco (1, 2). The prediction of RNA secondary structure (folding) by energy minimization using nearest neighbor energy parameters began with Tinoco and colleagues (3–6) and also with Delisi and Crothers (7). Efficient algorithms for RNA secondary structure prediction using dynamic programming methods borrowed from sequence alignment were developed independently by a number of people (8–13).
My own early RNA folding programs (12, 14) computed a single minimum energy folding of an RNA sequence. They were popular in the 1980s and a modified version was incorporated into the UWGCG (University of Wisconsin Genetics Computer Group) suite of programs (15); the acronym was shortened to GCG when this group left the University of Wisconsin to form a private company. Initially, they used free energy parameters that had been summarized by Salser (16). After 1986, free energies from the Turner group (Doug H. Turner, Department of Chemistry, University of Rochester, Rochester, NY) were used (17).
The ‘mfold’ software for RNA folding was developed in the late 1980s (18). The ‘m’ simply refers to ‘multiple’. The core algorithm predicts a minimum free energy, ΔG, as well as minimum free energies for foldings that must contain any particular base pair. Any base pair, rᵢ−rⱼ, between the ith nucleotide and the jth nucleotide that is contained in a folding no more than δδG from the minimum, is plotted in a triangular plot called the ‘energy dot plot’. The base pair rᵢ−rⱼ is plotted in row i and column j of this matrix. The free energy increment, δδG, is chosen a priori by the user, who selects a ‘percent suboptimality’, P. From this, δδG is computed to be (P/100) |ΔG|. Base pairs within this free energy increment are chosen either automatically, or else by the user, and foldings that contain the chosen base pair are computed. They have minimum free energy conditional on containing the chosen base pair. The description and use of the mfold package has appeared in a number of articles (19–22). The closely related ‘RNAstructure’ program has also been described (23, 24).
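The relationship between the percent suboptimality, P, the free energy increment, δδG, and the contents of the energy dot plot can be illustrated with a minimal sketch. It is not taken from the mfold source code: the function names `energy_increment` and `dot_plot_pairs` are hypothetical, and the table of conditional minimum free energies is invented for illustration (in mfold these values come from the dynamic programming algorithm, which is not shown here). The sketch also assumes that the optimal folding contains at least one base pair, so that ΔG equals the smallest conditional energy.

```python
from typing import Dict, List, Tuple

BasePair = Tuple[int, int]  # (i, j): ith nucleotide paired with jth nucleotide


def energy_increment(percent_suboptimality: float, min_free_energy: float) -> float:
    """Return the increment (P/100) * |dG|, in the same units as dG."""
    if percent_suboptimality < 0:
        raise ValueError("percent suboptimality must be non-negative")
    return percent_suboptimality / 100.0 * abs(min_free_energy)


def dot_plot_pairs(
    pair_energies: Dict[BasePair, float], percent_suboptimality: float
) -> List[BasePair]:
    """Select the base pairs that would appear in an energy dot plot.

    pair_energies maps each base pair (i, j) to the minimum free energy of
    any folding that must contain that pair (hypothetical input).
    """
    # Assumption: the overall minimum free energy is the smallest
    # conditional energy, i.e. the optimal folding has at least one pair.
    min_free_energy = min(pair_energies.values())
    cutoff = min_free_energy + energy_increment(percent_suboptimality, min_free_energy)
    # Keep pairs found in some folding within the increment of the minimum.
    return sorted(pair for pair, energy in pair_energies.items() if energy <= cutoff)


# Invented example values in kcal/mol; not real mfold output.
example_energies = {
    (1, 20): -10.0,
    (2, 19): -10.0,
    (5, 15): -9.6,
    (3, 12): -8.0,
}
selected = dot_plot_pairs(example_energies, percent_suboptimality=5.0)
print(selected)
```

With P = 5 and ΔG = −10.0 kcal/mol, the increment is 0.5 kcal/mol, so only pairs whose conditional energy is at most −9.5 kcal/mol are kept. Each selected pair (i, j) would be drawn in row i and column j of the triangular plot.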
The Turner group has published numerous articles over the years that detail the development of the RNA folding parameters. A subset of these articles are what I would call ‘major works’ that summarize the current state of the art. Version 1 of the mfold package used free energies that were described by Freier et al. (17). Versions 2.1 to 2.3 used the parameters from Walter et al. (25), although the incorporation of coaxial stacking parameters into the minimization algorithm has not been accomplished. The current version 3 software uses free energy data from Mathews et al. (26).
DNA folding prediction with the mfold software began in 1996, when DNA specific parameters were added to the mfold package through a collaboration with the SantaLucia group (John SantaLucia Jr., Department of Chemistry, Wayne State University, Detroit, MI). These data have been described by SantaLucia (27). The DNA stacking (27), single mismatch (28–32) and dangling end (33) parameters have been measured in the SantaLucia laboratories. The remaining terminal stacking and loop parameters were estimated by SantaLucia and were incorporated into the mfold package on the basis of a personal communication from John SantaLucia. They remain unpublished. In 1999, corrections for [Na⁺] and [Mg⁺⁺] were incorporated into the mfold package (34).
The mfold web server was first created at Washington University School of Medicine during the fall of 1995. DNA folding parameters were added in the spring of 1996. From 1995 until the fall of 2000, the server ran on SGI workstations or multiprocessor servers (Silicon Graphics, Inc. 1600 Amphitheatre Pkwy., Mountain View, CA 94043), as well as on a dual processor ‘Intel/Solaris’ platform (Pentium 2 processors by Intel Corp., Solaris operating system by Sun Microsystems). The server was moved to Rensselaer Polytechnic Institute (RPI) in October 2000. It ran on a dual processor ‘Intel/Linux’ platform (Linux operating system as developed and marketed by Red Hat, Inc., 1801 Varsity Drive, Raleigh, NC 27606). Since July 2002, the mfold web server has been running on a cluster of 36 dual processor ‘Intel/Linux’ workstations that were donated to the joint RPI-Wadsworth Bioinformatics Center by IBM (IBM Research, P.O. Box 218, Yorktown Heights, NY 10598). This equipment was awarded as an SUR (Shared University Research) grant to RPI and the Wadsworth Center (PI: M. Zuker).
The use of the mfold web server has grown steadily since its inception. The ‘quikfold’ server was added while I was still at Washington University. The remaining servers that will be described have all been added within the past 2 years. The servers have been used extensively by researchers in universities, medical schools, non-profit organizations, US government and military laboratories and by companies all over the world. In addition, the servers are being used in the teaching of computational biology methods.