Nucleic Acids Res. 1993 April 11; 21(7): 1655–1664.
PMCID: PMC309377

Computer-assisted prediction, classification, and delimitation of protein binding sites in nucleic acids.


We present a method to determine the location and extent of protein binding regions in nucleic acids by computer-assisted analysis of sequence data. The program ConsIndex establishes a library of consensus descriptions based on sequence sets containing known regulatory elements. These defined consensus descriptions are used by the program ConsInspector to predict binding sites in new sequences. We show that the programs correctly determine the significant regions involved in transcriptional control of seven sequence elements. The internal profile of relative variability of individual nucleotide positions within these regions paralleled experimental profiles of biological significance. Consensus descriptions are determined by employing an anchored alignment scheme, the results of which are then evaluated by a novel method that is superior to cluster algorithms. The alignment procedure is able to include several closely related sequences without biasing the consensus description. Moreover, the algorithm detects additional elements on the basis of a moderate distance correlation and is capable of discriminating between real binding sites and false positive matches. The software is well suited to cope with the frequent phenomenon of optional elements present in a subset of functionally similar sequences, while taking maximal advantage of the existing sequence database. Since it requires only a minimum of seven sequences for a single element, it is applicable to a wide range of binding sites.
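
The general idea of deriving a consensus description from known sites and using it to inspect new sequences can be illustrated with a minimal sketch. This is not the ConsIndex or ConsInspector algorithm, whose anchored alignment and evaluation steps are not specified in the abstract. It assumes the sites are already aligned and of equal length, substitutes a plain entropy-weighted frequency score, and uses made-up toy sequences. All function names, the `MIN_SITES` constant (which only mirrors the seven-sequence minimum stated above), and the threshold are hypothetical.

```python
from collections import Counter
from math import log2

BASES = "ACGT"
MIN_SITES = 7  # assumption: mirrors the minimum number of sequences named in the abstract


def position_frequencies(aligned_sites):
    """Per-column base frequencies for pre-aligned, equal-length sites."""
    if len(aligned_sites) < MIN_SITES:
        raise ValueError(f"need at least {MIN_SITES} sites")
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    n = len(aligned_sites)
    freqs = []
    for col in range(length):
        counts = Counter(site[col] for site in aligned_sites)
        freqs.append({base: counts.get(base, 0) / n for base in BASES})
    return freqs


def conservation(freqs):
    """Per-column conservation: 1 for an invariant column, 0 for a uniform one."""
    weights = []
    for col in freqs:
        entropy = -sum(p * log2(p) for p in col.values() if p > 0)
        weights.append(1 - entropy / 2)  # maximum entropy for four bases is 2 bits
    return weights


def scan(sequence, freqs, threshold):
    """Return (start, window, relative score) for windows scoring >= threshold."""
    weights = conservation(freqs)
    best = sum(w * max(col.values()) for w, col in zip(weights, freqs))
    if best == 0:
        raise ValueError("profile carries no information")
    width = len(freqs)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(w * col.get(base, 0.0)
                    for w, col, base in zip(weights, freqs, window))
        relative = score / best
        if relative >= threshold:
            hits.append((start, window, round(relative, 3)))
    return hits


# Toy training set (invented for illustration, not taken from the paper).
toy_sites = ["TGACTCA"] * 5 + ["TGAGTCA"] * 2
toy_freqs = position_frequencies(toy_sites)
toy_hits = scan("AAATGACTCAGGG", toy_freqs, threshold=0.9)
```

In this sketch the per-column conservation values play a role loosely comparable to the profile of relative variability mentioned in the abstract: variable positions contribute less to a match score than invariant ones.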

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press