We have shown that there is a dramatic enrichment of arginines in narrow regions of the DNA minor groove that provides the basis for a novel DNA recognition mechanism used by many families of DNA-binding proteins. A readout mechanism based on groove width requires a connection between sequence and shape. This connection appears to be provided in part by A-tracts, which have a strong tendency to narrow the groove, producing binding sites for arginines that, when spaced appropriately on the protein surface, offer a complementary set of positive charges that can recognize local variations in shape. Arginines often insert into the minor groove as part of short sequence motifs (e.g. RQR in the Hox protein Scr [12], RKKR in POU homeodomains [18], RPR in Engrailed [36], RGHR in MATa1/MATα2 [17], RRGR in the nuclear orphan receptor [37] and RGGR in the human orphan receptor [38]), thus offering a variety of presentation modes that can contribute to the specificity of DNA shape recognition.
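As a purely illustrative aid, the short Python sketch below scans a protein sequence for the arginine-containing motifs listed above. It assumes the six motifs are matched as exact strings; the function name `find_motifs` is hypothetical. A match only flags a candidate: whether the arginines actually insert into the minor groove depends on the three-dimensional structure of the complex, which plain string matching cannot capture.

```python
import re

# The six arginine-containing motifs named in the text (exact-string matches only).
MOTIFS = ("RQR", "RKKR", "RPR", "RGHR", "RRGR", "RGGR")


def find_motifs(protein: str) -> list[tuple[str, int]]:
    """Return (motif, 0-based start) pairs for every motif occurrence.

    Hypothetical helper: a hit marks a candidate minor-groove-inserting
    segment, not a confirmed DNA contact.
    """
    protein = protein.upper()
    hits = []
    for motif in MOTIFS:
        # A zero-width lookahead reports overlapping occurrences as well.
        for m in re.finditer(f"(?={motif})", protein):
            hits.append((motif, m.start()))
    # Order hits by position, breaking ties by motif name.
    return sorted(hits, key=lambda h: (h[1], h[0]))


if __name__ == "__main__":
    print(find_motifs("MSRQRKKRPR"))
```

On the toy sequence `MSRQRKKRPR` this reports RQR at position 2, RKKR at 4 and RPR at 7. The overlaps are intentional: adjacent motifs can share arginines.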
The tendency of A-tracts to narrow the minor groove is due primarily to their ability to assume conformations, through propeller twisting, that lead to the formation of inter-base-pair hydrogen bonds in the major groove [15]. This network is disrupted by TpA steps, as strikingly seen in the MogR binding site [19]. GC base pairs also have a tendency to widen the minor groove [14]. The combination of these and other factors, such as effects induced by flanking bases that are not directly located within the binding site [39], can produce a complex minor groove landscape that offers numerous possibilities for specific interactions with proteins. Indeed, minor groove geometry is no doubt the result of the interplay of intrinsic and protein-induced structural effects.
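The sequence rules above (A/T runs narrow the groove, TpA steps break the pattern) can be turned into a simple scan. The Python sketch below assumes an A-tract is a run of at least `min_len` consecutive A/T base pairs that contains no TpA step, with `min_len = 4` as an assumed default. The function name is hypothetical. The scan addresses sequence alone: it does not model propeller twisting, flanking effects or protein-induced deformation.

```python
def find_a_tracts(seq: str, min_len: int = 4) -> list[tuple[int, int]]:
    """Return half-open (start, end) intervals of A-tracts in a DNA sequence.

    Assumed definition (hypothetical helper): a run of >= min_len A/T bases
    with no TpA step, i.e. no T immediately followed by an A.
    """
    seq = seq.upper()
    tracts = []
    start = None  # start index of the current A/T run, or None if outside one
    for i, base in enumerate(seq):
        if base not in "AT":
            # A G or C base pair ends any current A/T run.
            if start is not None and i - start >= min_len:
                tracts.append((start, i))
            start = None
            continue
        if start is None:
            start = i
        elif seq[i - 1] == "T" and base == "A":
            # TpA step: close the run before this A and begin a new run at it.
            if i - start >= min_len:
                tracts.append((start, i))
            start = i
    # Close a run that extends to the end of the sequence.
    if start is not None and len(seq) - start >= min_len:
        tracts.append((start, len(seq)))
    return tracts


if __name__ == "__main__":
    print(find_a_tracts("GCAAAATTTTGC"))  # one uninterrupted AnTn tract
    print(find_a_tracts("GCTTTTAAAAGC"))  # the TpA step splits the run in two
```

The usage lines contrast AAAATTTT, which is reported as a single tract, with TTTTAAAA, which is split at its central TpA step. This mirrors the point that TpA steps disrupt the groove-narrowing conformation.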
The physical mechanisms described here are dramatically evident in the nucleosome. The energetic cost of narrowing and bending the DNA in regions where the backbone faces inward will be reduced by the presence of short A-tracts, which have an intrinsic propensity to assume such conformations and hence to bend the DNA [28]. In addition, the penetration of arginines into the minor groove at sites where the DNA bends and the groove is narrow [21,40] provides a significant stabilizing interaction.
The variations in DNA shape observed in protein-DNA complexes often reflect conformational preferences of free DNA [4,10,41]. Sequence-dependent conformational preferences have also been observed in computational studies [11,21,42] and, most recently, analysis of hydroxyl radical cleavage patterns has shown that DNA shape is under evolutionary selection [43]. Such observations suggest that the role of DNA shape must be taken into consideration when annotating entire genomes and predicting transcription factor binding sites. The biophysical insights described here, together with the increased availability of high-throughput binding data, offer the hope of major progress in understanding how proteins recognize specific DNA sequences and in the development of improved predictive algorithms.