The input page of DBD2BS includes two submission forms corresponding to the two flows in . The first prediction mode includes an extra step for generating the DBD–DNA complex structures by performing structure alignment. In each mode, users can specify the query structure in three different ways: (i) provide a PDB ID (the protein chain ID is additionally required for the mode ‘Query with a protein structure’); (ii) specify the atomic coordinates of the query structure in PDB format in the text field or (iii) upload a structure file in PDB format.
The first form, ‘Query with a protein structure’, allows only one monomer per run. If the query contains more than one protein chain, only the first protein chain is used. After pressing the submit button, the user is directed to the template selection page, where candidate templates are sorted by the TM-scores between the template and the query structure. This page includes the cluster information, template IDs (PDB IDs), structure alignment scores, sequence alignment scores (e-values between the template sequence and the query sequence, obtained by running BLAST with default parameters), template proteins (identified by UniProt (23) entry name) and template descriptions. Although templates with higher TM-scores are generally preferred, those with intermediate TM-scores but significant e-values can also be considered. Users are warned if the protein chain in the template is a long α-helix, because the superimposed complexes may be unreliable in such cases. To prevent serious collisions between the query protein and the DNA structure in the superimposed complex, DBD2BS automatically excludes synthesized complexes containing more than five conflicting residues, where a conflicting residue is defined as a residue with at least one heavy atom within 1.5 Å of any heavy atom of the DNA. Users can decide whether the predictions are based on a set of similar templates or on a more diverse set. In each run, users must select one to four templates for PWM prediction. Users can return to the page at any time to select other templates.
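The collision filter described above can be sketched as follows. This is a minimal illustration and not the DBD2BS implementation: the function names and the data layout (each residue as a list of heavy-atom coordinates) are assumptions, as is treating the cutoff as inclusive. Only the 1.5 Å cutoff and the five-residue limit come from the text.

```python
import math

CUTOFF = 1.5        # Å; heavy-atom distance that defines a conflict (from the text)
MAX_CONFLICTS = 5   # complexes with more conflicting residues are excluded (from the text)

def is_conflicting(residue_atoms, dna_atoms, cutoff=CUTOFF):
    """True if any heavy atom of the residue lies within `cutoff` of any DNA heavy atom.

    Both arguments are lists of (x, y, z) tuples (hypothetical data layout).
    """
    return any(math.dist(a, d) <= cutoff for a in residue_atoms for d in dna_atoms)

def count_conflicting_residues(protein, dna_atoms, cutoff=CUTOFF):
    """Number of residues in `protein` (a list of residues) that conflict with the DNA."""
    return sum(is_conflicting(res, dna_atoms, cutoff) for res in protein)

def is_excluded(protein, dna_atoms, cutoff=CUTOFF, max_conflicts=MAX_CONFLICTS):
    """True if the superimposed complex has more than `max_conflicts` conflicting residues."""
    return count_conflicting_residues(protein, dna_atoms, cutoff) > max_conflicts
```

A production version would typically use a spatial index rather than the all-pairs distance check shown here.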
The left side of the result page (a) lists the templates selected on the previous page. The predicted PWMs are shown as sequence logos. Clicking the ‘3D’ button of a sequence logo loads the corresponding template into the Jmol (available at http://www.jmol.org/) panel on the right side of the result page. In this panel, DBDs are displayed as ‘sticks’. Users can click the ‘Both’ radio button to see how the query protein and the template are superimposed. The DNA base pairs are colored according to their conservation level. The conservation score is derived by calculating the position entropy in the predicted PWM. The 5′ end of the sequence logo (position ‘1’ in the sequence logo) is highlighted in the Jmol panel by showing the corresponding base in green, so that users can quickly link the sequence logo with the DNA in the Jmol panel.
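As an illustration of an entropy-based conservation score, the sketch below computes the Shannon entropy of each PWM column and subtracts it from the maximum entropy for four bases (2 bits). The exact mapping from entropy to conservation level used by DBD2BS is not stated in the text, so this mapping and the function names are assumptions.

```python
import math

def column_entropy(column):
    """Shannon entropy (in bits) of one PWM column given as base frequencies."""
    return -sum(f * math.log2(f) for f in column if f > 0)

def conservation(column):
    """Assumed conservation score: maximum entropy (2 bits for 4 bases) minus column entropy."""
    return math.log2(len(column)) - column_entropy(column)

# Example PWM with columns ordered as (A, C, G, T) frequencies.
pwm = [
    [1.0, 0.0, 0.0, 0.0],      # fully conserved position
    [0.25, 0.25, 0.25, 0.25],  # uninformative position
    [0.5, 0.5, 0.0, 0.0],      # two equally likely bases
]
scores = [conservation(col) for col in pwm]
```

Under this assumed mapping, a fully conserved position scores 2 bits and a uniform position scores 0.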
Screenshots of DBD2BS results. (a) Result page and (b) comparison page.
DBD2BS provides two advanced functions to help users judge how reliable the predicted PWM might be. First, atom collisions, shown as red sticks in screenshot (a), are highlighted by clicking the ‘On’ radio button above the 3D viewer. Users should be aware of potential false predictions at base pairs close to any atom collision. Second, for any PWM of interest, users can click the ‘CMP’ button of the sequence logo to see whether the selected PWM (or some of its positions) is supported by the PWMs predicted from other templates. Screenshot (b) shows that the sequence logo of the selected template (3N4M:A) is highlighted on the comparison page as the reference PWM. The reference complex is first aligned to each of the other complexes by superimposing the query protein inside them. After superimposition, the DNA structures from the two synthetic complexes are structurally aligned by dynamic programming. Base pairs from different complexes are aligned if their distance is within 2 Å. This may result in a discontinuous alignment of the sequence logos. Screenshot (b) shows that the unaligned positions are trimmed to produce new sequence logos. Comparing the sequence logos from different templates reveals which positions have higher confidence, namely those where consistent predictions are observed.
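The distance-constrained alignment of base pairs can be illustrated with a simple dynamic program. The sketch below assumes each base pair is represented by a single (x, y, z) position after superimposition and maximizes the number of order-preserving matches within 2 Å. The representation, the scoring scheme and the function name are assumptions, since the text specifies only the use of dynamic programming and the 2 Å threshold.

```python
import math

def align_base_pairs(ref, other, cutoff=2.0):
    """Order-preserving alignment maximizing the number of matched base pairs.

    ref, other: lists of (x, y, z) positions, one per base pair, in sequence order.
    Returns a list of (i, j) index pairs whose positions lie within `cutoff` Å.
    """
    n, m = len(ref), len(other)
    # best[i][j] = maximum number of matches using ref[:i] and other[:j]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            if math.dist(ref[i - 1], other[j - 1]) <= cutoff:
                best[i][j] = max(best[i][j], best[i - 1][j - 1] + 1)
    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if (math.dist(ref[i - 1], other[j - 1]) <= cutoff
                and best[i][j] == best[i - 1][j - 1] + 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i - 1][j] >= best[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

Positions that do not appear in the returned pairs correspond to the unaligned logo positions that would be trimmed before comparison.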
On the comparison page, DBD2BS also provides quantitative scores between each pair of aligned sequence logos. The similarity score at a particular position $j$ of two aligned PWMs, $p$ and $q$, is denoted $\mathrm{sim}(p_j, q_j)$ and is defined from $p_j(\alpha)$ and $q_j(\alpha)$, the frequencies of the base type $\alpha$ at position $j$ in PWMs $p$ and $q$, respectively. With $\mathrm{sim}(p_j, q_j)$, the similarity score between $p$ and $q$ is defined over the aligned positions, taking into account the alignment length of the two PWMs. Furthermore, the complete-similarity developed in the study of Tanaka et al. can be calculated using $m_j$, the median score of $\mathrm{sim}(p_j, r_j)$, with $r_j(\alpha)$ the frequency of the base type $\alpha$ at position $j$ in a randomly generated PWM $r$. When DBD2BS uses the complete-similarity score to measure the similarity between two predicted PWMs, 10 000 randomly generated PWMs are produced to obtain the median score $m_j$.
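The median-corrected scoring described above can be sketched as follows, under several loudly stated assumptions. The column similarity used here is a hypothetical stand-in, not the definition from the paper. The aggregation (summing the per-position score minus the median $m_j$) is an assumed form of the complete-similarity, and random columns are assumed to be drawn uniformly from the simplex. Only the use of a median over 10 000 random PWMs comes from the text.

```python
import random
import statistics

def column_similarity(pj, qj):
    """Hypothetical stand-in for sim(p_j, q_j); NOT the definition used by DBD2BS."""
    return 1.0 - 0.5 * sum((a - b) ** 2 for a, b in zip(pj, qj))

def random_column(rng):
    """One random base-frequency column (assumption: uniform over the simplex)."""
    draws = [rng.expovariate(1.0) for _ in range(4)]
    total = sum(draws)
    return [d / total for d in draws]

def median_background(pj, rng, n_random=10_000):
    """m_j: median of sim(p_j, r_j) over randomly generated columns r_j."""
    return statistics.median(
        column_similarity(pj, random_column(rng)) for _ in range(n_random)
    )

def complete_similarity(p, q, rng, n_random=10_000):
    """Assumed form: sum over aligned positions of sim(p_j, q_j) - m_j."""
    return sum(
        column_similarity(pj, qj) - median_background(pj, rng, n_random)
        for pj, qj in zip(p, q)
    )
```

Subtracting the median means that a position scoring no better than a random column contributes roughly zero, so longer alignments are not rewarded merely for being long.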
Clicking the ‘open’ button of a sequence logo on the main result page (a) reveals additional details of the selected template and its prediction results. Five files are available for download in this panel: (i) the superimposed complex structure containing the query structure, (ii) the native complex structure of the template, (iii) the alignment, (iv) the contact residues and (v) the PWM.
The second form, ‘Query with a protein–DNA complex’, on the input page accepts a protein–DNA complex as the query, which enables users to generate protein–DNA complexes using other techniques such as protein–DNA docking (20) or homology modeling (25). This mode skips the template selection step and directly guides users to the result page. The given complex is regarded as the only entry in the template list.