MYBS provides a web-based interface with three main features: binding site mining, regulatory association searching, and target gene selection for each TF pair. MYBS allows users to search for occurrences of a motif in the promoters of a gene, or for potential binding sites of a TF. For binding motifs and TFs, the target genes are also reported. MYBS also enables users to visualize in parallel the potential regulators of a given set of genes and to obtain target/non-target gene sets for a pair of TFs in different combinations.
For each function, MYBS allows users to search for possible binding site occurrences computationally, either without any filters or with up to two filters (phylogenetic information and ChIP-chip data) that improve the accuracy of the binding site search. The user may require that a TFBS be conserved across a user-defined number of species (ranging from zero to seven) within a neighboring region of 25 bp upstream and downstream of the binding site occurrence in S. cerevisiae. In addition, the user can adjust the required degree of experimental support for TF-DNA binding by setting the P-value cutoff for the ChIP-chip experiment.
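The two filters can be pictured as simple thresholds applied to each candidate site. The following is a minimal sketch, not the MYBS implementation: the `SiteHit` record, its field names and the `filter_hits` function are hypothetical, and it assumes that each candidate site already carries a conservation count and a ChIP-chip P-value.

```python
from dataclasses import dataclass


@dataclass
class SiteHit:
    """Hypothetical record for one candidate binding site (illustration only)."""
    gene: str
    position: int           # offset of the site within the promoter
    conserved_species: int  # number of species (0 to 7) in which the site is conserved
    chip_p_value: float     # ChIP-chip P-value for the TF at this promoter


def filter_hits(hits, min_species=0, max_chip_p=1.0):
    """Keep hits that satisfy both user-defined thresholds.

    The defaults (min_species=0, max_chip_p=1.0) correspond to applying no filter.
    """
    return [
        h for h in hits
        if h.conserved_species >= min_species and h.chip_p_value <= max_chip_p
    ]
```

With `min_species=3` and `max_chip_p=0.01`, for instance, only sites conserved in at least three species whose promoters have a ChIP-chip P-value of at most 0.01 would be retained.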
The underlying core of MYBS is the integration of motif information. Each motif is linked to one or more TFs and points to a set of genes whose promoter sequences contain occurrences of the motif. Similarly, multiple consensus sequences accrued from various sources may be listed for a given TF (Figure 1). The bi-directional search can start from a TF, a motif or a gene, and allows for easy identification of regulatory associations between TFs and between motifs. For example, the user can query a short sequence pattern (IUB code allowed) to acquire a list of matching binding motif consensus sequences. One can choose a motif from the list for detailed information, including its corresponding TFs, the sequence logo, the PWMs and the cutoff thresholds of the PWMs. In addition, MYBS allows users to scan any given sequences for binding occurrences of the selected motif. The user can further select the TF of interest. With either or both of the two user-defined filters, MYBS provides a list of potential target genes for the selected motif and allows the user to inspect visualized sequence information for one or multiple genes simultaneously. The user can include or exclude certain databases in the process, and can also discover other potential regulators of the selected genes. All related information can be downloaded as plain text files or image files.
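To illustrate how a query pattern written in IUB code can be interpreted, a degenerate pattern may be expanded into a regular expression and matched against a sequence. This is only a sketch under stated assumptions: the function names are hypothetical, only the forward strand is scanned, and MYBS itself identifies binding occurrences with PWMs and cutoff thresholds rather than with plain pattern matching.

```python
import re

# Standard IUB/IUPAC nucleotide codes mapped to regular-expression character classes.
IUB_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iub_to_regex(pattern):
    """Expand an IUB-coded pattern into a regular-expression string."""
    return "".join(IUB_CODES[ch] for ch in pattern.upper())


def find_occurrences(pattern, sequence):
    """Return (start, matched_text) for every match, including overlapping ones.

    A lookahead is used so that overlapping occurrences are all reported.
    Only the forward strand is scanned (a simplifying assumption).
    """
    regex = re.compile("(?=(" + iub_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]
```

For example, the pattern `ACRY` expands to `AC[AG][CT]` and therefore matches both `ACGT` and `ACAT`.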
Figure 1. Users can study the relationships between motifs, TFs and genes in the following ways: For a query by motif consensus, MYBS will report the TFs whose binding consensus match well to it (upper left); the user may also obtain possible motif consensuses (more ...)
To give the user an idea of the significance of a TF predicted to be enriched in a given group of target genes, we calculate an enrichment P-value for each TF in ‘Search regulatory association’. This is the probability of finding x or more promoters in a user-input gene set that can be bound by the specified TF while also fulfilling the ChIP-chip and conservation requirements set by the user:
$$P = \sum_{i=x}^{\min(K,N)} \frac{\binom{K}{i}\binom{M-K}{N-i}}{\binom{M}{N}}$$
where M is the overall number of genes examined, K is the number of those M genes that are bound by the TF, N is the size of the user-input gene set, and x is the number of promoters within the user-input set that are bound by the TF.
Because the calculation is performed for every TF and can be computationally intensive, it is optional. If the user clicks the ‘Calculate enrichment P-value’ button, an enrichment P-value is shown for each TF in either the text or graphical output.
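The enrichment P-value defined by M, K, N and x above is the upper tail of a hypergeometric distribution. A minimal sketch of that calculation, using only the Python standard library, is given here; the function name is hypothetical and the code is an illustration of the formula rather than the code running on the MYBS server.

```python
from math import comb


def enrichment_p_value(M, K, N, x):
    """Probability of observing x or more TF-bound promoters in the input set.

    M: overall number of genes examined
    K: number of those genes bound by the TF
    N: size of the user-input gene set
    x: number of promoters in the input set bound by the TF
    """
    if not (0 <= K <= M and 0 <= N <= M):
        raise ValueError("K and N must lie between 0 and M")
    lower = max(x, 0)
    upper = min(K, N)
    total = comb(M, N)  # number of ways to draw the input set
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(M - K, N - i) for i in range(lower, upper + 1))
    return tail / total
```

Exact integer arithmetic is used until the final division, which avoids the rounding problems that arise when very small probabilities are summed as floating-point numbers.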
We also provide P-values for the ‘Find target genes for TF pairs’ function. For any given pair of TFs, we construct a 3 × 3 contingency table and perform the chi-square goodness of fit test:
$$\chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{3} \frac{(O_{ij}-E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ and $E_{ij}$ are the observed and expected numbers of genes in cell (i, j) of the table. The $\chi^2$ statistic follows a chi-square distribution with (3 − 1) × (3 − 1) = 4 degrees of freedom. The P-value gives the user an idea of how strongly the data support a non-random association between the two TFs. Note that we assign a default P-value of 1 if the expected number of genes $E_{11}$ simultaneously bound by TF1 and TF2 exceeds the observed number of genes $O_{11}$.
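The TF-pair test can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: expected counts are derived from the row and column totals of the table, cell [0][0] is taken to hold the genes bound by both TFs, and the function name is hypothetical. For four degrees of freedom the chi-square upper-tail probability has the closed form exp(−χ²/2)(1 + χ²/2), so no external library is needed.

```python
from math import exp


def tf_pair_p_value(observed):
    """Chi-square P-value for a 3 x 3 table of observed gene counts.

    observed[0][0] is assumed to be the number of genes bound by both TFs.
    Expected counts are computed from the marginal totals (an assumption).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    if total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one gene")

    expected = [[r * c / total for c in col_totals] for r in row_totals]

    # Default P-value of 1 when co-bound genes are fewer than expected.
    if expected[0][0] > observed[0][0]:
        return 1.0

    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(3)
        for j in range(3)
    )
    # Upper tail of the chi-square distribution with 4 degrees of freedom.
    return exp(-chi2 / 2) * (1 + chi2 / 2)
```

Returning 1 when the expected co-bound count exceeds the observed count makes the test effectively one-sided: only an excess of shared targets is reported as a significant association.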
Because MYBS allows users to dynamically select different criteria for the desired TFBSs, the reliability of its predictions is not easy to assess. To address this issue, we analyzed the ChIP-chip P-values and the degree of conservation for 101 experimentally verified TFBSs of 12 TFs (21). Overall, 12 sites failed to be recognized by the PWMs of the corresponding TF in the MYBS database. Figure 2 shows the range of ChIP-chip P-values of these target genes and their degree of conservation across species. As shown, ~65% of the promoters in which the experimentally verified TFBSs reside have ChIP-chip P-values < 0.01, and more than 70% of the experimentally verified TFBSs are conserved in at least three species.
Figure 2. Distributions of ChIP-chip P-values and the degree of conservation for 101 experimentally verified TFBSs in S. cerevisiae. (A) Comparison of ChIP-chip P-values of 101 TFBSs that were identified and were not identified by MYBS. The ChIP-chip P-values are (more ...)
A case study
MYBS enables users to visualize in parallel the potential regulators for a given set of genes, giving scientists an efficient way to survey potential underlying transcription mechanisms. Here we present an example of how this feature can be used to discover potential regulatory associations. Burckin et al. used splicing-sensitive microarrays to investigate the impact of perturbations on the steady-state levels of mRNAs and pre-mRNAs. Among these perturbations was one that used a conditional-lethal ded1 allele to inactivate Ded1p, a translation initiation factor (23) that is also known to be functionally involved in splicing (24). According to their results, a subset of intron-containing genes is sensitive to the loss of Ded1p. It is interesting to ask why Ded1p preferentially affects these intron-containing genes and whether these genes have anything in common in their promoter regions, since transcription and splicing are known to be coupled (25). To do this, we used the function ‘Search regulatory association’ to identify which TFs potentially regulate these genes. As shown in Figure 3, a contact map of the genes against all TFs is presented as an image and sorted according to the number of regulatory interactions. We found that 69 of the 111 Ded1p-sensitive intron-containing genes contain both FHL1 (Fork Head-Like) and RAP1 (Repressor Activator Protein) binding sites in their promoter regions (indicated by a red block). RAP1 encodes an essential protein involved in many processes in S. cerevisiae, including telomere maintenance, transcriptional silencing and high-level transcriptional activation of genes encoding ribosomal proteins (RP) (27). FHL1 is a putative transcriptional regulator with similarity to the DNA-binding domain of Drosophila forkhead and is required for rRNA processing (28). Martin et al. showed that FHL1 is also involved in the regulation of RP gene transcription in yeast. In contrast, only five of the 143 Ded1p-insensitive intron-containing genes harbor FHL1 or RAP1 binding sites in their promoter regions. These observations raise the possibility that Ded1p's influence on splicing can be exerted, either directly or indirectly, via promoter regions that contain both FHL1 and RAP1 binding sites.
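The kind of contact map used in this case study can be sketched as a binary gene-by-TF matrix. The code below is an illustration only: the function names and the input format (a mapping from each gene to the set of TFs with predicted sites in its promoter) are hypothetical, and sorting the TF columns by the number of genes they contact is one possible reading of "sorted according to the number of regulatory interactions".

```python
def contact_map(gene_to_tfs):
    """Build a binary gene x TF matrix with TF columns sorted by contact count.

    gene_to_tfs: dict mapping a gene name to the set of TFs predicted to bind
    its promoter (hypothetical input format).
    """
    genes = sorted(gene_to_tfs)
    counts = {}
    for tfs in gene_to_tfs.values():
        for tf in tfs:
            counts[tf] = counts.get(tf, 0) + 1
    # Most frequently contacting TFs first; ties broken alphabetically.
    tfs_sorted = sorted(counts, key=lambda tf: (-counts[tf], tf))
    matrix = [
        [1 if tf in gene_to_tfs[gene] else 0 for tf in tfs_sorted]
        for gene in genes
    ]
    return genes, tfs_sorted, matrix


def genes_with_all(gene_to_tfs, required_tfs):
    """Return the genes whose promoters contain sites for every required TF."""
    required = set(required_tfs)
    return sorted(g for g, tfs in gene_to_tfs.items() if required <= set(tfs))
```

Calling `genes_with_all(gene_to_tfs, ["FHL1", "RAP1"])` on such a mapping would list the genes that carry both binding sites, which is the kind of count compared between the sensitive and insensitive gene sets above.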
Figure 3. An example of a regulatory association contact map. A regulatory association search using MYBS was performed on 111 intron-containing genes sensitive to Ded1p loss. The search provides a contact map of the genes against all TFs, and the map is sorted (more ...)