We have developed a framework to formally couple Bayesian network learning and experimental feedback to model a specific biological response in yeast. We were able to use this integrative approach to achieve two goals. First, we discovered an additional layer of regulation acting upon YHB1 transcription, a key mediator of nitric oxide defense. Secondly, our approach dissected out specific versus nonspecific responses to NO· and reactive nitrogen intermediate exposure. The core structure of the Fzf1p-dependent NO·-specific response sub-network (nitric oxide, FZF1 genotype Fzf1p activity, and Fzf1p response clusters) was predicted and maintained throughout the three models. The transcriptional responses to other environmental factors were gradually elucidated by additional iterations of the process.
Previous studies have suggested that YHB1
is important for the survival of yeast under oxidative and nitrosative stress 
. Our results show YHB1
is transcriptionally regulated by both NO· exposure mediated through Fzf1p and glucose repression mediated by Tup1p. Taken together, these data indicate that YHB1
is regulated by many environmental signals, highlighting the combinatorial control of this gene. While glucose derepression caused a 2 to 3-fold increase in Yhb1p protein levels, studies have shown a 10-fold increase by NO· treatment, suggesting a more prominent role of Yhb1p in NO· detoxification 
Within the context of our Bayesian model, we were able to utilize available biological knowledge to systematically explore the response to nitric oxide by the refinement of the random variables used. Clearly, incorporation of prior biological knowledge has the effect that our results will be biased towards our current understanding of the problem. While this fact represents a caveat, all models make assumptions; and the biological knowledge in this case was extremely useful to uncover the underlying relationships. Indeed, prior knowledge in this case can be considered to be a critical property of the process, since proper definition of the random variables used to model the dataset was essential to arrive at a biologically meaningful conclusion.
A common practice in statistical learning is to select one single model that best fits the data. But in many situations, other models also score very well although not necessarily the best. Using a single highest scoring model to derive a biological conclusion is potentially risky. To circumvent this problem, we used the average of all the high scoring networks found by the searching procedure 
. An added benefit of this approach is that it yields a confidence score associated with each edge connection 
. The confidence score is especially useful for filtering out low-confidence connections from a complex network, thus simplifying what might otherwise be a confusing network.
Since Bayesian network edges represent statistical instead of causal relationships, it is possible that a derived edge does not represent a direct biological connection. For example, two gene clusters sharing high mutual information would likely be connected. One method to eliminate such connections is to merge those highly correlated clusters into a single node. Additionally, structural constraints may be used to define gene expression nodes as leaf nodes and the environmental variable nodes as root nodes.
Gene clusters were defined through an automatic hierarchical clustering algorithm with manual interventions. Although it is not purely automatic, this step was where we incorporated prior biological knowledge to interpret the gene expression dataset. Therefore it is critical for ensuring the defined gene cluster nodes that truly represent the underlying gene expression profiles.
The computational framework and experimental approach presented here essentially represents a supervised data exploration system. The overall methodology is a straightforward hypothesis-generation, testing, and refinement cycle. However, complex datasets with large numbers of measurements become increasingly difficult to represent and score with regard to a given hypothesis. The creation and use of Bayesian networks, incorporating prior knowledge, allows for systematic scoring of a given hypothesis, and furthermore, provides an opportunity for automatic learning, which in turn can facilitate the discovery of new relationships within the data.