The increased use of high-throughput analysis methods, such as microarrays, in mainstream biological research has led to a shift from studying small groups of reasonably well-characterized variables to exploring a complicated mire of thousands of inter-related variables simultaneously [1]. These methods are powerful, but the sheer volume of data they produce makes their outputs difficult to interpret. Without computational assistance, interpretation can be prohibitively time-consuming.
The ultimate goal of any microarray experiment is to gain insight into the workings of cellular organisms by understanding the interactions of genes and proteins. To accomplish this, raw data must not only be converted into information; that information must also be interpreted in context so that it can be transformed into timely biological discovery and knowledge [2]. Currently, the lack of a community-wide consensus on how best to integrate experimental data with information resources limits this knowledge acquisition [2]. The recent work of Saraiya et al. (2005) [1] highlighted a "critical need" for tools able to "connect numerical patterns to the underlying biological phenomena", because current techniques fail to link microarray data adequately to biological meaning, which limits researchers' biological insights [1].
One intuitive way to integrate biological knowledge and microarray data is through protein-interaction networks, in which nodes represent proteins and edges represent relationships between proteins [3]. However, by focusing solely on physical protein interactions, such network constructs neglect a wealth of knowledge, currently distributed among hundreds of existing biological databases (over 1000 listed in this year's Nucleic Acids Research database issue alone [4]), that is directly applicable to proteins investigated via microarray experiments. Current protein network constructs typically draw on only a small subset of this biological knowledge, producing incomplete and sparsely populated resources. This is a particular problem for higher eukaryotic organisms such as mice and humans, for which physical protein interaction data are limited.
In agreement with Lee et al. (2004) [5] and Leach et al. (2007) [3], we demonstrate that, by expanding the definition of 'interaction' to include functional information, a) there is enough publicly available biological information to produce biologically useful, well-populated interaction networks for higher eukaryotic species, b) the combination of expression data and functional information can provide contextual insight into the network, and c) it is possible to link effectively to existing biological knowledge using current technology. Using a murine craniofacial developmental expression microarray dataset [6] and a recently published technique for weighting and integrating functional interaction information [3], we illustrate how the application of context-sensitive methodology leverages the full force of currently available biological knowledge, enabling the translation of complex high-throughput datasets into scientific insight and discovery.
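
To make the idea of a weighted functional interaction network concrete, the following minimal sketch shows one way that evidence from several sources could be merged into a single edge weight and then restricted to the proteins highlighted by an expression experiment. It is illustrative only: the protein names, evidence sources, reliability values, and function names are all hypothetical, and the noisy-OR combination is a simple stand-in chosen for this sketch, not the weighting scheme of [3].

```python
from collections import defaultdict

# Hypothetical evidence records: (protein_a, protein_b, source, reliability in [0, 1]).
# All names and values are placeholders, not data from the study.
EVIDENCE = [
    ("protA", "protB", "physical_interaction", 0.9),
    ("protA", "protB", "shared_pathway", 0.5),
    ("protA", "protC", "literature_cooccurrence", 0.4),
    ("protC", "protD", "shared_annotation", 0.3),
]

# Hypothetical set of proteins whose genes were flagged by the microarray analysis.
EXPRESSED = {"protA", "protB", "protC"}


def build_network(evidence):
    """Merge all evidence for each protein pair into one edge weight.

    Assumption: a noisy-OR combination, weight = 1 - prod(1 - reliability),
    so independent supporting sources increase confidence in an edge.
    """
    remaining_doubt = defaultdict(lambda: 1.0)
    for protein_a, protein_b, _source, reliability in evidence:
        edge = frozenset((protein_a, protein_b))  # undirected edge
        remaining_doubt[edge] *= 1.0 - reliability
    return {edge: 1.0 - doubt for edge, doubt in remaining_doubt.items()}


def expression_subnetwork(network, expressed):
    """Keep only edges whose two endpoints are both in the expressed set."""
    return {edge: weight for edge, weight in network.items() if edge <= expressed}


if __name__ == "__main__":
    network = build_network(EVIDENCE)
    for edge, weight in sorted(
        expression_subnetwork(network, EXPRESSED).items(),
        key=lambda item: -item[1],
    ):
        print(sorted(edge), round(weight, 3))
```

In this sketch an edge supported by two sources receives a higher weight than either source would give alone, which is the intuition behind treating functional information as additional evidence for an interaction rather than relying on physical interaction data alone.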