Biologists have long been fascinated by the green plant leaf and have tried to understand how leaves are born, live and die. In recent decades, several new approaches to studying the structure and function of leaves have emerged: molecular biology and molecular genetics have, for example, enabled identification of genes that regulate the primary function of the leaf - photosynthesis - and leaf development has been understood in much greater detail; high-throughput transcriptomics has identified additional factors influencing leaf function, but traditional transcriptome analyses typically reduce the problem of finding key regulators to detecting differentially expressed genes or computing pair-wise similarity between targets and putative regulators (e.g. hierarchical clustering or co-expression networks). In contrast, systems biology analysis of transcriptional programs treats genes as interacting rather than isolated entities. Thus these methods can begin to explain how so-called emergent properties such as complex phenotypes arise from interacting genes. Whether this can be seen as taking a holistic rather than a reductionistic approach to science has generated considerable debate [1
], but systems biology methods account for synergistic and competitive effects between regulators that individually could have low similarity to the target. Methods for reverse-engineering the transcriptional network from collections of gene expression data have been pioneered on single-cell organisms, but have increasingly been applied to higher-order organisms [3
] including plants [4
] where applications of systems biology methods are now emerging. Most systems biology studies have - not surprisingly - used "the model plant" Arabidopsis thaliana
, where large transcriptomics programs have generated adequate quantities of high-quality data to enable systems analysis [6
]. For example, Carrera et al.
] modeled the transcriptional network of Arabidopsis
and identified plant-specific properties such as high connectivity between genes involved in response and adaptation to changing environments. However, not all aspects of plant biology can be studied in Arabidopsis
, which in many respects is a rather atypical plant. Indeed, it was selected as a model system not for its physiological and ecological qualities, but for its suitability for genetic and genomic studies. Therefore, it is important to perform parallel studies in plants with other characteristics, and to develop methods that allow data from the Arabidopsis
system to inform studies in other organisms.
One rapidly emerging plant model system is Populus
]; its interesting biology (a woody perennial) and access to a sequenced genome [8
] represent an attractive combination. Correspondingly, more advanced data analysis approaches are now being applied in Populus
, which provides an attractive model system for studies of leaf biology. For example, Sjödin et al.
] exploited the fact that mature aspen (Populus tremula
) in boreal regions have the unusual property that all leaves emerge simultaneously from overwintering buds. This provides a synchronized system, resulting in a full temporal separation of the leaf developmental stages and subsequent acclimation that could be exploited using transcriptomics. Access to a centralized repository of much of the Populus
cDNA microarray data [10
] and databases for the analysis of gene expression - and other - data [11
] substantially facilitates systems biology studies. For example, Grönlund et al.
] induced a co-expression network revealing modular architecture explaining gene function and tissue-specific expression; Street et al.
] identified co-expression networks across a large collection of leaf transcriptomics data and found that some network hubs have existing functional evidence in Arabidopsis
; Quesada et al.
] performed a comparative analysis of the transcriptomes of Populus
, and found evidence of extensive remodeling of the transcriptional network, although some essential functions showed little divergence. A few studies have also integrated promoter information to study regulatory control in Populus
. Shi et al.
] identified combinations of xylem-specific motifs in Populus
promoters. Another study inferred transcriptional networks in xylem, leaves, and roots, and showed that genes with conserved regulation across tissues are primarily cis
-regulated, while genes with tissue-specific regulation are often trans
-regulated ]. All these studies essentially construct co-expression networks that visualize expression similarity between pairs of genes, but do not infer complex interactions.
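To make concrete what a pair-wise co-expression network captures, the following is a minimal sketch, not the procedure used in any of the studies cited above. It assumes Pearson correlation as the similarity measure and an arbitrary cut-off of 0.8; the function name, gene names and expression values are hypothetical.

```python
import numpy as np

def coexpression_edges(expr, gene_names, threshold=0.8):
    """Return gene pairs whose absolute Pearson correlation is >= threshold.

    expr: 2-D array with one row per gene and one column per sample.
    The threshold is an illustrative assumption, not a recommended value.
    """
    corr = np.corrcoef(expr)  # gene-by-gene correlation matrix
    edges = []
    for i in range(len(gene_names)):
        for j in range(i + 1, len(gene_names)):
            if abs(corr[i, j]) >= threshold:
                edges.append((gene_names[i], gene_names[j], float(corr[i, j])))
    return edges

# Toy expression matrix (hypothetical values, five samples)
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],   # geneA
    [2.1, 3.9, 6.2, 8.0, 9.9],   # geneB, closely tracks geneA
    [5.0, 1.0, 4.0, 2.0, 3.0],   # geneC, only weakly related to the others
])
genes = ["geneA", "geneB", "geneC"]
edges = coexpression_edges(expr, genes)
print(edges)
```

Each edge here depends on only two expression profiles at a time, so a network built this way cannot represent a target whose expression depends on a combination of regulators.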
Network inference methods using expression data can be divided into those that aim to model the general influence that genes have on the expression of other genes (gene networks) [17
] and methods that aim to model the physical interaction between transcription factors and the regulated genes (gene regulatory networks) [19
]. Both approaches employ common network inference methods (see e.g. [20
]), but those that infer gene regulatory networks also typically integrate motif finding and detection of transcriptional modules [23
]. Approaches that describe how the regulatory genome orchestrates dynamic gene expression have developed from Pilpel et al.
], who showed that yeast genes sharing pairs of binding sites in their promoters were significantly more likely to be co-expressed than genes sharing only single binding sites, to various machine learning methods that identify modules of co-expressed genes with common motif patterns in their promoters (so-called cis
Here we apply a network inference method combining promoter information and expression data to describe the transcriptional network in Populus leaves. Our aims were (1) to detect regulatory hubs in leaves, (2) to describe conservation of transcriptional regulation within Populus and between Populus and Arabidopsis, and (3) to understand the regulatory complexity in leaves by comparing systems biology and traditional bioinformatics as methods for detecting target genes for further analysis. This study goes beyond previous meta-analyses of Populus transcriptome data by taking into account synergistic and competitive interactions between regulators, and by systematically integrating the regulatory genome and the transcriptome to infer networks. We show that our network is robust, explains available gene function information and generalizes to new expression data in both Populus and Arabidopsis. We identify the main regulators of primary processes in leaves, and show how some of these have regulatory partners orchestrating expression either in a synergistic or competitive manner. Such interactions are not considered by pair-wise similarity methods, and thus several of the regulators predicted here would not have been identified by traditional approaches.
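The limitation of pair-wise similarity can be illustrated with a toy example. The sketch below is not the network inference method applied in this study; it assumes two hypothetical regulators measured in balanced high and low states, and a hypothetical target whose expression follows their product, a deliberately simple stand-in for a synergistic interaction.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    return float(np.corrcoef(x, y)[0, 1])

# Two hypothetical regulators in balanced high (+1) and low (-1) states
reg_a = np.array([1.0, 1.0, -1.0, -1.0] * 5)
reg_b = np.array([1.0, -1.0, 1.0, -1.0] * 5)

# Hypothetical target: high only when both regulators are in the same state
target = reg_a * reg_b

r_a = pearson(reg_a, target)              # regulator A alone
r_b = pearson(reg_b, target)              # regulator B alone
r_joint = pearson(reg_a * reg_b, target)  # combined (interaction) term

print(r_a, r_b, r_joint)
```

In this constructed case neither regulator correlates with the target on its own, while the combined term matches it exactly, so a method that only ranks regulators by pair-wise similarity would miss both.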