With advances in whole-genome high-throughput technologies such as ChIP-Chip, expression, and genotyping arrays, it is now possible to integrate data from these sources to decipher the complex regulatory networks that govern transcription. In addition to serving as powerful models for how basic cellular function is achieved, these regulatory networks can also shed light on how certain disease phenotypes are manifested. At the heart of these networks are a few regulator genes such as transcription factors (TFs), miRNAs and histones whose activity governs the behavior of many other genes. Among these regulators, transcription factors that bind the promoter regions of genes are by far the best understood. The process of TFs activating or repressing transcription at initiation is believed to be the primary mechanism of gene regulation. A central question in genetics is how genetic variations perturb this underlying regulatory mechanism to give rise to differential gene expression and ultimately complex phenotypes.
The simplest analysis one can perform to address this question is expression quantitative trait loci (eQTL) mapping, which identifies genetic variations such as SNPs that are linked to or associated with gene expression. Such studies have been carried out in a variety of organisms including yeast, mouse and human. These studies have identified many linkages between SNPs and genes in close proximity, suggesting potential local regulatory mechanisms mediated by regulators such as transcription factors and miRNAs. They have also identified a few SNPs linked to the expression of many genes, suggesting a global regulatory mechanism mediated by master regulators such as transcription factors and histones. Unfortunately, beyond nominating candidate genes either as targets or regulators, these studies give little insight into how SNPs perturb the underlying transcription regulatory networks that control gene expression.
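As a minimal sketch of the single-marker test that underlies eQTL mapping, the following Python code scores every gene against every marker by the squared correlation between genotype and expression. The function names `eqtl_r2` and `eqtl_scan`, the 0/1 genotype coding and the simulated data are illustrative assumptions and are not taken from the studies mentioned above.

```python
import numpy as np

def eqtl_r2(genotype, expression):
    """Squared Pearson correlation between one marker and one gene."""
    g = np.asarray(genotype, dtype=float)
    e = np.asarray(expression, dtype=float)
    g = g - g.mean()
    e = e - e.mean()
    denom = (g @ g) * (e @ e)
    if denom == 0.0:
        return 0.0  # a constant genotype or expression carries no signal
    return float((g @ e) ** 2 / denom)

def eqtl_scan(genotypes, expression):
    """Return a genes x markers matrix of squared correlations.

    genotypes:  markers x individuals (assumed coded 0/1)
    expression: genes x individuals
    """
    return np.array([[eqtl_r2(g, e) for g in genotypes] for e in expression])

# Simulated toy data (assumption): 3 markers, 2 genes, 40 individuals.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 2, size=(3, 40))
noise = rng.normal(scale=0.1, size=(2, 40))
# Gene 0 is driven by marker 0; gene 1 is pure noise.
expression = np.vstack([1.5 * genotypes[0], np.zeros(40)]) + noise

r2 = eqtl_scan(genotypes, expression)
```

In this toy scan the largest entry in the row for gene 0 falls on marker 0, which is the kind of marker-to-gene linkage an eQTL study reports. The scan says nothing about which regulator mediates the effect, which is the limitation noted above.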
To gain a better understanding of the mechanisms of transcription regulation, several systems biology based methods have been proposed, including clustering of co-regulated genes, multipoint linkage analysis, pathway enrichment analysis, prediction of regulatory modules and the prediction of causal regulatory relationships. Many of these advanced methods aim to tease out both the nodes (regulators and targets) and the topology (mapping of edges) of a transcription regulatory network from gene expression profiles alone. Although these methods have predicted some interesting relationships, at least two aspects of transcription regulation go unaddressed when we use them to study transcription factors and their targets. First, most previous methods rely on probabilistic models that provide little insight into the hidden dynamics between the activity of transcription factors and the expression of their targets. Second, the relationships inferred by these methods from expression profiles alone can be misleading because the in vivo activity of a transcription factor does not always correlate with its expression level.
To overcome these problems, we adopt a framework from network component analysis (NCA) that considers a simple bipartite network model of transcription regulation involving only transcription factors and their targets. In this model, the expression of a target gene is completely captured by two properties of the network: the concentrations and the promoter affinities of transcription factors. In general, inferring these two quantities from the expression profiles of the target genes alone is difficult. However, protein-DNA binding data from ChIP-Chip experiments can be used to construct a partial topology of the network, which makes the inference possible given certain constraints.
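To make the role of the topology concrete, the sketch below fits a bipartite model of the form log E ≈ PA · log C by alternating least squares, keeping PA at zero wherever the topology reports no binding. This is a simplified illustration under stated assumptions, not the published NCA implementation: the function name `nca_fit`, the random initialization, the fixed iteration count and the toy network are all hypothetical.

```python
import numpy as np

def nca_fit(log_e, topology, n_iter=100, seed=0):
    """Alternating least squares for log_e ~= pa @ log_c.

    log_e:    genes x samples matrix of log expression
    topology: genes x TFs boolean matrix, True where binding is allowed
    """
    rng = np.random.default_rng(seed)
    # Random start that already respects the zero pattern of the topology.
    pa = np.where(topology, rng.normal(size=topology.shape), 0.0)
    for _ in range(n_iter):
        # Step 1: hold pa fixed and solve for the TF log concentrations.
        log_c, *_ = np.linalg.lstsq(pa, log_e, rcond=None)
        # Step 2: hold log_c fixed and refit each gene on its allowed TFs only.
        for g in range(topology.shape[0]):
            idx = np.flatnonzero(topology[g])
            coef, *_ = np.linalg.lstsq(log_c[idx].T, log_e[g], rcond=None)
            pa[g, idx] = coef
    residual = log_e - pa @ log_c
    return pa, log_c, float(np.sum(residual ** 2))

# Hypothetical toy network: 4 genes, 2 TFs, 6 samples.
topology = np.array([[True, False],
                     [False, True],
                     [True, True],
                     [False, True]])
rng = np.random.default_rng(1)
true_pa = np.where(topology, rng.uniform(0.5, 2.0, size=topology.shape), 0.0)
true_log_c = rng.normal(size=(2, 6))
log_e = true_pa @ true_log_c

pa, log_c, rss = nca_fit(log_e, topology)
```

Each step solves an ordinary least-squares problem, so the residual sum of squares cannot increase from one iteration to the next. Note that PA and log C are only determined up to a rescaling of each transcription factor, because multiplying a column of PA by a constant and dividing the matching row of log C by the same constant leaves the product unchanged.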
The NCA method as described by Liao et al. has been successfully applied to several gene expression datasets to understand transcription regulation in a temporal setting and in the context of gene knockouts. In this study, we extended NCA to study transcription regulation over a population gradient by modeling three mechanisms by which genetic variations perturb the concentrations and promoter affinities of active transcription factors to induce differential expression. The figure below gives a simple example that illustrates the original NCA model and our extensions. Imagine a small experiment in which we collected the expression of four genes and the genotypes of three markers in three individuals. Given the topology of the bipartite network between transcription factors and their targets, the NCA algorithm allows us to infer the active transcription factor concentrations (C) and the respective promoter affinities (PA) from the given gene expressions (E) in a log-linear fashion (see Methods). In this example, SNP1 and SNP3 are linked to the expression of G1 and G3, while SNP2 is linked to the expression of G2 and G4. We propose three possible mechanisms by which any one SNP can perturb the regulatory network and show an instance of each using the given example.
Graphical illustration of NCA and extension of NCA to include genetic perturbations.
- SNP perturbs the concentration of an active transcription factor. SNP1 is linked to the concentration of TF1 and to the expression of G1 and G3, both targets of TF1. Biologically, SNP1 could be located near or far from TF1 and change the concentration of TF1 in vivo through transcriptional, translational or post-translational regulation, causing differential expression of the target genes.
- SNP perturbs the promoter affinities of a transcription factor globally. SNP2 is linked to the expression of G2 and G4, both targets of TF2. Here, SNP2 is not linked to the concentration of TF2 but can still mediate global differential expression by altering the promoter affinities of TF2 on its targets. Biologically, SNP2 could be located near or far from TF2 and alter TF2's affinities to many promoter regions, either through a rare non-synonymous mutation or through a change in binding affinity between transcription factors in a complex, causing global differential expression of the target genes.
- SNP perturbs the promoter affinities of transcription factors on a gene locally. SNP3 is linked to the expression levels of G1 and G3 but is cis only to G3. It perturbs the local promoter affinities of TF1 and TF2 on G3, causing differential expression of G3. Biologically, SNP3 could be located in G3's promoter region, altering the promoter affinities of a transcription factor (e.g. TF1) or a complex of transcription factors (e.g. TF1 and TF2) and causing local differential expression of the target gene between populations. This mechanism differs from the global one in that differential expression is induced for only one gene (local) rather than for many genes (global).
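The three mechanisms can be illustrated with a small forward simulation of the log-linear model, log E = PA · log C. The sketch below is a toy under stated assumptions: the numerical values, the helper names `simulate` and `allele_shift`, and the single biallelic marker are hypothetical, and the topology is simplified so that each gene is bound by one transcription factor only (G1 and G3 by TF1, G2 and G4 by TF2).

```python
import numpy as np

# Baseline promoter affinities. Rows: G1..G4, columns: TF1, TF2.
pa = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 0.0],
               [0.0, 1.0]])
# Baseline log concentrations of TF1 and TF2 in six individuals.
log_c = np.array([[1.0] * 6,
                  [2.0] * 6])
# Genotype of one marker: three individuals per allele.
snp = np.array([0, 0, 0, 1, 1, 1])

def simulate(pa_ref, pa_alt, log_c, snp):
    """Log expression when each individual uses the PA matrix of its allele."""
    cols = [(pa_alt if a else pa_ref) @ log_c[:, i] for i, a in enumerate(snp)]
    return np.stack(cols, axis=1)

def allele_shift(log_e, snp):
    """Mean log-expression difference between the two allele groups, per gene."""
    return log_e[:, snp == 1].mean(axis=1) - log_e[:, snp == 0].mean(axis=1)

# Mechanism 1: the SNP raises the concentration of TF1.
log_c_conc = log_c.copy()
log_c_conc[0] += 0.5 * snp
shift_concentration = allele_shift(simulate(pa, pa, log_c_conc, snp), snp)

# Mechanism 2: the SNP rescales the affinities of TF2 on all of its targets.
pa_global = pa.copy()
pa_global[:, 1] *= 1.5
shift_global = allele_shift(simulate(pa, pa_global, log_c, snp), snp)

# Mechanism 3: the SNP changes the affinity of TF1 on G3 only.
pa_local = pa.copy()
pa_local[2, 0] = 2.0
shift_local = allele_shift(simulate(pa, pa_local, log_c, snp), snp)
```

With these values the concentration change shifts G1 and G3, the global affinity change shifts G2 and G4, and the local affinity change shifts G3 alone. The first two mechanisms both produce differential expression across all targets of a transcription factor, so the expression pattern alone does not separate them; what distinguishes them in the model is whether the inferred concentration or the inferred affinities vary with genotype.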
Because the inclusion of genetic variation introduces additional parameters in each of our three models relative to the original NCA model, we expected them to always fit the data better. To evaluate our models fairly, we devised a likelihood ratio statistic and a permutation scheme to assess the statistical significance of the improvements. We then applied our method to an expression dataset collected over 112 segregants of the yeast Saccharomyces cerevisiae and two separate ChIP-Chip datasets generated by Harbison et al. and Lee et al. We identified several interesting global regulatory networks perturbed by SNPs located in regulatory hotspots. Some of these networks have one property perturbed (transcription factor concentration or promoter affinity), while others have both properties perturbed, suggesting a complex mechanism of global regulation. We also examined linkages between SNPs and target genes located in close proximity. We found that many of these cis-linked SNPs perturb the promoter affinities of transcription factors on a target gene locally, confirming previous hypotheses of cis regulation.
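The reason a permutation scheme is needed can be seen in a generic example: a model with extra genotype parameters never fits worse than the nested model without them, so the improvement has to be compared with what shuffled genotypes achieve. The sketch below uses a standard Gaussian likelihood ratio, n · log(RSS_null / RSS_alt), for a single gene whose response to an inferred transcription factor concentration may depend on genotype. The exact statistic and permutation scheme used in this study are defined in Methods and may differ; the function names, the regression form and the simulated data are assumptions made for illustration.

```python
import numpy as np

def rss(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ coef
    return float(r @ r)

def lr_statistic(x, snp, y):
    """Gaussian likelihood ratio, n * log(RSS_null / RSS_alt).

    Null: y depends on x with a single slope.
    Alt:  the slope of y on x is allowed to differ between alleles.
    """
    ones = np.ones_like(x)
    null = np.column_stack([ones, x])
    alt = np.column_stack([ones, x, snp * x])
    return len(y) * np.log(rss(null, y) / rss(alt, y))

def permutation_pvalue(x, snp, y, n_perm=999, seed=0):
    """Compare the observed statistic with statistics from shuffled genotypes."""
    rng = np.random.default_rng(seed)
    observed = lr_statistic(x, snp, y)
    null_stats = np.array([lr_statistic(x, rng.permutation(snp), y)
                           for _ in range(n_perm)])
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_perm)
    return observed, p_value

# Simulated toy data (assumption): 112 individuals, genotype-dependent slope.
rng = np.random.default_rng(42)
n = 112
x = rng.normal(size=n)              # stand-in for an inferred TF log concentration
snp = rng.integers(0, 2, size=n)    # genotypes coded 0/1
y = 1.0 * x + 0.8 * snp * x + rng.normal(scale=0.3, size=n)

observed, p_value = permutation_pvalue(x, snp, y)
```

Adding one to both the numerator and the denominator keeps the estimated p-value strictly positive, so with 999 permutations the smallest reportable value is 0.001.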
An interesting method proposed by Sun et al. also used the NCA framework to infer the concentrations of active transcription factors from gene expression data collected over the same yeast strains. Their method was designed to detect linkages between the inferred concentrations and genetic variations, and it used conditional independence tests to find modules of genes controlled by the same causal regulator. Compared to this method, we expect to find similar networks of genes and transcription factors, but our method does not allow us to infer additional causal relationships using statistical tests. Instead, we focus on identifying different mechanisms by which genetic variations can perturb the regulatory networks by directly modeling the effects of these perturbations within the NCA framework. We do not attempt to make rigorous causal claims but use the causal information inherent in genotyping and ChIP-Chip experiments to suggest possible mechanisms of transcription regulation.