|Home | About | Journals | Submit | Contact Us | Français|
Individual genetic variation affects gene expression in response to stimuli, often by influencing complex molecular circuits. Here we combine genomic and intermediate-scale transcriptional profiling with computational methods to identify variants that affect the responsiveness of genes to stimuli (responsiveness QTLs; reQTLs) and to position these variants in molecular circuit diagrams. We apply this approach to study variation in transcriptional responsiveness to pathogen components in dendritic cells from recombinant inbred mouse strains. We identify reQTLs that correlate with particular stimuli and position them in known pathways. For example, in response to a virus-like stimulus, a trans-acting variant acts as an activator of the antiviral response; using RNAi, we identify Rgs16 as the likely causal gene. Our approach charts an experimental and analytic path to decipher the mechanisms underlying genetic variation in circuits that control responses to stimuli.
Genetic variation can affect molecular circuits that control mRNA levels1-4. In particular, genetic variants that affect the absolute abundance of transcripts are called expression quantitative trait loci (eQTLs), whereas variants that affect changes in transcript abundance after perturbation by external stimuli are referred to as responsiveness expression quantitative trait loci (reQTLs)4. Recent studies demonstrated the utility of testing reQTLs in the context of a single stimulus (e.g., ionizing radiation) in yeast and mammals1, 4-10; some reQTLs could be detected in response to stimulus, but not using baseline mRNA abundance levels.
Although most molecular circuits can respond to multiple distinct stimuli, it remains unknown how the same reQTL would behave in response to different stimuli in the same cell type. The reQTL associated with a given gene could either affect that gene's responsiveness to all the stimuli to which the gene responds (stimulus non-specific reQTLs; Fig. 1a, left), or may influence gene expression in response to some but not all of the stimuli to which the gene responds (stimulus-specific reQTLs, Fig. 1a, right,). Earlier studies monitored reQTLs only during a single stimulus and hence could not distinguish these possibilities.
Determining the stimulus-specificity of a reQTL can, in principle, allow us to position it within complex circuits. For example, consider a circuit with three input signals, two target genes, and the network connecting them (Fig. 1b,c). Determining a trans-reQTL in just one stimulus (e.g., s2, Fig. 1b), would position it upstream of its target gene(s) (e.g., x1, Fig. 1b), but cannot predict on which of the multiple paths from stimulus to target it resides (e.g., both pathways 2 and 3 are possible positions, Fig. 1b). However, if one determines a trans-reQTL across a sufficiently diverse set of stimuli (e.g., s1, s2, and s3, Fig. 1b), it can be positioned on a particular pathway in the circuit (e.g., pathway 2, Fig. 1b). A similar rationale applies to cis-acting variants (Fig. 1c). If the genetic architecture cannot be explained in the context of the known circuitry, we can use the inconsistency to refine the circuit with new putative connections.
Here, we systematically study these questions by developing a framework, involving several new computational methods, for multi-stimulus reQTL analysis. We apply this approach to study the transcriptional responsiveness of bone-marrow-derived primary dendritic cells (DCs) to stimulation with different pathogen components, across a panel of recombinant inbred (RI) BXD mice, generated by crossing the parental C57BL/6J (B6) and DBA/2J (D2) strains11. We find that stimulation-specific reQTLs are common, both in cis and in trans, and that many trans-reQTLs likely function the same way in response to different stimuli. By comparing the responsiveness of genes and their underlying reQTLs across multiple stimuli, we refine known molecular circuits and infer the position of reQTLs within them.
We devised a general approach to study the genetic basis of variation in the transcriptional responsiveness to diverse stimuli (Fig. 2a). First, we isolate primary cells from genetically variable individuals (here, previously genotyped RI mouse strains), stimulate the cells in vitro with different stimuli (here, the TLR4 agonist lipopolysaccharide (LPS) which induces both inflammatory and anti-viral response genes; the TLR3 and MDA-5 agonist poly IC which induces anti-viral response genes; and the TLR2 agonist PAM which induces inflammatory response genes12, 13; Fig. 2b), and measure gene expression profiles in a relatively small number of individuals (here, 8 strains). Second, we use the gene expression profiles to define a ‘variation signature’ consisting of several hundred genes (here, 424 genes) whose responsiveness differs significantly between genetically distinct individuals. Third, we measure the variation signature (here, using the Nanostring nCounter) across samples from many individuals, under different stimuli and/or time points (here, 96 individuals, 3 stimuli and two time points). Fourth, we combine these signatures with genotypic information14 to decipher the genetic architecture of variation in gene responsiveness. The initial genomic profiling allows us to determine a high-quality gene signature, while the higher throughput and low sample requirements of the subsequent signature assay allow scaling to a large number of stimuli, time points, and individuals , thus enhancing biological scope and statistical power.
We applied our step-wise approach to study the variation in transcriptional response of DCs from RI BXD mice11. We isolated DCs from individual age-synchronized female mice of six BXD strains and the two parental strains (B6, D2) and measured 30 global gene expression profiles in resting conditions and after stimulation with pathogen components (Supplementary Table 1). The profiles suggest significant genetic variation in the transcriptional response, with 40.6% of genes showing heritability that is higher than expected by chance (Wilcoxon test, P<10-200; Supplementary Note 1).
We selected a ‘variation signature’ of 424 genes (Supplementary Note 2, Supplementary Tables 2 and 3), using a new method, InSignature, that selects genes based on genetic information so as to represent different underlying genetic factors regardless of their effect sizes. InSignature first partitions genes into distinct classes, each consisting of genes whose heritable variation in expression is likely associated with the same (shared) genetic factor (Fig. 2c), and then selects representatives from each class (Fig. 2d; Supplementary Note 2; Methods). InSignature performs well compared to other selection procedures by several measures (Supplementary Fig. 1).
Next, we measured the expression of the 424 signature genes in 384 samples of DCs from 96 individual mice: two individuals from each of 44 BXD strains and four individuals from each of the two parental strains (Supplementary Table 4). We measured mRNA abundance prior to and at 6 hours after LPS, polyI:C or PAM stimulation, using the nCounter (Methods). The nCounter measurements were highly consistent with those of microarrays (Supplementary Fig. 2a-c) and were highly reproducible (Supplementary Fig. 2d).
To identify reQTLs in different stimuli, we calculated — for each gene in each stimulus and strain — a responsiveness trait as the log ratio between that gene's expression level at 6 hours post stimulation and its baseline expression level (Methods). The heritability of responsiveness (percentage of responsiveness variance accounted for by strain) is much higher than expected by chance (P<<10-139, permutation test). Thus, our data is amenable to genetic analysis of complex traits. We considered the responsiveness of each gene in each stimulus as an independent trait, and used a likelihood ratio (LR) score to determine the putative genetic variant(s) (reQTLs) that are significantly associated with each responsiveness trait per stimulus (Methods).
For 31 genes we detected a genetic association in cis in at least one stimulus (Fig. 3a; FDR-corrected association P-value<0.01; median heritability: 88%; Methods). Similar results are obtained from measuring the variation signature in LPS-stimulated DCs at an earlier (2h) time point, further supporting the identified cis-reQTLs (Supplementary Note 3). While each of the 31 genes is significantly cis-linked in one or two stimuli, those associations are insignificant in at least one other stimulus (Fig. 3a), even when lowering the significance threshold (Methods). Consistently, there is a significant gene-stimulus interaction in 28 of the 31 genes (P<0.03; Supplementary Fig. 3a). Thus, these cis-reQTLs are stimulus-dependent. This may parallel the common observation of different cis-eQTLs in different tissues15-18. The stimulus-dependency of cis-reQTLs is not merely a result of lack of responsiveness to some stimuli. While a high association (LR) score is related to higher (absolute) responsiveness in the gene (relative to that gene's responsiveness to other stimuli, Supplementary Fig. 3b-e), the reverse is not necessarily true (Supplementary Fig. 3b,d).
We partitioned the 31 cis-linked genes into eight groups (I-VIII, Fig. 3a) based on their level of responsiveness (Fig. 3a, grey scale) and cis-associations across stimuli (Fig. 3a, brown to yellow scale), respectively defining responsiveness and association profiles for each gene. The cisreQTLs of the 19 genes in groups II, IV, VI, VII, VIII are stimulus-specific, since in at least one stimulus there is high absolute responsiveness but no genetic association. The remaining groups (I, III, and V, 12 genes) consist of stimulus non-specific cis-reQTLs, with matching cis-association and responsiveness profiles. On average, the stimulus non-specific cis-reQTLs have a larger effect size than stimulus-specific cis-reQTLs (P<10-7, Wilcoxon test; Supplementary Fig. 3f). This parallels previously observed differences between tissue non-specific and tissue-specific eQTLs19.
We next examined trans-acting reQTLs. Due to the lack of statistical power to detect trans-linkage using information on each trait independently20 (Supplementary Fig. 3g), we focused on identifying pleiotropic variants that affect a set of co-varying genes. To this end, we devised InVamod, a method to identify pleiotropic genetic variants, their associated genes and the stimuli under which the association holds. InVamod takes as input all responsiveness traits and potential genotypes, partitions the traits into ‘co-variation modules’ and identifies for each module an associated genetic variant (Fig. 4a, Supplementary Fig. 4, Methods). InVamod first calculates the association scores for each trait at each genomic position, and uses this information to initialize a trivial seed solution of single-trait modules whose associated genetic variant is the best-scoring variant of the trait (Fig. 4a, top). It then agglomerates modules. In each agglomeration step, InVamod merges pairs of modules from the previous step into a new module, and updates the genetic variant of the new module such that it best reflects all traits in the newly generated module (Fig. 4a, steps 1-3). The updated variant might differ from the variants associated with the two agglomerated modules in the previous step (e.g., variants C and D are replaced by variant F; Fig. 4a, step 2). Each merging step minimizes the loss (in association scores) due to updating the module's genetic variant. Merging (and updating) is only allowed if the loss is acceptable. The output is a set of modules, each consisting of traits and their (collectively) best associated genetic variant (Fig. 4a, bottom).
InVamod found six significantly large (possibly overlapping) modules in our data (Fig. 4b,c, Supplementary Table 5), each with at least 14 traits, the minimal size that would be expected by chance at a very low probability (P<0.001, permutation test, Supplementary Fig. 5a). For example, the reQTL of module #2 (Fig. 4b-d) affects the responsiveness of the antiviral genes Irf7, Stat2 and Dhx58 under poly IC (a viral mimic), whereas the reQTL of module #5 (Fig. 4b,c) affects the responsiveness of the inflammatory genes Cxcr4, NFkb2 and Nlrp3 under PAM and LPS (bacterial components). Each module is linked to a distinct (trans) genomic position, and together spanning five different chromosomes with no particular location biases. Although the genomic intervals associated with modules #1, #2 and #4 include cis-associated genes (Supplementary Table 6), a potential source of propagated variation, these are not necessarily detected in the relevant conditions. The modules’ trans-reQTLs explain on average 46% to 58% of the heritability of their genes (Fig. 4b, Supplementary Fig. 5b,c, Methods). Similarly to cis-reQTLs, a trans-reQTL implies high responsiveness of the cognate targets, but not vice versa (Fig. 4d, Supplementary Fig. 5d). The vast majority of trans-reQTLs are stimulus-specific (95% of modules’ genes, Supplementary Fig. 5d), similar to the observed frequency of tissue specific trans-eQTLs (80-91% in Refs. 16, 17, 21).
The traits in each module are significantly associated in coherent stimuli, forming a ‘reQTL activity profile’ (Fig. 4b,d, Supplementary Fig. 5d; Methods): mostly PAM in module #1, mostly poly IC in modules #2, #3 and #4, and PAM and LPS in modules #5 and #6 (Fig. 4b,c). In some cases, the same genes are members of more than one module, each associated with different reQTLs in different stimuli (e.g., nine genes shared between module #1 and #2, Fig. 4c).
The transcriptional responsiveness of a module's genes is typically coordinated, likely reflecting their co-regulation. We define a gene's responsiveness profile as the set of stimuli to which the gene is highly responsive, and a module's ‘leading responsiveness profile’ based on significance of enrichment across the responsiveness profiles of its member genes (Methods). Five of the six modules have at least one leading (enriched) responsiveness profile (Fig. 4b,d, Supplementary Fig. 5d).
Next, we aimed to decipher the mechanisms through which reQTLs affect target genes. In many cases, the same gene is associated with the same DNA region in response to more than one stimulus (cis: groups I-IV, Fig. 3a; trans: modules #5, #6; Fig. 4c). A parsimonious explanation for this observation is that the high responsiveness in these stimuli is mediated by the same mechanism, which is modulated by a single DNA polymorphism, for example by an effect on the binding of a regulatory protein that is active only during responses to those stimuli.
Two lines of evidence support this ‘shared association, shared mechanism’ hypothesis. First, the same reQTL is associated in combinations of stimuli that are consistent with the known architecture of the TLR circuit. Trans-reQTLs are associated in both PAM and LPS (modules #5 and #6, Fig. 4b, Supplementary Fig. 5d), suggesting interaction with inflammatory signaling pathways activated by both stimuli (Fig. 2a). cis-reQTLs are associated with both PAM and LPS (10 genes; groups I,II, Fig. 3a), which activate inflammatory transcription factors such as NF-κB (Fig. 2a), or with both poly IC and LPS (6 genes; groups III,IV, Fig. 3a), which activate antiviral factors such as members of the STAT and IRF families (Fig. 2a). Notably, no reQTLs were found associated with both PAM and poly IC; most known pathway components activated by these stimuli are distinct. Second, when responsiveness traits are classified into major classes based on their behavior across genotypes and time (Fig. 3b), a gene typically belongs to the same class in different stimuli (Fig. 3a, Supplementary Fig. 5d). Notably, we cannot completely rule out the non-parsimonious (and quite unlikely) possibility that due to the large linkage intervals, in some cases one reQTL in an interval may act in one stimulus and a neighboring reQTL may act in another, or that in the same stimulus, one reQTL in an interval may affect one gene and a neighboring reQTL may affect another gene.
Next, we exploited the shared characteristics of association and responsiveness profiles of a module's genes to devise InCircuit, a procedure to position a trans-reQTL in the context of a well-established molecular wiring map (a ‘circuit’; Fig. 5a). InCircuit takes as input a (known) wiring diagram of a circuit associating stimuli to signaling pathways. First, InCircuit adds each of module gene at the bottom of the wiring map based on its responsiveness profile. Next, it positions the trans-reQTL associated with the module within the existing branches of the wiring diagram, if possible, such that (1) the stimuli upstream of the reQTL match its predicted activity profile, and (2) the reQTL is upstream to a significant number of the target genes of the module. In particular, in this study, the reQTL is required to be upstream of all of its module's target genes, as long as they belong to a leading (enriched) responsiveness profile of the module. When more than one position is consistent with these two criteria, InCircuit selects the one that minimizes the number of ‘irrelevant’ (non-leading) downstream responsiveness profiles (Supplementary Fig. 6, Methods). Trans-acting reQTLs that can be positioned in this way are deemed consistent with the known circuit. Those that cannot may suggest new branches in the circuit (Supplementary Fig. 6b, ‘model refinement’). This analysis can only be applied to modules with a least one enriched (leading) responsiveness profile (in our case, only five candidate reQTLs). In principle, the same rationale can be applied to cis-reQTLs (Supplementary Fig. 7). However, since cis-reQTL associations are determined from the expression of a single gene, they do not allow us to identify robustly enriched responsiveness profiles, and hence cannot be as reliably resolved in our data.
We applied InCircuit to a TLR wiring diagram that was previously constructed in DCs22. InCircuit first added genes at the bottom of the wiring map based on their responsiveness profiles (Supplementary Fig. 5d, Methods). The accuracy of InCircuit's differential positioning of genes downstream of branches 7, 8, 9 and 10 is strongly supported by their differential enrichment for binding by transcription factors from the RELA, STAT and IRF families, as measured by ChIP-Seq in DCs23 (Supplementary Fig. 8). Next, InCircuit mapped each of four of the five candidate trans-reQTLs into a particular branch in the circuit diagram (Fig. 5b). For example, reQTL #5 is PAM- and LPS-specific and affects two enriched responsiveness profiles, PAM-LPS and PAM-LPS-poly IC. It was positioned in the branch propagating both PAM and LPS signaling, prior to the split into two distinct responsiveness profiles.
The fifth trans-reQTL, which is associated with module #1, could not be positioned in the known molecular circuit in a manner consistent with any existing branch. This module consists primarily of anti-viral genes (11 of 14, P<10-7, hypergeometric test, e.g., Irf7, Stat2, Dhx58, Ifit2) with the LPS-poly IC (viral-like) responsiveness profile (Fig. 4b, 4th column) that are, however, trans-linked primarily under PAM stimulation (Fig. 4b, 3rd column; Supplementary Fig. 5d). To reconcile the reQTL with the circuit, we must add a potential cross-talk between bacterial and antiviral pathways (Fig. 5b, dashed line) and position the reQTL on this new branch. The existence of such a cross-talk has been previously postulated24, 25, and may be apparent only for some alleles.
Known interactions between transcription factors and the promoters of antiviral genes (Fig. 5c) provide support and insights into the potential bacterial-antiviral cross talk in module #1. In particular, promoter analysis26 predicts that 9 of the 12 genes in module #1 are targets of the anti-viral IRF family of transcription factors (P<10-7, hypergeometric test; Fig. 5c, green rectangles in rightmost grey box), whereas only one gene is a predicted target of NF-κB (Fig. 5c, green rectangles in middle grey box). ChIP-Seq data indicates that the promoters of 8 of the 12 genes are bound by STAT1 in DCs23 (P<10-9, hypergeometric test). This is consistent with their placement under branch 10. reQTL #1 may thus underlie genetic variation in IRF or STAT activity, but only in a manner independent of NF-κB (Fig. 5c; branch 10).
The trans-reQTL associated with module #2 is particularly intriguing because it affects the largest number of genes, its target genes are immune-related (P<10-10, Fig. 4b), and it is most significantly predicted to be active only during responses to poly IC (P<10-16; Fig. 4d), but not LPS, PAM, or pre-stimulation (median LR score: 1.6 for eQTL).
To identify the gene likely underlying this trans-reQTL, we tested the effect of nine potential genes in the associated linkage interval at Chr1:129-169 Mb (Fig. 6a, Supplementary Fig. 9a) in stimulated DCs. These nine genes were selected from the 65 genes that reside within the linkage interval and are regulated in DCs during poly IC stimulation. Six genes (Ikbke, Ptprc, Ivns1abp, Fam129a, Rgs16, Rnasel) have a non-synonymous coding polymorphism that alters the physicochemical group of the encoded amino acid, two genes have a cis-associated reQTL in either the array or the nCounter data (Pla2g4a, Glul; Supplementary Table 6), and Rab2b is a well-known direct effector of interferon production and hence a likely candidate.
We used shRNAs to knock down each candidate gene in primary bone marrow-derived DCs from B6 and D2 mice (one individual per strain), and measured the effect on our variation gene signature pre-infection and at 6 hours of infection with Sendai Virus (SeV). We used SeV because poly IC stimulation of shRNA-treated DCs lead to rapid cell death (data not shown), and SeV and poly IC affect the transcriptional response of key genes in similar ways27. Although poly IC and SeV are detected by distinct sensors (TLR3 and MDA-5 for poly IC, RIG-I and TLR7 for SeV28-30; Fig. 5c), both converge on the protein IPS-1 and its downstream signaling pathways, thus limiting any potential false positives to the short RIG-I-IPS branch.
Rgs16 is the most likely candidate gene influencing module #2, since it was the only gene of all those tested with a robust and specific effect on the module (P<10-24; Fig. 6b, Methods). This effect was observed with two different shRNAs targeting Rgs16 and its magnitude was proportional to the degree of Rgs16 knockdown (Fig. 6b). To determine if the Rgs16 effect is specific to an anti-viral response, we compared the effect of shRNA specific for Rgs16 on transcript levels 6 hours after stimulation with SeV infection, PAM, LPS or no stimulation (Fig. 6c, Methods). As predicted by our analysis, Rgs16 shRNA significantly and specifically (t-test P-value < 10-3) reduced the expression of Module #2 genes after SeV infection, suggesting that RGS16 acts as an activator of the module genes under viral stimulation. After PAM stimulation and in baseline (no stimulation) conditions, Rgs16 shRNA had the opposite effect (P<10-7), suggesting that RGS16 acts as an inhibitor of the module in these conditions. After LPS stimulation, Rgs16 shRNA had no effect on the module (P>0.2; Fig. 6c). The same results were obtained with two different shRNAs targeting Rgs16 (Fig. 6c and Supplementary Fig. 9b), using both B6 and D2 mice (two biological replicates per strain; data not shown). One potential explanation for the effect of Rgs16 activity is a single nucleotide polymorphism (SNP rs6362681) leading to a non-synonymous amino acid change in residue 193 (glycine in B6, serine in D2)31, an unstructured serine-rich part of RGS16 that influences its signaling activity32.
Here, we present an experimental and computational framework to characterize the impact of genetic variation on transcriptional responsiveness to multiple stimuli, and we apply it to study the response of DCs to pathogen components. At the heart of our approach are two novel computational methods for multi-stimulus reQTL analysis: InVamod and InCircuit.
InVamod identifies trans-reQTLs with pleiotropic effects and characterizes their stimulus specificity. Previously published methods can jointly analyze multiple traits, but either do not scale well33,34, 35 or generate results that are not directly biologically interpretable36. In contrast, InVamod analyzes multiple traits in an a-parametric, scalable manner, and produces coherent modules of significantly associated traits. Since InVamod takes as input the association scores rather than the trait values, it can in principle construct heterogeneous modules of distinct traits, allowing researchers to group traits regardless of their source and distribution.
InCircuit positions the discovered reQTLs within the branches of a known molecular circuit diagram. Circuit reconstruction methods position genetic variation in circuits based on gene expression data3, 37, but do not leverage the stimulus specificity of signaling pathways, reQTLs and target genes. In contrast, InCircuit directly uses stimulus-specificity of reQTLs (Supplementary Fig. 10), but only in the context of an established (prior) molecular network (Fig. 5b). If InCircuit cannot position a reQTL within the known circuitry, the reQTL can suggest new putative connections. InCircuit assumes that a single reQTL affects each responsiveness trait, and that the circuit diagram is deterministic and acyclic. Future developments that handle dynamics, feedback and uncertainty may increase the accuracy of reQTL positioning and improve our understanding of reQTL mechanism of action.
Our results lent strong support for RGS16 as the underlying cause of the module #2 reQTL. This regulator of G-protein coupled receptors is involved in inflammatory processes in B and T cells, and affects B cell function, leading to autoimmune disease in BXD2 mice38, 39. Rgs16 expression is induced in mouse and human DCs activated with LPS or poly IC40. Interestingly, RGS16 is contained within a copy number variation region that substantially differs within human Asian populations41.
To collect a large-scale, multi-stimulus dataset, we relied on a practical strategy that combines genome-wide profiling experiments, from which we define a ‘variation signature’, with a subsequent meso-scale assay to monitor the variation signature across many combinations of genotypes, stimuli and time points. Several different meso-scale technologies have cost, sensitivity and scale benefits (e.g., Luminex QuantiGene Plex, MRM42, Mass Cytometry43) that can be exploited as part of this design. While a meso-scale approach offers some benefits, genomic profiling may allow the identification of more variants (Supplementary Fig. 1e). As our overall framework is general, and InVamod and InCircuit are scalable and can be applied to multi-stimulus reQTL analysis with signature data (as here) or genome-wide profiles (e.g., microarrays, RNA-seq), this approach may help decipher the stimulus specificity, molecular circuitry and underlying causes of a wide variety of reQTLs.
BMDCs were generated from 6- to 8-week-old female C57BL/6J, DBA/2J and BXD mice (Jackson Laboratories). All animal protocols were reviewed and approved by the MIT / Whitehead Institute / Broad Institute Committee on Animal Care (CAC protocol 0609-058-12). Bone marrow cells were collected from femora and tibiae and plated at 106 cells/ml on non-tissue culture treated Petri dishes in RPMI-1640 medium (GIBCO), supplemented with 10% FBS, L-glutamine, penicillin/streptomycin, MEM nonessential amino acids, HEPES, sodium pyruvate, β-mercaptoethanol, and murine GM-CSF (15 ng/ml; Peprotech) or human Flt3L (100 ng/ml; Peprotech). For microarray experiments, floating cells from GM-CSF cultures were sorted at day 5 by MACS using the CD11c (N418) MicroBeads kit (Miltenyi). Sorted CD11c+ cells were used as GM-CSF-derived BMDCs, and plated at 106 cells/ml and stimulated at 16 hr post sorting. For all other experiments, GM-CSF-derived BMDCs were used directly and composed of ~90% CD11c+ cells, which was comparable to sorted BMDCs and to our previous observations22. Notably, we excluded from this study those BXD strains that were recently found by genomic analysis to have unexpected heterozygosity contaminations (Fernando Pardo-Manuel de Villena, personal communication).
TLR ligands were from Invivogen (Pam3CSK4, LPS) and Enzo Life Sciences (poly IC), and were used at the following concentrations: Pam3CSK4 (250 ng/ml), poly IC (10 μg/ml), LPS (100 ng/ml). Sendai virus was obtained from ATCC. DCs were infected with Sev in multiplicity of infection of 1.
All array hybridizations (Supplementary Table 1) and pre-processing were done using Affymetrix Mouse Genome 430A 2.0 Array as in Ref. 22 and normalized using RMA44. Probe sets that were absent in all samples according to Affymetrix flags were removed. Only 6,850 genes that have at least one measurement that is higher than a threshold value of 50 were used.
The standard log fold change is defined as log expression under stimulation, normalized by baseline log expression of the same strain. Inter-strain fold changes x/B6 is the log expression in strain x normalized by the averaged log expression in strain B6 at the same stimulus and time point. We selected for further analysis 4,573 genes whose expression may be affected by either stimulus or strain, defined by ≥1.7 standard fold change or ≥1.7 inter-strain fold change.
The variation signature consists of 424 genes selected using four distinct criteria: (I) heritability in responsiveness, as scored by InSignature (below) (322 genes); (II) prior knowledge on a key role in antiviral and inflammatory responses or in related diseases, based on the annotation in the Ingenuity Knowledge Base (Ingenuity Systems, Mountain View, CA, USA) (61 genes); (III) positive responsiveness controls, based on DCs response (especially up regulation) to LPS, poly IC or PAM stimulation in previous report22 (20 genes); and (IV) negative controls, based on low responsiveness variation (21 genes) (Supplementary Note 2).
Selection of genes based on heritable variation (criterion (I)) is challenging since (i) information should be combined across different stimuli, but only a few strains were globally profiled in each stimulus; and (ii) using top-scoring genes (e.g., based on variance, heritability or association scores) might lead to an unbiased representation of the underlying genetic variants, over-representing target genes of variants with large pleiotropic effects. To address both challenges, we developed a new method, InSignature, as follows.
The InSignature method takes as input genome-wide profiles from different strains and stimuli and predicts, for each gene, an underlying ‘InSignature pattern’ comprising of a set of hypothetical optimal genetic variants that are likely to act in each of the stimuli. The InSignature patterns impose a partition of the genes into classes, each consisting of all genes with the same predicted InSignature pattern (Supplementary Fig. 11). InSignature then selects representative genes within each class. This reduces the dependence on effect size and multiplicity: genetic variants with strong influence on many genes will be represented similarly to variants with moderate or low effect on a few target genes only.
Formally, an InSignature pattern is a maximum likelihood fitting of expression traits into discrete values, as follows (Supplementary Fig. 11). For each gene, the method discretizes transcript abundances using a mixture of two or three Gaussians, so as to maximize a likelihood score. We further constrain the solution space so that individuals of the same strain and same condition must have the same discrete value. The InSignature score is the –log10 P-value of a maximum likelihood ratio test, calculated as follows: The likelihood of the null hypothesis is calculated based on a single Gaussian, whereas the likelihood of the alternative hypothesis is the maximum likelihood fitting into two or three Gaussians. The test statistics is the logarithm of the likelihoods’ ratio, and its probability distribution is approximated using a chi-squared distribution (degrees of freedom are calculated based on Wilks’ theorem45) that allows the calculation of a P-value score. This approach allows (1) a principled filtering of genes; in our data, an InSignature score of 5.5 and higher corresponds to false discovery rate of 0.05 (Supplementary Fig. 12); (2) prediction of an underlying genotype in the case of multiple conditions and only a few strains, allowing a different genotype for each condition but requiring the same effect size; and (3) grouping together genes with the same InSignature pattern into ‘InSignature classes’ (Supplementary Fig. 11).
Next, to choose the final set of signature genes (criterion I), InSignature proceeds as follows. It first randomly selects a class and then chooses a gene within it. We use several alternative criteria to select a representative gene within a chosen InSignature class: the top-scoring gene (based on the InSignature score); biologically relevant genes within the class (defined as criterion II above); responsiveness under LPS, poly IC and PAM (defined as criterion III above); and random selection of a gene within a class (full details are provided in Supplementary Note 2).
Here, of the 4,573 analyzed genes, 799 could be significantly fit with an InSignature score that is higher than 5.5 and were grouped into 446 InSignature classes. Of those, 294 are in singleton classes and the rest fall into 152 classes of size 2 to 13. Out of 322 genes selected by InSignature from these classes (typically one or two representatives per class), 232 were chosen as ‘top scoring’ in their class, 35 were randomly chosen from their class, 20 were chosen as class members that also have known biological relevance, and 35 were chosen as class members that are also highly responsive. To those we then added another 61 ‘biologically relevant’ genes, 20 ‘responsive’ genes and 20 negative controls, all of which were not within the classes (Supplementary Table 3).
The NanoString nCounter gene expression system captures and counts individual mRNA transcripts, using a multiplexed probe library (codeset) with sequence-specific probes for each gene of interest. Details on the nCounter system are presented in full in Ref. 46. The nCounter codeset was constructed to detect each of 424 genes in the variation signature and used to measure mRNA expression on nCounter as previously described27.
We normalized the nCounter data (Supplementary Table 7-10) in three steps. First, we used positive spiked-in controls provided by the nCounter instrument to control for small variations in the efficiency of the automated process. Second, 21 control genes (Supplementary Table 2), which showed minimal variation based on the microarray data, were used to control for differences in sample values. Third, we removed covariates associated with epoch population substructure47. P-P plots demonstrating low observed inflation for representative trans-linked genes are presented in Supplementary Fig. 13.
Genotyping of 3,500 single nucleotide polymorphisms (SNPs) of BXDs were taken from WebQTL14. Responsiveness traits were calculated as the log ratio between the expression level under exposure to pathogen and the baseline expression level. (A responsiveness profile of a gene is the set of stimuli under which the gene is ‘highly responsive’, defined as exceeding 40% of the maximal measured responsiveness for that gene). Responsiveness-SNP associations were identified by assuming a two-level nested ANOVA model (higher level: genotypes; intermediate level: strains; lower level: individuals in each strain) and calculating its likelihood ratio (LR) statistic48. The ‘LR score’ is defined as 2 · ln LR. The LR score indicates whether responsiveness differences could be related to the genetic marker rather than to differences between or within strains. To determine a significance threshold, we used a permutation test, reshuffling the BXD labels among strains, but performing no other reshuffling or randomization. In this way, we only disrupt the matching between reQTLs and responsiveness but maintain within-strain variation. Using this permutation test, and requiring a false discovery rate of at most 0.001, we set the significance threshold for the LR score to be 29 for declaring a significant association of a single gene under a single stimulus. (For comparison, based on an approximation to an F statistic48 and before multiple testing correction, an LR score of 29 corresponds to a type I error of 3.5*10-7). If the best scoring marker of a given responsiveness trait (or multiple traits) exceeds this threshold, it was termed a genetic variant (reQTL) associated with this trait.
Between-genotype and between-strain components of variance48 were estimated based on the above two-level ANOVA model, allowing evaluation of (broad-sense) heritability (H2): the percentage of between-strain variance out of the overall variance. The percentage of heritability that is attributed to between-genotype effects is denoted % explained heritability49.
A responsiveness trait was classified based on comparing the degree of displacement (the difference between mean values of expression in strains carrying the two reQTL alleles) before and after stimulation (Fig. 3b). A trait is divergent if the (absolute) displacement is higher post-stimulation than at baseline. Divergence is early if the pre-stimulation displacement is already significant (here, 0.65 standard deviations difference), and late otherwise. Convergence indicates that the (absolute) displacement is lower post-stimulation than at baseline. This includes the special case of inverted traits (referred as ‘inversion’), when there is a significant displacement at both baseline and stimulated state, but at opposing directions.
To calculate interactions, we analyze the responsiveness data of the three stimulations by a full ANOVA model including the stimulation effect (S) and the reQTL×S interaction effect:
where yij is the gene's responsiveness at the ith individual (i= 1,...,n) and jth stimulation; Qi is the genotype effect; Sj is the jth stimulation effect; eij is the residual error; and (Q×S)ij is the interaction effect between the ith reQTL genotype and jth stimulation. P-values for each of the effects are derived based on a corresponding F distribution as part of the full ANOVA test. For each gene, we first find the reQTL with the best LR score and then calculate the interaction effect for this reQTL.
The information from multiple traits of the same gene can provide an extra statistical power to detect significantly-linked genetic variants. We focused on multiple traits of the same gene that are linked to the same genetic variant. Such recurring event is highly unlikely to be generated at random, and therefore the LR score threshold can be reduced, while maintaining the same significance level. Because LR is an additive score, it is possible to calculate the association of multiple traits to a single genetic variant (marker) by summing over the LR scores of each of the traits alone. Based on empirical permutation tests, in our dataset two traits must attain a total (summed) LR score of 46, that is, an average LR of only 23 for each trait, to provide an FDR-corrected P-value of 0.001 (compared to an LR of 29 for a single gene in a single stimulus, above). Altogether, we applied the following procedure. For each gene, we used a strict association score of 29 to detect genetic variants that act on single traits, or a permissive association score of 23 for cases of two or more traits that are best-linked to the same variant. We define an association profile of a gene and an reQTL as the set of stimuli under which the gene is significantly associated to that reQTL.
InVamod partitions the traits into co-variation modules in a step-wise agglomerative manner (Fig. 4a, Supplementary Fig. 4). Each module consists of a group of significant traits and the best scoring genetic variant for this group of traits. We define the association score between a module and a given variant as the sum of LR scores across all the module's traits to this variant. The best scoring variant of a given module is the variant that attains the highest association score for that module. This variant is typically different from the optimal variants that would be associated as the best-scoring marker of each of the traits independently (which may be different variants for different traits in the module). The association score of a module is therefore equal to or lower than the score of these optimal independent variants (denoted ‘optimal score’).
We initialize InVamod with the trivial seed solution of significant single-trait modules. At each step, InVamod merges traits or modules from the previous step, and then updates the best scoring genetic variant to the new module. The newly generated modules may attain a new genetic variant that is different from their variant at the previous step. A merge that results in a smaller association score loss (compared to the optimal score) is favored over those merges that cause a higher loss. The merging process is deterministic (optimizing the loss) and repeated until no two modules can be merged into a legal module. A module is legal if and only if (1) it is a multiplicative ε-approximation of its optimal score; and (2) it does not contain traits that are insignificantly associated with its genetic variant. The cutoff for trait significance is predefined and depends on the number of traits within the module (‘module size’): the larger the module, the lower the cutoff.
We evaluated module significance based on an empirical permutation test. Permutation tests were conducted by applying InVamod on 100 permuted datasets and identifying variation modules in each of these permuted datasets. InVamod was applied on the permuted datasets using the same parameters as applied on the original dataset (including ε and LR cutoffs for traits in modules of sizes 1, 2 and 3; no other parameters are required). Each permuted dataset was generated by reshuffling each of the responsiveness traits independently. Reshuffling of a responsiveness trait was performed by changing responsiveness values among BXD strains while maintaining the original matching among responsiveness values that belong to two individuals of the same strain. This way, the original within-strain variation, between-strain variation and heritability is maintained, and only the association with a particular genetic variant is broken.
To detect significant modules (even when no single trait is highly significant), we constructed modules initially based on permissive cutoffs for trait significance and then applied stringent criteria to filter significantly large modules, based on permutation tests. In our analysis, we used ε=0.7, and permissive cutoffs of LR = 16, 10, 9 for modules of size 1, 2, 3 and larger, respectively (for comparison, LR of 29 and 23 provide stringent cutoffs for one or two traits as detailed above). Based on empirical permutation tests, with these parameters, module size of 14 and higher is unlikely to be generated by chance (P-value ≤ 0.01; Supplementary Fig. 5a).
InCircuit takes as input a co-variation module and a wiring diagram (denoted ‘regulatory circuit’) depicting all relevant stimuli, signaling pathways, and responsiveness profiles (RPs) under study. The reQTL activity profile of a module is the collection of stimuli under which the reQTL is predicted to be active (predicted activity under a given combination of stimuli is based on over-representation of module traits under these particular stimuli; FDR-corrected hyper-geometric P-value<10-3). Over-represented RPs of a module are its ‘leading RPs’. A trans-acting reQTL is added to a branch on the wiring diagram based on two criteria: the stimuli upstream the reQTL exactly match its activity profile, while all the module's leading RPs are downstream to it (Supplementary Fig. 6a). When more than one possible branch satisfies these conditions, a model selection criterion is applied: the reQTL position is selected such that it minimizes the total number of downstream (leading or non-leading) RPs. When no such branch exists, a new branch is added to the wiring diagram consistently with these requirements (Supplementary Fig. 6b, ‘model refinement’).
InCircuit's input regulatory circuit (Fig. 5b) was not constructed as part of this study but was taken as prior knowledge from a comprehensive previous study of the transcriptional response to TLR stimulation in DCs22, and is consistent with extensive literature on the DC TLR circuit13. The target genes were added to the diagram based on their measured responsiveness profiles (Supplementary Fig. 5d). The wiring diagram in Fig. 5c represents protein-protein interactions map of TLR signaling from the Ingenuity Knowledge Base (Ingenuity Systems, Mountain View, CA, USA), where the target genes were added based on a computational promoter analysis26.
High-titer lentiviruses encoding shRNAs targeting genes of interest were obtained from The RNAi Consortium (TRC; Broad Institute, Cambridge, MA, USA)50. Bone marrow cells were infected with lentiviruses as previously described22. For each gene of interest, we verified knock down efficiency in our cells using qPCR of the target gene. To this end, we harvested PolyA+ RNA in 96-well plates with the Turbocapture mRNA kit (Qiagen) and reverse transcribed with the Sensiscript RT kit (Qiagen). Real time quantitative PCR reactions were performed on the LightCycler 480 system (Roche) with FastStart Universal SYBR Green Master Mix (Roche). Every reaction was run in triplicate and GAPDH levels were used as an endogenous control for normalization. All shRNA used in the study had >50% knockdown efficiency. The two shRNAs targeting Rgs16 (shRGS16(1) and shRGS16(2), Fig. 6) have distinct seed region sequences (1: GCCCAGAACTTATACTTTCTA; 2: CCGAGAACTGACCAAGACAAA). Lentivirus-infected cells were composed of ~90% CD11c+ cells, which was comparable to sorted BMDCs and to our previous observations22. The effect of shRNA knockdown on gene expression was quantified using the nCounter assay on shRNA-treated DCs before and 6 hours after stimulation with PAM, LPS or infection with SeV at MOI of 1.
The shRNA effect on the responsiveness each gene is defined as the difference between responsiveness of the shRNA-treated cells compared to responsiveness of LacZ shRNA and Luciferase shRNA control-treated cells (responsiveness is averaged across B6 and D2 in both cases). Similarly, the shRNA effect on transcript level of each gene is defined as log2 (expression level in shRNA-treated cells / expression level of LacZ shRNA and Luciferase shRNA control-treated cells). The preferential effect of a reQTL on the responsiveness of the module genes is the –log P-value of a t-test for the difference between the distribution of shRNA effects on responsiveness of module genes compared to the distribution of effects on the remaining genes in the assay.
We thank Chun (Jimmie) Ye, Joseph Pickerel, Mark Daly and Eric Lander for comments and discussions. IGV and IA were supported by Human Frontiers Science Program postdoctoral fellowships. Work was supported by HHMI, an NIH PIONEER award, a Burroughs-Wellcome Fund Career Award at the Scientific Interface (AR), a Center for Excellence in Genome Science grant 5P50HG006193-02 from the NHGRI (NH and AR), the Klarman Cell Observatory at the Broad Institute (AR), the New England Regional Center for Excellence/Biodefense and Emerging Infectious Disease U54 AI057159 (NH), the Israeli Centers of Research Excellence (I-CORE) Gene Regulation in Complex Human Disease, Center No 41/11 (IGV), the Human Frontiers Science Program Career Development Award and an ISF Bikura Institutional Research Grant Program (IA) and the Edmond J. Safra Center for Bioinformatics at Tel-Aviv university (RW and YS) . AR is a fellow of the Merkin Foundation for Stem Cell Research at the Broad Institute. IGV is a Faculty Fellow of the Edmond J. Safra Center for Bioinformatics at Tel Aviv University and an Alon Fellow.
I.G.V., I.A. and A.R. conceived and designed the study. N.C., T.E, R.R., A.S. and I.A. conducted the experiments. I.G.V. and A.R. conceived computational methods. I. G. V., R.W. and Y.S. conceived, developed and implemented the computational methods. N. H. participated in study design and interpretation. I.G.V., I.A., and A. R. wrote the manuscript with input from all the authors.
Competing financial interests
The authors declare no competing financial interests.
Accession codes. The complete microarray data sets are available from the Gene Expression Omnibus (GSE40452).