The relationship between copy number and gene expression has only begun to be explored, as most studies focus on identifying regions of copy number variation (CNV) [23]. The first studies to extensively explore CNV effects on expression in mice highlighted the potential for CNVs to have a widespread impact on the transcriptomes of various tissues [6]. Recent studies of CNV effects on gene expression in human and mouse rely on naturally occurring variation (deletions, duplications, triplications, etc.) and have been limited to cis
]. Radiation hybrid panels allow a genome-wide survey of gene expression changes due to copy number increases and are not limited to regions of previously identified CNVs.
Several lines of evidence support the broader applicability of RH panels in understanding gene expression networks. Though highly multiplexed, RH panels are not unlike other systems, such as transgenic organisms or transfected cell lines, which have given useful biological insights. Phenotypic mapping experiments using radiation hybrids have successfully located human and murine viral entry proteins [29] by exploiting the ability of RH clones to correctly express exogenous genes and to synthesize and post-translationally modify the resulting proteins. Recent sequencing efforts of the hamster genome showed that coding sequences are 88% conserved with human [33
The gene-gene correlation between human RH and SymAtlas datasets also implies no substantial difference in gene expression between our human RH panels and in vivo gene expression for the 12,000 genes we tested. One caveat is that the two datasets have different sources of genetic variation, so this result should be considered in the context of other available evidence. Unlike genetic coexpression studies, the high resolution of the RH approach allowed construction of directed genetic networks from the mouse RH data. These directed networks showed significant overlap with other networks, including protein-protein interaction and coexpression networks [34]. Adding the human RH data will improve the resolution and power of the directed RH genetic networks, giving additional insights into the hierarchical circuitry of gene regulation.
Using a human-hamster RH panel, a mouse-hamster RH panel, an aneuploid mouse dataset and publicly available TCGA data, we present strong evidence that many genes can decrease their expression in response to increased copy number (i.e., possess negative cis α). In the mouse and human RH datasets, 30% of genes show this ability compared to 6% of surveyed TCGA genes. Some of this difference is likely due to coverage: the entire human/mouse genome was represented in the RH panels, while only 38% of the genome was covered in the TCGA data. A small number of negative cis alphas have been reported in human [5] and mouse [26], but the RH approach is the first to interrogate the entire mammalian genome.
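To make the sign of cis α concrete, the following minimal sketch regresses a gene's expression on its own copy number across clones and reports the slope. The function name `cis_alpha`, the use of ordinary least squares, and the toy values are illustrative assumptions, not the model or data used in this study.

```python
def cis_alpha(copy_number, expression):
    """Least-squares slope of expression on the gene's own copy number.

    Hypothetical stand-in for cis alpha: a negative slope means
    expression falls as copy number rises.
    """
    n = len(copy_number)
    if n != len(expression) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(copy_number) / n
    mean_y = sum(expression) / n
    # Sum of squared deviations of copy number from its mean.
    sxx = sum((x - mean_x) ** 2 for x in copy_number)
    if sxx == 0:
        raise ValueError("copy number does not vary across clones")
    # Sum of cross-products of deviations.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(copy_number, expression))
    return sxy / sxx


# Toy data (assumed): extra copies carried by six clones.
copies = [0, 0, 1, 1, 2, 2]
compensating = [1.0, 1.2, 0.8, 0.9, 0.5, 0.6]      # expression drops with dosage
dosage_following = [1.0, 1.1, 1.5, 1.6, 2.0, 2.1]  # expression rises with dosage

print("compensating gene slope:", cis_alpha(copies, compensating))
print("dosage-following gene slope:", cis_alpha(copies, dosage_following))
```

In this toy setting, the first gene would be counted as having a negative cis α and the second a positive one; a real analysis would also need a significance test and multiple-testing correction.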
Additional factors may underlie some of these negative cis ceQTLs but cannot account for all of them. Antisense transcription plays no significant role, and the inclusion of partial-length genes in each RH clone could account for at most a minority (< 21%) of negative cis ceQTLs.
Across the RH and TCGA data, the most enriched gene ontology categories for genes that decrease expression in response to increased copy number involved signaling, receptor activity and membrane functions. This finding is new and suggests that signaling pathways are tightly regulated and may possess autoregulatory feedback to compensate for increased copy number. Signaling genes were recently found to be enriched among human CNVs [24] and under positive selective pressure [35], possibly because negative cis α values confer regulatory robustness in the face of sequence changes. Study of individual genes should reveal details of the responsible mechanisms.
We found 42 common genes with negative cis α between the two RH and TCGA data sets (Additional file 2: Table S2). Surprisingly, the relatively modest overlap in the number of genes still yields a high degree of similarity in GO categories across the three data sets, suggesting conserved pathways are affected.
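One common way to judge whether an overlap between two gene lists exceeds chance is a hypergeometric tail probability. The sketch below shows the pairwise case only; the function name `overlap_p_value` and all counts in the example are hypothetical placeholders, not values from this study, and this is not necessarily the test applied here.

```python
from math import comb


def overlap_p_value(universe, set_a, set_b, shared):
    """Probability of seeing at least `shared` common genes by chance.

    Assumes set_b genes are drawn at random, without replacement,
    from `universe` genes, of which set_a belong to the first list.
    """
    if not (0 <= set_a <= universe and 0 <= set_b <= universe):
        raise ValueError("set sizes must lie between 0 and the universe size")
    upper = min(set_a, set_b)
    total = comb(universe, set_b)
    # Sum the hypergeometric probabilities for k = shared .. upper.
    # math.comb returns 0 when k > n, so impossible terms vanish.
    tail = sum(comb(set_a, k) * comb(universe - set_a, set_b - k)
               for k in range(max(shared, 0), upper + 1))
    return tail / total


# Placeholder counts (assumed for illustration only).
p = overlap_p_value(universe=1000, set_a=100, set_b=50, shared=10)
print("P(overlap >= 10):", p)
```

Extending this to three data sets at once requires a different null model (for example, permutation of gene labels), so the pairwise form is shown only to make the reasoning explicit.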
We observed that cis ceQTLs on the human X chromosome showed substantially lower effect sizes than those on autosomes, a pattern we first noted in the mouse RH panel. The attenuation of the relationship between dosage and expression is independent of Xist-mediated X chromosome inactivation and may represent a previously unseen form of dosage compensation in mammals. In placental mammals, X chromosome inactivation occurs through the expression of Xist, a noncoding RNA on the future inactive X chromosome (Xi) [36]. Transcribed sequences from the Xist locus coat the Xi-elect by binding nongenic regions of the X chromosome [37]. The predicted secondary RNA structure of Xist possesses two stem loops and may serve as a scaffold for silencing factors [39]. Chromatin modification [40], scaffold proteins [41], and polycomb proteins [42] have all been implicated in the initiation and maintenance of X chromosome inactivation, although the picture is far from clear. In contrast to mammals, Drosophila
] and C. elegans
] both use transcriptional control for X chromosome dosage compensation. The autoregulatory control of X chromosome expression found in the human and mouse RH panels may thus represent an evolutionary remnant of these invertebrate dosage compensation mechanisms, which has since been supplemented by X chromosome inactivation. The same attenuation pattern was found on the X chromosome in male TCGA data. While cancer cells resemble RH clones in some respects, they differ in several important aspects, such as mutation, selection, heterogeneity of fragment length and differences in genome coverage.
loci, we found evidence of conserved regulating genes between the human and mouse RH panels. Trans ceQTLs were particularly associated with genes involved in binding, signaling and ion-channel activity, suggesting that these genes tend to represent network hubs and that copy number changes in these genes can contribute to non-lethal variation. We found enrichment of transcription factor activity in mouse but not human RH data at FDR < 0.25. However, at FDR < 0.3, transcription factor activity was enriched in human RH data as well. Trans regulatory hotspots have been observed in eQTL studies involving yeast [16], mouse [45] and human [46] and are commonly interpreted as evidence for master regulators. However, unanticipated factors in the data may contribute to false positives. For instance, a high degree of relatedness between mouse strains has produced signatures of regulatory hotspots [47], and association with groups of highly correlated genes has produced unlikely regulatory hotspots [48]. Integrating additional information such as transcription factor binding sites, protein-protein interaction data and functional analysis is helpful in identifying likely candidates when unanticipated heterogeneity may exist [48
We found noncoding ceQTLs in both human and mouse. Debate continues about the importance of the substantial portion of the genome that does not code for genes. While it is clear that much of the genome is actively transcribed, the role of these regions is unclear. We examined new datasets containing genes and functional genomic elements in noncoding regions, yet the vast majority of our noncoding ceQTLs cannot be explained by these recent discoveries. We also found no significant overlap in the locations of noncoding ceQTL blocks between the two species at FDR < 0.25. At a slightly less stringent FDR < 0.3, the locations of noncoding ceQTL blocks overlap significantly between the two species, but the regulated genes differ. This may reflect evolutionary divergence. Indeed, microRNAs, many of which are conserved across species, have also been found to show species-specific regulation [51
Our own search for novel microRNAs in noncoding ceQTLs yielded no candidates, though improved screening techniques and computational algorithms may aid their discovery. The noncoding ceQTLs also contained very few other unconventional RNAs, such as lincRNAs. Thus, unanticipated forms of gene regulation seem likely. While the RH approach does not reveal possible mechanisms of action, the noncoding ceQTL data could guide the discovery of these novel elements: overlapping genomic DNA fragments traversing a ceQTL could be transfected, with transcript profiling used as a bioassay.
Radiation hybrid panels exist for a number of other organisms, including sheep [52], pig [53], cow [54], rat [56] and dog [57]. The potential exists for probing species-specific copy number effects on gene expression. Amalgamating these data sets could also improve mapping resolution and help examine common networks of gene regulation and regulatory regions.