Sequence and transcriptional variability within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics) provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the role of sequence variation in cis on gene expression (cis sequence effects) in a group of genes commonly studied in cancer research in lymphoblastoid cell lines. We estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by cis sequence effects using three different analytical approaches, and compared our results to the literature.
We generated gene expression profiling data at N = 697 candidate genes from N = 30 lymphoblastoid cell lines for this study and used available candidate gene resequencing data at N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets for the investigation of cis sequence effects. We used two additive models and the haplotype phylogeny scanning approach of Templeton (Tree Scanning) to evaluate association between individual SNPs, all SNPs at a gene, and diplotypes, with log-transformed gene expression. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p < 0.05) association with gene expression. Using the literature as a "gold standard" to compare 14 genes with data from both this study and the literature, we observed 80% and 85% concordance for genes exhibiting and not exhibiting significant cis sequence effects in our study, respectively.
Based on analysis of our results and the extant literature, one in four genes exhibits significant cis sequence effects, and for these genes, about 30% of gene expression variation is accounted for by cis sequence variation. Despite diverse experimental approaches, the presence or absence of significant cis sequence effects is largely supported by previously published studies.