We sequenced the SHR genome to high depth and produced a full genome analysis of BN-Lx to establish a comprehensive resource of nucleotide and structural genomic variants between these two founder strains of the HXB/BXH RI panel. We identified 3.2 million SNVs and 425,924 indels. Studies performed on three human individuals detected 3.4 [21], 3.2 [22] and 3.3 [23] million SNVs, and the numbers of indels detected in two of these studies were 170,202 [21] and 292,102 [22]. Thus, the numbers of SNVs and indels between the two investigated rat strains are highly comparable to those found when human individuals are compared to the human reference genome [21
In the model system used here, the vast majority of variation is homozygous because each strain is highly inbred. Environmental effects on expression are limited because the rats are kept in a controlled facility. Moreover, by analyzing transcriptomes of multiple animals of the same strain, biological variation can be accounted for. Therefore, we could directly investigate the effect of genetic variation on quantitative and qualitative diversity in transcription in vivo.
It has been reported before that a large part of the variation in transcription levels can be assigned to CNVs [24] and that the level of contribution is related to the size of the CNV [26]. In contrast, recent studies on different inbred mouse strains have suggested a relatively small role for SVs in phenotypic and transcriptional variation. However, the strongest QTLs were enriched for effects of the variant type SV [27
In line with these latter reports, we found that the total number of genes affected by SVs is lower than the number affected by SNVs and indels (108 versus 489), but when SVs are involved, the effects are more likely to be prominent.
We have estimated which types of variants are the best predictors of differential expression by considering what proportion of the expressed genes carrying a specific type of mutation are differentially expressed. Using this criterion, duplications are the best predictors of differential expression. Expanding the analyses to more tissues and more rat strains can further generalize the predictive value of different types of genomic variants. Moreover, the predictive value of SNVs and indels in particular can be different in organisms for which the gene annotation is more accurate - for example, in human. It should be noted that our assessments are aimed at functional predictions based on whole genome analyses. The strong predictive value of stop and splice mutations as de novo events or as recessive alleles in congenital disease, for example, is not questioned here. However, our data do illustrate the importance of full spectrum analyses of genomic variants in (clinical) genetic studies.
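The criterion described above can be sketched in a few lines of Python. This is only an illustration of the proportion being computed, not the pipeline used in this study: the function name `predictive_value`, the record layout, and the toy gene records are all hypothetical and carry no real data.

```python
from collections import defaultdict


def predictive_value(genes):
    """Per variant type, return the fraction of expressed carrier genes
    that are differentially expressed.

    `genes` is an iterable of dicts with the (hypothetical) keys
    'expressed', 'differential' and 'variant_types'.
    """
    carriers = defaultdict(int)  # expressed genes carrying the variant type
    hits = defaultdict(int)      # of those, the differentially expressed ones
    for gene in genes:
        if not gene["expressed"]:
            continue  # only expressed genes enter the denominator
        for variant_type in gene["variant_types"]:
            carriers[variant_type] += 1
            if gene["differential"]:
                hits[variant_type] += 1
    return {vt: hits[vt] / carriers[vt] for vt in carriers}


# Toy records, invented purely for illustration.
toy_genes = [
    {"expressed": True, "differential": True, "variant_types": {"duplication"}},
    {"expressed": True, "differential": False, "variant_types": {"duplication", "snv"}},
    {"expressed": True, "differential": False, "variant_types": {"snv", "indel"}},
    {"expressed": True, "differential": True, "variant_types": {"snv", "indel"}},
    {"expressed": False, "differential": False, "variant_types": {"duplication"}},
    {"expressed": True, "differential": False, "variant_types": {"snv", "indel"}},
]

scores = predictive_value(toy_genes)
best = max(scores, key=scores.get)  # variant type with the highest proportion
```

Ranking variant types by this proportion, rather than by the raw count of affected genes, is what allows a rare variant class to emerge as a stronger predictor than a common one.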
Moreover, the data on specific genes illustrate the diversity of mechanisms by which SVs can change expression. By nucleotide-level analysis of RNA-seq data in gene amplifications, we found examples showing that gene amplification and subsequent diversification do not necessarily lead to pseudogenes, but can result in expression from both the original and the duplicated locus and can result in a novel transcript. Deletions of >2 kb of repetitive elements located in introns were found in genes with a change in level of transcription. It has been described that remnants of repetitive elements can form transcriptional start sites and binding sites for transcription factors. As repetitive sequences have also been shown to be unstable sites in the genome and are thus likely to mutate, these sequences could be a major direct cause of transcriptional variation between individuals. Due to the limited size (3.0 to 6.4 kb) and the repetitive nature of the encompassed sequence, this type of deletion is largely missed in predictions based on NGS read coverage. These findings thus illustrate the importance of including mate-pair analyses in genetic studies.
Combining stop codon-related small genomic variants and the SVs, only 37 of the 532 differentially expressed genes contain a genomic variant with a predicted effect on gene expression. Similarly, changes in transcript structure rarely overlap with genomic variants. This suggests that the major part of both qualitative and quantitative variation in the transcriptome is regulated by unknown regulatory elements or results from changes in transcriptional networks. Segregation analyses of the transcriptome in the RI panel can dissect these cis and trans regulatory factors, both for gene expression levels and splicing variation.
We have not evaluated the genetic content of the individual 30 RI lines. Some de novo mutations may have occurred in the 80 generations of breeding. Indeed, in a previous study evaluating CNV patterns in the founder lines and two of the RI lines, three novel CNVs were detected in the RI strains compared to 626 normally segregating CNVs [29]. Nevertheless, as lines are crossed without mixing from the F2 generation on, all novel variants will be private to a single line. Although such variants could affect biology, they do not affect the genotype-phenotype segregation pattern within the RI panel, which is the common type of use for this genetic system.