Animal breeding has resulted in breeds that are extremely diverse for a large number of traits. These phenotypic differences are influenced by DNA variants and mediated by distinct transcriptome programmes across breeds. Dissecting the transcriptome differences between breeds should therefore shed considerable light on the physiological and genetic causes underlying artificial selection and breed differentiation. Here we reasoned that many changes observed in the tissues targeted by selection, i.e., muscle and fat, might actually be due to signals external to the tissue itself, notably those conveyed by the endocrine system. In addition, despite the relevance of the endocrine system in animals, knowledge of its transcriptome remains rather scarce.
Among the many statistical approaches employed to analyze microarrays (e.g., [15]), here we have adopted a Bayesian approach that is closely related to mixed model methods [16]. The main advantages of these approaches are their modeling flexibility and the fact that the whole dataset can be analyzed simultaneously. Both characteristics are important for testing a number of biological hypotheses with minimum standard error and, consequently, maximum power. In addition, the Bayesian method provides an exact measure of uncertainty for variances and variance ratios, whereas convoluted approximations are needed in the classical mixed model method. The analyses reported here also have potential limitations. The main one is that variance homogeneity is assumed, so differences in variability across probes beyond those captured by the effects included in the model are not accounted for. For instance, a gene whose expression level is very low across all tissues is less variable than a gene expressed in some tissues and switched off in others. There is a rich literature on heteroskedasticity, especially within the Bayesian paradigm. However, accounting for variance heterogeneity requires fitting a distinct variance for each gene or group of genes, which can be extremely demanding computationally given the large number of genes on microarrays. An alternative is to analyze each gene separately, but this is also undesirable because many parameters must be estimated and the risk of false positives increases.
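To make the point about exact uncertainty for variance ratios concrete, here is a minimal sketch of a Gibbs sampler for a one-way random-effects model, y = mu + a[group] + e. It is not the model used in this study, which includes several factors (probeset, tissue, breed, sex); the single random factor, the flat prior on mu, the scaled inverse chi-square priors on the variances, the simulated data and the function name `gibbs_one_way` are all assumptions for illustration. Because every posterior draw yields a value of the ratio s2a/(s2a + s2e), a credible interval for that ratio is read directly from the draws, with no delta-method approximation.

```python
import numpy as np


def gibbs_one_way(y, group, n_iter=3000, burn_in=500, nu0=4.0, seed=0):
    """Gibbs sampler for y = mu + a[group] + e, a ~ N(0, s2a), e ~ N(0, s2e).

    Assumed priors: flat on mu; scaled inverse chi-square(nu0, s0) on both
    variances, with s0 set to half the raw variance of y. Returns an array of
    post-burn-in draws with columns (s2a, s2e, s2a / (s2a + s2e)).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    q = int(group.max()) + 1                 # number of random-effect levels
    n_i = np.bincount(group, minlength=q)    # records per level
    n = y.size
    s0 = 0.5 * y.var()

    mu, a = y.mean(), np.zeros(q)
    s2a = s2e = s0
    draws = []
    for it in range(n_iter):
        # mu | rest: normal centred on the mean of y after removing a[group]
        mu = rng.normal((y - a[group]).mean(), np.sqrt(s2e / n))
        # a | rest: shrunken level means, shrinkage lam = s2e / s2a
        lam = s2e / s2a
        level_sums = np.bincount(group, weights=y - mu, minlength=q)
        a = rng.normal(level_sums / (n_i + lam), np.sqrt(s2e / (n_i + lam)))
        # variances | rest: scaled inverse chi-square draws
        s2a = (a @ a + nu0 * s0) / rng.chisquare(q + nu0)
        resid = y - mu - a[group]
        s2e = (resid @ resid + nu0 * s0) / rng.chisquare(n + nu0)
        if it >= burn_in:
            draws.append((s2a, s2e, s2a / (s2a + s2e)))
    return np.array(draws)


# Simulated example: 30 levels x 20 records, s2a = 4, s2e = 1 (true ratio 0.8).
rng = np.random.default_rng(1)
q, per_level = 30, 20
group = np.repeat(np.arange(q), per_level)
y = 10.0 + rng.normal(0.0, 2.0, q)[group] + rng.normal(0.0, 1.0, q * per_level)

draws = gibbs_one_way(y, group)
ratio = draws[:, 2]
lo, hi = np.percentile(ratio, [2.5, 97.5])
print(f"posterior mean ratio {ratio.mean():.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With the simulated variances the true ratio is 0.8, so the posterior mean should lie near that value; the exact numbers depend on the seed. Extending the sketch to several crossed factors means adding one conditional draw per factor, which is where the flexibility of the approach pays off.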
Nevertheless, the models used here fitted the data quite well, as evidenced by the high heritabilities reported (Table ), and several relevant conclusions can be drawn from our study. First, we show that probeset is by far the most influential factor, accounting for at least 85% of total variability, whereas tissue explains on the order of 10%. Breed and sex contribute only marginally to total variance in the transcriptome. In this respect, there are no differences across breeds or between sexes. These results agree extremely well with a previous study from our group, although there we employed a different statistical methodology and analyzed 16 tissues in a smaller number of animals (four) [3]. Although sex and breed were, globally, much less relevant than the probeset effect, they strongly influence the expression of a subset of genes: those with the most extreme z-scores.
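The variance partition and the selection of genes with the most extreme z-scores can be sketched as follows. The variance components are toy numbers in arbitrary units, not estimates from this study; the z-score definition (posterior mean of a gene's effect divided by its posterior standard deviation) is an assumption, as are the helper names `variance_shares`, `posterior_z` and `most_extreme`.

```python
import numpy as np


def variance_shares(components):
    """Fraction of the total variance attributed to each component."""
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


def posterior_z(effect_draws):
    """Per-gene z-score, assumed here to be posterior mean / posterior SD.

    effect_draws has shape (n_draws, n_genes), e.g. MCMC samples of a sex or
    breed effect for every gene.
    """
    return effect_draws.mean(axis=0) / effect_draws.std(axis=0, ddof=1)


def most_extreme(z, names, k):
    """The k genes with the largest |z|, most extreme first."""
    order = np.argsort(-np.abs(z))[:k]
    return [(names[i], round(float(z[i]), 1)) for i in order]


# Toy variance components in arbitrary units (not estimates from this study).
shares = variance_shares({"probeset": 85, "tissue": 10, "breed": 2, "sex": 1, "residual": 2})
print(shares)

# Toy posterior draws: 200 genes, only the first five have a non-zero effect.
rng = np.random.default_rng(2)
n_genes = 200
names = [f"gene{i}" for i in range(n_genes)]
true_effect = np.zeros(n_genes)
true_effect[:5] = [3.0, -3.0, 2.5, -2.5, 2.0]
effect_draws = true_effect + rng.normal(0.0, 0.3, size=(1000, n_genes))
top = most_extreme(posterior_z(effect_draws), names, k=5)
print(top)
```

A factor can thus explain a tiny share of total variance while still producing very large z-scores for a handful of genes, which is the distinction drawn in the text.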
Also importantly, we report a strong link between sex bias and breed variability, which is not caused by large differences in reproductive development (Table ). The male gonad is the tissue with the largest breed heritability (Table ). This result is consistent with several independent observations. First, many of the genes that have been identified as undergoing selection in humans and other species are involved in spermatogenesis [17]. It is plausible, then, that artificial selection and breed divergence, which operate through the same mechanisms as evolution, also affect spermatogenesis. Second, modern livestock breeding primarily targets males, because a sire can leave many more offspring than a dam, and selection intensity is therefore usually much higher in males than in females. This would help to explain why sex-biased and breed-biased genes partially overlap (Table , Figure ). Third, recent work in Drosophila ([20] and references therein) has confirmed that sex-biased genes evolve faster than unbiased genes; in addition, male-biased genes show a stronger signal of adaptive selection than female-biased genes. Nevertheless, although the most breed-biased genes also tend to be sex-biased, the most sex-biased genes are not among the most breed-biased genes. These two phenomena are therefore closely but only partially linked. Similarly, not all genes among those with the largest breed variability are involved in spermatogenesis (Table , Additional File 5). Hence, the high breed heritability in the male gonads cannot be explained solely by changes in spermatogenesis. This is certainly an area meriting further research.
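One way to quantify the asymmetric overlap described above is to compute, in each direction, the fraction of top-ranked genes for one criterion that pass a threshold on the other. The sketch below uses simulated scores with a planted pattern; the helper `fraction_top_passing`, the top-k cutoff and the threshold of 2.0 are hypothetical choices, not the procedure used in this study.

```python
import numpy as np


def fraction_top_passing(rank_score, test_score, k, threshold):
    """Among the k genes with the largest rank_score, the fraction whose
    test_score exceeds threshold."""
    top = np.argsort(-rank_score)[:k]
    return float(np.mean(test_score[top] > threshold))


rng = np.random.default_rng(4)
n_genes = 1000
abs_zsex = np.abs(rng.normal(0.0, 1.0, n_genes))   # hypothetical |z_sex| per gene
breed_var = np.abs(rng.normal(0.0, 1.0, n_genes))  # hypothetical breed variability score
# Planted pattern: the most breed-variable genes are moderately sex biased ...
breed_var[:20] = 6.0
abs_zsex[:20] = 3.0
# ... whereas the most extreme sex-biased genes show little breed variability.
abs_zsex[20:40] = 12.0
breed_var[20:40] = 0.5

breed_top_sex_biased = fraction_top_passing(breed_var, abs_zsex, k=20, threshold=2.0)
sex_top_breed_biased = fraction_top_passing(abs_zsex, breed_var, k=20, threshold=2.0)
print(breed_top_sex_biased, sex_top_breed_biased)
```

In this toy data all of the top breed-variable genes pass the sex threshold (1.0), while none of the most sex-biased genes passes the breed threshold (0.0): a partial link that holds in one direction only.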
A noteworthy observation is the elevated number of myogenesis-related processes among genes involved in breed differentiation (Table ). Muscle, the major component of meat, is the tissue that has been the main target of artificial selection in the pig. We have previously shown [3] that a number of genes involved in myogenesis were differentially expressed in both tissues. Thus, the excess of muscle development genes in fat and gonads might simply reflect a pleiotropic change caused by a primary effect in muscle. Our initial hypothesis that breed transcriptome differences primarily affect the endocrine system should therefore be reevaluated, as it is not fully supported by our experimental results. In fact, it is quite remarkable that both sex and breed differences in the hypothalamus, one of the key endocrine organs, are smaller than in the rest of the tissues studied (Table , Additional File 4). Certainly, the endocrine system plays a fundamental role in animal physiology and, consequently, in breed and sex differences, but transcriptome differences may be more pronounced at developmental stages other than the one studied here, or may affect only a very small subset of genes.
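Over-representation of a Gene Ontology term, such as a myogenesis-related process, among breed-differentiation genes is commonly assessed with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below implements the upper tail with the standard library only; it is a generic illustration, not necessarily the test used in this study, and the counts are hypothetical. In practice, p-values across many terms would also need a multiple-testing correction.

```python
from math import comb


def hypergeom_sf(x, N, K, n):
    """P(X >= x) for X ~ Hypergeometric: n genes drawn from N, K of them annotated.

    Equals the one-sided (greater) Fisher's exact test p-value for over-representation.
    """
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, upper + 1))
    return hits / total


# Hypothetical counts: 10,000 genes on the array, 100 annotated to a
# myogenesis-related term, 200 breed-biased genes, 10 of them annotated.
N, K, n, observed = 10_000, 100, 200, 10
p = hypergeom_sf(observed, N, K, n)
expected = n * K / N
print(f"expected {expected:.1f}, observed {observed}, P(X >= {observed}) = {p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids underflow in the individual terms; the final division of two large integers is still correctly rounded by Python.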
As expected in the light of previous research in the pig and in other species (e.g., [3]), we find extensive evidence of sex-biased probesets. Not surprisingly, the gonads are the most sex-biased tissue overall (Table ; all data are in Additional File 2). Globally, the most sex-biased genes are also sex-biased across a range of tissues, except in the gonads (Table , lower triangle, and Figure ). Most sex-biased genes in Table were identified previously by us and by an independent group, and some were confirmed by quantitative real-time PCR [3]. Several of the gonad sex-biased genes identified here are known to be involved in gonadal development in mice and pigs [22], such as LHX9, GATA4, AMH (z_sex = 6.7 in gonads, z_sex ≈ 0 in the rest of the tissues) and SOX9 (z_sex = 12.0 in gonads, ≈ 0 in the rest of the tissues). In contrast, we do not find any sex bias for the sex-determining region Y (SRY, z_sex = 0.08), which initiates the sex differentiation cascade, probably because its expression window is very narrow (10–12 days post coitum in the mouse) [22]. Follistatin, a glycoprotein of the inhibin-activin-follistatin axis that plays an important role in follicular development within the ovary, is strongly overexpressed in the ovary (z_sex = -16.7) but shows no significant bias in the other tissues analyzed. The most female-biased gene, nonetheless, is protein tyrosine phosphatase receptor type M (PTPRM), an important signaling molecule that regulates cell growth and differentiation. This gene was also found to be strongly female-biased in a previous study [3].
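The cross-tissue comparison of sex bias, i.e., correlations of z_sex between tissues among the most sex-biased genes, reported as a lower triangle, can be sketched with NumPy. The tissue labels, the simulated z-scores, the cutoff of 100 genes and the helper `top_gene_correlation` are hypothetical; the simulation simply plants a sex signal shared by the three non-gonadal tissues and an independent one in the gonad.

```python
import numpy as np

tissues = ["hypothalamus", "muscle", "fat", "gonad"]  # hypothetical labels

rng = np.random.default_rng(3)
n_genes = 500
shared = rng.normal(0.0, 3.0, n_genes)               # sex signal common to non-gonadal tissues
z = np.empty((n_genes, len(tissues)))
z[:, :3] = shared[:, None] + rng.normal(0.0, 1.0, (n_genes, 3))
z[:, 3] = rng.normal(0.0, 3.0, n_genes)              # gonad: independent sex signal


def top_gene_correlation(z, n_top):
    """Pearson correlation between tissues (columns) among the n_top genes
    with the largest |z_sex| in any tissue."""
    strength = np.abs(z).max(axis=1)
    top = np.argsort(-strength)[:n_top]
    return np.corrcoef(z[top], rowvar=False)


corr = top_gene_correlation(z, n_top=100)
# Print the lower triangle, one tissue pair per line.
for i in range(1, len(tissues)):
    for j in range(i):
        print(f"{tissues[i]:>12} vs {tissues[j]:<12} r = {corr[i, j]:+.2f}")
```

Under this planted structure, pairs of non-gonadal tissues show high positive correlations while pairs involving the gonad hover around zero, the qualitative pattern the text describes.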
Because genes act in a coordinated way and their expression levels are therefore correlated, considering gene modules should be more powerful than analyzing each gene separately. We observe that connectivity varies across tissues (Figure ) and that the least connected transcriptome occurs when the gonads of both sexes are analyzed jointly. This is likely a result of the large heterogeneity in expression patterns between ovaries and testes, even before puberty. For our purposes, however, the main reason for detecting modules was to combine expression bias and connectivity, in order to increase power and uncover subtler signals that may not be evident when each gene is studied in isolation [24]. Several approaches can be envisaged to achieve this. Here, we first identified sets of highly correlated genes (modules) within each tissue using standard techniques [25], then assessed whether any module was enriched in breed-biased genes, and finally looked for over-represented gene ontologies among all genes in each enriched module. We found that different modules were enriched in specific ontologies (Table ), reflecting the modularity of gene expression. Importantly, we identify a series of biological processes (spermatogenesis, muscle differentiation and several metabolic processes) that have likely been the target, direct or indirect, of artificial selection. The next logical step will be to verify whether genes that have been the target of selection in the pig (showing evidence, e.g., of a selective sweep) are enriched in these gene ontologies. At least in humans, a significant excess of genes undergoing natural selection is involved in spermatogenesis [26