|Home | About | Journals | Submit | Contact Us | Français|
The phylogeography of cattle genetic variants has been extensively described and has informed the history of domestication. However, there remains a dearth of demographic models inferred from such data. Here, we describe sequence diversity at 37 000 bp sampled from 17 genes in cattle from Africa, Europe and India. Clearly distinct population histories are suggested between Bos indicus and Bos taurus, with the former displaying higher diversity statistics. We compare the unfolded site frequency spectra in each to those simulated using a diffusion approximation method and build a best-fitting model of past demography. This implies an earlier, possibly glaciation-induced population bottleneck in B. taurus ancestry with a later, possibly domestication-associated demographic constriction in B. indicus. Strikingly, the modelled indicine history also requires a majority secondary admixture from the South Asian aurochs, indicating a complex, more diffuse domestication process. This perhaps involved multiple domestications and/or introgression from wild oxen to domestic herds; the latter is plausible from archaeological evidence of contemporaneous wild and domestic remains across different regions of South Asia.
I should think, from facts communicated to me by Mr Blyth, on the habits, voice and constitution etc. of the humped Indian cattle, that these had descended from a different aboriginal stock from our European cattle; and several competent judges believe these latter may have had more than one wild parent. (Charles Darwin. ‘On the Origin of Species’, p.28)
In the light of over 20 years of molecular genetic investigation of diversity within cattle populations it is clear that Darwin and his correspondent Mr Blyth were correct; the divergence between Bos indicus and Bos taurus is too great to have emerged from domestication of a single population of the aurochs, Bos primigenius (Loftus et al. 1994). Mitochondrial sequences have been extensively sampled from diverse populations, including from fossil wild oxen, and have revealed a striking pattern of deeply branching clades with many very similar sequences clustered at each tip (Troy et al. 2001; Edwards et al. 2007; Achilli et al. 2009). This is interpreted as the signature of a discrete number of divergent wild ox populations contributing to the domestic pool; in each of B. indicus and B. taurus, two relatively similar clades predominate, with other types being found only in fossils or at very small frequencies in the domestic pool. Thus, domestication seems most likely to have been a limited sampling of the wild ox, with at least two (maybe more) episodes, one focused around the near East and one further East in the Indian subcontinent.
However, the nature of the domestication process forms an open question. The number of animals captured, its duration and whether it was discrete or continuous over a long period remain unknown. Whereas overall mitochondrial DNA (mtDNA) evidence does seem to point towards some restriction in place and time in domestication events (Machugh & Bradley 2001), low-frequency phylogenetically outlying haplotypes betray some degree of secondary introgression from the wild (Achilli et al. 2008, 2009; Stock et al. 2009). These must be regarded as suggesting much greater secondary autosomal contribution. Incorporation of wild matrilines must always have involved difficult captures; alternatively, input to the domestic pool via wild inseminations would have immediately resulted in semidomestic hybrids and seems clearly a more likely, and hence frequent, type of introgression to have occurred within domestic history.
The recent genomewide analysis by the bovine HapMap consortium found high levels of genetic diversity in all cattle breeds surveyed, exceeding those found in human and dog populations, and revealed more than twice as much nucleotide diversity in one indicine breed in comparison to B. taurus breeds (Marth et al. 2003; Ostrander & Wayne 2005; Gibbs et al. 2009; Lewin 2009). These data agree with previous findings from major histocompatibility complex (MHC) allele diversity by Vila et al. (2005), which strongly argue that the allele spectra in many domestic species today are not depleted in diversity, as might be expected from a traditional view of early domestic history involving a severe capture bottleneck. Additionally, it has been noted that levels of protein polymorphism in cattle, as assayed by classical methods, are typical for mammals (Lenstra & Bradley 1999), showing no dramatic indication of a restricted capture of diversity from the wild.
In contrast, two recent analyses suggest discernible population genetic signatures of the domestication process within modern cattle. Estimates of past effective population sizes derived from examining linkage disequilibrium decay with distance, using a medium density single nucleotide polymorphism (SNP) panel, indicate a large predomestication population with an approximately 50-fold decline at domestication followed by a further decline with the formation of modern breeds (Goddard & Hayes 2009). Also, Finlay et al. (2007) examined mtDNA datasets from four domestic Bovidae (cattle, mithan, yak and water buffalo) using the Bayesian approach to the coalescent process implemented in BEAST, and compared these to two wild bovid samples (bison and African buffalo). In each domestic sample, a sharp recent increase in effective population size was estimated that seems consistent with the timeframe of domestic history. That this was related to human capture and taming was suggested by the observation that neither wild bovine showed this phenomenon.
Here, we sought to model the demographic history (and hence illuminate the domestication process) of cattle populations sampled from three continents, choosing, as far as possible, breeds known from prior analysis to display only limited or no recent admixture between ancestral populations. We analysed sequence diversity assayed at 17 genes and explored plausible demographic models by comparing the unfolded allele frequency spectra with those generated under simulations using a diffusion approximation method. These loci include genes involved in immunity and some have indication of adaptive history: the initial rationale for their sequencing (Freeman et al. 2008). As such they represent an imperfect dataset for demographic modelling but we argue that this remains an interesting exercise given the scale of the dataset. In particular, any biases from locus choice should be similar in B. indicus and B. taurus and our strongest result is that of divergent domestication histories between the cattle subtypes. Our optimal model indicates an early bottleneck plus expansion within B. taurus, a later (possibly domestication-related) bottleneck within B. indicus and, strikingly, a substantial secondary input from local aurochsen to the South Asian domestic pool.
Seventeen genes were fully or partially resequenced in one of two sequencing panels. TLR2, TLR4, TLR5, TLR7 and TLR9 were resequenced in a smaller panel made up of five European B. taurus (Charolais, Jersey, Simmental, Friesian and Aberdeen Angus), five African B. taurus (Somba and N'Dama), five B. indicus (Hariana, Sahiwal and Tharparker) and one outgroup species, Bos gaurus (Gaur).
The remaining genes including seven previously described (Freeman et al. 2008) were resequenced in a panel comprising 20 European B. taurus (Aberdeen Angus, Friesian, Norwegian Red, German Black, Highland, Alentejana, Mertolenga, Romagnola, Sikias, Anatolian Black), 11 African B. taurus representing three West African breeds known to have little or no zebu introgression (N'Dama, Somba and Lagune) and 11 Asian B. indicus representing four Indian breeds. In addition, one B. gaurus (Gaur), one Bison bison (Plains Bison) were included in the panel. Only those outgroup sequences for which haplotypes could be determined were used in analyses.
Primers were designed using Primer Select or Primer3. In the case of TLR2, TLR4, TLR5, TLR7, TLR9 and PRF1, primers were designed to amplify as much coding sequence as possible in a single fragment. In the case of the remaining genes, the sequencing strategy aimed to cover the whole coding sequence, approximately 1 kb of intronic sequence as well as 1 kb of flanking sequence at the 5′ and 3′ end of the gene. The ‘Overlapping Primers’ option of The PCR Suite (Van Baren & Heutink 2004), an interface to Primer 3, was used to design multiple overlapping pairs of primers to amplify MRPL30, MRPS14, MRPS05, FEZL and TNFAIP8L1. The amplification and sequencing of TRIF, CD2, IL2, IL13, ART4 and TYROBP have been described (Freeman et al. 2008).
The genes were amplified and subjected to Sanger sequencing using commercial service providers (Agowa GmbH, Germany) and sequence data were assembled and analysed using the Phred, Phrap and Consed software (Ewing & Green 1998; Ewing et al. 1998; Gordon et al. 1998). Polymorphic sites were identified using Polyphred (Stephens & Scheet 2005) and confirmed visually . PHASE v.2.1.1 (Stephens & Scheet 2005) was used to infer haplotypes and population genetic parameters were estimated using DnaSP (Rozas et al. 2003).
Genes that were completely or partially sequenced in at least a subset of samples from the three continents included: CD2, IL2, IL13, ART4, TYROBP, TRIF, TNFAIP8L1, FEZL, MRPL30, MRPS14, MRPS05, PRF1, TLR2, TLR4, TLR5 and TLR9. TLR7 was omitted from all analyses as no polymorphic sites were identified in the region sequenced.
The unfolded site frequency spectrum (SFS) is a summary statistic of the data that records the counts, xi, of the number of derived alleles at frequency i in a sample of n chromosomes. We were not interested in ancestral only or fixed sites, so that i ranges from 1 to n–1. The SFS has been a useful tool in understanding how demographic history and selection affects the pattern of genetic variation observed in a sample. Diffusion approximations to the forward model of molecular evolution can be used to derive, or numerically solve for, the expected SFS under various model structures (e.g. different demographic scenarios) and thus one can exploit a likelihood framework for estimating model parameters. A powerful tool for solving diffusion model equations and performing model fitting to frequency spectrum data has been recently developed, called ai (Gutenkunst et al. 2009).
Unlike the popular program ms (Hudson 2000), ai is based on a forward-time model. Briefly it models the evolution of allele frequencies (governed by a linear diffusion equation) in up to three populations as a function of time t. One of the advantages of ai is its flexibility and its computational speed. This software allows one to specify general demographic models with up to three populations and all the same demographic features (including population splitting, admixture, growth, bottlenecks and migration) as ms. It has optimization algorithm capabilities so that one can search for maximum-likelihood estimates for parameters based on the likelihood of the joint SFS. Unlike ms, this software assumes that sites are independent. All customizable settings such as grid spacing and time-step size for the numerical solver were kept at default values.
The software is designed to fit joint frequency spectrum data, a generalization of the frequency spectrum described above (xij is the number of derived alleles at frequency i in population 1 and at frequency j in population 2), but we applied it to two population marginal spectra because of the smaller size of our dataset that would provide only a very poorly sampled two-dimensional joint spectrum. For demographic inference analysis, we extracted the synonymous SNPs from all genes except those for which no outgroup information was available, MRPL30, MRPS05 and MRPS14. For the other loci, the ancestral state was estimated by comparison to either bison or gaur sequence. Where both were available, similar SFS resulted using either comparison.
Since we had a collection of samples with varying sample sizes, we needed to adjust the data for the SFS analysis by projecting the data down to a common sample size of 10 chromosomes (Marth et al. 2004). Enforcing a requirement of a minimum of 10 chromosomes per subpopulation per gene, we also discarded TLR9. The remaining data involved 12 genes with a total of 197 synonymous (or intronic or in a UTR region) segregating sites. We aggregated the African and European B. taurus samples, as they were determined to have similar SNP distributions by a principal components analysis (PCA), shown in figure 1.
Given a demographic model, dependent on demographic parameters such as time of taurus–indicus split, bottleneck size and duration and migration rates, we ran ai to generate an expected SFS and then optimized the parameter values by minimizing a χ2-measure of goodness of fit between the data and the model output (the χ2-statistic was chosen as it is standard for goodness-of-fit tests and approximates the more theoretically based Poisson log-likelihood underlying the SFS framework). The final parameters from the minimization routine are point estimates for the demographic parameters. To assign errors to these estimates, we followed the procedure in Gutenkunst et al. (2009): we generated sequences for each gene in the dataset using ms, where the number of segregating sites and sample sizes were set to correspond to the same values in the data, while the demographic model and parameters for ms came from the point estimates. For 74 SFSs generated in this manner, we then re-optimized the fit using ai to produce 74 point estimates for the parameters. The standard deviations of each parameter across these 74 estimates were then identified as uncertainty estimates.
To assess goodness of fit, we compare the χ2-measure of fit for the final inferred demographic model to the distribution of χ2-values for the best-fit-simulated datasets under the same model. The χ2-value for the best-fit demographic model is 0.023 which is consistent with the values from simulated datasets (minimum = 0.015, 1st quartile = 0.035, median = 0.056, 3rd quartile = 0.088, maximum = 0.29). Four simulated datasets out of the 74 have lower, and the remaining simulated datasets have higher, χ2-values. This implies that, using the simulated datasets to provide a null distribution, the fit to the real data is not unusually bad and neither is it an example of overfitting.
In order to convert to physical units we estimated the ancestral population size (Na). We used the population mutation rate θ = 4NaμL, where μ was assumed to be the average mammalian rate 2.2E–9 per year (or 11E–9 per generation assuming that one generation is equal to 5 years (Kumar & Subramanian 2002)), and L is the effective sequence length (37 000 base pairs in our case). In a neutral model, θ can be estimated by Watterson's formula (S/∑(1/i)). Alternatively, under the diffusion approach θ can be estimated by ∑(data)/∑(model); (Bustamante et al. 2001). The ∑(data) is simply the observed number of segregating sites and ∑(model) is the sum of the expected SFS using the best-fit parameters of our model. This yields a θ of 23, so that we estimate Na = 23/(4 × 3700 × 11E−9) = 14 127.
In total, the 37 011 bp of sequence yielded 303 SNPs, giving an average frequency of one SNP every 122 bp. There was a great variation between genes and between populations in terms of the number of SNPs identified, with 50 polymorphic sites being identified in TRIF and none in TLR7. As widely differing numbers of sites were sequenced for each region, it is more useful to consider the SNP frequencies than the absolute number of SNPs identified. Bos indicus showed the highest frequency of polymorphic sites while the African B. taurus showed the lowest. In the African B. taurus, on average a SNP was identified every 370 bp, in European B. taurus a SNP was identified every 204 bp, and in B. indicus a SNP was identified every 167 bp.
No SNPs were identified in TLR7 (826 bp sequenced) within any population. No SNPs were identified in TLR5 in either of the B. taurus populations (412 bp sequenced). Additionally, four other genes showed SNP frequencies of less than one SNP per kilobyte. These were TNFAIP8L1 in African B. taurus (one SNP every 1081 bp), PRF1 in European B. taurus (one SNP every 1107 bp), FEZL in African B. taurus (one SNP every 2141 bp) and IL2 in African B. taurus (one SNP every 3461 bp).
A number of genes showed much higher than average SNP frequencies. The following genes had a SNP frequency of greater than one SNP every 100 bp: ART4 in B. indicus (one SNP every 57 bp), TLR4 in B. indicus (one SNP every 73 bp) and MRPS05 in B. indicus (one SNP every 79 bp) and TLR9 in B. indicus (one SNP every 94 bp).
Table 1 lists summary statistics for each gene sequence within each continental sample. Overall, haplotype diversity is lowest in African B. taurus (11 out of the 16 genes) and highest in B. indicus (10/16). European B. taurus has the highest value for 3/16 and the lowest for 3/16. Although the African B. taurus typically has the lowest haplotype diversity, there are two genes for which it has the highest haplotype diversity. These are TLR9 and MRPS14. Only one gene, TRIF (also known as TICAM1), shows significantly lower than expected haplotype diversity in all three populations.
The unfolded SFS for both B. taurus and B. indicus are shown in figure 2. There is an abundance of derived alleles with higher frequencies than expected under neutrality. This SFS is quite different in shape from that obtained by Maceachern et al. (2009) based on 7500 SNPs in Angus and Holstein breeds; especially comparing the remarkably low number of rare SNPs in their data, which was far less than expected under a neutral model. This is most probably owing to the ascertainment bias inherent by genotyping SNPs that have been discovered first in a very small sample. In our analysis European and African B. taurus samples are combined, but even taken separately, we do not see the frequency distribution observed by Maceachern et al. in their analysis (results not shown).
We sought to describe the site frequency data with a demographic model that had a minimum of parameters but which had a good fit and realistically reflected current knowledge regarding cattle demographic history. During the process of refining demographic models, we came to some conclusions concerning what features the model had to contain to correctly capture the observed frequency spectra. First, a bottleneck followed by population growth in the B. taurus population is supported by the data but the estimated time for this bottleneck precedes domestication events. Second, to correctly capture the excess of synonymous-derived alleles at high frequency in the B. taurus and B. indicus spectra, we require some non-zero migration between the species. Third, B. indicus has an unusually slow decay in its SFS, which can be explained by substantial admixture from a wild outgroup derived from an ancient split. There is also evidence of a transitory bottleneck in B. indicus that may overlap with the timeframe of the domestication process. Figure 3 shows the observed and modelled SFS for both samples, including outer five percentile boundaries derived from ms simulations.
The final demographic model containing 12 parameters and two bottlenecks is shown in figure 4.
The final model fit for each parameter and standard deviation is shown in table 2 (in genetic units and in years, fixing an ancestral population size at 14 127 and generation time at 5 years). An indication of parameter estimate uncertainties is provided by the standard deviations (in genetic units) of each derived from ms simulations (table 2).
In summary, approximately 280 000 years ago the population of B. taurus and B. indicus diverged. Soon after (approx. 270 000 years ago) the model has B. indicus ancestors split from a wild group. These ancestors endured a severe bottleneck commencing about 15 000 years ago and ending roughly 10 000 years ago. Bos indicus are only allowed to recover back to the original population size before the bottleneck but there is a recent, substantial admixture event from the wild population, contributing the majority of the modern genome. Within the B. taurus ancestral population, a severe bottleneck (0.012 of the ancestral population survived) began about 40 000 years ago and was estimated to have ended roughly 36 000 years ago. After this bottleneck there was a recovery (modelled as instantaneous growth) and the population grew to 1.36 of its ancestral size. Migration from B. taurus to B. indicus (mt→i) was low and migration rates from B. indicus to B. taurus (mi→t) were only slightly larger.
The analysis of demographic history described here has been carried out on populations defined at the continental level, that is, an African B. taurus population, a European B. taurus population and a B. indicus population. These three samples were chosen to represent the products of two separate domestication events (B. indicus and B. taurus) plus a further division between African and European taurus that stretches back at least to the Neolithic (Loftus et al. 1994; Bradley et al. 1996). Interestingly, several lines of evidence robustly suggest that the demographic history of the B. indicus population is markedly different from either of the B. taurus populations.
At a descriptive level, nucleotide diversity seems higher within indicus; in 11 out of 16 loci, π is greatest in B. indicus, and in eight of 16 loci θW is greatest in B. indicus. Haplotype diversity is also consistently higher with 11 of 16 loci giving a highest value in this sample. This strongly suggests divergence in demographic histories between the two taxa.
Our initial modelling approach (not shown) was to assume possible bottlenecks followed by population expansion in the history of each of the three sampled populations. We also allowed migration between continents. Under this approach, the B. taurus allele frequency spectra fitted a demography with a severe bottleneck but this predates by a factor of 2.75 the modelled divergence of African and European cattle which, from other evidence, probably dates to the early domestic period or possibly before (Wendorf & Schild 1994; Bradley et al. 1996; Achilli et al. 2008; Ho et al. 2008). However, the search for a maximum-likelihood fit would not allow a bottleneck within B. indicus history, and an excess of high-frequency-derived alleles in that population suggested a more complex scenario.
To account for this, and given the clear similarity between the B. taurus populations illustrated in a PCA (figure 1), we developed a second three-population model with a single B. taurus population and two B. indicus populations: one a domesticated lineage with a bottleneck and recovery, plus a second parallel wild population that was allowed to mix with the first. This allowed a closer fit of observed and modelled data.
Several features of the best-fitting model are interesting. First, the early and clearly predomestic population bottleneck persists in the ancestral demography of European and African B. taurus. When one assumes an ancestral population size calculated at 14 127 and a generation time of 5 years this maps to between 40 and 36 kyrs ago with s.d. of the order of 11 kyrs. There is uncertainty in this calibration but we note that some estimates of the ancestral population size for wild cattle are larger (Goddard & Hayes 2009), which would push these calendar estimates further into the past. It is possible that this single-modelled bottleneck represents an amalgam of a more variegated demographic history, which perhaps includes a domestication bottleneck. However, it does seem to point primarily toward an earlier episode, perhaps associated with the effects of the glacial period on the West Asian aurochs. Interestingly, ancient DNA analysis of 34 mtDNA sequences sampled from European aurochsen bones also gives a signature of population expansion that concurs with the timeframe of glacial retreat and which, by definition, cannot be a result of the domestication process (Edwards et al. 2007; Ho et al. 2008). We note that an equivalent early population constriction is missing from B. indicus ancestry, probably reflecting a more benevolent glacial ecology within South Asia. Divergence between zebu and taurine mtDNA has been calibrated repeatedly, typically being of the order of hundreds of thousands of years (most recently by Ho et al. (2008) as between 84–219 kyr ago). Our estimate of 280 kyr with s.d. 22 kyr for ancestral separation is of similar order.
The strongest contrast between West and South Asian domestic history lies in the modelled complexity within the domestication process. The comparatively large haplotypic diversity and distinctive allele frequency spectrum within B. indicus seem to defy a uniform domestication narrative. Whereas the best-fitting model allows a B. indicus ancestral bottleneck that lies conceivably within the domestication timeframe, this requires a majority input (approx. 80%) into the domestic indicine gene pool from wild admixture. We are restricted here from examining more complex models but it should be noted that this may not exclude alternatives such as multiple strands of wild ox ancestry arising within B. indicus ancestry through two or more independent domestications.
Some geographical and genetic complexity within B. indicus domestication does seem likely from prior work. mtDNA sequencing has described two clusters of indicine mtDNA haplotypes and these have a non-random distribution when samples from east and west are compared (Baig et al. 2005; Lai et al. 2006; Magee et al. 2007). This is argued as suggesting that more than one domestication process may have given rise to B. indicus. Domestic cattle first appear in the Baluchistan early agricultural center, Mehrgarh, 8000–7000 bp. This is in the context of other domesticates (goats, barley and wheat) that probably diffused eastwards from the Fertile Crescent (Fuller 2006). However, artistic representations and thoracic vertebrae with morphology typical of zebu have allowed Meadow (1987) to argue that these early cattle were B. indicus of local origin. A transition in the nature of Bos bone finds (abundance, size) points towards this as a plausible primary centre for domestication of zebu from the South Asian aurochs, Bos primigenius namadicus.
Additional centres for cattle domestication in the subcontinent are possible. The South Indian Neolithic features distinctive ashmounds that are thought to have been produced by the burning of cattle dung (Misra et al. 2001). These have been argued as representing sites of cattle pens that may have served as enclosures for capture and taming of aurochs 5000 years ago (Allchin & Allchin 1974). That these may have given rise to a separate strain of zebu is (tentatively) indicated by early representations that seem to depict cattle with longer horns and more delicate limbs in comparison to those represented in early seals from Baluchistan. More persuasively, finds of large Bos bones suggest the survival of wild cattle populations into the Neolithic in South India and the Ganges region further East (reviewed in Fuller 2006). Recruitment of wild oxen via crossbreeding, or less commonly by capture, in different regions of the subcontinent is thus eminently plausible. Recently, mitochondrial sequence diversity levels within B. indicus has been described widely through Asia and strongly suggest that highest sequence diversity and therefore domestic origins lie within the Indian subcontinent but that these may well not be restricted to the primary Baluchistan region (Chen et al. 2010).
Our model suggests admixture between the continental groups. This is not surprising for a domesticate that would have been a passenger or subject of human trade and migration. This mirrors earlier work that shows traces of African genes in European B. taurus and vice versa (Cymbron et al. 1999; Loftus et al. 1999) and ancestral exchange between Near Eastern and Indian breeds (Loftus et al. 1999; Kumar et al. 2003). For example, a huge secondary input of B. indicus genes into Africa has been described, probably reflecting an ancient Indian Ocean trade corridor stretching back over three millennia (Hanotte et al. 2002).
One issue with the analysis we present here is that many of the loci chosen for resequencing have importance in the immune response and therefore may have been subject to natural selection, which may have skewed the SFS. We use only synonymous and non-coding SFS in our analysis but a possibility remains that skew may persist because of linkage disequilibrium with adaptive sequence variants. However, the high frequency bump in our B. indicus SFS that drives the strong conclusion of majority wild admixture in Indian cattle ancestry is a feature that remains even when the three genes (TRIF, TLR4, ART4) showing highest linkage disequilibrium are removed from the analysis. Also, a skew because of selective effects should manifest in both B. indicus and B. taurus and our strongest conclusion is of divergent domestication histories between the two taxa. Finally, it should be noted that reasonable modelling inference has been achieved in human populations using sequence data with a strong bias towards immunologically important loci, although of course this does not guarantee that such will always be the case (Akey et al. 2004).
In sum, we have constructed a model of bovine population history based on SFS from SNPs sampled from 37 000 bp resequenced in African, European and Indian cattle. This model has several surprising departures from a simple domestication narrative. Bos taurus history involves a severe population bottleneck that seems to predate domestication and which may be a result of glacial habitat restriction. Bos indicus population history contrasts with this, showing no contemporaneous bottleneck, perhaps because of a more benign environment during the glacial maximum but does allow for a later, probably domestication influenced, population constriction. Most strikingly, indicine cattle have had a more complex process of sampling from the wild with a prediction of a substantial secondary admixture from a parallel wild ox population, an assertion that seems archaeologically plausible.
This material is based on work supported by the Science Foundation Ireland under Grant No. 02-IN.1-B256. EH-S was a receipt of a EU Marie Curie short term fellowship in Trinity College Dublin, a Genetime training site. A.R. Freeman and D.J. Lynn assisted in project planning.
The first two authors contributed equally to the study.
One contribution of 18 to a Discussion Meeting Issue ‘Genetics and the causes of evolution: 150 years of progress since Darwin’.