Our experiment is outlined in Figure 1. SNPs are generated by partial sequencing at ¼ coverage for each of 3 domestic breeds (a male broiler, a female layer, and a female Silkie), and comparison of the resultant reads to the 6.6x genome for the wild ancestor of domestic chickens, Red Jungle Fowl (RJF). We expect marked heterozygosity within the 3 domestic lines, but not within RJF, because the bird sequenced for the genome project is from a highly inbred line that is essentially homozygous.
Figure 1. SNP discovery experiment. We sampled 3 domestic chickens at ¼ coverage each and compared the resultant sequence to the 6.6x draft genome of Red Jungle Fowl (RJF). Chicken photographs shown here are provided by Bill Payne (RJF), Paul Hocking (broiler), ...
Comparing the sequence reads for broiler, layer, and Silkie to the genome of RJF, we identified nearly a million SNPs in each instance, at mean rates of about 5 SNP/kb, as shown in Table 1. Note that all of the “SNP rates” quoted in this paper are computed as nucleotide diversities (π) and given in units of π × 10³ (that is, per kb). After correcting for SNPs detected in more than one line, there are 2,833,578 variant sites, or one potential marker every 374 bp along the 1.06 Gb genome. To assess the reliability of these data, we resequenced 295 SNPs, each in the same bird in which it was detected (Table S1). As many as 94% of the SNPs were confirmed. However, confirmation rates are sensitive to the functional context (e.g., coding versus non-coding), and SNPs in rare categories are less likely to be confirmed. In fact, only 83% of the non-synonymous SNPs were confirmed. Small indels of a few base pairs in length (mean of 2.3 and median of 1) are detected at rates that are well correlated with the corresponding SNP rates, but smaller by about a factor of 10.
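The marker spacing quoted above follows directly from the genome size and the number of variant sites. The short sketch below reproduces that arithmetic and shows how a count of differences is converted to the per-kb units used for SNP rates. The function names are illustrative assumptions and are not taken from the original analysis.

```python
# Figures quoted in the text.
GENOME_SIZE_BP = 1.06e9      # 1.06 Gb genome
VARIANT_SITES = 2_833_578    # non-redundant variant sites across the 3 lines


def marker_spacing_bp(genome_size_bp, n_sites):
    """Average distance between variant sites, in base pairs."""
    return genome_size_bp / n_sites


def rate_per_kb(n_differences, n_bases_compared):
    """Differences per kilobase, i.e. a per-base rate multiplied by 10^3."""
    return 1000.0 * n_differences / n_bases_compared


spacing = marker_spacing_bp(GENOME_SIZE_BP, VARIANT_SITES)
print(f"one potential marker every {spacing:.0f} bp")
```

A per-base diversity of 0.005 corresponds to 5 SNP/kb in these units, which is the scale on which all rates in this section are reported.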
Table 1. Frequency of SNPs in different comparisons of RJF and the 3 domestic chicken lines. In addition, we show comparisons involving 3.8 Mb of finished BAC sequence from another line of the layer (White Leghorn) breed. SNP rates are an estimate of nucleotide ...
Chicken autosomes are sorted by size into 5 large macrochromosomes (GGA1-5), 5 intermediate chromosomes (GGA6-10), and 28 microchromosomes (GGA11-38). SNP and indel rates are independent of chromosome size, as shown in Figure 2. GGA16 is the sole exception, because it contains the highly variable MHC [10]. This result is surprising, as recombination rates on microchromosomes are much higher than on macrochromosomes [1] and studies in other organisms show a positive correlation between recombination rates and polymorphism rates [11-12]. We suspect that higher gene densities on microchromosomes counteract the effect of higher recombination rates.
Figure 2. SNP and indel rates versus chromosome number. We excluded all sequences with “random” chromosome positions. Chromosome W is not shown because of its assembly problems. The rates are computed as an average of all 3 domestic lines.
SNP rates between and within chicken lines can be determined from the overlaps between reads. Table 1 demonstrates that almost every pairwise combination gives a SNP rate of just over 5 SNP/kb, except for broiler-broiler and layer-layer, which show about 4 SNP/kb, as expected since the sequenced broiler and layer are from closed breeding lines. To ensure that there are no confounding factors from the single-read nature of our data, or the complexities of the overlap analysis, we used comparisons to 3.8 Mb of finished BAC sequence of a different White Leghorn [13], from the same breed but not the same line as the layer sequenced herein. Fifteen chromosomes were sampled, and the results confirm our rates of 5 SNP/kb. In another study of 15 kb of introns in 25 birds from 10 divergent breeds of domestic chickens [14], an autosomal rate of 6.5 SNP/kb was reported.
To quantify SNP and indel rate variation versus functional context, we considered three gene sets representing 3868 confirmed mRNA transcripts, 995 chicken orthologs of human disease genes, and 17,709 Ensembl annotations from the RJF analysis [1]. Complete details for all 3 lines are tabulated in the supplements (Table S2). An excerpt for broiler is shown in Table 2. Within genes defined by mRNA transcripts, the SNP rates are 3.5, 2.1, 5.7, and 3.4 SNP/kb in 5′-UTR, coding exon, intron, and 3′-UTR regions, respectively. In coding regions, indel rates are 43 times smaller than SNP rates. Ka/Ks is 0.098, similar to what is typically seen in vertebrate comparisons. We also studied “conserved non-coding regions” from the RJF analysis [1]. SNP rates are similar to those of coding exons, but indel rates are intermediate between those of coding exons and UTRs, which supports the notion that these regions are functional but may not encode proteins.
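To make the relative magnitudes concrete, the sketch below recomputes two derived quantities from the broiler figures quoted above: each region's SNP rate relative to introns, and the coding indel rate implied by the 43-fold ratio. The dictionary and variable names are illustrative only, and the derived values are simple arithmetic on the quoted numbers rather than results reported in the paper.

```python
# SNP rates (SNP/kb) within mRNA-defined genes, as quoted in the text.
SNP_RATE_PER_KB = {
    "5'-UTR": 3.5,
    "coding exon": 2.1,
    "intron": 5.7,
    "3'-UTR": 3.4,
}
# In coding regions, indel rates are 43 times smaller than SNP rates.
SNP_TO_INDEL_RATIO_CODING = 43.0

# Each region's SNP rate as a fraction of the intron rate.
relative_to_intron = {
    region: rate / SNP_RATE_PER_KB["intron"]
    for region, rate in SNP_RATE_PER_KB.items()
}

# Coding indel rate implied by the quoted ratio.
coding_indel_rate = SNP_RATE_PER_KB["coding exon"] / SNP_TO_INDEL_RATIO_CODING

for region, ratio in relative_to_intron.items():
    print(f"{region}: {ratio:.2f} of the intron SNP rate")
print(f"implied coding indel rate: {coding_indel_rate:.3f} indels/kb")
```

Coding exons carry well under half the intron SNP rate, and the implied coding indel rate is roughly one event per 20 kb, which illustrates how strongly coding sequence is constrained against indels.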
The utility of these SNPs depends on their frequency of occurrence in commonly used chicken populations. Hence, we typed 125 SNPs (including coding and non-coding SNPs, randomly distributed across the chicken genome) in 10 unrelated individuals from each of 9 divergent lines representing an assortment of European breeds. This collection includes commercial broiler and layer breeds, standardized breeds selected for their morphological traits, and an unselected breed from Iceland (Table S3). Both alleles segregated in 73% of 1113 successful marker-line combinations (out of 1125 possible combinations). The average minor allele frequency is 27%, but it decreases to 20% if marker-line combinations where one of the two alleles is fixed are included. This indicates that a majority of the SNPs are common variants that predate the divergence of modern breeds. Only 12% of the markers had a minor allele frequency of less than 10% in the 90 animals tested.
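The two averages quoted above are consistent with each other if fixed marker-line combinations are counted with a minor allele frequency of zero. The sketch below checks this, along with the count of possible combinations, and shows how a minor allele frequency is obtained from allele counts. It assumes the 73% and 27% figures refer to the same set of successful combinations; the function name and the example counts are hypothetical.

```python
N_MARKERS = 125
N_LINES = 9
SEGREGATING_FRACTION = 0.73   # combinations in which both alleles segregate
MEAN_MAF_SEGREGATING = 0.27   # average MAF over segregating combinations


def minor_allele_frequency(count_allele_1, count_allele_2):
    """Frequency of the rarer of two alleles, given their counts."""
    total = count_allele_1 + count_allele_2
    if total == 0:
        raise ValueError("no alleles typed")
    return min(count_allele_1, count_allele_2) / total


possible_combinations = N_MARKERS * N_LINES

# Fixed combinations contribute a MAF of 0, so the overall mean is the
# segregating mean weighted by the segregating fraction.
mean_maf_all = SEGREGATING_FRACTION * MEAN_MAF_SEGREGATING

print(possible_combinations)
print(f"{mean_maf_all:.3f}")

# Hypothetical line: 10 diploid birds give 20 alleles, here split 15 to 5.
print(minor_allele_frequency(15, 5))
```

The weighted mean comes out just under 0.20, matching the 20% figure once fixed combinations are included.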
We demonstrate by example how these data can be used to target specific genome regions. Details of our experiments are in Supplement E (Examples). First, we consider a body weight related QTL on GGA4 that was previously mapped to a 150 cM interval [15,16]. After a year of effort, in which every known microsatellite (>50) was tested, 26 informative markers were developed. Further progress would have required the laborious sequencing of multiple chickens to find additional polymorphisms in this target region. With the SNP map, we selected 47 random broiler-layer SNPs, and ABI SNPlex assays were developed to type an experimental F2 cross (n = 466). Twenty-eight (60%) of these SNPs were informative, but none had breed-specific alleles, confirming that most variations predate domestication. In just one month, we doubled the number of markers and resolved the initial QTL into two QTLs that affect body weight at 3 and 9 weeks of age.
In addition to providing markers for fine mapping, these SNPs are a rich source of candidate polymorphisms for the causative differences underlying important traits. As an example, candidate genes for disease resistance often include TGF-β [17,18], and the major histocompatibility complex (MHC). We thus identified 40 SNPs from the SNP map in the coding or promoter regions of 12 cytokine genes. When typed against 8 inbred layer lines, 32 of these SNPs were informative. Cytokine genes on GGA13, including IL4, two genes that are expressed in T helper-2 (Th2) cells, drive antibody response. Four of the six SNPs that were polymorphic among lines were in IL4, and these SNPs were fixed for different alleles in lines N and 15I, which show differential antibody response to vaccination [20]. These SNPs therefore allow us to test whether the IL4 loci directly determine the observed differential antibody response.