We have sequenced, annotated and analysed the MHC-B gene cluster of the black grouse. The black grouse is a wild bird species and represents the lineage Tetraoninae in the Galliformes. With the availability of its MHC sequence and several other fully sequenced galliform MHC regions, we now have, for the first time, the opportunity to perform a comparative genomic study of the avian MHC. The MHC-B gene cluster of black grouse is just as simple and streamlined as that of chicken. By contrast, the quail MHC-B has more duplicated genes and pseudogenes (10 BLB, 7 BF and 8 BG loci) than that of black grouse. The turkey MHC-B and the golden pheasant MHC-B, which are phylogenetically closer to black grouse than chicken and quail are, also have expanded BLB genes. Our results provide additional evidence that the extremely compact nature of the chicken MHC is not merely an artefact of domestication, since we find a similar pattern in a related wild species that is fully outbred.
The nucleotide sequence of the black grouse MHC-B shows high similarity with that of other galliform birds (Table). However, individual MHC genes might have different evolutionary histories. The phylogenetic tree based on the entire MHC-B sequence shows exactly the same topology as the tree based on neutral markers. When we used the coding sequence of each gene independently, however, only TAPBP, BRD2, DMA, DMB1, BF1 and TAP2 shared the tree topology of the neutral markers (Figure). Interestingly, for the genes Blec1, DMB2, TAP1 and BF2, the black grouse is more divergent than turkey and pheasant, while for the two BLB genes (BLB1 and BLB2), black grouse is closer to pheasant than to turkey (Figure, Additional file). If we use dN/dS values to estimate the selection pressure on the genes, we find that the genes following the neutral phylogenetic expectation generally have lower dN/dS values than genes with aberrant tree topologies (Figure). Taken together, the deviation from neutral phylogenetic patterns and the elevated dN/dS levels indicate that the molecular evolution of several genes in the galliform MHC region is affected by selective forces. In particular, the MHC class IIB genes (BLB1 and BLB2) show elevated levels of dN/dS. The peptide binding regions of these genes are classical examples of balancing selection. An intriguing possibility is that the clustering of the grouse BLB and pheasant BLB is due to specific selection in the wild, since both were sampled from natural populations, but this hypothesis needs further confirmation.
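The comparison between gene trees and the neutral tree can be illustrated with a minimal sketch that checks whether two rooted trees contain the same set of clades. The nested-tuple representation, the function names and the three toy topologies are assumptions made for illustration only; they are not the trees or the software used in this study.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted nested-tuple tree.

    Leaves are strings; internal nodes are tuples of subtrees.
    Each clade is a frozenset of the leaf names below one internal node.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return found


def same_topology(tree_a, tree_b):
    """Two rooted trees share a topology if they contain the same clades."""
    return clades(tree_a) == clades(tree_b)


# Hypothetical toy topologies (illustrative, not results from the paper).
neutral = (("chicken", "quail"), ("pheasant", ("turkey", "grouse")))
gene_congruent = (("quail", "chicken"), (("grouse", "turkey"), "pheasant"))
gene_aberrant = (("chicken", "quail"), ("turkey", ("pheasant", "grouse")))

congruent = same_topology(neutral, gene_congruent)
aberrant = same_topology(neutral, gene_aberrant)
conflicting_clades = clades(neutral) ^ clades(gene_aberrant)
```

Here `gene_congruent` differs from `neutral` only in branch order, so it counts as the same topology, whereas `gene_aberrant` places grouse with pheasant and therefore conflicts with the neutral tree in two clades.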
Another striking finding of the comparison of galliform MHC-B regions is the repeated inversion of the TAPBP gene and the TAP1-TAP2 block (Figure). Using data from all available galliform MHC sequences, we found that the inversion of the TAPBP gene, located between the two MHC class IIB loci, seems to have happened once in the clade: either in the lineage leading to chicken and quail or in the lineage of pheasant, turkey and grouse, depending on the ancestral state. By contrast, the inversion of the TAP1-TAP2 gene block has occurred at least twice during the evolution of this clade; the exact number depends on the ancestral state, which we cannot determine from our data. The TAP1-TAP2 block is flanked by the two class I genes, BF1 and BF2. Gene conversion and interlocus recombination have been reported before in the evolution of MHC genes. Our result could provide indirect evidence for such events: if gene conversion occurred repeatedly, the non-random breakpoints beside the two BF loci may have led to the inversion of the TAP1-TAP2 gene block between them. However, this needs to be tested further.
In this study, we constructed a fosmid library and used it to screen for the MHC genes. Fosmid libraries have been widely used in large genome projects, such as gap closure of the human genome, and in metagenomic analyses. The success of our experiment demonstrates that a fosmid library is also a suitable and convenient way to sequence specific genomic regions of a species for which no genome map is available. To verify the expression of the identified MHC genes, we mapped the transcriptome data of a 454 sequencing project to the MHC region. This allowed us to efficiently confirm the expression of 17 of the identified genes. However, due to the limited 454 sequencing depth, it was not possible to cover all 19 putatively expressed genes. Moreover, not all exons of the expressed genes were verified. This could be due to limited sequencing coverage, alternative splicing or artefacts of mapping reads to short exons.
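The exon-level verification described above can be sketched as a simple interval computation: for each annotated exon, measure the fraction of its bases covered by at least one mapped read. This is only an illustration under stated assumptions; the coordinates, the half-open interval convention, the function names and the 0.5 threshold are hypothetical and do not reproduce the mapping procedure used in this study.

```python
def covered_fraction(exon, reads):
    """Fraction of exon bases covered by at least one read.

    Intervals are half-open (start, end) pairs on the same sequence.
    """
    start, end = exon
    # Keep only overlapping reads and clip them to the exon boundaries.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in reads if s < end and e > start
    )
    covered = 0
    cursor = start  # everything left of the cursor is already counted
    for s, e in clipped:
        if e > cursor:
            covered += e - max(s, cursor)
            cursor = e
    return covered / (end - start)


def verified_exons(exons, reads, min_fraction=0.5):
    """Names of exons whose covered fraction reaches min_fraction.

    min_fraction is an arbitrary illustrative threshold.
    """
    return [
        name
        for name, interval in exons.items()
        if covered_fraction(interval, reads) >= min_fraction
    ]


# Hypothetical coordinates for illustration only.
exons = {"exon1": (100, 200), "exon2": (300, 330), "exon3": (500, 600)}
reads = [(90, 150), (140, 160), (190, 250), (520, 540)]

fraction_exon1 = covered_fraction(exons["exon1"], reads)
supported = verified_exons(exons, reads)
```

In this toy input the short `exon2` receives no reads at all and `exon3` is only partly covered, which mirrors how short exons or low sequencing depth can leave exons of an expressed gene unverified.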