This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Diverse TR and IG repertoires are generated by V(D)J somatic recombination. Genomic studies have been pivotal in cataloguing the V, D, J and C genes present in the various TR/IG loci and describing how duplication events have expanded the number of these genes. Such studies have also provided insights into the evolution of these loci and the complex mechanisms that regulate TR/IG expression. In this study we analyze the sequence of the third bovine genome assembly to characterize the germline repertoire of bovine TRB genes and compare the organization, evolution and regulatory structure of the bovine TRB locus with that of humans and mice.
The TRB locus in the third bovine genome assembly is distributed over 5 scaffolds, extending to ~730 Kb. The available sequence contains 134 TRBV genes, assigned to 24 subgroups, and 3 clusters of DJC genes, each comprising a single TRBD gene, 5–7 TRBJ genes and a single TRBC gene. Seventy-nine of the TRBV genes are predicted to be functional. Comparison with the human and murine TRB loci shows that the gene order, as well as the sequences of non-coding elements that regulate TRB expression, are highly conserved in the bovine. Dot-plot analyses demonstrate that expansion of the genomic TRBV repertoire has occurred via a complex and extensive series of duplications, predominantly involving DNA blocks containing multiple genes. These duplication events have resulted in massive expansion of several TRBV subgroups, most notably TRBV6, 9 and 21, which contain 40, 35 and 16 members respectively. Similarly, duplication has led to the generation of a third DJC cluster. Analysis of cDNA data confirms the diversity of the TRBV genes and, in addition, identifies a substantial number of TRBV genes, predominantly from the larger subgroups, which are still absent from the genome assembly. The observed gene duplication within the bovine TRB locus has created a repertoire of phylogenetically diverse functional TRBV genes, which is substantially larger than that described for humans and mice.
The analyses completed in this study reveal that, although the gene content and organization of the bovine TRB locus are broadly similar to that of humans and mice, multiple duplication events have led to a marked expansion in the number of TRB genes. Similar expansions in other ruminant TR loci suggest strong evolutionary pressures in this lineage have selected for the development of enlarged sets of TR genes that can contribute to diverse TR repertoires.
Diverse αβTR repertoires are crucial to the maintenance of effective T cell-mediated immunity. Estimates based on direct measurement indicate that, in humans and mice, individuals express a repertoire of approximately 2 × 10^7 and 2 × 10^6 unique αβTRs respectively. As with the other antigen-specific receptors (IG of B cells and γδTRs of γδT cells), diversity is generated in lymphocytic precursors by somatic recombination of discontiguous variable (V), diversity (D – TRB chains but not TRA chains) and joining (J) genes to form the membrane-distal variable domains. Diversity is derived both from the different permutations of V(D)J genes used to form the TRA and TRB chains expressed by individual thymocytes (combinatorial diversity) and from the activity of terminal deoxynucleotide transferase and exonuclease at the V(D)J junction during recombination (junctional diversity). Consequently, much of the diversity is focused in the third complementarity determining region (CDR3), which is encoded by the V(D)J junction and forms the most intimate association with the antigenic peptide component of the peptide-MHC (pMHC) ligand of αβTRs, whereas the CDR1 and CDR2 of the TRA and TRB chains, which predominantly interact with the MHC, are encoded within the germline V genes [4,5].
TRB chain genes are located in the TRB locus, which in humans is ~620 Kb long and situated on chromosome 7 and in mice is ~700 Kb and located on chromosome 6 [6-8]. In both species, the organisation of TRB genes is similar, with a library of TRBV genes positioned at the 5' end and 2 DJC clusters (each composed of a single TRBD, 6–7 TRBJ and a single TRBC gene) followed by a single TRBV gene with an inverted transcriptional orientation located at the 3' end [9,10]. The germline repertoire of TRBV genes in humans is composed of 65 genes belonging to 30 subgroups (genes with > 75% nucleotide identity), whilst in mice the repertoire comprises 35 genes belonging to 31 subgroups [10-12]. The disparity between the number of TRBV genes in the 2 species is the result of multiple duplication events within the human TRB locus, most of which have involved tandem duplication of blocks of DNA (homology units) containing genes from more than one subgroup [10,13].
V(D)J recombination is initiated by site-specific DNA cleavage at recombination signal sequences (RSs) mediated by enzymes encoded by recombination activating genes (RAG) 1 and 2. RSs comprise conserved heptamer and nonamer sequences separated by spacers of either 12 bp (12-RS – located 5' to TRBD and TRBJ genes) or 23 bp (23-RS – located 3' to TRBV and TRBD genes). Correct V(D)J assembly is achieved as recombination can only occur between genes flanked with RS of dissimilar length (the '12/23 rule') and direct TRBV/TRBJ recombination is prohibited by the 'beyond 12/23' phenomenon [15-17]. As with other antigen-specific receptor loci, recombination in the TRB locus is under strict lineage-, stage- and allele-specific regulation associated with control of RAG accessibility to RSs mediated through alterations in chromatin structure (the 'accessibility hypothesis') [18-20]. Numerous studies have shown that both the TRB enhancer (Eβ) and transcriptional promoters within the TRB locus serve as RAG accessibility control elements, playing a critical role in regulating chromatin structure and therefore recombination of TRB genes [21-27].
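The pairing constraints described above can be illustrated with a minimal sketch. The spacer lengths are taken from the text; the data layout and the function names (`satisfies_12_23_rule`, `is_permitted`) are hypothetical and serve only to make the logic explicit. The 'beyond 12/23' restriction is modelled here simply as a hard-coded exclusion of direct TRBV-to-TRBJ joining, which is an assumption of this sketch rather than a description of the underlying mechanism.

```python
# Spacer lengths (bp) of the RS flanking each gene type, as described in the text.
# The dictionary layout and function names are illustrative assumptions.
RS_SPACERS = {
    "TRBV": {"3prime": 23},
    "TRBD": {"5prime": 12, "3prime": 23},
    "TRBJ": {"5prime": 12},
}


def satisfies_12_23_rule(upstream: str, downstream: str) -> bool:
    """True if the 3' RS of `upstream` and the 5' RS of `downstream`
    have dissimilar spacer lengths (one 12 bp, one 23 bp)."""
    up = RS_SPACERS[upstream].get("3prime")
    down = RS_SPACERS[downstream].get("5prime")
    if up is None or down is None:
        return False  # no RS on the relevant side, so no join is possible
    return {up, down} == {12, 23}


def is_permitted(upstream: str, downstream: str) -> bool:
    """Apply the 12/23 rule, then exclude direct TRBV-to-TRBJ joining,
    which satisfies the 12/23 rule but is blocked 'beyond 12/23'."""
    if (upstream, downstream) == ("TRBV", "TRBJ"):
        return False
    return satisfies_12_23_rule(upstream, downstream)


for pair in [("TRBV", "TRBD"), ("TRBD", "TRBJ"), ("TRBV", "TRBJ")]:
    print(pair, satisfies_12_23_rule(*pair), is_permitted(*pair))
```

Under these assumptions TRBV-to-TRBD and TRBD-to-TRBJ joins are permitted, whereas TRBV-to-TRBJ passes the spacer check but is excluded by the second rule.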
Current knowledge of the TRB gene repertoires of agriculturally important artiodactyl species (e.g. pigs, cattle and sheep) is limited. Published analyses of rearranged TRB transcripts have demonstrated the expression of 19 TRBV subgroups in pigs [28,29], 13 subgroups in sheep and 17 subgroups in cattle, some of which have undergone extensive duplication [31-34]. Information on the genomic organisation of the TRB loci is predominantly restricted to the DJC region, which in the pig was found to be composed of 2 tandemly arranged DJC clusters but in sheep contained 3 tandemly arranged DJC clusters. Preliminary analysis of a BAC clone corresponding to part of the DJC region indicates that in cattle the DJC region may also consist of 3 DJC clusters.
Sequencing of the complete TRB loci in human and mice allowed the repertoire of TRB genes in these species to be fully characterised and also permitted analysis of the organisation, regulation and evolution of this immunologically important locus [9,10]. In this study we have used the sequence of the third bovine genome assembly (Btau_3.1) to further study the bovine TRB repertoire and TRB locus. Although the sequence of the TRB locus is incomplete, the results reveal that duplication within the locus has been prolific leading to a massive expansion of TRBV gene numbers and the generation of a third DJC cluster. Furthermore, the analysis shows that the genomic organisation of the TRB locus and the non-coding elements that regulate TRB expression are highly conserved in cattle when compared to that of humans and mice.
A total of 134 TRBV genes, distributed over 5 scaffolds, were identified in Btau_3.1 (Additional File 1). Consistent with data from fluorescent in situ hybridisation studies, the majority of the TRBV genes were located on 2 scaffolds (Chr4.003.105 [91 TRBV] and Chr4.003.108 [21 TRBV]) mapped to chromosome 4, whilst the remaining genes were located on 3 scaffolds (ChrUn.003.1717 [18 TRBV], ChrUn.003.4367 [3 TRBV] and ChrUn.003.12588 [1 TRBV]) which have not been assigned a chromosomal location. Within the scaffolds are several regions of undetermined sequence, including large areas of ~35 Kb and ~147 Kb on Chr4.003.105 and Chr4.003.108 respectively.
Each TRBV gene is composed of i) a short leader (L) exon, generally of ~50 bp, ii) a single intron of between ~80 and ~500 bp and iii) a variable (V) exon of ~300 bp, immediately flanked at the 3' end with a 23-RS. Comparison of the nucleotide sequence of each of the bovine TRBV genes with human TRBV gene sequences revealed maximum levels of similarity between the species ranging from 71.8% to 83.15% for all except one of the bovine TRBV genes. On the basis of these results, bovine TRBV genes were considered orthologues of their most similar human counterpart and were assigned to subgroups named according to the orthologous human subgroup (Table 1). The single bovine TRBV gene that lacked significant homology to any of the human TRBV genes displayed 76.6% identity with the murine TRBV1 gene (which lacks a human orthologue) and was placed in subgroup TRBVX. The subgroups thus established generally adhered to the definition of members within a subgroup exhibiting > 75% nucleotide sequence identity. However, the single member of the TRBV10 subgroup displayed > 75% identity to all of the TRBV6 genes and the identity between members of the TRBV9 and TRBV5 subgroups was often > 75% (data not shown). Conversely, a single member of the TRBV19 subgroup (TRBV19f) showed only 63.0–64.8% nucleotide identity with the other members of this subgroup.
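The > 75% nucleotide identity criterion for subgroup membership can be expressed as a short sketch. The functions below are hypothetical and assume that the two sequences have already been aligned to equal length, with `-` marking gaps; the alignment method and the treatment of gapped columns (ignored here) are assumptions of this example and are not specified in the text. The toy sequences are invented for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned, ungapped columns of two sequences.

    Assumes `a` and `b` are pre-aligned to the same length; columns in which
    either sequence has a gap ('-') are skipped (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def same_subgroup(a: str, b: str, threshold: float = 75.0) -> bool:
    """Apply the subgroup definition: strictly more than 75% nucleotide identity."""
    return percent_identity(a, b) > threshold


# Toy sequences (not real TRBV genes): 7 of 8 aligned positions match.
print(percent_identity("ACGTACGT", "ACGTACGA"))  # 87.5
print(same_subgroup("ACGTACGT", "ACGTACGA"))     # True
```

As the exceptions noted above show (TRBV10 versus TRBV6, and TRBV19f), a fixed threshold of this kind is a guide to subgroup assignment rather than a strict partition.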
Of the 24 bovine subgroups present in the genome assembly, 11 have multiple members. Subgroups TRBV6, 9 and 21 have all undergone substantial expansion, having 40, 35 and 16 members respectively – together representing 68% of the total Btau_3.1 TRBV gene repertoire. Southern blot analysis corroborates the presence of large numbers of TRBV6 and 9 genes in the genome (Figure 1).
A prominent feature of the genomic organisation of TRBV genes (Figure 2) is that members of expanded subgroups are generally intercalated with members of other expanded subgroups in a recurrent pattern. Thus, a 165 Kb region of Chr4.003.105 and virtually all of scaffold ChrUn.003.1717 are composed of alternating TRBV6 and 9 genes (reflected in the similarity in the patterns of larger bands (> 4.3 Kb) obtained in Southern blots of genomic DNA when hybridised with TRBV9- and TRBV6-specific probes in Figure 1), whilst the 3' end of Chr4.003.105 and the 5' end of Chr4.003.108 contain repeated units comprising TRBV18, 19, 20 and 21 genes. Dot-plot analyses indicate that this organisation has arisen through a series of complex tandem duplication events within the regions in which TRBV9 and 6 genes and TRBV18, 19, 20 and 21 genes are located (Figure 3). Six homology units, ranging in size from ~7 Kb to ~31 Kb and encompassing from 1 to 11 TRBV genes, were identified. Three of these homology units (represented by the orange, dark blue and black bars in Figure 2) have undergone multiple (2–3) duplications: variation in the length of the different copies of these homology units (represented by broken lines in Figure 2) suggests that either i) distinct iterations of a duplication event have involved different components of the homology unit or ii) the different copies have been subject to different post-duplication deletions.
The levels of nucleotide identity between TRBV genes in corresponding positions in homology units are frequently high: 12 pairs of TRBV6 genes, 11 pairs of TRBV9 and 1 pair each of TRBV19 and TRBV20 have identical coding sequences, whilst 1 pair of TRBV4 genes and 3 pairs of TRBV21 as well as 4 triplets of TRBV6 and 4 triplets of TRBV9 genes have > 97% sequence identity in the coding region.
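The principle behind the dot-plot analyses used to detect homology units can be sketched as follows. This is a minimal exact word-match dot plot, not the software used in this study; the function name `dot_plot`, the word size and the toy sequence are assumptions chosen for illustration. When a sequence is plotted against itself, a tandem duplication appears as a run of dots parallel to the main diagonal, offset by the length of the duplicated unit.

```python
from collections import defaultdict


def dot_plot(seq_x: str, seq_y: str, word: int = 4) -> set:
    """Return the set of (i, j) positions at which the `word`-length
    substrings seq_x[i:i+word] and seq_y[j:j+word] are identical."""
    index = defaultdict(list)
    for j in range(len(seq_y) - word + 1):
        index[seq_y[j:j + word]].append(j)
    dots = set()
    for i in range(len(seq_x) - word + 1):
        for j in index.get(seq_x[i:i + word], ()):
            dots.add((i, j))
    return dots


# Toy example: an 8 bp unit duplicated in tandem (invented sequence).
unit = "ACGTTGCA"
seq = unit + unit
dots = dot_plot(seq, seq, word=4)

# Dots lying 8 positions off the main diagonal mark the duplicated unit.
off_diagonal = sorted(d for d in dots if d[1] - d[0] == len(unit))
print(off_diagonal)
```

In a real analysis, interruptions in such off-diagonal lines would correspond to insertions, deletions or divergence that followed the duplication.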
A total of 3 TRBD, 18 TRBJ and 3 TRBC genes were identified in the assembly (Additional File 1). These genes were all located within a ~26 Kb region of scaffold Chr4.003.108 and organised into 3 tandemly arranged clusters, each of ~7 Kb length and composed of a single TRBD gene, 5–7 TRBJ genes and a single TRBC gene (Figure 2). Dot-plot analysis reveals that the presence of a third DJC cluster is attributable to duplication of a ~7 Kb region, one copy of which incorporates TRBC1, TRBD2 and the TRBJ2 cluster whilst the other copy incorporates TRBC2, TRBD3 and the TRBJ3 cluster (Figure 4). Numerous interruptions in the line representing the duplicated region indicate that there has been significant post-duplication deletion/insertion related modification of the duplicated region.
The nucleotide and deduced amino acid sequences of the 3 TRBD and 18 TRBJ genes, as well as the flanking RS, are shown in Figures 5a and 5b respectively. The 13 bp (TRBD1) or 16 bp (TRBD2 and 3) TRBD genes are G-rich and encode at least one glycine in all 3 potential reading frames, with the exception of the 3rd reading frame of TRBD1. The TRBJ genes range in size from 43 bp to 59 bp and all encode the canonical FGXG amino acid motif that defines TRBJ genes.
As with all mammalian TRBC genes so far characterised, bovine TRBC1 and TRBC3 genes are composed of 4 exons, 3 introns and a 3'UTR. The structure of the TRBC2 gene is anticipated to be the same but, due to a region of undetermined sequence between exons 1 and 3, we were unable to identify exon 2. The exon nucleotide sequences of TRBC1 and 3 are very similar (97%), resulting in the encoded 178 amino acid products differing by only 5 residues – 3 in the extra-cellular domain and 2 in the cytoplasmic domain (Figure 6a). The incomplete sequence for TRBC2 is predicted to encode a product identical to that of TRBC1. In contrast to the high levels of pairwise identity between the exonic nucleotide sequences of all 3 TRBC genes, the nucleotide sequences of the 3rd intron and the 3'UTR of TRBC3 show low identity with TRBC1 and 2, whereas the latter two genes show a high level of identity (Figure 6b). The similarity in the lengths of TRBD2 and 3, the phylogenetic clustering of TRBJ2 and TRBJ3 genes in corresponding genomic positions (Figure 7) and the similarity in the sequences of the 3rd introns and 3'UTRs of TRBC1 and 2 all reflect the duplication history of the DJC region as described in Figure 4.
Computational analysis was used to predict the functional competency of the TRBV, TRBD and TRBJ genes present in the genome assembly. Fifty-five (41%) of the TRBV genes identified are predicted to be pseudogenes (Additional File 2), whilst TRBJ1-2 (which has a 1 bp deletion that results in the canonical FGXG motif being lost in the ORF) and TRBJ1-3 (which lacks a RS that is compatible with somatic recombination) are also predicted to be non-functional (Figure 5). Thus, the functional repertoire comprises 79 (59%) TRBV genes (comprising 66 unique coding TRBV sequences) belonging to 19 different subgroups, 3 TRBD genes and 16 TRBJ genes. This provides a potential 3168 (66 × 3 × 16) unique VDJ permutations that can be used during somatic recombination of TRB chains.
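The arithmetic behind the functional repertoire figures quoted above can be reproduced with a few lines. The gene counts are those given in the text; the function name `vdj_permutations` is hypothetical. Note that the product counts combinatorial V-D-J permutations only and takes no account of junctional diversity.

```python
def vdj_permutations(unique_v: int, d_genes: int, j_genes: int) -> int:
    """Number of distinct V-D-J combinations from functional germline genes."""
    return unique_v * d_genes * j_genes


total_trbv = 134        # TRBV genes identified in Btau_3.1
pseudogenes = 55        # TRBV genes predicted to be pseudogenes
functional_trbv = total_trbv - pseudogenes

print(functional_trbv)                                   # 79
print(round(100 * functional_trbv / total_trbv))         # 59 (%)
print(vdj_permutations(unique_v=66, d_genes=3, j_genes=16))  # 3168
```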
Phylogenetic analysis demonstrates that the repertoire of functional TRBV genes is diverse (Figure 8), with representatives in each of the 6 phylogenetic groups (A-F) described previously in humans and mice [13,39]. The phylogenetic groupings were supported by high (99%) bootstrap values (PB), with the exception of group A (PB = 76%). Maximum likelihood analysis using a variety of nucleotide models provides essentially similar phylogenetic clustering (data not shown), indicating the reliability of the tree presented in Figure 8. The extensive intermingling of murine, human and bovine TRBV subgroups is consistent with the establishment of distinct subgroups having occurred prior to mammalian radiation. Conversely, the formation of distinct clades of TRBV genes of orthologous subgroups from different species (e.g. TRBV6 genes from human and bovine form distinct clades) indicates that duplication within subgroups has predominantly occurred post-speciation. Despite this and the substantial disparity in the number of functional TRBV genes present in the 3 species, the distribution amongst the different phylogenetic groups is similar (Figure 8b). Phylogenetic groups C and F form a minor component of the functional TRBV repertoire, whilst the contributions from groups B and D are also fairly modest. In contrast, group E and, to an even greater extent, group A are over-represented, together representing between 61.9% (in the mouse) and 81.6% (in humans) of the total functional repertoire.
Phylogenetic analysis resolves the functional TRBJ genes in humans, mice and Btau_3.1 into 12 groups (Figure 7). With the exception of group 8, each group is supported by high PB values and is composed of orthologues that share a conserved order in the genome; consistent with the duplication history of the DJC region, TRBJ genes from both the 2nd and 3rd bovine DJC clusters group together with the respective genes from the 2nd murine and human DJC clusters. Group 8, which contains TRBJ2-3, human and murine TRBJ2-4 and bovine TRBJ3-3 and 3-4 genes, is only supported by a PB value of 57%. The diversity of the functional TRBJ repertoire across the 3 species is comparable, with humans having functional genes in each of the 12 phylogenetic groups whilst in both mice and Btau_3.1 only 2 groups lack functional members: groups 3 (TRBJ1-3) and 11 (TRBJ1-2) in Btau_3.1 and groups 2 (TRBJ1-6) and 6 (TRBJ2-6) in mice.
Using a variety of RT-PCR based methods, our group has isolated and sequenced over 1000 partial TRB chain cDNAs [31-33,40]. With a few exceptions, these cDNA sequences incorporated > 230 bp of the TRBV gene (i.e. over 80% of the sequence encoding the mature peptide) and in some cases the full length of the TRBV gene. Based on the assumption that sequences sharing ≤ 97% nucleotide identity represent distinct genes, as applied in studies of human and murine TRBV genes [41,42], our analysis identified 86 putative unique TRBV genes belonging to 22 subgroups (Table 1). Analysis of the sequence data available for each cDNA sequence indicated that only one of these genes is predicted to be non-functional (TRBV6-6 – due to a loss of a conserved cysteine-encoding codon at position 104 according to the IMGT numbering system), consistent with evidence that mRNA expression of non-functional TRB chains is down-regulated and therefore limited [10,44,45].
All of the TRBV subgroups identified in Btau_3.1 were also identified in the cDNA sequences. In addition, a single member of subgroup TRBV27, which is not represented in Btau_3.1, was identified. Although the repertoire of cDNA and functional genomic TRBV genes is broadly similar in both size and distribution across the subgroups (Table 1), detailed comparison shows that for the large subgroups there is substantial disparity between the genes present in the assembled genome and the cDNA repertoire. Thus, only 23 (35.9%) of the 64 TRBV genes in subgroups TRBV6, 9, 19, 20, 21 and 29 identified from cDNA analysis had genomic sequences showing 100% sequence identity, whilst 26 (40.6%) exhibited ≤ 97% identity to any genomic sequence. The remaining 15 (23.4%) sequences displayed nucleotide identities of 98–99% with genomic gene sequences. Given the presence in the genome of TRBV genes exhibiting > 97% nucleotide identity, it is not possible to conclude whether these cDNAs represent allelic variants of already identified genes or products of additional genes absent from the current assembly. That at least some of the cDNAs fall into the latter category is supported by the identification of sequences exhibiting 100% identity with 4 of these cDNA sequences in the genome project's WGS trace archive (data not shown). Conversely, 40 (63.5%) of the 63 predicted functional genes identified in these subgroups within the genome did not have cDNA sequences displaying 100% nucleotide identity. Twenty-two of these (34.9%) showed 98–99% identity with cDNA sequences, whilst the remaining 18 (28.6%) exhibited < 97% identity to any of the cDNA sequences. In contrast to the findings with multi-member subgroups, cDNAs corresponding to 9 subgroups with single members identified in the genome showed 100% identity with their respective genome sequence.
Thus, comparison with cDNA evidence suggests that substantial numbers of genes, predominantly from the large subgroups, are still absent from Btau_3.1.
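The three-way comparison of cDNA and genomic sequences described above can be summarised as a simple classification on the best identity each cDNA achieves against any genomic gene. The sketch below is illustrative only: the category labels, the function name `classify_cdna` and the example identity values are hypothetical, and the treatment of values falling between 97% and 98% (grouped here with the 98–99% class) is an assumption, since the text reports only the three bands.

```python
from collections import Counter


def classify_cdna(best_identity: float) -> str:
    """Classify a cDNA by its highest percent identity to any genomic TRBV gene.

    'identical' : 100% identity, gene present in the assembly
    'ambiguous' : > 97% but < 100%, allelic variant or gene missing from the assembly
    'distinct'  : <= 97%, treated as a gene absent from the assembly
    """
    if best_identity >= 100.0:
        return "identical"
    if best_identity > 97.0:
        return "ambiguous"
    return "distinct"


# Invented best-hit identities for five hypothetical cDNA sequences.
best_hits = [100.0, 99.1, 96.4, 100.0, 97.0]
tally = Counter(classify_cdna(x) for x in best_hits)
print(tally)
```

Applied to the 64 cDNA-derived genes of the large subgroups, a scheme of this kind would correspond to the reported split of 23 identical, 15 in the 98–99% band and 26 at ≤ 97% identity.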
In contrast to the TRBV situation, all TRBD and TRBC genes and the 16 functional TRBJ genes identified in Btau_3.1 have been found expressed in cDNA. In addition, a functional allele of the TRBJ1-2 gene, which compared to the genomic sequence has a 1 bp insertion that restores the ORF encoding the FGXG motif (Figure 5), has been identified. No evidence for further TRBD, TRBJ or TRBC genes was found from cDNA analysis, suggesting the repertoire of these genes in Btau_3.1 is complete.
The organisation of genes within Chr4.003.105 and Chr4.003.108 shows marked conservation in order with that of genes at the 5' and 3' ends of the human TRB locus respectively (Figure 9). Genes belonging to orthologous TRBV subgroups show a similar order, although in some areas substantial tandem duplication has obscured synteny at the level of individual genes (e.g. the TRBV3-13 regions in the human TRB locus and on Chr4.003.105). TRBVX, the only bovine TRBV gene that has no human orthologue, is located in a position (between the dopamine-β-hydroxylase-like (DβH-like) gene and trypsinogen genes) syntenic with its murine orthologue (mTRBV1). As mentioned previously, synteny is also shown in the organisation of TRBJ genes, with human and bovine orthologues occupying conserved positions in their relative clusters.
Synteny also extends to non-TRB genes located within and adjacent to the human TRB locus. The 5 trypsinogen genes located on Chr4.003.105 and Chr4.003.108 are syntenic to those located towards the 5'end and 3'end of the human TRB locus respectively, and the DβH-like gene flanking the 5' end of the human TRB locus and the ephrin type-b receptor 6 precursor (EPH6), transient receptor potential cation channel subfamily V (TRPV) member 5, TRPV6 and kell blood group glycoprotein (kell) loci flanking the 3'end of the human TRB locus all have bovine orthologues in syntenic positions on the 2 scaffolds.
Although fluorescent in situ hybridisation studies have previously shown that the position of the TRB locus with respect to the blue cone pigment (BCP) and chloride channel protein 1 (CLCN1) genes is conserved between ruminants and humans, this analysis shows for the first time the high levels of synteny between human and bovine orthologues both within and adjacent to the TRB locus. Extrapolation of this synteny predicts that Chr4.003.105 and Chr4.003.108 (in reverse complement) should be juxtaposed on chromosome 4, whilst Chr4.003.106, which contains bovine orthologues to numerous genes that in humans are telomeric to the TRB locus (including CLCN1), should be located 3' to Chr4.003.108, and Chr4.003.107, which contains a bovine orthologue to the acylglycerol kinase (AGK) gene that in humans lies centromeric to the TRB locus, should be positioned 5' to Chr4.003.105. This location of Chr4.003.106 has also been predicted by clone paired-end analysis (data not shown).
The RS sequences of the bovine TRBV, TRBJ and TRBD genes show a high degree of similarity to canonical RS sequences defined for the corresponding human and murine genes (Figure 10). In the bovine TRBV 23-RS sequences, the CACAG of the heptamer and a poly-A stretch in the centre of the nonamer show a high degree of intra- and inter-species conservation. Although conservation of the spacer is less marked, the CTGCA sequence proximal to the heptamer is reasonably well conserved and similar to that of humans. Despite more limited conservation, the 8 bp proximal to the nonamer also displays a degree of cross-species similarity. Similarly, the bovine TRBJ RS exhibits intra- and inter-species conservation of the first 3 bp (CAC) of the heptamer sequence and a poly-A stretch in the nonamer. Conservation in the spacer is limited but overrepresentation of G at the position 6 bp from the heptamer and C 4 bp from the nonamer is seen in both the bovine and human.
We identified a 187 bp sequence ~8.7 Kb 3' to the TRBC3 gene that displays high nucleotide similarity with the sequences of the enhancers (Eβ) identified in the murine (76.2%) and human (78.3%) TRB loci [47-49]. Sequences of the protein binding sites described in the Eβs of humans (Tβ2-4) and mice (βE1-6) are well conserved in the aligned bovine sequence (Figure 11a); several of the transcription factor binding sites shown to be functionally important in the regulation of Eβ function [47-50], such as the GATA-binding site in βE1/Tβ2 and the κE2-binding motif in βE3, are absolutely conserved, whilst others (such as the CRE in βE2/Tβ2) show minimal sequence divergence. In contrast, the sequence of the TRBD1 promoter (PDβ1), which includes the ~300 bp directly upstream of the TRBD1 gene and has been well defined in the mouse [51,52], displays a more limited nucleotide identity (59.2%) with the bovine sequence. As shown in Figure 11b, some transcription factor binding sites demonstrated to be important for PDβ1 function (SP-1 and GATA) in mice and/or humans are absent from the bovine sequence, whereas others (TATA box, AP-1 and Ikaros/Lyf-1) have been well conserved [51-53].
We were also able to identify a conserved cAMP responsive element (CRE) motif (AGTGAxxTGA) in the ~80–120 bp upstream sequence of 57 (42.6%) of the bovine TRBV genes (Figure 11c). This motif is found within conserved decamer sequences in the promoter regions of some murine and human TRBV genes and has been shown to specifically bind a splice variant of a CRE binding protein preferentially expressed in the thymus. In general, the CRE motif has been found in bovine genes that are members of subgroups that are orthologous to the human TRBV subgroups in which the CRE motif is also found.
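A search for the CRE motif in upstream sequence can be sketched with a regular expression. This is not the method used in the study; it assumes that each `x` in AGTGAxxTGA denotes any single nucleotide, that only the given strand is searched, and that the upstream sequence has already been extracted. The function name `find_cre` and the example sequences are invented for illustration.

```python
import re

# AGTGAxxTGA, reading each 'x' as any one of A, C, G or T (an assumption).
CRE_MOTIF = re.compile(r"AGTGA[ACGT]{2}TGA")


def find_cre(upstream: str) -> list:
    """Return the 0-based start positions of CRE motif matches in `upstream`."""
    return [m.start() for m in CRE_MOTIF.finditer(upstream.upper())]


# Invented upstream fragments, not real bovine sequence.
print(find_cre("ttAGTGACCTGAgg"))  # one match starting at position 2
print(find_cre("AGTGACTGA"))       # no match: only one base between AGTGA and TGA
```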
Sequencing of the human and murine TRB loci has defined the repertoire of TRB genes in these species as well as provided insights into the organisation, evolution and regulation of this immunologically important locus [9,10]. Although the bovine TRB locus sequence in the third bovine genome assembly is incomplete, the analysis conducted in the present study has provided insight into the nature of the bovine TRB gene repertoire and its genomic organisation and evolution.
The most striking result from the study was the large number of TRBV genes identified (134), which is over twice the number found in humans and nearly four times that in mice [11,12]. Although 11 of the 24 bovine subgroups identified in the genome contain multiple genes, the majority of the TRBV repertoire expansion is attributable to the extensive membership of just 3 subgroups, TRBV6 (40 members), 9 (35 members) and 21 (16 members). By comparison, the largest subgroups in humans are TRBV6 and TRBV7, with 9 members each, whilst in mice the only multi-membered subgroups are TRBV12 and 13 with 3 members each. As in humans, the expansion of the TRBV repertoire has predominantly occurred through the tandem duplication of DNA blocks containing genes from more than 1 subgroup [9,10]. Dot-plot analysis shows that this duplication in the bovine is complex, leading to the generation of 6 homology units ranging in size from ~7 Kb to ~31 Kb and encompassing between 1 and 11 TRBV genes. Unequal cross-over (non-homologous meiotic recombination) between genome-wide repeats (e.g. SINEs, LINEs and LTRs) has been proposed to act as the substrate for such duplication events in TR loci. Although genome-wide repeats are found in the DNA surrounding the bovine TRBV genes (Additional file 3), as in the human TRB locus they are only rarely found at the boundaries of duplicated homology units (data not shown), suggesting their contribution to mediating duplication is minimal.
Although gene conversion of TRBV genes has been documented, as with other multi-gene families in the immune system, TRBV genes predominantly follow a 'birth-and-death' model of evolution [13,57,58], by which new genes are created by repeated gene duplication, some of which are maintained in the genome whilst others are deleted or become non-functional due to mutation. Genes maintained following duplication are subject to progressive divergence, providing the opportunity for diversification of the gene repertoire. Gene duplication within the TR loci has occurred sporadically over hundreds of millions of years, with ancient duplications accounting for the generation of different subgroups and more recent duplications giving rise to the different members within subgroups [9,59]. The continuous nature of duplication and divergence of bovine TRBV genes is evident in the multi-membered subgroups, where nucleotide identity between members ranges between 75.5% and 100%. The complete identity observed between some TRBV genes suggests that some of the duplication events have occurred very recently. Similar features have been described for the murine TRA and human IGκ loci, within which recent duplications, < 8 million years ago (MYA), have created pairs of V genes exhibiting ~97% nucleotide identity [9,60,61]. Southern blot data showing differences in RFLP banding patterns of TRBV9 and 27 genes in DNA from Bos indicus and Bos taurus animals (Figure 1c), which only diverged between 0.25 and 2 million years ago [62-64], provide further evidence of recent evolutionary development of the TRBV repertoire in cattle.
The distribution of TRBV genes over 5 scaffolds and the presence of > 180 Kb of undetermined sequence within two of the scaffolds indicate that characterisation of the genomic TRBV repertoire remains incomplete. Comparison with cDNA sequence data indicates that the number of undefined genes is substantial – only 36/86 (42%) of TRBV genes identified from cDNA analysis have corresponding identical sequences in Btau_3.1. Most of the identified TRBV genes missing from the assembly are members of the large subgroups TRBV6, 9, 19, 20, 21 and 29, further enhancing their numerical dominance. Although it is anticipated that completion of the TRB locus sequence will incorporate significant numbers of additional TRBV genes, the possible existence of insertion-deletion related polymorphisms (IDRPs), which can lead to intra-species variation in genomic TRBV gene repertoires as described in human and murine TRB loci [65-68], may result in some of the genes identified in cDNA being genuinely absent from the sequenced bovine genome.
The proportion of TRBV pseudogenes in Btau_3.1 is 41%, comparable to that seen in both humans (29%) and mice (40%), suggesting that the 'death rate' in TRBV gene evolution is generally high. Pseudogene formation has occurred sporadically throughout the evolution of TRBV genes, with genes that have lost function tending to subsequently accumulate further lesions. The majority of bovine TRBV pseudogenes (57%) contain a single lesion and thus appear to have arisen recently; the remaining 43% have multiple lesions of varying severity and complexity (Additional file 2). In addition to pseudogenes, we also identified 7 sequences showing limited local similarity to TRBV genes in Btau_3.1 (Figure 2 – open boxes). Such severely mutated TRBV 'relics', 22 of which have been identified in the human TRB locus, are considered to represent the remnants of ancient pseudogene formation.
In contradiction to a previous report, the repertoire of functional TRBV genes in Btau_3.1 exhibits a level of phylogenetic diversity similar to that of humans and mice. Phylogenetic groups A and E are over-represented in all 3 species, which in humans and cattle is largely attributable to expansion of subgroups TRBV5, 6, 7 and 10 and TRBV6, 9 and 21 respectively; in mice the expansion of subgroups TRBV12 and 13 makes a more modest contribution to this over-representation. Much of the expansion of human subgroups TRBV5, 6 and 7 occurred 24–32 MYA and similarly, as described above, in bovines much of the expansion of subgroups TRBV6, 9 and 21 appears to be very recent. As these expansions have occurred subsequent to primate/artiodactyl divergence (~100 MYA), over-representation of phylogenetic groups A and E must have occurred as parallel but independent events in these lineages, raising interesting questions about the evolutionary pressures that shape the functional TRBV repertoire.
In contrast to the wide variation in the organisation of TRBD, TRBJ and TRBC genes in the TRB locus seen in non-mammalian vertebrates [70-74], in mammals the arrangement of tandemly located DJC clusters is well conserved [10,35,36,75,76]. Although most placental species studied have 2, variation in the number of DJC clusters has been observed, with unequal cross-over events between TRBC genes usually invoked as the most likely explanation for this variation [36,77,78]. The results from this study provide the first description of the entire bovine DJC region and confirm that, like sheep, cattle have 3 complete DJC clusters [36,37]. Dot-plot and sequence analyses indicate that unequal crossover between the ancestral TRBC1 and TRBC3 genes led to duplication of a region incorporating TRBC1, TRBD3 and TRBJ3 genes, generating the DJC2 cluster. The similarity with the structure of the ovine DJC region suggests that this duplication event occurred prior to ovine/bovine divergence 35.7 MYA. As with duplication of TRBV genes, expansion of TRBD and TRBJ gene numbers has increased the number of genes available to partake in somatic recombination: the 3168 different VDJ permutations possible from the functional genes present in Btau_3.1 are considerably more than those for either humans (42 × 2 × 13 = 1092) or mice (21 × 2 × 11 = 462). Interestingly, the bovine TRBD1 gene is the first TRBD gene described that does not encode a glycine residue (considered integral to the structure of the CDR3β) in all 3 reading frames. However, analysis of cDNA reveals functional TRB chains that express TRBD1 in the reading frame that does not encode a glycine, but in which a glycine codon has been generated by nucleotide editing at the VJ junction (data not shown).
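The combinatorial figures quoted for humans and mice are simple products of the numbers of functional V, D and J genes. A minimal Python sketch of that arithmetic follows, using only the human and murine counts stated in the text. The function name `vdj_permutations` is illustrative, and the sketch assumes that any V, D and J gene can be combined, ignoring junctional diversity and any restriction on D-J pairing between clusters.

```python
def vdj_permutations(n_v: int, n_d: int, n_j: int) -> int:
    """Number of V-D-J combinations, assuming any V, D and J gene can pair."""
    return n_v * n_d * n_j


# Functional gene counts (V, D, J) as quoted in the text for humans and mice.
counts = {"human": (42, 2, 13), "mouse": (21, 2, 11)}

for species, (n_v, n_d, n_j) in counts.items():
    print(species, vdj_permutations(n_v, n_d, n_j))
```

The same function could be applied to the bovine counts once the numbers of functional TRBV, TRBD and TRBJ genes used in the calculation are specified.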
In contrast to TRBV, TRBD and TRBJ genes, which encode products that bind to a diverse array of peptide-MHC ligands, TRBC gene products interact with components of the CD3 complex, which are non-polymorphic. Consequently, due to structural restrictions, TRBC genes are subject to concerted evolutionary pressures, with intra-species homogenization through gene conversion evident in both humans and mice [9,80]. Similarly, the bovine TRBC genes were found to encode near-identical products, most likely as a result of gene conversion, although in the case of TRBC1 and TRBC2 genes this more probably reflects minimal divergence following duplication.
Comparison with the human and murine sequences shows that non-coding elements that regulate TRB expression, such as the Eβ, promoters and RSs, are highly conserved in the bovine. This is consistent with work demonstrating that the critical role of RSs has enforced a high level of evolutionary conservation [70,73,74,81] and that Eβ and PDβ1 sequences are well conserved in eutherian species [36,52]. Although transcription factor binding sites are less well conserved in the putative PDβ1 than in the Eβ sequence, the Ikaros/Lyf-1 and Ap-1 binding sites of the PDβ1, which are vital in enforcing stage-specific recombination (i.e. Dβ-Jβ prior to Vβ-DβJβ), are conserved [53,82]. Our analysis of putative TRBV promoter elements was restricted to the well described CRE motif [9,10,54]. However, TRBV promoters are complex, and expression of TRBV genes whose promoters lack the CRE motif is maintained through the function of other transcription factor binding sites. A more detailed analysis of the bovine TRBV promoters would be interesting given the potential influence this may have on shaping the expressed TRBV repertoire, but is beyond the scope of the current study.
The portion of the bovine TRB locus described in Btau_3.1 encompasses > 730 Kb of sequence (excluding the regions of undetermined sequence in Chr4.003.105 and Chr4.003.108). Thus, although incomplete, the bovine TRB locus is larger than that of either humans (620 Kb) or mice (700 Kb), mainly as a consequence of the duplications leading to the dramatic expansion of the V genes. In contrast to V genes, duplication of trypsinogen genes within the TRB locus is more limited in the bovine (Figure 2), where only 5 trypsinogen genes were identified, compared to the human and murine loci, where more extensive duplication has led to the presence of 8 and 20 trypsinogen genes respectively. Despite the differences in duplication events, the organisation of both TR and non-TR genes within and adjacent to the TRB locus exhibits a striking conserved synteny between cattle, humans and mice [9,84]. Indeed, the organisation of genes within the TRB locus and its position relative to adjacent loci is ancient, with marked conserved synteny also demonstrated between eutherian and marsupial mammalian species and, to a large extent, chickens [9,75]. Given the evidence for conserved synteny of TRBV gene organisation despite dissimilar duplication/deletion events between mice, humans and cattle, the results of the analysis completed in this study suggest that several subgroups including TRBV1, 2, 17, 22 and 23, which were not identified in the genome assembly or from cDNA sequences, may have been deleted from the bovine genome (Figure 9). Conservation of synteny would predict that the genomic location of the TRBV27 gene identified from cDNA analysis will be within the region of undetermined sequence in Chr4.003.108 between the TRBV26 and 28 genes (Figure 9).
The primary purpose of this study was to analyse the sequence data made available from the third bovine genome assembly to gain a better understanding of the bovine TRB gene repertoire and the organization and evolution of the bovine TRB locus. The results of this analysis have shown that: (1) the bovine TRBV genomic repertoire has been dramatically expanded through a complex series of duplication events and, although incomplete, is the largest described to date, with these duplication events having led to massive expansion in the membership of certain TRBV subgroups, particularly TRBV6, 9 and 21; (2) duplication has generated 3 DJC clusters compared to 2 in humans and mice; (3) the elements that regulate TRB expression and the organisation of genes within and adjacent to the TRB loci exhibit high levels of conservation between humans, mice and cattle; and (4) cDNA evidence indicates that a substantial number of TRBV genes, predominantly from the larger subgroups, are absent from the current assembly.
Notwithstanding the incomplete assembly of the TRB locus, the results of these analyses clearly demonstrate that cattle possess a phylogenetically diverse repertoire of functional TRB genes that is substantially larger than that described for other species. These findings, together with emerging evidence of similar expansions of gene repertoires for other TR chains in ruminants [85,86], suggest that strong evolutionary pressures have driven a generic enlargement of TR gene numbers, and thus greater potential TR diversity, in the ruminant lineage. Further studies are required to define the full extent of these expansions and to understand their evolutionary basis.
The third bovine genome assembly (Btau_3.1) was retrieved directly from the sequencing centre involved in the Bovine Genome Project. Sequences of bovine TRB genes identified from cDNA analysis [31-34,88] and human and murine TRB genes derived from GenBank sequences [GenBank:U66059–U66061] and [GenBank:AE000663–AE000665] respectively were compared to Btau_3.1 using the BLASTN algorithm tool on the Ensembl website. The locations of the TRB genes identified in Btau_3.1 are provided in Additional file 1.
Basic sequence analysis such as CLUSTALW alignments and translations were conducted using the DNAsis Max v2.7 programme (MiraiBio, Alameda, CA, USA). Comparison of human and genomic bovine TRBV sequences was completed using the IMGT/V-QUEST programme available through the IMGT homepage, http://imgt.cines.fr [92,93]. Dot-plot analyses were completed using the dotter programme and the dottup programme from EMBOSS. Genome-wide repeats were analysed using the RepeatMasker programme. Sequence logos in Figures 10 and 11 were generated using the Weblogo programme.
To be considered functional, TRBV gene segment sequences were required to maintain i) splice sites appropriate for RNA editing, ii) open reading frames, which include codons for the conserved cysteine, tryptophan and cysteine residues at positions 23, 41 and 104 (IMGT unique numbering system) respectively and iii) a 23-RS compatible with somatic recombination [98,99].
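The three criteria can be read as a simple checklist. The following Python sketch is illustrative only and is not the procedure used in this study: it assumes a hypothetical input in which the V-REGION translation has already been gapped according to the IMGT unique numbering (so that character i-1 of the string corresponds to IMGT position i), and it takes the splice-site and 23-RS assessments as flags supplied by the caller. The names `CONSERVED` and `check_trbv` are invented for the example, and frameshifts are detected only indirectly, through stop codons or loss of a conserved residue.

```python
from typing import List

# IMGT position -> conserved residue expected at that position.
CONSERVED = {23: "C", 41: "W", 104: "C"}


def check_trbv(gapped_aa: str, has_splice_sites: bool, has_23rs: bool) -> List[str]:
    """Return the lesions found; an empty list means all criteria are met.

    gapped_aa is a hypothetical IMGT-gapped translation in which character
    i-1 corresponds to IMGT position i, '.' marks a gap and '*' a stop codon.
    """
    lesions = []
    if not has_splice_sites:
        lesions.append("splice site lost")
    if "*" in gapped_aa:
        lesions.append("premature stop codon")
    for pos, residue in CONSERVED.items():
        # A sequence too short to reach the position counts as a lost residue.
        found = gapped_aa[pos - 1] if len(gapped_aa) >= pos else None
        if found != residue:
            lesions.append(f"conserved {residue}{pos} lost")
    if not has_23rs:
        lesions.append("23-RS lost")
    return lesions


# Toy sequence: alanines everywhere except the three conserved positions.
toy = ["A"] * 104
toy[22], toy[40], toy[103] = "C", "W", "C"
print(check_trbv("".join(toy), has_splice_sites=True, has_23rs=True))
```

A sequence whose reading frame is intact but which fails only on a conserved residue or the 23-RS would correspond to the open-reading frame (ORF) pseudogenes listed in Additional file 2.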
As the sequence of the TRB locus was incomplete, it was not possible to fully implement the IMGT nomenclature system, which requires knowledge of the genomic order of genes from the 5' to 3' end of the locus. Genomic bovine TRBV gene subgroups have been named according to the orthologous subgroups in humans, and members of subgroups have been given an alphabetic rather than numeric description to avoid confusion with previously published cDNA data. The DJC region of the locus appears complete and so the TRBD, TRBJ and TRBC genes have been named according to their 5' to 3' order in the genome.
Phylogenetic analysis was performed on the nucleotide sequences of functional TRBV genes (coding sequences) and TRBJ genes (coding sequence + RS) of humans, mice and cattle as identified in Btau_3.1. Neighbour-joining (NJ) analysis was performed with the MEGA4 software [102,103], using the uncorrected nucleotide differences (p-distance), which is known to give better results when a large number of sequences that each contain a relatively small number of nucleotides are examined. Maximum likelihood analysis was performed under a variety of substitution models (Jukes-Cantor, Kimura 2-parameter, Felsenstein 81, Felsenstein 84, Tamura-Nei 93 and General Time Reversible) as implemented by the PHYML programme [105,106], using the phylogenetic tree produced by NJ analysis as the primary tree. In each case the reliability of the resulting trees was estimated by the approximate Likelihood Ratio Test (aLRT) method.
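The p-distance is the proportion of aligned nucleotide sites at which two sequences differ, with no correction for multiple substitutions. The Python sketch below illustrates the definition rather than MEGA4's implementation. The function name `p_distance` is invented for the example, and the treatment of gaps (sites with a gap in either sequence are skipped) is an assumption, since the gap-handling option used in the analysis is not stated.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# One difference over eight compared sites.
print(p_distance("ACGTACGT", "ACGTACGA"))  # 0.125
```

Percentage nucleotide identity between two aligned genes, as quoted in the Discussion, corresponds to 100 × (1 − p-distance) under the same gap treatment.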
Southern blots were performed as described in Houston et al. Analysis of bovine TRB cDNA expression included use of the methods described in previous studies [31-33,88]. cDNA sequences for bovine TRBV genes were derived from sequences submitted to public databases: [D90130, AJ006569, AJ006570, AJ006572, AJ006573, AJ006574, AJ006575, AJ006576, D90121, D90123, AJ006583, D90122, D90127, D90124, D90133, L18951, AJ006580, D90131, AJ006579, AJ235264, AJ235265, AJ006578, AJ235266, AJ235267, AJ006577, D90128, D90129, AJ235268, D90125, AJ006347, AJ006346 and D90132] and additional un-submitted data that are available on request from the corresponding author.
IG: immunoglobulin; IMGT: IMGT®, the international ImMunoGeneTics information system®; RFLP: restriction fragment length polymorphism; TRB: T cell receptor beta chain; TR: T cell receptor; TRBV: β variable gene; TRBJ: β joining gene; TRBD: β diversity gene; TRBC: β constant gene.
TC, JA, AL and WM conceived the study. TC performed the genome analysis, the cDNA and Southern blot work, and with JA completed the sequence and phylogenetic analysis. All authors read and approved the manuscript.
Table S1 – Location of bovine TRB genes in Btau_3.1. For each gene the start and stop positions on the scaffold as well as the orientation are shown.
Table S2 – Summary of the analysis of non-functional TRBV gene segments identified in Btau_3.1. To be considered functional, TRBV gene segment sequences were required to fulfil several conditions as described in the Methods. The table above summarises the lesions identified in the genomic sequences of TRBV genes that are considered to render them non-functional and therefore pseudogenes. TRBV6c, 6g, 6r, 6w and 6aa are open-reading frame (ORF) pseudogenes as their reading frame is maintained, the lesions causing their predicted loss of function resulting either in the loss of a conserved residue or loss of a consensus 23-RS.
Figure S1 – Analysis of the repeat content in (A) Chr4.003.105, (B) Chr4.003.108_RC and (C) ChrUn.003.1717. The repeat content is calculated as the percentage (y-axis) of the sequence (x-axis) composed of repeat elements in 10 Kb windows using a rolling 1 Kb step. The positions of TRB genes and interposed/adjacent non-TRB genes, as well as regions of undetermined sequence, are shown according to the legend. Abbreviations for gene names used in the figure have been described in Figure 9.
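The windowed repeat content described for Figure S1 can be sketched as follows. This is a minimal Python illustration, not the script used to produce the figure. It assumes that repeat elements are marked as lowercase letters in the sequence (soft-masking), that only full-length windows are reported, and that undetermined bases (uppercase N) count as non-repeat; none of these details are specified in the text, and the function name `repeat_content` is invented for the example.

```python
from typing import Iterator, Tuple


def repeat_content(seq: str, window: int = 10_000,
                   step: int = 1_000) -> Iterator[Tuple[int, float]]:
    """Yield (window_start, percent_repeat) for each full-length window.

    Lowercase characters are treated as repeat-masked bases (an assumption).
    """
    # Prefix sums of masked bases make each window an O(1) lookup.
    prefix = [0]
    for base in seq:
        prefix.append(prefix[-1] + (1 if base.islower() else 0))
    for start in range(0, len(seq) - window + 1, step):
        masked = prefix[start + window] - prefix[start]
        yield start, 100.0 * masked / window


# Toy example with a small window so the output is easy to follow.
toy = "a" * 5 + "A" * 5
print(list(repeat_content(toy, window=4, step=2)))
# [(0, 100.0), (2, 75.0), (4, 25.0), (6, 0.0)]
```

With the default arguments the function reproduces the 10 Kb window and 1 Kb step stated in the legend.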
We would like to thank Dr. Zen Lu from the Roslin Institute for help with dot-plot data. We would also like to acknowledge funding from the Veterinary Training and Research Initiative award from the Department of Food and Rural Affairs (DEFRA) and the Scottish Higher Education Funding Council (SHEFC). Some of the work on cDNA analysis was also supported by Wellcome Trust Grant No. 075820.