All jawed vertebrates produce immunoglobulins (IGs) as a defense mechanism against pathogens. Typically, IGs are composed of two identical heavy chains (IGH) and two identical light chains (IGL). Most tetrapod species encode more than one isotype of light chains. Chicken is the only representative of birds for which genomic information is currently available and is an exception to the above rule because it encodes only a single IGL isotype (i.e., lambda). Here, we show that the genome of zebra finch, another bird species, encodes a single IGL isotype, that is, lambda, like the chicken. These results strongly suggest that the second isotype (i.e., kappa) present in both reptiles and mammals was lost in a very early stage of bird evolution. Furthermore, we show that both chicken and zebra finch contain a single set of functional variable, joining, and constant region genes and multiple variable region pseudogenes. The latter finding suggests that this type of genomic organization was already present in the common ancestor of these bird species and remained unchanged over a long evolutionary time. This conservation is in contrast with the high levels of variation observed in the mammalian IGL loci. The presence of a single functional variable region gene followed by multiple variable pseudogenes in zebra finch suggests that this species may be generating antibody diversity by a gene conversion-like mechanism like the chicken.
A typical immunoglobulin (antibody, Ig) in jawed vertebrates is composed of two identical heavy chains (IGH) and two identical light chains (IGL) and provides defense against all extracellular and some intracellular pathogens (Klein and Hořejší 1997). Jawed vertebrate species, with the exception of chickens, ducks, and bats, express more than one immunoglobulin light (IGL) chain isotype (Lundqvist et al. 2006; Criscitiello and Flajnik 2007; Das et al. 2008). In mammals, IGL genes generally exist in two distinct isotypes called kappa (κ) and lambda (λ). The genes for the two light chain isotypes are encoded at separate and unlinked loci, and the organization of κ and λ chain locus differs significantly (Wahlstrom et al. 1988; Lai et al. 1989). In general, the κ chain–encoding locus is arranged with multiple IGVK genes (variable kappa), a small cluster of IGJK (joining kappa) genes, and a single IGCK (constant kappa) gene, whereas in the λ chain–encoding locus multiple IGVL (variable lambda) genes are followed by IGJL (joining lambda) and IGCL (constant lambda) genes, which occur as IGJL–IGCL blocks, usually present in multiple copies (Frippiat et al. 1995; Kirschbaum et al. 1996; Kawasaki et al. 1997, 2001; Das et al. 2008).
Unlike humans and mice, chickens and ducks have been shown to exclusively express λ light chains (Sanders and Travis 1975; Magor et al. 1994; Lundqvist et al. 2006). In the chicken IGL locus, there is only one functional IGVL, IGJL, and IGCL gene, whereas there are multiple IGVL pseudogenes located upstream of the functional IGVL gene (Parvari et al. 1987). In contrast to the humans and mice that depend on gene rearrangements to generate light chain diversity, chickens generate light chain diversity through intrachromosomal gene conversion, a nonreciprocal recombination process that uses the upstream pseudo-IGVL genes as donor sequences (McCormack et al. 1991).
Birds are an enormously diverse group of vertebrates comprising around 9,000 species (Shukla and Tyagi 2004). However, with the exception of chicken (galliforms) and ducks (anseriforms), the characterization of avian IGL isotypes is very limited (Reynaud et al. 1983; Magor et al. 1994; Lundqvist et al. 2006). Furthermore, an analysis of the complete genomic organization of the IGL locus currently exists only for chicken. Given how limited the knowledge of avian IGL genes and their genomic organization is, the recently available draft genomic sequence of zebra finch (Taeniopygia guttata) provides an opportunity to study the IGL genes in another avian model species. Zebra finch is a member of Passeriformes, which diverged from chicken (galliforms) more than 100 Ma (Brown et al. 2008).
In the present study, we analyzed the IGL sequences and their genomic organization in zebra finch to investigate whether the overall organization of the IGL locus in this bird species is similar to the chicken IGL locus. We also compared the genomic organization of the IGL locus between avian (chicken–zebra finch) and mammalian (human–horse) species to understand the evolutionary mechanisms that generated the IGL repertoire in these species.
An exhaustive gene search was conducted to identify all the light chain genes in the draft genome sequences of zebra finch (T. guttata) from the Ensembl genome browser. This is the first release of the zebra finch genome assembly (assembly: Taeniopygia_guttata-3.2.4, Aug 2008). The complete genome sequence of zebra finch was produced by The Genome Center at Washington University School of Medicine in St Louis. To retrieve the light chain variable and constant genes in zebra finch, we performed TBlastN searches (cutoff E-value of 10⁻¹⁵) using as queries the encoded amino acid sequences of nine functional light chain variable region genes [three IGVK and three IGVL from human, one IGVL from chicken, and two IGV sigma–encoding sequences (IGVS) from frog] and five functional constant sequences (one IGCK and two IGCL from human, one IGCL from chicken, and one IGCS from frog), respectively (Das et al. 2008). The nine IGL variable sequences and the five IGL constant sequences in the query data set aligned to the same genomic regions because they are similar to one another. For this reason, we extracted only nonoverlapping genomic sequences that produced alignments with the lowest E-values. To identify the joining genes, which are very short and cannot be detected by Blast searches, we manually screened 7 kb upstream of the constant gene, taking into account the location of the recombination signal sequence (RSS) at the 5′ end of the joining gene.
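The hit-selection step described above can be illustrated with a minimal sketch. It assumes the TBlastN hits have already been parsed into records with a query name, a scaffold or chromosome name, genomic start and end coordinates, and an E-value. The `Hit` record, the function names, and the greedy strategy (keep the best-scoring hit, discard anything that overlaps it) are illustrative assumptions, not the procedure actually used in the study.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One TBlastN alignment (hypothetical record layout)."""
    query: str
    chrom: str
    start: int   # 1-based, inclusive genomic coordinate
    end: int     # 1-based, inclusive genomic coordinate
    evalue: float


def overlaps(a: Hit, b: Hit) -> bool:
    """True if two hits share at least one genomic position."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def best_nonoverlapping(hits: List[Hit], max_evalue: float = 1e-15) -> List[Hit]:
    """Keep, for each genomic region, only the hit with the lowest E-value."""
    kept: List[Hit] = []
    passing = [h for h in hits if h.evalue <= max_evalue]
    # Visit hits from best to worst so the best hit claims each region first.
    for hit in sorted(passing, key=lambda h: h.evalue):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    # Report the surviving hits in genomic order.
    return sorted(kept, key=lambda h: (h.chrom, h.start))


if __name__ == "__main__":
    example = [
        Hit("human_IGVL", "chr15", 100, 400, 1e-30),
        Hit("chicken_IGVL", "chr15", 120, 410, 1e-40),
        Hit("frog_IGVS", "chr15", 1000, 1300, 1e-20),
        Hit("human_IGVK", "chr15", 2000, 2300, 1e-5),
    ]
    for hit in best_nonoverlapping(example):
        print(hit)
```

In this toy input (made-up coordinates), the two hits covering the same region collapse to the one with the lower E-value, and the hit above the cutoff is dropped. Strand is ignored for simplicity.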
To retrieve any expression data, the identified genomic sequences were used as queries in BlastN searches against the Expressed Sequence Tag (EST) database of NCBI and the ESTIMA database (http://titan.biotec.uiuc.edu/cgi-bin/ESTWebsite/estima_start?seqSet=songbird). Multiple sequence alignments between the retrieved cDNA sequences and the genomic IGVL genes were performed using MAFFT (Katoh et al. 2009).
The variable region genes can be divided into two hypervariable or complementarity-determining regions (CDR1 and CDR2) and three framework regions (FR1, FR2, and FR3) (Kabat and Wu 1991). For the variable region genes, any retrieved sequence that aligned with the query sequence without any frameshift mutations and/or premature stop codons in the leader exon and the V-exon, possessed the two conserved Cys residues in FR1 and FR3 regions, respectively, and had a proper RSS was regarded as a potentially functional gene. All other sequences, including truncated ones, were regarded as pseudogenes. For the constant and joining region genes, the retrieved sequences that did not have any frameshift mutations and/or internal stop codons were regarded as potentially functional genes. In addition, for the joining region gene, we examined the RSS to determine putative functionality. All the retrieved sequences were aligned with the query sequence (MI0003665) using the ClustalW program (Thompson et al. 1994), and the alignments were inspected manually to maximize similarity.
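The functional-versus-pseudogene rule for variable region genes can be sketched as a simple decision function. This is a simplified illustration under stated assumptions: the coding sequence is supplied already in frame, a length that is not a multiple of three stands in for a frameshift or truncation, the expected positions of the two conserved Cys residues are passed in by the caller (in the study they come from the alignment with the query), and the RSS check is reduced to a boolean. The function names and the example sequence are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def classify_v_gene(cds: str, has_proper_rss: bool, cys_positions) -> str:
    """Apply the three criteria: intact frame, conserved Cys residues, RSS.

    cys_positions are 0-based indices in the translated sequence where the
    conserved FR1 and FR3 Cys residues are expected (assumed known).
    """
    if len(cds) % 3 != 0:
        return "pseudogene"          # proxy for a frameshift or truncation
    protein = translate(cds)
    if "*" in protein:
        return "pseudogene"          # internal stop codon
    for pos in cys_positions:
        if pos >= len(protein) or protein[pos] != "C":
            return "pseudogene"      # conserved Cys missing
    if not has_proper_rss:
        return "pseudogene"
    return "potentially functional"


if __name__ == "__main__":
    toy_cds = "ATGTGCGCATGTGAA"      # toy sequence, translates to MCACE
    print(classify_v_gene(toy_cds, True, (1, 3)))
```

A real implementation would derive the Cys positions and frame from the alignment rather than from fixed indices.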
The repetitive elements were identified using the CENSOR software tool (Kohany et al. 2006).
The number of nucleotide differences and the proportion of differences per site (P distance) (Nei and Kumar 2000) were calculated using MEGA4 (Tamura et al. 2007) and SWAAP v1.0.3 (http://asiago.stanford.edu/SWAAP/SwaapPage.htm). In these analyses, all three codon positions were included. All positions containing alignment gaps and missing data were eliminated only in pairwise sequence comparisons (pairwise deletion option). Standard errors were obtained by a bootstrap procedure of 500 replicates.
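The P distance with pairwise deletion and a bootstrap standard error can be sketched in a few lines. This is an illustration of the general technique, not a reimplementation of MEGA4 or SWAAP: the function names are hypothetical, only A, C, G, and T are treated as valid characters, and the standard error is taken as the sample standard deviation of the bootstrap estimates, which may differ in detail from what those programs compute.

```python
import random
import statistics

VALID = set("ACGT")


def p_distance(seq_a: str, seq_b: str):
    """Return (number of differences, proportion of differing sites).

    Sites with a gap or ambiguous character in either sequence are skipped,
    which corresponds to pairwise deletion for this pair.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in VALID and y in VALID]
    if not pairs:
        raise ValueError("no comparable sites")
    diffs = sum(x != y for x, y in pairs)
    return diffs, diffs / len(pairs)


def bootstrap_se(seq_a: str, seq_b: str, replicates: int = 500, seed: int = 0) -> float:
    """Standard error of the P distance from resampled alignment columns."""
    rng = random.Random(seed)
    columns = list(zip(seq_a.upper(), seq_b.upper()))
    estimates = []
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        sample = rng.choices(columns, k=len(columns))
        rep_a = "".join(col[0] for col in sample)
        rep_b = "".join(col[1] for col in sample)
        try:
            estimates.append(p_distance(rep_a, rep_b)[1])
        except ValueError:
            continue  # replicate with no comparable sites
    return statistics.stdev(estimates)


if __name__ == "__main__":
    a = "ACGT-ACGTN"
    b = "ACGA-ACCTA"
    print(p_distance(a, b))      # 2 differences over 8 comparable sites
    print(bootstrap_se(a, b))
```

In the toy alignment, two columns are excluded (one gap, one N), leaving eight comparable sites with two differences, so the P distance is 0.25.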
To map the unique insertion QQQSST (see Results) on the tertiary structure of zebra finch IGL, homology modeling and fold recognition were performed using the SWISS-MODEL (http://swissmodel.expasy.org) (Arnold et al. 2006) and PHYRE (http://www.sbg.bio.ic.ac.uk/~phyre) (Bennett-Lovsey et al. 2008) web servers. Pairwise structural alignments and structural superimposition were performed using the SSAP (http://cathdb.info/cgi-bin/SsapServer.pl) (Taylor and Orengo 1989) and DaliLite (http://www.ebi.ac.uk/Tools/dalilite) (Holm and Park 2000) web servers. Tertiary structure figures were generated using PyMol (DeLano Scientific; http://pymol.org).
Using the human (kappa and lambda), chicken (lambda), and frog (sigma) IGL variable sequences as queries, we identified 21 IGL variable region genes located in a cluster in chromosome 15 of the zebra finch genome. The genomic location of these variable region genes in zebra finch is given in supplementary table S1 (Supplementary Material online). The sequence comparison with functional sequences of human, chicken, and frog indicated that among the 21 IGL variable region genes, only one sequence is functional because it contains a complete coding sequence without frameshift mutations and/or internal stop codons, two conserved Cys residues in FR1 and FR3 regions and a proper RSS (fig. 1). The genomic structure of the single functional IGL variable region gene in zebra finch is shown in supplementary figure S1 (Supplementary Material online). The remaining IGVL genes either lacked the proper leader and/or RSS or were truncated in their 5′ or 3′ ends (supplementary fig. S2, Supplementary Material online), like the chicken IGVL pseudogenes (Reynaud et al. 1987). Only four sequences contained internal stop codons (supplementary fig. S2 and table S1, Supplementary Material online).
From the similarity search using five functional IGL constant sequences as queries (one IGCK and two IGCL from human, one IGCL from chicken, and one IGCS from frog), we identified only a single functional IGL constant–encoding gene in the zebra finch genome. This gene is located 4.5 kb downstream of the single functional variable region gene. ESTs and cDNA sequences confirm the presence of a single functional IGL constant–encoding gene in zebra finch because the identified ESTs and cDNA sequences align, with almost 100% identity and no gaps, to a single genomic position in the zebra finch genome that corresponds to the position of the single functional IGCL gene (supplementary table S2 and fig. S3, Supplementary Material online). To identify the IGL joining region gene in the zebra finch genome, we scanned for the conserved RSS in the 4.5 kb region between the functional variable and constant region genes because the joining region gene is too short (usually 12 amino acids in length) to be identified by Blast searches. Once the potential RSS was identified in this 4.5 kb region, we translated the nucleotide sequences at the 3′ end of the RSS into amino acids and compared the translated sequence with the human, chicken, and frog IGL joining region sequences, which were identified in our previous study (Das et al. 2008). Using this method, we identified a single functional IGL joining region gene in zebra finch.
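The RSS scan used to locate the joining region gene can be sketched as a pattern search. The sketch assumes the commonly cited consensus RSS (heptamer CACAGTG, nonamer ACAAAAACC), written here in the orientation found on the 5′ side of a joining gene (nonamer GGTTTTTGT, then spacer, then heptamer CACTGTG), a 12-bp spacer, and a small mismatch tolerance. The function names, the mismatch threshold, and the toy sequence are illustrative assumptions. The study screened the region manually and did not report these parameters.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_j_rss(seq: str,
               spacer: int = 12,
               max_mismatch: int = 2,
               nonamer: str = "GGTTTTTGT",
               heptamer: str = "CACTGTG") -> List[Tuple[int, int, int]]:
    """Scan for nonamer-spacer-heptamer signals upstream of a joining gene.

    Returns (start, end, mismatches) tuples with 0-based, end-exclusive
    coordinates; the candidate joining gene coding sequence begins at `end`.
    """
    seq = seq.upper()
    total = len(nonamer) + spacer + len(heptamer)
    hits = []
    for i in range(len(seq) - total + 1):
        non = seq[i:i + len(nonamer)]
        hep_start = i + len(nonamer) + spacer
        hep = seq[hep_start:hep_start + len(heptamer)]
        mismatches = hamming(non, nonamer) + hamming(hep, heptamer)
        if mismatches <= max_mismatch:
            hits.append((i, i + total, mismatches))
    return hits


if __name__ == "__main__":
    # Toy sequence: 5 bp of padding, a perfect RSS, then made-up coding bases.
    toy = "AAAAA" + "GGTTTTTGT" + "ACGTACGTACGT" + "CACTGTG" + "GTGGGTATTCGGCG"
    for start, end, mm in find_j_rss(toy, max_mismatch=0):
        print(start, end, mm, toy[end:])
```

The bases immediately 3′ of each hit would then be translated and compared with known joining region sequences, as described above.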
To characterize the isotype of the zebra finch IGL sequences, we used the cladistic molecular markers that we previously described (Das et al. 2008). Like the IGVL sequences of other tetrapods, the only functional variable region sequence of zebra finch lacks Ser or Thr at position 7 and does not possess a bulky aromatic residue (Phe or Tyr) at position 53 (fig. 1). Tetrapod IGVL sequences generally have a fairly conserved DEAD (Asp–Glu–Ala–Asp) motif in the FR3 region (Das et al. 2008). However, as in the chicken IGVL sequence (i.e., DEAV at positions 64–67), the Asp residue at position 67 is replaced by Val in zebra finch (fig. 1).
Consistent with the variable region sequence, the molecular markers in the joining and constant region sequences in zebra finch also categorize them as lambda light chain sequences (figs. 2 and 3). In addition, like mammalian lambda light chain genes, the RSS sequences flanking the single functional variable region gene and the joining region gene are interrupted by a 23-bp and a 12-bp spacer, respectively, in the zebra finch IGL locus. Hence, the molecular markers in the variable, joining, and constant region sequences in zebra finch indicate that like chicken (Sanders and Travis 1975) and duck (Magor et al. 1994), the zebra finch genome encodes only the lambda isotype of Ig.
Strikingly, the zebra finch IGL constant region sequence can be distinguished from the tetrapod IGL constant region sequences because it contains a unique insertion of a six amino acid stretch (QQQSST) (fig. 3). This insertion is confirmed by the fact that all ESTs/cDNAs sequences, when translated, include the additional six amino acids (see supplementary table S2 and fig. S3, Supplementary Material online). To further analyze this insertion, we generated a 3D model of the zebra finch IGCL protein and mapped the insertion on the 3D model (fig. 4). Our analysis predicts that the QQQSST insertion extends a loop located in the region between the IGVL and IGCL by approximately 9 Å (fig. 4; supplementary fig. S4, Supplementary Material online). Although this extended loop is predicted to reduce the distance between the Ig light and the heavy chains by almost 6 Å (supplementary fig. S4, Supplementary Material online), it does not seem to affect the contact between the different Ig chains. This supposition deserves to be experimentally tested.
In the kappa- and sigma-encoding locus of most tetrapods, multiple joining region genes are present in a cluster, followed by a single constant gene, whereas in the lambda-encoding locus, joining and constant genes occur as IGJL–IGCL blocks, which usually have multiple copies. Only chicken has one IGJL–IGCL block (Das et al. 2008). The IGL locus in zebra finch contains one functional IGVL gene, multiple pseudo-IGVL genes, and only one IGJL–IGCL block (fig. 5), like the chicken locus (Reynaud et al. 1985, 1987).
The comparison of the IGL locus between the chicken and the zebra finch shows that in these species both the number and the position of the IGL genes are very similar (fig. 5), despite the fact that these species diverged more than 100 Ma (Brown et al. 2008). The main difference between the two loci is the presence of a few pseudogenes with reverse orientation in chicken. In contrast to the general conservation of the IGL locus observed between the two bird species, comparison of the IGL locus between different mammalian species (i.e., human–horse) that diverged approximately 100 Ma showed many differences both in the constant and variable regions (fig. 5). In the constant region, both human and horse contain seven IGJL–IGCL blocks, but the distribution of functional genes and pseudogenes is different between the two species (fig. 5). In the variable region, both the number and the distribution of functional genes and pseudogenes also vary between the human (32 functional variable region genes and 42 pseudogenes) and the horse (25 functional variable region genes and 20 pseudogenes) lambda loci.
Our analysis of the IGL locus in zebra finch (Passeriformes) shows that the genome of this species encodes a single IGL isotype, that is, lambda (figs. 1–3), like the chicken (Galliformes). Taking into account that ducks (Anseriformes) contain only lambda IGL chains (Magor et al. 1994), we can safely suggest that birds encode a single IGL chain. If this idea proves to be true for all birds, then birds are exceptional among the studied tetrapod species, most of which encode at least two different types of IGL chains (Pilstrom 2002; Das et al. 2008). Indeed, three IGL isotypes (i.e., kappa, lambda, and sigma) are present in frogs, whereas reptiles and most mammals contain two IGL isotypes (i.e., kappa and lambda) (Criscitiello and Flajnik 2007; Das et al. 2008; Qin et al. 2008). Assuming that the genomes of birds encode only lambda IGL chains, the IGL kappa–encoding genes were lost before the divergence of Passeriformes and Galliformes (ca. 100 Ma) and after the divergence of the bird and the reptilian lineages (fig. 6).
Our data show that the genomic organization of the IGL locus is very similar between zebra finch and chicken, and both species contain a single set of functional IGVL, IGJL, and IGCL genes and multiple IGVL pseudogenes (fig. 5). The presence of a single IGVL gene and multiple pseudogenes has been reported for other bird species with the sole exception of muscovy ducks (Cairina moschata), which contain an additional potentially functional IGVL gene (McCormack et al. 1989). Given that the mallard duck (Anas platyrhynchos) contains only one functional IGVL gene (Lundqvist et al. 2006) and that ducks and chickens are more closely related than chickens and zebra finch (Hackett et al. 2008), we speculate that the presence of an additional functional IGVL gene in muscovy ducks is the result of a recent lineage-specific gene duplication. Therefore, the similar genomic organization between the IGL loci in zebra finch and chicken suggests that this type of organization was already present in the common ancestor of these two species and remained largely unchanged for a long evolutionary time.
The results of our study have three major implications for the evolution of the IGL lambda–encoding locus in tetrapods. First, the presence of a single functional IGJL–IGCL block is unique to birds because multiple copies of paralogous IGJL–IGCL blocks are present in the Ig lambda–encoding locus of all other tetrapods (Das et al. 2008). Second, the presence of multiple IGVL pseudogenes upstream of a single set of functional IGVL, IGJL, and IGCL genes and the similar number of IGVL pseudogenes in both birds contrast with the evolution of IGL lambda–encoding loci observed in other species. For example, in mammals, the ratio between functional genes and pseudogenes varies significantly, even between closely related species, like mice and rats or humans and macaques (Das et al. 2008; Das 2009). A prevalent hypothesis concerning large-scale changes in genomic sequences, including the Ig heavy chain–encoding loci, suggests that such alterations may be partially explained by the content of different repetitive elements (Straubinger et al. 1987; Mazzarella and Schlessinger 1997; Matsuda et al. 1998). To test this hypothesis, we compared the content and distribution of repetitive elements in the lambda-encoding loci of zebra finch, chicken, human, and horse. Our analysis revealed that the content of repetitive elements is much higher in mammals than in avian species (table 1). We speculate that the differences in the repetitive element content in the IGL locus may be one of the reasons for the higher gene content heterogeneity of the IGL locus in mammals as compared with birds. Third, our analysis together with previous results on the IGL lambda–encoding loci of multiple tetrapods (Das et al. 2008) suggests that the general organization of the lambda loci in amphibians and reptiles is more similar to that of mammals than that of birds.
Whether this type of organization occurred by random genomic drift (Nei 2007) or whether this genomic organization was selected due to functional constraints in the common ancestor of birds remains an open question.
The conservation of the genomic organization of the IGL loci between the two bird species raises an additional implication that concerns the generation of antibody diversity in birds. It has been shown that the chicken generates light chain diversity through intrachromosomal gene conversion, which uses the upstream pseudo-IGVL genes as donor sequences (Carlson et al. 1990; McCormack and Thompson 1990; McCormack et al. 1991). The presence of a single functional IGVL followed by multiple IGVL pseudogenes in zebra finch suggests that this species may also use a similar mechanism to generate light chain diversity. The limited number (seven) of ESTs/cDNAs that we identified seems to support this notion (see supplementary table S2 and fig. S5, Supplementary Material online). This analysis suggests that there are at least three stretches of nucleotides in the IGVL-encoding mRNA that could be the result of a gene conversion-like mechanism. The estimated sequence divergence between the genomic IGVL sequence and the ESTs/cDNAs IGVL fragments (table 2; supplementary table S3 and fig. S6, Supplementary Material online) suggests that the extent of diversity in zebra finch is similar to the diversity in other bird species (Magor et al. 1994; Lundqvist et al. 2006).
In conclusion, our analysis of the IGL locus in zebra finch and the comparison of the evolution of the IGL loci between birds and mammals indicate that the mammals and the birds have used different evolutionary processes to achieve the same physiological result, which is the generation of antibody diversity and ultimately the defense against pathogens.
We thank Max Cooper, Jan Klein, Parimal Majumder, Jianxu Li, Masafumi Nozawa, and Sayaka Miura for their valuable comments and suggestions. U.M. was supported by the Associate Student Incorporated Grant from California State University, Fullerton (CSUF). This work was supported by the National Institutes of Health (grant GM020293-35 to M.N.), by the CSUF (start-up money to N.N.), and by a CSUF Junior Faculty Research Grant (to N.N.).