Despite tremendous progress in understanding the nature of the immune system, the full diversity of an organism’s antibody repertoire is unknown. We used high-throughput sequencing of the variable domain of the antibody heavy chain from 14 zebrafish to analyze VDJ usage and antibody sequence. Zebrafish were found to use between 50% and 86% of all possible VDJ combinations and shared a similar frequency distribution, with some correlation of VDJ patterns between individuals. Zebrafish antibodies retained a few thousand unique heavy chains that also exhibited a shared frequency distribution. We found evidence of convergence, in which different individuals made the same antibody. This approach provides insight into the breadth of the expressed antibody repertoire and immunological diversity at the level of an individual organism.
The nature of the immune system’s antibody repertoire has been a subject of fascination for more than a century. This repertoire is highly plastic and can be directed to create antibodies with broad chemical diversity and high selectivity (1, 2). There is also a good understanding of the potential diversity available and the mechanistic aspects of how this diversity is generated. Antibodies are composed of two types of chains (heavy and light), each containing a highly diversified antigen-binding domain (variable). The V, D and J gene segments of the antibody heavy-chain variable genes go through a series of recombination events to generate a new heavy-chain gene (Fig. 1). Antibodies are formed by a mixture of recombination among gene segments, sequence diversification at the junctions of these segments, and point mutations throughout the gene (3). Estimates of immune diversity for antibodies or the related T cell receptors either have attempted to extrapolate from small samples to entire systems or have been limited by coarse resolution of immune receptor genes (4). However, certain very elementary questions have remained open more than a half-century after being posed (1, 5, 6): It is still unclear what fraction of the potential repertoire is expressed in an individual at any point in time and how similar repertoires are between individuals who have lived in similar environments. Moreover, because each individual’s immune system is an independent experiment in evolution by natural selection, these questions about repertoire similarity also inform our understanding of evolutionary diversity and convergence.
Zebrafish are an ideal model system for studying the adaptive immune system because in evolutionary terms they have the earliest recognizable adaptive immune system whose features match the essential human elements (7, 8). Like humans, zebrafish have a recombination activating gene (RAG) and a combinatorial rearrangement of V, D and J gene segments to create antibodies. They also have junctional diversity during recombination and somatic hypermutation of antibodies to improve specificity, and the organization of their immunoglobulin (Ig) gene loci approximates that of humans (9). In addition, the zebrafish immune system has only ~300,000 antibody-producing B cells, making it three orders of magnitude simpler than mouse and five orders simpler than human in this regard.
We developed an approach to characterize the antibody repertoire of zebrafish by analyzing complementarity-determining region 3 (CDR3) of the heavy chain, which contains the vast majority of immunoglobulin diversity (10, 11) and can be captured in a single sequencing read (Fig. 1). Using the 454 GS FLX high-throughput pyrosequencing technology allowed sequencing of 640 million bases of zebrafish antibody cDNA from 14 zebrafish in four families (Fig. 1B). Zebrafish were raised in separate aquaria for each family and were allowed to have normal interactions with the environment, including the development of natural internal flora. We chose to investigate the quiescent state of the immune system, a state where the zebrafish had sampled a complex but fairly innocuous environment and had established an equilibrium of normal immune function. mRNA was prepared from whole fish and we synthesized cDNA using primers designed to capture the entire variable region.
Between 28,000 and 112,000 useful sequencing reads were obtained per fish, and we focused our analysis on CDR3 sequences. Each read was assigned V and J by alignment to a reference with a 99.6% success rate (table S3); failures were due to similarity in some of the V gene segments. D was determined for each read by applying a clustering algorithm to all of the reads within a given VJ, and then aligning the consensus sequence from each cluster to a reference. D was assigned to 69.6% of reads; many of the unassignable cases had D regions mostly deleted. Both of the isotypes that are known to exist in zebrafish (IgM and IgZ) were found, and their relative abundance agrees with previous studies (12). Our analysis focused on IgM, which is the most abundant species; IgZ data are presented in figs. S3 and S4 (13).
There are 975 possible VDJ combinations in zebrafish (39 V × 5 D × 5 J = 975 VDJ). In any given fish, the VDJ combination coverage was at least 50% and in some cases at least 86% (Fig. 2). By using subsets of the full dataset to perform rarefaction studies, we demonstrated that our sampling of the VDJ repertoire was asymptoting toward saturation (Fig. 3A). Any VDJ classes that may be missing from the data are occurring at frequencies below 10⁻⁴ to 10⁻⁵. There was a commonality to the frequency distributions of VDJ usage that was independent of the specific VDJ repertoire for individual fish (Fig. 3B). Specifically, the majority of VDJ combinations in each fish were of low abundance, but a similarly small fraction, although different combinations for different fish, were found at high frequencies. This distribution could be used to constrain theoretical models of repertoire development.
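The coverage and rarefaction calculations described above can be illustrated with a minimal sketch. It assumes each read has already been reduced to a (V, D, J) index triple; the function names, the subsampling depths, and the uniformly random toy data are hypothetical and are not taken from the study.

```python
import random

N_V, N_D, N_J = 39, 5, 5            # segment counts quoted in the text
N_CLASSES = N_V * N_D * N_J         # 975 possible VDJ combinations


def coverage(reads):
    """Fraction of the possible VDJ classes observed at least once."""
    return len(set(reads)) / N_CLASSES


def rarefaction_curve(reads, depths, n_trials=20, seed=0):
    """Mean number of distinct VDJ classes in random subsamples of each depth."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        if depth > len(reads):
            raise ValueError("subsample depth exceeds number of reads")
        # Subsample without replacement and count the distinct classes seen.
        seen = [len(set(rng.sample(reads, depth))) for _ in range(n_trials)]
        curve.append(sum(seen) / n_trials)
    return curve


# Hypothetical toy data (uniform usage, unlike the skewed usage in real fish).
_rng = random.Random(1)
toy_reads = [(_rng.randrange(N_V), _rng.randrange(N_D), _rng.randrange(N_J))
             for _ in range(5000)]
toy_curve = rarefaction_curve(toy_reads, depths=[100, 1000, 5000])
```

A curve that flattens as the depth approaches the full read count suggests that additional sequencing would uncover few new VDJ classes.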
We next asked whether there is a quantitative relationship between the VDJ usage of different fish. The VDJ repertoire is a vector in which each element records the number of reads that map to a particular VDJ class. The dot product between VDJ repertoire vectors measures the degree of correlation between different fish (table S5 and Fig. 3C) [control experiments are described in (13)].
Most fish were uncorrelated in their VDJ repertoires; however, some fish were highly correlated, and three pairs of fish had correlation coefficients in the range 0.62 to 0.75. Some of these correlations appear to derive from the largest VDJ class in the repertoire (table S5A and Fig. 3C). When the fish-fish VDJ correlations were computed in the absence of the largest VDJ class, we discovered that although the largest correlations disappeared, a new set of correlations appeared between a larger fraction of the fish (table S5B and Fig. 3D). These correlations were mostly weaker than the previous correlations but still well above the statistical noise.
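As an illustration of comparing two repertoire vectors, the sketch below computes a correlation coefficient as the dot product of mean-centred, unit-length count vectors (the Pearson correlation). The exact normalization used in the study is described in its supporting material and is not reproduced here, so the centring step and the function name are assumptions.

```python
import math


def repertoire_correlation(counts_a, counts_b):
    """Pearson correlation of two VDJ count vectors of equal length.

    Assumption: vectors are mean-centred and scaled to unit length before
    taking the dot product; the study's own normalization may differ.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("repertoire vectors must have the same length")
    n = len(counts_a)
    mean_a = sum(counts_a) / n
    mean_b = sum(counts_b) / n
    centred_a = [a - mean_a for a in counts_a]
    centred_b = [b - mean_b for b in counts_b]
    norm_a = math.sqrt(sum(a * a for a in centred_a))
    norm_b = math.sqrt(sum(b * b for b in centred_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("correlation is undefined for a constant vector")
    dot = sum(a * b for a, b in zip(centred_a, centred_b))
    return dot / (norm_a * norm_b)
```

In practice each vector would have 975 elements, one per VDJ class, with the same class ordering for every fish.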
We were surprised to find measurable correlation in antibody repertoires between independent organisms. We created a model for random VDJ repertoire assembly, using simulated VDJ distributions that replicated the actual measured distributions and coverage fractions. The correlations in these simulated VDJ repertoires are all near zero, and the probability of two fish having a highly correlated random repertoire is less than 10⁻⁶ (Fig. 3C and D). Thus, even though the VDJ repertoire is believed to be generated by a series of random molecular events within independent individual cells, in zebrafish the VDJ repertoire appears substantially structured and nonrandom on a global scale. It is possible that the source of this structure is simply convergent evolution: The fish see a similar enough environment that selection in their quiescent immune systems converges to correlated VDJ usage. It is also possible that this distribution reflects bias in the VDJ recombination mechanisms, which would have important implications for antibody diversity space and would suggest that the number of solutions to a given antigen recognition problem, or at least the number that are readily evolvable, may be much smaller than previously assumed.
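One simple way to build a null model of this kind is a permutation test: shuffling which VDJ class carries which abundance preserves the frequency distribution and the coverage fraction while destroying any shared class identity. The sketch below is an assumed stand-in for the simulation described above, not the authors' procedure; all names and the toy vector are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cx = [a - mx for a in x]
    cy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))
    return sum(a * b for a, b in zip(cx, cy)) / denom


def null_correlations(counts_a, counts_b, n_trials=200, seed=0):
    """Correlations after randomly reassigning abundances to VDJ classes."""
    rng = random.Random(seed)
    shuffled = list(counts_b)
    null = []
    for _ in range(n_trials):
        rng.shuffle(shuffled)       # same abundances, random class labels
        null.append(pearson(counts_a, shuffled))
    return null


def empirical_p(observed, null):
    """One-sided empirical P value with the usual +1 correction."""
    return (1 + sum(r >= observed for r in null)) / (1 + len(null))


# Hypothetical skewed repertoire: a few dominant classes, many rare ones.
toy_counts = [1000, 500, 200] + [1] * 972
toy_null = null_correlations(toy_counts, toy_counts)
toy_p = empirical_p(0.99, toy_null)
```

An empirical P value from a finite number of shuffles cannot fall below 1/(n_trials + 1), so resolving probabilities as small as those quoted above would require far more trials or an analytic argument.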
Summarizing the VDJ repertoire with a simple count of the number of different VDJ combinations neglects the variation in abundance of different VDJ species. Ecologists have the same problem in characterizing species diversity; they refer to the counting approach as species richness, and have developed other methods to characterize variation of abundance, which they term “heterogeneity” (14). The most popular approach to characterize heterogeneity is based on information theory, specifically the Shannon-Weaver entropy, which summarizes the frequency distribution in a single number (14). The VDJ repertoire entropies generally varied between 3.1 and 7.7 bits for individual fish. Exponentiating the entropy indicates the effective size of the VDJ repertoire, and this varied between 9 and 200 with an average of 105, or an average effective VDJ repertoire coverage of about 9%. This can be interpreted as the fraction of highly expressed VDJ classes.
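The entropy and effective repertoire size can be computed directly from the per-class read counts. The sketch below uses base-2 logarithms so that the entropy is in bits; with that convention 2^3.1 ≈ 8.6 and 2^7.7 ≈ 208, consistent with the quoted range of roughly 9 to 200. The function names are illustrative only.

```python
import math


def shannon_entropy_bits(counts):
    """Shannon entropy, in bits, of the frequency distribution of counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Classes with zero reads contribute nothing to the entropy.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def effective_size(counts):
    """Effective number of classes: 2 raised to the entropy in bits."""
    return 2 ** shannon_entropy_bits(counts)
```

For a uniform distribution the effective size equals the number of classes; the more skewed the abundances, the further it falls below the simple count of observed classes.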
Whereas the VDJ repertoire provides a coarse view of immunological diversity, each VDJ class can contain a large number of distinct individual antibodies that differ as a result of hypermutations and junctional changes. We characterized the antibody repertoire by using quality threshold clustering of Smith-Waterman alignments to group similar reads together; each cluster defines an antibody. Performing this analysis on control data with well-defined sequence clones allowed us to calibrate the clustering algorithm and separate true hypermutation diversity from sequencing errors. Many VDJ combinations included a large number of distinct antibodies. We found that the overall distribution of the abundances of the antibodies followed an apparent power law with scaling parameter 2.2, and this was consistent among all fish over two decades (Fig. 4B). This behavior may represent an important signature of the underlying dynamics of the adaptive immune system. It was not observed for either the control data or the VDJ distributions, and thus we ruled out the possibility that it is an artifact of polymerase chain reaction (PCR) bias.
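The text does not state how the scaling parameter was fitted. As a hedged illustration, the sketch below uses the standard maximum-likelihood estimator for a continuous power law above a lower cutoff, alpha = 1 + n / sum(ln(x_i / x_min)); the cutoff, the function names, and the synthetic data are assumptions, and discrete cluster abundances would need a discrete estimator or a correction.

```python
import math
import random


def powerlaw_alpha_mle(values, xmin):
    """Maximum-likelihood exponent of a continuous power law for x >= xmin."""
    tail = [v for v in values if v >= xmin]
    log_sum = sum(math.log(v / xmin) for v in tail)
    if not tail or log_sum == 0:
        raise ValueError("need values strictly above xmin to fit an exponent")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(alpha, xmin, n, seed=0):
    """Draw n synthetic values from p(x) ~ x**(-alpha), x >= xmin."""
    rng = random.Random(seed)
    # Inverse-transform sampling; 1 - random() lies in (0, 1].
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


# Synthetic check: recover the exponent quoted in the text from fake data.
synthetic = sample_powerlaw(alpha=2.2, xmin=1.0, n=20000)
alpha_hat = powerlaw_alpha_mle(synthetic, xmin=1.0)
```

Fitting by maximum likelihood rather than by a straight line on a log-log histogram avoids the bias introduced by binning sparse tail counts.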
There are several ways to use these data to estimate the number of unique antibodies per fish. The first is to perform rarefaction studies and determine whether the number of independent clusters tends to saturate. We did this and found that the saturation occurs at between ~1200 and 3500 unique antibodies per fish (Fig. 4A). Another way is by applying approaches used in ecology to estimate population sizes and diversity, namely sample and resample techniques (15). This yielded an estimate of between 1200 and 3700 unique antibodies per fish, whether applied blindly or using knowledge of the antibody abundance distributions (Fig. 4C). Both approaches yield lower bounds on the true antibody diversity because antibodies that differ by only one or two mutations will be incorporated into the same cluster. We corrected for this effect by reanalyzing the data within each cluster with zero error tolerance, only matching exact reads. The largest clusters each had several subclusters with more than two reads each, and the control sequence data indicated that probably half of those subclusters are real while the other half are artifacts due to sequencing error. By combining this stringent method of finding small differences in common sequences with the more permissive method of clustering rare sequences with less similarity together (thereby having tolerance to sequencing errors on rare transcripts) we estimated that the upper limit of heavy-chain antibody diversity is within 50% of the lower bound estimates, or between 5000 and 6000 antibodies in an individual fish.
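A sample-and-resample estimate can be sketched with the Chapman form of the Lincoln-Petersen capture-recapture estimator, a standard ecological method that is used here as an assumed example and is not necessarily the estimator applied in the study. The reads are split into two random halves, and the overlap in the antibody clusters seen in each half indicates how much of the repertoire remains unseen. All names are hypothetical.

```python
import random


def chapman_estimate(sample_a, sample_b):
    """Chapman estimator of the total number of distinct items.

    sample_a, sample_b: iterables of cluster identifiers seen in two
    independent samples of the same repertoire.
    """
    a, b = set(sample_a), set(sample_b)
    recaptured = len(a & b)         # clusters seen in both samples
    return (len(a) + 1) * (len(b) + 1) / (recaptured + 1) - 1


def split_reads(cluster_ids, seed=0):
    """Randomly split per-read cluster identifiers into two halves."""
    rng = random.Random(seed)
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Toy example: 60 clusters in each sample, 20 of them shared.
toy_estimate = chapman_estimate(range(0, 60), range(40, 100))
```

This estimator assumes every cluster is equally likely to be sampled. The strongly skewed abundances reported here violate that assumption, which biases the estimate downward and is one reason to treat such numbers as lower bounds.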
To see how often repertoires converged to the same antibody, we searched for sequences that are shared between fish. Although there were no antibodies common to all fish, some antibodies were shared between smaller groups of fish (Fig. 4D). These cases of convergent evolution were more frequent than one would expect from a random usage model, with P values as low as 10⁻¹⁵. Unexpectedly, different individuals shared heavy chains that were identical in the region we sequenced, even up to hypermutation. Specifically, there were 254 unique sequences shared between two fish and 2 unique sequences shared between five fish. These data illustrate the powerful forces of selection and perhaps can be used to estimate evolutionary dynamics in this system.
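Counting how many fish share each sequence is a simple set operation. The sketch below assumes exact string identity of the sequenced region and tallies sequences found in exactly k fish; whether the counts quoted above mean "exactly k" or "at least k" is not stated, so that choice, the function name, and the toy data are assumptions.

```python
from collections import Counter


def sharing_histogram(repertoires):
    """Map k -> number of unique sequences found in exactly k fish.

    repertoires: dict of fish identifier -> iterable of sequence strings.
    """
    fish_per_sequence = Counter()
    for sequences in repertoires.values():
        # Use a set so repeated reads within one fish count only once.
        fish_per_sequence.update(set(sequences))
    return Counter(fish_per_sequence.values())


# Hypothetical toy repertoires, not real sequences.
toy_repertoires = {
    "fish_A": ["x", "y", "y"],
    "fish_B": ["y", "z"],
    "fish_C": ["y", "w", "z"],
}
toy_sharing = sharing_histogram(toy_repertoires)
```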
In conclusion, we have performed a comprehensive measurement of the heavy-chain antibody repertoire of zebrafish. We discovered that the abundance distributions of both the VDJ repertoire and antibody heavy-chain diversity were similar between individuals, that VDJ usage is not uniform, that individuals can have highly correlated VDJ repertoires, and that convergent evolution of identical heavy-chain sequences is unexpectedly common. With the rapid advance of sequencing throughput, it will soon be possible to make similar measurements on mice and humans. These organisms use the same molecular mechanisms for repertoire generation as fish; thus, we predict that they may also show similar distributions of antibody frequencies.
We thank W. Talbot for useful conversations and the generous loan of equipment and N. Neff for assistance with sequencing. IgM and IgZ sequence and quality-score files are available on the NIH short-read archive, with accession number SRA008134. This research was supported by the NIH Director’s Pioneer award (S.R.Q.), the Arthritis Foundation Postdoctoral Fellowship (N.J.), and an NSF graduate fellowship (J.A.W.).