The complex repertoire of immune receptors generated by B and T cells enables recognition of diverse threats to the host organism. In this work, we show that massively parallel DNA sequencing of rearranged immune receptor loci can provide direct detection and tracking of immune diversity and expanded clonal lymphocyte populations in physiological and pathological contexts. DNA was isolated from blood and tissue samples, a series of redundant primers was used to amplify diverse DNA rearrangements, and the resulting mixtures of barcoded amplicons were sequenced using long-read ultra deep sequencing. Individual DNA molecules were then characterized on the basis of DNA segments that had been joined to make a functional (or nonfunctional) immune effector. Current experimental designs can accommodate up to 150 samples in a single sequence run, with the depth of sequencing sufficient to identify stable and dynamic aspects of the immune repertoire in both normal and diseased circumstances. These data provide a high-resolution picture of immune spectra in normal individuals and in patients with hematological malignancies, illuminating, in the latter case, both the initial behavior of clonal tumor populations and the later suppression or re-emergence of such populations after treatment.
Antigen receptors with diverse binding activities are the hallmark of B and T cells of the adaptive immune system in jawed vertebrates and are generated by genomic rearrangement of variable (V), diversity (D), and joining (J) gene segments separated by highly variable junction regions (1). Initial calculations indicate that the combinatorial and junctional possibilities that contribute to the human immune receptor repertoire greatly exceed the total number of peripheral T or B cells in an individual (2). One study in which small subsets of rearranged T cell receptor (TCR) subunit genes were extensively sequenced using a few segment-specific primers yielded extrapolations for the full TCR repertoire corresponding to 2.5 × 10⁷ distinct TCRα-TCRβ pairs in the peripheral blood of an individual (3). Extensive repertoire analyses for the human B cell compartment have been more limited, although small-scale studies and focused analysis of immunoglobulin (Ig) class subsets such as IgE have been performed (4, 5). Advanced sequencing methods have recently been used to analyze B cell receptor diversity in the relatively simple model immune system in zebrafish (6). Against a background of continually generated novel DNA sequences, expanded clones of B cells with useful antigen specificities persist over time to enable rapid responses to antigens previously detected by the immune system. Systematic means for detection of such expanded clones in human beings would open much of our immunity to specific analysis and tracking, including measurement of clonal population sizes, anatomic distributions, and changes in response to immunological events (7).
In contrast to healthy immune systems, malignancies of B- or T-cell origin typically express a single dominant clonal Ig or TCR receptor. A variety of assays have been used to detect the presence of B cell clonality for diagnosis of lymphomas and leukemias, including analysis of Ig light chain gene restriction and Southern blotting or sizing of polymerase chain reaction (PCR) products from rearranged Ig or TCR loci (8, 9). While adequate for many applications, these strategies make limited use of the high information content inherent in rearranged immune receptor gene sequences and can give indeterminate results. A recent study using deep sequencing of clonal Ig heavy chain receptor genes (IgH) in chronic lymphocytic leukemia revealed unexpected intraclonal heterogeneity in a subset of cases, showing that fundamental features of leukemic cell populations have escaped notice using prior approaches (10). Detection of more subtle clonal populations (for example, to follow the response of lymphomas or leukemias to treatment) currently relies on time- and labor-intensive multiparameter flow cytometry or custom-designed patient- and clone-specific real-time PCR assays (11–13). Early diagnostic screening approaches may benefit from generalized and more efficient clonal detection. Indeed, a recent population-based epidemiological study showed that small amplified B-cell populations can be seen in almost all individuals who go on to develop chronic lymphocytic leukemia, further underscoring the importance of assessing lymphocyte clonality in human specimens (14).
Detection and analysis of clonality is also of fundamental interest in characterizing and tracking both normal and pathogenic immune reactions. For protective and healthy humoral immune responses, high-resolution analysis of immune receptor clonality and evolution offers the potential for definitive detection and monitoring of effective immune responses to vaccination and specific infections (15), while for some autoimmune disorders this type of analysis could facilitate diagnosis, long-term therapeutic monitoring strategies and, eventually, specific interventions (16).
Using a bar-coding strategy to allow pooling of multiple libraries of rearranged immunoglobulin heavy chain (IgH) V-D-J gene loci from many human blood samples, we have performed high-throughput pyrosequencing to characterize the B cell populations in a series of human clinical specimens (17). Deep sequencing of immune receptor gene populations offers specific and detailed molecular characterization as well as high sensitivity for detecting sequences of interest, and should help to transform our understanding of the human immune system, while aiding in diagnosis and tracking of lymphoid malignancies.
We amplified rearranged IgH loci in human blood samples using BIOMED-2 nucleic acid primers adapted for high-throughput DNA pyrosequencing. A unique 6-, 7-, or 10-nucleotide sequence “barcode” in the primers used for a particular sample allowed pooling and bulk sequencing of many libraries together, and subsequent sorting of sequences from each sample (Fig. 1, Supplementary Table S1). Patient specimens in our initial 2 replicate experiments included peripheral blood of three healthy individuals, with experimental replicates of one individual’s blood sample at each of two different time points 14 months apart; tissue specimens from patients with lymphomas; and peripheral blood from patients with chronic lymphocytic leukemia. We also studied samples generated by serial 10-fold dilutions of a chronic lymphocytic leukemia peripheral blood specimen into a healthy control peripheral blood sample, to assess the sensitivity of the sequencing approach for detecting small numbers of clonal B cells among a background B cell population (Table 1, Supplementary Table S2). From all specimens pooled for Experiment 1, we obtained 299,846 different IgH rearrangement sequences, while Experiment 2 yielded 207,043 sequences. All sequence reads used for further analysis were full-length IgH amplicons extending from the V gene segment FR2 framework region primer to the J primer region.
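The barcode-based sorting step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each read begins with its sample barcode and that barcodes match exactly, and the barcode sequences, sample names, and the `demultiplex` function are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Assign each read to a sample by its leading barcode and strip the barcode.

    Assumes exact barcode matches at the start of the read (an illustrative
    simplification; real data would need tolerance for sequencing errors).
    """
    # Try longer barcodes first so a short barcode that happens to be a
    # prefix of a longer one cannot shadow it.
    lengths = sorted({len(b) for b in barcode_to_sample}, reverse=True)
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        for n in lengths:
            sample = barcode_to_sample.get(read[:n])
            if sample is not None:
                by_sample[sample].append(read[n:])
                break
        else:
            unassigned.append(read)
    return dict(by_sample), unassigned


# Hypothetical 6-, 7-, and 10-nucleotide barcodes and toy reads.
BARCODES = {"ACGTAC": "healthy_1", "TGCATGA": "CLL_1", "GATTACAGAT": "lymphoma_1"}
READS = ["ACGTACGGGTTT", "TGCATGACCCAAA", "GATTACAGATTTTT", "NNNNNNNNNN"]
samples, unassigned = demultiplex(READS, BARCODES)
```

Reads whose prefix matches no barcode are returned separately rather than silently dropped, which makes it easy to check how much of a run fails to sort.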
An overview of the IgH amplicon sequences in the data sets from Experiments 1 and 2 is shown in Fig. 2, with each point in the 2-dimensional grid for each sample indicating the V gene segment and the J gene segment used by a particular IgH V-D-J rearrangement. The size and color warmth of the circle at each point indicates what proportion of all sequences in the sample had the indicated V and J gene segment usage. Healthy peripheral blood lymphocyte populations showed a diverse use of different V and J gene segments, while samples that contained clonal IgH populations corresponding to lymphomas or chronic lymphocytic leukemia specimens were readily identified. Plots of the data showing the V, D and J segment usage are shown in Supplementary Figure S1.
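The quantity plotted at each grid point, the proportion of a sample's sequences using a given V and J segment pair, can be computed along these lines. This is a sketch under the assumption that each sequence has already been annotated with its V, D, and J segments; the `vj_usage` function and the toy sample are illustrative only.

```python
from collections import Counter


def vj_usage(rearrangements):
    """Return {(V, J): fraction of sequences} for a list of (V, D, J) tuples."""
    counts = Counter((v, j) for v, _d, j in rearrangements)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Toy sample dominated by one rearrangement, as a clonal specimen would be.
SAMPLE = [("IGHV1-8", "IGHD2-8", "IGHJ4")] * 8 + [
    ("IGHV3-15", "IGHD3-9", "IGHJ6"),
    ("IGHV3-23", "IGHD3-3", "IGHJ6"),
]
usage = vj_usage(SAMPLE)
dominant_pair = max(usage, key=usage.get)
```

In a diverse sample no single pair would dominate, whereas a clonal specimen concentrates most of the proportion in one grid point.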
Human cancers are clonal proliferations of cells that have sustained mutational damage leading to dysregulated proliferation, survival, and response to the extracellular environment (18). Molecular clonality testing of IgH receptor and TCR gamma loci, accomplished with the use of PCR and capillary electrophoresis, is a helpful adjunct to morphological and immunophenotypic evaluation of suspected B or T cell malignancies (19). Blood or bone marrow samples from some patients give indeterminate or oligoclonal patterns of reactivity for a variety of reasons: Few lymphocytes may be present, there may be genuine oligoclonal lymphocyte populations, or clonal lymphocytes may have separately detected rearrangements from two chromosomes. We compared the results from DNA sequencing of the products of independent PCR replicates for such samples. One such difficult case is represented by the bone marrow and liver specimens from patient 5 in Table 1. The patient had undergone liver transplantation and subsequently developed a large B cell lymphoma in the liver as a manifestation of post-transplant lymphoproliferative disorder, a condition in which immunosuppression leads to B or T cell lymphomas that are typically associated with Epstein-Barr virus infection (Fig. 2). The patient’s bone marrow showed small lymphoid aggregates that were shown to contain B cells on morphological and immunohistochemical stain evaluation. Capillary electrophoresis sizing of VDJ rearrangements in the bone marrow sample gave support for a clonal population, but it was unclear whether this population represented involvement of the patient’s bone marrow by the lymphoma seen in the liver. The sequencing data resolved this uncertainty, showing no relationship between the liver lymphoma clone associated with IGHV1-8*01-IGHD2-8*01-IGHJ4*02 and the bone marrow B cells. Instead, a separate clonal B cell population that used gene segments IGHV3-15*04-IGHD3-9*01-IGHJ6*02 was present in the bone marrow. 
Patients with post-transplant lymphoproliferative disorder can develop multiple independent malignant clones, making the extra information provided by sequencing analysis of replicate PCR products particularly helpful. The other VDJ rearrangements detected in the patient’s bone marrow differed between the 2 replicate experiments, indicating the presence of small numbers of non-clonal B cells in the specimen. Another diagnostically challenging case, the chronic electrophoresis analysis. A consistent pattern was seen with deep sequencing of the sample. Finally, the two distinct V-D-J rearrangements in a lymph node from patient 3 indicated that there were two separate clonal B cell populations in the specimen, a conclusion supported by morphological and immunophenotypic evidence of two different B cell lymphomas (follicular lymphoma and small lymphocytic lymphoma) in the tissue.
To evaluate the sensitivity of deep sequencing for detection of a clonal lymphoid population in a background of polyclonal cells, we performed serial 10-fold dilutions of a known clonal chronic lymphocytic leukemia blood sample into normal peripheral blood. The percentage of clonal sequences detected at each dilution is shown in Fig. 3 for Experiment 2, demonstrating detection down to a 1:10,000 dilution. This represents detection of 0.5 cells per microliter of blood when between 7,500 and 14,000 sequences are measured per sample of DNA template derived from approximately 10 microliters of blood.
We next evaluated clinical specimens from patients with chronic lymphocytic leukemia who had undergone total lymphoid irradiation and anti-thymocyte globulin therapy followed by HLA-identical allogeneic peripheral blood progenitor cell transplantation (20–21), and compared the results of deep sequencing analysis to results from patient clone-specific real-time PCR assays (Table 2). In these experiments, the patients with chronic lymphocytic leukemia were different from the patients tested in our initial experiments described in Table 1, and the minimal residual disease sequencing was performed in a separate instrument run. Real-time PCR assay results were reported as confidently positive if at least 100 copies/μg of template DNA were detected. Table 2 demonstrates that all specimens showed agreement between the high-throughput sequencing data and real-time PCR assay, although for the lowest confidently positive real-time PCR result for chronic lymphocytic leukemia patient A, the clone was detected in only one of the two high-throughput sequencing sample replicates.
To identify potentially expanded B cell clones within healthy peripheral blood, we looked for independent occurrences of “coincident” IgH sequences (identical V, D, and J segments, and identical V-D and D-J junction sequences) in independent pools from the same individual. Such coincidences could have resulted from clonally related cells; indeed, clonal relationships are likely for a majority of these coincidences, given both the diversity of the potential repertoire of IgH rearrangements and the absence of rearrangements found in this individual from comparable sequence samples from different individuals. We note that any population with a limited IgH rearrangement repertoire would be expected to show large numbers of such coincidences. Instead, we observed only small numbers of coincident sequences in our data. From six independent amplification pools derived from the blood of a single individual at one time point, we observed only 19 potential coincidences from a total of 10,921 distinct IgH rearrangements sequenced. Seven independent amplification pools from a second time point (14 months later) gave comparable results (25 potential coincidences from a total of 7,450 distinct rearrangements sequenced) (Table 3).
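The coincidence criterion stated above (identical V, D, and J segments plus identical V-D and D-J junctions, seen in independent pools) can be sketched as below. The record layout, field names, and toy junction sequences are assumptions made for illustration, not the authors' data format.

```python
from itertools import combinations


def make_read(v, d, j, vd_junction, dj_junction):
    """Build a toy annotated read (hypothetical record layout)."""
    return {"v": v, "d": d, "j": j,
            "vd_junction": vd_junction, "dj_junction": dj_junction}


def rearrangement_key(read):
    """Identity used to call two reads coincident: same V, D, and J segments
    and the same V-D and D-J junction sequences."""
    return (read["v"], read["d"], read["j"],
            read["vd_junction"], read["dj_junction"])


def coincidences(pools):
    """Return {(i, j): keys found in both pool i and pool j}.

    Repeats within one pool are collapsed first, because only occurrences in
    independent amplification pools count as coincidences.
    """
    distinct = [{rearrangement_key(r) for r in pool} for pool in pools]
    return {(i, j): distinct[i] & distinct[j]
            for i, j in combinations(range(len(distinct)), 2)}


A = make_read("IGHV3-23", "IGHD3-3", "IGHJ6", "ACG", "TT")
B = make_read("IGHV1-8", "IGHD2-8", "IGHJ4", "G", "CCA")
C = make_read("IGHV3-23", "IGHD3-3", "IGHJ6", "ACT", "TT")  # same segments as A, different junction
POOLS = [[A, B, A], [C, A], [B]]
shared = coincidences(POOLS)
total_coincidences = sum(len(keys) for keys in shared.values())
```

Read C shares all three segments with A but differs at the V-D junction, so it does not count as a coincidence with A.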
It is noteworthy that we see only slightly fewer coincidences when comparing aliquots between the two time points (0.76 coincidences per sample comparison versus 1.22 for comparisons within the same time point). Although the difference is statistically significant (P<0.05; Fisher’s exact test, 1-tailed), the modest ratio between intra-temporal and inter-temporal coincidence levels indicates a considerable degree of persistence in the clonal populations in this individual. The numbers of coincident sequences observed when comparing sequence data from any two aliquots provide strong evidence for substantial diversity in the IgH repertoire. Minimal estimates obtained using approaches similar to the “birthday problem” in probability theory (22) yield a lower bound of approximately 2 million different IgH rearrangements in these samples. The analysis leading to this lower bound estimate does not yield an upper bound on repertoire; in particular, it is not possible from these data to rule out a category of IgH rearrangements that are very diverse but present in single or low copy number in the approximately 2×10⁹ B cells in peripheral blood. Thus the true complexity of the blood IgH repertoire could certainly be much greater than 2×10⁶.
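The birthday-problem reasoning can be illustrated with a simple point estimate; this is a sketch of the general idea and not necessarily the calculation the authors performed. If pools i and j contain n_i and n_j distinct rearrangements drawn from N equally likely rearrangements, the expected number of coincidences between them is roughly n_i × n_j / N. Unequal frequencies can only raise the coincidence rate, so dividing the summed pair products by the observed coincidences gives a conservative minimum for N. The function name and the pool sizes below are hypothetical.

```python
def repertoire_lower_bound(pool_sizes, total_coincidences):
    """Point estimate of the minimum repertoire size N.

    Uses N >= (sum over pool pairs of n_i * n_j) / observed coincidences,
    which follows from the equal-frequency case of the birthday problem.
    """
    if total_coincidences <= 0:
        raise ValueError("at least one coincidence is needed for a finite estimate")
    pair_products = sum(
        pool_sizes[i] * pool_sizes[j]
        for i in range(len(pool_sizes))
        for j in range(i + 1, len(pool_sizes))
    )
    return pair_products / total_coincidences


# Hypothetical example: three pools of 1,000 distinct rearrangements each,
# with 3 coincidences observed across all pairwise comparisons.
estimate = repertoire_lower_bound([1000, 1000, 1000], 3)
```

Because the coincidence count is itself a noisy observation, a more cautious bound would substitute an upper confidence limit for the count in place of the observed value.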
In addition to the total complexity of the IgH pool, it is of interest to evaluate the degree to which clonal cell populations above a certain size are present in normal peripheral blood. No sequence was identified in more that described above, we can derive an upper bound for the most abundant IgH rearrangements. For the healthy individual examined in these experiments, this analysis yields a maximum contribution to the sequence pool of 1/1000 for any individual clone (P<0.01) in this individual (23).
Within these experimental estimates of the lower bound of IgH repertoire size, and the upper bound of largest clone size, a variety of combinations of clonally expanded populations of different sizes could give rise to our observed data. Estimation of the upper limit of the IgH repertoire would require much more extensive sequencing to evaluate the extent of single-copy or very small clonal expansions of B cells, and would require characterization of a significant fraction of the blood volume of a healthy donor, which presents ethical concerns. It should be noted that this analysis of the blood does not exclude the possibility that other tissues may contain B cells that are clonally related to circulating cells, and does not address the exchange of B cells between the blood and other hematolymphoid compartments of the body. The sequences found in multiple replicates performed with blood from the healthy donor characterized in Table 3 are presented in Supplementary Table S4.
We extended our analysis of healthy human patients to an additional 23 subjects ranging in age from 19 to 79 years by sequencing sixfold replicate samples of peripheral blood IgHs from each individual. We detected considerable inter-individual variation in the number of expanded lymphocyte clones and expanded clone sizes (Table 4). Using an analysis similar to that performed for the healthy donor in Table 3, we calculated the minimum IgH repertoire size and the largest clone size for these additional subjects. Our data confirm that at least 15 of the 23 additional normal human samples had IgH pools of greater than 1,000,000 different rearrangements. Although the additional eight individuals may have comparable diversity, the lower bound estimates were somewhat lower, relative to the other 15 subjects, because of the greater numbers of weakly amplified clones detected and the lower total yield of sequences from these samples. For a majority of the healthy samples, no sequence appeared in more than two of six sequenced DNA aliquots; for these individuals, this places an upper limit of 0.1–0.3% of the measured B cell repertoire that could be dedicated to any single clone, similar to the results from the individual in Table 3.
Two of the apparently healthy blood donors in our sample set had expanded B cell clones that were large enough to be detected in all 6 sequencing replicates. The size of these larger clones can be estimated by the expanded clonal sequence’s proportion of total sequences obtained from these patients: For the 54-year-old patient this value was 0.15%, while for the 68-year-old patient the value was 1.5% of the total sequences.
These data demonstrate that detection of clonal populations that make up greater than 0.1% of the total B cell population is readily possible with the small blood samples used for this work (less than 0.1 ml of blood was sufficient for the multiple replicates from these specimens). Further, these results suggest that searches for persistent pre-malignant or pathological clonal populations at the 0.1% level might be facilitated in certain cases by the limited set of amplified candidates in the normal repertoire.
Deep sequencing data sets of this kind should enable explicit detection of preferentially rearranged or selected combinations of V, D or J segments in IgHs in specific populations. Using the healthy control specimens in our current data sets, we have seen evidence of preferential pairwise segment associations for at least 3 combinations (D2-2 with J6, D3-22 with J3, and D3-3 with J6) across the group of individuals. Overrepresentation of these D/J combinations (i.e., a frequency of the DJ combination that is greater than the product of the D and J frequencies) was observed in 122/138, 113/138, and 119/138 sequenced aliquots, respectively. With a false discovery rate of less than 10⁻⁷ (no examples of overrepresentation in this number of aliquots were found in 10⁷ randomly shuffled datasets), these were the most consistent non-random associations seen with the dataset (certainly other associations might emerge from a larger dataset). We interpret these results as reflecting non-random character in rearrangement or selection in this specific population of individuals (Stanford’s blood donor pool in a fixed time frame). One could certainly expect (and it would be of great interest to follow) different specific nonrandom characters in other populations with distinct histories of community immune response and genetic compositions.
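The overrepresentation criterion stated above (a DJ frequency greater than the product of the D and J frequencies), together with a shuffled control, can be sketched as follows. The shuffling scheme shown here, permuting J labels within an aliquot so that the D and J frequencies are preserved, is an assumption; the text does not specify how the randomized datasets were generated. Function names and the toy aliquot are illustrative.

```python
import random
from collections import Counter


def dj_overrepresented(pairs, d, j):
    """True if freq(d, j) exceeds freq(d) * freq(j) in a list of (D, J) pairs."""
    n = len(pairs)
    joint = Counter(pairs)[(d, j)] / n
    d_freq = sum(1 for x, _ in pairs if x == d) / n
    j_freq = sum(1 for _, y in pairs if y == j) / n
    return joint > d_freq * j_freq


def count_overrepresented(aliquots, d, j):
    """Number of aliquots in which the (d, j) combination is overrepresented."""
    return sum(dj_overrepresented(a, d, j) for a in aliquots)


def shuffled(pairs, rng):
    """Permute J labels (assumed null model): breaks any D-J association
    while keeping the D and J frequencies of the aliquot unchanged."""
    ds = [x for x, _ in pairs]
    js = [y for _, y in pairs]
    rng.shuffle(js)
    return list(zip(ds, js))


# Toy aliquot: D2-2 pairs with J6 more often than independence would predict.
ALIQUOT = ([("D2-2", "J6")] * 4 + [("D2-2", "J4")]
           + [("D3-10", "J6")] + [("D3-10", "J4")] * 4)
observed = count_overrepresented([ALIQUOT, ALIQUOT], "D2-2", "J6")
control = shuffled(ALIQUOT, random.Random(0))
```

Repeating the count on many shuffled copies of the full aliquot set and asking how often the shuffled count reaches the observed count gives an empirical estimate of how likely such consistency is by chance.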
Modern DNA sequencing methods open a new window of investigation into the complex gene rearrangements necessary for human lymphocyte function. Our results using multiplexed barcoded IgH sequencing of multiple replicate samples of blood from 24 healthy subjects represent the most extensive characterization to date of human B cell populations. For a majority of the healthy individuals, our results were sufficient to place a lower limit of 1,000,000 on the number of distinct IgH rearrangements in circulating lymphocytes, and an upper bound of 0.1–0.3% of total B cells on the representation of any single clone within the repertoire. A small number of individual amplified clones with greater representation were observed in healthy individuals in our sample set, with the largest clonal populations (seen in patients aged 54 and 68 years) accounting for 0.15–1.5% of total sequences of the observed sequence space from circulating B cells. These larger expanded clones may be the result of physiological responses to environmental antigens or pathogens; alternatively, these could represent the precursors to lymphoid malignancies, such as chronic lymphocytic leukemia, that have a strong association with advanced patient age. Recent and older literature describing monoclonal B cell lymphocytosis (MBL) using multi-parameter flow cytometry assays to detect B cells with aberrant surface protein expression has indicated that between 5 and 12% of adults have these atypical B cell populations, and essentially all patients who develop chronic lymphocytic leukemia can be shown to have had preceding MBL (14, 24, 25). An important caveat is that most patients who show MBL do not go on to develop chronic lymphocytic leukemia (24, 26).
High-throughput immune receptor sequencing provides an unprecedented degree of sensitivity and specificity in tracking monoclonal B cell expansions and enables detection of clonal B cell populations that do not show aberrant cell surface marker expression; it remains to be seen whether this augmented form of tracking will be of use in dissecting the additional clinical and molecular variables that lead some clonal expansions to progress to frank leukemias.
Deep sequencing of IgH rearrangements simplifies the assessment of overt populations of suspected malignant B cells in clinical samples and shows preliminary success in minimal residual disease testing after treatment of leukemia patients. A substantial advantage of the minimal residual disease detection approach used here is that all patient samples can be analyzed with a single uniform assay, rather than having to tailor individual real-time PCR assays to each patient’s clonal malignant sequence and to validate these assays individually as unique clinical tests, an expensive and laborious process likely to limit the accessibility of minimal residual disease (MRD) testing. Having a sequence-based assay that can detect variants from the original malignant clonal sequences present at diagnosis may be an advantage in screening for disease relapse. Recent microarray-based data from studies of acute lymphoblastic leukemias suggest that genomic copy number changes may occur relatively frequently at immune receptor loci between initial diagnostic specimens and relapse specimens (27). For the most sensitive detection of residual disease and clonal variants in a variety of B cell neoplasms, particularly those such as follicular lymphoma that have ongoing hypermutation of rearranged IgH gene loci, it will likely be advisable to use several different primer sets (for example, making use of all three framework regions of the IgH V genes) to avoid false-negative results that arise from mutations at primer-binding sites.
In a broader perspective, the deep-sequencing approach to lymphocyte population analysis may provide insights into autoimmune and infectious diseases, medical manipulations of the immune system such as vaccination, and harmful outcomes of current therapies, such as graft-versus-host disease after stem cell transplantation. We expect that immune receptor sequencing in medical scenarios that involve lymphoid malignancies or immune-mediated diseases will be broadly useful for gathering diagnostic, prognostic, and disease-monitoring information.
Fig. S1. V-D-J plots of healthy peripheral blood and lymphoid malignancies.
Fig. S2. Sequence complexity of healthy-donor blood specimens.
Table S1. Sequencing primers. (A) Primers used in Experiments 1 and 2. (B) Primers used in Experiment 3. (C) Primers used in healthy subjects of various ages.
Table S2. Patient specimens used in Experiment 2.
Table S3. Number of sequences determined per specimen.
Table S4. Sequences found in more than one replicate from healthy-donor blood samples.
We thank Kenneth Weinberg, Dan Arber, Poornima Parameswaran, Gil Chu, Perry Blackshear, Gerald Marti, Alex Lucas, Mark Davis, Bob Ohgami, Ron Levy, Shoshona Levy, Sandy Feng, Claus Niemann, David Lewis, Ash Alizadeh, Steve Galli, Emmanuel Mignot, Magali Fontaine, Bill Robinson, May Han, Yaso Natkunam, Andrew Collins, and members of our laboratories for helpful discussions, and Chaya Krishna for excellent technical support. We are grateful to the Lucille-Packard Child Health Research Program for support of research materials, and for individual support from the Walter V. and Idun Berry Fellowship program (SDB), NCI (P01CA049605 [JLZ]), NIGMS (T32GM07365-34 [ELM]), the Stanford VPUE Research Program (LNZ), and the Stanford Immunity, Transplantation, and Infectious Disease Institute (KDN, KCN).
*This manuscript has been accepted for publication in Science Translational Medicine. This version has not undergone final editing. Please refer to the complete version of record at http://www.sciencetranslationalmedicine.org/. The manuscript may not be reproduced or used in any manner that does not fall within the fair use provisions of the Copyright Act without the prior, written permission of AAAS.
A complete electronic version of this article and other services, including high-resolution figures, can be found at: http://stm.sciencemag.org/content/1/12/12ra23.full.html
Information about obtaining reprints of this article or about obtaining permission to reproduce this article in whole or in part can be found at: http://www.sciencemag.org/about/permissions.dtl
Conflicts of Interest
Boyd, Marshall, Merker, Maniar, Zhang, Sahaf, Jones, Nadeau, Nguyen, Miklos, Zehnder, Fire: None. Simen, Hanczaruk, and Egholm are employees of 454 Life Sciences, A Roche Company.
Authorship Contributions
Designed research: Boyd, Fire, Marshall, Merker, Miklos, Sahaf, Zehnder, Jones
Performed research: Boyd, Marshall, Merker, Zhang, Sahaf, Jones, Nguyen, Simen, Hanczaruk
Contributed analytical tools and reagents: Fire, Boyd, Marshall, Maniar, Zhang, Egholm, Simen, Hanczaruk, Zehnder, Nadeau, Miklos
Analyzed data: Boyd, Fire, Marshall, Maniar, Zhang, Simen, Sahaf, Miklos, Zehnder, Jones
Wrote manuscript: Boyd, Marshall, Fire
Read and approved manuscript: all authors