Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Ann Zool Fennici. Author manuscript; available in PMC 2010 August 16.
Published in final edited form as:
Ann Zool Fennici. 2008 January 1; 45(5): 369–384.
PMCID: PMC2922036

Simulation studies for a multistage dynamic process of immune memory response to influenza: experiment in silico


This communication provides an illustration for the use of computer simulations in human immunology. When traditional experiments are impossible, unethical, or unfeasible, in silico modeling procedures may help to fill the gaps in our knowledge of an immune system response to a pathogen. In our study, we define terms and properties of modeled entities: “a clonotype”, its distribution, and rank-frequency summaries, and describe properties associated with each of these three clonotype-related entities. We simulate a multistage dynamic process of an immune memory response to influenza. We believe that illustrated properties of fractality and self-similarity might arise due to the following process. The memory T cells operate in a complex environment of shifting pathogen concentrations, increasing and then decreasing inflammatory signals, and multiple interactions with other immune cells and their infected targets. Therefore, a fractal structure to such a population would represent an optimization in terms of percolation into immune/inflammatory space.


The immune system is an example of a complex system with multiple layers of interacting cells and molecules all aimed at identification and elimination of pathogens while avoiding collateral damage to self. As a first approximation the immune system can be divided into two levels, the first recognizes signs of destruction of the host and of pathogen invasion. This innate level of immune responsiveness has evolved over longer periods of time and is genetically hard wired. It includes effector functions such as phagocytosis and the various aspects of inflammation. The second level acts during the life time of its host and involves adaptive changes by cells of the immune system to possible pathogens encountered. While starting at the level of recognition of rapidly changing pathogen signatures the adaptive level has taken on more complex roles and functions. The hallmark of this layer is the use of multi-gene families which can alter their expression on the basis of a learning process. Most impressive is the harnessing of genomic chaos in the form of gene rearrangements and somatic mutation for generating novel receptors on a clonal basis. This level of the immune system can be considered a real time example of natural selection with the generation of diversity followed by selection. The adaptive level recognizes finer characteristics of pathogens and its effector arm is more highly tailored to their specific elimination, decreasing the requirement for inflammatory processes associated with innate function. Inherent in this latter process is the ability to recognize a second occurrence of the same pathogen with a strong specific response. This is called immune memory and is the basis for vaccination (Ahmed & Gray 1996).

In spite of the importance of memory, it is still poorly understood in either man or the most commonly used immune animal model, the mouse. While many basic issues of the induction of memory can be established in the mouse, of most practical interest is the understanding of memory in man. Memory in man has different requirement from that of mouse in terms of the longevity, complexity of pathogen exposure, and migration/travel of humans resulting in novel pathogen exposure. Thus, some aspects of immune memory can only be studied in man. The analysis of immune responses in man is difficult. It has already been shown in mouse studies that the history of exposure to pathogens set up differences in the memory repertoire (Selin et al.1999). In humans, the exposure history is unknown. Samples are only available from the blood and not from other secondary lymphoid organs. Thus, many lines of experimentation that are possible in mice are impossible, unethical, or unfeasible in man. One of the purposes of computer simulations or in silico modeling is to fill the gaps in our knowledge by gathering simulated observations.

Simulations can be launched in order to understand how a system may take shape. In silico modeling can be used to provide possible histories of how a system may evolve to a particular stage for which in vitro or in vivo observation exist. They can be used for forecasting or prediction on how a system may change under various stressors. The results of simulated experiment may trigger new questions or hypotheses and provide insights into the nature of a process under investigation. The largest benefit would be to provide a basis for predictions that can be tested at the bench.

Our recent observation of a fractal self-similar structure in memory repertoires (Naumov et al. 2003) analyzed in vitro motivated us to explore simulation studies that can provide a better understanding of the multistage dynamic process of an immune memory response to a pathogen. In this communication we describe the in silico modeling procedure that imitates the development of immune response to influenza peptide. We start the modeling procedure with defining terms and properties of the modeled entities. We define the smallest observable unit used for modeling and analysis, “a clonotype”, clonotype distributions, rank-frequency summaries of clonotype distributions, and describe properties associated with each of these three clonotype-related entities. We develop a conceptual framework for modeling and provide an illustration of simulation in simplifying conditions. The results of simulation are visualized using a polygonal spiral. The similarities between simulated and actual distributions are assessed. The insights gained from simulations are discussed. All simulations and analyses were performed using S-plus statistical software.

Working definitions and properties

At first, we define ”a clonotype”, the smallest observable unit used for modeling and analysis, and three clonotype-related entities: clonotype repertoire, distribution, and rank-frequency summary, which are the products of aggregating clonotype-specific information at various levels. Most of the properties of clonotypes are relatively invariant to experimental conditions and can be directly measured, as can some of the properties of the repertoire. We postulate that each of these three entities can be characterized by its diversity. Diversity is a property of a TCR repertoire that reflects repertoire variability and can be measured at different levels including CDR3 amino acid sequences and clonotype distribution. Diversity is one of the properties of a clonotype distribution and is associated with variety in rearrangement and multiplicity of selective re-assortment.

Clonotype, clonotypic lineage, clonotype properties

The T cell receptor (TCR) is composed of two protein chains. The genes for these proteins are generated by a rearrangement process joining two or three DNA segments; the variable segment (V) rearranges next to the joining segment (J) in the case of the α chain, and a V to diversity segment (D) to J segment rearrangement for the β chain (Davis & Chien 2001). The rearrangement process results in a random insertion/deletion of genetic information at the joining sites. The joining region will encode the part of the receptor that contacts the antigen (Fig. 1) and is referred to as the third complementarity determining region (CDR3) of the TCR. Therefore, the TCR is predefined by the use of a particular V and J region combined with the pseudorandom DNA sequence of the CDR3 (Fig. 2). As a general rule each functional T cell leaving the thymus expresses a single TCR, i.e. the receptor is clonal. The CDR3 sequence of a T cell defines its clonotype. In the context of recognition of an antigen fragment the term clonotype can refer to the amino acid sequence of the CDR3. However, we will use the more restricted definition of clonotype as the nucleotide sequence of the CDR3. On a molecular level almost all T cells express only one copy of the β chain and for this reason we will be generating β-chain CDR3 sequences.

Fig. 1
Illustration of the T cell receptor with clonotype defining region. The αβ-T cell receptor heterodimer(in blue) is generated by a rearrangement process that results in a random section of genetic information inserted in the position that ...
Fig. 2
Illustration of the V(D)J rearrangement process.

After a T cell bearing a particular clonotype is generated, it leaves the thymus and circulates through the blood and peripheral immune organs. It has a restricted lifetime in which to encounter the antigen for which its TCR has sufficient avidity of recognition. In the absence of antigen stimulation the clonotype will die out. In the presence of stimulation the clonotype will expand and define a lineage of cells all derived from the original clonotype precursor. However, it should be pointed out that this is an approximation as the exact number of precursors derived from a cell which has undergone a unique rearrangement is unknown. There is evidence for limited expansion of the original cell that rearranged its receptor prior to thymic exit (Correia-Neves et al. 2001). It is also unclear at which rate two identical clonotype sequences may arise during the rearrangement process. For example, rearrangements that encode the CDR3 amino acid sequence “IRSS” and that utilize the entire BV17 gene, which encodes the “I”, and the entire BJ2.7 segment which encodes the final “S”, can arise in 36 different ways that RS can be encoded. If such rearrangements occur more frequently than 36 times there will have to be more than one copy of an identical clonotype generated in the thymus. If the rearrangement is random then it will follow a Poisson distribution in which some of the clonotypes will appear two or three times before all the possibilities are filled. The rate at which identical clonotypes are generated de novo is a function of the length of the CDR3 with longer sequences decreasing the probability that this portion of the TCR could have arisen by independent events.

In addition to the unique DNA sequence in the CDR3, definable clonotype properties include the V, D, and J segments used, the length of the CDR3, the amino acid sequence encoded by the CDR3, and its avidity/affinity for any particular antigen complex. An antigen complex is usually a peptide fragment derived from the pathogen that is bound in the peptide binding groove of a molecule of the major histocompatibility locus (MHC).

Repertoires: naïve, responding, memory, diverse, and restricted

We define a T cell repertoire as the set of unique clonotypes. A naïve repertoire is an ensemble of clonotypes that have recently exited the thymus and have not yet encountered an antigen complex of sufficient avidity to induce stimulation. All the clonotypes involved in a response to a particular antigen complex define the responding repertoire. A memory repertoire is a subset of clonotypes previously selected by an antigen. A repertoire can be considered restricted if there are easily observed restrictions in some of its properties. For example all the responding clonotypes utilize a particular V gene, and/or a particular CDR amino acid sequence.

Specifically, a repertoire is a set of unique CDR3 sequences, C = {C1,C2,…,Ci,…,CN} where Ci is a unique clonotype, and N is a number of unique clonotypes with some pre-selected properties. A null set would indicate no measurable response. The simplest measure of repertoire diversity is the number of sequences observed in a measurement or clonotypes expected in the repertoire.

Clonotype distributions

The concept of clonotype distribution incorporates the number of times any particular clonotype is observed. A clonotype distribution described as a repertoire, C = {C1,C2,C3,…,CN}, with each clonotype being observed a number of times V = {VC1,VC2,VC3,…,VCN}, so M = ∑V, where M is a number of sequences. Typically, when counts are sorted in descending order, a clonotype distribution has a long tail that consists of singletons, or clonotypes presented by only one copy (Fig. 3).

Fig. 3
Example of a clonotype distribution. When the absolute colony counts for each clonotype were plotted in descending order, we see that a few clonotypes are present in multiple copies (reflected by the number of colonies) and a majority of the clonotypes ...

The shape of the clonotype distribution presented in such manner reflects its diversity. A number of unique clonotypes, denoted as N, and a number of clonotype copies, denoted as M represent two simplest measures of diversity. Their ratio yields another measure, an average number of clonotype copies, V = M/N. The difference between the highest number of copies for a unique clonotype, denoted as Cmax, and the average number of clonotype copies defines a crude measure of overall diversity, DC = Cmax/V – 1. In experimental conditions, the values of these measures (1 ≤ NNmax and 1 ≤ MMmax) are predetermined by rearrangement processes and by applied methods of clonotype detection. With respect to these measures, two extreme scenarios of reduced diversity could be observed. For instance, a monoclonal response, when only one clonotype with many copies is detected, indicates an extreme scenario with N = 1, V = M, Cmax = M, and DC = 0. Naturally, all the clonotypes that participate in the response define the diversity of the repertoire: C = {C1,C2,C3,…,CN}. If a set C contains less then two elements, a repertoire would be considered to be non-diverse. In another extreme case, many clonotypes with one copy are observed, so M = N, V = 1, Cmax = 1, and DC = 0. If there are many clonotypes, but all clonotypes observed with identical frequency, VC1 = VC2 = VC3 = … = VCN, a clonotype distribution would be considered to be non-diverse. All other situations are bounded by these two extreme forms of clonotype distributions, when the average number of clonotype copies is ranging as 1 < V < M, and a crude measure of overall diversity is above zero, DC > 0, simply reflecting a variety in a clonotype distribution form.

Rank-frequency clonotype summaries

The analysis of clonotype distribution based on relative frequencies for each observed clonotype is limited in its ability to describe the system. A rank-frequency or frequency of frequencies approach provides an efficient summary of the data and allows the further development of the concept of repertoire diversity. Figure 4 illustrates the rank-frequency summary for the clonotype distribution shown in Fig. 3. The rank-frequencies approach has been widely used in the studies of occurrence of species to provide a succinct summary of the data and to predict the probability of occurrence of new species. In converting a clonotype count into a frequency of frequencies the clonotypes are ranked on the based of their rank. Implicit in such a ranking is that the highranking clonotypes have expanded more than others, indicating a better fitness for the antigen. Since the only clonotypes with the sufficient number of copies can be detected in experimental conditions, the minimum observable frequency is one copy. Here the variety of rank-frequency summaries determines their diversity. We recognize that an analysis of rank-frequency summaries is based on somewhat simplistic assumption that a rank can reflect a clonotype probably to proliferate. Realistically, in rank-frequency summaries, a “rank” only mimics a behavior of a real physical parameter or a latent process.

Fig. 4
Illustration of a rank-frequency summary of a clonotype distribution. To further describe properties of the clonotype distribution we assigned each clonotype a rank based on the absolute counts of copies (rank 1 consists of clonotypes observed as single ...

To quantify rank-frequency summaries rapid decay we applied a power law equation y = a/xb, where x is the rank and y is the rank frequency. In the simplest situation, plotting a log/log transformation of the data: logy = logablogx, should yield a straight line, where the parameter a indicates the frequency of observing single copy clonotypes and parameter b describes the shape of the curve by indicating how rapidly the curve decays. However, in our case the repertoire distribution can be best described by treating it as composed of two components (see inset in Fig. 4). Separation of the curve into two portions at a critical point, xc, allows for a good fit of the first portion to a power law. This first portion includes the ranks represented by a high number of clonotypes, i.e. the extensive single copy tails from Fig. 3.

Measurement of repertoires

Identifying a clonotype

We define clonotypes based on the sequence of the CDR3. The CDR3 is obtained using polymerase chain reaction (PCR) amplification of the DNA from the end of the V region to the end of the J region. There are approximately 40–48 VB regions that can participate in the rearrangement, some of which comprise families of genes with related sequences. We routinely use 25 primers to amplify these genes. Thus the data from some of the multi-gene families cannot distinguish exactly which family member was involved in the rearrangement. The PCR product is fractionated by the length of the CDR3 as a first approximation of restriction and diversity. Those PCR reactions with products are subcloned into bacterial vectors, used for transforming bacteria, and the resulting bacterial clones are isolated and sequenced. Possible sources of error in clonotype identification are nucleotide transitions (more common) or transversions, which are introduced by the enzymes used for generating cDNA from mRNA (reverse transcriptase) as well as by the thermal stable polymerases used in the PCR technique. An estimation of this error rate comes from examining the non-CDR3 sequences flanking the CDR3 which are invariant.

Measuring clonotype properties

The V gene usage is derived from the primer used for the PCR amplification. In many cases it can be confirmed by the sequence obtained from the end of the primer until the beginning of the CDR3. In some cases the sequence can identify the actual V gene used in those cases where the primer amplifies more than one related family member. The J and D region usage, CDR3 length, CDR3 amino acid sequence can all be determined from the clonotype DNA sequence data. Presently measuring the avidity of the clonotype for the antigen is a complicated process and not easily amenable to routine use.

Measuring clonotype frequencies

Identification of clonotypes and their properties is relatively straightforward. Determining the frequency of clonotypes in a sample is less so. The simplest approach is to count the number of times a particular sequence is observed in the pool of bacterial colonies analyzed. Possible sources of error in the colony counting approach are changes in frequency of the mRNA or DNA due to the PCR amplification process, the efficiency with which the PCR product is ligated into the bacterial plasmid vector, the effect of the PCR insert on the ability of the plasmid to transform the bacteria and/or to effect the growth of the bacteria after transformation. These factors are difficult to control for.

Alternative approaches to clonotype frequency measurement involve synthesizing clonotype-specific oligonucleotides that can be used as probes or primers. Both uses involve DNA-DNA hybridization and conditions must be found that can allow discrimination between related sequences and also allow for the strength of signal to reflect frequency of target sequence. The most commonly used approach is quantitative PCR (qPCR) in real time. In one version of qPCR the oligonucleotide is used as a primer in the PCR process and the increase in double strand DNA is measured after each cycle by fluorescent dye intercalation. Another approach uses unlabeled primers and a dual-labeled self-quenching fluorescent probe which is hydrolyzed during the PCR process releasing the fluorescent dye and resulting in signal. Possible sources of error include the hybridization properties of the primer/probe which have to be standardized to a known control sequence, usually a housekeeping gene. The proper calibration of the system is to use DNA representing equimolar concentrations of control and target sequence. Cross-hybridization of the primer/probe with DNA of similar sequence to the target will effect the measurement. The cross-hybridization can be measured using control DNA of similar sequence, but eliminating this effect by increasing stringency may be difficult. At this point the colony counting method, while more cumbersome, is more tractable.

Measuring recall responses

In our measurements we define T cell memory functionally; by equating memory with the ability to respond to a recall antigen in vitro with a seven day kinetic. This takes advantage of the increased quality of memory T cells. In vitro manipulation of cells removed from an organism introduces a number of variables. Growth conditions in vitro are not currently able to mimic conditions found in the organism. Important anatomical architecture is lost. The limitation must be recognized in any analysis. Nevertheless, useful data can be obtained. An important aspect of in vitro approaches is reproducibility. Growth and stimulation in culture is a complex process that is meant to reflect a similar complex process taking place in an organism, but small changes in unknown variables can lead to chaos. Thus, there is a need to perform all these analyses in triplicate using identical starting materials to insure that the cellular processes taking place in the cultures are not chaotic.

A multistage dynamic process of immune memory generation

We view memory development as a multistage dynamic process that consists of three stages: (1) initiation, (2) selection, and (3) memory stabilization. This dynamic process reflects transformation of a naïve repertoire into a mature memory repertoire. The framework for simulating immune response builds on our current understanding of how a naïve repertoire is generated and how a mature repertoire evolves and decays with aging. A simplified schematic of the model framework is shown in Fig. 5.At the current stage of framework development we are omitting the last stage of this process, the stage of memory decay.

Fig. 5
Model framework for generation of the memory repertoire. The initial clonotype distribution (naïve repertoire) can be uniform (grey bars, y-axis left panel) or show some skewing (white bars). Each clonotype has a particular avidity for the antigen ...

Repertoire initiation

A T cell exiting the thymus must express a TCR (α and β chains) of sufficient affinity to pass positive selection, which includes TCR recognition of self-peptide MHC complexes. However, too strong a response to self may lead to negative selection and deletion. A subset of such naïve T cells should be able to respond to any antigenpeptide MHC complexes (pMHCI) that the cells may encounter after leaving the thymus (Blackman et al. 1990). In addition to TCR recognition, there are a number of accessory molecules that participate in the activation of the naïve T cell by the antigen (accessory signals) (Grakoui et al. 1999, Lee et al. 2003).The nature and number of these accessory signals will vary as the selection takes place. Thus, every T cell will have an avidity value for the antigen-presenting cell based on the final contributions of the TCR and accessory molecules. These contributions can be both qualitative and quantitative. Thus, two T cells with exactly the same TCR may have different avidities based on differences in accessory molecules. And two T cells with identical TCR and identical panoply of accessory molecules may have different avidities based on quantitative differences in the TCR or accessory molecules.

An important lacunae in our understanding of initiation is that the distribution of clonotypes generated by the thymus is not known. It is assumed that the naïve repertoire, originating from clonal events is not extensively pre-selected, and thus the clonotype distribution would be flat. However, experiments to clearly prove this are lacking. Two major mechanisms by which the initial repertoire may deviate from a uniform distribution are uneven mechanisms for generating rearrangements and partial expansion within the allowed avidity window.

Antigen-induced selection

Contact with antigen will cause T cells with sufficient avidity to expand. Although some elements of selection are known (Busch & Pamer 1999, Yewdell & Bennink 1999,Kedl et al. 2000, Mercado et al. 2000), the exact details of this process are still poorly understood. The relation between avidity and probability of expansion are also not known. A selection cycle would include expansion followed by arrest of expansion and then contraction as the antigen is cleared from the system. As part of this process the repertoire gains information about the antigen and pathogen from which it was derived. The probability of contacting antigen has to be considered. In the selection phase the parameters are: the probability that any one of the clonotypes actually contacts antigen, the relation between avidity for antigen and expansion, the probability of elimination during the contraction process, and the possible differences in these parameters with different cycles of antigen contact.

Repertoire stabilization

As measured by us for a specific instance, the memory repertoire is composed of a large number of T cells, some of which have shown a large expansion, but an important characteristic is a heavy tail of T cells that have undergone minimal expansion (Naumov et al. 1998, 2003). Our current model fits the data into a discontinuous distribution, part of which can be described by a power law. While this description will be refined by further experiments, the model derived from the initialized repertoire subject to antigen selection should result in an equivalent distribution. It will be important to investigate the constraints imposed on our description of the mature repertoire by the sampling process. The stability of this phase will also be of interest. For the mature memory stage, the issues are the generating a more refined definition of clonotype frequency, exploring the fractal nature of the repertoire, establishing the relation between avidity and ability to divide, and determining the repertoire stability.

Framework for simulation studies of immune response

The simulation of a multistage dynamic process currently consists of three distinctive stages: initiation, iterative cycling, and evaluation. The first and second stages mirror in part the initialization and selection of repertoire development as described in our framework. The iterative cycling imitates repertoire development in “real time” from a naïve to a mature form, where each cycle can be interpreted as an episode of exposure resulting in immune response. Finally, the third stage of modeling is evaluation, which is essential for systematic assessment of model performance.


Let X be a set of unique clonotypes {x1, x2,…, xi,…, xN},where N number of clonotypes. Let θi be a propagation probability and ωi be a removal probability for a clonotype xi, so that θ = {θ1,θ2,…,θi,…,θN}, and ω = {ω1,ω2,…ωi,…,ωN}. The distribution of clonotypes generated by the thymus, and therefore the naïve repertoire, is not known. We can assume that clonotype distribution is uniform, so that at the initialization step each of N0-starting clonotype is represented by only one copy. However, there may be some level of preferential expansion in the thymus or in the periphery, so we may also consider an array of potential distributions as a start-up clonotype distribution, Wx, including Gaussian, exponential, or mixtures of distributions. A simple letter-based illustration of repertoire initiation is in Fig. 6.

Fig. 6
A simple letter-based illustration of repertoire initiation, expansion and contraction. At initiation a clonotype distribution associated with the naïve repertoire is assumed to be uniform. For illustration, each clonotype is coded by a single ...

Propagation and removal

We assume that the process of repertoire development consists of number of cycles, D, and each cycle consists of two consecutive iterations: one for the propagation and one for the removal. We allow each clonotype to propagate at j-cycle with the probability of propagation, θij (e.g. a set will be extended by one copy of a clonotype xi). At each j-cycle any clonotype can be removed with probability of removal, ωij. To implement this process, for any given unique clonotype xi at a j-cycle, we let Y and Z be random variables associated with a Bernoulli trial by defining it as follows: Y(not propagated) = 0, Y(propagated) = 1, Z(removed) = 0, Z(kept) = 1. Therefore, the simulated process consists of two draws: one for the repertoire expansion: yij ~ Bernoulli(θij); and one for the repertoire contraction zij ~ Bernoulli(ωij). The Bernoulli sampling scheme has a straightforward interpretation.

For example, for the first cycle, let yijk and zijk denote k-event at j = 1 step for the i-clonotype, (if y = 0 or z = 0, then k = 0; and if y = 1 or z = 1, then k = 1) the propagation and removal processes have the following alternative scenarios:

  • yi10: xi is not propagated, only a single copy of xi at the cycle 1 is available;
  • yi11: xi is propagated into two copies at the cycle 1;
  • yi10zi10: the only copy of xi is removed at cycle 1; xi is lost for future propagation;
  • yi11zi10: one out of two copies of xi is removed at cycle 1;
  • yi10zi11 – the only copy of xi is kept at cycle 1;
  • yi11zi11 – two copies of xi are available for future propagation.

At this stage, we let a random expansion of the system to be generated from a probability distribution, selected at the initiation step, and then a random contraction of the system to be performed according to the selected propagation/removal probabilities.An illustration of this process is shown in Fig. 6. By modifying modeling parameters, we imitate a “birth-death” process for different scenarios, which is computationally equivalent to a dynamic modeling with pre-selected expansion/contraction rates. After performing an expansion/contraction cycle, which reflects an episode of antigen-specific response, a simulated mature repertoire can be examined and evaluated.


We assume that after a number of cycles, the mature repertoire distribution can be observed and a correspondent rank-frequency summary can be created (see Fig. 7).

Fig. 7
clonotype distribution for a mature repertoire and its rank-frequency summary. clonotypes observed only once (e, h, n, p, q, r, z) are the singletons and form the low-frequency component of the mature repertoire. clonotypes a and b are the most observed ...

After completing each cycle or a series of cycles, we consider a decision-making step, which can serve as a set of stopping/checking rules for an iterative process. This step is essential for verification our initial assumptions about start-up and intermediate distributions. At each cycle we evaluate the number of clonotype available for propagation, total number of clonotypes, and clonotype frequencies.

As a first exploratory step to understanding the rank-frequency summaries we employed a simple curve fitting procedure based on a power law equation, y = a/xb, where x is the rank (absolute counts of a particular clonotype) and y is the rank frequency. This well-known power law equation forms the basis for the discrete form of the Pareto distribution and the Zipf distribution. The parameter a indicates the frequency of observing single copy clonotypes. Parameter b describes the shape of the curve by indicating how rapidly the curve decays. In the simplest situation, plotting a log/log transformation of the data: logy = logablogx, should yield a straight line. As is shown in Fig. 4, the simple power law equation approximates very poorly the sharp decay in the rank-frequency distribution: we observed greater deviations at lower, rather than higher frequencies. Separation of the curve into two portions substantially improves fitting that is suggestive of a possible biphasic mechanism for repertoire generation.

Although publications related to examining rank-frequency summaries of the TCR repertoire distribution are very limited (non-existent), there is a body of literature suggesting different approaches for fitting discrete long-tailed distributions. The selected potential candidates for such distributions are: the traditional Poisson and negative binomial distributions, discrete Pareto, Zipf, Reimann zeta, logarithmic, and shifted geometric distributions; these distributions decay very rapidly at first, but later slowly and might be able to accommodate the need of fitting long-tails.

An objective and measurable criterion of the successful modeling is an agreement between experimental data, theoretical inferences, and simulation results. We consider the proposed framework to be reliable if two approaches for approximating the actual data by a theoretical distribution or a mixture of distributions (“data-to-model” approach) and another by modeling a dynamic birth–death process (“model-to-data” approach) will give similar resulting distributions.Specifically, we expect that for a given set of initial assumptions after completing a number of cycles (to be determined) we obtain a resulting distribution that satisfies our theoretical assumptions and experimental data.

A multistage dynamic process of immune memory response to influenza matrix protein: experiment in silico

In designing experiment in silico, the data obtained from actual immunological experiments typically feed the parameters for computer simulations. In our example, we are simulating the process to better understand how the mature repertoire has been arrived. The similarity in the actual and simulated data ensures a complete description of repertoire development. Here, we derived parameters for the simulation study from our experience and existing publications and compared the results of simulation with the actual experiment on the recall response to an immunodominant peptide from the influenza matrix protein, M158–66 restricted by HLA-A2 described below.

Experimental data

The immune response to a peptide derived from amino acids 58–66 of influenza matrix protein (M158–66) is characterized by the extensive use of T cells expressing the BV17 TCR gene and a restricted amino acid sequence in the CDR3 of the TCR β-chain (Moss et al. 1991, Lehner et al. 1995, Naumov et al. 1998). As the CDR3 interacts with the antigenic peptide, a restricted motif is indicative of antigen driven selection. To analyze the memory T cell repertoire in the M158–66 response, triplicate cultures were generated from the peripheral mononuclear cells of a known responder. The cultures showed cytotoxicity specific for the antigen and HLA-A2 by the third in vitro stimulation (Naumov et al. 1998).At this point, T cells expressing the BV17 were the predominant species in the cultures. The unit of measure of the repertoire is the clonotype, represented by the unique DNA sequence that encodes the CDR3. The clonotypes present in each culture were identified by BV17-specific PCR of the CDR3, bacterial cloning of the amplified CDR3, and DNA sequencing. The number of bacterial colonies representing identical clonotypes was counted and used as a measure of clonotype frequency. In total, 501 bacterial colonies were counted and they identified 141 distinct clonotypes (data available at Applying a power law fit to the first portion provides the estimates of the parameters: a = 0.60 and b = 1.67 (see Figs. Figs.33 and and4).4). These results indicate that the first component of the repertoire is highly skewed in favor of clonotypes represented once and that the decay from this high proportion of unique clonotypes is relatively rapid.

Fine tuning

At the first round of simulations the expansion/contraction process was done by a Monte-Carlo simulation for a Bernoulli trail with a constant probability of 0.5 (fair coin toss) and in more generalized manner. We examine various scenarios for the probability of propagation, θij. In all these scenarios we start with 200 clonotypes (N = 200) each present at a single copy, and the probability of removal is fixed at 0.05 (ω = 0.05). The values for θij are drawn from: (a) a uniform distribution, θij ~ Uniform{0,1} and fair coin toss; (b) a linear composition as θij = 0.9 – 0.004i; (c) a power-like distribution that imitates our original clonotype distribution, θij = ib, where b = −0.618; (d) a normal distribution with parameters μ and σ2, θij ~ N{0, 1}; and (e) a mix of rates.

Each scenario at each cycle of simulation yields a set of trajectories indicating the life-time of a clonotype and the number of copies for each clonotype.

Figure 8 demonstrates the first two scenarios. The top panels illustrate the simulated trajectories for clonotype propagation and removal for each generation cycle. The histograms (lower panels) reflect the probabilistic properties of the rank-frequency summary for a mature repertoire distribution. As expected the first scenario, when the values for θij are drawn from the uniform distribution, leads to a Poisson distribution. The second scenario, when the values for θij are drawn from the standard normal distribution, leads to a distribution that is more skewed than the Poisson distribution, typical for a negative-binomial distribution. The approximation of the simulated data by the Poisson and by the negative binomial distributions provides a good agreement for the first and the second scenario, respectively. In the second scenario, a substantial fraction (15%) of clonotypes was lost for future propagation.

Fig. 8
Simulated individual trajectories and mature repertoire distributions.

The most interesting results were observed when we use the estimates based on the results of the experiment described above. We consider the third scenario with a diverse propagation probability scheme, in which b = −0.618, D = 8, N = 200, ω = 0.05. The simulated results demonstrate close similarity between the simulated (Fig. 9) and the actual data (see Fig. 4). After the final cycle of simulation, out of 200 clonotypes 51 clonotypes were lost for propagation and the rest 149 clonotypes produced 498 copies. We observed a power-like rank-frequency relationship with the shape parameters similar to the actual data. Applying the log-log rank-frequency summary to the simulated clonotype distribution revealed the self-similar fractal nature of the simulated immune response.

Fig. 9
Simulated repertoire for the scenario with a diverse propagation probability scheme.

Insights from simulation

One of important observations was the deviation from the simple fractal structure indicating a biphasic nature to the outcome. This nature became even more apparent in the scenario when 75% of clonotypes in the naïve repertoire has low (θ = 0.25) propagation probability and 25% has high (θ = 0.75) propagation probability (see Fig. 10). After six cycles, the simulated repertoire exhibited clearly a biphasic fractal structure. If for a simple fractal structure: logqY = 1 – logrX; where q = max(Y) and r = max(X),and for a biphasic fractal structure:


then we consider the area between two curves: one for a simple fractal structure, and another for an observed structure, and a measure of the degree of departure from simple fractal structure as I = ∑[1 – log(x) – log(y)]/n, where x is the rank, y is the frequency, and n is a number of observed ranks. In the simulated experiment, I = 0.284 (upper right panel). When we repeated the simulation 300 times (upper right panel) the rank-frequency summaries revealed stability of the observed biphasic fractal structure. This collected set of clonotype distributions can be thought as individual realizations from a population with M individuals. These results suggest that the departure from a simple fractal structure increases with increasing the number of cycles and the fraction of clonotypes with low propagation probability. The results also indicate that although the curve fitting is a straightforward procedure, the main limitation of this approach is the data-driven arbitrary selection of the cut-off point that can be crucial in drawing inferences about mechanism for repertoire generation.

Fig. 10
Simulated individual trajectories and rank-frequency summaries with the biphasic fractal structure.

The division of the clontypes into two groups with high or low propagation probabilities is in itself a simplification. We could expect that the actual distribution is more complicated with a span of possible θ values (see Fig 5). We expect that extending our models to include such a situation will further enhance the biphasic structure.

While we currently consider the probability of removal, ωij, to be constant for all clonotypes and at each cycle, it would be useful to explore the alternative and more realistic scenarios when ωij became dependent on number of cycles and number of clonotypes. Moreover, advancing the model building we can incorporate the dependence of a propagation probability on a number of cycles and the dependence of a removal rate on a total number of clonotypes for a given cycle and examine the relations between start-up and final distributions.


We have started to model memory T cell repertoire development on the basis of the proposed framework using as a target the existing clonotype distribution data from a mature repertoire. We simulated the process of repertoire generation on the basis of theoretical assumptions about model specification and parameterization. We compared simulated repertoires with actually measured ones. We identified conditions at which the simulated repertoire best resembles the repertoire measured in the actual experiment. These pointed out the importance of the rate and manner in which repertoire contraction takes place in the final outcome.

We believe that carefully crafted simulations within a well thought conceptual frame-work may imitate the development of immune response. The similarities between simulated and actual distributions are strengthening our recent observation of a fractal self-similar structure in memory repertoires observed in vitro. The parameters and variables used in the simulation can help focus the thinking about memory generation by laboratory researchers as well suggest experiments to test the model defined by the simulation. We believe that illustrated properties of fractality and self-similarity might arise due to the following process. Memory T cells are required to provide preemptive protection against a pathogen. They operate in a complex environment of shifting pathogen concentrations, increasing and then decreasing inflammatory signals, and multiple interactions with other immune cells and their infected targets. Single cells and their offspring cannot fulfill the required functions under these various conditions. Thus a population of cells is employed. Thus a fractal structure to such a population would represent an optimization in terms of percolation into immune/inflammatory space.

Computational power and existing infrastructure formed an ideal foundation for new and exiting field of computational immunology. This new approach may supplement traditional methodology based on in vivo and in vitro experiments. As any scientific experiment, a simulation experiment has to be carefully designed and evaluated. By incorporating regular checks and feedback loops into the modeling process, a research design employs an efficient strategy needed for the anticipated extensive computational effort. As much it is important to learn how to simulate an experiment it is crucial to know how to asses the simulation results, deliver them in a clear way and provide a proper interpretation. A well crafted and well executed in silico experiment should force new questions, trigger novel hypothesis, provide insights into the mechanism and nature of processes under the study.


This study was supported by the National Institutes of Health grants NIH-NIAID U19 AI62627 and HHSN266200500032C.


  • Ahmed R, Gray D. Immunological memory and protective immunity: understanding their relation. Science. 1996;272:54–60. [PubMed]
  • Blackman M, Kappler J, Marrack P. The role of the T cell receptor in positive and negative selection of developing T cells. Science. 1990;248:1335–1341. [PubMed]
  • Busch DH, Pamer EG. T cell affinity maturation by selective expansion during infection. J. Exp. Med. 1999;189:701–710. [PMC free article] [PubMed]
  • Correia-Neves M, Waltzinger C, Mathis D, Benoist C. The shaping of the T cell repertoire. Immunity. 2001;14:21–32. [PubMed]
  • Davis MM, Chien Y-H. T cell antigen receptors. In: Paul WE, editor. Fundamental immunology. 5th ed Lippincott Williams & Wilkins; Philadelphia: 2003. pp. 227–258.
  • Grakoui A, Bromley SK, Sumen C, Davis MM, Shaw AS, Allen PM, Dustin ML. The immunological synapse: a molecular machine controlling T cell activation. Science. 1999;285:221–227. [PubMed]
  • Kedl RM, Rees WA, Hildeman DA, Schaefer B, Mitchell T, Kappler J, Marrack P. T cells compete for access to antigen-bearing antigen-presenting cells. J. Exp. Med. 2000;192:1105–1113. [PMC free article] [PubMed]
  • Lee KH, Dinner AR, Tu C, Campi G, Raychaudhuri S, Varma R, Sims TN, Burack WR, Wu H, Wang J, Kanagawa O, Markiewicz M, Allen PM, Dustin ML, Chakraborty AK, Shaw AS. The immunological synapse balances T cell receptor signaling and degradation. Science. 2003;302:1218–1222. [PubMed]
  • Lehner PJ, Wang EC, Moss PA, Williams S, Platt K, Friedman SM, Bell JI, Borysiewicz LK. Human HLA-A0201-restricted cytotoxic T lymphocyte recognition of influenza A is dominated by T cells bearing the Vβ17 gene segment. J. Exp. Med. 1995;181:79–91. [PMC free article] [PubMed]
  • Mercado R, Vijh S, Allen SE, Kerksiek K, Pilip IM, Pamer EG. Early programming of T cell populations responding to bacterial infection. J. Immunol. 2000;165:6833–6839. [PubMed]
  • Moss PA, Moots RJ, Rosenberg WM, Rowland-Jones SJ, Bodmer HC, McMichael AJ, Bell JI. Extensive conservation of alpha and beta chains of the human T-cell antigen receptor recognizing HLA-A2 and influenza A matrix peptide. Proc. Natl. Acad. Sci. USA. 1991;88:8987–8990. [PubMed]
  • Naumov YN, Hogan KT, Naumova EN, Pagel JT, Gorski J. A class I MHC-restricted recall response to a viral peptide is highly polyclonal despite stringent CDR3 selection: implications for establishing memory T cell repertoires in “real-world” conditions. J. Immunol. 1998;160:2842–2852. [PubMed]
  • Naumov YN, Naumova EN, Hogan KT, Selin LK, Gorski J. A fractal clonotype distribution in the CD8+ memory T cell repertoire could optimize potential for immune responses. J. Immunol. 2003;170:3994–4001. [PubMed]
  • Selin LK, Lin MY, Kraemer KA, Pardoll DM, Schneck JP, Varga SM, Santolucito PA, Pinto AK, Welsh RM. Attrition of T cell memory: selective loss of LCMV epitope-specific memory CD8 T cells following infections with heterologous viruses. Immunity. 1999;11:733–742. [PubMed]
  • Yewdell JW, Bennink JR. Immunodominance in major histocompatibility complex class I-restricted T lymphocyte responses. Annu. Rev. Immunol. 1999;17:51–88. [PubMed]