Rheumatologists see patients with a range of autoimmune diseases. Phenotyping these diseases for diagnosis, prognosis and selection of therapies is an ever increasing problem. Advances in multiplexed assay technology at the gene, protein, and cellular level have enabled the identification of `actionable biomarkers'; that is, biological metrics that can inform clinical practice. Not only will such biomarkers yield insight into the development, remission, and exacerbation of a disease, they will undoubtedly improve diagnostic sensitivity and accuracy of classification, and ultimately guide treatment. This Review provides an introduction to these powerful technologies that could promote the identification of actionable biomarkers, including mass cytometry, protein arrays, and immunoglobulin and T-cell receptor high-throughput sequencing. In our opinion, these technologies should become part of routine clinical practice for the management of autoimmune diseases. The use of analytical tools to deconvolve the data obtained from use of these technologies is also presented here. These analyses are revealing a more comprehensive and interconnected view of the immune system than ever before and should have an important role in directing future treatment approaches for autoimmune diseases.
Biomarkers—biological characteristics that can be objectively evaluated as indicators of a biological or pathological state—are being sought for many diseases. Biomarkers have the potential to transform our basic understanding and clinical management of a wide range of human illnesses. We have coined the term `actionable biomarkers' to describe biomarkers that can inform clinical practice—that is, biomarkers upon which clinicians can act (Figure 1).
Actionable biomarkers are already used in the clinical management of certain diseases, most notably cancer. A prime example is the BCR–ABL1 fusion gene of t(9;22) chromosomal translocations, which, in the correct clinical context, can be used to identify patients with chronic myelogenous leukaemia who are likely to respond to therapy with drugs that target the activity of the tyrosine protein kinase ABL1.1 Likewise, overexpression of the receptor tyrosine-protein kinase erbB2 (also known as HER2) characterizes the subset of patients with breast cancer who are likely to respond to treatment with a monoclonal antibody that targets the erbB2 receptor.2 These two success stories illustrate how molecular characteristics that are linked to disease pathogenesis, rather than clinical characteristics (which are generally a disease epiphenomenon), are most likely to serve as actionable biomarkers. In these examples a single biomarker suffices; in other cases, however, a panel of multiple biomarkers is more useful as it can yield a more comprehensive picture (termed a molecular signature) of a disease and its subtypes.3–6 In fact, in rheumatic diseases, only profiling using multiple biomarkers has so far proven useful.
One potential use for actionable biomarkers is in diagnosing disease. First, by casting a wide net, combinations of biomarkers might be identified that improve both the sensitivity and specificity of disease detection and classification. Second, by revealing a molecular signature of disease before the onset of definitive, characteristic symptoms, biomarkers might enable earlier diagnosis and therefore earlier institution of therapeutic, or even preventive, interventions. For example, biomarkers that can distinguish individuals with early-stage rheumatoid arthritis (RA) from patients with undifferentiated arthritis—or better yet from asymptomatic individuals who are genetically predisposed to develop RA—would be invaluable as evidence suggests that early intervention with existing drugs could prevent RA progression.7
As illustrated earlier, a potential use for actionable biomarkers is in predicting how an individual's disease will develop. As all known rheumatic diseases are heterogeneous, they do not manifest identically in all patients, nor do all patients respond to treatment in the same way. For example, RA ranges from mild and self-limiting to severe and progressive. In our opinion, stratification into subtypes is important for the clinical management of a disease and we propose that actionable biomarkers could aid this subtyping. Stratification of disease could help clinicians determine whether an individual's condition is likely to progress, and therefore whether aggressive intervention is needed, as well as select and establish an effective treatment strategy. For example, less than two-thirds of all individuals with RA have an adequate response to anti-TNF therapy.8 Using appropriate biomarkers might enable identification of non-responders before TNF-inhibitor therapy is initiated, thereby lowering costs and preventing unwanted complications associated with a therapy that was not going to be effective. Emerging reports of autoantibody profiles that can predict disease progression in so-called incomplete lupus,9 predict which patients will develop RA,10 or predict which patients with RA will respond to anti-TNF therapy,11 suggest that biomarker-based predictive tests will become as much a mainstay in the management of rheumatic diseases as they currently are in cancer.
Actionable biomarkers can also be used to monitor a patient's response to specific therapies. Such pharmaco-dynamic biomarkers can accelerate clinical trials by serving as early surrogate markers of the efficacy and safety of an investigational drug as well as guide clinicians as to when a given therapy should be initiated.
The nascent field of systems immunology, a branch of systems biology, uses computational and mathematical modelling to characterize the immune system and predict its response when a specific component is affected. New technological approaches that can generate vast multiplex datasets have enabled the development of this field. Indeed, more than 40,000 mRNA transcripts from the human genome can now be routinely measured in a single microarray (a technology that provides details on which genes are expressed in a tissue or cell of interest). Multiplexed Luminex™ (Luminex Corporation, Austin, TX, USA) assays can quantitate 50 or more proteins that are involved in inflammation (that is, cytokines and chemokines) in a single small sample of tissue or blood; protein arrays can measure many more.
Additionally, new flow-cytometric methods are now available to simultaneously analyze the expression of 30 or more surface and intracellular proteins in individual cells. This technology promotes the identification and enumeration of the various peripheral blood cells in addition to revealing, for instance, which signalling pathways are activated in the different cell types.
A successful systems immunology study requires that the assays employed are as comprehensive as possible, and that they also possess sufficient resolution to distinguish the changes that accompany differential outcomes. Investigators of systems immunology are increasingly measuring a plethora of signals in response to an experimental intervention, such as a vaccine.12 Complex signatures emerging from such studies can act as biomarkers, and also provide clues to the mechanistic pathways that lead to specific outcomes, such as protection from disease. In addition, sufficient assay standardization and sample handling, including standardization of processing and storage protocols, are essential for a study to achieve reproducible results over time. This standardization is particularly important for studies in human immunology, which often involve longitudinal sampling, collection of specimens from multiple sites, and/or subject recruitment that can span multiple years.
New immunological technologies provide novel types of highly multiplexed readouts, with the potential to measure the activation induced in vitro by a given intervention, as well as resting immune phenotypes of cells (Figure 1). For example, individual differences in activation-induced signalling, but not in resting expression levels of certain phosphoproteins, correlate with disease outcome in acute myeloid leukaemia.13 Therefore, measuring changes in activation-induced signalling in rheumatic autoimmune diseases, using a flow cytometry based technique, might lead to changes in the clinical management of these diseases.
Cytometry by time-of-flight (CyTOF) mass cytometry uses multiple antibodies, each tagged with multiple copies of an individual heavy metal ion, and measures their binding to cells by mass spectrometry.14 By contrast, fluorescence cytometry is used to measure the binding of antibodies tagged with a fluorophore. The advantage of mass cytometry is that many more antibodies can be used in combination to assay a single sample (such as whole blood or single-cell suspensions from tissues), without the spillover between fluorescence spectra that is inherent in optical fluorescence systems.15 Such a system has already been used to quantitate differences in cellular constitution and drug responses of individual cells in a complex mixture of cells such as bone marrow.16
In one of the authors' laboratories, 36 different metal ions have been chelated to polymers that have then been conjugated to antibodies, DNA dyes, or other markers (H. T. Maecker, unpublished work). In most cases, the resolution and sensitivity of mass cytometry are comparable to those of fluorescence flow cytometry, although generating a sufficiently sensitive reagent has so far not been possible for a few cellular markers. As experience with this approach increases, and with the availability of pre-made heavy metal ion-antibody conjugates for mass cytometry, this problem should be resolved. Moreover, as the number of mass cytometry systems in use increases, mass cytometry is likely to become the preferred method for initial multi-parameter flow-cytometric analysis, especially as the cost per marker analyzed is similar to that of fluorescence systems.
A number of new analysis platforms such as HyperCyt® (IntelliCyt Corporation, 9620 San Mateo Blvd NE, Albuquerque, NM 87113, USA)17 and CyTOF16 are vastly increasing the sample throughput and number of independent proteomic parameters that can be measured at the single cell level. The data collected in a single day, if reviewed by conventional methods, would require viewing many thousands of bivariate plots. This approach is not only inefficient, but also results in an incomplete understanding of the multidimensional relationships present in the underlying data. Effective automated gating and specialized tools for visualizing high-dimensional flow cytometry data are crucial areas of development.
In 2009, two automated gating methods, flow analysis with automated multivariate estimation (FLAME)18 and density-based merging (DBM),19 were developed; both are highly promising, but they use very different approaches (Figure 2). By contrast, spanning-tree progression analysis of density-normalized events (SPADE),20 a tool developed for visualizing high-complexity flow cytometry data, forgoes traditional gating and bivariate plots altogether.
FLAME is based on the assumption that a sample of flow cytometry data can be modelled as a heterogeneous mixture of populations of cells (known as clusters) in which each cluster can be described by a skewed Student's t distribution (skew-t distribution).18 The skew-t distribution better fits asymmetrical populations than traditional gating approaches that are based on Gaussian mixture modelling. FLAME is designed to create an optimal number of clusters by comparing the average scale-free intracluster distance with the average scale-free intercluster distance. If the optimal number of populations has been assigned, the average scale-free intracluster distance will be smaller than the average scale-free intercluster distance (Figure 2).
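The cluster-number criterion described above can be illustrated with a minimal Python sketch. It is not the FLAME implementation: it fits no skew-t mixture, and plain Euclidean distance is assumed here in place of FLAME's scale-free distance. The function name `mean_distance_ratio` and the toy data are hypothetical and serve only to show how a ratio of intracluster to intercluster distance separates a sensible partition from a poor one.

```python
import math
from itertools import combinations


def mean_distance_ratio(points, labels):
    """Ratio of mean intracluster distance to mean intercluster distance.

    Assumption: plain Euclidean distance stands in for the scale-free
    distance used by FLAME; this is an illustration, not FLAME itself.
    """
    intra, inter = [], []
    labelled = list(zip(points, labels))
    for (p, lp), (q, lq) in combinations(labelled, 2):
        # Pairs sharing a label count as intracluster, all others as intercluster.
        (intra if lp == lq else inter).append(math.dist(p, q))
    if not intra or not inter:
        raise ValueError("need at least one within-cluster and one between-cluster pair")
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


# Toy data (hypothetical): two tight groups of cells measured on two markers.
cells = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # each group is its own cluster
poor_labels = [0, 1, 0, 1]  # clusters cut across the groups

print(mean_distance_ratio(cells, good_labels))  # ratio below 1
print(mean_distance_ratio(cells, poor_labels))  # ratio above 1
```

A ratio below 1 corresponds to the condition stated in the text, in which the average intracluster distance is smaller than the average intercluster distance.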
FLAME seems to be effective when the populations can be distinguished by surface markers whose expression is binary. However, certain combinations of markers, such as those used in the study of cell cycle and differentiation, have staining patterns that are too irregular to be well-approximated by the skew-t distribution. These combinations include distributions with concave perimeters or distributions with `U', `L', or `S' shapes. Fortunately, DBM uses the density contours of the data itself to define the gates for each population and is better-suited for irregularly shaped distributions than FLAME.19 DBM detects inflection points in the data, much as experienced immunologists do when gating manually. Unlike FLAME, DBM becomes computationally inefficient beyond three dimensions.
FLAME and DBM are marked advances in automated cell-population gating, which is of great importance for complex datasets that can require the gating of a large number of distinct cell populations across each biological sample in the dataset. However, manually reviewing all of the automatically assigned gates to confirm that they have been properly applied can be time consuming.
As an alternative to automated gating approaches that attempt to approximate manual gating, SPADE16,20 is a visualization tool that organizes clusters into a 2D tree representation on the basis of their similarities across all markers selected by the user. By displaying clusters in a 2D tree structure, and using size and colour to denote cell density and marker expression, SPADE enables users to rapidly review large, high-dimensional datasets (Figure 3). Importantly, the density-dependent down-sampling and agglomerative clustering employed by SPADE can prevent rare cellular phenotypes from being `drowned out' by more highly represented cell types.21
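The idea behind density-dependent down-sampling can be sketched in a few lines of Python. This is a simplified illustration rather than the SPADE algorithm: the neighbour-counting density estimate, the parameters `radius` and `target_density`, and the function names are all assumptions made for the example. Cells in crowded regions are retained with a probability inversely proportional to their local density, whereas cells in sparse regions are always kept.

```python
import math
import random


def local_density(points, radius):
    """Count, for each point, the points (itself included) within `radius`."""
    return [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]


def density_dependent_downsample(points, radius, target_density, seed=0):
    """Keep sparse-region points; thin dense regions towards `target_density`.

    Assumption: a brute-force neighbour count is used as the density
    estimate, which is adequate only for small illustrative datasets.
    """
    rng = random.Random(seed)
    kept = []
    for point, density in zip(points, local_density(points, radius)):
        # Points at or below the target density are always retained.
        if density <= target_density or rng.random() < target_density / density:
            kept.append(point)
    return kept


# Toy data (hypothetical): 200 cells of an abundant type and 5 of a rare type.
data_rng = random.Random(1)
abundant = [(data_rng.gauss(0, 0.1), data_rng.gauss(0, 0.1)) for _ in range(200)]
rare = [(5 + 0.1 * i, 5.0) for i in range(5)]

sample = density_dependent_downsample(abundant + rare, radius=1.0, target_density=5)
print(len(sample), "cells kept;", sum(p in rare for p in sample), "of them rare")
```

After down-sampling, the rare population makes up a much larger share of the retained cells than of the original data, which is why subsequent clustering is less likely to overlook it.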
One caveat of SPADE is that the user must specify the number of clusters to be found in the dataset, rather than have the number of clusters be driven by the data itself. In our experience, the user must specify that SPADE find a large number of clusters in order to ensure that rare cellular phenotypes are represented in the ensuing SPADE trees. This requirement causes SPADE to overcluster the data. We think, therefore, that SPADE needs to implement a formal methodology for determining when a single cluster cannot be further subdivided on the basis of the data being analyzed. This methodology should, at a minimum, take into consideration the empirically determined resolution limit of the detection platform, whether it be CyTOF or conventional fluorescence-based flow cytometry. If all differences between cells in a cluster fall below this resolution limit, then no further division into subclusters would be permitted. In addition, SPADE should enable groups of files to be compared using the same tree structure (such as comparing patients with healthy controls in which the tree structure is defined by data from the healthy controls). Currently, groups of files can only be compared if all data files are submitted to the program at one time, and no group-level statistical comparisons are available.
Microscope-slide-based linear antigen arrays were developed over a decade ago and have proven particularly useful for studying antibody responses to a large panel of different antigens in autoimmune, rheumatologic, and allergic diseases.22 The initial methodology was simple and involved printing purified or recombinant peptides or proteins on glass microscope slides coated with materials such as poly-l lysine, epoxy, and nitrocellulose to enhance noncovalent binding of the printed target peptides to the slide surface.22–25 Printing was, and still is, usually performed using contact printing and standard robotic microarrayers, but has evolved to include delivery using piezoelectric arrays, among other methods. Array content for the characterization of autoantigens has also progressed to include arrays of proteins, peptides, carbohydrates, and even lipids.26–28
Many groups still construct their own custom microarrays for individual diseases and applications. Investigators who lack the instrumentation or expertise to set up an array facility can purchase commercially available large-scale arrays containing over 10,000 recombinant proteins.29,30 The majority of array methodologies employ fluorescence or chemiluminescence for detection; new technologies for detection include multiplexed surface plasmon resonance,31 Raman spectral measurement,32,33 and magnetic particles on giant magnetoresistive sensors.34 If antigen array techniques are to alter the clinical practice of rheumatology, they will most likely do so in clinical laboratories or even at point-of-care using sophisticated sensors to read out the array data.
Rheumatology has several factors that make it particularly well-suited to the use of protein array technology. First, many rheumatic diseases are characterized by the presence of serum autoantibodies that predate development of clinical disease. These proteins are useful for diagnosis and prognosis, and, as some of them can be directly pathogenic, offer important clues for understanding disease pathogenesis.35 Second, a large number of rheumatic and other inflammatory diseases are thought to be autoimmune in nature, yet the target antigen(s) have yet to be identified. Third, autoantibody identification might prove useful for development of antigen-specific therapies36,37 or for selecting treatment modalities, such as belimumab or other biologic therapeutics, that are known to reduce levels of autoantibodies in treated patients.
Systemic lupus erythematosus (SLE) is a model autoimmune disease that has been extensively studied using multiplex assays. SLE is characterized by multisystem organ involvement and the production of high-titre, highly specific autoantibodies directed against molecules found in the nucleus (anti-nuclear antibodies).38 SLE is an extremely heterogeneous disease and, as such, is poorly understood, has few good biomarkers, and had no approved therapeutics until 2011. A striking finding in SLE and SLE-related diseases, including dermatomyositis, polymyositis, and systemic sclerosis, is that a majority of prominent autoantigens exist as particles containing one or more polypeptides that are associated with nucleic acids, such as RNA and DNA.39 Antigen arrays, whether spotted onto microscope slides or developed as bead-based arrays, have been used to simultaneously measure antibodies directed against all of the particles, individual polypeptides from the particles, and even linear epitopes modelled on each polypeptide, for both SLE and SLE-related diseases.22,40–43
Peripheral blood mononuclear cells (PBMCs) from a large subset of patients with SLE contain what has been referred to as an interferon biosignature.36,44 Several groups have demonstrated that mRNA transcript profiles from this SLE subset are highly similar to mRNA transcript profiles from PBMCs from healthy individuals that are exposed, in vitro, to type I interferons (IFN-α and IFN-β).44,45 This observation led to the hypothesis that defects in type I interferons and/or interferon-related signalling pathways could underlie the disease in a large subset of patients who develop SLE, and could lead to therapies targeting this pathway.46,47
Multiplexed protein measurements have now been used to broadly characterize serum analytes; patients with SLE who possess the interferon biosignature were identified as part of the Autoimmune Biomarkers Collaborative Network,44 to test the hypothesis that, just as interferon-inducible transcript profiles in PBMCs are strongly associated with SLE, interferon-inducible serum cytokine and chemokine expression can be found in blood from patients with SLE. Bauer et al.48 used a method called rolling circle amplification to compare protein levels of a panel of 160 cytokines, chemokines, growth factors, and soluble receptors in patients with SLE with those in healthy controls.48 The same analytes were also measured in supernatants prepared from PBMCs from healthy donors that had been stimulated for varying periods of time with IFN-α. Surprisingly, ~30 circulating factors were markedly upregulated in blood from patients with SLE, many of them interferon-inducible. This striking observation provided early biochemical evidence that the interferon biosignature was not just an epiphenomenon, but rather was directly linked to the biology of the underlying disease. 
Importantly, these findings seem to be clinically actionable, as measurement of just three of the chemokines (namely CCL2, CCL19 and CXCL10, performed using a high-throughput chemiluminescent assay) accurately predicted disease activity and clinically meaningful disease flares over a 1-year period in a cohort of 267 patients with SLE.49 In fact, measurement of these three chemokines proved to be superior to standard clinical rheumatology assays including those that measure C3, C4, double-stranded DNA, erythrocyte sedimentation rate, and C-reactive protein level.49 Taken together, these results provide a rationale for multiplexed measurement of cytokines and chemokines in other autoimmune diseases, including RA, in which a subset of cytokines have been shown to be elevated and associated with aggressive disease,50 and multiple sclerosis, in which a multiplexed bead-based assay demonstrated that IL-17F levels were elevated in patients with multiple sclerosis who failed to respond to IFN-β treatment.51
Are autoantibody profiles associated with the interferon signatures described above? The research group of one of the authors (P. J. Utz) has used arrays containing over 100 antigens to analyse the same serum samples used by Bauer et al.48 and demonstrated a strong association with autoantibodies directed against particles associated with RNA and DNA; this association has now been replicated in two additional SLE cohorts (P. J. Utz, unpublished work). We hypothesize that immune complexes composed of these RNA-containing and DNA-containing antigens are internalized by B cells and dendritic cells, at which point the RNA and DNA moieties dissociate from the immune complexes and activate proinflammatory Toll-like receptors including TLR3, TLR7, TLR8 and TLR9.52
Autoantibody profiles have been used by other groups to study cohorts of patients with SLE, RA, and multiple sclerosis. Multiple ongoing studies by one of the authors (P. J. Utz) are focused on characterizing antibody profiles in patients who are exposed to investigational drugs, with the goal of identifying predictive biomarkers.53 Although beyond the scope of this Review, antigen arrays have been extremely useful in studying mouse models of lupus, particularly mice lacking genes encoding interferon signalling molecules, retrogenic mice, and mice with altered MHC molecules.37,54–57
Clearly, multiplexed protein measurements will be crucial for elucidating pathogenic mechanisms in rheumatic diseases. Newer methods, such as high-throughput immunophenotyping using transcription (HIT) and Intel® (Intel Corporation, Santa Clara, CA, USA) peptide arrays synthesized using photolithography on the surface of silicon wafers, will enable more rapid and accurate measurement of serum analytes than ever before.58,59
Prior to the development of `next-generation' DNA sequencing instruments in the first decade of the 21st century, sequencing costs limited the characterization of B-cell receptor (BCR) and T-cell receptor (TCR) populations. The experimental landscape has changed with the commercialization of several sequencing technologies that now make it possible to obtain thousands to millions of TCR or immunoglobulin sequences at a relatively low cost.40,60–68 Currently, the major issues are: how best to prepare immune-receptor-sequence libraries, which sequencing technologies to use, how to analyze the data, and how to relate sequence data with functional activities of the immunoglobulin or TCR complexes.
One can break down the kinds of analysis enabled by high-throughput DNA sequencing of TCR or immunoglobulin rearrangements into three main categories. First, this method can be used to measure overall repertoire features, including: V, D and J segment usage frequencies (Figure 4); junctional properties, such as exonuclease digestion and non-templated base addition; the pattern of amino acid usage in the CDR3 region; evidence of receptor editing; heavy-chain isotype usage and hypermutation of rearranged gene segments (in the case of antibodies); and the number of distinct sequences present, which can be used to estimate repertoire diversity. Second, the receptors expressed by clonally expanded B cells or T cells (Figure 4) can be detected and characterized, whether or not one knows the antigen specificity or other functional features of the expanded clones. Third, B-cell or T-cell clones of interest that have previously been identified and correlated with known function can be tracked. Each of these kinds of analysis can yield insights into lymphocyte populations but the features of T-cell and B-cell repertoires that distinguish autoimmune disease patients from healthy individuals have not yet been fully explored.
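Some of the overall repertoire features listed above can be computed directly from annotated reads. The following Python sketch assumes that each read has already been annotated as a hypothetical `(v_segment, j_segment, cdr3)` tuple by an upstream alignment step that is not shown. It reports V-segment usage frequencies and the number of distinct sequences, and it adds a clonality index based on normalized Shannon entropy; that index is an assumption of this example and is not a measure prescribed in this Review.

```python
import math
from collections import Counter


def repertoire_summary(rearrangements):
    """Summarize annotated reads given as (v_segment, j_segment, cdr3) tuples."""
    if not rearrangements:
        raise ValueError("no rearrangements supplied")
    total = len(rearrangements)

    # Fraction of reads that use each V segment.
    v_counts = Counter(v for v, _, _ in rearrangements)
    v_usage = {v: n / total for v, n in v_counts.items()}

    # Identical tuples are treated as one distinct sequence.
    clone_counts = Counter(rearrangements)
    richness = len(clone_counts)

    # Assumed clonality index: 1 minus normalized Shannon entropy
    # (0 = all sequences equally frequent, 1 = a single sequence).
    entropy = -sum((n / total) * math.log(n / total) for n in clone_counts.values())
    clonality = 1.0 if richness == 1 else 1.0 - entropy / math.log(richness)

    return {"v_usage": v_usage, "distinct_sequences": richness, "clonality": clonality}


# Toy reads (CDR3 strings are invented for illustration).
reads = [("IGHV3-23", "IGHJ4", "CARDYW")] * 3 + [("IGHV1-69", "IGHJ6", "CARGGW")]
print(repertoire_summary(reads))
```

Because the number of distinct sequences observed depends on sequencing depth and error rate, a count of this kind is only a starting point for estimating repertoire diversity.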
The key variables in high-throughput DNA sequencing are read length, throughput, accuracy, and cost. Although this technology is rapidly developing, most published work on high-throughput sequencing of immunoglobulin and TCR to date has used either the 454 platform (Roche, Basel, Switzerland), owing to its long read lengths (~450 bases) and moderate throughput (1 million reads per run), or the Illumina platform (Illumina, San Diego, CA, USA) with its higher throughput (tens to hundreds of millions of reads per run) for comparable cost, but shorter read lengths (up to 150 bases from each end of a DNA molecule). The 454 instrument can capture a full immunoglobulin heavy chain V(D)J sequence in a single read, which is very helpful when studying patterns of hypermutation in clonally related IgH.40,43,61,64,68–70 TCR sequences can be captured by shorter reads covering the V(D)J junction, and can take advantage of the Illumina platform throughput.62,65,71
The number of sequences that must be measured to provide meaningful data depends on the biological question being asked. Features of the immune repertoire such as segment usage, junctional nucleotides, hypermutation rates, and clonality can be analysed with thousands to tens of thousands of sequences. Deeper sequencing can detect progressively rarer populations. Typically, the detection of very rare sequences will only be meaningful if one has a prior reason for being interested in them, such as knowing the binding activity of these sequences, having previously observed clonally-related sequences in the same individual, or having seen similar sequences in other individuals. In addition, the finite rate of sequencing errors or PCR errors in a deep-sequencing experiment leads to the generation of artifactual sequence variants that can complicate estimation of the true diversity of an immunoglobulin or TCR library, particularly if the number of input B cells or T cells is not known, or if conservative filtering and replicate sample sequencing steps are not taken.63,72
For library preparation, multiplexed PCR reactions using large numbers of primers specific to the families of genes that encode the V and J segments have the advantage of relatively efficiently capturing sequences for amplification, but are difficult to optimize and usually confer amplification bias to some sequences. Heavily hypermutated immunoglobulin sequences are expected to be under-represented in all datasets owing to mutations in primer binding sites. The use of a variety of primer sets, including primers located in the relatively less-mutated leader regions of genes encoding the V segment, can alleviate this problem.73 An alternative strategy is to use a protocol involving rapid amplification of complementary DNA ends (5' RACE), which does not rely on gene segment-specific primers. Our current knowledge of human variation in immunoglobulin and TCR germline loci is incomplete, and copy number variants (both deletions and amplifications), allelic variants, and other germline locus features might affect detection strategies.74,75 Choice of template can also affect data interpretation, as genomic DNA is normalized to one copy of a V(D)J rearrangement per cell, and replicate libraries generated from genomic DNA aliquots give information about distinct cell populations. As mRNA is present in multiple copies per cell, sequencing from cDNA limits the ability to distinguish an expanded clonal population from high levels of mRNA expression by a single cell.
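The reasoning about replicate genomic DNA libraries can be made concrete with a short Python sketch. It rests on an assumption that goes slightly beyond the text: because each cell contributes its rearranged template to only one aliquot, a rearrangement observed in several independent aliquots is taken to derive from more than one cell, setting aside contamination and sequencing error. The function name `clones_in_multiple_replicates` and the clonotype labels are hypothetical.

```python
from collections import Counter


def clones_in_multiple_replicates(replicates, min_replicates=2):
    """Return clonotypes observed in at least `min_replicates` libraries.

    `replicates` is a list of read-level clonotype identifiers, one list
    per independently amplified genomic DNA aliquot.
    """
    seen_in = Counter()
    for library in replicates:
        # Count each clonotype once per library, regardless of read depth,
        # so that PCR amplification within a library does not inflate it.
        seen_in.update(set(library))
    return {clone for clone, n in seen_in.items() if n >= min_replicates}


# Toy data (hypothetical labels): clonotype "A" recurs across aliquots.
libraries = [["A", "A", "B"], ["A", "C"], ["D", "A"]]
print(clones_in_multiple_replicates(libraries))
```

Read counts within a single library are deliberately ignored here, because they reflect amplification as well as cell number.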
The initiating events of human autoimmune disorders are uncertain, and, despite clear evidence that adaptive immune responses have an important role in disease pathogenesis, it remains unknown whether T cells or B cells, or neither, are the site of primary dysregulation leading to immune-mediated damage of host tissues. Studies of the overall repertoire may shed light on abnormal selection processes for T cells and B cells in patients with autoimmune disease, as suggested by reports of alterations in the receptor repertoire following negative selection of self-reactive B cells, and impairment of selection checkpoints in patients with SLE.42,76–78 DNA sequence-based understanding of the underlying immunoglobulin and TCR repertoires, and of the receptors expressed by expanded clonal B-cell and T-cell populations in patients, might offer important new information for classification and monitoring of these diseases.
Will it transpire that the overall repertoires of immunoglobulin or TCR gene rearrangements in patients with autoimmune diseases are pathognomonic in gene segment usage or detailed sequence features, or that they have any other distinguishing parameter when compared with the repertoires of healthy individuals? The answer is currently unknown. It is possible that public TCR or immunoglobulin rearrangements (that is, identical receptors used to respond to the same antigen in more than one person, despite the huge diversity of possible receptors) could be essential pathologic features of some autoimmune diseases. However, aberrant immune responses in different patients with the same diagnosis apparently target multiple self-antigens, different subsets of self-antigens, and multiple epitopes on those antigens, decreasing the likelihood that a particular immunoglobulin or TCR rearrangement will be a highly specific or sensitive disease marker. Indeed, phage display of human single-chain variable antibodies has shown that many distinct sequences can bind the same antigen; over 1,000 distinct immunoglobulin heavy-chain rearrangements result in molecules that bind human B-lymphocyte stimulator (BLyS, also known as TNF ligand superfamily member 13B), with little overall stereotyping of this repertoire.79 Nevertheless, a high-throughput DNA sequencing study of monozygotic twins showed that an individual's germline genomic DNA sequence might be the strongest determinant of the usage of V, D and J segments in the immunoglobulin repertoire, providing a potential mechanism for some of the heritable predisposition to developing autoimmune disorders.80 Other results have highlighted that extensive public rearrangements contribute to immunoglobulin light-chain repertoires.81
If autoimmune disease-specific public TCR or BCR signatures prove difficult to identify, tracking of clonally expanded (and presumably antigen-stimulated) B-cell or T-cell populations over the course of disease and treatment could act as a filter, to identify clones of cells that are likely to be involved in disease pathogenesis in a particular patient. Persistence of particular clones of B cells or T cells, and their correlation with disease activity, response to therapy, and likelihood of relapse, could guide immunosuppressive medication regimens. Studies of lupus nephritis demonstrated that the T cells in renal infiltrates are relatively oligoclonal, and that related clone members can also be detected in blood samples.82–84 In one study, a clonal CD8+ T-cell lineage found in blood and renal tissue samples from a patient with lupus nephritis was still detectable in a subsequent renal biopsy sample taken 6 years later, suggesting that persistent and long-lived clones are a relevant feature of this disease.84 Further investigation of these topics will be greatly enhanced by the use of high-throughput DNA sequencing, by the more comprehensive measurement of TCR or immunoglobulin rearrangements present in a given blood or tissue sample, as well as by establishing age-adjusted normal-range measurements of the clonality of T cells and B cells in healthy individuals. Elderly individuals have high rates of oligoclonal and frequently cytomegalovirus-specific T-cell populations in the blood, particularly in the CD8+ compartment.85,86 Such persistent clonal expansions of T cells must therefore be interpreted with caution in studies of autoimmunity.85,86 Tracking of clonally related B cells and T cells in patient samples over time, particularly if functional data have been obtained to identify pathologically important cell lineages, might offer the best hope of monitoring disease in a patient-specific fashion.
This approach might be challenging, given the imperfect correlation or lag between the presence of both B cells and T cells that express auto-reactive sequences, or the detection of autoantibodies in the serum, and the development of disease signs or symptoms in the patient.35,77
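The clone-tracking idea described above can be illustrated with a minimal sketch. It assumes that each sample has already been reduced to a table of read counts per clonotype, and it uses one common clonality summary (1 minus the normalized Shannon entropy) that is our choice for illustration and is not specified in the studies cited. The function names `clonality` and `persistent_clones` and the clone labels are hypothetical.

```python
import math

def clonality(clone_counts):
    """Return 1 - normalized Shannon entropy of a clone-count table.

    clone_counts maps a clonotype label to its read (or cell) count.
    Values near 0 indicate an even, polyclonal repertoire; values near 1
    indicate dominance by one or a few clones (an oligoclonal repertoire).
    """
    counts = [c for c in clone_counts.values() if c > 0]
    if not counts:
        raise ValueError("no clones with a positive count")
    if len(counts) == 1:
        return 1.0  # a single clone is maximally clonal by this definition
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(len(counts))

def persistent_clones(samples):
    """Return the clonotypes detected in every sample of a time series."""
    present = [{clone for clone, c in s.items() if c > 0} for s in samples]
    return set.intersection(*present) if present else set()

if __name__ == "__main__":
    # Hypothetical counts for one patient at three time points.
    baseline = {"clone_A": 900, "clone_B": 40, "clone_C": 60}
    on_therapy = {"clone_A": 300, "clone_C": 350, "clone_D": 350}
    relapse = {"clone_A": 800, "clone_D": 100, "clone_E": 100}
    for name, sample in [("baseline", baseline),
                         ("on therapy", on_therapy),
                         ("relapse", relapse)]:
        print(name, round(clonality(sample), 3))
    print("persistent:", persistent_clones([baseline, on_therapy, relapse]))
```

A summary of this kind would still need the age-adjusted reference ranges discussed above before a high value could be read as disease related, because expanded clones in older individuals may be cytomegalovirus specific.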
In summary, high-throughput sequencing of immunoglobulin and TCR sequences offers a number of opportunities to expand our knowledge of human autoimmune biology. Global signatures might be present in some autoimmune diseases, but even in the absence of such signatures, tracking of B-cell and T-cell clones in individual patients could be used to monitor disease status and responses to therapy. We predict that the pairing of immunoglobulin or TCR sequencing with other experimental methods (such as selection of antigen-specific cells, or sorting of phenotypic lymphocyte populations of interest) should be particularly powerful for evaluating disease phenotypes.
In many cases, the biological samples analyzed by technologies such as microarrays are heterogeneous; that is, they are composed of multiple different cell types, each with its own gene and protein expression signatures. The frequency of different cell types might vary markedly between specimens, as it does, for example, in peripheral blood samples (2–10-fold differences in frequency among various cell types).87 In the case of gene expression microarrays, for example, the tissue sample is lysed to isolate the mRNA, which is then analysed by microarray. Traditional microarray analysis methods do not take into account any information on cell-type heterogeneity in the sample and so cannot distinguish between variations in gene expression attributable to an actual physiological change in a cell type and those attributable to differences in cell-type frequency. Moreover, the contributions of the different cell types to the total measured gene expression cannot be identified.88–90 Therefore, the ability of these methods to detect differentially expressed genes is strongly affected by variation in the frequencies of different cell types in the sample;88,89,91 moreover, the interpretation of results is made difficult as transcripts are described as part of a single system, without cellular boundaries or context (Figure 5A, right). Techniques to circumvent this issue by isolating specific cell types and profiling each type separately affect the underlying biology to a varying extent and rest on a strong prior assumption about which cell type is of interest. As a result, the perspective of the overall system is missing; that is, any information about non-profiled cell types is unknown and the effects of cell-to-cell interaction are lost (Figure 5A, left).
A methodological innovation is to use statistical deconvolution techniques to achieve a middle ground between cell-type specific and system-wide information levels (Figure 5A, middle; Figure 5B). This approach exploits the fact that the majority of genes are expressed to a varying degree in multiple cell types. By tracking how gene expression fluctuates between samples in relation to cell-frequency changes, the average gene expression of each cell type within an analyzed group of samples, as well as the cell-type specific expression differences between groups, can be accurately estimated in silico.92–94 The sensitivity of cell-type specific expression analysis performed in this manner is often orders of magnitude higher than that obtained by analyzing heterogeneous tissue samples, yet is likely to be lower than that achieved by isolating the individual cell types. Moreover, as the deconvolution methodology does not require any cell separation, the cell type responsible for any detected differences in expression can be identified whilst avoiding the requirement to isolate the cell type of interest. 
In contrast to traditional techniques, increased variation in cell frequencies between samples actually improves the performance of statistical deconvolution in accurately estimating cell-type specific expression and group differences.94 Group differences have been shown to be reliably detected for cell types whose frequency in the sample is as low as 5–10%, though the minimal cell-type frequency for which detection of group differences is possible can only be determined empirically owing to the large number of factors involved.94 Notably, statistical deconvolution-based techniques are not restricted to microarray gene expression but may be easily adapted to a large number of other assays (including deep sequencing, intracellular flow cytometry, mass cytometry, and protein arrays, as well as bead-based profiling) in which the biological samples analyzed are heterogeneous with respect to cell type.
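The principle behind statistical deconvolution can be sketched in a few lines. The sketch assumes a simple linear mixing model, in which the expression measured in each sample is the sum of the cell-type specific expression levels weighted by the measured cell-type frequencies, and it estimates those levels by ordinary least squares. This is an illustrative simplification and not necessarily the procedure used in the cited studies. The function names `deconvolve` and `group_difference` are hypothetical, and the data are simulated.

```python
import numpy as np

def deconvolve(expression, frequencies):
    """Estimate average cell-type specific expression by least squares.

    expression:  (n_samples, n_genes) expression measured in mixed samples
    frequencies: (n_samples, n_cell_types) cell-type proportions per sample
    Returns an (n_cell_types, n_genes) array of estimated expression.
    Assumes expression = frequencies @ cell_type_expression + noise.
    """
    expression = np.asarray(expression, dtype=float)
    frequencies = np.asarray(frequencies, dtype=float)
    estimates, _, _, _ = np.linalg.lstsq(frequencies, expression, rcond=None)
    return estimates

def group_difference(expr_a, freq_a, expr_b, freq_b):
    """Cell-type specific expression difference, group B minus group A."""
    return deconvolve(expr_b, freq_b) - deconvolve(expr_a, freq_a)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated study: 3 cell types, 4 genes, 20 samples per group.
    healthy = np.array([[10.0, 0.0, 5.0, 2.0],
                        [1.0, 8.0, 5.0, 2.0],
                        [0.0, 2.0, 5.0, 9.0]])
    disease = healthy.copy()
    disease[1, 0] += 6.0  # gene 0 is induced only in cell type 1
    freq_h = rng.dirichlet([2.0, 2.0, 2.0], size=20)
    freq_d = rng.dirichlet([2.0, 2.0, 2.0], size=20)
    expr_h = freq_h @ healthy + rng.normal(0.0, 0.1, size=(20, 4))
    expr_d = freq_d @ disease + rng.normal(0.0, 0.1, size=(20, 4))
    print(np.round(group_difference(expr_h, freq_h, expr_d, freq_d), 2))
```

Under this model the estimate relies on the frequencies differing between samples: if every sample had the same composition, the frequency matrix would be rank deficient and the per-cell-type contributions could not be separated, which is consistent with the observation that greater variation in cell frequencies improves performance.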
As in all analyses performed in humans, a large amount of variability exists between samples, which is attributable to genetic differences, environmental factors, medical conditions and medication taken. A balanced experimental design between study groups to control for major factors (such as gender, age, BMI and so on) is recommended, yet accounting for all factors within the study is nearly impossible. We therefore recommend a combined solution comprising: a careful and detailed documentation of as many confounding variables as possible; rigorous statistical testing to measure the effects of the confounder variables at the start of the analyses, and the introduction of the major variables into the statistical model, sample size allowing, as per the classical statistical literature; post-discovery retesting of the relationship between findings and confounder variables; and follow-up experiments aimed at testing detected relationships between main findings and confounder variables.
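As a minimal sketch of what introducing a major confounder into the statistical model can look like, the example below fits an ordinary least-squares model with and without age as a covariate. The linear model, the function name `fit_linear_model` and the simulated values are assumptions made for illustration; a real analysis would also report uncertainty and test model assumptions.

```python
import numpy as np

def fit_linear_model(outcome, group, confounders=None):
    """Ordinary least squares with study group and optional confounders.

    outcome:     (n_samples,) measured biomarker values
    group:       (n_samples,) 0/1 indicator of study group
    confounders: optional (n_samples,) or (n_samples, k) covariates
    Returns coefficients ordered [intercept, group effect, confounders...].
    """
    outcome = np.asarray(outcome, dtype=float)
    columns = [np.ones(len(outcome)), np.asarray(group, dtype=float)]
    if confounders is not None:
        covariates = np.asarray(confounders, dtype=float)
        columns.append(covariates.reshape(len(outcome), -1))
    design = np.column_stack(columns)
    coefficients, _, _, _ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefficients

if __name__ == "__main__":
    # Simulated, noise-free data in which patients are older than controls.
    age = np.array([30, 35, 40, 45, 50, 55, 60, 65], dtype=float)
    group = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)
    outcome = 1.0 + 2.0 * group + 0.1 * age  # true group effect is 2.0
    print("unadjusted group effect:", fit_linear_model(outcome, group)[1])
    print("age-adjusted group effect:", fit_linear_model(outcome, group, age)[1])
```

In this simulated case the unadjusted estimate is inflated because age differs between the groups, whereas the adjusted model recovers the group effect that was built into the data.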
In this Review, we have discussed new technologies that will be used in future immune phenotyping analyses: mass cytometry, peptide and protein arrays, and BCR and TCR sequencing. These novel assays offer the promise of new information to improve the management of autoimmune disease and represent the latest methodology for analyzing cells, soluble proteins, and genes, respectively. New technologies for the analysis of gene expression in whole blood samples and for deconvolution of the resultant datasets enable the expression of specific genes to be assigned to cell subsets, without isolation and manipulation of the blood cells; in this way they offer a much improved method of looking for actionable biomarkers. From such highly multiplexed analytical approaches, panels of actionable biomarkers will undoubtedly be extracted that will be useful for diagnosis, prognosis, clinical subtyping, and selection and monitoring of therapy. Given the complexity of the immune system and the high degree of crosstalk between cells, biomarkers would be expected to consist not only of single measures, but also of relationships between measures. It may be too early to tell which of these new methods will prove most practical and useful, but we strongly believe that future clinical decisions may be guided, in part, by biomarkers that can only be defined as high dimensional. Hence, we advocate for increased training in quantitative methods.
The authors wish to thank Dr Hongwu Du for his advice on the systems immunology section of the article and for assistance with editing the manuscript.
Competing interests: S. D. Boyd declares an association with the following company: ImmuMetrix LLC. See the article online for full details of the relationship. The other authors declare no competing interests.
Author contributions: All authors researched data for the article, substantially contributed to the discussion of content and selection of references and reviewed/edited the manuscript before submission. H. T. Maecker, T. M. Lindstrom, W. H. Robinson, P. J. Utz, M. Hale, S. D. Boyd and S. S. Shen-Orr wrote the article.