This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Dendritic cells (DCs) are a complex group of cells that play a critical role in vertebrate immunity. Lymph-node resident DCs (LN-DCs) are subdivided into conventional DC (cDC) subsets (CD11b and CD8α in mouse; BDCA1 and BDCA3 in human) and plasmacytoid DCs (pDCs). It is currently unclear if these various DC populations belong to a unique hematopoietic lineage and if the subsets identified in the mouse and human systems are evolutionary homologs. To gain novel insights into these questions, we sought conserved genetic signatures for LN-DCs and in vitro derived granulocyte-macrophage colony stimulating factor (GM-CSF) DCs through the analysis of a compendium of genome-wide expression profiles of mouse or human leukocytes.
We show through clustering analysis that all LN-DC subsets form a distinct branch within the leukocyte family tree, and reveal a transcriptional signature evolutionarily conserved in all LN-DC subsets. Moreover, we identify a large gene expression program shared between mouse and human pDCs, and smaller conserved profiles shared between mouse and human LN-cDC subsets. Importantly, most of these genes have not been previously associated with DC function and many have unknown functions. Finally, we use compendium analysis to re-evaluate the classification of interferon-producing killer DCs, lin-CD16+HLA-DR+ cells and in vitro derived GM-CSF DCs, and show that the first are more closely linked to natural killer cells and the latter two to myeloid cells.
Our study provides a unique database resource for future investigation of the evolutionarily conserved molecular pathways governing the ontogeny and functions of leukocyte subsets, especially DCs.
Dendritic cells (DCs) were initially identified by their unique ability to present antigen for the priming of naïve CD4 and CD8 T lymphocytes. DCs have more recently been shown to be key sentinel immune cells able to sense, and respond to, danger very early in the course of an infection due to their expression of a broad array of pattern recognition receptors. Indeed, DCs have been shown to play a major role in the early production of effector antimicrobial molecules such as interferon (IFN)-α and IFN-β or inducible nitric oxide synthase, and it has been demonstrated that DCs can also activate other innate effector cells such as natural killer (NK) cells. In light of these properties, it has been clearly established that DCs are critical for defense against infections, as they are specially suited for the early detection of pathogens, the rapid development of effector functions, and the triggering of downstream responses in other innate and adaptive immune cells.
DCs can be divided into several subsets that differ in their tissue distribution, their phenotype, their functions and their ontogeny. Lymph node-resident DCs (LN-DCs) encompass conventional DCs (cDCs) and plasmacytoid DCs (pDCs) in both humans and mice. LN-cDCs can be subdivided into two populations in both mouse (CD8α and CD11b cDCs) and in human (BDCA1 and BDCA3 cDCs). In mouse, CD8α cDCs express many scavenger receptors and may be especially efficient for cross-presenting antigen to CD8 T cells, whereas CD11b cDCs have been suggested [9,10], and recently shown, to be specialized in the activation of CD4 T cells. As human cDC functions are generally studied with cells derived in vitro from monocytes or from CD34+ hematopoietic progenitors, which may differ considerably from the naturally occurring DCs present in vivo, much less is known of the eventual functional specialization of human cDC subsets. Due to differences in the markers used for identifying DC subsets between human and mouse and to differences in the expression of pattern recognition receptors between DC subsets, it has been extremely difficult to address whether there are functional equivalences between mouse and human cDC subsets.
pDCs, a cell type discovered recently in both human and mouse, appear broadly different from the other DC subsets to the point that their place within the DC family is debated. Some common characteristics between human and mouse pDCs that distinguish them from cDCs include: their ability to produce very large amounts of IFN-α/β upon activation, their limited ability to prime naïve CD4 and CD8 T cells under steady state conditions, and their expression of several genes generally associated with the lymphocyte lineage and not found in cDCs. Several differences have also been reported between human and mouse pDCs, which include the unique ability of mouse pDCs to produce high levels of IL-12 upon triggering of various toll-like receptors (TLRs) or stimulation with viruses [13,14]. Adding to the complexity of accurately classifying pDCs within leukocyte subsets are recent reports describing cell types bearing mixed phenotypic and functional characteristics of NK cells and pDCs in the mouse [15,16]. Collectively, these findings raise the question of how closely related human and mouse pDCs are to one another or to cDCs as compared to other leukocyte populations.
Global transcriptomic analysis has recently been shown to be a powerful approach to yield new insights into the biology of specific cellular subsets or tissues through their specific gene expression programs [17-21]. Likewise, genome-wide comparative gene expression profiling between mouse and man has recently been demonstrated as a powerful approach to uncover conserved molecular pathways involved in the development of various cancers [22-27]. However, to the best of our knowledge, this approach has not yet been applied to study normal leukocyte subsets. Moreover, DC subsets have not yet been scrutinized through the prism of gene expression patterns within the context of other leukocyte populations. In this report, we assembled compendia comprising various DC and other leukocyte subtypes, both from mouse and man. Using intra- and inter-species comparisons, we define the common and specific core genetic programs of DC subsets.
We used pan-genomic Affymetrix Mouse Genome 430 2.0 arrays to generate gene expression profiles of murine splenic CD8α (n = 2) and CD11b (n = 2) cDCs, pDCs (n = 2), B cells (n = 3), NK cells (n = 2), and CD8 T cells (n = 2). To generate a compendium of 18 mouse leukocyte profiles, these data were complemented with published data retrieved from public databases, for conventional CD4 T cells (n = 2) and splenic macrophages (n = 3). We used Affymetrix Human Genome U133 Plus 2.0 arrays to generate gene expression profiles of blood monocytes, neutrophils, B cells, NK cells, and CD4 or CD8 T cells. These data were complemented with published data on human blood DC subsets (pDCs, BDCA1 cDCs, BDCA3 cDCs, and lin-CD16+HLA-DR+ cells) retrieved from public databases. All of the human samples were done in independent triplicates. Information regarding the original sources and the public accessibility of the datasets analyzed in the paper is given in Table 1.
To verify the quality of the datasets mentioned above, we analyzed signal intensities for control genes whose expression profiles are well documented across the cell populations under consideration. Expression of signature markers was confirmed to be detected only in each corresponding population (see Table 2 for mouse data and Table 3 for human data). For example, Cd3 genes were detected primarily in T cells and often to a lower extent in NK cells; the mouse Klrb1c (NK1.1) gene or the human KIR genes in NK cells; Cd19 in B cells; the mouse Siglech and Bst2 genes or the human LILRA4 (ILT7) and IL3RA (CD123) genes in pDCs; and Cd14 in myeloid cells. As expected, many markers were expressed in more than a single cell population. For example, in the mouse, Itgax (Cd11c) was found expressed to high levels in NK cells and all DC subsets; Itgam (Cd11b) in myeloid cells, NK cells, and CD11b cDCs; Ly6c at the highest level in pDCs but also strongly in many other leukocyte populations; and Cd8a in pDCs and CD8α cDCs. However, the analysis of combinations of these markers confirmed the lack of detectable cross-contaminations between DC subsets: only pDCs expressed high levels of Klra17 (Ly49q) and Ly6c together, while Cd8a, Ly75 (Dec205, Cd205), and Tlr3 were expressed together at high levels only in CD8α cDCs, and Itgam (Cd11b) with Tlr1 and high levels of Itgax (Cd11c) only in CD11b cDCs. Thus, each cell sample studied harbors the expected pattern of expression of control genes, and our data should therefore reflect the gene expression profile of each population analyzed, without any detectable cross-contamination.
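The marker-combination check described above can be illustrated with a minimal sketch. The intensities, the `HIGH` cutoff, and the function name `populations_coexpressing` are all hypothetical and chosen only for illustration; they are not values or procedures taken from the study.

```python
# Hypothetical signal intensities (arbitrary units) for illustration only;
# these are NOT values from the study's arrays.
EXPRESSION = {
    "pDC": {"Klra17": 2400, "Ly6c": 5200, "Cd8a": 900, "Ly75": 60,
            "Tlr3": 40, "Itgam": 30, "Tlr1": 50, "Itgax": 800},
    "CD8a cDC": {"Klra17": 40, "Ly6c": 300, "Cd8a": 3100, "Ly75": 1800,
                 "Tlr3": 1500, "Itgam": 60, "Tlr1": 90, "Itgax": 4000},
    "CD11b cDC": {"Klra17": 35, "Ly6c": 250, "Cd8a": 45, "Ly75": 120,
                  "Tlr3": 70, "Itgam": 2200, "Tlr1": 900, "Itgax": 4500},
    "NK": {"Klra17": 30, "Ly6c": 1500, "Cd8a": 40, "Ly75": 50,
           "Tlr3": 30, "Itgam": 1900, "Tlr1": 80, "Itgax": 1200},
}

HIGH = 500  # assumed cutoff for calling a marker "highly expressed"


def populations_coexpressing(expression, markers, threshold=HIGH):
    """Return the populations in which every listed marker reaches the threshold."""
    return sorted(
        population
        for population, genes in expression.items()
        if all(genes.get(marker, 0) >= threshold for marker in markers)
    )


# A single marker can be shared by several populations...
shared = populations_coexpressing(EXPRESSION, ["Itgax"])
# ...whereas a combination of markers should single out one population.
pdc_only = populations_coexpressing(EXPRESSION, ["Klra17", "Ly6c"])
cd8_only = populations_coexpressing(EXPRESSION, ["Cd8a", "Ly75", "Tlr3"])
cd11b_only = populations_coexpressing(EXPRESSION, ["Itgam", "Tlr1", "Itgax"])
```

In this toy table, Itgax alone does not discriminate between populations, while each marker combination is jointly high in exactly one DC subset, which is the pattern expected in the absence of cross-contamination.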
To determine whether LN-DCs may constitute a specific leukocyte family, we first evaluated the overall proximity between LN-DC subsets as compared to lymphoid or myeloid cell types, based on the analysis of their global gene expression program. For this, we used hierarchical clustering with complete linkage, principal component analysis (PCA), as well as fuzzy c-means (FCM) partitional clustering approaches. Hierarchical clustering clearly showed that the three LN-DC subsets studied clustered together, both in mouse (7,298 genes analyzed; Figure 1a) and human (11,507 genes analyzed; Figure 1b), apart from lymphocytes and myeloid cells. The close relationship between all the DC subsets in each species was also revealed by PCA for mouse (Figure 1c) and human (Figure 1d). Finally, FCM clustering also allowed clear visualization of a large group of genes with high and specific expression levels in all DC subtypes (Figure 2, 'pan DC' clusters). These analyses, which are based on very different mathematical methods, thus highlight the unity of the LN-DC family. To investigate the existence of a core genetic program common to the LN-DC subsets and conserved in mammals, clustering of mouse and human data together was next performed. We identified 2,227 orthologous genes that showed significant variation of expression in both the mouse and human datasets. After normalization (as described in Materials and methods), the two datasets were pooled and a complete linkage clustering was performed. As shown in Figure 1e, the three major cell clusters, lymphocytes, LN-DCs, and myeloid cells, were obtained as observed above when clustering the mouse or human data alone. Thus, this analysis shows that DC subsets constitute a specific cell family distinct from the classic lymphoid and myeloid cells and that pDCs belong to this family in both mice and humans.
All the LN-DC subsets studied therefore share a common and conserved genetic signature, which must determine their ontogenic and functional specificities as compared to other leukocytes, including other antigen-presenting cells.
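For readers unfamiliar with complete-linkage hierarchical clustering, the following is a minimal sketch on toy data. The six-gene profiles are invented, and the use of a Pearson correlation distance is an assumption made for illustration; the distance measure and normalization actually used are those given in Materials and methods.

```python
from itertools import combinations
from math import sqrt

# Toy expression profiles over six hypothetical genes (two "DC" genes,
# two "lymphoid" genes, two "myeloid" genes). Not data from the study.
PROFILES = {
    "pDC": [9, 8, 2, 1, 1, 2],
    "cDC": [8, 9, 1, 1, 2, 3],
    "T": [1, 2, 9, 8, 1, 1],
    "B": [2, 1, 8, 9, 1, 2],
    "Mono": [1, 2, 1, 2, 9, 8],
    "Neut": [2, 1, 1, 1, 8, 9],
}


def pearson_distance(x, y):
    """1 - Pearson correlation (assumed distance; 0 means identical shape)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


def complete_linkage(profiles):
    """Agglomerative clustering; returns the merges as (members_a, members_b, distance)."""
    clusters = [frozenset([name]) for name in profiles]
    pair_distance = {
        frozenset([a, b]): pearson_distance(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def linkage(c1, c2):
        # Complete linkage: distance between the two most dissimilar members.
        return max(pair_distance[frozenset([a, b])] for a in c1 for b in c2)

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((sorted(c1), sorted(c2), linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


MERGES = complete_linkage(PROFILES)
for members_a, members_b, distance in MERGES:
    print(members_a, "+", members_b, f"at distance {distance:.3f}")
```

On this toy input the two DC profiles, the two lymphocyte profiles, and the two myeloid profiles each merge with one another before any merge across groups, which mirrors the three-branch structure described in the text.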
Genes that are selectively expressed in a given subset of leukocytes in a conserved manner between mouse and human were identified and are presented in Table 4. Our data analysis is validated by the recovery of all the genes already known to contribute to the characteristic pathways of development or to the specific functions for the leukocyte subsets studied, as indicated in bold in Table 4. These include, for example, Cd19 and Pax5 for B cells, Cd3e-g and Lat for T cells, as well as Ncr1 and Tbx21 (T-bet) for NK cells. Similarly, all the main molecules involved in major histocompatibility (MHC) class II antigen processing and presentation are found selectively expressed in antigen-presenting cells (APCs). Indeed, a relatively high proportion of the genes selectively expressed in lymphocytes or in APCs has been known for a long time to be involved in the biology of these cells. However, we also found genes identified only recently as important in these cells, such as March1 or Unc93b1 [40,41] for APCs, and Edg8 for NK cells. Interestingly, we also identified genes that were not yet known to be involved in the biology of these cells, to the best of our knowledge, such as the E430004N04Rik expressed sequence tag in T cells, the Klhl14 gene in B cells, or the Osbpl5 gene in NK cells.
In contrast to the high proportion of documented genes selectively expressed in the cell types mentioned above, most of the genes specifically expressed in LN-DCs have not been previously associated with these cells and many have unknown functions. Noticeable exceptions are Flt3, which has been recently shown to drive the differentiation of all mouse [43-45] and human LN-DC subsets, and Ciita (C2ta), which is known to specifically regulate the transcription of MHC class II molecules in cDCs. Interestingly, mouse or human LN-DCs were found to lack expression of several transcripts present in all the other leukocytes studied here, including members of the gimap family, especially gimap4, which have been very recently shown to be expressed to high levels in T cells and to regulate their development and survival [49-51].
Thus, the identity of the gene signatures specific for the various leukocyte subsets studied highlights the sharp contrast between our advanced understanding of the molecular bases that govern the biology of lymphocytes or the function of antigen presentation and our overall ignorance of the genetic programs that specifically regulate DC biology. This contrast is reinforced upon annotation of each of the gene signatures found with Gene Ontology terms for biological processes, molecular functions, or cellular components, and with pathways, or with InterPro domain names, using DAVID bioinformatics tools [52,53] (Table 5). Indeed, many significant annotations pertaining directly to the specific function of myeloid cells, lymphocyte subsets or APCs are recovered, as indicated in bold in Table 5. In contrast, only very few significant annotations are found for LN-DCs, most of which may not appear to yield informative knowledge regarding the specific functions of these cells.
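The idea of a signature that is selective for one subset and conserved across species can be sketched as follows. The fold-change criterion, the intensities, and the `HypoA`/`HYPOA` gene pair are hypothetical and serve only to illustrate the logic; the statistical criteria actually applied are those described in Materials and methods.

```python
# Hypothetical intensities; HypoA/HYPOA is an invented ortholog pair that is
# selective in mouse B cells only, to show why both species must agree.
MOUSE = {
    "Cd19": {"B": 4000, "T": 50, "NK": 40},
    "Pax5": {"B": 1500, "T": 30, "NK": 30},
    "Cd3e": {"B": 40, "T": 3000, "NK": 600},
    "Ncr1": {"B": 20, "T": 60, "NK": 1200},
    "HypoA": {"B": 900, "T": 100, "NK": 80},
}
HUMAN = {
    "CD19": {"B": 3500, "T": 60, "NK": 50},
    "PAX5": {"B": 1200, "T": 40, "NK": 35},
    "CD3E": {"B": 50, "T": 2800, "NK": 500},
    "NCR1": {"B": 25, "T": 70, "NK": 1000},
    "HYPOA": {"B": 300, "T": 280, "NK": 90},
}
ORTHOLOGS = {"Cd19": "CD19", "Pax5": "PAX5", "Cd3e": "CD3E",
             "Ncr1": "NCR1", "HypoA": "HYPOA"}


def selective_genes(expression, target, fold=2.0):
    """Genes whose level in `target` is at least `fold` times the maximum elsewhere."""
    selected = set()
    for gene, levels in expression.items():
        others = [value for population, value in levels.items() if population != target]
        if levels[target] >= fold * max(others):
            selected.add(gene)
    return selected


def conserved_signature(mouse, human, orthologs, target, fold=2.0):
    """Ortholog pairs that are selective for `target` in both species."""
    mouse_hits = selective_genes(mouse, target, fold)
    human_hits = selective_genes(human, target, fold)
    return sorted(
        (m, h) for m, h in orthologs.items() if m in mouse_hits and h in human_hits
    )


b_signature = conserved_signature(MOUSE, HUMAN, ORTHOLOGS, "B")
t_signature = conserved_signature(MOUSE, HUMAN, ORTHOLOGS, "T")
nk_signature = conserved_signature(MOUSE, HUMAN, ORTHOLOGS, "NK")
```

Requiring the criterion to hold in both species removes genes such as the invented HypoA, whose selectivity is seen in only one of the two toy datasets.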
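To make concrete what a "significant annotation" means for a gene signature, the sketch below computes a plain one-sided hypergeometric enrichment p-value. This is an illustrative assumption: the scoring implemented by the DAVID tools may differ from this textbook test, and the counts used in the example are invented.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) for a signature of `sample` genes drawn from `population`
    genes, of which `annotated` carry the annotation term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


# Invented example: 20 genes on the array, 5 carry the term,
# and all 4 genes of a signature carry it.
p_all_hits = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=4)
# Asking for "at least zero hits" must give probability 1.
p_trivial = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=0)
```

A small p-value indicates that the term is over-represented in the signature relative to the background; signatures made up largely of uncharacterized genes, as reported here for LN-DCs, naturally yield few such terms.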
Thus, when taken together, our data show that LN-DC subsets constitute a specific family of leukocytes, sharing selective expression of several genes, most of which are still of unknown function. We believe that the identification of these genes selectively expressed in LN-DC subsets in a conserved manner between mouse and human will be very helpful for future investigation of the mechanisms regulating LN-DC biology by the generation and study of novel genetically manipulated animal models.
To search for equivalence between mouse and human LN-DC subsets, we examined their genetic relationships in the hierarchical clustering depicted in Figure 1e. Two observations can be made. First and remarkably, mouse and human pDCs clustered together. This result indicates a high conservation in their genetic program and establishes these two cell types as homologs. Indeed, human and mouse pDCs share a large and specific transcriptional signature (Table 4), with a number of genes comparable to those of the transcriptional signature of NK or T cells. To the best of our knowledge, most of these genes had not been reported to be selectively expressed in pDCs, with the exception of Tlr7 [31,54] and Plac8 (C15). Second, although mouse and human cDCs clustered together, the two cDC subsets of each species appeared closer to one another than to the subsets of the other species. Thus, no clear homology could be drawn between human and mouse cDC subsets in this analysis. However, it should be noted that known homologous human and mouse lymphoid cell types also failed to cluster together in this analysis and were closer to the other cell populations from the same species within the same leukocyte family. This is clearly illustrated for the T cell populations as mouse CD4 and CD8 T cells cluster together and not with their human CD4 or CD8 T cell counterparts (Figure 1e). Therefore, to further address the issue of the relationships between human and mouse cDC subsets, we used a second approach. We performed hierarchical clustering with complete linkage on the mouse and human LN-DC datasets alone (1,295 orthologous LN-DC genes), without taking into account the pattern of expression of each gene in the other leukocyte subsets as it may have hidden some degree of similarity between subsets clustering in the same branch.
The results of the analysis of gene expression focused on DCs confirmed that mouse and human pDCs cluster together and apart from cDCs (Figure 3). Importantly, when analyzing the DC datasets alone, mouse CD8α and human BDCA3 cDCs on the one hand, and mouse CD11b and human BDCA1 cDCs on the other hand, clustered together and shared a conserved genetic signature (Figure 3 and Table 6). Thus, although a higher genetic distance is observed between mouse and human conventional DC subsets as opposed to pDCs, a partial functional equivalence is suggested between these cell types. The majority of the genes conserved between mouse CD8α and human BDCA3 cDCs versus mouse CD11b and human BDCA1 cDCs have unknown functions and have not been previously described to exhibit a conserved pattern of expression between these mouse and human cell types. Notable exceptions are Tlr3 [31,56] and the adhesion molecule Nectin-like protein 2 (Cadm1, also called Igsf4), which have been previously described to be conserved between mouse CD8α and human BDCA3 cDCs. When comparing cDCs to pDCs, a few genes already known to reflect certain functional specificities of these cells when compared to one another are identified. Tlr7 and Irf7 are found preferentially expressed in pDCs over cDCs, consistent with previous reports that have documented their implication in the exquisite ability of these cells to produce high levels of IFN-α/β in response to viruses [58-60]. Ciita, H2-Ob, Cd83 and Cd86 are found preferentially expressed in cDCs over pDCs, which is consistent with their higher efficiency for MHC class II antigen presentation and T cell priming.
The functional annotations associated with the genes selectively expressed in specific DC subsets when compared to the others are listed in Table 7. The most significant clusters of functional annotations in pDCs point to the specific expression in these cells of many genes expressed at the cell surface or in intracellular compartments, including the endoplasmic reticulum, the Golgi stack, and the lysosome. A cluster of genes involved in endocytosis/vesicle-mediated transport is also observed. This suggests that pDCs have developed an exquisitely complex set of molecules to sense, and interact with, their environment and to regulate the intracellular trafficking of endocytosed molecules, which may be consistent with the recent reports describing different intracellular localization and retention time of endocytosed CpG oligonucleotides in pDCs compared to cDCs [62,63]. The most significant clusters of functional annotations in cDCs concern the response to pest, pathogens or parasites and the activation of lymphocytes, which include genes encoding TLR2, costimulatory molecules (CD83, CD86), proinflammatory cytokines (IL15, IL18), and chemokines (CXCL9, CXCL16), consistent with the specialization of cDCs in T cell priming and recruitment. Clusters of genes involved in inflammatory responses are found in both pDCs and cDCs. However, their precise analysis highlights the differences in the class of pathogens recognized, and in the nature of the cytokines produced, by these two cell types: IFN-α/β production in response to viruses by pDCs through mechanisms involving IRF7 and eventually TLR7; and recognition and killing of bacteria and production of IL15 or IL18 by cDCs through mechanisms eventually involving TLR2 or lysozymes.
Many genes selectively expressed in cDCs are involved in cell organization and biogenesis, cell motility, or cytoskeleton/actin binding, consistent with the particular morphology of DCs linked to the development of a high membrane surface for sampling of their antigenic environment and for the establishment of interactions with lymphocytes. pDCs and cDCs also appear to express different arrays of genes involved in signal transduction/cell communication, transcription regulation and apoptosis. A statistically significant association with lupus erythematosus highlights the proposed harmful role of pDCs in this autoimmune disease.
The mCD11b/hBDCA1 cDC cluster of genes comprises many genes involved in inflammatory responses and the positive regulation of the I-kappaB kinase/NF-kappaB cascade. A statistically significant association with asthma also highlights the proinflammatory potential of this cell type. Recently, it has been reported that the mouse CD11b cDC subset is specialized in MHC class II mediated antigen presentation in vivo. In support of our findings here that mouse CD11b cDCs are equivalent to human BDCA1 cDCs, we found that many of the genes involved in the MHC class II antigen presentation pathway that were reported to be expressed to higher levels in mouse CD11b cDCs over CD8α cDCs are also preferentially expressed in the human BDCA1 cDC subset over the BDCA3 one. These genes include five members of the cathepsin family (Ctsb, Ctsd, Ctsh, Ctss, and Ctsw) as well as Ifi30 and Lamp1 and Lamp2 (see Additional data file 2 for expression values). Thus, it is possible that, like the mouse CD11b cDC subset, human BDCA1 cDCs serve as a subset of DCs that are specialized in presenting antigen via MHC class II molecules. It is also noteworthy that mCD11b and hBDCA1 cDCs express high constitutive levels of genes that are known to be induced by IFN-α/β and that can contribute to cellular antiviral defense (Oas2, Oas3, Ifitm1, Ifitm2, Ifitm3).
No significant informative functional annotations are found for the mCD8α/hBDCA3 cDC gene cluster. However, groups of genes involved in cell organization and biogenesis or in small GTPase regulator activity are found and the study of these genes may increase our understanding of the specific functions of these cells. Mouse CD8α cDCs have been proposed to be specialized for a default tolerogenic function but to be endowed with the unique ability to cross-present antigen for the activation of naïve CD8 T cells within the context of viral infection. It will be important to determine whether this is also the case for hBDCA3 cDCs. From this point of view, it is noteworthy that hBDCA3 cDCs selectively express TLR3, lack TLR7 and TLR9, and exhibit the highest ratio of IRF8 (ICSBP)/TYROBP (DAP12) expression, all of which have been shown to participate in the regulation of the balance between tolerance and cross-presentation by mouse CD8α cDCs [65,66].
A novel cell type has been recently reported in the mouse that presents mixed phenotypic and functional characteristics of pDCs and NK cells, IKDCs [15,16]. A strong genetic relationship between IKDCs and other DC populations was suggested. However, this analysis was based solely on comparison of the transcriptional profile of IKDCs to DCs and not to other cell populations. As IKDCs were also reported to be endowed with antigen presentation capabilities and to be present in mice deficient for the expression of RAG2 and the common γ chain of the cytokine receptors, they have been proposed to belong to the DC family rather than to be a subset of NK cells in a particular state of differentiation or activation. However, IKDCs have been reported to express many mRNAs specific for NK cells, and many of their phenotypic characteristics that were claimed to discriminate IKDCs from NK cells are in fact consistent with classical NK cell features as recently reviewed, including the expression of B220 and CD11c [69,70] (BD/Pharmingen technical datasheet of the CD11c antibody). To clarify the genetic nature of IKDCs, we reanalyzed the published gene chip data on the comparison of these cells with other DC subsets, together with available datasets on other leukocyte populations. We thus assembled published data generated on the same type of microarrays (Affymetrix U74Av2 chips) to build a second mouse compendium, allowing us to compare the transcriptomic profile published for the IKDCs (n = 2) with that of pDCs (n = 2), cDCs (n = 2), CD8α+ (n = 2), CD4+ (n = 2) or double-negative (n = 2) cDC subsets, NK cells, CD4 T cells (n = 2), and B1 (n = 2) and B2 (n = 2) cells. Information regarding the original sources and the public accessibility of the corresponding datasets is given in Table 1.
As depicted in Figure 4a, the results of hierarchical clustering with complete linkage of these datasets, together with our novel 430 2.0 data, clearly show that IKDCs cluster with NK cells, close to other lymphocytes, and not with DCs. Indeed, IKDCs express the conserved genetic signature of NK cells but not of DCs (Table 8 and Additional data file 4). Thus, these results strongly support the hypothesis that the cells described as IKDCs represent a specific subset of mouse NK cells that are in a particular differentiation or activation status, rather than a new DC subset.
A subset of leukocytes characterized as lineage-CD16+HLA-DR+ (hereafter referred to as CD16 cells) has been reported in human blood, and claimed to be a subpopulation of DCs based on their antigen-presentation capabilities. This subset segregates apart from BDCA1 and BDCA3 DCs and pDCs upon gene expression profiling. It is not found in significant amounts in secondary lymphoid organs of healthy donors, contrary to pDCs and BDCA1 or BDCA3 cDCs. It expresses specific pattern recognition receptors, such as TLR4 and TLR8, and chemokine receptors, such as CX3CR1 and CMKOR1, which were initially described to be preferentially expressed by monocytes in humans. As the transcriptional relationship of CD16 cells with other known DC populations was originally established based solely on the transcriptional profile of DCs, we sought to better understand the nature of these cells. For this, we reanalyzed the global gene expression profile of CD16 cells in comparison to not only DC subsets but also to monocytes, neutrophils, and lymphocytes. The results depicted in Figure 4b clearly show that the CD16 cells cluster with neutrophils and monocytes and not with LN-DCs. Indeed, we find many genes that are expressed to much higher levels in monocytes or neutrophils and CD16 cells than in LN-DC subsets (Table 9 and Additional data file 2). Interestingly, MAFB, which has been described to inhibit the differentiation of DCs but to promote that of macrophages from hematopoietic precursors, is expressed to much higher levels in CD16 cells and monocytes compared to DCs (average signal intensity of 6,263 in CD16 cells compared to 3,479 in monocytes, 65 in pDCs, 309 in BDCA1 DCs and <50 in BDCA3 DCs). CD16 cells also express to high levels many genes that are absent or only expressed to very low levels in LN-DCs compared to both lymphoid and myeloid cells, in particular many members of the gimap family.
Reciprocally, many of the genes characterized above as specifically expressed in human and mouse LN-DCs are absent or expressed only to low levels in CD16 cells, in particular FLT3 and SCARB1. Thus, CD16 cells likely differentiate along the canonical myeloid lineage rather than belong to the LN-DC family. However, many genes are also specifically expressed to much higher levels in LN-DC subsets and CD16 cells than in monocytes, neutrophils and lymphocytes, attesting to the existence of biological functions common, and specific, to DC subsets and CD16 cells. Thus, these results strongly suggest that CD16 cells represent a particular subset of monocytes endowed with DC-like properties. One possibility is that CD16 cells are the naturally occurring equivalents of the 'monocyte-derived DCs' generated in vitro.
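The size of the MAFB difference quoted above can be made explicit with a short calculation on the reported average signal intensities. The BDCA3 value is reported only as "<50", so 50 is used here as an upper bound, which makes the corresponding ratio a lower bound; the variable names are illustrative.

```python
from math import log2

# Average MAFB signal intensities as quoted in the text.
MAFB_SIGNAL = {
    "CD16 cells": 6263,
    "monocytes": 3479,
    "pDCs": 65,
    "BDCA1 DCs": 309,
}
BDCA3_UPPER_BOUND = 50  # reported only as "<50"

reference = MAFB_SIGNAL["CD16 cells"]

# Fold difference of CD16 cells over each other population.
fold_over = {
    population: reference / signal
    for population, signal in MAFB_SIGNAL.items()
    if population != "CD16 cells"
}
# Because the BDCA3 signal is below 50, the true ratio is at least this large.
min_fold_over_bdca3 = reference / BDCA3_UPPER_BOUND

for population, fold in fold_over.items():
    print(f"CD16 cells vs {population}: {fold:.1f}-fold (log2 = {log2(fold):.2f})")
print(f"CD16 cells vs BDCA3 DCs: more than {min_fold_over_bdca3:.0f}-fold")
```

The calculation shows that CD16 cells and monocytes differ by less than twofold, whereas CD16 cells exceed every LN-DC subset by roughly twentyfold or more.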
In vitro derived GM-CSF DCs are the most commonly used model to analyze DC biology. They are often used to investigate the interaction between DCs and other cell types or with pathogens, both in mouse (bone marrow (BM)-derived GM-CSF DCs) and human (monocyte-derived GM-CSF DCs). However, the relationship between these in vitro GM-CSF-derived DCs and the LN-DC subsets present in vivo in the steady state is not clear. A very recent publication suggests that in vitro derived GM-CSF mouse DCs may correspond to the DCs that differentiate from Ly6C+ monocytes in vivo only under inflammatory conditions and appear specialized in the production of high levels of tumor necrosis factor-α and inducible nitric oxide synthase in response to intracellular bacteria, therefore differing from LN-DCs according to both ontogenic and functional criteria. To gain further insights into the relationship between monocytes, macrophages, LN-DCs, and in vitro derived GM-CSF DCs, we thus compared their global gene expression profiles in both human and mouse, using publicly available gene chip data. Information regarding the original sources and the public accessibility of the corresponding datasets is given in Table 1. The results depicted in Figure 5 clearly show that the in vitro derived GM-CSF DCs cluster with monocytes and macrophages and not with the LN-DCs. This result was further confirmed by PCA, which also showed that both mouse and human GM-CSF DCs are close to macrophages, and distant from LN-DCs (Additional data file 6). Indeed, we found many genes that are expressed to much higher levels in monocytes, macrophages and in vitro derived GM-CSF DCs than in LN-DC subsets (Tables 10 and 11). As for human CD16 cells, these genes include the transcription factor Mafb. Reciprocally, some of the genes identified in this study as specific to LN-cDCs are expressed only to much lower levels in GM-CSF DCs.
Interestingly, however, compared to monocytes, in vitro derived GM-CSF DCs harbor stronger levels of other lymph node resident cDC-specific genes, including scarb1, snft/9130211l03Rik, spint1, ctsh, C22ORF9/5031439G07Rik, and bri3bp. Thus, in vitro derived GM-CSF DCs seem to harbor a strong myeloid gene signature but also express some of the LN-DC-specific genes, consistent with their myeloid ontogeny and their ability to exert myeloid-type functions but also with their acquisition of DC functional properties. In conclusion, our gene chip data analysis is consistent with a very recent report suggesting that in vitro derived GM-CSF mouse DCs correspond to inflammatory DCs and differ greatly from LN-DCs. Indeed, several papers have recently established that in vitro derived FLT3-L DCs constitute the true equivalent of LN-DCs and constitute the only proper surrogate model currently available for their study [75-77].
By performing meta-analyses of various datasets describing global gene expression of mouse spleen and human blood leukocyte subsets, we have been able to identify for the first time conserved genetic programs common to human and mouse LN-DC subsets. All the LN-DC subsets examined here are shown to share selective expression of several genes, while harboring only low levels of other transcripts present in all other leukocytes. These analyses indicate that LN-DCs, including pDCs, constitute a specific family of leukocytes, distinct from those of classic lymphoid or myeloid cells. Furthermore, we demonstrate a striking genetic proximity between mouse and human pDCs, which are shown for the first time to harbor a very distinct transcriptional signature as large and specific as that observed for NK cells or T cells. In contrast, a higher genetic distance is observed between mouse and human conventional DC subsets, although a partial functional equivalence is suggested between mCD8α and hBDCA3 cDCs on the one hand versus mCD11b and hBDCA1 cDCs on the other hand.
Our finding that LN-DCs constitute a distinct entity within immune cells raises the question of whether these cells form a distinct lineage in terms of ontogeny, or whether their shared gene expression profile (notably that between cDCs and pDCs) reflects a functional rather than a developmental similarity. To date, the place of both cDCs and pDCs in the hematopoietic tree is not clear [78,79]. A BM progenitor, named macrophage and dendritic cell progenitor (MDP), has been recently identified that specifically gives rise to monocytes/macrophages and to cDCs, but not to polymorphonuclear cells or to lymphoid cells [80,81]. Under the experimental conditions used in the corresponding report, pDCs were not detected in the progeny of MDPs. Here, we show that the transcriptome programs of mouse spleen and human blood cDCs exhibit only a very limited overlap with that of monocytes/macrophages (Figure 2). This is consistent with the recent observation that monocytes can give rise to mucosal, but not splenic, cDCs, suggesting that splenic cDCs develop from MDPs without a monocytic intermediate. While mouse pDCs have been argued to arise from both lymphoid and myeloid progenitors, their gene expression overlaps with lymphoid or myeloid cells are limited. Interestingly, a murine progenitor cell line that exhibits both cDC and pDC differentiation potential has been described recently, suggesting that putative pan-DC progenitors might also exist in vivo, which would be consistent with the gene profiling analyses presented here.
Our study identifies transcriptional signatures conserved between mouse and human, common to all LN-DC subsets examined, or specific to pDCs, cDCs, or individual cDC subsets. A genetic equivalence is suggested between mouse CD8α cDCs and human BDCA3 cDCs, and between mouse CD11b cDCs and human BDCA1 cDCs. In contrast to the genes selectively expressed in subsets of myeloid or lymphoid cells in a conserved manner between mouse and human, most of the genes specifically increased in all LN-DC subsets or in individual LN-DC subsets are currently uncharacterized. As a consequence, the functional annotations of the LN-DC transcriptional signatures appear much less informative than those for myeloid cells, lymphocytes or APCs. This highlights how much has already been deciphered regarding the molecular regulation of antigen presentation or lymphocyte biology, as opposed to how little we know about the genetic programs that determine the specific features of LN-DCs. We believe that our study provides a unique database resource for future investigation of the evolutionarily conserved molecular pathways governing specific aspects of the ontogeny and functions of leukocyte subsets, especially DCs.
It should be noted that many genes are found to be expressed at very high levels in specific subsets of either mouse or human while no orthologous gene has been identified in the other species. This could be due to a true absence of orthologous genes between these two vertebrate species, or to a lack of identification of an existing orthology relationship. It is also possible that some of the genes expressed only in mouse DCs or only in human DCs, and not conserved between the two species, might represent functional homologs, similar to what is observed for human KIR and mouse Ly49 NK cell receptors. This may be the case for the human LILRA4 (ILT7) and the mouse SIGLECH molecules, as both of them signal through immunoreceptor tyrosine-based activation motif (ITAM)-bearing adaptors to downmodulate IFN-α/β production by human and mouse pDCs, respectively, upon triggering of TLRs [83,84]. Thus, understanding the role in LN-DCs of genes identified here only in mouse or human might be important. The transcriptional signatures identified for mouse LN-DC subsets in this study have been confirmed by analyses of independent data recently published by others on mouse cDC subsets, B cells and T cells or on cDCs and pDCs. Most of the data for the mouse 430 2.0 compendium were generated in-house, with the exceptions being CD4 T cells and myeloid cells. In humans, we generated the data for non-DC populations, whereas data for DC subsets and CD16 cells were all generated by another group and retrieved from a public database. It is well known that datasets for the same cell type can vary considerably between laboratories. However, many of the genes identified as specific for each mouse LN-DC subset using our own data were confirmed by the analysis of other data independently generated by the groups of M Nussenzweig and R Steinman. These data are given in Additional data file 5.
Our clustering analyses and PCA also showed relatively little dataset-dependent bias, and generally grouped related cell populations together, even if they were from different origins (see, for instance, the PCA clustering of in vitro derived GM-CSF DC samples, which originated from two independent datasets, in Additional data file 6). In addition, we analyzed by real-time PCR the expression profile of 27 genes across mouse leukocyte subsets from biological samples independent of those used in the gene chip analysis. All the results were consistent with the gene chip data (Additional data file 7). We also confirmed specific expression of PACSIN1 in human pDCs at both the mRNA and protein levels (Additional data file 8). Finally, we believe that our approach validates the gene expression profiles identified for leukocyte subsets in the strongest way possible, by demonstrating their evolutionary conservation between mouse and human. Indeed, the gene signatures that we describe here are based on genes found specifically expressed in putatively homologous subsets of mouse and human leukocytes compared to several other types of leukocytes. This approach does not rely solely on the use of independent biological samples of similar origin and on different techniques for measurement of the expression of mRNA. It actually shows that orthologous genes share the same specific expression pattern in putatively homologous immune cell subsets from two different species, under conditions where the markers used to purify the human and mouse cell populations, and the probes used to check the expression of the orthologous genes, differ considerably. Thus, we believe that the analyses presented here are extremely robust even though they were, in part, performed by creating compendia regrouping data generated by different laboratories for different cell types.
In addition to our discovery of transcriptional signatures specific to all LN-DCs or to LN-DC subsets, we demonstrate that, once identified, the transcriptional signatures of multiple cell types can be effectively used to help determine the nature of newly identified cell types of ambiguous phenotype or functions. In our attempt to appropriately place IKDCs and CD16 cells within the leukocyte family, we used the microarray data from the original reports aimed at characterizing these cells and compared them to the data from several other leukocyte populations. The conclusions of this analysis are in sharp contrast to those originally reported [15,31]. We believe that these opposing conclusions arise from the difference in the contextual framework within which our data and that of the previously mentioned studies were analyzed. Thus, the results of our analysis of the transcriptional signature of both IKDCs and CD16 cells emphasize the need to study the transcriptional signatures of individual cell populations in the context of multiple cell types of various phenotypes and functions. Finally, this approach also allowed us to confirm a very recent report that demonstrated that in vitro derived GM-CSF mouse DCs likely correspond to inflammatory DCs and greatly differ from LN-DCs, based on ontogenic and functional studies. Thus, extrapolation to LN-DCs of the results of the cell biology and functional studies performed with in vitro derived GM-CSF DCs should only be made with extreme caution.
This study comparing whole genome expression profiling of human and mouse leukocytes has identified for the first time conserved genetic programs common to all LN-DCs or specific to the plasmacytoid versus conventional subsets. In-depth studies of these genetic signatures should provide novel insights into the developmental program and the specific functions of LN-DC subsets. The study in the mouse of the novel, cDC-specific genes identified here should accelerate the understanding of the biology of these cells in both mouse and human. This should help to more effectively translate fundamental immunological discoveries in the mouse to applied immunology research aimed at improving human health in multiple disease settings.
Duplicates of pDCs (Lin-CD11c+120G8high), CD8α cDCs (Lin-CD11chighCD8α+120G8-/low), CD11b cDCs (Lin-CD11chighCD11b+120G8-/low) and NK cells (NK1.1+TCRβ-) were sorted during two independent experiments from pooled spleens of untreated C57BL/6 mice. Splenic CD19+ B lymphocytes, CD4 T cells and CD8 T cells were sorted in other independent experiments. Purity of sorted cell populations was over 98% as checked by flow cytometry (not shown).
RNA was extracted from between 7.5 × 10^5 and 1.5 × 10^6 cells for each leukocyte subset with the Qiagen (Courtaboeuf, France) RNeasy Micro kit, yielding between 200 and 700 ng of total RNA for each sample. Quality and absence of genomic DNA contamination were assessed with a Bioanalyzer (Agilent, Massy, France). RNA (100 ng) from each sample was used to synthesize probes, using two successive rounds of cRNA amplification with appropriate quality control to ensure full length synthesis according to standard Affymetrix protocols, and hybridized to mouse 430 2.0 chips (Affymetrix, Santa Clara, CA, USA). Raw data were transformed with the MAS5 algorithm, which yields a normalized expression value, and 'absent' and 'present' calls. Target intensity was set to 100 for all chips.
For each compendium, all datasets were normalized with the invariant rank method and only one representative dataset was kept for redundant ProbeSets targeting the same gene. The datasets were further filtered to eliminate genes with similar expression in all samples: only genes that were expressed above 50 (mouse datasets) or 100 (human datasets) in all the replicates of at least one population, and whose coefficient of variation across all samples was above the median coefficient of variation of all ProbeSets, were retained. For the compendium of ex vivo isolated cell subsets, the final dataset consisted of 7,298 ProbeSets for mouse (430 2.0) and 11,507 ProbeSets for human (U133 Plus 2.0), representing individual genes with differential expression between these subsets. For the compendium of LN-DCs, monocytes/macrophages and in vitro derived GM-CSF DCs, the final dataset consisted of 12,857 ProbeSets for mouse (430 2.0) and 6,724 ProbeSets for human (U133 Plus 2.0), representing individual genes with differential expression between these cell types. The datasets for ex vivo isolated cells are accessible as Excel workbooks in Additional data files 1 and 2. The Cluster and Treeview software packages were used to classify cell subsets according to the proximity of their gene expression patterns, as assessed by hierarchical clustering with complete linkage.
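The two-step filter described above can be illustrated with a minimal sketch. It is not the code used in the study: the function names, the toy expression values and the population labels are hypothetical, and the coefficient of variation is assumed here to be the population standard deviation divided by the mean.

```python
from statistics import mean, median, pstdev

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed definition)."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else 0.0

def filter_probesets(expr, groups, min_expr):
    """Keep probe sets that pass both criteria.

    expr:     {probe set name: [expression value per sample]}
    groups:   {population name: [sample indices of its replicates]}
    min_expr: expression threshold (50 for mouse, 100 for human in the text)
    """
    cvs = {ps: coefficient_of_variation(vals) for ps, vals in expr.items()}
    cv_cutoff = median(cvs.values())  # median CV over all probe sets
    kept = {}
    for ps, vals in expr.items():
        # Criterion 1: above threshold in all replicates of at least one population.
        expressed = any(all(vals[i] > min_expr for i in idx) for idx in groups.values())
        # Criterion 2: CV across all samples above the median CV.
        if expressed and cvs[ps] > cv_cutoff:
            kept[ps] = vals
    return kept

# Hypothetical toy data: two populations with two replicates each.
toy_expr = {
    "geneA": [200, 220, 10, 12],    # high in one population, variable
    "geneB": [100, 105, 98, 102],   # expressed but flat
    "geneC": [2, 60, 3, 45],        # variable but never reproducibly expressed
    "geneD": [20, 25, 300, 280],    # high in the other population, variable
    "geneE": [60, 62, 61, 59],      # expressed but flat
    "geneF": [80, 82, 79, 81],      # expressed but flat
}
toy_groups = {"pDC": [0, 1], "NK": [2, 3]}
print(sorted(filter_probesets(toy_expr, toy_groups, min_expr=50)))
```

In this toy case only the probe sets that are both reproducibly expressed in one population and variable across samples survive the filter.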
We implemented a function in the Matlab software to perform PCA. This function computes the eigenvalues and eigenvectors of the dataset using the correlation matrix. The eigenvalues were then ordered from highest to lowest, indicating their relative contribution to the structure of the data. For both mouse and human datasets, the first principal component accounted for most of the information (54% and 68% for mouse and human, respectively) and was associated with a similar coordinate for all samples. This component thus reflected the common gene expression among the samples. Second and third components together represented 24% and 21%, respectively, of the information for mouse and human datasets, and thus accounted for a large part of the variability. The projection of each sample on the planes defined by these components was represented as a dot plot to generate the PCA figures.
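As an illustration of the procedure just described, the following sketch performs a PCA from the correlation matrix between samples and reports the fraction of information carried by each component. It is not the Matlab function used in the study. It is written in plain Python, the eigen-decomposition uses cyclic Jacobi rotations (our choice for a dependency-free example), and all function names and toy values are hypothetical.

```python
import math

def correlation_matrix(samples):
    """Pearson correlation between samples; each sample is a list of gene values.
    Assumes every sample has non-zero variance."""
    def standardize(values):
        m = sum(values) / len(values)
        sd = math.sqrt(sum((x - m) ** 2 for x in values) / len(values))
        return [(x - m) / sd for x in values]
    z = [standardize(s) for s in samples]
    n_genes = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n_genes for zj in z] for zi in z]

def jacobi_eigen(matrix, max_sweeps=50, tol=1e-12):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the off-diagonal entry a[p][q].
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def pca(samples):
    """Return [(fraction of information, coordinate of each sample)], largest first."""
    eigenvalues, vectors = jacobi_eigen(correlation_matrix(samples))
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)
    total = sum(eigenvalues)
    return [(eigenvalues[i] / total, [row[i] for row in vectors]) for i in order]

# Hypothetical toy data: three samples measured over five genes.
toy_samples = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 11], [5, 3, 4, 1, 2]]
for fraction, coords in pca(toy_samples):
    print(round(fraction, 3), [round(x, 3) for x in coords])
```

Each eigenvector holds one coordinate per sample, so plotting the coordinates of two components against each other gives a dot plot of the kind described above.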
Partitional clustering was performed using the FCM algorithm, which links each gene to all clusters via a vector of membership indexes, each between 0 and 1. For both mouse and human datasets, we heuristically set the number of clusters to 30, and the fuzziness parameter m was taken as 1.2 (see  for the determination of m). Ten independent runs of the algorithm were performed, and the one minimizing the inertia criterion was selected. A threshold value of 0.9 was taken to select the probe sets most closely associated with a given cluster. This selection retained 4,062 and 4,751 probe sets from mouse and human datasets, respectively. Probe set clusters were then manually ordered to provide coherent pictures, which were visualized with Treeview.
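To make the role of the membership indexes and of the 0.9 threshold concrete, the sketch below computes the standard fuzzy c-means membership of one profile for a fixed set of cluster centroids. It covers only this membership step, not the full iterative algorithm or the selection among ten runs. The function names, the toy centroids and the choice of Euclidean distance are assumptions made for illustration.

```python
import math

def fcm_memberships(point, centroids, m=1.2):
    """Standard FCM membership: u_i = d_i^(-p) / sum_j d_j^(-p), with p = 2 / (m - 1).
    Euclidean distance is assumed."""
    dists = [math.dist(point, c) for c in centroids]
    if min(dists) == 0.0:
        # The point sits exactly on one or more centroids: share membership among them.
        hits = [d == 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    d_min = min(dists)
    # Scaling by the smallest distance keeps every weight at or below 1 (no overflow).
    weights = [(d_min / d) ** power for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

def assign(point, centroids, m=1.2, threshold=0.9):
    """Index of the cluster whose membership reaches the threshold, else None."""
    u = fcm_memberships(point, centroids, m)
    best = max(range(len(u)), key=u.__getitem__)
    return best if u[best] >= threshold else None

# Hypothetical two-dimensional centroids.
toy_centroids = [[0.0, 0.0], [10.0, 0.0]]
print(fcm_memberships([1.0, 0.0], toy_centroids))  # close to the first centroid
print(assign([5.0, 0.0], toy_centroids))           # equidistant, so not assigned
```

With m = 1.2 the exponent 2/(m - 1) equals 10, so memberships are nearly crisp: a profile is retained at the 0.9 threshold unless it lies almost midway between clusters.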
We identified 2,227 orthologous genes that showed significant variation of expression in both the mouse 430 2.0 and U133 Plus 2.0 human datasets. This dataset is accessible as an Excel workbook in Additional data file 3. In order to compare the expression patterns of these genes between human and mouse, the log signal values for each of these genes were first normalized to a mean equal to zero and a variance equal to 1, independently in the mouse and human datasets, as previously described for comparing the gene expression program of human and mouse tumors [22,27]. The two normalized datasets were then pooled and a hierarchical clustering with complete linkage was performed. A similar analysis was performed for the comparison of human and mouse LN-DCs, monocytes, macrophages and in vitro derived GM-CSF DCs.
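The normalization and pooling step can be sketched as follows. This is an illustrative example rather than the code used in the study: the function names and the log signal values are hypothetical, and the variance is assumed to be the population variance.

```python
from statistics import mean, pstdev

def zscore(values):
    """Rescale one gene's log signals to mean 0 and (population) variance 1."""
    m, s = mean(values), pstdev(values)
    return [(x - m) / s for x in values]

def pool_normalized(mouse, human, orthologs):
    """Normalize each gene independently in each species, then concatenate.

    mouse, human: {gene: [log signal per sample]}
    orthologs:    [(mouse gene, human gene)] pairs
    Returns {(mouse gene, human gene): [mouse z-scores..., human z-scores...]}.
    """
    return {
        (m_gene, h_gene): zscore(mouse[m_gene]) + zscore(human[h_gene])
        for m_gene, h_gene in orthologs
    }

# Hypothetical log signals for three cell types per species (for example pDC, cDC, NK).
toy_mouse = {"Pacsin1": [9.0, 4.0, 4.5], "Blnk": [8.0, 5.0, 3.0]}
toy_human = {"PACSIN1": [12.0, 7.0, 7.5], "BLNK": [11.0, 8.0, 6.0]}
toy_pairs = [("Pacsin1", "PACSIN1"), ("Blnk", "BLNK")]

pooled = pool_normalized(toy_mouse, toy_human, toy_pairs)
for pair, profile in pooled.items():
    print(pair, [round(x, 2) for x in profile])
```

Because each gene is rescaled within its own species, a gene with the same relative pattern in mouse and human yields the same normalized profile in both halves of the pooled table, even when the absolute signal ranges of the two array types differ. The pooled table is then the input to hierarchical clustering.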
In order to classify the IKDCs based on the optimal gene signatures of the different cell subsets examined, with only minimal impact of differences in the experimental protocols used to prepare the cells and to perform the gene chip assays, the clustering of the cell populations was performed as a meta-analysis of our own mouse 430 2.0 dataset together with the published U74Av2 datasets. The Array Comparison support information of the NetAffyx™ analysis center (Affymetrix) was used to identify matched ProbeSets between the two types of microarrays. Only one representative dataset was kept for redundant ProbeSets targeting the same gene. This yielded a set of 2,251 genes whose expression could be compared between the two datasets, using the same normalization method as described above. This dataset is accessible as an Excel workbook in Additional data file 4. As expected, this meta-analysis led to co-clustering of all the samples derived from identical cell types whether their gene expression had been measured by us on 430 2.0 microarrays or by others on U74Av2 microarrays, with the exception of the cDC population from , which segregated with pDCs rather than with the cDC subsets from the other datasets.
Gene lists were analyzed using the DAVID 'functional annotation chart' tool accessible on the NIAID website [52,53]. Different databases were used for these annotations: gene ontology (Amigo), knowledge pathways (KEGG), interactions (BIND), interprotein domains (INTERPRO), and disease (OMIM/OMIA). The annotations shown in Tables 5 and 7 were selected as the most highly significant terms retrieved by performing an over-representation study. To this end, a modified Fisher exact P value called the 'EASE score' was calculated to measure the enrichment in gene-annotation terms between the gene signature specific to the leukocyte subpopulation examined ('List') and the complete set of all the genes selected for the compendium analyzed ('Background'). The significance threshold was set at an EASE score below 0.05 in most instances, or below 0.1 for DC signatures that did not yield many highly significant terms as discussed in Results. Individual significant annotations encompassing many common genes or similar biological processes were regrouped using the 'Functional annotation clustering' tool of the DAVID software. More information on this type of analysis is available on the DAVID website.
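The idea behind the EASE score can be illustrated with a short sketch. It assumes the commonly given description of the score: one gene is removed from the 'List' hits before a one-sided Fisher exact (hypergeometric) test is computed, which makes the result more conservative. This is a simplified illustration and may differ in detail from DAVID's own implementation. The function names and the counts below are hypothetical and are not taken from this study.

```python
from math import comb

def hypergeom_tail(pop_total, pop_hits, list_total, list_hits):
    """One-sided Fisher exact P value: probability of at least `list_hits`
    annotated genes when `list_total` genes are drawn from a background of
    `pop_total` genes, `pop_hits` of which carry the annotation."""
    upper = min(pop_hits, list_total)
    numerator = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, upper + 1)
    )
    return numerator / comb(pop_total, list_total)

def ease_score(pop_total, pop_hits, list_total, list_hits):
    """Fisher exact P value after removing one annotated gene from the list
    (assumed definition of the EASE score)."""
    if list_hits < 1:
        return 1.0
    return hypergeom_tail(pop_total, pop_hits, list_total - 1, list_hits - 1)

# Hypothetical counts: 3 of 300 signature genes carry a term that is
# carried by 40 of 30,000 background genes.
fisher_p = hypergeom_tail(30000, 40, 300, 3)
ease_p = ease_score(30000, 40, 300, 3)
print(fisher_p, ease_p)
print("significant at 0.05:", ease_p < 0.05, "| at 0.1:", ease_p < 0.1)
```

The penalty matters most for terms supported by very few genes, which is one reason why a more permissive threshold such as 0.1 can be needed for small signatures.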
Our datasets for mouse DC subsets, NK cells, CD8 T cells, and B lymphocytes have been deposited in the Gene Expression Omnibus (GEO) database under reference number GSE9810. The references for download of the public data used from the original websites where they were first made available are given in Table 1. In addition, all raw transcriptomic data analyzed here have been regrouped on our website and are available for public download.
APC, antigen-presenting cell; BM, bone marrow; cDC, conventional dendritic cell; CDP, common dendritic progenitor; DC, dendritic cell; FCM, fuzzy c-means; GEO, Gene Expression Omnibus; GM-CSF, granulocyte-macrophage colony stimulating factor; IFN, interferon; IKDC, interferon-producing killer dendritic cell; ITAM, immunoreceptor tyrosine-based activation motif; LN-DC, lymph node-resident DC; M-CSF, macrophage colony-stimulating factor; MDP, macrophage and dendritic cell progenitor; MHC, major histocompatibility; NK, natural killer; PCA, principal component analysis; pDC, plasmacytoid dendritic cell; TLR, toll-like receptor.
SHR, TW, SC, PK, and MD designed the research; SHR, TW, CT, HX, MS, GB, AD and MD performed the research; EV and PP contributed new reagents/analytical tools; SHR, TW, CT, HX, DD, MS, FRS, SC, PK, and MD analyzed data; and SHR, TW, and MD wrote the paper.
During the review process of this paper, two reports were published in Nature Immunology that identified a common progenitor characterized as FLT3+M-CSF+ for mouse LN-DCs (pDCs, CD8α cDCs and CD11b cDCs), devoid of any capability to generate lymphoid cells or monocytes/macrophages, and named common dendritic progenitor (CDP) [87,88]. This observation is thus consistent with our gene profiling analysis of human and mouse leukocytes. The question whether this pathway for LN-DCs is the major one, or just one possibility among others, including differentiation from monocytes, has been raised. Our gene profiling data would suggest that most mouse LN-DCs derive from the recently identified CDP or MDP in vivo, without a monocytic intermediate, consistent with a recent report. It also implies that a similar pathway must exist in humans. The relationship between the CDP and the MDP still remains to be established. Three reports have been published very recently in the Journal of Experimental Medicine that showed that IKDCs are a specific subset of NK cells, based on functional and ontogenic approaches comparing these cells to DCs and NK cells [90-92]. This is consistent with the results of our clustering analysis of IKDCs with other leukocyte subsets. Finally, two recent reports have identified a new transduction pathway in human pDCs involving a B cell receptor-like ITAM-signaling pathway [93,94]. This pathway involves the BLNK transduction molecule, which we have identified here as expressed at very high levels in mouse and human pDCs compared to the other LN-DCs (Table 6) and many other leukocytes. We believe that the conserved transcriptional signatures identified here for mouse and human LN-DC subsets will lead to many more discoveries for the understanding of the specialized functions of these cells.
The following additional data are available. Additional data file 1 is a Microsoft Excel workbook with raw data for the mouse gene chip compendium. Additional data file 2 is a Microsoft Excel workbook with raw data for the human gene chip compendium. Additional data file 3 is a Microsoft Excel workbook with raw data for the human/mouse gene chip compendium. Additional data file 4 is a Microsoft Excel workbook with raw data for the IKDC gene chip compendium. Additional data file 5 is a Microsoft Excel workbook giving the mouse DC subset gene signatures according to our datasets with confirmation from two other independent datasets (one for pDCs and one for cDC subsets). Additional data file 6 is a figure showing the results of PCA for investigation of the relationships between in vitro derived GM-CSF DCs and LN-DCs in mouse and human. Additional data file 7 is a table giving real-time PCR data for the pattern of expression of 27 genes across mouse leukocyte subsets. Additional data file 8 is a figure illustrating PACSIN1 expression in human pDCs versus PBMCs by RT-PCR and western blotting.
The authors are indebted to Bertrand Nadel and Jean-Marc Navarro for help with the real-time PCR experiments and to Markus Plomann for the generous gift of the anti-PACSIN1 antibody. The authors also thank the staff of the animal care facilities and the flow cytometry core facility of the CIML for excellent assistance. This work was supported by an ATIP grant from the CNRS, a grant from the Association pour la Recherche sur le Cancer (ARC) and a grant from the Réseau National des Génopoles (RNG) to MD. SHR was supported by the CNRS, the Fondation pour la Recherche Médicale, and the Philippe Foundation. The CIML is supported by institutional grants from the INSERM, the CNRS, and the Université de la Méditerranée. We thank the IPSOGEN company for their advice on the analysis of the data. The authors declare no conflict of interest.