We have used DNA microarrays to determine whether the heterogeneity in scleroderma can be captured quantitatively and objectively by gene expression profiling. We used an experimental design that has previously been applied with great success to identify molecular subsets in tumors, and we now show that it can also identify subsets in the gene expression patterns of scleroderma, a disease of completely different etiology that is likewise characterized by heterogeneity.
Our results show that the diversity in the gene expression patterns of SSc is much greater than was demonstrated in two prior studies of dSSc skin. We find evidence for four major groups, each characterized by a distinct gene expression profile. The diffuse-proliferation group is composed solely of patients with a diagnosis of dSSc; the inflammatory group includes patients with dSSc, lSSc and morphea; the limited group consists solely of patients with lSSc; and the normal-like group includes healthy controls along with dSSc and lSSc patients. The diffuse-proliferation group contains two potential subgroups; however, our sample size is not large enough to draw definitive conclusions about their stability.
It is unlikely that the gene expression groups result from technical artifacts or from heterogeneity at the site of biopsy. First, we created a standardized sample-processing pipeline, which was extensively tested on skin from surgical discards before this study began and which included strict protocols, used throughout, intended to eliminate variability in sample handling and preparation. Second, all gene expression groups were analyzed for correlation with date of hybridization, date of sample collection and other technical variables that might have affected the groupings. Third, heterogeneity at the site of biopsy is unlikely to account for the findings, because the signatures used to classify the samples were selected by virtue of their being expressed in both the forearm and back samples of each patient. Fourth, the inflammatory group is unlikely to reflect active infection, because individuals with active infections were excluded from the study. Finally, the gene expression signatures we found are supported by both the IHC findings and the quantitative real-time PCR findings.
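The check for technical confounding described above can be illustrated with a short sketch. This section does not say which statistical test was used, so the example swaps in one plausible choice: a permutation test on the Pearson chi-square statistic for independence between group assignment and a technical variable such as hybridization date. The function names, label values and number of permutations are hypothetical; a small p-value would flag a grouping that tracks the technical variable rather than the biology.

```python
import random
from collections import Counter


def chi_square_stat(groups, batches):
    """Pearson chi-square statistic for the groups-by-batches contingency table."""
    if len(groups) != len(batches):
        raise ValueError("groups and batches must have one label per sample")
    n = len(groups)
    group_counts = Counter(groups)
    batch_counts = Counter(batches)
    observed = Counter(zip(groups, batches))  # missing cells count as 0
    stat = 0.0
    for g, n_g in group_counts.items():
        for b, n_b in batch_counts.items():
            expected = n_g * n_b / n  # expected cell count under independence
            stat += (observed[(g, b)] - expected) ** 2 / expected
    return stat


def permutation_p_value(groups, batches, n_perm=10000, seed=0):
    """Hypothetical helper: fraction of batch-label shuffles whose statistic
    is at least as large as the observed one."""
    rng = random.Random(seed)
    observed_stat = chi_square_stat(groups, batches)
    shuffled = list(batches)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between group and batch
        if chi_square_stat(groups, shuffled) >= observed_stat - 1e-12:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy labels invented for illustration only (not study data).
    groups = ["inflammatory"] * 4 + ["normal-like"] * 4
    hyb_dates = ["day1", "day2"] * 4
    print(permutation_p_value(groups, hyb_dates, n_perm=2000))
```

A permutation test is used here instead of the asymptotic chi-square distribution because, with the sample sizes typical of such studies, many group-by-batch cells are small, which makes the asymptotic p-value unreliable.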
We were able to associate our gene expression signatures with changes in specific cell markers. We confirmed infiltration of T cells in the dermis of the ‘inflammatory’ group and an increase in the number of proliferating cells in the epidermis of the ‘diffuse-proliferation’ group. The increase in proliferating epidermal cells could result from paracrine influences on the resident keratinocytes, possibly activated by the profibrotic cytokine TGFβ. We did not find significant numbers of CD20-positive B cells.
An open question is how these gene expression changes correlate with more specific histological changes in the skin. Two studies of gene expression, in liver and in the brain, have correlated large-scale morphological changes with changes in gene expression. In each case, it was possible to create a detailed map linking gene expression to features seen in detailed imaging analysis, providing additional insight into tumorigenesis. A comprehensive gene expression study in SSc that combines detailed histological or morphological analysis of fat changes, vascular changes and dermal markers would provide additional insight into how the gene expression changes correlate with morphological changes in SSc skin. Unfortunately, these analyses are not possible with our current data set.
The detection of subsets in the gene expression of SSc raises questions about their origin. Do these subsets represent distinct groups with stable patterns of gene expression, or do they represent different time-dependent phases of the disease? We found a clear relationship between disease severity and gene expression, but only a weak association between disease duration and gene expression. However, restricting the analysis of disease duration to the dSSc patients raises the possibility that the groups we have labeled inflammatory and normal-like include patients in the early stages of disease, whereas the diffuse-proliferation group includes patients with later-stage disease. It is distinctly possible that patients with the inflammatory gene expression signature will eventually progress to a signature more characteristic of the diffuse-proliferation group, a hypothesis that can be addressed directly only in a longitudinal study of a well-defined patient cohort.
The multiple groups observed in our gene expression data may correspond to patients who will have distinct clinical outcomes. This is supported by recent work analyzing the relationship between change in skin score and outcome in a large single-center cohort of 225 patients. Using a latent linear trajectory model, Denton and coworkers classified 58% of their patients into one of three subgroups with different skin score trajectories, and the subgroups differed in their progression to clinical endpoints. Survival was lowest in the group that had the highest baseline skin score and showed little improvement during follow-up. A second group had a high initial MRSS that improved during follow-up, and a third group had a low initial MRSS followed by improvement. Another study analyzed SSc patients with anti-topoisomerase I (anti-topo I) antibodies and found that they could be divided into five subgroups based on rates of skin thickness progression: three groups of dSSc patients and two groups of lSSc patients.
These findings allow us to propose two models that could account for the gene expression subsets we have found in scleroderma. In the first model, there are multiple distinct groups of scleroderma patients, each with a distinct gene expression profile. The aberrant gene expression patterns may be established early in the disease and remain stable during disease progression. In this case, serial biopsies from the same patient would always fall into the same group, and it would likely be possible to identify the clinical endpoints and complications toward which each group progresses. The implication is that it may be possible to predict a patient's outcome from his or her gene expression profile. The reports of three groups of dSSc patients with different outcome trajectories or different rates of skin thickness progression support this model.
In the second model, the different gene expression subgroups represent different disease stages. This model is supported in part by a comparison of disease duration, measured from the onset of the first non-Raynaud's symptom, between the diffuse-proliferation group and the dSSc patients classified as inflammatory or normal-like. There is a clear trend for patients in the earliest stages of disease to map to the inflammatory group and for those with the longest-standing disease to map to the diffuse-proliferation group.
Gene expression profiles in scleroderma hold the promise of identifying markers of disease activity that could serve as surrogate markers in clinical trials; for example, analysis of skin biopsies before and after treatment may be useful in testing the efficacy of novel therapeutics. To this end, we have identified 177 genes that are strongly correlated with the severity of skin disease. These genes may point to a novel pathway involved in skin fibrosis that includes TNFRSF12A (Tweak Receptor (TweakR); Fn14), a TNF receptor family member expressed on both fibroblasts and endothelial cells. TNFRSF12A is induced by FGF1 and other mitogens, including the proinflammatory cytokine TGFβ (J.L.S. and M.L.W., unpublished). In fibroblasts, increased TNFRSF12A expression results in decreased adhesion to the ECM proteins fibronectin and vitronectin. TNFRSF12A has also been shown to play a role in angiogenesis: in vitro cross-linking of TNFRSF12A on endothelial cells stimulates their proliferation, whereas inhibiting it prevented endothelial cell migration in vitro and angiogenesis in vivo. Activation of TNFRSF12A in human dermal fibroblasts increases production of MMP1, the proinflammatory prostaglandin E2, IL6, IL8, RANTES and IL10. The cytoplasmic domain of TNFRSF12A binds TRAF1, TRAF2 and TRAF3, and the expression of a factor downstream of the TRAFs, TRIP (TRAF Interacting Protein), is highly correlated with MRSS. With further refinement, these genes could serve as surrogate markers for disease severity in scleroderma.
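One simple way to rank genes by their association with skin severity is a per-gene Pearson correlation between expression and MRSS (modified Rodnan skin score). The sketch below assumes that approach; this section does not state how the 177 genes were selected, and the cutoff `min_abs_r=0.5`, the function names and the toy gene labels are illustrative assumptions rather than the study's parameters.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def genes_correlated_with_score(expression, scores, min_abs_r=0.5):
    """Hypothetical helper: return (gene, r) pairs with |r| >= min_abs_r,
    strongest first.

    expression: dict mapping gene name -> expression values, one per sample,
    in the same sample order as `scores` (e.g. MRSS per biopsy).
    min_abs_r is an illustrative cutoff, not the study's threshold.
    """
    hits = []
    for gene, values in expression.items():
        r = pearson(values, scores)
        if abs(r) >= min_abs_r:
            hits.append((gene, r))
    # sorted() stays stable with reverse=True, so ties keep input order
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: four biopsies with MRSS values and three genes.
    mrss = [4, 12, 25, 38]
    expression = {
        "GENE_A": [1.0, 1.4, 2.1, 2.9],
        "GENE_B": [3.0, 2.8, 1.9, 1.2],
        "GENE_C": [2.0, 2.2, 1.9, 2.1],
    }
    for gene, r in genes_correlated_with_score(expression, mrss):
        print(gene, round(r, 2))
```

Because thousands of genes are tested at once, a fixed correlation cutoff like this would in practice be paired with some correction for multiple testing.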