Hierarchical clustering of the 64 samples was performed using the 4539 selected clones, representing 3341 genes, whose expression varied more than threefold from the overall mean abundance in at least three samples. The dendrogram (Figure 1A) shows four distinct groups of tumors, suggesting that the tumors can be divided into four types on the basis of these 3341 differentially expressed genes. These associations within the unsupervised clustering do not appear to be an artifact of the gene filtering criteria, because varying the data selection criteria preserved the tumor associations. The proportion of tumor cells and of adipose and immune components also appears to have little influence on the clustering pattern (see Clinical and Pathology Parameters on the Web supplement at http://genome-www.stanford.edu/breast_cancer/lobular/). One striking feature is that 11 of the 21 ILCs (52%) fell into group IV, which also contains three normal breast samples. This suggests that this group of ILCs differs from IDCs in gene expression profile and is more similar to normal breast than IDCs are. We refer to this group of ILCs as “typical” ILCs. The other ILCs share gene expression profiles with IDCs and are referred to as “ductal-like” ILCs. The relatedness of typical ILCs to normal samples is unlikely to reflect tumor composition, because five of the eight ILCs with a relatively low percentage of tumor cells (40–60%) clustered elsewhere with IDCs. In addition, genes such as E-cadherin and basal epithelial cell markers (e.g., KRT5, KRT17, and epidermal growth factor receptor [EGFR]) show significantly different expression levels in typical ILCs and normal samples. The two lymph node metastases also clustered together with the primary tumors from which they derived, consistent with our previous findings and suggesting that primary tumors and their lymph node metastases have similar gene expression profiles. Each normal sample (taken from the same breast as a corresponding primary tumor but from a distant location) exhibited an expression profile similar to those of the other normal samples (our unpublished data) and different from that of its corresponding IDC.
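The filtering criterion used to select the 4539 clones can be written as a short per-clone test. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it assumes each clone's data are log2 expression ratios, that "overall mean abundance" means that clone's own mean across samples, and that missing measurements are stored as None. The function names and the example values are hypothetical.

```python
import math


def passes_fold_filter(log2_ratios, fold=3.0, min_samples=3):
    """Return True if a clone deviates more than `fold`-fold from its own mean
    in at least `min_samples` samples.

    Assumption: values are log2 ratios and None marks a missing measurement.
    """
    observed = [v for v in log2_ratios if v is not None]
    if not observed:
        return False
    mean = sum(observed) / len(observed)
    cutoff = math.log2(fold)  # threefold in linear space is ~1.585 in log2 space
    n_extreme = sum(1 for v in observed if abs(v - mean) > cutoff)
    return n_extreme >= min_samples


def filter_clones(data, fold=3.0, min_samples=3):
    """Keep the clones of a {clone_id: [log2 ratios]} mapping that pass the filter."""
    return {cid: row for cid, row in data.items()
            if passes_fold_filter(row, fold, min_samples)}


# Hypothetical example: clone_a varies strongly, clone_b barely moves.
example = {
    "clone_a": [0.0, 2.5, -2.0, 2.2, 0.1],
    "clone_b": [0.1, -0.2, 0.3, 0.0, -0.1],
}
print(sorted(filter_clones(example)))  # ['clone_a']
```

Because the deviation is measured on a log scale, the filter is symmetric: a threefold increase and a threefold decrease relative to the mean count equally.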
Figure 1. Unsupervised hierarchical clustering analysis of 64 breast samples. ULL represents the Norwegian samples and BC represents the Stanford samples. (A) Dendrogram representing similarities in the expression patterns between experimental samples.
Group I tumors have high relative expression of ER and its regulated genes and low relative expression of basal epithelial cell markers (including basal keratins and EGFR) and of adipose and stromal tissue markers. Interestingly, the ER-overexpressing group I tumors differentially express genes involved in proliferation and cell cycle regulation. Group II IDCs exhibit the lowest relative expression of ER and its regulated genes and high relative expression of basal epithelial cell markers, EGFR, and proliferation and cell cycle-regulated genes. Stromal and adipose tissue markers in group II are expressed mainly in the ILC samples. Groups III and IV are similar in that both show relatively low proliferation and cell cycle activity, but they differ in other signatures. Specifically, group III has relatively high expression of ER and its regulated genes and of stromal tissue markers, and variable expression of basal epithelial cell markers (although EGFR expression is relatively low) and adipose tissue markers. Group IV tumors, which consist of the typical ILCs, have mixed expression of ER and its regulated genes and of stromal tissue markers, variable expression of basal epithelial markers (with relatively low EGFR expression), but very high relative expression of adipose tissue markers. Two markers, E-cadherin and ERBB2, are almost absent from group IV tumors but present in several tumors in the other three groups. These results suggest that typical ILCs are molecularly different from IDCs. Notably, group III consists mainly of patients younger than 55 years, most of whom had lymph node metastases. More than half of the patients in group IV (typical ILCs) also had lymph node metastases, but they were at least 55 years old at diagnosis.
To identify genes whose expression differs significantly between ILCs and IDCs, we performed Significance Analysis of Microarrays (SAM; Tusher et al., 2001). At the lowest median number of falsely significant genes (0.6), 474 clones representing 378 unique genes were selected. Of the 378 genes, 150 have known biological functions: 75 show high expression in ILCs and low expression in IDCs, and 75 show the opposite pattern. Most of the 150 genes can be assigned to five biological processes according to Gene Ontology annotations (Ashburner et al., 2000): cell adhesion/motility, lipid/fatty acid metabolism, immune and defense response, electron transport, and nucleosome assembly. Many genes involved in signal transduction, regulation of transcription, and small molecule transport and metabolism were also among the genes identified by SAM (see Web supplement for the full list).
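SAM scores each gene with a t-like relative difference and estimates false positives by permuting class labels. The sketch below illustrates those two ingredients under simplifying assumptions that should be read as such: s0 is fixed rather than estimated from the data, and a symmetric cutoff on |d| stands in for SAM's delta rule based on expected order statistics. Function names and the toy data are hypothetical.

```python
import math
import random
import statistics


def sam_d(group1, group2, s0=0.1):
    """SAM relative difference d = (mean1 - mean2) / (s + s0) for one gene.

    s is the pooled standard error of the difference in means. s0 is fixed here
    for simplicity; SAM itself estimates s0 from the data.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.fmean(group1), statistics.fmean(group2)
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    s = math.sqrt(a * ss)
    return (m1 - m2) / (s + s0)


def d_scores(matrix, labels, s0=0.1):
    """d for every gene; `matrix` rows are genes, `labels` mark samples as 1 or 0."""
    scores = []
    for row in matrix:
        g1 = [v for v, lab in zip(row, labels) if lab == 1]
        g0 = [v for v, lab in zip(row, labels) if lab == 0]
        scores.append(sam_d(g1, g0, s0))
    return scores


def median_falsely_significant(matrix, labels, cutoff, n_perm=100, s0=0.1, seed=0):
    """Median count of genes with |d| > cutoff across random label permutations.

    Simplification: a symmetric cutoff on |d| replaces SAM's delta rule.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_perm):
        perm = list(labels)
        rng.shuffle(perm)  # keeps class sizes, breaks the class-expression link
        counts.append(sum(abs(d) > cutoff for d in d_scores(matrix, perm, s0)))
    return statistics.median(counts)


# Hypothetical data: gene 1 separates the classes, gene 2 does not.
matrix = [[3.0, 3.1, 2.9, 0.0, 0.1, -0.1],
          [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]]
labels = [1, 1, 1, 0, 0, 0]
print([round(d, 1) for d in d_scores(matrix, labels)])  # [16.5, 0.6]
print(median_falsely_significant(matrix, labels, cutoff=5.0, n_perm=20))
```

The median permutation count plays the role of the "median number of falsely significant genes" used above to pick the reported gene list.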
Genes whose expression significantly differs between ILCs and IDCs identified by SAM
To determine which genes best discriminate ILCs from IDCs, we performed Prediction Analysis of Microarrays (PAM). This nearest shrunken centroid method is used in cancer class prediction to find the genes that best characterize cancer types; here, we used it to identify a minimal subset of genes that succinctly characterizes ILCs and IDCs. At a threshold of 2.9, a set of 78 clones representing 45 named genes was selected, 44 of which were also on the list of genes identified by SAM. Based on the expression pattern of these genes, ILCs and IDCs were separated with an overall error rate of 0.15: 18 of 21 ILCs (86%) and 32 of 39 IDCs (82%) were correctly classified. The three misclassified ILCs, BC-L-014, ULL-L-014, and ULL-L-028, were all ductal-like ILCs. When the 78 clones were used in a hierarchical clustering of all 59 tumor samples, the same three ductal-like ILCs were placed on a main ductal branch containing most of the IDCs, separate from the lobular branch that contained the other 18 ILCs. All typical ILCs clustered together in a core on the lobular branch, with ductal-like ILCs positioned at the edges. Two group I IDCs (ULL-D-056 and ULL-D-216) and three group II IDCs (BC-D-007, BC-D-032, and BC-D-035) also fell on the lobular branch, most of them at one edge near the ductal-like ILCs. Each of the IDCs on the lobular branch is ER and/or PR positive (see Clinical and Pathology Parameters on the Web supplement).
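In the nearest shrunken centroid method, each class centroid is pulled toward the overall centroid by soft-thresholding standardized differences, and genes whose differences shrink to zero for every class drop out of the classifier. The sketch below follows the standard formulation in plain Python; the function names, the toy data, and the choice of s0 as the median gene-wise standard deviation are illustrative assumptions, and it omits the cross-validation PAM uses to choose the threshold (2.9 in the analysis above).

```python
import math
import statistics


def fit_shrunken_centroids(X, y, delta, s0=None):
    """Nearest shrunken centroids.

    X: list of samples, each a list of gene values; y: class label for each sample.
    delta: shrinkage threshold applied to the standardized centroid differences.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    members = {k: [row for row, lab in zip(X, y) if lab == k] for k in classes}
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    cent = {k: [sum(r[i] for r in members[k]) / len(members[k]) for i in range(p)]
            for k in classes}
    # Pooled within-class standard deviation of each gene.
    s = [math.sqrt(sum((r[i] - cent[k][i]) ** 2 for k in classes for r in members[k])
                   / (n - len(classes))) for i in range(p)]
    if s0 is None:
        s0 = statistics.median(s)  # assumed default: median of the gene-wise s_i
    shrunk, selected = {}, set()
    for k in classes:
        m_k = math.sqrt(1 / len(members[k]) - 1 / n)
        shrunk[k] = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (cent[k][i] - overall[i]) / scale
            d_shr = math.copysign(max(abs(d) - delta, 0.0), d)  # soft thresholding
            if d_shr != 0.0:
                selected.add(i)  # gene survives shrinkage for at least one class
            shrunk[k].append(overall[i] + scale * d_shr)
    priors = {k: len(members[k]) / n for k in classes}
    return {"centroids": shrunk, "scale": [si + s0 for si in s],
            "priors": priors, "selected": selected}


def classify(model, x):
    """Assign x to the class with the smallest discriminant score.

    Genes shrunk to zero have identical centroids in every class, so they add
    the same amount to every score and do not affect the decision.
    """
    def score(k):
        dist = sum((xi - ci) ** 2 / si ** 2
                   for xi, ci, si in zip(x, model["centroids"][k], model["scale"]))
        return dist - 2 * math.log(model["priors"][k])
    return min(model["centroids"], key=score)


# Hypothetical toy data: only gene 0 differs between the two classes.
X = [[5.0, 0.0, 1.0], [5.2, 0.1, 0.9], [4.8, -0.1, 1.1],
     [0.0, 0.0, 1.0], [0.2, 0.1, 0.9], [-0.2, -0.1, 1.1]]
y = ["ILC", "ILC", "ILC", "IDC", "IDC", "IDC"]
model = fit_shrunken_centroids(X, y, delta=1.0)
print(model["selected"])                  # {0}
print(classify(model, [4.9, 0.0, 1.0]))   # ILC
```

Raising delta removes more genes; PAM picks the value that balances gene-list size against cross-validated error.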
Figure 2. Identification of gene expression patterns distinguishing IDCs and ILCs by PAM. (A) Relationships of value of threshold in cross-validation, number of genes identified, and overall misclassification rate or misclassification rate for each tumor type.
The most important discriminator identified by PAM is cadherin 1 (CDH1, E-cadherin). Four different clones representing CDH1 were among the top discriminators. Their average expression ratio in ILCs was 4.2-fold lower than in IDCs, consistent with previous immunohistochemical studies of CDH1 in ILCs and IDCs. Notably, the IDC BC-D-048 has low CDH1 expression similar to that of ILCs; reduced E-cadherin expression has been associated with invasiveness and unfavorable prognosis (Siitonen et al., 1996; Hunt et al., 1997; Nagae et al., 2002). Seven other genes functioning in cell adhesion (SORBS1, VWF, AOC3, MMRN, ITGA7, CD36, and ANXA1) were also selected as discriminators, suggesting that ILCs and IDCs differ in cell adhesion properties. Several other highly ranked discriminators, including FABP4, LPL, PLIN, ANXA1, and CD36, are involved in lipid/fatty acid transport and metabolism, indicating a potential difference in lipid/fatty acid metabolism between ILC and IDC tumor tissue. An interesting electron transport gene overexpressed in ILCs is glutathione peroxidase 3, which catalyzes the reduction of hydrogen peroxide, organic hydroperoxides, and lipid peroxides, protecting cells against oxidative damage. Together, these results demonstrate that most ILCs can be distinguished from IDCs by the expression patterns of a small set of genes involved in several biological processes.
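A fold difference such as the 4.2-fold figure for CDH1 is usually computed from log-transformed ratios. A minimal sketch, assuming the averaging was done over log2 ratios (the text does not say how the average was taken) and using hypothetical values rather than the study's data; the function name is also hypothetical.

```python
import statistics


def fold_lower(log2_group_a, log2_group_b):
    """Fold by which mean expression in group A is lower than in group B.

    Averages log2 ratios within each group (equivalent to geometric means of the
    linear ratios) and converts the difference back to linear scale.
    """
    diff = statistics.fmean(log2_group_b) - statistics.fmean(log2_group_a)
    return 2 ** diff


# Hypothetical log2 ratios for one clone: ILCs sit two log2 units below IDCs.
ilc = [-1.0, -1.5, -0.5]
idc = [1.0, 1.5, 0.5]
print(fold_lower(ilc, idc))  # 4.0
```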
When typical ILCs were compared with IDCs by PAM analysis (see Web supplement), 26 clones representing 14 named genes were identified that best distinguished the two groups, with an overall misclassification error rate of 0.102 (0% for the typical ILCs and 13% for the IDCs). Twenty-one of the 26 clones were among the 78 clones previously identified by PAM as distinguishing ILCs and IDCs. The five remaining clones included two named genes, PDE2A (phosphodiesterase 2A, cGMP-stimulated) and EBF (early B-cell factor). These two genes were also identified in a PAM analysis distinguishing typical ILCs from ductal-like ILCs, discussed below.
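The overall and per-class error rates quoted above are related by simple counting. Assuming the 13% IDC error rate corresponds to 5 of the 38 IDCs (an inference from the percentages, since 5/38 ≈ 13% and 5/49 ≈ 0.102), a short sketch with hypothetical labels reproduces both numbers. PAM reports cross-validated rates; this hypothetical helper only tallies whatever predictions it is given.

```python
from collections import Counter


def misclassification_rates(true_labels, predicted_labels):
    """Overall and per-class misclassification rates for a set of predictions."""
    totals = Counter(true_labels)
    errors = Counter(t for t, p in zip(true_labels, predicted_labels) if t != p)
    per_class = {c: errors[c] / totals[c] for c in totals}
    overall = sum(errors.values()) / len(true_labels)
    return overall, per_class


# Hypothetical predictions consistent with the rates reported above:
# all 11 typical ILCs correct, 5 of the 38 IDCs misclassified.
truth = ["typical ILC"] * 11 + ["IDC"] * 38
pred = ["typical ILC"] * 11 + ["typical ILC"] * 5 + ["IDC"] * 33
overall, per_class = misclassification_rates(truth, pred)
print(round(overall, 3), round(per_class["IDC"], 2))  # 0.102 0.13
```

The overall rate pools errors across classes, so it is weighted toward the larger IDC group rather than being the average of the two per-class rates.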
Figure 4. Identification of gene expression patterns that distinguish typical and ductal-like ILCs by using PAM. (A) Relationships of value of threshold in cross-validation, number of genes identified, and overall misclassification rate or misclassification rate.
To further assess the differences between the gene expression profiles of ILCs and IDCs, and to relate them to the previously described classification into five subtypes (luminal A, luminal B, ERBB2, basal, and normal-like), we computed Pearson's correlations against the five sets of centroids recently defined by Sorlie et al. These centroids consist of the average expression of the 500 intrinsic genes in each of the five subtypes. Pearson's correlation coefficients were calculated between the expression ratios of 455 intrinsic genes in our 59 tumor samples and each of the five centroids. Fifty-six of the 59 carcinomas were assigned to the subtype with the highest r, confirming that the five subtypes are also present in this set of tumors. The three tumors that could not be classified using an r threshold of 0.14 (determined by multiple permutations of the gene expression values) were all typical ILCs (ULL-L-024, ULL-L-058, and ULL-L-105, colored gray in Figure 3A).
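Subtype assignment by correlation to centroids can be sketched as follows. The helper names, the toy four-gene centroids, and especially the permutation rule (a 95th-percentile cutoff on the best r after shuffling a sample's gene values) are assumptions for illustration; the text states only that the 0.14 threshold was determined by multiple permutations. In practice the sample and centroid vectors would first be aligned over the shared intrinsic genes.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def assign_subtype(sample, centroids, threshold=0.14):
    """Return (best subtype or None, best r); None if the best r does not exceed threshold."""
    scores = {name: pearson(sample, c) for name, c in centroids.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores[best]


def permutation_threshold(sample, centroids, n_perm=1000, quantile=0.95, seed=0):
    """Hypothetical null cutoff: a quantile of the best r after shuffling gene values."""
    rng = random.Random(seed)
    values = list(sample)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(values)
        maxima.append(max(pearson(values, c) for c in centroids.values()))
    maxima.sort()
    return maxima[int(quantile * (n_perm - 1))]


# Toy centroids over four intrinsic genes (hypothetical values).
centroids = {"basal": [1.0, 2.0, 3.0, 4.0], "luminal A": [4.0, 3.0, 2.0, 1.0]}
print(assign_subtype([1.1, 2.0, 2.9, 4.2], centroids)[0])    # basal
print(assign_subtype([1.0, -1.0, -1.0, 1.0], centroids)[0])  # None
```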
Figure 3. Comparison of gene expression patterns of ILCs and IDCs by using intrinsic genes. (A) The highest Pearson's correlation coefficients between each of the 59 primary tumors and five sets of centroids derived from 122 breast samples published previously.
The correlation coefficients between our 59 samples and the centroids of the five subtypes provide additional evidence that typical ILCs differ from ductal-like ILCs and IDCs in their gene expression profiles. Seven of the eight typical ILCs with correlation coefficients >0.14 were assigned to the normal-like subtype, consistent with the hierarchical clustering results. Only one typical ILC was assigned to another subtype: BC-L-090, assigned to the basal subtype with an r of 0.25 (compared with an r of 0.7 for the ductal-like ILC BC-L-014, also assigned to the basal subtype). In contrast, only one of the 10 ductal-like ILCs was assigned to the normal-like subtype (ULL-L-168, with an r of 0.3). Five of the 10 ductal-like ILCs correlated highly (r > 0.3) with the centroid of their assigned subtype. Notably, tumors of the basal subtype showed the highest correlations with their centroid of any subtype, suggesting a highly consistent gene expression pattern among basal subtype tumors.
When the variation in expression of 481 intrinsic genes was used to order the 59 samples by hierarchical clustering, two features of the dendrogram were evident. First, samples tended to cluster according to their correlation with the subtype centroids; for example, seven of the 10 basal subtype tumors clustered together, consistent with the high r among basal subtype IDCs noted above. Second, six of the 11 typical ILCs clustered together on the normal-like subtype branch, whereas only one of the 10 ductal-like ILCs clustered with this group, confirming that this group of ILCs has characteristic gene expression patterns different from those of IDCs and ductal-like ILCs. When only the 38 IDCs were ordered using the intrinsic genes, the dendrogram showed an even clearer separation of the five subtypes (see Web supplement). This is expected, because the centroids were derived essentially from IDCs and therefore have high classification power for IDCs.
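Hierarchical clustering of samples, used throughout this section, is commonly done with average linkage on a correlation-based distance. The following is a generic, unoptimized sketch of that technique (distance 1 - r, UPGMA merging) in plain Python with hypothetical toy profiles; the clustering software used in the study may differ in details such as centering, missing-value handling, and how merged nodes are represented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(samples):
    """Average-linkage agglomerative clustering with distance 1 - Pearson r.

    samples: {name: expression vector}. Returns merges as (cluster_a, cluster_b, distance).
    """
    names = list(samples)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[(a, b)] = dist[(b, a)] = 1.0 - pearson(samples[a], samples[b])

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters (UPGMA).
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical profiles: two "lobular-like" and two "ductal-like" samples.
toy = {"ILC1": [1.0, 2.0, 3.0, 4.0], "ILC2": [1.1, 2.1, 2.9, 4.0],
       "IDC1": [4.0, 3.0, 2.0, 1.0], "IDC2": [4.2, 2.9, 2.1, 0.9]}
for a, b, d in average_linkage(toy):
    print(a, b, round(d, 3))
```

The list of merges encodes the dendrogram: early merges at small distance correspond to tightly correlated samples, and the last merge joins the two main branches.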
The expression patterns of the intrinsic genes characterizing the five subtypes largely agree with previous reports. For example, the basal epithelial cell markers, including keratins 5 and 17, were relatively highly expressed in the basal subtype, whereas ER and most of the other ER-coexpressed genes showed little or no expression in this subtype. Tumor marker genes such as ERBB2 and MUC1 also showed relatively low expression in the basal subtype. Interestingly, a cluster of genes with diverse functions is highly expressed in the basal and ERBB2 subtypes and seems inversely related to ER expression. Another cluster of genes shows relatively low expression in the basal and luminal B subtypes, with relative overexpression in the luminal A and normal-like subtypes.
To identify a minimal set of genes that best discriminates typical ILCs from ductal-like ILCs, PAM was performed on 23,914 clones representing 15,281 genes whose expression was measurable in at least 80% of the 21 ILCs. Seventy-six clones representing 44 genes with known functions were selected at an overall error rate of 9%. According to Gene Ontology annotations, these genes function in a number of biological processes (for details, see Web supplement). Many of them are involved in regulation of cell growth (CDKN1C, G0S2, PDGFA, KIT, and F2 relatively overexpressed and MAP3K8 relatively underexpressed in typical ILCs) and immune response (AOC3, IGJ, F2, F3, and IGLL1 relatively overexpressed and DEFB1 and HLA-C relatively underexpressed in typical ILCs). When the 76 clones were used in hierarchical clustering of the 21 ILCs, typical and ductal-like ILCs were separated into two groups with 100% accuracy. PDE2A (phosphodiesterase 2A) and EBF (early B-cell factor), the two genes identified in the PAM comparison of typical ILCs with IDCs (see Web supplement) but absent from the original SAM list of clones distinguishing ILCs and IDCs, are also relatively overexpressed in typical ILCs. Together, these results strongly suggest the existence of two groups of ILCs with differing gene expression profiles.
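The 80% measurability requirement applied before this PAM run translates into a minimum count of non-missing values per clone. A minimal sketch, assuming missing spots are stored as None; the function name is hypothetical.

```python
import math


def measurable(values, min_fraction=0.8):
    """True if at least min_fraction of the samples have a non-missing value (None = missing)."""
    required = math.ceil(min_fraction * len(values))  # round up: partial samples do not count
    return sum(v is not None for v in values) >= required


# With 21 ILCs, an 80% requirement means a clone needs data in at least 17 samples.
print(math.ceil(0.8 * 21))                    # 17
print(measurable([0.5] * 17 + [None] * 4))    # True
print(measurable([0.5] * 16 + [None] * 5))    # False
```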