Curr Genomics. 2006 November; 7(8): 523–532.
PMCID: PMC1828915

Molecular Classification of Breast Carcinoma In Situ


The pleomorphic variant of invasive lobular carcinoma (PILC) is an aggressive variant of invasive lobular carcinoma (ILC). Its in situ counterpart, pleomorphic lobular carcinoma in situ (PLCIS), is a recently described entity. Morphologically, it has the typical architectural pattern of LCIS, but the neoplastic cells resemble intermediate grade DCIS. Molecular signatures that distinguish PLCIS from DCIS and LCIS would provide additional tools to aid in the histopathologic classification of PLCIS as a lesion distinct from LCIS and DCIS. CIS lesions, obtained from a study cohort of 38 breast cancer patients, were divided into 18 DCIS, 14 PLCIS and 6 LCIS. DNA from microdissected archival tissue was interrogated for loss or gain of 122 breast-cancer-specific genes using the Multiplex Ligation-dependent Probe Amplification Assay (MLPA). Classification Regression Tree (CART) analysis was employed to develop a gene-based molecular classification to distinguish PLCIS from DCIS and LCIS. Molecular classification via CART, based on gene copy number, agreed with histopathology in 34/38 CIS cases. Loss of CASP1 was predictive of LCIS (n=4) with one misclassified PLCIS. Gain of RELA predicted only the LCIS classification (n=2 cases). STK15 and TNFRSF1B were predictive only for DCIS with no misclassifications. Gain of EHF and TNFRSF1B and loss of NCOA3 were predictive of PLCIS, but not without misclassification. Molecular reclassification by CART occurred in 4 CIS cases: 1 PLCIS was reclassified as LCIS, 1 LCIS as PLCIS, and 2 DCIS cases as PLCIS. This study provides additional rationale for molecular modeling strategies in the evaluation of CIS lesions. This diagnostic aid may serve to minimize misclassification between PLCIS and DCIS, and between PLCIS and LCIS, thereby increasing accuracy in the differential diagnosis of CIS lesions.

Key Words: Carcinoma in situ (CIS), Ductal carcinoma in situ (DCIS), Pleomorphic lobular carcinoma in situ (PLCIS), Lobular carcinoma in situ (LCIS), Classification Regression Tree (CART), molecular classification, Multiplex Ligation-dependent Probe Amplification Assay (MLPA), differential diagnosis


Most breast carcinomas in situ are easily categorized as ductal (DCIS) or lobular (LCIS) (Fig. 1). However, some CIS lesions have indeterminate histological features (Fig. 2) [1, 2]. The pleomorphic variant of invasive lobular carcinoma (PILC) is known to be an aggressive variant of invasive lobular carcinoma (ILC) [3]. Its in situ counterpart, pleomorphic lobular carcinoma in situ (PLCIS), described by Frost et al. [4] in 1996, has not been fully defined histologically and biologically (Fig. 3). PLCIS, like PILC, is expected to be more aggressive than LCIS (Fig. 4) [5]. Moreover, although classic LCIS is considered a risk marker for cancer when compared to DCIS, the clinical and biological significance of PLCIS is currently unknown [4].

Fig. (1)
Most breast carcinomas in situ are easily categorized as ductal (DCIS) or lobular (LCIS)
Fig. (2)
Some carcinoma in situ lesions have indeterminate (ID) histological features.
Fig. (3)
Morphologically PLCIS has a typical architectural pattern of LCIS but the neoplastic cells resemble intermediate grade DCIS (DCIS IG).
Fig. (4)
PLCIS is expected to be more aggressive than LCIS.

The cellular morphology in PLCIS is similar to that of intermediate grade DCIS (Fig. 3). In the past, because of the histological similarity and associated necrosis, most PLCIS lesions have been diagnosed as DCIS. Treatment strategies differ among the types of CIS. If a diagnosis of LCIS is made, the patient is followed by observation [2], whereas a diagnosis of DCIS usually leads to definitive treatment, depending on the extent and grade of DCIS (mastectomy, lumpectomy and radiation therapy, or observation alone). Because of the expected aggressive behavior of PLCIS, it is believed that treatment similar to that for DCIS may be warranted.

Current management of classic LCIS versus PLCIS and DCIS is not uniform, and additional methods to aid in the differential diagnosis are likely to have clinical consequences. Expression of E-cadherin (EC) provides some degree of lesion subtyping (Fig. 5) [6–8]. Although a negative EC stain can confirm a diagnosis of classic ILC or PLCIS, it cannot distinguish LCIS and ILC from PLCIS. Furthermore, positive but reduced EC staining (EC-1+) of DCIS lesions with indeterminate features (DCIS-ID) (Fig. 6) can increase the propensity for misdiagnosis. Thus, a negative EC stain cannot unequivocally distinguish DCIS-ID from PLCIS.

Fig. (5)
E-cadherin (EC) expression provides some degree of lesion subtyping. A: H & E; B: EC stain.
Fig. (6)
Limitations to EC staining: A negative EC stain cannot unequivocally distinguish DCIS ID from PLCIS.

Molecular fingerprinting of CIS, by integrating lesion-specific genetic targets into the differential diagnosis, has the potential to provide more accurate distinction of prognostic groups and improved therapeutic strategies. The goal of this study was to test whether a molecular classification approach using gene copy number and Classification Regression Tree (CART) models can differentiate among three types of CIS: PLCIS, DCIS and LCIS.



The patient cohort comprised 38 breast cancer cases with CIS lesions, either concurrent with tumor (17 cases: 9 DCIS, 5 PLCIS, 3 LCIS), as single CIS lesions of DCIS (9 cases), PLCIS (9 cases), and LCIS (2 cases), or in one case as concurrent CIS lesions of LCIS and PLCIS, with LCIS as the lesion of inclusion (Table 1). All the DCIS lesions were of intermediate grade. The final CIS designation of the 38-patient cohort was as follows: 18 DCIS, 14 PLCIS, and 6 LCIS (Table 1).

Table 1
Carcinoma In Situ (CIS) Classification

DNA Extraction

CIS tissue and, when available, normal breast epithelium from each case were microdissected for DNA extraction. As a first step, 300 µl of P-buffer (50 mM Tris-HCl, pH 8.5; 100 mM NaCl; 1 mM EDTA; 0.5% Triton X-100; 20 mM DTT) was added to tubes containing whole 5-micron tissue sections or microdissected tissue. The tubes were heated for 15-20 min at 90°C in a water bath and allowed to cool to 60°C. Next, 6 µl of 20 mg/ml Proteinase K was added and mixed, and the sample was overlaid with 3 drops of mineral oil and spun for 5 seconds at 13,000g. This was followed by a 4-16 hour (overnight) incubation at 60°C. The tubes were then heated for 10 min at 90°C to denature the Proteinase K and to disrupt nucleic acid formaldehyde adducts. Upon removal of the oil, the tubes were centrifuged for 15 min at 13,000g at room temperature, and 250 µl of the supernatant was transferred to a clean 1.5 ml tube. After addition of 10 µl of 5 M NaCl and 1000 µl of ethanol to the 250 µl of supernatant, the tubes were incubated at −20°C for at least 60 min. This was followed by centrifugation for 15 min at 13,000g, at −4°C. Upon removal of the supernatant, an additional centrifugation step of 10 seconds ensured removal of the last traces of the supernatant. Finally, the pellet was air-dried and dissolved in 100 µl of ddH2O.

The Multiplex Ligation-Dependent Probe Amplification Assay (MLPA)

The MLPA assay is a recent method for relative quantification of approximately 30-40 different DNA sequences in a single reaction, requiring only 20 ng of human DNA. The method has been detailed elsewhere [9–13]. The assay has been used successfully for the detection of deletions and duplications and the characterization of chromosomal aberrations for gains and losses of genes in cell lines and tumor samples [9–13]. Probes added to the samples are amplified and quantified instead of target nucleic acids. Amplification of probes by PCR depends on the presence of probe target sequences in the sample. Each probe consists of two oligonucleotides, one synthetic and one M13-derived, each hybridizing to adjacent sites of the target sequence. Such hybridized probe oligonucleotides are ligated, permitting subsequent amplification. All ligated probes have identical end sequences, permitting simultaneous PCR amplification using only one primer pair. Each probe gives rise to an amplification product of unique size between 130 and 480 bp. Probe target sequences are small (50-70 nucleotides). The prerequisite of a ligation reaction provides the opportunity to discriminate single nucleotide differences. The amplified fragments are separated on a DNA sequencer (Fig. 7).

Fig. (7)
Multiplex Ligation-dependent Probe Amplification (MLPA).

We have created and validated a panel of 122 breast-cancer-associated gene probes [12], distributed in 3 batches with 40, 41, and 41 probes, respectively. Normal tissue from each cancer subject serves as an internal reference when available. When normal tissue is not available from a subject, controls are obtained from breast reduction surgeries that have been reviewed and determined by the pathologist to have only normal breast epithelium. For cell lines, where normal DNA is not available, control (normal) female DNA samples are run with each probe set. Quantification of loss or gain of gene loci is determined through a process of normalization [9–13]. The latter addresses variations in the surface area of a peak (intensity) encountered due to fluctuations in the assay run, such as amount of DNA, ploidy variations, and PCR conditions. To determine gene copy number, the peak area for each probe is expressed as a percent of the total surface area of all peaks of a sample in an assay run (Fig. 7). Relative copy number for each probe is obtained as a ratio of the normalized value for each locus (peak) of the sample to that of the normal control. A difference is significant only if the ratio is less than 0.7 (loss) or higher than 1.3 (gain). Complete loss or 0 copies is indicated by absence of a peak for that particular locus. A relative copy number of 2 is considered normal, 1 or 0 copies is considered loss, and 3 copies or more is considered gain.
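The normalization and ratio thresholds described above can be sketched in a few lines. This is a minimal illustration, not the authors' software: the function names (`normalize_peaks`, `relative_ratio`, `call_status`) and the peak-area values are hypothetical, and only the 0.7 and 1.3 ratio cut-offs come from the text.

```python
def normalize_peaks(peak_areas):
    """Express each probe's peak area as a fraction of the run's total peak area."""
    total = sum(peak_areas.values())
    return {probe: area / total for probe, area in peak_areas.items()}


def relative_ratio(sample_areas, control_areas):
    """Ratio of normalized sample signal to normalized control signal, per probe."""
    sample = normalize_peaks(sample_areas)
    control = normalize_peaks(control_areas)
    return {probe: sample[probe] / control[probe] for probe in sample}


def call_status(ratio, loss_cutoff=0.7, gain_cutoff=1.3):
    """Apply the ratio thresholds stated in the text (loss < 0.7, gain > 1.3)."""
    if ratio < loss_cutoff:
        return "loss"
    if ratio > gain_cutoff:
        return "gain"
    return "normal"


# Hypothetical peak areas (arbitrary units) for three probes.
control = {"CASP1": 1000.0, "RELA": 1000.0, "EHF": 1000.0}
sample = {"CASP1": 400.0, "RELA": 1000.0, "EHF": 1600.0}

ratios = relative_ratio(sample, control)
calls = {probe: call_status(r) for probe, r in ratios.items()}
```

With so few probes the total peak area is dominated by the altered loci; a real run with 40 or more probes per batch dilutes that effect.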

Statistical Analysis

The study utilized Classification and Regression Tree (CART®) analysis [14] to develop a gene-based model to discriminate among lesions in the three categories of DCIS, PLCIS, and LCIS. CART methodology, known as binary recursive partitioning, was developed in 1984 by Breiman et al., and uses non-parametric approaches [14]. The term “binary” implies that each group of patients, represented by a “node” in a decision tree, can only be split into two groups. Thus, each parent node can be split into two child nodes (Fig. 8A). The term “recursive” refers to the fact that the binary partitioning process can be applied over and over again. Thus, each parent node can give rise to two child nodes and, in turn, each of these child nodes may themselves be split, forming additional children (Fig. 8). The term “partitioning” refers to the fact that the dataset is split into sections or partitioned.

Fig. (8)
The optimal tree sequence yielded 7 terminal nodes (A) with the smallest error rate (B).

CART has several advantages as a tool for data mining and predictive modeling. The tree produced represents a model or decision tree in which each node (branch) is determined by splitting the dataset on the basis of the one predictor variable (in this case, a gene copy number variable) that results in the best separation of the values of the dependent variable (in this case, lesion class). At every branch, every variable is tested for its usefulness in further splitting. This exhaustive search for splitters can make CART computationally intensive. The relative importance of each variable is assessed based on its importance over all possible nodes and splits. In any one node, only one variable will be the best splitter, although another may be a close second best (a good surrogate). The second-best variable may be a good surrogate for numerous splits without ever being selected as the best primary splitter. Its usefulness as a surrogate for multiple splits leads to its higher importance.
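The exhaustive search for the best splitter at a single node can be sketched as below. The text does not state which splitting criterion was used; Gini impurity, a common choice for classification trees, is assumed here. The function names and the six-lesion dataset are hypothetical and serve only to show the mechanics.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Test every variable and threshold; return the split with the
    lowest weighted impurity of the two child nodes."""
    best = None  # (weighted impurity, variable, threshold)
    n = len(rows)
    for var in rows[0]:
        values = sorted({row[var] for row in rows})
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [lab for row, lab in zip(rows, labels) if row[var] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[var] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, var, threshold)
    return best


# Hypothetical copy-number values for two genes in six lesions.
rows = [
    {"CASP1": 1, "EHF": 2},
    {"CASP1": 1, "EHF": 3},
    {"CASP1": 2, "EHF": 2},
    {"CASP1": 2, "EHF": 3},
    {"CASP1": 2, "EHF": 2},
    {"CASP1": 2, "EHF": 3},
]
labels = ["LCIS", "LCIS", "DCIS", "PLCIS", "DCIS", "PLCIS"]

score, variable, threshold = best_split(rows, labels)
```

In this toy dataset the CASP1 split isolates a pure LCIS child node, so it wins over EHF; applying `best_split` again to each child node gives the recursive part of the procedure.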

CART’s recursive partitioning algorithm, which identifies the first gene variable with the greatest predictive power to create a first-level branch (node), was applied to separate patients into the three groups of PLCIS, LCIS, and DCIS. It proceeded next to identify, for each subgroup, the gene with the next greatest predictive power to partition patients further into the same three groups. The process was continued until no further gene was identified that achieved further classification. Finally, CART calculated the error in each category as well as the overall error. The error rate is the percentage of cases that are misclassified (e.g., a PLCIS case that is reclassified as LCIS).

To reduce the number of variables selected, we first identified a set of gene variables based on their importance, from high (e.g., 100%) to low (0%), for predicting lesion classification. This was followed by CART analysis on the subset of variables with a relative importance greater than 20%. An unbalanced cost ratio was used in CART to reduce the error of misclassifying a PLCIS case into the LCIS or DCIS category, or of misclassifying an LCIS case into the PLCIS category.

To minimize error in misclassifying PLCIS as LCIS or DCIS, and LCIS as PLCIS, we used a cost ratio of 3:1 for misclassifying PLCIS as either LCIS or DCIS, and 2:1 for misclassifying LCIS as PLCIS. For example, a 3:1 cost ratio in the partitioning of PLCIS and LCIS lesions indicates that misclassifying a PLCIS case into the LCIS category is weighted 3 times as heavily as misclassifying a DCIS lesion into the PLCIS category.
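The effect of unequal misclassification costs on the label given to a node can be sketched as follows. The 3:1 and 2:1 ratios are taken from the text; the unit cost for all other errors, the names `COST`, `node_cost` and `assign_class`, and the node composition are assumptions made for illustration. In CART itself, costs can also influence which splits are chosen, which this sketch does not attempt to show.

```python
CLASSES = ("DCIS", "PLCIS", "LCIS")

# COST[true_class][predicted_class]; correct calls cost 0, other errors cost 1.
COST = {
    "DCIS": {"DCIS": 0, "PLCIS": 1, "LCIS": 1},
    "PLCIS": {"DCIS": 3, "PLCIS": 0, "LCIS": 3},
    "LCIS": {"DCIS": 1, "PLCIS": 2, "LCIS": 0},
}


def node_cost(counts, predicted):
    """Total misclassification cost if every case in the node is called `predicted`."""
    return sum(n * COST[true][predicted] for true, n in counts.items())


def assign_class(counts):
    """Pick the label with the lowest total cost for the cases in a node."""
    return min(CLASSES, key=lambda label: node_cost(counts, label))


# Hypothetical terminal node: DCIS is the majority, but PLCIS errors cost more.
node = {"DCIS": 4, "PLCIS": 2, "LCIS": 0}
majority = max(node, key=node.get)
cost_sensitive = assign_class(node)
```

Here a simple majority vote would label the node DCIS, whereas the cost-weighted rule labels it PLCIS, because calling two PLCIS cases DCIS (cost 6) is more expensive than calling four DCIS cases PLCIS (cost 4). This is one way an unbalanced cost ratio can shift errors toward the DCIS-to-PLCIS direction.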

For modeling purposes, we calculated the error rate in each CIS category, as well as the error rate for the model, focusing on error reductions with respect to misclassification of a PLCIS case into either the LCIS or DCIS category, and misclassification of a DCIS case into the PLCIS category.

To avoid over-fitting the data, leave-one-out cross-validation [14] was performed to evaluate the predictive ability of the model when applied to new data in the same patient cohort. Cross-validation is a computationally intensive method for validating a procedure for model building, which avoids the requirement for a new or independent validation dataset. In cross-validation, the learning dataset is randomly split into N sections, stratified by the outcome variable of interest. This ensures that a similar distribution of outcomes is present in each of the N subsets of data. One of these subsets of data is reserved for use as an independent test dataset, whereas the other N-1 subsets are combined for use as the learning dataset in the model-building procedure. The entire model-building procedure is repeated N times, with a different subset of the data reserved for use as the test dataset each time. Thus, N different models are produced, each one of which can be tested against an independent subset of the data. The principle on which cross-validation is based is that the average performance of these N models is an excellent estimate of the performance of the original model (produced using the entire learning dataset) on a future independent set of patients [14].
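Leave-one-out cross-validation is the special case in which N equals the number of cases, so each test subset holds a single lesion. A minimal sketch of the loop is given below. The nearest-neighbour model is only a stand-in so that the example stays short and runnable (it is not CART), and the copy-number values and labels are hypothetical.

```python
def leave_one_out_error(samples, labels, fit):
    """Refit the model N times, each time holding out one case for testing.

    `fit(train_samples, train_labels)` must return a function that maps a
    sample to a predicted label.
    """
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if model(samples[i]) != labels[i]:
            errors += 1
    return errors / len(samples)


def fit_nearest_neighbour(train_x, train_y):
    """Placeholder model (NOT CART): predict the label of the closest training value."""
    def predict(x):
        nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
        return train_y[nearest]
    return predict


# Hypothetical single-gene copy numbers and lesion labels.
copy_numbers = [1.0, 1.1, 2.0, 2.1, 3.0, 3.2, 2.9]
lesions = ["LCIS", "LCIS", "DCIS", "DCIS", "PLCIS", "PLCIS", "DCIS"]

error_rate = leave_one_out_error(copy_numbers, lesions, fit_nearest_neighbour)
```

Because the held-out case never contributes to the model that predicts it, the resulting error rate is usually higher than the error measured on the learning data.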


Ten genes in the range of 29% to 100% in variable importance were selected in a univariate analysis as predictor variables from among the 122-gene probe panel (Table 2). The optimal tree sequence with the least error rate yielded 7 terminal nodes (Table 3, Fig. 8A). The regression tree for CIS is presented in Fig. 9. The splitting criterion for each node is given within the blue boxes. Terminal nodes (red boxes) indicate class prediction based on gene copy number. The tree generated is initiated as a root node (Node 1) containing all 38 CIS cases. This node is split based on the value of a gene’s copy number obtained from the list of genes determined in the univariate analysis (Table 2).

Fig. (9)
Regression tree for CIS. The splitting criterion for each node is given within the blue boxes. Terminal nodes (red boxes) indicate class prediction based on gene copy number. The tree generated is initiated as a root node (Node 1) containing all 38 CIS ...
Table 2
Univariate Analysis of the Importance of Gene Variables
Table 3
Tree Sequence

The parent node (Node 1) was split on loss of CASP1 (copy number <=1.5), generating terminal Node-1, which predicts a CIS class of LCIS (4 cases). This placed 3 LCIS cases and 1 PLCIS case in the LCIS class. All other CIS cases (34) were placed in Node 2. Node 2 was split on RELA, where gain of RELA (gene copy >3.5) generated terminal Node-7 and predicted only LCIS (n=2). The remaining 32 CIS cases without gain of RELA were placed in Node 3, which was split by gain of EHF (gene copy >2.5) into terminal Node-6, predicting 4 PLCIS and reclassifying 1 LCIS as PLCIS. The remaining 27 CIS cases without gain of EHF were assembled into Node 4, which was split on loss of NCOA3 (copy number <1.5), classifying 5 CIS cases as PLCIS with a resultant reclassification of 1 DCIS as PLCIS. The Node 5 CIS cases (n=22) were split on STK15 copy number (<2.5) into terminal Node-3, classifying 12 CIS cases as DCIS without any misclassifications. The remaining 10 CIS cases in Node 6 were finally split on TNFRSF1B copy number into terminal Node-4 (gene copy number <1.5), containing only DCIS cases (n=4) with no misclassifications, and terminal Node-5 (gene copy number >1.5), with a PLCIS classification that included 5 PLCIS and 1 DCIS.
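Read top to bottom, the splits above amount to a short chain of threshold rules. The sketch below encodes them as a function, using the thresholds quoted in the text. The function name `classify_cis` and the example profiles are hypothetical. The direction of the TNFRSF1B split (below 1.5 predicting DCIS, otherwise PLCIS) is an assumption, since the description of that split is ambiguous. Fig. 9 remains the authoritative version of the tree.

```python
def classify_cis(copy_number):
    """Walk the splits described in the text, top to bottom.

    `copy_number` maps gene symbol -> relative copy number.
    """
    if copy_number["CASP1"] <= 1.5:      # loss of CASP1
        return "LCIS"
    if copy_number["RELA"] > 3.5:        # gain of RELA
        return "LCIS"
    if copy_number["EHF"] > 2.5:         # gain of EHF
        return "PLCIS"
    if copy_number["NCOA3"] < 1.5:       # loss of NCOA3
        return "PLCIS"
    if copy_number["STK15"] < 2.5:
        return "DCIS"
    if copy_number["TNFRSF1B"] < 1.5:    # assumed direction of this split
        return "DCIS"
    return "PLCIS"


# Hypothetical profiles: two copies everywhere, then single-gene changes.
GENES = ("CASP1", "RELA", "EHF", "NCOA3", "STK15", "TNFRSF1B")
baseline = {gene: 2 for gene in GENES}

casp1_loss = dict(baseline, CASP1=1)
ehf_gain = dict(baseline, EHF=3)
stk15_gain = dict(baseline, STK15=3)

predictions = [classify_cis(p) for p in (baseline, casp1_loss, ehf_gain, stk15_gain)]
```

Because each rule is tested only when all earlier rules fail, the order of the `if` statements mirrors the depth of the corresponding node in the tree.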

Four cases of CIS were misclassified: 1 PLCIS was reclassified into the LCIS category, 1 LCIS was reclassified as PLCIS, and 2 DCIS cases were reclassified into the PLCIS class. Error rates for LCIS, PLCIS, and DCIS were 17% (1/6), 7% (1/14), and 11% (2/18), respectively, for the learned data (Table 4), and 33%, 28% and 50%, respectively, based on testing data (results of model validation) (Table 5).
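The misclassification counts above, together with the cohort sizes (18 DCIS, 14 PLCIS, 6 LCIS), are enough to tabulate agreement between histopathology and CART on the learning data. The sketch below is only arithmetic on those stated counts; the variable names are hypothetical, and Tables 4 and 5 remain the authoritative source for the reported rates.

```python
# Rows: histopathology class; columns: CART-predicted class (learning data).
confusion = {
    "DCIS": {"DCIS": 16, "PLCIS": 2, "LCIS": 0},
    "PLCIS": {"DCIS": 0, "PLCIS": 13, "LCIS": 1},
    "LCIS": {"DCIS": 0, "PLCIS": 1, "LCIS": 5},
}


def error_rate(true_class):
    """Fraction of cases of `true_class` assigned to any other class."""
    row = confusion[true_class]
    total = sum(row.values())
    return (total - row[true_class]) / total


per_class_percent = {cls: round(100 * error_rate(cls), 1) for cls in confusion}
total_cases = sum(sum(row.values()) for row in confusion.values())
agreement = sum(confusion[cls][cls] for cls in confusion)
```

The diagonal of the matrix gives the 34 of 38 cases in which CART agreed with histopathology.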

Table 4
Misclassification for Learned Data
Table 5
Misclassification for Test Data


Historically, the molecular pathogenesis of cancer has been examined one gene at a time. A detailed molecular characterization or fingerprint of cancer is an objective recently made possible by the development of several new high-throughput analytical methods. These include techniques for the analysis of DNA, mRNA, and proteins within a cell [15–17]. The databases of detailed molecular information can then be linked to clinical information [18]. This approach can help patients by improving classification of tumor types, enabling clinicians to distinguish prognostic groups more accurately and therefore to select the most effective therapies.

Classification and Regression Tree (CART) analysis is a statistical method to partition data sets into logically similar groups based on either numeric or categorical variables. CART produces decision trees, based on simple yes/no questions, to reveal relationships that are sometimes hidden in extremely complex datasets. CART permitted us to quantify the unique relationship between the categories of PLCIS, DCIS, and LCIS and gene copy number variables.

Several things should be pointed out regarding this CART tree. First, it is much simpler to interpret than the multivariate logistic regression model, making it more likely to be practical in a clinical setting. Secondly, the inherent “logic” in the tree is easily apparent, and it makes clinical sense. Interestingly, it has been shown that clinical decision-making rules which make sense to clinicians are more likely to be followed in clinical practice than rules in which the reasoning is not apparent.

All LCIS cases but one were correctly classified into the LCIS category. The misclassified LCIS case was captured in terminal Node-6 as PLCIS. A single PLCIS case (1/14) was reclassified into the LCIS category at terminal Node-1. Terminal nodes 3 and 4 correctly classified only DCIS lesions (n=12 and n=4, respectively). Two DCIS cases were reclassified as PLCIS through the NCOA3 and TNFRSF1B splits.

This study demonstrates the ability of CART analysis to predict CIS tissue types molecularly, based on gene copy number variables. Currently, PLCIS is treated like LCIS. However, the aggressive behavior and histological pleomorphism seen in PLCIS indicate a possible association between PLCIS and DCIS that may warrant an altered clinical management. Because negative E-cadherin immunostaining does not discriminate PLCIS from LCIS, nor does it unequivocally differentiate DCIS-ID from PLCIS, additional tools would aid in the categorical classification of CIS lesions as LCIS, DCIS, or PLCIS.

The present study demonstrated a propensity for misclassification of DCIS into the PLCIS category. Their genotypic and morphological similarities add weight to consideration of PLCIS as an aggressive lesion. The study provides rationale for the utility of molecular differentiation algorithms in the evaluation of PLCIS and indeterminate CIS lesions.

The purpose of a decision tree is usually to allow the accurate prediction of outcome for future cases, based on the value of gene copy number variables. This is accomplished when a generated decision tree is saved for future use for interrogation with a new dataset to predict outcome. Because of the small sample size, and a less-than-robust validation result, a decision tree like the one generated in this study requires additional verification using an independent dataset, where cases from the new dataset are run through the tree.

From a practical standpoint, once a validated decision tree is generated, the process of CIS classification can be streamlined. Instead of starting from a 122-gene MLPA panel, a refined and focused MLPA panel comprising the 10 validated genes from the panel can provide the fluidity and practicality of an evidence-based targeted gene panel.


This study was supported by NIH CA 70923, DAMD17-00-1-0288, and DAMD17-02-1-0406 (Dr. Worsham).


1. Fisher ER, Costantino J, Fisher B, Palekar AS, Paik SM, Suarez CM, Wolmark N. Pathologic findings from the National Surgical Adjuvant Breast Project (NSABP) Protocol B-17. Five-year observations concerning lobular carcinoma in situ. Cancer. 1996;78:1403–16. [PubMed]
2. Schnitt SJ, Morrow M. Lobular carcinoma in situ: current concepts and controversies. Semin Diagn Pathol. 1999;16:209–23. [PubMed]
3. Bentz JS, Yassa N, Clayton F. Pleomorphic lobular carcinoma of the breast: clinicopathologic features of 12 cases. Mod Pathol. 1998;11:814–22. [PubMed]
4. Frost AR, Tsangaris TN, Silverberg SG. Pleomorphic lobular carcinoma in situ. Pathol Case Rev. 1996;1:27.
5. Reis-Filho JS, Simpson PT, Jones C, Steele D, Mackay A, Iravani M, Fenwick K, Valgeirsson H, Lambros M, Ashworth A, Palacios J, Schmitt F, Lakhani SR. Pleomorphic lobular carcinoma of the breast: role of comprehensive molecular pathology in characterization of an entity. J Pathol. 2005;207:1–13. [PubMed]
6. Jacobs TW, Pliss N, Kouria G, Schnitt SJ. Carcinomas in situ of the breast with indeterminate features: role of E-cadherin staining in categorization. Am J Surg Pathol. 2001;25:229–36. [PubMed]
7. Middleton LP, Palacios DM, Bryant BR, Krebs P, Otis CN, Merino MJ. Pleomorphic lobular carcinoma: morphology, immunohistochemistry, and molecular analysis. Am J Surg Pathol. 2000;24:1650–6. [PubMed]
8. Palacios J, Sarrio D, Garcia-Macias MC, Bryant B, Sobel ME, Merino MJ. Frequent E-cadherin gene inactivation by loss of heterozygosity in pleomorphic lobular carcinoma of the breast. Mod Pathol. 2003;16:674–8. [PubMed]
9. Worsham MJ, Pals G, Schouten J, van Spaendonk R, Concus AP, Carey TE, Benninger MS. Delineating genetic pathways of disease progression in head and neck squamous cell carcinoma. Arch Otolaryngol Head Neck Surg. 2003;129:702–708. [PubMed]
10. Kunjoonju JP, Raitanen M, Grenman S, Tiwari N, Worsham MJ. Identification of individual genes altered in squamous cell carcinoma of the vulva. Genes Chromosomes Cancer. 2005;44:185–193. [PubMed]
11. Worsham MJ, Chen KM, Tiwari N, Pals G, Schouten JP, Sethi S, Benninger MS. Fine-mapping loss of gene architecture at the CDKN2B (p15INK4b), CDKN2A (p14ARF, p16INK4a), and MTAP genes in head and neck squamous cell carcinoma. Arch Otolaryngol Head Neck Surg. 2006;132:409–415. [PubMed]
12. Worsham M, Pals G, Schouten J, Miller F, Tiwari N, van Spaendonk, Wolman SR. High-resolution mapping of molecular events associated with immortalization, transformation, and progression to breast cancer in the MCF10 model. Breast Cancer Res Treat. 2006;96:177–86. [PubMed]
13. Worsham MJ, Chen KM, Meduri V, Nygren A, Errami A, Schouten JP, Benninger MS. Epigenetic events of disease progression in head and neck squamous cell carcinoma. Arch Otolaryngol Head Neck Surg. 2006;132:668–77. [PubMed]
14. Breiman L, Friedman J, Olshen R, Stone C. Classification and regression trees. New York: Chapman & Hall (Wadsworth, Inc.); 1984.
15. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, Jeffrey SS, Botstein D, Brown PO. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat Genet. 1999;23:41–46. [PubMed]
16. Duggan DJ, Bittner M, Chen Y, Meltzer P, Trent JM. Expression profiling using cDNA microarrays. Nat Genet. 1999;21:10–4. [PubMed]
17. Oh JM, Hanash SM, Teichroew D. Mining protein data from two-dimensional gels: tools for systematic post-planned analyses. Electrophoresis. 1999;20:766–774. [PubMed]
18. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286:531–7. [PubMed]

Articles from Current Genomics are provided here courtesy of Bentham Science Publishers