The differential diagnosis of clear cell, papillary and chromophobe renal cell carcinoma is clinically important, because these tumor subtypes are associated with different pathobiology and clinical behavior. For cases in which histopathology is equivocal, immunohistochemistry and quantitative RT-PCR can assist in the differential diagnosis by measuring expression of subtype-specific biomarkers. Several renal tumor biomarkers have been discovered in expression microarray studies. However, due to heterogeneity of gene and protein expression, additional biomarkers are needed for reliable diagnostic classification. We developed novel bioinformatics systems to identify candidate renal tumor biomarkers from the microarray profiles of 45 clear cell, 16 papillary and 10 chromophobe renal cell carcinomas; the microarray data were derived from two independent published studies. The ArrayWiki biocomputing system merged the microarray datasets into a single file, so gene expression could be analyzed from a larger number of tumors. The caCORRECT system removed non-random sources of error from the microarray data, and the omniBioMarker system analyzed data with several gene-ranking algorithms, in order to identify algorithms effective at recognizing previously described renal tumor biomarkers. We predicted these algorithms would also be effective at identifying unknown biomarkers that could be verified by independent methods. We selected six novel candidate biomarkers from the omniBioMarker analysis, and verified their differential expression in formalin-fixed paraffin-embedded tissues by quantitative RT-PCR and immunohistochemistry. The candidate biomarkers were carbonic anhydrase IX, ceruloplasmin, schwannomin-interacting protein 1, E74-like factor 3, cytochrome c oxidase subunit 5a and acetyl-CoA acetyltransferase 1. Quantitative RT-PCR was performed on 17 clear cell, 13 papillary and 7 chromophobe renal cell carcinomas.
Carbonic anhydrase IX and ceruloplasmin were overexpressed in clear cell renal cell carcinoma; schwannomin-interacting protein 1 and E74-like factor 3 were overexpressed in papillary renal cell carcinoma; and cytochrome c oxidase subunit 5a and acetyl-CoA acetyltransferase 1 were overexpressed in chromophobe renal cell carcinoma. Immunohistochemistry was performed on tissue microarrays containing 66 clear cell, 16 papillary and 12 chromophobe renal cell carcinoma. Cytoplasmic carbonic anhydrase IX staining was significantly associated with clear cell renal cell carcinoma. Strong cytoplasmic schwannomin-interacting protein 1 and cytochrome c oxidase subunit 5a staining were significantly more frequent in papillary and chromophobe renal cell carcinoma, respectively. In summary, we developed a novel process for identifying candidate renal tumor biomarkers from microarray data, and verifying differential expression in independent assays. The tumor biomarkers have potential utility as a multiplex expression panel for classifying renal cell carcinoma with equivocal histology. Biomarker expression assays are increasingly important for renal cell carcinoma diagnosis, as needle core biopsies become more common and different therapies for tumor subtypes continue to be developed.
Renal cell carcinoma (RCC) is the major adult malignancy of the kidney; it is subclassified into several subtypes including clear cell, papillary and chromophobe RCC. Renal oncocytoma is a relatively common benign tumor that may be related to chromophobe RCC. Accurate classification is clinically important, because tumor subtypes are associated with different malignant potential, prognoses and optimal therapies. In recent years, we and other groups have used cDNA and oligonucleotide microarrays to characterize gene expression profiles in renal tumor subtypes [3-6]. Based on unique expression patterns, several novel immunohistochemical markers have been identified for each RCC subtype. When used in conjunction with histopathology, these immunohistochemical markers are clinically useful for renal tumor diagnosis [7-10]. However, due to the heterogeneity of gene and protein expression in RCC, additional biomarkers are needed to develop clinically reliable immunohistochemical panels, with adequate diagnostic sensitivity and specificity for each RCC subtype. In this report, we describe the use of novel bioinformatics systems to identify candidate RCC biomarkers from previous microarray data. The bioinformatics systems are designed to combine disparate datasets from independent microarray studies, remove non-random sources of error from the expression data, and analyze expression patterns in the context of pre-existing biological knowledge, in order to identify valid biomarkers more efficiently. Following identification of candidate biomarkers, we describe the verification of selected markers in formalin-fixed paraffin-embedded renal tumor tissues by quantitative RT-PCR and immunohistochemistry.
Microarray datasets were obtained from previously published reports [3, 12]. Schuetz et al. utilized Affymetrix HG Focus microarrays with over 8700 probe sets, in a study that included 13 clear cell, 5 papillary and 4 chromophobe RCC. The chromophobe carcinomas were combined with 3 additional renal oncocytomas to form a single class for biomarker discovery (n = 7). Jones et al. utilized Affymetrix HG-U133A microarrays with over 22000 probe sets, in a study that included 32 clear cell, 11 papillary, 6 chromophobe RCC and 12 oncocytomas. The HG-U133A microarray data were reduced to include only those probe sets shared with HG Focus data. The ArrayWiki biocomputing system combined the microarray datasets into a single data file, in order to increase total sample size, while updating probe annotation with a knowledge management interface based on Wikipedia standards. The URL http://arraywiki.bme.gatech.edu/index.php/Andrew_Young contains visual representations of experiments described in this report. General information on ArrayWiki is available at the URL http://www.bio-miblab.org/arraywiki.
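The reduction to shared probe sets and the merge into one data file can be illustrated with a minimal sketch. It is not the ArrayWiki implementation: the probe identifiers, sample names, expression values and the function name merge_on_shared_probes are all hypothetical, and the data structure (a dictionary per platform) is an assumption made for brevity.

```python
# Hypothetical toy data: probe-set ID -> {sample name -> expression value}.
focus = {
    "probe_a": {"F_clear_1": 7.1, "F_pap_1": 5.2},
    "probe_b": {"F_clear_1": 3.3, "F_pap_1": 6.8},
}
u133a = {
    "probe_a": {"U_clear_1": 7.4, "U_chromo_1": 4.9},
    "probe_b": {"U_clear_1": 3.0, "U_chromo_1": 6.1},
    "probe_c": {"U_clear_1": 9.9, "U_chromo_1": 2.2},  # absent from the smaller array
}


def merge_on_shared_probes(small, large):
    """Keep only probe sets present on both platforms and pool their samples."""
    shared = sorted(set(small) & set(large))
    merged = {}
    for probe in shared:
        row = dict(small[probe])   # samples from the smaller platform
        row.update(large[probe])   # add samples from the larger platform
        merged[probe] = row
    return merged


merged = merge_on_shared_probes(focus, u133a)
```

In this toy case the merged table holds two probe sets with four samples each, and probe_c is dropped because it exists on only one platform.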
The RCC microarray datasets were analyzed with caCORRECT (chip artifact CORRECTion; http://www.bio-miblab.org), a web-based bioinformatics system that detected and removed localized array, or “chip”, artifacts. For this purpose, artifacts were defined as spatially prominent data variances caused by chip manufacturing or lab processing errors. For detection of artifacts, quantile normalization was used to align signal distributions from each microarray and remove global array biases within the set of experiments. After quantile normalization, variance scores were calculated for each probe on each microarray chip, using a modified t-statistic calculated from other chips in a leave-one-out fashion. A sliding window image-processing algorithm was then run to identify high-variance probes that were geographically clustered on the array platform; regions of clustered high-variance probes represented potential artifacts, while geographically isolated high-variance probes were left alone. After this first round was complete, four additional rounds of artifact-omitting quantile normalization, and artifact-weighted artifact detection, were performed in order to identify subtle artifacts that may have been overshadowed in earlier rounds by larger defects. At this point, quality metrics were calculated to describe the artifact coverage and noise content of each chip and of the experiment as a whole. Probe data that were identified as artifacts were then replaced with the probe-specific median intensity of all other chips in the dataset. Completion of the caCORRECT process resulted in the following files: (i) heatmap images of probe variance score for all chips, with and without logical artifact masks; (ii) new versions of “clean” probe expression files with appropriate replacements; and (iii) gene expression value tables, calculated by an R implementation of the Robust Microarray Averaging (RMA) algorithm from data before and after caCORRECT.
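Three of the steps described above (quantile normalization, leave-one-out scoring, and median replacement) can be sketched as follows. This is a simplified illustration, not the caCORRECT code. The score is a plain standardized deviation standing in for the modified t-statistic, whose exact form is not given here. The sliding-window spatial clustering and the repeated rounds are omitted, so probes are flagged by a simple threshold, which is an assumption. All function names and toy intensities are hypothetical.

```python
from statistics import mean, median, stdev


def quantile_normalize(chips):
    """Give every chip the same intensity distribution (mean of each rank)."""
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [mean(s[r] for s in sorted_chips) for r in range(n_probes)]
    normalized = []
    for chip in chips:
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, probe_idx in enumerate(order):
            out[probe_idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def leave_one_out_scores(chips):
    """Score each probe on each chip against the same probe on all other chips.

    Needs at least three chips so the reference set has a standard deviation.
    """
    scores = []
    for k, chip in enumerate(chips):
        others = [c for j, c in enumerate(chips) if j != k]
        row = []
        for i, value in enumerate(chip):
            ref = [c[i] for c in others]
            spread = stdev(ref) or 1e-9  # guard against a zero spread
            row.append(abs(value - mean(ref)) / spread)
        scores.append(row)
    return scores


def replace_flagged(chips, flags):
    """Replace flagged probes with the median of that probe on all other chips."""
    cleaned = [list(chip) for chip in chips]
    for k, chip in enumerate(chips):
        for i in range(len(chip)):
            if flags[k][i]:
                cleaned[k][i] = median(c[i] for j, c in enumerate(chips) if j != k)
    return cleaned


# Toy data: four chips with two probes each; chip 2 has one aberrant probe.
raw = [[1.0, 2.0], [1.2, 2.1], [0.8, 9.0], [1.0, 1.9]]
scores = leave_one_out_scores(raw)
flags = [[s > 5 for s in row] for row in scores]  # threshold of 5 is arbitrary
cleaned = replace_flagged(raw, flags)
```

In the procedure described above, quantile normalization precedes scoring. The toy run skips it only to keep the flagged value easy to follow by eye.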
The omniBioMarker bioinformatics resource was used to identify genes expressed differentially in RCC subtypes, using microarray data processed with caCORRECT. Processed gene expression value tables were combined into a master gene expression data file, which was assessed by hierarchical clustering to ensure that combined data continued to classify RCC subtypes as in the original reports [3, 12]. omniBioMarker then analyzed the combined RCC data in an iterative fashion using support vector machine classifiers (SVM) to rank the genes individually by classification ability, as determined by bootstrapping. In order to identify the optimal algorithm for subtype classification, omniBioMarker varied the SVMs by adjusting two parameters that control classifier complexity and generalization ability. The algorithm searched for the best set of parameters over a predefined parameter space. In the first iteration, omniBioMarker ranked the performance of each SVM classifier using control biomarkers, which were defined as biomarkers that had been verified in previous studies by RT-PCR or IHC [3, 20]. The optimal classifier generally ranked control biomarkers before non-control biomarkers. After identifying the optimal gene-ranking classifier for the combined RCC microarray data, the corresponding ranking results were used to identify additional candidate biomarkers with consistent differential expression in clear cell, papillary and chromophobe RCC. Candidates were interpreted with Gene Ontology analysis tools including GO-Miner, and selected for subsequent verification by quantitative RT-PCR and IHC.
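The selection step, in which the preferred classifier is the one that ranks control biomarkers ahead of other genes, can be sketched as below. This is not the omniBioMarker implementation. It assumes the gene rankings have already been produced by each SVM parameter setting, and it scores a ranking by the mean position of the control genes, which is one plausible criterion rather than the one actually used. The gene names and the parameter pairs used as dictionary keys are hypothetical.

```python
def mean_control_rank(ranking, controls):
    """Average 1-based position of the control genes in one ranked gene list."""
    positions = [ranking.index(gene) + 1 for gene in controls if gene in ranking]
    return sum(positions) / len(positions)


def pick_best_setting(rankings, controls):
    """Return the parameter setting whose ranking places the controls highest."""
    return min(rankings, key=lambda setting: mean_control_rank(rankings[setting], controls))


# Hypothetical gene rankings, keyed by a pair of hypothetical SVM parameters.
rankings = {
    (1.0, 0.1): ["GENE_X", "CTRL_1", "GENE_Y", "CTRL_2", "GENE_Z"],
    (10.0, 0.1): ["CTRL_1", "CTRL_2", "GENE_X", "GENE_Y", "GENE_Z"],
    (10.0, 1.0): ["GENE_Z", "GENE_Y", "CTRL_2", "GENE_X", "CTRL_1"],
}
controls = ["CTRL_1", "CTRL_2"]

best = pick_best_setting(rankings, controls)
# Non-control genes from the winning ranking become the new candidates.
candidates = [gene for gene in rankings[best] if gene not in controls]
```

Here the second setting wins because both controls occupy the top two positions, and the remaining genes in that ranking are carried forward as candidates in ranked order.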
Gene expression was assessed by quantitative RT-PCR, using total RNA from fixed tissues of 17 clear cell, 13 papillary and 7 chromophobe RCC. Duplicate experiments were performed according to published protocols with minor modifications. Histological sections were deparaffinized with ethanol and xylene, and cells of interest were microdissected with a sterile scalpel. Tissues were digested in buffer containing proteinase K at 60°C overnight. RNA was extracted with phenol/chloroform, and genomic DNA was removed with DNase. RNA quality and quantity were assessed with a Bioanalyzer (Agilent Technologies). Up to 3 μg of RNA was used for first strand cDNA synthesis with Superscript III (Invitrogen). PCR was performed with a custom-designed Taqman Low Density Array (LDA, Applied Biosystems) in a 96-well microfluidic card format, using the ABI PRISM 7900HT Sequence Detection System (high-throughput real-time PCR system). Gene expression data were normalized relative to the geometric mean of two housekeeping genes (18S, ACTB). LDA runs were analyzed by using Relative Quantification (RQ) Manager (Applied Biosystems) software. The following test genes were analyzed: carbonic anhydrase IX (CA9, Assay ID: Hs00154208_m1, Applied Biosystems); ceruloplasmin (CP, Assay ID: Hs00236810_m1, Applied Biosystems); schwannomin-interacting protein 1 (SCHIP1, Assay ID: Hs00205829_m1, Applied Biosystems); E74-like factor 3 (ELF3, Assay ID: Hs00231786_m1, Applied Biosystems); cytochrome c oxidase subunit 5a (COX5A, Assay ID: Hs00362067_m1, Applied Biosystems); and acetyl-CoA acetyltransferase 1 (ACAT1, Assay ID: Hs00608002_m1, Applied Biosystems). Test gene expression was normalized to 18S ribosomal RNA and referenced to a normal kidney reference RNA specimen. Relative normalized gene expression was compared in renal tumor subtypes, with statistical significance assessed by two-tailed t-test.
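The normalization and referencing described above correspond in outline to the widely used comparative Ct calculation, sketched below. The sketch assumes perfect amplification efficiency (a doubling per cycle) and does not reproduce the internal calculations of the RQ Manager software. Because Ct values lie on a log2 scale, averaging the housekeeping Ct values arithmetically is equivalent to taking the geometric mean of their expression levels. The function names and Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct, housekeeping_cts):
    """Target Ct minus the mean Ct of one or more housekeeping genes."""
    return target_ct - mean(housekeeping_cts)


def relative_expression(tumor_target, tumor_hk, ref_target, ref_hk):
    """Fold change of a tumor sample versus the reference sample (2^-ddCt)."""
    ddct = delta_ct(tumor_target, tumor_hk) - delta_ct(ref_target, ref_hk)
    return 2.0 ** (-ddct)


# Hypothetical Ct values: test gene in a tumor versus a normal kidney reference.
fold = relative_expression(
    tumor_target=24.0, tumor_hk=[12.0, 20.0],
    ref_target=27.0, ref_hk=[12.0, 20.0],
)
```

With these invented values the test gene crosses threshold three cycles earlier in the tumor than in the reference after normalization, which corresponds to an eight-fold higher relative expression.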
Selected biomarkers were further verified by immunohistochemistry, performed on the KIC1501 tissue microarray (Clonagen), which included 66 clear cell, 16 papillary and 12 chromophobe RCC. Tissue sections were incubated with the following primary antibodies: anti-CA9 (rabbit polyclonal serum, Novus Biological), anti-SCHIP1 (rabbit polyclonal IgG, Sigma), and anti-COX5A (rabbit polyclonal IgG, Protein Tech Group). After washing unbound primary antibody, sections were treated with goat anti-rabbit immunoglobulin conjugated to a peroxidase-labeled polymer, according to the manufacturer's instructions (Envision kit; DAKO Corp., Carpinteria, CA). Immunohistochemical reactions were developed with diaminobenzidine as the chromogenic peroxidase substrate. Sections were counterstained with hematoxylin after immunohistochemistry. Specificity was verified by negative control reactions without primary antibody, as well as appropriate staining reactions in positive control tissues. The intensity of immunohistochemical staining in tumor cells was graded as negative (0), weak (1+), moderate (2+) and strong (3+); negative-to-weak staining was classified as low-level expression, whereas moderate-to-strong staining was classified as high-level expression. Frequency of cases with high-level expression was compared among renal tumor subtypes, with statistical significance assessed by chi-square analysis.
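The scoring rule and the chi-square comparison can be illustrated with a short sketch. The counts below are invented for illustration and are not the study data. The closed-form p-value applies only to a table with two degrees of freedom, such as high versus low expression across three subtypes. All function names are hypothetical.

```python
from math import exp


def expression_level(intensity):
    """Map a staining grade (0 to 3+) to the low/high classes used for analysis."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    return "high" if intensity >= 2 else "low"


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail p-value for 2 degrees of freedom, where it equals exp(-x/2)."""
    return exp(-stat / 2.0)


# Invented counts. Rows: high, low. Columns: clear cell, papillary, chromophobe.
table = [
    [30, 5, 5],
    [10, 15, 15],
]
stat = chi_square(table)
p = p_value_df2(stat)
```

For this invented table the statistic is 20, which gives a p-value well below 0.001.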
We analyzed two RCC microarray datasets with a series of novel bioinformatics systems, in order to identify candidate diagnostic biomarkers. First, the ArrayWiki knowledge management system was used to combine and annotate the datasets in compatible formats. The caCORRECT quality assurance system was then used to identify non-random physical artifacts on the microarrays, and eliminate potentially confounding results from these defective regions. Examples of chip artifacts included scratches on the array surface and bubbles in the hybridization medium (Figure 1). Next, the omniBioMarker system was used to analyze the combined microarray dataset with a variety of support vector machine classifiers, in order to identify optimal algorithms for identifying candidate RCC biomarkers. In this step, the support vector machines were compared for performance in classifying RCC subtypes, using only biomarkers in the dataset that had been previously verified in independent studies by quantitative RT-PCR or immunohistochemistry. The strongest algorithm was then applied to the entire dataset, in order to identify additional gene products with differential expression in RCC subtypes. From this group of gene products, we selected candidate diagnostic biomarkers for subsequent verification by quantitative RT-PCR and immunohistochemistry.
Differential expression of six gene products was verified by quantitative RT-PCR, using formalin-fixed paraffin-embedded specimens from 17 clear cell, 13 papillary and 7 chromophobe RCC (Figure 2). CA9 and CP were overexpressed in clear cell RCC (p = 9.83 × 10⁻⁵ and 3.59 × 10⁻⁶, respectively); SCHIP1 and ELF3 were overexpressed in papillary RCC (p = 1.48 × 10⁻³ and 4.14 × 10⁻⁸, respectively); and COX5A and ACAT1 were overexpressed in chromophobe RCC (p = 1.32 × 10⁻⁵ and 1.40 × 10⁻⁷, respectively). Differential expression of CA9, SCHIP1 and COX5A was further verified by immunohistochemistry, using commercial primary antibodies and a formalin-fixed paraffin-embedded tissue microarray that included 66 clear cell, 16 papillary and 12 chromophobe RCC (Figure 3). By immunohistochemistry, CA9 was strongly overexpressed in the tumor cell cytoplasm of clear cell RCC (p < 0.001). Cytoplasmic SCHIP1 staining was seen in papillary and clear cell RCC, but 3+ intensity was significantly more frequent in tumor cells of papillary RCC (p < 0.001). Cytoplasmic COX5A staining was seen in all RCC subtypes; however, 3+ intensity was significantly more frequent in tumor cells of chromophobe RCC (p < 0.02). Immunohistochemical data are summarized in Table 1.
Gene expression profiling is an important approach to discover molecular markers for diagnostic pathology. Microarrays are used to identify complex expression profiles, which are screened to identify large numbers of differentially expressed genes. These differential expression profiles provide a list of candidate diagnostic biomarkers for clinical pathology laboratories, using assays such as immunohistochemistry and quantitative RT-PCR [3, 4, 6-10, 23-25]. While this approach has been applied effectively, it remains limited because the number of genes in most microarray studies exceeds the number of experimental samples by several orders of magnitude. Therefore, differential expression profiles tend to contain numerous false positives (candidate biomarkers that are not verified when different samples and analytical methods are tested) and false negatives (true biomarkers that are not differentially expressed among the small number of samples in the microarray study). In order to maximize the potential contribution of microarray technology, new information tools are needed to identify candidate biomarkers with the greatest likelihood of validity. In this report, we describe an integrated series of biocomputation systems, called ArrayWiki, caCORRECT and omniBioMarker, which we developed to make the process of biomarker discovery effective and efficient [11, 13, 14, 17]. The bioinformatics systems are designed to maximize the experimental sample size, remove systematic error from the microarray data, and empirically identify optimal algorithms to identify differentially expressed biomarkers. Our data are presented publicly at the ArrayWiki Internet site (see Materials & Methods). Data in ArrayWiki are open to community contribution, comment, and modification, using syntax and structure common to Wikipedia and similar resources [26, 27]. Ongoing annotation from the community could enhance the value of this knowledge base for future biomarker discovery experiments.
We selected six candidate RCC biomarkers for verification by RT-PCR and immunohistochemistry; each biomarker has potential relevance for renal tumor pathobiology and clinical management. Carbonic anhydrase IX (CA9) and ceruloplasmin (CP) were verified as biomarkers for clear cell RCC. CA9, a hypoxia-inducible protein, is well-established as a clear cell RCC biomarker [20, 28]. It is overexpressed in clear cell tumors compared to benign lesions and other RCC subtypes, and thus may be useful for diagnostic classification. Several studies also suggest that comparatively low CA9 expression in clear cell RCC is a negative prognostic indicator [29, 30]. In addition, CA9 is a potentially important therapeutic biomarker, since it is the protein target for G250 monoclonal antibody-based immunotherapy and vaccines against clear cell RCC. Along with CA9, the acute phase reactant CP was overexpressed in clear cell RCC in our previous microarray experiments. This pattern was seen with many other genes related to inflammation and the acute phase response [3, 6]. Similarly, other groups have identified CP as a clear cell RCC biomarker by suppression subtractive hybridization [32, 33]. In addition, serum CP protein levels are elevated in patients with RCC and other malignancies compared to healthy controls [34-38]. CP has antioxidant properties that may be involved with the host response to neoplasia.
Schwannomin-interacting protein 1 (SCHIP1) and E74-like factor 3 (ELF3) were verified as biomarkers for papillary RCC. Neither gene product has been described in RCC previously. SCHIP1 was first discovered as a protein that interacts specifically with spliced isoforms and naturally occurring mutants of neurofibromatosis type 2 (NF2) tumor suppressor protein, also known as schwannomin or merlin. Specific interactions of NF2 with a variety of proteins, including SCHIP1, have been associated experimentally with the PI3-kinase, MAP kinase and small GTPase signaling pathways, which may represent therapeutic targets for inhibiting tumor proliferation. ELF3 (also termed ESX and ESE-1) encodes an ETS-family nuclear transcription factor that is expressed specifically in epithelial cells [42, 43]. Microarray studies have shown that ELF3 is overexpressed in several types of carcinoma and sarcomas with epithelial differentiation [43-45]. Transfection of ELF3 into breast epithelial cell lines results in malignant transformation [46, 47]. ELF3 expression may be involved in feedback regulatory pathways with transforming growth factor beta type II receptor and erbB2 receptor, and thus may be a potential therapeutic target [48-50].
Cytochrome c oxidase subunit 5a (COX5A) and acetyl-CoA acetyltransferase 1 (ACAT1) were verified as biomarkers for chromophobe RCC. The COX5A gene product is localized to the mitochondrion and is critical for oxidative phosphorylation. Proteomic studies have shown that COX5A is expressed differentially in gastric carcinoma. In addition, we have shown that chromophobe RCC overexpresses many mitochondrial proteins and other gene products related to energy pathways, electron transport, and oxidative phosphorylation [3, 6], a molecular signature that may reflect the abundant mitochondria in neoplastic cells of these tumors. Previous research on renal tumors has correlated high content of oxidative phosphorylation complexes with a slow growing, noninvasive phenotype. ACAT1 is an integral membrane protein that localizes to the endoplasmic reticulum. It controls cholesterol ester formation in kidney and other organs. Interferon gamma and STAT1 regulate ACAT1 expression in prostate cancer cells [56, 57], but the regulation of ACAT1 in renal cancer is still unknown.
In summary, we describe several candidate biomarkers for RCC classification, derived from microarray data analyzed with a series of novel biocomputational tools. These biomarkers have potential clinical utility as a multiplex expression panel for classification of RCC with equivocal histology, when combined with H&E morphology and other gene expression or cytogenetic studies. For many tumor types, multiplex biomarker panels are more sensitive and specific than assays for individual markers [10, 20, 58]. In addition, multiplex immunohistology platforms will be important to develop the emerging class of immunoassays based on nanoparticle optical detection tags [59, 60]. Therefore, CA9, SCHIP1 and COX5A could be combined with other immunohistochemical markers for renal tumor classification, such as glutathione S-transferase (GSTA) and adipophilin (ADFP) for clear cell RCC [23, 25]; alpha methylacyl CoA racemase (AMACR) for papillary RCC; and parvalbumin (PVALB), beta defensin-1 (DEFB1), claudin 7 (CLDN7) and claudin 8 (CLDN8) for chromophobe RCC [7, 10]. Multiplex immunohistochemical profiling of RCC is likely to become more important for classification, as the use of diagnostic needle core biopsies grows, and differential therapies for primary and metastatic RCC subtypes continue to be developed [2, 61, 62].
This work was supported by a grant from the National Cancer Institute Centers for Cancer Nanotechnology Excellence Program (U54CA119338).
Disclosure/Conflict of Interest: The authors declare no conflicts of interest related to this work.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.