A total of 47 samples were selected for profiling at the DNA and pre-miRNA level (Table S1). These include the largest number of PEL cell lines profiled to date (n = 14). Four KSHV-negative Burkitt lymphoma (BL) cell lines were included as controls, as they are expected to transcribe miRNAs that are common to B-lineage lymphomas. The pre-miRNA difference between these samples and PEL defines part of the PEL signature. For this study, we added 9 tonsil tissues as a normal tissue control. These serve to determine which pre-miRNAs are highly abundant in B cells, as tonsils consist of over 50% B cells, including germinal center (GC) B cells, which many assume to be the normal precursor of PEL. Two T-cell lymphoma cell lines were included to differentiate T-cell pre-miRNAs.
For the first time, KS primary biopsies were also assessed for pre-miRNA transcription. We collected 9 AIDS-KS skin biopsies from the Americas. The biopsies were collected by individuals with experience in KS clinical trials and are considered representative lesions for the purpose of tumor and response staging. The majority of cells in each biopsy are KSHV-infected endothelial cells. All biopsies were from male subjects with a median age of 44 years (range 30–57). Patients had biopsy-confirmed KS and were on HAART as well as concurrent chemotherapy. Their median CD4 count was 78 cells/microliter (range 7–402); CD4 counts were not available for one subject. All patients had extensive cutaneous KS and, with the exception of one, are alive at present. Tumor samples were obtained within the last three years. Hence, these patients represent the current post-HAART AIDS epidemic.
Two immortalized virus-negative endothelial cell lines were included in the arrays, as well as isogenic controls carrying latent KSHV. A similar model exists in the E1 TIVE and L1 TIVE cell lines. These currently represent the best human cell culture tumor model for KS, as these two cell lines induce KS-like tumors in nude mice with 100% efficiency. Also included is the only known KS-derived cell line, SLK, which has lost the KSHV genome but is tumorigenic in mice.
As a positive control, we used DNA as the input for real-time QPCR with primers directed against the pre-miRNAs, as described previously. We were able to independently verify the KSHV infection status of each sample. Likewise, the EBV miRNA genes were detectable only in the EBV-positive PEL and BL cell lines, but not in KS or any other samples. EBV miRNA genes were not detectable in normal tonsil tissue. The relative copy number for the KSHV miRNA genes was significantly lower in KSHV-carrying endothelial cell lines compared to PEL (see Figure S4). This is consistent with earlier reports that PEL carry more viral plasmids (50–100 copies/cell) than KS (~10 copies/cell) and KSHV-infected endothelial cell cultures. Among the KSHV-infected endothelial cell models, the HMVEC carried the highest KSHV genome number, suggesting that they are most capable of maintaining high levels of the KSHV plasmid. This is consistent with earlier studies showing that not all endothelial cells are equally permissive for KSHV infection, which drives reprogramming towards lymphatic endothelial cells.
Unsupervised clustering reveals the pre-miRNA profile of KS
Our pre-miRNA data set, which included 160 primer pairs representing 145 cellular miRNAs, 9 viral miRNAs, 2 viral mRNAs and 4 cellular RNAs (U6), assayed across 47 samples, consisted of >20,000 individual data points. QPCR measures target abundance on a log2 scale, with higher CT numbers reflecting lower abundance. For this analysis, the average of the triplicate CT values was taken. These were normalized to U6 levels to give dCT. Note that dCT values represent the underlying pre-miRNA levels on a log2 scale, thus facilitating robust clustering. Following normalization, each sample set was Z-standardized to remove variation between samples. The heatmap representation after hierarchical clustering of the full panel of samples shows higher levels of expression in red and lower levels of expression in blue, compared to the median of all data (white). Six distinct groups were identified. These represent the minimal number of non-overlapping clusters based on principal component analysis (PCA) (data not shown). The first two groups represent the pre-miRNAs that are unchanged across all samples: those with low levels of expression (I in blue) and those with high levels of expression across all samples (II in red). The KSHV pre-miRNAs all cluster in group III. Group IV represents the pre-miRNAs that are downregulated in KSHV-positive cells; this group contains 20 miRNAs. Group V represents 11 cellular miRNAs that are highly expressed in immortalized HUVEC and HMVEC cells, both uninfected and KSHV-infected, but not in any of the tumor cell lines and biopsies. They do not appear to be significantly enriched in any of the other endothelial cell types (KS or TIVE). Finally, group VI contains cellular miRNAs that are downregulated in all B-cell lymphomas, including PEL, vis-à-vis tonsil and KS.
Pre-miRNA profiling of all samples.
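The normalization steps described above (averaging triplicate CT values, subtracting the U6 reference to obtain dCT, then Z-standardizing) can be illustrated with a minimal sketch. The CT numbers below are hypothetical, and two details are assumptions that the text does not specify: Z-standardization is applied within one sample across its pre-miRNAs, and the population standard deviation is used.

```python
from statistics import mean, pstdev

def delta_ct(target_ct, u6_ct):
    """dCT = mean target CT minus mean U6 CT (log2 scale; higher = less abundant)."""
    return mean(target_ct) - mean(u6_ct)

def z_standardize(values):
    """Center to mean 0 and scale to unit standard deviation.

    Assumption: population standard deviation, applied within one sample.
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize values with zero variance")
    return [(v - mu) / sd for v in values]

# Hypothetical CT triplicates for a single sample (illustrative numbers only).
sample_ct = {
    "mir-221": [24.1, 24.3, 24.2],
    "mir-140": [30.0, 30.2, 29.8],
    "mir-24-2": [27.5, 27.4, 27.6],
}
u6_ct = [18.0, 18.1, 17.9]

# Step 1: U6-normalized dCT per pre-miRNA.
dct = {name: delta_ct(ct, u6_ct) for name, ct in sample_ct.items()}

# Step 2: Z-standardized dCT within the sample.
z = dict(zip(dct, z_standardize(list(dct.values()))))
```

Because dCT is on a log2 scale, a difference of one dCT unit between two samples corresponds to a two-fold difference in template abundance, assuming ideal amplification efficiency.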
To remove the impact of lineage-specific determinants [B cell (PEL and tonsil) vs. endothelial cell] from the analysis, we analyzed the two KSHV-associated cell types separately. Our analysis of PEL-specific miRNAs was previously published, and analysis of the extended data set confirmed this observation (data not shown). When the endothelial-derived subset of samples was analyzed alone, a clearer picture emerged that highlights similarities and disparities between different stages of endothelial cell transformation. The groups represent the minimal number of non-overlapping clusters based on PCA (data not shown). The first two groups (I and II) represent miRNAs with minimal discernible patterns across all samples, at least at the power of our analysis. Blue indicates low levels while red indicates comparably high levels of miRNAs, vis-à-vis the median of all data in this set. This is not to say that pre-miRNAs within these two clusters did not exhibit any change between sample classes, only that these changes were smaller compared to others and therefore less interesting from a biomarker perspective. For example, mir-222 clusters in group II because it was more highly transcribed in all samples relative to 50% of all other pre-miRNAs. Nevertheless, mir-222 was downregulated in KSHV-infected, tumorigenic samples compared to EC. The pattern of mir-222 parallels that of mir-221, which is expected because of their known co-regulation. However, the range of change was much larger for mir-221, as seen in group IV.
Pre-miRNA profiling of primary KS biopsies and endothelial derived cell cultures.
Group III shows the pre-miRNAs that are upregulated upon KSHV infection of EC and increased even more in KS and in the tumorigenic L1/E1 clones. This group includes the KSHV pre-miRNAs along with the cellular pre-miRNAs let-7a-1, let-7a-2, let-7a-3, mir-7-1, mir-27a, mir-125b-2, mir-140, mir-152, mir-181c, mir-194-2, and mir-220. The detailed transcription pattern for this group is shown as a bar graph for pre-miRNA let-7a-3 (all bar graphs show Z-standardized values of median dCT_U6).
To mark the degree of viral latent transcription, LANA mRNA levels are shown. LANA is transcribed in all KSHV-positive samples but not in the KSHV-negative SLK, HUVEC or HMVEC cell lines. KSHV latent RNA levels correlated positively with increasing tumor-forming capability of the infected cells (p ≤ 10^−13 by ANOVA of linear model). They were undetectable in uninfected cells, lowest in KSHV-infected HUVEC and E1/L1 cells in culture, higher in E1 mouse tumors and KS lesions, and highest in PEL (data not shown). This was mirrored by KSHV pre-mir-K12-2. KSHV pre-miRNA transcription levels correlated with KSHV plasmid copy number (DNA) as measured by real-time QPCR using the same primer sets with DNA as input (data not shown). The positive correlation between the level of viral miRNA and the relative tumorigenicity of the sample class supports a causal role for miRNA in KS tumorigenesis. It suggests that KSHV miRNAs are required to maintain the KS tumor phenotype.
Group IV contains a set of 8 cellular miRNAs that are most highly expressed in KS tumors only, compared to cell lines. These include mir-24-2, mir-30c-2, mir-125a, mir-130a, mir-196, mir-215, mir-218-2, and mir-367. The bar graph of mir-24-2 levels serves as an example of the pre-miRNA expression pattern of this group, for which miRNA levels were highest in KS tumors and significantly lower in other samples, whether KSHV-infected or not. As expected for all primary tumor samples, we observed more heterogeneity in the KS biopsies compared to clonal cell lines. This necessitated the use of 9 independent biopsies, which is a larger number than used in prior KS mRNA array analyses. With this number of biopsies, PCA validated the significance of cluster membership for all pre-miRNAs, including those that group in cluster IV.
Group V comprises a group of 13 cellular pre-miRNAs with highest levels in the E1 and L1 TIVE cell lines. These pre-miRNAs were present at higher levels in E1/L1 cells even compared to KS biopsies. These are mir-17, mir-22, mir-28, mir-32, mir-128b, mir-135b, mir-143, mir-151, mir-181b-2, mir-205, mir-213, mir-216 and mir-372. The bar graph of mir-32 expression is an example of the pre-miRNA expression pattern for this group.
Group VI consists of 13 pre-miRNAs with highest levels in the non-tumorigenic endothelial HUVEC and HMVEC cell lines, whether KSHV-infected or not. These are mir-26b, mir-29a, mir-34b, mir-92-1, mir-93, mir-133a-1, mir-133a-2, mir-193, mir-221, mir-223, mir-301, mir-323 and mir-346. Eleven of these miRNAs were also contained in the HUVEC/HMVEC upregulated cluster from the larger data set. Additionally, mir-34b and mir-92-1 fell into this group upon clustering of only the endothelial cell data. The bar graph of mir-29a expression is an example of the pre-miRNA transcription pattern for this group, with highest levels in both infected and uninfected HUVEC/HMVEC cells and significantly lower levels in all other samples.
Group VII is the inverse of group VI and consists of miRNAs with undetectable levels in the endothelial HUVEC and HMVEC cell lines, whether KSHV-infected or not. This group comprises 11 cellular pre-miRNAs: mir-7-2, mir-9-2, mir-30b, mir-107, mir-135a-2, mir-153-1, mir-153-2, mir-181b-2, mir-197, mir-325 and mir-370. The bar graph of mir-370 expression is an example of the pre-miRNA transcription pattern of this group.
In sum, unsupervised clustering as a discovery tool identified (i) distinct stages of endothelial cell transformation and (ii) specific pre-miRNAs that serve as biomarkers for each of them.
One of the concerns in profiling cell lines in culture is that the transcription signature may reflect a particular proliferation state rather than a general characteristic of the tumor subtype. Proliferation dependence is well documented for mRNA levels in fibroblasts. For several miRNAs, too, proliferation and miRNA transcription rates are linked. To guard against this confounding effect, we only used RNA derived from log-phase cells for our profiling analysis. Nevertheless, to test the hypothesis that some miRNA levels were proliferation-state dependent, we conducted a time course experiment for the E1 and L1 TIVE cell lines (see Figure S1). This revealed a very limited number of pre-miRNAs that were enriched in log-phase cells compared to stationary-phase cells and vice versa. They were at the lower limit of detection, and additional experiments are needed to validate the biological significance of this observation.
miRNAs as endothelial cell tumor stage biomarkers
Unsupervised comparisons represent the first level of large-scale profiling studies. Here, they revealed (i) the existence of multiple distinct steps of endothelial cell transformation and (ii) pre-miRNAs that were selectively transcribed in one or more stages and that therefore serve as biomarkers. The latter were further validated by supervised class prediction methods. Based upon pre-miRNA clustering and published phenotype, we defined the following classes: endothelial cells (E); KSHV-infected endothelial cells (EK); endothelial cells that have the ability to form tumors in nude mice (ET), which include the KSHV-positive TIVE E1 and L1 cell lines as well as the KSHV-negative SLK cells; xenograft tumors of TIVE E1 cells, consisting of 5 independent samples (ETM); KS patient biopsies (KS); PEL (P); and, as negative controls, tonsil (TN) and non-KSHV-associated lymphomas (TM).
First, we conducted pair-wise comparisons between classes using the median dCT_U6 for each class (Figure S2). The two TERT-immortalized EC cell lines HUVEC and HMVEC exhibited a nearly identical pre-miRNA transcription pattern (r² = 0.7238). Infection of these immortalized cell lines with KSHV did result in changes (r² = 0.6798). Of note, this comparison is between median levels for the two EC cell lines (HUVEC and HMVEC) and three independent clones of tightly latently infected TERT-HUVEC cells. Thus, it exhibited more variability than a pair-wise comparison of just two cell lines. The most drastic change in overall pre-miRNA transcription emerged when comparing KSHV-infected, non-tumorigenic EC cell lines to the two KSHV-infected, highly tumorigenic E1/L1 cell lines. Here, we failed to detect any linear correlation. The two TIVE cell lines E1 and L1, of course, exhibited a strikingly similar pattern of pre-miRNA transcription, as shown in detail in Figure S4 and Figure S1. The pair-wise comparison of E1/L1 cells in culture with E1 xenograft tumors showed a reasonable linear correlation, but a weaker one than between different culture models (r² = 0.5684). Analysis of residuals identified all KSHV pre-miRNAs as well as mir-223 as significantly upregulated in the tumor graft (data not shown). Since there are no human infiltrating lymphocytes in the SCID mouse model, and since the tumor vasculature is made of murine endothelial cells, any changes in pre-miRNA composition reflect the grafted human tumor cells. Importantly, the comparison between E1 xenograft tumor biopsies and patient KS biopsies yielded a better correlation (r² = 0.5846) than that between E1/L1 cells in culture and E1 tumor grafts. This reinforces the results of the phenotypic characterization of E1/L1 cells and demonstrates that the E1/L1 xenograft model adequately mimics primary KS patient biopsies.
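A pair-wise class comparison of this kind can be sketched as follows: take the median dCT of each pre-miRNA within a class, then compute the squared Pearson correlation (r²) between the two vectors of class medians. The data layout (one row per sample, one column per pre-miRNA), the function names and the numbers are hypothetical; this is an illustration rather than the analysis pipeline used for Figure S2.

```python
from statistics import mean, median

def class_median(dct_by_sample):
    """Median dCT per pre-miRNA across the samples (rows) of one class."""
    return [median(column) for column in zip(*dct_by_sample)]

def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return (sxy * sxy) / (sxx * syy)

# Hypothetical dCT values: rows are samples, columns are four pre-miRNAs.
class_e = [
    [6.0, 12.0, 9.0, 4.0],
    [6.4, 11.6, 9.2, 4.2],
]
class_ek = [
    [7.0, 11.0, 9.5, 5.0],
    [7.2, 11.4, 9.1, 5.2],
    [6.8, 11.2, 9.3, 4.8],
]

r2 = r_squared(class_median(class_e), class_median(class_ek))
```

An r² near 1 indicates that the two classes rank and scale their pre-miRNAs similarly; pre-miRNAs that deviate from the fitted line are candidates for the kind of residual analysis mentioned above.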
Next, we identified and validated a set of diagnostic pre-miRNA biomarkers that signify the different steps of endothelial cell transformation. To do so, we used the miRNAs identified by hierarchical clustering, extended the dataset to include mouse xenograft tumor samples, and used visual inspection followed by ANOVA and appropriate pair-wise t-tests to identify pre-miRNAs with distinct distributions among the different steps of endothelial cell transformation. To give a better impression of within-class variability, individual dCT_U6 values for cellular pre-miRNAs, including technical replicates, are plotted for each class.
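As an illustration of the ANOVA step, the sketch below computes a one-way ANOVA F statistic for one pre-miRNA across several sample classes. The grouping and the dCT values are hypothetical, and the sketch stops at the F statistic: converting it to a p-value requires the F distribution, which is not part of the Python standard library.

```python
from statistics import mean

def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical dCT values of one pre-miRNA in three sample classes.
dct_by_class = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat = one_way_anova_f(dct_by_class)
```

A large F statistic indicates that class means differ by more than the within-class scatter would explain, which is the condition a pre-miRNA must meet before pair-wise t-tests between specific classes are informative.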
Validation of 4 individual biomarkers for endothelial cell transformation.
The mir-221 pre-miRNA emerged as a biomarker for the transition from immortalized to tumorigenic endothelial cells, independent of KSHV infection status. Mir-222 was co-regulated with mir-221, but did not change as dramatically (data not shown). Since mir-221/222 exhibit tumor suppressor activity in endothelial and other cancer models, this suggests that the down-regulation of the mir-221 biomarker is of biological significance.
Model of KSHV-dependent progressive transformation of endothelial cells as identified by pre-miRNA clustering.
The mir-15 pre-miRNA is an example of miRNAs that exhibit the opposite pattern of transcription to mir-221. Therefore it did contribute additional information that would have improved tumor classification. It was high in tumorigenic KSHV-infected endothelial cells, KS and PEL (data not shown). There was one significant difference between mir-15 and mir-221 expression: the KSHV-negative SLK cells transcribed significantly lower levels of mir-15. In a separate analysis of only the endothelial/KS samples, excluding SLK cells (data not shown), mir-15 levels correlated closely with KSHV latent mRNA and miRNA transcription, and mir-15 can thus be considered KSHV-regulated.
The mir-140 pre-miRNA levels correlated linearly with tumor status. It was present at appreciable levels only in the xenograft tumors and KS biopsies, but not in KSHV-infected cells grown in culture (classes ETM and KS). Pre-mir-140 levels did not distinguish tonsil and PEL, since 50% of PEL lines as well as all KSHV-negative lymphoma lines had only very low levels of mir-140. Hence, the utility of mir-140 as a biomarker is limited to endothelial lineage cancers and does not extend to lymphatic lineage cancers.
The mir-24-2 pre-miRNA levels were strikingly elevated only in KS biopsies, not in E1 xenograft tumors or PEL. It therefore serves as a KS-specific biomarker and not as a marker for KSHV-associated transformation. This may have utility for clinical diagnosis, but more importantly it represents at least one molecular difference between clinical KS lesions and all available tissue culture models. In other words, any KS-specific, mir-24-2-dependent reprogramming of target mRNA and protein levels is not captured in our current, laboratory-based understanding of KS and KSHV biology.
To establish the utility of these four biomarkers for endothelial cell tumorigenesis, we calculated cumulative distribution functions (cdfs) and a decision tree. Pre-mir-221 and pre-mir-24-2 showed steep cdfs, which allowed for binary classification into positive and negative classes. Pre-mir-140 showed an almost linear cdf, consistent with gradual changes among multiple sample classes. This is reflected in the minimal decision tree, which computes cut-off values for each miRNA to yield the most parsimonious and accurate classification schema. Similar decision trees could be derived using other representative miRNAs from each of the clusters identified by hierarchical clustering. We also built decision trees based on just viral pre-miRNA levels (data not shown). These were comparable to ANOVA for individual pre-miRNAs, since KSHV genome copy number (Figure S3), latent RNA levels and latent pre-miRNA levels were all correlated: they clustered together by unsupervised clustering and increased progressively with increasing tumorigenicity.
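The link between a steep cdf and a clean binary cut-off can be illustrated with a minimal sketch. It computes an empirical cdf and then searches for the single cut-off that best separates two classes, which corresponds to one split of a decision tree. This is a simplification under stated assumptions: the decision tree described above handles multiple classes and several miRNAs, and the dCT values, labels and function names below are hypothetical.

```python
from bisect import bisect_right

def ecdf(values):
    """Empirical cdf: (value, fraction of observations <= value) per distinct value."""
    xs = sorted(values)
    return [(x, bisect_right(xs, x) / len(xs)) for x in sorted(set(xs))]

def best_cutoff(values, labels):
    """Best single cut-off between two classes (one decision-tree split).

    Candidate cut-offs are midpoints between adjacent distinct values.
    Accuracy is scored for both orientations (above = positive or above =
    negative) and the better one is kept. Returns (cutoff, accuracy).
    """
    xs = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(xs, xs[1:]):
        cut = (lo + hi) / 2
        correct = sum((v > cut) == lab for v, lab in zip(values, labels))
        accuracy = max(correct, len(values) - correct) / len(values)
        if accuracy > best[1]:
            best = (cut, accuracy)
    return best

# Hypothetical dCT values for one pre-miRNA; True marks tumorigenic samples.
dct_values = [5.0, 5.5, 6.0, 9.0, 9.5, 10.0]
is_tumorigenic = [False, False, False, True, True, True]

cdf_points = ecdf(dct_values)
cutoff, accuracy = best_cutoff(dct_values, is_tumorigenic)
```

When the two classes occupy separate ranges, the cdf jumps across the gap between them and any cut-off inside that gap classifies perfectly; a near-linear cdf offers no such gap, consistent with a marker that changes gradually across classes.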
In sum, supervised classification established (i) the presence of molecularly distinct, progressive steps of endothelial cell transformation and (ii) a set of biomarkers that distinguishes between these steps.