

HHS Public Access. Author manuscript; accepted for publication in a peer-reviewed journal.
Cytometry A. Author manuscript; available in PMC 2010 July 1.
Published in final edited form as:
PMCID: PMC2763559

Automated Quantification of DNA Demethylation Effects in Cells via 3D Mapping of Nuclear Signatures and Population Homogeneity Assessment



Today’s advanced microscopic imaging applies to the preclinical stages of drug discovery that employ high-throughput and high-content three-dimensional (3D) analysis of cells to more efficiently screen candidate compounds. Drug efficacy can be assessed by measuring response homogeneity to treatment within a cell population. In this study topologically quantified nuclear patterns of methylated cytosine and global nuclear DNA are utilized as signatures of cellular response to the treatment of cultured cells with the demethylating anti-cancer agents: 5-azacytidine (5-AZA) and octreotide (OCT).


Mouse pituitary folliculostellate TtT-GF cells treated with 5-AZA and OCT for 48 hours, and untreated populations, were studied by immunofluorescence with a specific antibody against 5-methylcytosine (MeC), and 4′,6-diamidino-2-phenylindole (DAPI) for delineation of methylated sites and global DNA in nuclei (n=163). Cell images were processed with automated 3D analysis software that we developed by combining seeded watershed segmentation, to extract nuclear shells, with measurements of Kullback-Leibler’s (K-L) divergence, to analyze cell population homogeneity in the relative nuclear distribution patterns of MeC versus DAPI-stained sites. Each cell was assigned to one of four classes: similar, likely similar, unlikely similar and dissimilar.


Evaluation of the different cell groups revealed a significantly higher proportion of cells with similar or likely similar MeC/DAPI patterns among untreated cells (~100%) and 5-AZA-treated cells (90%) than in the OCT-treated population (64%). The latter group also contained unlikely similar (28%) and dissimilar (7%) cells.


Our approach was successful in the assessment of cellular behavior relevant to the biological impact of the applied drugs, i.e. the reorganization of MeC/DAPI distribution by demethylation. In a comparison with other metrics, K-L divergence has proven to be a more valuable and robust tool for categorization of individual cells within a population, with potential applications in epigenetic drug screening.

KEY TERMS: Cytomics, DNA methylation, dissimilarity, cell population homogeneity, Kullback-Leibler’s divergence, 3D nuclear mapping, watershed, image cytometry, epigenetic drug screening, high-content analysis


Topological analysis of the distribution of proteinaceous and nucleic acid components of the cell, in particular mammalian cell nuclei, is helpful in understanding cellular functions in the state of health versus disease [1–10]. Correlations between the distribution of cellular proteins and/or fractions of nuclear DNA and certain diseases have allowed mammalian cells to be utilized as useful models in the search for appropriate disease treatment, in the context of systems biology [11,12]. With the availability of today’s more advanced imaging approaches (including confocal laser scanning microscopy, two-photon excitation microscopy, high content cell imaging, and automated tissue scanning), high resolution optical imaging has evolved into an essential tool for moving new chemical entities through the pharmaceutical discovery pipeline utilizing cell-based assays. Imaging advantages for drug discovery are realized through the ability of high-resolution microscopic imaging to measure the spatial and temporal distribution of molecules and cellular components, which is vital to understanding the activity of drug targets at the cellular level. Thus, microscopic imaging applies to the preclinical stages of drug discovery for exploratory studies, target identification and validation, lead generation and optimization, and biomarker discovery [13]. Drug efficiency can be measured by the uniformity of cellular response upon drug application, focusing on what percentage of cells in a population has reacted to the applied drug. More interestingly, compound effects can be evaluated by imaging changes in the distribution patterns of the relevant proteins and/or nucleic acid loci that function as drug targets. This new cytomic approach [1,2] is gaining momentum by decreasing attrition in the very costly process of drug development.

Epigenetic changes, such as DNA methylation and histone modification, play a key role in cellular differentiation [14–16]. Aberrant global methylation patterns are associated with several cancer types. Methylation pattern imbalances in cancer cells include genome-wide hypomethylation and localized aberrant hypermethylation of CpG dinucleotides (CpG islands) in promoter regions of tumor suppressor genes [17,18]. The reversible nature of epigenetic aberrations constitutes an attractive therapeutic target, and epigenetic cancer therapy with demethylating agents has already been shown to be promising [19]. Demethylating agents cause structural reorganization of the genome in cell nuclei, as they not only alter the DNA methylation load but also influence its spatial distribution [20,21]. Therefore, in a previous image-based cytometrical approach, we delineated methylcytosine (MeC) and overall DNA in AtT20 mouse pituitary tumor cells by means of immunofluorescence, and revealed significant differences in the patterns of MeC and DAPI-derived signals between untreated cells and a subpopulation of these cells treated with 5-azacytidine (5-AZA) [22], a demethylating agent that has been reported to change methylation patterns on a genomic scale [23]. Image-based assessment of DNA methylation patterns may thus provide a powerful technique for characterizing mammalian cells during differentiation and their status of health versus disease, as the underlying molecular processes involve large-scale chromatin reorganization, which is visible by light microscopy [24–29].

Today’s advanced cellular imaging systems can produce multispectral two-dimensional (2D) and 3D data in quantities that often require machine vision support to assess and quantify the degree of individual cell similarity within an entire cell population based on cellular features. Topological analyses typically necessitate the segmentation of cellular regions of interest (ROI), including the entire cell and/or subcellular compartments such as the nuclei. This process involves the delineation of the ROI, recognition of residing patterns, and statistical quantification of these patterns with dedicated algorithms. So far, nuclear features have been analyzed in one of the following three ways: (i) comparing a known or unknown pattern with a reference pattern using statistical tests; (ii) classification of patterns through supervised learning, utilizing decision trees, support vector machines and neural networks; or (iii) clustering, in which the distance between points in feature space is used as a discriminating factor [30]. The features are measurements reflecting complete cellular or just nuclear morphology, fluorescence intensity and texture. For example Strovas et al. normalized the intensity of a variant of green fluorescent protein (GFPuv) from methylotrophy promoter (PmxaF) of single cells to their size, in Methylobacterium extorquens AM1 culture. This served as a descriptor of cell-to-cell heterogeneity in growth rate and gene expression in response to antibiotics [31]. Knowles et al. measured protein distribution through radial bright features within nuclei to identify changes in tissue phenotype [32]. Lin et al. employed linear discriminant analysis with nuclear models that were constructed from user-provided training examples to distinguish different cell types [33]. 
Markovian and fractal features [34], Zernike moments, co-occurrence matrices [35] and features generated by Gabor transformation have been commonly used in recognizing subcellular structures [36]. Yet, the sensitivity of texture features depends strongly on the optical system setup, such as focusing, image magnification and object positioning. In the description of cellular structures, the textural, morphological and intensity features are usually complementary.

The use of features in the quantitative description of 3D nuclear architecture is employed in many biological and medical applications, ranging from in situ studies of DNA, protein localization and migration in living cells, and exploration of the structural aspects of cell division, to investigations of the role of nuclear alterations in pathology [6–10,37,38]. These approaches mostly consider the statistical distribution of one target, a protein or DNA fragment (single gene copy or genomic region) to be analyzed. In those cases, a reference pattern detected under specific conditions is usually defined and compared to protein/DNA distribution patterns that result from changes in culture conditions. However, image-based cytometry, which readily considers two or more parameters at the same time, would largely benefit from algorithms that can statistically assess patterns of multiple cellular targets. This is especially valuable in the discovery of pathways that can be targeted in drug discovery. Here we report the development and application of a novel comparison-based approach that provides a statistical measurement of two classes of DNA, MeC and DAPI-positive global DNA, as nuclear targets. The algorithm compares the relative distribution of signals derived from these two targets (from two color channels), projects them onto scatter plots, and then measures the degree of similarity between the plotted signal distributions of cells within a population [22]. This method offers a way to evaluate cellular response to external factors such as drugs and changes in culture conditions via a dissimilarity assessment of relevant cellular structures.

Similarity between two data objects is perceived through measurement of the objects’ proximity in a multi-dimensional space, and is used to express the objects’ relationships within a cluster or between clusters obtained through a partitioning process. Distance and similarity measurements between objects forming a cluster have been defined as equivalent notions [39]; however, appropriate metrics are required in order to identify objects with similar or dissimilar profiles. Commonly applied similarity measures can be organized into three groups according to object representation: (1) point-based, including Euclidean and Minkowski distances; (2) set-based, including Jaccard’s, Tanimoto’s, and Dice’s [40] indices; and (3) probabilistic, with Bhattacharyya [41], Kullback-Leibler’s, and correlation-based Mahalanobis [42] distances. In many practical applications the objects are described by discrete features, by which the similarity is assessed [39]. Furthermore, sample homogeneity as a cluster quality measure can be perceived as an averaged pairwise object similarity [36, 39].
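To make the three groups of measures concrete, the following minimal Python sketch gives one textbook representative of each group. The function names are illustrative and are not taken from the software described in this work; how each measure is applied to nuclear patterns is not specified here and is left open.

```python
import math


def euclidean_distance(a, b):
    """Point-based: straight-line distance between two feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dice_index(set_a, set_b):
    """Set-based: 2|A and B| / (|A| + |B|), in [0, 1]; 1 means identical sets."""
    if not set_a and not set_b:
        return 1.0  # assumption: two empty sets are treated as identical
    return 2.0 * len(set_a & set_b) / (len(set_a) + len(set_b))


def bhattacharyya_distance(p, q):
    """Probabilistic: -ln(sum_i sqrt(p_i * q_i)) for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of states")
    coefficient = sum(math.sqrt(p_i * q_i) for p_i, q_i in zip(p, q))
    if coefficient == 0.0:
        return math.inf  # no overlap between the two distributions
    return -math.log(coefficient)
```

Note that Dice's index is a similarity (larger means more alike), whereas the other two are distances (larger means less alike), so they cannot be compared on a common scale without a conversion.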

We utilized the Kullback-Leibler’s measure and its properties in our study. The background of this approach is introduced here. Let us consider a discrete random variable X with probability distribution $p = \{p_i\}$, where $p_i$ is the probability for the system to be in the i-th state. The measure $\log(1/p_i)$ is called the unexpectedness or surprise [43]. Two extreme states can occur: if $p_i = 1$, then the event is certain to happen, and if $p_i \approx 0$, then the event is nearly impossible. Now, consider two discrete distributions $p = \{p_i\}$ and $q = \{q_i\}$, where $p_i$ and $q_i$ are the probabilities of occurrence of the i-th state in a set of system states. The difference $\log(1/q_i) - \log(1/p_i)$ defines the change of unexpectedness of the probability p with respect to the probability q. Averaging this change of unexpectedness over $p_i$ leads to:

$$KL(p \| q) = \sum_i p_i \log\frac{p_i}{q_i} = \sum_i p_i \log p_i - \sum_i p_i \log q_i = H(p) + K(p, q) \qquad (1)$$

where: $H(p) = \sum_i p_i \log p_i$ is the negative of Shannon’s entropy [44] and $K(p, q) = -\sum_i p_i \log q_i$ is the measure of information referred to as inaccuracy [45]. $KL(p \| q)$ is nonnegative and delimited by the following constraints: $\lim_{p_i \to 0,\; q_i \neq 0} p_i \log(p_i/q_i) = 0$ and $\lim_{q_i \to 0,\; p_i \neq 0} p_i \log(p_i/q_i) = \infty$.

The function $KL(p \| q)$ is known as the Kullback-Leibler’s divergence [46] of information linked to two probability distributions p and q. It is also a measure of how different two probability distributions (over the same space of system states) are. Typically, $p_i$ represents data, observations, or a precisely calculated probability distribution, and $q_i$ represents an “arbitrary” distribution, a model, a description or an approximation of $p_i$. Following [46], it is assumed that: (i) $0 \log(0/q_i) = 0$; and (ii) terms in Eq. (1) where the denominator is zero are treated as undefined and are neglected in order to provide absolute continuity of $p_i$ with $q_i$.
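The definition and the two conventions (i) and (ii) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions rather than the authors' MATLAB implementation: the function name `kl_divergence` is hypothetical, the natural logarithm is assumed, and both inputs are assumed to be already normalized to sum to one.

```python
import math


def kl_divergence(p, q):
    """Discrete K-L divergence KL(p || q) = sum_i p_i * log(p_i / q_i), in nats.

    p is the reference (data) distribution and q the model or approximation.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of states")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # convention (i): 0 * log(0 / q_i) = 0
        if q_i == 0.0:
            continue  # convention (ii): zero-denominator terms are neglected
        total += p_i * math.log(p_i / q_i)
    return total
```

For example, `kl_divergence([0.5, 0.5], [0.25, 0.75])` evaluates to 0.5·ln(4/3) ≈ 0.144, while identical distributions give zero. Because convention (ii) drops terms, the returned value is only guaranteed to be nonnegative when q is nonzero wherever p is nonzero.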

The Kullback-Leibler’s divergence can be used to measure the distance between various kinds of distributions [47]. For instance, it has been employed in medical and systems biology applications including registration of image datasets [48], image segmentation [49], temporal analysis of gene expression [50], clustering of gene expression data [51] and similarity analysis of DNA sequences [52]. The objects’ homogeneity assessment is then performed in two steps. First, distance-based similarity is measured between the combined 2D MeC/DAPI histograms of all nuclei and the histogram of each individual cell nucleus. Second, each nucleus (object) in the population is assigned into one of the predefined categories based on similarities.

Assessment of cell population homogeneity is not a trivial task, as it is constrained by the imaging modalities and the cell type itself. In a typical setting, the evaluation of cellular response to external factors such as drugs can be achieved by comparing the treated population to an untreated (reference) population. In this work, however, we present a method to assess each population by itself, in isolation. These populations were analyzed a posteriori (i.e., without prior knowledge of relevant structural information). Nevertheless, our approach also allows for a global assessment of cellular patterns among populations.

In 3D image analysis of nuclei, the segmentation of the nucleus and the quantification of residing features are the most vital components. A common scheme in existing approaches is the watershed algorithm followed by extraction of pertinent features [53–58]. These solutions require the extraction of tens of features for clustering or classifier training before a pattern recognition task can be applied. Hence, an algorithm utilized for feature extraction and pattern recognition may be restricted by the morphology of a specimen, in which some features are redundant while others are irrelevant. Although some methods for cellular detection and segmentation have been proposed, a general-purpose system that can perform analysis and recognition tasks for a variety of confocal microscope images without necessitating an approach modification or system training (related to the target-specific applications) is still not available.

The main aim of this work is to develop a software system that can be robustly applied to the topological analysis of nuclear targets, such as MeC and DAPI, and that provides useful parameters for the elucidation of epigenetic mechanisms as well as the evaluation of epigenetic drugs tested in cultured cell models. The algorithm developed combines three major tasks: (1) automated segmentation of nuclei in a cell population, (2) subsequent nuclear pattern extraction, and (3) distance-based statistical measurement of cell dissimilarity using Kullback-Leibler (K-L) divergence. This method draws on the strength of statistical evaluation of intra-nuclear MeC/DAPI patterns, which is especially valuable when cell population homogeneity is difficult to assess because of sample size and the lack of a standardized reference. In this study, we evaluate the potential of using an unsupervised 3D seeded watershed algorithm coupled with K-L divergence measurement to calculate the dissimilarity of mouse pituitary folliculostellate TtT-GF cell response to treatment with the demethylating agents 5-AZA and OCT. This response was quantitatively measured and displayed as the differential co-distribution of MeC/global DNA signals in treated and untreated cells. A comparison of K-L divergence with other commonly used similarity metrics demonstrates the superior performance of our method.


Cell Culture

TtT-GF cells (ATCC) were grown in low glucose Dulbecco's modified Eagle's medium (Invitrogen) supplemented with 10% fetal bovine serum, 2 mM glutamine and 1% antibiotic/antimycotic (100 units/ml penicillin G sodium, 100 µg/ml streptomycin sulfate) (Invitrogen), in 6% CO₂ at 37 °C as described by Ben-Shlomo et al. [59]. Cells were plated at 1×10⁵ cells onto coverslips in multi-well plates, and allowed to attach for 24 hours. Then, cells were divided into two groups: (i) two control populations that were left untreated for 48 hours (NT-TtT-1, NT-TtT-2), and (ii) two treated populations: AZA-TtT cells treated with 1 µM 5-azacytidine (Sigma-Aldrich) and OCT-TtT cells treated with 100 nM octreotide (Sigma-Aldrich), both for 48 hours.

Immunofluorescence and Imaging

In order to preserve the three-dimensional structure, cells cultured on coverslips in 12-well microplates were fixed with 4% paraformaldehyde/phosphate buffered saline (PBS) (Sigma-Aldrich) and permeabilized as previously described [60,61]. Subsequently, cellular RNA was removed with RNase A (Novagen), particularly because transfer RNA (tRNA) contains methylated cytosine, as previously described [22]. Cells were depurinated with 2N HCl and blocked with 2% BSA/PBS prior to application of antibodies: a monoclonal mouse 5-MeC antibody (EMD Biosciences) followed by a secondary Alexa 488-linked goat anti-mouse polyclonal IgG (Invitrogen). The specimens were counterstained with DAPI, and 3D imaging was performed using a confocal laser scanning microscope TCS SP2 (Leica Microsystems Inc.) equipped with a multi-line argon laser (458 nm, 488 nm, 514 nm) for Alexa 488 (MeC), and a 405 nm diode laser line for excitation of DAPI fluorescence. Serial optical 2D sections were collected at increments of 200–300 nm with a Plan-Apo 63× 1.4 oil immersion lens; pinhole size was 1.0 airy unit. To avoid bleed-through, the channels were acquired sequentially. The typical image size was 1024 × 1024, with a respective voxel size of 116 nm × 116 nm × 230.5 nm (x, y, and z axes), and resolution was 8 bits per pixel in all channels. Example images of NT-TtT cells are presented in Figure 1. Fluorescence intensities of the 5-MeC and DAPI signals from the optical sections, $I_{MeC}$ and $I_{DAPI}$, were recorded into separate 3D channels.

Figure 1
A maximum intensity projection of 3D confocal microscopy images of NT-TtT-1 cells: (A) cell nuclei with patterns of DAPI-staining (blue channel), (B) and MeC-staining (green channel), (C) merged projection, scale bar at the right left corner is 0.24 µm. ...

Image Analysis

Image analysis was performed in three main steps (see Figure 2): (1) 3D image segmentation resulting in the delineation of a 3D shell for each individual nucleus; (2) extraction of MeC and DAPI signal intensity distribution within each 3D shell; and (3) dissimilarity assessment of MeC and DAPI signal distribution patterns between each individual nucleus and a reference pattern derived from the entire cell population (Fig. 2). This workflow was designed based on the images taken from the NT-TtT-1 and the following assumptions: the background in each image stack was considered to be quasi-uniform, meaning that there are very small to zero low frequency fluctuations or trends in the background through a single image plane or across the depth of the image stack. Moreover, all images in each stack are assumed to be acquired under nearly identical conditions and modality settings, and so the drift of the settings during acquisition can be considered minimal and thus neglected. In order to reduce computational complexity during the segmentation phase, the image resolution was decreased by a factor of four for this step only in the x and y directions. The developed methodology was subsequently applied to all image stacks.

Figure 2
A three-step flowchart of the image analysis methodology.

STEP 1: 3D Segmentation of Nuclei

The $I_{DAPI}$ and $I_{MeC}$ image stacks were combined in the following way: $I(x,y,z) = \max(I_{DAPI}(x,y,z), I_{MeC}(x,y,z))$; thus, the intensity of the output image $I$ is always the maximum of the intensities in the corresponding channels at voxel position (x,y,z) (Fig. 3A). To separate the nuclei from the background, a histogram of image $I$ was constructed. We applied the technique described in [62], yielding the threshold value $T_b$ that splits the histogram into two parts: a main peak representing the background, and a histogram tail reflecting intensities of the nuclear content. A binary image was obtained in which background pixels and nuclear content were converted to the values 0 and 1, respectively. This image was then subjected to enhancement by means of 3D morphological operations (closing and filling holes), yielding a refined binary image $I_b$ (Fig. 3B). We note that in $I_b$ the majority of nuclei were distinct. However, some nuclei touched, or nearly touched, one another, forming larger clusters. These two groups of objects were processed separately to better delineate all nuclei.
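The channel combination and binarization can be sketched with NumPy as follows. This is a minimal illustration only: the threshold `t_b` is taken as a given input because the selection technique of [62] is not reproduced here, the strict comparison `>` is an assumption, and the subsequent morphological closing and hole filling are omitted. The function name is hypothetical.

```python
import numpy as np


def combine_and_threshold(i_dapi, i_mec, t_b):
    """Return the combined stack I = max(I_DAPI, I_MeC) and its 0/1 mask.

    i_dapi, i_mec: intensity stacks of identical shape (any axis order).
    t_b: background threshold, assumed to be supplied by the caller.
    """
    i_dapi = np.asarray(i_dapi)
    i_mec = np.asarray(i_mec)
    if i_dapi.shape != i_mec.shape:
        raise ValueError("both channels must have the same shape")
    combined = np.maximum(i_dapi, i_mec)  # voxel-wise maximum of the two channels
    # Assumption: voxels strictly brighter than t_b count as nuclear content.
    binary = (combined > t_b).astype(np.uint8)
    return combined, binary
```

Taking the voxel-wise maximum keeps a voxel in the foreground if either stain is bright there, which is why the combined image is used for delineating the nuclei rather than either channel alone.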

Figure 3
3D image segmentation workflow, demonstrated with NT-TtT-1 cells: (A) combined MeC-DAPI channels, (B) binary image resulting from thresholding image in Figure 3A at the level of Tb; (C) Distinguished groups of binary objects obtained by mean volume thresholding ...

Reducing the original resolution of images $I_b$ and $I$ by a factor of four creates two down-sampled images, $I'_b$ and $I'$, respectively. Labeling and counting of the binary objects in $I'_b$ was carried out according to Haralick et al. [34], and the volume of each object was determined. The mean volume value, $T_{vol}$, served as the criterion to split the image $I'_b$ into two binary masks, one with small components, $I'_{bs}$, and one with large components, $I'_{bl}$ (Fig. 3C). Then, all voxels of image $I'$ under the mask $I'_b$ were replaced by the constant value $T_b$, creating an image $I'_m$ that models the nuclei (Fig. 3D). Such an approach is useful for object segmentation because the model comprises only image intensities equal to or lower than the automatically defined threshold $T_b$. This model is used to create 3D seeds that define the location of each nucleus, and also serves as the input for the seeded watershed segmentation technique.

Next, image $I'_m$ was subjected to smoothing by two anisotropic Gaussian filters, $G_s$ and $G_l$, for small and large binary components, respectively. The infinite Gaussian kernel is approximated, and its size is defined by $N_x$, $N_y$ and $N_z$, representing the mask size in each direction. The smoothing effect in 3D is controlled by three parameters, $G(\sigma_x, \sigma_y, \sigma_z)$. To ensure that smoothing produces a signal strong enough to detect a seed, the approximated filter kernels were adaptively adjusted to the relative volume of the binary objects in $I'_{bl}$ and $I'_{bs}$, respectively. The kernel size is adjusted first. We chose a spherical model for cell nuclei, and allocated three kernels for each of the x, y and z axes of a sphere. This approach provides a predefined number of filter kernels that fit the hypothetical nuclear size, in our case seven (n = 7). Since the image voxels in our data stacks are not isotropic, $N_z$ can be almost twice as large as $N_x = N_y$, and the filter size can therefore be calculated from $T_{vol} \geq n \cdot N_x N_y N_z$. Substituting $N_z = 2N_x$ and $N_y = N_x$ gives $N_x \leq \sqrt[3]{T_{vol}/(2n)}$, and the filter size $N_x$ is taken as the largest odd number satisfying this inequality. Thus, the mean volumes of binary objects in $I'_{bs}$ and $I'_{bl}$ can be used to calculate the filter size $N_x$ for kernels $G_s$ and $G_l$. Second, the remaining filter coefficients $\sigma_x$, $\sigma_y$ and $\sigma_z$ were empirically set to one half of the mask size in each direction. In general, the sizes of the $G_s$ and $G_l$ kernels, and hence the corresponding filter coefficients, increase with the mean volume of the binary objects under the respective masks. Also, the size of $G_l$ is never smaller than that of $G_s$.
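The kernel sizing rule can be written out as a short Python sketch. It assumes the inequality reads 2·n·N_x³ ≤ T_vol after substituting N_y = N_x and N_z = 2N_x, and it falls back to N_x = 1 for very small objects, which is an assumption not stated in the text. The function names are hypothetical.

```python
def kernel_size_x(mean_volume, n_kernels=7):
    """Largest odd N_x satisfying 2 * n * N_x**3 <= mean_volume.

    Falls back to 1 when even N_x = 3 is too large (assumed behaviour).
    """
    n_x = 1
    # Step through odd sizes while the next odd size still fits.
    while 2 * n_kernels * (n_x + 2) ** 3 <= mean_volume:
        n_x += 2
    return n_x


def kernel_parameters(mean_volume, n_kernels=7):
    """Return ((N_x, N_y, N_z), (sigma_x, sigma_y, sigma_z)) for one kernel."""
    n_x = kernel_size_x(mean_volume, n_kernels)
    size = (n_x, n_x, 2 * n_x)            # N_y = N_x, N_z = 2 * N_x
    sigma = tuple(s / 2.0 for s in size)  # half of the mask size per direction
    return size, sigma
```

For a mean object volume of 14000 voxels and n = 7, the bound is N_x ≤ 10, so the largest admissible odd size is 9, giving a mask of 9 × 9 × 18 and coefficients of 4.5, 4.5 and 9.0.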

The image I' m is separately smoothed once (by each kernel) to obtain the images I' ms and I' ml . The larger the kernel is, the smoother the created surface of the ROI (nucleus) will be. After filtering, the results were combined into one output binary image according to:


where trh denotes a threshold function expressed as:

$$trh(Q(x,y,z), T) = \begin{cases} 1 & \text{if } Q(x,y,z) \geq T \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

and where $Q(x,y,z)$ is an image, $T$ is the threshold, $I_f$ is the output image, $I'_{ms}$ and $I'_{ml}$ are the smoothed components, ⊗ denotes element-by-element multiplication and ∪ is the matrix logical union.

The smoothing procedure produces slowly varying intensity fields in $I'_{ms}$ and $I'_{ml}$, with maxima and local plateaus resembling blobs in 3D space that are located inside the nuclei, with intensities oscillating around $T_b$. The location and size of the maxima depend on the smoothing kernel and the nucleus size. Thresholding the smoothed image at the level of $T_b$ yields binary seeds in $I_f$, with one seed per nucleus (Fig. 3E). Small seeds were eliminated and converted to background.

The watershed algorithm [63] in its original form has several well-known limitations: it typically over-segments the image and does not take into account image-inherited cues such as intensity gradients, topology and content of segmented objects. Thus, the seeds serve as a priori knowledge about the segmented structures and form numerous points for algorithm initialization. Such an approach has the potential to generate a number of unique regions that closely matches the number of seeds. In this study we extend the existing implementation of the 2D seeded watershed method [64,65] to obtain 3D nuclear shells (Fig. 3F). During this segmentation each nucleus receives a label for further identification and visualization. Then, the segmented image $I'_s$ was up-sampled by a factor of four with the nearest neighbor interpolation technique, resulting in the image $I_s$ that contains the 3D nuclear shells. This image can also be superimposed onto $I_{DAPI}$ or $I_{MeC}$ and displayed, as shown later in Figure 4.

Figure 4
Example of cell populations with selected nuclear MeC/DAPI co-distribution patterns: NT-TtT-1 having an overall low number of dissimilar cells (A), OCT-TtT with a high number of dissimilar cells (G), and AZA-TtT with large majority of similar cells (M). ...

STEP 2: Extraction of MeC and DAPI patterns

A powerful aspect of scatter plots is their ability to depict mixture models of simple relationships between variables. These relationships can reflect cellular patterns as specific signatures, in which the variables can be nuclear structures as shown in the case of DNA methylation patterns versus DAPI-stained DNA [22]. These nuclear entities are not static and reorganize during cellular differentiation, as well as upon the application of demethylating agents. Earlier we showed that such reorganizations can be dynamically monitored by scatter plotting the two types of DNA, with their differential distribution becoming visible as changes in the plotted patterns. In this case, we first individually segmented nuclei to create three-dimensional ROIs (3D-shells). Then, we plotted the fluorescent MeC and DAPI signal distributions within these shells. Utilizing K-L divergence, the degree of similarity between two scatter plots can be easily measured, and reflects the similarity of target (MeC and DAPI signals) topology between two cell nuclei (in Kullback-Leibler sense).

STEP 3: Nuclear Pattern Analysis by means of Kullback-Leibler’s divergence

In our approach, we applied the K-L divergence as a statistical measure of dissimilarity between two normalized scatter plots: the value of $q_i$ denotes the probability of occurrence of intensity i in an analyzed nucleus outlined by its 3D shell, and $p_i$ signifies the corresponding component of a reference scatter plot. The reference scatter plot is constructed from all individual plots. To the best of our knowledge, no such work on identification of nuclear patterns based on Kullback-Leibler’s measure has been reported so far. Therefore, this is an innovative way to perform an intra-population assessment of cells with regard to their homogeneity in response to environmental changes in culture, and is especially suitable for high-throughput multi-parameter analyses.
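The comparison of each nucleus against the population reference can be sketched with NumPy as follows. Several details are assumptions made for illustration: each nucleus is represented by a pair of per-voxel intensity arrays taken from inside its 3D shell, the reference is built by pooling the voxel counts of all nuclei, intensities are 8-bit, and zero-probability terms are skipped. The function names are hypothetical and this is not the authors' MATLAB code.

```python
import numpy as np


def joint_histogram_counts(mec, dapi, bins=256):
    """2D MeC/DAPI histogram (raw counts) of the voxels inside one nucleus."""
    counts, _, _ = np.histogram2d(
        np.ravel(mec), np.ravel(dapi), bins=bins, range=[[0, 256], [0, 256]]
    )
    return counts


def kl_divergence(p, q):
    """KL(p || q) over bins where both p and q are nonzero (others neglected)."""
    mask = (p > 0) & (q > 0)
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def kl_to_reference(nuclei, bins=256):
    """K-L divergence of every nucleus from the pooled reference distribution.

    nuclei: list of (mec_values, dapi_values) pairs, one pair per nucleus.
    """
    counts = [joint_histogram_counts(m, d, bins) for m, d in nuclei]
    pooled = np.sum(counts, axis=0)   # assumption: reference = pooled voxel counts
    reference = pooled / pooled.sum()  # p: normalized reference scatter plot
    return [kl_divergence(reference, c / c.sum()) for c in counts]
```

Pooling counts weights large nuclei more heavily than small ones; averaging the normalized per-nucleus histograms would be an alternative choice with equal weight per cell.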

The K-L divergences represent distinctive and relative measurements derived from a unique cell population. A comparison of K-L values between experiments, in principle, requires identical reference distributions to be applied. However, a lack of reproducibility in sample preparation, together with drift and instability of imaging modality settings, is the primary constraint in determining such a universal reference. In order to reduce the influence of these constraints, and to make the K-L values more descriptive, we introduced four soft-qualifiers for defining the similarity degree of a cell versus the entire cell population. These degrees are associated with particular ranges of K-L divergences derived for two idealized Gaussian distributions. For the multivariate d-dimensional Gaussian densities given by $G(x, \mu, \Sigma) = (2\pi)^{-d/2} |\Sigma|^{-1/2} \exp(-0.5 (x-\mu)^T \Sigma^{-1} (x-\mu))$, the Kullback-Leibler’s divergence between $G_p = G(x, \mu_p, \Sigma_p)$ and $G_q = G(x, \mu_q, \Sigma_q)$ is expressed by:

$$KL(G_p \| G_q) = \frac{1}{2}\left[\log\frac{|\Sigma_q|}{|\Sigma_p|} + \mathrm{tr}\left(\Sigma_q^{-1}\Sigma_p\right) + (\mu_p - \mu_q)^T \Sigma_q^{-1} (\mu_p - \mu_q) - d\right] \qquad (4)$$

where: x is the random variable, μ is the vector of means, Σ is the covariance matrix, tr is the trace function, and |·| is the determinant of a matrix.

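The K-L divergence in Eq. (4) between two univariate Gaussian distributions $p_G(x) = N(x; \mu_p, \sigma_p^2)$ and $q_G(x) = N(x; \mu_q, \sigma_q^2)$, with x as the random variable, comes down to [60]:

$$KL_G(\mu_p, \sigma_p; \mu_q, \sigma_q) = \log\frac{\sigma_q}{\sigma_p} + \frac{\sigma_p^2 + (\mu_p - \mu_q)^2}{2\sigma_q^2} - \frac{1}{2} \qquad (5)$$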

Furthermore, assuming that $\sigma_p \approx \sigma_q$ and that σ can be substituted for $\sigma_p$ and $\sigma_q$ in Eq. (5), we obtain $KL_G(\mu_p, \sigma; \mu_q, \sigma) = (\mu_q - \mu_p)^2 / 2\sigma^2$, where the numerator reflects the distance between the peaks of the two Gaussian distributions. The $KL_G$ in the simplified formula can also be related to the fraction of the distributions’ overlap area and used as a way of articulating dissimilarity. Also, when $\mu_p - \mu_q$ is expressed as a multiple of σ, the $KL_G$ value depends solely on the number of standard deviations separating the two peaks: a separation of kσ gives $KL_G = k^2/2$. Table 1 illustrates the four soft-qualifiers defining the similarity degree of $KL_G$ divergence linked to σ, obtained on the basis of the aforementioned assumption. The four soft-qualifiers are defined as: similar for $KL_G \in [0, 0.5)$, likely similar for $KL_G \in [0.5, 2)$, unlikely similar for $KL_G \in [2, 4.5)$, and dissimilar for $KL_G \in [4.5, \infty)$; the boundaries 0.5, 2 and 4.5 thus correspond to peak separations of 1σ, 2σ and 3σ. This procedure can therefore be perceived as a classification process. As a side note, the K-L divergence between two bivariate normal densities is a function of Pearson’s correlation coefficient [66].
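The relation between the Gaussian model and the four soft-qualifiers can be checked with a few lines of Python. The sketch uses the standard closed form of the K-L divergence between two univariate Gaussians with the natural logarithm; the function names are hypothetical, and values below zero (which can only arise when terms are neglected in the discrete case) are simply treated as "similar" here, which is an assumption.

```python
import math


def kl_gauss_1d(mu_p, sigma_p, mu_q, sigma_q):
    """K-L divergence KL(p || q) between N(mu_p, sigma_p^2) and N(mu_q, sigma_q^2)."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


def similarity_class(kl_value):
    """Map a K-L value onto the four soft-qualifiers using half-open intervals."""
    if kl_value < 0.5:
        return "similar"
    if kl_value < 2.0:
        return "likely similar"
    if kl_value < 4.5:
        return "unlikely similar"
    return "dissimilar"
```

With equal standard deviations, peak separations of 1σ, 2σ and 3σ give divergences of 0.5, 2.0 and 4.5, which are exactly the lower bounds of the likely similar, unlikely similar and dissimilar classes.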

Table 1
Values of Kullback-Leibler’s divergence and percent of overlap for two hypothetical univariate Gaussian distributions $p_G = N(\mu_p, \sigma^2)$ and $q_G = N(\mu_q, \sigma^2)$.

Evaluation of Similarity Measures

Three commonly used similarity metrics (the Mahalanobis and Bhattacharyya distances and Dice’s index) were implemented in the image analysis workflow together with the proposed K-L divergence and then applied to NT-TtT-1, AZA-TtT and OCT-TtT cellular images. Since none of these metrics has been documented for assessing cell culture homogeneity through 2D methylation pattern histograms, we compared their performance to determine the most appropriate approach for measuring demethylating effects by nuclear topology. Unlike method and system validation characteristics such as accuracy and reliability, which are based on individual results, the uncertainty of the results delivered by a classification method needs to be determined through a method-to-method comparison [67]. Therefore, using the uncertainty as a validation characteristic raises the objectivity of our comparative evaluation. In our case we used the similarity values of nuclei within a cell population. If a similarity metric labels a nucleus in a way that reflects its natural proximity to other nuclei in the feature (nuclear pattern) space, then such labeling should have a low uncertainty.

Our evaluation steps were as follows: (i) each of the tested metrics yielded a similarity value for all nuclei; (ii) the nuclei were grouped into classes based on the assigned similarity value. For this, a minimum-distance criterion was applied in the class-forming scheme, and up to six classes were generated; (iii) clustering results were evaluated as described in [67]. The entropies of the results were calculated as a measure of uncertainty, in which the lowest entropy indicates the least uncertainty of the results produced by the evaluated method. Finally, (iv) a normalized certainty was used for method comparison [67]:


where $M$ is the number of classes used in the classification scheme, and $\mathrm{Entropy}_M$ is calculated from the results of the similarity measure classification into $M$ classes.
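The equation for the normalized certainty is not reproduced in this text. The sketch below therefore rests on an assumption: it uses the common normalization 1 − Entropy_M / log2(M), where Entropy_M is taken to be the Shannon entropy of M class-membership probabilities. The exact definition used in [67] may differ, and the function names are hypothetical.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def normalized_certainty(probabilities):
    """Assumed form: 1 - Entropy_M / log2(M), which lies between 0 and 1."""
    m = len(probabilities)
    if m < 2:
        raise ValueError("at least two classes are required")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    return 1.0 - shannon_entropy(probabilities) / math.log2(m)
```

Under this assumed form, a uniform spread over all M classes gives a certainty of 0, and a result concentrated in a single class gives a certainty of 1, which makes values comparable across classification schemes with different M.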


Untreated (NT-TtT-1, NT-TtT-2) and treated (OCT-TtT and AZA-TtT) mouse pituitary tumor cells (total number of cells n=163) were imaged and then analyzed by our in-house developed, MATLAB-based software. Following our algorithm, the three-dimensional nuclear shells were first delineated (Fig. 3), and then for each nucleus within an image field the fluorescent signals derived from MeC-specific staining and DAPI staining were mapped as respective scatter plots. The K-L divergences of the distribution of MeC and DAPI signals between individual plots (nuclei) and the reference plot (cumulative plot from all nuclei) were then calculated. The algorithm displays the K-L values and the digital ROI for each cell nucleus, as shown in Figure 4. Six nuclei (two from each of the OCT-TtT, AZA-TtT and NT-TtT-1 cell groups) illustrating different nuclear MeC and DAPI patterns were selected as examples for visualization purposes. The fields appearing in these figures are smaller than the complete microscopic field of view. Figure 3 shows the earlier intermediate steps of the algorithm described in the methods section, followed by the actual results in Figure 4.
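The comparison of each nucleus against the cumulative reference can be sketched as follows. This is a simplified illustration in Python rather than the MATLAB software described above. Several details are assumptions not stated in the text: 8-bit intensities, an 8 x 8 binning, a small additive constant `eps` to avoid taking the logarithm of zero, natural logarithms, and the direction D(nucleus || reference). All function names are hypothetical.

```python
import math

def joint_histogram(pairs, bins=8, max_value=255):
    """Count (MeC, DAPI) voxel intensity pairs into a flattened bins x bins grid."""
    width = (max_value + 1) / bins
    hist = [0] * (bins * bins)
    for mec, dapi in pairs:
        i = min(int(mec / width), bins - 1)
        j = min(int(dapi / width), bins - 1)
        hist[i * bins + j] += 1
    return hist

def kl_divergence(p_hist, q_hist, eps=1e-10):
    """D(p || q) in nats between two histograms with identical binning."""
    p = [h + eps for h in p_hist]  # eps smoothing is an assumption of this sketch
    q = [h + eps for h in q_hist]
    sum_p, sum_q = sum(p), sum(q)
    return sum((pi / sum_p) * math.log((pi / sum_p) / (qi / sum_q))
               for pi, qi in zip(p, q))

def divergences_from_reference(nuclei, bins=8, max_value=255):
    """K-L divergence of every nucleus from the cumulative histogram of all nuclei."""
    hists = [joint_histogram(pairs, bins, max_value) for pairs in nuclei]
    reference = [sum(column) for column in zip(*hists)]
    return [kl_divergence(h, reference) for h in hists]
```

Here `nuclei` is a list with one entry per segmented nucleus, each entry being a list of (MeC, DAPI) intensity pairs taken from the voxels inside that nucleus. A nucleus whose joint pattern deviates from the bulk of the population receives a larger value than nuclei that resemble the cumulative pattern.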

The applicability of the K-L divergence was tested for the categorization of nuclear patterns with significantly different DAPI signal distributions. One-dimensional MeC and DAPI histograms were generated for each of the two 5-AZA-treated as well as the two OCT-treated nuclei, and plotted next to their respective 2D joint MeC/DAPI diagrams (Figure 5). This separation shows that both signals, MeC and DAPI, differ in their intensities (indicated by the curves’ shapes) between cells, which can be interpreted as the result of differences between cells in their response to the demethylating agents.

Figure 5
Differential nuclear MeC and DAPI signal distributions of drug treated cells, 5-AZA-treated cells (A–F) and OCT-treated cells (G–L), displayed as 2D histograms (middle column) and individual 1D histograms (MeC, left column, and DAPI, right ...

Based on the definition of soft-qualifiers in Table 1, we have chosen four categories into which the processed nuclei fall: similar, likely similar, unlikely similar, and dissimilar.

This categorization helps to characterize a cell population in a quantitative and readable fashion (Table 2). The classification was performed twofold: (i) using solely the MeC histogram, and (ii) using joint MeC/DAPI histograms, of individual cells versus the entire population. In the first case a combined MeC histogram was used as the reference distribution. The outcome provides statistical information about the number of cells that fall into each category. Different cell populations can then be compared based on their category statistics.
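The category statistics of a population can be tallied from its per-nucleus K-L values as sketched below. The interval bounds follow the soft-qualifier definitions given earlier; the function name `category_percentages` and the example values in the comment are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

LABELS = ["similar", "likely similar", "unlikely similar", "dissimilar"]
BOUNDS = [0.5, 2.0, 4.5]  # lower bounds of the second, third and fourth category

def category_percentages(kl_values):
    """Percentage of nuclei per soft-qualifier category for one population."""
    if not kl_values:
        raise ValueError("population must contain at least one nucleus")
    if any(kl < 0 for kl in kl_values):
        raise ValueError("K-L values must be non-negative")
    # bisect_right places a value equal to a bound into the higher category,
    # which matches the half-open intervals [lower, upper).
    counts = Counter(LABELS[bisect_right(BOUNDS, kl)] for kl in kl_values)
    n = len(kl_values)
    return {label: 100.0 * counts[label] / n for label in LABELS}

# Example with made-up values: category_percentages([0.1, 0.2, 0.6, 3.0])
```

Two populations can then be compared category by category, independently of how many nuclei each contains.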

Table 2
Results of soft-qualification of nuclei in different cell populations. Application of the K-L divergence to 2D MeC/DAPI distributions revealed a significantly higher number of cells with similar or likely similar MeC/DAPI patterns among untreated cells ...

Utilizing the joint MeC/DAPI patterns in the categorization of the four cell populations revealed that all NT-TtT-1 cells are classified as at least likely similar, with a majority of 76% being similar. This signifies a relatively high homogeneity of MeC versus DAPI distribution within the NT-TtT-1 cell population. Likewise, 74.5% similar, 23.5% likely similar and 2% unlikely similar cells were found in the NT-TtT-2 population. Our assessment of untreated cells revealed that the distribution of the cell categories was quite consistent in populations with different numbers of cells. In comparison, OCT-TtT cells display a higher portion (64%) of likely similar cells and, to a lesser degree (36%), also unlikely similar cells. The AZA-TtT cells show a very low degree of dissimilarity, with 90% similar and 10% likely similar cells. However, one can note that their intracellular architecture differs from that of NT-TtT and OCT-TtT cells in that fewer loci are seen within AZA-TtT cell nuclei than within nuclei in the remaining cultures.

Utilizing only MeC histograms to categorize cells yielded no dissimilar cells in any of the four tested populations. The NT-TtT-1 and NT-TtT-2 control populations contained nearly identical fractions of similar cells (approximately 88%). In NT-TtT-1, 12% of cells, and in NT-TtT-2, 10% of cells were classified as likely similar, with 2% of cells found unlikely similar in the NT-TtT-2 population. OCT-treated cells showed an almost equal (28–35%) allocation of cells among the similar, likely similar, and unlikely similar categories. The cell population treated with 5-AZA consisted predominantly of similar cells (97%), with only one cell (3.3%) classified as likely similar.

The cell categorization was implemented into the image visualization and analysis software we developed, as shown in Fig. 6. Such visualization is a valuable feature of image-based cytometry, providing dual information of cell behavior/category and localization within the sample environment. Processed images of the three cell groups used in this study underwent a visual check by an expert (J.T.) and the dissimilarity evaluations between cells matched the automated analytical results.

Figure 6
Visualization aid in the evaluation of cell population homogeneity: our software, developed in-house, is able to convert K-L values into pseudo-colors, as illustrated for the NT-TtT-2 cell population with a large number of similar cells constituting a ...

In our definition of soft qualifiers, normality of the sampled population was assumed. To evaluate the normality of the individual MeC/DAPI distributions, we estimated two Gaussian components by means of the expectation-maximization clustering algorithm [68] in each of the segmented nuclei of the NT-TtT-1 population (Fig. 7). The components estimated in this way account for approximately 75% of the data points of each nucleus. In addition, using Lilliefors’ statistical test [69], we tested the null hypothesis that the data derive from a multivariate family of normal distributions. This test was performed for each nucleus and separately for each dimension. The null hypothesis was not rejected at the 5% significance level. Therefore, we assume that the scatter plots obtained throughout our experiments can be approximated by multivariate Gaussian components.
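The estimation of two Gaussian components by expectation-maximization can be illustrated with a deliberately simplified sketch. It fits a one-dimensional, two-component mixture, whereas the study estimated bivariate components on the joint MeC/DAPI data; the initialization from the lower and upper halves of the sorted data, the fixed iteration count, the variance floor, and the synthetic demonstration data are all assumptions of this illustration. The Lilliefors test is not reproduced here.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(data, iterations=200):
    """Fit a two-component 1D Gaussian mixture; returns (weights, means, variances)."""
    xs = sorted(data)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least four samples")
    half = n // 2
    grand_mean = sum(xs) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, 1e-6)
    # Assumed initialization: means of the lower and upper halves of the sorted data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            a = weights[0] * normal_pdf(x, means[0], variances[0])
            b = weights[1] * normal_pdf(x, means[1], variances[1])
            total = a + b
            resp.append((a / total, b / total) if total > 0 else (0.5, 0.5))
        # M-step: re-estimate weight, mean and variance of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has collapsed; keep its previous parameters
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(spread, 1e-6)
    return weights, means, variances

# Synthetic demonstration data (not taken from the study).
random.seed(0)
demo = [random.gauss(50, 5) for _ in range(300)] + [random.gauss(150, 10) for _ in range(300)]
demo_weights, demo_means, demo_variances = em_two_gaussians(demo)
```

For real data, a bivariate fit with full covariance matrices would be needed to obtain component ellipses of the kind outlined in Fig. 7.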

Figure 7
A scatter plot of an NT-TtT-1 nucleus with bivariate Gaussian components estimated by the expectation-maximization algorithm. The mean values are marked by the “+” sign and the ovals outline areas within one standard deviation from the ...

Selected similarity metrics, including the Mahalanobis and Bhattacharyya distances, Dice’s coefficient, and the proposed K-L divergence, were calculated between each of the individual two-dimensional MeC/DAPI plots and the combined distribution. The normalized certainty (Eq. 6) of the results determined by the different metrics is presented in Table 3.

Table 3
Normalized certainty of the results obtained with different metrics: similarity data was generated for the distinction of cells into two to six categories (classes). K-L divergence shows the highest certainty values (bolded) in the majority of tested ...

Our comparison of the most applicable metrics indicates that in the majority of cases (73%) the normalized certainties reached their highest values when the classification was based on the K-L divergence. Moreover, if more than two classes (a more frequent scenario) are considered in the classification scheme, the proposed K-L similarity measure achieves the highest certainty scores in an even larger share of cases (91%).


The main goal of this study was to develop an automated image analysis tool suitable for measuring the effects of demethylating agents through the differential analysis of relevant nuclear structures, as represented by methylated CpG-dinucleotides (MeCs) and global DNA, in cells. For this purpose, a dedicated tool was designed that performs three sequential steps on individual cells within a population: (1) unsupervised segmentation of 3D imaged cell nuclei via a seeded watershed algorithm, (2) multi-channel quantitative distribution analysis of nuclear entities, and (3) similarity testing of cells with regard to their distribution profiles by means of Kullback-Leibler’s divergence measurement. Our experience with mouse pituitary tumor cells confirms that demethylating agents can exert the two known effects: (i) a decrease in the number of MeCs in global DNA [70], and (ii) the subsequent decondensation of highly compact heterochromatic regions of the genome, which leads to spatial reorganization in the nucleus and affects nuclear architecture [28]. The image analysis we developed utilizes these coexisting phenomena to measure and display the relevant changes in the intensity distribution of the two types of signals that reflect them: (a) MeC signals created through immunofluorescence targeting of methylated cytosine and (b) DAPI signals generated by subsequent counter-staining of the same cells, as DAPI binds preferentially to AT-rich DNA, the main component of highly repetitive and compact heterochromatic sequences. Our computational approach minimized the usual obstacles in automated cellular analysis, such as intra-specimen variation in background and morphological properties of nuclei, including size, shape, and structural density. Furthermore, cellular clustering, seen for some types of cells such as pituitary tumor cells in culture, can create poor contrast between nuclear borders.
The implementation of the seeded watershed algorithm here allowed for a conservative separation of nuclei. In addition, the change of object resolution during image processing accelerated the process by reducing computational complexity. The segmentation masks can be overlaid onto the corresponding raw MeC/DAPI images for visual assessment of segmentation accuracy. It should be noted that visual classification of the composite MeC/DAPI signals can be very time consuming and quite subjective compared to automated, computer-aided classification. This is especially true when large sets of image data with a highly non-geometrical distribution of nuclear targets need to be processed. With automation, both the delineation of the nuclei and the topological quantification of the complex patterns are streamlined, and results are produced with higher confidence. The developed method is scalable and suitable for high-content, high-throughput analysis of cells both in research and at industrial volume.

In previous studies we showed that the nuclear distribution of MeC versus DAPI signals, displayed as a 2D scatter plot, can serve as a signature by which cells differing in their state of differentiation or in treatment can be distinguished [22]. We also observed that untreated and drug-treated cells of the same kind display different degrees of dissimilarity within their populations, as judged by the resulting scatter plots. This led to the development of the synthesized image analysis method described here, which utilizes the resulting scatter plots in a statistical fashion to assess structural and behavioral dissimilarity within a cell population. These features are relevant to a variety of cell biological applications. Our aim was to develop and test an algorithm that can be meaningfully and robustly applied to the evaluation of demethylating agents such as 5-azacytidine and octreotide. However, the developed algorithm can be flexibly utilized for similar topological studies in which nuclear entities and their distribution are targeted in a biological context. The modular integration of the K-L divergence measure is an especially valuable feature that allows for the statistical evaluation of cells when the targets do not have a consistent location within the considered ROI, such as the nucleus. Furthermore, our analyses indicate that when only 1D histograms of MeC signal distribution were utilized for K-L divergence measurement, the results of the homogeneity assessment differed significantly from those obtained with the joint 2D MeC/DAPI histograms. The exclusion of DAPI causes a shift of the cells to the lower categories, suggesting that the DAPI signal is a meaningful dynamic parameter, as it increases the differential resolution in the image-based analysis of nuclear methylation patterns. This can be reconciled with the aforementioned biological effects on nuclear DNA.
In particular, heterochromatin decondensation, as a secondary effect of global demethylation, results in the relocation of heterochromatic sites within the nucleus (which is associated with genome destabilization). As a consequence of these conformational and organizational changes of the DAPI-positive nuclear sites, the same DAPI signal intensity is spread out over a higher number of voxels. Thus, both MeC and DAPI have dynamic patterns in the cell nucleus that become more discernible in joint 2D plots than in a 1D MeC plot, or even when the two signals are separately displayed in one dimension (see Figure 5). Notably, our snapshots of untreated cells also display dissimilarity in MeC/DAPI signal distribution, although to a much lesser degree than treated cells (Figure 4). We assume that this could be due to the cells being in different cell cycle phases, as this study did not apply any synchronizing agents for two reasons: (i) to minimize other induced effects that could interfere with demethylation, and (ii) to more closely model the in vivo situation, in which cells within their native tissue environment are naturally not synchronized. Therefore, untreated cells that display a lower MeC load signal may represent replicating cells in S-phase that had not completed methylation of de novo synthesized DNA strands, as delay times between the two processes of replication and methylation have been reported for various types of cells [71,72].

Our approach directly illustrates the distribution of voxel intensities. The changes of these distributions are derived from the underlying changes in the topology (spatial patterns) of global DNA in response to drug treatment. Consequently, we are able to demonstrate that when the topological nuclear distribution patterns of methylated cytosine and global DNA are converted into two-dimensional histograms, they can be utilized as differential biosignatures in the evaluation of cellular response to treatment with demethylating anti-cancer agents. This characteristic is in line with the larger purpose of our approach, namely to create a rapid image analysis method that is of low complexity and therefore computationally inexpensive with potential for high-throughput cell screening tasks.

Other statistical methods such as cluster or bimodal analyses [73,74], commonly utilized in gene expression analysis [75], are important when targets (with respective intensities) have a definite location (coordinates). These methods are valuable in assessing ratio-labeling of targets when hybridized to arrayed nucleic acid fragments that are immobilized and have defined coordinates on the supporting material (DNA microarrays) [76–78], or when hybridized to genomic loci with known chromosomal locations on metaphase chromosomes of normal cells [79]. In contrast, nuclear targets such as genomic loci on largely decondensed DNA or proteinaceous entities may vary strongly in their localization between nuclei. In these cases, the K-L divergence becomes of value, as it does not require dealing with absolute target coordinates for similarity testing. Moreover, unlike k-means clustering or bimodal analysis of gene expression, the K-L approach tolerates the occurrence of null categories that may not be filled by any object (in this case, a nucleus). Fig. 6 shows an example in which the fourth category, namely dissimilar, is not represented by any of the nuclei in the tested population (no red-colored nucleus is present in Fig. 6B).

Kullback-Leibler’s divergence is a valuable method for quantitating dissimilarities within a cell population, and this measure can be applied to any multi-color cellular assay that utilizes topological information of intracellular structures to assess cellular behavior. Our comparison of the metrics most frequently used for similarity measurements demonstrates that the K-L method produces the highest certainty (least uncertainty) for the nuclear MeC/DAPI pattern analysis within the imaged cell populations. Moreover, the Pearson’s correlation coefficient between two distributions can be directly calculated from the K-L divergence if the distributions are normal, especially in cases when correlating samples do not have equal size. However, proving normality of multimodal distributions may increase computational complexity in practical cases. The procedure for identifying a distribution’s normal components that we described and implemented in this study supports the suitability of the K-L divergence for our data, especially in determining the soft qualifiers, because the majority of the acquired 2D signals had a normal distribution.

We observed the robustness of the K-L divergence against potential intra-experimental data variability introduced through the biochemical processing of specimens or the modality settings between imaging sessions, both of which may additively alter the intensity levels within the MeC and the DAPI channels. We did not detect any difference in K-L divergences, which was confirmed by the fact that the shape of the scatter plots remained unchanged. In contrast, influences of a multiplicative nature may skew the results of all types of metrics. Additionally, the K-L divergence measurement has the advantage of being independent of image rotation and the inherent anisotropy of confocal microscopy images.

As one would expect, statistical methods in the form of similarity measures gain more confidence when applied to large datasets, in this case large cell populations with thousands of nuclei. To our pleasant surprise, the K-L divergence outperformed the comparative metrics when utilized for smaller cell populations of only around twenty cells. This underlines not only the robustness of the method, but also its flexibility in dealing with a high dynamic range in sample size. This characteristic is quite valuable in connection with the current limited capabilities of our imaging systems that are restricted in the field of view size when acquiring highest-resolution 3D images. Thus, it is necessary to collect and tile multiple image stacks in order to obtain a complete picture of the entire sample. The robustness of the K-L measurement allows it to be applied across the entire tiled image. Such an approach could be helpful in the assessment of relationships between single cells and their macro- and micro-level neighborhoods for studying intra- and inter-population functional relationships through epigenetic effects such as DNA methylation via tissue diagnostics in disease pathology and cell-based assays for compound screening in drug development.


This work was supported by the US Navy Bureau of Medicine and Surgery, the National Science Foundation, and the National Institutes of Health.


1We thank Dr. Anat Ben-Shlomo (Cedars-Sinai Medical Center) for providing treated and control TtT-GF cells.


1. Valet GK, Tárnok A. Cytomics in predictive medicine. Cytometry B. 2003;53:1–3.
2. Valet G, Leary JF, Tárnok A. Cytomics-new technologies: towards a human cytome project. Cytometry A. 2004;59:167–171.
3. Strouboulis J, Wolffe AP. Functional compartmentalization of the nucleus. J Cell Sci. 1996;109:1991–2000.
4. Stein GS, van Wijnen AJ, Stein JL, Lian JB, Montecino M, Zaidi K, Javed A. Subnuclear organization and trafficking of regulatory proteins: implications for biological control and cancer. J Cell Biochem Suppl. 2000;Suppl 35:84–92.
5. Berezney R, Wei X. The new paradigm: integrating genomic function and nuclear architecture. J Cell Biochem Suppl. 1998;30–31:238–242.
6. Misteli T, Spector DL. The cellular organization of gene expression. Curr Opin Cell Biol. 1998;10:323–331.
7. Burke B, Stewart CL. The laminopathies: the functional architecture of the nucleus and its contribution to disease. Annu Rev Genomics Hum Genet. 2006;7:369–405.
8. Cremer T, Cremer C. Chromosome territories, nuclear architecture and gene regulation in mammalian cells. Nat Rev Genet. 2001;2:292–301.
9. Lamond AI, Spector DL. Nuclear speckles: a model for nuclear organelles. Nat Rev Mol Cell Biol. 2003;4:605–612.
10. Fackelmayer FO. Nuclear Architecture and Gene Expression in the Quest for Novel Therapeutics. Curr Pharm Des. 2004;10:2851–2860.
11. Szöllosi J, Nolan J, Tárnok A. New trends in cytometry in the era of systems biology. Cytometry A. 2008;73:267–269.
12. Tárnok A. A focus on high-content cytometry. Cytometry A. 2008;73:381–383.
13. Lang P, Yeow K, Nichols A, Scheer A. Cellular imaging in drug discovery. Nat Rev Drug Discov. 2006;5:343–356.
14. Doerfler W. DNA methylation and gene activity. Annu Rev Biochem. 1983;52:93–124.
15. Bird AP. CpG-rich islands and the function of DNA methylation. Nature. 1986;321:209–213.
16. Li E. Chromatin modification and epigenetic reprogramming in mammalian development. Nat Rev Genet. 2002;3(9):662–673.
17. Riggs A, Jones PA. 5-methylcytosine, gene regulation, and cancer. Adv Cancer Res. 1983;40:1–30.
18. Jones PA, Baylin SB. The epigenomics of cancer. Cell. 2007;128:683–692.
19. Yoo CB, Jones PA. Epigenetic therapy of cancer: past, present and future. Nat Rev Drug Discov. 2006;5:37–50.
20. Esteller M, Herman JG. Cancer as an epigenetic disease: DNA methylation and chromatin alterations in human tumors. J Pathol. 2002;196:1–7.
21. Santos AP, Abranches R, Stoger E, Beven A, Viegas W, Shaw PJ. The architecture of interphase chromosomes and gene positioning are altered by changes in DNA methylation and histone acetylation. J Cell Sci. 2002;115:4597–4605.
22. Tajbakhsh J, Wawrowsky KA, Gertych A, Bar-Nur O, Vishnevsky E, Lindsley EH, Farkas DL. Characterization of tumor cells and stem cells by differential nuclear methylation imaging. In: Farkas DL, Nicolau DV, Leif RC, editors. Proceedings Vol. 6859 Imaging, Manipulation, and Analysis of Biomolecules, Cells, and Tissues VI; 2008. p. 68590F.
23. Christman JK. 5-Azacytidine and 5-aza-2'-deoxycytidine as inhibitors of DNA methylation: mechanistic studies and their implications for cancer therapy. Oncogene. 2002;21:5483–5495.
24. Arney KL, Fisher AG. Epigenetic aspects of differentiation. J Cell Sci. 2004;117(19):4355–4363.
25. Fisher AG, Merkenschlager M. Gene silencing, cell fate and nuclear organization. Curr Opin Genet Dev. 2002;12:193–197.
26. Narayan G, Raman R. Cytological evaluation of global DNA methylation in mouse testicular genome. Hereditas. 1995;123:275–283.
27. Kobayakawa S, Miike K, Nakao M, Abe K. Dynamic changes in the epigenomic state and nuclear organization of differentiating mouse embryonic stem cells. Genes Cells. 2007;12:447–460.
28. Gilbert N, Thomson I, Boyle S, Allan J, Ramsahoye B, Bickmore WA. DNA methylation affects nuclear organization, histone modifications, and linker histone binding but not chromatin compaction. J Cell Biol. 2007;177:401–411.
29. Ginis I, Luo Y, Miura T, Thies S, Brandenberger R, Gerecht-Nir S, Amit M, Hoke A, Carpenter MK, Itskovitz-Eldor J, Rao MS. Differences between human and mouse embryonic stem cells. Dev Biol. 2004;269:360–380.
30. Glory E, Murphy RF. Automated Subcellular Location Determination and High-Throughput Microscopy. Dev Cell. 2007;12:7–16.
31. Strovas TJ, Sauter LM, Guo X, Lidstrom ME. Cell-to-Cell Heterogeneity in Growth Rate and Gene Expression in Methylobacterium extorquens AM1. J Bacteriol. 2007;189:7127–7133.
32. Knowles DW, Sudar D, Bator-Kelly C, Bissell MJ, Lelievre SA. Automated local bright feature image analysis of nuclear protein distribution identifies changes in tissue phenotype. PNAS. 2006;103:4445–4450.
33. Lin G, Chawla MK, Olson K, Barnes CA, Guzowski JF, Bjornsson C, Shain W, Roysam B. A Multi-Model Approach to Simultaneous Segmentation and Classification of Heterogeneous Populations of Cell Nuclei in 3D Confocal Microscope Images. Cytometry A. 2007;71:724–736.
34. Huisman A, Ploeger LS, Dullens HF, Poulin N, Grizzle WE, van Diest PJ. Development of 3D chromatin texture analysis using confocal laser scanning microscopy. Cell Oncol. 2005;27:335–345.
35. Haralick, Shapiro L. Computer and Robot Vision. Prentice Hall; 2002.
36. Boland MV, Murphy RF. A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope image of HeLa cells. Bioinformatics. 2001;17:1213–1223.
37. Stacey DW, Hitomi M. Cell Cycle Studies Based Upon Quantitative Image Analysis. Cytometry A. 2008;73A:270–278.
38. Gue M, Messaoudi C, Sun JS, Boudier T. Smart 3D-FISH: Automation of Distance Analysis in Nuclei of Interphase Cells by Image Processing. Cytometry A. 2005;67A:18–26.
39. Jiang D, Tang C, Zhang A. Cluster Analysis for Gene Expression Data: A Survey. IEEE Trans Knowl Data Eng. 2004;16:1370–1386.
40. Dice L. Measures of the amount of ecological association between species. J. Ecology. 1945;26:297–302.
41. Bhattacharyya A. On a measure of divergence between two statistical populations defined by probability distributions. Bull Calcutta Math Soc. 1943;35:99–109.
42. Mahalanobis PC. On the generalized distance in statistics. Proc Nat Inst Scien India. 1936;2:49–55.
43. Renyi A. On measures of entropy and information. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability; Univ. California Press; Berkeley. 1961. pp. 547–561.
44. Shannon CE. A mathematical theory of communication. The Bell System Technical Journal. 1948;27:379–423. 623–656.
45. Kerridge DF. Inaccuracy and Inference. J. Roy. Statist. Soc. 1961;23:184–194.
46. Kullback S, Leibler RA. On information and sufficiency. Ann Math Stat. 1951;22:79–86.
47. Mathiassen JR, Skavhaug A, Bø K. Texture Similarity Measure Using Kullback-Leibler Divergence between Gamma Distributions. LNCS. 2002;2352:133–147.
48. Pluim JPW, Maintz JBA, Viergever MA. Mutual-information-based registration of medical images: a survey. IEEE Trans Med Imaging. 2003;22:986–1004.
49. Hibbard LS. Region segmentation using information divergence measures. Med Image Anal. 2004;8:233–244.
50. Kasturi J, Acharya R, Ramanathan M. An information theoretic approach for analyzing temporal patterns of gene expression. Bioinformatics. 2004;19:449–458.
51. Xiongjun P, Wenlu Y, Liqing Z. Blind Clustering of DNA Fragments Based on Kullback-Leibler Divergence. LNCS. 2005;3610:1043–1046.
52. Wenlu Y, Xiongjun P, Liqing Z. Similarity Analysis of DNA Sequences Based on the Relative Entropy. LNCS. 2005;3610:1035–1038.
53. Lin G, Adiga U, Olson K, Guzowski JF, Barnes CA, Roysam B. A Hybrid 3D Watershed Algorithm Incorporating Gradient Cues and Object Models for Automatic Segmentation of Nuclei in Confocal Image Stacks. Cytometry A. 2003;56A:23–36.
54. Lin G, Chawla MK, Olson K, Guzowski JF, Barnes CA, Roysam B. Hierarchical Model-Based Merging of Multiple Fragments for Improved Three-Dimensional Segmentation of Nuclei. Cytometry A. 2005;63A:20–33.
55. Choi H-J, Choi H-K. Grading of renal cell carcinoma by 3D morphological analysis of cell nuclei. Comput Biol Med. 2007;37:1334–1341.
56. Zhao T, Murphy RF. Automated Learning of Generative Models for Subcellular Location: Building Blocks for Systems Biology. Cytometry A. 2007;71A:978–990.
57. Beil M, Durschmied D, Paschke S, Schreiner B, Nolte U, Bruel A, Irinopoulou T. Spatial Distribution Patterns of Interphase Centromeres During Retinoic Acid-Induced Differentiation of Promyelocytic Leukemia Cells. Cytometry. 2002;47:217–225.
58. Gudla PR, Nandy K, Collins J, Meaburn KJ, Misteli T, Lockett S. A high-throughput system for segmenting nuclei using multiscale techniques. Cytometry A. 2008;73:451–466.
59. Ben-Shlomo A, Wawrowsky KA, Proekt I, Wolkenfeld NM, Ren S-G, Taylor J, Culler MD, Melmed S. Somatostatin receptor type 5 modulates somatostatin receptor type 2 regulation of adrenocorticotropin secretion. J Biol Chem. 2005;280:24011–24021.
60. Tajbakhsh J, Luz H, Bornfleth H, Lampel S, Cremer C, Lichter P. Spatial distribution of GC- and AT-rich DNA sequences within human chromosome territories. Exp Cell Res. 2000;15:229–237.
61. Scheuermann MO, Tajbakhsh J, Kurz A, Saracoglu K, Eils R, Lichter P. Topology of genes and nontranscribed sequences in human interphase nuclei. Exp Cell Res. 2004;10:266–279.
62. Zack LS, Rogers WE, Latt SA. Automatic measurement of sister chromatid exchange frequency. J Histochem Cytochem. 1977;25:741–753.
63. Beucher S, Lantuejoul C. Use of watersheds in contour detection. International Workshop on Image Processing: Real Time and motion Detection/Estimation; Rennes, France; Sept 1979.
64. Li F, Zhou X, Zhu J, Ma J, Huang X, Wong STC. High content image analysis for human H4 neuroglioma cells exposed to CuO nanoparticles. BMC Biotechnol. 2007;7:66.
65. Lindblad J, Wahlby C, Bengtsson E, Zaltsman A. Image Analysis for Automatic Segmentation of Cytoplasms and Classification of Rac1 Activation. Cytometry A. 2004;57A:22–33.
66. Kullback S. Information Theory and Statistics. Dover Pub. 1997:8.
67. Dehghan H, Ghassemian H. Measurement of uncertainty by the entropy: application to the classification of MSS data. Int J Remote Sens. 2006;27:4005–4014.
68. Dempster A, Laird N, Rubin D. Likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977;39(1):1–38.
69. Lilliefors HW. On the Kolmogorov-Smirnov test for normality with mean and variance unknown. J Am Stat Ass. 1967;62:399–402.
70. de Capoa A, Menendez F, Poggesi I, Giancotti P, Grappelli C, Marotta MR, Di Leandro M, Reynaud C, Niveleau A. Cytological evidence for 5-azacytidine-induced demethylation of the heterochromatic regions of human chromosomes. Chromosome Res. 1996;4(4):271–276.
71. Gruenbaum Y, Szyf M, Cedar H, Razin A. Methylation of replicating and post-replicated mouse L-cell DNA. Proc. Natl. Acad. Sci. USA. 80(16):4919–4921.
72. Woodcock DM, Simmons DL, Crowther OJ, Cooper IA, Trainor KJ, Morley AA. Delayed DNA methylation is an integral feature of DNA replication in mammalian cells. Exp. Cell. Res. 166(1):102–112.
73. Madeira SC, Oliveira AL. Biclustering Algorithms for Biological Data Analysis: A Survey. IEEE Trans Comput Biol Bioinform. 2004;1:24–45.
74. Reiss DJ, Baliga NS, Bonneau R. Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks. BMC bioinformatics. 2006;2:280–302.
75. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A Genetics. 1998;95:14863–14868.
76. Hackett HS, Hodgson JG, Law ME, Fridlyand J, Osoegawa K, de Jong PJ, Nowak NJ, Pinkel D, Albertson DG, Jain A, Jenkins R, Gray JW, Weiss WA. Genome-wide Array CGH Analysis of Murine Neuroblastoma Reveals Distinct Genomic Aberrations which Parallel those in Human Tumors. Cancer Res. 2003;63:5266–5273.
77. Bastian BC, Olshen AB, LeBoit PE, Pinkel D. Classifying Melanocytic Tumors Based on DNA Copy Number Changes. Am J Pathol. 2003;163(5):1765–1770.
78. Roerig P, Nessling M, Radlwimmer B, Joos S, Wrobel G, Schwaenen C, Reifenberger G, Lichter P. Molecular classification of human gliomas using matrix-based comparative genomic hybridization. Int. J. Cancer. 2005;117:95–103.
79. Jain AN, Chin K, Børresen-Dale A-L, Erikstein BK, Lonning PE, Kaaresen R, Gray JW. Quantitative analysis of chromosomal CGH in human breast tumors associates copy number abnormalities with p53 status and patient survival. PNAS. 2001;98(14):7952–7957.