In this paper, we evaluate the discrimination power of structural superindices. Superindices for graphs represent measures composed of other structural indices. In particular, we compare the discrimination power of the superindices with those of individual graph descriptors. In addition, we perform a statistical analysis to generalize our findings to large graphs.
Medical forms are very heterogeneous: on a European scale there are thousands of data items in several hundred different systems. To enable data exchange for clinical care and research purposes there is a need to develop interoperable documentation systems with harmonized forms for data capture. A prerequisite in this harmonization process is comparison of forms. So far – to our knowledge – an automated method for comparison of medical forms is not available. A form contains a list of data items with corresponding medical concepts. An automatic comparison needs data types, item names and especially item with these unique concept codes from medical terminologies. The scope of the proposed method is a comparison of these items by comparing their concept codes (coded in UMLS). Each data item is represented by item name, concept code and value domain. Two items are called identical, if item name, concept code and value domain are the same. Two items are called matching, if only concept code and value domain are the same. Two items are called similar, if their concept codes are the same, but the value domains are different. Based on these definitions an open-source implementation for automated comparison of medical forms in ODM format with UMLS-based semantic annotations was developed. It is available as package compareODM from http://cran.r-project.org. To evaluate this method, it was applied to a set of 7 real medical forms with 285 data items from a large public ODM repository with forms for different medical purposes (research, quality management, routine care). Comparison results were visualized with grid images and dendrograms. Automated comparison of semantically annotated medical forms is feasible. Dendrograms allow a view on clustered similar forms. The approach is scalable for a large set of real medical forms.
Nanofiber scaffolds have been useful for engineering tissues derived from mesenchymal cells, but few studies have investigated their applicability for epithelial cell-derived tissues. In this study, we generated nanofiber (250 nm) or microfiber (1200 nm) scaffolds via electrospinning from the polymer, poly-L-lactic-co-glycolic acid (PLGA). Cell-scaffold contacts were visualized using fluorescent immunocytochemistry and laser scanning confocal microscopy. Focal adhesion (FA) proteins, such as phosphorylated FAK (Tyr397), paxillin (Tyr118), talin and vinculin were localized to FA complexes in adult cells grown on planar surfaces but were reduced and diffusely localized in cells grown on nanofiber surfaces, similar to the pattern observed in adult mouse salivary gland tissues. Significant differences in epithelial cell morphology and cell clustering were also observed and quantified, using image segmentation and computational cell-graph analyses. No statistically significant differences in scaffold stiffness between planar PLGA film controls compared to nanofibers scaffolds were detected using nanoindentation with atomic force microscopy, indicating that scaffold topography rather than mechanical properties accounts for changes in cell attachments and cell structure. Finally, PLGA nanofiber scaffolds could support the spontaneous self-organization and branching of dissociated embryonic salivary gland cells. Nanofiber scaffolds may therefore have applicability in the future for engineering an artificial salivary gland.
Biomimetic material; cell signaling; cell morphology; ECM; nanofibers; organ culture; PLGA; salivary gland; scaffold
The resurgence of tuberculosis in the 1990s and the emergence of drug-resistant tuberculosis in the first decade of the 21st century increased the importance of epidemiological models for the disease. Due to slow progression of tuberculosis, the transmission dynamics and its long-term effects can often be better observed and predicted using simulations of epidemiological models. This study provides a review of earlier study on modeling different aspects of tuberculosis dynamics. The models simulate tuberculosis transmission dynamics, treatment, drug resistance, control strategies for increasing compliance to treatment, HIV/TB co-infection, and patient groups. The models are based on various mathematical systems, such as systems of ordinary differential equations, simulation models, and Markov Chain Monte Carlo methods. The inferences from the models are justified by case studies and statistical analysis of TB patient datasets.
Tuberculosis; epidemiological models; transmission; drug resistance; treatment; co-epidemics of HIV and tuberculosis; patient groups
Many dynamical systems, including lakes, organisms, ocean circulation patterns, or financial markets, are now thought to have tipping points where critical transitions to a contrasting state can happen. Because critical transitions can occur unexpectedly and are difficult to manage, there is a need for methods that can be used to identify when a critical transition is approaching. Recent theory shows that we can identify the proximity of a system to a critical transition using a variety of so-called ‘early warning signals’, and successful empirical examples suggest a potential for practical applicability. However, while the range of proposed methods for predicting critical transitions is rapidly expanding, opinions on their practical use differ widely, and there is no comparative study that tests the limitations of the different methods to identify approaching critical transitions using time-series data. Here, we summarize a range of currently available early warning methods and apply them to two simulated time series that are typical of systems undergoing a critical transition. In addition to a methodological guide, our work offers a practical toolbox that may be used in a wide range of fields to help detect early warning signals of critical transitions in time series data.
Prognosis of breast cancer is primarily predicted by the histological grading of the tumor, where pathologists manually evaluate microscopic characteristics of the tissue. This labor intensive process suffers from intra- and inter-observer variations; thus, computer-aided systems that accomplish this assessment automatically are in high demand. We address this by developing an image analysis framework for the automated grading of breast cancer in in vitro three-dimensional breast epithelial acini through the characterization of acinar structure morphology. A set of statistically significant features for the characterization of acini morphology are exploited for the automated grading of six (MCF10 series) cell line cultures mimicking three grades of breast cancer along the metastatic cascade. In addition to capturing both expected and visually differentiable changes, we quantify subtle differences that pose a challenge to assess through microscopic inspection. Our method achieves 89.0% accuracy in grading the acinar structures as nonmalignant, noninvasive carcinoma, and invasive carcinoma grades. We further demonstrate that the proposed methodology can be successfully applied for the grading of in vivo tissue samples albeit with additional constraints. These results indicate that the proposed features can be used to describe the relationship between the acini morphology and cellular function along the metastatic cascade.
The structure/function relationship is fundamental to our understanding of biological systems at all levels, and drives most, if not all, techniques for detecting, diagnosing, and treating disease. However, at the tissue level of biological complexity we encounter a gap in the structure/function relationship: having accumulated an extraordinary amount of detailed information about biological tissues at the cellular and subcellular level, we cannot assemble it in a way that explains the correspondingly complex biological functions these structures perform. To help close this information gap we define here several quantitative temperospatial features that link tissue structure to its corresponding biological function. Both histological images of human tissue samples and fluorescence images of three-dimensional cultures of human cells are used to compare the accuracy of in vitro culture models with their corresponding human tissues. To the best of our knowledge, there is no prior work on a quantitative comparison of histology and in vitro samples. Features are calculated from graph theoretical representations of tissue structures and the data are analyzed in the form of matrices and higher-order tensors using matrix and tensor factorization methods, with a goal of differentiating between cancerous and healthy states of brain, breast, and bone tissues. We also show that our techniques can differentiate between the structural organization of native tissues and their corresponding in vitro engineered cell culture models.
Strains of the Mycobacterium tuberculosis complex (MTBC) can be classified into coherent lineages of similar traits based on their genotype. We present a tensor clustering framework to group MTBC strains into sublineages of the known major lineages based on two biomarkers: spacer oligonucleotide type (spoligotype) and mycobacterial interspersed repetitive units (MIRU). We represent genotype information of MTBC strains in a high-dimensional array in order to include information about spoligotype, MIRU, and their coexistence using multiple-biomarker tensors. We use multiway models to transform this multidimensional data about the MTBC strains into two-dimensional arrays and use the resulting score vectors in a stable partitive clustering algorithm to classify MTBC strains into sublineages. We validate clusterings using cluster stability and accuracy measures, and find stabilities of each cluster. Based on validated clustering results, we present a sublineage structure of MTBC strains and compare it to the sublineage structures of SpolDB4 and MIRU-VNTRplus.
Tuberculosis; Mycobacterium tuberculosis complex; multiway models; clustering; cluster validation
The large amount of information contained in bibliographic databases has recently boosted the use of citations, and other indicators based on citation numbers, as tools for the quantitative assessment of scientific research. Citations counts are often interpreted as proxies for the scientific influence of papers, journals, scholars, and institutions. However, a rigorous and scientifically grounded methodology for a correct use of citation counts is still missing. In particular, cross-disciplinary comparisons in terms of raw citation counts systematically favors scientific disciplines with higher citation and publication rates. Here we perform an exhaustive study of the citation patterns of millions of papers, and derive a simple transformation of citation counts able to suppress the disproportionate citation counts among scientific domains. We find that the transformation is well described by a power-law function, and that the parameter values of the transformation are typical features of each scientific discipline. Universal properties of citation patterns descend therefore from the fact that citation distributions for papers in a specific field are all part of the same family of univariate distributions.
Pattern formation in developing tissues involves dynamic spatio-temporal changes in cellular organization and subsequent evolution of functional adult structures. Branching morphogenesis is a developmental mechanism by which patterns are generated in many developing organs, which is controlled by underlying molecular pathways. Understanding the relationship between molecular signaling, cellular behavior and resulting morphological change requires quantification and categorization of the cellular behavior. In this study, tissue-level and cellular changes in developing salivary gland in response to disruption of ROCK-mediated signaling by are modeled by building cell-graphs to compute mathematical features capturing structural properties at multiple scales. These features were used to generate multiscale cell-graph signatures of untreated and ROCK signaling disrupted salivary gland organ explants. From confocal images of mouse submandibular salivary gland organ explants in which epithelial and mesenchymal nuclei were marked, a multiscale feature set capturing global structural properties, local structural properties, spectral, and morphological properties of the tissues was derived. Six feature selection algorithms and multiway modeling of the data was performed to identify distinct subsets of cell graph features that can uniquely classify and differentiate between different cell populations. Multiscale cell-graph analysis was most effective in classification of the tissue state. Cellular and tissue organization, as defined by a multiscale subset of cell-graph features, are both quantitatively distinct in epithelial and mesenchymal cell types both in the presence and absence of ROCK inhibitors. Whereas tensor analysis demonstrate that epithelial tissue was affected the most by inhibition of ROCK signaling, significant multiscale changes in mesenchymal tissue organization were identified with this analysis that were not identified in previous biological studies. We here show how to define and calculate a multiscale feature set as an effective computational approach to identify and quantify changes at multiple biological scales and to distinguish between different states in developing tissues.
Biomarkers of Mycobacterium tuberculosis complex (MTBC) mutate over time. Among the biomarkers of MTBC, spacer oligonucleotide type (spoligotype) and Mycobacterium Interspersed Repetitive Unit (MIRU) patterns are commonly used to genotype clinical MTBC strains. In this study, we present an evolution model of spoligotype rearrangements using MIRU patterns to disambiguate the ancestors of spoligotypes, in a large patient dataset from the United States Centers for Disease Control and Prevention (CDC). Based on the contiguous deletion assumption and rare observation of convergent evolution, we first generate the most parsimonious forest of spoligotypes, called a spoligoforest, using three genetic distance measures. An analysis of topological attributes of the spoligoforest and number of variations at the direct repeat (DR) locus of each strain reveals interesting properties of deletions in the DR region. First, we compare our mutation model to existing mutation models of spoligotypes and find that our mutation model produces as many within-lineage mutation events as other models, with slightly higher segregation accuracy. Second, based on our mutation model, the number of descendant spoligotypes follows a power law distribution. Third, contrary to prior studies, the power law distribution does not plausibly fit to the mutation length frequency. Finally, the total number of mutation events at consecutive DR loci follows a bimodal distribution, which results in accumulation of shorter deletions in the DR region. The two modes are spacers 13 and 40, which are hotspots for chromosomal rearrangements. The change point in the bimodal distribution is spacer 34, which is absent in most MTBC strains. This bimodal separation results in accumulation of shorter deletions, which explains why a power law distribution is not a plausible fit to the mutation length frequency.
tuberculosis; Mycobacterium tuberculosis complex; DR locus; spoligotype; MIRU-VNTR; mutation
Strains of Mycobacterium tuberculosis complex (MTBC) can be classified into major lineages based on their genotype. Further subdivision of major lineages into sublineages requires multiple biomarkers along with methods to combine and analyze multiple sources of information in one unsupervised learning model. Typically, spacer oligonucleotide type (spoligotype) and mycobacterial interspersed repetitive units (MIRU) are used for TB genotyping and surveillance. Here, we examine the sublineage structure of MTBC strains with multiple biomarkers simultaneously, by employing a tensor clustering framework (TCF) on multiple-biomarker tensors.
Simultaneous analysis of the spoligotype and MIRU type of strains using TCF on multiple-biomarker tensors leads to coherent sublineages of major lineages with clear and distinctive spoligotype and MIRU signatures. Comparison of tensor sublineages with SpolDB4 families either supports tensor sublineages, or suggests subdivision or merging of SpolDB4 families. High prediction accuracy of major lineage classification with supervised tensor learning on multiple-biomarker tensors validates our unsupervised analysis of sublineages on multiple-biomarker tensors.
TCF on multiple-biomarker tensors achieves simultaneous analysis of multiple biomarkers and suggest a new putative sublineage structure for each major lineage. Analysis of multiple-biomarker tensors gives insight into the sublineage structure of MTBC at the genomic level.
Computational analysis of tissue structure reveals sub-visual differences in tissue functional states by extracting quantitative signature features that establish a diagnostic profile. Incomplete and/or inaccurate profiles contribute to misdiagnosis.
In order to create more complete tissue structure profiles, we adapted our cell-graph method for extracting quantitative features from histopathology images to now capture temporospatial traits of three-dimensional collagen hydrogel cell cultures. Cell-graphs were proposed to characterize the spatial organization between the cells in tissues by exploiting graph theory wherein the nuclei of the cells constitute the nodes and the approximate adjacency of cells are represented with edges. We chose 11 different cell types representing non-tumorigenic, pre-cancerous, and malignant states from multiple tissue origins.
We built cell-graphs from the cellular hydrogel images and computed a large set of features describing the structural characteristics captured by the graphs over time. Using three-mode tensor analysis, we identified the five most significant features (metrics) that capture the compactness, clustering, and spatial uniformity of the 3D architectural changes for each cell type throughout the time course. Importantly, four of these metrics are also the discriminative features for our histopathology data from our previous studies.
Together, these descriptive metrics provide rigorous quantitative representations of image information that other image analysis methods do not. Examining the changes in these five metrics allowed us to easily discriminate between all 11 cell types, whereas differences from visual examination of the images are not as apparent. These results demonstrate that application of the cell-graph technique to 3D image data yields discriminative metrics that have the potential to improve the accuracy of image-based tissue profiles, and thus improve the detection and diagnosis of disease.
Cell cooperation is a critical event during tissue development. We present the first precise metrics to quantify the interaction between mesenchymal stem cells (MSCs) and extra cellular matrix (ECM). In particular, we describe cooperative collagen alignment process with respect to the spatio-temporal organization and function of mesenchymal stem cells in three dimensions.
We defined two precise metrics: Collagen Alignment Index and Cell Dissatisfaction Level, for quantitatively tracking type I collagen and fibrillogenesis remodeling by mesenchymal stem cells over time. Computation of these metrics was based on graph theory and vector calculus. The cells and their three dimensional type I collagen microenvironment were modeled by three dimensional cell-graphs and collagen fiber organization was calculated from gradient vectors. With the enhancement of mesenchymal stem cell differentiation, acceleration through different phases was quantitatively demonstrated. The phases were clustered in a statistically significant manner based on collagen organization, with late phases of remodeling by untreated cells clustering strongly with early phases of remodeling by differentiating cells. The experiments were repeated three times to conclude that the metrics could successfully identify critical phases of collagen remodeling that were dependent upon cooperativity within the cell population.
Definition of early metrics that are able to predict long-term functionality by linking engineered tissue structure to function is an important step toward optimizing biomaterials for the purposes of regenerative medicine.
The concept of using stem cells as self-renewing sources of healthy cells in regenerative medicine has existed for decades, but most applications have yet to achieve clinical success. A main reason for the lack of successful stem cell therapies is the difficulty in fully recreating the maintenance and control of the native stem cell niche. Improving the performance of transplanted stem cells therefore requires a better understanding of the cellular mechanisms guiding stem cell behavior in both native and engineered three-dimensional (3D) microenvironments. Most techniques, however, for uncovering mechanisms controlling cell behavior in vitro have been developed using 2D cell cultures and are of limited use in 3D environments such as engineered tissue constructs. Deciphering the mechanisms controlling stem cell fate in native and engineered 3D environments, therefore, requires rigorous quantitative techniques that permit mechanistic, hypothesis-driven studies of cell–microenvironment interactions. Here, we review the current understanding of 2D and 3D stem cell control mechanisms and propose an approach to uncovering the mechanisms that govern stem cell behavior in 3D.
Pathological examination of a biopsy is the most reliable and widely used technique to diagnose bone cancer. However, it suffers from both inter- and intra- observer subjectivity. Techniques for automated tissue modeling and classification can reduce this subjectivity and increases the accuracy of bone cancer diagnosis. This paper presents a graph theoretical method, called extracellular matrix (ECM)-aware cell-graph mining, that combines the ECM formation with the distribution of cells in hematoxylin and eosin (H&E) stained histopathological images of bone tissues samples. This method can identify different types of cells that coexist in the same tissue as a result of its functional state. Thus, it models the structure-function relationships more precisely and classifies bone tissue samples accurately for cancer diagnosis. The tissue images are segmented, using the eigenvalues of the Hessian matrix, to compute spatial coordinates of cell nuclei as the nodes of corresponding cell-graph. Upon segmentation a color code is assigned to each node based on the composition of its surrounding ECM. An edge is hypothesized (and established) between a pair of nodes if the corresponding cell membranes are in physical contact and if they share the same color. Hence, multiple colored-cell-graphs coexist in a tissue each modeling a different cell-type organization. Both topological and spectral features of ECM-aware cell-graphs are computed to quantify the structural properties of tissue samples and classify their different functional states as healthy, fractured, or cancerous using support vector machines. Classification accuracy comparison to related work shows that ECM-aware cell-graph approach yields 90.0% whereas Delaunay triangulation and simple cell-graph approach achieves 75.0% and 81.1% accuracy, respectively.
Colored Cell-Graphs; Cancer Diagnosis; Graph Mining; Tissue Classification
Systems biology refers to multidisciplinary approaches designed to uncover emergent properties of biological systems. Stem cells are an attractive target for this analysis, due to their broad therapeutic potential. A central theme of systems biology is the use of computational modeling to reconstruct complex systems from a wealth of reductionist, molecular data (e.g., gene/protein expression, signal transduction activity, metabolic activity, etc.). A number of deterministic, probabilistic, and statistical learning models are used to understand sophisticated cellular behaviors such as protein expression during cellular differentiation and the activity of signaling networks. However, many of these models are bimodal i.e., they only consider row-column relationships. In contrast, multiway modeling techniques (also known as tensor models) can analyze multimodal data, which capture much more information about complex behaviors such as cell differentiation. In particular, tensors can be very powerful tools for modeling the dynamic activity of biological networks over time. Here, we review the application of systems biology to stem cells and illustrate application of tensor analysis to model collagen-induced osteogenic differentiation of human mesenchymal stem cells.
We applied Tucker1, Tucker3, and Parallel Factor Analysis (PARAFAC) models to identify protein/gene expression patterns during extracellular matrix-induced osteogenic differentiation of human mesenchymal stem cells. In one case, we organized our data into a tensor of type protein/gene locus link × gene ontology category × osteogenic stimulant, and found that our cells expressed two distinct, stimulus-dependent sets of functionally related genes as they underwent osteogenic differentiation. In a second case, we organized DNA microarray data in a three-way tensor of gene IDs × osteogenic stimulus × replicates, and found that application of tensile strain to a collagen I substrate accelerated the osteogenic differentiation induced by a static collagen I substrate.
Our results suggest gene- and protein-level models whereby stem cells undergo transdifferentiation to osteoblasts, and lay the foundation for mechanistic, hypothesis-driven studies. Our analysis methods are applicable to a wide range of stem cell differentiation models.
Recently, we demonstrated that human mesenchymal stem cells (hMSC) stimulated with dexamethazone undergo gene focusing during osteogenic differentiation (Stem Cells Dev 14(6): 1608–20, 2005). Here, we examine the protein expression profiles of three additional populations of hMSC stimulated to undergo osteogenic differentiation via either contact with pro-osteogenic extracellular matrix (ECM) proteins (collagen I, vitronectin, or laminin-5) or osteogenic media supplements (OS media). Specifically, we annotate these four protein expression profiles, as well as profiles from naïve hMSC and differentiated human osteoblasts (hOST), with known gene ontologies and analyze them as a tensor with modes for the expressed proteins, gene ontologies, and stimulants.
Direct component analysis in the gene ontology space identifies three components that account for 90% of the variance between hMSC, osteoblasts, and the four stimulated hMSC populations. The directed component maps the differentiation stages of the stimulated stem cell populations along the differentiation axis created by the difference in the expression profiles of hMSC and hOST. Surprisingly, hMSC treated with ECM proteins lie closer to osteoblasts than do hMSC treated with OS media. Additionally, the second component demonstrates that proteomic profiles of collagen I- and vitronectin-stimulated hMSC are distinct from those of OS-stimulated cells. A three-mode tensor analysis reveals additional focus proteins critical for characterizing the phenotypic variations between naïve hMSC, partially differentiated hMSC, and hOST.
The differences between the proteomic profiles of OS-stimulated hMSC and ECM-hMSC characterize different transitional phenotypes en route to becoming osteoblasts. This conclusion is arrived at via a three-mode tensor analysis validated using hMSC plated on laminin-5.