Related Articles

1.  Characterization of Vascular Disease Risk in Postmenopausal Women and Its Association with Cognitive Performance 
PLoS ONE  2013;8(7):e68741.
Objectives
While global measures of cardiovascular (CV) risk are used to guide prevention and treatment decisions, these estimates fail to account for the considerable interindividual variability in pre-clinical risk status. This study investigated heterogeneity in CV risk factor profiles and its association with demographic, genetic, and cognitive variables.
Methods
A latent profile analysis was applied to data from 727 recently postmenopausal women enrolled in the Kronos Early Estrogen Prevention Study (KEEPS). Women were cognitively healthy, within three years of their last menstrual period, and free of current or past CV disease. Education level, apolipoprotein E ε4 allele (APOE4), ethnicity, and age were modeled as predictors of latent class membership. The association between class membership, characterizing CV risk profiles, and performance on five cognitive factors was examined. A supervised random forest algorithm with a 10-fold cross-validation estimator was used to test accuracy of CV risk classification.
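The 10-fold cross-validation step mentioned above can be sketched in plain Python. This is a minimal illustration only: the abstract does not give the implementation, so the fold assignment, the helper names (`k_fold_indices`, `cross_val_accuracy`) and the toy data are assumptions, and a simple one-dimensional threshold rule stands in for the random forest.

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k interleaved folds (an assumed scheme)."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_val_accuracy(X, y, fit, predict, k=10):
    """Train on k-1 folds, test on the held-out fold, pool the accuracy."""
    n = len(X)
    correct = 0
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / n


# Stand-in classifier (NOT a random forest): threshold halfway between class means.
def fit_threshold(X, y):
    mean0 = sum(x for x, label in zip(X, y) if label == 0) / y.count(0)
    mean1 = sum(x for x, label in zip(X, y) if label == 1) / y.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Invented toy data: one risk score per subject, two well-separated classes.
X = [float(v) for v in range(10)] + [float(v) for v in range(20, 30)]
y = [0] * 10 + [1] * 10
accuracy = cross_val_accuracy(X, y, fit_threshold, predict_threshold, k=10)
```

Any classifier exposing a `fit` and `predict` pair could be passed in place of the threshold rule; the cross-validation loop itself does not change.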
Results
The best-fitting model generated two distinct phenotypic classes of CV risk: 62% of women were “low-risk” and 38% “high-risk”. Women classified as low-risk outperformed high-risk women on language and mental flexibility tasks (p = 0.008) and a global measure of cognition (p = 0.029). Women with a college degree or above were more likely to be in the low-risk class (OR = 1.595, p = 0.044). Older age and Hispanic ethnicity increased the probability of being in the high-risk class (OR = 1.140, p = 0.002; OR = 2.622, p = 0.012; respectively). The prevalence of APOE-ε4 was higher in the high-risk class than in the low-risk class.
Conclusion
Among recently menopausal women, significant heterogeneity in CV risk is associated with education level, age, ethnicity, and genetic indicators. The model-based latent classes were also associated with cognitive function. These differences may point to phenotypes for CV disease risk. Evaluating the evolution of phenotypes could in turn clarify preclinical disease, and screening and preventive strategies.
ClinicalTrials.gov NCT00154180
doi:10.1371/journal.pone.0068741
PMCID: PMC3714288  PMID: 23874743
2.  Patterns of basal signaling heterogeneity can distinguish cellular populations with different drug sensitivities 
Non small cell lung cancer H460 clones exhibit a high degree of heterogeneity in signaling states. Clones with similar patterns of basal signaling heterogeneity have similar paclitaxel sensitivities. Models of signaling heterogeneity among the clones can be used to classify sensitivity to paclitaxel for other cancer populations.
A high degree of phenotypic diversity has been classically observed among cancer cells, even within a single tumor (Heppner, 1984; Anderson et al, 2006; Ichim and Wells, 2006; Campbell and Polyak, 2007). Importantly, not all cancer cells contribute equally to disease progression or respond equally to therapeutic intervention (Campbell and Polyak, 2007). This heterogeneity has traditionally been viewed as an impediment to efficient diagnosis and treatment. Understanding the relevance of cellular diversity to cancer requires methods for relating patterns of phenotypic heterogeneity to functional outcomes, such as drug sensitivity. Recent advances in fluorescence microscopy image-based analysis have enabled quantitative single-cell measurements of the activation and (co-)localization of signaling molecules within large cellular populations (Boland and Murphy, 2001; Perlman et al, 2004). Here, we apply this technology to explore the extent to which patterns of basal signaling heterogeneity, present within cancer populations before treatment, reveal information about population-level response to drug perturbation.
To investigate basal cell signaling heterogeneity among a collection of cancer populations having minimal exogenous differences, such as those due to environment, cell type, and genetic background, we generated a collection of 49 low-passage clonal populations from the highly metastatic nonsmall cell lung cancer cell line H460 (Kozaki et al, 2000). We chose to observe patterns of spatial organization and activation for multiple components from diverse signaling pathways associated with cancer (marker sets 1–4: DNA/pSTAT3/pPTEN; DNA/pERK/pP38; DNA/E-cadherin/β-catenin/pGSK3; DNA/pAkt/H3K9-Ac).
We identified an objective set of signaling stereotypes from each marker set based on a probabilistic description of the distribution of cells in the feature space. For each marker set, a ‘reference' set of representative cells was sampled from all 50 H460 cancer populations. Then, each reference set was represented as a mixture of subpopulations modeled as Gaussian distributions with means centered on distinct, ‘stereotyped' signaling states (Slack et al, 2008). Our quantitative analysis suggested that a small collection of signaling stereotypes was sufficient to characterize the complexity of observed cellular phenotypes among all clones. For simplicity, we chose to use five subpopulations to model cellular heterogeneity in each marker set.
For each clone, we computed the fraction of cells in each of the identified subpopulations (Figure 2, scatter plots). Estimation of these fractions allowed us to represent each clone as a probabilistic ensemble of subpopulations. Visual differences among the clones (Figure 2, thumbnail images) were reflected by clear differences in subpopulation mixtures (Figure 2, scatter plots). To compare the subpopulation mixtures of each clone to the parent, a ‘subpopulation enrichment' profile vector was computed. The vector measured the log-fold change between the clone and the H460 parent population for each subpopulation (Figure 2, heat map).
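The subpopulation enrichment profile described above can be sketched as a per-subpopulation log-fold change between clone and parent fractions. The abstract does not state the log base or how empty subpopulations are handled, so the base-2 logarithm, the `pseudocount` parameter and the example fractions below are assumptions for illustration.

```python
import math


def enrichment_profile(clone_fractions, parent_fractions, pseudocount=1e-3):
    """Log2 fold change of each subpopulation fraction, clone vs. parent.

    The pseudocount (an assumption) avoids log(0) for empty subpopulations.
    """
    return [
        math.log2((c + pseudocount) / (p + pseudocount))
        for c, p in zip(clone_fractions, parent_fractions)
    ]


# Invented fractions over five subpopulations (each list sums to 1).
parent = [0.2, 0.2, 0.2, 0.2, 0.2]
clone = [0.4, 0.2, 0.1, 0.2, 0.1]
profile = enrichment_profile(clone, parent)
# Positive entries: subpopulation enriched in the clone; negative: depleted.
```

A vector of this kind, one per clone and marker set, is what the clustering and scaling steps described in the following paragraphs would take as input.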
We applied hierarchical clustering to group clones based on the similarity of their subpopulation enrichment profiles (Figure 2). Clustering by subpopulation enrichment profiles revealed only a small number of distinct patterns (or ‘signatures') of subpopulation mixtures (Figure 2, dendrogram and heat map). Thus, parameterization of observed cellular heterogeneity using subpopulation enrichment profiles succinctly encapsulated the apparent complexity of cancer cell phenotypes, and further allowed comparison of clonal populations at a resolution greater than provided by population means.
We next assessed the degree to which clones with distinct patterns of heterogeneity had distinct responses to the drug paclitaxel. We used a multidimensional scaling (Borg and Groenen, 1997) plot to visualize similarity among the clones and annotated each clone with the index of drug sensitivity. This visualization revealed striking geometric separation in ‘profile space' of paclitaxel-sensitive from paclitaxel-nonsensitive clones for each marker set (Figure 3A, green versus red and black circles). The significance of separation was further confirmed by machine learning-based classification studies. Thus heterogeneity of basal cellular signaling states contained information that could be used to predict sensitivity to drug treatment.
Our approach is general, and makes heterogeneity a computable property of cellular populations. Interrogation at subpopulation-resolution facilitated a dramatic reduction in the observed phenotypic complexity of cancer populations, yet retained sufficient biological information to identify drug responses. Our work suggests that rigorous analysis of cancer heterogeneity can provide a new resolution at which to match disease to more effective therapies.
Phenotypic heterogeneity has been widely observed in cellular populations. However, the extent to which heterogeneity contains biologically or clinically important information is not well understood. Here, we investigated whether patterns of basal signaling heterogeneity, in untreated cancer cell populations, could distinguish cellular populations with different drug sensitivities. We modeled cellular heterogeneity as a mixture of stereotyped signaling states, identified based on colocalization patterns of activated signaling molecules from microscopy images. We found that patterns of heterogeneity could be used to separate the most sensitive and resistant populations to paclitaxel within a set of H460 lung cancer clones and within the NCI-60 panel of cancer cell lines, but not for a set of less heterogeneous, immortalized noncancer human bronchial epithelial cell (HBEC) clones. Our results suggest that patterns of signaling heterogeneity, characterized as ensembles of a small number of distinct phenotypic states, can reveal functional differences among cellular populations.
doi:10.1038/msb.2010.22
PMCID: PMC2890326  PMID: 20461076
cancer; heterogeneity; multivariate analysis; signaling; systems biology
3.  Cascaded discrimination of normal, abnormal, and confounder classes in histopathology: Gleason grading of prostate cancer 
BMC Bioinformatics  2012;13:282.
Background
Automated classification of histopathology involves identification of multiple classes, including benign, cancerous, and confounder categories. The confounder tissue classes can often mimic and share attributes with both the diseased and normal tissue classes, and can be particularly difficult to identify, both manually and by automated classifiers. In the case of prostate cancer, there may be several confounding tissue types present in a biopsy sample, which are major sources of diagnostic error for pathologists. Two common multi-class approaches are one-shot classification (OSC), where all classes are identified simultaneously, and one-versus-all (OVA), where a “target” class is distinguished from all “non-target” classes. OSC is typically unable to handle discrimination of classes of varying similarity (e.g. with images of prostate atrophy and high grade cancer), while OVA forces several heterogeneous classes into a single “non-target” class. In this work, we present a cascaded (CAS) approach to classifying prostate biopsy tissue samples, where images from different classes are grouped to maximize intra-group homogeneity while maximizing inter-group heterogeneity.
Results
We apply the CAS approach to categorize 2000 tissue samples taken from 214 patient studies into seven classes: epithelium, stroma, atrophy, prostatic intraepithelial neoplasia (PIN), and prostate cancer Gleason grades 3, 4, and 5. A series of increasingly granular binary classifiers are used to split the different tissue classes until the images have been categorized into a single unique class. Our automatically-extracted image feature set includes architectural features based on location of the nuclei within the tissue sample as well as texture features extracted on a per-pixel level. The CAS strategy yields a positive predictive value (PPV) of 0.86 in classifying the 2000 tissue images into one of 7 classes, compared with the OVA (0.77 PPV) and OSC approaches (0.76 PPV).
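A cascade of binary classifiers of the kind described above can be sketched as a small decision tree whose leaves are the seven tissue classes. The abstract does not specify how classes are grouped at each stage or which classifiers are used, so the grouping, the score names (`cancer_score`, `grade_score`, and so on) and the 0.5 thresholds below are all hypothetical placeholders.

```python
class Split:
    """One binary decision in the cascade; leaves are plain class-name strings."""

    def __init__(self, name, test, if_true, if_false):
        self.name = name
        self.test = test          # callable: features dict -> bool
        self.if_true = if_true    # Split or leaf label
        self.if_false = if_false  # Split or leaf label


def classify(node, features):
    """Walk the cascade until a single class label is reached."""
    path = []
    while isinstance(node, Split):
        outcome = node.test(features)
        path.append((node.name, outcome))
        node = node.if_true if outcome else node.if_false
    return node, path


# Hypothetical grouping: cancer vs. non-cancer first, then finer splits.
CASCADE = Split(
    "cancer_vs_rest", lambda f: f["cancer_score"] > 0.5,
    Split(
        "grade3_vs_higher", lambda f: f["grade_score"] < 0.5,
        "Gleason 3",
        Split("grade4_vs_5", lambda f: f["grade5_score"] < 0.5,
              "Gleason 4", "Gleason 5"),
    ),
    Split(
        "confounder_vs_normal", lambda f: f["confounder_score"] > 0.5,
        Split("atrophy_vs_pin", lambda f: f["atrophy_score"] > 0.5,
              "atrophy", "PIN"),
        Split("epithelium_vs_stroma", lambda f: f["epithelium_score"] > 0.5,
              "epithelium", "stroma"),
    ),
)

label, path = classify(CASCADE, {"cancer_score": 0.9, "grade_score": 0.2})
```

Each sample only passes through the classifiers on its own path, so every binary decision separates groups that are comparatively homogeneous at that stage.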
Conclusions
Use of the CAS strategy increases the PPV for a multi-category classification system over two common alternative strategies. In classification problems such as histopathology, where multiple class groups exist with varying degrees of heterogeneity, the CAS system can intelligently assign class labels to objects by performing multiple binary classifications according to domain knowledge.
doi:10.1186/1471-2105-13-282
PMCID: PMC3563463  PMID: 23110677
4.  A hierarchical Naïve Bayes Model for handling sample heterogeneity in classification problems: an application to tissue microarrays 
BMC Bioinformatics  2006;7:514.
Background
Uncertainty often affects molecular biology experiments and data for different reasons. Heterogeneity of gene or protein expression within the same tumor tissue is an example of biological uncertainty which should be taken into account when molecular markers are used in decision making. Tissue Microarray (TMA) experiments allow for large scale profiling of tissue biopsies, investigating protein patterns characterizing specific disease states. TMA studies deal with multiple sampling of the same patient, and therefore with multiple measurements of the same protein target, to account for possible biological heterogeneity. The aim of this paper is to provide and validate a classification model taking into consideration the uncertainty associated with measuring replicate samples.
Results
We propose an extension of the well-known Naïve Bayes classifier, which accounts for biological heterogeneity in a probabilistic framework, relying on Bayesian hierarchical models. The model, which can be efficiently learned from the training dataset, exploits a closed-form classification equation, and thus adds no computational cost with respect to the standard Naïve Bayes classifier. We validated the approach on several simulated datasets, comparing its performance with that of the Naïve Bayes classifier. Moreover, we demonstrated that explicitly dealing with heterogeneity can improve classification accuracy on a TMA prostate cancer dataset.
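One way such a closed-form rule can arise is sketched below for a single feature. The abstract does not give the model equations, so this assumes a normal-normal hierarchy: replicates are drawn around a sample-specific mean with a class-specific within-sample variance, and that mean is drawn around a class mean with a between-sample variance. All parameter values and names are hypothetical and would normally be estimated from training data.

```python
import math


def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_marginal(replicates, class_mean, between_var, within_var):
    """Log marginal likelihood of replicates, sample mean integrated out.

    Assumed model: x_j ~ N(mu, within_var), mu ~ N(class_mean, between_var).
    """
    n = len(replicates)
    xbar = sum(replicates) / n
    ss = sum((x - xbar) ** 2 for x in replicates)
    # Term for the replicate average: variance shrinks with more replicates.
    log_mean_term = normal_logpdf(xbar, class_mean, between_var + within_var / n)
    # Term for the spread of replicates around their own average.
    log_spread_term = (
        -(n - 1) / 2 * math.log(2 * math.pi * within_var)
        - 0.5 * math.log(n)
        - ss / (2 * within_var)
    )
    return log_mean_term + log_spread_term


def classify(replicates, class_params, priors):
    scores = {
        c: math.log(priors[c]) + log_marginal(replicates, *params)
        for c, params in class_params.items()
    }
    return max(scores, key=scores.get)


# Hypothetical parameters: (class_mean, between_var, within_var).
class_params = {"A": (0.0, 1.0, 0.25), "B": (3.0, 1.0, 0.25)}
priors = {"A": 0.5, "B": 0.5}
predicted = classify([2.8, 3.1, 3.3], class_params, priors)
```

With several features, the per-feature log marginals would simply be summed under the usual Naïve Bayes independence assumption.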
Conclusion
The proposed Hierarchical Naïve Bayes classifier can be conveniently applied in problems where within-sample heterogeneity must be taken into account, such as TMA experiments and biological contexts where several measurements (replicates) are available for the same biological sample. The new approach performs better than the standard Naïve Bayes model, in particular when the within-sample heterogeneity differs between classes.
doi:10.1186/1471-2105-7-514
PMCID: PMC1698579  PMID: 17125514
5.  Developmental Profiles of Eczema, Wheeze, and Rhinitis: Two Population-Based Birth Cohort Studies 
PLoS Medicine  2014;11(10):e1001748.
Using data from two population-based birth cohorts, Danielle Belgrave and colleagues examine the evidence for atopic march in developmental profiles for allergic disorders.
Please see later in the article for the Editors' Summary
Background
The term “atopic march” has been used to imply a natural progression of a cascade of symptoms from eczema to asthma and rhinitis through childhood. We hypothesize that this expression does not adequately describe the natural history of eczema, wheeze, and rhinitis during childhood. We propose that this paradigm arose from cross-sectional analyses of longitudinal studies, and may reflect a population pattern that may not predominate at the individual level.
Methods and Findings
Data from 9,801 children in two population-based birth cohorts were used to determine individual profiles of eczema, wheeze, and rhinitis and whether the manifestations of these symptoms followed an atopic march pattern. Children were assessed at ages 1, 3, 5, 8, and 11 y. We used Bayesian machine learning methods to identify distinct latent classes based on individual profiles of eczema, wheeze, and rhinitis. This approach allowed us to identify groups of children with similar patterns of eczema, wheeze, and rhinitis over time.
Using a latent disease profile model, the data were best described by eight latent classes: no disease (51.3%), atopic march (3.1%), persistent eczema and wheeze (2.7%), persistent eczema with later-onset rhinitis (4.7%), persistent wheeze with later-onset rhinitis (5.7%), transient wheeze (7.7%), eczema only (15.3%), and rhinitis only (9.6%). When latent variable modelling was carried out separately for the two cohorts, similar results were obtained. Highly concordant patterns of sensitisation were associated with different profiles of eczema, rhinitis, and wheeze. The main limitation of this study was the difference in wording of the questions used to ascertain the presence of eczema, wheeze, and rhinitis in the two cohorts.
Conclusions
The developmental profiles of eczema, wheeze, and rhinitis are heterogeneous; only a small proportion of children (∼7% of those with symptoms) follow trajectory profiles resembling the atopic march.
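The "∼7% of those with symptoms" figure can be checked against the class proportions reported above. The short calculation below uses only the percentages given in the abstract; the dictionary layout and variable names are illustrative.

```python
# Latent class sizes as reported in the abstract (percent of all children).
classes = {
    "no disease": 51.3,
    "atopic march": 3.1,
    "persistent eczema and wheeze": 2.7,
    "persistent eczema with later-onset rhinitis": 4.7,
    "persistent wheeze with later-onset rhinitis": 5.7,
    "transient wheeze": 7.7,
    "eczema only": 15.3,
    "rhinitis only": 9.6,
}

total = sum(classes.values())                 # about 100 (rounding in the source)
symptomatic = total - classes["no disease"]   # children in any symptom class
share = classes["atopic march"] / symptomatic
print(f"atopic march among symptomatic children: {share:.1%}")
```

The result is a little over 6%, consistent with the abstract's rounded statement that only about 7% of children with symptoms follow an atopic march profile.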
Editors' Summary
Background
Our immune system protects us from viruses, bacteria, and other pathogens by recognizing specific molecules on the invader's surface and initiating a sequence of events that culminates in the death of the pathogen. Sometimes, however, our immune system responds to harmless materials (allergens such as pollen) and triggers allergic, or atopic, symptoms. Common atopic symptoms include eczema (transient dry itchy patches on the skin), wheeze (high pitched whistling in the chest, a symptom of asthma), and rhinitis (sneezing or a runny nose in the absence of a cold or influenza). All these symptoms are very common during childhood, but recent epidemiological studies (examinations of the patterns and causes of diseases in a population) have revealed age-related changes in the proportions of children affected by each symptom. So, for example, eczema is more common in infants than in school-age children. These findings have led to the idea of “atopic march,” a natural progression of symptoms within individual children that starts with eczema, then progresses to wheeze and finally rhinitis.
Why Was This Study Done?
The concept of atopic march has led to the initiation of studies that aim to prevent the development of asthma in children who are thought to be at risk of asthma because they have eczema. Moreover, some guidelines recommend that clinicians tell parents that children with eczema may later develop asthma or rhinitis. However, because of the design of the epidemiological studies that support the concept of atopic march, children with eczema who later develop wheeze and rhinitis may actually belong to a distinct subgroup of children, rather than representing the typical progression of atopic diseases. It is important to know whether atopic march adequately describes the natural history of atopic diseases during childhood to avoid the imposition of unnecessary strategies on children with eczema to prevent asthma. Here, the researchers use machine learning techniques to model the developmental profiles of eczema, wheeze, and rhinitis during childhood in two large population-based birth cohorts by taking into account time-related (longitudinal) changes in symptoms within individuals. Machine learning is a data-driven approach that identifies structure within the data (for example, a typical progression of symptoms) using unsupervised learning of latent variables (variables that are not directly measured but are inferred from other observable characteristics).
What Did the Researchers Do and Find?
The researchers used data from two UK birth cohorts—the Avon Longitudinal Study of Parents and Children (ALSPAC) and the Manchester Asthma and Allergy Study (MAAS)—for their study (9,801 children in total). Both studies enrolled children at birth and monitored their subsequent health at regular review clinics. At each review clinic, information about eczema, wheeze, and rhinitis was collected from the parents using validated questionnaires. The researchers then used these data and machine learning methods to identify groups of children with similar patterns of onset of eczema, wheeze, and rhinitis over the first 11 years of life. Using a type of statistical model called a latent disease profile model, the researchers found that the data were best described by eight latent classes—no disease (51.3% of the children), atopic march (3.1%), persistent eczema and wheeze (2.7%), persistent eczema with later-onset rhinitis (4.7%), persistent wheeze with later-onset rhinitis (5.7%), transient wheeze (7.7%), eczema only (15.3%), and rhinitis only (9.6%).
What Do These Findings Mean?
These findings show that, in two large UK birth cohorts, the developmental profiles of eczema, wheeze, and rhinitis were heterogeneous. Most notably, the progression of symptoms fitted the profile of atopic march in fewer than 7% of children with symptoms. The researchers acknowledge that their study has some limitations. For example, small differences in the wording of the questions used to gather information from parents about their children's symptoms in the two cohorts may have slightly affected the findings. However, based on their findings, the researchers propose that, because eczema, wheeze, and rhinitis are common, these symptoms often coexist in individuals, but as independent entities rather than as a linked progression of symptoms. Thus, using eczema as an indicator of subsequent asthma risk and assigning “preventative” measures to children with eczema is flawed. Importantly, clinicians need to understand the heterogeneity of patterns of atopic diseases in children and to communicate this variability to parents when advising them about the development and resolution of atopic symptoms in their children.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001748.
The UK National Health Service Choices website provides information about eczema (including personal stories), asthma (including personal stories), and rhinitis
The US National Institute of Allergy and Infectious Diseases provides information about atopic diseases
The UK not-for-profit organization Allergy UK provides information about atopic diseases and a description of the atopic march
MedlinePlus encyclopedia has pages on eczema, wheezing, and rhinitis (in English and Spanish)
MedlinePlus provides links to further resources about allergies, eczema, and asthma (in English and Spanish)
Information about ALSPAC and MAAS is available
Wikipedia has pages on machine learning and latent disease profile models (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
doi:10.1371/journal.pmed.1001748
PMCID: PMC4204810  PMID: 25335105
6.  Sufficiency of FNAB aspirates of posterior uveal melanoma for cytologic versus GEP classification in 159 patients, and relative prognostic significance of these classifications 
Objective
To determine the relative sufficiency of paired aspirates of posterior uveal melanomas obtained by FNAB for cytopathology and GEP, and their prognostic significance for predicting death from metastasis.
Methods
Prospective non-randomized IRB-approved single-center longitudinal clinical study of 159 patients with posterior uveal melanoma sampled by FNAB in at least two tumor sites between 09/2007 and 12/2010. Cases were analyzed with regard to sufficiency of the obtained aspirates for cytopathologic classification and GEP classification. Statistical strength of associations between variables and GEP class was computed using Chi-square test. Cumulative actuarial survival curves of subgroups of these patients based on their cytopathologic versus GEP-assigned categories were computed by the Kaplan–Meier method. The endpoint for this survival analysis was death from metastatic uveal melanoma.
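For readers unfamiliar with the Kaplan–Meier method named above, a minimal product-limit estimator is sketched here. The follow-up times and event flags are invented toy values, not study data, and the function name is an assumption.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: 1 if the endpoint (e.g. metastatic death) occurred, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Invented toy cohort: months of follow-up and event indicators.
curve = kaplan_meier([6, 12, 12, 20, 30], [1, 0, 1, 0, 1])
```

Censored patients reduce the number at risk without lowering the curve, which is why the method suits cohorts with unequal follow-up.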
Results
FNAB aspirates were insufficient for cytopathologic classification in 34 of 159 cases (21.9 %). In contrast, FNAB aspirates were insufficient for GEP classification in only one of 159 cases (0.6 %). This difference is statistically significant (P < 0.001). Six of 34 tumors (17.6 %) that yielded an insufficient aspirate for cytopathologic diagnosis were categorized as GEP class 2, while 43 of 125 tumors (34.7 %) that yielded a sufficient aspirate for cytopathologic diagnosis were categorized as GEP class 2. To date, 14 of the 49 patients with a GEP class 2 tumor (28.6 %) but only five of the 109 patients with a GEP class 1 tumor (5.6 %) have developed metastasis. Fifteen of 125 patients (12 %) whose tumors yielded sufficient aspirates for cytopathologic classification but only four of 34 patients (11.8 %) whose tumors yielded insufficient aspirates for cytopathologic classification developed metastasis. The median post-biopsy follow-up time for surviving patients in this series was 32.5 months. Cumulative actuarial 5-year probability of death from metastasis was 14.1 % for those with an insufficient aspirate for cytopathologic classification versus 22.4 % for those with a sufficient aspirate for cytopathologic classification (log rank P = 0.68). In contrast, the cumulative actuarial 5-year probability of metastatic death was 8.0 % for those with an insufficient/unsatisfactory aspirate for GEP classification or GEP class 1 tumor, versus 45.0 % for those with a GEP class 2 tumor (log rank P = 0.005).
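The reported difference in insufficiency rates (34 of 159 versus 1 of 159) can be illustrated with a Pearson chi-square test on a 2×2 table. This is a sketch under a simplifying assumption: it treats the two sets of aspirates as independent groups, whereas the aspirates were paired within tumors, and the abstract does not say which test produced the reported P value for this comparison.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: cytopathology, GEP. Columns: insufficient, sufficient.
stat, p_value = chi_square_2x2(34, 125, 1, 158)
```

Under these assumptions the p-value falls far below 0.001, in line with the significance level the abstract reports.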
Conclusion
This study confirmed that GEP classification of posterior uveal melanoma cells obtained by FNAB is feasible in almost all cases, including most in which FNAB yields an insufficient aspirate for cytodiagnosis. The study also confirmed that GEP classification is substantially better than cytologic classification for predicting subsequent metastasis and metastatic death.
doi:10.1007/s00417-013-2515-0
PMCID: PMC3889697  PMID: 24270974
Melanoma; Uveal neoplasm; Choroidal melanoma; Cytology; Biopsy, needle/methods; Gene expression profile; Survival prognosis; Melanoma/metastasis
7.  Metabolic network reconstruction of Chlamydomonas offers insight into light-driven algal metabolism 
A comprehensive genome-scale metabolic network of Chlamydomonas reinhardtii, including a detailed account of light-driven metabolism, is reconstructed and validated. The model provides a new resource for research of C. reinhardtii metabolism and in algal biotechnology.
The genome-scale metabolic network of Chlamydomonas reinhardtii (iRC1080) was reconstructed, accounting for >32% of the estimated metabolic genes encoded in the genome, and including extensive details of lipid metabolic pathways. This is the first metabolic network to explicitly account for stoichiometry and wavelengths of metabolic photon usage, providing a new resource for research of C. reinhardtii metabolism and developments in algal biotechnology. Metabolic functional annotation and the largest transcript verification of a metabolic network to date was performed, at least partially verifying >90% of the transcripts accounted for in iRC1080. Analysis of the network supports hypotheses concerning the evolution of latent lipid pathways in C. reinhardtii, including very long-chain polyunsaturated fatty acid and ceramide synthesis pathways. A novel approach for modeling light-driven metabolism was developed that accounts for both light source intensity and spectral quality of emitted light. The constructs resulting from this approach, termed prism reactions, were shown to significantly improve the accuracy of model predictions, and their use was demonstrated for evaluation of light source efficiency and design.
Algae have garnered significant interest in recent years, especially for their potential application in biofuel production. The model eukaryotic microalga Chlamydomonas reinhardtii has been widely used to study photosynthesis, cell motility and phototaxis, cell wall biogenesis, and other fundamental cellular processes (Harris, 2001). Characterizing algal metabolism is key to engineering production strains and understanding photobiological phenomena. Based on extensive literature on C. reinhardtii metabolism, its genome sequence (Merchant et al, 2007), and gene functional annotation, we have reconstructed and experimentally validated the genome-scale metabolic network for this alga, iRC1080, the first network to account for detailed photon absorption permitting growth simulations under different light sources. iRC1080 accounts for 1080 genes, associated with 2190 reactions and 1068 unique metabolites and encompasses 83 subsystems distributed across 10 cellular compartments (Figure 1A). Its >32% coverage of estimated metabolic genes is a tremendous expansion over previous algal reconstructions (Boyle and Morgan, 2009; Manichaikul et al, 2009). The lipid metabolic pathways of iRC1080 are considerably expanded relative to existing networks, and chemical properties of all metabolites in these pathways are accounted for explicitly, providing sufficient detail to completely specify all individual molecular species: backbone molecule and stereochemical numbering of acyl-chain positions; acyl-chain length; and number, position, and cis–trans stereoisomerism of carbon–carbon double bonds. Such detail in lipid metabolism will be critical for model-driven metabolic engineering efforts.
We experimentally verified transcripts accounted for in the network under permissive growth conditions, detecting >90% of tested transcript models (Figure 1B) and providing validating evidence for the contents of iRC1080. We also analyzed the extent of transcript verification by specific metabolic subsystems. Some subsystems stood out as more poorly verified, including chloroplast and mitochondrial transport systems and sphingolipid metabolism, all of which exhibited <80% of transcripts detected, reflecting incomplete characterization of compartmental transporters and supporting a hypothesis of latent pathway evolution for ceramide synthesis in C. reinhardtii. Additional lines of evidence from the reconstruction effort similarly support this hypothesis including lack of ceramide synthetase and other annotation gaps downstream in sphingolipid metabolism. A similar hypothesis of latent pathway evolution was established for very long-chain fatty acids (VLCFAs) and their polyunsaturated analogs (VLCPUFAs) (Figure 1C), owing to the absence of this class of lipids in previous experimental measurements, lack of a candidate VLCFA elongase in the functional annotation, and additional downstream annotation gaps in arachidonic acid metabolism.
The network provides a detailed account of metabolic photon absorption by light-driven reactions, including photosystems I and II, light-dependent protochlorophyllide oxidoreductase, provitamin D3 photoconversion to vitamin D3, and rhodopsin photoisomerase; this network accounting permits the precise modeling of light-dependent metabolism. iRC1080 accounts for effective light spectral ranges through analysis of biochemical activity spectra (Figure 3A), either reaction activity or absorbance at varying light wavelengths. Defining effective spectral ranges associated with each photon-utilizing reaction enabled our network to model growth under different light sources via stoichiometric representation of the spectral composition of emitted light, termed prism reactions. Coefficients for different photon wavelengths in a prism reaction correspond to the ratios of photon flux in the defined effective spectral ranges to the total emitted photon flux from a given light source (Figure 3B). This approach distinguishes the amount of emitted photons that drive different metabolic reactions. We created prism reactions for most light sources that have been used in published studies for algal and plant growth including solar light, various light bulbs, and LEDs. We also included regulatory effects, resulting from lighting conditions insofar as published studies enabled. Light and dark conditions have been shown to affect metabolic enzyme activity in C. reinhardtii on multiple levels: transcriptional regulation, chloroplast RNA degradation, translational regulation, and thioredoxin-mediated enzyme regulation. Through application of our light model and prism reactions, we were able to closely recapitulate experimental growth measurements under solar, incandescent, and red LED lights. Through unbiased sampling, we were able to establish the tremendous statistical significance of the accuracy of growth predictions achievable through implementation of prism reactions. 
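The prism reaction coefficients described above, the ratio of photon flux inside each effective spectral range to the total emitted flux, can be sketched as follows. The emission spectrum, the range boundaries and the reaction names are invented for illustration and are not taken from iRC1080.

```python
def prism_coefficients(spectrum, effective_ranges):
    """Fraction of total emitted photon flux falling in each effective range.

    spectrum:         {wavelength_nm: photon_flux} for one light source
    effective_ranges: {reaction_name: (low_nm, high_nm)}, bounds inclusive
    """
    total_flux = sum(spectrum.values())
    coefficients = {}
    for reaction, (low, high) in effective_ranges.items():
        flux_in_range = sum(
            flux for wavelength, flux in spectrum.items()
            if low <= wavelength <= high
        )
        coefficients[reaction] = flux_in_range / total_flux
    return coefficients


# Hypothetical light source and hypothetical effective spectral ranges.
spectrum = {450: 10.0, 550: 20.0, 680: 50.0, 700: 20.0}
effective_ranges = {
    "red_driven_reaction": (660, 700),
    "blue_driven_reaction": (400, 500),
}
coefficients = prism_coefficients(spectrum, effective_ranges)
```

In this sketch the coefficients need not sum to one: some emitted photons fall outside every effective range, and ranges for different reactions could overlap.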
Finally, application of the photosynthetic model was demonstrated prospectively to evaluate light utilization efficiency under different light sources. The results suggest that, of the existing light sources, red LEDs provide the greatest efficiency, about three times as efficient as sunlight. Extending this analysis, the model was applied to design a maximally efficient LED spectrum for algal growth. The result was a 677-nm peak LED spectrum with a total incident photon flux of 360 μE/m2/s, suggesting that for the simple objective of maximizing growth efficiency, LED technology has already reached an effective theoretical optimum.
In summary, the C. reinhardtii metabolic network iRC1080 that we have reconstructed offers insight into the basic biology of this species and may be employed prospectively for genetic engineering design and light source design relevant to algal biotechnology. iRC1080 was used to analyze lipid metabolism and generate novel hypotheses about the evolution of latent pathways. The predictive capacity of metabolic models developed from iRC1080 was demonstrated in simulating mutant phenotypes and in evaluation of light source efficiency. Our network provides a broad knowledgebase of the biochemistry and genomics underlying global metabolism of a photoautotroph, and our modeling approach for light-driven metabolism exemplifies how integration of largely unvisited data types, such as physicochemical environmental parameters, can expand the diversity of applications of metabolic networks.
Metabolic network reconstruction encompasses existing knowledge about an organism's metabolism and genome annotation, providing a platform for omics data analysis and phenotype prediction. The model alga Chlamydomonas reinhardtii is employed to study diverse biological processes from photosynthesis to phototaxis. Recent heightened interest in this species results from an international movement to develop algal biofuels. Integrating biological and optical data, we reconstructed a genome-scale metabolic network for this alga and devised a novel light-modeling approach that enables quantitative growth prediction for a given light source, resolving wavelength and photon flux. We experimentally verified transcripts accounted for in the network and physiologically validated model function through simulation and generation of new experimental growth data, providing high confidence in network contents and predictive applications. The network offers insight into algal metabolism and potential for genetic engineering and efficient light source design, a pioneering resource for studying light-driven metabolism and quantitative systems biology.
doi:10.1038/msb.2011.52
PMCID: PMC3202792  PMID: 21811229
Chlamydomonas reinhardtii; lipid metabolism; metabolic engineering; photobioreactor
8.  Phenotype Recognition with Combined Features and Random Subspace Classifier Ensemble 
BMC Bioinformatics  2011;12:128.
Background
Automated, image-based high-content screening is a fundamental tool for discovery in biological science. Modern robotic fluorescence microscopes are able to capture thousands of images from massively parallel experiments such as RNA interference (RNAi) or small-molecule screens. As such, efficient computational methods are required for automatic cellular phenotype identification capable of dealing with large image data sets. In this paper we investigated an efficient method for the extraction of quantitative features from images by combining second-order statistics, or Haralick features, with the curvelet transform. A random subspace based classifier ensemble with a multilayer perceptron (MLP) as the base classifier was then exploited for classification. Haralick features estimate image properties related to second-order statistics based on the grey level co-occurrence matrix (GLCM), which has been extensively used for various image processing applications. The curvelet transform gives a sparser representation of the image than the wavelet transform, thus offering a description with higher time-frequency resolution and a high degree of directionality and anisotropy, which is particularly appropriate for many images rich with edges and curves. A combined feature description from Haralick features and the curvelet transform can further increase the accuracy of classification by exploiting their complementary information. We then investigated the applicability of the random subspace (RS) ensemble method for phenotype classification based on microscopy images. Each base classifier is trained with an RS-sampled subset of the original feature set and the ensemble assigns a class label by majority voting.
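The random subspace scheme described above (each base classifier trained on a randomly sampled subset of features, with the ensemble label chosen by majority vote) can be sketched as follows. To keep the sketch dependency-free, a nearest-centroid classifier is swapped in for the MLP base classifier used in the paper; the function names, data, and parameter values are illustrative assumptions.

```python
import random
from collections import Counter


def fit_centroids(X, y, feats):
    """Nearest-centroid base learner (a stand-in for the paper's MLP):
    store the per-class mean of the selected features."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_centroid(centroids, row, feats):
    """Return the class whose centroid is closest in the selected features."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(feats, centroids[c]))
    return min(centroids, key=dist)


def fit_random_subspace(X, y, n_members=5, subspace_size=2, seed=0):
    """Train each ensemble member on its own random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_members):
        feats = rng.sample(range(n_features), subspace_size)
        ensemble.append((feats, fit_centroids(X, y, feats)))
    return ensemble


def predict_ensemble(ensemble, row):
    """Majority vote over the members' predictions."""
    votes = Counter(predict_centroid(c, row, feats) for feats, c in ensemble)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated classes with four features each.
X = [[0.0, 0.1, 0.2, 0.0], [0.1, 0.0, 0.1, 0.2],
     [1.0, 0.9, 1.1, 1.0], [0.9, 1.0, 1.0, 1.1]]
y = ["a", "a", "b", "b"]
ensemble = fit_random_subspace(X, y)
```

The subspace size and ensemble size are the two parameters whose estimation the abstract discusses; here they are fixed arbitrarily.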
Results
Experimental results on phenotype recognition from three benchmark image sets (HeLa, CHO and RNAi) show the effectiveness of the proposed approach. The combined feature gives better classification accuracy than any individual feature. The ensemble model performs better than its component neural networks. For the three image sets HeLa, CHO and RNAi, the random subspace ensemble achieves classification rates of 91.20%, 98.86% and 91.03% respectively, which compare favorably with the published results of 84%, 93% and 82% from the multi-purpose image classifier WND-CHARM, which applies wavelet transforms and other feature extraction methods. We investigated the estimation of ensemble parameters and found that a satisfactory performance improvement could be obtained with feature subsets of relatively medium dimensionality and a small ensemble size.
Conclusions
The multiscale and multidirectional character of the curvelet transform suits the description of microscopy images very well. It is empirically demonstrated that the curvelet-based feature is clearly preferable to the wavelet-based feature for bioimage description. The random subspace ensemble of MLPs performs much better than a number of commonly applied multi-class classifiers in the investigated application of phenotype recognition.
doi:10.1186/1471-2105-12-128
PMCID: PMC3098787  PMID: 21529372
9.  A full Bayesian hierarchical mixture model for the variance of gene differential expression 
BMC Bioinformatics  2007;8:124.
Background
In many laboratory-based high throughput microarray experiments, there are very few replicates of gene expression levels. Thus, estimates of gene variances are inaccurate. Visual inspection of graphical summaries of these data usually reveals that heteroscedasticity is present, and the standard approach to address this is to take a log2 transformation. In such circumstances, it is then common to assume that gene variability is constant when an analysis of these data is undertaken. However, this is perhaps too stringent an assumption. More careful inspection reveals that the simple log2 transformation does not remove the problem of heteroscedasticity. An alternative strategy is to assume independent gene-specific variances; although again this is problematic as variance estimates based on few replications are highly unstable. More meaningful and reliable comparisons of gene expression might be achieved, for different conditions or different tissue samples, where the test statistics are based on accurate estimates of gene variability; a crucial step in the identification of differentially expressed genes.
Results
We propose a Bayesian mixture model that classifies genes according to similarity in their variance. As a result, genes in the same latent class share a similar variance, estimated from a larger number of replicates than those available per gene, i.e. the total of all replicates of all genes in the same latent class. An example dataset, consisting of 9216 genes with four replicates per condition, resulted in four latent classes based on similarity of variance.
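The idea of estimating one variance from all replicates of all genes in a latent class can be illustrated with a simple pooled-variance calculation. This is only a sketch of the pooling step: it assumes class membership is already known and uses a classical pooled estimator, not the full Bayesian hierarchical mixture model proposed in the paper. The expression values are made up.

```python
def pooled_variance(genes):
    """Pooled within-gene variance for genes assumed to share one latent class.

    genes: list of replicate lists, one list per gene.
    Each gene keeps its own mean; squared deviations and degrees of
    freedom are accumulated across all genes in the class.
    """
    sum_sq, dof = 0.0, 0
    for reps in genes:
        mean = sum(reps) / len(reps)
        sum_sq += sum((x - mean) ** 2 for x in reps)
        dof += len(reps) - 1
    return sum_sq / dof


# Two hypothetical genes in the same class, four replicates each.
class_members = [[1.0, 2.0, 3.0, 4.0], [10.0, 12.0, 11.0, 13.0]]
shared = pooled_variance(class_members)
```

With four replicates per gene, a gene-specific estimate has 3 degrees of freedom, whereas the pooled estimate here already has 6; the gain grows with the number of genes in the class.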
Conclusion
The mixture variance model provides a realistic and flexible estimate for the variance of gene expression data under limited replicates. We believe that in using the latent class variances, estimated from a larger number of genes in each derived latent group, the p-values obtained are more robust than either using a constant gene or gene-specific variance estimate.
doi:10.1186/1471-2105-8-124
PMCID: PMC1876253  PMID: 17439644
10.  Locally dependent latent class models with covariates: an application to under-age drinking in the USA 
Summary
Under-age drinking is a long-standing public health problem in the USA, and the identification of under-age drinkers suffering alcohol-related problems has been difficult using diagnostic criteria that were developed in adult populations. For this reason, it is important to characterize patterns of drinking in adolescents that are associated with alcohol-related problems. Latent class analysis is a statistical technique for explaining heterogeneity in individual response patterns in terms of a smaller number of classes. However, the latent class analysis assumption of local independence may not be appropriate when examining behavioural profiles and could have implications for statistical inference. In addition, if covariates are included in the model, non-differential measurement is also assumed. We propose a flexible set of models for local dependence and differential measurement that use easily interpretable odds ratio parameterizations while simultaneously fitting a marginal regression model for the latent class prevalences. Estimation is based on solving a set of second-order estimating equations. This approach requires only specification of the first two moments and allows for the choice of simple ‘working’ covariance structures. The method is illustrated by using data from a large-scale survey of under-age drinking. This new approach indicates the effectiveness of introducing local dependence and differential measurement into latent class models for selecting substantively interpretable models over more complex models that are deemed empirically superior.
doi:10.1111/j.1467-985X.2008.00544.x
PMCID: PMC2600526  PMID: 19079793
Differential measurement; Latent class; Local dependence; Marginal regression; Odds ratio; Second-order estimating equations
11.  From Cellular Characteristics to Disease Diagnosis: Uncovering Phenotypes with Supercells 
PLoS Computational Biology  2013;9(9):e1003215.
Cell heterogeneity and the inherent complexity due to the interplay of multiple molecular processes within the cell pose difficult challenges for current single-cell biology. We introduce an approach that identifies a disease phenotype from multiparameter single-cell measurements, which is based on the concept of “supercell statistics”, a single-cell-based averaging procedure followed by a machine learning classification scheme. We are able to assess the optimal tradeoff between the number of single cells averaged and the number of measurements needed to capture phenotypic differences between healthy and diseased patients, as well as between different diseases that are difficult to diagnose otherwise. We apply our approach to two kinds of single-cell datasets, addressing the diagnosis of a premature aging disorder using images of cell nuclei, as well as the phenotypes of two non-infectious uveitides (the ocular manifestations of Behçet's disease and sarcoidosis) based on multicolor flow cytometry. In the former case, one nuclear shape measurement taken over a group of 30 cells is sufficient to classify samples as healthy or diseased, in agreement with usual laboratory practice. In the latter, our method is able to identify a minimal set of 5 markers that accurately predict Behçet's disease and sarcoidosis. This is the first time that a quantitative phenotypic distinction between these two diseases has been achieved. To obtain this clear phenotypic signature, about one hundred CD8+ T cells need to be measured. Although the molecular markers identified have been reported to be important players in autoimmune disorders, this is the first report pointing out that CD8+ T cells can be used to distinguish two systemic inflammatory diseases. 
Beyond these specific cases, the approach proposed here is applicable to datasets generated by other kinds of state-of-the-art and forthcoming single-cell technologies, such as multidimensional mass cytometry, single-cell gene expression, and single-cell full genome sequencing techniques.
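The averaging step behind "supercell statistics" can be sketched as follows: a single measurement is averaged over a group of cells, and the resulting averages are what a downstream classifier would see. Sampling without replacement, the function name, and the parameter values are assumptions made for illustration; the paper's exact sampling procedure and classifier are not reproduced here.

```python
import random


def make_supercells(cells, group_size, n_supercells, seed=0):
    """Build supercells by averaging one measurement over random groups
    of single cells (each group sampled without replacement)."""
    rng = random.Random(seed)
    supercells = []
    for _ in range(n_supercells):
        group = rng.sample(cells, group_size)
        supercells.append(sum(group) / group_size)
    return supercells


# Hypothetical single-cell measurements (e.g. one nuclear shape descriptor).
cells = [float(i % 7) for i in range(100)]
supercells = make_supercells(cells, group_size=30, n_supercells=10)
```

Larger groups reduce cell-to-cell noise in each averaged value but require more cells per sample, which is the tradeoff the abstract describes.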
Author Summary
The behavior of organisms is based on the concerted action occurring on an astonishing range of scales from the molecular to the organismal level. Molecular properties control the function of a cell, while cell ensembles form tissues and organs, which work together as an organism. In order to understand and characterize the molecular nature of the emergent properties of a cell, it is essential that multiple components of the cell are measured simultaneously in the same cell. Similarly, multiple cells must be measured in order to understand health and disease in the organism. In this work, we develop an approach that is able to determine how many cells, how many measurements per cell, and which measurements are needed to reliably diagnose disease. We apply this method to two different problems: the diagnosis of a premature aging disorder using images of cell nuclei, and the distinction between two similar autoimmune eye diseases using stained cells from patients' blood samples. Our findings shed new light on the role of specific kinds of immune system cells in systemic inflammatory diseases and may lead to improved diagnosis and treatment.
doi:10.1371/journal.pcbi.1003215
PMCID: PMC3763994  PMID: 24039568
12.  Latent Class Analysis With Distal Outcomes: A Flexible Model-Based Approach 
Although prediction of class membership from observed variables in latent class analysis is well understood, predicting an observed distal outcome from latent class membership is more complicated. A flexible model-based approach is proposed to empirically derive and summarize the class-dependent density functions of distal outcomes with categorical, continuous, or count distributions. A Monte Carlo simulation study is conducted to compare the performance of the new technique to two commonly used classify-analyze techniques: maximum-probability assignment and multiple pseudo-class draws. Simulation results show that the model-based approach produces substantially less biased estimates of the effect compared to either classify-analyze technique, particularly when the association between the latent class variable and the distal outcome is strong. In addition, we show that only the model-based approach is consistent. The approach is demonstrated empirically: latent classes of adolescent depression are used to predict smoking, grades, and delinquency. SAS syntax for implementing this approach using PROC LCA and a corresponding macro are provided.
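The two classify-analyze techniques used as comparisons above can be sketched from a matrix of posterior class-membership probabilities. This illustrates only the assignment step, under the assumption that the posteriors are already available from a fitted latent class model; it is not the model-based approach the paper proposes, and the function names are hypothetical.

```python
import random


def max_probability_assignment(posteriors):
    """Assign each individual to the class with the highest posterior probability."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def pseudo_class_draws(posteriors, n_draws=20, seed=0):
    """Draw class memberships repeatedly, each time sampling every
    individual's class from that individual's posterior distribution."""
    rng = random.Random(seed)
    classes = range(len(posteriors[0]))
    return [
        [rng.choices(classes, weights=p)[0] for p in posteriors]
        for _ in range(n_draws)
    ]


# Hypothetical posteriors for three individuals and two classes.
posteriors = [[0.7, 0.3], [0.2, 0.8], [0.55, 0.45]]
modal = max_probability_assignment(posteriors)
draws = pseudo_class_draws(posteriors, n_draws=5)
```

In a classify-analyze workflow the distal outcome would then be analyzed against `modal`, or against each set in `draws` with the results combined across draws.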
doi:10.1080/10705511.2013.742377
PMCID: PMC4240499  PMID: 25419096
latent class analysis; distal outcome; finite mixture model; pseudo-class draws
13.  HIV, Gender, Race, Sexual Orientation, and Sex Work: A Qualitative Study of Intersectional Stigma Experienced by HIV-Positive Women in Ontario, Canada 
PLoS Medicine  2011;8(11):e1001124.
Mona Loutfy and colleagues used focus groups to examine experiences of stigma and coping strategies among HIV-positive women in Ontario, Canada.
Background
HIV infection rates are increasing among marginalized women in Ontario, Canada. HIV-related stigma, a principal factor contributing to the global HIV epidemic, interacts with structural inequities such as racism, sexism, and homophobia. The study objective was to explore experiences of stigma and coping strategies among HIV-positive women in Ontario, Canada.
Methods and Findings
We conducted a community-based qualitative investigation using focus groups to understand experiences of stigma and discrimination and coping methods among HIV-positive women from marginalized communities. We conducted 15 focus groups with HIV-positive women in five cities across Ontario, Canada. Data were analyzed using thematic analysis to enhance understanding of the lived experiences of diverse HIV-positive women. Focus group participants (n = 104; mean age = 38 years; 69% ethnic minority; 23% lesbian/bisexual; 22% transgender) described stigma/discrimination and coping across micro (intra/interpersonal), meso (social/community), and macro (organizational/political) realms. Participants across focus groups attributed experiences of stigma and discrimination to: HIV-related stigma, sexism and gender discrimination, racism, homophobia and transphobia, and involvement in sex work. Coping strategies included resilience (micro), social networks and support groups (meso), and challenging stigma (macro).
Conclusions
HIV-positive women described interdependent and mutually constitutive relationships between marginalized social identities and inequities such as HIV-related stigma, sexism, racism, and homo/transphobia. These overlapping, multilevel forms of stigma and discrimination are representative of an intersectional model of stigma and discrimination. The present findings also suggest that micro, meso, and macro level factors simultaneously present barriers to health and well being—as well as opportunities for coping—in HIV-positive women's lives. Understanding the deleterious effects of stigma and discrimination on HIV risk, mental health, and access to care among HIV-positive women can inform health care provision, stigma reduction interventions, and public health policy.
Editors' Summary
Background
HIV-related stigma and discrimination—prejudice, negative attitudes, abuse, and maltreatment directed at people living with HIV—is a major factor contributing to the global HIV epidemic. HIV-related stigma, which devalues and stereotypes people living with HIV, increases vulnerability to HIV infection by reducing access to HIV prevention, testing, treatment, and support. At the personal (micro) level, HIV-related stigma can make it hard for people to take tests to determine their HIV status or to tell other people that they are HIV positive. At the social/community (meso) level, it can mean that HIV-positive people are ostracized from their communities. At the organizational/political (macro) level, it can mean that health-care workers treat HIV-positive people differently and that governments are deterred from taking fast, effective action against the HIV epidemic. In addition, HIV-related stigma is negatively associated with well-being among people living with HIV. Thus, among HIV-positive people, those who have experienced HIV-related stigma have higher levels of mental and physical illness.
Why Was This Study Done?
Racism (oppression and inequity founded on ethno-racial differences), sexism and gender discrimination (oppression and inequity based on gender bias in attitudes), and homophobia and transphobia (discrimination, fear, hostility, and violence towards nonheterosexual and transgender people, respectively) can also affect access to HIV services. However, little is known about how these different forms of stigma and discrimination interact (intersect). A better understanding of the effect of intersecting stigmas on people living with HIV could help in the development of stigma reduction interventions and HIV prevention, treatment and care programs, and could help to control global HIV infection rates. In this qualitative study (an analysis of people's attitudes and experiences rather than numerical data), the researchers investigate the intersection of HIV-related stigma, racism, sexism and gender discrimination, homophobia and transphobia among marginalized HIV-positive women in Ontario, Canada. As elsewhere in the world, HIV infection rates are increasing among women in Canada. Nearly 25% of people living with HIV in Canada are women and about a quarter of all new infections are in women. Moreover, there is a disproportionately high infection rate among marginalized women in Canada such as sex workers and lesbian, bisexual, and queer women.
What Did the Researchers Do and Find?
The researchers held 15 focus groups with 104 marginalized HIV-positive women who were recruited by word-of-mouth and through flyers circulated in community agencies serving women of diverse ethno-cultural origins. Each focus group explored topics that included challenges in daily life, medical issues and needs, and issues that were silenced within the participants' communities. The researchers analyzed the data from these focus groups using thematic analysis, an approach that identifies, analyzes, and reports themes in qualitative data. They found that women living with HIV in Ontario experienced multiple types of stigma at different levels. So, for example, women experienced HIV-related stigma at the micro (“If you're HIV-positive, you feel shameful”), meso (“The thing I hate most for people that test positive for HIV is that society ostracizes them”), and macro (“A lot of women are not getting employed because they have to disclose their status”) levels. The women also attributed their experiences of stigma and discrimination to sexism and gender discrimination, racism, homophobia and transphobia, and involvement in sex work at all three levels and described coping strategies at the micro (resilience; “I always live with hope”), meso (participation in social networks), and macro (challenging stigma) levels.
What Do These Findings Mean?
These findings indicate that marginalized HIV-positive women living in Ontario experience overlapping forms of stigma and discrimination and that these forms of stigma operate over micro, meso, and macro levels, as do the coping strategies adopted by the women. Together, these results support an intersectional model of stigma and discrimination that should help to inform discussions about the complexity of stigma and coping strategies. However, because only a small sample of nonrandomly selected women was involved in this study, these findings need to be confirmed in other groups of HIV-positive women. If confirmed, the complex system of interplay of different forms of stigma revealed here should help to inform health-care provision, stigma reduction interventions, and public-health policy, and could, ultimately, help to bring the global HIV epidemic under control.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001124.
Information is available from the US National Institute of Allergy and Infectious Diseases on HIV infection and AIDS
NAM/aidsmap basic information about HIV/AIDS, and summaries of recent research findings on HIV care and treatment; its publication HIV and stigma deals with HIV-related stigma in the UK
Information is available from Avert, an international AIDS charity on many aspects of HIV/AIDS, including information on women, HIV, and AIDS, on HIV and AIDS stigma and discrimination, and on HIV/AIDS statistics for Canada (in English and Spanish)
The People Living with Stigma Index to address stigma relating to HIV and advocate on key barriers and issues perpetuating stigma; it has recently published Piecing it together for women and girls, the gender dimensions of HIV-related stigma; its website will soon include a selection of individual stories about HIV-related stigma
Patient stories about living with HIV/AIDS are available through Avert and through the charity website Healthtalkonline
doi:10.1371/journal.pmed.1001124
PMCID: PMC3222645  PMID: 22131907
14.  Classification of adults suffering from typical gastroesophageal reflux disease symptoms: contribution of latent class analysis in a European observational study 
BMC Gastroenterology  2014;14:112.
Background
As illustrated by the Montreal classification, gastroesophageal reflux disease (GERD) is much more than heartburn and patients constitute a heterogeneous group. Understanding whether links exist between patients’ characteristics and GERD symptoms, and classifying subjects based on symptom profile, could help to better understand, diagnose, and treat GERD. The aim of this study was to identify distinct classes of GERD patients according to symptom profiles, using a specific statistical tool: latent class analysis.
Methods
An observational single-visit study was conducted in 5 European countries in 7700 adults with typical symptoms. A latent class analysis was performed to identify “latent classes” and was applied to 12 indicator symptoms.
Results
Among 7434 subjects with non-missing indicators, latent class analysis yielded 5 latent classes. Class 1 grouped the highest severity of typical GERD symptoms during day and night, more digestive and non-digestive GERD symptoms, and bad sleep quality. Class 3 represented less frequent and less severe digestive and non-digestive GERD symptoms, and better sleep quality than in class 1. In class 2, only typical GERD symptoms at night occurred. Classes 4 and 5 represented daytime and nighttime regurgitation. In class 4, heartburn and more atypical digestive symptoms were also identified. Multinomial logistic regression showed that country, age, sex, smoking, alcohol use, low-fat diet, waist circumference, recent weight gain (>5 kg), elevated triglycerides, metabolic syndrome, and medical GERD treatment had a significant effect on latent classes.
Conclusion
Latent class analysis classified GERD patients based on symptom profiles, which were related to patients’ characteristics. Although further studies considering these proposed classes have to be conducted to determine the reproducibility of this classification, this new tool might contribute to better management and follow-up of patients with GERD.
doi:10.1186/1471-230X-14-112
PMCID: PMC4094535  PMID: 24969728
GERD; Acid related disease; Adults; Latent class analysis; Symptoms; Classification
15.  Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection 
Objective
The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload.
Design
We selected 6034 VAERS reports for H1N1 vaccine that were classified by medical officers as potentially positive (Npos=237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets for important keywords, low- and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained for the remaining two feature representations.
Measurements
Classifiers' performance was evaluated by macro-averaging recall, precision, and F-measure, and Friedman's test; misclassification error rate analysis was also performed.
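Macro-averaging computes each metric per class and then takes the unweighted mean over classes, so a rare class such as the anaphylaxis-positive reports counts as much as the majority class. A minimal sketch follows; the labels are made up, and the macro F-measure is taken here as the mean of per-class F-measures, which is one common convention and an assumption about the paper's exact definition.

```python
def macro_metrics(y_true, y_pred):
    """Return macro-averaged (precision, recall, F-measure)."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f_measures = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f_measures) / n


# Hypothetical labels: two positive and four negative reports.
y_true = ["pos", "pos", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg", "neg", "pos"]
macro_p, macro_r, macro_f = macro_metrics(y_true, y_pred)
```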
Results
The rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, but at the expense of a higher mean misclassification error rate. The rule-based classifier performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively).
Conclusion
Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably.
doi:10.1136/amiajnl-2010-000022
PMCID: PMC3168300  PMID: 21709163
Medical informatics; text mining; data mining; longitudinal data analysis; classification and clustering algorithms; adverse event identification; text
16.  Computer-assisted lip diagnosis on traditional Chinese medicine using multi-class support vector machines 
Background
In Traditional Chinese Medicine (TCM), lip diagnosis is an important diagnostic method that has a long history and is widely applied. The lip color of a person is considered a symptom that reflects the physical condition of organs in the body. However, the traditional diagnostic approach is mainly based on observation with the doctor’s naked eye, which is non-quantitative and subjective. This non-quantitative approach largely depends on the doctor’s experience and affects the accuracy of diagnosis and treatment in TCM. Developing new quantification methods to identify the exact syndrome based on the lip diagnosis of TCM has become urgent and important. In this paper, we design a computer-assisted classification model to provide an automatic and quantitative approach for the diagnosis of TCM based on lip images.
Methods
A computer-assisted classification method is designed and applied for syndrome diagnosis based on lip images. Our purpose is to classify the lip images into four groups: deep-red, red, purple and pale. The proposed scheme consists of four steps: lip image preprocessing, image feature extraction, feature selection and classification. The 84 extracted features comprise lip color space components, texture features and moment features. Feature subset selection is performed using SVM-RFE (Support Vector Machine with recursive feature elimination), mRMR (minimum Redundancy Maximum Relevance) and IG (information gain). The classification model is constructed from the collected lip image features using multi-class SVM and weighted multi-class SVM (WSVM). In addition, we compare the diagnostic performance of SVM with that of the k-nearest neighbor (kNN) algorithm, the Multiple Asymmetric Partial Least Squares Classifier (MAPLSC) and Naïve Bayes. All displayed facial images were obtained with consent from the participants.
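Of the three feature selection criteria named above, information gain is the simplest to show directly: it is the reduction in class-label entropy obtained by conditioning on a feature. The sketch below assumes features have already been discretized into categories (lip image features are continuous, so some binning step is implied); the feature values and labels are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their expected entropy after
    splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical binned features for four lip images.
labels = ["red", "red", "pale", "pale"]
informative = ["high", "high", "low", "low"]    # separates the classes perfectly
uninformative = ["high", "low", "high", "low"]  # carries no class information
ig_good = information_gain(informative, labels)
ig_bad = information_gain(uninformative, labels)
```

Ranking features by this score and keeping the top few gives an IG-based feature subset.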
Results
A total of 257 lip images are collected for the modeling of lip diagnosis in TCM. The feature selection method SVM-RFE selects 9 important features, composed of 5 color component features, 3 texture features and 1 moment feature. SVM, MAPLSC, Naïve Bayes and kNN showed better classification results with the 9 selected features than with all 84 features. The total classification accuracy of the five methods is 84%, 81%, 79% and 81%, 77%, respectively. SVM thus achieves the best classification accuracy. The classification accuracy of SVM is 81%, 71%, 89% and 86% on the Deep-red, Pale, Purple and Red lip image models, respectively. With the feature selection algorithms mRMR and IG, WSVM achieves the best total classification accuracy. Overall, the results show that the system achieves its best classification accuracy when SVM classifiers are combined with the SVM-RFE feature selection algorithm.
Conclusions
A diagnostic system is proposed that first segments the lip from the original facial image based on the Chan-Vese level set model and the Otsu method, then extracts three kinds of features (color space features, Haralick co-occurrence features and Zernike moment features) from the lip image. SVM-RFE is then adopted to select the optimal features. Finally, SVM is applied to classify the four classes. We also compare different feature selection algorithms and classifiers to verify our system. The developed automatic and quantitative diagnosis system is effective in distinguishing four lip image classes: Deep-red, Purple, Red and Pale. This study puts forward a new method for the quantitative examination of lip diagnosis in TCM and provides a template for objective diagnosis in TCM.
doi:10.1186/1472-6882-12-127
PMCID: PMC3522569  PMID: 22898352
Traditional Chinese medicine; Computer-assisted lip diagnosis; Image analysis; Feature selection; Support vector machine
17.  Social and geographic inequalities in premature adult mortality in Japan: a multilevel observational study from 1970 to 2005 
BMJ Open  2012;2(2):e000425.
Objectives
To examine trends in social and geographic inequalities in all-cause premature adult mortality in Japan.
Design
Observational study of the vital statistics and the census data.
Setting
Japan.
Participants
Entire population aged 25 years or older and less than 65 years in 1970, 1975, 1980, 1985, 1990, 1995, 2000 and 2005. The total number of decedents was 984 022 and 532 223 in men and women, respectively.
Main outcome measures
For each sex, ORs and 95% CIs for mortality were estimated by using multilevel logistic regression models with ‘cells’ (cross-tabulated by age and occupation) at level 1, 8 years at level 2 and 47 prefectures at level 3. The prefecture-level variance was used as an estimate of geographic inequalities of mortality.
Results
Adjusting for age and time-trends, compared with production process and related workers, ORs ranged from 0.97 (95% CI 0.96 to 0.98) among administrative and managerial workers to 2.22 (95% CI 2.19 to 2.24) among service workers in men. By contrast, in women, the lowest odds for mortality were observed among production process and related workers (reference), while the highest OR was 12.22 (95% CI 11.40 to 13.10) among security workers. The degree of occupational inequality increased in both sexes. Among men, higher occupational groups did not experience reductions in mortality throughout the period and were overtaken by lower occupational groups in the early 1990s. Conditional on individual age and occupation, overall geographic inequalities of mortality were relatively small in both sexes; the ORs ranged from 0.87 (Okinawa) to 1.13 (Aomori) for men and from 0.84 (Kanagawa) to 1.11 (Kagoshima) for women, even though there is a suggestion of increasing inequalities across prefectures since 1995 in both sexes.
Conclusions
The present findings suggest that both social and geographic inequalities in all-cause mortality have increased in Japan during the last 3 decades.
Article summary
Article focus
While Japan enjoys the highest average life expectancy in the world, less has been documented on the trends and patterns of health inequalities within the nation.
We examined trends in social and geographic inequalities in all-cause premature adult mortality from 1970 through 2005.
Key messages
This is the first study that simultaneously examines time-trends in premature mortality by occupational class as well as geographic locality, and the results of our study indicate that health disparities have widened during the decades following the collapse of the asset bubble in the early 1990s.
Given the multiple challenges that threaten to further dampen economic activity of the nation, it is imperative to continue to monitor future trends in health inequalities in order to avert the potential impacts on Japan's health security.
Strengths and limitations of this study
The data are census based and cover the whole of Japan from 1970 to 2005.
This study uses multilevel methods to properly adjust for micro- and macro-level bias simultaneously.
We lacked information on whether the individuals were in standard jobs or precarious jobs and a possibility of measurement error in occupation at the time of death cannot be ruled out.
doi:10.1136/bmjopen-2011-000425
PMCID: PMC3293144  PMID: 22389360
18.  Patterns of Obesity Development before the Diagnosis of Type 2 Diabetes: The Whitehall II Cohort Study 
PLoS Medicine  2014;11(2):e1001602.
Examining patterns of change in body mass index (BMI) and other cardiometabolic risk factors in individuals during the years before they were diagnosed with diabetes, Kristine Færch and colleagues report that few of them experienced dramatic BMI changes.
Please see later in the article for the Editors' Summary
Background
Patients with type 2 diabetes vary greatly with respect to degree of obesity at time of diagnosis. To address the heterogeneity of type 2 diabetes, we characterised patterns of change in body mass index (BMI) and other cardiometabolic risk factors before type 2 diabetes diagnosis.
Methods and Findings
We studied 6,705 participants from the Whitehall II study, an observational prospective cohort study of civil servants based in London. White men and women, initially free of diabetes, were followed with 5-yearly clinical examinations from 1991–2009 for a median of 14.1 years (interquartile range [IQR]: 8.7–16.2 years). Type 2 diabetes developed in 645 (1,209 person-examinations) and 6,060 remained free of diabetes during follow-up (14,060 person-examinations). Latent class trajectory analysis of incident diabetes cases was used to identify patterns of pre-disease BMI. Associated trajectories of cardiometabolic risk factors were studied using adjusted mixed-effects models. Three patterns of BMI changes were identified. Most participants belonged to the “stable overweight” group (n = 604, 94%) with a relatively constant BMI level within the overweight category throughout follow-up. They experienced slight worsening of beta cell function and insulin sensitivity from 5 years prior to diagnosis. A small group of “progressive weight gainers” (n = 15) exhibited a pattern of consistent weight gain before diagnosis. Linear increases in blood pressure and an exponential increase in insulin resistance a few years before diagnosis accompanied the weight gain. The “persistently obese” (n = 26) were severely obese throughout the whole 18 years before diabetes diagnosis. They experienced an initial beta cell compensation followed by loss of beta cell function, whereas insulin sensitivity was relatively stable. Since the generalizability of these findings is limited, the results need confirmation in other study populations.
Conclusions
Three patterns of obesity changes prior to diabetes diagnosis were accompanied by distinct trajectories of insulin resistance and other cardiometabolic risk factors in a white, British population. While these results should be verified independently, the great majority of patients had modest weight gain prior to diagnosis. These results suggest that strategies focusing on small weight reductions for the entire population may be more beneficial than predominantly focusing on weight loss for high-risk individuals.
Editors' Summary
Background
Worldwide, more than 350 million people have diabetes, a metabolic disorder characterized by high amounts of glucose (sugar) in the blood. Blood sugar levels are normally controlled by insulin, a hormone released by the pancreas after meals (digestion of food produces glucose). In people with type 2 diabetes (the commonest form of diabetes) blood sugar control fails because the fat and muscle cells that normally respond to insulin by removing sugar from the blood become insulin resistant. Type 2 diabetes, which was previously called adult-onset diabetes, can be controlled with diet and exercise, and with drugs that help the pancreas make more insulin or that make cells more sensitive to insulin. Long-term complications, which include an increased risk of heart disease and stroke, reduce the life expectancy of people with diabetes by about 10 years compared to people without diabetes. The number of people with diabetes is expected to increase dramatically over the next decades, coinciding with rising obesity rates in many countries. To better understand diabetes development, to identify people at risk, and to find ways to prevent the disease are urgent public health goals.
Why Was This Study Done?
It is known that people who are overweight or obese have a higher risk of developing diabetes. Because of this association, a common assumption is that people who experienced recent weight gain are more likely to be diagnosed with diabetes. In this prospective cohort study (an investigation that records the baseline characteristics of a group of people and then follows them to see who develops specific conditions), the researchers tested the hypothesis that substantial weight gain precedes a diagnosis of diabetes and explored more generally the patterns of body weight and composition in the years before people develop diabetes. They then examined whether changes in body weight corresponded with changes in other risk factors for diabetes (such as insulin resistance), lipid profiles and blood pressure.
What Did the Researchers Do and Find?
The researchers studied participants from the Whitehall II study, a prospective cohort study initiated in 1985 to investigate the socioeconomic inequalities in disease. Whitehall II enrolled more than 10,000 London-based government employees. Participants underwent regular health checks during which their weight and height were measured, blood tests were done, and they filled out questionnaires for other relevant information. From 1991 onwards, participants were tested every five years for diabetes. The 6,705 participants included in this study were initially free of diabetes, and most of them were followed for at least 14 years. During the follow-up, 645 participants developed diabetes, while 6,060 remained free of the disease.
The researchers used a statistical tool called “latent class trajectory analysis” to study patterns of changes in body mass index (BMI) in the years before people developed diabetes. BMI is a measure of human obesity based on a person's weight and height. Latent class trajectory analysis is an unbiased way to subdivide a number of people into groups that differ based on specified parameters. In this case, the researchers wanted to identify several groups among all the people who eventually developed diabetes each with a distinct pattern of BMI development. Having identified such groups, they also examined how a variety of tests associated with diabetes risk, and risks for heart disease and stroke changed in the identified groups over time.
They identified three different patterns of BMI changes in the 645 participants who developed diabetes. The vast majority (604 individuals, or 94%) belonged to a group they called “stable-overweight.” These people showed no dramatic change in their BMI in the years before they were diagnosed. They were overweight when they first entered the study and gained or lost little weight during the follow-up years. They showed only minor signs of insulin-resistance, starting five years before they developed diabetes. A second, much smaller group of 15 people gained weight consistently in the years before diagnosis. As they were gaining weight, these people also had rises in blood pressure and substantial gains in insulin resistance. The 26 remaining participants who formed the third group were persistently obese for the entire time they participated in the study, in some cases up to 18 years before they were diagnosed with diabetes. They had some signs of insulin resistance in the years before diagnosis, but not the substantial gain often seen as the hallmark of “pre-diabetes.”
What Do These Findings Mean?
These results suggest that diabetes development is a complicated process, and one that differs between individuals who end up with the disease. They call into question the common notion that most people who develop diabetes have recently gained a lot of weight or are obese. A substantial rise in insulin resistance, another established risk factor for diabetes, was only seen in the smallest of the groups, namely the people who gained weight consistently for years before they were diagnosed. When the scientists applied a commonly used predictor of diabetes called the “Framingham diabetes risk score” to their largest “stably overweight” group, they found that these people were not classified as having a particularly high risk, and that their risk scores actually declined in the last five years before their diabetes diagnosis. This suggests that predicting diabetes in this group might be difficult.
The researchers applied their methodology only to this one cohort of white civil servants in England. Before drawing more firm conclusions on the process of diabetes development, it will be important to test whether similar results are seen in other cohorts and among more diverse individuals. If the three groups identified here are found in other cohorts, another question is whether they are as unequal in size as in this example. And if they are, can the large group of stably overweight people be further subdivided in ways that suggest specific mechanisms of disease development? Even without knowing how generalizable the provocative findings of this study are, they should stimulate debate on how to identify people at risk for diabetes and how to prevent the disease or delay its onset.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001602.
The US National Diabetes Information Clearinghouse provides information about diabetes for patients, health-care professionals, and the general public, including information on diabetes prevention (in English and Spanish)
The UK National Health Service Choices website provides information for patients and carers about type 2 diabetes; it includes people's stories about diabetes
The charity Diabetes UK also provides detailed information about diabetes for patients and carers, including information on healthy lifestyles for people with diabetes, and has a further selection of stories from people with diabetes; the charity Healthtalkonline has interviews with people about their experiences of diabetes
MedlinePlus provides links to further resources and advice about diabetes (in English and Spanish)
More information about the Whitehall II study is available
doi:10.1371/journal.pmed.1001602
PMCID: PMC3921118  PMID: 24523667
19.  A latent class model for defining severe hemorrhage: Experience from the PRospective, Observational, Multicenter, Major Trauma Transfusion (PROMMTT) Study 
Background
Several predictive models have been developed to identify trauma patients who have had severe hemorrhage (SH) and may need a massive transfusion protocol (MTP). However, almost all these models define SH as the transfusion of ≥10 units of red blood cells (RBCs) within 24 hours of ED admission (aka massive transfusion, MT). This definition excludes some patients with SH, especially those who die before a 10th unit of RBCs could be transfused, which calls the validity of these prediction models into question. We show how a latent class model could improve the accuracy of identifying the SH patients.
Methods
Modeling SH classification as a latent variable, we estimate the posterior probability of a patient in SH based on ED admission variables (SBP, HR, pH, Hemoglobin), the 24-hour blood product utilization (plasma:RBCs and platelets: RBCs ratios), and 24-hour survival status. We define the SH subgroup as those having a posterior probability of ≥0.5. We compare our new classification of SH with that of the traditional MT using data from PROMMTT study.
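The posterior-probability rule described above can be illustrated with a minimal sketch. It assumes a two-class model with Gaussian, conditionally independent indicators; the abstract does not state the actual distributions, and every parameter value and name below is hypothetical rather than taken from PROMMTT.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_sh(prior_sh, lik_sh, lik_not_sh):
    """Bayes' rule for two latent classes (SH versus not SH)."""
    numerator = prior_sh * lik_sh
    return numerator / (numerator + (1.0 - prior_sh) * lik_not_sh)

def classify(patient, params, prior_sh, threshold=0.5):
    """Return (posterior, is_sh); indicators are assumed independent within a class."""
    lik_sh = 1.0
    lik_not_sh = 1.0
    for name, value in patient.items():
        (mean_sh, sd_sh), (mean_not, sd_not) = params[name]
        lik_sh *= normal_pdf(value, mean_sh, sd_sh)
        lik_not_sh *= normal_pdf(value, mean_not, sd_not)
    p = posterior_sh(prior_sh, lik_sh, lik_not_sh)
    return p, p >= threshold

# Hypothetical class-conditional (mean, sd) pairs: (SH class, non-SH class).
PARAMS = {
    "sbp": ((90.0, 25.0), (125.0, 25.0)),   # systolic blood pressure, mmHg
    "hr": ((115.0, 25.0), (95.0, 20.0)),    # heart rate, beats per minute
}

# Hypothetical patient with low blood pressure and a fast heart rate.
posterior, is_sh = classify({"sbp": 80.0, "hr": 130.0}, PARAMS, prior_sh=0.25)
```

A fitted latent class model would estimate the class prior and the class-conditional parameters jointly from the data; here they are fixed by hand only to show how the ≥0.5 cut-off is applied.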
Results
Of 1245 patients, 913 had complete data, which were used in the latent class model. About 25.3% of patients were classified as SH. The overall agreement between the MT and SH classifications was 83.8%. However, among 49 patients who died before receiving the 10th unit of RBCs, 41 (84%) were classified as SH. Seven of the remaining 8 (87.5%) who were not classified as SH had head injury.
Conclusion
Our definition of SH based on the aforementioned latent class model has an advantage of improving on the traditional MT definition by identifying SH patients who die before receiving the 10th unit of RBCs. We recommend further improvements to more accurately classify SH patients that could replace the traditional definition of MT for use in developing prediction algorithms.
Level of Evidence
II, Prospective
doi:10.1097/TA.0b013e31828fa3d3
PMCID: PMC3744183  PMID: 23778516
PROMMTT; Massive Transfusion; Hemorrhage; Trauma; Latent Class Analysis
20.  In silico microdissection of microarray data from heterogeneous cell populations 
BMC Bioinformatics  2005;6:54.
Background
Very few analytical approaches have been reported to resolve the variability in microarray measurements stemming from sample heterogeneity. For example, tissue samples used in cancer studies are usually contaminated with the surrounding or infiltrating cell types. This heterogeneity in the sample preparation hinders further statistical analysis, significantly so if different samples contain different proportions of these cell types. Thus, sample heterogeneity can result in the identification of differentially expressed genes that may be unrelated to the biological question being studied. Similarly, irrelevant gene combinations can be discovered in the case of gene expression based classification.
Results
We propose a computational framework for removing the effects of sample heterogeneity by "microdissecting" microarray data in silico. The computational method provides estimates of the expression values of the pure (non-heterogeneous) cell samples. The inversion of the sample heterogeneity can be facilitated by providing accurate estimates of the mixing percentages of different cell types in each measurement. For those cases where no such information is available, we develop an optimization-based method for joint estimation of the mixing percentages and the expression values of the pure cell samples. We also consider the problem of selecting the correct number of cell types.
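When the mixing percentages are known, the inversion step can be sketched as a least-squares problem. The sketch below assumes a linear mixing model (each measurement is the fraction-weighted sum of the pure cell-type expression values) and exactly two cell types; it is not the authors' implementation, and the function name and data are hypothetical.

```python
def unmix_two_types(mix, observed):
    """Least-squares estimate of pure expression values for two cell types.

    mix      -- list of (fraction_a, fraction_b), one pair per sample
    observed -- list of per-sample lists, one expression value per gene
    Assumes observed[s][g] = fraction_a * pure_a[g] + fraction_b * pure_b[g].
    """
    # Entries of the 2x2 normal-equation matrix (shared by all genes).
    saa = sum(a * a for a, _ in mix)
    sab = sum(a * b for a, b in mix)
    sbb = sum(b * b for _, b in mix)
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("mixing fractions do not vary enough across samples")

    pure_a, pure_b = [], []
    for g in range(len(observed[0])):
        ya = sum(a * row[g] for (a, _), row in zip(mix, observed))
        yb = sum(b * row[g] for (_, b), row in zip(mix, observed))
        pure_a.append((sbb * ya - sab * yb) / det)
        pure_b.append((saa * yb - sab * ya) / det)
    return pure_a, pure_b

# Hypothetical pure profiles for two genes, used only to build test mixtures.
TRUE_A = [10.0, 2.0]
TRUE_B = [1.0, 8.0]
MIX = [(0.8, 0.2), (0.5, 0.5), (0.3, 0.7)]
OBSERVED = [[a * x + b * y for x, y in zip(TRUE_A, TRUE_B)] for a, b in MIX]

est_a, est_b = unmix_two_types(MIX, OBSERVED)
```

With noise-free mixtures the estimates reproduce the pure profiles; the joint estimation of unknown mixing percentages described in the abstract requires an iterative optimization that this sketch does not attempt.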
Conclusion
The efficiency of the proposed methods is illustrated by applying them to carefully controlled cDNA microarray data obtained from heterogeneous samples. The results demonstrate that the methods are capable of reconstructing both the sample and cell type specific expression values from heterogeneous mixtures and that the mixing percentages of different cell types can also be estimated. Furthermore, a general purpose model selection method can be used to select the correct number of cell types.
doi:10.1186/1471-2105-6-54
PMCID: PMC1274251  PMID: 15766384
21.  Heterogeneity Mapping of Protein Expression in Tumors using Quantitative Immunofluorescence 
Morphologic heterogeneity within an individual tumor is well-recognized by histopathologists in surgical practice. While this often takes the form of areas of distinct differentiation into recognized histological subtypes, or different pathological grade, often there are more subtle differences in phenotype which defy accurate classification (Figure 1). Ultimately, since morphology is dictated by the underlying molecular phenotype, areas with visible differences are likely to be accompanied by differences in the expression of proteins which orchestrate cellular function and behavior, and therefore, appearance. The significance of visible and invisible (molecular) heterogeneity for prognosis is unknown, but recent evidence suggests that, at least at the genetic level, heterogeneity exists in the primary tumor1,2, and some of these sub-clones give rise to metastatic (and therefore lethal) disease.
Moreover, some proteins are measured as biomarkers because they are the targets of therapy (for instance ER and HER2 for tamoxifen and trastuzumab (Herceptin), respectively). If these proteins show variable expression within a tumor then therapeutic responses may also be variable. The widely used histopathologic scoring schemes for immunohistochemistry either ignore, or numerically homogenize the quantification of protein expression. Similarly, in destructive techniques, where the tumor samples are homogenized (such as gene expression profiling), quantitative information can be elucidated, but spatial information is lost. Genetic heterogeneity mapping approaches in pancreatic cancer have relied either on generation of a single cell suspension3, or on macrodissection4. A recent study has used quantum dots in order to map morphologic and molecular heterogeneity in prostate cancer tissue5, providing proof of principle that morphology and molecular mapping is feasible, but falling short of quantifying the heterogeneity. Since immunohistochemistry is, at best, only semi-quantitative and subject to intra- and inter-observer bias, more sensitive and quantitative methodologies are required in order to accurately map and quantify tissue heterogeneity in situ.
We have developed and applied an experimental and statistical methodology in order to systematically quantify the heterogeneity of protein expression in whole tissue sections of tumors, based on the Automated QUantitative Analysis (AQUA) system6. Tissue sections are labeled with specific antibodies directed against cytokeratins and targets of interest, coupled to fluorophore-labeled secondary antibodies. Slides are imaged using a whole-slide fluorescence scanner. Images are subdivided into hundreds to thousands of tiles, and each tile is then assigned an AQUA score which is a measure of protein concentration within the epithelial (tumor) component of the tissue. Heatmaps are generated to represent tissue expression of the proteins and a heterogeneity score assigned, using a statistical measure of heterogeneity originally used in ecology, based on the Simpson's biodiversity index7.
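A heterogeneity score of this kind can be sketched as follows. The abstract says only that the score is based on Simpson's biodiversity index, so the binning of tile scores into categories and the use of the complement form (1 minus the sum of squared proportions) are assumptions; the function name and scores are hypothetical.

```python
from collections import Counter

def simpson_heterogeneity(tile_scores, bin_width):
    """Complement of Simpson's index over binned per-tile scores.

    Returns 0.0 when every tile falls into the same bin and approaches 1.0
    as tiles spread evenly over many bins.
    """
    if not tile_scores:
        raise ValueError("at least one tile score is required")
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Each bin plays the role of a 'species' in the ecological index.
    bins = Counter(int(score // bin_width) for score in tile_scores)
    total = len(tile_scores)
    return 1.0 - sum((count / total) ** 2 for count in bins.values())

# Hypothetical per-tile scores for two tissue sections.
uniform_section = [12.0, 13.5, 11.0, 14.9]
mixed_section = [5.0, 15.0, 25.0, 35.0]

uniform_score = simpson_heterogeneity(uniform_section, bin_width=10.0)
mixed_score = simpson_heterogeneity(mixed_section, bin_width=10.0)
```

The choice of bin width changes the score, so any comparison between sections would need the same binning throughout.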
To date there have been no attempts to systematically map and quantify this variability in tandem with protein expression, in histological preparations. Here, we illustrate the first use of the method applied to ER and HER2 biomarker expression in ovarian cancer. Using this method paves the way for analyzing heterogeneity as an independent variable in studies of biomarker expression in translational studies, in order to establish the significance of heterogeneity in prognosis and prediction of responses to therapy.
doi:10.3791/3334
PMCID: PMC3227185  PMID: 22064683
22.  Inferring Pathway Activity toward Precise Disease Classification 
PLoS Computational Biology  2008;4(11):e1000217.
The advent of microarray technology has made it possible to classify disease states based on gene expression profiles of patients. Typically, marker genes are selected by measuring the power of their expression profiles to discriminate among patients of different disease states. However, expression-based classification can be challenging in complex diseases due to factors such as cellular heterogeneity within a tissue sample and genetic heterogeneity across patients. A promising technique for coping with these challenges is to incorporate pathway information into the disease classification procedure in order to classify disease based on the activity of entire signaling pathways or protein complexes rather than on the expression levels of individual genes or proteins. We propose a new classification method based on pathway activities inferred for each patient. For each pathway, an activity level is summarized from the gene expression levels of its condition-responsive genes (CORGs), defined as the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. We show that classifiers using pathway activity achieve better performance than classifiers based on individual gene expression, for both simple and complex case-control studies including differentiation of perturbed from non-perturbed cells and subtyping of several different kinds of cancer. Moreover, the new method outperforms several previous approaches that use a static (i.e., non-conditional) definition of pathways. Within a pathway, the identified CORGs may facilitate the development of better diagnostic markers and the discovery of core alterations in human disease.
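The idea of summarizing a pathway by a discriminative gene subset can be sketched with a greedy search. The abstract does not give the scoring function or the search strategy, so the choices below are assumptions: expression values are taken as already z-scored, activity is the sum over chosen genes divided by the square root of their number, and discriminative power is an absolute Welch t-statistic. Gene names and data are hypothetical.

```python
import math
from statistics import mean, variance

def t_score(values, labels):
    """Absolute Welch t-statistic between label 1 and label 0 samples."""
    cases = [v for v, lab in zip(values, labels) if lab == 1]
    controls = [v for v, lab in zip(values, labels) if lab == 0]
    se = math.sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return abs(mean(cases) - mean(controls)) / se if se > 0 else float("inf")

def activity(z, genes):
    """Per-sample activity: summed z-scores scaled by sqrt(number of genes)."""
    n_samples = len(next(iter(z.values())))
    return [sum(z[g][i] for g in genes) / math.sqrt(len(genes)) for i in range(n_samples)]

def greedy_corgs(z, labels):
    """Add genes one at a time while the discriminative score keeps improving."""
    remaining = set(z)
    chosen, best = [], 0.0
    while remaining:
        candidate, candidate_score = None, best
        for gene in sorted(remaining):
            score = t_score(activity(z, chosen + [gene]), labels)
            if score > candidate_score:
                candidate, candidate_score = gene, score
        if candidate is None:
            break  # no remaining gene improves the score
        chosen.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return chosen, best

# Hypothetical z-scored expression: three genes, six samples (3 cases, 3 controls).
LABELS = [1, 1, 1, 0, 0, 0]
Z = {
    "g1": [1.0, 1.2, 0.8, -1.0, -0.9, -1.1],
    "g2": [1.1, 0.9, 1.3, -1.0, -1.2, -0.8],
    "g3": [0.5, -0.7, 0.2, 0.3, -0.4, 0.1],   # uninformative gene
}

corgs, corg_score = greedy_corgs(Z, LABELS)
pathway_activity = activity(Z, corgs)
```

In this toy data the two informative genes are selected and the uninformative one is left out; the resulting per-sample activity values would then serve as classifier features in place of individual gene expression levels.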
Author Summary
The advent of microarray technology has drawn immense interest to identify gene expression levels that can serve as biomarkers for disease. Marker genes are selected by examining each individual gene to see how well its expression level discriminates different disease types. In complex diseases such as cancer, good marker genes can be hard to find due to cellular heterogeneity within the tissue and genetic heterogeneity across patients. A promising technique for addressing these challenges is to incorporate biological pathway information into the marker identification procedure, permitting disease classification based on the activity of entire pathways rather than simply on the expression levels of individual genes. However, previous pathway-based methods have not significantly outperformed gene-based methods. Here, we propose a new pathway-based classification procedure in which markers are encoded not as individual genes, nor as the set of genes making up a known pathway, but as subsets of “condition-responsive genes (CORGs)” within those pathways. Using expression profiles from seven different microarray studies, we show that the accuracy of this method is significantly better than both the conventional gene- and pathway- based diagnostics. Furthermore, the identified CORGs may facilitate the development of effective diagnostic markers and the discovery of molecular mechanisms underlying disease.
doi:10.1371/journal.pcbi.1000217
PMCID: PMC2563693  PMID: 18989396
23.  Automating the Assignment of Diagnosis Codes to Patient Encounters Using Example-based and Machine Learning Techniques 
Objective
Human classification of diagnoses is a labor intensive process that consumes significant resources. Most medical practices use specially trained medical coders to categorize diagnoses for billing and research purposes.
Methods
We have developed an automated coding system designed to assign codes to clinical diagnoses. The system uses the notion of certainty to recommend subsequent processing. Codes with the highest certainty are generated by matching the diagnostic text to frequent examples in a database of 22 million manually coded entries. These code assignments are not subject to subsequent manual review. Codes at a lower certainty level are assigned by matching to previously infrequently coded examples. The least certain codes are generated by a naïve Bayes classifier. The latter two types of codes are subsequently manually reviewed.
Measurements
Standard information retrieval accuracy measurements of precision, recall and f-measure were used. Micro- and macro-averaged results were computed.
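The difference between micro- and macro-averaging can be shown with a short sketch. Per-code counts of true positives, false positives and false negatives are assumed as input; the code labels and counts are hypothetical, and the macro f-measure is taken here as the mean of per-code f-measures, which is one of several conventions.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and f-measure from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def macro_average(counts):
    """Average the per-code metrics, so every code weighs the same."""
    per_code = [precision_recall_f(*c) for c in counts.values()]
    n = len(per_code)
    return tuple(sum(metric[i] for metric in per_code) / n for i in range(3))

def micro_average(counts):
    """Pool the counts over all codes, so frequent codes dominate."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return precision_recall_f(tp, fp, fn)

# Hypothetical counts per diagnosis code: (true pos, false pos, false neg).
COUNTS = {
    "frequent_code": (90, 10, 10),
    "rare_code": (1, 1, 3),
}

macro = macro_average(COUNTS)
micro = micro_average(COUNTS)
```

Here the rare, poorly classified code pulls the macro-averaged figures well below the micro-averaged ones, which is why the two are usually reported side by side.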
Results
At least 48% of all EMR problem list entries at the Mayo Clinic can be automatically classified with macro-averaged 98.0% precision, 98.3% recall and an f-score of 98.2%. An additional 34% of the entries are classified with macro-averaged 90.1% precision, 95.6% recall and 93.1% f-score. The remaining 18% of the entries are classified with macro-averaged 58.5%.
Conclusion
Over two thirds of all diagnoses are coded automatically with high accuracy. The system has been successfully implemented at the Mayo Clinic, which resulted in a reduction of staff engaged in manual coding from thirty-four coders to seven verifiers.
doi:10.1197/jamia.M2077
PMCID: PMC1561792  PMID: 16799125
24.  Short-term memory in gene induction reveals the regulatory principle behind stochastic IL-4 expression 
Combining experiments on primary T cells and mathematical modeling, we characterized the stochastic expression of the interleukin-4 cytokine gene in its physiologic context, showing that a two-step model of transcriptional regulation acting on chromatin rearrangement and RNA polymerase recruitment accounts for the level, kinetics, and population variability of expression. A rate-limiting step upstream of transcription initiation, but occurring at the level of an individual allele, controls whether the interleukin-4 gene is expressed during antigenic stimulation, suggesting that the observed stochasticity of expression is linked to the dynamics of chromatin rearrangement. The computational analysis predicts that the probability to re-express an interleukin-4 gene that has been expressed once is transiently increased. In support, we experimentally demonstrate a short-term memory for interleukin-4 expression at the predicted time scale of several days. The model provides a unifying framework that accounts for both graded and binary modes of gene regulation. Graded changes in expression level can be achieved by controlling transcription initiation, whereas binary regulation acts at the level of chromatin rearrangement and is targeted during the differentiation of T cells that specialize in interleukin-4 production.
Cell populations are typically heterogeneous with respect to protein expression even when clonally derived from a single progenitor. In bacteria and yeast, such heterogeneity has been shown to be due to intrinsically stochastic dynamics of gene expression (Raj and van Oudenaarden, 2008). Thus, cross-population heterogeneity may be an unavoidable by-product of random fluctuations in molecular interactions (Raser and O'Shea, 2004; Pedraza and van Oudenaarden, 2005). The phenotypic variability deriving from it may also be beneficial for cell function, differentiation, or adaptation to changing environments (Chang et al, 2008; Feinerman et al, 2008; Losick and Desplan, 2008). However, little is known about how gene-expression variability is caused in mammalian cells.
Two principal modes of gene regulation have been identified: graded and binary. In the graded mode, transcriptional regulators can tune the level of a gene product in a continuous manner (Hazzalin and Mahadevan, 2002). In the binary mode, the gene is expressed at an invariant level, whereas its probability of being expressed in a given cell is regulated, so that the gene has discrete ‘on' and ‘off' states (Walters et al, 1995; Hume, 2000; Biggar and Crabtree, 2001). In humans and mice, cytokine genes are expressed in a binary manner (Bix and Locksley, 1998; Riviere et al, 1998; Hu-Li et al, 2001; Apostolou and Thanos, 2008). A particularly well-studied case is the interleukin-4 (il4) gene that is critical for antibody-based immune responses. This gene is expressed by antigen-stimulated T cells initially with low probability, so that in most IL-4-positive cells only one allele is active (Bix and Locksley, 1998; Riviere et al, 1998). The expressed allele is not imprinted but chosen stochastically during each cell stimulation (Hu-Li et al, 2001).
Here, we have studied the dynamics of IL-4 expression quantitatively. Primary murine CD4+ T cells have been differentiated uniformly into type-2 T-helper (Th2) cells that express the lineage-specifying transcription factor (TF) Gata-3 and are competent to activate the il4 gene upon challenge with antigen. Using T cells heterozygous for an il4 wild-type allele and an il4 allele with GFP knock-in after the promoter, the alleles are found to be expressed stochastically and in an uncorrelated manner (Figure 2A; Hu-Li et al, 2001). To account for the observed stochastic dynamics of IL-4 expression, we considered a basic model of gene transcription, mRNA translation, turnover, and protein secretion (Figure 2B). However, our experimental estimates of the intracellular life times of IL-4 mRNA and protein (∼1 h) and their absolute numbers (mRNA ∼10³, protein ∼10⁵) rule out random fluctuations in transcription, translation as well as mRNA and protein turnover as an explanation for the observed stochastic properties of IL-4 expression (Thattai and van Oudenaarden, 2001; Paulsson, 2004).
As il4 is known to be strongly regulated at the chromatin level (Ansel et al, 2006), we included in the model a reversible step of chromatin opening that is permissive for transcription (Figure 2C and D). Both chromatin opening and transcription initiation are driven by TFs that are transiently activated during the antigen stimulus, with NFAT1 playing a prominent role (Agarwal et al, 2000; Avni et al, 2002; Guo et al, 2004). The model accounts for the kinetics of NFAT1 TF activity (Figure 2E) (Loh et al, 1996). Using a best-fit procedure for estimating the kinetics of the chromatin transition and TF activity from experimental data, we found that the model accurately reproduces the distribution of IL-4 expression within the cell population over the entire time course of a stimulation (Figure 3A). At the same time, it accounts for the measured kinetics of IL-4 mRNA, intracellular and secreted protein (Figure 3B). Additional data show that the model can also explain IL-4 expression at different stages of Th2 differentiation and upon pharmacological inhibition of NFAT1 activity. In each case, the model predicts a slow and stochastic chromatin opening (Step 1 in Figure 2C) that is the limiting step for the activation of the gene.
The slowness of chromatin opening inferred by the model implies an extended lifetime of the open chromatin state (several days), which lasts longer than TF activity during antigenic stimulation (several hours). This indicates that acute IL-4 expression is terminated by the cessation of TF activity (Step 2 in Figure 2C), rather than by the closing of the chromatin (Step 1). In support of this prediction, we observed an elevated fraction of IL-4-producing cells after secondary stimulations administered within a few days of the primary stimulus. Consistent with the model, this elevation disappeared with a half-life of ∼3 days (Figure 4B). To test whether this ‘short-term memory’ for activation of the il4 gene is indeed due to the IL-4 producers in the primary stimulation, we sorted stimulated Th2 cells into viable IL-4-producing and non-producing fractions using the cytokine secretion assay (Ouyang et al, 2000) and cultured them separately for different resting periods. The probability of IL-4 re-expression in the positive-sorted cells was consistently larger than in negative-sorted cells and decreased progressively over several days (Figure 4C). By contrast, the sorted IL-4 negative cells exhibited a constant induction probability indistinguishable from the unsorted population. This behavior was not due to differential cell proliferation in the sorted populations or different success of Th2 differentiation. Moreover, using heterozygous il4-wild-type/il4-gfp cells, and sorting for expression of the wild-type allele, we observed that expression of the il4-gfp allele was similar in IL-4-positive and negative sorted fractions. Taken together, these findings imply that stochastic, slow chromatin changes at individual il4 genes govern the binary expression pattern of this cytokine.
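The logic of slow opening and short-term memory can be illustrated with a two-state (closed/open) allele, treated as a minimal sketch rather than the authors' fitted model. It assumes first-order opening and closing, that an allele still open at restimulation is always re-expressed, and that a closed allele behaves like an unstimulated one. The function names and the parameter values in the example (opening rate, window length, baseline probability) are hypothetical; only the ∼3-day half-life comes from the text.

```python
import math

def fraction_opened(k_open_per_h, window_h):
    """Probability that a closed allele opens at least once during the TF-activity window."""
    return 1.0 - math.exp(-k_open_per_h * window_h)

def fraction_still_open(half_life_days, rest_days):
    """Probability that an allele open at the end of stimulation is still open after resting."""
    k_close = math.log(2) / half_life_days  # first-order closing rate
    return math.exp(-k_close * rest_days)

def restimulation_probability(p_baseline, half_life_days, rest_days):
    """Re-expression probability for an allele that was open in the primary stimulation.

    Assumes open alleles always re-express and closed alleles respond with p_baseline.
    """
    still_open = fraction_still_open(half_life_days, rest_days)
    return still_open + (1.0 - still_open) * p_baseline

# Hypothetical numbers: opening rate 0.05 per hour, 6 h TF window, 3-day half-life.
p_first = fraction_opened(0.05, 6.0)
for rest in (0, 1, 3, 6, 12):
    p_again = restimulation_probability(p_first, 3.0, rest)
    print(f"rest {rest:2d} d: previous producers {p_again:.2f}, baseline {p_first:.2f}")
```

In this sketch the advantage of previous producers decays toward the baseline with the same half-life as the open state, which is the qualitative pattern reported for the positive-sorted cells.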
In conclusion, we propose an experimentally based model of inducible gene expression where strong stochasticity arises from slow (hours to days) chromatin opening and closing transitions, rather than being due to small numbers of mRNA or protein molecules or transcriptional bursting (Raj et al, 2006). This rate-limiting step upstream of transcription initiation (which may entail several interacting epigenetic processes) naturally gives rise to a binary expression pattern of the gene. By contrast, regulation at the level of transcription initiation can have a graded effect on the expression level. We provide evidence that both binary and graded regulation can occur for the il4 gene. Physiological regulation of il4 seems to be mainly binary, thus enabling a dose–response within a population while producing an unequivocal all-or-none signal at the single-cell level.
Although cell-to-cell variability has been recognized as an unavoidable consequence of stochasticity in gene expression, it may also serve a functional role for tuning physiological responses within a cell population. In the immune system, remarkably large variability in the expression of cytokine genes has been observed in homogeneous populations of lymphocytes, but the underlying molecular mechanisms are incompletely understood. Here, we study the interleukin-4 gene (il4) in T-helper lymphocytes, combining mathematical modeling with the experimental quantification of expression variability and critical parameters. We show that a stochastic rate-limiting step upstream of transcription initiation, but acting at the level of an individual allele, controls il4 expression. Only a fraction of cells reaches an active, transcription-competent state in the transient time window determined by antigen stimulation. We support this finding by experimental evidence of a previously unknown short-term memory that was predicted by the model to arise from the long lifetime of the active state. Our analysis shows how a stochastic mechanism acting at the chromatin level can be integrated with transcriptional regulation to quantitatively control cell-to-cell variability.
doi:10.1038/msb.2010.13
PMCID: PMC2872609  PMID: 20393579
cytokines; cytokine secretion assay; epigenetic regulation; gene expression; stochastic model
25.  Meeting U.S. Healthy People 2010 Levels of Physical Activity: Agreement of 2 Measures across 2 years 
Annals of epidemiology  2010;20(7):511-523.
Background
Measuring the way people vary across time in meeting recommended levels of physical activity should be a fundamental component of public health surveillance. However, we were unaware of prospective cohort studies that had examined this in a population base using convergent measures.
Purpose
We examined agreement between two validated measures used to estimate periodic change in the rate of meeting U.S. Healthy People 2010 guidelines for participation in moderate or vigorous physical activity.
Methods
A cohort (N=497) from a random, multi-ethnic sample of adults living in Hawaii was assessed every 6 months for 2 years starting in spring 2004. Latent transition analysis classified people as meeting or not meeting the guidelines. Intra-class kappa statistics and multinomial logistic regression analysis were used to evaluate agreement.
Results
Agreement for classifying stable classes of people who met or did not meet the guideline each time was substantial for vigorous activity (kappa ∼ .65 - .70) but fair-to-moderate for moderate activity (kappa ∼ .38 - .48). Agreement was poorer for classifying people who transitioned between meeting and not meeting the vigorous guideline (kappa ∼ .45) or the moderate guideline (kappa ∼ .21 - .29).
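For readers unfamiliar with the statistic, the sketch below computes one common form of intraclass kappa for two binary classifications, in which chance agreement is based on the pooled prevalence across both measures. The abstract does not state which estimator the study used, so this is an assumption, and the function name and example data are hypothetical.

```python
def intraclass_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two binary (0/1) classifications.

    Chance agreement uses the prevalence pooled over both measures.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    pooled = (sum(ratings_a) + sum(ratings_b)) / (2 * n)
    expected = pooled ** 2 + (1 - pooled) ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when every rating is identical")
    return (observed - expected) / (1 - expected)

# Hypothetical example: 1 = met the guideline, 0 = did not, for eight people.
measure_1 = [1, 1, 1, 0, 0, 0, 1, 0]
measure_2 = [1, 1, 0, 0, 0, 1, 1, 0]
print(round(intraclass_kappa(measure_1, measure_2), 2))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is the scale on which the kappa ranges above are reported.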
Conclusion
Rates of meeting the guidelines varied across time and were estimated differently by the two measures, especially for moderate activity. This illustrates an understudied problem for public health promotion. Accurate classification of change within people is necessary for determining exposure in outcome studies, personal determinants of sufficient activity, and for evaluating whether interventions are successful in sustaining increases in rates of meeting physical activity guidelines.
doi:10.1016/j.annepidem.2010.04.004
PMCID: PMC2895401  PMID: 20538194
Asian American; inter-rater agreement; latent transition analysis; Native Hawaiian/Pacific Islander; public health policy
