Calibration studies are routinely performed to establish examiner reliability in clinical periodontal research. In these studies, each periodontal site is assessed in duplicate, enabling point and interval estimation of agreement measures. We show how these data can be used additionally to discover subgroups among the periodontal sites according to degree of agreement with true periodontal status and to identify factors associated with examiner bias.
A Bayesian hierarchical model is developed that, for all examiners, links the examiner’s recorded measurement with the site’s true periodontal status, allowing for site-specific examiner effects on the recorded measurement. These site-specific examiner effects are modeled as arising from a Dirichlet process mixture, which yields a small number (relative to the number of sites) of distinct effects for each examiner. Hence sites that share the same examiner effect form a subgroup for which that examiner exhibits consistent bias relative to truth. We fit this model to data from a pilot calibration study for probed pocket depth measurements and use the results to explore examiner-specific groupings of sites according to degree of agreement with true pocket depth. The discovered group assignments were then associated with characteristics of the site.
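The way a Dirichlet process yields only a few distinct effects among many sites can be illustrated with the Chinese restaurant process, the random partition a Dirichlet process induces. The following is a minimal sketch and not the authors' model: the function name, the concentration parameter alpha, and the number of sites are assumptions chosen purely for illustration.

```python
import random


def crp_partition(n_sites, alpha, seed=0):
    """Draw cluster sizes for n_sites from a Chinese restaurant process.

    alpha (> 0) is a hypothetical concentration parameter; smaller
    values favour fewer distinct clusters.
    """
    rng = random.Random(seed)
    sizes = []  # sizes[k] = number of sites currently in cluster k
    for i in range(n_sites):
        # Site i joins cluster k with probability sizes[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if u < cumulative:
                sizes[k] += 1
                break
        else:
            sizes.append(1)
    return sizes


# Hypothetical example: 168 sites, small concentration parameter.
sizes = crp_partition(n_sites=168, alpha=0.5)
print(len(sizes), "clusters with sizes", sizes)
```

In a full mixture model each cluster would carry its own examiner effect, and cluster membership would be updated from the data rather than drawn from the prior alone.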
The Bayesian hierarchical modeling revealed that periodontal sites were grouped according to bias into three, two and two subgroups, respectively, for each of the three study examiners. The magnitude of the bias was associated with tooth position and true depth of the pocket.
Our Bayesian hierarchical model enhances the utility of data obtained from calibration studies for periodontal pocket depth by facilitating discovery of subgroups of sites according to examiner bias. The results indicate that targeting specific tooth locations and pocket depths during examiner training, uniquely for each examiner, may reduce bias in periodontal pocket depth measurements, thereby enhancing the quality of oral epidemiologic research.
Although it is known that periodontal MMP-8 expression is associated with periodontal disease, the information concerning the periodontal MMP-8 expression in diabetic patients with periodontal disease is insufficient.
Materials and Methods
Periodontal tissue specimens were collected from 7 patients without periodontal disease and diabetes (Group 1), 15 patients with periodontal disease alone (Group 2) and 10 patients with both periodontal disease and diabetes (Group 3). The frozen sections were prepared and MMP-8 protein expression was detected using immunohistochemistry and quantified. For in vitro study, human U937 mononuclear cells were pre-exposed to normal or high glucose and then treated with LPS.
The nonparametric Kruskal-Wallis test showed that the differences in MMP-8 protein levels among the three groups were statistically significant (p = 0.003). Nonparametric analysis using the Jonckheere-Terpstra test showed an increasing trend in periodontal MMP-8 levels from Group 1 to Group 2 to Group 3 (p = 0.0002). In vitro studies showed that high glucose and LPS had a synergistic effect on MMP-8 expression.
Our current study showed an increasing trend in MMP-8 protein expression levels across patients with neither periodontal disease nor diabetes, patients with periodontal disease alone, and patients with both diseases.
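For readers unfamiliar with the Jonckheere-Terpstra test, the sketch below computes its statistic for groups listed in a hypothesised increasing order, with a large-sample normal approximation and no tie correction in the variance. The function name and the data are hypothetical, and the study may have used an exact or tie-corrected version.

```python
import math


def jonckheere_terpstra(groups):
    """Return (J, z, one-sided p) for groups given in the hypothesised order.

    J counts, over every ordered pair of groups, how often a value in the
    earlier group is below a value in the later group (ties count 0.5).
    """
    j_stat = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if x < y:
                        j_stat += 1.0
                    elif x == y:
                        j_stat += 0.5
    n_total = sum(len(g) for g in groups)
    # Null mean and variance of J (variance formula assumes no ties).
    mean = (n_total ** 2 - sum(len(g) ** 2 for g in groups)) / 4.0
    var = (n_total ** 2 * (2 * n_total + 3)
           - sum(len(g) ** 2 * (2 * len(g) + 3) for g in groups)) / 72.0
    z = (j_stat - mean) / math.sqrt(var)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z)
    return j_stat, z, p_one_sided


# Hypothetical expression scores for three ordered groups (not study data).
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
j_stat, z, p = jonckheere_terpstra(groups)
print(j_stat, z, p)
```

Unlike the Kruskal-Wallis test, which only asks whether the groups differ, this statistic is sensitive specifically to a monotone ordering across the groups.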
Periodontal diseases; Diabetes mellitus; MMP-8; Gene expression
Many human diseases are attributable to complex interactions among genetic and environmental factors. Statistical tools capable of modeling such complex interactions are necessary to improve identification of genetic factors that increase a patient's risk of disease. Logic Forest (LF), a bagging ensemble algorithm based on logic regression (LR), is able to discover interactions among binary variables predictive of response such as the biologic interactions that predispose individuals to disease. However, LF's ability to recover interactions degrades for more infrequently occurring interactions. A rare genetic interaction may occur if, for example, the interaction increases disease risk in a patient subpopulation that represents only a small proportion of the overall patient population. We present an alternative ensemble adaptation of LR based on boosting rather than bagging called LBoost. We compare the ability of LBoost and LF to identify variable interactions in simulation studies. Results indicate that LBoost is superior to LF for identifying genetic interactions associated with disease that are infrequent in the population. We apply LBoost to a subset of single nucleotide polymorphisms on the PRDX genes from the Cancer Genetic Markers of Susceptibility Breast Cancer Scan to investigate genetic risk for breast cancer. LBoost is publicly available on CRAN as part of the LogicForest package, http://cran.r-project.org/.
Prostate cancer is the most common malignancy in men and a leading cause of cancer mortality among males in the United States. Large geographical variation and racial disparities exist in both the incidence of prostate cancer and in the survival rate after diagnosis. In this population-based study, a joint spatial survival model is constructed to investigate factors that affect the age at diagnosis of prostate cancer and the subsequent survival. The joint model for these two time-to-event outcomes is specified through parametric models for age at diagnosis and survival time conditional on diagnosis age. To account for possible correlation in these outcomes among men from the same geographical region, frailty terms are included in the survival model. Both spatially correlated and uncorrelated frailties are incorporated in each model considered. The deviance information criterion (DIC) is used to select a best-fitting model within the Bayesian framework. The results from our final best-fitting model indicate that race, marital status at diagnosis, and cancer stage are significantly associated with both of the two time-to-event outcomes. No pattern emerged in the geographical distribution of age at prostate cancer diagnosis. In contrast, a spatially clustered pattern was observed in the geographic distribution of survival experience post diagnosis.
Prostate cancer; joint spatial survival model; spatial clustering; deviance information criterion (DIC)
Associations between dental conditions and overall health have been previously reported. Investigators have also shown significant inverse relationships between serum albumin (a general health status marker) and root caries. This relationship was explored among a study population of Gullah African Americans (who have a considerably lower level of non-African genetic admixture when compared to other African American populations) with type-2 diabetes (T2DM) and self-reported history of normal kidney function (N = 280).
Root caries indices were defined as total decayed and/or filled root surfaces. The coronal caries index [total decayed, missing, and/or filled coronal surfaces (DMFS)], level of glycemic control, total number of teeth, and other covariates were also evaluated. Logistic regression models were used to evaluate the associations between these factors and hypoalbuminemia (serum albumin concentrations <4 g/dl).
Serum albumin concentrations ranged 2.4–4.5 g/dl (mean = 3.8, SD = 0.3), with 70.4% exhibiting hypoalbuminemia. Root caries totals ranged 0–38 (mean = 1.3, SD = 4.5) surfaces decayed/filled, while total teeth ranged 1–28 (mean = 19.4, SD = 6.2). DMFS totals ranged 2–116 (mean = 55.2, SD = 28.0). We failed to detect significant associations for root caries; however, the final multivariable logistic regression models showed significant associations between hypoalbuminemia and total teeth [odds ratio (OR) = 0.93, P = 0.01], poor glycemic control (OR = 2.49, P < 0.01), elevated C-reactive protein (OR = 1.57, P < 0.01), glomerular filtration rates ≥60 (OR = 0.31, P = 0.03), and age (OR = 0.97, P = 0.03).
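Odds ratios from a logistic regression act multiplicatively per unit of the predictor. Assuming total teeth entered the model as a continuous, per-tooth term (the abstract does not state the coding, so this is an assumption), the odds ratio for a difference of k teeth is the per-tooth odds ratio raised to the power k. A minimal sketch with a hypothetical helper name:

```python
import math


def odds_ratio_for_increment(or_per_unit, k):
    """Odds ratio implied for a k-unit difference in a linear predictor term."""
    beta = math.log(or_per_unit)  # log-odds change per one-unit difference
    return math.exp(beta * k)


# Reported per-unit OR for total teeth is 0.93; a 10-tooth difference is
# shown only as an illustration of the scaling.
or_10_teeth = odds_ratio_for_increment(0.93, 10)
print(round(or_10_teeth, 2))  # about 0.48
```

Under that assumption, a participant with ten more teeth would have roughly half the odds of hypoalbuminemia, holding the other covariates fixed.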
Previously reported inverse relationships between serum albumin and root caries were not evident in our study population. We propose that these null findings are because of the considerably lower level of root caries as well as other differing characteristics (including oral health status, the chronic presence of T2DM, and predominantly younger age) within our study population compared to these previously assessed groups.
diabetes; Gullah African Americans; root caries; serum albumin
Using multiplex bead assays to measure urine proteins has great potential for biomarker discovery, but substances in urine (the matrix) can interfere with assay measurements. By comparing the recovery of urine spiked with known quantities of several common analytes, this study demonstrated that the urine matrix variably interfered with the accurate measurement of low-abundance proteins. Dilution of the urine permitted a more accurate measure of these proteins, equivalent to the standard dilution technique when the diluted analytes were above the limits of detection of the assay. Therefore, dilution can be used as an effective technique for overcoming urine matrix effects in urine immunoassays. These results may be applicable to other biological fluids in which matrix components interfere with assay performance.
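The spike-recovery and dilution calculations underlying this kind of comparison can be sketched as follows. The function names and all readings are hypothetical, and the formulas are the conventional ones rather than anything taken from the study itself.

```python
def percent_recovery(spiked_reading, unspiked_reading, spike_amount):
    """Percentage of a known spike that the assay recovers.

    Values near 100 suggest little matrix interference; values well
    below or above 100 suggest suppression or enhancement.
    """
    if spike_amount <= 0:
        raise ValueError("spike_amount must be positive")
    return 100.0 * (spiked_reading - unspiked_reading) / spike_amount


def back_calculate(diluted_reading, dilution_factor):
    """Concentration in the undiluted sample implied by a diluted reading."""
    return diluted_reading * dilution_factor


# Hypothetical readings in pg/ml (not study data).
neat = percent_recovery(spiked_reading=110.0, unspiked_reading=50.0, spike_amount=100.0)
diluted = percent_recovery(spiked_reading=36.5, unspiked_reading=12.5, spike_amount=25.0)
print(neat, diluted, back_calculate(12.5, 4))
```

Dilution only helps while the diluted analyte remains above the assay's limit of detection, so a real workflow would check each diluted reading against that limit before back-calculating.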
biomarkers; body fluids urine; analysis/urine; standard addition; assay validation
To utilize the large volume of gene expression information generated from different microarray experiments, several meta-analysis techniques have been developed. Despite these efforts, there remain significant challenges to effectively increasing the statistical power and decreasing the Type I error rate while pooling the heterogeneous datasets from public resources. The objective of this study is to develop a novel meta-analysis approach, Consistent Differential Expression Pattern (CDEP), to identify genes with common differential expression patterns across different datasets.
We combined False Discovery Rate (FDR) estimation and the non-parametric RankProd approach to estimate the Type I error rate in each microarray dataset of the meta-analysis. These Type I error rates from all datasets were then used to identify genes with common differential expression patterns. Our simulation study showed that CDEP achieved higher statistical power and maintained low Type I error rate when compared with two recently proposed meta-analysis approaches. We applied CDEP to analyze microarray data from different laboratories that compared transcription profiles between metastatic and primary cancer of different types. Many genes identified as differentially expressed consistently across different cancer types are in pathways related to metastatic behavior, such as ECM-receptor interaction, focal adhesion, and blood vessel development. We also identified novel genes such as AMIGO2, Gem, and CXCL11 that have not been shown to associate with, but may play roles in, metastasis.
CDEP is a flexible approach that borrows information from each dataset in a meta-analysis in order to identify genes that are consistently differentially expressed. We have shown that CDEP can achieve higher statistical power than other existing approaches under a variety of settings considered in the simulation study, suggesting its robustness and insensitivity to the data variation commonly associated with microarray experiments.
Availability: CDEP is implemented in R and freely available at: http://genomebioinfo.musc.edu/CDEP/
Background and objectives
Epidemiological studies have established that patients with diabetes have an increased prevalence and severity of periodontal disease. Interleukin (IL)-6, a multifunctional cytokine, plays a role in the tissue inflammation that characterizes periodontal disease. Our recent study has shown a trend of increase in periodontal IL-6 expression at the mRNA level across patients with neither periodontal disease nor diabetes, patients with periodontal disease alone and patients with both diseases. However, the periodontal IL-6 expression at the protein level in these patients has not been investigated.
Material and Methods
Periodontal tissue specimens were collected from eight patients without periodontal disease and diabetes (group 1), from 17 patients with periodontal disease alone (group 2) and from 10 patients with both periodontal disease and diabetes (group 3). The frozen sections were prepared from these tissue specimens and IL-6 protein expression was detected and quantified.
The nonparametric Kruskal-Wallis test showed that differences in IL-6 protein levels among the three groups were statistically significant (p = 0.035). Nonparametric analysis using Jonckheere-Terpstra test showed a tendency of increase in periodontal IL-6 protein levels across group 1 to group 2 to group 3 (p = 0.006). Parametric analysis of variance (ANOVA) on IL-6 protein levels showed that neither age nor gender significantly affected the difference of IL-6 levels among the groups.
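As a reminder of what the Kruskal-Wallis comparison computes, the sketch below ranks the pooled observations and forms the H statistic for three groups. It is an illustration only: the data are hypothetical, no tie correction is applied to H, and the p-value uses the chi-square approximation with 2 degrees of freedom, whose upper tail has the closed form exp(-H/2).

```python
import math


def kruskal_wallis_three_groups(groups):
    """Return (H, approximate p) for exactly three independent groups."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = sorted(v for g in groups for v in g)
    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    n = len(pooled)
    weighted = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * weighted - 3.0 * (n + 1)
    p = math.exp(-h / 2.0)  # chi-square survival function with 2 df
    return h, p


# Hypothetical protein levels for three groups (not study data).
h, p = kruskal_wallis_three_groups([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(h, p)
```

With samples as small as those in the study, an exact or permutation p-value may be preferable to the chi-square approximation used here.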
Periodontal IL-6 expression at the protein level is increased across patients with neither periodontal disease nor diabetes, patients with periodontal disease alone, and patients with both diseases.
Periodontal diseases; Diabetes mellitus; IL-6; Gene expression
Motivation: Highly sensitive and specific screening tools may reduce disease-related mortality by enabling physicians to diagnose diseases in asymptomatic patients or at-risk individuals. Diagnostic tests based on multiple biomarkers may achieve the needed sensitivity and specificity to realize this clinical gain.
Results: Logic regression, a multivariable regression method predicting an outcome using logical combinations of binary predictors, yields interpretable models of the complex interactions in biologic systems. However, its performance degrades in noisy data. We extend logic regression for classification to an ensemble of logic trees (Logic Forest, LF). We conduct simulation studies comparing the ability of logic regression and LF to identify variable interactions predictive of disease status. Our findings indicate LF is superior to logic regression for identifying important predictors. We apply our method to single nucleotide polymorphism data to determine associations of genetic and health factors with periodontal disease.
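To make "logical combinations of binary predictors" concrete, the sketch below represents a logic tree as nested tuples and evaluates it on one subject's binary predictors. The representation, the predictor names, and the simple majority vote over several trees are all assumptions for illustration; they are not the LF implementation or its actual aggregation rule.

```python
def evaluate(tree, x):
    """Evaluate a logic tree on a dict mapping predictor names to 0/1.

    A tree is either a predictor name (str) or one of the tuples
    ("not", t), ("and", t1, t2), ("or", t1, t2).
    """
    if isinstance(tree, str):
        return bool(x[tree])
    op = tree[0]
    if op == "not":
        return not evaluate(tree[1], x)
    if op == "and":
        return evaluate(tree[1], x) and evaluate(tree[2], x)
    if op == "or":
        return evaluate(tree[1], x) or evaluate(tree[2], x)
    raise ValueError("unknown operator: %r" % (op,))


def majority_vote(trees, x):
    """Predict 1 when more than half of the trees evaluate to True."""
    votes = sum(1 for t in trees if evaluate(t, x))
    return int(2 * votes > len(trees))


# Hypothetical binary predictors and trees (names are made up).
subject = {"snp_a": 1, "snp_b": 0, "smoker": 1}
trees = [
    ("and", "snp_a", ("not", "snp_b")),
    ("or", "snp_b", "smoker"),
    ("and", "snp_b", "smoker"),
]
print(majority_vote(trees, subject))
```

Because each tree is an explicit Boolean expression, a fitted ensemble can be inspected for the predictor combinations that recur across trees, which is what makes interaction discovery possible.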
Availability: LF code is publicly available on CRAN, http://cran.r-project.org/.
Supplementary information: Supplementary data are available at Bioinformatics online.
One of the most important indicators of dental caries prevalence is the total count of decayed, missing or filled (DMF) surfaces in a tooth. These count data are often clustered in nature (several count responses clustered within a subject), over-dispersed, and spatially referenced (a diseased tooth might positively influence the decay process of a set of neighboring teeth). In this paper, we develop a multivariate spatial Beta-Binomial (BB) model for these data that accommodates both over-dispersion and latent spatial associations. Using a Bayesian paradigm, the re-parameterized marginal mean and variance under the BB framework are modeled using a regression on subject/tooth-specific covariates and a conditionally autoregressive (CAR) prior that models the latent spatial process. The necessity of exploiting spatial associations to model count data arising in dental caries research is demonstrated using a small simulation study. An analysis of real data confirms that our spatial BB model provides superior estimation and model fit compared to sub-models that do not model spatial associations.
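The over-dispersion a Beta-Binomial model captures can be seen from its mean-dispersion form. The sketch below uses one common re-parameterization, with mean mu and intra-cluster correlation rho = 1/(alpha + beta + 1); this may differ from the re-parameterization used in the paper, and the values of n, mu, and rho are hypothetical.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(y, n, mu, rho):
    """P(Y = y) for a Beta-Binomial with n trials, mean mu, correlation rho.

    Requires 0 < mu < 1 and 0 < rho < 1.
    """
    total = (1.0 - rho) / rho            # alpha + beta
    a, b = mu * total, (1.0 - mu) * total
    log_choose = (math.lgamma(n + 1) - math.lgamma(y + 1)
                  - math.lgamma(n - y + 1))
    return math.exp(log_choose + log_beta(y + a, n - y + b) - log_beta(a, b))


# Hypothetical tooth with n = 5 surfaces, mean DMF proportion 0.3, rho = 0.2.
n, mu, rho = 5, 0.3, 0.2
pmf = [beta_binomial_pmf(y, n, mu, rho) for y in range(n + 1)]
mean = sum(y * p for y, p in enumerate(pmf))
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
# The variance exceeds the binomial value n*mu*(1-mu) by the factor 1+(n-1)*rho.
print(mean, var, n * mu * (1 - mu) * (1 + (n - 1) * rho))
```

In a regression setting mu would be linked to covariates and spatial random effects, while rho controls how far the variance is inflated beyond the binomial.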
Beta-Binomial; conditionally auto-regressive (CAR); count data; dental caries; spatial
Measurement error is pervasive in medical research. In periodontal research studies, one measure of disease status is the probed pocket depth (PPD), the depth of the space between a tooth and the surrounding gum. In larger studies, these assessments are made by multiple examiners, each having distinct measurement error characteristics. Because PPD is recorded in whole millimeters, it may be regarded as discrete and its associated error as misclassification error. This study investigates the impact of this measurement error when evaluating the effect of periodontal disease status on levels of inflammatory markers in gingival crevicular fluid (GCF). The marker readings are either left or right censored, due to quantities that are either too small to be reliably quantified or so large that they saturate the detector. Additionally, marker readings from multiple periodontal sites within a subject's mouth are correlated. These considerations give rise to a clustered survival model for the marker readings in which the discrete predictor of interest is misclassified. Associations between the GCF markers and periodontal assessments are corrected for misclassification error using the MC-SIMEX method. Simulation studies reveal the impact of varying degrees of misclassification error on associations of interest. Analysis of pilot data from a periodontal study, for which examiner misclassification rates are estimated from calibration studies, further illustrates the approach.
African Americans have a disproportionate burden of diabetes. Gullah African Americans are the most genetically homogeneous population of African descent in the US, with an estimated European Caucasian admixture of only 3.5%. This study assessed the previously unknown prevalence of periodontal disease among a sample of Gullah African Americans with diabetes and investigated the association between diabetes control and presence of periodontal disease.
Gullah African Americans with Type 2 diabetes (n=235) were included. Diabetes control was assessed by HbA1C, and divided into three categories: well controlled, <7%; moderately controlled, 7–8.5%; and poorly controlled, >8.5%. Participants were categorized as healthy, having no clinical attachment loss (CAL) or bleeding on probing (BOP); early periodontitis, having CAL ≥1 mm in ≥2 teeth; moderate periodontitis, having 3 sites with CAL ≥4 mm and at least 2 sites with probing depth (PD) ≥3 mm; and severe periodontitis, having CAL ≥6 mm in ≥2 teeth and PD ≥5 mm in ≥1 site. Observed prevalences of periodontitis were compared to rates reported for the NHANES studies.
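The three diabetes-control categories can be written as a simple rule. The cut-points are those stated above; the function name is hypothetical.

```python
def classify_hba1c(hba1c_percent):
    """Map an HbA1c value (in percent) to the study's control category."""
    if hba1c_percent < 7.0:
        return "well controlled"
    if hba1c_percent <= 8.5:
        return "moderately controlled"
    return "poorly controlled"


for value in (6.4, 7.0, 8.5, 9.2):
    print(value, classify_hba1c(value))
```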
All subjects had evidence of periodontal disease: 70.6% had moderate periodontitis and 28.5% had severe disease. Diabetes control was not associated with periodontal disease. The periodontal disease proportions were significantly higher than the reported national prevalence of 10.6% among African Americans without diabetes.
Our sample of Gullah African Americans with type 2 diabetes exhibits a higher prevalence of periodontal disease than African Americans, both with and without diabetes, described in NHANES III and NHANES 1999–2000.
Type 2 diabetes; Gullah African Americans; Periodontal disease
We investigated the efficacy of plaque removal after an oral self-care demonstration among adult Gullah-speaking African Americans with diabetes. Fifty-four adults with diabetes completed an observed, uninstructed oral self-care demonstration with their normal mode of oral self-care. Before and after the oral self-care demonstration, the plaque levels of six test teeth were assessed using the Plaque Index. The mean percentage of plaque removal after the oral self-care demonstration was 27.4%. The mandibular teeth and the lingual surfaces had less plaque removal than the maxillary teeth and buccal surfaces. Only approximately 10% of participants achieved 50% or more plaque removal after the oral self-care demonstration. Thus, the majority of the participants did not achieve an acceptable level of plaque removal. Dental health professionals should emphasize better oral home care for people with diabetes and teach them how to access the lingual surfaces, especially of the mandibular teeth.
An easy-to-implement global procedure for testing the four assumptions of the linear model is proposed. The test can be viewed as a Neyman smooth test and it only relies on the standardized residual vector. If the global procedure indicates a violation of at least one of the assumptions, the components of the global test statistic can be utilized to gain insights into which assumptions have been violated. The procedure can also be used in conjunction with associated deletion statistics to detect unusual observations. Simulation results are presented indicating the sensitivity of the procedure in detecting model violations under a variety of situations, and its performance is compared with three potential competitors, including a procedure based on the Box-Cox power transformation. The procedure is demonstrated by applying it to a new car mileage data set and a water salinity data set that has been used previously to illustrate model diagnostics.
Box-Cox transformation; Deletion statistics; Model diagnostics and validation; Neyman smooth test; Outlier detection; Score test
To explore factors associated with self-reported current oral (tooth and gum) problems and oral pain in the past 12 months among adults with spinal cord injury.
An online oral health survey on the South Carolina Spinal Cord Injury Association website. Respondents were 192 adult residents of the US who identified themselves as having spinal cord injury at least 1 year before the survey date.
Approximately 47% of respondents reported having oral problems at the time of the survey, and 42% reported experiencing oral pain in the 12 months before the survey date. Multiple predictor analyses (controlling for age, gender, income, and dental insurance) indicated that current oral problems were positively associated with dry mouth symptoms, financial barriers to dental care access, smoking, and paraplegia. Oral pain experienced in the past 12 months was positively associated with dry mouth symptoms, financial barriers to dental care access, minority race, and paraplegia.
Adults with spinal cord injury reported a high prevalence of oral problems and oral pain. Those with paraplegia were more likely to report problems than those with tetraplegia. Because dry mouth and smoking were significantly associated with these problems, patient education from both dental and medical providers should emphasize awareness of the side effects of xerostomia-causing medications, dry mouth management, and smoking cessation. Findings also indicate unmet needs for low-cost preventive and treatment dental services for this vulnerable population.
Spinal cord injuries; Paraplegia; Tetraplegia; Dental care; Oral health; Oral pain; Xerostomia; Smoking cessation; Dental hygiene; Activities of daily living
Meta-analytic methods for diagnostic test performance, Bayesian methods in particular, have not been well developed. The most commonly used method for meta-analysis of diagnostic test performance is the Summary Receiver Operator Characteristic (SROC) curve approach of Moses, Shapiro and Littenberg. In this paper, we provide a brief summary of the SROC method, then present a case study of a Bayesian adaptation of their SROC curve method that retains the simplicity of the original model while additionally incorporating uncertainty in the parameters, and can also easily be extended to incorporate the effect of covariates. We further derive a simple transformation which facilitates prior elicitation from clinicians. The method is applied to two datasets: an assessment of computed tomography for detecting metastases in non-small-cell lung cancer, and a novel dataset to assess the diagnostic performance of endoscopic ultrasound (EUS) in the detection of biliary obstructions relative to the current gold standard of endoscopic retrograde cholangiopancreatography (ERCP).
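The classical (non-Bayesian) SROC construction of Moses, Shapiro and Littenberg regresses D = logit(TPR) - logit(FPR) on S = logit(TPR) + logit(FPR) and then back-transforms the fitted line into a curve. The sketch below uses unweighted least squares and hypothetical study values, and assumes every rate lies strictly between 0 and 1 (no continuity correction); it illustrates the original method only, not the Bayesian adaptation described here.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def fit_sroc(studies):
    """Fit D = a + b*S by ordinary least squares.

    studies is a list of (tpr, fpr) pairs; returns (a, b).
    """
    d = [logit(t) - logit(f) for t, f in studies]
    s = [logit(t) + logit(f) for t, f in studies]
    s_bar = sum(s) / len(s)
    d_bar = sum(d) / len(d)
    sxx = sum((si - s_bar) ** 2 for si in s)
    b = sum((si - s_bar) * (di - d_bar) for si, di in zip(s, d)) / sxx
    a = d_bar - b * s_bar
    return a, b


def sroc_tpr(fpr, a, b):
    """Sensitivity on the fitted SROC curve at a given false positive rate."""
    return expit(a / (1.0 - b) + (1.0 + b) / (1.0 - b) * logit(fpr))


# Hypothetical (sensitivity, false positive rate) pairs from four studies.
studies = [(0.90, 0.20), (0.80, 0.10), (0.85, 0.15), (0.70, 0.05)]
a, b = fit_sroc(studies)
print(a, b, sroc_tpr(0.10, a, b))
```

The intercept a is the log diagnostic odds ratio at S = 0, and b measures how accuracy varies with the positivity threshold; a Bayesian version would place priors on these two parameters.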
Bayesian; biliary system; ERCP; EUS; SROC
Procedures for estimating the parameters of the general class of semiparametric models for recurrent events proposed by Peña and Hollander (2004) are developed. This class of models incorporates an effective age function encoding the effect of changes after each event occurrence, such as the impact of an intervention; it models the impact of accumulating event occurrences on the unit; it admits a link function in which the effect of possibly time-dependent covariates is incorporated; and it allows the incorporation of unobservable frailty components which induce dependencies among the inter-event times for each unit. The estimation procedures are semiparametric in that a baseline hazard function is nonparametrically specified. The sampling distribution properties of the estimators are examined through a simulation study, and the consequences of mis-specifying the model are analyzed. The results indicate that the flexibility of this general class of models provides a safeguard for analyzing recurrent event data, even data possibly arising from a frailtyless mechanism. The estimation procedures are applied to real data sets arising in biomedical and public health settings, as well as in reliability and engineering situations. In particular, the procedures are applied to a data set pertaining to times to recurrence of bladder cancer, and the results of the analysis are compared to those obtained using three methods of analyzing recurrent event data.
Correlated inter-event times; counting process; effective age process; EM algorithm; frailty; intensity models; model mis-specification; sum-quota accrual scheme
We describe biological and experimental factors that induce variability in reporter ion peak areas obtained from iTRAQ experiments. We demonstrate how these factors can be incorporated into a statistical model for use in evaluating differential protein expression and highlight the benefits of using analysis of variance to quantify fold change. We demonstrate the model's utility based on an analysis of iTRAQ data derived from a spike-in study.
Analysis of variance; ANOVA; iTRAQ; mass spectrometry; differential expression
Expression microarray analysis is one of the most popular molecular diagnostic techniques in the post-genomic era. However, this technique faces the fundamental problem of potential cross-hybridization. This is a pervasive problem for both oligonucleotide and cDNA microarrays; it is considered particularly problematic for the latter. No comprehensive multivariate predictive modeling has been performed to understand how multiple variables contribute to (cross-) hybridization.
We propose a systematic search strategy using multiple multivariate models [multiple linear regressions, regression trees, and artificial neural network analyses (ANNs)] to select an effective set of predictors for hybridization. We validate this approach on a set of DNA microarrays with cytochrome p450 family genes. The performance of our multiple multivariate models is compared with that of a recently proposed third-order polynomial regression method that uses percent identity as the sole predictor. All multivariate models agree that the 'most contiguous base pairs between probe and target sequences,' rather than percent identity, is the best univariate predictor. The predictive power is improved by inclusion of additional nonlinear effects, in particular target GC content, when regression trees or ANNs are used.
A systematic multivariate approach is provided to assess the importance of multiple sequence features for hybridization and of relationships among these features. This approach can easily be applied to larger datasets. This will allow future developments of generalized hybridization models that will be able to correct for false-positive cross-hybridization signals in expression experiments.
The aim of this study is to explore behavioral factors associated with toothache among African American adolescents living in rural South Carolina.
Using a self-administered questionnaire, data were collected on toothache experience in the past 12 months, oral hygiene behavior, dental care utilization, and cariogenic snack and non-diet soft drink consumption in a convenience sample of 156 African American adolescents aged 10–18 years living in rural South Carolina. Univariable and multivariable logistic regression analyses were used to assess the associations between reported toothache experience and socio-demographic variables, oral health behavior, and snack consumption.
Thirty-four percent of adolescents reported having toothache in the past 12 months. In univariable modeling, age, dental visit in the last two years, quantity and frequency of cariogenic snack consumption, and quantity of non-diet soft drink consumption were each significantly associated with experiencing toothache in the past 12 months (all p-values < 0.05). Multivariable logistic regression analysis indicated that younger age, frequent consumption of cariogenic snacks, and number of cans of non-diet soft drink consumed during the weekend significantly increased the odds of experiencing toothache in the past 12 months (all p-values ≤ 0.01).
Findings indicate that age, frequent consumption of cariogenic snacks, and the number of cans of non-diet soft drinks consumed are related to toothache in this group. Public policy implications related to the sale of cariogenic snacks and soft drinks that target children and adolescents, especially those from low-income families, are discussed.
Dental pain; carbonated beverages; dietary sucrose; rural health; questionnaire