1.  Maximum Likelihood, Profile Likelihood, and Penalized Likelihood: A Primer 
American Journal of Epidemiology  2013;179(2):252-260.
The method of maximum likelihood is widely used in epidemiology, yet many epidemiologists receive little or no education in the conceptual underpinnings of the approach. Here we provide a primer on maximum likelihood and some important extensions which have proven useful in epidemiologic research, and which reveal connections between maximum likelihood and Bayesian methods. For a given data set and probability model, maximum likelihood finds values of the model parameters that give the observed data the highest probability. As with all inferential statistical methods, maximum likelihood is based on an assumed model and cannot account for bias sources that are not controlled by the model or the study design. Maximum likelihood is nonetheless popular, because it is computationally straightforward and intuitive and because maximum likelihood estimators have desirable large-sample properties in the (largely fictitious) case in which the model has been correctly specified. Here, we work through an example to illustrate the mechanics of maximum likelihood estimation and indicate how improvements can be made easily with commercial software. We then describe recent extensions and generalizations which are better suited to observational health research and which should arguably replace standard maximum likelihood as the default method.
PMCID: PMC3873110  PMID: 24173548
epidemiologic methods; maximum likelihood; modeling; penalized estimation; regression; statistics
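The central idea of the abstract above, choosing the parameter value that gives the observed data the highest probability, can be sketched for a one-parameter binomial model. This is a minimal illustration, not the paper's worked example: the counts (30 events among 100 people), the function names, and the Beta-shaped penalty are all assumptions made here.

```python
import math

def log_likelihood(p, events, n, penalty_a=1.0, penalty_b=1.0):
    """Binomial log-likelihood for risk p (constant term omitted).

    penalty_a and penalty_b add a Beta(a, b)-shaped log penalty. The
    defaults of 1.0 add nothing, giving ordinary maximum likelihood.
    """
    loglik = events * math.log(p) + (n - events) * math.log(1.0 - p)
    penalty = (penalty_a - 1.0) * math.log(p) + (penalty_b - 1.0) * math.log(1.0 - p)
    return loglik + penalty

def grid_maximize(events, n, penalty_a=1.0, penalty_b=1.0, steps=9999):
    """Return the grid point in (0, 1) with the highest (penalized) log-likelihood."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda p: log_likelihood(p, events, n, penalty_a, penalty_b))

# Hypothetical data, not taken from the paper: 30 events among 100 people.
mle = grid_maximize(30, 100)                                      # ordinary ML
penalized = grid_maximize(30, 100, penalty_a=2.0, penalty_b=2.0)  # penalized ML
```

With this particular penalty, the penalized maximum coincides with the posterior mode under a Beta(2, 2) prior, which is one simple instance of the link between penalized likelihood and Bayesian methods that the abstract mentions. A grid search is used only for transparency; in practice a numerical optimizer or the closed form would be preferred.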
2.  Statistical Methods for Multivariate Meta-analysis of Diagnostic Tests: An Overview and Tutorial 
Statistical methods in medical research  2013;10.1177/0962280213492588.
In this article, we present an overview and tutorial of statistical methods for meta-analysis of diagnostic tests under two scenarios: 1) when the reference test can be considered a gold standard; and 2) when the reference test cannot be considered a gold standard. In the first scenario, we first review the conventional summary receiver operating characteristics (ROC) approach and a bivariate approach using linear mixed models (BLMM). Both approaches require direct calculations of study-specific sensitivities and specificities. We next discuss the hierarchical summary ROC curve approach for jointly modeling positivity criteria and accuracy parameters, and the bivariate generalized linear mixed models (GLMM) for jointly modeling sensitivities and specificities. We further discuss the trivariate GLMM for jointly modeling prevalence, sensitivities and specificities, which allows us to assess the correlations among the three parameters. These approaches are based on the exact binomial distribution and thus do not require an ad hoc continuity correction. Last, we discuss a latent class random effects model for meta-analysis of diagnostic tests when the reference test itself is imperfect for the second scenario. A number of case studies with detailed annotated SAS code in procedures MIXED and NLMIXED are presented to facilitate the implementation of these approaches.
PMCID: PMC3883791  PMID: 23804970
meta-analysis; diagnostic test; gold standard; generalized linear mixed models
3.  Change-Point Models to Estimate the Limit of Detection 
Statistics in medicine  2013;32(28):4995-5007.
In many biological and environmental studies, measured data is subject to a limit of detection. The limit of detection is generally defined as the lowest concentration of analyte that can be differentiated from a blank sample with some certainty. Data falling below the limit of detection is left-censored, falling below a level that is easily quantified by a measuring device. A great deal of interest lies in estimating the limit of detection for a particular measurement device. In this paper we propose a change-point model to estimate the limit of detection using data from an experiment with known analyte concentrations. Estimation of the limit of detection proceeds by a two-stage maximum likelihood method. Extensions are considered that allow for censored measurements and data from multiple experiments. A simulation study is conducted demonstrating that in some settings the change-point model provides less biased estimates of the limit of detection than conventional methods. The proposed method is then applied to data from an HIV pilot study.
PMCID: PMC3858526  PMID: 23784922
change point; limit of detection; linear calibration curve; two-stage maximum likelihood
4.  A trivariate meta-analysis of diagnostic studies accounting for prevalence and non-evaluable subjects: re-evaluation of the meta-analysis of coronary CT angiography studies 
A recent paper proposed an intent-to-diagnose approach to handle non-evaluable index test results and discussed several alternative approaches, with an application to the meta-analysis of coronary CT angiography diagnostic accuracy studies. However, no simulation studies have been conducted to test the performance of the methods.
We propose an extended trivariate generalized linear mixed model (TGLMM) to handle non-evaluable index test results. The performance of the intent-to-diagnose approach, the alternative approaches and the extended TGLMM approach is examined by extensive simulation studies. The meta-analysis of coronary CT angiography diagnostic accuracy studies is re-evaluated by the extended TGLMM.
Simulation studies showed that the intent-to-diagnose approach underestimates sensitivity and specificity. Under the missing at random (MAR) assumption, the TGLMM gives nearly unbiased estimates of test accuracy indices and disease prevalence. After applying the TGLMM approach to re-evaluate the coronary CT angiography meta-analysis, overall median sensitivity is 0.98 (0.967, 0.993), specificity is 0.875 (0.827, 0.923) and disease prevalence is 0.478 (0.379, 0.577).
Under the MAR assumption, the intent-to-diagnose approach underestimates both sensitivity and specificity, while the extended TGLMM gives nearly unbiased estimates of sensitivity, specificity and prevalence. We recommend the extended TGLMM to handle non-evaluable index test subjects.
PMCID: PMC4280699  PMID: 25475705
Meta-analysis; Diagnostic test; Non-evaluable subjects
5.  A Bayesian approach to strengthen inference for case-control studies with multiple error-prone exposure assessments 
Statistics in medicine  2013;32(25):4426-4437.
In case-control studies, exposure assessments are almost always error-prone. In the absence of a gold standard, two or more assessment approaches are often used to classify people with respect to exposure. Each imperfect assessment tool may lead to misclassification of exposure assignment; the exposure misclassification may be differential with respect to case status or not; and the errors in exposure classification under the different approaches may be independent (conditional upon the true exposure status) or not. Although methods have been proposed to study diagnostic accuracy in the absence of a gold standard, these methods are infrequently used in case-control studies to correct exposure misclassification that is simultaneously differential and dependent. In this paper, we propose a Bayesian method to estimate the measurement-error corrected exposure-disease association, accounting for both differential and dependent misclassification. The performance of the proposed method is investigated using simulations, which show that the approach works well, and the method is applied to a case-control study assessing the association between asbestos exposure and mesothelioma.
PMCID: PMC3788843  PMID: 23661263
Case-control study; gold standard; misclassification; dependent; differential
6.  DNA methylation profiling in the Carolina Breast Cancer Study defines cancer subclasses differing in clinicopathologic characteristics and survival 
Breast cancer is a heterogeneous disease, with several intrinsic subtypes differing by hormone receptor (HR) status, molecular profiles, and prognosis. However, the role of DNA methylation in breast cancer development and progression and its relationship with the intrinsic tumor subtypes are not fully understood.
A microarray targeting promoters of cancer-related genes was used to evaluate DNA methylation at 935 CpG sites in 517 breast tumors from the Carolina Breast Cancer Study, a population-based study of invasive breast cancer.
Consensus clustering using methylation (β) values for the 167 most variant CpG loci defined four clusters differing most distinctly in HR status, intrinsic subtype (luminal versus basal-like), and p53 mutation status. Supervised analyses for HR status, subtype, and p53 status identified 266 differentially methylated CpG loci with considerable overlap. Genes relatively hypermethylated in HR+, luminal A, or p53 wild-type breast cancers included FABP3, FGF2, FZD9, GAS7, HDAC9, HOXA11, MME, PAX6, POMC, PTGS2, RASSF1, RBP1, and SCGB3A1, whereas those more highly methylated in HR-, basal-like, or p53 mutant tumors included BCR, C4B, DAB2IP, MEST, RARA, SEPT5, TFF1, THY1, and SERPINA5. Clustering also defined a hypermethylated luminal-enriched tumor cluster 3 that gene ontology analysis revealed to be enriched for homeobox and other developmental genes (ASCL2, DLK1, EYA4, GAS7, HOXA5, HOXA9, HOXB13, IHH, IPF1, ISL1, PAX6, TBX1, SOX1, and SOX17). Although basal-enriched cluster 2 showed worse short-term survival, the luminal-enriched cluster 3 showed worse long-term survival but was not independently prognostic in multivariate Cox proportional hazard analysis, likely due to the mostly early stage cases in this dataset.
This study demonstrates that epigenetic patterns are strongly associated with HR status, subtype, and p53 mutation status and may show heterogeneity within tumor subclass. Among HR+ breast tumors, a subset exhibiting a gene signature characterized by hypermethylation of developmental genes and poorer clinicopathologic features may have prognostic value and requires further study. Genes differentially methylated between clinically important tumor subsets have roles in differentiation, development, and tumor growth and may be critical to establishing and maintaining tumor phenotypes and clinical outcomes.
Electronic supplementary material
The online version of this article (doi:10.1186/s13058-014-0450-6) contains supplementary material, which is available to authorized users.
PMCID: PMC4303129  PMID: 25287138
7.  An Empirical Bayes Method for Multivariate Meta-analysis with an Application in Clinical Trials 
We propose an empirical Bayes method for evaluating overall and study-specific treatment effects in multivariate meta-analysis with binary outcome. Instead of modeling transformed proportions or risks via commonly used multivariate general or generalized linear models, we directly model the risks without any transformation. The exact posterior distribution of the study-specific relative risk is derived. The hyperparameters in the posterior distribution can be inferred through an empirical Bayes procedure. As our method does not rely on the choice of transformation, it provides a flexible alternative to the existing methods and in addition, the correlation parameter can be intuitively interpreted as the correlation coefficient between risks.
PMCID: PMC4115294  PMID: 25089070
Bivariate beta-binomial model; Exact method; Hypergeometric function; Meta-analysis; Relative risk; Sarmanov family
8.  Graduated driver licensing and motor vehicle crashes involving teenage drivers: an exploratory age-stratified meta-analysis 
Graduated Driver Licensing (GDL) has been implemented in Australia, Canada, New Zealand, USA and Israel. We conducted an exploratory summary of available data to estimate whether GDL effects varied with age.
We searched MEDLINE and other sources from 1991–2011. GDL evaluation studies with crashes resulting in injuries or deaths were eligible. They had to provide age-specific incidence rate ratios with CI or information for calculating these quantities. We included studies from individual states or provinces, but excluded national studies. We examined rates based on person-years, not license-years.
Of 1397 papers, 144 were screened by abstract and 47 were reviewed. Twelve studies from 11 US states and one Canadian province were selected for meta-analysis for age 16, eight were selected for age 17, and four for age 18. Adjusted rate ratios were pooled using random effects models. The pooled adjusted rate ratios for the association of GDL presence with crash rates were 0.78 (95% CI 0.72 to 0.84) for age 16 years, 0.94 (95% CI 0.93 to 0.96) for 17 and 1.00 (95% CI 0.95 to 1.04) for 18. The difference among these three rate ratios was statistically significant: p<0.001.
GDL policies were associated with a 22% reduction in crash rates among 16-year-old drivers, but only a 6% reduction for 17-year-old drivers. GDL showed no association with crashes among 18-year-old drivers. Because we had few studies to summarise, particularly for older adolescents, our findings should be considered exploratory.
PMCID: PMC4103686  PMID: 23211352
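The percentage reductions quoted in the abstract above follow directly from the pooled rate ratios as (1 − rate ratio) × 100. A minimal sketch of that arithmetic, using the three pooled estimates reported in the abstract (the function and variable names are illustrative, not from the paper):

```python
def percent_reduction(rate_ratio):
    """Convert a rate ratio into a percentage reduction in the rate."""
    return round((1.0 - rate_ratio) * 100.0, 1)

# Pooled adjusted rate ratios reported in the abstract, keyed by driver age.
pooled_rate_ratios = {16: 0.78, 17: 0.94, 18: 1.00}

reductions = {age: percent_reduction(rr) for age, rr in pooled_rate_ratios.items()}
```

A rate ratio of 1.00 corresponds to no reduction, which matches the statement that GDL showed no association with crashes among 18-year-old drivers.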
9.  Nitrogen Dioxide and Allergic Sensitization in the 2005–2006 National Health and Nutrition Examination Survey 
Respiratory medicine  2013;107(11):1763-1772.
Allergic sensitization is a risk factor for asthma and allergic diseases. The relationship between ambient air pollution and allergic sensitization is unclear.
To investigate the relationship between ambient air pollution and allergic sensitization in a nationally representative sample of the US population.
We linked annual average concentrations of nitrogen dioxide (NO2), particulate matter ≤ 10 µm (PM10), particulate matter ≤ 2.5 µm (PM2.5), and summer concentrations of ozone (O3), to allergen-specific immunoglobulin E (IgE) data for participants in the 2005–2006 National Health and Nutrition Examination Survey (NHANES). In addition to the monitor-based air pollution estimates, we used the Community Multiscale Air Quality (CMAQ) model to increase the representation of rural participants in our sample. Logistic regression with population-based sampling weights was used to calculate adjusted prevalence odds ratios per 10 ppb increase in O3 and NO2, per 10 µg/m3 increase in PM10, and per 5 µg/m3 increase in PM2.5 adjusting for race, gender, age, socioeconomic status, smoking, and urban/rural status.
Using CMAQ data, increased levels of NO2 were associated with positive IgE to any (OR 1.15, 95% CI 1.04, 1.27), inhalant (OR 1.17, 95% CI 1.02, 1.33), and outdoor (OR 1.16, 95% CI 1.03, 1.31) allergens. Higher PM2.5 levels were associated with positivity to indoor allergen-specific IgE (OR 1.24, 95% CI 1.13, 1.36). Effect estimates were similar using monitored data.
Increased ambient NO2 was consistently associated with increased prevalence of allergic sensitization.
PMCID: PMC4071349  PMID: 24045117
air pollution; allergic; sensitization; epidemiology; NHANES; IgE
10.  mmeta: An R Package for Multivariate Meta-Analysis 
This paper describes the core features of the R package mmeta, which implements the exact posterior inference of odds ratio, relative risk, and risk difference given either a single 2 × 2 table or multiple 2 × 2 tables when the risks within the same study are independent or correlated.
PMCID: PMC4043353  PMID: 24904241
Appell function; Bayesian inference; bivariate beta-binomial; exact distribution; hypergeometric function; Sarmanov family
11.  Sample Size Determination in Shared Frailty Models for Multivariate Time-to-Event Data 
The frailty model is increasingly popular for analyzing multivariate time-to-event data. The most common model is the shared frailty model. Although study design consideration is as important as analysis strategies, sample size determination methodology in studies with multivariate time-to-event data is greatly lacking in the literature. In this paper, we develop a sample size determination method for the shared frailty model to investigate the treatment effect on multivariate event times. We analyze the data using both a parametric model and a piecewise model with unknown baseline hazard, and compare the empirical power with the calculated power. Last, we discuss the formula for testing the treatment effect on recurrent events.
PMCID: PMC4024091  PMID: 24697252
Frailty model; Multivariate survival; Sample size
12.  Flexible Stopping Boundaries When Changing Primary Endpoints after Unblinded Interim Analyses 
It has been widely recognized that interim analyses of accumulating data in a clinical trial can inflate type I error. Different methods, from group sequential boundaries to flexible alpha spending functions, have been developed to control the overall type I error at a pre-specified level. These methods mainly apply to testing the same endpoint in multiple interim analyses. In this paper, we consider a group sequential design with pre-planned endpoint switching after unblinded interim analyses. We extend the alpha spending function method to group sequential stopping boundaries when the parameters can be different between interim analyses, or between interim and final analyses.
PMCID: PMC4024106  PMID: 24697500
Alpha spending function; Switching endpoints; Stopping boundaries; Interim analyses; Group sequential trials
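For readers unfamiliar with the alpha spending functions that the abstract above builds on, the following sketch shows two standard forms from the group sequential literature (the Lan-DeMets O'Brien-Fleming-type and Pocock-type functions) for a single endpoint. It does not implement the paper's extension to endpoint switching; the function names and the two-sided overall level of 0.05 are assumptions made here for illustration.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def obrien_fleming_spending(t, alpha=0.05):
    """Cumulative two-sided alpha spent at information fraction t in (0, 1].

    O'Brien-Fleming-type form: 2 - 2 * Phi(z_{alpha/2} / sqrt(t)).
    """
    z = STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(z / math.sqrt(t)))

def pocock_spending(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t in (0, 1].

    Pocock-type form: alpha * ln(1 + (e - 1) * t).
    """
    return alpha * math.log(1.0 + (math.e - 1.0) * t)

# Alpha spent at hypothetical looks at 50% and 100% of the information.
looks = [0.5, 1.0]
obf_spent = [obrien_fleming_spending(t) for t in looks]
pocock_spent = [pocock_spending(t) for t in looks]
```

Both functions spend the full alpha by the final analysis (t = 1); the O'Brien-Fleming-type function spends very little early on, whereas the Pocock-type function spends alpha more evenly across looks.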
13.  Analysis of Occupational Asbestos Exposure and Lung Cancer Mortality Using the G Formula 
American Journal of Epidemiology  2013;177(9):989-996.
We employed the parametric G formula to analyze lung cancer mortality in a cohort of textile manufacturing workers who were occupationally exposed to asbestos in South Carolina. A total of 3,002 adults with a median age of 24 years at enrollment (58% male, 81% Caucasian) were followed for 117,471 person-years between 1940 and 2001, and 195 lung cancer deaths were observed. Chrysotile asbestos exposure was measured in fiber-years per milliliter of air, and annual occupational exposures were estimated on the basis of detailed work histories. Sixteen percent of person-years involved exposure to asbestos, with a median exposure of 3.30 fiber-years/mL among those exposed. Lung cancer mortality by age 90 years under the observed asbestos exposure was 9.44%. In comparison with observed asbestos exposure, if the facility had operated under the current Occupational Safety and Health Administration asbestos exposure standard of <0.1 fibers/mL, we estimate that the cohort would have experienced 24% less lung cancer mortality by age 90 years (mortality ratio = 0.76, 95% confidence interval: 0.62, 0.94). A further reduction in asbestos exposure to a standard of <0.05 fibers/mL was estimated to have resulted in a minimal additional reduction in lung cancer mortality by age 90 years (mortality ratio = 0.75, 95% confidence interval: 0.61, 0.92).
PMCID: PMC3639723  PMID: 23558355
asbestos; bias (epidemiology); epidemiologic methods; healthy worker effect; occupations
14.  Bayesian inference on risk differences: an application to multivariate meta-analysis of adverse events in clinical trials 
Multivariate meta-analysis is useful in combining evidence from independent studies which involve several comparisons among groups based on a single outcome. For binary outcomes, the commonly used statistical models for multivariate meta-analysis are multivariate generalized linear mixed effects models which assume risks, after some transformation, follow a multivariate normal distribution with possible correlations. In this article, we consider an alternative model for multivariate meta-analysis where the risks are modeled by the multivariate beta distribution proposed by Sarmanov (1966). This model has several attractive features compared to the conventional multivariate generalized linear mixed effects models, including simplicity of the likelihood function, no need to specify a link function, and a closed-form expression of the distribution functions for study-specific risk differences. We investigate the finite sample performance of this model by simulation studies and illustrate its use with an application to multivariate meta-analysis of adverse events of tricyclic antidepressants treatment in clinical trials.
PMCID: PMC3706106  PMID: 23853700
Bivariate beta-binomial model; Exact method; Hypergeometric function; Meta-analysis; Relative risk; Sarmanov family
15.  A prognostic signature of G₂ checkpoint function in melanoma cell lines 
Cell Cycle  2013;12(7):1071-1082.
As DNA damage checkpoints are barriers to carcinogenesis, G2 checkpoint function was quantified to test for override of this checkpoint during melanomagenesis. Primary melanocytes displayed an effective G2 checkpoint response to ionizing radiation (IR)-induced DNA damage. Thirty-seven percent of melanoma cell lines displayed a significant defect in G2 checkpoint function. Checkpoint function was melanoma subtype-specific: “epithelial-like” melanoma lines with wild-type NRAS and BRAF displayed an effective checkpoint, while lines with mutant NRAS and BRAF displayed defective checkpoint function. Expression of oncogenic B-Raf in a checkpoint-effective melanoma attenuated G2 checkpoint function significantly but modestly. Other alterations must be needed to produce the severe attenuation of G2 checkpoint function seen in some BRAF-mutant melanoma lines. Quantitative trait analysis tools identified mRNA species whose expression was correlated with G2 checkpoint function in the melanoma lines. A 165-gene signature was identified with a high correlation with checkpoint function (p < 0.004) and low false discovery rate (≤ 0.077). The G2 checkpoint gene signature predicted G2 checkpoint function with 77–94% accuracy. The signature was enriched in lysosomal genes and contained numerous genes that are associated with regulation of chromatin structure and cell cycle progression. The core machinery of the cell cycle was not altered in checkpoint-defective lines but rather numerous mediators of core machinery function were. When applied to an independent series of primary melanomas, the predictive G2 checkpoint signature was prognostic of distant metastasis-free survival. These results emphasize the value of expression profiling of primary melanomas for understanding melanoma biology and disease prognosis.
PMCID: PMC3646863  PMID: 23454897
G2 checkpoint; melanoma; microarray; ionizing radiation; oncogene
16.  The Bayesian Covariance Lasso 
Statistics and its interface  2013;6(2):243-259.
Estimation of sparse covariance matrices and their inverse subject to positive definiteness constraints has drawn a lot of attention in recent years. The abundance of high-dimensional data, where the sample size (n) is less than the dimension (d), requires shrinkage estimation methods since the maximum likelihood estimator is not positive definite in this case. Furthermore, when n is larger than d but not sufficiently larger, shrinkage estimation is more stable than maximum likelihood as it reduces the condition number of the precision matrix. Frequentist methods have utilized penalized likelihood methods, whereas Bayesian approaches rely on matrix decompositions or Wishart priors for shrinkage. In this paper we propose a new method, called the Bayesian Covariance Lasso (BCLASSO), for the shrinkage estimation of a precision (covariance) matrix. We consider a class of priors for the precision matrix that leads to the popular frequentist penalties as special cases, develop a Bayes estimator for the precision matrix, and propose an efficient sampling scheme that does not precalculate boundaries for positive definiteness. The proposed method is permutation invariant and performs shrinkage and estimation simultaneously for non-full rank data. Simulations show that the proposed BCLASSO performs similarly to frequentist methods for non-full rank data.
PMCID: PMC3925647  PMID: 24551316
Bayesian covariance lasso; non-full rank data; Network exploration; Penalized likelihood; Precision matrix
17.  Physical activity and maternal-fetal circulation measured by Doppler ultrasound 
To examine the association of physical activity with maternal-fetal circulation measured by uterine and umbilical artery Doppler flow velocimetry waveforms.
Participants included 781 pregnant women with Doppler ultrasounds of the uterine and umbilical artery and who self-reported past week physical activity. Linear and generalized estimating equation regression models were used to examine these associations.
Moderate-to-vigorous total and recreational activity were associated with higher uterine artery pulsatility index (PI) and an increased risk of uterine artery notching as compared to reporting no total or recreational physical activity, respectively. Moderate-to-vigorous work activity was associated with lower uterine artery PI and a reduced risk of uterine artery notching as compared to no work activity. No associations were identified with the umbilical circulation measured by the resistance index.
In this epidemiologic study, recreational and work activity were associated with opposite effects on uterine artery PI and uterine artery notching, though associations were modest in magnitude.
PMCID: PMC3459289  PMID: 22678142
work; recreational activity; maternal-fetal blood flow; pregnancy; Doppler flow velocimetry waveforms; preeclampsia
18.  Comparison of Viral Env Proteins from Acute and Chronic Infections with Subtype C Human Immunodeficiency Virus Type 1 Identifies Differences in Glycosylation and CCR5 Utilization and Suggests a New Strategy for Immunogen Design 
Journal of Virology  2013;87(13):7218-7233.
Understanding human immunodeficiency virus type 1 (HIV-1) transmission is central to developing effective prevention strategies, including a vaccine. We compared phenotypic and genetic variation in HIV-1 env genes from subjects in acute/early infection and subjects with chronic infections in the context of subtype C heterosexual transmission. We found that the transmitted viruses all used CCR5 and required high levels of CD4 to infect target cells, suggesting selection for replication in T cells and not macrophages after transmission. In addition, the transmitted viruses were more likely to use a maraviroc-sensitive conformation of CCR5, perhaps identifying a feature of the target T cell. We confirmed an earlier observation that the transmitted viruses were, on average, modestly underglycosylated relative to the viruses from chronically infected subjects. This difference was most pronounced in comparing the viruses in acutely infected men to those in chronically infected women. These features of the transmitted virus point to selective pressures during the transmission event. We did not observe a consistent difference either in heterologous neutralization sensitivity or in sensitivity to soluble CD4 between the two groups, suggesting similar conformations between viruses from acute and chronic infection. However, the presence or absence of glycosylation sites had differential effects on neutralization sensitivity for different antibodies. We suggest that the occasional absence of glycosylation sites encoded in the conserved regions of env, further reduced in transmitted viruses, could expose specific surface structures on the protein as antibody targets.
PMCID: PMC3700278  PMID: 23616655
19.  Bivariate Random Effects Models for Meta-Analysis of Comparative Studies with Binary Outcomes: Methods for the Absolute Risk Difference and Relative Risk 
Multivariate meta-analysis is increasingly utilized in biomedical research to combine data of multiple comparative clinical studies for evaluating drug efficacy and safety profile. When the probability of the event of interest is rare or when the individual study sample sizes are small, a substantial proportion of studies may not have any event of interest. Conventional meta-analysis methods either exclude such studies or include them through an ad hoc continuity correction by adding an arbitrary positive value to each cell of the corresponding 2 by 2 tables, which may result in less accurate conclusions. Furthermore, different continuity corrections may result in inconsistent conclusions. In this article, we discuss a bivariate Beta-binomial model derived from the Sarmanov family of bivariate distributions and a bivariate generalized linear mixed effects model for binary clustered data to make valid inferences. These bivariate random effects models use all available data without ad hoc continuity corrections, and account for the potential correlation between treatment (or exposure) and control groups within studies naturally. We then utilize the bivariate random effects models to reanalyze two recent meta-analysis data sets.
PMCID: PMC3348438  PMID: 21177306
clustered binary data; bivariate random effects models; Beta-binomial distribution; meta-analysis; bivariate generalized linear mixed models
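The sensitivity to the choice of continuity correction that the abstract above describes can be seen in a single 2 by 2 table with a zero cell. The sketch below is only an illustration of that problem, not of the bivariate models the paper proposes; the table (0/50 events under treatment, 3/50 under control), the correction values, and the function name are hypothetical.

```python
def corrected_relative_risk(events_trt, n_trt, events_ctl, n_ctl, correction):
    """Relative risk after adding `correction` to every cell of the 2 by 2 table.

    Adding the value to both the event and non-event cells of an arm
    increases that arm's total by 2 * correction.
    """
    risk_trt = (events_trt + correction) / (n_trt + 2.0 * correction)
    risk_ctl = (events_ctl + correction) / (n_ctl + 2.0 * correction)
    return risk_trt / risk_ctl

# Hypothetical study: no events among 50 treated, 3 events among 50 controls.
rr_half = corrected_relative_risk(0, 50, 3, 50, correction=0.5)
rr_small = corrected_relative_risk(0, 50, 3, 50, correction=0.01)
```

The two corrections give relative risk estimates that differ by more than an order of magnitude for the same data, which is the kind of inconsistency that motivates models using the exact binomial likelihood instead.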
20.  Missing Data in Clinical Studies: Issues and Methods 
Journal of Clinical Oncology  2012;30(26):3297-3303.
Missing data are a prevailing problem in any type of data analyses. A participant variable is considered missing if the value of the variable (outcome or covariate) for the participant is not observed. In this article, various issues in analyzing studies with missing data are discussed. Particularly, we focus on missing response and/or covariate data for studies with discrete, continuous, or time-to-event end points in which generalized linear models, models for longitudinal data such as generalized linear mixed effects models, or Cox regression models are used. We discuss various classifications of missing data that may arise in a study and demonstrate in several situations that the commonly used method of throwing out all participants with any missing data may lead to incorrect results and conclusions. The methods described are applied to data from an Eastern Cooperative Oncology Group phase II clinical trial of liver cancer and a phase III clinical trial of advanced non–small-cell lung cancer. Although the main area of application discussed here is cancer, the issues and methods we discuss apply to any type of study.
PMCID: PMC3948388  PMID: 22649133
21.  The Effect of HAART on HIV RNA Trajectory Among Treatment Naïve Men and Women: a Segmental Bernoulli/Lognormal Random Effects Model with Left Censoring 
Epidemiology (Cambridge, Mass.)  2010;21(Suppl 4):S25-S34.
Highly active antiretroviral therapy (HAART) rapidly suppresses human immunodeficiency virus (HIV) viral replication and reduces circulating viral load, but the long-term effects of HAART on viral load remain unclear.
We evaluated HIV viral load trajectories over 8 years following HAART initiation in the Multicenter AIDS Cohort Study and the Women’s Interagency HIV Study. The study included 157 HIV-infected men and 199 HIV-infected women who were antiretroviral naïve and contributed 1311 and 1837 semiannual person-visits post-HAART, respectively. To account for within-subject correlation and the high proportion of left-censored viral loads, we used a segmental Bernoulli/lognormal random effects model.
Approximately 3 months (0.30 years for men and 0.22 years for women) after HAART initiation, HIV viral loads were optimally suppressed (ie, with very low HIV RNA) for 44% (95% confidence interval = 39%–49%) of men and 43% (38%–47%) of women, whereas the other 56% of men and 57% of women had on average 2.1 (1.5–2.6) and 3.0 (2.7–3.2) log10 copies/mL, respectively.
After 8 years on HAART, 75% of men and 80% of women had optimal suppression, whereas the rest of the men and women had suboptimal suppression with a median HIV RNA of 3.1 and 3.7 log10 copies/mL, respectively.
PMCID: PMC3736572  PMID: 20386106
22.  A prognostic signature of defective p53-dependent G1 checkpoint function in melanoma cell lines 
Pigment cell & melanoma research  2012;25(4):514-526.
Melanoma cell lines and normal human melanocytes were assayed for p53-dependent G1 checkpoint response to ionizing radiation-induced DNA damage. Sixty-six percent of melanoma cell lines displayed a defective G1 checkpoint. Checkpoint function was correlated with sensitivity to ionizing radiation, with checkpoint-defective lines being radio-resistant. Microarray analysis identified 316 probes whose expression was correlated with G1 checkpoint function in melanoma lines (P≤0.007), including p53 transactivation targets CDKN1A, DDB2 and RRM2B. The 316-probe list predicted G1 checkpoint function of the melanoma lines with 86% accuracy using a binary analysis and 91% accuracy using a continuous analysis. When applied to microarray data from primary melanomas, the 316-probe list was prognostic of four-year distant metastasis-free survival. Thus, p53 function, radio-sensitivity and metastatic spread may be estimated in melanomas from a signature of gene expression.
PMCID: PMC3397470  PMID: 22540896
gene; expression; signature; p53; function; checkpoint; melanoma
23.  Bayesian Analysis on Meta-analysis of Case-control Studies Accounting for Within-study Correlation 
Statistical methods in medical research  2011;10.1177/0962280211430889.
In retrospective studies, the odds ratio is often used as the measure of association. Under the assumption of independent beta priors, the exact posterior distribution of the odds ratio given a single 2 × 2 table has been derived in the literature. However, independence between risks within the same study may be an oversimplified assumption, because cases and controls in the same study are likely to share some common factors and thus to be correlated. Furthermore, in a meta-analysis of case-control studies, investigators usually have multiple 2 × 2 tables. In this paper, we first extend the published results on a single 2 × 2 table to allow within-study prior correlation while retaining the advantage of a closed-form posterior formula, and then extend the results to multiple 2 × 2 tables and to the regression setting. The hyperparameters, including the within-study correlation, are estimated via an empirical Bayes approach. The overall odds ratio and the exact posterior distribution of the study-specific odds ratio are inferred based on the estimated hyperparameters. We conduct simulation studies to verify our exact posterior distribution formulas and investigate the finite sample properties of the inference for the overall odds ratio. The results are illustrated through a twin study for genetic heritability and a meta-analysis for the association between the N-acetyltransferase 2 (NAT2) acetylation status and colorectal cancer.
PMCID: PMC3683108  PMID: 22143403
Bivariate beta-binomial model; Exact method; Hypergeometric function; Meta-analysis; Odds ratio; Sarmanov family
24.  Bayesian Posterior Distributions Without Markov Chains 
American Journal of Epidemiology  2012;175(5):368-375.
Bayesian posterior parameter distributions are often simulated using Markov chain Monte Carlo (MCMC) methods. However, MCMC methods are not always necessary and do not help the uninitiated understand Bayesian inference. As a bridge to understanding Bayesian inference, the authors illustrate a transparent rejection sampling method. In example 1, they illustrate rejection sampling using 36 cases and 198 controls from a case-control study (1976–1983) assessing the relation between residential exposure to magnetic fields and the development of childhood cancer. Results from rejection sampling (odds ratio (OR) = 1.69, 95% posterior interval (PI): 0.57, 5.00) were similar to MCMC results (OR = 1.69, 95% PI: 0.58, 4.95) and approximations from data-augmentation priors (OR = 1.74, 95% PI: 0.60, 5.06). In example 2, the authors apply rejection sampling to a cohort study of 315 human immunodeficiency virus seroconverters (1984–1998) to assess the relation between viral load after infection and 5-year incidence of acquired immunodeficiency syndrome, adjusting for (continuous) age at seroconversion and race. In this more complex example, rejection sampling required a notably longer run time than MCMC sampling but remained feasible and again yielded similar results. The transparency of the proposed approach comes at a price of being less broadly applicable than MCMC.
PMCID: PMC3282880  PMID: 22306565
Bayes theorem; epidemiologic methods; inference; Monte Carlo method; posterior distribution; simulation
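As a rough illustration of the rejection sampling idea described in the abstract above, the following Python sketch draws candidate parameter values from the prior and keeps each draw with probability equal to its likelihood divided by the maximum likelihood. It is not the authors' implementation. The 2 × 2 counts (12 of 40 cases exposed, 20 of 100 controls exposed), the uniform priors on the two exposure probabilities, and the function names are all assumptions made for this example; the abstract does not report the exposure counts or the priors used.

```python
import math
import random


def log_kernel(k, n, p):
    """Binomial log-likelihood up to a constant (the binomial coefficient cancels)."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def rejection_sample_or(a, n1, b, n0, n_draws=200_000, seed=1):
    """Posterior draws of the odds ratio under uniform priors (an assumption).

    a of n1 cases and b of n0 controls are exposed; requires 0 < a < n1
    and 0 < b < n0 so that the maximum of the likelihood is finite.
    """
    rng = random.Random(seed)
    # The likelihood is maximized at the observed proportions.
    log_max = log_kernel(a, n1, a / n1) + log_kernel(b, n0, b / n0)
    accepted = []
    for _ in range(n_draws):
        # Step 1: draw candidate exposure probabilities from the prior.
        p1, p0 = rng.random(), rng.random()
        if p1 == 0.0 or p0 == 0.0:
            continue  # avoid log(0); such draws have zero likelihood anyway
        # Step 2: accept with probability likelihood / maximum likelihood.
        log_ratio = log_kernel(a, n1, p1) + log_kernel(b, n0, p0) - log_max
        if rng.random() < math.exp(log_ratio):
            accepted.append((p1 / (1.0 - p1)) / (p0 / (1.0 - p0)))
    return accepted


def quantile(values, q):
    """Simple empirical quantile based on the sorted draws."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


if __name__ == "__main__":
    draws = rejection_sample_or(12, 40, 20, 100)  # hypothetical counts
    print(len(draws), "accepted draws")
    print("median OR:", round(quantile(draws, 0.5), 2))
    print("95% PI:", round(quantile(draws, 0.025), 2), round(quantile(draws, 0.975), 2))
```

With these hypothetical counts only about 2% of candidates are accepted, which hints at why run time grows quickly as the data become more informative or the model gains parameters.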
25.  Performance of rapid influenza H1N1 diagnostic tests: a meta-analysis 
Following the outbreaks of 2009 pandemic H1N1 infection, rapid influenza diagnostic tests have been used to detect H1N1 infection. However, no meta-analysis had been undertaken to assess their diagnostic accuracy at the time this manuscript was drafted.
The literature was systematically searched to identify studies that reported the performance of rapid tests. Random effects meta-analyses were conducted to summarize the overall performance.
Seventeen studies were selected with 1879 cases and 3477 non-cases. The overall sensitivity and specificity estimates of the rapid tests were 0.51 (95% CI: 0.41, 0.60) and 0.98 (95% CI: 0.94, 0.99). Studies reported heterogeneous sensitivity estimates, ranging from 0.11 to 0.88. If the prevalence was 30%, the overall positive and negative predictive values were 0.94 (95% CI: 0.85, 0.98) and 0.82 (95% CI: 0.79, 0.85). The overall specificities from different manufacturers were comparable, while there were some differences for the overall sensitivity estimates. BinaxNOW had a lower overall sensitivity of 0.39 (95% CI: 0.24, 0.57) compared to all the others (p-value < 0.001), whereas QuickVue had a higher overall sensitivity of 0.57 (95% CI: 0.50, 0.63) compared to all the others (p-value = 0.005).
Rapid tests have high specificity but low sensitivity and thus limited usefulness.
PMCID: PMC3288365  PMID: 21883964
meta-analysis; H1N1; diagnostic tests; rapid tests; sensitivity and specificity
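The predictive values quoted in the abstract above depend on the assumed prevalence through Bayes' theorem. The short Python sketch below shows that relationship using the rounded point estimates from the abstract (sensitivity 0.51, specificity 0.98, prevalence 30%). The function name is an assumption for illustration, and this plug-in calculation is not the authors' method: it gives roughly 0.92 and 0.82, so it need not reproduce the model-based estimates in the abstract exactly.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Rounded point estimates taken from the abstract; prevalence is the
    # 30% scenario the abstract considers.
    ppv, npv = predictive_values(0.51, 0.98, 0.30)
    print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Re-running the function at a lower prevalence shows the positive predictive value falling and the negative predictive value rising, which is why the abstract states the prevalence alongside these figures.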

