# Related Articles

We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log-transformed numerator and denominator, and then calculate the ratio of interest; we call this ‘passive’ imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this ‘active’ imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high-density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using WinBUGS were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log-transformation is preferable. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
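As a rough illustration of the distinction, the two strategies can be sketched with scikit-learn's IterativeImputer (a chained-equations-style imputer); the data and column layout here are invented for illustration, not taken from the study:

```python
# Sketch contrasting 'passive' and 'active' imputation of a ratio,
# assuming hypothetical variables num (numerator) and den (denominator).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n = 200
num = rng.lognormal(1.0, 0.3, n)
den = rng.lognormal(0.5, 0.3, n)

# Introduce missingness in the numerator only.
num_obs = num.copy()
num_obs[rng.random(n) < 0.3] = np.nan

# Passive imputation on the log scale: impute log(num), then form the ratio.
X = np.column_stack([np.log(num_obs), np.log(den)])
X_imp = IterativeImputer(random_state=0).fit_transform(X)
ratio_passive = np.exp(X_imp[:, 0] - X_imp[:, 1])

# Active imputation: treat the ratio itself as the incomplete variable,
# with the observed denominator as a predictor.
ratio_obs = num_obs / den  # missing wherever num is missing
Xa = np.column_stack([ratio_obs, den])
ratio_active = IterativeImputer(random_state=0).fit_transform(Xa)[:, 0]

print(np.isnan(ratio_passive).sum(), np.isnan(ratio_active).sum())  # 0 0
```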

doi:10.1002/sim.5935

PMCID: PMC3920636
PMID: 23922236

missing data; multiple imputation; ratios; compatibility

Background

Multiple imputation (MI) provides an effective approach to handle missing covariate data within prognostic modelling studies, as it can properly account for the missing data uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling techniques to obtain the estimates of interest. The estimates from each imputed dataset are then combined into one overall estimate and variance, incorporating both the within and between imputation variability. Rubin's rules for combining these multiply imputed estimates are based on asymptotic theory. The resulting combined estimates may be more accurate if the posterior distribution of the population parameter of interest is better approximated by the normal distribution. However, the normality assumption may not be appropriate for all the parameters of interest when analysing prognostic modelling studies, such as predicted survival probabilities and model performance measures.
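Rubin's rules themselves are simple to implement. A minimal sketch, assuming m point estimates q and their within-imputation variances u from the m imputed datasets (the numbers below are invented):

```python
# Minimal implementation of Rubin's rules for pooling MI estimates.
import numpy as np

def rubin_pool(q, u):
    q, u = np.asarray(q, float), np.asarray(u, float)
    m = len(q)
    qbar = q.mean()                       # pooled point estimate
    ubar = u.mean()                       # within-imputation variance
    b = q.var(ddof=1)                     # between-imputation variance
    t = ubar + (1 + 1/m) * b              # total variance
    # Rubin's (1987) degrees of freedom for interval estimation
    df = (m - 1) * (1 + ubar / ((1 + 1/m) * b)) ** 2
    return qbar, t, df

est, var, df = rubin_pool([1.1, 0.9, 1.0], [0.04, 0.05, 0.045])
print(round(est, 3), round(var, 4))  # 1.0 0.0583
```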

Methods

Guidelines for combining the estimates of interest when analysing prognostic modelling studies are provided. A literature review is performed to identify current practice for combining such estimates in prognostic modelling studies.

Results

Methods for combining estimates after MI were poorly reported in the current literature. When a method was stated, the standard approach was to apply Rubin's rules without any transformation.

Conclusion

The proposed simple guidelines for combining estimates after MI may lead to a wider and more appropriate use of MI in future prognostic modelling studies.

doi:10.1186/1471-2288-9-57

PMCID: PMC2727536
PMID: 19638200

Background

Intermediate outcome variables can often be used as auxiliary variables for the true outcome of interest in randomized clinical trials. For many cancers, time to recurrence is an informative marker in predicting a patient’s overall survival outcome, and could provide auxiliary information for the analysis of survival times.

Purpose

To investigate whether models linking recurrence and death combined with a multiple imputation procedure for censored observations can result in efficiency gains in the estimation of treatment effects, and be used to shorten trial lengths.

Methods

Recurrence and death times are modeled using data from 12 trials in colorectal cancer. Multiple imputation is used as a strategy for handling missing values arising from censoring. The imputation procedure uses a cure model for time to recurrence and a time-dependent Weibull proportional hazards model for time to death. Recurrence times are imputed, and then death times are imputed conditionally on recurrence times. To illustrate these methods, trials are artificially censored 2 years after the last accrual, the imputation procedure is implemented, and a log-rank test and Cox model are used to analyze and compare these new data with the original data.
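The conditional draw at the heart of such an imputation step can be sketched for a plain (non-cure) Weibull model: given a fitted shape and scale, a censored time c is replaced by a draw from the Weibull distribution truncated to (c, ∞) via the inverse-CDF method. The parameter values below are invented for illustration:

```python
# Impute a censored survival time from a fitted Weibull, drawing from
# the conditional distribution given survival past the censoring time c.
import numpy as np

def impute_weibull_censored(c, shape, scale, rng):
    """Draw T from Weibull(shape, scale) conditional on T > c."""
    # Conditional survival: S(t)/S(c); invert u*S(c) through the Weibull CDF.
    u = rng.random()
    s_c = np.exp(-(c / scale) ** shape)
    return scale * (-np.log(u * s_c)) ** (1 / shape)

rng = np.random.default_rng(0)
draws = [impute_weibull_censored(2.0, shape=1.5, scale=3.0, rng=rng)
         for _ in range(1000)]
print(min(draws) > 2.0)  # True: imputed times exceed the censoring time
```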

Results

The results show modest but consistent gains in efficiency in the analysis from using the auxiliary information in recurrence times. Comparison of analyses shows that the treatment effect estimates and log-rank test results from the 2-year censored imputed data fall between the estimates from the original data and those from the artificially censored data, indicating that the procedure was able to recover some of the information lost to censoring.

Limitations

The models used are all fully parametric, requiring distributional assumptions of the data.

Conclusions

The proposed models may be useful for improving the efficiency of treatment effect estimation in cancer trials and for shortening trial length.

doi:10.1177/1740774511414741

PMCID: PMC3197975
PMID: 21921063

Auxiliary Variables; Colon Cancer; Cure Models; Multiple Imputation; Surrogate Endpoints

We present a framework for generating multiple imputations for continuous data when the missing data mechanism is unknown. Imputations are generated from more than one imputation model in order to incorporate uncertainty regarding the missing data mechanism. Parameter estimates based on the different imputation models are combined using rules for nested multiple imputation. Through the use of simulation, we investigate the impact of missing data mechanism uncertainty on post-imputation inferences and show that incorporating this uncertainty can increase the coverage of parameter estimates. We apply our method to a longitudinal clinical trial of low-income women with depression where nonignorably missing data were a concern. We show that different assumptions regarding the missing data mechanism can have a substantial impact on inferences. Our method provides a simple approach for formalizing subjective notions regarding nonresponse so that they can be easily stated, communicated, and compared.

doi:10.1214/12-AOAS555

PMCID: PMC3596844
PMID: 23503984

nonignorable; NMAR; MNAR; not missing at random; missing not at random

When the outcome of interest is a quantity whose value may be altered through the use of medications, estimation of associations with this outcome is a challenging statistical problem. For participants taking medication the treated value is observed, but the underlying “untreated” value may be the measure that is truly of interest. Problematically, those with the highest untreated values may have some of the lowest observed measurements due to the effectiveness of medications. In this paper we propose an approach in which we parametrically estimate the underlying untreated variable of interest as a function of the observed treated value, dose and type of medication. Multiple imputation is used to incorporate the variability induced by the estimation. We show that this approach yields more realistic parameter estimates than other more traditional approaches to the problem, and that study conclusions may be altered in a meaningful way by using the imputed values.

doi:10.1002/sim.3341

PMCID: PMC2829271
PMID: 18613245

SYNOPSIS

Objective.

The purpose of this study was to assess an alternative statistical approach—multiple imputation—to risk factor redistribution in the national human immunodeficiency virus (HIV)/acquired immunodeficiency syndrome (AIDS) surveillance system as a way to adjust for missing risk factor information.

Methods.

We used an approximate model incorporating random variation to impute values for missing risk factors for HIV and AIDS cases diagnosed from 2000 to 2004. The process was repeated M times to generate M datasets. We combined results from the datasets to compute an overall multiple imputation estimate and standard error (SE), and then compared results from multiple imputation and from risk factor redistribution. Variables in the imputation models were age at diagnosis, race/ethnicity, type of facility where diagnosis was made, region of residence, national origin, CD4 T-lymphocyte cell count within six months of diagnosis, and reporting year.

Results.

In HIV data, male-to-male sexual contact accounted for 67.3% of cases by risk factor redistribution and 70.4% (SE=0.45) by multiple imputation. Also among males, injection drug use (IDU) accounted for 11.6% and 10.8% (SE=0.34), and high-risk heterosexual contact for 15.1% and 13.0% (SE=0.34) by risk factor redistribution and multiple imputation, respectively. Among females, IDU accounted for 18.2% and 17.9% (SE=0.61), and high-risk heterosexual contact for 80.8% and 80.9% (SE=0.63) by risk factor redistribution and multiple imputation, respectively.

Conclusions.

Because multiple imputation produces less biased subgroup estimates and offers objectivity and a semiautomated approach, we suggest consideration of its use in adjusting for missing risk factor information.

PMCID: PMC2496935
PMID: 18828417

Due to the growing need to combine data across multiple studies and to impute untyped markers based on a reference sample, several analytical tools for imputation and analysis of missing genotypes have been developed. Current imputation methods rely on single imputation, which ignores the variation in estimation due to imputation. An alternative to single imputation is multiple imputation. In this paper, we assess the variation in imputation by completing both single and multiple imputations of genotypic data using MACH, a commonly used hidden Markov model imputation method. Using data from the North American Rheumatoid Arthritis Consortium genome-wide study, the use of single and multiple imputation was assessed in four regions of chromosome 1 with varying levels of linkage disequilibrium and association signals. Two scenarios for missing genotypic data were assessed: imputation of untyped markers and combination of genotypic data from two studies. This limited study involving four regions indicates that, contrary to expectations, multiple imputation may not be necessary.

PMCID: PMC2795971
PMID: 20018064

Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete cases analysis in regression analysis when covariates have missing values. Imputation may be performed using a regression model for the incomplete covariates on other covariates and, importantly, on the outcome. With a survival outcome, it is a common practice to use the event indicator D and the log of the observed event or censoring time T in the imputation model, but the rationale is not clear.

We assume that the survival outcome follows a proportional hazards model given covariates X and Z. We show that a suitable model for imputing binary or Normal X is a logistic or linear regression on the event indicator D, the cumulative baseline hazard H0(T), and the other covariates Z. This result is exact in the case of a single binary covariate; in other cases, it is approximately valid for small covariate effects and/or small cumulative incidence. If we do not know H0(T), we approximate it by the Nelson–Aalen estimator of H(T) or estimate it by Cox regression.

We compare the methods using simulation studies. We find that using log T biases covariate-outcome associations towards the null, while the new methods have lower bias. Overall, we recommend including the event indicator and the Nelson–Aalen estimator of H(T) in the imputation model. Copyright © 2009 John Wiley & Sons, Ltd.
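The recommended imputation model can be sketched with a from-scratch Nelson–Aalen estimator and simulated data standing in for a real study (all variable names and parameter values are invented):

```python
# Sketch of the recommended imputation model for a binary covariate X:
# logistic regression on the event indicator D, the Nelson-Aalen estimate
# of H(T), and other covariates Z. Data are simulated for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

def nelson_aalen(time, event):
    """Cumulative hazard H(t) evaluated at each subject's own time."""
    order = np.argsort(time)
    t, d = time[order], event[order]
    n = len(t)
    at_risk = n - np.arange(n)          # risk-set size just before each time
    increments = d / at_risk            # hazard jump at each event time
    H_sorted = np.cumsum(increments)
    H = np.empty(n)
    H[order] = H_sorted
    return H

rng = np.random.default_rng(1)
n = 500
x = rng.integers(0, 2, n)               # binary covariate to be imputed
z = rng.normal(size=n)
time = rng.exponential(1 / np.exp(0.5 * x + 0.3 * z))
cens = rng.exponential(2.0, n)
T = np.minimum(time, cens)
D = (time <= cens).astype(int)

H = nelson_aalen(T, D)
design = np.column_stack([D, H, z])     # imputation model covariates
model = LogisticRegression().fit(design, x)
print(design.shape)  # (500, 3)
```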

doi:10.1002/sim.3618

PMCID: PMC2998703
PMID: 19452569

missing data; missing covariates; multiple imputation; proportional hazards model

Background

Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question of interest, or whether potential gains may be offset by bias from a poorly fitting imputation model, particularly as the amount of missing data increases.

Methods

Simulated datasets (n = 1000) drawn from a synthetic population were used to explore information recovery from multiple imputation in estimating the coefficient of a binary exposure variable when various proportions of data (10-90%) were set missing at random in a highly-skewed continuous covariate or in the binary exposure. Imputation was performed using multivariate normal imputation (MVNI), with a simple or zero-skewness log transformation to manage non-normality. Bias, precision, mean-squared error and coverage for a set of regression parameter estimates were compared between multiple imputation and complete case analyses.
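One of the transformations mentioned, the zero-skewness log transform, searches for the shift k that makes log(x + k) have zero sample skewness. A sketch using SciPy root-finding, on invented right-skewed data:

```python
# Zero-skewness log transform: find k so that log(x + k) has
# (near-)zero sample skewness, suitable before normal-model imputation.
import numpy as np
from scipy.stats import skew
from scipy.optimize import brentq

def zero_skew_log(x):
    x = np.asarray(x, float)
    def skew_after(k):
        return skew(np.log(x + k))
    lo = 1e-6 - x.min()          # smallest shift keeping x + k > 0
    # Skewness is strongly negative near lo and positive for very large k,
    # so a root lies in between.
    k = brentq(skew_after, lo, 1e6)
    return np.log(x + k), k

rng = np.random.default_rng(0)
x = rng.lognormal(0.0, 1.0, 2000)      # right-skewed data
y, k = zero_skew_log(x)
print(abs(skew(y)) < 0.01)  # True
```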

Results

For missingness in the continuous covariate, multiple imputation produced less bias and greater precision for the effect of the binary exposure variable, compared with complete case analysis, with larger gains in precision with more missing data. However, even with only moderate missingness, large bias and substantial under-coverage were apparent in estimating the continuous covariate’s effect when skewness was not adequately addressed. For missingness in the binary covariate, all estimates had negligible bias but gains in precision from multiple imputation were minimal, particularly for the coefficient of the binary exposure.

Conclusions

Although multiple imputation can be useful if covariates required for confounding adjustment are missing, benefits are likely to be minimal when data are missing in the exposure variable of interest. Furthermore, when there are large amounts of missingness, multiple imputation can become unreliable and introduce bias not present in a complete case analysis if the imputation model is not appropriate. Epidemiologists dealing with missing data should keep in mind the potential limitations as well as the potential benefits of multiple imputation. Further work is needed to provide clearer guidelines on effective application of this method.

doi:10.1186/1742-7622-9-3

PMCID: PMC3544721
PMID: 22695083

Missing data; Multiple imputation; Fully conditional specification; Multivariate normal imputation; Non-normal data

Missing data often occur in cross-sectional surveys and in longitudinal and experimental studies. The purpose of this study was to compare the prediction of self-rated health (SRH), a robust predictor of morbidity and mortality among diverse populations, before and after imputation of the missing variable “yearly household income.” We reviewed data from 4,162 participants of Mexican origin recruited from July 1, 2002, through December 31, 2005, who were enrolled in a population-based cohort study. Missing yearly income data were imputed using three different single imputation methods and one multiple imputation under a Bayesian approach. Of the 4,162 participants, 3,121 were randomly assigned to a training set (to derive the yearly income imputation methods and develop the health-outcome prediction models) and 1,041 to a testing set (to compare the areas under the curve (AUC) of the receiver-operating characteristics of the resulting health-outcome prediction models). The discriminatory powers of the SRH prediction models were good (range, 69–72%), and, compared with the prediction model obtained with no imputation of missing yearly income, all other imputation methods improved the prediction of SRH (P<0.05 for all comparisons), with the AUC for the model after multiple imputation being the highest (AUC = 0.731). Furthermore, when yearly income was imputed using multiple imputation, the odds of SRH being good or better increased by 11% for each $5,000 increment in yearly income. This study showed that although imputation of missing data for a key predictor variable can improve a risk health-outcome prediction model, further work is needed to illuminate the risk factors associated with SRH.

doi:10.1007/s10903-010-9415-8

PMCID: PMC3205225
PMID: 21103931

Self-rated health; Missing income data; Data imputation techniques; Mean substitution; Multiple imputation; Minority health

Background

Environmental epidemiology, when focused on the life course of exposure to a specific pollutant, requires historical exposure estimates that are difficult to obtain for the full time period due to gaps in the historical record, especially in earlier years. We show that these gaps can be filled by applying multiple imputation methods to a formal risk equation that incorporates lifetime exposure. We also address challenges that arise, including choice of imputation method, potential bias in regression coefficients, and uncertainty in age-at-exposure sensitivities.

Methods

During time periods when parameters needed in the risk equation are missing for an individual, the parameters are filled by an imputation model using group level information or interpolation. A random component is added to match the variance found in the estimates for study subjects not needing imputation. The process is repeated to obtain multiple data sets, whose regressions against health data can be combined statistically to develop confidence limits using Rubin’s rules to account for the uncertainty introduced by the imputations. To test for possible recall bias between cases and controls, which can occur when historical residence location is obtained by interview, and which can lead to misclassification of imputed exposure by disease status, we introduce an “incompleteness index,” equal to the percentage of dose imputed (PDI) for a subject. “Effective doses” can be computed using different functional dependencies of relative risk on age of exposure, allowing intercomparison of different risk models. To illustrate our approach, we quantify lifetime exposure (dose) from traffic air pollution in an established case–control study on Long Island, New York, where considerable in-migration occurred over a period of many decades.
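The incompleteness index itself is a simple ratio. A sketch for a single hypothetical subject, with invented per-period dose contributions:

```python
# "Incompleteness index": percentage of dose imputed (PDI) for one subject,
# assuming hypothetical per-period dose contributions.
doses = [1.2, 0.8, 2.0, 0.5]          # exposure contribution per period
imputed = [False, True, True, False]  # which periods required imputation

pdi = 100 * sum(d for d, i in zip(doses, imputed) if i) / sum(doses)
print(round(pdi, 1))  # 62.2
```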

Results

The major result is the described approach to imputation. The illustrative example revealed potential recall bias, suggesting that regressions against health data should be done as a function of PDI to check for consistency of results. The 1% of study subjects who lived for long durations near heavily trafficked intersections had very high cumulative exposures. Thus, imputation methods must be designed to reproduce non-standard distributions.

Conclusions

Our approach meets a number of methodological challenges to extending historical exposure reconstruction over a lifetime and shows promise for environmental epidemiology. Application to assessment of breast cancer risks will be reported in a subsequent manuscript.

doi:10.1186/1476-069X-12-62

PMCID: PMC3751034
PMID: 23919666

Exposure; Air pollution; Traffic; Benzo(a)pyrene; PAH; Multiple imputation; Epidemiology; In-migration; Dose

Summary

Often a binary variable is generated by dichotomizing an underlying continuous variable measured at a specific time point according to a prespecified threshold value. When the underlying continuous measurements are from a longitudinal study, one can use a repeated measures model to impute missing data on responder status resulting from subject drop-out, and apply a logistic regression model to the observed or otherwise imputed responder status. Standard Bayesian multiple imputation techniques (Rubin, 1987, Multiple Imputation for Nonresponse in Surveys), which draw the parameters for the imputation model from the posterior distribution and construct the variance of parameter estimates for the analysis model as a combination of within- and between-imputation variances, are found to be conservative. The frequentist multiple imputation approach, which fixes the parameters of the imputation model at the maximum likelihood estimates and constructs the variance of parameter estimates for the analysis model using the results of Robins and Wang (2000, Biometrika 87, 113–124), is shown to be more efficient. We propose to apply the Kenward and Roger (1997, Biometrics 53, 983–997) degrees of freedom to account for the uncertainty associated with variance-covariance parameter estimates for the repeated measures model.

doi:10.1111/j.1541-0420.2010.01405.x

PMCID: PMC3245577
PMID: 20337628

Logistic regression; Missing data; Multiple imputation; Repeated measures

Missing data are common in medical and social science studies and often pose a serious challenge in data analysis. Multiple imputation methods are popular and natural tools for handling missing data, replacing each missing value with a set of plausible values that represent the uncertainty about the underlying values. We consider a case of missing at random (MAR) and investigate the estimation of the marginal mean of an outcome variable in the presence of missing values when a set of fully observed covariates is available. We propose a new nonparametric multiple imputation (MI) approach that uses two working models to achieve dimension reduction and define the imputing sets for the missing observations. Compared with existing nonparametric imputation procedures, our approach can better handle covariates of high dimension, and is doubly robust in the sense that the resulting estimator remains consistent if either of the working models is correctly specified. Compared with existing doubly robust methods, our nonparametric MI approach is more robust to the misspecification of both working models; it also avoids the use of inverse-weighting and hence is less sensitive to missing probabilities that are close to 1. We propose a sensitivity analysis for evaluating the validity of the working models, allowing investigators to choose the optimal weights so that the resulting estimator relies either completely or more heavily on the working model that is likely to be correctly specified and achieves improved efficiency. We investigate the asymptotic properties of the proposed estimator, and perform simulation studies to show that the proposed method compares favorably with some existing methods in finite samples. The proposed method is further illustrated using data from a colorectal adenoma study.

PMCID: PMC3280694
PMID: 22347786

Doubly robust; Missing at random; Multiple imputation; Nearest neighbor; Nonparametric imputation; Sensitivity analysis

Summary

Multiple imputation is a practically useful approach to handling incompletely observed data in statistical analysis. Parameter estimation and inference based on imputed full data have been made easy by Rubin's rule for result combination. However, creating proper imputation that accommodates flexible models for statistical analysis in practice can be very challenging. We propose an imputation framework that uses conditional semiparametric odds ratio models to impute the missing values. The proposed imputation framework is more flexible and robust than the imputation approach based on the normal model. It is a compatible framework, in contrast to the approach based on fully conditionally specified models. The proposed algorithms for multiple imputation through the Markov chain Monte Carlo sampling approach can be straightforwardly carried out. Simulation studies demonstrate that the proposed approach performs better than existing, commonly used imputation approaches. The proposed approach is applied to imputing missing values in bone fracture data.

doi:10.1111/j.1541-0420.2010.01538.x

PMCID: PMC3135790
PMID: 21210771

Acceptance-rejection sampling; Dirichlet process prior; Gibbs sampler; Hybrid MCMC; Molecular dynamics algorithm; Nonparametric Bayesian inference; Rejection control

Background

Multiple imputation (MI) is becoming increasingly popular as a strategy for handling missing data, but there is a scarcity of tools for checking the adequacy of imputation models. The Kolmogorov-Smirnov (KS) test has been identified as a potential diagnostic method for assessing whether the distribution of imputed data deviates substantially from that of the observed data. The aim of this study was to evaluate the performance of the KS test as an imputation diagnostic.

Methods

Using simulation, we examined whether the KS test could reliably identify departures from assumptions made in the imputation model. To do this we examined how the p-values from the KS test behaved when skewed and heavy-tailed data were imputed using a normal imputation model. We varied the amount of missing data, the missing data models and the amount of skewness, and evaluated the performance of the KS test in diagnosing issues with the imputation models under these different scenarios.

Results

The KS test was able to flag differences between the observations and imputed values; however, these differences did not always correspond to problems with MI inference for the regression parameter of interest. When there was a strong missing at random dependency, the KS p-values were very small, regardless of whether or not the MI estimates were biased; so that the KS test was not able to discriminate between imputed variables that required further investigation, and those that did not. The p-values were also sensitive to sample size and the proportion of missing data, adding to the challenge of interpreting the results from the KS test.
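The basic diagnostic being evaluated can be sketched in a few lines with SciPy's two-sample KS test; the deliberately misspecified normal imputations below are invented for illustration:

```python
# KS diagnostic idea: compare observed vs imputed values of a variable
# with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
observed = rng.lognormal(0.0, 1.0, 500)        # skewed observed data
# Normal-model imputations for the missing entries (misspecified on purpose)
imputed = rng.normal(observed.mean(), observed.std(), 200)

stat, p = ks_2samp(observed, imputed)
print(p < 0.05)   # a small p-value flags a distributional mismatch
```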

Conclusions

Given our study results, it is difficult to establish guidelines or recommendations for using the KS test as a diagnostic tool for MI. The investigation of other imputation diagnostics and their incorporation into statistical software are important areas for future research.

doi:10.1186/1471-2288-13-144

PMCID: PMC3840572
PMID: 24252653

Missing data; Multiple imputation; Model checking; Kolmogorov-Smirnov test; Diagnostics; Simulations

In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller ICCs lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random (MCAR), and cases in which data are missing at random (MAR) are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared.

doi:10.1002/bimj.201000140

PMCID: PMC3124925
PMID: 21259309

Cluster randomized; Missing Data; Multiple Imputation

Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within-study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse-variance weighted meta-analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between-study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse-variance weighted meta-analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta-analysis, rather than meta-analyzing each of the multiple imputations and then combining the meta-analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration.
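The recommended ordering (Rubin's rules within each study, then inverse-variance weighting across studies) can be sketched as follows, with invented per-study MI results:

```python
# Pool imputations with Rubin's rules within each study, then combine
# the study-level results by inverse-variance weighted meta-analysis.
import numpy as np

def rubin(q, u):
    q, u = np.asarray(q, float), np.asarray(u, float)
    m = len(q)
    qbar, ubar, b = q.mean(), u.mean(), q.var(ddof=1)
    return qbar, ubar + (1 + 1/m) * b   # pooled estimate, total variance

def ivw_meta(estimates, variances):
    w = 1 / np.asarray(variances, float)
    return np.sum(w * estimates) / w.sum(), 1 / w.sum()

# Hypothetical per-study MI results: M=3 imputations per study,
# each with a point estimate and its within-imputation variance.
studies = [
    ([0.50, 0.55, 0.52], [0.010, 0.011, 0.010]),
    ([0.40, 0.38, 0.45], [0.020, 0.019, 0.021]),
]
pooled = [rubin(q, u) for q, u in studies]
est, var = ivw_meta([p[0] for p in pooled], [p[1] for p in pooled])
print(round(est, 3))
```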

doi:10.1002/sim.5844

PMCID: PMC3963448
PMID: 23703895

missing data; multiple imputation; meta-analysis; individual participant data; Rubin's rules

Principled techniques for incomplete-data problems are increasingly part of mainstream statistical practice. Among the many techniques proposed so far, inference by multiple imputation (MI) has emerged as one of the most popular. While many strategies leading to inference by MI are available in cross-sectional settings, the same richness does not exist in multilevel applications. The limited methods available for multilevel applications rely on the multivariate adaptations of mixed-effects models. This approach preserves the mean structure across clusters and incorporates distinct variance components into the imputation process. In this paper, I add to these methods by considering a random covariance structure and develop computational algorithms. The attraction of this new imputation modeling strategy is that it correctly reflects the mean and variance structure of the joint distribution of the data and allows the covariances to differ across the clusters. Using Markov chain Monte Carlo techniques, a predictive distribution of missing data given observed data is simulated, leading to the creation of multiple imputations. To circumvent the large sample size requirement to support independent covariance estimates for the level-1 error term, I consider distributional impositions mimicking random-effects distributions assigned a priori. These techniques are illustrated in an example exploring relationships between victimization and individual and contextual level factors that raise the risk of violent crime.

doi:10.1177/1471082X1001100404

PMCID: PMC3263314
PMID: 22271079

Missing data; multiple imputation; linear mixed-effects models; complex sample surveys; mixed effects; random covariances

This is a continuation of and a development of a debate between John Keown and me. The issue discussed is whether, in Britain, an unpaid system of blood donation promotes and is justified by its promotion of altruism. Doubt is cast on the notions that public policies can, and, if they can, that they should, be aimed at the promotion and expression of altruism rather than of self-interest, especially that of a mercenary sort. Reflections upon President Kennedy's proposition, introduced into the debate by Keown, that we should ask not what our country can do for us but what we can do for our country is pivotal to this casting of doubt. A case is made for suggesting that advocacy along the lines which Keown presents of an exclusive reliance on a voluntary, unpaid system of blood donation encourages inappropriate attitudes towards the provision of health care. Perhaps, it is suggested, and the suggestion represents, on my part, a change of mind as a consequence of the debate, a dual system of blood provision might be preferable.

PMCID: PMC479309
PMID: 10635510

It remains challenging to incorporate comprehensive geriatric assessment (CGA) into prospective clinical trials, but efforts continue to test the efficacy of CGA-guided interventions and to determine how CGA data should be used to modify treatment.

doi:10.1634/theoncologist.2013-0303

PMCID: PMC3868429
PMID: 24336119

Neuroendocrine tumors exhibit diverse clinical outcomes, and several recent studies have demonstrated that tumor histology, which incorporates assessments of tumor differentiation, tumor grade, mitotic rate, and proliferative index, is associated with clinical outcomes and overall survival.

doi:10.1634/theoncologist.2013-0242

PMCID: PMC3825312
PMID: 24218002

SUMMARY

Biomedical research is plagued by problems of missing data, especially in clinical trials of medical and behavioral therapies adopting a longitudinal design. After a literature review on modeling incomplete longitudinal data based on full-likelihood functions, this paper proposes a set of imputation-based strategies for implementing selection, pattern-mixture, and shared-parameter models for handling intermittent missing values and dropouts that are potentially nonignorable according to various criteria. Within the framework of multiple partial imputation, intermittent missing values are first imputed several times; then, each partially imputed data set is analyzed to deal with dropouts, with or without further imputation. Depending on the choice of imputation model or measurement model, various strategies can be jointly applied to the same data set to study the effect of treatment or intervention from multifaceted perspectives. For illustration, the strategies were applied to a data set of continuous repeated measures from a smoking cessation clinical trial.

doi:10.1002/sim.3111

PMCID: PMC3032542
PMID: 18205247

multiple partial imputation; selection model; pattern-mixture model; Markov transition model; nonignorable dropout; intermittent missing values
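A minimal sketch of the "partial imputation" step described above, assuming a toy subjects-by-visits matrix in which interior NaNs mark intermittent missingness and trailing NaNs mark dropout (the imputation draw and all names are illustrative, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy longitudinal data: rows are subjects, columns are visits. Interior
# NaNs are intermittent missing values; trailing NaNs are dropout.
Y = np.array([
    [3.0, np.nan, 2.5, 2.0],
    [4.0, 3.5, np.nan, np.nan],   # dropout after visit 2
    [2.5, 2.0, np.nan, 1.5],
])

def last_observed(row):
    """Index of the last non-missing visit (-1 if none)."""
    obs = np.flatnonzero(~np.isnan(row))
    return obs[-1] if obs.size else -1

def partial_imputation(Y, rng):
    """Fill intermittent gaps with a normal draw around the visit mean,
    leaving dropout tails missing for a subsequent dropout model."""
    Yi = Y.copy()
    for i, row in enumerate(Y):
        for t in range(last_observed(row) + 1):
            if np.isnan(row[t]):
                Yi[i, t] = rng.normal(np.nanmean(Y[:, t]), np.nanstd(Y[:, t]))
    return Yi

# Several partial imputations; each would then be analyzed with a
# selection, pattern-mixture, or shared-parameter model for the dropouts.
partials = [partial_imputation(Y, rng) for _ in range(3)]
```

The key point carried over from the abstract is the split: intermittent values are multiply imputed first, while dropout tails are deliberately left missing for the second-stage model.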

Background

Missing data are a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty, which allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for prognostic variable selection.

Method

In our prospective cohort study, we merged data from three randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Across the outcome and prognostic variables, the proportion of missing data ranged from 0% to 48.1%. We used four methods to investigate the influence of sampling and imputation variation, respectively: MI only, bootstrap only, and two methods combining MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times the variable appeared in the model. The discriminative and calibrative abilities of the prognostic models developed by the four methods were assessed at different inclusion levels.

Results

We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined over the range from 0% (full model) to 90% variable selection, bootstrap-corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found.

Conclusion

We recommend accounting for both imputation and sampling variation when handling missing data. The new procedure of combining MI with bootstrapping for variable selection yields multivariable prognostic models with good performance and is therefore attractive to apply to data sets with missing values.

doi:10.1186/1471-2288-7-33

PMCID: PMC1945032
PMID: 17629912
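As a rough sketch of the combined MI-and-bootstrap inclusion-frequency idea, assuming toy data, a crude stochastic imputation in place of proper MI, and a correlation threshold in place of a real model-selection step (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y depends on the first of three covariates; ~20% of X is
# missing. (Illustrative only; the study used merged RCT data and
# multivariable models.)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(size=n)
X[rng.random((n, p)) < 0.2] = np.nan

def stochastic_impute(X, rng):
    """One stochastic imputation: draw each NaN from N(column mean,
    column sd). A crude stand-in for proper MI such as chained equations."""
    Xi = X.copy()
    for j in range(Xi.shape[1]):
        m = np.isnan(Xi[:, j])
        Xi[m, j] = rng.normal(np.nanmean(X[:, j]), np.nanstd(X[:, j]), m.sum())
    return Xi

def inclusion_frequency(X, y, n_boot=40, n_imp=5, thresh=0.2):
    """Proportion of bootstrap-by-imputation fits in which each covariate
    is 'selected' (here: |correlation with outcome| above a threshold)."""
    counts, total = np.zeros(X.shape[1]), 0
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))       # bootstrap resample
        Xb, yb = X[idx], y[idx]
        for _ in range(n_imp):
            Xi = stochastic_impute(Xb, rng)
            r = np.abs([np.corrcoef(Xi[:, j], yb)[0, 1]
                        for j in range(Xi.shape[1])])
            counts += r > thresh
            total += 1
    return counts / total

freq = inclusion_frequency(X, y)
```

Nesting the imputation loop inside the bootstrap loop is what lets the inclusion frequency reflect both sampling and imputation variation, the point the conclusion above emphasizes.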

The temporal context model posits that search through episodic memory is driven by associations between the multiattribute representations of items and context. Context, in turn, is a recency-weighted sum of previous experiences or memories. Because recently processed items are most similar to the current representation of context, M. Usher, E. J. Davelaar, H. J. Haarmann, and Y. Goshen-Gottstein (2008) have suggested that the temporal context model (TCM-A) embodies a distinction between short-term and long-term memory and that this distinction is central to TCM-A's success in accounting for the pattern of recency and contiguity observed across short and long timescales. The authors dispute Usher et al.'s claim that context in TCM-A has much in common with classic interpretations of short-term memory. The idea that multiple representations interact in the process of memory encoding and retrieval (across timescales), as proposed in TCM-A, is very different from the classic dual-process view that distinct rules govern retrieval of recent and remote memories.

doi:10.1037/a0013724

PMCID: PMC2957902
PMID: 20976034

episodic memory; context; free recall; short-term memory; working memory

The authors attempted to catalog the use of procedures to impute missing data in the epidemiologic literature and to determine the degree to which imputed results differed in practice from unimputed results. The full text of articles published in 2005 and 2006 in four leading epidemiologic journals was searched for the text string 'imput'. Sixteen articles utilizing multiple imputation, inverse probability weighting, or the expectation-maximization algorithm to impute missing data were found. The small number of relevant manuscripts and the diversity of detail provided precluded systematic analysis of the use of imputation procedures. To form a bridge between current and future practice, the authors suggest details that should be included in articles that utilize these procedures.

doi:10.1093/aje/kwn071

PMCID: PMC2561989
PMID: 18591202

expectation; imputation; missing data; probability weighting