It has been suggested that the mammalian genome is composed mainly of long compositionally homogeneous domains. Such domains are frequently identified using recursive segmentation algorithms based on the Jensen–Shannon divergence. However, a common difficulty with such methods is deciding when to halt the recursive partitioning and what criteria to use in deciding whether a detected boundary between two segments is real or not. We demonstrate that commonly used halting criteria are intrinsically biased, and propose IsoPlotter, a parameter-free segmentation algorithm that overcomes such biases by using a simple dynamic halting criterion and tests the homogeneity of the inferred domains. IsoPlotter was compared with an alternative segmentation algorithm, DJS, using two sets of simulated genomic sequences. Our results show that IsoPlotter was able to infer both long and short compositionally homogeneous domains with low GC content dispersion, whereas DJS failed to identify short compositionally homogeneous domains and sequences with low compositional dispersion. By segmenting the human genome with IsoPlotter, we found that one-third of the genome is composed of compositionally nonhomogeneous domains and the remainder is a mixture of many short compositionally homogeneous domains and relatively few long ones.
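As a hedged illustration of the splitting step that such segmentation algorithms share (a sketch of the generic Jensen–Shannon criterion, not of IsoPlotter or DJS themselves; function names are ours), the following scores every cut point of a sequence by the divergence between the two resulting segments:

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (bits) of a symbol-count dictionary."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

def best_split(seq):
    """Return (position, JSD) of the cut maximizing the Jensen-Shannon
    divergence between the two resulting segments of seq."""
    n = len(seq)
    symbols = set(seq)
    best_pos, best_d = None, -1.0
    for i in range(1, n):
        left = {s: seq[:i].count(s) for s in symbols}
        right = {s: seq[i:].count(s) for s in symbols}
        whole = {s: seq.count(s) for s in symbols}
        # JSD = H(whole) - weighted average of the segment entropies
        d = (shannon_entropy(whole)
             - (i / n) * shannon_entropy(left)
             - ((n - i) / n) * shannon_entropy(right))
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d
```

A recursive segmenter would apply `best_split` to each resulting segment until a halting criterion (the biased ones the paper critiques, or IsoPlotter's dynamic one) is met.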
In propensity score modeling, it is a standard practice to optimize the prediction of exposure status based on the covariate information. In a simulation study, we examined in what situations analyses based on various types of exposure propensity score (EPS) models using data mining techniques such as recursive partitioning (RP) and neural networks (NN) produce unbiased and/or efficient results.
We simulated data for a hypothetical cohort study (n=2000) with a binary exposure/outcome and 10 binary/continuous covariates with seven scenarios differing by non-linear and/or non-additive associations between exposure and covariates. EPS models used logistic regression (LR) (all possible main effects), RP1 (without pruning), RP2 (with pruning), and NN. We calculated c-statistics (C), standard errors (SE), and bias of exposure-effect estimates from outcome models for the PS-matched dataset.
Data mining techniques yielded higher C than LR (mean: NN, 0.86; RP1, 0.79; RP2, 0.72; and LR, 0.76). SE tended to be greater in models with higher C. Overall bias was small for each strategy, although NN estimates tended to be the least biased. C was not correlated with the magnitude of bias (correlation coefficient [COR]=−0.3, p=0.1) but was positively correlated with SE (COR=0.7, p<0.001).
Effect estimates from EPS models by simple LR were generally robust. NN models generally provided the least numerically biased estimates. C was not associated with the magnitude of bias but was associated with increased SE.
propensity score; logistic regression; neural networks; recursive partitioning
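The propensity-score workflow the study evaluates (fit an exposure model on the covariates, then match exposed to unexposed subjects on the estimated score) can be sketched as follows. This is a minimal stdlib-only illustration with a hand-rolled logistic fit and greedy 1:1 caliper matching, not the simulation code of the study:

```python
import math, random

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Plain gradient-descent logistic regression; returns weights
    (intercept first). A stand-in for any statistics package's fit."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            e = 1.0 / (1.0 + math.exp(-z)) - yi   # residual
            grad[0] += e
            for j, xj in enumerate(xi):
                grad[j + 1] += e * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def propensity(w, xi):
    """Estimated probability of exposure given covariates xi."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))

def greedy_match(ps, exposed_idx, control_idx, caliper=0.05):
    """1:1 greedy nearest-neighbor matching on the propensity score;
    each control is used at most once."""
    pairs, free = [], set(control_idx)
    for i in sorted(exposed_idx, key=lambda k: ps[k]):
        best = min(free, key=lambda k: abs(ps[k] - ps[i]), default=None)
        if best is not None and abs(ps[best] - ps[i]) <= caliper:
            pairs.append((i, best))
            free.remove(best)
    return pairs
```

The exposure effect would then be estimated from an outcome model restricted to the matched pairs, as in the abstract.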
With an adaptive partition procedure, we can partition a “time course” into consecutive non-overlapping intervals such that the population means/proportions of the observations in two adjacent intervals are significantly different at a given significance level. However, the widely used recursive combination or partition procedures do not guarantee a global optimization. We propose a modified dynamic programming algorithm to achieve a global optimization. Our method can provide consistent estimation results. In a comprehensive simulation study, our method shows improved performance when compared to the recursive combination/partition procedures. In practice, the significance level can be determined based on a cross-validation procedure. As an application, we consider the well-known Pima Indian Diabetes data. We explore the relationship between diabetes risk and several important variables, including plasma glucose concentration, body mass index, and age.
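A minimal sketch of the kind of dynamic program alluded to above, i.e., an exact (globally optimal) partition of a series into k consecutive intervals. For concreteness it minimizes within-interval sum of squared deviations rather than the paper's test-based criterion; the names and cost function are our illustrative choices:

```python
def optimal_partition(x, k):
    """Partition x into k consecutive intervals minimizing total
    within-interval sum of squared deviations, by dynamic programming.
    Returns (cost, starts) where starts are interval start indices."""
    n = len(x)
    # prefix sums for O(1) segment cost queries
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i, j):  # cost of the segment x[i:j]
        m = j - i
        mean = (s[j] - s[i]) / m
        return (s2[j] - s2[i]) - m * mean * mean

    INF = float("inf")
    cost = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                c = cost[seg - 1][i] + sse(i, j)
                if c < cost[seg][j]:
                    cost[seg][j] = c
                    back[seg][j] = i
    # backtrack to recover the interval start points
    starts, j = [], n
    for seg in range(k, 0, -1):
        j = back[seg][j]
        starts.append(j)
    return cost[k][n], sorted(starts)
```

Unlike greedy recursive splitting, the DP table guarantees the global optimum over all k-interval partitions.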
Investigators in nursing research increasingly face the need to analyze data that involve variables of mixed types and are characterized by complex nonlinearity and interactions. Tree-based methods, also called recursive partitioning, are gaining popularity in various fields. In addition to efficiency and flexibility in handling multifaceted data, tree-based methods offer ease of interpretation.
To introduce tree-based methods, discuss their advantages and pitfalls in application, and describe their potential use in nursing research.
In this paper, (a) an introduction to tree-structured methods is presented, (b) the technique is illustrated via quality of life (QOL) data collected in the Breast Cancer Education Intervention (BCEI) study, and (c) implications for their potential use in nursing research are discussed.
As illustrated by the QOL analysis example, tree methods generate interesting and easily understood findings that cannot be uncovered via traditional linear regression analysis. The expanding breadth and complexity of nursing research may entail the use of new tools to improve efficiency and gain new insights. In certain situations, tree-based methods offer an attractive approach that helps address such needs.
breast cancer survivors; CART; quality of life; tree-based methods; random forests
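To make the tree-based idea concrete, here is a toy recursive-partitioning classifier in the CART style (binary splits chosen by Gini impurity, with simple depth and size stopping rules). It is an illustrative sketch under our own naming, not the software used in the BCEI analysis:

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def grow_tree(rows, labels, depth=0, max_depth=2, min_size=2):
    """Recursively partition (rows, labels); rows are feature tuples.
    Returns a nested dict for an internal node, or a 0/1 leaf."""
    n = len(rows)
    if depth >= max_depth or n < min_size or gini(labels) == 0.0:
        return int(2 * sum(labels) >= n)  # majority-class leaf
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [i for i in range(n) if rows[i][j] <= t]
            right = [i for i in range(n) if rows[i][j] > t]
            if not left or not right:
                continue
            # size-weighted impurity of the two children
            score = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right])) / n
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return int(2 * sum(labels) >= n)
    _, j, t, left, right = best
    return {"feature": j, "threshold": t,
            "left": grow_tree([rows[i] for i in left],
                              [labels[i] for i in left],
                              depth + 1, max_depth, min_size),
            "right": grow_tree([rows[i] for i in right],
                               [labels[i] for i in right],
                               depth + 1, max_depth, min_size)}

def predict(tree, row):
    """Follow splits until a leaf is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree
```

The resulting nested splits are exactly the kind of readily interpreted subgroup structure the abstract describes.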
In this paper, we demonstrate the efficiency of simulations via direct computation of the partition function under various macroscopic conditions, such as different temperatures or volumes. The method can compute partition functions by flattening histograms, through, for example, the Wang-Landau recursive scheme, outside the energy space. This method offers a more general and flexible framework for handling various types of ensembles, especially ones in which computation of the density of states is not convenient. It can be easily scaled to large systems, and it is flexible in incorporating Monte Carlo cluster algorithms or molecular dynamics. High efficiency is shown in simulating large Ising models, in finding ground states of simple protein models, and in studying the liquid-vapor phase transition of a simple fluid. The method is very simple to implement and we expect it to be efficient in studying complex systems with rugged energy landscapes, e.g., biological macromolecules.
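The flat-histogram idea can be illustrated on a deliberately tiny system. The sketch below runs the Wang-Landau recursion to estimate the density of states for the sum of two six-sided dice (our toy stand-in, not the Ising, protein, or fluid systems of the paper; all parameter values are illustrative):

```python
import math, random

def wang_landau_dice(n_dice=2, flat=0.8, ln_f_final=1e-4, seed=1):
    """Wang-Landau recursion estimating the log density of states
    ln g(E) for a toy system where E = sum of n_dice six-sided dice."""
    random.seed(seed)
    energies = list(range(n_dice, 6 * n_dice + 1))
    ln_g = {e: 0.0 for e in energies}
    hist = {e: 0 for e in energies}
    state = [random.randint(1, 6) for _ in range(n_dice)]
    e = sum(state)
    ln_f = 1.0                      # modulation factor, halved when flat
    while ln_f > ln_f_final:
        for _ in range(10000):
            i = random.randrange(n_dice)
            face = random.randint(1, 6)
            e_new = e - state[i] + face
            # accept with prob min(1, g(E)/g(E_new)) to flatten the histogram
            if random.random() < math.exp(min(0.0, ln_g[e] - ln_g[e_new])):
                state[i], e = face, e_new
            ln_g[e] += ln_f
            hist[e] += 1
        if min(hist.values()) >= flat * sum(hist.values()) / len(hist):
            ln_f /= 2.0             # refine the modulation factor
            hist = {k: 0 for k in hist}
    return ln_g
```

For two dice the exact density of states is 1, 2, …, 6, …, 2, 1 over sums 2–12, so the estimated ratio g(7)/g(2) should approach 6; the same recursion, applied outside the energy space, is what the paper generalizes.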
Our goal was to identify subgroups of sib pairs from the Framingham Heart Study data set that provided higher evidence of linkage to particular candidate regions for cardiovascular disease traits. The focus of this method is not to claim identification of significant linkage to a particular locus but to show that tree models can be used to identify subgroups for use in selected sib-pair sampling schemes.
We report results using a novel recursive partitioning procedure to identify subgroups of sib pairs with increased evidence of linkage to systolic blood pressure and other cardiovascular disease-related quantitative traits, using the Framingham Heart Study data set provided by the Genetic Analysis Workshop 13. This procedure uses a splitting rule based on Haseman-Elston regression that recursively partitions sib-pair data into homogeneous subgroups.
Using this procedure, we identified a subgroup definition for use as a selected sib-pair sampling scheme. Using the characteristics that define the subgroup with higher evidence for linkage, we have identified an area of focus for further study.
The specificity of nonnucleoside reverse transcriptase (RT) inhibitors (NNRTIs) for the RT of human immunodeficiency virus type 1 (HIV-1) has prevented the use of simian immunodeficiency virus (SIV) in the study of NNRTIs and NNRTI-based highly active antiretroviral therapy. However, a SIV-HIV-1 chimera (RT-SHIV), in which the RT from SIVmac239 was replaced with the RT-encoding region from HIV-1, is susceptible to NNRTIs and is infectious to rhesus macaques. We have evaluated the antiviral activity of efavirenz against RT-SHIV and the emergence of efavirenz-resistant mutants in vitro and in vivo. RT-SHIV was susceptible to efavirenz with a mean effective concentration of 5.9 ± 4.5 nM, and RT-SHIV variants selected with efavirenz in cell culture displayed 600-fold-reduced susceptibility. The efavirenz-resistant mutants of RT-SHIV had mutations in RT similar to those of HIV-1 variants that were selected under similar conditions. Efavirenz monotherapy of RT-SHIV-infected macaques produced a 1.82-log-unit decrease in plasma viral-RNA levels after 1 week. The virus load rebounded within 3 weeks in one treated animal and more slowly in a second animal. Virus isolated from these two animals contained the K103N and Y188C or Y188L mutations. The RT-SHIV-rhesus macaque model may prove useful for studies of antiretroviral drug combinations that include efavirenz.
In class prediction problems using microarray data, gene selection is essential to improve the prediction accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVM-RFE) has become one of the leading methods and is being widely used. The SVM-based approach performs gene selection using the weight vector of the hyperplane constructed by the samples on the margin. However, the performance can be easily affected by noise and outliers, when it is applied to noisy, small sample size microarray data.
In this paper, we propose a recursive gene selection method using the discriminant vector of the maximum margin criterion (MMC), which is a variant of classical linear discriminant analysis (LDA). To overcome the computational drawback of classical LDA and the problem of high dimensionality, we present efficient and stable algorithms for MMC-based RFE (MMC-RFE). The MMC-RFE algorithms naturally extend to multi-class cases. The performance of MMC-RFE was extensively compared with that of SVM-RFE using nine cancer microarray datasets, including four multi-class datasets.
Our extensive comparison has demonstrated that for binary-class datasets MMC-RFE tends to show intermediate performance between hard-margin SVM-RFE and SVM-RFE with a properly chosen soft-margin parameter. Notably, MMC-RFE achieves significantly better performance with a smaller number of genes than SVM-RFE for multi-class datasets. The results suggest that MMC-RFE is less sensitive to noise and outliers due to the use of average margin, and thus may be useful for biomarker discovery from noisy data.
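The recursive feature elimination loop common to SVM-RFE and MMC-RFE (train, rank features by absolute discriminant weight, drop the weakest, refit on the survivors) can be sketched as follows. As an assumption for self-containedness, a plain gradient-descent logistic fit stands in for the SVM or MMC training step:

```python
import math

def fit_linear(X, y, lr=0.5, epochs=300):
    """Gradient-descent logistic fit; returns the weight vector.
    A stand-in for the SVM/MMC discriminant of the real methods."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(epochs):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            e = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad[j] += e * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def rfe(X, y, n_keep):
    """Recursive feature elimination: repeatedly refit on the active
    features and drop the one with the smallest absolute weight."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        Xa = [[xi[j] for j in active] for xi in X]
        w = fit_linear(Xa, y)
        worst = min(range(len(active)), key=lambda j: abs(w[j]))
        active.pop(worst)
    return active
```

Swapping `fit_linear` for a margin-based or average-margin discriminant is what distinguishes SVM-RFE from MMC-RFE in the comparison above.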
Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization.
Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002–2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables.
A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%–42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay.
The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge.
logistic models; infant; premature; predictive value of tests
Pachycephalosaurs were bipedal herbivorous dinosaurs with bony domes on their heads, suggestive of head-butting as seen in bighorn sheep and musk oxen. Previous biomechanical studies indicate potential for pachycephalosaur head-butting, but bone histology appears to contradict the behavior in young and old individuals. Comparing pachycephalosaurs with fighting artiodactyls tests for common correlates of head-butting in their cranial structure and mechanics.
Computed tomographic (CT) scans and physical sectioning revealed internal cranial structure of ten artiodactyls and pachycephalosaurs Stegoceras validum and Prenocephale prenes. Finite element analyses (FEA), incorporating bone and keratin tissue types, determined cranial stress and strain from simulated head impacts. Recursive partition analysis quantified strengths of correlation between functional morphology and actual or hypothesized behavior. Strong head-strike correlates include a dome-like cephalic morphology, neurovascular canals exiting onto the cranium surface, large neck muscle attachments, and dense cortical bone above a sparse cancellous layer in line with the force of impact. The head-butting duiker Cephalophus leucogaster is the closest morphological analog to Stegoceras, with a smaller yet similarly rounded dome. Crania of the duiker, pachycephalosaurs, and bighorn sheep Ovis canadensis share stratification of thick cortical and cancellous layers. Stegoceras, Cephalophus, and musk ox crania experience lower stress and higher safety factors for a given impact force than giraffe, pronghorn, or the non-combative llama.
Anatomy, biomechanics, and statistical correlation suggest that some pachycephalosaurs were as competent at head-to-head impacts as extant analogs displaying such combat. Large-scale comparisons and recursive partitioning can greatly refine inference of behavioral capability for fossil animals.
Focusing on chromosome 1, a recursive partitioning linkage algorithm (RP) was applied to perform linkage analysis on the rheumatoid arthritis NARAC data, incorporating covariates such as HLA-DRB1 genotype, age at onset, severity, anti-cyclic citrullinated peptide (anti-CCP), and life time smoking. All 617 affected sib pairs from the ascertained families were used, and an RP linkage model was used to identify linkage possibly influenced by covariates. This algorithm includes a likelihood ratio (LR)-based splitting rule, a pruning algorithm to identify optimal tree size, and a bootstrap method for final tree selection.
The strength of the linkage signals was evaluated by empirical p-values, obtained by simulating marker data under null hypothesis of no linkage. Two suggestive linkage regions on chromosome 1 were detected by the RP linkage model, with identified associated covariates HLA-DRB1 genotype and age at onset. These results suggest possible gene × gene and gene × environment interactions at chromosome 1 loci and provide directions for further gene mapping.
The development of DNA microarrays enables researchers to screen thousands of genes simultaneously and helps determine high- and low-expression-level genes in normal and disease tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and arbitrarily choose a threshold for selecting genes. However, the parameter setting may not be compatible with the selected classification algorithms. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in a support vector machine. We compared its performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and capable of attaining good classification performance when the variations of informative and noninformative genes are different. In the analysis of two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.
The impact of minor drug-resistant variants of human immunodeficiency virus type 1 (HIV-1) on the failure of antiretroviral therapy remains unclear. We have evaluated the importance of detecting minor populations of viruses resistant to non-nucleoside reverse-transcriptase inhibitors (NNRTIs) during intermittent antiretroviral therapy, a high-risk context for the emergence of drug-resistant HIV-1. We carried out a longitudinal study on plasma samples taken from 21 patients given efavirenz and enrolled in the intermittent arm of the ANRS 106 trial. Allele-specific real-time PCR was used to detect and quantify minor K103N mutants during off-therapy periods. The concordance with ultra-deep pyrosequencing was assessed for 11 patients. The pharmacokinetics of efavirenz was assayed to determine whether its variability could influence the emergence of K103N mutants. Allele-specific real-time PCR detected K103N mutants in 15 of the 19 analyzable patients at the end of an off-therapy period, whereas direct sequencing detected mutants in only 6 patients. The frequency of K103N mutants was <0.1% in 7 patients by allele-specific real-time PCR without further selection, and >0.1% in 8. It was 0.1%–10% in 6 of these 8 patients; the mutated virus populations of 4 of these 6 patients underwent further selection, and treatment failed for 2 of them. The K103N mutant frequency was >10% in the remaining 2 patients, and treatment failed for one. The copy numbers of K103N variants quantified by allele-specific real-time PCR and ultra-deep pyrosequencing agreed closely (ρ = 0.89, P < 0.0001). The half-life of efavirenz was higher (50.5 hours) in the 8 patients in whom K103N emerged (>0.1%) than in the 11 patients in whom it did not (32 hours) (P = 0.04). Thus, ultrasensitive methods could prove more useful than direct sequencing for predicting treatment failure in some patients. However, the presence of minor NNRTI-resistant viruses need not always result in virological escape.
We compared several techniques for assigning Hispanic ethnicity to records in data systems where this information may be missing, variously making use of country of origin, surname, race, and county of residence. We considered an algorithm in use by the North American Association of Central Cancer Registries (NAACCR), a variation of this developed by the authors, a “fast and frugal” algorithm developed with the aid of recursive partitioning methods, and conventional logistic regression. With the exception of logistic regression, each approach was rule-based: if specific criteria were met, an ethnicity assignment was made; otherwise, the next criterion was considered, until all records were assigned. We evaluated the algorithms on a sample of over 500,000 female clients from the New York State Cancer Services Program for whom self-reported Hispanic ethnicity was known. We found that all approaches yielded similarly high accuracy, sensitivity, and positive predictive value in all parts of the state, from areas with very low to very high Hispanic populations. An advantage of the fast and frugal method is that it consists of a small number of easily remembered steps.
Although prior research has identified a number of separate risk factors for suicide among patients with depression, little is known about how these factors may interact to modify suicide risks. Using an empirically-based decision tree analysis for a large national sample of Veterans Affairs (VA) health system patients treated for depression, we identify subgroups with particularly high or low rates of suicide.
We identified 887,859 VA patients treated for depression between April 1, 1999 and September 30, 2004. Randomly splitting the data into two samples (primary and replication samples), we developed a decision tree for the primary sample using recursive partitioning. We then tested whether the groups developed within the primary sample were associated with increased suicide risk in the replication sample.
The exploratory data analysis produced a decision tree with subgroups of patients at differing levels of risk for suicide. These were identified by a combination of factors including a co-occurring substance use disorder diagnosis, male gender, African American race and psychiatric hospitalization in the past year. The groups developed as part of the decision tree accurately discriminated between those with and without suicide in the replication sample. The patients at highest risk for suicide were those with a substance use disorder, who were non-African American and had an inpatient psychiatric stay within the past 12 months.
Study findings suggest that the identification of depressed patients at increased risk for suicide is improved through the examination of higher order interactions between potential risk factors.
suicide; depression; substance use disorders; decision trees; data mining
There is controversy about which children with minor head injury need to undergo computed tomography (CT). We aimed to develop a highly sensitive clinical decision rule for the use of CT in children with minor head injury.
For this multicentre cohort study, we enrolled consecutive children with blunt head trauma presenting with a score of 13–15 on the Glasgow Coma Scale and loss of consciousness, amnesia, disorientation, persistent vomiting or irritability. For each child, staff in the emergency department completed a standardized assessment form before any CT. The main outcomes were need for neurologic intervention and presence of brain injury as determined by CT. We developed a decision rule by using recursive partitioning to combine variables that were both reliable and strongly associated with the outcome measures and thus to find the best combinations of predictor variables that were highly sensitive for detecting the outcome measures with maximal specificity.
Among the 3866 patients enrolled (mean age 9.2 years), 95 (2.5%) had a score of 13 on the Glasgow Coma Scale, 282 (7.3%) had a score of 14, and 3489 (90.2%) had a score of 15. CT revealed that 159 (4.1%) had a brain injury, and 24 (0.6%) underwent neurologic intervention. We derived a decision rule for CT of the head consisting of four high-risk factors (failure to reach a score of 15 on the Glasgow Coma Scale within two hours, suspicion of open skull fracture, worsening headache, and irritability) and three additional medium-risk factors (large, boggy hematoma of the scalp; signs of basal skull fracture; dangerous mechanism of injury). The high-risk factors were 100.0% sensitive (95% CI 86.2%–100.0%) for predicting the need for neurologic intervention and would require that 30.2% of patients undergo CT. The medium-risk factors resulted in 98.1% sensitivity (95% CI 94.6%–99.4%) for the prediction of brain injury by CT and would require that 52.0% of patients undergo CT.
The decision rule developed in this study identifies children at two levels of risk. Once the decision rule has been prospectively validated, it has the potential to standardize and improve the use of CT for children with minor head injury.
We propose recursively imputed survival tree (RIST) regression for right-censored data. This new nonparametric regression procedure uses a novel recursive imputation approach combined with extremely randomized trees that allows significantly better use of censored data than previous tree based methods, yielding improved model fit and reduced prediction error. The proposed method can also be viewed as a type of Monte Carlo EM algorithm which generates extra diversity in the tree-based fitting process. Simulation studies and data analyses demonstrate the superior performance of RIST compared to previous methods.
Trees; Ensemble; Random Forests; Censored data; Imputation; Survival Analysis
We propose an interaction tree (IT) procedure to optimize the subgroup analysis in comparative studies that involve censored survival times. The proposed method recursively partitions the data into two subsets that show the greatest interaction with the treatment, which results in a number of objectively defined subgroups: in some of them the treatment effect is prominent while in others the treatment may have a negligible or even negative effect. The resultant tree structure can be used to explore the overall interaction between treatment and other covariates and help identify and describe possible target populations on which an experimental treatment demonstrates desired efficacy. We follow the standard CART (Breiman et al., 1984) methodology to develop the interaction tree structure. Variable importance information is extracted via random forests of interaction trees. Both simulated experiments and an analysis of the primary biliary cirrhosis (PBC) data are provided for evaluation and illustration of the proposed procedure.
Brain metastases (BM) are the most common form of intracranial cancer. The incidence of BM seems to have increased over the past decade. Recursive partitioning analysis (RPA) of data from three Radiation Therapy Oncology Group (RTOG) trials (1200 patients) has allowed three prognostic groups to be identified. More recently, a simplified stratification system that uses the evaluation of three main prognostic factors for radiosurgery in BM was developed.
To analyze the overall survival rate (OS) and prognostic factors affecting outcomes, and to estimate the potential improvement in OS for patients with BM from breast cancer, stratified by RPA class and brain metastases score (BS-BM). From January 1996 to December 2004, the medical records of 174 patients diagnosed with BM from breast cancer who received whole-brain radiotherapy (WBRT) were analyzed. Surgery followed by WBRT was used in 15.5% of patients; the other 84.5% received WBRT alone. A total of 108 patients (62.1%) received a fractionation schedule of 30 Gy in 10 fractions. Solitary BM was present in 37.9% of patients. The prognostic factors evaluated for OS were age, Karnofsky Performance Status (KPS), number of lesions, localization of lesions, neurosurgery, chemotherapy, absence of extracranial disease, RPA class, BS-BM, and radiation dose and fractionation.
The OS at 1, 2, and 3 years was 33.4%, 16.7%, and 8.8%, respectively. The RPA class analysis showed a strong relation with OS (p < 0.0001). The median survival time by RPA class in months was: class I, 11.7; class II, 6.2; and class III, 3.0. The significant prognostic factors associated with better OS were higher KPS (p < 0.0001), neurosurgery (p < 0.0001), single metastasis (p = 0.003), BS-BM (p < 0.0001), controlled primary tumor (p = 0.002), and absence of extracranial metastases (p = 0.001). In multivariate analysis, the factors positively associated with OS were neurosurgery (p < 0.0001), absence of extracranial metastases (p < 0.0001), and RPA class I (p < 0.0001).
Our data suggest that patients with BM from breast cancer classified as RPA class I may be effectively treated with local resection followed by WBRT, mainly those patients with a single BM, higher KPS, and controlled extracranial disease. RPA class was shown to be the most reliable indicator of survival.
Long-term antiretroviral therapy (ART) for human immunodeficiency virus type 1 (HIV-1) infection shows limitations in pharmacokinetics and biodistribution while inducing metabolic and cytotoxic aberrations. In turn, ART commonly requires complex dosing schedules and leads to the emergence of viral resistance and treatment failures. We posit that the development of nanoformulated ART could overcome such limitations and effect improved clinical outcomes. To this end, we wet-milled 20 nanoparticle formulations of crystalline indinavir, ritonavir, atazanavir, and efavirenz, collectively referred to as “nanoART,” then assessed their performance using a range of physicochemical and biological tests. These tests were based on cell-nanoparticle interactions using monocyte-derived macrophages and their abilities to take up and release nanoformulated drugs and affect viral replication. We demonstrate that physical characteristics such as particle size, surfactant coating, surface charge, and most importantly shape are predictors of cell uptake and antiretroviral efficacy. These studies bring this line of research a step closer to developing nanoART that can be used in the clinic to alter the course of HIV-1 infection.
antiretroviral; nanoparticles; HIV; crystalline; macrophage; monocyte; nanomedicine
Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in the data.
We developed a recursive support vector machine (R-SVM) algorithm to select important genes/biomarkers for the classification of noisy data. We compared its performance to that of a similar, state-of-the-art method (SVM recursive feature elimination, or SVM-RFE), paying special attention to the ability to recover the true informative genes/biomarkers and robustness to outliers in the data. Simulation experiments show that a 5%–20% improvement over SVM-RFE can be achieved with regard to these properties. The SVM-based methods are also compared with a conventional univariate method, and their respective strengths and weaknesses are discussed. R-SVM was applied to two sets of SELDI-TOF-MS proteomics data, one from a human breast cancer study and the other from a study on rat liver cirrhosis. Important biomarkers found by the algorithm were validated by follow-up biological experiments.
The proposed R-SVM method is suitable for analyzing noisy high-throughput proteomics and microarray data, and it outperforms SVM-RFE in robustness to noise and in the ability to recover informative features. The multivariate SVM-based methods outperform the univariate method in classification performance, but univariate methods can reveal more of the differentially expressed features, especially when there are correlations between the features.
Two commonly used ideas in the development of citation-based research performance indicators are the idea of normalizing citation counts based on a field classification scheme and the idea of recursive citation weighing (like in PageRank-inspired indicators). We combine these two ideas in a single indicator, referred to as the recursive mean normalized citation score indicator, and we study the validity of this indicator. Our empirical analysis shows that the proposed indicator is highly sensitive to the field classification scheme that is used. The indicator also has a strong tendency to reinforce biases caused by the classification scheme. Based on these observations, we advise against the use of indicators in which the idea of normalization based on a field classification scheme and the idea of recursive citation weighing are combined.
Bibliometric indicator; Citation impact; Field normalization; Recursive indicator
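A toy sketch of how the two ideas combine, i.e., recursive citation weighing plus per-field mean normalization. The specific update rule, damping parameter, and function names here are our illustrative assumptions, not the exact indicator studied in the paper:

```python
def recursive_normalized_score(cites, field, alpha=0.5, iters=50):
    """Toy recursive field-normalized citation score: a paper's score
    mixes a baseline with shares of the scores of the papers citing it,
    then is divided by the mean score of its field each iteration.
    cites[j] lists the papers cited by paper j; field[i] is i's field."""
    n = len(cites)
    score = [1.0] * n
    for _ in range(iters):
        new = [1.0 - alpha] * n
        for j, refs in enumerate(cites):
            if refs:
                # citer j passes alpha of its score, split over its references
                share = alpha * score[j] / len(refs)
                for i in refs:
                    new[i] += share
        # field normalization: divide by the mean score within each field
        for f in set(field):
            members = [i for i in range(n) if field[i] == f]
            mean = sum(new[i] for i in members) / len(members)
            for i in members:
                new[i] /= mean
        score = new
    return score
```

Even in this toy form one can see the sensitivity the paper reports: the scores depend directly on how `field` carves up the papers, so a different classification scheme shifts every normalized value.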
To determine the extent of viral resistance over time among non-clade B HIV-1 infected patients in Uganda maintained on first line highly active antiretroviral therapy (HAART) following virologic failure.
Genotyping was performed on sixteen patients with virologic failure who were enrolled in an open label randomized clinical trial of short-cycle treatment interruption.
All patients receiving efavirenz-containing HAART developed at least one efavirenz resistance mutation during follow-up. The majority, 13/15 (86%), developed lamivudine resistance during follow-up, but no thymidine analogue mutations (TAMs) developed during a median duration of virologic failure of 325.5 days.
Genotypic resistance to both efavirenz and lamivudine developed early during the course of treatment after virologic failure. TAMs did not emerge early despite moderate exposure time to thymidine analogs during virologic failure.
human immunodeficiency virus (HIV); antiretroviral drug resistance; virologic failure
Etravirine (ETV) is a second-generation nonnucleoside reverse transcriptase (RT) inhibitor (NNRTI) introduced recently for salvage antiretroviral treatment after the emergence of NNRTI-resistant human immunodeficiency virus type 1 (HIV-1). Following its introduction, two naturally occurring mutations in HIV-1 RT, V106I and V179D, were listed as ETV resistance-associated mutations. However, the effect of these mutations on the development of NNRTI resistance has not been analyzed yet. To select highly NNRTI-resistant HIV-1 in vitro, monoclonal HIV-1 strains harboring V106I and V179D (HIV-1V106I and HIV-1V179D) were propagated in the presence of increasing concentrations of efavirenz (EFV). Interestingly, V179D emerged in one of three selection experiments from HIV-1V106I and V106I emerged in two of three experiments from HIV-1V179D. Analysis of recombinant HIV-1 clones showed that the combination of V106I and V179D conferred significant resistance to EFV and nevirapine (NVP) but not to ETV. Structural analysis indicated that ETV can overcome the repulsive interactions caused by the combination of V106I and V179D through fine-tuning of its binding module to RT facilitated by its plastic structure, whereas EFV and NVP cannot because of their rigid structures. Analysis of clinical isolates showed comparable drug susceptibilities, and the same combination of mutations was found in some database patients who experienced virologic NNRTI-based treatment failure. The combination of V106I and V179D is a newly identified NNRTI resistance pattern of mutations. The combination of polymorphic and minor resistance-associated mutations should be interpreted carefully.
To derive and validate a search strategy that identifies administrative database research (ADR) in the MEDLINE database.
We downloaded all articles published between January 1, 2008 and October 7, 2009 in 20 top journals in internal medicine, cardiovascular medicine, public health, and health services research. These were reviewed to determine whether they were ADR (in which the study cohort, exposure, or outcome was defined using electronic data created for or during the processing of patients through their health care). We used chi-squared recursive partitioning to create a search strategy that maximized sensitivity based on publication type, MeSH headings, and text words.
Main Outcome Measures
Sensitivity and positive predictive value of the search strategy for true ADR in three samples: derivation (n=5,513); internal validation (n=2,710); and external validation (n=1,500).
The prevalence of ADR in the derivation, internal validation, and external validation samples was 2.6, 2.9, and 2.2 percent, respectively. The sensitivity of our search strategy in these samples was 90.9 percent (95 percent confidence interval [CI] 85.0–95.1), 88.5 percent (79.2–94.6), and 100 percent (99.3–100), respectively. The positive predictive value in these samples was 10.7 percent (9.0–12.6), 11.5 percent (9.1–14.4), and 3.3 percent (2.3–4.6), respectively.
We derived and validated a search strategy that is highly sensitive for ADR in MEDLINE.
Administrative data uses; biostatistical methods; information technology in health