The choice of subsequent treatment after failure is of major importance in the management of HIV-infected patients. Genotypic and phenotypic resistance tests are important tools for choosing a promising combination therapy for those patients. On a small sample, we investigated a framework both for choosing the optimal learner among a set of candidates and for building an estimator, using two different loss functions and *k*-fold cross-validation.

Based on cross-validation risk, the Super Learner estimator was the “best” learner, although the linear model with only main terms, LM(1), provided performance similar to that of Super Learner-5 and -6. With the SqE as loss function, the inclusion of Logic Reg as an additional learner decreased the performance of the Super Learner estimator. However, prediction results based on the full dataset, as well as accuracy, called into question the use of SqE as loss function, although it is known that results based on the full dataset differ from those based on a cross-validation strategy [34, 35]. Based on cross-validation risk, the good performance of LM(1) contrasts with the poor performance of the linear model with interaction terms, LM(2). Conversely, LM(2) outperforms LM(1) on the full dataset. In our small dataset, this finding is clearly due to overfitting of the data by the LM(2) model. A researcher who ignored the Super Learner methodology and used a linear model with interaction terms would obtain good performance on the full dataset, whereas such a learner would not have been selected by the discrete Super Learner methodology.
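The contrast between full-dataset performance and cross-validated risk can be illustrated with a minimal sketch of discrete selection under squared-error loss. Everything in it is an assumption made for illustration: the toy data, the three candidate learners (`fit_mean`, `fit_line`, `fit_nearest`), and the interleaved fold assignment are hypothetical and are not the learners or data of this study. The nearest-point learner simply stands in for any learner that fits the full dataset too closely.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Intercept-only learner: always predicts the training mean."""
    mu = mean(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Least-squares line on a single covariate."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_nearest(xs, ys):
    """Nearest-point learner: memorises the training data (overfits)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def sq_risk(predict, xs, ys):
    """Mean squared-error loss of a fitted predictor on (xs, ys)."""
    return mean([(y - predict(x)) ** 2 for x, y in zip(xs, ys)])


def cv_risk(fit, xs, ys, k):
    """Squared-error risk averaged over k interleaved validation folds."""
    n = len(ys)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        valid = [i for i in range(n) if i % k == fold]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in valid)
    return total / n


# Hypothetical candidate library and toy data (linear trend plus alternating noise).
LEARNERS = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
xs = list(range(12))
ys = [-0.2 * x + (0.3 if x % 2 == 0 else -0.3) for x in xs]

cv_risks = {name: cv_risk(fit, xs, ys, k=4) for name, fit in LEARNERS.items()}
full_risks = {name: sq_risk(fit(xs, ys), xs, ys) for name, fit in LEARNERS.items()}

cv_choice = min(cv_risks, key=cv_risks.get)        # discrete selection by CV risk
full_choice = min(full_risks, key=full_risks.get)  # naive selection on the full data
print("selected by cross-validation:", cv_choice)
print("selected on the full dataset:", full_choice)
```

In this toy setting the memorising learner has zero error on the full data, yet cross-validation selects the line, which mirrors the LM(1) versus LM(2) pattern described above without reproducing it.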

The choice of the *m*_{try} parameter for Random Forest is a real problem. However, the common *m*_{try} used in the regression setting (number of covariates divided by three) appears to be a good compromise. Whatever the *m*_{try} value, all mutations were selected at least once using Random Forest on the full dataset. This was expected given the relatively small number of mutations compared with the 1,000 trees generated by the Random Forest model.
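As a rough illustration of why every mutation is expected to appear at least once, the sketch below computes the common regression default for *m*_{try} and the probability that a given covariate is never even offered as a split candidate at the root node of any tree. The function names and the value of 30 covariates are hypothetical, and the calculation assumes independent trees with candidates drawn uniformly without replacement at the root. Being offered as a candidate is not the same as being selected, so this is only a supporting argument.

```python
def default_mtry_regression(n_covariates):
    """Common regression default: number of covariates divided by three, at least 1."""
    return max(1, n_covariates // 3)


def prob_never_root_candidate(n_covariates, mtry, n_trees):
    """Probability that one given covariate is absent from the root candidate set of every tree."""
    return (1 - mtry / n_covariates) ** n_trees


n_covariates = 30  # hypothetical number of mutations, not taken from the study
mtry = default_mtry_regression(n_covariates)
p_default = prob_never_root_candidate(n_covariates, mtry, n_trees=1000)
p_mtry_one = prob_never_root_candidate(n_covariates, 1, n_trees=1000)
print(mtry, p_default, p_mtry_one)
```

Even with *m*_{try} set to 1, the probability in this hypothetical setting is negligible over 1,000 trees.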

HIV-1 resistance studies use either a continuous outcome (such as HIV-1 RNA reduction from baseline to the time of interest) or a categorical outcome (classifying patients as achieving a virologic response at the time of interest). For example, virologic response can be defined as an HIV-1 RNA reduction of 1.5 log_{10} copies/mL or more, or as having a viral load <50 copies/mL at the time of interest. Even if a continuous outcome is preferable because it is more informative, the final goal of determining the drug resistance mutations associated with a poorer virologic response is to classify patients as “sensitive” or “resistant” to a specified drug. The former patients would receive the corresponding drug as part of their regimen, while the latter patients would not. We used two threshold values of −0.5 and −0.6 log_{10} copies/mL to define virologic response. For both threshold values, LM(2), Super Learner-5 and -6 provided the highest accuracy, with approximately 80% of patients correctly classified.
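The way accuracy follows from a continuous prediction and a threshold can be sketched as below. The function names and the four observed and predicted values are hypothetical, and treating a change at or below the threshold as a response (rather than strictly below) is an assumption.

```python
def is_responder(change_log10, threshold):
    """True when the HIV-1 RNA change is at or below the threshold (a larger reduction)."""
    return change_log10 <= threshold


def accuracy(observed, predicted, threshold):
    """Proportion of patients whose observed and predicted response classes agree."""
    agree = sum(
        is_responder(o, threshold) == is_responder(p, threshold)
        for o, p in zip(observed, predicted)
    )
    return agree / len(observed)


# Hypothetical HIV-1 RNA changes from baseline, in log10 copies/mL.
observed = [-1.2, -0.55, -0.1, -0.7]
predicted = [-0.9, -0.65, -0.4, -0.3]

acc_05 = accuracy(observed, predicted, threshold=-0.5)
acc_06 = accuracy(observed, predicted, threshold=-0.6)
print(acc_05, acc_06)
```

The same predictions can yield different accuracies at −0.5 and −0.6, which is why results are reported for both thresholds.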

All the methods used in this work are usually applied to large or very large datasets. A simple linear regression model was fitted on more than 5,000 paired genotype-phenotype datasets from the same database [6]. The investigation of logistic regression and nonlinear machine learning for predicting response to antiretroviral treatment was done on more than 3,000 treatment change episodes from the EuResist database [34]. All these analyses were performed retrospectively, mainly to compare different methods rather than to build a rule-based algorithm.

A major reason to apply the Super Learner methodology to the Jaguar trial is that the first version of an algorithm for a specific drug is often based on a limited amount of data [35–37]. Such algorithms are updated later as new data are published. Nonparametric methods are then often used on such a relatively small amount of data [38, 39]. Parametric methods have the advantage of not only integrating two-way interaction terms but also adjusting for other variables that improve the prediction. Randomized clinical trials in treatment-experienced patients frequently provide the first opportunity to investigate the impact of baseline mutations on the subsequent virologic response in those patients. It was therefore of interest to know whether the Super Learner methodology, applied to only around one hundred patients, was able to produce the “best” learner on the basis of accuracy and prediction. The Jaguar trial, an “add-on” study that ensures a good-quality relationship between reverse transcriptase mutations and the effect of the drug investigated, was a good opportunity for such an investigation.

It has been shown that, in the context of genotype-phenotype correlation with a large database, the linear model without interactions also provided accurate predictions [6]. However, based on the full dataset results, we highlight the importance of the two-way interaction terms for Least Squares. Interactions between mutations are of scientific interest, both to help in drug selection and to understand mechanisms of resistance.