Advances and Applications in Bioinformatics and Chemistry (Dove Medical Press)
Adv Appl Bioinforma Chem. 2010; 3: 39–44.
Published online 2010 June 15.
PMCID: PMC3170005

Pharmacogenomics of drug efficacy in the interferon treatment of chronic hepatitis C using classification algorithms


Chronic hepatitis C (CHC) patients often stop pursuing interferon-alfa and ribavirin (IFN-alfa/RBV) treatment because of the high cost and associated adverse effects. It is highly desirable, both clinically and economically, to establish tools to distinguish responders from nonresponders and to predict possible outcomes of the IFN-alfa/RBV treatments. Single nucleotide polymorphisms (SNPs) can be used to understand the relationship between genetic inheritance and IFN-alfa/RBV therapeutic response. The aim in this study was to establish a predictive model based on a pharmacogenomic approach. Our study population comprised Taiwanese patients with CHC who were recruited from multiple sites in Taiwan. The genotyping data was generated in the high-throughput genomics lab of Vita Genomics, Inc. With the wrapper-based feature selection approach, we employed multilayer feedforward neural network (MFNN) and logistic regression as a basis for comparisons. Our data revealed that the MFNN models were superior to the logistic regression model. The MFNN approach provides an efficient way to develop a tool for distinguishing responders from nonresponders prior to treatments. Our preliminary results demonstrated that the MFNN algorithm is effective for deriving models for pharmacogenomics studies and for providing the link from clinical factors such as SNPs to the responsiveness of IFN-alfa/RBV in clinical association studies in pharmacogenomics.

Keywords: chronic hepatitis C, artificial neural networks, interferon, pharmacogenomics, ribavirin, single nucleotide polymorphisms

Chronic hepatitis C (CHC) affects more than 170 million individuals worldwide and is a chronic liver disease characterized by infection with the hepatitis C virus persisting for more than six months.1,2 Combination therapy with interferon-alfa and ribavirin (IFN-alfa/RBV) has been the preferred treatment for CHC patients;1,2 however, because of the high cost and significant adverse reactions, patients often stop pursuing the treatment.1,2 Consequently, it would be highly desirable to establish models that distinguish responders from nonresponders (NRs) and predict the possible outcome of the IFN-alfa/RBV treatment.3,4

The efficacy of IFN-alfa/RBV is likely influenced by the combined effects of a number of genetic variants.3–5 Accumulating evidence reveals that single nucleotide polymorphisms (SNPs) could be used as genetic markers to predict IFN-alfa/RBV treatment outcome in CHC.3–5 Results of several studies6–8 in different populations support the implication that the effects of IFN-alfa/RBV are associated with genetic variants. In addition, the genetic differences have been analyzed and found to be associated with IFN-alfa/RBV responses using a multiple logistic regression method.3

Artificial neural network (ANN) algorithms are generally adopted for complex classification applications because of the advantages of ANN algorithms, such as nonlinearity, fault tolerance, universality, and real-time operation.9,10 ANN algorithms have been employed to build a prediction model for the drug efficacy of IFN-alfa/RBV in CHC patients based on SNPs and other clinical factors.4,5 Moreover, the possible nonlinear relationships between genetic variants and antidepressant response have been explored using ANN algorithms in pharmacogenomics studies.11,12

Previous studies3–5 mainly modeled IFN-alfa/RBV treatment response by using logistic regression or ANN methods without feature selection. In this work, we extended the previous research and applied both ANN algorithms and logistic regression with feature selection to predict IFN-alfa/RBV treatment outcomes using genetic factors.

Materials and methods


The cohort of 523 CHC patients was first studied by Lin and colleagues and is described in detail in that report.4 Briefly, blood samples were collected from 523 CHC patients at National Taiwan University Hospital, Kaohsiung Medical University Hospital, Kaohsiung Chang-Gung Memorial Hospital, and Tri-Service General Hospital in Taiwan from 2002 to 2004. Patients whose serum HCV RNA became negative and remained negative for more than 6 months after the end of treatment were defined as sustained virologic responders (SVRs). Those who remained viremic were defined as NRs. The 523 participants comprised 350 SVRs and 173 NRs.4 We further converted the clinical outcome into numerical form, that is, 1 for “SVR” and 0 for “NR”.


Genomic DNA was extracted from each blood sample using the QIAamp DNA Blood kit according to the manufacturer’s instructions, as described in detail elsewhere.4,5 The quality of the extracted genomic DNA was checked by agarose gel electrophoresis, and the DNA was stored at −80°C until use.

Furthermore, genomic DNA was amplified using a commercially available INFor SNP detection kit (Vita Genomics, Inc., Taiwan) according to the manufacturer’s instructions as described in detail elsewhere.4,5 More specifically, fragments of target genes were amplified by the PCR reaction. Amplification was carried out using 2700 PCR machines (ABI, Foster City, USA) and the amplified products were purified by membrane ultra-filtration with MultiScreen PCR plate (Millipore, Billerica, USA) according to the manufacturer’s instructions. After the sequencing reaction, the reaction product was loaded onto an ABI 3700 Capillary Sequencer. Finally, the genotype of each tested individual was determined by computer software and was confirmed manually.

Genetic factors

In the present study, we focused only on the 24 SNPs described in the previous study.13 The rationale for selecting these SNPs is described in detail elsewhere.3–5 The SNP genetic markers of the participants were generated at the high-throughput genomics lab of Vita Genomics, Inc.

Because there are three genotypes per locus, each SNP was coded as 0 for homozygote of the major allele, 1 for heterozygote, and 2 for homozygote of the minor allele, respectively.
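The coding scheme above can be illustrated with a minimal sketch. It assumes biallelic SNPs whose genotype calls are given as two-letter strings; the function name `encode_snp` and the example calls are hypothetical and are not study data.

```python
from collections import Counter

def encode_snp(genotypes):
    """Code one SNP column as 0, 1, or 2 copies of the minor allele."""
    # Count alleles across all participants to find the major allele.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    major = allele_counts.most_common(1)[0][0]
    # 0 = homozygote of the major allele, 1 = heterozygote,
    # 2 = homozygote of the minor allele.
    return [sum(1 for allele in g if allele != major) for g in genotypes]

# Hypothetical genotype calls for one biallelic SNP (illustration only).
calls = ["AA", "AG", "GG", "AA", "AG", "AA"]
codes = encode_snp(calls)

# Outcome coding used in the text: 1 for SVR, 0 for NR.
outcome_code = {"SVR": 1, "NR": 0}
labels = [outcome_code[s] for s in ["SVR", "NR", "SVR", "SVR", "NR", "SVR"]]
```

In this sketch the major allele is determined from the sample itself; a fixed reference allele could be used instead if one is available.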

Artificial neural network algorithms

In this study, we used two families of classification algorithms, multilayer feedforward neural network (MFNN) and logistic regression, as a basis for comparison. An MFNN is a type of ANN model in which connections between the units do not form a directed cycle.14 These classifiers were implemented using the Waikato Environment for Knowledge Analysis (WEKA) software.15

From an algorithmic point of view, the underlying process of this MFNN can be divided into the retrieving and learning phases.4,5 Let us assume an L-layer feedforward neural network (with $N_l$ units at the l-th layer). In the retrieving phase, the MFNN iterates through all the layers to produce the retrieval response $\{a_i(L), i = 1, \ldots, N_L\}$ at the output layer based on the inputs of test patterns $\{a_i(0), i = 1, \ldots, N_0\}$, the known weights $w_{ij}$ of the network, and the nonlinear activation function $f_i$ (for example, the sigmoid function). In the learning phase of this MFNN, the back-propagation algorithm16 is employed for the learning scheme. The back-propagation algorithm is a simple gradient descent approach. The weight updating process adopts the mechanism of back-propagated corrective signals from the output layer to the hidden layers. The goal is to iteratively select a set of weights $w_{ij}(l)$ for all layers such that the squared error function E is minimized, given pairs of input training patterns $\{a_i(0), i = 1, \ldots, N_0\}$ and target training patterns $\{t_j, j = 1, \ldots, N_L\}$.

Mathematically, the iterative gradient descent formulation for updating each specific weight $w_{ij}(l)$ can be expressed as the following equation:

$$w_{ij}(l) \leftarrow w_{ij}(l) - \eta \, \frac{\partial E}{\partial w_{ij}(l)}$$

where η is the learning rate and $\partial E / \partial w_{ij}(l)$ can be effectively calculated through a numerical chain rule by back-propagating the error signal from the output layer to the input layer.4
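The retrieving and learning phases can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the WEKA implementation used in the study: it assumes a single hidden layer, one sigmoid output unit, the squared error E = 0.5 (t − a)², a constant bias input appended to the pattern, and hypothetical input values.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Retrieving phase: propagate the input pattern layer by layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train_step(x, target, w_hidden, w_out, eta):
    """Learning phase: one gradient descent update; returns E before the update."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output unit (chain rule through the sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    # Back-propagate the error signal to the hidden units,
    # using the output weights as they were before this update.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]
    # w <- w - eta * dE/dw for every weight.
    for j in range(len(w_out)):
        w_out[j] -= eta * delta_out * hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] -= eta * delta_hidden[j] * x[i]
    return 0.5 * (target - output) ** 2

random.seed(0)
# Hypothetical pattern: three coded SNPs plus a constant bias input of 1.0.
x = [0.0, 1.0, 2.0, 1.0]
w_hidden = [[random.uniform(-0.5, 0.5) for _ in x] for _ in range(3)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(3)]
errors = [train_step(x, 1.0, w_hidden, w_out, eta=0.3) for _ in range(50)]
```

Repeated calls to `train_step` on the same pattern should drive the squared error down, which is the behavior the update rule is designed to produce.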

On the other hand, an MFNN is a spatial and iterative neural network which has several layers of hidden neuron units between the input and output neuron layers from a structural point of view.4,5 The basic function of each neuron is the linear basis function, and a nondecreasing and differentiable sigmoid function models the activation.16 In our approach, we employed an MFNN for modeling the responsiveness of IFN-alfa/RBV. Inputs contain the information about clinical factors such as SNPs for the CHC patients. Outputs contain the information about the responsiveness of IFN-alfa/RBV.

In summary, the MFNN is trained first by repeatedly providing input-output training pairs and executing the back-propagation learning algorithm.4,5 After this training process, the MFNN is tested by giving the inputs of testing data (that is, clinical factors) to the network. The forward propagation of the MFNN furnishes us with the responsiveness of IFN-alfa/RBV for a particular patient, indicating a means of inference from cause to effect.

Here, we used WEKA’s default parameters, such as the learning rate = 0.3 and the momentum variable = 0.2.
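To show how the learning rate and the momentum term interact, the sketch below applies the usual textbook momentum update with the two values quoted above. The function name `momentum_update` and the one-dimensional error function are assumptions made for illustration; this is not WEKA source code.

```python
def momentum_update(weight, gradient, previous_delta,
                    learning_rate=0.3, momentum=0.2):
    """Return the updated weight and the weight change that was applied."""
    # The new change is the gradient step plus a fraction of the previous change.
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta

# Toy problem (assumption): minimize E(w) = 0.5 * (w - 2)^2, so dE/dw = w - 2.
w, delta = 0.0, 0.0
for _ in range(100):
    w, delta = momentum_update(w, w - 2.0, delta)
```

On this toy error surface the weight approaches the minimizer w = 2; the momentum term reuses part of the previous step, which smooths successive updates.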

Feature selection

To identify a subset of clinical factors that maximize the performance of the prediction model, we employed the wrapper-based feature selection approach, where the feature selection algorithm acts as a wrapper around the classification algorithm.17 The wrapper-based approach conducts best-first search for a good subset and uses the classification algorithm itself as part of the function for evaluating feature subsets.18 The best-first search starts with an empty set of clinical factors and searches forward to choose possible subsets of clinical factors by greedy hill-climbing augmented with a backtracking technique.15 As shown in Figure 1, we applied MFNN and logistic regression with the wrapper-based approach, respectively.

Figure 1
In the wrapper-based feature selection approach, subsets of clinical factors are evaluated using the classification algorithm itself, such as multilayer feedforward neural network (MFNN) or logistic regression.
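The wrapper idea can be sketched as a best-first forward search over feature subsets. The code below is a simplified illustration, not WEKA's implementation: `evaluate` stands in for the cross-validated score of the chosen classifier on a subset, `max_stale` is an assumed stopping rule (stop after that many consecutive expansions without improvement), and `toy_score` is a hypothetical scoring function.

```python
import heapq

def best_first_forward_selection(n_features, evaluate, max_stale=5):
    """Search forward from the empty subset, expanding the best open subset first."""
    start = frozenset()
    best_subset, best_score = start, evaluate(start)
    # Max-heap via negated scores; the sorted tuple breaks ties deterministically.
    open_list = [(-best_score, tuple(), start)]
    visited = {start}
    stale = 0  # consecutive expansions without improvement
    while open_list and stale < max_stale:
        # Popping the best open subset allows backtracking to earlier candidates.
        _, _, subset = heapq.heappop(open_list)
        improved = False
        for f in range(n_features):
            child = subset | {f}
            if f in subset or child in visited:
                continue
            visited.add(child)
            score = evaluate(child)
            heapq.heappush(open_list, (-score, tuple(sorted(child)), child))
            if score > best_score:
                best_subset, best_score, improved = child, score, True
        stale = 0 if improved else stale + 1
    return best_subset, best_score

def toy_score(subset):
    # Hypothetical evaluator: features 2 and 5 help, every other feature hurts.
    informative = {2, 5}
    return len(subset & informative) - 0.1 * len(subset - informative)

selected, score = best_first_forward_selection(8, toy_score)
```

In a real wrapper, `evaluate` would train the classifier (MFNN or logistic regression) on the candidate subset and return its estimated performance, which is what makes the search expensive and prone to over-fitting on small samples.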

Evaluation of the predictive performance

To investigate the generalization of the prediction models produced by the above algorithms, we utilized the repeated 10-fold cross-validation method.17 First, the whole dataset was randomly divided into 10 distinct parts. Second, the model was trained on nine-tenths of the data and tested on the remaining tenth to estimate the predictive performance. Then, this procedure was repeated nine more times, each time leaving out a different tenth of the data as testing data and using the other nine-tenths as training data. Finally, the regular 10-fold cross-validation was run 100 times with different splits of the data, and the average estimate over all runs was reported.
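The splitting scheme just described can be sketched as follows. This is a minimal illustration that assumes plain random (unstratified) splits; the function names and the `fit_and_score` callback are hypothetical.

```python
import random

def repeated_kfold(n_samples, k=10, repeats=100, seed=0):
    """Yield (train_indices, test_indices) for each fold of each repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a different random split in every repetition
        folds = [order[i::k] for i in range(k)]  # k disjoint, near-equal parts
        for i in range(k):
            test = folds[i]
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield train, test

def repeated_cv_score(n_samples, fit_and_score, k=10, repeats=100, seed=0):
    """Average a user-supplied score over all folds and repetitions."""
    scores = [fit_and_score(train, test)
              for train, test in repeated_kfold(n_samples, k, repeats, seed)]
    return sum(scores) / len(scores)

# With 523 participants, each test fold holds 52 or 53 of them.
splits = list(repeated_kfold(523, k=10, repeats=2))
```

Here `fit_and_score(train, test)` would train a model on the training indices and return its accuracy or AUC on the test indices.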

To measure the performance of prediction models, we defined the accuracy as the proportion of correctly predicted participants among all tested participants.4,5 In addition, we used the receiver operating characteristic (ROC) methodology and calculated the area under the ROC curve (AUC).13,17 Most researchers have now adopted AUC for evaluating the predictive ability of classifiers because AUC is a better performance metric than accuracy.19 The AUC of a classifier can be interpreted as the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative one.19 The higher the AUC, the better the learner.20 In this study, AUC was used to compare the performance of different prediction models on a dataset.
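Both measures can be computed directly from their definitions. The sketch below assumes labels coded 1 for SVR and 0 for NR and uses the ranking interpretation of AUC, counting ties as one half; the example scores are hypothetical.

```python
def accuracy(labels, predictions):
    """Proportion of participants whose predicted class matches the true class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Compare every positive with every negative; a tie counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of SVR for five participants.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
predictions = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(labels, predictions)  # 3 of 5 correct
area = auc(labels, scores)           # 5 of 6 positive-negative pairs ranked correctly
```

The example shows why the two measures can disagree: accuracy depends on the 0.5 threshold, whereas AUC depends only on how the scores rank positives relative to negatives.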


Table 1 summarizes the results of repeated 10-fold cross-validation experiments using the MFNN algorithm and logistic regression with the wrapper-based feature selection method. First, the input-output training data pairs were used to train the MFNN models. There were 24 genetic factors, that is, 24 SNPs. Using this information, the MFNN models were trained with 1–4 hidden layers using the wrapper-based feature selection method. These trained MFNNs approximate the model of the responsiveness of IFN-alfa/RBV among CHC patients. After the networks were trained, we used them to predict the responsiveness for the testing set with the 10-fold cross-validation method. We calculated accuracy and AUC for the 10-fold cross-validation experiments. As indicated in Table 1, the average values of accuracy for the MFNN prediction models with 1–4 hidden layers were 80.4%, 80.4%, 80.0%, and 79.7%, respectively. Of all the MFNN models, those with one and two hidden layers performed best, outperforming the other two MFNN models in terms of accuracy and AUC. For the MFNN models with the wrapper-based approach, only 4 factors out of 24 were selected.

Table 1
The results of repeated 10-fold cross-validation experiments using multilayer feedforward neural network (MFNN) and logistic regression with the wrapper-based feature selection method

Next, we employed logistic regression with the wrapper-based approach for comparison. As shown in Table 1, the average value of accuracy for the logistic regression prediction model with the wrapper-based approach was 75.3%. Among all five predictive models, the MFNN models were superior to the logistic regression model in terms of accuracy. In addition, the MFNN models with one and two hidden layers were better than logistic regression in terms of AUC. Moreover, logistic regression with the wrapper-based approach selected 5 out of 24 factors.

Finally, logistic regression with the same four factors selected by the MFNN achieved an accuracy of 72.1% and an AUC of 0.67.


To the best of our knowledge, this is the first study that proposes the use of MFNN and logistic regression with the wrapper-based feature selection method to model the drug responding status in CHC patients using genetic factors. We developed a pharmacogenomics methodology to predict the drug efficacy of IFN-alfa/RBV in CHC patients based on genetic factors such as SNPs. Our results demonstrated that a trained MFNN model is a promising method for providing the inference from genetic factors, such as SNPs, to the responsiveness of IFN-alfa/RBV. Our findings suggest that our tool may provide the medical reference prior to treatment based on the information of genetic factors such as SNP genotypes.

In a similar study, Lin and colleagues4 used MFNN algorithms to evaluate the possible nonlinear interactions between IFN-alfa/RBV response and factors such as seven SNPs, viral genotype, viral load, age, and gender. The same cohort of 523 patients with CHC was used in their study and ours. They reported that an MFNN with one hidden layer had an accuracy of 77.4%.4 The difference between the two studies is that we used 24 SNPs instead of only seven polymorphisms. Moreover, the wrapper-based feature selection method was not used in the previous study. As shown in our simulation results, our MFNN prediction model performed better than theirs in terms of accuracy. These preliminary results suggest that an MFNN model may be considered a good method to deal with the complex nonlinear relationship between clinical factors and the responsiveness of IFN-alfa/RBV.

In the wrapper-based approach, no knowledge of the classification algorithm is needed for the feature selection process, which finds optimal features by using the classification algorithm as part of the evaluation function.17,18 In addition, the wrapper-based method has the advantage that it includes the interaction between feature subset search and the classification model.17 However, the wrapper-based method may have a risk of over-fitting.17,21 In a recent study, Huang and colleagues applied three classification algorithms including naive Bayes, the support vector machine algorithm, and the C4.5 decision tree algorithm with two feature selection methods to identify a subset of influential SNPs.17 They utilized the wrapper-based feature selection method and the hybrid feature selection approach combining the chi-squared and information-gain methods. Their results suggested that the naive Bayes model with the wrapper-based approach performed maximally among predictive models to infer the disease susceptibility dealing with the complex relationship between chronic fatigue syndrome and SNPs.17

The MFNN and logistic regression models are currently among the most widely used pattern recognition techniques. In this study, our MFNN model achieved a higher success rate of prediction than the traditional logistic regression model. Unlike logistic regression, MFNN has the ability to model the multidimensional and nonlinear relationships between the variables as found in complex medical applications.22,24 Moreover, the MFNN algorithms demonstrate robust performance in dealing with noisy or incomplete data.22,24 However, it is difficult to interpret the contribution of individual variables in an MFNN, while logistic regression analysis provides insightful information for the interpretation of model parameters.14,23 Therefore, logistic regression can be used as a complementary method to the MFNN approach.22

In this study, we found that the MFNN model with two hidden layers performed the same as the MFNN with one hidden layer in terms of accuracy and AUC. It has been demonstrated that an MFNN with only one hidden layer is adequate as a universal approximator of any nonlinear function, suggesting that one hidden layer is sufficient in principle.4,25 Our simulation results are consistent with this. However, when an approximation with one hidden layer would require an impractically large number of hidden units in solving some complex real-world problems, multiple hidden layers may become necessary.4,26,27

Further direct experimentation is warranted to evaluate the impact of the proposed approach on patient outcomes in the context of computerized clinical decision support systems (CDSSs), which are information systems designed to aid clinicians in making clinical decisions.28 In general, CDSSs provide clinicians with information systems for diagnosis, prevention, and disease management, as well as for drug dosing and drug prescribing,28 and CDSSs have shown great promise for reducing practice errors, improving patient care, and achieving lower costs.29 Furthermore, CDSSs are probably best introduced into healthcare organizations in two stages, basic stage (such as drug-allergy checking, basic dosing guidance, and drug-drug interaction checking) and advanced stage (including dosing support for geriatric patients, guidance for medication-related laboratory testing, and drug-pregnancy checking).30 In addition, Kawamoto and colleagues identified several features strongly associated with a CDSS’s ability to improve clinical practice and suggested that the automatic provision of decision support as part of clinician workflow is the most important feature (p < 0.00001).29 This finding is consistent with one of the Ten Commandments for effective CDSSs published by Bates and colleagues, that is, implementing CDSSs should fit into the user’s work flow and integrate suggestions with clinical practice.31

There were several limitations to this study as follows. First, the small size of the sample does not allow definite conclusions to be drawn. In addition, the contributions of other genetic markers as well as demographic and clinical factors should be further examined. It would seem that SNPs are inadequate as the only variable. Other data, especially from the clinical records and laboratory values of patients, could be included to improve model performance as a further development of the method. In future work, large prospective clinical trials are necessary in order to answer whether these genetic and clinical factors are reproducibly associated with IFN-alfa/RBV treatment response.


In this study, we developed an ANN methodology with the wrapper-based feature selection method to predict the drug efficacy of IFN-alfa/RBV in CHC patients based on clinical factors such as SNPs. We demonstrated that a trained MFNN model is a promising method for providing the inference from genetic factors to the responsiveness of IFN-alfa/RBV.

Our findings suggested that our tool may allow patients and doctors to make more informed decisions based on clinical factors such as SNP genotyping data. Over the next few years, genetic tests for the pretreatment prediction may become a reality in patient care after prospective large clinical trials to validate clinical factors and genetic markers.4,5 It may also provide potential drug targets for the development of alternative therapeutic agents to treat CHC patients, especially for those NRs.4,5


The authors extend their sincere thanks to Vita Genomics, Inc. for funding this research and to Dr Pei-Jer Chen of the Hepatitis Research Center, National Taiwan University, Dr You-Chen Chao of the Tri-Service General Hospital, Dr Ming-Lung Yu of the Kaohsiung Medical University Hospital, and Dr Chuan-Mo Lee of the Kaohsiung Chang-Gung Memorial Hospital for research collaboration. The authors would also like to thank the anonymous reviewers for their constructive comments, which improved the context and the presentation of this paper.


1. Lo Re V 3rd, Kostman JR. Management of chronic hepatitis C. Postgrad Med J. 2005;81(956):376–382.
2. Modi AA, Liang TJ. Hepatitis C: a clinical review. Oral Dis. 2008;14(1):10–14.
3. Hwang Y, Chen EY, Gu ZJ, et al. Genetic predisposition of responsiveness to therapy for chronic hepatitis C. Pharmacogenomics. 2006;7(5):697–709.
4. Lin E, Hwang Y, Wang SC, Gu ZJ, Chen EY. An artificial neural network approach to the drug efficacy of interferon treatments. Pharmacogenomics. 2006;7(7):1017–1024.
5. Lin E, Hwang Y, Chen EY. Gene-gene and gene-environment interactions in interferon therapy for chronic hepatitis C. Pharmacogenomics. 2007;8(10):1327–1335.
6. Hijikata M, Ohta Y, Mishiro S. Identification of a single nucleotide polymorphism in the MxA gene promoter (G/T at nt-88) correlated with the response of hepatitis C patients to interferon. Intervirology. 2000;43(2):124–127.
7. Yee LJ, Tang J, Gibson AW, Kimberly R, Van Leeuwen DJ, Kaslow RA. Interleukin 10 polymorphisms as predictors of sustained response in antiviral therapy for chronic hepatitis C infection. Hepatology. 2001;33(3):708–712.
8. Sugimoto Y, Kuzushita N, Takehara T, et al. A single nucleotide polymorphism of the low molecular mass polypeptide 7 gene influences the interferon response in patients with chronic hepatitis C. J Viral Hepat. 2002;9(5):377–384.
9. Kung SY, Hwang JN. Neural networks for intelligent multimedia processing. Proc IEEE. 1998;86:1244–1272.
10. Erb RJ. Introduction to backpropagation neural network computation. Pharm Res. 1993;10:165–170.
11. Serretti A, Smeraldi E. Neural network analysis in pharmacogenetics of mood disorders. BMC Medical Genetics. 2004;5:27.
12. Lin E, Chen PS, Lee IH, et al. Modeling short-term antidepressant responsiveness with artificial neural networks. Open Access Bioinformatics. 2010. In press.
13. Lin E, Hwang Y. A support vector machine approach to assess drug efficacy of interferon-alpha and ribavirin combination therapy. Mol Diagn Ther. 2008;12(4):219–223.
14. Bishop CM. Neural Networks for Pattern Recognition. Oxford, UK: Clarendon Press; 1995.
15. Witten IH, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. San Francisco, CA, USA: Morgan Kaufmann Publishers; 2005.
16. Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. In: Parallel Distributed Processing: Explorations in the Micro-Structure of Cognition, vol 1, Foundations. Cambridge, MA: MIT Press; 1996.
17. Huang LC, Hsu SY, Lin E. A comparison of classification methods for predicting Chronic Fatigue Syndrome based on genetic data. J Transl Med. 2009;7(1):81.
18. Kohavi R, John GH. Wrappers for feature subset selection. Artificial Intelligence. 1997;97:273–324.
19. Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett. 2006;27:861–874.
20. Hewett R, Kijsanayothin P. Tumor classification ranking from microarray data. BMC Genomics. 2008;9(Suppl 2):S21.
21. Saeys Y, Inza I, Larranaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007;23:2507–2517.
22. Sargent DJ. Comparison of artificial neural networks with other statistical approaches. Cancer. 2001;91(8):1636–1642.
23. Eftekhar B, Mohammad K, Ardebili HE, Ghodsi M, Ketabchi E. Comparison of artificial neural network and logistic regression models for prediction of mortality in head trauma based on initial clinical data. BMC Medical Informatics and Decision Making. 2005;5:3.
24. Lin E, Hwang Y, Liang KH, Chen EY. Pattern-recognition techniques with haplotype analysis in pharmacogenomics. Pharmacogenomics. 2007;8(1):75–83.
25. White H. Connectionist nonparametric regression: Multilayer feedforward networks can learn arbitrary mappings. Neural Networks. 1990;3:535–549.
26. Sontag ED. Feedback stabilization using two-hidden-layers nets. IEEE Trans Neural Networks. 1992;3(6):981–990.
27. Lin CT, Lee G. Neural Fuzzy Systems. Upper Saddle River, NJ: Prentice-Hall; 1996.
28. Garg AX, Adhikari NK, McDonald H, et al. Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA. 2005;293(10):1223–1238.
29. Kawamoto K, Houlihan CA, Balas EA, Lobach DF. Improving clinical practice using clinical decision support systems: a systematic review of trials to identify features critical to success. BMJ. 2005;330(7494):765.
30. Kuperman GJ, Bobb A, Payne TH, et al. Medication-related clinical decision support in computerized provider order entry systems: a review. J Am Med Inform Assoc. 2007;14(1):29–40.
31. Bates DW, Kuperman GJ, Wang S, et al. Ten commandments for effective clinical decision support: making the practice of evidence-based medicine a reality. J Am Med Inform Assoc. 2003;10:523–530.

Articles from Advances and Applications in Bioinformatics and Chemistry : AABC are provided here courtesy of Dove Press