Genet Evol Comput Conf. Author manuscript; available in PMC 2010 December 29.
Published in final edited form as:
Genet Evol Comput Conf. 2008; 2008: 353–354.
doi:  10.1145/1389095.1389159
PMCID: PMC3011228

A Balanced Accuracy Fitness Function Leads to Robust Analysis using Grammatical Evolution Neural Networks in the Case of Class Imbalance

Nicholas E. Hardison
Bioinformatics Research Ctr. Department of Statistics North Carolina State University Raleigh, NC 27606 ; nhardis@ncsu.edu
Theresa J. Fanelli
Ctr. for Human Genetics Research Department of Molecular Physiology & Biophysics; Vanderbilt University Nashville, TN 37232 ; tjf5004@psu.edu
Scott M. Dudek
Ctr. for Human Genetics Research Department of Molecular Physiology & Biophysics; Vanderbilt University Nashville, TN 37232
David M. Reif
National Ctr. for Computational Toxicology; U.S. Environmental Protection Agency RTP, NC 27711 ; reif.david@epa.gov
Marylyn D. Ritchie
Ctr. for Human Genetics Research Department of Molecular Physiology & Biophysics; Vanderbilt University Nashville, TN 37232


Grammatical Evolution Neural Networks (GENN) is a computational method designed to detect gene-gene interactions in genetic epidemiology, but has so far only been evaluated in situations with balanced numbers of cases and controls. Real data, however, rarely has such perfectly balanced classes. In the current study, we test the power of GENN to detect interactions in data with a range of class imbalance using two fitness functions (classification error and balanced error), as well as data re-sampling. We show that when using classification error, class imbalance greatly decreases the power of GENN. Re-sampling methods demonstrated improved power, but using balanced accuracy resulted in the highest power. Based on the results of this study, balanced error has replaced classification error in the GENN algorithm.


Grammatical Evolution Neural Networks (GENN) uses grammatical evolution to evolve neural networks to detect gene-gene interactions in studies of complex human diseases [1]. GENN has shown initial successes in both real and simulated data, and while these results are encouraging, previous simulation studies have used datasets with balanced numbers of cases and controls. Unfortunately, when using standard classification error as the fitness function, many machine learning methods are not robust to class imbalance.

To address this problem, investigators have used techniques such as re-sampling [2] or altering the fitness metric. One metric that has been shown to be highly successful is balanced error/accuracy [3]. This metric has been shown to solve the class imbalance problem for another approach designed to detect epistasis, Multifactor Dimensionality Reduction (MDR) [4].

We assessed the performance of GENN on data with varying levels of class imbalance and show that the power of GENN using classification error decreases as the control:case ratio departs from unity. We compared three methods for addressing this concern: re-sampling methods (over- and under-sampling) and balanced accuracy as a fitness function.


2.1 Grammatical Evolution Neural Networks

The steps of GENN have been previously described in detail [1]. For the purposes of the current study, an option was added to the configuration file to specify the fitness function used: classification error (CE) or balanced error (BE). BE is the complement of balanced accuracy (BE = 1 − balanced accuracy), where balanced accuracy is defined as the mean of sensitivity and specificity [3]:

$$\text{Balanced Accuracy} = \frac{\text{sensitivity} + \text{specificity}}{2} = \frac{1}{2}\left[\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right]$$

where TP represents true positives, TN represents true negatives, FP represents false positives, and FN represents false negatives. This formula equally weights the errors within each class. In the case of balanced data, BE is equivalent to standard CE.
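The difference between the two fitness functions can be illustrated with a minimal Python sketch. It is not the GENN implementation; the function names and the toy label vectors are assumptions made for illustration, and balanced error is taken to be one minus balanced accuracy.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = case, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_error(y_true, y_pred):
    """Fraction of all individuals that are misclassified."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (fp + fn) / (tp + tn + fp + fn)


def balanced_error(y_true, y_pred):
    """One minus the mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 1.0 - 0.5 * (sensitivity + specificity)


# Hypothetical 4:1 control:case data and a classifier that calls everyone a control.
y_true = [0] * 80 + [1] * 20
y_pred = [0] * 100
ce_imbalanced = classification_error(y_true, y_pred)  # 0.2: looks good
be_imbalanced = balanced_error(y_true, y_pred)        # 0.5: no better than chance

# Hypothetical balanced data: 10 of 50 controls and 15 of 50 cases misclassified.
y_true_bal = [0] * 50 + [1] * 50
y_pred_bal = [0] * 40 + [1] * 10 + [1] * 35 + [0] * 15
ce_balanced = classification_error(y_true_bal, y_pred_bal)
be_balanced = balanced_error(y_true_bal, y_pred_bal)
```

In the imbalanced example, classification error rewards a model that ignores the cases entirely, whereas balanced error scores it as uninformative. In the balanced example the two measures agree, up to floating-point rounding.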

2.2 Data Simulation

The intention of the data simulations for this power study was to mimic gene-gene interaction, or epistasis, in case-control genetic data to evaluate GENN using penetrance functions. Penetrance defines the probability of disease given a particular genotype combination by modeling the relationship between genetic variations and disease risk. We used two well-described purely epistatic models, where the heritability (the proportion of trait variance due to genetics) was approximately 5%. The first is referred to as the XOR model, and the second is referred to as the ZZ model [5]. Both are nonlinear models with no marginal main effects. Software described by Moore et al. [5] was used to simulate the data.
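To make "no marginal main effects" concrete, the following Python sketch computes marginal penetrances and heritability for a two-locus penetrance table. The table values, the allele frequency of 0.5, and the heritability formula (variance of penetrance across genotypes divided by K(1 − K), where K is the disease prevalence) are assumptions chosen for illustration; they are not taken from this paper or from the simulation software of [5].

```python
from itertools import product

# Illustrative XOR-style penetrance table (assumed values):
# P(disease | genotype at locus A, genotype at locus B),
# with genotypes coded 0, 1, 2 as the number of copies of one allele.
PENETRANCE = {
    (0, 0): 0.0, (0, 1): 0.1, (0, 2): 0.0,
    (1, 0): 0.1, (1, 1): 0.0, (1, 2): 0.1,
    (2, 0): 0.0, (2, 1): 0.1, (2, 2): 0.0,
}

# Genotype frequencies under Hardy-Weinberg equilibrium with an
# assumed allele frequency of 0.5 at both loci.
GENO_FREQ = {0: 0.25, 1: 0.5, 2: 0.25}


def marginal_penetrance(locus):
    """Penetrance of each genotype at one locus, averaged over the other locus."""
    marginal = {}
    for g in GENO_FREQ:
        total = 0.0
        for other, freq in GENO_FREQ.items():
            key = (g, other) if locus == 0 else (other, g)
            total += freq * PENETRANCE[key]
        marginal[g] = total
    return marginal


def prevalence():
    """Population disease prevalence K implied by the table."""
    return sum(GENO_FREQ[a] * GENO_FREQ[b] * PENETRANCE[(a, b)]
               for a, b in product(GENO_FREQ, repeat=2))


def heritability():
    """Variance in penetrance across genotypes, scaled by K * (1 - K)."""
    k = prevalence()
    variance = sum(GENO_FREQ[a] * GENO_FREQ[b] * (PENETRANCE[(a, b)] - k) ** 2
                   for a, b in product(GENO_FREQ, repeat=2))
    return variance / (k * (1.0 - k))


marginal_a = marginal_penetrance(0)
marginal_b = marginal_penetrance(1)
h2 = heritability()
```

For this assumed table, every single-locus genotype has the same marginal penetrance, so neither locus is detectable on its own, and the heritability works out to roughly 5%.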

For both models, we simulated data with a range of control:case ratios and sample sizes. For the first set of simulations, the total number of individuals in the dataset was held constant, at two different total sample sizes: 600 and 1200. For each sample size, three control:case ratios were simulated: 1:1, 2:1, and 4:1. To ensure the results seen were due to class imbalance instead of decreasing numbers of cases, a second set of simulations was done, holding the number of cases constant at 300 and 600. Again, for each number of cases, three control:case ratios were simulated. For each set of parameters, 100 replicates were simulated. Each dataset had a total of 100 SNPs, two of which were functional in predicting disease. For the models with imbalanced control:case ratios, re-sampling was performed. In the case of under-sampling (US), controls were randomly removed until a ratio of 1:1 was achieved. In the case of over-sampling (OS), cases were randomly re-sampled until a 1:1 ratio was achieved.
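The two re-sampling schemes can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study: the function names and the toy records are hypothetical, and over-sampling is assumed to draw cases with replacement.

```python
import random


def under_sample(cases, controls, rng):
    """Randomly drop controls until there are as many controls as cases.

    Assumes there are at least as many controls as cases.
    """
    kept_controls = rng.sample(controls, len(cases))
    return list(cases), kept_controls


def over_sample(cases, controls, rng):
    """Randomly re-draw cases (with replacement) until cases match controls.

    Assumes there are at least as many controls as cases.
    """
    extra_cases = rng.choices(cases, k=len(controls) - len(cases))
    return list(cases) + extra_cases, list(controls)


# Hypothetical dataset: 600 individuals at a 4:1 control:case ratio.
rng = random.Random(0)
cases = [("case", i) for i in range(120)]
controls = [("control", i) for i in range(480)]

us_cases, us_controls = under_sample(cases, controls, rng)  # 120 vs 120
os_cases, os_controls = over_sample(cases, controls, rng)   # 480 vs 480
```

Under-sampling discards information from the controls and shrinks the dataset, while over-sampling keeps every individual but duplicates cases. This is consistent with under-sampling being the more costly option in small, highly imbalanced datasets.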

2.3 Data Analysis

GENN was used to analyze all epistasis models with classification error, balanced error, or classification error in combination with data re-sampling. Parameter settings remained identical between the analyses and included: 4 demes, migration every 25 generations, population size of 200 per deme, 400 generations, crossover rate of 0.9, and a reproduction rate of 0.1. Power for all analyses is reported as the number of datasets, out of 100, in which GENN identified the correct loci with no false positives.
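One way to read this power definition is as an exact match between the loci in the final model and the functional loci. The Python sketch below follows that reading; the function name, the SNP labels, and the exact-match interpretation are assumptions for illustration.

```python
def power(selected_per_dataset, functional_loci):
    """Count datasets whose selected loci are exactly the functional loci."""
    target = frozenset(functional_loci)
    return sum(1 for selected in selected_per_dataset
               if frozenset(selected) == target)


# Hypothetical results from four replicate datasets.
functional = {"SNP1", "SNP2"}
results = [
    {"SNP1", "SNP2"},          # both functional loci, nothing else: counted
    {"SNP1", "SNP2", "SNP7"},  # includes a false positive: not counted
    {"SNP1"},                  # misses a functional locus: not counted
    {"SNP2", "SNP1"},          # order does not matter: counted
]
hits = power(results, functional)
```

With 100 replicates per parameter setting, the count returned by such a function is directly a percentage.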


Tables 1 and 2 show the results for all analyses, with several apparent trends. With classification error (CE), increasing class imbalance greatly decreases the power of GENN. The power of GENN greatly improves when OS is used. With US, power decreases markedly in smaller datasets with large class imbalance. This trend is ameliorated somewhat in larger datasets, as well as in the datasets with fixed numbers of cases. Most significantly, for all models analyzed, power recovers completely when using balanced error (BE).

Table 1
Results for constant sample size simulations for different control:case ratios (CCR).
Table 2
Results for constant case number simulations.


From these results, we conclude that balanced error should be used as the fitness metric in GENN instead of classification error, as it outperforms standard classification error and re-sampling methods. Additionally, since balanced error and classification error are mathematically equivalent when the data are balanced, there is no disadvantage to using balanced error in balanced data.


This work was supported by National Institutes of Health grants HL65962, GM62758, and AG20135. We would also like to thank Jason H. Moore and Digna R. Velez for helpful discussions on class imbalance. This paper has been reviewed and approved for publication according to US EPA policy but does not necessarily represent the views of the Agency.


Categories and Subject Descriptors: Genetics-Based Machine Learning and Learning Classifier Systems.

General Terms: Algorithms


[1] Motsinger-Reif AA, Dudek SM, Hahn LW, Ritchie MD. Comparison of Approaches for Machine Learning Optimization of Neural Networks for Detecting Gene-Gene Interactions in Genetic Epidemiology. Genet. Epidemiol. 2008. Epub ahead of print.
[2] Japkowicz N, Stephen S. The Class Imbalance Problem: A Systematic Study. Intelligent Data Analysis Journal. 2002;6:429–450.
[3] Powers R, Goldszmidt M, Cohen I. Short term performance forecasting in enterprise systems. Hewlett-Packard Development Company Technical Reports. Computer Science Department, Stanford University; Stanford, CA: 2005.
[4] Velez D, White BW, Motsinger AA, Bush WS, Ritchie MD, Moore JH. A Balanced Accuracy Metric for Epistasis Modeling in Imbalanced Datasets using Multifactor Dimensionality Reduction. Genet. Epidemiol. 2007;4:306–15.
[5] Moore J, Hahn L, Ritchie M, Thornton T, White B. Application of genetic algorithms to the discovery of complex models for simulation studies in human genetics. Genetic and Evolutionary Computation Conference; New York, USA. July 9–13, 2002; San Francisco, CA: Morgan Kaufmann; 2002. pp. 1150–1155.