The same data set was used to assess the performance of the four models: RANN, FL, HEA and MLR. The RMSE values indicate that both HEA (4.27 µg/l) and RANN (4.28 µg/l) performed better than FL (4.49 µg/l) and MLR (4.60 µg/l). The RMSE values obtained in this study were comparable with those of similar models developed for chlorophyll-a estimation in temperate lakes: an FL model developed for temperate lakes recorded an RMSE of 7.0 µg/l [13], and an HEA model reported RMSE values of 39 to 87 µg/l [15]. All the models are generally reliable, as their predicted values correlate with the observed values with an r value of 0.5 or above. HEA and RANN produced similar performance when used to predict phytoplankton biomass in temperate lakes [17]. Previous studies have also shown that RANN and FL perform better than MLR models [9-11], which is consistent with the findings of the present study.
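The two goodness-of-fit measures referred to above, RMSE and the correlation coefficient r, can be computed as in the following minimal sketch. The function names and the sample values are illustrative assumptions and are not the data or code of this study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    if ss_o == 0 or ss_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_o * ss_p)

# Hypothetical chlorophyll-a values in µg/l, for illustration only.
obs = [3.1, 5.4, 8.0, 12.2, 6.7]
pred = [4.0, 5.0, 7.1, 10.9, 8.2]
print(f"RMSE = {rmse(obs, pred):.2f} µg/l, r = {pearson_r(obs, pred):.2f}")
```

RMSE is expressed in the units of the response, so it is directly comparable between models fitted to the same data set, whereas r is unitless and insensitive to a constant bias in the predictions.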
Based on the AUC rating in [32], FL (AUC value 0.84) and HEA (AUC value 0.82) can be categorized as excellent prediction models of chlorophyll-a concentration, whereas RANN (AUC value 0.79) and MLR (AUC value 0.76) are categorized as acceptable models.
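The categorization above can be expressed as a simple lookup. The sketch below assumes the commonly used cut-offs of 0.7, 0.8 and 0.9; the exact boundaries and labels of the rating in [32] are not restated in this section, so these values are an assumption that merely reproduces the categories reported above.

```python
def rate_auc(auc):
    """Map an AUC value to a qualitative rating (cut-offs are assumed)."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "below acceptable"

# AUC values reported in this study.
for model, auc in [("FL", 0.84), ("HEA", 0.82), ("RANN", 0.79), ("MLR", 0.76)]:
    print(model, auc, rate_auc(auc))
```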
The inconsistent ranking of the four models across the performance criteria in this study results from the way each criterion measures performance. RMSE is based on the magnitude of the prediction error, whereas AUC is based on a binary classification task. The better performance of the FL model over RANN and HEA under AUC might be due to the collapsing of the continuous response (chlorophyll-a concentration) into two values. Theoretically and empirically, AUC is a better measure for model evaluation than accuracy, whereas RMSE measures model accuracy. Many ecological responses are difficult to measure accurately and definitively. AUC is therefore suitable for characterizing responses that are dichotomous, such as lake eutrophication [44].
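To illustrate how a continuous response is collapsed into two classes before AUC is computed, the following minimal sketch dichotomizes the observed values at a threshold and computes AUC as the probability that a randomly chosen positive case receives a higher prediction than a randomly chosen negative case. The threshold and the sample values are hypothetical and are not those used in this study.

```python
def auc_from_threshold(observed, predicted, threshold):
    """AUC after labelling observations at or above `threshold` as positive.

    Uses the pairwise (Mann-Whitney) formulation: the fraction of
    positive/negative pairs ranked correctly, with ties counted as half.
    """
    labels = [o >= threshold for o in observed]
    pos = [p for p, is_pos in zip(predicted, labels) if is_pos]
    neg = [p for p, is_pos in zip(predicted, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("both classes must be present after thresholding")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical observed and predicted chlorophyll-a values (µg/l).
obs = [1.0, 2.0, 8.0, 9.0]
pred = [1.0, 3.0, 2.0, 9.0]
print(auc_from_threshold(obs, pred, threshold=5.0))
```

Because only the ordering of the predictions relative to the two classes matters, a model can rank well on AUC while still showing a larger RMSE than a competitor, which is one way the two criteria can disagree.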
Dissolved oxygen was the only variable used to predict chlorophyll-a concentration in the MLR model. The other variables were not used because they are highly correlated; highly correlated variables were excluded stepwise during construction of the MLR model. Using the MLR model to predict chlorophyll-a has serious drawbacks because the model is oversimplified. Eutrophication is a complex process with nonlinear relations between environmental variables and therefore cannot be explained with a simplistic approach.
Sensitivity analysis, as used to select variables for RANN and FL, determines the contributions of the independent variables and the way they act on the dependent variable. Sensitivity analysis adds strength to ANNs in their explanatory capacity. More importance is placed on variables that have large sensitivities, and variables with small sensitivities are discarded. This is important because presenting a large number of inputs to an ANN increases the network size, which increases the amount of data needed to estimate the connection weights and may reduce processing speed. Similarly, for the FL model a large number of inputs makes it difficult to define the fuzzy membership functions. Both the RANN and FL models were developed using the final selected variables, which include water temperature, Secchi depth, pH, ammonia nitrogen, dissolved oxygen and nitrate nitrogen.
Chlorophyll-a concentration is related to algal biomass, and the concentration of chlorophyll-a in this study represents the five major divisions of algae, namely Bacillariophyta, Chlorophyta, Cyanobacteria, Chrysophyta and Pyrrophyta. It is well known that temperature can enhance phytoplankton growth rate [45, 46]. Cyanobacteria and Chlorophyta, which comprise 28% and 26% of the algal population respectively, are identified as the major contributors to chlorophyll-a concentration in Putrajaya Lake. Cyanobacteria and Chlorophyta are known to prefer high water temperature [47-49]. Inability to grow at high pH is a characteristic of oligotrophic species, mainly desmids, which make up a major part of the algal population at Putrajaya Lake [50]. It can therefore be inferred that algal abundance at Putrajaya Lake is controlled by pH. The nutrients ammonia nitrogen (NH3-N) and nitrate nitrogen (NO3-N) are both among the parameters selected by sensitivity analysis. Nutrient inputs into oligotrophic lakes often increase phytoplankton biomass and productivity [51]. Secchi depth is correlated with chlorophyll-a measurements; in many standing waters, determination of Secchi depth has been found to be a simple and reliable approach to monitoring seasonal changes in phytoplankton biomass. Meanwhile, it is typical to find higher levels of oxygen at depths where larger concentrations of phytoplankton are found [52].
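The idea of ranking inputs by sensitivity and discarding those with small sensitivities can be sketched as follows. This section does not specify the sensitivity method used, so the perturbation scheme, the 10% perturbation size, the retention cut-off and the toy model below are all assumptions made for illustration.

```python
def perturbation_sensitivity(model, samples, delta=0.1):
    """Mean absolute change in model output when each input, one at a
    time, is increased by the fraction `delta` (an assumed scheme)."""
    n_inputs = len(samples[0])
    scores = []
    for i in range(n_inputs):
        total = 0.0
        for row in samples:
            perturbed = list(row)
            perturbed[i] = row[i] * (1.0 + delta)
            total += abs(model(perturbed) - model(row))
        scores.append(total / len(samples))
    return scores

def select_inputs(names, scores, keep_fraction_of_max=0.1):
    """Keep inputs whose sensitivity is at least a fraction of the largest one."""
    cutoff = keep_fraction_of_max * max(scores)
    return [name for name, s in zip(names, scores) if s >= cutoff]

# Toy stand-in for a trained model: responds to the first input only.
def toy_model(x):
    return 2.0 * x[0] + 0.0 * x[1]

names = ["input_a", "input_b"]          # hypothetical variable names
samples = [[1.0, 5.0], [2.0, 3.0]]      # hypothetical input rows
scores = perturbation_sensitivity(toy_model, samples)
print(dict(zip(names, scores)), select_inputs(names, scores))
```

In practice `model` would be the trained network's prediction function, and the retained inputs would be used to refit a smaller model.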
Even though ANN models can make accurate predictions and are recognised as powerful, they are considered to be ‘black-box’ in nature. Explanatory methods such as FL and HEA were therefore adopted in this study with the aim of clarifying the ‘black-box’ approach of ANNs. The FL approach proves to be a practical and successful technique when dealing with semi-qualitative knowledge and semi-qualitative data [53], which is the case, for example, when trying to model algal biomass or algal blooms. However, the definition of appropriate membership functions and the induction of inference rules, common to any FL modelling approach, remain difficult, since these depend very much on the specific knowledge and expertise of the particular ecologist [54]. The HEA approach can overcome these limitations of the FL and ANN approaches. HEA allows the discovery of predictive rule sets in complex ecological data. The genetic algorithm used in HEA provides parameter optimization, which resulted in the inclusion of nitrate nitrogen, Secchi depth, dissolved oxygen and pH for chlorophyll-a estimation at Putrajaya Lake. The HEA rule set discovered for chlorophyll-a concentration at Putrajaya Lake is rather complex. The IF branch of the discovered rule set indicates that chlorophyll-a concentration can be determined from dissolved oxygen when the nitrate concentration and Secchi depth are high. If this condition is not met, chlorophyll-a concentration is determined using the ELSE branch, in which pH and dissolved oxygen are used. This can be justified by findings in the literature. Nutrients such as nitrate increase algal biomass. Chlorophyll-a concentration can be determined from dissolved oxygen because algal photosynthesis is usually the major supplier of oxygen to slow-flowing water bodies. Dissolved oxygen and pH in natural waters are primarily associated with photosynthesis [55].
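The IF-THEN-ELSE structure of the rule set described above can be sketched as follows. Only the branching logic is taken from the text; the thresholds and the two branch equations are hypothetical placeholders, since the discovered equations and parameter values are not given in this section.

```python
def hea_style_rule(no3, secchi, do, ph, *, no3_cut, secchi_cut,
                   if_branch, else_branch):
    """Evaluate a rule with the structure described in the text.

    IF nitrate and Secchi depth are both high, chlorophyll-a is estimated
    from dissolved oxygen alone; ELSE it is estimated from pH and
    dissolved oxygen. Thresholds and branch functions are supplied by
    the caller because they are not stated in the source.
    """
    if no3 > no3_cut and secchi > secchi_cut:
        return if_branch(do)
    return else_branch(ph, do)

# Placeholder thresholds and equations, for illustration only.
estimate = hea_style_rule(
    no3=1.2, secchi=1.5, do=7.0, ph=7.4,
    no3_cut=1.0, secchi_cut=1.0,
    if_branch=lambda do: 0.5 * do,
    else_branch=lambda ph, do: 0.3 * ph + 0.2 * do,
)
print(estimate)
```

Unlike the weights of an ANN, a rule of this form can be read directly: it states which variables are used under which conditions.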