|Home | About | Journals | Submit | Contact Us | Français|
Uganda just like any other Sub-Saharan African country, has a high under-five child mortality rate. To inform policy on intervention strategies, sound statistical methods are required to critically identify factors strongly associated with under-five child mortality rates. The Cox proportional hazards model has been a common choice in analysing data to understand factors strongly associated with high child mortality rates taking age as the time-to-event variable. However, due to its restrictive proportional hazards (PH) assumption, some covariates of interest which do not satisfy the assumption are often excluded in the analysis to avoid mis-specifying the model. Otherwise using covariates that clearly violate the assumption would mean invalid results.
Survival trees and random survival forests are increasingly becoming popular in analysing survival data particularly in the case of large survey data and could be attractive alternatives to models with the restrictive PH assumption. In this article, we adopt random survival forests which have never been used in understanding factors affecting under-five child mortality rates in Uganda using Demographic and Health Survey data. Thus the first part of the analysis is based on the use of the classical Cox PH model and the second part of the analysis is based on the use of random survival forests in the presence of covariates that do not necessarily satisfy the PH assumption.
Random survival forests and the Cox proportional hazards model agree that the sex of the household head, sex of the child, number of births in the past 1 year are strongly associated to under-five child mortality in Uganda given all the three covariates satisfy the PH assumption. Random survival forests further demonstrated that covariates that were originally excluded from the earlier analysis due to violation of the PH assumption were important in explaining under-five child mortality rates. These covariates include the number of children under the age of five in a household, number of births in the past 5 years, wealth index, total number of children ever born and the child’s birth order. The results further indicated that the predictive performance for random survival forests built using covariates including those that violate the PH assumption was higher than that for random survival forests built using only covariates that satisfy the PH assumption.
Random survival forests are appealing methods in analysing public health data to understand factors strongly associated with under-five child mortality rates especially in the presence of covariates that violate the proportional hazards assumption.
The online version of this article (doi:10.1186/s13104-017-2775-6) contains supplementary material, which is available to authorized users.
The third sustainable development goal states that ensuring healthy lives and promoting the well-being for all at all ages is essential to sustainable development [1, 2]. Critical among these age groups are the children under the age of five. In 2015, the United Nations recorded that a total of 17,000 fewer children died each day than was the case in 1990. However, more than six million children still die before their fifth birthday each year. Most of these deaths occur in Sub-Saharan Africa. Uganda in particular recorded an under-five mortality rate of 71.28 per 1000 live births in the period of 2005–2011 . This rate is approximately 3 times the third sustainable development goal target of at least as low as 25 per 1000 live births .
Identifying factors strongly associated with under-five child mortality rates is a topic of increased research interest for most of the countries in Sub-Saharan Africa, Uganda included. Several statistical methods have been used in studies aimed at identifying factors that are strongly associated with under-five child mortality rates [5–7]. Most studies have employed standard survival methodologies like the Cox-proportional hazards model [8–11]. However, the model has constantly been criticized for its restrictive assumption commonly referred to as the proportional hazards (PH) assumption [12–14].
Extensions for this model to deal with survival data in situations where the PH assumption is violated have been suggested such as the extended Cox model [15–17]. The extended Cox model is more flexible and most importantly relaxes the standard assumptions of the original Cox model, this however, comes at a cost of a more complicated model. For example, employing a smooth spline helps one to explicitly specify the functions for the Cox regression relationship but it requires one to specify correct degrees of freedom, number and placement of the knot points and order of the regression spline model (which could be quadratic, cubic, quartic, some combination of different orders, among others). In addition, polynomial spline models must be constrained by goodness-of-fit characteristics based on the actual data, resulting in penalty functions and other such criteria that cannot be universally applied to varying datasets [18–20]. This implies therefore that the hazard estimates of the extended Cox model are dependent on the parameter and model specification considered. Estimates of both nonlinearity and time-dependence vary depending upon the degrees of freedom and other parameters. Furthermore, models that fit the data equally well can have different shapes for the hazard function and result in different hazard estimates. Relying heavily on hazard estimates based on these models may require a more skilled user methodologically because there is no standardized method for determining which parameters are most appropriate . However, it should be noted that when all the covariates being considered satisfy the PH assumption then the Cox PH model is preferred.
Survival trees and random survival forests formally implemented in R [21, 22], are simple but robust methods that have been considered to be an attractive alternative model choice for survival data. These methods are extensions of classification and regression trees (CART) and random forests [23, 24]. The methods are fully non parametric, have fewer assumptions and can easily deal with high dimensional data . Random survival forests do not impose a restrictive structure on how the variables should be combined. If the relationship between the predictor variables and the response variable is complex with non linear patterns and interactions then random survival forests are capable of incorporating this automatically [26, 27]. Most often researchers who use the Cox PH model for time-to-event data go ahead and use it even when covariates in the model do not satisfy the PH assumption and make interpretations as if the PH assumption holds for each covariate in the model. Random survival forests do not rely on this assumption for their validity thus this can protect a user who is not familiar with model enhancements such as the extended Cox model to deal with covariates that do not satisfy the restrictive PH assumption.
In a study to identify factors strongly associated to under-five child mortality rates in Uganda , many of the covariates were excluded from the Cox PH model analysis due to their violation of the PH assumption. Random survival forests were recommended as alternative methods for the study . These methods have been found appropriate to use in the presence of covariates that do not satisfy the PH assumption or in situations where the relationship between the response and the covariates may be complicated [26, 27]. In this study, we re-analyse the dataset used in the study by  using both the Cox PH model and random survival forests where the former is used to emphasize the difference between them. We also investigate the predictive performance of the two random survival forest models used in this study in the presence of covariates that violate the PH assumption and compared these results with the predictive performance of the models used in the presence of only those covariates that satisfied the PH assumption.
We implement random survival forests on Uganda Demographic Health Survey data for 2011 to determine factors strongly associated to under-five child mortality rates. First we compare the results from random survival forests with those of the Cox PH model in the presence of covariates that satisfy the PH assumption. We also fit random survival forests on our dataset including covariates that violate the PH assumption which were excluded in the first analysis . We further discuss our findings on predictive performance for random survival forests in the presence of covariates that violate and those that do not violate the PH assumption.
The article is structured as follows: in the “Methods” section, we discuss the data and the methods used. The “Results” section presents results from the methods used. In the “Predictive performance” section, we present the results on predictive performance of the methods used. We state the general discussion and conclusions from this study in the “Discussion” and “Conclusions” section, respectively. Appendices 1 and 2 are provided as additional materials to describe the models and the methods used to evaluate the models, respectively.
To understand factors affecting under-five child mortality rates in Uganda, the 2011 Uganda Demographic Health Survey (UDHS) data was used . This dataset was collected from May 2011 through to December 2011. This was the fifth comprehensive survey conducted in Uganda as part of the worldwide Demographic and Health Surveys . A representative sample of 10,086 households was selected during the 2011 UDHS. The sample was selected in two stages. A total of 404 enumeration areas (EAs) were selected from among a list of clusters sampled for the 2009/10 Uganda National Household Survey (2010 UNHS). In the second stage of sampling, households in each cluster were selected from a complete listing of households. Eligible women for the interview were aged between 15 and 49 years of age who were either usual residents or visitors present in the selected household on the night before the survey. Out of 9247 eligible women, 8674 were successively interviewed with a response rate of 94% (91% in urban and 95% in rural areas). The study population for this analysis includes infants born between exactly one and 5 years preceding the 2011 UDHS.
In this study, 19 covariates are considered as candidates for analysis and their choice was based on related literature [29–31]. To some extent, other limitations like high level of missingness in the dataset influenced our covariate choice. The covariates include; mother’s age group (<20, 20–29, 30–39, 40+ years); type of residence (urban, rural); mother’s level of education (illiterate, primary, secondary and higher); partner’s level of education (illiterate, primary, secondary and higher); birth status (singleton birth, multiple births); sex of the child (male, female); wealth index (poorest, poorer, middle, richer, richest); children ever born (one child, two children, three children, four and more); birth order (first child, second to third child, 4th–6th child); religion (Catholic, Muslim, other Christians, others); types of toilet facility (flush toilet, pit latrine, no facility); mother’s occupation (not-working, sales and service, agriculture); current working status (working, not working); births in the past 1 year (no births, 1-birth, 2-births); births in the past 5 years (1-birth, 2-births, 3-births, 4-births); children under the age of five in the household (no child, one child, two children, three children, four children); sex of the household head (male, female); source of drinking water (piped water, borehole, well, surface/rain/pond/lake, others); mother’s age at first birth (less than 20, 20–29, 30–39 years). Note that all covariates are categorical. The categories of covariates that were not originally categorical, were created based on other similar studies in literature .
Table Table11 shows the distribution of deaths for children under the age of five across all covariates considered in the study. The percentages of deaths for each of the covariate categories is stated in the second column of Table 1. For example, 7.7% of children born to mothers with no education died before celebrating their fifth birthday. This is the highest percentage compared to those children born of mothers with primary education which is 6.4% and secondary or higher education which is 4.2%. Covariates with categories that have the highest percentage of deaths include number of children in the household under the age of five, number of births in the past 5 years, number of births in the past 1 year, birth status and lastly age of the mother at first birth.
Under-five child mortality rate is defined as the mortality rate from the age of 1 month to the age of 59 months. Thus the dependent variable used in our analysis is the time-to-event which in our case is the age of a child reported at the time of the interview (survey) for those still alive or the age of the child when he/she died. Thus children under the age of five that were still alive at the date of the interview were considered to be right censored.
The Cox proportional hazards model and random survival forests are both used in this analysis to identify factors that affect under-five child survival in Uganda. Two random survival forest implementations are used. The first forest is constructed on survival trees that are built using the log-rank split-rule. The second forest is constructed on survival trees built using the log-rank score split-rule. Note that the split-rule based on the log-rank score is desirable in the presence of tied event times. To evaluate the predictive performance for the models used, cross-validated integrated brier scores are used. The Cox PH model and the two random survival forest implementations are described in detail in Additional file 1: Appendix 1. To evaluate the predictive performance for the models used, cross-validated integrated brier scores are used and these are described in detail in Additional file 1: Appendix 2. Note that Appendices 1 and 2 are given as additional material in Addition file 1: Appendices 1 and 2.
To use the Cox PH model, it is important to establish which covariates in the dataset satisfy the PH assumption. We used the Schoenfeld residual test [32–34] in R an open source software  using the command cox.zph. Under this test, it is assumed that regression parameters are constant over time, hence the corresponding hazard ratios are constant over time. All those regression parameters (covariate effects) that changed with time, do not satisfy the PH assumption and therefore do not qualify to be entered in the final Cox PH model. Note that as our first step, we fitted a Cox PH model on all covariates considered in the study and then obtained Schoenfeld residuals. Results from this analysis are presented in Table Table2.2. Covariates that violated the PH assumption include: mother’s education level, total number of children ever born, type of residence, wealth index, birth order, number of births in the past 5 years, mother’s occupation and type of birth. These covariates were, therefore, not included in the final Cox PH analysis.
It is important to note that graphical methods can also be used to identify covariates that may potentially violate the PH assumption but are not statistical tests except for an initial exploratory assessment before a formal statistical test. Covariates with categories whose survival curves intersect or diverge disproportionately from each other over time are known to violate the PH assumption.
Figures 1 and and22 illustrate a graphical method mentioned above for assessing PH assumption using two covariates that have been identified as those that violate the PH assumption. Both figures give supporting evidence to violate the PH assumption by the two covariates considered. We fitted a univariate and a multivariate Cox PH model on all covariates that did not violate the PH assumption. The results from this analysis are presented in Table 3. Sex of the child, sex of the household head and number of births in the past 1 year are the factors strongly associated with under-five child mortality rate in Uganda. The results suggest that a girl child has a 17% lower hazard of death compared to the boy child. Children born in households headed by females have a 30% higher hazard of death than those born in households headed by males. The results further suggest that mothers who had more than one birth in a year put their children at a higher hazard of death than those with no birth. The hazard of death for children born of mothers who had 2 births in the past 1 year was 2.34-fold higher than those born of mothers with no birth in the past 1 year. Lastly, children whose fathers had secondary and higher education were at a lower hazard of death compared to those born of illiterate fathers.
Using the Akaike information criteria (AIC) , the best fitting Cox PH model had four covariates namely: father’s education, sex of the child, mother’s age group and sex of the household head.
Results presented in Table 4 confirm that sex of the child, sex of the household head and number of births in the last 1 year were strongly associated with under-five child mortality rates in Uganda. Children whose father’s education level is secondary and higher had a lower hazard of death compared to children whose fathers were illiterate. There was no significant difference in the hazard of death for children whose fathers were illiterate or had primary education. Mother’s age group was not significant but the age groups considered gave some interesting results. Children born of mothers below 20 years of age had a higher hazard of death than those born of mothers aged between 20 and 29 years of age. There was no significant difference between the hazard of death for children under the age of five born of mothers below 20 years and those who were 40 + years of age. This indicates that women who give birth before 20 years of age and those who give birth after 40 years of age, put their children at an equally higher hazard of death before celebrating their fifth birthday.
We graphically illustrate the results for two of the covariates considered to be strongly associated to under-five child mortality rates in Uganda using survival curves.
Figures 3 and and44 illustrate survival curves for the two selected covariates. The survival curve for girls is above that of boys and hence indicates a better survival rate for girls. Female headed households were also associated with a higher hazard of death for children under the age of five compared to male headed households.
We fitted two random survival forest models on the dataset, that is, the one based on survival trees built using the log-rank and the log-rank score split-rules, respectively. Note that these two models were built using only covariates that were identified as satisfying the PH assumption. Characteristics of the two forests are presented in Table 5 below.
To identify the most important covariates in explaining survival of children under the age of five in Uganda, permutation importance was used as the measure of variable importance [22, 26, 37]. Results from fitting a random survival forest of 1000 survival trees built using the log-rank split-rule are summarised in Fig. 5. They indicate that sex of the household head (SHH), religion (RELI), father’s education (FE), source of drinking water (SDW), number of births in the past 1 year (BP1Y) and sex of the child (SC) are the most important covariates strongly associated to under-five child mortality rates in Uganda. These results are in agreement with the results obtained from fitting a multivariate Cox PH model presented in Table 3 as far as significant effects are concerned but it is interesting to note that the random survival forest model did pick other covariates as important, namely, religion and source of drinking water. The error rate for any new prediction and in this case the out-of-bag prediction error rate was 47.32%.
For comparison, we also fitted a random survival forest model with survival trees built using the log-rank score split-rule.
The results on variable importance presented in Fig. 5 are similar to the results in Fig. 6. The figures further indicate that the two survival forest models have an approximately equal error rate which confirms or is in agreement with a study by  where the two models were found to have a similar predictive performance.
Survival trees and random survival forests divide the covariate space into subgroups of good and poor survival experience predictors. They are therefore promising methods in analysing survival data in the presence of non-proportional hazards . We fitted random survival forest models under the two split rules (log-rank and log-rank score, respectively) on the 2011 Uganda Demographic Health Survey dataset. We considered all covariates in the analysis including those that violated the PH assumption. The characteristics of these two forests are presented in Table Table66 below.
The error rates from the out-of-bag sample for the forests built with survival trees based on the log-rank and the log-rank score split-rules are 17.29 and 19.69, respectively. These two error rates are much lower compared to the error rates for survival forests built based on only covariates that satisfy the PH assumption. This result confirms the improved performance of random survival forests in the presence of non-proportional hazards covariates . However, making this conclusion based on the out-of-bag error rate may not be sufficient. It is also important to note that it is expected of the error rate to decrease with addition of more covariates. However, the key point in the above analysis is that the importance of covariates that satisfied and those that violated the PH assumption were evaluated.
Results from both forests indicate that the number of children under the age of five in the household (CUF) highly influences under-five child mortality rate in Uganda. Other covariates that are strongly associated to under-five child mortality in Uganda as ranked by the forest according to their importance include: the number of births in the past 5 years (BP5Y), birth order (BORD), wealth index (WI) and the total number of children ever born (CEB). Note that the number of children under the age of five in the household had the highest percentage of death as seen in Table 1.
Covariates that were strongly associated to under-five child mortality rates in Uganda in the presence of proportional hazards show up among other covariates but do not appear to be highly ranked. This result indicates that excluding covariates in the analysis of survival data due to violation of the PH assumption leads to loss of information. We see this as a very important property for random survival forests demonstrated in these two analyses namely, the choice of covariates in the model do not need a priori to rely on the too restrictive PH assumption. This is a demonstration of flexibility on the part of random survival forests as an additional attractive property compared to models that rely on the strict PH assumption. We can, therefore, conclude that random survival forests are good alternative models to use while identifying factors affecting under-five mortality rates especially in the presence of non-proportional hazards covariates. To verify this results, we used integrated brier scores  as a measure of predictive performance as presented in the next section.
The predictive performance for the models used was evaluated using the integrated brier scores , presented in Additional file 1: Appendix 2. We used the pec package  in R  for this analysis. Prediction error rates of 50% or higher are useless because they are no better than tossing a coin [26, 41].
The results in Fig. 9 show that models used in this analysis have a good predictive performance. In the presence of non-proportional hazards covariates, random survival forest models under the two split rules (log-rank and log-rank score, respectively) show a much better predictive performance. Their predictive performance exhibited is better than that of models based strictly on the PH assumption. In the presence of proportional hazards, however, the Cox model shows a better predictive performance compared to the two random survival forests models. This strengthens the recommendation that if all covariates satisfy the PH assumption, the Cox PH model is preferable.
The good predictive performance for random survival forests in the presence of non-proportional hazards covariates is an appealing result in the analysis of survival data especially that from public health. This is because covariates with non-proportional hazards have often been excluded in the analysis of survival data especially when the standard Cox proportional hazards model was being used for analysis. In some cases, other models like the extended Cox model have been used but they are known to have some restrictive formulation complexities. Using a stratified Cox PH model is another alternative to dealing with covariates that do not satisfy the PH assumption. However, the downside of this approach is that if a covariate is used as a stratifying variable its effect on the outcome cannot be estimated yet a researcher(s) might be interested in its effect. Random survival forests are flexible and have fewer assumptions. They are, therefore, plausible alternative models in analysing survival data to understand factors affecting under-five mortality rates in the presence of proportional and non-proportional hazards. However, further research is required on the merits and demerits of the methods.
Survival trees and random survival forests are increasingly becoming popular alternative models for the analysis of time-to-event outcomes . They have been identified as suitable models in analysing survival data in situations where the proportional hazards assumption is violated [27, 43]. However, not much literature is available to confirm the assertion. In this study, we have therefore compared the predictive performance of the Cox proportional hazards model to the random survival forests by re-analysing a dataset that was first analysed by . The study further compares the performance of random survival forests on the same dataset in the presence of covariates that violate the proportional hazards assumption to that when these covariates are excluded. Under the PH assumption, the three models show that sex of the household head, sex of the child and the number of births in the past 1 year are strongly associated to under-five child mortality rate in Uganda.
Other covariates such as source of drinking water, Father’s education and religion show up as important in explaining under-five child mortality rates in Uganda with random survival forest models. However, these covariates did not appear to be very strongly associated to under-five child mortality rate in the Cox proportional model. It is interesting to note that random survival forest models give additional information in regard to variable importance.
Results from the two forest models in the presence of non-proportional hazards show that the number of children under the age of five in a household, greatly influences under-five child mortality rates. This ranks top in the two random survival forest models. Other factors ranked as important in understanding under-five child mortality rates by random survival forests in the presence of non-proportional hazards covariates are: births in the past 5 years, wealth index, birth order and total number of children ever born. Similar factors have emerged to be strongly associated to under-five child mortality rates in other studies [3, 29, 44, 45].
To compare the predictive performance of these three models on the scenarios considered, we used integrated brier scores via cross-validation. The Cox proportional hazards model had a better predictive performance in the presence of only those covariates that satisfy the proportional hazards assumption compared to the two random survival forest models. This result may not be seen as a surprise because the Cox PH model works best under this assumption from which its original formulation by  is based. The result is further confirmed because the two random survival models had a high out-of-bag error rate of 47.36 and 47.32%, respectively. The out-of-bag error rate for the two random survival forest models (RSFLR, RSFLRS) in the presence of proportional hazards are higher compared to those of random survival forest models (RSFLRNON, RSFLRSNON) in the presence of non-proportional hazards covariates. This implies that excluding covariates that have non-proportional hazards in the analysis gives less informative results. The results further confirm that random survival forests are robust in approximating complex survival functions, including functions based on covariates with non-proportional hazards, while maintaining low prediction error rates [27, 46–48].
However, since most aspects of these models are under development, it is recommended that one uses them hand in hand with the standard methods like the Cox proportional hazards model. The same recommendation was made in other studies related to random forests [42, 47, 49, 50]. It has also been established that random survival forests are useful in situations where the relationship between the response and the predictors may be complicated . However, there are concerns that survival trees are built using the log-rank split-rule whose power to discriminate between two groups is highest when the proportionality hazards assumption holds. This may have an impact on the predictive performance of the survival forest model. This is important especially when the survival (or hazard) functions cross each other in the two groups being compared . However, more research is needed to fully ascertain this fact especially in the presence of non-proportional hazards. More research will also guide scholars to the best split-rule that may help in such circumstances. A recent study  has recommended the use of the integrated absolute difference between the two daughter nodes’ survival functions as the splitting rule in circumstances where the hazard function cross. They have concluded that forests built with this rule produce very good results in general, and that they are often better compared to forests built with the log-rank splitting rule.
The study confirms that random survival forests have a good predictive performance in the presence of non-proportional hazards . It is, therefore, clear that these methods are promising alternatives to models that rely heavily on the proportional hazards assumption where the presence of covariates that violate the proportional hazards assumption is inevitable.
This study has demonstrated that the Cox PH model and random survival forests could cleverly be used in a complementary manner to fully model and analyse survival data in the presence of proportional and non-proportional hazards. The good predictive performance shown by the two random survival forest models in the presence of non-proportional hazards covariates for this dataset implies that these models could be alternative models in analysing survival datasets especially when the assumption is violated. Our conclusions on the use of random survival forests to analyse survival data are in agreement with the recommendations by [26, 50]. Obvious extensions that came to light when dealing with large survey data is when there are outcomes and covariates with missing data. We propose combining random survival forests with multiple imputation methods to reduce the loss of information. The combined approach will be to apply random survival forests after multiple imputation. A limitations to this study is that we have used random survival forest models that have been identified to favour to covariates with many split points in survival tree building [52–55]. Given the fact that most of our covariates were categorical with more than two categorises, biased results on estimates such as variable importance are inevitable [53, 55]. Our recent study has therefore recommended the use of conditional inference forests suggested by  in the presence of covariates with many split points.
Authors, JBN and HM conceived the concept, JBN, analysed the data. JBN and HM prepared the manuscript. Both authors read and approved the final manuscript.
We thank the German Academic Exchange Service (DAAD) and the University of Kwazulu-Natal for supporting the first author in her Ph.D. studies. We sincerely thank The Demographic Health Survey program for providing the data used in this study.
Both authors declare that they have no competing interests.
The authors confirm that all data underlying the findings are fully available without restriction. The data is held by the Demographic and Health Survey Program and freely available to the public but a request has to be sent to the Demographic and Health Survey Program.
The Demographic Health Survey Data is collected according to the rules and guidelines stipulated by WHO World Health Survey on consent from the participants. Some of these rules include but not limited to; participation in the survey is voluntary and the respondent can refuse to be interviewed. The interviewer is responsible for explaining what the survey is about, providing all the necessary information, and making sure the respondent understands the implications of his/her participation before giving his/her consent. The information given should be simple and clear and adapted to the respondent’s level of understanding. Consents must be documented by asking the respondents to sign an Informed Consent Forms (Household Informant Consent Form; Individual Consent Form) before doing the interview. These forms must mention who will be doing the study, the types of questions that will be asked, why the study is being done, and who will have access to the information provided.
The statement is available on the DHS ethical clearance certificate and it states that: The IRB-approved procedures for DHS public-use datasets do not in any way allow respondents, households, or sample communities to be identified. There are no names of individuals or household addresses in the data files. The geographic identifiers only go down to the regional level (where regions are typically very large geographical areas encompassing several states/provinces). Each enumeration area (Primary Sampling Unit) has a PSU number in the data file, but the PSU numbers do not have any labels to indicate their names or locations. Before each interview is conducted, an informed consent statement is read to the respondent, who may accept or decline to participate. A parent or guardian must provide consent prior to participation by a child or adolescent. The informed consent statement emphasizes that participation is voluntary; that the respondent may refuse to answer any question, or terminate participation at any time; and that the respondent’s identity and information will be kept strictly confidential. The study was also submitted to the ethics committee of University of Kwazulu-Natal and they stated that no ethical clearance was required.
We thank the University of Kwazulu-Natal for supporting JN in her PH.D. work. The first author is also financially supported by DAAD.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
The online version of this article (doi:10.1186/s13104-017-2775-6) contains supplementary material, which is available to authorized users.
Justine B. Nasejje, Email: az.ca.smia@enitsuj.
Henry Mwambi, Email: az.ca.nzku@HibmawM.