Recent years have seen a huge expansion in the range of methods and approaches that are being used to predict species occurrences. This expansion has been accompanied by many improvements in statistical methods, including more accurate ways of comparing models, better null models, methods to cope with autocorrelation, and greater awareness of the importance of scale and prevalence. However, the field still suffers from problems with incorporating temporal variation, overfitted models and poor out-of-sample prediction, confusion between explanation and prediction, simplistic assumptions, and a focus on pattern over process. The greatest advances in recent years have come from integrative studies that have linked species occurrence models with other themes and topics in ecology, such as island biogeography, climate change, disease geography, and invasive species.
Species occurrence models are used to develop spatially explicit interpolations from known species occurrences to unsampled areas. They are applied in ecology in a wide variety of ways that include (but are not limited to) the basic estimation of where a species can be expected to occur, explaining how species ranges may have changed in the past or predicting how they may do so in the future, understanding niches and the limits on species ranges, quantifying community-level patterns in biodiversity, and exploring alternative scenarios about the impacts of environmental change.
Species occurrence models (e.g., Figure 1) relate changes in a spatially explicit response variable (Y, the species occurrence, stated as either the number of individuals in a grid cell or species presence/absence) to changes in a spatially explicit set of predictor variables (X, which may be categorical or continuous and often include collinear variables such as temperature, rainfall, vegetation, and land cover). X variables are related to the Y variable via a link function, which defines the way in which the predictors relate to the response variable. Although link functions are formally components of generalised linear models (as for identity, logit, or log links, for example), most non-linear models also require the selection of a link function (e.g., discriminant function analysis, fuzzy classifiers, or trainable algorithms such as neural networks).
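As a rough illustration of how a link function connects X to Y, the following minimal Python sketch applies a logit link for presence/absence and a log link for counts (the link commonly paired with Poisson errors). The coefficients, predictor values, and function names are hypothetical and are not taken from any study cited here.

```python
import math

def logit(p):
    """Logit link: maps a probability in (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse logit: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(intercept, coefs, x):
    """eta = b0 + sum_j b_j * x_j for a single grid cell."""
    return intercept + sum(b * xj for b, xj in zip(coefs, x))

def predict_presence(intercept, coefs, x):
    """Predicted probability of presence (binary Y, logit link)."""
    return inv_logit(linear_predictor(intercept, coefs, x))

def predict_abundance(intercept, coefs, x):
    """Expected number of individuals (count Y, log link)."""
    return math.exp(linear_predictor(intercept, coefs, x))

# Hypothetical coefficients for standardised temperature and rainfall.
INTERCEPT = -0.5
COEFS = [1.2, 0.8]

# Hypothetical predictor values for one unsampled grid cell.
cell = [0.3, -0.1]

p_present = predict_presence(INTERCEPT, COEFS, cell)
expected_count = predict_abundance(INTERCEPT, COEFS, cell)
print(round(p_present, 3), round(expected_count, 3))
```

The same linear predictor feeds both predictions; only the link changes how it is mapped onto the scale of Y.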
The basic concerns of developing and applying species occurrence models were nicely laid out in the classic paper by Fielding and Bell. A number of more recent papers [2-4] contribute in-depth summaries of important challenges, most of which are still relevant. The majority of current activity in the field can be classified into three interrelated themes: (a) development of new link functions and new statistical approaches; (b) exploration and resolution of issues relating to model fit and model comparisons for existing methods, including problems of scale, autocorrelation, and sampling; and (c) better integration with other themes in ecology, such as island biogeography, invasive species, disease ecology, and climate change impacts. I will expand on each of these three themes in a little more detail.
The development of new statistical approaches to distribution modelling seems to have become something of a spin-off industry, and the range of approaches now on offer is bewildering and (arguably) unnecessary. Nonetheless, there have been a few genuine advances in this area in recent years, particularly in developing approaches to non-linear link functions (e.g., [5,6]). The tradeoff in many cases is between model interpretability and model accuracy.
Statistical questions remain an important research area in species occurrence modelling. In addition to their ecological relevance, techniques for quantifying model fit are important for contrasting the strengths and weaknesses of alternative methods and for resolving questions about the influence of scale and sampling on model output. Under the influence of Fielding and Bell, there has been a gradual shift away from quoting kappa statistics or percentages of different errors and toward the use of ROC (Receiver Operating Characteristic) plots. Information criteria (particularly Akaike's Information Criterion and the Bayesian Information Criterion) are also widely used. There have been few great leaps forward in this area in recent years, but a number of solid papers have been published that are gradually bringing clarity to the field (e.g., [8,9]). There have been several clear demonstrations that simple statistical tricks, such as increasing the extent of the sampling area or decreasing the grain (resolution) of analysis while keeping the number of positive records constant, can increase a model's significance [10-12] (although the grain of available data for the analysis of some taxa may genuinely be critical). Since the power of any frequentist statistical test is contingent on sampling frequency and sample size, recent criticisms of the AUC (Area Under the Curve) (e.g., ) do not, in my opinion, address the fundamental problem, which is the need for a multi-scale rather than a single-scale approach to spatial analysis.
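The fit measures mentioned above can be sketched in a few lines of Python. This is a toy illustration only: the scores, labels, and the 0.5 threshold are invented, and the "wider extent" case simply appends absences that the hypothetical model scores as 0.0. It is meant to show why adding easily rejected absences, with the presences held constant, can raise the AUC without the model itself changing.

```python
import math

def auc(scores, labels):
    """AUC as the chance that a random presence outscores a random absence.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def cohen_kappa(scores, labels, threshold=0.5):
    """Cohen's kappa after converting scores to presence/absence at a threshold."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)

def aic(log_likelihood, n_params):
    """Akaike's Information Criterion."""
    return 2.0 * n_params - 2.0 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Invented model scores for four presences and four absences.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Same presences, but four extra absences from a (hypothetical) wider extent.
wide_scores = scores + [0.0, 0.0, 0.0, 0.0]
wide_labels = labels + [0, 0, 0, 0]

auc_small = auc(scores, labels)
auc_wide = auc(wide_scores, wide_labels)
kappa_small = cohen_kappa(scores, labels)
print(auc_small, auc_wide, kappa_small)
```

Kappa depends on the chosen threshold, whereas the AUC summarises ranking across all thresholds; both, however, depend on which absences are included in the evaluation set.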
There has been relatively little use of model averaging and Occam's window (a procedure in which a subset of well-fitting models is used to obtain an average solution) [16-19] as ways of obtaining more reliable predictions. However, some recent studies have explored the development of models that attempt to take both spatial autocorrelation and imperfect survey data into account (e.g., ), and consensus or ensemble methods are starting to be more widely used.
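One way to picture model averaging with Occam's window is the sketch below. It assumes, for illustration only, that Akaike weights stand in for model probabilities and that a model is kept when its weight is within a factor of 20 of the best model's weight; the cutoff, the AIC values, and the predictions are all hypothetical.

```python
import math

def akaike_weights(aics):
    """Relative support for each model, normalised to sum to 1."""
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

def occams_window(weights, ratio=20.0):
    """Indices of models whose weight is within a factor `ratio` of the best."""
    best = max(weights)
    return [i for i, w in enumerate(weights) if w * ratio >= best]

def averaged_prediction(predictions, aics, ratio=20.0):
    """Weighted mean prediction per grid cell over the models inside the window.

    `predictions[i][c]` is model i's predicted occurrence probability in cell c.
    """
    weights = akaike_weights(aics)
    kept = occams_window(weights, ratio)
    norm = sum(weights[i] for i in kept)
    n_cells = len(predictions[0])
    averaged = [
        sum(weights[i] * predictions[i][c] for i in kept) / norm
        for c in range(n_cells)
    ]
    return averaged, kept

# Hypothetical AIC values for three candidate models.
aics = [100.0, 101.0, 110.0]

# Hypothetical predicted occurrence probabilities for two grid cells.
predictions = [
    [0.8, 0.2],
    [0.6, 0.4],
    [0.1, 0.9],
]

averaged, kept = averaged_prediction(predictions, aics)
print(kept, [round(p, 3) for p in averaged])
```

Here the third model falls outside the window and contributes nothing, so the averaged prediction lies between those of the two better-supported models.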
Species distribution models are increasingly being integrated with other themes in ecology, such as the influence of dispersal on species occurrences, the relevance of life history characteristics and fitness, the potential impacts of invasive species, and both forecasts and hindcasts about the impacts of climate change on species ranges (e.g., [25-27]) and community-level patterns [25,26]. A particularly fast-growing application is the development of models that are based on predictor variables (e.g., climate and land cover) that can be projected into the future under different scenarios to assist in the formulation of proactive strategies for problems such as changes in patterns of vector-borne and infectious diseases (e.g., [27-29]). The increasing availability of high-quality remotely sensed data sets and detailed atlasing and survey records is also contributing to the development of more accurate occurrence predictions, though not inevitably so [27,28].
In recent years, there has been a huge amount of research on predicting species occurrences. It is impossible to do full justice to this buzz of activity in such a short review; nonetheless, I will mention a few selected statistical and ecological highlights.
In the statistical arena, there has been considerable recent progress in dealing with autocorrelation [29-32] and in ways of thinking more effectively about non-linearities in species-habitat relationships, particularly in regard to the quantification of dispersal limitation [33,34] and environmental thresholds. Useful insights into the problem of model transferability are also accumulating.
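A common first diagnostic for spatial autocorrelation is Moran's I, often computed on model residuals. The sketch below is a minimal version that assumes a regular grid with rook (edge-sharing) neighbours and binary weights; it is not the specific method of any study cited above, and the two example grids are invented.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid using rook neighbours and binary weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant grid")
    num = 0.0
    w_total = 0
    for r in range(rows):
        for c in range(cols):
            # Visit the four edge-sharing neighbours that fall inside the grid.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    w_total += 1
    return (n * num) / (w_total * denom)

# Invented example: values clustered in the left half of the grid.
clustered = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

# Invented example: perfectly alternating values.
checkerboard = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

i_clustered = morans_i(clustered)
i_checker = morans_i(checkerboard)
print(round(i_clustered, 3), round(i_checker, 3))
```

Values near +1 indicate that neighbouring cells are similar, values near -1 that they alternate, and values near 0 little spatial structure; strongly positive values in residuals suggest that the model's independence assumptions are violated.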
As methods for predicting species occurrences have improved and become more widely accepted, researchers have been able to turn their attention toward a range of interesting applications. Perhaps the most important advances in recent years have come from applications of occurrence models in fields like evolutionary biology, climate change, invasive species, the study of patterns of species richness [39,40], and disease geography. Many of these studies, in turn, have offered further methodological and theoretical insights. The scale dependencies identified by Menke et al., for example, constitute one of the most interesting of recent results, and their relevance should extend well beyond statistics.
The field appears to be progressing in a number of interrelated ways. Some important methodological issues are still unresolved: the development of ways to correct for the influences of prevalence and scale on model fit, rigorous resolution of the problems created by autocorrelation, and better integration of species distribution models with other approaches to the analysis of spatial pattern in ecology, such as metapopulation and metacommunity models.
The development of more effective ways of incorporating temporal variation in species occurrences into distribution models remains an important challenge, particularly in regard to climate change. Unbalanced sampling regimes create a constant danger that current models interpret temporal variation as spatial variation, or vice versa, and in this way may provide substantially inaccurate predictions. For example, I am not aware of any studies of species occurrences that have dealt with both spatial and temporal autocorrelation in the underlying data sets.
There have been some interesting recent developments relating to the conceptual foundations of species occurrence models [44,45], and some important theoretical challenges remain in thinking through the different assumptions that underlie occurrence models. One approach that has been little explored (but see, e.g., ) is to contrast statistical occurrence models with mechanistic or process-based predictions. As I have argued elsewhere, there is a strong need to develop and use cross-scale comparisons (and data from different levels of organization) to understand species occurrences. Perhaps the most fundamental problem in the field is that too many occurrence models are correlative desktop exercises that are light on ecology; statistically accurate but mechanism-free models do not necessarily yield accurate predictions [48,49] and frequently result in poor transferability.
I am grateful to four tough but anonymous reviewers for their useful comments.
The electronic version of this article is the complete one and can be found at: http://F1000.com/Reports/Biology/content/1/94
The author declares that he has no competing interests.