Using the Southern African Bird Atlas Project (SABAP2) as a case study, we examine the possible determinants of spatial bias in volunteer sampling effort and how well such biased data represent environmental gradients across the area covered by the atlas. For each province in South Africa, we used generalized linear mixed models to determine the combination of variables that explain spatial variation in sampling effort (number of visits per 5′ × 5′ grid cell, or “pentad”). The explanatory variables were distance to major road and exceptional birding locations or “sampling hubs,” percentage cover of protected, urban, and cultivated area, and the climate variables mean annual precipitation, winter temperatures, and summer temperatures. Further, we used the climate variables and plant biomes to define subsets of pentads representing environmental zones across South Africa, Lesotho, and Swaziland. For each environmental zone, we quantified sampling intensity, and we assessed sampling completeness with species accumulation curves fitted to the asymptotic Lomolino model. Sampling effort was highest close to sampling hubs, major roads, urban areas, and protected areas. Cultivated area and the climate variables were less important. Further, environmental zones were not evenly represented by current data, and the zones varied in the amount of sampling required to represent the species that are present. SABAP2 volunteers' preferences in birding locations cause spatial bias in the dataset that should be taken into account when analyzing these data. Large parts of South Africa remain underrepresented, which may restrict the kind of ecological questions that may be addressed. However, sampling bias may be reduced by directing volunteers toward undersampled regions while taking into account volunteer preferences.
Progress in macroecology, biogeography, and large‐scale conservation planning is enabled by a growing number of nonsystematically collected species distribution databases in the form of museum‐curated collections (specimen collections) and large‐scale species atlases (Robertson, Cumming, & Erasmus, 2010). Such databases, representing multiple taxa and large regional to subcontinental spatial scales, are increasing in scope (i.e., taxonomic and geographical) and detail (i.e., spatiotemporal resolution and types of information recorded). This development is aided by constant improvements in digital database management, accessibility (e.g., open source and Internet‐based data), and analysis (computing power and statistical techniques) (Boakes et al., 2010; Kelling et al., 2013). However, the adequate sampling of huge amounts of georeferenced species distribution data is a persistent challenge.
Specimen collections depend largely on professional scientists such as taxonomists, whereas species atlases, especially of conspicuous or charismatic taxa (e.g., birds or butterflies), are often organized as citizen science projects supported by hundreds of volunteer observers (Bird et al., 2014; Botts, Erasmus, & Alexander, 2011; Robertson et al., 2010; Tulloch & Szabo, 2012). Both specimen collections and species atlases tend to be inherently biased in terms of when and where contributors decide to sample (spatiotemporal bias) and the skill of contributors as data collectors (e.g., variation in identification and record keeping) (Bird et al., 2014; Boakes et al., 2010; Peterson, Navarro‐Sigüenza, & Benítez‐Díaz, 1998; Reddy & Dávalos, 2003; Robertson et al., 2010; Sastre & Lobo, 2009; Tulloch & Szabo, 2012). Several recent studies on spatial or geographical sampling bias show that sampling sites tend to be chosen based on accessibility, that is, traveling distance and ease of traveling (e.g., roads and terrain) to or within the sampling site, and on the attractiveness of a site for sampling, for example, the expectation of high biodiversity or of observing rare or charismatic species (Botts et al., 2011; Reddy & Dávalos, 2003; Romo, García‐Barros, & Lobo, 2006; Tulloch, Mustin, Possingham, Szabo, & Wilson, 2013). Citizen volunteers may also be motivated by aesthetic (e.g., scenic landscape features) and recreational factors (Tulloch et al., 2013). Consequently, a large proportion of samples originate from a small proportion of geographical space in and around residential and protected areas, whereas locations that are remote or believed to be low in biodiversity tend to be poorly sampled (Botts et al., 2011; Peterson et al., 1998; Sastre & Lobo, 2009).
If ignored, spatial sampling bias may result in distorted views of biodiversity, biogeography, and species distributions, with observed patterns of variation reflecting sampling effort rather than environmental or demographic causes (Bird et al., 2014; Botts et al., 2011; Evans, Greenwood, & Gaston, 2007). Species distribution databases are more useful if data are compiled with a standardized sampling protocol and include information about the observation process, for example, a measure of sampling effort for each record within the database (Bird et al., 2014; Guillera‐Arroita, 2017; Robertson et al., 2010). Further, species distribution databases may be designed with a variety of objectives, for example, whether sampling would attempt a wide coverage or whether sampling would be focused or stratified according to habitat or protected areas (Tulloch et al., 2013). Clear understanding of spatial sampling bias, survey objectives, and data types is essential, especially when considering that various species distribution databases, each with particular sampling methods and biases, are integrated and studied at a global scale (www.gbif.org; www.mol.org; Jetz, McPherson, & Guralnick, 2012).
Species distribution (Guisan & Zimmermann, 2000) or occupancy (Mackenzie et al., 2006) modeling techniques relate species distribution data to environmental covariates (e.g., spatial variation in climate and habitat type) to infer species spatial distributions. These techniques can account for variation in sampling effort, interpolate geographical “gaps” in the data, or predict the geographical locations that should be prioritized for additional sampling (Bird et al., 2014; Bled, Nichols, & Altwegg, 2013; Hernandez, Graham, Master, & Albert, 2006; Kramer‐Schadt et al., 2013; Phillips et al., 2009). However, these techniques are most reliable if based on repeated visits of sampling sites that represent the full range of variation in the environment (Araújo & Guisan, 2006; Bled et al., 2013; Hernandez et al., 2006; Phillips et al., 2009). Occupancy techniques, in particular, require multiple repeated visits to model the probability of detecting species that are present (Altwegg, Wheeler, & Erni, 2008; Bled et al., 2013; Broms, Hooten, Johnson, Altwegg, & Conquest, 2016; Guillera‐Arroita, 2017). Species detectability may vary due to several mechanisms, such as species traits, observer skill, survey methods and conditions, and habitat characteristics (Guillera‐Arroita, 2017). Species distribution and occupancy techniques are an actively developing field of research, and are widely and increasingly used to study species spatial distributions and range dynamics (Guillera‐Arroita, 2017; Guillera‐Arroita et al., 2015). These techniques benefit most from an environmentally stratified sampling design, rather than attempting to close geographical gaps by sampling as much area as possible but with low effort per unit area (Araújo & Guisan, 2006; Guillera‐Arroita, 2017; Kramer‐Schadt et al., 2013; Tulloch et al., 2013).
In South Africa, large‐scale species distribution databases facilitated a wealth of ecological research and conservation planning analyses (e.g., Harrison, Underhill, & Barnard, 2008), with historical and current databases including birds, frogs, mammals, butterflies, spiders, proteas, and invasive alien plants (find the host organizations at adu.org.za, www.proteaatlas.org.za and www.sanbi.org). The second Southern African Bird Atlas Project (SABAP2), which was launched in 2007, is arguably the most ambitious atlas project for the region in terms of scope, resolution, and data volume. Citizen scientists record bird species presence at a relatively fine resolution (grid cells of 5 min latitude by 5 min longitude, termed “pentads”) within eight sub‐Saharan African countries, namely South Africa, Lesotho, Swaziland, Namibia, Botswana, Zimbabwe, Mozambique, and Kenya. By the end of May 2017, nearly 2,300 observers had conducted nearly 187,000 separate surveys, contributing more than 9.6 million records and covering more than 17,700 pentads, and the rate of contributions remains high (http://sabap2.adu.org.za/). However, current SABAP2 data show obvious and substantial spatial bias in sampling effort. Repeatedly sampled pentads comprise a small proportion of the total area and tend to be spatially clustered, forming a few well‐sampled geographical regions. Conversely, large poorly sampled geographical areas remain outside these well‐sampled regions.
The second Southern African Bird Atlas Project is designed to run indefinitely with the aim of creating a valuable long‐term dataset for southern Africa. Thus, an assessment of sampling bias will provide much‐needed information for data users and future sampling endeavors, and ensure that volunteers' time and effort and their contributed data are used to full potential. Wright, Underhill, Keenec, and Knight (2015) previously studied the motivation of SABAP2 volunteers and the benefits they gain. However, a spatially explicit study of the possible causes and consequences of spatial sampling bias has not been conducted for SABAP2. Moreover, accounting for and improving observation bias contributes to developing species distribution data that are useful in global ecological studies. Similar evaluations of sampling bias could benefit other new or existing species atlases for many taxa around the world. Our aims are (1) to reveal spatially explicit determinants of variation in sampling effort in SABAP2 and (2) to illustrate variation in data representativeness among a variety of environments.
We focused on South Africa, Lesotho, and Swaziland where data are accumulating most rapidly and widely, and for which comprehensive environmental and human‐related GIS (geographical information system) datasets are available. The sampling protocol of SABAP2 was designed to standardize sampling by requesting that the volunteers record all the birds they encounter within a pentad for at least 2 hr (intensive sampling period), but no longer than five consecutive days, and that they attempt to cover all habitat types within the pentad. Volunteers are coordinated through regional atlas committees and the SABAP2 Web site (http://sabap2.adu.org.za/), which includes training materials (e.g., how to use GIS programmes and recognize pentad boundaries), workshops (e.g., bird identification), and birding events. The online submission process links records automatically to a coverage map and flags unusual (e.g., out of range) records that are then vetted by regional atlas committees. The SABAP2 database includes information on sampling effort for each pentad in terms of number of contributed species lists (i.e., one list per visit) and number of records (i.e., species sightings), as well as number of hours and days spent sampling per pentad.
Atlas data used in this study were contributed between June 2007 and the end of August 2016. For this period, about 75% of the pentads covering South Africa, Lesotho, and Swaziland were visited at least once (i.e., one or more lists contributed); however, <16% of pentads were sampled 10 times or more; that is, enough repeated visits to ensure that common species were detected with high probability, even with relatively low detectability (Guillera‐Arroita, Ridout, & Morgan, 2010). Spatial bias could be partly attributed to coordination efforts of the regional atlas committees for each province and the “birding challenges” that aim to intensively sample regions of special concern for bird biodiversity. Areas covered by birding challenges include Kruger National Park, Western Cape Province, and the four degrees latitude and longitude encompassing Gauteng and parts of the surrounding provinces (the Gauteng 4D birding challenge).
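The reasoning behind the ten-visit threshold can be illustrated with a short calculation. The Python sketch below is not part of the SABAP2 workflow; it assumes that visits are independent and that a species has the same per-visit detection probability on every visit, and the probabilities used are hypothetical values chosen for illustration only.

```python
def cumulative_detection(p_single: float, n_visits: int) -> float:
    """Probability of detecting a present species at least once in n_visits.

    Assumes independent visits with a constant per-visit detection
    probability p_single (a simplifying assumption, not taken from the paper).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie between 0 and 1")
    if n_visits < 0:
        raise ValueError("n_visits must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_visits


if __name__ == "__main__":
    # Hypothetical per-visit detection probabilities.
    for p in (0.1, 0.2, 0.5):
        one = cumulative_detection(p, 1)
        ten = cumulative_detection(p, 10)
        print(f"p = {p:.1f}: 1 visit -> {one:.3f}, 10 visits -> {ten:.3f}")
```

Under these assumptions, a species with a per-visit detection probability of 0.2 would be detected at least once in roughly 89% of pentads where it is present after ten visits, compared with 20% after a single visit.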
For this analysis, we investigated each South African province separately (Lesotho and Swaziland were not included) to account for the possible influence of the regional atlas committees and birding challenges, and to account for regional differences in level of human population density and development (Figure 1, Table S1). We investigated the entire four degrees comprising the Gauteng 4D challenge separately from the rest of the surrounding provinces to account for the increased sampling in this region (Figure 1, Table S1). We explored factors representing the accessibility (1–2) and attractiveness (3–5) of each pentad that may explain spatial variation in sampling effort.
Study area: (1) Limpopo Province, (2) North West Province, (3) the four degree square comprising the Gauteng 4D birding challenge, (4) Mpumalanga Province, (5) Swaziland, (6) Northern Cape Province, (7) Free State Province, (8) Lesotho, (9) KwaZulu‐Natal ...
We used linear regression to determine how well variables 1–5 explain spatial variation in sampling effort represented by number of lists per pentad. “Distance to sampling hub” was log‐transformed to ensure a linear relationship with the response variable, because the untransformed relationship is a distance‐decay function. Separate generalized linear models for each province included all the explanatory variables listed, to examine their relative importance. Collinearity among predictors was generally low and never severe enough to justify excluding any predictors (O'Brien, 2007; see Variance Inflation Factors in Table S2). The models were fitted via penalized quasi‐likelihood using function “glmmPQL” in package “MASS” version 7.3‐45 (Venables & Ripley, 2002) in program R (R Core Team, 2016). We assumed a Poisson distribution and included an exponential spatial correlation structure as a random variable in each model to account for spatial autocorrelation.
We examined how variation in sampling intensity and the ability to detect the species that are present (sampling completeness) reflect in both geographical and environmental space. This idea is based on the potential for species distribution and occupancy models to estimate species distributions based on patchy species presence records and environmental background data (Bird et al., 2014; Bled et al., 2013; Guisan & Thuiller, 2005; Kramer‐Schadt et al., 2013). Bird distributions are driven by climate and vegetation type (Acevedo & Currie, 2003; Boone & Krohn, 2000; Van Rensburg, Koleff, Gaston, & Chown, 2004). Therefore, we defined environmentally distinct zones by partitioning all the pentads comprising South Africa, Lesotho, and Swaziland, into subsets of pentads with similar environments in terms of climate and vegetation biomes (for similar methods, see Robertson & Barker, 2006; Botts et al., 2011; Tulloch & Szabo, 2012). We first simplified the three climatic variables with a principal components analysis (PCA). The first PCA scores were related to mean annual precipitation, mean summer temperature, and mean winter temperature, with factor loadings 0.703, −0.675, and 0.225, respectively (Figs. S1 and S2a). The mapped component scores (Fig. S2a) show the main gradient between hotter, drier areas in the northwest, and milder, wetter areas in the southeast, as well as more local‐scale variations, such as at mountain ranges (see also Botts et al., 2011; Robertson and Barker, 2006). Therefore, although this component explains only about 57% of the variation, we deemed it a useful representation of climatic variation for our purposes. We then grouped the pentads into ten climate zones based on a histogram of the first PCA scores, which generally ranged from hot and dry at class 1 to moist and mild at class 10 (Figures 2, S1 and S2a).
We defined 27 subsets of pentads to represent environmentally distinct zones. First, pentads were each assigned one of the seven biomes and one of the ten climate classes. The biome and climate classes were superimposed to form several climate zones within ...
Further, we assigned each pentad to the plant biome that covers the largest percentage of the pentad (Figures 2 and S2b). Mucina and Rutherford (2006) defined nine biomes, namely Desert, Nama Karoo, Succulent Karoo, Fynbos, Grassland, Savanna, Albany Thicket, the Indian Ocean Coastal Belt, and Forest (Fig. S2b). However, Forest and Desert comprised only a few pentads, and Mucina and Rutherford's (2006) Desert biome is mainly designated as Nama Karoo and Succulent Karoo in earlier vegetation maps for South Africa (e.g., Low & Rebelo, 1996; Rutherford, 1997; Rutherford & Westfall, 1994). Therefore, we assigned the Forest and Desert pentads to the closest neighboring biomes. Next, we superimposed the biomes and climate zones to define 43 distinct environmental zones of various geographical sizes (i.e., various numbers of pentads), representing several climate zones within each biome (Figure 2). That is, each biome was divided into several large (i.e., large number of pentads) climate zones that represent the typical climate range for that biome and several smaller zones that represent climate extremes for that biome.
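The assignment of pentads to environmental zones described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' GIS workflow: the pentad identifiers, cover percentages, and climate classes are hypothetical, and ties in biome cover are resolved arbitrarily (the first biome listed wins).

```python
from collections import Counter


def dominant_biome(cover: dict) -> str:
    """Return the biome with the largest percentage cover in one pentad.

    If two biomes tie, the one listed first is returned (an arbitrary choice).
    """
    return max(cover, key=cover.get)


def environmental_zone(cover: dict, climate_class: int) -> tuple:
    """Superimpose the dominant biome and the climate class (1 to 10)."""
    return (dominant_biome(cover), climate_class)


# Hypothetical pentads: (percentage cover per biome, climate class).
pentads = {
    "P1": ({"Fynbos": 70.0, "Succulent Karoo": 30.0}, 3),
    "P2": ({"Fynbos": 55.0, "Albany Thicket": 45.0}, 3),
    "P3": ({"Savanna": 90.0, "Grassland": 10.0}, 8),
}

zones = {pid: environmental_zone(cover, cls) for pid, (cover, cls) in pentads.items()}
zone_sizes = Counter(zones.values())  # number of pentads per environmental zone

if __name__ == "__main__":
    for zone, size in zone_sizes.items():
        print(zone, size)
```

Counting pentads per zone in this way gives the zone sizes that would later determine which small zones need to be pooled.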
Next, to quantify variation in sampling effort for each environmental zone, we counted all of the pentads with at least one list (i.e., total geographical coverage) as well as the pentads with ten lists or more (i.e., repeated samples necessary to model the observation process, Guillera‐Arroita et al., 2010). To examine sampling bias among environmental zones, we conducted G‐tests of independence comparing all pentads with sampled pentads, for both levels of sampling intensity, that is, at least one list and at least ten lists. To ensure that expected frequencies are above 5% in the G‐test, we pooled the smallest similar environmental zones within each biome to increase the number of pentads (Figure 2). This final process combining the smallest zones resulted in 27 environmental zones that were used for all further analyses (Figures 2 and S3).
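One common way to set up such a comparison is to compute the G statistic, G = 2 Σ O ln(O/E), from the observed number of sampled pentads per zone (O) and the number expected if sampled pentads were distributed in proportion to zone size (E). The Python sketch below follows that formulation; it is an assumption about the setup rather than the authors' exact procedure, and the zone counts are hypothetical. Converting G to a p-value requires the chi-square distribution with (number of zones − 1) degrees of freedom, which is not implemented here.

```python
import math


def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)); zones with O == 0 contribute nothing."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if o > 0:
            total += o * math.log(o / e)
    return 2.0 * total


# Hypothetical example with three zones.
zone_pentads = [100, 200, 700]    # all pentads in each zone
sampled_pentads = [40, 60, 100]   # pentads with at least one list

total_pentads = sum(zone_pentads)
total_sampled = sum(sampled_pentads)

# Expected counts if sampling were proportional to zone size.
expected = [n * total_sampled / total_pentads for n in zone_pentads]

g = g_statistic(sampled_pentads, expected)
degrees_of_freedom = len(zone_pentads) - 1

if __name__ == "__main__":
    print(f"G = {g:.3f} with {degrees_of_freedom} degrees of freedom")
```

A large G relative to the chi-square distribution indicates that sampled pentads are not spread across zones in proportion to zone size.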
We ranked the 27 zones according to sampling effort by calculating the difference between observed and expected frequency, where expected frequency is the number of lists that would have been contributed for each zone if sampling effort was geographically homogeneous. Expected frequencies were calculated for both levels of sampling intensity, using the following formula (see also Tulloch & Szabo, 2012): expected frequency = (number of pentads comprising an environmental zone ÷ total number of pentads) × total number of sampled pentads.
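The formula above and the resulting ranking can be written out as a minimal Python sketch. The zone names and counts are hypothetical, and the function and variable names are illustrative rather than taken from the study.

```python
def expected_frequency(zone_pentads: int, total_pentads: int, total_sampled: int) -> float:
    """(pentads in zone / total pentads) * total number of sampled pentads."""
    return zone_pentads * total_sampled / total_pentads


# Hypothetical zones: name -> (pentads in zone, sampled pentads in zone).
zone_counts = {
    "arid": (700, 100),
    "mesic": (200, 60),
    "coastal": (100, 50),
}

total_pentads = sum(n for n, _ in zone_counts.values())
total_sampled = sum(s for _, s in zone_counts.values())

# Positive values: sampled more than expected; negative: sampled less.
deviation = {
    name: sampled - expected_frequency(n, total_pentads, total_sampled)
    for name, (n, sampled) in zone_counts.items()
}

# Rank zones from most oversampled to most undersampled.
ranking = sorted(deviation, key=deviation.get, reverse=True)

if __name__ == "__main__":
    for name in ranking:
        print(f"{name}: observed - expected = {deviation[name]:+.1f}")
```

In this toy example the large arid zone falls well below its expected share, mirroring the pattern the ranking is meant to expose.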
Assuming that number of species recorded would increase with number of pentads sampled (the species–area relationship), we used species accumulation curves to assess sampling completeness for each zone (see Moreno & Halffter, 2000; Tulloch & Szabo, 2012), for both levels of sampling intensity. For each zone, we calculated Mao Tau species richness estimates (R package “vegan,” version 2.4‐0, Oksanen et al., 2016), that is, a smoothed species accumulation curve produced by adding the pentads in random order (i.e., the average curve of 1,000 runs). We then fitted the Mao Tau estimates to an asymptotic Lomolino curve to estimate the total species richness (i.e., the asymptote) for each environmental zone (R package “vegan”; Lomolino, 2000; Dengler, 2009; Oksanen et al., 2016). We also tested the Clench and Weibull models (Hortal, Borges, & Gaspar, 2006; Moreno & Halffter, 2000; Tulloch & Szabo, 2012); however, the Lomolino model performed best in terms of fit and robustness. We then ranked the environmental zones according to sampling completeness by dividing each zone's observed species richness by the total estimated species richness, giving a percentage of completeness of the species inventory for that zone.
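The idea of a smoothed accumulation curve and a completeness percentage can be sketched in Python as follows. This is not the “vegan” implementation used in the study: it simply averages the cumulative species count over random orderings of pentads, and it does not fit the Lomolino model. The species lists and the estimated richness passed to `completeness` are hypothetical.

```python
import random


def accumulation_curve(pentad_species, n_runs=1000, seed=0):
    """Mean cumulative species count as pentads are added in random order.

    pentad_species is a list of sets, one set of species per pentad.
    """
    rng = random.Random(seed)
    order = list(pentad_species)          # copy so the input is not shuffled
    totals = [0.0] * len(order)
    for _ in range(n_runs):
        rng.shuffle(order)
        seen = set()
        for i, species in enumerate(order):
            seen |= species               # add species recorded in this pentad
            totals[i] += len(seen)
    return [t / n_runs for t in totals]


def completeness(s_obs: float, s_est: float) -> float:
    """Observed richness as a percentage of estimated total richness."""
    return 100.0 * s_obs / s_est


# Hypothetical species lists for four pentads in one zone.
zone = [{"a", "b"}, {"b", "c"}, {"c", "d", "e"}, {"a"}]
curve = accumulation_curve(zone)

if __name__ == "__main__":
    print([round(x, 2) for x in curve])
    # 6.0 stands in for an asymptote obtained from a fitted curve.
    print(f"{completeness(curve[-1], 6.0):.1f}% complete")
```

In practice the estimated richness would come from the asymptote of a fitted curve such as the Lomolino model, which requires a nonlinear fitting routine not shown here.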
The variables that best explained spatial variation in sampling effort varied somewhat among the provinces (Table 1, see Table S3 for more detailed results). Nevertheless, for most provinces sampling effort was significantly negatively related to distance to sampling hub (except for Limpopo and Northern Cape provinces) and distance to major road (except for Gauteng 4D and Mpumalanga Province), and significantly positively related to protected area cover (all provinces) and urban cover (except for Mpumalanga and North West provinces) (Table 1). Cultivated area, mean annual precipitation, mean summer temperature, and mean winter temperature were less important explanatory variables, being significant in only a few provinces (Table 1).
Possible determinants of spatial variation in sampling effort in each province of South Africa. Distance to sampling hub and distance to major road represent accessibility, whereas the other variables describe the characteristics of grid cells. T‐values ...
The 27 environmental zones were not equally represented by pentads for both levels of sampling intensity (≥1 lists: G = 579.088, p < .0001; ≥10 lists: G = 1765.687, p < .0001; 26 degrees of freedom). For separate biomes, only the climate zones within the Albany Thicket, Fynbos, Indian Ocean Coastal Belt, and Succulent Karoo were evenly represented by pentads with at least one list, whereas only the Indian Ocean Coastal Belt's climate zones were evenly represented by pentads with ten lists or more (Figures 3 and 5; see also Fig. S4 for more details). The climate zones within the Grassland, Indian Ocean Coastal Belt, Fynbos, and Albany Thicket, and the wetter zones within the Savanna and Succulent Karoo have been especially well covered (more than 70% of pentads have been sampled at least once), with a substantial proportion of these pentads having ten or more lists (Figures 3 and 5, Table S4). However, the Nama Karoo's climate zones and the driest zones of the Succulent Karoo and Savanna are less well covered, and a negligible number of these pentads (fewer than 5% of pentads) have ten or more lists (Figures 3 and 5).
A comparison between the total number of pentads, the number of pentads that had been sampled at least once, and the number of pentads sampled at least ten times. This is shown for distinct environmental zones (subsets with varying numbers of pentads) ...
The recorded species inventories for most of the environmental zones are more than 80% complete for both levels of sampling intensity, when comparing observed species richness to total estimated species richness given by the asymptote of the species accumulation curves (Figures 4 and S5, Table S4). Environmental zones were ranked differently (Figures 5, S4 and S5) when considering sampling completeness (i.e., species accumulation curves) compared to sampling effort (i.e., observed vs. expected sampling effort). For example, although the arid Savanna Zone is poorly sampled in terms of sampling effort, the species inventory is more than 87% complete because fewer species occur there (Figures 3, 4, and 5). Conversely, climate zone 3 of the Fynbos biome is well sampled; however, its species inventory is <72% complete (Figures 3, 4, and 5).
A comparison of observed, S(obs), and estimated, S(est), species richness for pentads that had been sampled at least once and at least ten times. This comparison was made for distinct environmental zones comprising seven biomes, (a) Albany Thicket, (b) ...
Environmental zones were ranked according to (1) sampling effort (a and c), that is, whether the zone had been sampled more or less than expected (number of lists contributed, see figure key) given the size of the zone and the overall number of lists ...
Volunteers are indispensable to the development of species atlases given the sheer magnitude of their contributed data and associated time, labor, and costs (Robertson et al., 2010; Tulloch et al., 2013). The second Southern African Bird Atlas Project covers an extensive geographical area, with large amounts of data especially for several subregions that are of special concern for bird diversity and conservation. However, like other species atlases (e.g., Botts et al., 2011; Tulloch & Szabo, 2012) SABAP2 is subject to pronounced spatial sampling bias, due to purposefully focused sampling in regions of special concern and due to the preferences of volunteers for certain sampling sites. Here we explored the causes and consequences of spatial variation in sampling effort, and we discuss current trends, strategies, and tools to mitigate bias and to improve the data accumulation process in species distribution atlases.
We found that variation in sampling effort is generally best explained by amount of urban area and protected area, and by the proximity of major roads, cities, and towns known for ecotourism (i.e., the “sampling hubs”). These findings agree with previous studies examining variation in sampling effort, including butterflies in the Iberian Peninsula (Romo et al., 2006), frogs in South Africa (Botts et al., 2011), and birds in Australia (Tulloch et al., 2013). In the current study, the importance of these determinants varied among the provinces (Table 1). For example, distance to nearest major road is not significant in the Gauteng 4D region, probably because of the relatively good road access (Figure 1, Tables 1 and S1). In contrast, distance to major road is important in the Northern Cape Province where the lack of major roads could restrict the movements of volunteers (Figure 1; Tables 1 and S1). Based on the overall results, we reason that many volunteers are likely resident in major cities and regularly conduct sampling in their own neighborhood and surroundings. When volunteers sample some distance from home, they prefer easy road access to a preferred destination such as a protected area or ecotourism town where they expect good birding opportunities. For the other southern African countries that we have not examined here, most areas remain unsampled and sampling appears to be closely linked to cities, towns, roads, and other developed areas, or popular tourism destinations, more so than for South Africa (http://sabap2.adu.org.za/coverage.php, as viewed on 31 May 2017).
Spatial sampling bias affects how well the available data represent geographical and environmental space (Bird et al., 2014). We found that sampling coverage and intensity in current SABAP2 data are unequal among a set of distinct environmental zones across South Africa, Lesotho, and Swaziland. Large arid zones tend to be characterized by low sampling effort, unsampled gaps, and a small proportion of pentads with ten or more lists. Similar patterns were reported for other species distribution datasets in southern Africa (e.g., frogs, Botts et al., 2011; the first SABAP, Harrison & Underhill, 1997; plants, Robertson & Barker, 2006). The arid zones may be less attractive to volunteers due to expected low species richness and low accessibility to remote locations or private property (e.g., the mostly arid Northern Cape Province, Figure 1 and Tables 1 and S1; Tulloch et al., 2013). In contrast, wetter, milder environmental zones tend to coincide with the more densely populated areas of South Africa and have therefore been sampled more intensively, with a greater area covered and larger proportion of repeatedly sampled pentads. These well‐sampled zones also tend to be smaller compared to the arid zones, suggesting a higher turnover in environmental conditions across a smaller geographical area. Therefore, intensive sampling in these environmental zones may be beneficial and necessary to detect a higher species turnover and higher overall species richness that is often linked to environmental heterogeneity (Botts et al., 2011; Robertson & Barker, 2006; Van Rensburg et al., 2004). This is supported given that zones with high species richness and low species detectability may require a greater sampling effort (Figure 5; Garrard, Bekessy, Mccarthy, & Wintle, 2008; Wintle, Walshe, Parris, & Mccarthy, 2012).
Survey designs contend with a trade‐off between wider coverage of a geographical area and repeated sampling of representative sampling sites, depending on the objectives and the amount of sampling effort possible. SABAP2 currently comprises both wide‐coverage low‐intensity data and high‐intensity data from repeated sampling that are limited to certain geographical regions, biomes, and climates. Spatial and environmental sampling bias may have several consequences in terms of how well the spatial bias can be mitigated through data processing, how statistical and modeling techniques may be affected, and the type of ecological questions that can be adequately addressed (Bird et al., 2014; Guillera‐Arroita et al., 2015; Peterson et al., 1998). Therefore, researchers and conservation planners need to be aware of the region‐specific limitations of the data.
An environmental bias may affect the accuracy of species distribution and occupancy models that rely on environmental background data (Araújo & Guisan, 2006; Bird et al., 2014; Bled et al., 2013; Hernandez et al., 2006; Phillips et al., 2009). Wide‐coverage low‐intensity data are often used in broad‐scale species distribution modeling (Guillera‐Arroita et al., 2015). In addition, some species may not be present in the proportion of geographical and environmental space that had been repeatedly sampled, although they are likely to be observed through a wide‐coverage strategy covering a greater geographical area (Figure 4, Table S4). However, repeated sampling increases the probability of detecting the species that are present (Gu & Swihart, 2004). Moreover, sufficient repeated sampling is necessary to model the observation process (occupancy modeling), obtain abundance estimates, and examine species range dynamics (Bled et al., 2013; Broms et al., 2016; Guillera‐Arroita et al., 2015). Occupancy modeling can be refined with information about species' probability of detection (Guillera‐Arroita, 2017). Therefore, it would be useful to examine whether variation in detectability is predictable or quantifiable (Gu & Swihart, 2004), perhaps depending on environmental covariates (e.g., restricted visibility due to dense vegetation) or species traits (e.g., coexistence of species that are difficult to distinguish, or cryptic or nocturnal species).
Spatial biases in sampling effort may affect the conservation decision‐making process. For example, sampling bias in favor of regions with dense human populations may exaggerate any existing broad‐scale positive correlation between humans and bird species richness (Chown, Van Rensburg, Gaston, Rodrigues, & Van Jaarsveld, 2003; Evans et al., 2007; Van Rensburg et al., 2004). Consequently, conservation planning efforts often emphasize areas with high biodiversity near human settlements where there may be stronger competition between conservation goals and human development, while neglecting poorly sampled remote locations that may have high conservation potential (Evans et al., 2007). Further, an inability to account for the observation process could confound spatial changes in sampling effort with species range changes, in turn misrepresenting the species' conservation status (Broms, Johnson, Altwegg, & Conquest, 2014; Guillera‐Arroita et al., 2015; Péron & Altwegg, 2015).
The current study focused on natural environmental variation; however, future studies could examine sampling bias among land cover types. For example, relatively pristine and remote environmental zones might be underrepresented if data are mainly collected from the transformed areas within these zones, especially if species composition differs from the nearby natural environment (Dean, Anderson, Milton, & Anderson, 2002). For SABAP2, sampling effort in the arid zones tends to be close to human settlements and along roads (e.g., the Northern Cape Province, Table 1), that is, habitats that are atypical of the relatively untransformed arid zones. Over the past few decades, bird species such as pied crows (Corvus albus) that are native to more mesic areas of South Africa expanded their ranges into the arid areas, where they are associated with transformed areas and woody alien plants (Cunningham, Madden, Barnard, & Amar, 2016; Dean, 2000; Dean & Milton, 2003; Macdonald, 1986; Macdonald, Richardson, & Powrie, 1986). Increasing sampling in natural habitat may address this bias, and it may be helpful to incorporate land use as a covariate in species distribution models (Thuiller, Araújo, & Lavorel, 2004).
Recent developments in statistical methods provide many options for mitigating observation bias. However, ultimately, sampling bias should be actively monitored and addressed in all new or existing species distribution atlases. Wright et al. (2015) showed that SABAP2 volunteers are motivated by experiencing nature, recreation, personal growth, and the opportunity to contribute toward research and conservation. Communication and coordination among all participants are necessary to address sampling bias without sacrificing volunteer satisfaction and contribution (Bird et al., 2014; Sastre & Lobo, 2009). Atlas organizers play an essential role in maintaining volunteer participation by organizing a variety of birding events and challenges and supporting the online volunteer community (http://sabap2.adu.org.za). However, the link between volunteers and species atlases is becoming increasingly automatic and interactive. SABAP2 and other atlases such as eBird (http://ebird.org) link the online record submission process with tools to improve data accumulation (Kelling et al., 2013). Doubtful records, such as observing a bird species out of its known range, are automatically flagged during the submission process for vetting by experts (e.g., regional atlas committees). Online submissions are automatically linked to sampling effort coverage maps on the atlas Web sites, to inform volunteers' future sampling efforts. Species atlases and other citizen science projects benefit from increasingly sophisticated machine learning algorithms to facilitate the interaction between databases and volunteers (Kelling et al., 2013). Additional tools can be added to enhance the data accumulation process. For example, for an environmentally stratified sampling protocol, occupancy modeling could be applied to existing data to model the observation and detection processes and identify sampling sites to prioritize for additional sampling (Williams et al., 2009). 
Further, spatially predictable volunteer preferences could be taken into account when creating sampling coverage maps to encourage volunteers to visit priority sampling areas (current study, Tulloch et al., 2013). Thus, species atlasing is moving toward an iterative process whereby current data inform future priority sampling areas, and data accumulation is continually improved (Kelling et al., 2013).
This study was funded by the University of Cape Town and an Innovation postdoctoral fellowship and grants (88121, 85802, 81685 and 73912) from the National Research Foundation of South Africa. Thanks to two anonymous reviewers for their helpful comments on a previous draft of the manuscript. The NRF accepts no liability for opinions, findings, and conclusions or recommendations expressed in this publication.
Hugo S, Altwegg R. The second Southern African Bird Atlas Project: Causes and consequences of geographical sampling bias. Ecol Evol. 2017;7:6839–6849. https://doi.org/10.1002/ece3.3228