The spatial distribution of the Duffy blood group variants has been of interest since the blood group's discovery 60 years ago because of its link to the pathology of both infectious and non-communicable diseases, most notably
P. vivax infection. We have assembled an up-to-date database of Duffy phenotypic and genotypic data, from which we identified 821 geographically unique community surveys, and developed a geostatistical model to generate global frequency maps for the main Duffy alleles, as well as the first map of the Duffy-negative phenotype. These refined maps and associated uncertainty measures allow both an assessment of the quality and distribution of existing data and a discussion of how the maps may help direct further research into the interactions between Duffy negativity and
P. vivax malaria. A detailed comparison with the existing maps from Cavalli-Sforza
et al.
28 is presented in the
Supplementary Discussion and
Supplementary Figures S8-S10.
The summary median maps presented reveal relatively smooth global-scale patterns of geographic differentiation among populations. Although it is considered the ancestral allele
34, the
FY*B allele shows a remarkably restricted distribution and frequency in our maps, with highest prevalence found in Europe and parts of the Americas, and further patches of increased prevalence in areas buffering the region of
FY*BES predominance in sub-Saharan Africa. Frequencies of
FY*A increase with distance from Africa and Europe, with the allele becoming dominant across south-east Asia, including those areas where
P. vivax endemicity is highest
17. Although the
FY*BES allele map predicts presence outside the African continent and the Arabian Peninsula, its frequencies remain too low for the Duffy-negative phenotype frequencies to exceed 10%. Although these static contemporary representations of allelic frequencies cannot alone be interpreted to advance current speculation regarding the causative mechanisms of selection of the high frequencies of the
FY*BES allele
4,
5,
35,
36, the Duffy negativity map does visually reflect the historical areas of malaria transmission, as defined by Lysenko's pre-control era malaria map
37 (
Supplementary Fig. S4, recently republished by Piel
et al.
38).
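The relationship between FY*BES allele frequency and the 10% Duffy-negative phenotype threshold discussed above can be illustrated with a minimal sketch. It assumes Hardy-Weinberg proportions, which is an assumption of this example and is not stated in this section, and treats the Fy(a−b−) phenotype as arising only from FY*BES homozygotes. The function names are hypothetical.

```python
import math


def duffy_negative_frequency(fy_bes_freq: float) -> float:
    """Expected Fy(a-b-) phenotype frequency for a given FY*BES allele frequency.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch), so the
    Duffy-negative phenotype frequency is the FY*BES homozygote frequency.
    """
    if not 0.0 <= fy_bes_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return fy_bes_freq ** 2


def min_allele_freq_for_phenotype(phenotype_freq: float) -> float:
    """Smallest FY*BES allele frequency giving at least the stated phenotype frequency."""
    if not 0.0 <= phenotype_freq <= 1.0:
        raise ValueError("phenotype frequency must lie in [0, 1]")
    return math.sqrt(phenotype_freq)


# Allele frequency needed for Duffy negativity to reach 10% under these assumptions.
threshold = min_allele_freq_for_phenotype(0.10)
print(f"FY*BES frequency needed for 10% Duffy negativity: {threshold:.3f}")
```

Under these assumptions, an FY*BES allele frequency of roughly 0.32 would be needed before Duffy negativity reaches 10%, which is consistent with low but non-zero allele frequencies producing very low phenotype frequencies.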
A major challenge in this study was synthesizing the results of surveys, which used a range of diagnostic methods with potentially different reliabilities, particularly between genotyping and phenotyping methods. The possible influence of such variability on the model input is reviewed in detail in the
Supplementary Methods, but is not considered to have a major influence on the final output. By categorizing results into five data types () and developing a versatile geostatistical model, we were able to draw information from the differing data types in our full data set to generate all allele frequency maps simultaneously. The
Genotype data, generated from molecular diagnostic methods only widely available after the previous maps
28 were published, were most informative for the model. Despite a generally good global spread of survey data points (), the uncertainty maps allow identification of areas where additional data would have the proportionally greatest impact on our understanding of the distributions. Both the quality (data type) and quantity (data distribution) of the data affect the uncertainty measures. Uncertainty is increased by both scarcity of input data (exemplified across the Arabian Peninsula where only
Phenotype-a data were available) and heterogeneity (characteristic of the Americas where populations of diverse origins coexist; and ). In contrast, areas of lowest uncertainty match data-rich regions and areas of near-fixation, illustrated by the hatched areas of 95% confidence in the prediction shown in . Scarcity of input data also leaves us uncertain about possible fine-scale variation of allelic heterogeneity. This is demonstrated by the relatively high uncertainty in the predictions of the patchily distributed
FY*BES allele across the Americas, where spatial heterogeneity is expected to be high and perhaps not fully represented by the data set. As well as improving reliability in the current predictions, additional molecularly diagnosed data would allow refinements of the model to include additional polymorphic variants, such as the low-frequency weak
FY*X variant
39. This is discussed in detail in the
Supplementary Discussion.
Reflecting the growing appreciation of
P. vivax's public health significance and the realization that it is not 'benign'
16,
40,
41, the parasite's relationship with the Duffy receptor is the primary focus of contemporary studies of the Duffy antigen. However, two lines of evidence, from both a community and an individual standpoint, support the need for further research into the Duffy–parasite association. First, contrary to expectation, there is evidence of
P. vivax transmission in areas mapped with the highest Duffy negativity frequencies. Although widespread surveys have failed to identify the parasite in this region (including a continent-wide survey by Culleton
et al.
42, and the data set of community parasite rate surveys displayed in ), reports of infected mosquitoes
13, travellers
17 and exposed individuals
43 suggest low-level transmission. Across this predominantly Duffy-negative region, very low numbers of Duffy-positive individuals were identified (0.6% of individuals in 123 surveys across the 98–100% Fy(a−b−) region;
Supplementary Table S3). To see whether these two observations can be reconciled to explain transmission, mathematical modelling is needed to estimate the basic reproductive number (R0) of
P. vivax (as done for
Plasmodium falciparum 44) to help assess whether the very low predicted frequencies of susceptible Duffy-positive hosts could sustain transmission in populations mapped as predominantly Duffy negative.
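As a rough illustration of the question posed above, the following sketch computes how large the basic reproductive number would need to be for transmission to persist when only a small fraction of hosts is susceptible. It is a deliberately simplified threshold calculation and not the modelling the text calls for: it assumes homogeneous mixing, that Duffy-negative hosts are fully refractory, and that the effective reproductive number scales linearly with the susceptible fraction. The function names are hypothetical; only the 0.6% figure comes from the text.

```python
def effective_r(r0: float, susceptible_fraction: float) -> float:
    """Effective reproductive number when only a fraction of hosts is susceptible.

    Simplifying assumption of this sketch: transmission potential scales
    linearly with the susceptible (Duffy-positive) fraction.
    """
    if not 0.0 <= susceptible_fraction <= 1.0:
        raise ValueError("susceptible fraction must lie in [0, 1]")
    return r0 * susceptible_fraction


def critical_r0(susceptible_fraction: float) -> float:
    """Basic reproductive number at which the effective value equals 1."""
    if not 0.0 < susceptible_fraction <= 1.0:
        raise ValueError("susceptible fraction must lie in (0, 1]")
    return 1.0 / susceptible_fraction


# 0.6% Duffy-positive individuals, as reported for the 98-100% Fy(a-b-) region.
duffy_positive_fraction = 0.006
r0_needed = critical_r0(duffy_positive_fraction)
print(f"R0 needed to sustain transmission: {r0_needed:.0f}")
```

Under these assumptions the basic reproductive number would have to exceed roughly 167 for transmission to be sustained by Duffy-positive hosts alone. A fuller model would need to relax the assumption of complete refractoriness and account for vector dynamics and spatial heterogeneity.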
Second, from areas mapped with high Duffy phenotypic heterogeneity,
P. vivax infections have been identified in Duffy-negative hosts (in Madagascar
15 and Brazil
14). If this phenomenon of infected Fy(a−b−) individuals is associated with local Duffy heterogeneity, as hypothesized by Ménard
et al.
15, the Duffy maps presented here could be used to target further studies in other heterogeneous
P. vivax endemic areas
17, including southern Africa, Ethiopia, southern Sudan and pockets of the Brazilian and Colombian coasts. Investigation of
P. vivax transmission in these areas particularly, but also across regions with a spectrum of characteristic Duffy phenotypes, could provide vital public health insights into
P. vivax populations at risk, particularly when coupled with host-level data on Duffy types.
In this era of increasing concern about the
P. vivax parasite, we believe that a contemporary spatial description of the prevalence of the Duffy antigen receptor is essential for optimizing our understanding of the parasite's clinical burden. The geopositioned database and maps represent a new effort to document the spatial characteristics of a fundamental biomedical trait implicated in haematological and other clinical contexts. The versatile geostatistical model developed was adapted to a multiple-locus trait, informed by a range of input data types to generate a suite of output products. Such methods are uncommonly used by the genetics community, but we believe they could have an important role in the current era of large-scale spatial genomic analyses. Although we present a cartographic suite which we believe constitutes a significant improvement on previously published attempts
28 (see
Supplementary Discussion and
Supplementary Figs S8-S10), this study highlights limitations to our current knowledge of the Duffy blood group: both in terms of the scarcity of data from many areas, and in relation to the
P. vivax invasion pathway. All collated data and model code will be made openly accessible.