African-Americans suffer from kidney failure at higher rates than individuals without recent African ancestry(1). Genetic variation at a locus in or near the MYH9 gene on chromosome 22 has been associated with the increased susceptibility to focal segmental glomerulosclerosis (FSGS), HIV-associated nephropathy, and hypertension-attributed end-stage kidney disease (H-ESKD) observed in African-Americans(4), but thus far causal mutations in MYH9 have not been identified(6).
Previous genome-wide analyses have shown a strong signal of natural selection in the region containing the MYH9 and APOL genes (iHS data available at http://hgdp.uchicago.edu/). This observation led us to hypothesize that the kidney disease risk alleles might lie in a larger interval than originally thought(4). The long-range linkage disequilibrium (LD) associated with variants undergoing selection suggests that a positively selected risk variant could lie in a larger interval containing the APOL genes rather than being confined to MYH9. Because the risk allele(s) are likely to be common in people with African ancestry, we reasoned that they would be present in the data from the African individuals whose DNA was sequenced in the 1000 Genomes Project (www.1000genomes.org). We therefore used these newly available sequence data to identify polymorphisms within the expanded risk interval that show large frequency differences between Africans and Europeans, and tested these polymorphisms for association with renal disease.
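The screening step described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the genotype coding (0, 1, or 2 copies of the derived allele), the dictionary layout, the function names, and the `min_delta` threshold are all assumptions, since the text does not state which statistic or cutoff defined a "large" frequency difference.

```python
def allele_frequency(genotypes):
    """Derived-allele frequency from diploid genotypes coded 0, 1, 2 (None = missing)."""
    called = [g for g in genotypes if g is not None]
    return sum(called) / (2 * len(called))

def differentiated_variants(afr, eur, min_delta):
    """Variants whose derived-allele frequency differs by at least min_delta.

    afr, eur: hypothetical dicts mapping variant ID -> list of genotypes
    for African and European samples, respectively.
    Returns (variant ID, absolute frequency difference) pairs, largest first.
    """
    hits = []
    for vid in afr.keys() & eur.keys():
        delta = abs(allele_frequency(afr[vid]) - allele_frequency(eur[vid]))
        if delta >= min_delta:
            hits.append((vid, delta))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Variants passing such a filter would then be carried forward to the case-control association tests.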
We performed an initial association analysis comparing 205 African-Americans with biopsy-proven FSGS but no family history of FSGS with 180 African-American controls. The strongest genetic associations with FSGS were clustered in a 10-kb region in the last exon of APOL1, the gene encoding apolipoprotein L-1 (Table S1). The strongest signal was obtained for a two-allele haplotype, termed “G1”, consisting of two derived non-synonymous coding variants in the last exon of APOL1: rs73885319 (S342G) and rs60910145 (I384M). These two alleles were in perfect LD (r² = 1). The G1 haplotype (342G:384M) has a frequency of 52% in FSGS cases and 18% in controls (p = 1.07 × 10⁻²³).
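The Figure 1 caption notes that the FSGS case-control comparison used Fisher's exact test. A minimal pure-Python sketch of the two-sided test for a 2×2 table follows. It assumes an allelic table (risk versus non-risk chromosomes in cases and in controls, so 410 case chromosomes for 205 cases), which is an assumption about how the counts were arranged; in practice a library routine such as `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be cases/controls and columns risk/non-risk chromosomes.
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance keeps tables tied with the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
```

For the classic table [[3, 1], [1, 3]] this returns 34/70 ≈ 0.486.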
Figure 1. Association analysis in FSGS cohorts with logistic regression for alleles G1 and G2. (A) Results of association between 205 idiopathic biopsy-proven African-American FSGS cases and 180 African-American controls using Fisher’s exact test.
Number and frequencies of APOL1 genotypes and alleles in FSGS and hypertension-attributed ESKD cases and controls.
When we performed logistic regression to control for G1, we identified a second strong signal (Table S1; p = 4.38 × 10⁻⁷). This second signal is a 6-base-pair deletion (rs71785313, termed “G2”) close to G1 in APOL1 that removes amino acids N388 and Y389. Because rs73885319, rs60910145, and rs71785313 lie so close together, recombination between them is very unlikely, and G1 and G2 are mutually exclusive: they do not occur on the same chromosome. Allele G2 has a frequency of 23% in FSGS cases and 15% in controls.
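The conditional analysis described above (adding G1 as a covariate and asking whether another variant still improves the fit) can be sketched as a likelihood-ratio test between nested logistic regressions. This is a minimal illustration, assuming additive allele-dosage coding (0, 1, 2) and no covariates other than the conditioning variants; the text does not state the coding, covariates, or test statistic the authors used, and the function names are hypothetical. Requires NumPy.

```python
from math import erfc, sqrt

import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    X: (n, k) design matrix that already includes an intercept column.
    y: (n,) array of 0/1 outcomes (1 = case). Assumes no complete separation.
    Returns (coefficients, log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik

def conditional_lrt(y, candidate, covariates):
    """LRT statistic for `candidate` after adjusting for `covariates`.

    candidate: (n,) dosage of the variant being tested.
    covariates: (n,) or (n, m) dosages of variants conditioned on (e.g. G1).
    """
    n = len(y)
    base = np.column_stack([np.ones(n), covariates])
    full = np.column_stack([base, candidate])
    _, ll0 = fit_logistic(base, y)
    _, ll1 = fit_logistic(full, y)
    return 2.0 * (ll1 - ll0)

def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(max(stat, 0.0) / 2.0))
```

A variant that still yields a small p-value after conditioning, as G2 did after conditioning on G1, carries association signal not explained by the conditioning variants.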
After performing regressions controlling for both G1 and G2, we observed no other significant associations (Table S1). In contrast, controlling for multiple sets of variants in MYH9 failed to eliminate the APOL1 signal. The LD patterns in this region show that G1 and G2 are in strong LD with variants in MYH9 (Figs S1 and S2). In particular, the MYH9 E-1 haplotype, the best predictor of renal disease in previous studies, is present on 89% of haplotypes carrying G1 and on 76% of haplotypes carrying G2, which explains the previously reported association of MYH9 E-1 with renal disease.
Haplotype frequencies for FSGS cases and controls are reported in the table of APOL1 genotypes and alleles. No difference in FSGS risk was seen when comparing subjects with no risk allele to subjects with one risk allele (G1 or G2; p = 0.81, OR = 1.04, CI 0.63–2.13). Comparing subjects with zero or one risk allele to subjects with two risk alleles gave an odds ratio for FSGS of 10.5 (CI 6.0–18.4). These results support a completely recessive mode of inheritance.
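The recessive comparison in the preceding paragraph collapses genotypes into "two risk alleles" versus "zero or one". A minimal sketch of that calculation follows, assuming an unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval; the text does not say how its intervals were computed, and any genotype counts passed to these hypothetical helpers are illustrative only.

```python
from math import exp, log, sqrt

def odds_ratio_ci(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval from a 2x2 table."""
    or_ = (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)
    se = sqrt(1 / exposed_cases + 1 / exposed_controls
              + 1 / unexposed_cases + 1 / unexposed_controls)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

def recessive_or(cases_by_genotype, controls_by_genotype):
    """OR for two risk alleles versus zero or one.

    Each argument holds counts indexed by the number of risk alleles (0, 1, 2),
    where G1 and G2 both count as risk alleles.
    """
    c0, c1, c2 = cases_by_genotype
    k0, k1, k2 = controls_by_genotype
    return odds_ratio_ci(c2, k2, c0 + c1, k0 + k1)
```

Pooling the zero- and one-allele groups is appropriate only if they carry similar risk, which is what the non-significant one-versus-zero comparison above indicates.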
Next, we tested the association of APOL1 variants with renal disease in a larger cohort of 1030 African-American cases with hypertension-attributed ESKD and 1025 geographically matched African-American controls from the southeastern United States(7). In this cohort, we tested 36 variants selected because they showed the strongest signals of positive selection in a broader genomic region. We also tested nearby coding variants, including G1 and G2, and putative MYH9 risk single-nucleotide polymorphisms (SNPs). The strongest association was again with rs73885319 (G1 tag SNP; p = 1.1 × 10⁻³⁹) (Table S2). When we controlled for rs73885319 by logistic regression, rs71785313 (G2) again emerged as the strongest association signal (p = 8.8 × 10⁻¹⁸) (Table S2). The combined signal (p = 10⁻⁶³) was 35 orders of magnitude more significant than that of the MYH9 SNPs. When we controlled for both G1 and G2, no residual association remained after correction for multiple SNP testing (Table S2). Frequencies of these alleles are reported in the table of APOL1 genotypes and alleles.
With this larger population, we were able to examine the mode of inheritance of renal disease caused by G1 and G2 with greater precision. We partitioned cases and controls by genotype. Compared with zero risk alleles, one risk allele was associated with only a small increase in renal disease risk (odds ratio 1.26, CI 1.01–1.56). Two risk alleles versus zero risk alleles yielded an odds ratio of 7.3 (CI 5.6–9.5), and two risk alleles versus one risk allele yielded an odds ratio of 5.8 (CI 4.5–7.5). Overall, a recessive model best explains these findings, consistent with our analysis of the FSGS cohort.
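The three odds ratios above are mutually consistent. Writing odds_g for the case-to-control odds in the group carrying g risk alleles, and assuming unadjusted odds ratios computed from the same genotype table with zero risk alleles as the common reference, the reference odds cancel:

```latex
\mathrm{OR}_{2:1}
  = \frac{\mathrm{odds}_2}{\mathrm{odds}_1}
  = \frac{\mathrm{odds}_2/\mathrm{odds}_0}{\mathrm{odds}_1/\mathrm{odds}_0}
  = \frac{\mathrm{OR}_{2:0}}{\mathrm{OR}_{1:0}}
  \approx \frac{7.3}{1.26} \approx 5.8
```

This matches the directly reported two-versus-one odds ratio of 5.8.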
We compared the frequencies of G1 and G2 across several HapMap populations using 1000 Genomes sequence data. G1 was present on approximately 40% of Yoruba (Nigeria, West Africa) chromosomes but on none of the chromosomes from European, Japanese, or Chinese individuals. Similarly, G2 was detected in sequence data from 3 Yoruba subjects but in none of the other three ancestral groups. This distribution, together with the high frequency of the disease-associated variants in Yoruba and African-Americans, raised the possibility that these variants confer a selective advantage in Africa but not outside it.
Given the strong evidence for selection previously described in this chromosomal region(9), we genotyped G1 and G2 in 180 Yoruba samples from HapMap3 to test whether these variants contribute to the selection signal (Table S3). The allele frequencies in Yoruba are 38% for G1 and 8% for G2. We focused on statistical tests that detect selection by comparing the extent of LD surrounding a putatively selected allele with the LD around the alternative allele at the same locus(13).
A recent (<10,000 years) selective sweep, in which a positively selected allele rises quickly in frequency, creates longer patterns of LD around the locus under selection(16). To determine whether this is the case for G1 and G2, we computed the Extended Haplotype Homozygosity (EHH)(15) for each of the two risk alleles and for the non-risk allele (Fig. S3). We also computed the integrated haplotype score (iHS)(13) and ΔiHH(11). The iHS statistic is suited to detecting selective sweeps in which the selected allele has reached intermediate frequency. An |iHS| greater than 2 indicates unusual LD at a locus relative to the rest of the genome, a typical signature of natural selection. Because iHS is designed to follow a standard normal distribution, its value for the two G1 SNPs (rs73885319 and rs60910145, iHS = −2.45) is significant. ΔiHH is similar to iHS but tests absolute rather than relative differences in haplotype length(11). Compared with the genome as a whole, ΔiHH was high for the G1 SNPs (ΔiHH = 0.471 cM, more than 5 s.d. from the mean) and for rs71785313 (G2) (ΔiHH = 0.275 cM, 2.6 s.d. from the mean), again indicating that haplotypes carrying the derived alleles have been positively selected. Results of multiple tests for selection and population differentiation across the entire region from 34,900 to 35,100 kb (NCBI build 36) are reported in Table S4. The same tests in Europeans showed no deviation from neutrality at APOL1.
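The haplotype-based statistics named above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the software used in the study: haplotypes are assumed to be phased strings of allele codes with a genetic map in cM, EHH integration stops before EHH falls below 0.05, ΔiHH is taken as the absolute difference |iHH_ancestral − iHH_derived|, and the genome-wide standardization of both statistics (for iHS, within derived-allele-frequency bins) is omitted. With the log ratio ln(iHH_ancestral / iHH_derived), negative values mean the derived allele sits on unusually long haplotypes, the direction observed for the G1 SNPs. All function names are hypothetical.

```python
from collections import Counter
from math import comb, erfc, log, sqrt

def ehh_curve(haplotypes, core, allele, positions):
    """EHH for chromosomes carrying `allele` at marker `core`, extending rightward.

    Returns [(distance_from_core, ehh), ...] for markers core, core+1, ...
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    curve = []
    for j in range(core, len(positions)):
        # Chromosomes identical from the core out to marker j form one class.
        classes = Counter(tuple(h[core:j + 1]) for h in carriers)
        homozygous_pairs = sum(comb(c, 2) for c in classes.values())
        curve.append((positions[j] - positions[core], homozygous_pairs / comb(n, 2)))
    return curve

def integrate_ehh(curve, cutoff=0.05):
    """Trapezoid-rule area under an EHH curve, stopping before EHH drops below cutoff."""
    area = 0.0
    for (d0, e0), (d1, e1) in zip(curve, curve[1:]):
        if e1 < cutoff:
            break
        area += (d1 - d0) * (e0 + e1) / 2
    return area

def ihh(haplotypes, core, allele, positions, cutoff=0.05):
    """Integrated EHH summed over both sides of the core marker."""
    right = integrate_ehh(ehh_curve(haplotypes, core, allele, positions), cutoff)
    # Mirror chromosomes and map so the leftward side reuses ehh_curve().
    mirrored = [h[::-1] for h in haplotypes]
    mirrored_pos = [-p for p in reversed(positions)]
    left_core = len(positions) - 1 - core
    left = integrate_ehh(ehh_curve(mirrored, left_core, allele, mirrored_pos), cutoff)
    return left + right

def haplotype_scores(haplotypes, core, positions, ancestral, derived):
    """Unstandardized iHS = ln(iHH_A / iHH_D) and ΔiHH = |iHH_A - iHH_D|."""
    ihh_a = ihh(haplotypes, core, ancestral, positions)
    ihh_d = ihh(haplotypes, core, derived, positions)
    return log(ihh_a / ihh_d), abs(ihh_a - ihh_d)

def two_sided_normal_p(z):
    """Two-sided p-value for a statistic assumed to be standard normal."""
    return erfc(abs(z) / sqrt(2))
```

Under the standard-normal assumption, `two_sided_normal_p(2.0)` is about 0.046 (hence the |iHS| > 2 rule of thumb) and `two_sided_normal_p(-2.45)` is about 0.014.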
Figure 2. Natural selection analysis for the Yoruba population. (A) Extended haplotype homozygosity (EHH) values for the three APOL1 alleles (G1, G2, and WT). We computed EHH after combining HapMap3 genotype data with our genotype data for alleles G1 and G2.
Taken together, these data are consistent with the hypothesis that G1 rose quickly to high frequency because of positive selection in Africa. There is less power to detect an effect for G2 because of its lower frequency in Yoruba (8%) and the stronger signal from G1 within the same interval, but haplotypes containing G2 show higher degrees of homozygosity than haplotypes containing neither G1 nor G2, again suggesting positive selection for G2 in Africans.
Although statistical tests for selection are valuable for identifying haplotypes under selection, only functional tests can convincingly identify the causal allele. Our selection tests indicated that G1 and G2 lie on haplotypes that have been strongly selected in Africa but not in Europe or Asia. G1 and G2 are plausible candidates for the causal variants because they alter the sequence of the encoded protein.
ApoL1 is the trypanolytic factor of human serum that confers resistance to the parasite Trypanosoma brucei brucei (T.b. brucei)(17). T.b. brucei has evolved into two additional subspecies, Trypanosoma brucei rhodesiense and Trypanosoma brucei gambiense, both of which have acquired the ability to infect humans(19). T.b. rhodesiense is found predominantly in Eastern and Southeastern Africa, whereas T.b. gambiense is typically found in Western Africa, although their ranges overlap somewhat(21). Because these parasites exist only in sub-Saharan Africa, we hypothesized that APOL1 may have been under selective pressure to counteract these trypanosome adaptations. As an initial test of this hypothesis, we performed in vitro assays comparing the trypanolytic potential of the disease-associated ApoL1 proteins with that of the typical ApoL1 protein.
T.b. rhodesiense can infect humans because it produces a serum resistance-associated protein (SRA) that interacts with the C-terminal helix of ApoL1 and inhibits its anti-trypanosomal activity(18). A recent study showed that mutations and deletions engineered into this helix prevent SRA from binding to ApoL1(23). Intriguingly, one of the G1 sequence variants (I384M) and the 6-bp deletion (G2) lie exactly at the SRA binding site in the ApoL1 C-terminal helix.
We analyzed the in vitro lytic potential of 75 human plasma samples with different combinations of G1 and G2 genotypes against T.b. brucei, T.b. rhodesiense, and T.b. gambiense. All 75 plasma samples efficiently lysed T.b. brucei, but none lysed T.b. gambiense. Of the 75 samples, 46 lysed SRA-positive T.b. rhodesiense clones, which are typically resistant to lysis by human serum, and all 46 came from individuals carrying at least one G1 or G2 allele (Table S5). As measured by titration using serial dilution, the lytic potency of these plasmas against SRA-positive T.b. rhodesiense was higher for G2 than for G1, whereas it was similar for both genotypes against SRA-negative parasites. Although lysis of T.b. rhodesiense by G2 could be explained by the inability of SRA to bind to this mutant, this explanation does not hold for the G1 ApoL1 variant, which SRA could still bind efficiently.
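One common way to summarize a titration is the endpoint titer: the highest dilution that still shows activity, with a higher titer meaning a more potent sample. The sketch below is a hypothetical illustration; the text does not describe the dilution scheme or the lysis readout, so the fold-dilution series, the boolean lysis calls, and the function names are all assumptions.

```python
def serial_dilutions(fold, steps):
    """Dilution factors of a serial series, e.g. 10, 100, 1000 for fold=10, steps=3."""
    return [fold ** i for i in range(1, steps + 1)]

def endpoint_titer(dilution_factors, lysed):
    """Highest dilution factor at which lysis was still observed, or None if never."""
    active = [d for d, hit in zip(dilution_factors, lysed) if hit]
    return max(active) if active else None
```

Comparing endpoint titers of G1 and G2 plasmas against the same parasite clone gives a simple potency ranking of the kind described above.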
Figure 3. G1 and G2 alleles of ApoL1 kill T.b. rhodesiense. Trypanolytic potential of ApoL1 variants on normal human serum-resistant (SRA+) and normal human serum-sensitive (SRA−) T.b. rhodesiense ETat 1.2 clones. ETat 1.2R is resistant to normal human serum.
We confirmed these results with recombinant ApoL1 proteins. The S342G/I384M (G1) and delN388/Y389 (G2) variants lysed both SRA-negative and SRA-positive T.b. rhodesiense parasites, but not T.b. gambiense. Whereas G2 was more potent than G1 against SRA-positive T.b. rhodesiense, the reverse was true against SRA-negative parasites. Recombinant ApoL1 variants carrying either S342G alone or I384M alone were less lytic against T.b. rhodesiense than the variant carrying both substitutions, while recombinant ApoL1 engineered to carry both the G1 and G2 mutations was no more active than the G2 mutant alone. All measured features of the T.b. rhodesiense lytic process (kinetics, transient inhibition by chloroquine, and typical swelling of the lysosome) were similar to those observed on T.b. brucei with either normal human serum or recombinant ApoL1(17). Thus, deletion of N388/Y389 was necessary and sufficient to prevent interaction with SRA and to confer on ApoL1 the ability to lyse T.b. rhodesiense in vitro, whereas the combination of S342G and I384M was required for maximal lysis of T.b. rhodesiense even though this variant remains bound by SRA. None of the variant forms of ApoL1 lysed T.b. gambiense.
In summary, we have shown that sequence variation in APOL1 contributes to the increased risk of renal disease in African-Americans. Two lines of evidence support this conclusion: (i) the non-synonymous variants encoded by G1 and the coding-region deletion G2 in APOL1 are the sequence variants showing the strongest association with FSGS and H-ESKD; and (ii) the association of renal disease with MYH9 sequence variants disappears after controlling for the APOL1 risk variants. An important question for future studies is how sequence variation in ApoL1 mechanistically contributes to the pathogenesis of kidney disease. The recessive model that best fits our genetic data suggests that ApoL1 performs a critical function in the kidney that is impaired by the variants, although a toxic effect of the variant proteins remains possible.
We have shown that both ApoL1 variants lyse a deadly subspecies of Trypanosoma that is normally completely resistant to ApoL1 lytic activity. The G2 mutation prevents the SRA virulence factor produced by T.b. rhodesiense from binding to and inactivating ApoL1. Even 10,000-fold dilutions of plasma containing these mutations (particularly G2) are active against the parasite. This raises the possibility that transfusion of small volumes of plasma, ApoL1-containing HDL particles, or recombinant protein might be effective treatment for trypanosomiasis caused by T.b. rhodesiense.
The kidney disease-associated variants are located on haplotypes that show statistical evidence of natural selection. The lytic activity of the variant proteins against Trypanosoma provides a plausible, albeit still speculative, biological explanation for this selection. The results are consistent with a heterozygote advantage model, because the protective effect against T.b. rhodesiense is dominant whereas the association with renal disease is recessive. Sickle cell disease is a well-established precedent for a model in which mutations conferring heterozygote advantage against a parasitic infection carry a strong biological disadvantage for homozygotes(24). In heterozygous form, certain hemoglobin mutations confer protection against malaria, but in homozygous form they cause severe diseases of the red blood cell (sickle cell disease, thalassemia).
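As a hedged illustration of the heterozygote-advantage argument, the textbook one-locus model of overdominant selection can be simulated. This is not an analysis from the study: the fitness costs s (to non-carriers, who lack protection) and t (to risk homozygotes, who bear the renal cost) are hypothetical, and the model assumes random mating, a very large population, and constant selection. Because heterozygotes are then the fittest genotype, the risk allele settles at a stable intermediate frequency q* = s/(s + t) rather than being fixed or lost.

```python
def next_risk_allele_freq(q, s, t):
    """One generation of viability selection at a single biallelic locus.

    q: frequency of the risk (trypanolytic) allele.
    s: hypothetical fitness cost to non-carriers (no protection from infection).
    t: hypothetical fitness cost to risk homozygotes (e.g. kidney disease).
    Heterozygotes have relative fitness 1 (protected, no renal cost).
    """
    p = 1.0 - q
    w_00, w_01, w_11 = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_00 + 2 * p * q * w_01 + q * q * w_11
    # Risk alleles after selection: half of heterozygote alleles plus all homozygote alleles.
    return (p * q * w_01 + q * q * w_11) / mean_fitness

def simulate(q0, s, t, generations):
    """Iterate the selection recursion from starting frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_risk_allele_freq(q, s, t)
    return q
```

For example, s = 0.1 and t = 0.3 drive the risk allele toward 0.25 from either a low or a high starting frequency.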
It will be interesting to determine the distribution of these variants throughout sub-Saharan Africa. In present-day Africa, T.b. rhodesiense is found in the eastern part of the continent, whereas we observed a high frequency of the trypanolytic variants and a signal of positive selection in a West African population. Changes in trypanosome biology and distribution and/or human migration may explain this discrepancy; alternatively, resistance to T.b. rhodesiense could have favored the spread of T.b. gambiense in West Africa. ApoL1 variants may also provide immunity to a broader range of pathogens than T.b. rhodesiense alone, as a recent report linking ApoL1 with anti-Leishmania activity may suggest(25). Thus, resistance to T.b. rhodesiense may not be the only selective force acting on these variants.
The APOL1 risk alleles for renal disease occur in more than 30% of African-American chromosomes. Given their high frequency and their strong effect on disease risk, unraveling the molecular mechanisms by which they contribute to renal injury will be of great importance in understanding and potentially preventing renal disease in individuals of recent African ancestry.
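To put the allele frequency in genotype terms, a back-of-the-envelope sketch assuming Hardy-Weinberg equilibrium and treating G1 and G2 as one combined risk allele (their frequencies add because the two are mutually exclusive): a combined frequency of 0.30 implies that roughly 0.3² = 9% of individuals carry two risk alleles, the genotype class that the recessive model associates with elevated renal risk. Both the equilibrium assumption and the helper name are illustrative.

```python
def genotype_fractions(q):
    """Hardy-Weinberg genotype fractions for a combined risk-allele frequency q."""
    p = 1.0 - q
    return {
        "no risk allele": p * p,
        "one risk allele": 2 * p * q,
        "two risk alleles": q * q,
    }
```

Because the reported frequency exceeds 30%, this estimate of the two-risk-allele fraction is a lower bound under these assumptions.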