Genetic variation at the MYH9 locus is linked to the high incidence of focal segmental glomerulosclerosis (FSGS) and non-diabetic end-stage renal disease among African Americans. To further define risk alleles associated with FSGS, we performed a genome-wide association analysis using more than one million single nucleotide polymorphisms in 56 African American and 61 European American patients with biopsy-confirmed FSGS. Results were compared with 1641 European American and 1800 African American unselected controls. While no association was observed in the European American cohort, the case-control comparison of African Americans identified variants associated with increased risk of FSGS within a 60kb region of chromosome 22 that contains part of the APOL1 and MYH9 genes. This region spans several linkage disequilibrium blocks, and the disease-associated variants within it are in linkage disequilibrium with variants that show signals of natural selection. APOL1 is a strong candidate for a gene that has undergone recent natural selection and is known to be involved in infection by Trypanosoma brucei, a parasite common in Africa that has recently adapted to infect human hosts. Further studies will be required to establish which variants are causally related to kidney disease, which mutations caused the selective sweep, and ultimately whether these are the same.
Focal and segmental glomerulosclerosis (FSGS) is a common pattern of renal injury. This pattern is observed in patients with idiopathic proteinuria and in association with various forms of primary renal injury, and it can also be the pattern of injury seen in highly penetrant inherited forms of kidney disease. In the last decade, genetic linkage analysis in families affected with FSGS has identified several loci. Fine mapping of these regions subsequently revealed rare deleterious mutations that segregate with disease in NPHS2 (chr1q25-q31), ACTN4 (chr19q13), TRPC6 (chr11q21-q22) [3,4] and PLCE1 (chr10q23). NPHS2 mutations are responsible for a non-trivial fraction of pediatric FSGS and steroid-resistant nephrotic syndrome, but only a small percentage of adult-onset disease. Coding-region variation in the other genes mentioned explains disease in very few patients with ‘sporadic’ non-familial FSGS [6,7,8,9,10,11,12].
A significant disparity in the frequency of FSGS between African Americans and people of European ancestry is well described. African Americans have approximately four times the risk of FSGS of European Americans. Recent admixture-mapping studies of this disparity [14,15] identified a region of chromosome 22 with excess African ancestry in African American patients. These studies suggested that a specific haplotype near the MYH9 gene, located within this region of increased African ancestry and denoted E-1, is associated with FSGS and other non-diabetic kidney diseases in African Americans, including human immunodeficiency virus (HIV)-associated nephropathy and hypertensive renal disease. While the penetrance and absolute risk of FSGS for an individual carrying the E-1 haplotype are relatively low, the population attributable risk can be considered high, because the haplotype is common.
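The contrast between low penetrance and high population attributable risk can be made concrete with Levin's formula for the population attributable fraction (PAF). The sketch below is illustrative only: the carrier frequency (60%), relative risk (3) and baseline risk are hypothetical inputs, not estimates from this study.

```python
def population_attributable_fraction(p_exposed: float, relative_risk: float) -> float:
    """Levin's formula: fraction of all cases attributable to an exposure.

    p_exposed: fraction of the population carrying the risk genotype.
    relative_risk: risk in carriers divided by risk in non-carriers.
    """
    excess = p_exposed * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical numbers for illustration only.
baseline_risk = 0.001               # hypothetical risk in non-carriers
carrier_risk = 3 * baseline_risk    # absolute risk (penetrance) in carriers stays low
paf = population_attributable_fraction(0.6, 3.0)
print(f"carrier risk = {carrier_risk:.3%}, PAF = {paf:.1%}")
```

Even though a carrier's absolute risk remains well under 1% in this toy setting, more than half of all cases would be attributable to the common risk genotype.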
We performed a genome-wide association (GWA) analysis of sporadic, biopsy-confirmed FSGS subjects of either African or European ancestry, using a dense set of more than one million single nucleotide polymorphisms (SNPs) on the Illumina 1M-Duo array. We reasoned that common genetic factors might have large effects, which would allow us to compare a relatively small number of cases with a large, ethnicity-matched control dataset. As controls we used samples from the Illumina iControl database (http://www.illumina.com/science/icontroldb.ilmn), genotyped on the Illumina 550Kv3 array and matched for ancestry to the cases.
Our analysis shows that the association between the chromosome 22 locus and FSGS risk is due to a genetic variant or variants that likely arose relatively recently in the African population. This region spans the APOL1 and MYH9 genes. Recent literature points to selective pressure in Africa on the APOL1 locus due to Trypanosoma brucei rhodesiense, a subclade of Trypanosoma brucei that recently adapted to infect human hosts. This selective pressure would explain why this long haplotype has attained high frequency in people of African descent, and it makes APOL1 a good candidate for explaining the fourfold increase in FSGS in African Americans. It also raises the distinct possibility that a disease-influencing alteration in MYH9 was brought to high frequency because of linkage with APOL1.
For these studies we used DNA extracted from whole blood of individuals with biopsy-confirmed FSGS and no family history of FSGS. These samples were collected as part of our ongoing studies of the genetic basis of FSGS, under a protocol approved by the Institutional Review Board at Brigham and Women’s Hospital. We performed genome-wide SNP genotyping on Illumina microarrays for 56 African Americans and 61 individuals of European descent with FSGS; individuals with a strong family history of FSGS were excluded from this analysis. Data from 1827 African American and 1641 European American individuals served as the control dataset.
Genome-wide association was performed on 61 European FSGS samples by comparing their genotype frequencies with data from 1641 unselected European American controls.
We first performed a principal component analysis (PCA) of the 61 cases and 1641 controls and excluded 110 controls found to have a substantial amount of non-European ancestry. In this method, the genotype of each sample is treated as a vector with one component per SNP. The set of these vectors is then projected onto the lower-dimensional space that retains as much of their variance, and hence of the distances among them, as possible.
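The projection described above can be sketched with a singular value decomposition of a centered and scaled genotype matrix. This is a generic illustration on simulated data, not the software used in the study (which the text does not name); scaling each SNP by sqrt(2p(1−p)) is one common convention and is an assumption here.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project samples onto the top principal components.

    genotypes: (n_samples, n_snps) array of 0/1/2 non-reference allele counts.
    Returns an (n_samples, n_components) array of PC scores.
    """
    g = genotypes.astype(float)
    p = g.mean(axis=0) / 2.0                  # per-SNP allele frequency
    keep = (p > 0) & (p < 1)                  # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    # Center each SNP and scale by its binomial standard deviation
    x = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data: two simulated populations with shifted allele frequencies.
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_snps), 0.01, 0.99)
pop_a = rng.binomial(2, freq_a, size=(50, n_snps))
pop_b = rng.binomial(2, freq_b, size=(50, n_snps))
scores = genotype_pca(np.vstack([pop_a, pop_b]))
print(scores[:50, 0].mean(), scores[50:, 0].mean())  # the two groups fall on opposite sides of PC1
```

In a real analysis, samples lying far from the main cluster on the leading components would be the candidates for exclusion.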
The two-dimensional PCA plot of the remaining samples is shown in Figure 1; dimensions beyond the second showed no further significant population structure. We then tested each SNP shared by the Illumina 1M-Duo and 550Kv3 chips for association by comparing allele frequencies between cases and controls. Because of the small number of cases, we used Fisher’s exact test as implemented in PLINK. Given the small sample size, we had power to detect only variants with strong effects.
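A plain-Python sketch of the per-SNP allelic test: genotypes are collapsed into a 2×2 table of allele counts (cases vs. controls, non-reference vs. reference), and a two-sided Fisher's exact p-value is computed from the hypergeometric distribution. This is an illustration, not PLINK's implementation; the helper names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        px = prob(x)
        if px <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            p_value += px
    return min(1.0, p_value)


def allele_table(case_genotypes, control_genotypes):
    """Allele counts (case_alt, case_ref, ctrl_alt, ctrl_ref) from 0/1/2 genotypes; None = missing."""
    def counts(genotypes):
        called = [g for g in genotypes if g is not None]
        alt = sum(called)
        return alt, 2 * len(called) - alt

    case_alt, case_ref = counts(case_genotypes)
    ctrl_alt, ctrl_ref = counts(control_genotypes)
    return case_alt, case_ref, ctrl_alt, ctrl_ref


print(fisher_exact_two_sided(3, 1, 1, 3))  # classic example, p = 34/70 (about 0.486)
```

With 56 cases there are only 112 case alleles per SNP, which is where an exact test is preferable to the asymptotic chi-square approximation.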
We detected no significant associations for European American cases at the genome-wide significance level of p<5×10⁻⁸, a widely used and highly conservative Bonferroni-based threshold that addresses multiple testing in GWA studies. We performed follow-up genotyping in a larger cohort of 229 European American cases for a number of SNPs that reached a less stringent threshold of p<10⁻⁵. None of these associations replicated, suggesting that there are no common variants with large effects on susceptibility to FSGS in this population. The absence of any replicated signal also suggests that population stratification did not produce false positive associations.
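The conventional genome-wide threshold follows from a Bonferroni correction, assuming roughly one million independent common-variant tests in a genome-wide scan and a family-wise error rate of 0.05:

```latex
\alpha_{\text{per test}} = \frac{\alpha_{\text{family}}}{m}
  = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```

Because SNPs in linkage disequilibrium are correlated, the number of effectively independent tests is smaller than the number of SNPs genotyped, which is one reason this threshold is considered conservative.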
GWA testing was performed among 56 African American FSGS samples and 1827 African American controls.
In the African American samples, we performed a preliminary PCA of cases and controls and excluded 68 controls whose ancestry was neither European nor African. The two-dimensional PCA of the remaining samples is shown in Figure 2. Dimensions beyond the first showed no significant population structure. The first dimension, displayed horizontally, represents the amount of European ancestry. We computed the amount of European ancestry in our cases to verify that it did not differ significantly between cases and controls.
Figure 3 shows a histogram of the amount of European ancestry in the African American cases, computed with the hidden Markov model (HMM) described in the Methods. The average European ancestry was 18% (standard deviation 12%) in cases and 17% (standard deviation 9%) in controls, similar to the percentage previously reported. Because of the small sample size and the absence of a significant difference in European ancestry between cases and controls, we did not consider stratification to be an issue.
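The text does not name the test used to compare ancestry between groups. As one plausible check from the summary statistics alone, a Welch statistic can be computed; this sketch assumes the control figure refers to the 1759 African American controls retained after exclusions and uses a normal approximation for the p-value.

```python
from math import erfc, sqrt


def welch_z(mean1, sd1, n1, mean2, sd2, n2):
    """Welch statistic for a difference in means from summary statistics,
    with a two-sided p-value from the normal approximation (reasonable when
    both groups are moderately large)."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se
    p = erfc(abs(t) / sqrt(2))  # two-sided standard-normal tail probability
    return t, p


# Cases: 18% (SD 12%), n = 56; controls: 17% (SD 9%), n = 1759 (assumed).
t, p = welch_z(0.18, 0.12, 56, 0.17, 0.09, 1759)
print(f"t = {t:.2f}, p = {p:.2f}")
```

The resulting statistic is well below conventional significance, consistent with the statement that ancestry did not differ between cases and controls.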
We first determined which SNPs were genotyped on both the 1M-Duo and the 550Kv3 Illumina platforms. Given the small number of cases (n=56) genotyped on the 1M-Duo chip, we tested each shared SNP independently for association using Fisher’s exact test on allele frequencies, as implemented in PLINK. Only one locus showed genome-wide significant associations. Table 1 lists the nine SNPs associated with disease at p<10⁻⁷; all are located in and around the MYH9 locus on chromosome 22. We could not directly measure the association for haplotype E-1 defined in  which is identified mostly by the risk alleles of SNPs rs4821481 and rs3752462, because the latter was genotyped only in cases. However, HapMap data for the African Yoruba population show that SNP rs8141971 is a good proxy for rs3752462, being in very high linkage disequilibrium (LD) with it (r²>0.9). We therefore also measured frequencies and associations for haplotype E-1p, defined by the risk alleles of SNPs rs4821481 and rs8141971, and for haplotype T-1, defined by the risk alleles of SNPs rs2239785 and rs136187. We chose these two SNPs because together they were the best predictor of disease among all possible pairs of SNPs from Table 2. Interestingly, these SNPs are in significant linkage disequilibrium in the African Yoruba samples from HapMap (D′=0.43, r²=0.07), but in almost complete linkage equilibrium in the European samples from HapMap (D′=0.12, r²=0.002). We performed the haplotype analysis using PLINK.
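The two LD measures quoted above can diverge: D′ scales the disequilibrium coefficient D by its maximum possible value given the allele frequencies, while r² also penalizes mismatched allele frequencies. A minimal sketch computing both from phased haplotype frequencies (the input values in the example are hypothetical):

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D', r^2) for two biallelic, polymorphic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2.
    p_a, p_b: frequencies of alleles A and B (strictly between 0 and 1).
    """
    d = p_ab - p_a * p_b
    # Lewontin's normalization: the largest |D| allowed by the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical frequencies: moderate D' but small r^2 because p_a and p_b differ.
print(ld_stats(0.25, 0.6, 0.3))  # D' about 0.58, r^2 about 0.097
```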
To determine which SNPs are associated with disease rather than with ancestry, we performed an ancestry-controlled analysis of a 1000kb region on chromosome 22, from position 34,500kb to 35,500kb (NCBI36.3 coordinates). Using the HMM described in the Methods, we identified 54 cases and 1141 controls whose two chromosomes are both of African ancestry across this region. As expected, most of the SNPs that had previously associated with FSGS proved to be artifacts of ancestry alone. Importantly, we found many previously unidentified SNPs that associated strongly with FSGS across many linkage disequilibrium blocks, which include the genes APOL1, APOL2, APOL4 and MYH9. From the patterns of linkage disequilibrium in this 1000kb region among the African Yoruba HapMap participants, we conclude that the disease-causing variant is likely contained in a long 60kb region (NCBI36.3 chromosome 22 positions 34,982,110–35,044,581, bounded by SNPs rs7284919 and rs2239784, containing part of APOL1 and part of MYH9) that spans several linkage disequilibrium blocks. Table 2 shows results for the SNPs that associated with p<10⁻⁵ after ancestry correction, and for haplotypes E-1p and T-1.
We also performed an analysis controlling for haplotype E-1p. To avoid uncertainty in phase, we restricted this analysis to samples homozygous for the risk alleles defining E-1p. Table 3 reports Fisher’s exact test results for the 10 SNPs that associated with p<10⁻³ within this subset. Controlling for E-1p falls far short of explaining the large disparity observed for variants within APOL1.
Figure 4 shows the APOL1 and MYH9 genes, together with the locations of recombination hotspots predicted from HapMap data. The main recombination hotspot between APOL1 and MYH9 has an estimated intensity of 0.2cM. In other words, variants on opposite sides of the hotspot are separated by a genetic distance equivalent to approximately 200kb of physical distance at the genome-average rate of roughly 1cM per Mb, while the 3′ ends of the two genes are less than 20kb apart. The figure shows that, even after correction for ancestry, strong associations are detected in different linkage disequilibrium blocks. Supplementary Figures 1 and 2 show patterns of linkage disequilibrium for the African Yoruba and European populations from HapMap.
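The conversion can be written out explicitly, assuming the rough genome-wide average of 1 cM per Mb. The second expression uses the Haldane map function (a standard map function, not one the paper specifies) to turn the 0.2 cM (0.002 Morgan) hotspot into a recombination fraction per meiosis:

```latex
d_{\text{phys}} \approx \frac{0.2\ \text{cM}}{1\ \text{cM/Mb}} = 0.2\ \text{Mb} = 200\ \text{kb},
\qquad
r = \tfrac{1}{2}\left(1 - e^{-2 \times 0.002}\right) \approx 0.002
```

At such short genetic distances the recombination fraction is essentially equal to the map distance, about 0.2% per meiosis across the hotspot.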
In this paper, we describe the results of a dense genome-wide scan in two small groups of individuals with FSGS, one African American and the other European American. We were unable to identify any reproducible association between FSGS and common genetic variants in European American cases compared with unselected controls, which suggests that there are no common variants of large effect on disease susceptibility in the European population. (Because our sample size was small, we cannot exclude the existence of genetic variants of small effect.) We replicated the previously observed signal originating from the MYH9 locus [14,15]. This signal did not replicate in European American samples, although the statistical power to detect a true association was limited by the relatively small number of cases available for study. We found no other signal outside the MYH9 locus. Given the strength of the association at this locus in African Americans, it explains most of the FSGS in this cohort, so we were severely underpowered to identify any additional, independent association.
Although these results are preliminary and we have not identified a clear causal variant accounting for the increased FSGS risk in African Americans, the stronger association we observed for haplotype T-1 than for haplotype E-1p (a close proxy for the haplotype E-1 studied in ) suggests that the causal variant has a higher penetrance than previously thought. While haplotype E-1 has a frequency close to 60% in African Americans, haplotype T-1 has a frequency close to 30%. The frequency of T-1 likely reflects the frequency of the causal variant more faithfully than that of E-1, which suggests that this variant is less common than recently thought and that, once identified, it may have a clinical impact and significant clinical utility.
Patterns of LD in African Yoruba (YRI) and European (CEU) samples from HapMap show that some variants at this locus are in strong LD in Africans, and that this LD spans a number of putative recombination hotspots. According to HapMap, MYH9 SNPs rs11912763, rs16996648, rs5756152, rs16996672 and rs16996677 (NCBI36.3 chromosome 22 positions 35,014,668, 35,022,698, 35,042,418, 35,055,916 and 35,057,229, respectively) show strong linkage disequilibrium with variants in APOL1 (see Supplementary Figures 1 and 2), have high minor allele frequency, and are nearly non-existent in European populations. In addition, signals of natural selection within a region containing the genes APOL3, APOL4, APOL2, APOL1 and MYH9 (NCBI36.3 coordinates 34800kb-35100kb) have previously been identified in African Yoruba samples, but never in European samples, in publicly available databases using the long-range haplotype test (LRH), the integrated haplotype score (iHS) [21,22], the rMHH, and the composite of multiple signals (CMS). In this last study, the strongest signal of selection was found for variant rs9622363 (NCBI36.3 position 34,986,501, genome-wide CMS 14.987, p=1.3×10⁻⁵, untyped in our study). According to HapMap, this variant is in linkage disequilibrium in African Yoruba with rs7364143 (D′=0.7, r²=0.18), rs2239785 (D′=1.0, r²=0.8), rs136187 (D′=0.7, r²=0.14) and rs1699672 (D′=0.6, r²=0.07), four of the seven variants in Table 2. It is therefore possible that the disease-causing variant rose to higher frequency in Africans because of a selective sweep that took place near or at the MYH9 locus in the African population, recently enough to leave extensive patterns of linkage disequilibrium.
The Illumina chips that we used for genotyping, despite being the densest commercially available at the time of this study, do not have sufficient numbers of SNPs that are highly polymorphic in the African population to allow us to be more precise at the present time. This is likely due to an ascertainment bias in the way SNPs have been discovered with, until recently, a reliance on non-African samples.
Recently, the 1000 Genomes Project (http://www.1000genomes.org/) released sequence and variation data from the DNA samples used in HapMap. These data, accessible through the Integrative Genomics Viewer (http://www.broadinstitute.org/igv/), show many variants at the APOL1-MYH9 locus that are observed at high frequency in Africans but are rare or non-existent in Europeans and Asians, and that have not been analyzed in HapMap. Recent studies [25,26] based on variants genotyped in HapMap failed to define a clear causal candidate. If a single variant does explain the association between this locus and FSGS, it is likely one of the variants that are highly polymorphic only among Africans and rare or non-existent in non-Africans, which would explain why the association at this locus has not replicated in other large studies of different populations. The weak associations observed in Europeans by others could also be explained by low levels of African ancestry in the cases considered, or by the fact that Southern European populations carry 1–3% African ancestry due to migrations that took place 50–70 generations ago (private communication with Nick Patterson).
Interestingly, it has recently been shown that mutations in the C-terminal region of APOL1 interfere with the binding of the Serum Resistance-Associated (SRA) protein to APOL1, the mechanism by which Trypanosoma brucei rhodesiense evades the trypanolytic activity of APOL1 that normally confers resistance to the parasite [28,29]. This suggests a potential explanation for a selective sweep in those regions of Africa where Trypanosoma brucei rhodesiense evolved and human infection became widespread. If this is indeed the case, such a sweep is probably no more than a few thousand years old, since the SRA protein is believed to have evolved in Trypanosoma brucei rhodesiense once humans had become more numerous. It is possible that the association of the variants caught up in this selective sweep with increased risk of kidney disease is incidental, although there may be more subtle mechanisms awaiting elucidation.
We genotyped 61 European American (36 males and 25 females, average age at diagnosis 36.0±17.5 years) and 56 African American (26 males and 30 females, average age at diagnosis 22.5±14.5 years) patients with idiopathic biopsy-proven FSGS and no family history of FSGS using the Illumina 1M-Duo platform. Patients were recruited, under protocols approved by the relevant human research committees, from medical centers in the eastern United States, mainly in New York City. As controls we used 1641 unselected European American and 1827 unselected African American samples genotyped on the Illumina 550Kv3 platform from the iControldb database (http://www.illumina.com/science/icontroldb.ilmn). In brief, we downloaded all available genotyping data from studies that used the Illumina 550Kv3 array, with ‘European American’ (n=1641) or ‘African American’ (n=1827) as the selection criterion. These two datasets were then converted to PLINK format using scripts we developed (Ross Lazarus, unpublished). We matched case and control datasets by ethnicity. After correction for ancestry, 110 European American controls and 68 African American controls were excluded because of substantial ancestry that was neither European nor African, leaving 1531 European American controls and 1759 African American controls for our study. We tested cases for cryptic relatedness by searching for long segments shared identical by descent and found that all cases were unrelated.
Genotyping was performed at the Broad Institute of Harvard and MIT. In the cases, out of the 1,199,187 variants assayed by the Illumina 1M-Duo chip, 47 variants were discarded because they showed discordant clustering among different Illumina chips (private communication, Illumina Technical Support, August 11, 2009); 56,590 variants were excluded because more than 10% of genotypes were missing in African Americans, and 58,524 were excluded for the same reason in European Americans. In controls, 7,298 of 549,837 variants in African Americans and 19,051 of 561,255 variants in European Americans were excluded because more than 10% of genotypes were missing. Almost all of the variants on the Illumina 550Kv3 chip are also included on the Illumina 1M-Duo chip, so there was no additional loss of coverage when comparing results across the two platforms. Strand issues were not a concern, because C/G and A/T SNPs are not genotyped on the Illumina 550Kv3 platform. Such polymorphisms can give rise to spurious case-control differences when other genotyping platforms are used, a frequent source of problems when the information describing which strand is used to code the SNPs is incorrect or unavailable.
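The two variant-level filters described above (more than 10% missing genotypes, and strand-ambiguous A/T or C/G SNPs) can be sketched as follows. The data layout and function names are hypothetical; the study itself used PLINK and Illumina tooling rather than this code.

```python
STRAND_AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


def passes_call_rate(genotypes, max_missing=0.10):
    """True if at most `max_missing` of the genotypes are missing (None)."""
    missing = sum(g is None for g in genotypes)
    return missing / len(genotypes) <= max_missing


def is_strand_ambiguous(allele1: str, allele2: str) -> bool:
    """A/T and C/G SNPs read the same on both strands, so the coded allele
    cannot be resolved from the alleles alone."""
    return frozenset((allele1.upper(), allele2.upper())) in STRAND_AMBIGUOUS


def qc_filter(variants, max_missing=0.10):
    """Keep IDs of variants passing the call-rate filter that are not strand-ambiguous.

    variants: iterable of (variant_id, allele1, allele2, genotypes) tuples,
    with genotypes coded 0/1/2 and None for a missing call.
    """
    return [vid for vid, a1, a2, gts in variants
            if passes_call_rate(gts, max_missing) and not is_strand_ambiguous(a1, a2)]


variants = [
    ("rs_a", "A", "G", [0] * 9 + [None]),          # 10% missing: kept
    ("rs_b", "A", "G", [0] * 8 + [None, None]),    # 20% missing: dropped
    ("rs_c", "A", "T", [1] * 10),                  # strand-ambiguous: dropped
]
print(qc_filter(variants))
```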
To identify which regions of the genome in African American cases and controls are of African or European origin, we used a hidden Markov model with three hidden states corresponding to the number (0, 1 or 2) of chromosomes of European ancestry at a given locus. For a polymorphic locus t in the genome, let pE(t) denote the frequency of the reference allele in the European population, p′E(t) the frequency of the reference allele in the Illumina European American cohort, pA(t) the frequency of the reference allele in the African population, and p′B(t) the frequency of the reference allele in the Illumina African American cohort, and let m=21% be the expected European ancestry in African American samples. We approximated the ancestral frequencies of the two populations from the Illumina control cohorts in the following way:
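A minimal sketch of one natural version of this approximation, assuming that pE(t) is taken directly from the European American controls, p′E(t), and that the African American controls are a mixture, p′B(t) = m·pE(t) + (1 − m)·pA(t), solved for pA(t). This linear unmixing is our assumption, not a formula quoted from the text.

```python
def ancestral_frequencies(p_eur_controls, p_aa_controls, m=0.21):
    """Approximate (p_E, p_A) reference-allele frequencies at one SNP.

    Assumed model: p_E equals the European American control frequency, and
    the African American control frequency is the mixture
        p_aa = m * p_E + (1 - m) * p_A,
    which is solved for p_A. Sampling noise can push the estimate outside
    [0, 1], so both values are clipped to that range.
    """
    p_e = p_eur_controls
    p_a = (p_aa_controls - m * p_e) / (1.0 - m)
    return min(1.0, max(0.0, p_e)), min(1.0, max(0.0, p_a))


# Hypothetical controls: 20% reference allele in European Americans and
# 59.5% in African Americans imply an African frequency of about 70%.
print(ancestral_frequencies(0.20, 0.595))
```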
This method of estimating the two allele frequencies is conceptually much simpler than the one described in , which uses a Markov chain Monte Carlo approach to estimate the frequencies from both cases and controls. Differences between the two sets of estimates are not an important concern, given the overwhelmingly large number of controls genotyped and given that the average European ancestry of African Americans is already known. Following the notation in , we set the emission probabilities ejk(t) for j,k=0..2, where j is the number of chromosomes of European ancestry at a given locus and k is the number of non-reference alleles observed at that locus, as
where for clarity we have dropped the variable t indicating the locus, and we set the transition probabilities aij for i,j=0..2, where i and j indicate the number of chromosomes of European ancestry at a given locus and at the next locus, respectively, as
where ε≈2⁻⁴⁰ is chosen small enough to suppress short, spurious ancestry switches (false positives). To infer local ancestry in the region between 34,500kb and 35,500kb (NCBI 36.3 coordinates), we used all 242 SNPs common to the two platforms for cases and controls.
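A sketch of how these pieces fit together in a three-state decoder. The emission probabilities follow from treating the two chromosomes independently, each carrying the non-reference allele with probability 1 − pE or 1 − pA according to its ancestry. The transition form (1 − 2ε to stay, ε to each other state), the binomial initial distribution in m, and the use of Viterbi decoding are assumptions for illustration; the text does not spell them out here.

```python
from math import comb, log


def emission_probs(p_e: float, p_a: float) -> list[list[float]]:
    """e[j][k] = P(k non-reference alleles | j European-ancestry chromosomes)."""
    q_e, q_a = 1.0 - p_e, 1.0 - p_a  # per-chromosome non-reference probabilities
    table = []
    for q1, q2 in ((q_a, q_a), (q_e, q_a), (q_e, q_e)):  # j = 0, 1, 2
        table.append([
            (1 - q1) * (1 - q2),            # k = 0
            q1 * (1 - q2) + (1 - q1) * q2,  # k = 1
            q1 * q2,                        # k = 2
        ])
    return table


def log_transitions(eps: float) -> list[list[float]]:
    """Assumed form: stay with probability 1 - 2*eps, move to each other state with eps."""
    return [[log(1 - 2 * eps) if i == j else log(eps) for j in range(3)]
            for i in range(3)]


def viterbi_ancestry(genotypes, freqs, m=0.21, eps=2.0 ** -40):
    """Most likely number of European chromosomes (0, 1, 2) at each SNP.

    genotypes: non-reference allele counts k at consecutive SNPs, in map order.
    freqs: (p_e, p_a) reference-allele frequencies at the same SNPs.
    """
    log_a = log_transitions(eps)
    # Assumed initial distribution: binomial in the admixture proportion m
    score = [log(comb(2, j) * m ** j * (1 - m) ** (2 - j)) for j in range(3)]
    back = []
    for t, (k, (p_e, p_a)) in enumerate(zip(genotypes, freqs)):
        e = emission_probs(p_e, p_a)
        emit = [log(max(e[j][k], 1e-12)) for j in range(3)]  # floor avoids log(0)
        if t == 0:
            score = [score[j] + emit[j] for j in range(3)]
            continue
        pointers = [max(range(3), key=lambda i: score[i] + log_a[i][j]) for j in range(3)]
        score = [score[pointers[j]] + log_a[pointers[j]][j] + emit[j] for j in range(3)]
        back.append(pointers)
    # Trace back from the best final state
    state = max(range(3), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy example with hypothetical, strongly differentiated SNPs: the first 30
# genotypes look African (k = 0), the last 30 look European (k = 2).
freqs = [(0.05, 0.95)] * 60
genotypes = [0] * 30 + [2] * 30
print(viterbi_ancestry(genotypes, freqs))
```

Under this sketch, an individual decoded as state 0 at every SNP of the region would correspond to having both chromosomes of African ancestry there, the criterion used for the ancestry-controlled analysis. With ε as small as 2⁻⁴⁰, a switch costs about 27.7 log units, so only a sustained run of discordant genotypes can change the inferred state.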
Supplementary Figure 1.
Patterns of linkage disequilibrium in African Yoruba samples from HapMap. Values for D′ are displayed in the image, while coordinates for chromosome 22 are shown on top. Figure generated with Haploview (http://www.broadinstitute.org/mpg/haploview).
Supplementary Figure 2.
Patterns of linkage disequilibrium in European samples from HapMap. Values for D′ are displayed in the image, while coordinates for chromosome 22 are shown on top. Figure generated with Haploview (http://www.broadinstitute.org/mpg/haploview).
Supplementary Table 1.
Fisher’s exact test association results for SNPs in the African American cohort in between position 34,800kb and 35,100kb (NCBI 36.3 coordinates).
Supplementary Table 2.
Fisher’s exact test association results for SNPs in the cohort of African American samples who were detected as having two African chromosomes in the 22q12.3 region in between position 34,800kb and 35,100kb (NCBI 36.3 coordinates).
We thank the study subjects for their participation. This work was supported by grants from the NIH (NIH DK54931 and a Genzyme GRIP award to M.P., R01 HG003646 to R.L.) and the J.D.R.F. (to S.T.). M.P. is an established investigator of the American Heart Association. We thank Drs. Jeffrey Kopp, Cheryl Winkler, George Nelson, and David Friedman for helpful discussions.
The authors have no conflicts of interest to disclose.