|Home | About | Journals | Submit | Contact Us | Français|
Case-parent trios were used in a genome wide association study of cleft lip with/without cleft palate (CL/P). SNPs near two genes not previously associated with CL/P [MAFB: most significant SNP rs13041247, with odds ratio per minor allele OR=0.704; 95%CI=0.635,0.778; p=2.05*10−11; and ABCA4: most significant SNP rs560426, with OR=1.432; 95%CI=1.292,1.587; p=5.70*10−12] and two previously identified regions (chr. 8q24 and IRF6) attained genome wide significance. Stratifying trios into European and Asian ancestry groups revealed differences in statistical significance, although estimated effect sizes were similar. Replication studies from several populations showed confirming evidence, with families of European ancestry giving stronger evidence for markers in 8q24 while Asian families showed stronger evidence for MAFB and ABCA4. Expression studies support a role for MAFB in palate development.
Cleft lip with or without cleft palate (CL/P) is a common human birth defect with documented genetic and environmental risk factors1. While CL/P can occur in many Mendelian malformation syndromes, the isolated, non-syndromic form constitutes 70% of all cases2. Evidence for genetic control of CL/P is compelling: recurrence risks are 20–30 times greater than population prevalences3,4 and both twin and family studies5 suggest a major role for genes,
Mutations in IRF6 cause VanderWoude syndrome, the most common Mendelian syndrome including CL/P, and markers in IRF6 have repeatedly shown evidence of association with isolated, non-syndromic CL/P6–9. An allele disrupting an AP2 binding site near IRF6 showed particularly strong evidence among European CL families, although multiple risk alleles are likely10.
Birnbaum et al.11 conducted a case-control genome wide association study (GWAS) in Germany and found significant evidence of association with markers in 8q24.21, and a US case-control GWAS confirmed this region12, with rs987525 being the most significant marker in both studies. Here we present a GWAS using a case-parent trio design in a consortium drawing cases from Europe, the US, China, Taiwan, Singapore, Korea and the Philippines. This design has the advantage of being robust to confounding due to population stratification, which is important when cases from diverse populations are combined.
Because these case-parent trios came from different populations (Table 1), we conducted a principal components analysis (PCA) on all parents to document genetic variation in our consortium (Supplementary Figure 1). Approximately 50% of parents could be classified as Asian and 45% as European, with remaining parents being of African or “other” ancestry (including mixed). Transmission disequilibrium tests (TDT) on autosomal SNPs in 1908 CL/P case-parent trios showed strong evidence of linkage and association for multiple markers (see QQ plot in Supplementary Figure 2), which clustered into specific chromosomal regions (Figure 1a). Multiple SNPs on chr. 8q24 and 4 SNPs in IRF6 showed genome wide significance (p<5*10−8). In addition, SNPs in two genes not previously associated with CL/P (ABCA4 on chr. 1p22.1 and MAFB on 20q12) achieved genome-wide significance (Table 2), and three potential candidate genes (PAX7 on chr. 1p36, VAX1 on 10q25.3 and NTN1 on 17p13) had one or more SNPs near genome-wide significance (Supplementary Table 1). We stratified these trios into 825 trios of European ancestry (Figure 1b) and 1038 of Asian ancestry (Figure 1c) as a check for consistency across racial groups (omitting 45 case-parent trios of African or “other” ancestry). Interestingly, trios of European ancestry (including European Americans) showed stronger support for chr. 8q24, while Asian trios gave the most significant evidence for both new and old candidate genes with weaker evidence for 8q24. However, p-values cannot be the only criteria when interpreting these results.
Multiple SNPs in 8q24 showed evidence at or near genome-wide significance in the allelic TDT. The strongest individual SNP was rs987525 (Table 2) in both the total sample and the European sub-group (p-value=1.43*10−16 in the total sample), as in two previous case-control studies12,13. In our trios, rs987525 showed significant over-transmission of the A allele, giving OR(transmission)=1.78 (95%CI=1.55–2.05). Among 825 trios of European ancestry, this OR(transmission) was larger (2.01 with 95%CI=1.69–2.38); than among Asian trios (1.39 with 95%CI=1.09–1.78). Both groups were nominally significant (p-value=5*10−16 for European trios; p-value=0.00893 for Asian trios), and both yielded similar patterns of over-transmission despite differences in p-values shown in Figures 1b and 1c.
Conditional logistic regression was used to estimate genotype relative risks under an additive model as the odds ratio of being a case, OR(case), given each additional target allele (arbitrarily defined as the minor allele among parents of European ancestry). Supplementary Figure 3 presents estimated OR(case) for 78 SNPs in a region of signal on 8q24, where multiple SNPs showed distinct over- or under-transmission. Under the additive model, all trios gave an estimated OR(case)=1.73 (95%CI=1.36–2.03) for AT heterozygotes at rs987525 and OR(case)=2.99 (95%CI=1.26–4.10) for AA homozygotes. A more general model with separate effects for heterozygotes and homozygotes yielded estimates of OR(case|AT)=1.58 (95%CI=1.30–1.94) and OR(case|AA)=3.72 (95%CI=2.36–5.87) in the total sample. When trios were stratified into European and Asian ancestry groups, the additive model gave OR(case)=1.91 (95%CI=1.57–2.33) among trios of European ancestry, and OR(case)=1.42 (95%CI=1.08–1.85) among trios of Asian ancestry, again with overlapping 95%CI. A test for heterogeneity between European and Asian trios under this model did not reach statistical significance (likelihood ratio test=3.11 with 1 df; p=0.07).
A lower minor allele frequency (MAF) at rs987525 among Asians compared to Europeans (0.078 vs. 0.260, respectively), resulting in fewer informative Asian parents, could explain differences in statistical significance. Linkage disequilibrium (LD) patterns for parents of European and Asian ancestry were similar (Supplementary Figure 4). Haplotype analysis of markers in this region strengthened evidence from Asian trios somewhat, but could not overcome limitations due to low MAF (data not shown).
SNPs in or near two other genes yielded genome wide significance: ABCA4 on 1q22.1 and MAFB on 20q12 (Table 2). Among 237 SNPs mapping near MAFB, a group of 17 SNPs located 20–60Kb 3′ of MAFB’s single exon defined a region of signal including 6 SNPs with p<5*10−8. Figure 2a shows −log10(p-value) of these SNPs;; Figure 2b shows estimated OR(case) and 95%CI (the null hypothesis value is always 1) and Figure 2c notes their physical location and the MAFB exon. Supplementary Figure 5 shows LD patterns (as r2) for Asian and European parents.
A total of 210 SNPs mapped to the large ABCA4 gene (with 50 exons) on 1p22.1, and a 78Kb region encompassing 97 SNPs contained two SNPs yielding genome wide significance and several approaching this level (Figure 3a). Figure 3b presents estimated OR(case) and their 95%CI and Figure 3c shows their physical position. Supplementary Figure 6 shows LD (as r2).
Replication in independent samples focused on 5 SNPs (rs987525 in 8q24 region and 2 SNPs each in MAFB and ABCA4). Altogether 8,115 individuals from 1,965 CL/P families were drawn from several populations (Supplementary Table 2). Family-based association tests (FBAT, equivalent to the allelic TDT under an additive model in independent trios) were conducted in each population separately and pooled over all families (Supplementary Table 3). Table 3 shows each SNP was nominally significant in populations of similar ancestry to our GWAS sample. Specifically, European ancestry families (both European and European American) gave the strongest evidence for rs987525 in 8q24, while families of Asian ancestry gave stronger evidence for MAFB and ABCA4. Two SNPs near MAFB showed different levels of significance in families of Asian ancestry compared to families of European ancestry. Interestingly, families from Argentina and Colombia confirmed rs987525 in 8q24, while Guatemalan families (who had more Native American ancestry) did not. In Irish trios, conditional logistic regression gave an estimated OR(case)=1.75 (95%CI=1.31–2.35) for rs987525, although a nearby SNP (rs1530300) was even stronger (p=0.00008). Haplotype analysis on 11 SNPs across this 8q24 region yielded still stronger evidence from these 293 Irish trios (data not shown).
Among unrelated Irish controls, the A allele frequency at rs987525 was 0.143, substantially lower than among Irish case parents (0.247). Using allele frequencies from independent control samples from Northern Europe (Denmark, Ireland, Norway), population attributable risks (PAR) were: rs13041247 near MAFB gave PAR=11.1% (95%CI=6.7–15.4), and rs560426 near ABCA4 gave PAR=9.9% (95%CI=6.7–13.2). Similar analysis on rs987525 in 8q24 in Danish and Irish controls gave PAR=10.4% (95%CI=8.4–12.5).
Supplementary Table 1 presents estimated OR(case) and allele frequencies for genes showing signal at or near genome-wide significance. These included recognized or potential candidate genes: PAX7 on 1p36, VAX1 on 10q25.3, plus SNPs between NTN1 on 17p13 and a putative gene LOC728685 (previously predicted to be a protein coding gene). Among 70 SNPs spanning 221Kb around PAX7, 6 had 10−7<p<10−5. Among 13 SNPs in VAX1 spanning 90Kb, two SNPs (rs7078160 and rs4752028) approached genome wide significance with TDT and conditional logistic regression (see Supplementary Table 1 for the latter model). SNP rs7078160 was among the most significant in the German case-control GWAS11 and achieved genome wide significance in an expanded set of case-parent trios13. NTN1 on 17q13.1 spanned 259Kb and included 1 SNP (rs9788972) achieving genome wide significance and 6 other SNPs yielding evidence between 10−8<p<10−6. SNPs giving strong signals were clustered in the 5′ end of this gene and encompassed LOC728867. Supplementary Table 4 lists all SNPs with p<10−5 among all trios (4a), trios of Asian ancestry (4b) and trios of European ancestry (4c).
We sequenced the single MAFB exon, plus four conserved elements 3′ of MAFB, and identified a rare missense variant (H131Q) which was predicted to be damaging to the protein structure (Supplementary Table 5). An additional 357 cases and 360 controls from the Philippines were genotyped, among whom 24 unrelated cases and 5 controls carried this variant. The difference in allele frequencies was significant (p=0.0002), and a TDT on Filipino families was marginally significant (p=0.08), although low MAF meant few informative trios. The H131Q variant was not present in 760 members of the CEPH diversity panel (individuals from 50 populations) nor in 180 European cases and controls. We also sequenced the 50 exons of ABCA4, and identified 27 missense variants, 2 of which were predicted to be damaging (R1443H and N380K, Supplementary Table 5).
Whole mount in situ hybridization analysis of Mafb and immunodetection of expressed Mafb was carried out in mice. Mafb mRNA and protein were expressed in both craniofacial neuroectoderm and neural-crest derived mesoderm between embryonic (e) day 13.5–14.5 (Figure 4). Expression was strong in epithelium around the palatal shelves and in the medial edge epithelium during palatal fusion. After fusion, Mafb expression was stronger in oral epithelium compared to mesenchymal tissue. Similar expression studies for Abca4 were negative for palatal expression.
Case-parent trio designs have two important advantages: 1) as a family based design they are robust to confounding due to population stratification, a critical concern for multi-center studies; and 2) family based tests can provide greater statistical power compared to case-control designs for rare diseases14. Robustness becomes a primary concern when samples from multiple populations are combined, as in this consortium. Large sample sizes are required to achieve extreme levels of statistical significance demanded by GWAS, and for most birth defects this means combining samples across populations. Pooling samples can increase statistical power at the cost of increasing genetic heterogeneity, so checking for heterogeneity among sub-groups remains prudent. As seen here, there can be dramatic differences in p-values between subgroups, even when the direction and estimated magnitude of effects are similar.
In this GWAS, two recognized genes/regions were confirmed (IRF6 and chr. 8q24) and two genes not previously associated with CL/P were identified (ABCA4 and MAFB). MAFB is expressed in the mouse palatal shelf. A rare missense mutation in MAFB (H131Q) was over-represented in Filipino cases and absent in other populations. The H residue is strongly conserved across species (Supplementary Figure 7), and this change is predicted to impair protein function. MAFB is a transcription factor shown to play a role in development of the hindbrain structures, thymus, interneurons, pancreatic islet cells and the hematopoetic system15. Its expression pattern in the mouse is consistent with some role in the developing lip and palate. ABCA4 is a member of a superfamily of transmembrane proteins, and mutations in ABCA4 play a major role in the etiology of Stargardt disease and related retinopathies (www.ncbi.nlm.nih.gov/omim). Although there is no evidence of any relationship with clefting, more than 30 missense, frameshift and splice site variants in this large gene have been reported.
It is possible all evidence of linkage and association observed here represents indirect associations with other genes or regulatory elements outside any gene. The success of this CL/P GWAS reflects its large sample size, the robust family based approach and inclusion of samples from populations of different ancestry which confirmed previous findings and identified new genes needing further study.
Supplementary Figure 1: Principal components analysis based on 33,078 randomly selected, independent SNPs among unrelated parents of cases with an isolated, non-syndromic oral cleft and unrelated individuals from HapMap Phase II populations (YRI, CEU and CHB/JPT). PC1 accounted for 8.89% of the total genetic variation, while PC2 accounted for 1.57% of the observed genetic variation.
Supplementary Figure 2: Q-Q plot for −log10(p-values) of all autosomal SNPs not flagged for QC metrics among 1908 complete CL/P case-parent trios. The gray shaded region indicates 95% confidence band for order statistics. The numbers on the top axis indicate the respective locations for (ordered) expected −log10 p-values. (e.g., the number 1 (10) indicates the expected value, on the −log10 scale, for the minimum (i.e. the tenth smallest) p-value). Clearly visible is a departure from the expected values, beyond what can be explained by random chance.
Supplementary Figure 3: Estimated OR(case) from a conditional logistic regression] and their 95%CI under an additive model for 78 SNPs in chr. 8q24. The minor allele among European parents was the target allele. The most significant SNP (rs987525) is noted by the star.
Supplementary Figure 4: Linkage disequilibrium (LD) patterns among parents of European and Asian ancestry (measured as r2) for markers in chr. 8q24 region showing evidence of linkage and association at the genome wide level of significance. The position of rs987525 is noted by the diamond.
Supplementary Figure 5: Linkage disequilibrium (LD) patterns among parents of European and Asian ancestry (measured as r2) for markers in MAFB showing evidence of linkage and association at the genome wide level of significance.
Supplementary Figure 6: Linkage disequilibrium (LD) patterns among parents of European and Asian ancestry (measured as r2) for markers in ABCA4 showing evidence of linkage and association at genome wide level of significance.
Supplementary Figure 7: Comparative amino acid sequence alignment of vertebrate Mafb and location of the conserved H131Q variant found in humans, using amino acid sequences from the Ensembl genome browser (release 56) as aligned by ClustalW.
We sincerely thank all of the families at each recruitment site for participating in this study, and we gratefully acknowledge the invaluable assistance of clinical, field and laboratory staff who contributed to making this work possible. We are particularly grateful to Kate Durda and Jamie L’Heureux of the University of Iowa for assistance with samples and phenotype data: Laure Henkle for technical assistance; Judith Resick, Carla Brandon, and Kathy Bardi of the University of Pittsburgh for assistance with recruitment; Wendy Carricato, Kathleen Deeley, and Joseph Ruff of the University of Pittsburgh for sample handling; Poorav Patel and Margaret Rose of Johns Hopkins for assistance with data analysis; Felicia Cheah of the National University of Singapore for sample processing and management; Marcie Feldkamp and John Carey, Utah Birth Defects Network, for assistance with recruitment and case-review; Dawnya Pearce, Utah State University for sample collection and processing. Funding to support data collection, genotyping and analysis came from several sources, some to individual investigators and some to the consortium itself. The consortium for GWAS genotyping and analysis was supported by the National Institute for Dental and Craniofacial Research through U01-DE-004425; “International Consortium to Identify Genes & Interactions Controlling Oral Clefts”, 2007–2009; TH Beaty, PI. Funding for individual investigators and the replication studies include: R01-DE-014581 (THB); R37-DE08559 (JCM, MLM), R01-DE09886 (MLM, LM, LLF), R01-DE012472 (MLM), R01-DE014677 (ACL, MLM), R01-DE016148 (MLM), P50-DE016215 (JCM, MLM), R21-DE016930 (MLM, EEC, AEC), R01-HD390661 and R01-DE016877 (RGM). JK02-AEO15291 (ACL), March of Dimes Basil O’Connor award #FY 98-0718 & Research Grant #6-FY01-61 (ACL); NIH R03-AR-055313 & NIH P50-DE-16215 (project #3) (MD), K99-DE-018441 (RM). Smile Train Foundation supported data collection in Chengdu (EWJ, BS). This research was supported in part by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (AJW). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Dental and Craniofacial Research, nor the National Institutes of Health.