Because these case-parent trios came from different populations (), we conducted a principal components analysis (PCA) on all parents to document genetic variation in our consortium (Supplementary Figure 1
). Approximately 50% of parents could be classified as Asian and 45% as European, with remaining parents being of African or “other” ancestry (including mixed). Transmission disequilibrium tests (TDT) on autosomal SNPs in 1908 CL/P case-parent trios showed strong evidence of linkage and association for multiple markers (see QQ plot in Supplementary Figure 2
), which clustered into specific chromosomal regions (). Multiple SNPs on chr. 8q24 and 4 SNPs in IRF6
showed genome-wide significance (p<5×10⁻⁸
). In addition, SNPs in two genes not previously associated with CL/P (ABCA4
on chr. 1p22.1 and MAFB
on 20q12) achieved genome-wide significance (), and three potential candidate genes (PAX7
on chr. 1p36, VAX1
on 10q25.3 and NTN1
on 17p13) had one or more SNPs near genome-wide significance (Supplementary Table 1
). We stratified these trios into 825 trios of European ancestry () and 1038 of Asian ancestry () as a check for consistency across racial groups (omitting 45 case-parent trios of African or “other” ancestry). Interestingly, trios of European ancestry (including European Americans) showed stronger support for chr. 8q24, while Asian trios gave the most significant evidence for both new and old candidate genes, with weaker evidence for 8q24. However, p-values cannot be the only criterion when interpreting these results.
Number of trios by recruitment site noting complete and incomplete trios (those with 1 parent missing).
Figure 1 Manhattan plots of −log10(p-values) from transmission disequilibrium test (TDT) for autosomal SNPs on CL/P case-parent trios (omitting SNPs flagged for QC). (a) Results based on all 1908 CL/P trios; (b) Results based on 825 CL/P case-parent trios of European …
Estimated OR(case) for SNPs showing genome wide significance in 4 regions under an additive model plus minor allele and its frequency among all parents and among parents of European and Asian CL/P cases.
Multiple SNPs in 8q24 showed evidence at or near genome-wide significance in the allelic TDT. The strongest individual SNP was rs987525 () in both the total sample and the European sub-group (p-value=1.43×10⁻¹⁶ in the total sample), as in two previous case-control studies12,13. In our trios, rs987525 showed significant over-transmission of the A allele, giving OR(transmission)=1.78 (95%CI=1.55–2.05). Among 825 trios of European ancestry, this OR(transmission) was larger (2.01 with 95%CI=1.69–2.38) than among Asian trios (1.39 with 95%CI=1.09–1.78). Both groups were nominally significant (p-value=5×10⁻¹⁶ for European trios; p-value=0.00893 for Asian trios), and both yielded similar patterns of over-transmission despite differences in p-values shown in .
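The allelic TDT and the OR(transmission) reported above can be illustrated with a minimal Python sketch. It assumes the standard McNemar-type form of the test, in which only heterozygous parents contribute and the statistic compares how often the target allele is transmitted versus not transmitted. The function name and the transmission counts are hypothetical; they are chosen only so that the ratio equals 1.78 and are not the study's actual counts.

```python
import math

def allelic_tdt(transmitted: int, untransmitted: int):
    """Allelic TDT from heterozygous parents (McNemar-type statistic).

    transmitted / untransmitted: number of times the target allele was
    passed on (or not) by a heterozygous parent to the affected child.
    """
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)                # 1-df chi-square statistic
    p_value = math.erfc(math.sqrt(chi2 / 2))     # upper tail of chi-square, 1 df
    odds_ratio = b / c                           # OR(transmission)
    se_log_or = math.sqrt(1 / b + 1 / c)         # standard error of log(OR)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return chi2, p_value, odds_ratio, (ci_low, ci_high)

# Hypothetical counts, NOT taken from the study.
chi2, p_value, or_transmission, ci = allelic_tdt(178, 100)
print(f"chi2={chi2:.2f}, p={p_value:.2e}, OR={or_transmission:.2f}, "
      f"95%CI={ci[0]:.2f}-{ci[1]:.2f}")
```

Because the statistic depends only on heterozygous parents, the same odds ratio yields a much smaller p-value when more parents are informative.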
Conditional logistic regression was used to estimate genotype relative risks under an additive model as the odds ratio of being a case, OR(case), given each additional target allele (arbitrarily defined as the minor allele among parents of European ancestry). Supplementary Figure 3
presents estimated OR(case) for 78 SNPs in a region of signal on 8q24, where multiple SNPs showed distinct over- or under-transmission. Under the additive model, all trios gave an estimated OR(case)=1.73 (95%CI=1.36–2.03) for AT heterozygotes at rs987525 and OR(case)=2.99 (95%CI=1.26–4.10) for AA homozygotes. A more general model with separate effects for heterozygotes and homozygotes yielded estimates of OR(case|AT)=1.58 (95%CI=1.30–1.94) and OR(case|AA)=3.72 (95%CI=2.36–5.87) in the total sample. When trios were stratified into European and Asian ancestry groups, the additive model gave OR(case)=1.91 (95%CI=1.57–2.33) among trios of European ancestry, and OR(case)=1.42 (95%CI=1.08–1.85) among trios of Asian ancestry, again with overlapping 95%CI. A test for heterogeneity between European and Asian trios under this model did not reach statistical significance (likelihood ratio test=3.11 with 1 df; p=0.07).
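Two pieces of arithmetic in this paragraph can be checked with a short Python sketch. Under an additive (log-linear) model each additional copy of the target allele multiplies the odds by the same factor, so the homozygote odds ratio is the square of the per-allele value. The heterogeneity p-value is assumed here to be the upper tail of a chi-square distribution with 1 df evaluated at the reported likelihood ratio statistic. The variable and function names are illustrative only.

```python
import math

# Per-allele odds ratio reported for rs987525 in all trios.
or_per_allele = 1.73

# Additive model on the log-odds scale: k copies of the allele give OR**k.
or_heterozygote = or_per_allele ** 1
or_homozygote = or_per_allele ** 2
print(f"OR(case|AT)={or_heterozygote:.2f}, OR(case|AA)={or_homozygote:.2f}")

def chi2_upper_tail_1df(statistic: float) -> float:
    """P(X > statistic) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

# Reported likelihood ratio statistic for European vs. Asian heterogeneity.
p_heterogeneity = chi2_upper_tail_1df(3.11)
print(f"heterogeneity p-value ~ {p_heterogeneity:.3f}")
```

The squared per-allele estimate reproduces the reported homozygote value of 2.99, and the tail probability comes out close to the reported p=0.07, above the conventional 0.05 threshold.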
A lower minor allele frequency (MAF) at rs987525 among Asians compared to Europeans (0.078 vs. 0.260, respectively), resulting in fewer informative Asian parents, could explain differences in statistical significance. Linkage disequilibrium (LD) patterns for parents of European and Asian ancestry were similar (Supplementary Figure 4
). Haplotype analysis of markers in this region strengthened evidence from Asian trios somewhat, but could not overcome limitations due to low MAF (data not shown).
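The link between MAF and the number of informative parents can be made concrete with a small Python sketch. It assumes Hardy-Weinberg proportions, under which the expected fraction of heterozygous (and therefore TDT-informative) parents is 2p(1−p) for allele frequency p. The function name is illustrative.

```python
def expected_heterozygote_fraction(maf: float) -> float:
    """Expected share of heterozygous parents under Hardy-Weinberg proportions."""
    return 2 * maf * (1 - maf)

# Minor allele frequencies at rs987525 reported in the text.
informative_asian = expected_heterozygote_fraction(0.078)
informative_european = expected_heterozygote_fraction(0.260)
ratio = informative_european / informative_asian

print(f"Asian parents:    {informative_asian:.3f}")
print(f"European parents: {informative_european:.3f}")
print(f"European / Asian: {ratio:.2f}")
```

Under this assumption roughly 14% of Asian parents and 38% of European parents would be informative at this SNP, which is consistent with the weaker statistical support from Asian trios despite a similar direction of effect.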
SNPs in or near two other genes yielded genome wide significance: ABCA4
on 1p22.1 and MAFB
on 20q12 (). Among 237 SNPs mapping near MAFB
, a group of 17 SNPs located 20–60Kb 3′ of MAFB
’s single exon defined a region of signal including 6 SNPs with p<5×10⁻⁸
. shows −log10
(p-value) of these SNPs; shows estimated OR(case) and 95%CI (the null hypothesis value is always 1) and notes their physical location and the MAFB
exon. Supplementary Figure 5
shows LD patterns (as r²
) for Asian and European parents.
Figure 2 Significance and effect size for SNPs near MAFB based on all CL/P trios. a) −log10(p-value) for allelic TDT for 17 SNPs near MAFB on chr. 20q12; b) Estimated OR(case) from a conditional logistic regression and their 95%CI under an additive model; …
A total of 210 SNPs mapped to the large ABCA4
gene (with 50 exons) on 1p22.1, and a 78Kb region encompassing 97 SNPs contained two SNPs yielding genome wide significance and several approaching this level (). presents estimated OR(case) and their 95%CI and shows their physical position. Supplementary Figure 6
shows LD (as r²
Figure 3 Significance and effect size for SNPs in and near ABCA4 based on all CL/P trios. a) −log10(p-value) for allelic TDT for 98 SNPs in or near ABCA4 on chr. 1p22.1; b) Estimated OR(case) from a conditional logistic regression and their 95%CI under …
Replication in independent samples focused on 5 SNPs (rs987525 in the 8q24 region and 2 SNPs each in MAFB and ABCA4
). Altogether 8,115 individuals from 1,965 CL/P families were drawn from several populations (Supplementary Table 2
). Family-based association tests (FBAT, equivalent to the allelic TDT under an additive model in independent trios) were conducted in each population separately and pooled over all families (Supplementary Table 3
). shows each SNP was nominally significant in populations of similar ancestry to our GWAS sample. Specifically, European ancestry families (both European and European American) gave the strongest evidence for rs987525 in 8q24, while families of Asian ancestry gave stronger evidence for MAFB
. Two SNPs near MAFB
showed different levels of significance in families of Asian ancestry compared to families of European ancestry. Interestingly, families from Argentina and Colombia confirmed rs987525 in 8q24, while Guatemalan families (who had more Native American ancestry) did not. In Irish trios, conditional logistic regression gave an estimated OR(case)=1.75 (95%CI=1.31–2.35) for rs987525, although a nearby SNP (rs1530300) was even stronger (p=0.00008). Haplotype analysis on 11 SNPs across this 8q24 region yielded still stronger evidence from these 293 Irish trios (data not shown).
P-values for replication of 5 SNPs showing genome wide significance in GWAS using independent families from various populations.
Among unrelated Irish controls, the A allele frequency at rs987525 was 0.143, substantially lower than among Irish case parents (0.247). Using allele frequencies from independent control samples from Northern Europe (Denmark, Ireland, Norway), the estimated population attributable risk (PAR) was 11.1% (95%CI=6.7–15.4) for rs13041247 near MAFB and 9.9% (95%CI=6.7–13.2) for rs560426 near ABCA4. A similar analysis of rs987525 in 8q24 using Danish and Irish controls gave PAR=10.4% (95%CI=8.4–12.5).
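The text does not state which formula was used for the PAR estimates. As one possible illustration, the Python sketch below applies the multi-level form of Levin's formula, assuming Hardy-Weinberg genotype proportions among controls, an additive model, and odds ratios used as approximations to relative risks. The function name and the input values are hypothetical and are not intended to reproduce the percentages reported above.

```python
def par_additive(allele_freq: float, or_per_allele: float) -> float:
    """Population attributable risk under an additive per-allele model.

    Assumes Hardy-Weinberg genotype proportions in the population and
    treats the odds ratio as an approximation to the relative risk.
    """
    q, r = allele_freq, or_per_allele
    genotype_freqs = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]   # 0, 1, 2 risk alleles
    relative_risks = [1.0, r, r ** 2]
    mean_rr = sum(f * rr for f, rr in zip(genotype_freqs, relative_risks))
    return 1 - 1 / mean_rr

# Hypothetical inputs, NOT taken from the study.
par = par_additive(allele_freq=0.20, or_per_allele=1.50)
print(f"PAR = {100 * par:.1f}%")
```

Other estimators, for example ones based on case genotype distributions, can give noticeably different values from the same inputs, so this sketch should be read only as an illustration of how allele frequency and effect size combine.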
Supplementary Table 1
presents estimated OR(case) and allele frequencies for genes showing signal at or near genome-wide significance. These included recognized or potential candidate genes: PAX7
on 1p36, VAX1
on 10q25.3, plus SNPs between NTN1
on 17p13 and a putative gene LOC728685
(previously predicted to be a protein coding gene). Among 70 SNPs spanning 221Kb around PAX7
, 6 had 10⁻⁷
. Among 13 SNPs in VAX1
spanning 90Kb, two SNPs (rs7078160 and rs4752028) approached genome wide significance with TDT and conditional logistic regression (see Supplementary Table 1
for the latter model). SNP rs7078160 was among the most significant in the German case-control GWAS11
and achieved genome wide significance in an expanded set of case-parent trios13
on 17p13.1 spanned 259Kb and included 1 SNP (rs9788972) achieving genome wide significance and 6 other SNPs yielding evidence between 10⁻⁸
. SNPs giving strong signals were clustered in the 5′ end of this gene and encompassed LOC728867
. Supplementary Table 4
lists all SNPs with p<10⁻⁵
among all trios (4a), trios of Asian ancestry (4b) and trios of European ancestry (4c).
We sequenced the single MAFB exon
plus four conserved elements 3′ of MAFB
, and identified a rare missense variant (H131Q) which was predicted to be damaging to the protein structure (Supplementary Table 5
). An additional 357 cases and 360 controls from the Philippines were genotyped, among whom 24 unrelated cases and 5 controls carried this variant. The difference in allele frequencies was significant (p=0.0002), and a TDT on Filipino families was marginally significant (p=0.08), although low MAF meant few informative trios. The H131Q variant was not present in 760 members of the CEPH diversity panel (individuals from 50 populations) nor in 180 European cases and controls. We also sequenced the 50 exons of ABCA4
, and identified 27 missense variants, 2 of which were predicted to be damaging (R1443H and N380K, Supplementary Table 5).
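The case-control comparison for the H131Q variant can be roughly checked with a Python sketch. The test actually used is not named in the text, so this example assumes a Pearson chi-square test on carrier status and further assumes that the 24 and 5 carriers are counted among the 357 cases and 360 controls. The function name is illustrative.

```python
import math

def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 df
    return chi2, p_value

# Carriers vs. non-carriers, using the counts given in the text
# (assumed denominators: 357 cases and 360 controls).
chi2, p_value = pearson_chi2_2x2(24, 357 - 24, 5, 360 - 5)
print(f"chi2={chi2:.2f}, p={p_value:.1e}")
```

Under these assumptions the p-value is of the same order of magnitude as the reported p=0.0002; an allele-based or exact test would give a somewhat different figure.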
Whole mount in situ hybridization analysis of Mafb and immunodetection of expressed Mafb was carried out in mice. Mafb mRNA and protein were expressed in both craniofacial neuroectoderm and neural-crest derived mesoderm between embryonic (e) day 13.5–14.5 (). Expression was strong in epithelium around the palatal shelves and in the medial edge epithelium during palatal fusion. After fusion, Mafb expression was stronger in oral epithelium compared to mesenchymal tissue. Similar expression studies for Abca4 were negative for palatal expression.
Figure 4 Mafb, and not Abca4, is expressed during the development of the secondary palate in the mouse. In situ hybridization for Mafb on whole mount e13.5 embryos (a–d) shows expression in craniofacial ectoderm, vibrissae, and neural-crest derived mesoderm …