|Home | About | Journals | Submit | Contact Us | Français|
Non-syndromic orofacial clefts, i.e. cleft lip (CL) and cleft palate (CP), are among the most common birth defects. The goal of this study was to identify genomic regions and genes for CL with or without CP (CL/P).
We performed linkage analyses of a 10 cM genome scan in 820 multiplex CL/P families (6,565 individuals). Significant linkage results were followed by association analyses of 1,476 SNPs in candidate genes and regions, utilizing a weighted false discovery rate (wFDR) approach to control for multiple testing and incorporate the genome scan results.
Significant (multipoint HLOD ≥3.2) or genome-wide-significant (HLOD ≥4.02) linkage results were found for regions 1q32, 2p13, 3q27-28, 9q21, 12p11, 14q21-24 and 16q24. SNPs in IRF6 (1q32) and in or near FOXE1 (9q21) reached formal genome-wide wFDR-adjusted significance. Further, results were phenotype dependent in that the IRF6 region results were most significant for families in which affected individuals have CL alone, and the FOXE1 region results were most significant in families in which some or all of the affected individuals have CL with CP.
These results highlight the importance of careful phenotypic delineation in large samples of families for genetic analyses of complex, heterogeneous traits such as CL/P.
Orofacial clefts (OFC) are a major public health problem, affecting one in every 500–1000 births worldwide. It has become increasingly apparent that the genetic contribution to OFC is complex, probably heterogeneous, and likely due to interacting effects of multiple loci coupled to environmental covariates. Over 400 syndromes have been described in which a cleft of the lip and/or palate is a feature . Among these syndromes are numerous genetic examples such as cytogenetic abnormalities (e.g., trisomies 13 and 18, 4p–), and single gene Mendelian disorders (e.g. van der Woude syndrome, Stickler syndrome). Non-syndromic (sometimes termed ‘isolated’) cleft lip with or without cleft palate (NS CL/P) is the most prevalent type of OFC, and is the focus of the current study.
Familial aggregation of NS CL/P has been reported in the scientific literature for 250 years [2,3,4,5,6,7]. Since then, segregation analyses of CL/P have supported models that include genes of major effect [8,9,10], and analyses of recurrence risk patterns have been consistent with estimates ranging from 3 to 14 genes (possibly interacting) for CL/P [11,12,13,14,15].
Mutation screens of more than 20 NS CL/P candidate genes find that 2–6% of the total number of individuals with NS CL/P have mutations in genes such as MSX1, FOXE1, GLI2, JAG2, LHX8, SATB2, RYK1 and others [16,17,18]. That is, the large majority of individuals with NS CL/P (94–98%) do not have mutations in any of a wide range of plausible candidate genes. In parallel, many candidate gene association studies have also been carried out seeking specific polymorphic variants that increase the risk of NS CL/P [1,19,20,21,22].
Most notably, the gene identified in van der Woude syndrome (IRF6 ) has been shown by our group  and confirmed in multiple other populations (Italy ; Belgium ; US ; Thailand ; US/Taiwan/Singapore/Korea ; South America ; Norway ) to show highly significant association with NS CL/P and may explain about 12–18% of NS CL/P . Recently we have identified a specific SNP (rs642961) in IRF6 that disrupts the binding site for the transcription factor AP-2α, and that represents the etiologic locus within IRF6, at least in some populations .
In addition to analyses of candidate genes/loci, several genome-wide linkage screens of NS CL/P have been published [33,34,35,36,37,38]. About 8–10 regions have positive, although not highly significant, results in the individual studies. We performed a genome scan meta-analysis of these 5 published genome scans plus data from our additional studies in six countries and found several regions with genome-wide significant results , notably a novel region on 9q21. The purpose of this study is to describe a larger genome-wide scan for NS CL/P with twice as many families and individuals as were reported previously , to summarize the results of our follow-up fine mapping and candidate gene efforts, and to perform phenotypic-specific analyses. Thus the aim was to identify those genetic regions that appear to contribute to OFC overall, and also those regions that appear to have an effect on specific OFC phenotypes.
This study was conducted in two phases: (1) linkage genome-scan, and (2) fine mapping and candidate gene studies. For the genome-scan, there were 820 families ascertained in six countries (Philippines, Colombia, China, India, Turkey, USA), with 6,565 total individuals (summarized in table table1).1). Of the total family members, 4,373 were genotyped (1,514 affected and 2,859 unaffected). For fine-mapping and candidate gene studies, there were 861 families, with 7,047 total individuals (summarized in table table2),2), representing a subset of the 820 genome-scan families plus additional families that were ascertained after the genome-scan was completed. Most of the families were extended multiplex kindreds, i.e. multigenerational families with ≥2 affected individuals. The largest family had 51 individuals; there were 5 pedigrees with 40–50 individuals, 10 with 30–40; 46 with 20–30, many with 10–20 individuals. Each study had approval by the appropriate institutional review boards, and all study subjects provided informed consent to participate. Note that the original 388 genome-scan families  were also included in the current study (denoted ‘CIDR-7 families’ in the previous report ).
The phenotype was NS CL/P, i.e. for families to be included, it was necessary that the proband have NS CL/P (i.e. no other anomalies), and that no other family member have an indication of an OFC syndrome (e.g. lip pits). Each study included evaluations of family members by clinical geneticists to rule out syndromic forms of CL/P.
For some analyses, the total families were divided into the following non-overlapping subsets based on family members reported to be affected (whether or not all affected family members participated in these studies): CL, those families in which all affected family members had CL only; CLP (all affected family members had CL plus CP); CL+CLP (at least one affected family member had CL only and at least one had CL plus CP). The numbers of families and individuals in each of these subsets are summarized in table table11 (genome-scan dataset) and table table22 (fine-mapping dataset). There was one additional subset that had very few families and was therefore not included in the subset studies reported here, i.e. families in which at least one affected family member had CP only.
The genome scans were conducted in multiple batches (the previous report  summarized results from the first two batches of genotyping). Microsatellite genotyping for all batches was performed at the Center for Inherited Disease Research (CIDR) using STRP markers with average spacing of about 9 cM (1–19 cM). Marker set details and methods are available at www.cidr.jhmi.edu.
Trace data was reviewed and genotypes were called using the software package Genotyper from Applied Biosystems. The raw genotype sizes for each genotype were exported from Genotyper and binned using SAS. Study samples were plated with 2 CEPH Utah control samples 1331-1 and 1331-2 repeated on each plate. For each microsatellite marker the CEPH control sample sizes for each plate were compared to the project average and used to adjust the raw sizes for all data for that plate. Data with similar raw sizes were grouped into proposed ‘binned’ alleles. Plots were made for each marker showing the overall as well as by plate allele sizes and proposed bins for both the raw and adjusted size data. Each plot was manually reviewed by at least two data analysts. Trace data for unexpected sizes for CEPH controls and outliers that didn't fall within bins were reviewed in Genotyper. Trace data was also reviewed for markers with high numbers of mendelian errors according to Pedcheck . Binning parameters and plate adjustments were done manually if needed.
After the entire dataset was genotyped, all alleles were re-binned across batches before performing the analyses reported here. The final combined dataset consisted of 375 STRP markers which were genotyped in all three sample submissions. Quality control was monitored by investigator-provided blind duplicate samples. 220 blind duplicate pairs across all three sample submissions resulted in a reproducibility rate of 99.95%.
After the results of the linkage genome scan, the families were genotyped by CIDR for a custom panel of 1,536 single nucleotide polymorphisms (SNPs), of which 1,476 were analyzed (list available on request, see below for reasons some SNPs were not analyzed).
SNPs were chosen to saturate the 1-LOD regions for each genome-wide significant region found in the linkage scan, targeting all known genes in each region, with 2–6 SNPs per gene and filling intergenic gaps >100 kb with 1 SNP per 20 kb. In addition, SNPs in cleft candidate genes were included in the custom panel, and were chosen from a variety of resources, including published linkage and association studies on clefts, genome-wide scans, gene knockout experiments in mice, studies of chromosomal rearrangements in humans, and gene-expression analyses in human and mouse embryonic tissues [20,40,41,42,43], including the Craniofacial and Oral Gene Expression Network  (COGENE). We also searched the Serial Analysis of Gene Expression (SAGE) libraries to see whether a particular gene of interest is expressed in the relevant embryonic tissues at the pertinent developmental stage (weeks 5–6 for fusion of the embryonic lip and weeks 7–10 for fusion of the palatal shelves ). Data from two gene expression approaches were also considered, the ENU project , and the Developmental Genome Anatomy Project (DGAP ). Finally genes involved in Mendelian forms of clefting, particularly those that may manifest as phenocopies of NS CL/P (identified through OMIM) were considered.
To guide the selection of SNPs within the candidate genes and regions, the genome browser of the International HapMap Consortium was used. SNPs were prioritized according to prior evidence of an association with clefting, a preference for coding SNPs and those located in putative regulatory regions, haplotype-tagging properties (htSNPs), and minor allele frequency (MAF) of at least 10%, ideally across the multiple ethnicities represented in this study.
A combination of software, including HAPLOVIEW version 2.05, BEST (Best Enumeration of SNP Tags ), and SNP Browser) (Applied Biosystems; Foster City, Calif., USA), was used to evaluate MAF, deviations from Hardy-Weinberg equilibrium (HWE), inter-marker distance, as well as LD patterns and haplotype block structures (for the selection of htSNPs). SNP assays were designed by Illumina (San Diego, Calif., USA) and a ‘design score’ was computed for each SNP using an algorithm that rigorously tests the performance of that SNP on an Illumina GoldenGate platform.
SNP genotyping was performed at CIDR on a BeadLab system using the Illumina GoldenGate technology  (Illumina; San Diego, Calif., USA). Genotypic data for 1,489 of 1,536 attempted SNPs were released by CIDR. The overall rate of missing genotypic data was 0.35%. As part of quality control, CIDR genotyped a parent-child trio plus one duplicate child, which yielded a duplicate error rate of 0.009% and an overall parent-child discordance rate of zero. Of the 1,489 SNPs released after CIDR quality control checks, 13 more were not included in our analysis due to poor performance in one or more of the individual study populations (e.g. low MAF or high rates of mendelian inconsistencies). Therefore, results are reported here for the remaining 1,476 SNPs.
The inheritance of each marker (genome-wide STRPs and custom SNPs) was assessed with PedCheck  to test for inconsistencies due to non-paternity or other errors. Marker allele frequencies are required by linkage analysis approaches and were estimated in the founders of the families, separately by country due to the diverse ethnicities. Other genetic model parameters are summarized in table table3,3, and were taken from the results of segregation analysis (Philippines – unpublished results; China ; India ; Colombia ; Caucasians [9, 52, 53]).
Standard multipoint parametric linkage statistics were calculated at each of the 375 genome-scan STRPs, in particular the heterogeneity LOD score or HLOD. HLODs are based on the admixture heterogeneity test  where the recombination fraction (θ) and the proportion of linked families (α) are simultaneously estimated. Simulation studies have shown that although the estimate of the proportion of linked families may be not be precise, HLODs are a powerful method for detecting linkage in the presence of heterogeneity [55,56,57]. The descent graph method  implemented in computer program SIMWALK2 was used for the multipoint HLOD calculations.
HLOD calculations were done under the best-fitting dominant and recessive models for each study population (estimated from segregation analysis, see preliminary analyses section above and table table3).3). Maximizing LOD scores over a range of genetic models is valid for simultaneously evaluating linkage and determining the most likely genetic model (without adjustment to significance levels and without need to correct for ascertainment) provided that there is indeed linkage . Furthermore, if an oligogenic model is suspected (as seems likely for CL/P), or if significant heterogeneity exists, some of the causal genes may act in a dominant fashion and others recessive.
To combine the multipoint results across the study populations we summed the multipoint HLODs. Because each study population had different allele frequency estimates and different genetic model parameters it was not optimal to perform a combined linkage analysis pooling all families across all populations. Simulation studies  have shown that HLOD analyses that pooled multiple datasets resulted in a loss of power for detecting linkage compared to summing the HLODs from individual studies.
For determining genome-wide significance for the multipoint linkage calculations of the families, standard guidelines  were followed. A Bonferroni correction was applied to account for the multiple markers, data subsets, and genetic models tested. To do so, the desired α level of 0.05 was divided by [375 (number of markers) × 4 (number of data subsets) × 2 (number of genetic models tested)] to yield a Bonferroni-adjusted alpha of 0.000017. The χ2 corresponding to the p value (i.e. 18.5) was divided by 2(loge10) = 4.605 to yield 4.02 as the LOD score threshold for genome-wide significance for this study. Values between 3.2 and 4.02 were considered as significant but not genome-wide significant.
As an additional method used primarily to estimate the weighting factors for the weighted FDR approach summarized below, we used the genome scan meta-analysis method (GSMA [38, 61, 62]). The GSMA is a nonparametric rank ordering method that can combine genome scan methods across studies with different markers and different statistical tests. In simulation studies the GSMA detected linkage with power comparable to or greater than that obtained by performing a combined linkage analysis of all data [62, 63].
For the GSMA procedure, the genome was divided into bins with bin-width selected such that there were at least two bins on the smallest chromosome and at least one marker was genotyped within each bin. Therefore, for combining the current 10 cM genome scans a bin width of 30 cM was selected (i.e. 130 bins across the genome). For each of the individual populations, bins were assigned a rank (R, with values from 1 to 130) according to the maximum linkage statistic of markers in each bin (multipoint HLOD scores). Any tied bins were assigned equal R's based on the mean of the sequential ranks for those bins.
Because the study populations covered a wide range of sample sizes (see table table1),1), we weighted the R statistics based on sample size. Various weighting strategies have been proposed for the GSMA [61, 62], and simulation studies showed that weighting increased the power of the GSMA to detect linked loci . We used a weighting factor based on the total number of genotyped individuals (N-genotyped) – the ranks within each population were multiplied by the square root of N-genotyped divided by the mean of N-genotyped over all studies. The weighting factors calculated for each population are listed in table table33.
The GSMA identified 30 cM bins that are best supported across the studies. In order to narrow the regions of positive findings, we used an extension of the GSMA that involves repeating the GSMA with different bin starting points and then determining the Minimum Region of Maximum Significance (MRMS ). In brief, we repeated the GSMA twice more shifting the starting point for the binning procedure to first 10 cM and then to 20 cM. This determined the 10 cM MRMS for each positive finding. Given that the genome scan STRP panel averaged 10 cM between markers, 10 cM was the limit of resolution for the meta-analyses.
The transmission disequilibrium test (TDT) was used to assess association in the presence of linkage disequilibrium between the 1,476 fine-mapping markers and CL/P. The Family Based Association Test extension of the TDT (FBAT [65,66,67]) was used to assess association between each SNP and CL/P. To control for multiple testing and to include information from the prior linkage genome scan we performed a weighted approach to false discovery rate (wFDR ). In this approach, per marker FBAT p values were adjusted using weights derived from the GSMA/MRMS results, and allowing for a maximum 10% FDR.
Figure Figure11 shows a summary of the multipoint HLOD results for the total dataset, and for the CL, CLP and CL+CLP subsets. Depicted are the largest multipoint HLODs on each chromosome, under dominant and recessive models. The HLOD genome scan revealed genome-wide significant linkage results (i.e. multipoint HLOD ≥4.02) in the regions 3q27–28 (under a dominant model for CL/P), 9q21 (dominant model), and 14q21–24 (recessive model). Three additional regions reached nominal significance (i.e. multipoint HLOD ≥3.2): 1q32 (under a dominant model for CL/P), 2p13 (dominant model), and 16q24 (recessive model). Of those regions, 1q32, 9q21, 12p11, 14q,21–24 and 16q24 were also statistically significant in the GSMA analysis. Figure Figure2A2A 2B2B 2C2C 2D2D 2E2E 2F2F–G shows the detailed HLOD and GSMA results for each of the chromosomal regions with significant results.
As can be seen in figure figure1,1, the phenotypic subsets have differing results. In the CL subset, the region on chromosome 1q32 was significant under a dominant model (see fig. fig.2A).2A). In the CL+CLP subset, regions on chromosome 9q21 (dominant) and 16q24 (recessive) were significant (see fig. fig.2D2D and andGG for details). In the CLP subset a region on chromosome 12p11 was significant under a dominant model (see fig. fig.2E).2E). Table Table44 summarizes the chromosomal regions with significant linkage results in the TOTAL dataset and the phenotypic subsets, plus potential CL/P candidate genes in those significant regions.
Figure Figure33 shows the weighted p values and weighted FDR alpha levels for the total dataset, and the CLP and CL+CLP datasets. One SNP in IRF6 and 3 SNPs in or near FOXE1 were the only ones reaching formal weighted-FDR-adjusted significance (p < 10–7, and p < 10–6, respectively) in the total dataset. Although not reaching formal genome-wide significance, additional SNPs on 1q, 6q and 9q were near significant (p < 0.001, results not shown in detail).
Of the phenotypic subsets, only CLP had SNPs reaching genome-wide significance: i.e., 5 SNPs in or near FOXE1 on 9q. Although not reaching genome-wide significance, the most significant SNP in both the CL and CL+CLP phenotypic subsets was in IRF6 (p < 0.001 and p < 0.002, respectively), and was the same SNP significant in the TOTAL dataset. Table Table55 summarizes the genome-wide significant SNPs in the total dataset and in the CLP subset.
The genome scan revealed multiple significant linkage results (i.e. multipoint HLOD ≥3.2) in the regions 1q32, 2p13, 3q27–28, 9q21, 14q21–24 and 16q24 for the TOTAL dataset, with the 3q, 9q and 14q regions also genome-wide significant (HLOD ≥4.02). The 1q32 region result was also significant in the CL subset but not the others, implying that the significant linkage was due to the CL families. Similarly, the 9q21 and 16q24 results were also genome-wide significant in the CL+CLP subset. In the CLP subset, an additional region of significance was found for the 12p11 region. The remaining two regions (2p11, 3q27–28) were not significant in any individual subset, implying that these regions may be involved in OFC overall, rather than any specific phenotype. Also, note that in each case where there were significant findings in one of the phenotypic subgroups, the estimated proportion of linked families (α) was larger in the subgroup than in the total dataset (see table table4),4), further supporting the notion that phenotypic sub-grouping may be a useful approach to reduce heterogeneity across cleft families.
Follow-up fine-mapping association studies found SNPs in IRF6 (chromosome 1q) and in or near FOXE1 (chromosome 9q) that reached formal FDR-adjusted significance (see table table5),5), and SNPs in 6q were near significant. Consistent with the linkage results, the fine-mapping results were also phenotype dependent. The IRF6 SNP rs2013162 (significant in the TOTAL dataset) was only positive in the CL and CL+CLP subsets (although not reaching formal genome-wide significance in those subsets). Similarly, FOXE1 SNPs (significant in the TOTAL dataset) were genome-wide significant only in the CLP subset.
Following is a brief discussion of the most notable results by chromosome. Although not summarized in detail, several of these regions also have chromosomal rearrangements reported in CL/P [47, 69, 70].
The 1q32 region with significant linkage and association results is the location for the IRF6 gene that was identified as the etiologic locus for van der Woude syndrome (VDWS, MIM# 119300 ) and also significantly associated with non-syndromic CL/P (SNP rs2235371 [22,24,25,26,28,29,30, 71]).
Recently we have identified a specific IRF6 SNP (rs642961) where the ancestral allele is highly evolutionarily conserved and where functional assays show disruption of an AP-2α binding site that likely represents the major NS CL/P etiologic locus within IRF6 . Notably the results with SNP rs642961 were significant only in families with one or more CL-alone affected members , consistent with the linkage and association results reported here. In the current study, IRF6 SNP rs2013162 reached genome-wide significant association with CL/P in the TOTAL dataset; further the association results were positive in only the CL and CL+CLP subsets (although not reaching formal genome-wide significance in those subsets). Based on Haploview version 4.1 , the current study SNP (rs2013162) lies between the other two SNPs, but is not in LD with them.
The current results demonstrating significant linkage to the IRF6 region provide powerful support for the IRF6 association findings. Interestingly, until the current study we only found weak linkage signals to the 1q32 region  and/or the IRF6 locus itself  (LODs <1.0); the current study thus demonstrates the utility of large sample sizes with careful sub-phenotyping in detecting subtle effects that otherwise are only detectable with association methods.
Note that in our previous linkage report , a 2q32–35 region had highly significant GSMA results. In the current study with double the sample size there are no longer genome-wide significant results with this region. The 2q32–35 region contains the gene for DNA-binding protein SATB2 (a.k.a. KIAA1034) that has been identified through translocation breakpoint analysis as a gene involved in cleft palate , and that also shows site- and stage-specific expression in murine palate development. It may be that some population or phenotypic subsets led to the previous positive results and additional analyses are on-going to continue to investigate SATB2.
The 2p13 region with significant linkage results contains TGFA, the gene with the first reported association with CL/P  and numerous confirmatory reports [19, 75]. There were no genome-wide significant association results with TGFA fine-mapping SNPs (rs3732253, rs1807968, rs374640, rs377122), but the positive linkage results warrant continued study of this region.
The 3q27–28 region had genome-wide significant linkage results, consistent with our previously reported result near this region in Chinese families , but there have been no other published reports of either positive linkage or positive association of non-syndromic CL/P to this region. A potential candidate gene in this region is TP63.
TP63 encodes a member of the p53 family of transcription factors. An animal model, p63–/– mice, has been useful in defining the role this protein plays in the development and maintenance of stratified epithelial tissues. p63–/– mice have several developmental defects which include the lack of limbs and other tissues, such as teeth and mammary glands, which develop as a result of interactions between mesenchyme and epithelium. Furthermore, recently completed detailed expression studies  during murine craniofacial development to dissect the molecular pathogenesis of the bilateral cleft lip and cleft palate seen in Tp63-deficient mice. Analysis of key signaling molecules revealed that the craniofacial defects resulted from increased Bmp4 signaling acting antagonistically on Fgf8 and Shh .
In humans, mutations in the TP63 gene are associated with ectodermal dysplasia and cleft lip/palate syndrome 3 (EEC3 [78, 79]); split-hand/foot malformation 4 (SHFM4); Hay Wells syndrome  (ankyloblepharon-ectodermal defects-cleft lip/palate); ADULT syndrome (acro-dermato-ungual-lacrimal-tooth); limb-mammary syndrome; Rap-Hodgkin syndrome (RHS [81, 82]) and NS OFC . Both alternative splicing and the use of alternative promoters results in multiple transcript variants encoding different proteins, which may underlie the wide phenotypic spectrum associated with mutations in TP63 [84, 85].
Therefore, the custom SNP panel included 10 SNPs within TP63: rs4396880, rs1920266, rs4575879, rs7616178, rs4607088, rs7619526, rs4686529, rs7619549, rs9810322, and rs1515490. None of these SNPs reached genome-wide significance in the wFDR analyses, but given the strong linkage signal and the biological plausibility of this gene, our group is continuing analyses in this region. Furthermore, given the interacting nature of the involvement of Tp63 in the murine model  we are simultaneously exploring TP63, BMP4, FGF8 and SHH in our multiplex families. Notably, as described below, we also saw genome-wide significant linkage to region 14q which is near the location for BMP4.
Region 9q21 had the most significant linkage result in our previous report . This region also reached genome-wide significance in the current study, and showed the most positive HLOD and GSMA results in the TOTAL and CL+CLP datasets. There are a number of possible candidate genes in this region including the human homolog of patched (PTCH, 9q22.3), receptor tyrosine kinase-like orphan receptor 2 (ROR2, 9q22), transforming growth factor beta receptor type 1 (TGFBR1, 9q33-q34), zinc finger protein 189 (ZNF189, 9q22-q31) and FOXE1 (a.k.a. TTF2, TITF2; 9q22). Therefore multiple SNPs from the 1-LOD and 2-LOD intervals around the 9q linkage peak were genotyped, as well as SNPs in each of these candidate genes.
The only SNPs genotyped in the 9q region to reach genome-wide significant association were in or near FOXE1 (see table table5),5), and were significantly associated with CL/P in the TOTAL and CLP datasets. FOXE1 is a forkhead domain-containing transcription factor (a.k.a. TTF2, TITF2; 9q22); mutations in FOXE1 are associated with congenital hypothyroidism, thyroid agenesis and cleft palate in humans (Bamford-Lazarus syndrome, MIM 241850) and mice [86,87,88]. The forkhead gene family (Fox), originally identified in Drosophila, encodes transcription factors with a conserved 100-amino acid DNA-binding motif called the ‘forkhead domain’  and regulates diverse developmental processes in eukaryotes. Rare missense mutations in FOXE1 have been associated with isolated clefting [17, 90].
FOXE1 is a major current focus of our research group, including studies of multi-species conservation, structure, function, and etiology. Genotyping the FOXE1 locus at a greater SNP density has excluded adjacent genes and narrowed the mutation to a 43 kb region including and upstream of FOXE1 (Moreno et al., unpublished results). Figure Figure44 shows the relative positions of the 9q21 SNPs that were genome-wide significant in the current study (see table table55 for list), as well as three additional fine-mapping SNPs recently found to be significantly associated with CL/P (rs894673; rs3758249; rs1867278; Moreno et al. unpublished results), along with the LD patterns estimated in the HapMap CEU population (Data Release 23a/phaseII Mar08, on NCBI B36 assembly, dbSNP b126) utilizing Haploview version 4.1 . Of the recently studied SNPs, rs3758249 is the one most strongly associated with CL/P in multiple populations (Moreno et al., unpublished results) and the associated SNPs from the current study are in a haplotype block with rs3758249 (see fig. fig.4).4). Notably, in both the current study and our recent FOXE1 studies, the strongest associations are seen with families in which one or more affected family members have CL plus CP, and little or no evidence of association in families with CL alone or CP alone. Further, the involvement of FOXE1 during primary palatogenesis is supported by the previously uncharacterized epithelial expression in the medial nasal and maxillary processes that will undergo fusion (Moreno et al., unpublished results).
This region did not achieve significant linkage results in the TOTAL dataset, however, it did reach significance in the CLP subset. Chromosomal rearrangements involving the 12p region result in phenotypes including CL/P or CP [69, 70], including Pallister-Killian syndrome (mosaic isochromosome 12p)  and others (del 12p ; Fryns syndrome ). However this region has not been previously investigated in non-syndromic CL/P and warrants further investigation.
The 14q21–24 region had genome-wide significant results in our previous report, and also reached genome-wide significant linkage results in the current study. This region contains PAX9, BMP4 and transforming growth factor beta 3 (TGFB3). Both Pax9 and Tgfb3 when inactivated in mice results in clefts of the secondary palate . Positive results have been seen in association studies of CL/P and TGFB3, and one missense mutation has been reported [19, 95]. However, none of the one PAX9 or five TGFB3 SNPs tested in the current study reached genome-wide significant association in any of the datasets.
BMP4 (14q22–23) is another plausible candidate gene in this region. Multiple murine model studies have demonstrated a role for the Bmp4 pathway in lip and palate fusion . In a Bmp4 conditional knockout mouse model , all embryos had bilateral cleft lip at 12 days post-conception but by 14.5 days only 22% still exhibited cleft lip, suggesting in utero healing. Our group has found BMP4 nonsense and missense mutations in CL/P, microform, and subclinical cleft cases that are absent in controls  and a recent report showed association of a common variant missense change with isolated cleft lip and palate in a Chinese population . Only one BMP4 SNP was included in the custom SNP panel (rs2147105), and was not significantly associated with CL/P in this study.
Given the strong linkage finding, and the biological plausibility of the possible candidates in this region, we are continuing to investigate BMP4 and other candidates in this region, and will incorporate GxG analyses as well (given the interacting patterns in the murine model recently reported ).
Chromosome 16q24.1 was first identified in a genome scan of Caucasian NSCLP sib pairs . Two candidate genes in this region were subsequently investigated: IRF8 and CRISPLD2. No association was seen with IRF8, but CRISPLD2 showed association in Caucasian and Hispanic families. Although the function of CRISPLD2 is unknown, its structure featuring the presence of a LCCL (Limulus factor C, Coch-5b2 and Lgl1) domain has been suggested to play a structural or immunologic role, or even be involved in cell motility which is required for effective cell migration. In situ hybridization studies  showed that CRISPLD2 is expressed in the mandible, palate, and nasopharynx regions during craniofacial development .
FOXC2 is another plausible candidate in this region since FOXC2 mutations cause lymphodema-distichiasis which has incomplete penetrance of cleft palate [100,101,102], the knockout has cleft palate, and decreased Foxc2 expression in the lateral nasal processes occurs in mice with SHH signaling inactivated in the cranial neural crest cells . We are currently testing FOXC2 and CRISPLD2 SNPs in our study populations, as these were not included in the fine-mapping SNPs genotyped in the current study.
The current study utilized linkage approaches in a large sample of multiplex CL/P families to identify six genomic regions with genome-wide significant findings, plus one additional region significant in one phenotypic subset. A follow-up fine-mapping SNP panel identified two genome-wide significant associations with SNPs in or near candidate genes (IRF6 and FOXE1), which have led to identification of the likely etiologic variants (IRF6 ; FOXE1 – Moreno et al., unpublished results). Note that the fine-mapping approach utilized here would only detect relatively common variants associated with CL/P. Therefore, sequencing of candidate genes is now required to find those variants that may have resulted in linkage signals but with family-specific etiologic variants that could not be detected by LD approaches such as the wFDR utilized here. We are also examining the genotyped SNPs for evidence of micordeletions that are causal and that may be flagged from apparent non-mendelian inheritance. As described above, we are also continuing to pursue those regions with significant linkage results with further fine-mapping efforts.
A striking result from these studies is that some of the significant findings could be attributed to specific phenotypic subsets. Indeed, some findings were significant only in specific phenotypic subgroups. Non-syndromic CL/P is a complex trait, with etiologic heterogeneity, so these results highlight the importance of careful phenotypic delineation in large samples of families as one method to begin to understand the observed etiologic heterogeneity in CL/P families, and as a method to implement in other human disorders with similar levels of etiologic heterogeneity.
There are a number of additional sub-clinical phenotypes now under analysis in our group that suggest increased risk for CL/P: dental anomalies [104, 105], deficiencies of the upper lip orbicularis oris muscle [106, 107], asymmetry [108, 109], and specific craniofacial measurements  in the non-overtly-affected relatives of individuals affected with CL/P. Such sub-clinical phenotyping holds great promise for addressing etiologic heterogeneity in CL/P families by allowing a subdivision of families into potentially more homogeneous subsets based on both the overt and sub-clinical phenotypes present, thus markedly increasing the power and specificity of our genetic studies.
Finally, identification of specific genes such as IRF6 and FOXE1 as well as the association with specific cleft phenotypes leads to a realistic expectation that improved recurrence risk and OFC prognoses may soon be possible using a combination of molecular testing and improved phenotyping in families.
URL's for this article are: CIDR: Center for Inherited Disease Research (http://www.cidr.jhmi.edu/); OMIM: Online Mendelian Inheritance in Man (http://www.ncbi.nlm.nih.gov/); Haploview: http://www.broad.mit.edu/haploview/haploview; HapMap: http://www.hapmap.org/; FBAT: http://www.biostat.harvard.edu/~fbat/default.html.
Many thanks to the families in our world-wide study sites who graciously agreed to participate, and without whom these studies would not be possible. The staff and collaborators at each study site were critical for successful completion of these studies: Dr. You-e Liu and Dr. Dan-ning Hu (China); Dr. Ajit Ray (India); Dr. Tuncbilek (Turkey); Carla Brandon, Kathy Bardi, Judith Resick, Dr. Katherine Neiswanger, Dr. Joseph Losee (USA, Pittsburgh); Katy Krahn, Amy Mach and Tamara Busch (USA, Iowa), Dora Rivera and Dr. Mauricio Camargo (Colombia). In the Philippines, Operation Smile International, Operation Smile Philippines, the HOPE Foundation, Bill and Kathy Magee, Edith Villanueva, Buena Nepomuceno, Henrietta Gamboa, Salie Onggada, Rachel Lim, and Gloria Melocoton were all critical to sample collection. Many thanks to Dr. Peter Chines of NHGRI for SNP mapping and annotation.
These studies were supported by NIH grants R37-DE08559, R01-DE09886, R01-DE012472, R01-DE014677, R01-DE016148, P50-DE016215, R03-DE016632, KO2-DE015291, M01-RR00084; and a March of Dimes grant #6-FY01-616. Genotyping services were provided by the Center for Inherited Disease Research (CIDR) which is fully funded through a federal contract from the National Institutes of Health to The Johns Hopkins University, contract number N01-HG-65403. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Dental and Craniofacial Research or the National Institutes of Health.