|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: AJ ØS RTL AJW KC JCM HKG. Performed the experiments: AJ ØS HKG. Analyzed the data: AJ ØS HKG. Contributed reagents/materials/analysis tools: AJ ØS LC HKG RTL AJW JCM TTN. Wrote the paper: AJ ØS HKG AJW JCM.
Orofacial clefts are common birth defects of complex etiology, with an excess of males among babies with cleft lip and palate, and an excess of females among those with cleft palate only. Although genes on the X chromosome have been implicated in clefting, there has been no association analysis of X-linked markers.
We added new functionalities in the HAPLIN statistical software to enable association analysis of X-linked markers and an exploration of various causal scenarios relevant to orofacial clefts. Genotypes for 48 SNPs in 18 candidate genes on the X chromosome were analyzed in two population-based samples from Scandinavia (562 Norwegian and 235 Danish case-parent triads). For haplotype analysis, we used a sliding-window approach and assessed isolated cleft lip with or without cleft palate (iCL/P) separately from isolated cleft palate only (iCPO). We tested three statistical models in HAPLIN, allowing for: i) the same relative risk in males and females, ii) sex-specific relative risks, and iii) X-inactivation in females. We found weak but consistent associations with the oral-facial-digital syndrome 1 (OFD1) gene (formerly known as CXORF5) in the Danish iCL/P samples across all models, but not in the Norwegian iCL/P samples. In sex-specific analyses, the association with OFD1 was in male cases only. No analyses showed associations with iCPO in either the Norwegian or the Danish sample.
The association of OFD1 with iCL/P is plausible given the biological relevance of this gene. However, the lack of replication in the Norwegian samples highlights the need to verify these preliminary findings in other large datasets. More generally, the novel analytic methods presented here are widely applicable to investigations of the role of X-linked genes in complex traits.
Orofacial clefts are relatively common craniofacial birth defects, with a birth prevalence of about 1–2/1000. They require extensive surgical, nutritional, dental, speech, behavioral and medical interventions, and thus impose a substantial economic and personal health burden , . As with other complex traits, multiple genetic and environmental risk factors are thought to underlie these birth defects .
Clefts are characterized by a particularly strong genetic component, as evidenced by studies of familial recurrence risk and heritability . First-degree relatives of an affected individual have a 30–40 fold higher recurrence risk compared with the background population , , and heritability estimates were recently reported to be as high as 91% for CL/P and 90% for CP in a large Danish twin study .
The strong genetic component to clefting has spurred decade-long efforts to identify the genes underpinning these complex birth defects. Increased collaborative efforts coupled with major advances in high-density SNP genotyping arrays have heralded a new era of gene discovery for complex traits. For clefts, the first genome-wide association study (GWAS) identified a strong signal on chromosome 8q24 in individuals of central European ancestry. This association was subsequently replicated in three independent GWAS –. In addition to the 8q24 locus, these studies also identified significant associations with several genes, including v-maf musculoaponeurotic fibrosarcoma oncogene homolog B (MAFB), ATP-binding cassette sub-family A member 4 (ABCA4), ventral anterior homeobox 1 (VAX1), paired box 7 (PAX7) and interferon regulatory factor 6 (IRF6) . IRF6 is particularly noteworthy, being the only gene on this list to be confirmed as a major player for clefts through approaches other than GWAS –.
The above studies and virtually all association studies of clefts (as well as other complex traits) have targeted primarily autosomal markers, without attention to potential contributions of X-linked common gene variants. This is partly because most of the statistical methods originally designed for association analysis were only targeted towards the analysis of autosomal markers. The finding that X-linked gene variants may be implicated in a number of complex traits – has encouraged the development of statistical methods for analyzing X-linked markers. The majority of these methods are extensions of the transmission/disequilibrium test (TDT) , for example, the X-linked sibling TDT (XS-TDT) , the reconstruction-combined TDT for X-chromosome markers (XRC-TDT) , the X-linkage TDT (X-TDT) , and the X-chromosome pedigree disequilibrium test (XPDT) . Two additional tests compare observed versus expected distributions of a specific allele or haplotype in affected siblings, conditional on the parental genotypes. These are the 1) association in the presence of linkage (APL) test that accommodates X-chromosome markers (X-APL) , and 2) X-linked quantitative trait loci linkage mapping (X-QTL) . Despite several attractive attributes of these methods (summarized in Table 1), an important limitation is that they can only provide a p-value for association, but not estimates of genetic risk. The exception is the likelihood ratio test (LRT) developed by Zhang and co-workers (X-LRT) .
An exploration of X-linked variants is particularly relevant when a complex trait is more common in one sex – as is seen for the two main types of orofacial clefts. For this study, we implemented new functionalities in the HAPLIN software  to enable X-chromosome marker analysis and an estimation of relative risks associated with either a single or double dose of an allele or haplotype. We considered various model parameterizations that address a range of causal scenarios relevant to an X-linked disease locus. This included allowing for different baseline risks for males and females (to reflect the higher prevalence of CL/P in males), and accounting for X-inactivation in females (where one of the two copies of the X chromosome is inactivated in each cell to ensure similar gene dosage across the two sexes ).
We applied this new method to a collection of 48 SNPs in 18 cleft candidate genes on the X chromosome and used data from two national cleft studies in Scandinavia – one of the largest collections of orofacial cleft triads available. To our knowledge, no previous study has explored the role of X-linked genes in the etiology of orofacial clefts using association.
From a population-based case-control study of orofacial clefts in Norway (1996–2001), 311 iCL/P and 114 iCPO case-parent triads were available for the current analysis. As a replication sample, we had a further 166 iCL/P and 69 iCPO case-parent triads from a population-based study of orofacial clefts in Denmark (1991–2001). The sample distribution according to cleft type, sex, and population is provided in Table 2. Details regarding study design and participants have been provided elsewhere , . The 18 X-linked genes and 48 SNPs for the current analysis derive from a larger candidate-gene based study in which we examined 357 candidate genes in the same study populations . A detailed description of these 18 X-linked genes and 48 SNPs is provided in the online Table S1.
The statistical software HAPLIN  was specifically designed to analyze genetic and environmental risk factors in offspring-parent triads and case-control collections. It is based on log-linear modeling as originally described – and implements a full maximum-likelihood model for estimation. HAPLIN computes explicit estimates of relative risks with asymptotic standard errors and confidence intervals. It uses the expectation-maximization (EM) algorithm to impute genotypes that are missing at “random=" (e.g. due to failed genotyping) and those missing by “design=" (e.g. if DNA from a family member was not available for genotyping). The EM algorithm can also reconstruct unknown haplotype phase for haplotype analysis, but on the X chromosome this is not needed since phase can be deduced directly when data are non-missing.
Central to HAPLIN is a generalized linear model (glm) being estimated from the observed genotype frequencies–the M-step of the EM algorithm. The E-step consists of all three imputations described above, performed in a single step. The algorithm then alternates between the M-step and E-step until convergence is achieved. The results obtained from the EM algorithm correspond to the maximum-likelihood estimates of the model, which include gene frequencies and all relative risk parameters. However, to obtain the correct standard errors, confidence intervals and likelihood ratio test (LRT) for the models, HAPLIN corrects for the fact that imputation has taken place. If the imputed data were used uncorrected, they would seem to contain more information than what is actually available in the raw data.
HAPLIN allows a range of X-chromosome models to be estimated, depending on assumptions made about allele effects in males versus females. The following models, summarized in Table 3, are examples of risk parameterizations provided by HAPLIN:
With fewer parameters, more assumptions are needed but power is improved (provided the model is correct). The log-linear model implemented in HAPLIN extends the X-LRT approach described by Zhang and colleagues , which essentially estimates Model 5 for single SNP markers, although with a different parameterization. Even though stratifying the case-parent triads by sex may reduce the statistical power to detect an association, we performed sex-specific analyses to verify whether there is a stronger gene effect in one sex versus the other. Model 1, with the assumption of equal baselines, is less relevant to our data. Models 3 and 5 can be estimated assuming equal allele frequencies between males and females, or by completely stratifying on sex. All models have natural extensions from single SNP markers to testing multiple haplotypes.
Our study comprised one of the largest collections of case-parent triads for clefts from two population-based sources. Even so, it is unlikely that gene-effects are large enough that a single gene would remain significant after a full Bonferroni correction, even when the two sexes are analyzed together. The stringent requirement of ensuring an overall Type 1 error rate of ≤5% will be overly conservative, especially in a study such as this one where the candidate genes had been selected a priori for their potential roles in clefting.
To adequately deal with the multiple-testing issue, we followed a two-part approach. First, all p-values computed from single SNPs or haplotypes within a gene were summarized into a single p-value for that gene, corrected for within-gene multiple testing. Second, the single gene p-values were plotted together in a quantile-quantile (QQ) plot, which would reveal p-values more significant than what would be expected by chance given the number of genes being tested.
For the first part, HAPLIN includes the function haplinSlide (for details, see http://www.uib.no/smis/gjessing/genetics/software/haplin/or the HAPLIN help pages in R), which automates the analysis of a long sequence of single SNPs, or alternatively a sequence of overlapping sliding-windows with haplotypes. Overlapping sliding-windows will in principle increase the chance of “bracketing=" a causal variant by having a haplotype with SNPs on each side of the variant. However, estimating haplotypes entails a certain loss of power due to the higher number of alleles taken into account and the unknown phase of the haplotypes. It is a priori not obvious whether a single-SNP approach or a sliding-window haplotype approach would have the best chance of detecting an association; therefore, we performed both single-marker and haplotype analyses on the current data. We restricted HAPLIN to use up to 4 SNPs in a sliding-window haplotype analysis, which typically ensures bracketing causal variants with two SNPs on each side. Using more than 4 SNPs in a haplotype would most likely entail an unnecessary loss of power. With longer haplotypes, the number of possible haplotypes increases exponentially, and any given haplotype will be found in very few individuals (if any), making effect estimation difficult. After running a sliding-window analysis, the results were summarized in the form of a single, overall p-value associated with each gene. This is done by choosing the smallest p-value from the series of windows. If the tests from each window were independent, this p-value could be adjusted with a standard Bonferroni or Šidák correction. However, when analyzing SNPs in strong LD, and in particular when analyzing overlapping windows of length four haplotypes (which share three SNPs with the neighboring haplotype), there is a strong correlation between results obtained from nearby SNPs or windows, and a Bonferroni correction would be far too strict. The suest function in HAPLIN corrects the minimum p-value for the dependencies in an optimal manner. This is achieved by first saving the individual (family) score contributions from each window estimation. Under the null hypothesis of no disease association with the haplotypes within a window, the score contributions in that window follow an approximate multivariate normal distribution with mean zero and a covariance matrix which can be computed from the estimated individual score contributions. This allows computing the standard score p-value for that window using a chi-squared test statistic . Then, over a series of windows, the combined score contributions from each window follow approximately a multivariate normal distribution with mean zero and an (empirical) covariance matrix computed from the combined score values. This allows computing the theoretical null distribution of the minimum p-value and thus in calibrating (correcting) the observed minimum p-value. This approach is closely related to the principle of “seemingly unrelated estimation=", as implemented in the statistical package Stata .
For the second part, QQ plots were used to inspect visually whether our analyses produced more significant results than what would be expected by chance. The rationale behind the procedure is that if no genes have an effect, the computed p-values should derive from a uniform distribution and thus follow the straight diagonal in the QQ plot. If some genes have an effect (with correspondingly lower p-values), their p-values are likely to show up in the QQ plot as a clear deviation from the diagonal, exhibiting a higher significance than expected by chance under the null hypothesis. We generated QQ plots for each cleft type (iCL/P and iCPO) after combining p-values from the Norwegian and Danish HAPLIN analyses using Fisher’s method . The pQQ function in HAPLIN includes confidence bands to assess the size of any deviations from the diagonal. The confidence bands are computed under the null hypothesis of no association between genes and disease. In that case, the order statistic of the sorted p-values will follow a beta distribution with parameters determined by the number of assessed p-values, and the 2.5 and 97.5 percentiles from the corresponding beta distribution provide lower and upper limits, respectively.
HAPLIN version 4.1 is implemented in the publicly available R statistical package  and is freely downloadable from our web site at http://www.uib.no/smis/gjessing/genetics/software/haplin. A user-friendly graphical user interface (GUI), which includes some (but not all) of the HAPLIN functionalities, is also freely available at http://haplin.fhi.no.
Clinicopathological information from all participating families and biologic specimens for DNA extraction were obtained with the written informed consent of the mothers and fathers. The study was approved by the Norwegian Data Inspectorate, the Regional Committee on Research Ethics for Western Norway, and the respective Institutional Review Boards of the US National Institute of Environmental Health Sciences (NIH/NIEHS) and the University of Iowa. For the Danish orofacial clefts study, study approval was obtained from the regional scientific-ethical committee. All aspects of this research are in compliance with the tenets of the Declaration of Helsinki for human research (http://www.wma.net).
Figures 1 and and22 display the QQ plots for the analysis of 48 SNPs in 18 cleft candidate genes on the X chromosome, without stratification by sex. We first tested a multiplicative model assuming the same relative risk for males and females and no X-inactivation (Model 2 in Table 3). The analyses in Figure 1 were performed one SNP at a time, whereas Figure 2 shows the results of the sliding-window haplotype analysis of up to 4 SNPs per window. Overall, the QQ plots show only weak evidence of an association with the oral-facial-digital syndrome 1 gene (OFD1, formerly known as CXORF5) on chr Xp22 in the Danish iCL/P samples. This association was not replicated in the Norwegian iCL/P samples. For iCPO, there was no evidence of association in either population.
To assess for different gene-effects in males versus females, with effects more evident in one sex stratum than the other, we repeated the analyses for males and females separately. Figures 3 and and44 show the results of haplotype analysis using a sliding-window of up to 4 SNPs. These analyses correspond to Model 3 in Table 3, in which we assume a multiplicative model with different baseline risks for males and females, different relative risks for males and females, and no X-inactivation. The corresponding single-marker analyses for female and male cases are provided in the online Figures S1 and S2, respectively. The association with OFD1 is now only observed in the Danish iCL/P males, with a relative risk of 2.2 (95% confidence interval: 1.3–3.7; p-value: 3.6×10−3) with one copy of the variant (minor) allele at the OFD1 SNP rs2285635 when compared with the reference (major) allele.
Lastly, we tested Model 4 in which we assume different baseline risks for males and females, a shared relative risk for males and females, and X-inactivation in females. The results of haplotype analysis are depicted in Figure 5 and the corresponding single-marker analyses are shown in the online Figure S3. Again, the only notable association is with OFD1 in the Danish iCL/P sample only.
Our study was strongly motivated by the unequal sex distribution observed in the two main types of clefts (CL/P and CPO) as well as previous findings of a strong link between X-linked genes and orofacial clefts. X-linked genes have been identified primarily in syndromic forms of clefting and include midline 1 (MID1) on chr Xp22, T-box 22 (TBX22) on chr Xq21.1, PHD finger protein 8 (PHF8) on chr Xp11.22, and RNA binding motif protein 10 (RBM10) on chr Xp11.23. Mutations in MID1 cause the X-linked Opitz GBBB syndrome (OSX, MIM 300000), a congenital midline malformation syndrome characterized by clefting of the lip/palate and a variety of other pathologies . An association between specific haplotypes in MID1 and isolated CL/P was later reported in an Italian population . Mutations in TBX22 cause the rare X-linked syndrome ‘cleft palate with ankyloglossia’ (CPX; MIM 303400) . TBX22 belongs to the T-box family of genes that are evolutionarily highly conserved and recognized for playing key roles in early vertebrate development. Consistent with the CPX phenotype in humans –, the expression of Tbx22 in mice is localized to the developing palatal shelves and the base of the tongue. Further, a genome-wide linkage analysis of families with iCL/P identified a susceptibility locus near TBX22, suggesting that the linkage signal may emanate from this gene . Mutations in TBX22 have also been identified in patients with isolated CPO , . As to PHF8, mutations in this gene cause the X-linked mental retardation syndrome Siderius that includes cleft palate as a common phenotypic feature , . PHF8 is a histone lysine transcription activator expected to have a wide range of functions. Finally, deep sequencing of exons on the X chromosome identified RBM10 as the gene causing TARP (MIM 311900), a syndromic form of cleft palate .
Given these strong links between X-linked genes and syndromic clefts, we examined whether variants in X-linked genes might also be relevant for isolated forms of clefting. To enable X-linked gene analysis, we first developed a method that can i) perform both single-marker and haplotype analyses, ii) generate relevant relative risk estimates with confidence intervals, and iii) assess several etiological models relevant to an X-linked disease locus. The higher prevalence/penetrance for CL/P in males compared with females may be due to hemizygosity for an X-linked disease locus . Therefore, we first analyzed males and females together to account for the possibility that an X-linked disease locus might contribute to clefting risk in both sexes, followed by sex-stratified analyses to investigate whether the X-linked disease locus affects one sex in particular. X-chromosome inactivation in females was also taken into account in the models by treating a heterozygous female (X1X2) as the average of the two homozygotes (X1X1 and X2X2).
Overall we found only weak associations with OFD1 in the Danish iCL/P sample, with no replication in the Norwegian iCL/P sample. As noted in our previous analyses of fetal gene-effects in the same study samples , the genotype call rates for the Norwegian sample (DNA extracted from blood) and Danish sample (DNA extracted from buccal swabs) were 99.6% and 99.1% respectively. Hence, the lack of replication of the OFD1 association in the Norwegian iCL/P samples cannot be ascribed to differences in DNA source. Moreover, different genotype frequencies do not imply differences in gene effects on the phenotype.
In sex-stratified analyses, the association of OFD1 in the Danish iCL/P sample was confined to males only, suggesting a possible sex-specific effect as previously reported for several loci when only males were analyzed . Separate analyses for males and females can be potentially more powerful than a pooled analysis if the X-linked disease locus affects only one sex . An alternative explanation for the apparent sex-specific effect in our data is the potentially higher statistical power to detect an effect of OFD1 in males due to the larger number of male iCL/P cases available for analysis.
Mutations in OFD1 underlie the X-linked dominant oral-facial-digital syndrome type 1 (OFD1, MIM 311200), which is characterized by malformations of the face, oral cavity and digits, as well as lethality in the vast majority of affected males . Featuring prominently among the orofacial abnormalities are median cleft lip, clefts of alveolar ridge at the area of lateral incisors, and cleft palate . To our knowledge, however, no genetic association with this gene has previously been reported in isolated clefts.
For haplotype analysis of X-chromosome markers, the standard log-linear approach needs some modification. First, many diseases show markedly different birth prevalences in males versus females, as is the case for orofacial clefts, with higher prevalence of CL/P in males and higher prevalence of CPO in females . This difference may be due to causes other than the effect of the particular locus under study, such as loci differentially expressed between males and females. To avoid confounding of the genetic risk estimation by the sex effect, separate baseline risks should be assumed for males and females; i.e. in males the effect of an allele A2 should be measured relative to the reference allele A1 in males, whereas in females the effects of A1A2 and A2A2 should be measured relative to A1A1 in females (Table 3). Second, it is not clear a priori whether a single dose of A2 in males has an effect comparable to a single dose in females (A1A2) or to a double dose (A2A2), or is entirely different from the effect in females. This is aptly illustrated by craniofrontonasal syndrome (CFNS; MIM 304110), an X-linked developmental disorder that paradoxically affects heterozygous females more severely than hemizygous males . In addition, there is the usual question of the relationship between A1A2 and A2A2 in females; i.e. whether there is a dose-response relationship, a dominant relationship etc. Third, the basic log-linear model in HAPLIN assumes the same allele frequencies for males and females in the background population. While this is a relatively robust assumption for autosomal markers, it is less obvious for X-linked markers. For populations that are genetically relatively homogeneous (like the Danes and Norwegians), however, this assumption seems to be reasonable.
The most extreme solution to the problems raised above is to run separate analyses on males and females. HAPLIN has a special option for doing this, which allows several different response patterns in females to be explored, whereas males are obviously restricted to single-dose effects. To increase statistical power, HAPLIN allows joint analyses of males and females, which reduce the number of parameters to be estimated. The analyses all assume the same allele/haplotype frequencies but different baseline risks for males and females. In addition, various response patterns can be specified.
Another important consideration is X-inactivation in females which may produce a special relationship between male and female allele effects. In females, one X allele in each cell is inactivated (except for a very few second X chromosomes that escape inactivation). A deleterious X-linked allele would be expected to be more detrimental to males than females because males have no chance of compensation by a corresponding normal allele . Because X-inactivation in women occurs in early embryogenesis, women will tend to have a mixture of cells expressing either their mother’s or father’s X-linked genes (mosaicism). This heterogeneity can have different consequences on a female’s disease response depending on how the two X chromosomes are distributed among tissues . The normal expectation would be an equal distribution of the two cell types . However, there may be “founder=" effects due to the relatively small number of cells in the embryo at the time of X-inactivation, or differential cellular reproduction rates, leading to an imbalance between the two cell types.
If the risk associated with allele A2 in females is RRF, genotypes A1A1 and A2A2 will produce risks BF and BFRRF, respectively. Assuming a 5050 cell type distribution, the risk associated with genotype A1A2 will be an average of the two homozygotes, i.e. (BF+BF RRF)/2=BF(1+RRF)/2 (Model 4 in Table 3). Technically speaking this is not a log-linear model, so HAPLIN replaces the heterozygous risk with BF√RRF–the geometric mean of the two homozygous risks. This results in a log-linear model, and as long as RRF is neither very small nor very large, the approximation is reasonable. For males, the single-dose effect is then assumed equal to female homozygotes, i.e. RRM=RRF (denoted simply as RR in Model 4). HAPLIN also provides an extension of this model to accommodate an unbalanced cell type distribution.
The basic likelihood models in X-LRT  and HAPLIN are similar; for a single SNP, X-LRT uses zero-dose males as reference and estimates relative risks for single-dose males, and zero-, single-, and double-dose females independently. This corresponds to Model 5 in HAPLIN, and in this special case the results are identical, except that HAPLIN chooses reference levels differently. In addition, HAPLIN provides a number of other modeling options on the X-chromosome, and the software provides a full framework for autosomal and X-linked haplotype association analyses in a candidate-gene or GWAS setting.
To summarize, this is the first candidate-gene based study to investigate the role of X-linked genes in orofacial clefting. Although OFD1 is a highly plausible gene for clefts, the lack of replication in the Norwegian iCL/P sample highlights the need to confirm these preliminary findings in other datasets. The novel methods presented here address several scenarios relevant to an X-disease locus and can easily be adapted to explore the role of X-linked genes in other complex disorders.
Single-marker analyses of female cases only. These sex-specific analyses are based on Model 3 in which we assume different baseline risks for males and females, different relative risks for males and females, and no X-inactivation.
Single-marker analyses of male cases only, Model 3.
Single-marker analyses taking X-inactivation into account. These analyses are based on Model 4 in which we assume different baseline risks for males and females, a shared relative risk for males and females, and X-inactivation.
Summary of the 18 X-linked cleft candidate genes and 48 SNPs analyzed in this study.
We are grateful to all participating families for making this study possible. We also thank Drs. Abee L. Boyles, Min Shi and Jennifer R. Harris for their valuable comments on earlier drafts of this manuscript.
Competing Interests: The authors have declared that no competing interests exist.
Funding: Center for Inherited Disease Research (CIDR) is fully funded through a federal contract from the National Institutes of Health (NIH) to The Johns Hopkins University, Contract Number N01-HG-65403. This research was supported in part by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences; by NIH grants DE08559, P60 DE13076, NIH P30 ES05605, and RO1 DE-11948-04; and by the Norwegian Research Council (NFR 177522). The authors also thank the United States National Institute of Dental and Craniofacial Research for underwriting a significant proportion of the genotyping costs by CIDR. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.