A meta-analysis of genome-wide linkage studies allows us to summarize the extensive information available from family-based studies, as the field moves into genome-wide association studies.
Here we apply the genome scan meta-analysis (GSMA) method, a rank-based, model-free approach, to combine results across eight independent genome-wide linkage scans performed on celiac disease (CD), including 554 families with over 1,500 affected individuals. We also use a hypergeometric distribution to investigate the agreement between the signals identified in this meta-analysis of linkage studies and those identified in genome-wide association analysis.
Not surprisingly, the most significant result was obtained in the HLA region. Outside the HLA region, suggestive evidence for linkage was obtained at the telomeric region of chromosome 10 (10q26.12-qter; p = 0.00366), and on chromosome 8 (8q22.2-q24.21; p = 0.00491). Testing signals of association and linkage within bins showed no significant evidence for co-localization of results.
This meta-analysis allowed us to pool the results from available genome-wide linkage studies and to identify novel regions potentially harboring predisposing genetic variation contributing to CD. This study also shows that linkage and association studies may identify different types of disease-predisposing variants.
Celiac disease (CD) is an inflammatory disorder of the small intestine with a complex etiology, triggered by dietary gluten. The disease prevalence is estimated to be up to 1% in Western populations. In addition to the classical gastrointestinal form, a variety of other clinical manifestations of the disease have been described, including atypical and asymptomatic forms . The most common extra-intestinal manifestation is dermatitis herpetiformis (DH), a blistering skin condition, currently regarded as a variant of CD, affecting one-quarter of the patients with gluten intolerance. Asymptomatic forms are characterized by the presence of histological changes, identified through screening programs on apparently healthy subjects.
The only genetic factor indisputably involved in CD is the human leukocyte antigen (HLA) locus, where a clear association has been found with HLA-DQ variants . However, the HLA association alone is insufficient to explain the hereditary nature of the disease , indicating that additional, non-HLA genes must also be involved in disease susceptibility.
Several independent genome-wide scans in multiplex families have identified genomic regions that may harbor susceptibility genes for CD, but few studies have shown significant evidence for linkage outside the HLA region, and there is little convincing replication of linked regions across studies. This is typical of linkage studies for complex diseases, and is likely due to the low power of studies to detect genes of relatively small effect and to the high degree of genetic heterogeneity among families. Promising loci include a region on 5q31-33 (CELIAC2) which was identified in several linkage studies [4,5,6] and in a meta-analysis and a pooled genotype analysis of data from the European Genetics Cluster on Coeliac Disease . A locus on chromosome 19p13.1 (CELIAC 4), which was identified in a Dutch linkage study , harbors a functional candidate gene implicated in intestinal barrier integrity (MYO9B, MIM: 602129) for which an intronic SNP shows significant association with CD .
More recently, one genome-wide association (GWA) study has been published , with follow-up of top regions confirming eight novel, non-HLA regions associated with CD . These studies, and other previous studies focusing on candidate genes, have identified evidence of association with common variants, generally with only minor effect on disease risk.
A meta-analysis of genome-wide linkage studies, to summarize the available information in these extensive family-based studies, is timely as the field moves into genome-wide association studies. We use the genome scan meta-analysis (GSMA) method  to combine results across linkage studies, and identify regions which may harbor genetic determinants for CD. We also investigate the agreement between signals of genome-wide linkage and association analysis to test whether CD-associated SNPs are over-represented in linked regions.
The genome scan meta-analysis (GSMA) method  was used to combine linkage results from CD genome screens. The GSMA method is designed to deal with dissimilarity in study design, marker panels, analysis methods, and linkage statistics used in different genome scans. The GSMA uses only genome-wide linkage results, with no original genotype or family data required. To apply the GSMA method the genome needs to be fragmented into bins of approximately equal length. Using a bin width of approximately 30 cM, 118 bins provided the most even coverage across autosomes based on the Marshfield linkage maps (http://research.marshfieldclinic.org/genetics/home/index.asp) , with chromosome 1 having 10 bins, and chromosomes 21 and 22 having 2 bins each .
For each scan, the maximum value of the linkage statistic obtained in each bin was identified. The bins were then ranked, from the lowest (rank = 1) to the highest (rank = 118) value of the linkage statistic across the genome. Bin ranks were then summed across studies and the summed rank (SR) was compared to its probability distribution under the null hypothesis of no linkage (i.e. assuming ranks were randomly assigned to a bin). The significance of the SR in each bin was assessed with the GSMA software which obtains an empirical p value for the observed SR in each bin, using Monte Carlo simulations permuting the bin location of the ranks within each study (http://www.kcl.ac.uk/schools/medicine/depts/memoge/research/epidemiology/gsma/index.html) . To control for multiple testing of bins, we used a Bonferroni correction. For 30 cM width bins, a p value of 0.05/118 = 0.00042 was required for genome-wide evidence of linkage, or a p value of 1/118 = 0.00847 for suggestive evidence of linkage over the autosomes, corresponding to the SR p values expected to arise by chance once in 20 GSMA studies, and once in a single GSMA study, respectively.
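The ranking, summing, and permutation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the GSMA software itself: the function names, the tie-breaking rule (bin order), the default number of simulations, and the add-one correction on the empirical p value are all assumptions of this sketch.

```python
import random

def rank_bins(stats):
    """Rank bins from lowest (rank 1) to highest (rank = number of bins).

    Ties are broken by bin order here (an assumption of this sketch).
    """
    order = sorted(range(len(stats)), key=lambda i: stats[i])
    ranks = [0] * len(stats)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def summed_ranks(rank_table):
    """Sum ranks across studies for each bin (one list of ranks per study)."""
    return [sum(column) for column in zip(*rank_table)]

def empirical_p_values(rank_table, n_sim=2000, seed=0):
    """Monte Carlo p value per bin: permute the bin location of the ranks
    within each study and count how often the simulated summed rank in a
    bin reaches the observed one."""
    rng = random.Random(seed)
    observed = summed_ranks(rank_table)
    exceed = [0] * len(observed)
    for _ in range(n_sim):
        shuffled = [rng.sample(ranks, len(ranks)) for ranks in rank_table]
        for b, null_sr in enumerate(summed_ranks(shuffled)):
            if null_sr >= observed[b]:
                exceed[b] += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return [(count + 1) / (n_sim + 1) for count in exceed]

def linkage_thresholds(n_bins, alpha=0.05):
    """Bonferroni thresholds: (genome-wide, suggestive)."""
    return alpha / n_bins, 1 / n_bins
```

With 118 bins, `linkage_thresholds(118)` gives the two thresholds quoted in the text, 0.05/118 and 1/118.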
Meta-analysis of CD was performed both unweighted (assuming equal contribution from each study) and weighted by study size. The weighting factor for each study was defined as the square root of (# affecteds – # families), scaled to a mean of 1. Analyses under different weighting functions produced similar results, and are not reported. The primary statistical analysis used 30 cM bin widths; chromosomes showing suggestive evidence for linkage (under weighted or unweighted analysis) were analysed further using bins of 10 and 20 cM width in order to improve the localization of linkage signals detected.
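The weighting factor defined above can be written out as a short sketch. The input format (one tuple of affected and family counts per study) is hypothetical, and applying the weights by multiplying each study's ranks before summing is an assumption about how the weighted analysis combines studies, not something stated in this section.

```python
import math

def study_weights(studies):
    """Weights proportional to sqrt(# affecteds - # families), scaled to mean 1.

    studies: list of (n_affected, n_families) tuples (hypothetical input format).
    """
    raw = [math.sqrt(n_affected - n_families) for n_affected, n_families in studies]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]

def weighted_summed_ranks(rank_table, weights):
    """Weighted summed rank per bin; each study's ranks are multiplied by its
    weight before summing (an assumption of this sketch)."""
    return [
        sum(w * r for w, r in zip(weights, column))
        for column in zip(*rank_table)
    ]
```

Because the weights average to 1, setting every weight to 1 recovers the unweighted summed rank.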
In total, 13 published genome scans of CD families were identified through a review of the literature and through PubMed search. Only whole genome-wide linkage scan analyses were considered, with partial scans and follow-up studies in candidate regions excluded. One study  was an update of a previous study, with a larger family collection, and the earlier study was therefore omitted . We also excluded a study of a single four-generation Dutch pedigree (17 affecteds) with disease transmission following a Mendelian pattern , and two studies performed in extended, consanguineous families from a Finnish genetic isolate  and a Bedouin population in the Negev Desert . These studies were not included as the loci involved in these families may not play a major role in CD as a multifactorial disorder represented in the other studies. Finally, we also excluded a small, low-powered study for which only best peaks were available from the published paper .
Authors of identified studies were contacted by e-mail and invited to contribute linkage results required for the meta-analysis. All groups agreed to participate in the meta-analysis, providing multipoint non-parametric linkage statistics across the genome, with results of one study being extracted from published graphs .
The eight independent CD studies analysed consisted of 554 families and over 1,500 affected individuals (table 1). All studies were of European ancestry and comprised sibships with at least two affected children or families with more distant affected relative pairs. X chromosome results were available for only four studies and were therefore not analysed.
Five studies included only patients diagnosed with classic CD, while three studies [5, 16, 22] also included patients with dermatitis herpetiformis (DH). DH and CD share a common immunogenetic background but may be heterogeneous gluten-sensitive diseases, and we therefore performed a separate analysis of the five CD-only studies.
Linkage results obtained from investigators consisted of computer files from analysis programs showing linkage statistics (NPL score, p value, etc.) at each marker genotyped or at specific genetic locations. For results with marker data [4, 5, 23], we first mapped markers to locations on the Marshfield linkage map and then determined the bin location of each marker using the previously defined bin boundaries. For results indexed by genetic location [6, 8, 16, 24], we determined the locations of the first and last markers genotyped on each chromosome and then rescaled the genetic locations of the linkage statistics accordingly, before dividing the chromosome into the required number of bins. This correction overcomes the problem of bin shifting when the first or last marker genotyped on a chromosome is located several cM from the telomere, and ensures that bins from each study cover the same genetic region, regardless of data format. For the study available only as published graphs, data were extracted from the genome-wide linkage graphs using Engauge Digitizer (http://digitizer.sourceforge.net/), and chromosomes were divided into the required number of equal-width bins.
After markers and linkage statistic locations for each study had been localized onto the Marshfield map, we then identified the maximum linkage statistic per bin. R-scripts facilitating these data extraction steps are available from the authors.
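The rescaling and binning steps described above can be illustrated with a short sketch. It is not the authors' R code: the function names are hypothetical, the rescaling is assumed to be a linear map between the study's first and last marker positions and their Marshfield positions, and bins are assumed to be of equal width within a chromosome.

```python
def rescale_position(pos, first, last, map_first, map_last):
    """Linearly map a study position in [first, last] onto the Marshfield
    interval [map_first, map_last] spanned by the same two markers."""
    fraction = (pos - first) / (last - first)
    return map_first + fraction * (map_last - map_first)

def bin_index(pos_cm, chrom_length_cm, n_bins):
    """1-based index of the equal-width bin containing pos_cm."""
    width = chrom_length_cm / n_bins
    index = int(pos_cm // width) + 1
    # Clamp so a position at the very end of the map falls in the last bin.
    return min(max(index, 1), n_bins)

def max_stat_per_bin(points, chrom_length_cm, n_bins):
    """Maximum linkage statistic per bin.

    points: iterable of (position_cM, statistic) pairs on the Marshfield map.
    Bins with no data points are returned as None.
    """
    best = [None] * n_bins
    for pos, stat in points:
        b = bin_index(pos, chrom_length_cm, n_bins) - 1
        if best[b] is None or stat > best[b]:
            best[b] = stat
    return best
```

Returning `None` for empty bins is a design choice of this sketch; how bins without data are treated before ranking is not specified in this section.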
In order to compare results from the linkage studies with the recent CD GWA study, we extracted SNPs showing association with a p value of <10⁻⁴ from their online supplementary table 3 (www.karger.com/doi/10.1159/000228920), and mapped these onto the 30 cM linkage bins using the Rutgers combined linkage-physical map for Build 36 (http://compgen.rutgers.edu/mapomat/). We tested for significant co-occurrence of nominally significant linkage findings with these association findings using the hypergeometric distribution. Specifically, we tested whether the associated SNPs were more likely to be located in bins identified in the GSMA than expected by chance, using different p value thresholds in the weighted and unweighted 30 cM GSMA.
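A co-occurrence test of this kind can be sketched as an upper-tail hypergeometric probability. The parameterization below (total bins, bins containing association signals, linked bins, and their overlap) is an assumption about how the test was set up, and the function name is hypothetical; the sketch is not claimed to reproduce the published p values.

```python
from math import comb

def hypergeom_upper_tail(n_bins, n_assoc_bins, n_linked_bins, n_overlap):
    """P(X >= n_overlap), where X is the number of linked bins that also
    contain association signals, if linked bins were drawn at random."""
    upper = min(n_assoc_bins, n_linked_bins)
    total = comb(n_bins, n_linked_bins)
    favourable = sum(
        comb(n_assoc_bins, k) * comb(n_bins - n_assoc_bins, n_linked_bins - k)
        for k in range(n_overlap, upper + 1)
    )
    return favourable / total
```

For a toy case with 10 bins, 4 containing association signals and 3 linked, `hypergeom_upper_tail(10, 4, 3, 2)` is the chance that at least 2 of the 3 linked bins also carry an association signal.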
Meta-analysis of eight genome-wide linkage studies in CD was performed both unweighted and weighted. Results based on 30 cM bin widths are summarized in figure 1 and table 2, which lists all nominally significant bins (p value <0.05), highlighting those with suggestive and genome-wide evidence for linkage.
Genome-wide evidence for linkage was obtained in the HLA region (bin 6_2, i.e. the second bin on chromosome 6). A strong signal was also seen in the flanking bins (6_1 and 6_3), due to a carry-over effect from elevated multipoint linkage scores in an extended region of chromosome 6.
Outside the HLA region, the strongest evidence for linkage occurred in bin 10_6, with suggestive evidence for linkage in both weighted and unweighted analyses (p = 0.00366, p = 0.00271, respectively). This telomeric region on chromosome 10 corresponds to the genetic interval 144.3–173.1 cM in the Marshfield map, and to the physical interval 123.3–135.1 Mb in Build 36 (cytogenetic band 10q26.12-qter). The contribution of individual studies to this bin suggests that most linkage evidence arises from the studies of broad European ancestry, with weaker evidence from the Finnish, Scandinavian, and Dutch studies (fig. 2). A further region on chromosome 8 showed suggestive evidence for linkage in the unweighted analysis (p = 0.00635), but only nominal significance in the weighted analysis. This region (bin 8_5; 8q22.2-q24.21; 101.5–139.5 Mb) includes rs648119 (at 103.2 Mb), which was associated with CD in the GWA study with p = 4 × 10⁻⁵. In the analysis of five studies based on classic CD patients, we observed only nominal significance in bins 10_6 and 8_5, but suggestive significance for linkage on chromosome 19 (bin 19_2, 19p13.2-q12). This bin contains SNP rs1036229 (at 34.7 Mb), which achieved a p value of 2.5 × 10⁻⁵ in the GWA study, and MYO9B (at 17.2 Mb).
We re-analyzed chromosomes showing suggestive evidence for linkage using two narrower bin widths of 20 cM (giving 173 bins) and 10 cM (giving 349 bins) to assess whether narrower bins provide finer localization of linkage findings. Results of the weighted GSMA are shown in the supplementary material for chromosomes 8 and 10 (using 8 studies) and chromosome 19 (for the 5 CD-only studies). Similar results were obtained in the unweighted analysis. For chromosome 8, suggestive evidence for linkage (p = 0.00491) was obtained with the 20 cM bin at 99.2–118.6 Mb. No improved localization of the chromosome 10 linkage finding was obtained. On chromosome 19, analysis of 10 cM bins showed that the strongest evidence for linkage occurred at 13.4–17.9 Mb, an interval which includes the MYO9B gene.
Testing signals of association and linkage within bins showed no significant evidence for co-localization of results (see fig. 1 for the bin locations of association signals with p < 10⁻⁴ and for the p values from the hypergeometric distribution), except at the genome-wide threshold, at which the GSMA identified the HLA region (p = 0.02044). For example, 8 out of 117 bins showed nominal significance in the weighted GSMA (p < 0.05), and only 3 of these were among the 33 bins containing significant association signals (p = 0.15123).
CD is an interesting model disorder for complex diseases, as the main environmental factor (gluten) and genetic factor (HLA) have been identified. Several genome-wide linkage scans and candidate gene studies have been carried out to identify other genetic factors underlying CD, with no clear definition of further genetic contributions to CD. In this study, we performed meta-analysis of eight genome-wide linkage studies in CD, to summarize the evidence for linkage in these extensive family studies.
Highly significant evidence for linkage was obtained in the HLA region and flanking regions on chromosome 6. We also obtained suggestive evidence for linkage to the telomeric region of chromosome 10q. This locus provided only marginal evidence in some studies and would not have been identified by ad hoc assessment of individual scan results, showing the strength of the GSMA in identifying regions with modest but consistently elevated linkage scores across studies. Further, looking at the specific contributions of individual genome-wide linkage studies to the 10q26.12-qter region, it is apparent that most linkage evidence derives from studies with broad European ancestry, including the Italian study. An additional region on chromosome 8 showed suggestive significance in some analyses, and includes SNP rs648119, which was identified in the GWA study, although not confirmed in the subsequent follow-up.
The chromosome 5 region highlighted in the previous meta-analysis did not achieve even nominal significance in this study. This inconsistency is not surprising given that the current study doubled both the number of studies and the number of affected individuals included. Application of different weighting functions did not affect results, but other explanations include the different test statistics (NPL and MLS) included in the current study (the previous analysis had used Kong and Cox's Zlr statistic for all studies), or a false positive finding in the initial meta-analysis. Analysis of the subset of four studies from the original meta-analysis recaptured the 5q linkage evidence, implying that heterogeneity between studies, rather than differences in statistical methodology, may explain the discrepancy. Notably, the 5q results may be correlated with family structure. When we stratified the studies into those of affected sibling pairs and those of more extended pedigrees, we obtained suggestive significance in the same 5q region when analyzing the four affected sib-pair studies [4,5,6, 8]. Indeed, in the previous meta-analysis, 3 of the 4 studies included only affected sib pairs.
In general, we observed no concordance between regions showing significant linkage in this GSMA and the locations of significant SNPs in the genome-wide association study, except for the HLA locus. The number of ‘linked’ bins with significant association findings was not significant under either GSMA analysis method (weighted, unweighted) or under any of the GSMA thresholds considered (suggestive significance, nominal significance, and p < 0.10). Association studies with current SNP platforms are aimed at identifying common variants involved in disease etiology, whereas linkage approaches could be used efficiently to identify multiple rare variants (even variation that is unique at the individual level) of moderate effect, which cannot be detected using traditional association methods. On the other hand, common variants identified by van Heel et al. would have very low power to be detected in a linkage study.
We sub-divided studies by phenotype, under the hypothesis that more phenotypically homogeneous studies may have higher power to detect linkage. Excluding three studies that included DH cases gave an additional region with suggestive evidence for linkage at 19p13.2-q12 (8–36 Mb), which was further localised to 14.9–17.9 Mb under analysis with 10 cM bins. This region contains the MYO9B gene located at 17.2 Mb (which was not highlighted in the GWA study). MYO9B is thought to have a role in epithelial barrier functions, and its association with CD has been replicated, but does not arise in all populations. Associations with inflammatory bowel disease, systemic lupus erythematosus, rheumatoid arthritis, and type 1 diabetes have also been reported, making this gene a potential shared risk factor across autoimmune disorders.
These results, in conjunction with data emerging from dense SNP typing of specific regions or future genome-wide association studies, will help guide efforts to identify the actual predisposing genetic variation contributing to this complex genetic disease. In particular, results from a meta-analysis of linkage studies can be used in a GWA study to enhance power by weighting association evidence using linkage results. This approach has been shown to have considerable power, if the linkage study is informative.
This research was funded by the UK MRC (G0400960 to CML), and the NIH (DK50678 to SLN). The Dutch linkage study was supported by the Coeliac Disease Consortium, an innovative cluster approved by the Netherlands Genomics Initiative and partially funded by the Dutch government (grant BSIK03009). We thank all authors of the original genome wide linkage studies for their contribution to this study.