In this systematic review, computer algorithms were used to map all P/PI genes listed in the MEROPS database onto critical genomic regions extracted from genetic association and linkage studies performed in IBD. While the top ranked genes ( and ) included some P/PIs previously found to be associated with CD and/or UC, such as MMP2, MMP15 and MST1, a series of P/PI genes were identified, which have not been previously related to Crohn's disease or ulcerative colitis. The top 5 ranked P/PI genes for CD and UC were all characterized by high evidence scores and positive results in several GWAS and/or replication studies of GWAS. P/PI genes ranked lower were typically based on positive results in candidate region studies and genome-wide linkage scans, which were of lower resolution. At the time of the last update of our systematic review, most of the evidence had accumulated for CD, with 67 studies addressing CD as compared to 37 studies in UC. The number of positive studies among top ranked P/PIs was considerably larger, evidence scores were clearly higher and their variation more pronounced in CD as compared with UC. Unsurprisingly, ranks were completely robust for CD in a sensitivity analysis omitting GWAS, but showed some changes in the ranking for UC.
Among the top-ranked P/PIs identified in our study, some of the most promising are CYLD for CD, and APEH and the group of ubiquitin-specific peptidases for both CD and UC. In an expression microarray study, CYLD, encoding a deubiquitinating enzyme (also see above), has been identified as one of the most significantly downregulated genes in the intestine of IBD patients. In an IBD animal model, cyld−/− mice displayed more severe intestinal inflammation and intestinal tumorigenesis. APEH encodes acylpeptide hydrolase, an enzyme expressed in the intestinal mucosa that is able to cleave N-formyl peptides derived from bacteria, which are potent pro-inflammatory chemoattractants for phagocytes. DAG1 encodes alpha- and beta-dystroglycan proteins, which are generated from a common precursor through autocatalytic cleavage. It has been hypothesized that alpha-dystroglycan acts as a receptor for Mycobacterium avium paratuberculosis in the intestine, a bacterium repeatedly suspected to be causally related to CD. The ubiquitin-proteasome system (UPS) is closely linked to the top-ranked CYLD and includes, among the top 20 ranked genes, USP40 for CD, USP3 and PSMB9 for UC, and USP4 for both phenotypes. The UPS is known to play a role in the development of inflammatory and autoimmune diseases through multiple pathways, including MHC-mediated antigen presentation, cytokine and cell cycle regulation, and apoptosis. Finally, MST1, already repeatedly associated with IBD, was also ranked high for both CD and UC. It encodes macrophage stimulating protein 1 and is involved in apoptosis. Note, however, that the protein is presumably not active as a protease due to a mutation at the catalytic site.
In this systematic review we included genetic studies that differed in methodology (linkage versus association) and thus in the resolution and accuracy with which a given genomic region was studied, in the genetic markers used, and in the definitions applied to establish and report association or linkage of a gene or region with IBD. A formal meta-analysis was therefore not feasible. Rather, we based our systematic review on an approach commonly referred to as vote count, and merely distinguished between positive and negative studies on a specific P/PI gene as identified by our mapping algorithm. The higher the power of the studies included in the systematic review, the more appropriate vote count methods will be. As suggested by Barrett et al., individual genetic studies in IBD often have enough power to detect large effect sizes, but limited power to detect small to moderate effects corresponding to odds ratios of 1.2 to 1.5. It is therefore likely that some of the negative votes from included studies were false negatives for small to moderate associations of a P/PI gene with IBD. We took this into account by using low cut-offs for the evidence scores of P/PI genes to be retained in the final ranking. These low cut-offs counteracted the limited power of individual genetic studies and were deemed to decrease the overall risk of false negative conclusions about the association of a P/PI gene with CD or UC in our review. As a consequence, a P/PI gene was retained even if the proportion of positive studies was small. If the majority of negative studies were true negatives and the majority of positive studies false positives, we would erroneously suggest an association of a retained P/PI gene with IBD. There will always be a trade-off between false negatives and false positives, and our strategy of counteracting false negatives was bound to increase the risk of false positives. Therefore, any of the retained P/PI genes considered for further scientific investigation first needs to be confirmed in an adequately powered, independent replication study of its association with CD or UC.
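The trade-off described above can be illustrated with a minimal sketch of a vote count followed by a retention cut-off. The gene names, study outcomes and thresholds below are hypothetical, and the share of positive studies is used only as a stand-in for the evidence score, whose actual definition is not restated in this section.

```python
from collections import defaultdict

# Hypothetical study outcomes: (gene, study_id, positive?).
# Illustrative only, not data extracted in the review.
votes = [
    ("GENE_A", "s1", True),
    ("GENE_A", "s2", False),
    ("GENE_A", "s3", False),
    ("GENE_A", "s4", False),
    ("GENE_B", "s1", False),
    ("GENE_B", "s2", False),
]

def vote_count(votes):
    """Tally positive and negative studies per gene."""
    tally = defaultdict(lambda: {"positive": 0, "negative": 0})
    for gene, _study, positive in votes:
        tally[gene]["positive" if positive else "negative"] += 1
    return dict(tally)

def retain(tally, min_positive_fraction):
    """Keep genes whose share of positive studies reaches the cut-off.

    The share of positive studies is an assumed stand-in for the
    evidence score used in the review.
    """
    kept = []
    for gene, counts in tally.items():
        total = counts["positive"] + counts["negative"]
        if total and counts["positive"] / total >= min_positive_fraction:
            kept.append(gene)
    return sorted(kept)

tally = vote_count(votes)
low_cutoff = retain(tally, 0.2)   # assumed low threshold: fewer false negatives
high_cutoff = retain(tally, 0.5)  # assumed stricter threshold: fewer false positives
```

With the low threshold, GENE_A is retained on the strength of a single positive study out of four; with the stricter threshold it would be dropped. This is the sense in which a low cut-off guards against false negatives at the price of more false positives.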
We emphasize that even if associations between a P/PI gene and IBD were true, this does not necessarily indicate that a polymorphism in this gene has a causal role for CD or UC. Genetic linkages and associations are influenced by linkage disequilibrium patterns of the study population, which limit the resolution of any genetic study. Associations observed in our study may therefore not be attributable to single genes but rather to genomic regions containing several genes that are in strong linkage disequilibrium. Consequently, genes other than the P/PI gene identified by our algorithm in a specific critical region could be responsible for the observed association with IBD. For example, the top-ranked P/PI gene in CD, CYLD on chromosome 16 (49.33 to 49.39 Mb), is located adjacent to CARD15 (49.28 to 49.32 Mb), which traces back to the same critical region. The functional link of CARD15 to IBD has been firmly and reproducibly established: there are several well-characterized polymorphisms in CARD15 that lead to different capacities of the protein products to regulate NF-kappaB-mediated inflammatory responses to bacterial components in the gut, thus providing a causal explanation for the observed association with the disease. However, the association and linkage signals of the involved critical region on chromosome 16 can only partially be explained by polymorphisms in CARD15: Hampe et al. found that a robust association signal in this region remains after stratification by CARD15. It is therefore plausible that an adjacent gene, such as CYLD, may account for this association signal in this critical region, and the proximity of CYLD to CARD15 should not preclude CYLD from being considered as a potential candidate P/PI gene and further investigated in IBD. Conditional genotypic analysis of CYLD-negative patients, which is ongoing in the replication study, will clarify the hypothesized independent association signals in both genes.
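The way two neighbouring genes can trace back to the same critical region may be sketched as a simple interval-overlap test. The gene coordinates are those quoted above; the boundaries of the critical region and the names `Interval` and `overlaps` are assumptions made for illustration, and this is not a description of the mapping algorithm actually used.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    chrom: str
    start_mb: float
    end_mb: float

def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals lie on the same chromosome and share any position."""
    return a.chrom == b.chrom and a.start_mb <= b.end_mb and b.start_mb <= a.end_mb

# Gene positions on chromosome 16 as quoted in the text (in Mb).
genes = {
    "CYLD": Interval("16", 49.33, 49.39),
    "CARD15": Interval("16", 49.28, 49.32),
}

# Hypothetical critical region reported by a linkage or association study.
critical_region = Interval("16", 49.0, 49.5)

# Every gene overlapping the region receives the study's vote.
mapped = sorted(name for name, iv in genes.items() if overlaps(iv, critical_region))
```

Under these assumptions both CYLD and CARD15 map onto the same region, although the two genes do not overlap each other, so a positive study on that region cannot by itself indicate which of the two genes drives the signal.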
Another important limitation is that we were unable to gauge the direction of associations between P/PI genes and IBD, for two reasons. First, even in the presence of identical genetic markers and definitions of association, the vote count used in our study could not distinguish between an increase in the odds of IBD associated with the marker in one study and a decrease in the odds associated with the marker in another study. If both studies were positive on an association of this marker with IBD, we would consider them to be concordant even though they may have found opposite directions of association. Second, the heterogeneity in markers used in different studies makes it impossible to achieve comparability of measures of association. Even if two studies showed an association in the same direction and of a similar magnitude, differences in the types of genetic markers could still mean that the two studies are actually discordant. Ignoring the directions of associations as described here may therefore result in an overestimation of the accumulated evidence, and we emphasize once more the need for validation of our results. Although we were careful to avoid any duplicate extraction within the same genetic region of the same population, we cannot fully exclude that some genetic regions of some patients were included multiple times in our study if previously studied patients were subsequently included in later studies of larger populations. Finally, candidate gene and candidate region studies may be subject to selective reporting and publication bias, with predominant reporting of statistically significant results. We cannot exclude that this has influenced our ranking of some P/PI genes. We believe, however, that the direction and magnitude of this bias are similar across all P/PI genes, so that its overall impact on relative rankings is likely to be small.
In addition, a variety of strategies for internal validation through negative and positive controls suggested our approach to be valid.
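The first limitation can be made concrete with a short sketch in which two significant studies of the same marker point in opposite directions. The odds ratios and field names are hypothetical and serve only to show what a direction-blind vote count discards.

```python
# Hypothetical results for the same marker in two studies (illustrative values).
studies = [
    {"study": "s1", "odds_ratio": 1.4, "significant": True},
    {"study": "s2", "odds_ratio": 0.7, "significant": True},
]

# A vote count only records whether each study was positive.
positive_votes = sum(s["significant"] for s in studies)

# The direction of effect is visible only if the odds ratios are inspected.
directions = {
    "risk" if s["odds_ratio"] > 1 else "protective"
    for s in studies
    if s["significant"]
}
direction_conflict = len(directions) > 1
```

Here the vote count registers two concordant positive studies, while the odds ratios show that one study found an increase and the other a decrease in the odds of disease.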
Our method is complementary to the classical approach of formal meta-analysis: using the algorithm, genetic evidence can be gauged genome-wide, considering all available studies of different types, even if different analytical methods were used. The common concept ascertained is the ‘critical genomic region’, irrespective of the study design and genotyping technique used. This avoids the need for fully compatible genetic markers or imputations to achieve compatibility, as used in classical meta-analysis. The ranking algorithm is based on numerical information about the critical regions and the genomic locations of P/PI genes in the human genome in relevant databases. Errors in these databases inevitably lead to errors in the gene ranking, which can only be addressed in subsequent updates. It must be noted that many entries in MEROPS are putative P/PI genes that have been predicted theoretically but not functionally validated. For example, haptoglobin (HP) and haptoglobin-related protein (HPR), which rank in the top 20 for UC (), are included in the MEROPS database because of a peptidase inhibitor sequence motif, although there is no supporting experimental evidence. The high scores for the firmly established susceptibility genes CARD15 in CD, and IL23R in both CD and UC, which were generated by the algorithm after mapping the genomic locations of these genes onto the critical regions extracted from genetic studies, suggest that the methodology used in our systematic review is indeed valid. The scores for CARD15 in CD, and IL23R in UC, were in the range of the 20 top-ranked P/PI genes in both phenotypes.
After closure of our database, various genome-wide association scans in UC and CD were published. Several previously known genomic regions were replicated and novel susceptibility regions were revealed. These studies, together with other recently published genetic studies, considerably increase the available genetic information for UC and CD, and will be considered in future updates. In an attempt to validate our approach, however, we examined whether top-ranked P/PI genes met genome-wide significance at the level of p < 5 × 10⁻⁸ in the two most recent meta-analyses of GWAS in CD and UC. For both conditions, the 5 highest-ranked P/PI genes all met genome-wide significance (Table S4 and Table S5). For 14 of the top 20 P/PI genes in CD and 11 of the top 18 P/PI genes in UC, criteria of genome-wide significance were not formally met in the meta-analyses. The relevant but only partial concordance, in 30 to 40% of P/PI genes, suggests that our approach is not redundant in the presence of large-scale meta-analyses. Rather, it will provide complementary information to be subsequently verified. Based on published results, we are currently unable to determine whether the discordance observed was due to false negatives in the meta-analyses or false positives in our study, and we would welcome detailed data on all top-ranked P/PI genes as found in these meta-analyses. As part of the EC-funded research project IBDase, the ranking of P/PI genes established in our systematic review is also used to guide replication studies of candidate P/PI genes and their functional characterization in interdisciplinary mechanistic studies in vitro and in vivo. These additional data will contribute to our understanding of putative causal links of these genes with IBD.
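The concordance of 30 to 40% quoted above follows directly from the counts given in the text, as the following short calculation shows. The function name `concordant_fraction` is introduced here for illustration only.

```python
def concordant_fraction(n_ranked: int, n_not_significant: int) -> float:
    """Share of top-ranked genes that met genome-wide significance."""
    return (n_ranked - n_not_significant) / n_ranked

# Counts as stated in the text.
cd = concordant_fraction(20, 14)  # 6 of the top 20 P/PI genes in CD
uc = concordant_fraction(18, 11)  # 7 of the top 18 P/PI genes in UC
```

This gives 6 of 20 (30%) for CD and 7 of 18 (about 39%) for UC.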