Given the complexity of integrating results and information within and between the various levels of analysis, e.g., pairwise allele level comparisons, Unique Combinations analyses, LD patterns, and SFVT analysis (), we summarize the findings here to aid navigation through the results detailed below. The AAs we identify as major or important in JIA-OP disease risk, always indicated in bold below, are (listed in numeric order): AAs 13 (pockets 4 and 6), 37 and 57 (both in pocket 9), 67 (pocket 7), 74 (pocket 4), and 86 (pocket 1); those potentially involved in disease risk (underlined) are: 30 (pockets 6 and 7) and 71 (pockets 4, 5, and 7).
3.2. HLA DRB1 allele level analyses
A total of 38 DRB1 alleles were observed: 17 were included in the “binned” category, leaving 21 frequent alleles (). (Note that for the SFVT analysis, the rare DRB1 alleles are always included.) As previously described [12], there is significant heterogeneity in allele counts between patients and controls (overall test: p < 1.1E-27). Two predisposing alleles, DRB1*0801 and *1104, and three protective alleles, DRB1*1501, *0701, and *0401, show very strong individual effects (based on their p-values), with weaker effects of DRB1*1301, *1103, *0404, and *0103. Note that for the rarer alleles, e.g., DRB1*1103, which has the highest OR, the 95% OR confidence interval (CI) spans a large range; hence conclusions from analyses that use this information are subject to this uncertainty.
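To illustrate why the CI widens for rare alleles, the following minimal sketch computes an OR and a Woolf (log-scale) 95% CI from a 2x2 table of allele counts. The choice of the Woolf interval, the function name, and the counts are assumptions made for illustration; the counts are hypothetical and are not taken from this study.

```python
import math

def odds_ratio_ci(pat_with, pat_without, ctl_with, ctl_without, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    Adds 0.5 to every cell (Haldane correction) if any cell is zero.
    """
    cells = [pat_with, pat_without, ctl_with, ctl_without]
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    or_ = (a * d) / (b * c)
    # Standard error of log(OR); dominated by the smallest cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts (allele present / absent in patients, then controls).
common = odds_ratio_ci(120, 880, 40, 960)
rare = odds_ratio_ci(6, 994, 2, 998)
print("common allele: OR=%.2f, CI=(%.2f, %.2f)" % common)
print("rare allele:   OR=%.2f, CI=(%.2f, %.2f)" % rare)
```

With similar point estimates, the interval for the rare allele is much wider because the small cell counts dominate the standard error of log(OR).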
JIA-OP HLA DRB1 allele data ranked by Odds Ratio (OR)
The first column of lists the alleles (labeled in categories 1-3) which were sequentially removed in successive rounds in the Relative Predispositional Effects (RPE) analysis until no significant differential effects were seen. Note that (as mentioned in Section 2.4) the higher frequency alleles are targeted with this analysis. In the second column, the more common alleles are divided into a set (A) of three differential risk categories containing high frequency alleles: I (predisposing), II (neutral), and III (protective). The boundaries, and inclusion of alleles in each category, were predicated on evidence of disease risk heterogeneity between categories, and homogeneity within categories. In the third column, the list of alleles (set B) in each risk category is expanded to include some rarer alleles, but with the boundaries still delineated by more common alleles; inclusion in each respective risk category is now indicated by Ix, IIx, and IIIx. The rare allele DRB1*0403 is not included because it cannot be classified as predisposing versus neutral; similarly, the four rare alleles at the neutral/protective boundary (between sets IIx and IIIx) are not included. These sets A and B of differential risk categories are used below in the Unique Combinations analyses.
3.3. HLA DRB1 allele level pairwise within serogroup analyses
HLA nomenclature (except for DP) is such that alleles sharing the same first two digits generally belong to the same serogroup. Variation in the last two digits indicates AA differences within the serogroup. For example, DRB1*0101, *0102, and *0103 belong to the serogroup denoted *01XX. Alleles within the same serogroup are more closely related at the AA level; hence significant differences in risk within serogroups, and between specific pairs of alleles, may identify a specific AA, or a few AAs, involved in disease. Pairwise comparisons were performed within serogroups of alleles with sufficient sample size (DRB1*01XX, *04XX, *11XX, and *13XX); this was followed by manual inspection of the respective sequences for comparisons where significant risk heterogeneity was detected. Significant results (ordered by p-value) are given in . The strong evidence for the role of AA 86 in differential disease risk is of particular interest, since with SFVT analysis this AA shows no significant effect (see Section 3.6, below). There is also evidence for a direct role of AA 74.
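The grouping step described above can be sketched as follows. This is only an illustration of how four-digit allele names map to serogroups and how within-serogroup pairs are enumerated; the function names are hypothetical, the allele list is a subset of those named in the text, and the sample-size filter and the statistical test applied to each pair are omitted.

```python
from collections import defaultdict
from itertools import combinations

def serogroup(allele):
    """Serogroup label for a four-digit allele name, e.g. '0101' -> '01XX'."""
    return allele[:2] + "XX"

def within_serogroup_pairs(alleles):
    """Map each serogroup with at least two alleles to its allele pairs."""
    groups = defaultdict(list)
    for allele in alleles:
        groups[serogroup(allele)].append(allele)
    return {
        sg: list(combinations(sorted(members), 2))
        for sg, members in groups.items()
        if len(members) > 1
    }

# DRB1 alleles named in the text (a subset, for illustration only).
alleles = ["0101", "0102", "0103", "0401", "0403", "0404",
           "0801", "1101", "1103", "1104", "1501"]
pairs = within_serogroup_pairs(alleles)
for sg, sg_pairs in sorted(pairs.items()):
    print(sg, sg_pairs)
```

Serogroups represented by a single allele in the input (here *08XX and *15XX) yield no pairs and are dropped.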
JIA-OP HLA DRB1 allele pairwise comparisons
3.4. Unique Combinations comparisons
In the original Unique Combinations algorithm of Salamon et al. [3], two categories of sequences are defined by the user: those in the “check” category are compared against those in the “group” category in order to identify combinations of sites that are unique between these two sets of sequences. However, when there are two or more sequences in the “check” category, sites that are polymorphic between these “check” sequences are excluded from consideration. We have extended the algorithm to allow inclusion of all sites that are polymorphic in the “check” category, thus expanding the utility of the method. Also, this means that the “group” and “check” categories are now interchangeable, whereas before there was an asymmetry.
This extension of the Unique Combinations algorithm provides an ordered list of a minimal number of polymorphic positions, which as a haplotype (combination of AAs on a chromosome) can differentiate between any set of sequences of alleles in the “check” versus “group” categories. Deriving the vectors of AAs that correspond to the resulting minimal unique combination generates unique sequences that either belong to the “check” category or the “group” category.
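The idea of a minimal distinguishing combination can be illustrated with a brute-force sketch. This is not the published algorithm or its extension; it simply searches position subsets of increasing size and keeps those for which no residue combination is shared between the “check” and “group” sequences. The function name, sequence names, and residues are hypothetical toy data, not real DRB1 sequences.

```python
from itertools import combinations

def minimal_unique_combinations(check, group, positions):
    """Smallest sets of positions whose residue combinations never coincide
    between any 'check' sequence and any 'group' sequence.

    check, group: dicts mapping sequence name -> {position: residue}.
    Returns a list of position tuples, all of the same (minimal) size.
    """
    for size in range(1, len(positions) + 1):
        hits = []
        for combo in combinations(positions, size):
            check_haps = {tuple(seq[p] for p in combo) for seq in check.values()}
            group_haps = {tuple(seq[p] for p in combo) for seq in group.values()}
            # Sites polymorphic within either category are allowed, so the
            # roles of 'check' and 'group' are interchangeable.
            if check_haps.isdisjoint(group_haps):
                hits.append(combo)
        if hits:
            return hits
    return []

# Toy sequences (hypothetical), restricted to four positions.
check = {
    "seqA": {13: "G", 37: "Y", 67: "F", 86: "G"},
    "seqB": {13: "S", 37: "Y", 67: "F", 86: "V"},
}
group = {
    "seqC": {13: "S", 37: "N", 67: "I", 86: "G"},
    "seqD": {13: "R", 37: "Y", 67: "L", 86: "V"},
    "seqE": {13: "G", 37: "N", 67: "F", 86: "V"},
}
result = minimal_unique_combinations(check, group, [13, 37, 67, 86])
print(result)
```

In this toy example no single position separates the two categories, but several pairs of positions do, which mirrors how a combination of AAs can be informative when each AA alone is not.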
Using the subdivisions of common DRB1 alleles into the three categories defined above as sets A and B ()—I and Ix (predisposing), II and IIx (intermediate), and III and IIIx (protective)—we performed various Unique Combinations comparisons of each risk category versus the other two risk categories, e.g., I versus II + III (). The AAs identified as important in the Unique Combinations analyses are AA 86 (as in the pairwise allele within serogroup analyses in Section 3.3 above) combined with AAs 13 and 37, or 13 and 67.
JIA-OP HLA DRB1 Unique Combinations (UC) analyses
3.5. HLA DRB1 amino acid LD patterns
The AA LD values in the control data for exon 2 of the HLA DRB1 variation (AAs 9-86) are given in . This information is used in evaluation of the SFVT data below. The LD values show a complex pattern with (using examples from the six AAs we have identified as most strongly implicated in disease risk): (1) “blocks” of AAs where adjacent sites all have high LD with each other (13 and 9-12); (2) individual AAs, each with high and moderate levels of LD with quite a few other AAs (13 and 37), and similarly but with lower levels of LD (57 and 74); and (3) individual AAs with very low levels of LD with most or all other AAs (67 and 86).
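As a rough illustration of how pairwise LD between two multi-residue AA sites can be summarized, the sketch below computes Cramér's V from haplotype-level residue pairs. The choice of Cramér's V is an assumption made for this example; the measure used for the LD values reported here may differ, and the residue pairs are hypothetical.

```python
from collections import Counter

def cramers_v(haplotypes):
    """Cramér's V for two sites, given (residue_site_1, residue_site_2) pairs.

    Returns 0.0 when either site is monomorphic in the sample.
    """
    n = len(haplotypes)
    joint = Counter(haplotypes)
    left = Counter(h[0] for h in haplotypes)
    right = Counter(h[1] for h in haplotypes)
    k = min(len(left), len(right))
    if k < 2:
        return 0.0
    chi2 = 0.0
    for a, n_a in left.items():
        for b, n_b in right.items():
            expected = n_a * n_b / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    return (chi2 / (n * (k - 1))) ** 0.5

# Hypothetical haplotype samples for two sites.
complete_ld = [("G", "Y")] * 10 + [("S", "N")] * 10
no_ld = [("G", "Y")] * 5 + [("G", "N")] * 5 + [("S", "Y")] * 5 + [("S", "N")] * 5
print(cramers_v(complete_ld), cramers_v(no_ld))
```

Values near 1 correspond to sites whose residues almost always co-occur, and values near 0 to sites that vary independently.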
Linkage disequilibrium (LD) plot of polymorphic amino acids 9-86 of HLA DRB1
3.6. SFVT analyses (Table 4, part A)
The SFVT data in are ranked by the p-value of the overall chi-square heterogeneity test of patients versus controls for the VTs of each SF listed. Note that any attempt to draw conclusions from minor differences in p-values of this magnitude is an over-interpretation of the data. Also listed are the maximum (max) and minimum (min) ORs seen for individual VTs for each SF; for example, for SF1 (allele) (rank 5 in ), these are 9.40 and 0.28 (see ). These values are used to determine the ability of an SF to differentiate between risk categories of VTs, considering both the max and min ORs and the range of the ORs for a particular SF. However, keep in mind, as mentioned above, that the highest OR of 9.40 for allele DRB1*1103 is based on a very rare allele; the next highest allele level OR is 6.90 for DRB1*0801 (see ).
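The pooling of alleles into VTs and the resulting max and min ORs can be sketched as follows. This is a simplified illustration under stated assumptions: each VT is compared against all other VTs combined, a 0.5 correction is applied if any cell is zero, and the allele names, residues, and counts are hypothetical rather than taken from this study.

```python
from collections import defaultdict

def variant_type_ors(allele_counts, allele_residues, feature_positions):
    """Pool alleles into variant types (VTs) of one sequence feature (SF) and
    compute each VT's odds ratio against all other VTs combined.

    allele_counts: {allele: (patient_count, control_count)}
    allele_residues: {allele: {position: residue}}
    feature_positions: tuple of AA positions defining the SF
    """
    pooled = defaultdict(lambda: [0, 0])
    for allele, (pat, ctl) in allele_counts.items():
        vt = "".join(allele_residues[allele][p] for p in feature_positions)
        pooled[vt][0] += pat
        pooled[vt][1] += ctl
    total_pat = sum(p for p, _ in pooled.values())
    total_ctl = sum(c for _, c in pooled.values())
    ors = {}
    for vt, (pat, ctl) in pooled.items():
        a, b, c, d = pat, total_pat - pat, ctl, total_ctl - ctl
        if 0 in (a, b, c, d):
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        ors[vt] = (a * d) / (b * c)
    return ors

# Hypothetical alleles with residues at AA positions 13 and 86.
counts = {"a1": (60, 20), "a2": (30, 30), "a3": (10, 40), "a4": (20, 30)}
residues = {
    "a1": {13: "G", 86: "G"},
    "a2": {13: "G", 86: "V"},
    "a3": {13: "F", 86: "V"},
    "a4": {13: "F", 86: "G"},
}
ors_13 = variant_type_ors(counts, residues, (13,))
ors_13_86 = variant_type_ors(counts, residues, (13, 86))
print(max(ors_13.values()), min(ors_13.values()))
print(max(ors_13_86.values()), min(ors_13_86.values()))
```

In this toy data, defining the SF by two positions rather than one splits the pooled VTs further and widens the range between the max and min ORs.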
AA 13, pockets 4 and 6
The single AA 13 (SF57) is rank 1 in the SFVT analysis (), with a reasonable range in OR values (4.91 to 0.33). From , we see how AA 13 by itself partitions the disease risk (although certainly not perfectly): the residues G and S are only seen in the predisposing and neutral allele level disease risk categories, while the other four residues (F, R, H, and Y) are only seen in the neutral and protective categories. AA 13 is the major contributor to the pocket 6 and pocket 4 associations (ranks 2 and 3). AA 30 (pocket 6) may play some role in disease risk, but the effect may be explained by LD.
JIA-OP HLA DRB1 Amino Acid Residue Variation
The effects of the single AAs 9, 10, 11, 12 and 16, which occur in the top 20 ranked SFs in , can be explained by LD with AA 13 (). These AAs are indicated in italics in , and are not discussed individually below. AA 13 is chosen over these AAs since it individually has a stronger effect, based on p-values and OR values and range, it occurs in more top ranked SFs than do any of these AAs, and it was identified individually over these other AAs, in combination with AAs 37 and 86, or 67 and 86, in the Unique Combinations analyses.
AA 67, pocket 7
AA 67 (SF98, rank 14) has the second highest rank of the single AAs, and is the major contributor to the SFs ranked above it and below SF1 (allele) (excluding the 3 single AA SFs AAs 10, 11, and 12, see above), and was identified in the Unique Combinations analyses. While AA 71 (SF102, rank 22) contributes an additional effect to AA 67 by itself, due to greatly increasing the OR max value of pocket 7 (rank 6), this in fact is a minor effect (in this data set) reflecting unique identification of the very rare predisposing DRB1*1103 allele.
AAs 13, 67, 74, and 86, and pockets 6, 4, and 7
AAs 13 and 67 together distinguish all the significant effects of the alleles listed in , with two exceptions, which are covered by AAs 74 and 86. AA 74, the third highest ranked single AA effect (SF104, rank 16), was identified in the allele level pairwise comparisons () as necessary to distinguish between the rare neutral/predisposing DRB1*0403 allele and the protective *0401 and *0404 alleles (p < 0.002). Note that this effect of AA 74 was not picked up in the Unique Combinations analyses, since our inability to definitively place DRB1*0403 in either the predisposing or neutral category precluded its consideration there. AA 86 was identified in the allele level pairwise comparisons as necessary to distinguish between the predisposing DRB1*1104 and neutral *1101 effects (p < 0.003) (); it was also picked up in the Unique Combinations analyses. In the SFVT analysis (), however, the individual effect of AA 86 is not significant, nor is it identified as potentially involved in disease from its presence in other SFs: it is the 8th lowest ranked SF (SF110, rank 42), and pocket 1 (SF132), in which it occurs, ranks two below this. Nevertheless, as indicated in , AA 86 is necessary to explain significant disease risk effects. The four AAs 13, 67, 74, and 86 uniquely define all the alleles listed in (see ), not considering the very rare alleles in the binned category in .
AAs 37 and 57 (pocket 9)
AAs 37 (SF74, rank 17) and 57 (SF90, rank 19) are the next to enter into our consideration. The Unique Combinations results () implicate 13, 37, and 86 as a potential AA combination that can explain disease risk, along with 13, 67, and 86 as an alternate combination. Further, to jump ahead, the combination of AAs 13 and 37 has rank 1 in the analysis of tSFs discussed below (). The combination of AAs 13, 37, 74, and 86 also explains all known disease risk at the allele level, except the marginally significant allele pairwise comparison of DRB1*0101 (neutral) to the rare *0103 (protective) (p < 0.02). AA 67 distinguishes this risk, as does AA 71; as noted above, AA 71 also distinguishes DRB1*1103 from *1104, which in this data set are not significantly different in their effects.
Single AAs and other SFs listed below rank 20 in either do not themselves have a wide range of OR values, and/or their effects may be explained by LD with other AAs or SFs, or they do not have a significant overall effect. However, we note that AAs 9, 10, 11, 12, 26, 28, 47, 56, 58, 60, and 85 also show up frequently in the top SFVT results, including comparisons based on p-values for individual VTs of each SF, and these should be considered in additional analyses.
3.7. SFVT Analysis of Temporary SFs (tSFs) (Table 4, part B)
The SFVT data in are the most relevant data from analyses of tSFs defined by specific potentially informative combinations of the AAs 13, 37, 57, 67, 74, 86, 30, and 71 (based on the analyses above). Again, these are ordered by p-value, but note that the range of p-values in shows only minor differences; hence we concentrate more on the max and min ORs and the range of OR values. The SFs from which include the span of p-values of the tSFs are also included in for comparison.
The pair of AAs 13 and 37 (SFt201, rank 1 in ) captures the JIA-OP disease risk with 11 VTs, including a good range of OR values, as does the pair 13 and 67 (SFt203, rank 9) with 13 VTs. Both increase the max OR, but barely change the min OR, compared to AA 13 by itself (SF57, rank 7, 6 VTs). The addition of AAs 74 and 86 to both these combinations (SFt205, rank 2, and SFt206, rank 5) adds no further discrimination based on the SFVT analysis but, as noted above, these AAs are required to account for all disease risk heterogeneity.
Only the addition of AA 71 gives the full range of ORs seen with DRB1 allele level variation (SF1, rank 19) (see SFt216, rank 17, and SFt215, rank 18); as noted above, AA 71 distinguishes the rare, highest-risk predisposing DRB1*1103 allele from the common predisposing DRB1*1104 allele.