The extremely polymorphic nature of the HLA-encoding loci makes it difficult to maintain a consistent nomenclature among laboratories. Before the advent of molecular genotyping technology, HLA variation was determined serologically, using antisera derived from women who had had multiple pregnancies. The nomenclature was simple, consisting of a letter to describe the locus and a number to specify the allele (eg, A1 or DR4). As serologic typing technology became more refined, and as more variation was discovered in HLA, some of the original designations were subdivided (eg, the DR2 group was subdivided into DR15 and DR16).
The emergence of DNA-based genotyping led to the creation of a nomenclature system that included the locus name, followed by an asterisk, followed by a numerical designation for the allele (eg, DRB1*04:05). In the previous example, the “04” refers to the serologic group of the DRB1 allele, and the “05” refers to the individual allele within the DRB1*04 group. DPB1 is an exception to the serology-based nomenclature because no adequate serologic typing system was available for DPB1. Nearly all DPB1 polymorphisms have been discovered since the advent of DNA-based genotyping, with alleles named in order of discovery, rather than by serologic group. Thus, with two exceptions (DPB1*02:02 and DPB1*04:02), the second section of DPB1 allele designations is written “01.”
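The structure of a DNA-based designation can be illustrated with a short Python sketch. It assumes the colon-delimited form used in this text (locus, asterisk, numeric sections); the names `Allele`, `parse_allele`, and `two_field` are hypothetical helpers introduced for illustration and are not part of any standard HLA software.

```python
import re
from typing import NamedTuple, Tuple

# Assumed pattern: locus name, an asterisk, then one or more
# colon-separated numeric sections (eg, DRB1*04:05).
_ALLELE_RE = re.compile(r"^(?P<locus>[A-Z]+[0-9]*)\*(?P<fields>\d+(?::\d+)*)$")


class Allele(NamedTuple):
    locus: str                # eg, "DRB1"
    fields: Tuple[str, ...]   # eg, ("04", "05")


def parse_allele(name: str) -> Allele:
    """Split a designation such as 'DRB1*04:05' into locus and sections."""
    match = _ALLELE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognized allele designation: {name!r}")
    return Allele(match.group("locus"), tuple(match.group("fields").split(":")))


def two_field(allele: Allele) -> str:
    """Rebuild the designation from its first two numeric sections only."""
    return f"{allele.locus}*{':'.join(allele.fields[:2])}"


if __name__ == "__main__":
    allele = parse_allele("DRB1*04:05")
    # First section: the group (serologic group for most loci);
    # second section: the individual allele within that group.
    print(allele.locus, allele.fields[0], allele.fields[1])
    print(two_field(allele))
```

For DPB1 the first section cannot be read as a serologic group, so a sketch like this only separates the sections; interpreting them still depends on the locus.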
Adding to the complexity of the nomenclature system are silent polymorphisms (ie, nucleotide changes, typically at the third position of a codon, that do not change which amino acid the codon encodes). For example, a G to T change in the third position of the “TCG” codon that encodes serine would result in a “TCT” codon, which also encodes serine. Silent polymorphisms were originally designated with one additional digit (eg, A*68011); this was later changed to two additional digits (eg, A*010101). In 2010, the nomenclature system was changed again, this time adding a colon delimiter between the sections of an allele designation so that each section is no longer limited to two digits (eg, A*02:171:01). Allele designations can become even longer when intronic polymorphisms are reported, and alleles with known expression anomalies can be designated with “L” (low expression) or “N” (null) at the end. Reporting of HLA data remains inconsistent: some papers use the new nomenclature, others use the older DNA-based nomenclature, and others still use the original serologic nomenclature. For clarity, throughout this text we report alleles at four-digit resolution (the first two numeric sections of the allele designation), because this is sufficient to indicate the amino acid sequence of the encoded protein.
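The serine example can be checked with a minimal Python sketch that builds the standard genetic code and tests whether a codon change is silent. The names `CODON_TABLE` and `is_silent` are hypothetical, chosen for illustration; the table itself is the standard genetic code written in T, C, A, G order.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_silent(codon_before: str, codon_after: str) -> bool:
    """Return True if the two codons differ but encode the same amino acid."""
    before, after = codon_before.upper(), codon_after.upper()
    return before != after and CODON_TABLE[before] == CODON_TABLE[after]


if __name__ == "__main__":
    # TCG and TCT both encode serine ("S"), so the G to T change is silent.
    print(CODON_TABLE["TCG"], CODON_TABLE["TCT"], is_silent("TCG", "TCT"))
    # A change at the second position (TCG to TTG) replaces serine with leucine.
    print(CODON_TABLE["TTG"], is_silent("TCG", "TTG"))
```

Two alleles that differ only by changes for which `is_silent` returns True encode the same protein, which is why the first two numeric sections suffice to identify the amino acid sequence.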
In addition to variation in the reporting of allele designations, published HLA and disease association reports also vary with respect to the extent of HLA information reported. In some cases, data are reported only for alleles at a single genetic locus, such as DQB1. In other cases, haplotypes (ie, alleles at multiple genetic loci that lie on the same chromosome) are reported (eg, DRB1*03:01-DQA1*05:01-DQB1*02:01). Haplotypes can be determined directly only in family-based studies, where transmission of sets of alleles from parent to child can be documented; however, many allele combinations are commonly found together because of linkage disequilibrium (LD) in the HLA region, which allows haplotypes to be inferred or estimated with specialized software programs. Some studies report data for genotypes, which represent genetic information taken from both chromosomes. A genotype can be reported for alleles at a single locus, or for haplotypes. The most T1D-predisposing genotype comprises the haplotypes DRB1*03:01-DQA1*05:01-DQB1*02:01 and DRB1*04:01/02/04/05/08-DQA1*03:01-DQB1*03:02/04 (or DQB1*02) and is commonly abbreviated with the short serology notation “DR3/DR4.”
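The relationship between alleles, haplotypes, and genotypes can be sketched as a small data structure in Python. This is an illustrative sketch only: `Haplotype`, `serologic_label`, and `genotype_label` are hypothetical names, and deriving the short serologic label from the first numeric section of the DRB1 allele is a simplifying assumption that happens to hold for the DR3 and DR4 haplotypes discussed here.

```python
from typing import NamedTuple


class Haplotype(NamedTuple):
    """Alleles at three loci that lie on the same chromosome."""
    drb1: str
    dqa1: str
    dqb1: str

    def __str__(self) -> str:
        return "-".join(self)


def serologic_label(haplotype: Haplotype) -> str:
    """Assumed shortcut: 'DRB1*03:01' -> 'DR3' (first section of DRB1)."""
    first_section = haplotype.drb1.split("*")[1].split(":")[0]
    return f"DR{int(first_section)}"


def genotype_label(first: Haplotype, second: Haplotype) -> str:
    """A genotype combines both chromosomes; labels are ordered numerically."""
    labels = sorted(
        (serologic_label(first), serologic_label(second)),
        key=lambda label: int(label[2:]),
    )
    return "/".join(labels)


if __name__ == "__main__":
    dr3 = Haplotype("DRB1*03:01", "DQA1*05:01", "DQB1*02:01")
    # One of the DR4 haplotypes named in the text (DRB1*04:01 with DQB1*03:02).
    dr4 = Haplotype("DRB1*04:01", "DQA1*03:01", "DQB1*03:02")
    print(dr3)
    print(dr4)
    print(genotype_label(dr4, dr3))
```

The short label discards the DQA1 and DQB1 information, so two genotypes with the same "DR3/DR4" label can still differ at the allele level.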
Polymerase chain reaction–based HLA typing has greatly refined and extended our understanding of the HLA associations with T1D that were first observed and reported on the basis of serologic typing. It has also led to an explosion of newly discovered alleles in the past decade, with more than 6000 alleles currently named. Alleles can differ by as little as a single nucleotide in the DNA sequence, or they can have multiple differences. Most of the extensive polymorphism in the HLA genes is found in the portions of the DNA sequence that encode the amino acid residues forming the peptide-binding groove, where these differences can change the shape or the charge within the groove and thus change the repertoire of peptides that can bind to and be presented by HLA.