In the last two decades, all but one of the genes encoding the 30 blood group systems present on red blood cells have been identified. This body of knowledge has permitted the application of molecular techniques to characterize the common blood group antigens and to elucidate the background for some of the variant phenotypes. DNA sequencing methodology was developed in the late 1970s and has become one of the most widely used techniques in molecular biology. In the field of immunohematology, this method is currently used by specialized laboratories to elucidate the molecular basis of unusual blood group phenotypes that cannot be defined by serology and genotyping. Because of the heterogeneity of the blood groups on both the antigen and the genetic level, special knowledge of the biology of blood group systems is needed to design sequencing strategies and interpret sequence data. This review summarizes the technical and immunohematologic expertise that is required when applying sequence-based typing for characterization of human blood groups.
The cloning of the genes of almost all blood group systems (29 of 30 in total) over the last 20 years was the basis for the molecular characterization of many common and rare blood group traits. Sequencing technology, which has become a central method in molecular biology since its development in the late 1970s and allows gene sequences to be analyzed at single-nucleotide resolution, played a decisive role in this. In immunohematology, sequencing has become established as a reference method that is used primarily when unusual blood group properties cannot be resolved by serology or by other genotyping methods. Because the various blood groups differ greatly in their molecular structure and genetic basis, designing sequencing strategies and evaluating sequence data require specialized knowledge of the biology of the blood group systems, which is generally available only in reference laboratories. This review addresses the technical and immunohematologic know-how required when sequencing methods are used to characterize human blood group traits.
In the last 20 years, the field of immunohematology has seen rapid development and application of molecular biologic techniques. Numerous studies have analyzed blood samples from people with known antigen profiles and identified the molecular basis associated with many red blood cell (RBC) antigens. In the process of gathering more information on the molecular basis of RBC phenotypes it became obvious that the blood group systems are based on a great genetic diversity. According to the Blood Group Antigen Gene Mutation (BGMUT; www.ncbi.nlm.nih.gov/projects/gv/mhc/xslcgi.cgi?cmd=bgmut/resource) database of variations in the genes that encode antigens of blood group systems, more than 1,000 alleles of 40 genes are currently known. The characterization of genes and determination of the molecular basis of antigens and phenotypes have made it possible to use PCR to amplify parts of DNA of interest in order to detect alleles encoding blood groups. While there are many molecular events that give rise to blood group antigens and phenotypes, the majority of genetically defined blood group antigens are the consequence of a single nucleotide polymorphism (SNP). Thus, simple DNA-based assays can be used to identify SNPs in genes encoding blood groups. Many genotyping assays have been described to detect specific blood group SNPs, including PCR-RFLP (RFLP = restriction fragment length polymorphism), allele-specific PCR, real-time quantitative PCR, and high-throughput bead technology. However, all these genotyping methods are designed to detect and identify known alleles. In cases where new nucleotide mutations or new combinations of known mutations occur, a nucleotide-by-nucleotide analysis of the respective gene becomes necessary to determine the genetic sequence. Nucleic acid sequencing is a technique that allows determination of the order of the four different nucleotides in a DNA molecule. It is considered the reference method in molecular biology.
The nucleic acid sequence data obtained by this method can provide detailed information on any mutation present, on the organization of genes and on the protein product of the DNA/RNA analyzed.
While nucleic acid sequencing has already become an indispensable tool in blood group research, it is also increasingly used in routine blood grouping to clarify the genetic background of variant blood group phenotypes when genotyping assays do not provide conclusive results. The importance of nucleic acid sequencing as a reference method for blood group determination will grow with the increasing application of molecular methods in immunohematology. It is therefore becoming more and more relevant for immunohematologists to be familiar with the principle and the areas of application of this method.
The introduction of the PCR technique presented the opportunity for the generation of target-specific DNA fragments and the easy detection of known polymorphisms in different genes, for example via sequence-specific primer PCR (SSP-PCR). For the investigation of suspected mutations, DNA sequencing provides a powerful tool. A variety of sequencing methods have been described, mainly based on chain-terminating dideoxynucleotide sequencing published by Sanger et al. and Rapley (ed.). While the original methods were time-consuming and often involved radioisotopes, the introduction of commercially available dye-labeled terminators has increased the reliability of cycle sequencing of PCR products. Each terminator (dideoxynucleotide triphosphate; ddNTP) is labeled with a different fluorescent dye and is covalently incorporated at the end of the extension product during the extension reaction. The resulting products are separated by capillary electrophoresis and detected with laser technology. During this process an electropherogram is generated, which graphically represents the sequence of the PCR product (fig. 1).
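The readout principle of dye-terminator sequencing can be illustrated with a small sketch. It is a toy model, not laboratory software: it assumes that exactly one terminated fragment exists for every position of the newly synthesized strand, and the function names are invented for this illustration.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template: str) -> str:
    """Strand built by the polymerase (5'->3') for a template written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def terminated_fragments(new_strand: str, seed: int = 0) -> list[tuple[int, str]]:
    """One fragment per position: (fragment length, dye-labeled terminal base)."""
    fragments = [(i + 1, base) for i, base in enumerate(new_strand)]
    random.Random(seed).shuffle(fragments)  # fragments are mixed in the reaction tube
    return fragments


def read_by_size(fragments: list[tuple[int, str]]) -> str:
    """Electrophoresis orders fragments by length; the terminal dyes spell the sequence."""
    return "".join(base for _, base in sorted(fragments))


template = "ATGC"
fragments = terminated_fragments(synthesized_strand(template))
print(read_by_size(fragments))  # GCAT
```

Sorting by fragment length stands in for capillary electrophoresis, and the terminal base of each fragment stands in for the fluorescent dye detected by the laser.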
There are principally two ways to sequence DNA: direct sequencing and sequencing of cloned DNA segments. Direct sequencing means that the PCR products obtained by amplification of a certain DNA part are directly sequenced without first cloning the fragment. Figure 2 schematically shows the general procedure of a direct sequencing experiment.
In contrast to direct sequencing, sequencing of cloned DNA refers to the procedure in which a DNA sequence is amplified by genetic engineering techniques and then used as template for nucleic acid sequencing. The major advantages over direct sequencing are that the cloned DNA can be propagated indefinitely and that individual haplotypes are separated which can subsequently be sequenced. The drawbacks are that cloning is labor-intensive and time-consuming and requires special know-how and laboratory equipment. In addition, it can sometimes be difficult to discriminate true mutations from those artificially introduced into the DNA during the amplification process in bacteria. Nevertheless, sequencing of cloned DNA is particularly helpful in cases where haplotypes cannot otherwise be separated or a chimerism is suspected, suggesting the presence of more than two haplotypes.
The classical method of testing for blood group antigens and antibodies is hemagglutination. This technique is simple and inexpensive and, when done correctly, has a specificity and sensitivity that is appropriate for the clinical care of the vast majority of patients. However, in some aspects, hemagglutination has limitations. For example, it cannot indicate RHD zygosity in D-positive individuals precisely and it cannot be relied upon to type patients and donors who have a positive direct antiglobulin test or who have recently received transfusions. DNA-based blood group typing assays have proven extremely useful in those areas where hemagglutination is of limited value. There is a growing list of clinical applications of DNA analysis for blood group antigens. At present, methods for blood group genotyping are predominantly used as a supplement to hemagglutination. In clinical laboratories it is becoming more and more common practice to follow a graduated operating procedure that provides for hemagglutination followed by DNA analysis when a blood sample shows an unusual blood group phenotype or contains difficult-to-identify RBC antibodies. While this step-wise approach proves successful in many cases, its limitation is shown by the fact that there is consistently a number of blood samples with unusual serologic characteristics for which the genotyping results do not match the corresponding phenotype. In these situations, extended DNA analysis by sequence-based typing is the most reliable way to define the genetic variations that may have caused the unusual blood group phenotypes. Consequently, modern blood group diagnostics is based on a three-step algorithm including serology, genotyping, and sequencing (fig. 3).
For correct determination of variant blood groups it is crucial for the immunohematologist to know when sequencing is indicated. Figure 3 shows a flow chart demonstrating which findings make further DNA analysis by nucleic acid sequencing necessary. Initially, the genotyping result for a blood sample has to be compared with the possible combinations of known alleles and correlated with the serology-defined blood group. If the genotyping result is either incompatible with known allele combinations or inconsistent with the RBC phenotype, the genotype must subsequently be determined by sequence analysis for further clarification. In other words, sequencing must always be considered if genotyping does not provide plausible results that fit the phenotype.
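The decision rule described in this paragraph can be written down as a short sketch. The allele and phenotype names are placeholders invented for the example, and the lookup table stands in for whatever allele catalog a laboratory actually uses; it is an illustration of the logic only.

```python
def sequencing_indicated(genotype, expected_phenotypes, observed_phenotype):
    """Return (decision, reason) for one sample.

    genotype            -- frozenset of allele names found by genotyping
    expected_phenotypes -- dict mapping known allele combinations to the
                           phenotype they are expected to produce (hypothetical catalog)
    observed_phenotype  -- phenotype defined by serology
    """
    if genotype not in expected_phenotypes:
        return True, "genotype incompatible with known allele combinations"
    if expected_phenotypes[genotype] != observed_phenotype:
        return True, "genotype inconsistent with the RBC phenotype"
    return False, "genotype plausible and consistent with serology"


# Placeholder catalog: allele names and phenotypes are invented.
catalog = {
    frozenset({"allele_1"}): "antigen-positive",
    frozenset({"allele_1", "allele_2"}): "antigen-positive",
    frozenset({"allele_2"}): "antigen-negative",
}

print(sequencing_indicated(frozenset({"allele_2"}), catalog, "antigen-negative"))
print(sequencing_indicated(frozenset({"allele_2"}), catalog, "antigen-positive"))
print(sequencing_indicated(frozenset({"allele_1", "allele_9"}), catalog, "antigen-positive"))
```

Only the first sample would be resolved by serology and genotyping alone; the other two would be passed on to sequence analysis.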
When designing a sequencing strategy, the following questions have to be addressed:
Depending on the structure and allelic repertoire of the gene of interest, some or all of these questions may become relevant for blood group sequencing.
A gene is a portion of DNA that contains both ‘coding’ sequences that determine what the gene does, and ‘non-coding’ sequences that determine when the gene is active (expressed). The number of exons differs between the blood group genes, e.g. ABO has 7 exons, RHD has 10, and KEL has 19 exons. The length of the intermediate non-coding regions shows a wide range and significantly contributes to the total length of the gene. Thus, it depends on the gene structure and the lengths of the intron sequences whether the complete blood group gene can be amplified by a single PCR reaction or whether a set of PCR reactions is needed to amplify different gene segments. Most sequencing strategies focus on the coding parts and the adjacent splice sites of the blood group genes because mutations in these regions can directly affect the associated blood group phenotypes. The FY(DARC) gene, for example, comprises 1,558 base pairs and can therefore easily be amplified by a single PCR and afterwards sequenced using three or four primers. In contrast, the RHD gene comprises more than 55,000 base pairs while its coding region consists of only 1,254 nucleotides (about 2%). RHD has 10 exons and the size of the intervening introns ranges from 426 base pairs to 11.8 kilobase pairs. Because of the considerable lengths of most of the introns, most investigators prefer an exon-wise amplification and sequencing approach to determine the RHD genotype.
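The size comparison between FY and RHD can be made explicit with a few lines of arithmetic. The gene and coding lengths are taken from the text; the maximum amplicon length of 5,000 base pairs is an assumed value chosen only for illustration and depends on the PCR system actually used.

```python
def coding_fraction(coding_length: int, gene_length: int) -> float:
    """Share of the gene that is coding sequence."""
    return coding_length / gene_length


def amplification_strategy(gene_length: int, max_amplicon: int = 5000) -> str:
    """Pick a strategy; max_amplicon is an assumed limit, not a fixed rule."""
    return "single PCR" if gene_length <= max_amplicon else "exon-wise PCR"


# RHD: more than 55,000 bp in total, 1,254 nt coding (figures from the text).
print(f"RHD coding fraction: {coding_fraction(1254, 55000):.1%}")  # 2.3%
print("FY :", amplification_strategy(1558))    # single PCR
print("RHD:", amplification_strategy(55000))   # exon-wise PCR
```

Because 55,000 base pairs is a lower bound for the RHD gene length, the computed 2.3% is an upper bound for its coding fraction.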
The two RH genes, RHD and RHCE, are highly homologous and share more than 90% identity. Because the two genes lie in opposite orientation, the molecular prerequisite for gene conversion in cis is present. The proposed mechanism is the formation of a hairpin at the chromosomal level, resulting in the alignment of two homologous gene segments in identical orientation. Resolving the hairpin leads to different hybrid genes often recognized in RHD and RHCE, for example D category VI type 2 (RHDCE(4-6)-D) or RHCE-D(5)-CE. Parts of RH exons can also be exchanged with their D- or CE-specific counterpart (e.g. D category Va type 1 or E Variant type III). So far, no deletion of the RHCE gene comparable to the RHD deletion in D-negative individuals is known. Therefore, hybrid genes may sometimes remain unrecognized by exon-wise sequencing of the RHCE gene when a normal RHCE in trans is present. The primer design for PCR and/or cycle sequencing requires the differentiation between these two genes. A similar situation exists for the MNS blood group system, in which three homologous genes with an identity of 95% and higher (GYPA, GYPB, GYPE) are present. As described for RH, gene conversion occurs in GYPA and GYPB. Thus, amplification and sequencing of homologous genes require special attention to a possible misinterpretation of data and a profound knowledge of the methods in use.
For correct definition of the genotype it is necessary to determine whether nucleotide mutations are located on the same allele (cis configuration) or on different alleles (trans configuration) (fig. 4A). There are principally two strategies to determine the cis/trans linkage of mutations in a certain gene segment by direct sequencing. One strategy is to simultaneously amplify both alleles of the template (generic amplification) and use allele-specific primers for sequencing (fig. 4B). The alternative strategy is to generate haplotype-specific templates using allele-specific primer pairs; these templates can then be sequenced using generic or allele-specific primers (fig. 4C). Separation of haplotypes by cloning and subsequent DNA sequencing provides another opportunity to determine the cis/trans linkage of polymorphisms. Haplotype determination can also be accomplished by the bead technology (so far only available for HLA genotyping). The principle of this method is the haplotype-specific extraction of DNA with specific oligonucleotides bound to electromagnetic beads (QIAGEN, Hilden, Germany).
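Once the two haplotypes have been sequenced separately, for example from haplotype-specific templates, the cis/trans assignment reduces to a simple comparison. The following sketch assumes two already separated, equally long haplotype sequences and substitutions only; the sequences and function names are invented for the illustration.

```python
def variant_positions(reference: str, haplotype: str) -> set[int]:
    """1-based positions at which a haplotype differs from the reference."""
    return {i + 1 for i, (r, h) in enumerate(zip(reference, haplotype)) if r != h}


def linkage(reference: str, hap1: str, hap2: str, pos_a: int, pos_b: int) -> str:
    """Classify two variant positions as 'cis', 'trans', or 'undetermined'."""
    v1 = variant_positions(reference, hap1)
    v2 = variant_positions(reference, hap2)
    if {pos_a, pos_b} <= v1 or {pos_a, pos_b} <= v2:
        return "cis"    # both mutations on the same haplotype
    if (pos_a in v1 and pos_b in v2) or (pos_a in v2 and pos_b in v1):
        return "trans"  # one mutation on each haplotype
    return "undetermined"


reference = "ACGTACGT"
print(linkage(reference, "AGGTACTT", "ACGTACGT", 2, 7))  # cis
print(linkage(reference, "AGGTACGT", "ACGTACTT", 2, 7))  # trans
```

Both samples would give the same heterozygous calls at positions 2 and 7 in a generic direct sequencing run, which is why haplotype separation is needed to tell them apart.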
Ribonucleic acid or RNA is a nucleic acid polymer consisting of nucleotide monomers that plays several important roles in the processes that translate genetic information from DNA into protein products; RNA acts as a messenger between DNA and the protein synthesis complexes known as ribosomes. RNA is less stable in the cell and experimentally also more prone to nuclease attack. As RNA is generated by transcription from DNA, the information is already present in the cell's DNA. However, it is sometimes desirable to sequence RNA molecules. In particular, in eukaryotes RNA molecules are not necessarily co-linear with their DNA template, as introns are excised. To sequence RNA, the usual method is first to reverse transcribe the sample to generate DNA fragments. These can then be sequenced as described above [12, 13]. Figure 5 shows the organization of the RHCE gene and its mRNA transcript. Similar to RHD, the intron sequences are up to 11 kilobase pairs long, whereas the RNA produced by transcription and splicing is only 1,254 bases long. Analogously to genomic DNA sequencing, the cis/trans linkage of mutations can be determined by using allele-specific amplification or sequencing primers.
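Why a transcript is not co-linear with its gene can be shown with a toy example. The sequence and exon coordinates below are invented; the sketch only illustrates that splicing joins the exons and discards the introns, so the reverse-transcribed template is much shorter than the genomic one.

```python
def splice(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, end-exclusive) into the processed transcript.

    The result is written in DNA letters, as it would appear after
    reverse transcription into cDNA.
    """
    return "".join(genomic[start:end] for start, end in sorted(exons))


genomic = "AAAgtcccagGGG"  # toy gene: exon, intron (lower case), exon
transcript = splice(genomic, [(0, 3), (10, 13)])
print(transcript)                     # AAAGGG
print(len(genomic), len(transcript))  # 13 6
```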
A main problem of handling RNA is its instability in blood samples. Because RNA can be rapidly degraded by RNases, storage and preparation of RNA require special care. The simplest (and cheapest) method to keep RNA in blood samples stable is freezing of the sample immediately after collection (below −30 °C). In addition, some companies offer special blood collection tubes or stabilizers for storage of RNA samples. Several methods for manual and automated RNA extraction from blood samples exist. Independent of the procedure, the sample should be checked for possible contamination with DNA. An easy control is the amplification of the RNA preparation with intron-specific primer pairs; a detectable PCR product is an indicator of contamination with DNA.
Several computer programs are available for the analysis of sequencing data and comparison to known reference sequences. In our institute we use the Sequencing Analysis Software v. 5.2® (Applied Biosystems, Foster City, CA, USA) for analysis of raw data and either SeqScape® (Applied Biosystems) or MacVector® (Accelrys Inc., San Diego, CA, USA) to compare the sequence of a sample with a known reference.
The exactness of the alignment and the annotation of nucleotide positions depend on the settings of the programs. Heterozygous positions can be overlooked, especially if the second peak is much lower than the first, so that single mutations are missed. To avoid this error, we regularly perform a manual examination of the sequence. For the comparison with reference sequences the NCBI databank (National Center for Biotechnology Information; www.ncbi.nlm.nih.gov) provides a huge number of sequences. If the sequencing for a gene is performed for the first time, we compare our data with several references to avoid a misinterpretation because of sequencing errors. An example can be given for the RHD gene: The alignment with Acc. No. L08429 shows a missense mutation at position 1,035 while the comparison with another reference or with a control sample reveals a wild-type sequence. The gene name, numbers of exons, and reference sequence for the blood group systems which are routinely sequenced in our institute (except ABO) are given in table 1.
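How a program setting can hide a heterozygous position is easy to see in a simplified base-calling sketch. The peak heights and the threshold values are invented for the illustration and do not reflect the defaults of any commercial software; the ambiguity letters follow the IUPAC nucleotide code.

```python
# IUPAC codes for two-base mixtures.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_base(peaks: dict[str, float], min_ratio: float = 0.25) -> str:
    """Call one position from dye peak heights.

    A second base is reported only if its peak reaches min_ratio of the
    main peak; min_ratio is an assumed, adjustable setting.
    """
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    (base1, height1), (base2, height2) = ranked[0], ranked[1]
    if height1 > 0 and height2 / height1 >= min_ratio:
        return IUPAC[frozenset((base1, base2))]
    return base1


weak_second_peak = {"C": 900, "T": 150, "A": 10, "G": 5}
print(call_base(weak_second_peak, min_ratio=0.25))  # C  (heterozygosity overlooked)
print(call_base(weak_second_peak, min_ratio=0.10))  # Y  (C/T reported)
```

A stricter threshold suppresses background noise but also the weaker allele, which is the reason for checking the electropherogram manually.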
A deletion or insertion of one or more nucleotides can cause problems in computer-based analysis of sequencing data if a generic sequencing strategy was used. The missing or additional nucleotides cause a frame shift in one haplotype and lead to a superimposition of the haplotype sequences 3’ of this mutation (fig. 1), so that manual analysis is required. Databases are available listing nucleotide changes so far detected in blood group genes. For most of the blood group systems the BGMUT database is a useful source of information about known variants. For the variations of the RHD gene, however, the ‘Rhesus Base’ on the server of the University of Ulm (www.uni-ulm.de/~fwagner/RH/RB) is a regularly updated database with information about phenotype and origin of the mutation as well as several links to publications. Unfortunately, the database does not include the RHCE gene, which is highly homologous to the RHD gene and also shows a great number of known and newly detected variations [15, 16].
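The superimposition caused by a heterozygous deletion can be reproduced with two toy haplotypes. The sequences are invented, and the sketch simply overlays both haplotypes position by position, as a generic sequencing reaction would.

```python
def superimpose(hap1: str, hap2: str) -> list[str]:
    """Overlay two haplotypes as read together in one generic sequencing run."""
    return [a if a == b else f"{a}/{b}" for a, b in zip(hap1, hap2)]


def first_mixed_position(hap1: str, hap2: str):
    """1-based position of the first double peak, or None if the reads agree."""
    for position, call in enumerate(superimpose(hap1, hap2), start=1):
        if "/" in call:
            return position
    return None


wild_type = "ATGACGTTAGCA"
deleted = wild_type[:5] + wild_type[6:]  # haplotype lacking nucleotide 6

print(superimpose(wild_type, deleted))
# ['A', 'T', 'G', 'A', 'C', 'G/T', 'T', 'T/A', 'A/G', 'G/C', 'C/A']
print(first_mixed_position(wild_type, deleted))  # 6
```

The read is clean up to the deletion and mixed at most positions after it; the position of the first double peak marks where the two haplotypes go out of register.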
To make a new mutation public the ‘EMBL database’ (European Molecular Biology Laboratory; www.ebi.ac.uk/embl) offers a submission entry in which all information about the source of the sample, the blood group system, and the kind of variation is stored.
Nucleic acid sequencing provides detailed information on the genetic sequence by determining the order of the four different nucleotides in a DNA molecule. The advantage of this method compared to other DNA-based typing approaches is that not only known but also novel nucleotide mutations can be detected. However, it is a general limitation of nucleic acid sequencing that the genetic data obtained are restricted to those parts of the DNA for which a certain sequencing strategy was designed. Therefore, immunohematologists who interpret sequencing results need to have a profound knowledge of the molecular basis of blood groups as well as of the specific sequencing strategy used.
It is known from numerous publications that the phenotypical relevance of a certain mutation depends on its type and location within the blood group gene. In general, mutations within coding and regulatory regions are more likely to have an impact on the blood group phenotype than mutations located within the non-coding gene segments. In addition, multiple base exchanges as well as single sequence mutations significantly affecting transcription or translation of the encoded protein (splice site mutation, early stop codon) normally have a greater influence on blood group antigen formation than nucleotide mutations resulting in a single amino acid exchange only. Nevertheless, the real impact on the phenotype of a given sequence variation can only be assessed if the biology of the respective blood group is also considered.
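The distinction between single amino acid exchanges and premature stop codons can be illustrated at the codon level. The sketch below uses the standard genetic code and an invented four-codon coding sequence; it classifies a substitution only by its effect on the codon and, as stated above, says nothing about the actual phenotype.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(cds: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution at a 1-based position of a coding sequence."""
    index = position - 1
    offset = index % 3
    ref_codon = cds[index - offset:index - offset + 3]
    alt_codon = ref_codon[:offset] + new_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"   # premature stop codon
    if ref_aa == "*":
        return "stop-loss"
    return "missense"       # single amino acid exchange


cds = "ATGTGGAAATAA"  # invented sequence: Met-Trp-Lys-stop
print(classify_substitution(cds, 6, "A"))  # nonsense (TGG -> TGA)
print(classify_substitution(cds, 7, "G"))  # missense (AAA -> GAA)
print(classify_substitution(cds, 9, "G"))  # silent   (AAA -> AAG)
```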
If the antigens of a blood group system reside on a (glyco)protein, their occurrence depends on the conformation and surface expression level of the protein. The protein conformation in turn is based on a number of different factors: the membrane protein type (single-pass, multi-pass, glycosylphosphatidylinositol (GPI)-linked, passively absorbed), the structure of the extracellular domain(s), and the degree of glycosylation. Dependence of the phenotypical relevance of an amino acid exchange on the topology of the protein and the site where it occurs in the molecule is nicely exemplified by the multi-pass protein RhD. Amino acid substitutions in weak RhD are predominantly located in the intracellular or transmembrane protein segments and those found in partial RhD phenotypes in the extracellular protein segments. Whereas weak RhD samples are characterized by a reduced RhD antigen density, partial RhDs have altered RhD proteins that differ sufficiently from normal RhD to allow allo-anti-D production. These findings strongly suggest that the changes in shape and expression level of RhD caused by amino acid substitutions follow a rule that is strongly associated with the domains defined by the three-dimensional structure of the protein.
Different types of mutations in blood group genes were shown to cause blood group null phenotypes. In 2001 and 2004 two large studies on the presence of RHD in RhD-negative Europeans were published which found large hybrid genes (e.g. RHDCE(2-9)-D), disrupted start or stop codons, splice site mutations, and single base exchanges in RHD to be associated with truncated or non-expressed Rh proteins. RHDψ and Cdes are the leading RHD null alleles in populations of African origin and K409K in Asians [8, 20, 21]. Non-expressed genes also exist in other blood group systems. In the FY gene a mutation in the GATA-1 binding motif causes Fy(a− b−) phenotypes, frequently observed in individuals of African origin. Similar to RHD, the KEL gene shows a broad spectrum of null alleles detected in a large survey in Austria. So far, 17 different non-expressed KEL genes are known with either splice site mutations in introns or premature stop codons caused by different single mutations.
Many blood group proteins are expressed in close association with (e.g. RhD in the Rhesus complex) or are covalently linked (Kell and Kx) to other membrane proteins. This explains why abnormal expression of a neighboring protein or mutations in those parts of the blood group protein which are in direct contact with other RBC proteins can cause unusual blood group phenotypes. For example, cell surface expression of RhD blood group polypeptide is posttranscriptionally regulated by the RhAG glycoprotein; in most cases, the lack of Rhesus in Rh(null) red cells is associated with mutations in the RHAG gene. Thus, for correlation of the genetic with the serologic findings a basic understanding of the molecular mechanisms associated with the intracellular transport pathways of membrane proteins and the requirements for cell surface expression is needed.
In those cases in which the blood group antigens of interest are carried by carbohydrate chains, it is critical for assessing the phenotype from the genotype to know that the monosaccharide residues are sequentially assembled by the action of specific glycosyltransferases and that the sequence in sugar chains is determined by the substrate specificity of the respective glycosyltransferases involved. The AB determinants of the ABO blood group, for example, are characterized by a single monosaccharide residue, and their biosynthesis is based on the action of the A or B allele products, α1,3-N-acetylgalactosaminyl- or α1,3-galactosyltransferase. The H structure, which is the determinant of blood group O and the substrate for the A and B transferases, is in turn the result of the action of the H gene-encoded α1,2-fucosyltransferase. This shows that the genes of sugar-based blood group systems are not responsible for the synthesis of the entire blood group substance (as is the case for the blood group proteins); rather, they are responsible only for the transfer of the respective immunodominant monosaccharide to a suitable acceptor substance.
The difference in enzyme specificity between the highly homologous A and B transferases is merely based on amino acid exchanges at three positions of the catalytic cleft of the AB enzyme. However, additional amino acid variations in this region can change the specificity of the glycosyltransferase or produce enzymes with dual specificity generating AB phenotypes (e.g. cis-AB). More frequently, amino acid exchanges in the catalytic domain of the ABO transferase are associated with a reduction or complete loss of enzymatic activity (e.g. Ael blood group). According to the principle that the assembly of the sugar chains is based on the sequential action of different glycosyltransferases, the synthesis of an oligosaccharide chain and its antigenic epitopes is interrupted when one step in the biosynthesis sequence is blocked. This regularly happens in O individuals who express only non-functional ABO transferases incapable of converting the H determinants. Another example is the H-deficient phenotype (Bombay phenotype), in which the formation of H-specific structures is blocked because of a catalytically inactive H transferase, so that no A or B substance can be produced. Overall, the fact that the impact of genetic variations on antigen formation is indirect and mediated by enzymes makes phenotype-genotype correlation in sugar-based blood group systems very complex. The recent finding that aberrant trafficking of variant A and B transferases can be involved in the formation of weak ABO phenotypes further emphasizes the need for special molecular biologic knowledge when dealing with unusual blood group phenotypes.
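The sequential logic of the biosynthetic pathway can be summarized in a deliberately simplified sketch. It treats each transferase as either active or inactive and ignores quantitative effects such as weak subgroups or residual H antigen; the function name and the yes/no model are assumptions made for the illustration.

```python
def abh_antigens(h_transferase_active: bool,
                 a_transferase_active: bool,
                 b_transferase_active: bool) -> set[str]:
    """Antigens formed when the transferases act one after another.

    Simplified yes/no model: without H substance there is no acceptor
    for the A and B transferases, whatever the ABO genotype.
    """
    antigens: set[str] = set()
    if not h_transferase_active:
        return antigens          # H-deficient: pathway blocked at the first step
    antigens.add("H")            # H structure, the substrate for A and B transferases
    if a_transferase_active:
        antigens.add("A")
    if b_transferase_active:
        antigens.add("B")
    return antigens


print(abh_antigens(True, False, False))  # {'H'}: group O
print(abh_antigens(True, True, False))   # H and A present
print(abh_antigens(False, True, True))   # set(): no A or B despite active transferases
```

The last case shows why the ABO genotype alone cannot predict the phenotype: functional A and B alleles remain without effect when the preceding step is blocked.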
Sequence databases like BGMUT are extremely helpful in validation and interpretation of sequencing results when known nucleotide mutations are detected. However, in those cases in which novel mutations are found it is up to the immunohematologist to check for plausibility of these findings and to discriminate real sequence variations from technical artifacts. An even more challenging situation for the investigator is when no sequence variations are found in a sample with an unusual phenotype. Differential diagnosis in this case includes phenotypically relevant mutations in those parts of the gene that are not covered by the sequencing strategy used, antigen diminution due to epigenomic factors like DNA methylation, and antigen alteration associated with various medical conditions [30, 31]. At this point, reference laboratories with the relevant scientific expertise and an extended spectrum of methods should be engaged to assist in the diagnosis of unusual blood groups.
The increasing use of molecular genetic typing methods in immunohematology will further augment the importance of sequence-based typing as a reference method for genetic determination of unusual human blood group phenotypes. In the last few years, research on the genetic background of the blood group systems revealed that some of the systems, particularly ABO and Rhesus, show a great allelic diversity similar to that observed for HLA. Because only selected populations have been studied so far, it can be expected that the nucleotide sequence database on blood groups will continuously increase in the future. Since traditional genotyping methods are based on detection of known nucleotide mutations, both the increasing number of alleles and the fact that allele frequencies vastly differ between populations generally limit their possible applications. Nucleic acid sequencing is a powerful tool and provides by far the most detailed analysis of a particular piece of DNA. New high-throughput technologies for DNA sequencing combined with powerful computer-based data analysis have opened the avenue for rapid and efficient large-scale typing of donors and patients. Therefore, sequence-based typing strategies including whole gene sequencing hold the potential to become the standard for blood group genotyping.
The authors declared no conflict of interest.