The major histocompatibility complex (MHC) class I region of nonhuman primates (NHPs) is highly polymorphic and has undergone complex segmental duplications, such that the number of expressed genes differs between individuals. In addition, the relative abundance of transcripts varies more than 100-fold between NHP class I genes. This unparalleled complexity makes rapid, efficient class I genotyping of NHPs difficult. The ‘gold standard’ of cDNA library construction, screening and sequencing is both costly and labor-intensive. Several rapid genotyping methods have been used, but all require some degree of prior sequence knowledge. Here, we describe a method for sequence-based MHC class I genotyping that reduces cost by (1) pooling molecularly barcoded class I cDNA-PCR amplicons for cloning and (2) targeting sequencing to a region of concentrated polymorphism spanning the two exons that encode the peptide binding domain. This method can efficiently genotype both known and novel MHC class I alleles. In addition, full-length cDNA amplicons with novel sequences can be resequenced in their entirety to expand the repertoire of characterized MHC class I sequences for NHPs.
The major histocompatibility complex (MHC) class I genes play a vital role in the adaptive immune response mounted against viral and bacterial infections. In macaque monkeys, common models for infectious disease and transplantation research, the MHC class I genes are highly polymorphic and have undergone segmental duplication, leading to high allelic diversity and a variable number of MHC transcripts in each animal. Variation between MHC class I alleles is concentrated within the peptide binding domain. Thus, the repertoire of MHC class I alleles expressed by a macaque determines the specificity of its cytotoxic T lymphocyte responses, because different alleles bind and present different peptides [3-5]. It is therefore important to determine the MHC class I genotypes of macaques used in biomedical research protocols.
Various methods have been applied to macaque MHC genotyping, including denaturing gradient gel electrophoresis (DGGE), reference strand conformational analysis (RSCA), sequence-specific PCR (PCR-SSP), and microsatellite analysis. DGGE and RSCA can both resolve multiple MHC alleles in a single run and can potentially identify both known and novel alleles present in an animal. However, analysis with these techniques is complicated by co-migration of heteroduplexes, and both methods require clones of known sequence as mobility exemplars for positive identification [6,7]. PCR-SSP assays are difficult to multiplex and are available for only a very small fraction of the nearly 1000 nonhuman primate (NHP) class I alleles identified to date (primarily alleles common among Indian-origin rhesus macaques) [5,8,9]. Developing PCR-SSP assays for additional alleles depends on identifying one or more single nucleotide polymorphisms that differentiate the allele of interest from all other alleles found in a particular species.
Microsatellite analysis can easily and economically identify shared MHC class I haplotypes, particularly between related individuals, but provides no information on the alleles encoded on each haplotype unless an independent genotyping method is used to associate specific class I alleles with microsatellite haplotypes [10,11]. Overall, all of these methods require prior knowledge of allele sequences for each macaque species.
The ‘gold standard’ for MHC class I genotyping is cloning and sequencing from an animal's cDNA library, since this approach has the potential to detect all expressed alleles. Its disadvantage is the prohibitive cost of library construction, screening, and sequencing enough clones per animal for effective genotyping. We have developed a rapid method for sequence-based genotyping that amplifies full-length NHP class I cDNAs molecularly barcoded with multiplex identifier (MID) tags. This approach allows pools of PCR amplicons from multiple animals to be ligated and transformed together, reducing cloning costs. Additionally, our method focuses sequencing on a concentrated region of high polymorphism in exons 2 and 3, which encode the peptide binding domain. By sequencing only this region, we can unambiguously resolve the majority of known macaque alleles while substantially reducing sequencing costs. Finally, this method also detects previously unidentified alleles, which can easily be sequenced with additional primers to fully characterize these novel MHC class I sequences.
The workflow outlined in Fig. 1 describes a method for rapid, cost-effective sequence-based genotyping of MHC class I alleles expressed in NHPs. Total cellular RNA is converted to first-strand cDNA, and PCR amplicons containing the complete open reading frame of class I A and I B transcripts are generated using primers in the 5′ and 3′ untranslated regions. Purified amplicons are cloned in E. coli, and the resulting plasmid DNAs (typically 192 per animal) serve as templates for single-pass Sanger sequencing with a primer that flanks the highly polymorphic peptide binding domains encoded by exons 2 and 3 of the class I genes. BLASTn analysis of the resulting sequences against a database of known NHP MHC sequences yields a summary of the class I transcripts expressed by an animal. The balance between cost and sensitivity of this genotyping approach can be controlled by altering the depth of sequence coverage. Depending on experimental goals, barcoded cDNA amplicons from four or more animals may be pooled and genotyped simultaneously. Because plasmids containing novel cDNA sequences can be prioritized for full-length sequencing, this approach is particularly useful for allele discovery in less well-characterized NHP populations such as pig-tailed macaques.
|Oligonucleotide|Name|Sequence|
|---|---|---|
|5′ PCR primer|5′MHC-UTR|5′ [MID]AGAGTCTCCTCAGACGCCGAG 3′|
|3′ IA PCR primer|3′MHC-UTR-A|5′ CAGGAACAYAGACACATTCAGG 3′|
|3′ IB PCR primer|3′MHC-UTR-B|5′ GGCTGTCTCTCCACCTCCTCAC 3′|
|MID tag set|MID1|5′ ACGAGTGCGT 3′|
||MID2|5′ ACGCTCGACA 3′|
||MID3|5′ AGACGCACTC 3′|
||MID4|5′ AGCACTGTAG 3′|
||MID5|5′ ATCAGACACG 3′|
||MID6|5′ ATATCGCGAG 3′|
||MID7|5′ CGTGTCTCTA 3′|
||MID8|5′ CTCGCGTGTC 3′|
||MID9|5′ TAGTATCAGC 3′|
||MID10|5′ TCTCTATGCG 3′|
||MID11|5′ TGATACGTCT 3′|
||MID12|5′ TACTGAGCTA 3′|
||MID13|5′ CATAGTAGTG 3′|
||MID14|5′ CGAGAGATAC 3′|
To allow rapid total RNA isolation from fresh whole blood or from fresh or frozen cell samples from up to 32 animals simultaneously, we use the Roche MagNA Pure LC Instrument. Other RNA isolation techniques can be substituted if this instrument is not available. Regardless of isolation technique, all reagents and consumables should be strictly RNase-free. Appropriate protective equipment should be worn when handling reagents from the MagNA Pure LC RNA Isolation Kit. Typical RNA concentrations from robotic isolation range from 10 to 35 ng/μl in a 50 μl elution volume.
We use an MJ Research Tetrad Thermocycler (Bio-Rad Laboratories) for all incubation steps, but other thermocyclers may be used.
The PCR primers are specific for the MHC class I untranslated regions and generate a ~1.25 kb amplicon containing the entire open reading frame. A common 5′ primer is paired with two locus-specific 3′ primers, so two PCR reactions are performed for each cDNA sample. The 3′MHC-UTR-A oligo is more specific for, and preferentially amplifies, alleles of the MHC class I A loci; likewise, 3′MHC-UTR-B primarily amplifies alleles of the class I B loci. Additionally, MID tag sequences can be engineered into the 5′ end of each oligo during synthesis to generate molecularly barcoded PCR products. By using a unique set of MID-tagged primers for each animal, PCR products from multiple animals can be pooled, reducing the total number of ligation and transformation reactions required for cloning. Fourteen such MID tag sequences, designed by Roche/454 Life Sciences, are listed above, and additional tags can be designed as necessary. To ensure that enough clones are generated per animal for routine genotyping, we do not recommend pooling more than four animals. However, up to 14 animals may be pooled if the primary goal is discovery of novel class I alleles in understudied populations, or for other studies that require only a low depth of coverage.
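When MID tags are prepended to the 5′MHC-UTR primer, the tagged oligo is simply the 10-nt tag followed by the gene-specific sequence, and any newly designed tag should stay clearly distinguishable from the existing ones. The minimal Python sketch below builds a tagged forward primer from the table and reports the closest pair of tags by Hamming distance. The function names are hypothetical, and the 3-mismatch acceptance threshold in `accept_new_tag` is an illustrative assumption, not a Roche/454 design rule.

```python
from itertools import combinations

# Roche/454 MID tags from the primer table (all 10 nt long).
MID_TAGS = {
    "MID1": "ACGAGTGCGT", "MID2": "ACGCTCGACA", "MID3": "AGACGCACTC",
    "MID4": "AGCACTGTAG", "MID5": "ATCAGACACG", "MID6": "ATATCGCGAG",
    "MID7": "CGTGTCTCTA", "MID8": "CTCGCGTGTC", "MID9": "TAGTATCAGC",
    "MID10": "TCTCTATGCG", "MID11": "TGATACGTCT", "MID12": "TACTGAGCTA",
    "MID13": "CATAGTAGTG", "MID14": "CGAGAGATAC",
}

# Gene-specific portion of the common 5' primer (5'MHC-UTR), written 5'->3'.
FWD_5UTR = "AGAGTCTCCTCAGACGCCGAG"


def tagged_forward_primer(mid: str) -> str:
    """5'->3' sequence of 5'MHC-UTR with the MID tag placed directly at its 5' end."""
    return MID_TAGS[mid] + FWD_5UTR


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_pair(tags: dict) -> tuple:
    """(distance, name1, name2) for the two most similar tags in the set."""
    return min((hamming(tags[a], tags[b]), a, b) for a, b in combinations(tags, 2))


def accept_new_tag(candidate: str, tags: dict, min_dist: int = 3) -> bool:
    """Hypothetical screening rule: keep a candidate only if it differs from
    every existing tag at >= min_dist positions (threshold is an assumption)."""
    return all(hamming(candidate, t) >= min_dist for t in tags.values())


print(tagged_forward_primer("MID1"))  # ACGAGTGCGTAGAGTCTCCTCAGACGCCGAG
print(closest_pair(MID_TAGS))
```

Placing the tag at the very 5′ end keeps the 21-nt priming sequence unchanged, so every animal's reaction uses the same annealing chemistry.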
To reduce PCR-induced error, the total number of cycles for each reaction should be kept to the minimum required for sufficient amplification; this number will vary by sample. We use the FlashGel DNA cassette system (Lonza) to rapidly check an aliquot of each reaction for amplification, first after 20 cycles and periodically thereafter. Alternatively, several PCR reactions can be set up for each sample and run in parallel with varying numbers of cycles. Sufficient amplification corresponds to a product concentration of at least 0.4 ng/μl (a 1.5 kb band visible on a FlashGel in ambient light contains at least 1.6 ng DNA). If more than 30 PCR cycles are used, cloning and sequencing will likely reveal an unacceptably high number of amplicons with PCR errors.
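The cycle-monitoring rule can be written as a small decision function. The 0.4 ng/μl, 1.6 ng, 20-cycle and 30-cycle figures come from the text; the 4 μl FlashGel load is an assumption inferred from 1.6 ng / 0.4 ng/μl, and the 3-cycle check interval is also an assumption, since the text only says "periodically". Function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

MIN_CONC_NG_PER_UL = 0.4  # threshold for sufficient amplification (from the text)
VISIBLE_BAND_NG = 1.6     # minimum DNA in a visible 1.5 kb FlashGel band (from the text)
FIRST_CHECK = 20          # first FlashGel check (from the text)
MAX_CYCLES = 30           # above this, PCR errors become unacceptable (from the text)


def check_schedule(step: int = 3) -> list:
    """Cycle numbers to check: 20, then every `step` cycles up to MAX_CYCLES.
    The step size is an ASSUMPTION."""
    return list(range(FIRST_CHECK, MAX_CYCLES + 1, step))


def guaranteed_conc(load_volume_ul: float) -> float:
    """Lower bound (ng/ul) on product concentration implied by a visible band
    when load_volume_ul of the reaction was loaded."""
    return VISIBLE_BAND_NG / load_volume_ul


def stop_cycle(checks: Iterable[Tuple[int, bool]],
               load_volume_ul: float = 4.0) -> Optional[int]:
    """First checked cycle at which a visible band proves >= 0.4 ng/ul, else None.

    checks: (cycle_number, band_visible) pairs in the order they were run.
    load_volume_ul: aliquot volume per check; 4 ul is an ASSUMPTION (1.6 / 0.4).
    """
    # Small tolerance guards against floating-point rounding in the division.
    if guaranteed_conc(load_volume_ul) + 1e-9 < MIN_CONC_NG_PER_UL:
        return None  # a visible band in this volume cannot prove 0.4 ng/ul
    for cycle, visible in checks:
        if cycle > MAX_CYCLES:
            break  # stop rather than accept an error-prone product
        if visible:
            return cycle
    return None


print(check_schedule())                       # [20, 23, 26, 29]
print(stop_cycle([(20, False), (23, True)]))  # 23
```

Loading a larger aliquot makes a faint band less informative: with a 5 μl load a visible band only guarantees 0.32 ng/μl, so the function declines to call amplification sufficient.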
In addition to the desired ~1.25 kb product, PCR amplification may yield primer dimers and truncated ~1.0 kb products resulting from aberrant internal priming by the 5′ oligo. Gel electrophoresis is therefore required to isolate the full-length MHC class I amplicon. We use SYBR® Safe DNA gel stain (Invitrogen) to visualize products, but ethidium bromide can also be used. We have found that the Qiagen MinElute® Gel Extraction Kit yields the most concentrated purified PCR amplicon, but other gel purification kits can be substituted. The MinElute® Gel Extraction Kit includes both spin and vacuum protocols.
Accurate quantitation of the purified DNA products is essential when pooling PCR products from multiple animals. If products are not pooled in equal amounts, clones from the most concentrated sample may be overrepresented, and some animals in the pool may not be adequately genotyped. We use the Qubit™ fluorometer (Invitrogen) to quantitate our samples, but other approaches may also be used.
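Given fluorometer readings, the volume of each purified amplicon to add to a pool is the target mass divided by its concentration. The sketch below assumes that pooling equal masses is close enough to equimolar, since all amplicons are ~1.25 kb and the MID tag adds only 10 nt. The target mass, sample names and readings are hypothetical.

```python
def pooling_volumes(conc_ng_per_ul: dict, ng_each: float) -> dict:
    """Volume (ul) of each purified amplicon that contributes ng_each ng to the pool."""
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        volumes[sample] = ng_each / conc
    return volumes


def max_ng_each(conc_ng_per_ul: dict, available_ul: dict) -> float:
    """Largest equal mass every sample can contribute, limited by the sample
    with the least total DNA (concentration x remaining volume)."""
    return min(conc_ng_per_ul[s] * available_ul[s] for s in conc_ng_per_ul)


# Hypothetical Qubit readings (ng/ul) for four MID-tagged class I A amplicons.
qubit = {"animal1": 5.0, "animal2": 2.5, "animal3": 10.0, "animal4": 4.0}
print(pooling_volumes(qubit, ng_each=20.0))
# {'animal1': 4.0, 'animal2': 8.0, 'animal3': 2.0, 'animal4': 5.0}
```

The least concentrated sample sets the ceiling: `max_ng_each` shows how much of every sample can be pooled before that sample runs out.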
As stated previously, molecularly barcoding the PCR amplicons can greatly reduce the required number of ligation and transformation reactions. However, because a routine transformation typically yields several hundred clones, the number of animals that can be pooled while still achieving sufficient coverage for genotyping is limited. The MHC class I A region contains at most 6 alleles per animal, and at least half of these are comparatively rare (‘minor’) transcripts, so a depth of coverage of 48 class I A clones per animal is adequate to detect the abundant (‘major’) transcripts. The macaque class I B region is considerably more complex, containing up to 24 expressed loci per animal, and therefore requires more clones to detect the major B locus alleles satisfactorily. For convenience, we use a depth of coverage of 48 class I A and 144 class I B clones per animal for routine genotyping (192 clones in total, in two 96-well deep-well culture plates). As seen in Fig. 2, similar coverage can also be achieved by strategically pooling MID-barcoded class I A and I B products from multiple animals, reducing the number of ligation/transformation reactions for both regions. For four animals, this pooling strategy reduces the number of ligations/transformations required for both loci from eight to three, significantly decreasing overall cloning cost. Depth of coverage can be varied by picking more or fewer clones per animal, depending on genotyping needs.
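The trade-off between clone depth and sensitivity can be estimated with a binomial model. This is an assumption layered on top of the text: it treats each picked clone as an independent draw in proportion to transcript abundance and ignores PCR bias and pooling imbalance. Under that model, the sketch gives the probability of seeing a transcript at least once (or at least `min_clones` times) and the depth needed to reach a target probability.

```python
from math import comb


def p_detect(freq: float, n_clones: int, min_clones: int = 1) -> float:
    """Probability of picking >= min_clones clones of a transcript that makes up
    a fraction freq of the clone population when n_clones clones are sequenced.
    ASSUMPTION: clones are independent draws proportional to abundance (binomial)."""
    p_fewer = sum(comb(n_clones, i) * freq**i * (1 - freq) ** (n_clones - i)
                  for i in range(min_clones))
    return 1.0 - p_fewer


def clones_needed(freq: float, target: float, min_clones: int = 1,
                  limit: int = 10_000) -> int:
    """Smallest clone count giving p_detect >= target (simple linear search)."""
    for n in range(min_clones, limit + 1):
        if p_detect(freq, n, min_clones) >= target:
            return n
    raise ValueError("target not reachable within limit")


# A transcript making up 5% of clones, sampled at the routine class I A depth.
print(round(p_detect(0.05, 48), 3))
print(clones_needed(0.05, 0.95))  # 59
```

Raising `min_clones` to 2 or more models a stricter call that requires independent confirmation of each allele, which pushes the required depth up accordingly.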
We use a four-station vacuum manifold (5 PRIME) to miniprep up to four 96-well plasmid plates simultaneously, but other plasmid purification protocols, such as magnetic bead-based methods, may also be used. The pCR®-BluntII-TOPO® vector contains EcoRI cleavage sites flanking the PCR product insertion site, so an EcoRI restriction digest of a subset of samples can easily confirm cloning efficiency before proceeding to sequencing.
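An insert with no internal EcoRI site should release a single ~1.25 kb band, while an internal site splits it into two. The text does not say whether any class I alleles carry internal EcoRI sites, so the sketch below is a way to check a reference sequence before interpreting the digest. It assumes the vector's flanking sites cut at both insert ends and ignores the few vector bases between each site and the insert. The function names are hypothetical.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition site; cleaves between G and AATTC


def ecori_cuts(seq: str) -> list:
    """0-based top-strand cut positions of every EcoRI site in seq."""
    seq = seq.upper()
    cuts = []
    i = seq.find(ECORI_SITE)
    while i != -1:
        cuts.append(i + 1)  # the cut falls after the G
        i = seq.find(ECORI_SITE, i + 1)
    return cuts


def insert_fragments(insert: str) -> list:
    """Approximate fragment sizes released from a cloned insert by EcoRI,
    assuming the flanking vector sites cut exactly at both insert ends."""
    bounds = [0] + ecori_cuts(insert) + [len(insert)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


print(insert_fragments("A" * 1250))                      # [1250]
print(insert_fragments("A" * 10 + "GAATTC" + "A" * 10))  # [11, 15]
```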
MHC class I alleles exhibit polymorphism throughout the entire ~1.1 kb open reading frame. However, a short, highly conserved region lies just upstream of the peptide binding domains encoded by exons 2 and 3. We designed a sequencing primer, 5′Refstrand, in this conserved region and use it to perform single-pass sequencing of these highly polymorphic exons. With the ~500 bp of sequence generated from this single oligo, we can unambiguously resolve the majority of known macaque MHC class I alleles. Alleles that differ from each other only outside the sequenced region can still be unambiguously assigned to a unique allele lineage. For example, as seen in Fig. 3, Mamu-A1*0040101 and Mamu-A1*0040102 are identical through exons 2 and 3, so we cannot distinguish between these two alleles, but we can unambiguously assign that sequence to the Mamu-A1*00401 lineage. Because sufficient information for genotyping can be generated from a single sequencing reaction (rather than the five reactions required to span the entire open reading frame), our method dramatically reduces sequencing costs. Previously unidentified sequences detected by single-pass sequencing can be selectively sequenced with additional primers to characterize the entire open reading frame.
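The lineage rule in the Mamu-A1*00401 example can be expressed as: collect every reference allele identical to the read over exons 2 and 3, then report either the single allele or the name fields they share. The sketch below assumes the digits after `*` split as 3 + 2 + 2 (which reproduces the example in the text) and uses exact string identity over a pre-trimmed region. Real reads would first need trimming or alignment to the exon 2 to 3 window. Function names and the toy reference sequences are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# ASSUMPTION: digits after '*' split as 3 + 2 + 2, matching the example
# Mamu-A1*0040101 -> lineage Mamu-A1*00401 given in the text.
FIELD_WIDTHS = (3, 2, 2)


def identical_matches(read_region: str, references: Dict[str, str]) -> List[str]:
    """Names of alleles whose exon 2-3 reference is identical to the read region."""
    region = read_region.upper()
    return sorted(name for name, ref in references.items() if ref.upper() == region)


def split_name(name: str, widths: Sequence[int] = FIELD_WIDTHS):
    """Split 'Mamu-A1*0040101' into ('Mamu-A1', ['004', '01', '01'])."""
    gene, digits = name.split("*", 1)
    fields, pos = [], 0
    for width in widths:
        if pos >= len(digits):
            break
        fields.append(digits[pos:pos + width])
        pos += width
    return gene, fields


def call_allele(matches: List[str]) -> Optional[str]:
    """One match -> that allele; several -> their shared lineage; none -> None."""
    if not matches:
        return None  # no identical reference: candidate novel allele
    if len(matches) == 1:
        return matches[0]
    parsed = [split_name(m) for m in matches]
    genes = {gene for gene, _ in parsed}
    shared = []
    for group in zip(*(fields for _, fields in parsed)):
        if len(set(group)) != 1:
            break
        shared.append(group[0])
    if len(genes) != 1 or not shared:
        return "/".join(matches)  # no common lineage: report every match
    return f"{genes.pop()}*{''.join(shared)}"


# Toy 10-nt stand-ins for exon 2-3 sequences (hypothetical, not real alleles).
refs = {
    "Mamu-A1*0040101": "GCTCCCACTC",
    "Mamu-A1*0040102": "GCTCCCACTC",
    "Mamu-A1*0010101": "GCTCCCATTC",
}
print(call_allele(identical_matches("GCTCCCACTC", refs)))  # Mamu-A1*00401
```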
If the pooled ligation approach is used, a second sequencing reaction with the oligo SBT190-R is required to read back to the MID tags incorporated in the 5′MHC-UTR oligo, linking each sequence to the animal from which it was amplified. This second reaction is necessary because the 5′Refstrand read extends toward the 3′ end of the amplicon, away from the 5′ MID tags, and the 5′Refstrand primer lies ~1.1 kb from the 3′MHC-UTR primers; since a typical sequencing reaction produces only ~500 bp of high-quality sequence, a single 5′Refstrand read could not reach MID tags even if they were incorporated into the 3′ primers. SBT190-R is located ~350 bp from the 5′MHC-UTR primer. Even with this additional reaction (two reactions per clone instead of five), sequencing costs are still reduced by more than half.
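Linking an SBT190-R read to its animal amounts to finding the 10-nt MID immediately 5′ of the 5′MHC-UTR priming sequence. Because SBT190-R reads toward the 5′ end of the amplicon, the read is reverse complemented to sense orientation first. This sketch uses exact matching only; tolerating sequencing errors in the tag would need a mismatch-aware lookup. The animal mapping and function names are hypothetical.

```python
from typing import Dict, Optional

# Roche/454 MID tags from the primer table.
MID_TAGS = {
    "MID1": "ACGAGTGCGT", "MID2": "ACGCTCGACA", "MID3": "AGACGCACTC",
    "MID4": "AGCACTGTAG", "MID5": "ATCAGACACG", "MID6": "ATATCGCGAG",
    "MID7": "CGTGTCTCTA", "MID8": "CTCGCGTGTC", "MID9": "TAGTATCAGC",
    "MID10": "TCTCTATGCG", "MID11": "TGATACGTCT", "MID12": "TACTGAGCTA",
    "MID13": "CATAGTAGTG", "MID14": "CGAGAGATAC",
}
TAG_TO_MID = {tag: mid for mid, tag in MID_TAGS.items()}

FWD_5UTR = "AGAGTCTCCTCAGACGCCGAG"  # 5'MHC-UTR without its MID, 5'->3'
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N; case-insensitive)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def animal_for_read(sbt190r_read: str, mid_to_animal: Dict[str, str],
                    mid_len: int = 10) -> Optional[str]:
    """Assign an SBT190-R read to an animal via the MID directly 5' of 5'MHC-UTR."""
    sense = revcomp(sbt190r_read)  # SBT190-R reads the antisense strand
    idx = sense.find(FWD_5UTR)
    if idx < mid_len:  # primer not found (-1) or MID not fully covered by the read
        return None
    mid = TAG_TO_MID.get(sense[idx - mid_len:idx])
    return mid_to_animal.get(mid) if mid else None


# Simulated sense-strand fragment carrying MID3 (flanking bases are arbitrary).
sense = "TTGACC" + MID_TAGS["MID3"] + FWD_5UTR + "ATGGCGCCCCGAACC"
read = revcomp(sense)  # how an SBT190-R read of this region would appear
print(animal_for_read(read, {"MID3": "animal_C"}))  # animal_C
```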
We maintain an internal custom database of NHP MHC class I alleles against which we perform Basic Local Alignment Search Tool (BLAST) analyses. In the absence of a curated MHC-specific database, similar analyses can be performed through the nucleotide BLAST link on the National Center for Biotechnology Information (NCBI) website (http://blast.ncbi.nlm.nih.gov/Blast.cgi).
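One way to turn BLASTn results into a per-animal allele summary is to keep the best hit for each clone and tally exact matches. The sketch assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns). The 400 bp minimum alignment length used to call an exact match is an illustrative assumption based on the ~500 bp reads. Ties between alleles that are identical over exons 2 and 3 are not resolved here; the first equal-scoring hit is kept, so the lineage logic would still be needed. Function names and example rows are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable

# Column order of BLAST+ tabular output (-outfmt 6, default fields).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str]) -> Dict[str, dict]:
    """Highest-bitscore hit for each query (clone) sequence."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and -outfmt 7 comment lines
        hit = dict(zip(FIELDS, row))
        hit["bitscore"] = float(hit["bitscore"])
        q = hit["qseqid"]
        if q not in best or hit["bitscore"] > best[q]["bitscore"]:
            best[q] = hit
    return best


def genotype_summary(best: Dict[str, dict], min_length: int = 400) -> Counter:
    """Clones per allele; hits that are not gap-free exact matches over at least
    min_length bp are pooled as candidate novel sequences (threshold assumed)."""
    counts = Counter()
    for hit in best.values():
        exact = (int(hit["mismatch"]) == 0 and int(hit["gapopen"]) == 0
                 and int(hit["length"]) >= min_length)
        counts[hit["sseqid"] if exact else "candidate_novel"] += 1
    return counts


# Made-up example rows: clone001 matches exactly, clone002 has 5 mismatches.
rows = [
    "clone001\tMamu-A1*0040101\t100.000\t480\t0\t0\t1\t480\t120\t599\t0.0\t887",
    "clone001\tMamu-A1*0010101\t97.500\t480\t12\t0\t1\t480\t120\t599\t1e-200\t800",
    "clone002\tMamu-A1*0010101\t98.958\t480\t5\t0\t1\t480\t120\t599\t1e-220\t850",
]
print(genotype_summary(best_hits(rows)))
```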
High-resolution sequence-based genotyping of human HLA alleles is widely available, but comparable capabilities are currently lacking for the analogous MHC alleles of NHPs that serve as preclinical models for human infectious disease and transplantation research. Our method for sequence-based MHC class I genotyping begins to address this gap. To date, it has been applied to more than 120 samples from 3 NHP species (Macaca mulatta, M. fascicularis, and M. nemestrina), demonstrating that the same primers can effectively amplify MHC class I alleles in several macaque species and likely in other related NHP species. Typical results obtained by this method for a single Indian-origin rhesus macaque are shown in Fig. 3. Of the 192 cDNA plasmids cloned and sequenced for this animal, 170 (88.5%) yielded high-quality sequences that were then compared against known rhesus alleles using BLASTn. The efficiency of transforming pooled PCR products and the reduced number of required sequencing reactions make this a relatively rapid and cost-effective genotyping method. Additionally, many steps of this method, such as colony picking and plasmid minipreps, are amenable to automation. The method also simplifies allele discovery, since it generates full-length cDNAs and rapidly identifies novel clones for selective additional sequencing.
The general principles of this method could be applied to any highly polymorphic gene of interest, provided it contains diagnostic regions of polymorphism analogous to exons 2 and 3 of MHC class I alleles. For example, we have adapted this genotyping method for DRB and other MHC class II loci using analogous polymorphic domains. Finally, this method can also be adapted for next-generation sequencing platforms such as the Roche/454 Genome Sequencer FLX system (Wiseman et al., manuscript under review). Pooled, MID-barcoded amplicons of appropriate length (currently approximately 200 bp) can be amplified and pyrosequenced in massively parallel fashion for ultra-high-throughput genotyping.
We thank members of the O'Connor laboratory for assistance developing these protocols. This work was supported by NIAID contract number HHSN266200400088C/N01-A1-40088 and by grant number 1 R24 RR021745-01A1. This publication was made possible in part by grant numbers P51 RR000167 and P40 RR019995 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH) to the Wisconsin National Primate Research Center, University of Wisconsin-Madison. This research was conducted at a facility constructed with support from Research Facilities Improvement Program grant numbers RR15459-01 and RR020141-01.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.