Human cancer cell lines and xenografts are valuable samples for whole-genome sequencing of human cancer. Tumors can be maintained by serial xenografting in athymic (nude) or severe combined immunodeficient (SCID) mice. In the current study, we developed a molecular assay to quantify the relative contributions of human and mouse in mixed DNA samples. The assay was designed based on deletion/insertion variation between human and mouse genomes. The percentage of mouse DNA was calculated according to the relative peak heights of PCR products analyzed by capillary electrophoresis. Three markers from chromosomes 9 and 10 accurately predicted the mouse genome ratio and were combined into a multiplex PCR reaction. We used the assay to quantify the relative DNA amounts of 93 mouse xenografts used for a recently reported integrated genomic analysis of human pancreatic cancer. Of the 93 xenografts, the mean % of contaminating mouse DNA was 47%, ranging from 17% to 73%, with 43% of samples having greater than 50% mouse DNA. We then comprehensively compared the human and mouse genomes to identify 370 additional candidate gene loci demonstrating human-mouse length variation. With increasing whole genome sequencing of human cancers, this assay should be useful to monitor strategies to enrich human cancer cells from mixed human-mouse cell xenografts. Finally, we discuss how contaminating mouse DNA affects next generation DNA sequencing.
Genetic analysis of human cancers often involves analysis of xenografted tumors raised in immunodeficient mice (1–3). During expansion of human cancer xenografts in mice, human stromal cells are largely replaced by mouse stromal cells, resulting in mixed species samples (4). Since mouse cell contamination can complicate downstream genetic analysis, an accurate and sensitive estimate of the relative human and mouse composition of xenografted tumors is important. This is especially relevant to international efforts to sequence cancer genomes (3, 5). A mixture of human cancer cells and mouse stromal cells could theoretically be sequenced with high depth of coverage in order to discover mutations in the tumor genome even in the face of some level of mouse cell contamination. The assumptions inherent in this approach are many: 1) the human tumor sample is homogeneous, 2) the mouse genotype is completely known, 3) all mutations come from the human tumor sample, 4) any change in the human genome will clearly be a mismatch to human and not a perfect match to mouse, 5) sequencing coverage is fairly even across both genomes, and 6) the percent of mouse contamination is known. While the last point can be known, none of the other assumptions is exactly valid; thus it is not clear that this approach is entirely tenable.
Length determination assays could potentially provide the required accuracy and sensitivity. Resolving different length products using capillary electrophoresis and analyzing peak heights typically provides a limit of detection of 1–3% and an accuracy of 97–99% (6). An example is the PCR amplification of the X-Y homologous amelogenin alleles (AMELX, AMELY) to generate amplicons with a 6 bp difference between X and Y alleles (6–9). This simple PCR assay has been incorporated into commercial multiplex microsatellite identity testing kits for forensic, paternity and bone marrow transplant analyses.
In this report, we hypothesized that length variation in genes between mouse and human would be common. In a preliminary analysis, 12 loci with length variation were selected from human chromosomes 8, 9, 10 and 12. Surprisingly, only 3 of them (25%) resulted in accurate standard curves in mixed DNA samples. These 3 markers were multiplexed into a single PCR reaction to quantify the mouse component in the xenografts of human pancreatic cancer cells. This demonstrated that among 93 xenografts, the contaminating mouse DNA averaged 47%, and could be as high as 73%. In a comprehensive analysis of the two genomes, we further identified 370 loci demonstrating human-mouse length variation. This assay should also be useful in monitoring human cell enrichment, from mouse xenografts of different human cancers, in preparation for next generation sequencing.
Initially, 12 loci with length variation between the mouse and human genomes were randomly selected from chromosomes 8, 9, 10 and 12 according to the NCBI database. Chromosomes 9 and 12 were included because cytogenetic studies of primary pancreatic carcinomas and comparative genomic hybridization analyses of pancreatic cell lines showed the fewest chromosomal aberrations within these 2 chromosomes (10,11). The primers were designed to straddle the length variation and were placed in regions conserved between the human and mouse genomes (Figure 1A). The primers for the three loci that proved accurate for discriminating mouse from human DNA were as follows: the Ribonuclease P/MRP 38kDa subunit gene on chromosome 10p13 (primer pair 5, F: 5′-TCATTGGCTTAAAATGTGT-3′, R: 5′-FAM-TTTATTTTAAGGGGTTGTAATG-3′), and two loci downstream of ring finger and CCCH-type zinc finger domains 2 (RC3H2) on chromosome 9q34 (primer pair 43, F: 5′-CTATTCCTATAGCACAAAGG-3′, R: 5′-FAM-GATGGTGTACACCCATCATG-3′, and primer pair 45, F: 5′-HEX-ACTAAGTCAAGGCTACTGTG-3′, R: 5′-TTCTGGTGTCAGTATGGAAG-3′).
DNA samples were extracted using the QIAamp DNA Blood Mini Kit (Qiagen, Valencia, CA) and the concentration was determined using a NanoDrop ND-100 spectrophotometer (NanoDrop Technologies, Wilmington, DE). PCR reactions were performed in a 10 μl total volume containing 2 pmol of forward primer, 2 pmol of reverse primer, 2.5 mM MgCl2, 0.2 mM each deoxyribonucleotide, 0.25 units AmpliTaq Gold DNA polymerase and 1.0 μl buffer (Applied Biosystems, Foster City, CA). Samples containing 10 ng DNA were subjected to 35 cycles of denaturation (95°C, 30 sec), annealing (52°C, 30 sec) and extension (72°C, 60 sec). For multiplex PCR, the total amounts of forward primers and of reverse primers were 5 pmol each, and the AmpliTaq Gold DNA polymerase was increased to 0.5 units.
One μl of PCR product, 0.5 μl ROX size standard and 8 μl deionized formamide were mixed according to the manufacturer’s protocol (Applied Biosystems), heated at 95°C for 2 minutes and placed on ice for at least 1 minute before electrokinetic injection into the ABI 3100 Genetic Analyzer. The size in bases and the peak heights on the electropherogram were determined using GeneScan® analysis software (Applied Biosystems). The mouse component was calculated for each locus by dividing the peak height of the mouse amplicon by the sum of the peak heights of the human and mouse amplicons. Samples with off-scale peak heights were diluted and reinjected into the ABI 3100 Genetic Analyzer.
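The per-locus calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study; the function name and the example peak heights are hypothetical.

```python
def percent_mouse(mouse_peak: float, human_peak: float) -> float:
    """Return the mouse component (%) at one locus from two peak heights.

    Implements: mouse peak height / (mouse + human peak heights).
    """
    total = mouse_peak + human_peak
    if total <= 0:
        raise ValueError("no signal: the peak heights sum to zero")
    return 100.0 * mouse_peak / total


# Hypothetical peak heights (relative fluorescence units), not study data.
print(percent_mouse(mouse_peak=500.0, human_peak=1500.0))  # 25.0
```

Off-scale peaks would have to be excluded before this step, since a saturated peak height understates the true signal.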
DNAs extracted from 4 EBV-infected human lymphoblastoid cell lines were used as controls for human DNA. DNAs extracted from spleens of 2 nude mice (nu/nu athymic mice, Charles River, Wilmington, MA) were used as controls for mouse DNA. The 2 mouse DNA samples were mixed with the 4 human DNA samples at 98%, 80%, 50%, 20%, 5% and 2% by weight, resulting in 8 samples for each standard percentage point. Linear regression was performed using the least-squares method.
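A least-squares standard curve of the kind described here could be fitted as follows. This is a sketch under stated assumptions: the helper name `least_squares` is hypothetical, and the observed values are invented for illustration (each is the expected value plus 1) and are not results from the study.

```python
def least_squares(x, y):
    """Fit y = slope * x + intercept by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


# Expected % mouse at each standard point, and invented "observed" values.
expected = [2.0, 5.0, 20.0, 50.0, 80.0, 98.0]
observed = [3.0, 6.0, 21.0, 51.0, 81.0, 99.0]
slope, intercept, r_squared = least_squares(expected, observed)
print(round(slope, 6), round(intercept, 6), round(r_squared, 6))
```

A slope near 1.0 with a high R² indicates that a locus reports the mixture proportionally across the whole range, not only at one point.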
Fresh pancreatic cancer tissues were initially implanted into nude mice and subsequently explanted for DNA extraction or in vitro culture to establish human pancreatic cell lines as described previously (12). Briefly, the explanted xenografts were finely minced and digested with collagenase (Sigma-Aldrich, St. Louis, MO) before incubation in a rat tail collagen-coated T25 flask containing 5 ml of Minimum Essential Medium or Dulbecco’s Modified Eagle Medium (Invitrogen, Grand Island, NY) supplemented with 20% fetal bovine serum (Invitrogen) and 0.1 ng/ml human recombinant epidermal growth factor (EGF, Invitrogen). The human pancreatic cancer cells in the tissue culture flasks were enriched by repeated “selective” trypsinization using 5 ml 0.25% trypsin (Invitrogen) for 1–2 minutes.
All mouse and human Consensus CDS (CCDS) records from the May 1, 2008 release were downloaded (20091 human and 17704 mouse sequences). The Reciprocal Best Hits method (PubMed ID 9381173), while probably not the most sensitive method, is conservative and was implemented in a Perl script to identify orthologs between the two species. An in-house BLAST (PubMed ID 9254694) version 2.2.15 was used to create the alignments and identified 13566 candidate orthologs. All results were parsed (using custom Perl scripts) to find orthologs with length differences resulting from an insertion/deletion variation that was flanked by two 100% conserved sequences that were sufficiently long to potentially serve as primer binding sites. Both potential primer sites had to be in the same exon as the insertion/deletion (this was verified using the accompanying CCDS annotation files) and roughly 100–300 nucleotides apart. Finally, bl2seq (blast2sequences, version 2.2.15) was used to carefully re-align and verify the primer amplicons. For the alignments, BLAST was run with the filter at hash option, with gap opening and extension penalties set to favor gaps, and with lenient mismatch penalties to favor long alignments. Perl scripts are available upon request.
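The indel-filtering step can be illustrated with a simplified Python sketch. It is not the authors' Perl code: it assumes the two orthologs are already supplied as a gapped pairwise alignment (equal-length strings, `-` for gaps), uses a hypothetical primer-site length `k`, and ignores the same-exon check described above. All function names are illustrative.

```python
def conserved_window_starts(a, b, k):
    """Start columns of length-k windows that are identical and ungapped."""
    starts, run = [], 0
    for i, (x, y) in enumerate(zip(a, b)):
        run = run + 1 if (x == y and x != "-") else 0
        if run >= k:
            starts.append(i - k + 1)
    return starts


def gap_runs(a, b):
    """Half-open column ranges where exactly one sequence has a gap."""
    runs, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        gapped = (x == "-") != (y == "-")
        if gapped and start is None:
            start = i
        elif not gapped and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(a)))
    return runs


def ungapped_len(seq, lo, hi):
    """Number of real bases in alignment columns lo..hi (half-open)."""
    return hi - lo - seq.count("-", lo, hi)


def find_indel_amplicons(a, b, k=18, min_len=100, max_len=300):
    """For each indel, report the tightest pair of conserved flanking windows
    giving amplicons of different length, both within [min_len, max_len].

    Returns tuples (left_start, right_end, length_in_a, length_in_b).
    """
    starts = conserved_window_starts(a, b, k)
    hits = []
    for g_start, g_end in gap_runs(a, b):
        left = [s for s in starts if s + k <= g_start]
        right_ends = [s + k for s in starts if s >= g_end]
        found = None
        for lo in reversed(left):            # nearest left window first
            for hi in right_ends:            # nearest right window first
                len_a = ungapped_len(a, lo, hi)
                len_b = ungapped_len(b, lo, hi)
                if max(len_a, len_b) > max_len:
                    break                    # further right is only longer
                if min(len_a, len_b) >= min_len and len_a != len_b:
                    found = (lo, hi, len_a, len_b)
                    break
            if found:
                break
        if found:
            hits.append(found)
    return hits


# Toy alignment: 60 conserved columns, a 6-base indel, 60 conserved columns.
flank = "ACGT" * 15
seq_a = flank + "TTTAAA" + flank
seq_b = flank + "------" + flank
print(find_indel_amplicons(seq_a, seq_b))  # [(20, 126, 106, 100)]
```

In this toy case the reported amplicon is 106 bases in one sequence and 100 in the other, the kind of small, resolvable difference the assay relies on.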
For chromosomes 8, 9, 10 and 12, we selected 12 loci with species-specific length variation, and designed primers to bind to conserved regions straddling the difference (Figure 1A). Eight mixed DNA samples with 20% mouse DNA were initially used to test the designed primers. We confirmed appropriately sized amplicons for primer pair 5 (271 bases human, 277 bases mouse), primer pair 43 (206 bases mouse, 211 bases human), and primer pair 45 (93 bases mouse, 96 bases human) (Figure 1B). The relative ratio of the peak heights detected by capillary electrophoresis was used to calculate the ratio of genomes between mouse and human. Surprisingly, only these 3 primer pairs, one from chromosome 10 and 2 from chromosome 9, demonstrated peak height ratios close to 23% (the amount of mouse DNA per cell is only about 85% that of human DNA per cell, so a 20% mixture by DNA amount converts to 22.9% mouse when considered as relative genomes). The values calculated using primer pairs 5, 43 and 45 were 24%, 21% and 22%, respectively. The other primer pairs showed preferential amplification of the mouse genome (more than 28% mouse DNA) or the human genome (less than 18% mouse DNA). For example, the % mouse DNA was 15% ± 0.9% (mean ± SD) using primer pair 56. We hypothesized that a SNP might be affecting primer binding and therefore biasing amplification; however, the results were similar (15% ± 1.3%) when a set of external primers (primer pair 57) was used. Further, direct sequencing of the PCR products amplified by the external primer pair confirmed that both internal primer binding sites matched completely (data not shown), indicating that preferential amplification of the human genome by this set of primers was not caused by a mismatch of a primer with mouse DNA. The explanation for the bias in amplification at these loci remains elusive.
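The conversion from a mixture by DNA amount to a mixture by relative genomes can be written out explicitly. The sketch below is illustrative only: the function name is hypothetical, and the default ratio of 0.85 is the rounded "about 85%" figure from the text. With that rounded value, a 20% mixture converts to roughly 22.7%, close to the 22.9% quoted above; the small difference presumably reflects a less rounded ratio in the original calculation.

```python
def genome_fraction_from_mass(mouse_mass_fraction, mouse_to_human_dna_per_cell=0.85):
    """Convert a mouse fraction by DNA amount into a fraction of genomes.

    mouse_to_human_dna_per_cell is the mouse:human ratio of DNA per cell;
    0.85 is the rounded value from the text (an approximation).
    """
    if not 0.0 <= mouse_mass_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Less DNA per mouse cell means more mouse genomes per unit of DNA.
    mouse_genomes = mouse_mass_fraction / mouse_to_human_dna_per_cell
    human_genomes = 1.0 - mouse_mass_fraction
    return mouse_genomes / (mouse_genomes + human_genomes)


print(round(100 * genome_fraction_from_mass(0.20), 1))  # 22.7
```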
The three primer pairs were subsequently tested on mixed DNA samples containing 98%, 80%, 50%, 20%, 5% and 2% mouse DNA, and accurately predicted the mouse genome ratio, with slopes close to 1.0 and R² greater than 0.99 for each primer set (Figure 2). The primer pairs gave similar results when tested on samples mixed from DNA of 4 human LCL lines and DNA of 2 C57BL/6 mice (data not shown).
Gain or loss of chromosomes in cancer cells may lead to potential errors in calculating the degree of mixed chimerism following hematopoietic cell transplantation (13,14). At least two loci, particularly those located within normal chromosomes, are recommended. One might especially focus on loci where haploinsufficiency is not tolerated. In the current study, loci from chromosomes 9 and 12 were initially included since previous reports have shown fewer chromosomal aberrations in these two chromosomes in pancreatic cancers (10,11). However, our PCR assay, designed according to species-specific length variations, failed for several genes from chromosome 12 because of preferential amplification of the mouse or human genome. Direct sequencing of one locus confirmed the complete matching of both forward and reverse primers to the human and mouse genomes, suggesting that other factors, such as insertion/deletion variations or single nucleotide variations within the amplified segments of the mouse and human genomes, may affect the amplification efficiency of the mouse or human genome. Screening multiple loci may be needed in order to identify a few loci that are applicable for accurate quantification of the mouse genome in xenografts.
DNA samples were extracted from 93 xenografts (PaX designated samples). The % mouse DNA detected by the three sets of primers was similar for the vast majority of xenografts (Supplemental Table 1). For the 93 xenografts, we graphed the % mouse as a histogram (Figure 3A), where the mean % mouse contamination was 47% (range 17–73%). Of these samples, 43% had greater than 50% mouse DNA. The 3 pairs of primers were then combined into a single-tube multiplex PCR with molar ratios of 2:8:1 for primer pairs 5, 43 and 45. The results were consistent with those detected by individual PCR (data not shown).
Molecular testing was used to corroborate phase microscopic observations. The 3 sets of primers were used to monitor the presence of mouse cells during the process of establishing human pancreatic cancer cell lines from explanted mouse xenografts, Panc185 and JH024 (Figure 3B). With Panc185, quantification of mouse DNA was used to select flask A for continuous passage (rather than B or C) and to confirm subsequent cell-line passages (A-P2, A-P3). The absence of human DNA was also useful to document the failure to establish the JH024 pancreatic cancer cell-line.
The assay can also be used to monitor enrichment of human tumor cells from xenografts (15). We anticipate that these markers will be useful to analyze xenografts from breast and colon cancers; however, 9q is commonly lost in non-small cell lung cancer (16–18).
Since gain or loss of chromosomes may occur in pancreatic cancers (9), discrepancies are expected when the % mouse DNA measured using primer pair 5, located on chromosome 10, is compared with that measured using primer pairs 43 and 45, located on chromosome 9 (Supplemental Figures 1A and 1B). The results measured by primer pairs 43 and 45, located within the same chromosome, were however highly consistent (Supplemental Figure 1C). To distinguish between a systematic error of the assay and biologic variability in chromosome numbers due to the genomic instability inherent in cancer, we developed an alternative assay. We identified areas of the genome, adjacent to these chromosome 9 and 10 loci, that contained conserved regions in which to place PCR primers and nucleotide differences with which to assess the relative percentage by Sanger sequencing.
We calculated the % mouse DNA using direct sequencing based on the peak height at 3–6 base positions with nucleotide variation between mouse and human genomes. We identified primer pair 68 near the location of primer 5 and primer pair 71 near the location of primer 45, which accurately report the % mouse in mixed DNA samples (Supplemental Figure 2). Among the 10 xenografts with more than 15% discrepancy between primer pairs 5 and 45 (Supplemental Table 1), we measured the % mouse DNA by direct sequencing. These data demonstrated concordance of the % mouse between the original length variation method and the Sanger method using primers from the same chromosome (Supplemental Figures 3 and 4). We therefore conclude that the original discrepancies identified were due to the inherent biologic variation in cancers from genomic instability resulting in gains and losses of chromosomes.
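Selecting discordant xenografts for follow-up could be sketched as below. This is an illustration only: it assumes "more than 15% discrepancy" means an absolute difference of more than 15 percentage points between primer pairs 5 and 45, and the sample names and values are invented, not taken from Supplemental Table 1.

```python
def discordant_samples(results, threshold=15.0):
    """Return names of samples whose two % mouse estimates disagree.

    results maps a sample name to (pct_pair5, pct_pair45).
    A sample is flagged when the absolute difference exceeds threshold.
    """
    return sorted(
        name
        for name, (pct_pair5, pct_pair45) in results.items()
        if abs(pct_pair5 - pct_pair45) > threshold
    )


# Invented values for illustration only.
example = {
    "sample_1": (40.0, 42.0),
    "sample_2": (30.0, 55.0),
    "sample_3": (60.0, 45.0),
}
print(discordant_samples(example))  # ['sample_2']
```

Note that `sample_3` differs by exactly 15 points and is not flagged, because the comparison is strict.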
We then comprehensively identified all of the length variants between mouse and human. We analyzed exons only as they tend to be more highly conserved between species and would be least likely to be affected by polymorphisms among mouse strains. To do this, we compared the CCDS genomes using the reciprocal best hits method, in which a human gene is aligned to all mouse genes and a mouse gene is aligned to all human genes, and if human gene ‘A’ is at the top of the list of hits for mouse gene ‘a’, and mouse gene ‘a’ tops the list of mouse hits for human gene ‘A’, the two genes ‘A’ and ‘a’ are said to be orthologs.
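The reciprocal best hits rule described above can be expressed compactly. The following Python sketch is not the Perl implementation used in the study; it assumes alignment scores have already been computed (for example by BLAST) and are supplied as dictionaries, and the gene names and scores are invented.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores maps (query, subject) to an alignment score.
    On ties, the first pair encountered is kept.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(human_vs_mouse, mouse_vs_human):
    """Return (human, mouse) pairs that are each other's best hit."""
    h2m = best_hits(human_vs_mouse)
    m2h = best_hits(mouse_vs_human)
    return sorted((h, m) for h, m in h2m.items() if m2h.get(m) == h)


# Invented scores: human genes A, B, C against mouse genes a, b.
human_vs_mouse = {("A", "a"): 200, ("A", "b"): 50, ("B", "b"): 90, ("C", "a"): 120}
mouse_vs_human = {("a", "A"): 190, ("a", "C"): 110, ("b", "A"): 60, ("b", "B"): 40}
print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))  # [('A', 'a')]
```

Here human gene B's best hit is mouse gene b, but b's best hit is A, so B and b are not called orthologs. This is what makes the method conservative.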
The set of 13,566 ortholog pairs identified in this way was examined further. In 8,842 pairs, a length difference was identified between the mouse and human coding regions. In 7,329 of those pairs, the length difference was due to a discrete indel, and in 370 of those cases, we found pairs of flanking sequences appropriate for PCR primer design (Supplemental Table 2). These flanking sequences are identical in the human and mouse genomes and span the species-specific length variation in an amplicon of 100–300 nucleotides, so they could be used to design PCR primers that differentiate mouse and human sequences. The number of loci containing candidate indels ranges from 3 loci within chromosome 21 to 41 loci within chromosome 1, reflecting the relative sizes of the chromosomes as well as the variation in conservation of the orthologs. It must be noted that these 370 loci are only “potential” loci; before they are used for this purpose, their binding and amplification properties must be confirmed because, as outlined above, only 25% of the loci we initially analyzed accurately reported DNA mixes.
To gauge whether our analysis could be generalized beyond the reference mouse genome, we used SNP genotyping data for 15 mouse strains (available from Perlegen/NIEHS (19)). Only four of our suggested primer binding sites, in three genes (marked with asterisks in Supplemental Table 2), span known strain polymorphisms in the mouse genome.
The assay reported also has implications for next generation sequencing of human cancers from xenografts. Ideal sequencing coverage is approximately 26x for pure human cancer cell lines (20). Regions that are identical or that are completely different between the species do not pose a problem for mixture sequencing, as any read that does not align perfectly to either genome can be presumed to be a human mutation. Sequences that are very similar between the two genomes may be impossible to deconvolute, however. From alignments of the mouse-human syntenic regions (21), 16% of the regions that are functionally similar in the two species have a nucleotide identity of 90% or above; this covers nearly 13% of the human genome. A variation in a human sequence can masquerade as a mouse sequence if the novel human variation now matches a mouse sequence perfectly. Simulations suggest that as many as 1% of human reads containing single base changes will now match the mouse genome perfectly and therefore be undetectable in a mixture, regardless of the level of sequencing coverage achieved (Harris Jaffee, personal communication).
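As a rough, back-of-envelope illustration of why the degree of contamination matters for sequencing depth, the sketch below estimates the total coverage needed to reach a target human coverage. It rests on a strong simplifying assumption that is not made in the text: that the fraction of reads derived from mouse equals the measured % mouse. The function name is hypothetical.

```python
def total_coverage_needed(target_human_coverage, mouse_read_fraction):
    """Total sequencing depth needed so the human share reaches the target.

    Assumes reads split between species in proportion to
    mouse_read_fraction (a simplification).
    """
    if not 0.0 <= mouse_read_fraction < 1.0:
        raise ValueError("mouse_read_fraction must be in [0, 1)")
    return target_human_coverage / (1.0 - mouse_read_fraction)


# 26x target with the mean contamination reported for the 93 xenografts (47%).
print(round(total_coverage_needed(26, 0.47), 1))  # 49.1
```

Under this assumption, the average xenograft would need nearly twice the sequencing effort of a pure human cell line, before accounting for reads that cannot be assigned to either species.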
In the current study, we have developed a more genome-wide approach for selecting loci with length variation between the human and mouse genomes (Supplemental Table 2). Successful identification of robust loci from each chromosome will facilitate quantification of the mouse genome in xenografts of human pancreatic cancer and other solid tumors, since a specific chromosome or group of chromosomes is preferentially affected in different human cancers.
In summary, we have developed a PCR assay for rapid and robust quantification of mouse genome in xenografts of human pancreatic cancers. The results facilitate the genome-wide analysis of human pancreatic cancers and the establishment of human pancreatic cancer cell lines in vitro. The assay also can be applied to xenografts of other human cancers, and other situations where 2 species are admixed.
We thank Drs. Bert Vogelstein, Anirban Maitra and Scott Kern for helpful discussions. Supported by Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins University School of Medicine, and R01-CA130938 (JRE).
Statement of competing interests: The authors declare no competing interests.