Although bulk high-throughput genomic profiling studies have led to a significant increase in the understanding of cancer biology, there is increasing awareness that bulk profiling approaches do not completely elucidate tumor heterogeneity. Single-cell genomic profiling enables the distinction of tumor heterogeneity, and may improve clinical diagnosis through the identification and characterization of putative subclonal populations. In the present study, the challenges associated with a single-cell genomics profiling workflow for clinical diagnostics were investigated. Single-cell RNA-sequencing (RNA-seq) was performed on 20 cells from an acute myeloid leukemia bone marrow sample. Putative blasts were identified based on their gene expression profiles and principal component analysis was performed to identify outlier cells. Variant calling was performed on the single-cell RNA-seq data. The present pilot study demonstrates a proof of concept for clinical single-cell genomic profiling. The recognized limitations include significant stochastic RNA loss and the relatively low throughput of the current proposed platform. Although the results of the present study are promising, further technological advances and protocol optimization are necessary for single-cell genomic profiling to be clinically viable.
Bulk high-throughput genomic profiling studies have improved the understanding of cancer biology and facilitated the development of novel therapeutics. However, there is increasing awareness that bulk profiling approaches do not adequately produce information concerning tumor heterogeneity, an improved insight into which may facilitate the development of more effective therapeutic strategies (1).
Genomic profiling of individual cells is now technically feasible, and recent reports of the highly parallel expression profiling of thousands of cells suggest that single-cell genomic profiling for clinical applications may become a reality (2,3). Notably, single-cell profiling using flow cytometry for immunophenotyping is already a routine hematological diagnostic assay (4). Single-cell genomic profiling may therefore have clinical utility in the diagnostic work-up of a hematological malignancy such as acute myeloid leukemia (AML).
AML is a malignant disease of abnormally differentiated cells of the hematopoietic system (5). It is a clonally complex disease that is characterized by the presence of multiple clonal populations in the primary cancer, any of which may evolve to result in relapse (6).
Single-cell genomic profiling enables tumor heterogeneity to be resolved, and may improve clinical diagnostics through the identification of putative subclonal populations and their respective drug sensitivity profiles (Fig. 1). In an attempt to develop a clinically relevant single-cell genomic profiling protocol, a pilot study of single-cell RNA-sequencing (RNA-seq) of an AML sample was performed.
An AML bone marrow sample, which was harvested in February 2011 from a 35-year-old female patient, was obtained from the archives of the Department of Hematology-Oncology (National University Hospital, Singapore). Ethical approval was obtained for the present study (Domain Specific Review Boards; National Healthcare Group, Singapore; ref. 2016/00547). Informed consent was obtained from the subject participating in the present study.
A total of 250,000 events were acquired for multiparametric analysis using a lyse-wash method on bone marrow cells (7). Blasts were identified using a cluster of differentiation (CD) 45/CD34/CD117/human leukocyte antigen-antigen D related (HLA-DR) combination (7).
A total of 5×10⁶ cells were incubated with anti-CD45 antibody (Miltenyi Biotec, Inc., Cambridge, MA, USA; cat. no. 130-080-201; clone, 5B1) for 1 h at 4°C to stain white blood cells, while Hoechst 33342 (Thermo Fisher Scientific, Inc., Waltham, MA, USA; cat. no. H3570) was used to stain nuclei by adding the staining solution to the cells for 1 h at 4°C. Cells were loaded at the optimal concentration (250,000 cells/ml, as recommended by the manufacturer) into the microfluidics chip.
Single cells were isolated into individual chambers using an integrated fluidic circuit (IFC) on the Automated Microfluidic C1 system (Fluidigm Corporation, San Francisco, CA, USA). Cells positive for CD45 and Hoechst were lysed, and RNA isolation and complementary DNA (cDNA) synthesis were performed using the SMART-Seq® v4 Ultra® Low Input RNA kit for Sequencing (Clontech Laboratories, Inc., Mountain View, CA, USA; cat. no. 634888). The cDNA was preamplified using a unique SMARTer II A oligonucleotide and template switch primer (both reagents being present in the SMARTer Ultra Low RNA kit; Clontech Laboratories, Inc.; cat. no. 634833), according to the manufacturer's protocol. The cDNA was harvested manually by retrieving 3.5 µl of cDNA from the wells of the IFC for library preparation. Notably, only RNA strands with polyadenylated [poly(A)] tails were converted to cDNA and used for downstream processing.
Using the coordinates from the imaging, 20 cells that stained positive for the leukocyte marker CD45 and had intact nuclei, as observed using the Hoechst stain, were selected. Library preparation was performed using the Nextera XT DNA Sample Preparation kit (Illumina, Inc., San Diego, CA, USA; cat. no. FC-131-1096), according to the manufacturer's protocol. The 20 libraries were processed individually, with each library being assigned a unique barcode for pooled multiplex sequencing using the Illumina HiSeq 2000 platform (Illumina, Inc.), according to the manufacturer's protocol. Paired-end 100 bp reads were generated for analysis.
Paired-end FASTQ files were initially mapped to the reference human Hg19 transcriptome (ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz) (8) using Tophat2 (version 2.1.0; Johns Hopkins University, Baltimore, MD, USA) (9). Aligned reads (BAM files) were subsequently sorted and indexed using SAMtools (version 1.2; Wellcome Trust Sanger Institute, Cambridge, UK) (10). Cufflinks (version 2.2.1; University of Washington, Seattle, WA, USA) (11) was utilized for final transcriptome assembly (cufflinks and cuffmerge functions), and for abundance estimation and normalization in Fragments Per Kilobase of transcript per Million mapped reads (FPKM) units (cuffnorm function).
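As a rough illustration of the FPKM unit mentioned above, the following minimal Python sketch computes the textbook value from a fragment count, a transcript length and a library size. The function name and the example numbers are illustrative assumptions; cuffnorm applies its own library-size normalization across samples, so its output is not expected to match this simple formula exactly.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments.

    This is the plain definition only; it is NOT a reimplementation of cuffnorm.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 10 million mapped fragments.
example_value = fpkm(500, 2000, 10_000_000)
print(example_value)
```

Dividing by both transcript length and library size is what makes values comparable between transcripts of different lengths and between cells sequenced to different depths.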
A heatmap was generated for the key cell-type specific markers (typically used in immunophenotyping using flow cytometry, including CD34 and CD45) based on their expression levels. For any gene, the presence of a transcript with an FPKM normalized expression value >0 is indicative of gene expression, while an FPKM normalized expression value of 0 indicates absence of expression. The presence and absence of the cell-type specific markers were plotted in a heatmap generated using the ‘pheatmap’ package (version 1.0.8) in the R Programming Environment (www.r-project.org).
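The presence/absence rule described above (FPKM >0 means expressed, 0 means absent) can be sketched in a few lines of Python. The cell identifiers, FPKM values and the function name below are hypothetical; PTPRC is the gene that encodes CD45. Treating a marker that is missing from a cell's table as absent is an assumption of this sketch.

```python
def presence_absence(fpkm_by_cell, markers):
    """Convert per-cell FPKM values into 1 (expressed) / 0 (absent) calls.

    fpkm_by_cell: {cell_id: {gene_symbol: fpkm_value}}
    markers: list of gene symbols to report, in heatmap column order.
    A marker missing from a cell's table is treated as absent (assumption).
    """
    return {
        cell: {gene: int(values.get(gene, 0.0) > 0) for gene in markers}
        for cell, values in fpkm_by_cell.items()
    }


# Hypothetical input: two cells, two markers.
example_fpkm = {
    "cell_A": {"CD34": 12.5, "PTPRC": 0.0},
    "cell_B": {"PTPRC": 3.1},
}
example_calls = presence_absence(example_fpkm, ["CD34", "PTPRC"])
for cell, row in example_calls.items():
    print(cell, row)
```

The resulting 0/1 matrix is the kind of input that a heatmap function would then render, one row per cell and one column per marker.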
Based on the gene expression profiles, cells that were CD34-positive, or HLA-DRA- and CD117-positive, were classified as ‘putative blasts’ (12).
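The classification rule above can be written as a small predicate. This is a minimal sketch, assuming that "positive" means an FPKM value >0 and that CD117 is read out through its gene, KIT; the function name and the example values are hypothetical.

```python
def is_putative_blast(fpkm_values):
    """Apply the rule: CD34-positive, OR (HLA-DRA-positive AND CD117-positive).

    fpkm_values: {gene_symbol: fpkm_value} for one cell.
    CD117 is encoded by KIT, so the KIT transcript is used here (assumption
    about which gene symbol appears in the expression table).
    """
    def expressed(gene):
        # Missing genes are treated as not expressed (assumption).
        return fpkm_values.get(gene, 0.0) > 0

    return expressed("CD34") or (expressed("HLA-DRA") and expressed("KIT"))


# Hypothetical cells.
print(is_putative_blast({"CD34": 4.2}))                  # CD34-positive
print(is_putative_blast({"HLA-DRA": 8.0, "KIT": 1.5}))   # HLA-DRA and CD117
print(is_putative_blast({"HLA-DRA": 8.0, "KIT": 0.0}))   # HLA-DRA only
```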
Principal component analysis was carried out on the log2-transformed FPKM normalized expression values of all transcripts using the prcomp function of the R Programming Environment.
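To make the outlier search concrete, the sketch below log2-transforms a small FPKM matrix and projects each cell onto the first principal component. It is not the prcomp implementation: prcomp uses a singular value decomposition, whereas this standard-library sketch swaps in power iteration and returns only the first component. The pseudocount of 1 (needed because log2 of 0 is undefined) is an assumption, as the handling of zero values is not stated in the text; all data and function names are hypothetical.

```python
import math
import random


def log2_fpkm(matrix, pseudocount=1.0):
    """log2(FPKM + pseudocount) for a cells x transcripts matrix (assumed pseudocount)."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]


def first_pc_scores(cells, iterations=200, seed=0):
    """Scores of each cell on the first principal component.

    Columns are mean-centered but not scaled (the prcomp defaults).
    Power iteration is used in place of an SVD.
    """
    n, p = len(cells), len(cells[0])
    means = [sum(row[j] for row in cells) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in cells]

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
    for _ in range(iterations):
        # v <- X^T (X v), then normalize; converges to the leading loading vector.
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        new_v = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(w * w for w in new_v))
        if norm == 0.0:
            break  # no variance in the data
        v = [w / norm for w in new_v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


# Hypothetical FPKM values: four similar cells and one divergent cell (last row).
example_matrix = [
    [10, 0, 5],
    [12, 0, 4],
    [11, 1, 5],
    [9, 0, 6],
    [200, 150, 0],
]
example_scores = first_pc_scores(log2_fpkm(example_matrix))
outlier_index = max(range(len(example_scores)), key=lambda i: abs(example_scores[i]))
print(outlier_index)
```

The sign of a principal component is arbitrary, so the sketch ranks cells by the absolute value of their score.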
Targeted DNA-seq was performed as previously described (13,14). A total of 50 ng of genomic DNA was extracted from the AML bone marrow sample and processed using the TruSight Myeloid Sequencing Panel (Illumina, Inc.). A total of 54 genes known to be mutated in myeloid neoplasms, including fms related tyrosine kinase 3 (FLT3), nucleophosmin (NPM1) and DNA methyltransferase 3 alpha (DNMT3A), were assessed. The TruSeq Amplicon (BaseSpace Workflow; version 184.108.40.206; Illumina, Inc.) was used to generate the BAM and VCF files. Visualization of reads was performed using the Integrative Genomics Viewer (version 2.3.69; Broad Institute, Cambridge, MA, USA) (15). Pindel (version 0.2.5a8; McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA) (16) was used to identify the presence of FLT3 internal tandem duplications.
Variant calling analysis was carried out on the aligned paired-end reads using the Genome Analysis Toolkit (version 3.4.46; HaplotypeCaller function; Broad Institute) with reference to the aforementioned human Hg19 genome (17,18). Variants identified from the analysis were annotated using the SeattleSeq Annotation webserver (snp.gs.washington.edu/SeattleSeqAnnotation138) (19). Visualization of reads was performed using the SAMtools tview function (10).
Immunophenotyping using flow cytometry demonstrated that the blast population comprised ~65% of the total number of cells. Based on the single-cell gene expression profile, 11/20 cells were identified to be putative blasts (Fig. 2).
Principal component analysis was performed in an attempt to identify potential subclonal populations (Fig. 3). Two outlier cells were identified, RHA115 and RHA118. Based on their gene expression profile (Fig. 2), these cells were classified as putative blasts.
Targeted DNA-seq revealed the presence of a DNMT3A mutation (c.2644C>T; p.Arg882Cys; Fig. 4); an NPM1 mutation (c.859_860insTCTG; p.Trp288CysfsTer12); and a 108 bp FLT3 internal tandem duplication (data not shown).
Variant calling of the RNA-seq data did not identify cells with either of the aforementioned NPM1 and FLT3 mutations. The DNMT3A mutation (c.2644C>T; p.Arg882Cys) was identified in one cell (RNA human AML119) (Fig. 5). Coverage analysis was performed in an attempt to understand the apparent absence of NPM1 and FLT3 transcript mutations, and the low abundance of DNMT3A transcript mutations, across the 20 cells. This analysis revealed an absence of transcripts mapping to the relevant mutation sites, potentially secondary to stochastic transcript dropout (24).
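The distinction that the coverage analysis draws, between a cell with no reads at the mutation site and a cell with reads that support only the reference allele, can be sketched as follows. The read counts, cell identifiers and function names are hypothetical and are not taken from the study data; in practice the counts would come from the aligned reads at each site.

```python
def classify_site(ref_reads, alt_reads):
    """Classify one cell at one mutation site from its allele-specific read counts."""
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts cannot be negative")
    if ref_reads + alt_reads == 0:
        return "no coverage"        # mutation status cannot be assessed
    if alt_reads > 0:
        return "variant detected"
    return "reference only"


def summarize(counts_by_cell):
    """Tally the three categories across cells.

    counts_by_cell: {cell_id: (ref_reads, alt_reads)}
    """
    summary = {"no coverage": 0, "reference only": 0, "variant detected": 0}
    for ref_reads, alt_reads in counts_by_cell.values():
        summary[classify_site(ref_reads, alt_reads)] += 1
    return summary


# Hypothetical counts at a single site for three cells.
example_counts = {"cell_A": (0, 0), "cell_B": (0, 3), "cell_C": (5, 0)}
print(summarize(example_counts))
```

Only cells in the "reference only" category provide evidence against the mutation; cells with no coverage are uninformative rather than negative.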
Single-cell genomic analysis of AML has been previously reported (25,26). However, these studies involved only DNA analysis. To the best of our knowledge, the present study is the first single-cell transcriptomic analysis of AML.
In the present study, a clinical workflow for single-cell transcriptomic profiling has been piloted. Using single-cell RNA-seq, putative blasts were identified based on the gene expression profile of conventional immunophenotypic markers used in routine flow cytometry. For flow cytometric analysis, ~20 markers are typically used for profiling. This contrasts sharply with transcriptomic analysis, in which there are in principle ≥20,000 markers (genes) (27) that can be utilized, so that individual cellular characterization can theoretically be highly detailed.
In addition to information derived from expression profiling, mutational (variant) data provide further information that may be useful for individual cell categorization (26). Variant identification is most commonly performed on DNA-seq data (28). However, variant identification has also been performed on bulk (29) and single-cell RNA-seq data (23,30). In the present study, the DNMT3A p.Arg882Cys mutation was identified in the transcript, providing evidence that the mutant transcript is expressed.
High-dimensional data presents an opportunity for increased cellular characterization and the potential identification of subclonal populations. Principal component analysis of the dataset in the present study revealed two putative blasts that did not cluster with the other blasts. In future studies, the authors of the present study aim to investigate the possibility of predicting the drug sensitivity of putative subclonal populations based on high-dimensional characterization, as has been performed in previous studies (31).
One of the primary limitations of the protocol proposed in the present study is stochastic RNA loss, in which between 60 and 90% of poly(A) RNA may be lost during sample preparation (24). In the present study, FLT3 and NPM1 transcript mutations were not identified in the 20 cells, while the DNMT3A (c.2644C>T; p.Arg882Cys) transcript mutation was identified in one cell. Following further analysis, this observation may be explained by the absence of transcripts mapping to the relevant mutation site. Significant methodological improvements and protocol optimization are required to overcome this limitation.
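A back-of-the-envelope calculation shows why a loss of 60-90% of poly(A) RNA matters for lowly expressed transcripts. The sketch below assumes that each transcript molecule is lost independently with the same probability, which is a simplification, and the copy numbers used are hypothetical. It also ignores the further requirement that a sequenced read must overlap the mutation site.

```python
def detection_probability(n_molecules, loss_fraction):
    """Probability that at least one of n transcript molecules survives preparation.

    Assumes independent loss of each molecule with probability loss_fraction
    (a simplifying assumption, not a measured property of the protocol).
    """
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0 and 1")
    if n_molecules < 0:
        raise ValueError("n_molecules cannot be negative")
    return 1.0 - loss_fraction ** n_molecules


# Loss fractions from the range quoted in the text; copy numbers are hypothetical.
for loss in (0.6, 0.9):
    for copies in (1, 5, 20):
        p = detection_probability(copies, loss)
        print(f"loss={loss:.0%} copies={copies:>2} P(detect)={p:.3f}")
```

Under these assumptions, a transcript present at only a few copies per cell is frequently missed entirely at the upper end of the loss range, which is consistent with the absence of reads at the mutation sites in most cells.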
Another current limitation is the relatively low throughput of the protocol proposed in the present study. Due to reasons of cost and logistics, routine clinical genomic profiling of a hundred cells is currently challenging (32). By contrast, flow cytometric immunophenotyping typically involves profiling tens of thousands of cells (33). With the development of higher-throughput platforms (2,3), there is the potential that the cost of single-cell genomic profiling will decrease significantly to a point where it becomes viable for clinical implementation.
Despite the aforementioned limitations, single-cell genomic profiling may lead to the improved diagnosis and theragnosis of various types of cancer, including AML. In the present study, a possible single-cell genomic profiling protocol was piloted for clinical diagnostics. In future studies, a larger number of cells may need to be profiled to identify distinct subclonal populations and predict respective drug sensitivity profiles based on subclonal genomic signatures.
The present study was supported by the Clinician Scientist Award (grant no. NMRC/CSA/046/2012) and the Clinician Scientist Individual Research Grant (grant no. CIRG/1379/2013) from the National Medical Research Council (Singapore) awarded to R.F.