Cell lines and biopsies
To test our novel method for fusion gene detection, we selected four prostate cancer samples (fresh-frozen tissue obtained from prostatectomy specimens of four independent patients) and two leukemic cell lines, all known to harbor a specific fusion gene. The cell lines RCH-ACV [14] and REH [15] are of human B-cell precursor leukemia origin and were provided by Dr. Edith Rian.
Preparation of cDNA for microarray analysis and RT-PCR
Total RNA was isolated using Trizol reagent (Life Technologies, Rockville, MD, USA), and RNA quality was evaluated with the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). To enrich for messenger RNA, we used the RiboMinus kit (Invitrogen, Carlsbad, CA, USA), which removes ribosomal RNA from total RNA. To enable detection of fusion junctions located far from the poly-A tail, first-strand cDNA was synthesized by random priming, avoiding the 3' end bias introduced by oligo-dT priming. Double-stranded cDNA was labeled and hybridized onto the oligo microarrays.
We set up a database with broad coverage of the fusion genes reported in cancer (351 to date). For each fusion gene, it records which partner is upstream and which is downstream in the majority of the resulting fusion transcripts. See Additional file 1 for the identities and orientation of the 275 fusion genes included in the pilot microarray design. We used public genome sequence information from BioMart to extract the exon sequences of all listed transcript variants [17].
We wrote a script in Python to design the oligos. For genes that form the 5' portion of fusion genes, we used the 3' end sequences of the exons when constructing chimeric fusion junction oligos. For genes that form the 3' portion, we used the 5' start sequences of the exons. For each fusion gene, we then joined and listed all combinations of end sequences and start sequences. These chimeric sequences served as input for the design of chimeric fusion junction oligos, enabling detection of any breakpoint combination. In this way, chimeric oligos were constructed to target all possible exon-exon junctions between the up- and downstream partners of 275 known fusion genes. For a subset of fusion genes, including those known to be present in the control samples, we extended the design in two ways. First, we included four replicates of each exon-exon junction oligo. Second, we added a total of four extra control oligos for each exon-exon junction (oligos up- and down-shifted by two nucleotides relative to the standard ones).
Furthermore, a series of intragenic oligos was designed to measure longitudinal expression profiles of the fusion gene partners, 115 genes in total, including all the positive control fusion genes. These oligos targeted the start, middle, and end of all exons and all introns, as well as the exon-exon, exon-intron, and intron-exon junctions. The exon-intron and intron-exon junctions were included among the single-gene oligos for a specific reason. Removal or introduction of cis-acting splicing regulatory sequences may cause the pre-mRNA processing machinery to alter the splicing pattern.
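The junction enumeration described above can be sketched in Python, the language of the original design script. This is a minimal illustration, not the authors' script. The function names, the dictionary input format, and the fixed `flank` length are hypothetical; the real design tuned probe lengths for melting temperature. In addition, `shifts=(-2, 2)` yields only the two 2-nt shifted variants, not the full set of four control oligos per junction.

```python
from itertools import product


def chimeric_junctions(upstream_exons, downstream_exons, flank=20):
    """Pair the 3' end of every exon of the 5' (upstream) partner with the
    5' start of every exon of the 3' (downstream) partner.

    Both arguments are dicts mapping a (hypothetical) exon label to its
    sequence, written 5'->3'. `flank` is a hypothetical fixed number of
    nucleotides taken from each side of the junction; exons shorter than
    `flank` contribute their whole sequence.
    """
    junctions = {}
    for (up_id, up_seq), (down_id, down_seq) in product(
        upstream_exons.items(), downstream_exons.items()
    ):
        junctions[(up_id, down_id)] = up_seq[-flank:] + down_seq[:flank]
    return junctions


def shifted_controls(up_seq, down_seq, flank=20, shifts=(-2, 2)):
    """Control probes whose window is slid `shift` nt along the chimeric
    sequence: a negative shift adds upstream sequence, a positive shift
    adds downstream sequence. Shifts that run off either exon are skipped."""
    chimera = up_seq + down_seq
    junction = len(up_seq)
    controls = {}
    for shift in shifts:
        start = junction - flank + shift
        end = junction + flank + shift
        if start < 0 or end > len(chimera):
            continue  # not enough exon sequence to shift this far
        controls[shift] = chimera[start:end]
    return controls


upstream = {"A_e1": "AAAACCCC", "A_e2": "GGGGTTTT"}
downstream = {"B_e1": "ACGTACGT", "B_e2": "TTTTAAAA"}
probes = chimeric_junctions(upstream, downstream, flank=4)
print(len(probes))               # 2 x 2 exon pairs -> 4
print(probes[("A_e1", "B_e1")])  # CCCCACGT
print(shifted_controls("AAAACCCC", "ACGTACGT", flank=4))
# {-2: 'AACCCCAC', 2: 'CCACGTAC'}
```

The number of chimeric probes per fusion gene is the product of the exon counts of the two partners. This is why a design covering hundreds of fusion genes reaches tens of thousands of junction oligos.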
The final microarray design comprised 68,861 oligos, including 59,381 chimeric oligos (55,482 of them unique). These were synthesized onto custom-produced NimbleGen microarray slides (Roche NimbleGen, Inc., Madison, WI, USA). The chimeric oligos were designed to have similar melting temperatures on each side of the junction. This reduces half-binder effects, that is, signal from non-fused transcripts that hybridize to only one side of a junction probe.
Two versions of the microarray were designed, differing in probe length. The shorter oligos, 34 to 40 nucleotides long, had a Tm optimum of 72°C. The longer oligos, 44 to 50 nucleotides long, had a Tm optimum of 75°C. All samples except the REH cell line were hybridized onto the short-oligo microarray, whereas the RCH-ACV and REH cell lines were hybridized onto the long-oligo microarray. The RCH-ACV cell line was thus analyzed on both designs. Data from its positive control fusion gene, TCF3:PBX1, showed that the short oligos performed best, because the longer oligos gave substantial half-binder signals (data not shown).
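One way to balance melting temperatures on the two sides of a junction is to search over left and right flank lengths within the probe length window. The search keeps the split with the smallest Tm difference between the halves. The sketch below makes several assumptions. It uses the basic GC-content formula Tm = 64.9 + 41(nGC - 16.4)/N as a crude stand-in for the nearest-neighbor models a real probe designer would use. The 34 to 40 nt window and the 72°C target come from the short-oligo design. The function names, `min_flank`, and the tie-breaking rule are hypothetical.

```python
def basic_tm(seq):
    """Rough Tm in deg C from GC content: 64.9 + 41 * (nGC - 16.4) / N.
    A crude approximation used here only for illustration."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def balanced_junction_probe(up_seq, down_seq, min_len=34, max_len=40,
                            target_tm=72.0, min_flank=10):
    """Choose flank lengths so both halves of a junction probe have similar
    estimated Tm; ties are broken by closeness of the whole probe to
    `target_tm`. `min_flank` (hypothetical) keeps each half from vanishing."""
    best = None
    for total in range(min_len, max_len + 1):
        for left in range(min_flank, total - min_flank + 1):
            right = total - left
            if left > len(up_seq) or right > len(down_seq):
                continue  # exon too short for this split
            left_part, right_part = up_seq[-left:], down_seq[:right]
            key = (abs(basic_tm(left_part) - basic_tm(right_part)),
                   abs(basic_tm(left_part + right_part) - target_tm))
            if best is None or key < best[0]:
                best = (key, left_part, right_part)
    if best is None:
        raise ValueError("exons too short for the requested probe length")
    return best[1], best[2]


# A GC-rich upstream exon gets a short flank, an AT-rich downstream exon a long one.
left, right = balanced_junction_probe("GC" * 30, "AT" * 30)
print(len(left), len(right))  # 10 26
```

Because both halves must bind stably for a true fusion signal, equalizing their Tm keeps either half from binding strongly on its own. This is the half-binder problem the longer oligos suffered from.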
Because the sequence on each side of the junction is relatively short, hybridization may be sensitive to single nucleotide polymorphisms (SNPs). We therefore created extra sets of probes at known SNP positions, one for each SNP variant.
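The extra SNP probe sets can be sketched as expanding each probe over the alleles at known SNP positions. This sketch assumes a hypothetical input format: 0-based SNP offsets within the probe, each mapped to a string of alleles. When a probe covers several SNPs, it generates every allele combination. The source does not state whether the original design enumerated combinations or created one variant per SNP.

```python
from itertools import product


def snp_probe_variants(probe, snps):
    """Return one probe sequence per combination of SNP alleles.

    probe: reference probe sequence.
    snps: dict mapping a 0-based offset within the probe to the alleles
          seen at that position, e.g. {12: "CT"} (hypothetical format).
    With no SNPs, the reference probe is returned unchanged.
    """
    positions = sorted(snps)
    variants = []
    for alleles in product(*(snps[pos] for pos in positions)):
        seq = list(probe)
        for pos, allele in zip(positions, alleles):
            seq[pos] = allele
        variants.append("".join(seq))
    return variants


print(snp_probe_variants("ACGT", {1: "CT"}))  # ['ACGT', 'ATGT']
```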
Data preprocessing and annotation
Data preprocessed by NimbleGen were further normalized. Each probe intensity value in each sample was divided by that probe's median value across the three leukemia cell lines. We normalized on these three samples rather than on all samples for the following reason. When most samples carry the same fusion gene and breakpoint, as here (TMPRSS2:ERG, e1:e4), normalizing on all samples would level out the appearance of this fusion event in the dataset.
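The normalization step can be written as a per-probe division by the median over the reference samples. This minimal NumPy sketch assumes the preprocessed intensities are held in a probes × samples array. It also assumes the median is taken per probe across the reference columns. The column layout and function name are hypothetical.

```python
import numpy as np


def normalize_to_reference(intensities, reference_cols):
    """Divide each probe's values in every sample by that probe's median
    across the reference samples (columns `reference_cols`).

    intensities: 2-D array-like, rows = probes, columns = samples.
    A zero reference median yields inf/nan; real data would need a guard.
    """
    data = np.asarray(intensities, dtype=float)
    ref_median = np.median(data[:, reference_cols], axis=1, keepdims=True)
    return data / ref_median


# Columns 0-2: reference (leukemia) samples; column 3: a prostate sample.
raw = [[2.0, 4.0, 6.0, 8.0],
       [1.0, 1.0, 3.0, 10.0]]
print(normalize_to_reference(raw, [0, 1, 2]))
```

Normalizing the second probe against all four columns instead would divide by 2.0 rather than 1.0, halving the prostate sample's value. With several samples sharing the same elevated signal, the median moves toward that signal and the contrast largely disappears.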
All oligonucleotide probes were mapped to their one or two respective genomic loci. Each locus was annotated with its Ensembl exon (ENSE), transcript (ENST), and gene (ENSG) identifiers.
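A probe annotation record of this kind can be modeled with two small dataclasses. The class and field names are hypothetical, and the identifiers in the example are placeholders rather than real Ensembl IDs. The limit of two loci per probe follows the mapping described above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LocusAnnotation:
    exon_id: str        # Ensembl exon identifier (ENSE...)
    transcript_id: str  # Ensembl transcript identifier (ENST...)
    gene_id: str        # Ensembl gene identifier (ENSG...)


@dataclass
class ProbeAnnotation:
    probe_id: str
    loci: list = field(default_factory=list)  # one or two LocusAnnotation

    def add_locus(self, locus):
        if len(self.loci) == 2:
            raise ValueError(f"{self.probe_id} already maps to two loci")
        self.loci.append(locus)

    @property
    def gene_ids(self):
        return {locus.gene_id for locus in self.loci}

    @property
    def spans_two_genes(self):
        # True when the two loci belong to different genes (chimeric probe)
        return len(self.gene_ids) == 2


probe = ProbeAnnotation("junction_probe_1")
probe.add_locus(LocusAnnotation("ENSE_A1", "ENST_A1", "ENSG_A"))
probe.add_locus(LocusAnnotation("ENSE_B3", "ENST_B1", "ENSG_B"))
print(probe.spans_two_genes)  # True
```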
Raw and processed data were deposited in the Gene Expression Omnibus public repository for microarray data (accession number GSE14435). Deposition followed the MIAME (minimum information about a microarray experiment) recommendations for recording and reporting microarray-based gene expression data [18].
Automated scoring algorithm
Downstream fusion partners will generally show higher expression values for exons downstream of the fusion breakpoint. This is because the fusion transcript adds signal only for those exons of the downstream partner that lie 3' of the breakpoint. For each exon-exon junction of a downstream fusion partner gene, two probabilities were calculated. The first was based on a t-test of whether the values from all upstream exons and all downstream exons are likely to belong to different populations. The second was based on a t-test of whether the values from the immediately flanking up- and downstream exons are likely to belong to different populations.
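The two probabilities can be sketched with Student's two-sample t-test from SciPy. Several choices here are assumptions not stated in the source:
- Each probability is taken as 1 minus the two-sided p-value.
- Each exon is represented by the array of its normalized intragenic probe values, so every exon needs at least two values.
- The all-exons test is mapped to P(B-gene transcript) and the flanking-exons test to P(B-gene exon). This mapping is an interpretation of the scoring formula.

```python
import numpy as np
from scipy import stats


def breakpoint_probabilities(exon_values, junction):
    """Score the junction between exon `junction - 1` and exon `junction`
    of a downstream fusion partner as a possible breakpoint.

    exon_values: list of 1-D arrays in transcript order, one per exon,
                 holding that exon's normalized probe values.
    Returns (p_transcript, p_exon), each = 1 - two-sided p-value
    (ASSUMPTION: the conversion from p-value is not given in the source).
    """
    if not 1 <= junction < len(exon_values):
        raise ValueError("junction must lie between two exons")
    upstream = np.concatenate(exon_values[:junction])
    downstream = np.concatenate(exon_values[junction:])
    # All upstream exons vs all downstream exons
    p_transcript = 1.0 - stats.ttest_ind(upstream, downstream).pvalue
    # Only the two exons flanking the junction
    p_exon = 1.0 - stats.ttest_ind(exon_values[junction - 1],
                                   exon_values[junction]).pvalue
    return p_transcript, p_exon


exons = [np.array([1.0, 1.1, 0.9]), np.array([1.0, 0.95, 1.05]),
         np.array([5.0, 5.2, 4.8]), np.array([5.1, 4.9, 5.0])]
print(breakpoint_probabilities(exons, 2))  # both values close to 1
print(breakpoint_probabilities(exons, 1))  # exon-level value close to 0
```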
A fusion score was calculated as the product of the normalized expression value for the chimeric oligo and two probabilities from the longitudinal profile of the downstream fusion partner (the B gene). Both probabilities concern the exon-exon junction at the corresponding position being a breakpoint: [Fusion score = Chimeric junction score × P(B-gene transcript) × P(B-gene exon)].
To keep values within scale, the following thresholds were applied. Normalized values for chimeric oligos larger than 5 were set to 5, which affected approximately 5 in 10,000 values. Similarly, breakpoint probabilities from the longitudinal profiles below 0.10 were set to 0.10. When the values from the downstream exons were lower than those from the upstream exons, the probability was also set to 0.10.
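Combining the scoring formula with these thresholds gives a compact scoring function. In this sketch the cap of 5 and the floor of 0.10 come from the text. Two elements are assumptions: the function names, and the use of the mean downstream-exon value versus the mean upstream-exon value to detect the wrong-direction case.

```python
def bounded_probability(p, upstream_mean, downstream_mean, floor=0.10):
    """Apply the 0.10 floor to a breakpoint probability. If the downstream
    exons are lower than the upstream ones (compared here by their means,
    an assumption), the probability is set to the floor outright."""
    if downstream_mean < upstream_mean:
        return floor
    return max(p, floor)


def fusion_score(chimeric_value, p_transcript, p_exon, cap=5.0):
    """Fusion score = capped chimeric junction value
    * P(B-gene transcript) * P(B-gene exon)."""
    return min(chimeric_value, cap) * p_transcript * p_exon


p_t = bounded_probability(0.90, upstream_mean=1.0, downstream_mean=5.0)
p_e = bounded_probability(0.02, upstream_mean=1.0, downstream_mean=4.0)  # floored to 0.10
print(fusion_score(8.0, p_t, p_e))  # 5.0 * 0.90 * 0.10
```

The floor keeps a weak longitudinal profile from zeroing out a strong chimeric signal entirely. The cap keeps a single saturated chimeric oligo from dominating the ranking.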
Experimental validation of fusion transcript breakpoints
We used RT-PCR followed by DNA sequencing to validate the actual fusion junctions in the positive control fusion genes. For TCF3:PBX1, we used a forward primer in TCF3 exon 15 (5'-CACCCTCCCTGACCTGTCT-3') and a reverse primer in PBX1 exon 3 (5'-TGCTCCACTGAGTTGTCTGAA-3'), yielding a chimeric fusion product of 218 bp. For ETV6:RUNX1, we used a forward primer in ETV6 exon 5 (5'-CACTCCGTGGATTTCAAACA-3') and a reverse primer in RUNX1 exon 2 (5'-CGTGGACGTCTCTAGAAGGA-3'), yielding a chimeric fusion product of 204 bp. For TMPRSS2:ERG, we used the primers published in ref. [19]: a forward primer in TMPRSS2 exon 1 (5'-TAGGCGCGAGCTAAGCAGGAG-3') and a reverse primer in ERG exon 6 (5'-CTGCCGCACATGGTCTGTAC-3'), yielding a chimeric fusion product of 597 bp. The PCR products were separated by electrophoresis on a 2% agarose gel. For all fusion genes, DNA was isolated from the appropriate PCR bands (MinElute Gel Extraction kit, Qiagen Co., Valencia, CA, USA) and sequenced in both directions (ABI Prism 3730; Applied Biosystems, Foster City, CA, USA), using the same primers as for the RT-PCR.
Cell cultures from the leukemia cell lines were harvested for chromosome banding analysis. Chromosome preparations were made and G-banded using trypsin (DIFCO Laboratories, Detroit, MI, USA) and Leishman staining (BDH, Poole, England). For metaphase FISH, we used commercially available probes for the TCF3:PBX1 fusion gene (TCF3 FISH DNA probe, split signal; DAKO Denmark A/S, Glostrup, Denmark). We likewise used commercial probes for the ETV6:RUNX1 fusion gene (dual color, Dual Fusion Translocation Probe Set; Vysis, Abbott Laboratories, Abbott Park, IL, USA). Denaturation and hybridization conditions, as well as the subsequent detection procedures, followed the manufacturers' protocols. For each FISH experiment, 200 consecutive, intact, non-overlapping nuclei were examined under a Zeiss fluorescence microscope (Zeiss Axioplan, Oberkochen, Germany).