Telomeres are the protective caps at the ends of chromosomes and are composed of telomeric DNA repeats, TTAGGG, and associated proteins. Telomeres are critical for genomic stability: they prevent chromosome ends from being recognized as double-strand breaks, prevent end-to-end chromosome fusions, and help maintain replicative competence. Telomere length varies widely among individuals at birth [1] and decreases with each cell division, since the DNA replication machinery is unable to replicate chromosome ends (the 'end-replication problem'). Telomere attrition inevitably reaches a critical point at which cellular senescence or apoptosis is triggered [2]. Approximately 85% of cancers [3] escape the cellular crisis caused by telomere shortening by activating telomerase, an enzyme that catalyzes the synthesis of telomeric DNA from an RNA template. An alternative mechanism to lengthen telomeres, termed 'alternative lengthening of telomeres' (ALT), has also been observed in a small number of malignancies [4]. This mechanism operates in a telomerase-independent fashion and is characterized by the production of long, heterogeneous telomeres [5] that can be identified as large bright nuclear foci by fluorescence in situ hybridization (FISH) [6].
A number of experimental methods have been used to measure telomere length. Telomere restriction fragment (TRF) analysis involves digesting a large quantity of genomic DNA (1.5 to 2 µg) with enzymes that cut near the ends of the chromosomes. Southern blotting of this DNA with a telomere probe detects the sizes of the restriction fragments generated and thereby provides an average telomere length estimation. FISH can be useful for detecting ALT, but without a metaphase spread it is difficult to judge total telomeric DNA content. A high-throughput technique favored by those carrying out large studies is quantitative PCR (qPCR) with two reactions: one with primers specific for telomeric sequence and one with a single copy gene to allow normalization [7,8].
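The normalization step in the qPCR approach can be illustrated with a short sketch. It uses the widely used comparative threshold-cycle (2^-ΔΔCt) calculation, which is an assumption here and not necessarily the exact calculation used in [7,8]; it also assumes that both reactions amplify with roughly 100% efficiency. The function name and its parameters are hypothetical.

```python
def relative_ts_ratio(ct_tel, ct_scg, ref_ct_tel, ref_ct_scg):
    """Relative telomere-to-single-copy-gene (T/S) ratio of a sample.

    ct_tel, ct_scg         : threshold cycles of the telomere and the
                             single copy gene reactions for the sample
    ref_ct_tel, ref_ct_scg : the same two values for a reference DNA

    Assumption: both reactions double the product every cycle
    (about 100% efficiency), so a difference of one cycle corresponds
    to a two-fold difference in starting template.
    """
    delta_sample = ct_tel - ct_scg        # telomere signal normalized to the single copy gene
    delta_ref = ref_ct_tel - ref_ct_scg   # same normalization for the reference DNA
    return 2.0 ** -(delta_sample - delta_ref)
```

In this sketch, a value above 1 indicates more telomeric signal per genome copy than in the reference DNA, and a value below 1 indicates less. It is a relative measure and does not give telomere length in base pairs.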
The development of massively parallel sequencing, that is, next-generation sequencing, provides an alternative and potentially highly robust method to measure telomeres. Castle et al. [9] previously suggested a potential application for whole-genome sequencing (WGS) to ascertain telomeric DNA content. By counting and normalizing WGS reads containing the telomere repeats (TTAGGG)₄, they reported that a lung carcinoid cell line had fewer telomere reads compared with the pooled DNA of healthy individuals [9]. This in silico finding, although consistent with the hypothesis that cell lines may have shorter telomeres due to many cycles of cell divisions, has several caveats. First, the observation was based on a single cell line with no experimental validation. Second, since the normal control DNA employed was not matched to the cell line source, it remains unclear if normal heterogeneity in telomere length might have contributed to the observed telomere difference. At present, the potential application of using WGS for telomere analysis has not been explored.
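The read-counting idea described above can be sketched as follows. This is a minimal illustration, not the pipeline of [9]: counting the reverse-complement motif and normalizing by the total number of reads are both assumptions made here, and the function name is hypothetical.

```python
TELOMERE_MOTIF = "TTAGGG" * 4   # four consecutive telomere repeats
REVCOMP_MOTIF = "CCCTAA" * 4    # the same repeats read from the opposite strand (assumption)

def telomere_read_fraction(reads):
    """Fraction of reads that contain four consecutive telomere repeats.

    reads: an iterable of read sequences given as strings.
    Normalizing by the total read count is an assumption of this sketch;
    it makes samples sequenced to different depths comparable.
    """
    total = 0
    hits = 0
    for read in reads:
        seq = read.upper()
        total += 1
        if TELOMERE_MOTIF in seq or REVCOMP_MOTIF in seq:
            hits += 1
    return hits / total if total else 0.0
```

Applied to a tumor and to its matched normal sample, the two fractions could be compared directly, which would address the second caveat above concerning unmatched control DNA.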
In this study we present the first comprehensive characterization of telomeres in primary tumors using WGS data from The St Jude Children's Research Hospital - Washington University Pediatric Cancer Genome Project (PCGP). The PCGP is sequencing 600 pediatric cancers and their matched normal DNA to identify somatic lesions that drive the initiation, biological and clinical behavior of pediatric cancers. It was launched in 2010 and WGS is complete for over 235 tumors from 15 different types of pediatric cancers with an average of 30-fold haploid coverage [10], making it possible to carry out a comprehensive telomere analysis using WGS data [11-14].