In this study, we characterized common genetic polymorphisms (SNPs and indels) spanning a 56 kb region on chromosome 19q13.33 in 78 individuals (156 chromosomes) of European ancestry, using 454 next-generation sequencing (Rothberg and Leamon 2008) coupled with a novel solution-based sequence capture method. This capture method provides a reliable and less labor-intensive alternative to long-range PCR for sequencing large genomic regions. We discovered 298 new polymorphisms (116 SNPs and 182 indels), confirmed 257 previously known loci, and constructed a detailed LD map of the region. A large fraction (~65%) of the SNPs described here have also been observed in an early release of the 1000 Genomes Project. Many of the indel polymorphisms detected are rare, and validation is required to establish their allele frequencies conclusively. Our analysis provides a comprehensive inventory of common genetic variation in the region surrounding the KLK3 gene and allows for the selection of tag SNPs for follow-up studies that thoroughly examine the association of genetic polymorphisms on chromosome 19q13.33 with prostate cancer risk and PSA levels. At an r² threshold of 0.8 and MAF of 1% or higher, 144 variants are necessary to tag the region; at the same r² threshold and MAF >5%, 86 loci are required. This represents a coverage improvement of 78% over HapMap SNPs and 35% over variants known prior to this study (dbSNP).
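To illustrate how tag counts like these arise, the following Python sketch applies a common greedy pairwise strategy: among variants at or above a MAF cutoff, repeatedly pick the one that captures (r² ≥ threshold) the largest number of still-untagged variants. This is an assumed, simplified procedure for illustration, not necessarily the tagging method used in this study; the function name `greedy_tag_snps`, its input format, and the toy data are hypothetical.

```python
def greedy_tag_snps(maf, r2, r2_threshold=0.8, maf_min=0.01):
    """Greedy pairwise tag SNP selection (illustrative sketch).

    maf: dict mapping variant ID -> minor allele frequency
    r2:  dict mapping frozenset({id1, id2}) -> pairwise r^2
         (pairs that are absent are treated as r^2 = 0)
    Returns tag variant IDs in the order they were selected.
    """
    # Only variants at or above the MAF cutoff need tagging or may serve as tags.
    candidates = sorted(v for v, f in maf.items() if f >= maf_min)
    untagged = set(candidates)

    def ld(a, b):
        # A variant is in perfect LD with itself.
        return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)

    def captured_by(tag):
        # Untagged variants whose r^2 with `tag` meets the threshold.
        return {u for u in untagged if ld(tag, u) >= r2_threshold}

    tags = []
    while untagged:
        # Pick the candidate capturing the most untagged variants
        # (ties are broken by the sorted order of variant IDs).
        best = max(candidates, key=lambda c: len(captured_by(c)))
        tags.append(best)
        untagged -= captured_by(best)
    return tags


# Hypothetical toy data: v4 falls below the 1% MAF cutoff and is ignored.
maf = {"v1": 0.20, "v2": 0.15, "v3": 0.30, "v4": 0.005, "v5": 0.10}
r2 = {
    frozenset(("v1", "v2")): 0.90,
    frozenset(("v1", "v3")): 0.85,
    frozenset(("v2", "v3")): 0.50,
    frozenset(("v3", "v5")): 0.30,
}
print(greedy_tag_snps(maf, r2))  # ['v1', 'v5']
```

With real data, `maf` and `r2` would be estimated from the sequenced chromosomes, and rerunning the function with a higher MAF cutoff would correspond to the second scenario above. Greedy selection is simple and widely used, but it does not guarantee the smallest possible tag set.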
Chromosome 19q13.33 harbors a cluster of 15 kallikrein genes tandemly arranged over ~300 kb. Three genes belonging to this family of serine proteases are located within the region sequenced in this study: KLK15, KLK3 and KLK2. The KLK3 gene encodes PSA, a protein produced almost exclusively by the prostate gland. Small amounts of PSA are detectable in the bloodstream of healthy men (Lilja 1985). Elevated serum PSA levels in men with prostate cancer form the basis of the PSA test, a widely used screening tool for prostate cancer. The limited specificity and sensitivity of the test have raised questions about its usefulness for screening, and two large prospective randomized trials are currently underway to assess its benefits directly: the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) (Andriole et al. 2009) and the European Randomized Study of Screening for Prostate Cancer (ERSPC) (Schroder et al. 2009).
The KLK2 and KLK15 genes have also been implicated in prostate cancer etiology. The KLK2 gene is expressed in the prostate gland and has been proposed as a potential marker for prostate cancer. Like PSA levels, human kallikrein 2 (hK2) levels in the bloodstream are strongly associated with prostate cancer, but they do not add to the value of total PSA measurements for predicting risk of disease (Lilja et al. 2007). Interestingly, KLK3 and KLK2 share ~80% nucleotide sequence identity across exons, introns and non-coding regions, suggesting a recent duplication event (Gan et al. 2000). PSA and hK2 also share ~80% amino acid identity (Gan et al. 2000).
KLK15 is the next gene centromeric to KLK3 and shares considerable similarity with other kallikrein genes. It encodes yet another member of the kallikrein family, hK15. Expression of KLK15 appears to be upregulated in a large percentage of prostate cancers and is possibly associated with higher-stage disease (Stephan et al. 2003; Yousef et al. 2001).
Previous association studies with candidate or tag SNPs have reported a number of SNPs in or near the KLK3 gene that appear to be associated with prostate cancer, PSA levels, or both (Cramer et al. 2003; 2008; Pal et al. 2007). Results from GWAS and their follow-up studies are conflicting, and the association with prostate cancer may depend on how control individuals were selected. The SNP most significantly associated with prostate cancer risk (Eeles et al. 2008b) and PSA levels (Ahn et al. 2008), rs2735839, lies in a region of relatively low LD. We discovered two markers in high LD (r² ≥ 0.8) with rs2735839; these variants are therefore the strongest candidates for laboratory analyses designed to investigate the biological basis of the association signal(s).
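Pairwise r² values of the kind used to find markers in high LD with rs2735839 can be computed from phased haplotypes. The Python sketch below assumes biallelic loci coded 0/1 on each phased chromosome (the phasing procedure is not described in this section), computes r² = D² / [pA(1 − pA) pB(1 − pB)] with D = pAB − pA·pB, and lists markers whose r² with an index SNP meets a threshold. The function names, the marker names other than rs2735839, and the toy haplotypes are hypothetical.

```python
def r_squared(haplotypes, i, j):
    """r^2 between biallelic loci i and j from phased haplotypes coded 0/1."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n  # frequency of allele 1 at locus i
    p_b = sum(h[j] for h in haplotypes) / n  # frequency of allele 1 at locus j
    # Frequency of haplotypes carrying allele 1 at both loci.
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined for monomorphic loci; report 0 for simplicity.
    return 0.0 if denom == 0 else d * d / denom


def ld_proxies(haplotypes, names, index_snp, threshold=0.8):
    """Markers whose r^2 with `index_snp` is >= threshold, as (name, r^2) pairs."""
    i = names.index(index_snp)
    hits = []
    for j, name in enumerate(names):
        if j != i:
            r2 = r_squared(haplotypes, i, j)
            if r2 >= threshold:
                hits.append((name, r2))
    return hits


# Hypothetical toy data: four phased chromosomes, three loci.
names = ["rs2735839", "varA", "varB"]
haplotypes = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 1],
]
print(ld_proxies(haplotypes, names, "rs2735839"))  # [('varA', 1.0)]
```

Because r² is symmetric in allele coding, it does not matter which allele at each locus is coded as 1; with 156 chromosomes the same functions apply unchanged.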
Prostate cancer is the second leading cause of cancer deaths among men in the United States (Jemal et al. 2008). It occurs in both indolent and aggressive forms, and it is difficult to distinguish patients who require aggressive therapy and management from those for whom watchful waiting is appropriate. Although PSA screening may offer important benefits by detecting cancers at an earlier stage, it also leads to substantial intervention and unnecessary treatment. Evidence for or against the efficacy of PSA screening in reducing prostate cancer morbidity and mortality is eagerly awaited. Our effort to comprehensively describe common genetic variation at the KLK3 locus on chromosome 19q13.33 should enable a rational approach to follow-up analyses of the role genetic variation plays in PSA levels and prostate cancer risk.