Visual refractive errors (REs) are complex genetic traits with a largely unknown etiology. To date, genome-wide association studies (GWASs) of moderate size have identified several novel risk markers for RE, measured here as mean spherical equivalent (MSE). We performed a GWAS using a total of 7280 samples from five cohorts: the Age-Related Eye Disease Study (AREDS); the KORA study (‘Cooperative Health Research in the Region of Augsburg’); the Framingham Eye Study (FES); the Ogliastra Genetic Park-Talana (OGP-Talana) Study and the Multiethnic Study of Atherosclerosis (MESA). Genotyping was performed on Illumina and Affymetrix platforms with additional markers imputed to the HapMap II reference panel. We identified a new genome-wide significant locus on chromosome 16 (rs10500355, P = 3.9 × 10⁻⁹) in a combined discovery and replication set (26 953 samples). This single nucleotide polymorphism (SNP) is located within the RBFOX1 gene, which encodes a neuron-specific splicing factor regulating a wide range of alternative splicing events implicated in neuronal development and maturation, including transcription factors, other splicing factors and synaptic proteins.
Prostate cancer (PrCa) is the most common male cancer in developed countries and the second most common cause of cancer death after lung cancer. We recently reported a genome-wide linkage scan in 69 Finnish hereditary PrCa (HPC) families, which replicated the HPC9 locus on 17q21-q22 and identified a locus on 2q37. The aim of this study was to detect other loci linked to HPC. Here we used ordered subset analysis (OSA), conditioned on nonparametric linkage to these loci, to detect loci that show linkage in subsets of families but not in the overall sample. We analyzed the families based on their evidence for linkage to chromosome 2, chromosome 17 and a maximum score using the strongest evidence of linkage from either of the two loci. Significant linkage to a 5-cM linkage interval with a peak OSA nonparametric allele-sharing LOD score of 4.876 on Xq26.3-q27 (ΔLOD=3.193, empirical P=0.009) was observed in a subset of 41 families weakly linked to 2q37, overlapping the HPCX1 locus. Two peaks that were novel to the analysis combining linkage evidence from both primary loci were identified: 18q12.1-q12.2 (OSA LOD=2.541, ΔLOD=1.651, P=0.03) and 22q11.1-q11.21 (OSA LOD=2.395, ΔLOD=2.36, P=0.006), which is close to HPC6. Using OSA allows us to find additional loci linked to HPC in subsets of families, and underlines the complex genetic heterogeneity of HPC even in highly aggregated families.
linkage analysis; ordered subset analysis; prostate cancer
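The ordered subset procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes per-family LOD contributions are additive at a single fixed map position, and the function name `ordered_subset_analysis` and all numbers are hypothetical. A real OSA also maximizes the subset LOD over map positions and assesses ΔLOD by permuting the family order.

```python
def ordered_subset_analysis(family_lods, covariate, descending=True):
    """Return (best_subset_size, max_subset_lod, delta_lod).

    family_lods: per-family LOD contributions at one target position.
    covariate:   per-family ranking covariate, e.g. each family's linkage
                 score at a primary locus (hypothetical inputs).
    """
    if len(family_lods) != len(covariate) or not family_lods:
        raise ValueError("need one covariate value per family")
    # Rank families by the covariate (strongest first when descending=True).
    order = sorted(range(len(covariate)),
                   key=lambda i: covariate[i], reverse=descending)
    overall_lod = sum(family_lods)
    best_size, best_lod, running = 0, float("-inf"), 0.0
    # Add families one at a time and keep the subset with the highest LOD.
    for size, idx in enumerate(order, start=1):
        running += family_lods[idx]
        if running > best_lod:
            best_size, best_lod = size, running
    return best_size, best_lod, best_lod - overall_lod


# Made-up example: six families, ranked by their primary-locus score.
target_lods = [0.9, 0.7, -0.4, 0.5, -0.6, -0.3]
primary_scores = [2.1, 1.8, 0.2, 1.5, -0.1, 0.4]
size, subset_lod, delta_lod = ordered_subset_analysis(target_lods, primary_scores)
print(size, round(subset_lod, 3), round(delta_lod, 3))
```

Passing `descending=False` ranks the weakest-linked families first, which corresponds to searching for a locus in families weakly linked to the primary locus.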
Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios.
We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented.
The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a “risk machine”, will share properties from the statistical machine that it is derived from.
Consistent nonparametric regression; Logistic regression; Probability machine; Odds ratio; Counterfactuals; Interactions
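The counterfactual effect size idea can be illustrated with a deliberately simple stand-in for a probability machine. The sketch below is an assumption-laden toy, not the random forest method of the study: it estimates P(y=1 | x) as the outcome proportion within each observed predictor pattern, then averages the predicted risk with one binary predictor set to 1 versus 0. The function names and data are hypothetical.

```python
from collections import defaultdict


def fit_cell_probabilities(X, y):
    """Nonparametric estimate of P(y=1 | x): the outcome proportion in
    each distinct predictor pattern (stand-in for a probability machine)."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [events, total]
    for row, outcome in zip(X, y):
        cell = counts[tuple(row)]
        cell[0] += outcome
        cell[1] += 1
    return {pattern: events / total
            for pattern, (events, total) in counts.items()}


def counterfactual_risk_difference(model, X, j):
    """Average predicted risk with binary predictor j set to 1 minus set
    to 0, holding each subject's other predictors at observed values.
    Raises KeyError if a counterfactual pattern was never observed."""
    diffs = []
    for row in X:
        exposed = tuple(row[:j]) + (1,) + tuple(row[j + 1:])
        unexposed = tuple(row[:j]) + (0,) + tuple(row[j + 1:])
        diffs.append(model[exposed] - model[unexposed])
    return sum(diffs) / len(diffs)


def make_cell(pattern, events, total):
    """Build `total` toy subjects with the given pattern and event count."""
    return [(pattern, 1)] * events + [(pattern, 0)] * (total - events)


# Made-up data: risks 0.2, 0.4, 0.5, 0.8 in the four predictor patterns.
data = (make_cell((0, 0), 2, 10) + make_cell((1, 0), 4, 10)
        + make_cell((0, 1), 5, 10) + make_cell((1, 1), 8, 10))
X = [pattern for pattern, _ in data]
y = [outcome for _, outcome in data]
model = fit_cell_probabilities(X, y)
risk_diff = counterfactual_risk_difference(model, X, j=0)
print(round(risk_diff, 3))
```

Because the effect of the first predictor differs across levels of the second (0.2 versus 0.3 in this toy data), comparing the stratum-specific differences is one simple way to flag a possible interaction.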
A previous study of Old Order Amish families has shown association of ocular refraction with markers proximal to matrix metalloproteinase (MMP) genes MMP1 and MMP10 and intragenic to MMP2. We conducted a candidate gene replication study of association between refraction and single nucleotide polymorphisms (SNPs) within these genomic regions.
Candidate gene genetic association study.
A total of 2,000 participants drawn from the Age-Related Eye Disease Study (AREDS) were chosen for genotyping. After quality control filtering, 1,912 individuals were available for analysis.
Microarray genotyping was performed using the HumanOmni 2.5 bead array. SNPs originally typed in the previous Amish association study were extracted for analysis. In addition, haplotype tagging SNPs were genotyped using TaqMan assays. Quantitative trait association analyses of mean spherical equivalent refraction (MSE) were performed on 30 markers using linear regression models and an additive genetic risk model, while adjusting for age, sex, education, and population substructure. Post-hoc analyses were performed after stratifying on a dichotomous education variable. Pointwise (P-emp) and multiple-test study-wise (P-multi) significance levels were calculated empirically through permutation.
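The pointwise empirical significance level (P-emp) mentioned above can be illustrated with a basic permutation test. This is a minimal sketch, assuming an unadjusted additive model (trait regressed on minor allele count) and ignoring the covariate adjustment used in the study; the function names, the seed, and the toy data are hypothetical.

```python
import random


def regression_slope(genotypes, trait):
    """Least-squares slope of trait on additive genotype (0/1/2)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_t = sum(trait) / n
    sxy = sum((g - mean_g) * (t - mean_t) for g, t in zip(genotypes, trait))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    return sxy / sxx


def empirical_p_value(genotypes, trait, n_perm=999, seed=0):
    """Share of trait permutations whose |slope| is at least the observed
    |slope|, counting the observed data as one permutation."""
    rng = random.Random(seed)
    observed = abs(regression_slope(genotypes, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-trait relationship
        if abs(regression_slope(genotypes, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up data with a strong additive effect.
genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
trait = [-1.0, -0.8, -1.2, -0.9, 0.1, -0.1, 0.0, 0.2, 1.0, 1.1, 0.9, 1.2]
p_emp = empirical_p_value(genotypes, trait)
print(p_emp)
```

A study-wise level could be obtained along the same lines by recording, in each permutation, the maximum statistic across all markers; that extension is not shown here.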
Main outcome measures
MSE was used as a quantitative measure of ocular refraction.
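For readers unfamiliar with the measure, the arithmetic behind MSE can be written out directly. The sketch assumes the usual convention that the spherical equivalent of one eye is sphere plus half the cylinder, and that MSE averages the two eyes; the function names and the example prescription are hypothetical.

```python
def spherical_equivalent(sphere, cylinder):
    """Spherical equivalent of one eye in diopters: sphere + cylinder / 2."""
    return sphere + cylinder / 2.0


def mean_spherical_equivalent(right_eye, left_eye):
    """Average the spherical equivalents of the two eyes.
    Each eye is a (sphere, cylinder) pair in diopters."""
    return (spherical_equivalent(*right_eye)
            + spherical_equivalent(*left_eye)) / 2.0


# Made-up prescription: right eye -2.00/-1.00, left eye -1.50/-0.50.
mse = mean_spherical_equivalent((-2.00, -1.00), (-1.50, -0.50))
print(mse)  # -2.125
```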
The mean age and ocular refraction were 68 years (SD=4.7) and +0.55 D (SD=2.14), respectively. Pointwise statistical significance was obtained for rs1939008 (P-emp=0.0326). No SNP attained statistical significance after correcting for multiple testing. In stratified analyses, multiple SNPs reached pointwise significance in the lower-education group: 2 of these were statistically significant after multiple testing correction. The two highest-ranking SNPs in Amish families (rs1939008 and rs9928731) showed pointwise P-emp<0.01 in the lower-education stratum of AREDS participants.
We show suggestive evidence of replication of an association signal for ocular refraction to a marker between MMP1 and MMP10. We also provide evidence of a gene-environment interaction between previously-reported markers and education on refractive error. Variants in MMP1-MMP10 and MMP2 regions appear to affect population variation in ocular refraction in environmental conditions less favorable for myopia development.
refraction; refractive error; myopia; association study; gene-environment interaction; matrix metalloproteinase; MMP; genetics
Linkage analysis was developed to detect excess co-segregation of the putative alleles underlying a phenotype with the alleles at a marker locus in family data. Many different variations of this analysis and corresponding study design have been developed to detect this co-segregation. Linkage studies have been shown to have high power to detect loci that have alleles (or variants) with a large effect size, i.e. alleles that make large contributions to the risk of a disease or to the variation of a quantitative trait. However, alleles with a large effect size tend to be rare in the population. In contrast, association studies are designed to have high power to detect common alleles which tend to have a small effect size for most diseases or traits. Although genome-wide association studies have been successful in detecting many new loci with common alleles of small effect for many complex traits, these common variants often do not explain a large proportion of disease risk or variation of the trait. In the past, linkage studies were successful in detecting regions of the genome that were likely to harbor rare variants with large effect for many simple Mendelian diseases and for many complex traits. However, identifying the actual sequence variant(s) responsible for these linkage signals was challenging because of difficulties in sequencing the large regions implicated by each linkage peak. Current ‘next-generation’ DNA sequencing techniques have made it economically feasible to sequence all exons or the whole genomes of a reasonably large number of individuals. Studies have shown that rare variants are quite common in the general population, and it is now possible to combine these new DNA sequencing methods with linkage studies to identify rare causal variants with a large effect size. A brief review of linkage methods is presented here with examples of their relevance and usefulness for the interpretation of whole-exome and whole-genome sequence data.
Linkage; Genetics; DNA sequence; Whole-genome sequence; Whole-exome sequence
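The co-segregation test at the heart of linkage analysis can be made concrete with the classic two-point LOD score. The sketch below assumes the simplest textbook setting (phase-known, fully informative meioses) and compares the likelihood at a recombination fraction θ with the likelihood under no linkage (θ = 0.5); the function name and counts are hypothetical.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """LOD(theta) = log10[ theta^k (1-theta)^(n-k) / 0.5^n ] for k
    recombinants among n phase-known, fully informative meioses."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    k, n = recombinants, meioses
    log_lik_theta = k * math.log10(theta) + (n - k) * math.log10(1 - theta)
    log_lik_null = n * math.log10(0.5)
    return log_lik_theta - log_lik_null


# Made-up example: 2 recombinants in 20 meioses, evaluated at theta = k/n.
lod = two_point_lod(2, 20, 0.1)
print(round(lod, 3))  # 3.197
```

With these counts the score exceeds the conventional threshold of 3, while evaluating the same data at θ = 0.5 returns 0 by construction.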
Genome-wide association studies have identified novel genetic factors that contribute to intracranial aneurysm (IA) susceptibility. We sought to confirm previously reported loci, to identify novel risk factors, and to evaluate the contribution of these factors to familial and sporadic IA.
We utilized 2 complementary samples, one recruited on the basis of a dense family history of IA (discovery sample 1: 388 IA cases and 397 controls) and the other without regard to family history (discovery sample 2: 1095 IA cases and 1286 controls). Imputation was used to generate a common set of single nucleotide polymorphisms (SNP) across samples, and a logistic regression model was used to test for association in each sample. Results from each sample were then combined in a meta-analysis.
There was only modest overlap in the association results obtained in the 2 samples. In neither sample did results reach genome-wide significance. However, the meta-analysis yielded genome-wide significance for a SNP on chromosome 9p (CDKN2BAS; rs6475606; P=3.6×10⁻⁸) and provided further evidence to support the previously reported association of IA with a SNP in SOX17 on chromosome 8q (rs1072737; P=8.7×10⁻⁵). Analyses suggest that the effect of smoking acts multiplicatively with the SNP genotype, and smoking has a greater effect on risk than SNP genotype.
In addition to replicating several previously reported loci, we provide further evidence that the association on chromosome 9p is attributable to variants in CDKN2BAS (also known as ANRIL, an antisense noncoding RNA).
genome-wide association study; intracranial aneurysm
Group 14 of Genetic Analysis Workshop 17 examined several issues related to analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we conclude that power is (as expected) dependent on locus-specific heritability or contribution to disease risk, large samples will be required to detect rare causal variants with small effect sizes, extreme phenotype sampling designs may increase power for smaller laboratory costs, methods that allow joint analysis of multiple variants per gene or pathway are more powerful in general than analyses of individual rare variants, population-specific analyses can be optimal when different subpopulations harbor private causal mutations, and machine learning methods may be useful for selecting subsets of predictors for follow-up in the presence of extreme locus heterogeneity and large numbers of potential predictors.
rare variants; LASSO; machine learning; random forests; logic regression; binary trees; Poisson regression; ISIS; classification trees; meta-analysis; extreme sampling
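Collapsing rare variants, as described above, reduces the many rare sites in a gene to a single per-person score. The sketch below shows one common form, a carrier indicator, and is only an illustration: the 1% minor allele frequency cutoff, the function name, and the genotypes are assumptions rather than details taken from the workshop contributions.

```python
def collapse_rare_variants(genotypes, mafs, maf_threshold=0.01):
    """Collapse a gene's rare variants into one indicator per individual.

    genotypes: one row per individual of minor allele counts (0/1/2),
               one column per variant in the gene.
    mafs:      minor allele frequency of each variant.
    Returns 1 if the individual carries at least one rare minor allele.
    """
    rare_columns = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    return [int(any(row[j] > 0 for j in rare_columns)) for row in genotypes]


# Made-up gene with three variants; only the first and third are rare.
mafs = [0.002, 0.2, 0.005]
genotypes = [
    [0, 1, 0],  # carries only the common variant
    [1, 0, 0],  # carries the first rare variant
    [0, 2, 1],  # carries the third rare variant
    [0, 0, 0],  # carries nothing
]
carrier = collapse_rare_variants(genotypes, mafs)
print(carrier)  # [0, 1, 1, 0]
```

The resulting indicator can then be tested against the trait with a single regression per gene instead of one test per rare variant.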
Genetic Analysis Workshop 17 provided common and rare genetic variants from exome sequencing data and simulated binary and quantitative traits in 200 replicates. We provide a brief review of the machine learning and regression-based methods used in the analyses of these data. Several regression and machine learning methods were used to address different problems inherent in the analyses of these data, which are high-dimension, low-sample-size data typical of many genetic association studies. Unsupervised methods, such as cluster analysis, were used for data segmentation and subset selection. Supervised learning methods, which include regression-based methods (e.g., generalized linear models, logic regression, and regularized regression) and tree-based methods (e.g., decision trees and random forests), were used for variable selection (selecting genetic and clinical features most associated with or predictive of outcome) and prediction (developing models using common and rare genetic variants to accurately predict outcome), with the outcome being case-control status or quantitative trait value. We include a discussion of cross-validation for model selection and assessment and a description of available software resources for these methods.
unsupervised learning; supervised learning; cluster analysis; logistic regression; Poisson regression; logic regression; LASSO; ridge regression; decision trees; random forests; cross-validation; software
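Cross-validation for model assessment, mentioned above, rests on a simple partition of the sample. The following sketch shows one way to generate k-fold training and test indices with the standard library only; the function name and the seed are hypothetical, and any model fitting would be plugged into the loop.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.
    Every sample appears in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for i, test in enumerate(folds):
        train = [idx for f, fold in enumerate(folds) if f != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(10, 5))
for train, test in splits:
    # A model would be fitted on `train` and evaluated on `test` here.
    print(len(train), len(test))
```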
Despite many years of research, most of the genetic factors contributing to myopia development remain unknown. Genetic studies have pointed to a strong inherited component, but although many candidate regions have been implicated, few genes have been positively identified.
We have previously reported 2 genomewide linkage scans in a population of 63 highly aggregated Ashkenazi Jewish families that identified a locus on chromosome 22. Here we used ordered subset analysis (OSA), conditioned on nonparametric linkage to chromosome 22, to detect other chromosomal regions that showed evidence of linkage to myopia in subsets of the families but not in the overall sample.
Strong evidence of linkage to a 19-cM linkage interval with a peak OSA nonparametric allele-sharing logarithm-of-odds (LOD) score of 3.14 on 20p12-q11.1 (ΔLOD=2.39, empirical p=0.029) was identified in a subset of 20 families that also exhibited strong evidence of linkage to chromosome 22. One other locus also presented with suggestive LOD scores >2.0 on chromosome 11p14-q14 and one locus on chromosome 6q22-q24 had an OSA LOD score=1.76 (ΔLOD=1.65, empirical p=0.02).
The chromosome 6 and 20 loci are entirely novel and appear linked in a subset of families whose myopia is known to be linked to chromosome 22. The chromosome 11 locus overlaps with the known Myopia-7 (MYP7, OMIM 609256) locus. Using ordered subset analysis allows us to find additional loci linked to myopia in subsets of families, and underlines the complex genetic heterogeneity of myopia even in highly aggregated families and genetically isolated populations such as the Ashkenazi Jews.
To determine the potential influence of genetic factors on the prevalence of myopia in Tehran.
Of 6497 citizens of Tehran sampled from 160 clusters using stratified random cluster sampling, 4565 (70.3%) participated in the study and were referred to a clinic for an extensive eye examination and interview. These were from 1259 nuclear families with an average size of 3.6. Refraction data obtained from 3321 participants aged 16 years and over are presented. Three definitions of myopia, as the spherical equivalent of −0.5, −1, and −2 diopters or less, were used. Familial aggregation of myopia was evaluated with odds ratios and recurrence risk ratios (λR) using a multiple logistic regression with generalised estimating equations (GEE), adjusted for age, sex, height, and education.
Multivariate analyses showed a strong familial aggregation of myopia among siblings (λR ranging from 2.09 to 3.86) and parent–offspring pairs (λR from 1.82 to 3.81) adjusted for age, sex, height, and education. The aggregation increased with higher myopia thresholds and with the use of cycloplegic refraction. The odds ratios for spouse pairs were not significantly different from 1.0. The association of myopia with sex, height, and education (and not age) remained significant in the final GEE2 model.
The findings indicate a relatively high degree of familial aggregation of myopia in the Tehran population, independent of age, sex, height, and education. This residual aggregation may be a result of heredity or of an unmeasured common environmental effect.
familial myopia; myopia; refractive error; recurrence risk
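The recurrence risk ratio used above has a simple unadjusted form that may help fix ideas: the risk in relatives of affected individuals divided by the population prevalence. The sketch below is a crude illustration only; the study estimated λR from GEE regression models with covariate adjustment, which is not reproduced here, and the function names and counts are hypothetical.

```python
def is_myopic(spherical_equivalent, threshold=-0.5):
    """Myopia defined as spherical equivalent at or below a threshold
    (the text uses -0.5, -1, and -2 diopters)."""
    return spherical_equivalent <= threshold


def recurrence_risk_ratio(pairs, prevalence):
    """Crude lambda_R = P(relative affected | proband affected) / prevalence.

    pairs: (proband_affected, relative_affected) booleans, one per pair.
    """
    relatives_of_affected = [rel for proband, rel in pairs if proband]
    if not relatives_of_affected or prevalence <= 0:
        raise ValueError("need affected probands and a positive prevalence")
    risk = sum(relatives_of_affected) / len(relatives_of_affected)
    return risk / prevalence


# Made-up sibling pairs: 20 of 40 siblings of affected probands are affected.
pairs = ([(True, True)] * 20 + [(True, False)] * 20
         + [(False, True)] * 10 + [(False, False)] * 50)
lambda_r = recurrence_risk_ratio(pairs, prevalence=0.2)
print(lambda_r)  # 2.5
```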
In spite of intensive efforts, understanding of the genetic aspects of familial prostate cancer remains largely incomplete. In a previous microsatellite-based linkage scan of 1233 prostate cancer (PC) families, we identified suggestive evidence for linkage (i.e. LOD≥1.86) at 5q12, 15q11, 17q21, 22q12, and two loci on 8p, with additional regions implicated in subsets of families defined by age at diagnosis, disease aggressiveness, or number of affected members.
In an attempt to replicate these findings and increase linkage resolution, we used the Illumina 6000 SNP linkage panel to perform a genome-wide linkage scan of an independent set of 762 multiplex PC families, collected by 11 ICPCG groups.
Of the regions identified previously, modest evidence of replication was observed only on the short arm of chromosome 8, where HLOD scores of 1.63 and 3.60 were observed in the complete set of families and families with young average age at diagnosis, respectively. The most significant linkage signals found in the complete set of families were observed across a broad, 37 cM interval on 4q13-25, with LOD scores ranging from 2.02 to 2.62, increasing to 4.50 in families with older average age at diagnosis. In families with multiple cases presenting with more aggressive disease, LOD scores over 3.0 were observed at 8q24 in the vicinity of previously identified common PC risk variants, as well as MYC, an important gene in PC biology.
These results will be useful in prioritizing future susceptibility gene discovery efforts in this common cancer.
Genome-wide linkage analysis using microsatellite markers has been successful in the identification of numerous Mendelian and complex disease loci. The recent availability of high-density single-nucleotide polymorphism (SNP) maps provides a potentially more powerful option. Using the simulated and Collaborative Study on the Genetics of Alcoholism (COGA) datasets from the Genetic Analysis Workshop 14 (GAW14), we examined how altering the density of SNP marker sets impacted the overall information content, the power to detect trait loci, and the number of false positive results. For the simulated data we used SNP maps with density of 0.3 cM, 1 cM, 2 cM, and 3 cM. For the COGA data we combined the marker sets from Illumina and Affymetrix to create a map with average density of 0.25 cM and then, using a sub-sample of these markers, created maps with density of 0.3 cM, 0.6 cM, 1 cM, 2 cM, and 3 cM. For each marker set, multipoint linkage analysis using MERLIN was performed for both dominant and recessive traits derived from marker loci. Our results showed that information content increased with increased map density. For the homogeneous, completely penetrant traits we created, there was only a modest difference in ability to detect trait loci. Additionally, as map density increased there was only a slight increase in the number of false positive results when there was linkage disequilibrium (LD) between markers. The presence of LD between markers may have led to an increased number of false positive regions but no clear relationship between regions of high LD and locations of false positive linkage signals was observed.
Covariate-based linkage analyses using a conditional logistic model as implemented in LODPAL can increase the power to detect linkage by minimizing disease heterogeneity. However, each additional covariate analyzed will increase the degrees of freedom for the linkage test, and therefore can also increase the type I error rate. Use of a propensity score (PS) has been shown to consistently improve the statistical power to detect linkage in simulation studies. Defined as the conditional probability of being affected given the observed covariate data, the PS collapses multiple covariates into a single variable. This study evaluates the performance of the PS to detect linkage evidence in a genome-wide linkage analysis of microsatellite marker data from the Collaborative Study on the Genetics of Alcoholism. Analytical methods included nonparametric linkage analysis without covariates, with one covariate at a time including multiple PS definitions, and with multiple covariates simultaneously that corresponded to the PS definitions. Several definitions of the PS were calculated, each with increasing number of covariates up to a maximum of five. To account for the potential inflation in the type I error rates, permutation-based p-values were calculated.
Results suggest that the use of individual covariates may not necessarily increase the power to detect linkage. However, the use of a PS can lead to an increase when compared to using all covariates simultaneously. Specifically, PS3, which combines age at interview, sex, and smoking status, resulted in the greatest number of significant markers identified. All methods consistently identified several chromosomal regions as significant, including loci on chromosomes 2, 6, 7, and 12.
These results suggest that the use of a propensity score can increase the power to detect linkage for a complex disease such as alcoholism, especially when multiple important covariates can be used to predict risk and thereby minimize linkage heterogeneity. However, because the PS is calculated as a conditional probability of being affected, it does require the presence of observed covariate data on both affected and unaffected individuals, which may not always be available in real data sets.
We compared seven different tagging single-nucleotide polymorphism (SNP) programs in 10 regions with varied amounts of linkage disequilibrium (LD) and physical distance. We used the Collaborative Study on the Genetics of Alcoholism dataset, part of the Genetic Analysis Workshop 14. We show that in regions with moderate to strong LD these programs are relatively consistent, despite different parameters and methods. In addition, we compared the selected SNPs in a multipoint linkage analysis for one region with strong LD. As the number of selected SNPs increased, the LOD score, mean information content, and type I error also increased.
The haplotypes of the X chromosome are accessible to direct count in males, whereas the diplotypes of the females may be inferred knowing the haplotype of their sons or fathers. Here, we investigated: 1) the possible large-scale haplotypic structure of the X chromosome in a Caucasian population sample, given the single-nucleotide polymorphism (SNP) maps and genotypes provided by Illumina and Affymetrix for Genetic Analysis Workshop 14, and, 2) the performances of widely used programs in reconstructing haplotypes from population genotypic data, given their known distribution in a sample of unrelated individuals.
All possible unrelated mother-son pairs of Caucasian ancestry (N = 104) were selected from the 143 families of the Collaborative Study on the Genetics of Alcoholism pedigree files, and the diplotypes of the mothers were inferred from the X chromosomes of their sons. The marker set included 313 SNPs at an average density of 0.47 Mb. Linkage disequilibrium between pairs of markers was computed by the parameter D', whereas for measuring multilocus disequilibrium, we developed here an index called D*, and applied it to all possible sliding windows of 5 markers each. Results showed a complex pattern of haplotypic structure, with regions of low linkage disequilibrium separated by regions of high values of D*. The following programs were evaluated for their accuracy in inferring population haplotype frequencies: 1) ARLEQUIN 2.001; 2) PHASE 2.1.1; 3) SNPHAP 1.1; 4) HAPLOBLOCK 1.2; 5) HAPLOTYPER 1.0. Performances were evaluated by Pearson correlation (r) coefficient between the true and the inferred distribution of haplotype frequencies.
The SNP haplotypic structure of the X chromosome is complex, with regions of high haplotype conservation interspersed among regions of higher haplotype diversity. All the tested programs were accurate (r = 1) in reconstructing the distribution of haplotype frequencies in case of high D* values. However, only the program PHASE realized a high correlation coefficient (r > 0.7) in conditions of low linkage disequilibrium.
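Two of the quantities used above, pairwise D' and the Pearson correlation between true and inferred haplotype frequencies, are easy to compute by hand. The sketch below uses the standard definition of Lewontin's D' and returns its absolute value; it does not attempt the multilocus D* index, whose definition is not given in the text. Function names and frequencies are hypothetical.

```python
import math


def d_prime(p_ab, p_a, p_b):
    """Absolute Lewontin's D' for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max


def pearson_r(x, y):
    """Pearson correlation between two equal-length frequency lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values: complete LD, no LD, and a near-perfect reconstruction.
print(d_prime(0.5, 0.5, 0.5))    # A and B always travel together
print(d_prime(0.25, 0.5, 0.5))   # haplotype frequency equals the product
true_freqs = [0.50, 0.30, 0.15, 0.05]
inferred_freqs = [0.48, 0.31, 0.16, 0.05]
r = pearson_r(true_freqs, inferred_freqs)
print(round(r, 3))
```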
Prostate cancer has a strong familial component but uncovering the molecular basis for inherited susceptibility for this disease has been challenging. Recently, a rare, recurrent mutation (G84E) in HOXB13 was reported to be associated with prostate cancer risk. Confirmation and characterization of this finding is necessary to potentially translate this information to the clinic. To examine this finding in a large international sample of prostate cancer families, we genotyped this mutation and 14 other SNPs in or flanking HOXB13 in 2,443 prostate cancer families recruited by the International Consortium for Prostate Cancer Genetics (ICPCG). At least one mutation carrier was found in 112 prostate cancer families (4.6 %), all of European descent. Within carrier families, the G84E mutation was more common in men with a diagnosis of prostate cancer (194 of 382, 51 %) than those without (42 of 137, 30 %), P = 9.9 × 10⁻⁸ [odds ratio 4.42 (95 % confidence interval 2.56–7.64)]. A family-based association test found G84E to be significantly over-transmitted from parents to affected offspring (P = 6.5 × 10⁻⁶). Analysis of markers flanking the G84E mutation indicates that it resides in the same haplotype in 95 % of carriers, consistent with a founder effect. Clinical characteristics of cancers in mutation carriers included features of high-risk disease. These findings demonstrate that the HOXB13 G84E mutation is present in ~5 % of prostate cancer families, predominantly of European descent, and confirm its association with prostate cancer risk. While future studies are needed to more fully define the clinical utility of this observation, this allele and others like it could form the basis for early, targeted screening of men at elevated risk for this common, clinically heterogeneous cancer.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-012-1229-4) contains supplementary material, which is available to authorized users.
In this study, we observed loss of heterozygosity (LOH) in human chromosomal fragment 6q25.1 in sporadic lung cancer patients. LOH was observed in 65% of the 26 lung tumors examined and was narrowed down to a 2.2-Mb region. Single-nucleotide polymorphism (SNP) analysis of genes located within this region identified a candidate gene, termed p34. This gene, also designated as ZC3H12D, C6orf95, FLJ46041, or dJ281H8.1, carries an A/G nonsynonymous SNP at codon 106, which alters the amino acid from lysine to arginine. Nearly 73% of heterozygous lung cancer tissues with LOH and the A/G SNP also exhibited loss of the A allele. In vitro clonogenic and in vivo nude mouse studies showed that overexpression of the A allele exerts tumor suppressor function compared with the G allele. p34 is located within a recently mapped human lung cancer susceptibility locus, and association of the p34 A/G SNP was tested among these families. No significant association between the less frequent G allele and lung cancer susceptibility was found. Our results suggest that p34 may be a novel tumor suppressor gene involved in sporadic lung cancer but it seems not to be the candidate familial lung cancer susceptibility gene linked to chromosomal region 6q23-25.
The use of tyrosine kinase inhibitors (TKI) has yielded great success in treatment of lung adenocarcinomas. However, patients who develop resistance to TKI treatment often acquire a somatic resistance mutation (T790M) located in the catalytic cleft of the epidermal growth factor receptor (EGFR) enzyme. Recently, a report describing EGFR-T790M as a germ-line mutation suggested that this mutation may be associated with inherited susceptibility to lung cancer. Contrary to previous reports, our analysis indicates that the T790M mutation confers increased Y992 and Y1068 phosphorylation levels. In a human bronchial epithelial cell line, overexpression of EGFR-T790M displayed a growth advantage over wild-type (WT) EGFR. We also screened 237 lung cancer family probands, in addition to 45 bronchoalveolar tumors, and found that none of them contained the EGFR-T790M mutation. Our observations show that EGFR-T790M provides a proliferative advantage with respect to WT EGFR and suggest that the enhanced kinase activity of this mutant is the basis for rare cases of inherited susceptibility to lung cancer.
Strong epidemiologic evidence links smoking and cancer. An increased understanding of the molecular biology of tobacco-related cancers could advance progress toward improving smoking cessation and patient management. Knowledge gaps between tobacco addiction, tumorigenesis, and cancer brought an interdisciplinary group of investigators together to discuss “The Biology of Nicotine and Tobacco: Bench to Bedside.” Presentations on the signaling pathways and pathogenesis in tobacco-related cancers, mouse models of addiction, imaging and regulation of nicotinic receptors, the genetic basis for tobacco carcinogenesis and development of lung cancer, and molecular mechanisms of carcinogenesis were heard. Importantly, new opportunities to use molecular biology to identify and abrogate tobacco-mediated carcinogenesis and to identify high-risk individuals were recognized.
Prostate cancer is generally believed to have a strong inherited component, but the search for susceptibility genes has been hindered by the effects of genetic heterogeneity. The recently developed sumLINK and sumLOD statistics are powerful tools for linkage analysis in the presence of heterogeneity.
We performed a secondary analysis of 1233 prostate cancer pedigrees from the International Consortium for Prostate Cancer Genetics (ICPCG) using two novel statistics, the sumLINK and sumLOD. For both statistics, dominant and recessive genetic models were considered. False discovery rate (FDR) analysis was conducted to assess the effects of multiple testing.
Our analysis identified significant linkage evidence at chromosome 22q12, confirming previous findings by the initial conventional analyses of the same ICPCG data. Twelve other regions were identified with genomewide suggestive evidence for linkage. Seven regions (1q23, 5q11, 5q35, 6p21, 8q12, 11q13, 20p11-q11) are near loci previously identified in the initial ICPCG pooled data analysis or the subset of aggressive prostate cancer (PC) pedigrees. Three other regions (1p12, 8p23, 19q13) confirm loci reported by others, and two (2p24, 6q27) are novel susceptibility loci. FDR testing indicates that over 70% of these results are likely true positive findings. Statistical recombinant mapping narrowed regions to an average of 9 cM.
Our results represent genomic regions with the greatest consistency of positive linkage evidence across a very large collection of high-risk prostate cancer pedigrees using new statistical tests that deal powerfully with heterogeneity. These regions are excellent candidates for further study to identify prostate cancer predisposition genes.
Using the Genetic Analysis Workshop 13 simulated data set, we compared the technique of importance sampling to several other methods designed to adjust p-values for multiple testing: the Bonferroni correction, the method proposed by Feingold et al., and naïve Monte Carlo simulation. We performed affected sib-pair linkage analysis for each of the 100 replicates for each of five binary traits and adjusted the derived p-values using each of the correction methods. The type I error rates for each correction method and the ability of each of the methods to detect loci known to influence trait values were compared. All of the methods considered were conservative with respect to type I error, especially the Bonferroni method. The ability of these methods to detect trait loci was also low. However, this may be partially due to a limitation inherent in our binary trait definitions.
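Two of the adjustment methods compared above can be sketched compactly. The example is illustrative and makes its own assumptions: the Monte Carlo version adjusts an observed statistic against the maximum statistic from each simulated null genome scan and adds one to numerator and denominator, which may differ from the exact scheme used in the study. Function names and values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def monte_carlo_adjusted_p(observed_stat, null_max_stats):
    """Share of simulated null scans whose maximum statistic is at least
    as large as the observed statistic (observed scan counted once)."""
    exceed = sum(1 for s in null_max_stats if s >= observed_stat)
    return (exceed + 1) / (len(null_max_stats) + 1)


# Made-up pointwise p-values from three tests.
adjusted = bonferroni([0.001, 0.02, 0.5])
print(adjusted)

# Made-up genome-wide maxima from nine null simulations.
null_maxima = [1.2, 2.5, 3.1, 0.8, 2.9, 3.4, 1.9, 2.2, 2.7]
p_adj = monte_carlo_adjusted_p(3.0, null_maxima)
print(p_adj)  # 0.3
```

Bonferroni treats the tests as independent, which is one reason it tends to be conservative for correlated linkage statistics along a chromosome.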
Recently, a nonsense alteration, Trp149Stop, in the ARLTS1 gene was found more frequently in familial cancer cases than in sporadic cancer patients and healthy controls. Here, the role of Trp149Stop and other ARLTS1 germline variants in breast, prostate, and colorectal cancer risk was evaluated. The whole gene was screened for germline alterations in 855 familial cancer patients. The five observed variants were further screened in 1169 non-familial cancer patients as well as in 809 healthy population controls. Trp149Stop was found at low frequencies (0.5–1.2%) in all patient subgroups vs. 1.6% in controls, and the mutant allele did not co-segregate with disease status in families with multiple affected individuals. The CC genotype of the Cys148Arg variant was slightly more common among both familial and sporadic breast (OR=1.48, 95% CI 1.16–1.87, p=0.001) and prostate cancer patients (OR=1.50, 95% CI 1.13–1.99, p=0.005) than among controls. A novel ARLTS1 variant, Gly65Val, was found at a higher frequency among familial prostate cancer patients (8/164, 4.9%) than in controls (13/809, 1.6%; OR=3.14, 95% CI 1.28–7.70, p=0.016). However, after adjusting for multiple testing, none of these results remained significant. No association was found between any of the variants and colorectal cancer risk. Our results suggest that Trp149Stop is not a predisposition allele in breast, prostate, or colorectal cancer in the Finnish population, and, while the Gly65Val variant may increase familial prostate cancer risk and the Cys148Arg change may affect both breast and prostate cancer risk, the evidence is not strong in these data.
ARLTS1; ARL11; prostate cancer; breast cancer; colorectal cancer
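As a worked check on the Gly65Val figures above (8/164 familial prostate cancer patients vs. 13/809 controls), the sketch below computes an odds ratio from the 2×2 carrier counts together with a Wald 95% confidence interval on the log scale. The abstract does not state how its interval was derived, so the Wald method is an assumption here, and the function name is hypothetical.

```python
import math


def odds_ratio_wald(case_carriers, case_total, ctrl_carriers, ctrl_total,
                    z=1.959964):
    """Odds ratio and Wald confidence interval from carrier counts.

    ASSUMPTION: a Wald interval on log(OR); the source does not say which
    interval method was used. z = 1.959964 gives a 95% interval.
    """
    a = case_carriers                 # carriers among cases
    b = case_total - case_carriers    # non-carriers among cases
    c = ctrl_carriers                 # carriers among controls
    d = ctrl_total - ctrl_carriers    # non-carriers among controls

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    or_, lo, hi = odds_ratio_wald(8, 164, 13, 809)
    print(f"OR={or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

Under this assumption the counts give values that agree with the reported OR of 3.14 and interval of 1.28–7.70 to two decimals. The wide interval reflects the small number of carriers, which is consistent with the result not surviving adjustment for multiple testing.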
Prostate cancer is a genetically complex disease with locus and disease heterogeneity. The RNASEL gene and HPCX locus have been implicated in hereditary prostate cancer; however, their contributions to sporadic forms of this malignancy remain uncertain.
Associations of prostate cancer with two variants in the RNASEL gene (a founder mutation, 471delAAAG, and a non-synonymous SNP, rs486907), and with five microsatellite markers in the HPCX locus, were examined in 979 cases and 1,251 controls of Ashkenazi Jewish descent. Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated using logistic regression models.
There was an inverse association between RNASEL rs486907 and prostate cancer in younger men (<65 years) and in those with a first-degree relative with prostate cancer; men with the AA genotype had ORs of 0.64 and 0.47 (both p<0.05), respectively, in comparison to men with the GG genotype. Within the HPCX region, there were positive associations for allele 135 of the bG82i1.1 marker (OR=1.77, p=0.01) and allele 188 of DXS1205 (OR=1.65, p=0.02). In addition, allele 248 of marker D33 was inversely associated (OR=0.65, p=0.05) with Gleason score ≥7 tumors.
The results suggest that variants in RNASEL contribute to susceptibility to early-onset and familial forms of prostate cancer, whereas HPCX variants are associated with prostate cancer risk and tumor aggressiveness. The observation that a mutation predicted to completely inactivate the RNASEL protein was not associated with prostate cancer, whereas a missense variant was, suggests that the effect is due to partial inactivation of the protein and/or acquisition of a new protein activity.