1.  Genetic heterogeneity in Finnish hereditary prostate cancer using ordered subset analysis 
Prostate cancer (PrCa) is the most common male cancer in developed countries and the second most common cause of cancer death after lung cancer. We recently reported a genome-wide linkage scan in 69 Finnish hereditary PrCa (HPC) families, which replicated the HPC9 locus on 17q21-q22 and identified a locus on 2q37. The aim of this study was to identify other loci linked to HPC. Here we used ordered subset analysis (OSA), conditioned on nonparametric linkage to these loci, to detect other loci linked to HPC in subsets of families, but not the overall sample. We analyzed the families based on their evidence for linkage to chromosome 2, chromosome 17 and a maximum score using the strongest evidence of linkage from either of the two loci. Significant linkage to a 5-cM linkage interval with a peak OSA nonparametric allele-sharing LOD score of 4.876 on Xq26.3-q27 (ΔLOD=3.193, empirical P=0.009) was observed in a subset of 41 families weakly linked to 2q37, overlapping the HPCX1 locus. Two peaks that were novel to the analysis combining linkage evidence from both primary loci were identified: 18q12.1-q12.2 (OSA LOD=2.541, ΔLOD=1.651, P=0.03) and 22q11.1-q11.21 (OSA LOD=2.395, ΔLOD=2.36, P=0.006), which is close to HPC6. Using OSA allows us to find additional loci linked to HPC in subsets of families, and underlines the complex genetic heterogeneity of HPC even in highly aggregated families.
PMCID: PMC3598326  PMID: 22948022
linkage analysis; ordered subset analysis; prostate cancer
2.  Risk estimation using probability machines 
BioData Mining  2014;7:2.
Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios.
We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented.
The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a “risk machine”, will share properties from the statistical machine that it is derived from.
PMCID: PMC4015350  PMID: 24581306
Consistent nonparametric regression; Logistic regression; Probability machine; Odds ratio; Counterfactuals; Interactions
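The counterfactual reading of an effect size described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract uses random forest probability machines, whereas the sketch swaps in a much simpler nonparametric estimator (observed outcome frequency per predictor pattern). The data, the function names, and the choice of a risk difference as the effect measure are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_cell_frequencies(rows, outcomes):
    """Return a predictor of P(Y=1 | x) from observed frequencies per predictor pattern.

    Stand-in for a learning machine such as a random forest (assumption).
    """
    counts = defaultdict(lambda: [0, 0])  # pattern -> [events, subjects]
    for x, y in zip(rows, outcomes):
        cell = counts[tuple(x)]
        cell[0] += y
        cell[1] += 1

    def predict(x):
        events, subjects = counts.get(tuple(x), (0, 0))
        if subjects == 0:
            raise ValueError(f"no training data for pattern {tuple(x)}")
        return events / subjects

    return predict


def counterfactual_risk_difference(predict, rows, j):
    """Average change in predicted risk when predictor j is set to 1 versus 0."""
    total = 0.0
    for x in rows:
        exposed = list(x)
        unexposed = list(x)
        exposed[j], unexposed[j] = 1, 0
        total += predict(exposed) - predict(unexposed)
    return total / len(rows)


# Hypothetical data: (exposure, sex) -> (events, subjects)
design = {(0, 0): (2, 10), (1, 0): (4, 10), (0, 1): (3, 10), (1, 1): (7, 10)}
rows, outcomes = [], []
for pattern, (events, subjects) in design.items():
    for i in range(subjects):
        rows.append(pattern)
        outcomes.append(1 if i < events else 0)

predict = fit_cell_frequencies(rows, outcomes)
effect = counterfactual_risk_difference(predict, rows, j=0)
print(round(effect, 3))  # 0.3
```

No model structure is specified beyond the list of predictors, so the exposure effect is allowed to differ between the two strata (0.2 and 0.4 here) and the reported value is their average over the sample.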
3.  Matrix metalloproteinases and educational attainment in refractive error: evidence of gene-environment interactions in the AREDS study 
Ophthalmology  2012;120(2):298-305.
A previous study of Old Order Amish families has shown association of ocular refraction with markers proximal to matrix metalloproteinase (MMP) genes MMP1 and MMP10 and intragenic to MMP2. We conducted a candidate gene replication study of association between refraction and single nucleotide polymorphisms (SNPs) within these genomic regions.
Candidate gene genetic association study.
2,000 participants drawn from the Age Related Eye Disease Study (AREDS) were chosen for genotyping. After quality control filtering, 1912 individuals were available for analysis.
Microarray genotyping was performed using the HumanOmni 2.5 bead array. SNPs originally typed in the previous Amish association study were extracted for analysis. In addition, haplotype tagging SNPs were genotyped using TaqMan assays. Quantitative trait association analyses of mean spherical equivalent refraction (MSE) were performed on 30 markers using linear regression models and an additive genetic risk model, while adjusting for age, sex, education, and population substructure. Post-hoc analyses were performed after stratifying on a dichotomous education variable. Pointwise (P-emp) and multiple-test study-wise (P-multi) significance levels were calculated empirically through permutation.
Main outcome measures
MSE was used as a quantitative measure of ocular refraction.
The mean age and ocular refraction were 68 years (SD=4.7) and +0.55 D (SD=2.14), respectively. Pointwise statistical significance was obtained for rs1939008 (P-emp=0.0326). No SNP attained statistical significance after correcting for multiple testing. In stratified analyses, multiple SNPs reached pointwise significance in the lower-education group: 2 of these were statistically significant after multiple testing correction. The two highest-ranking SNPs in Amish families (rs1939008 and rs9928731) showed pointwise P-emp<0.01 in the lower-education stratum of AREDS participants.
We show suggestive evidence of replication of an association signal for ocular refraction to a marker between MMP1 and MMP10. We also provide evidence of a gene-environment interaction between previously-reported markers and education on refractive error. Variants in MMP1- MMP10 and MMP2 regions appear to affect population variation in ocular refraction in environmental conditions less favorable for myopia development.
PMCID: PMC3563738  PMID: 23098370
refraction; refractive error; myopia; association study; gene-environment interaction; matrix metalloproteinase; MMP; genetics
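The permutation-based pointwise significance level (P-emp) mentioned in this abstract can be illustrated as follows. This is a rough sketch under stated assumptions: it uses an unadjusted least-squares slope of the trait on an additive genotype code, whereas the study adjusted for age, sex, education, and population substructure. The data, the function names, the number of permutations, and the seed are hypothetical.

```python
import random


def slope(genotypes, trait):
    """Least-squares slope of trait on additive genotype (0, 1, or 2 allele copies)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_t = sum(trait) / n
    sxy = sum((g - mean_g) * (t - mean_t) for g, t in zip(genotypes, trait))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    return sxy / sxx


def empirical_p(genotypes, trait, n_perm=999, seed=0):
    """Share of trait permutations whose |slope| is at least the observed |slope|."""
    rng = random.Random(seed)
    observed = abs(slope(genotypes, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-trait relationship
        if abs(slope(genotypes, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed data as one permutation, so p is never 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical marker with a strong additive effect on the trait
genotypes = [0] * 6 + [1] * 6 + [2] * 6
trait = [g + 0.01 * i for i, g in enumerate(genotypes)]
p_emp = empirical_p(genotypes, trait)
print(p_emp)
```

A study-wise level (P-multi) could be obtained in the same loop by recording, for each permutation, the largest statistic across all markers and comparing each observed statistic against that distribution.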
4.  Linkage Analysis in the Next-Generation Sequencing Era 
Human Heredity  2011;72(4):228-236.
Linkage analysis was developed to detect excess co-segregation of the putative alleles underlying a phenotype with the alleles at a marker locus in family data. Many different variations of this analysis and corresponding study design have been developed to detect this co-segregation. Linkage studies have been shown to have high power to detect loci that have alleles (or variants) with a large effect size, i.e. alleles that make large contributions to the risk of a disease or to the variation of a quantitative trait. However, alleles with a large effect size tend to be rare in the population. In contrast, association studies are designed to have high power to detect common alleles which tend to have a small effect size for most diseases or traits. Although genome-wide association studies have been successful in detecting many new loci with common alleles of small effect for many complex traits, these common variants often do not explain a large proportion of disease risk or variation of the trait. In the past, linkage studies were successful in detecting regions of the genome that were likely to harbor rare variants with large effect for many simple Mendelian diseases and for many complex traits. However, identifying the actual sequence variant(s) responsible for these linkage signals was challenging because of difficulties in sequencing the large regions implicated by each linkage peak. Current ‘next-generation’ DNA sequencing techniques have made it economically feasible to sequence all exons or the whole genomes of a reasonably large number of individuals. Studies have shown that rare variants are quite common in the general population, and it is now possible to combine these new DNA sequencing methods with linkage studies to identify rare causal variants with a large effect size. A brief review of linkage methods is presented here with examples of their relevance and usefulness for the interpretation of whole-exome and whole-genome sequence data.
PMCID: PMC3267991  PMID: 22189465
Linkage; Genetics; DNA sequence; Whole-genome sequence; Whole-exome sequence
5.  Genome-Wide Association Study of Intracranial Aneurysms Confirms Role of Anril and SOX17 in Disease Risk 
Genome-wide association studies have identified novel genetic factors that contribute to intracranial aneurysm (IA) susceptibility. We sought to confirm previously reported loci, to identify novel risk factors, and to evaluate the contribution of these factors to familial and sporadic IA.
We utilized 2 complementary samples, one recruited on the basis of a dense family history of IA (discovery sample 1: 388 IA cases and 397 controls) and the other without regard to family history (discovery sample 2: 1095 IA cases and 1286 controls). Imputation was used to generate a common set of single nucleotide polymorphisms (SNP) across samples, and a logistic regression model was used to test for association in each sample. Results from each sample were then combined in a meta-analysis.
There was only modest overlap in the association results obtained in the 2 samples. In neither sample did results reach genome-wide significance. However, the meta-analysis yielded genome-wide significance for a SNP on chromosome 9p (CDKN2BAS; rs6475606; P=3.6×10⁻⁸) and provided further evidence to support the previously reported association of IA with a SNP in SOX17 on chromosome 8q (rs1072737; P=8.7×10⁻⁵). Analyses suggest that the effect of smoking acts multiplicatively with the SNP genotype, and smoking has a greater effect on risk than SNP genotype.
In addition to replicating several previously reported loci, we provide further evidence that the association on chromosome 9p is attributable to variants in CDKN2BAS (also known as ANRIL, an antisense noncoding RNA).
PMCID: PMC3752852  PMID: 22961961
genome-wide association study; intracranial aneurysm
6.  Regression and Data Mining Methods for Analyses of Multiple Rare Variants in the Genetic Analysis Workshop 17 Mini-Exome Data 
Genetic Epidemiology  2011;35(Suppl 1):S92-100.
Group 14 of Genetic Analysis Workshop 17 examined several issues related to analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we conclude that power is (as expected) dependent on locus-specific heritability or contribution to disease risk, large samples will be required to detect rare causal variants with small effect sizes, extreme phenotype sampling designs may increase power for smaller laboratory costs, methods that allow joint analysis of multiple variants per gene or pathway are more powerful in general than analyses of individual rare variants, population-specific analyses can be optimal when different subpopulations harbor private causal mutations, and machine learning methods may be useful for selecting subsets of predictors for follow-up in the presence of extreme locus heterogeneity and large numbers of potential predictors.
PMCID: PMC3360949  PMID: 22128066
rare variants; LASSO; machine learning; random forests; logic regression; binary trees; Poisson regression; ISIS; classification trees; meta-analysis; extreme sampling
7.  Brief Review of Regression-Based and Machine Learning Methods in Genetic Epidemiology: The Genetic Analysis Workshop 17 Experience 
Genetic Epidemiology  2011;35(Suppl 1):S5-11.
Genetic Analysis Workshop 17 provided common and rare genetic variants from exome sequencing data and simulated binary and quantitative traits in 200 replicates. We provide a brief review of the machine learning and regression-based methods used in the analyses of these data. Several regression and machine learning methods were used to address different problems inherent in the analyses of these data, which are high-dimension, low-sample-size data typical of many genetic association studies. Unsupervised methods, such as cluster analysis, were used for data segmentation and subset selection. Supervised learning methods, which include regression-based methods (e.g., generalized linear models, logic regression, and regularized regression) and tree-based methods (e.g., decision trees and random forests), were used for variable selection (selecting genetic and clinical features most associated with or predictive of outcome) and prediction (developing models using common and rare genetic variants to accurately predict outcome), with the outcome being case-control status or quantitative trait value. We include a discussion of cross-validation for model selection and assessment and a description of available software resources for these methods.
PMCID: PMC3345521  PMID: 22128059
unsupervised learning; supervised learning; cluster analysis; logistic regression; Poisson regression; logic regression; LASSO; ridge regression; decision trees; random forests; cross-validation; software
8.  Familial aggregation of myopia in the Tehran eye study: estimation of the sibling and parent–offspring recurrence risk ratios 
The British Journal of Ophthalmology  2007;91(11):1440-1444.
To determine the potential influence of genetic factors on the prevalence of myopia in Tehran.
Of 6497 citizens of Tehran sampled from 160 clusters using stratified random cluster sampling, 4565 (70.3%) participated in the study and were referred to a clinic for an extensive eye examination and interview. These were from 1259 nuclear families with the average size of 3.6. Refraction data obtained from 3321 participants aged 16 years and over are presented. Three definitions of myopia, as the spherical equivalent of −0.5, −1, and −2 diopters or less, were used. Familial aggregation of myopia was evaluated with odds ratios and recurrence risk ratios (λR) using a multiple logistic regression with generalised estimating equations (GEE), adjusted for age, sex, height, and education.
Multivariate analyses showed a strong familial aggregation of myopia among siblings (λR ranging from 2.09 to 3.86) and parent–offspring pairs (λR from 1.82 to 3.81) adjusted for age, sex, height, and education. The aggregation increased with higher myopia thresholds and with the use of cycloplegic refraction. The odds ratios for spouse pairs were not significantly different from 1.0. The association of myopia with sex, height, and education (and not age) remained significant in the final GEE2 model.
The findings indicate a relatively high degree of familial aggregation of myopia in the Tehran population, independent of age, sex, height, and education. This residual aggregation may be a result of heredity or of an unmeasured common environmental effect.
PMCID: PMC2095425  PMID: 17494955
familial myopia; myopia; refractive error; recurrence risk
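As a point of reference for the recurrence risk ratio (λR) reported in this abstract, the crude form of the quantity is the risk to a given class of relatives of affected individuals divided by the population prevalence. The sketch below shows only that unadjusted ratio. The study estimated λR from a logistic regression with generalised estimating equations adjusted for covariates, which is not reproduced here, and the counts and prevalence used are hypothetical.

```python
def recurrence_risk_ratio(affected_relatives, total_relatives, prevalence):
    """Crude lambda_R: risk in relatives of affected individuals / population prevalence."""
    if total_relatives <= 0:
        raise ValueError("total_relatives must be positive")
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    risk_in_relatives = affected_relatives / total_relatives
    return risk_in_relatives / prevalence


# Hypothetical numbers: 60 of 200 siblings of myopic participants are myopic,
# and the assumed population prevalence is 12%.
lambda_sib = recurrence_risk_ratio(60, 200, 0.12)
print(round(lambda_sib, 2))  # 2.5
```

A value of 1 means relatives of affected individuals carry no excess risk, which is the pattern the abstract reports for spouse pairs on the odds ratio scale.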
9.  Chromosomes 4 and 8 Implicated in a Genome Wide SNP Linkage Scan of 762 Prostate Cancer Families Collected by the ICPCG 
The Prostate  2011;72(4):410-426.
In spite of intensive efforts, understanding of the genetic aspects of familial prostate cancer remains largely incomplete. In a previous microsatellite-based linkage scan of 1233 prostate cancer (PC) families, we identified suggestive evidence for linkage (i.e. LOD≥1.86) at 5q12, 15q11, 17q21, 22q12, and two loci on 8p, with additional regions implicated in subsets of families defined by age at diagnosis, disease aggressiveness, or number of affected members.
In an attempt to replicate these findings and increase linkage resolution, we used the Illumina 6000 SNP linkage panel to perform a genome-wide linkage scan of an independent set of 762 multiplex PC families, collected by 11 ICPCG groups.
Of the regions identified previously, modest evidence of replication was observed only on the short arm of chromosome 8, where HLOD scores of 1.63 and 3.60 were observed in the complete set of families and families with young average age at diagnosis, respectively. The most significant linkage signals found in the complete set of families were observed across a broad, 37 cM interval on 4q13-25, with LOD scores ranging from 2.02 to 2.62, increasing to 4.50 in families with older average age at diagnosis. In families with multiple cases presenting with more aggressive disease, LOD scores over 3.0 were observed at 8q24 in the vicinity of previously identified common PC risk variants, as well as MYC, an important gene in PC biology.
These results will be useful in prioritizing future susceptibility gene discovery efforts in this common cancer.
PMCID: PMC3568777  PMID: 21748754
10.  Investigation of altering single-nucleotide polymorphism density on the power to detect trait loci and frequency of false positive in nonparametric linkage analyses of qualitative traits 
BMC Genetics  2005;6(Suppl 1):S20.
Genome-wide linkage analysis using microsatellite markers has been successful in the identification of numerous Mendelian and complex disease loci. The recent availability of high-density single-nucleotide polymorphism (SNP) maps provides a potentially more powerful option. Using the simulated and Collaborative Study on the Genetics of Alcoholism (COGA) datasets from the Genetic Analysis Workshop 14 (GAW14), we examined how altering the density of SNP marker sets impacted the overall information content, the power to detect trait loci, and the number of false positive results. For the simulated data we used SNP maps with density of 0.3 cM, 1 cM, 2 cM, and 3 cM. For the COGA data we combined the marker sets from Illumina and Affymetrix to create a map with average density of 0.25 cM and then, using a sub-sample of these markers, created maps with density of 0.3 cM, 0.6 cM, 1 cM, 2 cM, and 3 cM. For each marker set, multipoint linkage analysis using MERLIN was performed for both dominant and recessive traits derived from marker loci. Our results showed that information content increased with increased map density. For the homogeneous, completely penetrant traits we created, there was only a modest difference in ability to detect trait loci. Additionally, as map density increased there was only a slight increase in the number of false positive results when there was linkage disequilibrium (LD) between markers. The presence of LD between markers may have led to an increased number of false positive regions, but no clear relationship between regions of high LD and locations of false positive linkage signals was observed.
PMCID: PMC1866766  PMID: 16451629
11.  Application of the propensity score in a covariate-based linkage analysis of the Collaborative Study on the Genetics of Alcoholism 
BMC Genetics  2005;6(Suppl 1):S33.
Covariate-based linkage analyses using a conditional logistic model as implemented in LODPAL can increase the power to detect linkage by minimizing disease heterogeneity. However, each additional covariate analyzed will increase the degrees of freedom for the linkage test, and therefore can also increase the type I error rate. Use of a propensity score (PS) has been shown to improve consistently the statistical power to detect linkage in simulation studies. Defined as the conditional probability of being affected given the observed covariate data, the PS collapses multiple covariates into a single variable. This study evaluates the performance of the PS to detect linkage evidence in a genome-wide linkage analysis of microsatellite marker data from the Collaborative Study on the Genetics of Alcoholism. Analytical methods included nonparametric linkage analysis without covariates, with one covariate at a time including multiple PS definitions, and with multiple covariates simultaneously that corresponded to the PS definitions. Several definitions of the PS were calculated, each with increasing number of covariates up to a maximum of five. To account for the potential inflation in the type I error rates, permutation based p-values were calculated.
Results suggest that the use of individual covariates may not necessarily increase the power to detect linkage. However the use of a PS can lead to an increase when compared to using all covariates simultaneously. Specifically, PS3, which combines age at interview, sex, and smoking status, resulted in the greatest number of significant markers identified. All methods consistently identified several chromosomal regions as significant, including loci on chromosome 2, 6, 7, and 12.
These results suggest that the use of a propensity score can increase the power to detect linkage for a complex disease such as alcoholism, especially when multiple important covariates can be used to predict risk and thereby minimize linkage heterogeneity. However, because the PS is calculated as a conditional probability of being affected, it does require the presence of observed covariate data on both affected and unaffected individuals, which may not always be available in real data sets.
PMCID: PMC1866752  PMID: 16451643
12.  Identification of tag single-nucleotide polymorphisms in regions with varying linkage disequilibrium 
BMC Genetics  2005;6(Suppl 1):S73.
We compared seven different tagging single-nucleotide polymorphism (SNP) programs in 10 regions with varied amounts of linkage disequilibrium (LD) and physical distance. We used the Collaborative Study on the Genetics of Alcoholism dataset, part of the Genetic Analysis Workshop 14. We show that in regions with moderate to strong LD these programs are relatively consistent, despite different parameters and methods. In addition, we compared the selected SNPs in a multipoint linkage analysis for one region with strong LD. As the number of selected SNPs increased, the LOD score, mean information content, and type I error also increased.
PMCID: PMC1866708  PMID: 16451687
13.  Haplotypic structure of the X chromosome in the COGA population sample and the quality of its reconstruction by extant software packages 
BMC Genetics  2005;6(Suppl 1):S77.
The haplotypes of the X chromosome are accessible to direct count in males, whereas the diplotypes of the females may be inferred knowing the haplotype of their sons or fathers. Here, we investigated: 1) the possible large-scale haplotypic structure of the X chromosome in a Caucasian population sample, given the single-nucleotide polymorphism (SNP) maps and genotypes provided by Illumina and Affymetrix for Genetic Analysis Workshop 14, and 2) the performances of widely used programs in reconstructing haplotypes from population genotypic data, given their known distribution in a sample of unrelated individuals.
All possible unrelated mother-son pairs of Caucasian ancestry (N = 104) were selected from the 143 families of the Collaborative Study on the Genetics of Alcoholism pedigree files, and the diplotypes of the mothers were inferred from the X chromosomes of their sons. The marker set included 313 SNPs at an average density of 0.47 Mb. Linkage disequilibrium between pairs of markers was computed by the parameter D', whereas for measuring multilocus disequilibrium, we developed here an index called D*, and applied it to all possible sliding windows of 5 markers each. Results showed a complex pattern of haplotypic structure, with regions of low linkage disequilibrium separated by regions of high values of D*. The following programs were evaluated for their accuracy in inferring population haplotype frequencies: 1) ARLEQUIN 2.001; 2) PHASE 2.1.1; 3) SNPHAP 1.1; 4) HAPLOBLOCK 1.2; 5) HAPLOTYPER 1.0. Performances were evaluated by Pearson correlation (r) coefficient between the true and the inferred distribution of haplotype frequencies.
The SNP haplotypic structure of the X chromosome is complex, with regions of high haplotype conservation interspersed among regions of higher haplotype diversity. All the tested programs were accurate (r = 1) in reconstructing the distribution of haplotype frequencies in case of high D* values. However, only the program PHASE realized a high correlation coefficient (r > 0.7) in conditions of low linkage disequilibrium.
PMCID: PMC1866704  PMID: 16451691
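Because male X-chromosome haplotypes can be counted directly, the pairwise disequilibrium parameter D' used in this abstract can be computed from haplotype counts without any phase inference. The sketch below uses the standard definition of D' for two biallelic markers and returns its absolute value. The haplotype counts are hypothetical, and the multilocus index D* developed in the study is not reproduced because the abstract does not define it.

```python
def d_prime(haplotypes):
    """|D'| between two biallelic markers from directly counted haplotypes.

    Each haplotype is a pair (a, b), where 1 marks the reference allele
    at the first and second marker respectively and 0 the other allele.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("a marker is monomorphic; D' is undefined")
    return abs(d) / d_max


# Hypothetical counts of 100 male X-chromosome haplotypes at two SNPs
sample = [(1, 1)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 10 + [(0, 0)] * 40
print(round(d_prime(sample), 3))  # 0.6
```

Scaling D by its maximum attainable value given the allele frequencies is what makes D' comparable across marker pairs with different allele frequencies.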
15.  HOXB13 is a susceptibility gene for prostate cancer: results from the International Consortium for Prostate Cancer Genetics (ICPCG) 
Human Genetics  2012;132(1):5-14.
Prostate cancer has a strong familial component but uncovering the molecular basis for inherited susceptibility for this disease has been challenging. Recently, a rare, recurrent mutation (G84E) in HOXB13 was reported to be associated with prostate cancer risk. Confirmation and characterization of this finding is necessary to potentially translate this information to the clinic. To examine this finding in a large international sample of prostate cancer families, we genotyped this mutation and 14 other SNPs in or flanking HOXB13 in 2,443 prostate cancer families recruited by the International Consortium for Prostate Cancer Genetics (ICPCG). At least one mutation carrier was found in 112 prostate cancer families (4.6%), all of European descent. Within carrier families, the G84E mutation was more common in men with a diagnosis of prostate cancer (194 of 382, 51%) than those without (42 of 137, 30%), P = 9.9 × 10⁻⁸ [odds ratio 4.42 (95% confidence interval 2.56–7.64)]. A family-based association test found G84E to be significantly over-transmitted from parents to affected offspring (P = 6.5 × 10⁻⁶). Analysis of markers flanking the G84E mutation indicates that it resides in the same haplotype in 95% of carriers, consistent with a founder effect. Clinical characteristics of cancers in mutation carriers included features of high-risk disease. These findings demonstrate that the HOXB13 G84E mutation is present in ~5% of prostate cancer families, predominantly of European descent, and confirm its association with prostate cancer risk. While future studies are needed to more fully define the clinical utility of this observation, this allele and others like it could form the basis for early, targeted screening of men at elevated risk for this common, clinically heterogeneous cancer.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-012-1229-4) contains supplementary material, which is available to authorized users.
PMCID: PMC3535370  PMID: 23064873
16.  Identification of a Novel Tumor Suppressor Gene p34 on Human Chromosome 6q25.1 
Cancer research  2007;67(1):93-99.
In this study, we observed loss of heterozygosity (LOH) in human chromosomal fragment 6q25.1 in sporadic lung cancer patients. LOH was observed in 65% of the 26 lung tumors examined and was narrowed down to a 2.2-Mb region. Single-nucleotide polymorphism (SNP) analysis of genes located within this region identified a candidate gene, termed p34. This gene, also designated as ZC3H12D, C6orf95, FLJ46041, or dJ281H8.1, carries an A/G nonsynonymous SNP at codon 106, which alters the amino acid from lysine to arginine. Nearly 73% of heterozygous lung cancer tissues with LOH and the A/G SNP also exhibited loss of the A allele. In vitro clonogenic and in vivo nude mouse studies showed that overexpression of the A allele exerts tumor suppressor function compared with the G allele. p34 is located within a recently mapped human lung cancer susceptibility locus, and association of the p34 A/G SNP was tested among these families. No significant association between the less frequent G allele and lung cancer susceptibility was found. Our results suggest that p34 may be a novel tumor suppressor gene involved in sporadic lung cancer but it seems not to be the candidate familial lung cancer susceptibility gene linked to chromosomal region 6q23-25.
PMCID: PMC3461257  PMID: 17210687
17.  EGFR-T790M Is a Rare Lung Cancer Susceptibility Allele with Enhanced Kinase Activity 
Cancer research  2007;67(10):4665-4670.
The use of tyrosine kinase inhibitors (TKI) has yielded great success in treatment of lung adenocarcinomas. However, patients who develop resistance to TKI treatment often acquire a somatic resistance mutation (T790M) located in the catalytic cleft of the epidermal growth factor receptor (EGFR) enzyme. Recently, a report describing EGFR-T790M as a germ-line mutation suggested that this mutation may be associated with inherited susceptibility to lung cancer. Contrary to previous reports, our analysis indicates that the T790M mutation confers increased Y992 and Y1068 phosphorylation levels. In a human bronchial epithelial cell line, overexpression of EGFR-T790M displayed a growth advantage over wild-type (WT) EGFR. We also screened 237 lung cancer family probands, in addition to 45 bronchoalveolar tumors, and found that none of them contained the EGFR-T790M mutation. Our observations show that EGFR-T790M provides a proliferative advantage with respect to WT EGFR and suggest that the enhanced kinase activity of this mutant is the basis for rare cases of inherited susceptibility to lung cancer.
PMCID: PMC3460269  PMID: 17510392
18.  The Biology of Tobacco and Nicotine: Bench to Bedside 
Strong epidemiologic evidence links smoking and cancer. An increased understanding of the molecular biology of tobacco-related cancers could advance progress toward improving smoking cessation and patient management. Knowledge gaps between tobacco addiction, tumorigenesis, and cancer brought an interdisciplinary group of investigators together to discuss “The Biology of Nicotine and Tobacco: Bench to Bedside.” Presentations on the signaling pathways and pathogenesis in tobacco-related cancers, mouse models of addiction, imaging and regulation of nicotinic receptors, the genetic basis for tobacco carcinogenesis and development of lung cancer, and molecular mechanisms of carcinogenesis were heard. Importantly, new opportunities to use molecular biology to identify and abrogate tobacco-mediated carcinogenesis and to identify high-risk individuals were recognized.
PMCID: PMC3459058  PMID: 15824140
19.  Genome-wide linkage analysis of 1233 prostate cancer pedigrees from the International Consortium for Prostate Cancer Genetics using novel sumLINK and sumLOD analyses 
The Prostate  2010;70(7):735-744.
Prostate cancer is generally believed to have a strong inherited component, but the search for susceptibility genes has been hindered by the effects of genetic heterogeneity. The recently developed sumLINK and sumLOD statistics are powerful tools for linkage analysis in the presence of heterogeneity.
We performed a secondary analysis of 1233 prostate cancer pedigrees from the International Consortium for Prostate Cancer Genetics (ICPCG) using two novel statistics, the sumLINK and sumLOD. For both statistics, dominant and recessive genetic models were considered. False discovery rate (FDR) analysis was conducted to assess the effects of multiple testing.
Our analysis identified significant linkage evidence at chromosome 22q12, confirming previous findings by the initial conventional analyses of the same ICPCG data. Twelve other regions were identified with genomewide suggestive evidence for linkage. Seven regions (1q23, 5q11, 5q35, 6p21, 8q12, 11q13, 20p11-q11) are near loci previously identified in the initial ICPCG pooled data analysis or the subset of aggressive prostate cancer (PC) pedigrees. Three other regions (1p12, 8p23, 19q13) confirm loci reported by others, and two (2p24, 6q27) are novel susceptibility loci. FDR testing indicates that over 70% of these results are likely true positive findings. Statistical recombinant mapping narrowed regions to an average of 9 cM.
Our results represent genomic regions with the greatest consistency of positive linkage evidence across a very large collection of high-risk prostate cancer pedigrees using new statistical tests that deal powerfully with heterogeneity. These regions are excellent candidates for further study to identify prostate cancer predisposition genes.
PMCID: PMC3428045  PMID: 20333727
20.  Importance sampling method of correction for multiple testing in affected sib-pair linkage analysis 
BMC Genetics  2003;4(Suppl 1):S73.
Using the Genetic Analysis Workshop 13 simulated data set, we compared the technique of importance sampling to several other methods designed to adjust p-values for multiple testing: the Bonferroni correction, the method proposed by Feingold et al., and naïve Monte Carlo simulation. We performed affected sib-pair linkage analysis for each of the 100 replicates for each of five binary traits and adjusted the derived p-values using each of the correction methods. The type I error rates for each correction method and the ability of each of the methods to detect loci known to influence trait values were compared. All of the methods considered were conservative with respect to type I error, especially the Bonferroni method. The ability of these methods to detect trait loci was also low. However, this may be partially due to a limitation inherent in our binary trait definitions.
PMCID: PMC1866512  PMID: 14975141
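As a point of reference for the comparison described in the abstract above, the Bonferroni correction multiplies each raw p-value by the number of tests and caps the result at 1. The sketch below is illustrative only: the function name and the p-values are hypothetical and are not taken from the Genetic Analysis Workshop 13 data.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values: min(1, p * number_of_tests)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from four linkage tests (not from the study).
raw = [0.001, 0.02, 0.04, 0.30]
adjusted = bonferroni(raw)

for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

Because every p-value is scaled by the full number of tests regardless of correlation between them, the adjustment tends to be conservative, which is consistent with the type I error behavior the abstract reports.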
22.  ARLTS1 germline variants and the risk for breast, prostate, and colorectal cancer 
Recently, a nonsense alteration Trp149Stop in the ARLTS1 gene was found more frequently in familial cancer cases vs. sporadic cancer patients and healthy controls. Here, the role of Trp149Stop or any other ARLTS1 germline variant was evaluated on breast, prostate, and colorectal cancer risk. The whole gene was screened for germline alterations in 855 familial cancer patients. The five observed variants were further screened in 1169 non-familial cancer patients as well as in 809 healthy population controls. The Trp149Stop was found at low frequencies (0.5–1.2%) in all patient subgroups vs. 1.6% in controls, and the mutant allele did not co-segregate with disease status in families with multiple affected individuals. The CC genotype in the Cys148Arg variant was slightly more common among both familial and sporadic breast (OR=1.48, 95% CI 1.16–1.87, p=0.001) and prostate cancer patients (OR=1.50, 95% CI 1.13–1.99, p=0.005) when compared to controls. A novel ARLTS1 variant Gly65Val was found at higher frequency among familial prostate cancer patients (8/164, 4.9%) than in controls (13/809, 1.6%; OR=3.14, 95% CI 1.28–7.70, p=0.016). However, after adjusting for multiple testing, none of these results remained significant. No association was found between any of the variants and colorectal cancer risk. Our results suggest that Trp149Stop is not a predisposition allele in breast, prostate, or colorectal cancer in the Finnish population, and, while the Gly65Val variant may increase familial prostate cancer risk and the Cys148Arg change may affect both breast and prostate cancer risk, the evidence is not strong in these data.
PMCID: PMC3404127  PMID: 18337727
ARLTS1; ARL11; prostate cancer; breast cancer; colorectal cancer
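The Gly65Val comparison in the abstract above (8 of 164 familial prostate cancer patients vs. 13 of 809 controls) can be checked with an unadjusted 2×2 odds ratio and a Wald (Woolf) confidence interval on the log scale. This is a minimal sketch; the abstract does not state how the authors computed their estimate, so the method and the function name here are assumptions.

```python
import math


def odds_ratio_wald(case_carriers, case_total, ctrl_carriers, ctrl_total, z=1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval."""
    a = case_carriers                 # carriers among cases
    b = case_total - case_carriers    # non-carriers among cases
    c = ctrl_carriers                 # carriers among controls
    d = ctrl_total - ctrl_carriers    # non-carriers among controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Counts reported in the abstract: 8/164 patients vs. 13/809 controls.
or_, lo, hi = odds_ratio_wald(8, 164, 13, 809)
print(f"OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")  # OR 3.14, 95% CI 1.28-7.70
```

The wide interval reflects the small carrier counts, which fits the abstract's note that the association did not survive adjustment for multiple testing.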
23.  Contribution of HPC1 (RNASEL) and HPCX variants to prostate cancer in a founder population 
The Prostate  2010;70(15):1716-1727.
Prostate cancer is a genetically complex disease with locus and disease heterogeneity. The RNASEL gene and HPCX locus have been implicated in hereditary prostate cancer; however, their contributions to sporadic forms of this malignancy remain uncertain.
Associations of prostate cancer with two variants in the RNASEL gene (a founder mutation, 471delAAAG, and a non-synonymous SNP, rs486907), and with five microsatellite markers in the HPCX locus, were examined in 979 cases and 1,251 controls of Ashkenazi Jewish descent. Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated using logistic regression models.
There was an inverse association between RNASEL rs486907 and prostate cancer in younger men (<65 years) and those with a first-degree relative with prostate cancer; men with AA genotype had ORs of 0.64 and 0.47 (both p<0.05), respectively, in comparison to men with GG genotype. Within the HPCX region, there were positive associations for allele 135 of bG82i1.1 marker (OR=1.77, p=0.01) and allele 188 of DXS1205 (OR=1.65, p=0.02). In addition, allele 248 of marker D33 was inversely associated (OR=0.65, p=0.05) with Gleason score ≥7 tumors.
Results suggest that variants in RNASEL contribute to susceptibility to early onset and familial forms of prostate cancer, whereas HPCX variants are associated with prostate cancer risk and tumor aggressiveness. The observation that a mutation predicted to completely inactivate RNASEL protein was not associated with prostate cancer, but that a missense variant was associated, suggests that the effect is due to either partial inactivation of the protein, and/or acquisition of a new protein activity.
PMCID: PMC3404133  PMID: 20564318
24.  A Founder Mutation in LEPRE1 Carried by 1.5% of West Africans and 0.4% of African Americans Causes Lethal Recessive Osteogenesis Imperfecta 
Genetics in Medicine  2012;14(5):543-551.
Deficiency of prolyl 3-hydroxylase 1, encoded by LEPRE1, causes recessive osteogenesis imperfecta. We previously identified a LEPRE1 mutation found exclusively in African Americans and contemporary West Africans. We hypothesized that this allele originated in West Africa and was introduced to the Americas with the Atlantic slave trade. We aimed to determine the frequency of carriers for this mutation among African Americans and West Africans, and the mutation origin and age.
Genomic DNA was screened for the mutation using PCR and restriction digestion, and a custom TaqMan genomic SNP assay. The mutation age was estimated using microsatellites and short tandem repeats spanning 4.2 Mb surrounding LEPRE1 in probands and carriers.
Approximately 0.4% of Mid-Atlantic African Americans carry this mutation, estimating recessive OI in 1/260,000 births in this population. In Nigeria and Ghana, 1.48% of unrelated individuals are heterozygous carriers, predicting 1/18,260 births will be affected with recessive OI, equal to the incidence of de novo dominant OI. The mutation was not detected in Africans from surrounding countries. All carriers shared a haplotype of 63-770 Kb, consistent with a single founder for this mutation. Using linkage disequilibrium analysis, the mutation was estimated to have originated between 650 and 900 years before present (1100-1350 C.E.).
We identified a West African founder mutation for recessive OI in LEPRE1. Nearly 1.5% of Ghanaians and Nigerians are carriers. The age of this allele is consistent with introduction to North America via the Atlantic slave trade (1501–1867 C.E.).
PMCID: PMC3393768  PMID: 22281939
LEPRE1; osteogenesis imperfecta; founder mutation; West Africa
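The predicted incidence figures in the abstract above follow from standard recessive inheritance arithmetic: if a fraction c of individuals are heterozygous carriers and mating is random, both parents are carriers with probability c², and one quarter of their children are affected. The sketch below assumes random mating and negligible numbers of homozygotes among parents; the function name is hypothetical.

```python
def births_per_affected(carrier_frequency):
    """Expected number of births per affected child for a recessive allele.

    Assumes random mating: P(both parents carriers) = c**2, and each child
    of two carriers has a 1/4 chance of inheriting both mutant alleles.
    """
    p_both_carriers = carrier_frequency ** 2
    p_affected = p_both_carriers / 4
    return 1 / p_affected


# Carrier frequencies reported in the abstract.
print(round(births_per_affected(0.0148)))  # West Africa, 1.48% carriers
print(round(births_per_affected(0.004)))   # Mid-Atlantic African Americans, ~0.4%
```

With 1.48% carriers this gives roughly 1 in 18,260 births, matching the abstract. The rounded 0.4% figure gives 1 in 250,000, slightly below the reported 1/260,000, presumably because the abstract rounds the underlying carrier frequency.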
25.  A Second Genetic Variant on Chromosome 15q24–25.1 Associates with Lung Cancer 
Cancer Research  2010;70(8):3128-3135.
A common variant on chromosomal region 15q24–25.1, marked by rs1051730, was found to be associated with lung cancer risk. Here, we attempted to confirm the second variant on 15q24–25.1 in several large sporadic lung cancer populations and determined what percentage of additional risk for lung cancer is due to the genetic effect of the second variant. SNPs rs1051730 and rs481134 were genotyped in 2,818 lung cancer cases and 2,766 controls from four populations. Joint analysis of these two variants (rs1051730 and rs481134) on 15q24–25.1 identified three major haplotypes (G_T, A_C, and G_C) and provided stronger evidence for association of 15q24–25.1 with lung cancer (P = 9.72 × 10⁻⁹). These two variants represent three levels of risk associated with lung cancer. The most common haplotype G_T is neutral; the haplotype A_C is associated with increased risk for lung cancer with 5.0% higher frequency in cases than in controls [P = 1.68 × 10⁻⁷; odds ratio (OR), 1.24; 95% confidence interval (95% CI), 1.14–1.35]; whereas the haplotype G_C is associated with reduced risk for lung cancer with 4.4% lower frequency in cases than in controls (P = 7.39 × 10⁻⁷; OR, 0.80; 95% CI, 0.73–0.87). We further showed that these two genetic variants on 15q24–25.1 independently influence lung cancer risk (rs1051730: P = 4.42 × 10⁻¹¹; OR, 1.60; 95% CI, 1.46–1.74; rs481134: P = 7.01 × 10⁻⁴; OR, 0.81; 95% CI, 0.72–0.92). The second variant on 15q24–25.1, marked by rs481134, explains an additional 13.2% of population attributable risk for lung cancer.
PMCID: PMC3378320  PMID: 20395203
