1.  Prognostic relevance of acquired uniparental disomy in serous ovarian cancer 
Molecular Cancer  2015;14(1):29.
Acquired uniparental disomy (aUPD) can lead to homozygosity for tumor suppressor genes or oncogenes. Our purpose was to determine the frequency and profile of aUPD regions in serous ovarian cancer (SOC) and to investigate the association of aUPD with clinical features and patient outcomes.
We analyzed single nucleotide polymorphism (SNP) array-based genotyping data on 532 SOC specimens from The Cancer Genome Atlas database to identify aUPD regions. Cox univariate regression and Cox multivariate proportional hazards analyses were performed for survival analysis.
We found that 94.7% of SOC samples harbored aUPD; the most common aUPD regions were on chromosomes 17q (76.7%), 17p (39.7%), and 13q (38.3%). In Cox univariate regression analysis, two independent regions of aUPD on chromosome 17q (A and C) and whole-chromosome aUPD were associated with shorter overall survival (OS), and five regions on chromosome 17q (A, D-G) and BRCA1 were associated with recurrence-free survival time. In Cox multivariate proportional hazards analysis, whole-chromosome aUPD was associated with shorter OS. One region of aUPD on chromosome 22q (B) was associated with unilateral disease. A statistically significant association was found between aUPD at the TP53 locus and homozygous mutation of TP53 (p < 0.0001).
aUPD is a common event and some recurrent loci are associated with a poor outcome for patients with serous ovarian cancer.
Electronic supplementary material
The online version of this article (doi:10.1186/s12943-015-0289-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4320828  PMID: 25644622
Acquired uniparental disomy; Ovarian cancer; Overall survival; Recurrence-free survival
2.  Genes-environment interactions in obesity- and diabetes-associated pancreatic cancer: A GWAS data analysis 
Obesity and diabetes are potentially modifiable risk factors for pancreatic cancer. Genetic factors that modify the associations of obesity and diabetes with pancreatic cancer have not previously been examined at the genome-wide level.
Using GWAS genotype and risk factor data from the Pancreatic Cancer Case Control Consortium, we conducted a discovery study of 2,028 cases and 2,109 controls to examine gene-obesity and gene-diabetes interactions in relation to pancreatic cancer risk by employing the likelihood ratio test (LRT) nested in logistic regression models and Ingenuity Pathway Analysis (IPA).
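The interaction test named above compares two nested logistic models: one with the genetic and exposure main effects, and one that adds the gene × exposure product terms. A minimal Python sketch of the final step follows, assuming both models have already been fitted by any logistic-regression routine. The log-likelihood values in the example are hypothetical, and `chi2_sf` and `likelihood_ratio_test` are illustrative helper names, not the consortium's code.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail plus terms sqrt(2/pi) e^(-x/2) x^((2j-1)/2) / (1*3*...*(2j-1)).
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def likelihood_ratio_test(loglik_null: float, loglik_full: float, df: int):
    """LRT of a nested null model against a full model with df extra parameters."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods: null = SNP + obesity main effects,
# full = null + one SNP x obesity product term (1 extra parameter).
stat, p = likelihood_ratio_test(loglik_null=-2850.4, loglik_full=-2845.1, df=1)
print(f"LRT = {stat:.2f}, P = {p:.3g}")
```

For a pathway-level test, the full model would add one product term per SNP, and `df` would grow accordingly.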
After adjustment for multiple comparisons, we observed a significant interaction of the chemokine signaling pathway with obesity (P = 3.29 × 10−6) and a near-significant interaction of the calcium signaling pathway with diabetes (P = 1.57 × 10−4) in modifying the risk of pancreatic cancer. These findings were supported by results from IPA of the top genes with nominal interactions. The major genes contributing to the two top pathways include GNGT2, RELA, TIAM1 and GNAS. No individual gene or SNP, except one SNP, remained significant after adjustment for multiple testing. Notably, SNP rs10818684 of the PTGS1 gene showed an interaction with diabetes (P = 7.91 × 10−7) at a false discovery rate of 6%.
Genetic variations in inflammatory response and insulin resistance may affect the risk of obesity- and diabetes-related pancreatic cancer. These observations should be replicated in additional large datasets.
Gene-environment interaction analysis may provide new insights into the genetic susceptibility and molecular mechanisms of obesity- and diabetes-related pancreatic cancer.
PMCID: PMC3947145  PMID: 24136929
GWAS; obesity; diabetes; interaction; pancreatic cancer; genetic susceptibility
3.  Trans-ethnic genome-wide association study of colorectal cancer identifies a new susceptibility locus in VTI1A 
Nature communications  2014;5:4613.
The genetic basis of sporadic colorectal cancer (CRC) is not well explained by known risk polymorphisms. Here we perform a meta-analysis of two genome-wide association studies in 2,627 cases and 3,797 controls of Japanese ancestry and 1,894 cases and 4,703 controls of African ancestry, to identify genetic variants that contribute to CRC susceptibility. We replicate genome-wide statistically significant associations (P < 5×10−8) in 16,823 cases and 18,211 controls of European ancestry. This study reveals a new pan-ethnic CRC risk locus at 10q25 (rs12241008, intronic to VTI1A; P=1.4×10−9), providing additional insight into the etiology of CRC and highlighting the value of association mapping in diverse populations.
PMCID: PMC4180879  PMID: 25105248
4.  Targeted Sequencing in Chromosome 17q Linkage Region Identifies Familial Glioma Candidates in the Gliogene Consortium 
Scientific Reports  2015;5:8278.
Glioma is a rare but highly fatal cancer that accounts for the majority of malignant primary brain tumors. Inherited predisposition to glioma has been consistently observed within non-syndromic families. Our previous studies, which involved non-parametric and parametric linkage analyses, both yielded significant linkage peaks on chromosome 17q. Here, we use data from next-generation and Sanger sequencing to identify familial glioma candidate genes and variants on chromosome 17q for further investigation. We applied a filtering schema to narrow the original list of 4830 annotated variants down to 21 very rare (<0.1% frequency), non-synonymous variants. Our findings implicate the MYO19 and KIF18B genes and rare variants in SPAG9 and RUNDC1 as candidates worthy of further investigation. Burden testing and functional studies are planned.
PMCID: PMC4317686  PMID: 25652157
5.  Fine-mapping of the 5p15.33, 6p22.1-p21.31 and 15q25.1 regions identifies functional and histology-specific lung cancer susceptibility loci in African-Americans 
Genome-wide association studies of European and East Asian populations have identified lung cancer susceptibility loci on chromosomes 5p15.33, 6p22.1-p21.31 and 15q25.1. We investigated whether these regions contain lung cancer susceptibility loci in African-Americans and refined previous association signals by exploiting the reduced linkage disequilibrium observed in African-Americans.
1308 African-American cases and 1241 African-American controls from three centers were genotyped for 760 single nucleotide polymorphisms spanning three regions, and additional SNP imputation was performed. Associations between polymorphisms and lung cancer risk were estimated using logistic regression, stratified by tumor histology where appropriate.
The strongest associations were observed on 15q25.1 in/near CHRNA5, including a missense substitution (rs16969968: OR = 1.57, 95% CI = 1.25–1.97, P = 1.1 × 10−4) and variants in the 5′-UTR. Associations on 6p22.1-p21.31 were histology-specific and included a missense variant in BAT2 associated with squamous-cell carcinoma (rs2736158: OR = 0.64, 95% CI = 0.48–0.85, P = 1.82 × 10−3). Associations on 5p15.33 were detected near TERT, the strongest of which was rs2735940 (OR = 0.82, 95% CI = 0.73–0.93, P = 1.1 × 10−3). This association was stronger among cases with adenocarcinoma (OR = 0.75, 95% CI = 0.65–0.86, P = 8.1 × 10−5).
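Because each odds ratio above is reported with a 95% CI and a P value, the three numbers can be cross-checked. The sketch below assumes Wald-type intervals on the log scale, exp(log OR ± 1.96·SE), which the abstract does not state. Under that assumption it recovers the standard error and a two-sided P. For rs16969968 it gives P ≈ 1 × 10−4, close to the reported 1.1 × 10−4; the small gap reflects rounding of the CI limits. The function names are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # standard normal 97.5th percentile


def wald_from_ci(odds_ratio: float, lower: float, upper: float):
    """Back-calculate SE(log OR), the Wald z statistic and a two-sided P from a 95% CI.

    Assumes the CI was computed as exp(log(OR) +/- 1.96 * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * Z_975)
    z = math.log(odds_ratio) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_two_sided


def ci_from_beta(beta: float, se: float):
    """Odds ratio and 95% CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - Z_975 * se), math.exp(beta + Z_975 * se)


# rs16969968 (CHRNA5): OR = 1.57, 95% CI 1.25-1.97, reported P = 1.1e-4.
se, z, p = wald_from_ci(1.57, 1.25, 1.97)
print(f"SE = {se:.3f}, z = {z:.2f}, P = {p:.2g}")
```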
Polymorphisms in 5p15.33, 6p22.1-p21.31 and 15q25.1 are associated with lung cancer in African-Americans. Variants on 5p15.33 are stronger risk factors for adenocarcinoma, and variants on 6p21.33 are associated only with squamous-cell carcinoma.
Results implicate the BAT2, TERT and CHRNA5 genes in the pathogenesis of specific lung cancer histologies.
PMCID: PMC3565099  PMID: 23221128
Lung cancer; adenocarcinoma; squamous-cell carcinoma; fine-mapping; African-American; genetic association
6.  A Network-Based Kernel Machine Test for the Identification of Risk Pathways in Genome-Wide Association Studies 
Human heredity  2014;76(2):64-75.
Biological pathways provide rich information and biological context on the genetic causes of complex diseases. The logistic kernel machine test integrates prior knowledge on pathways in order to analyze data from genome-wide association studies (GWAS). Here, the kernel converts the genomic information of two individuals into a quantitative value reflecting their genetic similarity. By selecting the kernel, one implicitly chooses a genetic effect model. Like many other pathway methods, none of the available kernels accounts for the topological structure of the pathway or for gene-gene interaction types. However, evidence indicates that the connectivity and neighborhood of genes are crucial in the context of GWAS, because genes associated with a disease often interact. Thus, we propose a novel kernel that incorporates the topology of pathways and information on interactions. Using simulation studies, we demonstrate that the proposed method controls the type I error rate correctly and can be more effective in identifying pathways associated with a disease than non-network-based methods. We apply our approach to genome-wide association case-control data on lung cancer and rheumatoid arthritis. We identify some promising new pathways associated with these diseases, which may improve our current understanding of the genetic mechanisms.
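To make the kernel idea concrete, the sketch below computes the identity-by-state (IBS) kernel. This is a common non-network choice for genotypes coded as 0/1/2 allele counts. It is a swapped-in baseline for illustration, not the network-based kernel proposed in the paper, which additionally draws on pathway topology and interaction types. The genotypes are hypothetical.

```python
from typing import List

# Rows = individuals, columns = SNPs coded as minor-allele counts (0, 1, 2).
Genotypes = List[List[int]]


def ibs_kernel(genotypes: Genotypes) -> List[List[float]]:
    """Identity-by-state kernel: average fraction of alleles shared across SNPs.

    K[i][j] = sum_s (2 - |g_is - g_js|) / (2 * n_snps), so K[i][i] = 1 and
    0 <= K[i][j] <= 1. Larger values mean more genetically similar individuals.
    """
    n = len(genotypes)
    n_snps = len(genotypes[0])
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[j]))
            kernel[i][j] = kernel[j][i] = shared / (2.0 * n_snps)
    return kernel


# Three hypothetical individuals genotyped at the three SNPs of one pathway.
g = [[0, 1, 2],
     [0, 1, 1],
     [2, 1, 0]]
for row in ibs_kernel(g):
    print(["%.3f" % v for v in row])
```

In a kernel machine test, this n × n matrix replaces the explicit SNP effects in the score statistic. Swapping in a different kernel changes the implied genetic effect model, as the abstract notes.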
PMCID: PMC4026009  PMID: 24434848
Kernel Machine Test; Pathways; Networks; Gene-Gene Interactions; Score Test; Generalized Linear Model; Lung Cancer; Rheumatoid Arthritis; Disease Association; Genetic Association Studies
7.  Trends in prevalence of prognostic factors and survival in lung cancer patients from 1985 to 2004 at a tertiary care center 
Cancer detection and prevention  2008;32(2):101-108.
After a prolonged period of increasing rates of lung cancer incidence and mortality for both men and women, incidence and mortality rates are decreasing in men and stabilizing in women. The goal of this study was to assess changes over 20 years in the prevalence of known risk factors for lung cancer and to elucidate possible predictors associated with lung cancer survival.
The study included a total of 908 patients with primary lung cancer referred to The University of Texas M. D. Anderson Cancer Center over three study periods: 1985–1989 (N = 392), 1993–1997 (N = 216), and 2000–2004 (N = 300). Detailed questionnaires were used to collect information from the patients. Hazard ratios were estimated by fitting a Cox proportional hazards model. Using the Kaplan-Meier method, survival in months was calculated up to 2 years from the date of diagnosis to ensure comparability across the three groups.
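The Kaplan-Meier step with a 2-year cut-off can be sketched as follows. This minimal stdlib Python version assumes survival times in months and an event indicator (1 = death, 0 = censored). Follow-up beyond 24 months is administratively censored so that every period is compared over the same window. The median is taken as the first time the estimated survival drops to 0.5 or below. The data and function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def truncate_follow_up(times: Sequence[float], events: Sequence[int], horizon: float):
    """Administratively censor everyone still being followed at `horizon`."""
    new_t, new_e = [], []
    for t, e in zip(times, events):
        if t > horizon:
            new_t.append(horizon)
            new_e.append(0)
        else:
            new_t.append(t)
            new_e.append(e)
    return new_t, new_e


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate; returns (time, S(t)) at each distinct death time.

    Subjects censored at a death time are still counted as at risk at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest time at which S(t) <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical survival times in months (1 = died, 0 = censored), cut at 2 years.
months, died = truncate_follow_up([3, 5, 5, 8, 12, 20, 30], [1, 1, 0, 1, 0, 1, 1], 24)
curve = kaplan_meier(months, died)
print(curve, median_survival(curve))
```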
We observed a decrease in the proportion of patients who were current cigarette smokers and increases in the proportions of patients who presented with adenocarcinoma of the lung, who were obese, and who presented with localized disease. We also found an increase in the number of patients who reported a family history of lung cancer. The overall median survival duration increased from 12.0 months in 1985–1989 to 17.5 months in 2000–2004. The probability of being alive 2 years after diagnosis also increased, from 26.5% in 1985–1989 to 40.8% in 2000–2004. Overall, women had longer median survival than men.
The results show that the demographic, histologic, clinical, and outcome variables of patients with lung cancer have changed over the past 20 years. Most importantly, the survival of patients with lung cancer has improved.
PMCID: PMC4287275  PMID: 18639390
Lung cancer survival; time-trends; predictors for lung cancer survival
8.  Rare variants of large effect in BRCA2 and CHEK2 affect risk of lung cancer 
Wang, Yufei | McKay, James D. | Rafnar, Thorunn | Wang, Zhaoming | Timofeeva, Maria | Broderick, Peter | Zong, Xuchen | Laplana, Marina | Wei, Yongyue | Han, Younghun | Lloyd, Amy | Delahaye-Sourdeix, Manon | Chubb, Daniel | Gaborieau, Valerie | Wheeler, William | Chatterjee, Nilanjan | Thorleifsson, Gudmar | Sulem, Patrick | Liu, Geoffrey | Kaaks, Rudolf | Henrion, Marc | Kinnersley, Ben | Vallée, Maxime | LeCalvez-Kelm, Florence | Stevens, Victoria L. | Gapstur, Susan M. | Chen, Wei V. | Zaridze, David | Szeszenia-Dabrowska, Neonilia | Lissowska, Jolanta | Rudnai, Peter | Fabianova, Eleonora | Mates, Dana | Bencko, Vladimir | Foretova, Lenka | Janout, Vladimir | Krokan, Hans E. | Gabrielsen, Maiken Elvestad | Skorpen, Frank | Vatten, Lars | Njølstad, Inger | Chen, Chu | Goodman, Gary | Benhamou, Simone | Vooder, Tonu | Valk, Kristjan | Nelis, Mari | Metspalu, Andres | Lener, Marcin | Lubiński, Jan | Johansson, Mattias | Vineis, Paolo | Agudo, Antonio | Clavel-Chapelon, Francoise | Bueno-de-Mesquita, H.Bas | Trichopoulos, Dimitrios | Khaw, Kay-Tee | Johansson, Mikael | Weiderpass, Elisabete | Tjønneland, Anne | Riboli, Elio | Lathrop, Mark | Scelo, Ghislaine | Albanes, Demetrius | Caporaso, Neil E. | Ye, Yuanqing | Gu, Jian | Wu, Xifeng | Spitz, Margaret R. | Dienemann, Hendrik | Rosenberger, Albert | Su, Li | Matakidou, Athena | Eisen, Timothy | Stefansson, Kari | Risch, Angela | Chanock, Stephen J. | Christiani, David C. | Hung, Rayjean J. | Brennan, Paul | Landi, Maria Teresa | Houlston, Richard S. | Amos, Christopher I.
Nature genetics  2014;46(7):736-741.
We imputed four genome-wide association studies of lung cancer in populations of European ancestry (11,348 cases and 15,861 controls) to the 1000 Genomes Project reference and genotyped an additional 10,246 cases and 38,295 controls for follow-up. We identified large-effect genome-wide associations for squamous lung cancer with the rare variants BRCA2-K3326X (rs11571833; odds ratio [OR]=2.47, P=4.74×10−20) and CHEK2-I157T (rs17879961; OR=0.38, P=1.27×10−13). We also showed an association, previously reported only in Asians, between common variation at 3q28 (TP63; rs13314271; OR=1.13, P=7.22×10−10) and lung adenocarcinoma. These findings provide further evidence for inherited genetic susceptibility to lung cancer and its biological basis. Additionally, our analysis demonstrates that imputation of pre-existing GWAS data can identify rare disease-causing variants with substantial effects on cancer risk.
PMCID: PMC4074058  PMID: 24880342
9.  Diverse convergent evidence in the genetic analysis of complex disease: coordinating omic, informatic, and experimental evidence to better identify and validate risk factors 
BioData Mining  2014;7:10.
In omic research, such as genome-wide association studies, researchers seek to replicate their results in other datasets to reduce false positive findings and thus provide evidence for the existence of true associations. Unfortunately, this standard validation approach cannot completely eliminate false positive conclusions, and it can also mask many true associations that might otherwise advance our understanding of pathology. These issues raise the question: How can we increase the amount of knowledge gained from high-throughput genetic data? To address this challenge, we present an approach that complements standard statistical validation methods by drawing attention to both potential false negative and false positive conclusions, as well as providing broad information for directing future research. The Diverse Convergent Evidence approach (DiCE) we propose integrates information from multiple sources (omics, informatics, and laboratory experiments) to estimate the strength of the available corroborating evidence supporting a given association. This process is designed to yield an evidence metric that has utility when etiologic heterogeneity, variable risk factor frequencies, and a variety of observational data imperfections might lead to false conclusions. We provide proof-of-principle examples in which DiCE identified strong evidence for associations that have established biological importance, when standard validation methods alone did not provide support. If used as an adjunct to standard validation methods, this approach can leverage multiple distinct data types to improve genetic risk factor discovery and validation, promote effective science communication, and guide future research directions.
PMCID: PMC4112852  PMID: 25071867
Replication; Validation; Complex disease; Heterogeneity; GWAS; Omics; Type 2 error; Type 1 error; False negatives; False positives
10.  Fine-mapping of genome-wide association study-identified risk loci for colorectal cancer in African Americans 
Human Molecular Genetics  2013;22(24):5048-5055.
Genome-wide association studies of colorectal cancer (CRC) in Europeans and Asians have identified 21 susceptibility regions [29 index single-nucleotide polymorphisms (SNPs)]. Characterizing these risk regions in diverse racial groups with different linkage disequilibrium (LD) structure can help localize causal variants. We examined associations between CRC and all 29 index SNPs in 6597 African Americans (1894 cases and 4703 controls). Nine SNPs in eight regions (5q31.1, 6q26-q27, 8q23.3, 8q24.21, 11q13.4, 15q13.3, 18q21.1 and 20p12.3) formally replicated in our data with one-sided P-values <0.05 and the same risk directions as reported previously. We performed fine-mapping of the 21 risk regions (including 250 kb on both sides of the index SNPs) using genotyped and imputed markers at the density of the 1000 Genomes Project to search for additional or more predictive risk markers. Among the SNPs correlated with the index variants, two markers, rs12759486 (or rs7547751, a putative functional variant in perfect LD with it) in 1q41 and rs7252505 in 19q13.1, were more strongly and statistically significantly associated with CRC (P < 0.0006). The average per-allele risk was improved using the replicated index variants and the two new markers (odds ratio = 1.14, P = 6.5 × 10−16) in African Americans, compared with using all index SNPs (odds ratio = 1.07, P = 3.4 × 10−10). The contribution of the two new risk SNPs to CRC heritability was estimated to be 1.5% in African Americans. This study highlights the importance of fine-mapping in diverse populations.
PMCID: PMC3836473  PMID: 23851122
11.  Genome-wide Association Study of Dermatomyositis Reveals Genetic Overlap with other Autoimmune Disorders 
Arthritis and rheumatism  2013;65(12):3239-3247.
To identify new genetic associations with juvenile and adult dermatomyositis (DM).
We performed a genome-wide association study (GWAS) of adult and juvenile DM patients of European ancestry (n = 1178) and controls (n = 4724). To assess genetic overlap with other autoimmune disorders, we examined whether 141 single nucleotide polymorphisms (SNPs) outside the major histocompatibility complex (MHC) locus, and previously associated with autoimmune diseases, predispose to DM.
Compared with controls, patients with DM had a strong signal in the MHC region, with genome-wide significance (P < 5×10−8) at 80 genotyped SNPs. An analysis of 141 non-MHC SNPs previously associated with autoimmune diseases showed that three SNPs, linked with three genes, were associated with DM at a false discovery rate (FDR) < 0.05. These genes were phospholipase C like 1 (PLCL1, rs6738825, FDR=0.00089), B lymphoid tyrosine kinase (BLK, rs2736340, FDR=0.00031), and chemokine (C-C motif) ligand 21 (CCL21, rs951005, FDR=0.0076). None of these genes had previously been reported to be associated with DM.
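The abstract does not say how the FDR values were computed. The Benjamini-Hochberg step-up procedure is one common choice and is shown here purely as an assumption. It converts the nominal p-values for a set of candidate SNPs (141 in this study) into adjusted values that can be compared against a threshold such as 0.05. The p-values below are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order.

    q_(k) = min over j >= k of p_(j) * m / j, capped at 1, where p_(1) <= ... <= p_(m).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical nominal p-values for a handful of candidate SNPs.
p = [0.01, 0.04, 0.03, 0.005, 0.20]
q = benjamini_hochberg(p)
print([round(v, 4) for v in q])
```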
Our findings confirm the MHC as the major genetic region associated with DM and indicate that DM shares non-MHC genetic features with other autoimmune diseases, suggesting the presence of additional novel risk loci. This first identification of autoimmune disease genetic predispositions shared with DM may lead to enhanced understanding of pathogenesis and novel diagnostic and therapeutic approaches.
PMCID: PMC3934004  PMID: 23983088
dermatomyositis; adult; juvenile; shared autoimmunity genes
12.  Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci 
Nature genetics  2010;42(6):508-514.
To identify novel genetic risk factors for rheumatoid arthritis (RA), we conducted a genome-wide association study (GWAS) meta-analysis of 5,539 autoantibody-positive RA cases and 20,169 controls of European descent, followed by replication in an independent set of 6,768 RA cases and 8,806 controls. Of 34 SNPs selected for replication, 7 novel RA risk alleles were identified at genome-wide significance (P<5×10−8) in analysis of all 41,282 samples. The associated SNPs are near genes of known immune function, including IL6ST, SPRED2, RBPJ, CCR6, IRF5, and PXK. We also refined the risk alleles at two established RA risk loci (IL2RA and CCL21) and confirmed the association at AFF3. These new associations bring the total number of confirmed RA risk loci to 31 among individuals of European ancestry. An additional 11 SNPs replicated at P<0.05, many of which are validated autoimmune risk alleles, suggesting that most represent bona fide RA risk alleles.
PMCID: PMC4243840  PMID: 20453842
13.  Gene variants in angiogenesis and lymphangiogenesis and cutaneous melanoma progression 
Angiogenesis and lymphangiogenesis are important in the progression of melanoma. We investigated associations of genetic variants in these pathways with sentinel lymph node (SLN) metastasis and mortality in two independent series of melanoma patients.
Participants at Moffitt Cancer Center were 552 patients, all Caucasian, with primary cutaneous melanoma referred for SLN biopsy. A total of 177 patients had SLN metastasis, among whom 60 died from melanoma. Associations between 238 SNPs in 26 genes and SLN metastasis were estimated as odds ratios and 95% CIs using logistic regression. Competing risk regression was used to estimate hazard ratios and 95% CIs for each SNP and melanoma-specific mortality. We attempted to replicate significant findings using data from a genome-wide association study comprising 1,115 melanoma patients referred for SLN biopsy at MD Anderson Cancer Center (MDACC), among whom 189 had SLN metastasis and 92 died from melanoma.
In the Moffitt dataset, we observed significant associations of 18 SNPs with SLN metastasis and of 17 SNPs with mortality. Multiple SNPs in COL18A1, EGFR, FLT1, IL10, PDGFD, PIK3CA and TLR3 were associated with risk of SLN metastasis and/or patient mortality. The MDACC dataset replicated an association between mortality and rs2220377 in PDGFD. Further, in a meta-analysis, three additional SNPs were significantly associated with SLN metastasis (EGFR rs723526 and TLR3 rs3775292) and melanoma-specific death (TLR3 rs7668666).
These findings suggest that genetic variation in angiogenesis and lymphangiogenesis contributes to regional nodal metastasis and progression of melanoma.
Additional research attempting to replicate these results is warranted.
PMCID: PMC3708315  PMID: 23462921
SNP; lymph/angiogenesis; melanoma; sentinel lymph node
14.  Genomic Sequencing in Cancer 
Cancer letters  2012;340(2):10.1016/j.canlet.2012.11.004.
Genomic sequencing has provided critical insights into the etiology of both simple and complex diseases. The enormous reduction in the cost of whole-genome sequencing has allowed this technology to gain increasing use. Whole-genome analysis has influenced research on complex diseases, including cancer, by allowing the systematic analysis of entire genomes in a single experiment. This has facilitated the discovery of somatic and germline mutations and the identification of the function and impact of insertions, deletions, and structural rearrangements, including translocations and inversions, in novel disease genes. Whole-genome sequencing can provide the most comprehensive characterization of the cancer genome, whose complexity we are only beginning to understand. In this review, we therefore focus on whole-genome sequencing in cancer.
PMCID: PMC3622788  PMID: 23178448
15.  Joint Effects of Germ-Line p53 Mutation, MDM2 SNP309, and Gender on Cancer Risk in Family Studies of Li-Fraumeni Syndrome 
Human genetics  2011;129(6):663-673.
Li-Fraumeni syndrome (LFS) is a rare familial cancer syndrome characterized by early cancer onset, diverse tumor types, and multiple primary tumors. Germ-line p53 mutations have been identified in most LFS families. A common single-nucleotide polymorphism in the MDM2 gene, SNP309, was recently confirmed to be a modifier of cancer risk in several case-control studies: cancer onset was substantially earlier, by 7–16 years, in SNP309 G-allele carriers than in wild-type individuals. However, risk analyses in family studies that jointly account for measured hereditary p53 mutations and SNP309 have not been performed. Here, we extended the statistical method that we recently developed to estimate the combined effects of measured p53 mutations, SNP309, and gender, and their interactions, simultaneously. The method is structured for age-specific risk models based on Cox proportional hazards regression for censored age-of-onset traits. We analyzed cancer incidence in 19 extended pedigrees with multiple germ-line p53 mutations ascertained through the clinical LFS phenotype. The dataset consisted of 463 individuals, including 129 p53 mutation carriers. Our analyses showed that the p53 germ-line mutation and its interaction with gender were strongly associated with familial cancer incidence, and that the association between SNP309 and increased cancer risk was modest. In contrast with the results of several case-control studies, the interaction between SNP309 and p53 mutation was not statistically significant. The causal role of SNP309 in family studies was consistent with a previous finding that SNP309 G-alleles are associated with accelerated tumor formation in patients with sporadic and hereditary cancers.
PMCID: PMC4194062  PMID: 21305319
16.  The Effect on Melanoma Risk of Genes Previously Associated With Telomere Length 
Telomere length has been associated with risk of many cancers, but results are inconsistent. Seven single nucleotide polymorphisms (SNPs) previously associated with mean leukocyte telomere length were either genotyped or well imputed in 11,108 case patients and 13,933 control subjects from Europe, Israel, the United States and Australia; four of the seven SNPs reached a P value under .05 (two-sided). A genetic score that predicts telomere length, derived from these seven SNPs, is strongly associated (P = 8.92×10−9, two-sided) with melanoma risk. This demonstrates that the previously observed association between longer telomere length and increased melanoma risk is not attributable to confounding via shared environmental effects (such as ultraviolet exposure) or reverse causality. We provide the first proof that multiple germline genetic determinants of telomere length influence cancer risk.
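A genetic score of this kind is typically a weighted sum of allele dosages, with each SNP weighted by its published per-allele effect on the intermediate trait (here, telomere length). The sketch assumes that weighting scheme, which the abstract does not spell out. The weights and dosages are made up. The resulting score would then enter a case-control model such as logistic regression.

```python
from typing import List, Sequence


def weighted_allele_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of per-SNP allele dosages (0-2, imputed values allowed) times per-allele weights."""
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores: Sequence[float]) -> List[float]:
    """Convert raw scores to z-scores so effects can be reported per standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / (n - 1)) ** 0.5
    return [(s - mean) / sd for s in scores]


# Hypothetical per-allele telomere-length effects for seven SNPs (illustrative only).
weights = [0.10, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03]
people = [
    [0, 1, 2, 1, 0, 1, 2],
    [2, 2, 1, 1.4, 1, 0, 1],  # 1.4 = an imputed dosage
    [1, 0, 0, 0.2, 2, 1, 0],
]
raw = [weighted_allele_score(d, weights) for d in people]
print([round(s, 3) for s in raw], [round(z, 2) for z in standardize(raw)])
```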
PMCID: PMC4196080  PMID: 25231748
17.  Meta-analyses of Colorectal Cancer Risk Factors 
Cancer causes & control : CCC  2013;24(6):1207-1222.
Demographic, behavioral and environmental factors have been associated with increased risk of colorectal cancer (CRC). We reviewed the published evidence and explored associations between risk factors and CRC incidence.
We identified 12 established non-screening CRC risk factors and performed a comprehensive review and meta-analyses to quantify each factor’s impact on CRC risk. We used random effects models of the logarithms of risks across studies: inverse variance weighted averages for dichotomous factors and generalized least squares for dose-response for multi-level factors.
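The two pooling steps named above can be sketched in Python. `dersimonian_laird` implements the standard DerSimonian-Laird random-effects estimator on log relative risks, using inverse-variance weights, Cochran's Q, and a moment estimate of the between-study variance. `rescale_rr` shows how a log-linear dose-response estimate is re-expressed for a different increment, for example an RR per 8 kg/m² converted to an RR per 1 kg/m². Whether the authors used exactly this estimator is an assumption, and the study inputs are hypothetical. The generalized-least-squares dose-response fit itself is not reproduced.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(log_rr: Sequence[float], se: Sequence[float]) -> Tuple[float, float, float]:
    """Random-effects pooling of study log relative risks (DerSimonian-Laird).

    Returns (pooled log RR, its standard error, between-study variance tau^2).
    """
    w = [1.0 / s ** 2 for s in se]                                # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_rr)) / sum_w
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rr))   # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(log_rr) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (s ** 2 + tau2) for s in se]                    # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_rr)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


def rescale_rr(rr: float, from_units: float, to_units: float) -> float:
    """Re-express an RR per `from_units` as an RR per `to_units` (log-linear dose-response)."""
    return math.exp(math.log(rr) * to_units / from_units)


# Hypothetical study estimates (RR, lower 95% limit, upper 95% limit) for one factor.
studies = [(1.8, 1.2, 2.7), (2.6, 1.9, 3.6), (4.1, 2.2, 7.6)]
log_rr = [math.log(rr) for rr, _, _ in studies]
se = [(math.log(hi) - math.log(lo)) / (2 * 1.96) for _, lo, hi in studies]
pooled, pooled_se, tau2 = dersimonian_laird(log_rr, se)
print(f"RR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(pooled - 1.96 * pooled_se):.2f}-{math.exp(pooled + 1.96 * pooled_se):.2f}), "
      f"tau^2 = {tau2:.3f}")
print(f"RR 1.10 per 8 kg/m^2 = RR {rescale_rr(1.10, 8, 1):.4f} per 1 kg/m^2")
```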
Significant risk factors include inflammatory bowel disease (RR = 2.93, 95% CI: 1.79–4.81); CRC history in a first-degree relative (RR = 1.79, 95% CI: 1.60–2.02); body mass index (BMI) in the overall population (RR = 1.10 per 8 kg/m² increase, 95% CI: 1.08–1.12); physical activity (RR = 0.88, 95% CI: 0.86–0.91 for a 2-standard-deviation increase in physical activity score); cigarette smoking (RR = 1.06, 95% CI: 1.03–1.08 for 5 pack-years); and consumption of red meat (RR = 1.13, 95% CI: 1.09–1.16 for 5 servings/week), fruit (RR = 0.85, 95% CI: 0.75–0.96 for 3 servings/day), and vegetables (RR = 0.86, 95% CI: 0.78–0.94 for 5 servings/day).
We developed a comprehensive risk modeling strategy that incorporates multiple effects to predict an individual’s risk of developing colorectal cancer. Inflammatory bowel disease and history of CRC in first-degree relatives are associated with much higher risk of CRC. Increased BMI, red meat intake, cigarette smoking, low physical activity, low vegetable consumption, and low fruit consumption were associated with moderately increased risk of CRC.
PMCID: PMC4161278  PMID: 23563998
Colorectal cancer; colon neoplasms; colonic neoplasms and colorectal neoplasms; colorectal risk factors; colorectal cancer prevention; meta-analysis
18.  Serrated and Adenomatous Polyp Detection Increases with Longer Withdrawal Time: Results from the New Hampshire Colonoscopy Registry 
Detection and removal of adenomas and clinically significant serrated polyps is critical to the effectiveness of colonoscopy in preventing colorectal cancer. While longer withdrawal time has been found to increase polyp detection, this association, and the use of withdrawal time as a quality indicator, remains controversial. Few studies have reported on withdrawal time and serrated polyp detection. Using data from the New Hampshire Colonoscopy Registry, we examined how an endoscopist’s withdrawal time in normal colonoscopies affects adenoma and serrated polyp detection.
We analyzed 7996 colonoscopies performed in 7972 patients between 2009 and 2011 by 42 endoscopists at 14 hospitals, ambulatory surgery centers, and community practices. Clinically significant serrated polyps (CSSPs) were defined as sessile serrated polyps and hyperplastic polyps proximal to the sigmoid colon. Adenoma and CSSP detection rates were calculated according to median endoscopist withdrawal time in normal exams. Regression models were used to estimate the association between increased normal withdrawal time and polyp, adenoma, and CSSP detection.
Polyp and adenoma detection rates were highest among endoscopists with a 9-minute median normal withdrawal time, while CSSP detection peaked at 8 to 9 minutes. Incidence rate ratios for adenoma and CSSP detection increased with each minute of normal withdrawal time above 6 minutes, with maximum benefit at 9 minutes for adenomas (1.50, 95% CI 1.21–1.85) and CSSPs (1.77, 95% CI 1.15–2.72). When we modeled a minimum withdrawal time of 9 minutes, adenomas and CSSPs were predicted to be detected in 302 (3.8%) and 191 (2.4%) additional patients, respectively. The increase in detection was most striking for CSSPs, with a relative increase of nearly 30%.
A withdrawal time of 9 minutes resulted in a statistically significant increase in adenoma and serrated polyp detection. Colonoscopy quality may improve with a median normal withdrawal time benchmark of 9 minutes.
PMCID: PMC4082336  PMID: 24394752
Colon cancer; cancer early detection; quality indicators; cancer prevention
19.  Insight in glioma susceptibility through an analysis of 6p22.3, 12p13.33-12.1, 17q22-23.2 and 18q23 SNP genotypes in familial and non-familial glioma 
Human genetics  2012;131(9):1507-1517.
The risk of glioma has consistently been shown to be increased two-fold in relatives of patients with primary brain tumors (PBT). A recent genome-wide linkage study of glioma families provided evidence for a disease locus on 17q12-21.32, with the possibility of four additional risk loci at 6p22.3, 12p13.33-12.1, 17q22-23.2, and 18q23.
To identify the underlying genetic variants responsible for the linkage signals, we compared the genotype frequencies of 5,122 SNPs mapping to these five regions in 88 glioma cases with and 1,100 cases without a family history of PBT (discovery study). An additional series of 84 familial and 903 non-familial cases were used to replicate associations.
In the discovery study, 12 SNPs showed significant associations with family history of PBT (P < 0.001). In the replication study, two of the 12 SNPs were confirmed: 12p13.33-12.1 PRMT8 rs17780102 (P = 0.031) and 17q12-21.32 SPOP rs6504618 (P = 0.025). In the combined analysis of discovery and replication studies, the strongest associations were observed at four SNPs: 12p13.33-12.1 PRMT8 rs17780102 (P = 0.0001), SOX5 rs7305773 (P = 0.0001) and STYK1 rs2418087 (P = 0.0003), and 17q12-21.32 SPOP rs6504618 (P = 0.0006). Further, a significant gene-dosage effect on the risk of a family history of PBT was found for these four SNPs in the combined data set (Ptrend < 1.0 × 10−8).
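A gene-dosage (trend) P value across 0, 1 and 2 risk alleles is commonly obtained with the Cochran-Armitage test, sketched below. The abstract does not name the test, so using Cochran-Armitage rather than a regression-based trend test is an assumption. Here the index group is familial glioma cases and the comparison group is non-familial cases. The counts are hypothetical; their totals merely match the discovery-study sizes.

```python
import math
from typing import Sequence, Tuple


def cochran_armitage_trend(cases: Sequence[int], controls: Sequence[int],
                           scores: Sequence[float] = (0, 1, 2)) -> Tuple[float, float]:
    """Cochran-Armitage test for a linear trend in the index-group proportion across genotypes.

    cases[i] / controls[i] are counts for genotype i (e.g. 0, 1, 2 risk alleles);
    "cases" can be any index group, such as familial glioma cases.
    Returns (z statistic, two-sided P from the normal approximation).
    """
    n_i = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    sum_tn = sum(t * n for t, n in zip(scores, n_i))
    u = sum(t * r for t, r in zip(scores, cases)) - R * sum_tn / N
    var_u = (R * S / N ** 2) * (sum(t * t * n for t, n in zip(scores, n_i)) - sum_tn ** 2 / N)
    z = u / math.sqrt(var_u)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical counts by number of risk alleles: familial (88) vs non-familial (1,100) cases.
z, p = cochran_armitage_trend(cases=[20, 40, 28], controls=[450, 480, 170])
print(f"z = {z:.2f}, P_trend = {p:.2g}")
```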
The results support the linkage finding that loci in the 12p13.33-12.1 and 17q12-21.32 regions may contribute to gliomagenesis and suggest potential target genes underlying the linkage signals.
PMCID: PMC3604903  PMID: 22688887
Association; Polymorphisms; Glioma; Family history of primary brain tumor; Linkage analysis
20.  A Unified Framework Integrating Parent-of-Origin Effects for Association Study 
PLoS ONE  2013;8(8):e72208.
Genetic imprinting is the best-known cause of parent-of-origin effects (POEs), whereby a gene is differentially expressed depending on whether an allele was inherited from the mother or the father. Genetic imprinting has been linked to several human disorders, including diabetes, breast cancer, alcoholism, and obesity, and has been shown to be important for normal embryonic development in mammals. Traditional association approaches ignore this phenomenon. In this study, we generalize the natural and orthogonal interactions (NOIA) framework to allow for estimation of both main allelic effects and POEs. We develop a statistical (Stat-POE) model that yields orthogonal estimates of its parameters, including the POEs. We conducted simulation studies for both quantitative and qualitative traits to evaluate the performance of the statistical and functional models with different levels of POEs. Our results showed that the newly proposed Stat-POE model, which ensures orthogonality of variance components if Hardy-Weinberg equilibrium (HWE) holds or the minor and major allele frequencies are equal, had greater power for detecting the main allelic additive effect than a Func-POE model, which codes according to allelic substitutions, for both quantitative and qualitative traits. The power for detecting the POE was the same for the Stat-POE and Func-POE models under HWE for quantitative traits.
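To make "orthogonal estimates including the POEs" concrete, the sketch below builds an illustrative design in which the heterozygote is split by parental origin and checks that its columns are orthogonal when weighted by genotype frequency. The additive and dominance columns are modeled on the one-locus NOIA statistical coding (exact sign and scaling conventions are an assumption), and the ±1/2 parent-of-origin contrast and the equal split of heterozygote frequency between the two parental origins are also assumptions of this sketch. It is not the paper's Stat-POE parameterization, whose stated orthogonality conditions (HWE or equal allele frequencies) may differ; all function names are hypothetical.

```python
def one_locus_noia_stat(p11, p12, p22):
    """Additive and dominance columns modeled on the one-locus NOIA statistical coding.

    Rows: genotypes A1A1, A1A2, A2A2 with frequencies p11, p12, p22 (sum to 1).
    """
    n1 = p12 + 2 * p22                   # mean count of allele A2
    v = p11 + p22 - (p11 - p22) ** 2     # normalizing constant
    add = [-n1, 1 - n1, 2 - n1]
    dom = [-2 * p12 * p22 / v, 4 * p11 * p22 / v, -2 * p11 * p12 / v]
    return add, dom

def poe_design(p11, p12, p22):
    """Hypothetical design with the heterozygote split by parental origin.

    Rows: A1A1, heterozygote with A2 from the father, heterozygote with A2
    from the mother, A2A2. Columns: intercept, additive, dominance, POE.
    """
    add, dom = one_locus_noia_stat(p11, p12, p22)
    rows = [
        [1.0, add[0], dom[0], 0.0],
        [1.0, add[1], dom[1], 0.5],
        [1.0, add[1], dom[1], -0.5],
        [1.0, add[2], dom[2], 0.0],
    ]
    # Assumption: each parental origin accounts for half of the heterozygotes
    weights = [p11, p12 / 2, p12 / 2, p22]
    return rows, weights

def weighted_gram(rows, weights):
    """Frequency-weighted inner products between all pairs of columns."""
    k = len(rows[0])
    return [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
```

Off-diagonal entries of `weighted_gram` near zero mean the intercept, additive, dominance, and POE estimates do not depend on one another, which is the property an orthogonal model is designed to provide.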
PMCID: PMC3753359  PMID: 23991061
21.  Genome-wide association studies identify several new loci associated with pigmentation traits and skin cancer risk in European Americans 
Human Molecular Genetics  2013;22(14):2948-2959.
Aiming to identify novel genetic loci for pigmentation and skin cancer, we conducted a series of genome-wide association studies on hair color, eye color, number of sunburns, tanning ability and number of non-melanoma skin cancers (NMSCs) among 10 183 European Americans in the discovery stage and 4504 European Americans in the replication stage (for eye color, 3871 males in the discovery stage and 2496 males in the replication stage). We targeted novel chromosomal regions, in addition to known ones, for replication. As a result, we identified a new region downstream of the EDNRB gene on 13q22 associated with hair color; the strongest association was with the single-nucleotide polymorphism (SNP) rs975739 (P = 2.4 × 10−14; P = 5.4 × 10−9 in the discovery set and P = 1.2 × 10−6 in the replication set). Using blue, intermediate (including green) and brown eye colors as co-dominant outcomes, we identified the SNP rs3002288 in VASH2 on 1q32.3 associated with brown eye color (P = 7.0 × 10−8; P = 5.3 × 10−5 in the discovery set and P = 0.02 in the replication set). Additionally, we identified a significant interaction between the SNPs rs7173419 and rs12913832 in the OCA2 gene region on brown eye color (P-value for interaction = 3.8 × 10−3). As for the number of NMSCs, we identified two independent SNPs on chromosome 6 and one SNP on chromosome 14: rs12203592 in IRF4 (P = 7.2 × 10−14; P = 1.8 × 10−8 in the discovery set and P = 6.7 × 10−7 in the replication set), rs12202284 between IRF4 and EXOC2 (P = 5.0 × 10−8; P = 6.6 × 10−7 in the discovery set and P = 3.0 × 10−3 in the replication set) and rs8015138 upstream of GNG2 (P = 6.6 × 10−8; P = 5.3 × 10−7 in the discovery set and P = 0.01 in the replication set).
PMCID: PMC3690971  PMID: 23548203
22.  Natural and Orthogonal Interaction framework for modeling gene-environment interactions with application to lung cancer 
Human heredity  2012;73(4):185-194.
We aimed to extend the natural and orthogonal interaction (NOIA) framework, developed for modeling gene-gene interactions in the analysis of quantitative traits, to allow for reduced genetic models, dichotomous traits, and gene-environment interactions. We evaluated the performance of the NOIA statistical models using simulated data and lung cancer data.
The NOIA statistical models are developed for the additive, dominant, and recessive genetic models and for a binary environmental exposure. Using the Kronecker product rule, a NOIA statistical model is built to model gene-environment interactions. By treating the genotypic values as the logarithm of the odds, the NOIA statistical models are extended to the analysis of case-control data.
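The Kronecker product rule can be illustrated for the additive genetic model and a binary exposure. In this minimal sketch, each factor is coded "statistically" by centering it on its population mean (allele count minus its mean; exposure indicator minus the exposure frequency `q`), and the gene-environment design is the Kronecker product of the two codings. The centering scheme and the assumption that genotype and exposure are independent in the population (so joint frequencies are products) are assumptions of this sketch, and the function names are hypothetical. For case-control data the resulting columns would enter a logistic model on the log-odds scale, which is not shown.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def additive_stat_coding(p0, p1, p2):
    """Rows: 0, 1, 2 copies of the coded allele; columns: intercept, additive."""
    mean = p1 + 2 * p2  # population mean allele count
    return [[1.0, g - mean] for g in (0, 1, 2)]

def binary_env_coding(q):
    """Rows: unexposed, exposed; columns: intercept, exposure (q = P(exposed))."""
    return [[1.0, -q], [1.0, 1.0 - q]]

def gxe_design(p0, p1, p2, q):
    """Rows ordered (genotype, exposure); columns: intercept, E, A, A x E."""
    return kron(additive_stat_coding(p0, p1, p2), binary_env_coding(q))

# Hypothetical genotype frequencies (HWE with allele frequency 0.3) and q = 0.3
design = gxe_design(0.49, 0.42, 0.09, 0.3)
for row in design:
    print([round(v, 3) for v in row])
```

Because both codings are centered, every column is orthogonal to every other column under the product weights P(genotype) × P(exposure), so the main-effect and interaction estimates do not change when other terms are added or dropped. Dominant or recessive reduced models would replace the additive column with a centered indicator of carrying one or two copies, respectively.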
Our simulations showed that, in most of the scenarios we simulated, the power for testing associations while allowing for interaction is much higher with the statistical model than with the functional models. When applied to the lung cancer data, the NOIA statistical model gave much smaller P-values than the functional models for either the main effects or the SNP-smoking interactions of some of the SNPs tested.
The NOIA statistical models are usually more powerful than the functional models in detecting main effects and interaction effects for both quantitative traits and binary traits.
PMCID: PMC3534768  PMID: 22889990
Statistical power; Genetic association studies; Case-control association analysis; Gene-environment interaction; Environmental risk factor; Association mapping; Orthogonal modeling
23.  Empirical Hierarchical Bayes Approach to Gene-Environment Interactions: Development and Application to Genome-Wide Association Studies of Lung Cancer in TRICL 
Genetic epidemiology  2013;37(6):551-559.
The analysis of gene-environment (GxE) interactions remains one of the greatest challenges in the post-genome-wide-association-studies (GWAS) era. Recent methods constitute a compromise between the robust but underpowered case-control method and the powerful case-only method. Inferences from the latter are biased when the assumption of gene-environment (G-E) independence fails. We propose a novel empirical hierarchical Bayes approach to GxE interaction (EHB-GE), which benefits from greater power while accounting for population-based G-E dependence. Building on Lewinger et al.'s ([2007] Genet Epidemiol 31:871-882) hierarchical Bayes prioritization approach, the method uses posterior G-E association estimates in controls, informed by G-E information across the genome, to adjust the resulting test statistics for G-E dependence. These posterior estimates are subtracted from the corresponding G-E association coefficients within cases.
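The core step (subtracting a genome-wide-informed posterior estimate of the control G-E association from the case G-E coefficient) can be sketched as follows. This sketch substitutes a simple normal-normal empirical Bayes model, with method-of-moments hyperparameters, for the hierarchical Bayes prioritization model of Lewinger et al. that EHB-GE builds on; its variance formula also ignores uncertainty in the estimated hyperparameters. The per-SNP inputs are assumed to be log odds ratios for the G-E association with their standard errors, and the function names are hypothetical.

```python
import math
import statistics

def shrink_control_ge(beta_ctrl, se_ctrl):
    """Normal-normal empirical Bayes shrinkage of per-SNP control G-E log-ORs.

    Assumed model: beta_j ~ N(theta_j, se_j^2) and theta_j ~ N(mu, tau^2),
    with mu and tau^2 estimated across all SNPs (needs at least two SNPs).
    Returns posterior means and posterior variances of theta_j.
    """
    variances = [s * s for s in se_ctrl]
    mu = statistics.fmean(beta_ctrl)
    # Method-of-moments between-SNP variance, truncated at zero
    tau2 = max(statistics.variance(beta_ctrl) - statistics.fmean(variances), 0.0)
    post_mean, post_var = [], []
    for b, v in zip(beta_ctrl, variances):
        weight = tau2 / (tau2 + v)  # weight on the SNP's own estimate
        post_mean.append(mu + weight * (b - mu))
        post_var.append(weight * v)
    return post_mean, post_var

def ehb_ge_like_z(beta_case, se_case, beta_ctrl, se_ctrl):
    """Case G-E coefficient minus the shrunken control estimate, as a z-score."""
    post_mean, post_var = shrink_control_ge(beta_ctrl, se_ctrl)
    return [
        (bc - pm) / math.sqrt(sc * sc + pv)
        for bc, sc, pm, pv in zip(beta_case, se_case, post_mean, post_var)
    ]
```

When control G-E associations are weak and homogeneous across the genome, the shrinkage pulls every SNP toward the genome-wide mean and the statistic behaves much like a case-only test; when strong, heterogeneous G-E associations exist, each SNP keeps more of its own control estimate, which is the kind of adjustment for G-E dependence the abstract describes.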
We compared EHB-GE with rival methods using simulation. EHB-GE has similar or greater rank power to detect GxE interactions in the presence of large numbers of G-E associations with weak to strong effects, or of only a few such associations with large effects. When there are no or only a few weak G-E associations, Murcray et al.'s method ([2009] Am J Epidemiol 169:219-226) better identifies markers with low GxE interaction effects. We applied EHB-GE and competing methods to four lung cancer case-control GWAS from the TRICL/ILCCO consortium with smoking as the environmental factor. Genes identified by the EHB-GE approach are reasonable candidates, suggesting that the method is useful.
PMCID: PMC4082246  PMID: 23893921
population G-E association; GWAS; rank power; lung cancer
24.  Obesity-related genetic variants, human pigmentation, and risk of melanoma 
Human genetics  2013;132(7):793-801.
Previous biological studies showed evidence of a genetic link between obesity and pigmentation in both animal models and humans. Our study investigated the individual and joint associations between obesity-related single nucleotide polymorphisms (SNPs) and both human pigmentation and risk of melanoma. Eight obesity-related SNPs in the FTO, MAP2K5, NEGR1, FLJ35779, ETV5, CADM2, and NUDT3 genes were nominally significantly associated with hair color among 5,876 individuals of European ancestry. The genetic score combining 35 independent obesity-risk loci was significantly associated with darker hair color (beta-coefficient per ten alleles = 0.12, P = 4 × 10−5). However, single SNPs and genetic scores showed no significant association with tanning ability. We further examined the SNPs at the FTO locus for their associations with pigmentation and risk of melanoma. Among the 783 SNPs in the FTO gene with an imputation R-square quality metric > 0.8 using the 1000 Genomes data set, ten and three independent SNPs were significantly associated with hair color and tanning ability, respectively. Moreover, five independent FTO SNPs showed nominally significant associations with risk of melanoma in 1,804 cases and 1,026 controls. However, none of them was associated with obesity or in linkage disequilibrium with obesity-related variants. The FTO locus may confer variation in human pigmentation and risk of melanoma, possibly independently of its effect on obesity.
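A "beta-coefficient per ten alleles" comes from regressing the trait on a genetic score rescaled by 10. The minimal sketch below assumes an unweighted score (the total count of obesity-risk alleles, 0/1/2 per locus) and a hair-color outcome already coded on a numeric scale, fitted by ordinary least squares without covariates; the study may have used weighted scores, a different outcome coding, and adjustment variables. Function names are hypothetical.

```python
def genetic_score(genotypes):
    """Unweighted score: total risk-allele count (0/1/2 per locus) per person."""
    return [sum(row) for row in genotypes]

def slope_per_ten_alleles(scores, trait):
    """OLS slope of a numeric trait on score / 10 (no covariates)."""
    x = [s / 10 for s in scores]
    mean_x = sum(x) / len(x)
    mean_y = sum(trait) / len(trait)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, trait))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical genotypes for three people at four loci
scores = genetic_score([[0, 1, 2, 1], [2, 2, 1, 0], [1, 1, 1, 1]])
print(scores)
```

Rescaling by 10 changes only the reporting unit: the slope per ten alleles is ten times the slope per single allele, which makes small per-allele effects easier to read.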
PMCID: PMC3683389  PMID: 23539184
obesity; pigmentation; melanoma; genetic association; FTO gene
25.  Simplifying Clinical Use of the Genetic Risk Prediction Model BRCAPRO 
Health care providers need simple tools to identify patients at genetic risk of breast and ovarian cancers. Genetic risk prediction models such as BRCAPRO could fill this gap if incorporated into Electronic Medical Records or other Health Information Technology solutions. However, BRCAPRO requires potentially extensive information on the counselee and her family history. Thus, it may be useful to provide simplified version(s) of BRCAPRO for use in settings that do not require exhaustive genetic counseling.
We explore four simplified versions of BRCAPRO, each using less complete information than the original model. BRCAPROLYTE uses information on affected relatives only up to second degree. It is in clinical use but has not been evaluated. BRCAPROLYTE-Plus extends BRCAPROLYTE by imputing the ages of unaffected relatives. BRCAPROLYTE-Simple reduces the data collection burden associated with BRCAPROLYTE and BRCAPROLYTE-Plus by not collecting the family structure. BRCAPRO-1Degree only uses first-degree affected relatives. We use data on 2713 individuals from seven sites of the Cancer Genetics Network and MD Anderson Cancer Center to compare these simplified tools with the Family History Assessment Tool (FHAT) and BRCAPRO, with the latter serving as the benchmark.
BRCAPROLYTE retains high discrimination; however, because it ignores information on unaffected relatives, it overestimates carrier probabilities. BRCAPROLYTE-Plus and BRCAPROLYTE-Simple are better calibrated than BRCAPROLYTE, so they have higher specificity for similar values of sensitivity. BRCAPROLYTE-Plus performs slightly better than BRCAPROLYTE-Simple. The areas under the ROC curve (AUCs) are 0.783 (BRCAPRO), 0.763 (BRCAPROLYTE), 0.772 (BRCAPROLYTE-Plus), 0.773 (BRCAPROLYTE-Simple), 0.728 (BRCAPRO-1Degree), and 0.745 (FHAT). The simpler versions, especially BRCAPROLYTE-Plus and BRCAPROLYTE-Simple, lead to only a modest loss in overall discrimination compared with BRCAPRO in this dataset.
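The comparison above rests on two different properties: discrimination (the AUC) and calibration (whether predicted carrier probabilities are too high or too low overall). As a minimal sketch, assuming predicted carrier probabilities and observed 0/1 carrier status for each counselee, the AUC can be computed as the probability that a randomly chosen carrier receives a higher predicted probability than a randomly chosen non-carrier, and overall calibration as the ratio of observed to expected carriers. Function names are hypothetical; the quadratic pairwise loop is chosen for clarity over speed.

```python
def roc_auc(probs, labels):
    """AUC: chance a random carrier outranks a random non-carrier (ties count 1/2)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(pos) * len(neg))

def observed_to_expected(probs, labels):
    """Observed carriers divided by the sum of predicted carrier probabilities.

    A ratio below 1 means the model predicts more carriers than are observed,
    i.e. it overestimates carrier probabilities.
    """
    return sum(labels) / sum(probs)
```

The two measures can diverge: multiplying every prediction by the same factor leaves the AUC unchanged but moves the observed-to-expected ratio, which is how a model like BRCAPROLYTE can retain high discrimination while overestimating carrier probabilities.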
Simplified implementations of BRCAPRO can be used for genetic risk prediction in settings where collection of complete pedigree information is impractical.
PMCID: PMC3699331  PMID: 23690142

