We consider semiparametric transition measurement error models for
longitudinal data, where one of the covariates is measured with error in
transition models, and no distributional assumption is made for the underlying
unobserved covariate. An estimating equation approach based on the pseudo
conditional score method is proposed. We show the resulting estimators of the
regression coefficients are consistent and asymptotically normal. We also
discuss the issue of efficiency loss. Simulation studies are conducted to
examine the finite-sample performance of our estimators. The longitudinal AIDS
Costs and Services Utilization Survey data are analyzed for illustration.
Asymptotic efficiency; Conditional score method; Functional modeling; Measurement error; Longitudinal data; Transition models
Asthma exacerbation and other respiratory symptoms are associated with exposure to air pollution. Since environment affects gene methylation, it is hypothesized that asthmatic responses to pollution are mediated through methylation.
Materials & methods
We study the possibility that airborne particulate matter affects gene methylation in the asthma pathway. We measured methylation array data at clinic visits of 141 subjects from the Normative Aging Study. Black carbon and sulfate measures from a central monitoring site were recorded and 30-day averages were calculated for each clinic visit. Gene-specific methylation scores were calculated for the genes in the asthma pathway, and the association between the methylation in the asthma pathway and the pollution measures was analyzed using sparse Canonical Correlation Analysis.
The analysis found that exposures to black carbon and sulfate were significantly associated with the methylation pattern in the asthma pathway (p-values 0.05 and 0.02, respectively). Specific genes that contributed to this association were identified.
These results suggest that the effect of air pollution on asthmatic and respiratory responses may be mediated through gene methylation.
sulfate; black carbon; epigenetics; gene-specific methylation scores; pathway analysis
With the development of massively parallel sequencing technologies, there is a substantial need for developing powerful rare variant association tests. Common approaches include burden and non-burden tests. Burden tests assume all rare variants in the target region have effects on the phenotype in the same direction and of similar magnitude. The recently proposed sequence kernel association test (SKAT) (Wu, M. C., and others, 2011. Rare-variant association testing for sequencing data with the SKAT. The American Journal of Human Genetics 89, 82–93), an extension of the C-alpha test (Neale, B. M., and others, 2011. Testing for an unusual distribution of rare variants. PLoS Genetics 7, 161–165), provides a robust test that is particularly powerful in the presence of protective and deleterious variants and null variants, but is less powerful than burden tests when a large number of variants in a region are causal and in the same direction. As the underlying biological mechanisms are unknown in practice and vary from one gene to another across the genome, it is of substantial practical interest to develop a test that is optimal for both scenarios. In this paper, we propose a class of tests that include burden tests and SKAT as special cases, and derive an optimal test within this class that maximizes power. We show that this optimal test outperforms burden tests and SKAT in a wide range of scenarios. The results are illustrated using simulation studies and triglyceride data from the Dallas Heart Study. In addition, we have derived a sample size/power calculation formula for SKAT with a new family of kernels to facilitate designing new sequence association studies.
Burden tests; Correlated effects; Kernel association test; Rare variants; Score test
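To make the contrast between the two test types in the abstract above concrete, the following is a minimal sketch, not the authors' implementation. It assumes an intercept-only null model, a weight vector `w` applied directly to the per-variant scores, and a one-parameter family indexed by `rho` that interpolates between a SKAT-type statistic (`rho = 0`) and a burden statistic (`rho = 1`); the function names and the exact form of the family are illustrative assumptions, and no p-value calculation is attempted.

```python
import numpy as np

def score_vector(G, y):
    """Per-variant score statistics under an intercept-only null model (assumption)."""
    resid = y - y.mean()
    return G.T @ resid                      # one score per variant

def burden_stat(G, y, w):
    """Burden-type statistic: square of the weighted sum of variant scores."""
    return float((w @ score_vector(G, y)) ** 2)

def skat_stat(G, y, w):
    """SKAT-type statistic: sum of squared weighted variant scores."""
    return float(np.sum((w * score_vector(G, y)) ** 2))

def combined_stat(G, y, w, rho):
    """Hypothetical family: rho = 0 gives the SKAT-type form, rho = 1 the burden form."""
    return (1.0 - rho) * skat_stat(G, y, w) + rho * burden_stat(G, y, w)

# Toy data: two variants with opposite effects on the phenotype.
G = np.array([[1, 0], [0, 1], [0, 0], [0, 0]], dtype=float)
y = np.array([1.0, -1.0, 0.0, 0.0])
w = np.ones(2)

print(burden_stat(G, y, w))   # opposite-direction scores cancel in the sum
print(skat_stat(G, y, w))     # squared scores do not cancel
```

In this toy example the two variant scores have opposite signs, so they cancel in the burden statistic but not in the SKAT-type statistic, which is the behavior the abstract describes for regions with both protective and deleterious variants.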
In recent years, genome-wide association studies (GWAS) and gene-expression profiling have generated a large number of valuable datasets for assessing how genetic variations are related to disease outcomes. With such datasets, it is often of interest to assess the overall effect of a set of genetic markers, assembled based on biological knowledge. Genetic marker-set analyses have been advocated as more reliable and powerful approaches compared with the traditional marginal approaches (Curtis and others, 2005. Pathways to the analysis of microarray data. TRENDS in Biotechnology 23, 429–435; Efroni and others, 2007. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS One 2, 425). Procedures for testing the overall effect of a marker-set have been actively studied in recent years. For example, score tests derived under an Empirical Bayes (EB) framework (Liu and others, 2007. Semiparametric regression of multidimensional genetic pathway data: least-squares kernel machines and linear mixed models. Biometrics 63, 1079–1088; Liu and others, 2008. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC bioinformatics 9, 292–2; Wu and others, 2010. Powerful SNP-set analysis for case-control genome-wide association studies. American Journal of Human Genetics 86, 929) have been proposed as powerful alternatives to the standard Rao score test (Rao, 1948. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 44, 50–57). The advantages of these EB-based tests are most apparent when the markers are correlated, due to the reduction in the degrees of freedom. In this paper, we propose an adaptive score test which up- or down-weights the contributions from each member of the marker-set based on the Z-scores of their effects. Such an adaptive procedure gains power over the existing procedures when the signal is sparse and the correlation among the markers is weak. By combining evidence from both the EB-based score test and the adaptive test, we further construct an omnibus test that attains good power in most settings. The null distributions of the proposed test statistics can be approximated well either via simple perturbation procedures or via distributional approximations. Through extensive simulation studies, we demonstrate that the proposed procedures perform well in finite samples. We apply the tests to a breast cancer genetic study to assess the overall effect of the FGFR2 gene on breast cancer risk.
Adaptive procedures; Empirical Bayes; GWAS; Pathway analysis; Score test; SNP sets
Chronic exposure to arsenic in drinking water is associated with increased risk of type 2 diabetes mellitus (T2DM) but the underlying molecular mechanism remains unclear.
This study evaluated the interaction between single nucleotide polymorphisms (SNPs) in genes associated with diabetes and arsenic exposure in drinking water on the risk of developing T2DM.
In 2009–2011, we conducted a follow-up study of 957 Bangladeshi adults who participated in a case-control study of arsenic-induced skin lesions in 2001–2003. Logistic regression models were used to evaluate the association between 38 SNPs in 18 genes and risk of T2DM measured at follow-up. T2DM was defined as having a blood hemoglobin A1C level greater than or equal to 6.5% at follow-up. Arsenic exposure was characterized by drinking water samples collected from participants' tubewells. False discovery rates were applied in the analysis to control for multiple comparisons.
Median arsenic levels in 2001–2003 were higher among diabetic participants compared with non-diabetic ones (71.6 µg/L vs. 12.5 µg/L, p-value <0.001). Three SNPs in ADAMTS9 were nominally associated with increased risk of T2DM (rs17070905, Odds Ratio (OR) = 2.30, 95% confidence interval (CI) 1.17–4.50; rs17070967, OR = 2.02, 95%CI 1.00–4.06; rs6766801, OR = 2.33, 95%CI 1.18–4.60), but these associations did not reach statistical significance after adjusting for multiple comparisons. A significant interaction between arsenic and NOTCH2 (rs699780) was observed that increased the risk of T2DM (p for interaction = 0.003; q-value = 0.021). Further restricted analysis among participants exposed to water arsenic of less than 148 µg/L showed consistent results for interaction between the NOTCH2 variant and arsenic exposure on T2DM (p for interaction = 0.048; q-value = 0.004).
These findings suggest that genetic variation in NOTCH2 increased susceptibility to T2DM among people exposed to inorganic arsenic. Additionally, genetic variants in ADAMTS9 may increase the risk of T2DM.
Gene set analyses have become increasingly important in genomic research, as many complex diseases arise from the joint alteration of numerous genes. Genes often act in concert as a functional repertoire, e.g., a biological pathway or network, and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to exploit this important feature of a gene set to improve statistical power in gene set analyses.
We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA).
We develop a gene set analysis method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and the global test, in both simulations and a diabetes microarray dataset.
We consider frailty models with additive semiparametric covariate effects
for clustered failure time data. We propose a doubly penalized partial
likelihood (DPPL) procedure to estimate the nonparametric functions using
smoothing splines. We show that the DPPL estimators can be obtained from
fitting an augmented working frailty model with parametric covariate effects,
with the nonparametric functions being estimated as linear combinations of
fixed and random effects, and the smoothing parameters being estimated as extra
variance components. This approach allows us to conveniently estimate all model
components within a unified frailty model framework. We evaluate the finite
sample performance of the proposed method via a simulation study, and apply the
method to analyze data from a study of sexually transmitted infections.
Doubly penalized partial likelihood; smoothing spline; Gaussian frailty; sexually transmitted disease; Smoothing parameter; Variance components
Increasing evidence suggests that single nucleotide polymorphisms (SNPs) associated with complex traits are more likely to be expression quantitative trait loci (eQTLs). Incorporating eQTL information hence has potential to increase power of genome-wide association studies (GWAS). In this paper, we propose using eQTL weights as prior information in SNP based association tests to improve test power while maintaining control of the family-wise error rate (FWER) or the false discovery rate (FDR). We apply the proposed methods to the analysis of a GWAS for childhood asthma consisting of 1296 unrelated individuals with German ancestry. The results confirm that eQTLs are enriched for previously reported asthma SNPs. We also find that some SNPs are insignificant using procedures without eQTL weighting, but become significant using eQTL-weighted Bonferroni or Benjamini–Hochberg procedures, while controlling the same FWER or FDR level. Some of these SNPs have been reported by independent studies in recent literature. The results suggest that the eQTL-weighted procedures provide a promising approach for improving power of GWAS. We also report the results of our methods applied to the large-scale European GABRIEL consortium data.
asthma; family-wise error rate; false discovery rate; eQTL; genome-wide association study; weighted hypothesis test
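The weighted Bonferroni and weighted Benjamini–Hochberg procedures named in the abstract above can be sketched as follows. This is a generic illustration of weighted multiple testing, not the authors' code: it assumes strictly positive weights that are rescaled to average 1, and the weights in the example are hypothetical stand-ins for eQTL-derived weights, whose construction the abstract does not specify.

```python
import numpy as np

def normalize_weights(w):
    """Rescale positive weights so that they average to 1."""
    w = np.asarray(w, dtype=float)
    return w * len(w) / w.sum()

def weighted_bonferroni(p, w, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha * w_i / m (targets FWER at level alpha)."""
    p = np.asarray(p, dtype=float)
    return p <= alpha * normalize_weights(w) / len(p)

def weighted_bh(p, w, alpha=0.05):
    """Apply the Benjamini-Hochberg step-up rule to the weighted p-values p_i / w_i."""
    q = np.asarray(p, dtype=float) / normalize_weights(w)
    m = len(q)
    order = np.argsort(q)
    passed = q[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])   # largest rank meeting the step-up bound
        reject[order[:k + 1]] = True
    return reject

# Hypothetical p-values; the first SNP is up-weighted as if it were an eQTL.
p = [0.02, 0.04, 0.5, 0.9]
equal = [1.0, 1.0, 1.0, 1.0]
eqtl_like = [2.5, 0.5, 0.5, 0.5]

print(weighted_bonferroni(p, equal), weighted_bonferroni(p, eqtl_like))
print(weighted_bh(p, equal), weighted_bh(p, eqtl_like))
```

With equal weights neither procedure rejects anything in this toy example, while up-weighting the first hypothesis lets both procedures reject it at the same nominal level, which mirrors the pattern the abstract reports for some SNPs.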
Genome-wide association studies have identified variants on chromosome 15q25.1 that increase the risks of both lung cancer and nicotine dependence and associated smoking behavior. However, there remains debate as to whether the association with lung cancer is direct or is mediated by pathways related to smoking behavior. Here, the authors apply a novel method for mediation analysis, allowing for gene-environment interaction, to a lung cancer case-control study (1992–2004) conducted at Massachusetts General Hospital using 2 single nucleotide polymorphisms, rs8034191 and rs1051730, on 15q25.1. The results are validated using data from 3 other lung cancer studies. Tests for additive interaction (P = 2 × 10^−10 and P = 1 × 10^−9) and multiplicative interaction (P = 0.01 and P = 0.01) were significant. Pooled analyses yielded a direct-effect odds ratio of 1.26 (95% confidence interval (CI): 1.19, 1.33; P = 2 × 10^−15) for rs8034191 and an indirect-effect odds ratio of 1.01 (95% CI: 1.00, 1.01; P = 0.09); the proportion of increased risk mediated by smoking was 3.2%. For rs1051730, direct- and indirect-effect odds ratios were 1.26 (95% CI: 1.19, 1.33; P = 1 × 10^−15) and 1.00 (95% CI: 0.99, 1.01; P = 0.22), respectively, with a proportion mediated of 2.3%. Adjustment for measurement error in smoking behavior allowing up to 75% measurement error increased the proportions mediated to 12.5% and 9.2%, respectively. These analyses indicate that the association of the variants with lung cancer operates primarily through other pathways.
gene-environment interaction; lung neoplasms; mediation; pathway analysis; smoking
Genomewide association studies (GWAS) routinely apply principal component analysis (PCA) to infer population structure within a sample to correct for confounding due to ancestry. GWAS implementation of PCA uses tens of thousands of SNPs to infer structure, despite the fact that only a small fraction of such SNPs provides useful information on ancestry. The identification of this reduced set of ancestry-informative markers (AIMs) from a GWAS has practical value; for example, researchers can genotype the AIM set to correct for potential confounding due to ancestry in follow-up studies that utilize custom SNP or sequencing technology. We propose a novel technique to identify AIMs from genomewide SNP data using sparse principal component analysis (sparse PCA). The procedure uses penalized regression methods to identify those SNPs in a genomewide panel that significantly contribute to the principal components while encouraging SNPs that provide negligible loadings to vanish from the analysis. We found that sparse PCA leads to negligible loss of ancestry information compared to traditional PCA analysis of genomewide SNP data. We further demonstrate the value of sparse PCA for AIM selection using real data from the International HapMap Project and a genomewide study of Inflammatory Bowel Disease. We have implemented our approach in open-source R software for public use.
Ancestry-informative markers; Genome-wide association studies; Population stratification; Principal component analysis; Variable selection
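The idea in the abstract above, that SNPs with negligible loadings are driven to exactly zero, can be illustrated with a soft-thresholded power iteration for the first principal component. This is a simple stand-in technique chosen for brevity, not the penalized-regression procedure or the R software the abstract refers to; the function names, the penalty `lam`, and the toy genotype matrix are all assumptions.

```python
import numpy as np

def soft_threshold(a, lam):
    """Shrink each entry toward zero by lam; entries smaller than lam become zero."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_pc1(X, lam, n_iter=100):
    """First sparse loading vector via alternating updates with soft thresholding."""
    Xc = X - X.mean(axis=0)                    # center each SNP column
    v = np.linalg.svd(Xc, full_matrices=False)[2][0]   # start from the ordinary PC1
    for _ in range(n_iter):
        u = Xc @ v
        norm_u = np.linalg.norm(u)
        if norm_u == 0.0:
            break
        u = u / norm_u
        v = soft_threshold(Xc.T @ u, lam)      # sparsify the loadings
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            break
        v = v / norm_v
    return v

# Toy genotypes: the first two SNPs separate two groups, the third does not.
X = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0],
              [2, 2, 1], [2, 2, 0], [2, 2, 1]], dtype=float)
loadings = sparse_pc1(X, lam=1.0)
print(np.nonzero(loadings)[0])                 # indices of the retained SNPs
```

In this toy example the SNPs with nonzero loadings play the role of candidate ancestry-informative markers; in practice the penalty would need to be tuned, and further components would be extracted after deflation.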
In the increasing number of sequencing studies aimed at identifying rare variants associated with complex traits, the power of the test can be improved by guided sampling procedures. We confirm both analytically and numerically that sampling individuals with extreme phenotypes can enrich the presence of causal rare variants and can therefore lead to an increase in power compared to random sampling. While application of traditional rare variant association tests to these extreme phenotype samples requires dichotomizing the continuous phenotypes before analysis, the dichotomization procedure can decrease the power by reducing the information in the phenotypes. To avoid this, we propose a novel statistical method based on optimal SKAT (SKAT-O) that allows us to test for rare variant effects using continuous phenotypes in the analysis of extreme phenotype samples. The increase in power of this method is demonstrated through simulation of a wide range of scenarios as well as in the triglyceride data of the Dallas Heart Study.
Complex trait associations; Selective sampling; Rare genetic variants; Extreme phenotype sampling
Urinary metabolites of the tobacco-specific nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK), 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol (NNAL) and its glucuronides, termed total NNAL, have recently been shown to be good predictors of lung cancer risk, years prior to diagnosis. We sought to determine the contribution of several genetic polymorphisms to total NNAL output and inter-individual variability. The study subjects were derived from the Harvard/Massachusetts General Hospital Lung cancer case-control study. We analyzed 87 self-described smokers (35 lung cancer cases and 52 controls), with urine samples collected at time of diagnosis and (1992–1996). We tested 82 tagging SNPs in 16 genes related to the metabolism of NNK to total NNAL. Using weighted case status least squares regression, we tested for the association of each SNP with square-root (sqrt) transformed total NNAL (pmol per mg creatinine), controlling for age, sex, sqrt packyears and sqrt nicotine (ng per mg creatinine). After a sqrt transformation, nicotine significantly predicted a 0.018 (0.014, 0.023) pmol/mg creatinine unit increase in total NNAL for every ng/mg creatinine increase in nicotine at p<10E-16. Three HSD11B1 SNPs and AKR1C4 rs7083869 were significantly associated with decreasing total NNAL levels: HSD11B1 rs2235543 (p= 4.84E-08) and rs3753519 (p= 0.0017) passed multiple testing adjustment at FDR q=1.13E-05 and 0.07 respectively, AKR1C4 rs7083869 (p=0.019) did not, FDR q=0.51. HSD11B1 and AKR1C4 enzymes are carbonyl reductases directly involved in the single step reduction of NNK to NNAL. The HSD11B1 SNPs may be correlated with the functional variant rs13306401 and the AKR1C4 SNP is correlated with the enzyme activity reducing variant rs17134592, L311V.
NNK; NNAL; tobacco specific nitrosamine; genetic polymorphism; HSD11B1
Gastroesophageal reflux symptoms (GERD), higher body mass index (BMI), smoking, and genetic variants in angiogenic pathway genes have been individually associated with increased risk of esophageal adenocarcinoma (EA). However, how angiogenic gene polymorphisms and environmental factors jointly affect EA development remains unclear.
Using a case-only design (n = 335), we examined interaction between 141 functional/tagging angiogenic SNPs and environmental factors (GERD, BMI, smoking) in modulating EA risk. Gene-environment interactions were assessed by a two-step approach. First, we applied random forest (RF) to screen for important SNPs that had either main or interaction effects. Second, we used case-only logistic regression (LR) to assess the effects of gene-environment interactions on EA risk, adjusting for covariates and false-discovery rate (FDR).
RF analyses identified three sets of SNPs (17 SNPs-GERD, 26 SNPs-smoking, and 34 SNPs-BMI) that had the highest importance scores. In subsequent LR analyses, interactions between 3 SNPs (rs2295778 of HIF1AN, rs133376 of TSC2, and rs2519757 of TSC1) and GERD, 2 SNPs (rs2295778 of HIF1AN and rs2296188 of VEGFR1) and smoking, and 7 SNPs (rs2114039 of PDGFRA, rs2296188 of VEGFR1, rs11941492 of VEGFR1, rs3756309 of PDGFRB, rs7324547 of VEGFR1, rs17619601 of VEGFR1, and rs17625898 of VEGFR1) and BMI were significantly associated with EA development (all FDR ≤0.10). Moreover, these interactions tended to show a SNP dose-response effect, with EA risk increasing with the number of combined risk genotypes.
These findings suggest that genetic variations in angiogenic genes may modify EA susceptibility through interactions with environmental factors in a SNP dose-response manner.
Esophageal adenocarcinoma; angiogenesis pathway genes; gene-environment interaction; case-only analysis
Both nurture (environmental factors) and nature (genetic factors) play an important role in human disease etiology. Traditionally, these effects have been thought of as independent. This perspective is ill-informed for non-Mendelian complex disorders, which result from an interaction between genetics and environment. To understand health and disease we must study how nature and nurture interact. Recent advances in human genomics and high-throughput biotechnology make it possible to study large numbers of genetic markers and gene products simultaneously to explore their interactions with environment. The purpose of this review is to discuss design and analytic issues for gene-environment interaction studies in the “-omics” era, with a focus on environmental and genetic epidemiological studies. We present an expanded environmental genomic disease paradigm. We discuss several study design issues for gene-environment interaction studies, including confounding and selection bias, and measurement of exposures and genotypes. We discuss statistical issues in studying gene-environment interactions in different study designs, such as choices of statistical models, assumptions regarding biological factors, and power and sample size considerations, especially in genome-wide gene-environment studies. Future research directions are also discussed.
Gene-environment; Interactions; Expanded environmental genomic disease paradigm; Critical developmental windows; Genome-wide; Epigenetics
We propose in this paper a powerful testing procedure for detecting a gene effect on a continuous outcome in the presence of possible gene-gene interactions (epistasis) in a gene set, e.g. a genetic pathway or network. Traditional tests for this purpose require a large number of degrees of freedom by testing the main effect and all the corresponding interactions under a parametric assumption, and hence suffer from low power. In this paper, we propose a powerful kernel machine based test. Specifically, our test is based on a garrote kernel method and is constructed as a score test. Here, the term garrote refers to an extra nonnegative parameter which is multiplied by the covariate of interest so that our score test can be formulated in terms of this nonnegative parameter. A key feature of the proposed test is that it is flexible and developed for both parametric and nonparametric models within a unified framework, and is more powerful than the standard test by accounting for the correlation among genes and hence often uses far fewer degrees of freedom. We investigate the theoretical properties of the proposed test. We evaluate its finite sample performance using simulation studies, and apply the method to the Michigan prostate cancer gene expression data.
Garrote; Gene-gene interaction; Kernel machine; Mixed models; Restricted maximum likelihood; Score test; Semiparametric regression
Background: Chronic exposure to arsenic is associated with skin lesions. However, it is not known whether reducing arsenic exposure will improve skin lesions.
Objective: We evaluated the association between reduced arsenic exposures and skin lesion recovery over time.
Methods: A follow-up study of 550 individuals was conducted in 2009–2011 on a baseline population of skin lesion cases (n = 900) previously enrolled in Bangladesh in 2001–2003. Arsenic in drinking water and toenails, and skin lesion status and severity were ascertained at baseline and follow-up. We used logistic regression and generalized estimating equation (GEE) models to evaluate the association between log10-transformed arsenic exposure and skin lesion persistence and severity.
Results: During the study period, water arsenic concentrations decreased in this population by 41% overall, and 65 individuals who had skin lesions at baseline had no identifiable lesions at follow-up. In the adjusted models, every log10 decrease in water arsenic and toenail arsenic was associated with 22% [odds ratio (OR) = 1.22; 95% CI: 0.85, 1.78] and 4.5 times (OR = 4.49; 95% CI: 1.94, 11.1) relative increase in skin lesion recovery, respectively. In addition, lower baseline arsenic levels were significantly associated with increased odds of recovery. A log10 decrease in toenail arsenic from baseline to follow-up was also significantly associated with reduced skin lesion severity in cases over time (mean score change of –5.22 units; 95% CI: –8.61, –1.82).
Conclusions: Reducing arsenic exposure increased the odds that an individual with skin lesions would recover or show less severe lesions within 10 years. Reducing arsenic exposure must remain a public health priority in Bangladesh and in other regions affected by arsenic-contaminated water.
arsenic; Bangladesh; change; recovery; skin lesion
We consider nonparametric regression of a scalar outcome on a covariate when the outcome is missing at random (MAR) given the covariate and other observed auxiliary variables. We propose a class of augmented inverse probability weighted (AIPW) kernel estimating equations for nonparametric regression under MAR. We show that AIPW kernel estimators are consistent when the probability that the outcome is observed, that is, the selection probability, is either known by design or estimated under a correctly specified model. In addition, we show that a specific AIPW kernel estimator in our class that employs the fitted values from a model for the conditional mean of the outcome given covariates and auxiliaries is double-robust, that is, it remains consistent if this model is correctly specified even if the selection probabilities are modeled or specified incorrectly. Furthermore, when both models happen to be right, this double-robust estimator attains the smallest possible asymptotic variance of all AIPW kernel estimators and maximally extracts the information in the auxiliary variables. We also describe a simple correction to the AIPW kernel estimating equations that, while preserving double robustness, ensures efficiency improvement over nonaugmented IPW estimation when the selection model is correctly specified, regardless of the validity of the second model used in the augmentation term. We perform simulations to evaluate the finite sample performance of the proposed estimators, and apply the methods to the analysis of the AIDS Costs and Services Utilization Survey data. Technical proofs are available online.
Asymptotics; Augmented kernel estimating equations; Double robustness; Efficiency; Inverse probability weighted kernel estimating equations; Kernel smoothing
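A local-constant version of the AIPW kernel estimator described in the abstract above can be sketched as follows. This is an illustrative simplification, not the authors' estimator: it assumes a Gaussian kernel, takes the selection probabilities `pi` and the fitted outcome-model values `mu` as given inputs, and solves the kernel-weighted estimating equation in closed form; all names are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the normalizing constant cancels in the ratio."""
    return np.exp(-0.5 * u ** 2)

def aipw_kernel_estimate(z0, z, y, r, pi, mu, h):
    """Local-constant AIPW estimate of E[Y | Z = z0].

    z  : covariate values
    y  : outcomes (may be NaN where r == 0)
    r  : 1 if the outcome is observed, 0 otherwise
    pi : selection probabilities P(r = 1 | covariate, auxiliaries)
    mu : fitted values from a model for E[Y | covariate, auxiliaries]
    h  : bandwidth
    """
    k = gaussian_kernel((z - z0) / h)
    y_filled = np.where(r == 1, y, 0.0)        # unobserved outcomes never enter
    # IPW term plus augmentation term built from the outcome model
    pseudo = r / pi * y_filled + (1.0 - r / pi) * mu
    return float(np.sum(k * pseudo) / np.sum(k))

# With complete data and pi = 1 this reduces to a Nadaraya-Watson smoother.
z = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 3.0])
print(aipw_kernel_estimate(1.0, z, y, np.ones(3), np.ones(3), np.zeros(3), h=1.0))
```

The augmentation term is what gives the double-robustness property discussed in the abstract: if `mu` is correct, the pseudo-outcome has the right conditional mean even when `pi` is misspecified, and vice versa.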
In this paper, we develop a powerful test for identifying SNP-sets that are predictive of survival with data from genome-wide association studies (GWAS). We first group typed SNPs into SNP-sets based on genomic features and then apply a score test to assess the overall effect of each SNP-set on the survival outcome through a kernel machine Cox regression framework. This approach uses genetic information from all SNPs in the SNP-set simultaneously and accounts for linkage disequilibrium (LD), leading to a powerful test with reduced degrees of freedom when the typed SNPs are in LD with each other. This type of test also has the advantage of capturing the potentially non-linear effects of the SNPs, SNP-SNP interactions (epistasis), and the joint effects of multiple causal variants. By simulating SNP data based on the LD structure of real genes from the HapMap project, we demonstrate that our proposed test is more powerful than the standard single SNP minimum p-value based test for association studies with censored survival outcomes. We illustrate the proposed test with a real data application.
Cox model; genetic studies; gene-based analysis; kernel machine; multi-locus test; score test; single nucleotide polymorphism
GWAS have greatly facilitated the discovery of risk SNPs associated with complex diseases. Traditional methods analyze SNPs individually and are limited by low power and reproducibility, since correction for multiple comparisons is necessary. Several methods have been proposed based on grouping SNPs into SNP sets using biological knowledge and/or genomic features. In this article, we compare the linear kernel machine based test (LKM) and principal components analysis based approach (PCA) using simulated datasets under the scenarios of 0 to 3 causal SNPs, as well as simple and complex linkage disequilibrium (LD) structures of the simulated regions. Our simulation study demonstrates that both LKM and PCA can control the type I error at the significance level of 0.05. If the causal SNP is in strong LD with the genotyped SNPs, both the PCA with a small number of principal components (PCs) and the LKM with kernel of linear or identical-by-state function are valid tests. However, if the LD structure is complex, such as several LD blocks in the SNP set, or when the causal SNP is not in the LD block in which most of the genotyped SNPs reside, more PCs should be included to capture the information of the causal SNP. Simulation studies also demonstrate the ability of LKM and PCA to combine information from multiple causal SNPs and to provide increased power over individual SNP analysis. We also apply LKM and PCA to analyze two SNP sets extracted from an actual GWAS dataset on non-small cell lung cancer.
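The two kernel choices mentioned above, linear and identical-by-state (IBS), can be written down directly for a genotype matrix. The sketch below assumes additive 0/1/2 genotype coding and uses the commonly cited IBS definition (the number of alleles shared at each SNP, averaged over SNPs and scaled to lie between 0 and 1); the function names are illustrative and the testing step itself is not shown.

```python
import numpy as np

def linear_kernel(G):
    """Linear kernel: inner products of genotype vectors between subjects."""
    return G @ G.T

def ibs_kernel(G):
    """IBS kernel for an n x p matrix of genotypes coded 0/1/2.

    At each SNP two subjects share 2 - |g_ik - g_jk| alleles identical by state;
    the kernel averages this over SNPs and divides by 2 so that K[i, i] = 1.
    """
    n, p = G.shape
    diff = np.abs(G[:, None, :] - G[None, :, :])   # n x n x p pairwise differences
    return (2.0 - diff).sum(axis=2) / (2.0 * p)

# Three subjects, three SNPs; subjects 0 and 2 have identical genotypes.
G = np.array([[0, 1, 2],
              [2, 1, 0],
              [0, 1, 2]])
print(linear_kernel(G))
print(ibs_kernel(G))
```

Either matrix could then serve as the similarity measure between subjects in a kernel machine score test for the SNP set.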
We present new statistical analyses of data arising from a clinical trial designed to compare two-stage dynamic treatment regimes (DTRs) for advanced prostate cancer. The trial protocol mandated that patients were to be initially randomized among four chemotherapies, and that those who responded poorly were to be rerandomized to one of the remaining candidate therapies. The primary aim was to compare the DTRs’ overall success rates, with success defined by the occurrence of successful responses in each of two consecutive courses of the patient’s therapy. Of the one hundred and fifty study participants, forty-seven did not complete their therapy per the algorithm. However, thirty-five of them did so for reasons that precluded further chemotherapy, i.e., toxicity and/or progressive disease. Consequently, rather than comparing the overall success rates of the DTRs in the unrealistic event that these patients had remained on their assigned chemotherapies, we conducted an analysis that compared viable switch rules defined by the per-protocol rules but with the additional provision that patients who developed toxicity or progressive disease switch to a non-prespecified therapeutic or palliative strategy. This modification involved consideration of bivariate per-course outcomes encoding both efficacy and toxicity. We used numerical scores elicited from the trial’s Principal Investigator to quantify the clinical desirability of each bivariate per-course outcome, and defined one endpoint as their average over all courses of treatment. Two other, simpler sets of scores, as well as log survival time, were also used as endpoints. Estimation of each DTR-specific mean score was conducted using inverse probability weighted methods that assumed that missingness in the twelve remaining drop-outs was informative but explainable, in that it depended only on past recorded data. We conducted additional worst-best case analyses to evaluate sensitivity of our findings to extreme departures from the explainable drop-out assumption.
Causal inference; Efficiency; Informative dropout; Inverse probability weighting; Marginal structural models; Optimal regime; Simultaneous confidence intervals
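The inverse probability weighting idea mentioned in the abstract above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the trial's analysis: dropout is assumed to depend on a single discrete summary of past data (the hypothetical `stratum` argument), the probability of remaining observed is estimated by the observed proportion within each stratum, and the normalized (Hajek-type) weighted mean is used.

```python
import numpy as np

def ipw_mean(y, observed, stratum):
    """Inverse-probability-weighted mean of y under explainable dropout.

    y        : outcomes; entries for drop-outs may be NaN and are ignored.
    observed : boolean flags, True if the outcome was recorded.
    stratum  : discrete summary of past recorded data (hypothetical);
               dropout is assumed to depend only on this variable.
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    stratum = np.asarray(stratum)

    weights = np.zeros(len(y))
    for s in np.unique(stratum):
        in_s = stratum == s
        p_obs = observed[in_s].mean()  # estimated P(observed | stratum)
        if p_obs == 0:
            raise ValueError(f"no observed outcomes in stratum {s!r}")
        weights[in_s & observed] = 1.0 / p_obs

    w = weights[observed]
    return float(np.sum(w * y[observed]) / np.sum(w))

# Toy data: half of stratum 0 drops out, stratum 1 is fully observed.
toy_stratum = np.array([0, 0, 0, 0, 1, 1, 1, 1])
toy_observed = np.array([True, True, False, False, True, True, True, True])
toy_y = np.array([1.0, 1.0, np.nan, np.nan, 3.0, 3.0, 3.0, 3.0])
toy_ipw = ipw_mean(toy_y, toy_observed, toy_stratum)
toy_complete_case = float(np.mean(toy_y[toy_observed]))
```

In the toy data the complete-case mean over-represents stratum 1, while the weighted mean gives each observed subject in stratum 0 a weight of 2 to stand in for a similar subject who dropped out.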
There is growing evidence that genomic and proteomic research holds great potential for irrevocably changing the practice of medicine. The ability to identify important genomic and biological markers for risk assessment can have a great impact on public health, from disease prevention to detection to treatment selection. However, the potentially large number of markers and the complexity of the relationship between the markers and the outcome of interest pose a grand challenge in developing accurate risk prediction models. The standard approach to identifying important markers often assesses the marginal effects of individual markers on a phenotype of interest. When multiple markers relate to the phenotype simultaneously via a complex structure, such marginal analysis may not be effective. To overcome such difficulties, we employ a kernel machine Cox regression framework and propose an efficient score test to assess the overall effect of a set of markers, such as genes within a pathway or a network, on survival outcomes. The proposed test has the advantage of capturing potentially non-linear effects without explicitly specifying a particular non-linear functional form. To approximate the null distribution of the score statistic, we propose a simple resampling procedure that can be easily implemented in practice. Numerical studies suggest that the test performs well with respect to both empirical size and power even when the number of variables in a gene set is not small compared to the sample size.
Genetic Association; Gene-set analysis; Genetic Pathways; Kernel Machine; Kernel PCA; Risk Prediction; Score Test; Survival Analysis
Genetic risk factors for sporadic neuroendocrine tumors (NET) are poorly understood. We tested risk associations in patients with sporadic NET and non-cancer controls, using a custom array containing 1536 single-nucleotide polymorphisms (SNPs) in 355 candidate genes. We identified 18 SNPs associated with NET risk at a P-value <0.01 in a discovery set of 261 cases and 319 controls. Two of these SNPs were found to be significantly associated with NET risk in an independent replication set of 235 cases and 113 controls, at a P-value ≤0.05. An SNP in interleukin 12A (IL12A rs2243123), a gene implicated in inflammatory response, replicated with an adjusted odds ratio (aOR; 95% confidence interval) of 1.47 (1.03, 2.11), P-trend = 0.04. A second SNP in defender against cell death 1 (DAD1 rs8005354), a gene that modulates apoptosis, replicated with an aOR of 1.43 (1.02, 2.02), P-trend = 0.04. Consistent with our observations, a pathway analysis performed in the discovery set suggested that genetic variation in inflammatory pathways or apoptosis pathways is associated with NET risk. Our findings support further investigation of the potential role of IL12A and DAD1 in the etiology of NET.
Cardiac abnormalities are among the most common congenital defects observed in individuals with Down syndrome. Considerable research has implicated both folate deficiency and genetic variation in folate pathway genes in birth defects, including both congenital heart defects (CHD) and Down syndrome (DS). Here, we test variation in folate pathway genes for a role in the major DS-associated CHD, atrioventricular septal defect (AVSD). In a group of 121 case families (mother, father, and proband with DS and AVSD) and 122 control families (mother, father, and proband with DS and no CHD), tag SNPs were genotyped in and around five folate pathway genes: 5,10-methylenetetrahydrofolate reductase (MTHFR), methionine synthase (MTR), methionine synthase reductase (MTRR), cystathionine β-synthase (CBS), and the reduced folate carrier (SLC19A1, RFC1). SLC19A1 was found to be associated with AVSD using a multilocus allele-sharing test. Individual SNP tests also showed nominally significant associations, with odds ratios between 1.34 and 3.78, depending on the SNP and genetic model. Interestingly, all marginally significant SNPs in SLC19A1 are in strong linkage disequilibrium (r² ≥ 0.8) with the nonsynonymous coding SNP rs1051266 (c.80A>G), which has previously been associated with nonsyndromic cases of CHD. In addition to SLC19A1, the known functional polymorphism MTHFR c.1298A was over-transmitted to cases with AVSD (P = 0.05) and under-transmitted to controls (P = 0.02). We conclude, therefore, that disruption of the folate pathway contributes to the incidence of AVSD among individuals with DS.
Down syndrome; atrioventricular septal defect; folate; trisomy; congenital heart defects
In the analysis of bivariate correlated failure time data, it is important to measure the strength of association among the correlated failure times. One commonly used measure is the cross ratio. Motivated by Cox’s partial likelihood idea, we propose a novel parametric cross ratio estimator that is a flexible continuous function of both components of the bivariate survival times. We show that the proposed estimator is consistent and asymptotically normal. Its finite sample performance is examined using simulation studies, and it is applied to the Australian twin data.
Correlated survival times; Empirical process theory; Local dependency measure; Pseudo-partial likelihood
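For readers unfamiliar with the cross ratio referred to in the abstract above, its standard definition for a pair of failure times can be written as follows. The notation ($T_1$, $T_2$, $S$, $\lambda_1$, $\theta$) is assumed here for illustration and is not taken from the abstract.

```latex
% Assumed notation: S(t_1, t_2) = P(T_1 > t_1, T_2 > t_2) is the joint survival
% function, and \lambda_1(t_1 \mid \cdot) is the conditional hazard of T_1.
\[
  \theta(t_1, t_2)
  = \frac{\lambda_1(t_1 \mid T_2 = t_2)}{\lambda_1(t_1 \mid T_2 > t_2)}
  = \frac{S(t_1, t_2)\,\dfrac{\partial^2 S(t_1, t_2)}{\partial t_1\,\partial t_2}}
         {\dfrac{\partial S(t_1, t_2)}{\partial t_1}\,
          \dfrac{\partial S(t_1, t_2)}{\partial t_2}} .
\]
```

Under independence of the two failure times, $\theta(t_1, t_2) = 1$ for all $(t_1, t_2)$; values above 1 indicate positive local dependence. An estimator that treats $\theta$ as a function of both $t_1$ and $t_2$, as described in the abstract, allows this dependence to vary over the time plane.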
This study examined the correlation between manganese exposure and manganese concentrations in different biomarkers.
Air measurement data and work histories were used to determine manganese exposure over a workshift and cumulative exposure. Toenail samples (n=49) were collected, as were blood and urine samples before (n=27) and after (urine, n=26; blood, n=24) a workshift.
Toenail manganese, adjusted for age and dietary manganese, was significantly correlated with cumulative exposure in months 7-9, 10-12, and 7-12 before the toenail clipping date, but not in months 1-6. Manganese exposure over a workshift was not correlated with changes in blood or urine manganese.
Toenails appeared to be a valid measure of cumulative manganese exposure 7 to 12 months earlier. Neither the change in blood manganese nor the change in urine manganese appeared to be a suitable indicator of exposure over a typical workshift.
Manganese; biomarkers; toenail manganese; blood manganese; urine manganese; low-level manganese exposure; welding fumes; welders