Summary
Objectives
In breast cancer research, it is important to identify genomic markers associated with prognosis. Multiple microarray gene expression profiling studies have been conducted, searching for prognosis markers. Genomic markers identified from the analysis of single datasets often suffer from a lack of reproducibility because of small sample sizes. Integrative analysis of data from multiple independent studies has a larger sample size and may provide a cost-effective solution.
Methods
We collect four breast cancer prognosis studies with gene expression measurements. An accelerated failure time (AFT) model with an unknown error distribution is adopted to describe survival. An integrative sparse boosting approach is employed for marker selection. The proposed model and boosting approach can effectively accommodate heterogeneity across multiple studies and identify genes with consistent effects.
Results
A simulation study shows that the proposed approach outperforms alternatives, including meta-analysis and intensity approaches, by identifying the majority or all of the true positives while having a low false positive rate. In the analysis of breast cancer data, 44 genes are identified as associated with prognosis. Many of the identified genes have been previously suggested as associated with tumorigenesis and cancer prognosis. The identified genes and corresponding predicted risk scores differ from those using alternative approaches. Monte Carlo-based prediction evaluation suggests that the proposed approach has the best prediction performance.
Conclusions
Integrative analysis may provide an effective way of identifying breast cancer prognosis markers. Markers identified using the integrative sparse boosting analysis have sound biological implications and satisfactory prediction performance.
doi:10.3414/ME11-02-0019
PMCID: PMC3598607
PMID: 22344268
Breast cancer prognosis; Gene Expression; Integrative analysis; Sparse boosting
Krauthammer, Michael | Kong, Yong | Ha, Byung Hak | Evans, Perry | Bacchiocchi, Antonella | McCusker, James P | Cheng, Elaine | Davis, Matthew J | Goh, Gerald | Choi, Murim | Ariyan, Stephan | Narayan, Deepak | Dutton-Regester, Ken | Capatana, Ana | Holman, Edna C | Bosenberg, Marcus | Sznol, Mario | Kluger, Harriet M | Brash, Douglas E | Stern, David F | Materin, Miguel A | Lo, Roger S | Mane, Shrikant | Ma, Shuangge | Kidd, Kenneth K | Hayward, Nicholas K | Lifton, Richard P | Schlessinger, Joseph | Boggon, Titus J | Halaban, Ruth
We characterized the mutational landscape of melanoma, the form of skin cancer with the highest mortality rate, by sequencing the exomes of 147 melanomas. Sun-exposed melanomas had markedly more ultraviolet (UV)-like C>T somatic mutations compared to sun-shielded acral, mucosal and uveal melanomas. Among the newly identified cancer genes was PPP6C, encoding a serine/threonine phosphatase, which harbored mutations that clustered in the active site in 12% of sun-exposed melanomas, exclusively in tumors with mutations in BRAF or NRAS. Notably, we identified a recurrent UV-signature, an activating mutation in RAC1 in 9.2% of sun-exposed melanomas. This activating mutation, the third most frequent in our cohort of sun-exposed melanoma after those of BRAF and NRAS, changes Pro29 to serine (RAC1P29S) in the highly conserved switch I domain. Crystal structures, and biochemical and functional studies of RAC1P29S showed that the alteration releases the conformational restraint conferred by the conserved proline, causes an increased binding of the protein to downstream effectors, and promotes melanocyte proliferation and migration. These findings raise the possibility that pharmacological inhibition of downstream effectors of RAC1 signaling could be of therapeutic benefit.
doi:10.1038/ng.2359
PMCID: PMC3432702
PMID: 22842228
Aschebrook-Kilfoy, Briseis | Zheng, Tongzhang | Foss, Francine | Ma, Shuangge | Han, Xuesong | Lan, Qing | Holford, Theodore | Chen, Yingtai | Leaderer, Brian | Rothman, Nathaniel | Zhang, Yawei
Introduction
Cytokines play a critical role in regulating the immune system. In the tumor microenvironment, they influence survival, proliferation, differentiation, and movement of both tumor and stromal cells, and regulate tumor interactions with the extracellular matrix. Given these biologic properties, there is reason to hypothesize that cytokine activity influences the pathogenesis of non-Hodgkin lymphoma (NHL).
Methods
We investigated the effect of genetic variation in cytokine genes on NHL prognosis and survival by evaluating genetic variation in individual SNPs as well as the combined effect of multiple deleterious genotypes. Survival information from 496 female incident NHL cases diagnosed during 1996–2000 in Connecticut was abstracted from the Connecticut Tumor Registry in 2008. Survival analyses were conducted by comparing Kaplan-Meier curves, and hazard ratios (HR) were computed using Cox proportional hazard models adjusting for demographic and tumor characteristics for genes that were suggested by previous studies to be associated with NHL survival.
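The Kaplan-Meier curves mentioned above can be illustrated with a minimal sketch of the product-limit estimator. The function name `kaplan_meier` and the toy follow-up times are hypothetical; this is not the study's code or data.

```python
def kaplan_meier(time, event):
    """Product-limit estimate: return [(t, S(t))] at each distinct event time.

    time  -- follow-up times
    event -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    pairs = sorted(zip(time, event))
    at_risk = len(pairs)          # subjects still under observation
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends exactly at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed        # events and censored subjects both leave the risk set
    return curve

# Toy data (illustrative only): five subjects, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Curves computed this way for two genotype groups can then be compared visually or with a log-rank test.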
Results
We found that the variant IL6 genotype is significantly associated (HR=0.42; 95%CI: 0.23–0.77) with a decreased risk of death, as well as relapse and secondary cancer occurrence, among those with NHL. We also found that risk of death, relapse, and secondary cancers varied by specific SNPs for the follicular, DLBCL, and CLL/SLL histologic types. We identified combinations of polymorphisms whose combined deleterious effect significantly alters overall NHL survival and disease-free survival.
Conclusion
Our study provides evidence that the identification of genetic polymorphisms in cytokine genes may help improve the prediction of NHL survival and prognosis.
doi:10.1007/s11764-010-0164-4
PMCID: PMC3326600
PMID: 22113576
Non-Hodgkin lymphoma; Cytokines; Single nucleotide polymorphisms; Survival
Summary
High-throughput gene profiling studies have been extensively conducted, searching for markers associated with cancer development and progression. In this study, we analyse cancer prognosis studies with right censored survival responses. With gene expression data, we adopt the weighted gene co-expression network analysis (WGCNA) to describe the interplay among genes. In network analysis, nodes represent genes. There are subsets of nodes, called modules, which are tightly connected to each other. Genes within the same modules tend to have co-regulated biological functions. For cancer prognosis data with gene expression measurements, our goal is to identify cancer markers, while properly accounting for the network module structure. A two-step sparse boosting approach, called Network Sparse Boosting (NSBoost), is proposed for marker selection. In the first step, for each module separately, we use a sparse boosting approach for within-module marker selection and construct module-level ‘super markers’. In the second step, we use the super markers to represent the effects of all genes within the same modules and conduct module-level selection using a sparse boosting approach. Simulation study shows that NSBoost can more accurately identify cancer-associated genes and modules than alternatives. In the analysis of breast cancer and lymphoma prognosis studies, NSBoost identifies genes with important biological implications. It outperforms alternatives including the boosting and penalization approaches by identifying a smaller number of genes/modules and/or having better prediction performance.
doi:10.1017/S0016672312000419
PMCID: PMC3573352
PMID: 22950901
Background
The main goal of this study is to examine the associations of illness conditions and out-of-pocket medical expenditure with other types of household consumption. In November and December of 2011, a survey was conducted in three cities in western China, namely Lan Zhou, Gui Lin and Xi An, and their surrounding rural areas.
Results
Information on demographics, income and consumption was collected on 2,899 households. Data analysis suggested that the presence of household members with chronic diseases was not associated with characteristics of households or household heads. The presence of inpatient treatments was significantly associated with the age of household head (p-value 0.03). The level of per capita medical expense was significantly associated with household size, presence of members younger than 18, older than 65, basic health insurance coverage, per capita income, and household head occupation. Adjusting for confounding effects, the presence of chronic diseases was negatively associated with the amount of basic consumption (p-value 0.02) and the percentage of basic consumption (p-value 0.01), but positively associated with the percentage of insurance expense (p-value 0.02). Medical expenditure was positively associated with all other types of consumptions, including basic, education, saving and investment, entertainment, insurance, durable goods, and alcohol/tobacco. It was negatively associated with the percentage of basic consumption, saving and investment, and insurance.
Conclusions
Early studies conducted in other Asian countries and rural China found negative associations between illness conditions and medical expenditure with other types of consumptions. This study was conducted in three major cities and surrounding areas in western China, which had not been well investigated in published literature. The observed consumption patterns were different from those in early studies, and the negative associations were not observed. This study may complement the existing rural studies and provide useful information on western Chinese cities.
doi:10.1371/journal.pone.0052928
PMCID: PMC3532419
PMID: 23285229
Genome-wide association studies have been extensively conducted, searching for markers for biologically meaningful outcomes and phenotypes. Penalization methods have been adopted in the analysis of the joint effects of a large number of SNPs (single nucleotide polymorphisms) and marker identification. This study is partly motivated by the analysis of a heterogeneous stock mice dataset, in which multiple correlated phenotypes and a large number of SNPs are available. Existing penalization methods designed to analyze a single response variable cannot accommodate the correlation among multiple response variables. With multiple response variables sharing the same set of markers, joint modeling is first employed to accommodate the correlation. The group Lasso approach is adopted to select markers associated with all the outcome variables. An efficient computational algorithm is developed. A simulation study and analysis of the heterogeneous stock mice dataset show that the proposed method can outperform existing penalization methods.
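To illustrate how a group Lasso penalty can select a SNP for all phenotypes at once, the sketch below applies group soft-thresholding to the rows of a coefficient matrix. The operator is a standard building block of group Lasso solvers, not the computational algorithm developed in the paper. The function name, the row-per-SNP layout, and the numbers are assumptions made for illustration.

```python
import numpy as np

def group_soft_threshold(B, lam):
    """Shrink each row of B toward zero by lam in Euclidean norm.

    B   -- (SNPs x phenotypes) coefficient matrix; each row is one group
    lam -- threshold; rows whose norm is at most lam are set exactly to zero
    """
    B = np.asarray(B, dtype=float)
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    # Guard against division by zero for rows that are already all zero.
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return scale * B

# Hypothetical coefficients: SNP 1 has a strong joint effect, SNP 2 a weak one.
B = [[3.0, 4.0],
     [0.3, 0.4]]
B_shrunk = group_soft_threshold(B, lam=1.0)
```

Because the whole row is kept or dropped together, a SNP is either selected for every phenotype or for none, which matches the goal of finding markers associated with all outcome variables.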
doi:10.1371/journal.pone.0051198
PMCID: PMC3522680
PMID: 23272092
Although microarray gene profiling studies in cancer research have been successful in identifying genetic variants predisposing to the development and progression of cancer, the markers identified from analysis of single datasets often suffer from low reproducibility. Among multiple possible causes, the most important one is the small sample size, and hence the lack of power, of single studies. Integrative analysis jointly considers multiple heterogeneous studies, has a significantly larger sample size, and can improve reproducibility. In this article, we focus on cancer prognosis studies, where the response variables are progression-free, overall, or other types of survival. A group minimax concave penalty (GMCP) penalized integrative analysis approach is proposed for analyzing multiple heterogeneous cancer prognosis studies with microarray gene expression measurements. An efficient group coordinate descent algorithm is developed. The GMCP can automatically accommodate the heterogeneity across multiple datasets, and the identified markers have consistent effects across multiple studies. Simulation studies show that the GMCP provides significantly improved selection results as compared with the existing meta-analysis approaches, intensity approaches, and group Lasso penalized integrative analysis. We apply the GMCP to four microarray studies and identify genes associated with the prognosis of breast cancer.
doi:10.1002/sim.4337
PMCID: PMC3399910
PMID: 22105693
integrative analysis; cancer prognosis; microarray; penalized selection
Background
The health insurance system in Taiwan is comprised of public health insurance and private health insurance. The public health insurance, called “universal national health insurance” (NHI), was first established in 1995 and amended in 2011. The goal of this study is to provide an updated description of several important aspects of health insurance in Taiwan. Of special interest are household insurance coverage, medical expenditures (both gross and out-of-pocket), and coping strategies.
Methods
Data was collected via a phone call survey conducted in August and September of 2011. A household was the unit for survey and data analysis. A total of 2,424 households covering all major counties and cities in Taiwan were surveyed.
Results
The survey revealed that households with smaller sizes and higher incomes were more likely to have higher coverage of public and private health insurance. In addition, households with the presence of chronic diseases were more likely to have both types of insurance. Analysis of both gross and out-of-pocket medical expenditure was conducted. It was suggested that health insurance could not fully remove the financial burden caused by illness. The presence of chronic disease and inpatient treatment were significantly associated with higher gross and out-of-pocket medical expenditure. In addition, the presence of inpatient treatment was significantly associated with extremely high medical expenditure. Regional differences were also observed, with households in the northern, central, and southern regions having less gross medical expenditures than those on the offshore islands. Households with the presence of inpatient treatment were more likely to cope with medical expenditure using means other than salaries.
Conclusion
Despite the considerable achievements of the health insurance system in Taiwan, there is still room for improvement. This study investigated coverage, cost, and coping strategies and may be informative to stakeholders of both basic and commercial health insurance.
doi:10.1186/1472-6963-12-442
PMCID: PMC3519736
PMID: 23206690
Taiwan; Health insurance coverage; Medical expenditure; Coping strategy
Kim, Christopher | McGlynn, Katherine A. | McCorkle, Ruth | Erickson, Ralph L. | Niebuhr, David W. | Ma, Shuangge | Graubard, Barry | Aschebrook-Kilfoy, Briseis | Barry, Kathryn Hughes | Zhang, Yawei
Introduction
Evidence from previous studies has suggested there may be physical and mental changes in health among testicular cancer survivors. No studies have been conducted in the United States, however.
Methods
Study participants were initially enrolled in the US Servicemen's Testicular Tumor Environmental and Endocrine Determinants (STEED) study between 2002 and 2005. A total of 246 TGCT (testicular germ cell tumor) cases and 236 non-testicular cancer controls participated in the current study, and completed a self-administered questionnaire. Mean time since diagnosis for cases was 14 years, and no less than five for all cases. Component scores determined from responses to questions about physical and mental health on SF36 were tabulated to yield two summary measures, physical component scores (PCS), and mental component scores (MCS). Component and summary scores were normalized to a score of 50 with a standard deviation of 10 by a linear T-score transformation.
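The linear T-score transformation described above can be sketched as follows. The reference mean and standard deviation are hypothetical placeholders, not the SF36 norms used in the study.

```python
import numpy as np

def t_score(raw, ref_mean, ref_sd):
    """Standardize raw scores against a reference population,
    then rescale so the reference has mean 50 and SD 10."""
    raw = np.asarray(raw, dtype=float)
    return 50.0 + 10.0 * (raw - ref_mean) / ref_sd

# Hypothetical raw scale scores and reference norms (illustrative numbers only).
scores = t_score([60.0, 80.0, 100.0], ref_mean=80.0, ref_sd=20.0)
```

On this scale a score of 50 equals the reference mean, and each 10 points corresponds to one reference standard deviation.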
Results
Overall, the quality of life of cases may not differ greatly from that of controls. When all cases and controls are compared, TGCT cases had lower PCS (mean: 51.9, 95% CI: 50.6–53.2, P value: 0.037) than controls (mean: 53.6, 95% CI: 52.7–54.6). MCS were not significantly different (P value: 0.091). In multivariate analyses, several physical health components were worse for TGCT cases such as role-physical (OR 1.19, 95% CI: 1.01–1.39) and general health (OR 1.26, 95% CI: 1.07–1.49) compared to controls. However, TGCT cases treated with chemotherapy had lower PCS (cases: 50.2, 95% CI: 47.6–52.8; controls: 53.6, 95% CI: 52.7–54.6, P value: 0.0032) and MCS (cases: 49.3, 95% CI: 46.5–52.1; controls: 52.0, 95% CI: 50.9–53.2, P value: 0.039). TGCT cases who received treatments other than chemotherapy did not differ from controls in either PCS or MCS.
Discussion
Physical and general health limitations may affect testicular cancer survivors. Men treated with chemotherapy, however, may be most likely to suffer adverse health outcomes due to a combination of body-wide effects on physical and mental factors which affect various aspects of physical health, mental health, and overall quality of life. And in particular, physical functioning, role–physical, and general health are strongly affected.
doi:10.1007/s11136-011-9907-6
PMCID: PMC3149776
PMID: 21499930
Health status; Quality of life; Testicular cancer
Introduction
Despite decades of intensive research, Non-Hodgkin Lymphoma (NHL) remains poorly understood and is largely incurable. NHL is a heterogeneous group of malignancies with multiple subtypes, each of which has distinct morphologic, immunophenotypic, and clinical features. Identifying the risk factors for NHL may improve our understanding of the underlying biological mechanisms and have an impact on clinical practice.
Areas covered
This article provides a review of several aspects of NHL, including epidemiology and subtype classification, clinical, environmental, genetic, and genomic risk factors identified for etiology and prognosis, and available statistical and bioinformatics tools for identification of genetic and genomic risk factors from the analysis of high-throughput studies.
Expert opinion
Multiple clinical and environmental risk factors have been identified. However, they have failed to provide practically effective prediction. Genetic and genomic risk factors identified from high-throughput studies have suffered from a lack of reproducibility. The identification of genetic/genomic risk factors demands innovative statistical and bioinformatics tools. Although multiple analysis methods have been developed, there is still room for improvement. There is a critical need for well-designed, prospective, large-scale pangenomic studies.
doi:10.1517/17530059.2011.618185
PMCID: PMC3205981
PMID: 22059093
NHL; etiology; prognosis; risk factors; bioinformatics analysis
In analysis of bioinformatics data, a unique challenge arises from the high dimensionality of measurements. Without loss of generality, we use genomic study with gene expression measurements as a representative example but note that analysis techniques discussed in this article are also applicable to other types of bioinformatics studies. Principal component analysis (PCA) is a classic dimension reduction approach. It constructs linear combinations of gene expressions, called principal components (PCs). The PCs are orthogonal to each other, can effectively explain variation of gene expressions, and may have a much lower dimensionality. PCA is computationally simple and can be realized using many existing software packages. This article consists of the following parts. First, we review the standard PCA technique and its applications in bioinformatics data analysis. Second, we describe recent ‘non-standard’ applications of PCA, including accommodating interactions among genes, pathways and network modules and conducting PCA with estimating equations as opposed to gene expressions. Third, we introduce several recently proposed PCA-based techniques, including the supervised PCA, sparse PCA and functional PCA. The supervised PCA and sparse PCA have been shown to have better empirical performance than the standard PCA. The functional PCA can analyze time-course gene expression data. Last, we raise the awareness of several critical but unsolved problems related to PCA. The goal of this article is to make bioinformatics researchers aware of the PCA technique and more importantly its most recent development, so that this simple yet effective dimension reduction technique can be better employed in bioinformatics data analysis.
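As a minimal sketch of the standard PCA described above, the following computes PCs from a samples-by-genes matrix with a singular value decomposition. The function name `pca` and the simulated expression matrix are illustrative assumptions, not material from the article.

```python
import numpy as np

def pca(X, k):
    """Return the first k PC scores, loadings and explained-variance ratios.

    X -- (samples x genes) expression matrix
    k -- number of principal components to keep
    """
    Xc = X - X.mean(axis=0)                    # center each gene
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                  # PCs: linear combinations of genes
    loadings = Vt[:k].T                        # weights defining each combination
    explained = (s ** 2) / np.sum(s ** 2)      # share of total variance per PC
    return scores, loadings, explained[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))                 # 20 samples, 100 simulated "genes"
scores, loadings, ratio = pca(X, k=3)
```

With 100 genes and 20 samples, the three PCs replace 100 correlated measurements by three mutually orthogonal summaries, which is the dimension reduction the abstract refers to.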
doi:10.1093/bib/bbq090
PMCID: PMC3220871
PMID: 21242203
principal component analysis; dimension reduction; bioinformatics methodologies; gene expression
Background
Significant health expenses can force households to reduce consumption of items required for daily living and long-term well-being, depriving them of the capability to lead economically stable and healthy lives. Previous studies of out-of-pocket (OOP) and other health expenses have typically characterized them as “catastrophic” in terms of a threshold level or percentage of household income. We aim to re-conceptualize the impact of health expenses on household “flourishing” in terms of “basic capabilities.”
Methods and Findings
We conducted a 2008 survey covering 697 households, on consumption patterns and health treatments for the previous 12 months. We compare consumption patterns between households with and without inpatient treatment, and between households with different levels of outpatient treatment, for the entire study sample as well as among different income quartiles. We find that compared to households without inpatient treatment and with lower levels of outpatient treatment, households with inpatient treatment and higher levels of outpatient treatment reduced investments in basic capabilities, as evidenced by decreased consumption of food, education and production means. The lowest income quartile showed the most significant decrease. No quartile with inpatient or high-level outpatient treatment was immune to reductions.
Conclusions
The effects of health expenses on consumption patterns might well create or exacerbate poverty and poor health, particularly for low income households. We define health expenditures as catastrophic by their reductions of basic capabilities. Health policy should reform the OOP system that causes this economic and social burden.
doi:10.1371/journal.pone.0047423
PMCID: PMC3471826
PMID: 23077612
In high-throughput -omics studies, markers identified from analysis of single data sets often suffer from a lack of reproducibility because of sample limitation. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple -omics data sets is challenging because of the high dimensionality of data and heterogeneity among studies. In this article, for marker selection in integrative analysis of data from multiple heterogeneous studies, we propose a 2-norm group bridge penalization approach. This approach can effectively identify markers with consistent effects across multiple studies and accommodate the heterogeneity among studies. We propose an efficient computational algorithm and establish the asymptotic consistency property. Simulations and applications in cancer profiling studies show satisfactory performance of the proposed approach.
doi:10.1093/biostatistics/kxr004
PMCID: PMC3169668
PMID: 21415015
High-dimensional data; Integrative analysis; 2-norm group bridge
Background
In high throughput cancer genomic studies, results from the analysis of single datasets often suffer from a lack of reproducibility because of small sample sizes. Integrative analysis can effectively pool and analyze multiple datasets and provides a cost effective way to improve reproducibility. In integrative analysis, simultaneously analyzing all genes profiled may incur high computational cost. A computationally affordable remedy is prescreening, which fits marginal models, can be conducted in a parallel manner, and has low computational cost.
Results
An integrative prescreening approach is developed for the analysis of multiple cancer genomic datasets. Simulation shows that the proposed integrative prescreening has better performance than alternatives, particularly including prescreening with individual datasets, an intensity approach and meta-analysis. We also analyze multiple microarray gene profiling studies on liver and pancreatic cancers using the proposed approach.
Conclusions
The proposed integrative prescreening provides an effective way to reduce the dimensionality in cancer genomic studies. It can be coupled with existing analysis methods to identify cancer markers.
doi:10.1186/1471-2105-13-168
PMCID: PMC3436748
PMID: 22799431
South Asians from India and Pakistan represent one of the fastest growing immigrant populations in the US, yet there are limited data assessing breast cancers for this distinct ethnic sub-group. The aim of this study was to analyze clinical-pathologic, treatment and outcome characteristics of U.S.-residing Indian-Pakistani (IP) versus non-Hispanic white (NHW) female breast cancer patients to assess if any differences/disparities exist. The study cohort consisted of 2,393 IP and 555,832 NHW women (diagnosed 1988–2006) in the SEER database. Differences between the two populations were analyzed using chi-squared and multivariate regression analysis. Age-adjusted incidence, mortality, and relative survival rates were calculated for the two groups. Significant differences in the characteristics of the IP cohort’s invasive disease included: younger median age at presentation; larger tumor size; higher stage, higher grade, more involved lymph-nodes, and more hormone receptor negative disease (all P < 0.01). The age-adjusted incidence and breast cancer mortality were lower in IP women. The relative survival at 5 years differed significantly (84% for IP versus 89% for NHW women) but was not significantly different on multivariate analysis (P > 0.05). Within each stage (Tis, I, II), there were no disparities in the rate of breast conservation surgery (BCS) or in the percentage of patients receiving adjuvant radiation after BCS for the 2 cohorts. Post-mastectomy radiation was delivered significantly more often in stage I/II IP patients undergoing mastectomy. In conclusion, this analysis suggests that while there appear to be significant differences in the features of breast cancers of US-residing IP women, no disparities were noted in the rates of breast conserving surgery or adjuvant radiation, as seen in some other ethnicities. The more aggressive clinical-pathologic features stage-for-stage in IP women may partially explain the more frequent use of post-mastectomy RT in this patient population. These findings warrant further investigation.
doi:10.1007/s10549-011-1362-0
PMCID: PMC3235412
PMID: 21301957
Breast cancer; Ethnicity; Indian; Disparities; SEER; Radiation; Pakistan; Breast conservation; Asian
Background
China has one of the world's largest health insurance systems, composed of government-run basic health insurance and commercial health insurance. The basic health insurance has undergone system-wide reform in recent years. Meanwhile, there is also significant development in the commercial health insurance sector. A phone call survey was conducted in three major cities in China in July and August, 2011. The goal was to provide an updated description of the effect of health insurance on the population covered. Of special interest were insurance coverage, gross and out-of-pocket medical cost and coping strategies.
Results
Records on 5,097 households were collected. Analysis showed that smaller households, higher income, lower expense, presence of at least one inpatient treatment and living in rural areas were significantly associated with a lower overall coverage rate. In the separate analysis of basic and commercial health insurance, similar factors were found to have significant associations. Higher income, presence of chronic disease, presence of inpatient treatment, higher coverage rates and living in urban areas were significantly associated with higher gross medical cost. A similar set of factors was significantly associated with higher out-of-pocket cost. Households with lower income, inpatient treatment, higher commercial insurance coverage, and living in rural areas were significantly more likely to pursue coping strategies other than salary.
Conclusions
The surveyed cities and surrounding rural areas had socioeconomic status far above China's average. However, there was still a need to further improve coverage. Even for households with coverage, there was considerable out-of-pocket medical cost, particularly for households with inpatient treatments and/or chronic diseases. A small percentage of households were unable to self-finance out-of-pocket medical cost. Such observations suggest possible targets for further improving the health insurance system.
doi:10.1371/journal.pone.0039157
PMCID: PMC3377611
PMID: 22723954
In cancer research, high-throughput genomic studies have been extensively conducted, searching for markers associated with cancer diagnosis, prognosis and variation in response to treatment. In this article, we analyze cancer prognosis studies and investigate ranking markers based on their marginal prognosis power. To avoid ambiguity, we focus on microarray gene expression studies where genes are the markers, but note that the methodology and results are applicable to other high-throughput studies. The objectives of this study are 2-fold. First, we investigate ranking markers under three commonly adopted semiparametric models, namely the Cox, accelerated failure time and additive risk models. Data analysis shows that the ranking may vary significantly under different models. Second, we describe a nonparametric concordance measure, which has roots in the time-dependent ROC (receiver operating characteristic) framework and relies on much weaker assumptions than the semiparametric models. In simulation, it is shown that ranking using the concordance measure is not sensitive to model specification whereas ranking under the semiparametric models is. In data analysis, the concordance measure generates rankings significantly different from those under the semiparametric models.
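To make the idea of a nonparametric concordance measure concrete, the sketch below computes Harrell's C-index for a single marker. This is a simpler, widely used relative of the time-dependent ROC-based measure described above, swapped in for illustration. It is not the article's estimator, and the function name and toy data are hypothetical.

```python
def harrell_c(time, event, marker):
    """Harrell's C for right-censored data.

    A pair (i, j) is usable when subject i has an observed event and an
    earlier time than subject j. It is concordant when the subject who
    failed earlier also has the higher marker value.
    """
    concordant = 0.0
    usable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue                      # a censored subject cannot anchor a pair
        for j in range(n):
            if time[i] < time[j]:
                usable += 1
                if marker[i] > marker[j]:
                    concordant += 1.0
                elif marker[i] == marker[j]:
                    concordant += 0.5     # ties in the marker count as half
    return concordant / usable

# Toy data (illustrative only): a higher marker value goes with earlier failure.
time = [1, 2, 3, 4]
event = [1, 1, 0, 1]
c_high_risk = harrell_c(time, event, marker=[4, 3, 2, 1])
c_reversed = harrell_c(time, event, marker=[1, 2, 3, 4])
```

Ranking genes by a statistic of this kind requires no regression model, which is why such rankings are not sensitive to model specification.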
doi:10.1093/bib/bbq069
PMCID: PMC3030811
PMID: 21087949
cancer prognosis markers; semiparametric survival analysis; concordance measure
We propose a new penalized method for variable selection and estimation that explicitly incorporates the correlation patterns among predictors. This method is based on a combination of the minimax concave penalty and Laplacian quadratic associated with a graph as the penalty function. We call it the sparse Laplacian shrinkage (SLS) method. The SLS uses the minimax concave penalty for encouraging sparsity and Laplacian quadratic penalty for promoting smoothness among coefficients associated with the correlated predictors. The SLS has a generalized grouping property with respect to the graph represented by the Laplacian quadratic. We show that the SLS possesses an oracle property in the sense that it is selection consistent and equal to the oracle Laplacian shrinkage estimator with high probability. This result holds in sparse, high-dimensional settings with p ≫ n under reasonable conditions. We derive a coordinate descent algorithm for computing the SLS estimates. Simulation studies are conducted to evaluate the performance of the SLS method and a real data example is used to illustrate its application.
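The two ingredients of the SLS penalty can be sketched numerically as below. The scaling of the Laplacian term by one half, the use of an unsigned adjacency matrix, and all function and parameter names are assumptions made for illustration. The paper's exact criterion may differ in these details.

```python
import numpy as np

def mcp(t, lam, gamma):
    """Minimax concave penalty, elementwise.

    Equals lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam and is
    constant at gamma*lam^2/2 beyond that point.
    """
    a = np.abs(np.asarray(t, dtype=float))
    return np.where(a <= gamma * lam,
                    lam * a - a ** 2 / (2.0 * gamma),
                    gamma * lam ** 2 / 2.0)

def sls_penalty(beta, adjacency, lam1, lam2, gamma):
    """MCP for sparsity plus a Laplacian quadratic for smoothness over a graph."""
    beta = np.asarray(beta, dtype=float)
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A        # graph Laplacian (unsigned, assumed)
    # beta' L beta sums a_jk * (beta_j - beta_k)^2 over the edges of the graph.
    return float(mcp(beta, lam1, gamma).sum() + 0.5 * lam2 * beta @ L @ beta)

# Hypothetical graph: predictors 1 and 2 are connected, predictor 3 is isolated.
A = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]
equal_pair = sls_penalty([1.0, 1.0, 0.0], A, lam1=1.0, lam2=2.0, gamma=3.0)
unequal_pair = sls_penalty([1.0, 0.0, 0.0], A, lam1=1.0, lam2=2.0, gamma=3.0)
```

In this toy case the Laplacian term vanishes when the two connected coefficients are equal and adds a positive amount when they differ, which is the smoothness-promoting behavior described above.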
PMCID: PMC3217586
PMID: 22102764
Graphical structure; minimax concave penalty; penalized regression; high-dimensional data; variable selection; oracle property
We use a novel penalized approach for genome-wide association study that accounts for the linkage disequilibrium between adjacent markers. This method uses a penalty on the difference of the genetic effect at adjacent single-nucleotide polymorphisms and combines it with the minimax concave penalty, which has been shown to be superior to the least absolute shrinkage and selection operator (LASSO) in terms of estimator bias and selection consistency. Our method is implemented using a coordinate descent algorithm. The value of the tuning parameters is determined by extended Bayesian information criteria. The leave-one-out method is used to compute p-values of selected single-nucleotide polymorphisms. Its applicability to a simulated data set from Genetic Analysis Workshop 17 replication one is illustrated. Our method selects three SNPs (C13S522, C13S523, and C13S524), whereas the LASSO method selects two SNPs (C13S522 and C13S523).
doi:10.1186/1753-6561-5-S9-S67
PMCID: PMC3287906
PMID: 22373491
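The coordinate descent idea mentioned in the abstract above can be sketched for the minimax concave penalty alone. This is a minimal illustration under stated assumptions: a least-squares loss, predictor columns standardized so that each satisfies x'x / n = 1, and no adjacent-difference penalty (the term specific to this paper is deliberately omitted). All function and parameter names are hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam)."""
    return float(np.sign(z)) * max(abs(z) - lam, 0.0)


def mcp_update(z, lam, gamma):
    """Univariate MCP solution for a standardized predictor (gamma > 1)."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return float(z)  # large effects are left unshrunk


def mcp_coordinate_descent(X, y, lam, gamma=3.0, n_sweeps=100):
    """Cyclic coordinate descent for MCP-penalized least squares."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()
    for _ in range(n_sweeps):
        for j in range(p):
            # Univariate least-squares solution on the partial residual.
            z = X[:, j] @ resid / n + beta[j]
            new = mcp_update(z, lam, gamma)
            resid -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta
```

The update shows where the reduced bias relative to the LASSO comes from: coefficients whose univariate solution exceeds `gamma * lam` in absolute value are not shrunk at all.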
Ba, Yue | Yu, Hebert | Liu, Fudong | Geng, Xue | Zhu, Cairong | Zhu, Quan | Zheng, Tongzhang | Ma, Shuangge | Wang, Gang | Li, Zhiyuan | Zhang, Yawei
Background/Objective
One of the speculated mechanisms underlying the fetal origin hypothesis of breast cancer is the possible influence of the maternal environment on epigenetic regulation, such as changes in DNA methylation of the insulin-like growth factor-2 (IGF2) gene. The aim of this study is to investigate the relationship between folate, vitamin B12 and methylation of the IGF2 gene in maternal and cord blood.
Subjects/Methods
We conducted a cross-sectional study to measure methylation patterns of IGF2 in promoters 2 (P2) and 3 (P3).
Results
The percentage of methylation in IGF2 P3 was higher in maternal blood than in cord blood (p<0.0001), while the methylation in P2 was higher in cord blood than in maternal blood (p=0.016). P3 methylation was correlated between maternal and cord blood (p<0.0001), but P2 methylation was not (p=0.06). The multivariate linear regression model showed that methylation patterns of both promoters in cord blood were not associated with serum folate levels in either cord or maternal blood, while the P3 methylation patterns were associated with serum levels of vitamin B12 in mother’s blood (MC=−0.22, p=0.0014). Methylation patterns in P2 of maternal blood were associated with serum levels of vitamin B12 in mother’s blood (MC=−0.23, p=0.012), exposure to passive smoking (MC=0.46, p=0.034) and mother’s weight gain during pregnancy (MC=0.23, p=0.019).
Conclusions
The study suggests that environment influences methylation patterns in maternal blood, and then the maternal patterns influence the methylation status and levels of folate and vitamin B12 in cord blood.
doi:10.1038/ejcn.2010.294
PMCID: PMC3071883
PMID: 21245875
Folate; vitamin B12; methylation; IGF2; cord blood
U-estimates are defined as maximizers of objective functions that are U-statistics. As an alternative to M-estimates, U-estimates have been extensively used in linear regression, classification, survival analysis, and many other areas. They may rely on weaker data and model assumptions and be preferred over alternatives. In this article, we investigate penalized variable selection with U-estimates. We propose smooth approximations of the objective functions, which can greatly reduce computational cost without affecting asymptotic properties. We study penalized variable selection using penalties that have been well investigated with M-estimates, including the LASSO, adaptive LASSO, and bridge, and establish their asymptotic properties. Generically applicable computational algorithms are described. Performance of the penalized U-estimates is assessed using numerical studies.
doi:10.1080/10485250903348781
PMCID: PMC3167075
PMID: 21904440
U-estimate; penalization; variable selection
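To make the idea of smoothing a U-statistic objective concrete, the sketch below uses a rank-correlation-type objective as one example of a U-estimate and replaces its discontinuous indicator with a sigmoid. The choice of objective, the sigmoid form, the bandwidth name `sigma` and its default value are assumptions made for illustration; the abstract above does not specify them.

```python
import numpy as np


def rank_objective(beta, X, y):
    """Fraction of ordered pairs (i, j) with y_i > y_j and x_i'beta > x_j'beta."""
    score = X @ beta
    dy = y[:, None] > y[None, :]
    ds = score[:, None] > score[None, :]
    n = len(y)
    return (dy & ds).sum() / (n * (n - 1))


def smoothed_rank_objective(beta, X, y, sigma=0.1):
    """Same objective with the score indicator replaced by a sigmoid."""
    score = X @ beta
    diff = (score[:, None] - score[None, :]) / sigma
    # Numerically stable sigmoid: 1 / (1 + exp(-d)) = 0.5 * (1 + tanh(d / 2)).
    smooth = 0.5 * (1.0 + np.tanh(diff / 2.0))
    dy = y[:, None] > y[None, :]
    n = len(y)
    return (dy * smooth).sum() / (n * (n - 1))
```

The smoothed version is differentiable in `beta`, so gradient-based optimizers can be applied, and it approaches the exact objective as `sigma` shrinks. Both functions build n-by-n pairwise arrays, so they are intended only for small n.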
Han, Xuesong | Zheng, Tongzhang | Foss, Francine M. | Ma, Shuangge | Holford, Theodore R. | Boyle, Peter | Leaderer, Brian | Zhao, Ping | Dai, Min | Zhang, Yawei
Introduction
Epidemiological studies have shown that moderate alcohol drinkers have a lower death rate for all causes. Alcohol drinking has also been associated with reduced risk of non-Hodgkin lymphoma (NHL). Here, we examined the role of alcohol consumption in NHL survival by type of alcohol consumed and NHL subtype.
Methods
A cohort of 575 female incident NHL cases diagnosed during 1996–2000 in Connecticut was followed up for a median of 7.75 years. Demographic, clinical, and lifestyle information was collected at diagnosis. Survival analyses were conducted with Kaplan-Meier methods, and hazard ratios (HR) were estimated from Cox proportional hazards models.
Results
Compared to never drinkers, wine drinkers experienced better overall survival (75% vs. 69% five-year survival rates, p-value for log-rank test=0.030) and better disease-free survival (70% vs. 67% five-year disease-free survival rates, p-value for log-rank test=0.049). Analysis by NHL subtype shows that the favorable effect of wine consumption was mainly seen for patients diagnosed with diffuse large B-cell lymphoma (DLBCL) (wine drinkers for more than 25 years vs. never drinkers: HR=0.36, 95% CI 0.14–0.94 for overall survival; HR=0.38, 95% CI 0.16–0.94 for disease-free survival), and an adverse effect of liquor consumption was also observed among DLBCL patients (liquor drinkers vs. never drinkers: HR=2.49, 95% CI 1.26–4.93 for disease-free survival).
Conclusions
Our results suggest a moderate relationship between pre-diagnostic alcohol consumption and NHL survival, particularly for DLBCL. The results need to be replicated in larger studies.
Implications for cancer survivors
Pre-diagnostic behaviors might impact the prognosis and survival of NHL patients.
doi:10.1007/s11764-009-0111-4
PMCID: PMC3141078
PMID: 20039144
Alcohol; Wine; Liquor; Non-Hodgkin lymphoma; Prognosis; Survival
The development of high-throughput technologies makes it possible to survey the whole genome. Genomic studies have been extensively conducted, searching for markers with predictive power for the prognosis of complex diseases such as cancer, diabetes and obesity. Most existing statistical analyses focus on developing marker selection techniques, while little attention is paid to the underlying prognosis models. In this article, we review three commonly used prognosis models, namely the Cox, additive risk and accelerated failure time models. We conduct simulations and show that gene identification can be unsatisfactory under model misspecification. We analyze three cancer prognosis studies under the three models, and show that the gene identification results, the prediction performance of all identified genes combined, and the reproducibility of each identified gene are model-dependent. We suggest that in practical data analysis, more attention should be paid to the model assumption, and multiple models may need to be considered.
doi:10.1093/bib/bbp070
PMCID: PMC2905523
PMID: 20123942
genomic studies; semiparametric prognosis models; model comparison
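For readers less familiar with the three prognosis models named in the abstract above, the sketch below writes out how a linear predictor enters the hazard under each one, using their standard textbook forms. The baseline hazard `h0` is passed in as a callable, and the function and argument names (`lp` for the linear predictor x'beta) are illustrative assumptions rather than anything taken from the article.

```python
import math


def cox_hazard(t, lp, h0):
    """Cox model: the covariates multiply the baseline hazard."""
    return h0(t) * math.exp(lp)


def additive_hazard(t, lp, h0):
    """Additive risk model: the covariates shift the baseline hazard."""
    return h0(t) + lp


def aft_hazard(t, lp, h0):
    """AFT model, log T = lp + error: the covariates rescale time."""
    scale = math.exp(-lp)
    return scale * h0(t * scale)
```

With a constant baseline hazard, for example `h0 = lambda t: 0.1`, a positive `lp` raises the hazard under the Cox and additive risk models but lowers it under the AFT model, because there a larger linear predictor means a longer survival time. Differences like this are one reason the same covariate effect can look different across the three models.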
Han, Xuesong | Zheng, Tongzhang | Foss, Francine | Holford, Theodore R. | Ma, Shuangge | Zhao, Ping | Dai, Min | Kim, Christopher | Zhang, Yaqun | Bai, Yana | Zhang, Yawei
We investigated whether an increased intake of vegetables and fruits favors NHL survival. A cohort of 568 female cases of incident NHL diagnosed during 1996–2000 in Connecticut was followed up for a median of 7.7 years. Adjusted hazard ratios (HRs) were estimated by Cox proportional hazards models. Our results show that a pre-diagnostic high intake of vegetables appeared to favor overall survival (HR = 0.74, 95% CI 0.57–0.98) among patients with NHL who survived longer than 6 months. In particular, pre-diagnostic high intakes of green leafy vegetables and citrus fruits were associated with 29% (95% CI for the HR 0.51–0.98) and 27% (95% CI for the HR 0.54–0.99) reductions in the risk of death, respectively. When different types of vegetables and fruits were investigated separately, their impacts were found to vary across NHL subtypes. Our study suggests that increasing vegetable and citrus fruit consumption could be a useful strategy to improve survival in NHL patients.
doi:10.3109/10428191003690364
PMCID: PMC3110752
PMID: 20350273
Vegetables and fruits; non-Hodgkin lymphoma; prognosis; survival
Kim, Christopher | McGlynn, Katherine A. | McCorkle, Ruth | Zheng, Tongzhang | Erickson, Ralph L. | Niebuhr, David W. | Ma, Shuangge | Zhang, Yaqun | Bai, Yana | Dai, Li | Graubard, Barry I. | Kilfoy, Briseis | Barry, Kathryn Hughes | Zhang, Yawei
Introduction
Testicular germ cell tumors (TGCT) disproportionately affect men between the ages of 15 and 49 years, the ages at which men typically father children. Although TGCT treatment directly affects gonadal tissues, it remains unclear whether there are long-term effects on fertility.
Methods
To examine post-TGCT treatment fertility, study participants in a previously conducted case-control study were contacted. The men were initially enrolled in the US Servicemen's Testicular Tumor Environmental and Endocrine Determinants (STEED) study between 2002 and 2005. A total of 246 TGCT cases and 236 controls participated in the current study and completed a self-administered questionnaire in 2008-2009.
Results
TGCT cases were significantly more likely than controls to experience fertility distress (OR 5.23; 95% CI 1.99-13.76) and difficulty in fathering children (OR 6.41; 95% CI 2.72-15.13). Cases were also more likely to be tested for infertility (OR 3.65; 95% CI 1.55-8.59). Cases, however, did not differ from controls in actually fathering children (OR 1.37; 95% CI 0.88-2.15). These findings were predominantly observed among nonseminoma cases and cases treated with surgery only or surgery-plus-chemotherapy.
Discussion
Although TGCT survivors reported greater fertility distress, a higher likelihood of fertility testing, and more difficulty fathering children, these data suggest that they are no less likely to father children than are other men. It is possible that treatment for TGCT does not permanently affect fertility or, alternatively, that TGCT survivors attempt to father children with greater persistence or at younger ages than do other men.
doi:10.1007/s11764-010-0134-x
PMCID: PMC3057887
PMID: 20571931