For censored survival outcomes, it can be of great interest to evaluate the predictive power of individual markers or their functions. Compared with alternative evaluation approaches, the time-dependent ROC (receiver operating characteristic) based approaches rely on much weaker assumptions, can be more robust, and hence are preferred. In this article, we examine evaluation of markers’ predictive power using the time-dependent ROC curve and a concordance measure which can be viewed as a weighted area under the time-dependent AUC (area under the ROC curve) profile. This study significantly advances beyond existing time-dependent ROC studies by developing nonparametric estimators of the summary indexes and, more importantly, rigorously establishing their asymptotic properties. It reinforces the statistical foundation of the time-dependent ROC based evaluation approaches for censored survival outcomes. Numerical studies, including simulations and application to an HIV clinical trial, demonstrate the satisfactory finite-sample performance of the proposed approaches.
time-dependent ROC; concordance measure; inverse-probability-of-censoring weighting; marker evaluation; survival outcomes
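As a rough illustration of what a concordance measure quantifies for censored outcomes, the sketch below computes a simple unweighted, Harrell-type concordance on made-up data. It is not the nonparametric inverse-probability-of-censoring-weighted estimator developed in the article; the function name `concordance`, the toy data, and the convention that a higher marker value means higher risk are all assumptions made for the example.

```python
def concordance(times, events, markers):
    """Unweighted Harrell-type concordance for right-censored data.

    A pair (i, j) is usable when subject i has an observed event and a
    shorter observed time than subject j. The pair is concordant when
    subject i also has the higher marker value (assumed: higher marker
    means higher risk).
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the known earlier failure
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if markers[i] > markers[j]:
                    concordant += 1.0
                elif markers[i] == markers[j]:
                    concordant += 0.5  # marker ties count as half
    return concordant / usable if usable else float("nan")


# Toy data (hypothetical): observed times, event indicators, marker values.
times = [2, 4, 5, 7, 9]
events = [1, 1, 0, 1, 0]
markers = [3.1, 2.5, 2.7, 1.0, 0.4]
print(concordance(times, events, markers))
```

A value near 1 indicates that subjects with higher marker values tend to fail earlier, and a value near 0.5 indicates no discrimination. With heavy censoring this unweighted version can be biased, which is one motivation for weighting by the inverse probability of censoring.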
In breast cancer research, it is of great interest to identify genomic markers associated with prognosis. Multiple gene profiling studies have been conducted for such a purpose. Genomic markers identified from the analysis of single datasets often do not have satisfactory reproducibility. Among the multiple possible reasons, the most important one is the small sample sizes of individual studies. A cost-effective solution is to pool data from multiple comparable studies and conduct integrative analysis. In this study, we collect four breast cancer prognosis studies with gene expression measurements. We describe the relationship between prognosis and gene expressions using the accelerated failure time (AFT) models. We adopt a 2-norm group bridge penalization approach for marker identification. This integrative analysis approach can effectively identify markers with consistent effects across multiple datasets and naturally accommodate the heterogeneity among studies. Statistical and simulation studies demonstrate satisfactory performance of this approach. Breast cancer prognosis markers identified using this approach have sound biological implications and satisfactory prediction performance.
Breast cancer prognosis; Gene expression; Marker identification; Integrative analysis; 2-norm group bridge
High-throughput studies have been extensively conducted in the research of complex human diseases. As a representative example, consider gene-expression studies where thousands of genes are profiled at the same time. An important objective of such studies is to rank the diagnostic accuracy of biomarkers (e.g. gene expressions) for predicting outcome variables while properly adjusting for confounding effects from low-dimensional clinical risk factors and environmental exposures. Existing approaches are often fully based on parametric or semi-parametric models and target evaluating estimation significance as opposed to diagnostic accuracy. Receiver operating characteristic (ROC) approaches can be employed to tackle this problem. However, existing ROC ranking methods focus on biomarkers only and ignore effects of confounders. In this article, we propose a model-based approach which ranks the diagnostic accuracy of biomarkers using ROC measures with a proper adjustment of confounding effects. To this end, three different methods for constructing the underlying regression models are investigated. Simulation study shows that the proposed methods can accurately identify biomarkers with additional diagnostic power beyond confounders. Analysis of two cancer gene-expression studies demonstrates that adjusting for confounders can lead to substantially different rankings of genes.
ranking biomarkers; ROC; confounders; high-throughput data
Recent biomedical studies often measure two distinct sets of risk factors: low-dimensional clinical and environmental measurements, and high-dimensional gene expression measurements. For prognosis studies with right censored response variables, we propose a semiparametric regression model whose covariate effects have two parts: a nonparametric part for low-dimensional covariates, and a parametric part for high-dimensional covariates. A penalized variable selection approach is developed. The selection of parametric covariate effects is achieved using an iterated Lasso approach, for which we prove the selection consistency property. The nonparametric component is estimated using a sieve approach. An empirical model selection tool for the nonparametric component is derived based on the Kullback-Leibler geometry. Numerical studies show that the proposed approach has satisfactory performance. Application to a lymphoma study illustrates the proposed method.
Semiparametric regression; variable selection; right censored data; iterated Lasso
Illness conditions lead to medical expenditure. Even with various types of medical insurance, there can still be considerable out-of-pocket costs. Medical expenditure can affect other categories of household consumptions. The goal of this study is to provide an updated empirical description of the distributions of illness conditions and medical expenditure and their associations with other categories of household consumptions.
A phone-call survey was conducted in June and July of 2012. The study was approved by ethics review committees at Xiamen University and FuJen Catholic University. Data was collected using a Computer-Assisted Telephone Survey System (CATSS). “Household” was the unit for data collection and analysis. Univariate and multivariate analyses were conducted, examining the distributions of illness conditions and the associations of illness and medical expenditure with other household consumptions.
The presence of chronic disease and inpatient treatment was not significantly associated with household characteristics. The level of per capita medical expenditure was significantly associated with household size, income, and household head occupation. The presence of chronic disease was significantly associated with levels of education, insurance and durable goods consumption. After adjusting for confounders, the associations with education and durable goods consumption remained significant. The presence of inpatient treatment was not associated with consumption levels. In the univariate analysis, medical expenditure was significantly associated with all other consumption categories. After adjusting for confounding effects, the associations between medical expenditure and the actual amount of entertainment expenses and percentages of basic consumption, savings, and insurance (as of total consumption) remained significant.
This study provided an updated description of the distributions of illness conditions and medical expenditure in Taiwan. The findings were mostly positive in that illness and medical expenditure were not observed to be significantly associated with other consumption categories. This observation differed from those made in some other Asian countries and could be explained by the higher economic status and universal basic health insurance coverage of Taiwan.
Illness; Medical expenditure; Household consumption; Taiwan
Sexual function among testicular cancer survivors is a concern because affected men are of reproductive age when diagnosed. We conducted a case-control study among United States military men to examine whether testicular cancer survivors experienced impaired sexual function.
A total of 246 testicular cancer cases and 236 ethnicity and age matched controls were enrolled in the study in 2008-2009. The Brief Male Sexual Function Inventory (BMSFI) was used to assess sexual function.
Compared to controls, cases scored significantly lower (controls vs. cases) on sex drive (5.77 vs. 5.18), erection (9.40 vs. 8.63), ejaculation (10.83 vs. 9.90), and problem assessment (10.55 vs. 9.54). Cases were significantly more likely to have impaired erection (OR 1.72; 95% CI 1.11-2.64), ejaculation (OR 2.27; 95% CI 1.32-3.91), and problem assessment (OR 2.36; 95% CI 1.43-3.90). In histology and treatment analyses, the risks of erectile dysfunction, delayed ejaculation, and/or impaired problem assessment were greater among nonseminoma cases and among chemotherapy- and radiation-treated cases than among controls.
This study provides evidence that testicular cancer survivors are more likely to have impaired sexual functioning compared to demographically matched controls. The observed impaired sexual functioning appeared to vary by treatment regimen and histologic subtype.
Testicular cancer; sexual function; military men
Non-Hodgkin Lymphoma (NHL) is a heterogeneous group of malignancies with over thirty different subtypes. Follicular lymphoma (FL) is the most common form of indolent NHL and the second most common form of NHL overall. It has morphologic, immunophenotypic and clinical features significantly different from other subtypes. Considerable effort has been devoted to the identification of risk factors for etiology and prognosis of FL. These risk factors may advance our understanding of the biology of FL and have an impact on clinical practice.
The epidemiology of NHL and FL is briefly reviewed. For FL etiology and prognosis separately, we review clinical, environmental and molecular (including genetic, genomic, epigenetic and others) risk factors suggested in the literature.
A large number of potential risk factors have been suggested in recent studies. However, there is a lack of consensus, and many of the suggested risk factors have not been rigorously validated in independent studies. There is a need for large-scale, prospective studies to consolidate existing findings and discover new risk factors. Some of the identified risk factors are successful at the population level. More effective individual-level risk factors and models remain to be identified.
Follicular lymphoma; Etiology; Non-Hodgkin lymphoma; Prognosis; Risk factor
In high-throughput cancer genomic studies, markers identified from the analysis of single data sets often suffer a lack of reproducibility because of the small sample sizes. An ideal solution is to conduct large-scale prospective studies, which are extremely expensive and time consuming. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple data sets is challenging because of the high dimensionality of genomic measurements and heterogeneity among studies. In this article, we propose a sparse boosting approach for marker identification in integrative analysis of multiple heterogeneous cancer diagnosis studies with gene expression measurements. The proposed approach can effectively accommodate the heterogeneity among multiple studies and identify markers with consistent effects across studies. Simulation shows that the proposed approach has satisfactory identification results and outperforms alternatives including an intensity approach and meta-analysis. The proposed approach is used to identify markers of pancreatic cancer and liver cancer.
Cancer genomics; Marker identification; Sparse boosting
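To make the idea of boosting-based marker selection concrete, the following is a minimal sketch of plain componentwise L2 boosting for a single dataset with a continuous response. It is an assumption-laden simplification: it does not reproduce the sparsity-inducing criterion of the proposed sparse boosting approach, nor the integrative handling of multiple heterogeneous studies. The function name `componentwise_boost`, the step size `nu`, and the toy data are hypothetical.

```python
def componentwise_boost(X, y, steps=50, nu=0.1):
    """Plain componentwise L2 boosting for a linear model without intercept.

    At each step, the single covariate that most reduces the residual sum
    of squares is selected, and its coefficient is moved a small step (nu)
    toward its least-squares fit to the current residuals.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    for _ in range(steps):
        best_j, best_coef, best_gain = None, 0.0, 0.0
        for j in range(p):
            sxx = sum(X[i][j] ** 2 for i in range(n))
            if sxx == 0.0:
                continue
            sxr = sum(X[i][j] * resid[i] for i in range(n))
            gain = sxr * sxr / sxx  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_coef, best_gain = j, sxr / sxx, gain
        if best_j is None:
            break  # no covariate improves the fit
        beta[best_j] += nu * best_coef
        resid = [resid[i] - nu * best_coef * X[i][best_j] for i in range(n)]
    return beta


# Toy data (hypothetical): the response depends on the first covariate only.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2, 2, -2, -2]
print(componentwise_boost(X, y))
```

Covariates that are never selected keep a coefficient of exactly zero, which is how boosting performs marker selection when it is stopped early.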
We analyze the Agatston score of coronary artery calcium (CAC) from the Multi-Ethnic Study of Atherosclerosis (MESA) using a semi-parametric zero-inflated modeling approach, where the observed CAC scores from this cohort consist of a high frequency of zeroes and continuously distributed positive values. Both partially constrained and unconstrained models are considered to investigate the underlying biological processes of CAC development from zero to positive, and from small amounts to large amounts. Unlike existing studies, a model selection procedure based on likelihood cross-validation is adopted to identify the optimal model, which is justified by comparative Monte Carlo studies. A shrinkage version of the cubic regression spline is used for model estimation and variable selection simultaneously. When applying the proposed methods to the MESA data analysis, we show that the two biological mechanisms influencing the initiation of CAC and the magnitude of CAC when it is positive are better characterized by an unconstrained zero-inflated normal model. Our results are significantly different from those in published studies, and may provide further insights into the biological mechanisms underlying CAC development in humans. This highly flexible statistical framework can be applied to zero-inflated data analyses in other areas.
cardiovascular disease; coronary artery calcium; likelihood cross-validation; model selection; penalized spline; proportional constraint; shrinkage
In MESA (Multi-Ethnic Study of Atherosclerosis), it is of interest to model the development and progression of CAC (coronary artery calcium). With about half of the CAC scores equal to zero and the rest continuously distributed, semiparametric two-part models are needed. Our main interest lies in determining the (partial) proportionality between the two covariate effects in two-part models. Such an investigation can provide important information on the mechanisms underlying CAC development. We propose a novel approach, which consists of penalized maximum likelihood estimation and a step-wise hypothesis testing procedure to determine proportionality. Simulation shows satisfactory performance of the proposed approach. Analysis of MESA suggests that proportionality holds for all covariates except LDL and HDL.
Two-part models; Proportionality; Semiparametric estimation
Semiparametric regression models with multiple covariates are commonly encountered. When there are covariates not associated with the response variable, variable selection may lead to sparser models, more lucid interpretations and more accurate estimation. In this study, we adopt a sieve approach for the estimation of nonparametric covariate effects in semiparametric regression models. We adopt a two-step iterated penalization approach for variable selection. In the first step, a mixture of the Lasso and group Lasso penalties is employed to conduct the first-round variable selection and obtain the initial estimate. In the second step, a mixture of the weighted Lasso and weighted group Lasso penalties, with weights constructed using the initial estimate, is employed for variable selection. We show that the proposed iterated approach has the variable selection consistency property, even when the number of unknown parameters diverges with sample size. Numerical studies, including simulation and analysis of a diabetes dataset, show satisfactory performance of the proposed approach.
Iterated penalization; Variable selection; Semiparametric regression
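The two-step idea can be sketched in its simplest setting. The example below assumes an orthonormal design, in which the Lasso solution for each coefficient is a soft-thresholded least-squares estimate, and it keeps only the scalar Lasso part of the penalty (the group Lasso component for nonparametric effects is omitted). The weight 1/|initial estimate| in the second step is an assumed, adaptive-Lasso-style choice, and the names `soft_threshold` and `iterated_lasso` are hypothetical.

```python
def soft_threshold(z, lam):
    """Lasso solution for one coefficient under an orthonormal design."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def iterated_lasso(z, lam1, lam2):
    """Two-step selection on least-squares estimates z (orthonormal design)."""
    # Step 1: ordinary Lasso gives the first-round selection and the
    # initial estimate.
    initial = [soft_threshold(zj, lam1) for zj in z]
    # Step 2: weighted Lasso. The weight 1/|initial| (an assumed choice)
    # penalizes small initial estimates more heavily; coefficients removed
    # in step 1 stay at zero.
    final = []
    for zj, bj in zip(z, initial):
        if bj == 0.0:
            final.append(0.0)
        else:
            final.append(soft_threshold(zj, lam2 / abs(bj)))
    return initial, final


# Toy least-squares estimates (hypothetical): two strong and two weak signals.
initial, final = iterated_lasso([3.0, -2.0, 0.4, 0.9], lam1=0.5, lam2=0.5)
print(initial)
print(final)
```

In this toy run the weakest coefficient is removed in the first step, and the next weakest survives the first step but is removed in the second, while the two strong coefficients are shrunk less in the second step than in the first.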
In the evaluation of a healthcare system, it is of interest to identify factors associated with the usage of different healthcare facilities and with different levels of medical expenditure.
A survey was conducted in January and February of 2012 in China. It focused on middle-aged and elderly people aged 45 and above. A total of 2,093 people from 1,152 households were surveyed.
For inpatient treatment, the probability of using grade III hospitals, which had the highest level of care, was positively associated with age, being married, living in urban areas, and having higher income. For outpatient treatment, the probability of using grade III hospitals was positively associated with age, being married, working in enterprises, living in urban areas, living in central and western regions, and having higher income, and negatively associated with being farmers. The total and out-of-pocket (OOP) medical expenses were analyzed separately. It was found that the expense level was associated with age, education, occupation, living in urban areas, type of hospital used, insurance being used, and per capita income.
Access to healthcare and the level of medical expenditure were found to be associated with demographic characteristics. In addition, differences between areas and regions were observed. Such results may be useful for identifying vulnerable populations and for tuning future healthcare development policies.
The semiparametric partially linear model allows flexible modeling of covariate effects on the response variable in regression. It combines the flexibility of nonparametric regression and the parsimony of linear regression. Existing estimation methods for this model rest on a key assumption: that it is known a priori which covariates have a linear effect and which do not. However, in applied work, this is rarely known in advance. We consider the problem of estimation in partially linear models without assuming a priori which covariates have linear effects. We propose a semiparametric regression pursuit method for identifying the covariates with a linear effect. Our proposed method is a penalized regression approach using a group minimax concave penalty. Under suitable conditions we show that the proposed approach is model-pursuit consistent, meaning that it can correctly determine which covariates have a linear effect and which do not with high probability. The performance of the proposed method is evaluated using simulation studies, which support our theoretical results. A real data example is used to illustrate the application of the proposed method.
Group selection; Minimax concave penalty; Model-pursuit consistency; Penalized regression; Semiparametric models
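For readers unfamiliar with the penalty, the sketch below evaluates the standard scalar minimax concave penalty (MCP) and a group version obtained by applying it to the Euclidean norm of each coefficient group. This only illustrates the penalty function; the estimation algorithm, any group-size scaling of the tuning parameter, and the spline parameterization used in the article are not reproduced. The function names and toy values are hypothetical.

```python
def mcp_penalty(t, lam, gamma):
    """Scalar minimax concave penalty with tuning parameters lam and gamma > 1.

    The penalty grows like lam * |t| near zero and flattens to the constant
    gamma * lam**2 / 2 once |t| exceeds gamma * lam.
    """
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def group_mcp_penalty(groups, lam, gamma):
    """Sum of MCP values applied to the Euclidean norm of each group."""
    total = 0.0
    for g in groups:
        norm = sum(b * b for b in g) ** 0.5
        total += mcp_penalty(norm, lam, gamma)
    return total


# Toy coefficient groups (hypothetical): one large group, one all-zero group.
print(mcp_penalty(1.0, lam=1.0, gamma=3.0))
print(group_mcp_penalty([[3.0, 4.0], [0.0, 0.0]], lam=1.0, gamma=3.0))
```

Because the penalty is constant for large norms, large groups are not shrunk further, whereas groups with small norms are pushed to exactly zero as a whole.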
The main goal of this study is to examine the distributions of illness conditions and resulting medical expenditures and their associated factors. To achieve this goal, an in-house survey was conducted in August of 2012 in rural Beijing, the capital city of China.
The survey was conducted in Nanjianchang and Beijianchang, which are two villages 20 KM away from Miyun, a satellite city of Beijing. Data was collected on 346 households, which included 834 members. Variables measured included household characteristics, household head characteristics, illness conditions, and medical expenditures. Illness conditions and corresponding expenditure were measured for inpatient treatment, outpatient treatment, and self-treatment separately. Multivariate analysis suggested that the presence of inpatient treatment was associated with household head characteristics including age, gender, and education. The presence of a high level of outpatient treatment was associated with household head characteristics including gender and education. The presence of a high level of self-treatment was significantly associated with household size. In the analysis of overall out-of-pocket (OOP) medical expenditure, only age of household head was borderline significant. In the analysis of OOP inpatient expenditure, age and gender of household head were borderline significant. The OOP outpatient expenditure was associated with household size, presence of members older than 60, household head's gender, marital status, and occupation. The OOP self-treatment expenditure was not associated with any household characteristic.
For the surveyed households, medical expenditure made up a considerable proportion of the total consumption. This study suggested that the presence of illness conditions and resulting OOP medical expenditure were associated with certain household and household head characteristics. Such results may help identify the subgroup that is the most affected by illness conditions. As this study collected recent data on inpatient, outpatient, and self-treatment separately, it may provide a useful complement to the existing studies.
In breast cancer research, it is important to identify genomic markers associated with prognosis. Multiple microarray gene expression profiling studies have been conducted, searching for prognosis markers. Genomic markers identified from the analysis of single datasets often suffer a lack of reproducibility because of small sample sizes. Integrative analysis of data from multiple independent studies has a larger sample size and may provide a cost-effective solution.
We collect four breast cancer prognosis studies with gene expression measurements. An accelerated failure time (AFT) model with an unknown error distribution is adopted to describe survival. An integrative sparse boosting approach is employed for marker selection. The proposed model and boosting approach can effectively accommodate heterogeneity across multiple studies and identify genes with consistent effects.
Simulation study shows that the proposed approach outperforms alternatives including meta-analysis and intensity approaches by identifying the majority or all of the true positives, while having a low false positive rate. In the analysis of breast cancer data, 44 genes are identified as associated with prognosis. Many of the identified genes have been previously suggested as associated with tumorigenesis and cancer prognosis. The identified genes and corresponding predicted risk scores differ from those using alternative approaches. Monte Carlo-based prediction evaluation suggests that the proposed approach has the best prediction performance.
Integrative analysis may provide an effective way of identifying breast cancer prognosis markers. Markers identified using the integrative sparse boosting analysis have sound biological implications and satisfactory prediction performance.
Breast cancer prognosis; Gene Expression; Integrative analysis; Sparse boosting
We characterized the mutational landscape of melanoma, the form of skin cancer with the highest mortality rate, by sequencing the exomes of 147 melanomas. Sun-exposed melanomas had markedly more ultraviolet (UV)-like C>T somatic mutations compared to sun-shielded acral, mucosal and uveal melanomas. Among the newly identified cancer genes was PPP6C, encoding a serine/threonine phosphatase, which harbored mutations that clustered in the active site in 12% of sun-exposed melanomas, exclusively in tumors with mutations in BRAF or NRAS. Notably, we identified a recurrent, UV-signature activating mutation in RAC1 in 9.2% of sun-exposed melanomas. This activating mutation, the third most frequent in our cohort of sun-exposed melanoma after those of BRAF and NRAS, changes Pro29 to serine (RAC1 P29S) in the highly conserved switch I domain. Crystal structures, and biochemical and functional studies of RAC1 P29S showed that the alteration releases the conformational restraint conferred by the conserved proline, causes an increased binding of the protein to downstream effectors, and promotes melanocyte proliferation and migration. These findings raise the possibility that pharmacological inhibition of downstream effectors of RAC1 signaling could be of therapeutic benefit.
Cytokines play a critical role in regulating the immune system. In the tumor microenvironment, they influence survival, proliferation, differentiation, and movement of both tumor and stromal cells, and regulate tumor interactions with the extracellular matrix. Given these biologic properties, there is reason to hypothesize that cytokine activity influences the pathogenesis of non-Hodgkin lymphoma (NHL).
We investigated the effect of genetic variation in cytokine genes on NHL prognosis and survival by evaluating genetic variation in individual SNPs as well as the combined effect of multiple deleterious genotypes. Survival information from 496 female incident NHL cases diagnosed during 1996–2000 in Connecticut was abstracted from the Connecticut Tumor Registry in 2008. Survival analyses were conducted by comparing Kaplan-Meier curves, and hazard ratios (HR) were computed using Cox proportional hazards models, adjusting for demographic and tumor characteristics, for genes that previous studies suggested to be associated with NHL survival.
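For reference, a Kaplan-Meier curve of the kind compared in such analyses can be computed with the product-limit formula. The sketch below is a minimal illustration on made-up data, not the study's data or software; the function name `kaplan_meier` and the toy inputs are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time.

    times  : observed follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Both deaths and censored subjects leave the risk set after t.
        at_risk -= removed
    return curve


# Toy follow-up data (hypothetical).
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1]):
    print(t, round(s, 3))
```

Censored subjects contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is what distinguishes this estimate from a naive proportion of survivors.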
We found that the variant IL6 genotype was significantly associated (HR=0.42; 95%CI: 0.23–0.77) with a decreased risk of death, as well as relapse and secondary cancer occurrence, among those with NHL. We also found that risk of death, relapse, and secondary cancers varied by specific SNPs for the follicular, DLBCL, and CLL/SLL histologic types. We identified combinations of polymorphisms whose combined deleterious effect significantly alters overall NHL survival and disease-free survival.
Our study provides evidence that the identification of genetic polymorphisms in cytokine genes may help improve the prediction of NHL survival and prognosis.
Non-Hodgkin lymphoma; Cytokines; Single nucleotide polymorphisms; Survival
High-throughput gene profiling studies have been extensively conducted, searching for markers associated with cancer development and progression. In this study, we analyse cancer prognosis studies with right censored survival responses. With gene expression data, we adopt the weighted gene co-expression network analysis (WGCNA) to describe the interplay among genes. In network analysis, nodes represent genes. There are subsets of nodes, called modules, which are tightly connected to each other. Genes within the same modules tend to have co-regulated biological functions. For cancer prognosis data with gene expression measurements, our goal is to identify cancer markers, while properly accounting for the network module structure. A two-step sparse boosting approach, called Network Sparse Boosting (NSBoost), is proposed for marker selection. In the first step, for each module separately, we use a sparse boosting approach for within-module marker selection and construct module-level ‘super markers’. In the second step, we use the super markers to represent the effects of all genes within the same modules and conduct module-level selection using a sparse boosting approach. Simulation study shows that NSBoost can more accurately identify cancer-associated genes and modules than alternatives. In the analysis of breast cancer and lymphoma prognosis studies, NSBoost identifies genes with important biological implications. It outperforms alternatives including the boosting and penalization approaches by identifying a smaller number of genes/modules and/or having better prediction performance.
The main goal of this study is to examine the associations between illness conditions and out-of-pocket medical expenditure with other types of household consumptions. In November and December of 2011, a survey was conducted in three cities in western China, namely Lan Zhou, Gui Lin and Xi An, and their surrounding rural areas.
Information on demographics, income and consumption was collected on 2,899 households. Data analysis suggested that the presence of household members with chronic diseases was not associated with characteristics of households or household heads. The presence of inpatient treatments was significantly associated with the age of household head (p-value 0.03). The level of per capita medical expense was significantly associated with household size, presence of members younger than 18, older than 65, basic health insurance coverage, per capita income, and household head occupation. Adjusting for confounding effects, the presence of chronic diseases was negatively associated with the amount of basic consumption (p-value 0.02) and the percentage of basic consumption (p-value 0.01), but positively associated with the percentage of insurance expense (p-value 0.02). Medical expenditure was positively associated with all other types of consumptions, including basic, education, saving and investment, entertainment, insurance, durable goods, and alcohol/tobacco. It was negatively associated with the percentage of basic consumption, saving and investment, and insurance.
Early studies conducted in other Asian countries and rural China found negative associations between illness conditions and medical expenditure with other types of consumptions. This study was conducted in three major cities and surrounding areas in western China, which had not been well investigated in published literature. The observed consumption patterns were different from those in early studies, and the negative associations were not observed. This study may complement the existing rural studies and provide useful information on western Chinese cities.
Genome-wide association studies have been extensively conducted, searching for markers for biologically meaningful outcomes and phenotypes. Penalization methods have been adopted in the analysis of the joint effects of a large number of SNPs (single nucleotide polymorphisms) and marker identification. This study is partly motivated by the analysis of a heterogeneous stock mice dataset, in which multiple correlated phenotypes and a large number of SNPs are available. Existing penalization methods designed to analyze a single response variable cannot accommodate the correlation among multiple response variables. With multiple response variables sharing the same set of markers, joint modeling is first employed to accommodate the correlation. The group Lasso approach is adopted to select markers associated with all the outcome variables. An efficient computational algorithm is developed. Simulation study and analysis of the heterogeneous stock mice dataset show that the proposed method can outperform existing penalization methods.
Although microarray gene profiling studies in cancer research have been successful in identifying genetic variants predisposing to the development and progression of cancer, the markers identified from the analysis of single datasets often suffer from low reproducibility. Among multiple possible causes, the most important one is the small sample size, and hence the lack of power, of single studies. Integrative analysis jointly considers multiple heterogeneous studies, has a significantly larger sample size, and can improve reproducibility. In this article, we focus on cancer prognosis studies, where the response variables are progression-free, overall, or other types of survival. A group minimax concave penalty (GMCP) penalized integrative analysis approach is proposed for analyzing multiple heterogeneous cancer prognosis studies with microarray gene expression measurements. An efficient group coordinate descent algorithm is developed. The GMCP can automatically accommodate the heterogeneity across multiple datasets, and the identified markers have consistent effects across multiple studies. Simulation studies show that the GMCP provides significantly improved selection results as compared with the existing meta-analysis approaches, intensity approaches, and group Lasso penalized integrative analysis. We apply the GMCP to four microarray studies and identify genes associated with the prognosis of breast cancer.
integrative analysis; cancer prognosis; microarray; penalized selection
The health insurance system in Taiwan is comprised of public health insurance and private health insurance. The public health insurance, called “universal national health insurance” (NHI), was first established in 1995 and amended in 2011. The goal of this study is to provide an updated description of several important aspects of health insurance in Taiwan. Of special interest are household insurance coverage, medical expenditures (both gross and out-of-pocket), and coping strategies.
Data was collected via a phone call survey conducted in August and September of 2011. A household was the unit for survey and data analysis. A total of 2,424 households covering all major counties and cities in Taiwan were surveyed.
The survey revealed that households with smaller sizes and higher incomes were more likely to have higher coverage of public and private health insurance. In addition, households with the presence of chronic diseases were more likely to have both types of insurance. Analysis of both gross and out-of-pocket medical expenditure was conducted. It was suggested that health insurance could not fully remove the financial burden caused by illness. The presence of chronic disease and inpatient treatment were significantly associated with higher gross and out-of-pocket medical expenditure. In addition, the presence of inpatient treatment was significantly associated with extremely high medical expenditure. Regional differences were also observed, with households in the northern, central, and southern regions having less gross medical expenditures than those on the offshore islands. Households with the presence of inpatient treatment were more likely to cope with medical expenditure using means other than salaries.
Despite the considerable achievements of the health insurance system in Taiwan, there is still room for improvement. This study investigated coverage, cost, and coping strategies and may be informative to stakeholders of both basic and commercial health insurance.
Taiwan; Health insurance coverage; Medical expenditure; Coping strategy
Evidence from previous studies has suggested that testicular cancer survivors may experience changes in physical and mental health. However, no such studies have been conducted in the United States.
Study participants were initially enrolled in the US Servicemen's Testicular Tumor Environmental and Endocrine Determinants (STEED) study between 2002 and 2005. A total of 246 TGCT (testicular germ cell tumor) cases and 236 non-testicular cancer controls participated in the current study and completed a self-administered questionnaire. Mean time since diagnosis for cases was 14 years, and at least five years for all cases. Component scores determined from responses to questions about physical and mental health on the SF-36 were tabulated to yield two summary measures: physical component scores (PCS) and mental component scores (MCS). Component and summary scores were normalized to a mean of 50 with a standard deviation of 10 by a linear T-score transformation.
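The linear T-score transformation described above can be sketched as follows. This is a minimal illustration only: the reference mean and standard deviation passed to the function are hypothetical placeholders, since the abstract does not state which reference population was used for normalization.

```python
import numpy as np

def t_score(raw, ref_mean, ref_sd):
    """Linear T-score: rescale so the reference population has mean 50 and SD 10."""
    raw = np.asarray(raw, dtype=float)
    return 50.0 + 10.0 * (raw - ref_mean) / ref_sd

# Hypothetical raw scale scores and reference values, for illustration only.
scores = t_score([60.0, 75.0, 90.0], ref_mean=75.0, ref_sd=15.0)
```

A raw score equal to the reference mean maps to 50, and each reference standard deviation above or below the mean shifts the T-score by 10 points.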
Overall, the quality of life of cases may not differ greatly from that of controls. When all cases and controls were compared, TGCT cases had lower PCS (mean: 51.9, 95% CI: 50.6–53.2, P value: 0.037) than controls (mean: 53.6, 95% CI: 52.7–54.6). MCS were not significantly different (P value: 0.091). In multivariate analyses, several physical health components, such as role-physical (OR 1.19, 95% CI: 1.01–1.39) and general health (OR 1.26, 95% CI: 1.07–1.49), were worse for TGCT cases than for controls. However, TGCT cases treated with chemotherapy had lower PCS (cases: 50.2, 95% CI: 47.6–52.8; controls: 53.6, 95% CI: 52.7–54.6, P value: 0.0032) and MCS (cases: 49.3, 95% CI: 46.5–52.1; controls: 52.0, 95% CI: 50.9–53.2, P value: 0.039). TGCT cases who received treatments other than chemotherapy did not differ from controls in either PCS or MCS.
Physical and general health limitations may affect testicular cancer survivors. Men treated with chemotherapy, however, may be most likely to suffer adverse health outcomes due to a combination of body-wide effects on physical and mental factors which affect various aspects of physical health, mental health, and overall quality of life. In particular, physical functioning, role-physical, and general health are strongly affected.
Health status; Quality of life; Testicular cancer
Despite decades of intensive research, Non-Hodgkin Lymphoma (NHL) remains poorly understood and is largely incurable. NHL is a heterogeneous group of malignancies with multiple subtypes, each of which has distinct morphologic, immunophenotypic, and clinical features. Identifying the risk factors for NHL may improve our understanding of the underlying biological mechanisms and have an impact on clinical practice.
This article provides a review of several aspects of NHL, including epidemiology and subtype classification, clinical, environmental, genetic, and genomic risk factors identified for etiology and prognosis, and available statistical and bioinformatics tools for identification of genetic and genomic risk factors from the analysis of high-throughput studies.
Multiple clinical and environmental risk factors have been identified. However, they have failed to provide practically effective prediction. Genetic and genomic risk factors identified from high-throughput studies have suffered a lack of reproducibility. The identification of genetic/genomic risk factors demands innovative statistical and bioinformatics tools. Although multiple analysis methods have been developed, there is still room for improvement. There is a critical need for well-designed, prospective, large-scale pangenomic studies.
NHL; etiology; prognosis; risk factors; bioinformatics analysis
In the analysis of bioinformatics data, a unique challenge arises from the high dimensionality of measurements. Without loss of generality, we use genomic studies with gene expression measurements as a representative example but note that the analysis techniques discussed in this article are also applicable to other types of bioinformatics studies. Principal component analysis (PCA) is a classic dimension reduction approach. It constructs linear combinations of gene expressions, called principal components (PCs). The PCs are orthogonal to each other, can effectively explain variation of gene expressions, and may have a much lower dimensionality. PCA is computationally simple and can be realized using many existing software packages. This article consists of the following parts. First, we review the standard PCA technique and its applications in bioinformatics data analysis. Second, we describe recent ‘non-standard’ applications of PCA, including accommodating interactions among genes, pathways and network modules and conducting PCA with estimating equations as opposed to gene expressions. Third, we introduce several recently proposed PCA-based techniques, including the supervised PCA, sparse PCA and functional PCA. The supervised PCA and sparse PCA have been shown to have better empirical performance than the standard PCA. The functional PCA can analyze time-course gene expression data. Last, we raise awareness of several critical but unsolved problems related to PCA. The goal of this article is to make bioinformatics researchers aware of the PCA technique and, more importantly, its most recent developments, so that this simple yet effective dimension reduction technique can be better employed in bioinformatics data analysis.
principal component analysis; dimension reduction; bioinformatics methodologies; gene expression
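The standard PCA construction described in the abstract above (orthogonal linear combinations of gene expressions that explain variation) can be sketched with a singular value decomposition. This is a minimal illustration, not the article's implementation: the function name `pca`, the samples-by-genes layout, and the simulated data are assumptions.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the column-centered matrix X (rows: samples, columns: genes)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each gene
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T               # genes x k, orthonormal columns
    scores = Xc @ loadings                       # samples x k, the PCs
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, loadings, explained

# Simulated expression data (hypothetical): 20 samples, 100 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
scores, loadings, ratio = pca(X, n_components=3)
```

Here three PCs replace 100 gene expressions per sample, and `ratio` gives the proportion of total variance each PC explains, which is one common way to decide how many PCs to keep.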