author:("Ma, shuang")
1.  Illness and medical and other expenditures: observations from western and eastern China 
Background
Illness and the medical expenditure that follows have a profound impact on the well-being of individuals and households. China is a huge country with significant regional differences. The goal of this study is to investigate the associations of illness and medical expenditure with other categories of household expenditures, with special attention paid to the differences in observations between the western and eastern regions.
Methods
A survey was conducted in six major cities in China, three in the east and three in the west, in 2011. Data on demographics, illness conditions, and medical and other expenditures were collected from 12,515 households.
Results
In the analysis of the associations of illness conditions and medical expenditure with demographics, multiple significant associations were observed, and there are differences between the eastern and western regions. In univariate analyses, illness conditions and medical expenditure were found to have significant associations with other categories of expenditures. In multivariate analyses adjusting for household and household head characteristics, few associations were observed, and there exist differences between the regions.
Conclusions
This study has provided empirical evidence on the associations of illness/medical expenditure with demographics and with other categories of expenditures. Differences across regions were observed in multiple aspects. The reasons underlying such differences are worth investigating further.
doi:10.1186/s12913-015-0730-6
PMCID: PMC4336723
Illness condition; Medical expenditure; Household expenditure; Cross-region difference; China
2.  Censored Rank Independence Screening for High-dimensional Survival Data 
Biometrika  2014;101(4):799-814.
Summary
In modern statistical applications, the dimension of covariates can be much larger than the sample size. In the context of linear models, correlation screening (Fan and Lv, 2008) has been shown to reduce the dimension of such data effectively while achieving the sure screening property, i.e., all of the active variables can be retained with high probability. However, screening based on the Pearson correlation does not perform well when applied to contaminated covariates and/or censored outcomes. In this paper, we study censored rank independence screening of high-dimensional survival data. The proposed method is robust to predictors that contain outliers, works for a general class of survival models, and enjoys the sure screening property. Simulations and an analysis of real data demonstrate that the proposed method performs competitively on survival data sets of moderate size and high-dimensional predictors, even when these are contaminated.
doi:10.1093/biomet/asu047
PMCID: PMC4318124  PMID: 25663709
High-dimensional survival data; Rank independence screening; Sure screening property
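The screening idea in this abstract can be illustrated with a minimal sketch: score each covariate by a rank-based marginal association with the outcome and keep the top few. The sketch below uses plain Kendall's tau on fully observed outcomes, which is an assumption made for illustration; it does not handle censoring and is not the estimator proposed in the paper. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            score += 1
        elif s < 0:
            score -= 1
    return score / (n * (n - 1) / 2)


def rank_screen(columns, y, keep):
    """Return the indices of the `keep` covariates with the largest |tau|."""
    scores = [abs(kendall_tau(col, y)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(order[:keep])


# Toy data (hypothetical): covariate 0 contains an outlier (600) but is
# still perfectly monotone in y, so a rank-based score is unaffected by it.
y = [1, 2, 3, 4, 5, 6]
columns = [
    [1, 2, 3, 4, 5, 600],   # monotone increasing, with an outlier
    [3, 1, 4, 1, 5, 9],     # weaker association
    [6, 5, 4, 3, 2, 1],     # monotone decreasing
]
selected = rank_screen(columns, y, keep=2)
```

Because only the ordering of values enters the score, the outlier in the first covariate does not change its rank; this is the robustness property the abstract attributes to rank-based screening, shown here without the censoring adjustment.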
3.  Racial Differences in Nasopharyngeal Carcinoma in the United States 
Cancer epidemiology  2013;37(6):10.1016/j.canep.2013.08.008.
Background
Nasopharyngeal carcinoma (NPC) is a malignant neoplasm arising from the mucosal epithelium of the nasopharynx. Different races can have different etiology, presentation, and progression patterns.
Methods
Data were analyzed on NPC patients in the United States reported to the SEER (Surveillance, Epidemiology, and End Results) database between 1973 and 2009. Racial groups studied included non-Hispanic whites, Hispanic whites, blacks, Asians, and others. Patient characteristics, age-adjusted incidence and mortality rates, treatment, and five-year relative survival rates were compared across races. Stratification by stage at diagnosis and histologic type was considered. Multivariate regression was conducted to evaluate the significance of racial differences.
Results
Patient characteristics that were significantly different across races included age at diagnosis, histologic type, in situ/malignant tumors in lifetime, stage, grade, and regional nodes positive. Incidence and mortality rates were significantly different across races, with Asians having the highest rates overall and stratified by age and/or histologic type. Asians also had the highest rate of receiving radiation only. The racial differences in treatment were significant in the multivariate stratified analysis. When stratified by stage and histologic type, Asians had the best five-year survival rates. The survival experience of other races depended on stage and type. In the multivariate analysis, the racial differences were significant.
Conclusions
Analysis of the SEER data shows that racial differences exist among NPC patients in the U.S. This result can be informative to cancer epidemiologists and clinicians.
doi:10.1016/j.canep.2013.08.008
PMCID: PMC3851929  PMID: 24035238
nasopharyngeal carcinoma; racial differences; SEER
4.  Racial differences in mantle cell lymphoma in the United States 
BMC Cancer  2014;14(1):764.
Background
MCL (mantle cell lymphoma) is a rare subtype of NHL (non-Hodgkin lymphoma) with mostly poor prognosis. Different races have different etiology, presentation, and progression patterns.
Methods
Data were analyzed on MCL patients in the United States reported to the SEER (Surveillance, Epidemiology, and End Results) database between 1992 and 2009. SEER contains the most comprehensive population-based cancer information in the U.S., covering approximately 28% of the population. Racial groups analyzed included non-Hispanic whites, Hispanic whites, blacks, and Asians/PIs (Pacific Islanders). Patient characteristics, age-adjusted incidence rate, and survival rate were compared across races. Stratification by age, gender, and stage at diagnosis was considered. Multivariate analysis was conducted on survival.
Results
In the analysis of patients’ characteristics, distributions of gender, marital status, age at diagnosis, stage, and extranodal involvement were significantly different across races. For all three age groups and both male and female, non-Hispanic whites have the highest incidence rates. In the analysis of survival, for cancers diagnosed in the period of 1992–2004, no significant racial difference is observed. For cancers diagnosed in the period of 1999–2004, significant racial differences exist for the 40–64 age group and stage III and IV cancers.
Conclusions
Racial differences exist among MCL patients in the U.S. in terms of patients’ characteristics, incidence, and survival. More extended data collection and analysis are needed to more comprehensively describe and understand the racial differences.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2407-14-764) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2407-14-764
PMCID: PMC4210548  PMID: 25315847
Mantle cell lymphoma; Racial differences; SEER; Non-hodgkin lymphoma
5.  Identification of gene–environment interactions in cancer studies using penalization 
Genomics  2013;102(4):10.1016/j.ygeno.2013.08.006.
High-throughput cancer studies have been extensively conducted, searching for genetic markers associated with outcomes beyond clinical and environmental risk factors. Gene–environment interactions can have important implications beyond main effects. The commonly-adopted single-marker analysis cannot accommodate the joint effects of a large number of markers. The existing joint-effects methods also have limitations. Specifically, they may suffer from high computational cost, do not respect the “main effect, interaction” hierarchical structure, or use ineffective techniques. We develop a penalization method for the identification of important G × E interactions and main effects. It has an intuitive formulation, respects the hierarchical structure, accommodates the joint effects of multiple markers, and is computationally affordable. In numerical study, we analyze prognosis data under the AFT (accelerated failure time) model. Simulation shows satisfactory performance of the proposed method. Analysis of an NHL (non-Hodgkin lymphoma) study with SNP measurements shows that the proposed method identifies markers with important implications and satisfactory prediction performance.
doi:10.1016/j.ygeno.2013.08.006
PMCID: PMC3869641  PMID: 23994599
Gene–environment interaction; Penalized marker identification; Cancer prognosis
6.  Accounting for linkage disequilibrium in genome-wide association studies: A penalized regression method 
Statistics and its interface  2013;6(1):99-115.
Penalized regression methods are becoming increasingly popular in genome-wide association studies (GWAS) for identifying genetic markers associated with disease. However, standard penalized methods such as LASSO do not take into account the possible linkage disequilibrium between adjacent markers. We propose a novel penalized approach for GWAS using a dense set of single nucleotide polymorphisms (SNPs). The proposed method uses the minimax concave penalty (MCP) for marker selection and incorporates linkage disequilibrium (LD) information by penalizing the difference of the genetic effects at adjacent SNPs with high correlation. A coordinate descent algorithm is derived to implement the proposed method. This algorithm is efficient in dealing with a large number of SNPs. A multi-split method is used to calculate the p-values of the selected SNPs for assessing their significance. We refer to the proposed penalty function as the smoothed MCP and the proposed approach as the SMCP method. Performance of the proposed SMCP method and its comparison with LASSO and MCP approaches are evaluated through simulation studies, which demonstrate that the proposed method is more accurate in selecting associated SNPs. Its applicability to real data is illustrated using heterogeneous stock mice data and a rheumatoid arthritis dataset.
doi:10.4310/SII.2013.v6.n1.a10
PMCID: PMC4172344  PMID: 25258655
Genetic association; Feature selection; Linkage disequilibrium; Penalized regression; Single nucleotide polymorphism
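As a rough illustration of the MCP component mentioned in this abstract, the sketch below implements the standard minimax concave penalty and its closed-form univariate update, which is the building block of coordinate descent under a standardized design. The LD smoothing term of SMCP is deliberately not included, and the function names are hypothetical rather than taken from the paper's software.

```python
def mcp_penalty(t, lam, gamma):
    """MCP value: lam*|t| - t^2/(2*gamma) up to |t| = gamma*lam, then constant."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def soft_threshold(z, lam):
    """Soft-thresholding operator used by LASSO-type updates."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_threshold(z, lam, gamma):
    """Minimizer over b of 0.5*(z - b)^2 + mcp_penalty(b, lam, gamma), gamma > 1."""
    if gamma <= 1.0:
        raise ValueError("gamma must exceed 1 for a unique minimizer")
    if abs(z) <= gamma * lam:
        # Small signals are shrunk, with a rescaling that reduces LASSO bias.
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    # Large signals are left unpenalized.
    return z
```

In a coordinate descent loop, `z` would be the univariate least-squares coefficient computed from the partial residuals for one SNP. Unlike the soft-thresholding used for LASSO, the MCP update leaves large coefficients unshrunk.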
7.  Integrative analysis of multiple cancer genomic datasets under the heterogeneity model 
Statistics in medicine  2013;32(20):3509-3521.
In the analysis of cancer studies with high-dimensional genomic measurements, integrative analysis provides an effective way of pooling information across multiple heterogeneous datasets. The genomic basis of multiple independent datasets, which can be characterized by the sets of genomic markers, can be described using the homogeneity model or heterogeneity model. Under the homogeneity model, all datasets share the same set of markers associated with responses. In contrast, under the heterogeneity model, different studies have overlapping but possibly different sets of markers. The heterogeneity model contains the homogeneity model as a special case and can be much more flexible. Marker selection under the heterogeneity model calls for bi-level selection to determine whether a covariate is associated with response in any study at all as well as in which studies it is associated with responses. In this study, we consider two minimax concave penalty (MCP) based penalization approaches for marker selection under the heterogeneity model. For each approach, we describe its rationale and an effective computational algorithm. We conduct simulation to investigate their performance and compare with the existing alternatives. We also apply the proposed approaches to the analysis of gene expression data on multiple cancers.
doi:10.1002/sim.5780
PMCID: PMC3743947  PMID: 23519988
Integrative analysis; Heterogeneity model; Marker selection
8.  Polymorphisms in JAK/STAT Signaling Pathway Genes and Risk of Non-Hodgkin Lymphoma 
Leukemia research  2013;37(9):1120-1124.
Impaired function of Janus kinase/signal transducer and activator of transcription (JAK/STAT) signaling pathway genes leads to immunodeficiency and various hematopoietic disorders. We evaluated the association between genetic polymorphisms (SNPs) in 12 JAK/STAT pathway genes (JAK3, STAT1, STAT2, STAT3, STAT4, STAT5a, STAT5b, STAT6, SOCS1, SOCS2, SOCS3, and SOCS4) and NHL risk in a population-based case-control study of Connecticut women. We identified three SNPs in STAT3 (rs12949918 and rs6503695) and STAT4 (rs932169) associated with NHL risk after adjustment for multiple comparisons. Our results suggest that genetic variation in JAK/STAT pathway genes may play a role in lymphomagenesis and warrants further investigation.
doi:10.1016/j.leukres.2013.05.003
PMCID: PMC3998836  PMID: 23768868
JAK/STAT signaling pathway; Non-Hodgkin Lymphoma; polymorphism; case-control study
9.  Birth Weight Reference Percentiles for Chinese 
PLoS ONE  2014;9(8):e104779.
Objective
To develop a reference of population-based gestational age-specific birth weight percentiles for contemporary Chinese.
Methods
Birth weight data were collected by the China National Population-based Birth Defects Surveillance System. A total of 1,105,214 live singleton births aged ≥28 weeks of gestation without birth defects during 2006–2010 were included. The lambda-mu-sigma method was utilized to generate percentiles and curves.
Results
Gestational age-specific birth weight percentiles for male and female infants were constructed separately. Significant differences were observed between the current reference and other references developed for Chinese or non-Chinese infants.
Conclusion
There have been moderate increases in birth weight percentiles for Chinese infants of both sexes and most gestational ages since the 1980s, suggesting the importance of utilizing an updated national reference for both clinical and research purposes.
doi:10.1371/journal.pone.0104779
PMCID: PMC4134219  PMID: 25127131
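For readers unfamiliar with the lambda-mu-sigma (LMS) method named in this abstract, the sketch below shows how a percentile is obtained once the three curves have been fitted: L is the Box-Cox power, M the median, and S the coefficient of variation at a given gestational age. The fitting step itself is not shown, and the numeric L, M, S values are made up for illustration and are not taken from the study.

```python
import math
from statistics import NormalDist


def lms_centile(p, L, M, S):
    """Measurement at cumulative probability p (0 < p < 1) under the LMS model."""
    z = NormalDist().inv_cdf(p)
    if L == 0:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


def lms_zscore(y, L, M, S):
    """Inverse transform: z-score of measurement y under the LMS model."""
    if L == 0:
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


# Hypothetical parameters for one sex and one gestational week (grams).
L_demo, M_demo, S_demo = 0.8, 3300.0, 0.13
p10 = lms_centile(0.10, L_demo, M_demo, S_demo)
p50 = lms_centile(0.50, L_demo, M_demo, S_demo)
p90 = lms_centile(0.90, L_demo, M_demo, S_demo)
```

By construction the 50th percentile equals M, and `lms_zscore` recovers the standard normal quantile used to generate any given percentile.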
10.  Sparse Group Penalized Integrative Analysis of Multiple Cancer Prognosis Datasets 
Genetics research  2013;95(0):68-77.
SUMMARY
In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Because of the “large d, small n” characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyzes multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. In most existing integrative analyses, the homogeneity model has been assumed, which postulates that different datasets share the same set of markers. Several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) model to describe survival. This model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group MCP (minimax concave penalty) approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. Simulation study shows that it outperforms the existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of the heterogeneity model and the proposed approach.
doi:10.1017/S0016672313000086
PMCID: PMC4090387  PMID: 23938111
Integrative analysis; Cancer prognosis; Heterogeneity model; Penalization
11.  Ethnicity and Sex Modify the Association of Serum C-Reactive Protein with Microalbuminuria 
Ethnicity & disease  2008;18(3):324-329.
Objectives
To study the association between serum C-reactive protein (CRP) and urinary albumin excretion in the Multi-Ethnic Study of Atherosclerosis and to assess whether the association is modified by ethnicity, sex, or systolic blood pressure.
Methods
This was a cross-sectional study of 6675 participants who were free from macroalbuminuria and clinical cardiovascular disease (mean age 62.1 years, 53% female; 39% White, 27% African American, 22% Hispanic, and 12% Chinese). Urinary albumin excretion was measured by spot urine albumin-to-creatinine ratio (ACR). Effect modifications were tested after adjusting for age, diabetes, body mass index, smoking, use of angiotensin-converting enzyme inhibitor or angiotensin-receptor blocker, other antihypertensive drugs, estrogens, statins, and high-density lipoprotein cholesterol and triglyceride levels.
Results
The association between CRP and ACR was modified by ethnicity (P=.01) and sex (P<.001), but not by systolic blood pressure. After multivariate adjustment, the association remained in Chinese, African American, and Hispanic men and African American women (P<.02 for African American men, and P<.04 for the other subgroups).
Conclusions
The association between CRP and ACR was modified by ethnicity and sex; it was stronger in non-White men and African American women. These interactions have not been reported before, and future studies should consider them.
PMCID: PMC4089959  PMID: 18785447
Albuminuria; C-Reactive Protein; Ethnicity; Gender
12.  Polymorphisms in DNA Repair Pathway Genes, Body Mass Index, and Risk of Non-Hodgkin Lymphoma 
American journal of hematology  2013;88(7):606-611.
We conducted a population-based case-control study in Connecticut women to test the hypothesis that genetic variations in DNA repair pathway genes may modify the relationship between body mass index (BMI) and risk of non-Hodgkin lymphoma (NHL). Compared to those with BMI < 25, women with BMI ≥ 25 had significantly increased risk of NHL among women who carried BRCA1 (rs799917) CT/TT, ERCC2 (rs13181) AA, XRCC1 (rs1799782) CC, and WRN (rs1801195) GG genotypes, but no increase in NHL risk among women who carried BRCA1 CC, ERCC2 AC/CC, XRCC1 CT/TT, and WRN GT/TT genotypes. A significant interaction with BMI was only observed for WRN (rs1801195, P=0.004) for T-cell lymphoma and ERCC2 (rs13181, P=0.002) for diffuse large B-cell lymphoma. The results suggest that common genetic variation in DNA repair pathway genes may modify the association between BMI and NHL risk.
doi:10.1002/ajh.23463
PMCID: PMC3902049  PMID: 23619945
Non-Hodgkin lymphoma; BMI; polymorphisms; DNA repair genes
13.  Penalized multivariate linear mixed model for longitudinal genome-wide association studies 
BMC Proceedings  2014;8(Suppl 1):S73.
We consider analysis of Genetic Analysis Workshop 18 data, which involves multiple longitudinal traits and dense genome-wide single-nucleotide polymorphism (SNP) markers. We use a multivariate linear mixed model to account for the covariance of random effects and multivariate residuals. We divide the SNPs into groups according to the genes they belong to and score them using weighted sum statistics. We propose a penalized approach for genetic variant selection at the gene level. The overall modeling and penalized selection method is referred to as the penalized multivariate linear mixed model. Cross-validation is used for tuning parameter selection. A resampling approach is adopted to evaluate the relative stability of the identified genes. Application to the Genetic Analysis Workshop 18 data shows that the proposed approach can effectively select markers associated with phenotypes at gene level.
doi:10.1186/1753-6561-8-S1-S73
PMCID: PMC4143695  PMID: 25519343
14.  The Risk of Second Cancers After Diagnosis of Primary Thyroid Cancer Is Elevated in Thyroid Microcarcinomas 
Thyroid  2013;23(5):575-582.
Background
Thyroid cancers have increased dramatically over the past few decades. Comorbidities may be important, and previous studies have indicated elevated second cancer risk after initial primary thyroid cancers. This study examined the risk of second cancers after development of a primary thyroid cancer, utilizing the Surveillance, Epidemiology, and End Results (SEER) program database.
Methods
The cohort consisted of men and women diagnosed with first primary thyroid cancer who were reported to a SEER database in 1973–2008 (n=52,103). Standardized incidence ratios (SIR) were calculated for all secondary cancers. Confidence intervals and p-values are two-sided at the 0.05 significance level and are based on Poisson exact methods.
Results
In this cohort, 4457 individuals developed second cancers. The risk of developing second cancers after a primary thyroid cancer varied from 10% to 150% depending on different cancer types. Cancers in all sites, breast, skin, prostate, kidney, brain, salivary gland, second thyroid, lymphoma, myeloma, and leukemia were elevated. The magnitude of the risk varied by histology, tumor size, calendar year of first primary thyroid cancer diagnosis, and the treatment of the primary thyroid cancer. The risk of a second cancer was elevated in patients whose first primary thyroid carcinomas were small, or were diagnosed after 1994, or in whom some form of radiation treatment was administered.
Conclusions
This large population-based analysis of second cancers among thyroid cancer patients suggests that there was an increase of second cancers in all sites, and the most commonly elevated second cancers were the salivary gland and kidney. Additionally, the increase in second cancers in patients with recently diagnosed thyroid microcarcinomas (<10 mm) suggests that aggressive radiation treatment of the first primary thyroid cancer, the environment, and genetic susceptibility, may increase the risk of a second cancer.
doi:10.1089/thy.2011.0406
PMCID: PMC3643257  PMID: 23237308
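The standardized incidence ratio and its exact Poisson confidence interval, as described in the Methods of this abstract, can be sketched as follows. The SIR is the observed count divided by the expected count, and the exact limits are the Poisson means at which the observed count sits in each tail with probability alpha/2. The sketch uses only the standard library and finds the limits by bisection; the function names and the demo counts are hypothetical and are not the study's data.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), with terms computed in log space."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    log_mu = math.log(mu)
    total = sum(math.exp(i * log_mu - mu - math.lgamma(i + 1)) for i in range(k + 1))
    return min(total, 1.0)


def exact_poisson_ci(observed, alpha=0.05):
    """Two-sided exact confidence limits for a Poisson mean, by bisection."""
    half = alpha / 2.0
    if observed == 0:
        lower = 0.0
    else:
        lo, hi = 0.0, float(observed)
        for _ in range(200):
            mid = (lo + hi) / 2.0
            # P(X >= observed | mid) increases with mid.
            if 1.0 - poisson_cdf(observed - 1, mid) < half:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2.0
    lo, hi = float(observed), float(observed) + 1.0
    while poisson_cdf(observed, hi) > half:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        # P(X <= observed | mid) decreases with mid.
        if poisson_cdf(observed, mid) > half:
            lo = mid
        else:
            hi = mid
    upper = (lo + hi) / 2.0
    return lower, upper


def sir_with_ci(observed, expected, alpha=0.05):
    """Return (SIR, lower limit, upper limit) for observed vs. expected counts."""
    lower, upper = exact_poisson_ci(observed, alpha)
    return observed / expected, lower / expected, upper / expected


# Hypothetical counts: 10 second cancers observed, 8.0 expected.
sir, sir_lo, sir_hi = sir_with_ci(10, 8.0)
```

An SIR whose lower limit exceeds 1 indicates a statistically elevated risk at the chosen level; with the hypothetical counts above the interval spans 1, so no elevation would be declared.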
15.  Incorporating group correlations in genome-wide association studies using smoothed group Lasso 
Biostatistics (Oxford, England)  2012;14(2):205-219.
In genome-wide association studies, penalization is an important approach for identifying genetic markers associated with disease. Motivated by the fact that there exists natural grouping structure in single nucleotide polymorphisms and, more importantly, such groups are correlated, we propose a new penalization method for group variable selection which can properly accommodate the correlation between adjacent groups. This method is based on a combination of the group Lasso penalty and a quadratic penalty on the difference of regression coefficients of adjacent groups. The new method is referred to as smoothed group Lasso (SGL). It encourages group sparsity and smoothes regression coefficients for adjacent groups. Canonical correlations are applied to the weights between groups in the quadratic difference penalty. We first derive a group coordinate descent (GCD) algorithm for computing the solution path with linear regression model. The SGL method is further extended to logistic regression for binary response. With the assistance of the majorize–minimization algorithm, the SGL penalized logistic regression turns out to be an iteratively penalized least-square problem. We also suggest conducting principal component analysis to reduce the dimensionality within groups. Simulation studies are used to evaluate the finite sample performance. Comparison with group Lasso shows that SGL is more effective in selecting true positives. Two datasets are analyzed using the SGL method.
doi:10.1093/biostatistics/kxs034
PMCID: PMC3590928  PMID: 22988281
Group selection; Regularization; SNP; Smoothing
16.  Incorporating Network Structure in Integrative Analysis of Cancer Prognosis Data 
Genetic epidemiology  2012;37(2):173-183.
In high-throughput cancer genomic studies, markers identified from the analysis of single datasets may have unsatisfactory properties because of low sample sizes. Integrative analysis pools and analyzes raw data from multiple studies, and can effectively increase sample size and lead to improved marker identification results. In this study, we consider the integrative analysis of multiple high-throughput cancer prognosis studies. In the existing integrative analysis studies, the interplay among genes, which can be described using the network structure, has not been effectively accounted for. In network analysis, tightly-connected nodes (genes) are more likely to have related biological functions and similar regression coefficients. The goal of this study is to develop an analysis approach that can incorporate the gene network structure in integrative analysis. To this end, we adopt an AFT (accelerated failure time) model to describe survival. A weighted least squares approach, which has low computational cost, is adopted for estimation. For marker selection, we propose a new penalization approach. The proposed penalty is composed of two parts. The first part is a group MCP penalty, and conducts gene selection. The second part is a Laplacian penalty, and smoothes the differences of coefficients for tightly-connected genes. A group coordinate descent approach is developed to compute the proposed estimate. Simulation study shows satisfactory performance of the proposed approach when there exist moderate to strong correlations among genes. We analyze three lung cancer prognosis datasets, and demonstrate that incorporating the network structure can lead to the identification of important genes and improved prediction performance.
doi:10.1002/gepi.21697
PMCID: PMC3909475  PMID: 23161517
Integrative analysis; Cancer prognosis; Gene network; Penalized selection; Laplacian shrinkage
17.  Integrative Analysis of Cancer Prognosis Data with Multiple Subtypes Using Regularized Gradient Descent 
Genetic epidemiology  2012;10.1002/gepi.21669.
In cancer research, high-throughput profiling studies have been extensively conducted, searching for genes/SNPs associated with prognosis. Despite seemingly significant differences, different subtypes of the same cancer (or different types of cancers) may share common susceptibility genes. In this study, we analyze prognosis data on multiple subtypes of the same cancer, but note that the proposed approach is directly applicable to the analysis of data on multiple types of cancers. We describe the genetic basis of multiple subtypes using the heterogeneity model, which allows overlapping but different sets of susceptibility genes/SNPs for different subtypes. An accelerated failure time (AFT) model is adopted to describe prognosis. We develop a regularized gradient descent approach, which conducts gene-level analysis and identifies genes that contain important SNPs associated with prognosis. The proposed approach belongs to the family of gradient descent approaches, is intuitively reasonable, and has affordable computational cost. Simulation study shows that when prognosis-associated SNPs are clustered in a small number of genes, the proposed approach outperforms alternatives with significantly more true positives and fewer false positives. We analyze an NHL (non-Hodgkin lymphoma) prognosis study with SNP measurements, and identify genes associated with the three major subtypes of NHL, namely DLBCL, FL and CLL/SLL. The proposed approach identifies genes different from using alternative approaches and has the best prediction performance.
doi:10.1002/gepi.21669
PMCID: PMC3729731  PMID: 22851516
Integrative analysis; Cancer Prognosis; Gradient descent; NHL; SNP
18.  Health Insurance Utilization and Its Impact: Observations from the Middle-Aged and Elderly in China 
PLoS ONE  2013;8(12):e80978.
Objective
In China, despite a high coverage rate, health insurance is not used for all illness episodes. Our goal is to identify subjects’ characteristics associated with insurance utilization and the association between utilization and medical expenditure.
Methods
A survey was conducted in January and February of 2012. 2093 middle-aged and elderly subjects (45 years old and above) were surveyed.
Results
Health insurance was not utilized for 12.6% (inpatient), 53.3% (outpatient), and 72.6% (self-treatment) of disease episodes. Subjects’ characteristics were associated with insurance utilization. Inpatient and outpatient treatments were expensive. In the multivariate analysis of outpatient treatment expenditure, insurance utilization was significantly associated with higher treatment cost, lost income, and gross total cost.
Conclusion
Utilization of health insurance may need to be improved. Insurance utilization can reduce out-of-pocket medical expenditure. However, the amount paid by the insured is still high. Policy intervention is needed to further improve the effectiveness of health insurance.
doi:10.1371/journal.pone.0080978
PMCID: PMC3855696  PMID: 24324654
19.  Hierarchical Shrinkage Priors and Model Fitting for High-dimensional Generalized Linear Models 
Statistical applications in genetics and molecular biology  2012;11(6):10.1515/1544-6115.1803.
Genetic and other scientific studies routinely generate very many predictor variables, which can be naturally grouped, with predictors in the same groups being highly correlated. It is desirable to incorporate the hierarchical structure of the predictor variables into generalized linear models for simultaneous variable selection and coefficient estimation. We propose two prior distributions: hierarchical Cauchy and double-exponential distributions, on coefficients in generalized linear models. The hierarchical priors include both variable-specific and group-specific tuning parameters, thereby not only adopting different shrinkage for different coefficients and different groups but also providing a way to pool the information within groups. We fit generalized linear models with the proposed hierarchical priors by incorporating flexible expectation-maximization (EM) algorithms into the standard iteratively weighted least squares as implemented in the general statistical package R. The methods are illustrated with data from an experiment to identify genetic polymorphisms for survival of mice following infection with Listeria monocytogenes. The performance of the proposed procedures is further assessed via simulation studies. The methods are implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
doi:10.1515/1544-6115.1803
PMCID: PMC3658361  PMID: 23192052
Adaptive Lasso; Bayesian inference; Generalized linear model; Genetic polymorphisms; Grouped variables; Hierarchical model; High-dimensional data; Shrinkage prior
20.  A Selective Review of Group Selection in High-Dimensional Models 
Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study.
doi:10.1214/12-STS392
PMCID: PMC3810358  PMID: 24174707
Bi-level selection; group LASSO; concave group selection; penalized regression; sparsity; oracle property
21.  Nonparametric ROC Based Evaluation for Survival Outcomes 
Statistics in medicine  2012;31(23):2660-2675.
SUMMARY
For censored survival outcomes, it can be of great interest to evaluate the predictive power of individual markers or their functions. Compared with alternative evaluation approaches, the time-dependent ROC (receiver operating characteristics) based approaches rely on much weaker assumptions, can be more robust, and hence are preferred. In this article, we examine evaluation of markers’ predictive power using the time-dependent ROC curve and a concordance measure which can be viewed as a weighted area under the time-dependent AUC (area under the ROC curve) profile. This study significantly advances from existing time-dependent ROC studies by developing nonparametric estimators of the summary indexes and, more importantly, rigorously establishing their asymptotic properties. It reinforces the statistical foundation of the time-dependent ROC based evaluation approaches for censored survival outcomes. Numerical studies, including simulations and application to an HIV clinical trial, demonstrate the satisfactory finite-sample performance of the proposed approaches.
doi:10.1002/sim.5386
PMCID: PMC3743052  PMID: 22987578
time-dependent ROC; concordance measure; inverse-probability-of-censoring weighting; marker evaluation; survival outcomes
22.  Identification of Breast Cancer Prognosis Markers via Integrative Analysis 
Summary
In breast cancer research, it is of great interest to identify genomic markers associated with prognosis. Multiple gene profiling studies have been conducted for such a purpose. Genomic markers identified from the analysis of single datasets often do not have satisfactory reproducibility. Among the multiple possible reasons, the most important one is the small sample sizes of individual studies. A cost-effective solution is to pool data from multiple comparable studies and conduct integrative analysis. In this study, we collect four breast cancer prognosis studies with gene expression measurements. We describe the relationship between prognosis and gene expressions using the accelerated failure time (AFT) models. We adopt a 2-norm group bridge penalization approach for marker identification. This integrative analysis approach can effectively identify markers with consistent effects across multiple datasets and naturally accommodate the heterogeneity among studies. Statistical and simulation studies demonstrate satisfactory performance of this approach. Breast cancer prognosis markers identified using this approach have sound biological implications and satisfactory prediction performance.
doi:10.1016/j.csda.2012.02.017
PMCID: PMC3389801  PMID: 22773869
Breast cancer prognosis; Gene expression; Marker identification; Integrative analysis; 2-norm group bridge
23.  Adjusting confounders in ranking biomarkers: a model-based ROC approach 
Briefings in Bioinformatics  2012;13(5):513-523.
High-throughput studies have been extensively conducted in the research of complex human diseases. As a representative example, consider gene-expression studies where thousands of genes are profiled at the same time. An important objective of such studies is to rank the diagnostic accuracy of biomarkers (e.g. gene expressions) for predicting outcome variables while properly adjusting for confounding effects from low-dimensional clinical risk factors and environmental exposures. Existing approaches are often fully based on parametric or semi-parametric models and target evaluating estimation significance as opposed to diagnostic accuracy. Receiver operating characteristic (ROC) approaches can be employed to tackle this problem. However, existing ROC ranking methods focus on biomarkers only and ignore effects of confounders. In this article, we propose a model-based approach which ranks the diagnostic accuracy of biomarkers using ROC measures with a proper adjustment of confounding effects. To this end, three different methods for constructing the underlying regression models are investigated. Simulation study shows that the proposed methods can accurately identify biomarkers with additional diagnostic power beyond confounders. Analysis of two cancer gene-expression studies demonstrates that adjusting for confounders can lead to substantially different rankings of genes.
doi:10.1093/bib/bbs008
PMCID: PMC3431720  PMID: 22396461
ranking biomarkers; ROC; confounders; high-throughput data
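For orientation, the sketch below shows the unadjusted baseline that the abstract contrasts against: ranking markers by their empirical AUC while ignoring confounders. It is not the article's model-based method. The function names `empirical_auc` and `rank_markers` and the input layout (a dict of marker values plus a 0/1 outcome list) are assumptions made only for illustration.

```python
def empirical_auc(cases, controls):
    """Empirical AUC: the fraction of case/control pairs in which the case
    has the higher marker value, counting ties as one half."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def rank_markers(markers, outcome):
    """Rank markers by unadjusted empirical AUC, highest first.

    markers : dict mapping a marker name to a list of values (hypothetical layout)
    outcome : list of 0/1 labels aligned with the marker values
    """
    scores = {}
    for name, values in markers.items():
        cases = [v for v, d in zip(values, outcome) if d == 1]
        controls = [v for v, d in zip(values, outcome) if d == 0]
        scores[name] = empirical_auc(cases, controls)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    outcome = [1, 1, 0, 0]
    markers = {"geneA": [3.0, 4.0, 1.0, 2.0], "geneB": [1.0, 2.0, 1.0, 2.0]}
    print(rank_markers(markers, outcome))
```

A ranking of this kind reflects only the marginal discriminating power of each marker, so a marker that merely tracks a clinical risk factor can rank highly; that is the limitation the abstract says a confounder-adjusted approach is meant to address.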
24.  VARIABLE SELECTION IN PARTLY LINEAR REGRESSION MODEL WITH DIVERGING DIMENSIONS FOR RIGHT CENSORED DATA 
Statistica Sinica  2012;22(3):1003-1020.
Recent biomedical studies often measure two distinct sets of risk factors: low-dimensional clinical and environmental measurements, and high-dimensional gene expression measurements. For prognosis studies with right censored response variables, we propose a semiparametric regression model whose covariate effects have two parts: a nonparametric part for low-dimensional covariates, and a parametric part for high-dimensional covariates. A penalized variable selection approach is developed. The selection of parametric covariate effects is achieved using an iterated Lasso approach, for which we prove the selection consistency property. The nonparametric component is estimated using a sieve approach. An empirical model selection tool for the nonparametric component is derived based on the Kullback-Leibler geometry. Numerical studies show that the proposed approach has satisfactory performance. Application to a lymphoma study illustrates the proposed method.
PMCID: PMC3744344  PMID: 23956611
Semiparametric regression; variable selection; right censored data; iterated Lasso
25.  Illness, medical expenditure and household consumption: observations from Taiwan 
BMC Public Health  2013;13:743.
Background
Illness conditions lead to medical expenditure. Even with various types of medical insurance, there can still be considerable out-of-pocket costs. Medical expenditure can affect other categories of household consumptions. The goal of this study is to provide an updated empirical description of the distributions of illness conditions and medical expenditure and their associations with other categories of household consumptions.
Methods
A phone-call survey was conducted in June and July of 2012. The study was approved by ethics review committees at Xiamen University and FuJen Catholic University. Data was collected using a Computer-Assisted Telephone Survey System (CATSS). “Household” was the unit for data collection and analysis. Univariate and multivariate analyses were conducted, examining the distributions of illness conditions and the associations of illness and medical expenditure with other household consumptions.
Results
The presence of chronic disease and inpatient treatment was not significantly associated with household characteristics. The level of per capita medical expenditure was significantly associated with household size, income, and household head occupation. The presence of chronic disease was significantly associated with levels of education, insurance and durable goods consumption. After adjusting for confounders, the associations with education and durable goods consumption remained significant. The presence of inpatient treatment was not associated with consumption levels. In the univariate analysis, medical expenditure was significantly associated with all other consumption categories. After adjusting for confounding effects, the associations between medical expenditure and the actual amount of entertainment expenses and percentages of basic consumption, savings, and insurance (as of total consumption) remained significant.
Conclusion
This study provided an updated description of the distributions of illness conditions and medical expenditure in Taiwan. The findings were mostly positive in that illness and medical expenditure were not observed to be significantly associated with other consumption categories. This observation differed from those made in some other Asian countries and could be explained by the higher economic status and universal basic health insurance coverage of Taiwan.
doi:10.1186/1471-2458-13-743
PMCID: PMC3751346  PMID: 23938071
Illness; Medical expenditure; Household consumption; Taiwan
