The assumption of proportional hazards (PH) fundamental to the Cox PH model sometimes may not hold in practice. In this paper, we propose a generalization of the Cox PH model in terms of the cumulative hazard function taking a form similar to the Cox PH model, with the extension that the baseline cumulative hazard function is raised to a power function. Our model allows for interaction between covariates and the baseline hazard and it also includes, for the two-sample problem, the case of two Weibull distributions and two extreme value distributions differing in both scale and shape parameters. The partial likelihood approach cannot be applied here to estimate the model parameters. We use the full likelihood approach via a cubic B-spline approximation for the baseline hazard to estimate the model parameters. A semi-automatic procedure for knot selection based on Akaike’s Information Criterion is developed. We illustrate the applicability of our approach using real-life data.
censored survival data analysis; crossing hazards; Frailty model; maximum likelihood; regression; spline function; Akaike information criterion; Weibull distribution; extreme value distribution
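One way to write the generalization described in the abstract above is sketched here. The symbols $\beta$, $\gamma$, $\sigma_k$ and $\alpha_k$ are assumptions introduced for illustration; the abstract describes the model only in words, so the parameterization in the paper itself may differ.

```latex
% Assumed form: Cox-type multiplier on a baseline cumulative hazard
% that is raised to a covariate-dependent power.
\Lambda(t \mid z) \;=\; \exp(\beta^{\top} z)\,\bigl\{\Lambda_0(t)\bigr\}^{\exp(\gamma^{\top} z)}

% Two-sample Weibull case: \Lambda_k(t) = (t/\sigma_k)^{\alpha_k}, \; k = 0, 1.
% Solving \Lambda_0 for t and substituting into \Lambda_1 gives
\Lambda_1(t) \;=\; \Bigl(\frac{\sigma_0}{\sigma_1}\Bigr)^{\alpha_1}
                 \bigl\{\Lambda_0(t)\bigr\}^{\alpha_1/\alpha_0}
```

Under this reading, $\gamma = 0$ recovers the ordinary Cox PH model, and two Weibull samples that differ in both scale and shape correspond to $e^{\beta} = (\sigma_0/\sigma_1)^{\alpha_1}$ and $e^{\gamma} = \alpha_1/\alpha_0$.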
Cox models with time-varying coefficients offer great flexibility in capturing the temporal dynamics of covariate effects on right censored failure times. Since not all covariate coefficients are time-varying, model selection for such models presents an additional challenge, which is to distinguish covariates with time-varying coefficients from those with time-independent coefficients. We propose an adaptive group lasso method that not only selects important variables but also selects between time-independent and time-varying specifications of their presence in the model. Each covariate effect is partitioned into a time-independent part and a time-varying part, the latter of which is characterized by a group of coefficients of basis splines without intercept. Model selection and estimation are carried out through a fast, iterative group shooting algorithm. Our approach is shown to have good properties in a simulation study that mimics realistic situations with up to 20 variables. A real example illustrates the utility of the method.
B-spline; Group lasso; Varying-coefficient
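The selection between time-independent and time-varying effects rests on penalizing each group of spline coefficients as a block. The abstract does not spell out the updates of its group shooting algorithm, so the following is only a minimal sketch of the standard block soft-thresholding operator that group lasso methods commonly use; the function name and the example numbers are hypothetical.

```python
import numpy as np

def group_soft_threshold(z, lam):
    """Block soft-thresholding: shrink the whole coefficient group toward zero.

    z   : unpenalized update for one group (e.g. the spline coefficients
          of the time-varying part of a single covariate effect)
    lam : group penalty level (assumed tuning parameter)
    """
    z = np.asarray(z, dtype=float)
    norm = np.linalg.norm(z)
    if norm <= lam:
        # The entire group is set to zero, i.e. the covariate effect
        # would be treated as time-independent.
        return np.zeros_like(z)
    # Otherwise the group is kept and shrunk by a common factor.
    return (1.0 - lam / norm) * z

# Hypothetical group with Euclidean norm 5.
group = np.array([3.0, 4.0])
kept = group_soft_threshold(group, 2.5)     # shrunk but nonzero
dropped = group_soft_threshold(group, 5.0)  # entire group removed
```

Because the threshold acts on the Euclidean norm of the group, all spline coefficients of a covariate enter or leave the model together.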
We sought to investigate the effect of serum uric acid (SUA) levels on risk of cancer incidence in men and to flexibly determine the shape of this association by using a novel analytical approach.
A population-based cohort of 78,850 Austrian men who received 264,347 serial SUA measurements was prospectively followed up for a median of 12.4 years. Data were collected between 1985 and 2003. Penalized splines (P-splines) in extended Cox-type additive hazard regression were used to flexibly model the association between SUA, as a time-dependent covariate, and risk of overall and site-specific cancer incidence and to calculate adjusted hazard ratios with their 95% confidence intervals.
During follow-up 5189 incident cancers were observed. Restricted maximum-likelihood optimizing P-spline models revealed a moderately J-shaped effect of SUA on risk of overall cancer incidence, with statistically significantly increased hazard ratios in the upper third of the SUA distribution. Increased SUA (≥8.00 mg/dL) further significantly increased risk for several site-specific malignancies, with P-spline analyses providing detailed insight about the shape of the association with these outcomes.
Our study is the first to demonstrate a dose–response association between SUA and cancer incidence in men, simultaneously reporting on the usefulness of a novel methodological framework in epidemiologic research.
Cancer incidence; Epidemiology; Extended Cox-type additive hazard regression; Men; Penalized splines; Risk factor; Serum uric acid
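The P-spline idea used in the abstract above can be illustrated outside the hazard regression setting. The sketch below is a plain penalized least-squares smoother, not the extended Cox-type model of the study: it combines a B-spline basis on equally spaced knots with a difference penalty on adjacent coefficients. The function names, the number of segments and the fixed penalty `lam` are assumptions (the study chose smoothing by restricted maximum likelihood), and the data are simulated.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """B-spline design matrix on a uniform knot vector (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    n_int = len(knots) - 1
    # Degree 0: indicator of each half-open knot interval.
    B = np.column_stack(
        [(knots[j] <= x) & (x < knots[j + 1]) for j in range(n_int)]
    ).astype(float)
    for d in range(1, degree + 1):
        B_next = np.zeros((x.size, n_int - d))
        for j in range(n_int - d):
            left = (x - knots[j]) / (knots[j + d] - knots[j])
            right = (knots[j + d + 1] - x) / (knots[j + d + 1] - knots[j + 1])
            B_next[:, j] = left * B[:, j] + right * B[:, j + 1]
        B = B_next
    return B

def pspline_fit(x, y, n_segments=20, degree=3, lam=1.0, diff_order=2):
    """Penalized least-squares fit: B-spline basis plus difference penalty."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lo, hi = x.min(), x.max()
    h = (hi - lo) / n_segments
    # Equally spaced knots, extended beyond the data range on both sides.
    knots = lo + h * np.arange(-degree, n_segments + degree + 1)
    B = bspline_basis(x, knots, degree)
    # Difference matrix acting on neighbouring spline coefficients.
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ coef

# Simulated illustration only: a noisy J-shaped curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = (x - 0.3) ** 2 + rng.normal(0.0, 0.02, x.size)
y_smooth = pspline_fit(x, y, lam=10.0)
```

A larger `lam` pulls the fit toward a straight line (for a second-order penalty), while a smaller value lets the curve follow the data more closely.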
For functional neuroimaging studies that involve experimental stimuli measuring dose levels, e.g. of an anesthetic agent, typical statistical techniques include correlation analysis, analysis of variance or polynomial regression models. These standard approaches have limitations: correlation analysis only provides a crude estimate of the linear relationship between dose levels and brain activity; ANOVA is designed to accommodate a few specified dose levels; polynomial regression models have limited capacity to model varying patterns of association between dose levels and measured activity across the brain. These shortcomings prompt the need to develop methods that more effectively capture dose-dependent neural processing responses. We propose a class of mixed effects spline models that analyze the dose-dependent effect using either regression or smoothing splines. Our method offers flexible accommodation of different response patterns across various brain regions, controls for potential confounding factors, and accounts for subject variability in brain function. The estimates from the mixed effects spline model can be readily incorporated into secondary analyses, for instance, targeting spatial classifications of brain regions according to their modeled response profiles. The proposed spline models are also extended to incorporate interaction effects between the dose-dependent response function and other factors. We illustrate our proposed statistical methodology using data from a PET study of the effect of ethanol on brain function. A simulation study is conducted to compare the performance of the proposed mixed effects spline models and a polynomial regression model. Results show that the proposed spline models more accurately capture varying response patterns across voxels, especially at voxels with complex response shapes. 
Finally, the proposed spline models can be used in more general settings as a flexible modeling tool for investigating the effects of any continuous covariates on neural processing responses.
Regression splines; Smoothing splines; Dose-dependent effect; Mixed effects spline models; Continuous covariates
Maternal and fetal characteristics are important determinants of fetal growth potential, and should ideally be taken into consideration when evaluating fetal growth variation. We developed a model for individually customised growth charts for estimated fetal weight, which takes into account physiological maternal and fetal characteristics known at the start of pregnancy. We used fetal ultrasound data of 8,162 pregnant women participating in the Generation R Study, a prospective, population-based cohort study from early pregnancy onwards. A repeated measurements regression model was constructed, using backward selection procedures for identifying relevant maternal and fetal characteristics. The final model for estimating expected fetal weight included gestational age, fetal sex, parity, ethnicity, maternal age, height and weight. Using this model, we developed individually customised growth charts, and their corresponding standard deviations, for fetal weight from 18 weeks onwards. Of the total of 495 fetuses who were classified as small size for gestational age (<10th percentile) when fetal weight was evaluated using the normal population growth chart, 80 (16%) were in the normal range when individually customised growth charts were used. 550 fetuses were classified as small size for gestational age using individually customised growth charts, and 135 of them (25%) were classified as normal if the unadjusted reference chart was used. In conclusion, this is the first study using ultrasound measurements in a large population-based study to fit a model to construct individually customised growth charts, taking into account physiological maternal and fetal characteristics. These charts might be useful in epidemiological studies and in clinical practice.
Electronic supplementary material
The online version of this article (doi:10.1007/s10654-011-9629-7) contains supplementary material, which is available to authorized users.
Customised fetal growth curves; Ultrasound; Fetal weight; Biometry; Ethnicity; Maternal anthropometrics
One of the major issues in expression profiling analysis is still to define proper thresholds for determining differential expression while avoiding false positives. The problem is that the variance is inversely proportional to the log of signal intensities. Aiming to solve this issue, we describe a model, expression variation (EV), based on the LMS method, which allows data normalization and the construction of confidence bands of gene expression by fitting cubic spline curves to the Box–Cox transformation. The confidence bands, fitted to the actual variance of the data, include the genes devoid of significant variation and allow EVs to be calculated from the confidence bandwidth. Each outlier is positioned according to the dispersion space (DS) and a P-value is statistically calculated to determine EV. This model results in variance stabilization. Using two Affymetrix-generated datasets, the sets of differentially expressed genes selected using EV and other classical methods were compared. The analysis suggests that EV is more robust on variance stabilization and on selecting differential expression from both rare and strongly expressed genes.
Fully understanding the determinants and sequelae of fetal growth requires a continuous measure of birth weight adjusted for gestational age. Published United States reference data, however, provide estimates only of the median and lowest and highest 5th and 10th percentiles for birth weight at each gestational age. The purpose of our analysis was to create more continuous reference measures of birth weight for gestational age for use in epidemiologic analyses.
We used data from the most recent nationwide United States Natality datasets to generate multiple reference percentiles of birth weight at each completed week of gestation from 22 through 44 weeks. Gestational age was determined from last menstrual period. We analyzed data from 6,690,717 singleton infants with recorded birth weight and sex born to United States resident mothers in 1999 and 2000.
Birth weight rose with greater gestational age, with increasing slopes during the third trimester and a leveling off beyond 40 weeks. Boys had higher birth weights than girls, later born children higher weights than firstborns, and infants born to non-Hispanic white mothers higher birth weights than those born to non-Hispanic black mothers. These results correspond well with previously published estimates reporting limited percentiles.
Our method provides comprehensive reference values of birth weight at 22 through 44 completed weeks of gestation, derived from broadly based nationwide data. Other approaches require assumptions of normality or of a functional relationship between gestational age and birth weight, which may not be appropriate. These data should prove useful for researchers investigating the predictors and outcomes of altered fetal growth.
MeSH Headings: Birth weight; fetal weight; gestational age; premature birth; ultrasonography
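The core computation described above, empirical percentiles of birth weight within each completed week of gestation, can be sketched as follows. The abstract does not state how the percentiles were computed in detail, so the function name, the chosen percentile levels and the tiny example arrays are all hypothetical and serve only to show the shape of the calculation.

```python
import numpy as np

def reference_percentiles(weeks, weights, probs=(5, 10, 25, 50, 75, 90, 95)):
    """Empirical birth-weight percentiles for each completed gestational week.

    No distributional form is assumed: percentiles are taken directly
    from the observed weights within each week.
    """
    weeks = np.asarray(weeks)
    weights = np.asarray(weights, dtype=float)
    table = {}
    for wk in np.unique(weeks):
        table[int(wk)] = np.percentile(weights[weeks == wk], probs)
    return table

# Hypothetical records (completed weeks, birth weight in grams).
weeks = [39, 39, 39, 40, 40, 40, 40, 40]
grams = [3000, 3250, 3500, 3000, 3100, 3200, 3300, 3400]
table = reference_percentiles(weeks, grams, probs=(10, 50, 90))
```

With millions of births per stratum, as in the study, such empirical percentiles need no normality assumption and no assumed functional relationship between gestational age and weight.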
In this work, we propose penalized spline based methods for functional mixed effects models with varying coefficients. We decompose longitudinal outcomes as a sum of several terms: a population mean function, covariates with time-varying coefficients, functional subject-specific random effects and residual measurement error processes. Using penalized splines, we propose nonparametric estimation of the population mean function, the varying coefficients, the random subject-specific curves, the associated covariance function, which represents between-subject variation, and the variance function of the residual measurement errors, which represents within-subject variation. The proposed methods offer flexible estimation of both the population-level and subject-level curves. In addition, decomposing the variability of the outcomes into a between-subject and a within-subject source is useful in identifying the dominant variance component and therefore in optimally modeling the covariance function. We use a likelihood based method to select multiple smoothing parameters. Furthermore, we study the asymptotics of the baseline P-spline estimator with longitudinal data. We conduct simulation studies to investigate performance of the proposed methods. The benefit of the between- and within-subject covariance decomposition is illustrated through an analysis of Berkeley growth data where we identified clearly distinct patterns of the between- and within-subject covariance functions of children's heights. We also apply the proposed methods to estimate the effect of anti-hypertensive treatment from the Framingham Heart Study data.
Multi-level functional data; Functional random effects; Semiparametric longitudinal data analysis
Sex differences in fetal growth have been reported, but how this happens remains to be described. It is unknown if fetal growth rates, a reflection of genetic and environmental factors, express sexually dimorphic sensitivity to the mother herself.
This analysis investigated homogeneity of male and female growth responses to maternal height and weight. The study sample included 3495 uncomplicated singleton pregnancies followed longitudinally. Analytic models regressed fetal and neonatal weight on tertiles of maternal height and weight, and modification by sex was investigated (n=1814 males, n=1681 females) with birth gestational age, maternal parity and smoking as covariates.
Sex modified the effects of maternal height and weight on fetal growth rates and birth weight. Among boys, tallest maternal height influenced fetal weight growth prior to 18 gestational weeks of age (p=0.006), pre-pregnancy maternal weight and BMI subsequently had influence (p<0.001); this was not found among girls. Additionally, interaction terms between sex, maternal height, and maternal weight identified that males were more sensitive to maternal weight among shorter mothers (p=0.003), and more responsive to maternal height among lighter mothers (p<=0.03), compared to females. Likewise, neonatal birth weight dimorphism varied by maternal phenotype. A male advantage of 60 grams occurred among neonates of the shortest and lightest mothers (p=0.08), compared to 150 and 191 grams among short and heavy mothers, and tall and light weight mothers, respectively (p=0.01). Sex differences in response to maternal size are underappreciated sources of variation in fetal growth studies and may reflect differential growth strategies.
Maternal anthropometry; fetal growth rate; birth weight; sexual dimorphism; pregnancy
We previously developed a flexible specification of the UNAIDS Estimation and Projection Package (EPP) that relied on splines to generate time-varying values for the force of infection parameter. Here, we test the feasibility of this approach for concentrated HIV/AIDS epidemics with very sparse data and compare two methods for making short-term future projections with the spline-based model.
Penalised B-splines are used to model the average infection risk over time within the EPP 2011 modelling framework, which includes antiretroviral treatment effects and CD4 cell count progression, and is fit to sentinel surveillance prevalence data with a Bayesian algorithm. We compare two approaches for future projections: (1) an informative prior related to equilibrium prevalence and (2) a random walk formulation.
The spline-based model produced plausible fits across a range of epidemics, which included 87 subpopulations from 14 countries with concentrated epidemics and 75 subpopulations from 33 countries with generalised epidemics. The equilibrium prior and random walk approaches to future projections yielded similar prevalence estimates, and both performed well in tests of out-of-sample predictive validity for prevalence. In contrast, in some cases the two approaches varied substantially in estimates of incidence, with the random walk formulation avoiding extreme changes in incidence.
A spline-based approach to allowing the force of infection parameter to vary over time within EPP 2011 is robust across a diverse array of epidemics, including concentrated ones with limited surveillance data. Future work on the EPP model should consider the impact that different modelling approaches have on estimates of HIV incidence.
HIV; Surveillance; Mathematical Model
The semiparametric partially linear model allows flexible modeling of covariate effects on the response variable in regression. It combines the flexibility of nonparametric regression and parsimony of linear regression. The most important assumption in the existing methods for estimation in this model is that it is known a priori which covariates have a linear effect and which do not. However, in applied work, this is rarely known in advance. We consider the problem of estimation in the partially linear models without assuming a priori which covariates have linear effects. We propose a semiparametric regression pursuit method for identifying the covariates with a linear effect. Our proposed method is a penalized regression approach using a group minimax concave penalty. Under suitable conditions we show that the proposed approach is model-pursuit consistent, meaning that it can correctly determine which covariates have a linear effect and which do not with high probability. The performance of the proposed method is evaluated using simulation studies, which support our theoretical results. A real data example is used to illustrate the application of the proposed method.
Group selection; Minimax concave penalty; Model-pursuit consistency; Penalized regression; Semiparametric models
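For reference, the minimax concave penalty named in the abstract above has the following standard form. The way it is applied here to a group of coefficients (through the group's Euclidean norm) is a sketch of the usual group construction; the symbols $\lambda$, $\gamma$ and $\theta_j$ are notation assumed for illustration, and the paper may use weighted norms or a different parameterization.

```latex
% Minimax concave penalty (MCP) with regularization \lambda > 0
% and concavity parameter \gamma > 1:
\rho(t;\lambda,\gamma) \;=\; \lambda \int_0^{|t|}
      \Bigl(1 - \frac{x}{\gamma\lambda}\Bigr)_{+}\,dx

% Assumed group version: \theta_j collects the coefficients of the
% nonlinear (spline) part of covariate j, penalized through its norm.
\sum_{j} \rho\bigl(\lVert \theta_j \rVert_2;\lambda,\gamma\bigr)
```

When an entire group $\theta_j$ is shrunk to zero, only the linear part of covariate $j$ remains, which is how a penalty of this kind can separate linear from nonlinear effects.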
We discuss a flexible method for modeling survival data using penalized smoothing splines when the values of covariates change for the duration of the study. The Cox proportional hazards model has been widely used for the analysis of treatment and prognostic effects with censored survival data. However, a number of theoretical problems with respect to the baseline survival function remain unsolved. We use the generalized additive models (GAMs) with B splines to estimate the survival function and select the optimum smoothing parameters based on a variant multifold cross-validation (CV) method. The methods are compared with the generalized cross-validation (GCV) method using data from a long-term study of patients with primary biliary cirrhosis (PBC).
Nonparametric regression models are proposed in the framework of ecological inference for exploratory modeling of disease prevalence rates adjusted for variables, such as age, ethnicity/race, and socio-economic status. Ecological inference is needed when a response variable and covariate are not available at the subject level because only summary statistics are available for the reporting unit, for example, in the form of R × C tables. In this article, only the marginal counts are assumed available in the sample of R × C contingency tables for modeling the joint distribution of counts. A general form for the ecological regression model is proposed, whereby certain covariates are included as a varying coefficient regression model, whereas others are included as a functional linear model. The nonparametric regression curves are modeled as splines fit by penalized weighted least squares. A data-driven selection of the smoothing parameter is proposed using the pointwise maximum squared bias computed from averaging kernels (explained by O’Sullivan, 1986, Statistical Science 1, 502–517). Analytic expressions for bias and variance are provided that could be used to study the rates of convergence of the estimators. Instead, this article focuses on demonstrating the utility of the estimators in a study of disparity in health outcomes by ethnicity/race.
Ecological inference; Incomplete R × C tables; P-splines; Randomized response
We propose a semiparametric Bayesian method for handling measurement error in nutritional epidemiological data. Our goal is to estimate nonparametrically the form of association between a disease and exposure variable while the true values of the exposure are never observed. Motivated by nutritional epidemiological data we consider the setting where a surrogate covariate is recorded in the primary data, and a calibration data set contains information on the surrogate variable and repeated measurements of an unbiased instrumental variable of the true exposure. We develop a flexible Bayesian method where not only is the relationship between the disease and exposure variable treated semiparametrically, but also the relationship between the surrogate and the true exposure is modeled semiparametrically. The two nonparametric functions are modeled simultaneously via B-splines. In addition, we model the distribution of the exposure variable as a Dirichlet process mixture of normal distributions, thus making its modeling essentially nonparametric and placing this work into the context of functional measurement error modeling. We apply our method to the NIH-AARP Diet and Health Study and examine its performance in a simulation study.
B-splines; Dirichlet process prior; Gibbs sampling; Measurement error; Metropolis-Hastings algorithm; Partly linear model
We propose and study a unified procedure for variable selection in partially linear models. A new type of double-penalized least squares is formulated, using the smoothing spline to estimate the nonparametric part and applying a shrinkage penalty on parametric components to achieve model parsimony. Theoretically we show that, with proper choices of the smoothing and regularization parameters, the proposed procedure can be as efficient as the oracle estimator (Fan and Li, 2001). We also study the asymptotic properties of the estimator when the number of parametric effects diverges with the sample size. Frequentist and Bayesian estimates of the covariance and confidence intervals are derived for the estimators. One great advantage of this procedure is its linear mixed model (LMM) representation, which greatly facilitates its implementation by using standard statistical software. Furthermore, the LMM framework enables one to treat the smoothing parameter as a variance component and hence conveniently estimate it together with other regression coefficients. Extensive numerical studies are conducted to demonstrate the effective performance of the proposed procedure.
Key words and phrases: Semiparametric regression; Smoothing splines; Smoothly clipped absolute deviation; Variable selection
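The double-penalized least squares criterion described above can be written schematically as follows. The notation ($y_i$, $x_i$, $t_i$, $\lambda_1$, $\lambda_2$, $p_{\lambda_2}$) is assumed for illustration, since the abstract gives the criterion only in words; the keywords suggest that the shrinkage penalty is the smoothly clipped absolute deviation (SCAD) penalty.

```latex
% Roughness penalty on the nonparametric part f,
% shrinkage penalty on the parametric coefficients \beta_j.
\frac{1}{n}\sum_{i=1}^{n}\bigl\{y_i - x_i^{\top}\beta - f(t_i)\bigr\}^{2}
  \;+\; \lambda_1 \int \bigl\{f''(t)\bigr\}^{2}\,dt
  \;+\; \sum_{j=1}^{d} p_{\lambda_2}\bigl(|\beta_j|\bigr)
```

The first penalty controls the smoothness of $f$ and the second sets small parametric effects exactly to zero, which is what allows estimation and variable selection in one step.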
Based on combined data for 4880 patients, 2 previous studies reported that advanced age is a predictor of increased renal cell carcinoma–specific mortality (RCC-SM). We explored the effect of age in cubic spline analyses to identify the age groups with the most elevated risk for renal cell carcinoma (RCC).
Our study included 3595 patients from 14 European centres who had partial or radical nephrectomies. We used the Kaplan–Meier method to compile life tables, and we performed Cox regression analyses to assess RCC-SM. Covariates included age at diagnosis, sex, TNM (tumour, node, metastasis) stage, tumour size, Fuhrman grade, symptom classification and histological subtype.
Age ranged from 10 to 89 (mean 63, median 67) years. The median duration of follow-up was 2.9 years. The median survival for the cohort was 13.4 years. Stage distribution was as follows: 1915 patients (53.3%) had stage I disease, 388 (10.8%) had stage II, 895 (24.9%) had stage III and 397 (11.0%) had stage IV disease. In multivariate analyses, we coded age at diagnosis as a cubic spline, and it achieved independent predictor status (p < 0.001). The risk of RCC-SM was lowest among patients younger than 50 years. We observed an increase in RCC-SM until the age of 50, at which point the level of risk reached a plateau. We observed a second increase among patients aged 75–89 years. We found similar patterns when we stratified patients according to the 2002 American Joint Committee on Cancer (AJCC) stages.
The effect of age shows prognostic significance and indicates that follow-up and possibly secondary treatments might need to be adjusted according to the age of the patient.
Mathematical models for revealing the dynamics and interaction properties of biological systems play an important role in computational systems biology. The inference of model parameter values from time-course data can be considered as a "reverse engineering" process and is still one of the most challenging tasks. Many parameter estimation methods have been developed, but none of them is effective in all cases or outperforms all other approaches. Instead, each method has its advantages and disadvantages. It is worthwhile to develop parameter estimation methods that are robust against noise, efficient in computation and flexible enough to meet different constraints.
Two parameter estimation methods that combine spline theory with Linear Programming (LP) and Nonlinear Programming (NLP) are developed. These methods remove the need for ODE solvers during the identification process. Our analysis shows that the augmented cost function surfaces used in the two proposed methods are smoother, which can ease the search for optima and hence enhance the robustness and speed of the search algorithm. Moreover, the cores of our algorithms are LP and NLP based, which makes them flexible, so that additional constraints can be embedded or removed easily. Eight systems biology models are used for testing the proposed approaches. Our results confirm that the proposed methods are both efficient and robust.
The proposed approaches have general application to identify unknown parameter values of a wide range of systems biology models.
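The idea of avoiding an ODE solver can be sketched on a toy problem: fit a spline to the time-course data, differentiate the spline analytically, and estimate the parameters by matching the spline derivative to the right-hand side of the ODE. The sketch below uses ordinary least squares rather than the LP and NLP formulations of the paper, a truncated-power cubic spline, and a one-parameter decay model dx/dt = -k x; all of these choices, and every function name, are assumptions made for illustration.

```python
import numpy as np

def spline_design(t, knots):
    """Truncated-power cubic spline basis evaluated at t."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def spline_derivative_design(t, knots):
    """Analytic time derivative of each basis column in spline_design."""
    t = np.asarray(t, dtype=float)
    cols = [np.zeros_like(t), np.ones_like(t), 2.0 * t, 3.0 * t**2]
    cols += [3.0 * np.clip(t - k, 0.0, None) ** 2 for k in knots]
    return np.column_stack(cols)

def estimate_decay_rate(t, x, knots):
    """Estimate k in dx/dt = -k x without integrating the ODE."""
    # Step 1: smooth the observed trajectory with a regression spline.
    coef, *_ = np.linalg.lstsq(spline_design(t, knots), x, rcond=None)
    x_hat = spline_design(t, knots) @ coef
    dx_hat = spline_derivative_design(t, knots) @ coef
    # Step 2: least-squares match of the spline derivative to -k * x_hat.
    return -(x_hat @ dx_hat) / (x_hat @ x_hat)

# Noise-free toy trajectory with true rate k = 0.5.
t_obs = np.linspace(0.0, 5.0, 60)
x_obs = np.exp(-0.5 * t_obs)
interior_knots = np.linspace(0.0, 5.0, 8)[1:-1]
k_hat = estimate_decay_rate(t_obs, x_obs, interior_knots)
```

Because the model is linear in k, the second step has a closed form; for models that are nonlinear in their parameters the same derivative-matching residual would be handed to an optimizer, which is where LP or NLP formulations and additional constraints would enter.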
Upper cervical cord injury was produced in fetal rabbits at 22-26 days' gestation. In 11 fetuses with severe cord injury delivered at 28-29 days' gestation there was a median reduction in lung weight (expressed as a proportion of body weight) of 43% and a median reduction in estimated total lung DNA of 16% in comparison with paired operated littermates with intact cords. The hypoplastic lungs showed collapse on histology; if cord damage had been inflicted before 24 days' gestation there was retarded maturation. We conclude that the central nervous system plays a vital role in fetal lung growth and maturation, probably by maintenance of fetal respiratory movements.
The accurate characterization of spike firing rates including the determination of when changes in activity occur is a fundamental issue in the analysis of neurophysiological data. Here we describe a state-space model for estimating the spike rate function that provides a maximum likelihood estimate of the spike rate, model goodness-of-fit assessments, as well as confidence intervals for the spike rate function and any other associated quantities of interest. Using simulated spike data, we first compare the performance of the state-space approach with that of Bayesian adaptive regression splines (BARS) and a simple cubic spline smoothing algorithm. We show that the state-space model is computationally efficient and comparable with other spline approaches. Our results suggest both a theoretically sound and practical approach for estimating spike rate functions that is applicable to a wide range of neurophysiological data.
Regression on the basis function of B-splines has been advocated as an alternative to orthogonal polynomials in random regression analyses. Basic theory of splines in mixed model analyses is reviewed, and estimates from analyses of weights of Australian Angus cattle from birth to 820 days of age are presented. Data comprised 84 533 records on 20 731 animals in 43 herds, with a high proportion of animals with 4 or more weights recorded. Changes in weights with age were modelled through B-splines of age at recording. A total of thirteen analyses, considering different combinations of linear, quadratic and cubic B-splines and up to six knots, were carried out. Results showed good agreement for all ages with many records, but fluctuated where data were sparse. On the whole, analyses using B-splines appeared more robust against "end-of-range" problems and yielded more consistent and accurate estimates of the first eigenfunctions than previous, polynomial analyses. A model fitting quadratic B-splines, with knots at 0, 200, 400, 600 and 821 days and a total of 91 covariance components, appeared to be a good compromise between detailedness of the model, number of parameters to be estimated, plausibility of results, and fit, measured as residual mean square error.
covariance function; growth; beef cattle; random regression; B-splines
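The covariates used in such a random regression are the B-spline basis functions evaluated at each animal's age at recording. The sketch below builds a quadratic basis for the knots named in the abstract (0, 200, 400, 600 and 821 days) with the Cox-de Boor recursion. Repeating the boundary knots is an assumption about how the basis was closed at the ends, the function name is hypothetical, and the ages are made-up values.

```python
import numpy as np

def clamped_bspline_basis(x, knots, degree):
    """B-spline basis with repeated boundary knots (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Repeat the end knots so the basis sums to one on the whole age range.
    t = np.concatenate(
        [np.repeat(knots[0], degree), knots, np.repeat(knots[-1], degree)]
    )
    m = t.size
    # Degree 0: indicator functions of the half-open knot intervals.
    B = np.zeros((x.size, m - 1))
    for j in range(m - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Assign the right end point to the last non-empty interval.
    B[x == t[-1], m - degree - 2] = 1.0
    for d in range(1, degree + 1):
        B_next = np.zeros((x.size, m - 1 - d))
        for j in range(m - 1 - d):
            left_den = t[j + d] - t[j]
            right_den = t[j + d + 1] - t[j + 1]
            # Terms with a zero denominator (repeated knots) are dropped.
            if left_den > 0:
                B_next[:, j] += (x - t[j]) / left_den * B[:, j]
            if right_den > 0:
                B_next[:, j] += (t[j + d + 1] - x) / right_den * B[:, j + 1]
        B = B_next
    return B

# Made-up ages at recording (days) and the knots quoted in the abstract.
ages = np.array([0.0, 100.0, 410.5, 821.0])
basis = clamped_bspline_basis(ages, [0, 200, 400, 600, 821], degree=2)
```

With five knots and quadratic segments this construction gives six basis functions, so each random effect modelled this way would carry six regression coefficients per animal.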
In this paper we develop a new framework for path planning of flexible needles with bevel tips. Based on a stochastic model of needle steering, the probability density function for the needle tip pose is approximated as a Gaussian. The means and covariances are estimated using an error propagation algorithm which has second order accuracy. Then we adapt the path-of-probability (POP) algorithm to path planning of flexible needles with bevel tips. We demonstrate how our planning algorithm can be used for feedback control of flexible needles. We also derive a closed-form solution for the port placement problem for finding good insertion locations for flexible needles in the case when there are no obstacles. Furthermore, we propose a new method using reference splines with the POP algorithm to solve the path planning problem for flexible needles in more general cases that include obstacles.
flexible needles; path planning; stochastic model; path-of-probability algorithm; error propagation; port placement; feedback control
Exposure to air pollutants is suggested to adversely affect fetal growth, but the evidence remains inconsistent in relation to specific outcomes and exposure windows.
Using birth records from the two major maternity hospitals in Newcastle upon Tyne in northern England between 1961 and 1992, we constructed a database of all births to mothers resident within the city. Weekly black smoke exposure levels from routine data recorded at 20 air pollution monitoring stations were obtained and individual exposures were estimated via a two-stage modeling strategy, incorporating temporally and spatially varying covariates. Regression analyses, including 88,679 births, assessed potential associations between exposure to black smoke and birth weight, gestational age and birth weight standardized for gestational age and sex.
Significant associations were seen between black smoke and both standardized and unstandardized birth weight, but not for gestational age when adjusted for potential confounders. Not all associations were linear. For an increase in whole pregnancy black smoke exposure, from the 1st (7.4 μg/m3) to the 25th (17.2 μg/m3), 50th (33.8 μg/m3), 75th (108.3 μg/m3), and 90th (180.8 μg/m3) percentiles, the adjusted estimated decreases in birth weight were 33 g (SE 1.05), 62 g (1.63), 98 g (2.26) and 109 g (2.44) respectively. A significant interaction was observed between socio-economic deprivation and black smoke on both standardized and unstandardized birth weight with increasing effects of black smoke in reducing birth weight seen with increasing socio-economic disadvantage.
The findings of this study progress the hypothesis that the association between black smoke and birth weight may be mediated through intrauterine growth restriction. The associations between black smoke and birth weight were of the same order of magnitude as those reported for passive smoking. These findings add to the growing evidence of the harmful effects of air pollution on birth outcomes.
Black smoke; Particulate matter; Air pollution; Birth weight; Gestational age
The analysis of genetic and environmental contributions to preterm birth is not straightforward in family studies, as etiology could involve both maternal and fetal genes. Markov Chain Monte Carlo (MCMC) methods are presented as a flexible approach for defining user-specified covariance structures to handle multiple random effects and hierarchical dependencies inherent in children of twin (COT) studies of pregnancy outcomes. The proposed method is easily modified to allow for the study of gestational age as a continuous trait and as a binary outcome reflecting the presence or absence of preterm birth. Estimation of fetal and maternal genetic factors and the effect of the environment are demonstrated using MCMC methods implemented in WinBUGS and maximum likelihood methods in a Virginia COT sample comprising 7,061 births. In summary, although the contribution of maternal and fetal genetic factors was supported using both outcomes, additional births and/or extended relationships are required to precisely estimate both genetic effects simultaneously. We anticipate the flexibility of MCMC methods to handle increasingly complex models to be of particular relevance for the study of birth outcomes.
preterm birth; fetal; maternal; genetic; environment; MCMC; ML
We consider frailty models with additive semiparametric covariate effects for clustered failure time data. We propose a doubly penalized partial likelihood (DPPL) procedure to estimate the nonparametric functions using smoothing splines. We show that the DPPL estimators can be obtained by fitting an augmented working frailty model with parametric covariate effects, with the nonparametric functions estimated as linear combinations of fixed and random effects and the smoothing parameters estimated as extra variance components. This approach allows us to conveniently estimate all model components within a unified frailty model framework. We evaluate the finite sample performance of the proposed method via a simulation study, and apply the method to analyze data from a study of sexually transmitted infections.
Doubly penalized partial likelihood; smoothing spline; Gaussian frailty; sexually transmitted disease; Smoothing parameter; Variance components
For dairy producers, a reliable description of lactation curves is a valuable tool for management and selection. From a breeding and production viewpoint, milk yield persistency and total milk yield are important traits. Understanding the genetic drivers for the phenotypic variation of both these traits could provide a means for improving these traits in commercial production.
It has been shown that Natural Cubic Smoothing Splines (NCSS) can model the features of lactation curves with greater flexibility than the traditional parametric methods. NCSS were used to model the sire effect on the lactation curves of cows. The sire solutions for persistency and total milk yield were derived using NCSS and a whole-genome approach based on a hierarchical model was developed for a large association study using single nucleotide polymorphisms (SNP).
Estimated sire breeding values (EBV) for persistency and milk yield were calculated using NCSS. Persistency EBV were correlated with peak yield but not with total milk yield. Several SNP were found to be associated with both traits and these were used to identify candidate genes for further investigation.
NCSS can be used to estimate EBV for lactation persistency and total milk yield, which in turn can be used in whole-genome association studies.