The first FPC score is shown here to be an effective and efficient summary of sparsely sampled longitudinal measurements of CD4 counts and viral load. A clear benefit is that it exploits information from the population and hence can be stably calculated for individuals in a cohort having any number of measurements taken at any time during the study period. Using FPC analysis, we are able to demonstrate a significant relationship between CD4 profiles and survival, whereas previous analyses of these data failed to find a significant association between a summary measure of CD4 counts and survival [7].
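The way an individual score borrows information from the population can be made explicit. The following is only a sketch in standard sparse-FPCA notation; the symbols are introduced here for illustration, and the conditional-expectation form assumes the scores are estimated as in the usual sparse-data approach under Gaussian assumptions, which may differ in detail from the estimator used in this analysis.

```latex
% Truncated Karhunen-Loeve representation of subject i's trajectory,
% observed with error at that subject's own times t_{i1}, ..., t_{i n_i}
X_i(t) \approx \mu(t) + \sum_{k=1}^{K} \xi_{ik}\,\phi_k(t),
\qquad
Y_{ij} = X_i(t_{ij}) + \varepsilon_{ij}, \quad j = 1,\dots,n_i .

% Conditional-expectation estimate of the k-th score
\hat{\xi}_{ik}
  = \hat{\lambda}_k\,\hat{\boldsymbol{\phi}}_{ik}^{\top}\,
    \hat{\Sigma}_{Y_i}^{-1}\bigl(\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i\bigr),
\qquad
\hat{\Sigma}_{Y_i}
  = \bigl[\hat{G}(t_{ij}, t_{il})\bigr]_{j,l} + \hat{\sigma}^{2} I_{n_i}.
```

Here the mean curve, the covariance surface G, the eigenvalues and eigenfunctions, and the error variance are all estimated from the pooled cohort, while the vectors with subscript i are those quantities evaluated at subject i's own observation times. This is why a score can be computed even for a subject with very few measurements.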

This approach is successful for a variety of reasons. Longitudinal measurements of CD4 counts and viral load are trajectories with time-dependent structure. However, the classical summaries such as the estimated viral set point or early CD4 count (and others such as CD4 nadir) are single-value quantities that ignore a majority of the data from each subject. FPCA, in contrast, is well suited to summarizing an entire trajectory and can do so with a single quantity: the first FPC score. As seen in the analysis of this cohort, the FPC score summarizes both the overall measurement level and the measurement changes over time. Furthermore, any principal components analysis is designed to capture the most variation among subjects so that any variation in viral load or CD4 count trajectories associated with survival will be captured by the FPC scores. For this reason, the FPC scores can be used in a preliminary analysis to determine whether there is *any* feature of the longitudinal trajectories associated with survival. Once such an association is established, subsequent analysis and evaluation of the characteristics of the functional principal components is needed to identify which clinical features are associated with survival.

The classical measures, viral set point or early CD4 count, are only available for a fraction of the cohort (in this case, individuals for whom measurements were taken during early HIV infection), and so an analysis based on them must ignore a substantial portion of the cohort. Indeed, in the viral load cohort, only 168 women had data available for estimation of viral set point, whereas FPC scores were estimated for all 216 women. In the CD4 cohort, only 83 out of 132 women had data available for estimation of early CD4 count. Thus, the FPC analysis has the potential to increase sample size and improve power.

An attractive feature of this analysis is that no a priori assumptions are made about the trajectory shapes. Viral load trajectories are sometimes estimated using linear or piecewise linear models [2, 9, 10] with different rates of change during and after the acute infection phase. While the acute infection stage is typically reported as the first 8–10 weeks of infection [23], Lyles et al. [10] used piecewise linear models and found different rates of change for viral load in an Italian cohort before and after 18 months post-infection. With a linear or piecewise linear model, the duration of the acute infection phase must somehow be estimated, whether via statistical inference, from biological knowledge of the population, or by some other means. FPC analysis obviates the need for this. Different rates of change at different times post-infection arise naturally through the estimated mean curve and principal components. Different individuals may have different trajectory shapes, depending upon their FPC scores. In the dataset analyzed in this work, the estimated mean curve and first principal component shape suggest that some individuals may have continued rates of viral load decay lasting as long as approximately 18 months post-infection, consistent with the work of Lyles et al. [10] (see ).
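To illustrate how a single score can change both the level and the shape of a trajectory, the sketch below reconstructs curves as the mean plus the score times the first component. The functions `mean_curve` and `first_component`, their functional forms, and all numeric values are hypothetical and chosen only for illustration; they are not the curves estimated from this cohort.

```python
import math

def mean_curve(t_months):
    """Hypothetical population mean log10 viral load: decays, then levels off."""
    return 4.0 + 1.5 * math.exp(-t_months / 6.0)

def first_component(t_months):
    """Hypothetical first principal component: positive and slowly decaying."""
    return 0.3 + 0.2 * math.exp(-t_months / 6.0)

def trajectory(score, times):
    """Rank-one reconstruction: mean curve plus score times first component."""
    return [mean_curve(t) + score * first_component(t) for t in times]

times = [0, 3, 6, 12, 18, 24]  # months post-infection (illustrative grid)
for score in (-2.0, 0.0, 2.0):
    curve = trajectory(score, times)
    print(score, [round(v, 2) for v in curve])
```

Under these assumed shapes, a larger score gives a curve that is higher overall and that also declines more between month 0 and month 18, without any change point having been specified in advance.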

There are a number of weaknesses in the use of the two-stage FPC/Cox proportional hazards approach for evaluating the relationship between longitudinal profiles and survival. The FPC score cannot be calculated for an individual in the absence of data from an entire cohort; in that regard it is similar to, for example, LME estimates of slopes and intercepts. In addition, without careful evaluation of the shape of the first FPC and its relationship to the population mean, the first (or higher order) FPC scores lack clinical interpretation and carry no explicit information about how CD4 or viral load trajectories are associated with survival. The hazard ratios estimated in our analysis do not have obvious clinical interpretations; our results indicate primarily that there is an association between both viral load and CD4 count trajectories and survival. In this study we are able to relate the FPC scores to clinically meaningful features of CD4 and viral load trajectories and provide some guidance on the interpretation of the hazard ratios; however, this is not guaranteed in general. For example, if the FPC changes sign or monotonicity over time, separate interpretations might be needed over different time intervals, possibly obscuring any relevant clinical interpretation. Thus FPC scores are not likely to be useful measures for clinical evaluation. Nonetheless, the approach is well suited to research analyses and population studies and can be used to determine whether *any* features of a longitudinal trajectory are associated with an outcome of interest. When this is the case, additional analysis and careful study of the form of the functional principal components is needed to identify which clinically meaningful features of the trajectories are associated with the outcome of interest.

A further weakness, shared by many two-stage approaches to evaluating the relationship between longitudinal profiles and survival, is that the summary measures used in the two-stage analysis (FPC score, estimated viral set point, early CD4 count, or slopes and intercepts from random effects models) are often measured with error. As such, the point estimate of the hazard ratio is known to be biased [24] towards the null, with the amount of bias depending on the variation in the covariate. However, as described above, the magnitude of the hazard ratio associated with the FPC score covariates lacks clinical interpretation; the strength of the approach is that it identifies the trajectories as being strongly associated with survival, leading to additional analysis to determine specific features of these trajectories which impact survival.
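The direction of this bias can be seen in a simpler setting. The simulation below is a sketch using ordinary linear regression rather than a Cox model, so it is only an analogue of the survival case: a covariate observed with independent additive error yields a slope shrunk towards zero. The function names, sample size, and variances are all assumptions made for this illustration.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def simulate(beta=1.0, error_sd=1.0, n=20000, seed=1):
    """Compare the slope from the true covariate with the slope from a noisy copy."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Classical measurement error: observed value = truth + independent noise
    noisy_x = [x + rng.gauss(0.0, error_sd) for x in true_x]
    y = [beta * x + rng.gauss(0.0, 1.0) for x in true_x]
    return ols_slope(true_x, y), ols_slope(noisy_x, y)

clean_slope, noisy_slope = simulate()
print("slope using true covariate: ", round(clean_slope, 3))
print("slope using noisy covariate:", round(noisy_slope, 3))
```

In this linear setting the expected shrinkage factor is var(true) / (var(true) + var(error)), which is 0.5 for the values assumed here. In a proportional hazards model the attenuation is in the same direction, but this simple factor should not be assumed to hold exactly.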

There are many possible applications and areas for refinement. The analysis presented here is a two-stage approach in the sense that FPC scores are obtained first and subsequently used in a survival analysis. Joint modeling of survival data together with FPC is a promising alternative approach [15, 25]. Because of its computational simplicity, our two-stage approach can provide a good initial estimate for the joint modeling approach, which involves an Expectation Maximization (EM) algorithm to estimate parameters and may require selection of the number of FPCs within the iterations. Because it includes nonparametric longitudinal components, this estimating procedure will be more computationally intensive than that of joint models with the popular parametric longitudinal submodels. Nevertheless, the proposed two-stage estimate can be used as an exploratory tool in preliminary analysis and as a starting point for subsequent joint modeling.
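The two-stage structure can be sketched on simulated data as follows. This is a simplified illustration and not the implementation used in this study: stage one uses ordinary principal components on a common, densely observed grid in place of sparse-data FPCA, stage two fits a one-covariate Cox model by Newton-Raphson assuming no tied event times, and the names `first_pc_scores` and `cox_fit` as well as all simulation settings are hypothetical.

```python
import math
import random

def first_pc_scores(curves, iters=200):
    """Stage 1 (simplified): first principal component scores on a common grid."""
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centered = [[c[j] - mean[j] for j in range(m)] for c in curves]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    v = [1.0] * m  # power iteration for the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(m)) for r in centered]

def cox_fit(times, events, x, iters=25):
    """Stage 2: one-covariate Cox partial likelihood maximized by Newton-Raphson."""
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored subjects contribute only through risk sets
            risk = [x[j] for j, t_j in enumerate(times) if t_j >= t_i]
            w = [math.exp(beta * xj) for xj in risk]
            s0 = sum(w)
            s1 = sum(wj * xj for wj, xj in zip(w, risk))
            s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        beta += score / info
    # info is from the final iteration, which is near the maximum after convergence
    return beta, 1.0 / math.sqrt(info)

# Simulated cohort: a latent level drives both the trajectory and the hazard
rng = random.Random(0)
grid = [0, 6, 12, 18, 24]
level = [rng.gauss(0.0, 1.0) for _ in range(200)]
curves = [[4.0 + a * (1.0 + 0.02 * t) + rng.gauss(0.0, 0.2) for t in grid]
          for a in level]
raw_times = [rng.expovariate(0.05 * math.exp(0.7 * a)) for a in level]
events = [t < 60.0 for t in raw_times]       # administrative censoring at 60
times = [min(t, 60.0) for t in raw_times]

scores = first_pc_scores(curves)
beta, se = cox_fit(times, events, scores)
print("hazard ratio per unit score:", round(math.exp(beta), 3))
print("Wald z statistic:", round(beta / se, 2))
```

Because the scores are treated as fixed covariates in the second stage, their estimation error is ignored there, which is the limitation that joint modeling aims to address.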

For simplicity and direct comparison with other commonly used summaries we have focused on only the first score. Examining finer trajectory details by including additional principal components is of interest. The number of selected principal components and smoothing parameters could be chosen objectively by automatic procedures. In this data set, adding the second principal component scores as covariates in the regression analysis did not qualitatively change the conclusions based on the first FPC score alone: the second FPC score was not a significant predictor of survival for either CD4 count or viral load. Finally, FPC scores are used here to evaluate survival outcomes, but they have application in a variety of other statistical analyses. Examples include the comparison of trajectories between two or more groups and use as an adjustment covariate in regression analyses that require adjustment for viral load or CD4 count.

In summary, the functional principal components approach for the evaluation of longitudinal data is an attractive alternative to using single summary measures of longitudinal profiles in a research setting. It requires almost no a priori assumptions and can increase the power of an analysis by including all members of a cohort with observed data. Furthermore, if associations between the FPC score(s) and survival are detected, it will often be the case that additional evaluation of the profile using the functional principal components can reveal specific characteristics that are associated with the outcome of interest. As more biomarkers (such as measures of inflammation or immune activation) are identified and used to evaluate HIV survival and progression [26, 27], summarizing longitudinal trajectories of these biomarkers will become increasingly important. The FPC approach presented here is one such method, rigorous and effective for revealing structure in these data that is associated with clinical outcomes.