


Ann Stat. Author manuscript; available in PMC 2010 December 1.

Published in final edited form as:

Ann Stat. 2010 January 1; 38(4): 2351–2387.

PMCID: PMC2995285

NIHMSID: NIHMS227843

Department of Biostatistics and Computational Biology, University of Rochester School of Medicine and Dentistry, 601 Elmwood Avenue, Box 630, Rochester, New York 14642, USA

Hongqi Xue: Hongqi_Xue@urmc.rochester.edu; Hongyu Miao: Hongyu_Miao@urmc.rochester.edu; Hulin Wu: hwu@bst.rochester.edu


This article considers estimation of constant and time-varying coefficients in nonlinear ordinary differential equation (ODE) models where analytic closed-form solutions are not available. The numerical solution-based nonlinear least squares (NLS) estimator is investigated in this study. A numerical algorithm such as the Runge–Kutta method is used to approximate the ODE solution. The asymptotic properties are established for the proposed estimators, considering both numerical error and measurement error. The B-spline is used to approximate the time-varying coefficients, and the corresponding asymptotic theories in this case are investigated under the framework of the sieve approach. Our results show that if the maximum step size of the *p*-order numerical algorithm goes to zero at a rate faster than *n*^{−1/(p∧4)}, the numerical error is negligible compared to the measurement error. This result provides theoretical guidance for selecting the step size in numerical evaluations of ODEs. Moreover, we have shown that the numerical solution-based NLS estimator and the sieve NLS estimator are strongly consistent. The sieve estimator of the constant parameters is asymptotically normal with the same asymptotic covariance as in the case where the true ODE solution is exactly known, while the estimator of the time-varying parameter has the optimal convergence rate under some regularity conditions. The theoretical results are also developed for the case where the step size of the ODE numerical solver does not go to zero fast enough or, equivalently, the numerical error is comparable to the measurement error. We illustrate our approach with both simulation studies and clinical data on HIV viral dynamics.

Ordinary differential equations (ODE) are widely used to model dynamic processes in many scientific fields such as engineering, physics, econometrics and biomedical sciences. In particular, new biotechnologies allow scientists to use ODE models to more accurately describe biological processes such as genetic regulatory networks, tumor cell kinetics, epidemics and viral dynamics of infectious diseases [Chen, He and Church (1999), Jansson and Revesz (1975), Michelson and Leith (1997), Daley and Gani (1999), Anderson and May (1991), Brookmeyer and Gail (1994), Nowak and May (2000)]. The mathematical modeling approach has made a great impact on these scientific fields over the past decades. For instance, ODE models have been used to quantify HIV viral dynamics, which resulted in important scientific findings [Ho et al. (1995), Perelson et al. (1996, 1997)]. Comprehensive reviews of the application of ODE models in HIV dynamics can be found in Perelson and Nelson (1999), Nowak and May (2000), Tan and Wu (2005) and Wu (2005).

Although differential equation models have been widely used in scientific research, very little statistical research has been dedicated to parameter estimation and inference for differential equation models. The existing statistical methods for ODE models include the nonlinear least squares method [Bard (1974), van Domselaar and Hemker (1975), Benson (1979), Li, Osborne and Pravan (2005)], the smoothing-based techniques [Swartz and Bremermann (1975), Varah (1982), Chen and Wu (2008), Liang and Wu (2008), Brunel (2008)], the principal differential analysis (PDA) [Ramsay (1996), Heckman and Ramsay (2000), Poyton et al. (2006), Ramsay et al. (2007), Varziri et al. (2007)] and the Bayesian approaches [Putter et al. (2002), Huang, Liu and Wu (2006), Donnet and Samson (2007)]. However, very few of these publications rigorously address the theoretical issues and study the asymptotic properties of the proposed estimators when both measurement error and numerical error are significant. In this paper, we intend to investigate statistical estimation methods for both constant and time-varying parameters in ODE models and study the asymptotic properties of the proposed estimators under the framework of the sieve approach.

Denote a general set of ODE models containing only constant parameters as

$$\{\begin{array}{ll}\frac{d\mathbf{\text{X}}(t)}{dt}=\mathbf{\text{F}}\left\{t,\mathbf{\text{X}}(t),\mathit{\text{\beta}}\right\},\hfill & \forall t\in \left[{t}_{0},T\right],\hfill \\ \mathbf{\text{X}}({t}_{0})={\mathbf{\text{X}}}_{0},\hfill & \hfill \end{array}$$

(1.1)

and denote a general set of ODE models with both constant and time-varying parameters as

$$\{\begin{array}{ll}\frac{d\mathbf{\text{X}}(t)}{dt}=\mathbf{\text{F}}\{t,\mathbf{\text{X}}(t),\mathit{\text{\beta}},\eta (t)\},\hfill & \forall t\in [{t}_{0},T],\hfill \\ \mathbf{\text{X}}({t}_{0})={\mathbf{\text{X}}}_{0},\hfill & \hfill \end{array}$$

(1.2)

where **X**(*t*) = {*X*_{1}(*t*),…, *X*_{*K*}(*t*)}^{*T*} is a *K*-dimensional vector of state variables, *β* is a vector of unknown constant parameters, *η*(*t*) is an unknown time-varying parameter and **X**(*t*_{0}) = **X**_{0} is the initial condition.

If a closed-form solution to (1.1) or (1.2) is available, the standard statistical approaches for nonlinear regression or time-varying coefficient regression models can be used to estimate unknown parameters. In practice, (1.1) and (1.2) usually do not have closed-form solutions for a nonlinear **F**. In this case, numerical methods such as the Runge–Kutta algorithm [Runge (1895), Kutta (1901)] have to be used to approximate the solution of the ODEs for a given set of parameter values and initial conditions. Consequently, the nonlinear least squares (NLS) principle (minimizing the residual sum of squares of the differences between the experimental observations and the numerical solutions) can be used to obtain the estimates of the unknown parameters. The NLS method for (1.1) was first described by mathematicians in the 1970s [Bard (1974), van Domselaar and Hemker (1975), Benson (1979)]. The NLS method was also widely used to estimate the unknown parameters in ODE models in the fields of mathematics, computer science and control engineering. In the 1990s, the NLS method was extended to estimate time-varying parameters in (1.2). For example, the NLS method with spline approximation to time-varying parameters has been successfully applied to pharmacokinetic [Li et al. (2002)], physiologic [Thomaseth et al. (1996)] and HIV [Adams (2005)] studies.
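
As a concrete sketch of this numerical solution-based NLS idea, the following Python fragment fits a hypothetical logistic ODE (an illustrative toy, not one of the models studied in this article) by minimizing the residual sum of squares between noisy observations and a Runge–Kutta numerical solution (scipy's RK45):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Hypothetical example: logistic growth x' = b1*x*(1 - x/b2), x(0) = 0.5.
def solve_model(beta, t_obs, x0=0.5):
    sol = solve_ivp(lambda t, x: beta[0]*x*(1 - x/beta[1]),
                    (t_obs[0], t_obs[-1]), [x0], t_eval=t_obs,
                    rtol=1e-8, atol=1e-10)  # RK45: an explicit Runge-Kutta pair
    return sol.y[0]

rng = np.random.default_rng(0)
t_obs = np.linspace(0, 15, 40)
y = solve_model([0.8, 10.0], t_obs) + rng.normal(0, 0.1, t_obs.size)

# NLS: minimize the residual sum of squares between data and numerical solution
fit = least_squares(lambda b: y - solve_model(b, t_obs), x0=[0.5, 5.0],
                    bounds=([0.1, 1.0], [5.0, 50.0]))
print(fit.x)  # close to the true values (0.8, 10.0)
```

The same template applies to any ODE without a closed-form solution: only `solve_model` changes.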

Though the NLS was the earliest and the most popular method developed for estimating the parameters in ODE models, so far the proposed NLS estimators and their asymptotic properties for ODE models have not been systematically studied, in particular for time-varying parameter estimates. The influence of the numerical approximation error of ODEs on the asymptotic properties has not been analyzed. All existing studies took the numerical solution as the true solution and did not consider the difference between them. The difficulty is due to the co-existence of both measurement error and numerical error, and the standard theories of the NLS method [Jennrich (1969), Malinvaud (1970), Wu (1981), Delgado (1992)] cannot be directly applied. In this article, we intend to fill this gap.

The rest of the article is organized as follows. In Section 2, we discuss the identifiability problem of ODE models. Then we introduce the numerical solution-based NLS estimators for (1.1) and (1.2), and study their asymptotic properties in Sections 3 and 4, respectively. The asymptotic properties of the proposed estimators, including strong consistency, rate of convergence and asymptotic normalities, are established using the tools of empirical processes [Pollard (1984, 1990), Pakes and Pollard (1989), van der Vaart and Wellner (1996), Ma and Kosorok (2005), Wellner and Zhang (2007)] and the sieve methods [Grenander (1981), Shen and Wong (1994), Huang (1996) and Shen (1997)]. We perform simulation studies to investigate the finite-sample performance of the proposed estimation methods in Section 5. In this section, we also apply the proposed approaches to a set of ODE models for HIV dynamics. We provide a summary and discussion for the proposed methods in Section 6. Finally, the proofs of all the theoretical results are given in the Appendix.

Identifiability of ODE models is a critical question to answer before parameter estimation. To verify the uniqueness of parameter estimates for given system inputs and outputs, both analytical and numerical techniques have been developed for ODE models since the 1950s. Before jumping into technical details, two commonly used definitions of identifiability are given as follows [Bellman and Åström (1970), Cobelli, Lepschy and Jacur (1979), Walter (1987), Ljung and Glad (1994), Audoly et al. (2001), Jeffrey and Xia (2005)].

Definition 1. Globally identifiable: a system structure is said to be globally identifiable if, for any two parameter vectors *β*_{1} and *β*_{2} in the parameter space ℬ, **X**(*t*, *β*_{1}) = **X**(*t*, *β*_{2}) holds for all *t* if and only if *β*_{1} = *β*_{2}.

However, global identifiability is a strong condition to satisfy and usually difficult to verify in practice. Therefore, the definition of at-a-point identifiability was introduced by Ljung and Glad (1994) and Quaiser and Mönnigmann (2009) as follows.

Definition 2. At-a-point identifiable: a system structure is said to be locally (or globally) identifiable at a point *β*_{*} if, for any *β* within an open neighborhood of *β*_{*} (or within the whole parameter space ℬ), **X**(*t*, *β*) = **X**(*t*, *β*_{*}) for all *t* implies *β* = *β*_{*}.

A number of methods have been proposed for identifiability analysis of ODE models, including *structural* [Bellman and Åström (1970), Ljung and Glad (1994), Xia and Moog (2003)], *practical* [e.g., Rodriguez-Fernandez, Egea and Banga (2006), Miao et al. (2008)] and *sensitivity-based* [e.g., Jolliffe (1972), Quaiser and Mönnigmann (2009)] approaches. Due to limited space, we cannot provide an exhaustive list of publications on the identifiability of ODE models. In this article, the structural identifiability analysis techniques are of particular interest, mainly due to their theoretical completeness.

Various structural identifiability approaches have been proposed, such as power series expansion [Pohjanpalo (1978)], similarity transformation [Vajda et al. (1989), Chappel and Godfrey (1992)] and the implicit function theorem method [Xia and Moog (2003), Miao et al. (2008)]. Particularly, Ollivier (1990) and Ljung and Glad (1994) introduced another approach in the framework of differential algebra [Ritt (1950), Kolchin (1973)]. The differential algebra approach is suitable for general nonlinear dynamic systems, and it has been successfully applied to nonlinear differential equation models, including models with time-varying parameters [Audoly et al. (2001)]. In this article, the differential algebra approach is employed to verify the identifiability of ODE models with both constant and time-varying parameters.

For most structural identifiability analysis techniques such as the implicit function theorem method and the differential algebra approach, a key step is the elimination of latent variables via taking derivatives and algebraic operations, which makes such techniques suitable for multivariate ODE models with partially observed state variables. After all unobserved state variables are eliminated, equations involving only given inputs, measured outputs and unknown parameters can be obtained. If we consider the parameters as unknowns, it is easy to verify that the identifiability of unknown parameters is determined by the number of roots of these equations.

For illustration purposes, we consider a classical HIV dynamic model with both constant and time-varying parameters [Nowak and May (2000), Huang et al. (2003), Wu et al. (2005)] as an example:

$$\{\begin{array}{c}\frac{d}{dt}{T}_{U}(t)=\lambda -\rho {T}_{U}(t)-\eta (t){T}_{U}(t)V(t),\hfill \\ \frac{d}{dt}{T}_{I}(t)=\eta (t){T}_{U}(t)V(t)-\delta {T}_{I}(t),\hfill \\ \frac{d}{dt}V(t)=N\delta {T}_{I}(t)-cV(t),\hfill \end{array}$$

(2.1)

where *T*_{*U*} is the concentration of uninfected target CD4+ T cells, *T*_{*I*} is the concentration of infected cells and *V* is the viral load; λ is the proliferation rate of uninfected target cells, ρ is the death rate of uninfected cells, *η*(*t*) is the time-varying infection rate, δ is the death rate of infected cells, *N* is the number of virions produced by each infected cell and *c* is the clearance rate of free virions. Letting *x*_{1} = *T*_{*U*}, *x*_{2} = *T*_{*I*} and *x*_{3} = *V*, we can rewrite model (2.1) as

$$\{\begin{array}{l}{x}_{1}^{\prime}=\lambda -\rho {x}_{1}-\eta (t){x}_{1}{x}_{3},\hfill \\ {x}_{2}^{\prime}=\eta (t){x}_{1}{x}_{3}-\delta {x}_{2},\hfill \\ {x}_{3}^{\prime}=N\delta {x}_{2}-c{x}_{3},\hfill \end{array}$$

(2.2)

where *x′*_{1}, *x′*_{2} and *x′*_{3} denote the derivatives of *x*_{1}, *x*_{2} and *x*_{3}, respectively. We adopt the following ranking for variable elimination [Ljung and Glad (1994)],

$$\eta \prec {y}_{2}\prec {y}_{1}\prec \mathit{\text{\beta}}\prec {x}_{3}\prec {x}_{2}\prec {x}_{1},$$

(2.3)

where *β* = (λ, ρ, δ, *N*, *c*)^{*T*} denotes the vector of constant parameters and the measurable system outputs are *y*_{1} = *T*_{*U*} + *T*_{*I*} = *x*_{1} + *x*_{2} (the total CD4+ T cell count) and *y*_{2} = *V* = *x*_{3} (the viral load). Eliminating the unobserved state variables according to the ranking (2.3) yields the two equations

$${y}_{1}^{(2)}+(\rho +\delta ){y}_{1}^{\prime}+\delta \rho {y}_{1}-\delta \lambda +\eta (t){y}_{2}({y}_{1}^{\prime}+\delta {y}_{1}-\lambda )=0,$$

(2.4)

$${y}_{2}^{(2)}+(\delta +c){y}_{2}^{\prime}+\delta c{y}_{2}-\eta (t){y}_{2}(N\delta {y}_{1}-{y}_{2}^{\prime}-c{y}_{2})=0,$$

(2.5)

where
${y}_{1}^{(2)}$ and
${y}_{2}^{(2)}$ denote the second-order derivatives of *y*_{1}(*t*) and *y*_{2}(*t*), respectively. Therefore, *η*(*t*) can be expressed in terms of the measurable system outputs and the unknown constant parameters either from (2.4) as

$$\eta (t)=\frac{{y}_{1}^{(2)}+(\rho +\delta ){y}_{1}^{\prime}+\delta \rho {y}_{1}-\delta \lambda}{-{y}_{2}({y}_{1}^{\prime}+\delta {y}_{1}-\lambda )}$$

(2.6)

or from (2.5) as

$$\eta (t)=\frac{{y}_{2}^{(2)}+(\delta +c){y}_{2}^{\prime}+\delta c{y}_{2}}{{y}_{2}(N\delta {y}_{1}-{y}_{2}^{\prime}-c{y}_{2})}\cdot $$

(2.7)

Thus, *η*(*t*) is identifiable if all the constant unknown parameters are identifiable. To verify the identifiability of all unknown parameters *θ* = (*β*^{*T*}, *η*(*t*))^{*T*}, we can eliminate *η*(*t*) by equating the right-hand sides of (2.6) and (2.7), which yields

$${y}_{1}^{(2)}{y}_{2}{y}_{2}^{\prime}-{y}_{1}^{\prime}{y}_{2}{y}_{2}^{(2)}-\delta {y}_{1}{y}_{2}{y}_{2}^{(2)}+\lambda {y}_{2}{y}_{2}^{(2)}-(\delta +c){y}_{1}^{\prime}{y}_{2}{y}_{2}^{\prime}+(\rho \delta +\rho +\delta -{\delta}^{2}-\delta c){y}_{1}{y}_{2}{y}_{2}^{\prime}+c{y}_{2}{y}_{2}^{\prime}+\rho c{y}_{1}^{\prime}{y}_{2}^{2}+(\rho \delta c-{\delta}^{2}c){y}_{1}{y}_{2}^{2}-N\delta {y}_{1}{y}_{1}^{(2)}{y}_{2}+c{y}_{1}^{(2)}{y}_{2}^{2}-N\delta (\rho +\delta ){y}_{1}{y}_{1}^{\prime}{y}_{2}-N{\delta}^{2}\rho {y}_{1}^{2}{y}_{2}+N{\delta}^{2}\lambda {y}_{1}{y}_{2}=0.$$

(2.8)

The equation above only involves the measurable system outputs [*y*_{1} = *T*_{*U*} + *T*_{*I*} and *y*_{2} = *V*] and the constant unknown parameters, so the identifiability of the constant parameters is determined by the number of roots of this equation with respect to these parameters. It can be verified that the constant parameters, and therefore *η*(*t*) via (2.6) or (2.7), are at-a-point identifiable.
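
The elimination identities above can be checked symbolically. The following sketch uses sympy as an independent verification, writing the outputs as *y*₁ = *T*_{U} + *T*_{I} = *x*₁ + *x*₂ and *y*₂ = *V* = *x*₃ and differentiating along trajectories of (2.2):

```python
import sympy as sp

t = sp.symbols('t')
lam, rho, delta, N, c = sp.symbols('lambda rho delta N c')
eta = sp.Function('eta')(t)
x1, x2, x3 = [sp.Function(f'x{i}')(t) for i in (1, 2, 3)]

# right-hand sides of the HIV model (2.2)
ode = {x1.diff(t): lam - rho*x1 - eta*x1*x3,
       x2.diff(t): eta*x1*x3 - delta*x2,
       x3.diff(t): N*delta*x2 - c*x3}

y1, y2 = x1 + x2, x3                      # measurable outputs

def d(expr):                              # d/dt along trajectories of (2.2)
    return expr.diff(t).subs(ode)

y1p, y2p = d(y1), d(y2)
y1pp, y2pp = d(y1p), d(y2p)

# residuals of equations (2.4) and (2.5): both must vanish identically
res1 = y1pp + (rho + delta)*y1p + delta*rho*y1 - delta*lam \
       + eta*y2*(y1p + delta*y1 - lam)
res2 = y2pp + (delta + c)*y2p + delta*c*y2 \
       - eta*y2*(N*delta*y1 - y2p - c*y2)
print(sp.expand(res1), sp.expand(res2))   # both expand to 0
```

Both residuals cancel term by term, confirming that (2.4) and (2.5) hold for every trajectory of (2.2).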

Throughout this article, we let ‖**a**‖ be the Euclidean norm (or *L*_{2} norm) of a vector (or a matrix) **a** and let
${\Vert \mathbf{\text{A}}\Vert}_{\infty}={max}_{1\le i\le m}{\sum}_{j=1}^{n}|{a}_{ij}|$ be the supremum norm of an *m* × *n* matrix **A**, where *a*_{*ij*} is the (*i*, *j*)th element of **A**.

In this section, we consider ODE models with constant parameters, that is, equation (1.1), over the time range of interest *I* = [*t*_{0}, *T*] (−∞ < *t*_{0} < *T* < +∞), where the initial value **X**_{0} = **X**(*t*_{0}) is assumed to be known in this article. In reality, **X**(*t*) cannot be measured exactly and directly; instead, its surrogate **Y**(*t*) can be measured. For simplicity, here we assume an additive measurement error model to describe the relationship between **X**(*t*_{*i*}) and its surrogate **Y**(*t*_{*i*}):

$$\mathbf{\text{Y}}({t}_{i})=\mathbf{\text{X}}({t}_{i})+\mathit{\text{\epsilon}}({t}_{i}),$$

(3.1)

at random or fixed design time points *t*_{1},…,*t*_{*n*}, where the measurement errors *ε*(*t*_{*i*}) are assumed to be independent and identically distributed with mean zero and covariance matrix Σ.

If (1.1) does not have a closed-form solution, we need to resort to numerical techniques to obtain its solution at discrete time points. In this article, we consider a general one-step numerical method. Let *t*_{0} = *s*_{0} < *s*_{1} < ⋯ < *s*_{*m*} = *T* be the mesh points on [*t*_{0}, *T*] and let *h* = max_{0≤*j*≤*m*−1}(*s*_{*j*+1} − *s*_{*j*}) denote the maximum step size. The one-step method produces approximations ${\mathbf{\text{X}}}_{j}^{h}$ to **X**(*s*_{*j*}) through the recursion

$${\mathbf{\text{X}}}_{j+1}^{h}={\mathbf{\text{X}}}_{j}^{h}+h\mathbf{\Phi}({s}_{j},{\mathbf{\text{X}}}_{j}^{h},{\mathbf{\text{X}}}_{j+1}^{h},h),$$

(3.2)

where the specific form of **Φ** depends on the numerical method. The common numerical methods include the Euler backward method, the trapezoidal rule, the *r*-stage Runge–Kutta algorithm (*r* is usually between 2 and 5), and so on. Among these algorithms, the 4-stage Runge–Kutta algorithm [Mattheij and Molenaar (2002), page 53, Hairer, Nørsett and Wanner (1993), page 134] has been well developed and widely used in practice. Therefore, we employ the 4-stage Runge–Kutta algorithm as an example in our numerical studies.

Define
${e}^{h}={max}_{0\le j\le m-1}\Vert \mathbf{\text{X}}({s}_{j})-{\mathbf{\text{X}}}_{j}^{h}\Vert $, which is called the numerical error or the global discretization error [Hairer, Nørsett and Wanner (1993), page 159; Mattheij and Molenaar (2002), page 57]. If *e*^{*h*} = *O*(*h*^{*p*}), the numerical method is said to be of order *p*.

Following Mattheij and Molenaar (2002, page 58), the interpolation technique is commonly used if the measurement time points *t*_{*i*} do not coincide with the mesh points *s*_{*j*} of the numerical solution. Let **X̃**(*t*, *β*) denote the numerical solution of (1.1), interpolated to arbitrary *t* ∈ *I*, for a given parameter vector *β*. The numerical solution-based NLS estimator *β̂*_{*n*} is then defined as the minimizer of the objective function

$${\mathrm{\Xi}}_{1}(\mathit{\text{\beta}})=\sum _{i=1}^{n}\sum _{j=1}^{K}{[{Y}_{j}({t}_{i})-{\stackrel{\sim}{\mathbf{\text{X}}}}_{j}({t}_{i},\mathit{\text{\beta}})]}^{2}\cdot $$

(3.3)

Note that if the data are correlated or the measurement variances are heterogeneous, the weighted NLS can be used, and the theoretical results can be extended to the weighted NLS accordingly. Also note that we can easily obtain the estimator **X̂**(*t*) = **X̃**(*t*, *β̂*_{*n*}) for **X**(*t*, *β*_{0}).

To minimize the NLS objective function (3.3), standard gradient-based optimization methods may fail because, for a complicated nonlinear ODE model, the NLS objective function may have multiple local minima or may be ill-behaved [Englezos and Kalogerakis (2001)]. Fortunately, various global optimization methods are available to solve the parameter estimation problem for ODE models more reliably, although they are computationally intensive. Moles, Banga and Keller (2004) compared the performance and computational cost of seven global optimization methods, including the differential evolution method [Storn and Price (1997)]. Their results suggest that the differential evolution method outperforms the other six methods at a reasonable computational cost. Improved performance can be achieved using a hybrid method combining gradient methods and global optimization methods. A hybrid method based on scatter search and sequential quadratic programming (SQP) has been proposed by Rodriguez-Fernandez, Egea and Banga (2006), who showed that the hybrid scatter search method is much faster than the differential evolution method for a simple HIV ODE model. In addition, Miao et al. (2008) also suggested that global optimization methods should be used for general nonlinear ODE models. Here we combine differential evolution, the scatter search method and the SQP local optimization technique to implement our NLS minimization.
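
The hybrid strategy (global search followed by local refinement) can be sketched with scipy on a hypothetical one-parameter decay model; the scatter search and SQP implementations used in this article are not reproduced here, and scipy's SLSQP stands in for the local SQP step:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import differential_evolution, minimize

# Hypothetical toy model: exponential decay x' = -b*x with x(0) = 2
def rss(b, t_obs, y):
    sol = solve_ivp(lambda t, x: -b[0]*x, (0, t_obs[-1]), [2.0],
                    t_eval=t_obs, rtol=1e-8)
    return np.sum((y - sol.y[0])**2)  # NLS objective, cf. (3.3)

rng = np.random.default_rng(1)
t_obs = np.linspace(0, 5, 30)
y = 2.0*np.exp(-0.7*t_obs) + rng.normal(0, 0.05, 30)

# Stage 1: global search by differential evolution over a box
de = differential_evolution(rss, bounds=[(0.01, 5.0)], args=(t_obs, y),
                            seed=1, tol=1e-8)
# Stage 2: local SQP-type refinement starting from the global solution
refined = minimize(rss, de.x, args=(t_obs, y), method='SLSQP')
print(refined.x)  # close to the true decay rate 0.7
```

The global stage guards against local minima of the NLS surface; the local stage sharpens the final estimate cheaply.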

In this section, we study the asymptotic properties of the proposed numerical solution-based NLS estimator when both measurement error and numerical error are considered. First we make the following assumptions:

- A1. *β* ∈ ℬ, where ℬ is a compact subset of *R*^{*d*} with a finite diameter *R*_{*β*}.
- A2. Ω = {**X**(*t*, *β*) : *t* ∈ *I*, *β* ∈ ℬ} is a closed and bounded convex subset of *R*^{*K*}.
- A3. There exist two constants −∞ < **c**_{1} < **c**_{2} < +∞ such that **c**_{1} ≤ **Y**(*t*) ≤ **c**_{2} for all *t* ∈ *I*.
- A4. All partial derivatives of **F**(*t*, **X**, *β*) up to order *p* with respect to *t* and **X** exist and are continuous.
- A5. The numerical method for solving the ODEs is of order *p*.
- A6. For any *β* ∈ ℬ, *E*_{*t*}[**X**(*t*, *β*) − **X**(*t*, *β*_{0})]^{2} = 0 if and only if *β* = *β*_{0}.
- A7. The first and second partial derivatives, $\frac{\partial \mathbf{\text{X}}(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}}$ and $\frac{{\partial}^{2}\mathbf{\text{X}}(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}\partial {\mathit{\text{\beta}}}^{T}}$, exist and are continuous and uniformly bounded for all *t* ∈ *I* and *β* ∈ ℬ.
- A8. For the ODE numerical solution **X̃**(*t*, *β*), the first and second partial derivatives, $\frac{\partial \stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}}$ and $\frac{{\partial}^{2}\stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}\partial {\mathit{\text{\beta}}}^{T}}$, exist and are continuous and uniformly bounded for all *t* ∈ *I* and *β* ∈ ℬ.
- A9. Let 0 < *c*_{3} < *c*_{4} < ∞ be two constants. For random design points, *t*_{1},…, *t*_{*n*} are i.i.d., and the joint density function *ϕ*(*t*, **y**) of (*t*, **Y**) satisfies *c*_{3} ≤ *ϕ*(*t*, **y**) ≤ *c*_{4} for all (*t*, **y**) ∈ [*t*_{0}, *T*] × [**c**_{1}, **c**_{2}].
- A10. The true parameter *β*_{0} is an interior point of ℬ.
- A11. *β̃*_{0} is an interior point of ℬ, where *β̃*_{0} = arg min_{*β* ∈ ℬ} *E*_{0}[**Y**(*t*) − **X̃**(*t*, *β*)]^{*T*}[**Y**(*t*) − **X̃**(*t*, *β*)] and *E*_{0} is the expectation with respect to *P*_{*β*_{0}}, the joint probability distribution of (*t*, **Y**(*t*)) at the true value *β*_{0}.
- A12. ${\mathbf{\text{V}}}_{1}={\{{E}_{t}(\frac{\partial \mathbf{\text{X}}}{\partial {\mathit{\text{\beta}}}_{0}}\frac{\partial \mathbf{\text{X}}}{\partial {\mathit{\text{\beta}}}_{0}^{T}})\}}^{-1}{E}_{t}(\frac{\partial \mathbf{\text{X}}}{\partial {\mathit{\text{\beta}}}_{0}}\mathrm{\Sigma}\frac{\partial \mathbf{\text{X}}}{\partial {\mathit{\text{\beta}}}_{0}^{T}}){\{{E}_{t}(\frac{\partial \mathbf{\text{X}}}{\partial {\mathit{\text{\beta}}}_{0}}\frac{\partial \mathbf{\text{X}}}{\partial {\mathit{\text{\beta}}}_{0}^{T}})\}}^{-1}$ is positive definite, where Σ is the covariance matrix of the measurement errors and *E*_{*t*}[*g*(*t*)] is the expectation of the function *g*(*t*) with respect to *t*.
- A13. ${\stackrel{\sim}{\mathbf{\text{V}}}}_{1}={\{{E}_{t}(\frac{\partial \stackrel{\sim}{\mathbf{\text{X}}}}{\partial \stackrel{\sim}{\mathit{\text{\beta}}}}\frac{\partial \stackrel{\sim}{\mathbf{\text{X}}}}{\partial {\stackrel{\sim}{\mathit{\text{\beta}}}}^{T}})\}}^{-1}{E}_{0}(\frac{\partial \stackrel{\sim}{\mathbf{\text{X}}}}{\partial \stackrel{\sim}{\mathit{\text{\beta}}}}{[\mathbf{\text{Y}}(t)-\stackrel{\sim}{\mathbf{\text{X}}}(t,\stackrel{\sim}{\mathit{\text{\beta}}})]}^{\otimes 2}\frac{\partial \stackrel{\sim}{\mathbf{\text{X}}}}{\partial {\stackrel{\sim}{\mathit{\text{\beta}}}}^{T}}){\{{E}_{t}(\frac{\partial \stackrel{\sim}{\mathbf{\text{X}}}}{\partial \stackrel{\sim}{\mathit{\text{\beta}}}}\frac{\partial \stackrel{\sim}{\mathbf{\text{X}}}}{\partial {\stackrel{\sim}{\mathit{\text{\beta}}}}^{T}})\}}^{-1}$ is positive definite.

Assumptions A1–A4 are general requirements for the existence of numerical solutions of ODE models. Assumption A5, from Mattheij and Molenaar (2002, pages 55 and 56), defines the precision of the numerical algorithm. For example, the Euler backward method, the trapezoidal rule, and the 4-stage and 5-stage Runge–Kutta algorithms are of order 1, 2, 4 and 5, respectively. Theorem 2.13 in Hairer, Nørsett and Wanner (1993, page 153) provides necessary and sufficient conditions for a numerical method to be of order *p*. Theorems 3.1 and 3.4 in Hairer, Nørsett and Wanner (1993, pages 157 and 160) give the magnitude of the numerical error of these algorithms. Assumption A6 is required for identifiability and is imposed for consistency. From Section 2, we know that the HIV model (2.1) is at-a-point identifiable at the true value *β*_{0}; this result and assumption A9 are sufficient conditions for assumption A6 to hold. Assumptions A7–A9 are needed for consistency, and assumptions A10–A13 are needed for the proof of asymptotic normality in Theorem 3.2.

Theorem 3.1. Assume that there exists a λ > 0 such that h = O(n^{−λ}); then, under assumptions A1–A10, we have *β̂*_{n} − *β*_{0} → 0, almost surely under P_{*β*_{0}}.

Theorem 3.2. (i) For h = O(n^{−λ}) with λ > 1/(p ∧ 4) where p is the order of the numerical method (3.2), under assumptions A1–A10 and A12, we have that
${n}^{1/2}({\widehat{\mathit{\text{\beta}}}}_{n}-{\mathit{\text{\beta}}}_{0})\stackrel{d}{\to}N(0,{\mathbf{\text{V}}}_{1})$.

(ii) For h = O(n^{−λ}) with 0 < λ ≤ 1/(p ∧ 4), under assumptions A1–A9, A11 and A13, we have that
${n}^{1/2}({\widehat{\mathit{\text{\beta}}}}_{n}-{\stackrel{\sim}{\mathit{\text{\beta}}}}_{0})\stackrel{d}{\to}N(0,{\stackrel{\sim}{\mathbf{\text{V}}}}_{1})$ with ‖*β̃*_{0} − *β*_{0}‖ = *O*(*h*^{(p∧4)/2}) = *O*(*n*^{−λ(p∧4)/2}) and ‖**Ṽ**_{1} − **V**_{1}‖ = *O*(*h*^{(p∧4)/2}) = *O*(*n*^{−λ(p∧4)/2}).

The detailed proofs of Theorems 3.1 and 3.2 are provided in the Appendix. The basic idea for the proofs is motivated by Pakes and Pollard (1989) in which a general central limit theorem is proved for a broad class of simulation estimators, that is, the objective function of the estimator is too complicated to evaluate directly, and instead the Monte Carlo simulation is used to approximate the objective function to obtain the estimator. The asymptotic properties of the simulation-based estimator are established using a general central limit theorem under nonstandard conditions given in Huber (1967) and Pollard (1985), which are often called the Huber–Pollard Z-theorem [see Theorem 3.3.1 in van der Vaart and Wellner (1996)]. In this article, we use the same theorem to prove the asymptotic normality of the numerical solution-based NLS estimator for ODEs. Similarly, our objective function Ξ_{1} (*β*) in (3.3) cannot be directly evaluated; instead we have to approximate it by solving (1.1) numerically. Thus, similar ideas in Pakes and Pollard (1989) can be borrowed to establish the asymptotic results of our estimator in Theorems 3.1 and 3.2.

Remark 1. Theorems 3.1 and 3.2 can be extended to the case of fixed design time points *t*_{*i*} ∈ [*t*_{0}, *T*] under suitable regularity conditions on the design.

Remark 2. From the proof of Theorem 3.2 in the Appendix, we still have ‖*β̃*_{0} − *β*_{0}‖ = *O*(*h*^{(p∧4)/2}) and ‖**Ṽ**_{1} − **V**_{1}‖ = *O*(*h*^{(p∧4)/2}) for a fixed constant *h*, which is independent of the sample size *n*. This suggests that, if the maximum step size *h* of the numerical algorithm for solving the ODEs is a fixed constant, the numerical solution-based NLS estimator is not consistent; instead, its asymptotic bias is of the order *h*^{(p∧4)/2}.

Notice that our asymptotic results provide a theoretical foundation for the relationship between the numerical step size and the sample size, which control the numerical error and the measurement error, respectively, for the widely used NLS estimator based on numerical solutions of ODEs. Intuitively, the smaller the numerical step size, the better the estimator. However, a smaller step size increases the computational cost, which may become a serious problem when the ODE system is large and expensive to solve. It is therefore important to study the trade-off between numerical error and measurement error when the computational cost needs to be taken into consideration. Our theoretical results show that only when the numerical step size, which controls the numerical error and the computational cost, goes to zero at a rate faster than *n*^{−1/(p∧4)} does the numerical solution-based NLS estimator converge to the true parameter values at the root-*n* rate. In addition, the asymptotic variance of the NLS estimator is then the same as if the true solution **X**(*t*) were exactly known.
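
The resulting rule of thumb is easy to apply: pick h shrinking at least as fast as n^{−1/(p∧4)} (strictly faster for the root-n theory to apply exactly). A trivial helper illustrating the rate; the constant C is problem-dependent and must be tuned:

```python
def suggested_step_size(n, p, C=1.0):
    """Step size h = C * n**(-1/(p ∧ 4)) so that the numerical error of a
    p-order ODE solver shrinks at the boundary rate relative to the
    measurement error of n observations (cf. Theorem 3.2; C is a
    problem-dependent tuning constant, not specified by the theory)."""
    return C * n ** (-1.0 / min(p, 4))

# e.g., a 4-stage Runge-Kutta solver (p = 4) with n = 100 observations:
print(suggested_step_size(100, 4))  # 100**(-1/4), about 0.316
```

Note that for p > 4 the exponent saturates at 1/4: a higher-order solver does not relax the required step-size rate beyond that of a 4-order method.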

The asymptotic variance–covariance matrix needs to be estimated in order to perform statistical inference for the unknown parameters *β*. There are some standard methods that can be used. The first approach is to use the observed pseudo-information matrix based on the NLS objective function (3.3). The observed pseudo-information matrix is defined as
${\mathcal{I}}_{1}(\mathit{\text{\beta}})=-\frac{{\partial}^{2}{\mathrm{\Xi}}_{1}}{\partial {\mathit{\text{\beta}}}^{2}}$. The standard errors of *β̂*_{*n*} can then be approximated by the square roots of the diagonal elements of ${\mathcal{I}}_{1}^{-1}({\widehat{\mathit{\text{\beta}}}}_{n})$.

The second approach is the weighted bootstrap method [Ma and Kosorok (2005)]. Let *W*_{*i*}, *i* = 1,…,*n*, be i.i.d. positive random weights, independent of the data, with mean 1 (e.g., standard exponential weights). The weighted bootstrap estimator is defined as

$${\widehat{\mathit{\text{\beta}}}}_{n}^{0}=\text{arg min}\sum _{i=1}^{n}\sum _{j=1}^{K}{W}_{i}{[{Y}_{j}({t}_{i})-{\stackrel{\sim}{\mathbf{\text{X}}}}_{j}({t}_{i},\mathit{\text{\beta}})]}^{2}.$$

From Corollary 2 and Theorem 2 in Ma and Kosorok (2005), given the data {*t*_{*i*}, **Y**(*t*_{*i*})}, *i* = 1,…,*n*, the conditional distribution of *n*^{1/2}(*β̂*_{*n*}^{0} − *β̂*_{*n*}) converges to the same limiting distribution as that of *n*^{1/2}(*β̂*_{*n*} − *β*_{0}). Thus, the variance of *β̂*_{*n*} can be estimated by the sample variance of a large number of independently generated weighted bootstrap estimates *β̂*_{*n*}^{0}.

Note that the empirical bootstrap has been used for statistical inference for ODE models [Joshi, Seidel-Morgenstern and Kremling (2006)]. However, the asymptotic properties of the empirical bootstrap estimators are quite difficult to derive. This is why we propose to use the weighted bootstrap method instead of the empirical bootstrap approach.
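
The weighted bootstrap is straightforward to implement. A sketch on a hypothetical closed-form regression model, so that each weighted refit is cheap (for an ODE model, the refit would call the numerical solver instead); exponential(1) weights have mean 1, as required:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(2)
t = np.linspace(0, 4, 50)
y = np.exp(-0.5*t) + rng.normal(0, 0.05, 50)  # toy model y = exp(-b*t) + noise

def wnls(weights):
    # minimize sum_i W_i * [y_i - exp(-b*t_i)]^2 via sqrt-weighted residuals
    res = least_squares(lambda b: np.sqrt(weights)*(y - np.exp(-b[0]*t)),
                        x0=[1.0])
    return res.x[0]

b_hat = wnls(np.ones_like(t))                 # original (unweighted) NLS fit
# Weighted bootstrap: refit with i.i.d. exponential(1) weights, repeatedly
boot = [wnls(rng.exponential(1.0, size=t.size)) for _ in range(200)]
se = np.std(boot, ddof=1)                     # bootstrap standard error
print(b_hat, se)
```

Unlike the empirical bootstrap, no resampling of the design points is needed; only the weights change between refits.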

In this section, we consider (1.2) with both constant and time-varying parameters, where the initial value **X**_{0} = **X**(*t*_{0}) is assumed to be known. Again, **X**(*t*) is not observed directly in practice; instead, we observe its surrogate **Y**(*t*) through (3.1).

Let 𝒜 be the following class of functions:

$$\mathcal{A}=\{\eta \in {C}^{\mu}[{t}_{0},T]:|{\eta}^{(\mu )}({z}_{1})-{\eta}^{(\mu )}({z}_{2})|\le L{|{z}_{1}-{z}_{2}|}^{\gamma}\},$$

(4.1)

where *μ* is a nonnegative integer, *γ* ∈ (0, 1], *ϱ* = *μ* + *γ* > 0.5, and *L* is an unknown constant. This smoothness assumption is often used in nonparametric curve estimation. Usually, either *ϱ* = 1 (i.e., *μ* = 0 and *γ* = 1) or *ϱ* = 2 (i.e., *μ* = 1 and *γ* = 1) is satisfied in most practical situations. Denote *θ* = (*β*, *η*) ∈ Θ = ℬ × 𝒜 and the true value *θ*_{0} = (*β*_{0}, *η*_{0}).

In this article, we use the method of sieves to approximate *η*_{0}(*t*) on the support interval [*t*_{0}, *T*] of *t*. The basic idea of the sieve approach is to approximate an infinite-dimensional parameter space Θ by a series of finite-dimensional parameter spaces Θ_{*n*}, called sieves, which depend on the sample size *n*.

Here we apply the sieve estimation method to (1.2) with a time-varying parameter. First, we approximate *η*(*t*) by B-spline functions on the support interval *I* of *t*. Let *t*_{0} = *u*_{0} < *u*_{1} < ⋯ < *u*_{*q*} = *T* be a partition of *I* into *q* subintervals, and let {*B*_{*i*}(*t*), *i* = 1,…, *N*} be the corresponding B-spline basis functions, so that *η*(*t*) is approximated by *π*(*t*)^{*T*}*α* = Σ_{*i*=1}^{*N*} *B*_{*i*}(*t*)*α*_{*i*}, where *π*(*t*) = (*B*_{1}(*t*),…, *B*_{*N*}(*t*))^{*T*} and *α* = (*α*_{1},…, *α*_{*N*})^{*T*} is the vector of spline coefficients. The number of basis functions *N* is allowed to grow with the sample size, with *v* restricted as in assumption B7 below.
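
Concretely, the B-spline basis and the least-squares projection of a smooth function onto it can be computed with scipy. The sketch below fits a hypothetical smooth time-varying coefficient (not the full sieve estimator, which couples the spline with the ODE solver):

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, knots, k=3):
    # evaluate all N = len(knots) - k - 1 B-spline basis functions at x
    n_basis = len(knots) - k - 1
    return np.column_stack([BSpline(knots, np.eye(n_basis)[i], k)(x)
                            for i in range(n_basis)])

t0, T = 0.0, 10.0
knots = np.r_[[t0]*3, np.linspace(t0, T, 7), [T]*3]  # clamped cubic knot vector
tt = np.linspace(t0, T, 200)
eta_true = 0.5 + 0.3*np.sin(2*np.pi*tt/T)  # hypothetical smooth coefficient

# least-squares projection of eta onto the spline space: eta ~ pi(t)^T alpha
Pi = bspline_basis(tt, knots)              # 200 x 9 design matrix
alpha, *_ = np.linalg.lstsq(Pi, eta_true, rcond=None)
eta_hat = Pi @ alpha
print(np.max(np.abs(eta_hat - eta_true)))  # small uniform approximation error
```

The columns of `Pi` form a partition of unity on [t0, T], a standard property of the clamped B-spline basis that keeps the coefficients interpretable on the scale of η(t).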

For any *θ*_{1} = (*β*_{1}, *η*_{1}) and *θ*_{2} = (*β*_{2}, *η*_{2}) in Θ = ℬ × 𝒜, define the distance

$$d({\mathit{\text{\theta}}}_{1},{\mathit{\text{\theta}}}_{2})=\Vert {\mathit{\text{\beta}}}_{1}-{\mathit{\text{\beta}}}_{2}\Vert +{\Vert {\eta}_{1}-{\eta}_{2}\Vert}_{2}.$$

(4.2)

Define the set

$${\mathcal{A}}_{n}=\left\{\eta (t)=\sum _{i=1}^{N}{B}_{i}(t){\alpha}_{i}:\underset{1\le i\le N}{max}|{\alpha}_{i}|\le {\ell}_{n}\right\},$$

where ℓ_{*n*} is a bound that is allowed to grow slowly with *n*, so that 𝒜_{*n*} ⊆ 𝒜 forms a sieve for 𝒜, and the sieve parameter space is Θ_{*n*} = ℬ × 𝒜_{*n*}. Replacing *η*(*t*) in (1.2) by its spline approximation *π*(*t*)^{*T*}*α*, we obtain the approximate model

$$\frac{d\mathbf{\text{X}}(t)}{dt}\approx \mathbf{\text{F}}\{t,\mathbf{\text{X}}(t),\mathit{\text{\beta}},\pi {(t)}^{T}\mathit{\text{\alpha}}\}.$$

For this approximation model, let **X̃**(*t*, *β*, *η*) denote the numerical solution obtained by the one-step method (3.2). The sieve NLS estimator of *θ*_{0} is then defined as

$${\widehat{\mathit{\text{\theta}}}}_{n}=\underset{\mathit{\text{\theta}}\in {\mathrm{\Theta}}_{n}}{\text{arg inf}}\phantom{\rule{0.2em}{0ex}}{\mathrm{\Xi}}_{2}(\mathit{\text{\theta}})=\underset{\mathit{\text{\theta}}\in {\mathrm{\Theta}}_{n}}{\text{arg inf}}\sum _{i=1}^{n}\sum _{j=1}^{K}{[{Y}_{j}({t}_{i})-{\stackrel{\sim}{X}}_{j}({t}_{i},\mathit{\text{\beta}},\eta (t))]}^{2}.$$

(4.3)

When we substitute the sieve NLS estimators *θ̂*_{*n*} = (*β̂*_{*n*}, *η̂*_{*n*}) into the numerical solution, we can obtain the estimator **X̂**(*t*) = **X̃**(*t*, *β̂*_{*n*}, *η̂*_{*n*}) for **X**(*t*, *β*_{0}, *η*_{0}).
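
The full sieve NLS loop (spline coefficients estimated through the numerical ODE solution) can be sketched on a toy scalar model x′ = −η(t)x with noiseless data; this is an illustrative stand-in, not the HIV model or estimator of the numerical studies:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import BSpline
from scipy.optimize import least_squares

t0, T = 0.0, 5.0
knots = np.r_[[t0]*3, np.linspace(t0, T, 6), [T]*3]  # clamped cubic knots
n_coef = len(knots) - 4                              # number of basis functions
t_obs = np.linspace(t0, T, 50)
eta_true = lambda t: 0.5 + 0.3*np.sin(t)             # hypothetical eta(t)

# noiseless observations of the toy scalar model x' = -eta(t) x, x(0) = 1
x_data = solve_ivp(lambda t, x: -eta_true(t)*x, (t0, T), [1.0],
                   t_eval=t_obs, rtol=1e-10, atol=1e-12).y[0]

def solve_with_spline(alpha):
    eta = BSpline(knots, alpha, 3)       # eta(t) = pi(t)^T alpha
    return solve_ivp(lambda t, x: -eta(t)*x, (t0, T), [1.0],
                     t_eval=t_obs, rtol=1e-10, atol=1e-12).y[0]

# sieve NLS: estimate the spline coefficients from the trajectory data
fit = least_squares(lambda a: x_data - solve_with_spline(a),
                    x0=np.full(n_coef, 0.5))
eta_hat = BSpline(knots, fit.x, 3)(t_obs)
print(np.max(np.abs(eta_hat - eta_true(t_obs))))  # small; limited by the sieve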

The empirical objective function for the sieve NLS method proposed in Section 4.1 is a second-order loss function, not a likelihood function. We therefore cannot use the standard information calculation for the maximum likelihood estimator (MLE) based on orthogonal projections in semiparametric models [Bickel et al. (1993)], and the asymptotic normality theory for semiparametric MLEs obtained by Huang (1996, Theorem 6.1) does not apply to our case. Fortunately, Ma and Kosorok (2005) and Wellner and Zhang (2007) extended Huang's asymptotic normality results to more general semiparametric M-estimators by using a so-called pseudo-information calculation. We are able to employ these new asymptotic results to establish the asymptotic properties of the proposed sieve NLS estimator; the following additional assumptions are needed:

- B1. *η*_{0} ∈ 𝒜, where 𝒜 is the class defined in (4.1) with smoothness parameter *ϱ* = *μ* + *γ*.
- B2. All partial derivatives of **F** up to order *p* with respect to *t*, **X** and *η*, respectively, exist and are continuous.
- B3. For any *β* ∈ ℬ and *η* ∈ 𝒜, *E*_{*t*}[**X**(*t*, *β*, *η*(*t*)) − **X**(*t*, *β*_{0}, *η*_{0}(*t*))]^{2} = 0 if and only if *β* = *β*_{0} and *P*{*t* : *η*(*t*) = *η*_{0}(*t*)} = 1.
- B4. The first and second partial Fréchet-derivatives [van der Vaart and Wellner (1996), page 373] in the norm *d* defined in (4.2), $\frac{\partial \mathbf{\text{X}}(t,\mathit{\text{\beta}},\eta )}{\partial \mathit{\text{\beta}}}$, $\frac{\partial \mathbf{\text{X}}(t,\mathit{\text{\beta}},\eta )}{\partial \eta}$, $\frac{{\partial}^{2}\mathbf{\text{X}}(t,\mathit{\text{\beta}},\eta )}{\partial \mathit{\text{\beta}}\partial {\mathit{\text{\beta}}}^{T}}$, $\frac{{\partial}^{2}\mathbf{\text{X}}(t,\mathit{\text{\beta}},\eta )}{\partial \mathit{\text{\beta}}\partial \eta}$ and $\frac{{\partial}^{2}\mathbf{\text{X}}(t,\mathit{\text{\beta}},\eta )}{\partial {\eta}^{2}}$, exist and are continuous and uniformly bounded for all *t* ∈ *I*, *β* ∈ ℬ and *η* ∈ 𝒜.
- B5. For the ODE numerical solution **X̃**(*t*, *β*, *η*), the first and second partial Fréchet-derivatives in the norm *d*, $\frac{\partial \stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}},\eta )}{\partial \mathit{\text{\beta}}}$, $\frac{\partial \stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}},\eta )}{\partial \eta}$, $\frac{{\partial}^{2}\stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}},\eta )}{\partial \mathit{\text{\beta}}\partial {\mathit{\text{\beta}}}^{T}}$, $\frac{{\partial}^{2}\stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}},\eta )}{\partial \mathit{\text{\beta}}\partial \eta}$ and $\frac{{\partial}^{2}\stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}},\eta )}{\partial {\eta}^{2}}$, exist and are continuous and uniformly bounded for all *t* ∈ *I*, *β* ∈ ℬ and *η* ∈ 𝒜.
- B6. For *K* ≥ 2, ${\mathbf{\text{V}}}_{2}={\mathbf{\text{S}}}_{1}^{-1}{\mathbf{\text{S}}}_{2}{({\mathbf{\text{S}}}_{1}^{-1})}^{T}$ is positive definite, where **S**_{1} and **S**_{2} are defined in (A.5) and (A.6) in the Appendix, respectively.
- B7. *v* satisfies the restrictions 0.25/*ϱ* < *v* < 0.5 and *v*(2 + *ϱ*) > 0.5, where *ϱ* is the measure of smoothness of *η*(*t*) defined in assumption B1.

Theorem 4.1. Assume that there exists a λ > 0 such that h = O(n^{−λ}). Then, under assumptions A1–A4, A9, A10 and B1–B5, we have $d({\widehat{\mathit{\text{\theta}}}}_{n},{\mathit{\text{\theta}}}_{0})\to 0$ almost surely under P_{θ0}.

Theorem 4.2. Assume that there exists a λ > 1/[2(p ∧ 4)] such that h = O(n^{−λ}), where p is the order of the numerical algorithm (3.2). Then, under assumptions A1–A4, A9, A10 and B1–B5, we have $d({\widehat{\mathit{\text{\theta}}}}_{n},{\mathit{\text{\theta}}}_{0})={O}_{p}({n}^{-v}+{n}^{-(1-v)/2})$.

From Theorem 4.2, we know that ‖* _{n}* −

Theorem 4.3. For the maximum step size h = O(n^{−λ}) with λ > 1/(p ∧ 4), under assumptions A1–A4, A9, A10 and B1–B7, and K ≥ 2, we have
${n}^{1/2}({\widehat{\mathit{\text{\beta}}}}_{n}-{\mathit{\text{\beta}}}_{0})\stackrel{d}{\to}N(0,{\mathbf{\text{V}}}_{2})$.

Remark 3. For the case *h* = *O*(*n*^{−λ}) with 1/[2(*p* ∧ 4)] < *λ* ≤ 1/(*p* ∧ 4), similar results to case (ii) in Theorem 3.2 can be obtained.

For *K* = 1, Theorem 4.3 does not hold, since in this case the special perturbation direction *a**(*t*) given in (A.4) is
$\frac{\partial X}{\partial {\mathit{\text{\beta}}}_{0}}/\frac{\partial X}{\partial {\eta}_{0}}$, which leads both **S**_{1} in (A.5) and **S**_{2} in (A.6) to be zero (see the proof of Theorem 4.3 in the Appendix). In this article, we consider a special case in which we assume an additive relationship between ** β** and *η*(*t*),

$$\frac{dX\left(t\right)}{dt}=F\left\{t,\phantom{\rule{0.2em}{0ex}}X\left(t\right),\beta +\eta \left(t\right)\right\},$$

(4.4)

which is a special case of (1.2); then the function *X*(*t*) has the form *X*(*t, β* + *η*(*t*)). In this case, we are able to establish similar asymptotic normality results under the identifiability constraint *E*_{t}[*η*(*t*)] = 0. The estimate of *η*(*t*) is then centered as

$${\widehat{\eta}}_{n}({t}_{i})\approx \sum _{j=1}^{N}{B}_{j}({t}_{i}){\widehat{\alpha}}_{j}-\frac{1}{n}\sum _{i=1}^{n}\sum _{j=1}^{N}{B}_{j}({t}_{i}){\widehat{\alpha}}_{j}=\sum _{j=1}^{N}{\widehat{\alpha}}_{j}\left[{B}_{j}({t}_{i})-\frac{1}{n}\sum _{i=1}^{n}{B}_{j}({t}_{i})\right],$$

which is subject to the constraint ${\sum}_{i=1}^{n}{\widehat{\eta}}_{n}({t}_{i})=0$. Under similar assumptions, the strong consistency and the rate of convergence of the estimators, similar to those in Theorems 4.1 and 4.2, can be obtained. In particular, the asymptotic normality can be established as follows:
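The centering in the display above amounts to subtracting each basis function's average over the observation times, which enforces the constraint that the fitted values sum to zero for any coefficient vector. A minimal sketch (the random design matrix below is purely illustrative, standing in for B-spline evaluations):

```python
import numpy as np

def centered_design(B):
    """Subtract each basis column's mean over the observation times, so that
    eta_hat(t_i) = sum_j alpha_j * [B_j(t_i) - mean_i B_j(t_i)] sums to zero."""
    return B - B.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
B = rng.random((50, 4))                  # stand-in for a B-spline design matrix
alpha = rng.standard_normal(4)           # arbitrary spline coefficients
eta_hat = centered_design(B) @ alpha     # centered estimate at the 50 time points
```

By construction, summing `eta_hat` over the observation times gives zero (up to floating-point error), regardless of the coefficient values.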

Proposition 1. For (4.4) with K = 1, when the maximum step size h = O(n^{−λ}) with λ > 1/(p ∧ 4), under assumptions A1–A4, A9, A10, B1–B5, B7 and in addition E_{t} [η(t)] = 0, we have
${n}^{1/2}({\widehat{\mathit{\text{\beta}}}}_{n}-{\mathit{\text{\beta}}}_{0})\stackrel{d}{\to}N(0,{V}_{3})$, where
${V}_{3}={\sigma}_{0}^{2}{\{{E}_{t}{(\frac{\partial X}{\partial \xi})}^{2}\}}^{-1}$ with ξ = β_{0} + η_{0}(t).

The proof of this proposition is different from that of Theorem 4.3 and is given in the Appendix.

Remark 4. By combining Theorem 4.3 and Proposition 1, we can see that the proposed sieve NLS estimator is asymptotically normal with a convergence rate of
$\sqrt{n}$ for *K* ≥ 2 under assumption B6, but we are only able to prove the result for the special ODE model (4.4) when *K* = 1. This is because the asymptotic covariance **V**_{2} defined in B6 is always singular for *K* = 1 and only possibly nonsingular for *K* ≥ 2; we therefore derive the asymptotic distribution for the special ODE model (4.4) using a different approach, which yields Proposition 1.

Similar approaches proposed in Section 3 can be used to estimate the asymptotic variance-covariance matrix for (* _{n}, _{n}* (

$${\mathcal{I}}_{2}(\mathit{\text{\beta}},\mathit{\text{\alpha}})=\left(\begin{array}{cc}-\frac{{\partial}^{2}{\mathrm{\Xi}}_{2}}{\partial {\mathit{\text{\beta}}}^{2}}& -\frac{{\partial}^{2}{\mathrm{\Xi}}_{2}}{\partial \mathit{\text{\beta}}\partial \mathit{\text{\alpha}}}\\ -\frac{{\partial}^{2}{\mathrm{\Xi}}_{2}}{\partial \mathit{\text{\alpha}}\partial \mathit{\text{\beta}}}& -\frac{{\partial}^{2}{\mathrm{\Xi}}_{2}}{\partial {\mathit{\text{\alpha}}}^{2}}\end{array}\right).$$

The standard error of $({\widehat{\mathit{\text{\beta}}}}_{n},{\widehat{\mathit{\text{\alpha}}}}_{n})$ is approximately
${\mathcal{I}}_{2}^{-1/2}({\widehat{\mathit{\text{\beta}}}}_{n},{\widehat{\mathit{\text{\alpha}}}}_{n})/\sqrt{n}$, from which the standard error of ${\widehat{\eta}}_{n}$ can be obtained. We also find in our simulation studies in the next section that the inverse of the observed pseudo-information matrix provides a reasonable approximation to **V**_{2}.
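One practical way to form the observed pseudo-information is a central-difference Hessian of the empirical objective at the estimate. The sketch below checks this recipe on a quadratic objective with a known Hessian **A** (the quadratic and the evaluation point are illustrative assumptions, not the paper's objective):

```python
import numpy as np

def num_hessian(f, x, eps=1e-4):
    """Central-difference Hessian of a scalar function f at the point x."""
    d = len(x)
    H = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d)
            ei[i] = eps
            ej = np.zeros(d)
            ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * eps ** 2)
    return H

# Illustrative quadratic objective with known Hessian A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ A @ x
H = num_hessian(f, np.array([0.3, -0.7]))
# Inverse-Hessian plug-in; in the paper's setting the standard error is
# additionally scaled by 1/sqrt(n).
se = np.sqrt(np.diag(np.linalg.inv(H)))
```

For the quadratic test function the central-difference Hessian matches **A** to roundoff, so the plug-in standard errors agree with those computed from the exact inverse.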

Similarly, the weighted bootstrap method can also be used. For (1.2), the weighted M-estimators $({\widehat{\mathit{\text{\beta}}}}_{n}^{0},{\widehat{\mathit{\text{\alpha}}}}_{n}^{0})$ satisfy

$$({\widehat{\mathit{\text{\beta}}}}_{n}^{0},{\widehat{\mathit{\text{\alpha}}}}_{n}^{0})=\text{arg min}\sum _{i=1}^{n}\sum _{j=1}^{K}{W}_{i}{[{Y}_{j}({t}_{i})-{\stackrel{\sim}{\mathbf{\text{X}}}}_{j}({t}_{i},\mathit{\text{\beta}},\mathit{\text{\pi}}{(t)}^{T}\mathit{\text{\alpha}})]}^{2}.$$

Based on Corollary 2 and Theorem 2 in Ma and Kosorok (2005), given {*t _{i},*

In this section, we consider the HIV dynamic model described in Section 2. Recall that in this system, *T _{U}*(

The following parameter values and initial conditions were used to simulate observation data for (2.1): *T _{U}*(0) = 600,

Let *y*_{1} = *T* = *T _{U}* +

$$\begin{array}{c}{y}_{1i}=T({t}_{i})+{\epsilon}_{1i},\\ {y}_{2i}=V({t}_{i})+{\epsilon}_{2i},\end{array}$$

where *ε*_{1}* _{i}* and

For evaluating the performance of the estimation methods, we define the average relative estimation error (ARE) as

$$\text{ARE}=\frac{1}{M}\sum _{j=1}^{M}\frac{|{\widehat{\theta}}_{j}-\theta |}{|\theta |}\times 100\%,$$

where ${\widehat{\theta}}_{j}$ is the estimate of the parameter vector
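The ARE is straightforward to compute; a minimal sketch for a scalar parameter (the three estimates are made-up numbers, not simulation output):

```python
import numpy as np

def are(estimates, theta):
    """Average relative estimation error over M simulation runs, in percent."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean(np.abs(estimates - theta) / abs(theta)) * 100.0)

# three hypothetical estimates of a parameter whose true value is 1.0
err = are([1.1, 0.9, 1.0], 1.0)   # mean of (0.1, 0.1, 0.0), times 100
```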

In Table 1, the AREs of the constant parameters (*λ, N, c*) are listed. We also report
${\sigma}_{\text{ODE}}^{2}$, the average of the variance estimates from the observed pseudo-information matrix, and
${\sigma}_{\text{emp}}^{2}$, the empirical variance based on the simulation runs. Based on these results, we can see that, when *η*(*t*) changes little as a function of time *t* or *η* is a small constant, fitting a constant-*η* model always yields good parameter estimates, as indicated by the low ARE values. However, when the change of *η*(*t*) is large or *η* is a large constant, misspecification of *η*(*t*) may produce large AREs for all parameter estimates. In particular, when *η*(*t*) is time-varying with a large variation, using a constant-*η* model may result in very poor estimates for all constant parameters. The variance estimates based on the pseudo-information matrix agree well with the empirical estimates from the simulations, which shows that the pseudo-information-based variance estimate is reasonably good. Evaluation of the bootstrap variance estimate was not feasible in our simulation study due to its high computational cost.

In Figure 1, the average trajectories of the estimated *η*(*t*) are compared to the true trajectories of *η*(*t*) for four different scenarios. From this figure, we observed a trend similar to that of the constant parameter estimates. Misspecification of *η*(*t*) produces estimation error, in particular for the cases with a large variation of *η*(*t*) or a large constant *η*. When the model of *η*(*t*) is correctly specified, the estimates based on the proposed methods are reasonably good. To evaluate the robustness of the proposed approach, we also performed further simulation studies for a complex function *η*(*t*) = 9.0 × 10^{−6} + 9.0 × 10^{−7} × *t* × {1 − 0.5 sin(*πt/*5.8)} under the same simulation settings (i.e., 40 time points, 20% error, 500 simulation runs). The results suggest that the sieve estimator can still capture the essential pattern of the complex *η*(*t*) reasonably well (plots not shown).

To illustrate the applicability and feasibility of our proposed methods and theories, we also applied the proposed estimation methods to fit the HIV dynamic model to a clinical data set obtained from an HIV-1 infected patient treated with antiretroviral therapy. Very frequent viral load measurements were collected from this patient after initiation of the antiretroviral regimen: 13 measurements during the first day, 14 measurements from day 2 to week 2, and then one measurement at each of weeks 4, 8, 12, 14, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 64, 74 and 76. In addition, measurements of total CD4+ *T* cell counts were taken at day 1, weeks 2 and 4, and monthly thereafter. Equation (2.1) was used to estimate HIV kinetic parameters from the viral load and total CD4+ *T* cell data.

For simplicity of illustration and computation, we fixed the initial conditions of the state variables in (2.1) as *T _{U}*(0) = 1,

$$\text{AICc}=n\phantom{\rule{0.2em}{0ex}}\text{ln}\left(\frac{\text{RSS}}{n}\right)+\frac{2nk}{n-k-1},$$

where RSS is the residual sum of squares obtained from the NLS model fitting, *n* is the total number of observations and *k* is the number of unknown parameters [including the coefficients in the spline representation of *η*(*t*)]. As a practical guideline, the AICc should be used instead of the AIC whenever the number of unknown parameters exceeds *n*/40 (where *n* is the sample size). For our clinical data, the sample size *n* is 65, and the number of unknown parameters varies between 6 and 13 across scenarios, which is much larger than *n*/40 = 65/40 ≈ 1.6. Thus the AICc is more appropriate for our application. Since the AICc converges to the AIC as the sample size grows, the AICc is often suggested regardless of the sample size [Burnham and Anderson (2004)]. For our application, we used the AICc to compare models with splines of order 3 and 4 and with 3 to 10 knots. In Table 2, the AICc values for these models are reported, from which the best model was selected as the spline of order 3 with 5 knots for the *η*(*t*) approximation.
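The AICc formula above can be applied directly to compare candidate spline configurations; the RSS values and parameter counts in this sketch are hypothetical, not the fits reported in Table 2:

```python
import math

def aicc(rss, n, k):
    """Small-sample corrected AIC for a least-squares fit, as in the text:
    n*ln(RSS/n) + 2nk/(n - k - 1), which equals AIC + 2k(k+1)/(n - k - 1)."""
    return n * math.log(rss / n) + 2.0 * n * k / (n - k - 1)

# hypothetical comparison for n = 65 observations:
a_small = aicc(120.0, 65, 6)    # fewer spline coefficients, larger RSS
a_large = aicc(110.0, 65, 13)   # more coefficients, smaller RSS, bigger penalty
best = "small" if a_small < a_large else "large"
```

In this made-up example the heavier penalty outweighs the modest RSS reduction, so the smaller model is preferred; with real fits the comparison is driven entirely by the observed RSS values.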

We used the weighted bootstrap method to calculate both the confidence intervals for the constant parameters and the confidence bands for the time-varying parameter. The basic idea of the weighted bootstrap method is described in Sections 3.2 and 4.2. For the computational implementation, we first generated a positive random weight for each data point in the raw data set from the exponential distribution with mean one and variance one; repeating this step produces a large number (say, 1000) of sets of weights. Second, for each set of weights, the ODE model is fitted to the data by minimizing the weighted residual sum of squares (see Sections 3 and 4). Recall that the time-varying parameter in the model is approximated by B-splines, so both the constant parameters and the constant B-spline coefficients are actually estimated; once the estimates of the B-spline coefficients are obtained, we construct the B-splines that approximate the time-varying parameter. We thus obtain 1000 estimates for each constant parameter and 1000 B-splines for each time-varying parameter. Third, for each constant parameter, we take the 2.5% and 97.5% quantiles of the 1000 estimates to form its 95% confidence interval. For the time-varying parameter, the 1000 B-splines yield 1000 values at each time point, and the 2.5% and 97.5% quantiles of these values form the 95% pointwise confidence band.
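The three steps above can be sketched on a toy weighted least-squares problem (a hypothetical linear model with a closed-form weighted fit, standing in for the ODE model, so that the resampling logic is isolated from the ODE solver):

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * t + 0.1 * rng.standard_normal(50)    # toy "data"; true slope 2.0

def wls_slope(w):
    """Slope minimizing the weighted residual sum of squares sum_i w_i (y_i - b*t_i)^2."""
    return np.sum(w * t * y) / np.sum(w * t * t)

beta_hat = wls_slope(np.ones(50))              # unweighted point estimate

# Weighted bootstrap: Exp(1) weights (mean one, variance one), refit 1000 times,
# then take the 2.5% and 97.5% quantiles of the refitted estimates.
boot = np.array([wls_slope(rng.exponential(1.0, size=50)) for _ in range(1000)])
lo, hi = np.quantile(boot, [0.025, 0.975])     # 95% confidence interval
```

For a time-varying parameter, each bootstrap refit would instead return a vector of B-spline coefficients, and the quantiles would be taken pointwise over the resulting 1000 fitted curves.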

Model fitting results are given in Figure 2 and Table 3. From Figures 2(a) and (b), we can see that the fit is reasonably good for both the CD4+ *T* cell counts and the viral load data. The estimates of the constant parameters (*λ, N, c*) are listed in Table 3, along with their 95% bootstrap confidence intervals. The uninfected cell proliferation rate (*λ*) was estimated as 46.52 cells per day, the average number of virions produced by one infected cell (*N*) was estimated as 1300 per day, and the clearance rate of free virions (*c*) was estimated as 4.35 per day, which corresponds to a half-life of 3.8 hours. All these estimates are in the ballpark of similar estimates from other methods [Perelson et al. (1996, 1997)]. In Figure 2(c), the estimated trajectory of the time-varying parameter *η*(*t*) (the viral infection rate) is plotted with 95% bootstrap quantile confidence intervals; it fluctuates initially but converges to a constant after 2 to 3 months.

In this paper, we have systematically studied numerical solution-based NLS estimators for general nonlinear ODE models for which closed-form solutions are not available. Both constant and time-varying parameters are considered. For models involving time-varying parameters, we formulated the estimator under the framework of the sieve approach. Our main contribution is the establishment of the asymptotic properties of the proposed numerical solution-based NLS estimators (including the sieve NLS estimator for the time-varying parameter) with consideration of both numerical error and measurement error. Our results show that if the maximum step size of the *p*-order numerical algorithm goes to zero at a rate faster than *n*^{−1/(p∧4)}, the numerical error is negligible compared with the measurement error. This provides guidance in selecting the step size for numerical evaluations of ODEs. Moreover, we have shown that the numerical solution-based NLS estimator and the sieve NLS estimator for the model with a time-varying parameter are strongly consistent. The sieve estimator of the constant parameters is asymptotically normal with the same asymptotic covariance as in the case where the true solution is exactly known, while the estimator of the time-varying parameter attains the optimal convergence rate under some regularity conditions. We also obtained theoretical results for the case when the step size of the ODE numerical solver does not go to zero fast enough, so that the numerical error is comparable to the measurement error [see case (ii) of Theorem 3.2 and Remark 3]. To the best of our knowledge, this is the first time that the sieve method has been extended to ODE models without closed-form solutions, and that sieve-based theories have been used to establish asymptotic results and construct confidence intervals (bands) for both constant and time-varying parameters.
Note that we only considered a single time-varying parameter in the model, but the methodologies can be extended to multiple time-varying parameters, although the implementation is more tedious.

Note that the NLS estimators have good properties under certain assumptions and are more accurate than other estimators such as those proposed in Ramsay et al. (2007), Chen and Wu (2008) and Liang and Wu (2008). The price we pay, however, is the high computational cost of obtaining the NLS estimates. To reduce the computational burden, we may use rough estimates from other methods [Ramsay et al. (2007), Chen and Wu (2008), Liang and Wu (2008)] to narrow the search range for the NLS optimization algorithm. More efficient optimization algorithms may also be employed to speed up the computation. We are also considering parallelizing our global optimization algorithms on high-performance computers. Hopefully these efforts will allow us to handle ODE models of reasonable size.

This article only considered the initial value problem (IVP); that is, the initial conditions are assumed to be given. In practice, the initial conditions can be estimated from the data. However, generalizing the theoretical results to the cases of estimated initial conditions, other boundary value problems and constraints on the parameters is not trivial. Also note that, if there is more than one time-varying parameter in the model, identifiability techniques similar to those in Section 2 may be applied to these parameters sequentially, one at a time, and spline approximations to these multiple time-varying parameters can be used for estimation. The computation and theoretical results are more complicated in this case, but these generalizations are worth further investigation in the future.

The authors thank Drs. Hua Liang, Xing Qiu and Jianhua Huang for helpful discussions, and Ms. Jeanne Holden-Wiltse for assistance in editing the manuscript. We also highly appreciate the two referees and the Associate Editors for their insightful comments and useful suggestions that have helped us to greatly improve this manuscript.

Lemma 1. Under conditions A1–A5, ${\text{sup}}_{t\in I}{\Vert \stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}})-\mathbf{\text{X}}(t,\mathit{\text{\beta}})\Vert}_{\infty}=O({h}^{p\wedge 4})$ for any given **β** in (1.1).

Proof. By Theorem 3.4 in Hairer, Nørsett and Wanner (1993, page 160), under conditions A1–A5, for the *p*th order numerical algorithm (3.2) for (1.1), its global discretization error satisfies

$$\underset{0\le i\le m-1}{\text{max}}{\Vert \stackrel{\sim}{\mathbf{\text{X}}}({s}_{i},\mathit{\text{\beta}})-\mathbf{\text{X}}({s}_{i},\mathit{\text{\beta}})\Vert}_{\infty}=O({h}^{p})\phantom{\rule{2em}{0ex}}\text{for given}\phantom{\rule{0.2em}{0ex}}\mathit{\text{\beta}}\in \mathcal{B}.$$

When *t* is not coincident with the grid points of the numerical algorithm, the cubic Hermite interpolation [de Boor (1978), page 51] will be used to obtain the solution at time *t*. In this case,

$$\underset{t\in I\backslash \{{s}_{i}:0\le i\le m-1\}}{\text{sup}}{\Vert \stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}})-\mathbf{\text{X}}(t,\mathit{\text{\beta}})\Vert}_{\infty}=O({h}^{4}).$$

Then it follows that

$$\underset{t\in I}{\text{sup}}{\Vert \stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}})-\mathbf{\text{X}}(t,\mathit{\text{\beta}})\Vert}_{\infty}\le \underset{t\in I\backslash \{{s}_{i}:0\le i\le m-1\}}{\text{sup}}{\Vert \stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}})-\mathbf{\text{X}}(t,\mathit{\text{\beta}})\Vert}_{\infty}+\underset{t\in \{{s}_{i}:0\le i\le m-1\}}{\text{max}}{\Vert \stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}})-\mathbf{\text{X}}(t,\mathit{\text{\beta}})\Vert}_{\infty}=O({h}^{4})+O({h}^{p}).$$

Since, in general, *h* is less than 1, *O*(*h*^{4}) + *O*(*h*^{p}) = *O*(*h*^{p∧4}).
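The rates in Lemma 1 can be checked empirically at the grid points: for the classical 4th-order Runge–Kutta scheme applied to a test equation with a known solution (the test equation dX/dt = −X below is illustrative only), halving the step size should shrink the maximum grid error by a factor of about 2⁴ = 16.

```python
import numpy as np

def rk4(f, x0, tgrid):
    """Classical 4th-order Runge-Kutta (order p = 4) on a fixed grid."""
    x = np.empty_like(tgrid)
    x[0] = x0
    for k in range(len(tgrid) - 1):
        t, h = tgrid[k], tgrid[k + 1] - tgrid[k]
        k1 = f(t, x[k])
        k2 = f(t + h / 2, x[k] + h / 2 * k1)
        k3 = f(t + h / 2, x[k] + h / 2 * k2)
        k4 = f(t + h, x[k] + h * k3)
        x[k + 1] = x[k] + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

def max_grid_error(m):
    """Max error against the exact solution exp(-t) with step size h = 2/m."""
    tg = np.linspace(0.0, 2.0, m + 1)
    return float(np.max(np.abs(rk4(lambda t, x: -x, 1.0, tg) - np.exp(-tg))))

ratio = max_grid_error(40) / max_grid_error(80)   # close to 2**4 = 16
```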

Moreover, Lemma 1 can be extended to the ODE model (1.2) with both constant and time-varying parameters, since for this model it can be verified that the result of Theorem 3.1 in Hairer, Nørsett and Wanner (1993, page 157) is still valid for any given **β** and *η*.

Proof of Theorem 3.1. Denote
${\stackrel{\sim}{M}}_{n}(\mathit{\text{\beta}})=\frac{1}{n}{\sum}_{i=1}^{n}{[Y({t}_{i})-\stackrel{\sim}{X}({t}_{i},\mathit{\text{\beta}})]}^{2}$,
${M}_{n}(\mathit{\text{\beta}})=\frac{1}{n}{\sum}_{i=1}^{n}{[Y({t}_{i})-X({t}_{i},\mathit{\text{\beta}})]}^{2}$ and $M(\mathit{\text{\beta}})={[Y(t)-X(t,\mathit{\text{\beta}})]}^{2}$.

First, we claim that *E*_{0}[*M*(** β**)] reaches its unique minimum at **β**_{0}. In fact,

$$\begin{array}{cc}{E}_{0}[M(\mathit{\text{\beta}})]\hfill & ={E}_{0}{[Y(t)-X(t,\mathit{\text{\beta}})]}^{2}\hfill \\ \hfill & ={E}_{0}{[Y(t)-X(t,{\mathit{\text{\beta}}}_{0})+X(t,{\mathit{\text{\beta}}}_{0})-X(t,\mathit{\text{\beta}})]}^{2}\hfill \\ \hfill & ={E}_{0}{[Y(t)-X(t,{\mathit{\text{\beta}}}_{0})]}^{2}+{E}_{t}{[X(t,{\mathit{\text{\beta}}}_{0})-X(t,\mathit{\text{\beta}})]}^{2}\hfill \\ \hfill & ={E}_{0}{[\epsilon (t)]}^{2}+{E}_{t}{[X(t,{\mathit{\text{\beta}}}_{0})-X(t,\mathit{\text{\beta}})]}^{2}\hfill \\ \hfill & \ge {E}_{0}{[\epsilon (t)]}^{2}={E}_{0}[M({\mathit{\text{\beta}}}_{0})],\hfill \end{array}$$

where the third equality holds because the cross term equals zero, according to the following calculation:

$$\begin{array}{l}{E}_{0}[\epsilon (t)][X(t,{\mathit{\text{\beta}}}_{0})-X(t,\mathit{\text{\beta}})]\hfill \\ \phantom{\rule{2em}{0ex}}={E}_{t}{E}_{0}\{[\epsilon (t)][X(t,{\mathit{\text{\beta}}}_{0})-X(t,\mathit{\text{\beta}})]\mid t\}\hfill \\ \phantom{\rule{2em}{0ex}}={E}_{t}\{[X(t,{\mathit{\text{\beta}}}_{0})-X(t,\mathit{\text{\beta}})]{E}_{0}[\epsilon (t)]\}\hfill \\ \phantom{\rule{2em}{0ex}}=0,\hfill \end{array}$$

because of *E*_{0}[*ε*(*t*)] = 0. Moreover, *E _{t}*[

$${E}_{0}[M({\widehat{\mathit{\text{\beta}}}}_{n})-M({\mathit{\text{\beta}}}_{0})]\ge C{\Vert {\widehat{\mathit{\text{\beta}}}}_{n}-{\mathit{\text{\beta}}}_{0}\Vert}^{2}.$$

Thus it is sufficient to prove ${E}_{0}[M({\widehat{\mathit{\text{\beta}}}}_{n})]-{E}_{0}[M({\mathit{\text{\beta}}}_{0})]\to 0$, a.s.

Let ${N}_{1}(\epsilon ,Q,\mathcal{F})$ denote the covering number of a class $\mathcal{F}$ under a probability measure *Q*, as given in Pollard (1984, page 25). From Lemma 4.1 in Pollard (1990), we have that
${N}_{1}(\epsilon ,{L}_{2},\mathcal{B})\le {(\frac{{3R}_{\mathit{\text{\beta}}}}{\epsilon})}^{d}$. Let ${\mathcal{F}}_{n}$ be the set {

$$|{M}_{n}({\mathit{\text{\beta}}}_{1})-{M}_{n}({\mathit{\text{\beta}}}_{2})|\le C\Vert {\mathit{\text{\beta}}}_{1}-{\mathit{\text{\beta}}}_{2}\Vert ,$$

where *C* is some constant. Then for any probability measure *Q,* we have

$$\underset{Q}{\text{sup}}\phantom{\rule{0.2em}{0ex}}{N}_{1}(\epsilon ,Q,{\mathcal{F}}_{n})\le {N}_{1}(\epsilon /C,{L}_{2},\mathcal{B})\le C{\left(\frac{1}{\epsilon}\right)}^{d}\phantom{\rule{2em}{0ex}}\text{for}\phantom{\rule{0.2em}{0ex}}0<\epsilon <1.$$

Then by Theorem II.37 in Pollard (1984), ${\text{sup}}_{\mathit{\text{\beta}}}|{M}_{n}(\mathit{\text{\beta}})-{E}_{0}[M(\mathit{\text{\beta}})]|\to 0$ almost surely.

Next, by Lemma 1,

$$\begin{array}{cc}{\stackrel{\sim}{M}}_{n}(\mathit{\text{\beta}})\hfill & =\frac{1}{n}\sum _{i=1}^{n}{[Y({t}_{i})-\stackrel{\sim}{X}({t}_{i},\mathit{\text{\beta}})]}^{2}\hfill \\ \hfill & =\frac{1}{n}\sum _{i=1}^{n}{[Y({t}_{i})-X({t}_{i},\mathit{\text{\beta}})+O({n}^{-\lambda (p\wedge 4)})]}^{2}\hfill \\ \hfill & =\frac{1}{n}\sum _{i=1}^{n}{[Y({t}_{i})-X({t}_{i},\mathit{\text{\beta}})]}^{2}+O({n}^{-\lambda (p\wedge 4)})\hfill \\ \hfill & ={M}_{n}(\mathit{\text{\beta}})+O({n}^{-\lambda (p\wedge 4)}).\hfill \end{array}$$

(A.1)

Then

$${\stackrel{\sim}{M}}_{n}({\widehat{\mathit{\text{\beta}}}}_{n})-{E}_{0}[M({\mathit{\text{\beta}}}_{0})]\ge {\stackrel{\sim}{M}}_{n}({\widehat{\mathit{\text{\beta}}}}_{n})-{E}_{0}[M({\widehat{\mathit{\text{\beta}}}}_{n})]={M}_{n}({\widehat{\mathit{\text{\beta}}}}_{n})+O({n}^{-\lambda (p\wedge 4)})-{E}_{0}[M({\widehat{\mathit{\text{\beta}}}}_{n})]$$

and

$${\stackrel{\sim}{M}}_{n}({\widehat{\mathit{\text{\beta}}}}_{n})-{E}_{0}[M({\mathit{\text{\beta}}}_{0})]\le {\stackrel{\sim}{M}}_{n}({\mathit{\text{\beta}}}_{0})-{E}_{0}[M({\mathit{\text{\beta}}}_{0})]={M}_{n}({\mathit{\text{\beta}}}_{0})+O({n}^{-\lambda (p\wedge 4)})-{E}_{0}M({\mathit{\text{\beta}}}_{0}).$$

Hence * _{n}*(

$$|{E}_{0}[M({\widehat{\mathit{\text{\beta}}}}_{n})]-{E}_{0}[M({\mathit{\text{\beta}}}_{0})]|\le |{\stackrel{\sim}{M}}_{n}({\widehat{\mathit{\text{\beta}}}}_{n})-{E}_{0}[M({\widehat{\mathit{\text{\beta}}}}_{n})]|+|{\stackrel{\sim}{M}}_{n}({\widehat{\mathit{\text{\beta}}}}_{n})-{E}_{0}[M({\mathit{\text{\beta}}}_{0})]|\to 0\phantom{\rule{2em}{0ex}}\text{a}.\text{s}.$$

Since *β*_{0} is the unique minimum point of *E*_{0}[*M*(** β**)], it follows that ${\widehat{\mathit{\text{\beta}}}}_{n}\to {\mathit{\text{\beta}}}_{0}$ almost surely.

Proof of Theorem 3.2. For the proof of part (i), it suffices to verify conditions of Theorem 2 in Pollard (1985). Denote
${\stackrel{\sim}{G}}_{n}(\mathit{\text{\beta}})=\frac{1}{n}{\sum}_{i=1}^{n}[Y({t}_{i})-\stackrel{\sim}{X}({t}_{i},\mathit{\text{\beta}})]\frac{\partial \stackrel{\sim}{X}({t}_{i},\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}}$,
${G}_{n}(\mathit{\text{\beta}})=\frac{1}{n}{\sum}_{i=1}^{n}[Y({t}_{i})-X({t}_{i},\mathit{\text{\beta}})]\frac{\partial X({t}_{i},\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}}$ and
$G(\mathit{\text{\beta}})={E}_{0}[Y(t)-X(t,\mathit{\text{\beta}})]\frac{\partial X(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}}$. Obviously, *Ĝ _{n}*(

First, we verify the following result:
$\sqrt{n}[{\stackrel{\sim}{G}}_{n}({\mathit{\text{\beta}}}_{0})-G({\mathit{\text{\beta}}}_{0})]\stackrel{d}{\to}N(0,{H}_{1})$. For fixed *t,* according to the multivariate inequality of Kolmogorov type for *L*_{2}-norms of derivatives [Babenko, Kofanov and Pichugov (1996), page 9], we have
$\Vert \frac{\partial \stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}}-\frac{\partial \mathbf{\text{X}}(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}}\Vert \phantom{\rule{0.2em}{0ex}}\le \phantom{\rule{0.2em}{0ex}}C{\Vert \frac{{\partial}^{2}\stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}\partial {\mathit{\text{\beta}}}^{T}}-\frac{{\partial}^{2}\mathbf{\text{X}}(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}\partial {\mathit{\text{\beta}}}^{T}}\Vert}_{\infty}^{1/2}{\Vert \stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}})-\mathbf{\text{X}}(t,\mathit{\text{\beta}})\Vert}_{\infty}^{1/2}\le C\prime {\Vert \stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}})-\mathbf{\text{X}}(t,\mathit{\text{\beta}})\Vert}_{\infty}^{1/2}$ for two constants *C* and *C′,* where the second inequality holds because of the uniform boundedness of both
$\frac{{\partial}^{2}\mathbf{\text{X}}(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}\partial {\mathit{\text{\beta}}}^{T}}$ and
$\frac{{\partial}^{2}\stackrel{\sim}{\mathbf{\text{X}}}(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}\partial {\mathit{\text{\beta}}}^{T}}$ under conditions A7 and A8. Based on sup_{tI} ‖(*t,* ** β**) −

$$\begin{array}{l}\sqrt{n}[{\stackrel{\sim}{G}}_{n}({\mathit{\text{\beta}}}_{0})-G({\mathit{\text{\beta}}}_{0})]\hfill \\ \phantom{\rule{2em}{0ex}}=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}[Y({t}_{i})-\stackrel{\sim}{X}({t}_{i},{\mathit{\text{\beta}}}_{0})]\frac{\partial \stackrel{\sim}{X}({t}_{i},{\mathit{\text{\beta}}}_{0})}{\partial {\mathit{\text{\beta}}}_{0}}\hfill \\ \phantom{\rule{2em}{0ex}}=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}[Y({t}_{i})-X({t}_{i},{\mathit{\text{\beta}}}_{0})+O({n}^{-\lambda (p\wedge 4)})]\left[\frac{\partial X({t}_{i},{\mathit{\text{\beta}}}_{0})}{\partial {\mathit{\text{\beta}}}_{0}}+O({n}^{-\lambda (p\wedge 4)/2})\right]\hfill \\ \phantom{\rule{2em}{0ex}}=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}[Y({t}_{i})-X({t}_{i},{\mathit{\text{\beta}}}_{0})]\frac{\partial X({t}_{i},{\mathit{\text{\beta}}}_{0})}{\partial {\mathit{\text{\beta}}}_{0}}+O({n}^{-\lambda (p\wedge 4)/2+1/2}).\hfill \end{array}$$

When *λ* > 1/(*p* ∧ 4), *O*(*n*^{−λ(p∧4)/2+1/2}) = *o*(1). So for the above expression, we have

$$\begin{array}{l}\sqrt{n}[{\stackrel{\sim}{G}}_{n}({\mathit{\text{\beta}}}_{0})-G({\mathit{\text{\beta}}}_{0})]\hfill \\ \phantom{\rule{2em}{0ex}}=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}[Y({t}_{i})-X({t}_{i},{\mathit{\text{\beta}}}_{0})]\frac{\partial X({t}_{i},{\mathit{\text{\beta}}}_{0})}{\partial {\mathit{\text{\beta}}}_{0}}+o(1)\hfill \\ \phantom{\rule{2em}{0ex}}=\sqrt{n}[{G}_{n}({\mathit{\text{\beta}}}_{0})-G({\mathit{\text{\beta}}}_{0})]+o(1).\hfill \end{array}$$

Based on the general central limit theorem, $\sqrt{n}[{G}_{n}({\mathit{\text{\beta}}}_{0})-G({\mathit{\text{\beta}}}_{0})]\to N(0,{\mathbf{\text{H}}}_{1})$ with

$${\mathbf{\text{H}}}_{1}={E}_{0}{[Y(t)-X(t,{\mathit{\text{\beta}}}_{0})]}^{2}{\left[\frac{\partial X(t,{\mathit{\text{\beta}}}_{0})}{\partial {\mathit{\text{\beta}}}_{0}}\right]}^{\otimes 2}={\sigma}_{0}^{2}{E}_{t}{\left[\frac{\partial X(t,{\mathit{\text{\beta}}}_{0})}{\partial {\mathit{\text{\beta}}}_{0}}\right]}^{\otimes 2}.$$

Second, let *δ _{n} ↓* 0. For ‖

$$\sqrt{n}[{\stackrel{\sim}{G}}_{n}(\mathit{\text{\beta}})-G(\mathit{\text{\beta}})]-\sqrt{n}[{\stackrel{\sim}{G}}_{n}({\mathit{\text{\beta}}}_{0})-G({\mathit{\text{\beta}}}_{0})]={o}_{p}(1).$$

In fact, from the first step above, for any $\mathit{\text{\beta}}\in \mathcal{B}$, we have that
$\sqrt{n}[{\stackrel{\sim}{G}}_{n}(\mathit{\text{\beta}})-{G}_{n}(\mathit{\text{\beta}})]={o}_{p}(1)$. Then

$$\sqrt{n}[{\stackrel{\sim}{G}}_{n}(\mathit{\text{\beta}})-G(\mathit{\text{\beta}})]-\sqrt{n}[{\stackrel{\sim}{G}}_{n}({\mathit{\text{\beta}}}_{0})-G({\mathit{\text{\beta}}}_{0})]=\sqrt{n}[{G}_{n}(\mathit{\text{\beta}})-G(\mathit{\text{\beta}})]-\sqrt{n}[{G}_{n}({\mathit{\text{\beta}}}_{0})-G({\mathit{\text{\beta}}}_{0})]+{o}_{p}(1).$$

From Lemma 4.1 in Pollard (1990), we have that
${N}_{1}(\epsilon ,{L}_{2},\mathcal{B})\le {(\frac{3R}{\epsilon})}^{d}$. Let Λ* _{n}* be the set {

$$|{G}_{n}({\mathit{\text{\beta}}}_{1})-{G}_{n}({\mathit{\text{\beta}}}_{2})|\le C\Vert {\mathit{\text{\beta}}}_{1}-{\mathit{\text{\beta}}}_{2}\Vert ,$$

where *C* is some constant. Then for any probability measure *Q,* we have

$${N}_{1}(\epsilon ,{L}_{2}(Q),{\mathrm{\Lambda}}_{n})\le {N}_{1}(\epsilon /C,{L}_{2},\mathcal{B})\le C{\left(\frac{1}{\epsilon}\right)}^{d},$$

and thus

$$log\phantom{\rule{0.2em}{0ex}}{N}_{1}(\epsilon ,{L}_{2}(Q),{\mathrm{\Lambda}}_{n})\le d\phantom{\rule{0.2em}{0ex}}log\frac{1}{\epsilon}.$$

Since
${\int}_{0}^{1}log(1/\epsilon )\phantom{\rule{0.2em}{0ex}}d\epsilon <\infty $, Λ* _{n}* is a P-Donsker class by Theorem 2.5.2 in van der Vaart and Wellner (1996). Hence
$\sqrt{n}[{G}_{n}(\mathit{\text{\beta}})-G(\mathit{\text{\beta}})]-\sqrt{n}[{G}_{n}({\mathit{\text{\beta}}}_{0})-G({\mathit{\text{\beta}}}_{0})]={o}_{p}(1)$.

Third, with some simple calculations, we have
$G(\mathit{\text{\beta}})={E}_{t}[X(t,{\mathit{\text{\beta}}}_{0})-X(t,\mathit{\text{\beta}})]\frac{\partial X(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}}$, then
$\frac{\partial G(\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}}=-\int {\left\{\frac{\partial X(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}}\right\}}^{\otimes 2}d\mathrm{\Phi}(t)+\int [X(t,{\mathit{\text{\beta}}}_{0})-X(t,\mathit{\text{\beta}})]\frac{{\partial}^{2}X(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}\partial {\mathit{\text{\beta}}}^{T}}d\mathrm{\Phi}(t)$ and
$\frac{\partial G(\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}}{\mid}_{\mathit{\text{\beta}}={\mathit{\text{\beta}}}_{0}}=-{E}_{t}{\{\frac{\partial X(t,{\mathit{\text{\beta}}}_{0})}{\partial {\mathit{\text{\beta}}}_{0}}\}}^{\otimes 2}$. Denote
${\mathbf{\text{H}}}_{2}={E}_{t}{\{\frac{\partial X(t,{\mathit{\text{\beta}}}_{0})}{\partial {\mathit{\text{\beta}}}_{0}}\}}^{\otimes 2}$. Then by using the Taylor series expansion again, the function *G*(** β**) is Fréchet-differentiable at

Thus all conditions of Theorem 2 in Pollard (1985) are satisfied, then Theorem 3.2(i) holds with ${\mathbf{\text{V}}}_{1}={\mathbf{\text{H}}}_{2}^{-1}{\mathbf{\text{H}}}_{1}{({\mathbf{\text{H}}}_{2}^{-1})}^{T}={\sigma}_{0}^{2}{\{{E}_{t}{[\frac{\partial X(t,{\mathit{\text{\beta}}}_{0})}{\partial {\mathit{\text{\beta}}}_{0}}]}^{\otimes 2}\}}^{-1}$.

For the proof of case (ii) of Theorem 3.2, it is easy to verify the conditions of Theorem 2 in Pollard (1985) for the asymptotic normality. Now we just need to show $\stackrel{\sim}{\mathit{\text{\beta}}}={\mathit{\text{\beta}}}_{0}+O({h}^{(p\wedge 4)/2})$ and ${\stackrel{\sim}{\mathbf{\text{V}}}}_{1}={\mathbf{\text{V}}}_{1}+O({h}^{(p\wedge 4)/2})$. Denote

$$\begin{array}{cc}\stackrel{\sim}{G}(\mathit{\text{\beta}})\hfill & ={E}_{0}[Y(t)-\stackrel{\sim}{X}(t,\mathit{\text{\beta}})]\frac{\partial \stackrel{\sim}{X}(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}}\hfill \\ \hfill & ={E}_{0}[Y(t)-X(t,\mathit{\text{\beta}})+O({h}^{p\wedge 4})]\left[\frac{\partial X(t,\mathit{\text{\beta}})}{\partial \mathit{\text{\beta}}}+O({h}^{(p\wedge 4)/2})\right]\hfill \\ \hfill & =G(\mathit{\text{\beta}})+O({h}^{(p\wedge 4)/2}).\hfill \end{array}$$

It follows that $\stackrel{\sim}{G}(\stackrel{\sim}{\mathit{\text{\beta}}})=G(\stackrel{\sim}{\mathit{\text{\beta}}})+O({h}^{(p\wedge 4)/2})$, and then $G(\stackrel{\sim}{\mathit{\text{\beta}}})=O({h}^{(p\wedge 4)/2})$ from $\stackrel{\sim}{G}(\stackrel{\sim}{\mathit{\text{\beta}}})=0$. The Taylor series expansion yields that there exist constants 0 < *c*_{1}, *c*_{2} < ∞ such that

$${c}_{1}\Vert \stackrel{\sim}{\mathit{\text{\beta}}}-{\mathit{\text{\beta}}}_{0}\Vert \le |G(\stackrel{\sim}{\mathit{\text{\beta}}})-G({\mathit{\text{\beta}}}_{0})|\le {c}_{2}\Vert \stackrel{\sim}{\mathit{\text{\beta}}}-{\mathit{\text{\beta}}}_{0}\Vert .$$

Thus $\Vert \stackrel{\sim}{\mathit{\text{\beta}}}-{\mathit{\text{\beta}}}_{0}\Vert =O({h}^{(p\wedge 4)/2})$ from *G*(*β*_{0}) = 0. Similarly we can show that $\Vert {\stackrel{\sim}{\mathbf{\text{V}}}}_{1}-{\mathbf{\text{V}}}_{1}\Vert =O({h}^{(p\wedge 4)/2})$.
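To illustrate where the $O({h}^{p\wedge 4})$ numerical-error term comes from (a standalone sketch of ours, not part of the proof), the global error of the classical fourth-order Runge–Kutta method on a model with a known solution decays like *h*^{4}, so halving *h* should shrink the error by roughly 2^{4} = 16. The test problem *x*′ = −*x* below is an arbitrary illustrative choice.

```python
import math

def rk4(f, x0, t0, t1, h):
    # classical 4th-order Runge-Kutta from t0 to t1 with fixed step h
    x, t = x0, t0
    for _ in range(round((t1 - t0) / h)):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

# test problem x' = -x, x(0) = 1, with exact solution exp(-t)
f = lambda t, x: -x
err = lambda h: abs(rk4(f, 1.0, 0.0, 1.0, h) - math.exp(-1.0))
print(err(0.1), err(0.05), err(0.1) / err(0.05))   # ratio should be near 16
```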

Some definitions and notation are necessary in order to prove Theorems 4.1–4.3. Denote
${\stackrel{\sim}{M}}_{n}(\mathit{\text{\theta}})=\frac{1}{n}{\sum}_{i=1}^{n}{\sum}_{j=1}^{K}{[{Y}_{j}({t}_{i})-{\stackrel{\sim}{\mathbf{\text{X}}}}_{j}({t}_{i},\mathit{\text{\beta}},\eta ({t}_{i}))]}^{2}$,
${M}_{n}(\mathit{\text{\theta}})=\frac{1}{n}{\sum}_{i=1}^{n}{\sum}_{j=1}^{K}{[{Y}_{j}({t}_{i})-{\mathbf{\text{X}}}_{j}({t}_{i},\mathit{\text{\beta}},\eta ({t}_{i}))]}^{2}$ and
$M(\mathit{\text{\theta}})={\sum}_{j=1}^{K}{[{Y}_{j}-{\mathbf{\text{X}}}_{j}(t,\mathit{\text{\beta}},\eta (t))]}^{2}$. We define a semidistance *ρ* on Θ as

$${\rho}^{2}(\mathit{\text{\theta}},{\mathit{\text{\theta}}}_{0})={E}_{0}{\{{(\mathit{\text{\beta}}-{\mathit{\text{\beta}}}_{0})}^{T}{\dot{M}}_{1}(\mathit{\text{\theta}})+{\dot{M}}_{2}(\mathit{\text{\theta}})[\eta -{\eta}_{0}]\}}^{2},$$

where ${\dot{M}}_{1}$ is the score function of *M* for *β*, and ${\dot{M}}_{2}[\eta -{\eta}_{0}]$ is the score operator of *M* for *η* applied to the direction *η* − *η*_{0}.

Proof of Theorem 4.1. Similarly to the proof of Theorem 3.1, we have that *E*_{0}[*M*(*θ*)] reaches its unique minimum at *θ* = *θ*_{0}, and

$${E}_{0}[M({\mathit{\text{\theta}}}_{0})-M({\widehat{\mathit{\text{\theta}}}}_{n})]\ge C{\rho}^{2}({\widehat{\mathit{\text{\theta}}}}_{n},{\mathit{\text{\theta}}}_{0}),$$

where *C* is some constant. Thus if ${E}_{0}[M({\mathit{\text{\theta}}}_{0})]-{E}_{0}[M({\widehat{\mathit{\text{\theta}}}}_{n})]\to 0$ almost surely, then $\rho ({\widehat{\mathit{\text{\theta}}}}_{n},{\mathit{\text{\theta}}}_{0})\to 0$ almost surely.

Let
${\mathcal{A}}_{n}^{\delta}$ be the set $\{\eta \in {\mathcal{A}}_{n}:{\Vert \eta -{\eta}_{0}\Vert}_{\infty}\le \delta \}$. Then

$${N}_{2}(\epsilon ,{L}_{\infty},{\mathcal{A}}_{n}^{\delta})\le C{(\delta /\epsilon )}^{N},$$

where *N* = *q* + *l* is the number of B-spline basis functions. Let ${\mathcal{F}}_{n}$ be the set $\{{M}_{n}(\mathit{\text{\theta}}):\mathit{\text{\beta}}\in \mathcal{B},\eta \in {\mathcal{A}}_{n}^{\delta}\}$. For any ${\mathit{\text{\theta}}}_{1}$ and ${\mathit{\text{\theta}}}_{2}$, we have

$$|{M}_{n}({\mathit{\text{\theta}}}_{1})-{M}_{n}({\mathit{\text{\theta}}}_{2})|\le C(\Vert {\mathit{\text{\beta}}}_{1}-{\mathit{\text{\beta}}}_{2}\Vert +{\Vert {\eta}_{1}-{\eta}_{2}\Vert}_{\infty})$$

using Taylor's expansion. Hence

$$\begin{array}{cc}{N}_{2}(\epsilon ,{L}_{\infty},{\mathcal{F}}_{n})\hfill & \le {N}_{1}(\epsilon /2,{L}_{2},\mathcal{B})\times {N}_{2}(\epsilon /2,{L}_{\infty},{\mathcal{A}}_{n}^{\delta})\hfill \\ \hfill & \le C{(3{R}_{d}/\epsilon )}^{d}{(\delta /\epsilon )}^{N}\hfill \\ \hfill & \le C\prime {(1/\epsilon )}^{N+d}.\hfill \end{array}$$

Note that, since ${N}_{2}(\epsilon ,{L}_{\infty},{\mathcal{F}}_{n})<\infty $ for every *ε* > 0, ${\mathcal{F}}_{n}$ is a Glivenko–Cantelli class, and hence ${sup}_{\mathit{\text{\theta}}}|{M}_{n}(\mathit{\text{\theta}})-M(\mathit{\text{\theta}})|\to 0$ almost surely.

From the extension of Lemma 1, for any given *β* and *η*, we have

$${\stackrel{\sim}{M}}_{n}(\mathit{\text{\theta}})={M}_{n}(\mathit{\text{\theta}})+O({n}^{-\lambda (p\wedge 4)}).$$

(A.2)

Then the remaining steps are similar to those in the proof of Theorem 3.1.

Proof of Theorem 4.2. We apply Theorem 3.4.1 in van der Vaart and Wellner (1996) to obtain the rate of convergence.

For ${\mathit{\text{\theta}}}_{n0}$ in the proof of Theorem 4.1, define the semidistance ${\rho}_{1}(\mathit{\text{\theta}},{\mathit{\text{\theta}}}_{n0})$ by replacing ${\mathit{\text{\theta}}}_{0}$ with ${\mathit{\text{\theta}}}_{n0}$ in the definition of *ρ*.

Let ${\mathrm{\Xi}}_{n}$ be the set $\{{M}_{n}(\mathit{\text{\theta}})-{M}_{n}({\mathit{\text{\theta}}}_{n0}):{\rho}_{1}(\mathit{\text{\theta}},{\mathit{\text{\theta}}}_{n0})\le \delta ,\mathit{\text{\theta}}\in {\mathrm{\Theta}}_{n}\}$. Then

$$\begin{array}{cc}\stackrel{\sim}{J}(\delta ,{L}_{2}(P),{\mathrm{\Xi}}_{n})\hfill & ={\int}_{0}^{\delta}\sqrt{1+log{N}_{2}(\epsilon ,{L}_{2}(P),{\mathrm{\Xi}}_{n})}d\epsilon \hfill \\ \hfill & \le {\int}_{0}^{\delta}\sqrt{1+log{N}_{2}(\epsilon ,{L}_{\infty},{\mathrm{\Xi}}_{n})}d\epsilon \hfill \\ \hfill & \le C{N}^{1/2}\delta .\hfill \end{array}$$

Let

$${\phi}_{n}(\delta )=\stackrel{\sim}{J}(\delta ,{L}_{2}(P),{\mathrm{\Xi}}_{n})\left(1+\frac{\stackrel{\sim}{J}(\delta ,{L}_{2}(P),{\mathrm{\Xi}}_{n})}{{\delta}^{2}\sqrt{n}}\right)={N}^{1/2}\delta +\frac{N}{\sqrt{n}}.$$

Obviously, ${\phi}_{n}(\delta )/\delta $ is decreasing in *δ*, and

$${E}_{0}\left[\underset{\mathrm{\Omega}}{\text{sup}}\phantom{\rule{0.2em}{0ex}}\sqrt{n}({M}_{n}-M)(\mathit{\text{\theta}}-{\mathit{\text{\theta}}}_{n0})\right]\preccurlyeq {\phi}_{n}(\delta ).$$

For λ > 1/[2(*p* ∧ 4)], from (A.2), it follows that

$$\sqrt{n}[{\stackrel{\sim}{M}}_{n}(\mathit{\text{\theta}})-{M}_{n}(\mathit{\text{\theta}})]=O({n}^{1/2-\lambda (p\wedge 4)})=o(1).$$

Then we have that

$${E}_{0}\left[\underset{\mathrm{\Omega}}{\text{sup}}\phantom{\rule{0.2em}{0ex}}\sqrt{n}({\stackrel{\sim}{M}}_{n}-M)(\mathit{\text{\theta}}-{\mathit{\text{\theta}}}_{n0})\right]\preccurlyeq {\phi}_{n}(\delta ).$$

Then the conditions of Theorem 3.4.1 in van der Vaart and Wellner (1996) are satisfied, which yields the rate of convergence of ${\widehat{\mathit{\text{\theta}}}}_{n}$ to ${\mathit{\text{\theta}}}_{n0}$ with respect to ${\rho}_{1}$.
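As an annotation (ours, not text from the paper), the order of ${\phi}_{n}$ indicates where this rate comes from: Theorem 3.4.1 of van der Vaart and Wellner (1996) yields any rate ${\delta}_{n}$ with ${\phi}_{n}({\delta}_{n})\le \sqrt{n}{\delta}_{n}^{2}$, and if the number of B-spline coefficients grows like $N\asymp {n}^{v}$ (an assumption consistent with the ${n}^{-(1-v)}$ term appearing later in the bound for *d*^{2}), then, up to constants,

$${N}^{1/2}{\delta}_{n}+\frac{N}{\sqrt{n}}\le \sqrt{n}{\delta}_{n}^{2}\phantom{\rule{1em}{0ex}}\text{holds for}\phantom{\rule{1em}{0ex}}{\delta}_{n}\asymp {(N/n)}^{1/2}\asymp {n}^{-(1-v)/2}.$$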

Now, we define a distance *ρ*_{2} as

$${\rho}_{2}({\mathit{\text{\theta}}}_{1},{\mathit{\text{\theta}}}_{2})=\Vert {\mathit{\text{\beta}}}_{1}-{\mathit{\text{\beta}}}_{2}\Vert +{\Vert {\eta}_{1}-{\eta}_{2}\Vert}_{\infty}.$$

Let *ς* be a positive constant. Similarly to the proof of Theorem 3.2 in Huang (1999), it is easy to show that for any *θ* with ${\rho}_{2}(\mathit{\text{\theta}},{\mathit{\text{\theta}}}_{n0})\le \varsigma $,

$$-{c}_{1}{d}^{2}(\mathit{\text{\theta}},{\mathit{\text{\theta}}}_{n0})+{O}_{p}({n}^{-2v\varrho})\le -{\rho}_{1}^{2}(\mathit{\text{\theta}},{\mathit{\text{\theta}}}_{n0})\le -{c}_{2}{d}^{2}(\mathit{\text{\theta}},{\mathit{\text{\theta}}}_{n0})+{O}_{p}({n}^{-2v\varrho}).$$

Therefore, for a constant *c*_{2} > 0,

$${c}_{2}{d}^{2}({\widehat{\mathit{\text{\theta}}}}_{n},{\mathit{\text{\theta}}}_{n0})\le {O}_{p}({n}^{-2v\varrho}+{n}^{-(1-v)}).$$

Because $d({\mathit{\text{\theta}}}_{n0},{\mathit{\text{\theta}}}_{0})=O({n}^{-v\varrho})$, we obtain $d({\widehat{\mathit{\text{\theta}}}}_{n},{\mathit{\text{\theta}}}_{0})={O}_{p}({n}^{-v\varrho}+{n}^{-(1-v)/2})$, which completes the proof.

Proof of Theorem 4.3. We prove this theorem by verifying conditions A1–A6 of Theorem 6.1 in Wellner and Zhang (2007). From the proofs of Theorems 4.1 and 4.2 above, it is easy to see that conditions A1 (consistency and rate of convergence) and A2 hold.

For condition A3, we need to calculate the pseudo-information matrix. For any fixed *η*, consider a one-dimensional path ${\eta}_{\omega}(\cdot )$ through ${\eta}_{0}$ in the direction *a*(·). Direct differentiation gives

$$\begin{array}{c}{\dot{M}}_{1}=\frac{\partial M}{\partial \mathit{\text{\beta}}}=-2\sum _{j=1}^{K}({Y}_{j}-{\mathbf{\text{X}}}_{j})\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}},\\ {\dot{M}}_{2}[a]=\frac{\partial M}{\partial {\eta}_{0}}=-2\sum _{j=1}^{K}({Y}_{j}-{\mathbf{\text{X}}}_{j})\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\eta}_{0}}a(t).\end{array}$$

We also set

$$\begin{array}{c}{\dot{M}}_{11}=\frac{{\partial}^{2}M}{\partial {\mathit{\text{\beta}}}_{0}\partial {\mathit{\text{\beta}}}_{0}^{T}}=2\sum _{j=1}^{K}\left[\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}}\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}^{T}}-({Y}_{j}-{\mathbf{\text{X}}}_{j})\frac{{\partial}^{2}{\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}\partial {\mathit{\text{\beta}}}_{0}^{T}}\right],\\ {\dot{M}}_{12}[a]={\dot{M}}_{21}^{T}[a]=\frac{{\partial}^{2}M}{\partial {\mathit{\text{\beta}}}_{0}\partial {\mathit{\text{\eta}}}_{0}}=2\sum _{j=1}^{K}\left[\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}}\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\eta}_{0}}-({Y}_{j}-{\mathbf{\text{X}}}_{j})\frac{{\partial}^{2}{\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}\partial {\eta}_{0}}\right]\phantom{\rule{0.2em}{0ex}}a(t).\end{array}$$

and

$${\dot{M}}_{22}[{a}_{1},{a}_{2}]=\frac{{\partial}^{2}M}{\partial {\eta}_{0}^{2}}=2\sum _{j=1}^{K}\left[{\left(\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\eta}_{0}}\right)}^{2}-({Y}_{j}-{\mathbf{\text{X}}}_{j})\frac{{\partial}^{2}{\mathbf{\text{X}}}_{j}}{\partial {\eta}_{0}^{2}}\right]{a}_{1}(t){a}_{2}(t),$$

where ${a}_{1}(t),{a}_{2}(t)\in \Upsilon $. Following the idea from the proofs of the asymptotic results for semiparametric M-estimators in Ma and Kosorok (2005) and Wellner and Zhang (2007), we assume that the special perturbation direction
${\mathbf{\text{a}}}^{\ast}(t)={({a}_{1}^{\ast}(t),\dots ,{a}_{d}^{\ast}(t))}^{T}$ with
${a}_{i}^{\ast}(t)\in \Upsilon $ for 1 ≤ *i* ≤ *d*, satisfies ${E}_{0}\{{\dot{M}}_{12}[a]-{\dot{M}}_{22}[{\mathbf{\text{a}}}^{\ast},a]\}=0$ for any $a\in \Upsilon $. Some calculations yield

$$\begin{array}{l}{E}_{0}\{{\dot{M}}_{12}[a]-{\dot{M}}_{22}[{\mathbf{\text{a}}}^{\ast},a]\}\hfill \\ \phantom{\rule{2em}{0ex}}=2\sum _{j=1}^{K}{E}_{0}\left[\left\{\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}}\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\eta}_{0}}-({Y}_{j}-{\mathbf{\text{X}}}_{j})\frac{{\partial}^{2}{\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}\partial {\eta}_{0}}\right\}a(t)-\left\{{\left(\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\eta}_{0}}\right)}^{2}-({Y}_{j}-{\mathbf{\text{X}}}_{j})\frac{{\partial}^{2}{\mathbf{\text{X}}}_{j}}{\partial {\eta}_{0}^{2}}\right\}a(t){\mathbf{\text{a}}}^{\ast}(t)\right]\hfill \\ \phantom{\rule{2em}{0ex}}=2\sum _{j=1}^{K}{E}_{t}{E}_{0}\left(\left[\left\{\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}}\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\eta}_{0}}-({Y}_{j}-{\mathbf{\text{X}}}_{j})\frac{{\partial}^{2}{\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}\partial {\eta}_{0}}\right\}a(t)-\left\{{\left(\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\eta}_{0}}\right)}^{2}-({Y}_{j}-{\mathbf{\text{X}}}_{j})\frac{{\partial}^{2}{\mathbf{\text{X}}}_{j}}{\partial {\eta}_{0}^{2}}\right\}a(t){\mathbf{\text{a}}}^{\ast}(t)\right]\mid t\right).\hfill \end{array}$$

It follows that

$${\mathbf{\text{a}}}^{\ast}(t)=\frac{{\sum}_{j=1}^{K}{E}_{0}[\{\partial {\mathbf{\text{X}}}_{j}/\partial {\mathit{\text{\beta}}}_{0}\partial {\mathbf{\text{X}}}_{j}/\partial {\eta}_{0}-({Y}_{j}-{\mathbf{\text{X}}}_{j}){\partial}^{2}{\mathbf{\text{X}}}_{j}/\partial {\mathit{\text{\beta}}}_{0}\partial {\eta}_{0}\}\mid t]}{{\sum}_{j=1}^{K}{E}_{0}[\{{(\partial {\mathbf{\text{X}}}_{j}/\partial {\eta}_{0})}^{2}-({Y}_{j}-{\mathbf{\text{X}}}_{j}){\partial}^{2}{\mathbf{\text{X}}}_{j}/\partial {\eta}_{0}^{2}\}\mid t]}.$$

(A.3)

Since ${E}_{0}[\mathbf{\text{Y}}(t)-\mathbf{\text{X}}(t,{\mathit{\text{\theta}}}_{0})\mid t]=0$, (A.3) reduces to

$${\mathbf{\text{a}}}^{\ast}(t)=\frac{{\sum}_{j=1}^{K}\partial {\mathbf{\text{X}}}_{j}/\partial {\mathit{\text{\beta}}}_{0}\partial {\mathbf{\text{X}}}_{j}/\partial {\eta}_{0}}{{\sum}_{j=1}^{K}{(\partial {\mathbf{\text{X}}}_{j}/\partial {\eta}_{0})}^{2}}.$$

(A.4)
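As a numerical aside (ours, with hypothetical partial derivatives), (A.4) is a pointwise least-squares projection: at each *t*, **a***(*t*) regresses the *K* partials $\partial {\mathbf{\text{X}}}_{j}/\partial \mathit{\text{\beta}}$ on the partials $\partial {\mathbf{\text{X}}}_{j}/\partial \eta $, so the residuals are orthogonal to the *η*-direction. For scalar *β* this can be checked directly; the particular partials below are arbitrary smooth functions chosen only for illustration.

```python
import numpy as np

# Hypothetical partials for K = 2 state variables at a grid of time points;
# any smooth choices work, these are for illustration only.
t = np.linspace(0.0, 1.0, 50)
dX_dbeta = np.stack([np.exp(-t), t * np.exp(-t)])   # shape (K, len(t))
dX_deta = np.stack([np.cos(t), 1.0 + t ** 2])       # shape (K, len(t))

# (A.4) for scalar beta: pointwise projection coefficient
a_star = (dX_dbeta * dX_deta).sum(axis=0) / (dX_deta ** 2).sum(axis=0)

# residuals are orthogonal to the eta-direction at every t, by construction
resid = dX_dbeta - a_star * dX_deta
ortho = (resid * dX_deta).sum(axis=0)
print(np.max(np.abs(ortho)))   # numerically zero
```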

For *K* ≥ 2, both

$${\mathbf{\text{S}}}_{1}={E}_{0}({\dot{M}}_{11}-{\dot{M}}_{12}[{\mathbf{\text{a}}}^{\ast}])=2\sum _{j=1}^{K}{E}_{t}\left[\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}}\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}^{T}}-\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}}\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\eta}_{0}}{\mathbf{\text{a}}}^{\ast}(t)\right]$$

(A.5)

and

$${\mathbf{\text{S}}}_{2}={E}_{0}{({\dot{M}}_{1}-{\dot{M}}_{2}[{\mathbf{\text{a}}}^{\ast}])}^{\otimes 2}=4\sum _{j=1}^{K}{\sigma}_{j}^{2}{E}_{t}{\left\{\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\mathit{\text{\beta}}}_{0}}-\frac{\partial {\mathbf{\text{X}}}_{j}}{\partial {\eta}_{0}}{\mathbf{\text{a}}}^{\ast}(t)\right\}}^{\otimes 2}$$

(A.6)

are nonsingular. Let ${\mathbf{\text{V}}}_{2}={\mathbf{\text{S}}}_{1}^{-1}{\mathbf{\text{S}}}_{2}{({\mathbf{\text{S}}}_{1}^{-1})}^{T}$. Thus condition A3 of finite variance for Theorem 6.1 in Wellner and Zhang (2007) is satisfied.

Condition A4 for Theorem 6.1 in Wellner and Zhang (2007) follows by arguments similar to those for condition (i) in the proof of Theorem 4 in Xue, Lam and Li (2004). Condition A5 of stochastic equicontinuity can be verified similarly to condition C3 in that proof. Condition A6 of smoothness of the model can be verified by a straightforward Taylor expansion, where *n*^{−c1} is the rate of convergence in Theorem 4.2, which is faster than *n*^{−1/4}, and *c*_{2} = 2. This completes the proof.

Proof of Proposition 1. Let $\Upsilon $ be the set of real-valued functions *g* on [*a, b*] which are absolutely continuous and satisfy
${\int}_{a}^{b}{g}^{2}(t)dt<\infty $ and ${E}_{t}{g}^{2}(t)<\infty $. Then

$$\begin{array}{c}{\dot{M}}_{1}=\frac{\partial M}{\partial \mathit{\text{\beta}}}=-2\left(Y-X\right)\frac{\partial X}{\partial \xi},\\ {\dot{M}}_{2}[a]=\frac{\partial M}{\partial \eta}=-2\left(Y-X\right)\frac{\partial X}{\partial \xi}a(t)\end{array}$$

with *ξ* = *β*_{0} + *η*_{0}(*t*). Let *Ṗ* be the linear span of $\{{\dot{M}}_{2}[a]:a\in \Upsilon \}$. Since ${\dot{M}}_{1}={\dot{M}}_{2}[a]$ for the direction *a*(*t*) ≡ 1, ${\dot{M}}_{1}$ falls in *Ṗ*, so the efficient score for *β* is zero, which completes the proof.
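A tiny numerical sketch (ours, with an arbitrary illustrative model) of why this works: when *β* and *η*(*t*) enter only through *ξ* = *β* + *η*(*t*), the displayed scores ${\dot{M}}_{1}$ and ${\dot{M}}_{2}[a]$ share the same factor $\partial X/\partial \xi $, so the score column for *β* coincides with the score column for *η* in the direction *a*(*t*) ≡ 1 and the information matrix is singular.

```python
import numpy as np

# Illustrative model X(t) = exp(-(beta + eta(t)) * t): both parameters act
# only through xi = beta + eta(t), so dX/dbeta == dX/deta pointwise in t.
t = np.linspace(0.1, 1.0, 20)
beta, eta = 0.5, np.sin(t)
xi = beta + eta
dX_dxi = -t * np.exp(-xi * t)

J = np.column_stack([dX_dxi, dX_dxi])   # columns for beta and for eta with a(t) = 1
info = J.T @ J                          # 2x2 "information" matrix
print(np.linalg.matrix_rank(J), np.linalg.det(info))   # rank 1, determinant ~ 0
```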

*AMS 2000 subject classifications*. Primary 60G02, 60G20; secondary 60G08, 62P10.

- Adams BM. Ph.D. thesis. North Carolina State Univ; 2005. Non-parametric parameter estimation and clinical data fitting with a model of HIV infection. MR2623319.
- Anderson RM, May RM. Infectious Diseases of Humans: Dynamics and Control. Oxford Univ. Press; Oxford: 1996.
- Audoly S, Bellu G, D'Angio L, Saccomani MP, Cobelli C. Global identifiability of nonlinear models of biological systems. IEEE Trans Biomed Eng. 2001;48:55–65. [PubMed]
- Babenko VF, Kofanov VA, Pichugov SA. Multivariate inequalities of Kolmogorov type and their applications. In: Nürnberger G, Schmidt JW, Walz G, editors. Multivariate Approximation and Splines. Birkhauser; Basel: 1996. pp. 1–12. MR1484990.
- Bard Y. Nonlinear Parameter Estimation. Academic Press; New York: 1974. MR0326870.
- Bellman R, Åström KJ. On structural identifiability. Math Biosci. 1970;7:329–339.
- Benson M. Parameter fitting in dynamic models. Ecol Mod. 1979;6:97–115.
- Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Univ Press; Baltimore: 1993. MR1245941.
- Brookmeyer R, Gail MH. AIDS Epidemiology: A Quantitative Approach. Monographs in Epidemiology and Biostatistics, Vol. 23. Oxford Univ Press; New York: 1994.
- Brunel N. Parameter estimation of ODE's via nonparametric estimators. Electron J Statist. 2008;2:1242–1267. MR2471285.
- Burnham KP, Anderson DR. Multimodel inference: Understanding AIC and BIC in model selection. Sociol Methods Res. 2004;33:261. MR2086350.
- Chappel MJ, Godfrey KR. Structural identifiability of the parameters of a nonlinear batch reactor model. Math Biosci. 1992;108:245–251. MR1154720. [PubMed]
- Chen J, Wu H. Efficient local estimation for time-varying coefficients in deterministic dynamic models with applications to HIV-1 dynamics. J Amer Statist Assoc. 2008;103:369–384. MR2420240.
- Chen T, He HL, Church GM. Modeling gene expression with differential equations. Pac Symp Biocomput. 1999:29–40. [PubMed]
- Cobelli C, Lepschy A, Jacur R. Identifiability of compartmental systems and related structural properties. Math Biosci. 1979;44:1–18.
- Daley DJ, Gani J. Epidemic Modeling. Cambridge Univ Press; Cambridge: 1999.
- de Boor C. A Practical Guide to Splines. Springer; New York: 1978. MR0507062.
- Delgado MA. Semiparametric generalized least squares in the multivariate nonlinear regression model. Econometric Theory. 1992;8:203–222. MR1179510.
- Donnet S, Samson A. Estimation of parameters in incomplete data models defined by dynamical systems. J Statist Plann Inference. 2007;137:2815–2831. MR2323793.
- Englezos P, Kalogerakis N. Applied Parameter Estimation for Chemical Engineers. Dekker; New York: 2001.
- Grenander U. Abstract Inference. Wiley; New York: 1981. MR0599175.
- Hairer E, Nørsett SP, Wanner G. Solving Ordinary Differential Equations I: Nonstiff Problems. 2nd. Springer; Berlin: 1993. MR1227985.
- He X, Fung WK, Zhu ZY. Estimation in a semiparametric model for longitudinal data with unspecified dependence structure. Biometrika. 2002;89:579–590. MR1929164.
- Heckman NE, Ramsay JO. Penalized regression with model-based penalties. Canad J Statist. 2000;28:241–258. MR1792049.
- Ho DD, Neumann AU, Perelson AS, et al. Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature. 1995;373:123–126. [PubMed]
- Huang J. Efficient estimation for the proportional hazards model with interval censoring. Ann Statist. 1996;24:540–568. MR1394975.
- Huang J. Efficient estimation of the partly linear additive Cox model. Ann Statist. 1999;27:1536–1563. MR1742499.
- Huang J, Rossini AJ. Sieve estimation for the proportional-odds failure-time regression model with interval censoring. J Amer Statist Assoc. 1997;92:960–967. MR1482126.
- Huang JZ. Local asymptotics for polynomial spline regression. Ann Statist. 2003;31:1600–1635. MR2012827.
- Huang JZ, Zhang L, Zhou L. Efficient estimation in marginal partially linear models for longitudinal/clustered data using splines. Scand J Statist. 2007;34:451–477. MR2368793.
- Huang Y, Liu D, Wu H. Hierarchical Bayesian methods for estimation of parameters in a longitudinal HIV dynamic system. Biometrics. 2006;62:413–423. MR2227489. [PMC free article] [PubMed]
- Huber PJ. The behavior of maximum likelihood estimates under nonstandard conditions. Fifth Berkeley Symposium on Mathematical Statistics and Probability; Berkeley: Univ California; 1967. pp. 221–233. MR0216620.
- Jansson B, Revesz L. Analysis of the growth of tumor cell populations. Math Biosci. 1975;19:131–154.
- Jeffrey AM, Xia X. Identifiability of HIV/AIDS model. In: Tan WY, Wu H, editors. Deterministic and Stochastic Models of AIDS Epidemics and HIV Infections with Intervention. World Scientific; 2005.
- Jennerich RI. Asymptotic properties of non-linear least squares estimators. Ann Math Statist. 1969;40:633–643. MR0238419.
- Jolliffe IT. Discarding variables in a principal component analysis. I: Artificial data. J Roy Statist Soc Ser C Appl Statist. 1972;21:160–172. MR0311034.
- Joshi M, Seidel-Morgenstern A, Kremling A. Exploiting the bootstrap method for quantifying parameter confidence intervals in dynamic systems. Metabolic Engineering. 2006;8:447–455. [PubMed]
- Kolchin E. Differential Algebra and Algebraic Groups. Academic Press; New York: 1973. MR0568864.
- Kutta W. Beitrag zur näherungsweisen Integration totaler Differentialgleichungen. Zeitschr Math Phys. 1901;46:435–453.
- Li L, Brown MB, Lee KH, Gupta S. Estimation and inference for a spline-enhanced population pharmacokinetic model. Biometrics. 2002;58:601–611. MR1933534. [PubMed]
- Li Z, Osborne MR, Prvan T. Parameter estimation of ordinary differential equations. IMA J Numer Anal. 2005;25:264–285. MR2126204.
- Liang H, Wu H. Parameter estimation for differential equation models using a framework of measurement error in regression models. J Amer Statist Assoc. 2008;103:1570–1583. MR2504205. [PMC free article] [PubMed]
- Ljung L, Glad T. On global identifiability for arbitrary model parametrizations. Automatica. 1994;30:265–276. MR1261705.
- Ma S, Kosorok MR. Robust semiparametric M-estimation and the weighted bootstrap. J Multivariate Anal. 2005;96:190–217. MR2202406.
- Malinvaud E. The consistency of nonlinear regressions. Ann Math Statist. 1970;41:956–969. MR0261754.
- Mattheij R, Molenaar J. Ordinary Differential Equations in Theory and Practice. SIAM; Philadelphia: 2002. MR1946758.
- Miao H, Dykes C, Demeter LM, Cavenaugh J, Park SY, Perelson AS, Wu H. Modeling and estimation of kinetic parameters and replicative fitness of HIV-1 from flow-cytometry-based growth competition experiments. Bull Math Biol. 2008;70:1749–1771. MR2430325. [PubMed]
- Michelson S, Leith JT. Tumor heterogeneity and growth control. In: Adam JA, Bellomo N, editors. Tumor Heterogeneity and Growth Control. Birkhauser; Boston: 1997. pp. 295–326.
- Moles CG, Banga JR, Keller K. Solving nonconvex climate control problems: Pitfalls and algorithm performances. Appl Soft Comput. 2004;5:35–44.
- Nowak MA, May RM. Virus Dynamics: Mathematical Principles of Immunology and Virology. Oxford Univ. Press; Oxford: 2000. MR2009143.
- Ollivier F. Ph.D. thesis. École Polytechnique; Paris, France: 1990. Le problème de l'identifiabilité globale: Étude théorique, méthodes effectives et bornes de complexité.
- Pakes A, Pollard D. Simulation and the asymptotics of optimization estimators. Econometrica. 1989;57:1027–1057. MR1014540.
- Perelson AS, Essunger P, Cao Y, Vesanen M, Hurley A, Saksela K, Markowitz M, Ho DD. Decay characteristics of HIV-1-infected compartments during combination therapy. Nature. 1997;387:188–191. [PubMed]
- Perelson AS, Nelson PW. Mathematical analysis of HIV-1 dynamics in vivo. SIAM Rev. 1999;41:3–44. MR1669741.
- Perelson AS, Neumann AU, Markowitz M, Leonard JM, Ho DD. HIV-1 dynamics in vivo: Virion clearance rate, infected cell life-span, and viral generation time. Science. 1996;271:1582–1586. [PubMed]
- Pohjanpalo H. System identifiability based on the power series expansion of the solution. Math Biosci. 1978;41:21–33. MR0507373.
- Pollard D. Convergence of Stochastic Processes. Springer; New York: 1984. MR0762984.
- Pollard D. New ways to prove central limit theorems. Econometric Theory. 1985;1:295–314.
- Pollard D. Empirical Processes: Theory and Applications. IMS; Hayward, CA: 1990. MR1089429.
- Poyton AA, Varziri MS, McAuley KB, McLellen PJ, Ramsay JO. Parameter estimation in continuous-time dynamic models using principal differential analysis. Computers and Chemical Engineering. 2006;30:698–708.
- Putter H, Heisterkamp SH, Lange JM, de Wolf F. A Bayesian approach to parameter estimation in HIV dynamical models. Stat Med. 2002;21:2199–2214. [PubMed]
- Quaiser T, Mönnigmann M. Systematic identifiability testing for unambiguous mechanistic modeling—application to JAK-STAT, MAP kinase, and NF-*κ*B signaling pathway models. BMC Syst Biol. 2009;3:50. [PMC free article] [PubMed]
- Ramsay JO. Principal differential analysis: Data reduction by differential operators. J Roy Statist Soc Ser B. 1996;58:495–508. MR1394362.
- Ramsay JO, Hooker G, Campbell D, Cao J. Parameter estimation for differential equations: A generalized smoothing approach (with discussion) J R Stat Soc Ser B Stat Methodol. 2007;69:741–796. MR2368570.
- Ritt JF. Differential Algebra. Amer Math Soc; Providence, RI: 1950. MR0035763.
- Rodriguez-Fernandez M, Egea JA, Banga JR. Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems. BMC Bioinformatics. 2006;7:1–18. [PMC free article] [PubMed]
- Runge C. Ueber die numerische Auflösung von Differentialgleichungen. Math Ann. 1895;46:167–178. MR1510879.
- Schick A. On asymptotically efficient estimation in semiparametric models. Ann Statist. 1986;14:1139–1151. MR0856811.
- Schumaker LL. Spline Functions. Wiley; New York: 1981. MR0606200.
- Shen X. On methods of sieves and penalization. Ann Statist. 1997;25:2555–2591. MR1604416.
- Shen X, Wong WH. Convergence rate of sieve estimates. Ann Statist. 1994;22:580–615. MR1292531.
- Stone CJ. Optimal global rates of convergence for nonparametric regression. Ann Statist. 1982;10:1040–1053. MR0673642.
- Stone CJ. Additive regression and other nonparametric models. Ann Statist. 1985;13:689–705. MR0790566.
- Storn R, Price K. Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim. 1997;11:341–359. MR1479553.
- Swartz J, Bremermann H. Discussion of parameter estimation in biological modeling: Algorithms for estimation and evaluation of the estimates. J Math Biol. 1975;1:241–275.
- Tan WY, Wu H. Deterministic and Stochastic Models of AIDS Epidemics and HIV Infections With Intervention. World Scientific; Singapore: 2005. MR2169300.
- Thomaseth K, Alexandra KW, Bernhard L, et al. Integrated mathematical model to assess *β*-cell activity during the oral glucose test. Amer J Physiol. 1996;270:E522–531. [PubMed]
- Vajda S, Rabitz H, Walter E, Lecourtier Y. Qualitative and quantitative identifiability analysis of nonlinear chemical kinetic models. Chem Eng Commun. 1989;83:191–219.
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer; New York: 1996. MR1385671.
- van Domselaar B, Hemker PW. Report NW18/75. Mathematical Centrum; Amsterdam: 1975. Nonlinear parameter estimation in initial value problems.
- Varah JM. A spline least squares method for numerical parameter estimation in differential equations. SIAM J Sci Comput. 1982;3:28–46. MR0651865.
- Varziri MS, Poyton AA, McAuley KB, McLellen PJ, Ramsay JO. Selecting optimal weighting factors in iPDA for parameter estimation in continuous-time dynamic models. Comp Chem Eng. 2008;32:3011–3022.
- Walter E. Identifiability of Parametric Models. Pergamon Press; Oxford: 1987.
- Wei X, Ghosh SK, Taylor ME, et al. Viral dynamics in human immunodeficiency virus type 1 infection. Nature. 1995;373:117–122. [PubMed]
- Wellner JA, Zhang Y. Two likelihood-based semiparametric estimation methods for panel count data with covariates. Ann Statist. 2007;35:2106–2142. MR2363965.
- Wu CF. Asymptotic theory of nonlinear least squares estimation. Ann Statist. 1981;9:501–513. MR0615427.
- Wu H. Statistical methods for HIV dynamic studies in AIDS clinical trials. Stat Methods Med Res. 2005;14:1–22. MR2135921. [PubMed]
- Wu H, Huang Y, Acosta EP, et al. Modeling long-term HIV dynamics and antiretroviral response: Effects of drug potency, pharmacokinetics, adherence, and drug resistance. JAIDS. 2005;39:272–283. [PubMed]
- Wu H, Zhu H, Miao H, Perelson AS. Identifiability and statistical estimation of dynamic parameters in HIV/AIDS dynamic models. Bull Math Biol. 2008;70:785–799. MR2393024. [PubMed]
- Xia X. Estimation of HIV/AIDS parameters. Automatica J IFAC. 2003;39:1983–1988. MR2142834.
- Xia X, Moog CH. Identifiability of nonlinear systems with applications to HIV/AIDS models. IEEE Trans Automat Control. 2003;48:330–336. MR1957979.
- Xue H, Lam KF, Li G. Sieve maximum likelihood estimator for semiparametric regression models with current status data. J Amer Statist Assoc. 2004;99:346–356. MR2062821.