PLoS One. 2010; 5(1): e8915.
Published online 2010 January 27. doi:  10.1371/journal.pone.0008915
PMCID: PMC2811744

Parameter Identifiability and Redundancy: Theoretical Considerations

Fabio Rapallo, Editor

Abstract

Background

Models for complex biological systems may involve a large number of parameters. It may well be that some of these parameters cannot be derived from observed data via regression techniques. Such parameters are said to be unidentifiable, the remaining parameters being identifiable. Closely related to this idea is that of redundancy, that a set of parameters can be expressed in terms of some smaller set. Before data is analysed it is critical to determine which model parameters are identifiable or redundant to avoid ill-defined and poorly convergent regression.

Methodology/Principal Findings

In this paper we outline general considerations on parameter identifiability, and introduce the notion of weak local identifiability and gradient weak local identifiability. These are based on local properties of the likelihood, in particular the rank of the Hessian matrix. We relate these to the notions of parameter identifiability and redundancy previously introduced by Rothenberg (Econometrica 39 (1971) 577–591) and Catchpole and Morgan (Biometrika 84 (1997) 187–196). Within the widely used exponential family, parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are shown to be largely equivalent. We consider applications to a recently developed class of cancer models of Little and Wright (Math Biosciences 183 (2003) 111–134) and Little et al. (J Theoret Biol 254 (2008) 229–238) that generalize a large number of other recently used quasi-biological cancer models.

Conclusions/Significance

We have shown that the previously developed concepts of parameter local identifiability and redundancy are closely related to the apparently weaker properties of weak local identifiability and gradient weak local identifiability—within the widely used exponential family these concepts largely coincide.

Introduction

Models for complex biological systems may involve a large number of parameters. It may well be that some of these parameters cannot be derived from observed data via regression techniques. Such parameters are said to be unidentifiable or non-identifiable, the remaining parameters being identifiable. Closely related to this idea is that of redundancy, that a set of parameters can be expressed in terms of some smaller set. Before data is analysed it is critical to determine which model parameters are identifiable or redundant to avoid ill-defined and poorly convergent regression.

Identifiability in stochastic models has been considered previously in various contexts. Rothenberg [1] and Silvey [2] (pp. 50, 81) defined a set of parameters for a model to be identifiable if no two sets of parameter values yield the same distribution of the data. Catchpole and Morgan [3] considered identifiability and parameter redundancy and the relations between them in a general class of (exponential family) models. Rothenberg [1], Jacquez and Perry [4] and Catchpole and Morgan [3] also defined a notion of local identifiability, which we shall extend in the Analysis Section. [There is also a large literature on identifiability in deterministic (rather than stochastic) models, for example the papers of Audoly et al. [5], and Bellu [6], which we shall not consider further.] Catchpole et al. [7] and Gimenez et al. [8] outlined use of computer algebra techniques to determine numbers of identifiable parameters in the exponential family. Viallefont et al. [9] considered parameter identifiability issues in a general setting, and outlined a method based on considering the rank of the Hessian for determining identifiable parameters; however, some of their claimed results are incorrect (as we outline briefly later). Gimenez et al. [8] used Hessian-based techniques, as well as a number of purely numerical techniques, for determining the number of identifiable parameters. Further general observations on parameter identifiability and its relation to properties of sufficient statistics are given by Picci [10], and a more recent review of the literature is given by Paulino and de Bragança Pereira [11].

In this paper we outline some general considerations on parameter identifiability. We shall demonstrate that the concepts of parameter local identifiability and redundancy are closely related to apparently weaker properties of weak local identifiability and gradient weak local identifiability that we introduce in the Analysis Section. These latter properties relate to the uniqueness of likelihood maxima and likelihood turning points within the vicinity of sets of parameter values, and are shown to be based on local properties of the likelihood, in particular the rank of the Hessian matrix. Within the widely-used exponential family we demonstrate that these concepts (local identifiability, redundancy, weak local identifiability, gradient weak local identifiability) largely coincide. We briefly consider applications of all these ideas to a recently developed general class of carcinogenesis models [12], [13], [14], presenting results that generalize those of Heidenreich [15] and Heidenreich et al. [16] in the context of the two-mutation cancer model [17]. These are outlined in the later parts of the Analysis and the Discussion, and in more detail in a companion paper [12].

Analysis

General Considerations on Parameter Identifiability

As outlined in the Introduction, a general criterion for parameter identifiability has been set out by Jacquez and Perry [4]. They proposed a simple linearization of the problem, in the context of models with normal error. They defined a notion of local identifiability, which is that in a local region of the parameter space, there is a unique $\theta_0$ that fits some specified body of data, $(x_l, y_l)_{l=1}^n$, i.e. for which the model predicted mean $h(x|\theta)$ is such that the residual sum of squares:

$$S = \sum_{l=1}^{n} \left[y_l - h(x_l|\theta)\right]^2 \qquad (1)$$

has a unique minimum. We present here a straightforward generalization of this to other error structures. If the model prediction $h(x) = h(x|\theta)$ for the observed data $y$ is a function of some vector parameters $\theta = (\theta_j)_{j=1}^p$ then in general it can be assumed, under the general equivalence of likelihood maximization and iteratively reweighted least squares for generalized linear models [18] (chapter 2), that one is trying to minimize:

$$S = \sum_{l=1}^{n} \frac{1}{v_l}\left[y_l - h(x_l|\theta_0) - \sum_{j=1}^{p} \left.\frac{\partial h(x_l|\theta)}{\partial\theta_j}\right|_{\theta=\theta_0} \Delta\theta_j\right]^2 \qquad (2)$$

where $y_l$ ($1 \le l \le n$) is the observed measurement (e.g., the numbers of observed cases in the case of binomial or Poisson models) at point $l$ and the $v_l$ are the current estimates of variance at each point. This has a unique minimum in the perturbing $\Delta\theta$ ($= (\Delta\theta_j)_{j=1}^p$) given by $\Delta\theta = (A^T D A)^{-1} A^T D \delta$, where $A = \left(\partial h(x_l|\theta)/\partial\theta_j|_{\theta=\theta_0}\right)_{l=1,\ldots,n;\ j=1,\ldots,p}$, $D = \mathrm{diag}(1/v_1, \ldots, 1/v_n)$, $\delta = \left(y_l - h(x_l|\theta_0)\right)_{l=1}^n$, whenever $A^T D A$ has full rank ($= p$).
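The role of the full-rank condition can be illustrated with a minimal sketch. It assumes the standard weighted least-squares normal equations, written here with $A$ for the $n \times p$ matrix of derivatives of the model mean at $\theta_0$, $D = \mathrm{diag}(1/v_l)$ and $\delta$ for the residuals at $\theta_0$ (this notation, the two toy models and the data are assumptions made for the illustration, not material from the cited papers). A model that is linear in two parameters yields a unique step, whereas a model in which the two parameters enter only through their product yields a singular $A^T D A$.

```python
def normal_equations(A, v, delta):
    """Return (A^T D A, A^T D delta) with D = diag(1 / v_l)."""
    p = len(A[0])
    AtDA = [[sum(row[i] * row[j] / w for row, w in zip(A, v))
             for j in range(p)] for i in range(p)]
    AtDd = [sum(row[i] * d / w for row, d, w in zip(A, delta, v))
            for i in range(p)]
    return AtDA, AtDd


def solve_2x2(M, b, rel_tol=1e-9):
    """Solve a 2x2 system by Cramer's rule; return None if M is singular."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    scale = max(abs(e) for row in M for e in row)
    if abs(det) <= rel_tol * scale * scale:
        return None  # A^T D A is rank deficient: no unique minimizer
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - M[1][0] * b[0]) / det]


t = [1.0, 2.0, 3.0, 4.0]   # hypothetical design points
y = [3.0, 5.0, 7.0, 9.0]   # hypothetical data, exactly 1 + 2 t
v = [1.0, 1.0, 1.0, 1.0]   # hypothetical variance estimates v_l

# Model 1 (assumed): h = theta1 + theta2 * t, linearized at theta0 = (0, 0).
A_lin = [[1.0, tl] for tl in t]            # columns dh/dtheta1, dh/dtheta2
delta_lin = list(y)                        # y_l - h(t_l | theta0), h = 0
step_linear = solve_2x2(*normal_equations(A_lin, v, delta_lin))

# Model 2 (assumed): h = theta1 * theta2 * t, linearized at theta0 = (2, 3).
A_prod = [[3.0 * tl, 2.0 * tl] for tl in t]   # theta2 * t, theta1 * t
delta_prod = [yl - 6.0 * tl for yl, tl in zip(y, t)]
step_product = solve_2x2(*normal_equations(A_prod, v, delta_prod))

print(step_linear)    # a unique step exists for the first model
print(step_product)   # None: the two columns of A are proportional
```

In the second model the columns of $A$ are proportional at every $\theta_0$, so no choice of linearization point restores full rank.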

More generally, suppose that the likelihood associated with observation $x_l$ is $l(x_l|\theta)$ and let $L(x_l|\theta) = \ln[l(x_l|\theta)]$. Then generalizing the least squares criterion (1) we now extend the definition of local identifiability to mean that there is at most one maximum of:

$$L = L(x|\theta) = \sum_{l=1}^{n} L(x_l|\theta) \qquad (3)$$

in the neighborhood of any given $\theta \in \Omega$. More formally:

Definitions 1

A set of parameters $\theta = (\theta_i)_{i=1}^p \in \Omega$ is identifiable if for any $\theta \in \Omega$ there are no $\theta' \in \Omega \setminus \{\theta\}$ for which $L(x|\theta') = L(x|\theta)$ for almost all $x$. A set of parameters $\theta$ is locally identifiable if there exists a neighborhood $N$ of $\theta$ such that for no $\theta' \in N \setminus \{\theta\}$ is $L(x|\theta') = L(x|\theta)$ for almost all $x$. A set of parameters $\theta$ is weakly locally identifiable if there exists a neighborhood $N$ of $\theta$ and data $x$ such that the log-likelihood $L(x|\theta')$ is maximized by at most one set of $\theta' \in N$. If $L(x|\theta)$ is $C^1$ as a function of $\theta$, a set of parameters $\theta$ is gradient weakly locally identifiable if there exists a neighborhood $N$ of $\theta$ and data $x$ such that $\partial L(x|\theta)/\partial\theta_i|_{\theta=\theta'} = 0$ for $1 \le i \le p$ (i.e., $\theta'$ is a turning point of $L(x|\theta)$) for at most one set of $\theta' \in N$.

Our definitions of identifiability and local identifiability coincide with those of Rothenberg [1], Silvey [2] (pp. 50, 81) and Catchpole and Morgan [3]. Rothenberg [1] proved that if the Fisher information matrix, $I = I(\theta)$, in a neighborhood of $\theta \in \mathrm{int}(\Omega)$ is of constant rank and satisfies various other more minor regularity conditions, then $\theta$ is locally identifiable if and only if $I(\theta)$ is non-singular. Clearly identifiability implies local identifiability, which in turn implies weak local identifiability. By the Mean Value Theorem [19] (p. 107) gradient weak local identifiability implies weak local identifiability. Heuristically, (gradient) weak local identifiability happens when:

$$\frac{\partial L(x|\theta)}{\partial\theta_i} = \sum_{l=1}^{n} \frac{\partial L(x_l|\theta)}{\partial\theta_i} = 0, \quad 1 \le i \le p \qquad (4)$$

and in general this system of $p$ equations has a unique solution in $\theta$ in the neighborhood of $\theta_0$ (assumed $\in \mathrm{int}(\Omega)$) whenever the Hessian $\left(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\right)_{i,j=1}^p$ at $\theta_0$ has full rank ($= p$). This turns out to be (nearly) the case, and will be proved later (Corollary 2). More rigorously, we have the following result.

Theorem 1

Suppose that the log-likelihood $L(x|\theta)$ is $C^2$ as a function of the parameter vector $\theta \in \Omega$, for all $x = (x_1, \ldots, x_n)$.

  (i) Suppose that for some $x$ and $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\left[\left(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\right)_{i,j=1}^p\right] = p$. Then turning points of the likelihood in the neighborhood of $\theta$ are isolated, i.e., there is an open neighborhood $N$ of $\theta$ for which there is at most one $\theta' \in N$ that satisfies $\partial L(x|\theta)/\partial\theta_i|_{\theta=\theta'} = 0$ ($1 \le i \le p$).
  (ii) Suppose that for some $x$ and $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\left[\left(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\right)_{i,j=1}^p\right] = p$; then local maxima of the likelihood in the neighborhood of $\theta$ are isolated, i.e., there is an open neighborhood $N$ of $\theta$ for which there is at most one $\theta' \in N$ that is a local maximum of $L(x|\theta)$.
  (iii) Suppose that for some $x$ and all $\theta \in \Omega$ it is the case that $\mathrm{rk}\left[\left(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\right)_{i,j=1}^p\right] = r < p$; then all local maxima of the likelihood in $\mathrm{int}(\Omega)$ are not isolated, as indeed are all $\theta \in \mathrm{int}(\Omega)$ for which $\partial L(x|\theta)/\partial\theta_i = 0$ ($1 \le i \le p$).

We prove this result in Text S1 Section A. As an immediate consequence we have the following result.

Corollary 1

For a given $x$, a sufficient condition for the likelihood (3) to have at most one maximum and one turning point in the neighborhood of a given $\theta \in \mathrm{int}(\Omega)$ is that $\mathrm{rk}\left[\left(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\right)_{i,j=1}^p\right] = p$. In particular, if this condition is satisfied $\theta$ is gradient weakly locally identifiable (and therefore weakly locally identifiable). ($\Omega$ is the parameter space.)

That this condition is not necessary is seen by consideration of the likelihood $l(x|\theta) = C \exp[-(x-\theta)^4]$, where $C$ is chosen so that this has unit mass. Then $\partial^2 L(x|\theta)/\partial\theta^2 = -12(x-\theta)^2$, which has rank 0 at $\theta = x$, although the likelihood has a unique maximum there. In particular, this shows that the result claimed by Viallefont et al. [9] (proposition 2, p. 322) is incorrect.
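The sufficiency, and the lack of necessity, of the Hessian rank condition can be checked numerically with a minimal sketch. The Poisson models, the data, the finite-difference step and the rank tolerance below are all assumptions made for the illustration. The third case assumes a log-likelihood of the quartic form $L = \ln C - (x-\theta)^4$, whose second derivative $-12(x-\theta)^2$ vanishes at $\theta = x$ even though the maximum there is unique.

```python
import math


def hessian(f, theta, h=1e-3):
    """Central-difference Hessian of f at theta (a list of floats)."""
    p = len(theta)

    def shifted(i, j, di, dj):
        t = list(theta)
        t[i] += di * h
        t[j] += dj * h
        return f(t)

    return [[(shifted(i, j, 1, 1) - shifted(i, j, 1, -1)
              - shifted(i, j, -1, 1) + shifted(i, j, -1, -1)) / (4 * h * h)
             for j in range(p)] for i in range(p)]


def matrix_rank(M, tol=1e-4):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n_rows, n_cols = len(M), len(M[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < tol:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, n_rows):
            factor = M[r][c] / M[rank][c]
            for k in range(c, n_cols):
                M[r][k] -= factor * M[rank][k]
        rank += 1
    return rank


T = [1.0, 2.0, 3.0, 4.0]   # hypothetical exposure times
Y = [2, 1, 3, 4]           # hypothetical Poisson counts


def poisson_loglik(mean_fn):
    """Poisson log-likelihood (up to a constant) with means mean_fn(theta, t)."""
    return lambda th: sum(y * math.log(mean_fn(th, t)) - mean_fn(th, t)
                          for y, t in zip(Y, T))


full = poisson_loglik(lambda th, t: th[0] + th[1] * t)         # two free parameters
redundant = poisson_loglik(lambda th, t: (th[0] + th[1]) * t)  # only the sum matters
x_obs = 1.0
quartic = lambda th: -(x_obs - th[0]) ** 4                     # L up to ln C

rank_full = matrix_rank(hessian(full, [1.0, 0.5]))
rank_redundant = matrix_rank(hessian(redundant, [1.0, 0.5]))
rank_quartic = matrix_rank(hessian(quartic, [x_obs]))

# Grid check that theta = x_obs is nevertheless the only maximizer nearby.
grid = [x_obs + 0.01 * k for k in range(-200, 201) if k != 0]
unique_max = all(quartic([th]) < quartic([x_obs]) for th in grid)

print(rank_full, rank_redundant, rank_quartic, unique_max)
```

A redundancy through a sum is used in the second model because the Hessian is then singular at every parameter value. When parameters enter through a product, the observed Hessian can have full rank away from the turning points, so the rank has to be examined where the gradient vanishes or through the expected information.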

Definitions 2

A subset of parameters $(\theta_{\pi(i)})_{i=1}^k$ (for some permutation $\pi$) is weakly maximal (respectively weakly gradient maximal) if for any permissible fixed $(\theta_{\pi(i)})_{i=k+1}^p$ (such that [formula e096]) $(\theta_{\pi(i)})_{i=1}^k$ is weakly locally identifiable (respectively gradient weakly locally identifiable) at that point, but that this is not the case for any larger number of parameters. A subset of parameters $(\theta_{\pi(i)})_{i=1}^k$ is strongly maximal (respectively strongly gradient maximal) if for any permissible fixed $(\theta_{\pi(i)})_{i=k+1}^p$ and any open [formula e100], $(\theta_{\pi(i)})_{i=1}^k$ restricted to the set [formula e102] is weakly maximal (respectively weakly gradient maximal), i.e., all [formula e103] are weakly maximal (respectively weakly gradient maximal).

From this it easily follows that a strongly (gradient) maximal set of parameters $(\theta_{\pi(i)})_{i=1}^k$ is a fortiori weakly (gradient) maximal at all points [formula e105] for any permissible $(\theta_{\pi(i)})_{i=k+1}^p$. Assume now that $k$ of the $p$ parameters $\theta_i$ are a weakly maximal set of parameters. So for some permutation $\pi$ and for any permissible fixed $(\theta_{\pi(i)})_{i=k+1}^p$ and any [formula e112] there is an open neighborhood [formula e113] and some data $x$ for which $L(x|\theta)$ is maximized by at most one set of [formula e116], but that this is not the case for any larger number of parameters. Assume that [formula e117]. If $L(x|\theta)$ is $C^2$ as a function of $\theta$ then it follows easily that [formula e121] must be an open non-empty subset of [formula e122]. By Theorem 1 (iii) any [formula e123] which maximizes $L(x|\theta)$ in [formula e125] cannot be isolated, a contradiction (unless there are no maximizing [formula e126]). Therefore, either there are no maximizing [formula e127] or for at least one [formula e128] [formula e129]. This implies that [formula e130], where [formula e131] in the obvious sense.

Assume now that the $(\theta_{\pi(i)})_{i=1}^k$ are strongly maximal. Suppose that for some $x$ and some [formula e134] it is the case that [formula e135]. Because the Hessian $\left(\partial^2 L(x|\theta)/\partial\theta_i\partial\theta_j\right)_{i,j=1}^p$ is symmetric, there is a permutation [formula e137] for which [formula e138] [20] (p. 79). If $L(x|\theta)$ is $C^2$ as a function of $\theta$ this will be the case in some open neighborhood [formula e142]. By Theorem 1 (ii) this implies that the parameters [formula e143] have at most one maximum in [formula e144], so that [formula e145] is not a strongly maximal set of parameters in [formula e146]. With small changes everything above also goes through with “weakly gradient maximal” substituted for “weakly maximal” and “strongly gradient maximal” substituted for “strongly maximal”. Therefore we have proved the following result.

Theorem 2

Let $L(x|\theta)$ be $C^2$ as a function of $\theta$ for all $x$.

  (i) If there is a weakly maximal (respectively weakly gradient maximal) subset of $k$ parameters, $(\theta_{\pi(i)})_{i=1}^k$ (for some permutation $\pi$), and for fixed [formula e154] and some [formula e155] [formula e156] has a maximum (respectively turning point) on the set of [formula e157] where [formula e158] is maximal then [formula e159] and [formula e160].
  (ii) If there is a strongly maximal (respectively strongly gradient maximal) subset of $k$ parameters, $(\theta_{\pi(i)})_{i=1}^k$ (for some permutation $\pi$) then [formula e164] [formula e165].

All further results in this Section assume that the model is a member of the exponential family, so that if the observed data [formula e166] then the log-likelihood is given by [formula e167] for some functions [formula e168]. We assume that the natural parameters [formula e169] are functions of the model parameters $(\theta_i)_{i=1}^p$ and some auxiliary data [formula e171], but that the scaling parameter [formula e172] is not. Let [formula e173], so that [formula e174]. In all that follows we shall assume that the function [formula e175] is [formula e176]. The following definition was introduced by Catchpole and Morgan [3].

Definition 3

With the above notation, a set of parameters $(\theta_i)_{i=1}^p$ is parameter redundant for an exponential family model if [formula e178] can be expressed in terms of some strictly smaller parameter vector $(\beta_i)_{i=1}^q$ ($q < p$). Otherwise, the set of parameters $(\theta_i)_{i=1}^p$ is parameter irredundant or full rank.
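A small worked example may help fix ideas. It is constructed purely for illustration and is not taken from the cited papers: the Poisson model, the auxiliary data $t_l$ and the combined parameter $\beta$ are all assumptions.

```latex
% Hypothetical Poisson model: counts x_l with means mu_l = theta_1 theta_2 t_l
\begin{align*}
L(x|\theta_1,\theta_2)
  &= \sum_{l=1}^{n}\left[x_l \ln(\theta_1\theta_2 t_l) - \theta_1\theta_2 t_l - \ln(x_l!)\right] \\
  &= \sum_{l=1}^{n}\left[x_l \ln(\beta t_l) - \beta t_l - \ln(x_l!)\right],
  \qquad \beta = \theta_1\theta_2 .
\end{align*}
% The p = 2 parameters enter only through the q = 1 combination beta, so
% (theta_1, theta_2) is parameter redundant: (c theta_1, theta_2 / c) gives
% the same log-likelihood for every c > 0.
```

Only $\beta$ can be estimated from such data, whatever the sample size.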

Catchpole and Morgan [3] proved (their Theorem 1) that a set of parameters is parameter redundant if and only if [formula e182]. They defined full rank models to be essentially full rank if [formula e183] for every $\theta \in \Omega$; if [formula e185] only for some $\theta \in \Omega$ then the parameter set is conditionally full rank. They also showed (their Theorem 3) that if $I = I(\theta)$ is the Fisher information matrix then [formula e188], and that parameter redundancy implies lack of local identifiability; indeed their proof of Theorems 2 and 4 showed that there is also lack of weak local identifiability (respectively gradient weak local identifiability) for all $\theta \in \Omega$ which for some $x$ are local maxima (respectively turning points) of the likelihood.
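A rank criterion of this kind can be explored numerically with a minimal sketch. It assumes that the relevant object is the matrix of derivatives of the model means with respect to the parameters, in the spirit of the derivative-matrix test of Catchpole and Morgan [3]. Evaluating the rank at a few randomly drawn parameter values is only a numerical stand-in for their symbolic rank calculation, and the two Poisson mean functions, the time points and the tolerance are assumptions made for the illustration.

```python
import random


def derivative_matrix(mean_fn, theta, h=1e-6):
    """Central-difference matrix with entries d mu_l / d theta_j (row j)."""
    rows = []
    for j in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[j] += h
        dn[j] -= h
        rows.append([(a - b) / (2 * h)
                     for a, b in zip(mean_fn(up), mean_fn(dn))])
    return rows


def matrix_rank(M, tol=1e-6):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n_rows, n_cols = len(M), len(M[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < tol:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, n_rows):
            factor = M[r][c] / M[rank][c]
            for k in range(c, n_cols):
                M[r][k] -= factor * M[rank][k]
        rank += 1
    return rank


T = [1.0, 2.0, 3.0, 4.0]   # hypothetical time points


def product_means(theta):
    """Means theta1 * theta2 * t: only the product is estimable."""
    return [theta[0] * theta[1] * t for t in T]


def linear_means(theta):
    """Means theta1 + theta2 * t: both parameters are estimable."""
    return [theta[0] + theta[1] * t for t in T]


random.seed(0)
points = [[random.uniform(0.5, 2.0), random.uniform(0.5, 2.0)]
          for _ in range(5)]

ranks_product = {matrix_rank(derivative_matrix(product_means, th))
                 for th in points}
ranks_linear = {matrix_rank(derivative_matrix(linear_means, th))
                for th in points}

print(ranks_product)   # rank below p = 2 at every sampled point
print(ranks_linear)    # rank equal to p = 2 at every sampled point
```

A rank below $p$ at every sampled point suggests, but does not prove, redundancy; a symbolic calculation is needed to establish it for all parameter values.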

Assume that $(\theta_i)_{i=1}^p$ are an essentially full rank set of parameters for the model. From the above result for every $\theta \in \Omega$ [formula e193]. Therefore, since [formula e194] is of full rank and so negative definite, by the strong law of large numbers we can choose [formula e195] so that the same is true of [formula e196]. This implies that on some [formula e197] [formula e198] is of full rank, and therefore by Corollary 1 $\theta$ is (gradient) weakly locally identifiable. Furthermore, the above argument shows that if $(\theta_i)_{i=1}^p$ are a conditionally full rank set of parameters then on the (open) set [formula e201] [formula e202] is gradient weakly locally identifiable. We have therefore proved:

Theorem 3

Let [formula e203] belong to the exponential family and be [formula e204] as a function of [formula e205] for all [formula e206].

  (i) If the parameter set [formula e207] is parameter redundant then it is not locally identifiable, and is not weakly locally identifiable (respectively gradient weakly locally identifiable) for all [formula e208] which for some [formula e209] are local maxima (respectively turning points) of the likelihood.
  (ii) If the parameter set [formula e210] is of essentially full rank then for some [formula e211], [formula e212] is of full rank and therefore [formula e213] is gradient weakly locally identifiable (and so weakly locally identifiable) for all [formula e214].
  (iii) If the parameter set [formula e215] is of conditionally full rank then it is gradient weakly locally identifiable on the open set [formula e216].

Remarks: Part (i) of this theorem generalizes part (i) of Theorem 4 of Catchpole and Morgan [3], who proved that if a model is parameter redundant then it is not locally identifiable. However, the component of part (ii) stating that being essentially full rank implies gradient weak local identifiability is weaker than the corresponding result proved in part (ii) of Theorem 4 of Catchpole and Morgan [3], namely that if a model is of essentially full rank it is locally identifiable. As noted by Catchpole and Morgan [3] (pp. 193–4), there are exponential-family models that are conditionally full rank but not locally identifiable, so part (iii) is about as strong a result as can be hoped for.
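The role of the data in part (ii) of Theorem 3 can be illustrated with a small numerical sketch. It is not taken from the paper: the Poisson model with identity link, the design matrix `X`, the parameter values `beta` and the counts are all hypothetical choices made for illustration. For this model the Hessian of the log-likelihood is minus the sum over observations of (y_i / mu_i^2) x_i x_i^T, so whether it has full rank depends on the observed counts.

```python
import math

def poisson_identity_hessian(X, beta, y):
    """Hessian of the Poisson log-likelihood with identity link, mu_i = x_i . beta.

    H = -sum_i (y_i / mu_i**2) * x_i x_i^T, so it depends on the data y.
    """
    p = len(beta)
    H = [[0.0] * p for _ in range(p)]
    for x_i, y_i in zip(X, y):
        mu_i = sum(x * b for x, b in zip(x_i, beta))
        w = y_i / mu_i ** 2
        for j in range(p):
            for k in range(p):
                H[j][k] -= w * x_i[j] * x_i[k]
    return H

def is_negative_definite(H, eps=1e-12):
    """True if -H has a Cholesky factorisation with strictly positive pivots."""
    n = len(H)
    A = [[-H[i][j] for j in range(n)] for i in range(n)]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= eps:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

# Hypothetical full rank design: intercept plus one covariate t = 0, 1, 2, 3.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
beta = [1.0, 0.5]                      # gives mu = 1.0, 1.5, 2.0, 2.5 (all positive)

# Counts spread over several covariate values: Hessian of full rank.
definite_for_spread_data = is_negative_definite(
    poisson_identity_hessian(X, beta, [3, 5, 4, 7]))

# Counts observed only at t = 0: the Hessian has rank 1 and is not definite.
definite_for_degenerate_data = is_negative_definite(
    poisson_identity_hessian(X, beta, [3, 0, 0, 0]))

print(definite_for_spread_data, definite_for_degenerate_data)
```

The same full rank parameter set therefore yields a full rank Hessian for some data and a rank-deficient one for others, which is why the statement is phrased in terms of the existence of suitable data.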

From Theorem 3 we deduce the following.

Corollary 2

Let [formula e217] belong to the exponential family and be [formula e218] as a function of [formula e219] for all [formula e220]. Then

  (i) If for some subset of parameters [formula e221] and some [formula e222] it is the case that [formula e223] then this subset is gradient weakly locally identifiable at this point.
  (ii) If a subset of parameters [formula e224] is weakly locally identifiable and for some [formula e225] this point is a local maximum of the likelihood then it is parameter irredundant, i.e., of full rank, so [formula e226], so that for some [formula e227], [formula e228]. In particular, if this holds for all [formula e229] then parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are all equivalent.

Proof

This is an immediate consequence of the remarks after Definition 1, Corollary 1, Theorem 3 (i) and Theorems 1 and 3 of Catchpole and Morgan [3]. QED.

Remarks: (i) By the remarks preceding Theorem 3 the conditions of part (i) (that for some [formula e230] it is the case that [formula e231]) are automatically satisfied if [formula e232] are an essentially full rank set of parameters for the model.

(ii) Assume the model is constructed from a stochastic cancer model embedded in the exponential family, in the sense outlined in Text S1 Section B, so that the natural parameters [formula e233] are functions of the model parameters [formula e234] and some auxiliary data [formula e235], and the means are given by [formula e236], where [formula e237] is the cancer hazard function. In this case, as shown in Text S1 Section B, [formula e238]. The second term inside the summation, [formula e239], is a rank 1 matrix and can be made small in relation to the first term, e.g., by making [formula e240] small. Therefore finding data [formula e241] for which [formula e242] is equivalent to finding data for which [formula e243], or, by the result of Dickson [20] (p. 79), for which [formula e244].

Hessian vs Fisher Information Matrix as a Method of Determining Redundancy and Identifiability in Generalised Linear Models

Like Catchpole and Morgan [3], we emphasise use of the Hessian of the likelihood rather than the Fisher information matrix considered by Rothenberg [1]. In the context of GLMs, we have [formula e245] and [formula e246] for some link function [formula e247] and fixed matrix [formula e248]. We define [formula e249] where [formula e250]. Theorem 1 of Catchpole and Morgan [3] states that a model is parameter irredundant if and only if [formula e251]. The score vector is given by [formula e252] where [formula e253]. The Fisher information is therefore given by [formula e254] where [formula e255] is the data variance. Theorem 1 of Rothenberg [1] states that a model is locally identifiable if and only if [formula e256]. As above (Corollary 2 (ii)), heuristically parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are all equivalent and occur whenever [formula e257]. Clearly, evaluating the rank of [formula e258] is generally much easier than evaluating that of [formula e259]. Catchpole and Morgan [3] demonstrate use of Hessian-based methods to estimate parameter redundancy in a class of capture-recapture models.
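As a rough numerical illustration of the rank comparison described above, the following sketch computes both the derivative matrix of the means and the Fisher information for a Poisson GLM with log link. The model, the design matrices and the parameter values are hypothetical and chosen only for illustration; the names `matrix_rank` and `glm_ranks` are not from the paper. The first design deliberately contains a column that is the sum of the other two, so the parameter set is redundant.

```python
import math

def matrix_rank(rows, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    if not m or not m[0]:
        return 0
    scale = max(abs(v) for r in m for v in r) or 1.0
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol * scale:
            continue                      # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            f = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= f * m[rank][c]
        rank += 1
    return rank

def glm_ranks(X, beta):
    """Return (rank of derivative matrix, rank of Fisher information)."""
    n, p = len(X), len(beta)
    mu = [math.exp(sum(X[i][j] * beta[j] for j in range(p))) for i in range(n)]
    # Derivative matrix D[j][i] = d mu_i / d beta_j = mu_i * x_ij under the log link.
    D = [[mu[i] * X[i][j] for i in range(n)] for j in range(p)]
    # Fisher information X^T W X, with W = diag(mu) for the Poisson log-linear model.
    fisher = [[sum(X[i][j] * mu[i] * X[i][k] for i in range(n))
               for k in range(p)] for j in range(p)]
    return matrix_rank(D), matrix_rank(fisher)

times = [0.0, 1.0, 2.0, 3.0]
X_redundant = [[1.0, t, 1.0 + t] for t in times]   # third column = first + second
X_full = [[1.0, t] for t in times]

ranks_redundant = glm_ranks(X_redundant, [0.1, 0.2, -0.1])
ranks_full = glm_ranks(X_full, [0.1, 0.2])
print(ranks_redundant, ranks_full)
```

In this sketch the two ranks agree in both cases: they fall short of the number of parameters for the redundant design and equal it once the redundant column is dropped. The derivative matrix needs only first derivatives of the means, which is the practical convenience the text refers to.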

However, for certain applications, both the Fisher information and the Hessian must be employed, as we now outline. Assume that the model is constructed from a stochastic cancer model embedded in an exponential family model in the sense outlined in Text S1 Section B. The key to showing that such an embedded model has no more than [formula e260] irredundant parameters is to construct (as is done in Little et al. [12]) some scalar functions [formula e261] such that the cancer hazard function [formula e262] can be written as [formula e263]. Since the cancer model is embedded in a member of the exponential family (in the sense outlined in Text S1 Section B) the same will be true of the total log-likelihood [formula e264]. By means of the Chain Rule we obtain [formula e265], so that the Fisher information matrix is given by:

[equation (5)]

which therefore has rank at most [formula e267]. Therefore by Corollary 2 there can be at most [formula e268] irredundant parameters, or indeed (gradient) weakly locally identifiable parameters. [A similar argument shows that if one were to reparameterise (via some invertible [formula e269] mapping [formula e270]) then the embedded log-likelihood [formula e271] associated with [formula e272] must also have Fisher information matrix of rank at most [formula e273].] By remark (ii) after Corollary 2, to show that a subset of cardinality [formula e274] of the parameters [formula e275] is (gradient) weakly locally identifiable requires that one show that [formula e276] has rank at least [formula e277] for some [formula e278]. This is the approach adopted in the paper of Little et al. [12].
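The Chain Rule argument for the upper bound can be checked numerically on a toy example. The sketch below is an assumption-laden illustration, not one of the cancer models discussed here: it uses a hypothetical hazard a*b*exp(c*t) with three parameters (a, b, c) that enter only through two scalar functions G1 = a*b and G2 = c, Poisson counts with mean equal to person-years times hazard, and made-up numerical values. It verifies that the Fisher information in (a, b, c) equals J^T I_G J, where J is the Jacobian of (G1, G2) and I_G is the Fisher information in (G1, G2), and hence has rank at most 2.

```python
import math

def matrix_rank(rows, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    scale = max(abs(v) for r in m for v in r) or 1.0
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol * scale:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            f = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= f * m[rank][c]
        rank += 1
    return rank

def poisson_fisher(grads, mu):
    """Fisher information sum_i grad_i grad_i^T / mu_i for independent Poisson counts."""
    p = len(grads[0])
    return [[sum(g[j] * g[k] / m_i for g, m_i in zip(grads, mu))
             for k in range(p)] for j in range(p)]

# Hypothetical parameter values and data layout (assumptions for illustration).
a, b, c = 0.5, 0.4, 0.3
ages = [1.0, 2.0, 3.0, 4.0]
person_years = 100.0

G1, G2 = a * b, c                                   # the two scalar functions
mu = [person_years * G1 * math.exp(G2 * t) for t in ages]

# Gradients of the means with respect to (G1, G2) and with respect to (a, b, c).
grad_G = [[m_i / G1, m_i * t] for m_i, t in zip(mu, ages)]
grad_theta = [[m_i / a, m_i / b, m_i * t] for m_i, t in zip(mu, ages)]

J = [[b, a, 0.0],                                   # dG1/da, dG1/db, dG1/dc
     [0.0, 0.0, 1.0]]                               # dG2/da, dG2/db, dG2/dc

fisher_G = poisson_fisher(grad_G, mu)               # 2 x 2
fisher_theta = poisson_fisher(grad_theta, mu)       # 3 x 3

# Chain Rule form: J^T * fisher_G * J.
chain = [[sum(J[r][i] * fisher_G[r][s] * J[s][j]
              for r in range(2) for s in range(2))
          for j in range(3)] for i in range(3)]

max_diff = max(abs(fisher_theta[i][j] - chain[i][j])
               for i in range(3) for j in range(3))
rank_theta = matrix_rank(fisher_theta)
rank_G = matrix_rank(fisher_G)
print(rank_theta, rank_G, max_diff)
```

Here the three-parameter Fisher information has the same rank as the two-parameter one, so at most two parameter combinations can be irredundant. Establishing that two actually are identifiable is the separate lower-bound step, which the text handles through the Hessian.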

Discussion

In this paper we have introduced the notions of weak local identifiability and gradient weak local identifiability, which we have related to the notions of parameter identifiability and redundancy previously introduced by Rothenberg [1] and Catchpole and Morgan [3]. In particular we have shown that within exponential family models parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are largely equivalent.

The slight novelty of our approach is that the notions of weak local identifiability and gradient weak local identifiability that we introduce are related more closely to the Hessian of the likelihood than to the Fisher information matrix considered by Rothenberg [1]. However, in practice, the two approaches are very similar; Catchpole and Morgan [3] used the Hessian of the likelihood, as do we, because of its greater analytic tractability. The use of this approach is motivated by the application, namely to determine identifiable parameter combinations in a large class of stochastic cancer models, as we outline at the end of the Analysis Section. In certain applications the Fisher information may be best for establishing an upper bound on the number of irredundant parameters, while the Hessian may be best for establishing a lower bound on this quantity.

In the companion paper of Little et al. [12] we consider the problem of parameter identifiability in a particular class of stochastic cancer models, those of Little and Wright [13] and Little et al. [14]. These models generalize a large number of other quasi-biological cancer models, in particular those of Armitage and Doll [21], the two-mutation model [17], the generalized multistage model of Little [22], and a recently developed cancer model of Nowak et al. [23] that incorporates genomic instability. These and other cancer models are generally embedded in an exponential family model in the sense outlined in Text S1 Section B, in particular when cohort data are analysed using Poisson regression models, e.g., as in Little et al. [13], [14], [24]. As we show at the end of the Analysis Section, proving (gradient) weak local identifiability of a subset of cardinality [formula e279] of the parameters [formula e280] can be done by showing that for this subset of parameters [formula e281], where [formula e282] is the cancer hazard function. Little et al. [12] demonstrate (by exhibiting a particular parameterization) that there is redundancy in the parameterization for this model: the number of theoretically estimable parameters in the models of Little and Wright [13] and Little et al. [14] is at most two less than the number that are theoretically available, demonstrating (by Corollary 2) that there can be no more than this number of irredundant parameters. Two numerical examples suggest that this bound is sharp: we show that the rank of the Hessian, [formula e283], is two less than the row dimension of this matrix. This result generalizes previously derived results of Heidenreich and others [15], [16] relating to the two-mutation model.

Supporting Information

Text S1

(0.33 MB DOC)

Acknowledgments

The authors are very grateful for the comments of Professor Byron Morgan on an advanced draft of the paper, also for the detailed and helpful remarks of a referee.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This work was funded partially by the European Commission under contracts FI6R-CT-2003-508842 (RISC-RAD) and FP6-036465 (NOTE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Rothenberg TJ. Identification in parametric models. Econometrica. 1971;39:577–591.
2. Silvey SD. Statistical inference. 191 p. London: Chapman and Hall; 1975.
3. Catchpole EA, Morgan BJT. Detecting parameter redundancy. Biometrika. 1997;84:187–196.
4. Jacquez JA, Perry T. Parameter estimation: local identifiability of parameters. Am J Physiol. 1990;258:E727–E736.
5. Audoly S, D'Angio L, Saccomani MP, Cobelli C. Global identifiability of linear compartmental models – a computer algebra algorithm. IEEE Trans Biomed Eng. 1998;45:36–47.
6. Bellu G, Saccomani MP, Audoly S, D'Angio L. DAISY: a new software tool to test global identifiability of biological and physiological systems. Computer Meth Prog Biomed. 2007;88:52–61.
7. Catchpole EA, Morgan BJT, Viallefont A. Solving problems in parameter redundancy using computer algebra. J Appl Stat. 2002;29:626–636.
8. Gimenez O, Viallefont A, Catchpole EA, Choquet R, Morgan BJT. Methods for investigating parameter redundancy. Animal Biodiversity Conservation. 2004;27.1:1–12.
9. Viallefont A, Lebreton J-D, Reboulet A-M, Gory G. Parameter identifiability and model selection in capture-recapture models: a numerical approach. Biometrical J. 1998;40:313–325.
10. Picci G. Some connections between the theory of sufficient statistics and the identifiability problem. SIAM J Appl Math. 1977;33:383–398.
11. Paulino CDM, de Bragança Pereira CA. On identifiability of parametric statistical models. J Ital Stat Soc. 1994;1:125–151. (Stat Methods Appl 3)
12. Little MP, Heidenreich WF, Li G. Parameter identifiability and redundancy in a general class of stochastic carcinogenesis models. PLoS ONE. 2009;4(12):e8520.
13. Little MP, Wright EG. A stochastic carcinogenesis model incorporating genomic instability fitted to colon cancer data. Math Biosci. 2003;183:111–134.
14. Little MP, Vineis P, Li G. A stochastic carcinogenesis model incorporating multiple types of genomic instability fitted to colon cancer data. J Theoret Biol. 2008;254:229–238.
15. Heidenreich WF. On the parameters of the clonal expansion model. Radiat Environ Biophys. 1996;35:127–129.
16. Heidenreich WF, Luebeck EG, Moolgavkar SH. Some properties of the hazard function of the two-mutation clonal expansion model. Risk Anal. 1997;17:391–399.
17. Moolgavkar SH, Venzon DJ. Two-event models for carcinogenesis: incidence curves for childhood and adult tumors. Math Biosci. 1979;47:55–77.
18. McCullagh P, Nelder JA. Generalized linear models (2nd edition). London: Chapman and Hall; 1989.
19. Rudin W. Principles of mathematical analysis (3rd edition). Auckland: McGraw Hill; 1976.
20. Dickson LE. Modern algebraic theories. Chicago: Sanborn; 1926.
21. Armitage P, Doll R. The age distribution of cancer and a multi-stage theory of carcinogenesis. Br J Cancer. 1954;8:1–12.
22. Little MP. Are two mutations sufficient to cause cancer? Some generalizations of the two-mutation model of carcinogenesis of Moolgavkar, Venzon, and Knudson, and of the multistage model of Armitage and Doll. Biometrics. 1995;51:1278–1291.
23. Nowak MA, Komarova NL, Sengupta A, Jallepalli PV, Shih I-M, et al. The role of chromosomal instability in tumor initiation. Proc Natl Acad Sci U S A. 2002;99:16226–16231.
24. Little MP, Li G. Stochastic modelling of colon cancer: is there a role for genomic instability? Carcinogenesis. 2007;28:479–487.

Articles from PLoS ONE are provided here courtesy of Public Library of Science