Linear regression quantifies the linear relationship between paired sets of input and output observations. The well-known least-squares regression optimizes the performance criterion defined by the residual error, but is highly sensitive to uncertainties or perturbations in the observations. Robust least-squares algorithms have been developed to optimize the worst case performance for a given limit on the level of uncertainty, but they are applicable only when that limit is known. Herein, we present a robust-satisficing approach that maximizes the robustness to uncertainties in the observations, while satisficing a critical sub-optimal level of performance. The method emphasizes the trade-off between performance and robustness, which are inversely correlated. To resolve the resulting trade-off we introduce a new criterion, which assesses the consistency between the observations and the linear model. The proposed criterion determines a unique robust-satisficing regression and reveals the underlying level of uncertainty in the observations with only weak assumptions. These algorithms are demonstrated for the challenging application of linear regression to neural decoding for brain-machine interfaces. The model-consistent robust-satisficing regression provides superior performance for new observations under both similar and different conditions.
Linear regression is a classical inverse problem where the parameters of a linear model that relate the dependent variables to the independent variables need to be estimated from a set of observations. The problem is complicated by three major sources of uncertainties: (i) measurement uncertainty, which accounts for inaccuracies in the observations, (ii) model uncertainty, which accounts for possible non-linear effects, and (iii) temporal uncertainty, which accounts for potential changes that are not present in the available observations.
A major goal of linear regression is to use the estimated parameters to predict the dependent variable from new observations of the independent variables. The performance criterion of interest in this case is the norm of the residual error, which is estimated from the available set of observations. The well-known least-squares regression is derived by minimizing the residual norm. However, given the above uncertainties, the resulting least-squares regression may fail to provide high, or even acceptable, performance for new observations. An alternative criterion is based on the combination of the residual norm and the weighted regression norm. The weight of the regression norm is referred to as the regularization parameter: increasing the regularization parameter reduces the sensitivity to uncertainties in the observations at the expense of increasing the residual norm. Thus, regularized regression depends on the proper choice of the regularization parameter to balance this trade-off.
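To make the regularized criterion concrete, here is a minimal numerical sketch (assuming NumPy; the function names are illustrative, not from this paper). The SVD form exposes the coefficients by which the regularization parameter damps directions associated with small singular values; a truncated-SVD variant is included for comparison:

```python
import numpy as np

def tikhonov_svd(A, b, mu):
    """Regularized regression x = (A^T A + mu*I)^{-1} A^T b via the SVD.
    The coefficients s_i/(s_i^2 + mu) show how directions with small singular
    values s_i are damped as the regularization parameter mu grows."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T @ ((s * (U.T @ b)) / (s**2 + mu))

def tsvd(A, b, k):
    """Truncated-SVD regression: keep only the k largest singular values,
    appropriate when the singular-value spectrum has a clear gap."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T[:, :k] @ ((U.T[:k] @ b) / s[:k])
```

With mu = 0 the first routine reduces to the ordinary least-squares (pseudo-inverse) solution.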
The choice of the regularization parameter depends on the distribution of the singular values of the observation matrix A, whose columns describe the independent variables. In general, when the condition number of the observation matrix A is large, the least-squares solution is greatly affected by uncertainties in the observations and may differ considerably from the underlying linear model. If there is a clear gap in the spectrum of the singular values, i.e., A is rank deficient, the regularization may be based on truncated Singular Value Decomposition (tSVD), and the rank of A can be used to determine the regularization parameter. However, if the singular values of A decay gradually, the problem is ill-posed and the choice of the regularization parameter is more complicated. For ill-posed problems there are two classes of parameter choice methods, discussed next.
When the uncertainty in the observations is bounded by a known limit, a robust least-squares regression can be determined using the min-max approach, which minimizes the worst case residual norm. Under a specific set of assumptions, the robust least-squares regression has been shown to have the form of a Tikhonov regularized regression with a regularization parameter that depends on the presumed bound on the level of uncertainty in the observations. Hence, the method substitutes the choice of the Tikhonov regularization parameter with the assessment of the bound on the level of uncertainty in the observations.
The L-curve has been suggested as a method for choosing the regularization parameter when the bound on the level of uncertainties or perturbations is not known. However, the L-curve may lose its characteristic L-shape in the presence of large uncertainties, and thus the method may fail to determine the appropriate regularization parameter (see Section V). Consequently, the choice of the Tikhonov regularization parameter under uncertainties remains a challenge.
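For reference, one common discrete realization of the L-curve heuristic is sketched below (an illustration, not the specific algorithm used in this paper): the Tikhonov parameter is picked at the point of maximum finite-difference curvature of the log-log curve of solution norm versus residual norm.

```python
import numpy as np

def l_curve_corner(A, b, mus):
    """Discrete L-curve heuristic: over a grid of Tikhonov parameters, trace
    log residual norm vs. log solution norm and return the parameter at the
    point of maximum (finite-difference) curvature."""
    K = A.shape[1]
    xs = [np.linalg.solve(A.T @ A + m * np.eye(K), A.T @ b) for m in mus]
    r = np.log([np.linalg.norm(A @ x - b) for x in xs])
    n = np.log([np.linalg.norm(x) for x in xs])
    dr, dn = np.gradient(r), np.gradient(n)
    ddr, ddn = np.gradient(dr), np.gradient(dn)
    # small epsilon guards against flat segments of the curve
    kappa = np.abs(dr * ddn - dn * ddr) / (dr**2 + dn**2 + 1e-30) ** 1.5
    return mus[int(np.argmax(kappa[1:-1])) + 1]  # ignore the endpoints
```

When the L-shape degenerates, as in the BMI data of Section V, the maximum-curvature point becomes unreliable.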
Here we address the case where the bound on the uncertainty in the observations is unknown. In section II, we describe the uncertainty in the observations by an information-gap (info-gap) model, which is parameterized by the level of uncertainty. Instead of optimizing performance, we focus on maximizing the level of uncertainty under which a critical level of performance can be guaranteed. This approach has been termed robust-satisficing, where satisficing (coined by H. Simon) refers to satisfying a sufficient level of performance.
The info-gap approach emphasizes the robustness/performance trade-off detailed in Section III: high robustness to uncertainties can be achieved only by relinquishing performance. To determine the appropriate trade-off, Section IV introduces a new criterion, which assesses the consistency between the observations and the robust-satisficing regression. The model-consistency criterion determines a unique regression and reveals the appropriate performance and robustness trade-off. The effectiveness of the model-consistent robust-satisficing method is demonstrated in Section V for the challenging application of linear regression to neural decoding in brain-machine interfaces (BMI), where the L-curve method is inadequate due to the large level of uncertainty.
We consider a linear regression between the dependent variable b and the (column) vector of independent variables a ∈ ℝK, computed from a set of N available observations {ãn, b̃n}, n = 1,…, N, of the independent and dependent variables, respectively. The estimation of the regression is complicated by potential deviations between the available observations and the actual values of the underlying variables that are expected to be related linearly. The latter are referred to as ideal observations – observations that would have been obtained under no uncertainties.
For convenience, the available observations are arranged into an N × K observation matrix Ã = [ã1, ã2,…, ãN]T and an observation vector b̃ = [b̃1, b̃2,…, b̃N]T. Given a regression vector x ∈ ℝK, the common criterion for its performance is the 2-norm of the residual error r(x; Ã, b̃) ≜ ||Ãx − b̃||2. However, when there are uncertainties in the observations, the criterion of interest is r(x; A, b) ≜ ||Ax − b||2, based on the ideal observation matrix A = [a1, a2,…, aN]T and vector b = [b1, b2,…, bN]T of the linearly related variables. In order to assess the relation between these two criteria, we need to specify the set of ideal observations that might have resulted in the available observations, given the level of uncertainty. We consider models of the form:
where α ≥ 0 is the level of uncertainty in the observation matrix A and ν ≥ 0 is the relative level of uncertainty in b. The 2-norm is used as the vector norm and the induced 2-norm as the matrix norm. The index 2 that specifies the nature of the norm will be omitted except in the definitions.
We note that Eq. (1) defines a nested and unbounded set of ideal observations that might have resulted in the available observations, under increasing levels of uncertainty. Nesting implies that ideal observations associated with larger levels of uncertainty include ideal observations associated with smaller levels of uncertainty, while only the available observations are possible when there is no uncertainty. Unboundedness implies that the norm of the deviation from the available observations may increase without bound as the level of uncertainty increases. Since the worst case depends on the level of uncertainty, unboundedness implies that the worst case is not known. Such a structure is referred to as an info-gap (short for information-gap) model of uncertainty.
The model in (1) has been used in  and a similar model with the Frobenius norm was used in . Another model, which bounds the uncertainty in each row of the observation matrix A and vector b, is considered in . Herein we assume that the relative weight ν is known (but not necessarily one) and capture the uncertainty with the single parameter α.
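The structure of the info-gap set can be checked numerically: sampling perturbations with ||ΔA||2 ≤ α and ||Δb||2 ≤ να and confirming that every residual norm falls within rnom(x) ± α(||x|| + ν), the worst- and best-case bounds derived in the sequel. A NumPy sketch with arbitrary synthetic data (the sampling scheme is an illustration, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, alpha, nu = 30, 4, 0.5, 0.1
A_t = rng.standard_normal((N, K))            # available observation matrix (A tilde)
x = rng.standard_normal(K)
b_t = A_t @ x + 0.1 * rng.standard_normal(N) # available observation vector (b tilde)

r_nom = np.linalg.norm(A_t @ x - b_t)
hi = r_nom + alpha * (np.linalg.norm(x) + nu)           # worst case over U(alpha)
lo = max(0.0, r_nom - alpha * (np.linalg.norm(x) + nu)) # best case over U(alpha)

for _ in range(200):
    dA = rng.standard_normal((N, K))
    dA *= alpha * rng.random() / np.linalg.norm(dA, 2)    # spectral norm <= alpha
    db = rng.standard_normal(N)
    db *= nu * alpha * rng.random() / np.linalg.norm(db)  # 2-norm <= nu * alpha
    r = np.linalg.norm((A_t + dA) @ x - (b_t + db))
    assert lo - 1e-9 <= r <= hi + 1e-9
```

The bounds follow from the triangle inequality applied to ||(Ã + ΔA)x − (b̃ + Δb)||.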
Min-max regression minimizes the maximum residual norm under a presumed level of uncertainty and thus depends on knowing that level. Furthermore, concentrating on the worst case, which may be very unlikely, may result in poor performance under more likely cases. Instead we propose the robust-satisficing approach, which focuses on the required level of performance without making any assumptions about the level of uncertainty. This approach is outlined in the next two definitions.
Given a critical performance ρ, the robustness of the regression x to uncertainties in the observations is the largest level of uncertainty α up to which all possible ideal observations result in residual norms that are smaller than ρ:
The robustness is zero if the critical performance cannot be achieved even under zero uncertainty. For a given regression x, the robustness as a function of the critical performance ρ is termed the robustness curve.
Given a sub-optimal critical performance ρ, the robust satisficing (RS) regression is the regression that maximizes the robustness for satisficing ρ:
The next Proposition evaluates the robustness when the uncertainties are described by the info-gap model of Eq. (1):
Given the available observations Ã, b̃, the robustness of the regression x for achieving the critical performance ρ under uncertainties in the observations described by the info-gap model of Eq. (1) is:
The internal maximization in Eq. (2) defines the worst residual norm that may occur with the regression x when the level of uncertainty is α. It can be shown (see also Lemma B for a detailed analogous derivation of the best residual norm) that the worst residual norm is given by:
The first term in Eq. (5) describes the nominal residual error rnom(x) = ||Ãx − b̃||2 when there is no uncertainty, whereas the second term describes the effect of uncertainty. Thus, a critical performance that equals the nominal residual error, ρ = rnom(x), can be achieved only with zero robustness. Positive robustness can be attained only for ρ > rnom(x), i.e., by relinquishing performance, and in that range the robustness curve increases linearly.
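In this range the robustness curve can be written down directly. The sketch below assumes, consistent with the linear growth just described, that the worst-case residual of Eq. (5) is rnom(x) + α(||x||2 + ν):

```python
import numpy as np

def robustness(rho, x, A_t, b_t, nu):
    """Robustness curve of Eq. (4): the largest uncertainty level alpha at
    which the worst-case residual r_nom(x) + alpha*(||x|| + nu) still meets
    the critical performance rho. Zero when rho <= r_nom(x)."""
    r_nom = np.linalg.norm(A_t @ x - b_t)
    return max(0.0, (rho - r_nom) / (np.linalg.norm(x) + nu))
```

For a regression with smaller norm than the least-squares solution, the slope 1/(||x|| + ν) is larger, which is why the robustness curves in Figure 1 cross.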
Examples of robustness curves for the application described in Section V are shown in Figure 1. Different robustness curves correspond to different Tikhonov regressions xμ(Ã, b̃) ≜ (ÃT Ã + μI)−1 ÃT b̃ with different values of the Tikhonov parameter μ. The Tikhonov regression with μ = 0 corresponds to the optimal least-squares (LS) regression xLS = xμ=0(Ã, b̃) = (ÃT Ã)−1 ÃT b̃, and achieves the optimal performance rnom(xLS) ≜ ||ÃxLS − b̃||. The LS regression minimizes the nominal residual norm, but achieves this critical optimal performance, i.e., ρ = rnom(xLS), with zero robustness.
The above discussion implies that the robustness curve of a sub-optimal regression x ≠ xLS becomes positive at ρ > rnom(x) > rnom(xLS). Furthermore, when ||x|| < ||xLS||, the slope of the robustness curve of x is larger than the slope of the robustness curve of the LS regression, so the two robustness curves cross each other (see Figure 1). Hence, a sub-optimal regression with ||x|| < ||xLS|| (e.g., a Tikhonov regression with a positive Tikhonov parameter) provides higher robustness for some sub-optimal performance ρ > rnom(x), as detailed in the next proposition.
For critical sub-optimal performance ρ > ||ÃxLS − b̃||, the robust-satisficing regression xRS(ρ) has the form of a Tikhonov regression xμ(Ã, b̃) ≜ (ÃT Ã + μI)−1 ÃT b̃ with the parameter μ = μ̂, which satisfies:
At a fixed critical performance ρ, the gradient of the robustness given by (4), with respect to the regression x, can be expressed as:
The gradient vanishes when x has the form of a Tikhonov regression with the parameter specified by Eq. (6).
Assuming the non-singular case (i.e., 0 < ||Ãx − b̃|| for all x), the robustness reaches a global maximum at the point where the gradient vanishes (Lemma A1 in Appendix A). Hence, there is a unique μ̂ that satisfies Eq. (6), and the RS regression defined by Eq. (3) has the form of a Tikhonov regression with the Tikhonov parameter μ̂.
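Rather than solving the implicit condition of Eq. (6), the RS regression can also be found by direct numerical search over the Tikhonov parameter, maximizing the robustness at a fixed critical performance ρ. A sketch (the grid and the linear robustness formula (ρ − rnom)/(||x|| + ν) are assumptions consistent with the worst-case bound of Eq. (5)):

```python
import numpy as np

def rs_regression(rho, A_t, b_t, nu, mus=np.logspace(-6, 6, 2000)):
    """Robust-satisficing regression of Eq. (3), found numerically: among
    Tikhonov regressions x_mu (Proposition 2 says the maximizer has this
    form), pick the one maximizing (rho - r_nom(x_mu)) / (||x_mu|| + nu)."""
    K = A_t.shape[1]
    G, Atb = A_t.T @ A_t, A_t.T @ b_t
    best = (-np.inf, None)
    for mu in mus:
        x = np.linalg.solve(G + mu * np.eye(K), Atb)
        a = (rho - np.linalg.norm(A_t @ x - b_t)) / (np.linalg.norm(x) + nu)
        if a > best[0]:
            best = (a, x)
    return max(best[0], 0.0), best[1]   # (robustness, RS regression)
```

The maximizer should be at least as robust as the least-squares regression for any ρ above the optimal residual norm.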
Proposition 2 indicates that for each critical performance ρ within the specified range, the RS regression has the form of a Tikhonov regression with a parameter given implicitly by (6). The opposite is also true, as stated in Proposition 3.
A Tikhonov regression xμ(Ã, b̃) ≜ (ÃT Ã + μI)−1 ÃT b̃ with parameter μ is the RS regression for the critical performance:
We note that ρRS(μ) specified in Eq. (8) is always larger than the optimal residual norm, i.e., ρRS(μ) ≥ ||ÃxLS − b̃||, since the optimal performance is the minimum of the first term and the second term is always positive. According to Proposition 2, the RS regression for a critical performance ρ > ||ÃxLS − b̃|| is a Tikhonov regression with the parameter μ = μ̂ that satisfies Eq. (6). Substituting ρ = ρRS(μ) given by Eq. (8), it is easily verified that Eq. (6) is satisfied for μ̂ = μ. Hence the corresponding RS regression for ρ = ρRS(μ) is the Tikhonov regression with the parameter μ.
Eq. (8) specifies explicitly the critical performance for which a Tikhonov regression is the RS regression. We next specify the resulting robustness:
Given a Tikhonov regression xμ, its robustness for obtaining the critical performance ρ = ρRS (μ) (Eq. (8)), for which it is the RS regression, is:
Furthermore, the RS robustness α̂RS(ρRS(μ); xμ) is a non-decreasing function of the critical performance ρ = ρRS(μ).
Substituting Eq. (8) into Eq. (4) results in Eq. (9) for the robustness. Equation (4) indicates that the robustness of a fixed regression is a non-decreasing function of the critical performance. The robustness of the RS regression is the maximum robustness that can be achieved for a given critical performance, so it is also a non-decreasing function of the critical performance ρ.
Equations (8) and (9) specify the envelope of the robustness curves as shown in Figure 1, which depicts the RS robustness α̂RS as a function of the critical performance ρ. The circles on each of the robustness curves indicate the points at which the corresponding Tikhonov regressions are the RS regressions, with the critical performance given by Eq. (8), and the resulting robustness by Eq. (9).
In agreement with Proposition 4, the RS robustness α̂RS is a non-decreasing function of the critical performance. Hence, the family of RS regressions establishes a trade-off between the critical performance and the robustness: higher robustness can be achieved only by relinquishing performance (increasing the critical performance), while better performance (lower critical performance) can be obtained only with lower robustness.
The performance/robustness trade-off defined by (8) and (9) describes how the Tikhonov parameter affects both the performance and the robustness, and thus provides important insight for the choice of the proper Tikhonov parameter. However, the trade-off indicates that neither the robustness nor the performance criterion provides a unique solution. An additional criterion, which assesses the consistency between the RS regression model and the level of uncertainty in the observations, is developed in the next section. It is shown that the model-consistency criterion defines a unique regression.
The level of uncertainty in the observations affects not only the worst case performance (as described in Sections II and III) but also the best case performance. Given the regression x, the smallest residual norm that can be achieved under the level of uncertainty α is defined by: ŝ(α; x) ≜ min{||Ax − b|| : A, b ∈ U(α)}. The smallest residual norm assesses how well an ideal observation within the level of uncertainty α may match the linear model with the regression x. In particular, ŝ(α; x) = 0 implies that there is an ideal observation within the info-gap model with uncertainty α that matches the linear model defined by the regression x exactly. The strategy proposed in this section is based on evaluating the smallest residual norm of the RS regressions when the level of uncertainty is the corresponding robustness.
The best performance that can be obtained by the regression x when there is no uncertainty is the nominal performance rnom(x) = ||Ãx − b̃||2. Better than nominal performance ρ < rnom(x) is referred to as windfall performance. Windfall performance cannot be achieved robustly, but as the level of uncertainty increases, the opportunity for its occurrence increases, leading to the notion of opportuneness defined next:
The opportuneness for obtaining a windfall performance ϕ, with the regression x is the smallest level of uncertainty for which the info-gap model includes an ideal observation with a smaller or equal residual norm:
For a given regression x, the opportuneness as a function of the windfall performance ϕ is termed the opportuneness curve.
The next Proposition evaluates the opportuneness when the uncertainties are described by the info-gap model of Eq. (1):
Given the available observations Ã, b̃, the opportuneness of the regression x for achieving the windfall performance ϕ under uncertainties in the observations described by the info-gap model of Eq. (1) is:
The internal minimization in Eq. (10) defines the smallest residual norm that may occur with regression x when the level of uncertainty is α. Lemma B (Appendix B) shows that the smallest residual error is given by:
Eq. (11) indicates that the opportuneness decreases linearly as the windfall performance increases, until it reaches zero at the nominal residual norm. Examples of opportuneness curves for the three Tikhonov regressions analyzed in Figure 1 are shown in Figure 2, along with the corresponding robustness curves. Each pair of robustness and opportuneness curves, for the same Tikhonov regression, emanates from the x-axis at the nominal performance ρ = ϕ = rnom(xμ).
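Mirroring the robustness curve, the opportuneness curve can be written down directly. The sketch below assumes, consistent with Eq. (11) and Lemma B, that the best-case residual is rnom(x) − α(||x||2 + ν):

```python
import numpy as np

def opportuneness(phi, x, A_t, b_t, nu):
    """Opportuneness curve of Eq. (10): the smallest uncertainty level at
    which the best-case residual r_nom(x) - alpha*(||x|| + nu) drops to the
    windfall performance phi. Zero for phi >= r_nom(x)."""
    r_nom = np.linalg.norm(A_t @ x - b_t)
    return max(0.0, (r_nom - phi) / (np.linalg.norm(x) + nu))
```

The curve decreases linearly in ϕ with slope −1/(||x|| + ν) and vanishes at the nominal performance.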
A windfall performance that is consistent with the RS regression xRS(ρ) is a windfall performance ϕRS whose opportuneness β̂(ϕRS; xRS(ρ)) is the same level of uncertainty as the robustness α̂(ρ; xRS(ρ)).
The consistent windfall performance of an RS regression assesses how well it may fit an ideal observation within the level of uncertainty that equals the corresponding robustness. It therefore provides a criterion for choosing the model-consistent robust-satisficing (MCRS) regression, as defined next:
The model-consistent robust-satisficing (MCRS) regression is the RS regression that minimizes the consistent windfall performance: xMCRS ≜ xRS(ρMCRS), where
That is, the MCRS regression is the robust-satisficing regression that, under the level of uncertainty corresponding to its robust-satisficing robustness, provides the best performance. Note that determining xMCRS does not require any assumption about the level of uncertainty or the critical performance, and the method is therefore parameter-free.
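The MCRS selection can be sketched numerically, working directly from the definitions rather than from the closed-form propositions below: for each candidate critical performance ρ, find the Tikhonov parameter maximizing the robustness, equate the opportuneness to that robustness to obtain the consistent windfall performance, and keep the minimizer. The linear worst/best-case residual bounds rnom ± α(||x|| + ν) are assumptions of this sketch, not the paper's derivation:

```python
import numpy as np

def mcrs(A_t, b_t, nu, rhos=None, mus=np.logspace(-4, 6, 400)):
    """Numerical model-consistent robust-satisficing selection: minimize the
    consistent windfall performance phi over the family of RS regressions."""
    K = A_t.shape[1]
    G, Atb = A_t.T @ A_t, A_t.T @ b_t
    xs = [np.linalg.solve(G + m * np.eye(K), Atb) for m in mus]
    r = np.array([np.linalg.norm(A_t @ x - b_t) for x in xs])  # nominal residuals
    w = np.array([np.linalg.norm(x) for x in xs]) + nu         # slopes ||x|| + nu
    if rhos is None:
        rhos = np.linspace(r.min() * 1.001, r.min() * 3, 200)
    best = (np.inf, None, None)
    for rho in rhos:
        i = int(np.argmax((rho - r) / w))      # RS regression for this rho
        alpha = max(0.0, (rho - r[i]) / w[i])  # its robustness
        phi = max(0.0, r[i] - alpha * w[i])    # consistent windfall performance
        if phi < best[0]:
            best = (phi, mus[i], xs[i])
    return best   # (phi_MCRS, mu_c, x_MCRS)
```

The grids for ρ and μ are illustrative; in practice they should bracket the expected Tikhonov parameter.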
For the info-gap model of Eq. (1), the RS regression is a Tikhonov regression xμ. The following Propositions develop: (i) the corresponding consistent windfall performance (Proposition 6), and (ii) the MCRS regression (Proposition 7).
Given the info-gap model of Eq. (1), the consistent windfall performance of the RS regression xRS(ρ) is:
where xμ is a Tikhonov regression with parameter μ satisfying Eq. (6).
According to Proposition 4, the robustness of a Tikhonov regression, α̂RS(μ) ≜ α̂(ρRS(μ); xμ), is given by Eq. (9). Assuming that α̂RS(μ) < β̂(ϕ = 0; xμ), the condition β̂(ϕRS; xμ) = α̂RS(μ) can be inverted to express ϕRS as the second term in Eq. (14). If α̂RS(μ) ≥ β̂(ϕ = 0; xμ), the robustness is higher than the opportuneness for zero windfall performance, so ϕRS(μ) = 0.
The consistent windfall performance is determined graphically in Figure 2 for each of the Tikhonov regressions analyzed there. The figure depicts the robustness and opportuneness curves for three Tikhonov regressions. The circle on each robustness curve marks the critical performance for which that regression is robust-satisficing and the resulting robustness. Assuming that the level of uncertainty equals the robustness, the corresponding opportuneness curve is crossed at β̂(ϕRS; xμ) = α̂RS(μ), as marked by the triangle on the matching opportuneness curve, thereby determining the consistent windfall performance for that RS regression. The three pairs of robustness and opportuneness curves in Figure 2 suggest that the consistent windfall performance reaches a minimum for some μ, as proved in the next proposition.
Given the info-gap model of Eq. (1), the MCRS regression is a Tikhonov regression with μ = μc given by:
According to Proposition 2, the RS regressions under uncertainties described by the info-gap model of Eq. (1) have the form of Tikhonov regressions. Lemma C in Appendix C states that the consistent windfall performance either becomes zero at some μ or reaches a single minimum. Hence, the MCRS regression is the Tikhonov regression with the parameter μc given by Eq. (15).
Note that when ϕRS(μ0) = 0 for some μ0, the unique MCRS regression defined with μc = min{μ | ϕRS(μ) = 0} is satisfied exactly by a possible ideal observation within the specified level of uncertainty α̂RS(μc).
Brain-machine interfaces (BMIs) are based on extracting movement-related signals from the neural activity of a large number of cortical neurons with the goal of restoring motor functions in severely paralyzed patients. The data used in this paper were recorded during the BMI experiments reported in . The experiments were conducted with monkeys that controlled the position of a cursor on a computer screen using either a hand-held pole (pole control) or a BMI (brain control). The last ten minutes of pole control were used to determine the LS regression between the measured velocity of hand movements and the neural activity. The resulting regression was subsequently used in brain control to generate real time predictions of the velocity from the recorded neural activity and direct the cursor accordingly.
The neural activity was represented by the spike counts in 100 msec bins, and the regression included ten lags of binned spike counts from each of the recorded neurons. The typical session analyzed here included 183 neurons, so the regression was based on 1831 inputs. Non-linear methods, including the Kalman filter and multi-layer feedforward artificial neural networks, were investigated off-line, and it was concluded that they could not consistently outperform the linear filter.
The L-curve shown in Figure 3 depicts the solution norm versus the residual norm for different Tikhonov regressions, based on the last ten minutes of pole control (training data). The L-curve is usually characterized by an “L” shape and the Tikhonov parameter is selected at its corner, by locating, for example, the point of maximum curvature. However, the L-curve in Figure 3 does not depict the characteristic L-shape, reflecting the low signal-to-noise ratio in the neural activity. Thus, the L-curve method is inadequate for choosing the proper Tikhonov parameter in BMI applications.
As expected from Proposition 7, the consistent windfall performance reaches a unique minimum as a function of the Tikhonov parameter, as demonstrated in Figure 4 (for the same 10 minutes of training data used in Figure 3). Based on a signal-to-noise ratio analysis, the neural activity is expected to be 2–3 orders of magnitude noisier than the velocity measurements, so the relative weight ν was set to ν = 0.001. The resulting minimum is reached at μ = 3600. Sensitivity analysis indicates that increasing or decreasing the relative weight ν by an order of magnitude has only a small effect on the chosen parameter, which remains in the range [3300, 3900].
Figure 5 compares the performance of different Tikhonov regressions for ten minutes of testing data during pole control and brain control. Whereas the LS algorithm minimizes the residual error for the training data (last ten minutes of pole control), alternative Tikhonov regressions outperform it on testing data from pole control (collected during the preceding ten minutes of pole control) and brain control (collected during the first ten minutes of brain control). Consider, in particular, the performance on testing data from brain control depicted in the lower panel of Figure 5. The LS regression results in a large residual norm, above 180, which is outside the scale. All the Tikhonov regressions with the Tikhonov parameter in the depicted range outperform the LS regression. However, while the Tikhonov regression selected using the L-curve reduces the residual norm to 78.5, the MCRS regression reduces the residual norm to 67.5, close to the minimum level that can be achieved by any Tikhonov regression.
The MCRS provides significant improvement in performance for both pole and, most importantly, brain control. For testing data from pole control, additional improvement in performance could be achieved with a Tikhonov parameter that is smaller than the one chosen by the model-consistency criterion. Thus, the level of uncertainty implied by the model-consistency criterion is higher than the level of perturbation in adjacent ten-minute records of the neural activity. However, when testing the performance on brain control, which is the critical application of the linear regression, it is evident that the method indeed captures the adequate level of uncertainty in the data, and provides close to optimal performance.
In this paper we considered the inverse problem of estimating the linear regression x from available observations of the dependent and independent variables. We developed an info-gap robust-satisficing approach to regression, which maximizes the robustness for obtaining a critical sub-optimal performance. For the particular info-gap model considered here, the resulting regression was shown to have the form of a Tikhonov regularized solution. This is the same regression that solves the dual min-max problem of optimizing the worst-case performance for a given level of uncertainty. We note that different info-gap models would result in different RS regressions (and min-max regressions) that are not necessarily the same as Tikhonov regressions, as demonstrated in .
The notion of robustness was applied in  to estimate a deterministic regression vector x from observations y = Ax + w, where A is known and w is additive noise. The robust regression for a given level of performance was shown to be the min-max regression for the corresponding level of uncertainty. It was concluded there that the resulting robustness/performance trade-off establishes an important design tool for choosing the regression based on both criteria. In our case the matrix A is not known, and only uncertain observations of its values are available. Furthermore, here we also use the notion of opportuneness to resolve the resulting trade-off and select a unique model-consistent robust-satisficing regression.
The info-gap approach emphasizes the trade-off between performance and robustness: increasing robustness is possible only by relinquishing performance. The proper trade-off is usually determined by presuming the knowledge of an associated design parameter: The min-max approach presumes a level of uncertainty; the robust-satisficing approach developed here presumes a level of critical performance; Tikhonov regression relies on the weight of the regression norm relative to the residual norm in the performance criterion. Thus, all these methods rely on proper selection of a design parameter: the level of uncertainty, the critical performance or the relative weight of the regression norm.
Here we introduced a new criterion, based on the consistency between the observations and the linear model, to resolve the robustness/performance trade-off and determine a unique parameter-free regression. The consistency of a robust-satisficing (RS) regression is assessed by the consistent windfall performance – the minimum residual error that the regression can achieve with an ideal observation that is consistent with the corresponding level of robustness. The consistent windfall performance has a unique minimum as a function of the Tikhonov parameter and thus can be used to determine a unique regression, the model-consistent robust-satisficing (MCRS) regression, and to assess the underlying level of uncertainty in the observations.
We demonstrated the model-consistent algorithm for choosing the Tikhonov parameter for the challenging application of neural decoding for brain-machine interfaces (BMIs). The MCRS provides significant improvement in performance for both pole and, most importantly, brain control.
The authors are pleased to acknowledge valuable comments by Yakov Ben-Haim and Yonina Eldar. This research was supported by the Abramson Center for the Future of Health and by the fund for the promotion of research at the Technion and by grants from DARPA and NIH to MALN.
For non-singular observations, i.e., 0 < ||Ãx − b̃|| for all x, the robustness has a unique global maximum for desired performance ρ > ||ÃxLS − b̃||, where xLS is the least-squares regression.
The gradient of the robustness can be expressed as:
At a point where the gradient vanishes, the Hessian of the robustness is given by:
Furthermore, at the point where the gradient of the robustness vanishes, the last term in (A.2) is zero. We will show that the sum of the remaining two matrices in the external parentheses is positive definite, and thus that, after multiplication by −1, the resulting Hessian is negative definite.
The first matrix can be expressed as:
The inner matrix in H1 has the form of Lemma A2 (see below), and hence all its eigen-values are positive except for a single eigen-value that is zero. The eigen-vector that corresponds to the zero eigen-value is Ãx − b̃. Considering the non-singular problem, Ãy is never parallel to Ãx − b̃ for any x and y (since otherwise there would exist χ such that Ãχy = Ãx − b̃, and hence Ã(x − χy) = b̃). Since no vector y is mapped into the single eigen-vector with zero eigen-value, H1 is positive definite. The second term in the Hessian has the form of Lemma A2 and hence is positive semi-definite. The sum of a positive definite matrix and a positive semi-definite matrix is positive definite. Finally, the multiplication by −1 results in a negative definite Hessian matrix.
A negative definite Hessian implies that any point at which the gradient of the robustness vanishes is a local maximum. However, since there are no local minima there can be only one local maximum, which is therefore the unique global maximum.
For any vector y ∈ ℝn, the matrix ||y||−1(I − yyT/||y||2) is positive semi-definite, with a single zero eigen-value whose corresponding eigen-vector is y.
The first matrix is full rank with n eigen-values ||y||−1. The second matrix has a single eigen-value equal to ||y||−1 with eigen-vector y. Hence their difference is a matrix of rank n − 1, with n − 1 positive eigen-values of ||y||−1 and one zero eigen-value, forming a positive semi-definite matrix. The eigen-vector with the zero eigen-value is y.
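The eigen-structure of Lemma A2 can be verified numerically. The matrix below, ||y||−1(I − yyT/||y||2), is a reconstruction consistent with the eigen-values stated above (an assumption, since the expression itself is elided in the text):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(6)
ny = np.linalg.norm(y)
# Candidate matrix with the eigen-structure stated in Lemma A2
M = (np.eye(6) - np.outer(y, y) / ny**2) / ny
vals = np.linalg.eigvalsh(M)                         # ascending eigen-values
assert np.min(vals) > -1e-12                         # positive semi-definite
assert np.sum(np.isclose(vals, 0, atol=1e-9)) == 1   # single zero eigen-value
assert np.allclose(M @ y, 0)                         # y spans the null space
assert np.allclose(vals[1:], 1 / ny)                 # remaining eigen-values ||y||^{-1}
```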
The triangle inequality implies that, given A, b ∈ U(α):
and equality is achieved for Abest = Ã − α(Ãx − b̃)xT/(||Ãx − b̃|| ||x||) and bbest = b̃ + να(Ãx − b̃)/||Ãx − b̃||. Since Abest, bbest ∈ U(α), the minimum of ||Ax − b|| with respect to all A, b ∈ U(α) is ||Ãx − b̃|| − α||x|| − να, or zero if this is negative.
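The equality case of Lemma B can be checked numerically. The perturbations below are a reconstruction of the elided Abest, bbest expressions (an assumption): a rank-one tilt of Ã against the residual direction and a shift of b̃ toward it, both on the boundary of U(α):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, alpha, nu = 15, 3, 0.2, 0.5
A_t = rng.standard_normal((N, K))
b_t = rng.standard_normal(N)
x = rng.standard_normal(K)

res = A_t @ x - b_t
r_nom, nx = np.linalg.norm(res), np.linalg.norm(x)
# Reconstructed best-case observations within U(alpha)
A_best = A_t - alpha * np.outer(res, x) / (r_nom * nx)
b_best = b_t + nu * alpha * res / r_nom
assert np.isclose(np.linalg.norm(A_best - A_t, 2), alpha)   # on the boundary
assert np.isclose(np.linalg.norm(b_best - b_t), nu * alpha)
r_best = np.linalg.norm(A_best @ x - b_best)
assert np.isclose(r_best, max(0.0, r_nom - alpha * nx - nu * alpha))
```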
For the info-gap model of Eq. (1), the consistent windfall performance either becomes zero or reaches a single minimum as a function of the Tikhonov parameter μ; i.e., when ϕRS(μ) > 0 for all μ, the minimizing parameter μc is uniquely defined.
The consistent windfall performance is given by Eq. (14): ϕRS(μ) = max{0, rnom(xμ) − α̂RS(μ)(||xμ|| + ν)}. Consider the case when ϕRS(μ) > 0 for all μ; then the derivative of ϕRS(μ) with respect to μ is given by:
Noting that for a Tikhonov regression xμ = (ÃT Ã + μI)−1 ÃT b̃, we have ÃT(Ãxμ − b̃) = −μxμ, the gradient of ϕRS(μ) with respect to xμ is:
The derivative of xμ with respect to μ is given by:
Since while , it follows that ϕRS (μ) reaches a local minimum in at least one point.
At the point where the first derivative vanishes, the second derivative of ϕRS (μ) can be evaluated as:
where g(μ) is the numerator of the derivative in (B.4), i.e.,
The second derivative can be expressed as:
The positive second derivative in (B.7) indicates that the extreme points of ϕRS(μ) are local minima. Since no extreme point is a local maximum, there can be only one local minimum, and hence μc is unique.