HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
 
Econometrica. Author manuscript; available in PMC 2010 June 29.
Published in final edited form as:
Econometrica. 2010 May 1; 78(3): 1031–1092.
doi:  10.3982/ECTA7245
PMCID: PMC2893418
NIHMSID: NIHMS163683

Optimal Mandates and The Welfare Cost of Asymmetric Information: Evidence from The U.K. Annuity Market*

Abstract

Much of the extensive empirical literature on insurance markets has focused on whether adverse selection can be detected. Once detected, however, there has been little attempt to quantify its welfare cost, or to assess whether and what potential government interventions may reduce these costs. To do so, we develop a model of annuity contract choice and estimate it using data from the U.K. annuity market. The model allows for private information about mortality risk as well as heterogeneity in preferences over different contract options. We focus on the choice of length of guarantee among individuals who are required to buy annuities. The results suggest that asymmetric information along the guarantee margin reduces welfare relative to a first best symmetric information benchmark by about £127 million per year, or about 2 percent of annuitized wealth. We also find that by requiring that individuals choose the longest guarantee period allowed, mandates could achieve the first-best allocation. However, we estimate that other mandated guarantee lengths would have detrimental effects on welfare. Since determining the optimal mandate is empirically difficult, our findings suggest that achieving welfare gains through mandatory social insurance may be harder in practice than simple theory may suggest.

Keywords: Annuities, contract choice, adverse selection, structural estimation

1. INTRODUCTION

Ever since the seminal works of Akerlof (1970) and Rothschild and Stiglitz (1976), a rich theoretical literature has emphasized the negative welfare consequences of adverse selection in insurance markets and the potential for welfare-improving government intervention. More recently, a growing empirical literature has developed ways to detect whether asymmetric information exists in particular insurance markets (Chiappori and Salanie (2000), Finkelstein and McGarry (2006)). Once adverse selection is detected, however, there has been little attempt to estimate the magnitude of its efficiency costs, or to compare welfare in the asymmetric information equilibrium to what would be achieved by potential government interventions. In an attempt to start filling this gap, this paper develops an empirical approach that can quantify the efficiency cost of asymmetric information and the welfare consequences of government intervention.1

We apply our approach to the semi-compulsory market for annuities in the United Kingdom. Individuals who have accumulated funds in tax-preferred retirement saving accounts (the equivalents of an IRA or 401(k) in the United States) are required to annuitize their accumulated lump sum balances at retirement. These annuity contracts provide a survival-contingent stream of payments. As a result of these requirements, there is a sizable volume in the market. In 1998, new funds annuitized in this market totalled £6 billion (Association of British Insurers (1999)).

Although they are required to annuitize their balances, individuals are allowed choice in their annuity contract. In particular, they can choose from among guarantee periods of 0, 5, or 10 years. During a guarantee period, annuity payments are made (to the annuitant or to his estate) regardless of the annuitant's survival. The choice of a longer guarantee period comes at the cost of lower annuity payments while alive. When annuitants and insurance companies have symmetric information about an annuitant's mortality rate, a longer guarantee is more attractive to an annuitant who cares more about their wealth when they die relative to consumption while alive; as a result, the first-best guarantee length may differ across annuitants. When annuitants have private information about their mortality rate, a longer guarantee period is also more attractive, all else equal, to individuals who are likely to die sooner. This is the source of adverse selection, which can affect the equilibrium price of guarantees and thereby distort guarantee choices away from the first-best symmetric information allocation.

The pension annuity market provides a particularly interesting setting in which to explore the welfare costs of asymmetric information and the welfare consequences of potential government intervention. Annuity markets have attracted increasing attention and interest as Social Security reform proposals have been advanced in various countries. Some proposals call for partly or fully replacing government-provided defined benefit, pay-as-you-go retirement systems with defined contribution systems in which individuals would accumulate assets in individual accounts. In such systems, an important question concerns whether the government would require individuals to annuitize some or all of their balance, and whether it would allow choice over the type of annuity product purchased. The relative attractiveness of these various options depends critically on consumer welfare in each alternative allocation.

In addition to their substantive interest, several features of annuities make them a particularly attractive setting for our purpose. First, adverse selection has already been detected and documented in this market along the choice of guarantee period, with private information about longevity affecting both the choice of contract and its price in equilibrium (Finkelstein and Poterba (2004, 2006)). Second, annuities are relatively simple and clearly defined contracts, so that modeling the contract choice requires less abstraction than in other insurance settings. Third, the case for moral hazard in annuities is arguably less compelling than for other forms of insurance; our ability to assume away moral hazard substantially simplifies the empirical analysis.

We develop a model of annuity contract choice and use it, together with individual-level data on annuity choices and subsequent mortality outcomes from a large annuity provider, to recover the joint distribution of individuals’ (unobserved) risk and preferences. Using this joint distribution and the annuity choice model, we compute welfare at the observed allocation, as well as allocations and welfare for counterfactual scenarios. We compare welfare under the observed asymmetric information allocation to what would be achieved under the first-best, symmetric information benchmark; this comparison provides our measure of the welfare cost of asymmetric information. We also compare equilibrium welfare to what would be obtained under mandatory social insurance programs; this comparison sheds light on the potential for welfare improving government intervention.

Our empirical object of interest is the joint distribution of risk and preferences. To estimate it, we rely on two key modeling assumptions. First, to recover mortality risk we assume that mortality follows a mixed proportional hazard model. Individuals’ mortality tracks their own individual-specific mortality rates, allowing us to recover the extent of heterogeneity in (ex-ante) mortality rates from (ex-post) information about mortality realization. Second, to recover preferences, we use a standard dynamic model of consumption by retirees. In our baseline model we assume that retirees perfectly know their (ex-ante) mortality rate, which governs their stochastic time of death. This model allows us to evaluate the (ex-ante) value-maximizing choice of a guarantee period as a function of ex ante mortality rate and preferences for wealth at death relative to consumption while alive.

Given the above assumptions, the parameters of the model are identified from the variation in mortality and guarantee choices in the data, and in particular from the correlation between them. However, no modeling assumptions are needed to establish the existence of private information about the individual's mortality rate. This is apparent from the existence of (conditional) correlation between guarantee choices and ex post mortality in the data. Given the annuity choice model, rationalizing the observed choices with only variation in mortality risk is hard. Indeed, our findings suggest that unobserved mortality risk and preferences are both important determinants of the equilibrium insurance allocations.

We measure welfare in a given annuity allocation as the average amount of money an individual would need to make him as well off without the annuity as with his annuity allocation and his pre-existing wealth. We also examine the optimal government mandate among the currently existing guarantee options of 0, 5, or 10 years. In a standard insurance setting – that is, when all individuals are risk averse, the utility function is state-invariant, and there are no additional costs of providing insurance – it is well-known that mandatory (uniform) full insurance can achieve the first best allocation, even when individuals vary in their preferences. In contrast, we naturally view annuity choices as governed by two different utility functions, one from consumption when alive and one from wealth when dead. In such a case, whether any mandatory guarantee can produce welfare gains relative to the adverse selection equilibrium, and if so which one, is not clear without more information on the cross-sectional distribution of preferences and mortality risk. The investigation of the optimal mandate – and whether it can produce welfare gains relative to the adverse selection equilibrium – therefore becomes an empirical question.

While caution should always be exercised when extrapolating estimates from a relatively homogeneous subsample of annuitants of a single firm to the market as a whole, our baseline results suggest that a mandatory social insurance program that required individuals to purchase a 10 year guarantee would increase welfare by about £127 million per year or £423 per new annuitant, while one that requires annuities to provide no guarantee would reduce welfare by about £107 million per year or £357 per new annuitant. Since determining which mandates would be welfare improving is empirically difficult, our results suggest that achieving welfare gains through mandatory social insurance may be harder in practice than simple theory would suggest. We also estimate welfare in a symmetric information, first-best benchmark. We find that the welfare cost of asymmetric information within the annuity market along the guarantee margin is about £127 million per year, £423 per new annuitant, or about two percent of the annuitized wealth in this market. Thus, we estimate that not only is a 10 year guarantee the optimal mandate, but also that it achieves the first best allocation.

To put these welfare estimates in context given the margin of choice, we benchmark them against the maximum money at stake in the choice of guarantee. This benchmark is defined as the additional (ex-ante) amount of wealth required to ensure that if individuals were forced to buy the policy with the least amount of insurance, they would be at least as well off as they had been. We estimate that the maximum money at stake in the choice of guarantee is only about 8 percent of the annuitized amount. Our estimates therefore imply that the welfare cost of asymmetric information is about 25 percent of this maximum money at stake.
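The magnitudes quoted in the introduction can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text (the £127 million and £423 welfare estimates, the £6 billion of new funds annuitized in 1998, and the 8 percent maximum money at stake); the variable names are ours, and dividing by the 1998 market size is an assumption about how the "about 2 percent" figure can be approximated, not a description of the authors' calculation.

```python
# Figures quoted in the text (baseline estimates).
welfare_cost_total = 127e6          # pounds per year
welfare_cost_per_annuitant = 423.0  # pounds per new annuitant
funds_annuitized_1998 = 6e9         # pounds of new funds annuitized in 1998
max_money_at_stake_share = 0.08     # share of the annuitized amount

# Number of new annuitants per year implied by the two welfare figures.
implied_new_annuitants = welfare_cost_total / welfare_cost_per_annuitant

# Welfare cost as a share of annuitized wealth (assumes the 1998 market size).
cost_share_of_wealth = welfare_cost_total / funds_annuitized_1998

# Welfare cost relative to the maximum money at stake in the guarantee choice.
cost_share_of_stake = cost_share_of_wealth / max_money_at_stake_share

print(f"implied new annuitants per year: {implied_new_annuitants:,.0f}")
print(f"welfare cost / annuitized wealth: {cost_share_of_wealth:.1%}")
print(f"welfare cost / maximum money at stake: {cost_share_of_stake:.1%}")
```

With these inputs the cost comes to roughly 2 percent of annuitized wealth and about a quarter of the maximum money at stake, in line with the rounded figures in the text (2 percent out of 8 percent is 25 percent).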

Our welfare analysis is based on a model of annuity demand. This requires assumptions about the nature of the utility functions that govern annuity choice, as well as assumptions about the expectations individuals form regarding their subsequent mortality outcomes. Data limitations, particularly lack of detail on annuitants' wealth, necessitate additional modeling assumptions. Finally, our approach requires several other parametric assumptions for operational and computational reasons. The assumptions required for our welfare analysis are considerably stronger than those that have been used in prior work to test whether or not asymmetric information exists. This literature has tested for the existence of private information by examining the correlation between insurance choice and ex-post risk realization (Chiappori and Salanie (2000)). Indeed, the existing evidence of adverse selection along the guarantee choice margin in our setting comes from examining the correlation between guarantee choice and ex-post mortality (Finkelstein and Poterba (2004)). By contrast, our effort to move from testing for asymmetric information to quantifying its welfare implications requires considerably stronger modeling assumptions. Our comfort with this approach is motivated by a general “impossibility” result which we illustrate in the working paper version (Einav, Finkelstein, and Schrimpf (2007)): even when asymmetric information is known to exist, the reduced form equilibrium relationship between insurance coverage and risk occurrence does not permit inference about the efficiency cost of this asymmetric information without strong additional assumptions.

Of course, a critical question is how important our particular assumptions are for our central results regarding welfare. We therefore explore a range of possible alternatives, both for the appropriate utility model and for our various parametric assumptions. We are reassured that our central results are quite stable. In particular, the finding that the 10 year guarantee is the optimal mandate, and achieves virtually the same welfare as the first best outcome, persists under all the alternative specifications that we have tried. However, the quantitative estimates of the welfare cost of adverse selection can vary with the modeling assumptions by a nontrivial amount; more caution should therefore be exercised in interpreting these quantitative estimates.

The rest of the paper proceeds as follows. Section 2 describes the environment and the data. Section 3 describes the model of guarantee choice, presents its identification properties, and discusses estimation. Section 4 presents our parameter estimates and discusses their in-sample and out-of-sample fit. Section 5 presents the implications of our estimates for the welfare costs of asymmetric information in this market, as well as the welfare consequences of potential government policies. The robustness of the results is explored in Section 6. Section 7 concludes by briefly summarizing our findings and discussing how the approach we develop can be applied in other insurance markets, including those where moral hazard is likely to be important.

2. DATA AND ENVIRONMENT

Environment

All of the annuitants we study are participants in the semi-compulsory market for annuities in the U.K. In other words, they have saved for retirement through tax-preferred defined contribution private pensions (the equivalents of an IRA or 401(k) in the United States) and are therefore required to annuitize virtually all of their accumulated balances.2 They are, however, offered a choice over the nature of their annuity product. We focus on the choice of the length of the guarantee period, during which annuity payments are made (to the annuitant or to his estate) regardless of annuitant survival. Longer guarantees therefore trade lower annuity payments in every period the annuitant is alive for payments in the event that the annuitant dies during the guarantee period.

The compulsory annuitization requirement is known to individuals at the time (during working age) that they make their pension savings contributions, although of course the exact nature of the annuity products (and their pricing) that will be available when they have to annuitize is uncertain. Choices over annuity products are only made at the time of conversion of the lump-sum defined contribution balances to an annuity and are based on the products and annuity rates available at that time.

All of our analysis takes the pension contribution decisions of the individual during the accumulation phase (as well as their labor supply decisions) as given. In other words, in our analysis of welfare under counterfactual pricing of the guarantee options, we do not allow for the possibility that the pre-annuitization savings and labor supply decisions may respond endogenously to the change in guarantee pricing. This is standard practice in the annuity literature (Brown (2001), Davidoff, Brown, and Diamond (2005), and Finkelstein, Poterba, and Rothschild (2009)). In our context, we do not think it is a particularly heroic assumption. For one thing, as we will discuss in more detail in Section 5.1, the maximum money at stake in the choice over guarantee is only about 8 percent of annuitized wealth under the observed annuity rates (and only about half that amount under the counterfactual rates we compute); this should limit any responsiveness of preannuitization decisions to guarantee pricing. Moreover, many of these decisions are made decades before annuitization and therefore presumably factor in considerable uncertainty (and discounting) of future guarantee prices.

Data and descriptive statistics

We use annuitant-level data from one of the largest annuity providers in the U.K. The data contain each annuitant's guarantee choice, several demographic characteristics (including everything on which annuity rates are based), and subsequent mortality. The data consist of all annuities sold between 1988 and 1994 for which the annuitant was still alive on January 1, 1998. We observe age (in days) at the time of annuitization, the gender of the annuitant, and the subsequent date of death if the annuitant died before the end of 2005.

For analytical tractability, we make a number of sample restrictions. In particular, we restrict our sample to annuitants who purchased at age 60 or 65 (the modal purchase ages), and who purchased a single life annuity (that insures only his or her own life) with a constant (nominal) payment profile.3 Finally, the main analysis focuses on the approximately two-thirds of annuitants in our sample who purchased an annuity with a pension fund that they had accumulated within our company; in Section 6 we re-estimate the model for the remaining individuals who had brought in external funds. Appendix A discusses these various restrictions in more detail; they are made so that we can focus on the purchase decisions of a relatively homogeneous subsample.

Table I presents summary statistics for the whole sample and for each of the four age-gender combinations. Our baseline sample consists of over 9,000 annuitants. Sample sizes by age and gender range from a high of almost 5,500 for 65 year old males to a low of 651 for 65 year old females. About 87 percent of annuitants choose a 5 year guarantee period, 10 percent choose no guarantee, and only 3 percent choose the 10 year guarantee. These are the only three options available to annuitants in our sample and the focus of our subsequent analysis.

Table I
Summary statistics

Given our sample construction described above, our mortality data are both left-truncated and right-censored, and cover mortality outcomes over an age range of 63 to 83. About one-fifth of our sample dies between 1998 and 2005. As expected, death is more common among men than women, and among those who purchase at older ages.

There is a general pattern of higher mortality among those who purchase 5 year guarantees than those who purchase no guarantees, but no clear pattern (possibly due to the smaller sample size) of mortality differences for those who purchase 10 year guarantees relative to either of the other two options. This mortality pattern as a function of guarantee persists in more formal hazard modeling that takes account of the left truncation and right censoring of the data (not shown).4
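To make concrete what "taking account of left truncation and right censoring" means for a hazard model, here is a minimal sketch of one individual's log-likelihood contribution. It assumes a Gompertz hazard alpha * exp(lam * t), which is our illustrative choice and is not specified in this section; the function and argument names are hypothetical, and unobserved heterogeneity is ignored.

```python
import math

def cum_hazard(alpha, lam, t):
    """Integrated Gompertz hazard from 0 to t for hazard alpha * exp(lam * t)."""
    return alpha / lam * (math.exp(lam * t) - 1.0)

def log_lik_contribution(alpha, lam, entry_age, exit_age, died):
    """Log-likelihood of one spell observed from entry_age to exit_age.

    entry_age: age (measured from a base age) at which the person enters the
               observation window; conditioning on survival to this age
               handles left truncation.
    exit_age:  age at death, or at the end of the window if still alive.
    died:      True if exit_age is a death, False if it is a censoring time.
    """
    # Survival from entry to exit, conditional on being alive at entry.
    ll = -(cum_hazard(alpha, lam, exit_age) - cum_hazard(alpha, lam, entry_age))
    if died:
        # A death also contributes the log hazard at the time of death.
        ll += math.log(alpha) + lam * exit_age
    return ll

# Example: someone entering the window 5 years after the base age, censored 8 years later.
example = log_lik_contribution(alpha=0.01, lam=0.1, entry_age=5.0, exit_age=13.0, died=False)
```

In the data described above, the entry age would correspond to the annuitant's age on January 1, 1998 and the exit age to the age at death or at the end of 2005.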

As discussed in the introduction, the existence of a (conditional) correlation between guarantee choice and mortality – such as the higher mortality experienced by purchasers of the 5 year guarantee relative to purchasers of no guarantee – indicates the presence of private information about individual mortality risk in our data, and motivates our exercise. That is, this correlation between mortality outcomes and guarantee choices rules out a model in which individuals have no private information about their idiosyncratic mortality rates, and guides our modeling assumptions in the next section, which allow individuals to make their guarantee choices based on information about their idiosyncratic mortality rate.

Annuity rates

The company supplied us with the menu of annuity rates, that is, the annual annuity payment per £1 of the annuitized amount. These rates are determined by the annuitant's gender, age at the time of purchase, and the date of purchase; there are essentially no quantity discounts.5 All of these components of the pricing structure are in our data.

Table II shows the annuity rates by age and gender for different guarantee choices from January 1992; these correspond to roughly the middle of the sales period we study (1988-1994) and are roughly in the middle of the range of rates over the period. Annuity rates decline, of course, with the length of guarantee. Thus, for example, a 65 year old male in 1992 faced a choice among a 0 guarantee with an annuity rate of 0.133, a 5 year guarantee with a rate of 0.1287, and a 10 year guarantee with a rate of 0.1198. The magnitude of the rate differences across guarantee options closely tracks expected mortality. For example, our mortality estimates (discussed later) imply that for 60 year old females the probability of dying within a guarantee period of 5 and 10 years is about 4.3 and 11.4 percent, respectively, while for 65 year old males these probabilities are about 7.4 and 18.9 percent. Consequently, as shown in Table II, the annuity rate differences across guarantee periods are much larger for 65 year old males than they are for 60 year old females.
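As a small worked example of the trade-off just described, the sketch below computes how much of the no-guarantee payment a 65 year old male gives up by choosing each guarantee, using the January 1992 rates quoted in the text. The dictionary and variable names are ours; only the one menu row quoted in the paragraph is used.

```python
# January 1992 annuity rates for a 65 year old male, as quoted in the text
# (annual payment per 1 pound annuitized), keyed by guarantee length in years.
rates_male_65 = {0: 0.133, 5: 0.1287, 10: 0.1198}

# Proportional reduction in the annual payment relative to no guarantee.
base = rates_male_65[0]
payment_cut = {g: (base - rate) / base for g, rate in rates_male_65.items()}

for g, cut in payment_cut.items():
    print(f"{g:>2}-year guarantee: payment lower by {cut:.1%}")
```

The 5 and 10 year guarantees reduce the annual payment by roughly 3 and 10 percent, respectively, for this group.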

Table II
Annuity payment rates

The firm did not change the formula by which it sets annuity rates over our sample of annuity sales. Changes in nominal payment rates over time reflect changes in interest rates. To use such variation in annuity rates for estimation would require assumptions about how the interest rate that enters the individual's value function covaries with the interest rate faced by the firm, and whether the individual's discount rate covaries with these interest rates. Absent any clear guidance on these issues, we analyze the guarantee choice with respect to one particular menu of annuity rates. For our baseline model we use the January 1992 menu shown in Table II. In the robustness analysis, we show that the welfare estimates are virtually identical if we choose pricing menus from other points in time; this is not surprising since the relative payouts across guarantee choices are quite stable over time. For this reason, the results hardly change if we instead estimate a model with time-varying annuity rates, but constant discount factor and interest rate faced by annuitants (not reported).

Representativeness

Although the firm whose data we analyze is one of the largest U.K. annuity sellers, a fundamental issue when using data from a single firm is how representative it is of the market as a whole. We obtained details on market-wide practices from Moneyfacts (1995), Murthi, Orszag, and Orszag (1999), and Finkelstein and Poterba (2002).

On all dimensions we are able to observe, our sample firm appears typical of the industry as a whole. The types of contracts it offers are standard for this market. In particular, like all major companies in this market during our time period, it offers a choice of 0, 5, and 10 year guaranteed, nominal annuities.

The pricing practices of the firm are also typical. The annuitant characteristics that the firm uses in setting annuity rates (described above) are standard in the market. In addition, the level of annuity rates in our sample firm's products closely matches industry-wide averages.

While market-wide data on characteristics of annuitants and the contracts they choose are more limited, the available data suggest that the annuitants in this firm and the contracts they choose are typical of the market. In our sample firm, the average age of purchase is 62, and 59 percent of purchasers are male. The vast majority of annuities purchased pay a constant nominal payment stream (as opposed to one that escalates over time), and provide a guarantee, of which the 5 year guarantee is by far the most common.6 These patterns are quite similar to those in another large firm in this market analyzed by Finkelstein and Poterba (2004), as well as to the reported characteristics of the broader market as described by Murthi, Orszag, and Orszag (1999).

Finally, the finding in our data of a higher mortality rate among those who choose a 5 year guarantee than those who choose no guarantee is also found elsewhere in the market. Finkelstein and Poterba (2004) present similar patterns for another firm in this market, and Finkelstein and Poterba (2002) present evidence on annuity rates that is consistent with such patterns for the industry as a whole.

Thus, while caution must always be exercised in extrapolating from a single firm, the available evidence suggests that the firm appears to be representative – both in the nature of the contracts it offers and its consumer pool – of the entire market.

3. MODEL: SPECIFICATION, IDENTIFICATION, AND ESTIMATION

We start by discussing a model of guarantee choice for a particular individual. We then complete the empirical model by describing how (and over which dimensions) we allow for heterogeneity. We finish this section by discussing the identification of the model, our parameterization, and the details of the estimation.

3.1. A model of guarantee choice

We consider the utility-maximizing guarantee choice of a fully rational, forward looking, risk averse, retired individual, with an accumulated stock of wealth, stochastic mortality, and time-separable utility. This framework has been widely used to model annuity choices (Kotlikoff and Spivak (1981), Mitchell, Poterba, Warshawsky, and Brown (1999), Davidoff, Brown, and Diamond (2005)). At the time of the decision, the age of the individual is $t_0$, and he expects a random length of life7 characterized by a mortality hazard $\kappa_t$ during period $t > t_0$.8 We also assume that there exists time $T$ after which individual $i$ expects to die with probability one.

Individuals obtain utility from two sources. When alive, they obtain flow utility from consumption. When dead, they obtain a one-time utility that is a function of the value of their assets at the time of death. In particular, if the individual is alive as of the beginning of period $t \le T$, his period $t$ utility, as a function of his current wealth $w_t$ and his consumption plan $c_t$, is given by

$$v(w_t, c_t) = (1 - \kappa_t)\,u(c_t) + \kappa_t\,b(w_t), \tag{1}$$

where u(·) is his utility from consumption and b(·) is his utility from wealth remaining after death. A positive valuation for wealth at death may stem from a number of possible underlying structural preferences, such as a bequest motive (Sheshinski (2006)) or a “regret” motive (Braun and Muermann (2004)). Since the exact structural interpretation is not essential for our goal, we remain agnostic about it throughout the paper.

In the absence of an annuity, the optimal consumption plan can be computed by solving the following program:

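$$V_t^{NA}(w_t) = \max_{c_t \ge 0}\Big[(1 - \kappa_t)\big(u(c_t) + \delta V_{t+1}^{NA}(w_{t+1})\big) + \kappa_t\,b(w_t)\Big] \quad \text{s.t.} \quad w_{t+1} = (1 + r)(w_t - c_t) \ge 0 \tag{2}$$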

where $\delta$ is the per-period discount rate and $r$ is the per-period real interest rate. The constraint $w_{t+1} \ge 0$ reflects the standard assumption that, due to mortality risk, the individual cannot borrow against the future. Since death is expected with probability one after period $T$, the terminal condition for the program is given by $V_{T+1}^{NA}(w_{T+1}) = b(w_{T+1})$.

Suppose now that the individual annuitizes a fraction $\eta$ of his initial wealth, $w_0$. Broadly following the institutional framework discussed earlier, individuals take the (mandatory) annuitized wealth as given. In exchange for paying $\eta w_0$ to the annuity company at $t = t_0$, the individual receives a per-period real payout of $z_t$ when alive. Thus, the individual solves the same problem as above, with two small modifications. First, initial wealth is given by $(1 - \eta) w_0$. Second, the budget constraint is modified to reflect the additional annuity payments $z_t$ received every period.

For a given annuitized amount $\eta w_0$, consider a choice from a set $G \subseteq [0, T]$ of possible guarantee lengths; during the guaranteed period, the annuity payments are not survival-contingent. Each guarantee length $g \in G$ corresponds to a per-period payout stream of $z_t(g)$, which is decreasing in $g$ ($\partial z_t(g) / \partial g < 0$ for any $t \ge t_0$). For each $g$, the optimal consumption plan can be computed by solving

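$$V_t^{A(g)}(w_t) = \max_{c_t \ge 0}\Big[(1 - \kappa_t)\big(u(c_t) + \delta V_{t+1}^{A(g)}(w_{t+1})\big) + \kappa_t\,b\big(w_t + Z_t(g)\big)\Big] \quad \text{s.t.} \quad w_{t+1} = (1 + r)\big(w_t + z_t(g) - c_t\big) \ge 0 \tag{3}$$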

where $Z_t(g) = \sum_{\tau = t}^{t_0 + g} \left(\frac{1}{1 + r}\right)^{\tau - t} z_\tau(g)$ is the present value of the remaining guaranteed payments. As before, since after period $T$ death is certain and guaranteed payments stop for sure (recall, $G \subseteq [0, T]$), the terminal condition for the program is given by $V_{T+1}^{A(g)}(w_{T+1}) = b(w_{T+1})$.

The optimal guarantee choice is then given by

$$g^{*} = \arg\max_{g \in G}\Big\{V_{t_0}^{A(g)}\big((1 - \eta) w_0\big)\Big\}. \tag{4}$$

Information about the annuitant's guarantee choice combined with the assumption that this choice was made optimally thus provides information about the annuitant's underlying preference and expected mortality parameters. Intuitively, everything else equal, a longer guarantee will be more attractive for individuals with higher mortality rate and for individuals who obtain greater utility from wealth after death. We later check that this intuition in fact holds in the context of the specific parametrized model we estimate.
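The programs (3) and (4) can be solved numerically by backward induction. The following is a minimal sketch, not the authors' implementation: it discretizes wealth on a grid, restricts next-period wealth to grid points, normalizes $t_0 = 0$ and initial wealth to 1, and uses CRRA forms for $u$ and $b$ with a bequest weight `beta`. All parameter values (`gamma`, `delta`, `r`, `eta`, `beta`, the mortality vector `kappa`) are illustrative assumptions; the payout rates are the nominal January 1992 rates for 65 year old males quoted in Section 2, treated here as constant real payouts for simplicity. The summation limits for the guaranteed present value follow the formula for $Z_t(g)$ as printed.

```python
import numpy as np

def crra(x, gamma):
    """CRRA utility x^(1-gamma)/(1-gamma), for gamma != 1."""
    return x ** (1.0 - gamma) / (1.0 - gamma)

def guaranteed_pv(t, g, z, r):
    """Z_t(g): value at t of the payments still guaranteed, tau = t..g (t0 = 0)."""
    taus = np.arange(t, g + 1)  # empty once t > g, so Z_t(g) = 0 after the guarantee
    return float(np.sum(z / (1.0 + r) ** (taus - t)))

def value_with_guarantee(g, z, w_start, kappa, beta, gamma, delta, r, grid):
    """Backward induction for V^{A(g)} at t0, evaluated at wealth w_start."""
    v_next = beta * crra(grid, gamma)  # terminal condition V_{T+1}(w) = b(w)
    for t in range(len(kappa) - 1, -1, -1):
        z_pv = guaranteed_pv(t, g, z, r)
        v_now = np.empty_like(grid)
        for i, w in enumerate(grid):
            cash = w + z  # wealth plus this period's annuity payment
            # Next-period wealth is chosen on the grid and must leave c > 0.
            n = int(np.searchsorted(grid, (1.0 + r) * cash, side="left"))
            c = cash - grid[:n] / (1.0 + r)
            alive = np.max(crra(c, gamma) + delta * v_next[:n])
            dead = beta * crra(w + z_pv, gamma)  # estate includes guaranteed payments
            v_now[i] = (1.0 - kappa[t]) * alive + kappa[t] * dead
        v_next = v_now
    return float(np.interp(w_start, grid, v_next))

def optimal_guarantee(rates, eta, kappa, beta, gamma=3.0, delta=0.96, r=0.04):
    """Return the value-maximizing guarantee length and the value of each option."""
    grid = np.linspace(0.01, 3.0, 120)  # wealth grid; initial wealth normalized to 1
    values = {
        g: value_with_guarantee(g, eta * rate, 1.0 - eta, kappa, beta, gamma, delta, r, grid)
        for g, rate in rates.items()
    }
    return max(values, key=values.get), values

# Payout per unit annuitized, by guarantee length (65 year old males, January 1992).
rates = {0: 0.133, 5: 0.1287, 10: 0.1198}
# Hypothetical per-period death probabilities rising with age over 31 periods.
kappa = np.clip(0.015 * np.exp(0.1 * np.arange(31)), 0.0, 0.99)

best_g, values = optimal_guarantee(rates, eta=0.2, kappa=kappa, beta=5.0)
print("chosen guarantee:", best_g)
```

One sanity check on the design: with `beta=0` wealth at death carries no value, so the guarantee only lowers the payout and the sketch selects the 0 year guarantee.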

3.2. Modeling heterogeneity

To obtain our identification result in the next section, we make further assumptions that allow only one-dimensional heterogeneity in mortality risk and one-dimensional heterogeneity in preferences across different individuals in the above model.

We allow for one-dimensional heterogeneity in mortality risk by using a mixed proportional hazard (MPH) model. That is, we assume that the mortality hazard rate of individual i at time t is given by

$$\theta_{it} \equiv \lim_{dt \to 0} \frac{\Pr\big(m_i \in [t, t + dt) \mid x_i, m_i \ge t\big)}{dt} = \alpha_i\,\theta_0(x_i)\,\psi(t) \tag{5}$$

where $m_i$ denotes the realized mortality date, $\psi(t)$ the baseline hazard rate, $x_i$ is an observable that shifts the mortality rate, and $\alpha_i \in \mathbb{R}_+$ represents unobserved heterogeneity. We also assume that individuals have perfect information about this stochastic mortality process; that is, we assume that individuals know their $\theta_{it}$'s. This allows us to integrate over this continuous hazard rate to obtain the vector $\kappa^i \equiv (\kappa_t^i)_{t = t_0}^{T}$ that enters the guarantee choice model.
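The step from the continuous hazard to the per-period death probabilities can be sketched as follows. The example assumes a Gompertz baseline, $\psi(t) = \exp(\lambda t)$, which is our illustrative choice and is not specified in this section; the observable shifter $\theta_0(x_i)$ is folded into `alpha`, and the function name is hypothetical.

```python
import math

def death_probabilities(alpha, lam, t0, T):
    """Per-period death probabilities kappa_t for t = t0..T, conditional on being alive at t.

    Assumes the hazard alpha * exp(lam * t) (Gompertz baseline, an assumption),
    with time measured in periods from a base age.
    """
    kappas = []
    for t in range(t0, T + 1):
        # Hazard integrated over the period [t, t + 1).
        integrated = alpha / lam * (math.exp(lam * (t + 1)) - math.exp(lam * t))
        # Probability of dying during the period, given survival to its start.
        kappas.append(1.0 - math.exp(-integrated))
    return kappas

# Two hypothetical individuals who differ only in unobserved frailty alpha.
low_risk = death_probabilities(alpha=0.01, lam=0.1, t0=0, T=30)
high_risk = death_probabilities(alpha=0.02, lam=0.1, t0=0, T=30)
```

Because the hazard is proportional in `alpha`, a higher `alpha` raises the death probability in every period, which is the sense in which heterogeneity in mortality risk is one-dimensional.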

We allow for one-dimensional heterogeneity in preferences by assuming that $u_i(c)$ is homogeneous across all individuals and that $b_i(w)$ is the same across individuals up to a multiplicative factor. Moreover, we assume that

$$u_i(c) = \frac{c^{1 - \gamma}}{1 - \gamma} \tag{6}$$

and

$$b_i(w) = \beta_i\,\frac{w^{1 - \gamma}}{1 - \gamma}. \tag{7}$$

That is, we follow the literature and assume that all individuals have a (homogeneous) CRRA utility function, but, somewhat less standard, we specify the utility from wealth at death using the same CRRA form with the same parameter $\gamma$, and allow (proportional) heterogeneity across individuals in this dimension, captured by the parameter $\beta_i$. One can interpret $\beta_i$ as the relative weight that individual $i$ puts on wealth when dead relative to consumption while alive. All else equal, a longer guarantee is therefore more attractive when $\beta_i$ is higher. We note, however, that since $u(\cdot)$ is defined over a flow of consumption while $b(\cdot)$ is defined over a stock of wealth, it is hard to interpret the level of $\beta_i$ directly. We view this form of heterogeneity as attractive both for intuition and for computation; in Section 6 we investigate alternative assumptions regarding the nature of preference heterogeneity.

Since we lack data on individuals’ initial wealth w0i, we chose the utility function above to enable us to ignore w0i. Specifically, our specification implies that preferences are homothetic, and – combined with the fact that guarantee payments are proportional to the annuitized amount (see Section 2) – that an individual's optimal guarantee choice gi is invariant to initial wealth w0i. This simplifies our analysis, as it means that in our baseline specification unobserved heterogeneity in initial wealth w0i is not a concern. It is, however, potentially an unattractive modeling decision, since it is not implausible that wealthier individuals care more about wealth after death. In Section 6 we explore specifications with non-homothetic preferences, but this requires us to make an additional assumption regarding the distribution of initial wealth. With richer data that included w0i we could estimate a richer model with non-homothetic preferences.
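The homotheticity argument can be illustrated with a short Python sketch. It is not part of the estimation code; the function names and the numerical values are hypothetical and chosen only to show that scaling consumption and wealth by a common factor k multiplies both utility components by the same constant k^(1−γ), so that the ranking of any two plans does not depend on the level of wealth.

```python
def crra_utility(c, gamma=3.0):
    """CRRA flow utility u(c) = c^(1-gamma) / (1-gamma), for gamma != 1."""
    return c ** (1.0 - gamma) / (1.0 - gamma)


def bequest_utility(w, beta, gamma=3.0):
    """Utility from wealth at death, b(w) = beta * w^(1-gamma) / (1-gamma)."""
    return beta * w ** (1.0 - gamma) / (1.0 - gamma)


def scaling_factor(k, gamma=3.0):
    """Factor applied to both u and b when consumption and wealth are scaled by k."""
    return k ** (1.0 - gamma)


# Hypothetical values: scale a consumption level of 4 and a bequest of 10 by k.
k = 2.5
ratio_u = crra_utility(k * 4.0) / crra_utility(4.0)
ratio_b = bequest_utility(k * 10.0, beta=0.7) / bequest_utility(10.0, beta=0.7)
```

Because both ratios equal the same positive constant, the comparison between any two guarantee lengths is unchanged by the scaling, which is the sense in which w0i can be ignored.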

Finally, we treat a set of other parameters that enter the guarantee choice model as observable (known) and identical across all annuitants. Specifically, as we describe later, we use external data to calibrate the values for risk aversion γ, the discount rate δ, the fraction of wealth which is annuitized η, and the real interest rate r. While in principle we could estimate some of these parameters, they would be identified solely by functional form assumptions. We therefore consider it preferable to choose reasonable calibrated values, rather than impose a functional form that would generate these reasonable values. Some of these calibrations are necessitated by the limitations of our existing data. For example, we observe the annuitized amount so with richer data on wealth we could readily incorporate heterogeneity in ηi into the model.

3.3. Identification

In order to compute the welfare effect of various counterfactual policies, we need to identify the distribution (across individuals) of preferences and mortality rates. Here we explain how the assumptions we made allow us to recover this distribution from the data we observe about the joint distribution of mortality outcomes and guarantee choices. We make the main identification argument in the context of a continuous guarantee choice set, a continuous mortality outcome, and no truncation or censoring. At the end of the section we discuss how the argument changes with a discrete guarantee choice and mortality outcomes that are left truncated and right censored, as we have in our setting. This requires us to make additional assumptions, which we discuss later.

Identification with a continuous guarantee choice (and uncensored mortality outcomes)

To summarize briefly, our identification is achieved in two steps. In the first step we identify the distribution of mortality rates from the observed marginal (univariate) distribution of mortality outcomes. This is possible due to the mixed proportional hazard model we assumed. In the second step we use the model of guarantee choice and the rest of the data – namely, the distribution of guarantee choices conditional on mortality outcomes – to recover the distribution of preferences and how it correlates with the mortality rate. The key conceptual step here is an exclusion restriction, namely that the mortality process is not affected by the guarantee choice. We view this “no moral hazard” assumption as natural in our context.

We start by introducing notation. The data about individual i is (mi, gi, xi), where mi is his observed mortality outcome, gi ∈ G his observed guarantee choice, and xi is a vector of observed (individual) characteristics. The underlying object of interest is the joint distribution of unobserved preferences and mortality rates F(α, β|x), as well as the baseline mortality hazard rate (θ0(xi) and ψ(t)). Identification requires that, with enough data, these objects of interest can be uniquely recovered.

At the risk of repetition, let us state four important assumptions that are key to the identification argument.

Assumption 1 Guarantee choices are given by $g_i = g\left((\kappa_t^i)_{t=t_0}^{T}, \beta_i \mid x_i\right)$, which comes from the solution to the guarantee choice model of Section 2.1.

Assumption 2 (MPH) Mortality outcomes are drawn from a mixed proportional hazard (MPH) model. That is, θit = αi θ0(xi)ψ(t) with αi ∈ R+.

Assumption 3 (No moral hazard) mi is independent of βi, conditional on αi.

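Assumption 4 (Complete information) $\kappa_t^i = \dfrac{\exp\left(-\int_0^{t-1} \theta_{i\tau}\, d\tau\right) - \exp\left(-\int_0^{t} \theta_{i\tau}\, d\tau\right)}{\exp\left(-\int_0^{t-1} \theta_{i\tau}\, d\tau\right)}$.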

The first assumption simply says that all individuals in the data make their guarantee choices using the model. It is somewhat redundant, as it is only the model that allows us to define κi and βi, but we state it for completeness. The second assumption (MPH) is key for the first step of the identification argument. This assumption will drive our ability to recover the distribution of mortality rates from mortality data alone. Although this is a non-trivial assumption, it is a formulation which is broadly used in much of the duration data literature (Van den Berg (2001)). We note that assuming that αi is one-dimensional is not particularly restrictive, as any multidimensional αi could be summarized by a one-dimensional statistic in the context of the MPH model.

The third assumption formalizes our key exclusion restriction. It states that θit is a sufficient statistic for mortality, and although βi may affect guarantee choices gi, this in turn doesn't affect mortality. In other words, if individuals counterfactually change their guarantee choice, their mortality experience will remain unchanged. This seems a natural assumption in our context. We note that, unconditionally, βi could be correlated with mortality outcomes indirectly, through a possible cross-sectional correlation between αi and βi.

The fourth and final assumption states that individuals have perfect information about their mortality process; that is, we assume that individuals know their θit's. This allows us to integrate over this continuous hazard rate to obtain the vector $\kappa_i \equiv (\kappa_t^i)_{t=t_0}^{T}$ that enters the guarantee choice model, so we can write g(αi, βi) instead of $g\left((\kappa_t^i)_{t=t_0}^{T}, \beta_i \mid x_i\right)$. This is however a very restrictive assumption, and its validity is questionable. Fortunately, we note that any other information structure – that is, any known (deterministic or stochastic) mapping from individuals’ actual mortality process θit to their perception of it κi – would also work for identification. Indeed, we investigate two such alternative assumptions in Section 6.4. Some assumption about the information structure is required since we lack data on individuals’ ex ante expectations about their mortality.

Before deriving our identification results, we should point out that many of the specification decisions described in the previous section were made to facilitate identification. That is, many of the assumptions were made so that preferences and other individual characteristics are known up to a one-dimensional unobservable βi. This is a strong assumption, which rules out interesting cases of, for example, heterogeneity in both risk aversion and utility from wealth after death.

We now show identification of the model in two steps, in Proposition 1 and Proposition 2.

Proposition 1 If (i) Assumption 2 holds; (ii) E[α] < ∞; and (iii) θ0(xi) is not a constant, then the marginal distribution of αi, Fα(αi), as well as θ0(xi) and ψ(t), are identified – up to the normalizations E[α] = 1 and θ0(xi) = 1 for some i – from the conditional distribution of Fm(mi|xi).

This proposition is the well known result that MPH models are non-parametrically identified. It was first proven by Elbers and Ridder (1982). Heckman and Singer (1984) show a similar result, but instead of assuming that α has a finite mean, they make an assumption about the tail behavior of α. Ridder (1990) discusses the relationship between these two assumptions, and Van den Berg (2001) reviews these and other results. The key requirement is that xi (such as a gender dummy variable in our context) shifts the mortality distribution.

We can illustrate the intuition for this result using two values of θ0(xi), say θ1 and θ2. The data then provides us with two distributions of mortality outcomes, Hj(m) = F(m|θ0(xi) = θj) for j = 1, 2. With no heterogeneity in αi, the MPH assumption implies that the hazard rates implied by H1(m) and H2(m) should be a proportional shift of each other. Once αi is heterogeneous, however, the difference between θ1 and θ2 leads to differential composition of survivors at a given point in time. For example, if θ1 is less than θ2, then high αi people will be more likely to survive among those with θ1. Loosely, as time passes, this selection will make the hazard rate implied by H1(m) closer to that implied by H2(m). With continuous (and uncensored) information about mortality outcomes, these differential hazard rates between the two distributions can be used to back out the entire distribution of αi, Fα (αi), which will then allow us to know θ0(xi) and ψ(t).
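The selection effect described here can be made concrete with a small Python sketch. It relies on assumptions that are not part of our model: αi is taken to be gamma distributed with mean 1 and variance v, and the baseline hazard is constant, ψ(t) = 1. Under these illustrative assumptions the observed population hazard has the closed form θ/(1 + vθt), so the ratio of the two observed hazards starts at θ2/θ1 and shrinks toward 1 as survivors become increasingly selected.

```python
def population_hazard(t, theta, v=1.0):
    """Observed hazard under gamma frailty with E[alpha]=1, Var[alpha]=v, psi(t)=1.

    Survivors are increasingly selected toward low alpha, so the observed hazard
    theta / (1 + v * theta * t) falls below the individual-level hazard theta.
    """
    return theta / (1.0 + v * theta * t)


theta_1, theta_2 = 1.0, 2.0  # hypothetical values of theta_0(x)
times = [0.0, 1.0, 5.0, 10.0]
hazard_ratios = [
    population_hazard(t, theta_2) / population_hazard(t, theta_1) for t in times
]
```

The speed at which the ratio departs from θ2/θ1 depends on v, which is loosely why the time path of the two hazards is informative about the dispersion of αi.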

This result is useful because it shows that we can obtain the (marginal) distribution of αi (and the associated θ0(xi) and ψ(t) functions) from mortality data alone, i.e. from the marginal distribution of mi. We now proceed to the second step, which shows that given θ0(xi), ψ(t), and Fα(·), the joint distribution F(α, β|x) is identified from the observed joint distribution of mortality and guarantee choices. Although covariates were necessary to identify θ0(xi), ψ(t), and Fα(·), they will play no role in what follows, so we will omit them for convenience for the remainder of this section.

Proposition 2 If Assumptions 1-4 hold, then the joint distribution of mortality outcomes and guarantee choices identifies Pr(g(α, β ) ≤ y|α). Moreover, if, for every value of α, g(α, β) is invertible with respect to β then Fβ|α is identified.

The proof is provided in Appendix B. Here we provide intuition, starting with the first part of the proposition. If we observed αi, identifying Pr(g(α, β) ≤ y|α) would have been trivial. We could simply estimate the cumulative distribution function of gi for every value of αi off the data. While in practice we can't do exactly this because αi is unobserved, we can almost do this using the mortality information mi and our knowledge of the mortality process (using Proposition 1). Loosely, we can estimate Pr(g(α, β) ≤ y|m) off the data, and then “invert” it to Pr(g(α, β) ≤ y|α) using knowledge of the mortality process. That is, we can write

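$$\Pr(g(\alpha,\beta) \leq y \mid m) = \left(\int_{\alpha} f_m(m \mid \alpha)\, dF_\alpha(\alpha)\right)^{-1} \int_{\alpha} \Pr(g(\alpha,\beta) \leq y \mid \alpha)\, f_m(m \mid \alpha)\, dF_\alpha(\alpha)$$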
(8)

where the left hand side is known from the data, and fm(m|α) (the conditional density of mortality date) and Fα(α) are known from the mortality data alone (Proposition 1). The proof (in Appendix B) simply verifies that this integral can be inverted. The second part of Proposition 2 is fairly trivial. If Pr(g(α, β) ≤ y|α) is identified for every α, and g(α, β) is invertible (with respect to β) for every α, then it is straightforward to obtain Pr(β ≤ y|α) for every α. This together with the marginal distribution of α, which is identified through Proposition 1, provides the entire joint distribution.
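A discrete analogue may help to see why the mixture in equation (8) can be inverted. The Python sketch below is purely illustrative: it assumes that α takes only two values and that mortality is recorded as "early" or "late", and all probabilities are hypothetical. The observed Pr(g ≤ y|m) is then a weighted average of the two unknown values Pr(g ≤ y|α), with Bayes weights that are known from the mortality process, so the unknowns solve a two-equation linear system.

```python
def posterior_weights(f_m_given_alpha, p_alpha):
    """Bayes weights Pr(alpha | m) for each mortality outcome m."""
    weights = []
    for row in f_m_given_alpha:  # one row per mortality outcome m
        joint = [f * p for f, p in zip(row, p_alpha)]
        total = sum(joint)
        weights.append([j / total for j in joint])
    return weights


def solve_2x2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]


p_alpha = [0.6, 0.4]                 # Pr(alpha = low), Pr(alpha = high)
f_m_given_alpha = [[0.2, 0.5],       # Pr(m = early | alpha)
                   [0.8, 0.5]]       # Pr(m = late  | alpha)
true_cdf_given_alpha = [0.7, 0.3]    # Pr(g <= y | alpha), unknown in practice

weights = posterior_weights(f_m_given_alpha, p_alpha)
# What the data reveal: Pr(g <= y | m), the discrete analogue of equation (8).
observed = [sum(w * c for w, c in zip(row, true_cdf_given_alpha)) for row in weights]
# Inverting the mixture recovers Pr(g <= y | alpha).
recovered = solve_2x2(weights, observed)
```

The inversion works here because the two mortality outcomes carry different information about α, so the weight matrix is non-singular; the proof in Appendix B establishes the corresponding property in the continuous case.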

One can see that the invertibility of g(α, β) (with respect to β) is important. The identification statement is stated in such a way because, although intuitive, proving that the guarantee choice is monotone (and therefore invertible) in β is difficult. The difficulty arises due to the dynamics and non-stationarity of the guarantee choice model, which require its solution to be numerical and make general characterization of its properties difficult. One can obtain analytic proofs of this monotonicity property in simpler (but empirically less interesting) environments (e.g., in a two period model, or in an infinite horizon model with log utility). We note, however, that we are reassured about our simple intuition based on numerical simulations; the monotonicity result holds for any specification of the model and/or values of the parameters that we have tried, although absent an analytical proof some uncertainty must remain regarding identification.

Implications of a discrete guarantee choice and censored mortality outcomes

In many applications the (guarantee) choice is discrete, so – due to its discrete nature – g(β|α) is only weakly monotone in β, and therefore not invertible. In that case, the first part of Proposition 2 still holds, but Pr(β ≤ y|α) is identified only in a discrete set of points, so some parametric assumptions will be needed to recover the entire distribution of β, conditional on α. In our specific application, there are only three guarantee choices, so we can only identify the marginal distribution of α, F(α), and, for every value of α, two points of the conditional distribution Fβ|α. We therefore recover the entire joint distribution by making a parametric assumption (see below) that essentially allows us to interpolate Fβ|α from the two points at which it is identified to its entire support. We note that, as in many discrete choice models, if we had data with sufficiently rich variation in covariates or variation in annuity rates that was exogenous to demand, we would be non-parametrically identified even with a discrete choice set.

Since our data limitations mean that we require a parametric assumption for Fβ|α we try to address concerns about such (ad hoc) parametric assumptions by investigating the sensitivity of the results to several alternatives in Section 6. An alternative to a parametric interpolation is to make no attempt at interpolation, and to simply use the identified points as bounds on the cumulative distribution function. In Section 6 we also report such an exercise.

A second property of our data that makes it not fully consistent with the identification argument above is the censoring of mortality outcomes. Specifically, we do not observe mortality dates for those who are alive by the end of 2005, implying that we have no information in the data about mortality hazard rates for individuals older than 83. While we could identify and estimate a non-parametric baseline hazard for the periods for which mortality data are available (as well as a non-parametric distribution of αi), there is obviously no information in the data about the baseline hazard rate for older ages. Because evaluating the guarantee choice requires knowledge of the entire mortality process (through age T , which we assume to be 100), some assumption about this baseline hazard is necessary. We therefore make (and test for) a parametric assumption about the functional form of the baseline hazard.

3.4. Parameterization

Mortality process

As we have just mentioned, due to the censored mortality data, we make a parametric assumption about the mortality hazard rate. Specifically, we assume that the baseline hazard rate follows a Gompertz distribution with shape parameter λ. That is, the baseline hazard rate is given by $\psi(t) = e^{\lambda t}$ and individual i's mortality hazard at time $t = \text{age}_i - 60$ is therefore given by $\psi_i(t) = \alpha_i e^{\lambda t}$. We can test the Gompertz assumption in our sample against more flexible alternatives by focusing on individuals’ mortality experience prior to the age of 83. We are reassured that the Gompertz assumption cannot be rejected by our (censored) mortality data.9 We also note that the Gompertz distribution is widely used in the actuarial literature that models mortality (Horiuchi and Coale (1982)).

We model mortality as a continuous process and observe mortality at the daily level. However, since the parameterized version of the guarantee choice model is solved numerically, we work with a coarser, annual frequency, reducing the computational burden. In particular, given the above assumption, let

$$S(\alpha, \lambda, t) = \exp\left(\frac{\alpha}{\lambda}\left(1 - e^{\lambda t}\right)\right)$$
(9)

be the Gompertz survival function, and the discrete (annual) hazard rate at year t is given by $\kappa_t^i = \dfrac{S(\alpha_i,\lambda,t) - S(\alpha_i,\lambda,t+1)}{S(\alpha_i,\lambda,t)}$.
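As a minimal sketch of this discretization, the following Python functions evaluate the Gompertz survival function in equation (9) and the implied annual hazard. The function names and the values of α and λ are hypothetical and are used only for illustration; they are not our estimates.

```python
import math


def gompertz_survival(alpha, lam, t):
    """Gompertz survival function S(alpha, lambda, t) from equation (9)."""
    return math.exp((alpha / lam) * (1.0 - math.exp(lam * t)))


def annual_hazard(alpha, lam, t):
    """Probability of dying in year [t, t+1) conditional on surviving to t."""
    s_t = gompertz_survival(alpha, lam, t)
    s_next = gompertz_survival(alpha, lam, t + 1)
    return (s_t - s_next) / s_t


# Hypothetical parameter values, chosen only for illustration.
alpha, lam = 0.005, 0.11
kappa = [annual_hazard(alpha, lam, t) for t in range(41)]  # t = age - 60
```

With λ > 0 the annual hazard rises with age, and each element of the resulting vector lies strictly between 0 and 1.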

Unobserved heterogeneity

An individual in our data can be characterized by an individual-specific mortality parameter αi and an individual-specific preference parameter βi. Everything else is assumed common across individuals. Although, as we showed, the joint distribution F(α, β) is non-parametrically identified with continuous guarantee choice, in practice only three guarantee lengths are offered, so we work with a parametrized distribution.

In the baseline specification we assume that αi and βi are drawn from a bivariate lognormal distribution

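$$\begin{pmatrix} \log\alpha_i \\ \log\beta_i \end{pmatrix} \sim N\left( \begin{bmatrix} \mu_\alpha \\ \mu_\beta \end{bmatrix}, \begin{bmatrix} \sigma_\alpha^2 & \rho\sigma_\alpha\sigma_\beta \\ \rho\sigma_\alpha\sigma_\beta & \sigma_\beta^2 \end{bmatrix} \right).$$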
(10)
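To fix ideas, draws from the distribution in equation (10) can be generated as in the following Python sketch, which builds the correlated pair from two independent standard normals. The parameter values are hypothetical placeholders, not the estimates we report, and the helper names are introduced only for this illustration.

```python
import math
import random


def draw_alpha_beta(mu_a, mu_b, sigma_a, sigma_b, rho, rng):
    """Draw one (alpha, beta) pair whose logs are bivariate normal, as in (10)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    log_alpha = mu_a + sigma_a * z1
    # Cholesky-style construction so that corr(log alpha, log beta) = rho.
    log_beta = mu_b + sigma_b * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return math.exp(log_alpha), math.exp(log_beta)


def sample_log_correlation(draws):
    """Sample correlation between log alpha and log beta."""
    la = [math.log(a) for a, _ in draws]
    lb = [math.log(b) for _, b in draws]
    n = len(draws)
    ma, mb = sum(la) / n, sum(lb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(la, lb)) / n
    va = sum((x - ma) ** 2 for x in la) / n
    vb = sum((y - mb) ** 2 for y in lb) / n
    return cov / math.sqrt(va * vb)


rng = random.Random(0)
# Hypothetical parameter values, for illustration only.
draws = [draw_alpha_beta(-5.0, 9.0, 0.05, 0.2, 0.4, rng) for _ in range(20000)]
rho_hat = sample_log_correlation(draws)
```

In a large sample the correlation of the logged draws should be close to the ρ that was supplied.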

In Section 6 we explore other distributional assumptions.

Calibrated values for other parameters

As mentioned, we treat a set of other parameters – γ, δ, η, and r – as observables, and calibrate their values. Here, we list the calibrated values and their source; in Section 6 we assess the sensitivity of the results to these values.

Since the insurance company does not have information on the annuitant's wealth outside of the annuity, we calibrate the fraction of wealth annuitized (η) based on Banks and Emmerson (1999), who use market-wide evidence from the Family Resources Survey. They report that for individuals with compulsory annuity payments, about one-fifth of income (and therefore presumably of wealth) comes from the compulsory annuity. We therefore set η = 0.2. In Section 6 we discuss what the rest of the annuitants’ wealth portfolio may look like and how this may affect our counterfactual calculations.

We use γ = 3 as the coefficient of relative risk aversion. A long line of simulation literature uses this value (Hubbard, Skinner, and Zeldes (1995), Engen, Gale, and Uccello (1999), Mitchell, Poterba, Warshawsky, and Brown (1999), Scholz, Seshadri, and Khitatrakun (2003), Davis, Kubler, and Willen (2006)). Although a substantial consumption literature, summarized in Laibson, Repetto, and Tobacman (1998), has found risk aversion levels closer to 1, as did Hurd's (1989) study among the elderly, other papers report higher levels of relative risk aversion (Barsky, Kimball, Juster, and Shapiro (1997), Palumbo (1999)).

For r we use the real interest rate corresponding to the inflation-indexed zero-coupon ten-year Bank of England bond, as of the date of the pricing menu we use (January 1, 1992, in the baseline specification). This implies a real interest rate r of 0.0426. We also assume that the discount rate δ is equal to the real interest rate r.

Finally, since the annuities make constant nominal payments, we need an estimate of the expected inflation rate π to translate the initial nominal payment rate shown in Table II into the real annuity payout stream zt in the guarantee choice model. We use the difference between the real and nominal interest rates on the zero-coupon ten year Treasury bonds on the same date to measure the (expected) inflation rate. This implies an (expected) inflation rate π of 0.0498.10

Summary and intuition

Thus, to summarize, in the baseline specification we estimate six remaining structural parameters: the five parameters of the joint distribution of αi and βi, and the shape parameter λ of the Gompertz distribution. We also allow for observable shifters to the means of the distribution. Specifically, we allow μα and μβ to vary based on the individual's gender and age at the time of annuitization. We do this because annuity rates vary with these characteristics, presumably reflecting differential mortality by gender and age of annuitization; so that our treatment of preferences and mortality is symmetric, we also allow mean preferences to vary on these same dimensions.

To gain intuition, note that one way to summarize the mortality data is by a graph of the log hazard mortality rate with respect to age. The Gompertz assumption implies that, without heterogeneity, this graph is linear with a slope of λ. Heterogeneity implies a concave graph, as over time lower mortality individuals are more likely to survive. Thus, loosely, the level of this graph affects the estimate of μα, the average slope affects the estimate of λ, and the concavity affects the estimate of σα . Since σα is a key parameter (which determines the extent of adverse selection), in Section 6 we explore the sensitivity of the results to more and less concave baseline hazard models.

Consider now the data on guarantee choices, and its relationship to mortality outcomes. Suppose first that there was no heterogeneity in mortality rates (σα = 0). In such a case, the guarantee choice model would reduce to a standard ordered probit with three choices (see equation (14) below), and the thresholds would be known from the guarantee choice model and estimates of μα and λ. In this simple case the mean and variance of β would be directly estimated off the observed shares of the three different guarantee choices.
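The logic of this simple case can be sketched in Python. The sketch assumes, hypothetically, that the two cutoff values of β are known numbers and that log β is normally distributed; it then shows that the two free choice shares pin down the mean and standard deviation of log β. All names and values are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def choice_shares(mu_beta, sigma_beta, cutoff_05, cutoff_510):
    """Shares choosing 0, 5, and 10 year guarantees when log(beta) ~ N(mu, sigma^2)."""
    p_below_05 = STD_NORMAL.cdf((math.log(cutoff_05) - mu_beta) / sigma_beta)
    p_below_510 = STD_NORMAL.cdf((math.log(cutoff_510) - mu_beta) / sigma_beta)
    return p_below_05, p_below_510 - p_below_05, 1.0 - p_below_510


def recover_mu_sigma(share_0, share_5, cutoff_05, cutoff_510):
    """Invert two cumulative shares at two known cutoffs to get (mu, sigma)."""
    z_05 = STD_NORMAL.inv_cdf(share_0)
    z_510 = STD_NORMAL.inv_cdf(share_0 + share_5)
    sigma_beta = (math.log(cutoff_510) - math.log(cutoff_05)) / (z_510 - z_05)
    mu_beta = math.log(cutoff_05) - sigma_beta * z_05
    return mu_beta, sigma_beta


# Hypothetical cutoffs and parameters, for illustration only.
shares = choice_shares(mu_beta=9.0, sigma_beta=0.5, cutoff_05=5000.0, cutoff_510=15000.0)
mu_hat, sigma_hat = recover_mu_sigma(shares[0], shares[1], 5000.0, 15000.0)
```

Two shares and two known cutoffs give two equations in two unknowns, which is why no further information is needed when σα = 0.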

It is the presence of unobserved heterogeneity in mortality risk (σα > 0) that makes intuition more subtle. The guarantee choice is still similar to an ordered probit, but the thresholds (which depend on αi) are now unobserved. Therefore, the model is similar to an ordered probit with random effects. This is where the relationship between mortality and guarantee choices is crucial. By observing mi, we obtain information about the unobserved αi. Although this information is noisy (due to the inherent randomness of any hazard model), it is still useful in adjusting the weights Pr(mi|α, λ) in the integral in equations (13) and (14) below. Loosely, individuals who (ex post) die earlier are more likely (from the econometrician's perspective) to be of higher (ex ante) mortality rate αi. Therefore, the mortality data is used as a stochastic shifter of the individual random effects. This allows separate identification of σβ and the correlation parameter ρ.

3.5. Estimation

For computational convenience, we begin by estimating the shape parameter of the Gompertz hazard λ using only mortality data. We then use the guarantee choice and mortality data together to estimate the parameters of the joint distribution F(α, β). We estimate the model using maximum likelihood. Here we provide a general overview; more details are provided in Appendix C.

Estimation of the parameters of the baseline hazard rate (λ)

We observe mortality in daily increments, and treat it as continuous for estimation. We normalize ti = agei – 60 (as 60 is the age of the youngest individual who makes a guarantee choice in our sample). For each individual i, the mortality data can be summarized by mi = (ci; ti; di) where ci is the (normalized) age at which individual i entered the sample (due to left truncation) and ti is the age at which he exited the sample (due to death or censoring). di is an indicator for whether the person died (di = 1) or was censored (di = 0).

Conditional on α, the likelihood of observing mi is

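$$\Pr(m_i = (c_i, t_i, d_i) \mid \alpha, \lambda) = \frac{1}{S(\alpha,\lambda,c_i)} \left(s(\alpha,\lambda,t_i)\right)^{d_i} \left(S(\alpha,\lambda,t_i)\right)^{1-d_i},$$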
(11)

where S(·) is the Gompertz survival function (see equation (9)) and $s(\cdot) = -\partial S(\alpha,\lambda,t)/\partial t$ is the Gompertz density. Our incorporation of ci into the likelihood function accounts for the left truncation in our data.
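A minimal Python sketch of the likelihood contribution in equation (11) is given below. It uses the fact that the Gompertz density equals the hazard αe^{λt} times the survival function; the example individual and the parameter values are hypothetical.

```python
import math


def gompertz_survival(alpha, lam, t):
    """S(alpha, lambda, t) = exp((alpha / lambda) * (1 - exp(lambda * t)))."""
    return math.exp((alpha / lam) * (1.0 - math.exp(lam * t)))


def gompertz_density(alpha, lam, t):
    """Density of the death date: hazard alpha * exp(lambda * t) times survival."""
    return alpha * math.exp(lam * t) * gompertz_survival(alpha, lam, t)


def mortality_likelihood(alpha, lam, c, t, d):
    """Equation (11): likelihood of exit at t (death if d == 1), given entry at c."""
    if d == 1:
        contribution = gompertz_density(alpha, lam, t)
    else:
        contribution = gompertz_survival(alpha, lam, t)
    return contribution / gompertz_survival(alpha, lam, c)  # left truncation


# Hypothetical individual: enters at age 63 (c = 3) and exits at age 75 (t = 15).
censored = mortality_likelihood(alpha=0.005, lam=0.11, c=3.0, t=15.0, d=0)
died = mortality_likelihood(alpha=0.005, lam=0.11, c=3.0, t=15.0, d=1)
```

Dividing by the survival probability at the entry age conditions on the individual having survived long enough to be observed.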

We estimate λ using only mortality data. We do so by using equation (11) and integrating over αi. That is, we maximize the following likelihood

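$$L_M\left(\lambda, \mu_\alpha, \sigma_\alpha \mid (m_i)_{i=1}^{N}\right) = \sum_{i=1}^{N} \log\left( \int \Pr(m_i \mid \alpha, \lambda)\, \frac{1}{\sigma_\alpha}\, \phi\left(\frac{\log\alpha - \mu_\alpha}{\sigma_\alpha}\right) d\alpha \right)$$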
(12)

to obtain a consistent estimate of λ.11
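The integration over the unobserved αi can be sketched numerically as follows. This Python example is an illustration under stated assumptions rather than our estimation code: it integrates over the standardized variable z = (log α − μα)/σα with a simple midpoint rule (our actual numerical integration method is described in Appendix C), and the three data records are hypothetical.

```python
import math


def survival(alpha, lam, t):
    """Gompertz survival function from equation (9)."""
    return math.exp((alpha / lam) * (1.0 - math.exp(lam * t)))


def likelihood_given_alpha(alpha, lam, c, t, d):
    """Equation (11) for entry age c, exit age t, and death indicator d."""
    hazard = alpha * math.exp(lam * t)
    return (hazard ** d) * survival(alpha, lam, t) / survival(alpha, lam, c)


def integrated_log_likelihood(data, lam, mu_alpha, sigma_alpha, n_nodes=400):
    """Sum over individuals of the log of E[Pr(m_i | alpha, lambda)].

    The expectation over lognormal alpha is approximated by a midpoint rule
    over z = (log alpha - mu) / sigma on the interval [-8, 8].
    """
    width = 16.0 / n_nodes
    nodes = [-8.0 + (j + 0.5) * width for j in range(n_nodes)]
    weights = [math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * width for z in nodes]
    total = 0.0
    for c, t, d in data:
        mixed = 0.0
        for z, w in zip(nodes, weights):
            alpha = math.exp(mu_alpha + sigma_alpha * z)
            mixed += w * likelihood_given_alpha(alpha, lam, c, t, d)
        total += math.log(mixed)
    return total


# Hypothetical records (c_i, t_i, d_i): entry age, exit age, death indicator.
data = [(0.0, 12.0, 0), (5.0, 9.5, 1), (0.0, 20.3, 1)]
loglik = integrated_log_likelihood(data, lam=0.11, mu_alpha=-5.0, sigma_alpha=0.3)
```

In practice this function would be passed to a numerical optimizer over (λ, μα, σα); when σα is close to zero it collapses to the likelihood evaluated at α = exp(μα).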

Estimation of the parameters of F(α, β)

Having estimated λ, we can then use the guarantee choice model to numerically compute the optimal guarantee choice for each combination of αi and βi. This choice is also a function of the other (calibrated) parameters of the model and of the observed annuity rates. Consistent with intuition, the numerical solution to the model has the property that the relative value that individual i obtains from a (longer) guarantee is increasing in both αi and βi. Recall that this monotonicity property is important for identification; specifically, it is key to proving the second part of Proposition 2. This implies that for any value of αi, the guarantee choice can be characterized by two cutoff points: β05(αi) and β510(αi). The former is the value of βi that makes an individual (with parameter αi) indifferent between choosing no guarantee and a 5 year guarantee, while the latter is the value of βi that makes an individual (with parameter αi) indifferent between choosing a 5 year and a 10 year guarantee. For almost all relevant values of αi, the baseline model (as well as the other variants we estimated) yields β05(αi) < β510(αi), implying that there exists a range of βi's that implies a choice of a 5 year guarantee (the modal choice in the data). For some extreme values of αi this does not hold, but because αi is unobserved this does not create any potential problem. Figure 1 illustrates the optimal guarantee choice in the space of αi and βi, in the context of the baseline specification and the mortality data (which were used to estimate λ).

Figure 1
Schematic indifference sets

Keeping λ fixed at its estimate, we then estimate the parameters of F(α, β) by maximizing the likelihood of guarantee choices and mortality. The likelihood depends on the observed mortality data mi and on individual i's guarantee choice gi ∈ {0, 5, 10}. We can write the contribution of individual i to the likelihood as

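$$l_i(m_i, g_i; \mu, \Sigma, \lambda) = \int \Pr(m_i \mid \alpha, \lambda) \left( \int \mathbf{1}\left( g_i = \arg\max_{g} V_0^{A(g)}(\beta, \alpha, \lambda) \right) dF(\beta \mid \alpha; \mu, \Sigma) \right) dF(\alpha; \mu, \Sigma)$$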
(13)

where F(α; μ, Σ) is the marginal distribution of αi, F(β|α; μ, Σ) is the conditional distribution of βi, λ is the Gompertz shape parameter, Pr(mi|α, λ ) is given in equation (11), 1(·) is the indicator function, and the value of the indicator function is given by the guarantee choice model discussed in Section 3.1.

Given the monotonicity of the optimal guarantee choice in βi (and ignoring – for presentation only – the rare cases of β05(αi)>β510(αi)), we can rewrite equation (13) as

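$$l_i(m_i, g_i; \mu, \Sigma, \lambda) = \begin{cases}
\int \Pr(m_i \mid \alpha, \lambda)\, F(\beta_{05}(\alpha) \mid \alpha; \mu, \Sigma)\, dF(\alpha; \mu, \Sigma) & \text{if } g_i = 0 \\
\int \Pr(m_i \mid \alpha, \lambda) \left( F(\beta_{510}(\alpha) \mid \alpha; \mu, \Sigma) - F(\beta_{05}(\alpha) \mid \alpha; \mu, \Sigma) \right) dF(\alpha; \mu, \Sigma) & \text{if } g_i = 5 \\
\int \Pr(m_i \mid \alpha, \lambda) \left( 1 - F(\beta_{510}(\alpha) \mid \alpha; \mu, \Sigma) \right) dF(\alpha; \mu, \Sigma) & \text{if } g_i = 10
\end{cases}.$$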
(14)

That is, the inner integral in equation (13) becomes an ordered probit, where the cutoff points are given by the location in which a vertical line in Figure 1 crosses the two curves.
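The inner terms of equation (14) can be sketched in Python as follows. Under the bivariate lognormal specification, log β conditional on log α is normal with mean μβ + ρ(σβ/σα)(log α − μα) and standard deviation σβ(1 − ρ²)^{1/2}. The two cutoff functions in the sketch are hypothetical stand-ins for the curves obtained from the numerical solution of the guarantee choice model, and the parameter values are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_beta_cdf(beta, alpha, mu_a, mu_b, sigma_a, sigma_b, rho):
    """F(beta | alpha) when (log alpha, log beta) is bivariate normal."""
    cond_mean = mu_b + rho * sigma_b / sigma_a * (math.log(alpha) - mu_a)
    cond_sd = sigma_b * math.sqrt(1.0 - rho ** 2)
    return STD_NORMAL.cdf((math.log(beta) - cond_mean) / cond_sd)


def choice_probabilities(alpha, cutoff_05, cutoff_510, params):
    """Inner terms of equation (14): Pr(g = 0), Pr(g = 5), Pr(g = 10) given alpha."""
    f_05 = conditional_beta_cdf(cutoff_05(alpha), alpha, *params)
    f_510 = conditional_beta_cdf(cutoff_510(alpha), alpha, *params)
    return {0: f_05, 5: f_510 - f_05, 10: 1.0 - f_510}


# Hypothetical cutoff curves, decreasing in alpha; the actual curves come from
# solving the guarantee choice model numerically.
def cutoff_05(alpha):
    return 50.0 / alpha


def cutoff_510(alpha):
    return 70.0 / alpha


params = (-5.0, 9.0, 0.05, 0.2, 0.4)  # (mu_a, mu_b, sigma_a, sigma_b, rho), illustrative
probs = choice_probabilities(math.exp(-5.0), cutoff_05, cutoff_510, params)
```

The full likelihood contribution then weights these conditional probabilities by Pr(mi|α, λ) and integrates over the marginal distribution of α.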

The primary computational challenge in maximizing the likelihood is that, in principle, each evaluation of the likelihood requires us to re-solve the guarantee choice model and compute these cutoff points for a continuum of values of α. Since the guarantee choice model is solved numerically, this is not trivial. Therefore, instead of recalculating these cutoffs at every evaluation of the likelihood, we calculate the cutoffs on a large grid of values of α only once and then interpolate to evaluate the likelihood. Unfortunately, since the cutoffs also depend on λ, this method does not allow us to estimate λ jointly with all the other parameters. We could calculate the cutoffs on a grid of values of both α and λ, but this would increase computation time substantially. This is why, at some loss of efficiency but not of consistency, we first estimate λ using only the mortality portion of the likelihood, fix λ at this estimate, calculate the cutoffs, and estimate the remaining parameters from the full likelihood above. To compute standard errors, we use a nonparametric bootstrap.
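The precompute-and-interpolate strategy can be sketched in Python as follows. The function standing in for the expensive model solution, the log-spaced grid, and the use of piecewise-linear interpolation are all assumptions made for this illustration; they are not a description of our actual implementation.

```python
import bisect
import math


def expensive_cutoff(alpha):
    """Stand-in for the numerically solved cutoff; hypothetical functional form."""
    return 50.0 / alpha


def build_grid(lo, hi, n):
    """Evaluate the expensive cutoff once on a log-spaced grid of alpha values."""
    step = (math.log(hi) - math.log(lo)) / (n - 1)
    alphas = [math.exp(math.log(lo) + k * step) for k in range(n)]
    return alphas, [expensive_cutoff(a) for a in alphas]


def interpolate_cutoff(alpha, alphas, cutoffs):
    """Piecewise-linear interpolation of the cutoff between grid points."""
    j = bisect.bisect_right(alphas, alpha)
    j = min(max(j, 1), len(alphas) - 1)  # clamp so that [j-1, j] is a valid bracket
    a0, a1 = alphas[j - 1], alphas[j]
    weight = (alpha - a0) / (a1 - a0)
    return (1.0 - weight) * cutoffs[j - 1] + weight * cutoffs[j]


alphas, cutoffs = build_grid(0.001, 0.05, 500)
approx = interpolate_cutoff(0.0067, alphas, cutoffs)
exact = expensive_cutoff(0.0067)
```

The grid is built once, before the likelihood is maximized, so each likelihood evaluation requires only cheap lookups rather than a new solution of the model.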

4. ESTIMATES AND FIT OF THE BASELINE MODEL

4.1. Parameter estimates

Table III reports the parameter estimates. We estimate significant heterogeneity across individuals, both in their mortality and in their preference for wealth after death. We estimate a positive correlation (ρ) between mortality and preference for wealth after death. That is, individuals who are more likely to live longer (lower α) are likely to care less about wealth after death. This positive correlation may help to reduce the magnitude of the inefficiency caused by private information about risk; individuals who select larger guarantees due to private information about their mortality (i.e. high α individuals) are also individuals who tend to place a relatively higher value on wealth after death, and for whom the cost of the guarantee is not as great as it would be if they had relatively low preferences for wealth after death.

Table III
Parameter estimates

For illustrative purposes, Figure 2 shows random draws from the estimated distribution of log α and log β for each age-gender cell, juxtaposed over the estimated indifference sets for that cell. The results indicate that both mortality and preference heterogeneity are important determinants of guarantee choice. This is similar to recent findings in other insurance markets that preference heterogeneity can be as or more important than private information about risk in explaining insurance purchases (Finkelstein and McGarry (2006), Cohen and Einav (2007), Fang, Keane, and Silverman (2008)). As discussed, we refrain from placing a structural interpretation on the β parameter, merely noting that a higher β reflects a larger preference for wealth after death relative to consumption while alive. Nonetheless, our finding of heterogeneity in β is consistent with other estimates of heterogeneity in the population in preferences for leaving a bequest (Laitner and Juster (1996), Kopczuk and Lupton (2007)).

Figure 2
Estimated distributions

4.2. Model fit

Table IV and Table V present some results on the in-sample and out-of-sample fit of the model, respectively. We report results both overall and separately for each age-gender cell. Table IV shows that the model fits very closely the probability of choosing each guarantee choice, as well as the observed probability of dying within our sample period. The model does, however, produce a monotone relationship between guarantee choice and mortality rate, while the data show a non-monotone pattern, with individuals who choose a 5 year guarantee period associated with highest mortality. As previously discussed (see footnote 4), the non-monotone pattern in the data may merely reflect sampling error; we are unable to reject the null that the 5 and 10 year guarantees have the same mortality rate.

Table IV
Within-sample fit

Table V
Out-of-sample fit

Table V compares our mortality estimates to two different external benchmarks. These speak to the out-of-sample fit of our model in two regards: the benchmarks are not taken from the data, and the calculations use the entire mortality distribution based on the estimated Gompertz mortality hazard, while our mortality data are right censored. The top panel of Table V reports the implications of our estimates for life expectancy. As expected, men have lower life expectancies than women. Men who purchase annuities at age 65 have higher life expectancies than those who purchase at age 60, which is what we would expect if age of annuity purchase were unrelated to mortality. Women who purchase at 65, however, have lower life expectancy than women who purchase at 60, which may reflect selection in the timing of annuitization, or the substantially smaller sample size available for 65 year old women. As one way to gauge the magnitude of the mortality heterogeneity we estimate, Table V indicates that in each age-gender cell, there is about a 1.4 year difference in life expectancy, at the time of annuitization, between the 5th and 95th percentile.

The fourth row of Table V contains life expectancy estimates for a group of U.K. pensioners whose mortality experience may serve as a rough proxy for that of U.K. compulsory annuitants.12 We would not expect our life expectancy estimates – which are based on the experience of actual compulsory annuitants in a particular firm – to match this rough proxy exactly, but it is reassuring that they are in a similar ballpark. Our estimated life expectancy is about 2 years higher. This difference is not driven by the parametric assumptions, but reflects higher survival probabilities for our annuitants than our proxy group of U.K. pensioners; this difference between the groups exists even within the range of ages for which we observe survival in our data and can compare the groups directly (not shown).

The bottom of Table V presents the average expected present discounted value (EPDV) of annuity payments implied by our mortality estimates and our assumptions regarding the real interest rate and the inflation rate. Since each individual's initial wealth is normalized to 100, of which 20 percent is annuitized, an EPDV of 20 would imply that the company, if it had no transaction costs, would break even. Note that nothing in our estimation procedure guarantees that we arrive at reasonable EPDV payments. It is therefore encouraging that for all four cells, and for all guarantee choices within these cells, the expected payout is fairly close to 20; it ranges across the age-gender cells from 19.74 to 20.66. One might be concerned by an average expected payment that is slightly above 20, which would imply that the company makes negative profits. Note, however, that if the effective interest rate the company uses to discount its future payments is slightly higher than the risk-free rate of 0.043 that we use in the individual's guarantee choice model, the estimated EPDV annuity payments would all fall below 20. It is, in practice, likely that the insurance company receives a higher return on its capital than the risk-free rate, and the bottom row of Table V shows that with a slightly higher interest rate of 0.045 the company would, indeed, break even. In Section 6 we show that our welfare estimates are not sensitive to using an interest rate that is somewhat higher than the risk-free rate used in the baseline model.

As another measure of the out-of-sample fit, we examined the optimal consumption trajectories implied by our parameter estimates and the guarantee choice model. These suggest that most of the individuals are saving in their retirement (not shown). This seems contrary to most of the empirical evidence (e.g., Hurd (1989)), although there is evidence consistent with positive wealth accumulation among the very wealthy elderly (Kopczuk (2007)), and evidence, more generally, that saving behavior of high wealth individuals may not be representative of the population at large (Dynan, Skinner, and Zeldes (2004)); individuals in this market are higher wealth than the general U.K. population (Banks and Emmerson (1999)). In light of these potentially puzzling wealth accumulation results, we experimented with a variant of the baseline model that allows individuals to discount wealth after death more steeply than consumption while alive. Specifically, we modified the consumer per-period utility function (as shown in equation (1)) to be

$$v_i(w_t, c_t) = (1 - \kappa_t^i)\, u_i(c_t) + \zeta^t \kappa_t^i\, b_i(w_t), \tag{15}$$

where ζ is an additional parameter to be estimated. Our benchmark model corresponds to ζ = 1. Values of ζ < 1 imply that individuals discount wealth after death more steeply than consumption while alive. Such preferences might arise if individuals care more about leaving money to children (or grandchildren) when the children are younger than when they are older. We find that the maximum likelihood value of ζ is 1. Moreover, when we re-estimate the model imposing values of ζ relatively close to 1 (such as ζ = 0.95), we are able to produce more sensible wealth patterns in retirement, but this does not have a noticeable effect on our core welfare estimates.
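As a minimal sketch of equation (15), the modified per-period utility can be written out as below. The CRRA forms for u and b (with a common γ = 3 and a multiplicative weight `beta` on wealth after death), the function names, and all numerical inputs are assumptions made for illustration; they are not estimates from the model.

```python
def crra(x, gamma=3.0):
    """CRRA felicity x^(1-gamma)/(1-gamma); assumes gamma != 1 and x > 0."""
    return x ** (1.0 - gamma) / (1.0 - gamma)


def per_period_utility(w, c, kappa, t, zeta=1.0, beta=1.0, gamma=3.0):
    """Equation (15): (1 - kappa) * u(c) + zeta**t * kappa * b(w).

    kappa is the mortality hazard in period t; beta scales the utility from
    wealth after death (an assumed functional form for b, for illustration).
    """
    u = crra(c, gamma)
    b = beta * crra(w, gamma)
    return (1.0 - kappa) * u + (zeta ** t) * kappa * b


# Hypothetical inputs: zeta = 1 reproduces the baseline utility, zeta < 1
# shrinks the weight on wealth after death as t grows.
baseline = per_period_utility(w=50.0, c=5.0, kappa=0.02, t=10, zeta=1.0)
steeper = per_period_utility(w=50.0, c=5.0, kappa=0.02, t=10, zeta=0.95)
```

Setting `zeta=1.0` collapses the expression to the benchmark utility, which is the sense in which the benchmark model is nested in the extended one.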

5. WELFARE ESTIMATES

We now take our parameter estimates as inputs in calculating the welfare consequences of asymmetric information and government mandates. We start by defining the welfare measure we use, and calculating welfare in the observed, asymmetric information equilibrium. We then perform two counterfactual exercises in which we compare equilibrium welfare to what would arise under a mandatory social insurance program that does not permit choice over guarantee, and under symmetric information. Although we focus primarily on the average welfare, we also briefly discuss distributional implications.

5.1. Measuring welfare

A useful monetary metric for comparing utilities associated with different annuity allocations is the notion of wealth-equivalent. The wealth-equivalent denotes the amount of initial wealth that an individual would require in the absence of an annuity, in order to be as well off as with his initial wealth and his annuity allocation. The wealth-equivalent of an annuity with guarantee period g and initial wealth of $w_0$ is the implicit solution to

$$V_0^{A(g)}(w_0) \equiv V_0^{NA}(\text{wealth-equivalent}), \tag{16}$$

where both $V_0^{A(g)}(\cdot)$ and $V_0^{NA}(\cdot)$ are defined in Section 3. This measure is commonly used in the annuity literature (Mitchell, Poterba, Warshawsky, and Brown (1999), Davidoff, Brown, and Diamond (2005)).
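Because the wealth-equivalent in equation (16) is defined only implicitly, it has to be recovered numerically. The sketch below does this by bisection, assuming that the no-annuity value function is continuous and strictly increasing in wealth. The function `toy_v_na` is a hypothetical stand-in for the value function of Section 3 (which requires solving the full dynamic problem) and is used only to show the mechanics.

```python
def wealth_equivalent(v_annuity, v_no_annuity, lo=1e-6, hi=1e4, tol=1e-10):
    """Solve v_no_annuity(x) = v_annuity for x by bisection.

    Assumes v_no_annuity is continuous and strictly increasing on [lo, hi]
    and that the target value is bracketed by that interval.
    """
    if not (v_no_annuity(lo) <= v_annuity <= v_no_annuity(hi)):
        raise ValueError("target value is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if v_no_annuity(mid) < v_annuity:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def toy_v_na(w, gamma=3.0):
    """Hypothetical stand-in for the no-annuity value function (CRRA in wealth)."""
    return w ** (1.0 - gamma) / (1.0 - gamma)


# If the annuity allocation delivers the value of 100.16 units of
# un-annuitized wealth, the solver should recover that wealth level.
target_value = toy_v_na(100.16)
we = wealth_equivalent(target_value, toy_v_na)
```

A wealth-equivalent above 100 then corresponds to a gain from annuitization, and one below 100 to a loss, exactly as in the discussion that follows.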

A higher value of wealth-equivalent corresponds to a higher value of the annuity contract. If the wealth equivalent is less than initial wealth, the individual would prefer not to purchase an annuity. More generally, the difference between the wealth-equivalent and the initial wealth is the amount an individual is willing to pay in exchange for having access to the annuity contract. This difference is always positive for a risk averse individual who does not care about wealth after death and faces an actuarially fair annuity rate. It can take negative values if the annuity contract is over-priced (compared to the individual-specific actuarially fair rate) or if the individual sufficiently values wealth after death.

Our estimate of the average wealth-equivalent in the observed equilibrium provides a monetary measure of the welfare gains (or losses) from annuitization given equilibrium annuity rates and individuals’ contract choices. The difference between the average wealth equivalent in the observed equilibrium and in a counterfactual allocation provides a measure of the welfare difference between these allocations.

We provide two ways to quantify these welfare differences. The first provides an absolute monetary estimate of the welfare gain or loss associated with a particular counterfactual scenario. To do this, we scale the difference in wealth equivalents by the £6 billion which are annuitized annually (in 1998) in the U.K. annuity market (Association of British Insurers (1999)). Since the wealth equivalents are reported per 100 units of initial wealth and we assume that 20 percent of this wealth is annuitized, this implies that each unit of wealth-equivalent is equivalent, at the aggregate, to £300 million annually. We also occasionally refer to a per-annuitant welfare gain, which we compute by dividing the overall welfare effect by 300,000, which is our estimate of new annuitants in the U.K. market in 1998.13 Of course, one has to be cautious about these specific numbers, as they rely on extrapolating our estimates from our specific sample to the entire market.
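The scaling described above is simple arithmetic and can be sketched as follows. The constants are taken from the text (£6 billion annuitized annually, 20 units annuitized per 100 units of initial wealth, 300,000 new annuitants); the function names and the 0.5-unit example are illustrative assumptions.

```python
ANNUITIZED_POUNDS = 6e9    # annual U.K. annuitized amount (1998), from the text
ANNUITIZED_UNITS = 20.0    # 20 percent of 100 units of initial wealth
NEW_ANNUITANTS = 300_000   # estimate of new annuitants in 1998, from the text

# Each unit of wealth-equivalent corresponds to 6e9 / 20 = 300 million pounds.
POUNDS_PER_UNIT = ANNUITIZED_POUNDS / ANNUITIZED_UNITS


def welfare_in_pounds(delta_wealth_equivalent):
    """Aggregate annual welfare change implied by a wealth-equivalent change."""
    return delta_wealth_equivalent * POUNDS_PER_UNIT


def welfare_per_annuitant(delta_wealth_equivalent):
    """Welfare change per new annuitant, in pounds."""
    return welfare_in_pounds(delta_wealth_equivalent) / NEW_ANNUITANTS


def share_of_annuitized_wealth(delta_wealth_equivalent):
    """Welfare change as a fraction of the annuitized amount."""
    return delta_wealth_equivalent / ANNUITIZED_UNITS


# Hypothetical example: a 0.5-unit difference in average wealth-equivalents.
example_total = welfare_in_pounds(0.5)
example_per_person = welfare_per_annuitant(0.5)
example_share = share_of_annuitized_wealth(0.5)
```

Under these constants a hypothetical 0.5-unit difference corresponds to £150 million per year, £500 per new annuitant, and 2.5 percent of annuitized wealth.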

While an absolute welfare measure may be a relevant benchmark for policies associated with the particular market we study, a relative measure may be more informative when considering using our estimates as a possible benchmark in other contexts, or examining the quantitative sensitivity of our estimates. For example, if we considered the decision to buy a one month guarantee, we would not expect efficiency costs associated with this decision to be large relative to life-time wealth. A relative welfare estimate essentially requires a normalization factor.

Therefore, to put these welfare estimates in perspective, we measure the welfare changes relative to how large this welfare change could have been, given the observed annuity rates. We refer to this maximum potential welfare change as the “Maximum Money at Stake” (MMS). We define the MMS as the minimum lump sum that individuals would have to receive to insure them against the possibility that they receive their least-preferred allocation in the observed equilibrium, given the observed equilibrium pricing. The MMS is therefore the additional amount of pre-existing wealth an individual requires so that they receive the same annual annuity payment if they purchase the maximum guarantee length (10 years) as they would receive if they purchase the minimum guarantee length (0 years).

The nature of the thought experiment behind the MMS is that the welfare loss from buying a 10 year guarantee is bounded by the lower annuity payment that the individual receives as a result. This maximum welfare loss would occur in the worst case scenario, in which the individual had no chance of dying during the first 10 years (or alternatively, no value of wealth after death). We report the MMS per 100 units of initial wealth (i.e., per 20 units of the annuitized amount)

$$\mathrm{MMS} \equiv 20\left(\frac{z_0}{z_{10}} - 1\right), \tag{17}$$

where $z_0$ and $z_{10}$ denote the annual annuity rates for 0 and 10 year guarantees, respectively (see Table II). A key property of the MMS is that it depends only on annuity rates, but not on our estimates of preferences or mortality risk. Converting this to absolute amounts, the MMS is just over £500 million annually, just below £1,700 per new annuitant, or about 8 percent of the market as a whole.
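To make equation (17) concrete, the sketch below computes the MMS from a pair of annuity rates. The two rates used here are hypothetical round numbers chosen for illustration, not the Table II values, and the function name is likewise an assumption.

```python
def maximum_money_at_stake(z0, z10, annuitized_units=20.0):
    """Equation (17): MMS = 20 * (z0 / z10 - 1), per 100 units of initial wealth.

    z0 and z10 are the annual annuity rates for the 0 and 10 year guarantees.
    """
    if z10 <= 0 or z0 < z10:
        raise ValueError("expected 0 < z10 <= z0")
    return annuitized_units * (z0 / z10 - 1.0)


# Hypothetical annual annuity rates (NOT the Table II values).
z0, z10 = 0.108, 0.100
mms_units = maximum_money_at_stake(z0, z10)   # per 100 units of initial wealth
mms_share = mms_units / 20.0                  # share of annuitized wealth
mms_pounds = mms_units * 6e9 / 20.0           # scaled by the 6 billion pounds annuitized
```

Because only `z0` and `z10` enter, the calculation does not depend on any estimated preference or mortality parameter, which is the property emphasized in the text.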

5.2. Welfare in observed equilibrium

The first row of Table VI shows the estimated average wealth equivalents per 100 units of initial wealth in the observed allocations implied by our parameter estimates. The average wealth equivalent for our sample is 100.16, and ranges from 99.9 (for 65 year old males) to 100.4 (for 65 year old females). An average wealth equivalent of less than 100 indicates an average welfare loss associated with the equilibrium annuity allocations relative to a case in which wealth is not annuitized; conversely, an average wealth equivalent of more than 100 indicates an average welfare gain from annuitization at the observed rates. Note that because annuitization of some form is compulsory, it is possible that individuals in this market would prefer not to annuitize.14

Table VI
Welfare estimates

Figure 3 shows the distribution across different types of the welfare gains and losses from annuitization at the observed annuity rates, relative to no annuities. This figure super-imposes iso-welfare contour lines over the same scatter plots presented in Figure 2. It indicates that, as expected, the individuals who benefit the most from the annuity market are those with low mortality (low α) and weak preference for wealth after death (low β). The former are high (survival) risk, who face better than actuarially fair annuity rates when they are pooled with the rest of the annuitants. The latter are individuals who get less disutility from dying without much wealth, which is more likely to occur with than without annuities.

Figure 3
Welfare contours

5.3. The welfare cost of asymmetric information

In the counterfactual symmetric information equilibrium, each person faces an actuarially fair adjustment to annuity rates depending on her mortality. Specifically, we offer each person payment rates such that the EPDV of payments for that person for each guarantee length is equal to the equilibrium average EPDV of payments. This ensures that each person faces individual-specific actuarially fair reductions in payments in exchange for longer guarantees. Note that this calculation is (expected) revenue neutral, preserving any average load (or subsidy) in the market.
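The construction of individual-specific rates can be sketched as follows. This is a simplified illustration under stated assumptions: a Gompertz hazard α·exp(λt) with t measured from the purchase date, annual payments at year end, a constant discount rate, no inflation adjustment, and a finite horizon. All parameter values and function names are hypothetical.

```python
import math


def gompertz_survival(t, alpha, lam):
    """Probability of surviving t years past purchase under hazard alpha*exp(lam*t)."""
    return math.exp(-(alpha / lam) * (math.exp(lam * t) - 1.0))


def epdv_per_unit_rate(alpha, lam, guarantee, r, horizon=60):
    """EPDV of a stream paying 1 per year; payments are certain during the guarantee."""
    total = 0.0
    for t in range(1, horizon + 1):
        prob_paid = 1.0 if t <= guarantee else gompertz_survival(t, alpha, lam)
        total += prob_paid / (1.0 + r) ** t
    return total


def fair_rate(alpha, lam, guarantee, r, target_epdv, annuitized=20.0):
    """Payment rate z such that annuitized * z * EPDV-per-unit equals target_epdv."""
    return target_epdv / (annuitized * epdv_per_unit_rate(alpha, lam, guarantee, r))


# Hypothetical individuals: low versus high mortality (alpha), common lam.
rates = {
    (name, g): fair_rate(alpha, 0.11, g, r=0.043, target_epdv=20.0)
    for name, alpha in [("low_mortality", 0.004), ("high_mortality", 0.012)]
    for g in (0, 5, 10)
}
```

In this sketch every individual's rate falls as the guarantee lengthens, and the proportional reduction is larger for the high-mortality individual, for whom the guarantee is more likely to pay out after death.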

Figure 2 may provide a visual way to think about this counterfactual. In the counterfactual exercise, the points in Figure 2, which represent individuals, are held constant, while the indifference sets, which represent the optimal choices at a given set of annuity rates, shift. Wealth equivalents are different at the new optimal choices both because of the direct effect of the different annuity rates and because these rates in turn affect optimal contract choices.

We note that our welfare analysis of the impact of adverse selection considers only the impact of selection on the pricing of the observed contracts. Adverse selection may also affect the set of contracts offered, and this may have non trivial welfare costs. Our analysis however treats the contract set (of 0, 5, and 10 year guarantees) as given; that is, we assume that the contract space does not change in the counterfactual of symmetric information. The most important reason for this assumption is that incorporating the impact of adverse selection on the contract space would require a model of guarantee lengths in which the current offered guarantee lengths are optimal. This seems difficult given that the three offered guarantee lengths are fixed over time, across the annuity providers in the market, and perhaps most surprisingly over different age and gender combinations, which are associated with different mortality profiles.

The second panel of Table VI presents our estimates of the welfare cost of asymmetric information. The first row shows our estimated wealth-equivalents in the symmetric information counterfactual. As expected, welfare is systematically higher in the counterfactual world of symmetric information. For 65 year old males, for example, the estimates indicate that the average wealth equivalent is 100.74 under symmetric information, compared to 100.17 under asymmetric information. This implies that the average welfare loss associated with asymmetric information is equivalent to 0.57 units of initial wealth. For the other three age-gender cells, this number ranges from 0.14 to 0.27. Weighting all cells by their relative sizes, we obtain the overall estimate reported in the introduction of annual welfare costs of £127 million, £423 per new annuitant, or about 2 percent of annuitized wealth. This also amounts to 0.25 of the concept of maximum money at stake (MMS) introduced earlier.

What is the cause of this welfare loss? It arises from the distortion in the individual's choice of guarantee length relative to what he would have chosen under symmetric information pricing. Despite preference heterogeneity, we estimate that under symmetric information all individuals would choose 10 year guarantees (not shown). However, in the observed equilibrium only about 3 percent of individuals purchase these annuities. This illustrates the distortions in optimal choices in the observed equilibrium.

To illustrate the impact on different individuals, Figure 4 presents contour graphs of the changes in wealth equivalents associated with the change to symmetric information. That is, as before, for each age-gender cell we plot the individuals as points in the space of log α and log β, and then draw contour lines over them. All the individuals along a contour line are predicted to have the same absolute welfare change as a result of the counterfactual. Figure 4 indicates that, while almost all individuals benefit from a move to the first best, there is significant heterogeneity in the welfare gains arising from individual-specific pricing. The biggest welfare gains accrue to individuals with high mortality (high α) and high preferences for wealth after death (high β).

Figure 4
Welfare change contours (symmetric information)

Two different factors work in the same direction to produce the highest welfare gains for high α, high β individuals. First, a standard one-dimensional heterogeneity setting would predict that symmetric information would improve welfare for low risk (high α) individuals relative to high risk (low α) individuals. Second, the asymmetric information equilibrium involves cross-subsidies from higher guarantees to lower guarantees (the EPDV of payout decreases with the length of the guarantee period, as shown in Table V);15 by eliminating these cross-subsidies, symmetric information also improves the welfare of high β individuals, who place more value on higher guarantees. Since we estimate that α and β are positively correlated, these two forces reinforce each other.

A related question concerns the extent to which our estimate of the welfare cost of asymmetric information is influenced by re-distributional effects. As just discussed, symmetric information produces different welfare gains for individuals with different α and β. To investigate the extent to which our welfare comparisons are affected by the changes in cross-subsidy patterns, we recalculated wealth-equivalents in the symmetric information counterfactual under the assumption that each individual faces the same expected payments for each option in the choice set of the counterfactual as she receives at her choice in the observed equilibrium. The results (not shown) suggest that, in all the age-gender cells, our welfare estimates are not, in practice, affected by redistribution.

5.4. The welfare consequences of government mandated annuity contracts

Although symmetric information is a useful conceptual benchmark, it may not be relevant from a policy perspective since it ignores the information constraints faced by the social planner. We therefore consider the welfare consequences of government intervention in this market. Specifically, we consider the consequences of government mandates that each individual purchases the same guarantee length, eliminating any contract choice; as noted previously, such mandates are the canonical solution to adverse selection in insurance markets (Akerlof (1970)). To evaluate welfare under alternative mandates, we calculate average wealth equivalents when all people are forced to have the same guarantee period and annuity rate, and compare them to the average wealth equivalents in the observed equilibrium. We set the payment rate such that average EPDV of payments is the same as in the observed equilibrium; this preserves the average load (or subsidy) in the market.

Before presenting the results, it is useful to note a contrast between our setting and the standard or canonical insurance model. As mentioned in the introduction, unlike in a standard insurance setting, the optimal mandatory annuity contract cannot be determined by theory alone. In the canonical insurance model – that is, when all individuals are risk averse, the utility function is state-invariant, and there is no additional cost of providing insurance – it is well-known that mandatory (uniform) full insurance can achieve the first best allocation, even when individuals vary in their preferences. Since adverse selection reduces insurance coverage away from this first-best, no estimation is required in this standard context to realize that the optimal mandate is full insurance. In contrast, our model of annuity choices is governed by two different utility functions, one from consumption when alive, u(·), and one from wealth when dead, b(·) (see equation (1)). Therefore optimal (actuarially fair) guarantee coverage will vary across individuals depending on their relative preference for wealth at death vis-a-vis consumption while alive. In such a case, whether and which mandatory guarantee can improve welfare relative to the adverse selection equilibrium is not a-priori clear.16 The investigation of the optimal mandate – and whether it can produce welfare gains relative to the adverse selection equilibrium – therefore becomes an empirical question.

The results are presented in the bottom panels of Table VI. In all four age-gender cells, welfare is lowest under a mandate with no guarantee period, and highest under a mandate of a 10 year guarantee. Welfare under a mandate of a 5 year guarantee is similar to welfare in the observed equilibrium.

The increase in welfare from a mandate of 10 year guarantee is virtually identical to the increase in welfare associated with the first best, symmetric information outcome reported earlier. This mandate involves no allocative inefficiency, since we estimated that a 10 year guarantee is the first best allocation for all individuals. Although it does involve transfers (through the common pooled price) across individuals of different mortality risk, these do not appear to have much effect on our welfare estimate.17 Consistent with this, when we recalculated wealth-equivalents in each counterfactual under the assumption that each individual faces the same expected payments in the counterfactual as she receives from her choice in the observed equilibrium, our welfare estimates were not noticeably affected (not shown). As with the counterfactual of symmetric information, there is heterogeneity in the welfare effects of the different mandates for individuals with different α and β. Not surprisingly, high β individuals benefit relatively more from the 10 year mandate and lose relatively more from the 0 year mandate (not shown).

Our findings highlight both the potential benefits and the potential dangers from government mandates. Without estimating the joint distribution of risk and preferences, it would not have been apparent that a 10 year guarantee is the welfare-maximizing mandate, let alone that such a mandate comes close to achieving the first best outcome. Were the government to mandate no guarantee, it would reduce welfare by about £107 million per year (£357 per new annuitant), achieving a welfare loss of about equal and opposite magnitude to the £127 million per year (£423 per new annuitant) welfare gain from the optimal 10 year guarantee mandate. Were the government to pursue the naive approach of mandating the currently most popular choice (5 year guarantees), our estimates suggest that this would raise welfare by only about £2 million per year or less than £7 per new annuitant, forgoing most of the welfare gains achievable from the welfare maximizing 10 year mandate. These results highlight the practical difficulties involved in trying to design mandates to achieve social welfare gains.

6. ROBUSTNESS

In this section, we explore the robustness of our welfare findings. Our qualitative welfare conclusions are quite stable across a range of alternative assumptions. In particular, the finding that the welfare maximizing mandate is a 10 year guarantee, and that this mandate achieves virtually the same welfare as the first best outcome, persists across all alternative specifications. The finding of welfare gains from a 10 year guarantee mandate but welfare losses from mandating no guarantee is also robust.

However, the quantitative estimates of the welfare cost of asymmetric information can vary non-trivially across specifications, and as a result need to be interpreted with more caution. The cost is £127 million per year (i.e., 25 percent of the MMS) in our baseline specification. It ranges from £111 million per year to £244 million per year (or from 22 percent to about 50 percent of the MMS) across the alternative specifications. Our bounds exercise, which we discuss below, produces similar conclusions: the findings on the optimal guarantee mandate and its ability to achieve close to the first best outcome are robust, while there is greater uncertainty about our quantitative welfare estimates of the gains from symmetric information.

Finally, we note that our robustness discussion focuses on the (qualitative and quantitative) sensitivity of our welfare estimates, rather than the estimates of the underlying parameters (e.g., the magnitude of the average β). The underlying parameters change quite a bit under many of the alternative models. This is important for understanding why, as we vary certain assumptions, it is not a-priori obvious how our welfare estimates will change (in either sign or magnitude). For example, although it may seem surprising that welfare estimates are not very sensitive to our assumption about the risk aversion parameter, recall that the estimated parameters also change with the change in the assumption about risk aversion.

The change in the estimated parameters across specifications is also important for the overall interpretation of our findings. One reason we hesitate to place much weight on the structural interpretation of the estimated parameters (or the extent of heterogeneity in these parameters) is that their estimates will be affected by our assumptions about other parameters (such as risk aversion or discount rate). This is closely related to the identification result in Section 3.

The remainder of this section describes the alternative specifications we explored. Table VII provides a summary of the main results.

Table VII
Robustness

6.1. Parameter choices

Following our discussion of the baseline model in Section 3, although we estimate the average level and heterogeneity in mortality ($\alpha_i$) and in preferences for wealth after death ($\beta_i$), we choose values for a number of other parameters based on external information. While we could, in principle, estimate some of these parameters, they would be identified solely by functional form assumptions. Therefore, we instead chose to explore how our welfare estimates are affected by alternative choices for these parameters.

Choice of risk aversion coefficient (γ)

Our baseline specification (reproduced in row 1 of Table VII) assumes a (common) CRRA parameter of γ = 3 for both the utility from consumption u(c) and from wealth after death b(w). Rows 2 and 3 of Table VII show the results if instead we assume γ = 5 or γ = 1.5.

Rows 4 and 5 report specifications in which we hold constant the CRRA parameter in the utility from consumption (at γ = 3) but vary the CRRA parameter in the utility from wealth after death. Specifically, we estimate the model with γ = 1.5 or γ = 5 for the utility from wealth after death b(w).

A downside of the specifications reported in rows 4 and 5 is that they give rise to non-homothetic preferences and are therefore no longer scalable in wealth. This implies that heterogeneity in initial wealth may confound the analysis. Therefore, in row 6, we also allow for heterogeneity in initial wealth. As in row 5, we assume that γ = 3 for utility from consumption, but that γ = 1.5 for the utility from wealth after death. This implies that wealth after death acts as a luxury good, with wealthier individuals caring more, at the margin, about wealth after death. Such a model is consistent with the hypothesis that bequests are a luxury good, which may help explain the higher rate of wealth accumulation at the top of the wealth distribution (Dynan, Skinner, and Zeldes (2004), Kopczuk and Lupton (2007)). Unfortunately, we do not have data on individuals' initial wealth $w_0^i$, which would allow us to incorporate it directly into the model. Instead, to allow for heterogeneity in initial wealth, we calibrate the distribution of wealth based on Banks and Emmerson (1999) and integrate over this (unobserved) distribution.18 We also let the means (but not variances) of log α and log β vary with unobserved wealth. The welfare estimates are normalized to be comparable with the other exercises.

Choice of other parameters

We also reestimated the model assuming a higher interest rate than in the baseline case. As already mentioned, our estimates suggest that a slightly higher interest rate than the risk free rate we use in the individual's value function is required to have the annuity company not lose money. Thus, rather than the baseline which uses the risk free rate as of 1992 (r = δ = 0.043), in row 7 we allow for the likely possibility that the insurance company receives a higher rate of return, and reestimate the model with r = δ = 0.05. This in turn implies an average load on policies of 3.71 percent.

In row 8 we use a different set of annuity rates. Since the choice of 1992 pricing for our baseline model was arbitrary, we report results for a different set of annuity rates, from 1990, with the corresponding inflation and interest rates.

6.2. Wealth portfolio outside of the compulsory annuity market

As noted, our data do not contain information on the annuitant's wealth portfolio outside of the compulsory market. This is an important limitation to the data. In our baseline specification we used survey data reported by Banks and Emmerson (1999) to assume that 20 percent of the annuitants’ financial wealth is in the compulsory annuity market (η = 0.2), and the rest is in liquid financial wealth. Rows 9 and 10 report results under different assumptions of the fractions of wealth annuitized in the compulsory market (we tried values of 0.1 and 0.3 of η).

In row 11 we report results in which we allow for heterogeneity in η. We calibrate the distribution of η and integrate over this unobserved distribution.19 We allow the means (but not variances) of log α and log β to vary with this unobserved η.

In row 12, we assume that 50 percent of wealth is annuitized (at actuarially fair annuity rates) through the public Social Security program.20 We then consider the welfare cost of asymmetric information for the 20 percent of wealth annuitized in the compulsory market. As can be seen in Table VII, this alternative assumption has by far the biggest effect on our estimate of the welfare cost of asymmetric information, raising it from £127 million per year (or about 25 percent of the MMS) in the baseline specification to £244 million per year (or about 50 percent of the MMS).

As we noted at the outset of this section, it is difficult to develop good intuition for the comparative statics across alternative models since the alternative models also yield different estimated parameters. However, one potential explanation for our estimate of a larger welfare cost when 50 percent of wealth is in the public annuity may be that the individual now only has 30 percent of his wealth available to “offset” any undesirable consumption path generated by the 70 percent of annuitized wealth.

A related issue is the possibility that annuitants may adjust their non-annuitized financial wealth portfolio in response to the changes in guarantee prices created by our counterfactuals. Our analysis assumes that individuals do not adjust the rest of their portfolio in response to changes in their guarantee length or price. If individuals could purchase actuarially fair life insurance policies with no load, and without incurring any transaction costs in purchasing these policies, they could in principle undo much of the efficiency cost of annuitization in the current asymmetric information equilibrium. More generally, this issue fits into the broader literature that investigates the possibility and extent of informal insurance to lower the welfare benefits from government interventions or private insurance (Golosov and Tsyvinski (2007)).

Of course, in practice the ability to offset the equilibrium using other parts of the financial portfolio will be limited by factors such as loads and transaction costs. Given that the maximum money at stake in the choice of guarantee is only about 8 percent of annuitized wealth under the observed annuity rates (and only about 4 percent (on average) under the counterfactual symmetric information rates), even relatively small transaction costs could well deter individuals from re-optimizing their portfolios in response to changes in guarantee prices. Re-optimization will also be limited by the fact that much of individuals’ wealth outside of the compulsory annuity market is tied up in relatively illiquid forms such as the public pension. Indeed, the data suggest that for individuals likely to be in the compulsory annuity market, only about 10 to 15 percent of their total wealth is in the form of liquid financial assets (Banks, Emmerson, Oldfield, and Tetlow (2005)). A rigorous analysis of this is beyond the scope of the current work, and would probably require better information than we have on the asset allocation of individual annuitants. With richer data that included information on the life insurance holdings in each individual's portfolio, we could potentially expand our model to include a model of life insurance demand and thereby use our estimates to examine how this aspect of the portfolio would respond to our counterfactual annuity rates, and how this in turn would affect the welfare estimates of these counterfactuals. We hope that further research with richer data will build on the model and identification results here to extend the analysis in this important dimension.

6.3. Modeling heterogeneity

Different distributional assumptions of heterogeneity

We explored the sensitivity of our welfare estimates to the parameterization of unobserved heterogeneity. One potential issue concerns our parametric assumption regarding the baseline mortality distribution at the individual level. As discussed at the end of Section 3, our assumption about the shape of the individual mortality hazard affects our estimate of unobserved mortality heterogeneity (i.e., σα). To explore the importance of our assumption, row 13 presents results under a different assumption about the mortality distribution at the individual level. In particular, we assume a mortality distribution at the individual level with a hazard rate of $\alpha_i \exp(\lambda (t - t_0)^h)$ with h = 1.5, which increases faster over time than the baseline Gompertz specification (which has the same form, but h = 1). This, by construction, leads to a higher estimated level of heterogeneity in mortality, since the baseline hazard is more convex at the individual level.
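To make the comparison between the two hazard shapes concrete, the following is a minimal sketch (not the authors' code) of the hazard and the implied survival function for a general exponent h. The function names, the trapezoid integration, and the parameter values used in any call are illustrative assumptions, not estimates from the paper.

```python
import math

def hazard(t, alpha, lam, h=1.0, t0=0.0):
    """Individual mortality hazard alpha * exp(lam * (t - t0)**h); h = 1 is Gompertz."""
    return alpha * math.exp(lam * (t - t0) ** h)

def survival(t, alpha, lam, h=1.0, t0=0.0, steps=2000):
    """Survival from t0 to t, exp(-integrated hazard), using the trapezoid rule."""
    if t <= t0:
        return 1.0
    dt = (t - t0) / steps
    # Trapezoid rule: half weight on the two end points, full weight inside.
    total = 0.5 * (hazard(t0, alpha, lam, h, t0) + hazard(t, alpha, lam, h, t0))
    for k in range(1, steps):
        total += hazard(t0 + k * dt, alpha, lam, h, t0)
    return math.exp(-total * dt)
```

For h = 1 the numerical survival function can be checked against the closed-form Gompertz expression; for h = 1.5 no closed form is used, which is why the sketch integrates numerically.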

We also investigated the sensitivity of the results to joint distributional assumptions other than our baseline assumption that α and β are jointly lognormally distributed. Due to our estimation procedure, it is convenient to parameterize the joint distribution of α and β in terms of the marginal distribution of α and the conditional distribution of β. It is common in hazard models with heterogeneity to assume a gamma distribution (Han and Hausman (1990)). Accordingly, we estimate our model assuming that α follows a gamma distribution. We assume that conditional on α, β is distributed either lognormally (row 14) or gamma (row 15). Specifically, let aα be the shape parameter and bα be the scale parameter of the marginal distribution of α. When β is conditionally lognormally distributed, its distribution is parameterized by

$$\log(\beta) \mid \alpha \sim N\left(\mu_\beta + \rho\,(\log(\alpha) - \log(b_\alpha)),\ \sigma_\beta^2\right).$$
(18)

When β is conditionally gamma distributed, its shape parameter is simply aβ, and its conditional scale parameter is bβ = exp(μβ + ρ(log(α) – log(bα))). These specifications allow thinner tails, compared to the bivariate lognormal baseline.
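The two alternative parameterizations can be illustrated with a short simulation sketch. It assumes only the structure stated above (gamma marginal for α, then either a lognormal or a gamma conditional for β); the function names and any numerical inputs are hypothetical, and Python's `random.gammavariate(shape, scale)` is used for the gamma draws.

```python
import math
import random

def conditional_location(alpha, mu_beta, rho, b_alpha):
    """mu_beta + rho * (log(alpha) - log(b_alpha)), shared by both conditional families."""
    return mu_beta + rho * (math.log(alpha) - math.log(b_alpha))

def draw_alpha_beta(rng, a_alpha, b_alpha, mu_beta, rho, sigma_beta=None, a_beta=None):
    """Draw (alpha, beta): alpha ~ Gamma(shape a_alpha, scale b_alpha), then beta | alpha."""
    alpha = rng.gammavariate(a_alpha, b_alpha)
    loc = conditional_location(alpha, mu_beta, rho, b_alpha)
    if a_beta is None:
        # Row 14 analogue: log(beta) | alpha is normal with mean loc and s.d. sigma_beta.
        beta = rng.lognormvariate(loc, sigma_beta)
    else:
        # Row 15 analogue: beta | alpha is gamma with shape a_beta and scale exp(loc).
        beta = rng.gammavariate(a_beta, math.exp(loc))
    return alpha, beta
```

Passing `sigma_beta` selects the conditionally lognormal case, and passing `a_beta` selects the conditionally gamma case; in both, ρ governs how the location of β shifts with log(α).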

In unreported specifications, we have also experimented with discrete mixtures of lognormal distributions, in an attempt to investigate the sensitivity of our estimates to the one-parameter correlation structure of the baseline specification. These mixtures of lognormal distributions almost always collapsed back to the single lognormal distribution of the baseline estimates, trivially leading to almost identical welfare estimates.

Bounds

As mentioned earlier, an alternative to a parametric interpolation is to make no attempt at interpolation, and to simply use the identified points as bounds on the cumulative distribution function. To do so, we fix μα and σα (and λ) at our baseline estimates, and then use semiparametric Maximum Likelihood to obtain estimates for Pr(g(α, β) = y|α), where y = 0, 5, 10. As shown in Proposition 2, this conditional guarantee choice is identified even when the choice set is discrete. Using the guarantee choice model and the fact that the guarantee choice is (weakly) monotone in β in our model, these conditional guarantee choices can be mapped to bounds on the conditional distribution Fβ|α (see our discussion of β0,5(αi) and β5,10(αi) at the end of Section 3). We can then use these bounds to compute bounds on any object of interest.

To be more precise, let h(α, β) be an object of interest (e.g., welfare), and consider the case in which we wish to bound its population average. We then compute an upper bound by:

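$$\overline{Eh} = \int \Big( \big(\sup_{\beta<\beta_{0,5}(\alpha)} h(\alpha,\beta)\big) \Pr(\beta<\beta_{0,5}(\alpha)) + \big(\sup_{\beta\in[\beta_{0,5}(\alpha),\,\beta_{5,10}(\alpha)]} h(\alpha,\beta)\big) \Pr(\beta\in[\beta_{0,5}(\alpha),\beta_{5,10}(\alpha)]) + \big(\sup_{\beta>\beta_{5,10}(\alpha)} h(\alpha,\beta)\big) \Pr(\beta>\beta_{5,10}(\alpha)) \Big)\, dF(\alpha),$$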
(19)

and similarly for the lower bound (with sup replaced by inf). We focus on bounding the welfare change from the different counterfactuals. To do this, we first compute the expected annuity payments in the observed equilibrium (these are point identified, as they are a function of the conditional guarantee choice, Pr(g(α, β) = y|α)), and use this to compute annuity rates in each of the counterfactuals. We then follow the procedure above to obtain bounds on the welfare change for each of the counterfactuals (a symmetric information case, and each of the three mandates we explored), separately for each age and gender combination.
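The bounding calculation in equation (19) can be sketched on a discretized grid as follows. This is an illustration under stated assumptions rather than the procedure actually used: the integral over α is replaced by a weighted sum over hypothetical nodes, the sup and inf are taken over a finite grid in β, and the unbounded top interval is truncated at a user-supplied `beta_max`.

```python
def bound_average(h, alpha_nodes, cutoffs, choice_probs, beta_max, n_grid=200, upper=True):
    """Upper (or lower) bound on the population average of h(alpha, beta).

    alpha_nodes:  list of (alpha, weight) pairs approximating dF(alpha)
    cutoffs:      list of (beta_05, beta_510) indifference points, one pair per alpha
    choice_probs: list of (Pr(g=0|alpha), Pr(g=5|alpha), Pr(g=10|alpha))
    beta_max:     truncation point for the top interval (an assumption of this sketch)
    """
    pick = max if upper else min
    total = 0.0
    for (alpha, weight), (b05, b510), probs in zip(alpha_nodes, cutoffs, choice_probs):
        intervals = [(0.0, b05), (b05, b510), (b510, beta_max)]
        for (lo, hi), prob in zip(intervals, probs):
            # Only the probability mass of each interval is identified, so the
            # bound puts all of that mass on the best (or worst) beta within it.
            grid = [lo + (hi - lo) * k / n_grid for k in range(n_grid + 1)]
            total += weight * prob * pick(h(alpha, b) for b in grid)
    return total
```

Calling the function with `upper=False` gives the lower bound. Because the two bounds differ only in where the mass of each interval is placed, they coincide whenever h does not vary with β within an interval.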

The results from this exercise (not shown) imply that across all age and gender combinations, the welfare ranking of the different mandates is the same as in our baseline case. In all age-gender cases, the welfare effect of the different mandates can be unambiguously ranked in the sense that their bounds do not overlap. In particular, a 10 year guarantee mandate results in a positive welfare gain which even at its lower bound is always higher than the upper bound of the welfare gain from any other mandate. The no guarantee mandate always produces a negative effect on welfare (even at the upper bound), and a 5 year guarantee mandate results in a small and mostly negative welfare effect (in two of the four age-gender combinations the upper bound of the welfare is positive, but very small). As in the baseline model, the welfare gain of the symmetric information equilibrium is similar to that of a 10 year guarantee mandate in the sense that the ranges of these welfare gains largely overlap (although in most cases the symmetric equilibrium outcome results in slightly tighter bounds). Consistent with the baseline results, in all cases we also obtain the result that the vast majority of individuals choose the 10-year guarantee contract in the symmetric information counterfactual. To check robustness, we also use the same procedure to bound the difference in welfare between each counterfactual and each of the others. Given that the bounds on the welfare change do not overlap, it may not be surprising that the bounds on the welfare differences also give rise to the same ranking of guarantee mandates. That is, zero is never within these bounds, so each mandate can be unambiguously ranked with respect to each of the alternatives.

In contrast to the robust ranking, the bounds on the estimated magnitude of the welfare gains (from either symmetric information or from the 10-year guarantee mandate) are not tight. For example, in the largest age-gender cell (65 year old males), we estimate the lower bound on the welfare gain from symmetric information to be as low as 30 percent of our baseline estimate, and in another cell (60 year old males) the upper bound on the welfare change from symmetric information is 56 percent higher than our baseline estimate. We view these results as largely consistent with the rest of the sensitivity analysis in this section; the results regarding the optimal mandate, as well as the similarity of the welfare gains from the optimal mandate and symmetric information are quite robust, but the quantitative estimates of the welfare gains are more sensitive to various assumptions.

Allowing heterogeneity in other parameters

While we allow for heterogeneity in mortality (α) and in preference for wealth after death (β), our baseline specification does not allow for heterogeneity in other determinants of annuity choice, such as risk aversion and discount rate. Since the various parameters are only identified up to a single dimension (see Section 3), except by functional form, more flexible estimation of α and β is analogous to a specification which frees up these other parameters.

One way to effectively allow for more flexible heterogeneity is to allow the mean of α and β to depend on various observable covariates. In particular, one might expect both mortality and preferences for wealth after death to vary with an individual's socioeconomic status. We observe two proxies for the annuitant's socioeconomic status: the amount of wealth annuitized and the geographic location of the annuitant's residence (his or her ward) if the annuitant is in England or Wales (about 10 percent of our sample is from Scotland). We link the annuitant's ward to ward-level data on socioeconomic characteristics of the population from the 1991 U.K. Census; there is substantial variation across wards in average socioeconomic status of the population (Finkelstein and Poterba (2006)). Row 16 shows the results of allowing the mean of both parameters to vary with the annuitized amount and the percent of the annuitant's ward that has received the equivalent of a high school degree or higher; both of these covariates may proxy for the socioeconomic status of the annuitant.

We also report results from an alternative model in which – in contrast to our baseline model – we assume that individuals are homogeneous in their β but heterogeneous in γ, the coefficient of risk aversion for utility from consumption. Rows 17 and 18 report such a specification. In row 17 we fix β at its estimated conditional median from the baseline specification (Table III) and assume that α and the coefficient of risk aversion for utility from consumption are heterogeneous and (bivariate) lognormally distributed. The γ coefficient in the utility from wealth after death b(w) is fixed at 3. As in row 6, this specification gives rise to non-homothetic preferences, so we use the median wealth level from Banks and Emmerson (1999) and later renormalize, so the reported results are comparable.

Row 18 allows for preference heterogeneity in both β and γ. For computational reasons, we assume that γ is drawn from a discrete support (of 1.5, 3, and 4.5). We assume that α and β are (as in the baseline model) jointly lognormally distributed, but we allow γ (which is unobserved) to shift their means. We note that this specification of heterogeneity in both β and γ is only identified by functional form, cautioning against structural interpretation of the estimated distribution of heterogeneity.

6.4. Imperfect information about mortality

Throughout we made a strong assumption that individuals have perfect information about their actual mortality rate αi. This is consistent with empirical evidence that individuals’ perceptions about their mortality probabilities covary in sensible ways with known risk factors, such as age, gender, smoking, and health status (Hamermesh (1985), Smith, Taylor, and Sloan (2001), Hurd and McGarry (2002)). Of course, such work does not preclude the possibility that individuals also make some form of an error in forecasting their mortality.

We therefore investigate other assumptions about the information structure. Recall that while we make a perfect information assumption in order to establish identification, we can identify the model using alternative assumptions about the information structure. We report two such exercises here.

Before reporting the exercises, we note at the outset two potential complications with models of imperfect information, which are why we prefer to work with perfect information in our baseline specification. First, the dynamic nature of our model gives rise to potential learning. As individuals survive longer they may update their prior about their true underlying mortality process. While such learning can no longer affect their (past) guarantee choice, it could affect their consumption decisions. If forward looking individuals anticipate this possibility for learning, they may take this into account and it could alter their guarantee choice. We do not account for such learning in the exercises we report below. Second, once information is imperfect, the notion of welfare may be less obvious. One could measure “perceived” welfare which is measured with respect to the individual's information, or “true” welfare which is measured with respect to the true mortality process. We choose to report perceived welfare, which is more consistent with our notion of wealth equivalence.

Throughout, we assume that individuals have perfect information about the mortality process, except for their idiosyncratic risk characterized by αi. With some abuse of notation, we denote by κ(αi) the perceived mortality risk by individual i. Our first set of exercises assumes that individuals have biased beliefs about their mortality risk. In particular, individuals know that

$$\log \kappa(\alpha_i) = \mu_\alpha(x_i) + \theta\,(\log \alpha_i - \mu_\alpha(x_i)),$$
(20)

where αi is the true mortality rate of individual i, μα is the population mean of log αi (estimated in Table III), and κ(αi) is the mortality rate perceived by individuals when they make their guarantee choice and subsequent consumption decisions. θ is a free parameter. When θ = 1 individuals have correct beliefs and the above assumption reduces to our baseline model. When θ < 1 individuals perceive their mortality process as closer to the mean, while θ > 1 is the case where individuals over-weight idiosyncratic information. Results for the cases of θ = 0.5 and θ = 2 are summarized in rows 19 and 20 of Table VII.
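Equation (20) is simple enough to state directly in code. The sketch below is illustrative only; the function name is hypothetical and μα is passed as a single number rather than as a function of covariates.

```python
import math

def perceived_alpha(alpha, mu_alpha, theta):
    """Perceived mortality rate kappa(alpha) under biased beliefs, equation (20)."""
    return math.exp(mu_alpha + theta * (math.log(alpha) - mu_alpha))
```

With θ = 1 the function returns the true α; with θ = 0.5 the perceived rate lies between α and exp(μα); and with θ = 2 it lies further from exp(μα) than α does, which matches the verbal description of under- and over-weighting idiosyncratic information.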

The second set of exercises assumes that individuals have correct, but uncertain beliefs about their mortality risk. In particular, let

$$\log \kappa(\alpha_i) \sim N(\log \alpha_i,\ \sigma_\varepsilon^2).$$
(21)

Our baseline model is the special case of σε = 0. The case of σε > 0 represents specifications where individuals are more uncertain about their mortality realization. We model the guarantee choices by having individuals form expected value functions by integrating over this additional uncertainty. In rows 21 and 22 we summarize results for the cases of σε = 0.027 and σε = 0.108, which are half and twice our estimate of σα (see Table III).
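The step of forming expected value functions under equation (21) can be sketched as a one-dimensional numerical integral over the standard normal shock. This is a minimal illustration, assuming a truncated trapezoid rule and a caller-supplied value function for each guarantee length; the names `expected_value`, `choose_guarantee`, and the value functions themselves are hypothetical.

```python
import math

def expected_value(value_fn, alpha, sigma_eps, n=801, bound=8.0):
    """E[value_fn(kappa)] when log(kappa) ~ N(log(alpha), sigma_eps**2)."""
    step = 2.0 * bound / (n - 1)
    total = 0.0
    for j in range(n):
        x = -bound + j * step                      # standard normal shock
        weight = step * (0.5 if j in (0, n - 1) else 1.0)
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += weight * phi * value_fn(alpha * math.exp(sigma_eps * x))
    return total

def choose_guarantee(value_by_guarantee, alpha, sigma_eps):
    """Guarantee length with the highest expected value under uncertain beliefs."""
    return max(value_by_guarantee,
               key=lambda g: expected_value(value_by_guarantee[g], alpha, sigma_eps))
```

Setting `sigma_eps = 0` collapses the integral to the value at the true α, which corresponds to the baseline model.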

6.5. Departing from the neoclassical model

Our baseline model is a standard neoclassical model with fully rational individuals. It is worth briefly discussing various “behavioral” phenomena that our baseline model (or extensions to it) can accommodate.

A wide variety of non-standard preferences may be folded into the interpretation for the preference for wealth after death parameter β. As previously noted, this preference may reflect a standard bequest motive, or some version of “regret” or “peace of mind” that have been discussed in the behavioral literature (Braun and Muermann (2004)).

Another possibility we considered is non-traditional explanations for the high fraction of individuals in our data who choose the 5 year guarantee option. One natural possibility that can be ruled out is that this reflects an influence of the 5 year guarantee as the default option. In practice there is no default for individuals in our sample, all of whom annuitized at age 60 or 65. Individuals in this market are required to annuitize by age 70 (for women) or 75 (for men). To annuitize before that age, they must actively fill out a form when they decide to annuitize, and must check a chosen guarantee length. Failure to complete such an active decision would simply delay annuitization until the maximum allowed age.

Another natural possibility is that the popularity of the 5 year guarantee may partly reflect the well-known phenomenon in the marketing literature that individuals are more likely to “choose the middle” (Simonson and Tversky (1992)). We therefore estimated a specification of the model in which we allow for the possibility that some portion of individuals “blindly” choose the middle, that is the 5 year guarantee option. We allow such individuals to also differ in the mean mortality rate. Row 23 summarizes the results from such a specification.21

6.6. Estimates for a different population

As a final robustness exercise, we re-estimated the baseline model on a distinct sample of annuitants. As mentioned briefly in Section 2 and discussed in more detail in Appendix A, in our baseline estimates we limit the annuitant sample to the two-thirds of individuals who have accumulated their pension fund with our company. Annuitants may choose to purchase their annuity from an insurance company other than the one in which their funds have been accumulating, and about one-third of the annuitants in the market choose to do so. As our sample is from a single company, it includes those annuitants who accumulated their funds with the company and stayed with the company, as well as those annuitants who brought in external funds. Annuitants who approach the company with external funds face a different pricing menu than those who buy internally. Specifically, the annuity payment rates are lower by 2.5 pence per pound of the annuitized amount than the payment rates faced by “internal” annuitants.22 Annuitants who approach the company with external funds may also be drawn from a different distribution of risk and preferences, which is why we do not include them in our main estimates. The estimated parameters for this population are, indeed, quite different from the estimates we obtain for the internal individuals (not shown).

Row 24 shows the results of estimating the model separately for this distinct group of individuals, using their distinct pricing menu. We continue to find that the welfare minimizing mandate is the no guarantee mandate and that the welfare maximizing mandate is a 10 year guarantee, which can get very close to the welfare level of the first best outcome. The welfare cost of asymmetric information is also quite similar: £137 million in this “external” annuitant sample, compared to our baseline estimate of £127 million in our sample of annuitants who are “internal” to our firm. This gives us some confidence that our results may be more broadly applicable to the U.K. annuitant population as a whole and are not idiosyncratic to our particular firm and its pricing menu.

7. CONCLUSIONS

This paper represents, to our knowledge, one of the first attempts to empirically estimate the welfare costs of asymmetric information in an insurance market and the welfare consequences of mandatory social insurance. We have done so in the specific context of the semi-compulsory U.K. annuity market. In this market, individuals who save for retirement through certain tax-deferred pension plans are required to annuitize their accumulated wealth. They are allowed, however, to choose among different types of annuity contracts. This choice simultaneously opens up scope for adverse selection as well as selection based on preferences over different contracts. We estimate that both private information about risk and preferences are important in determining the equilibrium allocation of contracts across individuals. We use our estimates of the joint distribution of risk and preferences to calculate welfare under the current allocation and to compare it to welfare under various counterfactual allocations.

We find that government mandates that eliminate any choice among annuity contracts do not necessarily improve on the asymmetric information equilibrium. We estimate that a mandated annuity contract could increase welfare relative to the current equilibrium by as much as £127 million per year, or could reduce it by as much as £107 million per year, depending on what contract is mandated. Moreover, the welfare maximizing choice for a mandated contract would not be apparent to the government without knowledge of the joint distribution of risk and preferences. Our results therefore suggest that achieving welfare gains through mandatory social insurance may be harder in practice than simple theory would suggest.

Our results also suggest that, relative to a first-best symmetric information benchmark, the welfare cost of asymmetric information along the dimension of guarantee choice is about 25 percent of the maximum money at stake in this choice. These estimates account for about £127 million annually, or about 2 percent of annual premia in the market. However, these quantitative results are less robust to some of the modeling assumptions than the results concerning the optimal mandate.

Although our analysis is specific to the U.K. annuity market, the approach we take can be applied in other insurance markets. As seen, the data requirements for recovering the joint distribution of risk and preferences are data on the menu of choices each individual faces, the contract each chooses, and a measure of each individual's ex-post risk realization. Such data are often available from individual surveys or from insurance companies. These data are now commonly used to test for the presence of asymmetric information in insurance markets, including automobile insurance (Chiappori and Salanie (2000), Cohen and Einav (2007)), health insurance (Cardon and Hendel (2001)), and long term care insurance (Finkelstein and McGarry (2006)), as well as annuity markets. This paper suggests that such data can now also be used to estimate the welfare consequences of any asymmetric information that is detected, or of imposing mandatory social insurance in the market.

Our analysis was made substantially easier by the assumption that moral hazard does not exist in annuity markets. As discussed, this may be a reasonable assumption for the annuity market. It may also be a reasonable assumption for several other insurance markets. For example, Cohen and Einav (2007) argue that moral hazard is unlikely to be present over small deductibles in automobile insurance. Grabowski and Gruber (2005) present evidence that suggests that there is no detectable moral hazard effect of long term care insurance on nursing home use. In such markets, the approach in this paper can be straightforwardly adopted.

In other markets, such as health insurance, moral hazard is likely to play an important role. Estimation of the efficiency costs of asymmetric information therefore requires some additional source of variation in the data to separately identify the incentive effects of the insurance policies. One natural source would be exogenous changes in the contract menu. Such variation may occur when regulation requires changes in pricing, or when employers change the menu of health insurance plans from which their employees can choose.23 Non-linear experience rating schemes may also introduce useful variation in the incentive effects of insurance policies (Abbring, Chiappori, and Pinquet (2003), Abbring, Heckman, Chiappori, and Pinquet (2003), Israel (2004)). We consider the application and extension of our approach to other markets, including those with moral hazard, an interesting and important direction for further work.

Appendix A. Additional details about the data

As mentioned in the text, we restrict our sample in several ways:

  • As is common in the analysis of annuitant choices, we limit the sample to the approximately sixty percent of annuities that insure a single life. The mortality experience of the single life annuitant provides a convenient ex-post measure of risk; measuring mortality risk of a joint life policy which insures multiple lives is less straightforward (Mitchell, Poterba, Warshawsky, and Brown (1999), Finkelstein and Poterba (2004, 2006)).
  • We also restrict the sample to the approximately eighty percent of annuitants who hold only one annuity policy, since characterizing the features of the total annuity stream for individuals who hold multiple policies is more complicated. Finkelstein and Poterba (2006) make a similar restriction.
  • We focus on the choice of guarantee period and abstract from a number of other dimensions of individuals’ choices.
    – Individuals can choose the timing of their annuitization, although they cannot annuitize before age 50 (45 for women) or delay annuitizing past age 75 (70 for women). We allow average mortality and preferences for wealth after death to vary with age at purchase (as well as gender), but do not explicitly model the timing choice.
    – Annuitants may also take a tax-free lump sum of up to 25 percent of the value of the accumulated assets. We do not observe this decision – we observe only the amount annuitized – and therefore do not model it. However, because of the tax advantage of the lump sum – income from the annuity is treated as taxable income – it is likely that most individuals fully exercise this option, and ignoring it is therefore unlikely to be a concern.
    – To simplify the analysis, we analyze policies with the same payment profile, restricting our attention to the 90 percent of policies that pay a constant nominal payout (rather than payouts that escalate in nominal terms). As an ancillary benefit, this may make our assumption that individuals all have the same discount rate more plausible.
  • We limit our sample of annuitants to those who purchased a policy between January 1, 1988 and December 31, 1994. Although we also have data on annuitants who purchased a policy between January 1, 1995 and December 31, 1998, the firm altered its pricing policy in 1995. An exogenous change in the pricing menu might provide a useful source of variation in estimating the model. However, if the pricing change arose due to changes in selection of individuals into the firm – or if it affects subsequent selection into the firm – using this variation without allowing for changes in the underlying distribution of the annuitant parameters (i.e., in the joint distribution of α and β) could produce misleading estimates. We therefore limit the sample to the approximately one-half of annuities purchased in the pre-1995 pricing regime. In principle, we could also separately estimate the model for the annuities purchased in the post-1995 pricing regime. In practice, the small number of deaths among these more recent purchasers created problems for estimation in this sample.
  • Annuitants may choose to purchase their annuity from an insurance company other than the one in which their fund has been accumulating, and about one-third of annuitants market-wide choose to do so. As our sample is from a single company, it includes both annuitants who accumulated their fund with the company and stayed with the company, as well as those annuitants who brought in external funds. We limit our main analysis to the approximately two-thirds of individuals in our sample who purchased an annuity with a pension fund that they had accumulated within our company. In the robustness section, we re-estimate the model for the one-third of individuals who brought in external funds, and find similar welfare estimates.
  • The pricing of different guarantees varies with the annuitant's gender and age at purchase. We limit our sample of annuitants to those who purchased at the two most common ages of 60 or 65. About three-fifths of our sample purchased their annuity at 60 or 65.

Appendix B. Proof of Proposition 2

We can write the observed distribution of mortality outcomes and guarantee choices in terms of the unobservables as

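$$\Pr(g(\alpha,\beta) \le y \mid m_i \le m)\,\Pr(m_i \le m) = \int_0^\infty \Pr(g(\alpha,\beta) \le y \mid \alpha)\,\Pr(m_i \le m \mid \alpha)\, dF_\alpha(\alpha)$$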
(22)

The left side of this equation is known from Z(g, m). From Proposition 1 we know that Pr(mi ≤ m|α) and Fα (α) can be identified from mortality data. Thus, all we need to show is that this equation can be uniquely solved for Pr(g(α, β) ≤ y|α). We will use the fact that mortality follows an MPH model to derive an explicit expression for Pr(g(α, β) ≤ y|α) in terms of the inverse Laplace transform.24

Since Pr(mi ≤ m|α) comes from an MPH model, we can write it as

$$\Pr(m_i \le m \mid \alpha) = 1 - e^{-\alpha \Lambda(m)},$$
(23)

where $\Lambda(m) = \int_0^m \psi(t)\,dt$ is the integrated hazard function, which increases from 0 to ∞. Substituting equation (23) into equation (22) and rearranging yields

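$$\begin{aligned}
\Pr(g(\alpha,\beta) \le y,\ m_i \le m) &= \int_0^\infty \Pr(g(\alpha,\beta) \le y \mid \alpha)\left(1 - e^{-\alpha\Lambda(m)}\right) dF_\alpha(\alpha) \\
&= \int_0^\infty \Pr(g(\alpha,\beta) \le y \mid \alpha)\, dF_\alpha(\alpha) - \int_0^\infty \Pr(g(\alpha,\beta) \le y \mid \alpha)\, e^{-\alpha\Lambda(m)}\, dF_\alpha(\alpha) \\
&= \Pr(g(\alpha,\beta) \le y) - \int_0^\infty \Pr(g(\alpha,\beta) \le y \mid \alpha)\, e^{-\alpha\Lambda(m)}\, dF_\alpha(\alpha).
\end{aligned}$$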
(24)

The first part of the right side of this equation is simply the unconditional cumulative distribution function of g and is known. The remaining integral on the right side is the Laplace transform of Pr(g(α, β) ≤ y|α) fα(α) evaluated at Λ(m). It is well known that the Laplace transform is unique and can be inverted. If we let $L^{-1}\{h(\cdot)\}(\alpha)$ denote the inverse Laplace transform of h(·) evaluated at α, then

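$$\Pr(g(\alpha,\beta) \le y \mid \alpha) = \frac{1}{f_\alpha(\alpha)}\, L^{-1}\left\{ \Pr(g(\alpha,\beta) \le y) - \Pr\left(g(\alpha,\beta) \le y,\ m_i \le \Lambda^{-1}(\cdot)\right) \right\}(\alpha).$$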
(25)

This equation provides an explicit expression for Pr(g(α, β) ≤ y|α), so it is identified.
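The logic of the argument can be illustrated in a deliberately simplified discrete analogue, which is an assumption of this sketch and not the setting of the proof. Suppose α takes only two known values with known probabilities. Then equation (24) evaluated at two different values of Λ(m) gives two linear equations in the two unknowns Pr(g ≤ y|α_k), and these can be solved directly. All names and numbers below are hypothetical.

```python
import math

def joint_cdf(q, p, alphas, Lam):
    """Pr(g <= y, m_i <= m) for a discrete alpha distribution, as in equation (24)."""
    return sum(pk * qk * (1.0 - math.exp(-ak * Lam))
               for pk, qk, ak in zip(p, q, alphas))

def recover_two_types(J1, J2, p, alphas, Lam1, Lam2):
    """Recover q_k = Pr(g <= y | alpha_k), k = 1, 2, from the joint CDF at two
    values of the integrated hazard, using Cramer's rule on the 2x2 system."""
    a11 = p[0] * (1.0 - math.exp(-alphas[0] * Lam1))
    a12 = p[1] * (1.0 - math.exp(-alphas[1] * Lam1))
    a21 = p[0] * (1.0 - math.exp(-alphas[0] * Lam2))
    a22 = p[1] * (1.0 - math.exp(-alphas[1] * Lam2))
    det = a11 * a22 - a12 * a21
    q1 = (J1 * a22 - a12 * J2) / det
    q2 = (a11 * J2 - a21 * J1) / det
    return q1, q2
```

In the continuous case treated in the proof, the finite linear system is replaced by the inversion of a Laplace transform, but the source of identification is the same: the joint distribution of guarantee choice and mortality is observed at many values of m.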

Given Pr(g(α, β) ≤ y|α) we can recover Fβ|α if g(α, β) is invertible with respect to β, for every α. With invertibility, we can write:

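$$\Pr(g(\alpha,\beta) \le y \mid \alpha) = \Pr\left(\beta \le g_\beta^{-1}(\alpha, y) \mid \alpha\right) = F_{\beta|\alpha}\left(g_\beta^{-1}(\alpha, y) \mid \alpha\right).$$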
(26)

Thus, we identify Fβ|α.

Appendix C. Additional details about estimation

C.1. Likelihood

For each individual we observe mortality data, mi = (ci, ti, di), where ci is the time at which person i entered the sample, ti is the time at which the person left the sample, and di indicates whether the person died (di = 1) or was censored (di = 0). The contribution of an individual's mortality to the likelihood, conditional on αi, is:

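$$\begin{aligned}
\Pr(m_i = (c_i, t_i, d_i) \mid \alpha, \lambda) &= \Pr(t = t_i \mid t > c_i, \alpha, \lambda)^{d_i}\, \Pr(t \ge t_i \mid t > c_i, \alpha, \lambda)^{1-d_i} \\
&= \frac{1}{S(\alpha,\lambda,c_i)} \left(s(\alpha,\lambda,t_i)\right)^{d_i} \left(S(\alpha,\lambda,t_i)\right)^{1-d_i},
\end{aligned}$$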
(27)

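where $S(\alpha,\lambda,t) = \exp\left(\frac{\alpha}{\lambda}\left(1 - e^{\lambda t}\right)\right)$ is the Gompertz survival function, and $s(\alpha,\lambda,t) = \alpha e^{\lambda t} \exp\left(\frac{\alpha}{\lambda}\left(1 - e^{\lambda t}\right)\right)$ is the Gompertz density. The log likelihood of the mortality data is computed by integrating equation (27) over α, and adding up all individuals: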

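$$L_M\left(\lambda, \mu_\alpha, \sigma_\alpha \mid (m_i)_{i=1}^{N}\right) = \sum_{i=1}^{N} \log\left( \int \Pr(m_i \mid \alpha, \lambda)\, \frac{1}{\sigma_\alpha}\, \phi\left(\frac{\log\alpha - \mu_\alpha}{\sigma_\alpha}\right) d\alpha \right).$$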
(28)

We maximize equation (28) over λ, μα, and σα to obtain an estimate of λ. The initial estimates of μα and σα are not used, as we obtain more efficient estimates of these parameters in the next step (described below).
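A compact sketch of the mortality likelihood is given below. It assumes the Gompertz hazard α e^{λt}, so that the survival function is exp((α/λ)(1 − e^{λt})), and it integrates over α after the change of variable x = (log α − μα)/σα, using a truncated trapezoid rule as a stand-in for whatever quadrature rule is preferred. Function names and the choice of rule are assumptions of this illustration.

```python
import math

def gompertz_survival(alpha, lam, t):
    """S(alpha, lam, t) = exp((alpha / lam) * (1 - exp(lam * t)))."""
    return math.exp(alpha / lam * (1.0 - math.exp(lam * t)))

def gompertz_density(alpha, lam, t):
    """s(alpha, lam, t) = hazard * survival = alpha * exp(lam * t) * S."""
    return alpha * math.exp(lam * t) * gompertz_survival(alpha, lam, t)

def mortality_contribution(c, t, d, alpha, lam):
    """Equation (27): density if death is observed (d = 1), survival if
    censored (d = 0), both conditional on being alive at entry time c."""
    top = gompertz_density(alpha, lam, t) if d == 1 else gompertz_survival(alpha, lam, t)
    return top / gompertz_survival(alpha, lam, c)

def standard_normal_nodes(n=401, bound=8.0):
    """Trapezoid nodes x_j with combined weights phi(x_j) * w_j on [-bound, bound]."""
    step = 2.0 * bound / (n - 1)
    nodes = []
    for j in range(n):
        x = -bound + j * step
        w = step * (0.5 if j in (0, n - 1) else 1.0)
        nodes.append((x, w * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)))
    return nodes

def mortality_log_likelihood(data, lam, mu_alpha, sigma_alpha, nodes=None):
    """Equation (28): sum over individuals of the log of the alpha-integrated
    contribution; data is a list of (c_i, t_i, d_i) tuples."""
    nodes = nodes or standard_normal_nodes()
    total = 0.0
    for c, t, d in data:
        integral = sum(
            wt * mortality_contribution(c, t, d, math.exp(mu_alpha + sigma_alpha * x), lam)
            for x, wt in nodes
        )
        total += math.log(integral)
    return total
```

The function `mortality_log_likelihood` could then be passed to any numerical optimizer over (λ, μα, σα); no particular optimizer is implied here.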

The contribution of an individual's guarantee choice to the likelihood is based on the guarantee choice model above. Recall that the value of a given guarantee depends on preference for wealth after death β, and annual mortality hazard, which depends on λ and α. Some additional notation will be necessary to make this relationship explicit. Let $V_0^{A(g)}(w_0; \beta, \alpha, \lambda)$ be the value of an annuity with guarantee length g to someone with initial wealth w0, Gompertz parameter λ, mortality rate α, and preference for wealth after death β. Conditional on α, the likelihood of choosing a guarantee of length gi is:

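$$\Pr(g_i \mid \alpha, \lambda) = \int 1\left(g_i = \arg\max_g V_0^{A(g)}(w_0; \beta, \alpha, \lambda)\right) dF_{\beta|\alpha}(\beta \mid \alpha)$$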
(29)

where 1(·) is an indicator function. As mentioned in the text, we numerically verified that the relative value of a longer guarantee increases with β. Therefore, we know that for each α there is some interval, [0, β0,5(α,λ)), such that the zero year guarantee is optimal for all β in that interval. β0,5(α,λ) is the value of β that makes someone indifferent between choosing a 0 and 5 year guarantee. Similarly, there are intervals, [β0,5(α,λ),β5,10(α,λ)), where the five year guarantee is optimal, and [β5,10(α,λ), ∞), where the ten year guarantee is optimal.25

We can express the likelihood of an individual's guarantee choice in terms of these indifference cutoffs as:

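$$\Pr(g_i \mid \alpha, \lambda) = \begin{cases}
F_{\beta|\alpha}(\beta_{0,5}(\alpha,\lambda)) & \text{if } g_i = 0 \\
F_{\beta|\alpha}(\beta_{5,10}(\alpha,\lambda)) - F_{\beta|\alpha}(\beta_{0,5}(\alpha,\lambda)) & \text{if } g_i = 5 \\
1 - F_{\beta|\alpha}(\beta_{5,10}(\alpha,\lambda)) & \text{if } g_i = 10
\end{cases}$$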
(30)

Given our lognormality assumption, the conditional cumulative distribution function Fβ|α (·) can be written as:

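$$F_{\beta|\alpha}(\beta(\alpha,\lambda)) = \Phi\left(\frac{\log(\beta(\alpha,\lambda)) - \mu_{\beta|\alpha}}{\sigma_{\beta|\alpha}}\right)$$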
(31)

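where Φ(·) is the normal cumulative distribution function, $\mu_{\beta|\alpha} = \mu_\beta + \frac{\sigma_{\alpha,\beta}}{\sigma_\alpha^2}(\log\alpha - \mu_\alpha)$ is the conditional mean of β, and $\sigma_{\beta|\alpha} = \sqrt{\sigma_\beta^2 - \frac{\sigma_{\alpha,\beta}^2}{\sigma_\alpha^2}}$ is the conditional standard deviation of β. The full log likelihood is obtained by combining Pr(gi|α, λ) and Pr(mi|α, λ), integrating over α, taking logs, and adding up over all individuals: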

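$$L(\mu, \Sigma, \lambda) = \sum_{i=1}^{N} \log \int \Pr(m_i \mid \alpha, \lambda)\, \Pr(g_i \mid \alpha, \lambda)\, \frac{1}{\sigma_\alpha}\, \phi\left(\frac{\log\alpha - \mu_\alpha}{\sigma_\alpha}\right) d\alpha.$$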
(32)
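The guarantee choice probabilities in equations (30) and (31) can be sketched as follows, taking the indifference cutoffs as given inputs. The function names are hypothetical, the covariance matrix is passed as its three distinct elements, and the cutoff values in any call are placeholders rather than solved indifference points.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def conditional_beta_params(alpha, mu_alpha, mu_beta, var_alpha, var_beta, cov_ab):
    """Mean and standard deviation of log(beta) given alpha under joint lognormality."""
    mean = mu_beta + cov_ab / var_alpha * (math.log(alpha) - mu_alpha)
    sd = math.sqrt(var_beta - cov_ab ** 2 / var_alpha)
    return mean, sd

def guarantee_probs(beta_05, beta_510, mean, sd):
    """Equation (30): probabilities of the 0, 5, and 10 year guarantees,
    given the cutoffs beta_05 < beta_510 for this alpha."""
    F05 = normal_cdf((math.log(beta_05) - mean) / sd)
    F510 = normal_cdf((math.log(beta_510) - mean) / sd)
    return {0: F05, 5: F510 - F05, 10: 1.0 - F510}
```

Multiplying the relevant entry of `guarantee_probs` by the mortality contribution at the same α, and integrating over α, gives the individual likelihood contribution in equation (32).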

We calculate the integral in equation (32) by quadrature. Let $\{x_j\}_{j=1}^{M}$ and $\{w_j\}_{j=1}^{M}$ be M quadrature points and weights for integrating from –∞ to ∞. Person i's contribution to the likelihood is:

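$$L_i(\mu, \Sigma, \lambda) = \sum_{j=1}^{M} \Pr\left(m_i \mid \alpha = e^{x_j\sigma_\alpha + \mu_\alpha}, \lambda\right) \Pr\left(g_i \mid \alpha = e^{x_j\sigma_\alpha + \mu_\alpha}, \lambda\right) \phi(x_j)\, w_j.$$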
(33)

We maximize the likelihood using a gradient based search. Specifically, we use the modeling language AMPL along with the SNOPT sequential quadratic programming algorithm (Gill, Murray, and Saunders (2002)) for maximization.

C.2. Guarantee indifference curves

As mentioned in the text, the most difficult part of calculating the likelihood is finding the points where people are indifferent between one guarantee option and another, that is finding β0,5(α,λ) and β5,10(α,λ). To find these points we need to compute the expected utility associated with each guarantee length.

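The value of a guarantee of length g with associated annual payments $z_t(g)$ is

$$
V^{A(g)}(w_0;\alpha,\beta) = \max_{\{c_t, w_t\}} \sum_{t=0}^{T} \left[ a_t(\alpha)\,\delta^t\,\frac{c_t^{1-\gamma}}{1-\gamma} + \beta f_t(\alpha)\,\delta^t\,\frac{\big(w_t + Z_t(g)\big)^{1-\gamma}}{1-\gamma} \right]
\quad \text{s.t.} \quad w_{t+1} = (1+r)\big(w_t + z_t(g) - c_t\big) \ge 0
\tag{34}
$$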

where δ is the discount factor, r is the interest rate, and $Z_t(g) = \sum_{\tau=t}^{t_0+g} \left(\frac{1}{1+r}\right)^{\tau-t} z_\tau(g)$ is the present discounted value of guaranteed future payments at time t. Also, $a_t(\alpha) = \prod_{\tau=1}^{t} \big(1-\kappa_\tau(\alpha)\big)$ is the probability of being alive at time t and $f_t(\alpha) = \kappa_t(\alpha) \prod_{\tau=1}^{t-1} \big(1-\kappa_\tau(\alpha)\big)$ is the probability of dying at time t. Note that a person who dies at time t dies before consuming $c_t$ or receiving $z_t(g)$. Technically, there are also no-borrowing constraints and non-negativity constraints on wealth and consumption. However, it is easy to verify that these constraints never bind, the former because the individuals are retirees who do not accumulate new income, and the latter because of the form of the utility functions.
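The survival and death probabilities and the present value of guaranteed payments defined above can be computed directly from a hazard path. The sketch below is illustrative only: the names are hypothetical, the hazards are passed as a plain list, the probability of dying at time 0 is set to zero by assumption (the product formula is only defined from t = 1), and `last_guaranteed` stands for the index t0 + g.

```python
def survival_and_death_probs(kappa):
    """kappa[t-1] holds the mortality hazard kappa_t for t = 1..T.

    Returns lists a and f indexed by t = 0..T, where a[t] is the probability
    of being alive at t and f[t] the probability of dying at t.
    """
    a = [1.0]   # empty product: alive at time 0
    f = [0.0]   # assumption: no death at time 0
    for k in kappa:
        f.append(k * a[-1])          # alive through t-1, then die at t
        a.append((1.0 - k) * a[-1])  # alive through t-1, then survive t
    return a, f

def guaranteed_present_value(z, t, last_guaranteed, r):
    """Discounted value at t of payments z[tau] for tau = t..last_guaranteed."""
    return sum(z[tau] / (1.0 + r) ** (tau - t)
               for tau in range(t, last_guaranteed + 1))
```

Once the guarantee period has passed (t greater than `last_guaranteed`) the sum is empty and the present value is zero.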

We used the first order conditions from equation (34) to collapse the problem to a numerical optimization over a single variable, consumption at time zero. The first order conditions for equation (34) are

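$$
\delta^t a_t(\alpha)\, c_t^{-\gamma} = \psi_t \qquad \forall t \in \{0, 1, \ldots, T\}
\tag{35}
$$

$$
\delta^t f_t(\alpha)\, \beta\, \big(w_t + G_t^g\big)^{-\gamma} = -\psi_t + \frac{1}{1+r}\,\psi_{t-1} \qquad \forall t \in \{1, 2, \ldots, T\}
\tag{36}
$$

$$
(w_t + z_t - c_t)(1+r) = w_{t+1} \qquad \forall t \in \{0, 1, \ldots, T-1\}
\tag{37}
$$

where $\psi_t$ is the Lagrange multiplier on the budget constraint at time t, and $G_t^g$ is the present discounted value of guaranteed future payments at time t (written $Z_t(g)$ in equation (34)). Initial wealth $w_0$ is taken as given. It is not possible to completely solve the first order conditions analytically. However, suppose we knew $c_0$. Then from the budget constraint (equation (37)), we can calculate $w_1$. From the first order condition for $c_0$ (equation (35)), we can find $\psi_0$: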

$$
\psi_0 = a_0(\alpha)\,\delta^0\, c_0^{-\gamma}.
\tag{38}
$$

We can then use the first order condition for $w_1$ to solve for $\psi_1$:

$$
\psi_1 = -f_1(\alpha)\,\delta^1 \beta\, \big(w_1 + G_1^g\big)^{-\gamma} + \frac{1}{1+r}\,\psi_0.
\tag{39}
$$

Then, $\psi_1$ and the first order condition for $c_t$ give $c_1$:

$$
c_1 = \left(\frac{\psi_1}{\delta^1 a_1(\alpha)}\right)^{-\frac{1}{\gamma}}.
\tag{40}
$$

Continuing in this way, we can find the whole path of optimal $c_t$ and $w_t$ associated with the chosen $c_0$. If this path satisfies the non-negativity constraints on consumption and wealth, then we have defined a value function of $c_0$, $\tilde{V}(c_0, g, \alpha, \beta)$. Thus, we can reformulate the optimal consumption problem as an optimization problem over one variable:

$$
\max_{c_0} \tilde{V}(c_0, g, \alpha, \beta).
\tag{41}
$$
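The forward recursion that turns a trial value of initial consumption into a full path, and hence into the one-variable value function, can be sketched as follows. This is a minimal illustration rather than the code used for estimation: the function name and the list-based inputs are hypothetical, `G[t]` holds the present value of guaranteed payments at t, γ is assumed different from one, and an infeasible path is signalled by returning `None`.

```python
def shoot_path(c0, w0, z, G, a, f, beta, gamma, delta, r):
    """Roll equations (37)-(40) forward from a trial c0.

    z, G, a, f are lists indexed by t = 0..T. Returns (c, w, value),
    or None if the path violates a non-negativity constraint.
    """
    T = len(z) - 1
    u = lambda x: x ** (1.0 - gamma) / (1.0 - gamma)   # CRRA utility
    if c0 <= 0.0 or w0 + G[0] <= 0.0:
        return None
    c, w = [c0], [w0]
    psi = a[0] * c0 ** (-gamma)                        # (38), delta**0 == 1
    value = a[0] * u(c0) + beta * f[0] * u(w0 + G[0])
    for t in range(1, T + 1):
        w_t = (1.0 + r) * (w[-1] + z[t - 1] - c[-1])   # budget constraint (37)
        if w_t < 0.0 or w_t + G[t] <= 0.0:
            return None
        # first order condition for w_t, (36), solved for psi_t as in (39)
        psi = psi / (1.0 + r) - f[t] * delta**t * beta * (w_t + G[t]) ** (-gamma)
        if psi <= 0.0:
            return None
        c_t = (psi / (delta**t * a[t])) ** (-1.0 / gamma)  # (35), as in (40)
        c.append(c_t)
        w.append(w_t)
        value += delta**t * (a[t] * u(c_t) + beta * f[t] * u(w_t + G[t]))
    if (1.0 + r) * (w[-1] + z[T] - c[-1]) < 0.0:       # terminal wealth must be >= 0
        return None
    return c, w, value
```

With no bequest motive, no mortality, no discounting and a zero interest rate, the recursion returns a flat consumption path, which is a convenient sanity check.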

Numerically maximizing a function of a single variable is a relatively easy problem and can be done quickly and robustly. We solve the maximization problem in equation (41) using a simple bracket and bisection method. To check our program, we compared the value function as computed in this way and by an earlier version of the program that used a discretization and backward induction approach. They agreed up to the expected precision.
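One possible reading of a bracket and bisection method for a single-variable maximization is sketched below in Python. The details are assumptions, since the text does not spell them out: the function is taken to be single-peaked on the initial bracket, and the bracket is halved according to the sign of a finite-difference slope. The name `maximize_unimodal` is hypothetical.

```python
def maximize_unimodal(fn, lo, hi, tol=1e-6):
    """Locate the maximizer of a single-peaked fn on [lo, hi].

    Bisects on the sign of a central finite-difference slope; the step h is
    kept below half the final bracket width so evaluations stay inside [lo, hi].
    """
    h = tol / 4.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fn(mid + h) - fn(mid - h) > 0.0:
            lo = mid   # still rising: the maximizer lies to the right
        else:
            hi = mid   # flat or falling: the maximizer lies to the left
    return 0.5 * (lo + hi)
```

In the consumption problem the bracket would need to exclude values of initial consumption for which the implied path is infeasible.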

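Finally, the guarantee cutoffs, $\beta_{0,5}(\alpha,\lambda)$ and $\beta_{5,10}(\alpha,\lambda)$, are defined as the solution to

$$
V^{A(0)}\big(w_0;\alpha,\beta_{0,5}(\alpha,\lambda)\big) = V^{A(5)}\big(w_0;\alpha,\beta_{0,5}(\alpha,\lambda)\big)
\tag{42}
$$

$$
V^{A(5)}\big(w_0;\alpha,\beta_{5,10}(\alpha,\lambda)\big) = V^{A(10)}\big(w_0;\alpha,\beta_{5,10}(\alpha,\lambda)\big)
\tag{43}
$$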
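Equations (42) and (43) each define a cutoff as the root of a difference between two value functions. Because the relative value of the longer guarantee increases with β, such a root can be located by bisection. The Python sketch below is illustrative: the function name is hypothetical, and the two value functions are passed in as callables of β for a fixed α.

```python
def indifference_cutoff(value_short, value_long, beta_lo, beta_hi, tol=1e-10):
    """Find beta at which value_long(beta) - value_short(beta) crosses zero.

    Assumes the gain from the longer guarantee is increasing in beta and that
    [beta_lo, beta_hi] brackets the crossing.
    """
    gain = lambda b: value_long(b) - value_short(b)
    if gain(beta_lo) > 0.0 or gain(beta_hi) < 0.0:
        raise ValueError("cutoff is not bracketed")
    while beta_hi - beta_lo > tol:
        mid = 0.5 * (beta_lo + beta_hi)
        if gain(mid) < 0.0:
            beta_lo = mid   # shorter guarantee still preferred
        else:
            beta_hi = mid   # longer guarantee preferred
    return 0.5 * (beta_lo + beta_hi)
```

Each evaluation of `gain` requires solving two consumption problems, which is why the cutoffs are the expensive part of the likelihood.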

For each α, we solve for these cutoff points using a simple bisective search. Each evaluation of the likelihood requires knowledge of $\beta_{0,5}(\alpha(x_j),\lambda)$ and $\beta_{5,10}(\alpha(x_j),\lambda)$ at each integration point $x_j$. Maximizing the likelihood requires searching over $\mu_\alpha$ and $\sigma_\alpha$, which will shift $\alpha(x_j)$. As mentioned in the text, rather than recomputing these cutoff points each time $\alpha(x_j)$ changes, we initially compute them on a dense grid of values of α, and log-linearly interpolate as needed.
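The interpolation step can be illustrated with a short Python sketch. It assumes that "log-linear" means interpolating log β linearly in log α between neighboring grid points, which is one natural reading but is not stated explicitly; the function name and the list-based grid are hypothetical.

```python
from bisect import bisect_right
from math import exp, log

def loglinear_interp(alpha, alpha_grid, cutoff_grid):
    """Interpolate log(cutoff) linearly in log(alpha) on a sorted grid."""
    if not alpha_grid[0] <= alpha <= alpha_grid[-1]:
        raise ValueError("alpha lies outside the precomputed grid")
    # index of the right-hand neighbor, clamped so the last grid point works
    k = min(bisect_right(alpha_grid, alpha), len(alpha_grid) - 1)
    x0, x1 = log(alpha_grid[k - 1]), log(alpha_grid[k])
    y0, y1 = log(cutoff_grid[k - 1]), log(cutoff_grid[k])
    share = (log(alpha) - x0) / (x1 - x0)
    return exp(y0 + share * (y1 - y0))
```

Under this scheme a cutoff that is a power function of α is reproduced exactly between grid points, and the expensive bisection only has to be run once per grid value.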

Footnotes

*We are grateful to three anonymous referees and Steve Berry (the Editor) for many useful comments and suggestions. We also thank James Banks, Richard Blundell, Jeff Brown, Peter Diamond, Carl Emmerson, Jerry Hausman, Igal Hendel, Wojciech Kopczuk, Jonathan Levin, Alessandro Lizzeri, Ben Olken, Casey Rothschild, and many seminar participants for helpful comments, as well as several patient and helpful employees at the firm whose data we analyze. Financial support from the National Institute of Aging grant #R01 AG032449 (Einav and Finkelstein), the National Science Foundation grant #SES-0643037 (Einav), the Social Security Administration grant #10-P-98363-3 to the National Bureau of Economic Research as part of the SSA Retirement Research Consortium (Einav and Finkelstein), and the Alfred P. Sloan Foundation (Finkelstein) is gratefully acknowledged. Einav also acknowledges the hospitality of the Hoover Institution. The findings and conclusions expressed are solely those of the author(s) and do not represent the views of SSA, any agency of the Federal Government, or the NBER.

1More recently, several new working papers have presented additional attempts to quantify the efficiency cost of adverse selection in annuities (Hosseini (2008)) and in health insurance (Carlin and Town (2007), Bundorf, Levin, and Mahoney (2008), Einav, Finkelstein, and Cullen (2008), and Lustig (2008)).

2For more details on these rules, see Appendix A and Finkelstein and Poterba (2002).

3Over 90 percent of the annuitants in our firm purchase policies that pay a constant nominal payout (rather than policies that escalate in nominal terms). This is typical of the market as a whole. Although escalating policies (including inflation-indexed policies) are offered by some firms, they are rarely purchased (Murthi, Orszag, and Orszag (1999), and Finkelstein and Poterba (2004)).

4Specifically, we estimated Gompertz and Cox proportional hazard models in which we included indicator variables for age at purchase and gender, as well as indicator variables for a 5 year guarantee and a 10 year guarantee. In both models, we found that the coefficient on the 5 year guarantee dummy was significantly different from that on the 0 year guarantee dummy; however, the standard error on the coefficient on the 10 year guarantee dummy was high, so it wasn't estimated to be significantly different from the 5 year guarantee dummy (or from the 0 year guarantee dummy as well).

5A rare exception on quantity discounts is made for individuals who annuitize an extremely large amount.

6These statistics are reported in Finkelstein and Poterba (2006) who also analyze data from this firm. These statistics refer to single life annuities, which are the ones we analyze here, but are (obviously) computed prior to the additional sample restrictions we make here (e.g., restriction to nominal annuities purchased at ages 60 or 65).

7As might be expected, we can rule out a model with deterministic length of life and perfect foresight. Most individuals in the data choose a positive guarantee length and are alive at the end of it, thus violating such a model.

8Of course, one would expect some relationship between the individual's expectation and the actual underlying risk which governs the (stochastic) mortality outcome. We make specific assumptions about this relationship later, but for the purpose of modeling guarantee choice this is not important.

9Specifically, we use likelihood-ratio tests of the baseline Gompertz model against more general alternatives where λ is allowed to vary with time. We divide the period of observation over which we observe mortality outcomes (21 years) into two and three evenly spaced intervals and let λ vary across intervals. The p-values of these tests are 0.938 and 0.373, respectively.

10We ignore inflation uncertainty, which may lead us to over-state the welfare value of the nominal annuities we analyze. We make this abstraction for computational simplicity, and because prior work has found that incorporating uncertain inflation based on historical inflation patterns in the U.S. has a small quantitative effect (of about 1-2 percent) on the welfare gain from annuitization (Mitchell, Poterba, Warshawsky, and Brown (1999)). Since the U.K. inflation experience has been broadly similar, it seems natural to expect a qualitatively similar (small) effect in our context too.

11Note that all three parameters – λ, μα, σα – are in fact identified and estimated. However, we later re-estimate μα and σα using the entire data (that contain the guarantee choices), which is more efficient. As will be clear below, estimating λ using the entire data is computationally more demanding.

12Exactly how representative the mortality experience of the pensioners is for that of compulsory annuitants is not clear. See Finkelstein and Poterba (2002) for further discussion of this issue.

13We obtain it by dividing the £6 billion figure we have just referred to by the average annuitized amount (in 1998) in our full company data (rather than the sample we use for estimation; see Appendix A), which is £20,000.

14Our average wealth equivalent is noticeably lower than what has been calculated in the previous literature (Mitchell, Poterba, Warshawsky, and Brown (1999), Davidoff, Brown, and Diamond (2005)). The high wealth equivalents in these papers in turn imply a very high rate of voluntary annuitization, giving rise to what is known as the “annuity puzzle” since, empirically, very few individuals voluntarily purchase annuities (Brown, Mitchell, Poterba, and Warshawsky (2001)). Our substantially lower wealth equivalents – which persist in the robustness analysis (see Table VII) – arise because of the relatively high β that we estimate. Previous papers have calibrated rather than estimated β and assumed it to be 0. If we set log α = μα and β = 0, and also assume – like these other papers – that annuitization is full (i.e., 100 percent vs. 20 percent in our baseline), then we find that the wealth equivalent of a 0 year guarantee for a 65 year old male rises to 135.9, which is much closer to the wealth equivalent of 156 reported by Davidoff, Brown, and Diamond (2005).

15The observed cross-subsidies across guarantee choices may be due to asymmetric information. For example, competitive models of pure adverse selection (with no preference heterogeneity), such as Miyazaki (1977) and Spence (1978), can produce equilibria with cross-subsidies from the policies with less insurance (in our context, longer guarantees) to those with more insurance (in our context, shorter guarantees). We should note that the observed cross-subsidies may also arise from varying degrees of market power in different guarantee options. In such cases, symmetric information may not eliminate cross-subsidies, and our symmetric information counterfactual would therefore conflate the joint effects of elimination of informational asymmetries and of market power. Our analysis of the welfare consequences of government mandates in the next subsection does not suffer from this same limitation.

16This is somewhat analogous to an insurance market with a state-dependent utility function. In such a case, the optimal mandate could be either full, partial, or no insurance (and analogously longer or shorter guarantee). For more details, see Sections 2 and 3.1 of the working paper version (Einav, Finkelstein, and Schrimpf (2007)).

17We estimate that welfare is slightly higher under the 10 year mandate than under the symmetric information equilibrium (in which everyone chooses the 10 year guarantee). This presumably reflects the fact that under the mandated (pooling) annuity payout rates, consumption is higher for low mortality individuals and lower for high mortality individuals than it would be under the symmetric information annuity payout rates. Since low mortality individuals have lower consumption in each period and hence higher marginal utility of consumption, this transfer improves social welfare (given the particular social welfare measure we use).

18Banks and Emmerson (1999) report that the quartiles of the wealth distribution among pensioners aged 60-69 are 1,750, 8,950, and 24,900 pounds. We assume that the population of retirees is drawn from these three levels, with probability 37.5%, 25%, and 37.5%, respectively.

19Banks and Emmerson (1999) report an average η of 20 percent and a median of 10 percent. We therefore calibrate heterogeneity in η by assuming it can obtain one of three values – 0.1, 0.2, and 0.4 – with probabilities of 0.5, 0.25, and 0.25, respectively.

20On average in the U.K. population, about 50 percent of retirees’ wealth is annuitized through the public Social Security program, although this fraction declines with retiree wealth (Office of National Statistics (2006)). Compulsory annuitants tend to be of higher than average socio-economic status (Banks and Emmerson (1999)) and may therefore have on average a lower proportion of their wealth annuitized through the public Social Security program. However, since our purpose is to examine the sensitivity of our welfare estimates to accounting for publicly provided annuities, we went with the higher estimate to be conservative.

21Welfare of individuals who always choose the middle is not well defined, and the reported results only compute the welfare for those individuals who are estimated to be “rational” and to choose according to the baseline model. For comparability with the other specifications, we still scale the welfare estimates by the overall annuitized amount in the market.

22We found it somewhat puzzling that payout rates are lower for individuals who approach the company with external funds, and who therefore are more likely to be actively searching across companies. According to the company executives, some of the explanation lies in the higher administrative costs associated with transferring external funds, which also create stronger incentives to retain internal individuals by offering them better rates.

23See also Adams, Einav, and Levin (2009) for a similar variation in the context of credit markets.

24Alternatively, we could proceed by noting that for each x, equation (22) is a Fredholm integral equation of the first kind with kernel Pr(mi ≤ m|α). We could appeal to the theory of integral equations and linear operators to show that the equation has a unique solution when Pr(mi ≤ m|α) satisfies an appropriate condition. Proving the proposition in this way would be slightly more general, but it would lead to a highly implicit function that defines Pr(g(α, β) ≤ x|α).

25Note that it is possible that $\beta_{0,5}(\alpha,\lambda) > \beta_{5,10}(\alpha,\lambda)$. In this case there is no interval where the five year guarantee is optimal. Instead, there is some $\beta_{0,10}(\alpha,\lambda)$ such that a 0 year guarantee is optimal if $\beta < \beta_{0,10}(\alpha,\lambda)$ and a 10 year guarantee is optimal otherwise. This situation (which does not create potential estimation problems, but simply implies that a 5 year guarantee is never optimal) only arises for high values of α that are well outside the range of our mortality data.

References

  • Abbring J, Chiappori P-A, Pinquet J. Moral hazard and Dynamic Insurance Data. Journal of the European Economic Association. 2003;1:767–820.
  • Abbring J, Heckman JJ, Chiappori P-A, Pinquet J. Adverse Selection and Moral Hazard in Insurance: Can Dynamic Data Help to Distinguish? Journal of the European Economic Association Papers and Proceedings. 2003;1:512–521.
  • Adams W, Einav L, Levin J. Liquidity Constraints and Imperfect Information in Subprime Lending. American Economic Review. 2009;99:49–84.
  • Akerlof G. The Market for ‘Lemons’: Quality Uncertainty and The Market Mechanism. Quarterly Journal of Economics. 1970;84:488–500.
  • Association of British Insurers. Insurance Statistics Year Book: 1986-1998. 1999.
  • Banks J, Emmerson C. IFS Briefing Note. Institute for Fiscal Studies; London: 1999. U.K. Annuitants.
  • Banks J, Emmerson C, Oldfield Z, Tetlow G. Prepared for Retirement? The Adequacy and Distribution of Retirement Resources in England. Institute for Fiscal Studies; London: 2005. available at http://www.ifs.org.uk/comms/r67.pdf.
  • Barsky RB, Kimball MS, Juster FT, Shapiro MD. Preference Parameters and Behavioral Heterogeneity: An Experimental Approach in The Health and Retirement Study. Quarterly Journal of Economics. 1997;112:537–579.
  • Braun M, Muermann A. The Impact of Regret on The Demand for Insurance. Journal of Risk and Insurance. 2004;71:737–767.
  • Brown JR. Private Pensions, Mortality Risk, and the Decision to Annuitize. Journal of Public Economics. 2001;82:29–62.
  • Brown JR, Mitchell O, Poterba J, Warshawsky M. The Role of Annuity Markets in Financing Retirement. MIT Press; Cambridge, MA: 2001.
  • Bundorf KM, Levin J, Mahoney N. Pricing, Matching and Efficiency in Health Plan Choice. mimeo, Stanford University; 2008. available at http://www.stanford.edu/~jdlevin/research.htm.
  • Cardon JH, Hendel I. Asymmetric Information in Health Insurance: Evidence from the National Medical Expenditure Survey. Rand Journal of Economics. 2001;32:408–427.
  • Carlin C, Town RJ. Adverse Selection, Welfare and Optimal Pricing of Employer-Sponsored Health Plans. mimeo, University of Minnesota; 2007.
  • Chiappori P-A, Salanie B. Testing for Asymmetric Information in Insurance Markets. Journal of Political Economy. 2000;108:56–78.
  • Cohen A, Einav L. Estimating Risk Preferences from Deductible Choice. American Economic Review. 2007;97:745–788.
  • Davidoff T, Brown JR, Diamond PA. Annuities and Individual Welfare. American Economic Review. 2005;95:1573–1590.
  • Davis SJ, Kubler F, Willen P. Borrowing Costs and the Demand for Equity over the Life Cycle. Review of Economics and Statistics. 2006;86:348–362.
  • Dynan K, Skinner J, Zeldes S. Do The Rich Save More? Journal of Political Economy. 2004;112:397–444.
  • Einav L, Finkelstein A, Cullen MR. Estimating Welfare in Insurance Markets using Variation in Prices. 2008. NBER Working Paper 14414.
  • Einav L, Finkelstein A, Schrimpf P. The Welfare Cost of Asymmetric Information: Evidence from the U.K. Annuity Market. 2007. NBER working paper No. 13228.
  • Elbers C, Ridder G. True and Spurious Duration Dependence: The Identifiability of The Proportional Hazard Model. Review of Economic Studies. 1982;49:403–409.
  • Engen EM, Gale WG, Uccello CR. The Adequacy of Retirement Saving. Brookings Papers on Economic Activity. 1999;2:65–165.
  • Fang H, Keane M, Silverman D. Sources of Advantageous Selection: Evidence from the Medigap Insurance Market. Journal of Political Economy. 2008;116:303–350.
  • Feldstein M. Rethinking Social Insurance. 2005. NBER Working Paper No. 11250.
  • Finkelstein A, McGarry K. Multiple Dimensions of Private Information: Evidence from The Long-Term Care Insurance Market. American Economic Review. 2006;96:938–958.
  • Finkelstein A, Poterba J. Selection Effects in the Market for Individual Annuities: New Evidence from the United Kingdom. Economic Journal. 2002;112:28–50.
  • Finkelstein A, Poterba J. Adverse Selection in Insurance Markets: Policyholder Evidence from the U.K. Annuity Market. Journal of Political Economy. 2004;112:193–208.
  • Finkelstein A, Poterba J. Testing for Adverse Selection with ’Unused Observables’. 2006. NBER Working Paper No. 12112.
  • Finkelstein A, Rothschild C, Poterba J. Redistribution by Insurance Market Regulation: Analyzing a Ban on Gender-Based Retirement Annuities. Journal of Financial Economics. 2009;91:38–58.
  • Gill PE, Murray W, Saunders MA. SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM Journal of Optimization. 2002;12:979–1006.
  • Golosov M, Tsyvinski A. Optimal Taxation with Endogenous Insurance Markets. Quarterly Journal of Economics. 2007;122:487–534.
  • Grabowski DC, Gruber J. Moral Hazard in Nursing Home Use. 2005. NBER Working Paper No. 11723.
  • Hamermesh D. Expectations, Life Expectancy, and Economic Behavior. Quarterly Journal of Economics. 1985;100:389–408.
  • Han A, Hausman J. Flexible Parametric Estimation of Duration and Competing Risk Models. Journal of Applied Econometrics. 1990;5:1–28.
  • Heckman JJ, Singer B. The Identifiability of the Proportional Hazard Model. Review of Economic Studies. 1984;51:231–241.
  • Hosseini R. Adverse Selection in the Annuity Market and the Role for Social Security. mimeo, Arizona State University; 2008.
  • Horiuchi S, Coale A. A simple Equation for Estimating the Expectation of Life at Old Ages. Population Studies. 1982;36:317–326.
  • Hubbard G, Skinner J, Zeldes S. Precautionary Savings and Social Insurance. Journal of Political Economy. 1995;103:360–99.
  • Hurd MD. Mortality Risk and Bequests. Econometrica. 1989;57:779–813.
  • Hurd MD, McGarry K. The Predictive Validity of Subjective Probabilities of Survival. Economic Journal. 2002;112:966–985.
  • Institute of Actuaries and Faculty of Actuaries. Continuous Mortality Investigation Committee. Continuous Mortality Investigation Reports, Number 17. 1999.
  • Israel M. Do We Drive More Safely When Accidents Are More Expensive? Identifying Moral Hazard from Experience Rating Schemes. 2004. unpublished mimeo, available at http://www.kellogg.northwestern.edu/faculty/israel/htm/research.html.
  • Kopczuk W. Bequest and Tax Planning: Evidence from Estate Tax Returns. Quarterly Journal of Economics. 2007;122:1801–1854.
  • Kopczuk W, Lupton J. To Leave or not To Leave: The Distribution of Bequest Motives. Review of Economic Studies. 2007;74:207–235.
  • Kotlikoff LJ, Spivak A. The Family as an Incomplete Annuities Market. Journal of Political Economy. 1981;89:372–391.
  • Laibson DI, Repetto A, Tobacman J. Self-Control and Saving for Retirement. Brookings Papers on Economic Activity. 1998;1:91–196.
  • Laitner J, Thomas JF. New Evidence on Altruism: A Study of TIAA-CREF Retirees. American Economic Review. 1996;86:893–908.
  • Lustig JD. The Welfare Effects of Adverse Selection in Privatized Medicare. Mimeo, Boston University; 2008.
  • Mitchell OS, Poterba J, Warshawsky M, Brown JR. New Evidence on the Money's Worth of Individual Annuities. American Economic Review. 1999;89:1299–1318.
  • Miyazaki H. The Rat Race and Internal Labor Markets. Bell Journal of Economics. 1977;8:394–418.
  • Moneyfacts. Annuities Daily Update. Moneyfacts Publications; Norfolk, United Kingdom: Jan 4, 1995.
  • Murthi M, Orszag JM, Orszag PR. The Value for Money of Annuities in the UK: Theory, Experience and Policy. mimeo; 1999. available at http://citeseerx.ist.psu.edu/view-doc/summary?doi=10.1.1.39.8154.
  • Office of National Statistics. The Pensioners’ Incomes Series 2004/5. Pensions Analysis Directorate. 2006. available at http://www.dwp.gov.uk/asd/asd6/PI_series_0405.pdf.
  • Palumbo MG. Uncertain Medical Expenses and Precautionary Saving Near the End of the Life Cycle. Review of Economic Studies. 1999;66:395–421.
  • Ridder G. The Non-Parametric Identification of Generalized Hazard Models. Review of Economic Studies. 1990;57:167–182.
  • Rothschild M, Stiglitz JE. Equilibrium in Competitive Insurance Markets: An Essay on the Economics of Imperfect Information. Quarterly Journal of Economics. 1976;90:630–649.
  • Scholz JK, Seshadri A, Khitatrakun S. Are Americans Saving ‘Optimally’ for Retirement? Journal of Political Economy. 2006;114:607–643.
  • Sheshinski E. Differentiated Annuities in a Pooling Equilibrium. mimeo, Hebrew University of Jerusalem; 2006.
  • Simonson I, Tversky A. Choice in Context: Trade-off Contrast and Extremeness Aversion. Journal of Marketing Research. 1992;29:281–295.
  • Smith VK, Taylor D, Sloan F. Longevity Expectations and Death: Can People Predict Their Own Demise? American Economic Review. 2001;91:1126–1134.
  • Spence M. Product Differentiation and Performance in Insurance Markets. Journal of Public Economics. 1978;10:427–447.
  • Van den Berg GJ. Duration Models: Specification, Identification and Multiple Durations. In: Heckman JJ, Leamer EE, editors. Handbook of Econometrics. Edition 1. Vol. 5. Elsevier: 2001. pp. 3381–3460. Chapter 55.