
Econometrica. Author manuscript; available in PMC 2010 August 1.

Published in final edited form as:

Econometrica. 2010 May 1; 78(3): 883–931.

doi: 10.3982/ECTA6551

PMCID: PMC2885826

NIHMSID: NIHMS117812

Flavio Cunha, Department of Economics, University of Pennsylvania, 3718 Locust Walk, Philadelphia, PA 19102; Email: fcunha@sas.upenn.edu; phone: 215-898-5652.


This paper formulates and estimates multistage production functions for child cognitive and noncognitive skills. Output is determined by parental environments and investments at different stages of childhood. We estimate the elasticity of substitution between investments in one period and stocks of skills in that period to assess the benefits of early investment in children compared to later remediation. We establish nonparametric identification of a general class of nonlinear factor models. A by-product of our approach is a framework for evaluating childhood interventions that does not rely on arbitrarily scaled test scores as outputs and recognizes the differential effects of skills in different tasks. Using the estimated technology, we determine optimal targeting of interventions to children with different parental and personal birth endowments. Substitutability decreases in later stages of the life cycle for the production of cognitive skills. It increases in later stages of the life cycle for the production of noncognitive skills. This finding has important implications for the design of policies that target the disadvantaged. For some configurations of disadvantage and outcomes, it is optimal to invest relatively more in the later stages of childhood.

A large body of research documents the importance of cognitive skills for social and economic success.^{1} An emerging body of research establishes the parallel importance of noncognitive skills.^{2} Understanding the factors affecting the evolution of cognitive and noncognitive skills is important for understanding how to promote successful lives.^{3}

This paper estimates the technology governing the formation of cognitive and noncognitive skills in childhood. We establish identification of general nonlinear factor models which enable us to determine the technology of skill formation. Our multistage technology captures different developmental phases in the life cycle of a child. We identify and estimate substitution parameters that determine the importance of early parental investment for subsequent lifetime achievement, and the costliness of later remediation if early investment is not undertaken.

Cunha and Heckman (2007) present a theoretical framework that organizes and interprets a large body of empirical evidence on child and animal development.^{4}
Cunha and Heckman (2008) estimate a linear dynamic factor model that exploits cross equation restrictions (covariance restrictions) to secure identification of a multistage technology for child investment.^{5} With enough measurements relative to the number of latent skills and investments, it is possible to identify the latent state space dynamics generating the evolution of skills.

The linear technology used by Cunha and Heckman (2008) imposes the assumption that early and late investments are perfect substitutes. This paper identifies a more general nonlinear technology by extending linear state space and factor analysis to a nonlinear setting. This extension allows us to identify crucial elasticity of substitution parameters governing the trade-off between early and late investments.

Drawing on the analyses of Schennach (2004a) and Hu and Schennach (2008), we establish identification of the technology of skill formation. We relax the strong independence assumptions for error terms in the measurement equations that are maintained in Cunha and Heckman (2008) and Carneiro, Hansen, and Heckman (2003). The assumption of linearity of the technology in inputs that is used by Cunha and Heckman (2008) and Todd and Wolpin (2003, 2005) is not required. We allow inputs to interact in producing output. We generalize the factor-analytic index function models used by Carneiro, Hansen, and Heckman (2003) to allow for more general functional forms for measurement equations. We solve the problem of defining a scale for the output of childhood investments by anchoring test scores using the adult outcomes of the child, which have a well-defined cardinal scale. We determine the latent variables that generate test scores by estimating how the latent variables predict adult outcomes.^{6} Our approach sets the scale of test scores and latent variables in an interpretable metric. Using this metric, analysts can meaningfully interpret changes in output and conduct interpretable value-added analyses.^{7}

The plan of this paper is as follows. Section 2 briefly summarizes the previous literature to motivate our generalization of it. Section 3 presents our identification analysis. Section 4 discusses our estimation strategy. Section 5 discusses the data used to estimate the model and the model estimates. Section 6 concludes.

We analyze a model with multiple periods of childhood, *t* ∈ {1, 2, . . . , *T*}, *T* ≥ 2, followed by *J* periods of adult working life, *t* ∈ {*T* + 1, *T* + 2, . . . , *T* + *J*}. The *T* childhood periods are divided into *S* stages of development, *s* ∈ {1, . . . , *S*}, with *S* ≤ *T*. Adult outcomes are produced by cognitive skills, *θ*_{C,T+1}, and noncognitive skills,

Skills evolve in the following way. Each agent is born with initial conditions *θ*_{1} = (*θ*_{C,1}, *θ*_{N,1}). Family environments and genetic factors may influence these initial conditions (see Olds, 2002, and Levitt, 2003). We denote by *θ _{P}* = (

$${\theta}_{k,t+1}={f}_{s,k}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{t},{I}_{k,t},{\theta}_{P},{\eta}_{k,t}),$$

(2.1)

for *k* ∈ {*C, N*}, *t* ∈ {1, 2, . . . , *T*}, and *s* ∈ {1, . . . , *S*}. We assume that *f _{s,k}* is monotone increasing in its arguments, twice continuously differentiable, and concave in

Direct complementarity between the stock of skill *l* and the productivity of investment *I _{k,t}* in producing skill

$$\frac{{\partial}^{2}{f}_{s,k}(\cdot )}{\partial {I}_{k,t}\partial {\theta}_{l,t}}>0,\phantom{\rule{1em}{0ex}}t\in \{1,\dots ,T\},\phantom{\rule{1em}{0ex}}l,k\in \{C,N\}.$$

Period *t* stocks of abilities and skills promote acquisition of skills by making investment more productive. Students with greater early cognitive and noncognitive abilities are more efficient in later learning of both cognitive and noncognitive skills. The evidence from the early intervention literature suggests that the enriched early environments of the Abecedarian, Perry and CPC programs promoted greater efficiency in learning in high schools and reduced problem behaviors.^{9}

Adult outcome *j*, *Q _{j}*, is produced by a combination of different period

$${Q}_{j}={g}_{j}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{C,T+1},{\theta}_{N,T+1}),\phantom{\rule{1em}{0ex}}j\in \{1,\dots ,J\}{.}^{10}$$

(2.2)

These outcome equations capture the twin concepts that both cognitive and noncognitive skills matter for performance in most tasks in life and have different effects in different tasks in the labor market and in other areas of social performance. Outcomes include test scores, wages, achievement in an occupation, hours worked, criminal activity, teenage pregnancy, etc.

In this paper, we focus attention on a *CES* version of technology (2.1) where we assume
that *θ _{C,t}*,

$${\theta}_{C,t+1}={\left[{\gamma}_{s,C,1}{\theta}_{C,t}^{{\varphi}_{s,C}}+{\gamma}_{s,C,2}{\theta}_{N,t}^{{\varphi}_{s,C}}+{\gamma}_{s,C,3}{I}_{C,t}^{{\varphi}_{s,C}}+{\gamma}_{s,C,4}{\theta}_{C,P}^{{\varphi}_{s,C}}+{\gamma}_{s,C,5}{\theta}_{N,P}^{{\varphi}_{s,C}}\right]}^{\frac{1}{{\varphi}_{s,C}}},$$

(2.3)

$${\theta}_{N,t+1}={\left[{\gamma}_{s,N,1}{\theta}_{C,t}^{{\varphi}_{s,N}}+{\gamma}_{s,N,2}{\theta}_{N,t}^{{\varphi}_{s,N}}+{\gamma}_{s,N,3}{I}_{N,t}^{{\varphi}_{s,N}}+{\gamma}_{s,N,4}{\theta}_{C,P}^{{\varphi}_{s,N}}+{\gamma}_{s,N,5}{\theta}_{N,P}^{{\varphi}_{s,N}}\right]}^{\frac{1}{{\varphi}_{s,N}}},$$

(2.4)

where *γ _{s,k,l}* ∈ [0, 1], Σ
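As a rough illustration of technologies (2.3) and (2.4), the sketch below evaluates a single-stage CES aggregator in Python. The function name `ces_skill`, the numerical inputs, and the requirement that the share parameters sum to one are assumptions made for illustration only. The Cobb-Douglas branch at *ϕ* = 0 is the standard limit of the CES form rather than something stated above.

```python
import math


def ces_skill(inputs, shares, phi):
    """Evaluate a CES aggregator [sum_l gamma_l * x_l**phi]**(1/phi).

    inputs: strictly positive values, e.g. (theta_C, theta_N, I, theta_CP, theta_NP)
    shares: nonnegative weights gamma_1..gamma_5 (assumed here to sum to one)
    phi:    substitution parameter, phi <= 1; the elasticity is 1 / (1 - phi)
    """
    if len(inputs) != len(shares):
        raise ValueError("inputs and shares must have the same length")
    if any(x <= 0 for x in inputs):
        raise ValueError("inputs must be strictly positive")
    if any(g < 0 for g in shares) or abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must be nonnegative and sum to one")
    if phi > 1:
        raise ValueError("phi must not exceed one")
    if abs(phi) < 1e-12:
        # Cobb-Douglas limit of the CES form as phi -> 0.
        return math.exp(sum(g * math.log(x) for g, x in zip(shares, inputs)))
    return sum(g * x ** phi for g, x in zip(shares, inputs)) ** (1.0 / phi)


# Made-up values: strong complements (phi < 0) through perfect substitutes (phi = 1).
example_inputs = (1.0, 0.8, 0.5, 1.2, 0.9)
example_shares = (0.4, 0.1, 0.3, 0.1, 0.1)
for phi in (-2.0, 0.0, 0.5, 1.0):
    print(phi, round(ces_skill(example_inputs, example_shares, phi), 4))
```

For fixed inputs and shares the output is nondecreasing in *ϕ*, so a low investment level is penalized more heavily when inputs are strong complements.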

A *CES* specification of adult outcomes in periods after *T* is

$${Q}_{j}={\{{\rho}_{j}\phantom{\rule{thinmathspace}{0ex}}{\left({\theta}_{C,T+1}\right)}^{{\varphi}_{Q,j}}+(1-{\rho}_{j}){\left({\theta}_{N,T+1}\right)}^{{\varphi}_{Q,j}}\}}^{\frac{1}{{\varphi}_{Q,j}}},$$

(2.5)

where *ρ _{j}* ∈ [0, 1], and

To gain some insight into this model, consider a special case where the elasticities of substitution are the same across technologies (2.3) and (2.4) and in all outcome functions (2.5), so *ϕ _{s,C}* =

$$Q={\left[{\tau}_{1}{I}_{1}^{\varphi}+{\tau}_{2}{I}_{2}^{\varphi}+{\tau}_{3}{\theta}_{C,1}^{\varphi}+{\tau}_{4}{\theta}_{N,1}^{\varphi}+{\tau}_{5}{\theta}_{C,P}^{\varphi}+{\tau}_{6}{\theta}_{N,P}^{\varphi}\right]}^{\frac{1}{\varphi}},$$

(2.6)

where *τ _{i}* for

Suppose that parents maximize the net present value of child wealth, that they can lend and borrow freely at market rate *r* and that there is no uncertainty. Parents decide how much to invest in period “1”, *I*_{1}, and period “2”, *I*_{2}, and how much to transfer in risk-free assets at a fixed interest rate *r*, given total parental resources. Assuming an interior solution, and that the price of investment is the same in both periods, the optimal ratio of period 1 investment to period 2 investment is

$$\text{log}\left(\frac{{I}_{1}}{{I}_{2}}\right)=\left(\frac{1}{1-\varphi}\right)\left[\text{log}\left(\frac{{\tau}_{1}}{{\tau}_{2}}\right)-\text{log}(1+r)\right].$$

(2.7)

Figure 1 plots the ratio of early to late investment as a function of *τ*_{1}/*τ*_{2} for different values of *ϕ*. *Ceteris paribus*, the higher *τ*_{1} relative to *τ*_{2}, the higher first period investment should be relative to second period investment. The parameters *τ*_{1} and *τ*_{2} are affected by the productivity of investments in producing skills, which are generated by the technology parameters *γ*_{s,k,3}, for *s* ∈ {1, 2} and *k* ∈ {*C, N*}, and also depend on the relative importance of cognitive skills, *ρ*, versus noncognitive skills, 1 – *ρ*, in producing the adult outcome *Q*. *Ceteris paribus*, if ${\scriptstyle \frac{{\tau}_{1}}{{\tau}_{2}}}>(1+r)$, equation (2.7) implies that the ratio of early to late investment exceeds one and is greater the higher *ϕ* (i.e., the lower the *CES* complementarity). The greater *r*, the smaller should be the ratio of early to late investment. In the limit, if investments complement each other strongly, optimality implies that they should be equal in both periods.
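Equation (2.7) can be evaluated directly. The following is a minimal sketch; the function name `early_to_late_ratio` and the numerical values are illustrative assumptions, not estimates from the paper. It shows that the implied ratio falls as *r* rises and approaches one as *ϕ* becomes very negative, which is the strong-complementarity limit described above.

```python
import math


def early_to_late_ratio(tau1, tau2, r, phi):
    """Optimal I1 / I2 implied by equation (2.7); requires phi < 1."""
    if phi >= 1:
        raise ValueError("phi must be strictly below one")
    if tau1 <= 0 or tau2 <= 0 or r <= -1:
        raise ValueError("tau1 and tau2 must be positive and r must exceed -1")
    log_ratio = (math.log(tau1 / tau2) - math.log(1.0 + r)) / (1.0 - phi)
    return math.exp(log_ratio)


# Hypothetical values: early investment twice as productive, 5 percent interest rate.
for phi in (-10.0, -1.0, 0.0, 0.5):
    print(phi, round(early_to_late_ratio(2.0, 1.0, 0.05, phi), 4))
```

At *ϕ* = 0 the formula reduces to (*τ*_{1}/*τ*_{2})/(1 + *r*), which gives a convenient check on any implementation.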

Figure 1. Ratio of early to late investment in human capital as a function of the ratio of first period to second period investment productivity for different values of the complementarity parameter.

To see how these parameters affect the ratio of early to late investment, suppose that early investment only produces cognitive skill, so that *γ*_{1,N,3} = 0, and late investment only produces noncognitive skill, so that *γ*_{2,C,3} = 0. In this case, the ratio $\left({\scriptstyle \frac{{\tau}_{1}}{{\tau}_{2}}}\right)$ can be expressed in terms of the technology and outcome function parameters:

$$\left(\frac{{\tau}_{1}}{{\tau}_{2}}\right)=\frac{(\rho {\gamma}_{2,C,1}+(1-\rho ){\gamma}_{2,N,1})}{(1-\rho )}\frac{{\gamma}_{1,C,3}}{{\gamma}_{2,N,3}}.$$

For a given value of *ρ* (the weight placed on cognition in final outcomes), the ratio of early to late investment is higher the greater the ratio ${\scriptstyle \frac{{\gamma}_{1,C,3}}{{\gamma}_{2,N,3}}}$. To investigate the role *ρ* plays in determining the optimal ratio of investments, assume that *γ*_{2,C,1} ≥ *γ*_{2,N,1}, so that the stock of cognitive skill, *θ*_{C,1}, is at least as effective in producing next period cognitive skill, *θ*_{C,2}, as in producing next period noncognitive skill, *θ*_{N,2}. Under this assumption, the higher *ρ*, that is, the more important cognitive skills are in producing *Q*, the higher the equilibrium ratio *I*_{1}/*I*_{2}. If, on the other hand, *Q* is more intensive in noncognitive skills, then *I*_{1}/*I*_{2} is smaller.
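The dependence of *τ*_{1}/*τ*_{2} on *ρ* in this special case can be checked numerically. The sketch below is illustrative: the function name `productivity_ratio` and all parameter values are made up, chosen only so that *γ*_{2,C,1} ≥ *γ*_{2,N,1} holds.

```python
def productivity_ratio(rho, g2c1, g2n1, g1c3, g2n3):
    """tau_1 / tau_2 when gamma_{1,N,3} = 0 and gamma_{2,C,3} = 0.

    rho:  weight on cognitive skill in the adult outcome, 0 <= rho < 1
    g2c1: gamma_{2,C,1}, g2n1: gamma_{2,N,1}
    g1c3: gamma_{1,C,3}, g2n3: gamma_{2,N,3} (must be positive)
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    if g2n3 <= 0:
        raise ValueError("gamma_{2,N,3} must be positive")
    return (rho * g2c1 + (1.0 - rho) * g2n1) / (1.0 - rho) * g1c3 / g2n3


# Hypothetical parameter values with gamma_{2,C,1} >= gamma_{2,N,1}.
for rho in (0.2, 0.5, 0.8):
    print(rho, round(productivity_ratio(rho, 0.6, 0.2, 0.3, 0.3), 4))
```

With these values the ratio rises with *ρ*, in line with the statement that a more cognitively intensive outcome raises *I*_{1}/*I*_{2}.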

This example builds intuition about the importance of the elasticity of substitution in determining the optimal timing of lifecycle investments. However, it oversimplifies the analysis of skill formation. It is implausible that the elasticity of substitution between skills in adult output $\left({\scriptstyle \frac{1}{1-{\varphi}_{Q}}}\right)$ is the same as the elasticity of substitution for inputs in production, and that a common elasticity of substitution governs the productivity of inputs in producing both cognitive and noncognitive skills.

Our analysis allows for multiple adult outcomes and outputs of multiple skills. We allow different elasticities of substitution to govern the technologies of cognitive and noncognitive skills, for these to differ at different stages of the life cycle and for both to be different from the elasticity of substitution for cognitive and noncognitive skills in producing adult outcomes. We test and reject the assumption that *ϕ _{s,C}* =

Identifying and estimating technology (2.1) is challenging. Both inputs and outputs can only be proxied. Measurement error in general nonlinear specifications of technology (2.1) raises serious econometric challenges. Inputs may be endogenous and the unobservables in the input equations may be correlated with unobservables in the technology equations.

This paper addresses these challenges. Specifically, we: (1) Determine how stocks of cognitive and noncognitive skills at date *t* affect the stocks of skills at date *t* + 1, identifying both self productivity (the effects of *θ _{N,t}* on

Our analysis of identification proceeds in the following way. We start with a model where measurements are linear and separable in the latent variables, as in Cunha and Heckman (2008). We establish identification of the joint distribution of the latent variables without imposing conventional independence assumptions about measurement errors. With the joint distribution of latent variables in hand, we nonparametrically identify technology (2.1) given alternative assumptions about *η _{k,t}*. We then extend this analysis to identify nonparametric measurement and production models. We anchor the latent variables in adult outcomes to make their scales interpretable. Finally, we account for endogeneity of inputs in the technology equations and we model investment behavior.

We use a general notation for all measurements to simplify the econometric analysis. Let *Z _{a,k,t,j}* be the

$${Z}_{1,k,t,j}={\mu}_{1,k,t,j}+{\alpha}_{1,k,t,j}{\theta}_{k,t}+{\epsilon}_{1,k,t,j}$$

(3.1)

$${Z}_{2,k,t,j}={\mu}_{2,k,t,j}+{\alpha}_{2,k,t,j}{I}_{k,t}+{\epsilon}_{2,k,t,j},$$

(3.2)

and where *ε _{a,k,t,j}* are uncorrelated across the

$$\begin{array}{cc}\hfill & {Z}_{3,k,1,j}={\mu}_{3,k,1,j}+{\alpha}_{3,k,1,j}{\theta}_{k,P}+{\epsilon}_{3,k,1,j}{,}^{12}\hfill \\ \hfill & E\phantom{\rule{thinmathspace}{0ex}}\left({\epsilon}_{3,k,1,j}\right)=0,j\in \{1,\dots ,{M}_{3,k,1}\},\phantom{\rule{thickmathspace}{0ex}}\text{and}\phantom{\rule{thickmathspace}{0ex}}k\in \{C,N\}.\hfill \end{array}$$

(3.3)

The *α*s are factor loadings. The parameters and variables are defined conditional on *X* which we keep implicit. Following standard conventions in factor analysis, we set the scale of the factors by assuming *α*_{a,k,t,1} = 1 and normalize *E*(*θ _{k,t}*) = 0 and

We first establish identification of the factor loadings under the assumption that the *ε _{a,k,t,j}* are uncorrelated across

Since *Z*_{1,C,t,1} and *Z*_{1,C,t+1,1} are observed, one can compute *Cov* (*Z*_{1,C,t,1}, *Z*_{1,C,t+1,1}) from the data. Because of the normalization *α*_{1,C,t,1} = 1 for all *t*, we obtain:

$$Cov\phantom{\rule{thinmathspace}{0ex}}({Z}_{1,C,t,1},{Z}_{1,C,t+1,1})=Cov\phantom{\rule{thinmathspace}{0ex}}({\theta}_{C,t},{\theta}_{C,t+1}).$$

(3.4)

In addition, one can compute the covariance of the second measurement on cognitive skills at period *t* with the first measurement on cognitive skills at period *t* + 1:

$$Cov\phantom{\rule{thinmathspace}{0ex}}({Z}_{1,C,t,2},{Z}_{1,C,t+1,1})={\alpha}_{1,C,t,2}Cov\phantom{\rule{thinmathspace}{0ex}}({\theta}_{C,t},{\theta}_{C,t+1}).$$

(3.5)

If *Cov* (*θ _{C,t}*,

$$\frac{Cov\phantom{\rule{thinmathspace}{0ex}}({Z}_{1,C,t,2},{Z}_{1,C,t+1,1})}{Cov\phantom{\rule{thinmathspace}{0ex}}({Z}_{1,C,t,1},{Z}_{1,C,t+1,1})}={\alpha}_{1,C,t,2}.$$

If there are more than two measures of cognitive skill in each period *t*, one can identify *α*_{1,C,t,j} for *j* ∈ {2, 3, . . . , *M*_{1,C,t}}, *t* ∈ {1, . . . , *T*} up to the normalization *α*_{1,C,t,1} = 1. The assumption that the *ε _{a,k,t,j}* are uncorrelated across
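The covariance-ratio argument in (3.4) and (3.5) can be illustrated with a small simulation. This is only a sketch under assumed values: normal factors and errors, zero intercepts, a true loading of 0.7, and a hypothetical helper `simulate_loading_estimate`. None of these choices come from the paper.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def simulate_loading_estimate(alpha2=0.7, n=50_000, seed=0):
    """Estimate alpha_{1,C,t,2} as Cov(Z_{t,2}, Z_{t+1,1}) / Cov(Z_{t,1}, Z_{t+1,1})."""
    rng = random.Random(seed)
    z_t1, z_t2, z_next1 = [], [], []
    for _ in range(n):
        theta_t = rng.gauss(0.0, 1.0)
        # Next-period skill is correlated with current skill, so Cov != 0.
        theta_next = 0.8 * theta_t + rng.gauss(0.0, 0.6)
        # Measurement errors are drawn independently across measurements.
        z_t1.append(theta_t + rng.gauss(0.0, 0.5))           # loading normalized to 1
        z_t2.append(alpha2 * theta_t + rng.gauss(0.0, 0.5))  # loading to be recovered
        z_next1.append(theta_next + rng.gauss(0.0, 0.5))     # loading normalized to 1
    return cov(z_t2, z_next1) / cov(z_t1, z_next1)


if __name__ == "__main__":
    print(simulate_loading_estimate())
```

The estimate should be close to the assumed loading because the measurement errors drop out of both covariances, leaving the factor covariance in the numerator (scaled by the loading) and in the denominator.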

Once the parameters *α*_{1,C,t,j} are identified, one can rewrite (3.1), assuming *α*_{1,C,t,j} ≠ 0, as:

$$\frac{{Z}_{1,C,t,j}}{{\alpha}_{1,C,t,j}}=\frac{{\mu}_{1,C,t,j}}{{\alpha}_{1,C,t,j}}+{\theta}_{C,t}+\frac{{\epsilon}_{1,C,t,j}}{{\alpha}_{1,C,t,j}},j\in \{1,2,\dots ,{M}_{1,C,t}\}.$$

(3.6)

In this form, it is clear that the known quantities ${\scriptstyle \frac{{Z}_{1,C,t,j}}{{\alpha}_{1,C,t,j}}}$ play the role of repeated error-contaminated measurements of *θ _{C,t}*. Collecting results for all

$$\theta =\left({\left\{{\theta}_{C,t}\right\}}_{t=1}^{T},{\left\{{\theta}_{N,t}\right\}}_{t=1}^{T},{\left\{{I}_{C,t}\right\}}_{t=1}^{T},{\left\{{I}_{N,t}\right\}}_{t=1}^{T},{\theta}_{C,P},{\theta}_{N,P}\right).$$

Thus, we can identify the joint distribution of *θ*, *p*(*θ*).

Although the availability of numerous indicators for each latent factor is helpful in improving the efficiency of the estimation procedure, the identification of the model can be secured (after the factor loadings are determined) if only two measurements of each latent factor are available. Since in our empirical analysis we have at least two different measurements for each latent factor, we can define, without loss of generality, the following two vectors

$$\begin{array}{cc}\hfill {W}_{i}=& {\left({\left\{\frac{{Z}_{1,C,t,i}}{{\alpha}_{1,C,t,i}}\right\}}_{t=1}^{T},{\left\{\frac{{Z}_{1,N,t,i}}{{\alpha}_{1,N,t,i}}\right\}}_{t=1}^{T},{\left\{\frac{{Z}_{2,C,t,i}}{{\alpha}_{2,C,t,i}}\right\}}_{t=1}^{T},{\left\{\frac{{Z}_{2,N,t,i}}{{\alpha}_{2,N,t,i}}\right\}}_{t=1}^{T},\frac{{Z}_{3,C,1,i}}{{\alpha}_{3,C,1,i}},\frac{{Z}_{3,N,1,i}}{{\alpha}_{3,N,1,i}}\right)}^{\prime}\hfill \\ \hfill & i\in \{1,2\}.\hfill \end{array}$$

These vectors consist of the first and the second measurements for each factor, respectively. The corresponding measurement errors are

$$\begin{array}{cc}\hfill {\omega}_{i}=& {\left({\left\{\frac{{\epsilon}_{1,C,t,i}}{{\alpha}_{1,C,t,i}}\right\}}_{t=1}^{T},{\left\{\frac{{\epsilon}_{1,N,t,i}}{{\alpha}_{1,N,t,i}}\right\}}_{t=1}^{T},{\left\{\frac{{\epsilon}_{2,C,t,i}}{{\alpha}_{2,C,t,i}}\right\}}_{t=1}^{T},{\left\{\frac{{\epsilon}_{2,N,t,i}}{{\alpha}_{2,N,t,i}}\right\}}_{t=1}^{T},\frac{{\epsilon}_{3,C,1,i}}{{\alpha}_{3,C,1,i}},\frac{{\epsilon}_{3,N,1,i}}{{\alpha}_{3,N,1,i}}\right)}^{\prime},\hfill \\ \hfill & i\in \{1,2\}.\hfill \end{array}$$

Identification of the distribution of *θ* is obtained from the following theorem. Let *L* denote the total number of latent factors, in our case 4*T* + 2.

**Theorem 1** *Let W*_{1}, *W*_{2}, *θ*, *ω*_{1}, *ω*_{2} *be random vectors taking values in* ${\mathbb{R}}^{L}$ *and related through*

$$\begin{array}{cc}\hfill & {W}_{1}=\theta +{\omega}_{1}\hfill \\ \hfill & {W}_{2}=\theta +{\omega}_{2}.\hfill \end{array}$$

*If (i) E* [*ω*_{1}|*θ*, *ω*_{2}] = 0 *and (ii) ω*_{2} *is independent from θ, then the density of θ can be expressed in terms of observable quantities as:*

$${p}_{\theta}\phantom{\rule{thinmathspace}{0ex}}\left(\theta \right)={\left(2\pi \right)}^{-L}\int {e}^{-i\chi \cdot \theta}\text{exp}\left({\int}_{0}^{\chi}\frac{E\left[i{W}_{1}{e}^{i\zeta \cdot {W}_{2}}\right]}{E\left[{e}^{i\zeta \cdot {W}_{2}}\right]}\cdot d\zeta \right)d\chi ,$$

*where* $i=\sqrt{-1}$, *provided that all the requisite expectations exist and* $E\left[{e}^{i\zeta \cdot {W}_{2}}\right]$ *is nonvanishing. Note that the innermost integral is the integral of a vector-valued field along a continuous path joining the origin and the point* $\chi \in {\mathbb{R}}^{L}$, *while the outermost integral is over the whole* ${\mathbb{R}}^{L}$ *space. If θ does not admit a density with respect to the Lebesgue measure, p*_{θ} (*θ*) *can be interpreted within the context of the theory of distributions.*

**Proof.** See Web Appendix, Part 1.^{15}

The striking improvement in this analysis over the analysis of Cunha and Heckman (2008) is that identification can be achieved under much weaker conditions regarding measurement errors: far fewer independence assumptions are needed. The asymmetry in the analysis of *ω*_{1} and *ω*_{2} generalizes previous analysis which treats these terms symmetrically. It gives the analyst a more flexible toolkit for the analysis of factor models. For example, our analysis allows analysts to accommodate heteroscedasticity in the distribution of *ω*_{1} that may depend on *ω*_{2} and *θ*. It also allows for potential correlation of components within the vectors *ω*_{1} and *ω*_{2}, thus permitting serial correlation within a given set of measurements.
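A simple second-moment implication of conditions (i) and (ii) is that *Cov*(*W*_{1}, *W*_{2}) = *Var*(*θ*) in the scalar case, even when the spread of *ω*_{1} depends on *θ* and *ω*_{2}. The simulation below is a sketch of that implication only; it does not implement the density formula of Theorem 1. The normal distributions, the particular form of heteroscedasticity, and the helper `second_moment_check` are illustrative assumptions.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def second_moment_check(n=100_000, seed=1):
    """Return (Cov(W1, W2), Var(theta)) from simulated scalar measurements."""
    rng = random.Random(seed)
    w1, w2, theta = [], [], []
    for _ in range(n):
        th = rng.gauss(0.0, 1.0)
        omega2 = rng.gauss(0.0, 0.5)  # condition (ii): independent of theta
        # Condition (i): omega1 has conditional mean zero given (theta, omega2),
        # but its scale varies with both, so it is heteroscedastic.
        scale = 0.3 + 0.4 * abs(th) + 0.2 * abs(omega2)
        omega1 = scale * rng.gauss(0.0, 1.0)
        theta.append(th)
        w1.append(th + omega1)
        w2.append(th + omega2)
    return cov(w1, w2), cov(theta, theta)


if __name__ == "__main__":
    print(second_moment_check())
```

The two returned numbers should be close, since the cross terms involving *ω*_{1} vanish in expectation under condition (i) and the cross term between *θ* and *ω*_{2} vanishes under condition (ii).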

The intuition for identification in this paper, as in all factor analyses, is that the signal is common to multiple measurements but the noise is not. In order to extract the signal from the noise, the disturbances have to satisfy some form of orthogonality with respect to the signal and with respect to each other. These conditions are various uncorrelatedness assumptions, conditional mean assumptions, or conditional independence assumptions. They are used in various combinations in Theorem 1, in Theorem 2 below, and in other results in this paper.

In this section, we extend the previous analysis for linear factor models to consider a measurement model of the general form

$${Z}_{j}={a}_{j}(\theta ,{\epsilon}_{j})\phantom{\rule{thickmathspace}{0ex}}\text{for}\phantom{\rule{thickmathspace}{0ex}}j\in \{1,\dots ,M\},$$

(3.7)

where *M* ≥ 3 and where the indicator *Z _{j}* is observed while the latent factor

$$\begin{array}{cc}\hfill {Z}_{j}& ={\left({\left\{{Z}_{1,C,t,j}\right\}}_{t=1}^{T},{\left\{{Z}_{1,N,t,j}\right\}}_{t=1}^{T},{\left\{{Z}_{2,C,t,j}\right\}}_{t=1}^{T},{\left\{{Z}_{2,N,t,j}\right\}}_{t=1}^{T},{Z}_{3,C,1,j},{Z}_{3,N,1,j}\right)}^{\prime}\hfill \\ \hfill {\epsilon}_{j}& ={\left({\left\{{\epsilon}_{1,C,t,j}\right\}}_{t=1}^{T},{\left\{{\epsilon}_{1,N,t,j}\right\}}_{t=1}^{T},{\left\{{\epsilon}_{2,C,t,j}\right\}}_{t=1}^{T},{\left\{{\epsilon}_{2,N,t,j}\right\}}_{t=1}^{T},{\epsilon}_{3,C,1,j},{\epsilon}_{3,N,1,j}\right)}^{\prime}\hfill \end{array}$$

while the vector of unobserved latent factors is:

$$\theta ={\left({\left\{{\theta}_{C,t}\right\}}_{t=1}^{T},{\left\{{\theta}_{N,t}\right\}}_{t=1}^{T},{\left\{{I}_{C,t}\right\}}_{t=1}^{T},{\left\{{I}_{N,t}\right\}}_{t=1}^{T},{\theta}_{C,P},{\theta}_{N,P}\right)}^{\prime}.$$

The functions *a _{j}* (·, ·) for

**Theorem 2** The distribution of θ in Equations (3.7) is identified under the following conditions:

1. *The joint density*^{16} *of θ, Z*_{1}, *Z*_{2}, *Z*_{3} *is bounded and so are all their marginal and conditional densities.*
2. *Z*_{1}, *Z*_{2}, *Z*_{3} *are mutually independent conditional on θ.*
3. *p*_{Z1|Z2}(*Z*_{1}|*Z*_{2}) *and p*_{θ|Z1}(*θ*|*Z*_{1}) *form a bounded, complete family of distributions indexed by Z*_{2} *and Z*_{1}*, respectively.*
4. *Whenever* $\theta \ne \tilde{\theta}$, ${p}_{{Z}_{3}\mid \theta}({Z}_{3}\mid \theta )$ *and* ${p}_{{Z}_{3}\mid \theta}({Z}_{3}\mid \tilde{\theta})$ *differ over a set of strictly positive probability.*
5. *There exists a known functional* Ψ, *mapping a density to a vector, that has the property that* $\Psi \left[{p}_{{Z}_{1}\mid \theta}\left(\cdot \mid \theta \right)\right]=\theta $.

**Proof.** See Web Appendix, Part 1.^{17}

The proof of Theorem 2 proceeds by casting the analysis of identification as a linear algebra problem analogous to matrix diagonalization. In contrast to the standard matrix diagonalization used in linear factor analyses, we do not work with random vectors. Instead, we work with their densities. This approach offers the advantage that the problem remains linear even when the random vectors are nonlinearly related.

The conditional independence requirement of Assumption 2 is weaker than the full independence assumption traditionally made in standard linear factor models as it allows for heteroskedasticity. Assumption 3 requires *θ*, *Z*_{1}, *Z*_{2} to be vectors of the same dimensions, while Assumption 4 can be satisfied even if *Z*_{3} is a scalar. The minimum number of measurements needed for identification is therefore 2*L* + 1, which is exactly the same number of measurements as in the linear, classical measurement error case.
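As a quick worked count, take *L* = 4*T* + 2 as defined before Theorem 1 and a hypothetical horizon of *T* = 2 childhood periods (a value chosen only for illustration):

$$L=4T+2=10,\qquad 2L+1=8T+5=21.$$

Here *Z*_{1} and *Z*_{2} each contribute *L* scalar measurements and *Z*_{3} contributes one.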

Versions of Assumption 3 appear in the nonparametric instrumental variable literature (e.g. Newey and Powell (2003), Darolles, Florens, and Renault (2002)). Intuitively, the requirement that *p*_{Z1|Z2} (*Z*_{1}|*Z*_{2}) forms a bounded complete family requires that the density of *Z*_{1} vary sufficiently as *Z*_{2} varies (and similarly for *p*_{θ|Z1} (*θ*|*Z*_{1})).^{18}

Assumption 4 is automatically satisfied, for instance, if *θ* is univariate and *a*_{3} (*θ*, *ε*_{3}) is strictly increasing in *θ*. However, it holds much more generally. Since *a*_{3} (*θ*, *ε*_{3}) is nonseparable, the distribution of *Z*_{3} conditional on *θ* can change with *θ*, thus making it possible for Assumption 4 to be satisfied even if *a*_{3} (*θ*, *ε*_{3}) is not strictly increasing in *θ*.

Assumption 5 specifies how the observed *Z*_{1} is used to “anchor” the scale of the unobserved *θ*. The most common choice of functional Ψ would be the mean, the mode, the median, or any other well-defined measure of location. This specification allows for non-classical measurement error. One way to satisfy this assumption is to normalize *a*_{1} (*θ*, *ε*_{1}) to be equal to *θ* + *ε*_{1}, where *ε*_{1} has zero mean, median or mode. The zero mode assumption is particularly plausible for surveys where respondents face many possible wrong answers but only one correct answer. Moving the mode of the answers away from zero would therefore require a majority of respondents to misreport in exactly the same way, an unlikely scenario. Many other nonseparable functions can also satisfy this assumption. With the distribution of *p _{θ}* (

Note that Theorem 2 *does not* claim that the distributions of the errors *ε _{j}* or that the functions

Nevertheless, various normalizations ensuring that the functions *a _{j}*(

The conditions justifying Theorems 1 and 2 are not nested within each other. Their different assumptions represent different trade-offs best suited for different applications. While Theorem 1 would suffice for the empirical analysis of this paper, the general result established in Theorem 2 will likely be quite useful as larger sample sizes become available.

Carneiro, Hansen, and Heckman (2003) present an analysis for nonseparable measurement equations based on a separable latent index structure, but invoke strong independence and “identification-at-infinity” assumptions. Our approach for identifying the distribution of *θ* from general nonseparable measurement equations does not require these strong assumptions.

Once the density of *θ* is known, one can identify nonseparable technology function (2.1) for *t* ∈ {1, . . . , *T*}; *k* ∈ {*C, N*}; and *s* ∈ {1, . . . , *S*}. Even if (*θ _{t}*,

One solution to this problem is to assume that (2.1) is additively separable in *η _{k,t}*. Another way to avoid this ambiguity is to normalize

$$\mathrm{Pr}\phantom{\rule{thinmathspace}{0ex}}[{\theta}_{k,t+1}\le \bar{\theta}\mid {\theta}_{t},{I}_{k,t},{\theta}_{P}]\equiv G\phantom{\rule{thinmathspace}{0ex}}(\bar{\theta}\mid {\theta}_{t},{I}_{k,t},{\theta}_{P}).$$

We identify technology (2.1) using the relationship

$${f}_{s,k}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{t},{I}_{k,t},{\theta}_{P},{\eta}_{k,t})={G}^{-1}\phantom{\rule{thinmathspace}{0ex}}({\eta}_{k,t}\mid {\theta}_{t},{I}_{k,t},{\theta}_{P})$$

where *G*^{−1} (*η _{k,t}* |

The more traditional separable technology with zero mean disturbance, *θ*_{k,t+1} = *f _{s,k}* (

$${f}_{s,k}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{t},{I}_{k,t},{\theta}_{P})\equiv E\phantom{\rule{thinmathspace}{0ex}}[{\theta}_{k,t+1}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{\theta}_{t},{I}_{k,t},{\theta}_{P}],$$

where the expectation is taken under the density ${p}_{{\theta}_{k,t+1}\mid {\theta}_{t},{I}_{k,t},{\theta}_{P}}$, which can be calculated from *p _{θ}*. The density of

$${p}_{{\eta}_{k,t}\mid {\theta}_{t},{I}_{k,t},{\theta}_{P}}({\eta}_{k,t}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{\theta}_{t},{I}_{k,t},{\theta}_{P})={p}_{{\theta}_{k,t+1}\mid {\theta}_{t},{I}_{k,t},{\theta}_{P}}({\eta}_{k,t}+E\phantom{\rule{thinmathspace}{0ex}}[{\theta}_{k,t+1}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{\theta}_{t},{I}_{k,t},{\theta}_{P}]\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{\theta}_{t},{I}_{k,t},{\theta}_{P}),$$

since ${p}_{{\theta}_{k,t+1}\mid {\theta}_{t},{I}_{k,t},{\theta}_{P}}$ is known once *p _{θ}* is known. We now show how to anchor the scales of

It is common in the empirical literature on child schooling and investment to measure outcomes by test scores. However, test scores are arbitrarily scaled. To gain a better understanding of the relative importance of cognitive and noncognitive skills and their interactions and the relative importance of investments at different stages of the life cycle, it is desirable to anchor skills in a common scale. In what follows, we continue to keep the conditioning on the regressors implicit.

We model the effect of period *T* + 1 cognitive and noncognitive skills on adult outcomes *Z*_{4,j}, for *j* ∈ {1, . . . , *J*}. Suppose that there are *J*_{1} observed outcomes that are linear functions of cognitive and noncognitive skills in period *T* + 1:

$${Z}_{4,j}={\mu}_{4,j}+{\alpha}_{4,C,j}{\theta}_{C,T+1}+{\alpha}_{4,N,j}{\theta}_{N,T+1}+{\epsilon}_{4,j},\phantom{\rule{thickmathspace}{0ex}}\text{for}\phantom{\rule{thickmathspace}{0ex}}j\in \{1,\dots ,{J}_{1}\}.$$

When adult outcomes are linear and separable functions of skills, we can define the anchoring functions to be:

$$\begin{array}{cc}\hfill {g}_{C,j}\phantom{\rule{thinmathspace}{0ex}}\left({\theta}_{C,T+1}\right)& ={\mu}_{4,j}+{\alpha}_{4,C,j}{\theta}_{C,T+1}\phantom{\rule{1em}{0ex}}\text{and}\hfill \\ \hfill {g}_{N,j}\phantom{\rule{thinmathspace}{0ex}}\left({\theta}_{N,T+1}\right)& ={\mu}_{4,j}+{\alpha}_{4,N,j}{\theta}_{N,T+1}.\hfill \end{array}$$

(3.8)

We can also anchor using nonlinear functions. One example would be an outcome produced by a latent variable ${Z}_{4,j}^{\ast}$, for *j* ∈ {1, . . . , *J*}:

$${Z}_{4,j}^{\ast}={\tilde{g}}_{j}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{C,T+1},{\theta}_{N,T+1})-{\epsilon}_{4,j}.$$

Note that we do not observe ${Z}_{4,j}^{\ast}$, but we observe the variable *Z*_{4,j} which is defined as:

$${Z}_{4,j}=\begin{cases}1, & \text{if}\phantom{\rule{thickmathspace}{0ex}}{\tilde{g}}_{j}({\theta}_{C,T+1},{\theta}_{N,T+1})-{\epsilon}_{4,j}\ge 0\\ 0, & \text{otherwise}.\end{cases}$$

In this notation

$$\begin{array}{cc}\hfill \mathrm{Pr}\phantom{\rule{thinmathspace}{0ex}}({Z}_{4,j}=1\mid {\theta}_{C,T+1},{\theta}_{N,T+1})& =\mathrm{Pr}\phantom{\rule{thinmathspace}{0ex}}[{\epsilon}_{4,j}\le {\stackrel{~}{g}}_{j}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{C,T+1},{\theta}_{N,T+1})\mid {\theta}_{C,T+1},{\theta}_{N,T+1}]\hfill \\ \hfill & ={F}_{{\epsilon}_{4,j}}\phantom{\rule{thinmathspace}{0ex}}[{\stackrel{~}{g}}_{j}({\theta}_{C,T+1},{\theta}_{N,T+1})\mid {\theta}_{C,T+1},{\theta}_{N,T+1}]\hfill \\ \hfill & ={g}_{j}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{C,T+1},{\theta}_{N,T+1}).\hfill \end{array}$$

Adult outcomes such as high school graduation, criminal activity, drug use, and teenage pregnancy may be represented in this fashion.
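As a minimal sketch of the binary-outcome case above, the snippet below evaluates Pr(*Z* = 1 | skills) under two assumptions that are not part of the text: the latent index is linear in the two skills, and the shock is standard normal (a probit-style anchor). The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF; stands in for F_epsilon (an assumption of this sketch)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binary_anchor(theta_c: float, theta_n: float,
                  mu: float = 0.0, alpha_c: float = 1.0,
                  alpha_n: float = 1.0) -> float:
    """Pr(Z = 1 | theta_C, theta_N) when the latent index is linear in skills."""
    # Hypothetical linear index playing the role of g-tilde in the text.
    index = mu + alpha_c * theta_c + alpha_n * theta_n
    # Z = 1 when index - epsilon >= 0, so the probability is F(index).
    return normal_cdf(index)


# Illustrative skill values, not estimates from the paper.
prob = binary_anchor(theta_c=0.5, theta_n=-0.2)
```

With positive loadings the anchor is increasing in each skill, which is the monotonicity property used later to invert the anchoring functions.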

To establish identification of *g _{j}* (

We can extract two separate “anchors” *g _{C,j}* (

$$\begin{array}{cc}\hfill {g}_{C,j}\phantom{\rule{thinmathspace}{0ex}}\left({\theta}_{C,T+1}\right)& \equiv \int {g}_{j}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{C,T+1},{\theta}_{N,T+1}){p}_{{\theta}_{N,T+1}}\phantom{\rule{thinmathspace}{0ex}}\left({\theta}_{N,T+1}\right)\phantom{\rule{thinmathspace}{0ex}}d{\theta}_{N,T+1},\hfill \\ \hfill {g}_{N,j}\phantom{\rule{thinmathspace}{0ex}}\left({\theta}_{N,T+1}\right)& \equiv \int {g}_{j}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{C,T+1},{\theta}_{N,T+1}){p}_{{\theta}_{C,T+1}}\phantom{\rule{thinmathspace}{0ex}}\left({\theta}_{C,T+1}\right)\phantom{\rule{thinmathspace}{0ex}}d{\theta}_{C,T+1},\hfill \end{array}$$

(3.9)

where the marginal densities, *p*_{θ}_{j,T+1} (*θ*_{N,T+1}), *j* {*C, N*} are identified by applying the preceding analysis. Both *g _{C,j}*(

The “anchored” skills, denoted by ${\stackrel{~}{\theta}}_{j,k,t}$, are defined as

$${\stackrel{~}{\theta}}_{j,k,t}={g}_{k,j}\phantom{\rule{thinmathspace}{0ex}}\left({\theta}_{k,t}\right),k\in \{C,N\},\phantom{\rule{1em}{0ex}}t\in \{1,\dots ,T\}.$$

The anchored skills inherit the subscript *j* because different anchors generally scale the same latent variables differently.
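The dependence of anchored skills on the choice of anchor can be illustrated with a small sketch. It assumes linear anchoring functions of the form in (3.8); the two anchors, their intercepts, and their loadings are hypothetical values, not estimates.

```python
from typing import Callable, Tuple


def make_linear_anchor(mu: float, alpha: float) -> Tuple[Callable[[float], float],
                                                         Callable[[float], float]]:
    """Return a linear anchoring function g and its inverse g^{-1}."""
    if alpha == 0.0:
        raise ValueError("alpha must be nonzero for the anchor to be invertible")

    def g(theta: float) -> float:
        return mu + alpha * theta

    def g_inv(anchored: float) -> float:
        return (anchored - mu) / alpha

    return g, g_inv


# Two hypothetical anchors applied to the same latent skill level.
g_school, g_school_inv = make_linear_anchor(mu=12.0, alpha=1.5)
g_other, g_other_inv = make_linear_anchor(mu=2.0, alpha=0.3)

theta = 0.4
anchored_school = g_school(theta)  # skill expressed in units of the first anchor
anchored_other = g_other(theta)    # same skill, expressed in units of the second anchor
```

The same latent value maps to different anchored values under the two anchors, and each inverse recovers the original latent skill.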

We combine the identification of the anchoring functions with the identification of the technology function *f _{s,k}* (

$$\begin{array}{cc}\hfill {\stackrel{~}{f}}_{j,s,k}& \left({\stackrel{~}{\theta}}_{j,C,t},{\stackrel{~}{\theta}}_{j,N,t},{I}_{k,t},{\theta}_{C,P},{\theta}_{N,P},{\eta}_{k,t}\right)\hfill \\ \hfill & \equiv {g}_{k,j}\left({f}_{s,k}\left({g}_{C,j}^{-1}\left({\stackrel{~}{\theta}}_{j,C,t}\right),{g}_{N,j}^{-1}\left({\stackrel{~}{\theta}}_{j,N,t}\right),{I}_{k,t},{\theta}_{C,P},{\theta}_{N,P},{\eta}_{k,t}\right)\right),k\in \{C,N\}\hfill \end{array}$$

where ${g}_{k,j}^{-1}(\cdot )$ denotes the inverse of the function *g _{k,j}* (·). Invertibility follows from the assumed monotonicity. It is straightforward to show that

$$\begin{array}{cc}\hfill {\stackrel{~}{f}}_{j,s,k}& \left({\stackrel{~}{\theta}}_{j,C,t},{\stackrel{~}{\theta}}_{j,N,t},{I}_{k,t},{\theta}_{C,P},{\theta}_{N,P},{\eta}_{k,t}\right)\hfill \\ \hfill & ={\stackrel{~}{f}}_{j,s,k}\phantom{\rule{thinmathspace}{0ex}}({g}_{C,j}\phantom{\rule{thinmathspace}{0ex}}\left({\theta}_{C,t}\right),{g}_{N,j}\phantom{\rule{thinmathspace}{0ex}}\left({\theta}_{N,t}\right),{I}_{k,t},{\theta}_{C,P},{\theta}_{N,P},{\eta}_{k,t})\hfill \\ \hfill & ={g}_{k,j}\phantom{\rule{thinmathspace}{0ex}}\left({f}_{s,k}\phantom{\rule{thinmathspace}{0ex}}({g}_{C,j}^{-1}\phantom{\rule{thinmathspace}{0ex}}\left({g}_{C,j}\phantom{\rule{thinmathspace}{0ex}}\left({\theta}_{C,t}\right)\right),{g}_{N,j}^{-1}\left({g}_{N,j}\left({\theta}_{N,t}\right)\right),{I}_{k,t},{\theta}_{C,P},{\theta}_{N,P},{\eta}_{k,t})\right)\hfill \\ \hfill & ={g}_{k,j}\phantom{\rule{thinmathspace}{0ex}}\left({f}_{s,k}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{C,t},{\theta}_{N,t},{I}_{k,t},{\theta}_{C,P},{\theta}_{N,P},{\eta}_{k,t})\right)\hfill \\ \hfill & ={g}_{k,j}\phantom{\rule{thinmathspace}{0ex}}\left({\theta}_{k,t+1}\right)={\stackrel{~}{\theta}}_{j,k,t+1},\hfill \end{array}$$

as desired. Hence, ${\stackrel{~}{f}}_{j,s,k}$ is the equation of motion for the anchored skills ${\stackrel{~}{\theta}}_{j,k,t+1}$ that is consistent with the equation of motion *f _{s,k}* for the original skills

Thus far, we have maintained the assumption that the error term *η _{k,t}* in the technology (2.1) is independent of all the other inputs (

To see how this can be done, suppose that we observe at least three adult outcomes, so that *J* ≥ 3. We can then write outcomes as functions of *T* + 1 skills as well as an unobserved heterogeneity component, *π*:

$${Z}_{4,j}={\alpha}_{4,C,j}{\theta}_{C,T+1}+{\alpha}_{4,N,j}{\theta}_{N,T+1}+{\alpha}_{4,\pi ,j}\pi +{\epsilon}_{4,j},\phantom{\rule{thickmathspace}{0ex}}\text{for}\phantom{\rule{thickmathspace}{0ex}}j\in \{1,2,\dots ,J\}.$$

We can use the analysis of section 3.2, suitably extended to allow for measurements *Z*_{4,j}, to secure identification of the factor loadings *α*_{4,C,j}, *α*_{4,N,j}, and *α*_{4,π,j}. We can apply the argument of section 3.4 to secure identification of the joint distribution of (*θ _{t}*,

$${\theta}_{k,t+1}={f}_{s,k}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{t},{I}_{k,t},{\theta}_{P},\pi ,{\nu}_{k,t}).$$

*π* is permitted to be correlated with the inputs (*θ _{t}*,

Economic theory (see, e.g., Cunha and Heckman, 2007) predicts that parental investments in period *t, I _{t}*, should depend on parental skills, (

$${I}_{k,t}={g}_{k,t}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{C,t},{\theta}_{N,t},\pi ,{\theta}_{C,P},{\theta}_{N,P},{y}_{t})+{\zeta}_{k,t},\phantom{\rule{thinmathspace}{0ex}}k\in \{C,N\},t\in \{1,\dots ,T\}$$

(3.10)

*ζ _{k,t}*

$${Z}_{2,k,t,j}={\mu}_{2,k,t,j}+{\alpha}_{2,k,t,j}{g}_{k,t}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{C,t},{\theta}_{N,t},\pi ,{\theta}_{C,P},{\theta}_{N,P},{y}_{t})+{\alpha}_{2,k,t,j}{\zeta}_{k,t}+{\epsilon}_{2,k,t,j}$$

(3.11)

for *j* ∈ {1, . . . , *M*_{2,k,t}}, *t* ∈ {1, . . . , *T*}, and *k* ∈ {*C, N*}. From measurements on child skills, parental skills, child adult outcomes, and family income, we can obtain the joint distribution of (*θ _{C,t}*,

Our analysis of identification of production functions with missing inputs is more general than that of Olley and Pakes (1996), who also consider use of proxies to measure unobserved inputs. They assume that the researcher has access to perfect proxies to measure unobserved inputs, whereas we allow for imperfectly measured proxies, i.e., measurement error.

Technology (2.1) and the associated measurement systems are nonparametrically identified. However, we use parametric maximum likelihood to estimate the model and do not estimate under the most general conditions. We do this for two reasons. First, a fully nonparametric approach is too data hungry to apply to samples of the size that we have at our disposal, because the convergence rates of nonparametric estimators are quite slow. Second, solving a high-dimensional dynamic factor model is a computationally demanding task that can only be made manageable by invoking parametric assumptions. Nonetheless, the analysis of this paper shows that in principle the parametric structure used to secure the estimates reported below is not strictly required to identify the technology.

We now develop the likelihood function for our model. Let *p* (*θ*) denote the density of *θ*. Although we do not directly observe *θ*, we observe measurements on it, *Z*, with realization *z*. Let *z*_{1,k,t,j,h} denote measurement *j* associated with the skill factor *θ _{k,t}* for person

$$\begin{array}{cc}\hfill p\left(z\right)=& \prod _{h=1}^{H}\int \dots \int p\left(\theta \right)\hfill \\ \hfill & \times \prod _{k\in \{C,N\}}\prod _{t=1}^{T}\prod _{j=1}^{{M}_{1,k,t}}{p}_{{\epsilon}_{1,k,t,j,h}}({z}_{1,k,t,j,h}-{\mu}_{1,k,t,j}-{\alpha}_{1,k,t,j}{\theta}_{k,t})\phantom{\rule{thinmathspace}{0ex}}d{\theta}_{k,t}\hfill \\ \hfill & \times \prod _{k\in \{C,N\}}\prod _{t=1}^{T}\prod _{j=1}^{{M}_{2,k,t}}{p}_{{\epsilon}_{2,k,t,j,h}}({z}_{2,k,t,j,h}-{\mu}_{2,k,t,j}-{\alpha}_{2,k,t,j}{I}_{k,t})\phantom{\rule{thinmathspace}{0ex}}d{I}_{k,t}\hfill \\ \hfill & \times \prod _{k\in \{C,N\}}\prod _{j=1}^{{M}_{3,k,1}}{p}_{{\epsilon}_{3,k,1,j,h}}({z}_{3,k,t,j,h}-{\mu}_{3,k,t,j}-{\alpha}_{3,k,t,j}{\theta}_{k,P})\phantom{\rule{thinmathspace}{0ex}}d{\theta}_{k,P}\hfill \\ \hfill & \times \prod _{j=1}^{{M}_{4,T+1}}{p}_{4,j,h}\phantom{\rule{thinmathspace}{0ex}}\left({z}_{4,T+1,j,h}\right)\phantom{\rule{thinmathspace}{0ex}}d{\theta}_{C,T+1}d{\theta}_{N,T+1}{,}^{22}\hfill \end{array}$$

(4.1)

where

$$\begin{array}{cc}\hfill & {p}_{4,j,h}\phantom{\rule{thinmathspace}{0ex}}\left({z}_{4,T+1,j,h}\right)={p}_{{\epsilon}_{4,T+1,j,h}}({z}_{4,T+1,j,h}-{\mu}_{4,T+1,j}-{\alpha}_{4,C,T+1}{\theta}_{C,T+1}-{\alpha}_{4,N,T+1}{\theta}_{N,T+1})\hfill \\ \hfill & \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{thickmathspace}{0ex}}\text{for}\phantom{\rule{thickmathspace}{0ex}}j=1,\dots ,{J}_{1}.\hfill \end{array}$$

and

$$\begin{array}{cc}\hfill {p}_{4,j,h}\phantom{\rule{thinmathspace}{0ex}}\left({z}_{4,T+1,j,h}\right)=& {F}_{{\epsilon}_{4,j}}{({\mu}_{4,T+1,j}+{\alpha}_{4,C,T+1}{\theta}_{C,T+1}+{\alpha}_{4,N,T+1}{\theta}_{N,T+1})}^{{z}_{4,T+1,j,h}}\hfill \\ \hfill & \times {[1-{F}_{{\epsilon}_{4,j}}({\mu}_{4,T+1,j}+{\alpha}_{4,C,T+1}{\theta}_{C,T+1}+{\alpha}_{4,N,T+1}{\theta}_{N,T+1})]}^{1-{z}_{4,T+1,j,h}}.\hfill \\ \hfill & \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{thickmathspace}{0ex}}j={J}_{1}+1,\dots ,{M}_{4,T+1}\hfill \end{array}$$

The likelihood is maximized subject to parametric versions of technology constraints (2.1) and the normalizations on the measurements discussed in section 3.1. We assume that the measurement error *ε _{l,k,t,j,h}* is classical, and independent of

In principle, one can estimate the parameters of the model, the parameters of the technology, and the *p* (*θ*) by maximizing (4.1) directly. In order to do that, one can approximate *p* (*z*) by computing the integrals numerically in a deterministic fashion. However, if the number of integrals is very large, a serious practical problem arises. The number of points required to evaluate the integrals is very large. For example, if there are three latent variables and four time periods, so that *T* = 4, then dim (*θ*) = 12 and one has to compute an integral of dimension twelve to obtain the function *p* (*z*). This requires computing approximately seventeen million points of evaluation for each individual *h* if we pick four points of evaluation for each integral. The rate of convergence of the numerical approximation decreases with dim (*θ*). In order to obtain good approximations of *p* (*z*) even in the case with three factors and four time periods, we would need more than 4 points of evaluation for each integral.
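The count of evaluation points quoted above follows from the size of a tensor-product grid. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the calculation assumes the same number of nodes in every dimension.

```python
def grid_points(points_per_dim: int, dim: int) -> int:
    """Number of nodes in a tensor-product grid with equal nodes per dimension."""
    return points_per_dim ** dim


latent_variables = 3
periods = 4
dim_theta = latent_variables * periods  # 12-dimensional integral

# Four nodes per dimension gives 4**12 nodes per individual.
nodes_four = grid_points(4, dim_theta)
# Raising the nodes per dimension makes the count grow very quickly.
nodes_eight = grid_points(8, dim_theta)
```

Doubling the nodes per dimension from four to eight multiplies the work by 2^12, which is why deterministic quadrature becomes impractical here.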

We avoid this problem by relying on nonlinear filtering methods. They facilitate the approximation of the likelihood by recursive methods, greatly reducing the computational burden. Further details on how we implement nonlinear filtering are presented in Web Appendix, Section 3.

We estimate the technology on a sample of 2207 firstborn white children from the Children of the NLSY/79 (CNLSY/79) sample. Starting in 1986, the children of the NLSY/1979 female respondents, ages 0−14, have been assessed every two years. The assessments measure cognitive ability, temperament, motor and social development, behavior problems, and self-competence of the children as well as their home environments. Data are collected via direct assessment and maternal report during home visits at every biennial wave. Section 4 of the Web Appendix discusses the measurements used to proxy investment and output. Web Appendix Tables 4-1 through 4-3 present summary statistics of the sample we use.^{24}

To match the biennial data collection plan, in our empirical analysis, a period is equivalent to two years. We have eight periods distributed over two stages of development.^{25} We report estimates of a variety of specifications.

Dynamic factor models allow us to exploit the wealth of measures on investment and outcomes available in the CNLSY data. They solve several problems in estimating skill formation technologies. First, there are many proxies for parental investments in children's cognitive and noncognitive development. Using the dynamic factor model, we let the data pick the best combinations of family input measures that predict the levels and growth in test scores. Measured inputs that are not very informative on family investment decisions will have negligible estimated factor loadings. Second, our models help us solve the problem of missing data. Assuming that the data are missing at random, we integrate out the missing items from the sample likelihood.

In practice, we cannot empirically distinguish investments in cognitive skills from investments in noncognitive skills. Accordingly, we assume investment in period *t* is the same for both skills although it may have different effects on those skills. Thus we assume *I _{C,t}* =

We use separable measurement system (3.1). We estimate versions of the technology (2.3)-(2.4) augmented to include shocks:

$${\theta}_{k,t+1}={\left[{\gamma}_{s,k,1}{\theta}_{C,t}^{{\varphi}_{s,k}}+{\gamma}_{s,k,2}{\theta}_{N,t}^{{\varphi}_{s,k}}+{\gamma}_{s,k,3}{I}_{t}^{{\varphi}_{s,k}}+{\gamma}_{s,k,4}{\theta}_{C,P}^{{\varphi}_{s,k}}+{\gamma}_{s,k,5}{\theta}_{N,P}^{{\varphi}_{s,k}}\right]}^{\frac{1}{{\varphi}_{s,k}}}{e}^{{\eta}_{k,t+1}},$$

(5.1)

where *γ _{s,k,l}* ≥ 0 and ${\sum}_{l=1}^{5}{\gamma}_{s,k,l}=1,\phantom{\rule{thickmathspace}{0ex}}k\in \{C,N\},\phantom{\rule{thinmathspace}{0ex}}t\in \{1,2\},s\in \{1,2\}$. We assume that the innovations are normally distributed: ${\eta}_{k,t}\sim N(0,{\delta}_{\eta ,s}^{2})$. We further assume that the

$$\begin{array}{cc}\hfill {Z}_{1,k,t,j}& ={\mu}_{1,k,t,j}+{\alpha}_{1,k,t,j}\phantom{\rule{thinmathspace}{0ex}}\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}{\theta}_{k,t}+{\epsilon}_{1,k,t,j}\hfill \\ \hfill & j\in \{1,\dots ,{M}_{1,k,t}\},t\in \{1,\dots ,T\},k\in \{C,N\}.\hfill \end{array}$$

We use the factors (and not their logarithms) as arguments of the technology.^{26} This keeps the latent factors non-negative, as is required for the definition of technology (5.1). Collect the *ε* terms for period *t* into a vector *ε _{t}*. We assume that

$$\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}{\theta}_{t}^{r}=(\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}{\theta}_{C,t},\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}{\theta}_{N,t},\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}{I}_{t},\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}{\theta}_{C,P},\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}{\theta}_{N,P},\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}\pi ).$$

Identification of this model follows as a consequence of Theorems 1 and 2 and results in Matzkin (2003, 2007). We estimate the model under different assumptions about the distribution of the factors. Under the first specification, ln *θ _{t}* is normally distributed with mean zero and variance-covariance matrix Σ

$$p\phantom{\rule{thinmathspace}{0ex}}\left(\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}{\theta}_{t}^{r}\right)=\sum _{\tau =1}^{\mathcal{T}}{\omega}_{\tau}\varphi \phantom{\rule{thinmathspace}{0ex}}(\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}{\theta}_{t}^{r};{\mu}_{t,\tau},{\Sigma}_{t,\tau})$$

subject to: ${\sum}_{\tau =1}^{\mathcal{T}}{\omega}_{\tau}=1$ and ${\sum}_{\tau =1}^{\mathcal{T}}{\omega}_{\tau}{\mu}_{t,\tau}=0$.
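A scalar analogue of this mixture, with the two constraints enforced, might look like the sketch below. It is an illustration only: the paper's mixture is multivariate, and the weights, means, and standard deviations used here are hypothetical.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x: float, weights: Sequence[float], means: Sequence[float],
                sds: Sequence[float], tol: float = 1e-9) -> float:
    """Mixture-of-normals density with weights summing to one and overall mean zero."""
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("mixture weights must sum to one")
    if abs(sum(w * m for w, m in zip(weights, means))) > tol:
        raise ValueError("weighted component means must sum to zero")
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


# Hypothetical two-component mixture: 0.6 * (-0.2) + 0.4 * 0.3 = 0.
weights = [0.6, 0.4]
means = [-0.2, 0.3]
sds = [1.0, 0.5]
density_at_zero = mixture_pdf(0.0, weights, means, sds)
```

The zero-mean constraint is a location normalization: it pins down the overall mean of the factors while still allowing skewed or multimodal shapes.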

We report anchored results in the text. We use the anchoring procedures described in detail in Section 6 of the Web Appendix. The anchored results allow us to compare the productivity of investments and stocks of different skills at different stages of the life cycle on the anchored outcome. In this paper, we mainly use completed years of education by age 19, a continuous variable, as an anchor. We explore the sensitivity of the estimates to alternative anchors for a one stage model in Web Appendix 7.

This section presents results from an extensive empirical analysis estimating the multistage technology of skill formation accounting for measurement error, non-normality of the factors, endogeneity of inputs and family investment decisions. The plan of development of this section is as follows. We first present baseline two stage models that anchor outcomes in terms of their effects on schooling attainment, that correct for measurement errors, and that assume that the factors are normally distributed. These models do not account for endogeneity of inputs through unobserved heterogeneity components or family investment decisions. The baseline model is already far more general than what is presented in previous research on the formation of child skills that uses unanchored test scores as outcome measures and does not account for measurement error (see, e.g., Fryer and Levitt, 2004).

We present evidence on the first order empirical importance of measurement error. When we do not correct for it, the estimated technology suggests that there is no effect of early investment on child outcomes. Controlling for endogeneity of family inputs by accounting for unobserved heterogeneity (*π*), and accounting explicitly for family investment decisions has substantial effects on estimated parameters.

The following empirical regularities emerge across all models that account for measurement error. Self productivity of skills is greater in the second stage than in the first stage. Noncognitive skills are cross productive for cognitive skills in the first stage of production. The cross productivity effect is weaker and less precisely determined in the second stage. There is no evidence for a cross productivity effect of cognitive skills on noncognitive skills at either stage. The estimated elasticity of substitution for inputs in cognitive skill is substantially lower in the second stage of a child's life cycle than in the first stage. For non-cognitive skills, the ordering is reversed for models that control for unobserved heterogeneity (*π*). These estimates suggest that it is easier to redress endowment deficits that determine cognition in the first stage of a child's lifecycle than in the second stage. For socioemotional (noncognitive) skills, the opposite is true. For cognitive skills, the productivity parameter associated with parental investment (*γ*_{1,C,3}) is greater in the first stage than in the second stage (*γ*_{2,C,3}). For noncognitive skills, the pattern of estimates for the productivity parameter across models is less clear cut, but there are not dramatic differences across the stages. For both outputs, the parameter associated with the effect of parental noncognitive skills on output is smaller at the second stage than the first stage.

Web Appendix 7 discusses the sensitivity of estimates of a one-stage two-skill model to alternative anchors and to allowing for nonnormality of the factors. For these and other estimated models which are not reported, allowing for nonnormality has only minor effects on the estimates. Anchoring affects the estimates.^{27} Below, we report anchored estimates. To facilitate computation, we use years of schooling attained as the anchor in all of the models reported in this section of the paper.^{28}

Table 1 presents evidence on our baseline two stage model of skill formation. Outcomes are anchored in years of schooling attained. Factors are assumed to be normally distributed and we ignore heterogeneity (*π*). The estimates show that for both skills, self productivity increases in the second stage. Noncognitive skills foster cognitive skills in the first stage but not in the second stage. Cognitive skills have no cross-productivity effect on noncognitive skills at either stage.^{29} The productivity parameter for investment is greater in the first period than the second period for either skill. The difference in the parameter is dramatic for cognitive skills. The variability in the shocks is greater in the second period than in the first period. The elasticity of substitution for cognitive skills is much greater in the first period than in the second period. The opposite is found for noncognitive skills.

Table 1. Estimates of the Technology Using the Factor Model to Correct for Measurement Error; Linear Anchoring on Educational Attainment (Years of Schooling); No Unobserved Heterogeneity (π), Factors Normally Distributed

For cognitive skill production, the parental cognitive skill parameter increases in the second stage. The opposite is true for parental noncognitive skills. In producing noncognitive skills, parental cognitive skills play no role at either stage. Parental noncognitive skills play a strong role in stage 1 and a weaker role in stage 2.

Using our factor model, we can investigate the extent of measurement error on each measure of skill and investment in our data. To fix ideas, keep the conditioning on the regressors implicit and, without loss of generality, consider the measurements on cognitive skills in period *t*. For linear measurement equations

$$Var\phantom{\rule{thinmathspace}{0ex}}\left({Z}_{1,C,t,j}\right)={\alpha}_{1,C,t,j}^{2}Var\phantom{\rule{thinmathspace}{0ex}}\left(\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}{\theta}_{C,t}\right)+Var\phantom{\rule{thinmathspace}{0ex}}\left({\epsilon}_{1,C,t,j}\right).$$

The fractions of the variance of *Z*_{1,C,t,j} due to measurement error, ${s}_{1,C,t,j}^{\epsilon}$, and true signal, ${s}_{1,C,t,j}^{\theta}$ are, respectively,

$${s}_{1,C,t,j}^{\epsilon}=\frac{Var\phantom{\rule{thinmathspace}{0ex}}\left({\epsilon}_{1,C,t,j}\right)}{{\alpha}_{1,C,t,j}^{2}Var\phantom{\rule{thinmathspace}{0ex}}\left(\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}{\theta}_{C,t}\right)+Var\phantom{\rule{thinmathspace}{0ex}}\left({\epsilon}_{1,C,t,j}\right)}\phantom{\rule{thickmathspace}{0ex}}\left(\text{noise}\right)$$

and

$${s}_{1,C,t,j}^{\theta}=\frac{{\alpha}_{1,C,t,j}^{2}Var\phantom{\rule{thinmathspace}{0ex}}\left(\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}{\theta}_{C,t}\right)}{{\alpha}_{1,C,t,j}^{2}Var\phantom{\rule{thinmathspace}{0ex}}\left(\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}{\theta}_{C,t}\right)+Var\phantom{\rule{thinmathspace}{0ex}}\left({\epsilon}_{1,C,t,j}\right)}\phantom{\rule{thickmathspace}{0ex}}\left(\text{signal}\right).$$

For each measure of skill and investment used in the estimation, we construct ${s}_{1,C,t,j}^{\epsilon}$ and ${s}_{1,C,t,j}^{\theta}$ which are reported in Table 2A. Note that early proxies tend to have a higher fraction of observed variance due to measurement error. For example, the measure that contains the lowest true signal ratio is the MSD (Motor and Social Development Score) at year of birth, in which less than 5% of the observed variance is signal. The proxy with the highest signal ratio is the PIAT Reading Recognition Scores at ages 5−6, for which almost 96% of the observed variance is due to the variance of the true signal. Overall, about 54% of the observed variance is associated with the cognitive skill factors *θ _{C,t}*.
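The signal and noise shares defined above can be computed directly from a factor loading and the two variances. The sketch below does so for made-up inputs; the function name and the numerical values are hypothetical and are not taken from Table 2A.

```python
from typing import Tuple


def variance_shares(alpha: float, var_log_theta: float,
                    var_error: float) -> Tuple[float, float]:
    """Return (signal share, noise share) of the variance of one measurement."""
    if var_log_theta < 0.0 or var_error < 0.0:
        raise ValueError("variances must be nonnegative")
    signal_variance = alpha ** 2 * var_log_theta
    total_variance = signal_variance + var_error
    if total_variance == 0.0:
        raise ValueError("measurement has zero variance")
    return signal_variance / total_variance, var_error / total_variance


# Hypothetical loading and variances for a single measurement.
signal_share, noise_share = variance_shares(alpha=0.8, var_log_theta=1.0,
                                            var_error=0.36)
```

The two shares sum to one by construction, so a measure with a small loading relative to its error variance is mostly noise.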

Table 2A also shows the same ratios for measures of child noncognitive skills. The measures of noncognitive skills tend to be lower in informational content than their cognitive counterparts. Overall, less than 40% of the observed variance is due to the variance associated with the factors for noncognitive skills. The poorest measure for noncognitive skills is the “Sociability” measure at ages 1−2, in which less than 1% of the observed variance is signal. The richest is the “BPI Headstrong” score, in which almost 62% of the observed variance is due to the variance of the signal.

Table 2A also presents the signal-noise ratio of measures of parental cognitive and noncognitive skills. Overall, measures of maternal cognitive skills tend to have higher information content than measures of noncognitive skills. While the poorest measurement on cognitive skills has a signal ratio of almost 35%, the richest measurements on noncognitive skills are slightly above 40%.

Analogous estimates of signal and noise for our investment measures are reported in Table 2B. Investment measures are much noisier than either measure of skill. The measures for investments at earlier stages tend to be noisier than the measures at later stages. It is interesting to note that the measure “Number of Books” has a high signal-noise ratio at early years, but not in later years. At earlier years, the “How Often Mom Reads to the Child” has about the same informational content as “Number of Books.” In later years, measures such as “Trips to the Museum” and “Attendance of Musical Performances” have higher signal-noise ratios.

These estimates suggest that it is likely to be empirically important to control for measurement error in estimating technologies of skill formation. A general pattern is that at early ages measures of skill tend to be riddled with measurement error. The general pattern is reversed for measurement error in investments.

We now demonstrate the impact of neglecting measurement error on estimates of the technology. To make the most convincing case for the importance of measurement error, we use the least error prone proxies as determined in our estimates of Table 2.^{30}

Not accounting for measurement error has substantial effects on the estimated technology. Comparing the estimates in Table 3 with those in Table 1, the first stage investment effects are much less precisely estimated in a model that ignores measurement errors than in a model that corrects for them. In the second stage, the estimated investment effects are generally stronger. Unlike in all of the specifications that control for measurement error, we estimate strong cross productivity effects of cognitive skills on noncognitive skill production. As in Table 1, there are cross productivity effects of noncognitive skills on cognitive skills at both stages although the estimated productivity parameters are somewhat smaller. The estimated elasticities of substitution for cognitive skills at both stages are comparable across the two specifications. The elasticities of substitution for noncognitive skills are substantially lower at both stages in the specification that does not control for measurement error. The error variances of the shocks are substantially larger. Parental cognitive skills are estimated to have substantial effects on child cognitive skills but not their noncognitive skills. This contrasts with the estimates reported in Table 1 that show strong effects of parental noncognitive skills on child cognitive skills in both stages, and on noncognitive skills in the first stage.

We next consider the effect of controlling for unobserved heterogeneity for the specification with estimates reported in Table 1. Doing so allows for endogeneity of the inputs. We break the error term for the technology into two parts: a time-invariant unobserved heterogeneity factor *π* that is correlated with the vector (*θ _{t}*,

Table 4 shows that, after correcting for heterogeneity, the estimated coefficients for parental investments have a higher impact on cognitive skills at the first stage. The coefficient on parental investment in the first stage is ${\gamma}_{1,C,3}\cong 0.17$, while in the second stage ${\gamma}_{2,C,3}\cong 0.06$. The elasticity of substitution in the first stage is well above one, ${\sigma}_{1,C}={\scriptstyle \frac{1}{1-0.33}}\cong 1.5$, and in the second stage it is well below one, ${\sigma}_{2,C}\cong {\scriptstyle \frac{1}{1+0.8}}\cong 0.55$. These results suggest that early investments are important in producing cognitive skills. Consistent with the estimates reported in Table 1, noncognitive skills increase cognitive skills in the first stage, but not in the second stage. Parental cognitive and noncognitive skills affect the accumulation of child cognitive skills.
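The mapping from the CES curvature parameter to the elasticity of substitution, σ = 1/(1 − φ), can be checked with a short sketch. The values φ = 0.33 and φ = −0.8 are read off the fractions quoted in the text; the `ces_output` helper, its input values, and its equal share weights are hypothetical, and the multiplicative shock in (5.1) is omitted.

```python
from typing import Sequence


def elasticity_of_substitution(phi: float) -> float:
    """Elasticity of substitution implied by the CES parameter phi (phi < 1)."""
    if phi >= 1.0:
        raise ValueError("phi must be strictly less than one")
    return 1.0 / (1.0 - phi)


def ces_output(inputs: Sequence[float], shares: Sequence[float], phi: float) -> float:
    """Deterministic part of a CES aggregator of the form used in (5.1)."""
    if phi == 0.0:
        raise ValueError("phi = 0 is the Cobb-Douglas limit and is not handled here")
    if abs(sum(shares) - 1.0) > 1e-9 or any(g < 0.0 for g in shares):
        raise ValueError("shares must be nonnegative and sum to one")
    if any(x <= 0.0 for x in inputs):
        raise ValueError("inputs must be strictly positive")
    inner = sum(g * x ** phi for g, x in zip(shares, inputs))
    return inner ** (1.0 / phi)


sigma_stage1 = elasticity_of_substitution(0.33)   # first stage, cognitive skills
sigma_stage2 = elasticity_of_substitution(-0.8)   # second stage, cognitive skills

# With unequal inputs, lower substitutability penalizes the imbalance more.
unbalanced = [1.0, 4.0]
output_high_sub = ces_output(unbalanced, [0.5, 0.5], 0.33)
output_low_sub = ces_output(unbalanced, [0.5, 0.5], -0.8)
```

For the same unbalanced inputs, output is lower under φ = −0.8 than under φ = 0.33, which is the sense in which a low input is harder to offset when the elasticity of substitution is small.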

Table 4. Estimated Technology Allowing for Heterogeneity; Linear Anchoring on Educational Attainment (Years of Schooling); Allowing for Unobserved Heterogeneity (π), Factors Normally Distributed

Panel B of Table 4 presents estimates of the technology of noncognitive skills. Note that, contrary to the estimates reported for the technology for cognitive skills, the elasticity of substitution increases from the first stage to the second stage. At the early stage, ${\sigma}_{1,N}\cong 0.54$ while at the late stage, ${\sigma}_{2,N}\cong 0.77$. The impact of parental investments is slightly larger at late stages as well $({\gamma}_{1,N,3}\cong 0.05\phantom{\rule{thickmathspace}{0ex}}\mathrm{vs}.\phantom{\rule{thickmathspace}{0ex}}{\gamma}_{2,N,3}\cong 0.07)$. While parental noncognitive skills affect the accumulation of a child's noncognitive skills early and late, parental cognitive skills only affect the accumulation of a child's noncognitive skills at early stages. The estimates in Table 1 show no effect of parental cognitive skills on either stage of the production of noncognitive skills.

Table 5 reports estimates of our model when we adjoin investment parameters of the equations (3.10) to the model just discussed and identify *g _{t}* along with all of the other parameters estimated in the model reported in Table 4.

Estimates of the Technology for Cognitive and Noncognitive Skill Formation Estimated Along with Investment Equation with Linear Anchoring on Educational Attainment (Years of Schooling) Allowing for Unobserved Heterogeneity (π), Factors Normally **...**

Comparable changes occur in our estimates of the technology for producing noncognitive skills. The impact of early investments is reduced from ${\gamma}_{1,N,3}\cong 0.05$ (see Table 4, Panel B) to ${\gamma}_{1,N,3}\cong 0.02$ (in Table 5, Panel B). The elasticity of substitution in noncognitive skills in the early stage barely moves, changing from ${\sigma}_{1,N}={\scriptstyle \frac{1}{1-{\varphi}_{1,N}}}\cong 0.54$ to ${\sigma}_{1,N}={\scriptstyle \frac{1}{1-{\varphi}_{1,N}}}\cong 0.55$ (in Table 5, Panel B). The impact of late investments in producing noncognitive skills is estimated to be somewhat smaller, falling from ${\gamma}_{2,N,3}\cong 0.07$ to ${\gamma}_{2,N,3}\cong 0.05$. Compare Table 4, Panel B with Table 5, Panel B. When we include an equation for investments, the estimated elasticity of substitution increases for noncognitive skills in late stages, from ${\sigma}_{2,N}={\scriptstyle \frac{1}{1-{\varphi}_{2,N}}}\cong 0.55$ (in Table 4, Panel B) to ${\sigma}_{2,N}={\scriptstyle \frac{1}{1-{\varphi}_{2,N}}}\cong 0.68$ (in Table 5, Panel B).

Most of the empirical literature on skill production focuses on cognitive skills as the output of family investment. (See, e.g., Todd and Wolpin, 2005, 2007, and the references they cite.) It is of interest to estimate a more traditional model that ignores noncognitive skills. Table 6 reports estimates of a version of the model in Table 5 where noncognitive skills are excluded.

Technology of Cognitive Skill Formation Model with Cognitive Skills Only Estimated Along with Investment Equation with Linear Anchoring on Educational Attainment (Years of Schooling) Allowing for Unobserved Heterogeneity (π), Factors Normally **...**

The estimated self-productivity effect increases from the first stage to the second stage, in accord with the estimates found for all other specifications. However, the estimated first period elasticity of substitution is much smaller than the corresponding parameter in Table 5. The estimated second period elasticity is slightly higher. The estimated productivity parameters for investment are substantially higher in both stages of the model reported in Table 6, as are the productivity parameters for parental cognitive skills. The simulations discussed in the next subsection suggest dramatically different policies towards disadvantaged families from a model that ignores noncognitive skills compared to a model that does not.

The major findings from our analysis of models with two skills that control for measurement error and endogeneity of inputs are: (a) Self-productivity becomes stronger as children become older, for both cognitive and noncognitive skill formation. (b) Complementarity between cognitive skills and investment becomes stronger as children become older. The elasticity of substitution for cognition is *smaller* in second stage production. It is more difficult to compensate for the effects of adverse environments on cognitive endowments at later ages than it is at earlier ages.^{33} This pattern of the estimates helps to explain the evidence on ineffective cognitive remediation strategies for disadvantaged adolescents reported in Cunha, Heckman, Lochner, and Masterov (2006). (c) Complementarity between noncognitive skills and investments becomes weaker as children become older. The elasticity of substitution between investment and skills increases between the first stage and the second stage in the production of noncognitive skills. It is easier at *later* stages of childhood to remediate early disadvantage using investments in noncognitive skills.

We find that 34% of the variation in educational attainment in the sample is explained by the measures of cognitive and noncognitive capabilities that we use. Sixteen percent is due to adolescent cognitive capabilities. Twelve percent is due to adolescent noncognitive capabilities.^{34} Measured parental investments account for 15% of the variation in educational attainment. These estimates suggest that the measures of cognitive and noncognitive capabilities that we use are powerful, but not exclusive, determinants of educational attainment and that other factors, besides the measures of family investment that we use, are at work in explaining variation in educational attainment.

To examine the implications of these estimates, we analyze two social planning problems that can be solved solely from knowledge of the technology of skill formation and without knowledge of parental preferences and parental access to lending markets. The first problem determines the cost of investment required to produce high school attainment for children with different initial endowments of their own and parental capabilities. For the same distribution of endowments, the second problem determines optimal allocations of investments from a fixed budget to maximize aggregate schooling for a cohort of children and to minimize aggregate crime. Our analysis assumes that the state has full control over family investment decisions. For neither problem do we model parental investment responses to the policy or parental investment. These simulations produce a measure of the investment that is needed from whatever source to achieve the specified target.

Suppose that there are *H* children indexed by *h* ∈ {1, . . . , *H*}. Let (*θ*_{C,1,h}, *θ*_{N,1,h}) denote the initial cognitive and noncognitive skills of child *h*. She has parents with cognitive and noncognitive skills denoted by *θ*_{C,P,h} and

The criterion adopted for the first problem assumes that the goal of society is to get the schooling of every child to a twelfth grade level. The required investments measure the power of initial endowments in determining inequality and the compensation through investment that is required to eliminate their influence. Let *e*(*θ*_{1,h}) be the minimum cost of attaining 12 years of schooling for a child with endowment *θ*_{1,h}. Assuming no discounting, the problem is formally defined by

$$e(\theta_{1,h}) = \min\,[I_{1,h} + I_{2,h}]$$

subject to a schooling constraint:

$$S(\theta_{C,3,h}, \theta_{N,3,h}, \pi_h) = 12,$$

where *S* maps end of childhood capabilities and other relevant factors (*π _{h}*) into schooling attainment, subject to the technology of capability formation constraint

$$\theta_{k,t+1,h} = f_{k,t}(\theta_{C,t,h}, \theta_{N,t,h}, \theta_{C,P,h}, \theta_{N,P,h}, I_{t,h}, \pi_h) \quad \text{for } k \in \{C,N\} \text{ and } t \in \{1,2\},$$

and the initial endowments of the child and her parents. We have estimated all of the ingredient functions.^{35}
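
As a rough numerical illustration of this cost-minimization problem, the following Python sketch computes the minimum cost by a grid search over early investment and a bisection over late investment. Everything specific in it is an assumption made for illustration: the CES form of `ces`, the share and curvature values in `PARAMS`, the log-linear schooling anchor inside `final_schooling`, the omission of *π*_{h}, and the grid bounds. None of these are the estimates reported in this paper.

```python
import math

# Hypothetical parameters, NOT the paper's estimates. For each stage t and
# skill k: share weights on (theta_C, theta_N, I, parent_C, parent_N) and a
# curvature phi, so that the elasticity of substitution is 1 / (1 - phi).
PARAMS = {
    1: {"C": ((0.50, 0.05, 0.25, 0.10, 0.10), 0.3),
        "N": ((0.05, 0.60, 0.15, 0.10, 0.10), -0.5)},
    2: {"C": ((0.80, 0.02, 0.08, 0.05, 0.05), -1.0),
        "N": ((0.02, 0.80, 0.08, 0.05, 0.05), 0.5)},
}


def ces(inputs, shares, phi):
    """Assumed CES aggregator; all inputs must be strictly positive."""
    return sum(g * x ** phi for g, x in zip(shares, inputs)) ** (1.0 / phi)


def next_skills(theta_c, theta_n, parent_c, parent_n, invest, t):
    """One step of the (assumed) technology f_{k,t} for k in {C, N}."""
    inputs = (theta_c, theta_n, invest, parent_c, parent_n)
    return tuple(ces(inputs, *PARAMS[t][k]) for k in ("C", "N"))


def final_schooling(endow, i1, i2):
    """Schooling implied by endowments and the two investments."""
    c1, n1, pc, pn = endow
    c2, n2 = next_skills(c1, n1, pc, pn, i1, 1)
    c3, n3 = next_skills(c2, n2, pc, pn, i2, 2)
    # Hypothetical anchoring function S: 12 years at mean skills (all ones).
    return 12.0 + 2.0 * math.log(c3) + 1.5 * math.log(n3)


def min_cost(endow, target=12.0, grid=400, i_max=20.0):
    """Return (cost, I1, I2) minimizing I1 + I2 subject to S >= target."""
    best = None
    for j in range(1, grid + 1):
        i1 = i_max * j / grid
        if final_schooling(endow, i1, i_max) < target:
            continue  # target not reachable for this early investment
        lo, hi = 1e-6, i_max
        for _ in range(60):  # bisection: S is increasing in I2
            mid = 0.5 * (lo + hi)
            if final_schooling(endow, i1, mid) >= target:
                hi = mid
            else:
                lo = mid
        if best is None or i1 + hi < best[0]:
            best = (i1 + hi, i1, hi)
    return best


# Endowments are (child C, child N, parent C, parent N); 1.0 is the mean.
mean_cost = min_cost((1.0, 1.0, 1.0, 1.0))[0]
poor_cost = min_cost((0.7, 0.7, 1.0, 1.0))[0]
print(f"extra investment needed: {100 * (poor_cost / mean_cost - 1):.1f}%")
```

Because schooling is increasing in both investments in this toy setup, the equality constraint binds at the minimum, so searching over `S >= target` returns the same answer. The printed percentage is the analogue of the quantity plotted in the figures, but its value here reflects only the made-up parameters.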

Figures 2 (for child endowments) and 3 (for parental endowments) plot the percentage increase in investment over that required for a child with mean parental and personal endowments to attain high school graduation.^{36} The shading in the graphs represents different values of investments. The lightly shaded areas of the graph correspond to higher values.
Eighty percent more investment is required for children with the most disadvantaged personal endowments (Figure 2). The corresponding figure for children with the most disadvantaged parental endowments is 95% (Figure 3). The negative percentages for children with high initial endowments are a measure of their advantage. From the analysis of Moon (2008), investments *received* as a function of a child's endowments are typically in reverse order from what is required. Children born with advantageous endowments typically receive more parental investment than children from less advantaged environments.

Percentage Increase in Total Investments as a Function of Child Initial Conditions of Cognitive and Noncognitive Skills

Percentage Increase in Total Investments as a Function of Maternal Cognitive and Noncognitive Skills

A more standard social planner's problem maximizes aggregate human capital subject to a budget constraint *B* = 2*H*, so that the per capita budget is 2 units of investments. We draw *H* children from the initial distribution *F* (*θ*_{1,h}), and solve the problem of how to allocate finite resources 2*H* to maximize the average education of the cohort. Formally, the social planner maximizes aggregate schooling

$$\max\,\bar{S} = \frac{1}{H}\sum_{h=1}^{H} S(\theta_{C,3,h}, \theta_{N,3,h}, \pi_h),$$

subject to the aggregate budget constraint,

$$\sum_{h=1}^{H} (I_{1,h} + I_{2,h}) = 2H,$$

(5.2)

the technology constraint,

$$\theta_{k,t+1,h} = f_{k,t}(\theta_{C,t,h}, \theta_{N,t,h}, \theta_{C,P,h}, \theta_{N,P,h}, I_{t,h}, \pi_h) \quad \text{for } k \in \{C,N\} \text{ and } t \in \{1,2\},$$

and the initial endowments of the child and her family. Again, we assume no discounting. Solving this problem, we obtain optimal early and late investments, *I*_{1,h} and *I*_{2,h}, respectively, for each child *h*. An analogous social planning problem is used to minimize crime.
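
One way to read this planning problem is through its Lagrangian. The sketch below assumes an interior solution and differentiability of *S* and *f*_{k,t}; these are assumptions made here for illustration, not conditions stated in the text. The shorthand $S_h(I_{1,h}, I_{2,h})$, for child *h*'s schooling after the technology constraints are substituted in, and the multiplier $\lambda$ on the budget constraint are notation introduced only for this sketch.

$$\mathcal{L} = \frac{1}{H}\sum_{h=1}^{H} S_h(I_{1,h}, I_{2,h}) + \lambda\left(2H - \sum_{h=1}^{H}(I_{1,h} + I_{2,h})\right)$$

$$\frac{1}{H}\frac{\partial S_h}{\partial I_{1,h}} = \frac{1}{H}\frac{\partial S_h}{\partial I_{2,h}} = \lambda \quad \text{for all } h \in \{1, \ldots, H\}.$$

Under these assumptions, the marginal schooling return to one more unit of investment is equalized across children and across the two periods at the optimum. A child for whom early investment has a high marginal return at the mean allocation therefore receives more early investment.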

Figures 4 (for child personal endowments) and 5 (for maternal endowments) show the profiles of early (left hand side graph) and late (right hand side graph) investment as a function of endowments. For the most disadvantaged, the optimal policy is to invest a lot in the early years. The decline in investment by level of advantage is dramatic for early investment. Second period investment profiles are much flatter and slightly favor more advantaged children. A similar profile emerges for investments to reduce aggregate crime, which for the sake of brevity, we do not display.

Optimal Early (Left) and Late (Right) Investments by Child Initial Conditions of Cognitive and Noncognitive Skills Maximizing Aggregate Education

Optimal Early (Left) and Late (Right) Investments by Maternal Cognitive and Noncognitive Skills Maximizing Aggregate Education

Figures 6 and 7 reveal that the ratio of optimal early-to-late investment as a function of the child's personal endowments declines with advantage whether the social planner seeks to maximize educational attainment (left hand side) or to minimize aggregate crime (right hand side). A somewhat similar pattern emerges for the optimal ratio of early-to-late investment as a function of maternal endowments with one interesting twist. The optimal investment ratio is non-monotonic in the mother's cognitive skill for each level of her noncognitive skills. At very low or very high levels of maternal cognitive skills, it is better to invest relatively more in the second period than if her endowment is at the mean.

Ratio of Early to Late Investments by Child Initial Conditions of Cognitive and Noncognitive Skills Maximizing Aggregate Education (Left) and Minimizing Aggregate Crime (Right)

Ratio of Early to Late Investments by Maternal Cognitive and Noncognitive Skills Maximizing Aggregate Education (Left) and Minimizing Aggregate Crime (Right)

The optimal ratio of early-to-late investment depends on the desired outcome, the endowments of children and budget *B* = 2*H*. Figure 8 plots the density of the ratio of early-to-late investment for education and crime.^{37} Crime is more intensive in noncognitive skill than educational attainment, which depends much more strongly on cognitive skills. Because compensation for adversity in noncognitive skills is less costly in the second period than in the first period, while the opposite is true for cognitive skills, it is optimal to weigh first and second period investments in the directions indicated in the figure.

Densities of Ratio of Early to Late Investments Maximizing Aggregate Education versus Minimizing Aggregate Crime

These simulations suggest that the timing and level of optimal interventions for disadvantaged children depend on the conditions of disadvantage and the nature of desired outcomes. Targeted strategies are likely to be effective especially for different targets that weight cognitive and noncognitive traits differently.

We now compare the policy implications of the model formulated only for cognitive skills with estimates reported in Table 6. We consider the problem of maximizing aggregate educational attainment using the estimates from a model with only cognitive skills. Figures 9 and 10 compare optimal early investments from the cognitive-skill-only model (left) with investments from the model with both skills (right). As before, less shaded regions of the figures correspond to higher values for investment.

Optimal Early Investments by Child Initial Cognitive Skills and Maternal Cognitive Skills Model with Cognitive Skill Only (Left) and the Model with Cognitive and Noncognitive Skills (Right)

Optimal Late Investments by Child Initial Cognitive Skills and Maternal Cognitive Skills Model with Cognitive Skill Only (Left) and the Model with Cognitive and Noncognitive Skills (Right)

A model of skill formation that focuses solely on cognitive skills suggests that it is optimal to perpetuate inequality. In contrast to the implications from the two skill model, investments are *lower* at the first stage of the life cycle for the most disadvantaged as measured by initial endowments compared to the most advantaged. The cognition-only model ignores the cross productivity of noncognitive skills on cognitive skills and the greater malleability of noncognitive skills in the second stage. By ignoring a central feature of the human skill formation process, it produces a misleading guide to public policy.^{38}

This paper formulates and estimates a multistage model of the evolution of child cognitive and noncognitive skills as determined by parental investments at different stages of the life cycle of children. We estimate the elasticity of substitution between contemporaneous investment and stocks of skills inherited from previous periods to determine the substitutability between early and late investments. We also determine the quantitative importance of early endowments and later investments in determining schooling attainment. We account for the proxy nature of the measures of parental inputs and of outputs and find evidence for substantial measurement error which, if not accounted for, leads to badly distorted characterizations of the technology of skill formation. We establish nonparametric identification of a wide class of nonlinear factor models which enable us to determine the technology of skill formation. A by-product of our approach is a framework for the evaluation of childhood interventions that avoids reliance on arbitrarily scaled test scores. We develop a nonparametric approach to this problem by anchoring test scores in adult outcomes with interpretable scales.

Using measures of parental investment and child outcomes from the Children of the National Longitudinal Survey of Youth, we estimate the parameters governing the substitutability between early and late investments in cognitive and noncognitive skills. In our preferred specification, we find greater malleability and substitutability for noncognitive skills in later stages of a child's life cycle than for cognitive skills, consistent with evidence reported in Cunha, Heckman, Lochner, and Masterov (2006). These estimates imply that successful adolescent remediation strategies for disadvantaged children should focus on noncognitive skills. Investments in the early years are important for the formation of adult cognitive skills. Policy simulations from the model suggest that there is no tradeoff between equity and efficiency. The optimal investment strategy to maximize aggregate schooling attainment is to target the most disadvantaged at younger ages. Accounting for noncognitive skills is important. A model that ignores the impact of noncognitive skills on productivity and outcomes suggests an equity-efficiency tradeoff and that to maximize aggregate productivity those born with the most advantage should receive relatively more investment in the early years.

^{*}This paper was supported by grants from the National Science Foundation (SES-0241858, SES-0099195, SES-0452089, SES-0752699); the National Institute of Child Health and Human Development (R01HD43411); the J. B. and M. K. Pritzker Foundation; the Susan Buffett Foundation; the American Bar Foundation; the Children's Initiative, a project of the Pritzker Family Foundation at the Harris School of Public Policy Studies at the University of Chicago; and PAES, supported by the Pew Foundation. We thank the editor and three referees for very helpful comments. We have also benefited from comments received from Orazio Attanasio, Gary Becker, Lars Hansen, Kevin Murphy, Petra Todd, and Ken Wolpin, as well as from participants at the Yale Labor/Macro Conference (May 2006), University of Chicago Applications Workshop (June 2006), the New York University Applied Microeconomics Workshop (March 2008), the Indiana University Macroeconomics Workshop (September 2008), the Empirical Microeconomics and Econometrics Seminar at Boston College, the Applied Economics and Econometrics Seminar at the University of Western Ontario, and the IFS Conference on Structural Models of the Labour Market and Policy Analysis. A website containing supplementary material is available at http://jenni.uchicago.edu/elast-sub.

^{1}See Herrnstein and Murray (1994), Murnane, Willett, and Levy (1995), and Cawley, Heckman, and Vytlacil (2001).

^{2}See Heckman, Stixrud, and Urzua (2006), Borghans, Duckworth, Heckman, and ter Weel (2008) and the references they cite. See also the special issue of the *Journal of Human Resources* 43 (4), Fall 2008 on noncognitive skills.

^{3}See Cunha, Heckman, Lochner, and Masterov (2006) and Cunha and Heckman (2007).

^{4}See Knudsen, Heckman, Cameron, and Shonkoff (2006) and Heckman (2008).

^{5}See Shumway and Stoffer (1982) and Watson and Engle (1983) for early discussions of such models. Amemiya and Yalcin (2001) survey the literature on nonlinear factor analysis. Our identification analysis is new. For a recent treatment of dynamic factor and related state space models see Durbin, Harvey, Koopman, and Shephard (2004) and the voluminous literature they cite.

^{6}Cawley, Heckman, and Vytlacil (1999) anchor test scores in earnings outcomes.

^{7}Cunha and Heckman (2008) develop a class of anchoring functions invariant to affine transformations. This paper develops a more general class of monotonic transformations and presents a new analysis of joint identification of the anchoring equations and the technology of skill formation.

^{8}This model generalizes Becker and Tomes (1986), who assume only one period of childhood (*T* = 1) and consider one output associated with “human capital” that can be interpreted as a composite of cognitive (*C*) and noncognitive (*N*) skills.

^{9}See, e.g., Cunha, Heckman, Lochner, and Masterov (2006), Heckman, Moon, Pinto, Savelyev, and Yavitz (2008) and Heckman, Moon, Pinto, and Yavitz (2008).

^{10}To focus on the main contribution of this paper, we focus on investment in children. Thus we assume that *θ*_{T+1} is the adult stock of skills for the rest of life contrary to the evidence reported in Borghans, Duckworth, Heckman, and ter Weel (2008). The technology could be extended to accommodate adult investment as in Ben-Porath (1967) or its generalization (Heckman, Lochner, and Taber, 1998).

^{11}See Web Appendix 5 for the derivation of this expression in terms of the parameters of equations (2.3)–(2.5).

^{12}This formulation assumes that measurements *a* ∈ {1, 2, 3} proxy only one factor. Carneiro, Hansen, and Heckman (2003) consider alternative specifications, but in a much less general econometric model. The key idea in all factor approaches is one normalization of the factor loading for each factor in one measurement to set the scale of the factor and *some* measurements for each measurement of type *a* dedicated to each factor. It is clear that even within the framework of this paper, as long as *some* of each of the measurements of type *a* satisfy the assumptions in this paper, one can identify the factor loadings of the remaining measurements that do not satisfy the assumptions if, for example, the factors are mutually independent.

^{13}In our framework, parental skills are assumed to be constant over time. Consequently, we need only two measures of each parental skill in one period, say the first.

^{14}The idea is to write

$$\frac{\mathrm{Cov}(Z_{1,C,t,2}, Z_{a',k',t',3})}{\mathrm{Cov}(Z_{1,C,t,1}, Z_{a',k',t',3})} = \frac{\alpha_{1,C,t,2}\,\alpha_{a',k',t',3}\,\mathrm{Cov}(\theta_{C,t}, \theta_{k',t'})}{\alpha_{1,C,t,1}\,\alpha_{a',k',t',3}\,\mathrm{Cov}(\theta_{C,t}, \theta_{k',t'})} = \frac{\alpha_{1,C,t,2}}{\alpha_{1,C,t,1}} = \alpha_{1,C,t,2}$$
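
The covariance-ratio argument can be checked on simulated data. The Python sketch below draws two correlated factors, builds three noisy linear measurements with the first loading normalized to one, and recovers the second loading from the ratio of sample covariances. The normal distributions, the mutually independent errors, the sample size, and all parameter values are assumptions chosen for illustration; the names `simulate`, `alpha2`, `alpha3`, and `rho` are hypothetical.

```python
import random


def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n, alpha2, alpha3, rho, noise_sd, rng):
    """Draw measurements of two correlated factors (all values hypothetical)."""
    z1, z2, z3 = [], [], []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        # Second factor, correlated with the first with correlation rho.
        theta_other = rho * theta + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        z1.append(theta + rng.gauss(0.0, noise_sd))           # loading normalized to 1
        z2.append(alpha2 * theta + rng.gauss(0.0, noise_sd))  # loading to recover
        z3.append(alpha3 * theta_other + rng.gauss(0.0, noise_sd))
    return z1, z2, z3


rng = random.Random(0)
z1, z2, z3 = simulate(100_000, alpha2=0.8, alpha3=1.5, rho=0.6,
                      noise_sd=0.5, rng=rng)

# Ratio of covariances: the unknown alpha3 and Cov(theta, theta') cancel.
alpha2_hat = cov(z2, z3) / cov(z1, z3)
print(f"recovered loading: {alpha2_hat:.3f} (true value 0.8)")
```

The measurement errors drop out of both covariances because they are independent of each other and of the factors in this setup, which is why neither the noise variance nor the third loading needs to be known.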

^{15}The results of Theorem 1 are sketched informally in Schennach (2004a, footnote 11).

^{16}This is a density with respect to the product measure of the Lebesgue measure on ${\mathbb{R}}^{L}\times {\mathbb{R}}^{L}\times {\mathbb{R}}^{L}$ and some dominating measure *μ*. Hence *θ*, *Z*_{1}, *Z*_{2} must be continuously distributed while *Z*_{3} may be continuous or discrete.

^{17}A vector of correctly measured variables *C* can trivially be added to the model by including *C* in the list of conditioning variables for all densities in the statement of the theorem. Theorem 2 then implies that *p*_{θ|C}(*θ*|*C*) is identified. Since *p _{C}*(

^{18}In the case of classical measurement error, bounded completeness assumptions can be phrased in terms of primitive conditions requiring nonvanishing characteristic functions of the distributions of the measurement errors as in Mattner (1993). However, apart from this special case, very little is known about primitive conditions for bounded completeness, and research is still ongoing on this topic. See d'Haultfoeuille (2006).

^{19}See Matzkin (2003, 2007).

^{20}Observe that Theorem 2 covers the identifiability of the outcome (*Q _{j}*) functions (2.2) even if we supplement the model with errors

^{21}See Section 2 of the Web Appendix for the formal analysis of identification. We have not systematically investigated identification for a general nonseparable model for *Z*_{4,j} with *π* as an argument of the function. A parametric approach appears to require at least one outcome that depends on *π* and not other factors.

^{22}See the Web Appendix for a more detailed derivation of the likelihood function and filtering equations (see Web Appendix Section 3 and Web Appendix Section 6.4). Section 6.4 presents the full model with heterogeneity and investment equations.

^{23}Our analysis establishes that we can identify models with correlated measurement errors. However, the computational cost for such a model is substantial.

^{24}While we have rich data on home inputs, the information on schooling inputs is not so rich. Consistent with results reported in Todd and Wolpin (2005), we find that the poorly measured schooling inputs in the CNLSY are estimated to have only weak and statistically insignificant effects on outputs. Even correcting for measurement error, we find no evidence for important effects of schooling inputs on child outcomes. This finding is consistent with the Coleman Report (1966), but we do not push this interpretation. We do not report estimates of the model which include schooling inputs.

^{25}The first period is age 0, the second period is ages 1−2, the third period covers ages 3−4, and so on until the eighth period in which children are 13−14 years-old. The first stage of development starts at age 0 and finishes at ages 5−6, while the second stage of development starts at ages 5−6 and finishes at ages 13−14.

^{26}The modification to likelihood (4.1) from using logs is straightforward and for the sake of brevity we do not show the explicit expression. We use five regressors (*X*) for every measurement equation: a constant, the age of the child at the assessment date, the child's gender, a dummy variable if the mother was less than 20 years-old at the time of the first birth, and a cohort dummy (one if the child was born after 1987 and zero otherwise).

^{27}Cunha and Heckman (2008) show the sensitivity of the estimates to alternative anchors for a linear model specification.

^{28}The normalizations for the factors are presented in Web Appendix 8.

^{29}Zero values of coefficients in this and other tables arise from the optimizer attaining a boundary of zero in the parameter space.

^{30}At birth we use Cognitive Skill: weight at birth, Noncognitive Skill: Temperament/Difficulty Scale, Parental Investment: Number of books. At ages 1−2 we use Cognitive Skill: Body Parts, Noncognitive Skill: Temperament/Difficulty Scale, Parental Investment: Number of books. At ages 3−4 we use Cognitive Skill: PPVT, Noncognitive Skill: BPI Headstrong, Parental Investment: How often mother reads to the child. At ages 5−6 to ages 13−14 we use Cognitive Skill: Reading Recognition, Noncognitive Skill: BPI Headstrong, Parental Investment: How often child is taken to musical performances. Maternal Skills are time invariant: For Maternal Cognitive Skill: ASVAB Arithmetic Reasoning, For Maternal Noncognitive Skill: Self-Esteem Item: I am a failure.

^{31}We assume that *g _{t}* is linear and separable in its arguments, although this is not a necessary assumption in our identification, but certainly helps to save on computation time. Notice that under our assumption that

^{32}We also report the covariance matrix for the initial conditions of the model in that appendix.

^{33}This is true even in a model that omits noncognitive skills.

^{34}The skills are correlated so the marginal contributions of each skill do not add up to 34%. The decomposition used to produce these estimates is discussed in Web Appendix 9.

^{35}See Web Appendix 8 for the estimates of the schooling equation.

^{36}In graphing the investments as a function of the displayed endowments, we set the values of other endowments at mean values.

^{37}The optimal policy is not identical for each *h* and depends on *θ*_{1,h}, which varies in the population. The crime outcome is the number of arrests. Estimates of the coefficients of the outcome equations including those for crime are reported in Web Appendix Section 8.

^{38}Web Appendix 10 shows that this contrast is stronger if we assume a one stage-one cognitive skill model.

Flavio Cunha, Department of Economics, University of Pennsylvania, 3718 Locust Walk, Philadelphia, PA 19102; Email: fcunha@sas.upenn.edu; phone: 215-898-5652.

James Heckman, Department of Economics, University of Chicago, 1126 E. 59th Street, Chicago, IL 60637; Email: jjh@uchicago.edu; phone: 773-702-0634; fax: 773-702-8490; and Geary Institute, Room B005, University College Dublin, Belfield, Dublin 4, Ireland; phone: +353 1 716 4615; fax: +353 1 716 1108.

Susanne Schennach, Department of Economics, University of Chicago, 1126 E. 59th Street, Chicago, IL 60637; Email: smschenn@uchicago.edu; phone: 773-702-8199; fax: 773-702-8490.

- Amemiya Y, Yalcin I. Nonlinear factor analysis as a statistical method. Statistical Science. 2001;16(3):275–294.
- Becker GS, Tomes N. Human capital and the rise and fall of families. Journal of Labor Economics. 1986 Jul;4(3, Part 2):S1–S39.
- Ben-Porath Y. The production of human capital and the life cycle of earnings. Journal of Political Economy. 1967 Aug;75(4, Part 1):352–365.
- Borghans L, Duckworth AL, Heckman JJ, ter Weel B. The economics and psychology of personality traits. Journal of Human Resources. 2008 Fall;43(4):972–1059.
- Carneiro P, Hansen K, Heckman JJ. Estimating distributions of treatment effects with an application to the returns to schooling and measurement of the effects of uncertainty on college choice. International Economic Review. 2003 May;44(2):361–422.
- Cawley J, Heckman JJ, Vytlacil EJ. On policies to reward the value added by educators. Review of Economics and Statistics. 1999 Nov;81(4):720–727.
- Cawley J, Heckman JJ, Vytlacil EJ. Three observations on wages and measured cognitive ability. Labour Economics. 2001 Sep;8(4):419–442.
- Center for Human Resource Research, editor. NLSY79 Child and Young Adult Data User's Guide. Ohio State University; Columbus, Ohio: 2004.
- Coleman JS. Equality of Educational Opportunity. U.S. Dept. of Health, Education, and Welfare, Office of Education; Washington, DC: 1966.
- Cunha F, Heckman JJ. The technology of skill formation. American Economic Review. 2007 May;97(2):31–47.
- Cunha F, Heckman JJ. Formulating, identifying and estimating the technology of cognitive and noncognitive skill formation. Journal of Human Resources. 2008 Fall;43(4):738–782.
- Cunha F, Heckman JJ, Lochner LJ, Masterov DV. Interpreting the evidence on life cycle skill formation. In: Hanushek EA, Welch F, editors. Handbook of the Economics of Education. North-Holland; Amsterdam: 2006. pp. 697–812. Chapter 12.
- Darolles S, Florens J-P, Renault E. Nonparametric instrumental regression. Working Paper 05-2002, Centre interuniversitaire de recherche en économie quantitative, CIREQ. 2002
- d'Haultfoeuille X. On the completeness condition in nonparametric instrumental problems. Working Paper, ENSAE, CREST-INSEE and Université de Paris I. 2006
- Durbin J, Harvey AC, Koopman SJ, Shephard N. State Space and Unobserved Component Models: Theory and Applications: Proceedings of a Conference in Honour of James Durbin. Cambridge University Press; New York NY: 2004.
- Fryer R, Levitt S. Understanding the black-white test score gap in the first two years of school. Review of Economics and Statistics. 2004 May;86(2):447–464.
- Heckman JJ. Schools, skills and synapses. Economic Inquiry. 2008 Jul;46(3):289–324.
- Heckman JJ, Lochner LJ, Taber C. Explaining rising wage inequality: Explorations with a dynamic general equilibrium model of labor earnings with heterogeneous agents. Review of Economic Dynamics. 1998 Jan;1(1):1–58.
- Heckman JJ, Moon SH, Pinto RR, Yavitz AQ. The rate of return to the Perry Preschool program. University of Chicago, Department of Economics. 2008. Unpublished manuscript.
- Heckman JJ, Moon SH, Pinto RRA, Savelyev PA, Yavitz AQ. Cost-benefit analysis of the Perry preschool program. University of Chicago, Department of Economics. 2008. Unpublished manuscript.
- Heckman JJ, Stixrud J, Urzua S. The effects of cognitive and noncognitive abilities on labor market outcomes and social behavior. Journal of Labor Economics. 2006 Jul;24(3):411–482.
- Herrnstein RJ, Murray CA. The Bell Curve: Intelligence and Class Structure in American Life. Free Press; New York: 1994.
- Hu Y, Schennach SM. Instrumental variable treatment of nonclassical measurement error models. Econometrica. 2008 Jan;76(1):195–216.
- Kniesner TJ, ter Weel B. Special issue on noncognitive skills and their development. Journal of Human Resources. 2008 Fall;43(4):729–1059.
- Knudsen EI, Heckman JJ, Cameron J, Shonkoff JP. Economic, neurobiological, and behavioral perspectives on building America's future workforce. Proceedings of the National Academy of Sciences. 2006 Jul;103(27):10155–10162.
- Levitt P. Structural and functional maturation of the developing primate brain. Journal of Pediatrics. 2003 Oct;143(4, Supplement):S35–S45.
- Mattner L. Some incomplete but boundedly complete location families. Annals of Statistics. 1993 Dec;21(4):2158–2162.
- Matzkin RL. Nonparametric estimation of nonadditive random functions. Econometrica. 2003 Sep;71(5):1339–1375.
- Matzkin RL. Nonparametric identification. In: Heckman J, Leamer E, editors. Handbook of Econometrics. 6B. Elsevier; Amsterdam: 2007.
- Moon SH. Skill Formation Technology and Multi-Dimensional Parental Investment. Ph.D. thesis. University of Chicago, Department of Economics; 2008.
- Murnane RJ, Willett JB, Levy F. The growing importance of cognitive skills in wage determination. Review of Economics and Statistics. 1995 May;77(2):251–266.
- Newey WK, Powell JL. Instrumental variable estimation of nonparametric models. Econometrica. 2003 Sep;71(5):1565–1578.
- Olds DL. Prenatal and infancy home visiting by nurses: From randomized trials to community replication. Prevention Science. 2002 Sep;3(2):153–172.
- Olley GS, Pakes A. The dynamics of productivity in the telecommunications equipment industry. Econometrica. 1996 Nov;64(6):1263–1297.
- Schennach SM. Estimation of nonlinear models with measurement error. Econometrica. 2004a Jan;72(1):33–75.
- Schennach SM. Nonparametric estimation in the presence of measurement error. Econometric Theory. 2004b;20:1046–1093.
- Shumway RH, Stoffer DS. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis. 1982 May;3(3):253–264.
- Todd PE, Wolpin KI. On the specification and estimation of the production function for cognitive achievement. Economic Journal. 2003 Feb;113(485):F3–33.
- Todd PE, Wolpin KI. The production of cognitive achievement in children: Home, school and racial test score gaps. Working paper, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania. 2005
- Todd PE, Wolpin KI. The production of cognitive achievement in children: Home, school, and racial test score gaps. Journal of Human Capital. 2007 Winter;1(1):91–136.
- Watson MW, Engle RF. Alternative algorithms for the estimation of dynamic factor, mimic and varying coefficient regression models. Journal of Econometrics. 1983 Dec;23(3):385–400.
