Home | About | Journals | Submit | Contact Us | Français |

Formats

Article sections

Authors

Related links

Sci Rep. 2012; 2: 514.

Published online 2012 July 19. doi: 10.1038/srep00514

PMCID: PMC3400147

Received 2012 April 4; Accepted 2012 June 18.

Copyright © 2012, Macmillan Publishers Limited. All rights reserved

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareALike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/

This article has been cited by other articles in PMC.

Many dynamical systems, both natural and artificial, are stimulated by time dependent external signals, somehow processing the information contained therein. We demonstrate how to quantify the different modes in which information can be processed by such systems and combine them to define the computational capacity of a dynamical system. This is bounded by the number of linearly independent state variables of the dynamical system, equaling it if the system obeys the *fading memory* condition. It can be interpreted as the total number of linearly independent functions of its stimuli the system can compute. Our theory combines concepts from machine learning (*reservoir computing*), system modeling, stochastic processes, and functional analysis. We illustrate our theory by numerical simulations for the logistic map, a recurrent neural network, and a two-dimensional reaction diffusion system, uncovering universal trade-offs between the non-linearity of the computation and the system's short-term memory.

Many dynamical systems, both natural and artificial, are driven by external input signals and can be seen as performing non-trivial, real-time computations of these signals. The aim of this work is to study the online information processing that takes place within these dynamical systems. The importance of this question is underlined by the multitude of examples from nature and engineering which fall into this category. These include the activation patterns of biological neural circuits to sensory input streams^{1}, the dynamics of many classes of artificial neural network models used in artificial intelligence (including the whole field of “reservoir computing”^{2}^{,3}^{,4}^{,5} as well as its recent implementation using time delay systems^{6}^{,7}^{,8}), and systems of interacting chemicals, such as those found within a cell, that display intra-cellular control mechanisms in response to external stimuli^{9}. Recent insight in robotics has demonstrated that the control of animal and robot locomotion can greatly be simplified by exploiting the physical body's response to its environment^{10}^{,11}. Additional systems to which our theory could apply are population dynamics responding to, e.g., changes in food supply, ecosystems responding to external signals such as global climate changes, and the spread of information in social network graphs. All of these exemplary systems display completely different physical implementations.

We present a general framework that allows to compare the computational properties of a broad class of dynamical systems. We are able to characterize the information processing that takes place within these systems in a way which is independent of their physical realization, using a quantitive measure. It is normalized appropriately such that completely different systems can be compared and it allows us to characterize different computational regimes (linear vs. non-linear; long vs. short memory).

Some initial steps in this direction are provided by the linear memory capacity introduced in^{12} (later theoretically extended to linear systems in discrete time^{13} and continuous time systems^{14}, and using Fischer information^{15}). Although it has been argued that short-term memory of a neural system is crucial for the brain to perform useful computation on sensory input streams^{15}^{,16}, it is our belief that the general paradigm underlying the brain's computation cannot solely rely on maximizing linear memory. The neural circuits have to compute complex non-linear and spatio-temporal functions of the inputs. Prior models focusing on linear memory capacity in essence assumed the dynamic system to only implement memory, while the complex non-linear mapping is off-loaded to an unspecified read-out mechanism. We propose to endow the dynamic system with all the required computational capacity, and use only simple linear read-out functions. The capacity measures we introduce are therefore of great interest since they quantify all the information processing that takes place within the dynamical system, and don't introduce an artificial separation between linear and non-linear information processing.

One of the startling results of this work with potentially far reaching consequences is that all dynamical systems, provided they obey the condition of having a fading memory^{17} and have linearly independent internal variables, have in principle the same total normalized capacity to process information. The ability to carry out useful information processing is therefore an almost universal characteristic of dynamical systems. This result provides theoretical justification for the widely used paradigm of reservoir computing with a linear readout layer^{2}^{,3}^{,4}; indeed it confirms that such systems can be universal for computation of time invariant functions with fading memory.

We illustrate the versatility of our approach by using numerical simulations on three very different dynamical systems: the logistic map, a recurrent neural circuit, and a two-dimensional reaction diffusion system. These examples allow us to uncover the (potentially universal) trade-off between memory depth and non-linear computations performed by the dynamic system itself. We also discuss how the influence of noise decreases the computation capacity of a dynamical system.

The dynamical systems we consider are observed in discrete time . They are described by internal variables *x _{j}*(

The internal variables are driven by external signals *u*(*t*) *U ^{K}* which can have arbitrary dimensionality. Typical examples for the set

The system evolves according to an evolution equation of the form:

with *T _{j}* the mapping from input and previous state to the

In order to evaluate the performance of the system in a standardized way and to make analytical derivations tractable, we will take the inputs *u*(*t*) to be independent and identically drawn from some probability distribution *p*(*u*). It should be noted that in many, if not all, real world situations the inputs are not i.i.d. Most of the theory should also hold for the non-i.i.d. case, but this significantly complicates the construction of a basis for the Hilbert space defined in Definition 5. However our aim is to characterize the dynamical system itself. By taking the inputs to be i.i.d. we eliminate as much as possible any observed structure originating from the inputs, so that any measured structure will be due to the dynamical system itself. The distribution *p*(*u*) can be chosen according to what is most relevant for the dynamical system under study.

In order to study this dynamical system experimentally, we proceed as follows. First the system is initialized in an arbitrary initial state at a time -*T*′, and is driven for *T*′ time steps with a sequence of i.i.d. inputs drawn from the distribution *p*(*u*). This to ensure that potential initial transients have died out. Next, we drive the system for *T* timesteps with inputs from the same distribution. During this later phase we record both the values of the inputs *u*(*t*) and of the state variables *x*(*t*) which will be used to get insight into how information is processed by the dynamical system.

We will denote by *u ^{–h}*(

Consider a function on sequences of *h* inputs

Given the recorded inputs *u*(*t*), it induces a time dependent function *z*(*t*) = *z*(*u** ^{–h}*(

A central quantity in our analysis will be the *capacity* to reconstruct the function *z* from the state of a dynamical system using a linear estimator. We introduce this notion as follows. Consider a linear estimator constructed from the observed data

We measure the quality of the estimator by the *Mean Square Error* (MSE):

where here and throughout we denote by a subscript *T* quantities that are generated from the measured data. We are interested in the estimator which minimizes the MSE.

The *Capacity* for the dynamical system *X* to reconstruct the function *z* is

where .

The notion of capacity can also be interpreted from the point of view of computation. In this approach, one wishes to use the dynamical system to compute the function *z*(*u ^{–h}*), and to this end one uses, following the ideas introduced in reservoir computing

We note several properties of the capacity:

For any function *z*(*t*), and any set of recorded data *x*(*t*), *t* = 1, …, *T* the capacity can be expressed as

where denotes the time average of a set of data, and denotes the inverse of the matrix *x _{i}x_{j}*

For any function *z*, and any set of recorded data, the capacity is normalized according to:

where *C _{T}* [

Note that *C _{T}* [

The capacity is based on simple linear estimators . For this reason the capacity characterizes properties of the dynamical system itself, rather than the properties of the estimator. Consider for instance the case where *z*[*u ^{–h}*] =

In summary, by studying the capacity for different functions we can learn about the information processing capability of the dynamical system itself. Because the capacity is normalized to 1, we can compare the capacity for reconstructing the function *z* for different dynamical systems.

Consider two functions *z* and *z*′, and the associated capacities *C _{T}* [

To introduce this distance, we note that the probability distribution *p*(*u*) on the space of inputs, and associated notion of integrability, provide us with an inner product, distance, and norm on the space of functions , where , is the expectation taken with respect to the measure *p*(*u*) on *U ^{h}*. We define the Hilbert space as the space of functions with bounded norm (this is technically known as the weighted

The following theorem is one of our main results. It shows that if functions *z* and *z*′ in are orthogonal, then the corresponding capacities *C _{T}* [

Consider a dynamical system as described above with N output functions *x _{i}*(

The fact that the right hand side in eq. (8) depends only on the number of output functions *N* implies that we can compare sums of capacities, for the same sets of functions *Y _{L}*, but for completely different dynamical systems.

In order to characterize when equality is attained in Theorem 4 we introduce the notion of fading memory Hilbert space and fading memory dynamical system:

Hilbert space of fading memory functions. The Hilbert space of fading memory functions is defined as the limit of Cauchy sequences in , as *h* increases, as follows:

- If , then , and .
- Consider a sequence , , then the limit lim
_{h}_{→∞}*x*exists and is in if for all > 0, there exits , such that for all_{h}*h*,*h*′ >*h*_{0}, . - Conversely, all are the limit of a sequence of the type given in 2.
- The scalar product of
*x*, is given by , where*x*→_{h}*x*and are any two Cauchy sequences that converge to*x*and*x*′ according to 2) above.

In the supplementary information we prove that indeed constitutes a Hilbert space, and that it has a denumerable basis if is separable.

Fading Memory Dynamical System^{17}. Consider a dynamical system given by the recurrence *x*(*t*) = *T*(*x*(*t* – 1), *u*(*t*)) with some initial conditions *x*(–*T*′), and suppose that one has access to a finite number of internal variables *x _{i}*(

where the expectation is taken over the *t* + *T*′ previous inputs.

When the dynamical system has fading memory, its output functions *x _{i}*(

The following result states that if the dynamical system has fading memory, and if the dynamical variables *x _{i}*,

Consider a dynamical system with fading memory as described above with N accessible variables *x _{i}*(

This result tells us that under the non-strict conditions of Theorem 7 (fading memory and linearly independent internal variables), all dynamical systems driven by an external input and comprising *N* linearly independent internal variables have the same ability to carry out computation on the inputs, at least as measured using the capacities. Some systems may carry out more interesting computations than others, but at the fundamental level the amount of computation carried out by the different systems is equivalent.

There is a close connection between the sum of capacities for orthogonal functions and system modeling, and in particular Wiener series representation of dynamical systems^{18}^{,19}. Indeed suppose that one wishes to model the dynamical system with variables *x _{i}* (which for simplicity of the argument we suppose to belong to ) by linear combinations of the finite set of orthonormal functions ,

That is, the extent to which the sum of the capacities for a finite set of orthogonal functions saturates the bound given in Theorem 7 is proportional to how well the dynamical system can be modeled by the linear combinations of these functions. This connection between the degree to which dynamical systems can be used for computation and system modeling has to our knowledge not been noted before.

Reservoir computing was one of the main inspirations for the present work, and is closely connected to it. We shed new light on it, and in particular on the question raised in^{2} of characterizing the combinations of dynamical systems and output layer that have universal computing power on time invariant functions with fading memory. In^{2} it was shown that the system has universal computing power if the dynamical system has the *point-wise separation property* (a weak constraint), and if the readout layer has the *approximation property* (a strong constraint). Here we consider simple readout layers which are linear functions of the internal states of the dynamical system, see eq. (3). Our work shows that the combination of a dynamical system and a linear output layer has universal computing power on time invariant functions with fading memory if, in the limit *N* → ∞, the set of internal states *x _{i}*,

To illustrate our theory, we have chosen driven versions of three (simulated) dynamical systems. The first is the well-known logistic map^{22}, governed by the equation

with *u*(*t*) the input, , and a piecewise linear saturation function, i.e.,

to ensure that 0 ≤ *v* (*t*) ≤ 1. The parameter *ι* can be used to scale the strength with which the system is driven. When not driven, this system becomes unstable for *ρ* > 3.

The second system is an *echo-state network* (ESN)^{20}, which can be thought of as a simplified model of a random recurrent neural network. ESNs are very successful at processing time dependent information^{3}^{,21}. The *N* internal variables *x _{i}*(

where the coefficients *w _{ij}* and

The last system is a driven version of the continuous-time continuous space two-dimensional Gray-Scott reaction diffusion (RD) system^{25}^{,26}, a one-dimensional version of which was used in^{9} in a machine learning context. This system consists of two reagents, *A* and *B*, which have concentrations *a _{xy}*(

Reagents *A* and *B* diffuse through the system with diffusion constants *d _{a}* and

In order to break the spatial symmetry of the system, the modulation strength *w _{xy}* is randomized accross the system. The concentration of

We now apply our theoretical results to the study of these three dynamical systems. We first choose the probability distribution over the inputs *p*(*u*) and the complete orthonormal set of functions *y _{l}*. We take

where , *d _{i}* ≥ 0, is the normalized Legendre polynomial of degree

Each of the dynamical systems in our experiments were simulated. After an initialization period the inputs and outputs were recorded for a duration *T* = 10^{6} time steps. When using finite data, one must take care not to overestimate the capacities. According to the procedure outlined in the supplementary material, we set to zero all capacities for which , and therefore use in our analysis the truncated capacities where *θ* is the Heaviside function. The value of is taken to be = 1.7 10^{–4} for ESN, = 1.1 10^{–4} for RD system, and = 2.2 10^{–5} for the logistic map (see supplementary material for the rationale behind these choices of ). We have carefully checked that the values we obtain for are reliable.

According to the above theorems, for fading memory systems with linearly independent state variables, the sum of all capacities should equal the number of independent observed variables *N*. To test this, in Figure 1 we plot the sum of the measured capacities

as a function of the parameter *ρ* for logistic map and ESN, and for three values of the input scaling parameter *ι*. For very small input scaling, the capacity *C _{TOT}* remains close to

The measured capacities contain much additional information about how the dynamical systems process their inputs. To illustrate this, we have selected a parameter that changes the degree of nonlinearity for each system: the input scaling parameter *ι* for the ESN and logistic map, and the sampling period *T _{S}* for the RD system, and studied how the capacities change as a function of these parameters. In Figure 2 we show the breakdown of the total capacity according to the degree of the individual basis functions. The figure illustrates how, by increasing

Further insight into the different ways dynamical systems can process information is obtained by noting that the measured capacities rescaled by the number of observed signals, , constitute a (possibly sub-normalised because of underestimation or failure of the fading memory property) probability distribution over the basis functions. This enables one to compute quantities which summarize how information is processed in the dynamical systems, and compare between systems. We first (following^{12}) consider the *linear memory capacity* (the fraction of the total capacity associated to linear functions):

where the delta function is equal to 1 iff. its argument is zero, and otherwise is equal to 0. Conversely, the *non-linearity* of the dynamical system is defined by associating to each Legendre polynomial in eq. (12) the corresponding degree *d _{i}*, and to products of Legendre polynomials the sums of the corresponding degrees, i.e., the total degree of the basis function, to obtain

In Figure 3 we plot how these quantities evolve for the three dynamical systems as a function of the chosen parameters *ι* and *T _{S}*. Unsurprisingly the linearity

Real life systems are often affected by noise. The capacities enable one to quantify how the presence of noise decreases the computational power of a dynamical system. Here we model noise by assuming that there are two i.i.d. inputs *u* and *v*, representing the signal and the noise respectively. The aim is to carry out computation on *u*. The presence of the unwanted noise *v* will degrade the performances. The origin of this degradation can be understood as follows. We can define the Hilbert space of fading memory functions of *u*^{−∞}, as well as the Hilbert space of fading memory functions of *v*^{–∞}. The dynamical system belongs to the tensor product of these two spaces , and a basis of the full space is given by all possible products of basis functions for each space. It is only over the full space that Theorem 7 holds and that the capacities will sum to *N*. However since *v* is unknown, the capacities which are measurable are those which depend only on *u*. The corresponding basis functions project onto a subspace, and therefore these capacities will not sum to *N*. Furthermore, we expect that the more non-linear the system, the larger the fraction of the total capacity which will lie outside the subspace . This is illustrated in Figure 4 (left panel). We also consider in Figure 4 (right panel) the case where there are *K* input signals of equal strength, and the aim is to carry out computation on only one of them.

In the present work we introduce a framework for quantifying the information processing capacity of a broad class of input driven dynamical systems, independently of their implementation. Aside from the fundamental result that under general conditions dynamical systems of the same dimensionality have globally the same computational capacity, our framework also allows testing of hypotheses about these systems.

We have already outlined several insights about the influence of system parameters on the kind of computation carried out by the dynamical system, but the potential implications go much further. While previous work on memory capacities^{12}^{,13}^{,14}^{,15} seemed to indicate that non-linear networks tend to have bad (linear) memory properties and are thus unsuitable for long-term memory tasks, we now conclude that information is not lost inside these networks but merely transformed in various non-linear ways. This implies that non-linear mappings with memory in dynamical systems are not only realisable with linear systems providing the memory followed by a non-linear observation function, but that it is possible to find a system that inherently implements the desired mappings. Moreover, our framework allows an a-priori evaluation of the suitability of a given system to provide a desired functional mapping, e.g., required for a class of applications.

When considering sequences that are not i.i.d. (as is the case for most real-world data), the construction of an orthonormal function basis, although it exists in theory, is often not feasible. From this perspective, the upper bound on total capacity is largely a theoretical insight. However, by using either an over-complete set of functions or an orthonormal but incomplete set, a lot of useful information about the way a system processes a given type of input data or about the type of processing required for a given task can still be gained.

An avenue for further research with high potential would be to extend the current framework such that the systems can be adapted or even constructed so as to provide certain functional mappings. Also, a study of the effects of coupling several of these systems with known computational properties on the global capacity of the whole would be very interesting in the context of larger real-world dynamical systems.

The framework is applicable under rather loose conditions and thus to a variety of dynamical systems representative of realistic physical processes. It is for instance now possible to give a complete description of the functional mappings realised by neuronal networks, as well as the influence of network structure and even homeostatic and synaptic plasticity on the realised functional mappings. The application of the proposed measures to models for different neural substrates such as the hippocampus or prefrontal cortex could further elucidate their roles in the brain. Additionally, it allows for instance to evaluate the suitability of the physical embodiment of dynamical systems such as robots to inherently perform computation, thus creating an opportunity for simplifying their controller systems. Finally, it provides a novel way to approach systems which are not generally considered to explicitly perform computation, such as intra-cellular chemical reactions or social networks, potentially yielding entirely new insights into information processing and propagation in these systems.

J.D., B.S. and S.M. designed research, J.D., D.V. and S.M. performed research, all authors analyzed data and wrote the paper.

Supplementary material: Information processing capacity of dynamical systems

Click here to view.^{(673K, pdf)}

The authors acknowledge financial support by the IAP project Photonics@be (Belgian Science Policy Office) and the FP7 funded AMARSi EU project under grant agreement FP7-248311.

- Arbib M. (Ed.) The handbook of brain theory and neural networks, second edition. The MIT Press, Cambridge MA (2003).
- Maass W., Natschläger T. & Markram H. Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comp. 14, 2531–2560 (2002). [PubMed]
- Jaeger H. & Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 304, 78–80 (2004). [PubMed]
- Verstraeten D., Schrauwen B., D'Haene M. & Stroobandt D. An experimental unification of reservoir computing methods. Neural Networks 20, 391–403 (2007). [PubMed]
- Vandoorne K.
*et al.*Toward optical signal processing using Photonic Reservoir Computing. Optics Express 16(15), 11182–11192 (2008). [PubMed] - Appeltant L.
*et al.*Information processing using a single dynamical node as complex system. Nat. Commun. 2, 468–472 (2011). [PMC free article] [PubMed] - Paquot Y.
*et al.*Optoelectronic Reservoir Computing. Nat. Sci. Rep. In press (2011). - Larger L.
*et al.*Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing. Optics Express 20(3), 3241–3249 (2012). [PubMed] - Dale K. & Husbands P. The evolution of Reaction-Diffusion Controllers for Minimally Cognitive Agents. Artificial Life 16, 1–19 (2010). [PubMed]
- Pfeifer R., Iida F. & Bongard J. C. New Robotics: Design Principles for Intelligent Systems. Artificial Life 11, 1–2 (2005). [PubMed]
- Hauser H., Ijspeert A. J., Füchslin R. M., Pfeifer R. & Maass W. Towards a theoretical foundation for morphological computation with compliant bodies. Biol. Cyber. 105, 355–370 (2011).
- Jaeger H. Short Term Memory in Echo State Networks. Fraunhofer Institute for Autonomous Intelligent Systems, Tech. rep. 152 (2002).
- White O., Lee D. & Sompolinsky H. Short-term memory in orthogonal neural networks. Phys. Rev. Lett. 92(14), 148102 (2002). [PubMed]
- Hermans M. & Schrauwen B. Memory in linear recurrent neural networks in continuous time. Neural Networks 23(3), 341–355 (2010). [PubMed]
- Gangulia S., Huhc D. & Sompolinsky H. Memory traces in dynamical systems. Proc. Natl. Acad. Sci. USA 105, 18970–18975 (2008). [PubMed]
- Buonomano D. & Maass W. State-dependent computations: Spatiotemporal processing in cortical networks. Nat. Rev. Neurosci. 10(2), 113–125 (2009). [PubMed]
- Boyd S. & Chua L. Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Trans. Circuits Syst. 32, 1150–1161 (1985).
- Wiener N. Nonlinear Problems in Random Theory. John Wiley, New York (1958).
- Lee Y. W. & Schetzen M. Measurement of the Wiener kernels of a nonlinear system by cross-correlation. Int. J. Control 2, 237–254 (1965).
- Jaeger H. The “echo state” approach to analysing and training recurrent neural networks. Fraunhofer Institute for Autonomous Intelligent Systems, Tech. rep. 148(2001).
- Lukosevicius M. & Jaeger H. Reservoir computing approaches to recurrent neural network training. Comp. Sci. Rev. 3, 127–149 (2009).
- May R. M. Simple mathematical models with very complicated dynamics. Nature 261, 459–467 (1976). [PubMed]
- Buehner M. & Young P. A tighter bound for the echo state property. IEEE Trans. Neural Netw. 17, 820–824 (2006). [PubMed]
- Ozturk M. C., Xu D. & Principe J. C. Analysis and Design of Echo State Networks. Neural Comp. 19, 111–138 (2006). [PubMed]
- Gray P. & Scott S. K. Sustained oscillations and other exotic patterns of behavior in isothermal reactions. J. Phys. Chem. 89, 22–32 (1985).
- Pearson J. Complex patterns in a simple system. Science 261, 189–192 (1993). [PubMed]
- Langton C. G. Computation at the edge of chaos. Physica D 42, 12–37 (1990).

Articles from Scientific Reports are provided here courtesy of **Nature Publishing Group**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |