Many behaviors are composed of a series of elementary motor actions that must occur in a specific order, but the neuronal mechanisms by which such motor sequences are generated are poorly understood. In particular, if a sequence consists of a few motor actions, a primate can learn to replicate it from memory after practicing it for just a few trials. How do the motor and premotor areas of the brain assemble motor sequences so fast? The network model presented here reveals part of the solution to this problem. The model is based on experiments showing that, during the performance of motor sequences, some cortical neurons are always activated at specific times, regardless of which motor action is being executed. In the model, a population of such rank-order-selective (ROS) cells drives a layer of downstream motor neurons so that these generate specific movements at different times in different sequences. A key ingredient of the model is that the amplitude of the ROS responses must be modulated by sequence identity. Because of this modulation, which is consistent with experimental reports, the network is able not only to produce multiple sequences accurately but also to learn a new sequence with minimal changes in connectivity. The ROS neurons modulated by sequence identity thus serve as a basis set for constructing arbitrary sequences of motor responses downstream. The underlying mechanism is analogous to the mechanism described in parietal areas for generating coordinate transformations in the spatial domain.
Many sophisticated behaviors, from playing a Bach prelude on the piano to tying a knot, are organized as strings of motor actions that must be carried out in the proper order and at precisely the right times. Many simpler behaviors, however, such as eating a banana or climbing a tree, also involve the sequencing of multiple, separate movements. Thus, motor sequencing is a fundamental function of the brain. Not surprisingly, a variety of cortical and subcortical structures are involved in the learning, storage and execution of movement sequences (Tanji, 2001; Ashe et al., 2006). In general, however, evidence from lesion, imaging and neurophysiological studies indicates that, in primates, there are three cortical structures that are critical specifically for planning and executing sequences of discrete movements over time: the supplementary motor area (SMA), the pre-SMA, and the supplementary eye field (SEF) (Tanji, 2001; Nachev et al., 2008). A key observation is that when the SMA is pharmacologically inactivated, monkeys become incapable of making sequential movements from memory, although they can still perform the individual movement components (Gerloff et al., 1997; Shima and Tanji, 1998).
Single-neuron recordings in behaving monkeys trained to perform instructed motor sequences have provided a wealth of information about the neural basis of this capacity. In particular, studies by Tanji and collaborators have documented the activity of so-called rank-order-selective (ROS) neurons, which are abundant precisely in the pre-SMA, SMA and SEF, and are also found in the basal ganglia, where they are known as phase-selective neurons (Kermadi and Joseph, 1995; Mushiake and Strick, 1995).
ROS cells fire during sequences of arm movements (Clower and Alexander, 1988; Mushiake et al., 1991; Shima and Tanji, 1998, 2000) or saccades (Isoda and Tanji, 2003, 2004; Averbeck et al., 2006; Berdyyeva and Olson, 2009). Their defining characteristic is that they are active during specific parts of a motor sequence. For instance, suppose sequence 1 is Pull-Push-Turn — meaning that the monkey has to pull a key, then push it, and then turn it — sequence 2 is Pull-Turn-Pull, and so on for other combinations. An ROS neuron could be active, say, during the transition between the second and third movements in all sequences, regardless of the actual movements performed at those points. That is, ROS cells are active at fixed time intervals during a multi-part motor action, and this activity is most consistent with a preference for a particular serial position in a sequence (Berdyyeva and Olson, 2009).
In addition, there is another feature of these cells that is suggested by the experimental reports and that, according to the model presented here, is also crucial for their function: their overall response amplitude depends on sequence identity. Thus, over a variety of sequences, an ROS unit is always activated during the same time period but with varying intensities.
This study shows that ROS neurons that are modulated by sequence identity in essence solve the problem of assembling arbitrary sequences of motor actions. The results are based on theoretical calculations and computer simulations of a model network in which such cells serve to construct many motor sequences composed of a set of discrete movements arranged in different orders. Interestingly, the underlying mechanism is similar to the nonlinear mechanism thought to be the basis for the computation of coordinate transformations in the visual system, except that here it is applied to the time domain.
All simulations were performed using Matlab (The Mathworks, Natick, MA). The code is available from the author upon request.
The model network has two layers. The first contains ROS neurons, and these drive a second layer of output or motor neurons through a set of synaptic connections. In each simulated trial, the ROS units are activated and their responses drive the motor units, which produce a sequence of three motor actions. The individual motor actions are denoted as A, B and C; each lasts 1 s, and they are separated from each other by additional blank or no-movement intervals also lasting 1 s. Each three-movement sequence has the structure -X-X-X-, where X stands for one of the three movements and each dash indicates a blank interval. Thus, each simulated trial lasts a total of 7 s. This structure is not essential; it was chosen for simplicity and because it approximates the structure of the tasks used by Tanji and colleagues and by other groups (Shima and Tanji, 1998, 2000; Lu et al., 2002; Isoda and Tanji, 2003, 2004; Averbeck et al., 2006). It is assumed that a cue at the beginning of a trial indicates which sequence is to be performed (from memory). The cue itself is not simulated, but its effect on the ROS cells is.
The responses of the ROS neurons depend on time and on the identity of the sequence to be performed. The symbol Rjqt identifies the instantaneous firing rate of ROS unit j at time step t during sequence q. The neurophysiological results suggest that the time dependence of an ROS neuron is the same for all sequences, and that sequence identity modulates the overall amplitude of its response. Therefore, the firing rate of ROS neuron j is modeled as

rjqt = rmin + (rmax − rmin) gjq fjt    (1)

Rjqt = rjqt + ε    (2)

where rjqt is the mean firing rate, averaged over trials, and ε represents random noise. In addition, rmin = 2 and rmax = 33 spikes/s are constant terms, gjq is the gain of neuron j during sequence q, and fjt is the temporal profile of the neuron; i.e., its normalized response as a function of time in any sequence. In the simulations, the gain factors gjq for each model neuron vary randomly between gmin and 1 across sequences, which means that the maximum possible response suppression from one sequence to another, expressed as a percentage, is equal to (1 − gmin) × 100%. The temporal profiles fjt are chosen so that each neuron is active for about 1000 ms, either during one of the movement periods or during one of the preparatory periods that precede them.
To include neuronal variability, a noise term ε is added to the response of every ROS unit (Eqn. 2). A different ε is drawn from a Gaussian distribution for each cell at every point in time. The variability is modeled as Poisson-like, meaning that the variance in the firing rate of unit j is equal to α times the mean rate of unit j. Therefore, since the mean firing rate of unit j at the current time is rjqt, the mean and variance of ε are

⟨ε⟩ = 0    (3)

⟨ε²⟩ = α rjqt    (4)

where the angle brackets indicate an average over trials, or repetitions. These random fluctuations are uncorrelated across cells.
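The response model described above, a gain-modulated temporal profile (Eqn. 1) plus Poisson-like noise (Eqns. 2–4), can be sketched in a few lines. The original simulations were written in Matlab; the following is a minimal Python/NumPy equivalent, with all function and parameter names chosen here for illustration:

```python
import numpy as np

def ros_responses(f, g, r_min=2.0, r_max=33.0, alpha=1.0, rng=None):
    """Simulate ROS firing rates for one sequence (sketch of Eqns. 1-4).

    f     : (n_units, n_steps) temporal profiles, normalized to [0, 1]
    g     : (n_units,) gain factors for this sequence (between g_min and 1)
    alpha : scales the Poisson-like variance; alpha = 0 disables the noise
    """
    rng = np.random.default_rng() if rng is None else rng
    # Mean rate: gain-modulated temporal profile (Eqn. 1)
    r_mean = r_min + (r_max - r_min) * g[:, None] * f
    # Gaussian noise with zero mean and variance alpha * mean rate (Eqns. 3-4)
    noise = rng.standard_normal(r_mean.shape) * np.sqrt(alpha * r_mean)
    return r_mean + noise
```

With alpha = 0 this returns the trial-averaged rates rjqt; with alpha = 1 the fluctuations have Poisson-like statistics, as in the full model.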
In Eqn. 1, the temporal and sequence dependencies are combined multiplicatively. Although possible (Salinas and Abbott, 1996; Peña and Konishi, 2001; Sohn and Lee, 2007), an exact multiplication is not necessary; the obligatory condition is that the dependencies must be combined nonlinearly (Pouget and Sejnowski, 1997; Salinas, 2004a, b). This means that linear combinations should be avoided. For instance, the mean firing rate of ROS neuron j during sequence q could be described by an additive combination of time- and sequence-dependent terms, as in

rjqt = rmin + (rmax − rmin) (gjq + fjt)/2    (5)
The symbol mkqt stands for the firing rate of motor unit k during time point t of sequence q, and is calculated through a weighted sum of ROS responses,

mkqt = Σj wkj Rjqt    (6)

where wkj represents the synaptic connection from ROS neuron j to output neuron k. This expression is used when the ROS neurons drive the motor neurons, i.e., after the synaptic connections have been established. However, there is also an intended or desired motor response for each output neuron. It is denoted as Mkqt and is used only for setting the synaptic connections. This is done as follows.
In the simulations, each motor unit is supposed to contribute to the generation of one of the three possible movements, A, B or C. Initially, the desired response of each motor neuron is simply a square function of time that is on only during the associated movement period or during the preceding preparatory period, and is off otherwise. For instance, if motor neuron k is associated with the preparation of movement A, then during the sequence ABC (sequence 1) its desired motor response would be

Mk1t = 1 for 0 ≤ t < 1000 ms, and 0 otherwise    (7)

because the preparatory period before A starts at time zero in this sequence. In contrast, during the sequence BAC (sequence 2) the desired motor response for the same cell would be

Mk2t = 1 for 2000 ≤ t < 3000 ms, and 0 otherwise    (8)

because the preparatory period before A now starts 2 s after the beginning of the trial. And so on for other sequences. For clarity, the t index in these expressions runs in 1 ms increments, but in the simulations every increment was equal to a 10 ms step. Finally, all the desired motor firing rates are smoothed with a Gaussian filter with a standard deviation of 50 ms, and the results are scaled and added to a background firing rate; see Fig. 3c for examples of the final shapes of the desired motor profiles. Equal numbers of motor units are assigned to each of the three movements.
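The construction of these desired profiles is straightforward. The sketch below (Python/NumPy rather than the original Matlab; the function name and the exact scaling to peak and background rates are illustrative choices) builds the square pulse for one assigned 1 s period on a 10 ms grid, smooths it with a 50 ms Gaussian filter, and adds a background rate:

```python
import numpy as np

def desired_profile(period_index, n_periods=7, dt_ms=10, period_ms=1000,
                    r_background=2.0, r_peak=33.0, sigma_ms=50):
    """Sketch of a desired motor response M_kqt: a square pulse during the
    assigned period, smoothed with a Gaussian filter (SD 50 ms), scaled,
    and added to a background firing rate."""
    n_bins = n_periods * period_ms // dt_ms
    m = np.zeros(n_bins)
    start = period_index * period_ms // dt_ms
    m[start:start + period_ms // dt_ms] = 1.0
    # Normalized Gaussian kernel, truncated at +/- 4 SD
    s = sigma_ms / dt_ms
    x = np.arange(-4 * s, 4 * s + 1)
    k = np.exp(-x ** 2 / (2 * s ** 2))
    k /= k.sum()
    m = np.convolve(m, k, mode="same")
    return r_background + (r_peak - r_background) * m
```

For example, desired_profile(2) puts the pulse in the third of the seven 1 s periods, as for a neuron tied to the preparation of A in the sequence BAC.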
The connections from the ROS to the motor neurons are chosen so that they minimize the average squared difference between the desired and driven responses; that is, they must make the error

E = Σq ϕq Σk,t ⟨(mkqt − Mkqt)²⟩    (9)

as small as possible, where the angle brackets indicate an average over trials. Such minimization is a standard procedure for linear networks (Haykin, 1999; Salinas, 2004a, b). In this expression, the coefficients ϕq have been included to regulate the relative importance of different sequences, and thus the relative accuracy with which they are generated by the network. These importance coefficients satisfy the constraint Σqϕq=1, where the sum is over all sequences. Except when explicitly indicated, all sequences have the same importance, so ϕq=1/NQ for all q, where NQ is the total number of sequences.
The matrix of optimal connections w that minimizes the expression above is the one that satisfies the equation

wC = L, that is, Σj wij Cjk = Lik    (10)

where the matrices C and L are given by

Cjk = Σq ϕq Σt (rjqt rkqt + α rjqt δjk)    (11)

Lik = Σq ϕq Σt Miqt rkqt    (12)

The second term in Eqn. 11 is due to the variability (noise) in the ROS responses. Its overall strength is determined by α, where α=0 means that there is no noise and α=1 means that the noise follows Poisson statistics. In this term, δjk=1 if j=k and δjk=0 if j≠k.
The solution to Eqn. 10 can be obtained by calculating the inverse (or the pseudo-inverse) of the correlation matrix C, in which case w = LC^−1. Alternatively, in Matlab, the system of equations can be solved more efficiently through the slash operator, such that w = L/C.
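Stated in code, the training step amounts to accumulating the correlation matrices of Eqns. 11 and 12 and then solving the linear system of Eqn. 10. The sketch below does this in Python/NumPy instead of the original Matlab (np.linalg.solve plays the role of the slash operator); the function name and array layout are illustrative:

```python
import numpy as np

def train_weights(r, M, phi, alpha=1.0):
    """Optimal ROS-to-motor weights (sketch of Eqns. 10-12).

    r   : (n_seq, n_ros, n_steps) mean ROS rates
    M   : (n_seq, n_motor, n_steps) desired motor rates
    phi : (n_seq,) importance coefficients, summing to 1
    """
    n_ros, n_motor = r.shape[1], M.shape[1]
    C = np.zeros((n_ros, n_ros))      # ROS correlation matrix (Eqn. 11)
    L = np.zeros((n_motor, n_ros))    # desired-vs-ROS correlations (Eqn. 12)
    for q in range(len(phi)):
        # alpha * diag(sum_t r) is the noise term of Eqn. 11
        C += phi[q] * (r[q] @ r[q].T + alpha * np.diag(r[q].sum(axis=1)))
        L += phi[q] * (M[q] @ r[q].T)
    # Solve w C = L, the equivalent of Matlab's w = L / C
    return np.linalg.solve(C.T, L.T).T
```

With alpha = 0 and desired responses that are exact linear combinations of the ROS rates, the recovered weights reproduce those combinations exactly, which is a convenient sanity check.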
In selected simulations, the effect of random synaptic corruption is investigated by deleting some of the synapses in w. This is controlled by the parameter Pw, which is the probability that any element wij is set to zero after training.
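As a sketch of this manipulation (Python/NumPy; the function name is illustrative), the post-training deletion is a random mask applied to the trained weight matrix:

```python
import numpy as np

def corrupt_weights(w, p_w, rng=None):
    """Set each trained synaptic weight to zero with probability p_w."""
    rng = np.random.default_rng() if rng is None else rng
    return np.where(rng.random(w.shape) < p_w, 0.0, w)
```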
Once the synaptic weights w are obtained, the motor responses driven by the ROS neurons are computed using Eqn. 6. Then, the actual differences between the desired and driven motor responses provide an indication of the accuracy of the network. The root-mean-square (RMS) error

ERMS = √[(1/N) Σk,q,t (mkqt − Mkqt)²]    (13)

is used to quantify the magnitude of these differences, where N is equal to the number of motor neurons, times the number of sequences, times the number of time points per sequence.
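In code, Eqn. 13 reduces to a one-line computation over all motor units, sequences and time points (a Python/NumPy sketch):

```python
import numpy as np

def rms_error(m_driven, m_desired):
    """RMS difference between driven and desired motor rates (Eqn. 13)."""
    d = np.asarray(m_driven) - np.asarray(m_desired)
    return np.sqrt(np.mean(d ** 2))
```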
Another way to quantify the performance of the network is to decode the movement that is being generated at each point in time, and compare it to the intended movement at those same points. This is done with networks that include six motor neurons, one that fires before movement A, another that fires during movement A, another that fires before movement B, and so on. At each time point, the motor neuron with the maximum firing rate is identified. If this neuron is associated, say, with movement A and the desired movement at that time is indeed A, then that time point is scored as correct. Conversely, if the unit with the highest rate is associated with a movement different from the desired one at that point, then that time point is scored as an error. In this way, by averaging over all sequences and multiple time points and trials, the probability of encoding an incorrect movement, Pm, is calculated. Time points near the borders between blank and movement intervals, during which the motor firing rates rise or fall, are excluded from the calculation.
The probability Pm quantifies a movement error that is brief, occurring over a single time step (of 10 ms in the simulations). To quantify movement errors that persist over the time scale of a whole movement, a similar procedure is implemented that scores each of the six movement periods as either correct or incorrect. For instance, suppose the first movement is A; the encoded movement is scored as correct if in at least one half of the time points of the first movement period the neuron with the maximum firing rate is the one associated with A. Conversely, the encoded movement is scored as incorrect if in more than one half of the time points during this movement period the neuron with the maximum firing rate is other than the one associated with A. By averaging over all sequences and time periods, and over trials, the probability of encoding an incorrect movement, PM, is calculated. This quantity indicates how often the network will generate a wrong movement signal that lasts throughout one of the 1 s intervals.
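The two scoring procedures can be sketched together. In the Python/NumPy fragment below, a decoded movement is read out as the motor unit with the maximum rate at each 10 ms step; Pm counts wrong time steps away from interval borders, and PM counts intervals in which the majority of steps are wrong. The function name, the 100 ms border margin, and the simplification that every 1 s interval is scored are illustrative assumptions:

```python
import numpy as np

def decode_errors(m, target, n_per_period=100, margin=10):
    """Score decoding errors for one sequence.

    m      : (n_motor, n_steps) driven motor rates
    target : (n_steps,) index of the unit that should be most active
    Returns (Pm, PM), the per-time-step and per-period error rates.
    """
    decoded = np.argmax(m, axis=0)
    n_steps = len(target)
    # Pm: fraction of wrong time steps, excluding points near interval borders
    keep = np.zeros(n_steps, dtype=bool)
    for start in range(0, n_steps, n_per_period):
        keep[start + margin:start + n_per_period - margin] = True
    p_m = float(np.mean(decoded[keep] != target[keep]))
    # PM: a period is wrong if more than half of its time steps are wrong
    wrong = [np.mean(decoded[s:s + n_per_period] != target[s:s + n_per_period]) > 0.5
             for s in range(0, n_steps, n_per_period)]
    return p_m, float(np.mean(wrong))
```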
This study analyzes the functional relationship between two types of neurons, ROS neurons and motor neurons.
Shima and Tanji (2000) catalogued and described various subtypes of ROS neurons according to their temporal and sequence selectivities. The general class of ROS neurons considered here contains those neurons that they identified as ROS as well as other types that they discussed separately. Specifically, here, a neuron is considered as ROS if its temporal profile of activity remains approximately constant for all sequences, even if its overall response amplitude does change across sequences. This definition is consistent with their proposed functional role, as shown below.
Practically all areas where ROS cells are found contain a second type of cell that fires most intensely during the execution of specific movements (Tanji and Shima, 1994; Shima and Tanji, 2000; Isoda and Tanji, 2003). For simplicity, here these neurons are referred to as motor units. They show little or no dependence on sequence identity, or on movement order within a sequence. For example, a motor neuron selective for “Turn” would be active whenever a turn of the key is executed. Also, no distinction is made between neurons that fire predominantly before the actual movement (preparatory neurons) or during its execution. Here, all these movement-locked cells are considered the outputs of the circuit.
The model network developed here has a simple two-layer architecture in which a population of ROS neurons modulated by sequence identity drives a set of motor neurons downstream, which generate movements. The two layers are connected through a set of synaptic weights w, which are determined according to the identity of the sequences that the network is required to store (see Methods). The behavior of the network is first described with an example in which six sequences of three movements are performed. Each sequence is assembled by combining the movements A, B and C. The responses of the ROS and motor populations during each of the six sequences are shown in Fig. 1, which plots the firing rate of the cells, encoded by color, as a function of time.
Each ROS unit has a fixed temporal profile of activity, so it fires intensely during a particular time interval in any sequence. For instance, the model cells at the top left corner of the ROS arrays always fire during the first preparatory period. Their maximum response amplitude, or gain, however, changes as a function of sequence identity. For each model cell in this example, the six amplitude factors varied between 0.4 and 1 and were assigned randomly across sequences. This modulation can be seen by comparing the color intensity of a given cell across different sequences. The model ROS population includes a variety of temporal profiles, as reported (Shima and Tanji, 2000). Thus, their various segment or rank-order preferences tile the full duration of a sequence.
The motor neurons, on the other hand, behave in the opposite way: they fire at different points in time, depending on the sequence, but do so with the same intensity in all of them (if the corresponding movement is part of the sequence). Each motor unit fires in association with one of the movements (A, B or C), either during its execution or during the preceding preparatory interval. For example, the units at the top of the motor arrays in Fig. 1 always fire during the preparatory period leading to movement A.
What the model network achieves is to produce responses that can occur at any point in time out of responses that occur at fixed times. Although it is not obvious, the key for this is the sequence selectivity of the driving ROS cells.
Figure 2 plots the firing rates of three ROS neurons (Fig. 2a) and three motor neurons (Fig. 2b) as functions of time during the six sequences. These responses are from the same simulation as in Fig. 1, but the changes in gain are more evident in this format. Again, the key observation is that (1) the response of each ROS neuron occurs during a particular time interval but with different intensities across sequences, whereas (2) the response of each motor neuron has a constant amplitude but occurs at a different time interval in each sequence, depending on the time at which the corresponding motor movement needs to be executed.
For simplicity, and to mimic typical experimental situations, three types of movement and three movements per sequence were used in these examples, but in principle the mechanism works for any number of movements and sequences, as long as the ROS population contains the necessary combinations of temporal and sequence selectivities (see Appendix).
A crucial question for this type of network is how its accuracy relates to its size and to the number of motor sequences it must generate. Here, “size” really refers to the number of ROS neurons, because the accuracy of the simulated motor responses does not depend on the number of motor neurons included in the network. To investigate the network’s capacity, motor accuracy will be quantified as a function of network size under various conditions. Two different measures of accuracy will be considered. The first one is the RMS difference between driven and desired motor firing rates, ERMS (Eqn. 13).
The results in Figs. 1 and 2 were obtained after training the network to produce a series of desired motor responses (see Methods). Here, “training” means that given the firing rates of the ROS neurons modulated by sequence identity, optimal synaptic connections are found such that the motor responses driven by those ROS cells approximate the desired responses as accurately as possible. The driven motor responses in Figs. 1 and 2 are very close to those used during training, but are not exactly the same. The typical difference between them is ERMS. By means of this quantity, it will become clear that there are three potential sources of error in the network: the number of stored sequences relative to network size, the temporal structure of the ROS responses, and the trial-to-trial variability, or noise.
First consider an idealized case in which (1) there is no noise or response variability (α=0), and (2) all neurons, motor and ROS, have exactly the same temporal activation profiles. Thus, in contrast to Fig. 2a, where each of the three ROS units has a slightly different activation width, onset time (relative to the start of the relevant period) and activation shape, now the time courses of all neurons have exactly the same shape. In this case, motor accuracy is determined primarily by the number of ROS neurons relative to the number of stored sequences. In particular, in theory, the minimum number of ROS neurons needed to store NQ sequences of NS steps or time periods each is NROS=NQ × NS (see Appendix). The circles in Fig. 3a show the RMS error ERMS as a function of NROS for networks that generate six sequences throughout seven time periods, as in Figs. 1, 2. The curve falls very quickly until NROS exceeds the critical number of 42 ROS neurons; beyond that, the residual error is very small. Figures 3b and 3c show examples of driven and desired motor responses for this type of simplified network. With fewer than 42 ROS units (Fig. 3b, NROS=28), the two driven motor responses shown are completely wrong during the third preparatory period: the magenta neuron fires much less than it should and the blue neuron fires much more than it should; the wrong movement is encoded in this period even though there is no noise in the system. In contrast, with more than 42 ROS units (Fig. 3c, NROS=91), the driven motor responses are indistinguishable from the desired ones.
Next, consider a situation in which (1) there is still no noise, but (2) the temporal profiles of the ROS units are not identical. In this case, each ROS unit is active for a slightly different amount of time (1000 ± 160 ms), its onset of activity is within 20 ms of the start of the corresponding period, and its profile is not flat at the top. Examples are shown in Fig. 2a. Meanwhile, the desired motor responses are exactly the same as before. Now a network with 91 ROS neurons has an RMS error of about ERMS ≈ 1.3 spikes/s, and the distinction between networks with more or fewer than 42 ROS units is strongly blurred (squares in Fig. 3a). As shown in Fig. 3d, although the driven motor responses are activated at the right times and with intensities that are on average very close to the correct levels, they are not smooth. There are many small deviations from the desired responses, which are due to the discrepancy between the temporal profiles of the ROS units and the profiles of the desired motor responses.
Finally, consider the more general case in which (1) there is response noise (α=1), and (2) there is variability across the ROS temporal profiles. Response noise, as one might expect, always increases the RMS error. For example, in a relatively small network (NROS=91), with Poisson variability the error goes from about 1.3 to about 3.9 spikes/s (compare Figs. 3d and 3e). The colored traces in Fig. 3e illustrate the motor responses of two model cells in a single trial under these conditions. Comparing these traces with those in Fig. 3g shows that the resulting single-trial fluctuations decrease substantially when the network grows to NROS=539 neurons. The same thing can be seen following the diamonds in Fig. 3a. Lastly, when there is response variability, as in real recordings, it is standard practice to average neuronal responses over multiple trials. The triangles in Fig. 3a show the RMS error between the desired motor responses and the mean driven responses in the limit when these are averaged over infinitely many trials. The effect of trial averaging can also be seen by comparing Figs. 3g and 3f.
In summary, to reproduce a set of desired motor sequences accurately, a network must have at least a minimum number of modulated ROS neurons, which depends on the number and length of the sequences that must be generated. Additional neurons are necessary, however, to compensate for trial-to-trial random fluctuations, and for differences between the temporal activation profiles of the ROS neurons and the desired motor activation profiles.
Another way to quantify the accuracy of the network is to compare the movement that it is supposed to produce at each point in time (A, B or C) with the movement that it actually encodes. This is done with the quantities Pm and PM (see Methods). Pm is the probability that the output layer of the network encodes the wrong movement at any time point during a simulated sequence. A coding error of this type applies to a single time step (of 10 ms), so it is a brief event. Figure 4a shows that this probability decreases steadily as the size of the network increases. The data along the thin black line are for networks trained to perform 6 sequences of three movements, as in Fig. 3, whereas the data along the gray line are for networks trained to perform 18 sequences of three movements. The Pm values rise in the latter case because when the network has to store a larger number of sequences, each one is reproduced less accurately.
A similar trend is seen with PM (Fig. 4b), which is the probability that the output layer of the network encodes the wrong movement throughout one of the movement or preparatory intervals. A coding error of this type occurs when a wrong movement is signaled during more than 50% of a movement or preparatory interval, so intuitively it corresponds more closely to the actual production of a wrong movement. The two lines and sets of data points in Fig. 4b plot PM versus network size for the same simulations as in Fig. 4a.
Like ERMS, the two probabilities for encoding an incorrect movement quantify the accuracy of the network, but they are much less sensitive than ERMS to small differences between driven and desired responses. For instance, Pm=0 for the network illustrated in Fig. 3d, which has an RMS error above 1, and Pm is only 0.085 for the network illustrated in Fig. 3e, which has an RMS error above 4. Furthermore, the curves for PM fall much more sharply than those for Pm, because each of the long error events that count toward PM requires at least 40 of the brief error events that count toward Pm. These differences are summarized by the following observation: with noise, as in Fig. 3a (diamonds and triangles) and Figs. 4a, 4b, ERMS falls as 1 over the square root of the number of ROS neurons, Pm falls exponentially with the number of ROS neurons, and PM falls even faster, with a Gaussian tail.
Four factors that affect network performance have been discussed: the size of the network, the number of stored sequences, the mismatch between ROS and motor temporal profiles, and response variability. There are two more components of the model that also determine performance: (1) the strength of the modulation by sequence identity, and (2) the precision of the synaptic weights.
The effect of modulation strength can be seen by varying the parameter gmin, which determines the minimum amplitude of the ROS responses. That is, gmin=1 corresponds to no modulation whatsoever, so that sequence identity has no effect on ROS activity, whereas gmin=0 corresponds to the maximum modulation possible, in which case a neuron can be completely suppressed in its least preferred sequence. In general, a lower gmin value should give rise to stronger differentiation between sequences, higher signal-to-noise ratio, and thus higher accuracy (see Salinas, 2004b). This is indeed what happens, as shown in Figs. 4c and 4d. The thin black lines in these plots indicate the standard condition in which gmin=0.4. With gmin=0 (orange lines), the stronger modulation indeed decreases the probability of coding the wrong movement, although the decrease is relatively modest, particularly in the case of PM. In contrast, when gmin=0.85 (purple lines) the maximum suppression is only 15%. This small amount of modulation produces rather drastic increases in both Pm and PM, which are comparable to the increases seen from 6 to 18 stored sequences in Figs. 4a and 4b. Therefore, to obtain high motor accuracy, the minimum gain gmin should be considerably below 1, but on the other hand, it does not have to be very close to zero.
Finally, consider the effect of synaptic corruption on the performance of the model, as measured by Pm and PM. In this case, each model network is simulated as before, except that after training, each synaptic weight wij is randomly set to zero with a probability Pw. In other words, Pw=0.25, for instance, means that about 25% of the synaptic connections are randomly deleted. The behavior of the model in the presence of such synaptic corruption is shown in Figs. 4e and 4f, where, again, the thin black lines correspond to the standard condition in which Pw=0. Randomly deleting 5% of the network connections (orange lines) increases both Pm and PM by small amounts, and the manipulation has a negligible effect on large networks. The increase is much more pronounced when 25% of the connections are randomly set to zero (purple lines), but the result is not catastrophic, in the sense that the probability of error still goes down exponentially with the number of ROS neurons in the network. Therefore, the model is not overly sensitive to corruption of the synaptic connections.
Experimental studies show that monkeys are capable of working with a relatively large repertoire of motor sequences (Shima and Tanji, 2000; Lu et al. 2002; Averbeck et al., 2006). In a typical motor sequence paradigm, at the start of each block of trials the monkey must execute a new sequence whose identity has to be discovered through trial and error. After some number of exploratory trials, the sequence is discovered and the monkey performs it a few more times from memory alone. Monkeys become more efficient in this process after a period of extensive training during which they become familiar with the available repertoire of sequences. Therefore, performance in these tasks depends in part on the learned repertoire, but in addition, every time a new block starts, there are a few initial trials during which the current sequence is recalled and quickly mastered (see Procyk et al., 2000; Lee and Quessy, 2003).
The network model simulates the generation of a number of learned sequences when these need to be reproduced from memory alone, but what happens when one sequence (the current one) needs to be performed more accurately or is practiced a few times, as in the monkey experiments? In particular, if the overall motor accuracy for the set of learned sequences is not very high, how much does the network need to change in order to improve the performance of one specific sequence, and does this have an impact on the other sequences?
These questions are addressed by including in the simulations a set of coefficients ϕq that tell the network how important sequence q is (see Methods). Two situations are contrasted: the standard case in which all sequences have equal importance, and a second case in which sequence 1 is singled out and is given an importance coefficient that is different from all the others (it does not matter which sequence is singled out; sequence 1 is chosen for convenience). The importance coefficients are such that Σqϕq=1, so notice the following: (1) ϕ1=1 means that sequence 1 is the only one that matters, all others are ignored, (2) ϕ1=0 means that sequence 1 is ignored, (3) ϕ1=1/NQ, where NQ is the number of available sequences, means that all sequences have the same importance. Therefore, when ϕ1 > 1/NQ sequence 1 is more important than any of the others and when ϕ1 < 1/NQ sequence 1 is less important.
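Singling out sequence 1 while keeping the constraint Σqϕq=1 can be done by splitting the remaining probability mass evenly among the other sequences. This even split is an illustrative choice (the text only requires that the coefficients sum to 1), as is the function name in this Python/NumPy sketch:

```python
import numpy as np

def importance_coeffs(phi_1, n_q):
    """Importance coefficients phi_q with sequence 1 singled out.

    phi_1 : importance of sequence 1; the remaining 1 - phi_1 is
            divided evenly among the other n_q - 1 sequences.
    """
    phi = np.full(n_q, (1.0 - phi_1) / (n_q - 1))
    phi[0] = phi_1
    return phi
```

Setting phi_1 = 1/n_q recovers the standard condition in which all sequences are equally important.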
Figure 5 illustrates what happens when ϕ1 is varied. In this example, the network included 252 ROS and 6 motor neurons and was trained to reproduce 18 sequences of three movements. Figure 5a shows the mean responses of the six motor neurons in the network, as functions of time, for 6 of the 18 sequences. These are the responses in the standard condition, in which all sequences are equally important. The mean probability of encoding an incorrect movement in this case was PM=0.15, but the probability varied slightly across sequences: the three sequences at the top of Fig. 5a had the highest values, whereas the three at the bottom had the lowest. Figure 5b shows the motor responses of the same network under identical conditions, except that sequence 1, AAB, had a higher importance. Notice that sequence AAB is reproduced much more accurately in Fig. 5b than in Fig. 5a: only the neurons that are supposed to be active fire intensely at each time point. In contrast, the rest of the sequences are reproduced only slightly less accurately than in Fig. 5a. This means that the network can generate a large improvement in one specific sequence at a relatively small cost in accuracy for each of the other sequences. A crucial question, however, is whether this improvement requires drastic changes in the synaptic connections between ROS and motor units.
It turns out that the answer is no, as illustrated in Fig. 5c. The synaptic weights w* that result when sequence 1 is favored are very similar to the weights wij that result when sequence 1 has the same importance as all the others; the correlation coefficient between them is ρ=0.985. The similarity between w* and w arises not because sequence 1 is special, but because of the properties of the network: each motor response is driven by many ROS neurons, so increasing the accuracy of one particular sequence requires many small changes in synaptic weight, rather than a few large ones. The impact of ϕ1 on the network can be seen more clearly in Fig. 5d, which plots three quantities as functions of ϕ1: the probability of encoding a motor error during sequence 1 (red trace), the probability of encoding a motor error during any of the other sequences (dark blue trace), and the correlation coefficient between w* and w (light blue trace). Interestingly, the shapes of these curves are ideal for changing the accuracy of one particular sequence without affecting the others, or the synaptic weights, very much. Around the point of equal importance (ϕ1=1/18, indicated by the dashed vertical line on the left), the red curve is very steep, which means that small changes in ϕ1 translate into large changes in the accuracy with which sequence 1 is executed. In contrast, around the same point the light and dark blue curves are very flat, which means that small changes in ϕ1 have a minimal impact on the other sequences and on the synaptic weights. As one would expect, increasing ϕ1 eventually does alter the weights drastically and gives rise to large motor errors during all other sequences, but this happens only for values above ~0.6, far beyond the equal-importance point of 1/18.
These effects, the steepness of the curve for sequence 1 and the flatness of the other two curves, become stronger in larger networks (results not shown). Thus, the organization of these networks is such that a large improvement in the execution of one particular sequence can be achieved through very small changes in synaptic connectivity, and with minimal consequences for the execution of other sequences.
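This tradeoff can be sketched with a toy version of the readout stage, assuming (as a simplification, not the published simulation code) that the ROS-to-motor weights minimize the importance-weighted squared error; all sizes, the random responses, and the helpers `train` and `seq_error` are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_ros, n_motor, n_seq, n_steps = 30, 6, 6, 7   # NROS < NQ*NS, so fits are imperfect

# Illustrative ROS responses r_jqs = g_jq * f_js and desired motor responses M_kqs.
f = rng.random((n_ros, n_steps))               # temporal profiles f_js
g = rng.random((n_ros, n_seq))                 # sequence gains g_jq
r = np.einsum('jq,js->jqs', g, f)
M = rng.random((n_motor, n_seq, n_steps))

def train(phi):
    # Minimize sum_q phi_q * ||w r_q - M_q||^2 by scaling each sequence
    # by sqrt(phi_q) and solving one ordinary least-squares problem.
    s = np.sqrt(phi)[None, :, None]
    R = (r * s).reshape(n_ros, -1)
    T = (M * s).reshape(n_motor, -1)
    return np.linalg.lstsq(R.T, T.T, rcond=None)[0].T

def seq_error(w, q):
    # Squared reproduction error for sequence q under weights w.
    return np.sum((w @ r[:, q, :] - M[:, q, :]) ** 2)

phi_eq = np.full(n_seq, 1.0 / n_seq)                     # all sequences equal
phi_b = np.full(n_seq, 0.7 / (n_seq - 1)); phi_b[0] = 0.3  # favor sequence 1

w = train(phi_eq)        # analogous to w in the text
w_star = train(phi_b)    # analogous to w*
rho = np.corrcoef(w.ravel(), w_star.ravel())[0, 1]
```

With NROS ≥ NQ × NS the fits would be exact and the coefficients would have no effect; the interesting regime is the imperfect one, where boosting ϕ1 lowers the error on sequence 1 while leaving w* highly correlated with w.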
With the model, it is possible to simulate inactivation and stimulation experiments that would probe the connectivity of the networks involved in sequence generation. Two such manipulations were performed on the same model network used in Figs. 1 and 2.
Inactivation of a homogeneous subset of ROS neurons was simulated first. The targeted ROS neurons were two thirds of those that fired during the preparatory period preceding the second movement. These cells were inactivated by multiplying their firing rates by 0.4, whereas the rest of the ROS neurons fired at their usual rates (as in Fig. 1). Inactivation was applied throughout the whole simulation time. Not surprisingly, the result was a reduction in firing rate in most motor neurons during the second preparatory period. This can be seen by comparing the three example motor responses in Fig. 6a, the standard condition, with those in Fig. 6b, the inactivation condition. The motor responses during the second preparatory period were suppressed by about 50%, although the exact amount varied across cells. The key, however, is that the effect was temporally specific: inactivation of the ROS neurons that signal the second preparatory period had an impact on the motor responses exclusively during that period.
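The temporal specificity follows directly from the linear readout when each ROS cell fires during a single step. In the reduced sketch below (cell counts, rates, and the targeted step are invented; the 0.4 scaling follows the simulation just described), inactivating cells selective for one step changes the motor output during that step only:

```python
import numpy as np

rng = np.random.default_rng(2)
n_ros, n_motor, n_steps = 21, 6, 7

# Each ROS cell fires during exactly one step (3 cells per step).
step_of = np.repeat(np.arange(n_steps), 3)
rates = rng.random(n_ros) * 40.0          # rate of each cell during its step
w = rng.random((n_motor, n_ros))          # fixed ROS-to-motor weights

def motor_output(gain):
    # Linear readout: only cells whose step matches s contribute at step s.
    out = np.zeros((n_motor, n_steps))
    for s in range(n_steps):
        active = step_of == s
        out[:, s] = w[:, active] @ (gain[active] * rates[active])
    return out

control = motor_output(np.ones(n_ros))

# Inactivate two thirds of the cells selective for step 3 (rates scaled by 0.4).
gain = np.ones(n_ros)
gain[np.flatnonzero(step_of == 3)[:2]] = 0.4
inactivated = motor_output(gain)
```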
The second manipulation is more interesting. The same subset of ROS neurons was stimulated by adding 30 spikes/s to their usual firing rates at all times. In this way, these model neurons became active at the “wrong” times, at which they would normally have been silent. As can be seen in Fig. 6c, this typically shifted the firing rate curves of the downstream motor neurons upward. The size and direction of the shift varied across motor cells, but the key is that the effect was not temporally specific: for a given cell, the shift was the same at all times and for all sequences.
Similar shifts were obtained when the same number of ROS cells were stimulated but these were chosen randomly from the population. That is, the effect of such stimulation on the motor responses is qualitatively the same whether the stimulated ROS cells have the same rank-order selectivity or not. This is important because it means that an actual stimulation experiment could work even if the local stimulated ROS population contains a variety of temporal selectivities.
The observed shifts happen, of course, because the synaptic connections between ROS and motor neurons have fixed strengths, and thus the applied stimulation is passed on to the motor responses simply scaled by some amount. What changes across sequences, and what allows the motor neurons to fire at different times for different sequences, are the relative amplitudes of the ROS responses. Therefore, the stimulation and inactivation experiments potentially provide a separate method for verifying the crucial feature of the model, the nonlinear combination of rank-order and sequence selectivities of the ROS neurons.
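In a reduced sketch of this argument (sizes, rates, and the stimulated subset are all invented), additive stimulation of a fixed subset of ROS cells produces a motor shift that is constant across time steps and sequences, precisely because the readout is linear with fixed weights:

```python
import numpy as np

rng = np.random.default_rng(3)
n_ros, n_motor, n_seq, n_steps = 30, 6, 6, 7

w = rng.random((n_motor, n_ros))                # fixed ROS-to-motor weights
r = rng.random((n_ros, n_seq, n_steps)) * 40.0  # illustrative ROS rates

stim = np.zeros(n_ros)
stim[rng.choice(n_ros, 10, replace=False)] = 30.0  # +30 spikes/s to a subset

control = np.einsum('kj,jqs->kqs', w, r)
stimulated = np.einsum('kj,jqs->kqs', w, r + stim[:, None, None])
shift = stimulated - control   # upward shift of each motor response
```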
The neural network model just described can generate a variety of motor sequences. It achieves this by using a type of neuronal response that has two principal characteristics: (1) for each cell, strong activation occurs over a specific period of time that is fixed across different sequences, and (2) the overall response amplitude, or gain, of each cell varies across sequences. The first property is solidly established experimentally; the second is true at least in some of the published examples, but has not been analyzed rigorously. According to the model, however, this second property is key, so the responses of most ROS neurons should display a perhaps modest but clearly significant sensitivity to sequence identity.
The network is also robust to synaptic corruption and behaves in a way that is ideally suited for mastering one particular sequence quickly. Thus, when a current sequence of movements becomes important, a large increase in accuracy can be achieved through small changes in synaptic connectivity, which could conceivably take place within the time span of a few practice trials (see Procyk et al., 2000; Lee and Quessy, 2003), without wrecking the performance of other stored sequences.
Functionally, the modulation by sequence identity exploited here is akin to the modulation by gaze angle that has been documented in parietal cortex (Andersen et al., 1985, 1990, 1993; Brotchie et al., 1995) and which is thought to be crucial for representing the locations of objects with respect to different body parts or with respect to the world (Zipser and Andersen, 1988; Salinas and Abbott, 1995; Andersen et al., 1997; Pouget and Sejnowski, 1997; Snyder et al., 1998; Buneo et al., 2002). In parietal cortex the modulation is by a proprioceptive signal, the responses are triggered by visual stimuli, and the key variable is spatial location. The situation seems completely different in the SMA and pre-SMA, where the modulation is by sequence identity, the ROS activity is driven by an internal representation of elapsed time (or number of elapsed events), and the main variable is time. Computationally, however, the underlying mechanisms are very similar: the gain-modulated neurons have a fixed preferred value along the relevant axis (space or time), whereas the downstream neurons have responses that can switch positions along the relevant axis (space or time). This transformation is a consequence of what is described mathematically as a basis-function representation (Poggio, 1990; Pouget and Sejnowski, 1997). Such representations have been thoroughly studied in the spatial domain (Salinas and Thier, 2000; Xing and Andersen, 2000; Deneve et al., 2001; Deneve and Pouget, 2003; Salinas, 2004a, b), but may also be fundamental in the time domain (Botvinick and Watanabe, 2007; see also Wainscott et al., 2005).
Some of the original experimental reports documenting ROS responses suggested that they could serve as building blocks for assembling motor sequences (Mushiake and Strick, 1995; Shima and Tanji, 2000; Tanji, 2001). These simulations, however, are the first quantitative demonstration of this idea. In addition, they clarify three features of the data that seemed rather mysterious.
First, why does the ROS activity change with the sequence in at least some instances? Without sequence-specific modulation, ROS neurons can generate only a fixed output, that is, a single sequence; it is the modulation that makes them powerful and allows them to act as a temporal basis set, i.e., as building blocks for constructing multiple arbitrary sequences of movements downstream.
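This point can be made concrete with a toy linear readout (all numbers invented for illustration): if every gain g_jq equals 1, the motor output is necessarily identical for all sequences, whereas sequence-dependent gains let the same temporal profiles drive different outputs:

```python
import numpy as np

rng = np.random.default_rng(4)
n_ros, n_motor, n_seq, n_steps = 30, 6, 6, 7

f = rng.random((n_ros, n_steps))   # temporal profiles, fixed across sequences
w = rng.random((n_motor, n_ros))   # fixed readout weights

# Without modulation (all gains equal to 1), r_jqs = f_js for every sequence.
r_flat = np.repeat(f[:, None, :], n_seq, axis=1)
out_flat = np.einsum('kj,jqs->kqs', w, r_flat)

# With sequence-dependent gains, the same profiles drive different outputs.
g = rng.random((n_ros, n_seq))
r_mod = np.einsum('jq,js->jqs', g, f)
out_mod = np.einsum('kj,jqs->kqs', w, r_mod)
```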
Second, some units show a strong dependence on sequence identity, whereas others seem to be entirely insensitive. Why are there so many types of ROS neurons? First, a wide diversity in modulation factors does not imply that there are fundamentally different cell classes; they could all participate in the same function. Different neurons may simply be modulated to different extents, or perhaps the apparent insensitivity arises because the full modulation range of a cell can be determined only with a much larger repertoire of sequences. Second, diversity of modulation factors is in no way a problem for the circuit’s performance. What is important is that neurons should display all combinations of sequence and rank preferences.
Third, very few ROS neurons show two or more peaks of activity; why? In the experiments, some neurons were active in more than one isolated epoch during a sequence (see Fig. 6 in Shima and Tanji, 2000). A neuron, for instance, could be strongly active both during the first and the third preparatory periods. In the model, with no further constraints, this makes virtually no difference; in principle, the same repertoire of downstream responses can be generated with single or multiple activation periods per neuron, as long as all the relevant combinations of preferred time segments (single or multiple) and preferred sequences are present. However, having ROS neurons that respond during single time intervals in the sequences drastically simplifies the calculation of the synaptic weights between them and the motor neurons (see Appendix). Therefore, the observed representation, in which most ROS units are active only during one discrete part of a sequence, is likely to be particularly convenient for learning.
In addition, two specific predictions arise from the general framework of neural basis functions (Salinas and Abbott, 1995, 1997; Pouget and Sejnowski, 1997; Ben Hamed et al., 2003; Salinas, 2004b) applied to this case. (1) The effect of sequence identity on the temporal profiles must be nonlinear, as in Eqn. 1, where it is multiplicative. This means it cannot be, for example, simply additive (as in Eqn. 5). If it is, the model does not work at all: for the network in Fig. 1, for instance, PM increases from 0 to 0.55. (2) Preferred sequences and preferred activation time periods (ranks) should be distributed independently. The appropriate analyses for verifying these predictions remain to be done.
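Prediction 1 can be illustrated with a few lines of linear algebra (a sketch with invented sizes): under purely additive modulation, r_jqs = f_js + a_jq, the difference between the motor outputs for any two sequences is a time-independent offset, so the temporal pattern of the output can never change with sequence identity:

```python
import numpy as np

rng = np.random.default_rng(5)
n_ros, n_motor, n_seq, n_steps = 30, 6, 6, 7

f = rng.random((n_ros, n_steps))   # temporal profiles f_js
a = rng.random((n_ros, n_seq))     # additive sequence terms a_jq
w = rng.random((n_motor, n_ros))   # fixed readout weights

# Additive modulation: r_jqs = f_js + a_jq.
r_add = f[:, None, :] + a[:, :, None]
out = np.einsum('kj,jqs->kqs', w, r_add)

# Across sequences the output differs only by a constant offset:
# out_kqs - out_kq's = sum_j w_kj (a_jq - a_jq'), independent of the step s.
diff = out[:, 1, :] - out[:, 0, :]
```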
There are three types of neurophysiological paradigms involving sequences: (1) a subject must produce a complete sequence of motor actions from memory, (2) a subject must identify or discriminate sequences of stimuli presented in a given order (e.g., is your phone number 314-159-2653?), or (3) a subject must produce a sequence of motor actions such that each one is either accompanied or guided by a sensory stimulus. Case 1 corresponds to the purely motor tasks investigated here, and the proposal is that they depend on ROS responses modulated by sequence identity.
Case 2 is a perceptual task. Botvinick and Watanabe (2007) recently showed that such a task can be carried out on the basis of populations of neurons that are selective for individual sensory stimuli (shapes or locations) and are modulated by rank; that is, by the number of items that have been presented so far. Neurons with such combined selectivities have indeed been found in the prefrontal cortex and basal ganglia (Kermadi and Joseph, 1995; Ninokura et al., 2003, 2004; Inoue and Mikami, 2006). Thus, although the selectivities involved are very different, the circuits proposed here and in Botvinick and Watanabe’s study rely on similar mechanisms for integrating information and generating a transformed representation downstream.
Case 3, on the other hand, is a mixed situation in which a sequence of motor actions is performed but not from memory alone. Neurophysiological studies employing such tasks (Clower and Alexander, 1988; Lu et al., 2002; Ninokura et al., 2003, 2004; Averbeck et al., 2006) have reported neurons with either straightforward motor properties or with combinations of rank-order (temporal) and sequence selectivities, as in the memory-based tasks of Tanji and colleagues (Ninokura et al., 2004; Averbeck et al., 2006). Interestingly, however, in some cases an intermediate motor representation seems to develop, whereby neurons prefer a specific motor action but only when it is triggered by a particular stimulus (Clower and Alexander, 1988; Lu et al., 2002). For example, a neuron could be active only when the display shows two dots aligned vertically and the target is the one on top. Such activation is not rank-selective, but could again result from a display-specific preference (for instance, for two vertically aligned dots) combined with a sequence-specific preference (see Salinas, 2004a, b). Task-dependent sensory responses of similar complexity have been reported before (Lauwereyns et al., 2001; Koida and Komatsu, 2007), so this is a reasonable possibility.
In general, the generation of serial behavior encompasses many tasks, and is likely to involve multiple neural mechanisms and representations. In the case analyzed here, modulated ROS responses arise when arbitrary sequences of discrete, unrelated movements are executed. In contrast, when monkeys are trained to draw geometrical shapes, prefrontal-cortex responses encoding each segment of a drawing are activated in parallel, rather than during discrete periods (Averbeck et al., 2002, 2003). Models based on this type of parallel activity account for a variety of serial-order effects observed in skilled behaviors such as speech production and cursive handwriting (Bullock, 2004; Rhodes et al., 2004).
Human psychophysical studies of sequence generation often use the immediate serial recall task, in which a list of items is presented and a subject must recall all items in the original order. Error patterns in this paradigm show a number of regularities having to do with list length, item repetition rates and item discriminability, and modeling studies suggest that these effects depend strongly on how the list is represented and dynamically updated in working memory (Rhodes et al., 2004; Botvinick and Plaut, 2006). However, because such models aim to explain a wide range of behavioral experiments, they are more abstract than the present one, and revolve around more general issues; for instance, whether a recurrent network architecture can, in principle, account for all the data.
Unlike these models, the current framework is based on a specific neuronal representation, and analyzes one experimentally identified step in the computational machinery required for producing a particular type of motor sequence. The next question in this regard is how the ROS cells are constructed in the first place. The underlying process is likely to require one or more mechanisms for time integration (Dehaene et al., 1987; Drew and Abbott, 2003; Mauk and Buonomano, 2004; Jin et al., 2007; Karmarkar and Buonomano, 2007). This will be an important challenge for future work.
Research was partially supported by grant NS044894 from NINDS to E.S.
This appendix describes a simplified version of the model with which it can be shown analytically (1) that the minimum number of ROS neurons needed to store NQ distinct sequences is NROS = NQ × NS, where NS is the number of steps or time periods per sequence, and (2) that having ROS neurons that are activated during a single time period simplifies the learning process.
For simplicity, assume (1) that α=0, so there is no response variability, (2) that the background firing rate is zero, (3) that all sequences have the same importance, and (4) that there are NS segments or steps per sequence and each neuron is either active or inactive during each one of those steps; in other words, each movement or preparatory period is considered as one step, and a neuron, whether ROS or motor, fires at a constant rate throughout a step. Then, assuming multiplicative modulation by sequence identity, the firing rate of ROS neuron j during step s of sequence q is

r_jqs = g_jq f_js,     (14)

where f_js is the temporal profile of neuron j (its activity during step s, identical across sequences), g_jq is its gain for sequence q, j goes from 1 to NROS, q goes from 1 to NQ, and s goes from 1 to NS.
In the examples discussed in the main text, there are NQ=6 sequences, each composed of three movement periods, three preparatory periods, and a trailing blank period at the end, so the number of steps is NS=7 for those cases.
Now let’s find an exact solution to the problem, that is, a set of ROS responses and a set of synaptic weights such that the driven motor responses are exactly equal to the desired ones. This means that the following equality should be satisfied

Σ_j w_kj r_jqs = M_kqs,     (15)

where M_kqs is the desired response of motor neuron k during step s of sequence q and the expression on the left side is the corresponding driven response. Note that in this equation, the indices q and s appear together on both sides. Therefore, they can be replaced by a single index i that runs from 1 to NQ × NS. That is,

Σ_j w_kj r_ji = M_ki     (16)

is equivalent to Eqn. 15. This, however, is a standard matrix equation, w r = M. Therefore, an exact solution for w can be found for any M as long as the number of independent rows of r equals or exceeds its number of columns (Barnett, 1990), that is, as long as NROS ≥ NQ × NS. This proves claim 1.
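Claim 1 can also be checked numerically. In the sketch below (all sizes invented), the ROS responses have the multiplicative form r_jqs = g_jq f_js with random profiles and gains, NROS = NQ × NS, and an exact weight matrix is recovered for an arbitrary target M:

```python
import numpy as np

rng = np.random.default_rng(6)
n_ros, n_motor, n_seq, n_steps = 42, 6, 6, 7   # NROS = NQ * NS

f = rng.random((n_ros, n_steps))               # temporal profiles f_js
g = rng.random((n_ros, n_seq))                 # sequence gains g_jq
r = np.einsum('jq,js->jqs', g, f).reshape(n_ros, -1)  # collapse (q, s) into i
M = rng.random((n_motor, n_seq * n_steps))     # arbitrary desired responses

# Solve w r = M; since NROS >= NQ*NS and the rows of r are generically
# independent, the solution reproduces M exactly.
w = np.linalg.lstsq(r.T, M.T, rcond=None)[0].T
```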
A couple of examples may provide some further insight. If r is square (NROS = NQ × NS), and all the rows of r are linearly independent (i.e., the ROS cells respond differently from each other beyond additive and multiplicative constants), then the inverse of r exists and the synaptic weights that solve the problem are

w = M r^−1,     (17)
and they are unique. In contrast, if NROS < NQ × NS, then weights can be found that approximate Eqn. 16 in the least-squares sense, but there will be some sequences that the network will not be able to generate accurately. Finally, if NROS > NQ × NS, then there are many possible solutions, one of which is

w = M (r^T r)^−1 r^T,

which can be verified by substituting it into the matrix equation (w r = M).
Now substitute Eqn. 14 into Eqn. 15 to obtain

Σ_j w_kj g_jq f_js = M_kqs,     (21)

and consider what happens when each ROS unit is active only during one step (the same in all sequences). Suppose that the temporal profile of ROS unit j is

f_js = 1 if s = s_j, and f_js = 0 otherwise,     (22)

where s_j is the single step during which unit j is active. In that case, only those ROS neurons that fire during step s contribute to the motor response M_kqs. As a consequence, Eqn. 21 can be rewritten as follows

Σ_j w^s_kj g^s_jq = M^s_kq,     (23)

where the sum now runs from j=1 to NROS(s), NROS(s) is the number of ROS neurons that are active during step s, the index k now runs from 1 to the number of motor neurons that fire during step s, and the superscript s means “for neurons that fire during step s”. In effect, by doing this, Eqn. 21 is broken up into NS independent matrix equations, one for each step. The above expression is equivalent to

w^s g^s = M^s,     (24)

which shows that the problem is now local in time: only those ROS neurons that fire during step s drive the motor responses that occur during that step and contribute to the matrix w^s. Thus, in practice, finding all the network connections now requires the solution of NS small systems of equations (Eqn. 24), which are decoupled. In contrast, without the temporal constraint imposed by Eqn. 22, each ROS neuron can fire during several steps, and as a consequence, a much larger, fully coupled system of equations must be solved simultaneously. Here is another way to think about the distinction: with Eqn. 24, the connections from presynaptic neuron i are updated only when unit i is active, whereas in general, the connections from unit i must be updated at every step, even if unit i is silent. This is what happens in Eqns. 18-20 as well as in Eqn. 17.
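A sketch of this decoupled, step-by-step solution (a toy example with invented sizes): with single-step profiles, each of the NS small per-step systems can be solved independently, and the assembled network reproduces every sequence:

```python
import numpy as np

rng = np.random.default_rng(7)
n_motor, n_seq, n_steps = 6, 6, 7
per_step = n_seq                        # NROS(s) = NQ cells per step suffices
n_ros = n_steps * per_step

step_of = np.repeat(np.arange(n_steps), per_step)  # each cell's single active step
g = rng.random((n_ros, n_seq))                     # sequence gains g_jq
M = rng.random((n_motor, n_seq, n_steps))          # desired motor responses
w = np.zeros((n_motor, n_ros))

# Solve NS small, decoupled systems, one per step: each step's weights depend
# only on the gains of the cells active during that step.
for s in range(n_steps):
    idx = np.flatnonzero(step_of == s)
    w[:, idx] = M[:, :, s] @ np.linalg.inv(g[idx, :])

# Assemble the full network and verify it reproduces every sequence.
f = (step_of[:, None] == np.arange(n_steps)).astype(float)  # single-step profiles
r = np.einsum('jq,js->jqs', g, f)
out = np.einsum('kj,jqs->kqs', w, r)
```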
This analysis does not reveal which biological synaptic modification mechanisms actually establish the necessary connections (see Davison and Frégnac, 2006), but whatever they are, the problem is vastly simplified if the correct strength of a synapse depends only on the pre- and post-synaptic activities evaluated at the same time, rather than at different times.