Cortical Integration: Possible Solutions to the Binding and Linking Problems in Perception, Reasoning and Long Term Memory
Nick Bostrom
[email protected]
Dept. of Philosophy, London School of Economics, and Dept. of Mathematics, King's College
(August, 1996)
ABSTRACT
The problem of cortical integration is described and various proposed solutions, including grandmother cells, cell assemblies, feed-forward structures, RAAM and synchronization, are reviewed. One method involving complex attractors, which has received little attention in the literature, is explained and developed; I call it binding through annexation. A simulation study is then presented which suggests ways in which complex attractors could underlie our capacity to reason. The paper ends with a discussion of the efficiency and biological plausibility of the proposals as integration mechanisms for different regions and functions of the brain.
1. The problem of cortical integration
If sensory input and memory mechanisms lead to spatially distinct neural representations in the brain, these representations must somehow come to interact in order to bring about an appropriate behavioral output. The problem of cortical integration is how such a constructive interaction can take place.
One straightforward solution would be if the earlier representations converged on more abstract representations, determining which of these abstract representations should be activated; that representation, in turn, would cause the appropriate motor response. These abstract representations could be either individual cells or assemblies of cells. If they were individual cells, and if the representations were what we shall call transparent, i.e. possessing a readily identifiable representational content, then we would have a vindication of an extreme form of the so-called "grandmother-cell hypothesis": there would be a specific cell for each possible situation the subject could ever be in. In this extreme form, the grandmother-cell hypothesis is obviously absurd: already the well-formed English sentences of less than 15 words with clearly distinct meanings outnumber the total number of neurons in the brain [1]. The explosion of combinatorial possibilities makes it impossible to reserve one nerve cell, much less a cell assembly, for every conceivable input configuration that needs to be distinguished.
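A back-of-the-envelope calculation makes the explosion vivid. The vocabulary size and neuron count in the sketch below are illustrative assumptions, not figures taken from [1]:

    # Illustrative estimate: even a modest vocabulary yields far more short
    # word strings than there are neurons in the brain (assumed figures).
    VOCABULARY_SIZE = 10_000          # assumed working vocabulary
    MAX_WORDS = 14                    # "less than 15 words"

    strings = sum(VOCABULARY_SIZE ** k for k in range(1, MAX_WORDS + 1))
    print(f"word strings of under 15 words: ~10^{len(str(strings)) - 1}")
    print("neurons in the brain:            ~10^11")
    # Even if only a tiny fraction of these strings are grammatical and
    # semantically distinct, they vastly outnumber the available neurons.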
So it is necessary to have compositional representations: individual neurons must be enrolled to serve, in combination with many different sets of other neurons, as constituents of compositional representations. Somehow these individual neurons must interact, for instance by projecting onto the same down-stream neurons. We will call this type of arrangement integration through convergence; it is discussed in section 2.
A major disadvantage of convergence as an integration mechanism is its lack of flexibility. Since the functional significance of a node in a homogeneous feed-forward network is wholly dependent on its position in the connection matrix, it is not easy to extract its representational content, examine it, and put it back in a modified form. This rigidity makes it unsuitable as a way of integrating those representations that will undergo frequent and drastic change, like the transitory combinations used in abstract reasoning. These cry out for a more flexible approach.
It is sometimes useful to distinguish two cases here. If the representations to be integrated, or held together, are within the same cortical area, we talk of the need to link them together. If they are in different areas, we speak instead of binding. Thus the problem of cortical integration splits into the linking problem and the binding problem. Sometimes, in the literature, "binding" is also used in a broad sense to refer indiscriminately to all forms of cortical integration, but we shall try to keep the terms distinct in this paper.
2. Integration through convergence
2.1 Grandmother cells and cell assemblies
Because of its rigidity, binding by convergence is almost certainly not the whole story of cortical integration. This does not mean, however, that it does not occur and play an important part. On the contrary, there is evidence, if not literally for grandmother cells, then at least for cell assemblies, and hence individual neurons, that respond selectively to faces, hands and similar items. Some cells in the visual cortex of a monkey, for example, respond exclusively when the animal is presented with monkey hands and star-like shapes; other cells show selectivity for faces. (No such specificity has been observed for unfamiliar objects or novel scenes.) Furthermore, it is well documented that in the early stages of visual processing there are successive layers of cells acting as feature extractors for edges, corners, moving edges etc. At the other extreme, there are reports that increased activity in individual cells antecedes specific motor actions in a regular manner (Sakai & Miyashita 1991). It thus appears that convergence plays a prominent part in both early sensory processing and late motoric sequencing.
Less is known about what happens in between. A reasonable guess is that insofar as the activity of individual cells significantly correlates with a particular act of reasoning or with the entertaining of a certain concept, it does so only by virtue of being a member of a cell assembly that correlates similarly but more significantly. The reason is that a single cell would not have the requisite robustness in such a noisy system as the CNS. A representation in early visual processing may or may not be robust: if it fails on some occasions, it would only cause an unnoticeable error of the size of a pixel in the visual field, a blur that would probably disappear completely after later abstraction and interpretation. But if you were thinking "I wish to run away from yonder wolf", and the neuron representing the concept "run away from" inadvertently flipped, changing the thought to "I wish to greet yonder wolf", then you would be in trouble.
If it is not a single neuron by itself, then it must be a group of neurons; and if the representation must be kept active, then the neuronal activity must persist, presumably in the same neuronal group. Therefore Amit (1995) has a strong case when he argues that the reverberating attractor is a ubiquitous building block in cortex. Sustained attention to a thought or percept indicates that a group of neurons is engaged in continuous firing. The best way to accomplish such attention is through mutual feedback. Even for periods shorter than one second, it seems necessary for a neuron to have steady support from excitatory inputs if its activity is not to wane. Thus a reverberating attractor, a state of self-sustained activity in a group of interconnected neurons, serves to sustain the information contained in the active representation until it has served its purpose in the present computation and can be dropped. The dance moves on to another site; the attractor remains, slumbering in the synaptic efficacies, awaiting the next occasion when it will be called upon, like an organ pipe, to contribute its designated note to the cognitive concerto.
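To make the notion concrete, here is a minimal sketch of a reverberating attractor in the Hopfield idiom: patterns are laid down in the synaptic efficacies by a Hebbian (outer-product) rule, and a degraded cue reinstates and then self-sustains the stored activity. The sizes, the noise level and the binary, fully connected net are caricatures chosen for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 200                                        # neurons (arbitrary)
    patterns = rng.choice([-1, 1], size=(3, N))    # stored memories

    # Hebbian synaptic efficacies (outer-product rule, zero diagonal)
    W = (patterns.T @ patterns) / N
    np.fill_diagonal(W, 0.0)

    # Cue with a degraded version of pattern 0 (30% of the units flipped)
    state = patterns[0].copy()
    state[rng.random(N) < 0.3] *= -1

    # Reverberation: repeated updates until the activity settles
    for _ in range(10):
        state = np.sign(W @ state)
        state[state == 0] = 1

    print("overlap with stored pattern:", (state @ patterns[0]) / N)  # ~1.0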
The Attractor also has several other features that make it an intuitively likely correlate to the Concept, but we cannot go into that here. In any case, however useful it may be in every other respect, the unstructured attractor does not solve the problem of cortical integration. As has often been pointed out, understanding the sentence "John loves Mary", for example, cannot consist simply in a special attractor representing "John", one representing "loves", and one representing "Mary" being activated; for then there would be nothing to distinguish this sentence from "Mary loves John", which consists of the same words and the understanding of which presumably involves the same concepts.
2.2 Integration through opaque feed-forward structures
The paradigmatic neural network is the three-layered feed-forward structure, trained with the back-propagation algorithm. Such an architecture exhibits generalization and graceful degradation, and it is efficient as a classifier or feature extractor in many natural settings. Feed-forward structures abound in the brain, especially in sensory processing. It is not wholly clear how learning takes place: there is no known mechanism for back-propagation of error messages in the brain. Possibly, a great deal is genetically predetermined; but since perceptual skills have to be learned, there must also be some mechanism for modifying synaptic weights in these feed-forward structures that is capable of leading to greatly improved performance.
The properties of feed-forward structures and the location of their biological implementations are well known and need not be further explained here. What needs to be clarified is rather the taxonomic issue of how these structures relate to other means of cortical integration. We may informally distinguish between two sorts of representations: transparent and opaque. A transparent representation is one that is readily given an interpretation in a natural language. For example, if there were a node that was activated only in the presence of a pink elephant, then that node could be said to represent (the notion of) a pink elephant; if a node responds to a moving edge in a certain part of the visual field, then that is what the node represents. For an opaque representation, it is not possible to single out a definite denotation or meaning and explain it with a short expression in a natural language. If a three-layered network is trained to perform some discrimination task, it sometimes happens that the nodes in the hidden layer correspond to readily identifiable features, but often they don't, and in that case we say that they are opaque representations: they correspond to some holistic feature of the input which could be expressed in English only as an enumeration of the inputs that would elicit it, or as some complicated sort of weighted sum of different properties for which we have predicates.
We can now see that binding by grandmother cells is only a special case of convergence-integration in a feed-forward structure: the case where the representations are transparent. The sense in which a grandmother cell or cell assembly is a mechanism of integration is the same as the sense in which any hidden or output node in a feed-forward structure is an integration mechanism: its activity reflects complex relations between the activity of nodes in earlier layers.
3. Integration through convolution
3.1 Recursion and compression as an integration mechanism.
One approach to the integration problem is to use a recurrent network. This was originally proposed by Pollack (1990) as a way of representing compositional symbolic structures such as lists and trees in neural networks. The principle behind the Recursive Auto-Associative Memory (or RAAM) is easily explained. Consider a sequence A, B, C, D... that is to be stored in the memory. The first step is to feed the representations of A and B to the input layer. The input is then forwarded by the network to the hidden layer, which is typically a bottleneck; in the simplest case it consists of half as many neurons as the input layer. So some compressed representation (AB) of A and B results in the hidden layer as a consequence of the input. Now (AB) is fed back to the input layer where it is combined with C. A new representation in the hidden layer, (ABC), results, which, in its turn, is combined with D; and so forth. Two things are needed if this is to work. First, we have to make sure that the compressed representations conserve the information contained in their constituents. Second, a mechanism must be trained that can develop the original sequence from its compressed form. Both these things are taken care of simultaneously in the training procedure of the RAAM. To achieve this, the hidden layer is connected to a third layer, the output layer. Now, for each training pattern, the network is trained by a backprop algorithm to reproduce the input pattern in the output layer. In this way, the first two layers come to act as an encoder, while the last two layers do the work of a decoder. What distinguishes the RAAM from a plain encoder/decoder network is that the training patterns are not determined in advance but are made to depend on what patterns emerge in the hidden layer after each update. This is what makes possible the construction of "convoluted" representations that can be unfolded and developed stepwise through a circular process.
Advantages of the RAAM include: openness of memory capacity (overloading the RAAM gradually increases the frequency of retrieval failures, but there is no sharp upper bound on the capacity); generalization ability (rather modest in Pollack's simulations); some degree of biological plausibility (to be considered later); and a standardized form of representation (the compressed patterns that appear in the hidden layer can have very different structures, but they are all of the same size, which may facilitate further processing).
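The moving-target character of RAAM training can be made concrete with a small sketch. The layer width, learning rate and epoch count below are arbitrary choices, and training of this kind is notoriously finicky; the point is only to show how the training pairs are regenerated from the current hidden codes on every pass:

    import numpy as np

    rng = np.random.default_rng(1)
    H = 16                                  # width of a symbol / code (assumed)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    symbols = {s: rng.choice([0.1, 0.9], size=H) for s in "ABCD"}
    seq = "ABCD"

    # Encoder compresses (code, symbol) -> code; decoder reconstructs the pair
    We = rng.normal(0.0, 0.3, (H, 2 * H + 1))       # +1 column for the bias
    Wd = rng.normal(0.0, 0.3, (2 * H, H + 1))
    encode = lambda x: sigmoid(We @ np.append(x, 1.0))
    decode = lambda c: sigmoid(Wd @ np.append(c, 1.0))

    lr = 0.5
    for epoch in range(10000):
        code = symbols[seq[0]]
        for s in seq[1:]:
            x = np.concatenate([code, symbols[s]])  # moving-target input
            h = encode(x)
            y = decode(h)
            # one backprop step on the reconstruction error
            d_out = (y - x) * y * (1.0 - y)
            d_hid = (Wd[:, :H].T @ d_out) * h * (1.0 - h)
            Wd -= lr * np.outer(d_out, np.append(h, 1.0))
            We -= lr * np.outer(d_hid, np.append(x, 1.0))
            code = h                                # (AB), then (ABC), ...

    # Unfold: each decoding step yields (previous code, last symbol)
    code = symbols[seq[0]]
    for s in seq[1:]:
        code = encode(np.concatenate([code, symbols[s]]))
    for _ in range(len(seq) - 1):
        y = decode(code)
        code, sym = y[:H], y[H:]
        print(min(symbols, key=lambda s: np.abs(symbols[s] - sym).sum()))  # D, C, B
    print(min(symbols, key=lambda s: np.abs(symbols[s] - code).sum()))     # A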
An extension of the RAAM has been simulated by Reilly (1991), who prefixed a standard simple recurrent network (SRN) (Elman (1990)) to the RAAM. This enables the system to perform some simple on-line parsing of verbal input. The price paid is that learning gets more difficult, and the adequate RAAM representations have to be known in advance, before the SRN can be trained. The performance was also limited in several ways (some of which could probably be avoided if the SRN were replaced by something more powerful, such as the auto-associative recurrent network (AARN) developed by Maskara et al. (1993)).
3.2 The biological plausibility of binding by recursion.
The RAAM clearly has some attractive properties, but how could it be implemented in the brain?
There are plenty of feedback loops on different levels in the nervous system. Within cortex there are recursive connections between the different layers, and between different areas. On a larger scale, there is, for instance, the loop from cortex down to the basal ganglia and back again. This latter alternative has the consequence that the compressed representations would appear down in the basal ganglia instead of in neocortex, unless they were somehow transferred back up to the relevant cortical areas. From lesion studies (e.g. Gainotti et al. (1996)) it appears that the main basis of conceptual representations is in the cortex, so the hypothesis that the basal ganglia serve as the bottleneck in an RAAM seems implausible. The same observation seems to rule out the hippocampus as the site of the RAAM.
Thus it seems more likely that the RAAM would be instantiated within the cortex. It would be interesting to have a biologically realistic simulation study of the potential of cortex to harbor a RAAM between different cortical layers or between cortical areas, vertically or horizontally. One possibility is that there is not one big RAAM, but rather multiple small-scale RAAM-like structures, some of which might have their compressed representations forwarded as partial inputs to other RAAM-structures. In this way an architectural structure might be built up that could be operated upon at different levels depending on the degree of detail required for the task at hand; synopsis would be combined with richness in content. So far, these ideas have not been elaborated.
One difficulty with RAAM-like systems is that it is not obvious how the learning would proceed if they were biologically implemented. In the simulations, they are trained by backpropagation, and the same training patterns are cycled through innumerable times while the weights slowly adapt. But cortex does not use backpropagation in any ordinary sense, and some types of learning require but a single presentation. Until this crucial issue about learning is clarified, there is no RAAM-theory of cortical integration. One interesting approach that combines ideas from the RAAM with attractor net theory, and that could possibly overcome these obstacles, will be discussed in section 5.
4. Integration through synchronization
4.1 The idea
The idea that binding might be achieved in the brain by means of temporal co-ordination of spiking activity was first formulated, not very explicitly, by Milner, and later by von der Malsburg (1981). It is only in later years, however, that it has been subjected to serious empirical investigation and simulation studies. The reason for this delay is that informative multi-electrode recordings have been technically difficult to perform and evaluate. Another factor behind the delay is probably that the original idea was not sufficiently elaborated, so that it was not clear exactly what was supposed to be bound to what, or which areas of the brain were involved. There is still much to be worked out on this theoretical level before we can say that we have even the outlines of a theory about how the linking and binding problems could be solved by synchronization. We shall return to this issue in section 7. For present purposes, it may suffice to think of the synchronization proposal as the hypothesis that some complex representations consist of distinct groups of activated nerve cells that are firing in synchronization with each other but not with other active cells in their neighborhood.
4.2 The advantages of synchronization
Apart from being a way of dealing with the problems of linking and binding, synchronization would bestow several other advantages on the nervous system. Let us just list them here; we shall later see how they relate to the evaluation of the evidence for the synchronization hypotheses.
First: if what makes a subset of neurons in a population an active representation is the synchronization of their activity, rather than their level of activation (i.e. firing rate), then the level of activation can be invested with representative significance. For example, in sensory processing, the firing rate could conveniently be used to represent the intensity of the stimulus, or the degree of fit between stimulus features and perceptual categories.
Another advantage is that the synchronization mechanism might increase the processing capacity of cortex by enabling multiple representations to be simultaneously active in the same territory without fusing. On a coarse time scale (>40 ms) they are overlapping, but on a finer scale they are alternating. Not only will this provide for more active patterns at any given time, but it will also accommodate several patterns at the same location, which might be important for some types of operations and comparisons.
Moreover, co-ordinated spiking can serve to increase the impact of the regimented group of neurons. This might be necessary to achieve an immediate and certain effect in such a noisy environment as the central nervous system, where the contributions of individual neurons or even of groups of neurons are easily lost, unless they all arrive at their target cells within a narrowly defined time interval.
A fourth advantage has to do with quickness of operation. Since an excited neuron typically continues to respond for several hundred milliseconds, an unsynchronized network could have difficulty going through more than five or six effective updating cycles per second. With synchrony, however, a neuron could be virtually switched off in a few milliseconds by having its firing schedule displaced by a small phase shift; the neuron could continue to fire, but it would no longer exert a significant influence, as it would be out of phase with the other neurons.
This brings with it yet another bonus. Since the total incoming activation to a cell at any given moment is very much greater if the inputs are synchronized and arrive at exactly that moment, rather than unsynchronized and arriving at all times, the network can use a double-threshold function to make fine temporal distinctions in its synaptic modifications. If the activity is lower than the lower threshold, no changes are made. If it is higher than the lower threshold but lower than the upper threshold, then the synapses undergo long-term depression (LTD). If the activity exceeds the upper threshold, the synapses get potentiated (LTP). This means that in-phase inputs will be enhanced; out-of-phase inputs depressed; and unsynchronized input channels will remain unmodified. This arrangement reduces the effects of irrelevant noise on learning and makes modifications especially tailored to the essential factors that influenced the neuron at the time of the probing. Neuronal learning in accordance with these principles has been observed; for a review see Singer (1990).
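A minimal sketch of such a double-threshold rule follows; the threshold and learning-rate values are arbitrary illustrative choices, not parameters taken from Singer (1990):

    import numpy as np

    def two_threshold_update(w, pre, lr=0.05, theta_low=1.0, theta_high=1.6):
        # 'pre' holds the presynaptic activities arriving in one narrow time
        # window; the total drive w @ pre is large only when inputs coincide.
        drive = float(w @ pre)
        if drive < theta_low:
            return w                  # below both thresholds: no modification
        if drive < theta_high:
            return w - lr * pre       # intermediate drive: LTD on active inputs
        return w + lr * pre           # strong coincident drive: LTP

    w = np.full(4, 0.5)
    in_phase  = np.array([1.0, 1.0, 1.0, 1.0])   # all inputs in one window
    out_phase = np.array([1.0, 0.0, 1.0, 0.0])   # only half arrive together
    scattered = np.array([0.3, 0.0, 0.3, 0.0])   # dribbling in, weak drive

    print(two_threshold_update(w, in_phase))     # drive 2.0 -> potentiated
    print(two_threshold_update(w, out_phase))    # drive 1.0 -> depressed
    print(two_threshold_update(w, scattered))    # drive 0.3 -> unmodified

Note that silent channels (pre = 0) are never modified, matching the claim that unsynchronized input channels remain unchanged.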
Synchronization also suggests itself as a way of organizing attention. It is not yet known whether the thalamic nuclei, or the basal ganglia, play a role in achieving synchronization in cortico-cortical activity, but if it turns out that they do, then such a subcortical drive would be an obvious candidate to serve as an attention mechanism. With the synchronization being driven both from the local area in cortex and from deeper regions of the brain, we would have a physical correlate to the psychological endowment that allows attention to be shifted either by will or by the intrinsic salience of the stimulus. For example, a stimulus could attract our attention by being so intense that its representation in the early processing is active enough to win the competition for impact on a higher level of processing. Or the stimulus could include an organized movement, for instance, that would tend to synchronize the cells in visual cortex and thereby increase the likelihood that their influence becomes predominant for the continued abstract processing. And finally, there is also the possibility that input from the thalamic nuclei or the basal ganglia biases the probability that synchronization will occur among certain groups of cortical cells, which would in effect mean that the system was actively searching for particular patterns or concentrating on some aspects of its perceptual field.
4.3 Review of some simulation studies of synchronization as a binding mechanism
Computer simulations aimed at casting light on the feasibility of achieving linking and binding by means of synchronization range from highly idealized models to ones that pay close attention to the detailed organization of the central nervous system. The highly idealized models of brain function can sometimes acquire relevance for neuroscience by discovering or illustrating computational principles which may also be employed by the brain, though presumably the implementation is rather different from how it is done in the simulation model. More realistic models can test whether a certain algorithm is efficient for dealing with the sort of tasks the nervous system is good at solving; if so, then this suggests that we look further into the matter to see whether Evolution has struck upon the same solution. Models that are still more realistic can be used to determine constants which can be directly compared to empirical data.
A model by Mani et al. (1993) belongs to the first category, the one with the high degree of idealization. It is a hybrid system for knowledge representation and abstract reasoning based on a type hierarchy and a rule-based reasoner. It involves synchrony as a means of binding variables. The system makes swift deductions, and the time it takes to answer questions is independent of the size of the knowledge base. There are, however, limitations as to the sort of sentences it can handle, and it is not clear what biological entities in the brain are supposed to correspond to the various elements of the model. The issue of learning is not addressed.
Nenov & Dyer (1993, I & II) have developed a system they call DETE, also a neural/procedural hybrid, which in its structure and performance is considerably more biologically realistic than Mani et al.'s model. DETE is set a twofold task: to verbally describe elements of its two-dimensional blob world, and to ostend objects in the blob world in response to verbal input. Its problem situation is similar to that of an infant learning to speak. Corresponding to objects like "Mama", "ball", "milk" in the infant's environment, DETE's blob world contains simple geometrical shapes of varying size, colour and motion. These distinct types of features are projected onto several feature planes by a procedural mechanism. Thus there is one feature plane where the shapes of the objects are represented as distinct spots of activity, another feature plane that encodes information about their direction of motion, and so forth. The binding problem arises here as the necessity of somehow keeping track of which activation spots in one feature plane are caused by the same object as which patterns on another feature plane. DETE handles this binding problem by having the ontologically related representations arranged to fire in synchrony. This is done by a procedural mechanism. The resulting performance is quite impressive: DETE uses a movable token in the blob world, representing a hand, to single out or manipulate objects in response to sentences like "Push left the red square". It also has a circle in the blob world which represents its focus of attention, and the position and size of which can be determined by the experimenter. For example, by having the attention window set successively to different parts of the blob world, DETE generated the sentence "Two objects. Medium circle in the center. Small blue square up right." More challenging sentences involving bouncing objects were also successfully processed.
While DETE is quite powerful and flexible, its efficiency is to a considerable extent due to clever hard-wiring. The procedurally managed distribution of feature representations to the proper feature planes relieves the neural system of part of the learning burden. It is indicated, however, that these procedural mechanisms could be replaced by parallelized processing in forthcoming developments. Additional biological realism could also be added by having the verbal input presented in a rawer form, so that neural modules had to pre-process it for phoneme extraction, word recognition etc. This seems relatively straightforward in principle. A potentially more problematic feature is the employment of procedural mechanisms to determine the phase of the activation patterns on the various feature planes. This solution to some extent begs the question, given that what we wanted to find out was whether synchronization would be an efficient and biologically realistic way of modelling the brain's integration of neural representations.
An even higher degree of biological plausibility is possessed by the simulations of Sporns et al. (1994) and Ritz et al. (1994) and others, which do not contain any procedural modules, and wherein all parameters are given values that have been determined by neurophysiological experiments.
One of Sporns et al.'s systems is big (10,000 neurons; one million connections) and has an architecture that is supposed to mirror that of the visual cortex in the human brain. It has successfully reproduced the linking of representations of segments of moving objects that has also been observed in cat and monkey. Neurons which are part of representations of contour segments of the same object tend to have their activity synchronized, in ways that suggest the operation of the basic gestalt laws.
Sporns et al. argue that local synchronization is a prerequisite for inter-areal synchronization, because the activity of single cells is not statistically significant in a realistic system; but if local synchronization is established, then it is often possible to propagate this coherence into distant regions, since the brain is ordered in such a manner that axonal projections from units in a neighborhood typically have their end terminals clustered, with local branching. When this is the case, local coherent activity in one cortical area can easily induce local coherence in another area, thus effecting a binding between the two activity patterns. One finding was that the dynamical properties of the network as regards synchronization were rather sensitive to alterations in its micro-scale structural design. This would indicate that it might be necessary to include a considerable amount of the micro-structure of the cortex in order to get models that can reliably simulate the synchronized spiking that goes on in real brains. The richness of the fauna of different types of nerve cells and synaptic connection modes could conceivably have evolved to allow for such tuning and optimization of the reverberating properties of cortical areas.
In the Spike Resonance Model by Ritz et al., the neurons have an absolute refractory period; axonal delay times are assigned according to a biologically plausible distribution; and excitatory and inhibitory synaptic potentials are given a realistic shape. In spite of these complications, several quantities can be calculated analytically, and the whole system is sufficiently agile to have allowed Ritz et al. to perform a simulation involving 32,000 rather heavily connected neurons!
The task set for the system is a rather simple matter of pattern completion. Learning is achieved according to a variant of the Hebbian learning rule. One especially interesting result was that binding by synchrony can occur between two modules, whose connectivity was set up so as to mirror the connectivity through the corpus callosum between columns in opposite hemispheres, if and only if the transmission delays are less than 5 ms on average. This is a prediction of an upper bound on interhemispheric axonal transmission delays and should lend itself to direct empirical verification. (If this prediction were falsified, that would indicate that the model is oversimplified or that the parameters have been given improper values; it would not say much about synchronization in the brain. But 5 ms does not appear to be too tight.)
Another finding was that the number of patterns that can be simultaneously active in a given region without fusing together is restricted to four. The number depends on the parameter values, especially the strength and duration of local inhibition; but four was the number obtained when the parameters were set to realistic values. If this result reflects a property of the nervous system, then it poses some limits on how synchronization can work as a mechanism for cortical integration. We shall consider this further in section 7.
4.4 The neurophysiological evidence for synchronization as a mechanism of integration
That brain activity exhibits synchronization has been established. Local synchronization has been observed in pigeons, cats, and awake behaving monkeys. Inter-areal synchronization has been observed between areas V1 and V2 in the macaque monkey and between several areas in the cat, even between neurons in different hemispheres. It is still a matter of debate, however, what role, if any, synchrony has in the establishment of linking and binding of structured neural representations.
The observed synchrony is quite accurate; the half-width at half-height in the correlogram is typically about 2-3 ms. However, with present technology, it is not an easy task to measure synchronization; what we hope for are improved multi-electrode recordings. Data from individual neurons do not imply anything about whether their activity is synchronized with that of other nerve cells. When a single electrode is used to register the activity from multiple units, we can sometimes observe oscillations; and that is evidence of synchronization, for only coherent firing from the recorded units could cause such large fluctuations. But often such single-electrode experiments fail to discover existent oscillations and synchronization. The reason is sampling problems: often one or two co-ordinated bursts are all there is. The event can still be salient and well-defined on a macro-scale, but when we record from only a few cells, noise makes the regularities disappear. One way around this would be to search for synchronizations that can be expected to endure for some time; for example under conditions where an ambiguous stimulus is presented which challenges the system's problem-solving capacity and demands sustained attention. This is not unproblematic either, however, for while the neurons may oscillate in synchronization, their oscillations are not normally rhythmic: the period keeps changing, so we are faced with a nonstationary time series, which requires long sampling times in order to yield significant results.
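The correlogram measure itself is easy to illustrate on synthetic data. In the sketch below two spike trains are locked to a shared, non-rhythmic event process with 1.5 ms of jitter (an assumed value); the cross-correlogram then shows a central peak a few milliseconds wide, of the kind just described:

    import numpy as np

    rng = np.random.default_rng(2)

    # Two synthetic trains locked to common, irregularly spaced events (ms)
    events = np.cumsum(rng.uniform(15.0, 40.0, size=500))
    train_a = events + rng.normal(0.0, 1.5, size=events.size)
    train_b = events + rng.normal(0.0, 1.5, size=events.size)

    # Cross-correlogram: histogram of all pairwise spike-time differences
    diffs = (train_a[:, None] - train_b[None, :]).ravel()
    diffs = diffs[np.abs(diffs) <= 20.0]
    counts, edges = np.histogram(diffs, bins=np.arange(-20.0, 21.0, 1.0))

    peak = counts.max()
    above_half = edges[:-1][counts >= peak / 2.0]
    print("peak near %.1f ms; full width at half height ~%.0f ms"
          % (edges[counts.argmax()] + 0.5, above_half.max() - above_half.min() + 1))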
Anyhow, both oscillation (on the scale of individual neurons up to the level of the whole brain, as in EEG patterns) and synchronization have been observed, and we may ask in what sort of relationship they stand to one another. It has been pointed out that some degree of predictability, and hence oscillation, is necessary for the maintenance of a zero-phase lag synchronization, especially between cells separated by long conduction delays, as is the case when they are located in different hemispheres. Thus a tendency to oscillation in the activity of individual cells seems to be instrumental to their collective synchronization. At the same time, it appears natural for groups of synchronised cells to begin to oscillate as they influence one another; they discharge at the same time and then presumably reload at the same rate until they are ready to make a new simultaneous discharge. Simulation studies confirm this intuition.
While the phenomenon of synchrony in cerebral activity is established beyond doubt, this does not settle the question of what functional role, if any, such synchrony plays in sensory and cognitive processing. But there are findings which have an indirect or a direct bearing on this issue.
To begin with the indirect evidence, it is known that the synchronized activity in cortex is at least partly due to cortico-cortical connections; for cutting the corpus callosum causes the synchronization between cells in different hemispheres to disappear. The fact that the synchronizing is dependent upon cortical connections is indirect evidence in that it dispels the suspicion that the synchrony is due solely to the influence of a common input from the thalamic region. If that were the case, then synchrony could hardly be a dynamic binding and linking mechanism for features in perceptual processing, because thalamic cells possess but very limited feature selectivity. So by discovering that cortico-cortical connections play a major part in the synchronization process, one stumbling block is cleared away from the path to the synchrony solution of the integration problem.
Another piece of negative evidence comes from the refutation of the objection that synchronization would take too long to establish to comply with the severe time constraints of visual processing. Gray et al. (1987) demonstrated that synchronization can be established within 50-100 ms in the visual cortex of the cat, which is consistent with behaviorally measured response times in visual discrimination tasks. It has also been shown that the patterns of synchronization possess a high degree of flexibility and can be dissolved as quickly as they can be formed.
Important direct evidence of a correlation between synchronization patterns and certain perceptual features has been found in studies that measured separately the activity of several cells in the visual cortex while moving bars or contours were projected onto the retina. For instance, in one experiment, two electrodes were separated by 7 mm in the primary visual cortex of the cat. It was determined that the two recorded cells responded selectively to vertically oriented light bars at two different locations in the visual field. Then three stimulus configurations were tried in turn. In the first configuration, one light bar moved left at one location, while a second light bar moved right at the other location. In this case, there was no synchronization between the cells. In the second configuration, both bars moved in the same direction. Here the cross-correlogram revealed a significant degree of synchronization. In the third configuration, one long light bar moved across both locations in the visual field (as well as over the area swept by the line connecting these locations). A highly synchronized firing resulted. This suggests that synchronization can serve to link features into an object representation. Two bars moving in opposite directions do not belong to the same object; the nodes responding to those movements are not harmonized. Two line fragments moving in the same direction, especially if they are part of a common continuous contour, tend to have a shared origin; they get linked together through their representations being synchronized. It is intriguing that one should find the basic gestalt laws (grouping by: vicinity, similarity, continuity, common motion) reflected at such an early stage of sensory processing.
Another result gives some direct evidence, not only that synchrony occurs and is related to perceptual features, but also that it has a functional significance. In a condition known as strabismic amblyopia, the perceptual powers of one eye have deteriorated as a result of the subject suppressing its signals in an attempt to deal with strabismic double vision. The typical symptoms are loss of stereopsis and, for the deteriorated eye, decreased resolution and the occurrence of "crowding", a drastic impairment of the ability to recognize shapes that are surrounded by other contours. When this deficit was induced in cats, multielectrode recordings from the striate cortex revealed that neurons driven by the amblyopic eye were much less synchronized with each other than were similar neurons driven by the normal eye, when light bars or gratings were projected onto the animals' retina. Taken together with the findings mentioned in the previous section, this suggests that the synchronization of neuronal responses helps solve the task of feature integration, and that the crowding phenomenon is caused by a failure to establish proper synchronization.
4.5 Summary of evidence
So, to sum up: It is known that synchronization occurs in cortex and that cortico-cortical connections make a very significant contribution to this phenomenon. The synchronization can be achieved and abolished on a time scale that is consistent with what we know from visual discrimination experiments. Thresholds for LTP and LTD are set so high that some synchronization seems necessary if activity is to overcome them and learning is to take place. There are several apparent advantages to having coherent cortical activity (as reviewed in the preceding section). This, however, weakens the inference from the mere presence of synchrony to its use in solving the linking and binding problems. For if synchronization had not been associated with any definite advantages, or had even been counterproductive for many purposes, then its presence would strongly suggest that it serves to bind and link neuronal representations; for why else would it be there? But since there are independent motivations for its existence, and since in any case neuronal networks seem to have a natural tendency to engage in coherent firing and oscillations, we cannot conclude from the fact that synchrony is there that it probably is the mechanism whereby the integration problem is solved. To prove that, we need to look for evidence that establishes a direct causal connection between synchronization and performance on tasks that require compositional neural representations. The multielectrode recordings from cats presented with moving contours and from cats with strabismic amblyopia strongly suggest, but do not prove, such a causal connection between synchronization and linking in early visual processing. No comparable results have yet been obtained for later stages in visual processing or for the problem of inter-areal binding.
5. Integration through annexation
One way of achieving cortical integration by biologically realistic means is to have the complex representations consist of big attractors composed of the attractors corresponding to the component parts of the complex representation. This solution seems plain and obvious, but in spite (or perhaps because) of that, it has received little attention in the literature on the binding problem. In fact, the only source known to the author which mentions it is van der Velde (1995), and even there it is not put into clear focus.
Van der Velde's article is concerned with arguing that neural networks can in principle learn to produce non-regular grammars, in particular context-free grammars. This is not directly relevant to the questions we are trying to answer here, but the system he presents illustrates a powerful method of binding or linking representations together, and we will therefore review its main features in the paragraph that follows.
The principal component of the system is an attractor neural network (ANN) which can be thought of as consisting of three sections. In order to store a sequence A, B, C, D, ..., the apparatus begins by clamping A to the middle section while simultaneously clamping two random patterns r0 and r1 to the left and right sections, respectively. The weights are then updated by a Hebbian learning rule. Next, the pattern B is presented to the middle section, while r1 is clamped to the left section and a new random pattern r2 is clamped to the right section. A new Hebbian update follows. Then C goes to the middle, r2 to the left, and r3 to the right; and so on. After the last element in the sequence has been stored, a special pattern T is stored in the middle section, indicating a termination point. Now, to retrieve the sequence, we simply feed r0 to the left section. This will cause the activity to spread in the ANN and, after a few updates, land in the attractor consisting of pattern r0 in the left section, A in the middle, and r1 on the right. The pattern in the middle section, A, is fed out as the first symbol in the sequence. Then the pattern in the right-hand section is copied and clamped onto the left section, and the ANN is allowed to settle into a new attractor, which will be (r1, B, r2). And so it goes on, until the symbol T is encountered, which causes the process to stop.
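Here is a minimal sketch of this storage-and-retrieval scheme, with the three sections laid side by side in a Hopfield-style net and one-shot Hebbian (outer-product) learning; the section size, settling schedule and synchronous update rule are simplifications of my own:

    import numpy as np

    rng = np.random.default_rng(3)
    S = 64                                   # neurons per section (arbitrary)
    N = 3 * S                                # left | middle | right

    rand_pat = lambda: rng.choice([-1, 1], size=S)
    symbols = {s: rand_pat() for s in "ABCDT"}          # T = termination pattern
    seq = "ABCDT"
    links = [rand_pat() for _ in range(len(seq) + 1)]   # r0, r1, ..., r5

    # One-shot Hebbian storage: clamp (r_i, X_i, r_i+1) and update the weights
    W = np.zeros((N, N))
    for i, s in enumerate(seq):
        v = np.concatenate([links[i], symbols[s], links[i + 1]])
        W += np.outer(v, v) / N
    np.fill_diagonal(W, 0.0)

    def settle(clamp, steps=20):
        # relax to an attractor while the left section is held clamped
        state = rng.choice([-1, 1], size=N)
        state[:S] = clamp
        for _ in range(steps):
            state = np.sign(W @ state)
            state[state == 0] = 1
            state[:S] = clamp
        return state

    # Retrieval: feed r0 to the left; read middle; copy right -> left; repeat
    left, out = links[0], []
    for _ in range(len(seq)):
        state = settle(left)
        mid, right = state[S:2 * S], state[2 * S:]
        sym = max(symbols, key=lambda s: int(symbols[s] @ mid))
        if sym == "T":
            break
        out.append(sym)
        left = right
    print("".join(out))                      # expected: ABCD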
The feeding back of intermediate results is clearly reminiscent of the RAAM machine we looked at in an earlier section. Recall that a major problem with the RAAM was how the learning, with backpropagation and multiple presentations, was supposed to take place in the cortex. In the present system, learning is achieved from a single presentation and through a purely Hebbian algorithm. That is a huge advantage.
There are still some complications, however. The whole procedure of generating random patterns and transferring them back and forth between various parts of the ANN, while scanning another part for a special termination pattern which shuts down the system, though undoubtedly cortically implementable in principle, looks more like the handiwork of a computer engineer than like the work of Mother Nature. But I will argue that these complications are inessential features that can be simplified away as soon as we stop worrying about questions of principle concerning our ability to generate truly non-regular grammars, etc.
The core that remains is simply this: in order to integrate the patterns A, B, and C into a complex whole, simply clamp them onto adjacent sections of an ANN! This way the complex representation is stored after one shot, and the synaptic mechanisms that support this process are the well-known phenomena of Hebbian short- and long-term potentiation. The memory trace is distributed, robust and manifests graceful degradation. And it is content-addressable.
Suppose now that we want to store the pattern CBA in the same memory compartment where we stored ABC. Will this incur the risk that ABA or CBC is retrieved when B is clamped to the middle section? Not if the ANN is fully connected, or if there are sufficiently strong connections between the left and the right sections. There are many cortical areas that satisfy this requirement, even for complex representations much longer and bigger than a triple, at least if the constituent patterns are not too big. They need not be: in principle they could be mere symbols of concepts whose full meaning and content were stored elsewhere.
For example, take the thought "Dogs can swim". The concept of "dogs" presumably contains the essentials of the whole lot of things the subject knows about dogs; and likewise for the concept "can swim". So a person's grasp of these concepts must involve a vast number of nodes and connections. But this knowledge needs to be represented only once. It does not need to be explicit in the representation of "Dogs can swim". It would, in principle, suffice if the pattern DS were laid down in an ANN, presuming that there is a mechanism at hand that can activate the "dog" concept in response to the pattern D, and the "can swim" concept in response to S. And the pattern DS could be stored by a very small number of neurons.
This is not to be taken literally as a theory of concept representation in the brain, but only as an illustration of the fact that the full representation of a concept need not be repeated in every ANN representation of a thought. A more realistic theory might perhaps start from the assumption that there are, in general, no separate representations of the conceptual content; there are only the concept symbols that occur in individual thoughts and beliefs, and the concept is nothing but the contribution this symbol makes to the belief representations wherein it occurs. A special status might then be granted to concepts that are directly grounded in perception; and so on. But it is clearly beyond the scope of the present document to elaborate on this line of thought.
So the need for multiple concept instantiations does not necessarily spell disaster. They are quite cheap if a symbolic encoding is used. Without prejudging the issue of whether the symbolic attractors would mostly be extended over a wide cortical area, with very many attractors occupying the same region, or would rather tend to be smallish, lying side by side, it can nevertheless be instructive to calculate how many propositions could be stored in a cortical ANN of 1 mm^2. Let V be the size of the conceptual vocabulary. Then one concept can be coded in log2(V) bits (or fewer, if the concepts have different usage frequencies). Let the average number of concept-instances in a belief (presumably somewhat less than the number of words in the sentences that express it) be n. Let d be the neuronal density, in units of number of neurons per square mm. We then have
N = 0.138*d / (log2(V) * n * Robustness)
where 0.138 is the Hopfield value (i.e. the ratio of the storage capacity of a Hopfield net to the number of neurons it contains), and Robustness is a factor that compensates for the difference in efficiency between an ideal Hopfield net and a noisy, asymmetric, partially connected sheet of cortical cells. To get a very rough estimate of N we can take V = 100,000, n = 5, Robustness = 50, and (from Douglas & Martin (1991)) d = 10^5. We then obtain N = 1000, plus or minus an order of magnitude or so. This does not seem to be wholly on the wrong scale.
Another problem is this: How do we access all the patterns that begin with the subpattern A, for example? If we feed in A to the first position in the ANN, it will settle into an attractor, ABC, say. But there might be other memories that also begin with A, e.g. ACB, ADE, etc. If we simply repeat the process of clamping A to the first position, we may be frustrated to discover that the network keeps being sucked in by the same pattern ABC each time. ABC might be the strongest attractor beginning with A, and this prevents us from ever recalling the other memories from the clue A alone.
One countermeasure is to have the neurons tire after a while, so that the neurons active in the B and the C of ABC eventually retire and allow the activity to flow over to another basin. Depending on the delay before exhaustion sets in, this would make the attention flow quickly or slowly from item to item.
A less passive approach is to include an extra context segment in each complex attractor. Thus we would have ABC1, ACB2, ADE3, etc. In order to scan through all patterns beginning with A, we begin by clamping A and 1 to the first position and the context position, respectively. Then we change to the next context, 2; then to 3, and so forth. Each pattern ABC1, ACB2, ADE3, etc. will then come forth in turn, and will be maintained for exactly as long as we (or the system) choose. The context need not, of course, be represented as a distinct section; it can equally well be a general "colouration" of the whole pattern. And the same holds for the other parts of the representation: the sharp, discrete, linear form suggested here is chosen merely for the sake of clarity of exposition; in nature things will be more muddled.
One advantage of annexation over synchronization is that annexation not only groups representations together into a set; it orders them into a tuple. Though tuples can be defined in terms of sets, it is probably important to have an economical means of representing tuples directly.
In section 7 we will discuss what sort of contribution to the solution of the problems of cortical integration we can expect from complex attractors. We now turn to a review of a system that was developed with the purpose of illustrating how structured attractors could underlie the ability to reason.
[Note: I have only recently become acquainted with the work of William Calvin; it will be reviewed here in an updated version of this paper.]
[SECTION 6 CAN BE SKIPPED. I DIDN'T HAVE ACCESS TO ENOUGH COMPUTATIONAL POWER TO DO THE SUGGESTED SIMULATION STUDY IN A MEANINGFUL WAY.]
6. A sketch for a simulation model that uses annexation to deal with the integration problem
6.1 Synopsis
This section presents the outlines of a simulation study of a neural network that would be able to understand simple language, discover regularities and law-like connections in its environment, form hypotheses about these and test them, and communicate verbally the results of its investigations. The level of abstraction is high, i.e. little attention is paid to biological detail; the task assumed is rather to illustrate some general principles in a suggestive way: to set forth a rudimentary framework, or skeleton, that could be the starting point for a process of having it fleshed out and scaled up, to a biologically plausible system, by having components added, refined or substituted. Two desiderata are given special attention: the system should handle the integration problem without assuming binding by synchronous firing, and it should manage one-shot learning, i.e. be able to learn a message, e.g. a sentence, from a single presentation. The results of simulations of some initial steps are also given at the end of this section.
6.2 Overview of the system
A simple toy world exhibiting regularities is observed. An Attention Focuser for Observation (AFO) suggests things to look for in this world by feeding an observation sentence to the Sensory Module (SM), which determines whether the sentence is true or false by making the relevant observations. The result of this investigation is stored in Short Term Memory (STM) and might later be transferred to Long Term Memory 1 (LTM1). Based upon what is stored in LTM1, a Regularities Finder (RF) will suggest hypotheses, statements about regularities in the toy world, which will be stored in Long Term Memory 2 (LTM2). A second attention mechanism, the Attention Focuser for Reasoning (AFR), will be called into action to pick out items from LTM2 and from STM and feed them to a Logic Circuit (LC), which performs a simple check to see whether these items are consistent or not. If they aren't, the law-like statements from LTM2 will suffer a credibility reduction. Non-visual information can also be provided to the system by adding sentences directly to the memory modules.
All the memory modules are attractor nets, and sentences are stored on top of each other. Each sentence representation is a concatenation of smaller patterns representing the words constituting the sentence, in appropriate order.
6.3 Toy World
The toy world was chosen to consist of 20*3 pixels, the home of up to four simultaneous objects: triangles, squares and circles. The objects were of uniform size and could not overlap. The thought was to get as simple a world as possible, yet one that had enough complexity to allow an illustration of the principles behind the system. Time was not included as a dimension of the world, although the network was to be presented with many scenes (which can be thought of as "days" in its history). The scenes could either be composed completely randomly, or they could be required to obey certain laws and regularities. One law would be that no object could be present in the left wing of the world unless there was another object of the same type present in the right wing. Statistical regularities could also be designed.
6.4 Vocabulary and Syntax
To simplify matters, sentences are of uniform length and fixed word order. Each sentence contains one binary truth function and two simpler sentences of 7 words each. Such a subsentence has the following form:
(There is no) (left-wing) (triangle) (that is to the left of) (a non) (middle) (circle).
The first and fifth places can be occupied by a blank or a negation; the second and sixth positions can be filled with the words "left-wing", "right-wing", "middle" or "blank"; the third and seventh positions by "triangle", "square", "circle" or "blank"; and the fourth position by "is-to-the-left-of", "is-to-the-right-of", "and" or "not-both". The binary truth function could be either "and", "if-then", or their negations.
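The combinatorics of this grammar are easy to make explicit. The encoding below is a paraphrase of my own of the slot structure just described (the actual simulations were written in the Cortex Pro environment, not Python):

    import random

    rng = random.Random(0)

    NEG    = ["", "no"]                                   # 1st and 5th slots
    PLACE  = ["left-wing", "right-wing", "middle", ""]    # 2nd and 6th
    SHAPE  = ["triangle", "square", "circle", ""]         # 3rd and 7th
    REL    = ["is-to-the-left-of", "is-to-the-right-of", "and", "not-both"]
    TRUTHF = ["and", "if-then", "not-(and)", "not-(if-then)"]

    def subsentence():
        return [rng.choice(NEG), rng.choice(PLACE), rng.choice(SHAPE),
                rng.choice(REL), rng.choice(NEG), rng.choice(PLACE),
                rng.choice(SHAPE)]

    def sentence():
        return subsentence() + [rng.choice(TRUTHF)] + subsentence()

    n_sub = len(NEG)**2 * len(PLACE)**2 * len(SHAPE)**2 * len(REL)
    print("distinct subsentences:", n_sub)                     # 4096
    print("distinct full sentences:", n_sub**2 * len(TRUTHF))  # ~67 million
    print(sentence())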
6.5 Training
There are two phases of learning. In the first phase, the set-up phase, the modules are trained individually. This is done by some variant of backprop and will be a slow process. The second phase, the on-line phase, begins when the system has acquired the basic skills and concepts: it can then begin to function by storing facts in short term and long term memory and modifying its law-like sentences in the light of new observations and received communications. This type of learning will be more or less instantaneous (one-shot learning).
6.6 Set-up phase
Sensory Module: a three-layered feed-forward network. Its input is composed of a scene presented on the visual input sheet together with an encoding of a sentence, and it is trained to give as output a verdict (low or high activity of the output node) on whether the sentence is true or false of the observed toy world scene.
Logic Circuit: a three-layered feed-forward network, whose task it is to perform some very simple deductions. It takes as input a couple of the sentences stored in the short term memory and another two from the long term regularities memory. Which sentences are chosen should later be determined by the Attention Focuser for Reasoning (see below); but during the set-up training phase the sentences can be arbitrarily chosen (from the whole set of grammatically well-formed sentences). The Logic Circuit is trained to tell whether these four sentences are consistent with each other. The sort of inconsistency that the LC should be able to discover is exemplified by the following pair of sentences:
(1) If there is a left-wing triangle then there is a right-wing triangle.
(2) There is a left-wing triangle and there is no right-wing triangle.
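To make the Logic Circuit's target function concrete, here is a procedural sketch (not a network) of the consistency test the LC is trained to approximate, with the two example sentences reduced to their propositional skeletons; the encoding is a simplification of my own:

    from itertools import product

    # Atomic propositions about the toy world (simplified encoding)
    ATOMS = ["LW-triangle", "RW-triangle"]

    # (1) If there is a left-wing triangle then there is a right-wing triangle.
    s1 = lambda v: (not v["LW-triangle"]) or v["RW-triangle"]
    # (2) There is a left-wing triangle and there is no right-wing triangle.
    s2 = lambda v: v["LW-triangle"] and not v["RW-triangle"]

    def consistent(sentences):
        # brute force: is some truth assignment compatible with all of them?
        for values in product([False, True], repeat=len(ATOMS)):
            v = dict(zip(ATOMS, values))
            if all(s(v) for s in sentences):
                return True
        return False

    print(consistent([s1]))       # True  -- satisfiable on its own
    print(consistent([s1, s2]))   # False -- the inconsistency the LC must flag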
Attention Focuser for Observation: a 3-layered ff net, which has the task of choosing which features in the observed scene to pay attention to, i.e. which sentences should be sent to the Sensory Module for evaluation. This choice should be made so as to maximize the likelihood that the information obtained from the SM will enable the Logic Circuit to falsify one of the sentences in the LTM2. Perfect performance is neither required nor expected. The Attention Focuser for Observation could be trained by giving the content of LTM2 as input, and possibly also some elementary information about the present scene. This information could be obtained by adding a module (Primary Sensory Module) that recognizes the presence of, say, a geometric shape (square, triangle, circle). The AFO would then give as output some sentences that concern a geometric shape which is also the topic of some of the regularities in the LTM2.
Attention Focuser for Reasoning: 3-layered ff net; takes as input the contents of the STM and the LTM2, and gives as output two sentences from STM and two from LTM2, which are then fed to the Logic Circuit. The AFR is trained (with backprop), either after the LC, or else using a procedural deducer in place of the LC, so as to maximize the likelihood that the output sentences are inconsistent with the present content of the STM.
Regularities Finder: has the task of finding regularities amongst the sentences in the LTM1. In the simplest case, this 3-layered ff net is trained to generate sentences that have a high probability given the content of the LTM1 (according to some statistically reasonable assignment). A more sophisticated regularities finder would also take into account the LTM2 (i.e. the present "theory") when searching for interesting regularities.
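A crude statistical stand-in for such a module can be sketched as follows: count, over the scenes recorded in LTM1, how often one atomic fact accompanies another, and propose an "if-then" regularity whenever the conditional frequency is perfect. (Everything here, including the reduction of LTM1 records to sets of atomic facts, is an illustrative assumption, not the trained module.)

    from collections import Counter

    # Each LTM1 record is taken to be the set of atomic facts true of one scene.
    ltm1 = [
        {"left-wing triangle", "right-wing triangle"},
        {"left-wing triangle", "right-wing triangle", "middle circle"},
        {"middle circle"},
        {"right-wing triangle"},
    ]

    pair_counts, fact_counts = Counter(), Counter()
    for facts in ltm1:
        fact_counts.update(facts)
        pair_counts.update((a, b) for a in facts for b in facts if a != b)

    # Propose "if A then B" whenever B accompanied A in every recorded scene.
    for (a, b), n in pair_counts.items():
        if n == fact_counts[a] and n > 1:
            print(f"candidate: if there is a {a} then there is a {b}")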
6.7 On-line phase
As explained in the Overview, the system, once set up, operates by making observations and generating hypotheses which are checked against new observations. The sentence patterns were to be stored together with a tag (which could be thought of as an additional word) giving a value that indicates the confidence level assigned by the system to the sentence. If the LC finds a contradiction, then the sentences from LTM2 that were involved get adjusted by having their credibility value lowered; otherwise it may increase slightly. If the credibility value sinks below a certain threshold, then the sentence might be removed from memory, or stored again with a "FALSITY" label attached [4]. Since all memory modules are attractor nets, learning is achieved by clamping the pattern to be stored and making a Hebbian weight update. This is done on-line.
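A minimal sketch of this one-shot storage step, assuming a standard Hopfield-style attractor memory with +/-1 activities (the sentence-plus-tag pattern is just a random vector here, and all sizes and names are illustrative):

    import numpy as np

    N = 64                   # units in the memory module
    W = np.zeros((N, N))     # synaptic weights of the attractor net
    rng = np.random.default_rng(1)

    def store_one_shot(pattern, rate=1.0):
        # Clamp the pattern and make a single Hebbian weight update.
        global W
        W += (rate / N) * np.outer(pattern, pattern)
        np.fill_diagonal(W, 0.0)

    # A "sentence plus credibility tag", encoded as a +/-1 vector for the sketch.
    sentence_pattern = rng.choice([-1.0, 1.0], size=N)
    store_one_shot(sentence_pattern)

    # Recall from a degraded cue: flip a few bits and let the net settle.
    cue = sentence_pattern.copy()
    cue[:8] *= -1
    for _ in range(5 * N):
        i = rng.integers(N)
        cue[i] = 1.0 if W[i] @ cue >= 0 else -1.0
    print("pattern recovered:", bool(np.array_equal(cue, sentence_pattern)))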
6.8 Results

The toy world generator, a sentence generator and a sentence evaluator, as well as the observation module and the ANN memory modules, were successfully programmed in the Cortex Pro environment. Instead of the attention focuser mechanisms and the logic circuit, there was initially a procedural simulacrum: a rule that removed sentences from the long term memory if they either were too often found false or else failed some simple criteria of relevance.

Disappointingly, the success rate of the observation module failed to rise appreciably above 85% within a reasonable simulation time on the available hardware. Various alterations in architecture and parameter values, as well as simplifications of the sentence structure, restriction of the vocabulary and compression of the encoding of the toy world, were tried, but did not increase the reliability of the observation module's verdicts to more than about 90%. As this was judged too low for a meaningful development of the remaining interdependent modules, the simulations were discontinued. This result incidentally illustrates the point made earlier, that the rigidity of integration through convergence in a homogeneous feed-forward net makes it unsuitable in the context of linguistic processing and parsing, where there is a strong demand for swift recombination and repeated application of simple combinatorial rules.
7. Discussion: the means of cortical integration
We have looked at various proposals for how cortical integration could be achieved, and discussed what evidence there is for each. It is now time to put things together. The first point to make is that the proposals are not mutually exclusive. On the contrary, there is reason to believe that several of them are involved, in different regions of the brain and for various purposes.
To begin with early sensory processing: there can be no doubt that integration by convergence plays an important role here. There is also evidence that synchronization occurs, and that it probably makes a functional contribution to contour extraction in visual processing and to spatial localization of the sound source in audition, though in the latter case not primarily as an integration mechanism.
It is also possible that convergence is used in motor processing. Here it would not mean that a large number of stimuli or instances of more abstract tokens are subsumed under a unitary representation, but rather that such a unitary representation would constitute the precept to undertake a certain action, or sequence of actions. For example, there could be in snake brains a cluster of cells whose activity indicated that the animal was about to bite. Exactly which motor neurons were to be activated in order to carry out the bite would be determined later on, after co-ordinating the snake's present position with the position and movement of the target, etc. So the cluster indicating "bite!" could be thought of as the point of convergence for all possible motoric outputs that would constitute an act of biting.
In view of the importance of sequencing in motoric computing, it could seem that this would be the best place to look for integration by convolution, for example through RAAM-like structures. That is possible, but there are also other mechanisms that could be used to achieve sequencing. Integrated representations are not necessary for temporally structured responses.
Even less is known when we move away from the coastal areas of sensory and motor processing into the inland of cognition. When we investigate the integration mechanisms underlying abstract thought, it is necessary to consider both passive and active representations: both your knowledge of your first year in school, say, and your knowledge and awareness of your present thoughts, which are hopefully about this paper and its subject matter. The vast majority of our knowledge is encoded in passive representations. Synchronization, the state of temporal coherence in neuronal spiking, can obviously not bind passive representations, since they are, by definition, not spiking. If "synchronization" has any role to play in the integration of passive representations, it must rather be the tendency to engage in synchronized spiking, rather than synchronized spiking itself.
How would such a tendency be synaptically encoded? Two representations that were to be bound by a tendency to synchronization would have to influence one another, presumably by sending excitation or disinhibition to one another. Thus they would first of all tend to activate one another, before there could be any possibility of synchronization. But that means that they would, in effect, form a complex attractor. Hence, it looks as if binding passive representations by means of a tendency to synchronization would already presuppose that they were bound through annexation. One may therefore wonder what the synchronization tendency would contribute to the integration: wouldn't it merely wed a couple that was already married to each other?
Perhaps it could abolish the need for multiple concept instantiations? Suppose the patterns ABC and DBE are to be remembered. Instead of storing a symbol representing B twice, we could mold a complex attractor ABC and another complex attractor DBE overlapping the first, so that the B pattern appears only once. ABC would be distinguished from DBE by having a tendency to synchronize internally but not with the D and E parts of DBE; and vice versa.
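The storage half of this idea is easy to demonstrate with an ordinary Hopfield net: the sketch below stores ABC and DBE as overlapping attractors in which the same group of units does duty for B in both. What the sketch does not supply is the selective synchronization tendency that would keep the two readings apart; that is precisely the problematic part, as discussed next. (All sizes and codings are illustrative.)

    import numpy as np

    rng = np.random.default_rng(0)
    N = 50
    groups = {"A": slice(0, 10), "B": slice(10, 20), "C": slice(20, 30),
              "D": slice(30, 40), "E": slice(40, 50)}

    def pattern(active):
        p = -np.ones(N)
        for g in active:
            p[groups[g]] = 1.0
        return p

    ABC, DBE = pattern("ABC"), pattern("DBE")   # B's units appear only once

    W = (np.outer(ABC, ABC) + np.outer(DBE, DBE)) / N
    np.fill_diagonal(W, 0.0)

    def relax(s, steps=500):
        s = s.copy()
        for _ in range(steps):
            i = rng.integers(N)
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0
        return s

    for cue in ("AB", "DB"):    # partial cues; each completes its own triple
        out = relax(pattern(cue))
        print(cue, "-> overlap with ABC:", out @ ABC / N,
              " with DBE:", out @ DBE / N)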
There are problems with this suggestion. First, it is not clear how the synaptic weights would be configured to achieve this selective tendency to synchronization. Second, even if such a synaptic weight configuration could be specified, it is not easy to see how it could be learnt. Third, the specificity of the synchronization tendency for passive representations would have to be extreme. There are countless pieces of knowledge that involve concepts such as "whales", "Paris", "inflation" etc., all of which would have to have their own unique synchronization properties. But since the spacing between spikes of a neuron is often as short as 10 ms, and since the resolution of synchronization is about 2-3 ms, there are only about four distinguishable phase slots (10 ms divided by a 2-3 ms resolution), so no more than about four propositions containing a given concept could be stored through synchronization tendency with a single concept instantiation. For these reasons it seems unlikely that the tendency to synchronize is a major part of the explanation of how declarative knowledge is stored in passive memory.
Less implausible is the suggestion that synchronization plays a role in the integration of working memory, the active representations of abstract thought. This would be merely actual synchronization; there would be no need to lay down tendencies to specific synchronized states in the synaptic weights, which simplifies matters. Also, it is much more plausible to suppose that there are only a handful of "propositions" in our immediate awareness than to postulate the same for our entire store of passive memories.
One outstanding feature of early visual processing is the prevalence of topographic maps. It is tempting to conclude that this accounts for the distinct phenomenological quality of spatial consciousness: the sort of peripheral presence that all details of the visual scene enjoy even when they are not the object of our attention, in contrast to abstract thoughts and beliefs, which do not seem to "be there" when we are not thinking them. Is it possible that the "peripheral presence" of shapes and colours in our visual field in the absence of attention is due to the fact that they are represented by distributed neuronal activity on topographic maps, whereas abstract beliefs become conscious only when their neural representations are uploaded from passive memory onto a special stage that has room for but a small number of active actors at a time?
Less metaphorically, there could be a sheet of reusable neurons onto which active representations were pasted by the clamping mechanism discussed in the section about annexation. As soon as one of these propositions was dropped from our awareness, its place could be taken by another proposition. The unlearning would be no problem if only short term potentiation were used: the increased synaptic efficacies would simply decay away within a second once the pattern is inhibited.
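The decay mechanism appealed to here is easy to make concrete. In the sketch below, a clamped pattern is held by a fast Hebbian trace that decays exponentially once the pattern is inhibited; the time constant and sizes are illustrative assumptions only.

    import numpy as np

    rng = np.random.default_rng(3)
    N = 32
    tau = 0.3                   # decay time constant in seconds (illustrative)
    W_fast = np.zeros((N, N))   # short-term potentiation, not consolidated

    def clamp(pattern, rate=1.0):
        global W_fast
        W_fast += (rate / N) * np.outer(pattern, pattern)

    def decay(dt):
        global W_fast
        W_fast *= np.exp(-dt / tau)

    p = rng.choice([-1.0, 1.0], size=N)
    clamp(p)
    print("trace just after clamping:", p @ W_fast @ p / N)   # ~1.0
    decay(1.0)   # one second later, with the pattern inhibited
    print("trace one second later:  ", p @ W_fast @ p / N)    # ~0.04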
This would hardly be a plausible model for anything other than explicit logical reasoning. Since explicit logical reasoning is not the only, or even the major, thing that goes on in higher cognition, there would have to be ways for passive representations to interact other than through being lifted into the limelight of such a stage of active support. Interaction, or operations performed on neural representations, is not the primary topic of this paper; but such issues must nonetheless be taken into consideration when we search for potential forms of representation and integration in cortex; for whatever the mechanism that brings about cortical representations, it must surely be such as to facilitate the operations of reason. In fact, there are good grounds for believing that the greatest part of animal and human intelligence lies not in the quickness with which certain operations can be performed upon cortical representations, but rather in the way these representations themselves are organized: into a fairly coherent world view which enormously narrows down the search space when we seek a solution to an everyday problem.
We have focused on the integration of the basic constituents of our knowledge representations, but in order to get a realistic and workable system, there would have to be much more structure than that. Distinct representations of "propositions", if they exist in the brain at all, can only be the bricks of the architecture of mind. Ways must be found to achieve the coherence of organization that allows relevant considerations from variegated fields of experience to be brought to bear as constraints in the shaping of a plan or an opinion.
8. Conclusions
Convergence is indispensable for cortical integration, but not by itself sufficient for a flexible representation system. Convergence onto reverberating cell assemblies, in particular, is a necessary basis for working memory. It is unknown whether integration through convolution occurs in the brain. Synchronization may play a part in early sensory processing and possibly in active reasoning, but not in holding together representations that have been laid down in long-term memory. Annexation, on the other hand, has many virtues that make it a strong candidate as the integration mechanism for declarative passive memory as well as working memory. It could be worth searching for annexation experimentally, though the task is made difficult by the fact that annexation could take on such disparate forms.
----------------
I am grateful to Prof. J. G. Taylor for valuable advice.

References
Amit, D. J. (1995) The Hebbian paradigm reintegrated: local reverberations as internal representations. Behavioural and Brain Sciences 18, 617-657.
Apolloni, B. , Zampolini, G. & Zanaboni, A. M. (1995) An integrated symbolic/sub-symbolic architecture for automatic parsing of Italian sentences containing PP-attachment ambiguities. Draft.
Crick, F. & Mitchison, G. (1983) The function of dream sleep. Nature 304: 111-114.
Crick, F. & Mitchison, G. (1985) In Principles of Neural Science, 2nd ed., Chapter 49. Elsevier, New York.
Eichenbaum, H., Otto, T. & Cohen, N. J. (1994) Two functional components of the hippocampal memory system. Behavioural and Brain Sciences 17: 449-518.
Elman, J. L. (1990) Finding structure in time. Cognitive Science 14: 179-212.
Engel, A. K. et al. (1991) Interhemispheric synchronization of oscillatory neuronal responses in cat visual cortex. Science 252: 1177-1179.
Fodor, J. A. & Pylyshyn, Z. W. (1988) Connectionism and cognitive architecture: a critical analysis. Cognition 28: 3-72.
Gray, C. & Singer, W. (1987) Stimulus-dependent neuronal oscillations in the cat visual cortex area 17. Neuroscience (suppl.) 22: 434.
Greenfield, P. M. (1991) Language, tools and brain: The ontogeny and phylogeny of hierarchically organized sequential behaviour. Behavioural and Brain Sciences 14: 531-595.
Hopfield, J. J., Feinstein, D. I. & Palmer, R. G. (1983) "Unlearning" has a stabilizing effect in collective memories. Nature 304: 158-159.
Ishikawa, M. (1995) Learning of modular structured networks. Artificial Intelligence 75, 51-62.
Jagadeesh, B., Gray, C. M. & Ferster, D. (1992) Science 257: 552
von der Malsburg, C. (1981) The correlation theory of brain function. Reprinted in Models of Neural Networks II, eds. Domany, E., van Hemmen, J. L. & Schulten, K. (1994) Springer-Verlag.
Mani, D. R. & Shastri, L. (1993) Reflexive reasoning with multiple instantiation in a connectionist reasoning system with a type hierarchy. Connection Science, Vol. 5, Nos. 3&4: 205-241.
Maskara, A. & Noetzel, A. (1993) Sequence recognition with recurrent neural networks. Connection Science, Vol. 5, No. 2.
Munro, P., Cosic, C. & Tabasko, M. (1991) A network for encoding, decoding and translating locative prepositions. Connection Science, Vol. 3, No. 3.
Nenov, V. I. & Dyer, M. G. (1994) Perceptually grounded learning: Part 1 - A neural network architecture for robust sequence association. Connection Science, Vol. 5, No. 2.
Nenov, V. I. & Dyer, M. G. (1994) Perceptually grounded learning: Part 2 - DETE: a neural/procedural model. Connection Science, Vol. 6, No. 1.
Plunkett, K. et al. (1992) Symbol grounding or the emergence of symbols? Vocabulary growth in children and a connectionist net. Connection Science, Vol. 4, Nos. 3&4: 293-312.
Pollack, J. B. (1990) Recursive distributed representations. Artificial Intelligence 46:77-105
Reilly, R. (1992) Connectionist technique for on-line parsing. Network 3: 37-45
Ritz, R., Gerstner, W. & van Hemmen, J. L. (1994) Associative binding and segregation in a neural network of spiking neurons. In Models of Neural Networks II, eds. Domany, E., van Hemmen, J. L. & Schulten, K. Springer-Verlag.
Sakai, K. & Miyashita, Y. (1991) Neural organization for the long-term memory of paired associates. Nature 354: 152-155.
Singer, W. (1994) The role of synchrony in neocortical processing and synaptic plasticity. In Models of Neural Networks II, eds. Domany, E., van Hemmen, J. L. & Schulten, K. Springer-Verlag.
Sporns, O., Tononi, G. & Edelman, G. M. (1994) Reentry and dynamical interactions of cortical networks. In Models of Neural Networks II, eds. Domany, E., van Hemmen, J. L. & Schulten, K. Springer-Verlag.
Sutherland, S. (1991) Only four possible solutions. Nature 353: 389-390.
van der Velde, F. (1995) Symbol manipulation with neural networks: production of a context-free language using a modifiable working memory. Connection Science, Vol. 7, Nos. 3&4: 247-280.
Notes
1 Compare Fodor & Pylyshyn (1988).
2 While the nominal purpose of section 5 is to illustrate certain cognitive principles, there was an additional motivation for developing this model and carrying out some of the simulations; namely, to give the author a little hands-on experience with programming computer simulations of neural networks.
3 See also appendix 1.
4 This course of action, to store the rejected pattern with a falsity symbol, could in some circumstances be an alternative to forgetting. Besides the standard ways of forgetting in an attractor net (through weight decay and through interference from other overlaid memory traces), an active way of forgetting has also been proposed. It was originally suggested by F. Crick & G. Mitchison (1983, 1985) as an explanation of REM sleep, and it was later subjected to simulation studies by Hopfield et al. (1983). The procedure of unlearning, as it is called, consists of three steps. First there is the "random shooting": the activity in the net is randomized. Second, there is the relaxation phase, when the net is allowed to sink into an attractor. Third, there is the unlearning update, which modifies the weights according to a negative Hebbian rule:

Jij' = Jij - (D/N) Ai Aj
(Jij is the connection strength between nodes i and j, D/N is a measure of the (un)learning rate, and Ai and Aj are the activities at nodes i and j, respectively, after relaxation.) Hopfield et al. demonstrated that this mechanism can increase the storage capacity of an attractor net from 0.14 (patterns per neuron on average) to 0.68.
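For concreteness, the three steps might be sketched as follows for a Hopfield-style net (parameter values and sizes are illustrative only; repeated cycles tend to weaken spurious attractors more than the stored patterns):

    import numpy as np

    rng = np.random.default_rng(2)
    N = 100
    stored = rng.choice([-1.0, 1.0], size=(20, N))   # a heavily loaded memory
    W = (stored.T @ stored) / N                      # ordinary Hebbian storage
    np.fill_diagonal(W, 0.0)

    def relax(s, steps=10 * N):
        s = s.copy()
        for _ in range(steps):
            i = rng.integers(N)
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0
        return s

    def unlearning_cycle(D=0.01):
        global W
        s = rng.choice([-1.0, 1.0], size=N)   # 1. random shooting
        a = relax(s)                          # 2. relaxation into an attractor
        W -= (D / N) * np.outer(a, a)         # 3. negative Hebbian update
        np.fill_diagonal(W, 0.0)

    for _ in range(50):
        unlearning_cycle()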