Can we use Deep Neural Networks to understand how the mind works? The 5-year ERC grant entitled “Generalisation in Mind and Machine” compares how humans and artificial neural networks generalise across a range of domains, including visual perception, memory, language, reasoning, and game playing.

Why focus on generalisation? Generalisation provides a critical test-bed for contrasting two fundamentally different theories of mind, namely, symbolic and non-symbolic theories. Symbolic representations are compositional (e.g., Fodor and Pylyshyn, 1988) and are claimed to be necessary to generalise “outside the training space” (Marcus, 1998, 2017). By contrast, proponents of non-symbolic models, including PDP models and most Deep Neural Networks, reject the claim that symbolic representations are required to support human-like intelligence. So can non-symbolic neural networks generalise as broadly as humans? If so, this would seriously challenge a core motivation for symbolic theories of mind and brain. For recent discussion of this issue, see Bowers (2017) in Trends in Cognitive Sciences.

Our research team is carrying out a series of empirical and modelling investigations that explore the generalisation capacities of humans and machines across a wide range of domains. These studies are designed to: (1) Focus on tasks that require symbols for the sake of generalisation. (2) Focus on generalisation across a range of domains in which human performance is well characterised, including vision, memory, and reasoning. (3) Develop new learning algorithms designed to make symbolic systems biologically plausible.

Ongoing projects below:

Project 1 | Empirical Studies of Visual Object and Word Generalisation
A series of empirical studies will assess the generalisation capacities of symbolic and non-symbolic theories of object and word identification within and outside the training space. For example, in two studies that have been completed, participants were trained to identify 24 novel objects presented at one retinal location, and were then assessed on these same objects presented at new retinal locations. We observed that participants’ identification performance was excellent at these novel retinal locations (see Figure 1, chance being 50%) at a range of eccentricities (3, 6 and 9 degrees from fixation). This falsifies most theories of translation invariance in humans, which assume that translation invariance is much more limited (e.g., Cox & DiCarlo, 2008; Kravitz, Kriegeskorte, & Baker, 2010). For some earlier work on this, see our paper Bowers, Vankov, and Ludwig (2016).

Project 2 | Modelling Visual Generalisation in Object and Word Identification

We are simultaneously developing a set of symbolic and non-symbolic (neural network) models for object and word identification, with the explicit aim of testing generalisation abilities of these models. One set of models will compare the degree of translation invariance achieved by deep convolutional networks to that shown by humans (e.g. in the experiment above).  Our goal here is to establish the extent to which convolution and pooling layers achieve invariance and whether we need additional mechanisms to achieve invariance comparable to human performance.
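As a rough illustration of why convolution and pooling are expected to give some translation invariance, here is a toy one-dimensional sketch. It is not one of the project's models: the kernel, the "image", and all function names are hypothetical, and it uses a single global max-pooling step, which is a best case. Deep networks with local pooling and fully connected layers may be far less invariant, which is the question the modelling addresses.

```python
# Toy 1-D illustration only; not one of the project's models.

def conv1d(signal, kernel):
    """Valid cross-correlation: slide the kernel along the signal."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]

def global_max_pool(feature_map):
    """Collapse the whole feature map to its single largest response."""
    return max(feature_map)

def shift_right(signal, steps):
    """Translate the signal by `steps` positions, padding with zeros."""
    return [0] * steps + signal[: len(signal) - steps]

kernel = [1, 2, 1]                 # hypothetical feature detector
image = [0, 1, 2, 1, 0, 0, 0, 0]   # hypothetical "object" near the left edge
moved = shift_right(image, 3)      # the same object at a new location

map_original = conv1d(image, kernel)
map_moved = conv1d(moved, kernel)

# The feature maps differ (the response moves with the object),
# but the pooled response is the same at both locations.
print(map_original, global_max_pool(map_original))
print(map_moved, global_max_pool(map_moved))
```

The convolution output shifts along with the object, and only the pooling step removes the position information. In this toy case the invariance holds only while the whole object stays inside the input.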

Another set of models compares the nature of representations in humans and artificial neural networks. For example, consider the stimuli in Figure 2, taken from Hummel and Stankiewicz (1996). Humans find the Basis and V2 images to be more similar (the relations between the parts are more similar), but according to Hummel and Stankiewicz, all non-symbolic models will find the Basis and V1 stimuli to be more similar (the overlap between features is greater). We are testing this strong claim by seeing whether we can train a deep network to make human-like generalisations with these types of stimuli. This work is being conducted with Professor John Hummel.

Project 3 | Generalisation in STM, Reasoning & Game-Playing
This work follows the same logic as above, but tests the performance of models in the domains of short-term memory, reasoning, and game playing. For example, relational reasoning is the archetypal task that is claimed to be difficult for non-symbolic models (e.g., Holyoak & Hummel, 2000). Consider the game of Set, a game that young children can quickly master. This game requires the children to judge whether objects share the same set of features or not (a relational judgement) in order to make a Set (see figure for more details). We are testing the performance of deep networks that are not endowed with explicit symbolic representations on these tasks. For a similar point, see the recent paper by Ricci, M., Kim, J., & Serre, T. (2018). Not-So-CLEVR: Visual Relations Strain Feedforward Neural Networks. arXiv preprint arXiv:1802.03390.
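To make the relational judgement concrete, the following minimal sketch encodes the standard rule of the card game Set: three cards form a Set when, on every feature, the values are either all the same or all different. The card encoding (tuples of number, shading, colour, shape) and the function names are assumptions made for illustration. The sketch shows how short the rule is when stated symbolically, and it is not a model used in the project.

```python
from itertools import combinations

def is_set(cards):
    """True if, on every feature, the three cards are all alike or all different."""
    features = zip(*cards)  # regroup values feature by feature
    return all(len(set(values)) in (1, 3) for values in features)

def find_sets(table):
    """Return every trio of cards on the table that forms a Set."""
    return [trio for trio in combinations(table, 3) if is_set(trio)]

# Hypothetical cards: (number, shading, colour, shape)
a = (1, "solid", "red", "oval")
b = (2, "striped", "red", "oval")
c = (3, "open", "red", "oval")
d = (3, "solid", "red", "oval")

print(is_set((a, b, c)))            # numbers and shadings all differ; colour and shape all match
print(is_set((a, b, d)))            # two solid cards and one striped card break the rule
print(len(find_sets([a, b, c, d])))
```

Note that the rule refers only to sameness and difference between cards, never to particular feature values, which is why it applies unchanged to cards that have never been seen before.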

Project 4 | Biologically Plausible Symbolic Models
An important issue is whether symbolic networks are biologically plausible. We are developing a biologically plausible symbolic network that not only learns, but can also solve problems that are difficult for non-symbolic networks. One possible mechanism for encoding symbolic representations in neural networks is “polychronization” (Izhikevich, 2006). The basic idea is that learning and computation may occur not only at the connections between units (an assumption shared by almost all neural networks), but also in the adaptive modification of “delay lines” that affect the speed of neural conduction. This provides a “second degree” of computation that is common in many symbolic neural networks. Importantly, this approach appears to be biologically plausible, given that the synchronous arrival of spikes is critical for driving post-synaptic neurons, and there is growing evidence that myelin along axons is adaptively tuned in order to alter the conduction speeds of neurons.
Here is an illustration of how letter order might be coded with delay lines (thicker lines refer to faster connections or more myelin), whereas connection weights code for whether a given word contains a given letter (for example, the word SPOT in the bottom-left panel has strong connections with the letters O, P, S, and T, but is not connected to other letters). On this hypothesis, it is assumed that when you make a single fixation on a word, the letters are activated quickly in sequence from left to right. If letters are activated in the right temporal sequence, for instance S followed by P, then O, then T, the letter signals will all arrive at the word detector SPOT at the same time (given the pattern of the delay lines), whereas the letter signals will arrive at other word units out of synchrony. As a consequence, the SPOT unit is maximally activated and the input SPOT is recognised.
Separating the identity of letters (via weights) and the order of letters (via delay lines) allows the network to solve some word recognition tasks that are difficult to solve otherwise. For details of myelin plasticity and how it can improve word identification performance, go to the following link: Bowers and Davis (in revision)
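The delay-line idea described above can be sketched in a few lines. This is a deliberately simplified illustration and not the model of Bowers and Davis: it assumes discrete time steps, one time step between successive letters, binary connection weights, words without repeated letters, and a word unit whose activation is the largest number of letter signals arriving at the same moment. All names are hypothetical.

```python
from collections import Counter

WORDS = ["POST", "SOAP", "SPOT", "STOP", "TOPS"]

def delays(word):
    """Conduction delay from each letter unit to this word unit.

    Later letters get shorter delays (thicker, more myelinated lines),
    so that signals sent in the correct order all arrive together.
    Assumes the word has no repeated letters.
    """
    return {letter: len(word) - 1 - pos for pos, letter in enumerate(word)}

def activation(word, stimulus):
    """Largest number of letter signals reaching the word unit at the same time."""
    line = delays(word)
    arrivals = Counter(
        onset + line[letter]              # firing time plus conduction delay
        for onset, letter in enumerate(stimulus)
        if letter in line                 # no connection means no signal
    )
    return max(arrivals.values(), default=0)

def recognise(stimulus, lexicon=WORDS):
    """Pick the word unit that receives the most synchronous input."""
    return max(lexicon, key=lambda word: activation(word, stimulus))

for word in WORDS:
    print(word, activation(word, "SPOT"))
print(recognise("SPOT"))
```

Under these assumptions, the stimulus SPOT delivers all four letter signals to the SPOT unit at the same time step, whereas STOP, POST, and TOPS contain exactly the same letters but receive them spread over different time steps. Letter identity alone cannot separate these anagrams; the timing carried by the delay lines does.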

Top-left panel: Pattern of connectivity for the letter units O, P, S, T and the word units POST, SOAP, SPOT, STOP, and TOPS. The width of the lines connecting letter and word units represents the degree of myelination. Top-right panel: Connections from the P letter unit are highlighted. Note that the same P unit is connected to each of the word units shown, even though the P occurs in different positions (1, 2, 3, or 4) in the different words. The thickest connections (greatest myelination) terminate at the words (STOP and SOAP) that end with this letter, and the thinnest connection terminates at the word (POST) beginning with this letter. Bottom-left panel: Connections to the SPOT word unit are highlighted. Note that the thickest connection is associated with the last letter of the word and the thinnest connection with the first letter. This pattern ensures that temporal differences due to the order of firing are countered by corresponding differences in the speed of transmission, resulting in temporally synchronous arrival of the signals at the word unit when the stimulus is “SPOT”. Bottom-right panel: A different pattern of delay lines codes for the order of letters in the word TOPS.

Relevant Background Publications to ERC Project

For a full list of publications, please visit Jeff Bowers's personal website.