BrainPlay - The self-teaching brain

A novel approach to the mysteries of biological learning

Rather than investing in more powerful muscles, better sensors, or body plans that could take them into the air, our ancestors grew big, expensive brains. To be useful at all, each of these brains has to configure itself at the beginning of (and throughout) its life. Indeed, we and our phylogenetic relatives are bound to invest enormous resources to allow our offspring to configure their brains before they can survive without us. Our research vision rests on the conviction that the success story of big brains (which did, eventually, take us to the skies and the moon) is tightly linked to the co-evolution of learning algorithms that allow these incredibly universal and flexible machines to find the right set of parameters within the vast space of possible settings that control their function and behavior. We believe that these learning algorithms are so fundamental to the evolutionary viability of the mammalian big brain project that they are built deeply into its fabric and connect across all of its levels: from synapses to neurons, from neurons to systems, from systems to cognition, and from cognition to behavior. Given the severity of the challenge, we cannot imagine that our ancestors would not have evolved a dedicated set of behaviors specifically directed at configuring our brains and facilitating learning. We believe that play is one of these behavioral repertoires, perhaps the most crucial one. We think that current endeavours to understand biological learning are stalled because they approach learning from the perspective of supervised machine learning, i.e. the formation of externally controlled stimulus-response contingencies. Instead, we suggest that, if allowed, mammalian brains generate unique patterns of behavior, associated with specific brain states, that serve, instruct, and improve the learning performance of the brain. We propose to study learning in the context of these specific behaviors.
In this research proposal we outline our mission to uncover the learning algorithms that subserve biological intelligence in the context of the learning behavior of play.

New teaching signals and self-supervised learning 

How can learning in neural systems bridge between the fast time scales of neural information processing and the slow time scales of behavior? This fundamental “temporal credit assignment problem” (Hull, 1943; Sutton, 1988; Izhikevich, 2007) has long placed severe limitations on the performance of learning in neural network models and on their biological validity. Recently, one of our PIs (Robert Gütig) discovered a candidate solution to this problem with dramatic consequences for biological learning. This solution is based on three breakthroughs: the spike-threshold-surface, aggregate-labels, and self-supervised learning (Gütig, 2016). Together they promise to reorient the research fields of synaptic plasticity and teaching signals and herald the new field of self-supervised learning.

Why play? The mysterious gain of brain function from play

Loss-of-function analyses of play behavior have failed. For decades, biologists have tried to determine the function of play by so-called play deprivation experiments. However, the effects of such play deprivation are in many instances very subtle. The most obvious behavioral deficit from play deprivation was a loss in the ability to play. A turning point in the analysis of play effects came when one of our PIs (Daphne Bavelier) started focusing on gains of brain function associated with intense playing. This research approach, firmly grounded in quantitative psychophysics, revealed benefits of intense playing on a wide variety of cognitive functions. Thus, within a few years the question about the function of play changed from “what is it good for?” to “how can play possibly be so beneficial for the brain?” This is the question we try to answer.

A neglected problem in neuroscience and computation

Play has been of interest to many psychologists and biologists. Such interest should not blind us to the fact that neuroscience has so far failed to uncover the neural mechanisms of playful learning. We do not know what is happening in the brains of playing animals. We have collected data about neural activity in visual cortex in tens of thousands of studies, but not a single one describes visual cortical activity during play. Numerous studies have investigated learning in operant conditioning tasks, but there is a dearth of neural data on playful learning. The same holds for computational analyses of learning. The recent success of brute-force deep learning approaches that rely on our most powerful computer technologies and gigantic labeled data sets obviously bears little resemblance to the swift and easily transferable learning occurring in playing brains.

Objective Functions

To subserve learning, synaptic plasticity must change how neurons respond to inputs. By contrast, mainstream research on synaptic plasticity looks in the opposite direction and studies how postsynaptic activity takes part in shaping synaptic plasticity. These studies have been very successful in dissecting the mechanistic underpinnings of synaptic changes (Martin et al., 2000). In addition, they have been highly productive in generating a plethora of phenomenological synaptic learning rules, mostly one (in unfortunate cases also more than one) rule per induction protocol (Korte & Schmitz, 2016). However, these approaches are ill-suited to answer what it is about its postsynaptic response that a neuron is trying to change when engaging synaptic plasticity. To address this fundamental question, we disconnect from the established lines of research and align ourselves with the perspective of the brain when required to change its function. Specifically, we will measure the objective functions that govern synaptic plasticity. The nonlinear spike-generating mechanisms that map postsynaptic voltage traces to discrete spike trains have prevented such measurements in the past. Here, we embrace a recently developed measure of postsynaptic responses, the spike-threshold-surface, that provides the basis for continuous objective functions for learning of discrete spiking responses (Gütig, 2016). Given a pattern of presynaptic activity, the spike-threshold-surface quantifies how far a neuron is away from generating different numbers of output spikes. Our goal is to measure how synaptic plasticity changes these distances (Fig. 1).

Fig. 1: (A) Multicompartmental neuron model (Armatrudo et al., 2012). (B) Progression of voltage traces from cool to warm colors during operation of spike-timing-dependent plasticity. (C) Although the number of spikes remains constant (4 spikes), the spike-threshold-surfaces before (blue) and after (red) plasticity reveal an increased plateau width, i.e. margin, around 4 spikes (dashed vertical line). Large-margin classification is crucial for generalization and is among the most important breakthroughs in machine learning. Solid lines: synaptic stimulation; dotted lines: dynamic clamp.
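
As an illustration only, the idea of a spike-threshold-surface can be sketched in a few lines of Python. This is not the model or the method of Gütig (2016). It reads the surface as the number of output spikes a neuron would emit for one fixed input pattern as a function of a hypothetical firing threshold, which is one interpretation of the description above. The toy leaky integrate-and-fire neuron, its parameters (`tau`, `dt`), and the function names `spike_count`, `spike_threshold_surface`, and `plateau_width` are all assumptions made for this sketch.

```python
import math


def spike_count(weights, inputs, threshold, tau=20.0, dt=1.0):
    """Count output spikes of a toy leaky integrate-and-fire neuron.

    weights: one synaptic weight per input channel
    inputs:  inputs[i][t] is 1 if channel i spikes at time step t, else 0
    The neuron model and its parameters are illustrative assumptions.
    """
    decay = math.exp(-dt / tau)  # per-step leak of the membrane voltage
    n_steps = len(inputs[0])
    v, count = 0.0, 0
    for t in range(n_steps):
        drive = sum(w * channel[t] for w, channel in zip(weights, inputs))
        v = v * decay + drive
        if v >= threshold:
            count += 1
            v = 0.0  # reset the voltage after each output spike
    return count


def spike_threshold_surface(weights, inputs, thresholds):
    """Spike count for the same input pattern at each candidate threshold."""
    return [spike_count(weights, inputs, th) for th in thresholds]


def plateau_width(thresholds, counts, k):
    """Span of sampled thresholds at which the neuron emits exactly k spikes.

    This is a grid approximation of the plateau width (margin) around k.
    """
    hits = [th for th, c in zip(thresholds, counts) if c == k]
    return max(hits) - min(hits) if hits else 0.0
```

In this reading, the gap between the neuron's actual threshold and the edges of the plateau it sits on is a rough proxy for how far the neuron is from emitting one spike more or one spike fewer. Comparing `plateau_width` before and after a weight change mimics, in a very crude way, the margin comparison shown in panel (C).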

Teaching signals
How do neurons teach neurons?

Aggregate-label learning. For many decades, research on learning in neural networks, in particular with spiking neurons, has been obscured by the dogma that learning requires temporally precise teaching signals that instruct each neuron at which points in time it should modify its responses. We now know that this dogma is wrong. We have discovered that models of spiking neurons can perform previously unimaginable learning tasks without any time-resolved teaching signals (Gütig, 2016). Even binary teaching signals without temporal information, “aggregate-labels” that merely instruct neurons to elicit fewer or more output spikes, enable neurons to discover reward-predictive features within long streams of noise. Overcoming the dogma of temporally precise teaching signals, i.e. solving the fundamental temporal credit assignment problem (Sutton, 1988), has opened a vast field of new possibilities for how learning neurons can interact with the external world and how neurons can teach neurons.
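
To make the notion of a teaching signal without temporal information concrete, here is a minimal Python sketch. It only shows the shape of such a signal: a single value per trial that says "more spikes", "fewer spikes", or "no change", with no reference to when the spikes should occur. The function names and the weight update are hypothetical. In particular, `update_weights` is a deliberately simplified placeholder and is not the learning rule of Gütig (2016).

```python
def aggregate_label(n_out, n_target):
    """Teaching signal that carries no timing information.

    Returns +1 if the neuron should emit more spikes, -1 if it should emit
    fewer, and 0 if its spike count already matches the target.
    """
    return (n_target > n_out) - (n_target < n_out)


def update_weights(weights, presyn_counts, label, lr=0.01):
    """Hypothetical placeholder update, for illustration only.

    Each weight is nudged in the direction given by the label, scaled by how
    many spikes its input channel emitted during the trial. Only spike counts
    are used, never spike times.
    """
    return [w + lr * label * c for w, c in zip(weights, presyn_counts)]
```

The point of the sketch is the interface: the learner receives one scalar per trial, and everything about which moments within the trial mattered has to be worked out by the neuron itself.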

Fig. 2: (A) Network architecture. Grey: input layer; blue: student layer; red: teacher unit. (B) Voltage traces of 10 self-supervised student neurons after learning. Despite receiving no external feedback, all student neurons discover and learn to respond to the blue input feature. Modified from Gütig (2016).



Self-supervised learning

In a first step into this field of possibilities, we have discovered the concept of neural “self-supervision”: Feeding back simple projections of neural activity, such as its mean, as a teaching signal to groups of neurons allows simple neural networks to discover distributed ensembles of features even if their occurrences are rare and widely distributed across space and time (Fig. 2). In this part of the project we will test the neurobiological implementations of aggregate-label learning and neural self-supervision and hunt for their traces in human and animal learning behaviors. This part of our project also has a crucial theoretical component, within which we will combine and extend our theoretical advances to realize supervised and unsupervised learning, side-by-side, within deep multi-layer networks of spiking neurons.
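
The feedback loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the network of Gütig (2016): the function name `self_supervised_labels` is hypothetical, and the choice to derive each neuron's label by comparing its own spike count with the group mean is an assumption made for this sketch.

```python
def self_supervised_labels(spike_counts):
    """Derive per-neuron teaching signals from the group's own activity.

    The mean spike count of the group plays the role of the teacher. Each
    neuron is told to fire more (+1) if it is below the mean, fewer (-1) if
    it is above, and to stay unchanged (0) if it matches the mean.
    No external label enters the computation.
    """
    mean = sum(spike_counts) / len(spike_counts)
    return [(mean > n) - (mean < n) for n in spike_counts]
```

Note that in this naive form a group in which all neurons fire equally often, including a completely silent one, receives only zero labels and stops changing, so a practical scheme would need an additional ingredient to avoid such trivial states.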

Play as a self-teaching behaviour


Fig. 3: Beneficial effects of video action games on visual performance.
(A) Scene from the video action game Counter-Strike (modified from Wikipedia). (B) Visual localization performance of video-game-players (black squares) and non-video-game-players (open circles) (modified from Green & Bavelier, 2003). By all standards, the psychophysical performance improvements from video game playing are astounding. The BrainPlay grant seeks to explore the neural basis of such effects.


Play behavior and the brain

While it is easy to recognize fitness advantages of other evolutionarily conserved behaviors such as sexual behaviors, maternal behaviors or antagonistic behaviors, such advantages are less obvious for play behavior. A central idea driving our research is that play evolved as a brain self-teaching mechanism. In line with this idea, juveniles play much more than adults, there is a positive correlation between brain size and playfulness (Iwaniuk et al. 2001), and there is an absolutely unmatched elaboration of human play. What is play? A key element of play is that it is “not serious outside of ordinary life” (Huizinga, 1949). Play behavior transposes ordinary activities to a play context. Play behaviors are modified from their real-life counterparts; for example, play attacks differ in systematic ways from serious fighting (Pellis & Pellis, 1987). Laughter might have evolved to communicate joy in ‘play’ (Grammer & Eibl-Eibesfeldt, 1990).

Deficits from play-deprivation and brain gains from play

It is very difficult to deprive animals of play in a meaningful way. Work in which rats were raised not in social isolation but in the company of non-playful adults suggested that play experience results in increased resilience to social adversity. Still, the study of play deprivation has led to only a limited set of findings. A different route to study play was taken by Bavelier and colleagues when they studied the benefits of increased play experience in computer games. This analysis revealed a host of cognitive benefits from gaming (Green & Bavelier, 2003). Such cognitive benefits might be key to understanding play (Fig. 3).

Play and its neural mechanisms are an outlawed topic


At the neuronal level, play is one of the least studied forms of all brain activity. The absence of data reflects deeply rooted research biases against play. In the sixteenth century, play was prohibited under the rule of Calvin in Geneva. Behaviorism, with its focus on stimulus-reward contingencies, was profoundly unable to cope with the purposelessness of play. Modern neuroscience is obsessed with behavioral control, which collides with the study of play.

A multidimensional neurobiological analysis of the state of play

We assume that the power and effortlessness of playful learning are related to a particular internal state of the subject. We speculate that this “state of play” is the decisive variable that accounts for the beneficial effects of playing on cognition and that understanding it will be a fundamental advance in understanding biological learning. In order to investigate the state of play, the Brecht group has pioneered the neural analysis of heterospecific play (Fig. 4).


Fig. 4: Heterospecific play and neuronal recordings during playful activity

(A) A human-rat heterospecific tickling interaction. (B) A human-rat chasing-hand game. (C) Neuronal recording from a single neuron in the deep layer of the trunk region of rat somatosensory cortex. Panels (A-C) are adapted from Ishiyama & Brecht (2016). This cell responds intensely to tickling and touch, but is also active during play (chasing hand), when the animal is not touched. The origin of such non-sensory play-evoked activity in sensory cortex is a riddle that BrainPlay will explore.

Mood, social factors and sensory cues shaping play


Playfulness is strongly affected by mental variables such as mood. We do not play when we are in a bad mood or when we are afraid. The Bavelier and Brecht groups will manipulate mood in animals and humans and quantify the effects on playfulness and neural activity. Playfulness is also strongly affected by social variables, both in human and in animal behavior. Playing by yourself is less fun than playing together. The gaming industry is spending billions on designing games, suggesting that fine sensory cues play a decisive role in engaging the playing brain.

Determining neuromodulator regimes and brain activity during play

The analysis of neuromodulator regimes associated with the state of play will be a core part of our research. Neuromodulator action is a prime candidate mechanism that might initiate a “state of play” brain state and trigger the enhanced synaptic plasticity that goes along with it. To quantify neuromodulator action, we will measure axonal activity in neuromodulator systems in which neuromodulator fibres have been transfected with genetically encoded calcium indicators. Alternatively, we will visualize neuromodulators directly with genetically encoded CNiFER molecules. We will measure patterns of neural activity during play in humans (the Bavelier group using fMRI) and in animals (the Brecht group using tetrodes).

Understanding the difference between play and reality

A profound behavioral distinction is the one between play and reality. The play context is associated with positive emotional signals such as laughter, and in this context play-attacks evoke laughter and non-violent defenses. Shooting somebody is fun in gaming, whereas the same action in reality can cause trauma-related disorders. We confront play and reality with different mindsets and brain states. Hence, we need to understand the differences in brain activity associated with play and reality.

The consequences of state of play for synaptic plasticity

We will induce neuromodulator regimes and neural activity patterns as measured in vivo and then prepare brain slices of brain structures involved in play, e.g. the somatosensory cortex. We will then measure synaptic plasticity. We will also explore the pharmacology of the state of play and the effect of drugs such as Ritalin, which is widely prescribed to hyperactive children, on play-related neural activity.