Common Principles in Learning from Bees through to Humans

Although bees are separated from humans by about 600 million years, with a common ancestor that had only a rudimentary nervous system, they still share over 60% of our genome. Any learning principles observed in both bees and humans may consequently be either basal or may have evolved in parallel because of their efficiency. While the universality of associative learning across the animal kingdom is well established, recent advances in the study of honeybee cognition further our understanding of shared mechanisms. Honeybees demonstrate the ability to prioritise information depending on context and the cost associated with making errors. Individual bees show evidence of using different heuristic approaches to solve complex tasks, and maintaining a diversity of cognitive strategies is probably also highly adaptive for group success. Bees can learn key numerosity abilities that humans acquire at school, such as the ability to add and subtract, to understand the concept of zero, and to link symbols with specific numbers of items. Such knowledge of cognitive-like processing in bees serves as a source of inspiration for a better understanding of the biological roots of our intelligence, and may help shape educational theories or strategies to improve the efficiency of artificial intelligence.

VIDEO JOURNAL OF EDUCATION AND PEDAGOGY (2019) 1-18
How we learn, and whether different individuals learn in the same way, is a key topic for understanding how to improve educational outcomes. For example, the idea that different psychological processes enable different symbolic representations, which may set the basis of the multiple intelligence (MI) theories (Gardner and Hatch 1989; Gray 2010), challenges the classic Piagetian view of human intelligence: MI proposes that the human mind is 'designed' in modules, each one controlling a different aspect of symbol, whilst Piaget proposed that all these aspects are controlled by a single symbolic function (Gardner and Hatch 1989; but see Morgan 1996). Critical thinkers in education have appreciated for some time that different modalities like sight or sound can promote optimal learning in different people depending on their particular cognitive styles (Mayer and Moreno 1998). Understanding human learning is a vital and pressing issue worldwide, but learning is not restricted to humans (Leadbeater and Chittka 2007). To better understand learning theory it is therefore useful to consider alternative animal models, to uncover how sensory systems and brains have evolved to promote efficient learning in a natural world.
Our current understanding of how associative learning operates was informed by Pavlov's classical conditioning experiments in dogs (Pavlov 1927), which demonstrated that when a conditioned stimulus (e.g. a bell sound that does not evoke an innate behavioural response) is paired with an unconditioned stimulus (e.g. food, which promotes salivation), an association is formed such that when the bell is subsequently sounded in isolation it elicits the behavioural response of salivation. This associative learning framework became a classical model of learning, as shown by numerous popular culture references to Pavlovian conditioning, such as how the Gringotts dragon is conditioned in the fictional book and movie "Harry Potter and the Deathly Hallows" (Rowling 2007), or inferences about how behavioural conditioning influences individuals in modern society in the dystopian crime book and movie "A Clockwork Orange" (Burgess 1962). A deeper understanding of the neurobiological mechanisms of associative learning was made possible by the reductionist approach of using simple animal models, pioneered by the Nobel Laureate Eric Kandel. Kandel used Aplysia californica, a sea slug with a limited number of easily accessible neurons, to study how connections between neurons enable learning. He showed that synchronised, repeated stimulation of neural cells due to the simultaneous perception of the conditioned and unconditioned stimuli induces a synaptic plasticity that underpins Hebbian theories of learning (Bi and Poo 1998; Markram et al. 1997; Kandel 2006). Thus, animal models from dogs to sea slugs have been, and continue to be, of high value in understanding learning theories.
Beyond deciphering the basic learning mechanisms, animal studies could also provide inspirational information to improve human pedagogy.
The honeybee has emerged as an important model for understanding learning in an animal whose brain has fewer than a million neurons (Weinstock et al. 2006). In the commentary "From hive minds to humans", associated with the publication of the bee genome, it was postulated that sociogenomics can use bees and other insects to explore the basis of environment and behaviour (Check 2006). The possibility of observing learning phenomena in bees traces back to the seminal work of Karl von Frisch, who demonstrated that bees were capable of both instinctual and learnt tasks (von Frisch 1914; von Frisch 1967). In his experiments, von Frisch trained honeybees to visit a blue coloured card by offering a reward in the form of a sucrose solution. Then, in subsequent non-rewarded tests, he showed that the bees correctly chose the blue colour amongst many achromatic distractor stimuli. Von Frisch (1967) also demonstrated in a series of elegant experiments that honeybees use a symbolic dance language to communicate the location of rewarding flowers to conspecifics, for which he won a Nobel Prize in 1973. A particularly attractive feature of bees as a model animal for learning is that individual bees collect nutrition not only for their own benefit but for the entire hive (von Frisch 1967). Thus, when an individual bee becomes fully satiated in an experiment, it simply flies home, offloads the collected sucrose to hive-mates, and a few minutes later returns to participate in the experiment. This feature of the bee lifestyle enables long-lasting experiments with multiple training trials without any decay of motivation (Dyer 2012; Srinivasan 2014).
Thus, bees can be trained to associate one stimulus with a sucrose reward while avoiding an unrewarded alternative via Pavlovian associative learning. The classical theoretical model of associative learning (Rescorla and Wagner 1972) postulates that learning performance depends on the strength of the reinforcement, the saliency of the objects/stimuli to be learnt, and the number of repeated pairings of the stimuli with the reinforcement. These basic properties of learning are universal (Pearce 2013) and have also been demonstrated in invertebrates, including honeybees (Giurfa 2013). The bees' performance can thus be improved by pairing the alternative stimulus with a negative reinforcement (a bitter quinine solution) in addition to providing a reward (sucrose solution) for a correct choice of the target stimulus (Avargues-Weber et al. 2010). While the reward obtained (positive reinforcement) is kept constant, adding a negative reinforcement increases the difference in reinforcement between a correct and an incorrect choice, with a direct effect on performance. This has fundamental implications when determining the perceptual and cognitive capacities of a species (Avargues and Giurfa 2014). A task that may appear too difficult can be solved when the reinforcement is increased. Importantly, while the strength of the reinforcement acts at the basic level of a neuronal association between stimuli, as shown in Aplysia in very simple situations, the level of reinforcement could also act at a higher cognitive level in more complex animals such as bees by modulating attention and the time invested in the task (Chittka et al. 2003; Avargues-Weber et al. 2010; Avargues and Giurfa 2014).
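The dependence of learning on reinforcement strength, stimulus saliency and number of pairings can be illustrated with a minimal sketch of the Rescorla-Wagner update, V ← V + αβ(λ − V). All parameter values below are hypothetical and are not fitted to bee data. Representing quinine as a negative asymptote (λ = −1) is a simplifying assumption made here for illustration only.

```python
def rescorla_wagner(n_trials, salience, us_rate, asymptote, v0=0.0):
    """Associative strength after each pairing.

    salience  : alpha, saliency of the conditioned stimulus (0..1)
    us_rate   : beta, learning rate tied to the reinforcer (0..1)
    asymptote : lambda, maximum strength the reinforcer supports
    """
    v = v0
    history = []
    for _ in range(n_trials):
        # Update is proportional to the prediction error (asymptote - v).
        v += salience * us_rate * (asymptote - v)
        history.append(v)
    return history


# Hypothetical values chosen only to show the qualitative effect.
target = rescorla_wagner(20, salience=0.5, us_rate=0.4, asymptote=1.0)       # sucrose
plain_alt = rescorla_wagner(20, salience=0.5, us_rate=0.4, asymptote=0.0)    # no outcome
quinine_alt = rescorla_wagner(20, salience=0.5, us_rate=0.4, asymptote=-1.0) # assumed aversive

# Difference in learnt value between the target and the alternative stimulus.
gap_appetitive = target[-1] - plain_alt[-1]
gap_aversive = target[-1] - quinine_alt[-1]
```

In this toy setting the gap between target and alternative is larger when the alternative is paired with an aversive outcome, which mirrors the increased difference in reinforcement described above. It does not capture the attentional effects that the text attributes to bees.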
Evidence of a modulation of perceptual abilities, and thus attention, after priming the bees to specific cues with reinforcement has also been observed in free-flying bees. Zhang and Srinivasan (1994) tested bees on their capacity to discriminate a raised disc or a ring camouflaged with a random Julesz pattern on a similar Julesz pattern background. An initial test group of bees failed this test. However, when a second test group of bees was first presented with salient non-camouflaged stimuli to learn, these bees were subsequently able to learn how to solve the more complex camouflaged disc problem (Zhang and Srinivasan 1994; Zhang et al. 2004). Thus, in insects, prior experience enhanced learning of a complex problem. Similarly, honeybees also shifted their attention from a global perception to focus on the details of complex pictures after being reinforced with pictures containing only one single element, confirming that attention can be modulated via prior experience (Avargues-Weber et al. 2015).
This capacity to use prior experience to solve novel problems is similar to how we as humans apply learning to solve problems. For example, the use of CAPTCHA tests (Completely Automated Public Turing test to tell Computers and Humans Apart) has become commonplace to ensure that it is a human, not a robot, interacting with an online environment (von Ahn et al. 2003; von Ahn et al. 2004). Figure 1 shows an image of a typical natural scene displayed as 16 separate panels. Although it takes a little time (typically less than 10 s), humans can correctly categorise panels containing flowers (in which case there are 14 panels) or flower petals (in which case there are only 9 panels) by using prior experience to categorise the respective image criteria. Computers are poor at quickly solving this type of visual problem.
The human brain is able to prioritise decision-making depending upon criteria (like not failing a CAPTCHA test and being locked out of a computer environment). In a perceptually difficult task there is frequently a speed-accuracy trade-off, where some people are relatively slow and accurate while other individuals are faster but less accurate. Such an effect is observed in young children as well as adults (Plamondon and Alimi 1997; Rival et al. 2003). In primates these behavioural choices are likely mediated by frontal cortical processing (Heitz and Schall 2012), but it is also interesting to understand how bees might be capable of modulating their decision-making behaviour depending on the context. In bumblebees (Chittka et al. 2003) and honeybees (Burns and Dyer 2008) there is evidence of both speed-accuracy trade-offs and changes in performance when the cost associated with making errors is increased by employing the appetitive-aversive conditioning framework described above (Chittka et al. 2003).
The perceptual difficulty of discriminating between correct and incorrect choices also affects the level of learning accuracy in bees, and the response time to make decisions, at both the group and the individual level (Dyer and Chittka 2004). It is now appreciated that speed-accuracy trade-offs are a common principle in how animals solve problems, and that a variety of speed-accuracy strategies between individuals is optimal for group or colony survival within complex environments (Burns and Dyer 2008). Thus, comparative studies teach us that differences between individuals have likely evolved for a reason, and nurturing these differences can improve outcomes in complex conditions (Dyer et al. 2014). Indeed, in modern human society different thinking strategies appear to be beneficial in particular scenarios (Tarter and Hoy 1998).

In their influential paper "Are bigger brains better?", Chittka and Niven (2009) reviewed a growing body of evidence suggesting that bees can learn to process cognitive-like tasks including counting (Dacke and Srinivasan 2008), navigating mazes (Zhang et al. 1996), evaluating error probability (Perry and Barron 2013), or applying relational concepts such as 'same' or 'different' through Delayed-Matching To Sample (DMTS) paradigms. That review prompted a body of new research on bees, which we are now just beginning to understand within the context of classical learning theories.
The learning paradigm of DMTS requires the integrated use of both short- and long-term memory phases together with an ability for abstraction. DMTS requires an individual observer to see (or sense) a particular stimulus as a prompt in isolation, and then subsequently use that prompt to inform a choice, again in isolation. The prompt should provide sufficient information for the individual to determine the correct stimulus option from one or more distractor options. The rule to follow thus requires long-term memory, whilst the instance of a given stimulus on a particular trial requires short-term memory to be applied with the learnt rule. Interestingly, once bees acquire a skill using DMTS learning in one sensory domain like vision, they show evidence of being able to transfer the acquired learning to a different sensory domain. For example, Giurfa et al. (2001) demonstrated that bees could learn concepts of "sameness" and/or "difference" of stimuli in a visual experiment, and transfer that understanding to an olfactory experiment without any further training. Indeed, understanding a concept of 'sameness' or 'difference' means being able to use an abstract rule that is independent of the stimuli being compared.
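The division of labour between short-term memory (the sample seen on this trial) and long-term memory (the abstract rule) can be made concrete with a small sketch of a DMTS trial. The agent here is an idealised rule follower, not a model of how bees acquire the rule. The stimulus labels and function names are hypothetical placeholders.

```python
import random


def dmts_trial(sample, options, rule):
    """One trial: the sample is seen alone, then a choice is made without it."""
    remembered = sample              # short-term memory of this trial's sample
    return rule(remembered, options)


def choose_same(remembered, options):
    """Abstract 'sameness' rule held in long-term memory; names no stimulus."""
    matches = [o for o in options if o == remembered]
    return matches[0] if matches else options[0]


def choose_different(remembered, options):
    """Abstract 'difference' rule; also independent of stimulus identity."""
    others = [o for o in options if o != remembered]
    return others[0] if others else options[0]


def accuracy(stimuli, rule, want_same=True, n_trials=40, seed=1):
    """Fraction of correct choices over randomly drawn sample/distractor pairs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        sample, distractor = rng.sample(stimuli, 2)
        options = [sample, distractor]
        rng.shuffle(options)         # randomise left/right position
        choice = dmts_trial(sample, options, rule)
        hits += (choice == sample) == want_same
    return hits / n_trials


# Placeholder labels for two sensory domains (assumptions for illustration).
visual = ["colour A", "colour B", "pattern A", "pattern B"]
odours = ["odour A", "odour B"]

visual_score = accuracy(visual, choose_same)
transfer_score = accuracy(odours, choose_same)   # same rule, new domain
```

Because neither rule refers to any particular stimulus, the same function works unchanged on the second stimulus set, which is the sense in which an abstract rule supports transfer across domains.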

The honeybee has also been a high-value model for comparative perspectives on how such an evolutionarily distant species solves numerical problems. Two independent studies established that bees can count, at least up to the number four (Chittka and Geiger 1995; Dacke and Srinivasan 2008), and another showed that bees could discriminate between quantities up to four (Gross et al. 2009). Interestingly, whole numbers of four and below are within the subitising limit and appear to be easy to process for many animals, including human children (Agrillo et al. 2008; Cowan 2010; Jevons 1871; Le Corre 2006). By employing an appetitive-aversive conditioning framework, where incorrect choices result in constructive feedback via punishment (negative reinforcement, see above), it has recently been shown that honeybees learn to use numerical information in either a relative (e.g. choosing which number is less than an alternative) or an absolute (e.g. knowing the exact value of the number two or three) fashion, depending upon conditioning (Howard et al. 2018; Bortot et al. 2019). With this experimental framework of constructive feedback for incorrect responses, bees learnt to process higher numbers and exceed the subitising limit by accurately discriminating between presentations of four or five elements (Howard et al. 2019a). In one case of relative number learning (Howard et al. 2018), bees were presented with cards containing a certain number of different shapes and always had to choose the lowest number to gain a reward, while researchers controlled for low-level spatial cues. Bees then demonstrated a spontaneous capacity to understand that a blank stimulus containing no elements (known as an empty set) was less than any other stimulus that did present elements.
Thus, with the correct learning environment, bees were able to determine that zero is quantitatively less than one and is positioned at the lowest end of a sequence of positive integers, a non-trivial concept typically learnt by humans at school (Nieder 2016). The performance that bees demonstrated in this study (Howard et al. 2018) is consistent with how humans understand 'zero-like concepts' at a quantitative level, which is the third stage of understanding 'zero' out of a four-stage process (Nieder 2016). This account of a comprehensive understanding of 'zero' emerging through four stages has been developed through the study of human history, developmental psychology, animal cognition, and neurophysiology (Nieder 2016).
In other recent experiments, bees were trained and tested on their ability to perform simple addition and subtraction problems. Using a DMTS procedure, individual bees were presented with a sample quantity ranging from 1 to 5, to which they had to either add one or subtract one. The task they had to complete depended on the colour of the initial quantity shown.
Video 1. A representation of a bee learning this task.
If the shapes on the stimulus were blue, the individually trained bees needed to perform addition; if they were yellow, bees needed to perform subtraction. Over the course of 100 learning trials, bees learnt to both add and subtract based upon the colour presented to them. Furthermore, bees were able to extrapolate addition and subtraction to a novel quantity (Howard et al. 2019b). The aggregated learning results, when averaged over all bees in the trial, showed approximately linear improvement with the number of trials. Although this improvement in success clearly demonstrated learning of the numerical task, we wanted to know how the bees achieved this success. We had expected that the bees would demonstrate some kind of "aha!" moment, a point where their ability to perform the addition/subtraction task shifted from approximately chance level to a very high level of accuracy. The aha! moment, a sudden comprehension of a situation which leads to a solution (Sternberg and Davidson 1995), is well studied in human learning tasks (Kounios and Beeman 2009) and indicates that the underlying rules governing the task are understood and can be applied to the problem at hand. It was possible that the smooth increase in success resulted from each individual bee having its aha! moment at some semi-random time within the training, so that averaging would smooth out the individual performances. To explore the learning process, we developed a Bayesian analysis in which we used a moving window of 10 trials and determined the probability that a particular learning outcome was consistent with any given success rate. The mode of this distribution corresponds to the regular moving average of the number of successes, but the advantage of the Bayesian technique is that it also gives us a measure of the variation in the likely success probability.
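A moving-window analysis of this kind can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the analysis code used in the study: it assumes a binomial likelihood for the outcomes in each window, a flat prior over the success probability, and evaluation on a discrete grid. The function names and the example trial sequence are hypothetical.

```python
from math import comb


def window_posterior(outcomes, grid_size=101):
    """Posterior over success probability p for one window of 0/1 outcomes.

    Assumes a binomial likelihood and a flat prior, evaluated on a grid.
    """
    n, k = len(outcomes), sum(outcomes)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [comb(n, k) * p ** k * (1 - p) ** (n - k) for p in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


def summarise(grid, posterior, mass=0.95):
    """Posterior mode plus a central interval holding about `mass` probability."""
    mode = grid[max(range(len(posterior)), key=posterior.__getitem__)]
    tail = (1 - mass) / 2
    cumulative, lower, upper = 0.0, None, None
    for p, w in zip(grid, posterior):
        cumulative += w
        if lower is None and cumulative >= tail:
            lower = p
        if upper is None and cumulative >= 1 - tail:
            upper = p
    return mode, lower, upper


def moving_window_analysis(trials, window=10):
    """Slide a fixed-size window over one bee's trial record (1 = correct)."""
    results = []
    for start in range(len(trials) - window + 1):
        grid, posterior = window_posterior(trials[start:start + window])
        results.append(summarise(grid, posterior))
    return results


# Hypothetical record of 12 choices for a single bee.
example_trials = [0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1]
example_results = moving_window_analysis(example_trials, window=10)
```

With a flat prior the posterior mode in each window equals the fraction of correct choices, matching the ordinary moving average, while the interval width indicates how uncertain that estimate is for a window of only 10 trials.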
Our analysis of the learning rate of individual bees showed that there was no single 'aha' moment where bees consistently learnt the problem (Howard et al. 2019c). Instead, individual bees often showed large swings in performance, with several individuals showing performance significantly below average (i.e. worse than chance level). Such results are interesting and are to be expected if each bee was attempting to apply incomplete heuristics to solve the task, or if they were trying different strategies in succession.
One last numerical skill that has recently been tested in bees is the ability to link a numerosity with a symbol. The ability to represent numerosity values with a symbol is first known to have been developed by the Greek mathematician Diophantus during the 3rd century, and became a key development in the widespread use of mathematics (Cooke 1997). Interestingly, it has also been shown that pigeons can match symbols to numerosity values (Xia et al. 2000). It was thus an important question whether honeybees could learn this type of representation, which has been so important to how we use mathematics. When honeybees were tested in a DMTS-type task, they successfully learnt that a symbol can represent a number. Bees were able to link a symbol, such as an N-shape or an inverted T-shape, to the quantities of two or three, respectively (Figure 2). However, if bees learnt the problem in one direction, they failed to correctly reverse the relationship. For example, if bees had learnt to match the N-shape to two elements, they were unable to match the quantity of two elements to the N-shape without training on that specific task. Thus, independent groups of bees could learn the association in either direction, but the learnt association could not be reversed without specific experience.
If we consider Piaget's theories on schemas in human learning, where certain types of information have to be processed in a certain order to enable success, or the alternative MI theory that a brain has modules, each one controlling a different aspect of symbol (Gardner and Hatch 1989; Gray 2010), the data from studying bees' symbol use for numerosity processing appear more consistent with the MI theory because the symbolic representation could not be reversed (Howard et al. 2019d). Indeed, bees in DMTS tasks can transfer processing between visual and olfactory domains, and neuroanatomical and electrophysiological evidence does reveal separate storage areas and processing pathways in the bee brain for these respective sensory modalities. However, fully understanding the extent to which information processing in bee brains can inform different learning theories will require more experiments on a wider range of tasks used in human studies.
Figure 2. (a) In the symbol-to-numerosity-matching task, when bees view a sample sign (N-shape or inverted T-shape), they must match it to the correct quantity of two or three elements. (b) In the numerosity-to-sign-matching task, when bees view a sample quantity (two or three elements), they must match it to the correct sign (N-shape or inverted T-shape). The entrance hole and wall into the first chamber are not visible in this diagram. (c) An example of the signs being matched to their corresponding correct quantity (N-shape to two elements; inverted T-shape to three elements).

Given what we have recently learned from bees and numerosity processing, it is interesting to consider learning in young children for analogous types of task. Whilst it is difficult to understand human learning differences because of our very different individual backgrounds and experimental access, eye tracking does allow us to map what a subject was viewing during a learning or evaluation stage. This is possible because attention is linked to where we move our eyes: to see information in detail, like reading the words on this page, humans require the central, high-resolution foveal region to fixate on regions of interest (Dyer and Pink 2015). Children aged 9-11 years with Mathematical Learning Difficulties, when doing a mathematical task of number line estimation, differed significantly from a control group, and there was also evidence of differences in estimation strategies and adaptability between the groups (van't Noordende et al. 2016). Eye tracking of children with dyscalculia suggests that the lack of a capacity for parallel subitising of small numbers, and thus the need to count individual elements, may explain numerosity processing differences in learning (Moeller et al. 2009). The recent findings that humans, bees and some other animals process numbers in a similar fashion suggest that numerosity processing may be linked to evolutionarily conserved mechanisms (Giurfa 2019). However, it is also possible that numerosity processing evolved independently in vertebrates and invertebrates. Understanding and appreciating these principles is likely to be of value in adapting how learning theories can be best customised to suit the heuristic learning requirements of modern education. Indeed, such outcomes can extend beyond human and/or bee learning.
New artificial intelligence techniques like deep learning can draw inspiration from how animals learn, enabling efficient solutions for complex problems like faster processing of photographic images (Yohanandan et al. 2018). A learning model based on how bees acquired the capacity to understand zero has recently been developed and shows that a bio-inspired single spiking neuron can learn relational number rules and a capacity to process zero, although AI deep learning still takes many more trials than bees to learn the task (Rapp et al. 2019).
We conclude that the importance of individual variability in learning strategies and speed appears to be a general biological principle, which should be considered when forming decision-making groups or adapting educational principles. Studies involving animals as distant from humans as bees could consequently inspire a better understanding of the biological roots of our intelligence and of how we best learn. Indeed, it is important to apprehend how these universal properties of learning have shaped and constrained our cultural brain development, allowing us to efficiently solve many complex problems that remain challenging for current AI solutions.