Information and the Umwelt: A theoretical framework for the evolution of play

Play is phylogenetically widespread, and there are many proposed theories and fitness benefits of play. However, we still need a theoretical framework that unifies our understanding of the benefits that facilitated the evolution of play in so many diverse species. Starting with von Uexküll's theory of the Umwelt (i.e., the sensory-motor worlds of animals), together with the behavior systems approach, we propose that the Umwelt is an information processing system that serves basic biological functions. During development, the Umwelt undergoes a rapid expansion in the sensory and motor stimuli it processes. We argue that play is a process that converts surplus resources into information. By increasing the information content of the developing Umwelt, play confers fitness benefits. To demonstrate that play could evolve based on its information benefits, we present a model and simulation results of the evolution of a social play learning process that provides fitness-enhancing information in adult cooperative and competitive situations. Finally, we discuss this information-theoretic framework in relation to proposed hypotheses and fitness benefits of play.


Introduction
Play is common in mammals (Graham and Burghardt, 2010; Sharpe, 2018; Marley, 2022) and many birds (Ortega and Bekoff, 1987; Diamond and Bond, 2003; Kaplan, 2020), but it is also found in reptiles, amphibians, fish (Burghardt, 2005, 2015), and even some invertebrates (Kuba et al., 2006, 2014; Zylinski, 2015; Dona et al., 2022). From an evolutionary perspective, the phylogenetic diversity of play is puzzling because play imposes fitness costs (e.g., lost energy, injury, predation, and spread of disease; Martin, 1984; Caro, 1988, 1995; Barber, 1991; Burghardt, 2005; Sharpe, 2018; Schank et al., 2018; Smaldino et al., 2019) with no apparent benefits in many cases (Burghardt, 2005; Pellis et al., 2015; Leca, 2020; Leca and Gunst, 2023). Benefits have been proposed for some cases of play (as reviewed in Burghardt, 2005 and summarized in Table 1), but we still lack a general theoretical framework for understanding how play could evolve in such diverse species. The aim of this paper is to begin the development of a general theoretical framework for understanding and investigating the fitness benefits of play and its evolution.
Part of the challenge in developing such a framework is that play is notoriously difficult to define, especially in a way that captures its phylogenetic diversity (Burghardt, 2005; Pellegrini et al., 2007; Graham and Burghardt, 2010). Burghardt (2005) proposed five criteria for identifying play behavior that are broadly applicable and generally accepted (Lillard, 2015; Miller, 2017). These criteria specify aspects of process, structure, function, motivation, and conditions, and a pattern of behavior is identified as play only if it satisfies all five. Following Burghardt (2005), with the aspect to which each criterion applies given in parentheses, play: (i) is repeated, but not in a stereotyped manner, during some portion of an animal's development (process); (ii) lacks some aspects (e.g., it is exaggerated, awkward, precocious, or the pattern of behavior is modified in sequence or target) that would make it functional in the sense of promoting an immediate fitness benefit (structure); (iii) is not fully functional in that it does not contribute to immediate survival or fitness (function); (iv) is pleasurable, rewarding, reinforcing, or spontaneous (motivation); and (v) occurs when (a) an animal has sufficient or excess resources (e.g., an animal is well fed, healthy, and not under stress) and (b) there are no strong competing motivations, such as feeding or fear (conditions).
Of the patterns of behavior satisfying these criteria, play behavior is typically classified into three broad categories (Burghardt, 2005). Locomotor play is characterized by locomotor and rotational patterns of individual behavior. Object play is characterized by manipulative and exploratory patterns of behavior directed at one or more objects. Social play is characterized by interactive behavior patterns involving two or more individuals (e.g., chasing or wrestling). These categories of play are not mutually exclusive, and bouts of play may include combinations of behaviors from all three categories (Burghardt, 2005). Nevertheless, they can also evolve independently of each other within species and across lineages (Pellis et al., 2019).
Another challenge in developing a general framework for the fitness benefits of play is the view that not all play processes may have fitness benefits. Burghardt (2005) introduced a distinction between three types of play processes that are defined in terms of their fitness benefits. Primary process play is the most common form of play that satisfies the five criteria, but it is not viewed as resulting from the direct action of natural selection. Instead, Burghardt (2005) and Auerbach et al. (2015) have argued that primary process play is most likely to occur when an animal has surplus resources, a topic which we will return to later. Secondary process play can evolve from primary process play if it involves some fitness benefits from play, such as facilitating neurophysiological and behavioral development, and it requires the evolution of mechanisms controlling its expression (Pellis et al., 2019). Tertiary process play can evolve from secondary process play and involves critical fitness benefits to behavioral and cognitive abilities. Because tertiary process play leads to benefits in motor, social, and cognitive abilities, it requires the evolution of more general control processes, such as the learning processes that control play (Pellis et al., 2019). Thus, the fitness benefits of play only occur for secondary and tertiary process play in Burghardt's (2005) view.
A more recent distinction between types of play is simple and complex (Smaldino et al., 2019). Simple play is characterized by patterns of behavior requiring little motivation (e.g., spontaneous behavior) with few cognitive resources and investments. Simple play is typically locomotor play that involves uncomplicated explorations of physical and behavioral spaces and requires less integration of sensory, cognitive, and motor systems (Smaldino et al., 2019). Although Smaldino et al. (2019) do not directly compare simple and complex play to primary, secondary, and tertiary process play, simple play is typically primary process play, while complex play is typically tertiary play requiring more integrated motivational and cognitive systems. Between simple and complex play lies secondary process play, which characterizes the transition between simple and complex play. A general theoretical framework for understanding the evolution of play must accommodate the view that play processes differ in complexity, which has implications not only for the patterns of behavior we observe but also for the sensory, motor, cognitive, and motivational systems involved in play.
In the following sections, we develop a theoretical framework for the fitness benefits and evolution of play. While our focus in this paper is on the fitness benefits of juvenile play, adults of many species engage in play (e.g., Pellis and Iwaniuk, 2000; Antonacci et al., 2010; Lutz et al., 2019; Palagi, 2009, 2023), and as we discuss below, this framework can be applied to adult play as well. We begin by briefly reviewing theories of the proposed fitness benefits of play. Because play occurs in diverse taxa with diverse nervous, sensory, and motor systems (Burghardt, 2010; Auerbach et al., 2015; Burghardt and Pellis, 2019), a general framework is required that can span the diversity of sensory and motor systems in animals that play. Such a framework was introduced by Jacob von Uexküll (1934a, b) in his concept of the Umwelt (i.e., the perceptual-motor world of animals; see below for details). As discussed above, animal play ranges from simple to complex, and understanding play, and especially complex play, requires the notion of complex Umwelten (the German plural of Umwelt). Von Uexküll (1934a, b) introduced a distinction between simple and complex Umwelten but did not systematically articulate this distinction. We extend the notion of complex Umwelten by integrating behavior systems approaches (Timberlake and Grant, 1975; Timberlake, 1994; Timberlake and Lucas, 1989; Silva et al., 2019; Lucas, 2019; see details below) into the notion of the Umwelt.
The central principle of our framework is that early development is a period of rapid expansion in the complexity of the developing Umwelt. The flood of sensory and motor stimuli made available to the Umwelt by developing sensory and motor systems must be structured into coherent information about the world. We show that information theory (e.g., see Shannon, 1948; Jensen et al., 2013) allows us to characterize the complexity of a developing Umwelt as maximum entropy (i.e., the maximum possible uncertainty or information disorder of an Umwelt) and entropy (i.e., a measure of uncertainty or how well an Umwelt is informationally structured). We then consider the development of the Umwelt over ontogeny and how, as various sensory and motor systems develop, the maximum information entropy of the developing Umwelt rapidly increases, starting during perinatal development. This challenges young animals to reduce the entropy of their developing Umwelten. We argue that play, whether simple or complex, whether primary, secondary, or tertiary, is a process that reduces Umwelt entropy (i.e., increases information) while also expanding maximum entropy (i.e., a measure of Umwelt complexity).
To theoretically demonstrate that play could evolve by reducing Umwelt entropy, we present an agent-based model of social play in which juvenile agents can gain information about future adult cooperative and competitive social situations.We show that a social play learning process (complex play) evolves because it reduces Umwelt entropy about future adult cooperative and competitive social situations.

Fitness benefits of play
Historically, theories of play have ranged from positing no fitness benefits to play being highly beneficial. On the non-beneficial side, Spencer's (1872) surplus energy theory of play proposed that in "higher" animals (e.g., mammals), the accumulation of excess energy is released through spontaneous play behavior. Thus, play is a by-product of the release of surplus energy via spontaneous behavior. Hall's (1904) recapitulation theory of play also falls on the less beneficial side. He focused on human play and viewed play behaviors as recapitulations of evolutionarily older instinctive behaviors that are less essential to human behavior in modern societies; for example, throwing a ball, though not adaptively functional now, might be a remnant of functional throwing behaviors from hunting contexts in our evolutionary past. Interestingly, Hall did not appear to claim that all play behaviors lack adaptive function since some play may be important for developing some modern functional behaviors (Burghardt, 2005). For example, although play has little benefit for adult functional behavior, Hall viewed play as emotionally important for relieving the boredom and stress of modern human life (Burghardt, 2005).

Table 1
Types and subcategories of proposed fitness benefits of play. We distilled Burghardt's (2005) 11 categories of benefits into four major types, each with several subcategories. None of these categories are independent of each other.
Groos's (1898) instinct practice theory of play is on the other end of the fitness benefits spectrum. For Groos, instinctive behavior requires practice during development to achieve adult competence. While adult functional behaviors are partly instinctive, they require experience and practice to become fully functional; play as a juvenile provides this experience and practice. More recent theories of the benefits of play also fall on the beneficial side, likewise emphasizing the importance of experience, practice, or learning for adequate neural and behavioral development (e.g., Baldwin and Baldwin, 1977; Brownlee, 1954; Špinka et al., 2001; Pellegrini et al., 2007; Gray, 2019; Pellis et al., 2010a, 2010b; Riede et al., 2018; systematic reviews of theories and benefits of play can be found in Takhvar, 1988; Mellou, 1994; Burghardt, 2005; Elkonin, 2005; Henricks, 2015; Saracho, 2017). Burghardt's (1988, 2005; Graham and Burghardt, 2010) modern surplus resource theory attempts to systematically integrate the insights of past theories and explain when and why primary process play occurs. It frames the study of the evolution of play in the context of four conditions or contexts (energetics, ontogeny, ecology, and psychological/social), which can favor the occurrence of play by specifying the conditions and contexts under which play may have evolved (Fig. 1). Burghardt's theory accommodates the earlier distinction between primary, secondary, and tertiary process forms of play. If all four conditions (Fig. 1) are favorable, primary process play can occur, especially among the juveniles of a species, even without fitness benefits (Auerbach et al., 2015). These four conditions are also required if primary process play is to evolve into secondary or tertiary process play.
To make progress in our evolutionary understanding of play, finding a general currency for describing its benefits is critical. All types and subcategories of play benefits (Table 1) involve, directly or indirectly, sensory and motor systems, neural pathways, central nervous systems, learning, and cognition. Moreover, play occurs not only in mammals but, as mentioned earlier, also in birds (Ortega and Bekoff, 1987; Diamond and Bond, 2003), reptiles (Burghardt, 1998), and even invertebrates (Kuba et al., 2006, 2014; Zylinski, 2015; Dona et al., 2022). Thus, a theoretical framework for the fitness benefits of play should be sufficiently general to explain not only the benefits to mammals but also the benefits to species with radically different brain organizations, such as Octopus vulgaris (Gutnick et al., 2020), which also play (Kuba et al., 2006). Accordingly, our starting point in developing a general theoretical framework is Jacob von Uexküll's notion of the Umwelt, the sensorimotor world of an animal.

The Umwelt, information theory, and play
Von Uexküll (1934a, b) introduced the Umwelt as an animal's perceptual and motor world to counter the mechanistic-reductionistic view of life of the late 19th and early 20th centuries. He was not a vitalist, as he accepted a physical-chemical interpretation of life (Ziemke and Sharkey, 2001), but he did reject Darwinian explanations of the Umwelt (Feiten, 2020). Nevertheless, as a predecessor of cybernetics (Amrine, 2015; Burghardt and Bowers, 2017), von Uexküll's functional analysis of the Umwelt fits squarely into an evolutionary framework.
In the following sections, we introduce the notion of the Umwelt and show that information theory can be applied to measure the information content of the Umwelt. Next, we consider complex Umwelten, consisting of multiple perceptual-motor systems and levels of information processing required, especially for animals capable of complex play. We then turn to the development of the Umwelt during ontogeny, a period of rapid expansion of the Umwelt due to the development of sensory, motor, and information processing systems. A fundamental task of a developing animal is to obtain perceptual and motor information about its world to survive and reproduce. The information content of the Umwelt is partly of hereditary origin but also requires the developing animal to actively engage in its environment. The central thesis we develop is that play is beneficial because it increases and structures the information content of the Umwelt.

Von Uexküll's theory of the Umwelt
For von Uexküll, the primary functional components of the Umwelt are functional cycles in which an animal interacts with its environment via processing perceptual cues to motor cues (Fig. 2). To illustrate how functional cycles structure the Umwelt, he described the Umwelt of the adult female tick, which, after mating, climbs up a blade of grass or a twig and waits for a mammal to pass underneath. A succession of three functional cycles can result in a blood meal for a tick (Fig. 3). A stimulus of butyric acid released from a mammal reflexively results in a tick letting go and dropping from a twig or blade of grass (cycle 1). The tactile stimulus of landing on hair extinguishes the butyric acid stimulus and reflexively elicits running-about behavior (cycle 2). Finally, the heat stimulus from a mammal reflexively starts the boring response into the skin (cycle 3). Thus, the "… whole rich world around the tick shrinks and changes into a scanty framework consisting, in essence, of three receptor cues and three effector cues-her Umwelt" (von Uexküll, 1934a, p. 12), which are united in three functional cycles (Fig. 3). In essence, functional cycles for von Uexküll are a feedforward loop in which perceptual-field information is processed into motor-field responses that, in turn, affect subsequent perceptual-field information by acting on the environment.

Fig. 1. Burghardt's (2005) surplus resource theory of play, with four main conditions that support the evolution and occurrence of play.
J.C. Schank et al.
The Umwelten of ticks vary among species and, in general, are more complex than described by von Uexküll (e.g., see Waladde, 1987). For example, some species of ticks have directional infrared sense organs on their forelegs, which allow them to detect warm bodies up to several meters away (Carr and Salgado, 2019). However, von Uexküll's description of the tick's Umwelt is adequate to explicate the conceptual structure of the Umwelt. For the tick, the perceptual-field information about butyric acid results in a motor-field response of letting go. The "letting go" response subsequently affects the information available in the tick's perceptual field (i.e., letting go makes it possible to land on hair, introducing new perceptual-field information and extinguishing the butyric acid cue). For the tick, the functional cycles structuring its Umwelt are reflex arcs processed by central receptors and effectors (Fig. 3), but as we will see, the functional cycles of animals that play are far more complex.
For von Uexküll, the Umwelt is the phenomenal sensory-motor world of animals in which patterns of physical stimuli are constructed or emerge as patterns of neural activity that are the perceptual and motor cues of the Umwelt (Burghardt, 1991, 1998; Feiten, 2020). Von Uexküll's concept of the Umwelt shares with enactivism (i.e., embodied cognition, where cognition emerges from the dynamic interaction of the animal and its environment; Noë, 2004) the importance of experiencing and acting (Thompson, 2007; Feiten, 2020). A more recent interpretation of the Umwelt (Dennett, 2015; Baggs and Chemero, 2021), associated with recent attempts to incorporate the concept of Umwelt into ecological psychology (Gibson, 1979), is to view it as a subset of the physical properties accessible to an animal's sensory and motor systems (see Feiten, 2020 for an analysis of these different interpretations of the Umwelt). Here, we keep von Uexküll's phenomenal sensory-motor world interpretation of the Umwelt (Burghardt, 1991, 1998). In the next section, our information-theoretic analysis of the Umwelt is in terms of perceptual and motor cues of the Umwelt.

Information theory and the tick's Umwelt
A tick succeeds in obtaining a blood meal only if it has information about the meaning of particular sensory stimuli and how to behave in response. The information content of a tick's Umwelt can be measured with the help of information theory. The Umwelt is a functional system for receiving and processing perceptual cues and transmitting motor cues for appropriate behavioral responses. Information theory allows us, in principle, to quantify and measure the uncertainty of the transmission of signals (cues) through channels (e.g., perceptual and motor fields). Shannon (1948) introduced the mathematical theory of information to quantify the transmission of information through a system. Given a probability distribution for the signals (i.e., perceptual cues in a functional cycle) transmitted on a channel, the information entropy of the signal, which is the uncertainty in the outcome measured in bits (i.e., lower entropy reflects lower uncertainty in the outcome), is given by Eq. (1):

$$H(X) = -\sum_{x \in X} p(x) \log_2 p(x) \tag{1}$$

where the entropy H is a measure of uncertainty about which outcome x in the set X of possible outcomes will occur.
In the case of the tick's Umwelt, the signals are cues in its perceptual and motor fields (Fig. 3). There are three perceptual channels and three motor channels (i.e., in von Uexküll's terminology, "three receptor cues and three effector cues"). For each channel, there are two possible outcomes. For example, suppose butyric acid is the stimulus, and the set of receptor cues in its perceptual field for butyric acid, B, is $X = \{B, \neg B\}$ (where "¬" is a symbol for "not"). If $p(B) = 0.5$, then $p(\neg B) = 0.5$, and the entropy of the signal would be $-(0.5 \times \log_2(0.5) + 0.5 \times \log_2(0.5)) = 1$ bit of entropy, which is the same as flipping a fair coin. If, on the other hand, $p(B) = 0.01$ (this is no doubt too high since rarely does a mammal releasing butyric acid pass underneath a tick), then $p(\neg B) = 0.99$, which results in a lower entropy (i.e., signal uncertainty) of $-(0.01 \times \log_2(0.01) + 0.99 \times \log_2(0.99)) = 0.08$ bits of entropy. Lower entropy reflects reduced uncertainty as to the outcome. The maximum entropy of a probability distribution of signals occurs when the distribution is uniform, and for discrete distributions, it is $\log_2(n)$, where n is the number of signals or cues. As the number of signals in a channel increases, so does the maximum entropy of the outcome. For example, for a fair coin, the maximum entropy is $\log_2(2) = 1$ bit, but if the signals are cards in a well-shuffled deck, the maximum entropy is $\log_2(52) = 5.7$ bits.
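The arithmetic in this paragraph can be checked with a short Python sketch. It is only an illustration of Eq. (1); the function names `entropy` and `max_entropy` are our own and are not part of the original analysis.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (Eq. 1)."""
    # Outcomes with p = 0 contribute nothing (convention: 0 * log2(0) = 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

def max_entropy(n):
    """Maximum entropy in bits for n equally likely signals or cues."""
    return math.log2(n)

fair = entropy([0.5, 0.5])      # butyric acid cue present half the time
rare = entropy([0.01, 0.99])    # butyric acid cue present 1% of the time

print(round(fair, 2))             # 1.0 bit, same as a fair coin
print(round(rare, 2))             # 0.08 bits
print(round(max_entropy(52), 1))  # 5.7 bits for a well-shuffled deck
```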
We can also apply information theory to the basic unit of analysis of the Umwelt, the functional cycle, which is the source of functional meaning (i.e., functional tone) for perceptual cues for von Uexküll (1934a, p. 47). A measure of the entropy in a functional cycle is its joint entropy (Shannon, 1948; also see Jensen et al., 2013 for an introduction to information theory in the context of brain and behavior) given in Eq. (2):

$$H(X, Y) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \log_2 p(x, y) \tag{2}$$

where X is a perceptual channel, and Y is a motor channel in the tick's Umwelt. The joint entropy of the tick's functional cycle 1 (i.e., the pairing of the butyric acid olfactory cue and the subsequent "let go" motor cue; Fig. 3, functional cycle 1) can be calculated using Eq. (2) if both the probabilities of detecting butyric acid and the conditional probabilities of "letting go" are known, because each joint probability is the product $p(x, y) = p(x)\,p(y \mid x)$; a hypothetical calculation is provided in Table 2.
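As a minimal sketch of how Eq. (2) could be evaluated for functional cycle 1, the Python below builds the joint probabilities from a cue probability and conditional response probabilities. All numerical values here are illustrative assumptions of ours and are not the entries of Table 2; the function name `joint_entropy` and the labels `B`, `notB`, `L`, `notL` are likewise hypothetical.

```python
import math

def joint_entropy(p_x, p_y_given_x):
    """Joint entropy H(X, Y) in bits (Eq. 2).

    p_x:         dict mapping each perceptual cue x to p(x)
    p_y_given_x: dict mapping each x to a dict of p(y | x) over motor cues y
    """
    h = 0.0
    for x, px in p_x.items():
        for y, py in p_y_given_x[x].items():
            pxy = px * py  # joint probability p(x, y) = p(x) * p(y | x)
            if pxy > 0:
                h -= pxy * math.log2(pxy)
    return h

# Assumed (hypothetical) values: butyric acid is rarely present, and the
# tick almost always lets go when it is present and rarely when it is not.
p_cue = {"B": 0.01, "notB": 0.99}
p_response = {
    "B":    {"L": 0.99, "notL": 0.01},
    "notB": {"L": 0.05, "notL": 0.95},
}

h_cycle1 = joint_entropy(p_cue, p_response)
print(round(h_cycle1, 3))
```

With two binary channels the joint entropy can be at most 2 bits, so a value far below that indicates a highly predictable cycle.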
Calculating the joint entropies for the remaining two functional cycles is more complicated because the tactile and heat cues depend on prior behavioral responses of the tick. Functional cycles are not merely unidirectional perceptual input and effector output relationships; they are conditional and even bidirectional (Pellis and Pellis, 2021), and therefore, a kind of recursive process in which one functional cycle feeds into the next (see Fig. 3). When a tick climbs up a blade of grass or a twig on a bush, it positions itself to be stimulated by butyric acid resulting in a butyric acid perceptual cue. When it lets go from its perch, its behavior makes the stimulus of landing on hair possible, and subsequent running about makes the cue of warmth possible. The joint entropies for functional cycles 2 and 3 could be estimated if all the relevant conditional probabilities were known.
Even without estimating the relevant probabilities for each of the tick's functional cycles, there are insights that information theory reveals about the tick's Umwelt. First, the maximum entropy of the tick's Umwelt is the sum of the maximum entropies for the three functional cycles, which is 6 bits. This is because there are six channels for the sensory or effector cues (i.e., pairs of outcomes in its perceptual-motor worlds), each contributing at most 1 bit. To calculate maximum entropy, we assume that the channels are independent and that the probability of each cue is uniformly distributed within channels. Second, a tick must accurately recognize each cue and react appropriately to obtain a blood meal. Therefore, the information entropy of the tick's Umwelt must be very low relative to its maximum entropy, given the extremely low base rate of mammals passing under ticks (von Uexküll, 1934a, p. 12).
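The 6-bit figure can be reproduced in two ways, as in the following Python sketch: by summing the per-channel maxima, and by enumerating every joint state of six independent, uniformly distributed binary channels. The channel labels are our own shorthand for the three receptor and three effector cues described above.

```python
import math
from itertools import product

# Six binary channels: three receptor cues and three effector cues.
# Each maps to its number of possible outcomes (cue present or absent).
channels = {
    "butyric_acid": 2, "let_go": 2,
    "tactile": 2, "run_about": 2,
    "heat": 2, "bore": 2,
}

# Route 1: with independent channels, maximum entropies simply add.
max_h_sum = sum(math.log2(n) for n in channels.values())

# Route 2: enumerate all joint states; under independence and uniformity
# each of the 2**6 = 64 states has probability 1/64.
states = list(product(*(range(n) for n in channels.values())))
p_state = 1 / len(states)
max_h_joint = -sum(p_state * math.log2(p_state) for _ in states)

print(max_h_sum, max_h_joint)  # 6.0 6.0
```

If the channels were not independent, as the conditional structure of the functional cycles suggests, the true joint entropy could only be lower than this bound.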

Complex Umwelten and behavior systems
Von Uexküll (1934a, b) appreciated that the Umwelten of animals have a broad range of complexity. This point is expressed in his first principle of Umwelt theory: "all animals, from the simplest to the most complex, are fitted into their unique worlds with equal completeness. A simple world corresponds to a simple animal, a well-articulated world to a complex one" (von Uexküll, 1934a, p. 11). Intuitively, a more articulated Umwelt has more information than a less articulated Umwelt. In this sense, a well-articulated Umwelt is also complex. A tick can detect the presence of a deer through the stimulus of butyric acid, but the more complex Umwelt of a mountain lion allows it to detect a deer through multiple sensory pathways with multiple stimuli. A more complex Umwelt, such as a mountain lion's, has more cues in its perceptual and motor fields and more information processing of these cues than the simpler Umwelt of the tick.
The complexity and information content of an Umwelt is not merely a matter of the number of perceptual-motor cues and functional cycles but also, according to von Uexküll, of the moods of an animal (in this context, he uses the term mood to refer to motivational states). The moods of animals alter the functional meaning (i.e., functional tone) of perceptual and motor cues. Von Uexküll (1934a) discussed how moods function in giving cues meaning with the example of the hermit crab. For the hermit crab, "any object of a certain order of magnitude with cylindrical to conical outline can assume meaning for it" (von Uexküll, 1934a, p. 47), and a sea anemone presents this receptor image (i.e., perceptual cue) to the hermit crab. Depending on its mood, the meaning of the perceptual cue changes. If a hermit crab lacks protective anemones on its shell, then its mood is defensive, and the receptor cue of an anemone takes on a "defensive [functional] tone." The meaning is expressed in the functional cycle of responding to the receptor cue by placing the anemone on its shell. If the hermit crab loses its shell, its mood is safety, and the receptor cue of an anemone takes on a "dwelling [functional] tone." This time the meaning is expressed in the futile functional cycle of attempting to crawl into the anemone. Finally, if the crab is starving, then its mood is feeding, and the receptor cue takes on a "feeding [functional] tone." The meaning is expressed by the crab grabbing the anemone and eating it. For von Uexküll, a more complex Umwelt not only has more perceptual and motor cues than a simpler one, but the functional meaning of these cues changes with motivational states and processes.

Table 2
Hypothetical calculation of joint entropy for functional cycle 1 in Fig. 3 assuming a tick has climbed up a blade of grass or a twig.
Probability of olfactory cue | Conditional probabilities | Joint probabilities* | Joint entropy
A long-established ethological approach for analyzing the drives and motivational states of animals is the behavior-systems approach (Craig, 1918; Tinbergen, 1942, 1950, 1963; Baerends, 1976a, 1976b; see Burghardt and Bowers, 2017 for a theoretical and historical overview). Timberlake (Timberlake and Grant, 1975; Timberlake, 1994; Timberlake and Lucas, 1989; Lucas, 2019), for example, developed a detailed analysis of behavior systems in terms of multiple levels of information processing control for the Norway rat. Two critical features of Timberlake's behavior systems approach make it ideal for extending von Uexküll's theory of complex Umwelten (also see Burghardt and Bowers, 2017 for a discussion of behavior systems and the Umwelt). First, his (Timberlake and Lucas, 1989; Lucas, 2019) perceptual-motor structures (modules) can be interpreted in Umwelt theory as being sensitive to specific perceptual cues (using von Uexküll's terminology), which connect these cues to specific motor outputs. Second, motivational processes (i.e., systems or subsystems) can be interpreted as priming and organizing lower levels of processing functional connections between perceptual and motor cues. Following Timberlake's approach, different behavior systems correspond to important functions such as feeding, reproduction, predator defense, and social behavior.
Behavior systems approaches are not limited to motivational states and processes but can include other psychological processes that hierarchically organize behavior, in particular, emotions (Burghardt et al., 2017; Burghardt, 2019). In Burghardt's view (Burghardt and Bowers, 2017), motivational systems are neither independent of experience nor merely biological drives of survival and reproduction, especially in the case of humans (Dickinson and Balleine, 1994). Acquired motivations stem from visual, chemical, social, and tactile cues; for example, in humans, acquired motivations can be honor, truth, and prestige (Dickinson and Balleine, 1994). This suggests that for more complex Umwelten, the hierarchical organization of functional cycles is controlled by motivation, emotion, and affect and may be modified or organized by learning and experience, especially as the Umwelt develops.
An abstract representation of a behavior system, following Timberlake's behavior systems approach and applied to von Uexküll's theory of complex Umwelten, is depicted in Fig. 4. Behavior systems replace von Uexküll's largely unspecified notions of central receptor and effector (Fig. 4 top). The critical point is that behavior-systems approaches allow us to go beyond the reflex-mediated functional cycles of simple Umwelten, such as the tick's Umwelt, to Umwelten characterized by complex functional cycles with multiple levels of control and information processing that can support and are modifiable by learning (Burghardt and Bowers, 2017; Burghardt, 2020). Learning is essential for complex Umwelten because environments are ever-changing, requiring animals to continually engage their physical and social environments to maintain informationally well-articulated Umwelten.
Calculating the entropy of a behavior systems Umwelt with its many levels of perceptual and motor channels is currently not possible for several reasons. First, we do not have a complete behavior systems analysis for any species (Timberlake and Grant, 1975; Timberlake, 1994; Timberlake and Lucas, 1989; Lucas, 2019). Second, we do not know how behavior systems differ among species. Third, we do not know how many levels of processing occur or whether different systems have different levels or modes of information processing (e.g., are the brains of animals Bayesian?; Yon and Frith, 2021). Nevertheless, we still can draw general conclusions about the information content of complex Umwelten and how this information changes as animals develop, which is critical for understanding why animals play and why play evolves.
Fig. 4 (partial caption). In the bottom portion of the figure, different levels of information processing control are represented with an arbitrary number of structures enumerated at each level. Lines indicate possible channels between different levels of information processing. In this depiction of behavior systems, the first three levels are different levels of motivational control, which prime lower levels for more specific motivational states and functions. Systems (the highest level) are general motivational control processes that prime lower-level systems for specific functions such as feeding. Subsystems are lower-level motivational processes that sensitize the animal to specific types of cues for the function activated by a higher-level system. For example, a feeding system may activate a predating subsystem, which sensitizes lower-level units to specific perceptual and motor cues appropriate for predation. Modes are the lowest level of motivational control that further narrow the range of specific cues an animal is sensitive to and prime specific modules to these cues. Modes in this sense are similar to the sequence of functional cycles structuring the tick's Umwelt (Fig. 3), but unlike the tick, modes, as with all levels of control, can be modified by learning. Modules are perceptual-motor functional units that respond to specific cues with specific motor outputs.

To begin, we can express the key elements of von Uexküll's first principle with information theory. First, complex Umwelten process more perceptual-motor cues and have more motivational processes than simpler Umwelten. In information theory terms, this implies that there will be more signals (cues) in perceptual and motor channels at different levels of information processing (see Fig. 4). That is, the channels and the probability distributions characterizing the signals in them differ with different motivational processes. As the quantity of perceptual and motor cues, as well as the complexity of motivational control levels, increases, so too does the maximum entropy of the Umwelt. Thus, more complex Umwelten will be associated with higher maximum entropy. Second, von Uexküll's (1934a) assertion that both simple and complex animals are "fitted into their unique worlds with equal completeness" (p. 11) implies that whether an Umwelt is simple or complex, its functional cycles must be predictive (i.e., must have low entropy). For example, suppose the tick's Umwelt entropy increases (i.e., it becomes less predictive of its world). In Table 2, this could happen if the conditional probability of "letting go" when there is no sensory stimulus of butyric acid is less predictable: p(L|¬B) = 0.5 and so p(¬L|¬B) = 0.5. In this case, the joint entropy increases from 0.366 bits to 1.07 bits of entropy. The increase in entropy results in a tick "letting go" more frequently when there is no butyric acid present, reducing a tick's foraging success rate and, thereby, its fitness. For a functional adult Umwelt to develop, the information entropy must be well below its maximum entropy so that an adult can predict and anticipate key events in its environment.
In short, the complexity of an informationally well-articulated Umwelt is characterized by its maximum entropy and its entropy. Entropy alone is ambiguous, however, because one Umwelt could have lower entropy than another simply because its maximum entropy is lower. Recall that the maximum entropy of a fair coin is 1 bit while the maximum entropy of a fair deck of cards is 5.7 bits. Miller and Frick (1949) introduced a relative measure of entropy as a percentage of maximum entropy, which can be extended to interdependent structures (Jensen et al., 2013) such as behavior systems (Fig. 4). Generalizing their relative measure to the entropy of an Umwelt, the relative entropy of an Umwelt is its entropy divided by its maximum entropy, which is a dimensionless value between 0.0 and 1.0. For example, suppose a deck of cards is unfair, and the probability of drawing the ace of hearts from it is 0.5, and for all other cards, the probability is 0.5/51. In that case, the entropy of a card draw is approximately 3.84 bits, which is greater than the 1 bit of a fair coin, but the relative entropy is 3.84/5.7 ≈ 0.67 compared to 1.0 for a fair coin. Thus, an informationally well-articulated complex Umwelt is characterized by high maximum entropy (i.e., it processes many perceptual and motor cues modulated by motivational processes) and low relative entropy.
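The coin and card-deck comparison can be checked numerically. The following is a minimal sketch in Python; the function names are illustrative choices, and "relative entropy" is used here in the Miller and Frick (1949) sense of entropy divided by maximum entropy, not in the sense of Kullback-Leibler divergence.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def relative_entropy(probs):
    """Entropy divided by maximum entropy (log2 of the number of outcomes)."""
    return entropy_bits(probs) / math.log2(len(probs))

fair_coin = [0.5, 0.5]
# Unfair deck from the text: ace of hearts with p = 0.5, the other 51 cards share 0.5.
unfair_deck = [0.5] + [0.5 / 51] * 51

deck_entropy = entropy_bits(unfair_deck)
deck_max_entropy = math.log2(len(unfair_deck))
deck_relative = relative_entropy(unfair_deck)
coin_relative = relative_entropy(fair_coin)

print(round(deck_entropy, 2), round(deck_max_entropy, 1), round(deck_relative, 2), coin_relative)
```

The unfair deck carries more absolute entropy than the coin but less relative entropy, which is the sense in which a complex Umwelt can be both high in maximum entropy and well articulated.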

Development of the Umwelt
Von Uexküll did not address how the Umwelt develops over ontogeny. However, Alberts and colleagues (Alberts, 1984; Alberts and Ronca, 1993; Schank and Alberts, 2000) introduced an empirical model of the development of the Umwelt using the Norway rat as a model mammalian species for investigating perinatal development. Alberts argued that rats, and mammals in general, go through a succession of Umwelten driven by the development of their sensory systems, which in mammals and birds develop in sequence from tactile to vestibular, auditory, and finally visual systems (Alberts, 1984; Alberts and Ronca, 1993).
Developing sensory systems depend on sensory input and active engagement with the environment. We list several examples to illustrate how developing animals must actively engage their environments postnatally and prenatally. Wiesel and Hubel (1963) showed that the deprivation of light for two or three months in one eye of kittens was associated with the atrophy of neurons in the lateral geniculate body that received projections from the eye deprived of light. This implies that the complexity of the Umwelt increases during development only if developing kittens can engage the sensory stimuli available in the environment. During prenatal development, fetal rats are exposed to tactile, vestibular, and olfactory stimuli to which they respond with leg movements, twitches, trunk curls, and swallowing movements (Ronca et al., 1993; Alberts and Ronca, 1993). Such sensorimotor activity may have several functions (Smotherman and Robinson, 1987), including the active integration of sensory and motor systems (Robinson et al., 2000). Fetal sensorimotor activity may also have postnatal benefits; for example, if citral (a component in citrus peel oil) is injected into the amniotic fluid of fetal rats, the pups later exhibit a strong preference for attaching to the nipples of a mother rat with citral smeared on them (Alberts, 1984; Alberts and Ronca, 1993). Furthermore, Gottlieb and colleagues found postnatal benefits in mallard ducks (Lickliter, 2007) by experimentally investigating whether embryonic ducklings' self-stimulation by their own vocalizations before hatching was required to establish a preference for maternal calls after hatching. A preference for maternal calls did not develop if an embryonic duckling did not experience its own vocalizations; such ducklings were as likely to prefer the maternal calls of chickens as maternal mallard calls (Gottlieb and Vandenbergh, 1968).
These examples suggest that as different sensory and motor systems develop pre- and postnatally, the maximum entropy of the developing Umwelt increases (i.e., there is a rapid increase in the number of perceptual and motor cues in the developing Umwelt). Active engagement by developing mammals and birds is essential both to increase the maximum entropy (i.e., increase the number of cues) and to reduce the entropy (i.e., reduce uncertainty and give cues functional meaning) of their developing Umwelten.

Play
Our working hypothesis is that the development of the Umwelt is (i) characterized by a rapid increase in maximum entropy beginning during perinatal development as sensory and motor systems develop, which (ii) challenges developing animals to lower their Umwelt entropy. By actively engaging the environment, animals increase the maximum entropy while reducing the relative entropy of their developing Umwelten. Any active engagement of the world, functional or non-functional, can increase maximum entropy while reducing Umwelt entropy. When potential costs are low and there are no strongly competing motivations (i.e., the fifth of Burghardt's, 2005, five criteria for identifying play, listed above), play provides a non-functional route, complementary to more functional behaviors, for increasing maximum entropy while reducing relative entropy of the developing Umwelt by actively encountering new sensory stimuli and generating novel motor responses. Thus, in our theoretical framework, play functions to convert surplus resources (see Fig. 1) into future information (i.e., reduced entropy) that has fitness benefits for the adult Umwelt.
To illustrate how play affects the development of the Umwelt, Fig. 5 presents a hypothetical example of the increase in maximum entropy and decrease in relative entropy of a developing Umwelt with and without play. Without play, the maximum entropy of the Umwelt increases from perinatal development to the juvenile and early adult phases of development, and, correspondingly, the relative entropy decreases as animals learn about their environment in non-play contexts. However, by actively engaging their world through play, developing animals experience new sensory stimuli and try out novel motor responses to explore and manipulate their world. Through experience and learning during play, developing animals increase the maximum entropy of their Umwelten (i.e., increase the quantity of perceptual and motor cues) and decrease the relative entropy of their Umwelten compared with the no-play case.

Fig. 5.
Hypothetical illustration of the effect of juvenile play on the maximum entropy and relative entropy of the Umwelt. Play is theorized to increase maximum entropy while decreasing relative entropy.
In the information-theoretic framework proposed here, simple or primary process play poses a problem. Primary process play is defined as having little or no fitness benefits (Burghardt, 2005), but in the information-theoretic framework, it can have fitness benefits by converting surplus resources into information. To motivate this claim, we view play as a kind of environmental enrichment process that informationally enriches the Umwelt. In environmental enrichment studies, the Umwelten of animals are manipulated by changing the complexity of the environment (the sensory, motor, or social environment) to assess how changes in complexity affect the development of the brain and behavior. Experiments on laboratory animals such as rats and mice, domestic livestock, and animals in zoos have repeatedly found that environmental enrichment facilitates various kinds of learning, memory, and cognition (as reviewed in Zentall, 2021), increases brain growth, and increases the size of the brain and some of its substructures (Kempermann, 2019). Actively engaging and exploring the environment is critical for the effects of environmental enrichment (Kempermann, 2019). In contrast to environmental enrichment studies, where the complexity of the environment is manipulated, play can determine the complexity of the environment that developing animals experience (analogous to the environmental enrichment research of Freund et al., 2013, in which the degree of enrichment depends on the behavior of animals; discussed below). Thus, locomotor-rotational, object, and social play can enrich interactions with physical and social environments, allowing developing animals to experience novel stimuli and perform novel patterns of behavior that can informationally enrich their Umwelten.
If all or at least most of the fitness benefits of play, whether primary, secondary, or tertiary, ultimately involve reducing Umwelt entropy and increasing its maximum entropy, then it should be possible to demonstrate that play could evolve by serving one or both of these functions. Burghardt (2005) proposed using theoretical models (i.e., mathematical models and computer simulations) to demonstrate that play has fitness benefits. In the next section, we introduce a social play scenario where juvenile agents can gain information about adult cooperative and competitive situations by engaging in social play with other juveniles in their group. Using computer simulations, we show that a social play learning process (SPLP) can evolve because it provides information about future adult cooperative and competitive situations, thereby increasing the fitness of individuals who can learn from social play as juveniles.

Model overview
The information-theoretic framework developed above implies that selection should favor behavioral processes that reduce Umwelt entropy. Our starting point is that social play is instrumental in juveniles acquiring adult social competence (Palagi, 2018). In particular, juveniles learn to use competitive or cooperative strategies in adult social situations by engaging in rough-and-tumble play (RTP; Boulton and Smith, 1992; Bekoff, 2001; Bauer and Smuts, 2007; Gray, 2019; Reinhart et al., 2010; Palagi and Cordoni, 2012; Palagi et al., 2016; Pellis and Pellis, 2009, 2017; Palagi, 2018; Cenni and Fawcett, 2018; Kraus et al., 2019; Nolfo et al., 2021). This model assumes that adult social competence is achieved by reducing entropy about whether adult social situations are competitive or cooperative. That is, if juvenile agents can learn whether they are in a cooperative or competitive social group, then as adults they can use a strategy that optimizes their reproductive fitness in their social group.
To investigate the plausibility of selection for reduced entropy, we developed an agent-based model of an SPLP that could evolve by reducing Umwelt entropy in cooperative-competitive situations (e.g., see Burghardt, 1998, for a model of inter-Umwelt communication). We started with a basic two-stage model of agent development (Durand and Schank, 2015; Schank et al., 2018; see Fig. 6). In the juvenile stage of development, agents engage in cooperative or competitive RTP, and they either have an SPLP or they do not. In the adult stage, agents engage in a public goods game that is also cooperative or competitive. If most adult agents cooperate, then cooperative agents do well. If they do not (i.e., they are competitive), the cooperative agents are open to exploitation by competitive agents. The probability of playing cooperatively as a juvenile and behaving cooperatively in a public goods game (i.e., an N-person prisoner's dilemma game, e.g., see Hauert et al., 2006) as an adult is inherited, but if an SPLP evolves, the probability of cooperating as an adult can be modified by engaging in RTP as a juvenile.

Agents
All agents (except those present when a simulation is initiated, which are all adults) go through juvenile and adult stages. The juvenile stage lasts T = 24 time steps, and when a juvenile agent reaches time step T + 1 = 25, it becomes an adult. Agents have a mean lifespan of L = 100 time steps, of which the first 24 steps are the juvenile stage. At the end of an agent's lifespan, it dies. The lifespan of an agent is determined by drawing a number from a normal distribution with mean L and standard deviation SD_L = 20, which is then truncated to an integer. This procedure introduces variation into agent lifespans, which avoids synchronous reproduction during simulations (Lin and Schank, 2022).
When a juvenile agent reaches time step T + 1 = 25, it enters the population only if the population size is below its maximum of N_max; otherwise, the agent dies. This procedure was implemented to prevent a population from growing indefinitely large and is a Moran (1962)-like process (Lin and Schank, 2022). As agents die, they are replaced with young adults (i.e., at step 25). Thus, for an agent that lives its entire lifespan, T = 24 steps are spent as a juvenile, and, on average, 76 time steps are spent as an adult. Agents die only at the end of the juvenile stage or the end of their lifespan.
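The lifespan procedure can be sketched in a few lines. This is an illustrative Python sketch, not the authors' Java/MASON code; the function names and the random seed are assumptions made for the example, and the handling of very short draws (below the juvenile period) is not specified in the text and is not modeled here.

```python
import random

T = 24        # length of the juvenile stage, in time steps
L_MEAN = 100  # mean lifespan L, in time steps
SD_L = 20     # standard deviation of lifespan

def draw_lifespan(rng):
    """Draw a lifespan from Normal(L_MEAN, SD_L) and truncate it to an integer."""
    return int(rng.gauss(L_MEAN, SD_L))

def stage(age):
    """Developmental stage for an agent of the given age (age is 1 at birth)."""
    return "juvenile" if age <= T else "adult"

rng = random.Random(0)  # seed chosen only so the example is reproducible
lifespans = [draw_lifespan(rng) for _ in range(10_000)]
mean_lifespan = sum(lifespans) / len(lifespans)
mean_adult_steps = mean_lifespan - T  # close to the 76 adult steps cited in the text
```

Because lifespans vary around the mean, agents born at the same time die and are replaced at different times, which is what breaks up synchronous reproduction.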
During the juvenile stage, each juvenile agent engages in bouts of RTP with other juvenile agents in its group. However, RTP does not alter an agent's probability of cooperating unless it also has an SPLP.
During the adult stage, each adult agent receives one unit of a resource during each time step. Adult agents then play a public goods game in which they contribute some portion ε (0 ≤ ε ≤ 1) of that resource to a common pot if they cooperate. At the end of the step, after all adult agents have contributed with a probability of cooperating that is either inherited or modified by RTP with an SPLP, the pot increases in value by a multiplier m > 1 and is evenly distributed among all adult group members. Even if an agent does not cooperate, it still receives an equal share of the pot.
When adult agents accumulate sufficient resources, they produce a single offspring (i.e., juvenile agent). Offspring agents inherit their probability of cooperating in juvenile and adult cooperative and competitive situations. Agents either have an SPLP to modify their probability of cooperating when engaged in bouts of RTP, or they do not.

RTP and SPLP
When two juvenile agents engage in RTP, the bout is rewarding, with value b. There are two strategies of juvenile RTP: cooperative (C) and competitive (¬C). We assumed that the rewarding aspect of RTP may change depending on the strategies (i.e., C or ¬C) that the playmates use. For example, if a juvenile i plays cooperatively while its partner j plays competitively, (C_i, ¬C_j), there is a potential reward cost, c, which diminishes the rewarding aspect of a play bout for the agent engaging in cooperative play. We also assumed that the rewarding aspect of RTP could be reduced by a reward cost, c*, when both juveniles engage in competitive RTP, (¬C_i, ¬C_j). Thus, the reward, b, of RTP minus the costs determines how the probability of playing cooperatively changes during the juvenile stage if a juvenile has an SPLP.
Preliminary simulations were run to determine which values of the reward costs, c and c*, would result in the most robust evolution of SPLP reported below. We found that c = 0.4 and c* = 0.0 resulted in the highest evolved frequency of SPLP in populations of agents. Thus, Eq. (3) determines the reward for play for all combinations of cooperative and competitive play, with the only reward cost incurred by a cooperative juvenile playing a competitive juvenile.
where i and j are juveniles (i ≠ j), and x_i and y_j are variables for strategies C and ¬C that juveniles i and j employ. We assumed that in all cooperative or competitive situations (S), juvenile and adult agents behave cooperatively with conditional probability p(C|S) and competitively with conditional probability p(¬C|S) = 1 - p(C|S). The conditional probability of cooperating, p(C|S), can be modified in two ways in this model: (i) inheritance (i.e., it is "genetic") with mutation rate r when an offspring is reproduced or (ii) an evolved SPLP ability to modify p(C|S).
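The reward rule described in words above can be sketched as a small function. This is one reading of the verbal description, not a transcription of Eq. (3); the function name is illustrative, and the value of b used here is a placeholder because the text in this section does not state it.

```python
B = 1.0       # reward of a play bout, b; placeholder value (assumption, not from the text)
C_COST = 0.4  # c: reward cost to a cooperative juvenile facing a competitive partner
C_STAR = 0.0  # c*: reward cost when both juveniles play competitively

def play_reward(i_cooperates, j_cooperates, b=B, c=C_COST, c_star=C_STAR):
    """Reward to juvenile i for one RTP bout with juvenile j."""
    if i_cooperates and not j_cooperates:
        return b - c       # (C_i, not-C_j): cooperative player pays cost c
    if not i_cooperates and not j_cooperates:
        return b - c_star  # (not-C_i, not-C_j): both pay cost c*
    return b               # all other strategy pairs: full reward

rewards = {
    (x, y): play_reward(x, y)
    for x in (True, False)
    for y in (True, False)
}
```

With c = 0.4 and c* = 0.0, only the cooperative juvenile paired with a competitive partner receives less than the full reward, which matches the asymmetry described in the text.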
On each time step, each juvenile agent randomly selects another juvenile agent in its group to engage in RTP. Each juvenile agent engages in RTP whether or not it has an SPLP. Because each agent selects a playmate and may be selected as a playmate, on average, each juvenile agent plays with n ≥ 1 other agent(s) on each time step. The average reward received by juvenile agent i is given by Eq. (4).
where n ≥ 1 is the number of bouts a juvenile agent i at age t_i engages in, k is a bout of RTP, T is the length of the juvenile period (i.e., T = 24), and t_i is juvenile agent i's age, where t_i = 1 at birth and is incremented by 1 with each time step.
If juvenile i has an SPLP, then the change in the probability of cooperating per RTP bout is given by Eq. (5).
where its probability of cooperating, p_{i,t+1}(C|S), at t + 1 is a function of the average reward defined in Eqs. (3) and (4) added to the probability of cooperating at t.

Adult public-goods game
On each time step, all adult agents receive 1 unit of resource for reproduction. Adult agents can either keep the entire resource or contribute a portion, ε = 0.5, of it to a common pot with probability p(C|S). At the end of a time step, when all adults have contributed or not, the pot's resources are multiplied by m > 1 and divided evenly among all adult group members regardless of whether they contributed.
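One round of this public goods game can be sketched as follows. This is a minimal Python illustration of the payoff rule as described, not the authors' implementation; the function name is an assumption, and the decision to cooperate is passed in as a list of booleans rather than sampled from p(C|S).

```python
def public_goods_round(cooperates, epsilon=0.5, m=1.5):
    """Return each adult's resource gain for one time step.

    cooperates: list of booleans, one per adult in the group.
    Each adult starts the step with 1 unit; cooperators put epsilon into the pot.
    """
    n = len(cooperates)
    pot = sum(epsilon for c in cooperates if c)
    share = m * pot / n  # multiplied pot, split evenly among all adults
    return [(1.0 - epsilon if c else 1.0) + share for c in cooperates]

all_cooperate = public_goods_round([True, True, True, True])
all_compete = public_goods_round([False, False, False, False])
one_defector = public_goods_round([True, True, True, False])
```

In a group of four with m = 1.5, full cooperation yields 1.25 units per adult instead of 1.0, but a lone defector among three cooperators gains 1.5625 while the cooperators gain 1.0625, which is the exploitation risk the model is built around.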

Reproduction
To reproduce, adult agents must accumulate resources. When they accumulate R = 25 units of resources, they produce a single juvenile agent. Because adult agents have, on average, 76 time steps to produce juvenile agents, they can produce approximately 3 agents during their adult lifespan with just the 1 unit of resource they receive on each time step. This reproductive rate ensured, on average, that there were over 6 juveniles in a group during a given time step for the smallest group sizes simulated, allowing multiple juveniles to play each other on each time step. If all adult agents cooperate, they accumulate resources more rapidly (each gains 1 + ε(m - 1) units per time step rather than 1) and thus produce more offspring during their lifespan. An agent that does not cooperate can also accumulate resources more rapidly if it can exploit cooperators and thereby produce more offspring during its lifespan. Twenty-five units of resources are subtracted when an agent reproduces. Thus, after their first reproductive event, some agents may start their next reproductive cycle with R > 0 units of resources. An offspring agent inherits its parent's conditional probability of cooperating, p(C|S), unless a mutation occurs at a rate r = 0.01. If an agent inherits a mutation, its conditional probability of cooperating, p(C|S), is randomly drawn from a uniform distribution in the range [0,1]. An offspring also inherits an SPLP from its parent unless a mutation occurs. If a mutation occurs (again at rate r = 0.01), then SPLP is turned on if it was turned off in the parent or turned off if it was turned on in the parent.
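The inheritance rules can be sketched as follows. This is an illustrative Python sketch under the assumption that the two mutation events (for p(C|S) and for SPLP) are independent; the function name is hypothetical, and Python's `random()` samples from [0, 1) rather than the closed interval.

```python
import random

R_MUT = 0.01  # mutation rate r

def make_offspring(parent_p, parent_splp, rng, r=R_MUT):
    """Return (p(C|S), has_splp) for one offspring of the given parent."""
    # p(C|S) is copied from the parent unless a mutation redraws it uniformly.
    p = rng.random() if rng.random() < r else parent_p
    # SPLP is copied from the parent unless a mutation flips it on or off.
    splp = (not parent_splp) if rng.random() < r else parent_splp
    return p, splp

rng = random.Random(1)  # seed chosen only so the example is reproducible
faithful_copy = make_offspring(0.8, False, rng, r=0.0)  # no mutation possible
forced_mutant = make_offspring(0.8, False, rng, r=1.0)  # mutation always occurs
```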

Groups and populations
All agents exist in groups with a maximum size, G_max, of adult agents plus a varying number of juvenile agents. When a group reaches G_max adults, each adult agent with a probability of 0.5 either remains in the parent group or is placed in the offspring group (see Schank, 2021; Lin and Schank, 2022). The juveniles of the group follow their parents. Thus, on average, the parent and offspring groups were of size G_max/2 adults with their juvenile offspring. If a group had no members, it was removed from the population. Thus, the number of groups in a population was constrained by G_max and by the total number of adult agents allowed in the population, N_max. In all simulations reported below, populations were limited to N_max = 10,000 adult agents plus a variable number of juvenile agents.
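The group-splitting rule can be sketched as follows. This is a simplified Python illustration covering adults only; the function name is hypothetical, and the movement of juveniles with their parents is not modeled.

```python
import random

def split_group(adults, rng):
    """Assign each adult to the parent or offspring group with probability 0.5."""
    parent_group, offspring_group = [], []
    for adult in adults:
        if rng.random() < 0.5:
            parent_group.append(adult)
        else:
            offspring_group.append(adult)
    return parent_group, offspring_group

rng = random.Random(2)       # seed chosen only so the example is reproducible
adults = list(range(20))     # stand-ins for 20 adult agents (a hypothetical G_max)
stay, leave = split_group(adults, rng)
```

Because each assignment is an independent coin flip, the two groups average half the original size each but are rarely exactly equal.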

Measuring entropy
Adult agents cooperate with conditional probability p(C|S) in cooperative-competitive situations S and compete with probability 1 - p(C|S). From the perspective of the Umwelt, adult agents always participate in a public goods game on each time step, so p(S) = 1.0, which simplifies the joint entropy calculation for the public goods game functional cycle. We need only use conditional probabilities p(C|S) and 1 - p(C|S) to calculate the joint entropy of the functional cycle (cf. Table 2) and use Eq. (6) to measure the mean entropy for n adult agents in a group.
Group entropy will be low according to Eq. (6) if either all agents have a high probability of cooperating or if they all have a low probability of cooperating. Mean population entropy was calculated by averaging over the entropy of all groups.
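A group-level entropy measure of this kind can be sketched as follows. This sketch assumes that the group measure is the mean of each adult's binary entropy over p(C|S) and 1 - p(C|S), which is one reading of the description above rather than a transcription of Eq. (6); the function names are illustrative.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a cooperate/compete choice made with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def mean_group_entropy(coop_probs):
    """Mean binary entropy over the adults in one group (assumed reading of Eq. 6)."""
    return sum(binary_entropy(p) for p in coop_probs) / len(coop_probs)

undecided_group = mean_group_entropy([0.5, 0.5, 0.5])     # maximal uncertainty
cooperative_group = mean_group_entropy([0.9, 0.9, 0.9])   # mostly cooperate
competitive_group = mean_group_entropy([0.1, 0.1, 0.1])   # mostly compete
```

Under this reading, groups that have settled on cooperation and groups that have settled on competition are equally low in entropy, so the measure tracks predictability rather than the level of cooperation.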

Simulation study
At the start of each simulation, N_max = 10,000 adult agents were created with no juvenile agents. Each agent was assigned an integer lifespan drawn at random from a uniform distribution in the interval [25,100]. No agent had any resources at the start of a simulation. The initial heritable probabilities, p(C|S), of agents cooperating were randomly drawn from a uniform distribution in the interval [0,1], resulting in a mean initial probability of cooperating of 0.5. No agent had an SPLP at the start of a simulation (see Table 3).
Two sets of simulations were run: control and experimental. In the control simulations, only the inherited conditional probability of cooperation, p(C|S), could evolve via mutations at rate r = 0.01. In the experimental simulations, both the inherited conditional probability of cooperating, p(C|S), and SPLP were allowed to evolve. For both control and experimental conditions, the dispersion rate from the natal group at adulthood, d, the resource pot multiplier, m, and maximum adult group size, G_max, were systematically varied as specified in Table 3 for a total of 210 sets of five simulations each. Within each simulation, the reward costs of play, mutation rate, agent lifespan parameters, amount of resources required to reproduce, the proportion of resources contributed to the common resource pot for cooperative adults, and maximum population size remained constant (Table 3). All simulations ran for 40,000 time steps (i.e., 400 generations = time steps/L). Four measurements were taken at the end of each simulation (see Table 3) and averaged over the five replications. The simulation model was developed using the Java programming language and the MASON library for agent-based modeling (Luke et al., 2005).
We predicted that as the rate of offspring dispersion, d, to other groups increases, the probability of successful cooperation should decrease. Likewise, we predicted that successful cooperation should decrease as the maximum size of groups, G_max, increases. Finally, we predicted that higher multiplier values, m, in adult public goods games should favor cooperation. By systematically varying these parameter values, we could assess how well SPLP evolved in contexts that supported either cooperation or competition. Thus, varying these parameter values allowed us to assess the frequency of adult cooperation and mean population entropy under conditions that favored cooperation and those that favored competition for both control and experimental conditions.

Results
Two example sets of simulations are depicted in Fig. 7 to illustrate how p(C|S) and SPLP evolved over generations in the control and experimental conditions. SPLP evolved rapidly in experimental conditions and reached higher levels in the higher dispersal rate condition (Fig. 7a, b). The probability of adult cooperation was much higher in the experimental conditions than in the control conditions (Fig. 7c, d), and dispersal decreased the probability of adult cooperation in control but not experimental conditions. The inherited probability of cooperating was also much higher in the experimental conditions than in the control conditions (Fig. 7e, f). Most importantly, in the experimental conditions, mean population entropy was lower than in the control conditions (Fig. 7g, h). This effect was more pronounced for the lower dispersal rate condition.

Fig. 7.
[...] 12, right column). SPLP was allowed to evolve in the experimental (orange) but could not evolve in the control condition (black). The first row is the proportion of the population with an SPLP as it evolved over generations in the experimental condition (a and b). The second row is the mean probabilities of cooperating for the experimental condition (orange) and the control condition (black; c and d). The third row is the mean inherited (genetic) probabilities of cooperating for the experimental condition (orange) and the control condition (black; e and f). The fourth row is the population average information entropy for the probabilities of cooperating for the experimental condition (orange) and the control condition (black; g and h).

Fig. 8.
[...] while varying G_max (rows) and dispersion rates (x-axis). Orange lines and markers are from experimental conditions in which SPLP was allowed to evolve, and black lines and markers are from control conditions in which SPLP was not allowed to evolve. Each marker is an endpoint (i.e., the mean value of five simulations averaged over the last 5000 time steps of simulations). The values plotted in each of the 10 graphs are for the experimental conditions: the proportion of agents with SPLP, the mean probability of adult cooperation, and the mean population entropy. The values plotted in each of the 10 graphs for the control conditions are the mean probability of adult cooperation and the mean population entropy.
Next, we turn to the overall results by plotting terminal means across the two public goods game multipliers, five maximum group sizes, and 21 dispersion rates. Three main results emerged (Fig. 8). First, in the experimental conditions, SPLPs evolved whether or not adult cooperation was favored. Indeed, SPLPs evolved to the highest levels when adult cooperation was not favored (i.e., under conditions of high dispersion and/or large maximum group size, Fig. 8). Second, for all group sizes and low dispersal rates, cooperation occurred at higher frequencies in the experimental than in the control conditions (Fig. 8). Conversely, when maximum group size and dispersion rates were too high to support adult cooperation, adult agents cooperated slightly less in the experimental than in the control conditions (Fig. 8). Third, mean population entropy was consistently lower in experimental conditions than in control conditions, whether cooperation or competition was favored (Fig. 8). Thus, when conditions were favorable for cooperation, SPLP agents had information that others would cooperate in their group and so cooperated, but when conditions were unfavorable for cooperation, they had information that others in their group would not cooperate and so played competitively.

SPLP discussion
These results demonstrated that an SPLP could be selected for as an entropy-reducing process in cooperative and competitive situations. SPLP favored cooperation by partially screening off selection against high values of the inherited p(C|S) even when competitive play was favored. That is, agents with an SPLP learned not to cooperate and thus avoided the deleterious results of cooperating in situations that favored competition even when their inherited p(C|S) was high. This allowed a kind of ratcheting effect where higher than expected inherited p(C|S) values could be maintained in a population (see Fig. 7e and f). When group size and dispersion rates were not too large and thus favored cooperation, the higher evolved probabilities of cooperating facilitated higher learned probabilities of cooperating through RTP with an SPLP.
The SPLP, defined by Eqs. (3) through (5), also evolved because it is a gradual consensus-reaching process. It is biased towards competitive over cooperative behavior because the reward cost, c, for a juvenile agent playing cooperatively with a competitive juvenile is greater than zero. This allowed adult agents to reduce their uncertainty about whether to play cooperatively or competitively in the context of their group, with a bias toward reaching a competitive consensus. Thus, SPLP evolved in the experimental conditions, including most conditions that did not favor cooperation (Fig. 8). However, even though SPLP was biased against cooperation, when SPLP evolved, cooperation occurred in larger groups and under higher dispersion rates than in the control conditions because SPLP partially screened off inherited values of p(C|S) from selection (Fig. 8). Finally, increasing the multiplier m from 1.5 to 2.0 favored adult cooperation more in both the experimental and control conditions, but the evolution of SPLP resulted in proportionally more cooperation in the experimental conditions than in the control conditions (Fig. 8). Because SPLP partially screened off p(C|S) from selection and was biased against cooperative RTP when competitive RTP was present, the evolution of SPLP allowed agents to extend the benefits of cooperation while avoiding the costs of exploitation by synchronizing their information in their social context.

Discussion
This information-theoretic framework (i.e., an integration of the theories of the Umwelt, behavior systems, development, and information theory) is a work in progress, but it does yield general predictions of changes in information properties of the Umwelt as the sensory, motor, and nervous systems of animals develop. Further research is required to connect this framework to neural activity and brain structure (i.e., Uexküll's Innenwelt of animals, which is the neurophysiology and sensorimotor structures constituting the Umwelt; Jaroš and Brentari, 2022). Jost (2016), for example, proposed an information-theoretic framework for sensorimotor loops (which are conceptually similar to von Uexküll's functional cycles) and proposed how neural activity patterns correspond to percepts (analogs of Uexküll's perceptual cues) with a fundamental role for learning. Anderson et al. (2022), building on Špinka et al.'s (2001) hypothesis that animals engaging in play are training for the unexpected, argue that children, during play, deliberately seek out surprising situations to reduce the surprise. In their view, children are Bayesian-brain learners who play to seek information, which supports the information-theoretic framework developed here.
Indeed, this framework can provide general theoretical foundations for several previously proposed hypotheses for the evolution of play, such as Špinka et al.'s (2001) hypothesis that the fitness benefits of play come from training for the unexpected by losing control due to unpredictable events during play. Špinka et al.'s (2001) hypothesis falls under the broader hypothesis of self-handicapping, where an animal compromises its advantage in social play (Leresche, 1976; Foroud et al., 2003; Petrů et al., 2008, 2009; Lutz et al., 2017; Llamazares-Martín and Palagi, 2021; Gunst et al., 2023), loses control in locomotor play (Donaldson et al., 2002; Petrů et al., 2008, 2009), or restricts its access to an object in object play (Ham et al., 2023). For all types of play, self-handicapping results in losing control during play, which in this framework increases the maximum entropy of the developing Umwelt (i.e., via new sensory and motor cues due to unpredictable events) while lowering the entropy for future similar experiences.
In addition, Baldwin and Baldwin (1977) proposed that exploration and play provide sensorimotor stimulation and expose young animals to new situations with new behaviors, which, in this framework, increases the maximum entropy of the developing Umwelt while reducing entropy for future situations. Interestingly, they also predict that play should develop from simple to complex, matching the play development patterns observed in monkeys (Baldwin and Baldwin, 1977) and birds (Pellis, 1981). Furthermore, Brownlee (1954) proposed two benefits of play based on his work with domestic cattle: exercise, which we discuss below, and "[b]y playing the young animal becomes acquainted, from impressions received from its kinaesthetic sense organs, with properties of its environment and thereby can attack or escape with confidence in its knowledge of its terrain and the experience gained" (Brownlee, 1954, pp. 61-62). Brownlee's (1954) second benefit clearly fits with the central idea developed here that play provides the developing Umwelt with novel information valuable for future functional situations. Thus, the approach outlined here provides a general information-theoretic framework for several mechanistic hypotheses previously proposed for the fitness benefits of play in a variety of species.
In this information-theoretic framework, the primary fitness benefits of play come from increasing the maximum entropy of the developing Umwelt while also reducing its relative entropy. Almost all of the proposed benefits in Table 1 can be viewed as expanding maximum entropy while reducing the relative entropy of the developing Umwelt by facilitating the development of sensory and motor systems, learning about the environment, expanding responses to unexpected events, and learning social behaviors. First, developmental benefits of play include improving coordination and integrating sensory systems and other neural systems and pathways, all of which are required by a developing Umwelt that is increasing its maximum entropy while reducing its relative entropy. These benefits can be achieved by all types of play, from simple to complex and from primary process to tertiary process. Physical exercise (Table 1), which may not appear to fall into the theoretical framework developed here, closely connects to the developing Umwelt. Pellegrini and Smith (1998) reviewed the evidence for the benefits of rhythmic stereotypies, exercise play, and RTP. These forms of play affect skills of movement, coordination, and social behaviors; they also have physiological benefits (Pellegrini and Smith, 1998). The physiological effects of exercise are closely associated with improved cognitive function, brain volume, and anatomy in specific areas (Pellis et al., 2010a, 2010b; Voss et al., 2011; Stillman et al., 2020). Thus, physical exercise facilitates the development of the Umwelt. Second, the learning and cognitive benefits of play involve learning about and responding to objects, other conspecific and heterospecific animals, and aspects of the environment. Such learning processes are essential for reducing the relative entropy of the developing Umwelt. Third, the benefits of acquiring innovations and preparing for unexpected events are closely related to the second category. As discussed above, these benefits come from expanding the maximum entropy of the developing Umwelt by encountering new sensory stimuli, discovering novel behavioral responses, and integrating novel information into motivational control processes (Fig. 4). Fourth are the benefits of social play for cooperation, communication, and social assessment, among others. Through social play, animals acquire information about social roles and behaviors appropriate for different social situations, which is essential for reducing the relative entropy of the developing Umwelt in social situations.
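The distinction between maximum entropy and relative entropy used in this paragraph can be illustrated with a minimal sketch. It assumes, for illustration only, that the Umwelt is summarized as a probability distribution over a finite set of cue-response states, that maximum entropy is the entropy of a uniform distribution over those states, and that relative entropy is the ratio of observed to maximum entropy. The function names and the three example distributions are hypothetical and are not taken from the model reported in this paper.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def max_entropy(n_states):
    """Entropy of a uniform distribution over n_states (the upper bound)."""
    return math.log2(n_states)

def relative_entropy(probs):
    """ASSUMPTION: relative entropy is H / H_max (1 = uniform, 0 = determined)."""
    return shannon_entropy(probs) / max_entropy(len(probs))

# Hypothetical naive juvenile: 4 cue-response states, all equally likely.
naive = [0.25] * 4
# Hypothetical effect of play: 4 new states are encountered, none yet learned.
expanded = [0.125] * 8
# Hypothetical effect of experience: responses concentrate on a few states.
learned = [0.65, 0.15, 0.10, 0.04, 0.02, 0.02, 0.01, 0.01]

for name, dist in [("naive", naive), ("expanded", expanded), ("learned", learned)]:
    print(name, "H_max =", max_entropy(len(dist)),
          "relative entropy =", round(relative_entropy(dist), 3))
```

In this toy case, adding states raises the maximum entropy from 2 to 3 bits, and concentrating probability on a few states lowers the relative entropy below that of the unlearned, expanded distribution.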
It may be objected that none of the benefits listed in Table 1 apply to primary process play, but consider an even simpler behavior that fails criteria (i)-(iv) and probably (v) of Burghardt's (2005) five criteria: the jerky limb movements, called myoclonic twitches, observed in perinatal mammals during sleep. One view of these jerky movements is that they are a functionless byproduct of the developing brain in a dreaming state (Blumberg et al., 2013). Another view stems from the problem, starting during fetal development, of establishing connections from pools of motor neurons in the spinal cord to the skeletal muscles in the limbs. Initially, the connections that form are inexact but must become more exact and integrated with other systems, and thus are modified as the animal develops (Blumberg et al., 2013). Blumberg et al. (2013) theorized that myoclonic twitches in sleeping perinatal rats are part of a self-organizing process in developing sensory and motor systems. To demonstrate the theoretical feasibility of such self-organizing mechanisms, they created a robotic system, implemented a myoclonic twitch analog, and found that starting from undifferentiated networks in which all sensory elements are arbitrarily connected to motor elements, highly structured reflex circuitry emerged. Their work may provide a roadmap for how even simple play is important in informationally structuring the developing Umwelt and organizing specific behavioral systems.
This information-theoretic framework also sheds light on the five criteria of play widely accepted for identifying play behavior (Burghardt, 2005). Burghardt's (2005) process (i) and structural (ii) criteria require behavioral patterns to be incomplete and repeated but not stereotypically. Repeated but not stereotyped behaviors are needed to explore and learn about physical and social environments. Just such behaviors are predicted in this framework to increase the maximum entropy while reducing the relative entropy of the developing Umwelt. The functional (iii) criterion requires play behavior to be incompletely functional from an adult perspective. In this framework, play is a process that informationally structures the Umwelt for later adult functional behavior. As the Umwelt develops, an animal is increasingly capable of functional behavior as entropy decreases. In this framework, the motivational (iv) criterion is central to the benefits of play because it is the source of active engagement of the developing animal with its world and is under selection. The conditions (v) criterion for play requires sufficient or excess resources with no competing motivations. Selection favors systems that most efficiently convert energy into offspring (Wicken, 1980), and play, by engaging the world, converts surplus resources into information.
The agent-based model of social play had the theoretical aim of determining whether complex social play with the SPLP could be selected for based on entropy reduction in cooperative and competitive adult situations. The SPLP evolved under all tested parameter values in the experimental condition and did so because it enhanced both adult cooperation and avoidance of exploitation, resulting in greater fitness benefits in both competitive and cooperative situations. This provides a theoretical demonstration that play processes can be selected for based on future information fitness benefits. Pellis et al. (2019) asked whether play is a behavior system. They suggest that for simple or primary process play, play behavior systems (i.e., motivational states or processes) evolve that correspond to specific functional behavior systems, such as an anti-predator behavior system being activated by a locomotor play behavior system or a foraging behavior system being activated by object play behavior systems. As play behavior becomes more complex (i.e., secondary and tertiary process play), play behavior systems evolve that integrate lower-level behavior systems. Our model could be viewed as doing just that: integrating RTP with behavior systems for adult cooperative-competitive situations via an evolved learning mechanism. The evolved SPLP allowed juveniles to synchronize behavior in adult cooperative-competitive situations (i.e., have the same information). This allowed the SPLP to evolve under all parameter values tested in the experimental condition because it resulted in greater fitness benefits in both competitive and cooperative situations. Thus, our model of the evolution of SPLP could be theorized as a simulation of the evolution of a complex play behavior system that integrates specific adult competencies. For example, the RTP behavior system could have evolved initially for a specific social competency, such as sexual or social dominance relationships (Pellis and Pellis, 2009). The evolution of the SPLP could be interpreted as integrating RTP with a new social competency for cooperative and competitive situations, thereby increasing the complexity of the RTP behavior system.
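The ingredients of this argument (a probability of cooperating, a public goods game with multiplier m, juvenile play that modifies the inherited probability, and a group entropy measure) can be put together in a toy sketch. This is not the model reported in this paper. The learning rule in `play_bout`, the `rate` and `cost` parameters, and the definition of group entropy as the mean binary entropy of members' cooperation probabilities are all assumptions introduced here for illustration.

```python
import math
import random

def binary_entropy(p):
    """Entropy in bits of a cooperate/defect choice made with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

def group_entropy(probs):
    """ASSUMPTION: group entropy = mean binary entropy of members' p(cooperate)."""
    return sum(binary_entropy(p) for p in probs) / len(probs)

def public_goods_payoffs(actions, m, cost=1.0):
    """Cooperators (action 1) pay `cost`; the pot times m is shared equally."""
    share = m * cost * sum(actions) / len(actions)
    return [share - cost * a for a in actions]

def play_bout(p_self, partner_cooperated, rate=0.2):
    """HYPOTHETICAL learning rule: move p toward the partner's observed action."""
    target = 1.0 if partner_cooperated else 0.0
    return p_self + rate * (target - p_self)

def juvenile_play(probs, bouts, rng):
    """Random pairs play; each partner updates from the other's action."""
    probs = list(probs)
    for _ in range(bouts):
        i, j = rng.sample(range(len(probs)), 2)
        act_i = rng.random() < probs[i]
        act_j = rng.random() < probs[j]
        probs[i] = play_bout(probs[i], act_j)
        probs[j] = play_bout(probs[j], act_i)
    return probs

rng = random.Random(1)
inherited = [rng.random() for _ in range(10)]  # hypothetical inherited p(cooperate)
learned = juvenile_play(inherited, bouts=200, rng=rng)
print("group entropy before play:", round(group_entropy(inherited), 3))
print("group entropy after play: ", round(group_entropy(learned), 3))
print("payoffs, 2 of 4 cooperate, m = 2.0:", public_goods_payoffs([1, 1, 0, 0], m=2.0))
```

The last line shows the exploitation problem in a single game: with two cooperators in a group of four and m = 2.0, each defector receives 1.0 while each cooperator receives 0.0. The two entropy lines let the effect of the hypothetical play rule be inspected for a given random seed.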
Our focus has been on the fitness benefits and evolution of play in the context of juvenile play, but adults of many species engage in play (e.g., Pellis and Iwaniuk, 2000; Antonacci et al., 2010; Lutz et al., 2019; Palagi, 2009, 2023). It is beyond the scope of this paper to review the application of this framework to adult play, but we briefly discuss how our information-theoretic approach dovetails with Palagi's (2023) theoretical analysis of play in tolerant and cooperative mammalian societies. Palagi (2023) hypothesized that adult play facilitates egalitarian and cooperative behavior by synchronizing adult collective decision-making in social primates and carnivores. For example, adult play behavior peaks in both great apes and monkeys before feeding, and adult play occurs at different times during African wild dog hunts (Palagi, 2023). In our view, adult play in egalitarian and cooperative social primate and carnivore societies functions to synchronize information about the social behavior of others in the group, just as our model of the evolution of social play in juveniles synchronized information about adult cooperative social context. Thus, just as juvenile social play can reduce average group entropy in adult cooperative situations, adult play can also reduce average group entropy in collective decision-making. More generally, our information-theoretic framework not only links juvenile play to adult fitness but also links adult play to fitness via play's information-enhancing effects.
Finally, two related fields of study dovetail with this theoretical framework, and their integration with the study of play could synergistically result in new theoretical and experimental insights. The first, as mentioned above, is environmental enrichment. As an example of a possible experimental synergy, Freund et al. (2013) placed genetically identical juvenile mice in an enriched environment. At the end of the study, they found that mice that explored their environment more (i.e., had greater roaming entropy or "randomness" in their exploratory behavior) also grew more neurons in their hippocampus. The second is the study of curiosity, which is motivated or rewarding behavior that is typically characterized as exploratory behavior of physical (space and objects) and social environments but is not immediately functional (Byrne, 2013; Kidd et al., 2015; Oudeyer et al., 2016; Cervera et al., 2020). This characterization is similar to Burghardt's (2005) criteria for play behavior, and unsurprisingly, play is one of the phenomena studied in curiosity research (Kidd et al., 2015). Indeed, the widely accepted definition of curiosity is information-seeking behavior that is motivated and intrinsically rewarding (Byrne, 2013; Kidd et al., 2015; Oudeyer et al., 2016; Cervera et al., 2020). Although there is controversy over whether play, exploration, and curiosity belong to the same behavioral-cognitive categories (Pellis and Burghardt, 2017), if play is a kind of self-motivated Umwelt information-enrichment mechanism, then combining the experimental study of play with the environmental enrichment paradigm and the study of curiosity may provide new insights into the benefits of play. Play, environmental enrichment, and curiosity research are all concerned with information-seeking behavior and could greatly benefit from integrating these theoretical and experimental perspectives.
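The notion of roaming entropy mentioned above can be made concrete with a minimal sketch. It assumes that an animal's position is sampled at equally spaced times, that roaming entropy is the Shannon entropy of the fraction of observations at each location, and that the value is normalized by the logarithm of the number of locations. The normalization, the function name, and the location labels are assumptions made for illustration and may differ from the measure used by Freund et al. (2013).

```python
import math
from collections import Counter

def roaming_entropy(visits, n_locations):
    """Normalized Shannon entropy of where an animal was observed.

    ASSUMPTION: each entry of `visits` is one equally spaced observation, and
    the result is divided by log2(n_locations) so it lies between 0 and 1.
    """
    total = len(visits)
    counts = Counter(visits)
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    return h / math.log2(n_locations)

# Hypothetical animals in an enclosure with 4 monitored locations.
homebody = ["nest"] * 12                                # never leaves the nest
explorer = ["nest", "wheel", "tunnel", "feeder"] * 3    # uses every location equally

print(roaming_entropy(homebody, 4), roaming_entropy(explorer, 4))  # prints 0.0 1.0
```

An animal that stays in one place scores 0, and one that divides its time evenly among all locations scores 1, which is the sense in which greater roaming entropy reflects broader exploration.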

Fig. 3. The tick's Umwelt is structured by three functional cycles executed from top to bottom.

Fig. 4. Abstract representation of integrating behavior systems into Umwelt theory. Behavior systems replace von Uexküll's notion of central receptors and effectors (top). In the bottom portion of the figure, different levels of information processing control are represented with an arbitrary number of structures enumerated at each level. Lines indicate possible channels between different levels of information processing. In this depiction of behavior systems, the first three levels are different levels of motivational control, which prime lower levels for more specific motivational states and functions. Systems (the highest level) are general motivational control processes that prime lower-level systems for specific functions such as feeding. Subsystems are lower-level motivational processes that sensitize the animal to specific types of cues for the function activated by a higher-level system. For example, a feeding system may activate a predating subsystem, which sensitizes lower-level units to specific perceptual and motor cues appropriate for predation. Modes are the lowest level of motivational control; they further narrow the range of specific cues an animal is sensitive to and prime specific modules to these cues. Modes in this sense are similar to the sequence of functional cycles structuring the tick's Umwelt (Fig. 3), but unlike in the tick, modes, as with all levels of control, can be modified by learning. Modules are perceptual-motor functional units that respond to specific cues with specific motor outputs.

Fig. 6. Illustration of the agent-based model for the evolution of SPLP in a group-structured population. Social groups are yellow-filled circles containing juveniles and adults. Juveniles are depicted with open circles (of various colors representing their probability of cooperating in RTP), and adults are shown with solid circles (of various colors also representing their probability of cooperating in public goods games). Bluer colors correspond to a higher probability of cooperating, and redder colors correspond to a lower probability of cooperating. Throughout their juvenile period, agents with an SPLP can modify their inherited probability of cooperation through RTP.

Fig. 7. Evolved parameter values over generations for experimental and control simulations in populations with Gmax = 20 and under two different dispersal conditions (d = 0.07, left column; d = 0.12, right column). SPLP was allowed to evolve in the experimental condition (orange) but could not evolve in the control condition (black). The first row is the proportion of the population with an SPLP as it evolved over generations in the experimental condition (a and b). The second row is the mean probabilities of cooperating for the experimental condition (orange) and the control condition (black; c and d). The third row is the mean inherited (genetic) probabilities of cooperating for the experimental condition (orange) and the control condition (black; e and f). The fourth row is the population average information entropy for the probabilities of cooperating for the experimental condition (orange) and the control condition (black; g and h).

Fig. 8. Systematic analysis of parameter values for experimental and control simulations in populations for two different multiplier values (m = 1.5, left column; m = 2.0, right column) while varying Gmax (rows) and dispersal rates (x-axis). Orange lines and markers are from experimental conditions in which SPLP was allowed to evolve, and black lines and markers are from control conditions in which SPLP was not allowed to evolve. Each marker is an endpoint (i.e., the mean value of five simulations averaged over the last 5000 time steps of simulations). The values plotted in each of the 10 graphs for the experimental conditions are the proportion of agents with SPLP, the mean probability of adult cooperation, and the mean population entropy. The values plotted in each of the 10 graphs for the control conditions are the mean probability of adult cooperation and the mean population entropy.

Table 3
Fixed parameters, evolvable variables, initial conditions, varied parameters, and measurement variables used in the simulation study.
inherited mean: Mean value of the inherited (genetic) p(C|S) of adults
p(C|S) end mean: Mean values of p(C|S) of adults at the end of a simulation and after possible modification by an SPLP
Hg mean: Mean of the group entropies, Hg
J.C. Schank et al.