Hamilton’s rule, gradual evolution, and the optimal (feedback) control of phenotypically plastic traits

Most traits expressed by organisms, such as gene expression profiles, developmental trajectories, behavioural sequences and reaction norms are function-valued traits (colloquially ‘‘phenotypically plastic traits”), since they vary across an individual’s age and in response to various internal and/or external factors (state variables). Furthermore, most organisms live in populations subject to limited genetic mixing and are thus likely to interact with their relatives. We here formalise selection on genetically determined function-valued traits of individuals interacting in a group-structured population, by deriving the marginal version of Hamilton’s rule for function-valued traits. This rule simultaneously gives a condition for the invasion of an initially rare mutant function-valued trait and its ultimate fixation in the population (invasion thus implies substitution). Hamilton’s rule thus underlies the gradual evolution of functionvalued traits and gives rise to necessary first-order conditions for their uninvadability (evolutionary stability). We develop a novel analysis using optimal control theory and differential game theory, to simultaneously characterise and compare the first-order conditions of (i) open-loop traits – functions of time (or age) only, and (ii) closed-loop (state-feedback) traits – functions of both time and state variables. We show that closed-loop traits can be represented as the simpler open-loop traits when individuals do not interact or when they interact with clonal relatives. Our analysis delineates the role of state-dependence and interdependence between individuals for trait evolution, which has implications to both life-history theory and social evolution. 2021 The Authors. Published by Elsevier Ltd. This is an openaccess article under the CCBY license (http:// creativecommons.org/licenses/by/4.0/).


Introduction
All biological organisms are open systems exchanging energy, matter, and information with their surrounding. As such, most if not all traits of an organism may vary in response to changes of its internal factors as well as to changes in its external biotic and abiotic environmental conditions. Examples include gene expression profiles, physiological processes, reaction norms, life-history traits, developmental trajectories, morphological shapes, and behavioural sequences. We collectively call these traits functionvalued traits by which we mean phenotypes whose expression depends on some parameter(s) or variable(s) (e.g., time, space, internal or external biotic and abiotic conditions), and these traits are often called colloquially as ''phenotypically plastic traits" in evolutionary biology (p. 33 West-Eberhard, 2003). Formalising how selection shapes these traits is relevant as it helps to understand their evolution and the mechanistic constraints involved in their functioning. This has been done for genetically determined function-valued traits using different theoretical approaches that consider different biological perspectives on the evolution of these traits.
First, the evolution of life-history schedules has often been studied by applying Pontryagin's maximum principle (e.g., León, 1976;Macevicz and Oster, 1976;Oster and Wilson, 1977;Iwasa and Roughgarden, 1984;Sibly et al., 1985;Stearns, 1992;Perrin, 1992;Kozłowski, 1992;Bulmer, 1994;Irie and Iwasa, 2005;Parvinen et al., 2013;Lehmann et al., 2013;Metz et al., 2016). Here, a trait evolves to vary as a function of the age or time of interaction of individuals, while individual fitness (expected survival and reproduction) can be constrained by the dynamics of state variables. These state variables are observables describing internal or external conditions of an individual, e.g., body size, fat reserves, information, resource availability, behaviour of others, that in turn depends on trait expression. Models applying Pontryagin's maximum principle formalise the evolution of so-called open-loop traits (Weber, 2011;Liberzon, 2011), whose name emphasises that the trait itself involves no feedback loop, since it depends only on time (age). As such, an open-loop trait can be thought of as an entirely fixed course of phenotypic expression from birth to death of an individual (trait expression happens ''by the clock"); an example would be an age-dependent growth trajectory. Evolution of open-loop traits has also been formalised to include interactions between relatives, which allows to consider their evolution under limited genetic mixing such as spatially or family-structured populations (Bulmer, 1983;Day and Taylor, 1997;Day and Taylor, 1998;Day and Taylor, 2000;Wild, 2011;Avila et al., 2019).
Second, in behavioural ecology and evolutionary game theory, selection on function-valued traits has typically been studied by using dynamic programming (see Houston and McNamara, 1999;Mangel and Clark, 1988 for textbook treatments and e.g. Leimar, 1997;Ewald et al., 2007;McNamara and Houston, 1987;Dechaume-Moncharmont et al., 2005). Here, the trait evolves to vary not only as a function of time but also as a function of relevant state variables. This formalises so-called closed-loop traits (Weber, 2011;Liberzon, 2011) as these now involve a feedback between trait expression and state dynamics (see Fig. 1 for a schematic conceptualisation). A closed-loop trait can thus be thought of as a contingency plan, which specifies a conditional trait expression rule according to fitness relevant conditions an individual may be in. An example would be an ontogenetic allocation of resources to somatic functions depending on fat reserve (by contrast, the corresponding open-loop trait would allocate resources only depending on age).
Both Pontryagin's maximum principle and dynamic programming are optimal control theory approaches (e.g., Bryson and Ho, 1975;Basar and Olsder, 1999;Dockner et al., 2000;Sydsaeter et al., 2005;Weber, 2011;Liberzon, 2011;Kamien and Schwartz, 2012), whose common aim is to identify a schedule of control variables-a trait-over a period of time that maximises (in the best response sense) an objective function (fitness in biology). A crucial result of this literature is that open-loop and closed-loop trait expression leads to different outcomes when individuals interact (e.g. Basar and Olsder, 1999;Dockner et al., 2000). This means that the evolution of a function-valued trait should depend on the assumptions about its functional dependence as well as the type of interactions individuals face. Yet the conditions under which it matters to distinguish between open-and closed-loop traits and how this impacts on an evolutionary analysis remains unclear and has, perhaps surprisingly, not been worked out in evolutionary biology despite the wide interest in the evolution of phenotypic plasticity.
Function-valued traits have also been studied in quantitative genetics theory, where the directional selection coefficient on function-valued traits has been derived assuming no interactions between individuals (Kirkpatrick and Heckman, 1989;Gomulkiewicz and Kirkpatrick, 1992;Gomulkiewicz and Beder, 1996;Beder and Gomulkiewicz, 1998). This selection coefficient describes selection over short time-scales (time-scales of demographic changes) and can be decomposed into component-wise descriptions, which allows to describe the direction of selection for each component of a function-valued trait. While this selection coefficient has been connected to long-term evolution and extended to include interactions between individuals in wellmixed populations Dieckmann et al., 2006), this literature does not distinguish between open-loop and closed-loop traits and it thus remains unclear how the selection coefficient on a trait connects to the dynamic state constraints (that can be physiological or informational) underlying trait evolution. This is why it would be useful to connect the directional selection coefficient on function-valued traits to optimal control theory results, because it provide a way to analyse how selection on traits depends on inter-dependencies and constraints between different trait components.
There are thus different approaches to the evolution of function-valued traits, but the scope of existing results and the connection between them is not clear. In contrast, for selection on quantitative scalar traits general results have long been proven to hold. In particular, for small trait deviations (weak selection), the selection coefficient on a scalar quantitative trait in a population subject to limited genetic mixing can be expressed as a marginal version of Hamilton's rule, where the direct and indirect fitness effects (the ''cost" and ''benefit") are given by partial derivatives of individual fitness (e.g., Taylor and Frank, 1996;Frank, 1998;Roze and Rousset, 2003;Rousset, 2004;Lehmann and Rousset, 2014;Van Cleve, 2015). This selection coefficient provides two useful results about gradual quantitative trait evolution. First, Fig. 1. Open-loop (age/time-dependent) and closed-loop (age/time-dependent and state-dependent) conceptualisation of function-valued traits. Open-loop traits affect state variable(s). Closed-loop traits affect state variable(s) and state-variable(s) affect(s) them in turn (thus there is a feedback-loop). In both cases: (i) traits and state variables can vary over age/time and (ii) state variables affect fitness. Same biological phenomena can be conceptualised as either open-loop or closed-loop traits. since the selection coefficient is independent of allele frequency and is of constant sign (Roze and Rousset, 2003;Roze and Rousset, 2004;Rousset, 2004;Lehmann and Rousset, 2014), the marginal Hamilton's rule subtends gradual evolution of all scalar traits even when the survival and reproduction of individuals depend on the behaviour of others, such as under density-and frequency-dependent selection. Second, when the selection gradient vanishes, Hamilton's rule provides the necessary first-order condition for a strategy to be locally uninvadable; that is, it allows to determine candidate evolutionary stable strategies, which is central to characterise long-term evolution (Geritz et al., 1998;Rousset, 2004). Do these general principles of gradual evolution in group-structured populations that hold for scalar traits also hold for function-valued traits?
Our goal in this paper is to formalise the directional selection on genetically determined function-valued traits when state constraints affect trait expression and the evolving population is group-structured (subject to limited genetic mixing). To achieve this we develop a two-step analysis. First, it is to formalise the directional selection coefficient on quantitative function-valued traits under limited genetic mixing (taking panmictic population as a special case into account) in order to characterise the gradual evolution of function-valued traits. Second, from the directional selection coefficient we derive how state-dependence of trait expression affects selection on function-valued traits. The rest of this paper is organised as follows. (1) We derive the selection coefficient acting on a mutant allele coding for a function-valued trait in the island model of dispersal (group-structured population) under weak selection, which yields the marginal version of Hamilton's rule for function-valued traits. We deduce from Hamilton's rule the necessary first-order condition for local uninvadability, which yields the candidate uninvadable function-valued traits and applies to both continuous and discrete traits. (2) We apply these results to time-dependent function-valued traits (dynamic traits) by deriving necessary conditions for uninvadability expressed in terms of dynamic constraints on state variables and their (marginal) effects on the reproductive value. This allows us to compare how selection acts on open-loop versus closed-loop traits, specifying the role of trait responsiveness. In turn, this allows to establish the connection between the dynamic programming and the maximum principle type of approaches in the context of gradual phenotypic evolution. (3) We illustrate the different main concepts of our approach by analysing the evolution of temporal common pool resource production and extraction within groups. (4) Finally, we discuss the scope of our results.

Biological scenario
Consider a haploid population subdivided into an infinite number of homogeneous groups (without division into class structure) with a fixed number N of individuals, where censusing takes place at discrete demographic time periods. All groups are subject to the same environmental conditions and are equally connected to each other by random dispersal. A discrete demographic time period spans an entire life cycle iteration where various events can occur (e.g. growth, reproduction, dispersal) to individuals. The life cycle may allow for one, several, or all individuals per group to die (thus including whole group extinction through environmental effects or warfare). Generations can thus overlap but when this occurs, the parents are considered equal (in respect to their ''demographic properties") to their offspring in each generation (since there is no within-group class structure). Dispersal can occur before, during, or after reproduction, and more than one offspring from the same natal group can establish in a non-natal group (i.e., propagule dispersal). We refer to this group-structured population where all individuals within groups are indistinguishable as the homogeneous island population (i.e., broadly this corresponds to the infinite island model of dispersal of Wright (1931), used since at least Eshel, 1972 under various versions to understand selection on social traits, e.g., Rousset (2004), and where the specifics of our demographic assumptions are equivalent to those considered in Mullon et al. (2016)).
We assume that two alleles segregate in the homogeneous island population at a locus of interest: a mutant allele with trait u m 2 U T ½ and a resident (wild-type) allele with trait u 2 U T ½ . Here, U T ½ is the set of feasible traits that individuals can potentially express and is formally defined as the set of real-valued functions with range U and domain T , where T is a space of some index variable(s) representing, for instance, time, an environmental gradient or cue. We assume here that T is a closed interval over some discrete or continuous index variable t. If t is discrete, then the element u 2 U T ½ is a vector and if t is continuous then the element u 2 U T ½ is a piece-wise continuous function. Hence, we write u ¼ u t ð Þ f g t2T to emphasise that it consists of the (finite or infinite) collection of all point-wise values u t ð Þ of the trait. Namely, u can be thought of as a continuous or a discrete ''path" (or a schedule, or a trajectory) on the space T . Note that in Table 1 we have outlined a list of symbols of key components of our model.
The crucial assumption of this paper is that the mutant trait u m can be expressed in the following form as a small deviation from the resident trait: where the phenotypic deviation function g ¼ g t ð Þ f g t2T must satisfy u þ g 2 U T ½ for sufficiently small non-negative parameter .
Because U T ½ may have a boundary, not all phenotypic deviations generate a mutant strategy u m that remains within the bounds of the feasible trait space u m 2 U T ½ , independent of the choice of (see Section 2.3.2). Note that we are making a distinction between a phenotypic deviation g (a function) and the effect size (a scalar) of that deviation. In the literature, a (scalar) mutant effect is often modelled with the notation d ¼ g (e.g. Rousset, 2004). This distinction between phenotypic deviation and effect size in the notation is necessary for analysing selection on function-valued traits.

Allele frequency change and short-term evolution
Our first aim is to characterise the change in mutant allele frequency in the homogeneous island population under weak selection ( ( 1). To that end, it is useful to follow the direct fitness approach (Taylor and Frank, 1996;Rousset and Billiard, 2000;Rousset, 2004) and introduce the individual fitness function Þgives the expected number of successful offspring produced over one life cycle iteration by a focal individual (possibly including self through survival) with trait u , when its average neighbour in the focal group has trait u and an average individual (from other groups) in the population has trait u, which is taken here to be the resident trait for simplicity of presentation. We note that any individual in the population can be taken to be the focal individual (Rousset and Billiard, 2000;Rousset, 2004) and that the fitness of this individual can always be expressed in terms of average phenotypes of other individuals in different roles with respect to the focal (e.g., group neighbour, cousin, members of other groups, etc.), whenever mutant and resident phenotypes are closely similar (see the argument in Appendix A.2 for function-valued traits and a textbook argument for scalar traits e.g. p. 95 Rousset, 2004). These individuals in different roles, as well as the focal individual itself, are actors on the fitness of the focal. Here, the focal individual is regarded as the recipient of the trait expressions of different actors (i.e. focal individual, average neighbour, average individual in other groups), which corresponds to the direct fitness or recipientcentred approach (e.g. Rousset, 2004, Chapter 7).
In terms of this definition of individual fitness, we define the direct fitness effect of expressing the mutant allele as which is the effect that the focal individual has on its own fitness if it would switch from expressing the resident to the mutant allele for a small allelic effect. Analogously, we define the indirect fitness effect of expressing the mutant allele as which is the effect that the whole set of neighbours have on focal's fitness if they were to all switch from expressing the resident to the mutant allele (note that appears in front of the derivatives in Eqs. (2)-(3) because it scales the fitness effect of a trait deviation in direction g and so Àc g u ð Þ and b g u ð Þ are the net fitness effects). Finally, let us denote by r u ð Þ the neutral relatedness between two randomly sampled group neighbours (Michod and Hamilton, 1980;Frank, 1998;Rousset, 2004) in the homogeneous island population that is monomorphic for the resident; namely, r u ð Þ is the probability that in a neutral process (where all individuals are alike) the two homologous alleles of these individuals coalesce in the same common ancestor (e.g., Roze and Rousset, 2003;Rousset, 2004;Lehmann and Rousset, 2014;Van Cleve, 2015). Note that relatedness defined as such depends only on the resident trait. In Appendix A, we show that the change Dp in the frequency p of the mutant allele over one demographic time period (one life cycle iteration) can be expressed in terms of these quantities as follows.
Invasion implies substitution principle result. In the homogeneous island population with two alleles, the change in mutant allele frequency p in the population takes the form where p 1 À p ð Þis the genetic variance at the locus of interest, is a selection coefficient of order O ð Þ that is independent of p, and O 2 À Á is a remainder of all higher order terms. This entails an ''invasion implies substitution" property of the mutant allele, which says that if s g u ð Þ > 0, the mutant allele coding for a small function-valued deviation g is selected for and not only invades but substitutes the (ancestral) resident allele [since effects of order O 2 À Á can be neglected in Eq. (4) whenever s g u ð Þ is non-zero]. We have thus formalised an ''invasion implies substitution"principle (see Priklopil and Lehmann, 2020 for a review) for function-valued traits in the homogeneous island population and which takes the form of Hamilton's rule: the mutant spreads if r u ð Þb g u ð Þ À c g u ð Þ > 0. This novel result is a multidimensional generalisation of previous analogous results for scalar traits (Roze and Rousset, 2003;Roze and Rousset, 2004;Rousset, 2004;Lehmann and Rousset, 2014).
Owing to its simplicity, the function-valued trait nature of our result is perhaps yet not fully apparent, but is made explicit on noting that the direct and indirect effects (Eqs. (2) and (3)) are both formally Gâteaux derivatives, which are directional derivatives (see Section A.1 in Appendix for a formal definition and e.g., Troutman, 1991, pp. 45-50;Luenberger, 1997, pp. 171-178) and represent changes in fitness resulting from a sum of all weighted component-wise changes in trait expression (over the domain T ) induced by the mutation function g. To outline the componentwise change in fitness, it is useful to decompose the selection coefficient as where Á is an inner product on functions (the generalisation of a dot product, see e.g. Anton and Rorres, 2013 Chapter 6), s u ð Þ ¼ s t; u ð Þ f g t2T is the selection gradient function, where the component s t; u ð Þ gives the selection gradient on component u t ð Þ of the trait, i.e. the value of u at time t, holding other components u t 0 ð Þ (for all t 0t 2 T ) of the trait fixed. Each component of the selection gradient function is then given by are, respectively, the effect on the focal's own fitness from changing marginally component u t ð Þ of its trait, while holding other trait components u t 0 ð Þ (for t 0t) fixed, while b t; u ð Þ is the effect of all group neighbours on the focal individuals fitness when changing marginally component u t ð Þ of their traits, while holding other components u t 0 ð Þ (for t 0t) of their traits fixed. That is, the costs and benefits are partial derivatives and s t; u ð Þ is the component-wise inclusive fitness effect. When t is discrete and T finite, Eq. (7) corresponds to the trait specific inclusive fitness effect derived previously for a backdrop monomorphic resident homogeneous island population (Mullon et al., 2016, Eq. (12)).
Eq. (6) shows that the selection coefficient is a weighted change of trait-specific changes. Note that for continuous index variable t over the interval T , the partial derivatives Àc t; u ð Þand b t; u ð Þin Eq. (8) are formally functional derivatives (see Eq. (A.3) in Appendix for a formal definition and see e.g. Appendix A in Engel and Dreizler, 2013, in particular, Eq. (A.28)). In the absence of interactions between relatives s t; u ð Þ reduces to b y ð Þ in Eq. (1) of Gomulkiewicz and Kirkpatrick (1992) Day and Taylor (2000), which allows for interactions between relatives).

Necessary condition for local uninvadability and long-term evolution
It follows from our ''Invasion implies substitution principle" result that a necessary first-order condition for a trait u Ã to be locally uninvadable (resistant to invasion by any mutant in a small radius ( 1) is given by a non-positive selection coefficient for all admissible mutants in the resident u Ã population, that is Local resistance to invasion by sets of alternative mutants allows to characterise candidate long-term evolutionary outcomes (Eshel and Feldman, 1984;Eshel, 1996;Eshel et al., 1998) and is a first-step (and often the only directly accessible computational step under limited genetic mixing) towards characterising uninvadable traits. A crucial question is whether a locally uninvadable strategy u Ã will be approached by gradual evolution from within its neighbourhood and thus be convergence stable (Eshel, 1983;Lessard, 1990;Geritz et al., 1998;Rousset, 2004;Leimar, 2009). Because characterising convergence stability involves a second order analysis of the selection coefficient, which is involved for multidimensional traits (Lessard, 1990;Leimar, 2009), it will not be investigated further in this paper. For the same reason, we will also not consider sufficient conditions for local uninvadability. In the remainder of this section, we focus on characterising in more detail the necessary condition of local univadability (Eq. (9)) in terms of the selection gradient function s u ð Þ, which allows removing the considerations of mutational effect g.

Local uninvadability for scalar-valued traits
Let us first consider the case of scalar quantitative traits, where the trait of each individual is an element belonging to a bounded subset U & R of the real line. That is, the resident and mutant traits stay within the feasibility bounds u min 6 u; u m 6 u max ). For this case, the index t in Eq. (7) can be dropped and one obtains the standard selection gradient on a scalar-valued trait for the homogeneous island population: (Taylor and Frank (1996), Frank (1998), Roze and Rousset (2003), Roze and Rousset (2004), Rousset (2004), Lehmann and Rousset (2014), Van Cleve (2015)).
Note that for u ¼ u min an admissible phenotypic deviation g must be non-negative g P 0 and for u ¼ u max it must be nonpositive g 0 while for u min 6 u 6 u max the deviation g is unrestricted. Substituting this into the first-order condition for uninvadability Eq. (9) yields that the necessary condition for uninvadability for scalar bounded traits can be expressed in the following form Note that if the set of admissible traits is unbounded (i.e. U ¼ R), then the first-order necessary condition for local uninvadability is given by the second line of Eq. (11).

Local uninvadability for function-valued traits
Let us return to the general case where the trait of each individual is an element of U T ½ , being either a vector (T discrete) or a (bounded and piece-wise continuous) function (T continuous). More precisely, for all t 2 T the resident and mutant traits stay within the feasibility bounds u min t ð Þ 6 u t ð Þ; u m t ð Þ 6 u max t ð Þ, such that u min ¼ u min t ð Þ f g t2T and u max ¼ u max t ð Þ f g t2T . Now, an admissible deviation g ¼ g t ð Þ f g t2T must satisfy for all t 2 T similar conditions as given for the scalar-traits in Section (2.3.1), that is, for u t ð Þ ¼ u min t ð Þ an admissible phenotypic deviation g must be nonnegative g t ð Þ P 0 and for u t ð Þ ¼ u max t ð Þ it must be non-positive Substituting the admissible deviations into Eq. (9) yields that a candidate uninvadable strategy u Ã ¼ u Ã t ð Þ f g t2T 2 U T ½ satisfies for all t 2 T : which is thus equivalent to Eq. (11) in a point-wise way.

From the selection gradient to candidate optimal controls
The point-wise description of the candidate uninvadable trait u Ã given by Eq. (12) is unlikely to be directly useful in solving for u Ã in concrete applications because factors characterising the organism and its environment change over time (e.g. organisms can grow and the resources in the environment can get depleted). Hence, solving u Ã from Eq. (12) would entail simultaneously solving a large number of equations, while tracking the changes in the relevant time-dependent factors. A more useful characterisation of u Ã can be achieved with the use of the mathematical framework of optimal control theory, most notably dynamic programming and Pontryagin's maximum principle, both of which have been used abundantly in evolutionary biology and in different contexts (e.g., León, 1976;Iwasa and Roughgarden, 1984;Mangel and Clark, 1988;Houston and McNamara, 1999;Stearns, 1992;Perrin, 1992;Kozłowski, 1992;Day and Taylor, 1997;Day and Taylor, 2000;Cichon and Kozlowski, 2000;Irie and Iwasa, 2005;Lehmann et al., 2013;Priklopil et al., 2015;English et al., 2016;Metz et al., 2016;Avila et al., 2019).

Fitness function, control variables and state constraints
For space reasons, we focus on a continuous time formulation (but parallel developments apply to discrete time), and assume that a demographic time period is characterised by the time interval T ¼ 0; t f ½ during which the trait expression is observed. This time interval can be thought of as the length of the lifespan of organisms or the time during which behavioural interactions occur between individuals (e.g. a mating season, winter season), which eventually leads to reproduction. More specifically, we now assume that the fitness of the focal individual can be written in the form Þis the so-called scrap value; namely, the contribution to individual fitness at the final time t ¼ t f (formally collect, respectively, the trait expression levels u t ð Þ; u t ð Þ, and u t ð Þ at time t of the focal individual, that of an average neighbour, and an average individual from the population, and the state variables x t ð Þ; x t ð Þ, and x t ð Þ of these respective individuals (note that the ''" in the subscript of u and x emphasises that these controls and state variables collect those of all actors on the fitness of the focal recipient). State variables describe some measurable conditions of an individual (e.g. size, stored energy, hunting skill) or that of its environment (e.g. amount of resource patches, environmental toxicity). The defining feature of a state variable in our model is that its time dynamics depends on the evolving trait of one or more individuals in interaction and we will henceforth from now on call the elements of u t ð Þ the control variables, which is customary for these type of models in the evolutionary literature (e.g., Perrin, 1992;Day and Taylor, 2000).
Because models with both control and state variables become rapidly notationally complex, we assume that both the controls and the state variables are one-dimensional real numbers. The state of every individual is assumed to change according to the function g : with initial condition (''i.c.") x 0 ð Þ ¼ x init and which is the rate of change of the state of a focal individual with control u t ð Þ in state x t ð Þ, when its neighbours have average control u t ð Þ and average state x t ð Þ in a population where the average control (in other groups) is u t ð Þ and the average state is x t ð Þ. Similarly, we can also express the rate of change of the state of an average neighbour of the focal and an average individual in the rest of the population, respectively, as where the vectors collect the (average) controls and states of actors on the state variables of an average neighbour of the focal individual (first line), and on an average individual in the population (second line), respectively (here and throughout all vectors are defined by default as being column vectors). These actors are thus second-order level actors on the focal recipient since they affect the state variables of actors affecting the focal's fitness. Note that the subscripts of the control vectors (u t ð Þ and x t ð Þ) and state vectors (u t ð Þ and x t ð Þ) emphasise the individual (actor) from who's perspective the second-order actors' control and variables are collected. Accordingly, the vectors in Eq. (17) contain elements which are, for an average neighbour of the focal, the control and state expressions of average neighbours viewed as actors on the focal individual. While we have so far explicitly distinguished between the states of different individuals, which is required if state represents some property of individual's condition (e.g. body size or individual knowledge), nothing prevents the individual state to represent some environmental condition common to the group or population and which can be influenced by individual behaviour (e.g. local amount of resources in the group, in which case x t ð Þ ¼ x t ð Þ, see concrete example in Section 4). Note that while tracking the dynamics of three state variables (Eqs. (15)-(18)) may appear complicated, it is much simpler than tracking the state of all individuals in a group separately (which would require as many equations as there are individuals in the group and is the approach taken in Day and Taylor (1997), Day and Taylor (2000) and differential game theory, e.g., Dockner et al. (2000), Weber (2011)).
Finally, we now make a couple of remarks about the properties of the fitness w u ; u ; u ð Þ(Eq. (13)) of the focal individual and its dynamic constraints (Eqs. (15) and (16)), which is a special case of a fitness w u ; u ; u ð Þconsidered in Section 2. First, the fitness (13) depends on the full trajectories of the control u ¼ u t ð Þ f g t2T and state x ¼ x t ð Þ f g t2T variables, but since the state variables are fully determined by the controls (by way of Eqs. (15) and (16)) and the initial condition x init (which we assume here to be fixed), then fitness is determined by the controls. In particular, if fitness depends only on the state of the system at the final time , then fitness still depends critically on the control variables. We assumed in Section 2 that the fitness w u ; u ; u ð Þ is Gâteaux differentiable (Eqs. (2) and (3)), which means here that functions f ; U and g are smooth enough with respect to its arguments (see e.g. Section 3 of Liberzon (2011) for textbook treatment of assumptions and Clarke (1976) for minimal assumptions needed). We finally note that in the homogeneous island population, individual fitness depends in a non-linear way on various vital rates (e.g., Roze and Rousset, 2003

Concept of neutral reproductive value and shadow value
A central role in our analysis will be played by the neutral of an individual at time t in a resident population, which gives the total contribution to fitness from time t onward of a (recipient) individual when the current state variables of the actors on its fitness is x t ð Þ. We emphasise that the future-value reproductive value or Þis connected but not equivalent to Fisher's reproductive value (e.g., Goodman, 1982;Charlesworth, 1994), which is the residual fitness conditional on reaching a certain age (a current-value reproductive value). The argument u has been separated with the semicolon in order to emphasise that the (future- Þis evaluated assuming a fixed control trajectory where u is treated as a parameter. Hence, the reproductive value is formally a function of current time t and state . In Appendix B.1, we show that the reproductive value satisfies the following partial differential equation (PDE) is the inner product of vectors and the vector is the gradient of the reproductive value with respect to the changes in the state variables of each individual affecting the focal's fitness (and associated with fixed resident control path u). In the last equality, we use the vector x t ð Þ as an argument of the reproductive value (which is defined in a resident population), which might be confusing at first glance, since all individuals in the resident population are actually in the same state. However, the reason why we use the vector x t ð Þ when expressing the partial derivatives is because we want to emphasise which state (focal individual's, average neighbour's or average in other groups) we are varying.
The ''-" sign on the left-hand-side of Eq. (20) indicates that the reproductive value of an individual is growing when looking backwards in time. Hence, it grows according to the current rate Þof the effects of the current state change of each type of actor on the future fitness of the focal individual, weighted by the change g u t ð Þ; x t ð Þ ð Þ of state of the actors that are all the same in a resident population. The elements of the gradient k t; x t ð Þ; u ð Þare called the shadow values of the states in the optimal control literature (see e.g. Dorfman, 1969;Caputo and Caputo, 2005), since by changing state, there is no immediate effect on fitness, but only future fitness effects.

Concept of open and closed-loop controls
Because the internal and external conditions of organisms vary, trait expression can evolve to be functionally dependent on these conditions (Sibly and McFarland, 1976;McFarland, 1977;McFarland and Houston, 1981;Houston and McNamara, 1999 where the function d : T Â R 3 ! U is the trait expression rule (or decision rule for short) in the so-called closed-loop (or feedback or Markovian) form of the control variable (Basar and Olsder, 1999, p. 221;Dockner et al., 2000, p. 59).
3.2. First-order conditions for closed-loop controls 3.2.1. The first-order condition in terms of dynamic constraints Let us now evaluate the point-wise fitness effects of Hamilton's marginal rule (7) Þ are vectors collecting the rates of state changes at time t. The derivatives are evaluated at u ¼ d Þ are vectors of closed-loop and open-loop trait expression rules, respectively, evaluated in a resident population.
We now make two observations about the direct and indirect effects. First, the perturbations of the change of the state variables Þ , but since under a first-order analysis everything else than the original perturbation needs to be held constant, the downstream effects are accounted for by the shadow values k t; x t ð Þ; u ð Þevaluated in the resident population. Second, the state dynamics of an average individual in the population is not affected from variations in u and u In order to obtain a full characterisation of the first-order condition taking into account the dynamic constraints brought by the shadow value, it is useful to introduce the Hamiltonian function This can be thought of as the contribution to individual fitness of all current ''activities" (Dorfman, 1969, p. 822); namely, the (phenotypic) expressions u t ð Þ of all individuals currently in state x t ð Þ at t, holding everything else fixed in a resident population. It is thus the sum of the current rate of fitness contribution Þevaluated in the resident population, since the shadow values do not directly depend on the activities at time t. Our next result (proved in Appendices B.1 and B.2) establishes the necessary condition for uninvadability for closed-loop control paths as follows.
Closed-loop control result.
and the shadow value We now emphasise two points about this result where all quantities are evaluated on the resident control u Ã ¼ d Ã x Ã ð Þ and state x Ã paths. First, the dynamic constraints entail solving forward in time Eq. (27), which is an ODE (ordinary differential equation), and solving backwards in time Eq. (28), which is a PDE (partial differential equation). Thus, the Hamiltonian can be thought as the growth rate of the reproductive value (when looking backwards in time). For a reader familiar with the dynamic programming literature, the Þ is not the so-called value function of the model and hence Eq. (20) (even when evaluated along the candidate uninvadable control path u ¼ u Ã ) is not the eponymous Hamilton-Jacobi-Bellman equation (e.g., Bryson and Ho, 1975;Kamien and Schwartz, 2012;Basar and Olsder, 1999;Dockner et al., 2000;Liberzon, 2011;Weber, 2011). This means that the above result says nothing about the sufficiency of uninvadability, like any standard first-order selection analysis. In this regard, our result provides a weaker, yet simpler and novel condition to characterise closed-loop controls.
Second, by substituting Eq. (25) into (26) yields that any interior candidate uninvadable strategy satisfying s t; u Ã ð Þ¼0 (recall Eq. (12)) must satisfy This fundamental balance condition says that the current inclusive fitness effect (on the focal individual) is traded-off (hence the negative sign) by the state-modulated future inclusive fitness effect resulting from the change in state variables. This trade-off is instrumental in allowing to characterise the candidate uninvadable control Þ , which can be typically done in two steps. The first step is to determine u Ã t ð Þ satisfying (29), while treating the system state x Ã t ð Þ and its shadow value Þas parameters, yielding the implicit expression in terms of some function D (that satisfies Eq. (29)). Essentially, this step is akin to solving a static (one-dimensional) first-order condition (and which can in principle also be used whenever s t; u Ã ð Þ-0, Dockner et al., 2000, p. 97). We will refer to this firststep characterisation as the static characterisation, since it allows to characterise the general nature of the solution in terms of Þindependently of their explicit values. The second step entails solving for the trajectories of Þgenerated by Eqs. (27) and (28) under Eq. (30) and then taking the gradi- Þwe can explicitly characterise the candidate uninvadable control by substituting these solutions into Þis the main technical challenge in finding the candidate uninvadable traits.
It is also often the case in biological models that the Hamiltonian is affine in the control variables so that fitness depends linearly on the evolving traits (e.g. Macevicz and Oster, 1976;Perrin, 1992;Irie and Iwasa, 2005;Avila et al., 2019). In such cases, controls do not appear in the selection gradient (26) and hence, one can not directly determine from it the static characterisation (30). These types of controls are known to be singular arcs (see Kelley, 1964;Kopp and Moyer, 1965;Goh, 1966 for classic developments and see e.g. Sethi and Thompson, 2006;Bryson and Ho, 1975 for textbook treatments). In order to characterise the candidate uninvadable singular arc, we can take the total time derivative of the selection gradient s t; u Ã ð Þ, which (potentially) provides an additional algebraic equation in the variables (u Ã ; x Ã ; k Ã ) that can contain the control(s) with a non-zero coefficient. In case it does not, another time derivative can be taken until expression for u Ã can be obtained. Hence, for singular arcs, we can obtain the static characterisation (30) by applying d dt until u Ã can be obtained. Note that for a candidate uninvadable control to be a singular arc, Eq. (31) has to hold for a finite interval. If Eq. (31) does not hold over a finite interval, then u Ã is known to be a bang-bang control (see e.g. Sethi and Thompson, 2006;Bryson and Ho, 1975), meaning that u Ã takes the values only on its boundaries ( (12)).

Shadow value dynamics and state feedback in a resident population
From the static (first-step) characterisation of (30)), we observe that the candidate uninvadable trait is at most a function of k Ã t; . The dynamics of the shadow value, given by Eq. (32), may at first glance appear to be an ODE (and therefore easier to solve than Eq. (28), which is clearly a PDE for the reproductive value v Ã ). This may lead one to hope that it is possible to circumvent from explicitly determining v Ã , by simply solving Eq. (32) to directly obtain k Ã . But this hope is crushed by the trait dependence on state, which entails that Eq. (32) depends on the derivatives of the elements of d t; x t ð Þ ð Þ with respect to state, which in turn depends on higher-order derivatives of v t; Þ . This can be seen by using Eq. (30), whereby which unveils that Eq. (32) is actually a PDE. This means that in general it is not possible to determine the candidate uninvadable trait from using Eq. (32), which has been repeatedly stressed in optimal control theory (e.g. Bas ßar, 1977). However, the analysis of the components of Eq. (32) has less been stressed, but turns out to be informative in highlighting the main similarities and differences between selection on closedloop and open-loop controls. Lets now decompose Eq. (32) for the component Þ . This says that the rate of change of the shadow value is given by a direct effect of state change on the Hamiltonian (current fitness effect and statemodulated fitness effect) and a feedback effect on the Hamiltonian, which arises since closed-loop traits react to changes in the state. Using the expression for the Hamiltonian in Eq. (25), the expressions for direct and indirect effects in Eqs. (23)-(24), and noting that @d t; x t ð Þ ð Þ =@x t ð Þ ¼ 0, the trait feedback effect can be further expanded as where the derivatives @d t; x t ð Þ ð Þ =@x t ð Þ and @d t; x t ð Þ ð Þ =@x t ð Þ give the trait sensitivities of the focal individual and its average neighbour, respectively, at time t to changes in focal's state variable x t ð Þ. Hence, the feedback effect of state change is equal to the trait sensitivities of all individuals in the group weighted by their effects on the focal's fitness (the latter are effectively the direct and indirect fitness effects).
We now make three observations about Eqs. (34)-(35). First, trait sensitivities result in inter-temporal feedbacks in trait expressions. We can see this by first observing that current trait expression affects changes in state variables (by way of the second line of Eq. (29)) which affect future fitness (measured by the shadow value). In turn, the dynamics of shadow value takes into account that closed-loop traits respond to changes in state variables (by way of the second line of Eq. (35)). That is, the shadow value takes into account the effects of current trait expression on future trait expression. Hence, under closed-loop control, the current trait expression of one individual is linked to future trait expression of itself and other individuals in its group. Second, the sign of the feedback effect of state change (sign of Eq. (35)) determines the direction of the effect of trait sensitivities on the shadow values. This means that the sign of the feedback effect balances the trade-off between current and future (state-modulated) fitness effects (by way of Eq. (29)). For positive feedback effect, trait sensitivity increases future (inclusive) fitness gains (second line of Eq. (29)), while for negative feedback effect, trait sensitivity decreases future (inclusive) fitness gains. Third, the shadow value dynamics given by Eqs. (34)-(35) is different from that in classical results from dynamic game theory (first developed by , ), where the feedback effect through the focal's own trait variation does not appear due to the absence of interactions between relatives, whereby Àc t; u Ã ð Þ¼0 at s t; u Ã ð Þ¼ 0. By contrast, in our model with interactions between relatives one has Àc t; u Ã ð Þþr u Ã ð Þb t; u Ã ð Þ¼0 at s t; u Ã ð Þ¼0. Thus, we recover the classical result for the feedback effect from dynamic game theory when r u Ã ð Þ ¼ 0. We now consider three scenarios (which are relevant for biology) under which the feedback term (given by Eq. (35)) that describes the dynamics of k Ã t; x Ã t ð Þ ð Þvanishes (similar arguments also hold for the feedback term k Ã t; x Ã t ð Þ ð Þand recall that we do not need to consider the dynamics k Ã t; Þdoes not affect the selection gradient). That is, we consider scenarios for which Eq. (32) is a system of ODE's (for compo- Þ ) and therefore solving a PDE (28) for Þis not necessary to determine the candidate uninvadable trait. These three scenarios are as follows.

First-order conditions for open-loop controls
We now focus specifically on open-loop controls by pointing out the simplifications that arise when the decision rule depends only on time: As we showed in the previous section, for open-loop controls the state-feedback term in Eq. (35) for the dynamics of the shadow value , which implies that Eq. (32) is a system of ODE's). Hence, we can characterise the necessary condition for an open-loop control paths as follows. Open Þ . The candidate uninvadable control path u Ã ¼ d Ã has to necessarily satisfy Eq. (12), and the shadow values satisfy This result is Pontryagin's weak principle for interactions between relatives (since only small mutant deviations are considered, Speyer and Jacobson (2010), p. 74) and only requires consideration of the shadow value. It has been derived previously (Day and Taylor, 1997;Day and Taylor, 2000), for a slightly less general model, where individuals locally play the field (fitness only depends on the traits of individuals in the focal group) or interact in a pairwise way (see also Day and Taylor, 1998;Wild, 2011 for related work). Yet this result covers both group-structured and panmictic populations and thus covers the first-order condition result of Metz et al. (2016) as well as those of classical life-history models (e.g., Perrin and Sibly, 1993 for a review). We here re-derived this result as a corollary of the closed-loop result of the previous section when the feedback-terms describing the dynamics of the shadow value k t ð Þ vanish (i.e, Eq. (35) vanishes). Hence, we closed the loop between the selection gradient on function-valued traits, invasion implies substitution, Hamilton's rule, dynamic programming, and Pontryagin's (weak) maximum principle.

Special case controls
We now work out further simplifications that arises in the characterisation of the first-order condition when more specific biological assumptions are made.

Stationary controls
For instance, under several biological situations, the individual fitness function (Eq. (13)) can be expressed in the simpler form Here, the time horizon is large (t f ! 1), there is no scrap value (U x t f ð Þ ð Þ¼0), l is a constant mortality (or discount) rate so that e Àlt can be interpreted as the probability of survival until time t, Þis the current rate of fitness increase at time t, which is assumed to not explicitly dependent on the time variable t (in game theory, fitness functions like Eq. (40) cover the so-called infinite-horizon autonomous differential games, Dockner et al., 2000;Weber, 2011).
The relevant feature of this setting is that, conditional on reaching a certain time (or age), the remaining fitness is the same as the original fitness; future fitness is thus time invariant. This can be seen more explicitly by considering the reproductive value. Indeed, conditional on reaching time t, which occurs with probability e Àlt , the conditional reproductive value starting at t-the current-value reproductive value-is given bỹ and this is Fisher's reproductive value under the present assumptions (e.g., Goodman, 1982;Charlesworth, 1994). It takes exactly the same functional form starting at any time t 2 0; 1 ½ Þ (in gametheory terms it is said that ''fundamentals of the game do not change over time" Dockner et al., 2000, p. 97). Owing to this time invariance, trait expression could be taken to be time invariant as well and we now let the control take the following form which we call a (closed-loop) stationary control (called a stationary feedback control or a stationary Markov strategy in game theory, e.g., Dockner et al., 2000;Weber, 2011 This means that given some fixed initial value x t ð Þ ¼ x init , the current-value reproductive value is the same regardless of the time the process is started. So any variation in the initial value has a time-independent effect onṽ t; x t ð Þ; u ð Þ , which, more formally owing to Eq. (43) satisfies Þis the current-value shadow value that depends on time only indirectly through x t ð Þ. This feature of stationary controls allows to markedly simplify their first-order characterisation (as shown by many examples of the game theory literature e.g., Dockner et al., 2000;Weber, 2011, p. 97). In order to carry out this characterisation explicitly it is useful to use the notion of current-value Hamiltonian (Weber, 2011, p. 111), which in our setting is defined as Applying Eqs. (41)-(45) to Eqs. (26)-(28) establishes the necessary condition for uninvadability of stationary controls as follows.

Stationary
Þcan be written for all t 2 T as Here, Eq. (48) follows from taking the (partial) time derivative of (first two equalities of) Eq. (41) and using Eq. (44) to obtain @ṽ x t ð Þ; u ð Þ =@t ¼ 0 and then using Eqs. (28) and (45). Since, 1=l is the average length of an interaction (or lifespan), Eq. (48) says that total expected fitnessṽ Ã is equal to the expected fitness gain e H from current phenotypic expression (a fitness flow) multiplied by the average length of time that fitness gain can accrue (and this holds for each state). By contrast to Eq. (28), Eq. (48) does not involve any partial derivatives with respect to time. Furthermore, if Þ is also one-dimensional, and in this case Eq. (48) is an ordinary differential equation. Hence, the analysis of stationary controls is generally more tractable.

Constant controls
We finally turn to a type of control that that is not analysed in the optimal control theory literature, yet is relevant to evolutionary biology. This is the case of constant control u t ð Þ ¼ u c 2 R for all t 2 T , where the subscript c emphasises that the control is independent of time. While constant controls are essentially scalar traits, these traits may nevertheless affect the dynamics of state variables that in turn affect fitness, and this makes the analysis of the necessary first-condition for uninvadability to involve time-dependencies. We finally provide this characterisation as a corollary of the ''Open-loop control result" result as follows.
Constant control result. Let u Ã c 2 R be a candidate uninvadable constant control (a scalar trait) with associated state path x Ã ¼ x Ã t ð Þ f g t2T and shadow value k Ã t ð Þ. The candidate uninvadable control u Ã has to satisfy Eq. (10) with the (scalar) selection coefficient and the shadow values satisfy The key difference between this result and the ''Open-loop control result" is Eq. (49), which says that the selection coefficient depends on the derivatives of the Hamiltonian integrated over the interaction period T . This follows directly from the fact that the control is a scalar and a mutant schedule is deviated in the same direction at all time points (hence there are no differences in point-wise deviations). Rather surprisingly, this or closely related results do not seem to have appeared previously in the optimal control nor evolutionary biology literature, even for the special cases of no interactions between individuals. However, this result may be useful for example in connecting our approach with various previous models (see Discussion and Box 1 for a concrete example).

Biological scenario
We here present an application of our results to the production of common pool resource that returns fitness benefits to all group members but is produced at a personal fitness cost. The evolving trait we consider is the schedule u ¼ u t ð Þ f g t2T of effort invested into resource production during an interaction period T 2 0; t f ½ and we will hereinafter refer to the control u t ð Þ as the production effort at time t. Let u t ð Þ; u t ð Þ; u t ð Þ denote the production efforts at time t of the focal individual, average neighbour in the focal group, and average individual (from other groups) in the population, respectively. Let x c t ð Þ and x t ð Þ be the resource level at time t (the total amount of resources produced) in the focal group and the average group in the population, respectively. Note that here we have a common state variable x c t ð Þ ¼ x t ð Þ ¼ x t ð Þ between individuals in the same group and individuals interact through the state only locally (resources produced in other groups does not directly affect the fitness of individuals in the focal group).
We study the evolution of the production effort under two different scenarios: (i) individuals can adjust their production effort according to the (local) resource level x c t ð Þ (closed-loop control) and (ii) individuals are committed to a fixed schedule of production effort (open-loop control). One difficulty in analysing the evolution of such traits is that limited genetic mixing generates relatedness between group members but also competition between them, which leads to kin competition (e.g., Taylor, 1988;Frank, 1998;Rousset, 2004;Van Cleve, 2015). Since we want to highlights the key effects of the evolution of open-loop and closed-loop controls in the context of interactions between relatives in a simple way, we want to avoid the complexities brought by kin competition and thus assume implicitly a life cycle that entails no kin competition and that relatedness is independent of the control r u ð Þ ¼ r. In particular, we assume that individual fitness takes the form which depends on the resource level x c t f ð Þ in the group at the end of interaction period t f and on the accumulated (personal) cost of producing the common resource for the group (the second term in Eq. (52)). Here c effort is a parameter scaling the personal cost. The resource level x c t f ð Þ and c effort are measured in units of the number of offspring to the focal individual and scaled such that they inherently take into account the proportional effect of density-dependent competition (proportional scaling of fitness does not affect the direction of selection). We assume that x c t ð Þ depends on the total amount of production effort that individuals in the focal's group invest into producing it and that the return from this effort decreases exponentially with the current level of resource where the parameter a > 0 is the efficiency of producing the common resource. We now make two observation about this model. First, neglecting the effects of kin competition in Eq. (52) does not lead to any loss of generality in our forthcoming analysis, since taking kin competition into account would only affect the final results by re-scaling the value of relatedness (e.g., Van Cleve, 2015;Mullon et al., 2016). Second, the mathematical properties and thus the structure of the game embodied in Eqs. (52)-(53) are equivalent to those of the parental investment game model of Ewald et al. (2007) from which we took inspiration. However, biologically Eqs. (52)-(53) have a different interpretation, since our model considers interactions between relatives (Ewald et al., 2007 considers pair-wise interactions of non-relatives) and we will compare open-loop and closed-loop controls (Ewald et al., 2007 compares static traits with closed-loop traits). The forthcoming analysis will also conceptually depart from that of Ewald et al. (2007), because it will be based on the (neutral) reproductive value function (28), while their analysis is based on solving the Hamilton-Jacobi-Bellman equation for the optimal value function (see Ewald et al., 2007Ewald et al., , p. 1456 with which the present n-player scenario cannot be obtained as a trivial extension of the approach used in Ewald et al. (2007).

Static characterisation of the production effort
From Eq. (52), the reproductive value (Eq. (19)) for this model is We denote the corresponding shadow value with Eq. (52) also entails that the fitness components f and U (as defined in Eq. (13)) take the while the rate of change of the state variable x c t ð Þ is given by On substituting Eqs. (55)-(56) into the Hamiltonian (25) produces which gives the direct fitness effect Þ . Hence, the balance condition (29) for this model reads This says that the net effect on accumulated personal cost due to spending effort to produce a unit resource must balance out the inclusive fitness benefit associated with that unit resource. Solving Eq. (60) for u Ã t ð Þ yields scales the benefit to cost ratio of producing the resource, note also that c > 0. Eq. (61) says that (candidate uninvadable) production effort u Ã t ð Þ decreases exponentially with the resource level x Ã t ð Þ, increases linearly with the shadow value and relatedness, and is not directly dependent on time. This general nature of the solution applies to both open-loop and closed-loop controls and is depicted in Fig. 2.

Closed-loop production effort
We now turn to analyse u Ã t ð Þ explicitly as a function of time when the control is the closed-loop To that end, we evaluate the dynamic Eq. (53) for x c t ð Þ along u ¼ u Ã and x c ¼ x Ã and substituting the expression for u Ã t ð Þ from Eq. (61), whereby Substituting Eqs. (57) and (61) into Eq. (28) and simplifying yields (61)) of the candidate uninvadable production effort as a function of resource Using the method of characteristics, Ewald et al. (2007) showed that the partial differential Eq. (64) has the following solution Substituting this into the static characterisation Eq. (61) shows that where the state variables is the solution of This system has one real-valued solution and substituting this solution back into Eq. (61) produces which turns out to be constant in time.

Comparison between closed-loop and open-loop production efforts
This example illustrates our generic result that in a population of clonal groups (r ¼ 1) closed-loop and open-loop equilibria coincide (Fig. 3). In a population of non-clonal groups (r < 1) the production effort u Ã t ð Þ and the resulting amount of resource x Ã t ð Þ tend to be lower under the closed-loop equilibrium than under the open-loop equilibrium (Fig. 3). Overall, the production effort monotonically increases over time for the closed-loop control and stays constant under the open-loop control (Fig. 3).
The difference between closed-loop and open-loop (non-clonal) equilibria arises from the difference in the shadow value dynamics (recall Section 3.2.2). We find that the shadow value is lower under closed-loop control than under open-loop control when r < 1 (Fig. 4). This is so because of the feedback effect of state change (Eq. (35)), which for our example is Since this is negative, the shadow value declines faster backwards in time than under the closed-loop equilibrium. In order to understand why the feedback effect is negative, we need to consider the signs of b t; u Ã ð ÞÀc t; u Ã ð Þand the trait sensitivity @d t; x c t ð Þ ð Þ =@x c t ð Þ (which is here the same for all group members). The term b t; u Ã ð ÞÀc t; u Ã ð Þis positive when groups are non-clonal and zero when they are clonal (Fig. 5, panel a). This means that if everyone in the group produces more of the resource, then the focal's fitness increases under the non-clonal equilibrium and is unaffected under the clonal equilibrium. The trait sensitivity is always negative (Fig. 5, panel b). Hence, individuals will reduce their production effort in response to an increase in the resource level and the magnitude of this effect is larger for higher values of relatedness.
In conclusion, investment effort is lower for closed-loop traits than for open-loop traits in a population of non-clonal groups (r < 1), because closed-loop trait expression takes into account that other individuals will reduce their production effort in response to the focal individual increasing its production effort. In a population of clonal groups where the focal individual increases its trait expression, the response from other individuals will not affect the fitness returns to the focal. For open-loop controls, the response from other individuals can never affect the fitness returns, because open-loop control trajectories are predetermined at birth (full commitment to control trajectory) and trait expression can not be adjusted in response to changes in the resource level. For clonal groups, trait sensitivity to changes in the resource level does not alter individual behaviour, because everyone's interests in the group are aligned.

Biological scenario
We finally present an application for a stationary (closed-loop) control, which involves the same demographic assumption as the previous example but with fitness given by Eq. (40). And we now let u t ð Þ; u t ð Þ; u t ð Þ denote the extraction rates of a given resource with abundance x c t ð Þ at time t in the focal group, which is again taken as a common state variable among group members ( . This resource is assumed to follow the dynamics and is thus depleted at a rate given by the sum of the extraction rates within the focal's group. We assume that the resources extracted by the focal individual translates into current-value fitness according tof where the parameter a > 0 is the efficiency of turning extracted resources into producing offspring and r 2 0; 1 ð Þ is a parameter shaping the concavity of the production function, which thus exhibits diminishing returns (we used Eq. (75) instead of the perhaps biologically more realistic Holling type 2 functional response for ease of calculations). The model defined by Eqs. (74)-(75) recasts into the homogeneous island model the standard baseline resource extraction game of environmental economics (Dockner et al., 2000, chapter 12.1;Weber, 2011, chapter 4.3).

Static characterisation of the extraction rate
On substituting Eqs. (74)-(75) into the current-value Hamiltonian (45) and considering only a current-value shadow valuẽ This yields the direct fitness effect For this model, the balance condition reads This says that the net current personal benefit of a unit extracted resource must balance out the future inclusive fitness cost resulting from depleting that unit resource, and yields the equilibrium extraction rate

Stationary (closed-loop) extraction rate
Let us now analyse u Ã t ð Þ explicitly as a function of time when the control is stationary is a constant that we assume to be positive (which thus requires that 1 þ r N À 1 ð Þ> N 1 À r ð Þ ). Following Dockner et al. (2000), p. 325 and arguments therein, the solution to Eq. (81) under the given constraints is Owing to the fact thatk c Ã x Ã t ð Þ ð Þ¼ l=c 1 ð Þ Àr x Ã t ð Þ=1 À r ð Þ Àr , the stationary control (80) written explicitly as a function of time becomes can be interpreted as the equilibrium extraction rate of a unit resource by an individual, since x Ã t ð Þ such units are available at t. Substituting Eq. (84) into Eq. (74) and solving the latter gives the candidate uninvadable control and state path explicitly as When r ¼ 0, we thus recover the standard result for the stationary control established in the game theory literature [the symmetric stationary Markov Nash equilibrium (Dockner et al., 2000, Eq. (12.38)), while when r ¼ 1 we recover the game theory cooperative solution (Dockner et al., 2000, Eq. (12.7)).

Open-loop extraction rate
We now work out an open-loop control for this model and to this end it is useful to express the static characterisation (80) in terms of the original shadow value k Ã c t ð Þ (not the current value shadow valuek c Ã t ð Þ), which, on recalling thatk c Ã t ð Þ ¼ e tl k Ã c t ð Þ, yields Note that _ k Ã c t ð Þ ¼ 0 in Eq. (39) because the partial derivative of the current-value Hamiltonian e H with respect to the second argument is zero (by way of Eq. (76)) and recall also that e H ¼ e tl H (hence the partial derivative of Hamiltonian with respect to its third argument is also zero). From this it follows that k Ã c t ð Þ ¼ K for some constant K > 0 (positive since otherwise no resources will be extracted). Inserting this into Eq. (87) and then substituting the resulting expression for u Ã t ð Þ into the state dynamic (74) gives whose solution is We now seek an open-loop solution u Ã t ð Þ that satisfies lim t!1 That is, we assume that all the resources will be depleted asymptotically. In game-theory terms, this corresponds to the case of a strictly feasible open-loop control (see Dockner et al., 2000, p. 319-320), which implies that we do not allow for individuals to extract all the resources in finite time (see Dockner et al., 2000, p. 321-323 for alternative open-loop controls when this is allowed). Now substituting Eq. (89) into Eq. (90) and taking the limit yields that K ¼ a Nr Þ . In turn, using the value of this constant in Eq. (87) and Eq. (89) allows us to determine explicitly the candidate equilibrium control and state paths as [Note that formally this solution also satisfies open-loop solution with a terminal condition lim t!1 x Ã t ð Þ P 0, which implies a transversality condition given by Note 3 by Sydsaeter et al. (2005), p. 349 and were their x 1 ¼ 0 and k t ð Þ ¼ e tl k Ã t ð Þ > 0 for our problem, which leads together with Note 4 that the terminal condition lim t!1 x Ã t ð Þ P 0 simplifies to (90)].

Comparison between closed-loop and open-loop resource extraction rate
Resource extraction crucially depends on relatedness r under the (stationary) closed-loop control and resources are extracted much faster when relatedness is low (see Fig. 6). Interestingly, under open-loop control resource extraction is independent of relatedness and corresponds, regardless of population structure, exactly to the open-loop control established for this model in the game theory literature (Dockner et al., 2000, Chapter 12.1), which itself corresponds to the so-called ''cooperative solution" maximising group payoff (Dockner et al., 2000, Theorem 12.1, p. 320). Hence, relatedness can have no impact on slowing down the extraction rate, which already takes the optimal path from the perspective of the group. Intuitively, one thus expects that the closedloop equilibrium coincides with the open-loop equilibrium when groups are clonal (r ¼ 1), which is indeed the case and again instantiates our broad result that state-feedback has no effect when interacting individuals are clonal.

Discussion
We formalised the directional selection coefficient on a genetically determined function-valued trait when interactions occur between individuals in a group-structured population subject to limited genetic mixing. This selection coefficient describes the directional evolution of a quantitative function-valued trait and determines three relevant evolutionary features. First, it gives the invasion condition of a mutant allele coding for a multidimensional phenotypic deviation (the deviation of a whole function) of small magnitude and takes the form of Hamilton's marginal rule Àc þ rb > 0, where the marginal direct fitness effect Àc and the marginal indirect fitness effect b are given by directional derivatives (formally Gâteaux derivatives). Second, the selection gradient is frequencyindependent (same for all allele frequencies) and thus underlies gradual evolution of function-valued traits, since Àc þ rb > 0 implies not only initial invasion of the mutant function-valued deviation, but also substitution of the resident ancestral type in the population. Finally, the stationary selection gradient (i.e. when Àc þ rb ¼ 0) gives the necessary first-order condition for uninvadability and allows to characterise long-term evolutionary outcomes. While these three features are well known to hold for scalar traits (e.g., Rousset, 2004;Lehmann and Rousset, 2014;Van Cleve, 2015), our derivation of Hamilton's marginal rule for multidimensional traits generalises them to traits of arbitrary complexity. Connecting Hamilton's marginal rule with optimal control and differential game theory, we developed an approach to characterise the necessary first-order condition for the uninvadability of dynamic traits, which applies to both open-loop controls, whose expression is only time-dependent, and closed-loop controls, whose expression is dependent on dynamic state variables as well. We showed that Hamilton's rule in this context can be decomposed into current inclusive fitness effect, on one side, and future inclusive fitness effect, on the other. The latter effect arises through changes in the states of interacting individuals and depends on the (neutral) shadow values of the states. The shadow value of a state measures how a current change in that state variable affects all future fitness contributions in a population at a demographic equilibrium in the absence of selection, which is given by the neutral current-value reproductive value (or residual fitness). The shadow values are thus central in balancing the trade-off between current and future fitness effects and thus in shaping inter-temporal tradeoffs; a feature well-know for open-loop controls (e.g.,  for a review) and that we showed applies equally well to closed-loop traits, which is a result that seems to have neither appeared previously in the differential game theory literature.
Open-loop and closed-loop trait characterisations have sometimes been used in the literature to analyse the same biological phenomena. Our analysis allows for a direct comparison between these two different modes of trait expression. While the selection coefficient takes the same form (Hamilton's rule) for both, the dynamic constraints are different and this is captured by differences in the dynamics of the shadow value. For open-loop traits, the shadow value dynamics for a given state depends only on how variation in that state variable affects current fitness and state dynamics. For closed-loop traits, the shadow value dynamics depends additionally on the state feedback effect, which captures how a variation in the state variable brings forth variations in traits of all individuals in interaction and this in turn affects current fitness and state dynamics (Eq. (35)). This causes inter-dependencies between the states of different individuals and inter-temporal effects between trait expressions that are absent under open-loop controls. Analysis of the feedback effect leads to two insights about the role of statedependence of trait expression in shaping trait evolution. First, the sign of the feedback effect (the sign of Eq. (35)) determines if the shadow value is larger (for positive feedback effect) or smaller (for negative feedback effect) for closed-loop traits than for open-loop traits. Second, state-dependence of trait expression plays no role if there are no social interactions between individuals or interactions occur only between clones (r ¼ 1), in which cases the candidate uninvad-able open-loop and closed-loop trait expressions coincide (and the state-feedback term is zero). This means that the use of closedloop controls appearing in life-history models without interactions between individuals (e.g., Houston and McNamara, 1999) do not lead to different results if instead an open-loop representation of traits would have been used.
We worked out two examples to illustrate these concepts, one of common pool resource production and the other of common pool resource extraction. Interactions under closed-loop trait expression cause individuals to invest less into common pool resource production (and more in extraction) at an evolutionary equilibrium. These results are in line with the literature of economic game theory, where ''period commitments" lead to higher levels of cooperation in common pool situations (Meinhardt, 2012). Indeed, an open-loop control can be viewed as a trait (or strategy) committing to its expression over the entire interaction time period, since it cannot be altered in response to some change experienced by individuals. A closed-loop control has no commitment over time since it is expressed conditional on state. Depending on the nature of the interaction between individuals, closedloop trait expression can then also lead to higher levels of cooperation. A concrete example is an analysis of the repeated prisoner's dilemma game, where open-loop controls lead to defection, while closed-loop controls can sustain cooperation (e.g., Weber, 2011). The reason why closed-loop strategies are able to sustain cooperation under repeated prisoner's dilemma is that they allow to condition trait expression on the actions of others, and thus take into account the future threat of punishment. In other words, for closed-loop traits current actions are linked to future ones. Hence, only closed-loop strategies can sustain the reciprocity principle of repeated games by giving rise to incentives that differ fundamentally from those of unconditional trait expression (see Binmore, 2020, p. 87 for a characterisation of this principle).
We finally discuss the scope and limitations of our formalisation. First, concerning scopes, we focused explicitly on the two main types of controls, but also worked out the simplifications that arise under special cases of these controls. In particular, under (closed-loop) stationary controls, where trait expression depends only on state and not on time and the time horizon of the interaction period becomes large, the PDE for the (current-value) reproductive value function no longer depends on time and becomes easier to solve (see ''Stationary control result"). Stationary controls have been considered in evolutionary biology and conservation biology, e.g. to model foraging strategies (McNamara et al., 1991;Mangel and Clark, 1988), web-building behaviour (Venner et al., 2006), adaptive management plans (Chadès et al., 2017). Finally, under (open-loop) constant controls, where trait expression depends neither on state nor on time, the ODEs' for the shadow values and state dynamics become autonomous, which makes it the simpler case to analyse (see ''Constant control result"). Several concrete biological situations fall into this category. For instance, neural networks are dynamical systems whose output is controlled by a finite number of scalar weights (Haykin, 2009), the selection on which is an example of a situation with constant control if weights are taken to be traits evolving genetically (see Ezoe and Iwasa, 1997 for an application to evolutionary biology). Likewise, phenomena as different as gene expression profiles and learning during an individual's lifespan can be regarded as the outcomes of dynamical systems controlled by a finite number of constant traits (e.g. see respectively Alon, 2020;Dridi and Akçay, 2018). The case of constant control is thus likely to be widespread in models in evolutionary biology and it would be interesting to work out more explicitly the connection between models for the evolution of learning and those based on control theory. Owing to the presence of both a state dynamics and a decision rule, the control theory approach to trait expression has a universal character (Haykin, 2009, chapter 15), which should thus in principle be able to cover all types of trait expression. Concerning limitations, we considered deterministic state dynamics but stochastic state dynamics may be interesting to consider in the future by way of applying stochastic optimal control theory (e.g. Kamien and Schwartz, 2012). Perhaps more importantly, we modelled a population reproducing in discrete time, where within each time period individuals can interact for a fixed time interval. As such, the vital rates of individuals can change over this interaction period, but not between interaction periods. Hence, our model with limited dispersal and time-dependent vital rates applies to semelparous species, which covers models with conflict between relatives in annual organisms (Day and Taylor, 2000;Avila et al., 2019). Furthermore, if we allow for complete dispersal between groups (r ¼ 0), then our framework can be used to address the evolution of function-valued traits under overlapping generations with time-dependent vital rates as in continuous time classical life history models with and without social interactions (e.g. León, 1976;Schaffer, 1983;Stearns, 1992;Perrin, 1992), but we add the possibility of considering the evolution of closed-loop controls. This scenario is encapsulated in our formalisation because the individual fitness function we used to analyse dynamic trait evolution (Eq. (13)) takes the same functional form as the basic reproductive number in age-structured populations (and which is sign equivalent to the Malthusian or the geometric growth rate, e.g., Karlin and Taylor, 1981, pp. 423-424). As such, our results on closed-loop controls (Section 3.2) allow to characterise long-term evolutionary outcomes when the fitness of an individual takes the form of the basic reproductive number. For this situation, our results for open-loop controls (Section 3.3) reduce to the standard Pontryagin's weak principle used in life-history models (e.g. Schaffer, 1983;Stearns, 1992;Perrin, 1992). In order to cover time-dependent vital rates with overlapping generations within groups under limited dispersal, one needs to track the withingroup age structure (e.g. Ronce et al., 2000), which calls for an extension of our formalisation. Finally, we did not consider between-generation fluctuations in environmental conditions, which certainly affect the evolution of function-valued traits and it would be interesting to investigate this case. Hence, while our results are not demographically general, our hope is that the present formalisation is nevertheless helpful in providing broad intuition about the nature and conceptualisation of directional selection on phenotypically plastic traits.
Box 1: Hamilton's indicators of the force of selection as selection on constant controls.
Assume that individual fitness takes the form of the basic where controls are constant (u ; u ; u 2 R),f u t ð Þ; x t ð Þ ð Þis the fecundity at age t and l t ð Þ is the probability of surviving until age t treated as an additional state variable satisfying the Þis the mortality rate of the focal individual. Then, it follows from our Constant Control Result that under these assumptions, the selection coefficient on a constant control that acts only at time t can be written as with Hamiltonian Here, k l t ð Þ ¼ @v t ð Þ=@l t ð Þ ¼ṽ t ð Þ is the shadow value of the cumulative survival probability evaluated at the resident population, which is precisely the neutral current-value reproductive value; namely, Fisher's reproductive value, since by definitionṽ t where l t ð Þ is evaluated at the resident population (see also Goodman, 1982;Perrin, 1992 for making this connection and Perrin, 1992; for writing the Hamiltonian as in Eq. (93) by treating l t ð Þ as a distinguished state variable). Assuming no interactions between relatives and that the control affects only fecundity and survival, Eq. (92) becomes This is consistent with Hamilton's indicators of the force of selection (Hamilton, 1966). Indeed, for a modifier of fecundity, the first term in Eq. (94) is consistent to that of Hamilton's analysis (e.g., Ronce and Promislow, 2010, Eq. (3.3)). For a modifier of survival, the second term in Eq. (94) is consistent that of Hamilton's analysis (e.g., Ronce and Promislow, 2010, Eq. (3.2), where l t ð Þ ¼ À log p t ð Þ ð Þ and p t ð Þ is the survival probability at t and their generation time T is a scaling factor that coverts the first-order perturbation of the basic reproductive number into that of the geometric growth rate, and explains why T does not appear in (94) but appears in previous formulations of Hamilton's indicators of the force of selection on agespecific modifier traits derived from the geometric growth rate of a mutant, e.g. Charlesworth, 1994, chapter 5.1). Finally, we note that in the presence of interaction between relatives, Eq. (92) is consistent with results of kin selection in age-structured populations (e.g. Charlesworth, 1994, chapter 5.3.5). Note that the vital rates in Eq. (94) can depend in a non-linear way on phenotypic effects and may be time-dependent since they depend on state.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements
We thank Charles Mullon for helpful comments on the manuscript.

Appendix A. Derivation of Hamilton's rule for function-valued traits
In this Appendix, we prove the gradient version of Hamilton's rule for function-valued traits and show that this provides an invasion implies substitution principle under weak selection (Eqs. (4)-(8)). A central concept used in our proof is the notion of a Gâteaux derivative.

A.1. Gâteaux derivative and point-wise functional derivative
Let F : Y T ½ ! R be some functional where Y T ½ is a (topological) vector space over a domain T and assume that for some y 2 Y T ½ the limit exists for all deviations g that satisfy y þ g 2 Y T ½ for a sufficiently small non-negative parameter . Then, the function F is said to be Gâteaux differentiable at y, and d g F y ð Þ=d g y is the shorthand notation for a Gâteaux derivative at y in the direction of g (Hille and Phillips, 1957, Section 3). The Gâteaux derivative can thus be thought of as a generalisation of the directional derivative familiar from finite dimensional spaces. Most rules that hold for ordinary derivatives also hold for Gâteaux derivatives, e.g. Taylor's theorem and the chain rule (e.g. see Section 2.1C in Berger, 1977, or, Appendix A of Engel andDreizler, 2013). The Gâteaux derivative can be expressed in terms of point variations (e.g. see Engel and Dreizler, 2013, Eqs. (A.15) and (A.28)) as is the point-wise functional derivative of F with respect to y t ð Þ and being the value of the function at s (d t s ð Þ ¼ 0 for all st). That is, Eq. (A.3) is the partial derivative of F with respect to y t ð Þ and hence we use the more familiar ''partial derivative" notation from finite dimensional spaces for it. The representation in Eqs. (A.2)-(A.3) is useful because it allows, for instance, to take a functional derivative of fitness with respect to the trait, and partition it into a deviation g t ð Þ and a marginal fitness effect at a specific (single) time point t 2 T ; @F y ð Þ=@y t ð Þ (i.e. a point-wise marginal fitness effect), and only then integrate over the domain T .

A.2. Dynamics of mutant-frequency
Consider that the mutant allele coding for trait u m and the resident coding for trait u, segregate in the homogeneous island population as described in the main text. Because no individual-level demographic heterogeneity is assumed withing groups (i.e., no class structure), each group can be characterised, from a population genetic state perspective, by the number of mutants that inhabit a given group and we denote the set of all group genetic states with I ¼ 0; 1; 2; . . . ; N f g . The state of the entire homogeneous island population can thus be described with the vector / s ¼ / i;s È É i2I where / i;s is the frequency of groups with i mutants at demographic time s. Since population size is constant in the homogeneous island population (mean fitness is one), the change in the average frequency Dp s ¼ p sþ1 À p s of the mutant allele from demographic time s to s þ 1 (over one life-cycle iteration) can be expressed as Dp s ¼ W u m ; u; / s ð Þ p s À p s ; ðA:4Þ where W u m ; u; / s ð Þis the marginal fitness (or lineage fitness) of the mutant allele. Namely, this is the expected number of offspring (including the surviving self) produced by a randomly sampled mutant individual from the collection of all mutants in the population when the distribution of mutants across groups is / s . This fitness can be written as the average W u m ; u; q s ð Þ¼ X i2Iŵ u m ; u i ; u / À Á q i;s ; ðA:5Þ where q i;s ¼ i/ i;s = P k2I k/ k;s is the probability that a randomly sampled mutant resides in a group with i mutants (whence P i q i;s ¼ 1) and where u i ¼ u m ; u; i À 1 ð Þand u / ¼ u m ; u; / s ð Þare vectors that describe, from the perspective of a mutant sampled in a group with i mutants, the distribution of traits among group neighbours (local individuals) and in the groups in the population at large (non-local individuals), respectively. The functionŵ : U Â U 2 Â IÂ U 2 Â D I ð Þ ! R þ is the individual fitness where D I ð Þ denotes the space of frequency distributions on I (i.e. the simplex in R Nþ1 ), and as such,ŵ u m ; u i ; u / À Á gives the fitness of a mutant when among its neighbours i À 1 individuals have trait u m and N À i À 1 ð Þ have trait u, and in the groups in the population at large, mutant and resident traits follow the / s distribution. When the mutant is rare, Eq. (A.5) reduces to the invasion fitness of the mutant allele in the homogeneous island population (Mullon et al., 2016, Eq. (1)).

A.3. Weak-selection approximation
We now study mutant gene frequency change Dp s assuming small (but where 0 P p s 6 1). To that end, it is useful to note that the fitness of a mutant in a group in state i can be approximated by writing it in terms of average traits aŝ w u m ; u i ; u / À Á ¼ŵ u m ; Þ specifies that all group neighbours have the same group average trait Þ= N À 1 ð Þ ½ being the frequency of mutants among neighbours, while u / ¼ u; u; / s ð Þspecifies that all non-local individuals have the same average population trait u ¼ u þ gp s with p s ¼ P i2I i=N ð Þq i;s being the average mutant frequency in the population. Eq. (A.6), which has been used for scalar traits (Rousset, 2004, p. 95), follows by Taylor expandingŵ u m ; u i ; u / À Á to the first-order about ¼ 0 and using the chain rule (which applies to Gâteaux derivatives, e.g. Eq. (A.38) in Engel and Dreizler, 2013) to see that the coefficients of the Taylor series involve (at most) Gâteaux derivatives weighted by average allele frequencies. This is an instantiation of the so-called generalised law of mass action (Meszéna et al., 2005;Dercole, 2016) and is secured by the assumption that all individuals within a group that have the same trait are exchangeable (individuals are demographically homogeneous). Because all non-local (mutant and resident) individuals are considered to have the same average trait (the same is true for group neighbours),ŵ u m ; u i ; u / À Á is de facto independent of / s . This allows us to further simplify the right-hand side of Eq. (A.6) by writinĝ w u m ; u i ; u / À Á ¼ w u m ; u i ; u ð Þ ; ðA:7Þ where the function w : U 3 ! R þ is the (average) fitness function introduced in Section 2 of the main text, where we do not need to detail mutant distributions. Hence, w u m ; u i ; u ð Þis the fitness of an individual with trait u m in terms of only the average trait u i of its group-neighbours and the average trait u of individuals in the population.  where all partial derivatives here and henceforth are evaluated at the resident value u. Since all the partial derivatives are independent of any allele frequency, they give the effects on any individual's fitness stemming, respectively, from itself, its average neighbour, and an average population member by varying (infinitesimally) trait expression. Hence, the type of the actor is not relevant when evaluating the fitness effects and we can equivalently write Eq. (A.11) as dw u m ; u i ; u ð Þ d where we replaced the variables u m ; u i , and u with u , u , and u (note that we have already substituted the resident trait into the final argument). This will be useful subsequently as it makes clear that fitness effects are independent of individual types and thus allows us to focus attention on the fitness of a focal individual.
Substituting Eq. (A.12) into Eq. (A.10) gives where we took into consideration that the sum of partial derivatives of the fitness function with respect to all of its arguments is zero (since population size is constant, see e.g. Rousset, 2004, p. 96 for scalar traits). Because P i2I p i;s q i;s ¼ p mjm;s is the probability that, conditional on being a mutant, a randomly sampled neighbour is also a mutant, and p mjm;s p s ¼ p mm;s is the probability that two randomly sampled individuals are both mutants (i.e., frequency of mutant pairs), Eq. (A.13) can be written Hence, to the first order in , the dynamics of Dp s is a function of only direct and indirect fitness effects evaluated in the resident population, and the average frequency p s and mutant-pair frequency p mm;s . Further, we only need to study the dynamics of p mm;s under neutrality ( ¼ 0) because any higher order terms contribute to O 2 À Á in Eq. (A.14). Eq. (A.14) thus generalises to functionvalued traits, a standard result for scalar traits (first detailed in Roze and Rousset, 2003 and re-derived a number of times since, e.g., Roze and Rousset, 2004;Rousset, 2004;Roze and Rousset, 2008;Lehmann and Rousset, 2014).

A.3.2. Mutant-pair dynamics and relatedness
Using standard population genetic arguments for writing recursions of moments of allelic state (e.g., Jacquard, 1974;Nagylaki, 1992;Roze and Rousset, 2008), we have p mm;sþ1 ¼ P 1 u ð Þp s þ P 2 u ð Þp mm;s þ 1 À P 1 u ð Þ À P 2 u ð Þ ð Þ p 2 s ; ðA:15Þ where P 1 u ð Þ is the fraction of pairs within groups (of two randomly sampled individuals in the same group without replacement) that descended from the same individual in the previous demographic time step (so that possibly one individual in the pair is the parent of the other in the presence of survival). The quantity P 2 u ð Þ is the fraction of pairs that have descended from two distinct individuals in the previous demographic time period, and where all these coefficients are constant since they are evaluated under ¼ 0 and thus depend at most on the resident trait u. The steady state can be solved explicitly and one getŝ p mm ¼r u ð Þp þ 1 Àr u ð Þ ð Þ p 2 ; ðA:16Þ where we used the fact that for ¼ 0 p s ¼ p is a constant, (see Section A.4 for a formal argument) and wherê r u ð Þ ¼ P 1 u ð Þ 1 À P 2 u ð Þ ðA:17Þ is the relatedness in a patch at the steady state, i.e., the fraction of pairs at the steady state that have a common ancestor in the patch. Owing to neutrality, this is also the probability that a randomly sampled neighbour of a randomly sampled focal individual, carries the same allele as the focal. Moreover, the steady stater u ð Þ changes continuously with a resident trait whenever P 1 u ð Þ and P 2 u ð Þ change continuously. where all the derivatives in the matrix are evaluated at