Topological constraints in early multicellularity favor reproductive division of labor

Reproductive division of labor (e.g. germ-soma specialization) is a hallmark of the evolution of multicellularity, signifying the emergence of a new type of individual and facilitating the evolution of increased organismal complexity. A large body of work from evolutionary biology, economics, and ecology has shown that specialization is beneficial when further division of labor produces an accelerating increase in absolute productivity (i.e. productivity is a convex function of specialization). Here we show that reproductive specialization is qualitatively different from classical models of resource sharing, and can evolve even when the benefits of specialization are saturating (i.e. productivity is a concave function of specialization). Through analytical theory and evolutionary individual-based simulations, we demonstrate that reproductive specialization is strongly favored in sparse networks of cellular interactions that reflect the morphology of early, simple multicellular organisms, highlighting the importance of restricted social interactions in the evolution of reproductive specialization.


Introduction
The evolution of multicellularity set the stage for unprecedented increases in organismal complexity (Szathmáry and Smith, 1995;Knoll, 2011). A key factor in the remarkable success of multicellular strategies is the ability to take advantage of within-organism specialization through cellular differentiation (Queller and Strassmann, 2009;Brunet and King, 2017;Cavalier-Smith, 2017). Reproductive specialization, which includes both the creation of a specialized germ line during ontogeny (as in animals and volvocine green algae) and functional differentiation into reproductive and non-reproductive tissues (as in plants, green and red macroalgae, and fungi), may be especially important (Cooper and West, 2018;Michod et al., 2006;Ispolatov et al., 2012;Solari et al., 2013;Michod, 2007;West et al., 2015). Reproductive specialization is an unambiguous indication that biological individuality rests firmly at the level of the multicellular organism (Michod, 1999;Folse and Roughgarden, 2010), and is thought to play an important role in spurring the evolution of further complexity by inhibiting within-organism (cell-level) evolution (Buss, 1988) and limiting reversion to unicellularity (Libby and Ratcliff, 2014). Despite the central importance of reproductive specialization, its origin and further evolution during the transition to multicellularity remain poorly understood (McShea, 2000).
The origin of specialization has long been of interest to evolutionary biologists, ecologists, and economists. A large body of theory from these fields shows that specialization pays off only when it increases total productivity, compared to the case where each individual simply produces what they need (Szathmáry and Smith, 1995; Smith and Szathmáry, 1997; Goldsby et al., 2012; …) … which viability can be shared across connected cells, but fecundity cannot be shared (note that, in order to test the sensitivity of our predictions to this assumption, in a later section we will consider the more general case in which viability and fecundity can both be shared, but by different amounts).
We consider a model of multicellular groups composed of clonal cells that each invest resources into viability and fecundity. Because there is no within-group genetic variation, within-group evolution is not possible, though selection can act on group-level fitness differences. Specifically, we consider the pattern of cellular investment in fecundity and viability, and their sharing of these resources with neighboring cells within the group, to be the result of a heritable developmental program. Thus, selection is able to act on the multicellular fitness consequences of different patterns of cellular behavior within the group. We let $v$ denote each cell's investment into viability, and $b$ denote each cell's investment into fecundity. Each cell's total investment is constrained so that $v + b = 1$. However, a cell's return on its investment is in general nonlinear. Here, we let $\alpha$ represent the 'return on investment exponent': by tuning $\alpha$ above and below 1.0, we can simulate conditions with accelerating and saturating (i.e. convex and concave, or super- and sub-linear) returns on investment, respectively. We let $\tilde{v}$ and $\tilde{b}$ represent a cell's return on viability and fecundity investments, respectively. Following Michod (2006) and Michod and Roze (1997), we calculate a cell's reproductive output as a multiplicative function of $\tilde{v}$ and $\tilde{b}$ (thus, both functions must be positive for a cell to grow). A single cell's reproduction rate is $w = \tilde{v}\tilde{b} = v^{\alpha} b^{\alpha}$. At the group level, fitness is the total contribution of all cells in the group toward the production of new groups (i.e. group level reproduction). The group level fitness is thus the sum of $\tilde{v}\tilde{b}$ over all cells.
Finally, cells may share the products of their investment in viability with other cells to whom they are connected. For a given group, the details of who may share with whom, and how much, are encoded in a weighted adjacency matrix $\mathbf{c}$. The element $c_{ij}$ defines what proportion of its viability returns cell $i$ shares with cell $j$. Cells cannot give away all of their viability returns, as they would no longer be viable; mathematically, we count a cell among its own neighbors and thus ensure that it always 'shares' a positive portion of its viability returns with itself, so that $c_{ii} > 0$. Furthermore, since a cell cannot share more viability returns than the total it possesses, we have $\sum_{i=1}^{N} c_{ji} = 1$ for a group of $N$ cells. For the networks we consider, each cell takes a fraction $\beta$ of its viability returns and shares that fraction equally among all of its $n_i$ neighbors (including itself), and keeps the remaining fraction $1 - \beta$ for itself. Therefore cell $i$ keeps a total fraction of $1 - \beta + \frac{\beta}{n_i}$ of its returns for itself and gives $\frac{\beta}{n_i}$ to each of its non-self neighbors. In other words, $c_{ii} = 1 - \beta + \frac{\beta}{n_i}$, $c_{ij} = \frac{\beta}{n_i}$ if cells $i$ and $j$ are connected, and $c_{ij} = 0$ if cells $i$ and $j$ are not connected. This means the total amount of returns kept by cell $i$ depends on both the network topology and $\beta$. When $\beta = 0$ there is no sharing, and when $\beta = 1$ cells share everything equally among all connections and themselves. We refer to $\beta$ as the interaction strength. A given group topology (unweighted adjacency matrix) and $\beta$ completely specify $\mathbf{c}$.
Within a group of $N$ cells, the overall return on viability that a given cell enjoys, then, comprises its own returns as well as whatever is shared with it by other members of the group. This can be written as $\tilde{v}_i = v_i^{\alpha} c_{ii} + \sum_{j \neq i} v_j^{\alpha} c_{ji}$, or equivalently, $\tilde{v}_i = \sum_{j=1}^{N} v_j^{\alpha} c_{ji}$. Note that this is a column sum, since it describes the total incoming viability returns a cell receives as a result of its own effort and trade with neighboring cells. Therefore, we write the group level reproduction rate (i.e. the group fitness) for a group of $N$ cells as

$$W = \sum_{i=1}^{N} \tilde{v}_i \tilde{b}_i = \sum_{i=1}^{N} \left( \sum_{j=1}^{N} v_j^{\alpha} c_{ji} \right) b_i^{\alpha} = \sum_{i=1}^{N} \left( \sum_{j=1}^{N} v_j^{\alpha} c_{ji} \right) (1 - v_i)^{\alpha},$$

where all three of the above expressions are equivalent. We investigate evolutionary outcomes under this definition of group level fitness for groups with different topologies (who shares with whom), and in scenarios with various return on investment exponents $\alpha$.
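The model definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: the function names (`sharing_matrix`, `group_fitness`, `complete_graph`) and the adjacency-set input format are assumptions made here, and the symbols follow the text (`alpha` for the return on investment exponent, `beta` for the interaction strength).

```python
def sharing_matrix(adjacency, beta):
    """Build the weighted sharing matrix c (hypothetical helper).

    adjacency[i] is the set of non-self neighbours of cell i; each cell
    also counts itself, so n_i = len(adjacency[i]) + 1.
    """
    n_cells = len(adjacency)
    c = [[0.0] * n_cells for _ in range(n_cells)]
    for i, neighbours in enumerate(adjacency):
        n_i = len(neighbours) + 1
        for j in neighbours:
            c[i][j] = beta / n_i            # share given to each non-self neighbour
        c[i][i] = 1.0 - beta + beta / n_i   # share the cell keeps for itself
    return c


def group_fitness(v, c, alpha):
    """W = sum_i (sum_j v_j^alpha c_ji) (1 - v_i)^alpha."""
    n_cells = len(v)
    total = 0.0
    for i in range(n_cells):
        # Column sum: viability returns arriving at cell i, its own included.
        viability_return = sum(v[j] ** alpha * c[j][i] for j in range(n_cells))
        fecundity_return = (1.0 - v[i]) ** alpha
        total += viability_return * fecundity_return
    return total


def complete_graph(n_cells):
    """Adjacency sets for a fully connected ('well-mixed') group."""
    return [set(range(n_cells)) - {i} for i in range(n_cells)]


if __name__ == "__main__":
    c = sharing_matrix(complete_graph(5), beta=0.7)
    print([round(sum(row), 12) for row in c])        # each row sums to 1
    print(group_fitness([0.5] * 5, c, alpha=0.8))    # complete generalists
```

For a group in which every cell has the same number of neighbors, the columns of `c` also sum to 1, so complete generalists obtain $W = N (1/2)^{2\alpha}$ regardless of `beta`.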

Fixed resource sharing
We first consider cases wherein cells within a group share across fixed intercellular interactions. In each case we vary the return on investment exponent, $\alpha$, between 0.5 and 1.5, and the interaction strength, $\beta$, between 0.0 and 1.0, both in increments of 0.1. For each combination of topology, $\alpha$, and $\beta$, the group investment strategy ($v_i$ for all $i$) was allowed to evolve for 1000 generations. We begin with simple topologies: groups with no connections and groups that are maximally connected. They represent, respectively, the case in which all cells within the group are autonomous and the case in which every cell interacts with all others (i.e. a 'well-mixed' group). In the absence of interactions, cells cannot benefit from functions performed by others and therefore must perform both functions $v$ and $b$; hence specialization is not favored, and does not evolve. In the fully connected case, a high degree of specialization is observed for many values of $\alpha$ and $\beta$ (Figure 1a). Consistent with classic results (Cooper and West, 2018; Michod et al., 2006; Ispolatov et al., 2012; Solari et al., 2013; Michod, 2007; West et al., 2015), specialization is only achieved in the fully connected case for $\alpha > 1$.

Figure 1. Schematic of topology for a simplified ten cell group (first row), and mean specialization as a function of specialization power $\alpha$ and interaction strength $\beta$ across the entire population. (A) When each cell in the group is connected to all others, specialization is favored only when $\alpha > 1$. (B) For the nearest neighbor topology, specialization is favorable for a wider range of parameters, including for some values of $\alpha < 1$. Specifically, specialization is advantageous when $\alpha > \frac{3}{4\beta}$. (C) Connecting alternating specialists creates a bipartite graph which maximizes the benefits of specialization and the range of parameters for which it is advantageous. In this case, specialization is favorable wherever $\alpha > \frac{3}{5\beta}$. The red curves represent analytical predictions for $\alpha^*$, the lowest value of $\alpha$ for which complete generalization is disfavored, and the orange vertical lines are at $\alpha = 1$ to guide the eye. While analysis shows that some degree of specialization must occur in the regime upward and to the right of the red curves, simulations reveal that when complete generalization is disfavored, complete specialization is favored in these networks.
Next, we consider a simple sparse network in which each cell within a group is connected to only two other cells, forming a complete ring (Figure 1b); we refer to this as the neighbor network. Surprisingly, preventing trade between most cells encourages division of labor. We find that specialization evolves even when $\alpha < 1.0$, that is, when the returns on investment are saturating or concave. In our simulations, this topology leads to alternating specialists in viability and fecundity (Figure 1b). Analytically, we find that this topology always favors at least some degree of specialization whenever $\alpha > \frac{3}{4\beta}$. We next study a network with cells that can be separated into two disjoint sub-groups, where every edge of the network connects a cell in one sub-group to a cell in the other sub-group and no within sub-group connections exist, that is, a bipartite graph (Figure 1c). We refer to the specific network structure in Figure 1c as the 'balanced bipartite' network. We find that specialization evolves even when $\alpha < 1.0$, similar to the neighbor network. However, we find that specialization evolves for a wider range of $\alpha$ and $\beta$ values for the balanced bipartite network than for the neighbor network.
We can analytically determine under what conditions complete generalization is optimal. The complete generalist investment strategy is where every cell in the group invests equally into viability and fecundity, defined as: $v_i^* = \frac{1}{2}$ for all $i$. For these simple topologies, the complete generalist strategy is either a maximum or a saddle point, depending on the values of $\alpha$ and $\beta$. Complete generalization is only favored when the Hessian evaluated at the generalist investment strategy, $H^* = \left. \frac{\partial^2 W}{\partial v_k \partial v_\ell} \right|_{\vec{v}^*}$, is negative definite, that is, all of its eigenvalues are negative. The largest eigenvalues of the Hessian for the complete, neighbor, and balanced bipartite networks are $\alpha \left(\frac{1}{2}\right)^{2\alpha - 3} (-1 + \alpha\beta)$, $\alpha \left(\frac{1}{2}\right)^{2\alpha - 3} \left(-1 + \frac{4}{3}\alpha\beta\right)$, and $\alpha \left(\frac{1}{2}\right)^{2\alpha - 3} \left(-1 + \frac{2N}{N+2}\alpha\beta\right)$, respectively. When $\alpha$ and $\beta$ are chosen so that the largest eigenvalue becomes non-negative, complete generalization cannot maximize group fitness.
While we have not analytically shown where the fitness maximum occurs in cases where the generalist strategy becomes a saddle point, evolutionary simulations (Figure 1) suggest that when complete generalization is not a fitness maximum, a high degree of (or even complete) specialization typically does maximize fitness.
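The eigenvalue expressions quoted above share the form $\alpha (1/2)^{2\alpha-3} (-1 + k\alpha\beta)$, with a topology-dependent factor $k$ equal to 1, 4/3, or $2N/(N+2)$. The short Python sketch below evaluates that expression and the resulting threshold $\alpha = 1/(k\beta)$. The function names and the factor $k$ are labels introduced here for illustration only.

```python
def topology_factor(name, n_cells):
    """Factor k in the largest Hessian eigenvalue (illustrative helper)."""
    if name == "complete":
        return 1.0
    if name == "neighbor":
        return 4.0 / 3.0
    if name == "bipartite":
        return 2.0 * n_cells / (n_cells + 2.0)
    raise ValueError(f"unknown topology: {name}")


def largest_eigenvalue(alpha, beta, k):
    """alpha * (1/2)^(2 alpha - 3) * (-1 + k alpha beta)."""
    return alpha * 0.5 ** (2.0 * alpha - 3.0) * (-1.0 + k * alpha * beta)


def critical_alpha(beta, k):
    """Value of alpha at which the eigenvalue crosses zero (requires beta > 0)."""
    return 1.0 / (k * beta)


if __name__ == "__main__":
    for name in ("complete", "neighbor", "bipartite"):
        k = topology_factor(name, n_cells=10)
        print(name, critical_alpha(beta=1.0, k=k))
```

For a ten-cell group with $\beta = 1$ this gives thresholds of 1 for the complete network, 3/4 for the neighbor network, and 3/5 for the balanced bipartite network, matching the conditions stated for Figure 1.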
In all cases in which complete specialization is achieved in evolutionary simulations, $\tilde{v}\tilde{b}$ terms for viability specialists go to zero, as they cannot reproduce on their own. Furthermore, the fecundity specialists are entirely reliant on the viability specialists for their survival; if viability sharing were suddenly prevented, their $\tilde{v}\tilde{b}$ terms would also be zero. This amounts to complete reproductive specialization (Cooper and West, 2018; Kirk, 2005; Michod, 2006).

Evolving resource sharing
Until now, sharing has been included in every intercellular interaction within groups. Here, we consider the case in which there is initially no sharing, and sharing must evolve along with specialization. These simulations begin with no resource sharing (i.e. $\beta = 0$); during every round, each group in the population has a 2% chance that a mutation will impact its developmental program, and the $\beta$ value for one of its cells will change. The new $\beta$ value is chosen from a truncated Gaussian with standard deviation of 10% of the mean, centered on the current value. Whatever is not retained is shared equally across all interactions, including the self term.
Evolutionary simulation results are similar to those from the fixed-sharing model (Appendix 1-figure 1). Saturating specialization (i.e. specialization despite a concave return function) still occurs for the neighbor and balanced bipartite topologies. Thus, for both fixed and evolved resource sharing, we observe specialization for the largest range of parameters (including $\alpha < 1$) not when the group is maximally connected, but rather when connections are fairly sparse. Therefore, a sparse group topology constitutes a cooperation-prone physical substrate that can favor the evolution of cellular differentiation.
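As an example of the benefit of evolving sharing, consider that, according to the group fitness defined above, the maximum fitness without sharing ($\beta = 0$) is that of complete generalists, $W_0 = N \left(\frac{1}{2}\right)^{2\alpha}$, while a balanced bipartite group of complete specialists with interaction strength $\beta$ has fitness $W_\beta = \frac{N}{2} \cdot \frac{N\beta}{N+2}$. The ratio of these fitnesses is

$$\frac{W_\beta}{W_0} = \frac{N\beta}{N+2} \, 2^{2\alpha - 1} \approx \beta \, 2^{2\alpha - 1},$$

where the approximation is for large $N$.
So for larger groups and when $\alpha > \frac{1}{2} - \frac{\log \beta}{2 \log 2}$, if a group can evolve resource sharing (i.e. letting $\beta \to 1$ and adopting the specialist investment strategy) its maximum fitness will increase.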

Benefit of specialization
We now consider a simple example to highlight why specialization can be adaptive despite saturating (i.e. concave) returns from trade. Consider groups of four cells, connected via the nearest-neighbor topology (i.e. in a ring). We directly calculate the group-level fitness of generalists and specialists for two scenarios, $\alpha = 0.9$ and $\alpha = 1$, by summing the contributions of each cell within these groups (Figure 2). In this simple scenario, reproductive specialization strongly increases group fitness (33% for $\alpha = 1$ and 16% for $\alpha = 0.9$).
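The four-cell calculation can be reproduced with a few lines of Python. The sketch below is an illustration under the model's stated assumptions: each ring cell has three neighbors counting itself, and interaction strength is set to $\beta = 1$ as in the example. The helper name `ring_fitness` is introduced here and is not taken from the original work.

```python
def ring_fitness(v, alpha, beta=1.0):
    """Group fitness of a nearest-neighbour ring (needs at least 3 cells).

    Every cell has n_i = 3 neighbours counting itself, so it keeps
    1 - beta + beta/3 of its viability returns and gives beta/3 to each side.
    """
    n_cells = len(v)
    keep = 1.0 - beta + beta / 3.0
    give = beta / 3.0
    total = 0.0
    for i in range(n_cells):
        left = v[(i - 1) % n_cells]
        right = v[(i + 1) % n_cells]
        viability = keep * v[i] ** alpha + give * (left ** alpha + right ** alpha)
        total += viability * (1.0 - v[i]) ** alpha
    return total


if __name__ == "__main__":
    generalists = [0.5, 0.5, 0.5, 0.5]
    specialists = [1.0, 0.0, 1.0, 0.0]   # viability and fecundity specialists alternate
    for alpha in (1.0, 0.9):
        w_g = ring_fitness(generalists, alpha)
        w_s = ring_fitness(specialists, alpha)
        print(alpha, round(w_g, 2), round(w_s, 2), f"{100 * (w_s / w_g - 1):.0f}%")
```

With $\alpha = 1$ the generalist and specialist fitnesses are 1 and 4/3, and with $\alpha = 0.9$ they are about 1.15 and 1.33, which recovers the 33% and 16% gains quoted in the text.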
The benefit of specialization in neighbor networks increases with group size. For a ring of size $N$, fitness under the specialist strategy $\vec{v} = \langle 0, 1, 0, 1, \ldots \rangle$ is $W = \frac{N\beta}{3}$. For a ring of generalists the fitness is $W = N \left(\frac{1}{2}\right)^{2\alpha}$. Therefore, whenever $\alpha > \frac{\log 3 - \log \beta}{2 \log 2}$, the ring of complete specialists enjoys a greater fitness than the ring of complete generalists. Again, note that complete generalization becomes disfavored when $\alpha > \frac{3}{4\beta}$, so there is a narrow regime where $\frac{3}{4\beta} < \alpha < \frac{\log 3 - \log \beta}{2 \log 2}$, in which complete generalization is disfavored but complete specialists are not yet fitter than complete generalists. Simulations suggest that even in this region, however, the specialization score of the optimal strategy is large (Figure 1).

Figure 2. To explore how specialization can be favored by the nearest-neighbor topology, we compare the fitness of a four member system when cells are (A) generalists and (B) specialists. We first consider the case of linear functional returns ($\alpha = 1$). For the case of generalists (A), each cell receives as much viability as it shares, and all nodes contribute equally to the fitness of the group. Therefore, the fitness of the group is $W = 4 \cdot \frac{1}{2} \cdot \frac{1}{2} = 1$. For the case of specialists, however, the viability specialist cells (blue) have $\tilde{v}\tilde{b} = 0$, while the fecundity specialist cells have nonzero $\tilde{v}\tilde{b}$ due to the fact that they receive $\frac{1}{3}$ of each viability specialist's output. Thus the fitness of the group is $W = 2 \left(2 \cdot \frac{1}{3}\right) = \frac{4}{3}$. Thus, fitness is higher for the group of specialists, so specialization is favored. For $\alpha = 0.9$, the fitness of generalists is 1.15, and the fitness of specialists is 1.33. Thus, even though the returns on investment are saturating (i.e. concave), specialization is favored.

Effect of sparsity
Surprisingly, saturating specialization appears to be the rule, rather than the exception, for sparsely connected graphs. We investigated Erdős-Rényi random graphs with varying degrees of connectivity to systematically examine the relationship between sparsity and the value of $\alpha$ at which specialization is favored. We find that many randomly assembled graphs obtain maximum fitness through complete reproductive specialization even when $\alpha$ is below 1 (Figure 3b,c). It is only at the extremes of sparsity and connectivity (near the fully connected or fully unconnected points) that generalists maintain superior fitness for all values of $\alpha < 1$. We further show that this general trend is independent of the size of a group; saturating specialization is favorable for groups of size $N = 10$, $N = 100$, and $N = 1000$. When network connectivity is at its minimum, the group consists solely of isolated cells that cannot interact. Under these conditions generalists are favored. Similarly, at maximum connectivity every cell interacts with every other cell. Under these conditions generalists are favored unless $\alpha\beta > 1$. However, when connectivity is small but not zero, specialization arises most readily. We conjecture that the troughs in Figure 3b, where specialization occurs for the lowest values of $\alpha$, occur when connectivity is just large enough so that the existence of a spanning tree is more likely than not.

Figure 3. Sparsity encourages specialization. Heat maps showing conditions that favor specialists (white) and generalists (black) for nearest neighbor topologies (A, left) and randomly generated graphs with the same connectivity as nearest neighbor topologies (A, right). Specialization is adaptive on a neighbor network for $\alpha > \frac{3}{4\beta}$; random networks with the same mean connectivity as the nearest neighbor topology behave similarly. (B) The sparsity of a random graph affects how likely it is to favor specialization. We numerically maximize fitness for random graphs of size $N = 10$ (left), $N = 20$ (middle), and $N = 100$ (right) at different levels of sparsity, and subsequently measure the specialization $S$ of the fitness maximizing investment strategy. The horizontal axis is the fraction of possible connections present, ranging from 0 (none) to 1 (all). The vertical axis is the specialization power $\alpha$, and the colormap shows mean specialization.
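The conjecture about spanning trees refers to a quantity that is easy to estimate numerically: a graph contains a spanning tree exactly when it is connected. The Python sketch below estimates, by sampling, how often an Erdős-Rényi graph with edge probability `p` is connected. It is a hedged illustration only; the function names, the number of trials, and the seed are assumptions made here, and it does not reproduce the fitness maximization behind Figure 3.

```python
import random


def erdos_renyi(n_cells, p, rng):
    """Sample adjacency sets of a G(n, p) random graph."""
    adjacency = [set() for _ in range(n_cells)]
    for i in range(n_cells):
        for j in range(i + 1, n_cells):
            if rng.random() < p:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def has_spanning_tree(adjacency):
    """A spanning tree exists iff every cell is reachable from cell 0."""
    if not adjacency:
        return True
    seen = {0}
    stack = [0]
    while stack:
        i = stack.pop()
        for j in adjacency[i]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(adjacency)


def spanning_tree_probability(n_cells, p, trials=200, seed=0):
    """Fraction of sampled graphs that are connected."""
    rng = random.Random(seed)
    hits = sum(has_spanning_tree(erdos_renyi(n_cells, p, rng)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for p in (0.1, 0.2, 0.3, 0.5):
        print(p, spanning_tree_probability(n_cells=10, p=p))
```

Scanning `p` for the point where the estimate crosses 1/2 gives the connectivity at which a spanning tree becomes more likely than not, which is the level the conjecture associates with the troughs in Figure 3b.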

Filaments and trees
Sparse topologies like the neighbor network configuration have significant biological relevance, and direct ties to early multicellularity. The first step in the evolution of multicellularity is the formation of groups of cells (Szathmáry and Smith, 1995;Kirk, 2005;Willensdorfer, 2008;Bonner, 1998;Fairclough et al., 2010). Simple groups readily arise through incomplete cell division, forming either simple filaments (Figure 4a) or tree-like morphologies (Figure 4b; Bengtson et al., 2017b;Droser and Gehling, 2008;Berman-Frank et al., 2007;Ratcliff et al., 2012). Filament topologies have been widely observed in independently-evolved simple multicellular organisms, from ancient fossils of early red algae (Butterfield, 2000;Figure 4a) to extant multicellular bacteria (Claessen et al., 2014) and algae (Umen, 2014). Branching multicellular phenotypes have also been observed to readily evolve from baker's yeast (Ratcliff et al., 2015;Figure 4b), and are reminiscent of ancient fungus-like structures (Bengtson et al., 2017a) and early multicellular fossils of unknown phylogenetic position from the early Ediacaran (Droser and Gehling, 2008).
Simulations of populations of groups with filamentous and branched topologies reveal that specialization is indeed favored in the sub-linear regime (Figure 4a and b); conversely, sub-linear specialization is never observed for fully connected topologies (Figure 4c). While the generalist strategy is never a critical point for these networks (which have $\mathbf{c} \neq \mathbf{c}^T$, see Materials and methods), we conjecture that there is a nearby critical point which maximizes fitness at small values of $\alpha$ and becomes unstable at larger values of $\alpha$. We introduce a new metric, $\alpha^*$, defined as the value of $\alpha$ such that the largest (least negative) eigenvalue of the Hessian evaluated at the complete generalist strategy is zero when $\beta = 1$. For topologies in which each member has the same number of neighbors, $\alpha^*$ is a critical value at which generalization is no longer an optimal strategy. However, even for groups where the number of neighbors for each cell varies, we can still use $\alpha^*$ as a proxy for how amenable a topology is to saturating specialization. The smaller $\alpha^*$, the more specialization is likely to be favored. We plot vertical lines where $\alpha = \alpha^*$ (solid lines in Figure 4a and b), and dotted lines to indicate roughly where the simulation curves cross specialization of 0.5.

Figure 4. … stops being negative definite, that is, $\alpha^*$; dotted lines indicate roughly where the simulation curves cross specialization of 0.5, that is, the 'true' transition value of $\alpha$ where specialization becomes favored. (C) In contrast, for a well-mixed group with fully connected topology, $\alpha^* = 1$, indicating specialization only occurs when there are accelerating returns on investment. (D) To further explore trees and filaments we analytically solved for $\alpha^*$ for various types of trees and filaments of different sizes. $\alpha^*$ is plotted versus group size for several topologies. This is a proxy measure of how amenable a network structure is to specialization.
These results show that, for these topologies, $\alpha^*$ acts as an effective metric for how amenable a network is to saturating specialization. This metric $\alpha^*$ only depends on topology and can in principle be calculated analytically given any network. We examined the value of $\alpha^*$ as filaments and a variety of tree-like structures grow larger, and find that specialization becomes more strongly favored (Figure 4d). While group size has no effect on specialization for some topologies, like the neighbor network, filaments and trees all see a decrease in $\alpha^*$ as group size increases; $\alpha^*$ eventually plateaus once groups are larger than a few tens of cells. Simple and easily accessible routes to multicellular group formation can readily evolve in response to selection for organismal size (Ratcliff et al., 2012), and this process may also strongly favor the evolution of cellular differentiation (McCarthy and Enquist, 2005; Heim et al., 2017; McClain and Boyer, 2009; Bonner, 1998).

Mean field model
Finally, to capture some general principles underlying this phenomenon, we consider a mean-field model with $N$ cells ($N \gg 1$), each of which is connected to $z$ other cells. For simplicity we consider the case in which $\beta = 1$ and $\alpha = 1$. We pick $\alpha = 1$ because, at this point, if the fitness of specialists is greater than that of generalists, specialization will be favored for at least some values of $\alpha < 1$. If the fitness of generalists is greater than or equal to that of specialists, specialization will only be favored if $\alpha > 1$.
For generalists, the fitness is simply $W_G = N/4$, as each cell has $v = 1/2$ and $b = 1/2$ (before and after sharing). Viability specialists produce $v = 1$ and $b = 0$, while fecundity specialists produce $v = 0$ and $b = 1$. Viability specialists then share $v = 1/(z+1)$ with each of their $z$ neighbors. After sharing, fecundity specialists receive $v = 1/(z+1)$ from each of their viability specialist neighbors. But how many of their neighbors are viability specialists? We label the fraction of cells connected to fecundity specialists that are viability specialists $f$, that is, $f$ is the mean number of viability specialists connected to each fecundity specialist divided by $z$, averaged over all fecundity specialists. For a bipartite graph, $f = 1$; for a randomly connected graph on which half of cells are viability specialists and half of cells are fecundity specialists, $f = 1/2$. Group fitness is thus:

$$W_S = \frac{N}{2} \cdot \frac{z f}{z+1}.$$

Here, $z f/(z+1)$ is the average viability returns each fecundity specialist has received after sharing, which is multiplied by the amount of fecundity each fecundity specialist has (1) and the number of fecundity specialists ($N/2$). Writing $W_S$ in terms of $W_G$:

$$W_S = W_G \, \frac{2 z f}{z+1}.$$

Specialists will be favored if the ratio $W_S/W_G > 1$. This will be true if:

$$\frac{2 z f}{z+1} > 1,$$

which reduces to:

$$f > \frac{z+1}{2z}.$$

This inequality implies that specialization will only be favored if fecundity specialists are preferentially connected to viability specialists, that is, if $f > 1/2$. Further, for a fully connected network $f = 1/2$, so this inequality is never satisfied, that is, specialists cannot have larger fitness than generalists for $\alpha = 1$ and fully connected topologies, as classically predicted.
Further, $f$ cannot be more than 1, so if the threshold from the inequality in Equation 5 is greater than or equal to 1, specialization cannot be favored for $\alpha < 1$. Thus, specialization for $\alpha < 1$ is only possible if:

$$\frac{z+1}{2z} < 1,$$

which reduces to: $z > 1$. This again reproduces a classic result: specialization for $\alpha < 1$ is not possible for disconnected cells. This analysis allows us to interrogate specific cases. For example, if $z = 3$, $f$ must be greater than 2/3, while if $z = 4$, $f$ must only be greater than 5/8. Can such networks be constructed? The answer will depend on both the number of cells and how they are connected. Ultimately, the question of whether a graph can be made with particular values of $f$ and $z$ is a graph coloring problem, and beyond the scope of this manuscript. However, this inequality presents a useful heuristic which can be used to determine whether specialization is favored by measuring just a few properties of the graph.
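The heuristic can be applied to a concrete graph by measuring $f$ and comparing it with the threshold $(z+1)/(2z)$ implied by the examples above ($z = 3$ gives 2/3, $z = 4$ gives 5/8). The Python sketch below does this with exact fractions. The function names and the adjacency-set input format are assumptions introduced here, and the comparison is only meaningful when every cell has the same number of neighbors $z$, as the mean-field model assumes.

```python
from fractions import Fraction


def f_threshold(z):
    """Smallest f must exceed (z + 1) / (2 z) for specialists to win at alpha = beta = 1."""
    return Fraction(z + 1, 2 * z)


def viability_fraction(adjacency, is_fecundity):
    """Mean fraction of each fecundity specialist's neighbours that are viability specialists.

    adjacency[i] is the set of non-self neighbours of cell i;
    is_fecundity[i] is True for fecundity specialists, False for viability specialists.
    """
    per_cell = []
    for i, neighbours in enumerate(adjacency):
        if is_fecundity[i] and neighbours:
            n_viability = sum(1 for j in neighbours if not is_fecundity[j])
            per_cell.append(Fraction(n_viability, len(neighbours)))
    if not per_cell:
        raise ValueError("no connected fecundity specialists in this group")
    return sum(per_cell, Fraction(0)) / len(per_cell)


if __name__ == "__main__":
    n_cells = 6
    ring = [{(i - 1) % n_cells, (i + 1) % n_cells} for i in range(n_cells)]
    alternating = [i % 2 == 1 for i in range(n_cells)]   # fecundity on odd cells
    f = viability_fraction(ring, alternating)
    print(f, f_threshold(2), f > f_threshold(2))
```

For the alternating ring, $f = 1$ and $z = 2$, so the threshold of 3/4 is exceeded, consistent with the ring favoring specialization at $\alpha = 1$.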

Effect of varying ratios of specialists
We now allow the fraction of fecundity specialists to be $X$ (rather than forcing $X = 1/2$). For generalists, the group fitness is unchanged, $W_G = N/4$, while for specialists the group fitness is:

$$W_S = N X \, \frac{z f}{z+1}.$$

Writing $W_S$ in terms of $W_G$ gives:

$$W_S = W_G \, \frac{4 X z f}{z+1}.$$

Specialists will be favored if the ratio $W_S/W_G > 1$. This will be true if:

$$f > \frac{z+1}{4 X z}.$$

Compared to the threshold value of $f$ when $X = 1/2$, if $X > 1/2$, that is, more than half of cells are fecundity specialists, the value of $f$ necessary for specialization to be favored is lower. If $X < 1/2$, the threshold value of $f$ is higher than if $X = 1/2$. In other words, 1:2 is different from 2:1, and they both are different from 1:1. Once again, the question of whether, and how, a particular configuration can be created is a graph coloring problem beyond the scope of this manuscript. However, this mean field heuristic gives us some information about how to expect graphs with different ratios of fecundity to viability specialists to behave.
We again ask what must be true for the threshold value of $f$ to be less than 1 (since $f$ cannot exceed 1, specialization will not be favored otherwise). Thus, specialization is only possible if:

$$\frac{z+1}{4 X z} < 1,$$

which reduces to:

$$X > \frac{z+1}{4z}.$$

For a mean field model, specialization with $\alpha < 1$ is impossible if fewer than one fourth of cells are fecundity specialists. We stress here that this is a mean field model, and does not apply to scenarios in which cells have a wide range of values of $z$. Whether such networks favor specialization for $\alpha < 1$ will again be a graph coloring problem.
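As a small numerical companion to this section, the sketch below evaluates the threshold on $f$ for an arbitrary fecundity-specialist fraction $X$, taking the form $(z+1)/(4Xz)$, which reduces to the $X = 1/2$ case and reproduces the one-fourth bound stated above. The function names are illustrative assumptions, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def f_threshold_with_ratio(z, x):
    """Threshold on f when a fraction x of cells are fecundity specialists."""
    return Fraction(z + 1) / (4 * Fraction(x) * z)


def min_fecundity_fraction(z):
    """Smallest x for which the threshold on f is still below 1."""
    return Fraction(z + 1, 4 * z)


if __name__ == "__main__":
    for x in (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)):
        print(x, f_threshold_with_ratio(3, x))
    print(min_fecundity_fraction(3))
```

Because $(z+1)/(4z)$ is always larger than 1/4, the bound approaches one fourth only as $z$ grows, which is why fewer than one fourth of cells being fecundity specialists rules out specialization for $\alpha < 1$ in this model.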

Discussion
During the evolution of multicellularity, formerly autonomous unicellular organisms evolve into functionally-integrated parts of a new higher level organism (West et al., 2015;Michod and Nedelcu, 2003). Evolutionary game theory (Corning and Szathmáry, 2015;Nash, 1950;Smith, 1988) argues that functional specialization should only evolve when increased investment in trade increases reproductive output. Conventionally, this requires returns from specialization to be accelerating, that is, convex or super-linear (Szathmáry and Smith, 1995;Smith and Szathmáry, 1997;Goldsby et al., 2012;Corning and Szathmáry, 2015;Boza et al., 2014;Taborsky et al., 2016;Page et al., 2006;Rueffler et al., 2012;Szekely et al., 2013). While this idea is intuitive, it is, in the case of fixed group topology, also overly restrictive. In this paper, we explore how social interactions within groups, measured by their network topology, affect the evolution of reproductive specialization. Indeed, when all cells within groups interact (with equal interaction strength), returns on investment must be an accelerating, that is, convex, function of investment for specialization to evolve (Figure 1a; Szathmáry and Smith, 1995;Smith and Szathmáry, 1997;Corning and Szathmáry, 2015;Cooper and West, 2018). Yet for a broad class of sparsely connected networks, complete specialization can evolve even when the viability and fecundity return on investment curves are saturating, that is, concave (Figure 3).
To understand how specialization can be favored despite concave return on investment (ROI) curves, consider Jensen's inequality. Jensen's inequality states that for a convex function $F(x)$, $\langle F(x) \rangle \geq F(\langle x \rangle)$, that is, the average value of $F(x)$, $\langle F(x) \rangle$, is at least as large as $F(\langle x \rangle)$, where $\langle x \rangle$ is the average value of $x$. A corollary of Jensen's inequality is that the opposite is true for concave functions, that is, for a concave function $G(x)$, $\langle G(x) \rangle \leq G(\langle x \rangle)$. Jensen's inequality guarantees that for concave ROI functions generalists produce more total viability and fecundity than specialists, and that for convex ROI functions specialists produce more total viability and fecundity than generalists.
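As a quick worked instance of this statement, take the power-law return $F(x) = x^{\alpha}$ used in the model and compare two generalist cells that each invest $x = 1/2$ with two specialist cells that invest $x = 1$ and $x = 0$. The two-cell setup is an illustrative assumption chosen here for simplicity.

```latex
\begin{align*}
\text{generalists:}\quad & F(\langle x \rangle) = \left(\tfrac{1}{2}\right)^{\alpha} = 2^{-\alpha}, \\
\text{specialists:}\quad & \langle F(x) \rangle = \tfrac{1}{2}\left(1^{\alpha} + 0^{\alpha}\right) = \tfrac{1}{2}, \\
& 2^{-\alpha} > \tfrac{1}{2} \iff \alpha < 1 \ \text{(concave ROI)}, \qquad
  2^{-\alpha} < \tfrac{1}{2} \iff \alpha > 1 \ \text{(convex ROI)}.
\end{align*}
```

So with saturating returns the generalists produce more of the good per cell, and with accelerating returns the specialists do, exactly as Jensen's inequality requires.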
Crucially, however, Jensen's inequality does not connect ROI convexity/concavity to group fitness. Jensen's inequality relates the degree of specialization to the average viability and average fecundity produced, but does not itself say anything about group fitness, which is the product of viability and fecundity averaged across all cells. For fully connected topologies (e.g. Figure 4c), greater absolute productivity proportionally increases group fitness, and differentiation can only evolve with accelerating benefits of specialization. This is not the case for topologically structured organisms, where fitness also depends on how complementary specialist cells are connected. Natural selection acts on realized productivity, that is, average vb; mutations that increase average v or average b without increasing average vb are not adaptive. The importance of connecting complementary specialists has long been appreciated in other contexts, such as metabolic cross-feeding, for which it has been shown that the spatial arrangement of unlike specialists plays a key role in determining their productivity (and thus fitness) (Co et al., 2020). Indeed, while Jensen's inequality ensures that generalists will produce more viability and fecundity than specialists given a concave ROI function, specialization can still increase the fitness of topologically structured groups by increasing realized productivity.
Rather than being unusual, networks favoring specialization readily arise as a consequence of physical processes structuring simple cellular groups (Allen et al., 2017). For example, septin defects during cell division create multicellular groups with simple graph structures (Figure 4a and b), where cells are connected only to parents and offspring (Bengtson et al., 2017b; Droser and Gehling, 2008; Ratcliff et al., 2012; Ratcliff et al., 2013). If cells share resources only with physically-attached neighbors, then the physical topology of the group describes its interaction topology, and these sparse networks strongly favor reproductive specialization. Finally, we note that the primary benefit of sparsity is that sparse networks are likely to be at least somewhat bipartite. The more bipartite-like a network is, the less effort is wasted, and the easier it is for specialization to be favored.
Disentangling the evolutionary underpinnings of ancient events is notoriously difficult. Still, it is worth examining the independent origins of complex multicellularity, which are independent runs of parallel natural experiments in extreme sociality. Complex multicellularity (large multicellular organisms with considerable cellular differentiation) has evolved in at least five eukaryotic lineages, once each in the animals (King, 2004), land plants (Kenrick and Crane, 1997), and brown algae (Silberfeld et al., 2010), two or three times in the red algae (Cock and Collén, 2015;Yoon et al., 2006), and 8-11 times in fungi (Nagy et al., 2018). In all cases other than animals, these organisms form multicellular bodies via permanent cell-cell bonds, creating long-lasting highly structured cellular networks. Both fossil and phylogenetic evidence suggests that early multicellular organisms in these lineages were considerably less complex, growing as relatively simple graph structures. For example, 1.2 billion year old red algae formed linear filaments of cells (Butterfield, 2000), basal multicellular charophyte algae formed circular sheets of cells radiating from a common center (Kenrick and Crane, 1997), the ancestor of the brown algae likely formed a branched haplostichous thallus that was either filamentous or pseudoparenchymatous (Silberfeld et al., 2010), and hyphal fungi are primarily composed of linear chains of cells. Much less is known about the topology of animals prior to the evolution of cellular specialization. One hypothesis is that early metazoans resembled extant colonial choanoflagellates (Fairclough et al., 2013), the closest-living protistan relatives of the animals (Fairclough et al., 2010). Extant colony-forming choanoflagellates have evolved a variety of multicellular structures with sparse cellular topologies and permanent cell-cell bonds. 
For example, many species form branched, tree-like structures (Leadbeater, 2015), Choanoeca flexa grows as a sheet of cells (Brunet et al., 2019), and Salpingoeca rosetta can form either linear chains or rosettes in which the cells are connected via cytoplasmic bridges formed through incomplete cytokinesis (Dayel et al., 2011). While these growth forms are quite diverse, they all share characteristics (i.e. permanent cellular bonds and sparse topologies) that promote the evolution of cellular differentiation.
The main differences between our work and previous investigations of the effect of group topology on specialization are that we consider the productivity of groups as a whole, not the cells within them, and that we consider situations of highly asymmetric sharing. Our approach is general, and can be applied to other systems of trade and specialization, so long as (1) only the aggregate productivity of the group (and not the particles within it) is maximized, (2) the productivity of each particle within the group is a multiplicative function of returns on investment into two (or more) tasks, and (3) there is an asymmetry in how products of those investments are shared. While in this work we have focused on reproductive division of labor, a process in which fecundity returns are not shared at all, we show in the supplement that as long as sharing of two goods is sufficiently asymmetric, specialization with saturating returns on investment can still be adaptive (Appendix 1-figure 2).
Finally, we note that alternative paths to specialization likely exist. For example, cells at different positions in a group may experience different local environments, which may produce cells with varied fecundity-viability trade-offs. A previous paper demonstrated that the evolution of specialization is favored if these 'positional effects' result in an initially heterogeneous population of cell types (Tverskoi et al., 2018). However, these positional effects were considered for the case of well-mixed groups (i.e. completely connected network topologies). We thus anticipate that future work examining the relationship between cellular interaction topology and cellular heterogeneity (as well as a wide range of complex and varied relationships between viability, fecundity, and multicellular fitness) will provide unique insight into the origin and diversity of multicellular forms.

Conclusion
We explored the evolution of reproductive specialization in multicellular groups with various cellular interaction topologies. Our results demonstrate that group topological structure can play a key role in the evolution of reproductive division of labor. Indeed, within a broad class of sparsely connected networks, specialization is favored even when the returns from cooperation are saturating (i.e. concave); this result is in direct contrast to the prevailing view that accelerating (i.e. convex) returns are required for natural selection to favor increased specialization (Cooper and West, 2018;Michod et al., 2006;Ispolatov et al., 2012;Solari et al., 2013;Michod, 2007;West et al., 2015). Our results underscore the central importance of life history trade-offs in the origin of reproductive specialization (Michod, 2007;Hammerschmidt et al., 2014;van Gestel and Tarnita, 2017;Noh et al., 2018), and support the emerging consensus that evolutionary transitions in individuality are not necessarily highly constrained (Ratcliff et al., 2012;Ratcliff et al., 2017;Fairclough et al., 2010;Brunet and King, 2017;Pennisi, 2018;Black et al., 2019;Rose et al., 2020;van Gestel and Tarnita, 2017;Staps et al., 2019;Grosberg and Strathmann, 2007).

Analysis
The gradient of the fitness with respect to the group investment strategy $\vec{v}$ is written in terms of $\hat{e}_k$, a unit vector in the $k$-th direction. First notice that if $c = c^T$ and $\vec{v} = \frac{1}{2}\vec{1}$, where $\vec{1}$ is a vector of ones, then the gradient is zero. This strategy, $\vec{v} = \frac{1}{2}\vec{1}$, corresponds to the 'generalist' strategy, where every cell invests equally into both tasks. Second, notice that if $c \neq c^T$ then the gradient is not zero under the generalist strategy, so at least some degree of specialization must be necessary to maximize fitness. To determine the stability of the generalist solution we examine $H^*$, the Hessian (see SI Equation 3) evaluated at the generalist critical point. If $H^*$ is negative definite, then the generalist strategy is a fitness maximum and is therefore an optimal strategy. If, on the other hand, $H^*$ has both positive and negative eigenvalues, then the generalist strategy lies at a saddle point within the fitness landscape, and therefore the optimal strategy must be somewhere else in (or on the boundary of) the domain (i.e. $v_i \in [0, 1]$ for all $i \in 1, 2, \ldots, N$). Finally, note that $H^*$ is never positive definite, since $\vec{1}$ is always an eigenvector with negative eigenvalue (when $c = c^T$).
We also use the zero crossing of the largest eigenvalue of $H^*$, evaluated at $\vec{v} = \frac{1}{2}\vec{1}$ and $\beta = 1$, as an overall measure of how amenable a network is to specialization, even when $c \neq c^T$.

Evolutionary simulations
Our evolutionary simulations maintain the same overall structure as the Wright-Fisher model: a discrete-time Markov chain framework with fitness-weighted multinomial sampling between generations and constant population size. Therefore we refer to them as Wright-Fisher evolutionary simulations. We initialize a population of $N = 1000$ groups, each of group size $N = 10$, with uniform random investment strategies. We then let them evolve for 1000 generations, selecting offspring according to the relative fitness of each group (see Equation 1). At each generation, there is a 2% chance for a mutation to a given group's investment strategy $\vec{v}$. If a mutation occurs, a new investment strategy is selected from a truncated multivariate Gaussian distribution centered at the current (pre-mutation) investment strategy and with standard deviation equal to $\frac{1}{10}\vec{v}$. After mutations, each group's fitness is calculated according to Equation 1, and the population is ranked according to fitness. Finally, $N$ groups are selected (with replacement) to populate the next generation, according to a multinomial distribution weighted by the groups' fitness ranks.
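The procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code (which is in the repository listed under Data availability). It rests on several assumptions that are not stated in this section: the fitness is taken to be $W = \sum_i (1 - v_i)^{\alpha} \sum_j c_{ij} v_j^{\alpha}$ with $c = (1 - \beta) I + \beta a$ (a form chosen to be consistent with the eigenvalues in Appendix 1-table 1), the truncated Gaussian is approximated by clipping to $[0, 1]$, and all function names are hypothetical.

```python
import numpy as np


def ring_with_self_loops(n):
    """Adjacency matrix of a ring ('neighbor graph'), self loops included."""
    A = np.eye(n)
    for i in range(n):
        A[i, (i + 1) % n] = 1.0
        A[i, (i - 1) % n] = 1.0
    return A


def sharing_matrix(A, beta):
    """ASSUMED form: each cell keeps 1 - beta and shares beta over its links."""
    a = A / A.sum(axis=1, keepdims=True)  # row-normalized adjacency
    return (1.0 - beta) * np.eye(len(A)) + beta * a


def group_fitness(pop, c, alpha):
    """ASSUMED fitness; pop has shape (groups, cells), one strategy per row."""
    shared_viability = (pop ** alpha) @ c.T
    return np.sum((1.0 - pop) ** alpha * shared_viability, axis=1)


def wright_fisher(A, alpha, beta, n_groups=1000, generations=1000,
                  mutation_rate=0.02, seed=0):
    rng = np.random.default_rng(seed)
    c = sharing_matrix(A, beta)
    # Uniform random initial investment strategies.
    pop = rng.uniform(0.0, 1.0, size=(n_groups, len(A)))
    for _ in range(generations):
        # Each group mutates with probability mutation_rate.
        mutate = rng.random(n_groups) < mutation_rate
        # Gaussian step with standard deviation v / 10; clipping stands in
        # for the truncated Gaussian (a simplification).
        proposal = np.clip(rng.normal(pop, 0.1 * pop), 0.0, 1.0)
        pop = np.where(mutate[:, None], proposal, pop)
        # Rank groups by fitness (rank 1 = least fit) and resample by rank.
        fitness = group_fitness(pop, c, alpha)
        ranks = np.argsort(np.argsort(fitness)) + 1
        parents = rng.choice(n_groups, size=n_groups, p=ranks / ranks.sum())
        pop = pop[parents]
    return pop
```

Calling `wright_fisher(ring_with_self_loops(10), alpha=0.9, beta=1.0)` would run the sketch with the population size, group size, generation count, and mutation rate quoted above; smaller values of `n_groups` and `generations` are convenient for quick experiments.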

Measuring specialization
To quantify the degree of specialization associated with a given group's optimal investment strategy (the one which maximizes the fitness), we introduce a metric which we refer to simply as 'Specialization'. Specialization ranges from 0 (for groups consisting of cells investing equally in functions $v$ and $b$) to 1 (for groups consisting of cells investing exclusively in either function).
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Data availability
All evolutionary simulations and other computations associated with this work are available at https://github.com/dyanni3/topologicalConstraintsSpecialization (copy archived at https://github.com/elifesciences-publications/topologicalConstraintsSpecialization); all parameters used in the current study are specified so all simulations can be repeated exactly.

Appendix 1 Analysis
As described in the main text, the fitness for a group of $N$ individuals is defined by Equation 1, and the gradient of the fitness with respect to the group investment strategy $\vec{v}$ is written in terms of $\hat{e}_k$, a unit vector in the $k$-th direction.
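For orientation, the following is a sketch of what such a gradient looks like under an assumed fitness function. The form of $W$ below is an assumption made here for illustration; it is chosen because it reproduces the eigenvalues listed in Appendix 1-table 1, and it is not quoted from Equation 1.

```latex
% ASSUMED fitness: fecundity returns (1 - v_i)^alpha multiply the viability
% returns that cell i receives through the interaction matrix c.
W = \sum_{i=1}^{N} (1 - v_i)^{\alpha} \sum_{j=1}^{N} c_{ij}\, v_j^{\alpha}

% Gradient under this assumption:
\frac{\partial W}{\partial \vec{v}}
  = \sum_{k=1}^{N} \hat{e}_k\, \alpha
    \left[ v_k^{\alpha - 1} \sum_{i} c_{ik} (1 - v_i)^{\alpha}
         - (1 - v_k)^{\alpha - 1} \sum_{j} c_{kj}\, v_j^{\alpha} \right]

% At the generalist strategy v = (1/2) 1 this reduces to
\left. \frac{\partial W}{\partial \vec{v}} \right|_{\vec{v} = \frac{1}{2}\vec{1}}
  = \alpha \left( \tfrac{1}{2} \right)^{2\alpha - 1}
    \sum_{k=1}^{N} \hat{e}_k \left[ \sum_{i} c_{ik} - \sum_{j} c_{kj} \right]
```

Under this assumed form, the bracket in the last line is the difference between the $k$-th column sum and the $k$-th row sum of $c$, which vanishes for every $k$ when $c = c^T$, in line with the statement that the generalist strategy is then a critical point.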

Hessian
The Hessian $\partial^2 W / \partial v_k \partial v_\ell$ is given in Equation 3. Of particular interest for us is the value of the Hessian at the generalist strategy when $c = c^T$. In that case the Hessian, $H^*$ (Equation 4), is written in terms of $a$, the row-normalized adjacency matrix of the network. If $A$ is the network's adjacency matrix, then $a$ is obtained by dividing each row of $A$ by its sum, $a_{ij} = A_{ij} / \sum_{k} A_{ik}$.

The case when $c = c^T$
As noted above, when $c = c^T$, the generalist strategy is always a critical point where $\partial W / \partial \vec{v} = 0$. To determine the stability of this solution we examine $H^*$ (Equation 4). If $H^*$ is negative definite, then the generalist strategy is a fitness maximum and is therefore an optimal strategy. If, on the other hand, $H^*$ has both positive and negative eigenvalues, then the generalist strategy lies at a saddle point within the fitness landscape, and therefore the optimal strategy must be somewhere else in (or on the boundary of) the domain (i.e. $v_i \in [0, 1]$ for all $i \in 1, 2, \ldots, N$). Finally, note that $H^*$ is never positive definite (when $c = c^T$). Consider $H^* \vec{1}$. Note $a\vec{1} = \vec{1}$ since $a$ is row-normalized. Furthermore, $\alpha > 0$, so $\vec{1}$ is always an eigenvector of $H^*$ with a negative eigenvalue.
We can next ask, under what conditions is $H^*$ negative definite? This will depend on the group topology, the nonlinear returns on investment $\alpha$, and the interaction strength $\beta$. We examine three cases: the neighbor graph, the balanced bipartite graph, and the complete graph.
Appendix 1-table 1. Largest eigenvalue of the Hessian evaluated at the generalist critical point as a function of $\alpha$, $\beta$, and $N$ for three topologies. When the group size $N = 4$, the balanced bipartite graph coincides with the neighbor graph, and indeed the eigenvalues agree. Similarly, when $N = 2$ the balanced bipartite graph coincides with the complete graph and the eigenvalues agree. The interesting domain of $\alpha\beta$ is $(0, 1]$, so for the complete graph $H^*$ is always negative definite. However, the balanced bipartite and neighbor graphs show regions where the generalist strategy is not stable.

| Topology | Largest eigenvalue |
| --- | --- |
| neighbor graph | $\alpha \left(\frac{1}{2}\right)^{2\alpha - 3} \left(-1 + \frac{4}{3}\alpha\beta\right)$ |
| balanced bipartite graph | $\alpha \left(\frac{1}{2}\right)^{2\alpha - 3} \left(-1 + \frac{2N}{N+2}\alpha\beta\right)$ |
| complete graph | $\alpha \left(\frac{1}{2}\right)^{2\alpha - 3} \left(-1 + \alpha\beta\right)$ |

When $c = c^T$, the matrix $H^*$ is a special type of matrix called a circulant matrix, with well known properties. Its eigenvalues are given by the discrete Fourier transform of its first row, which determines the $k$-th eigenvalue. For the ring topology with $N = 10$, for example, the eigenvalue has its maximum when $k = 5$. The maximum eigenvalue for the balanced bipartite and complete graphs can be computed similarly.
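The table entries can be checked numerically. The sketch below is illustrative only: it assumes the generalist Hessian has the form $H^* = \alpha \left(\frac{1}{2}\right)^{2\alpha - 3} \left[(\alpha\beta - 1) I - \alpha\beta\, a\right]$, which is a reconstruction consistent with the three tabulated eigenvalues and is not quoted from Equation 4. It also assumes that every graph includes self loops, and the function names are hypothetical.

```python
import numpy as np


def ring(n):
    """Neighbor graph: each cell linked to itself and its two ring neighbors."""
    A = np.eye(n)
    for i in range(n):
        A[i, (i + 1) % n] = 1.0
        A[i, (i - 1) % n] = 1.0
    return A


def balanced_bipartite(n):
    """Two halves of n // 2 cells, fully linked across halves, plus self loops."""
    half = n // 2
    A = np.eye(n)
    A[:half, half:] = 1.0
    A[half:, :half] = 1.0
    return A


def complete(n):
    """Every cell linked to every cell, itself included."""
    return np.ones((n, n))


def hessian_at_generalist(A, alpha, beta):
    """ASSUMED form of H* (see the lead-in); a is the row-normalized A."""
    a = A / A.sum(axis=1, keepdims=True)
    prefactor = alpha * 0.5 ** (2 * alpha - 3)
    return prefactor * ((alpha * beta - 1.0) * np.eye(len(A)) - alpha * beta * a)


def largest_eigenvalue(A, alpha, beta):
    # H* is symmetric for these regular graphs, so eigvalsh applies.
    return float(np.linalg.eigvalsh(hessian_at_generalist(A, alpha, beta)).max())


def circulant_eigenvalues(H):
    """Eigenvalues of a symmetric circulant matrix via the DFT of its first row."""
    return np.sort(np.fft.fft(H[0]).real)
```

For the ring, `circulant_eigenvalues(hessian_at_generalist(ring(10), alpha, beta))` should match the output of `np.linalg.eigvalsh` on the same matrix, which is the discrete Fourier transform property used in the text. A positive value of `largest_eigenvalue` indicates that the generalist strategy is a saddle point for that topology and parameter pair.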

Evolution of resource sharing
Here we model the co-evolution of sharing and specialization. We start with generalists that do not share at all. We then allow the amount of sharing and the degree of specialization to evolve. As described in the main text, during every round, each group in the population has a 2% chance that one of its cells will mutate and change how much 'viability' it shares. When this occurs, the fraction of its output to retain is chosen from a Gaussian with standard deviation of 10% centered on the current value. Whatever is not retained is shared equally across its interactions. The degree of specialization evolves as in simulations described in the main text.
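The sharing mutation described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the retained fraction is clipped to $[0, 1]$ after the Gaussian draw, a cell's 'interactions' are taken to include a self loop as in the adjacency matrices used elsewhere in this appendix, and the names `mutate_retention` and `sharing_matrix_from_retention` are hypothetical.

```python
import numpy as np


def mutate_retention(retain, rng, mutation_rate=0.02, sd=0.10):
    """With probability mutation_rate, redraw one cell's retained fraction."""
    retain = retain.copy()
    if rng.random() < mutation_rate:
        cell = rng.integers(len(retain))
        # Gaussian centered on the current value; clipping is an assumption.
        retain[cell] = np.clip(rng.normal(retain[cell], sd), 0.0, 1.0)
    return retain


def sharing_matrix_from_retention(A, retain):
    """Entry [i, j] is the fraction of cell j's viability output that reaches cell i."""
    a = A / A.sum(axis=1, keepdims=True)  # row j: how cell j splits what it shares
    shared = (1.0 - retain)[:, None] * a  # [j, i]: amount cell j sends to cell i
    return np.diag(retain) + shared.T
```

Starting from `retain = np.ones(n)` reproduces the initial condition of generalists that do not share at all, since the sharing matrix is then the identity. Each column of the returned matrix sums to one, so every cell's output is fully accounted for.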
Results are shown in Appendix 1-figure 1, for neighbor topologies, balanced bipartite topologies, and for a complete network.
reproductive tasks whose fruits are totally unshared that leads to specialization under regimes of sublinear return on investment.
The fitness function is modified so that each of the two resources has its own interaction matrix, $c_1$ and $c_2$, which yields a modified Hessian at the generalist critical point (for the neighbor, balanced bipartite, and complete networks). Here, as above, $A$ is the graph's adjacency matrix (including self loops). We see that for a given topology the adjacency matrix is fixed, so that $c_1$ and $c_2$ differ only in their functional interaction strengths $\beta_1$ and $\beta_2$. Therefore the maximum fitness strategy, specified by the vector $\vec{v}^*$, for a given group will depend under our model on the following parameters:
$A$: adjacency matrix; specifies topology
$\beta_1$: functional interaction strength of resource 1
$\beta_2$: functional interaction strength of resource 2
$\alpha$: specialization power; assumed to be equal for resources 1 and 2
We demonstrate the effect of these parameters on the optimal strategy by finding the minimum value of $\alpha$ for which specialization becomes favored, which we denote $\alpha^*$, for a given pair $(\beta_1, \beta_2)$ and given topology. The results are shown in Appendix 1-figure 2.
average fitness, and we should not expect convex ROI functions to be required for specialists to be favored.

Star graphs
Let $W_g$ be the fitness for the star-shaped network group of generalists ($v = 0.5$, $b = 0.5$) and $W_s$ be the fitness of specialists (all of the points of the star get $v = 1$, $b = 0$ and the central point gets $v = 0$, $b = 1$).
Next assume there are $N$ cells on the points of the star and 1 cell in the center of the star. The specialist fitness $W_s$ then comes entirely from the central individual, as the only individual with nonzero fitness is the central individual (all others have $b = 0$). The central individual's fecundity returns are $1^\alpha = 1$, and its own viability returns are 0. However, the central individual gets shared $\frac{\beta}{2}$ of each of the $N$ other individuals' viability returns (which are $1^\alpha = 1$ each). Next, for generalists, the fitness $W_g$ is the sum of two terms (Equation 9). The term on the left of Equation 9 comes from the fact that there are $N$ individuals each sharing $\frac{\beta}{2}$ of their viability returns (which is $\left(\frac{1}{2}\right)^\alpha$ each) with themselves, and are getting $\frac{\beta}{N}$ of the central individual's $\left(\frac{1}{2}\right)^\alpha$ viability returns shared with them. Additionally, they are getting $1 - \beta$ of their own viability returns (withheld from sharing). Finally, each of their fecundity returns is $\left(\frac{1}{2}\right)^\alpha$. The term on the right of Equation 9 represents the contribution to the group fitness of the single central individual. That individual gets $\frac{\beta}{2} \left(\frac{1}{2}\right)^\alpha$ of viability returns shared to it $N$ times, and it also shares with itself and keeps a portion of its returns for itself. It also has a fecundity return of $\left(\frac{1}{2}\right)^\alpha$.

Star topologies in the limit of large N
We first examine Equation 9 in the limit where $N \gg 1$ and simplify. To understand whether generalists or specialists are favored, we examine the ratio of generalist to specialist fitness, $W_g / W_s$.
This means $W_g > W_s$ if $\beta > 2^{\alpha + 1}$. Since $\beta$ and $\alpha$ are both bounded between 0 and 1, this is never achievable. Therefore, at large $N$, specialists are always favored.