Mathematical toy model inspired by the problem of the adaptive origins of the sexual orientation continuum

Same-sex sexual behaviour is ubiquitous in the animal kingdom, but its adaptive origins remain a prominent puzzle. Here, I suggest the possibility that same-sex sexual behaviour arises as a consequence of the competition between an evolutionary drive for a wide diversity in traits, which improves the adaptability of a population, and a drive for sexual dichotomization of traits, which promotes opposite-sex attraction and increases the rate of reproduction. This trade-off is explored via a simple mathematical ‘toy model’. The model exhibits a number of interesting features and suggests a simple mathematical form for describing the sexual orientation continuum.


Introduction
When a particular behaviour or trait is widespread across a group of animals, its origin is usually explained in terms of the fitness advantage that it confers. Such explanations attempt first to understand how the fitness of the animal population has a dependence on the degree to which it exhibits a given trait. It is then assumed that the processes of evolution and natural selection bring the population close to the point of maximal fitness. (See, for example, references [1,2] for a review.) Given this paradigm, the prevalence of same-sex sexual behaviour in the animal kingdom has presented something of a puzzle. Same-sex sexual behaviour is ubiquitous across the animal kingdom, and has been catalogued in hundreds of animal species in ways that range from same-sex courtship and copulation to long-term pair bonding and parenting. (See, for example, reference [3] for an extensive review.) This ubiquity suggests the possibility that same-sex behaviour is associated with some kind of fitness advantage. The nature of this advantage, however, remains poorly understood, and is a source of considerable 2016 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

Model
In the toy model that is the subject of this paper, it is imagined that all individuals are characterized by a single trait whose value x ranges from 0 to 1. Suppose, for concreteness, that females tend to have values of x that are closer to 1, whereas males tend to have values of x that are closer to 0. Under this description, each sex is characterized by two probability density functions: one describing the probability of possessing a certain trait value x, and the other describing the probability of desiring a trait value x c in , the distribution of traits desired by males in a mate and q c (x), the distribution of traits desired by females in a mate. Within the model, all four can be related to a single distribution p(x), which is to be optimized. (b) When the parameter t is small, the optimal distributions are such that p(x) and q(x) have very little overlap, and the number of offspring is maximized. (c) When t is large, a broader distribution of traits is favoured, and consequently, there is significant overlap between the male and female trait distributions, resulting in a relatively high rate of same-sex pairing.
a mate. The distributions of the trait value x are denoted p(x) and q(x) for males and females, respectively. The distribution of the desired trait value x c is denoted p c (x c ) for males and q c (x c ) for females. The four distributions are summarized graphically in figure 1. It is assumed that x and x c are independent variables, so that the trait value x c that an individual desires in a mate is independent of the trait value x possessed by the individual itself. In principle, these four distributions can be completely distinct from each other. However, in order to simplify the model, I introduce the following two assumptions. The first assumption is that there is a symmetry between the two sexes, such that each sex is equivalent to the other under a redefinition of the value of the trait x → 1 − x. In other words, in terms of their traits and preferences, the two sexes are taken to be 'mirror images' of each other, so that q(x) = p(1 − x) and q c (x c ) = p c (1 − x c ). The second assumption is that the number of individuals possessing trait value x is equal to the number of individuals desiring the trait value x in a partner. This assumption guarantees that there is 'someone for everyone', and is equivalent to the conditions that p c (x) = q(x) and q c (x) = p(x). These two assumptions together imply that there is only one relevant distribution p(x) for describing the two sexes, and that all others can be related to it by p c (x) = q(x) = p(1 − x) and q c (x) = p(x) ( figure 1). Now, consider a population consisting of a very large number N of individuals, and suppose that the individuals all become paired with each other in such a way that every individual's desire for the trait value of their partner is satisfied. The proportion of heterosexual pairings that result from this process can be calculated as follows.
Consider two different trait values x 1 and x 2 . One can now define two groups of individuals: (i) those who possess trait value in the infinitesimal interval (x 1 , x 1 + dx 1 ) and desire trait value (x 2 , x 2 + dx 2 ) in a partner, and (ii) those who similarly possess x 2 and desire x 1 . These two groups are referred to as 'group 1' and 'group 2', respectively. The number of males in group 1 is given by Similarly, the number of females in group 1 is For group 2, one can likewise define the number of males and females as M 2 = N · p(x 2 ) dx 2 · p c (x 1 ) dx 1 , and F 2 = N · q(x 2 ) dx 2 · q c (x 1 ) dx 1 , respectively. Because of the symmetry of the distributions p(x) and q(x), the total number of individuals M + F is the same in both groups. One can, therefore, pair the two groups in such a way that each individual in group 1 is paired with an individual in group 2. If these pairings are selected at random, then the proportion of heterosexual pairings is (M 1 F 2 + F 1 M 2 )/(M + F) 2 , and the number of heterosexual pairings between the two groups is dN het (x 1 , x 2 ) = (M 1 F 2 + F 1 M 2 )/(M + F). To find the total number of heterosexual pairings across the entire population, one can integrate dN het (x 1 , x 2 ) over all values of x 1 , x 2 . Inserting the expressions for M 1,2 and F 1,2 gives The value of N het is maximized when the distributions of possessed traits and desired traits, p(x) and p(1 − x), have zero overlap (i.e. when p(x)p(1 − x) = 0 everywhere). In this case, all pairings are heterosexual, N het = N. If each heterosexual pairing produces b offspring on average, then the number of individuals in the next generation is bN het .
On the other hand, one may expect finite overlap between p(x) and p(1 − x) in situations where there is a fitness advantage conferred by each sex having a wide diversity in traits. In particular, one can define the trait entropy of the next generation as 2) is equivalent to the Shannon entropy s of the distribution p(x), multiplied by the number of individuals in the population. The entropy S is maximized when p(x) ≡ 1, i.e. when every trait value is equally likely for each individual, regardless of sex. Presumably, when the environment is such that there is pressure to produce offspring and also pressure to maintain a diversity of traits, the distribution p(x) will reach a steady-state that involves a trade-off between maximizing the number of offspring and maximizing the entropy of the trait distribution (figure 1b,c).
To model that trade-off, I introduce a generic fitness function F that consists of a term proportional to the total offspring number plus a term proportional to the trait entropy. In other words, the proposed fitness function is where u 0 and T 0 are constants that arise from environmental pressures and are independent of the distribution p(x). Dividing both sides of this equation by u 0 Nb one arrives at a renormalized fitness function f = F/(u 0 Nb) that is a function of only a single parameter t f = n(1 + ts). (2.3) Here, n = N het /N (see equation (2.1)) and s = S/(bN het ) (see equation (2.2)) are functionals of the trait distribution p(x), and t = T 0 /u 0 is a dimensionless 'entropy parameter' that characterizes the relative importance of trait diversity. (In this sense, t plays a role similar to that of the temperature in the Helmholtz free energy of statistical physics.) When t = 0, the optimum distributions have no overlap between male and female traits, and all pairings are heterosexual (n = 1). When t → ∞, on the other hand, the population fitness is optimized by p(x) ≡ 1, and heterosexual and homosexual pairings are equally likely (n = 1/2). In the remainder of this paper, results are presented for the distribution p(x) at different values of the parameter t. The primary tool used for finding the optimal p(x) is a numerical Monte Carlo algorithm, which is described in appendix A. Briefly, this algorithm divides the trait interval [0, 1] into discrete points x i , and makes an initial guess for the function p(x i ). The values of p(x i ) are then optimized by making random deviations from the initial guess, and then evaluating the corresponding change to the population fitness f . Changes are kept or discarded according to the Metropolis algorithm, and the procedure is iterated until a convergent solution is found.
Once the distribution p(x) is known, one can also examine the corresponding distributions of 'sexual orientation' θ , which is defined as the probability of a given individual pairing with a same-sex rather than with an opposite-sex partner. In particular, for an individual (say, a male) that prefers a trait value x c in a partner, one can define the orientation ϑ(x c ) of the individual as the proportion of same-sex individuals among the group to which the individual is attracted. One can also define a probability density function for θ as where δ(x) is the Dirac delta function. In the following section, results are presented for both the trait distribution p(x) and the orientation distribution P(θ) as a function of the entropy parameter t.

Results
When the entropy parameter is large, t 1, the trait distribution becomes flat, p(x) ≡ 1, which maximizes the trait entropy at the cost of reducing the total number of offspring by 50%. In fact, the optimal distribution is precisely equal to p(x) ≡ 1 for all values of t ≥ 4. Only at t < 4 do traits begin to specialize according to sex. At t slightly smaller than 4, the distribution p(x) acquires a step-like shape, with traits corresponding to x < 1/2 being more prevalent in males, and traits with x > 1/2 being more prevalent in females. This transition is depicted in figure 2a.
One can describe the transition at t = 4 analytically by writing the distribution p(x) as where c is a parameter to be determined. Inserting this distribution into equations ( In other words, at t ≥ 4, the optimal fitness is provided when c = 0, and the trait distribution is uniform. At t < 4, on the other hand, there emerges a difference in trait distributions between the two sexes, with a magnitude c that grows as √ 4 − t. This splitting also has an implication for the distribution of sexual orientations, P(θ). At t > 4, when the trait distribution is uniform, all individuals have orientation θ = 1/2, because there is no sexualization of traits. When t is lowered below 4, on the other hand, there emerge two classes of orientation: θ = (1 ± c)/2. The former class (with a majority preference for same-sex partners) comprises a smaller proportion (1 − c)/2 of the population. The latter class (with a majority preference for opposite-sex partners) comprises a larger proportion (1 + c)/2. In other words, the distribution of orientation P(θ) is such that P(θ) = δ(θ − 1/2) at t > 4, whereas at t slightly less than 4, one has P(θ) . This bifurcation of the orientation distribution is depicted in figure 2b.
As t is reduced even further, the trait distribution undergoes a sequence of additional splittings, as illustrated in figure 2a. At t 1.7, for example, the two-step structure of the trait distribution undergoes a transition to a three-step structure. In terms of the orientation distribution, one can say that a third class of individuals with orientation θ = 1/2 emerges in between the other two, and P(θ) is a sum of three Dirac delta functions. At t 1.17, this three-class structure transitions to a four-class structure, and as t is reduced an increasingly large number of classes emerge.
When t becomes small, t 1, the distribution p(x) has so many steps that it closely approximates a continuous function. As shown in figure 3, in this limit this function closely matches the form, which is reminiscent of the Fermi function from quantum statistical mechanics. The parameter T, which for the Fermi function is related to the system temperature, is linearly proportional to the entropy parameter t at small t.
To derive the relation between T and t, one can insert equation (    2) also implies a specific, continuous form for the distribution of sexual orientations, P(θ). In particular, evaluating equation (2.5) gives Note that, for any non-zero value of the entropy parameter t, the distributions of male and female traits always have finite overlap, and consequently, there are no individuals with strictly heterosexual or homosexual orientation, θ = 0 or θ = 1. Consequently, the distribution P(θ) should be considered to be defined only over the interval [θ min , In this sense, the probability distribution P(θ) is properly normalized, because θ max θ min P(θ ) dθ = 1.

Discussion
In this paper, I have considered a simple mathematical toy model for the trade-off between sexual dichotomy of traits and trait diversity. Among the more interesting features of the model are the series of sharp transitions in the trait distribution as the parameter t is varied, and the 'Fermi function' shape of the distribution at small values of t. Of course, the model has employed a number of fairly artificial assumptions, most notably the assumption of a single relevant trait that is defined on the interval [0, 1]. Because this assumption is unlikely to be applicable to a real biological population, it may be difficult to find direct empirical comparisons with the trait distribution p(x).
On the other hand, the model also makes specific predictions about the distribution of sexual orientation, which can in principle be observed. For example, the model suggests that when the relative importance of trait diversity is high (or, equivalently, when the relative importance of producing a large number of offspring is low), the population can be divided into a small number of well-defined groups with similar sexual orientation. As the environment is changed in such a way that trait diversity becomes less important, these groups split into a larger number of groups through a sequence of sharp transitions. Finally, when the value of trait diversity is low, the distribution of sexual orientation becomes continuous and takes the form P(θ ) ∝ 1/θ .
In principle, some of these results can be tested empirically by measuring the frequency of same-sex versus opposite-sex mating or pairing for a large number of individuals across an animal population. (Of course, one should be cautious about conflating the observed frequency of same-sex behaviours with the internal preference of an individual for same-sex partners.) Unfortunately, I am unaware of any studies that present sufficient data to construct an empirical version of the distribution P(θ).
To date, the vast majority of quantitative research about same-sex sexual behaviour focuses on humans. Some studies, beginning with the Kinsey reports, [10,11], have made an effort to assess the relative abundance of different sexual orientations. One can ask, then, how the results from such studies compare with the derived results from the model of this paper.
Such a comparison should, of course, be considered to be extremely speculative in nature. It is unlikely that the diverse range of human sexual behaviours can be described using the simplistic toy model outlined in this paper. What is more, data on sexual orientation in humans usually divides individuals into discrete categories and relies on self-reporting of same-sex sexual behaviour or sexual attraction. All of this makes it difficult to say anything quantitative about the distribution P(θ).
With these caveats, one can nonetheless make a speculative comparison between the distribution P(θ) and interview/survey data about human sexual orientation. Such data often categorize individuals according to their position on the Kinsey scale, which describes sexual orientation on a seven-point scale [10]. If this seven-point scale is (dubiously) considered to correspond to evenly distributed intervals of the orientation θ in the range [0, 1], then one can compare it directly to the theoretical distribution P(θ) from the model. Such a comparison is presented in figure 4. Figure 4 suggests that a very approximate fit to equation (3.4) is possible. This fit gives T ≈ 0.09, which corresponds to an entropy parameter t ≈ 0.4. This relatively small value of t is within the regime where the theoretical optimum distribution p(t) is well approximated by the continuous function of equation (3.2). One notable failure of the model is that it is unable to capture the relatively large proportion of individuals at either extreme of the distribution, θ ≈ 0 and θ ≈ 1. These extremes correspond to individuals who identify as either 'completely heterosexual' or 'completely homosexual', and their abundance is apparently greater than can be explained by the simple model proposed here. It remains an interesting question whether such extremization of sexual orientation can arise from   [12], circles are survey responses among ages 18-24 in the UK in 2015 [13], upward-facing triangles correspond to males age 20-25 in the original Kinsey reports (published in 1948) [10] and downward-facing triangles are from females age 20-25 in the Kinsey reports (published in 1953) [11]. The star symbols, connected by a solid line, denote a simple average of the four datasets. The dashed line shows a fit to equation (3.4).
optimization of the population fitness, or whether its appearance in the data is better ascribed to other (perhaps psychological or sociological) factors.
Future and ongoing studies may allow us to adjudicate between different proposed mechanisms for the appearance of same-sex sexual behaviour in the animal kingdom. In particular, the mechanism proposed here can be refined or refuted by collecting data on the proportion θ of same-sex versus opposite-sex sexual encounters for many individuals across a large animal population, and then checking whether it obeys the characteristic 1/θ distribution (as at t 0.5) or whether it resembles a set of discrete delta functions (as at t 1).
Alternatively, one could look for correlations between the rate of same-sex sexual behaviour in an animal species and the diversity of expression of a particular trait. If any such evidence is absent, it would suggest that the origins of same-sex sexual behaviour cannot be described as a simple competition between increased trait diversity and increased sexual dichotomization of traits. It could also suggest that the traits desired by a particular individual (x c ) are not statistically independent of the traits possessed by the individual (x); such non-independence would fundamentally alter the tension between the two terms in equation (2.3). Either way, finding a clever way to measure and study the distribution of biological traits, p(x), or the distribution of sexual orientations, P(θ), may prove to be a powerful tool for unravelling the mystery of same-sex sexual behaviour.
Finally, it is worth emphasizing that many of the results presented here are specific to the quantitative form of the assumed fitness function, equation (2.3). For example, one might consider that the degree of trait diversity is better characterized using the variance σ 2 = 1 0 (x − x ) 2 p(x) dx, rather than using the entropy s. (Here, x = 1 0 xp(x) dx is the average value of the trait x.) Substituting σ 2 for s in the fitness function f would then give a different mathematical optimization problem, and thus a different trait distribution p(x) for each value of the parameter t. Indeed, for the specific choice f = n(1 + tσ 2 ), I find that the optimal distribution p(x) assumes only one of two extremes: at all t larger than some critical value, t c ≈ 25, the optimal distribution is p(x) = 1, which corresponds to the t → ∞ limit in figure 2; on the other hand, at all t < t c , the optimal distribution is 2Θ(x − 1/2), where Θ(x) is the Heaviside step function, which corresponds to the t → 0 limit in figure 2. Thus, replacing the distribution entropy with the variance in equation (2.3) gives a significantly different phenomenology-one where the population consists uniformly of either bisexual individuals (θ = 1/2) or heterosexual individuals θ = 1), depending on whether t is above or below a critical value.
Data accessibility. Numerical code corresponding to the analysis performed in §2 and appendix A is available as a part of this article's electronic supplementary material. For the results presented in this paper, the initial guess was p(x i ) ≡ 1. A Metropolis-type algorithm is then used to incrementally update the values of p(x i ) in such a way that the maximum of f is increasingly approached. Specifically, the algorithm consists of repeatedly choosing random pairs of points x i and x j , and then updating the values p(x i ) and p(x j ) such that p(x i ) → p(x i ) + δ and p(x j ) → p(x j ) − δ. The increment δ is chosen at random from a small interval; results presented here use δ ∈ (0, 0.01). After each update, the change δ f in the fitness is evaluated. If δ f is positive, then the update is kept. If δ f < 0, on the other hand, then the update is reverted with probability 1 − exp[βδ f ]. Here, β is an 'inverse temperature' parameter that determines the rate of convergence of the solution and the final numerical accuracy. Results presented in §3 use a process of successively increasing values of β, starting at β = 10 5 and gradually increasing to β = 10 11 . At each value of β, a large number, 10 4 M, of updates is attempted to ensure convergence of the solution. Care was taken to ensure that the solution converged to the same result for different random realizations of the numerical procedure.
Finally, one can note that equation (A 1) has no explicit dependence on the value of x, and therefore, the numerical procedure does not, in general, find a set of values {p(x i )} that is meaningfully ordered as a function of x i . One can therefore arrange the numerical values {p(x i )} in order of decreasing value, and the resulting solution produces the same value of the fitness f .