A theory of discrete hierarchies as optimal cost-adjusted productivity organisations

Hierarchical structures are ubiquitous in human and animal societies, but a fundamental understanding of their raison d’être has been lacking. Here, we present a general theory in which hierarchies are obtained as the optimal design that strikes a balance between the benefits of group productivity and the costs of communication for coordination. By maximising a generic representation of the output of a hierarchical organization with respect to its design, the optimal configuration of group sizes at different levels can be determined. With very few ingredients, a wide variety of hierarchically ordered complex organisational structures can be derived. Furthermore, our results rationalise the ubiquitous occurrence of triadic hierarchies, i.e., of the universal preferred scaling ratio between 3 and 4 found in many human and animal hierarchies, which should occur according to our theory when production is rather evenly contributed by all levels. We also provide a systematic approach for optimising team organisation, helping to address the question of the optimal ‘span of control’. The significantly larger number ∼ 3 − 20 of subordinates a supervisor typically manages is rationalised to occur in organisations where the production is essentially done at the bottom level and in which the higher levels are only present to optimise coordination and control.


Introduction
Throughout most of the 300'000-year record of Homo sapiens, humans have lived in small-scale, mostly egalitarian hunter-gatherer societies, comprising around 30-50 or, at most, a few hundred individuals [1][2][3]. Following the strong warming of Earth by 5 to 10˚C from about 15'000 years ago, which led to the end of the last ice age, settled communities emerged around 10'000 years ago, together with agriculture and animal domestication. These societies have mostly been hierarchically structured. Over the past millennia, even more complex, large-scale interconnected societies have evolved, shaped into cultural, economic, political and corporate hierarchies [3,4]. Explanations for the benefits of hierarchical organisation are manifold, such as advantages in warfare and multilevel selection [3,5], optimal search properties [6], robustness [7], effective use of resources [3] and so on. But a framework to quantitatively relate the specific hierarchical structures to the functions and constraints facing different types of society has been lacking.
Here, we determine the optimal social hierarchical configuration by maximising the output of an organization with respect to its design. Our framework accounts for the finite Dunbar's number as well as the universal preferred scaling ratio between 3 and 4 found in many human and animal hierarchies [8][9][10][11]. This model provides the first quantitative explanation for the ubiquitous occurrence of such triadic hierarchies, and furthermore provides a framework to answer questions regarding optimal team sizes in management tasks, helping to address the question of the optimal span of control [12][13][14].
2 Ingredients of a reduced-form theory of organisation structure

Production scaling and costs of coordination as a function of group size
We consider N individuals, who are working together to produce some output. This can be a directly measurable product or quantity, such as the revenue of a firm, or a more abstract quantity, e.g. overall group fitness. Quite generally, we may assume that the joint production P, resulting from the interaction of N individuals, scales as $P \sim N^{\beta}$ (β > 0). The most straightforward situation corresponds to β = 1, i.e. global output is proportional to population. However, for small groups, one could expect that "the whole is more than the sum of its parts", and indeed, it has been shown that the aggregate output in open-source software projects scales super-linearly with the number of developers (β > 1), at least for group sizes N less than 30 to 50 persons [15]. Intuitively, this means that a group of individuals together can produce more than the sum of their individual production in the absence of interaction. Generally, specialisation and complementary skills motivate cooperation between individuals to achieve results that would otherwise be impossible. Increased productivity can result from information sharing [16] as well as group heterogeneity [15], among others.
But as the group size increases, this super-linear production may tip over to just linear (β = 1) or even sub-linear growth (β < 1) [17,18], because the human brain can only cope with a limited number of social interactions [19] and too many communication channels would overload the attention span, leading to collapsing performance. Generally, the overhead associated with communication, coordination and management of a group of collaborators of size N tends to decrease the performance per individual, a well-known characteristic of large organisations. As a first step, this cost can be represented as being proportional to the number $N(N-1)/2 \sim N^2$ of pair-wise interactions between the N individuals in the group.

Optimal group size and Dunbar's number
Starting from the production scaling law $P \sim N^{\beta}$, supposed to hold for small teams, and adding communication costs approximately proportional to $N(N-1)/2 \sim N^2$, we obtain

$$P(N) = \mu N^{\beta} - \lambda N^{2}, \tag{1}$$

where μ and λ are two positive constants, which we refer to as the productivity factor and the coordination cost factor, respectively. Production now exhibits a maximum at

$$N^{*} = \left(\frac{\beta \mu}{2 \lambda}\right)^{1/(2-\beta)} \tag{2}$$

individuals (as long as β < 2, which is a realistic assumption [15]). Thus, rather than a production scaling with the group size N, expression (1) predicts that, due to the cost of communication and coordination, groups larger than $N^{*}$ persons produce less than smaller groups of size $N^{*}$. Large-scale societies would then collapse into independent groups of size $N^{*}$. While this is obviously counterfactual when interpreted for production, this prediction provides a rationale for Dunbar's number [20], which is the maximum number of people with whom one can and does maintain stable social relationships. Dunbar's number is typically between 100 and 250, with a commonly used typical value of 150. This finite number has been suggested to result from cognitive constraints on group size that depend on the volume of neural material available for processing and synthesising information on social relationships. This 'social brain hypothesis' describes the coevolution of neocortical brain size and social group sizes. In this context, the first term $\sim N^{\beta}$ captures the need for humans to cooperate and to socialise. The second term $\sim N^{2}$ embodies the costs of enforcing the restrictive rules and norms to maintain a stable, cohesive group. Using β ≈ 1.5 [15] and $N^{*} \approx 150$ in (2) yields μ/λ ≈ 16 for humans. Within this simple framework, the smaller group sizes of monkeys and other non-human primates may be interpreted as due to a smaller productivity factor and/or a larger coordination cost factor.
Evolutionary improvement of the productivity factor by a factor of two predicts a four-fold increase of the optimal group size (for a fixed β = 1.5), possibly explaining how moderate cognitive increase may be associated with much larger group sizes. Technology, in the form of digital networking and artificial intelligence for instance, might promote an increase in the productivity factor, which could then be associated with larger social group sizes in futuristic human-digital symbiotic societies.
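The two numerical statements of this subsection (μ/λ ≈ 16 for a Dunbar number of 150, and the four-fold increase in group size when the productivity factor doubles) can be checked with a few lines of Python. This is only a minimal sketch: it assumes that the single-group production has the form P(N) = μN^β − λN², i.e. a productivity term minus a quadratic coordination cost as described above, so that the maximum lies at N* = (βμ/2λ)^(1/(2−β)). The function names are illustrative and do not come from the original work.

```python
def optimal_group_size(mu_over_lambda: float, beta: float) -> float:
    """Group size maximising P(N) = mu*N**beta - lam*N**2 (needs 0 < beta < 2)."""
    if not 0 < beta < 2:
        raise ValueError("a finite maximum requires 0 < beta < 2")
    return (beta * mu_over_lambda / 2.0) ** (1.0 / (2.0 - beta))


def ratio_from_group_size(n_star: float, beta: float) -> float:
    """Invert the relation above: the ratio mu/lambda implied by an optimal size."""
    return 2.0 * n_star ** (2.0 - beta) / beta


if __name__ == "__main__":
    ratio = ratio_from_group_size(150, beta=1.5)
    print(f"mu/lambda implied by N* = 150: {ratio:.1f}")
    print(f"N* after doubling mu: {optimal_group_size(2 * ratio, 1.5):.0f}")
```

Under these assumptions the implied ratio is about 16.3, and doubling it moves the optimal size from 150 to 600, because N* scales as (μ/λ)² when β = 1.5.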

Evidence and needs for sub-group formation
Returning to the description of large firms and countries, their overall outputs typically increase approximately in proportion to the number of employees or citizens (which makes it possible to define, for instance, such an important economic metric as GDP per capita). What is missing in the naive model (1) is that, in a group of N individuals, not everybody interacts directly with everybody else. Instead, sub-groups form, with closely knit individuals within a given sub-group interacting with other sub-groups via their representatives. A vivid illustration is provided by the organisation of combatants in an army, where soldiers at the bottom level form squads of about 10 headed by a corporal, then 3-4 squads form a platoon, 3 platoons combine into a company and so on. Such an organisation ensures an efficient transmission of information top-down and bottom-up for optimal battlefield performance. Such a tendency to arrange into hierarchically structured groups has been reported widely, as previously mentioned [3,8,21].
To develop an intuition for how this can come about, let us consider again N agents who need to communicate and coordinate. Under a flat organisation in which everyone interacts with everyone, the total coordination cost would be $C \sim N^2$. But dividing the population into $N_1$ groups of $N_0$ individuals each ($N = N_0 \cdot N_1$), the total communication overhead C then scales as $N_0^2 \cdot N_1 + N_1^2$, where the first term accounts for the intra-communication cost of $N_1$ groups of size $N_0$, and the second term accounts for the inter-communication between the $N_1$ groups through a single channel (for instance one representative of each group). C is minimised for $N_0 \sim N^{1/3}$, $N_1 \sim N^{2/3}$, for which $C \sim N^{4/3}$. The introduction of an additional level above the individual one thus reduces the communication overhead very significantly, from $C \sim N^2$ to $C \sim N^{4/3}$. In the supplementary information (SI), building on [22], we show that the addition of more layers (groups of groups, and so on) asymptotically reduces the cost to $C \sim N$ (with some logarithmic correction terms). This reduced communication cost in hierarchical organisations helps understand how states and companies can function even when N is of the order of millions.
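The two-level argument can be reproduced numerically. The sketch below assumes the cost expression quoted in the text, C = N_0² · N_1 + N_1² with N_1 = N/N_0, and scans integer base-group sizes; it ignores the fact that N_1 should itself be an integer, and the function names are purely illustrative.

```python
def two_level_cost(n: float, n0: float) -> float:
    """Coordination cost for n agents split into n/n0 groups of size n0."""
    n1 = n / n0                      # number of base groups
    return n0 ** 2 * n1 + n1 ** 2    # intra-group cost + inter-group cost


def best_base_group_size(n: int) -> int:
    """Integer base-group size in [1, n] with the lowest two-level cost."""
    return min(range(1, n + 1), key=lambda n0: two_level_cost(n, n0))


if __name__ == "__main__":
    n = 10 ** 6
    n0 = best_base_group_size(n)
    print(f"best N0 = {n0}, analytic (2N)^(1/3) = {(2 * n) ** (1 / 3):.1f}")
    print(f"two-level cost / N^(4/3) = {two_level_cost(n, n0) / n ** (4 / 3):.2f}")
    print(f"flat cost / two-level cost = {n ** 2 / two_level_cost(n, n0):.0f}")
```

Setting the derivative of C with respect to N_0 to zero gives N_0 = (2N)^(1/3), which is the N^(1/3) scaling of the text up to a constant factor; the scan lands on the nearest integer.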
However, the argument that hierarchical structures are created just to solve the coordination problem [22] cannot be the whole story, because social agents come together in the first place to gain something, such as mutual protection, increased outputs, and so on. Here, we extend expression (1) to general hierarchical organisation structures and derive the optimal designs to maximise production.

General formulation
Following [22], we consider N individuals organised into p hierarchical levels. We denote by $N_0$ the number of individuals per group at the bottom of the hierarchical structure, i.e. the number of individuals per 'base group'. At the next higher order in the hierarchical chain, $q_1 = N_1/N_0$ base groups taken together form a supergroup of $N_1$ individuals. Iterating, we define $q_r = N_r/N_{r-1}$ as the hierarchical group ratio, i.e. the size of a group at level r relative to level r − 1, until we finally arrive at the highest level of the hierarchy. Fig 1 illustrates this construction. Note in particular that because $N_p = N$ is the number of individuals at the highest level of the hierarchy, we have a total of p + 1 hierarchical levels (we do not count the individual level). The special case of absence of hierarchies, i.e. p = 0, $q_0 = N_0 = N$, represents a system with only one level where everyone interacts with everyone. Identifying $N_{-1} \equiv 1$ allows us to treat this case consistently. Also, note that $q_r \geq 2$ (groups consist of at least two members). The maximum number of hierarchical levels is then

$$p_{\max} = \lfloor \log_2 N \rfloor - 1. \tag{3}$$

The hierarchical generalisation of expression (1) to p + 1 hierarchical levels amounts to summing over the productions $P_r$ of each level r as follows:

$$P = \sum_{r=0}^{p} P_r = \sum_{r=0}^{p} \left[ \mu_r q_r^{\beta} - \lambda_r q_r (q_r - 1) \right] \frac{N}{N_r}, \tag{4}$$

where the first factor under the sum in the r.h.s. of (4) denotes the production of a group at level r and the second factor represents the number of groups at that level.

Geometric hierarchies
The productivity factors $\mu_r$ and coordination cost factors $\lambda_r$ can depend on level r. It is natural to consider a geometric hierarchy defined with

$$\mu_r = \omega \kappa^{r}, \qquad \lambda_r = \rho^{r}, \tag{5}$$

for some positive numbers ω, κ, ρ. The geometric series for $\mu_r$ and $\lambda_r$ with constant scaling factors κ and ρ give a parsimonious dependence on the level r: for κ > 1 (resp. < 1), higher levels of the hierarchy are more (resp. less) productive; for ρ > 1 (resp. < 1), higher levels of the hierarchy require more (resp. less) effort for coordination. The special case κ = 1 (resp. ρ = 1) corresponds to the same productivity (resp. communication cost) at all levels. The additional coefficient ω in $\mu_r$ is the production of a single individual, which also sets the relative strength of productivity versus communication cost.
Putting (5) into (4), factoring out $\kappa^{r}$ and defining the relative cost-productivity scaling factor $\eta \equiv \rho/\kappa$ gives

$$P = \sum_{r=0}^{p} \kappa^{r} \left[ \omega q_r^{\beta} - \eta^{r} q_r (q_r - 1) \right] \frac{N}{N_r}, \tag{6}$$

which constitutes our main object of study. Given a population of N individuals, for a given set of parameters κ, ω, η and β, our goal is to determine the optimal hierarchical structure, characterised by its number of hierarchical levels $p^{*}$ and the associated group sizes $\{q_0, \ldots, q_{p^{*}}\}$, which maximise the production (6).
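To make the objective function concrete, the following Python sketch evaluates the total production of a geometric hierarchy for a given list of group ratios. It assumes that a group at level r produces κ^r [ω q_r^β − η^r q_r (q_r − 1)] and that there are N/N_r such groups, with N_r = q_0 · … · q_r; this reading is reconstructed from the definitions in the text, and the function name and signature are hypothetical.

```python
from math import prod


def production(q, beta, omega, kappa, eta):
    """Total output of a geometric hierarchy with group ratios q = [q_0, ..., q_p]."""
    n_total = prod(q)   # N = q_0 * q_1 * ... * q_p
    n_r = 1.0           # cumulative group size N_r, starting from N_{-1} = 1
    total = 0.0
    for r, q_r in enumerate(q):
        n_r *= q_r
        # Output of one level-r group: production minus coordination cost.
        group_output = kappa ** r * (omega * q_r ** beta - eta ** r * q_r * (q_r - 1))
        total += group_output * n_total / n_r   # N / N_r groups at level r
    return total


if __name__ == "__main__":
    params = dict(beta=1.5, omega=6, kappa=1, eta=2)
    print("isolated individual:", production([1], **params))
    print("flat group of 15, per capita:", production([15], **params) / 15)
    print("two teams of 7.5, per capita:", production([7.5, 2], **params) / 15)
```

With these illustrative parameters, an isolated individual produces ω = 6, and splitting 15 people into two teams yields a higher per-capita output (about 10.8) than a flat group of 15 (about 9.2).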

Military hierarchies
Instead of (5), it is also instructive to study the special case $\mu_r = \omega \delta_{r0}$ while keeping $\lambda_r = \rho^{r}$. This represents the situation where only the ground level produces actual output, whereas the higher-order levels act only as coordination nodes, for instance to allocate resources, manage and control. We shall refer to this as the military hierarchy, in reference to the fact that it is often the lowest military ranks (starting with "privates") who are exposed to active combat (new technology may be changing this), while the higher levels mostly exert "command and control". This case, with the approximation $q(q-1) \approx q^{2}$, allows for an analytical treatment given in the SI. As an illustration, with $N = 2^{12} = 4096$, β = 1.5, ρ = 0.5, ω = 6, the optimal production is P = 40'447 with $p^{*} = 4$ and the optimal structure is given by ($q_0 \approx 9.1$, $q_1 \approx 2.8$, $q_2 \approx 2.8$, $q_3 \approx 6.5$, $q_4 \approx 8.9$).
Varying ω, we find a qualitative difference between the hierarchical structures for large versus small ω, which illustrates the competition between having large groups at the bottom to enhance productivity and the cost of coordination: for small ω, the optimal structure is maximally fragmented, with a maximum number of levels and the smallest group sizes (all $q_r$ are equal to the minimum size 2); for larger ω, larger subgroups are favoured, especially at the bottom and top levels, with relatively fewer levels (see S1 Fig).
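The numerical illustration quoted for the military hierarchy can be cross-checked with a short script. The sketch assumes that only level 0 produces output ω q_0^β per group, that every level r pays a coordination cost ρ^r q_r (q_r − 1) per group, and that there are N/N_r groups at level r; the function name is hypothetical, and the group sizes are the rounded values quoted in the text.

```python
def military_production(q, n_total, beta, omega, rho):
    """Output when only the bottom level produces and all levels pay coordination costs."""
    n_r = 1.0
    total = 0.0
    for r, q_r in enumerate(q):
        n_r *= q_r                                   # cumulative group size N_r
        output = omega * q_r ** beta if r == 0 else 0.0
        cost = rho ** r * q_r * (q_r - 1)
        total += (output - cost) * n_total / n_r     # N / N_r groups at level r
    return total


quoted_structure = [9.1, 2.8, 2.8, 6.5, 8.9]          # rounded q_0 ... q_4 from the text
p_total = military_production(quoted_structure, 4096, beta=1.5, omega=6, rho=0.5)
print(f"P = {p_total:.0f}")
```

Because the rounded group sizes multiply to roughly 4127 rather than exactly 4096, the result agrees with the quoted optimum of 40'447 only approximately, to well within one percent.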

Description of optimal structures
We now analyse the configurations of group sizes $(q_0, \ldots, q_{p^{*}})$ that maximise (6). For a hierarchy with some fixed p, we determine the configuration $\{q_0, \ldots, q_p\}$ iteratively, by solving the equation $\partial P / \partial N_r = 0$ and verifying that the solution indeed corresponds to a maximum. Details are found in the SI. There, we also double-check that the optimal solution is not obtained by splitting the N individuals into isolated sub-structures.
The optimal group sizes are found to obey a recursive relation, Eq (7), in the presence of the constraints $N = \prod_{r=0}^{p} q_r$ and $q_r \in (2, N/2^{p})$. If solutions of (7) violate these constraints, one has to consider solutions on the boundaries. We thus apply a sequential numerical optimisation. First, for each $p \in \{0, 1, \ldots, p_{\max}\}$, we obtain numerically the configuration $\{q_0, \ldots, q_p\}$ that maximises (6) (see SI for details), thus getting the total production P(p) as a function of p. Different examples of this p-dependence are depicted in Fig 2. Then, $p^{*}$ is determined as the value that maximises the total production P(p) given by (6). Fig 2 presents the results of the search for the optimal hierarchical structures for four different sets of parameters.
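The two-stage search described above (optimise the group sizes for each fixed p, then pick the best p) can be sketched in a few lines. This is not the numerical procedure of the SI: a coarse grid search stands in for the actual optimiser, the production function is the reading of Eq (6) with per-group output κ^r [ω q_r^β − η^r q_r (q_r − 1)] and N/N_r groups per level, and all names are illustrative. The grid is only practical for small N.

```python
from itertools import product as cartesian
from math import prod


def production(q, beta, omega, kappa, eta):
    """Total output for group ratios q = [q_0, ..., q_p] (assumed form of Eq (6))."""
    n_total, n_r, total = prod(q), 1.0, 0.0
    for r, q_r in enumerate(q):
        n_r *= q_r
        total += kappa**r * (omega * q_r**beta - eta**r * q_r * (q_r - 1)) * n_total / n_r
    return total


def best_structure(n, p, params, step=0.25):
    """Grid search over q_0..q_{p-1}; q_p is fixed by the constraint prod(q) = n."""
    upper = n / 2**p                       # largest ratio compatible with all q_r >= 2
    grid = [2 + i * step for i in range(int((upper - 2) / step) + 1)]
    best = None
    for head in cartesian(grid, repeat=p):
        q_last = n / prod(head)
        if q_last < 2:                     # violates q_r >= 2
            continue
        q = list(head) + [q_last]
        value = production(q, **params)
        if best is None or value > best[0]:
            best = (value, q)
    return best


def optimal_hierarchy(n, params, step=0.25):
    """Scan p = 0..p_max and return (p, production, q) with the highest production."""
    p_max = n.bit_length() - 2             # floor(log2 n) - 1 for integer n >= 2
    candidates = [(p, *best_structure(n, p, params, step)) for p in range(p_max + 1)]
    return max(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    p_star, p_value, q_star = optimal_hierarchy(64, dict(beta=1.5, omega=6, kappa=1, eta=2))
    print(p_star, round(p_value, 1), [round(x, 2) for x in q_star])
```

A gradient-based or constrained optimiser would be the natural replacement for the grid once N and p grow; the grid is used here only because it makes the constraint handling explicit.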

Fig 2. Production P(p) as a function of p (number of hierarchical levels minus one) for four different sets of parameters for a total population of $2^{12} = 4096$ collaborators.
For each of the four parameter sets indicated in the four legends along the vertical axis, we obtain the optimal group sizes $\{q_0, \ldots, q_p\}$, and calculate the corresponding total production P(p) from expression (6). The function P(p) exhibits a maximum at some $p = p^{*}$, indicated with an open circle. For each of the four optima, the corresponding optimal group sizes $\{q_0, \ldots, q_{p^{*}}\}$ are given in the form of a stack of rectangles put on top of each other. The four different sets of parameters span different regimes and thus hierarchical designs. Non-integer values of $q_r$ should be interpreted as a combination of groups with integer numbers of collaborators, with numbers within one unit of the quoted $q_r$ and such that their average value is as close as possible to $q_r$. For instance, $q_r$ = 3.7 or 3.8 should be interpreted as corresponding to three groups of 4 and one group of 3. See main text for a detailed description of the four different cases.
https://doi.org/10.1371/journal.pone.0214911.g002
(a) Small group sizes and many hierarchical levels: An end-member class of solutions consists in having the smallest groups possible, structured over as many levels as possible. This corresponds to the hierarchical structure at the boundary of the constraints $q_r \in (2, N/2^{p})$, namely $q_r = 2$ at all levels r and thus $N = 2^{p+1}$. This occurs approximately (but not necessarily precisely) when the output pre-factor ω is large compared to the relative cost-productivity scaling factor η and groups at higher levels are equally or more efficient than lower levels (κ ≥ 1). This generalises the results found for the military hierarchies, whose structures are more simply controlled by the production ω of the bottom level with an inverse dependence as a function of ω, illustrating that hierarchical structures result from a subtle competition between the different ingredients ω, κ, η.
(b) Trade-off solutions with non-trivial group sizes at different levels: When the output factor $\omega \kappa^{r}$ and cost factor $\rho^{r} := (\kappa \eta)^{r}$ are more balanced over multiple levels of the hierarchy, solutions expressing a trade-off between output and cost are characterised by non-trivial optimal group sizes at different levels of the hierarchy. This is illustrated in Fig 2 by the orange line with filled triangles and the stack of orange rectangles giving $p^{*} = 5$ and the corresponding optimal group sizes.
(c) Decreasing production with hierarchical level: For productions that decay with level order (κ < 1), a small number of hierarchical levels is preferred ($p^{*} = 3$), which can, for instance, be combined with group sizes that increase at higher orders in the hierarchy (red line with filled squares in Fig 2). Small optimal values for $p^{*}$ are also found analytically for the "military hierarchy", which is an extreme case of decrease of production with level order (see SI).

Dependence of hierarchical structure properties as a function of population size
For given productivity characteristics {β, ω, κ} and coordination cost properties {η} (or ρ) corresponding to case (b) above with $p^{*} = 5$ in Fig 2, Fig 3 shows the dependence of three main features of the optimal hierarchical structure as a function of the population size N. The optimal group ratios $q_r$, shown in stacked bands of alternating colours of dark grey and pink, exhibit several interesting features. First, as N increases, the optimal number $p^{*}$ of levels exhibits a series of transitions, from $p^{*} = 0$ to $p^{*} = 1$ at N = 3, from $p^{*} = 1$ to $p^{*} = 2$ at N = 4, from $p^{*} = 4$ to $p^{*} = 5$ at around N = 2020, and so forth. Note that the range of population sizes for a given $p^{*}$ does not follow a simple geometrical series, which would be revealed by an approximately equal spacing in the logarithmic representation of the x-axis in Fig 3. In particular, the optimal value $p^{*} = 4$ is found over a very large interval $2^{5} - 1 < N \leq 2^{11}$. Nonetheless, $p^{*}$ can be shown to grow asymptotically on average proportionally to ln N (see SI).
Each of the transitions in $p^{*}$ is mirrored by a break or spike in the dependence of the productivity per individual, $\pi \equiv P/N$, as a function of N. The first regime with $p^{*} = 0$ corresponds to a super-linear growth of production, until it saturates with the emergence of the second hierarchical level, which is needed to tame the growing cost of communications. In particular, π has its absolute maximum at N ≈ 15, suggesting an optimal size of 15 for an independent organisation, which should be organised into $q_1 = 2$ teams of $q_0 = 7$ members. We stress that these numbers are the optimal ones for the specific parameters β = 1.5, κ = 1, ω = 6 and η = 2. Other parameters would lead to different optimal hierarchies. Last, the productivity π can be seen to converge to $\pi_{\infty} \approx 10$ for large N, corresponding to an asymptotically linear increase of the total production P as a function of organisation size N. As $\pi_{\infty} \approx 10 > \omega = 6$, the production per capita in the optimal hierarchical organisation is approximately 67% larger than that of isolated individuals, giving a significant gain. This asymptotic productivity per individual is however about 33% smaller than that of the optimal population size N ≈ 15, exemplifying the relative disadvantage of growing organisations even with their optimal hierarchical structure. In a flat organisation, the quadratic cost would always end up dominating the total production and lead to a collapse of the organisation. Only a hierarchical structure can relieve the excruciating cost of coordination and harvest the superlinear productivity (β > 1). The overall lesson is that knowledge of production and cost properties should provide guidance to shape the organisation structure for better productivity and performance. This has implications not only for growing organisations, which should develop additional levels of hierarchy in stages as illustrated in Fig 3, but also for mergers and acquisitions.

When is the whole more than the sum of the parts?
As mentioned above and shown more systematically in the SI, the total production $P^{*}(N) = \pi(N) N$ of the optimal hierarchical organisation, for any fixed set of productivity and cost parameters such that π(N) > 0, scales asymptotically linearly with the number of individuals N. In other words, the productivity or production per capita π(N) converges to a constant $\pi_{\infty}$ for large N, which is a function of β, ω, κ and η. On the other hand, for N non-interacting individuals (i.e. for N "structures" of $q_0 = 1$ individual each), Eq (6) reduces to a total production $P_I(N) = \omega N$, i.e., N times the production ω of a typical individual. In competitive, free markets, it will be rational for people to come together and cooperate only if their per capita production turns out to be larger than their individual ones. Fig 4 delineates the domain in the (κ, η)-plane, for different sets of fixed values of (β, ω). The domain can be split into two regimes, one for which $\pi_{\infty} \leq \omega$ (regime where individuals are better off producing on their own, called "autonomy"), and the complementary domain for which $\pi_{\infty} > \omega$ (regime where individuals are better off forming a group, called "hierarchy"). The curve separating the two domains moves to larger values of η as ω increases. Intuitively, the larger the individual productivity ω, the larger can be the relative cost-productivity scaling factor η while still ensuring that a hierarchical society emerges.
Fig 3. For β = 1.5, κ = 1, ω = 6 and η = 2, we show the dependence as a function of the total population number N (in logarithmic scale expressed in powers of 2) of three variables characterising the optimal hierarchical organisation determined and described in subsection 4.1: (i) total production P (blue dashed line); (ii) productivity per individual, $\pi \equiv P/N$ (dotted orange line); (iii) optimal group ratios $q_r$ shown in stacked bands of alternating colours of dark grey and pink. The tick marks on the right y-axis show the sizes $q_r$ of the groups for $N = 2^{14}$, for which the optimal structure is given by $p^{*} = 5$ with $q_0 \approx 7$, $q_1 \approx 4$, $q_2 \approx 3$, $q_3 \approx 4$, $q_4 \approx 3$ and $q_5 \approx 3$.
https://doi.org/10.1371/journal.pone.0214911.g003
The regime $\pi_{\infty} < \omega$ represents organisations whose goals are not necessarily to improve productivity but to be stronger than other polities as a whole. Indeed, for many societies engaged in military competition for instance, what matters is the total military power relative to their rivals, not per-soldier efficiency. Our theory of optimal hierarchical organisations applies there as well, as we obtain non-trivial hierarchical organisations even for cases where β ≤ 1. These solutions are mostly dominated by a minimisation of the communication overhead. Elaborations of this regime will be reported elsewhere.

Triadic hierarchies & optimal span of control
In section I-2.2, we suggested a derivation of Dunbar's number ∼ 100 − 250, describing the maximum number of people with whom one can develop stable social relationships. But Dunbar's number is actually just a part of the full story. In 2005, Zhou et al. [8] discovered the general existence of a discrete hierarchy of group sizes with a preferred scaling ratio close to three: humans spontaneously form groups of preferred sizes organised in a geometrical series approximating 3 − 5, 9 − 15, 30 − 45, 90 − 140, 250 − 400 and so on. This finding has been corroborated in many different contexts [9,21,23,24] as well as for various groups of animals [25]. These works quantify the qualitative anthropological studies showing that societies, from primates [23] to humans [26], tend to arrange into discrete hierarchical structures, with group size ratios between hierarchical levels that typically range from 2 to 4 [8]. Within our framework embodied in Eq (4), this observation finds a natural explanation, as we will now show.
Fig 4. For fixed values of (β, ω) given in the inset, the domain below each curve is such that the production per capita, $\pi_{\infty}$, in the optimal hierarchical structure is larger than the production ω of an isolated individual. The computation of the production per capita has been performed numerically for $N = 2^{14}$, which is large enough that the productivity per individual has approximately converged to its asymptotic value $\pi_{\infty}$. The background colouring shows $\pi_{\infty}$ for ω = 18, β = 1. The change of regime (white background) is where $\pi_{\infty} = \omega$, i.e. exactly where the hierarchical output P is equal to the output of N isolated individuals, i.e. where P(N) = ωN. For parameters (κ, η) above this line, it is more productive if the N individuals work on their own. Below this line, hierarchical organisation is preferred.
https://doi.org/10.1371/journal.pone.0214911.g004
As long as the coefficients in the sets $\{\mu_r\}$ and $\{\lambda_r\}$ ensure that a group (which has to be hierarchical) is more productive than N isolated individuals, the optimal number of hierarchical levels $p^{*}$ scales as $p^{*} \sim \log_2 N - 1 = n - 1$, where we define $N = 2^{n}$ for convenience (cf. SI). Since the maximum number of hierarchical levels is given by $p_{\max} = n - 1$, which occurs when all scaling ratios are equal to the minimum $q_r = 2$, one can deduce that $p^{*} = \alpha \cdot (n - 1)$ for some $\alpha \in (0, 1)$. So for example, in Fig 3, we see that the asymptotic regime (π is constant) starts roughly around $N = 2^{5}$, at the beginning of the $p^{*} = 4$ layer. The $p^{*} = 5$ layer then only occurs at $N = 2^{11}$, such that we estimate $\alpha \approx (5 - 4)/((11 - 1) - (5 - 1)) \approx 0.17$. A more robust way to estimate α is outlined in the SI, and a systematic classification of α as a function of different parameter configurations, α = α(β, η, ω, κ), is depicted in Fig 5(a), showing that α ranges from its minimal value close to zero all the way up to its maximum at one.
Assuming for simplicity that $q_r \approx q_s \ \forall r, s$, it follows that $2^{n} = N = \prod_{r=0}^{p^{*}} q_r \approx q^{p^{*}+1} \approx q^{\alpha n + 1}$ and hence

$$q = 2^{\,n/(\alpha n + 1)}. \tag{8}$$

We can thus map the coefficient α to an optimal scaling ratio q through (8). The optimal scaling ratio q is depicted in Fig 5(b) for different sets of parameters, showing that q ∼ 2 − 4 holds over a wide range of parameters. This is further exemplified in Fig 5(c), which plots the functional relation (8) for a large range of values of both α and n.
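The mapping from α to a uniform scaling ratio is easy to explore numerically. The sketch below assumes the relation 2^n = q^(αn + 1) stated above, solved for q; the function name is illustrative.

```python
def scaling_ratio(alpha: float, n: float) -> float:
    """Uniform group ratio q solving 2**n = q**(alpha*n + 1)."""
    return 2.0 ** (n / (alpha * n + 1.0))


if __name__ == "__main__":
    for alpha in (0.25, 0.5, 0.75, 1.0):
        row = ", ".join(f"n={n}: {scaling_ratio(alpha, n):.2f}" for n in (8, 14, 20))
        print(f"alpha={alpha}: {row}")
```

For large n the ratio tends to 2^(1/α), so under this relation the range q ∼ 2 − 4 corresponds to α between roughly 0.5 and 1, while smaller α gives much larger group ratios.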
However, there are other regimes where q deviates significantly from the range 2 − 4, and depends on the level within the hierarchy. We propose that this range of parameters and corresponding regimes explain the findings in Business Management on the span of control [27][28][29], which is concerned with the number of subordinates a supervisor can or should manage. In many Fortune 500 organisations, the so-called "hourglass" organisation is observed, characterised by the vice-presidents at the top presiding over 8 to 9 senior directors, each senior director controlling 6 to 8 directors, each director supervising 3 to 6 lead managers, each lead manager directing 4 to 6 managers, each manager overseeing 5 to 7 supervisors, and each supervisor leading 8 to 14 employees [30]. Such a structure is strongly reminiscent of the optimal hierarchy shown in S1 Fig for the "military" organisation with β = 1.5, ρ = 0.5 for large production per individual (ω = 6 or 10). We thus find that organisations where the production is essentially done at the bottom level, and for which the higher levels are only present to optimise coordination and control, are characterised by strongly non-universal scaling ratios that are level-dependent, with span of control ranging from 3 to 20. In contrast, as shown above, when the production is more evenly contributed by all levels, a quasi-universal scaling ratio in the range 2 − 4 ensures the optimal functioning of the society.
In conclusion, we have shown that, with very few ingredients captured in Eq (6), a wide variety of hierarchically ordered complex organisational structures can be derived. Future works will include pure integer optimisation, whereby we keep the group sizes equal to integer values, while simultaneously allowing for different group sizes on the same hierarchical level. Our fractional group ratios q r can then be seen as averages over these group sizes. Such an integer optimisation allows for direct comparison with actual organizational structures. Other extensions include allowing for heterogeneity among individuals in the productive ability, complementarity and different communication and coordination cost functions.