Modeling natural resources exploitation in low-information environments

The sustainable exploitation of natural resources constitutes a real-world problem of interest for many fields. In this work, we study those situations in which the exploiting agents have information about the state of the resource and their own benefits and costs but not about the behavior or performance of the rest of the agents. Cognitive Hierarchy Theory provides a framework for those low-information scenarios by focusing on the assumptions that agents make about other individuals’ behavior. Motivated by this theory, we introduce a theoretical agent-based model in which agents exhibit varying degrees of rationalization when exploiting the resource, and this resource’s evolution is driven by a differential equation that mirrors the dynamics of real-world resource growth. Our results show that, although most regimes imply depletion, higher benefits and sustainability are obtained when agents assume overexploitation by the rest and try to compensate for it. Furthermore, many exploiting agents and a long-term perspective also involve a better resource state, reaching the optimal exploitation level when all these factors come together.


Introduction
The trade-off between profit and sustainability in natural resource exploitation is a ubiquitous challenge across disciplines. Furthermore, world population growth increases demand, leading, together with industrialization and technological development, to an increase in resource consumption [1]. In addition, access to the resource and extraction are usually hard to control, which often leaves the sustainability of the process in the hands of the exploiting agents [2,3]. Typically, exploiters have to decide between overusing the resource in search of short-term benefits and sustainable exploitation, which may provide even higher benefits in the long term. This problem can be successfully solved [4,5], but in any case, reaching the optimal solution is not trivial [6]. There are many examples in which overexploitation leads to an ecosystem shift [7], biotope reduction [8], or even collapse [9]. Indeed, about a fourth of fisheries collapsed between 1950 and 2000 [10]. All these cases constitute examples of poor safekeeping of nature's resources, described as the tragedy of the commons [11,12].
Social dilemmas in which agents have to decide between selfish and cooperative behavior, favoring either their own benefit or the sustainability of the common good, are usually studied through evolutionary game theory [13,14]. Within this framework, the decision-making process affecting common goods [2] is often tackled through public goods games [15][16][17][18], where it is shown that some determinants, such as kinship, may promote cooperation in a public goods problem [19]. More specifically, regarding the exploitation of common-pool resources, a substantial body of experimental studies, case analyses, common-pool resource games, and agent-based models is devoted to understanding them [20][21][22][23][24][25]; these works usually weigh individual benefit against collective benefit and, more broadly, selfish against cooperative behavior. Importantly, information exchange between participants plays a key and positive role in common-pool resource management [21,26,27]. In this context, allowing communication emerges as a pivotal factor, highlighting the profound influence of social interaction on behavior. Additionally, interesting evolutionary outcomes emerge in feedback-evolving games [28] when incorporating imitation and aspiration dynamics [29].
Although the available information is essential when the exploiting agents make decisions to maximize their benefits, affecting both the outcome and resource sustainability, there are many situations in which the exploiters have no information about the behavior of the rest of the agents. Even assuming that agents act strategically, rationality in these situations is limited by practical elements such as the available information and cognitive and time limitations, resulting in bounded rationality. As the agents can make wrong assumptions about the behavior of others, bounded rationality does not necessarily involve outcome maximization. Cognitive hierarchy theories address this problem by classifying the players according to their degree of reasoning when making assumptions about other agents' behavior [30][31][32][33][34][35]. Within these theories, the Level-k Framework assumes a distribution of cognitive levels, i.e. the number of reasoning steps the agents perform. While zero-step (level-0) agents make decisions randomly, higher-level (strategic) players assume that the rest perform fewer reasoning steps than they do. Camerer et al. found that a Poisson distribution with an average of 1.5 reasoning steps fits the experimental data in many different situations [36].
In this work, we address the problem of resource exploitation in a low-information scenario, specifically when the exploiting agents have information about the resource but not about the other agents' strategies or performance, and under the assumption that agents do not communicate (note that, with communication, additional cognitive factors, ranging from the Big Five personality traits to Theory of Mind, become relevant). Inspired by the Level-k Framework, we propose a model where a population of agents with different cognitive levels exploits a resource capable of regenerating in the absence of overexploitation. Taking the Gordon-Schaefer model as a starting point [37][38][39][40], the resource (a fishing reservoir, although it could be of any other nature) evolves according to a differential equation that accounts for intrinsic population dynamics and human exploitation. In the proposed model, the agents try to obtain the maximum benefit in a given expectancy period (i.e. in the short, medium, or long term). With this scheme, we study how the different parameters impact both sustainability and the benefits obtained by the agents. We find that high cognitive levels do not imply high benefits. On the contrary, whereas a population of level-2 agents may cause depletion, the highest profits and the best resource state occur for a large number of level-1 agents, where each agent tries to compensate for the rest under the assumption that those other exploiters do not worry about the resource state. Furthermore, the expectancy period also correlates positively with the benefits and sustainability. Finally, it is shown that, for a large enough number of level-1 agents, they tend to reach the optimal exploitation level as they increase their expectancy period towards long-term profitability.

The model
In the proposed model, we consider a big lake (generally, a water body) and a set of N agents exploiting it. These agents mimic either individual fishers or fishing companies which, in both cases, attempt to maximize their own benefit in a given profitability expectancy period. To that end, the agents try to anticipate the resource evolution to choose their actions. In this context, an agent's action is given by its fishing effort, i.e. the time spent searching for fish [41].
Each agent i (i = 1, 2, ..., N) is characterized by its cognitive level l_i (l_i = 0, 1, 2, ...) and the assumptions g_i it makes about other agents' cognitive levels, i.e. its assumed distribution for the rest of the agents' levels [36]. Level-0 agents (l_i = 0) choose their actions randomly, which means that a level-0 agent i draws an effort E_i uniformly from the interval [0, 1]; these agents may be considered non-rational. Level-1 agents (l_i = 1) assume that the rest of the agents will act non-strategically (i.e. as level-0 agents). In the same way, a level-h agent (h > 1) assumes a population of lower-level agents, i.e. each level-h agent assumes that the rest of the agents have a cognitive level lower than h.
Then, at each round, relying on its assumption about the other agents' levels and on the resource state, each rational agent (l_i > 0) chooses the action that would provide the highest payoff accumulated over a period τ, where τ represents the profitability expectancy period (i.e. short-, medium- or long-term exploitation). Therefore, each rational agent computes the resource consumption by the other agents and then evaluates the effort E_i that will maximize the benefit over τ. Specifically, each round represents a year, E_i ∈ [0, 1] the fishing hours over the total hours in a year, and τ the number of years the agents consider when maximizing their benefit. Note that this approach involves a recursive function for the decision-making process whose complexity increases exponentially with the other agents' assumed levels (and therefore with the agent's own level, as it imposes an upper bound for the assumed levels).
The amount of natural resource (here, fishing harvest) that agent i obtains per time unit is given by:
$$h_i = \frac{E_i R}{z U} = \gamma E_i \frac{R}{U}, \qquad (1)$$
where E_i represents the effort by agent i, i.e. the time (days per year) that a given agent devotes to fishing, R represents the number of elemental units of the resource (here, the total number of fish in the lake), U the carrying capacity of the system (lake), and z the time needed to catch a fish provided R = U. Therefore, with z = 1/γ, γ represents the harvest per year, assuming a resource at its carrying capacity and a maximum effort. Note that the harvest for a given effort decreases as the resource abundance does. Additionally, it is important to emphasize that the model assumes a constant carrying capacity, although in real-world scenarios the carrying capacity may vary in response to environmental changes.
In the absence of fishing, for simplicity, we consider the logistic equation for the resource evolution, which is the solution of the first-order non-linear ordinary differential equation:
$$\frac{dR}{dt} = \alpha R \left(1 - \frac{R}{U}\right), \qquad (2)$$
where α is the growth rate, i.e. the number of female descendants per female and year. Therefore, regrowth takes place at rate α; per-capita regrowth is fastest when the population in the lake, R, is small relative to the carrying capacity, U, zero when the population is at the carrying capacity, and negative in the case of overpopulation R > U [42]. Taking into account the fishing, the resource evolves according to:
$$\frac{dR}{dt} = \alpha R \left(1 - \frac{R}{U}\right) - \sum_{i=1}^{N} \gamma E_i \frac{R}{U}. \qquad (3)$$
The individual benefit, in monetary terms, per time unit (year) is given by:
$$\pi_i = p\,\gamma E_i \frac{R}{U} - c\,E_i, \qquad (4)$$
where p is the unit fish price, and c is the cost of the effort involved in obtaining the resource [25].
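The dynamics described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes the Gordon-Schaefer form of the harvest, γ E_i R/U, a benefit equal to the revenue from the catch minus a cost proportional to effort, and a simple forward-Euler integrator. The function names and the step size `dt` are hypothetical choices made for this example.

```python
def harvest(effort, R, U, gamma):
    """Yearly catch of one agent, assumed to be gamma * E * R / U."""
    return gamma * effort * R / U


def step_resource(R, efforts, U, gamma, alpha, dt=0.01, years=1.0):
    """Advance the resource by `years` using forward-Euler steps of size dt.

    Logistic regrowth minus the total catch of all agents; the resource
    is clipped at zero so it can never become negative.
    """
    n_steps = int(round(years / dt))
    for _ in range(n_steps):
        growth = alpha * R * (1.0 - R / U)
        catch = sum(harvest(e, R, U, gamma) for e in efforts)
        R = max(R + dt * (growth - catch), 0.0)
    return R


def benefit(effort, R, U, gamma, p, c):
    """Yearly benefit: revenue from the catch minus the cost of the effort."""
    return p * harvest(effort, R, U, gamma) - c * effort


# Illustrative values only: N = 5 agents, U = 100, gamma = U / (4 N) = 5.
R_untouched = step_resource(100.0, [0.0] * 5, U=100.0, gamma=5.0, alpha=0.15)
R_exploited = step_resource(100.0, [1.0] * 5, U=100.0, gamma=5.0, alpha=0.15)
```

With no fishing, a resource at its carrying capacity stays there, whereas one year of maximum effort by all agents lowers it. A higher-order integrator could replace the Euler loop without changing the structure of the sketch.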
At each time step, agents simultaneously make their decisions. That is, each agent i decides the effort E_i it invests in fishing (in days per year). To make that decision, it does not have any information about the past or present decisions (efforts) of the rest of the agents, nor about their catches and benefits. Therefore, the only information available to a given agent is the resource state and its own benefits, catch, and efforts. Specifically, each agent i makes assumptions about the rest of the agents' cognitive levels; these assumptions imply a distribution of the harvesting efforts for the rest of the agents. Using that distribution of efforts, agent i evaluates the level of effort E_i that would yield its maximum benefit over τ years. This process involves forecasting the outcomes of various tentative efforts and selecting the effort that promises the highest benefit over the designated timeframe τ. Note that the only goal of the agents is to maximize their benefits.
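The decision process just described can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it assumes that the harvest is γ E R/U and the benefit is p γ E R/U − c E, that time advances in one-year steps, that efforts stay constant over the forecast horizon, and that a level-0 agent is replaced by its expected effort 0.5 instead of a random draw. These simplifications collapse the assumed distribution of efforts to a single value, so the exponential cost of the full recursion is avoided. All function and parameter names are hypothetical.

```python
def cumulative_benefit(my_effort, others_effort, n_agents, R0, tau,
                       U, gamma, alpha, p, c):
    """Benefit accumulated by one agent over tau years, efforts held fixed."""
    R, total = R0, 0.0
    for _ in range(tau):
        total += p * gamma * my_effort * R / U - c * my_effort
        total_effort = my_effort + (n_agents - 1) * others_effort
        growth = alpha * R * (1.0 - R / U)
        R = max(R + growth - gamma * total_effort * R / U, 0.0)
    return total


def best_effort(level, n_agents, R0, tau, U, gamma, alpha, p, c, grid=101):
    """Effort chosen by a level-`level` agent that assumes level-1 lower peers."""
    if level == 0:
        return 0.5  # expected value of a uniform draw on [0, 1]
    others = best_effort(level - 1, n_agents, R0, tau,
                         U, gamma, alpha, p, c, grid)
    candidates = [k / (grid - 1) for k in range(grid)]
    # Forecast each tentative effort and keep the most profitable one.
    return max(candidates, key=lambda e: cumulative_benefit(
        e, others, n_agents, R0, tau, U, gamma, alpha, p, c))


params = dict(n_agents=5, R0=100.0, U=100.0, gamma=5.0,
              alpha=0.15, p=1.0, c=1.25)
e1_short = best_effort(1, tau=1, **params)  # one-year horizon
e1_long = best_effort(1, tau=9, **params)   # nine-year horizon
```

For a one-year horizon and a resource at carrying capacity, the benefit is linear and increasing in the effort, so the sketch returns the maximum effort; longer horizons make the forecast depend on how the assumed efforts of the others deplete the resource.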

Results and discussion
To solve the model through numerical simulations, we have chosen parameter values that make the results interpretable and relevant to non-trivial problems. Explicitly, we want to study a resource susceptible to overexploitation. To this end, we fix 1/z = γ = U/(4N). This choice implies that, without regeneration and under the maximum effort by all the agents, the resource would be extinct in four years. Note that, according to the previous formula, the carrying capacity U in the simulations is proportional to the number N of exploiting agents. This will allow us to study the system scalability, i.e. when increasing the number of agents, we will also increase the size of the whole system (i.e. the resource capacity). To capture a realistic scenario, we fixed the cost to c = γp/4. The benefit will be given in terms of the price p, which is a conventional variable ultimately related to the currency unit. For the sake of clarity, we instantiate it by choosing the euro as currency; this choice will be determined by the values of the independent variable p. We will study the influence of some variables on the resources and benefits. These variables are:
• The size of the system, which, according to the previous considerations, is determined by the number N of agents.
• The profitability expectancy period τ, i.e. the number of years along which the agents try to maximize their benefit.
• The cognitive level distribution.
• The agents' assumptions about the other agents' cognitive levels.
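The parameter choices above can be made concrete with a short sketch. The relations γ = U/(4N) and c = γp/4 come from the text; the number of fish per agent and the function name are hypothetical values introduced only for illustration, and the break-even remark assumes a benefit of the form p γ E R/U − c E.

```python
def parameters(n_agents, fish_per_agent=1000.0, price=1.0):
    """Return (U, gamma, c) for a system whose size scales with n_agents."""
    U = fish_per_agent * n_agents   # carrying capacity proportional to N
    gamma = U / (4 * n_agents)      # yearly catch per agent at R = U, E = 1
    cost = gamma * price / 4        # cost per unit of effort
    return U, gamma, cost


U, gamma, c = parameters(n_agents=5)
# Total yearly catch at carrying capacity under maximum effort is U / 4,
# which is the sense in which the stock would last four years.
total_catch_at_capacity = 5 * gamma
# Under the assumed benefit form, revenue equals cost when R / U = c / (p gamma).
break_even_fraction = c / (1.0 * gamma)
```

Under these assumptions the break-even resource level is one quarter of the carrying capacity, so exploiting a stock below that level yields losses regardless of the effort.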

Fixed effort
Let us start with the simplest case in which all the agents invest the same effort constantly over time: E_i = E_j = E for all i, j ∈ {1, 2, ..., N}. Figure 1 provides the stationary value of the individual benefit π = π_i(t → ∞) as a function of the effort E for different values of the growth rate α. As shown, the curves display a maximum value that represents the optimum effort in terms of benefit. For low values of the growth rate (α < 0.25), the almost linear decrease observed for high efforts (it is not a straight line but a curve) arises because the positive term of the RHS of equation (4) is negligible compared to the negative term, i.e. the revenue from the catch is negligible compared to the cost of the effort invested.
Each curve in figure 1 corresponds to a specific value of α, and it is reasonable to interpret each curve as representing a distinct species or resource within the model.In the upcoming sections, our focus will be on the specific case where α = 0.15.This value is chosen to investigate a realistic and compelling scenario wherein excessive effort may lead to resource depletion, potentially resulting in negative economic outcomes.
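The shape of the fixed-effort curves can be explored with a small calculation. This is a sketch under stated assumptions, not a reproduction of figure 1: it assumes the harvest γ E R/U, the benefit p γ E R/U − c E, and the parameter choices γ = U/(4N) and c = γp/4. Setting the growth equal to the total catch then gives the stationary resource R*/U = 1 − E/(4α) (zero when E ≥ 4α), and maximizing the stationary benefit over E gives E* = 3α/2. The function name and the grid resolution are hypothetical.

```python
def stationary_benefit(E, alpha, p=1.0, gamma=1.0):
    """Stationary individual benefit for a common fixed effort E."""
    r = max(1.0 - E / (4.0 * alpha), 0.0)   # stationary R / U
    return p * gamma * E * (r - 0.25)       # revenue minus cost, c = gamma p / 4


alpha = 0.15
grid = [k / 1000 for k in range(1001)]      # candidate efforts in [0, 1]
E_opt = max(grid, key=lambda E: stationary_benefit(E, alpha))
R_opt = 1.0 - E_opt / (4.0 * alpha)         # stationary R / U at the optimum
```

For α = 0.15 the grid search agrees with the analytic value E* = 3α/2 = 0.225. Under the same assumptions, any effort above 4α depletes the resource, and the benefit reduces to −cE, which is consistent with the nearly linear decrease at high efforts noted for α < 0.25.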

One cognitive level
As a step forward in understanding the model behavior, in this section we study a simple case in which (i) all the agents share the same cognitive level and (ii) they assume that the rest of the agents are one level lower. This simplification will allow us to observe the behavior of the system and the influence of different factors, such as the profitability expectancy period or the number of exploiters.
First, we will focus on the time evolution of the more relevant system observables for different values of the number of agents N and cognitive levels. Figure 2 displays the mean (i.e. averaged over all the agents) effort ⟨E⟩, the resource state R, and the mean benefit ⟨π⟩ as a function of the time steps (years). The left (resp., right) panels show the results for the case where all the agents have a cognitive level l = 1 (l = 2). Correspondingly, in the top (resp., bottom) panels, the number of agents exploiting the resource is N = 2 (N = 5). In all the panels (and in all figures hereafter), the displayed observables are normalized: the effort is given over the maximum possible (100%), the resource over the carrying capacity U, and the benefit over the maximum obtained in optimal conditions (resource at its carrying capacity and maximum effort only during a year).
As shown in figure 2, an initial maximum exploiting effort involves a fast decrease in both the resource state and benefits. The agents react to this decrease by reducing their effort. The drop in exploitation efforts leads to a stabilization of the resource state; once this almost stationary value for the resource state is reached, agents' efforts and benefits fluctuate around their respective values. Those final values for the exploitation effort, resource state, and benefits depend on the cognitive level and expectancy period.
The transitory period of over-exploitation lasts between five and fifteen years. In order to study the average values reached once the fast transient has elapsed and the effect of the different parameters on the system, we analyze the value of the observables over a time window of 25 steps (years) after the transient. Figure 3 displays the stationary values of the mean effort ⟨E⟩, the resource state R, and mean benefit ⟨π⟩ versus the profitability expectancy period τ, i.e. versus the period the exploiting agents consider when trying to maximize their benefits. Horizontal lines display the baseline results (l = 0): black dotted, blue dashed, and red dotted-dashed for ⟨E⟩, R, and ⟨π⟩, respectively. The left panels correspond to the case where all the agents have a cognitive level l = 1, central panels to l = 2, and right panels to l = 3. Correspondingly, in the top panels, the number of exploiting agents is N = 2, whereas the bottom panels display the case N = 5. As shown in panels (a) and (d), for a cognitive level l = 1, an increase in the profitability expectancy period involves a decrease in the exploiting effort. Interestingly, this decrease in the effort involves a significant growth in the resource state and a sharp growth in the benefits. One can conclude that an increment in the cognitive level from l = 0 to l = 1 involves a resource state enhancement and higher benefits with lower exploiting effort. Furthermore, these improvements grow as the profitability expectancy period increases. To explain these results, one may consider that level-1 agents assume that the rest of the agents make their decisions at random, overexploiting the resource (as we are studying systems with a low growth rate α = 0.15 susceptible to resource depletion). Therefore, when trying to maximize their benefits, they decide to be cautious in their effort. Note that level-1 agents overestimate the other agents' effort when assuming l = 0 for the rest when in fact it is l = 1. This overestimation makes them cautious, resulting in a better resource state and a higher benefit than would be obtained if their assumption were correct. As τ increases (and so does the cumulative resource depletion), the optimal effort (according to the assumption) decreases, which leads to an improvement in the resource state and benefit growth. Comparison of panels (a) and (d) shows that this trend is more pronounced for N = 5 than for N = 2: a higher number of exploiting agents involves effort reduction to compensate for the overexploitation under the assumption of l = 0 for the rest. Later we will study in detail the effect of the number of agents on the dynamics.

Figure 3. Effect of the profitability expectancy period on the effort, resource, and benefits. Mean effort ⟨E⟩, resource state R, and mean benefit ⟨π⟩ in the stationary state as a function of the profitability expectancy period τ for the scenario where all the agents have the same cognitive level l: l = 1, 2, 3 in the left, central, and right panels, respectively. The number of exploiting agents is N = 2 in the top panels (a)-(c), whereas N = 5 in the bottom panels (d)-(f). The horizontal lines show the results for the irrational agents' case (l = 0): black dotted, blue dashed, and red dotted-dashed for ⟨E⟩, R, and ⟨π⟩, respectively. Each point represents the mean of 1000 independent simulations, averaged in turn over a time window of 25 years after the system reaches a stationary state. Error bars correspond to the variance measured over the independent simulations.
Regarding a higher cognitive level, panels (b) and (e) of figure 3 show that the previous trend does not take place for l = 2. This behavior is a consequence of the assumptions made by level-2 agents. In this simple study case, exploiting agents assume that the rest of the agents have a cognitive level l = 1. Then, they presume that those level-1 agents will under-exploit the resource, and therefore, when trying to maximize their benefits, they choose a higher effort than would correspond to l = 1, which in turn involves the decline of the resource and lower benefits. As displayed in panel (b), for N = 2 there is no significant effect of the profitability expectancy period τ on the system observables, whereas for N = 5 (panel (e)) the trends for l = 2 are reversed with respect to those for l = 1: now, an increase in τ entails an increase in the exploiting effort and, subsequently, a depletion of the resource and a reduction of the benefits. Note that in this case, the benefits are negative, i.e. as τ increases, so do the losses. As a plausible explanation, the tendency to take advantage of the rest of the agents increases with N since, according to the agents' assumptions, the larger the number of agents, the smaller the effect of one's own behavior on the dynamics, which leads to the (false) belief that they may gain in the long term (higher τ) by increasing the effort.
Finally, panels (c) (N = 2) and (f) (N = 5) show the results when all the exploiting agents have a cognitive level l = 3, each one assuming l = 2 for the rest of the agents. That assumption leads each agent to overestimate others' effort and, therefore, to choose a low exploiting effort, which entails a better resource state and higher benefits than those corresponding to a scenario where all the agents behave as l = 2. Nevertheless, as level-3 agents' assumption (l = 2 for the rest) presumes lower exploitation than level-1 agents' assumption (l = 0 for the rest), the resource exploited by level-3 agents (panels (c) and (f)) does not recover the values of that exploited by level-1 agents (panels (a) and (d)), and the benefit obtained is not as high. Following this reasoning, as the cognitive level increases to higher values (l = 4, 5, ...), an alternating and damped behavior towards a mean value is expected. From now on, we will focus on the study up to l = 2.
Addressing the effect of the number of exploiting agents on the dynamics, figure 4 shows the mean effort ⟨E⟩, the resource state R, and mean benefit ⟨π⟩ in the stationary state versus the number N of agents. Now, horizontal lines display the optimal solution, i.e. the effort (black dotted line) that would provide the best resource state (blue dashed) and the highest benefit (red dotted-dashed). The left panels correspond to a profitability expectancy period τ = 1, central panels to τ = 5, and right panels to τ = 9. Correspondingly, the top panels display the results for the scenario where all the agents have a cognitive level l = 1 and assume l = 0 for the rest, whereas the bottom panels correspond to l = 2 agents, which assume l = 1 for the rest.
Let us first study the scenario l = 1 (top panels). For a low profitability expectancy period τ = 1 (panel (a)), there is no significant influence of N on the resource. That is, when the agents try to maximize their benefits in the short term, they do not care about sustainability: regardless of how many other agents are exploiting the resource, they invest a high effort that eventually provides them a scarce benefit. As the profitability expectancy period extends towards the long term (panel (b), τ = 5), the agents care about the near future and reduce their effort, which leads to a better resource state and higher benefits. Furthermore, as the number of agents N increases, the mean effort decreases, as each agent assumes that the others will overexploit the resource. This trend is more pronounced for high expectancy periods (panel (c), τ = 9). Interestingly, in the latter case, as the number of exploiting agents increases, they asymptotically approach the optimal solution, which provides the highest benefits and optimal resource state. In conclusion, the long-term optimization (c) shows higher benefits ⟨π⟩ and a better resource state R than the medium-term (b) and short-term (a) ones, and this difference grows as N increases.
Concerning the case in which all the agents share a higher cognitive level l = 2, for a low profitability expectancy period τ = 1 (panel (d)), there is no significant influence of N on the effort, the benefits, or the resource state. Regardless of the number of exploiters, agents overexploit the resource, eventually receiving almost zero benefits. As displayed in panels (d)-(f), for larger expectancy periods, the higher N, the larger the effort, and this trend is more pronounced as τ increases. In this case, the agents assume that the rest are level-1 (therefore, preservers), and when maximizing their benefits, they try to take advantage of the rest of the agents. On the one hand, the higher the number of agents, the less the exploiters (wrongly) assume that their action will influence the resource. On the other hand, as the profitability expectancy period τ increases, the agents believe that the rest of the exploiters' concern about the resource increases and therefore tend to overexploit the resource towards depletion. Both mechanisms are synergistic, which explains the strong dependence on N. As discussed above, level-2 agents try to take advantage of the rest of the exploiters. As N increases, they believe that their own behavior affects the resource less and thus increase their effort. Eventually, they obtain a very low (negative) benefit and deplete the resource.

General case: coexistence of different cognitive levels
In this section, we address the scenarios in which the distributions of both exploiters' cognitive levels and assumed levels are heterogeneous. Given the exponential growth of possible configurations as the maximum level increases, the oscillatory nature of action choice with the increase of level, and the low average level observed in available data [36], here we will focus on the cognitive levels up to l = 2.
Figure 5 displays the stationary values of the mean effort ⟨E⟩, resource state R, and mean benefit ⟨π⟩ for different cognitive level distributions and different assumptions made by level-2 agents about the rest of the exploiting agents. The x-axis represents a categorical variable, namely the configuration. Each configuration consists of (i) a cognitive level distribution and (ii) an assumption for the rest of the exploiters by level-2 agents (if any). These configurations are shown in the right-bottom diagram, where, for each configuration, the top bar corresponds to the levels' distribution and the bottom bar (when present) to the assumption made by level-2 agents. Configuration 0 corresponds to a population exclusively composed of level-0 agents; configuration 1 to a population composed of level-1 agents; configuration 2A to a population of 20% level-0 agents, 60% level-1 agents, and 20% level-2 agents; and configurations 2B and 2C to a population composed of level-2 agents. Furthermore, in configurations 2A and 2B, level-2 agents assume, for each other agent, a level l = 0 with a probability of 40% and l = 1 otherwise (60%), whereas, in configuration 2C, level-2 agents assume a level l = 1 for the rest of the agents. In panel (a), the profitability expectancy period is τ = 1; in panel (b), τ = 5; and in panel (c), τ = 9. The number of agents is N = 5.
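The five configurations listed above can be encoded compactly, as in the following sketch. The percentages are those given in the text; the dictionary layout, the function names, and the choice to draw levels by independent random sampling (rather than fixing exact head counts) are assumptions made for illustration.

```python
import random

# Level distribution of the population and, when level-2 agents are present,
# the distribution they assume for each other agent's level.
CONFIGS = {
    "0":  {"levels": {0: 1.0}, "l2_assumption": None},
    "1":  {"levels": {1: 1.0}, "l2_assumption": None},
    "2A": {"levels": {0: 0.2, 1: 0.6, 2: 0.2},
           "l2_assumption": {0: 0.4, 1: 0.6}},
    "2B": {"levels": {2: 1.0}, "l2_assumption": {0: 0.4, 1: 0.6}},
    "2C": {"levels": {2: 1.0}, "l2_assumption": {1: 1.0}},
}


def sample_levels(dist, n, rng):
    """Draw n cognitive levels from a {level: probability} distribution."""
    levels, weights = zip(*dist.items())
    return rng.choices(levels, weights=weights, k=n)


def assumed_levels(config, n_others, rng):
    """Levels that a level-2 agent attributes to the other agents."""
    dist = CONFIGS[config]["l2_assumption"]
    return sample_levels(dist, n_others, rng)


rng = random.Random(0)                      # seeded for reproducibility
population = sample_levels(CONFIGS["2A"]["levels"], 5, rng)
beliefs_2c = assumed_levels("2C", 4, rng)   # every other agent taken as level 1
```

With N = 5, the 20/60/20 split of configuration 2A could equally be imposed as exactly one, three, and one agents; the text does not say which option was used.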
The mixed configurations 2A and 2B have been chosen for being representative, since a high number of rationalization steps is not expected in real-world situations but rather a number between 1 and 2 [36]. As configurations 0, 1, and 2C have been studied in the previous section, let us focus on mixed configurations 2A and 2B. Configuration 2B constitutes a modification of configuration 2C by lowering the expectations that level-2 agents have about the rest of the exploiters' cognitive levels. In that sense, 2B's level-2 agents' behavior is expected to be between those of 2C's level-2 and level-1 agents. Consequently, one should expect a system behavior for the 2B configuration between that in configurations 1 and 2C. On the other hand, level-2 agents make the same assumptions about the rest of the exploiters in 2A as in 2B. Yet, the population in 2A is a mix of agents with levels l = 0, 1, 2 dominated by level-1 agents, and therefore the behavior for configuration 2A is expected to lie between those of 1 and 2B.
Results shown in figure 5 confirm the previous arguments. In all the panels (i.e. when agents try to maximize their benefits either in the short, medium, or long term), the mean effort ⟨E⟩ in configuration 2B corresponds to an intermediate value between those in pure configurations 1 and 2C. Correspondingly, both resource state R and mean benefit ⟨π⟩ in configuration 2B also show a value in between those in configurations 1 and 2C. Regarding configuration 2A (the one designed to be the most realistic), as expected, the mean effort, resource state, and mean benefit show stationary values between those observed in configurations 1 and 2B.

Conclusions
In this paper, we apply Cognitive Hierarchy Theory to the universal problem of natural resource exploitation. Drawing from the Gordon-Schaefer model as a foundational framework, our study delves into resource exploitation dynamics within low-information scenarios. Specifically, we examine situations where agents lack communication channels and have no information about other exploiters' performance or harvest. Instead, agents rely solely on information pertaining to their own performance and the state of the resource. To this end, we propose a game-theory model in which a dynamic resource is exploited by a set of N agents, each of which is assumed to have a rationalization capacity quantified through a cognitive level l_i. These cognitive levels indicate the predictive capacity that each agent has regarding the actions of the other agents, namely the number of rationalization steps it makes when evaluating the rest of the agents' actions. According to the Level-k Framework, the lowest cognitive level agents, level-0, choose their actions at random. The next level agents, level-1, assume that the other agents will act non-strategically, i.e. a population made of level-0 agents. Level-2 agents act on the belief that the population of exploiters consists of level-0 and level-1 agents, and this pattern continues for higher levels. The agents exploit the resource in order to obtain a maximum profit over a given expectancy period τ. The motivation of this study is to evaluate how different parameters affect the evolution of the resource state, its sustainability, and the benefits obtained by the agents.
To gain insight, we first deal with those cases where all agents have the same cognitive level. The numerical results show that a high cognitive level does not necessarily bring higher benefits or a better resource state, with level 1 displaying higher profits and a better resource state than level 2; level 3 shows an intermediate behavior between those of levels 1 and 2. When maximizing their benefits, agents try to compensate for lower-level exploiters' actions; therefore, as the cognitive level increases step by step, an alternating behavior is expected. While existing literature suggests that higher cognitive abilities lead to more sustainable resource use in scenarios with more information (e.g. communication and other agents' performance) [43,44], our results demonstrate that, in low-information environments, characterized by limited communication and information about other agents' performance, a high cognitive level does not necessarily guarantee better outcomes in terms of resource state and profitability.
Regarding the effect of the profitability expectancy period τ, when all the exploiting agents are level-1, an increase in τ reduces overexploitation, improves the resource state, and increases benefits. This dependence is not displayed for a population of level-2 agents, which shows (i) a negligible effect of τ on the system when there are only two exploiters and (ii) even an opposite trend for higher N. This complements existing literature that has highlighted the importance of temporal considerations in sustainable resource use [45][46][47].
As for the influence of the number of exploiting agents, there is no effect of N on the resource when agents optimize in the short term (τ ∼ 1 year). As τ increases, the behavior depends on the cognitive level. In a level-1 scenario, for medium- and long-term optimization (τ ≳ 5 years), increasing N reduces overexploitation. This trend is more pronounced as τ increases, to the point that, for large values of τ, as N increases, the exploitation tends towards the optimal solution that involves an optimal resource state and the highest benefits. Interestingly, in a level-2 scenario, it is just the opposite: the higher the number N of exploiters, the higher the depletion, and this trend is more pronounced as τ increases. All these results are explained in terms of agents' beliefs and benefit optimization.
Afterward, we deal with the general case, when the population is composed of agents of different levels who, in turn, may assume heterogeneity for the rest of the agents' levels. We focus on realistic configurations according to available experimental data, finding that the results of these heterogeneous configurations may be deduced from the previously analyzed single-level cases.
To conclude, the most profitable and sustainable exploitation regime takes place for many agents looking for long-term profitability, provided these agents assume that the rest of the exploiters do not worry about the resource and therefore try to compensate for others' actions.

Figure 1. Fixed effort. Benefit obtained by the agents in the stationary state versus the exploitation effort for the simple case where all the agents invest the same effort E. Each curve represents a different value of the growth rate α of the resource.

Figure 2. Evolution. Time evolution of the mean effort ⟨E⟩, resource state R, and mean benefit ⟨π⟩ for four characteristic realizations corresponding to different cognitive levels and sizes for the simple case in which all the agents share the same cognitive level. Left panels (a), (c) correspond to cognitive level l = 1 and right panels (b), (d) to l = 2. In top panels (a), (b), the number of agents is N = 2, while in bottom panels (c), (d) N = 5.

Figure 4. Effect of the number of exploiters on the effort, resource, and benefits. Mean effort ⟨E⟩, resource state R, and mean benefit ⟨π⟩ in the stationary state versus the number of agents N for the case where all the agents have the same cognitive level. Top panels (a)-(c) correspond to cognitive level l = 1 and bottom panels (d)-(f) to l = 2. In left panels (a), (d) the profitability expectancy period is τ = 1, in central panels (b), (e) τ = 5, and in right panels (c), (f) τ = 9. The horizontal lines represent the optimal solutions for ⟨E⟩, R, and ⟨π⟩, denoted by black dotted, blue dashed, and red dotted-dashed lines, respectively. Each point represents the mean of 1000 independent simulations, each one of them averaged over a time window of 25 years after the system reaches a stationary state. Error bars correspond to the variance measured over the independent simulations.

Figure 5. Coexistence of cognitive levels. Mean effort ⟨E⟩, resource state R, and mean benefit ⟨π⟩ in the stationary state for τ = 1 (panel (a)), τ = 5 (b), τ = 9 (c). The x-axis represents, as a categorical variable, the distribution of the cognitive levels in the population, as shown in the right-bottom diagram. In that diagram, for each configuration, the top bar shows the cognitive level distribution of the population, and the bottom bar (if present) displays the level-2 agents' assumption for the rest of the agents' levels. Error bars correspond to the variance measured over the independent simulations. N = 5 in all the panels. See the main text for further details.