1 Introduction

For most of its existence the human species has lived in small bands of hunters and gatherers. Organized, complex, and hierarchical social structures—what we often call states—are a relatively recent phenomenon. States emerged gradually from around 3500 BCE, starting in a few corners of the world, in particular Mesopotamia, China, the Nile and Indus River Valleys, Mesoamerica, and the Andes (e.g., Service, 1975, Ch. 1; Borcan et al., 2018). A few millennia earlier, these same regions were also the first to enter the Neolithic Revolution, i.e., develop agriculture.

Many have therefore hypothesized a causal link from the rise of agriculture to statehood. One proposed mechanism has been labelled the Surplus Theory. The idea is that agriculture caused, or allowed, the rise of states by raising output per unit of land, thus creating a “surplus” that could be stored and then used to feed a ruling elite. By contrast, in human societies that rely on relatively low-yielding techniques to obtain food, no such elite population can be sustained, since everyone’s labor is needed for procuring food. Variations on this broad explanatory theme can be found in, e.g., Childe (1936, 1950), Allen (1997), Diamond (1997), Hibbs and Olsson (2004), Putterman (2008, Section IV), and Borcan et al. (2020).Footnote 1

Another mechanism, proposed by Scott (2009, 2017) and Mayshar et al. (2017, 2020), has been labelled the Appropriability Theory. This emphasizes the characteristics of the new crops that arrived with the Neolithic Revolution, in particular cereals. These were easier to expropriate than foods obtained through gathering or horticulture, specifically tubers. In support of this theory, Mayshar et al. (2020) document that statehood did not arise earlier in locations with higher agricultural yields overall, when controlling for the relative productivity of cereals and tubers. They also make the theoretical point that the Surplus Theory is hard to reconcile with a Malthusian model. This relates to the standard Malthusian result that steady-state incomes per agent are independent of land productivity, implying that the rate of extraction chosen by the elite should also be independent of land productivity.

In this paper we propose a unified Malthusian framework that incorporates some elements of both of these theories. Decisions in this model are made by a ruler, representing an “embryonic” state, and by a continuum of subjects, whose incomes the ruler has some ability to expropriate. [The pre-existence of a ruler is not crucial. Prior to full-fledged statehood, we can think of this agent as a “chief,” or what Sahlins (1963) labelled a “big man.” This is discussed further in Sect. 3.6.] The size of the subject population evolves over time in a Malthusian fashion and depends on how much the (embryonic) ruler extracts.

The extracted resources can be used for the ruler’s own consumption, or for two types of investment. First, he can invest in public goods, or what we call productive capacity. This captures the observation that early states were often instrumental in providing, e.g., irrigation (cf. Wittfogel, 1957; Nissen & Heine, 2009) and external defense (cf. Dal Bó et al., 2016).

Second, the ruler can accumulate power, or capacity, to more easily extract resources in the future. We refer to this as investment in extractive capacity. One example of such investments could be the costly acquisition of knowledge about writing and record keeping, which have been important components of a state’s extractive apparatus (Scott, 2009, pp. 226–234; Stasavage, 2020, pp. 93–96). Another example could be the hiring of skilled administrators (Ertman, 1997, Ch. 1).

Extractive and productive capacities are complementary: expanding production is more valuable when extracting it is easy, and improving extraction is more valuable when there is more to extract. This can give rise to multiple steady-state equilibria: one has low extractive capacity, low rates of extraction, and low levels of land productivity, population density, and output; another has high extractive capacity, high rates of extraction, and high levels of productivity, population density, and output.

The way in which these steady states differ is not trivial. The population is denser in the very steady state where it is taxed more heavily, which is surprising in a Malthusian framework, where heavier extraction on its own reduces population. What sustains the denser population is the higher productive capacity in the high-extractive steady state.

Also, the higher rate of extraction does not follow trivially from a higher level of extractive capacity. Rather, the ruler extracts more to finance investment in future extractive capacity.

As in any model with multiple steady states, shocks can push the economy from one steady state to another. For example, a positive shock to extractive capacity, holding productive capacity constant, can push it from the low-extractive to the high-extractive steady state; a shock to productive capacity can cause the same type of transition, holding extractive capacity constant. In that sense, the workings of the model seem consistent with both the Appropriability and Surplus Theories.

Moreover, we show that multiplicity of steady states hinges on the ruler being able to invest in both extractive and productive capacities; removing either channel renders the steady state unique. In other words, investments in extractive and productive capacities produce richer results together than each of them can on its own.

To explore the empirical relevance of the model, we lean on the complementarity between productive and extractive capacities. This complementarity implies that land productivity should have a greater impact on state building when the return to investing in extractive capacity is higher. That return should arguably depend on how many existing states there are to copy from.

To illustrate this, we consider an extended setting with many societies, and assume that the return to investing in extractive capacity faced by each ruler is increasing with the average level of extractive capacity across all societies. We then simulate the model, and let a few societies experience a positive shock to extractive capacity at some point, which pushes these to the high-extractive steady state. This in turn raises the return to investing in extractive capacity for the remaining societies, among which those with higher land productivity transition into statehood earlier than those with lower land productivity. This generates a positive relationship between land productivity and statehood across societies with late state development, but not among those with early state development. This pattern is consistent with cross-country data for the Eurasian continent.

The rest of this paper is organized as follows. Next, Sect. 2 discusses some of the existing literature. Section 3 sets up the benchmark model, and arrives at its main prediction about multiplicity of steady states. Section 4 then shows how this result falls apart when dropping investment in either extractive or productive capacities. Section 5 presents a simulation and some empirical evidence. Section 6 ends with a concluding discussion.

2 Existing literature

This paper seeks to contribute to a strand of the economics literature studying early state development. One reason this topic matters to economists is that there seem to be long-lasting effects of early statehood on modern development. For example, Borcan et al. (2018) document that countries with very early and very late statehood tend to have lower GDP/capita levels than those with states of intermediate age. Other studies using earlier installments of the same state antiquity data (e.g., Bockstette et al., 2002; Chanda & Putterman, 2007; Chanda et al., 2014) find a mostly positive relationship. There are also some interesting correlations between early statehood and other modern outcome variables: Hariri (2012) documents that countries with older states are currently less democratic; Depetris-Chauvin (2016) finds links between early statehood and modern conflict in Africa. Theories linking the timing of statehood to democracy and other modern development outcomes include Lagerlöf (2016).

Empirical studies of the origins of statehood often focus on the natural environment as a deep-rooted factor. For example, Fenske (2014), Litina (2014), and Depetris-Chauvin and Özak (2016) find that states emerge where ecological conditions promote trade and specialization. Heldring et al. (2019) link state development in the Fertile Crescent from 5000 BCE to shifts in rivers, which they argue induced provision of public goods.

One particularly influential theory of how the environment can induce state building is the so-called circumscription theory by Carneiro (1970), which holds that states tend to emerge where fertile lands are geographically delimited, e.g., by mountains. Recent research has found support for this theory. Schönholzer (2019) documents that states form at locations with locally high agricultural productivity, surrounded by areas with lower productivity. Looking at data from ancient Egypt, Mayoral and Olsson (2020) find that changes over time in the degree of circumscription (defined as the productivity gap between the taxable and non-taxable activity, and induced by variation in rainfall) seem to affect state stability. In our model, we may think of the parameters guiding the accumulation of extractive capacity as factors encompassing the degree of environmental circumscription.

Theories on the emergence of states also often focus on the environment. For example, Dal Bó et al. (2016) and Schönholzer (2019) present models where land productivity, and the degree of geographical circumscription, are drivers of state formation.Footnote 2 Unlike these models, our setting is Malthusian, allowing us to study population density as an endogenous outcome.

Using a Malthusian framework should also help address some of the criticism of theories linking land productivity to state formation, or what we here label the Surplus Theory. As discussed in Sect. 1, Mayshar et al. (2020) argue that such theories are hard to reconcile with Malthusian population dynamics. This poses a conundrum, given the broad consensus about the relevance of the Malthusian model for preindustrial development (see, e.g., Galor, 2010; Ashraf & Galor, 2011). In the Malthusian model presented here, land productivity can indeed affect state building. This hinges on extractive capacity being endogenous: when this channel is closed down, agricultural productivity no longer has any effect on the rate of extraction, similar to the results of Mayshar et al. (2020, Online Appendices B); see Sect. 4.1 below. Our empirical findings suggest that endogenous extractive capacity may be most relevant when state building is done by copying and learning from existing states. This does not rule out that earlier state building could be better understood within a framework where extractive capacity is exogenous and a function of crop composition, as argued by Mayshar et al. (2020).

Finally, this paper leans on a theoretical literature, starting with Besley and Persson (2009, 2011), on investment in fiscal and legal state capacities; what we here call extractive capacity corresponds most closely to fiscal capacity in their terminology. Again, one difference is that we use a Malthusian setting, where population density is endogenous.Footnote 3

3 The model

Consider a world with two classes: subjects and what we for simplicity call a “ruler.” The term ruler, and many model assumptions, are discussed further in Sect. 3.6.

The subjects live in overlapping generations for two periods: as passive children and active adults. In the adult phase of life, a subject works, pays taxes, and produces offspring. This means that the size of the subject population evolves endogenously over time, as a function of the ruler’s extraction rate.

The ruler has one single offspring who replaces him in the next period. We refer to him by the singular male pronoun, but this can also be interpreted as a collective of agents (an elite, or proto-elite).Footnote 4

The ruler decides on the rate at which subjects are taxed, denoted \(\tau _{t}\). A fraction \(1-z_{t}\) of the taxed (extracted) resources are lost, where \(z_{t}\in (0,1]\). We refer to \(z_{t}\) as extractive capacity. The subjects thus get a fraction \(1-\tau _{t}\) of total output, the ruler gets a fraction \(\tau _{t}z_{t}\), while the remainder, \(\tau _{t}(1-z_{t})\), is lost. As discussed in Sect. 3.6, lost tax revenue can be interpreted as theft by a class of tax collectors.

Since the ruler’s income equals \(\tau _{t}z_{t}Y_{t}\), we shall refer to \(z_{t}Y_{t}\) as the ruler’s effective tax base.Footnote 5

3.1 Production

Output in period t, denoted \(Y_{t}\), is produced with the production function

$$\begin{aligned} Y_{t}=(MBA_{t})^{\alpha }L_{t}^{1-\alpha }, \end{aligned}$$
(1)

where \(\alpha\) is the land share of output, \(L_{t}\) is the size of the subject population, M denotes the size of land (below normalized to one, \(M=1\)), and B and \(A_{t}\) are the two different land productivity factors. We refer to \(L_{t}\) as just population, but since land is normalized to unity, it also measures population density.

The factor B is taken as given by the ruler, and captures time-invariant factors determined by geography, such as the caloric content of the crops that can be grown in a particular environment. By contrast, \(A_{t}\) depends on productivity-enhancing investment undertaken by the ruler, representing public goods such as irrigation systems, or knowledge. We shall refer to \(A_{t}\) as productive capacity.Footnote 6

3.2 Extraction and population dynamics

Each subject earns the average product of labor, \(y_{t}=Y_{t}/L_{t}=(BA_{t}/L_{t})^{\alpha }\), which is taxed at rate \(\tau _{t}\in [0,1]\). Each subject’s income after tax thus equals \((1-\tau _{t})y_{t}\).

Subjects care about consumption, \(c_{t}^{S}\), and fertility, \(n_{t}\), and utility is given by

$$\begin{aligned} U_{t}^{S}=(1-{\widetilde{\gamma }})\ln c_{t}^{S}+\widetilde{\gamma }\ln n_{t} , \end{aligned}$$
(2)

where \({\widetilde{\gamma }}\in \left( 0,1\right)\). Each subject takes her income as given and maximizes (2) subject to the budget constraint

$$\begin{aligned} c_{t}^{S}=(1-\tau _{t})y_{t}-qn_{t}, \end{aligned}$$
(3)

where \(q>0\) is the cost per child. This gives optimal fertility as

$$\begin{aligned} n_{t}=\gamma (1-\tau _{t})y_{t}, \end{aligned}$$
(4)

where \(\gamma \equiv {\widetilde{\gamma }}/q\). Since each subject is replaced by \(n_{t}\) offspring, the subject population in the next period equals \(L_{t+1}=n_{t}L_{t}\). Applying (4) and \(y_{t}=Y_{t}/L_{t}\) gives

$$\begin{aligned} L_{t+1}=\gamma (1-\tau _{t})y_{t}L_{t}=\gamma (1-\tau _{t})Y_{t}. \end{aligned}$$
(5)

The subject population thus constitutes a capital stock to the ruler, in the sense that its size in the next period, \(L_{t+1}\), decreases with the ruler’s current rate of extraction, \(\tau _{t}\). Put another way, \(1-\tau _{t}\) is the fraction of output that the ruler “invests” in the subject population.
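The household problem in (2) and (3) and the resulting population dynamics in (5) can be illustrated with a minimal Python sketch. The function names and the numerical parameter values are assumptions chosen for illustration only; they are not taken from the paper. The sketch evaluates the closed-form fertility rule in (4) and checks numerically that nearby fertility choices give lower utility.

```python
import math


def optimal_fertility(tau, y, gamma_tilde, q):
    """Closed-form fertility from (4), with gamma = gamma_tilde / q."""
    return (gamma_tilde / q) * (1 - tau) * y


def subject_utility(n, tau, y, gamma_tilde, q):
    """Utility (2), with consumption given by the budget constraint (3)."""
    c = (1 - tau) * y - q * n
    return (1 - gamma_tilde) * math.log(c) + gamma_tilde * math.log(n)


def next_population(pop, tau, y, gamma_tilde, q):
    """Population dynamics (5): each adult is replaced by n_t offspring."""
    return optimal_fertility(tau, y, gamma_tilde, q) * pop


# Hypothetical values, for illustration only.
tau, y, gamma_tilde, q = 0.4, 2.0, 0.3, 0.5
n_star = optimal_fertility(tau, y, gamma_tilde, q)
u_star = subject_utility(n_star, tau, y, gamma_tilde, q)

# Perturbing fertility in either direction should lower utility.
for n_alt in (n_star - 0.05, n_star + 0.05):
    assert subject_utility(n_alt, tau, y, gamma_tilde, q) < u_star

# A higher extraction rate lowers next period's population.
print(next_population(100.0, 0.4, y, gamma_tilde, q))
print(next_population(100.0, 0.6, y, gamma_tilde, q))
```

The last two lines show the sense in which the subject population acts as a capital stock for the ruler: raising the extraction rate today shrinks the population that can be taxed tomorrow.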

3.3 Investment in extractive capacity

Let \(x_{t}\ge 0\) denote the ruler’s investment in extractive capacity, which builds next period’s extractive capacity, \(z_{t+1}\), at a rate \(\phi >0\). We let extractive capacity be bounded from above and below at levels \({\overline{z}}\) and \({\underline{z}}\), respectively, such that \(0<{\underline{z}}<{\overline{z}}\le 1\) (discussed further in Sect. 3.6 below). More precisely,

$$\begin{aligned} z_{t+1}=\min \{{\overline{z}},{\underline{z}}+\phi x_{t}\}=\left\{ \begin{array}{lll} {\overline{z}} &{} \text {if} &{} x_{t}\ge \frac{{\overline{z}}-{\underline{z}}}{\phi }, \\ {\underline{z}}+\phi x_{t} &{} \text {if} &{} x_{t}\in \left( 0,\frac{{\overline{z}}- {\underline{z}}}{\phi }\right) , \\ {\underline{z}} &{} \text {if } &{} x_{t}=0. \end{array} \right. \end{aligned}$$
(6)

The parameter \(\phi\) is a measure of how easy extractive capacity is to build. For now this is treated as exogenous. In Sect. 5 we are going to interpret \(\phi\) as a function of extractive capacity among other societies, the idea being that state building is often done by copying existing states.Footnote 7

3.4 Investment in productive capacity

Consider next investment in productive capacity. We let the cost of \(A_{t+1}\) in terms of period-t consumption be \(\eta A_{t+1}^{\sigma }\), where \(\eta >0\) and \(\sigma >1\). Assuming \(\sigma >1\) ensures that output and population converge to constant non-growing levels. The ruler’s budget constraint can now be written

$$\begin{aligned} c_{t}^{R}=\tau _{t}z_{t}Y_{t}-\eta A_{t+1}^{\sigma }-x_{t}, \end{aligned}$$
(7)

where \(c_{t}^{R}\) is the ruler’s consumption.

3.5 Utility

The ruler’s preferences are defined over \(c_{t}^{R}\) and the total effective tax base in the next period, \(z_{t+1}Y_{t+1}\), with utility function

$$\begin{aligned} U_{t}^{R}=(1-\beta )\ln \left( c_{t}^{R}\right) +\beta \ln (z_{t+1}Y_{t+1}) , \end{aligned}$$
(8)

where \(\beta \in \left( 0,1\right)\).Footnote 8

3.6 Discussion

Before we set up the ruler’s maximization problem, it is helpful to scrutinize some of the (implicit and explicit) assumptions in the set-up so far.

3.6.1 Minimum extractive capacity

As mentioned, we assume upper and lower bounds for extractive capacity, denoted \({\overline{z}}\) and \({\underline{z}}\), respectively. The upper bound is not critical and can be set to one, \({\overline{z}}=1\). The assumption that \({\underline{z}}>0\) is more important. If \({\underline{z}}=0\), then the economy would under certain conditions converge to a steady state with zero population and output, a special case of what we will later call a low-extractive steady state. Intuitively, in that steady state the ruler would have no extractive capacity, and thus lack tax revenue with which to invest in productive capacity, which is necessary for production, and thus for the population to reproduce. Assuming a minimum level of extractive capacity ensures that this steady state has positive population.

There are other ways to avoid the outcome with a vanishing population. For example, one can impose an exogenous lower bound for productive capacity instead.Footnote 9 However, that type of model would be mechanically similar to the one set up here, the main difference being that a non-negativity constraint on investment in productive capacity would replace that for extractive capacity in the current set-up.

3.6.2 Egalitarianism and the assumed pre-existence of a ruler

The model presumes that a so-called ruler exists, which might ostensibly contradict the idea of an egalitarian social structure from which statehood emerges. Again, this is mostly for simplicity and clarity, and not completely at odds with the stylized facts pertaining to many pre-state societies.

First of all, the ruler does not need to be richer than other agents. The Online Appendices show that the ruler’s steady-state income can be lower than, or equal to, that of his subjects, if \({\underline{z}}\) is sufficiently small. What distinguishes the ruler from the subjects is not his income, but rather that he chooses taxes and invests in extractive and productive capacities.

Second, in any economic model where variation in statehood is the endogenous result of a choice, that choice needs to be vested with some agent, whether we call that agent a “ruler” or something else, and whatever the exact choice is. When interpreting the model, we may think of the decision maker more abstractly, standing in for various mechanisms through which pre-state societies solve collective-action problems, e.g., processes involving collaboration and negotiation.

Third, the conjectured presence of some type of ruler may in fact hold true for many quasi-egalitarian and pre-agrarian societies. It is common to categorize the political organization of human societies on a gradient from egalitarian bands, via more unequal tribes and chiefdoms, to fully fledged and highly hierarchical states (Flannery, 1972; Service, 1975; Diamond, 1997). In our model, equilibrium outcomes with low extractive capacity could at least correspond to chiefdoms.

Moreover, some societies at the earlier political stages have also been described as having embryonic rulers, tasked with rudimentary forms of public goods provision. Read (1959) coined the term “big man” for such leader figures among pre-state societies in New Guinea. Sahlins (1963) used the same term to contrast leader figures in Melanesia to those in more politically advanced Polynesian chiefdoms; see Lindstrom (1981) for other terminology used in the literature, such as “head man” and “center man.” Unlike rulers of states, these leaders typically derived their powers not from office or inheritance, but from personal traits (Service, 1975, pp. 49–53). This may correspond to \({\underline{z}}\) in our model, applying when the preceding ruler did not invest in extractive capacity (by setting \(x_{t}=0\)).

3.6.3 Defense against external predators

The variable \(A_{t}\) is referred to as productive capacity. This may also include defensive (or protective) capacity. Specifically, we could let some fraction of the output be stolen by external predators, and allow the ruler to undertake costly investments to limit that fraction. That setting is explored in the Online Appendices, and shown to boil down to the same one presented here. The main difference is that some of the variables that we here treat as exogenous, such as \(\eta\) and \(\sigma\), in that setting become functions of the “deep” parameters characterizing the costs of investing in productive and defensive capacities, respectively.

One insight from that model set-up is that land that is less costly to protect corresponds to more productive land in the current setting (i.e., a higher B). Intuitively, resources not needed for protection can be invested in productive capacity instead, which translates to more output at a given level of total investment in defensive and productive capacities. In that sense, we can think of B as a measure not only of land productivity, but also of how well protected output is.Footnote 10

3.6.4 Tax collectors

We have conceptualized extractive capacity in this model as the fraction of the taxes collected that end up with the ruler, rather than being lost in the process of collecting them.

In order not to restrict ourselves to one single interpretation, we have not explicitly modelled how those tax revenues are lost. The Online Appendices propose one way to capture that process more explicitly by introducing a new class of agents, called tax collectors. These can run off with the taxes they collect, and the ruler can invest in capacity to retrieve (some of) those lost revenues. The upshot is a model producing the same functional form for accumulation of extractive capacity as that in (6), but with \({\overline{z}}\), \({\underline{z}}\), and \(\phi\) being functions of “deep” model parameters.

3.6.5 Alternative ways to model extractive capacity

There are other ways to model extractive capacity. We can let the ruler face a cost of levying taxes, incurred in the same period they are levied. Then extractive capacity, \(z_{t}\), could be a variable characterizing that cost function, such that a higher \(z_{t}\) implies a lower cost of tax collection. This formulation resembles that of Mayshar et al. (2020, Online Appendices B).

Specifically, let the cost of levying a tax rate of \(\tau _{t}\) on total output \(Y_{t}\) equal \(C\left( \tau _{t},z_{t}\right) Y_{t}\), where \(C\left( \tau _{t},z_{t}\right)\) is increasing in the tax rate, \(\tau _{t}\), and decreasing in \(z_{t}\). Then the ruler’s budget constraint, corresponding to that in (7), becomes

$$\begin{aligned} c_{t}^{R}=\left[ \tau _{t}-C\left( \tau _{t},z_{t}\right) \right] Y_{t}-\eta A_{t+1}^{\sigma }-x_{t}. \end{aligned}$$
(9)

Our setting can be seen as a special case of this formulation, where \(C\left( \tau _{t},z_{t}\right) =\tau _{t}(1-z_{t})\), which makes (9) identical to (7). Similarly, what we can call the net tax (or extraction) rate, \(\tau _{t}-C\left( \tau _{t},z_{t}\right)\), then equals just \(z_{t}\tau _{t}\), which corresponds more closely to the variable used to measure statehood in Mayshar et al. (2020, Online Appendices B). In our benchmark model both \(\tau _{t}\) and \(z_{t}\) are endogenous, while they treat the latter as exogenous.

3.7 The ruler’s optimization problem

We are now ready to set up the ruler’s optimization problem. Recall that he chooses \(\tau _{t}\), \(x_{t}\), and \(A_{t+1}\) to maximize (8), subject to (5), (6), (7), (1) forwarded one period, and a non-negativity constraint on \(x_{t}\). More compactly, the problem can be written as follows:

$$\begin{aligned} \max _{\tau _{t},x_{t},A_{t+1}}(1-\beta )\ln \left( c_{t}^{R}\right) +\beta \ln (z_{t+1}Y_{t+1}), \end{aligned}$$
(10)

subject to

$$\begin{aligned} \begin{array}{l} x_{t}\ge 0, \\ z_{t+1}=\min \{{\overline{z}},{\underline{z}}+\phi x_{t}\}, \\ c_{t}^{R}=\tau _{t}z_{t}Y_{t}-\eta A_{t+1}^{\sigma }-x_{t}, \\ Y_{t+1}=(BA_{t+1})^{\alpha }L_{t+1}^{1-\alpha }, \\ L_{t+1}=\gamma (1-\tau _{t})Y_{t}. \end{array} \end{aligned}$$
(11)

We refer to this as the benchmark model. Its results can be understood from three different trade-offs that the ruler faces. First, higher investment in productive capacity, \(A_{t+1}\), generates a larger tax base in the next period (higher \(Y_{t+1}\)), at the cost of less consumption for the ruler today (lower \(c_{t}^{R}\)).

Second, a higher extraction rate, \(\tau _{t}\), gives higher income and consumption today (by raising more tax revenue, \(\tau _{t}z_{t}Y_{t}\)); this comes at the cost of a smaller future tax base (lower \(Y_{t+1}\)), in turn due to the Malthusian way in which more extraction reduces the future population size (\(L_{t+1}\)).

Third, investment in future extractive capacity, \(z_{t+1}\), is costly in terms of current consumption.

Due to the assumed linear functional form, and the upper and lower bounds on \(z_{t+1}\), this last trade-off can be seen to generate corner solutions: by setting \(x_{t}=0\), and thus \(z_{t+1}={\underline{z}}\), the ruler invests nothing in extractive capacity, keeping it at its minimum level; by setting \(x_{t}=({\overline{z}}-{\underline{z}})/\phi\), and thus \(z_{t+1}={\overline{z}}\), the ruler chooses maximum extractive capacity.

The ruler’s investment in future extractive capacity depends on his current effective tax base, \(z_{t}Y_{t}\). If this is small, then a marginal increase in \(\tau _{t}\) generates relatively little revenue, thus making it costly to finance investment in extractive capacity. If the effective tax base is small enough it is optimal to set \(x_{t}=0\); if it is sufficiently large, then it is optimal to set \(x_{t}=({\overline{z}}-{\underline{z}})/\phi\). In that sense, a currently strong and rich state is more likely to remain strong also in the next period. The next section derives explicit expressions for the ruler’s choice variables as functions of the effective tax base and exogenous parameters (with details deferred to Sect. 1 of the Appendices).
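The corner behavior described above can be illustrated with a brute-force numerical sketch of the problem in (10) and (11). This is a plain grid search, swapped in for illustration; it is not the analytical solution method used in the Appendices. All parameter values, the function names, and the grid sizes are assumptions made for this sketch. The code searches over the extraction rate, spending on productive capacity, and investment in extractive capacity, and reports the best grid point for a ruler with a small and a large effective tax base.

```python
import math

# Illustrative parameter values (assumptions, not taken from the paper).
PARAMS = dict(alpha=0.5, beta=0.5, sigma=2.0, phi=1.0, eta=1.0,
              gamma=1.0, B=1.0, z_lo=0.2, z_hi=1.0)


def ruler_utility(tau, spend_a, x, z, y, p):
    """Objective (10); spend_a stands for the outlay eta * A_{t+1}**sigma."""
    c = tau * z * y - spend_a - x                      # budget constraint (7)
    if c <= 0 or spend_a <= 0 or not 0 < tau < 1:
        return -math.inf                               # infeasible choice
    a_next = (spend_a / p["eta"]) ** (1 / p["sigma"])
    l_next = p["gamma"] * (1 - tau) * y                # population dynamics (5)
    y_next = (p["B"] * a_next) ** p["alpha"] * l_next ** (1 - p["alpha"])
    z_next = min(p["z_hi"], p["z_lo"] + p["phi"] * x)  # accumulation (6)
    return ((1 - p["beta"]) * math.log(c)
            + p["beta"] * math.log(z_next * y_next))


def solve_by_grid(z, y, p=PARAMS, n=60, n_x=20):
    """Return (utility, tau, spend_a, x) at the best grid point."""
    # Investing beyond this level cannot raise z_{t+1}, so the grid stops here.
    x_max = (p["z_hi"] - p["z_lo"]) / p["phi"]
    best = (-math.inf, None, None, None)
    for k in range(n_x + 1):
        x = x_max * k / n_x
        for i in range(1, n):
            tau = i / n
            for j in range(1, n):
                spend_a = z * y * j / n
                u = ruler_utility(tau, spend_a, x, z, y, p)
                if u > best[0]:
                    best = (u, tau, spend_a, x)
    return best


weak = solve_by_grid(z=0.2, y=0.5)    # small effective tax base
strong = solve_by_grid(z=1.0, y=5.0)  # large effective tax base
print("weak ruler:   tau = %.3f, x = %.3f" % (weak[1], weak[3]))
print("strong ruler: tau = %.3f, x = %.3f" % (strong[1], strong[3]))
```

Under these assumed values the weak ruler's best grid point has zero investment in extractive capacity, while the strong ruler's has the maximum useful investment, in line with the corner solutions discussed in the text. A grid search only approximates the optimum, so the reported extraction rates are accurate only up to the grid spacing.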

3.8 The ruler’s optimal choices

Let \({\underline{X}}\) and \({\overline{X}}\) denote the thresholds for \(z_{t}Y_{t}\) below and above which, respectively, the two constraints on \(z_{t+1}\) in (6) bind. That is, \(x_{t}=0\) and \(z_{t+1}={\underline{z}}\) if \(z_{t}Y_{t}\le {\underline{X}}\); and \(x_{t}=({\overline{z}}-{\underline{z}})/\phi\) and \(z_{t+1}={\overline{z}}\) if \(z_{t}Y_{t}\ge {\overline{X}}\). A weak ruler, with a low effective tax base (\(z_{t}Y_{t}\le {\underline{X}}\)), finds current extraction costly, making it optimal not to build any future extractive capacity, thus preserving the weak state. A strong ruler, with a large effective tax base (\(z_{t}Y_{t}\ge {\overline{X}}\)), finds it easy to extract resources, and chooses to maintain a strong state by investing enough to keep extractive capacity at its maximum, \({\overline{z}}\).

As shown in Sect. 1 of the Appendices, these thresholds are given by

$$\begin{aligned} {\overline{X}}=\frac{1}{\phi }\left[ {\overline{z}}\left( \frac{\beta \sigma (1-\alpha )+\sigma +\alpha \beta }{\beta \sigma }\right) -{\underline{z}} \right] , \end{aligned}$$
(12)

and

$$\begin{aligned} {\underline{X}}=\frac{1}{\phi }\left( \frac{\sigma (1-\alpha \beta )+\alpha \beta }{\beta \sigma }\right) {\underline{z}}. \end{aligned}$$
(13)

It is straightforward to show that \(0<{\underline{X}}<{\overline{X}}\) follows from \(0<{\underline{z}}<{\overline{z}}\).
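One way to verify this ordering is sketched here, using the shorthand \(K\), which is notation introduced only for this remark and does not appear elsewhere in the paper. Let \(K\equiv \left[ \beta \sigma (1-\alpha )+\sigma +\alpha \beta \right] /(\beta \sigma )\). Since \(\sigma (1-\alpha \beta )+\alpha \beta =\beta \sigma (1-\alpha )+\sigma +\alpha \beta -\beta \sigma\), the coefficient in (13) equals \(K-1\), which is positive. The two thresholds can then be written and compared as

$$\begin{aligned} {\underline{X}}=\frac{(K-1){\underline{z}}}{\phi }>0,\qquad {\overline{X}}=\frac{K{\overline{z}}-{\underline{z}}}{\phi },\qquad {\overline{X}}-{\underline{X}}=\frac{K\left( {\overline{z}}-{\underline{z}}\right) }{\phi }>0. \end{aligned}$$

The gap between the thresholds is thus proportional to \({\overline{z}}-{\underline{z}}\), and shrinks as extractive capacity becomes easier to build (higher \(\phi\)).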

The ruler’s choices thus depend on how the effective tax base falls relative to these thresholds. Consider first how the ruler sets the rate of extraction. Section 1 of the Appendices shows that the ruler’s optimal extraction rate can be written:

$$\begin{aligned} \tau _{t}=\left\{ \begin{array}{lll} 1-\left[ \frac{\beta \sigma (1-\alpha )}{\sigma (1-\alpha \beta )+\alpha \beta }\right] \left[ 1-\left( \frac{{\overline{z}}-{\underline{z}}}{\phi } \right) \frac{1}{z_{t}Y_{t}}\right] &{} \text {if} &{} z_{t}Y_{t}\ge {\overline{X}} , \\ 1-\left( \frac{\beta \sigma (1-\alpha )}{\beta \sigma (1-\alpha )+\sigma +\alpha \beta }\right) \left( 1+\frac{{\underline{z}}}{\phi z_{t}Y_{t}}\right) &{} \text {if} &{} z_{t}Y_{t}\in \left[ {\underline{X}},{\overline{X}}\right] , \\ 1-\left[ \frac{\beta \sigma (1-\alpha )}{\sigma (1-\alpha \beta )+\alpha \beta }\right] =\frac{\sigma (1-\beta )+\alpha \beta }{\sigma (1-\alpha \beta )+\alpha \beta } &{} \text {if } &{} z_{t}Y_{t}\le {\underline{X}}. \end{array} \right. \end{aligned}$$
(14)

It can be seen from (14) that the relationship between \(\tau _{t}\) and \(z_{t}Y_{t}\) has an inverse-U shape. First, \(\tau _{t}\) is constant for \(z_{t}Y_{t}\le {\underline{X}}\), i.e., when investment in extractive capacity is not operative. This constant rate is the same as in the corresponding model without any investment in extractive capacity (see Sect. 4.1).

We also see that \(\tau _{t}\) is increasing in \(z_{t}Y_{t}\) for \(z_{t}Y_{t}\in \left[ {\underline{X}},{\overline{X}}\right]\). Over this interval, rulers respond to marginal increases in the effective tax base (\(z_{t}Y_{t}\)) by extracting more resources, in order to fund more investment in future extractive capacity. Finally, we see that \(\tau _{t}\) decreases with \(z_{t}Y_{t}\) for \(z_{t}Y_{t}\ge {\overline{X}}\). Intuitively, the cost of maintaining maximum extractive capacity falls relative to income as the effective tax base grows.

As \(z_{t}Y_{t}\) approaches infinity, \(\tau _{t}\) approaches the same level as when \(z_{t}Y_{t}\le {\underline{X}}\). However, for any finite level of \(z_{t}Y_{t}\), the extraction rate is always higher when the ruler invests the maximum amount in future extractive capacity (\(z_{t}Y_{t}\ge {\overline{X}}\) and \(z_{t+1}={\overline{z}}\)) than when he invests the minimum amount (\(z_{t}Y_{t}\le {\underline{X}}\) and \(z_{t+1}={\underline{z}}\)). That is, the top row of (14) is always greater than the bottom row, for finite \(z_{t}Y_{t}\). This means that any steady state with maximum investment in extractive capacity must have a higher extraction rate than one with no such investment. Below we explore if two such steady states can coexist.
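The shape of the extraction rate in (14) can be checked numerically with a short Python sketch that codes the thresholds (12) and (13) and the three branches of (14) directly. The parameter values and function names are assumptions chosen for illustration; only the formulas come from the text.

```python
# Illustrative parameter values (assumptions, not taken from the paper).
PARAMS = dict(alpha=0.5, beta=0.5, sigma=2.0, phi=1.0, z_lo=0.2, z_hi=1.0)


def thresholds(p=PARAMS):
    """Return (X_lower, X_upper) from (13) and (12)."""
    a, b, s = p["alpha"], p["beta"], p["sigma"]
    x_upper = (p["z_hi"] * (b * s * (1 - a) + s + a * b) / (b * s)
               - p["z_lo"]) / p["phi"]
    x_lower = (s * (1 - a * b) + a * b) / (b * s) * p["z_lo"] / p["phi"]
    return x_lower, x_upper


def extraction_rate(zy, p=PARAMS):
    """Optimal extraction rate (14) as a function of the tax base z_t * Y_t."""
    a, b, s = p["alpha"], p["beta"], p["sigma"]
    x_lower, x_upper = thresholds(p)
    corner = b * s * (1 - a) / (s * (1 - a * b) + a * b)
    interior = b * s * (1 - a) / (b * s * (1 - a) + s + a * b)
    if zy >= x_upper:   # maximum investment in extractive capacity
        return 1 - corner * (1 - (p["z_hi"] - p["z_lo"]) / (p["phi"] * zy))
    if zy > x_lower:    # interior investment
        return 1 - interior * (1 + p["z_lo"] / (p["phi"] * zy))
    return 1 - corner   # no investment


x_lower, x_upper = thresholds()
for zy in (0.1, x_lower, 1.0, x_upper, 10.0, 1000.0):
    print("zY = %8.3f  tau = %.4f" % (zy, extraction_rate(zy)))
```

Under these assumed values the printed rates are flat up to the lower threshold, rise until the upper threshold, and then fall back toward the low-base rate without reaching it, which is the inverse-U pattern described above.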

3.9 Dynamics

Since the optimal extraction rate in (14) depends on the effective tax base, \(z_{t}Y_{t}\), the dynamics of the economy are most easily described in terms of the two state variables \(Y_{t}\) and \(z_{t}\).

3.9.1 Dynamics of \(z_{t}\)

As shown in Sect. 1 of the Appendices, the ruler’s optimal choice of \(z_{t+1}\) (as implied by the choice of \(x_{t}\)) can be written

$$\begin{aligned} z_{t+1}=\Phi (Y_{t},z_{t})\equiv \left\{ \begin{array}{lll} {\overline{z}} &{} \text {if} &{} z_{t}Y_{t}\ge {\overline{X}}, \\ \left( \frac{\beta \sigma }{\beta \sigma (1-\alpha )+\sigma +\alpha \beta } \right) \left[ \phi z_{t}Y_{t}+{\underline{z}}\right] &{} \text {if} &{} z_{t}Y_{t}\in \left[ {\underline{X}},{\overline{X}}\right] , \\ {\underline{z}} &{} \text {if } &{} z_{t}Y_{t}\le {\underline{X}}. \end{array} \right. \end{aligned}$$
(15)

That is, \(z_{t+1}\ge {\underline{z}}\) binds when \(z_{t}Y_{t}<{\underline{X}}\), and \(z_{t+1}\le {\overline{z}}\) binds when \(z_{t}Y_{t}>{\overline{X}}\). When these constraints are non-binding (i.e., when \(z_{t}Y_{t}\in \left[ {\underline{X}},{\overline{X}}\right]\)) the next period’s extractive capacity (\(z_{t+1}\)) increases linearly with the current period’s effective tax base (\(z_{t}Y_{t}\)). It is also easy to verify that the respective corner solutions coincide with the interior solution when \(z_{t}Y_{t}={\underline{X}}\) and \(z_{t}Y_{t}={\overline{X}}\).
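The piecewise rule in (15) can be sketched in a few lines of Python. The parameter values are purely illustrative assumptions, not taken from the paper. The thresholds are not copied from (12) and (13); they are backed out from the continuity property just noted, by requiring the interior row of (15) to equal \({\underline{z}}\) and \({\overline{z}}\) at the respective thresholds.

```python
# Illustrative parameter values (assumptions, not calibrated to the paper).
ALPHA, BETA, SIGMA, PHI = 0.5, 0.5, 2.0, 1.0
Z_LOW, Z_HIGH = 0.01, 1.0  # minimum and maximum extractive capacity

# Slope coefficient of the interior row of (15).
C = BETA * SIGMA / (BETA * SIGMA * (1 - ALPHA) + SIGMA + ALPHA * BETA)

# Thresholds implied by continuity of (15) at the two kinks (our derivation).
X_LOW = (Z_LOW / C - Z_LOW) / PHI
X_HIGH = (Z_HIGH / C - Z_LOW) / PHI


def next_z(Y, z):
    """Next-period extractive capacity, following the three rows of (15)."""
    base = z * Y  # effective tax base
    if base >= X_HIGH:
        return Z_HIGH  # upper bound binds
    if base <= X_LOW:
        return Z_LOW   # lower bound binds
    return C * (PHI * base + Z_LOW)  # interior solution, linear in the tax base
```

Calling `next_z(Y, z)` for a range of tax bases traces out a flat segment, a linearly increasing segment, and a second flat segment, which is the shape described above.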

3.9.2 Dynamics of \(Y_{t}\)

From (1) we see that \(Y_{t+1}=(BA_{t+1})^{\alpha }L_{t+1}^{1-\alpha }\), and from (5) we recall that \(L_{t+1}=\gamma (1-\tau _{t})Y_{t}\). Once we have the ruler’s optimal \(A_{t+1}\) and \(\tau _{t}\) in terms of \(z_{t}\) and \(Y_{t}\), we can thus derive an expression for \(Y_{t+1}\) in terms of the same state variables. Section 2 of the Appendices shows that

$$\begin{aligned} Y_{t+1}=\Psi (Y_{t},z_{t},B)\equiv \left\{ \begin{array}{lll} \kappa DB^{\alpha }z_{t}^{\alpha -1}\left[ \phi z_{t}Y_{t}+{\underline{z}}- {\overline{z}}\right] ^{\rho } &{} \text {if} &{} z_{t}Y_{t}\ge {\overline{X}}\text {, } \\ DB^{\alpha }z_{t}^{\alpha -1}\left[ \phi z_{t}Y_{t}+{\underline{z}}\right] ^{\rho } &{} \text {if} &{} z_{t}Y_{t}\in \left[ {\underline{X}},{\overline{X}}\right] , \\ \kappa DB^{\alpha }z_{t}^{\alpha -1}\left( \phi z_{t}Y_{t}\right) ^{\rho } &{} \text {if } &{} z_{t}Y_{t}\le {\underline{X}}, \end{array} \right. \end{aligned}$$
(16)

where \(\rho =(\alpha /\sigma )+1-\alpha <1\), and where \(D>0\) and \(\kappa >1\) depend only on the exogenous and time-invariant parameters \(\alpha\), \(\beta\), \(\gamma\), \(\phi\), \(\sigma\), and \(\eta\) [see (47) and (54) in the Appendices], and play no role in the dynamics.

Note that \(Y_{t+1}\) depends on B, i.e., the land productivity factor that is independent of the ruler’s investment. This has interesting implications for how changes in B impact the dynamic configuration, as discussed below.

3.9.3 Multiple steady states

Now (15) and (16) define a two-dimensional dynamical system for \(z_{t}\) and \(Y_{t}\), which is illustrated in the phase diagram in Fig. 1. It shows the loci along which \(z_{t}\) and \(Y_{t}\) are constant (derived in Sect. 3 of the Appendices), and the regions where the constraints on extractive-capacity investment bind: \(z_{t+1}\ge {\underline{z}}\) binds when \(z_{t}Y_{t}<{\underline{X}}\), and \(z_{t+1}\le {\overline{z}}\) binds when \(z_{t}Y_{t}>{\overline{X}}\).

Fig. 1 Phase diagram illustrating the dynamics. The loci along which \(z_{t}\) and \(Y_{t}\) are constant are indicated by the red and blue solid curves. The green dashed curves indicate the loci above and below which the constraints \(z_{t+1} \le {\overline{z}}\) and \(z_{t+1} \ge {\underline{z}}\) bind. In this configuration, there exist two stable steady states

Generally, the configuration depends on exogenous variables, in particular B. Figure 1 illustrates a case where there are two locally stable steady-state equilibria, and one unstable. (Exact conditions for this type of configuration are stated in Proposition 1 below.) One stable steady-state equilibrium can be labelled a low-extractive steady state. Here the ruler undertakes no investment in extractive capacity, so \(z_{t}={\underline{z}}\), and output can be written

$$\begin{aligned} {\underline{Y}}=\left[ \kappa DB^{\alpha }{\underline{z}}^{\alpha -1}\left( \phi {\underline{z}}\right) ^{\rho }\right] ^{\frac{1}{1-\rho }}, \end{aligned}$$
(17)

which is illustrated in Fig. 1, and derived by setting \(Y_{t+1}=Y_{t}={\underline{Y}}\) and \(z_{t}={\underline{z}}\) in the bottom row of (16). The associated extraction rate, which we can denote \({\underline{\tau }}\), is given by the bottom row of (14), i.e., \({\underline{\tau }} =[\sigma (1-\beta )+\alpha \beta ]/[\sigma (1-\alpha \beta )+\alpha \beta ]\). Population is given by (5) as \({\underline{L}}=\gamma (1-{\underline{\tau }}){\underline{Y}}\).
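For readers who want the intermediate step, (17) follows from the bottom row of (16) in one line. The algebra below is a sketch; the elasticity in the last expression is our own rearrangement, using \(\rho =(\alpha /\sigma )+1-\alpha\) and the fact that \(\kappa\) and D do not depend on B.

$$\begin{aligned} {\underline{Y}}=\kappa DB^{\alpha }{\underline{z}}^{\alpha -1}\left( \phi {\underline{z}}\right) ^{\rho }{\underline{Y}}^{\rho }\quad \Longrightarrow \quad {\underline{Y}}^{1-\rho }=\kappa DB^{\alpha }{\underline{z}}^{\alpha -1}\left( \phi {\underline{z}}\right) ^{\rho },\qquad \frac{\partial \ln {\underline{Y}}}{\partial \ln B}=\frac{\alpha }{1-\rho }=\frac{\sigma }{\sigma -1}. \end{aligned}$$

Since \(1-\rho =\alpha (\sigma -1)/\sigma\), the condition \(\rho <1\) is equivalent to \(\sigma >1\), in which case this elasticity exceeds one.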

The other stable steady state, at which \(z_{t}={\overline{z}}\), can be labelled the high-extractive steady state. Here output equals \({\overline{Y}}\), defined from \({\overline{Y}}=\kappa DB^{\alpha }{\overline{z}} ^{\alpha -1}\left[ \phi {\overline{z}}{\overline{Y}}+{\underline{z}}-{\overline{z}} \right] ^{\rho }\); cf. the top row of (16). The extraction rate in this steady state, \({\overline{\tau }}\), is given by the top row of (14), setting \(z_{t}Y_{t}={\overline{z}}{\overline{Y}}\). From (5), population can be written \({\overline{L}}=\gamma (1-{\overline{\tau }}){\overline{Y}}\).

A saddle path separates the phase diagram into two basins of attraction, each associated with one of the two steady states.Footnote 11 An economy starting off above the saddle path (i.e., with a large initial effective tax base, \(z_{0}Y_{0}\)) will converge over time to the high-extractive steady state. An economy starting off below the saddle path converges to the low-extractive steady state.

A trajectory leading to the high-extractive steady state eventually enters a region where \(z_{t}Y_{t}>{\overline{X}}\), at which point the upper bound on extractive capacity investment starts to bind. From there, \(z_{t}\) stays constant at \({\overline{z}}\), while \(Y_{t}\) continues to grow, stabilizing at \({\overline{Y}}\), as illustrated in Fig. 1. Similarly, a trajectory leading to the low-extractive steady state eventually enters a region where \(z_{t}Y_{t}<{\underline{X}}\), after which \(z_{t}\) stays constant at \({\underline{z}}\), while \(Y_{t}\) declines, approaching \({\underline{Y}}\).
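The dependence on initial conditions can be illustrated by iterating (15) and (16) directly. The sketch below rests on several assumptions that are ours, not the paper's: all parameter values are illustrative; the thresholds are those implied by continuity of (15); D is set to one; and \(\kappa\) is pinned down by requiring (16) to be continuous at both thresholds, whereas in the paper \(\kappa\) and D are defined in the Appendices.

```python
# Illustrative values only (assumptions, not calibrated to the paper).
ALPHA, BETA, SIGMA, PHI = 0.5, 0.5, 2.0, 1.0
Z_LOW, Z_HIGH = 0.01, 1.0
D, B = 1.0, 4.0

RHO = ALPHA / SIGMA + 1 - ALPHA
C = BETA * SIGMA / (BETA * SIGMA * (1 - ALPHA) + SIGMA + ALPHA * BETA)
X_LOW = (Z_LOW / C - Z_LOW) / PHI    # thresholds implied by continuity of (15)
X_HIGH = (Z_HIGH / C - Z_LOW) / PHI
KAPPA = (1 - C) ** (-RHO)            # assumption: makes (16) continuous at the kinks


def step(Y, z):
    """One iteration of the system (15)-(16); returns (Y_next, z_next)."""
    base = z * Y
    scale = D * B**ALPHA * z ** (ALPHA - 1)
    if base >= X_HIGH:   # upper bound on z binds
        return KAPPA * scale * (PHI * base + Z_LOW - Z_HIGH) ** RHO, Z_HIGH
    if base <= X_LOW:    # lower bound on z binds
        return KAPPA * scale * (PHI * base) ** RHO, Z_LOW
    return scale * (PHI * base + Z_LOW) ** RHO, C * (PHI * base + Z_LOW)


def simulate(Y0, z0, periods=500):
    """Iterate the system forward from (Y0, z0)."""
    Y, z = Y0, z0
    for _ in range(periods):
        Y, z = step(Y, z)
    return Y, z


Y_lo, z_lo = simulate(Y0=0.5, z0=Z_LOW)     # small initial effective tax base
Y_hi, z_hi = simulate(Y0=10.0, z0=Z_HIGH)   # large initial effective tax base
print(f"low start:  z = {z_lo:.3f}, Y = {Y_lo:.3f}")
print(f"high start: z = {z_hi:.3f}, Y = {Y_hi:.3f}")
```

Under these assumed values the first run settles at \(z={\underline{z}}\) with output matching (17), and the second at \(z={\overline{z}}\) with higher output. Other parameter choices need not produce two coexisting steady states.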

We can also compare levels of population, output, extractive capacity, and rates of extraction in the two steady states. This is a nontrivial exercise, since these are all endogenous and jointly determined. The following proposition summarizes these results, and provides conditions for the existence and uniqueness of each steady state, respectively.

Proposition 1

Consider the model with investment in both productive and extractive capacities, as described by (10) and (11). In this model, there exist \({\widehat{B}}>0\) and \(\widehat{{\widehat{B}}}>0\), such that:

(a) If, and only if, \(B<{\widehat{B}}\) does there exist a low-extractive steady state, \(({\underline{z}},{\underline{Y}})\), such that \({\underline{z}}{\underline{Y}}<{\underline{X}}\).

(b) If, and only if, \(B>\widehat{{\widehat{B}}}\) does there exist a high-extractive steady state, \(({\overline{z}},{\overline{Y}})\), such that \({\overline{z}}{\overline{Y}}>{\overline{X}}\).

(c) For \({\underline{z}}\) small enough, it holds that \(\widehat{ {\widehat{B}}}<{\widehat{B}}\). That is, the low- and the high-extractive steady states coexist for \(B\in (\widehat{{\widehat{B}}},{\widehat{B}})\).

(d) Assume that \(B\in (\widehat{{\widehat{B}}},{\widehat{B}})\), so that both steady states exist. Then the following holds:

  (i) The low-extractive steady state has a lower extraction rate than the high-extractive steady state, i.e., \({\underline{\tau }}< {\overline{\tau }}\);

  (ii) The low-extractive steady state has lower output than the high-extractive steady state, i.e., \({\underline{Y}}<{\overline{Y}}\);

  (iii) The low-extractive steady state has lower population than the high-extractive steady state, i.e., \({\underline{L}}<{\overline{L}}\).

All proofs are in Sect. 5 of the Appendices.
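To see where a threshold like the one in part (a) comes from, one can combine (17) with the requirement \({\underline{z}}{\underline{Y}}<{\underline{X}}\). The rearrangement below is a heuristic sketch of ours, not the formal proof, and it presumes that \({\underline{X}}\) does not itself depend on B.

$$\begin{aligned} {\underline{z}}{\underline{Y}}<{\underline{X}}\iff \kappa DB^{\alpha }{\underline{z}}^{\alpha -1}\left( \phi {\underline{z}}\right) ^{\rho }<\left( \frac{{\underline{X}}}{{\underline{z}}}\right) ^{1-\rho }\iff B<\left[ \frac{\left( {\underline{X}}/{\underline{z}}\right) ^{1-\rho }}{\kappa D{\underline{z}}^{\alpha -1}\left( \phi {\underline{z}}\right) ^{\rho }}\right] ^{1/\alpha }. \end{aligned}$$

The first equivalence uses \(1-\rho >0\). Because \({\underline{Y}}\) is increasing in B, the low-extractive steady state remains in the region where the lower bound on extractive capacity binds only as long as B is small enough, which is the logic behind an upper threshold such as \({\widehat{B}}\).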

The possibility of multiple steady states is quite intuitive, and has to do with how current extraction affects future extraction. A larger initial level of the effective tax base, i.e., a larger \(z_{t}Y_{t}\), induces the ruler to invest more in both \(z_{t+1}\) and \(Y_{t+1}\), leading to a larger effective tax base in the next period. This can sustain high levels of extractive and productive capacities across generations of rulers. As we shall see in Sect. 4 below, investments in productive and extractive capacities are both needed for multiplicity of steady-state equilibria to arise.

The claims in part (d) in Proposition 1, comparing the properties of these steady states, are far less obvious.

For example, part (d) (iii) states that the high-extractive steady state has larger population (density) than the low-extractive one (\({\underline{L}}< {\overline{L}}\)). This may seem counter-intuitive, since a higher rate of extraction [see (d) (i)] would imply a smaller population for a given level of output; to see this one can impose steady state on (5). The result still holds because output is higher in the high-extractive steady state [see (d) (ii)], in turn due to higher investment in productive capacity, which is sustained by the ruler’s larger tax revenues.

Part (d) (i) of Proposition 1 is not obvious either (despite the ostensibly self-explanatory labels). We gleaned some of the intuition from (14). It is not merely about higher extractive capacity inducing a higher rate of extraction. In fact, the rate of extraction in the low-extractive steady state (\({\underline{\tau }}\)) is independent of the exogenously given minimum level of extractive capacity (\({\underline{z}}\)).Footnote 12 In other words, small changes in extractive capacity do not affect the rate of extraction, as long as the economy is not pushed out of the low-extractive steady state. Rather, the result refers specifically to a steady-state comparison. In the high-extractive steady state the ruler chooses a higher rate of extraction to finance investment in future extractive capacity, which is worthwhile precisely because of the large effective tax base in that steady state.

Shocks to \(z_{t}\) or \(Y_{t}\)

As explained above, given a configuration with multiple steady states, such as that in Fig. 1, the economy converges over time to one of the stable steady-state equilibria. Which one it converges to depends on its initial position relative to the saddle-path trajectory leading to the unstable steady state.

This means that an economy can transition from the low-extractive to the high-extractive steady state in the wake of a one-period shock to either extractive capacity (\(z_{t}\)), or output (\(Y_{t}\)), or a combination of the two. Intuitively, the shock raises the ruler’s effective tax base in period t, inducing him to invest more in productive and/or extractive capacity, possibly putting the economy on a trajectory leading to the high-extractive steady state. For this to happen, the shock must push (\(z_{t},Y_{t}\)) above the threshold saddle path, into the basin of attraction of the high-extractive steady state.

A transition due to a shock to output would be consistent with the Surplus Theory, and could perhaps be interpreted as the result of temporary climatic variations, and/or a temporary phase of good harvests. A transition due to a shock to extractive capacity relates conceptually to the Appropriability Theory.

Exogenous changes to B

Above we considered shocks to extractive capacity (\(z_{t}\)) or output (\(Y_{t}\)). We can also analyze exogenous increases in the geographically determined land productivity factor, B. As shown in Sect. 3 of the Appendices, this shifts up the (\(Y_{t+1}=Y_{t}\))-locus, thus raising output in the low-extractive steady state; note from (17) that \({\underline{Y}}\) is increasing in B. It also expands the basin of attraction for the high-extractive steady state. At some point the low-extractive steady state ceases to exist. Intuitively, a rise in B implies more output, which in turn can be used to accumulate both productive and extractive capacities.

Changes in B need not be interpreted as shocks. Very gradual increases in B would have small effects at first, but eventually lead to rapid changes in \(z_{t}\) and \(Y_{t}\), as the dynamic configuration changes and the high-extractive steady state becomes the unique steady state (i.e., when B exceeds \({\widehat{B}}\)). The economy can thus initially change slowly in response to improvements in B, and then go through a rapid spurt in extractive capacity and output, stabilizing at \({\overline{z}}\) and \({\overline{Y}}\), respectively. From there, output expands more slowly again (as \({\overline{Y}}\) is increasing in B).

4 Closing down channels

In the benchmark model the ruler could invest in both extractive and productive capacities. To see why this matters, we next consider what happens when we close down either of these channels.

4.1 Closing down investment in extractive capacity

To remove investment in extractive capacity from the model, we ignore (6), setting \(x_{t}=0\), and let \(z_{t}\) equal some exogenous constant, here denoted \({\widetilde{z}}\in (0,1]\). In this setting, an increase in \({\widetilde{z}}\) represents a rise in extractive capacity independent of any actions taken by the ruler, conceptually similar to Mayshar et al. (2020, Online Appendices B), who treat extractive capacity as exogenous.

The ruler’s optimization problem now becomes:

$$\begin{aligned} \max _{\tau _{t},A_{t+1}}(1-\beta )\ln \left( c_{t}^{R}\right) +\beta \ln ( {\widetilde{z}}Y_{t+1}), \end{aligned}$$
(18)

subject to

$$\begin{aligned} \begin{array}{l} c_{t}^{R}=\tau _{t}{\widetilde{z}}Y_{t}-\eta A_{t+1}^{\sigma }, \\ Y_{t+1}=(BA_{t+1})^{\alpha }L_{t+1}^{1-\alpha }, \\ L_{t+1}=\gamma (1-\tau _{t})Y_{t}. \end{array} \end{aligned}$$
(19)

The solution to this model resembles that analyzed in the previous section in the case when the non-negativity constraint on \(x_{t}\) was binding (\(x_{t}=0\)); see Sect. 1 of the Appendices for details. The dynamics of output become

$$\begin{aligned} Y_{t+1}=GY_{t}^{\rho }, \end{aligned}$$
(20)

where (recall) \(\rho =(\alpha /\sigma )+1-\alpha <1\), and where G depends on exogenous parameters and is increasing in both agricultural productivity (B), and extractive capacity (\({\widetilde{z}}\)); see (60) in the Appendices. The following proposition summarizes the main results in this setting.

Proposition 2

Consider the model without investment in extractive capacity, as described by (18) and (19). In this model, there exists a unique (non-zero) steady-state equilibrium where the following holds: extractive capacity equals its exogenous level, \({\widetilde{z}}\); output equals \({\widetilde{Y}}=G^{1/(1-\rho )}\); and the rate of extraction equals

$$\begin{aligned} {\widetilde{\tau }}=\frac{\sigma (1-\beta )+\alpha \beta }{\sigma (1-\alpha \beta )+\alpha \beta }. \end{aligned}$$
(21)

Thus, taking investment in extractive capacity out of the model rules out multiplicity of steady states. It can be seen that \({\widetilde{Y}}\) is increasing in both B and \({\widetilde{z}}\) (since G is), so we do get the expected predictions from increases in both land productivity and extractive capacity; note that extractive capacity still affects tax revenues and thus investment in productive capacity, \(A_{t+1}\).
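The uniqueness and global stability behind Proposition 2 are easy to check numerically, since (20) is a concave first-order difference equation. The sketch below uses assumed values throughout: G is treated as a given positive constant (its definition is in (60) in the Appendices), and \(\alpha\), \(\beta\), and \(\sigma\) are illustrative.

```python
# Illustrative values (assumptions); G stands in for the constant in (60).
ALPHA, BETA, SIGMA = 0.4, 0.5, 2.0
G = 3.0

RHO = ALPHA / SIGMA + 1 - ALPHA  # 0.8 here, so RHO < 1


def iterate_output(Y0, periods=300):
    """Iterate Y_{t+1} = G * Y_t**RHO, as in (20), starting from Y0."""
    Y = Y0
    for _ in range(periods):
        Y = G * Y**RHO
    return Y


# Closed-form steady state from Proposition 2 and extraction rate from (21).
Y_closed = G ** (1 / (1 - RHO))
tau = (SIGMA * (1 - BETA) + ALPHA * BETA) / (SIGMA * (1 - ALPHA * BETA) + ALPHA * BETA)

for Y0 in (0.01, 1.0, 1e4):
    print(Y0, iterate_output(Y0), Y_closed)
print("extraction rate:", tau)
```

In logs the recursion is linear with slope \(\rho <1\), so every positive starting value converges to the same steady state, while the extraction rate in (21) contains neither B nor \({\widetilde{z}}\).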

However, optimal \(\tau _{t}\) is here constant. [Indeed, the expression in (21) is the same as in the bottom row in (14), which applies to the benchmark model when \(x_{t}=0\), i.e., \(z_{t}Y_{t}<{\underline{X}}\).] Since the extraction rate does not depend on either B or \({\widetilde{z}}\), this setting cannot explain the rise of statehood as an endogenous outcome of changes in B and/or \({\widetilde{z}}\). In that sense, without investment in extractive capacity the model is inconsistent with both the Surplus and Appropriability Theories.Footnote 13

4.2 Closing down investment in productive capacity

Next we remove investment in productive capacity, setting \(A_{t}=1\) in all periods, but keep investment in extractive capacity. The ruler’s budget constraint, analogous to that in (7), becomes \(c_{t}^{R}=\tau _{t}z_{t}Y_{t}-x_{t}\). The expression for output in (1) becomes \(Y_{t}=B^{\alpha }L_{t}^{1-\alpha }\).

The ruler’s optimization problem can now be written:

$$\begin{aligned} \max _{\tau _{t},x_{t}}(1-\beta )\ln \left( c_{t}^{R}\right) +\beta \ln (z_{t+1}Y_{t+1}), \end{aligned}$$
(22)

subject to

$$\begin{aligned} \begin{array}{l} x_{t}\ge 0, \\ z_{t+1}=\min \{{\overline{z}},{\underline{z}}+\phi x_{t}\}, \\ c_{t}^{R}=\tau _{t}z_{t}Y_{t}-x_{t}, \\ Y_{t+1}=B^{\alpha }L_{t+1}^{1-\alpha }, \\ L_{t+1}=\gamma (1-\tau _{t})Y_{t}. \end{array} \end{aligned}$$
(23)

This model coincides with that in the benchmark setting in Sect. 3 when \(\sigma\) goes to infinity, i.e., when we make investment in productive capacity prohibitively expensive. Specifically, there are two thresholds for the effective tax base, \({\underline{X}}\) and \({\overline{X}}\), below and above which investment in extractive capacity is constrained to its minimum or maximum levels, respectively. Letting \(\sigma\) go to infinity in (12) and (13), these thresholds can now be written

$$\begin{aligned} {\overline{X}}=\frac{1}{\phi }\left[ {\overline{z}}\left( \frac{\beta (1-\alpha )+1}{\beta }\right) -{\underline{z}}\right] , \end{aligned}$$
(24)

and

$$\begin{aligned} {\underline{X}}=\frac{(1-\alpha \beta ){\underline{z}}}{\beta \phi }. \end{aligned}$$
(25)

That is, if \(z_{t}Y_{t}\le {\underline{X}}\), then \(z_{t+1}={\underline{z}}\) and \(x_{t}=0\); if \(z_{t}Y_{t}\ge {\overline{X}}\), then \(z_{t+1}={\overline{z}}\) and \(x_{t}=({\overline{z}}-{\underline{z}})/\phi\).
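Equations (24) and (25) can be checked numerically against the interior row of (15). In the sketch below, the finite-\(\sigma\) thresholds are those implied by continuity of (15) at the two kinks (our derivation, not a transcription of (12) and (13)), and all parameter values are illustrative assumptions.

```python
# Illustrative values (assumptions, not taken from the paper).
ALPHA, BETA, PHI = 0.5, 0.5, 1.0
Z_LOW, Z_HIGH = 0.01, 1.0


def thresholds_finite_sigma(sigma):
    """Thresholds (X_low, X_high) implied by continuity of (15) for finite sigma."""
    c = BETA * sigma / (BETA * sigma * (1 - ALPHA) + sigma + ALPHA * BETA)
    return (Z_LOW / c - Z_LOW) / PHI, (Z_HIGH / c - Z_LOW) / PHI


# Closed forms (25) and (24), respectively.
x_low_limit = (1 - ALPHA * BETA) * Z_LOW / (BETA * PHI)
x_high_limit = (Z_HIGH * (BETA * (1 - ALPHA) + 1) / BETA - Z_LOW) / PHI

for sigma in (2.0, 20.0, 2e6):
    print(sigma, thresholds_finite_sigma(sigma), (x_low_limit, x_high_limit))
```

As \(\sigma\) grows, the finite-\(\sigma\) thresholds approach the closed forms, consistent with the limit described in the text.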

The dynamical system describing the evolution of \(z_{t}\) and \(Y_{t}\) is derived in Sect. 2 of the Appendices, and can also be derived from (15) and (16) by letting \(\sigma\) go to infinity, and setting \(\rho =1-\alpha\). Because the resulting expressions for \(z_{t+1}\) and \(Y_{t+1}\) are qualitatively so similar to those in (15) and (16), we relegate them to the Appendices.

We sum up the main results in the following proposition.

Proposition 3

Consider the model without investment in productive capacity, as described by (22) and (23). In this model, there exist \(B^{*}>0\) and \(B^{**}>0\), such that:

(a) If, and only if, \(B<B^{*}\) does there exist a low-extractive steady state, \(({\underline{z}},{\underline{Y}})\), such that \({\underline{z}}{\underline{Y}}<{\underline{X}}\).

(b) If, and only if, \(B>B^{**}\) does there exist a high-extractive steady state, \(({\overline{z}},{\overline{Y}})\), such that \({\overline{z}}{\overline{Y}}>{\overline{X}}\).

(c) \(B^{**}>B^{*}\). That is, the low- and the high-extractive steady states cannot coexist.

(d) If \(B\in (B^{*},B^{**})\), then there exists a unique steady state, \((z^{\text {int}},Y^{\text {int}})\), such that \(z^{\text { int}}Y^{\text {int}}\in ({\underline{X}},{\overline{X}})\). Furthermore, it holds that:

  (i) The steady-state extraction rate, \(\tau ^{\text {int}}\), is increasing in B and \(\phi\);

  (ii) The steady-state level of extractive capacity, \(z^{ \text {int}}\), is increasing in B and \(\phi\);

  (iii) The steady-state level of output, \(Y^{\text {int}}\), is increasing in B and decreasing in \(\phi\);

  (iv) The steady-state level of population density, \(L^{\text { int}}\), does not depend on B and is decreasing in \(\phi\).

Parts (a) and (b) of Proposition 3 are consistent with the corresponding claims in Proposition 1.Footnote 14 More (less) productive land makes the high-extractive (low-extractive) steady state more likely to exist. This is broadly consistent with the Surplus Theory.

However, part (c) of the proposition shows that multiple steady-state equilibria are not possible in this setting. If land productivity, B, is high enough that the high-extractive steady state exists (meaning \(B>B^{**}\)), then it is also too high for the low-extractive steady state to exist (since \(B^{**}>B^{*}\)). Intuitively, multiplicity of steady states requires strong enough feedback from current extraction to future extraction, and this feedback is weakened when rulers are not able to invest in productive capacity.

Part (d) takes this point further, by considering the case when \(B\in (B^{*},B^{**})\). Here neither the low- nor the high-extractive steady state exists. Rather, the economy converges to a unique interior steady state. Interestingly, this steady state has many properties, summarized by parts (i)–(iv) of (d), that seem inconsistent with the facts. For example, a (small) rise in land productivity, B, leads to a higher steady-state extraction rate and higher levels of extractive capacity, but leaves steady-state population density unchanged. Intuitively, higher land productivity raises population in the usual Malthusian way, but that is counteracted by the higher rate of extraction, and here the net effect is zero. Both those effects were present in the benchmark model, but there higher tax revenues also generated higher investments in productive capacity, which tended to increase steady-state population density. That third channel is closed down here.
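The neutrality of population with respect to B in part (d) (iv) can be traced in a few lines. What follows is a sketch of ours rather than the proof in the Appendices. It assumes the interior rows of (15) and (16) in the limit where \(\sigma\) goes to infinity (so \(\rho =1-\alpha\)), writes \(c_{\infty }=\beta /[\beta (1-\alpha )+1]\) for the limiting slope coefficient in (15) (our notation), and lets D denote the limiting counterpart of the constant in (16). Imposing \(z_{t+1}=z_{t}=z^{\text {int}}\) in (15) gives \(\phi z^{\text {int}}Y^{\text {int}}+{\underline{z}}=z^{\text {int}}/c_{\infty }\), so that

$$\begin{aligned} Y^{\text {int}}=DB^{\alpha }\left( z^{\text {int}}\right) ^{\alpha -1}\left( \frac{z^{\text {int}}}{c_{\infty }}\right) ^{1-\alpha }=Dc_{\infty }^{\alpha -1}B^{\alpha },\qquad L^{\text {int}}=\left( \frac{Y^{\text {int}}}{B^{\alpha }}\right) ^{\frac{1}{1-\alpha }}=\frac{D^{\frac{1}{1-\alpha }}}{c_{\infty }}, \end{aligned}$$

where the second expression uses \(Y_{t}=B^{\alpha }L_{t}^{1-\alpha }\). Output is proportional to \(B^{\alpha }\), while \(L^{\text {int}}\) contains no B at all, in line with parts (d) (iii) and (d) (iv).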

Similarly, a rise in \(\phi\) (which, recall, measures how easy it is to build extractive capacity) raises the steady-state extraction rate and extractive capacity, but lowers population density. This implies a negative association between statehood and population density, which is inconsistent with the empirical facts.

5 Empirical results

The results of the model build on a complementarity between extractive and productive capacities. Intuitively, the possibility of a high-extractive steady state hinges on land productivity affecting the effective tax base and thus investment in future extractive capacity. The implication is that an increase in land productivity, B, is more likely to generate statehood if investments in extractive capacity are easier to undertake, i.e., if \(\phi\) is large.Footnote 15

We can explore if this holds empirically by comparing the correlation between statehood and land productivity for samples of countries with high and low \(\phi\). To measure \(\phi\), we may lean on a literature emphasizing how much easier elites have found it to build a state when they already have a blueprint. For example, the earliest states developed writing and bookkeeping, which were copied by elites developing states later (Scott, 2009, pp. 226–234; Stasavage, 2020, pp. 91–93). Similarly, Ertman (1997, p. 27) argues that European state building became easier at a point when rulers could hire from an existing pool of experts to serve as administrators and in the military. In a multi-society interpretation of our model, this suggests that the return to investing in extractive capacity in one society, as captured by \(\phi\), could depend on the level of extractive capacity across a range of societies.

To fix ideas, suppose a group of countries have transitioned into statehood in a first wave. Since they did not have any statehood blueprints they faced a very low \(\phi\), but transitioned anyhow, possibly for reasons not modelled here, and once they have transitioned they are more likely to maintain statehood moving forward (due to the multiplicity of locally stable steady states). The remaining countries, being able to draw on the state knowledge accumulated by the first wave of countries, face a higher \(\phi\). The complementarity between B and \(\phi\) should then imply that countries in the second wave transition earlier if they have higher B.

5.1 A simulation example

To better understand the dynamics of a model where \(\phi\) changes over time, we can first consider a simulation where in each period \(\phi\) is a function of the average level of extractive capacity, \(z_{t}\), across 200 societies. (For details, see Sect. 1 of the Appendices.) We let these 200 societies be endowed with different levels of land productivity, B, which is uniformly distributed between the two thresholds discussed in Proposition 1, \(\widehat{{\widehat{B}}}\) and \({\widehat{B}}\). Thus, two steady states exist initially.

All societies start off in a low-extractive steady state, with minimum extractive capacity (\({\underline{z}}\)), but 20 are exogenously hit by a shock at \(t=40\), giving them maximum extractive capacity (\({\overline{z}}\)). These 20 represent early states, and have levels of B distributed in the same way as among the other 180. (Here we select them as every tenth society when ranked by B, but one can also select them randomly.) Their function in this simulation is to initiate a process through which statehood can spread: the initial rise in average \(z_{t}\) raises \(\phi\), in turn inducing more societies to invest in \(z_{t}\), thus raising \(\phi\) further, creating a self-propelling dynamic.
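The selection rule for the 20 early states can be sketched as follows. This is only an illustration of the sampling design, not the simulation code itself: the bounds `B_MIN` and `B_MAX` are hypothetical stand-ins for the two thresholds in Proposition 1, the random seed is arbitrary, and the offset used to pick every tenth society is our choice.

```python
import random

random.seed(0)  # arbitrary fixed seed so the draw is reproducible

N_SOCIETIES = 200
B_MIN, B_MAX = 1.0, 2.0  # hypothetical stand-ins for the two thresholds

# Land productivity drawn uniformly between the two thresholds.
B = [random.uniform(B_MIN, B_MAX) for _ in range(N_SOCIETIES)]

# Rank societies by B and select every tenth one to receive the shock.
ranked = sorted(range(N_SOCIETIES), key=lambda i: B[i])
shocked = set(ranked[5::10])
others = [i for i in range(N_SOCIETIES) if i not in shocked]


def mean_B(indices):
    """Average land productivity over the given society indices."""
    return sum(B[i] for i in indices) / len(indices)


print(len(shocked), len(others))
print(round(mean_B(shocked), 3), round(mean_B(others), 3))
```

Because the shocked societies are spread evenly through the ranking, their average B is close to that of the remaining societies, which is what allows the comparison between the two groups later on.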

Figure 2 shows the simulated time paths of the log of \(z_{t}\) for three societies out of the 180 not hit by the shock. A higher B is associated with an earlier rise in \(z_{t}\), since higher land productivity induces earlier investments in \(z_{t}\) when \(\phi\) starts to rise; the rise in \(\phi\) is in turn driven by the rise in average \(z_{t}\) across the 200 societies, shown as a dotted line.

Fig. 2 Simulated time paths over 100 periods showing log extractive capacity, \(\ln (z_{t})\), for three societies in a setting where \(\phi\) depends on the average level of \(z_{t}\) across all societies (shown as a dotted path). These three are all among those societies not hit by a shock to extractive capacity

Some paths in Fig. 2 show a non-monotonic rise (hardly visible unless we log \(z_{t}\)), which reflects that the dynamics for a fixed \(\phi\) exhibit two locally stable steady states. Depending on parameter values, not all societies need ever transition into statehood, but in this simulation all 200 societies make the transition within 60 periods. In any given period, societies with higher B have higher levels of \(z_{t}\).

Figure 3 illustrates the cross-sectional relationship between land productivity and a cumulative statehood measure, namely mean extractive capacity over the 100 periods. The 20 societies with the highest levels of statehood are those that experienced a positive shock. By assumption, these have levels of B distributed across the same interval as the remaining 180, and thus show little association between land productivity and state history.Footnote 16 Among the remainder, however, we see a clear positive relationship between land productivity and mean extractive capacity, such that the highest levels of statehood are found in societies with the highest land productivity.

5.2 Cross-country evidence from Eurasia

Next we explore if this pattern is consistent with cross-country data. We focus on the continent of Eurasia, where most state building has spread from a couple of centers (see discussion below). We use accumulated State Antiquity over different periods from 3500 BCE to 1500 CE from Borcan et al. (2018) to measure statehood (corresponding to mean extractive capacity over time in the simulation). We use the Caloric Suitability Index (CSI) from Galor and Özak (2016) to measure land productivity. (See Sect. 2 of the Appendices for more details about the data.)

Table 1 presents results from regressing State Antiquity on CSI for different subsamples, namely countries which developed statehood before and after different temporal cutoffs. Columns (1)–(3) consider 450 CE, a common benchmark for the end of the classical-age state building era (see, e.g., Mayshar et al., 2020). Columns (4)–(9) consider 1000 BCE, an earlier point at which far fewer countries had begun to develop statehood.

Table 1 Agricultural productivity and statehood: countries with late and early state development

Consider first columns (1), (4), and (7) in Table 1, which use samples of countries with relatively late state development. Here we find a positive and significant correlation between the Galor–Özak CSI index and statehood. The relationship among countries with earlier state development in the remaining columns is mostly insignificant, at least when controlling for existing state development up until the cutoff year; see columns (3), (6), and (9). This is consistent with the simulation results in Fig. 3. That is, the relationship between accumulated statehood and land productivity tends to be positive for countries that developed statehood later, and close to zero for those with early statehood.

Fig. 3 Plot showing the cross-sectional relationship between land productivity, B, and mean extractive capacity over 100 periods, based on the same simulation as in Fig. 2. Each circle represents one society

Figure 4 illustrates the relationship between land productivity and statehood for early and late state developers, using 1000 BCE as cutoff; cf. columns (4) and (5) in Table 1. Note that the pattern is qualitatively similar to the simulated one in Fig. 3.

Fig. 4 Plot showing the relationship between statehood and the Galor–Özak CSI index for Eurasian countries that already had some statehood before 1000 BCE, and those that did not

Table 2 explores these cross-country data further when using 1000 BCE as cutoff for late and early state development, but using the full sample of Eurasian countries and instead interacting land productivity with an indicator for late state development. Column (1) first documents a negative but insignificant unconditional relationship between Galor–Özak CSI and statehood. This turns positive and significant in column (2), where we enter a Late Statehood Dummy, equal to one for countries which developed statehood after 1000 BCE. The Late Statehood Dummy itself carries a significant negative coefficient, as expected, since countries that developed statehood later have had less time to accumulate State Antiquity.

Table 2 Agricultural productivity and statehood: interacting late statehood with agricultural productivity

In column (3) we interact the Late Statehood Dummy and the Galor–Özak CSI index. The interaction term comes out as positive and significant just below the 5% level. It stays positive and becomes much more precisely estimated in column (4), where we include region fixed effects. Column (5) also controls for the geodetic distance from country centroids to Baghdad or Beijing, whichever is closer; these are the conjectured centers of state origins in Eurasia. Column (6) adds a control for Log Absolute Latitude. Throughout, the positive coefficient on the interaction term stays significant at the 5% level, or better. In other words, land productivity shows a positive association with statehood among countries that developed statehood later, just as we should expect.
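In equation form, the interaction specification described for Table 2 can be sketched as follows. The notation is ours and is meant only as a reading aid; the exact set of controls in each column is as reported in the table. Here \(S_{i}\) denotes accumulated State Antiquity for country i, \(\text {Late}_{i}\) is the Late Statehood Dummy, and \(\mathbf {X}_{i}\) collects the controls added in the later columns (region fixed effects, distance to Baghdad or Beijing, and log absolute latitude).

$$\begin{aligned} S_{i}=\delta _{0}+\delta _{1}\,\text {CSI}_{i}+\delta _{2}\,\text {Late}_{i}+\delta _{3}\left( \text {CSI}_{i}\times \text {Late}_{i}\right) +\mathbf {X}_{i}^{\prime }\theta +\varepsilon _{i}. \end{aligned}$$

In this notation, the association between land productivity and statehood is \(\delta _{1}\) among early state developers and \(\delta _{1}+\delta _{3}\) among late ones, so a positive \(\delta _{3}\) is what the complementarity between B and \(\phi\) would lead us to expect.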

As mentioned, we here focus on the Eurasian continent, since state building did not spread between Eurasia and other continents prior to 1500. When including the Americas, or the rest of the world, the results in Tables 1 and 2 tend to weaken. This seems consistent with the idea that land productivity should matter more when state building tools can be copied or imported more easily.

5.3 Anecdotal evidence from Sweden

The data presented above end in 1500 CE, but state building continued after that, in particular in Northern Europe, which lagged behind the continent (cf. Fig. 4). Sweden offers some concrete examples of how rulers of younger states could use tax revenue to import state building after 1500.

As described by Ertman (1997, pp. 313–314), in 1538 Sweden’s first king Gustav I (or Gustav Vasa) hired a German minister, Conrad von Pyhy, to organize its central administration following a template from the Holy Roman Empire. From 1611, Gustavus Adolphus continued state centralization by borrowing from more recent German and Dutch models.

Architecture offers another example. The oldest and most famous castles and monuments from Sweden’s so-called Great Power era in the 17th century were designed by foreign architects, in particular Simon de la Vallée and Nicodemus Tessin the Elder, who acquired their skills on the continent (Stevens Curl & Wilson, 2015). There may be more important (and productive) aspects of state building than castles, but this does illustrate that skills related to state building could indeed be imported.

6 Concluding remarks

There are many competing explanations of what caused the rise and spread of statehood, or social stratification more generally. The Surplus Theory posits that a non-producing elite could only be supported with a “surplus” supply of food. This surplus, goes the argument, arrived when land productivity rose in the wake of the Neolithic Revolution, i.e., when humans transitioned from food procurement through hunting and gathering to using agriculture. A different theory has been labelled the Appropriability Theory. It holds that the rise of states was rather about the arrival of new crops, which were easier for a ruling elite to confiscate.

This paper has presented a model which incorporates mechanisms related to those emphasized by both the Surplus and Appropriability Theories. A ruler extracts resources from a subject population, the size of which evolves over time in a Malthusian fashion, dependent on the ruler’s rate of extraction. The ruler can invest the extracted resources in what we call extractive and productive capacities. These complement each other in such a way that the model can give rise to multiple steady states holding constant land productivity and other exogenous factors. One steady state has low extractive capacity, a low extraction rate, and low population density and output; the other has high extractive capacity, a high extraction rate, and high population density and output.

Not only can the combination of extractive and productive capacities give rise to multiple steady states; this paper has also shown that both of these elements are needed for such multiplicity to arise. In that sense, the Surplus and Appropriability Theories, as modelled here, can generate richer theoretical results together than each theory on its own.

To illustrate the empirical relevance of the model, we exploit its complementarity between land productivity and the return to state building. Intuitively, countries which develop statehood later are able to draw on the state knowledge accumulated by earlier states, and thus face a higher return to efforts and resources directed towards state building compared to countries which developed statehood from scratch. Therefore, among countries which transition into statehood relatively late, we should expect to see a positive association between land productivity and state antiquity, but not necessarily among earlier states. Evidence from across Eurasian countries supports this prediction.