Optimal Tax Policy and Endogenous Growth through Innovation*

We investigate optimal tax policy in a Romer-style endogenous growth model. We derive formulas for the optimal tax rates on capital, labour, and innovation on a balanced growth path. We compute the balanced growth path and the transition to it with optimal policy for a range of parameter values. We find that capital should be taxed in the short run, but be paid its marginal product in the long run. The returns to innovation and production labour, on the other hand, should always be lower than their marginal products. Whether the resulting taxes on innovative activity should be positive or negative depends on (a) the extent of government spending needs, (b) the importance of innovation externalities and (c) the market power of patent holders. The welfare gains from optimal policy are much larger than in a comparable exogenous growth model. JEL Classification: H21, E62, O3


Introduction
In this paper, we explore the implications of optimal tax policy in the Ramsey (1927) sense in an environment similar to that of Romer (1990). Our first goal is to determine what optimal policy looks like in that environment. Our motivation for investigating this aspect of Romer's model is that it is an environment where the ultimate engine of growth is endogenous innovation, innovation that responds to incentives. These incentives can to a large extent be influenced by policy, meaning that policy becomes an important determinant of long-run growth. This in turn implies that there is potentially more at stake than there is in an exogenous growth model à la Ramsey-Cass-Koopmans, and so the optimal policy choice may become more important. As Lucas (1988) famously said, "once one starts to think about [the determinants of long-term growth and its consequences for human welfare], it is hard to think about anything else." Our second goal, then, is to determine the extent to which growth driven by endogenous innovation raises the stakes associated with tax policy.
To get a sense of how important growth is for welfare, consider a simple back-of-the-envelope calculation of the welfare benefits of a permanent increase in the growth rate. Suppose that consumption sequences are ranked according to the following utility function, $U = \sum_{t=0}^{\infty} \beta^t \ln c_t$. Now compare two consumption paths which are identical, except that one grows faster by a factor Γ in every period. By what factor ∆ must consumption in every period increase in order to make the other path as attractive as the faster growing one? Straightforward calculations reveal that (1) As an example, suppose every period is a year, and β = 0.98. 1 An increase in the growth rate by 0.1 percentage points, say from 2% to 2.1%, is then equivalent to a roughly 5% permanent level increase. This little calculation gives us a sense of what may be achieved through policy reform in the context of an environment, such as ours, where growth is endogenous in a way that responds to policy. 2 The point here is that even small changes in growth (including ones small enough to be hard to detect statistically) have large effects on welfare.
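The back-of-the-envelope calculation can be checked numerically. The sketch below is an illustration under stated assumptions, not a quotation of Equation (1): it assumes that the two paths coincide at t = 0, so that the faster path is Γ^t c_t, in which case equating lifetime log utilities gives ∆ = Γ^{β/(1−β)}. The function names and the truncation horizon are hypothetical choices made for this example.

```python
import math


def equivalent_level_gain(beta: float, gamma: float) -> float:
    """Level factor Delta that matches a per-period growth advantage gamma.

    Assumes U = sum_t beta^t ln c_t and that both paths share the same c_0,
    so that ln(Delta) / (1 - beta) = ln(gamma) * beta / (1 - beta)^2.
    """
    return gamma ** (beta / (1.0 - beta))


def lifetime_log_utility(beta: float, c0: float, growth: float,
                         horizon: int = 5000) -> float:
    """Truncated sum of beta^t * ln(c0 * growth^t); the tail is negligible."""
    log_c0, log_g = math.log(c0), math.log(growth)
    return sum(beta ** t * (log_c0 + t * log_g) for t in range(horizon))


if __name__ == "__main__":
    beta = 0.98
    gamma = 1.021 / 1.02          # growth rises from 2% to 2.1% per year
    delta = equivalent_level_gain(beta, gamma)
    print(f"equivalent permanent level increase: {100 * (delta - 1):.2f}%")

    # Cross-check: the faster path and the scaled-up slow path give equal utility.
    u_fast = lifetime_log_utility(beta, 1.0, 1.021)
    u_scaled = lifetime_log_utility(beta, delta, 1.02)
    print(f"utility difference: {u_fast - u_scaled:.2e}")
```

With β = 0.98 and a rise in growth from 2% to 2.1%, the computed gain is about 4.9%, in line with the "roughly 5%" reported in the text.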
We base our model environment on that of Romer (1990) partly because it is one of the most well-known models of endogenous growth but also because we find the focus on innovation, which in turn is endogenously generated from skilled labour, particularly compelling. Moreover, as we detail in Section 2, optimal policy in such an environment has not been studied before. This claim may sound implausible in view of the existence of several papers that appear to do exactly that. However, upon closer examination, it turns out that those who have studied Ramsey optimal taxation in models based on Romer (1990) have in fact modified the model environment in such a way as to reduce it to a version of the Rebelo (1991) or "AK" model. The point is that, in these modified environments, the ultimate engine of growth is constant returns to cumulable factors. (It is then not very important whether we call these cumulable factors physical capital, human capital or ideas.) The basic properties of optimal policy in such an environment were studied already in Jones et al. (1993) and Jones et al. (1997). What they found was that, in the absence of externalities, all taxes should go to zero in the long run. What we do, by contrast, is to adhere more closely to the essential features of Romer (1990), and, as a result, our findings are quite different from those of Jones et al. (1993) and indeed from those of more recent papers in the area as detailed in Section 2. Nevertheless, our model is not quite identical to that of Romer (1990); we modify his model in the following four important ways for reasons that we find compelling.
2 The same basic point is found in King and Rebelo (1990), who compared the welfare effects of tax changes in a Rebelo (1991) model (also known as the "AK" model) with those in an exogenous growth model. They concluded, as we do, that there is essentially an order of magnitude difference. That is not to say that we are merely expressing the conventional wisdom. Indeed, Lucas (1990) found that the growth and welfare effects of optimal tax reform in the context of the Lucas (1988) model, where long-term endogenous growth is driven by accumulation of human capital, are quantitatively trivial.
First, we assume that innovation consists of coming up with new types of intermediate goods, not new types of capital. The purpose of this modification is to enable us to distinguish between taxing capital and taxing innovation.
Second, we assume that innovative activity is associated with an externality that might be called a crowding effect. Specifically, how many new ideas an individual inventor comes up with increases linearly in that inventor's effort, but decreases in the aggregate amount of innovative effort. The rationale for this specification is the notion that certain ideas are, as it were, "in the air", just waiting to be invented, so that innovative effort is to some extent directed at being the first to come up with something as opposed to coming up with something that would not have been invented otherwise. A single parameter determines the extent to which innovative activity is productive as opposed to mere rent-seeking.
Third, patents (exclusive use to ideas) are assumed to expire after one period (which we think of as about 20 years) rather than to last forever, as they do in Romer (1990).
We think this assumption is a reasonable one from an empirical standpoint. In this context we emphasize that, as in Jones (2019), we do not draw a sharp distinction between formal R&D on the one hand and entrepreneurial activity on the other.
Consequently we think of patent duration not necessarily only in a narrow legal sense but as the time it takes for someone to copy an idea, whether patented or not.
In addition, there are two compelling model-driven reasons for our assumption of one-period-only exclusive rights to ideas. One is that it implies that the initial stock of ideas cannot be taxed (because nobody owns it), so (in contrast to capital) there is no temptation on the part of the government to confiscate it. Another is that it renders the optimal taxation problem a bit more tractable.
Fourth, like Romer (1990), we assume that labour is employed in innovation as well as in ordinary production, but, unlike him, we assume that each type of labour (innovative and ordinary) is supplied elastically. In our model, innovative labour can be used only to come up with new ideas, whereas ordinary labour can be used only for production. In Romer (1990), human capital is flexible in that it can be used for either production or innovation, so as to make innovation activity endogenous; with our assumption of endogenous labour supply, this flexibility is not necessary.
Furthermore, distinguishing between production labour and innovation labour allows us to make a conceptually clean distinction between taxing labour employed in production and taxing innovation.
Our main results are the following. First, we show analytically that, in the long run, capital should be paid its marginal product, while innovative activity and production labour should be paid less than their marginal products. As in Chamley (1986) and others, the initial capital stock should be taxed heavily. Since optimal taxation seeks to tax endowments (as opposed to effort/sacrifice) and there is no endowment of capital besides the initial one, long-run capital taxes are not used for the purpose of raising revenue (as opposed to correcting for distortions). At the same time, there is an endowment of time for production labour and an endowment of time for innovation in each period; therefore, the related economic activities will be taxed for the purpose of raising revenue in every period. 3 Second, it is ambiguous whether the resulting taxes on profits from patents and, by implication, innovative activity, should be positive or negative. Due to the existence of externalities, innovative activity may be paid less than its marginal product and still face negative taxes. Whether innovation should be taxed or subsidized depends on (a) government spending needs (the desire for government spending), (b) the strength of the innovation externality, and (c) the degree of substitutability between different intermediate goods. The higher the desire for government spending, and the stronger the negative intra-period externality associated with innovation, the higher the optimal tax on innovation. These results are immediately intuitive.
Meanwhile, the elasticity of substitution between intermediate good varieties matters in a surprising way. The lower this elasticity of substitution, the higher the market power of firms that own patents and the higher their profits. Casual intuition would suggest that profit/patent taxes should be higher the higher the market power, but we find the opposite: a higher elasticity of substitution implies higher optimal profit taxes. To understand this, it may be helpful to note that the quantity supplied by a firm with market power is independent of the profit taxes; profits are still maximized no matter how much they are taxed. Moreover, the social gains from a new variety, that is, the extent to which the final good is produced more efficiently, decrease in the elasticity of substitution between intermediate goods. This becomes immediately clear in the limit, with perfect substitutes, where a new variety does not improve final good productivity at all. In a model of exogenous innovation, patents should be taxed at 100 percent regardless of the degree of substitutability since profits from patents are pure rents. In our dynamic model of endogenous innovation, however, the profits are not pure rents, but rather a reward for innovative activity. A lower elasticity of substitution implies larger social gains from innovation. Since these gains are not fully captured by inventors (patents expire after one period, but the productivity gains remain forever), patents should be taxed less the less substitutability there is.
Third, capital accumulation should be subsidized in the long run. This is a straightforward implication of increasing returns. If capital and labour were to receive their marginal products, this would more than exhaust output. So they are paid less than their marginal products, implying, in particular, that capital receives a market return that is strictly less than its marginal social return. As in Chamley (1986) and others, capital accumulation should not be distorted in the long run, which in our case implies a negative corrective tax. Meanwhile, there are many potential reasons, not present in our model, why optimal capital taxes may be positive; for instance, Conesa et al. (2009) provide a few of them. Our result is therefore not a sufficient argument for advising policy-makers that capital taxes should be negative rather than zero or positive. Rather, what our finding suggests is that if endogenous innovation is an important engine of growth, then that provides a reason to tax capital less than if endogenous innovation is unimportant for growth.
Finally, we emphatically confirm the view that long-term growth rates have an enormous impact on welfare, and that therefore optimal policy has the potential (if the status quo is not optimal) to raise welfare by much more in a model of endogenous innovation than in an exogenous growth model. The welfare effects are greater, (i) the further the initial innovation taxes are from the ones on the optimal balanced-growth path (BGP), (ii) the lower the elasticity of substitution between different intermediate goods, and (iii) the lower the intra-period innovation externality. In our baseline calibration, optimal policy leads to welfare gains corresponding to a 6.0% permanent increase in public and private consumption, 4 whereas it would be 1.8% in the corresponding exogenous growth model. With different initial conditions and parameter values, the welfare gains can exceed 100%. Our model thus suggests that the stakes from optimal policy are indeed much higher in the presence of endogenous growth driven by innovation, and potentially an order of magnitude greater.
Note that these welfare gains do not rely on an initial confiscation of capital because we assume that the initial capital tax rate is fixed and the time periods are so long that there is little temptation to impose a confiscatory rate in the second period either. Meanwhile, because exclusive rights to ideas last for only a single period and new ideas can be used immediately (in the same period in which they are invented), there is no initial stock of ideas to tax.
Our paper is organized as follows. Section 2 situates our work in the context of the existing literature. Section 3 presents our model environment. Section 4 characterizes optimal policy, while Section 5 considers a calibrated version of the model and describes some quantitative features of the solution. Section 6 concludes.

Related Literature
Our contribution's place relative to the existing literature should be seen in light of the two main objectives of our paper that we highlighted in the Introduction: (i) we aim to understand the tradeoffs between innovation, labour in production, the accumulation of capital, and government spending needs, both in the short and in the long run, that determine optimal fiscal policy; and (ii) we want to evaluate the welfare effects of optimal fiscal policy reform, and specifically, whether these welfare effects are substantially larger than in an exogenous-growth framework.
In order to achieve the first goal, we need a model that clearly distinguishes the various choice margins, and a set of tax instruments such that for each margin there is exactly one instrument that directly affects that margin and nothing else; we call this a complete tax system. 5 In our model, the profit tax affects innovation, the labour tax affects production labour, and the capital tax affects the accumulation of capital.
If one of these tax instruments were missing, say the profit tax, then optimal labour and capital taxes would have to take into account how they affect innovation. With profit taxes available, these effects do not have to be taken into consideration, since the incentives to innovate can be directly chosen through profit taxes. 6 By studying these three taxes jointly, our analysis is different from the rest of the literature of which we are aware.
We also emphasize that, in our model, government consumption is valued and endogenous rather than useless and exogenous. That turns out to matter much more in this context than it does in an exogenous growth model à la Cass (1965). One obvious reason for that is that if government consumption were completely exogenous (as opposed to, say, an exogenously fixed fraction of output), it would, generically, either vanish as a share of output in the long run, or eventually exhaust output.
Less obviously but still crucially, if it were endogenous but useless, as in most of the existing related literature, then any policy that encourages growth would also automatically encourage useless government consumption, creating an artificial tradeoff between encouraging growth and avoiding waste. Furthermore, in a dynamic model, taxes have distinct incentive effects in the short and the long run, so allowing for time-varying tax rates is important. This is true even if one wants to consider only long-run (or only short-run) taxes. For instance, consider capital taxes in Chamley (1986). The government would like to heavily tax the initial capital endowment, but not tax capital in the long run at all. If the government could only tax capital at one time-invariant rate, then it would choose a moderate but strictly positive tax rate. This may or may not be policy relevant from a practical standpoint, but from the perspective of understanding the incentives involved in taxing capital, allowing only for time-invariant taxes would paint a misleading picture. By maintaining the assumption in Chamley (1986) that tax rates may vary over time, our approach stands out from the related literature.
5 It is not obvious that a distinction between various forms of income can be drawn cleanly in practice. Indeed, the issue is the subject of much legal controversy. In 2017, 85 venture capitalists in Sweden lost a highly publicized legal case against the government and were required to classify a large proportion of their earnings as labour income rather than capital income. See, for example, https://www.realtid.se/helena-stjernholm-forlorar-skattemal-igen. Correia (1996) and Reis (2011) consider incomplete tax systems.
6 In particular, the incentive to innovate is given by the tradeoff between leisure and the net profits from having a patent. Labour and capital taxes affect gross profits, but profit taxes can always be chosen in such a way as to obtain any net profits desired, independent of the gross profits (as long as these are strictly positive). Note, however, that a complete tax system does not imply that the solution is Pareto optimal or first best.
When it comes to our second goal, it is important to point out that welfare analysis in a dynamic model requires consideration of both the short and the long run.
Consequently, it is compulsory in this context to compute not only the BGP but also the transition to it. (The only exception is if the model is such that the economy is always on a BGP. This is true in the Rebelo (1991) model, but emphatically not in the Romer (1990) model.) For instance, consider yet again the model in Chamley (1986).
There, a higher capital stock in the long run is desirable, but it comes at the cost of foregone consumption in the short and medium run. Therefore, a welfare analysis that only compares steady states would be incorrect. By computing fully-fledged transitions (as opposed to, say, considering only small perturbations from the BGP), we differ from all of the related literature on optimal taxation in innovation-driven endogenous growth models that we are aware of.
Another significant difference to the rest of the literature is the co-existence of monopolistic and competitive firms in the intermediate goods sector of our model; all other models have just one type of firm in the intermediate goods sector. While this feature is not essential for the questions we want to answer, we find it realistic, and it allows us to have finitely lived patents, limiting the severity of the time-inconsistency problem without having to make awkward assumptions about how market power is passed on to another firm after a patent expires. 7
Jones et al. (1993) analyze optimal taxation in the context of an endogenous growth model à la Lucas (1988), i.e. a model where growth is endogenous as a result of constant returns to cumulable factors. Because of their focus on the accumulation of human capital as opposed to innovative activity as the engine of growth, their policy prescriptions are quite different from ours. What they find is that, in the long run, neither physical nor human capital should be taxed. The reason is that in their model, there are essentially only two endowments, the initial stocks of physical and human capital. These initial endowments should be taxed, leaving the long-run taxes (in the absence of externalities) at zero. By contrast, what we find is that while physical capital accumulation should not be distorted in the long run, both labour and innovative activity should be taxed in the sense that their after-tax returns are less than their marginal products. This is because there is an endowment of time for labour and innovation in each period, beyond the initial capital endowment.
Two more recent related papers are Croce et al. (2017) and Long and Pelloni (2017).
Both share the following feature: Even though the model framework is inspired by Romer (1990), it differs crucially in that next-period innovation and current-period intermediate goods production both use the current-period final good as the only input. This implies that the model is essentially of the Rebelo (1991) type, with constant returns to cumulable factors, implying the absence of any transitional dynamics. Because returns to cumulable factors and labour jointly are increasing, factor prices are not equal to marginal products, leading to a distortion that may require correction through taxes or subsidies. In particular, the fact that the final good is an input into the production of intermediate goods, and vice versa, creates a multiplier effect that is external to individual firms and workers.
In both Croce et al. (2017) and Long and Pelloni (2017), government consumption is wasteful and equal to an exogenously fixed share of output, so there is a negative externality associated with growth, giving rise to a rationale for taxing capital (or, equivalently, profits or innovation, since neither model distinguishes between capital and innovation). 8 The output multiplier from intermediate goods implies that the marginal product of capital is higher than its private returns, giving rise to a countervailing reason to subsidize capital. In Croce et al. (2017), but not in Long and Pelloni (2017), there is an optimal subsidy per unit of intermediate goods, so this effect disappears.
In Croce et al. (2017), the specification of (negative) congestion effects and (positive) spillovers in innovation is such that they are hardwired to be of the same magnitude. Meanwhile, congestion is assumed to happen contemporaneously, whereas spillovers occur with a one-period lag. Consequently, as a result of discounting, the net externality associated with innovative activity is necessarily negative. In Long and Pelloni (2017), there are no innovation externalities, but the authors only allow for time-invariant taxes, so capital taxes are also used to partly expropriate the initial capital stock. 9 Annicchiarico et al. (2020) is in essence highly similar to Long and Pelloni (2017) and Croce et al. (2017), but it contains a negative "business stealing" externality associated with innovation.
One important feature of Croce et al. (2017) is aggregate shocks, which allows them to perform policy analysis over the business cycle. In this respect they stand out from the rest of the literature, including this paper, where the model is deterministic. Cozzi (2018) aims to provide a credible quantitative assessment of the magnitudes of taxes, and therefore incorporates realistic features such as overlapping generations, idiosyncratic productivity shocks, and non-linear labour taxes. Innovation is modelled as in Howitt and Aghion (1998), an extension of Aghion and Howitt (1992) that includes capital. Aghion et al. (2013) study optimal taxation in a model based on Aghion and Howitt (1992). They rule out profit taxes (or R&D subsidies as they call them) as infeasible.
Taxes are forced to be time-invariant and the authors do not compute a transition.
Capital and labour taxes both affect the growth rate, through a so-called market size effect: higher taxes imply lower profits for inventors and hence reduce the growth rate. (This effect would have been present in our model too if we had ruled out profit taxes.)

The Model Environment
Our model is a modified version of the one presented in Romer (1990). A final good is assembled by competitive firms from a variety of intermediate goods, which are imperfect substitutes. Blueprints for new varieties of intermediate goods can be invented, and these new varieties are the source of endogenous growth. One important difference in relation to the Romer model is that patents for newly invented varieties do not last infinitely long, but only for one period (which represents 20 years in our model). That is, new varieties are produced by monopolistic firms, and already existing varieties are produced by perfectly competitive firms. This assumption is more relevant empirically, as we argued above, but also makes the model's policy analysis more tractable and better suited for numerical analysis. The point here is that, under our assumption, the tax on innovation in period t directly affects innovative activity in period t only, as opposed to innovative activity in all previous periods s ≤ t. If in this respect we had stuck to the specification of Romer (1990), the first-order conditions would have been plagued by an overwhelming abundance of derivatives.

Final Goods Producers
There is a large number of perfectly competitive firms that produce the final good, which can be used for private consumption, investment in capital goods, and government consumption. A representative firm produces output $Y_t$ according to the following production function: where $Q_{z,t}$ is the quantity used of intermediate input z, which is a continuous indicator; 0 < σ < 1 is a parameter that governs the degree of substitutability between these inputs, and 1/(1 − σ) is the elasticity of substitution. The representative firm aims to maximize its profits (which are zero in equilibrium, due to perfect competition and constant returns to scale). Profits are given by where $p_{z,t}$ is the price of intermediate input z, which each final-goods producer takes as given. As in the original Romer model, not all of these potential varieties are available at any time period, and we set the price of any unavailable input to infinity so that it is not used at all. This means that the integrals in Equations (2) and (3), in spite of appearances, in fact all have a finite upper limit defined by the stock of ideas already invented. The first-order condition for any available input yields its inverse demand function:
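To make the final-goods problem concrete, the following Python sketch works through a discrete analogue. It assumes the standard constant-returns CES aggregator Y = (Σ_z Q_z^σ)^{1/σ}, which is consistent with the stated elasticity of substitution 1/(1 − σ) but is our assumption rather than a transcription of Equation (2); the final good is taken as the numeraire, and all function names are hypothetical. Under that assumption the first-order condition gives the inverse demand p_z = Y^{1−σ} Q_z^{σ−1}, and payments to inputs exactly exhaust output.

```python
def ces_output(quantities, sigma):
    """Final output from a finite list of available varieties (assumed CES form)."""
    return sum(q ** sigma for q in quantities) ** (1.0 / sigma)


def inverse_demand(q_z, output, sigma):
    """Marginal product of variety z, equal to its price when the final good is numeraire."""
    return output ** (1.0 - sigma) * q_z ** (sigma - 1.0)


def symmetric_output(n_varieties, total_input, sigma):
    """Output when a fixed total quantity is spread evenly over n varieties."""
    per_variety = total_input / n_varieties
    return ces_output([per_variety] * n_varieties, sigma)


if __name__ == "__main__":
    sigma = 0.5
    quantities = [1.0, 2.0, 0.5]
    y = ces_output(quantities, sigma)
    prices = [inverse_demand(q, y, sigma) for q in quantities]
    spending = sum(p * q for p, q in zip(prices, quantities))
    print(f"output = {y:.4f}, spending on inputs = {spending:.4f}")

    # Gains from variety shrink as sigma rises towards 1 (perfect substitutes).
    for s in (0.5, 0.8, 0.99):
        print(s, symmetric_output(4, 1.0, s))
```

The last loop illustrates why the social gain from a new variety falls as substitutability rises: with a fixed total quantity spread over N varieties, output under the assumed form equals N^{(1−σ)/σ} times that quantity, which approaches the quantity itself as σ approaches one.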

Intermediate Goods Producers
Each variety of intermediate good can potentially be produced by a large number of identical firms and they produce output $Q_{z,t}$ according to a Cobb-Douglas production function: where $K_{z,t}$ is capital used by the producer of variety z and $L_{z,t}$ is labour. Each firm maximizes its profits per period: where $r_t$ and $w_t$ are the rental rate of capital and the wage rate for labour, respectively. Each firm takes these and the inverse market demand function derived above as given. Cost-minimization implies that marginal costs $MC_{z,t}$ for all firms are For the measure of varieties that has been newly developed, $Z_t$ (see below), there is only one firm for each such variety that holds the patent and hence monopoly rights for the current period. For all other varieties, there is a large number of perfectly competitive firms producing each intermediate good, and, due to constant returns to scale, profits are zero.
For monopolistic firms profit maximization yields For competitive firms the market output and price are respectively
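The pricing rules for the two types of firm can be illustrated with a short Python sketch. It rests on assumptions that are ours rather than the paper's equations: the inverse demand is taken to be p = Y^{1−σ} Q^{σ−1} (the form implied by a standard CES aggregator), marginal cost is constant, and each firm treats aggregate output Y as given. Under these assumptions a patent holder charges a markup 1/σ over marginal cost, a competitive producer prices at marginal cost, and a proportional profit tax leaves the profit-maximizing quantity unchanged. All names are hypothetical.

```python
def monopoly_choice(mc, output, sigma):
    """Price and quantity of a patent holder: marginal revenue sigma * p equals mc."""
    price = mc / sigma
    quantity = output * (sigma / mc) ** (1.0 / (1.0 - sigma))
    return price, quantity


def competitive_choice(mc, output, sigma):
    """Price and quantity of an off-patent variety: price equals marginal cost."""
    price = mc
    quantity = output * mc ** (-1.0 / (1.0 - sigma))
    return price, quantity


def after_tax_profit(q, mc, output, sigma, profit_tax=0.0):
    """Net profit at quantity q under the assumed inverse demand."""
    price = output ** (1.0 - sigma) * q ** (sigma - 1.0)
    return (1.0 - profit_tax) * (price - mc) * q


if __name__ == "__main__":
    mc, y, sigma = 0.5, 1.0, 0.5
    p_m, q_m = monopoly_choice(mc, y, sigma)
    p_c, q_c = competitive_choice(mc, y, sigma)
    print(f"monopoly:    p = {p_m:.3f}, Q = {q_m:.3f}")
    print(f"competitive: p = {p_c:.3f}, Q = {q_c:.3f}")

    # The same quantity maximizes profit with and without a 30% profit tax.
    for tax in (0.0, 0.3):
        best = after_tax_profit(q_m, mc, y, sigma, tax)
        nearby = max(after_tax_profit(q_m * k, mc, y, sigma, tax) for k in (0.9, 1.1))
        print(f"tax = {tax}: profit at optimum {best:.4f} > nearby {nearby:.4f}")
```

Because the tax scales profit by the constant (1 − τ), it changes the level of net profit but not the quantity at which profit peaks, which is the observation used in the Introduction when discussing profit taxes and market power.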

Blueprint Producers
In Romer, this is called the research sector, in which blueprints are produced through human capital (skilled labour) H, and we keep this notational convention. However, one could also think of entrepreneurs developing new business ideas, and we will occasionally use this language. Using skilled labour (or perhaps entrepreneurial effort) h t , each inventor or entrepreneur can develop a measure of new ideas Z t according to the function where Z t is the current stock of ideas, h t is individual innovative effort, H t is aggregate innovative effort, η > 0 is a parameter that governs the productivity of the innovation process, and 0 ≤ Λ < 1 is a parameter that determines congestion in innovation. When Λ = 0 as in Romer, then there is no concurrent innovation externality at all, and the productivity of one innovator does not affect the productivity of the others at all. As Λ approaches one, there is a fixed measure of ideas developed in each period, and innovative activity becomes purely rent-seeking. In any case, the stock of current ideas can be used by all and there is no depreciation of ideas, so we have As in Romer, we assume that a patent is granted for each new blueprint, but unlike Romer, this patent expires after one period. Researchers auction off their blueprints and the accompanying monopoly rights to the highest bidder (or entrepreneurs sell their new businesses to large investors/corporations), for a price P t . We assume that new blueprints can be immediately put to use. This means that the measure of available ideas in period t is the sum of the measure of old ideas Z t and the measure of new ideas Z t so that, for instance, Equation (2), defining the output of the final goods sector, becomes
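The role of the congestion parameter Λ can be seen in a small numerical sketch. The functional form used here, new ideas per inventor equal to η · Z · h · H^{−Λ}, is an assumption chosen to match the verbal description (linear in own effort, decreasing in aggregate effort, no externality at Λ = 0, a fixed measure of new ideas as Λ approaches one); it is not a transcription of the paper's equation, and the function and argument names are hypothetical.

```python
def new_ideas_individual(h, H, Z, eta, lam):
    """New ideas developed by one inventor with effort h, given aggregate effort H."""
    return eta * Z * h * H ** (-lam)


def new_ideas_aggregate(H, Z, eta, lam):
    """Aggregate new ideas with a measure one of identical inventors (h = H)."""
    return new_ideas_individual(H, H, Z, eta, lam)


def next_stock(Z, H, eta, lam):
    """Ideas do not depreciate: next stock = current stock + new ideas."""
    return Z + new_ideas_aggregate(H, Z, eta, lam)


if __name__ == "__main__":
    eta, Z = 0.1, 10.0
    for lam in (0.0, 0.5, 0.999):
        low = new_ideas_aggregate(1.0, Z, eta, lam)
        high = new_ideas_aggregate(4.0, Z, eta, lam)
        print(f"Lambda = {lam}: new ideas at H=1: {low:.3f}, at H=4: {high:.3f}")
```

Under this assumed form, quadrupling aggregate effort quadruples new ideas when Λ = 0, doubles them when Λ = 0.5, and leaves them almost unchanged when Λ is close to one, the case in which innovative effort is nearly pure rent-seeking.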

Private Households
There is a measure one of identical consumers. We focus deliberately on the case of identical consumers, so as to highlight the efficiency effects of optimal fiscal policy with growth through innovation. Naturally, fiscal policy also affects the distribution of income, and it is an interesting and important question to study the distributional effects of fiscal policy with growth through innovation. That is a different research question for future work, though. The representative consumer takes prices and taxes as given and maximizes the following lifetime utility function: $u(c_t, l_t, h_t)$ is a period utility function that is strictly increasing and concave in consumption $c_t$, and strictly decreasing in labour $l_t$ and entrepreneurial effort $h_t$.
The subjective discount factor is denoted by β ∈ (0, 1). Government consumption is denoted by $G_t$ and v(·) is a strictly increasing and concave function, with $\lim_{G \to 0} v_G = \infty$. All lower-case variables are individual level variables, and uppercase variables are aggregates. Because the population is constant, it may as well be equal to one, and hence there is no need to use the notation $g_t$ for per-capita government consumption. In equilibrium, we will have $C_t = c_t$ etc.
We assume that preferences are consistent with balanced growth, as described in King et al. (2002). A household's budget constraint for each period t is: where the tax on labour is denoted by $\tau_{l,t}$, the rate of return on assets is $R_t$, and $a_t$ are asset holdings. Initial asset holdings $a_0$ are exogenously given.
Utility maximization with respect to private consumption $c_t$, labour supply $l_t$, entrepreneurial effort $h_t$, and next-period assets $a_{t+1}$ implies the familiar labour-leisure trade-off and "Euler" equation characterizing the optimal trade-off between consumption in the current period versus the next (subscripts denote partial derivatives). Equations (11), (12), (13), and (14), together with initial asset holdings and no-Ponzi-scheme conditions (which we leave out to avoid tedious notation) characterize consumer behaviour.

Investors
Investors collect savings on behalf of consumers and invest them in capital, blueprints, and government bonds $B_t$ in such a way as to maximize their profits (which are zero in equilibrium due to perfect competition and constant returns to scale). 10 Capital returns and profits are taxed at rates $\tau_{k,t}$ and $\tau_{z,t}$, respectively; $r_t$ is the net return on government bonds. The representative investor's profit maximization problem is: where $A_t = a_t$, as noted earlier, δ is the capital-depreciation rate, and $K_t$ is the total amount of capital rented out.
The first-order conditions imply the usual no-arbitrage conditions and determine the price of a blueprint: Equations (16), (17), (18), and (19) characterize the investors' behaviour. We refer to Equation (16) as the asset-market clearing equation.

The Government
The benevolent government maximizes the utility of its representative consumer.
It finances endogenous government consumption {G t } ∞ t=0 through taxes on capital income and profits, as well as through labour income taxes. Capital taxes are announced one period in advance, so the initial capital tax τ k,0 is exogenously specified. Because of the length of the period, this almost entirely removes the temptation to tax the initial endowment of capital. Governments may also issue one-period debt (subject to a no-Ponzi-scheme condition, which we omit here). Initial government debt B 0 is exogenously given. We impose an upper limit of 100% on capital and profit taxes (although neither is binding in our numerical analysis). The government's per-period budget constraint is

Market Clearing
We have already implicitly assumed market clearing for intermediate goods, government bonds, and patents. Asset-market clearing is also inherent in the investors' problem. What remains is capital-market clearing and labour-market clearing
where $K^C_{z,t}$ and $L^C_{z,t}$ are, respectively, the capital and labour demand of a competitive firm, while $K^M_{z,t}$ and $L^M_{z,t}$ are the capital and labour demand of a monopolistic firm. From the intermediate firms' cost minimization (details of the derivations are in Appendix B) we obtain the market-clearing prices
We can write total output as a function of aggregate capital, aggregate labour, and the stock of ideas as
Note that "total factor productivity" in our model is a more complicated expression than in other models of endogenous growth. The reason is that the markets for some intermediate goods are monopolistic, while others are competitive, so different amounts are produced of different varieties.
We can express a monopolist's profits as Final-goods market clearing is implied by Walras' Law: In Appendix A, we characterize a decentralized equilibrium as a function of (arbitrary) government policy.

Optimal Policy
The government announces its policy at time zero and is able to commit perfectly.
Households, firms, and investors react optimally to this, as described above. The government can perfectly foresee how the private sector will react to its policies, and so it can use private-sector decisions as choice variables if it includes the private-sector optimality conditions as constraints. The Lagrangian is
with the following set of choice variables:
We substituted out r t and replaced it with R t , thereby eliminating Equation (18) as a constraint. We also have to keep in mind that r t , w t , and Π z,t are functions of K t , L t , Z t , and Z t as described in Equations (21), (22), and (23):
We now proceed to derive optimal capital, labour, and innovation taxes on a BGP.
While our numerical analysis computes both the transition and the BGP, we focus for our analytical derivations on the BGP. For the remainder of this section, we assume that the economy converges to a non-degenerate BGP, along which private and public consumption as well as capital grow at a constant positive rate. In Section 5 we are able to confirm that, for a wide range of empirically relevant parameter values, the economy does indeed converge to a non-degenerate BGP. We conclude that, for practical purposes, the phenomenon described in Straub and Werning (2020) does not arise in the context of our model. All proofs are relegated to Appendix C.

Optimal Capital Taxes
We start with a characterization of optimal capital taxes: Proposition 1 If the economy converges to a non-degenerate BGP, then the modified golden rule holds. The capital share, net of taxes, of total income is given by By ∂Y t /∂K t we mean, of course, and similarly for other partial derivatives of Y t .
Proposition 1 says that the private return to capital should be equal to the social return to capital. In other words, the post-tax share of capital in total income should be equal to the contribution of capital to total output, by which we mean the elasticity of output with respect to capital. Importantly, this result is independent of the relative value of government funds, i.e. it does not depend on the Lagrange multipliers $\psi_t$ (associated with the government budget constraint) versus $\theta_t$ (associated with the household budget constraint). The return to capital should be undistorted, in the sense that Equation (26) also characterizes the first best. 11

Corollary 1 Capital taxes on a non-degenerate BGP are negative.
Corollary 1 makes clear that the return to capital being undistorted does not imply zero capital taxes. Capital taxes are not used to raise revenues, though; capital tax revenues are merely incidental. Rather, capital taxes are used in a corrective way, to align the private and public returns to capital, such that the capital provision is efficient. Due to constant returns to scale in capital and labour and non-zero profits, the rate of return on capital in the absence of taxes is smaller than the marginal product of capital. Therefore, negative capital taxes are used to align these two rates.
The negative capital tax result fits in with a broader result: in a wide set of models, capital is special in the sense that if all factors of production can be taxed independently, then in the long run the after-tax share of capital should, just as in our model, equal the elasticity of output with respect to capital. This finding is a further generalization of the results in Atkeson et al. (1999), though the precise statement is not that the capital tax should be zero in the long run. It is that, in the long run, the capital allocation should not be distorted and that therefore capital should not be taxed in order to raise revenue. Of course, it does not follow that the optimal (long-run) tax rate on capital is zero. Indeed, in our framework it is negative. Depending on the details of the model framework, the capital taxes that implement efficient intertemporal choices may be positive, negative, or zero. An example of optimal (long-run) zero capital taxes is Chamley (1986), while an example of (potentially) positive capital taxes is Erosa and Gervais (2002). In any case, such non-zero optimal capital taxes are Pigouvian (corrective) in nature, and independent of the government's need to finance public spending (in contrast to other taxes, see below).

11 In this context, our definition of "first best" is the welfare-maximizing competitive equilibrium allocation when lump-sum taxes, but not good-specific taxes and subsidies, are available. See also Footnote 7 on page 10.
What can explain this special nature of capital? Optimal taxation seeks to tax endowments. In our model, there is an endowment of labour in each period, an endowment of entrepreneurial effort in each period, but only one endowment of capital, the initial capital stock. A government that does not have access to lump-sum taxes will raise revenues by taxing labour and entrepreneurial effort in each period, but will only seek to tax the initial capital stock; in the long run, capital taxes are thus not used to raise revenues.
This logic relies on a complete tax system, where each factor of production can be taxed independently. Taxes on labour income do not directly tax the time endowment for labour (otherwise they would be non-distortionary), but they are better targeted at the labour endowment than capital or profit taxes. Similarly, taxes on profits do not directly tax the time endowment for entrepreneurial effort, but they are better suited to this than other taxes. The tax system can be incomplete for two reasons: (i) one of the factors of production cannot be taxed, as in Correia (1996), or (ii) the government cannot distinguish between income from at least two of these factors of production, as in Reis (2011). If, for example, the government could not tax labour in our model, then it could use profit and capital taxes to indirectly tax the time endowment for labour. Or if the government could not distinguish between labour and capital income in our model, then the joint tax would be a compromise between the desire to raise revenues from the labour endowment and the desire to not distort the long-run capital allocation.

Optimal Labour Taxes
Proposition 2 If the economy converges to a non-degenerate BGP and if cross-derivatives of the period utility function are identically equal to zero, the labour share, net of taxes, of total income is given by Equation (27), in which the multipliers $\psi_t$ and $\theta_t$ enter normalized by $\beta^t u_{c,t}$, that is, as $\psi_t/(\beta^t u_{c,t})$ and $\theta_t/(\beta^t u_{c,t})$, which are constant on a BGP, and $\epsilon_{L,t} := u_{LL,t} L_t / u_{L,t}$ is the elasticity of the marginal disutility from working.

In Appendix D we provide a characterization of the optimal labour share, net of taxes, of total income with non-zero cross-derivatives. Naturally, the government takes into account how an increase in hours worked affects the marginal utility of consumption (and thereby the incentives to work, innovate, and save) and the marginal utility of innovating. We think that incorporating these additional terms does not add much to the intuition behind our results, and so we will stick to the simpler case of zero cross-derivatives (and hence an intertemporal elasticity of substitution of one).
Lemma 1 If the economy converges to a non-degenerate BGP and if cross-derivatives of the period utility function are identically equal to zero, as long as lump-sum taxes are unavailable, then ψ t > θ t , and

The left-hand side of Equation (27) is the after-tax labour share of total income, and it is equal to the contribution of labour to total output (or the elasticity of output with respect to labour), divided by a term that is always greater than one as long as lump-sum taxes are unavailable. The elasticity of the marginal disutility from working, ϵ L,t , is always positive, since u L,t < 0 and u LL,t < 0. Together with Lemma 1, it follows that the denominator of the right-hand side of Equation (27) is greater than one. Therefore, the net return to labour will always be less than its contribution to output, and it is in this sense that labour supply, unlike capital, is distorted.
Also unlike capital, the labour allocation depends on the marginal excess value of public funds, ψ t − θ t . Equation (27) suggests that the higher it is, and hence the lower θ t / ψ t , the lower is the after-tax share of labour. All objects in Equation (27) are endogenous, so we cannot conduct traditional comparative statics, but it suggests a relationship between the after-tax share of labour and the marginal excess value of public funds that is confirmed in all our numerical analysis. When lump-sum taxes are available, then ψ t / θ t = 1, and ψ t = 1, so that the after-tax share of labour is equal to the contribution of labour to total output. 12 When lump-sum taxes are not available, then ψ t / θ t > 1, and the after-tax share of labour is lower than the elasticity of output with respect to labour; how much lower also depends on the elasticity of the marginal disutility from working, ϵ L,t : the greater this elasticity, the lower is the after-tax share of labour (subject to the same caveat regarding comparative statics as above). To understand this, note that if the marginal disutility from working were constant, and thus ϵ L,t = 0, then a lower labour supply, resulting from a lower net wage due to higher labour taxes, would not relax the constraint with Lagrange multiplier µ t , which is for the constraint regarding the household's optimal labour supply. When ϵ L,t > 0, however, then the marginal disutility from working decreases as a person works less (or the marginal utility from leisure decreases as the person consumes more leisure). This implies that higher taxes, which result in a lower labour supply, would indirectly relax the labour-supply constraint; the government takes this into account and consequently sets higher labour taxes, the larger the elasticity of the marginal disutility from working.

Optimal Profit Taxes
Proposition 3 If the economy converges to a non-degenerate BGP and if cross-derivatives of the period utility function are identically equal to zero, the profit share, net of taxes, of total income is given by Equation (29), where $\epsilon_{H,t} := u_{HH,t} H_t / u_{H,t}$ is the elasticity of the marginal disutility from providing entrepreneurial effort.

In Appendix D we provide a characterization of the optimal profit share, net of taxes, of total income with non-zero cross-derivatives.
In Equation (29) we can see that the after-tax share of profits in output is equal to the elasticity of current and future output with respect to an additional variety of intermediate goods (the numerator of the right-hand side), divided by a term similar to the one for the after-tax labour share, and multiplied by 1 − Λ. We focus first on the term 1 − Λ: When Λ approaches 1, and technical progress is independent of research activity, then all research effort is rent-seeking in nature, and the after-tax share of profits should be zero. The smaller this research externality, the larger the after-tax share of profits.
The numerator of the right-hand side of Equation (29) contains three terms. The first, (∂Y t /Y t )/(∂ Z t / Z t ), is the direct impact of a new variety on contemporary output produced by a monopolistic firm (expressed as an elasticity). The second term reflects the fact that an additional variety today results in one more variety produced by competitive firms in the next period, and in fact in all future periods. The third term reflects the fact that an additional variety today makes producing additional varieties in the next period, and again in all future periods, easier; one should note that the production of new knowledge scales linearly in current knowledge. All together, the numerator captures the effects of researching an additional variety on current and properly discounted future output.
Similar to the labour-tax share, Lemma 1 together with the fact that ϵ H,t ≥ 0 implies that the denominator of the right-hand side is always greater than one, and that hence the after-tax share of profits is always lower than the contribution of entrepreneurial effort to current and future output. When lump-sum taxes are available, then the denominator is equal to one, and entrepreneurial activity is not being "taxed;" when public funds are relatively scarce, and ψ t > θ t , then the denominator is strictly greater than one, and entrepreneurial activity is being taxed, in the sense that the private returns to effort are lower than the social marginal product. However, there are three externalities: (i) the contemporary one captured by Λ, (ii) the intertemporal one that a patent lasts one period, whereas the new variety lasts forever, and (iii) the intertemporal one that future research builds on current research.
This means that actual taxes on profits may be positive or negative, depending on whether the private return to innovation is greater than the social return or lower.
Using the same reasoning (and caveats) as for the labour-tax share, the after-tax share of profits is decreasing in the marginal excess value of public funds, ψ t − θ t , and the elasticity of the marginal disutility from innovation ϵ H,t . The taxation of innovation thus follows the same logic as labour taxation and not capital taxation.
What matters is not whether a factor of production has an intertemporal dimension to it: taxes on innovation clearly distort the intertemporal tradeoff. What matters is taxing endowments: there is one initial capital endowment, which calls for high initial taxes on capital, but then taxes in the long run should not distort the capital allocation; however, there is a labour time endowment and an innovation time endowment in every period, so these factors should be taxed in the short and the long run. As we show for capital and innovation, whether the government aims to tax a factor or not, the tax rates that implement the allocation may be positive, zero, or negative.
One factor that affects these actual tax rates, but that is not immediately visible in Equation (29), is the degree of substitutability between different types of intermediate goods, determined by the parameter σ. The elasticity of substitution is 1/(1 − σ), so that intermediate goods are better substitutes the closer σ is to one. The terms ∂Y/∂Z t and ∂Y/∂ Z t are functions of σ. Generally speaking, the higher σ, the lower these derivatives: a low substitutability implies that a new variety cannot be easily replaced by an already existing one, and hence a new variety increases final-good productivity by more than with a high substitutability. If a higher σ implies that the right-hand side is lower, then the left-hand side needs to be lower, too. This can be achieved by higher profit taxes. As before, the qualifier holds that all variables change with σ; however, our numerical analysis confirms this relationship between profit taxes and σ. The intuition is that the less substitutable intermediate goods are, the greater is the social gain from another invention. Because patents expire after one period, these gains are not fully captured by inventors. Hence patents should be encouraged more, i.e. taxed less or subsidized more, the less substitutability there is.
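The mapping from σ to the elasticity of substitution can be checked with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the markup formula ε/(ε − 1) = 1/σ is the textbook result for a monopolist facing CES demand, which we assume here rather than take from the model's appendix.

```python
from fractions import Fraction


def elasticity_of_substitution(sigma):
    """Elasticity of substitution 1/(1 - sigma), defined for 0 < sigma < 1."""
    if not 0 < sigma < 1:
        raise ValueError("sigma must lie strictly between 0 and 1")
    return 1 / (1 - sigma)


def ces_markup(sigma):
    """Gross markup eps/(eps - 1) of a CES monopolist (textbook assumption)."""
    eps = elasticity_of_substitution(sigma)
    return eps / (eps - 1)


# The two values of sigma that appear in the numerical section.
for sigma in (Fraction(4, 5), Fraction(6, 7)):
    print(sigma, elasticity_of_substitution(sigma), ces_markup(sigma))
```

Under these assumptions, a lower σ goes together with a lower elasticity of substitution and a higher markup, which is the sense in which less substitutable varieties give patent holders more market power.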
Based on the findings in Jones and Williams (1998), it is at least arguable that the most plausible case empirically is that innovation should be subsidized.

Numerical Results
In this section we evaluate the quantitative importance of optimal taxation on growth and welfare and conduct some numerical experiments.

Parameterization
We follow convention and set the parameter α in the production function of intermediate goods to 1/3. An important parameter is σ, which determines the degree of substitutability between different intermediate goods. As σ approaches one, these become perfect substitutes. We set σ = 6/7, which corresponds to an elasticity of substitution of 7 and fits within the range of estimates from the international trade literature; see for instance Anderson and van Wincoop (2004, p. 33), who write: "Overall the literature leads us to conclude that [the elasticity of substitution] is likely to be in the range of 5 to 10." One period is 20 years, which conforms roughly to the length of a patent life. This choice has several advantages: (i) it keeps the number of periods before convergence to a BGP low, and thus facilitates computation of the equilibrium; (ii) patents last exactly one period, which keeps both the analytical and computational work more tractable, since only current profits affect the choice to develop new patents; (iii) most importantly, there is no incentive to tax the existing stock of ideas. We set the depreciation rate δ to 0.8, which conforms to a yearly depreciation rate of about 8%. We assume that utility takes the form
so that the intertemporal elasticity of substitution is equal to one. We set the Frisch elasticity of labour supply to 0.5, which corresponds to ϵ = 3. A key parameter is Λ, which we set arbitrarily, but not unreasonably, to 0.5. It seems very difficult to pin down what this parameter should be. Instead of trying to calibrate it to a specific, but in our opinion still arbitrary target, we conduct extensive sensitivity analysis regarding this parameter.
The remaining parameters, e L , e H , g, β, η, and the initial debt B 0 and assets A 0 are calibrated by computing a "pre-initial BGP", which satisfies all of the government's first-order conditions for optimality, except for capital and profit taxes (which we fix to 40% each) and next-period government debt. On a BGP, the latter condition is redundant, and conceptually speaking government debt is not pinned down on a BGP, as it is determined by initial conditions. We thus replace the FOC for debt by a calibration condition, that the debt-to-GDP ratio on the BGP is equal to its target.
We set the following targets: (i) time worked in the production sector is L = 1/3, which is a normalization; (ii) time worked in innovation is H = 1/3 · 1/9, so that H/(L + H) = 1/10, since one tenth of the workforce in the United States is self-employed, which we take as a rough measure of innovators, but this is again a normalization; (iii) the ratio of government revenues to GDP on the pre-initial BGP is 30%; (iv) the yearly interest rate is 4%, roughly corresponding to a value of 1.2 per 20-year period; (v) the growth rate of the economy on the pre-initial BGP is 2% per year, which is equivalent to about 0.5 per period; (vi) the debt-to-GDP ratio on the pre-initial BGP is 0.03, which corresponds roughly to 60% on an annual basis, the ceiling of the European Stability and Growth Pact.
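The per-period values quoted for these targets, and for the depreciation rate, follow from compounding annual rates over a 20-year period. The short Python sketch below reproduces the arithmetic; the function and variable names are ours, and reading the debt target as the annual debt-to-GDP ratio divided by the number of years in a period is our interpretation of the text.

```python
PERIOD_YEARS = 20  # length of one model period in years


def compound(annual_rate, years=PERIOD_YEARS):
    """Net rate over `years` years implied by a constant annual net rate."""
    return (1 + annual_rate) ** years - 1


def period_depreciation(annual_rate, years=PERIOD_YEARS):
    """Share of capital lost over `years` years at a constant annual rate."""
    return 1 - (1 - annual_rate) ** years


interest = compound(0.04)          # roughly 1.2 per period
growth = compound(0.02)            # roughly 0.5 per period
delta = period_depreciation(0.08)  # roughly 0.8 per period
debt_ratio = 0.60 / PERIOD_YEARS   # debt stock relative to 20 years of GDP

L = 1 / 3
H = L / 9
innovator_share = H / (L + H)      # one tenth of total time worked

print(round(interest, 2), round(growth, 2), round(delta, 2))
print(round(debt_ratio, 3), round(innovator_share, 3))
```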
We then have 20 constraints and 20 variables from the modified government's optimal tax problem (the constraints on maximum taxation are not binding on an interior BGP, so we do not include the constraints and the Lagrange multipliers); in addition, we have eight "target" equations, for capital and profit taxes, skilled and unskilled labour, the ratio of government revenues and government debt to GDP, the interest rate, and the growth rate of the economy. All together, this results in 28 equations and 28 variables, and we solve this system using the Broyden method. Due to constant returns to scale in capital and labour, the initial amount of assets A 0 follows from this, and does not require an extra target.
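For readers unfamiliar with the Broyden method, the following is a minimal pure-Python sketch of the idea: start from a finite-difference Jacobian and then update it with rank-one corrections instead of recomputing it. It is not the code used for the paper; the two-equation system `toy` and all function names are hypothetical stand-ins for the 28-equation calibration system.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fd_jacobian(f, x, h=1e-6):
    """Forward-difference Jacobian, used only for the initial guess."""
    fx = f(x)
    jac = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        xh = list(x)
        xh[j] += h
        fh = f(xh)
        for i in range(len(fx)):
            jac[i][j] = (fh[i] - fx[i]) / h
    return jac


def broyden(f, x0, tol=1e-10, max_iter=100):
    """Find x with f(x) = 0 using Broyden's rank-one Jacobian updates."""
    x = list(x0)
    fx = f(x)
    jac = fd_jacobian(f, x)
    n = len(x)
    for _ in range(max_iter):
        if max(abs(v) for v in fx) < tol:
            return x
        step = solve_linear(jac, [-v for v in fx])
        x_new = [xi + si for xi, si in zip(x, step)]
        f_new = f(x_new)
        # Rank-one update: J += (df - J step) step^T / (step . step)
        j_step = [sum(jac[i][j] * step[j] for j in range(n)) for i in range(n)]
        ss = sum(s * s for s in step)
        for i in range(n):
            corr = (f_new[i] - fx[i] - j_step[i]) / ss
            for j in range(n):
                jac[i][j] += corr * step[j]
        x, fx = x_new, f_new
    raise RuntimeError("Broyden iteration did not converge")


def toy(x):
    """Hypothetical two-equation system with a root at (1, 2)."""
    return [x[0] + x[1] - 3.0, x[0] * x[1] - 2.0]


root = broyden(toy, [0.8, 2.3])
print(root)
```

The appeal of the method for a system of this kind is that each iteration needs only one new evaluation of the equations, which matters when every evaluation involves the full set of equilibrium and optimality conditions.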
Our numerical procedure for the optimal policy is similar: with the parameters and initial conditions given, we compute the transition and BGP simultaneously. The optimal-taxation problem generally depends on initial conditions, and it is thus not possible to compute a steady state or BGP independently of the transition. We use the Broyden method to solve a system of 24 equations and 24 variables per period (the cap on profit taxes is never binding, so we do not include the constraint and Lagrange multiplier in this), for T = 25 periods. This corresponds to a time frame of 500 years and is enough for (approximate) convergence to a BGP. We force variables at period T + 1 to be equal to their counterparts in period T, adjusted for growth.
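The terminal condition can be illustrated with a small Python sketch. It assumes, as on a BGP, that some variables (such as consumption and capital) grow at a common factor per period while others (such as hours worked) are constant; the variable names, values, and growth factor below are purely illustrative and not taken from our computations.

```python
def terminal_values(last_period, growth_factor, grows):
    """Values imposed at T + 1, given the values at T.

    Variables flagged in `grows` are scaled by `growth_factor`;
    all other variables are carried over unchanged.
    """
    return {
        name: value * (growth_factor if grows[name] else 1.0)
        for name, value in last_period.items()
    }


# Hypothetical period-T values and a gross growth factor of 1.5 per period.
last = {"C": 2.0, "K": 5.0, "L": 1 / 3}
grows = {"C": True, "K": True, "L": False}
closed = terminal_values(last, 1.5, grows)
print(closed)
```

Closing the system in this way supplies the period T + 1 values that the period-T equations refer to, so that the stacked system has as many equations as unknowns.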
We provide the details in our computational appendix.

Baseline Results
For our baseline calibration, we show our results in Figure 1. Capital taxes in period zero are at their exogenously imposed level, 40%, drop to 8.5% in the first period, and by the second period the convergence to the BGP level of -17.6% is almost complete. The reason why capital is not more strongly taxed in the first period is that the period length is twenty years, which means that the anticipation effects are very strong. Unlike in Atkeson et al. (1999), capital taxes do not immediately converge to their BGP level after one intermediate period, since optimal capital taxes also depend on growth: the difference between the private and the social returns to capital is a function of innovation activity. Notice that the economy does indeed converge to a non-degenerate BGP, meaning that the phenomenon described in Straub and Werning (2020) does not arise.

Varying the Research Externality, Elasticity of Substitution, and Value of Public Funds
We now turn to examine the importance of our calibration choices for our results.
As our theoretical work has shown, the intraperiod research externality Λ plays an important role, as do the elasticity of substitution between varieties, 1/(1 − σ), and the value of public funds ψ − θ, which is mainly influenced by the parameter g and hence our calibration target for the ratio of government revenues to GDP. In the following exercises, we recalibrate all parameters. We start with the time path of optimal fiscal policy for a higher value of Λ and then turn to the innovation taxes on a BGP for various combinations of Λ, σ, and the calibration target for the ratio of government revenues to GDP.
As shown in Figure 3, the temporal pattern stays the same when we increase the value of Λ to 0.75 as compared to 0.5 in the baseline, but the profit taxes are now positive. Labour taxes are significantly lower, since the subsidies to research no longer have to be financed, and instead some revenues are generated from profit taxes. Capital taxes, on the other hand, are very similar to the baseline, with two minor differences: (i) the capital tax in period one is a bit lower at 5.4%, because the need to tax the initial capital stock is lessened, due to the revenues from profit taxes; (ii) capital taxes on the BGP are a bit higher, at -17.0%, since growth is lower, hence the fraction of monopolistic firms is lower, and thus the difference between the social and private returns to capital is smaller. The first observation we can make is that profit taxes are higher, the stronger the preference for public spending. This may seem like an obvious result, but it is worth repeating in this context that long-run capital taxes do not depend on this preference. The optimal capital taxes do vary, but this is because the difference between the marginal product of capital and the private returns to investors depends on the profits by patent-holders. These profits depend in turn on the degree of substitutability between varieties, but also the tax policy, in particular profit taxes.
Capital taxes thus follow qualitatively the same pattern as profit taxes, but as our analytical derivations made clear, this same pattern is due only to the fact that it affects the difference between public and private returns to capital. Moreover, the quantitative impact is small compared to profit taxes: BGP capital taxes vary for all these different parameter values between -16.8% at their highest and -20.8% at their lowest, while profit taxes are -133.8% at their lowest and 87.4% at their highest.
What explains this difference? Both technology and capital accumulate over time, but profit taxes are taxing the time endowment of inventors/entrepreneurs, while capital taxes tax the initial capital endowment; after the initial taxation of the capital endowment, capital taxes are only used to achieve the modified golden rule, and are thus independent of public spending preferences, subject to the same caveat as in the previous paragraph. Since there is no capital endowment beyond the initial one, there is no tradeoff between taxing an endowment and efficiency for capital taxes, and long-run capital taxes are purely used for maximum efficiency. For profit taxes, there is no initial endowment of patents to tax, but the time endowment of inventors can be taxed in every period, including on the BGP, at the expense of a reduced efficiency in research activity. Profit taxes thus depend on the preference for public spending, as there is always a tradeoff between taxing an endowment and efficiency -profit taxes are in this regard more akin to labour taxes than capital taxes.
The second observation is that the higher the negative intra-period research externality Λ, the higher are profit taxes. This is immediately intuitive. At the extreme, as Λ approaches one, the aggregate number of new varieties is independent of the time spent by inventors, and inventors are merely fighting over who patents it first.
In this case, profits by inventors are economic profits and should be taxed away completely; if we had an exogenous growth model where some firms have market power (corresponding to the limit of Λ → 1), a tax on profits is a lump-sum tax. As the graphs illustrate, the desire for public spending is also less important, the higher Λ is; this is because profit taxes distort less the number of new varieties invented, the higher Λ is. At the limit of Λ → 1, there is no distortion at all, and profit taxes of 100% are optimal, no matter what the public spending preferences are.
The third observation is perhaps a little counter-intuitive at first: profit taxes are increasing in the degree of substitutability between goods, which implies that as firms with patents have more market power and charge larger markups, profit taxes are lower. At first sight, this seems exactly the opposite of what a government should do: casual intuition would suggest that the higher the profits due to market power, the higher the profit taxes. In our model, a lower elasticity of substitution between varieties (and hence more monopoly power for patent holders) implies that a new variety adds more to the final good than with a higher elasticity. At the limit, with perfect substitutes, a new variety does not improve final-good productivity at all.
So far, we can thus state that a lower elasticity leads to more private gains for the inventor (through higher profits), but also to higher social gains (via increased productivity). Profit taxes are increasing in the degree of substitutability, because inventors reap the private profits for only one period, the twenty years of the patent length, whereas the benefits last forever.

Welfare Gains from Optimal Policy
We now explore the welfare effects of optimal policy for various parameter values and different pre-initial capital and innovation taxes. The pre-initial BGP is calculated in exactly the same way as described earlier for the calibration, which means that we recalibrate all the parameters in each of the scenarios that we are considering. As of some period t = t 0 , the government may choose policy optimally (though the initial period's capital tax is given by history to avoid a confiscatory rate). Our welfare gains are computed by comparing the utility of the representative agent with an optimal fiscal policy to the utility if the economy stayed on the pre-initial BGP. We express these welfare gains as the permanent percentage level increase in private and public consumption required to make the pre-initial BGP achieve the same utility as the path implied by optimal policy.
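To make the welfare measure concrete, consider a sketch under a specific assumption that is ours for illustration: period utility is logarithmic in private consumption and in public consumption, with weight `g_weight` on the latter, and separable from labour and effort. Scaling both consumption paths permanently by a factor 1 + λ then raises lifetime utility by (1 + g_weight) · log(1 + λ)/(1 − β), which can be inverted for λ. The function name and all numbers below are hypothetical.

```python
import math


def consumption_equivalent(w_optimal, w_baseline, beta, g_weight,
                           include_public=True):
    """Permanent proportional consumption increase that closes a welfare gap.

    Assumes period utility log(c) + g_weight * log(G) plus separable terms,
    so scaling consumption by (1 + lam) shifts lifetime utility by
    scale * log(1 + lam) / (1 - beta).
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    scale = 1.0 + g_weight if include_public else 1.0
    return math.exp((1.0 - beta) * (w_optimal - w_baseline) / scale) - 1.0


# Round trip with illustrative numbers: build the utility gap implied by a
# 6% joint increase, then recover it and compare with the private-only case.
beta, g_weight, true_gain = 0.5, 0.25, 0.06
gap = (1.0 + g_weight) * math.log(1.0 + true_gain) / (1.0 - beta)
joint = consumption_equivalent(gap, 0.0, beta, g_weight)
private_only = consumption_equivalent(gap, 0.0, beta, g_weight,
                                      include_public=False)
print(round(joint, 4), round(private_only, 4))
```

Under this assumption, the same utility gap translates into a larger number when expressed in private consumption only, because private consumption alone must then deliver the whole gain.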
In order to be able to make a reasonable comparison to the welfare gains from optimal policy one would obtain in an exogenous growth model, we consider an economy in which all firms are competitive, there is no skilled labour H, and in which the technological growth rate is fixed. For details, see Appendix E. The calibration for this economy follows the same logic and the same targets as for the endogenous growth model. In all of the following, the capital taxes are set to 40% on the pre-initial BGP.
We start with our baseline profit taxes of 40% on the pre-initial BGP and show the results in Figure 5. The welfare gains in an exogenous growth model are naturally constant as we vary the parameters Λ and σ, at 1.8%. In our endogenous growth model, the welfare gains from optimal policy depend very much on the parameters.
For lower levels of Λ and σ, the welfare gains are decreasing in Λ and σ, but then the relationship reverses. What can account for this non-monotonic relationship?
It comes from the profit taxes on the pre-initial BGP: the closer these taxes are to the optimal profit taxes on the optimal BGP, the lower are the welfare gains from optimal policy. Profit taxes of 40% on the BGP are optimal when Λ is about 0.8, so that is roughly where we see the turning point. 14 That is also roughly the point where the role of σ reverses. 15 When profit taxes on the pre-initial BGP are suboptimally high, the gains from lowering them to the optimal level and increasing growth are larger the lower σ is (since a lower σ implies larger gains from innovation).
However, when profit taxes on the pre-initial BGP are suboptimally low, the optimal tax reduces the growth rate compared to the pre-initial BGP, and hence the welfare gains are smaller when σ is lower.
Another observation we can make is that in our endogenous growth model, the welfare gains from optimal policy are larger than in the exogenous growth model, and in some cases an order of magnitude larger. In our benchmark specification, with Λ = 0.5 and σ = 6/7, the welfare gains are 6.0%. In the most optimistic case, when Λ = 0.05 and σ = 4/5, the welfare gains are 26.3%, more than 14 times larger than with exogenous growth; at the same time, the most pessimistic case yields welfare gains that are in the vicinity of those in an exogenous growth model, 2.8%.
This last result is to a large extent due to our assumption that there is no initial stock of patents to tax; in a model as in Romer (1990), where patents are infinitely-lived, there would be substantial welfare gains from expropriating the owners of existing patents.
14 It is tempting to think that the turning point should be precisely where optimal profit taxes on the BGP are 40%, but that is not quite correct. Optimal profit taxes also depend on capital taxes and the level of debt, which differ on the optimal BGP as compared to the pre-initial BGP.

15 Again, this is not precisely the point of reversal, as the optimal profit tax on the BGP also depends on the value of σ.

We now alter the profit taxes on the pre-initial BGP, to 0% in Panel (a) and 80% in Panel (b) of Figure 6. When pre-initial profit taxes are low, the potential gains from optimal fiscal policy for low levels of Λ are naturally diminished, as there is not much to be improved on. When pre-initial profit taxes are high (and 80% already seems like a fairly high number), the potential welfare gains from optimal policy are much larger; in this case, and with the most optimistic values of Λ and σ, the welfare gains rise to an astonishing 63.6%. This figure is expressed in terms of public and private consumption; expressed as an equivalent rise in private consumption only, it would be 87.5%.
The welfare gains from optimal fiscal policy can thus reach remarkable heights. 16

16 We have used a conservative estimate for the Frisch elasticity of labour supply, 0.5. For a higher value of the elasticity, 1.0, the welfare gains almost double for values of Λ close to 0; for values of Λ close to 1 the welfare gains are essentially the same. The intuition is that, for lower values of Λ, optimal policy increases the incentives to innovate by reducing the taxes on innovative labour. The higher the Frisch elasticity of labour supply, the larger the labour supply effects of these incentives. When research congestion is not too strong, this leads to an increase in the growth rate. While the welfare effects of a higher Frisch elasticity may be substantial, optimal policy remains essentially the same as with a lower elasticity. The long-run growth rate under optimal policy is 2.21% with an elasticity of 1.0, as compared to 2.14% with an elasticity of 0.5.

Finally, what are the implications of optimal policy for growth rates? In Figure 7 we show the economy growth rates as a function of Λ, and the two different values of σ, when profit taxes on the pre-initial BGP are 40%. The point where optimal growth rates are starting to be below the exogenous growth rates corresponds (roughly) to the turning point mentioned above, where the optimal profit taxes on the BGP are higher than the profit taxes on the pre-initial BGP. In our benchmark case of Λ = 0.5 and σ = 6/7, the growth rate increases to 2.14% per annum, up from 2.00% in the exogenous growth case. In the most optimistic case, the annual growth rate would be 2.62%, which is a more substantial increase.
Not surprisingly, the welfare gains from optimal policy accrue mostly in the long run. When the pre-initial profit taxes are 80%, the long-run growth rate can be as high as 3.17% (this result is not shown in the graphs), while the exogenous growth rate remains at its calibrated value of 2.00%.
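To illustrate why seemingly small differences in annual growth rates matter mostly in the long run, the sketch below compounds the growth rates quoted above and reports the implied output level relative to the exogenous-growth path. It is only a back-of-the-envelope comparison of balanced growth paths that start from the same level: it ignores the transition, and the 100-year horizon is a hypothetical choice made for illustration.

```python
def level_after(annual_rate, years):
    """Level reached after `years`, starting from 1, at a constant annual growth rate."""
    return (1.0 + annual_rate) ** years


# Annual long-run growth rates quoted in the text.
RATES = {
    "exogenous": 0.0200,
    "benchmark": 0.0214,
    "most_optimistic": 0.0262,
    "high_pre_initial_profit_tax": 0.0317,
}

HORIZON_YEARS = 100  # hypothetical horizon, chosen only for illustration

# Level on each path relative to the exogenous-growth path at the horizon.
relative_level = {
    name: level_after(rate, HORIZON_YEARS)
    / level_after(RATES["exogenous"], HORIZON_YEARS)
    for name, rate in RATES.items()
}

for name, rel in relative_level.items():
    print(f"{name}: {rel:.2f} times the exogenous-growth level")
```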

Conclusion
We have studied optimal taxation in a variant of the Romer (1990) model. We found that capital should be subsidized in the long run and that whether innovation should be subsidized or not depends on (i) government spending needs, (ii) the extent to which innovative activity is merely rent-seeking, and (iii) the substitutability between different varieties of intermediate goods. For a reasonable baseline specification, innovation should be subsidized rather than taxed. We also found that the welfare gains from optimal reform are much greater than in an otherwise similar exogenous growth model.
Our findings provide a new perspective on the results of Lockwood et al. (2017), who consider corrective taxation in a world consisting of several professions, each of which comes with its unique combination of non-pecuniary benefits and (negative or positive) externalities, i.e. the extent to which they are over- or underpaid relative to their contribution to society. In particular, they assume that research-related activities are not fully rewarded relative to their contribution to society. They conclude that enormous gains are associated with subsidizing research professions.
We are not necessarily unsympathetic to that conclusion, and it may be correct in the light of our model too. Nevertheless, our work provides a warning against jumping to the conclusion that an activity should be subsidized from the observation that it yields enormous gains for society as a whole. In our environment, innovative activity is the ultimate source of long-term growth, and so it yields enormous benefits to society. Nevertheless, because of the crowding effect that we allow for, these benefits may to a large extent be given and unresponsive to the total amount of time and effort devoted to innovation, in which case innovative activity should be taxed rather than subsidized. In any case, we view their work as complementary with ours, each providing, we hope, a useful perspective on the other.
Our work suggests a number of avenues for further research. For instance, in our model, the household consists of an innovator and a worker who share consumption. Suppose instead that households are heterogeneous; some are innovators, some are workers. How would that affect our conclusions? There would then likely be a trade-off between efficiency and equity, to the extent that innovators earn on average more than workers. Moreover, we assumed that ideas are exclusive for one period, which in our model represents 20 years. An interesting question that we did not address is whether this patent length is optimal. What are the trade-offs that arise from shortening or lengthening patent lives, especially in conjunction with optimal fiscal policy? Could everyone be made better off by extending or shortening patent life? Finally, how could our framework be extended to deal with a multi-country world? In such an extended framework, how would tax competition between governments affect growth?
The Romer (1990) model has been criticized for being at odds with the data, in particular because it exhibits a strong scale effect: a doubling of the population would more than double the growth rate. In our model, the scale effect is present, but much weaker, due to our assumption of research congestion. Jones (1995) proposes a modification of the Romer model that eliminates the scale effect. The long-run growth rate is then by construction independent of policy. Nevertheless, it does not follow that policy analysis in such an environment is meaningless. Policy may (and probably does) matter along the transition path of such a model. Since we are able to compute transitions in the context of our model, there is no major conceptual or practical difficulty facing a numerical analysis of optimal policy in a semi-endogenous growth model in the spirit of Jones (1995). We leave this for future research. Another possible avenue for future research is to quantitatively study optimal fiscal policy in a model of creative destruction such as Aghion and Howitt (1992).
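The way research congestion dampens the scale effect can be illustrated with a minimal sketch. It assumes that the growth rate of the stock of ideas takes the form η H^(1−Λ), as the patent production function (Equation (8)) and the normalized system in Appendix A suggest, and that a doubling of the population doubles innovative labour H. The value η = 1 and the function names are hypothetical and chosen for illustration only.

```python
def idea_growth_rate(H, Lam, eta=1.0):
    """Growth rate of the stock of ideas, assumed to be eta * H**(1 - Lam)."""
    return eta * H ** (1.0 - Lam)


def scale_factor(Lam, multiple=2.0):
    """Factor by which the growth rate rises when H is scaled by `multiple`."""
    return idea_growth_rate(multiple, Lam) / idea_growth_rate(1.0, Lam)


# Congestion parameter Lambda: 0 means no congestion, 1 means full congestion.
for Lam in (0.0, 0.05, 0.5, 0.8, 1.0):
    print(f"Lambda = {Lam:.2f}: doubling H scales growth by {scale_factor(Lam):.3f}")
```

Under these assumptions, the scale effect disappears as Λ approaches 1 and is at its strongest when Λ = 0, where doubling H doubles the growth rate.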

Appendix A: Decentralized equilibrium
The following system of equations has to be satisfied at each time period t: We then have the following set of variables: It can be convenient to normalize the variables in such a way that they are constant on a balanced growth path (BGP). The variables r t , R t , L t , and H t are all constant on a BGP without normalization; the variables w t , K t , C t , and A t grow at the rate G t := ( Z t /Z t ) (1−σ)/(σ(1−α)) ; Z t grows at the rate Z t /Z t ; finally, Π z,t and P t grow at the rate ( Z t /Z t ) (1−σ(2−α))/(σ(1−α)) . The resulting system of equations with appropriately scaled variables, where the intertemporal elasticity of substitution is 1/ς, is then r t = α(K t /L t ) α−1 1 + Z t σ σ/(1−σ) (1−σ)/σ , and Z t = H 1−Λ t η.

Appendix B: Derivations
The Lagrangian for a monopolist is As is well known, the Lagrange multiplier is equal to marginal cost, hence the notation. The first-order conditions are Dividing Equation (31) by (32), solving for K z,t /L z,t , and re-inserting into (31) we We can now write profits as Final output is given by We now use the fact that Y t = r t K t + w t L t + Π t Z t together with the results above to obtain

Appendix C: Proofs
Proposition 1 If the economy converges to a non-degenerate BGP, then the modified golden rule holds. The capital share, net of taxes, of total income is given by

Proof.
Public consumption G t grows at a constant, non-negative rate on a non-degenerate BGP, hence the marginal utility of public consumption evolves according where G t is the rate of growth of output (as well as public and private consumption and capital), and the intertemporal elasticity of substitution is 1/ς. The first-order condition with respect to G t is so we then have From the household's intertemporal optimality condition, Equation (14), we know that The first-order condition with respect to B t+1 is which leads us to conclude that ω t+1 = 0 on a non-degenerate BGP. We also know that ϕ t = 0 on such a BGP, otherwise R t = 0 and hence $1 > \beta = (1 + G_t)^{\varsigma} > 1$, a contradiction; and clearly, φ t = 0, otherwise G t = 0. The first-order condition with respect to K t is From here on, we suppress the arguments of r t , w t , and Π t for notational convenience. Now we can combine this with the first-order conditions with respect to τ K,t , τ L,t , and τ Z,t (and using the fact that ϕ t = φ t = 0 on a non-degenerate BGP), to arrive at We can now use the identity that Y t = r t K t + w t L t + Π t Z t and hence to conclude that In words, the after-tax return on capital is equal to the marginal product of capital.
Together with the household's intertemporal optimality condition, Equation (14), this implies that the modified golden rule holds: Multiplying Equation (45) by K t /Y t yields Equation (26).
Corollary 1 Capital taxes on a non-degenerate BGP are negative.
Proof. In order to equate the private returns to capital to the marginal product of capital, a corrective negative tax is necessary, since $\partial Y / \partial K_t \neq r_t$. The tax is given by Since $\sigma < 1$, it follows that $\tau_{K,t} < 0$.
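The sign argument can be made explicit with a short calculation. The following is only a sketch, under the assumption (made here for illustration) that the after-tax return on capital takes the form $(1-\tau_{K,t})\,r_t$ with $r_t > 0$; it does not reproduce the closed-form expression for the tax in terms of σ.

```latex
(1-\tau_{K,t})\, r_t = \frac{\partial Y}{\partial K_t}
\quad\Longrightarrow\quad
\tau_{K,t} = 1 - \frac{\partial Y / \partial K_t}{r_t},
\qquad\text{so that}\qquad
\tau_{K,t} < 0 \iff \frac{\partial Y}{\partial K_t} > r_t .
```

Under this assumption, a negative capital tax is needed exactly when the private return to capital falls short of its marginal product.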
Proposition 2 If the economy converges to a non-degenerate BGP and if cross-derivatives of the period utility function are identically equal to zero, the labour share, net of taxes, of total income is given by .

Proof.
When cross-derivatives of the period utility function are identically equal to zero, the first-order condition with respect to labour at time t is Making use of the optimality conditions with respect to the tax instruments, Equations (40) to (42), and assuming a non-degenerate BGP, Equation (48) becomes

$$\psi_t\left(\frac{\partial r}{\partial L_t}K_t + \frac{\partial w}{\partial L_t}L_t + \frac{\partial \Pi}{\partial L_t}Z_t\right) + \psi_t\tau_{L,t}w_t + \theta_t(1-\tau_{L,t})w_t = -\beta^t u_{L,t} - \mu_t u_{LL,t}. \tag{49}$$

We use again the identity that $Y_t = r_t K_t + w_t L_t + \Pi_t Z_t$ and thus to transform Equation (49) into Since $\mu_t = (\psi_t - \theta_t)L_t/u_{C,t}$ and $u_{L,t} = -(1-\tau_{L,t})w_t u_{C,t}$, this becomes Dividing everything by $\beta^t u_{C,t}$ and using the definitions for $\hat\psi_t = \psi_t/(\beta^t u_{C,t})$ and $\epsilon_{L,t} = u_{LL,t}L_t/u_{L,t}$, we obtain Dividing both sides by $Y_t$ and $\hat\psi_t$, multiplying by $L_t$, and doing simple algebraic manipulations completes the proof: .
Lemma 1 If the economy converges to a non-degenerate BGP and if cross-derivatives of the period utility function are identically equal to zero, as long as lump-sum taxes are unavailable, then $\psi_t > \theta_t$, and $1/\hat\psi_t + 1 - \hat\theta_t/\hat\psi_t > 1$.

Proof.
It is natural to think that ψ t − θ t > 0, and it is easy to show that this indeed must be true in the absence of lump-sum taxes. If we introduced lump-sum taxes ϑ t , they would show up in the government and household budget constraint; if we furthermore introduced a constraint that limits these taxes, ϑ t ≤ ϑ, with a Lagrange multiplier ς t , then the first-order condition with respect to ϑ t is ψ t − θ t − ς t = 0.
Hence, as long as the constraint is binding, ψ t − θ t > 0.
Multiplying both sides of the inequality in the Lemma by $\hat\psi_t > 0$ yields $\hat\theta_t < 1$.
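The algebra behind this step can be spelled out as follows. This is only a sketch: the hats denote the multipliers normalized by $\beta^t u_{C,t}$, a notation assumed here for readability.

```latex
\frac{1}{\hat\psi_t} + 1 - \frac{\hat\theta_t}{\hat\psi_t} > 1
\;\Longleftrightarrow\;
1 + \hat\psi_t - \hat\theta_t > \hat\psi_t
\;\Longleftrightarrow\;
\hat\theta_t < 1,
\qquad \text{given } \hat\psi_t > 0 .
```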
Using the definition of $\hat\theta_t$, this is equivalent to $\theta_t < \beta^t u_{C,t}$. The first-order condition with respect to consumption at time t, assuming that the cross-derivatives of the period utility function are identically equal to zero, is On a BGP, $\zeta_t = \beta\zeta_{t-1}(1 + G_t)$, so we can rewrite Equation (54) as Since $u_{CC,t} < 0$ and $R_t > G_t$ on a non-degenerate BGP (and all prices and quantities are naturally positive), if the Lagrange multipliers $\mu_t$, $\lambda_t$, and $\zeta_{t-1}$ are all nonnegative, then the inequality is proven. If $R_t$ were smaller than or equal to $G_t$, then the household could increase consumption by decreasing assets, i.e. it would violate the transversality condition. Making use of the optimality condition for labour taxes, Equation (41), it follows that The first-order conditions with respect to profit taxes, Equation (42), and with respect to the patent price, $\theta_t(\Delta Z)_t + \lambda_t \eta u_{C,t} Z_t H_t^{-\Lambda} - \Upsilon_t = 0$, imply together with the production of patents, Equation (8), that $\lambda_t = (\psi_t - \theta_t)H_t/u_{C,t} > 0$.
Finally, the first-order condition with respect to the net interest rate R t is ψ t B t + θ t A t + ζ t−1 βu C,t − γ t = 0.
Combining this with the optimality condition for capital taxes, Equation (40), and making use of the asset-market clearing condition, Equation (16), we get

Proposition 3 If the economy converges to a non-degenerate BGP and if cross-derivatives of the period utility function are identically equal to zero, the profit share, net of taxes, of total income is given by .

Proof.
As above, we make use of the three first-order conditions with respect to the tax instruments, Equations (40) to (42), and assuming a non-degenerate BGP: ψ t ∂r ∂Z t K t + ψ t ∂w ∂Z t L t + ψ t ∂Π ∂Z t Z t + λ t ηu C,t P t H −Λ t + Ω t ηH 1−Λ t + κ t − κ t−1 = 0.
In order to substitute out λ t , we employ the first-order condition with respect to profit taxes, Equation (42), and with respect to the patent price, Equation (57), to arrive at We make use of the production of patents, Equation (8), and the marginal product which implies that ψ t ∂Y ∂Z t + (ψ t − θ t ) P t Z t Z t + Ω t Z t Z t + κ t − κ t−1 = 0.
A non-degenerate BGP where the cross-derivatives of the period utility function are identically equal to zero implies an elasticity of intertemporal substitution of one. It follows that κ t−1 = κ t 1 + Z t Z t /β, and so In order to substitute out the Lagrange multiplier κ t , we have to turn to the first-order condition for Z t : ψ t τ K,t ∂r Z t K t + ψ t τ L,t ∂w Z t L t + ψ t τ Z,t ∂Π Z t Z t + θ t (1 − τ L,t ) ∂w Z t L t + µ t (1 − τ L,t ) ∂w Z t u C,t + γ t ∂r Z t (1 − τ K,t ) + Υ t ∂Π Z t (1 − τ Z,t )+ ψ t τ Z,t Π t + θ t P t + κ t − Ω t = 0.
Making use again of Equations (40) to (42), this becomes: ψ t ∂r Z t K t + ψ t ∂w Z t L t + ψ t ∂Π Z t Z t + ψ t τ Z,t Π t + θ t P t + κ t − Ω t = 0.
The marginal product of Z t is ∂Y ∂ Z t = ∂r and we thus have κ t = −ψ t ∂Y ∂ Z t + (ψ t − θ t ) P t + Ω t .
In order to substitute out the Lagrange multiplier Ω t , we employ the first-order condition with respect to H t (noting that we are considering the case where crossderivatives of the period utility function are identically equal to zero): Using Equations (42), (8), and (57) this becomes Combining Equations (66), (70), and (72), we obtain Now we use the fact that u H,t = −u C,t P t Z t /H t to rewrite this as Dividing everything by β t u C,t and using the definitions for ψ t = ψ t β t u C,t , θ t = θ t β t u C,t , and ϵ H,t = u HH,t H t u H,t , we obtain Dividing both sides by Y t and ψ t and performing simple algebraic manipulations, while noting that P t = π t (1 − τ z,t ), completes the proof: .

Appendix D: Non-zero cross derivatives
The condition for optimal capital taxes is not affected by non-zero cross derivatives in the utility function. The conditions for optimal labour and profit taxes do change, however, and we state them here. The proofs proceed along the lines of the proofs above, so we omit them.

Proposition 2'
If the economy converges to a non-degenerate BGP, the labour share, net of taxes, of total income is given by where $\epsilon_{CL,t} := u_{CL,t} L_t / u_{C,t}$ is the cross-elasticity of the marginal utility of consumption and working more, while $\epsilon_{LC,t} := u_{LC,t} C_t / u_{L,t}$ is the cross-elasticity of the marginal disutility from working and consuming more, and $\epsilon_{LH,t} := u_{LH,t} H_t / u_{L,t}$ is the cross-elasticity of the marginal disutility from working and innovating more.

Proposition 3'
If the economy converges to a non-degenerate BGP the profit share, net of taxes, of total income is given by 1/ ψ t + 1 − θ t / ψ t 1 + ϵ H,t − ϵ CH,t + ϵ HL,t + ϵ HC,t where β t = β(1 + G t ) 1−ς , ϵ CH,t := u CH,t · H t u C,t is the cross-elasticity of the marginal utility of consumption and innovating more, while ϵ HC,t := u HC,t · C t u H,t is the cross-elasticity of the marginal disutility from innovating and consuming more, and ϵ HL,t := u HL,t · L t u H,t is the cross-elasticity of the marginal disutility from innovating and working more.
Note that the discount factor is different here, which stems from the fact that the Lagrange multiplier dynamics over time are somewhat different when the intertemporal elasticity of substitution is not equal to one. In particular, the growth rate of κ t is then β(1 + G t ) 1−ς /(1 + Z t /Z t ).

Appendix E: Exogenous Growth Model
There are no intermediate goods and the production function of the final good is given by where Z t is the exogenously specified labour productivity in period t. Technological growth is constant, so we have Z t+1 = (1 + GR)Z t , where we set GR so that the annual growth rate is 2% as in our baseline calibration.
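As a minimal sketch of this calibration step, the code below converts the 2% annual growth rate into a per-period growth rate GR and iterates the law of motion for Z. It assumes that one model period represents 20 years, as in the main model, and that the 2% annual rate applies directly to Z; the function names and the initial value of Z are hypothetical.

```python
YEARS_PER_PERIOD = 20  # assumption: same period length as in the main model
ANNUAL_GROWTH = 0.02   # annual growth rate targeted in the baseline calibration


def per_period_growth(annual_rate, years_per_period):
    """Compound an annual growth rate into a growth rate per model period."""
    return (1.0 + annual_rate) ** years_per_period - 1.0


GR = per_period_growth(ANNUAL_GROWTH, YEARS_PER_PERIOD)


def productivity_path(z0, gr, n_periods):
    """Iterate Z_{t+1} = (1 + gr) * Z_t starting from z0 (hypothetical initial value)."""
    path = [z0]
    for _ in range(n_periods):
        path.append((1.0 + gr) * path[-1])
    return path


print(f"GR per {YEARS_PER_PERIOD}-year period: {GR:.4f}")
print(productivity_path(1.0, GR, 3))
```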
The Lagrangian is with the following set of choice variables: We also have to keep in mind that r t and w t are functions of K t and L t :