Economic impacts of AI-augmented R&D

Since its emergence around 2010, deep learning has rapidly become the most important technique in Artificial Intelligence (AI), producing an array of scientific firsts in areas as diverse as protein folding, drug discovery, integrated chip design, and weather prediction. As more scientists and engineers adopt deep learning, it is important to consider what effect widespread deployment would have on scientific progress and, ultimately, economic growth. We assess this impact by estimating the idea production function for AI in two computer vision tasks that are considered key test-beds for deep learning and show that AI idea production is notably more capital-intensive than traditional R&D. Because increasing the capital-intensity of R&D accelerates the investments that make scientists and engineers more productive, our work suggests that AI-augmented R&D has the potential to speed up technological change and economic growth.


Introduction
In this paper, we consider what effect the adoption of Artificial Intelligence (AI) within science and engineering will have on idea production and, subsequently, on productivity and economic growth. Unlike previous work that has provided only a theoretical treatment of the topic, we approach this question with microdata from deep learning, the AI paradigm responsible for nearly all landmark results in the past decade. We provide a framework for understanding the impact of two important trends: i) the recent breakthroughs using deep learning in R&D, and ii) the rapid scaling of computation in deep learning systems. We show that if deep learning is widely adopted in the U.S. R&D sector, it would induce an accumulation of computational capital that could nearly double the productivity growth rate.
Since the early 2010s, when it produced seminal breakthroughs in computer vision and speech recognition, deep learning has led to a rapid increase in the rate of progress in Artificial Intelligence (LeCun et al. 2015; Goodfellow et al. 2016, Ch. 1; Russell and Norvig 2020, Ch. 1.3). Breakthroughs have been made in many areas, including computer vision, speech recognition, natural language processing, and game playing. Deep learning has also made inroads into parts of science largely untouched by previous AI research, including protein folding, semiconductor chip floorplanning, controlling nuclear fusion, and even discovering novel algorithms and new insights in pure mathematics. The rate at which long-standing problems have been solved, and the pace at which deep learning systems have out-competed traditional algorithms, have surprised even some of the field's most seasoned practitioners.
As uses of AI proliferate, economists have sought to understand its impacts on wages, factor shares, and economic growth. A prominent line of thought asks whether deep learning has the potential to become a General Purpose Technology, that is, a technology with widespread applications across a variety of industries and the ability to replace human labour across a wide variety of tasks (Goldfarb et al. 2022; Agrawal 2022; Trajtenberg 2018). Much of the existing research has focused on the potential of AI to impact final good production, but it has also been pointed out (e.g. by Cockburn et al. 2019) that AI has the potential to change the innovation process itself. Such "Inventions of a Method of Invention" (IMI) can significantly affect the rate of idea production (Crafts 2021; Cockburn et al. 2019) and, therefore, the overall rate of innovation in the economy. For example, building on the Weitzman 1998 model of recombinant technological development, Agrawal et al. 2019 argue that deep learning can improve knowledge production by effectively searching through and recombining a wider range of ideas than is possible for human scientists, and that this could result in accelerated economic growth. Empirical testing of the impact of AI on R&D shows mixed results. Bianchini et al. 2020 find that the use of deep learning is positively correlated with the mean and variance of paper citations received, increasing the likelihood that a contribution becomes an influential 'big hit.' However, they also find that it is negatively correlated with the re-combinatorial novelty of ideas, measured as a function of the fraction of novel citation pairs in a given paper. Another line of research focuses on the relationship between AI and data, showing that machine learning increases the returns to data and thus the rate of knowledge production for data-rich firms (Beraja et al. 2020; Abis and Veldkamp 2020; Agrawal et al. 2018). While these insights are informative about firm-level effects, they shed less light on the implications for the aggregate economy.
The impacts of AI on the innovation process deserve special attention because, under suitable conditions, they can have more dramatic permanent effects on productivity growth than those that arise from changes in final goods production. For example, in the semi-endogenous growth model of Aghion et al. 2019, the authors consider AI automation in producing final goods and in producing knowledge, and find that the latter can produce much more rapid output growth. Trammell and Korinek 2020 review the theoretical literature on AI and growth and conclude that, while a high degree of automation in final goods production can produce a one-time increase in the growth rate, a high degree of automation in the R&D sector can produce unbounded increases in economic growth.
Our work investigates how deep learning will impact the production of ideas. We argue that the adoption of deep learning makes computational capital in R&D more productive, resulting in capital deepening that, if widespread, accelerates knowledge creation and economic growth. To motivate this, we derive a semi-endogenous growth model that shows that a positive shock to the R&D elasticity of capital (such as might follow the widespread adoption of deep learning techniques) permanently increases the rate of idea accumulation and economic growth.
But will deep learning increase the R&D elasticity of capital? We provide supportive empirical evidence by estimating the idea production functions for two relatively mature deployments of deep learning. To analyze human capital in deep learning, we develop a novel machine learning approach for estimating human capital and apply it to machine learning papers in the arXiv repository. To analyze computing resources, we augment the dataset from Thompson et al. 2020 to cover the entire universe of papers on two popular computer vision tasks.
Our estimates of the deep learning production function allow us to compare AI-augmented R&D with the R&D practiced in U.S. science and engineering areas. We find that deep learning's idea production function depends notably more on capital. This greater dependence implies that more capital will be deployed per scientist in AI-augmented R&D, boosting scientists' productivity and the economy more broadly. Specifically, our point estimates, when analysed in the context of a standard semi-endogenous growth model of the US economy, suggest that AI-augmented areas of R&D would increase the rate of productivity growth by between 1.7- and 2-fold compared to the historical average rate observed over the past 70 years.
Our analysis is organized as follows. Section 2 motivates the importance of R&D capital intensity for economic growth using a semi-endogenous growth model. Section 3 describes the datasets, and Section 4 the empirical strategy we use to model and estimate idea production. A key input for these models is an estimate of the human capital of the teams working on particular AI projects. In Section 5, we develop a deep neural network for learning simple representations of human capital that outperforms other measures commonly used in the scientometric literature. For example, our human capital estimates explain 60-80% of the variance in key publication-related outcomes, whereas standard linear models explain less than 20%. In Section 6, we present our empirical analysis, which implies that a firm in a competitive R&D sector using deep learning would be roughly 5 times more capital-intensive than current U.S. STEM R&D. In Section 7, we use our growth theory model to investigate the implications of higher capital intensity of R&D, and find that it implies a substantially faster productivity growth rate, 2- to 3-fold greater than the 0.8% growth rate the US saw over the last decade. In Section 8, we find that our results are robust to outliers and alternative model specifications, including alternative elasticities of substitution between computational and human capital. In Section 9, we consider limitations of our analysis as well as future directions for research, and in Section 10 we conclude with a brief discussion of the implications of our results.

The role of capital in idea production
We argue that deep learning may affect the growth rate of knowledge by impacting the productivity of research capital. While the role of capital does not usually take center stage in the analysis of R&D-based growth, it has been shown to generate permanent growth effects by increasing the marginal product of labour in R&D and thus increasing investment in the R&D sector (Howitt and Aghion 1998; Howitt 1999). The key mechanism driving this result is that, unlike the stock of human labour, the rate of physical capital accumulation can be readily increased or decreased in response to a change in its productivity. A very similar point is made by Aghion et al. 2019 in their study of the growth effect of AI. They show that in the classic endogenous growth case, a one-time increase in R&D automation will raise the long-run growth rate, as capital, an accumulable factor in production, becomes permanently more important.
Empirical work has validated the importance of physical capital investment in idea production. For example, Helmers and Overman 2017 showed that the creation of the UK's Diamond Light Source synchrotron increased the research capital available to local scientists, which in turn increased their research publication output relative to UK scientists located elsewhere.
The prevalence of capital goods in U.S. R&D is documented by the National Science Foundation's Higher Education Research and Development Survey. It finds that academic institutions have spent over $2bn per year on Capital Equipment or Software for R&D since 2010, representing roughly 4% of total direct R&D expenditures (National Center for Science and Engineering Statistics 2021). In STEM fields, this share is higher: around 6% overall in 2020, with Chemistry at 14%, Material Science and Physics at 15%, and Engineering ranging between 7% and 11%. Since the overall capital-intensity of academic science is only 4%, there is enormous room for more capital-intensive approaches.
Even compared to capital-intensive areas of science, there are reasons to suspect that R&D using deep learning might be yet more capital-intensive. Recent examples include OpenAI's GPT-3 language model (Brown et al. 2020), DeepMind's AlphaZero game-playing system (Silver et al. 2017), and DeepMind's protein-folding system, each of which reportedly used millions of dollars' worth of computing (Gibney 2017; Jumper et al. 2021).
Computing theory also suggests reasons why deep learning is capital-intensive and why it is likely to become more so. In classical statistical learning theory, there is generally a trade-off between bias and variance (Hastie et al. 2009). Once a model grows beyond a certain complexity threshold, it tends to "overfit" the data, worsening test performance as the variance term dominates. Deep neural networks seem capable of evading this trade-off by vastly expanding the size of the network ("overparameterization"), that is, by deploying more computational capital (Belkin et al. 2018; Nakkiran et al. 2021).
Surprisingly, empirical analyses have shown that the performance gains that accrue to these networks with millions or billions of parameters are highly predictable. Generally, these analyses of "neural scaling laws" find that test error falls according to a power law in the scale of such models, and therefore in the amount of compute used (Kaplan et al. 2020; Hoffmann et al. 2022; Hestness et al. 2017; Sun et al. 2017; Lepikhin et al. 2020; Li et al. 2020; Jones 2021; Bahri et al. 2021; Sharma and Kaplan 2020). Researchers and practitioners are harnessing these dependencies to get better performance. For example, Thompson et al. 2020 show that progress is highly dependent on computational resources across a wide range of machine learning tasks. For image classification on the ImageNet database, 71% of the variance in model performance is explained by the computation used. The importance of computing resources in deep learning was elegantly summarized by Rich Sutton (Sutton 2019), a prominent figure in the field of reinforcement learning, who wrote:

"The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin... Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation."
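To make the power-law relationship concrete, the following minimal sketch fits error = a · C^(−b) by ordinary least squares on log-transformed data. The data points are synthetic and the helper name `fit_power_law` is hypothetical; the sketch illustrates the functional form only and is not the estimation procedure of any of the cited papers.

```python
import math


def fit_power_law(compute, error):
    """Fit error = a * compute**(-b) by OLS on log-log data; return (a, b)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(e) for e in error]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx                  # slope of log(error) on log(compute)
    intercept = y_bar - slope * x_bar  # equals log(a)
    return math.exp(intercept), -slope


# Synthetic, noise-free illustration (assumed values): error = 2.0 * C**(-0.25)
compute = [1e3, 1e4, 1e5, 1e6, 1e7]
error = [2.0 * c ** -0.25 for c in compute]
a, b = fit_power_law(compute, error)
```

A straight line in log-log space is the signature of a power law, which is why the fit reduces to a one-variable linear regression.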
If deep learning is indeed more capital-intensive, the investment dynamics implied by endogenous growth models would predict a rapid scale-up in the computational capital used in AI-based R&D. Sevilla et al. 2022a find exactly that: since the advent of deep learning, the amount of computational capital typically used in milestone models has doubled roughly every 6 months, far outstripping the rate during previous eras of AI. So, while the capital investments made in deep learning systems are still small compared to, for example, those required for large-scale physics experiments, there are compelling reasons to believe that these models are capital-intensive and will continue to become more so.

R&D capital intensity in a semi-endogenous growth model
Consider a simple semi-endogenous growth model along the lines of Jones 1995. There are two sectors: a goods-producing sector where output is produced and an R&D sector where additions to the stock of knowledge are made. A fraction α_l of the labour force is used in the R&D sector and a fraction 1 − α_l in the goods-producing sector. Similarly, a fraction α_k of the capital stock is used in R&D and the rest in goods production. We make similar simplifying assumptions as Romer 1990 by supposing that α_l and α_k are exogenous and constant for expositional clarity. Ideas are non-rivalrous and the full stock is used equally in both sectors of the economy.
For further simplicity, we assume constant returns to scale in the production of final goods. The quantity of output produced at time t is thus:

The production of new ideas depends on the quantities of capital and labour engaged in research and on the level of technology. We assume there are diminishing returns in the production of new ideas in inputs (β + θ < 1). This assumption ensures a unique steady-state growth rate, and prevents the growth rate from exploding as the R&D inputs grow without bound. The idea stock grows as follows:

Ȧ(t) = B [α_k K(t)]^β [α_l L(t)]^γ A(t)^θ, (2)

where B is a positive shift parameter. We further make the simplifying assumptions, not uncommon in the literature, that there is a constant saving rate s, and that capital depreciates at a constant rate δ. Moreover, our model is a semi-endogenous one. Hence, we suppose that population grows at exogenous rate n. Thus capital and labour accumulation are described as follows:

K̇(t) = sY(t) − δK(t), and L̇(t) = nL(t), where s, δ ∈ (0, 1) and n > 0. (3)

Using equations (1-3), we solve for the steady-state rates of growth in ideas and capital (denoted as g*_a and g*_k respectively). These are:

Proposition 1. Consider a shift in the technology of R&D that creates a positive shock to the R&D elasticity to capital and, at worst, a proportional negative shock to the R&D elasticity to scientists. That is, consider a shift from an R&D setting described by (2) to a setting described as follows:

Such a shift in the technology of R&D has the following implications: (a) the rate of idea accumulation is strictly and permanently increased, and (b) the rate of economic growth is strictly and permanently increased.
Proof. Define ∆X as the difference between the post- and pre-shift values of parameter X. By assumption, ∆β ≥ −∆γ. The steady-state rate of growth in ideas is strictly increased if:

which follows from the fact that ∆β ≥ −∆γ. This proves part (a) of the proposition. The proof of part (b) may be found in Appendix A2.
Proposition 1 states that if there is a shift to an R&D setting with higher R&D elasticity to capital, and if the R&D elasticity to scientists is not disproportionately negatively affected, the steady-state rate of technological change and economic growth will strictly and permanently increase.
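The model referred to above as equations (1)-(3), and the steady state behind Proposition 1, can be written out as follows. This is a sketch that assumes the standard textbook form of this class of models, with labour-augmenting technology and an output elasticity of capital α. The symbol α, the primes marking post-shift parameters, and the exact form of (1) are assumptions rather than something stated in the text.

```latex
\begin{align}
Y(t) &= \big[(1-\alpha_k)K(t)\big]^{\alpha}\,\big[A(t)(1-\alpha_l)L(t)\big]^{1-\alpha},
        \quad 0<\alpha<1 \tag{1}\\
\dot{A}(t) &= B\,\big[\alpha_k K(t)\big]^{\beta}\,\big[\alpha_l L(t)\big]^{\gamma} A(t)^{\theta},
        \quad B>0 \tag{2}\\
\dot{K}(t) &= sY(t) - \delta K(t), \qquad \dot{L}(t) = nL(t) \tag{3}
\end{align}

% Balanced growth: K and Y grow at the same rate, so (1) gives g_k = g_a + n.
% A constant g_a in (2) requires \beta g_k + \gamma n + (\theta - 1) g_a = 0, hence
\begin{equation*}
g_a^{*} = \frac{(\beta+\gamma)\,n}{1-\theta-\beta},
\qquad
g_k^{*} = g_a^{*} + n .
\end{equation*}

% Proposition 1(a): with post-shift elasticities \beta' = \beta + \Delta\beta and
% \gamma' = \gamma + \Delta\gamma, where \Delta\beta > 0, \Delta\beta \ge -\Delta\gamma
% and \beta' + \theta < 1,
\begin{equation*}
\frac{(\beta'+\gamma')\,n}{1-\theta-\beta'} \;>\; \frac{(\beta+\gamma)\,n}{1-\theta-\beta},
\end{equation*}
% because the numerator does not fall while the (positive) denominator strictly shrinks.
```

Under this form, the capital elasticity β enters both the numerator and the denominator of the steady-state rate, which is why a shift towards capital raises growth even when the labour elasticity falls by an offsetting amount.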
Figure 1. Productivity growth and R&D cost-shares in a competitive R&D economy. Figure 1a shows steady-state productivity growth as a function of the implied capital-cost share in a competitive R&D industry when the elasticity of R&D output to the stock of ideas (θ) is 1/2 and the elasticity of R&D output to labour inputs (γ) is 2/5 (consistent with our survey of existing estimates in Appendix C). Figure 1b shows the share of R&D expenditure in each discipline that is spent on capital equipment, based on data from the National Science Foundation 2020 Higher Education Research and Development Survey (National Center for Science and Engineering Statistics 2021).

The intuition behind this result is as follows: while the labour force grows at a rate independent of economic growth, capital accumulation is determined endogenously by investment. After a positive shock to the R&D elasticity to capital, investment rises, increasing the productivity of scientists. This increases the rate of idea production and consequently boosts economic growth. But faster economic growth also increases the rate of capital investment. Hence, a positive shock to the capital productivity of R&D gives rise to a virtuous cycle of idea accumulation, economic growth and capital formation: a cycle that produces a new balanced growth path with permanently higher steady-state rates.
To illustrate how capital-intensive R&D could result in super-normal productivity growth, consider Figure 1a. If the R&D sector is competitive (such that wages and rents are equal to their marginal products), the steady-state rate of idea accumulation increases steadily in the share of R&D expenditure dedicated to capital. Under conservative assumptions on our growth model, highly capital-intensive R&D (such as when optimising R&D firms dedicate at least 20% of R&D expenditure to capital) would produce productivity growth rates in excess of those usually observed in the US. By contrast, current US R&D tends to be highly labour-intensive. Using 2020 data from the National Center for Science and Engineering Statistics 2021 on NSF-supported STEM R&D and assuming a Cobb-Douglas functional form for ideas production, we see that capital shares tend to fall between 3% and 20% (see Figure 1b).

In the analysis section, we present evidence that the relative returns to capital for deep learning are higher than for other types of R&D. This suggests that, if deep learning could be similarly applied to a wide range of R&D problems, its high degree of capital intensity could accelerate technological change and, as a consequence, economic growth.
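The link between capital cost shares and steady-state growth can be illustrated numerically. The sketch below assumes that, with competitive factor markets and Cobb-Douglas idea production, the capital cost share equals β / (β + γ), and that steady-state idea growth takes the textbook form (β + γ)n / (1 − θ − β). Both formulas are assumptions here. The values θ = 1/2 and γ = 2/5 are those quoted in the Figure 1 caption, while the population growth rate `N` and the function names are hypothetical.

```python
def beta_from_capital_share(share, gamma):
    """Capital elasticity implied by a capital cost share: share = beta / (beta + gamma)."""
    if not 0.0 <= share < 1.0:
        raise ValueError("share must lie in [0, 1)")
    return gamma * share / (1.0 - share)


def steady_state_idea_growth(beta, gamma, theta, n):
    """Assumed steady-state growth rate of ideas: (beta + gamma) * n / (1 - theta - beta)."""
    if beta + theta >= 1.0:
        raise ValueError("requires beta + theta < 1")
    return (beta + gamma) * n / (1.0 - theta - beta)


GAMMA, THETA = 0.4, 0.5  # values quoted in the Figure 1 caption
N = 0.01                 # hypothetical population growth rate

shares = (0.04, 0.10, 0.20, 0.40)
growth_by_share = {}
for share in shares:
    beta = beta_from_capital_share(share, GAMMA)
    growth_by_share[share] = steady_state_idea_growth(beta, GAMMA, THETA, N)
```

Under these assumptions, the implied growth rate rises monotonically with the capital share, which is the qualitative pattern described for Figure 1a.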

Data
In our work, we rely primarily on two datasets. Our primary dataset covers the compute cost and performance for 151 deep learning models that were presented in publications between 2012 and 2021. The second is a bibliometric dataset of the authors of machine learning publications published between 1993 and 2021, which we use to infer the human capital inputs for each deep learning model in our primary dataset.

Data on computer vision experiments
Our dataset on the compute costs and performance covers 151 models published between 2012 and 2021. This data is an augmented version of Thompson et al. 2020, with additional details about the settings under which the models were trained and tested, for example, whether additional training data was used, or whether the training or test data was augmented.
The compute estimates are derived from the underlying papers following the procedure described by Sevilla et al. 2022b, which we summarize in Appendix E. The inclusion and exclusion criteria used to generate our datasets are described in Thompson et al. 2020. Deep learning models in this dataset span two well-known benchmarks: image classification on the ImageNet dataset and object detection on the Microsoft COCO dataset, usually known as the MS COCO dataset.
ImageNet is perhaps the best-known and most widely used computer vision dataset. It spans 1,000 object classes and contains 1.28m training images (Russakovsky et al. 2015). Some of the most important breakthroughs in deep learning have happened in ImageNet models, starting with AlexNet, a watershed moment when deep learning first outperformed other techniques on this task (Krizhevsky et al. 2017). Importantly for our purposes, success on ImageNet has often proven to be general: techniques that advance its state of the art have usually been found to be successful in other tasks and domains. For example, Beyer et al. 2020 document various instances when progress on ImageNet due to architecture design or optimization has yielded corresponding gains on other modalities, such as natural language processing, audio processing, and game playing. Because of this, it is plausible that our results for this benchmark could generalize to tasks and domains beyond computer vision.
The MS COCO 2017 dataset is one of the most frequently used datasets for object detection, face detection, and pose estimation, among other tasks. It contains a total of 2.5 million labelled instances in 328k images (Lin et al. 2014). Like ImageNet, this dataset has been used as a test-bed for many influential innovations, such as the Residual Network architecture of He et al. 2016, which has since become widely used in computer vision (see e.g. Khan et al. 2020).
While these two domains of computer vision are crucial test-beds for deep learning, it would be preferable to consider a wider range of scientific and technical domains in which these techniques are applied. Unfortunately, this is difficult because of challenges for both inputs and outputs. For inputs, many deep learning papers fail to report even basic details of their computational usage. For outputs, some areas of deep learning struggle to define objective measures of performance. For example, how should one define the "correct" text summary of a picture?

Data on authors and publications
Our dataset on machine learning publications comes from two sources: arXiv, a pre-print server commonly used in various STEM fields, including computer science, and Scopus, Elsevier's abstract and citation database. Our dataset includes all papers on arXiv that were posted between 1993 and 2021 and that are from the subfields typically associated with machine learning: Machine Learning (stat.ML), Artificial Intelligence (cs.AI), Computation and Language (cs.CL), Computer Vision and Pattern Recognition (cs.CV), and Learning (cs.LG). We match the authors of these papers to their corresponding entries in Scopus using a variety of string distance-based matching approaches. This technique allows us to match 90.1% of authors, and spot testing on 300 random matches shows that 96% were correct.
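As an illustration of what string distance-based author matching can look like, the sketch below scores candidate names with the similarity ratio from Python's standard-library `difflib`. The specific distance measures and thresholds used to build the dataset are not stated above, so the `normalise` and `best_match` helpers, the 0.85 threshold, and the example names are all hypothetical.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def best_match(query, candidates, threshold=0.85):
    """Return (candidate, score) for the most similar name, or (None, score) below threshold."""
    scored = [
        (SequenceMatcher(None, normalise(query), normalise(c)).ratio(), c)
        for c in candidates
    ]
    score, candidate = max(scored)
    return (candidate if score >= threshold else None), score


# Hypothetical records standing in for Scopus author entries
scopus_names = ["Jane A. Doe", "John Smith", "Jana Novak"]
match, score = best_match("Jane Doe", scopus_names)
no_match, low_score = best_match("Wei Zhang", scopus_names)
```

A threshold trades off coverage against precision: lowering it matches more authors but admits more false matches, which is why spot checks of the kind described above are useful.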
By linking papers to their authors' publication histories, we construct a timeseries for each author that shows their number of publications, h-index, and citations (excluding self-citations). We supplement this data with similar timeseries of grant funding for each author's institution and department over time from the Dimensions grant database, institutional rankings over time from csmetrics.org, and measures of the scientific influence of computer science journals and conferences from SCImago. For full details on data collection procedures, see Appendix D.
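The per-author timeseries described above can be sketched as follows. The input layout (a publication year plus citations received per year) and the function names are assumptions made for illustration, and the sketch does not model the exclusion of self-citations.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def author_timeseries(papers, years):
    """papers: list of (publication_year, {year: citations received in that year})."""
    series = {}
    for year in years:
        published = [p for p in papers if p[0] <= year]
        # Cumulative citations per paper up to and including `year`
        totals = [sum(c for y, c in cites.items() if y <= year) for _, cites in published]
        series[year] = {
            "publications": len(published),
            "citations": sum(totals),
            "h_index": h_index(totals),
        }
    return series


# Hypothetical author with three papers
papers = [
    (2015, {2016: 3, 2017: 5}),
    (2016, {2017: 2, 2018: 4}),
    (2018, {2018: 1}),
]
series = author_timeseries(papers, range(2015, 2019))
```

Because each year only uses information available up to that year, the resulting features describe an author's track record at the time a given paper was written.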

Empirical strategy
We assume, along the lines of the semi-endogenous growth model outlined above, that idea production using deep learning depends on three factors: labour (scientists' human capital), specialised capital goods (computational capital), and total factor productivity (the extant level of 'ideas' or technology upon which researchers build):

where t > 0, X(0) > 0 for any X ∈ {A, S, C}, Ȧ(t) denotes the change in the stock of technology, S(t) the total human-capital input of scientists, and C(t) the total capital inputs. To estimate this using data, we replace Ȧ(t) with a measure of the performance of deep learning models, C(t) with data on computational inputs, and S(t) with estimates of scientific human capital inputs.
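One way to write the idea production function described above is sketched below. It assumes a Cobb-Douglas form with elasticities named as in the growth model (γ for scientists, β for computational capital, θ for the stock of ideas) and normalises any constant shift parameter to one. The exact form, including that normalisation, is an assumption.

```latex
\begin{align*}
\dot{A}(t) &= A(t)^{\theta}\, S(t)^{\gamma}\, C(t)^{\beta},\\
g(t) \equiv \frac{\dot{A}(t)}{A(t)} &= A(t)^{\theta-1}\, S(t)^{\gamma}\, C(t)^{\beta},\\
\log g(t) &= (\theta-1)\log A(t) + \gamma \log S(t) + \beta \log C(t).
\end{align*}
```

In this log-linear form, the term involving the stock of ideas is common to every research team in a given period, while the human-capital and compute terms vary across teams.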

Empirical specification
Consider an economy where the level of technology grows exponentially on the balanced growth path in the way standardly assumed in growth theory models:

We do not observe technology directly. Instead, we observe performance on machine learning tasks. In these cases, the level of performance, usually measured as a type of predictive accuracy, falls on the unit interval.
We assume that performance relates to technology according to the logistic function, reflecting that the most challenging parts of innovation are making some initial headway with a problem and then perfecting the solution:

This is similar to the assumption used to model how effort relates to various outcomes when the outcomes are bounded, such as in contests (Vojnović 2015; Baik 1998), conflict interactions (Hirshleifer 1989; Jia, Skaperdas, et al. 2012), and persuasion (Skaperdas and Vaidya 2012). Beyond the fact that this is a relatively standard transformation that enables us to map progress in technology onto a bounded interval, there are two further motivating considerations. Firstly, we show that this functional form implies a power law between the scale of the compute deployed and the level of error achieved, which is in line with a robust finding of the relevant machine learning literature (e.g. Hoffmann et al. 2022) (see Appendix A5). Secondly, this functional form enables us to construct a simple empirical counterpart for technological progress, which we derive as follows. Assuming that growth rates in adjacent periods are approximately equal (i.e. that g_t ≈ g_{t−1}), it can be shown that proportional technological growth relates to performance improvements as follows (see Appendix A4):

This provides an easy-to-interpret decomposition of technological progress in terms of the logs of the proportional reduction in error rate and the proportional increase in accuracy.
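The decomposition described above can be sketched under one concrete parametrisation. The block assumes exponential growth A(t) = A(0)e^{gt} with constant g, and a logistic link written as P = A / (1 + A), which is the logistic function of log A. This specific parametrisation is an assumption, chosen because it reproduces the two terms named in the text.

```latex
\begin{align*}
A(t) &= A(0)\,e^{gt}, \qquad
P(t) = \frac{A(t)}{1 + A(t)}
\;\Longleftrightarrow\;
A(t) = \frac{P(t)}{1 - P(t)},\\[4pt]
g &= \log\frac{A(t)}{A(t-1)}
   = \underbrace{\log\frac{P(t)}{P(t-1)}}_{\text{proportional increase in accuracy}}
   \;+\;
   \underbrace{\log\frac{1 - P(t-1)}{1 - P(t)}}_{\text{proportional reduction in error}} .
\end{align*}
```

Under this link, technological progress is the change in the log-odds of a correct prediction, so gains near the ceiling (small remaining error) count for more than the same absolute gain in accuracy at lower performance levels.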
Let g̃_t denote the approximation of g_t given P(t), i.e. g̃_t ≡ log ã. We can write the empirical specification of our model as follows:

We thus obtain an empirical specification of g̃_t that we can ground in the relevant empirical data. When relating the model to data, time becomes discrete, and experiments are produced by research groups, which are indexed by i ∈ {1, ..., N}.
Assuming a multiplicative error model, we specify the empirical counterpart of (10) as follows:

Taking logs of both sides, we have:

We estimate the following model:

which we can estimate in a pooled fashion with a time-fixed effect that captures (θ − 1)A_t for t ∈ {1, ..., T}. By default, we will fix the time periods as years. In the robustness checks section, we show that shorter or longer time windows do not change our overall results.
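A pooled regression with time fixed effects of the kind described above can be sketched as follows. The sketch assumes a log-linear specification, log g_it = γ log S_it + β log C_it + τ_t + ε_it, where τ_t is the year effect. It absorbs the year effects by demeaning each variable within year and then solves the two-regressor normal equations directly. The data are synthetic, and the function name and the "true" coefficients are hypothetical.

```python
import random
from collections import defaultdict


def within_ols(rows):
    """rows: iterable of (year, log_s, log_c, log_g). Returns (gamma_hat, beta_hat).

    Year fixed effects are absorbed by demeaning every variable within year.
    """
    by_year = defaultdict(list)
    for year, s, c, g in rows:
        by_year[year].append((s, c, g))

    sss = scc = ssc = ssg = scg = 0.0
    for obs in by_year.values():
        m = len(obs)
        mean_s = sum(o[0] for o in obs) / m
        mean_c = sum(o[1] for o in obs) / m
        mean_g = sum(o[2] for o in obs) / m
        for s, c, g in obs:
            ds, dc, dg = s - mean_s, c - mean_c, g - mean_g
            sss += ds * ds
            scc += dc * dc
            ssc += ds * dc
            ssg += ds * dg
            scg += dc * dg

    # Solve [[sss, ssc], [ssc, scc]] @ [gamma, beta] = [ssg, scg]
    det = sss * scc - ssc * ssc
    gamma_hat = (ssg * scc - ssc * scg) / det
    beta_hat = (sss * scg - ssc * ssg) / det
    return gamma_hat, beta_hat


# Synthetic data with assumed true elasticities
rng = random.Random(0)
TRUE_GAMMA, TRUE_BETA = 0.4, 0.2
rows = []
for year in range(2012, 2022):
    year_effect = -0.1 * (year - 2012)  # stands in for the common technology term
    for _ in range(15):
        log_s = rng.gauss(0.0, 1.0)
        log_c = rng.gauss(0.0, 2.0)
        noise = rng.gauss(0.0, 0.05)
        rows.append((year, log_s, log_c, TRUE_GAMMA * log_s + TRUE_BETA * log_c + year_effect + noise))

gamma_hat, beta_hat = within_ols(rows)
```

Demeaning within year is numerically equivalent to including a dummy variable for each year, but it avoids building and inverting a large design matrix.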
A key assumption embedded in our estimation procedure (which we import from endogenous growth theory) is that knowledge is non-rivalrous. This assumption warrants some reflection. It assumes that researchers have access to a common stock of knowledge at any point in time and that the advancements made by researchers in one period are available to others in the next period. These assumptions seem broadly reasonable given the open research norms in machine learning (e.g. publishing on arXiv), the practice of providing access to code (e.g. via GitHub, Papers with Code, etc.), and the common tools (e.g. PyTorch) that embed model implementations. Indeed, companies that are notoriously tight-lipped about technology are nevertheless relatively open about AI research (Ahmed and Wahed 2020).
If, nevertheless, there are groups that try to keep trade secrets, and those tend to be areas of particularly high human capital (as one might expect), then our model would implicitly treat such knowledge as an additional benefit of human capital.All else equal, this would make our results about capital-intensity underestimates of the true level.

Operationalizing innovations
In estimating our model, we need to operationalize proportional performance improvements in terms of observables. We measure this using our baseline data, where authors of each paper have indicated the touchstone models in the literature whose ideas they are building on. For any particular task, we define relative performance gains as follows:

That is, i's innovation is defined as the proportional improvement over the performance of a model that is considered, by the contemporaneous literature, to be a relevant baseline model. This operationalization is chosen for two reasons. Firstly, it is common practice to report these values in the machine learning literature, as the extent of innovations is often illustrated through comparisons to existing baseline levels of performance (Armstrong et al. 2009; Melis et al. 2017; Pressel et al. 2018). Secondly, this notion of an improvement over a model lines up well with the usual notion of the change in the stock of knowledge in R&D-based growth models, such as those of Romer 1990, Grossman and Helpman 1994, and others; it represents the extent of the innovation of a new design relative to the existing stock of ideas. To find the appropriate baseline levels of performance, we survey the models that are used as baseline results in the relevant literature and take the median of their performance (see Appendix D1).
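The operationalization above can be sketched in code. The exact formula for relative performance gains is not reproduced here, so the sketch assumes it takes the form suggested by the empirical specification: the log of the proportional increase in accuracy plus the log of the proportional reduction in error, both measured against the median baseline. The function name and the example numbers are hypothetical.

```python
import math
from statistics import median


def innovation(performance, baseline_performances):
    """Log improvement over the median baseline: accuracy gain plus error reduction."""
    base = median(baseline_performances)
    if not (0.0 < base < 1.0 and 0.0 < performance < 1.0):
        raise ValueError("performance values must lie strictly between 0 and 1")
    accuracy_gain = math.log(performance / base)
    error_reduction = math.log((1.0 - base) / (1.0 - performance))
    return accuracy_gain + error_reduction


# Hypothetical model at 80% accuracy, compared against three baseline models
gain = innovation(0.80, [0.70, 0.75, 0.78])
```

With a median baseline of 0.75, the measure equals the log of the odds ratio (0.80 / 0.20) / (0.75 / 0.25) = 4/3. Taking the median rather than the best baseline makes the measure less sensitive to a single unusually strong or weak comparison model.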

Modelling human capital
The final remaining piece needed to estimate the deep learning production functions for these computer vision tasks is to construct a measure of the scientific human capital used for each of these models. This measure should be predictive of outcomes that are strongly influenced by human capital and must be inferable from available data about the scientists' track records. In addition to overall predictiveness, it is particularly important that estimates are good for junior researchers, who contribute importantly to this young field.
Prior work has used various measures of the quality or status of scientists and engineers, including impact-based metrics, such as citation counts (Azoulay et al. 2019; Jones et al. 2014; Zucker et al. 2002), the number of high-impact citations (Azoulay et al. 2014), and bibliometric indices such as the h-index (Teplitskiy et al. 2019; Fisman et al. 2018; Breschi et al. 2014). These measures have substantial limitations as measures of the human capital of teams of scientists and engineers. As we shall see, almost all of them are only weakly predictive of key outcomes where we would expect scientific and technical human capital to be important, such as the number of citations the work will receive in the future, or the quality of the journal or conference in which the work is published. Moreover, impact-based metrics, such as citation counts or the h-index, generally assign low scores to junior researchers, as citations often take a long time to accrue following the publication of scholarly work.
Our strategy for modelling researcher human capital is as follows. We construct a deep neural network (DNN) and train it to develop a single-dimensional representation of the total quality-adjusted research input ("human capital") that is highly predictive of key bibliometric and publication-related outcomes. Our approach implements an encoder that maps many input features about the publication's authors to a single-dimensional representation, and a decoder model (built explicitly to function as a linear regression) that maps this representation onto citation- and publication-related outcomes. Our approach exploits the ability of DNNs to perform nonlinear compression of high-dimensional input features (see e.g. Hinton and Salakhutdinov 2006; Kramer 1991). It finds human capital representations that are highly predictive of key bibliometric and publication-related outcomes, and that substantially outperform the typical approaches used in the literature.

Our machine learning approach to estimating human capital
We use our dataset of 49,251 machine learning publications to train a neural network to predict bibliometric and publication-related outcomes. The predicted outcomes include the citation trajectories for each publication and its SJR-values, a measure of the quality of the journal or conference where the work ends up being published (see González-Pereira et al. 2010). Figure 2a provides a diagrammatic overview of our data pipeline and the training set-up used to produce our model. Further details of the training procedure may be found in Appendix F.
Figure 2. Human capital estimation strategy. Figure 2a presents our set-up for learning human capital representations for machine learning publications. Figure 2b shows our neural network architecture. Highlighted is the human capital unit, whose activations are strongly related to the quality of the research team. The numbers on each layer represent the number of units in that layer (for the human capital unit, this is just 1).
Our architecture is constructed as follows. We first stack 15 sets of hidden layers, each consisting of a 4096- or 2048-node layer followed by a batch-norm layer (Ioffe and Szegedy 2015). These feed into a single unit, the "human capital" unit. This layer forces the neural network to reduce the dimensionality of its representations and distil the relevant features into a single scalar. The human capital result is then concatenated with the publication date and fed into a series of independent sub-branches, one for each output being predicted. The final layer effectively implements separate linear regressions of the sort y_i = α + xβ, meaning that the learned human-capital representations can only be linearly re-scaled and offset in order to make predictions about citations or journal quality.
In other words, our approach implements an encoder that maps the input to the representation space, and a set of decoder regression models that map the representation onto citation- and publication-related outcomes. Thus, during training, the encoder is pushed to learn single-dimensional representations that are informative of human capital outcomes.
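The encoder, bottleneck, and linear-decoder structure described above can be sketched in a few lines of plain Python. This is a toy forward pass under loudly stated assumptions: the layer sizes are tiny, weights are random rather than trained, batch normalization is omitted, and the ReLU activation is our assumption since the text does not name one. All function and variable names are hypothetical.

```python
import random

random.seed(0)

def dense(n_in, n_out):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1 / n_in ** 0.5
    weights = [[random.gauss(0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(layer, x, relu=True):
    """Affine map followed by an optional ReLU (the activation is an assumption)."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out

# Toy sizes; the network in the text stacks 15 sets of 4096- or 2048-unit layers.
N_FEATURES, HIDDEN = 8, 16
encoder = [dense(N_FEATURES, HIDDEN), dense(HIDDEN, HIDDEN)]
bottleneck = dense(HIDDEN, 1)  # the single "human capital" unit
# One linear decoder per predicted outcome, acting on [human capital, date].
heads = {name: dense(2, 1) for name in ("citations", "sjr")}

def human_capital(author_features):
    x = author_features
    for layer in encoder:
        x = forward(layer, x)
    return forward(bottleneck, x, relu=False)[0]

def predict(author_features, publication_date):
    z = [human_capital(author_features), publication_date]
    return {name: forward(head, z, relu=False)[0] for name, head in heads.items()}

features = [random.random() for _ in range(N_FEATURES)]
h = human_capital(features)
preds = predict(features, 2018.0)
print(h, preds)
```

Because each head is a plain affine map of the scalar h and the date, the decoders can only rescale and offset the learned representation, which is the property the bottleneck design relies on.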

Validating our estimates
To assess the success of our measures in evaluating human capital, we compare their predictiveness across a range of outcomes, including citations received at various points and journal quality rankings. In each case, our estimates predict more than 55% of the variation in these measures, roughly 4-5 times as much as other proxies that are commonly used (e.g. prior publications, prior citations, h-index) (see Figure 3). In all cases, we are predicting out of sample on a test set consisting of a random sub-sample of 4,081 publications that was held out from all training. These results indicate that our model has learned to predict bibliometric outcomes of publications, and in doing so, it has inferred meaningful and predictive human-capital features that can be measured as activation strength. Finally, when restricting the dataset to just publications with junior researchers (defined as publications with 2 or fewer prior publications), we find that our human capital estimates are still highly predictive of each of the bibliometric and publication-related outcomes (see Appendix H), while most other proxies have little to no predictive power.
Thus far, we have shown that our measure has better predictive performance relative to other commonly used human capital predictors. However, these other predictors also have access to much less data than our measure. For a more equal comparison, we also ask how our measure compares to a Lasso regression predictor (an approach more representative of the linear approaches used in the literature) with access to the same inputs as our neural network. To do so, we evaluate our DNN on an out-of-sample test set. Our DNN represents a substantial improvement relative to simpler approaches found in the literature that rely on linear combinations of impact-based metrics, such as the h-index, citations received, or publication counts. In particular, we obtain prediction errors (measured in mean-square-error) that are at least 40% lower for each outcome compared to Lasso regressions, and thus we get much more precise predictions, as shown in Figure 3b.
While the preceding points to clear benefits of our approach, it is essential to mention that it still shares limitations that apply to many measures in the field. First, there is no natural unit for human capital, and thus the cardinality of estimates is hard to interpret. Second, it is unclear how citations and journal quality relate to actual scientific merit, novelty, or insight. Thus, by using a measure predictive of citations and journal quality, we would implicitly be missing aspects of human capital that cannot be inferred from these imperfect measures.

Empirical Analysis
Having validated our human capital measures, we combine them with the compute data to estimate production functions for two important AI tasks: image classification and object detection. In particular, we estimate individual regression models and a pooled model described by equation 13. We estimate using OLS, except where a Breusch-Pagan test finds the presence of heteroskedasticity, in which case we estimate a GLS model by Maximum Likelihood (details in Appendix J). Our results for models A1-B2 are displayed in Table 4. We also estimate a pooled model with distinct time-fixed effects, which combines data across both computer vision tasks. A likelihood ratio test indicates that the pooled model fits the data better than separately estimated models. The estimates of models C1-C2 are displayed in Table 5. For image classification, we estimate that the R&D elasticity of capital (β) is 0.111 (model A1) and 0.140 (model A2), as shown in Table 4. This means that a 1% increase in the computational capital used for this type of R&D is associated with a 0.111-0.140% increase in the rate of technological change. For object detection, we estimate that β is 0.246 (for both model B1 and B2), considerably higher than for image classification. For our pooled model, we get estimates between 0.145 and 0.176 (see Table 5). All these results are statistically significant at the 5% significance level; most are also significant at the 0.1% level.

Data
We find less variation in our estimates of the R&D elasticity of human capital (γ) between the two deep learning tasks. Our human capital elasticity estimates for image classification are 0.246 (model A1) and 0.350 (model A2), both significant at the 0.1% significance level. These estimates are just over twice as high as those for computational capital. For object detection tasks, the estimates of γ are similar at 0.352 (model B1) and 0.319 (model B2), though only the former of these two estimates is significant at the 5% significance level. For our pooled model, our estimates of the R&D elasticity of human capital are 0.278 (model C1) and 0.298 (model C2), each statistically significant at the 1% significance level.
Recall that with the standard economic growth model outlined in Section 2.3, we can directly infer the equilibrium cost shares dedicated to capital and labour from the relevant elasticities (assuming a competitive market). In doing so, we find that the implied capital-cost shares range from 0.29 to 0.44 (see Figure 4). Our estimates indicate that an optimizing firm in the R&D sector should allocate between 29% and 44% of its total expenditure to computational capital. Using confidence intervals generated by bootstrapping, we find that the implied capital-cost shares are statistically significantly greater than 0.15 at the 5% significance level for all models, a number that substantially exceeds most observed STEM R&D capital shares.
Figure 4. Implied optimal R&D expenditure breakdown. Implied capital-cost shares given the estimates presented in Tables 4 and 5, computed as β/(β + γ). Error bars represent 90% confidence intervals generated by bootstrapping with 10,000 iterations. We use the bias-corrected percentile method for bootstrapping confidence intervals for ratios outlined in Campbell and Torgerson 1999.
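The implied shares can be reproduced directly from the point estimates quoted in the text. The sketch below covers models A1-B2 only, because the text does not state which pooled β estimate belongs to which of models C1 and C2; the function name is our own.

```python
# Point estimates (beta, gamma) for each model, as quoted in the text.
estimates = {
    "A1": (0.111, 0.246),  # image classification
    "A2": (0.140, 0.350),  # image classification
    "B1": (0.246, 0.352),  # object detection
    "B2": (0.246, 0.319),  # object detection
}

def capital_share(beta, gamma):
    """Implied cost share of capital under Cobb-Douglas: beta / (beta + gamma)."""
    return beta / (beta + gamma)

shares = {model: capital_share(b, g) for model, (b, g) in estimates.items()}
for model, share in shares.items():
    print(f"{model}: implied capital share = {share:.2f}")
```

The smallest and largest values (models A2 and B2) round to 0.29 and 0.44, matching the reported range.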
We find that the implied capital cost share for object detection is higher than for image classification by a margin of roughly ten percentage points. However, this difference is not statistically significant. Overall, we see that the implied R&D capital shares for AI are substantially higher than in other areas of U.S. science and engineering (see Section 2.3), where the capital share generally falls below 20%.

R&D with deep learning
Having estimated the capital intensity of R&D that is augmented with AI, we analyse the potential effects that the widespread adoption of deep learning would have on economic productivity and growth. To do so, we suppose that the widespread adoption of deep learning would act as a one-time shock, raising the capital intensity of knowledge production in the economy to the levels we estimated in computer vision.
Along the balanced growth path, the steady-state growth rate in the stock of knowledge is described by equation 4. Using this, we can substitute the parameter values implied by our empirical estimates from the prior section into our semi-endogenous growth model and compute the predicted change in R&D productivity growth conditional on the widespread adoption of deep learning. As shown in Figure 5, depending on the model specification, the results from image classification would imply a productivity growth rate between 1.6% and 1.8%, whereas those from object detection would imply a rate between 3.1% and 3.9%. Our preferred estimate, both because more data inform it and because the estimates are more precise, is the pooled estimate for computer vision. With the widespread adoption of deep learning raising ideas production in the economy to this level of capital intensity, we would expect the productivity growth rate to rise to between 2.1% and 2.4%. To put that in context, this would amount to an increase of between 1.7- and 2-fold relative to the 1.2% average U.S. productivity growth from 1948 to 2021, and a 2.6- to 3-fold increase against the post-2000 growth rate of 0.8% (Fed 2022). Thus, our results indicate that if adopting deep learning in other areas of R&D allows those areas to leverage capital better in the same way that computer vision has, it will represent a substantial acceleration of scientific progress.
Figure 5. Point estimates of implied optimal R&D expenditure with deep learning according to models A1-C2, as estimated in Section 6. "Current region" indicates the current level of capital intensity of R&D according to NSF data, which our semi-endogenous growth model predicts to result in 0.5% to 1.3% productivity growth, a level consistent with observed recent U.S. productivity growth.
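The fold increases quoted above follow from simple ratios of the projected pooled-model growth range to the two historical benchmarks. A short check, using only figures stated in the text (variable names are our own):

```python
# Projected productivity growth range under the pooled estimate, in percent.
projected = (2.1, 2.4)

# Historical U.S. productivity growth benchmarks, in percent.
benchmarks = {"1948-2021 average": 1.2, "post-2000 average": 0.8}

# Ratio of projected growth to each benchmark (low end, high end).
fold = {name: tuple(p / base for p in projected) for name, base in benchmarks.items()}

for name, (low, high) in fold.items():
    print(f"{name}: {low:.3f}-fold to {high:.3f}-fold")
```

The ratios are roughly 1.75 to 2 against the long-run average and roughly 2.6 to 3 against the post-2000 average.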

Robustness and external validity
In this section, we show that the results in Section 7 are robust to outliers and to different choices of how granular or coarse-grained time periods are specified. Moreover, we show that our assumptions about the substitutability of labour and capital are consistent with our data, which supports a key assumption required for our inferences about the capital-intensity of deep learning from our estimated elasticities. Finally, we discuss the generalizability of estimates from computer vision to other R&D tasks.

Sensitivity to outliers
It is known that certain empirical results from high-profile studies can be reversed by removing less than 1% of the sample, even when standard errors are small (Broderick et al. 2020). In this section, we assess the sensitivity of our results to outliers by showing that the removal of a small fraction of the data is not determinative of our findings. To test the robustness of our results to the removal of observations, we re-run our analysis between around 10^4 and 10^6 times (depending on which model is re-estimated) on sub-samples of our datasets that exclude a fraction of our observations. The point estimates are plotted in Figure 6.
For our dataset on object detection, we estimated models on all possible sub-samples that exclude 3 observations (of which there are 40!/(37! 3!) = 9880). For our dataset on image classification and for our pooled dataset, we estimated models on 10^6 random sub-samples that exclude 5 observations. The sub-samples considered cover around 1.6% of all possible sub-samples for the image classification dataset and 0.3% of all sub-samples for the pooled dataset and are, therefore, a non-trivial fraction of all possible combinations.
Figure 6. Estimates after removing a random 5% of the dataset. Median estimates when taking a random sub-sample of our dataset that excludes 1.6% to 5% of the total number of observations. Results are displayed as violin plots, using kernel density estimation to create the distributions. Inside the violins, the box plots show median and interquartile ranges.
We find that the point estimates are mostly robust to the removal of any small subset of observations, as we see that most estimates are tightly clustered around their median value, particularly for the estimates for which our datasets are the largest (namely, image classification and our pooled dataset). Moreover, the point estimates are consistent with the estimates found in our baseline empirical analysis presented in Section 6, indicating that our estimates are not the product of a small number of outliers.
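The exhaustive leave-3-out procedure for the object detection dataset can be illustrated with a small sketch. It confirms the count of sub-samples and re-estimates a one-regressor OLS slope on every sub-sample of a toy dataset. The data and the single-regressor model are hypothetical stand-ins for our actual specification, and the function names are our own.

```python
import itertools
import math

def ols_slope(xs, ys):
    """Slope of a simple one-regressor OLS fit with intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def leave_k_out_slopes(xs, ys, k):
    """Re-estimate the slope on every sub-sample that drops k observations."""
    n = len(xs)
    slopes = []
    for dropped in itertools.combinations(range(n), k):
        drop = set(dropped)
        keep = [i for i in range(n) if i not in drop]
        slopes.append(ols_slope([xs[i] for i in keep], [ys[i] for i in keep]))
    return slopes

# Number of ways to exclude 3 of 40 observations, as stated in the text.
n_subsamples = math.comb(40, 3)
print(n_subsamples)

# Toy data (hypothetical): a true slope of 0.25 plus a small alternating wiggle.
xs = [float(i) for i in range(12)]
ys = [0.25 * x + 0.05 * (-1) ** i for i, x in enumerate(xs)]
slopes = leave_k_out_slopes(xs, ys, 3)
print(min(slopes), max(slopes))
```

For the larger datasets, where exhaustive enumeration is infeasible, the same function could be driven by randomly sampled index sets instead of `itertools.combinations`.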

Alternative model specifications
Our estimation strategy for A_t (the stock of knowledge) takes advantage of cross-sectional variation at each time point. That is, we effectively pool publications into groups of contemporaries published around the same time. We then estimate the variation in performance due to changes in inputs amongst these contemporaries, and suppose that this variation is due to inputs rather than changes in A_t. However, papers are published continuously over time, so this necessarily involves a bias-variance trade-off: specifying more granular time periods (such as months instead of years) reduces the bias from changes in A_t within a period, but each interval is then estimated with fewer data points, making the estimates noisier and more prone to over-fitting. In our empirical analysis, we balanced this trade-off by fixing time periods as yearly intervals. In what follows, we show that our conclusions are robust to different reasonable choices of how granular or coarse-grained periods are specified.
We re-estimate models A1-C2 with window lengths of 6, 12, and 18 months and compare our estimates to those obtained in Section 6. We find that the estimates are mostly similar for all relevant datasets (see Table 6 for estimates for models A1-C1, and Appendix K for re-estimates of models A2-C2). Moreover, similar patterns remain: estimates of the R&D elasticity of capital (β) are lower for image classification tasks than for object detection tasks. These estimates strongly suggest that our key estimates are robust to different choices of how granular or coarse-grained time periods are specified.

Sensitivity to model assumptions about the substitutability of human scientists
In our semi-endogenous growth model, we assume that the elasticity of substitution equals 1 (i.e. idea production is modelled by a Cobb-Douglas production function). Hence, we assume a substantial level of substitutability of human scientists for compute. In this section, we test: (i) whether our results would continue to hold if there was a lesser degree of substitutability of human scientists, and (ii) whether the data is consistent with a higher or lower level of substitutability.
The substitutability assumption is important because our semi-endogenous growth model implies that the compute stock will grow faster than the stock of scientists. To see this, recall that from equation 4, the steady-state growth rate of capital dedicated to R&D is (1 − θ + γ)/(1 − β − θ) · n, which, with the estimates used in Section 2.2, would be ∼ 2n, while the stock of scientists grows just at the rate n. Hence, we should expect that, in the fullness of time, the stock of specialized capital goods, C(t), will be substantially greater than the stock of scientists, S(t). If human scientists were more difficult to substitute with compute than we have assumed, the optimal investment path might involve larger investments in human scientists. We may ask: how much weaker an assumption can we make for our conclusions to still follow, and is our assumption about the substitutability of human scientists reasonable? The first part is straightforward to answer. Suppose the idea production function instead followed a more general constant elasticity of substitution production function (equation 15), where σ denotes the elasticity of substitution between compute and scientists. In this framework, assuming a competitive R&D sector, we can derive the share of expenditure dedicated to compute (which we will denote by f). From the resulting expression, we can see that in the typical Cobb-Douglas case (where σ = 1), we have that f = β/(β + γ). Note, moreover, that whenever σ ≥ 1 it must be that f ≥ β/(β + γ), while when σ < 1 it must be that f < β/(β + γ). Hence, if scientists are less easily substitutable than we supposed, the capital intensity of R&D will be lower than our estimates imply. It is therefore important to investigate whether σ = 1 is a reasonable level of substitutability to assume.
We investigate this by estimating equation 15. Note that we can rewrite equation 15 by dividing by A(t), with ρ ≡ (σ − 1)/σ. To simplify the estimation procedure, we can approximate this expression using the second-order Maclaurin expansion in ρ (i.e. the Taylor series evaluated at ρ = 0, or equivalently σ = 1). This is simply the translog production function, which has the well-known advantage that it is linear in its parameters and, therefore, estimable using OLS. Because of this, this approximation is widely used in similar settings (Guilkey et al. 1983; Berndt and Christensen 1973). In our case, the empirical model we estimate becomes: log g_it = (θ − 1) log [...], where each variable has the meaning it had in Section 4. Figure 6 shows bootstrapped estimates of σ in equation 19 for each of our models A1-C2. Our estimates of σ are very tightly clustered around unity. These findings reinforce our assumption that σ = 1 is reasonable and, therefore, our inferences about capital intensity are consistent with the data. Moreover, for the tasks for which we have the most data, we observe that our estimates of σ are slightly above 1. Hence, for these tasks, it is likely that, if anything, our inferred level of capital intensity of deep learning-based R&D is an underestimate of the true level after adequately accounting for the degree of substitutability of human scientists.
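To illustrate how the compute expenditure share f moves with σ, the sketch below uses the textbook CES aggregate (βC^ρ + γS^ρ)^(1/ρ) with ρ = (σ − 1)/σ, for which the competitive cost share of compute is βC^ρ / (βC^ρ + γS^ρ). This functional form is an assumption on our part and may differ in detail from equation 15; the stocks C and S are hypothetical values chosen so that compute exceeds scientists, as the growth model predicts in the long run.

```python
def compute_share(beta, gamma, compute, scientists, sigma):
    """Cost share of compute under an assumed textbook CES aggregate.

    Aggregate: (beta * C**rho + gamma * S**rho) ** (1 / rho), rho = (sigma - 1) / sigma.
    At sigma = 1 (rho = 0) the share reduces to beta / (beta + gamma).
    """
    rho = (sigma - 1.0) / sigma
    c_term = beta * compute ** rho
    s_term = gamma * scientists ** rho
    return c_term / (c_term + s_term)

beta, gamma = 0.111, 0.246  # model A1 point estimates quoted in the text
C, S = 100.0, 10.0          # hypothetical stocks with C > S

cobb_douglas = compute_share(beta, gamma, C, S, sigma=1.0)
more_substitutable = compute_share(beta, gamma, C, S, sigma=1.5)
less_substitutable = compute_share(beta, gamma, C, S, sigma=0.7)

print(f"sigma = 0.7: f = {less_substitutable:.3f}")
print(f"sigma = 1.0: f = {cobb_douglas:.3f}")
print(f"sigma = 1.5: f = {more_substitutable:.3f}")
```

Under these assumptions the share is ordered as the text describes: below β/(β + γ) when σ < 1 and above it when σ > 1. The ordering relies on C exceeding S; it would reverse if the compute stock were the smaller of the two.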

Limits to external validity stemming from our choice of domain
Our analysis focuses exclusively on two computer vision tasks. Although this is a small segment of the scientific and engineering problems that deep learning might be applied to, there are good reasons to expect that the key insights gained from studying a wide range of architectures for computer vision could carry over broadly to a wider range of R&D problems. One broad consideration in favour of this view is that deep learning often builds on common techniques, algorithms, and similar architectures across different subfields (see e.g. Goodfellow et al. 2016, Ch. 3). Across many domains, deep learning systems are based on similar ideas and implemented using similar techniques and algorithms. At a high level, almost all modern deep learning systems, independently of the modality or task they are trained for, are some 'deep' computational graph with many parameters that are learned through gradient descent along gradients of some loss function computed by backpropagation. Indeed, it is widely considered (e.g. by Alom et al. 2018) that one of the first instances in which these key features of deep learning were all instantiated was AlexNet in 2012, a model that is included in our dataset.
There certainly are some pronounced architectural divides across domains. For example, convolutional neural networks are widespread in computer vision, while transformer-based models are ubiquitous in machine translation. However, these architectural differences are also represented in our data. Indeed, our data includes convolutional neural networks, vision transformers, and architectures based purely on multi-layer perceptrons. Hence, much of the variation between modalities and tasks that exists within modern deep learning is reflected in the data that we consider. Some neural network architectures, such as the transformer, which is well-represented in our dataset, have been shown to operate effectively across domains, modalities and tasks (see, e.g. Reed et al. 2022). This suggests that the upshot of our findings may well generalize to domains outside of computer vision. What is more, approaches like the transformer are often amongst the state-of-the-art techniques across many R&D-adjacent tasks, such as code generation (Li et al. 2022), cheminformatics (Irwin et al. 2022), and bioinformatics (Elnaggar et al. 2020).
Moreover, work on neural scaling laws suggests that the relation between a model's size and its performance scales according to the usual power law independently of the domain the model is trained to handle. In fact, Henighan et al. 2020 find that for transformer models, there is a remarkable near-universal relation between the optimal model size and the compute budget across a range of domains spanning images, language, mathematics, video and more, which supports the notion that the role of compute depends strongly on the type of technique or architecture, not the type of domain in which these techniques are applied. For these reasons, we expect the key insights gained from studying a wide range of architectures for computer vision to carry over broadly to a wider range of R&D problems to which deep learning is applied.

How data availability influences our estimates
One reason to expect that our results may fail to apply to a broader set of problems is that we studied a set of problems within computer vision with a relative abundance of quality labelled data. In some domains, such as protein folding (where the crystal structures of proteins are expensive and arduous to generate), high-quality data might be less readily available or expensive to generate. As a result of the relative abundance of data in computer vision, the returns to compute might be higher than they would be in lower-data-abundance regimes. To illustrate this, consider a simple model where deep learning system performance can be described as a constant elasticity of substitution production function in data D and compute C. In such a model, whenever data and computation are gross complements (σ < 1), the returns to compute will be lower in low-data regimes than in high-data regimes. This suggests that the returns to compute in high-data regimes can be unusually high. Moreover, empirical evidence has shown that AI-based ideas production is rapidly expanding in data-rich sectors such as investment management (Abis and Veldkamp 2020), and that computer vision firms with access to additional data are more innovative (Beraja et al. 2020). Might our results, therefore, fail to generalize to low-data regimes? Our sense is that they might very well generalize, particularly for economically important R&D tasks. One piece of evidence for this view comes from work on neural scaling laws for language modelling. Hoffmann et al. 2022 derive a parametric loss function in which the amount of data and the model size enter in an additively separable way. Although this result is derived in the context of language modelling, where data is abundant, it does suggest that compute can substitute for data by applying it to train larger models, at least when this training is done appropriately.
Moreover, we might expect that for economically important R&D tasks, complementary investments in generating the necessary datasets to train machine learning models will be made. For these tasks, we might expect such investments to produce high-fidelity physical simulations, high-quality synthetic datasets, the proliferation of sensors, and higher-throughput measurement apparatuses. Hence, insofar as we expect R&D tasks to be of considerable economic importance, we might expect low-data regimes to be short-lived or otherwise atypical. For such tasks, a relative abundance of data may be representative.
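The claim that returns to compute are lower in low-data regimes when data and compute are gross complements can be checked numerically. The sketch below assumes a textbook CES form, performance = (w·D^ρ + (1 − w)·C^ρ)^(1/ρ) with ρ = (σ − 1)/σ, for which the elasticity of performance with respect to compute is (1 − w)·C^ρ / (w·D^ρ + (1 − w)·C^ρ). The weight w, the value of σ, and the data and compute levels are all hypothetical.

```python
def compute_elasticity(data, compute, sigma, weight=0.5):
    """Elasticity of CES performance with respect to compute.

    Assumed form: (weight * D**rho + (1 - weight) * C**rho) ** (1 / rho),
    with rho = (sigma - 1) / sigma. All parameter values are illustrative.
    """
    rho = (sigma - 1.0) / sigma
    d_term = weight * data ** rho
    c_term = (1.0 - weight) * compute ** rho
    return c_term / (d_term + c_term)

sigma = 0.5  # gross complements (sigma < 1)
low_data = compute_elasticity(data=10.0, compute=100.0, sigma=sigma)
high_data = compute_elasticity(data=1000.0, compute=100.0, sigma=sigma)

print(f"low-data regime:  elasticity w.r.t. compute = {low_data:.3f}")
print(f"high-data regime: elasticity w.r.t. compute = {high_data:.3f}")
```

With σ < 1, holding compute fixed and raising the amount of data raises the compute elasticity, which is the sense in which returns to compute in data-rich settings can be unusually high.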

Summary and implications
Our main contributions are as follows. First, we provide a framework for understanding the impact of two important trends: i) the recent breakthroughs using deep learning in R&D, and ii) the rapid scaling of computation in deep learning systems. We show that if deep learning is widely adopted in the R&D sector and induces high returns to computational capital, then technological change will, under suitable conditions, permanently accelerate.
Secondly, using data from two computer vision tasks that are considered key test-beds for deep learning, we produce empirical estimates that imply that deep learning is more capital-intensive than other forms of R&D; indeed, an optimizing firm would dedicate between 29% and 44% of its total R&D expenditure to (computing) capital. This result implies a clear empirical prediction: if deep learning is widely adopted in R&D and this increases the returns to computational capital, then technological change will permanently accelerate. Consequently, according to semi-endogenous growth theory, we will also get an acceleration of economic growth.
Thirdly, we make a methodological contribution by introducing a novel machine learning-based method for inferring human capital from scientific publications using an encoder to compress the inputs about the authors into a latent-space representation. In our approach, this encoder is tasked with learning representations of human capital that are most predictive of publication and citation-related outcomes that we have independent reasons to expect to be indicative of the quality of human capital. This human capital estimation predicts key outcomes 4-5 times more accurately than typical approaches in the literature.
Our work identifies three areas of future work that we expect to be fruitful. First, it would be valuable to better understand the diffusion of deep learning techniques within the economy, particularly within R&D. We know that deep learning is diffusing to domains as diverse as protein-folding (Jumper et al. 2021), semiconductor chip design (Mirhoseini et al. 2021), and programming (Li et al. 2022). But how quickly this diffusion is happening and what drives it is poorly understood, and micro-data charting it is lacking. Such analyses would inform us about the transition dynamics of AI-augmented R&D across different industries. In particular, slower diffusion would imply a more protracted transition to the higher growth rates identified in Section 7. Such analyses might also highlight areas where R&D cannot be augmented with current AI techniques, implying more modest macro-level productivity improvements.
Another valuable area for research would be to look at whether improvements in AI techniques tend to make R&D more capital-intensive. If true, the productivity gains could be more drastic. For instance, Aghion et al. 2019 show that continually increasing capital intensity in R&D would result in ideas production becoming fully automated. So long as θ > 0 in Equation 2, this would produce unbounded growth. While we cannot analyze the possibility of this increase without further data or strong assumptions on our model, it seems worth studying whether the capital-intensity of deep learning varies with time.
A final area for follow-up work is how AI-augmented R&D makes deeper changes to knowledge production. While we have made progress investigating the relative importance of inputs to knowledge production and the implications this could have for the rate of technological change, many questions remain. In our work, we assumed that knowledge produced by labs using deep learning is essentially the same as that produced by unaugmented labs. Recall that our model of knowledge production implicitly assumes that A(t)^θ (the intertemporal knowledge spillover for technological opportunities) is the same whether or not this knowledge was produced by AI-augmented R&D. There might be reasons to be sceptical of this assumption in either direction. AI-augmented R&D knowledge might be hard to share because deep learning is famously a 'black box' technique that does not provide easy intuitions about what is being done. On the other hand, there is evidence that deep learning systems can learn essential building blocks (e.g. laws of nature (Udrescu and Tegmark 2020)) and that machine learning artefacts such as models are easy to disseminate widely (e.g. via Hugging Face (Wolf et al. 2020)). These characteristics might make AI-augmented R&D easier to diffuse, as has already been seen with the ease of adapting, fine-tuning, or using existing models as 'backbones' for adjacent problems. While it is unclear whether knowledge produced by AI-augmented R&D has smaller or more significant spillovers, we expect it is worth investigating whether the extent of such spillovers depends on the technology used to generate knowledge.

Selection issues and measurement error
The data used in our empirical work may suffer from selection bias issues that are important to consider.Our dataset contains just models for which authors reported enough to infer how the model was trained and how much computation was needed.We might expect such reporting to be more common among computationally-intensive papers or papers whose performance is cutting-edge, which could introduce bias into our estimates-although it is not clear that this bias would be significant.If it did, it would likely positively bias our estimates of the returns to physical capital due to two effects.First, we suspect that publications that report these details are more likely to use large amounts of computation.Second, because the cost of deploying hardware for machine learning training is expensive (see e.g.Sharir et al. 2020), large compute deployments are likely associated with more effort to optimise training runs that ensures that these resources are used efficiently.Such efficiencies could lead to those deployments being better able to leverage computation to achieve particular results.As a consequence, our data would disproportionately be composed of those using computation more effectively.
There are reasons to expect that such bias would be small.First, leading machine learning conferences require or at least encourage researchers to report details about hardware usage. 13Second, the implementation of training runs are likely to be fairly similar across papers, as researchers typically use one of only a few types of open-source software that have distributed training settings and profilers that might be broadly comparable in achieving utilization.Moreover, we address this selection issue by using two different methods for inferring the amount of computation that was used in training (so that we could capture a wider range of models than would otherwise be possible).Hence, although selection bias might be introduced by the machine learning experiments included in our dataset, it is unclear whether this is a significant issue.Additionally, the variation in the amount of compute used spans over four orders of magnitude and will therefore dominate the variance in utilization rates, which suggests that this bias is likely to be small.Even if our estimate for the effectiveness of compute is overestimated, it is unclear whether this would imply that our estimate of capital share is over-estimated.This is because the expertise needed to optimize training runs will likely be captured in our human capital estimates so that these would also be overestimated, and it is unclear which effect would be more significant.
There are also potential measurement-error issues with our estimates of the computation used to train machine learning models. First, we estimate only the computation used in the final training run, not in all training runs. It is common for deep learning experiments to perform many (usually smaller) trial runs. These trial runs help inform the selection of the hyper-parameters (parameters whose values control the learning process) used in the final training run. Our estimates of the compute used to train deep learning models are, therefore, likely to be a fraction of the total computation that was used. If this were a constant fraction, it would not bias our estimates of the relevant elasticities. It is currently not clear to the authors whether larger training runs use proportionally more compute in selecting hyper-parameters than smaller training runs. Regardless of the direction of this effect, the ratio of total computation to the computation used in the final training run will not be equal across experiments. This introduces noise into our measure of computation, which will attenuate our estimates of the elasticity of computational capital downward.
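The attenuation argument can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the estimation procedure used in the paper: the true elasticity, the noise levels, the sample size, and the function names `ols_slope` and `simulate` are all hypothetical, and the noise is assumed to be classical (independent of true compute, in logs).

```python
import random

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

def simulate(true_elasticity=0.3, noise_sd=1.0, n=5000, seed=0):
    """Return (slope using total compute, slope using noisy observed compute)."""
    rng = random.Random(seed)
    # Hypothetical log total compute, spread over several orders of magnitude.
    log_total = [rng.gauss(0.0, 2.0) for _ in range(n)]
    # Log performance depends on log total compute plus an idiosyncratic shock.
    log_perf = [true_elasticity * c + rng.gauss(0.0, 0.2) for c in log_total]
    # Observed (final-run) compute differs from total compute by a ratio that
    # varies across experiments; in logs this is additive noise.
    log_observed = [c + rng.gauss(0.0, noise_sd) for c in log_total]
    return ols_slope(log_total, log_perf), ols_slope(log_observed, log_perf)

clean_slope, noisy_slope = simulate()
print(f"slope with total compute:    {clean_slope:.3f}")
print(f"slope with observed compute: {noisy_slope:.3f}")
```

Under these assumptions the slope on observed compute shrinks toward zero by roughly the ratio var(true) / (var(true) + var(noise)), which is 4 / 5 for the default values. A constant ratio between final-run and total compute would shift only the intercept and leave the slope unchanged.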
Similar attenuation bias might be introduced in the process of estimating the computational inputs used to train deep learning models, as these estimates rely on various approximations. To reduce the variance of our estimates, we used multiple methods for producing them and averaged the results. Given that the amount of compute used varies over many orders of magnitude, minor errors due to the use of approximations are likely to be washed out. Nevertheless, our use of approximate techniques could have attenuated our estimates, which would mean that our results are underestimates.

Conclusion
Much has been written on the potential mechanisms through which AI impacts productivity and output growth. Motivated by recent trends in deep learning, we provide empirical evidence for one of these mechanisms: the ability of scientists to efficiently harness more computational capital in R&D. We argue that this mechanism is consequential because standard endogenous growth models imply that greater capital intensity in R&D produces permanent increases in productivity growth and economic growth. We present evidence that deep learning-based computer vision research is significantly more capital-intensive than virtually all other R&D sectors in the U.S. If, as we argue, deep learning has similar impacts in other areas of R&D, then the widespread adoption of deep learning-based R&D could double U.S. productivity growth rates.

D1: Data on deep learning models
Our data on deep learning models were collated as follows. We use data on deep learning models for computer vision tasks from Thompson et al. 2020, which contains 46 publications with the reported performance of the highest-performing model on the relevant benchmark and estimates of the computational inputs required to train it. We supplemented this dataset with data on the test and training settings for each of these models, which we found by manually searching the publications. Where we could not find which training or test settings were used, we dropped the model from our dataset.
We further collected publications from arXiv and Papers with Code that present models for either of two computer vision tasks at any time after 2012. For the best-performing model presented by each publication, we record the performance and calculate the computational inputs required to train it, using methods described in Sevilla et al. 2022b. We again collected the test and training settings for each of these models by manually searching the publications. In total, our dataset contains 148 models, their reported performance, estimates of the computational inputs required in training, and their test and training settings.
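Training compute is commonly approximated in two ways: by counting the arithmetic operations the model performs, or by multiplying hardware time by achieved throughput. The sketch below illustrates both. The exact formulas used for our dataset are not spelled out in this appendix, so the function names, the factor of 3 for forward plus backward passes, the utilization parameter, and the geometric-mean combination are assumptions for illustration only.

```python
import math

def compute_from_operations(forward_flop_per_example, n_examples, n_epochs):
    """Approximate training FLOP by counting operations.

    Assumes the backward pass costs about twice the forward pass, so one
    training step costs roughly three forward passes.
    """
    return 3.0 * forward_flop_per_example * n_examples * n_epochs

def compute_from_hardware(n_gpus, training_seconds, peak_flop_per_second, utilization):
    """Approximate training FLOP from hardware time.

    `utilization` is the assumed fraction of peak throughput actually achieved.
    """
    return n_gpus * training_seconds * peak_flop_per_second * utilization

def combine_estimates(estimates):
    """Geometric mean of the available estimates (averaging in logs is an assumption)."""
    logs = [math.log10(e) for e in estimates]
    return 10 ** (sum(logs) / len(logs))

# Hypothetical model: 1 GFLOP per forward pass, 1M examples, 10 epochs,
# trained for one day on 8 accelerators at 30% of a 100 TFLOP/s peak.
by_ops = compute_from_operations(1e9, 1_000_000, 10)
by_hw = compute_from_hardware(8, 24 * 3600, 1e14, 0.3)
print(f"{by_ops:.3e} {by_hw:.3e} {combine_estimates([by_ops, by_hw]):.3e}")
```

Averaging in logs keeps a single very large estimate from dominating when the two methods disagree by orders of magnitude; an arithmetic mean would be an equally defensible reading of "average".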

D2: Data on baseline performance
Our data on machine learning baseline results may be found here.

D3: Bibliometric panel dataset
We generate a yearly panel dataset of features for 223,703 authors on 115,235 machine learning papers from 1993 to 2021 using data from arXiv, a popular open-access repository for machine learning papers; Microsoft Project Academic Knowledge; and Scopus (arXiv.org 2022; Project Academic Knowledge 2022; Scopus 2022). Our panel dataset contains yearly series on the following author-specific features: number of publications, number of total citations, and h-index (a measure of researcher productivity and impact, see Hirsch 2005). To build the dataset, we query arXiv for all papers from the following categories associated with machine learning: Machine Learning (stat.ML), Artificial Intelligence (cs.AI), Computation and Language (cs.CL), Computer Vision and Pattern Recognition (cs.CV), and Learning (cs.LG). This yields the full set of arXiv machine learning papers from 1993 to 2021. Using the procedure described in Muller et al. 2020, we match these papers to their corresponding entries in the Microsoft Project Academic Knowledge database to obtain author names and affiliations. We match Scopus publications to those in the Microsoft dataset using a combination of DOIs and string-distance measures for their titles and author names. At each iterative step, we perform manual quality checks on a random sample of matches; the highest false-positive rate among these samples is 4%. Through this procedure, all but 22,056 authors were matched.
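For readers unfamiliar with the h-index, the following minimal sketch computes it from a list of per-publication citation counts, following the definition in Hirsch 2005: the largest h such that h publications each have at least h citations. The function name and the example counts are hypothetical and are not drawn from our data.

```python
def h_index(citations):
    """Largest h such that at least h publications have h or more citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical author with five publications.
print(h_index([10, 8, 5, 4, 3]))
```

A yearly series can be obtained by applying the same function to the cumulative citation counts each publication had accumulated by the end of each year.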
To generate each series, we used bibliometrix, an R tool (Aria and Cuccurullo 2017), to query Scopus' API and retrieve all publications for each matched author in our dataset. For each publication, we then used pybliometrics, a Python interface to Scopus (Rose and Kitchin 2019), to retrieve its citation trajectory (the number of incoming citations per year), excluding self-citations. Authors' research fields were identified by retrieving Scopus' subject area data on each of their publications. For each author, we further gathered data on fractional credit scores for citations and publications, which were generated by dividing the number of citations and publications by the number of authors on each publication.
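The fractional credit scores can be sketched as follows. This is an illustration of the division-by-author-count rule described above, not the code used to build the panel; the record layout (`n_authors`, `citations`) and the example values are hypothetical.

```python
def fractional_credit(publications):
    """Return (fractional publication count, fractional citation count).

    Each publication contributes 1 / n_authors to the publication score and
    citations / n_authors to the citation score.
    """
    frac_pubs = sum(1.0 / p["n_authors"] for p in publications)
    frac_cites = sum(p["citations"] / p["n_authors"] for p in publications)
    return frac_pubs, frac_cites

# Hypothetical author with two co-authored publications.
example = [
    {"n_authors": 2, "citations": 10},
    {"n_authors": 4, "citations": 8},
]
print(fractional_credit(example))
```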

D4: Grants dataset
Data on grants received by institutions come from the Dimensions database of academic publications (Dimensions AI 2022). We query the API for each institution in the Scopus panel dataset on author affiliations. This returns nominal data in USD on all grants per project-year and per institution-year. To make intertemporal comparisons, we deflate the institution-level data using the GDP implicit price deflator (U.S. Bureau of Economic Analysis 2022). We use the Damerau-Levenshtein measure of string distance to match the institution-year data to the matched Scopus-Microsoft cross-sectional paper titles dataset. For the top 1000 papers, we find 731 matches, of which 1.2% are false positives. Overall, we match 50.8% of institutions in our titles dataset. On a random sample of 100 matches, 5% were found to be false positives. We report total grants received over the past 5 years by all authors' departments or employers in real 2015 USD.
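Converting nominal grant amounts to real 2015 USD amounts to rescaling by the ratio of the base-year deflator to the deflator of the year in which the grant was received. The sketch below shows this arithmetic. The function name is hypothetical, and the index values are placeholders chosen for easy arithmetic, not actual Bureau of Economic Analysis figures.

```python
def to_real_usd(nominal_usd, year, deflator, base_year=2015):
    """Express a nominal amount from `year` in `base_year` dollars."""
    if year not in deflator or base_year not in deflator:
        raise KeyError("deflator must contain both the grant year and the base year")
    return nominal_usd * deflator[base_year] / deflator[year]

# Placeholder price index values (NOT actual BEA data).
example_deflator = {2015: 100.0, 2018: 105.0, 2020: 110.0}

# A nominal grant of 110 USD received in 2020, expressed in 2015 dollars.
print(to_real_usd(110.0, 2020, example_deflator))
```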

D5: Institutional rankings data
Institutional rankings for Computer Science were generated using publication data from csmetrics.org (Institutional Publication Metrics for Computer Science 2022), an online dataset of institutional publication and citation metrics for computer science. These data combine measured (retrospective) and predictive (prospective) metrics to compute a measure of the impact of computer science publications by institution. Each institution in our dataset was matched to the entities in the csmetrics.org database using Levenshtein distance, a measure of string distance. Since not all institutions in our dataset were present in the csmetrics.org database, we were able to assign ranks to only 89.6% of unique institutions in our dataset. The institutions for which we were unable to retrieve rankings were smaller institutions that did not appear often in our data: institutional rankings were assigned to 99.4% of authors for whom institutional data was available. Moreover, the matching rarely produced false matches: of a random sample of 120 matches of institutions, only 0.9% were found to be incorrect.
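Matching institution names by Levenshtein distance can be sketched as below. This is a minimal illustration rather than the matching code used for our dataset: the normalization step, the length-normalized distance, the acceptance threshold of 0.2, and the institution names are all assumptions.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def best_match(name, candidates, max_normalized_distance=0.2):
    """Return the closest candidate, or None if none is within the threshold."""
    def normalize(s):
        return " ".join(s.lower().split())

    target = normalize(name)
    best, best_score = None, None
    for candidate in candidates:
        cand = normalize(candidate)
        score = levenshtein(target, cand) / max(len(target), len(cand), 1)
        if best_score is None or score < best_score:
            best, best_score = candidate, score
    if best_score is not None and best_score <= max_normalized_distance:
        return best
    return None

# Hypothetical institution names.
ranked_institutions = ["Example University", "Sample Institute"]
print(best_match("Example Universty", ranked_institutions))
print(best_match("Unrelated Lab", ranked_institutions))
```

A threshold trades off unmatched institutions against false matches, which is why a manual check on a random sample of matches, as reported above, remains necessary.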

Figure 3. Evaluating our human capital measure. Figure (3a). Correlations between human capital measures and publication outcomes for a hold-out set of 4,081 publications. Error bars indicate the 95% confidence interval. Prior citations are the cumulative total citations received by the authors up until the year prior to publishing the relevant publication (excluding self-citations). H-indices and publication counts are evaluated at the year of publication for each author. Journal rank here represents the ordinal value of each journal in descending order of [...]. Figure (3b) shows the precision (defined as 1/MSE) of predictions of our DNN-based model and Lasso regressions on the same hold-out set of 4,081 publications, estimated separately for each outcome.

Figure 4. Implied optimal R&D expenditure breakdown. Implied capital-cost shares given the estimates presented in Tables 4 and 5, computed as β/(β + γ). Error bars represent 90% confidence intervals generated by bootstrapping 10,000 iterations. We use the bias-corrected percentile method for bootstrapping confidence intervals for ratios outlined in Campbell and Torgerson 1999.
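A bias-corrected percentile interval for the ratio β/(β + γ) can be sketched as follows. This follows the standard textbook construction of the bias-corrected percentile bootstrap; whether it matches the cited procedure in every detail is an assumption. The bootstrap draws are simulated from hypothetical normal distributions rather than re-estimated from our data, and the function names and parameter values are illustrative only.

```python
import random
from statistics import NormalDist

def capital_share(beta, gamma):
    """Implied capital-cost share beta / (beta + gamma)."""
    return beta / (beta + gamma)

def bc_percentile_interval(point_estimate, bootstrap_estimates, level=0.90):
    """Bias-corrected percentile interval from bootstrap replicates."""
    std_normal = NormalDist()
    draws = sorted(bootstrap_estimates)
    n = len(draws)
    # Share of replicates below the point estimate, kept strictly inside (0, 1).
    below = sum(d < point_estimate for d in draws)
    p0 = min(max(below / n, 1 / (n + 1)), n / (n + 1))
    z0 = std_normal.inv_cdf(p0)  # bias-correction constant
    alpha = (1 - level) / 2
    lower_p = std_normal.cdf(2 * z0 + std_normal.inv_cdf(alpha))
    upper_p = std_normal.cdf(2 * z0 + std_normal.inv_cdf(1 - alpha))
    lower_idx = min(n - 1, max(0, int(lower_p * n)))
    upper_idx = min(n - 1, max(0, int(upper_p * n)))
    return draws[lower_idx], draws[upper_idx]

def demo(seed=0, n_boot=10_000):
    """Simulated example with hypothetical estimates beta = 0.6, gamma = 0.4."""
    rng = random.Random(seed)
    point = capital_share(0.6, 0.4)
    replicates = [
        capital_share(rng.gauss(0.6, 0.05), rng.gauss(0.4, 0.05))
        for _ in range(n_boot)
    ]
    lower, upper = bc_percentile_interval(point, replicates, level=0.90)
    return point, lower, upper

point, lower, upper = demo()
print(f"share = {point:.3f}, 90% interval = ({lower:.3f}, {upper:.3f})")
```

In an actual application the replicates would come from re-estimating β and γ on resampled data, so that the correlation between the two estimates carries through to the ratio.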

Figure 5. Predicted productivity growth under widespread deployment of AI in R&D. Steady-state productivity growth as a function of the implied capital-cost share in a competitive R&D industry when the elasticity of R&D output to the stock of ideas (θ) is 1/2 and the elasticity of R&D output to labour inputs (γ) is 2/5. Markers indicate point estimates of implied optimal R&D expenditure with deep learning according to models A1-C2 as estimated in Section 6. "Current region" indicates the current level of capital intensity of R&D according to NSF data, which our semi-endogenous growth model predicts to result in 0.5% to 1.3% productivity growth, a level consistent with observed recent US productivity growth.
Figure 6. Estimates after removing a random 5% of dataset. Median estimates when taking a random sub-sample of our dataset that excludes 1.6% to 5% of the total number of observations. Results are displayed as violin plots, using kernel density estimation to create the distributions. Inside the violins, the box plots show median and interquartile ranges.

Figure 7. Elasticity of substitution estimates. Kernel densities of the estimates of σ across each of our main models generated by bootstrapping 10,000 iterations.

Table 4. Deep learning production function estimates. Estimation results for image classification (n = 96) and object detection (n = 40).

Table 6. Estimation results for separate models with alternative window lengths. Estimates of models A1, B1, and C1 with different window lengths. Specifications are the same as in the main analysis in Section 6.