Causal theories, models and evidence in economics—some reflections from the natural sciences

Models have been extensively analysed in economic methodology, notably their degree of ability to provide explanations. This paper takes a complementary, comparative approach, examining theory development in the natural sciences. Examples show how diverse types of evidence combine with causal hypotheses to generate empirically based causal theories, a cumulative process occurring over a long timescale. Models are typically nested within this broader theory. This could be a good model for research in economics, providing a methodology that ensures good correspondence with the target system, especially as economics research is largely empirical, and has effective methods for causal inference. This paper analyses the key features of three successful theories in the natural sciences, and draws out some lessons that may be useful to economists. Some examples of good practice in economics are noted, e.g. involving money and banking, and the growth of the state. On the other hand, the widespread pre-crisis use of dynamic stochastic general equilibrium (DSGE) models that ignored the financial sector raises the question, how to realise what has been omitted? Nesting models in an empirically based causal theory could solve this. Furthermore, some phenomena have clear explanations, but mainstream theory obscures them, as with the Lucas puzzle about the

*Corresponding author: Michael Joffe, Department of Epidemiology and Biostatistics, Imperial College London, Norfolk Place, London W2 1PG, UK
E-mail: m.joffe@imperial.ac.uk
Reviewing editor: Duncan Watson, University of East Anglia, UK
Additional information is available at the end of the article

ABOUT THE AUTHOR Michael Joffe started off as a biologist: his first degree was in physiology at Cambridge, and later he worked for many years as an epidemiologist at Imperial College London. He published in numerous biomedical journals, including The Lancet. He later trained as an economist at Birkbeck College, University of London. His publications in economics include topics such as growth, the capitalist firm and feedback systems in the economy. He has also published in philosophy of science, on evidence discovery, causal inference, causation in biology and causal systems.

PUBLIC INTEREST STATEMENT
Economics traditionally relies on models to elucidate the workings of the complicated system that is the economy. However, there is no systematic way of ensuring these models correspond well with reality. An alternative way of generating knowledge is used in the natural sciences, e.g. biology, which also focuses on a complicated, heterogeneous and open-ended reality, much like the economy. This paper examines how the natural sciences combine diverse types of evidence with theorising, and argues this approach could be beneficial for economics. Models would still be used, but nesting them in a well-founded causal theory would clarify what is omitted in their construction. Furthermore, some phenomena such as international capital flows are readily understandable using an evidence-based approach, yet appear puzzling to mainstream economic theory. Economics could beneficially learn from the natural sciences.

Introduction
One important focus of economic methodology has been the analysis of models, e.g. whether they contain (or can in principle contain) true statements about the real world (Mäki, 2011), and the extent to which they are able to provide explanations (Reiss, 2012 and the ensuing discussion in JEM: Alexandrova & Northcott, 2013; Grüne-Yanoff, 2013; Hands, 2013; Hausman, 2013; Mäki, 2013; Reiss, 2013; Rol, 2013; Sugden, 2013). This approach fits well with the idea that philosophical analysis should begin with the practice of scientists, or here specifically with economists, because modelling is central to the discipline of economics as actually practiced. 1 This paper takes a different, complementary, approach. It examines the practice of natural scientists who investigate a complicated, heterogeneous and open-ended reality, much like the economy. The main focus is on biology, but a similar approach is widely adopted within the natural sciences and is not peculiar to biology. The natural sciences have proven highly successful in developing causal theories that explain important aspects of how the world works. In some cases, these causal theories are also useful in making predictions. The purpose of the paper is to draw methodological parallels between these scientific practices and economics, specifically to argue that they could provide a useful guide for generating reliable knowledge about the economy.
The use of biology as a source of methodological insight accords with Marshall's famous dictum that "The Mecca of the economist lies in economic biology" (Marshall, 1920; preface to the 8th edition, p. xii); this was on the basis that "economics, like biology, deals with a matter, of which the inner nature and constitution, as well as the outer form, are constantly changing" (Marshall, 1920, appendix C, p. 637). This position is reinforced by the observation that the biological sciences have progressed enormously since Marshall's day.
Opinion is divided as to the degree of success that Marshall himself achieved in following this advice. For example, Gee considered that these statements were "not mere window dressing; they reflect an attitude which profoundly influences the style of analysis throughout the Principles" (Gee, 1983). On the other hand, according to Hodgson (1993), the biological aspects of Marshall's thought were never well developed by him, and were obliterated by his followers. Nevertheless, a similar opinion to Marshall's was put forward much more recently by Hahn (1991):
"I am pretty certain that the following prediction will prove to be correct: theorising of the 'pure' sort will become both less enjoyable and less and less possible ... rather radical changes in questions and methods are required ... the signs are that the subject will return to its Marshallian affinities to biology."
My aim is to describe an approach to theory development that is only incompletely developed in economics. 2 The hope is that economists will find it useful to have an understanding of the way that successful theories have been constructed in the natural sciences.
The meaning I attach to the term "theory" will emerge from brief outlines of case studies that convey how natural scientists have gone about their work, rather than attempting a systematic analysis based on the verbal dissection of the various possible meanings of the term as used by methodologists and by scientists. After presenting the case studies, I identify the main features that could be useful in economics, and discuss how theory in this sense relates to models. I then present some examples of the use of a similar approach by economists, and other examples where economic theory could benefit from it. Finally, I make the case that it could be used more systematically in the development of economic theory than has traditionally been the case, and that this would have the advantage of producing theoretical accounts that bear a systematic resemblance to actual reality.
This cross-disciplinary, comparative perspective is complementary to the existing literature. It is bottom-up in that it focuses on the actual practices of scientists, but with the unusual feature that the scientists in question are biologists and other natural scientists, whereas the target discipline is economics.

Broad, empirically informed causal theories
The case studies in this section are deliberately chosen so as not to require expert knowledge in biology or other natural sciences. They are intended to be reasonably representative of this methodological approach to doing science, although representativeness cannot be guaranteed; others may judge that I have been unintentionally selective.

Germ theory of disease
The germ theory of disease emerged gradually in Western Europe between the sixteenth and nineteenth centuries, and then became the established explanation of many diseases. Previously, European medicine had been dominated by the miasma theory, the idea that disease was due to a poisonous vapour or mist, emanating from decomposed matter. Related to this, certain types of creature such as maggots were thought to arise from dead flesh by a process known as "spontaneous generation", rather than from parents, i.e. causal agents very much like themselves.
In 1546, Fracastoro proposed that epidemic diseases are caused by transferable seed-like entities, but at the time this suggestion lacked direct evidence. It also lacked plausibility until micro-organisms were first observed by van Leeuwenhoek, the pioneer of microscopy and microbiology, in the 1670s. In the early eighteenth century, Fracastoro's idea was linked, e.g. by Andry and by Bradley, with the existence of micro-organisms on the grounds that they could provide a plausible causal mechanism. Direct evidence was not provided until 1835, when Bassi demonstrated that a disease which was decimating the silkworm industry in France and Italy was due to a contagious living entity (now known to be a fungus named Beauveria bassiana). Bassi also introduced successful interventions, including the use of disinfectants, separation of the rows of feeding caterpillars, isolating and destroying infected caterpillars and keeping farms clean.
In the mid-nineteenth century, evidence on the transmissibility of certain diseases was obtained, relating to the phenomenon itself without specifying the mechanism. In 1847, Semmelweis realised that puerperal fever was being spread to women in labour by doctors who had been conducting autopsies. Enforcing a hand-washing rule using chlorinated lime water reduced the mortality from childbirth from 18 to 2.2%, although his findings had little impact on the medical profession at the time.
During the 1850s, Snow established that the risk of cholera was related to the household's water supply. In one study, a simple statistical analysis showed that households supplied by the Southwark and Vauxhall Company had more than eight times the risk of those supplied by the Lambeth Company; the latter drew its water from the Thames upstream of London, before the river reached the city and its sewage, whereas the former supplied water contaminated by sewage. Another study demonstrated that a severe outbreak in Soho, central London, was spatially clustered around a particular pump, later found to have become contaminated by a cesspit. Snow had no knowledge of the rudimentary germ theory of disease outlined above, but his scepticism of the miasma theory motivated his research, the findings of which were not accepted by government officials until much later.
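The kind of simple comparison Snow made can be sketched in a few lines of Python. The house and death counts below are the figures usually quoted from Snow's 1855 tabulation for the first seven weeks of the 1854 epidemic; they are not given in this paper and are used here as an assumption, and the function name is purely illustrative. Other periods of the epidemic give different ratios.

```python
def deaths_per_10000_houses(deaths: int, houses: int) -> float:
    """Crude cholera death rate per 10,000 houses supplied."""
    return 10_000 * deaths / houses


# Assumed figures (commonly quoted from Snow, 1855; not taken from this paper).
southwark_vauxhall = deaths_per_10000_houses(deaths=1263, houses=40046)
lambeth = deaths_per_10000_houses(deaths=98, houses=26107)

# Difference-making evidence: how many times higher is the risk in one group?
risk_ratio = southwark_vauxhall / lambeth

print(f"Southwark and Vauxhall: {southwark_vauxhall:.1f} per 10,000 houses")
print(f"Lambeth: {lambeth:.1f} per 10,000 houses")
print(f"Risk ratio: {risk_ratio:.1f}")
```

The calculation says nothing about mechanism. It shows only that the water source makes a difference, which is the sense in which Snow's evidence was of the difference-making kind.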
In subsequent decades, work by Pasteur, Koch and others established beyond any doubt that some diseases are caused by micro-organisms. Their work included research on rabies, tuberculosis, anthrax and many other diseases. A pattern was emerging: the hypothesis that many diseases are caused by micro-organisms was established, together with increasing information on related ideas such as immunity. Subsequent research, e.g. Ross's work on malaria, showed that some diseases also involve a vector such as the mosquito.
How then can we characterise the key features of the germ theory of disease? The core theoretical concepts are that (1) micro-organisms can invade the body and cause disease; and (2) micro-organisms multiply. A corollary is that such diseases can be spread either directly from person to person, via an inanimate medium such as water or by a vector. This clearly meets the criterion for "a genuine scientific theory" set out by Reiss (2011): "a small number of explanatory hypotheses that can be used over and over again in the explanation of a wide range of phenomena".
An important feature of the germ theory of disease is that its language refers to capacities of the important entities: the capacity of particular types of micro-organisms to invade the body and cause disease; their capacity for reproduction. Other components also involve a similar language, e.g. the capacity of the human immune system to defend against infection, and the capacity of insect vectors such as mosquitoes to play their role in the micro-organism's life cycle. The theory sets out to describe how (this aspect of) the world works, in terms of the structure and capacities of the component entities, i.e. of causation. It is therefore ontic not epistemic in intention.
As would be expected of a hypothesis involving causation, the history of the germ theory of disease involves the interweaving of different types of evidence-and specifically, both of mechanistic and difference-making kinds. 3 Moreover, the participating scientists expect that they should eventually fit together neatly, in accordance with the view that scientists operate according to the principle that a causal relationship is one that has a mechanism that by its operation makes a difference (Joffe, 2013).
Another feature of the germ theory of disease is that it evolved over time. The core concepts, from Fracastoro's transferable seed-like entities onwards, emerged long before the evidence was adduced that showed the theory to be true of many diseases. The theory was modified and amplified as evidence accumulated, e.g. to include insect vectors. A theory of this kind is therefore broad and descriptive, built on an interlocking nexus of evidence and indefinitely alterable and extendable to new phenomena in the light of new evidence.
It also encompasses heterogeneity: the biology of rabies, anthrax, tuberculosis and malaria is quite different in each case, but the core concepts remain applicable, with the details depending on the specific evidence on each disease.
In addition, such a scientific theory is "allowed" to be incomplete. Not all diseases are caused by infectious agents, and not all that are so caused are contagious. Conversely, not all micro-organisms cause disease; indeed it is now known not only that most are harmless, but that some types of micro-organisms are beneficial, and even essential. Finally, multiple causation is permitted: the impact of an invading horde of micro-organisms may depend on unrelated factors such as nutritional status or genetics.
Several features that apply to the germ theory of disease (and, as I will argue, to scientific theories more generally) do not apply to models; this theme will be discussed in more detail in Section 4. These include the causal language of capacities; the ontic rather than epistemic focus; the importance of the role of evidence of multiple interlocking kinds; and the property that they are broad, descriptive and heterogeneous, as well as being incomplete and allowing multiple causation as described in the previous paragraph.
Models do, however, appear in the germ theory of disease literature. A prime example is Ross's model developed in the 1910s, which he called "a priori pathometry": he calculated that below a critical level of the ratio of the number of mosquitoes to the number of humans, a malaria epidemic would be unable to propagate itself. A similar approach, using differential equations to model transmission dynamics, has been extended to cover infectious diseases in general (Anderson & May, 1991). Such models take account of the specific causal properties of each of the diseases.
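To make the threshold idea concrete, the following is a minimal sketch of a simplified Ross-type malaria model, written as two differential equations for the infected fractions of humans and mosquitoes and integrated with a basic forward-Euler scheme. The equations are the textbook simplification rather than Ross's own notation, and every parameter value (biting rate a, transmission probabilities b and c, human recovery rate r, mosquito death rate mu) is a hypothetical number chosen only for illustration.

```python
def critical_mosquito_ratio(a, b, c, r, mu):
    """Mosquitoes per human at which R0 = m*a*a*b*c / (r*mu) equals 1."""
    return r * mu / (a * a * b * c)


def simulate_ross(m, a, b, c, r, mu, x0=0.01, days=5000, dt=0.1):
    """Integrate the simplified model and return the final infected
    fraction of humans.

    x: infected fraction of humans, y: infected fraction of mosquitoes.
    dx/dt = m*a*b*y*(1 - x) - r*x
    dy/dt = a*c*x*(1 - y) - mu*y
    """
    x, y = x0, 0.0
    for _ in range(round(days / dt)):
        dx = m * a * b * y * (1.0 - x) - r * x
        dy = a * c * x * (1.0 - y) - mu * y
        x += dt * dx
        y += dt * dy
    return x


# Hypothetical parameter values, per day where a rate is meant.
params = dict(a=0.3, b=0.5, c=0.5, r=0.01, mu=0.1)
m_star = critical_mosquito_ratio(**params)

below = simulate_ross(0.5 * m_star, **params)  # half the critical ratio
above = simulate_ross(2.0 * m_star, **params)  # twice the critical ratio

print(f"critical mosquitoes per human: {m_star:.4f}")
print(f"final human prevalence below threshold: {below:.6f}")
print(f"final human prevalence above threshold: {above:.4f}")
```

With the mosquito-to-human ratio below the critical value the infection dies out, whereas above it the infection settles at a positive level. The model is only meaningful because its terms (biting, transmission, recovery, mosquito mortality) come from the causal account of malaria in which it is nested.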

The circulation of the blood in humans
Before Harvey's ground-breaking research, published in 1628, it was generally believed that the blood oscillated in the blood vessels and that it was produced in the liver, a theory dating back to Galen in the second century and still regarded as authoritative in the seventeenth century. Harvey conducted a large variety of investigations, culminating in his theory that the blood circulates in a closed circuit rather than oscillating, and that it is not newly created in each heartbeat. His research methods included observations and experiments on a large variety of animals, mainly vertebrates but also some invertebrates. For example, by ligating arteries and veins, he established that blood flowed from the heart in the arteries and towards the heart in veins, and that the valves in the veins and in the heart itself guaranteed this. There were two separate circulations, one from the right side of the heart that perfused the lungs, the other from the left side of the heart to the rest of the body. One of his studies was quantitative, and showed that the heart pumps out 540 lb (245 kg or about 240 L) each hour, making the new creation of this quantity of blood highly implausible. As is clear from this description, he combined empirical research with careful attention to mechanistic hypotheses. The heart was merely a pump, not the mystical seat of the spirit.
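The arithmetic behind the quantitative argument is easy to reproduce. The sketch below assumes the commonly quoted reconstruction of the calculation, about 2 ounces of blood expelled per beat at 72 beats per minute; neither input is stated in this paper, so both are labelled as assumptions, and the function name is illustrative.

```python
OUNCES_PER_POUND = 16
KG_PER_POUND = 0.45359237


def hourly_output_lb(stroke_oz: float = 2.0, beats_per_minute: int = 72) -> float:
    """Weight of blood expelled by the heart in one hour, in pounds.

    Default inputs are assumed values from the usual textbook
    reconstruction, not figures taken from this paper.
    """
    ounces_per_hour = stroke_oz * beats_per_minute * 60
    return ounces_per_hour / OUNCES_PER_POUND


output_lb = hourly_output_lb()
output_kg = output_lb * KG_PER_POUND

print(f"{output_lb:.0f} lb per hour, roughly {output_kg:.0f} kg")
```

Under these assumptions the total accumulated over 60 minutes is several times the weight of a human body, which is why continuous fresh production of blood was so implausible and recirculation so much more credible.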
The main missing element in his account was to explain how the blood got from the arteries to the veins; his idea was that it passed through "pores" in the flesh. This was shown to be incorrect once microscopes became available, when Malpighi observed capillary vessels in 1661, thus filling the gap.
This particular example of a physiological theory is unusual in that so much was achieved so early by one person. It has not only stood the test of time, but has also been augmented by a large variety of more recent findings; this cumulative growth of knowledge is far more typical. For example, plasma proteins exert osmotic pressure, thus pulling fluid into the circulatory system, balancing the hydrostatic pressure that results from the pumping of the heart. This can be expressed as the Starling equation (1896), which models fluid movement between compartments.
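As an illustration of how such a model sits inside the broader causal account, here is a minimal sketch of the Starling equation in its standard textbook form, J_v = K_f [(P_c - P_i) - sigma (pi_c - pi_i)]. The pressures used in the example calls are hypothetical round numbers in mmHg, and the filtration and reflection coefficients are set to 1 purely for simplicity.

```python
def net_filtration(p_cap, p_int, pi_cap, pi_int, sigma=1.0, k_f=1.0):
    """Net fluid movement out of a capillary (positive = filtration).

    p_cap, p_int:   hydrostatic pressure in capillary and interstitium
    pi_cap, pi_int: osmotic (oncotic) pressure in capillary and interstitium
    sigma:          reflection coefficient (assumed 1 here)
    k_f:            filtration coefficient (assumed 1 here)
    """
    return k_f * ((p_cap - p_int) - sigma * (pi_cap - pi_int))


# Hypothetical values: high hydrostatic pressure at the arteriolar end,
# lower at the venular end, with the same osmotic pressures throughout.
arteriolar_end = net_filtration(p_cap=35, p_int=0, pi_cap=25, pi_int=1)
venular_end = net_filtration(p_cap=15, p_int=0, pi_cap=25, pi_int=1)

print(f"arteriolar end: {arteriolar_end:+.1f} (fluid leaves the capillary)")
print(f"venular end: {venular_end:+.1f} (fluid is drawn back in)")
```

The sign change between the two ends reflects the balance described in the text: hydrostatic pressure pushes fluid out, and the osmotic pull of plasma proteins draws it back.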
Another important addition to knowledge about the circulatory system is that the force of contraction of the heart muscle increases with a larger volume of blood filling the heart when all other factors remain constant (known by various names such as "Starling's Law"). This synchronises the output of the heart with the amount of blood reaching it, among other things. The mechanical and molecular basis of this phenomenon is now well understood. In addition, the output of the heart can be affected by other factors, such as exercise.
Although not generally described as a theory (as is typical in physiology), this account of the blood circulation has similar properties to those listed for the germ theory of disease. Its core concepts are the capacities of its components, this time with a heavy emphasis on structure, e.g. valves, as well as the propensity of heart muscle to contract rhythmically, with additional features in the more recent contributions. These include the importance not only of capacities, but also of flow and its causal efficacy, e.g. in a quantitative sense, an aspect of causation that perhaps needs more attention both in the philosophy of causation and in economics. As with the germ theory of disease, the development of this theory involved the interweaving of different types of evidence, both of mechanistic and difference-making kinds.
A difference between this example and the germ theory of disease is that there is far less heterogeneity: whereas the germ theory is different for every disease, one can refer to "the" human blood circulatory system, and the theory was developed on this basis. The reason for this is that it is an evolved system in a single species. Evolution leads to the development of systems that are rather similar across individuals. This is typical in physiology, whereas the germ theory involves the coevolution of different species, which is far less predictable.
Causal incompleteness and multiple causation also feature, e.g. that "Starling's Law" only applies to healthy circulatory systems (a ceteris paribus condition), and that the output of the heart can be affected by different factors. In addition, important aspects of the physiology of the circulation have been captured as mathematical models such as Starling's equation; this is nested within the broader descriptive, causal theory.

Continental drift and tectonic plate theory
As early as 1596, Ortelius noted the similarity of the coastlines of continents on the opposite sides of the Atlantic Ocean, based on maps of the time. Many others followed, but the theory of continental drift, as it became known, was contradicted in the mid-nineteenth century by Dana's Permanence Theory, which maintained that the main outlines of continents had existed since the earliest time.
Further types of evidence accumulated that supported continental drift, notably similar rock formations on either side of oceans, as well as observations on ancient forms of animal and plant life that similarly were present in widely separated continents. The evidence for this theory was set out in a more complete way by Wegener in 1912, but it was almost half a century before it was accepted.
One of the problems was the lack of a plausible mechanism that would be strong enough to move continents. Also, some of the mechanisms that were proposed required, for example, that the Earth should have been increasing in size.
The situation began to change in the late 1950s, with the arrival of new types of evidence, notably the observation of seafloor spreading. This occurs at mid-ocean ridges, such as the one that runs down the middle of the Atlantic Ocean. Volcanic activity creates new seafloor, as magma rises through fractures and then cools. This suggested that the seafloor itself moves and carries the continents, rather than the continents ploughing through the sea as in earlier theories. This accorded with other evidence that was accumulating, e.g. on patterns in the Earth's magnetic field. It led to the development of tectonic plate theory, which quickly became widely accepted.
The new theory involved observations on the different densities of continental and oceanic parts of the Earth's crust, convection flows in the upper mantle and the recognition of the different types of boundary that occur where the plates meet. The boundaries of the plates were mapped, with seven or eight major plates and numerous minor ones being identified. It was found that the formation of new crust in the mid-ocean ridges is balanced by the loss of crust into the mantle as part of the process of subduction, where one plate is pushed under another (implying that the Earth does not have to change size).
The principles of plate tectonics were able to explain more than the phenomena that had led to the idea of continental drift: volcanic activity and earthquakes are also attributable to the same forces, and occur particularly at subduction zones. Mountain building is also explicable from the interaction of tectonic plates, when one is crumpled and pushed upwards. However, the exact mechanism involved in plate tectonics is not yet established, the main candidates being mantle convection dynamics, gravitation and Earth rotation.
The core concepts of the current theory of plate tectonics thus involve magma flows in mid-ocean ridges, differential densities of parts of the Earth's crust, convection flows in the upper mantle, the physical processes of subduction and mountain building, etc. These concern the capacities, including the structure and composition, of the relevant entities, as well as flows. The language is causal.
Once again, we see different types of evidence interweaving, with a variety of difference-making evidence for the older continental drift idea, some in geology and some in biology (both of living organisms and of fossils). In this story, it was only the recognition of the plausibility of a mechanism that led to its acceptance, and this was based on observations of the ocean floor including ridge spreading and magnetism ("magnetic striping"). It is noteworthy that experimentation did not play any important role in either the original theory of continental drift or in the theory of plate tectonics. All the key pieces of evidence were observational.
A conspicuous feature of tectonic plate theory is that it is able to explain not only the group of phenomena relevant to continental drift, but also a range of other important topics such as volcanic activity, earthquakes and mountain building. It may not accord perfectly with Reiss' "small number of explanatory hypotheses", but the range of explanatory power of a single causal account is nevertheless impressive.
As noted for previous examples, the theory evolved over time, but in this case the history is of a precursor version, continental drift. It set out to explain the same range of phenomena, but was unsuccessful.
In this context, as with the Starling equation that models the balance of forces affecting the fluid in the circulatory system, mathematical models can be developed that relate to specific components of the overall theory. For example, Le Pichon and Sibuet (1981) modelled the formation of a particular type of ocean and continent boundary ("passive margin"), using a mathematical stretching model, and tested this against data from the north-eastern Atlantic. As with the Starling equation, the model is embedded within the broader causal theory, i.e. it is concerned with phenomena within an empirically based causal account of how this aspect of the world actually works.

Empirically based causal theories
These examples demonstrate how scientific progress can occur. It is a cumulative process that generates a theory able to explain all aspects of a phenomenon: its diverse range of features together with the causal forces that bring them about. Such theories are not one-off creations; rather, they start tentatively, and then progress (albeit often with long delays), possibly with some blind alleys. Eventually, a successful theory reaches a point where the accumulated knowledge is secure, and one can safely say that the theoretical categories correspond to those of nature. Even after this point is reached, further contributions continue to be added.
Such theories can be characterised as empirically based causal theories. They are ontic, unlike models, which are explicitly epistemic. And they accommodate heterogeneity, incompleteness and multiple causation, in the senses used above. This again distinguishes them from models, in which simplicity is a key virtue. They also have a primary orientation towards causal explanation, aiming to show how an aspect of the world works, rather than prioritising prediction, as tends to be a feature of economic theory. In so far as science has been able to make successful predictions, prior causal understanding has been the foundation on which the predictions are made.
The first contribution is sometimes a causal idea, as with the germ theory of disease, and sometimes an observation, as with continental drift. Crucially, evidence (induction) and causally explanatory ideas (deduction) are both involved. In principle, the process is iterative, but as we have seen it is not necessarily as neatly ordered as that term might imply.
Evidence is diverse. Experiments have played an important role in some cases, but observational evidence has also made crucial contributions: as already remarked, plate tectonics was not based on experimental evidence, and the same is true for much of biology, including evolutionary biology, ecology and epidemiology. Also, evidence is both of the difference-making type, and aimed at elucidating the mechanism: its structure/composition, as well as how it works. Diversity of types of evidence is a key strength: a large number of distinct types means that they can be mutually corroborative.

Theories in economics
Cumulative building of empirically based causal theories as described in the previous section, which is widely used within the natural sciences, is not the mainstream methodology in economics. Rather, the usual approach is to go straight for a model. There are exceptions, however.

Examples of good practice
An instructive example is the study of the nature and origin of money. By the early twentieth century, a good understanding had been achieved by Schumpeter and others (see Chick, 2005 for a historical review). One important component was the realisation that the assumption of monetary neutrality, which is routine in orthodox economics, obscures many central features of modern economies. A second was the debunking of the assumption that all money is created by the state. There are numerous important issues here that are beyond the scope of the present paper, but the central insight is that money is endogenous: it is created within the economy in response to particular influences.
From the 1970s, this understanding was obscured by a schematic account that became the new orthodoxy, which still appears in some textbooks. Not only is money creation assumed to be a state activity and exogenous (uninfluenced by what is going on in the economy), this story also involves a systematic misrepresentation of the banking sector. It is said that banks accept deposits, and then use these "loanable funds" to make loans, a notion that is still frequently heard in ordinary discourse among economists.
The insight that most money is endogenous has been kept alive during the last few decades by a few economists who resisted that new orthodoxy. Recently, good accounts have been published that explain precisely how the monetary system works (McLeay, Radia, & Thomas, 2014a, 2014b; Ryan-Collins, Greenham, Werner, & Jackson, 2012). In brief, the vast majority of money is created by banks (97% in modern-day Britain, for example), and this is done when loans are made or assets are bought; the corollary is that repayment of loans involves the destruction of money. The implication is that the money supply depends on the activity of the banking sector, and is a side effect of banks' attempts to generate profits by making loans, which in turn depends on the demand for loans and on banks' assessment of their risk. Central banks also play an important regulatory role.
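The bookkeeping behind this mechanism can be sketched with a toy balance sheet. The class below is a hypothetical illustration, not a description of any real banking system: it ignores reserves, capital requirements, interest and interbank settlement, and tracks only the two entries that change when a loan is made or repaid.

```python
class ToyBank:
    """A deliberately minimal bank balance sheet (illustrative only)."""

    def __init__(self):
        self.loans = 0.0     # asset: what borrowers owe the bank
        self.deposits = 0.0  # liability: money held by customers

    def make_loan(self, amount: float) -> None:
        # Lending credits the borrower's account: a new asset (the loan)
        # and a new liability (the deposit) appear together.
        self.loans += amount
        self.deposits += amount

    def repay_loan(self, amount: float) -> None:
        # Repayment reverses both entries, so the deposit money disappears.
        if amount > min(self.loans, self.deposits):
            raise ValueError("repayment exceeds outstanding loans or deposits")
        self.loans -= amount
        self.deposits -= amount


bank = ToyBank()
bank.make_loan(100.0)
money_after_lending = bank.deposits

bank.repay_loan(40.0)
money_after_repayment = bank.deposits

print(f"deposit money after lending: {money_after_lending}")
print(f"deposit money after partial repayment: {money_after_repayment}")
```

No prior deposit is needed for the loan to be made in this sketch, which is the point of contrast with the "loanable funds" story.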
The correct description of the money-generating mechanism was achieved by patiently documenting what actually happens in the financial system, describing how banks really behave. This is reminiscent of the emphasis on description of the reality in front of the observer that characterises natural sciences such as biology, followed by generalisation and explanation. It should actually be less problematic to do this in economics than it was, for example, for Harvey when trying to describe what was happening in the fast-moving circulatory system, but this would involve overcoming what appears to be an aversion among researchers to describing the obvious, i.e. what is widely seen to be happening.
Another example of theory development based on systematic empirical work is a two-volume study of the growth of the modern state (Lindert, 2004). This describes the growth of the state qualitatively and quantitatively in each of the major countries that developed rapidly after the industrial revolution, together with an analysis of the causal factors in that country. It then provides an overview of the forces behind state growth, while acknowledging the between-country heterogeneity. Thus, it encompasses description, generalisation and explanation, as well as the limits to generalisation imposed by factors specific to each country. This use of comparative economic history is a good model for developing theory, not least because it ensures that any explanation or suggested causal mechanism corresponds to the spatial and temporal patterns that actually occurred, as well as paying attention to specific factors that may have been present in certain countries.

Models: How to recognise what has been omitted?
A prominent example of how some aspects of orthodox theory are seriously problematic relates to the 2008 financial crisis, and especially to the failure of most of the economics profession either to predict it in advance, or to understand it once it had happened. The standard workhorse model in macroeconomics for some time has been the dynamic stochastic general equilibrium (DSGE) model. In fact, there is a large variety of DSGE models, with various features. But in the run-up to the crisis, the models that were in use assumed the absence of financial frictions. This may not always even have been done explicitly; rather, the financial sector and any possible problems arising in it were just absent from the model. Since the crisis, it has proved unproblematic to add the necessary financial component, but this is little consolation. In a wide variety of modelling traditions, one of the basic tenets is that one should be explicit in establishing the boundaries of the model, and clear about what has been left out (see e.g. Sterman, 2000). The issue here is, how do the modeller, and the user of the model and its results, realise what has been omitted?
If models are conjured out of the air, as appears often to be the case with economic theory, there is no obvious answer to this question, apart from exhortation to be careful. However, if the approach to theory development advocated in this paper is followed, and models are then nested within a broader, empirically based causal theory, then it is relatively easy to see what has been omitted in the modelling process.

Standard theory can be a barrier to causal understanding
Examples of orthodox methods of theorising that run into trouble are easy to find. A prime example is the classic paper by Lucas (1990), which attempted to explain why capital does not flow from capital-rich to capital-poor countries, as standard theory predicts it should: under diminishing returns, the capital-poor countries would offer higher returns. This paper is a gem of its kind, the language and the mathematics both being highly elegant.
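The prediction at issue can be made explicit with a minimal sketch. It assumes a Cobb-Douglas production function per worker, a common textbook form chosen here purely for illustration; the symbols $y$, $k$, $A$ and $\alpha$ are assumptions of this sketch rather than notation taken from the present paper.

```latex
% Illustrative sketch (requires amsmath). Assumed symbols:
% y = output per worker, k = capital per worker,
% A > 0 = productivity, 0 < \alpha < 1 = capital share.
\begin{align}
  y &= A k^{\alpha}, \\
  r &= \frac{\partial y}{\partial k} = \alpha A k^{\alpha - 1}, \\
  \frac{r_{\text{poor}}}{r_{\text{rich}}}
    &= \left( \frac{k_{\text{rich}}}{k_{\text{poor}}} \right)^{1 - \alpha} > 1
    \quad \text{when } k_{\text{poor}} < k_{\text{rich}}
    \text{ and } A \text{ is common to both.}
\end{align}
```

With equal productivity, the return to capital is higher where capital is scarcer, which is why theory of this kind predicts flows from rich to poor countries.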
Lucas started with neoclassical theory, and then compared reality with it, resulting in a puzzle or "paradox" that he attempted to explain. But let us start at the other end. From the early 1990s until recently, the main observation about international capital flows is not the lack of massive rich-to-poor country flows, but the presence of massive poor-to-rich country flows. The prime example is of capital flows from China to the United States, but the phenomenon is more widespread. This is very straightforward to explain, if one combines a variety of types of evidence. The 1978 reforms in China (institutional description) resulted in high productivity at very low cost (comparative price data), as well as high profitability and capital accumulation in the corporate, household and public sectors (data on capital). This generated a huge surplus above what was required for investment, even with the extremely high investment levels that characterised the Chinese economy. The excess capital, much in hard currency as a result of export-oriented manufacture, flooded out of China and was used to purchase US Treasury Bonds, real estate in America and elsewhere, and much else. Thus, an explanation involving large-scale capital generation and consequent flows emerges from the evidence.
China is unusual in the scale of this phenomenon, but in fact the earlier successful East Asian economies had a parallel experience on a smaller scale. Also, a similar phenomenon has been observed with other economies that have generated capital in excess of needs, notably the major oil exporters.
What then is the problem with Lucas' analysis? Part of the answer is that it is myopic: instead of seeing the world as it is, its scope is determined by the dominant theoretical perspective; its starting point is epistemic, not ontic. And even though it then confronts the theory with the data, its vision remains restricted. This is reminiscent of the argument that economic analysis should be data-first, not theory-first (Juselius, 2011).
A further point concerns flows. It is well recognised that neoclassical theory tends to be static, but it is less well understood that this involves a systematic neglect of flows, which can have causal efficacy, as in this case. Neglect of flows is deeply rooted in neoclassical methodology: for example, in basic consumer theory, rather than the whole situation of the consumer being explored, the analysis fixes the buying power available (the "budget constraint"), so that the theory is reduced to decision theory. Standard theory removes a crucial dimension of economic life: the flows become invisible, which is odd, given that market exchange is always and everywhere a two-way flow.
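The fixing of buying power can be seen in the textbook statement of the consumer's problem, sketched here under standard assumptions; the symbols $u$, $x$, $p$ and $m$ are illustrative and are not taken from the present paper.

```latex
% Illustrative sketch (requires amsmath). Assumed symbols:
% u = utility function, x = bundle of goods,
% p = price vector, m = income available to spend.
\begin{equation}
  \max_{x \ge 0} \; u(x) \quad \text{subject to} \quad p \cdot x \le m
\end{equation}
```

Here income $m$ enters as a given parameter. Where it came from, and where the payment $p \cdot x$ goes next, lie outside the problem, which is the sense in which the flows drop out of view.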

Intuitive knowledge can protect bad theory
One other potential problem can occur when models are not embedded in a wider causal theory. Economists typically describe their situation as requiring familiarity and expertise with an array of different models that cover a particular topic. The task is then to select the most appropriate model for a particular purpose, using their accumulated judgement. The issue is the relationship of the content of the available models with substantive knowledge that is not incorporated in this "theory". For example, economists working on Lucas' puzzle must surely know how China (and other similar countries) came to be capital exporters; everyone does. Yet the recent literature ignores the obvious, and seeks to explain how the data can be combined with standard theory. For example, one prominent strand is that the capital-export phenomenon is due to the financial weakness of the exporting country (Buera & Shin, 2011; Caballero, Farhi, & Gourinchas, 2008; Gourinchas & Jeanne, 2013; Prasad, Rajan, & Subramanian, 2007; Sandri, 2010)! In such a situation, pointing out that large-scale capital export from China has an obvious explanation may not be accepted as evidence against the theory, because it is already known to the theorist.
Thus, there is a danger that bad theory can be protected by the co-existence of substantive knowledge with "theory" that does not incorporate it. An important instance is the idea that any macro concept, such as that of economic growth, requires "micro-foundations", in the sense of "optimal decision rules of economic agents" (Lucas, 1976). But this would imply that economic phenomena are universal, whereas the evidence shows that there are fundamental dissimilarities between different types of economic system. For example, the difficult task of explaining the observed patterns of growth across time and across countries is made more difficult, possibly intractable, by the insistence on a universal principle of this kind. Modern economic growth was unknown before the industrial revolution, and since that time it has been experienced by some economies but not others; all economists surely know this, even if their grasp of economic history is not profound. But it means that growth cannot be explained in terms of a universal human attribute, or postulated attribute, such as optimisation. The insistence on the need for micro-foundations is held by many economists, but the "news" that growth has had a specific spatial/temporal distribution is not news to them, and therefore it would not be accepted as evidence against the theory.

Correspondence with the spatial and temporal distribution
Differential economic growth is a topic where the approach advocated in this paper promises to be especially fruitful, but where it has been ignored by mainstream economic theory. The spatial and temporal distribution shows extremely large contrasts: the "great divergence" of the nineteenth century and much of the twentieth (Pomeranz, 2000), together with partial convergence in recent decades as catch-up growth gained pace. But why? As Lucas (1988) has said, "The consequences for human welfare involved in questions like these are simply staggering: Once one starts to think about them, it is hard to think about anything else".
The literature on economic growth is vast and heterogeneous, and cannot be summarised here. Some approaches are phrased in universal terms, and pay no attention to the spatial and temporal distribution of capitalist growth. An example is endogenous growth theory (Aghion & Howitt, 1998), which focuses on spillovers that compensate for diminishing returns. The major orthodox theory that does have a historical/spatial dimension treats growth as the result of exogenous technical change. The standard production function approach is augmented by exogenous productivity growth (Lucas, 1988), and complementary to this are historical accounts that describe the inventions, e.g. in the British nineteenth century (Lipsey, Carlaw, & Bekar, 2005; Mokyr, 2002).
There are at least two problems here. First, this is more accurately seen as an account of growth in the leading economies, rather than of growth in general: it does not deal with catch-up growth, a very important phenomenon, especially in recent decades, where the issue is the ability to incorporate technology from outside rather than to create new technology.
Secondly, even as a theory of growth in cutting-edge economies this analysis raises the question: if technical change is exogenous, why do particular societies at particular times have an abundance of the relevant type of invention? Such an account is unable to explain why capitalist economies since the industrial revolution have a unique growth record. As Baumol (2002) noted, "the spectacular and unprecedented growth rates of the industrialized market economies … set them apart from all alternative economic systems … growth rates for about one and a half millennia before the Industrial Revolution are estimated to have been approximately zero … In contrast, in the past 150 years, per capita incomes in a typical free-market economy have risen by amounts ranging from several hundred to several thousand percent!" [emphasis in the original]. And yet his response is to provide a model based on oligopolistic competition and the importance of research and development, features that empirically do not correspond to the spatial and temporal distribution of this growth. Oligopoly was not a feature of the decades during which Britain became the overwhelmingly dominant economy during the nineteenth century, nor of the catch-up growth in recent decades based on textile and clothing manufacture. And research and development is a relatively recent feature of industry in the rich economies, with a distribution quite unlike that of economic growth: there was little research and development in China and the earlier East Asian economies during their periods of spectacular growth, at least in the early decades.
A more realistic approach would start from an examination of the places and times where such a transformation occurred and those where it did not occur, and draw conclusions from the comparative descriptions, possible generalisations and also exceptions, thereby seeking explanations that fit the observations. As far as I am aware, this has rarely been attempted, and is certainly not part of the discourse of either orthodox or heterodox economics. One exception is a paper that seeks to account for these phenomena in terms of an arms race between firms that have specifically capitalist features. It is able to explain why such growth did not occur anywhere in the world before the industrial revolution, and why the Chinese economy grew rapidly after the 1978 reforms (Joffe, 2011).

Theories, models, evidence and causation
Most research in economics nowadays involves empirical work. Good data-sets have become available that cover a large variety of topics, and they are constantly improving and being extended. The same applies to economic history, in which the comparative method is also now strongly established. Methods of analysis have improved too, with the development of sophisticated and effective methods for causal inference and for testing predictions against evidence.

Theories generated from evidence
It is therefore odd to find a great deal of economic reasoning still starting from "standard theory". Whilst it does generate predictions that can be tested empirically, it does not have an empirical foundation, but rather is based on a story about universal human nature. It is really a modelling perspective, methodology and set of models, rather than theory in the sense used in this paper and in much of natural science. In fact, in many sub-disciplines the discourse largely concerns evidence. Yet it remains true that the traditional models retain a central place, and accumulating evidence does not tend to lead to the abandonment of a conventional starting point, even when the two are in conflict.
Rather, there has been an accumulation of "puzzles", as with the Lucas puzzle discussed above. Other "puzzles" are scarcely puzzling at all; rather, they are empirical observations that do not fit with standard theory, e.g. the home bias in trade and in equity portfolios (Obstfeld & Rogoff, 2001).
More recently, in labour economics we have the wage flexibility puzzle (Pissarides, 2009), the unemployment volatility puzzle (Chodorow-Reich & Karabarbounis, 2013) and the unexplained employer size-wage effect (ESWE) (Adamczyk, 2015). They are considered puzzling because they describe empirical findings that are difficult to understand from the viewpoint of the standard models in labour economics, whether based on the assumption of competitive markets, even when frictions are added, or of search and matching (Pissarides, 2000). In labour economics, as in many other sub-disciplines, there is abundant good evidence. This has been used constructively, e.g. in developing various versions of efficiency wage theory to explain important features of the labour market (Fehr, Gächter, & Kirchsteiger, 1996; Krueger & Summers, 1988; Shapiro & Stiglitz, 1984). However, the link from evidence into theory does not operate systematically: "theory" in the sense of standard models remains largely untouched, and is unable to account for the findings that underlie the above-mentioned puzzles.
The link from evidence into theory is similarly not always pursued in behavioural and experimental economics. One approach is to study human behaviour as a departure from homo economicus, explicitly to retain traditional theory as the starting point, and then to modify the analysis by incorporating important biases that deviate from this ideal (Thaler, 2015; especially p. 7). This involves two stages: the traditional assumptions of rationality, perfect information, etc., and then a correction. In terms of causal mechanism, neither of these represents an actually occurring process: the first stage is derived from axioms, not observed behaviour, and the second is a correction of the resulting error. A different approach is to study human behaviour as it is, e.g. truth-telling (Abeler, Nosenzo, & Raymond, 2016) and cooperation and altruism (Rand, Brescoll, Everett, Capraro, & Barcelo, 2016). This has the potential for developing a theory of economic behaviour that is based on the heuristics people actually use, and to link this with an evolutionary account of the causal processes that led to their existence in our brains (Gigerenzer, Hertwig, & Pachur, 2011; Gigerenzer, Todd, & ABC Research Group, 2000).
This second approach corresponds with the argument presented in this paper, that realistic theory can be derived from observations. In addition, the contrast between the two approaches shows that the use of experimentation is not the deciding factor. Rather, it is that evidence, whether experimental or observational, is used in generating new theory.
Thus, different strands of evidence can be combined with each other and with explanatory ideas to generate a broad, empirically informed theory, as is routine in the natural sciences. One principal aim is to identify the causal process(es) involved in generating the phenomena. Development of such a theory typically takes place over time; it is not a one-off invention. The same theory may have core concepts that explain a heterogeneous reality (as with the germ theory applied to different diseases, mutatis mutandis), which is likely to be important in economics. Other features can include that a theory can be complicated if necessary, and that it can be incomplete in the sense of not explaining everything. The operation of a causal factor may also be strongly modified by other causal influences, i.e. multiple causation (see note 4). Models can then be developed, e.g. mathematical models, nested within such a theory, to simulate, quantify or otherwise illuminate particular aspects.
This would be a natural way of developing conceptual categories that correspond to natural categories ("carving nature at its joints"). It also has a strong ontic emphasis, and a focus on causation. These features are widely agreed to provide a good basis for explanation, even among those who argue that other bases are possible, e.g. that explanations can be epistemic (Reiss, 2013). A theory in this sense can also be said to be true or false, or perhaps better, to possess some degree of truth.

The literature on models and theory in the philosophy of economics
Models do not necessarily possess these characteristics. It is controversial whether or not they can be true or false: Hausman (2013) believes that they cannot be, because they are artificial constructions not sentences, whereas Mäki (2013) considers that they can sometimes be true and/or contain true elements. They are typically not phrased in the language of capacities, even when they accurately capture a causal feature of the target system; rather, they tend to be set out as relationships between variables (at least this is true of economics). Models are constructions, and therefore epistemic (Reiss, 2013), whereas empirically based theories are intended to represent major features of the real world and are thus ontic. And, whereas evidence is central to such theories, it does not play an explicit part in model construction, although obviously the modeller typically takes account of the real world when deciding on what the model should contain. The key strength of a model is its simplicity, whereas a theory can be quite an elaborate structure that brings together multiple types of evidence with explanations; theories can also apply to heterogeneous situations, as we saw with the germ theory, and allow for multiple causation.
Among economic methodologists, there appears to be wide agreement that one important feature of a model (possibly the main criterion in judging it) is its resemblance to its target system. Hausman (1992) uses the term "theory" to mean a model plus a theoretical hypothesis, where a theoretical hypothesis is of the form "target system T is of the kind model M defines". Mäki (2013) also emphasises the importance of models resembling their targets (what he calls the "semantic" issue), adding that this needs to be settled before dealing with epistemic concerns.
Others express this specifically in causal terms. Thus, Alexandrova and Northcott (2013) argue that in principle, models can be successful at isolating the capacities that occur in the target system. However, they state that in practice this does not happen in economics (and in some biology). They advocate "good old-fashioned experiment-tested causal hypotheses" and "close empirical study". Similarly, Rol (2013) praises models that capture something important such as a causal mechanism, although his example is a Madrid metro map, where the resemblance is structural rather than causal. Grüne-Yanoff (2013) agrees about the importance of correspondence, adding that the criterion is the stability of the causal factor across different environments, which is an empirical question.
Thus there appears to be wide agreement that models vary in the degree to which they represent their target system, and that causal properties or capacities are an important aspect of this, possibly the most important. The success of economics in achieving this is judged differently by different authors, but such a judgement inevitably needs to be made case by case, so that an overall generalisation is difficult. What does seem to be generally true is that there is no systematic method for ensuring this correspondence within economics.
The classic methodological statement of Friedman (1953) is interesting in this respect. Here, I will confine my remarks to his two biological examples, the distribution of leaves on a tree, and the billiard player. How well do the "as if" explanations resemble their target systems? They do not even attempt to do this in causal terms: "as if" means specifying a causal account that differs from the real-life one.
In the case of leaves on a tree, it is true that it is not trivial to observe that the leaves are distributed "as if" designed to catch the maximum sunlight (see note 5). The outcome is identified correctly. But the suggested "as if" mechanism is design, rather than a causal account that corresponds with real biological processes, whereas a biologist's task is to specify causal processes that explain this distribution. There is a ready-made explanatory framework in the theory of natural selection: trees with a genetic tendency towards the most beneficial leaf arrangement are more likely than others to survive and pass on their genes. The biologist's task is to identify the particular pathways in the general framework, for example, what competitive pressures are acting on the tree, perhaps in terms of climate or of a changing environment; identifying the development process and relevant genes, establishing the biochemical pathways by which they act; and so on. The idea that an "as if" explanation is satisfactory would be difficult to comprehend.
In the case of the billiard player, the focus is again on the outcome: Friedman's stated aim is to predict the shots. Here, the causal account is that Newton's equations describe the situation, and although the player does not actually use them while playing, s/he is learning how to approximate the Newtonian rules by experimentation. Friedman considered that this was a good analogy for what economists do when their equations describe economic behaviour in terms that the economic agent would not recognise. The model describes the behaviour "as if" it were a calculation, while acknowledging that the actual neurophysiological processes do not correspond to the description. Thus, this account does not aim at uncovering the causal process involved, as a biologist would do. For example, a neuro-physiologist would seek to explain how expertise in billiard playing is developed and executed in terms of neuronal pathways and neurotransmitters.
On Friedman's view therefore, causal correspondence with the real world is irrelevant to the economic modeller, and by implication for economic theory as well. To the extent that this has been influential, it may have detracted from the degree to which models in economics represent the actual capacities in the target system. It is true that the substitute mechanism may be better than nothing in some situations (for example, that firms equate marginal cost with marginal revenue, even though they are unaware that this is what they are doing), but Friedman provides no evidence for this, taking it to be true by definition.
One way that a model is connected to (features of) the world is by the story that accompanies the formal part of the model (Morgan, 1999), placing a considerable burden on the adequacy of the story. It may be that the perceived problem that much economic theory is unrealistic results from a poor connection caused by an inadequate story, rather than from internal features of the model itself, such as simplifying assumptions. This "semantic" issue needs to be established first, because if it is seriously deficient then it could be argued that other aspects of the model lose their interest, e.g. the other function that Morgan specifies for a model, to demonstrate outcomes within the model.
What economic theorising needs is a more robust process for ensuring the correspondence of key aspects of each model with the target system, especially its central causal features. One way of doing this would be to emulate the methods of natural sciences such as biology, as outlined above.
What difference would this make? In the case of Lucas' puzzle of 1990, basing theory on an empirical account would quickly solve the problem: the export of capital from China during its years of mega-profitability is no mystery, and the same applies to other such countries.
To take the more complicated example of capitalist growth: a methodology could be adopted that is similar to that of Lindert on the growth of the state. The starting point would be comparative economic history, which clearly describes where and when such growth has occurred and where it has not. This would guarantee that putative explanations would at least fit with this macroscopic evidence. Then we might have several rival models and the problem of under-determination, rather than the current situation where it is hard to find any plausible ideas that correspond to the spatial and temporal observations. A model would also need to be able to account for the magnitude of the transformation as well as its functional form, which has typically been exponential (suggesting a reinforcing feedback process). Another angle is the nature of micro-foundations: the fact that growth of this type is so limited spatially and temporally indicates that any micro-foundations cannot be based on a notion of universal human nature (e.g. optimisation), but rather must rest on a different foundation, e.g. institutions.
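The link between a reinforcing feedback process and an exponential functional form can be illustrated with a minimal sketch. It assumes the simplest possible feedback rule, in which each period's increment is proportional to the current level. The function names, the starting level and the growth rate are all hypothetical choices made for illustration, not estimates from the literature.

```python
import math


def simulate_reinforcing_feedback(initial_level, rate, periods):
    """Iterate level[t+1] = level[t] + rate * level[t] (assumed feedback rule)."""
    levels = [initial_level]
    for _ in range(periods):
        # The increment depends on the current level: this is the feedback.
        levels.append(levels[-1] + rate * levels[-1])
    return levels


def implied_annual_rate(total_percent_rise, years):
    """Constant annual rate that compounds to a given cumulative percentage rise."""
    growth_factor = 1.0 + total_percent_rise / 100.0
    return growth_factor ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Hypothetical inputs: index level 100, 2% growth per period, 150 periods.
    path = simulate_reinforcing_feedback(100.0, 0.02, 150)

    # Exponential form: the log of the level rises by a constant step each period.
    log_steps = [math.log(b) - math.log(a) for a, b in zip(path, path[1:])]
    print("constant log step:", max(log_steps) - min(log_steps) < 1e-9)

    # A 900% cumulative rise over 150 years (an illustrative figure) implies
    # a modest constant annual rate.
    print("implied annual rate:", implied_annual_rate(900.0, 150))
```

The 900% figure is chosen only because it lies within the range of "several hundred to several thousand percent" quoted from Baumol earlier. The sketch shows the arithmetic of compounding; it says nothing about which economic mechanism supplies the feedback, which is the substantive question.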

Conclusion
In the economic methodology literature, models in economics are often compared with models in the natural sciences, typically in physics and mechanics, such as "the idealizations of frictionless plane, perfectly elastic gas molecule, rigid body, planets as mass points, two-body solar system" (Mäki, 2009). This paper has taken a different perspective, which may be useful for a discipline that deals with such a complicated and open-ended reality. The core idea is that "theory" should be regarded as something quite different from a model, a modelling perspective, or a group of models. Reiss (2011) also distinguishes between models and theory in the context of what economics could learn from other disciplines, giving the examples of Newton, Darwin and Einstein (as well as, less convincingly, Marx and Freud). As previously stated, his concept of a theory is "a small number of explanatory hypotheses that can be used over and over again in the explanation of a wide range of phenomena". I agree that this is the aim, and that it is achieved in the theories associated with these great scientists. In this paper, I have tried to describe one of the major ways that the natural sciences, such as biology, achieve reliable knowledge of a reality in which (again in Marshall's words) "the inner nature and constitution, as well as the outer form, are constantly changing". The examples discussed above, e.g. the germ theory of disease and tectonic plate theory, are themselves successful, even if not quite in the league of Newton, Darwin and Einstein. But think about the germ theory in 1850, or the continental drift idea in 1950 before tectonic plate theory; these are closer to the situation in much of economics. The issue is how to get from where we are now to the more desirable situation of having a satisfactory theory.
I suggest that one possible way is to use this approach as an exemplar. It is not the only one, and economists will continue to use a variety of methodological approaches. However, it is probably the one most likely to achieve alignment between theorising and reality.
The key elements are the combination of diverse types of evidence with causal hypotheses. The importance of capacities and of flows that emerged from the natural science examples suggests the need to specify features in economic life that play equivalent roles. These could include the capacities of economic agents (usually described as "human capital") and monetary flows. This is a large topic that will be developed in future research, substantive as well as methodological.

http://dx.doi.org/10.1080/23322039.2017.1280983
on mechanism is concerned with the process that brings about this change: its structure and how it works.
4. In addition, feedback is likely to be important (Joffe, in press), a perspective that allows endogenous causation to be directly addressed. Discussion of this aspect is beyond the scope of the present paper.
5. Methodologically, Friedman's statement is functional: the distribution of leaves fulfils the need of the tree to capture maximal sunlight. The problems of such potentially teleological arguments are well recognised by scientists, and in the philosophy of science. Philosophers of biology have long realised that they can readily be overcome in the biological context, because it is straightforward to make a "translation of talk of functions into terms of talk of adaptations", i.e. a causal one based on differential survival and reproduction (Ruse, 1973); it then also becomes a historical account in real time.
Outside biology such a translation cannot in general be made; a functional statement may not correspond to any real causal processes.