An Introduction to Mechanized Reasoning

Mechanized reasoning uses computers to verify proofs and to help discover new theorems. Computer scientists have applied mechanized reasoning to economic problems but -- to date -- this work has not yet been properly presented in economics journals. We introduce mechanized reasoning to economists in three ways. First, we introduce mechanized reasoning in general, describing both the techniques and their successful applications. Second, we explain how mechanized reasoning has been applied to economic problems, concentrating on the two domains that have attracted the most attention: social choice theory and auction theory. Finally, we present a detailed example of mechanized reasoning in practice by means of a proof of Vickrey's familiar theorem on second-price auctions.


Introduction
Mechanized reasoners automate logical operations, extending the scope of mechanical support for human reasoning beyond numerical computations (such as those carried out by a calculator) and symbolic calculations (such as those carried out by a computer algebra system). Such reasoners may be used to formulate new conjectures, check existing proofs, formally encode knowledge, or even prove new results. The idea of mechanizing reasoning dates back at least to Leibniz (1686), who envisaged a machine which could compute the validity of arguments and the truth of mathematical statements. The development of formal logic from 1850 to 1930, the advent of the computer, and the inception of artificial intelligence (AI) as a research field at the Dartmouth Workshop in 1956 all paved the way for the first mechanized reasoners in the 1950s and 1960s.[1] Since then, mechanized reasoning has been both less and more successful than anticipated. In pure mathematics, mechanized reasoning has helped prove only a few high-profile theorems. Perhaps surprisingly (although consistent with the greater success of applied AI over 'pure' AI), mechanized reasoning and formal methods[2] have enjoyed greater success in industrial applications, as applied to both hardware and software design. In the past decade or so, computer scientists have also begun to apply formal methods to economics.
A central inspiration for this recent work is Geanakoplos' three brief proofs of Arrow's impossibility theorem (Geanakoplos, 2005). Initially, Nipkow (2009), Wiedijk (2007), and Wiedijk (2009) used theorem provers to encode and verify two of Geanakoplos' proofs. A subsequent generation of work, drawing on the inductive proof of Arrow's theorem in Suzumura (2000), used formal methods to discover new theorems. Tang and Lin (2009) introduced a hybrid technique, using computational exhaustion to show that Arrow's theorem holds on a small base case of two agents and three alternatives, and then manual induction to extend that to the full theorem. By inspecting the results of the computational step, they were able to discover a new theorem subsuming Arrow's. Tang and Lin (2011a) used this approach (exhaustively generating and evaluating base cases, and then using a manual induction proof to generalize the results) to establish uniqueness conditions for pure strategy Nash equilibrium payoffs in two-player static games; they published manual proofs of two of the most significant theorems discovered this way in Tang and Lin (2011b). Geist and Endriss (2011) used the approach to generate 84 impossibility theorems in the 'ranking sets of objects' problem (Barberà, Bossert, and Pattanaik, 2004).
[1] Perhaps unsurprisingly, Gardner was ahead of his time in mechanized reasoning as well: four years before his regular columns with Scientific American began, his first article for them included a template allowing readers to make their own mechanized reasoners out of paper.
[2] The term formal methods is used here to denote approaches to establishing the correctness of mathematical statements to a precision such that they can be meticulously checked by a computer. Rather than being seen as distinct from other mathematical methods, researchers in the area see them as the next step in mathematics' march towards greater precision and rigor (Wiedijk, 2008). Consider: "A Mathematical proof is rigorous when it is (or could be) written out in the first-order predicate language L(∈) as a sequence of inferences from the axioms ZFC" (MacLane, 1986). The advantages of taking this next step with computers include: a computer system is never tired or intimidated by authority, does not make hidden assumptions, and can easily be rerun. A pioneer of mechanized reasoning, who saw himself building on Bourbaki's formalism, referred to computers as "slaves which are such persistent plodders" (Wang, 1960).
To date, the economics literature remains almost untouched by research applying mechanized reasoning to economic problems.[3] The one exception that we are aware of is Tang and Lin (2011b), whose two theorems were discovered computationally, but proved manually.[4] As it is our view that these tools will become increasingly capable, this paper aims to introduce economists to mechanized reasoning.[5] It does so by means of three analytical lenses, each with narrower scope but greater magnification than its predecessor.
First, Section 2 presents an overview of mechanized reasoning in general. We do so by setting out a classificatory scheme, with the caveat that it should not be seen as implying a partition on the field: interesting research will straddle boundaries, perhaps even forcing them to be redefined.[6]
Second, Section 3 surveys the emerging literature applying mechanized reasoning to economics. We structure this survey primarily according to the problem domain within economics, referring only secondarily to our classificatory scheme. We do this to focus on the economic insights, primarily within social choice and auction theory, made possible by these techniques, rather than on the techniques per se.
Finally, to make this introduction more concrete, Section 4 provides an example of what mechanized reasoning looks like in practice, presenting a blueprint of a mechanized proof of Vickrey's theorem on second-price auctions. We present such an established theorem to focus attention on its implementation.
Section 5 concludes, and suggests some possible next steps for mechanized reasoning in economics.

Mechanized reasoning
Our overview of mechanized reasoning distinguishes between deductive and inductive systems. While the distinction has been recognized at least since Aristotle, deductive reasoning, which allows reliable inference of unknown facts from established facts, has been the focus of the mechanized reasoning community.
Inductive reasoning also generalizes from individual cases, but does not restrict itself to reliable inferences; the cost of this additional freedom is that its conjectures must then be tested.

Deductive reasoning
Historically, deductive reasoning systems were among the first AI systems, dating back to the 1950s. While the origins of deductive reasoning date to at least Aristotle, modern advances in this area built on the work of logicians in the second half of the 19th century and the start of the 20th (e.g. Whitehead and Russell, 1910). At the Dartmouth Workshop in 1956, Newell and Simon introduced the Logic Theorist, an automated reasoner which re-proved 38 of the first 52 theorems of Chapter 2 of Whitehead and Russell's Principia Mathematica (Whitehead and Russell, 1910).[7]
Abstractly, a deductive reasoner implements a logic (which comprises a syntax defining well-formed formulae and a semantics assigning meaning to formulae) and a calculus for deriving formulae (called theorems) from formulae (called premises or axioms). Historically, subfields of mechanized reasoning have been defined by choice of logic, calculus and problem domain. This section provides a classificatory scheme based, first, on the choice of calculus. Following the choice of calculus, a logic is chosen to balance expressiveness and tractability. Finally, the problem domain itself will dictate some of the specialized features of a mechanized reasoner.
When a mechanized reasoner applies the calculus' permissible operations to the axioms to obtain new, syntactically correct formulae, it does not make use of the semantics: the semantics, or ascribed meanings, yield models that may assist human intuition, but which are not necessary to the formal process of reasoning itself.[8] Crucially, mechanized reasoning involves manipulating symbols.[9] Thus, mechanized deductive reasoning since the Logic Theorist has seen reasoning as a search task for a syntactically well-defined goal.[10] Further, as the space through which the search occurred was potentially large, successful reasoning would use heuristics to avoid unprofitable sequences of operations. From this point of view, mechanized reasoning operates as chess computers do.[11] For a chess computer, the premises' intended semantic interpretations are the board, its pieces and their positions; the calculus specifies permissible moves. A chess computer could then test manually discovered solutions to chess puzzles by verifying that each move satisfies the requirements of its calculus, with the final operation yielding the goal-formula. More ambitiously, and interestingly, chess programs discover solutions (e.g. sequences of winning moves) by searching through permissible operations, with the benefit of heuristics (e.g. regarding relative values of pieces).
A set of premises and a formula may be related in two different ways. First, the semantic consequence relation describes situations in which the formula follows from the premises: if the symbols in the premises are interpreted in such a way that the formulae in the premises are all true, then the formula is also true when the symbols in it are interpreted in the same way. Second, the syntactic derivability relation describes situations in which the formula can be derived from the premises: it is possible to generate the formula from the premises by applying a fixed set of so-called calculus rules. (An example of such a rule is modus ponens: from A and A → B it is possible to derive B, where A and B may match any formal expression.) A proof that applies such rules, without any appeals to intuition or to the reader filling in steps on her own, is called a formal proof of the formula using the premises.
A calculus is called sound if every formula that can be derived from the premises actually follows from them. Deductive reasoning is sound; inductive reasoning, considered below, is not.
A calculus is complete if it allows derivation of any formula that follows from the set of premises. A calculus is decidable if, for any set of premises and any formula, there is a procedure that either derives the formula from the premises or proves that no such derivation exists; a calculus is semi-decidable if a procedure exists that derives the formula from premises, whenever the formula follows from them (but may not terminate if it does not).
Decidability typically depends on the expressiveness of the logic used: more expressive logics model a richer set of concepts, but are generally harder to manipulate. While ambitious exercises in mechanized reasoning often begin by specifying a suitably tailored logic,[12] we largely restrict our attention to some of the best known classical logics.[13]
Propositional (Boolean) logic: Propositional or Boolean logic, the simplest classical logic, only uses propositional variables (which are either true or false) and connectives such as ∧ (and), ∨ (or), ¬ (not), and → (implies). An example of a propositional formula is first_bidder_bids_highest ∧ second_bidder_bids_lowest.
Propositional logic can only make concrete, finite statements, but has a sound, complete and decidable calculus.
An advantage of this decidability is that it may allow push-button technology, which requires no specialist knowledge to use. Once a problem is adequately represented, a corresponding system solves it fully automatically.
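The decidability of propositional logic can be illustrated with a minimal sketch: because a formula has only finitely many propositional variables, semantic consequence can be checked by enumerating every truth assignment. The function name `entails` and the encoding of formulae as Python functions over an assignment are our own illustrative assumptions, not the interface of any particular reasoner.

```python
from itertools import product

def entails(premises, conclusion, variables):
    """Semantic consequence by truth table (hypothetical helper).

    Returns True if every assignment that makes all premises true
    also makes the conclusion true.
    """
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # this assignment is a counter-model
    return True

# A -> B is encoded as (not A) or B.
def implies_a_b(env):
    return (not env["A"]) or env["B"]

# Modus ponens: from A and A -> B, B follows.
modus_ponens_valid = entails(
    [lambda env: env["A"], implies_a_b], lambda env: env["B"], ["A", "B"]
)

# Affirming the consequent: from B and A -> B, A does not follow.
affirming_consequent_valid = entails(
    [lambda env: env["B"], implies_a_b], lambda env: env["A"], ["A", "B"]
)

print(modus_ponens_valid, affirming_consequent_valid)
```

The enumeration always terminates, which is what makes the procedure a decision procedure; its cost, however, doubles with each additional variable.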
First-order logic: First-order logic (FOL) is more expressive. First, it can speak about objects (e.g. "bidder b_1") and their properties (e.g. "bidder b_1 wins auction", bidder(b_1) ∧ wins(b_1)). Second, ∃ and ∀ allow quantification over objects. For example, "every losing bidder pays nothing" may be expressed as
∀i. bidder(i) → (¬wins(i) → pay(i) = 0). (1)
Expressions like wins are called predicates, Boolean functions which, when applied to their arguments, evaluate to either true or false. Gödel's completeness theorem proves that FOL has a sound and complete calculus, but FOL has only semi-decidable calculi. Furthermore, FOL is not expressive enough to express the finitude or (per negation) infinity of the non-empty sets of objects.[14]
Many-sorted FOL uses sorts to extend first-order logic, not to add to its expressiveness, but to allow more concise representations and, therefore, more efficient proving. Sorts restrict the instantiation of variables to expressions of a certain sort. For instance, sorts allow us to specify that variable i is a bidder, and variable x a good. Formula (1) is then more precisely stated as:
∀i : bidder. ¬wins(i) → pay(i) = 0. (2)
Here i (with the sort bidder mentioned only at the first occurrence) can now be instantiated by terms of sort bidder, but not by those of sort good, thus reducing the search space for a proof. Sorted formulae can be translated to unsorted formulae by converting the sorts to unary predicates (which take a single argument).
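Over a concrete, finite set of bidders, the quantified statement "every losing bidder pays nothing" can be evaluated directly. The following is a minimal sketch; the bidder names, payments and the function name `losers_pay_nothing` are hypothetical, chosen only for illustration.

```python
def losers_pay_nothing(bidders, wins, pay):
    """Evaluate: for all i, bidder(i) -> (not wins(i) -> pay(i) = 0).

    The universal quantifier becomes a loop over the finite domain;
    (not wins -> pay = 0) is rewritten as (wins or pay = 0).
    """
    return all(wins[i] or pay[i] == 0 for i in bidders)

# A hypothetical interpretation with three bidders.
bidders = ["b1", "b2", "b3"]
wins = {"b1": True, "b2": False, "b3": False}

honest_pay = {"b1": 7, "b2": 0, "b3": 0}   # only the winner pays
faulty_pay = {"b1": 7, "b2": 3, "b3": 0}   # a loser is charged

print(losers_pay_nothing(bidders, wins, honest_pay))
print(losers_pay_nothing(bidders, wins, faulty_pay))
```

Restricting the loop to `bidders` plays the role that the sort bidder plays in the many-sorted formulation: the variable is never instantiated with objects of another sort, such as goods.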
Higher-order logic: Higher-order logic (HOL) enriches the expressiveness of FOL by extending quantification to predicates and functions. It also allows predicates and functions to take certain[15] other predicates and functions as arguments. For example, bids, b, are both a function from bidders to prices and an argument (along with N, v and A) of a predicate. Against this, HOL's calculi are not decidable, and are (by Gödel's incompleteness theorem) incomplete.
Two common ways in which the classical logics (in particular, FOL) are augmented are, first, by the addition of set theoretical axioms and, second, by the addition of modal operators. The first allows the approximation of higher-order logic while maintaining advantages of first-order logic; the second allows logic to be applied to modalities, such as knowledge, belief, or time.
Set theoretical axioms allow the definition of new symbols and operations on both predicates (e.g. ∈ and ⊆) and functions (e.g. ∪, ∩ and ∅).[16] They also allow the specification of properties of sets (e.g. a X). Adding set theoretical axioms to FOL allows it to weakly simulate HOL: functions can be expressed as relations over X × X that are left-total and right-unique; predicates are expressed as sets. While HOL is still more expressive than FOL augmented by set theory (e.g., FOL cannot express inductive arguments), HOL's incompleteness means that there are true statements that can be expressed in HOL but which may not have finite proofs. As FOL augmented by set theory uses FOL, it remains complete by using FOL's complete calculus.
[15] Unrestricted formula building leads to antinomies, as discovered by Russell. The introduction of types imposes a hierarchy on logical objects, including predicates. This disables circular constructs such as X(Y) := ¬Y(Y), which, when Y is instantiated with X, produces the set of all X for which X ∉ X, Russell's famous antinomy.
[16] Constants such as ∅ are considered as a special case of functions: nullary functions, which do not take any argument.
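The set-theoretic encoding of functions described above, as relations over X × X that are left-total and right-unique, can be made concrete for a small finite set. This is an illustrative sketch only; the helper name `is_function` and the example relations are our own assumptions.

```python
def is_function(relation, domain):
    """Check that a set of (x, y) pairs encodes a function on `domain`.

    left-total:   every x in the domain is related to at least one y;
    right-unique: no x is related to two different y.
    """
    left_total = all(any(a == x for a, _ in relation) for x in domain)
    right_unique = all(
        b1 == b2
        for a1, b1 in relation
        for a2, b2 in relation
        if a1 == a2
    )
    return left_total and right_unique

X = {0, 1, 2}
successor_mod_3 = {(0, 1), (1, 2), (2, 0)}   # a genuine function on X
partial = {(0, 1), (1, 2)}                   # not left-total: 2 has no image
two_valued = {(0, 1), (0, 2), (1, 2), (2, 0)}  # not right-unique: 0 has two images

print(is_function(successor_mod_3, X))
print(is_function(partial, X))
print(is_function(two_valued, X))
```

Both conditions are first-order statements about membership in the relation, which is why the encoding stays within FOL augmented by set theory.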
Modal operators, such as 'next' and 'until', allow the consideration of modes (or states in economic parlance). Linear temporal logic (LTL) is a popular simple modal logic, modelling states in a linear fashion, thus excluding the consideration of multiple possible future states. Kamp's theorem established the equivalence of LTL with a first-order logic. Another first-order approach to modelling states is the situation calculus (McCarthy and Hayes, 1969), which allows expression of states and the temporal development of systems in first-order logic by representing the state as an extra argument of the formulae (e.g., that agent i has £10 in state s_0 can be expressed as has(i, 10, s_0)). By referring to the state absolutely, rather than in relation to other states, the problem can be expressed in standard FOL without recourse to specialized modal relations.
Our final level of distinction is the domain of the problem; this level will allow us to present concrete examples of the preceding.
In Table 1, the decidable logic cell refers to decidable calculi as applied to logical problems. Boolean satisfiability problems (SAT) are among the simplest canonical problems in propositional logic. They specify a (finite) set of statements about a (finite) set of propositional variables, and ask whether there exists an assignment of values (i.e. true and false) to each of those variables that simultaneously satisfies all of the statements.
In SAT problems, clauses of Boolean variables are typically expressed in conjunctive normal form, conjunctions (∧) of disjunctions (∨) such as
(¬p ∨ q) ∧ (p ∨ ¬q), (3)
where p and q are Boolean variables, evaluating either to true or false.[17] Revisiting the example that in auctions the non-winning player pays nothing, formula (1) can be translated for a finite number of bidders (here, three) to a propositional logic formula, stating for each of the three players separately that they win or pay nothing:
(wins1 ∨ ¬pays1) ∧ (wins2 ∨ ¬pays2) ∧ (wins3 ∨ ¬pays3). (4)
Any formula in propositional logic can be expressed in this form, as can any formula in first-order logic when the domain is restricted to a concrete finite domain (such as three bidders in an auction). A SAT solver is used to try to assign the variables such that all of the clauses are true. For instance, assigning wins1 and pays1 to true and the other variables to false shows that the single formula (4) is satisfiable.
SAT problems are NP-hard (Karp, 1972): all known algorithms may require, in the worst case, a number of trials that grows exponentially with the number of variables. Thus, while the logic and calculi involved are simple, SAT problems may not be computable in practice except in small cases. However, techniques have been developed so that SAT solvers are able to solve typical cases very quickly. One application area of SAT solvers is model checking, as described below.
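The worst-case behaviour described above can be seen in a naive solver that simply tries every assignment. The sketch below is not how practical SAT solvers work; it is a brute-force illustration under our own assumptions: a clause is a list of (variable, polarity) pairs, and the three-bidder clauses encode one reading of "each bidder wins or pays nothing", with variable names wins1, pays1 and so on.

```python
from itertools import product

def satisfies(cnf, env):
    """True if every clause contains at least one literal made true by env."""
    return all(
        any(env[name] == polarity for name, polarity in clause)
        for clause in cnf
    )

def solve(cnf):
    """Brute-force search: return a satisfying assignment, or None."""
    variables = sorted({name for clause in cnf for name, _ in clause})
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if satisfies(cnf, env):
            return env
    return None  # all 2**n assignments were tried without success

# (not p or q) and (p or not q)
formula_3 = [[("p", False), ("q", True)], [("p", True), ("q", False)]]

# Assumed encoding: (wins_i or not pays_i) for each of three bidders.
bidders_cnf = [[(f"wins{i}", True), (f"pays{i}", False)] for i in (1, 2, 3)]

# The assignment mentioned in the text: wins1 and pays1 true, the rest false.
text_assignment = {
    "wins1": True, "pays1": True,
    "wins2": False, "pays2": False,
    "wins3": False, "pays3": False,
}

unsatisfiable = [[("p", True)], [("p", False)]]  # p and not p

print(solve(formula_3) is not None)
print(satisfies(bidders_cnf, text_assignment))
print(solve(unsatisfiable))
```

With n variables the loop may visit 2^n assignments, which is why this approach is viable only for small cases; practical solvers prune the search rather than enumerate it.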
Constraint satisfaction problems (CSP) are triples, ⟨V, D, C⟩, where V is a set of variables, D their domain, and C the constraint set. In CSPs, the variables may take on more values than in Boolean satisfiability's binary assignments. For example, an hours variable might take one of twelve values. While apparently richer, CSPs can be reduced to SAT problems by suitable definition of additional auxiliary variables.[18]
The third example of decidable calculi applied to logical problems that we consider is description logics. These are central to automated reasoning about concept hierarchies in classification (or ontological) tasks. One of their most important applications is to the semantic web, which allows computers to extract semantic information from web pages. As a simple example, semantically enabled web searches could recognize that x^2 + y^2 = z^2 and a = √(c^2 − b^2) were both statements of Pythagoras' theorem.[19]
Model checking: Model checking (Clarke, Emerson, and Sistla, 1986; Clarke, Grumberg, and Long, 1994) builds finite models to describe computer hardware systems or simple software systems and then tests their properties. Typical questions include whether certain states of the system can be reached, or whether information is flowing properly through a circuit design.
Such models are typically expressed as finite automata. A finite automaton can model either a finite system or an infinite system if abstraction allows the infinite state space to be simplified to a finite one.[20] Then the model is systematically checked for desired properties, e.g. by using SAT solvers. Viewing digital computer chips as a set of Boolean statements allows them to be modeled as decidable computer systems allowing, in turn, SAT solvers to automatically verify their properties. Since the mid-1990s, Intel has used formal methods to formally prove properties like 'this chip implements the IEEE division standard' following an embarrassing and costly recall of a Pentium chip that was discovered not to properly implement IEEE floating point division (Harrison, 2006). No further such problems have been reported since then.[21]
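The reachability questions that model checking asks can be sketched on a toy finite model. Note that this sketch uses explicit breadth-first enumeration of states rather than the SAT-based checking mentioned above, and the traffic-light controller, its state names and the function names are hypothetical.

```python
from collections import deque

def reachable(initial, successors):
    """Return the set of all states reachable from `initial`."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def successors(state):
    """Hypothetical controller for a crossing: state = (north-south, east-west)."""
    ns, ew = state
    if ns == "red" and ew == "red":
        return [("green", "red"), ("red", "green")]
    if ns == "green":
        return [("red", ew)]
    return [(ns, "red")]  # east-west is green: switch it back to red

def buggy_successors(state):
    """Same controller with a design error: east-west may turn green too early."""
    extra = [("green", "green")] if state == ("green", "red") else []
    return successors(state) + extra

unsafe = ("green", "green")
start = ("red", "red")

print(unsafe in reachable(start, successors))        # safety property holds
print(unsafe in reachable(start, buggy_successors))  # violation is found
```

Because the state space is finite, the search terminates and the answer is definitive in both directions, which is the sense in which such models are decidable.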

Undecidable logic:
The upper right cell in Table 1 refers to the application of undecidable calculi to logical problems. The two types of mechanized reasoning mentioned here, interactive theorem proving (ITP) and automated theorem proving (ATP), have traditionally been equated with theorem proving, but seen as distinct, with the former involving more steering from a human user than the latter. Stereotypically, an ITP system could check an existing proof, while an ATP system could suggest steps in a proof or, in some cases, a whole proof. In practice, the distinction between the two has decreased, with ITP systems implementing ATP procedures.[22]
The traditional identification of theorem proving with work in these areas owes partly to some high-profile successes in pure mathematics, the focus of the most hope in mechanized reasoning's early days. The earliest major success was (as might be expected in an emerging field) not even a clear example of mechanized reasoning: in the 1970s, computers were used to carry out the exhaustive computations required to prove the four-color map theorem (q.v. Appel and Haken, 1977; Appel, Haken, and Koch, 1977). Here, the computers were used to perform simple (algebraic) calculations, rather than to (logically) 'reason'. More recently, mechanized proof checkers have confirmed these results formally (q.v. Gonthier, 2008).[23]
The first major mathematical result to be established by mechanized reasoning, rather than 'mere' calculation, was Robbins' conjecture that two bases for Boolean algebras are equivalent. While appearing to be a beguilingly simple problem, it remained unresolved for 60 years, becoming a favourite of Tarski, who set it as an open problem (q.v. Henkin, Monk, and Tarski, 1971, p. 245). One of the complicating factors of the conjecture was that the only known example of a Robbins algebra was also a Boolean algebra, reducing the evidence base that mathematicians could use to form intuitions about the problem. Nonetheless, in the late 1990s, McCune (1997) was able to pose the problem in a way that allowed EQP, an automated theorem prover related to his well-known Otter prover, to generate (not just check) a 17-step proof, later reduced to eight steps (McCune, 1997).[24]
Perhaps the highest profile success of mechanized reasoning in pure mathematics is the solution to Kepler's conjecture that there is no denser packing of spheres in R^3 than the face-centred cubic. Hales' original proof was 120 pages long (excluding computer code that exceeded 500MB), requiring a team of 12 referees five years to become "99% certain" that it was correct. Unsatisfied with this standard, Hales founded Project Flyspeck to establish a fully formal proof of the conjecture (Hales, 2012). In August 2014, the project was completed (Hales et al., 2015), close to Hales's original estimate of 20 person-years (Avigad and Harrison, 2014).
[20] ... by >, < and =. See Burch et al. (1990) for an application to large, complex microprocessor circuits.
[21] With chip design becoming more and more sophisticated, the reasoning needed for verification has also become more sophisticated. Thus, HOL theorem provers such as HOL-Light are now also used for hardware verification.
[22] Harrison (2007) noted that ITP may be preferred to ATP, as, in working more closely alongside human reasoning, it may be better at developing human understanding.
[23] Gonthier's team has now also formally checked the Feit-Thompson Odd Order Theorem (Gonthier et al., 2013).
More mundanely, ITP has been used to translate existing human proofs into formal proofs that are sufficiently detailed that a computer can mechanically verify them: as of January 2016, 91 of the 'top 100' mathematical theorems on a list maintained by Wiedijk (2014) had been formalized.[25] While most of these are considerably less spectacular than the examples cited above, in which theorem provers have been used to help convince mathematicians as to the validity of major, new results, the gradual accretion of small proof libraries builds a foundation for applying ATPs more widely.
The distinction between high-profile, major theorems and lower-profile bodies of theory has been suggested as a reason that ATP has yet to fulfil its early hopes: Buchberger (2006) noted that human mathematicians typically do not try to prove isolated theorems but explore a whole theory, thereby building up valuable intuition which helps them in proving related theorems. Additionally, Newell (1981) stated that standard theorem proving techniques, while often highly efficient, do not make use of advanced human approaches (as described in Pólya's books) such as simplifying a problem to one they can solve; applying the simplified solution to the original problem may still be very hard, but the intuition gained by solving the simplified problem may help solve the original problem.[26]
Program verification: Table 1's lower right cell corresponds to software engineering's program verification, reasoning about software systems. This can be highly complex in the case of complex programs. Within program verification, traditional proof approaches have sought to prove that the software correctly implements properties specified in the design brief. As such proofs are very costly, full correctness proofs that seek to verify all desired properties of the code are done only for 'mission critical' systems (Vijay D'Silva, 2008).
Some well-known examples of program verification have come from transport and finance: in code controlling automated commuter rail systems, theorems that no two trains occupy the same location at the same time have been proved; within financial transactions software, theorems that transactions do not create or destroy value, but merely transfer it, have also been proved (Woodcock et al., 2009). More recently, a compiler for the C programming language has been formally verified (Boldo et al., 2013). These techniques are becoming more mainstream: in 2013, Facebook acquired Monoidics, a start-up firm applying theorem proving to software code analysis; in 2015, another start-up, Aesthetic Integration, beat 600 competitors to win first prize in UBS' Future of Finance Challenge for its ability to automatically prove failure or compliance in financial algorithms.[27]
Historically, program verification has been conducted as a post mortem: given existing code, program verification determines whether or not it is correct. More recently, code extraction techniques have been developed to generate code that provably implements the desired properties.

Inductive reasoning
As noted above, both inductive and deductive reasoning date back at least to Aristotle, but the former is not sound, while the latter has been the focus of the mechanized reasoning community. The distinction between the two, as well as the utility of each, was expressed by Pólya (1954, p. vi), who referred to deductive reasoning as demonstrative reasoning, and inductive reasoning as plausible reasoning:
"We secure our mathematical knowledge by demonstrative reasoning, but we support our conjectures by plausible reasoning . . . Demonstrative reasoning is safe, beyond controversy, and final. Plausible reasoning is hazardous, controversial, and provisional. . . . In strict reasoning the principal thing is to distinguish a proof from a guess, a valid demonstration from an invalid attempt. In plausible reasoning the principal thing is to distinguish a guess from a guess, a more reasonable guess from a less reasonable guess. . . . [plausible reasoning] is the kind of reasoning on which [a mathematician's] creative work will depend."
Inductive systems seek to derive general statements based on a finite number of statements (e.g. if A_1 is true, and A_2 is true, and so on up to A_N for some finite N, then A_n is true for all natural numbers n).[28] This sort of reasoning is immediately familiar to us when we reflect on how we form conjectures: we expect the sun to rise tomorrow without any understanding of astrophysics; this expectation, though, may lead to the formation of conjectures about astrophysics. However compelling the weight of evidence, inductive reasoning is not sound, as may be demonstrated by single counterexamples. In number theory, Euler's attempted generalization of Fermat's last theorem remained open for two centuries until a computer found a counterexample.[29] In game theory, von Neumann and Morgenstern conjectured that stable sets ('solutions' in their parlance) always existed; it took almost a quarter-century for counterexamples to be found (Lucas, 1968).
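How a single counterexample defeats an inductively formed conjecture can be checked mechanically. The sketch below verifies the counterexample to Euler's conjecture reported by Lander and Parkin (1966), in which four fifth powers sum to a fifth power; the helper name `is_counterexample` is our own, and the code only verifies the known counterexample rather than searching for it.

```python
def is_counterexample(terms, b, k):
    """True if fewer than k non-zero k-th powers sum to b**k.

    Euler's conjecture asserts that a sum of n k-th powers can equal
    a k-th power only if n >= k; any instance with n < k refutes it.
    """
    return (
        len(terms) < k
        and all(a != 0 for a in terms)
        and sum(a**k for a in terms) == b**k
    )

terms = [27, 84, 110, 133]
lhs = sum(a**5 for a in terms)
rhs = 144**5

print(lhs, rhs)
print(is_counterexample(terms, 144, 5))
```

Python's integers have arbitrary precision, so the comparison is exact rather than a floating-point approximation.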
Inductive reasoning may be used for theorem discovery, whereby regularities in observed data are used to form conjectures to test.[30]
Mechanized inductive reasoning dates back to two systems built in the 1970s and 1980s to discover new conjectures, AM (Automated Mathematician) (Lenat, 1976) and Eurisko (Lenat, 1983). These were able to detect conjectures such as the unique prime factorization theorem and Goldbach's conjecture.[31] The systems use certain measures of interestingness for concepts. For instance, concepts that are always true or always false are not interesting. However, if a concept is true for a significant proportion of examples (such as divisibility by only 1 and the number itself) then this is considered as an interesting concept ('primality' for divisibility by only 1 and the number itself).[32]
Lenat's work was continued by Colton in the HR (Hardy-Ramanujan) system (Colton, Bundy, and Walsh, 1999), where more advanced measures for interestingness were developed. For instance,
"The novelty measure of a concept calculates how many times the categorisation produced by the concept has been seen. For example, square numbers categorise integers into two sets: {1, 4, 9, . . .} and {2, 3, 5, . . .}. If this categorisation had been seen often, square numbers would score poorly for novelty, and vice-versa." (Colton, Bundy, and Walsh, 2000).
[28] Inductive reasoning is distinct from mathematical induction, which involves proving A_0 and that A_(n+1) is true given A_n. Mathematical induction is a sound deductive method.
[29] Euler's conjecture states: let n and k be integers greater than one, and let a_1, . . . , a_n and b be non-zero integers; then a_1^k + . . . + a_n^k = b^k implies n ≥ k. The first known counterexample, found by computer, is 27^5 + 84^5 + 110^5 + 133^5 = 144^5 (Lander and Parkin, 1966).
[30] One of the most dynamic subfields of AI currently is machine learning. Some definitions are agnostic as to how the machines learn (e.g. whether deductively or inductively) while, perhaps more typically, others link machine learning more closely to inductive reasoning. Some of the highest profile applications of machine learning are statistical, positing rules that fit the existing data well, rather than perfectly.
[31] The prime factorization theorem states that any positive integer has a unique decomposition as the product of primes. Goldbach's conjecture states that every even integer beyond two can be expressed as the sum of two primes.
Another important advance in Colton's work is that the HR system weeds out simple conjectures, namely those that can be easily verified or falsified by automated theorem provers.[33] One of the successes of HR was that it invented the concept of 'integers with a square number of divisors', which was added to Sloane's Encyclopedia of Integer Sequences.[34]

Mechanized reasoning for economic problems
Over the past decade, computer scientists have become interested in economic problems, often publishing economically novel and interesting results, but almost entirely within the computer science literature. This section reviews that literature, focusing on the applications to social choice and auction theory. We structure this survey primarily according to the problem domain within economics, and only secondarily according to our classificatory scheme, in order to focus on the insights into economic problems made possible by these techniques, rather than the techniques themselves.
Table 2 places the papers reviewed in this section into our original classificatory scheme. This classification is imperfect. For example, Tang and Lin (2009) and Geist and Endriss (2011) both used propositional logic solvers (and, therefore, deductive reasoning), but used them to discover new results, which we have associated, above, with inductive reasoning. Papers like this therefore span historical distinctions.

Social choice has been mechanized reasoning's main point of contact with economics, making it a convenient lens for illustrating mechanized reasoning. Auction theory is, we feel, promising as a new point of contact between mechanized reasoning and economics, due both to the technical parallels between social choice (where mechanized reasoning has proved fruitful) and mechanism design (q.v. Reny (2001)), and to auctions' importance as allocation mechanisms.

Table 2: Some applications of mechanized reasoning to economic problems

32 Dick's case study of the Argonne National Laboratory's AURA system noted that, while "the capacity to identify what was 'promising' or 'interesting' was precisely one of those unautomatable human abilities . . . the Argonne practitioners decided what was important on the basis of extensive experimenting with AURA."
33 See also the introduction of Tang and Lin (2011a) for a brief review of the history of mechanized theorem discovery; a lengthier review is available in Tang (2010).
34 https://oeis.org/

Social choice
Geanakoplos' three brief and distinct proofs of Arrow's impossibility theorem served as the mechanized reasoning community's entrée to economic problems. The theorem states that, for three or more alternatives and a finite set of agents, there is no social choice rule satisfying unanimity (UN), independence of irrelevant alternatives (IIA) and non-dictatorship (ND). Social choice was novel to this community, yet used familiar structures, particularly linear orders, and the three proofs by Geanakoplos (2005) gave the community an opportunity to compare the relative difficulty of encoding those proofs for computers.
One primitive measure of the relative difficulty of formal proofs is to compare their size to that of human proofs.35 Table 3 reports on the relative sizes of Nipkow's proofs in Isabelle, a higher-order logic theorem prover, and Wiedijk's proof36 in Mizar, a set theoretic proof checker which augments first-order logic by the axioms of Tarski-Grothendieck set theory.37 Nipkow (2009) attributed the greater length of the Mizar proofs to Isabelle's "higher level of automation", something to which we return in our Isabelle proof of Vickrey's theorem.
Paper (Geanakoplos, 2005):              1 page;               1 page
Isabelle (Nipkow, 2009):                350 lines (6 pages);  300 lines
Mizar (Wiedijk, 2007; Wiedijk, 2009):   ...
Table 3: ... (Geanakoplos, 2005).

In seeking to formalize the first proof, Nipkow discovered a statement in one of the lemmas that required a 20 line auxiliary proof to properly establish. Further, a relationship between a pivotal voter and a dictator only "hinted at" in the original text required elaboration. Nipkow did not discover any errors in this first proof. Similarly, Wiedijk (2009) reported on missing cases, but no "real errors".
As to the third proof, Nipkow found two instances of omitted material in its central lemma, preventing him from formalizing the proof. Nipkow presented these concerns to Geanakoplos by e-mail; both concerns were resolved in Geanakoplos (2005).38

Both Nipkow's and Wiedijk's proofs were written by the authors themselves, and are therefore examples of ITP. By contrast, Grandi and Endriss (2012) sought, first, to restate Arrow's theory in FOL and, then, to automatically generate a proof for it.39 Expressing Arrow's theory in FOL presented the challenge that quantifying over all possible linear orders of agents' preference profiles appears to be a second-order quantification, as it involves quantifying over agents, alternatives, and the agents' preference profiles. Grandi and Endriss addressed this by adopting the approach taken in Tang and Lin (2009), namely to apply the situation calculus (mentioned in section 2.1) for the representation. Thus, they could present a first-order formalization of the requisite axioms, T_ARROW, allowing them to restate Arrow's theorem as:

Theorem 1 (Arrow à la Grandi and Endriss (2012)). T_ARROW has no finite models.

... became successively more abstract, making the first the most challenging as, generally "abstract mathematics is easier to formalize than concrete mathematics" (Wiedijk, 2009).
37 The advantage of Tarski-Grothendieck set theory over Zermelo-Fraenkel is that the former only requires finitely many axioms to axiomatize sets.
38 Mechanized reasoning can identify omissions by forcing close scrutiny. This, of course, is also possible without mechanical support. For example, in the matching literature, Aygün and Sönmez (2013) identified a hidden assumption in Hatfield and Milgrom (2005), which they view as "widely considered to be one of the most important advances of the last two decades in matching theory", without which many of its results fail to hold. The oversight arose from "an ambiguity in setting the primitives of the model". This ambiguity would likely have been detected by a mechanized reasoner as well.
39 Grandi and Endriss (2012) is also a good guide to related work on formalizing results in social choice.
A model in this sense is an instantiation (or example) of the variables used in the theory. For Arrow's theorem, the variables include N (the set of agents), A (the set of alternatives), the set of the agents' preference profiles, and the set of social welfare functions (SWFs) mapping from such profiles to a social preference. In the two-agent, three-alternative case, that T_ARROW "has no finite models" means that none of the 6^36 possible SWFs satisfy the theory's axioms.40 The theorem claims this property for any finite number of agents, and any finite number of alternatives of at least three.
FOL's completeness allows any property of the system to be explicitly derived. However, the second problem with FOL encountered by Grandi and Endriss is that FOL is unable to express finitude, for the same reason that it cannot express induction: intuitively, HOL defines finitude by considering the complement of the infinite, which it can define by induction on the natural numbers. Thus, formulating Arrow's Theorem in FOL requires a separate formulation for each |N|. Similarly, proofs of Arrow's theorem in FOL may differ for each |N|. Thus, Grandi and Endriss' attempts to use a first-order theorem prover to automatically generate proofs of Arrow's theorem failed outside of minimal cases.41

Independently of Geanakoplos' proofs, Suzumura (2000) had presented an induction proof of Arrow's impossibility theorem for a base case of two agents and |A| alternatives; an induction result then demonstrated its truth in general. This motivated Tang and Lin (2009) to manually derive a second induction result in the number of agents. Proving the impossibility in a two-agent, three-alternative base case would, by their two induction lemmas, cause it to hold in general. They computationally exhausted this base case in two different ways.
First, they expressed the problem as a Boolean SAT problem. Tang and Lin then used the situation calculus, which allows many of the problem's symmetries to be efficiently dealt with by the action of swapping arguments, to reduce the base case to 35,973 variables in 106,354 clauses. These are too many cases to check manually. However, using the SAT solver Chaff2 they could show the inconsistency between the three basic axioms in less than a second on a desktop computer.
Second, Tang and Lin expressed the problem as a CSP, in which V, the set of variables, consists (in their base case) of 36 preference profiles; D, their domain, of six linear orderings for each profile; and C, their constraint set, of the UN and IIA axioms. As the base case implies 6^36 ≈ 10^28 possible SWFs, far too many to be feasibly generated, the authors used the (first-order) logic programming language Prolog to generate all SWFs satisfying the constraints of UN and IIA. Running in less than a second on a desktop computer, their Prolog code generated two SWFs, both of which were dictatorial.
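The two-agent, three-alternative base case is small enough to exhaust in a few lines of Python. The sketch below is not Tang and Lin's Prolog program. It assumes a shortcut: under IIA, society's ranking of each pair of alternatives is a Boolean function of the two agents' rankings of that pair, so only 16^3 candidate rules need checking rather than 6^36 arbitrary functions. All names (`is_swf`, `unanimous`, `dictator`, and so on) are hypothetical.

```python
from itertools import permutations, product

ALTERNATIVES = "abc"
PAIRS = [("a", "b"), ("a", "c"), ("b", "c")]
ORDERS = list(permutations(ALTERNATIVES))      # the 6 strict linear orders
PROFILES = list(product(ORDERS, repeat=2))     # the 36 two-agent profiles

def prefers(order, x, y):
    """True if x is ranked above y in the given linear order."""
    return order.index(x) < order.index(y)

# Bit patterns ([a>b], [a>c], [b>c]) that correspond to a linear order.
TRANSITIVE = {tuple(prefers(o, x, y) for x, y in PAIRS) for o in ORDERS}

# All 16 Boolean functions of the two agents' rankings of a single pair.
INPUTS = list(product([False, True], repeat=2))
PAIR_RULES = [dict(zip(INPUTS, outs)) for outs in product([False, True], repeat=4)]

def is_swf(rules):
    """Check that the pairwise rules yield a linear order at every profile."""
    return all(
        tuple(rule[(prefers(o1, x, y), prefers(o2, x, y))]
              for rule, (x, y) in zip(rules, PAIRS)) in TRANSITIVE
        for o1, o2 in PROFILES
    )

def unanimous(rule):
    """If both agents rank a pair the same way, so does society."""
    return rule[(True, True)] and not rule[(False, False)]

def dictator(rules):
    """Return the index of an agent whom every pairwise rule copies, if any."""
    for agent in (0, 1):
        if all(rule[bits] == bits[agent] for rule in rules for bits in INPUTS):
            return agent
    return None

# Exhaust the IIA rules, then impose unanimity on the survivors.
iia_swfs = [r for r in product(PAIR_RULES, repeat=3) if is_swf(r)]
arrow_swfs = [r for r in iia_swfs if all(unanimous(rule) for rule in r)]

print(len(iia_swfs), "SWFs satisfy IIA")
print(len(arrow_swfs), "also satisfy unanimity; dictators:",
      [dictator(r) for r in arrow_swfs])
```

The design choice here is to encode IIA into the search space instead of treating it as a constraint to be filtered, which is what keeps the exhaustive check tractable without a constraint solver.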
A similar approach yielded the Muller-Satterthwaite theorem, and Sen's Paretian liberal result, among others.42

When implementing the CSP, the authors noticed that imposing even just the IIA constraint reduced the set of SWFs from 6^36 to 94. By inspecting these manually, Tang and Lin (2009) posited a new theorem that implies both Arrow's and Wilson's. Before stating it, note that a social order is inversely dictatorial if it ranks elements in the opposite way to at least one agent; the Kendall tau distance between two orderings is the number of pairs on which they disagree. Then:

Theorem 2 (Tang and Lin (2009)). If a social welfare function W on (N, A) satisfies IIA, then for every subset Y of A such that |Y| = 3, one of the following holds:
1. W_Y is dictatorial.
2. W_Y is inversely dictatorial.
3. The range of W_Y has at most 2 elements, whose [Kendall tau] distance is at most 1.
As an example of an SWF accepted under condition 3 of Theorem 2, consider the function that always prefers the first alternative to the second, always prefers the first to the third, and prefers the second to the third alternative unless both agents prefer the third to the second. This is neither dictatorial nor inversely dictatorial: the agents' preferences for the first item are ignored; there are only two elements in its range (e.g. a ≻ b ≻ c and a ≻ c ≻ b), the distance between which is one.43

As Tang and Lin noted, the third case of their result violates Arrow's original non-imposition axiom, which requires that the SWF be surjective, mapping to every possible value in its codomain.

42 See Geist (2010) for a more complete list.
43 Represent preferences over three objects as a three-digit binary string, the first digit indicating whether a ≻ b, the second whether a ≻ c and the third whether b ≻ c. There are six permissible three-digit strings, 000, 001, 011, 100, 110 and 111, after eliminating the two cyclical ones. IIA then requires that each digit in the social preference is a function of the corresponding digits in the individual preferences alone. The 1-distance condition then allows only one of those digits to vary.
Of the 94 SWFs satisfying IIA, there are 84 of the sort described above, 6 constant SWFs (one for each ordering), two dictatorial functions, and two inversely dictatorial functions.
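The example SWF described above, and the Kendall tau distance used in Theorem 2, can be checked directly. The following is a minimal sketch for two agents and alternatives a, b, c; the function names `kendall_tau` and `example_swf` are illustrative assumptions, not taken from Tang and Lin's code.

```python
from itertools import combinations, permutations, product

ALTERNATIVES = ("a", "b", "c")
ORDERS = list(permutations(ALTERNATIVES))

def kendall_tau(order1, order2):
    """Number of pairs of alternatives that the two orders rank differently."""
    return sum(
        (order1.index(x) < order1.index(y)) != (order2.index(x) < order2.index(y))
        for x, y in combinations(ALTERNATIVES, 2)
    )

def example_swf(order1, order2):
    """a is always top; b beats c unless both agents rank c above b."""
    both_prefer_c = all(o.index("c") < o.index("b") for o in (order1, order2))
    return ("a", "c", "b") if both_prefer_c else ("a", "b", "c")

# Collect the range of the SWF over all 36 two-agent profiles.
social_orders = {example_swf(o1, o2) for o1, o2 in product(ORDERS, repeat=2)}
first, second = sorted(social_orders)
print(sorted(social_orders), "distance:", kendall_tau(first, second))
```

The social ranking of each pair depends only on the agents' rankings of that pair (trivially so for the pairs involving a), so the function satisfies IIA while falling under the third case of the theorem.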
As before, the theorem is established by exhaustive computation on the two-agent, three-alternative base case, and then extended to arbitrary finite domains by the manually-derived induction lemmas. Chatterjee and Sen (2014) observed that, as far as they were aware, this is the "only Arrow-type result in the literature that does not use an axiom other than IIA", an achievement that they believe "could not have been conjectured without computational aid".44

Social choice is replete with characterization and impossibility results. Geist and Endriss (2011) applied the Tang and Lin (2009) approach to the problem of ranking sets of objects (Kannai and Peleg, 1984), for which Barberà, Bossert, and Pattanaik (2004) supplied almost 50 possibly desirable axioms.45

Rather than deriving an induction lemma for every base case of interest, they derived a broadly applicable induction theorem based on model theory's Łoś-Tarski preservation theorem, which describes when properties (ϕ, below) are retained in substructures, namely essentially when the theory can be expressed using universal quantifiers in the form ∀x. ϕ.46 Furthermore, as they wished to distinguish between individual alternatives, sets of preferences, and preference orders, the authors used a many-sorted FOL. Many-sorted FOL also allows relations (including set inclusion or union) to be defined on one domain that do not hold on the other.
Geist and Endriss then encoded 20 axioms drawn from Barberà, Bossert, and Pattanaik (2004) in their many-sorted FOL. As their induction result translated impossibilities generated on small, finite domains to full-blown impossibility results, they took advantage of these concrete, finite base cases to re-write the axioms in propositional logic (using the kind of rewriting that transformed formula (1) to formula (4) in section 2.1). This, in turn, allowed them to use SAT solvers to search for subsets of axioms which generate impossibility results in these base cases; once found, the induction theorem generalized them to full impossibility results. Doing so for all base cases up to sets of eight items yielded 84 impossibility theorems from about one million combinations.47

Their results included known results (e.g. those of Kannai and Peleg (1984) and Barberà and Pattanaik (1984)); variations on known results, typically formed by strengthening axioms to reduce the impossibility's minimal domain; direct consequences of other results (as they did not prune implications of existing impossibilities); a trivial contradiction between the axioms of uncertainty aversion and uncertainty appeal; and, perhaps most interestingly, new theorems. These last resolved an open question in the literature, which we now describe.
Letting ≻ (resp. ⪰) denote strict (resp. weak) preference on individual choice objects (denoted by lower case letters), and ⊲ (resp. ⊴) strict (resp. weak) preference on sets of objects (denoted by capital letters), Bossert, Pattanaik, and Xu (2000) presented a theorem characterizing the min-max ordering in terms of four axioms. The min-max ordering is defined as

A ⊴ B ⇔ min{A} ≻ min{B}, or min{A} ∼ min{B} and max{A} ⪰ max{B},

where min{A} is the minimal element of A with respect to ⪰, max{A} the maximal element, and ∼ denotes equal preference. Thus, a set A is weakly preferred under the min-max ordering to set B iff either the worst element of A is strictly preferred to that of B, or (when the worst elements are equally preferred) the best element of A is weakly preferred to that of B.

The four axioms were:

simple dominance,
x ≻ y ⇒ {x} ⊲ {x, y} ⊲ {y}
for all x and y, so that a set consisting of a strictly preferred object is preferred to a set containing it as well as a strictly less preferred object, which, in turn, is preferred to a set consisting only of that less preferred object.

independence,
A ⊲ B ⇒ A ∪ {x} ⊴ B ∪ {x}
for all A and B and x not contained in A or B. Thus, adding a single object to two sets ranked by strict preference does not reverse that ranking (but it may weaken it).

uncertainty aversion,
x ≻ y ≻ z ⇒ {y} ⊲ {x, z}
for all x, y and z, so that a set consisting only of an intermediately preferred object is strictly preferred to a set consisting of a strictly more favourable and a strictly less favourable object.

simple top monotonicity,
x ≻ y ⇒ {x, z} ⊲ {y, z}
for all x, y and z such that x ≻ z and y ≻ z, so that, if an object is strictly preferred to another, a set containing it and a third object is strictly preferred to a set containing the less preferred object and the third object.

Arlegi (2003) showed that the min-max ordering was, in fact, inconsistent with the independence axiom, and presented an alternative axiomatic basis for it. Geist and Endriss (2011) presented a complementary result to Arlegi's, finding a contradiction between the four original axioms with as few as four choice objects, thus establishing that the original four axioms are inconsistent, so cannot form the basis of any transitive binary relation.
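Arlegi's observation can be reproduced by brute force on four objects. The sketch below is not Geist and Endriss's SAT-based search: it only evaluates the min-max ordering itself against the four axioms as described in words above. It assumes objects are the integers 1 to 4 with larger numbers strictly better, and all function names (including `simple_dominance` for the first axiom) are labels chosen here.

```python
from itertools import combinations

OBJECTS = (1, 2, 3, 4)  # assumption: a larger number is a strictly better object
SETS = [frozenset(c) for size in range(1, len(OBJECTS) + 1)
        for c in combinations(OBJECTS, size)]

def weakly_preferred(A, B):
    """Min-max ordering: compare worst elements first, then best elements."""
    return min(A) > min(B) or (min(A) == min(B) and max(A) >= max(B))

def strictly_preferred(A, B):
    return weakly_preferred(A, B) and not weakly_preferred(B, A)

def simple_dominance():
    # x better than y: {x} beats {x, y}, which beats {y}
    return all(strictly_preferred({x}, {x, y}) and strictly_preferred({x, y}, {y})
               for x in OBJECTS for y in OBJECTS if x > y)

def uncertainty_aversion():
    # x > y > z: the sure middle object beats the gamble {x, z}
    return all(strictly_preferred({y}, {x, z})
               for x in OBJECTS for y in OBJECTS for z in OBJECTS if x > y > z)

def simple_top_monotonicity():
    # x > y > z: improving the top element improves the pair
    return all(strictly_preferred({x, z}, {y, z})
               for x in OBJECTS for y in OBJECTS for z in OBJECTS if x > y > z)

def independence_violations():
    """Triples (A, B, x): A strictly beats B, but A+x is not weakly above B+x."""
    return [(A, B, x) for A in SETS for B in SETS if strictly_preferred(A, B)
            for x in OBJECTS
            if x not in A | B and not weakly_preferred(A | {x}, B | {x})]

violations = independence_violations()
print("other three axioms hold:",
      simple_dominance() and uncertainty_aversion() and simple_top_monotonicity())
print("independence violations found:", len(violations))
```

One violation is A = {3}, B = {2, 4}, x = 1: A is strictly preferred because its worst element is better, but once the common worst element 1 is added the comparison falls to the best elements, where B wins.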
Geist and Endriss (2011) also presented the first impossibility result in this literature not to use any dominance axiom.
In cases of interest, the authors were able to quickly derive manual proofs for the computationally discovered results.48

Finally, the large set of impossibility results allowed the authors to statistically consider the role of the various axioms. For example, the linear order axiom appeared in all theorems; the 'even-numbered extension of equivalence' and reflexivity occurred in none; 'intermediate independence' occurred in all results for seven or eight choice items, but never for fewer than five choice items.

Brandt and Geist (2016) extended the methodology of Geist and Endriss (2011) by performing an initial encoding in HOL, and then deriving implications capable of expression in propositional logic for small base cases. This allowed expression of more properties than was possible in the many-sorted FOL of Geist and Endriss (2011). Thus, Brandt and Geist (2016) could encode a neutrality axiom that Geist and Endriss (2011) could not, but at the cost of generating exponentially many new variables, restricting the size of cases that could be computed.

Auctions
Applications of mechanized reasoning to auction design and implementation are less sophisticated than those to social choice. Nevertheless, given auctions' practical importance, we expect that these will ultimately become more widespread. This section surveys work in two separate areas: applying mechanized reasoning to checking results in auction theory, and checking implementations of auction designs.
On the former, Vickrey's theorem has provided a basic testbed result. Section 4 illustrates in detail our Isabelle implementation. It therefore complements Lange et al. (2013), which compared implementations of Vickrey's theorem in four different mechanized reasoners.
Conceptually, as higher-order logic is sufficient to express all concepts in auction theory, it is not challenging to represent basic results in auction theory using a higher-order logic theorem prover like Isabelle. Doing so in more basic logics is more conceptually challenging, but may offer more promise of automation.
In simpler logics, model checking can automatically establish properties of systems by exhaustively inspecting the system's state space. Tadjouddine, Guerin, and Vasconcelos (2009) used SPIN, a widely used model checker based on linear temporal logic (LTL), to verify Vickrey auctions' strategy-proofness property that bidders cannot do better than to bid their valuations.49 They implemented two techniques to reduce the search space while verifying strategy-proofness for arbitrary bid ranges and numbers of agents: program slicing removed variables irrelevant to the property; abstraction discretized the domain of bids into a three-element domain, depending on whether a bid exceeded, equalled, or was less than an agent's valuation. A manual proof was required to establish the abstraction's soundness. Together, the two simplifications allowed strategy-proofness to be verified for any number of agents in a Vickrey auction in a quarter of a second.
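The flavour of exhaustive state-space inspection can be conveyed by a brute-force check in Python. This is not the SPIN model of Tadjouddine, Guerin, and Vasconcelos, and it uses neither program slicing nor their abstraction; it simply enumerates every bid profile on a small integer grid. The tie-breaking rule (lowest index wins) and all names are assumptions made for this sketch.

```python
from itertools import product

def outcome(bids, pricing):
    """Return (winner, price); the lowest-indexed highest bidder wins ties."""
    winner = max(range(len(bids)), key=lambda i: (bids[i], -i))
    others = [b for j, b in enumerate(bids) if j != winner]
    price = max(others) if pricing == "second" else bids[winner]
    return winner, price

def payoff(i, valuation, bids, pricing):
    """Bidder i's utility: valuation minus price if i wins, zero otherwise."""
    winner, price = outcome(bids, pricing)
    return valuation - price if winner == i else 0

def truthful_bidding_is_weakly_dominant(n_bidders, grid, pricing="second"):
    """Exhaustively compare truthful bids with every deviation on the grid."""
    for i in range(n_bidders):
        for valuation, deviation in product(grid, repeat=2):
            for rest in product(grid, repeat=n_bidders - 1):
                truthful = rest[:i] + (valuation,) + rest[i:]
                deviating = rest[:i] + (deviation,) + rest[i:]
                if (payoff(i, valuation, deviating, pricing)
                        > payoff(i, valuation, truthful, pricing)):
                    return False  # found a profitable deviation
    return True

print(truthful_bidding_is_weakly_dominant(3, range(4), "second"))
print(truthful_bidding_is_weakly_dominant(3, range(4), "first"))
```

Running the same check with first-price payments shows that the test is not vacuous: there, shading one's bid can be profitable. Unlike the abstraction-based verification described above, this enumeration only covers the chosen grid and number of bidders.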
The second branch of applications of mechanized reasoning to auctions has sought to establish properties of auction designs as implemented. This is of interest for at least two reasons: first, even if theoretical properties of an auction are known, errors may be introduced when translating the auction from a design to an operational auction. Second, and more commonly for modern auctions, practice may simply outstrip theory. In both cases, mechanized reasoning can be used to reduce the likelihood that an auction will fail when run.

Caminati et al. (2015) used Isabelle to prove that a combinatorial Vickrey auction is soundly specified, in the sense of guaranteeing that, whatever the bids received as input, the output allocated only the available goods, at non-negative prices, and assigned a unique output to each input. Furthermore, the paper implemented two parallel specifications of the auction, the first close to its standard paper specification, and the second a constructive one. Constructive definitions are essentially algorithmic descriptions. By contrast, definitions in classical logics need only state properties of the defined object. For instance, a classical definition of the maximum of a (non-empty) list of bids identifies an element of the list that is greater than or equal to every other element in the list. A constructive definition would begin by noting that, for a one-element list, the maximum is the single element of the list; for longer lists, it would proceed recursively by computing the maximum of the remainder of the list, and would then return the larger of the two: the initial element, or the maximum of the remaining elements.
Isabelle was used to formally prove the equivalence of the two specifications. While the constructive specification is less intuitive, its algorithmic nature allows Isabelle to automatically generate verified executable code from it.
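The contrast between the two styles of definition for the maximum of a list of bids can be sketched in Python. This is an illustration only, not code generated by Isabelle or taken from Caminati et al.; the names `is_maximum` and `constructive_maximum` are hypothetical.

```python
def is_maximum(bids, candidate):
    """Classical-style specification: a property the maximum must satisfy."""
    return candidate in bids and all(candidate >= b for b in bids)

def constructive_maximum(bids):
    """Constructive definition: recursion on the structure of the list."""
    if not bids:
        raise ValueError("maximum of an empty list is undefined")
    head, tail = bids[0], bids[1:]
    if not tail:
        return head                      # one-element list: the element itself
    tail_max = constructive_maximum(tail)
    return head if head >= tail_max else tail_max

# Spot-check that the computed value meets the specification.
samples = [[5], [3, 9, 4], [7, 7, 2], [0, 1, 2, 10, 10]]
for bids in samples:
    print(bids, "->", constructive_maximum(bids),
          is_maximum(bids, constructive_maximum(bids)))
```

Checking a handful of sample lists is of course much weaker than what the formal development provides: a proof in the theorem prover establishes agreement of the two definitions for every non-empty list.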
Model checking has also been used to examine auctions for evidence of shill bidding. Xu and Cheng (2007) used SPIN to define predicates corresponding to suspicious behaviour, including pushing prices to a reserve price before dropping out, and bidding on the higher priced of two identical goods. The model checker was then used to see whether the predicates were present in a finite dataset of actual bidding behaviour.

Arcos et al. (2005) developed a toolkit to verify properties of multi-agent environments, with a traditional open outcry auction as their leading example. Their toolkit implemented liveness checks to ensure that agents are not blocked (i.e. can bid in every round), that each bidding round can be reached, and that the final bidding round is reachable from any other, as well as correctness of the bidding language (that is, that by following the rules, the system always remains in a defined state). Their toolkit also includes a simulation tool that conducts a 'what-if' analysis by performing a complete check of all cases. While the authors themselves do not refer to what they do as model checking, that is what it most closely resembles.
Finally, Bai, Tadjouddine, and Guo (2014) consider the question of how potential users of online auctions can trust the auctions' protocols. They develop a protocol for specifying auction designs that can be read by Coq, a mechanized reasoner. Future work building on this should eventually allow Coq to verify properties claimed for the auction.

Blueprint of a formal proof of Vickrey's theorem
The preceding has provided an overview of mechanized reasoning, both in general, and as applied to economic problems. This section provides a detailed description of how a mechanized reasoner is used in practice, in this case to verify a formal proof of Vickrey's theorem. We use Vickrey's familiar theorem to focus attention on the formal proof's implementation, rather than the details of the result or proof.
We begin with a standard statement of Vickrey's theorem and proof, in this case from Maskin (2004):

Theorem 3 (Vickrey 1961). In a second-price auction, it is (weakly) dominant for each buyer i to bid its valuation v_i. Furthermore, the auction is efficient.

Proof #1. Suppose that buyer i bids b_i < v_i. The only circumstance in which the outcome for i is changed by its bidding b_i rather than v_i is when the highest bid b by other bidders satisfies v_i > b > b_i. In that event, buyer i loses by bidding b_i (for which its net payoff is 0) but wins by bidding v_i (for which its net payoff is v_i − b). Thus, it is worse off bidding b_i < v_i. By symmetric argument, it can only be worse off bidding b_i > v_i. We conclude that bidding its valuation (truthful bidding) is weakly dominant. Because it is optimal for buyers to bid truthfully and the high bidder wins, the second-price auction is efficient.

However intelligible to humans, Maskin's proof is too stylized for computers: that there is only one circumstance in which changing bids changes the outcome is merely asserted; the "symmetric argument" is not explicitly elaborated. Before formalizing it, we therefore elaborated the paper proof, and restructured it into four cases, rather than the original nine:

Proof #2. Let N be the set of bidders, and suppose bidder i bids b_i = v_i, whatever b_j each other bidder j ≠ i bids. There are two cases:
1. ...
2. i loses. This implies p_i = 0, u_i(b) = 0, and b_i ≤ max_{j∈N\{i}} b_j as, otherwise, i would have won. This yields again two cases for i's alternative bid:
(a) i wins, so that ...
By analogy for all i, b = v supports an equilibrium in weakly dominant strategies. Efficiency is immediate: the highest bidder has the highest valuation.
To formally prove Vickrey's theorem, we used Isabelle, whose higher-order logic allows our formalization to remain close to paper mathematics.
Our proof, Vickrey.thy, is a 9 KB, 185 line file that draws on five ancillary files written for this project.50 All six files amount to 17 KB and 404 lines, much longer than their paper counterparts. A more reliable estimate of the additional effort involved in formal proofs, the de Bruijn factor (Wiedijk, 2012), cleans and compresses files before dividing the size of the code by the size of an informal TeX source. It thus avoids bias from semantically irrelevant differences between formalisations, such as languages or code styles that use different lengths of lines or of identifiers. The de Bruijn factor relating Proof #2 and its definitions (including max) to our Isabelle code is 1.1; as our TeX source is more elaborate than usual, this is lower than the typically observed factors of around four.
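The clean-compress-divide recipe behind the de Bruijn factor can be sketched as follows. This is an approximation under stated assumptions: the cleaning step here only collapses whitespace, zlib stands in for whichever compressor is actually used, and the function names are hypothetical. It does not reproduce the exact procedure of Wiedijk (2012).

```python
import re
import zlib

def clean(source):
    """Collapse runs of whitespace (a stand-in for the real cleaning step)."""
    return re.sub(r"\s+", " ", source).strip()

def compressed_size(source):
    """Size in bytes of the cleaned source after zlib compression."""
    return len(zlib.compress(clean(source).encode("utf-8"), 9))

def de_bruijn_factor(formal_source, informal_source):
    """Ratio of compressed formal size to compressed informal size."""
    return compressed_size(formal_source) / compressed_size(informal_source)

# Hypothetical inputs: in practice these would be read from the .thy and .tex files.
informal = "Suppose bidder i bids truthfully. Then no deviation is profitable."
formal = informal + " " + " ".join(f"have step_{k} by simp" for k in range(50))
print(round(de_bruijn_factor(formal, informal), 2))
```

Compressing before dividing is what makes the ratio insensitive to long identifiers or repeated boilerplate, since repetition compresses away.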
Figure 1 depicts the files used in the proof. Those already in Isabelle's library are marked by ellipses. Dotted ellipses denote files containing general definitions and lemmas that we have added to Isabelle's library. Rectangles denote this paper's auction-specific files. Directed edges denote dependence, with the source code being imported into the target code.

The fixes keyword applies the theorem to any N, v and A of the given types. The type single_good_auction is defined as an input × output relation, with the bidders and their bids as input, and a Boolean allocation vector and a vector of transfers as outcome.51 The valuations type is defined elsewhere to be a vector of real numbers. The assumes keyword on the next line states that the theorem holds under an assumption labeled val, namely that in the vector v of N real numbers, all numbers are non-negative (this is specified elsewhere, in the definition of 'valuations').

Vickrey.thy
Next, the defines declaration equates bids and valuations. The following assumes keyword introduces and labels further assumptions (e.g. A is a second-price auction; N contains more than one bidder). The shows keyword states the theorem: N agents participating in auction A, with valuations v and bids b (equated to valuations) yields an equilibrium in weakly dominant strategies.
SingleGoodAuctionProperties.thy defines the equilibrium concept:

The definition's second line declares the type of the equilibrium_weakly_dominant_strategy to be a (Boolean) predicate whose arguments are a set of participants, a valuation vector, a bid vector, and an auction.52 The definition's body states that the predicate, given arguments N, v, b and A, evaluates to true if and only if the remaining expression does. The expressions in the subsequent line ensure that all arguments have admissible values. Similarly, our first step when introducing whatever_bid is to ensure that it is an admissible bid vector. The whatever_bid(i := b i) notation then takes an arbitrary vector and replaces its ith component with i's bid b i (which the theorem equates to i's valuation).53

We denote the outcome of an arbitrary bid (whatever_bid) by (x, p), while (x′, p′) denotes that of i's original bid and arbitrary bids by agents j ≠ i. To satisfy the definition of an equilibrium in weakly dominant strategies, the outcome (x′, p′) of i's truthful bid must yield a payoff no less than that resulting from an arbitrary bid. The let ... in ... notation54 introduces local abbreviations, which can only be accessed within the in block; here, this makes the expression ((N, b′), (x′, p′)) ∈ A more readable.
The code snippet below formalizes case 2b of Proof #2. It is declarative, resembling a textbook proof. Procedural proofs, by contrast, prescribe tactics to apply, thus more resembling the process humans use to find proofs. In either case, each theorem creates a proof obligation, or a goal; these may be broken into subgoals (e.g. by case distinction); the set of local proof obligations implied by these subgoals is stored on a goal stack.

Proof #3.
10 proof cases
11 assume non_alloc: "x′ i ≠ 1"
12 with spa_pred′ i_range have "x′ i = 0" using spa_allocates_binary by blast
13 with spa_pred′ i_range have loser_payoff: ...

The proof keyword starts the proof. When it is invoked alone, Isabelle automatically selects inference rules to apply. proof - performs manual inference. Alternatively, one can specify existing inference rules:
• proof cases (lines 10 and 26) makes a case distinction; analysis of each case concludes by showing that the desired thesis holds; qed clears the goal stack; next begins the next case.
The proof considers an arbitrary but fixed participant i, which is introduced locally with the fix keyword, and assumed to be in the admissible range N for bidders.55

The have statements establish local facts, generating local proof obligations, which have to be discharged by corresponding proofs. Here, the cases proof establishes that u ... This proof makes use of further facts, omitted to keep the snippet readable: spa_pred and spa_pred′ state that ((N, whatever_bid), (x, p)) and ((N, ?b), (x′, p′)) respectively are in an (input, outcome) relationship of a second price auction with each other.56 ...defined states that a vector with one component per element of the (finite) set N has a well-defined maximum component.
Both from and using introduce facts to discharge the have obligations.The by keyword invokes an automated proof method, instead of discharging proof obligations by explicit declarative means.Isabelle thus combines ATP and ITP methods.
Line 20 supplies a simplification rule of our own, only_max_bidder_wins.
2. blast (lines 12 and 23) "is (in principle) a complete proof procedure for first-order formulas" (Nipkow, 2015). In practice, blast either succeeds, fails, or (giving a practical example of semi-decidability) runs until the user cancels it.
While interactively developing the proof, we employed the try and try0 commands, which apply a range of automated methods, to find the most appropriate proof methods.Automated calls can always be replaced by explicit declarative steps; Isabelle's Sledgehammer tool (Blanchette and Paulson, 2015) can sometimes provide them automatically.
The assume ... then have constructions (lines 17 and 18, and 27 and 28) list assumptions, then state the proof obligations. Line 17's identifier ?thesis refers to the proof obligation at the proof's current level of reasoning.
Lines 22-23's unfolding also performs substitutions, replacing stated concepts' names with the bodies of their definitions. Unlike abbreviations with ?, the latter are semantic definitions, of which the reasoner makes use (e.g. second_price_auction_winner_def is restated in terms of i ∈ N, i ∈ arg max b, ...).
Lines 29-32's have ... also have ... finally show construction allows chains of reasoning with equality before discharging a proof obligation: the "..." following the also have is replaced by the right hand side of the previous have statement. In line 31, this establishes that i receives zero given valuation v_i and either (x, p), or (x′, p′).

Discussion
The decade since the mechanized reasoning community became interested in economic applications has seen rapid progress.When Nipkow reported on his formalization of Arrow's theorem, he agreed that "[s]ocial choice theory turns out to be perfectly suitable for mechanical theorem proving", but felt that it was "unclear if [it] will lead to new insights into either social choice theory or theorem proving" (Nipkow, 2009).However, that very year Tang and Lin (2009) used mechanized reasoning to discover a new theorem that subsumes Arrow's, which Chatterjee  and Sen (2014) believed to be novel, and unlikely to have been found with traditional methods.Shortly thereafter, Geist and Endriss (2011) contributed their 84 impossibility theorems.
If mechanized reasoning is to make further inroads into economics it must be sensitive to a number of concerns.First, economics has no proofs of comparable complexity or length to significant results in modern mathematics.Thus, the question of whether a proof will exceed the capability of human theorists to verify is less of a concern than in mathematics.Further, it is unclear that there have been any disastrous cases of mistaken proofs within economics; instead, our greater errors likely result from poor modelling in the first place, and coding or data errors in econometrics.
Second, even when mechanized reasoners have helped identify new results, economic theorists may dismiss them as unmotivated, non-transparent or lacking insight. 57 Even in the worst case, however, we believe that a stock of poorly motivated, non-transparent theorems generated blindly by computer provides cases for us to think about and reason with: the presence of the intermediate independence axiom in all of the larger impossibility theorems found by Geist and Endriss (2011) should provide precisely the sort of hunch that sets us sharpening our pencils.
We close by suggesting some further possible applications of mechanized reasoning to economic problems.
First, there are open problems in auction theory that seem amenable to solution by computation (rather than 'reasoning'). For example, the simplest formulation of optimal multi-object auctions (q.v. Armstrong, 2000) defines a linear programming problem that quickly becomes too large to solve manually as the number of items increases. 58 As efficient algorithms exist for solving linear programming problems, automated mechanism design (q.v. Conitzer and Sandholm, 2003) has already begun to address the purely computational aspects of optimal mechanism design. As formal methods can be used to verify the results of computations (q.v. Gonthier, 2008; Hales et al., 2015), proofs in automated mechanism design could also be verified by formal methods.
Second, we believe that the exhaust-then-induct technique pioneered by Tang and Lin (2009), and developed by Geist and Endriss (2011), offers the promise of automating search for theorems in other areas of economic theory. The formal similarities between social choice and matching theory, including a reliance on discrete objects, suggest that this technique could be applied directly to the latter. Although auction theory appears richer in its use of continuous objects (prices), there is a small literature establishing results by induction (Chew and Serizawa, 2007; Morimoto and Serizawa, 2015; Adachi, 2014; Kato, Ohseto, and Tamura, 2015); the possibility of coupling their induction steps with computational exhaustion has not been explored.
However these tools are applied within economics, it is hard to imagine them not becoming more important, as the tools themselves become faster and easier to use, as they gain acceptance within the pure mathematics community, and as the mechanized reasoning community seeks more applications for them.

Figure 1: High level theory graph for the formal proof of Vickrey's theorem

Table 3: Relative lengths of human and machine proofs of Arrow's theorem
1100 lines
Note: Nipkow's formalization attempts began with Geanakoplos (2001), a working paper that preceded the published version.