Safe functional systems through integrity types and verified assembly



Introduction
Embedded devices are ubiquitous, with many now playing roles that support human health, well-being, and safety. The critical nature of these systems (automotive, medical, cryptographic, avionic) is at odds with the increasing complexity of embedded software overall: even simple devices can easily include an HTTP server for monitoring purposes. Traditional …
1. The functional ISA, "Zarf," is devoid of all global or mutable state, and provides a compact, complete, and mathematical semantics for the behavior of instructions;
2. The imperative ISA is strictly separated from the functional ISA, connected only via a communication channel through which the system components can pass values;
3. The subset of the application that operates on Zarf can be verified and reasoned about without regard to the operation of the imperative components, meaning that only the critical components need to be ported and modeled;
4. Reasoning on the functional ISA is provably composable, i.e., two separate pieces can be statically shown to never interfere with each other.
To demonstrate the usefulness of this platform, we develop, model, and test a sample application that implements an Implantable Cardioverter-Defibrillator (ICD): an embedded medical device that is implanted in a patient's chest cavity, monitors the heart, and administers shocks under certain conditions to prevent or counter cardiac arrest. Though ICDs provide life-saving treatment for patients with serious arrhythmia, these devices, along with other embedded medical devices, have seen thousands of recalls due to dangerous software bugs [2,3]. By leveraging this two-layer approach, we are able to formally verify the correctness of a low-level implementation of the core functions in Coq and directly extract executable assembly code without needing software runtimes. The ISA semantics allow us to construct an integrity type system and formally prove that the rest of the code never corrupts the inputs or outputs of the critical functions. Furthermore, the functional abstraction built into the binary code allows us to bound worst-case execution time, even in the face of garbage collection. Taken together, we have an embedded medical application whose core components have been proven correct, where non-interference is guaranteed, where real-time deadlines are assured to be met, and where C code can execute arbitrary auxiliary functions in parallel for monitoring. The high-level system architecture is shown in Fig. 1.
Given the large body of related work in verification and ISA design, we begin by summarizing how our work differs from previous efforts in verification and architecture (Section 2). We then describe the Zarf platform in more detail, along with a hardware implementation that runs the application on an FPGA (Section 3). Details of our embedded ICD software application and the ways it can leverage the properties of the Zarf platform are described next (Section 4), followed by a precise definition of Zarf's semantics (Section 5). We then discuss the verification of multiple properties of the critical sub-components of the ICD, covering correctness, timing, and non-interference (Section 6). Finally, we evaluate this system architecture and approach, presenting the hardware resource requirements of the novel ISA and examining the performance loss of the verified components compared to an unverified C alternative (Section 7), and conclude (Section 8).

Fig. 1. High-level Zarf system architecture: by dividing the system into two hardware realms, one that provides a precise, mathematical semantics for reasoning about program behavior and the other a standard imperative core for legacy software, we can formally verify and otherwise reason about critical subsets of applications without needing to model and verify the entire program.
In addition to proofs on machine code for existing machines, it is also possible to define new assembly abstractions that carry useful information. Typed assembly as an intermediate representation was previously identified as a method for Proof-Carrying Code [23], where machine-checked proofs guarantee properties of a program [24]. Typed assemblies and intermediate representations have seen extensive use in the verification community [25,14,26,27] and have been extended with dependent types [28], allowing for more expressive programs and proofs at the assembly level.
Verified compilers are a popular topic in the verification community [29][30][31][32], the most well-known example being CompCert [33], a verified C compiler. Verified compilers are usually equipped with a proof of semantics preservation, demonstrating that for every output program, the semantics match those of the corresponding input program. A verified compiler, however, neither provides tools for nor simplifies program-specific reasoning: one needs a secondary tool-chain for reasoning about source programs, such as the Verified Software Toolchain (VST) [34] for CompCert. These frameworks often come at a high cost, mandating the use of sophisticated program logics, such as higher-order separation logic in VST, in order to fully reason about possible program behaviors.
Further, in many systems, not all source code may be available; without the ability to reason about binary programs, guarantees made on a piece of the source program (and preserved by the verified compiler) may be violated by other components. Extensions to support combining the output of verified compilers, such as separate compilation and linking, are still an active research area [35,36]. As work on verified compilers requires a semantic model of the ISA, it is complemented by our work, which gives complete and formal semantics for an ISA.
Previous work at the intersection of verification and biological systems has attempted to improve device reliability through modeling efforts. This includes work that formulates real-time automata models of the heart for device testing [37], formal models of pacing systems in Z notation [38], quantitative and automated checking of the interaction of heart and pacemaker automata to verify pacemaker properties [39], and semi-formal verification that combines platform-dependent and platform-independent model checking to exhaustively check the state space of an embedded system [40]. Our work is complemented by verification efforts such as these, which refine device specifications by taking into account device-environment interactions.

Architecture
The SECD Machine [41] is an abstract machine for evaluating arithmetic expressions based on the lambda calculus, designed in 1963 as a target for functional language compilers. It describes the concept of "state" (consisting of a Stack, Environment, Control, and Dump) and transitions between states during that evaluation. Interpreters for SECD run on standard, imperative hardware. Hardware implementations of the SECD Machine have been produced [42]; they explore the implementation of SECD at the RTL and transistor level but present the same high-level interface. The SECD hardware provides an abstract-machine semantics, indicating how the machine state changes with each instruction. Our verification layer makes machine components fully transparent, presenting a higher-level small-step operational semantics, where instructions affect an abstract environment, and a big-step semantics, which immediately reduces each operation to a value. These latter two versions of the semantics are more compact, precise, and useful for typical program-level reasoning.
The SKI Reduction Machine [43] was a hardware platform whose machine code was specially designed to do reductions on simple combinators, this being the basis of computation. Like our verification layer, it was garbage-collected and its language was purely applicative. The goal was to create a machine with a fast, simple, and complete ISA. The choice to use the "simpler" SKI model means that machine instructions are a step removed from the typically function-based, mathematical methods of reasoning about programs. Our functional ISA, while also simple and complete, chooses somewhat more robust instructions based on function application; though the implementation is more complicated, modern hardware resources can easily handle the resulting state machine, giving a simple ISA that is sufficiently high-level for program reasoning.
The most famous work on hardware support for functional programming was on Lisp Machines [44][45][46]. Lisp machines provided a specialized instruction set and data format to efficiently implement the most common list operations used in functional programming. For example, Knight [46] describes a machine with instructions for Lisp primitives such as CAR and CADR, and also for complex operations like CALL and MOVE. While these machines partially inspired this work, Lisp Machines are not directly applicable to the problem at hand. Side-effects on global state at the ISA level are critical to the operation of these machines, and while fast function calls are supported, the stepwise register-memory-update model common to more traditional ISAs is still a foundation of these Lisp Machine ISAs. In fact, several commercial Lisp Machine efforts attempted to capitalize on this fact by building Lisp Machines as a thin translation layer on top of other processors.
Flicker also dealt with architectural support for a smaller TCB in the presence of untrusted, imperative code, but did so with architectural extensions that could create small, independent, trusted bubbles within untrusted code [47]. Our architecture is almost inverted, with a trusted region providing the main control, calling out to an untrusted core as needed. Previous works such as NoHype [48] dealt with raising the level of abstraction of the ISA and factoring software responsibilities into the hardware. Our verification layer shares some of these characteristics, but deals with verification instead of virtualization, as well as being a complete, self-contained, functional ISA.
Previous work has explored the security vulnerabilities present in many embedded medical devices, as well as zero-power defenses against them [49][50][51]. The focus of our work is analysis and correctness properties, and we do not deal with security.

Hardware architecture and ISA
Our system relies on two separate layers, running two different ISAs, connected only by a data channel. This allows one of the layers to be specialized to the execution of machine code with 1) a compact, precise, and complete semantics highly amenable to proofs, and 2) the ability to compose verified pieces safely. It is entirely possible for all code in the system to be written in a purely functional style and run on Zarf: the ISA for this layer is complete. However, embedded devices often contain a mix of software, including legacy code or nice-to-have features that do not affect the application's behavior, such as relaying data and diagnostic information to outside receivers. With a two-layer approach, we can run imperative code that is orthogonal to the operation of critical application components while still connecting with the vetted, functional code in a structured way. This, in turn, allows code to be formally verified piecemeal, with functions "raised" into Zarf as deemed necessary.
The following subsections describe the interface and construction of Zarf, including the reasons we take an approach much closer to the lambda calculus underlying most software proof techniques, how we capture this style of execution in an instruction set, the semantics for that instruction set, and more practical considerations such as I/O, errors, and ALU functions.

Design goals
Normal, imperative architectures have been difficult to model, and the task of composing verified components is still an open problem [35,36]. We identify the following features as undesirable and counterproductive to the goal of assembly-level verification:
1. Large amounts of global machine state (memory, stack, registers, etc.) directly accessible to instructions, all of which must be modeled and managed in every proof, and which inhibit modularity: state may be modified by code you haven't seen.
2. The mutable nature of machine state, which prevents abstraction and composition when reasoning about functions or sets of instructions.
3. A large number of instructions and features: a complete model must incorporate all of them (e.g., fully modeling the behavior of the ARMv7 required 6,500 lines of HOL4 [19]).
4. Arbitrary control flow, which often requires complex and approximate analyses to soundly determine possible control flows [52].
5. Unenforced function call conventions, meaning one must prove that every function respects the convention.
6. Implicit instruction semantics, such as exceptions, where "jump" becomes "jump and update registers on certain conditions."

To avoid these traits, we design an interface that is small, explicit in all arguments, and completely free of state manipulation and side effects, with the exception of I/O, which is necessary for programs to be useful. Without explicit state to reference (memory and registers), standard imperative operations become impossible, and we must raise the level of abstraction. Instead of imperative instructions acting as the building blocks of a program, our basic unit is the function. This is a major departure from a typical imperative assembly, where the notion of a "function" is a higher-level construct consisting of a label, control flow operations, and a calling convention enforced by the compiler, but which has no definition in the machine itself.
By bringing the definition of functions to the ISA level, they become not just callable "methods" that serve to separate out independent routines, but are actually strict functions in the mathematical sense: they have no side effects, never mutate state, and simply map inputs to outputs. This change allows us to attach precise and formal semantics to the ISA operations.

Description and semantics
Zarf's functional ISA is effectively an a) untyped, b) lambda-lifted, c) administrative normal form (ANF) lambda calculus. Those limitations are a result of the implementation being done in real hardware: a) to avoid the complexity of a hardware typechecker, the assembly is untyped²; b) because every function must live somewhere in the global instruction memory, only top-level declarations of functions are allowed (lambda-lifted); c) because the instruction words are fixed-width with a static number of operands, nested expressions are not allowed and every sub-expression must be bound to its own variable (ANF). The abstract syntax of Zarf assembly is given in Fig. 2.
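To make the ANF restriction concrete, the following minimal sketch, written in Python purely for illustration, flattens a nested call into a chain of let bindings that each take only atomic arguments. The tuple encoding of expressions, the temporary names t0, t1, …, and the function to_anf are hypothetical; this shows the general A-normalization idea, not the authors' compiler or Coq extraction.

```python
import itertools


def to_anf(expr):
    """Flatten a nested call into Zarf-style `let` lines plus a final `result`.

    An expression is either an atom (a variable name or an integer) or a
    tuple (function_name, arg1, arg2, ...). Every non-atomic sub-expression
    is bound to a fresh temporary, so each `let` has only atomic arguments.
    """
    fresh = itertools.count()
    lines = []

    def flatten(e):
        if not isinstance(e, tuple):
            return e  # already atomic: a variable or a literal
        fn, *args = e
        atoms = [flatten(a) for a in args]  # inner calls are bound first
        tmp = f"t{next(fresh)}"
        lines.append(f"let {tmp} = {' '.join([fn, *map(str, atoms)])} in")
        return tmp

    final = flatten(expr)
    lines.append(f"result {final}")
    return lines


for line in to_anf(("f", ("g", "x"), 1)):
    print(line)
```

For `f (g x) 1` this prints `let t0 = g x in`, `let t1 = f t0 1 in`, and `result t1`: the inner call is bound first, so the outer `let` only ever sees variables and literals, matching the fixed-operand instruction format.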
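All words in the machine are 32 bits. Each binary program starts with a magic word, a word-length integer N stating how many functions are contained in the program, and then a sequence of N functions. Each function starts with an informational word that lets the machine know the "fingerprint" of the function (including the number of arguments expected and how many locals will be used) and a word-length integer M to specify that the body of the function is M words long.

$$
\begin{array}{rcl}
x &\in& \mathit{Variable} \\
n &\in& \mathbb{Z} \\
\mathit{fn}, \mathit{cn} &\in& \mathit{Name} \\
p \in \mathit{Program} &::=& \overrightarrow{\mathit{decl}}\ \ \mathbf{fun}\ \mathit{main} = e \\
\mathit{decl} \in \mathit{Declaration} &::=& \mathit{cons} \mid \mathit{func} \\
\mathit{cons} \in \mathit{Constructor} &::=& \mathbf{con}\ \mathit{cn}\ \overrightarrow{x} \\
\mathit{func} \in \mathit{Function} &::=& \mathbf{fun}\ \mathit{fn}\ \overrightarrow{x} = e \\
e \in \mathit{Expression} &::=& \mathit{let} \mid \mathit{case} \mid \mathit{res} \\
\mathit{let} \in \mathit{Let} &::=& \mathbf{let}\ x = \mathit{id}\ \overrightarrow{\mathit{arg}}\ \mathbf{in}\ e \\
\mathit{case} \in \mathit{Case} &::=& \mathbf{case}\ \mathit{arg}\ \mathbf{of}\ \overrightarrow{\mathit{br}}\ \mathbf{else}\ e \\
\mathit{res} \in \mathit{Result} &::=& \mathbf{result}\ \mathit{arg} \\
\mathit{br} \in \mathit{Branch} &::=& \mathit{cn}\ \overrightarrow{x} \Rightarrow e \mid n \Rightarrow e \\
\mathit{id} \in \mathit{Identifier} &::=& x \mid \mathit{fn} \mid \mathit{cn} \mid \mathit{op} \\
\mathit{arg} \in \mathit{Argument} &::=& n \mid x \\
\mathit{op} \in \mathit{PrimOp} &::=& \ldots
\end{array}
$$

Fig. 2. … both control flow and deconstruction of constructor forms. An arrow over any metavariable (e.g., $\overrightarrow{x}$) signifies a list of zero or more elements. $\mathit{op}$ refers to a function that is implemented in hardware (such as ALU operations); though executing it invokes a hardware unit instead of a piece of software, its functional interface is identical to that of program-defined functions.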
The remaining M words of the function are then composed entirely of the individual instructions of the machine. Each function, as it is loaded, is given a unique and sequential identifier. These function identifiers are the only globally visible state in the system and serve as both a kind of name and a kind of pointer back to the code. Other functions can refer to, test, and apply arguments to function identifiers. There are two varieties of function identifiers: those that refer to full functions that contain a body of code, and "constructors," which have no body at all. Constructors are essentially stub functions and cannot be executed. However, just like other functions, you can apply arguments to them. These special function identifiers thus can serve as a "name" for software data types, where arguments are the composed data elements. (In more formal terms, you can use our constructors to implement algebraic data types.) The words defining the body of a function are built out of just three instructions: let, case, and result, which we will describe below. Unlike RISC instructions, let and case can be multiple words long (depending on the number of arguments and branches, respectively). However, unlike most CISC instructions, each piece of the variable length instruction is also word-aligned and trivial to decode.
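The layout above (magic word, function count, then a header and body per function) can be walked by a very small loader. The sketch below is a Python illustration under stated assumptions: `MAGIC` is a hypothetical placeholder because the real magic value is not given, the fingerprint word is kept opaque because its bit layout is not described here, and every entry is treated alike because the text does not say how constructors are distinguished in the binary. Identifiers start at 0x100, following the prototype's reservation of lower indices for hardware operations.

```python
from dataclasses import dataclass

MAGIC = 0x5A415246        # hypothetical magic word ("ZARF" in ASCII); the real value is not given
FIRST_PROGRAM_ID = 0x100  # identifiers below 0x100 are reserved for hardware operations


@dataclass
class LoadedFunction:
    ident: int        # unique, sequential identifier assigned at load time
    fingerprint: int  # informational word (argument/local counts); bit layout kept opaque
    body: list        # the M instruction words of the function body


def load_program(words):
    """Split a flat list of 32-bit words laid out as
    magic, N, then N x (fingerprint, M, <M body words>)."""
    if len(words) < 2 or words[0] != MAGIC:
        raise ValueError("missing magic word or function count")
    count, pos, functions = words[1], 2, []
    for i in range(count):
        if pos + 2 > len(words):
            raise ValueError(f"function {i} has no header")
        fingerprint, length = words[pos], words[pos + 1]
        body = words[pos + 2 : pos + 2 + length]
        if len(body) != length:
            raise ValueError(f"function {i} is truncated")
        functions.append(LoadedFunction(FIRST_PROGRAM_ID + i, fingerprint, body))
        pos += 2 + length
    if pos != len(words):
        raise ValueError("trailing words after the last function")
    return functions
```

For example, `load_program([MAGIC, 2, 0xAAAA, 1, 0x1, 0xBBBB, 2, 0x2, 0x3])` yields two functions with identifiers 0x100 and 0x101, mirroring how the sequential identifier doubles as a name and a pointer back to the code.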
Zarf has no programmer-visible registers or memory addresses, but instructions still need to reference particular data elements. Instructions refer to data by source and index, where the source is one of a predefined set (e.g., local and arg, which serve a purpose similar to the stack on a traditional machine). The local and arg indices can be thought of as analogous to stack offsets, while the actual addresses themselves are never visible.
The primary ways of generating Zarf assembly are extraction from Coq and writing it by hand. We also have a Haskell compiler that supports a subset of basic Haskell constructs. In our experience, Zarf assembly code resembles a typical functional programming language like desugared Haskell or OCaml, and the resulting expressiveness makes directly writing assembly relatively easy; the user does not need to worry about memory address calculations, maintaining register or stack state across function calls, or the myriad other things that make programming traditional ISAs tedious and error-prone. For more information on automatic Coq extraction, see our discussion of the ICD implementation in Section 6. Fig. 5 gives the complete ISA behavior using a big-step semantics, which explains how each instruction reduces to a value. This semantics uses eager evaluation for simplicity; though the current hardware implementation uses lazy semantics, the difference is not observable in our application because I/O interactions are localized to a specific function and always evaluated immediately. The semantics use assembly keywords for readability; Fig. 3 shows how the assembly maps one-to-one with the binary encoding, and Fig. 7 shows how low-level Coq code can be directly converted to our assembly.

Instruction set
The let instruction applies a function to arguments and binds the resulting application to a local identifier. The first word in the let instruction indicates a function identifier or closure object and the number of argument words that follow. Note that, unlike a function "call," let does not immediately change the control flow or force evaluation of arguments; rather, it creates a new structure in memory (a closure) tying the code (function identifier) to the data (arguments), which can be evaluated when it is finally needed (lazy evaluation semantics). Additionally, the let instruction allows partial application, meaning that new functions (but not function identifiers) can be dynamically produced by applying a function identifier to some, but not all, of its arguments.

The case instruction provides pattern-matching for control flow. It takes a value, then makes a set of equality comparisons, one for each "pattern" provided. The first word of the case instruction indicates a piece of data to evaluate. As we need an actual value, this is the point in execution that forces evaluation of structures created with let; however, the data is evaluated only enough to get a value with which comparisons can be made, specifically until it yields either an integer or a constructor object.³ The words following the instruction encode patterns (pattern_literal and pattern_cons) against which to match the case value. If the case value exactly equals the literal value or function (i.e., constructor) identifier, execution proceeds with the next instruction; otherwise, it skips the number of words specified in the pattern argument. Every case must end with a pattern_else, which is executed when no other pattern matches and which marks the end of the case instruction encoding. Case/pattern sequences that do not adhere to this encoding are malformed and invalid; e.g., you cannot skip to the middle of a branch, or have a case without an else branch.
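The pattern-skip control flow of case can be sketched as follows, in Python for illustration only. The sketch models decoded pattern words as tuples rather than bit fields, and it assumes that the skip count in each pattern covers exactly the words of that pattern's branch body; the names `select_branch` and `Con`, and the constructor identifier 0x101, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Con:
    name: int           # constructor identifier
    fields: tuple = ()  # constructor arguments (not inspected by the match itself)


def select_branch(words, pc, value):
    """Return the index of the first instruction of the branch that matches
    `value`, starting at the first pattern word `words[pc]`.

    Pattern words are ("lit", n, skip), ("cons", ident, skip), or ("else",).
    On a mismatch we jump over the pattern word plus `skip` branch words.
    """
    while True:
        word = words[pc]
        kind = word[0]
        if kind == "else":
            return pc + 1  # the else branch always matches and ends the case
        _, expected, skip = word
        if kind == "lit" and isinstance(value, int) and value == expected:
            return pc + 1
        if kind == "cons" and isinstance(value, Con) and value.name == expected:
            return pc + 1
        pc += 1 + skip


# case v of 0 => result 10 | <constructor 0x101> => result 20 | else => result 99
body = [
    ("lit", 0, 1), "result 10",
    ("cons", 0x101, 1), "result 20",
    ("else",), "result 99",
]
```

Because every branch ends in its own result and never re-converges, a forward skip over a known number of words is all the matcher ever needs.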
The result instruction is a single word, indicating a single piece of data that the current function should yield. Every branch of every function must terminate with a result instruction (disallowing re-convergent branches means the simple pattern-skip mechanism is all that is necessary for control flow). Functions that do not produce a value do not make sense in an environment without side effects, and so are disallowed. After a result, control flow passes to the case instruction where the function result was required.
We realize that this is a departure from traditional hardware instructions and suggest referring to Fig. 3 to help ground our descriptions in a concrete example. Fig. 3 shows a small function, map, written in high-level assembly and in machine assembly, and encoded in binary. A more thorough description of the semantics of each of these instructions is found in Section 5.

Built-in functions, I/O, and errors
ALU operations are, for the most part, already purely mathematical functions -they just map inputs to an output. The Zarf functional ISA is built around the notion of function calls, so no new mechanism or instructions are needed to use the hardware ALU. Invoking a hardware "add" is the same as invoking a program-supplied function. In our prototype, function indices less than 256 (0x100) are reserved for hardware operations; the first program-supplied function, main, is 0x100, with functions numbered up from there. During evaluation, if the machine encounters a function with an index less than 0x100, it knows to invoke the ALU instead of jumping to a space in instruction memory.
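A minimal Python sketch of the dispatch rule described above: an application whose index is below 0x100 goes to a hardware unit, and anything else goes to program code, with the same arguments-in, value-out interface either way. The specific index-to-operation mapping in `ALU_OPS` is hypothetical, since the real numbering is not given, and `program` is a dictionary of Python callables standing in for code in instruction memory.

```python
import operator

HW_LIMIT = 0x100  # function indices below this are reserved for hardware operations

# Hypothetical index assignment for a few ALU operations; the real numbering is not given.
ALU_OPS = {0x01: operator.add, 0x02: operator.sub, 0x03: operator.mul}


def apply_function(index, args, program):
    """Dispatch an application: low indices go to the ALU, the rest to program code.
    Both paths share the same interface: arguments in, one value out."""
    if index < HW_LIMIT:
        return ALU_OPS[index](*args)
    return program[index](*args)


# main (0x100) calls hardware add exactly as it would call any other function
program = {0x100: lambda x: apply_function(0x01, [x, 1], program)}
```

Here `apply_function(0x100, [41], program)` evaluates `main`, which in turn applies the hardware add; no separate call mechanism is needed for the ALU.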
The only two functions with side effects in the system, input and output, are also primitive functions. The input function takes one argument (a port number) and returns a single word from that port; the output function takes two arguments, a port and a value, writes the value to the port, and returns the value written. Since data dependencies are never violated in function evaluation, software can ensure that I/O operations always occur in the right order, even in a pure functional environment, by introducing artificial data dependencies; this is the principle underlying the I/O monad [53,54], used prominently in languages like Haskell.
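The ordering argument can be illustrated with deferred evaluation. In this Python sketch, `Thunk` stands in for a let-created closure that runs only when forced, and `Ports` simulates the two I/O primitives (named after getint and putint from the semantics) while logging the order of events; both classes and the port numbers 0 and 1 are hypothetical. The second read takes the written value as an ignored argument, which is enough to force the write to happen first.

```python
class Ports:
    """Simulated I/O ports that record the order in which events happen."""

    def __init__(self, inputs):
        self.inputs = {port: list(vals) for port, vals in inputs.items()}
        self.log = []

    def getint(self, port):
        value = self.inputs[port].pop(0)
        self.log.append(("in", port, value))
        return value

    def putint(self, port, value):
        self.log.append(("out", port, value))
        return value  # returning the written value lets later code depend on it


class Thunk:
    """A deferred application: runs only when forced, after forcing its dependencies."""

    def __init__(self, fn, *deps):
        self.fn, self.deps = fn, deps
        self.done, self.value = False, None

    def force(self):
        if not self.done:
            self.value = self.fn(*(d.force() for d in self.deps))
            self.done = True
        return self.value


ports = Ports({0: [7, 8]})
first = Thunk(lambda: ports.getint(0))
echo = Thunk(lambda v: ports.putint(1, v), first)  # output needs the first input
# Artificial dependency: the second read takes the written value and ignores it,
# which forces the write to happen before the read.
second = Thunk(lambda _written: ports.getint(0), echo)
second.force()
print(ports.log)  # [('in', 0, 7), ('out', 1, 7), ('in', 0, 8)]
```

Only `second` is forced explicitly, yet the log shows input, output, input in program order, because each step's argument is the previous step's result.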
In a purely functional system there are no side effects, and thus no notion of an "exception". For program-defined functions, this just requires that every branch of every case return a value (that value could be a program-defined error). However, some invalid conditions resulting from a malformed program can still occur at runtime. To respect the purely functional system, these must produce a non-effectful result that is still distinguishable from valid results. Our solution is to define a "runtime error constructor" in the space of reserved functions. Every function, both hardware- and software-defined, can potentially return an instance of the error constructor. The ISA semantics are left undefined in these error cases because such errors are easy to avoid: compiling from any Hindley-Milner typechecked language guarantees the absence of runtime type errors [55,56].
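A minimal Python sketch of the error-constructor idea: instead of raising an exception, a malformed hardware call returns an ordinary constructor value that callers can test with case. The reserved index `ERROR_ID`, the helper names, and the choice of "adding a constructor" as the malformed call are assumptions for illustration; the source does not list which conditions produce the error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Con:
    name: int           # constructor identifier
    fields: tuple = ()  # constructor arguments


ERROR_ID = 0x0F  # hypothetical reserved index for the runtime error constructor
RUNTIME_ERROR = Con(ERROR_ID)


def hw_add(a, b):
    """Hardware add: integers in, integer out. A malformed call (e.g., adding a
    constructor) returns the error constructor instead of raising an exception."""
    if not (isinstance(a, int) and isinstance(b, int)):
        return RUNTIME_ERROR
    return a + b


def is_error(v):
    # The error is an ordinary value, so callers can test for it with case
    return isinstance(v, Con) and v.name == ERROR_ID
```

Since the error is itself a constructor, feeding it to a later ALU call yields the error again, so it propagates without any side effect.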

System software
This section describes the software architecture across the two realms (functional and imperative) of the system, and provides an overview of the ICD and the functional coroutines.

Functional vs. imperative
As our system is composed of two small and separate computational layers, the software is split across two different ISAs. For existing applications, or applications prototyped for existing platforms, the decision of which components to migrate to Zarf represents a trade-off of increased abstraction and verification capability for additional development effort and some decrease in performance. Section 7 provides some quantitative worst-case bounds for this trade-off.
Zarf runs a small microkernel based on cooperative coroutines [57,58] to handle the scheduling and communication of different software components. This allows us to more easily group and reason about code in terms of higher-level behaviors; i.e., the small surface area of each coroutine means that coroutines can be considered (and occasionally verified) in blocks, as collections of functions with a single specification and interface. The cooperative nature of the system is a design choice that allows us to avoid interrupts, which would complicate proofs of a single coroutine's behavior. Timing analysis (Section 6.2) ensures that each coroutine always returns control.
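Cooperative scheduling can be sketched with Python generators standing in for coroutines: each task runs until it yields, and only then does the kernel pick the next one. This is an illustration of the general technique, not the Zarf microkernel itself (which is functional code running on Zarf); `run_cooperative` and `counter` are hypothetical names.

```python
from collections import deque


def run_cooperative(tasks, max_steps):
    """Round-robin cooperative scheduling: each task runs until it yields a value,
    then control returns to the kernel. There is no preemption and no interrupt."""
    ready = deque(tasks)
    trace = []
    for _ in range(max_steps):
        if not ready:
            break
        name, task = ready.popleft()
        try:
            trace.append((name, next(task)))
        except StopIteration:
            continue  # a finished task is simply dropped
        ready.append((name, task))
    return trace


def counter(start):
    n = start
    while True:
        yield n  # voluntarily hand control back to the kernel
        n += 1
```

`run_cooperative([("a", counter(0)), ("b", counter(100))], 4)` interleaves the two tasks as `a, b, a, b`. The flip side of avoiding interrupts is that a task which never yields would stall the whole system, which is why a bound showing that every coroutine returns control matters.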
Zarf enables reasoning about these coroutines at the assembly and binary level. Section 6 demonstrates different properties that can be verified. The integrity type system allows a developer to statically prove that a given set of coroutines (and the microkernel itself) will execute in cooperation without one coroutine corrupting values important to another. This composability of verification is extremely difficult on traditional architectures, as the global and mutable nature of all state makes it quite easy for any software component to affect any other.
The imperative layer (which can be any embedded CPU, but for our purposes is a Xilinx MicroBlaze processor) runs whatever pieces of the software are not placed on Zarf. This allows monitoring software, low-level drivers, communication protocols, and other complex, imperative code to exist and run without requiring modeling or pure-functional implementations. As this area of the system is untrusted and unverified, anything on which the critical components depend should be rewritten to run on Zarf.
In our sample application, three application coroutines run on Zarf: one that handles the core ICD application, an I/O routine that handles the timing of reading values from the patient's heart and outputting when shocks should occur, and a routine that sends values to the monitoring software on the imperative layer. The system operates in real time, reading a single value from the heart, running ECG and ICD processing, and communicating the resulting value back out. In our application, the monitoring software tracks the number of times treatment occurs and, when prompted over its communication channel, outputs that number. This imperative software could be arbitrarily complex and handle more complicated monitoring and diagnosis, drivers for communicating with the outside world, or other features; as it is a standard imperative core, any embedded C code can be easily compiled for it with an off-the-shelf compiler.

ICD
ICDs are small, battery-powered, embedded systems which are implanted in a patient's chest cavity and connect directly with the heart. For patients with arrhythmia and at risk for heart failure, an ICD is a potentially life-saving device. Currently, the primary use of ICDs is to detect dangerous arrhythmias (such as ventricular tachycardia, or VT) and administer pacing shocks (anti-tachycardia pacing, or ATP). These shocks help prevent the acceleration in heart rate leading to ventricular fibrillation, a form of cardiac arrest.
From 1990 to 2000, over 200,000 ICDs and pacemakers were recalled due to software issues [2]. Between 2001 and 2015, over 150,000 implanted medical devices were recalled by the FDA because of life-threatening software bugs [3]. However, ICDs are credited with saving thousands of lives; for patients who have survived life-threatening arrhythmia, ICDs decrease mortality rates by 20-30% compared to medication [59][60][61]. Currently, around 10,000 new patients have an ICD implanted each month [62], and around 800,000 people are living with ICDs [63].

The core of our ICD is an embedded, real-time ECG algorithm that performs QRS⁴ detection on raw electrocardiogram data to determine the timing between heartbeats. We build on an established real-time QRS detection algorithm [64], which has seen wide use and been the subject of studies examining its performance and efficacy [65]. An open-source update of several versions of the algorithm [66] is available; we use the results of this open-source work as the basis of our algorithm's specification as well as of the C alternative. After the ECG algorithm detects the spacing between heartbeats, the ATP function checks for signs of ventricular tachycardia and, if found, administers a series of pacing shocks. We implement the VT test and ATP treatment published in [67].
The I/O coroutine is passed the output of the previous iteration of the ICD coroutine. A hardware timer is used to ensure that I/O events occur at the correct frequency. When the correct time has elapsed (5 ms), the I/O coroutine outputs the given value and reads the next input value. It yields this value to the microkernel.
This input is then passed through to the ICD coroutine, which implements a series of filter passes to detect the spacing between QRS complexes (Fig. 4 illustrates the ECG filter passes). If 18 of the last 24 beats had periods less than 360 ms (corresponding to a heart rate greater than about 167 bpm), the ICD coroutine moves into a treatment-administering state, where it outputs three sequences of eight pulses at 88% of the current heart rate, with a 20 ms decrement between sequences. This is designed to prevent continued acceleration and restore a safe rhythm.
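The VT test and treatment parameters above can be sketched in a few lines of Python. This is an illustration of the stated thresholds, not the verified Coq implementation or the exact procedure of [67]. Two points are assumptions on our part and are marked in the code: detection requires a full window of 24 recorded beats, and the "88%" is applied to the current beat-to-beat period (cycle length), rounded down to whole milliseconds, with each later sequence 20 ms shorter.

```python
VT_PERIOD_MS = 360  # beats shorter than this count toward VT (about 167 bpm)
WINDOW = 24         # number of most recent beats examined
THRESHOLD = 18      # how many of those beats must be short


def vt_detected(periods_ms):
    """True if at least 18 of the last 24 beat-to-beat periods are under 360 ms.
    Assumption: fewer than 24 recorded beats never triggers detection."""
    recent = list(periods_ms)[-WINDOW:]
    short = sum(1 for p in recent if p < VT_PERIOD_MS)
    return len(recent) == WINDOW and short >= THRESHOLD


def atp_schedule(current_period_ms, sequences=3, pulses=8, percent=88, decrement_ms=20):
    """Pulse intervals (ms) for each ATP sequence. Assumption: 88% is applied to the
    current beat period, rounded down, and each later sequence is 20 ms shorter."""
    base = current_period_ms * percent // 100
    return [[base - k * decrement_ms] * pulses for k in range(sequences)]


print(atp_schedule(300)[0][0], atp_schedule(300)[2][0])  # 264 224
```

With a 300 ms period (200 bpm), the three sequences pace at 264, 244, and 224 ms intervals; integer arithmetic keeps the schedule exact, which suits a fixed-point embedded target.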
The monitoring software, which runs on the MicroBlaze, receives the output of the ICD coroutine each cycle. A command can be given on the diagnostic input channel for the software to output the number of times treatment has occurred. I/O events occur at a fixed frequency of 200 Hz. Timing analysis in Section 6.2 confirms that, after an input event, the entire cycle of each coroutine running and yielding, including garbage collection, is able to conclude well within the 5 ms window, meaning that the entire system is always able to meet its real-time deadline.

ISA semantics
Zarf has the core goal of providing concise, mathematical semantics for its hardware ISA. These can be found in Fig. 5, which gives the complete ISA behavior using a big-step semantics, explaining how each instruction reduces to a value. This semantics uses eager evaluation for simplicity; though the current hardware implementation uses lazy semantics, the difference is not observable in our application because I/O interactions are localized to a specific function and always evaluated immediately.
The semantics are discussed in more detail in the following subsections. Note that terms introduced in the abstract syntax (Fig. 2) are used in the semantics. Each rule (or helper function, Fig. 6) is applicable in a different case, depending on what is under evaluation; the scenarios are all mutually exclusive, meaning that there is always exactly one rule that can (and should) be applied at every step.

… to v. getint gets an integer from a specified port, and putint puts an integer onto a specified port; these are the only mechanisms for I/O in the system.

Names and programs
A Constructor is a unique name and a list of zero or more values. Constructors serve as software data types, as a simple system for building up more complex data objects. The name indicates the "type" of the constructor, encoded statically as a unique integer, which the machine uses at runtime to distinguish constructors of different types.
A Closure is a function object, tying a function to a list of zero or more values, which are the arguments that have already been supplied. Closures allow for the dynamic construction of function objects from statically defined functions: e.g., applying the argument 1 to the static binary function add creates a new closure, which expects one argument, that performs the function λx.x + 1.
A Value is either an integer, a constructor, or a closure. The machine uses one bit at runtime to track which values are primitives and which are objects (either constructors or closures), and identifies constructor types with their name (unique type integer), but is otherwise untyped.
An Environment is a semantic entity mapping variables (local names) to values (integers, constructors, and closures). The PROGRAM rule states that there should be a set of zero or more function and constructor declarations, and one function main with a body expression e. Given the declarations and main function, and application of the semantic rules, we can reduce e to a value.
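To make the three kinds of runtime values concrete, here is a minimal Python model. The class names, the one-entry primitive table, and the `apply` helper are hypothetical illustrations rather than Zarf's actual encoding. Only exactly-saturated and partial application are handled (oversaturation is the job of applyFn, described later).
```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass(frozen=True)
class Constructor:
    name: int            # unique type integer assigned statically
    fields: tuple = ()   # zero or more values

@dataclass(frozen=True)
class Closure:
    fn: str              # a statically defined function
    saved: tuple = ()    # arguments supplied so far

Value = Union[int, Constructor, Closure]
Env = Dict[str, Value]   # environment: local name -> value

def is_object(v: Value) -> bool:
    """Models the one runtime bit separating primitives from heap objects."""
    return not isinstance(v, int)

# Hypothetical arity/implementation table for a single 32-bit primitive.
PRIMS = {"add": (2, lambda a, b: (a + b) & 0xFFFFFFFF)}

def apply(c: Closure, *args: Value) -> Value:
    """Exactly-saturated or partial application only (no oversaturation)."""
    arity, impl = PRIMS[c.fn]
    saved = c.saved + args
    if len(saved) < arity:
        return Closure(c.fn, saved)   # still waiting for arguments
    return impl(*saved)

inc = apply(Closure("add"), 1)        # behaves like λx.x + 1
result = apply(inc, 41)
```
Here `inc` is an object (a closure holding one saved argument), while `result` is a primitive integer.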

Result
The RESULT rule states that, given the current function environment and a result instruction with argument arg, we can reduce the current function execution to a single value v using the environment to look up arg, if arg is a variable, or simply returning it, if arg is a number.

Let
A let instruction will be reduced using one of four rules: LET-FUN, LET-CON, LET-VAR, or LET-PRIM. The first is used for static program function application, i.e., applying arguments to a program-defined function (which excludes I/O and hardware functions). Similarly, the second is used for static constructor application, i.e., applying arguments to a program-defined constructor. The third is used to apply arguments to a runtime value, which will be a closure expecting additional arguments. The last is used for application of primitive (hardware ALU) functions. I/O functions (getint and putint) have separate rules.
LET-FUN is used when the instruction under evaluation is a let instruction applying zero or more arguments to a program-defined function $fn$. Its premises state that the function should not be getint or putint, that the function should be defined in the program declarations, that the arguments should all be reducible to a sequence of values $\vec{v}_1$ using the current function environment, that application of the applyFn helper rule on the body of $fn$ with arguments $\vec{v}_1$ will result in a value $v_2$, and finally, that binding a new local variable to $v_2$ will allow us to reduce the remainder of the instructions in the current function to a value.⁵ The final premise of the rule ($\rho[x_1 \mapsto v_2] \vdash e_1 \Downarrow v_3$) continues the execution: $e_1$ is the remainder of the instructions, where the environment now includes a mapping from $x_1$ to the newly calculated value $v_2$. A premise of this form occurs in every rule for non-terminal instructions (everything but PROGRAM and RESULT); the semantics treat the instructions as recursive, such that each instruction "points" to the rest of the instructions.
The premises for LET-CON are very similar to those of LET-FUN, with the primary difference coming in applyCn.
While function applications can be oversaturated, we disallow oversaturation of constructor applications for simplicity; in practice, we have not found this restrictive. Otherwise, the rules behave similarly, storing the value that results from applying arguments to a constructor into the environment ρ before continuing to evaluate the next expression $e$ to a value $v_3$. In this way, closures and constructors are structurally the same; both are function identifiers with a sequence of arguments. The difference is that closures are already fully evaluated.
LET-VAR is used when a let instruction applies arguments to a dynamic (runtime) value $x_2$. The premises state that $x_2$ should be reducible via the environment to a value $v_1$, that the sequence of arguments is reducible to a sequence of values $\vec{v}_2$, that applying the arguments $\vec{v}_2$ to the object $v_1$ with the applyFn operation will result in a value $v_3$, and that binding a local variable to that value will allow us to reduce the remainder of the instructions to a value. applyFn is written to accept only closures as its first argument; in calling it, there is an implicit premise that $v_1$ is a closure object. $x_2$ reducing to any other kind of value (an integer or constructor) is a runtime error, and can occur only if the program is not well-typed.
⁵ As these are big-step semantics, rules indicate how expressions reduce directly to values, rather than giving step-by-step instructions for performing the reduction. In evaluating LET-FUN, for example, the rule does not instruct you to "call" into the indicated function, but rather just states that the function reduces to a value (as it must, eventually), then uses the value; as we are writing mathematical expressions, the reduction is assumed to occur immediately. One can still "execute" the semantics by stepping through them, using the rules to evaluate operands as necessary wherever the semantics simply reduce immediately to a value.
LET-PRIM is similar to LET-FUN, but is used only when the static function is a primitive operation. These functions must be treated differently because there is no program definition for add, or sub, or mult; during machine execution, the hardware ALU handles them. The semantics define ALU operations the same way as the machine (as 32-bit modular mathematical operations). The premises of LET-PRIM contain no function declaration; in addition, they invoke applyPrim instead of applyFn. Otherwise, the application is similar: the function call reduces to a value, and binding that value to a local variable will allow us to reduce the remainder of the instructions in the function under evaluation.
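The let rules can be sketched as a tiny eager evaluator in Python. The tuple encoding of instructions (`("let", x, fn, args, rest)` and `("result", arg)`), the `PRIMS` table, and the helper names are assumptions made for illustration. Oversaturated calls are rejected here rather than handled, since that case belongs to applyFn; LET-CON is omitted for brevity.
```python
MASK = 0xFFFFFFFF
PRIMS = {                                  # hardware ALU ops, 32-bit modular
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "mul": lambda a, b: (a * b) & MASK,
}

def lookup(env, arg):
    """rho(arg): integer literals stand for themselves; names are looked up."""
    return arg if isinstance(arg, int) else env[arg]

def apply_call(prog, fn, saved, vals):
    args = saved + vals
    arity = 2 if fn in PRIMS else len(prog[fn][0])
    if len(args) < arity:
        return ("closure", fn, args)       # partial application builds a closure
    if len(args) > arity:
        raise NotImplementedError("oversaturation is handled by applyFn")
    if fn in PRIMS:
        return PRIMS[fn](*args)            # LET-PRIM (applyPrim)
    params, body = prog[fn]
    return evaluate(prog, dict(zip(params, args)), body)   # LET-FUN

def evaluate(prog, env, expr):
    """Big-step evaluation of ("result", arg) and ("let", x, fn, args, rest)."""
    if expr[0] == "result":                # RESULT
        return lookup(env, expr[1])
    _, x, fn, args, rest = expr
    vals = tuple(lookup(env, a) for a in args)
    if fn in env:                          # LET-VAR: fn names a runtime value
        clo = env[fn]
        if not (isinstance(clo, tuple) and clo[0] == "closure"):
            raise RuntimeError("applied a non-closure (ill-typed program)")
        v = apply_call(prog, clo[1], clo[2], vals)
    else:                                  # LET-FUN or LET-PRIM
        v = apply_call(prog, fn, (), vals)
    return evaluate(prog, {**env, x: v}, rest)   # rho[x -> v] |- rest

# Hypothetical program: sq_plus a b = a * a + b
prog = {"sq_plus": (["a", "b"],
                    ("let", "t", "mul", ["a", "a"],
                     ("let", "u", "add", ["t", "b"], ("result", "u"))))}
main = ("let", "inc", "add", [1],                  # partial primitive
        ("let", "y", "inc", [41],                  # LET-VAR on the closure
         ("let", "z", "sq_plus", ["y", 3], ("result", "z"))))
print(evaluate(prog, {}, main))   # 42 * 42 + 3 = 1767
```
Each let extends the environment functionally (`{**env, x: v}`), mirroring the final premise shared by every non-terminal rule.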

I/O
GETINT is the rule for the hardware-defined input function; it is invoked when a let instruction uses getint, which takes a single argument $n_1$. The premises state that reading a value from port $n_1$ should return an integer $n_2$; binding that value to a local variable, we can proceed with evaluation.
PUTINT is the rule for the output function (also hardware-defined); it takes two values: a port number $n_1$, and an arg that should be output. The premises use the environment to reduce arg to a value; not stated (as it is a side effect) is that the hardware sends this integer value to the indicated port. The integer sent is also the value to which the instruction reduces, so it is bound to a local variable, and evaluation proceeds.
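A minimal Python model of the two I/O rules, assuming ports are small integers backed by an input queue and an output log; the `Ports` class and the port numbers are hypothetical.
```python
from collections import deque

class Ports:
    """Hypothetical I/O model: queued inputs per port, logged outputs per port."""

    def __init__(self, inputs):
        self.inputs = {port: deque(vals) for port, vals in inputs.items()}
        self.outputs = {}

    def getint(self, port):
        # GETINT: reading from port n1 yields an integer n2
        return self.inputs[port].popleft()

    def putint(self, port, value):
        # PUTINT: the side effect sends the value to the port; the instruction
        # itself reduces to that same integer, which is then bound locally
        self.outputs.setdefault(port, []).append(value)
        return value

ports = Ports({0: [512, 498]})    # e.g., two ECG samples on input port 0
x = ports.getint(0)
y = ports.putint(1, x)            # echo the sample on output port 1
```
Because putint returns the integer it sent, a program can keep using `y` after the output, exactly as the rule binds the result to a local variable.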

Case
We use two rules for the evaluation of case instructions (CASE-CON and CASE-LIT, for constructor and integer scrutinees, respectively); in addition, the rule CASE-ELSE handles the "else" branch for each of the two case varieties.
These CASE rules are invoked when a case instruction is under evaluation; which rule is used depends on what the scrutinee reduces to: if it is a constructor, CASE-CON is used, while CASE-LIT handles integers. A case instruction includes a series of zero or more branches, each of which has a constructor name or integer as a guard, and one else branch. Exactly one branch will always execute. Since constructor "names" become unique integers, constructor and integer matches must be encoded differently to distinguish which variety each branch is meant to match.
The first premise of CASE-CON indicates that the scrutinee arg must reduce to a constructor; the second says to take the expression from the branch with the matching constructor name and use that for the remainder of the function, which will reduce to a value. CASE-LIT is similar, but requires the arg to reduce to an integer, and takes the expression from the branch that attempts to match exactly that integer.
The CASE-ELSE rule is invoked when no matching branch is found for the case instruction (indicated in the disjuncted premises); there, it indicates to take the expression from the else branch in the instruction and use that for the remainder of the function.
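The branch selection performed by CASE-CON, CASE-LIT, and CASE-ELSE can be sketched in Python. The branch encoding is hypothetical: its explicit `"con"`/`"lit"` tag plays the role of the distinct encodings mentioned above, so constructor name 1 never matches literal 1. Binding the constructor's fields to the branch's variables is also an assumption about the encoding.
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Con:
    name: int                # unique constructor type integer
    fields: tuple = ()

def select_branch(scrutinee, branches, else_expr):
    """Pick the expression a case instruction continues with.

    branches: ("con", name, vars, expr) or ("lit", n, expr).
    Returns (new bindings, expression).
    """
    for br in branches:
        if br[0] == "con" and isinstance(scrutinee, Con) and br[1] == scrutinee.name:
            _, _, names, expr = br
            return dict(zip(names, scrutinee.fields)), expr   # CASE-CON
        if br[0] == "lit" and isinstance(scrutinee, int) and br[1] == scrutinee:
            return {}, br[2]                                  # CASE-LIT
    return {}, else_expr                                       # CASE-ELSE

# Hypothetical branches: constructor 1 with two fields, and literal 1.
branches = [("con", 1, ["head", "tail"], "e_cons"), ("lit", 1, "e_one")]
```
Exactly one expression is always returned, matching the guarantee that exactly one branch executes.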

apply helper functions
applyFn is perhaps the most complicated rule, because it is where currying is handled: different actions must be taken if too few, too many, or just enough arguments are supplied to a function application. The helper rule takes a closure and a list of zero or more values, which will be arguments to the closure.
1. If zero arguments are supplied, and there are exactly enough values already saved and ready to be applied in the closure, then simply feed those values as the arguments to the function, reduce the function body to a value, and return that.
2. If zero arguments are supplied, and there are not enough saved values in the closure, return the same closure.
3. If at least one argument is supplied, and there are not enough saved values in the closure, recursively call into applyFn, taking the first argument from the list and appending it to the list of saved values. Either the argument list will run out before the function is saturated (resulting in case 2), or the function will eventually be saturated (resulting in case 1 or 4).
4. If at least one argument is supplied, and there are exactly enough saved values already in the closure, then evaluate the closure (which, if the program is well-typed, must result in another closure), then recursively call applyFn with the new closure and the same argument list.
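The four applyFn cases map directly onto a recursive Python function. In this sketch (an assumption, not Zarf's hardware behavior), function bodies are Python callables, and the two entries in the function table are hypothetical examples; `adder` returns a closure so that case 4 can be exercised.
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Closure:
    fn: str
    saved: tuple = ()

MASK = 0xFFFFFFFF
# Hypothetical function table: name -> (arity, body). Bodies receive the saved
# values as arguments and return a value, possibly another closure.
FUNCS = {
    "add": (2, lambda a, b: (a + b) & MASK),
    "adder": (1, lambda n: Closure("add", (n,))),
}

def apply_fn(clo, args):
    arity, body = FUNCS[clo.fn]
    if not args:
        if len(clo.saved) == arity:
            return body(*clo.saved)            # case 1: saturated, evaluate
        return clo                             # case 2: still partial
    if len(clo.saved) < arity:                 # case 3: absorb one argument
        return apply_fn(Closure(clo.fn, clo.saved + args[:1]), args[1:])
    # case 4: saturated with arguments left over (oversaturation)
    result = body(*clo.saved)
    if not isinstance(result, Closure):
        raise TypeError("oversaturated call did not yield a closure")
    return apply_fn(result, args)

print(apply_fn(Closure("adder"), (10, 5)))     # adder 10 -> add 10 -> 15
```
The call `apply_fn(Closure("adder"), (10, 5))` passes through cases 3, 4, 3, and 1 in that order.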
applyCn handles applications of arguments to constructors. The first rule applies when exactly enough arguments are supplied to the constructor; in that case, a constructor containing those values is built and returned. The second handles the case where fewer values were supplied than expected (partially saturating a constructor); in this case, a closure is returned to capture the values already supplied, and when additional values are applied, it invokes applyCn again to check whether the constructor has all the fields necessary to be built. Although this appears to create syntax dynamically (the rule places a let instruction into the created closure), every constructor has a finite number of fields, so these functions are all known statically.
applyPrim handles evaluation of primitive operations. The first case is invoked when the correct arguments have been supplied for that particular operation, and simply evaluates the operation according to the rule for 32-bit modular arithmetic, returning the answer. As with applyCn, we must account for the case where the function did not receive enough arguments; there, too, a new closure is created to capture the arguments supplied, and the operation is evaluated once all arguments are received (the second case).
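A Python sketch of applyPrim and applyCn, assuming binary primitive operations and values represented as unsigned 32-bit integers. The `PartialPrim` and `PartialCon` records are hypothetical stand-ins for the closures the rules create.
```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF
OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b,
       "mul": lambda a, b: a * b}

@dataclass(frozen=True)
class Con:                 # fully built constructor
    name: int
    fields: tuple

@dataclass(frozen=True)
class PartialPrim:         # stands in for the closure over a partial ALU call
    op: str
    saved: tuple

@dataclass(frozen=True)
class PartialCon:          # stands in for the closure over a partial constructor
    name: int
    saved: tuple

def apply_prim(op, args):
    args = tuple(args)
    if len(args) < 2:
        return PartialPrim(op, args)          # second case: wait for more
    a, b = args
    return OPS[op](a, b) & MASK32             # first case: 32-bit modular result

def apply_cn(name, n_fields, args):
    args = tuple(args)
    if len(args) == n_fields:
        return Con(name, args)                # exactly saturated: build it
    if len(args) < n_fields:
        return PartialCon(name, args)         # capture fields supplied so far
    raise ValueError("constructor oversaturation is disallowed")

p = apply_prim("add", [5])                    # PartialPrim('add', (5,))
total = apply_prim(p.op, p.saved + (7,))      # later application completes it
```
Completing a partial record simply re-invokes the same helper with the saved values plus the new ones, which is how the rules re-enter applyCn or applyPrim.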
The ρ(arg) = ... helper function is for convenience, simply stating that if arg is an integer, return that value; otherwise, it must be a name mapped to some value in the environment, in which case that value should be returned.

Verification
We separate the verification of the embedded ICD application into three parts: verification of the correctness of the ICD coroutine, a timing analysis to show that the assembly meets timing requirements in the worst case, and a proof of non-interference between the trusted ICD coroutine and untrusted code outside of it.

Correctness
We first implement a high-level version of the application's critical algorithms (the ECG filters and ATP procedure) in Gallina, the specification language of the Coq theorem prover [68], using this version as our specification of functional correctness. This specification operates on streams (a datatype that represents an infinite list), taking a stream as input and transforming it into an output stream. By sticking to a high-level, abstract specification, we can be more confident that we have specified the algorithm correctly. An ICD implementation cannot operate on streams, since not all data is available up front; instead, it takes a single value, yields a single value, and then repeats the process.
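The gap between a stream-level specification and a one-value-at-a-time implementation can be illustrated in Python. The two-tap sum below is a hypothetical stand-in for the ECG filters, and the function names are illustrative. Comparing a finite prefix only provides evidence; the guarantee described in the text comes from the Coq proof.
```python
from itertools import chain, count, islice, tee

def spec(xs):
    """Stream specification: y[n] = x[n] + x[n-1], with x[-1] = 0."""
    a, b = tee(xs)
    return (x + prev for x, prev in zip(a, chain([0], b)))

def step(state, x):
    """One coroutine iteration: consume one value, yield one value."""
    return x, x + state          # (new state, output)

def run(xs):
    state = 0
    for x in xs:
        state, y = step(state, x)
        yield y

# Compare a finite prefix of both over an unbounded input stream.
N = 1000
same = list(islice(spec(count()), N)) == list(islice(run(count()), N))
```
Both functions are lazy, so they can consume the infinite `count()` stream; only the `islice` prefix is ever materialized.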
The form of the correctness proof is by refinement: first, we create a Coq implementation of the ICD algorithm that is "lower-level" than the Coq specification. This lower-level implementation operates on machine values rather than streams, isolates function applications to let expressions, and avoids the use of "if-then-else" expressions, among other trivially resolved differences. We then create an extractor that converts this lower-level Coq code directly into executable Zarf functional assembly code (see Fig. 7). If, for all possible input streams, we can prove that the output stream produced by the high-level Coq specification is the same sequence of values produced by the lower-level implementation, we can conclude that the program we run on Zarf is faithful to the high-level Coq specification. This proof of equivalence between the two Coq implementations is done by induction and coinduction over the program, showing that if the outputs have matched up to point N, and the computation of value N is equivalent, then value N + 1 will be equivalent as well. Compared to extracting for an imperative architecture, we avoid needing to compile functional operations to an imperative ISA and do not require a large software runtime (or any software runtime at all). The translation simply replaces Coq keywords with Zarf assembly keywords, which is possible because the low-level Coq specification is in the A-normal form that Zarf requires. For example, the Coq keyword CoFixpoint would be textually replaced with fun, match replaced with case, etc.
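A sketch of keyword-level translation in Python. Only the two keyword pairs named above come from the text; the rest of the real extractor (for example, how other syntax is handled, or checking that the input is in A-normal form) is not shown, and `extract` is a hypothetical name.
```python
import re

# Only these two pairs are named in the text; a real table would be longer.
KEYWORDS = {"CoFixpoint": "fun", "match": "case"}
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b")

def extract(coq_src: str) -> str:
    """Replace whole-word Coq keywords with their Zarf assembly counterparts."""
    return _PATTERN.sub(lambda m: KEYWORDS[m.group(1)], coq_src)

print(extract("CoFixpoint loop s := match s with"))
```
The `\b` word boundaries keep identifiers such as `matches` intact; a plain `str.replace` would corrupt them.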
We begin constructing our proof by defining the relevant datatypes and expressions of the ISA as inductively defined mathematical objects. We then define a small interpreter that executes the instruction semantics. The zarf_run_function' interpreter is made up of several cases, each of which corresponds to a rule in the big-step semantics defined in Fig. 5. For example, the first case, H_apply', defines what function application means: if we add to our local environment a thunk that is the result of executing the apply operation (locals + execApply x ys args locals caseVal), and then execute the rest of the instructions zs, we have successfully executed an apply statement followed by the rest of the instructions zs.
We declare several axioms asserting the correctness of the ISA's built-in functions and their equivalence to built-in Coq operations. These built-in functions (like add, multiply, etc., labeled PrimOp in Fig. 2) are ultimately the base operations employed by all user-defined functions, and we reference them during the proof of correctness of the higher-level ECG algorithms later on. We also define a few axioms related to list and pair construction, whose correspondence to the user-made Zarf function equivalents is trivial.
We can then begin defining and proving properties of non-built-in functions, like append, reverse, index, and various matrix multiplication operations. The simplest user-defined function is id, which is used often in Zarf assembly for assigning a constant numeric value to a variable (because let expressions operate purely over function identifiers; see Fig. 2).
We then proceed to define several ECG helper functions, like the low-pass filter, both at a high-level, stream-based abstraction and as a lower-level implementation that operates on machine values, and verify their equivalence (and thus the correctness of the machine's implementation of the specification). After we have proven the same_low_pass_filter_rec_and_rec2 theorem, we can convert the low-level low-pass filter implementation, low_pass_filter_rec2, directly into Zarf assembly.
These examples are just a sample of the entire set of proofs we wrote. The full proofs of correctness of the assembly-level critical ECG and ATP functions take under 2,500 lines of Coq. The implementations are converted line-for-line into Zarf assembly code, which is combined with assembly for the microkernel and other coroutines.
In total, the Trusted Code Base for the correctness proof includes the hardware, the Coq proof assistant, and the small extractor that converts the low-level Coq code into Zarf functional assembly code. All other code is untrusted and may be incorrect, and the proof will still hold. The high-level ISA and clearly defined semantics make this very small TCB possible, allowing the exclusion of language runtimes, compilers, and associated tooling that are frequently part of the TCB in other verification efforts.

Timing
With a knowledge of how the Zarf hardware executes each instruction, we create worst-case timing bounds for each operation. In general, in a functional setting, unbounded recursion makes it impossible to statically predict the execution time of routines. Though our application uses infinite recursion to loop indefinitely, the goal is to show that each iteration of the loop meets the real-time deadline; within that loop, each coroutine is executed only once, and no functions call into themselves. This allows us to compute a total worst-case execution time for the sum of all the instructions by extracting the worst-case route through the hardware state machine for each possible operation. For example, applying two arguments to a primitive ALU function and evaluating it has a maximum runtime of 30 cycles; this includes the overhead of constructing an object in memory for the call, performing a function call, fetching the values of the operands, performing the operation, marking the reference as "evaluated" and saving the result, etc. In the average case, only a fraction of the possible overhead is actually incurred (see Section 7 for CPI averages).
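For loop-free code, the worst-case computation can be sketched as a recursive sum over an instruction tree: straight-line sequences add their costs, and case dispatch takes its slowest arm. Only the 30-cycle primitive application comes from the text; the other costs, the tree encoding, and the example body are hypothetical.
```python
# Worst-case cycles per operation. Only "prim2" (applying two arguments to a
# primitive ALU function and evaluating it) comes from the text; the other
# entries are hypothetical placeholders.
COST = {"prim2": 30, "call": 40, "case": 12, "result": 8}

def wcet(node):
    """Worst-case cycles for a loop-free instruction tree.

    ("op", name)       -> COST[name]
    ("seq", [nodes])   -> sum of the parts (straight-line code)
    ("branch", [arms]) -> case dispatch plus the slowest arm
    """
    kind, payload = node
    if kind == "op":
        return COST[payload]
    if kind == "seq":
        return sum(wcet(n) for n in payload)
    if kind == "branch":
        return COST["case"] + max(wcet(n) for n in payload)
    raise ValueError(f"unknown node kind: {kind}")

body = ("seq", [("op", "prim2"),
                ("branch", [("seq", [("op", "prim2"), ("op", "result")]),
                            ("op", "result")]),
                ("op", "call")])
print(wcet(body))   # 30 + (12 + 38) + 40 = 120
```
The recursion terminates only because the tree is acyclic, which corresponds to the restriction that no function in the loop body calls into itself.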
Hardware garbage collection is a complicating factor for timing. GC can be configured to run at specific intervals or when memory usage reaches a certain limit; for our application, to guarantee real-time execution, the microkernel calls a hardware function to invoke the garbage collector once each iteration. To reason about how long garbage collection takes, we bound the worst-case memory usage of a single iteration of the application loop. The hardware implements a semispace-based tracing collector, so collection time depends on the live set, not on the total amount of memory allocated. For the collector's state machine, each live object takes N+4 cycles to copy (for N memory words in the object), and it takes 2 cycles to check a reference to see if it has already been collected. We bound the worst case by conservatively assuming that all the memory allocated during one pass through the application loop might be simultaneously live at collection time, and that every argument in each function object may be a reference that the collector will have to spend 2 cycles checking.
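Under the stated assumptions (every allocated object live, every argument slot a possible reference), the conservative GC bound reduces to a simple sum, sketched here in Python; the function name and the example allocations are hypothetical.
```python
def gc_bound(objects):
    """Worst-case tracing-GC cycles for one loop iteration.

    objects: (words, arg_slots) for every object allocated in the iteration.
    Assumes all of them are live at collection time (N + 4 cycles to copy an
    N-word object) and that every argument slot may be a reference (2 cycles
    to check each one).
    """
    return sum(words + 4 + 2 * arg_slots for words, arg_slots in objects)

# Hypothetical allocations: a 3-word object with 2 argument slots and a
# 5-word object with 4 argument slots.
print(gc_bound([(3, 2), (5, 4)]))   # (3 + 4 + 4) + (5 + 4 + 8) = 28
```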
From the static analysis, we determine that the worst-case execution of the entire loop is 4,686 cycles, not including garbage collection. Garbage collection is bounded by a worst case of 4,379 cycles, making a total of 9,065 cycles to run one iteration of the system, or 181.3 μs on our FPGA-synthesized prototype running at 50 MHz, well within the real-time deadline of 5 ms.
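The reported figures can be checked with a few lines of arithmetic. At 50 MHz one cycle is 20 ns, and the 200 Hz I/O rate gives a 5 ms period, i.e., 250,000 cycles. The variable names below are illustrative.
```python
CLOCK_HZ = 50_000_000            # prototype clock
IO_RATE_HZ = 200                 # one I/O event every 5 ms

loop_cycles = 4_686              # worst-case loop, excluding GC
gc_cycles = 4_379                # worst-case garbage collection
total_cycles = loop_cycles + gc_cycles

time_us = total_cycles * 1_000_000 / CLOCK_HZ   # cycles -> microseconds
budget_cycles = CLOCK_HZ // IO_RATE_HZ          # cycles per 5 ms period
utilization = total_cycles / budget_cycles

print(total_cycles, time_us, budget_cycles, f"{utilization:.1%}")
# prints: 9065 181.3 250000 3.6%
```
The worst-case iteration therefore uses under 4% of the cycles available per I/O period.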

Non-interference
Because the ICD coroutine has been proven correct (Section 6.1), we treat its output as trusted. This output must then travel through the rest of the cooperative microkernel until it reaches the outside world via the I/O coroutine's putint primitive. To guarantee the integrity of this data (meaning it is neither corrupted nor influenced by less-trusted data), we rely on a proof of non-interference. Non-interference means that "values of variables at a given security level ℓ ∈ L can only influence variables at any security level that is greater than or equal to ℓ in the security lattice L" [69]. In a standard security lattice, L ⊑ H (low-security below high-security), meaning that high-security data does not flow to (or affect) low-security output. In our application, however, we are concerned with integrity; our lattice is composed of two labels, T (trusted) and U (untrusted), organized such that T ⊑ U. Therefore, our integrity non-interference property is that untrusted values cannot affect trusted values [11].
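The two-point integrity lattice can be written down in a few lines of Python. The names are hypothetical, and the ordering follows the text, with T below U.
```python
from enum import IntEnum

class Label(IntEnum):
    T = 0   # trusted: bottom of the integrity lattice
    U = 1   # untrusted: top of the integrity lattice

def flows_to(src: Label, dst: Label) -> bool:
    """src ⊑ dst: data labeled src may influence data labeled dst."""
    return src <= dst

def join(a: Label, b: Label) -> Label:
    """Least upper bound: a result stays trusted only if every input is."""
    return Label(max(a, b))
```
`flows_to(Label.U, Label.T)` being false is the integrity non-interference requirement in miniature: untrusted values must not influence trusted ones.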
To prove this about Zarf, we create a simple integrity type system that provides a set of typing rules to determine and verify the integrity type of each expression, function, and constructor in a program. After providing trust-level annotations in a few places and constraining the normal Zarf semantics slightly to make type-checking much easier, we can run a type checker over the resulting Zarf code to know whether it maintains data integrity. We extend the original Zarf syntax to allow for these type annotations, as follows:
$$
\begin{aligned}
\ell,\ pc \in \mathit{Label} &::= T \mid U \\
\tau \in \mathit{Type} &::= \mathit{num}^{\ell} \mid (cn, \vec{\tau}) \mid (\vec{\tau} \to \tau) \\
\Gamma \in \mathit{Env} &= \mathit{Identifier} \to \mathit{Type}
\end{aligned}
$$
$$
\frac{\Gamma[i_1 \mapsto \tau_1, \ldots, i_n \mapsto \tau_n] \vdash e : \tau}{\Gamma \vdash (\mathtt{fun}\ fn\ i_1{:}\tau_1, \ldots, i_n{:}\tau_n : \tau = e) : (\tau_1, \ldots, \tau_n) \to \tau}\;(\text{func})
$$
A type is inductively defined as either a labeled number, a singleton constructor, or a function constructed of these types. The type environment Γ maps variables, function, and constructor names to types. Since all functions are annotated with their types, type checking proceeds by ensuring that the return type of a function is the same as the type deduced by checking the function's body expression with the function's parameter types added to the type environment. ⊔ denotes the join of two types, and • denotes the joining of a type's integrity label with another.
The extended declaration forms for functions and constructors are:
$$
\begin{aligned}
\mathit{func} \in \mathit{Function} &::= \mathtt{fun}\ fn\ x_1{:}\tau_1, \ldots, x_n{:}\tau_n : \tau = e \\
\mathit{cons} \in \mathit{Constructor} &::= \mathtt{con}\ cn\ x_1{:}\tau_1, \ldots, x_n{:}\tau_n
\end{aligned}
$$
Specifically, following the spirit of Abadi et al. [6] and Simonet [70], types are inductively defined as either labeled numbers, or functions and constructors composed of other types. Our proof of soundness for this type system follows the approach of Volpano et al. [7]. We show that if an expression e has some specific type τ and evaluates to some value v, then changing any value whose type is less trusted than e's type results in e still evaluating to the same value v; thus, arbitrarily changing untrusted data cannot affect trusted data. We prove soundness case-wise over the three kinds of expressions in our language, combining our evaluation semantics with our security typing rules.

Integrity type system
The integrity type system is found in Fig. 8, with its helper functions found in Fig. 9. Our integrity lattice is composed of two elements, T and U (trusted and untrusted, respectively), such that T < U (the opposite of a normal security lattice). We extended the Zarf ISA by requiring function and constructor type annotations. Constructors, which previously were untyped, are now singleton types: each constructor declaration defines a type, and that constructor is the sole inhabitant of the type. This restriction eliminates case expressions as sources of control flow when casing on a constructor type (since we know statically that the case expression's scrutinee will be a single unique value, and therefore also statically know which branch will be taken); note that this also eliminates the consideration of the else branch in a case expression on a constructor type. Instead, case expressions in this lightly-typed Zarf are solely for binding a constructor's internal values to variables (via deconstruction). Though this causes a loss in expressive power in the general case (constructors must be singleton types), our microkernel was designed without the need for this type of control flow. case expressions whose scrutinee is a number, however, still allow for control flow (since the value of a num is not known ahead of time); therefore, the type of this form of case expression is the join of all of its branch types. The type of the scrutinee, which is significant in a security analysis, is irrelevant here: there are no implicit flows for integrity. Because we do not use union types, another small restriction we enforce is that each branch in a case expression must result in the same base type (i.e., all must type-check to a num, a $(cn, \vec{\tau})$, or a $\vec{\tau} \to \tau$), such that we may join them together properly (see Fig. 10).
Applying a helper function that takes one argument to a list of arguments is shorthand for mapping that function over the list.
$$num^{\ell_1} \sqcup num^{\ell_2} = num^{\ell_1 \sqcup \ell_2}$$
Fig. 10. Joining Two Types. The • operator is used to join a type's label with another label; if the type that the label is being joined with is not a num, the label is joined with each of the type's inner types until a base num is reached. Joining two lists of types is equal to the pairwise join of their elements. Constructor join is trivial because constructors are singletons whose type never changes, and only equal constructors can be compared.
Fig. 11. Subtyping Rules. One type is a subtype of another if their base types are equal and, in the case of the base num type, the first's label is lower in the integrity lattice than the other's. A list is a subtype of another if pairwise each element of the first is a subtype of the corresponding element in the other list.
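A Python sketch of the join, label-join (•), and subtyping operations described in these captions. The tuple encoding of types is hypothetical; the join of two function types is deliberately left out because it is not described here; and for non-num types the subtype check simply requires equality. Treating equal num labels as subtypes (reflexivity) is also an assumption.
```python
T, U = 0, 1   # integrity labels with T ⊑ U

def lub(a, b):
    return max(a, b)

def join_types(t1, t2):
    """τ1 ⊔ τ2, e.g., for the branch types of a numeric case."""
    if t1[0] != t2[0]:
        raise TypeError("branches must share a base type")
    if t1[0] == "num":
        return ("num", lub(t1[1], t2[1]))
    if t1[0] == "con":
        if t1 != t2:
            raise TypeError("only equal (singleton) constructors join")
        return t1
    raise NotImplementedError("function-type join is not sketched here")

def join_lists(ts1, ts2):
    if len(ts1) != len(ts2):
        raise TypeError("length mismatch")
    return [join_types(a, b) for a, b in zip(ts1, ts2)]

def bullet(t, label):
    """τ • ℓ: push ℓ down to every base num inside τ."""
    if t[0] == "num":
        return ("num", lub(t[1], label))
    if t[0] == "con":                    # ("con", name, field types)
        return ("con", t[1], tuple(bullet(f, label) for f in t[2]))
    return ("fun", tuple(bullet(p, label) for p in t[1]), bullet(t[2], label))

def subtype(t1, t2):
    if t1[0] == "num" and t2[0] == "num":
        return t1[1] <= t2[1]            # first label no higher than second
    return t1 == t2

def subtypes(ts1, ts2):
    return len(ts1) == len(ts2) and all(subtype(a, b) for a, b in zip(ts1, ts2))
```
Raising on mismatched base types mirrors the restriction that every branch of a case must share a base type before the branch types can be joined.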
The integrity label associated with a num depends on the integrity level of the code that created it: untrusted code can only create numbers of type $num^U$, while trusted code can create trusted numbers (which can be treated as untrusted numbers via subtyping; see Fig. 11). Primitive operations (add, subtract, etc.) are treated as named functions contained within the set of declarations $\vec{decl}$. The type of primitive operators depends on the trust level of the caller: for example, the type of add is $num^{\ell_1} \to num^{\ell_2} \to num^{\ell_1 \sqcup \ell_2 \sqcup pc}$, where pc represents the trust level of the current program location (we assume its value can be tracked and changed outside of the type system proper). This implies that untrusted code cannot use the primitive operations to create any type of trusted value (regardless of the types of the numbers an untrusted caller uses), thus restricting untrusted code's ability to obtain trusted values to (1) the getint function (which in our application is data straight from the heart monitor) and (2) calling trusted functions that return trusted values.
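The effect of the pc-dependent primitive type can be illustrated in Python: the result label is the join of the argument labels and pc, so code running at U can never produce a num labeled T. The function names are hypothetical.
```python
T, U = 0, 1   # integrity labels with T ⊑ U

def literal_label(pc):
    """A number created by code running at level pc carries label pc."""
    return pc

def prim_result_label(arg_labels, pc):
    """Result label for num^l1 -> num^l2 -> num^(l1 ⊔ l2 ⊔ pc)."""
    label = pc
    for lab in arg_labels:
        label = max(label, lab)      # join on the two-point lattice
    return label
```
For example, `prim_result_label([T, T], U)` is `U`: even with two trusted operands, an untrusted caller only gets an untrusted result.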
This system will verify integrity for a value with singular endpoints, i.e., for the code being checked, it is received at one point and sent at one point. More complex annotations and treatment of values, like an arbitrary number of mutually untrusted but critical values passing through an arbitrary number of trusted and untrusted regions, can be guaranteed with this type system via piecewise checking. By guaranteeing each link in the chain one at a time, the integrity of the chain is verified.
The soundness proof of the integrity type system proceeds by cases on the three forms of expressions.
2. We show that regardless of $arg_2$'s level when it is of type num, it cannot be changed, and therefore $e_0$'s value does not change.
(a) If $\exists \ell_1 \in \tau_0$ s.t. $\ell_1 = T$, then by typing rule case-lit and the rule for join, $n$'s integrity label is T. Therefore, $arg_1$ cannot both equal $n$ and be arbitrarily changed to some expression $arg_2$, because it is not an expression whose type label is less trusted than the type of the entire expression (i.e., $num^T \geq \tau_0$). Thus we cannot replace $arg_1$ with $arg_2$, so in this case the value of $e_0$ remains the same, as desired. Since $e_1$ through $e_{m+1}$ are expressions whose soundness with respect to the type system can be considered separately through Lemmas 1, 2, and 3, we do not consider them here.
(b) If $\exists \ell_1 \in \tau_0$ s.t. $\ell_1 = U$, then by our definition of the T–U integrity lattice, there can be no values whose type is greater than $\tau_0$ ($arg_1$ included) that we can change. Therefore, $e_0$ remains unchanged, satisfying our conclusion.
• We know by the operational semantics (restricted to accommodate this type system, with singleton constructor types) that which branch we case on is determined entirely by the constructor that $arg_1$ evaluates to, and not the values contained within that constructor. Therefore, changing the expressions within any constructor will result in the same branch being taken, such that $e_0$ evaluates to the branch's right-hand-side expression. Therefore, we cannot choose to replace $arg_1$ with another arbitrary $arg_2$ when $arg_1 : (cn, \vec{\tau}_1)$.
• Let $(cn_3\ \vec{x}_3 \Rightarrow e_3)$ be the matching branch (where $cn = cn_3$). Based on the previous bullet point, we know that changing the expressions of any other branches will not change the value of the entire case expression, so we focus on this particular branch as an example. We must show that $\forall \tau_3, \exists x_3 \in \vec{x}_3$ s.t. $x_3 : \tau_3 > \tau_0$, changing the value that $x_3$ maps to in ρ does not change the value that $e_3$ evaluates to. Since $e_3$ is an expression, its soundness is covered either by Lemma 1 (by induction) or by Lemma 2 or 3.

Lemma 2 (Result expression soundness). If
Proof. The result expression is used for wrapping a value into a single expression containing that value. Therefore, changing the value of $arg_1$ to $arg_2$ would change the resultant value $v_1$ that $e_0$ is given, contradicting our result. As another point, by the typing rule result, result's type is precisely the type of $arg_1$, meaning there are no values within $e_0$ to change that would not cause us to violate (3) above. Therefore, the value of $arg_1$ must equal the value of $arg_2$, such that the value of $e_0$ cannot change.
Proof. By cases on id:
1. If id is a primitive function (add, multiply, etc.), then $v_2 = v_3 \iff \vec{arg}_3 = \vec{arg}_4$. By the typing rule of primitives, the type τ that the function returns is the least upper bound of all of its arguments, including $arg_1$, meaning by definition, both the value and type of the primitive operation are entirely dependent on all arguments. Therefore, there cannot exist an $arg_2$ that allows us to substitute it for $arg_1$ whose type is less trusted than τ without changing the entire value $v_1$.
2. If id is a constructor, then id has the type $\vec{\tau} \to (cn, \vec{\tau})$. id's return type is determined statically and does not change throughout program execution. Therefore, there does not exist a subexpression in $\vec{arg}_3$, or more generally, in $e_0$, that can be changed without changing the type of the constructor, which would contradict our having the same values after evaluation.
3. If id is a non-recursive function composed solely of case and result expressions and applications of primitive functions and constructors used in let expressions, then by (1), (2), Lemmas 1 and 2, and induction on Lemma 3, we know id must be sound. By extension, if id calls a function that fulfills these requirements, one can unfold the called function's contents in order to see that the resultant value $v_2$ satisfies this case.
4. If id is a recursive function or calls a function which calls id (i.e., mutual recursion), it is possible that the function call never terminates and therefore never results in a single value. The soundness of $e_0$ must then be guaranteed via induction on possible expressions, proven in the previous lemmas. We know statically that the type of id is of the form $\vec{\tau} \to \tau$, so we are guaranteed via simplification rules in the apply helper functions that the types of $\vec{arg}_3$ must be equal to or subtypes of $\vec{\tau}$, or otherwise our operational semantics would get stuck. By induction, any recursive calls made in $e_1$ must also satisfy this lemma, meaning that the actual arguments $\vec{arg}_3$ are used properly; otherwise $e_0$ would not type check to type $\tau_1$, by getting stuck.
By proving that the value $v_2$ does not change when less-trusted values change, we can safely continue with the evaluation of $e_1$, which will be a case, result, or let expression, handled by Lemma 1, Lemma 2, and Lemma 3, respectively.
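The typing facts the case analysis above relies on can be mirrored in a small executable sketch. The Python below assumes a hypothetical two-point integrity lattice (TRUSTED below UNTRUSTED) and hypothetical helpers `flows_to`, `lub`, `type_primitive`, and `type_call`; it is not Zarf's actual checker. It only illustrates that primitives return the least upper bound of all argument labels (case 1), and that constructors and functions carry static signatures to which every actual argument must flow (cases 2 to 4).

```python
from enum import IntEnum


class Label(IntEnum):
    """Hypothetical two-point integrity lattice (not Zarf's concrete syntax).
    TRUSTED sits below UNTRUSTED: trusted data may flow into untrusted
    positions, never the reverse."""
    TRUSTED = 0
    UNTRUSTED = 1


def flows_to(src: Label, dst: Label) -> bool:
    """src ⊑ dst: a src-labeled value may be used where dst is expected."""
    return src <= dst


def lub(*labels: Label) -> Label:
    """Least upper bound; with this encoding it is simply the maximum.
    A call with no arguments (a constant) is treated as TRUSTED."""
    return max(labels, default=Label.TRUSTED)


def type_primitive(arg_labels: list[Label]) -> Label:
    """Case 1: a primitive (add, multiply, ...) returns the lub of ALL its
    arguments, so a TRUSTED result implies every argument is TRUSTED."""
    return lub(*arg_labels)


def type_call(params: list[Label], ret: Label, args: list[Label]) -> Label:
    """Cases 2-4: constructors and (possibly recursive) functions carry a
    static signature. Each actual argument must flow to its declared
    parameter; otherwise the application is rejected, mirroring the
    semantics getting stuck. The result label is fixed by the signature."""
    if len(params) != len(args):
        raise TypeError("arity mismatch")
    for i, (p, a) in enumerate(zip(params, args)):
        if not flows_to(a, p):
            raise TypeError(f"argument {i}: {a.name} may not flow to {p.name}")
    return ret


T, U = Label.TRUSTED, Label.UNTRUSTED
print(type_primitive([T, T]).name)   # TRUSTED
print(type_primitive([T, U]).name)   # UNTRUSTED
try:
    type_call([T, T], T, [T, U])     # untrusted value into a trusted slot
except TypeError as err:
    print("rejected:", err)          # rejected: argument 1: UNTRUSTED may not flow to TRUSTED
```

Encoding the lattice as an `IntEnum` keeps `lub` and `flows_to` to one line each; a richer lattice would need an explicit ordering relation instead of integer comparison.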
Theorem 1 (Integrity type system soundness). Our integrity type system is sound if, for any expression e of type τ that evaluates to some value v, any (or all) subexpressions of e that are less trusted than τ can be changed arbitrarily and e still evaluates to v; i.e., untrusted data does not affect trusted data.
Proof. There are just three types of expressions: let, case, and result. By Lemma 1, we show that case expressions (the vehicle for control-flow) are sound. By Lemma 2, we show that result expressions are sound. Likewise, by Lemma 3, we show that let expressions (the vehicle for function application) are sound. Thus, we have exhaustively shown soundness of all expressions. Furthermore, we can see that when these expressions are composed according to the abstract syntax, with the additional typing annotations and a few restrictions, any well-typed Zarf program has the property of non-interference with respect to integrity, even while using a simplistic type system such as that explained here.
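As a rough illustration of how the three lemmas combine, the following sketch type-checks a toy language containing only result, let, and case expressions and then tests non-interference by brute force. Everything here is an assumption for illustration: the node types `Result`, `Let`, and `Case`, the helpers `label_of`, `evaluate`, and `noninterference_holds`, and the restriction of let to primitive operations are hypothetical simplifications of Zarf. Exhaustive testing over a small input domain is evidence for the property, not a substitute for the proof above.

```python
import operator
from dataclasses import dataclass
from enum import IntEnum
from itertools import product
from typing import Callable, Union


class Label(IntEnum):
    TRUSTED = 0     # hypothetical two-point lattice: TRUSTED ⊑ UNTRUSTED
    UNTRUSTED = 1


@dataclass
class Result:                      # result x
    var: str


@dataclass
class Let:                         # let x = op(args...) in body
    var: str
    op: Callable[..., int]
    args: list[str]
    body: "Expr"


@dataclass
class Case:                        # case x of { k -> e ... ; else -> e }
    scrut: str
    branches: dict[int, "Expr"]
    default: "Expr"


Expr = Union[Result, Let, Case]


def label_of(e: Expr, env: dict[str, Label]) -> Label:
    """Integrity label of e, given the labels of its free variables."""
    if isinstance(e, Result):                      # cf. Lemma 2
        return env[e.var]
    if isinstance(e, Let):                         # cf. Lemma 3 (primitives only)
        lbl = max((env[a] for a in e.args), default=Label.TRUSTED)
        return label_of(e.body, {**env, e.var: lbl})
    # cf. Lemma 1: which branch runs depends on the scrutinee, so the
    # result is at least as untrusted as the scrutinee (implicit flow).
    arms = [label_of(b, env) for b in (*e.branches.values(), e.default)]
    return max(env[e.scrut], *arms)


def evaluate(e: Expr, vals: dict[str, int]) -> int:
    if isinstance(e, Result):
        return vals[e.var]
    if isinstance(e, Let):
        v = e.op(*(vals[a] for a in e.args))
        return evaluate(e.body, {**vals, e.var: v})
    return evaluate(e.branches.get(vals[e.scrut], e.default), vals)


def noninterference_holds(e: Expr, labels: dict[str, Label],
                          trusted_vals: dict[str, int], domain) -> bool:
    """Testing analogue of Theorem 1 (evidence, not a proof): if e is
    TRUSTED, varying every UNTRUSTED input over domain never changes it."""
    if label_of(e, labels) is not Label.TRUSTED:
        return True  # the theorem constrains only trusted results
    untrusted = [x for x, lab in labels.items() if lab is Label.UNTRUSTED]
    outputs = {evaluate(e, {**trusted_vals, **dict(zip(untrusted, combo))})
               for combo in product(domain, repeat=len(untrusted))}
    return len(outputs) == 1


T, U = Label.TRUSTED, Label.UNTRUSTED
labels = {"rate": T, "noise": U}
# let y = add(rate, rate) in case rate of { 0 -> result rate; else -> result y }
good = Let("y", operator.add, ["rate", "rate"],
           Case("rate", {0: Result("rate")}, Result("y")))
# case noise of { 0 -> result rate; else -> result rate }
branch_on_untrusted = Case("noise", {0: Result("rate")}, Result("rate"))
```

Note that `branch_on_untrusted` is labeled UNTRUSTED even though both arms return `rate`: like most type systems, this case rule is conservative and taints the whole expression with the label of its scrutinee.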

Programmer responsibility
We have demonstrated that there are varying degrees of responsibility a Zarf programmer can take when writing an application, each successive degree involving greater effort. The first is the minimum: the programmer writes the program in Zarf assembly. A major advantage of Zarf is that the application automatically gains the memory and control-flow safety inherent in the ISA, properties that other ISAs do not easily offer. Any well-formed application that runs on Zarf gets these properties without any additional programmer involvement.
The second degree of responsibility is writing the application's specification in Coq and proving its correctness, with the verified code automatically lowered to Zarf. This approach involves a non-trivial amount of proof-writing, but since the ISA closely resembles the language of verification, we argue that the work involved is significantly less than for imperative ISAs. Since high-level specification and verification of critical applications is common practice, this level of programmer responsibility is not unusual. Fortunately, future proof efforts might not need to be entirely application-specific. Having proven the correctness of the Zarf implementation of the ICD algorithm, we now have a set of theorems and proofs showing the equivalence between common user-made Zarf functions and their Coq counterparts. It is conceivable that verification in Coq of future Zarf applications could reuse this underlying work.
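To make the idea of a reusable equivalence lemma concrete, here is a minimal sketch of one, written in Lean 4 purely as a stand-in for Coq. The functions `sumSpec` (a specification-style definition) and `sumAcc` (an accumulator-passing version, closer to how a tail-recursive low-level function is typically written) are hypothetical and are not taken from the ICD development.

```lean
-- Hypothetical specification-style definition.
def sumSpec : List Nat → Nat
  | [] => 0
  | x :: xs => x + sumSpec xs

-- Hypothetical accumulator-passing version, closer in shape to a
-- tail-recursive function at the assembly level.
def sumAcc : List Nat → Nat → Nat
  | [], acc => acc
  | x :: xs, acc => sumAcc xs (acc + x)

-- Equivalence lemma, generalized over the accumulator so that the
-- induction hypothesis is strong enough for the recursive case.
theorem sumAcc_eq (xs : List Nat) (acc : Nat) :
    sumAcc xs acc = acc + sumSpec xs := by
  induction xs generalizing acc with
  | nil => simp [sumAcc, sumSpec]
  | cons x xs ih => simp only [sumAcc, sumSpec, ih, Nat.add_assoc]

-- Corollary: starting from an empty accumulator matches the spec.
theorem sumAcc_zero (xs : List Nat) : sumAcc xs 0 = sumSpec xs := by
  simp [sumAcc_eq]
```

Once proven, a lemma such as `sumAcc_eq` can be reused wherever the same accumulator pattern appears, which is the kind of reuse the paragraph above anticipates.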
The third degree of responsibility involves proving additional properties of the system, beyond the aforementioned safety and correctness. We demonstrated this by laying a security type system over the ISA, somewhat restricting it (as all type systems are wont to do) in exchange for the added property of non-interference. Because the process of writing a type system and checker is sufficiently general, we expect that additional type systems or analyses could be built over the base Zarf ISA relatively easily.
Finally, there is the question of deciding which parts of an application should go into each hardware execution realm. Zarf has two execution realms in part because users might want to include legacy or high-performance, non-critical code; this code can run on the imperative ISA. However, any activity providing critical functionality for safe operation should happen on the functional processor. Veridrone [71] is another project beyond our ICD that might benefit from this approach; it uses both a lower-performance core safety control system and a higher-performance unverified version that is more energy-efficient and allows for smoother flying.

Evaluation
To validate our design, we synthesize the Zarf hardware description onto a Xilinx Artix-7 FPGA and run our sample application on it. For comparison, we also run a completely unverified C version of the application on a Xilinx MicroBlaze on the same FPGA. Hardware synthesis results are summarized in Table 1.
The hardware description of Zarf is more complex than that of a simple embedded CPU, with 66 total states of control logic (4 for program loading, 15 for function application, 18 for function evaluation, and 29 for garbage collection). In all, the combinational logic takes 29,980 primitive gates (roughly the size of a MIPS R3000), or 4,337 LUTs when synthesized for an Artix-7 FPGA (less than 7% of the available logic resources). Estimated in a 130 nm process, the combinational logic occupies 0.274 mm². Though larger than very simple embedded CPUs, Zarf is still considerably smaller than many common embedded microcontrollers.
From a dynamic trace of several million cycles, we measured the average CPI for each instruction type. Approximately one third of the dynamic instructions were branch heads. The C version of the ICD application on the MicroBlaze takes fewer than one thousand cycles per iteration. The analysis in Section 6.2 shows that the worst-case runtime of the Zarf application is around 9,000 cycles, or 180 μs (though it is much faster in the typical case), and Zarf additionally has a longer cycle time (see Table 1). Compared to the carefully optimized and tiny MicroBlaze, our experimental prototype uses approximately twice the hardware resources, and in the worst case the application is around 20x slower than the MicroBlaze version in its common case. It nonetheless runs over 25 times faster than needed to meet the critical real-time deadlines, while adding invaluable guarantees about the correctness of the most critical application components and assurance of non-interference between separate functions.
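The timing claims can be cross-checked with simple arithmetic. The text does not state the Zarf clock frequency or the ICD deadline directly; the sketch below back-calculates both from the reported figures (9,000 cycles, 180 μs, and a margin of over 25x), so `implied_clock_mhz` and `implied_min_deadline_us` are derived values, and the helper names `wcet_us` and `meets_deadline` are hypothetical.

```python
# Reported figures (from the evaluation text).
WCET_CYCLES = 9_000   # worst-case Zarf cycles per ICD iteration
WCET_US = 180         # the same bound expressed in microseconds
MARGIN = 25           # "over 25 times faster than it needs to be"

# Derived quantities (back-calculated here, not stated in the text).
implied_clock_mhz = WCET_CYCLES / WCET_US      # cycles per microsecond = MHz
implied_min_deadline_us = MARGIN * WCET_US     # the deadline exceeds this


def wcet_us(cycles: int, clock_mhz: float) -> float:
    """Worst-case execution time in microseconds at a given clock rate."""
    return cycles / clock_mhz


def meets_deadline(cycles: int, clock_mhz: float, deadline_us: float,
                   required_margin: float = 1.0) -> bool:
    """True if the WCET, inflated by required_margin, still fits the deadline."""
    return wcet_us(cycles, clock_mhz) * required_margin <= deadline_us


print(f"implied clock: {implied_clock_mhz:.0f} MHz")                    # implied clock: 50 MHz
print(f"implied deadline > {implied_min_deadline_us / 1000:.1f} ms")    # implied deadline > 4.5 ms
```

Because worst-case cycle counts are fixed by the static analysis, checking a deadline reduces to this kind of division once the clock rate is known.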

Conclusion
As computing continues to automate and improve the control of life-critical systems, new techniques which ease the development of formally trustworthy systems are sorely needed. The system approach demonstrated in this work shows that deep and composable reasoning directly on machine instructions is possible when the architecture is amenable to such reasoning. Our prototype implementation of this concept uses Zarf to control the operation of critical components in a way that allows assembly-level verified versions of critical code to operate safely in close partnership with more traditional and less-verified system components without the need to include run-times and compilers in the TCB. We take a holistic approach to the evaluation of this idea, not only demonstrating its practicality through an FPGA-implemented prototype, but furthermore showing the successful application of three different forms of static analysis at the assembly level of Zarf.
As we move to increasingly diverse systems on chip, heterogeneity in semantic complexity is an interesting new dimension to consider. A very small core supporting highly critical workloads might help ameliorate critical bugs, vulnerabilities, and/or excessive high-assurance costs. A core executing the Zarf ISA would take up roughly 0.002% of a modern SoC. Our hope is that this work will begin a broader discussion about the role of formal methods in computer architecture design and how it might be embraced as a part, rather than an afterthought, of the design process.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.