1 Introduction

Certifying algorithms [1] generate a certificate alongside the computed solution such that proof checkers can independently validate the solution to increase users’ trust and the explainability of the results. In the model-checking community, a certificate to explain a verdict for a verification task is called a witness [2], and verifiers able to generate witnesses are called certifying verifiers. Witnesses can be independently checked by witness validators to confirm the verification results. Figure 1a shows a generic workflow for certifying and validating model checking. After a certifying verifier produces a verdict \(v\) and a witness \(\omega \) on a task \(T\), a witness validator takes \(T\) and \(\omega \) as input and checks if the information in \(\omega \) is enough to reestablish the results of the verifier on \(T\). The outcome of the verifier is certified if its verdict \(v\) and the validator’s verdict \(v'\) are consistent. In the rest of the paper, we use certifying model checking interchangeably with certifying and validating model checking when it is clear from the context that a framework contains both a certifying verifier and a witness validator. For reachability properties, if a model violates a safety specification, a violation witness [3] may contain external inputs to the model to replay the erroneous execution trace. If the safety specification is satisfied, a correctness witness [4] could record invariants of the model to reconstruct a safety proof. Section 2 presents a brief survey on witness validation in the formal-methods community.

Fig. 1. A certifying hardware-verification framework using software analyzers

Recently, hardware-to-software translators [5, 6] from the hardware-modeling language Btor2  [7], a prevailing format for word-level hardware model checking used in the Hardware Model Checking Competitions (HWMCC) [8, 9], have been proposed to facilitate the application of software analyzers to hardware circuits. Tools Btor2C  [5] and Btor2MLIR  [6] translate Btor2 circuits to behaviorally equivalent imperative software in the programming language C [10] and the intermediate representation used by the compilation toolchain LLVM [11], respectively, and enable any software analyzer for C or LLVM-bytecode programs to inspect Btor2 circuits. In an experiment on more than 1 000 Btor2 circuits [5], software verifiers for C programs are shown to detect more bugs than the best hardware model checkers by preprocessing the original circuit with Btor2C and analyzing the translated C program. However, in this previous work [5], only the verdicts of software verifiers but not the witnesses, which contain the information and reasoning behind a verdict, are transferred back to the hardware domain. In other words, the results of software verifiers on Btor2 circuits are not certified, and hence hardware designers may not trust software verifiers for analyzing their circuits.

1.1 Our Motivation and Contributions

Motivated to mitigate the aforementioned threat to reliability and leverage the capability of software verifiers to generate witnesses, we investigate the following open questions in this work: (1) whether the software witnesses for translated programs contain useful information about original circuits and (2) how to employ the information to aid hardware quality assurance. Our contributions are summarized below.

A Certifying Framework for HW Verification with SW Analyzers. Figure 1b shows the proposed certifying and validating hardware-verification framework based on software analyzers to approach the open questions. The framework translates a hardware-verification task \(T_H\) to a software task \(T_S\) and applies software verifiers to \(T_S\). After obtaining a software witness \(\omega _S\), it encodes relevant information from \(\omega _S\) in the form of a hardware witness \(\omega _H\) and validates the verdict returned by software verifiers with \(\omega _H\). We instantiate the framework in a tool Btor2-Cert for verifying Btor2 circuits with certified verdicts. In addition to preprocessing Btor2 circuits with Btor2C  [5] and invoking model checkers for the translated C programs, such as CPAchecker  [12], Cbmc  [13], Esbmc  [14], and UAutomizer  [15], Btor2-Cert features a translator from software witnesses to Btor2 witnesses and a witness validator Btor2-Val to check Btor2 witnesses. Section 4 shows our tool architecture.

Note that the framework in Fig. 1b is not limited to Btor2C and verifiers for C programs. For example, one could also materialize the concept with the translator Btor2MLIR  [6], analyzers for LLVM-bytecode programs [11], such as Klee  [16], Smack  [17], and SeaHorn  [18], and a corresponding LLVM-to-Btor2 witness translator. There also exist translators [19,20,21] from Verilog [22] circuits to C programs or SMV [23] models. We choose Btor2C for task translation because many verifiers for C programs participating in the International Competitions on Software Verification (SV-COMP) [24] can generate witnesses in a standardized and exchangeable format [2].

A Translator from Software Witnesses to Btor2 Witnesses. Btor2-Cert translates software violation witnesses in the format used in SV-COMP [24] to the format defined by the Btor2 language [7]. For tasks satisfying their specifications, as there is no native format for correctness witnesses in Btor2, Btor2-Cert extracts the invariants in software witnesses and represents them as Btor2 circuits, whose inputs refer to the state variables of the original circuit. The advantages of not inventing a new format but reusing the existing Btor2 language are twofold: First, Btor2 extends SMT-LIB 2 [25] and provides the required operations on the word level to accommodate most invariants derived by software verifiers. Second, Btor2 is supported by many hardware model checkers participating in HWMCC [8, 9] and offers a suite Btor2Tools [26] of utility tools for parsing and simulation, which simplifies further development around the Btor2 format.

A Validator for Btor2 Witnesses. To validate the witnesses for Btor2 circuits, we develop Btor2-Val, a portfolio-based witness validator involving hardware simulators and verifiers. Btor2-Val validates violation witnesses by invoking the simulator BtorSim from Btor2Tools [26]. For correctness witnesses, Btor2-Val follows the validation-via-verification approach [27] by instrumenting the original Btor2 circuit with the circuit representing the invariant and verifying the instrumented circuit. The instrumented circuit satisfies the modified safety property if the invariant can be used to reconstruct the proof of correctness. Hardware verifiers are employed to check the instrumented circuits. Btor2-Val leverages CoVeriTeam [28], a framework for cooperative verification, to coordinate the underlying hardware simulators and verifiers.

Enhancing Confidence in SW Verifiers on HW Designs. We evaluate Btor2-Cert on more than 1 000 Btor2 circuits to study its capability of providing certified verification results using software analyzers. In the experiment,

  • the witness translator was able to translate every violation witness and 97 % of the correctness witnesses produced by software verifiers,

  • the combination of witness translation and Btor2-Val outperformed mature software witness validators in both effectiveness and efficiency, and

  • Btor2-Cert provided certified results computed by software verifiers on some Btor2 circuits that the best hardware model checkers failed to verify.

The conceptual message conveyed by Btor2-Cert is that software analyzers can derive useful information about circuits and complement conventional hardware model checkers with trustworthy results. Our contributions have a positive impact on analyzing hardware designs with software verifiers. The proposed framework Btor2-Cert is open-source and available online (more information in Sect. 4).

2 Related Work

Generating and validating witnesses for analysis results have been studied throughout the entire verification toolchain from satisfiability solvers to model checkers. In the following, we briefly review witness validation and compare our work to a recent certifying verification framework [29,30,31] targeting k-induction  [32].

2.1 Witness Validation

For satisfiability solving, the competitions on propositional SAT solvers [33, 34] use the DRAT format [35] to encode the certificates of unsatisfiability and independent validators [36, 37] to check the proofs. The competitions on SMT solving verify models to satisfiable formulas with the tool Dolmen [38]. Certifications for quantified Boolean formulas have also been investigated [39, 40].

For model checking, an early work [41] suggests generating a deductive proof from the run of model checkers with extra bookkeeping steps. In HWMCC [8, 9], the Btor2  [7] language defines a format for violation witnesses as a sequence of input values and initial values for registers that lead to an erroneous execution. However, Btor2 has no format for correctness witnesses. The competitions on automated termination analysis [42] use the format CPF [43], and in SV-COMP  [24], a GraphML-based format [2] is used to describe software witnesses as automata. In addition to the properties commonly used in tool competitions, a recent work extends proof generation of model checking to full LTL properties [44].

Numerous approaches have been invented for validating software witnesses. Methods to validate correctness witnesses include a parallel extension [45] of k-induction, program instrumentation with invariants and re-verification [27] (referred to as validation via verification in the publication), and program decomposition into several straight-line sub-programs [46]. Execution-based validation [47] is an elegant approach to validate violation witnesses. It extracts a sequence of external input values from a violation witness and employs debuggers or simulators to confirm the reachability of an error location. Our witness validator Btor2-Val leverages validation via verification and execution-based validation. More details are given in Sect. 5 and Sect. 6, respectively. In our evaluation, the proposed validator Btor2-Val (together with the witness translator) competed well against the winners in the witness-validation track of SV-COMP 2023 [24].

2.2 Validating k-Inductiveness of Properties in Hardware Models

Given a sequential circuit and a number k as input, the tool Certifaiger  [29, 30] aims to validate that the safety property of the input circuit is k-inductive. Composing a k-induction-based hardware model checker and Certifaiger yields a certifying and validating model checker (as depicted in Fig. 1a), whose witnesses are the inductive length k. The key differences between the proposed framework in Fig. 1b and this framework [29, 30] for k-inductiveness are as follows.

First, our validator Btor2-Val expects a candidate invariant in the correctness witness but does not restrict the algorithms used by software verifiers. In contrast, Certifaiger expects a candidate inductive length k and thus can only validate results of k-induction-based model checkers. Second, to validate witnesses, Btor2-Val relies on validation via verification [27] and invokes model checkers because the candidate invariant may not be inductive. In comparison, Certifaiger avoids model checking and reduces the validation problem to several SAT checks since it assumes the safety property to be k-inductive. To sum up, our framework complements the existing work [29, 30] by considering candidate invariants as witnesses. Its applicability to algorithms other than k-induction comes at the expense of a potentially more complex validation procedure. Certifaiger has been further extended to accommodate temporal decomposition [48] as preprocessing to simplify the verification tasks [31], which has not yet been considered in our framework and is an important direction of future work.

3 Background

To facilitate the discussion in the rest of this manuscript, we provide prerequisite knowledge on model checking and witness validation from the literature.

A state-transition system \(\mathcal {M}\) is described by two predicates \(I(s)\) and \(TR(s,s')\) over states s and \(s'\) of \(\mathcal {M}\), which encode the initial states and transition relation (\(TR(s,s')\) is true if s can transit to \(s'\) via one step) of \(\mathcal {M}\), respectively. An invariant \(Inv(s)\) of a system \(\mathcal {M}\) is a predicate over states of \(\mathcal {M}\) such that \(Inv(s)\) is true for every reachable state s of \(\mathcal {M}\). We denote “\(Inv\) is an invariant of \(\mathcal {M}\)” by \(\mathcal {M}\models Inv\). A safety-verification task consists of a state-transition system \(\mathcal {M}\) and a safety property \(P(s)\). We say a safety-verification task (or a verification task for short) is safe if \(\mathcal {M}\models P\) and unsafe otherwise. Given a verification task of \(\mathcal {M}\) and \(P\), the problem of model checking asks whether \(\mathcal {M}\models P\) or not. In practice, state-transition systems manifest themselves as sequential digital circuits or programs. In the following, we briefly introduce the modeling languages used in HWMCC [8, 9] and SV-COMP  [24] with a running example.

3.1 The Btor2 Language for Word-Level Circuits

The Btor2 hardware-modeling language [7] was invented to describe model-checking problems of word-level sequential circuits. It extends the bit-level AIGER format [49] with data sorts of bit-vectors and arrays and inherits word-level operations from SMT-LIB 2 [25]. Figure 2a shows an example Btor2 circuit. The circuit has two state variables a and b and an external input in, defined in lines 6-8, respectively. The states and input are bit-vectors of width 8 (the sort bitvec 8 defined in line 1). Variables a and b are initialized to 2 and 0, respectively. In each iteration, variable a is right-shifted by 1 bit (line 18), and variable b is bitwise XOR-ed with 1 (line 19). Indicated by the keyword bad in line 16, a property violation happens if variable a equals 0, variable b equals 2, and input in equals 42. The example Btor2 circuit satisfies its safety property because variable b never equals 2. However, if variable b is initialized to a different value at line 10 (marked in red), say 2, a property violation will be triggered after two steps of state transition if 42 is given as the external input in the last iteration.
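To make the behavior of the running example concrete, the following Python sketch mimics the circuit as described in the text (it is not the Btor2 source of Fig. 2a, and the function names are our own). It assumes that the external input influences only the bad condition and not the next state, and it enumerates the reachable states explicitly.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def step(a, b):
    """One transition: a is right-shifted by 1 bit, b is XOR-ed with 1."""
    return (a >> 1) & MASK, (b ^ 1) & MASK


def is_bad(a, b, inp):
    """Bad condition of the example: a == 0, b == 2, and input == 42."""
    return a == 0 and b == 2 and inp == 42


def reachable_states(a_init=2, b_init=0):
    """Explicit-state fixpoint over (a, b); the input does not affect the state."""
    seen = set()
    state = (a_init, b_init)
    while state not in seen:
        seen.add(state)
        state = step(*state)
    return seen


def is_safe(a_init=2, b_init=0):
    """The task is safe if no reachable state violates the property for any input."""
    return not any(
        is_bad(a, b, inp)
        for (a, b) in reachable_states(a_init, b_init)
        for inp in range(1 << WIDTH)
    )
```

With the default initialization, `is_safe()` holds because b only alternates between 0 and 1. Calling `is_safe(b_init=2)` reports a violation, since the state with a = 0 and b = 2 becomes reachable after two transitions.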

Fig. 2. An example Btor2 circuit and its translated C program

Translating Btor2 Circuits to C Programs. Btor2C [5] is a lightweight translator from the Btor2 language to the programming language C [10]. It encodes Btor2 data sorts with unsigned integers and static arrays, expresses Btor2 operations with corresponding operators of C, and uses an infinite loop to model the execution of a sequential circuit. Given the example Btor2 circuit in Fig. 2a as input, Btor2C generates a translated C program shown in Fig. 2b. Btor2C follows the rules of SV-COMP [24] to encode safety-verification tasks for C programs, so compositional hardware model checkers for Btor2 circuits can be readily formed by combining software verifiers participating in SV-COMP as verification engines and Btor2C as preprocessing. In an extensive experiment [5], software verifiers are shown to detect more bugs in Btor2 circuits than the best conventional hardware model checkers, such as ABC [50] and AVR [51].

3.2 Representing Software Witnesses as Automata

Software witnesses can be represented as protocol automata [2], describing program invariants needed to construct a safety proof or program paths leading to a property violation. A letter in the alphabet of such a protocol automaton is a pair of a set of program edges and a condition over program variables. The set of program edges indicates the control flow, and the condition can be used to restrict the state space of the program. Program invariants that should hold at a certain program location can be annotated to a protocol automaton. In the following, we give an example correctness witness for the C program in Fig. 2b and an example violation witness for the same C program but with line 8 commented out.

Fig. 3. A correctness witness

Correctness Witnesses. Figure 3 shows an example correctness witness for the C program in Fig. 2b. The correctness witness shows that a program invariant \(\texttt{b >= 0 \&\& b <= 1}\) is established once line 8 is executed. Indeed, variable b switches between 0 and 1 after being initialized, and \(\texttt{b >= 0 \&\& b <= 1}\) is an invariant at the loop head of the program. A program invariant is stored as a C expression in a software witness and hence potentially more compact than invariants represented in other formalisms, e.g., a bit-level AIGER [49] circuit.

Fig. 4. A violation witness

Violation Witnesses. Figure 4 shows an example violation witness for the modified C program with variable b uninitialized (by commenting out line 8 in Fig. 2b). The violation witness shows how to reach the error in line 12 of the C program. First, it assumes the value of variable b to be 2 via the condition when line 6 is executed. Second, it goes to the next state when line 10 is executed for the first two times. Third, it assumes the external input to be 42 when line 10 is executed for the third time. Indeed, the error in line 12 can be reached if variable b gets an initial value of 2 and the external input equals 42 in the third loop iteration.

4 Architecture of Btor2-Cert and Btor2-Val

We instantiate the proposed certifying and validating hardware-verification framework in Fig. 1b as Btor2-Cert with the Btor2-to-C translator Btor2C [5], model checkers for C programs [52] that can produce verification witnesses in the format discussed in Sect. 3, a C-to-Btor2 witness translator, and the witness validator Btor2-Val. Figure 5 shows the translation and validation flows for correctness (in Fig. 5a) and violation witnesses (in Fig. 5b). Both the translator and the validator Btor2-Val for Btor2 witnesses are implemented in Python 3. Btor2-Val is based on a portfolio of hardware verifiers and simulators, with different tools coordinated by the cooperative-verification framework CoVeriTeam [28].

Fig. 5. Witness translation and validation in Btor2-Cert and Btor2-Val

4.1 Validating Correctness Witnesses

Given a safe Btor2 circuit, its translated C program, and a correctness witness produced by some software verifier, Btor2-Cert certifies the results of the software verifier in two steps, as depicted in Fig. 5a. In the first step of witness translation, Btor2-Cert extracts the invariant at the loop head of the C program and represents it as a Btor2 circuit. The Btor2 circuit is named a witness circuit and refers to the state variables of the original circuit from its primary inputs. Second, in the validation step, Btor2-Val takes as input the original circuit, the witness circuit, and a user-defined parameter called invariant quality that specifies the level of strictness imposed on the invariant. Btor2-Val offers three levels of invariant quality to users, based on which it instruments the original circuit. Hardware verifiers are invoked on the instrumented circuit and will deem it safe if the invariant meets the specified invariant quality for reconstructing a safety proof. The details of validating correctness witnesses are presented in Sect. 5.

4.2 Validating Violation Witnesses

Given an unsafe Btor2 circuit, its translated C program, and a violation witness produced by some software verifier, Btor2-Cert certifies the results of the software verifier in two steps, as depicted in Fig. 5b. In the first step of witness translation, Btor2-Cert extracts the values for external inputs and uninitialized states from the software violation witness and encodes the information as a Btor2 violation witness [7]. Second, in the validation step, Btor2-Val invokes BtorSim  [26], a simulator for Btor2 circuits, to decide whether the Btor2 violation witness can trigger a bug in the original circuit. The details of validating violation witnesses are presented in Sect. 6.
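The two flows share the same overall shape, which the following Python sketch summarizes. It is only an illustration of the data flow in Fig. 1b under our own assumptions: the `Witness` class, the `certify` function, and all parameter names are hypothetical and do not correspond to the actual interface of Btor2-Cert or Btor2-Val.

```python
from dataclasses import dataclass


@dataclass
class Witness:
    kind: str        # "correctness" or "violation"
    payload: object  # e.g., a candidate invariant or input values


def certify(task_hw, translate_task, verify_sw, translate_witness, validators):
    """Run the certifying pipeline and report whether the verdict is certified.

    translate_task:    hardware task -> software task (e.g., Btor2 to C)
    verify_sw:         software task -> (verdict, software witness)
    translate_witness: software witness -> hardware witness
    validators:        maps a witness kind to a validator that takes the
                       hardware task and the hardware witness and returns a verdict
    """
    task_sw = translate_task(task_hw)
    verdict, witness_sw = verify_sw(task_sw)
    witness_hw = translate_witness(witness_sw)
    validator_verdict = validators[witness_hw.kind](task_hw, witness_hw)
    # The verifier's result is certified only if both verdicts agree.
    return verdict == validator_verdict, verdict
```

In this reading, a verification-based validator would be registered under `"correctness"` and a simulation-based one under `"violation"`.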

5 Certifying Results of Software Verifiers: Correctness

In this section, we describe how Btor2-Cert certifies verification results for safe verification tasks. The Btor2 circuit and its translated C program in Fig. 2 as well as the software correctness witness in Fig. 3 will be used to explain the translation and validation of correctness witnesses, as outlined in Fig. 5a.

5.1 Witness Translation

Given a software correctness witness with a predicate annotated at the loop head of the translated C program, which some software verifier claims to be an invariant, Btor2-Cert considers the predicate as a candidate invariant for the original Btor2 circuit and extracts it to reconstruct a safety proof. We encode the candidate invariant, written as an expression in the programming language C, into a combinational Btor2 circuit whose inputs refer to the state variables of the original Btor2 circuit and whose unique output asserts the predicate. Translating C expressions into Btor2 circuits is feasible thanks to the word-level data sorts and operations in the Btor2 language [7]. We name the combinational Btor2 circuit a witness circuit and refer to it as a Btor2 correctness witness. Note that our notion of a witness circuit is different from Certifaiger’s definition of a k-witness circuit [29], which is a sequential circuit simulating k-step execution of the original circuit in one step. Figure 6 shows the witness circuit generated from the software correctness witness in Fig. 3. The input defined in line 5 refers to state variable b of the Btor2 circuit in Fig. 2a. The output defined in line 9 asserts the candidate invariant \(\texttt{b >= 0 \&\& b <= 1}\).

Fig. 6. A witness circuit

5.2 Witness Validation via Verification

Following the idea of validation via verification [27], the validator Btor2-Val in Btor2-Cert checks Btor2 correctness witnesses by instrumenting the original circuit with the witness circuit and invoking hardware model checkers. It distinguishes three levels of quality for a candidate invariant computed by software verifiers. According to the notation introduced in Sect. 3, we denote the state-transition system of the original Btor2 circuit by \(\mathcal {M}\), with initial states \(I(s)\), a transition relation \(TR(s,s')\), and a safety property \(P(s)\). A predicate \(Inv(s)\) is

  • an invariant if \(\mathcal {M}\models Inv\),

  • a safe invariant if \(\mathcal {M}\models Inv\) and \(Inv(s)\Rightarrow P(s)\), and

  • a safe and inductive invariant if (1) \(Inv(s)\Rightarrow P(s)\), (2) \(I(s)\Rightarrow Inv(s)\), and (3) \(Inv(s)\wedge TR(s,s')\Rightarrow Inv(s')\).

In the literature [29], the three conditions for safe and inductive invariants are also named consistency, initiation, and consecution, respectively. Table 1 shows four predicates and highlights their respective quality as an invariant at the loop head of the program in Fig. 2b (\(P\) is the negated error condition).
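The three quality levels can be illustrated on the running example with a brute-force check over all 8-bit states. The Python sketch below is only meant to make the definitions tangible; it is not how Btor2-Val performs the checks. It assumes that the property over states is \(P(s) = \lnot (a = 0 \wedge b = 2)\), i.e., the bad condition with the input quantified away, and the candidate predicates in the usage note are our own examples, not necessarily those of Table 1.

```python
WIDTH = 8
STATES = [(a, b) for a in range(1 << WIDTH) for b in range(1 << WIDTH)]


def init(s):
    """I(s): a is initialized to 2 and b to 0."""
    return s == (2, 0)


def step(s):
    """TR is deterministic here: a is shifted right, b is XOR-ed with 1."""
    a, b = s
    return (a >> 1, b ^ 1)


def prop(s):
    """P(s): assumed to be the negated bad condition over states."""
    a, b = s
    return not (a == 0 and b == 2)


def reachable():
    """Collect all states reachable from the single initial state."""
    seen, s = set(), (2, 0)
    while s not in seen:
        seen.add(s)
        s = step(s)
    return seen


def quality(inv):
    """Classify a candidate predicate by the strongest quality level it meets."""
    if not all(inv(s) for s in reachable()):
        return "not an invariant"
    consistency = all(prop(s) for s in STATES if inv(s))
    if not consistency:
        return "invariant"
    initiation = all(inv(s) for s in STATES if init(s))
    consecution = all(inv(step(s)) for s in STATES if inv(s))
    if initiation and consecution:
        return "safe and inductive invariant"
    return "safe invariant"
```

For instance, `quality(lambda s: s[1] <= 1)` yields a safe and inductive invariant, whereas `quality(lambda s: s[1] != 2)` yields only a safe invariant: the state with b = 3 satisfies the predicate, but its successor has b = 2, so consecution fails.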

Table 1. Candidate invariants at the loop head of the program in Fig. 2b

Btor2-Val takes the original Btor2 circuit, the witness circuit, and a user-specified invariant quality for the correctness witness as input and instruments the original circuit accordingly. To check if \(Inv(s)\) is an invariant helpful to reestablish a proof of \(P\), Btor2-Val combines the witness circuit and the original circuit by connecting the state variables of the original circuit to the corresponding inputs of the witness circuit. That is, Btor2-Val builds a circuit that encodes \(\mathcal {M}\models Inv\wedge P\). The instrumented circuit is given to hardware model checkers, which will utilize the information provided by the witness circuit to find a proof of correctness or refute the predicate if it is not an invariant. Note that the verification time of the instrumented circuit is expected to be shorter than that of the original circuit because the predicate can guide the search of hardware model checkers.

To implement the consistency, initiation, and consecution checks for safe or inductive invariants, Btor2-Val also relies on circuit instrumentation and hardware model checkers. While the three checks are not model checking but satisfiability in essence, it is convenient to encode them as combinational Btor2 circuits. Moreover, some hardware model checkers, such as ABC  [50], can simplify the circuits before performing satisfiability solving, which is usually faster than solving the queries directly with satisfiability solvers.

6 Certifying Results of Software Verifiers: Violation

In this section, we describe how Btor2-Cert certifies verification results for unsafe verification tasks. The unsafe versions of the Btor2 circuit and its translated C program in Fig. 2 with the state variable b being uninitialized (namely, with line 10 in Fig. 2a and line 8 in Fig. 2b commented out) as well as the software violation witness in Fig. 4 will be used to explain the translation and validation of violation witnesses, as outlined in Fig. 5b.

The Btor2 language defines a format for violation witnesses [7]. A Btor2 violation witness contains a sequence of input values fed to the Btor2 circuit in each cycle and the initial values for uninitialized state variables. Figure 7 shows an example violation witness for the unsafe version of the Btor2 circuit in Fig. 2a. It demonstrates how to trigger the error specified by the 0th bad statement (indicated by b0) by giving the initial value 2 to the 1st state variable b (under #0; a is the 0th state variable) and the value 42 to the 0th input in in the 2nd cycle (indicated by @2). The simulator BtorSim [26] takes a Btor2 circuit and a Btor2 violation witness and executes the circuit with the values for inputs and states in the witness. It confirms the violation witness if an error is triggered. The violation witness in Fig. 7 does not specify input values in the first two cycles because they are irrelevant to the error. In this case, BtorSim will assume the unspecified values to be zero.
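The replay described above can be sketched in a few lines of Python. This is a stand-in for the idea of execution-based validation on the running example, not a model of BtorSim itself: the function `replay` and its dictionary-based witness representation are our own assumptions, and the circuit semantics are hard-coded from the description of Fig. 2a.

```python
def replay(init_values, inputs, cycles, a_init=2):
    """Replay a witness on the example circuit.

    init_values: initial values for uninitialized state variables, e.g. {"b": 2}
    inputs:      input value per cycle index, e.g. {2: 42}
    cycles:      number of cycles to simulate
    Returns the first cycle in which the bad condition holds, or None.
    """
    a = a_init
    b = init_values.get("b", 0)   # unspecified values are assumed to be zero
    for k in range(cycles):
        inp = inputs.get(k, 0)    # unspecified inputs are assumed to be zero
        if a == 0 and b == 2 and inp == 42:
            return k
        a, b = a >> 1, b ^ 1      # advance the circuit by one cycle
    return None
```

Under these assumptions, `replay({"b": 2}, {2: 42}, 3)` reaches the bad condition in cycle 2, which matches the witness described for Fig. 7, while omitting either the initial value of b or the input 42 leaves the error untriggered.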

Fig. 7. A Btor2 violation witness

6.1 Witness Translation

Given a software violation witness of the translated C program, Btor2-Cert extracts the conditions over program variables from the protocol automaton. These conditions are used by the software violation witness to prune out irrelevant program paths and highlight an error path. Btor2-Cert uses such information to give values to the corresponding Btor2 inputs and state variables in the form of a Btor2 violation witness. For example, the software violation witness in Fig. 4 will be translated to the Btor2 violation witness in Fig. 7.

6.2 Witness Validation via Execution

Following the idea of execution-based witness validation [47], Btor2-Val checks Btor2 violation witnesses by invoking the simulator BtorSim on the original Btor2 circuit and the translated Btor2 violation witness. An advantage of execution-based witness validation is its speed: In our evaluation, Btor2-Val validated Btor2 violation witnesses translated from software violation witnesses much faster than the software verifiers needed to find the bugs. The speed of Btor2-Val minimizes the overhead of validating the alarms reported by software verifiers and makes the results of software verifiers more trustworthy and transparent for hardware designers.

7 Evaluation

To address the open questions highlighted in Sect. 1.1, we evaluated the proposed certifying hardware-verification framework Btor2-Cert on more than 1 000 Btor2 circuits and compared the witness validator Btor2-Val, prepended with witness translation, against the top contenders in the witness-validation track of SV-COMP 2023 [24]. Our experiment is designed to answer the following research questions:

  • RQ1: Can Btor2-Cert translate software witnesses to Btor2 witnesses?

  • RQ2: Is Btor2-Val prepended with witness translation effective compared to state-of-the-art software witness validators?

  • RQ3: Is Btor2-Val prepended with witness translation efficient compared to state-of-the-art software witness validators?

  • RQ4: Is the run-time consumed by witness validators shorter than the run-time consumed by software verifiers?

  • RQ5: Can Btor2-Cert complement conventional hardware model checking by providing additional certified verification results?

7.1 Benchmark Set

We executed our experiments on a benchmark set consisting of 1214 safety-verification tasks of Btor2 circuits, among which 758 are safe and 456 are unsafe. The verification tasks are collected from HWMCC as well as other sources and were used to compare the performance of hardware and software model checkers [5].

7.2 Experimental Settings

All experiments were conducted on machines running Ubuntu 22.04 (64 bit), each with a 3.4 GHz CPU (Intel Xeon E3-1230 v5) with 8 processing units and 33 GB of RAM. The resource limits imposed on verifying translated C programs and validating generated witnesses are both set to 2 CPU cores, 15 min of CPU time, and 15 GB of RAM. We used BenchExec [53] to ensure reliable resource measurement and reproducible results. Btor2-Cert uses Btor2C at commit 36c1ad52 for translating a Btor2 circuit to a C program. In our experiment, we configure the witness validator Btor2-Val to use the PDR [54] implementation in ABC [50] at commit 65ccd3cc and BtorSim [26] as the underlying hardware model checker and simulator, respectively. We also tried AVR [51] for validating correctness witnesses, but it encountered errors on many instrumented circuits even though the circuits are syntactically valid according to Btor2Tools [26].

7.3 Evaluated Verifiers and Validators

To verify the translated C programs, we used CPAchecker  [12] at revision 44619 and UAutomizer  [15] at commit 6fd36663 on safe tasks because they are good at constructing invariants in the competitions. We configured CPAchecker to run four algorithms based on Craig interpolation [55], including IMC [56, 57], ISMC [58], Impact  [59], and predicate abstraction [60]. On unsafe tasks, we evaluated the BMC [61] implementations in CPAchecker, Cbmc  [13], and Esbmc  [14] because BMC is the prevailing technique for bug hunting. Both Cbmc and Esbmc were downloaded from the archiving repository of SV-COMP 2023 [52]. For UAutomizer, we used its default settings in SV-COMP for both safe and unsafe tasks.

To evaluate Btor2-Val, we prepended it with the witness-translation step and compared the combination, which takes software witnesses as input, to validators for software witnesses. For correctness witnesses, we evaluated the first place winner UAutomizer of the witness-validation track in SV-COMP 2023 [24]. We also used an emerging validator LIV  [46] at commit cf736e45, which decomposes a program into straight-line sub-programs to check inductive invariants. We cannot compare Btor2-Val to Certifaiger  [29, 30] because Certifaiger consumes a candidate inductive length as input, while Btor2-Val expects an invariant from the witnesses. For violation witnesses, we compared Btor2-Val to execution-based validators [47] CPA-w2t and FShell-w2t. The former is of the same version as CPAchecker (i.e., at revision 44619) and the latter was downloaded from the tool archive of SV-COMP 2023 [52]. We also evaluated MetaVal  [27], a tool using validation via verification, but it did not terminate when instrumenting the translated C programs and failed to validate any witness in our experiment.

7.4 Results

RQ1: SW-to-HW Witness Translation. The upper part of Table 2 (resp. Table 3) shows the numbers of correctness (resp. violation) witnesses produced by the software verifiers and those successfully translated by the witness translator in Btor2-Cert. Table 2 additionally shows in its 2nd row the numbers of software witnesses with candidate invariants annotated to the loop head of a translated C program. About 97 % of the candidate invariants in software correctness witnesses can be translated to Btor2 witness circuits. The 14 candidate invariants of CPAchecker that could not be translated failed because the C-expression parser exceeded the time limit when constructing abstract syntax trees. This is a technical limitation orthogonal to the proposed approach. Furthermore, all 4 candidate invariants of UAutomizer that could not be translated refer to undeclared program variables, rendering the witnesses syntactically incorrect.

All software violation witnesses were successfully translated by Btor2-Cert. The median translation time was below 2 s for both correctness and violation witnesses. Moreover, measured by the number of lines of a Btor2 witness, the translated correctness witnesses have a median size of 321, and the violation witnesses have a median size of 308. The results show the feasibility of translating and representing the information found by software verifiers in a native hardware-modeling format.

Table 2. Summary of results on validating correctness witnesses
Table 3. Summary of results on validating violation witnesses

RQ2: Effectiveness of Btor2-Val. The lower part of Table 2 (resp. Table 3) summarizes the numbers of correctness (resp. violation) witnesses that were validated by Btor2-Val and the compared validators.

Btor2-Val was able to validate the correctness witnesses produced by both CPAchecker and UAutomizer. When configured to accept safe and inductive invariants (recall the three levels of invariant quality in Sect. 5), it validated 329 out of 576 correctness witnesses translated to Btor2 witness circuits. In contrast, UAutomizer, the winner of the witness-validation track in SV-COMP 2023 [24], was not able to validate any correctness witness produced by CPAchecker (the corresponding cells are marked as “-”). LIV is designed to confirm safe and inductive invariants [46] and accepted 305 correctness witnesses in total, similar to Btor2-Val. Btor2-Val and LIV agreed on the majority of the correctness witnesses, and the cases where they computed different verdicts were caused by a bug (Footnote 7) in LIV, which has been fixed by its developers. The results show that Btor2-Val is more robust than UAutomizer and achieves effectiveness similar to that of LIV. We manually inspected several witnesses rejected by both Btor2-Val and LIV and found that they indeed contain incorrect candidate invariants that do not overapproximate the reachable state spaces. Such invalid invariants might be caused by bugs in the step in which software verifiers convert their internal formula representation back to the programming language C.

Table 2 also reports the results when Btor2-Val is configured to accept correctness witnesses with different levels of invariant quality. Overall, 77 % of the candidate invariants derived by software verifiers passed the invariant check of Btor2-Val, but only 57 % are deemed safe and inductive. As expected, the number of rejections increases with the strictness of the required invariant quality. However, there are 2 instances in Table 2 that passed the level “safe & inductive” but were not confirmed at the level “safe” by Btor2-Val. Such cases occurred because ABC, the backend verifier of Btor2-Val, timed out when performing model checking, whereas the consistency, initiation, and consecution checks based on satisfiability easily went through. Among the four interpolation-based algorithms in CPAchecker, predicate abstraction is the best in terms of invariant quality: it generated the most safe and inductive invariants. The results demonstrate the unique value of Btor2-Val to quantify the quality of invariants derived by software verifiers.
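The distinction between the levels of invariant quality can be illustrated on a small explicit-state example. The following Python sketch is not the implementation of Btor2-Val: it assumes the common textbook reading of the levels (a candidate holds in all reachable states, it additionally implies the property, and it additionally passes initiation and consecution), it omits the consistency check, and the 3-bit toy system as well as all function names are hypothetical. The exact definitions are those given in Sect. 5.

```python
from itertools import product

# Hypothetical toy system (not taken from the benchmark set):
# a 3-bit state variable, a 1-bit input, and a single initial state.
STATES = range(8)
INPUTS = (0, 1)
INIT = {0}

def step(state, inp):
    """Toy transition function: add 2 (mod 8) if the input is 1, else hold."""
    return (state + 2) % 8 if inp else state

def prop(state):
    """Safety property: the state 7 must never be reached."""
    return state != 7

def reachable():
    """Explicit-state exploration; stands in for a model-checking query."""
    seen, frontier = set(INIT), list(INIT)
    while frontier:
        s = frontier.pop()
        for i in INPUTS:
            t = step(s, i)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

def is_invariant(inv):
    """The candidate holds in every reachable state."""
    return all(inv(s) for s in reachable())

def is_safe(inv):
    """Every state satisfying the candidate also satisfies the property."""
    return all(prop(s) for s in STATES if inv(s))

def is_inductive(inv):
    """Initiation and consecution, checked by enumeration instead of SAT."""
    initiation = all(inv(s) for s in INIT)
    consecution = all(
        inv(step(s, i)) for s, i in product(STATES, INPUTS) if inv(s)
    )
    return initiation and consecution

def classify(inv):
    """Return the strongest level (under the assumed reading) that inv passes."""
    if not is_invariant(inv):
        return "rejected"
    if not is_safe(inv):
        return "invariant"
    if not is_inductive(inv):
        return "safe"
    return "safe & inductive"
```

In this toy system, the candidate "the state is even" is safe and inductive, whereas the candidate "the state is not 7" is safe but not inductive because the unreachable state 5 satisfies it and steps to 7. Note that only `is_invariant` explores the reachable state space, which mirrors why the model-checking-based levels can time out while the satisfiability-based checks go through.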

For violation witnesses, Btor2-Val was far more effective than CPA-w2t and FShell-w2t in our experiment. Among 899 violation witnesses generated by software verifiers, Btor2-Val was able to validate 578 cases; it rejected the other witnesses because they contain an incomplete or infeasible error path. In comparison, CPA-w2t and FShell-w2t only confirmed 122 and 150 witnesses, respectively. The numbers of rejected witnesses for CPA-w2t and FShell-w2t are not listed in Table 3 because the tools do not distinguish rejection of witnesses from other errors. We also observed that only 11 violation witnesses produced by CPAchecker, Esbmc, and UAutomizer were not validated by Btor2-Val, but witnesses generated by Cbmc suffered from a high rejection rate. This is because the violation witnesses of Cbmc often report an infeasible error path. Moreover, we noticed that in many cases, the error path printed in a violation witness of Cbmc differs from the one in the console log of the same execution (Footnote 8). If we extract Btor2 violation witnesses from the console logs instead, Btor2-Val could validate 359 out of the 369 cases where Cbmc found an alarm. The effectiveness of Btor2-Val in confirming translated Btor2 violation witnesses showcases the value of Btor2-Cert because hardware designers can now trust software verifiers to detect bugs in their circuits and obtain a certified test case to trigger an error if software verifiers reported one.

RQ3: Efficiency of Btor2-Val. We compared the CPU time required by Btor2-Val and by other state-of-the-art validators. In our experimental results, Btor2-Val (configured to accept safe and inductive invariants) achieved a median speedup of 2.2\(\times \) over LIV for correctness-witness validation, and a median speedup of 11\(\times \) and 1.1\(\times \) over CPA-w2t and FShell-w2t for violation-witness validation, respectively. In addition, Fig. 8 shows the scatter plots for the CPU-time consumption of the compared validators. A data point \((x, y)\) in the plots corresponds to a case where CPAchecker took \(x\) seconds to produce a witness and a validator took \(y\) seconds to validate the witness. Observe that most data points of Btor2-Val are below those of other validators. The efficiency of the proposed certifying framework in translating and validating violation witnesses minimizes the overhead to apply software analyzers to find hardware bugs and makes the results of software verifiers trustworthy for hardware designers.
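For readers who wish to reproduce such a statistic on their own measurements, the following Python sketch shows one way to compute a median speedup. It assumes that the speedup is the median of per-task CPU-time ratios over the tasks handled by both validators; this reading, the function name, and the sample data are assumptions made for illustration and are not taken from the evaluation.

```python
from statistics import median

def median_speedup(baseline_times, candidate_times):
    """Median of the per-task ratios baseline / candidate.

    Both arguments map a task name to a CPU time in seconds.
    Only tasks measured for both tools are considered.
    """
    common = baseline_times.keys() & candidate_times.keys()
    ratios = [
        baseline_times[task] / candidate_times[task]
        for task in common
        if candidate_times[task] > 0  # skip degenerate measurements
    ]
    return median(ratios)

# Hypothetical CPU times (in seconds) of two validators on three tasks.
baseline = {"task-a": 4.0, "task-b": 9.0, "task-c": 2.0}
candidate = {"task-a": 2.0, "task-b": 3.0, "task-c": 2.0}
speedup = median_speedup(baseline, candidate)
```

A ratio of medians would be an equally plausible definition and may yield a different number on the same data.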

Fig. 8.
figure 8

CPU-time comparison of verification and witness validation (unit: s)

RQ4: Verification versus Validation Time. Figure 8a (resp. Figure 8b) compares the CPU time for CPAchecker to compute a verdict and generate a correctness (resp. violation) witness to the CPU time for a validator to check the witness. We can see that almost all data points are below the diagonal, indicating that validation time is typically shorter than verification time. Such a speedup shows that the validators are able to utilize the information in witnesses to reconstruct proofs of correctness or violation more efficiently than verifying the task from scratch.
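The observation about the diagonal can be quantified as the share of data points \((x, y)\) with \(y < x\). The following Python sketch illustrates the computation; the function name and the sample points are hypothetical and do not correspond to the data in Fig. 8.

```python
def fraction_below_diagonal(points):
    """Share of points (x, y) with y < x, i.e., validation faster than verification."""
    below = sum(1 for x, y in points if y < x)
    return below / len(points)

# Hypothetical (verification time, validation time) pairs in seconds.
sample_points = [(10.0, 2.0), (5.0, 6.0), (8.0, 1.0), (3.0, 2.5)]
share = fraction_below_diagonal(sample_points)
```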

RQ5: Complementing HW Model Checking with Btor2-Cert. The empirical evaluation in the TACAS 2023 publication [5] on Btor2C demonstrates that software verifiers are able to complement the state-of-the-art hardware model checkers by finding more bugs and uniquely solving dozens of tasks. We take a step further and investigate whether the verification results of those additional alarms and uniquely solved tasks can be certified by Btor2-Cert.

Btor2-Cert certified 37, 1, and 4 alarms found by the BMC implementations of Cbmc, CPAchecker, and Esbmc, respectively, which cannot be detected by the BMC implementation of ABC (Footnote 9). The additional alarms found by Cbmc alone account for up to 8 % of the unsafe tasks in our benchmark set. With the help of Btor2-Cert, the violation witnesses generated by software verifiers can be translated to Btor2 witnesses and validated by BtorSim. That is, the property violation reported by software verifiers can be replayed fully in the hardware domain, demonstrating the unique ability of Btor2-Cert to provide trustworthy verification results obtained by software analyzers.
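The idea of replaying a violation witness can be sketched on a toy model. The following Python example is not the interface of BtorSim and does not use the Btor2 witness format: it assumes that a witness is merely a sequence of input values, and the counter circuit, the bad-state predicate, and all names are hypothetical. A witness is confirmed if feeding its inputs to the model drives it into a bad state, and rejected otherwise, for instance when the error path is incomplete.

```python
def replay(init_state, step, bad, inputs):
    """Simulate the model on the given inputs; return True if a bad state is hit."""
    state = init_state
    if bad(state):
        return True
    for inp in inputs:
        state = step(state, inp)
        if bad(state):
            return True
    return False

# Hypothetical toy circuit: a 3-bit counter with an enable input.
def counter_step(state, enable):
    """Increment the counter (mod 8) if enable is 1, else hold its value."""
    return (state + enable) % 8

def counter_bad(state):
    """The (hypothetical) safety property is violated when the counter equals 3."""
    return state == 3

# A complete error path reaches the bad state and is confirmed.
confirmed = replay(0, counter_step, counter_bad, [1, 0, 1, 1])
# An incomplete error path stops at counter value 2 and is rejected.
rejected = not replay(0, counter_step, counter_bad, [1, 0, 1])
```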

For property satisfaction, although the previous study shows that software verifiers are not as good at finding proofs of correctness as their hardware counterparts, we still observed a case where ABC (the backend verifier used by Btor2-Val) timed out on its own but required less than 3 s to reconstruct a proof using the invariant generated by CPAchecker, and another case with a 5\(\times \) run-time speedup.

Summaries of Results. From the reported results, we conclude that (1) software witnesses can be translated to hardware witnesses (Table 2 and Table 3), (2) Btor2-Val is effective (Table 2 and Table 3) and efficient (Fig. 8), (3) witness validation by Btor2-Val consumes less time than software verification (Fig. 8), and (4) Btor2-Cert complements state-of-the-art hardware model checkers.

As a by-product of this work, our intensive investigation of software witnesses led to the discovery of several bugs in software verifiers. We reported the issues to the developers of the tools, and some of the bugs have been fixed. A complete list of issues that we found in software analyzers during this project is available on the supplementary webpage [62].

7.5 Threats to Validity

For external validity, our claims are established on a large set of Btor2 circuits to increase confidence, but it is unclear if they will hold on tasks with different features that are not covered in the used benchmark set. For construct validity, we report that witness validation is faster than verification, but validation and verification were done on behaviorally equivalent but syntactically different models (namely, a Btor2 circuit vs. a C program). While the setting is not exactly the same as in a previous publication [4], it is necessary because our experiment is designed to investigate how information in software witnesses can be used by hardware analyzers. We compared Btor2-Val prepended with witness translation to software witness validators. The former also uses the original Btor2 circuit as input, but the validators for software do not leverage circuit information. We performed the comparison this way because the hardware witness validator Certifaiger  [29] does not accept an invariant as input. For internal validity, we ran the experiments with the popular benchmarking framework BenchExec  [53] to guarantee reproducibility.

8 Conclusion

Validating verification results is vital to make formal methods applicable in practice, as it reinforces the trust of users and offers more insights into the analyzed model. In this manuscript, we proposed Btor2-Cert, a certifying and validating hardware-verification framework built upon translators and software analyzers. Btor2-Cert is an open-source toolchain, involving the Btor2-to-C translator Btor2C, certifying verifiers for C programs, a C-to-Btor2 witness translator, the Btor2 simulator BtorSim, and the validator Btor2-Val. We evaluated the capability of Btor2-Cert to transfer information across software and hardware analyzers and to provide certified verification results on a large benchmark set. By employing software model checkers for hardware verification, we identified and certified 8 % of the unsafe tasks in our benchmark set that the state-of-the-art conventional hardware model checker ABC overlooked. For future work, we will augment Btor2-Cert to accommodate temporal decomposition [48], a preprocessing technique used to simplify sequential circuits before model checking. Such an extension [31] has been made to k-inductiveness validators [29, 30].