Efficient Generation of Program Execution Hash

Distributed computing systems often require verifiable computing techniques when their nodes are untrusted. To verify a node's computation result, proof-of-work (PoW) is often used as the basis of a verifiable computing method; however, this mechanism is only valid for computations that produce results based on a specific algorithm (e.g., AES decryption). To date, there is no efficient PoW mechanism applicable to an arbitrary algorithm or to a computation that does not produce any tangible output (e.g., a void function). This paper proposes the execution hash, which serves as a proof of a program's idempotent computation result without relying on its algorithm. Two versions of the execution hash generation method were designed and implemented, and their efficacy was evaluated in terms of performance and reliability. The implementation was based on LLVM/Clang 6.0, and the evaluation was based on open-source software, including GNU binutils/coreutils and Google's OSS-Fuzz projects.

WITH the recent implementation of cloud computing in various fields, outsourcing specific tasks to third parties has become prevalent, which has bolstered the importance of verifiable computing [1]-[4]. Verifiable computing generally refers to a process in which a client outsources tasks, such as computation, to multiple untrusted entities. Each entity participating in the computation must verify its conduct by returning the results with proof that it executed the work correctly. However, issues may arise, such as dishonest workers that return plausible results without actually performing the computations [1]. Therefore, researchers have conducted various studies on enabling clients to verify, with little effort, whether the work was correctly performed [5].
However, previous work on verifiable computing primarily outsourced computations that have a clear input-output relationship, such as mathematical or cryptographic operations [6], [7]. This study proposes the execution hash as a methodology that enables the client to verify the execution flow of any program intuitively, including computations for which no result exists (e.g., a void function) or for which the input-output relationship is unclear. Specifically, when offloading certain tasks to untrusted clients, the execution hash value can be used as proof of whether the computation was actually executed as intended.
The execution hash is a hash value generated by tracing the program's runtime execution flow; if the execution flow changes, then the hash value also changes. A great deal of information is associated with a program's execution, and the execution hash can be derived from it. This study considers two ways to generate such a hash: observing changes in the execution flow at basic-block granularity, and observing changes in the frequency of calls to edges between basic blocks.
In the first method, the binary is instrumented using LLVM [8], and a hash value is generated when the binary terminates. This hash value is the result of the call records of all basic blocks executed from the beginning of the program to its termination; if the execution of even one basic block is omitted or added, or the order changes, then the hash value changes. In the second method, the binary also generates a hash value, but in a different way: a fixed-size array of the edges connecting the basic blocks is created (in LLVM, this is referred to as an inline counter array), the frequency of calls using these edges is stored in the array, and the hash value of the array is calculated. Therefore, although the hash value will differ if the execution of even one basic block changes, the execution order of the basic blocks is not considered. Thus, even if a program executes identically at the level of high-level semantics, these methodologies can be used to track changes in internal memory values or differences in the detailed execution flow.
This study proposes EXHGen (Execution Hash Generator), a tool for the efficient generation of program execution hashes that can observe a program's execution flow and investigate subtle changes in single-threaded executions. As the execution hash is a trustworthy logical indicator, it can be applied to verifiable computing to save time and cost and to resolve the problem of dishonest participants. The execution hash method can easily observe the program execution flow and is applicable to numerous situations other than verifiable computing, such as deterministic execution and detailed debugging.
This study conducted an additional test that applied the execution hash to deterministic execution. Traditionally, deterministic execution is a field that investigates methods to debug multi-threaded applications effectively (e.g., Dthreads [9]). However, thread scheduling is not the only factor that causes a program to execute non-deterministically. For example, even single-threaded programs can have different execution flows with the same inputs and conditions depending on the heap memory status, results of libc functions, state of the network or file system, external interrupts, and system call-handling results. As a more specific example, if the heap memory allocation function (e.g., malloc) fails due to insufficient memory resources, the value 0 is returned, for which exception handling code can be considered. This is a correctness issue in the field of program testing. Although such differences in the detailed flow may make no difference to the result when viewed at a high level, if the detailed computation flow is important (e.g., fuzzing [10]-[13]), then the existence of a difference becomes significant. This study applied the execution hash to investigate how frequently non-deterministic executions appear in a single thread for 108 programs included in the binutils [14] and coreutils [15] packages as well as 32 programs of OSS-Fuzz. For cases of non-deterministic execution attributed to the results of API functions and system calls, a hooking library that makes the execution deterministic was applied. Through these tests, the two methodologies that implement the execution hash using existing methodologies were compared, and the distinctiveness and efficiency of the execution hash were demonstrated.

A. BASIC BLOCK
A block is a bundle of assembly instructions that run together when a program executes. Among blocks, a basic block is a linear code sequence with branches only at its beginning and end. The branch at the beginning is the entry point through which control arrives from another block, and the branch at the end is the exit through which control leaves to execute code in another block after the basic block has executed. Due to this restricted form, branch instructions cannot exist in the middle of the code constituting the basic block, and each time the block is entered, all of its instructions are executed exactly once, in order. For this reason, tracking the execution flow of basic blocks makes it very easy to analyze the flow of instructions during program execution; it also helps to track records of certain exceptional situations, such as when errors occur. EXHGen_version 1, developed in this study, stores records of executed basic blocks in memory by inserting additional code into each basic block and tracks the entire basic block execution record. Since the execution records of basic blocks can be regarded as the execution flow of the program, these records can be compared and non-deterministic cases can be observed.

B. COVERAGE MAP
Coverage is a measurement of the execution flow of a program. It is mainly used to check how much of a program's code has been executed. When executing a program, the execution record of basic blocks is generally expressed as code coverage, a concept used by gcov [16] in general software engineering and by AFL coverage [17] in fuzzing. EXHGen_version 2, developed in this study, measures the execution flow of the program based on such a coverage map. There can be various types of coverage maps; in the case of EXHGen_version 2, the frequency of calls to edges between basic blocks is measured through LLVM's inline counter array.

C. EINTR
EINTR [18] is an error code reported when a system call is interrupted. Interrupts can be divided into hardware interrupts, generated by hardware, and software interrupts, caused by exceptions that occur while a system call is running. EINTR is an error that arises from a software interrupt related to a system call: it indicates that the process was interrupted by a signal before the function could complete its normal operation in the system call. In this study, a non-deterministic case was observed in which the basic block execution record of a single thread changed due to exceptional results (EINTR) of system calls and external library functions; additional experiments were conducted to resolve this with hooking.
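The effect of an interrupted call, and of hiding it behind a wrapper, can be illustrated with a minimal Python sketch. The helper `retry_on_eintr` and the simulated `flaky_read` are hypothetical names introduced here for illustration; they are not part of EXHGen or of the hooking library used in this study. Python reports EINTR as `InterruptedError`, and since Python 3.5 the standard library already retries most interrupted system calls itself, so the failure below is simulated.

```python
import errno


def retry_on_eintr(call, *args):
    """Repeat `call` until it completes without being interrupted (EINTR)."""
    while True:
        try:
            return call(*args)
        except InterruptedError:
            # InterruptedError is the OSError subclass Python uses for EINTR.
            continue


def make_flaky_read(interruptions):
    """Build a fake read() that fails with EINTR a fixed number of times."""
    state = {"left": interruptions, "calls": 0}

    def flaky_read(data):
        state["calls"] += 1
        if state["left"] > 0:
            state["left"] -= 1
            raise InterruptedError(errno.EINTR, "Interrupted system call")
        return data

    return flaky_read, state


flaky_read, state = make_flaky_read(interruptions=2)
result = retry_on_eintr(flaky_read, b"payload")
```

The caller of `retry_on_eintr` always receives the completed result, regardless of how many times the underlying call was interrupted.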

D. DBI
In computer programming, instrumentation refers to facilities for diagnosing errors or monitoring the performance of programs, such as data logging, debugging, and performance tracking. Alternatively, it can refer to the act of inserting analysis code into a binary. Analysis code is code inserted by the user to observe a program's behavior without changing its original semantics. Dynamic binary instrumentation (DBI) is a type of instrumentation that happens at run time instead of compile time. Intel PIN (free software) is one of the standard frameworks that support DBI [19], [20]. When a binary is executed using PIN, its code is translated into an intermediate representation and then transformed with instrumentation. That is, the binary is not directly loaded and executed, but is executed on top of the PIN engine. DBI using Intel PIN is convenient, as PIN provides rich APIs. However, DBI incurs a large transformation overhead, significantly slowing down the target binary's execution.

E. LIBFUZZER
Libfuzzer is an in-process, coverage-guided, evolutionary fuzzer developed by Google as part of the LLVM project [21]. It uses evolutionary algorithms to expand coverage efficiently and in-process methods to improve fuzzing performance. An evolutionary fuzzer uses genetic algorithms to generate random mutations from a sample corpus, and the mutations generated in each round are used as a new corpus set. Inputs in the generated corpus set that increase code coverage are stored and used as sample corpus later. Therefore, evolutionary fuzzing generally has a higher probability of success than fuzzing with randomly generated input. Libfuzzer performs fuzzing in units of library functions rather than executing the entire program independently. Additional work is required because the part of the program subject to fuzzing needs to be wrapped, but overall performance, including speed, improves dramatically. In this study, libfuzzer was modified and added to part of the compiler to create version 2, an efficient tool to measure the execution flow of the program.
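The coverage-guided loop described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not libfuzzer's implementation: `target_coverage` is a hypothetical instrumented target that returns the set of branches it reached, and mutation is a deterministic "append one byte" step standing in for random mutation so that the run is reproducible.

```python
def target_coverage(data: bytes) -> frozenset:
    """Hypothetical target: report which nested branches the input reaches."""
    covered = {"entry"}
    if data[:1] == b"F":
        covered.add("F")
        if data[1:2] == b"U":
            covered.add("FU")
            if data[2:3] == b"Z":
                covered.add("FUZ")
    return frozenset(covered)


def evolve(seed_corpus, rounds, alphabet=b"FUZ"):
    """Keep only mutated inputs that reach coverage not seen before."""
    corpus = list(seed_corpus)
    seen = set()
    for item in corpus:
        seen |= target_coverage(item)
    for _ in range(rounds):
        kept = []
        for item in corpus:
            for byte in alphabet:
                candidate = item + bytes([byte])  # stand-in for a mutation
                coverage = target_coverage(candidate)
                if not coverage <= seen:  # candidate reached something new
                    seen |= coverage
                    kept.append(candidate)
        corpus.extend(kept)  # survivors seed the next round
    return corpus, seen


corpus, seen = evolve([b""], rounds=3)
```

Each round extends the corpus only with inputs that increased coverage, which is why the nested branch is reached after three rounds instead of requiring a lucky three-byte guess.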

F. POW
A root of trust (RoT) [22] is software or hardware that is always reliable and on which all the security functions of a computing system depend. Its ultimate aim is trusted computing (TC), for which representative platform modules and execution environments include the TPM and TEE. A trusted execution environment (TEE) and a trusted platform module (TPM) can provide hardware-based security functions to prove program integrity, thus enabling verifiable computing. A TEE is a secure execution environment provided by a secure area within the main processor. Since it executes in parallel with the operating system in an isolated environment, the confidentiality and integrity of code and data loaded into the TEE are protected. Hardware techniques that can be used to implement a TEE include ARM's TrustZone [23] and Intel's SGX [24]; however, this study used proof-of-work (PoW) as the method to implement verifiable computing. In contrast to TEE and TPM, which implement hardware-based security, PoW [25], [26] is a consensus algorithm in which nodes repeat the process of searching for a hash below a target value and thereby prove that they participated in this work. PoW aims to prove that a node performed a certain task or computation. Additionally, it deters data manipulation and ensures that distributed computing can be securely implemented.
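The "hash below a target value" search that PoW relies on can be sketched as follows. This is a generic illustration, assuming SHA-256, an 8-byte nonce encoding, and a deliberately easy target; the function names are introduced here and do not come from the paper.

```python
import hashlib


def hash_value(data: bytes, nonce: int) -> int:
    """Interpret SHA-256(data || nonce) as a big-endian integer."""
    digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def find_nonce(data: bytes, target: int) -> int:
    """Try nonces 0, 1, 2, ... until the hash falls below the target."""
    nonce = 0
    while hash_value(data, nonce) >= target:
        nonce += 1
    return nonce


# Assumed difficulty: about one hash in 256 falls below this target.
TARGET = 1 << 248
nonce = find_nonce(b"work unit", TARGET)
```

Finding the nonce requires repeated hashing, whereas checking it requires a single call to `hash_value`; this asymmetry is what lets a verifier confirm cheaply that work was done.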

G. THREAT MODEL
The threat model in this paper is participating clients disrupting the distributed (outsourced) computation, intentionally or unintentionally. These threats are caused by a program's non-deterministic behavior and can thus be mitigated by enforcing determinism. A number of previous studies consider non-determinism a threat [27], [28]. In this paper, we leverage the hash-based proof-of-work (PoW) concept to achieve trustworthy verification of computation, thus guaranteeing the honesty of distributed computing workers.

A. EXECUTION HASH
In verifiable computing, each entity executes a program and then verifies whether the program was correctly executed, mainly based on the execution result. However, there may be programs whose execution result cannot be checked. Even under normal circumstances, users who execute programs are forced to rely solely on the output for information about the program's execution, which makes it challenging to obtain information about programs that are difficult to check using the output data. Even for programs whose execution produces an output, the user cannot obtain information about the program's execution flow for a specific input without a debugging process. This means that even if an identical result is output when the same input is passed to the program, the user cannot determine whether the same functions were called or the same flow was executed without additional work. If a distributed computing system must verify the aforementioned computations, the verification becomes challenging. To address such challenges, this study proposes the execution hash. An execution hash intuitively provides information on the program's execution flow without additional debugging work. The most accurate methodology for generating execution hashes is to generate a hash by tracking the full execution flow of the program. However, given the excessively high cost, tracking and hashing all information is highly inefficient. As a solution, this study proposes generating the hash by extracting only the most essential information about the program execution.
There are two ways to generate the hash: the first is a fine-grained method that traces the entire calling order of the basic blocks, and the second is a slightly more coarse-grained method that hashes the code coverage information. In overview, specific code is inserted into the program, information indicating the execution flow is collected, and the hash is calculated from the coverage of the collected execution flow. Any program using these methods can easily generate an execution hash, and the hash value can be used to check whether the program's execution flow has changed, as shown in Figure 1. Along with PoW-based verifiable computing, this function of the execution hash is applicable to diverse situations, such as detailed debugging work when problems arise in a deterministic program. It serves as a logical basis in verifiable computing and can be used to identify the problem if something goes wrong, for instance, when executing a deterministic program that must produce identical output for the same input. The execution hash can also be used when debugging components such as heap memory. The internal structure of heap memory varies greatly even with subtle differences in program execution, frequently leading to different results even from seemingly identical executions. Thus, the execution hash can provide logical and reliable information about the program to all users, and it can be applied in certain situations to solve problems.

B. EXHGEN_VERSION 1
Figure 2 shows the overall design of version 1. When the test target program is built with the clang tool of version 1, a unique number is assigned to each basic block in the target program's binary, and additional code is inserted at the same time.
Afterward, when a block with a specific number is executed, the execution flow is recorded in order by the inserted code, and immediately before the program terminates, the accumulated execution flow is converted into the execution hash value through the SHA256 algorithm. The basic block execution record measured in version 1 is typically expressed as code coverage, a concept used by gcov [16] in general software engineering and by AFL coverage [17] in fuzzing. In version 1, the code coverage is not tracked as an aggregated result, such as a hash map (e.g., conventional AFL code coverage) or the counter data structure used in libfuzzer [21]. For precise observation, the entire basic block visit record, including its order, is observed, and the result is finally hashed. Thus, although it imposes a heavy performance load, version 1 can reliably track subtly different execution flows that existing coverage measurement tools might miss. One may think that if all basic block execution flows are tracked, memory usage will grow without bound as the program execution flow continues indefinitely, thus depleting memory. However, this can be solved with a speed-memory trade-off: whenever a new basic block is executed, rather than accumulating the record in memory, only the existing hash is updated internally. Hence, version 1 has O(1) space complexity.
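The constant-space update used by version 1 can be illustrated with Python's incremental hashing interface. This is a minimal sketch, not the instrumentation code itself: the class name, the integer block IDs, and the 8-byte encoding are assumptions made for illustration, and SHA-256 is used as named in this section.

```python
import hashlib


class OrderedTraceHasher:
    """Fold each executed basic block ID into a running SHA-256 state."""

    def __init__(self):
        self._state = hashlib.sha256()  # fixed-size internal state

    def visit(self, block_id: int) -> None:
        # Only the running hash is updated; the trace itself is never stored.
        self._state.update(block_id.to_bytes(8, "little"))

    def hexdigest(self) -> str:
        return self._state.hexdigest()


def execution_hash(trace):
    """Hash a sequence of basic block IDs in execution order."""
    hasher = OrderedTraceHasher()
    for block_id in trace:
        hasher.visit(block_id)
    return hasher.hexdigest()


same = execution_hash([1, 2, 3, 2, 4]) == execution_hash([1, 2, 3, 2, 4])
reordered = execution_hash([1, 2, 3, 2, 4]) != execution_hash([1, 3, 2, 2, 4])
omitted = execution_hash([1, 2, 3, 2, 4]) != execution_hash([1, 2, 3, 4])
```

Identical traces give identical hashes, while reordering or omitting a single block changes the result, which is the property this method relies on.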
C. EXHGEN_VERSION 2
Figure 2 shows the overall design of version 2. As with version 1, when the test target program is built with the clang tool of version 2, a unique number is assigned to each basic block in the target program's binary, and additional code is inserted at the same time. Unlike version 1, however, the execution flow of the basic blocks is not recorded in order; only the execution frequency of edges between the basic blocks is recorded, in the form of an array. As the total number of edges in the program is fixed, the array always has a constant size. As in version 1, the array in which the execution flow information is stored is converted into an execution hash value using the SHA256 algorithm. Given that version 2 does not consider the execution flow order, the hash value remains identical even if the execution order of edges in the program changes, unlike in version 1. As a result, although the observation is less precise than in version 1, the time to compute the execution hash is always constant (i.e., O(1) time complexity) even if the execution flow becomes indefinitely long, as the array size and the length of the data itself are fixed. To conclude, this study proposes two tools with different application targets: version 1 for a simple program that requires precise measurement, and version 2 for a complex program that requires greater consideration of performance and time.
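The order-insensitive behavior of version 2 can be seen in a small Python sketch. The function name, the integer edge indices, and the 8-byte counter encoding are assumptions for illustration only; the actual tool uses LLVM's inline counter array.

```python
import hashlib


def edge_counter_hash(edge_trace, num_edges):
    """Count how often each edge is taken, then hash the fixed-size array."""
    counters = [0] * num_edges
    for edge in edge_trace:
        counters[edge] += 1
    # The hashed payload has num_edges * 8 bytes regardless of trace length.
    payload = b"".join(count.to_bytes(8, "little") for count in counters)
    return hashlib.sha256(payload).hexdigest()


baseline = edge_counter_hash([0, 1, 2, 1], num_edges=4)
reordered = edge_counter_hash([1, 1, 0, 2], num_edges=4)
extra_edge = edge_counter_hash([0, 1, 2, 1, 3], num_edges=4)
```

Here `baseline` and `reordered` are equal because the edge frequencies match, whereas `extra_edge` differs because one more edge was taken. The hashing step always processes the same number of bytes, however long the trace was.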

A. VERIFIABLE COMPUTING AND DEBUGGING
This section proposes two applications for this research: verifiable computing and debugging.
Suppose a client outsources the computation of some functions to untrusted clients. The clients that were assigned the work provide a result with proof that the work was correctly performed; hence, the client that outsourced the work should be able to judge the accuracy of the returned result. The proposed methodology can be applied in this case to prove that certain results were correctly computed in verifiable computing settings that do not use TEE or TPM, under the premise of PoW. As an applicable scenario, we consider the SETI@home project. SETI@home uses large-scale distributed computing to search for radio signals from extraterrestrial civilizations by analyzing signals in the frequency bands of planets that are likely to have intelligent life. The master node outsources the radio signal analysis job to each node participating in the project and gathers the results. Each participating node must use the correct algorithm given by the master; this must be verified if the participating nodes are untrusted. In general, distributed computing uses a TEE or TPM as the root of trust to ensure code integrity. However, hardware support is not available on all systems, and, additionally, untampered code does not always guarantee the high-level semantics of data processing, which can change subtly due to runtime or external events such as interrupts. Even when a system does not support a hardware-based root of trust, EXHGen can achieve the same objective with high accuracy. For example, when the master node outsources a signal for analysis, EXHGen can effectively detect whether a node is performing the job as-is, without any tampering (intentional or unintentional), by comparing its execution hash value with those of other nodes.
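The comparison of execution hashes across nodes can be sketched as below. The paper states only that hashes from different nodes are compared; treating the most common hash as the reference is an assumption of this sketch, and the node names and hash strings are hypothetical.

```python
from collections import Counter


def find_deviating_nodes(reported):
    """Return the nodes whose execution hash differs from the most common one.

    `reported` maps a node name to the execution hash it returned.
    Using the majority value as the reference is an assumption of this sketch.
    """
    if not reported:
        return []
    majority_hash, _ = Counter(reported.values()).most_common(1)[0]
    return sorted(node for node, value in reported.items() if value != majority_hash)


reports = {"node-a": "9f2c", "node-b": "9f2c", "node-c": "41d0", "node-d": "9f2c"}
suspects = find_deviating_nodes(reports)
```

In this example only `node-c` is flagged. A master node that already knows the expected hash could compare against that value directly instead of taking a majority.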
We can also apply EXHGen in a similar way to other distributed computing projects.
The second application is debugging. In the program development stage, debugging is conducted to find logical errors or bugs in the code, reveal the cause, and solve the problem. When debugging with repeated executions, if the results vary due to changes in the execution flow, then it may be impossible to debug the program effectively. Accordingly, researchers have continuously investigated techniques to make multi-threaded programs with frequent non-deterministic errors deterministic in order to debug them accurately.
However, even the single-threaded programs (integrity protected via RoT) in this study produce non-deterministic errors due to asynchronous interrupts, and their frequency increases as the program grows in complexity. Therefore, the tool can be used to check whether the program's execution flow has changed, and changes caused by system calls and various external factors can be corrected through the proposed solution of function hooking. In this way, a program can be fixed to show the same execution flow when repeatedly executed, allowing the debugger to locate critical errors. The proposed tool thus provides verifiable and reliable information in distributed computing environments and has various applications, such as creating an accurate debugging environment in the development stage.

B. NON-DETERMINISM
To describe the overall study design: first, EXHGen was applied to build the binutils and coreutils programs, the study targets, and each built binary was repeatedly executed with a variety of execution options. Cases where the execution hash value changed during repeated execution were recorded and then statically and dynamically analyzed to confirm whether the difference was due to a non-deterministic execution flow. If the change was caused by an exceptional operation, such as an API/system call, it was modified through hooking to show a deterministic execution result. For this work, additional test code and hooking code were written. The test code repeatedly executed the test target program and automatically detected and managed cases where the hash values indicated non-deterministic errors. The hooking code appropriately modified libc APIs based on the cause analysis so that non-deterministic execution was made deterministic. Non-deterministic execution in this study refers to cases where a single thread's basic block execution record varies with the heap state, system calls, and exceptional circumstances when calling external library functions.

A. EXHGEN_VERSION 1
EXHGen_version 1 is implemented by adding a pass to LLVM 6.0 as a sanitizer interface ("-fsanitize="). In other words, a version of clang was created with an added sanitizer that performs the desired instrumentation, and a compiler option was added to that clang to build the source code of the target program with the additional code. We added the LLVM pass code to the callback function "runOnBasicBlock", which LLVM invokes for each basic block. In "runOnBasicBlock", the pass obtains the last location of the basic block using the "getTerminator" function to specify where to insert the instrumented code. The instrumented code simply invokes a custom function (version 1_Trace) to trace the execution flow. In the version 1_Trace function, the basic block is uniquely identified based on the return address ("__builtin_return_address"). "__builtin_return_address" is a compiler built-in function that returns the return address of the currently executing function. Since this address is a code location inside the target basic block, it can be used as a unique ID of the block. In this way, the entire sequence of basic block IDs can be gathered, and each time an ID is obtained, the internal CRC (Cyclic Redundancy Check) state is updated. As a result, the final CRC state at the end of program execution reflects the overall program execution flow. Version 1 installs an exit handler ("atExit") to print out the final CRC hash when the program terminates. A file interface is used to output this information; therefore, when executing a test program built with version 1, the working directory must be writable.

B. EXHGEN_VERSION 2
EXHGen_version 2 is a program execution flow measurement tool created by modifying a part of the libfuzzer in the clang compiler and inserting additional code.
The functions of version 2 are implemented by inserting additional code written in C++, and the most important part is where the execution hash is obtained. Libfuzzer divides the program under test into compartments referred to as modules. Based on this, version 2 measures the program execution hash of each module. A module consists of basic blocks, which allows the program execution flow to be measured in units of modules. For this, version 2 uses an inline counter, which records the execution frequency of each edge. Since a module consists of basic blocks, several basic blocks connected by edges are executed when the module is called. Here, an edge represents a jump between basic blocks, which means that information on the edge flow may be regarded as information on the execution frequency of the basic blocks. To obtain information on the final flow of the program, the inline counter information for each module must be recorded. Accordingly, version 2 inserts into each module constituting the program an array, whose elements hold up to eight bits each, to store the inline counters. As the number of modules in the program and the number of basic blocks constituting each module do not change, the size of each inline counter array is also fixed. In this study, the inline counter array inserted into a module is regarded as the coverage of that module and hashed using the SHA1 algorithm. This operation is performed for all modules constituting the program, using a function that determines the start and end positions of each module. The hashes of the modules are XORed with each other, forming a hash chain. After the operation is complete for all hashes, the final hash is dumped into a file. The dumped hash is thus the result of an operation in which the information of all modules constituting the program is combined. This means that a very slight change in the flow of execution can bring a big change to the hash; hence, version 2 can be used from various perspectives.
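The per-module hashing and XOR combination described above can be sketched in Python. This is an illustration under assumptions, not the modified libfuzzer code: each module's counters are modeled as a `bytearray` of 8-bit values, the helper names are introduced here, and clamping a counter at 255 is a choice made only for this sketch.

```python
import hashlib
from functools import reduce


def bump(counters: bytearray, edge: int) -> None:
    """Increment an 8-bit edge counter (clamped at 255 in this sketch)."""
    counters[edge] = min(255, counters[edge] + 1)


def module_hash(counters: bytearray) -> bytes:
    """SHA-1 digest of one module's counter array."""
    return hashlib.sha1(bytes(counters)).digest()


def combined_hash(modules) -> str:
    """XOR the 20-byte SHA-1 digests of all modules into one value."""
    digests = [module_hash(counters) for counters in modules]
    folded = reduce(
        lambda left, right: bytes(a ^ b for a, b in zip(left, right)),
        digests,
        bytes(20),  # all-zero starting value, the identity for XOR
    )
    return folded.hex()


module_a = bytearray(4)
module_b = bytearray(6)
for edge in (0, 1, 1, 3):
    bump(module_a, edge)
bump(module_b, 5)

before = combined_hash([module_a, module_b])
bump(module_b, 2)  # a single additional edge execution in one module
after = combined_hash([module_a, module_b])
```

One extra edge execution in a single module changes the combined value, so `before` and `after` differ. Because XOR is commutative, the order in which the modules are processed does not affect the result.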

VI. EVALUATION
The tests were carried out in the following environment: This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication.

A. NON DETERMINISM TEST
Non-determinism tests were repeatedly performed on the selected commands and programs to examine whether the result varied due to any secondary factor in a single-threaded setting. This was done to demonstrate the value of finding non-deterministic execution flows in single-threaded commands and programs, which this study investigated, rather than in multi-threaded ones, as in many previous studies. Each binutils/coreutils command was repeatedly executed 100,000 or 1 million times, depending on performance, to find non-deterministic cases. To catch non-deterministic cases, a Python script compares the execution hashes gathered from multiple executions. EXHGen_version 1 was applied with the assumption that the hash value obtained from the first execution was the correct value. Command execution was automated, and the hash values for the program execution flow accumulated in the file for each execution were compared. When the values differed, the case was judged a non-deterministic error. In Table 3, Err indicates the number of non-deterministic errors in the two commands. While most of the commands exhibited a deterministic program execution flow, the -O option of the strip-new command in binutils showed relatively many errors (15). This is because, unlike for other commands, function calls related to system calls were intermittently omitted during repeated execution.
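The comparison step can be sketched as follows. This is not the script used in the study; it is a minimal illustration that assumes the execution hashes of all runs have already been collected into a list, with the first run treated as the reference as described above. The function name and sample values are hypothetical.

```python
def count_nondeterministic_runs(hashes):
    """Count runs whose execution hash differs from that of the first run."""
    if not hashes:
        return 0
    reference = hashes[0]  # the first execution is assumed to be correct
    return sum(1 for value in hashes[1:] if value != reference)


# Hypothetical hashes from five repeated executions of one command.
collected = ["a1b2", "a1b2", "ffee", "a1b2", "0c0d"]
errors = count_nondeterministic_runs(collected)
```

With the sample values, two runs are counted as non-deterministic errors. Note that under this scheme an atypical first run would cause every later run to be flagged, which is a consequence of taking the first hash as the reference.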
The same algorithm was used for OSS-Fuzz; the number of iterations was set manually based on the test target program's execution speed, and 32 programs were tested about 1 million times each. Since EXHGen_version 2 was used, every time a program was executed, a file containing the hash value generated according to the number of basic block executions was produced. The md5 hash values of these files were obtained to check whether the execution flow changed; if the hash values were different, the case was judged a non-deterministic error. Table 1 shows the error frequency in each program. The results in Table 1 and Table 3 indicate that the error rate rises with the program's complexity due to the variety of variables in repeated execution. Most of the 32 samples showed a significant error rate, while only six did not show any error.

B. PERFORMANCE TEST
Since additional code was added in EXHGen_version 1 to calculate the hash value for the program execution flow, it was predicted that execution would slow down as more code was executed, as shown in Table 2. To examine how much the execution time slowed and how much the time performance differed from that of an existing executable file analyzer, the time performance was measured over repeated executions of binutils/coreutils commands (Table 3). Due to page limitations, Table 3 summarizes a subset of our data set; the full data is available in the Appendix (Table 4, Table 5, Table 6). For binutils/coreutils, the hash is calculated over the execution history including its order, so it is meaningful to compare the execution time with and without EXHGen_version 1. However, for EXHGen_version 2, used for OSS-Fuzz, the hash is calculated only once regardless of the program's execution time, so the hash calculation time was not additionally analyzed. In Table 3, the number of iterations was fixed at 100,000 for all cases, and the difference in the time performance of each command and option was measured depending on whether EXHGen_version 1 was applied in binutils/coreutils. According to the evaluation, in almost all cases, execution took longer when using EXHGen_version 1, though the difference was not large. Version 1 can also be implemented using a DBI tool, such as the Intel PIN tool; therefore, we additionally implemented EXHGen based on Intel PIN's trace feature (denoted as D.T in the table) and compared the performance (time and memory consumed). Compared to C.T, D.T clearly showed a significant overhead. In terms of the time taken to measure the execution flow, using EXHGen_version 1 was 1% to 10% slower than not using any tool at all, whereas using the Intel PIN tool was nearly 10 times slower. This indicates that EXHGen can measure a program's execution flow more quickly and efficiently.

C. NON-DETERMINISM FIX
Given that non-deterministic errors were found even in single-threaded operations in the above non-determinism test, a non-determinism fix test was performed to demonstrate that it is meaningful to resolve the error and make the execution flow deterministic. For the binutils/coreutils commands in which errors were found by comparing the execution hash values, ltrace and strace were run to find the cause of the error. This was done by inserting ltrace/strace invocations into the above code that was repeatedly executed. From the results, the causes were classified into four categories:
• the error inevitably occurred during repetition because the operation was a one-time command;
• the error occurred because the disk and memory conditions changed in real time;
• there was a problem with the code used in the test; and
• the cause of the error was unknown.
In particular, it was difficult to measure the determinism of the execution flow accurately when an error occurred due to the OS used for the test, the computer specification or environment, or the disk environment, among others. As such, of these four categories, an additional test was performed on commands for which the cause of error was unknown.
The test was conducted on the -O option of the strip-new command in binutils, which revealed the 15 errors shown in Table 3. Unknown errors arose for this command with a probability of about 0.001%. The execution was repeated approximately 450,000 times for a more detailed observation, in which case the same hash value error occurred six times and the probability rose slightly to approximately 0.0012%. When the normal execution flows shown in Figure 3 were compared with the erroneous execution flows shown in Figure 4 using ltrace, most results were the same; however, when a non-deterministic situation occurred, the fread function call was intermittently omitted due to a system call error. Consequently, it was not possible to read the full data from the file, as shown in No. 5 of Figure 4. Therefore, fread function hooking code (myfread.c) was written to remove this cause of non-determinism and make the execution flow deterministic.
The point of this hooking code (Algorithm 1) is to fix the execution flow so that when a file is read using the fread function, the contents corresponding to the size passed as an argument are read in full on every execution without exception. The code was written in the same format as the existing fread function and used original_fread = dlsym(RTLD_NEXT, "fread") to bind the original fread function to original_fread. It then checked whether the value returned by original_fread when called with the same parameters (total_count) was identical to the count parameter passed to fread. If the values were different, the code proceeded through Nos. 6-10 shown in Figure 4: it determined that the function had returned partway through reading the file and read again from the address following the interrupted position with the original_fread function. This was implemented with the code original_fread((void*)((unsigned int)ptr + total_count*size), size, count - total_count, stream). Thus, the hooking library handled the exceptional case by continuing to read the contents of the file until count and total_count were equal. When count and total_count were equal, total_count was returned and the function terminated.
The completed code myfread.c was compiled as a shared library using the command "gcc -Wall -fPIC -shared -o myfread.so myfread.c -ldl", after which the command "LD_PRELOAD=./myfread.so ./strip-new -O binary test" was executed so that when "strip-new" was executed, the myfread.so shared library could be loaded and the hooked fread function called. Finally, ltrace was used to check whether the hooked fread function was properly called; an error rate of 0% was observed when the test was repeated 1,000,000 times. In conclusion, the desired deterministic program execution flow result was successfully derived.
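Since the hooking code itself is not reproduced in this section, the following is a minimal sketch of such an fread hook reconstructed from the description above, not the original myfread.c. The helper read_fully and the type fread_fn are hypothetical names. Two details are assumptions that differ from the description: the loop stops when a retry makes no progress, so that a genuine end of file or a persistent error cannot loop forever, and the next address is computed through a char pointer, because casting the pointer to unsigned int would truncate it on 64-bit platforms.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE               /* exposes RTLD_NEXT in <dlfcn.h> on glibc */
#endif
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

#ifndef RTLD_NEXT                 /* only reached if _GNU_SOURCE came too late */
#define RTLD_NEXT ((void *)-1L)   /* glibc's value; an assumption elsewhere */
#endif

/* Signature shared by fread and by any stand-in reader. */
typedef size_t (*fread_fn)(void *, size_t, size_t, FILE *);

/* Call `reader` repeatedly until `count` items have been read or a call
 * reads nothing. Returns the number of complete items read. */
size_t read_fully(fread_fn reader, void *ptr, size_t size, size_t count,
                  FILE *stream)
{
    size_t total_count = reader(ptr, size, count, stream);
    while (total_count < count) {
        /* Resume right after the items that were already stored. */
        char *next = (char *)ptr + total_count * size;
        size_t got = reader(next, size, count - total_count, stream);
        if (got == 0)
            break;                /* EOF or persistent error: stop retrying */
        total_count += got;
    }
    return total_count;
}

/* Interposed fread: looks up the next fread in link order (libc's) once,
 * then delegates to the retry loop above. */
size_t fread(void *ptr, size_t size, size_t count, FILE *stream)
{
    static fread_fn original_fread = NULL;
    if (original_fread == NULL)
        original_fread = (fread_fn)dlsym(RTLD_NEXT, "fread");
    if (original_fread == NULL)
        return 0;                 /* original not found: report nothing read */
    return read_fully(original_fread, ptr, size, count, stream);
}
```

Built as a shared library and loaded through LD_PRELOAD as described above, a hook of this shape makes a caller of fread see either the full requested count or a short count that reflects a real end of file or error.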

VII. RELATED WORK
A. PROGRAM TRACE
Tools for tracing programs [29]-[31] often capture detailed elements, such as program history (e.g., function calls), thread operation methods, and various event types at the program execution stage. Since the execution hash can determine whether the execution flow changed when tracing the program, it can effectively be used to judge whether the program was executed deterministically. SROH [29] calculates traced hashes over values stored in memory corresponding to the program's random read and write accesses, as proposed in OH [30], thus automatically detecting non-deterministic program areas and protecting integrity. The greatest difference in the current study is that the determinism of the program itself can be confirmed, because the entire basic block call history is collected to create a table of call frequencies, from which the hash is calculated.

B. VERIFIABLE COMPUTING
As the execution hash in this study is designed to verify whether a program is executed correctly, it can be used for verifiable computing. Many prior studies on verifiable computing dealt with situations where the input-output relationship is clear, such as in cryptographic operations. The method presented in [32] defends against threats such as dishonesty from untrusted clients and enables them to prove, with high probability, that they accurately performed most of the work. Although confirming whether a specific operation result was returned is similar to the current study, a difference is that [32] relies on mathematical operations.

This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022

Furthermore, although [33]-[35] have objectives similar to this study, their approaches differ slightly in that they rely on mathematical and cryptographic operations. The current study applied verifiable computing to trace the execution flow of programs and demonstrated that the proposed methods can verify operations and identify whether computations were properly executed. Methods that can implement verifiable computing include PoW, TEE, and TPM, and [25], [36]-[39] investigated ways to ensure the integrity of work and programs by utilizing these methods. PoW-based verifiable computing, the application target of this study, was covered in [25], which investigated efficient statistical techniques using PoW for cloud and fog computing and proposed a method that provides secure and transparent transactions by verifying data blocks while addressing PoW's limitations of high resource consumption and long working time. To improve computing security, [36]-[39] used hardware-based methods applying TEE and TPM, which ensure program integrity. The studies referenced in [36] aimed to provide a secure and reliable distributed computing environment, as does our study; however, they rely on TPM support for credentials. TEE has recently been applied to distributed computing, such as blockchain (e.g., [39]), making it possible to verify mathematically whether computing was correctly performed while maintaining security.

C. DEBUGGING
A characteristic of this study is that it addresses only cases that become non-deterministic due to secondary factors in a single-threaded setting. As in [40]-[43], almost all attempts to make programs deterministic for effective debugging targeted multi-threaded programs. The study in [44] is similar in that it deals with deterministic execution, but its main subject is multi-threaded execution. Prior research adopted the solution of serializing the parallel execution flow, whereas this study differs in that it fixes changes caused by system calls or by factors external to the program.

VIII. CONCLUSION
This study developed an execution hash generation tool using LLVM, evaluated its performance and the occurrence of non-deterministic executions on the binutils, coreutils, and OSS-Fuzz program sets, and discussed its feasibility as a PoW mechanism based on the findings. According to the test results, approximately 81 non-deterministic errors were observed among approximately 20 million executions of binutils and coreutils programs, and about 166 non-deterministic errors among approximately one billion executions of OSS-Fuzz programs. Certain cases were also selected among these non-deterministic errors to analyze and identify their causes. The developed tool is expected to be applicable to verifiable computing as well.