Using Algebra Graph Representation to Detect Pairwise-Constraint Software Faults

Automatic fault detection, localization, and repair have long been a research hot spot. Because software faults can appear anywhere in a software system, it is impossible to automatically repair all types of faults. One possible solution is to detect, localize, and fix one specific fault type at a time according to its fault signature pattern. Along this direction, we propose an approach that detects pairwise-constraint software faults using algebra graph representation. The approach takes a program and a pairwise-constraint as inputs and generates a fault report for programmers. It first generates a constraint control flow graph (CFG) that is labeled using the pairwise-constraint and then translates the constraint CFG into a path expression using algebra graph representation, which is an abstract path model for the program. Finally, it employs a detection algorithm to determine whether the program contains pairwise-constraint faults. We perform case studies to validate the effectiveness of our approach. The preliminary results show that the approach can detect pairwise-constraint software faults before software testing.


I. INTRODUCTION
If programmers were perfect, faults would not be present in software. However, most of us make mistakes when designing and implementing software. The faults hidden in software need to be detected and fixed to improve software quality. Ideally, an automatic software fault repair tool would detect, localize, and fix program faults without the intervention of a human programmer. As a move towards this goal, an early automated program repair tool called GenProg was proposed by Weimer et al. [1]. Subsequently, many similar techniques have been proposed [2]-[4]. These are called generate-and-validate techniques: they compile and test each candidate patch to collect all validated patches that produce the expected outputs for all inputs in the test suite.
The associate editor coordinating the review of this manuscript and approving it for publication was Zhaojun Li.
The reported results of recent state-of-the-art techniques are generally promising. However, program repair techniques assume that faults have already been detected and accurately localized. Although great progress has been made in recent years in software testing and automatic fault localization (such as spectrum-based fault localization techniques [5], [6]), accurate fault localization for all existing faults is impossible due to the complexity and diversity of faults [7]-[9]. One possible solution is to detect, localize, and fix one specific fault type at a time according to its fault signature pattern.
To better understand the characteristics of faults, researchers in recent years have studied faults from real systems [22], [24]-[27]. Generally, software faults can be classified along three dimensions: root cause, impact, and software component. According to the root causes of software failures, we can divide software faults into the following types: memory-related faults, concurrency faults, and semantic faults [31]. Memory-related faults cause software failures through improper handling of memory objects. Concurrency faults occur only in multi-threading (or multi-process) environments and include data races, deadlocks, and synchronization faults. Semantic faults are faults other than memory-related or concurrency faults that cause the software execution result to be inconsistent with the programmer's expectations. The types of semantic faults mainly include function losses, control flow faults, exception handling faults, editing oversights, solution faults, etc. An empirical study [31] showed that semantic faults account for the majority of software faults: 87.0% in Mozilla, 82.5% in Apache, and 70.1% in the Linux kernel. The percentage of semantic faults increases with the maturity of the software project. Unlike memory-related and concurrency faults, semantic faults have no distinguishing symptoms (for example, system crashes). Semantic fault types in software are diverse, and the characteristics of semantic faults are closely related to their types. It is therefore difficult to devise a general method that accurately detects and localizes all semantic faults, which makes semantic fault detection more challenging.
A common source of semantic faults is illegal sequences, such as control flow [11], [12] and data flow [14]-[16] sequences, in a program execution. Pairwise-constraint faults are one type of semantic fault. A pairwise-constraint for a program requires that two specified operations appear in pairs and in a certain order in all execution paths. Namely, a pairwise-constraint is a rule that imposes pairwise restrictions on the order in which certain program entities may be executed, which can be expressed informally as <op1, op2>. A rule that is not correctly designed or implemented becomes a pairwise-constraint fault. For instance, in a C program, a failure would be triggered if the function fopen were called to open a specified file and the file were not closed (by calling the function fclose) before the program ends. The pair <fopen, fclose> in a C program is a pairwise-constraint. Although pairwise-constraints may seem simple, this type of semantic fault is difficult to detect because of the large number of loops that can intervene during program execution. For this type of fault, if we know that a program has a pairwise-constraint, it is possible to detect, localize, and fix the fault automatically.
In this article, we propose an approach that uses algebra graph representation to detect pairwise-constraint software faults. The approach takes three steps. First, it automatically generates a CFG and labels it with the pairwise-constraint to form a constraint CFG (Con-CFG). Second, the Con-CFG is represented algebraically as a path expression. Third, a fault detection algorithm checks whether there are pairwise-constraint faults and generates a fault report for programmers.
The main contribution of our work is twofold: (1) To the best of our knowledge, this work is the first to use algebra graph representation to detect pairwise-constraint faults.
(2) We perform case studies to validate the effectiveness of our approach. The preliminary results show that the approach can detect pairwise-constraint software faults before software testing.
The rest of this article is structured as follows. Section II gives an example to show our motivation. Section III defines the constraint control flow graph and represents it algebraically. Section IV elaborates on our approach. Section V presents case studies to illustrate the effectiveness of our approach. Section VI presents related works and a discussion. Finally, Section VII draws our conclusions and outlines future work.

II. MOTIVATION EXAMPLE
In this section, we illustrate our research motivation with the example shown in Figure 1, which contains a pairwise-constraint fault. S_5 (fo.fopen("fileopen.txt","r")) should be the predecessor of S_4 (fo.close()), not its successor. When the program was run, it crashed on reaching statement S_7 through S_4 and S_5, printing the error message "I/O operation on closed file" when variable i>2. The reason for this failure is that the two operations <fopen, fclose> must occur in pairs and in the correct order.
Modern compilers are powerful enough to spot some software faults, such as variables being used before they are initialized. However, they cannot detect this defect because it is program-specific. We can use program testing to detect the defect based on certain test criteria. Shi et al. proposed an approach to detect concurrency and sequential bugs using definition-use invariants [14]. However, pairwise-constraint faults do not always depend on definition-use invariants. Our approach differs from other approaches in that it takes a program and a pairwise-constraint as inputs, uses algebra graph representation to detect pairwise-constraint software faults, and generates fault reports for programmers.

A. ALGEBRA GRAPH REPRESENTATION
In this section, we first introduce the concepts of control flow graph (CFG) and pairwise-constraint, then propose Constraint-CFG, and finally give the algebra graph representation.
Let P = {c_1, c_2, ..., c_n} be a faulty program containing n program entities. A program entity can be a statement, a basic block, a method, or another program unit.
Definition 1: A control flow graph (CFG) of a program is a tuple G = (N, E, n_0, n_e), where (N, E) is a finite directed graph. N is a set of nodes that map to the basic blocks in the program, and E ⊆ N × N is a set of directed edges connecting the basic blocks. n_0 is the entry basic block, and n_e is the exit basic block of the program.
Definition 2: Given a CFG G = (N, E, n_0, n_e) of a program, a path in a program execution, δ = n_0 → n_i → ... → n_e, is a sequence of nodes that starts with n_0 and ends at n_e, where each pair of adjacent nodes <n_i, n_j> is in the edge set E.
Definition 3: A pairwise-constraint for a program requires that two specified nodes associated with some operations appear in pairs and in a certain order in every path δ.
Namely, the two operations occur as a pair, and their behaviors negate each other, or one must be done before the other. The following pairwise operations in a program have a pairwise-constraint:
• <Push, Pop>
• <Enqueue, Dequeue>
• <Getting memory, Disposing of memory>
• <Open, Close>
Example 1: Figure 2 shows an example of a CFG. It can be represented as G = ({n_0, n_1, ..., n_6}, {<n_0, n_1>, <n_1, n_2>, <n_2, n_3>, <n_3, n_4>, <n_4, n_3>, <n_3, n_5>, <n_5, n_6>}, n_0, n_e). A path δ = <n_0, n_1, n_3, n_5, n_6> is covered in a program execution. In the CFG, the pairwise-constraint is <fopen, fclose>. The node n_1 is associated with the operation fopen, and the nodes n_4 and n_5 are associated with the operation fclose. The two operations should occur as a pair, and an fclose operation must be preceded by an fopen operation. Therefore, a path δ = <n_0, n_1, n_3, n_4, n_3, n_5, n_6>, which contains the sequence fopen → ... → fclose → ... → fclose, must trigger a software failure because the pairwise-constraint is violated.
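The check described in Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node-to-operation map `NODE_OPS` and the function name `violates_constraint` are hypothetical, and the counter-based rule (every q needs an earlier unmatched p, and no p may be left unmatched) is one simple reading of Definition 3.

```python
from typing import Dict, List

# Hypothetical node-to-operation map for Example 1: n1 calls fopen (label "p"),
# n4 and n5 call fclose (label "q"); all other nodes are unrelated.
NODE_OPS: Dict[str, str] = {"n1": "p", "n4": "q", "n5": "q"}


def violates_constraint(path: List[str], node_ops: Dict[str, str]) -> bool:
    """Return True if the path breaks the <p, q> pairing.

    Assumption: a path is legal when every q is preceded by an unmatched p
    and no p is left unmatched at the end of the path.
    """
    open_count = 0
    for node in path:
        op = node_ops.get(node)
        if op == "p":
            open_count += 1
        elif op == "q":
            if open_count == 0:
                return True  # q without a preceding unmatched p
            open_count -= 1
    return open_count != 0  # a leftover p is also a violation


ok_path = ["n0", "n1", "n3", "n5", "n6"]
bad_path = ["n0", "n1", "n3", "n4", "n3", "n5", "n6"]
print(violates_constraint(ok_path, NODE_OPS))   # False
print(violates_constraint(bad_path, NODE_OPS))  # True
```

Checking individual paths like this is only practical for a handful of paths; the algebraic representation introduced next is what lets the approach reason about all paths at once.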
Generally, those pairwise constraints are shown or implied in the software specification. We can use some machine learning techniques (such as topic model) to automatically extract these pairwise constraints. To simplify the work, pairwise constraints are specified by the programmer in our work.
Definition 4: For a CFG and a pairwise-constraint, a constraint-CFG (Con-CFG) is a CFG G' = (N', E', n_0, n_e) simplified by the pairwise-constraint. Different from a general CFG, each edge in E' is labeled with p, q, or 1, where p means the edge is associated with the predecessor of the pairwise-constraint, q means the edge is associated with the successor of the pairwise-constraint, and 1 means the edge is unrelated to the pairwise-constraint.
Example 2: Figure 3 shows an example of a Con-CFG. The edges in the Con-CFG are labeled with p (the edge is associated with operation fopen()) or q (the edge is associated with operation fclose()). There is no label 1 here because all the unrelated edges have been simplified away. The Con-CFG is represented as G' = ({n_0, n_1, n_2}, {<n_0, n_1>: p, <n_1, n_1>: q, <n_1, n_2>: q}, n_0, n_2).
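As a rough illustration, the Con-CFG of Example 2 can be stored as a list of labeled edges, and its label sequences enumerated by a depth-first walk. The triple encoding and the function name `labeled_paths` are assumptions made for this sketch, and the walk uses each edge at most once so that the self-loop does not produce infinitely many paths.

```python
from typing import List, Tuple

# Con-CFG of Example 2 as (source, target, label) triples.
EDGES: List[Tuple[str, str, str]] = [
    ("n0", "n1", "p"),
    ("n1", "n1", "q"),  # self-loop
    ("n1", "n2", "q"),
]


def labeled_paths(entry: str, exit_: str, edges=EDGES) -> List[str]:
    """Enumerate label strings of entry-to-exit paths, using each edge at most once."""
    results: List[str] = []

    def walk(node: str, used: frozenset, labels: str) -> None:
        if node == exit_:
            results.append(labels)
            return
        for i, (src, dst, label) in enumerate(edges):
            if src == node and i not in used:
                walk(dst, used | {i}, labels + label)

    walk(entry, frozenset(), "")
    return sorted(results)


print(labeled_paths("n0", "n2"))  # ['pq', 'pqq']
```

The second string, pqq, closes the file twice after a single open, which is the kind of path the detection step is meant to flag.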
To resolve our issue, the Con-CFG is represented algebraically, so that it can be manipulated using standard algebraic operations and converted to a regular expression. In a regular expression, each unique name maps to an edge in the CFG. Generally, there are three basic structures: sequential, branching, and cyclic.
• Sequential structure: If edge a is followed by edge b, their product is ab. The operator * is not written explicitly.
• Branching structure: If either edge a or edge b can be taken, their sum is a + b.
• Cyclic structure: If an edge, path product, or path expression can be repeated, then it is labeled with an exponent. For instance, a^n means edge a is repeated n times during a program execution, and a* means it is repeated an arbitrary number of times.
Example 3: Figure 4 shows a graph in which the edges are all labeled. We apply the above three rules iteratively, and the final regular expression in our example is abde*f + acde*f(gde*f)*hi.
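Because a path expression is a regular expression over edge names, it can be checked with an ordinary regex engine. The sketch below assumes single-character edge names and rewrites the additive operator '+' as the regex alternation '|'; the helper name `is_path` is hypothetical.

```python
import re

# Path expression from Example 3, with '+' (choice) rewritten as '|'
# and the exponent '*' kept as the regex Kleene star.
PATH_EXPRESSION = "abde*f+acde*f(gde*f)*hi"
pattern = re.compile("|".join(PATH_EXPRESSION.split("+")))


def is_path(edge_sequence: str) -> bool:
    """Check whether a string of edge names is a path described by the expression."""
    return pattern.fullmatch(edge_sequence) is not None


print(is_path("abdf"))        # True
print(is_path("abdeef"))      # True
print(is_path("acdfgdefhi"))  # True
print(is_path("abdfhi"))      # False
```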

B. RIPR MODEL
There are many types of mistakes that occur during software development. The following three terms are adopted from IEEE conventions:
Definition 5: A fault (bug) is a static defect in the software.
Definition 6: A failure is an external, incorrect behavior that can be observed and does not meet the software requirements.
Definition 7: An error is an incorrect internal state that cannot be observed directly. It is the manifestation of faults in program execution.
Offutt and Morell independently proposed the RIP (fault and failure) model, in which they pointed out that three conditions must hold for a software failure to be observed. Their ideas were published using different notations. Recently, Li [13] extended the RIP model and highlighted that there are four necessary conditions, namely Reachability, Infection, Propagation, and Revealability (RIPR). The RIPR model, shown in Figure 5, is central to the software testing community. The four conditions are defined as follows:
• Reachability: The program entity containing the fault(s) must be reached.
• Infection: The faulty program entities must infect the internal state of the program.
• Propagation: The infected state must propagate, causing some output to be incorrect.
• Revealability: The tester must observe part of the incorrect portion of the program state.
Based on the RIPR model, to detect a fault in a program, programmers should choose input data that reach the faulty program entity, infect the internal state, and finally trigger an observable software failure. Ammann and Offutt [10] classified test criteria into four types: input space partitioning criteria, logic coverage criteria, graph coverage criteria, and syntax coverage criteria. These criteria address the RIPR conditions either explicitly or implicitly.
We refocus on pairwise-constraint faults. Although pairwise-constraint faults cannot explicitly satisfy the RIPR model, RIPR can explain the process of the fault triggering. A pairwise-constraint is a rule that imposes some pairwise restriction on the order in which certain program entities may be executed. Therefore, the pairwise-constraint fault can be detected in program execution paths if the pairwise-constraint is known. Our static approach can detect pairwise-constraint software faults before software testing.

IV. OUR APPROACH
A. FRAMEWORK
Figure 6 shows the workflow of our approach. It takes a program and a pairwise-constraint as inputs and generates a fault report for pairwise-constraint faults. The pairwise-constraint is specification-related; it can be extracted automatically from the software specification or specified by programmers. In our approach, the pairwise-constraint is given by the programmers. For example, if the goal is to check the pairwise-constraint <connect, disconnect> for a database in a program, we need to extract the keywords of the connect and disconnect statements as the pairwise-constraint input. Guided by the pairwise-constraint of the programmer's interest, our approach labels and simplifies the CFG of a program to generate a Con-CFG.
Our approach consists of three steps to detect pairwise-constraint faults in a program.
Step 1 builds a CFG for the program, labels the edges of the CFG using a pairwise-constraint of the programmer's interest, and finally simplifies the CFG to generate the Con-CFG. Step 2 directly translates the Con-CFG into a path expression using algebra graph representation. The elements of the path expression are labeled p, q, and 1. Therefore, the path expression is a simplified version of the algebraic representation of the program's CFG. Step 3 reduces the path expression using the complement operation for the pairwise-constraint and detects paths that violate the pairwise-constraint. In the following sections, we discuss the three steps.

B. CON-CFG GENERATION
In this section, we focus on the generation of the Con-CFG. Given a CFG G = (N, E, n_0, n_e) of a program, we construct a Con-CFG G' = (N', E', n_0, n_e) based on the pairwise-constraint. There are two steps: labeling the edges of the CFG and simplifying the labeled CFG.
In a CFG, the operations of a pairwise-constraint are associated with nodes. For example, the node n_1 is associated with operation fopen() in Figure 2. In a Con-CFG, we mainly focus on the path information for the operations in the pairwise-constraint. Therefore, we label each edge in the Con-CFG. We divide the edges in the Con-CFG into three types:
• p-Edge: Creator operation (such as fopen, push, enqueue, etc.), labeled as p.
• q-Edge: Destructor operation (such as fclose, pop, dequeue, etc.), labeled as q.
• 1-Edge: Neither a creator nor a destructor, labeled as 1.
Note that there are a number of pairwise-constraint-independent edges in the CFG, and we need to reduce these edges to facilitate our later work. Theoretically, there are two reduction strategies: (1) reduce the CFG first and then express it in algebraic form; (2) express it in algebraic form and then reduce it. Strategy 1 lowers the cost of the algebra graph representation. Considering different program structures, we apply different reduction rules, and we use them iteratively until the CFG no longer changes. The three rules are described below:
• Sequence structure: If successive edges are labeled as 1, they are merged directly into one edge.
• Branching structure: If all branches are labeled as 1, the branches degenerate directly into an edge, and the edge is labeled as 1.
• Loop structure: If the edges of a loop are labeled as 1, the loop degenerates directly into an edge, and the edge is labeled as 1.
Example 4: Figure 7 shows an example where all the edges of the CFG are labeled as p, q, or 1. First, we combine all edges in sequence structures. For <n_1, n_2> and <n_2, n_3>, we merge the two edges into an edge labeled 1. For <n_3, n_4> and <n_4, n_5>, we merge the two edges into an edge labeled q. Figure 8 gives the result of combining the edges of sequence structures. In Figure 7, there is a branching structure (<n_1, n_3>, <n_1, n_3>), and the two branches degenerate directly into a single edge. The result, shown in Figure 9, contains a cycle. However, the loop includes an edge labeled q, so it cannot be degenerated. We apply the three rules until the graph no longer changes. The final result is shown in Figure 9.
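The three reduction rules can be sketched as a fixed-point loop over a labeled edge list. This is a simplified illustration rather than the implementation used in the paper: the function name `reduce_con_cfg` and the sample graph are hypothetical (the graph is not the one in Figure 7), a self-loop labeled 1 is simply dropped, and the sequence rule is applied whenever at least one of two successive edges is labeled 1 (so that 1 followed by q becomes q, as in Example 4).

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source, target, label) with label in {"p", "q", "1"}


def reduce_con_cfg(edges: List[Edge], entry: str, exit_: str) -> List[Edge]:
    """Apply the sequence, branching, and loop rules until nothing changes."""
    edges = list(edges)
    changed = True
    while changed:
        changed = False

        # Loop rule (simplified): a self-loop labeled 1 adds nothing and is dropped.
        kept = [e for e in edges if not (e[0] == e[1] and e[2] == "1")]
        # Branching rule: duplicate parallel edges labeled 1 collapse into one.
        deduped: List[Edge] = []
        for e in kept:
            if e[2] == "1" and e in deduped:
                continue
            deduped.append(e)
        if len(deduped) != len(edges):
            edges, changed = deduped, True
            continue

        # Sequence rule: splice out a node with one in-edge and one out-edge
        # when at least one of the two edges is labeled 1 (1 * x = x).
        nodes = {n for s, t, _ in edges for n in (s, t)} - {entry, exit_}
        for v in sorted(nodes):
            incoming = [e for e in edges if e[1] == v]
            outgoing = [e for e in edges if e[0] == v]
            if len(incoming) != 1 or len(outgoing) != 1:
                continue
            (u, _, a), (_, w, b) = incoming[0], outgoing[0]
            if u == v or w == v or "1" not in (a, b):
                continue
            merged = (u, w, b if a == "1" else a)
            edges = [e for e in edges if e not in (incoming[0], outgoing[0])]
            edges.append(merged)
            changed = True
            break
    return edges


# Hypothetical labeled CFG: open, an unrelated branch, a loop that closes, a final close.
SAMPLE: List[Edge] = [
    ("n0", "n1", "p"), ("n1", "n2", "1"), ("n2", "n3", "1"), ("n1", "n3", "1"),
    ("n3", "n4", "1"), ("n4", "n3", "q"), ("n3", "n5", "q"),
]
print(sorted(reduce_con_cfg(SAMPLE, "n0", "n5")))
# [('n0', 'n3', 'p'), ('n3', 'n3', 'q'), ('n3', 'n5', 'q')]
```

The reduced graph has the same shape as the Con-CFG of Example 2: one p edge, a self-loop labeled q, and a final q edge. The loop survives because it carries a q label.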

C. ALGEBRA GRAPH REPRESENTATION
In essence, pairwise-constraint fault detection checks whether each possible execution path violates the pairwise-constraint. However, the number of paths may explode due to the large number of loops in a program. An algebra graph representation can be manipulated using standard algebraic operations and converted to a regular expression. These operations can then be used to answer various questions about the graph.
As mentioned above, there are two main operations: the multiplicative operation (×, which is generally not written explicitly) and the additive operation (+). Concatenating edges forms a path, so a sequence of edges is called a path product. A path expression contains path products and zero or more + operators. Thus, every path product is a path expression.
As far as we know, this process has not been automated and implemented in a tool. However, it is a special case of the common technique of constructing regular expressions from deterministic finite automata. We implement the algorithm based on the Con-CFG.
Figure 10 shows an example graph. We can represent the graph algebraically, and the path expression is p(q)*q. In the path expression, the exponent * denotes repeated multiplication, and it maps to a loop in the program.
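One standard way to construct a regular expression from an automaton is state elimination, sketched below for a labeled edge list. Whether the authors' implementation works this way is not stated, so this is only an illustration; the names `wrap` and `path_expression` are hypothetical, and the sketch assumes the entry node has no incoming edges and the exit node has no outgoing edges.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source, target, label)


def wrap(expr: str) -> str:
    """Parenthesize expr if it has a '+' outside any parentheses."""
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "+" and depth == 0:
            return f"({expr})"
    return expr


def path_expression(edges: List[Edge], entry: str, exit_: str) -> str:
    """Eliminate inner nodes one by one, folding their paths into the remaining edges."""
    expr: Dict[Tuple[str, str], str] = {}
    for u, v, label in edges:
        # Parallel edges are joined with the additive operator.
        expr[(u, v)] = f"{expr[(u, v)]}+{label}" if (u, v) in expr else label

    inner = {n for u, v, _ in edges for n in (u, v)} - {entry, exit_}
    for k in sorted(inner):
        loop = expr.pop((k, k), None)
        star = f"({loop})*" if loop else ""
        ins = [(u, e) for (u, v), e in expr.items() if v == k]
        outs = [(w, e) for (u, w), e in expr.items() if u == k]
        expr = {key: e for key, e in expr.items() if k not in key}
        for u, a in ins:
            for w, b in outs:
                term = f"{wrap(a)}{star}{wrap(b)}"
                expr[(u, w)] = f"{expr[(u, w)]}+{term}" if (u, w) in expr else term
    return expr.get((entry, exit_), "")


# Con-CFG of Example 2: one p edge, a self-loop labeled q, and a final q edge.
example2 = [("n0", "n1", "p"), ("n1", "n1", "q"), ("n1", "n2", "q")]
print(path_expression(example2, "n0", "n2"))  # p(q)*q
```

The sketch does not simplify products that contain the label 1, so it is best applied after the Con-CFG has been reduced.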

D. FAULT DETECTION
Given a path expression σ, our goal is to check whether each path σ_i violates the pairwise-constraint. Two problems need to be solved: how to reduce the path expression and how to handle cycles in a program.
For path expression reduction, we simplify the path expression using the complementary operation of the pairwise-constraint. Two operations form a pairwise-constraint if their behaviors negate each other or one must be done before the other. The complementary operation defines two operation tables, shown in Figure 11, which mathematicians have studied to define other algebraic operations [10]. The multiplicative and additive operators of the regular expression are changed based on the two operation tables. We can easily see that p × q reduces to 1, p + p reduces to p, and q + q reduces to q.
The problem of handling loops has plagued graph-based criteria from the beginning. We want to cover paths, but loops create an infinite number of paths. A simple path is a path from n_i to n_j in which no node appears more than once, and a prime path is a simple path that does not appear as a proper subpath of any other simple path [13]. In the software testing community, prime paths are a way to deal with loops under graph coverage criteria. Essentially, a prime path expands the nodes in a loop zero times or once. Therefore, one way to handle loops is to expand the paths in a loop zero times or once in the path expression.
The fault detection algorithm is illustrated in Algorithm 1. It takes a path expression and a pairwise-constraint as inputs. The approach has three major steps. (1) Path extraction: we first extract paths from the path expression by handling program loop(s) (lines 1-4). (2) Path reduction: we reduce each path using the above operation tables, in which we handle operator '+' and operator '*' respectively (lines 5-12). (3) Fault report: each path extracted from the path expression is reduced, and it is added to the fault report if it does not reduce to '1'. Finally, the fault report is returned to programmers for further checking (lines 13-17).
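The three steps above can be sketched in Python as follows. This is not the authors' Algorithm 1; it is a minimal illustration under several assumptions: path expressions are encoded as nested tuples (a hypothetical format), the reduction cancels each p with the next unmatched q as a stand-in for the operation tables of Figure 11, and the `max_unfold` parameter (default 1, meaning each loop is unfolded zero times or once) is an added knob for the number of loop iterations.

```python
from itertools import product
from typing import List, Set, Tuple, Union

# A path expression is a label ("p", "q", "1") or a tagged tuple:
# ("seq", e1, e2, ...), ("alt", e1, e2, ...), or ("star", e).
Expr = Union[str, Tuple]


def extract_paths(expr: Expr, max_unfold: int = 1) -> Set[str]:
    """Enumerate path products, unfolding each loop 0..max_unfold times."""
    if isinstance(expr, str):
        return {"" if expr == "1" else expr}
    tag, *parts = expr
    if tag == "alt":
        return set().union(*(extract_paths(p, max_unfold) for p in parts))
    if tag == "seq":
        choices = [extract_paths(p, max_unfold) for p in parts]
        return {"".join(combo) for combo in product(*choices)}
    if tag == "star":
        body = extract_paths(parts[0], max_unfold)
        paths = {""}
        for k in range(1, max_unfold + 1):
            paths |= {"".join(combo) for combo in product(body, repeat=k)}
        return paths
    raise ValueError(f"unknown tag: {tag}")


def reduce_path(path: str) -> str:
    """Cancel each p with the next unmatched q (p * q reduces to 1)."""
    stack: List[str] = []
    for label in path:
        if label == "q" and stack and stack[-1] == "p":
            stack.pop()
        else:
            stack.append(label)
    return "".join(stack) or "1"


def detect_faults(expr: Expr, max_unfold: int = 1) -> List[Tuple[str, str]]:
    """Return (path, residue) for every extracted path that does not reduce to 1."""
    report = []
    for path in sorted(extract_paths(expr, max_unfold)):
        residue = reduce_path(path)
        if residue != "1":
            report.append((path or "1", residue))
    return report


# A faulty expression with an optional extra open: p(p + 1)q
case1 = ("seq", "p", ("alt", "p", "1"), "q")
# A fault-free nested usage: p((pq)* + 1)q
nested = ("seq", "p", ("alt", ("star", ("seq", "p", "q")), "1"), "q")

print(detect_faults(case1))   # [('ppq', 'p')]
print(detect_faults(nested))  # []
```

The residue in each report entry shows which operations stayed unmatched: a leftover p means an open without a close, a leftover q means a close without an open, and a residue such as qp means the two operations appear in the wrong order.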
For a pairwise-constraint fault, the first question to ask is: "Is it possible to have more 'q' operations than 'p'?" The second question is: "Is it possible to have more 'p' operations than 'q'?" The third question is: "Is it possible to have an order <q, p> in the path reduction eventually?" If the answer to the third question is 'yes', then some example expressions are as follows:
• qppp
• qp(q)^n, ∀n
Each 'yes' answer represents a specification likely to cause anomalous behavior, and the corresponding path should be added to the fault report in our approach.

V. CASE STUDIES
Empirical studies have been undertaken to demonstrate the feasibility of our approach. The goal is to demonstrate that our approach can be effectively used, and we hope to evaluate it more fully in the future. We illustrate it with a small program containing a class that encapsulates operations on a file, which can be found in [10]. The class FileADT mainly has two operations:
• fopen(String FName) // Opens the file with the name FName.
• fclose(String FName) // Closes the file and makes it unavailable for use.
The class has a pairwise-constraint for the operations <fopen, fclose>. Figure 12 gives a partial CFG that represents a unit using the <fopen, fclose> operations. We use this graph to illustrate how our approach detects pairwise-constraint faults. There are two questions regarding the effectiveness of our approach:
• RQ1: Can our approach effectively detect faults in faulty programs?
• RQ2: Can our approach report correctly on fault-free programs, that is, without false alarms?

A. ANSWER RQ1
Pairwise-constraint faults fall into four cases in a program: (1) more p operations than q operations, (2) more q operations than p operations, (3) pairwise-constraint faults occurring in multiple pairs, and (4) a q operation occurring before p. To answer RQ1, we seed pairwise-constraint faults for the four cases in the program. The seeded faults are shown in Figure 13. To simplify our work, we only show the CFG of the program fragment. Then, the CFG is labeled, simplified, and represented as a path expression for the Con-CFG. Finally, a fault report is generated by the fault detection algorithm. For case 1, there exist some paths including more p operations than q operations, such as <n_0, n_1, n_2, n_4, n_8>. If the program executes those paths, it triggers a software failure. Our approach reduces the graph and eventually generates the path expression δ = p(p + 1)q. Based on Algorithm 1, there exist two paths: δ_1 = pq and δ_2 = ppq. We simplify the two paths and find that the path δ_2 violates the pairwise-constraint.

Algorithm 1 Detecting Fault Algorithm
Input: Path expression σ, pairwise-constraint <p, q>
Output: Fault report
1: δ_S ← ∅, ← ∅
2: for each Operation '*' do
3:   δ_S ← δ_S ∪ exactPath(δ)
4: end for
5: for each δ_Si ∈ δ_S do
6:   for each element e ∈ δ_Si do
7:     if current Operation = '+' then
8:

For case 2, the program has some paths, such as <n_0, n_1, n_3, n_5, n_6, n_3, n_4, n_8>, including more q operations than p operations. Given the CFG and the pairwise-constraint <p, q> as inputs, our approach reduces the graph and eventually generates the path expression δ = p((q + 1)* + 1)q. Because loops exist, our approach unfolds the loop zero times and one time. The path set is {δ_1 = pq, δ_2 = pqq}, and it is easy to check that δ_2 contains a pairwise-constraint fault. From case 2, we can see that the paths are relatively simple and that the number of paths is reduced in our approach.
In case 3, pairwise-constraint faults occur in multiple pairs. There are two pairs of operations: we first open a file and close it sometime later, and then open it again and close it later. We generate the path expression δ = p((p + 1)* + 1)qq. Similar to case 2, the loop is unfolded zero times and one time. The path set is {δ_1 = pqq, δ_2 = ppqq}. The path δ_1 contains the pairwise-constraint fault.
In case 4, the q operation occurs before p. In this case, there are a number of paths that are independent of the pairwise-constraint. The path expression is δ = qp. Clearly, the path expression contains a pairwise-constraint fault.
We seed pairwise-constraint faults for these four cases and apply our approach to detect pairwise-constraint faults. Our approach outputted fault reports. In the fault reports, all the suspicious paths are given. The experimental results show that our approach can detect pairwise-constraint software faults in the four cases. According to the case studies, we have reason to believe our approach can effectively detect pairwise-constraint faults in the faulty programs.

B. ANSWER RQ2
Usually, the programmer follows the pairwise-constraints when designing and implementing the program. In this case, the program has no pairwise-constraint faults, and our approach should not generate false positives.
There are usually two pairwise-constraint usage patterns. (1) Single <p, q> usage pattern: the pairwise-constraint is used only once. For instance, a program opens a file and closes it after writing or reading.
(2) Multiple <p, q> usage pattern: the pairwise-constraint is nested, such as <push, push, pop, pop>. We give two cases for the two usage patterns in Figure 14. To answer RQ2, we apply our approach to the two cases to detect pairwise-constraint faults. In case 1, the p operation is in the initial node, and the q operation is in the last node, which is a very common usage pattern for a pairwise-constraint. The path expression created by our approach is δ = pq. Clearly, the path expression has no pairwise-constraint faults.
For case 2, the pairwise-constraint is nested. In case 2 in Figure 14, we can easily see that a pairwise-constraint is in a loop. The path expression of the Con-CFG generated by our approach is δ = p((pq)* + 1)q = pq = 1. The path expression has no pairwise-constraint faults.
In the above two cases, our approach did not output any fault messages, namely, it had no false positives. From the above results, we found that our approach can be applied in practice.

A. RELATED WORKS
In this section, we describe closely-related studies and discuss our approach.
(1) Software testing. The pairwise-constraint fault is a special case of sequencing-constraint faults. Olender and Osterweil first proposed an approach to generate tests to satisfy sequencing constraints [11]. Our approach could be extended to detect sequencing-constraint faults. The failure triggered by a pairwise-constraint fault is related to the paths covered. The notion of subpath set was developed by Offutt et al. to support interclass path testing [12]. Prime paths were introduced and first appeared in the research literature in an experimental comparison [13]. Other software testing approaches can also be used to detect pairwise-constraint faults [21], [23]. Different from the above approaches, our approach is based on static program analysis, whose detection results may include false negatives and false positives. In future work, we plan to combine the two techniques to detect semantic faults. For instance, the fault report produced by our approach can also be viewed as a test requirement to drive program testing.
(2) Program static analysis. Shi et al. applied definition-use invariants to detect concurrency and sequential bugs [14]. Huang et al. investigated root causes of failures and identified five error-prone aspects in a fast path that was implemented to speed up the critical and commonly-executed functions in a workflow [28]. They found that many of the deep faults can be prevented by applying static analysis incorporating simple semantic information, and they extracted a set of rules based on their findings and built a toolkit, PALLAS, to check fast-path bugs. The above two works are similar to our approach: we also consider the semantics of faults and perform fault detection based on extracted specification-related rules (in this article, the rule is the pairwise-constraint).
Cadar et al. proposed a symbolic execution tool, KLEE, capable of automatically generating tests that achieve high coverage for complex programs [20]. Symbolic execution performs program operations symbolically on variables. When symbolic execution reaches a branch, it performs a fork, and it creates a new branch in each iteration of a loop. We can use KLEE as a fault detection tool. Different from KLEE, our approach considers the fault semantics, which are specification-related. Driven by semantic information, it is more convenient to detect, localize, and fix software faults.
(3) Fault localization. There exist many fault localization techniques to localize fault(s) in a program. The spectrum-based fault localization technique is a popular fault localization technique [18], [29], [30]. It outputs a list of suspicious program entities in descending order based on their likelihood to be a root fault, and it is a popular method used by programmers to assist them in debugging. A spectrum was first introduced by Jones et al. [5]. A program spectrum consists of execution information from a perspective of interest. For example, a path spectrum may contain simple information, such as whether a path has been executed. Inspired by spectrum-based fault localization techniques, we can use the idea to localize the pairwise-constraint fault after the fault is detected by our approach.

B. DISCUSSION
According to the RIPR model mentioned above, our approach can effectively detect pairwise-constraint faults, and we did not observe false negatives or false positives in the case studies. However, two factors can produce false negatives and false positives in our approach.

1) LOOP PROCESSING
A loop in a program can lead to an infinite number of paths, which cannot be enumerated. Inspired by prime paths in the software testing community, our approach expands the sub-paths of each loop zero times or once. Therefore, our approach cannot detect a pairwise-constraint fault if it appears only on a path where the loop executes more than once. In this case, our approach may produce a false negative. A tradeoff solution is to let programmers specify the number of executions per loop.

2) INFEASIBLE PATH
In essence, our approach is a static approach, which enumerates all paths in a program without executing it. Some of these paths may be infeasible: an infeasible path is a path that cannot be covered during any program execution, for example because it passes through dead code whose statements cannot be reached. Infeasible path detection is an undecidable problem. Therefore, there will be some false positives. To address this problem, we can combine dynamic testing with our approach in future work to help programmers check false positives.

VII. CONCLUSION AND FUTURE WORKS
In this article, we proposed an approach that uses algebra graph representation to detect pairwise-constraint software faults. We performed case studies to validate the effectiveness of our approach. The preliminary results showed that the approach can detect pairwise-constraint software faults before software testing.
VOLUME 8, 2020
There are several possible directions for our future work. First, our approach assumes that the pairwise-constraints are known, an assumption that does not always hold. Therefore, we plan to acquire pairwise-constraints by applying data mining techniques to the specification documents of the software. Second, it is interesting to pinpoint fault locations based on the fault reports generated by our approach. Third, to better validate the effectiveness of our approach, we plan to apply it to an ongoing large-scale industrial project in collaboration with our industry partners. Finally, pairwise-constraint faults are a special case of sequencing-constraint faults, and we plan to extend our study to detect sequencing-constraint faults in future work.