Heuristic Synthesis of Reversible Logic – A Comparative Study

Reversible logic circuits were historically motivated by theoretical research in low-power computing, and have recently attracted interest as components of quantum algorithms, optical computing and nanotechnology. However, due to the intrinsic properties of reversible logic, traditional irreversible logic design and synthesis methods cannot be applied. Thus, a new set of algorithms has been developed to synthesize reversible logic circuits correctly. This paper presents a comprehensive literature review with a comparative study of heuristic-based reversible logic synthesis. It reviews a range of heuristic-based reversible logic synthesis techniques reported by researchers (BDD-based, cycle-based, search-based, non-search-based, rule-based, transformation-based, and ESOP-based). All techniques are described in detail and summarized in a table based on their features, limitations, library used and the metrics they consider. Benchmark comparisons of gate count and quantum cost are analysed for each synthesis technique. Comparing the outputs of the synthesis algorithms over the years, it can be observed that different approaches have been used for the synthesis of reversible circuits; however, the improvements are not significant. Quantum cost and gate count have improved over the years, but debate continues on certain issues, such as garbage outputs, which have remained the same. This paper gathers the heuristic-based reversible logic synthesis methods proposed over the years. All techniques are explained in detail, making the paper informative for new reversible logic researchers and helping to bridge the knowledge gap in this area.


Introduction
Landauer showed that when an irreversible computational system performs a logic operation, a bit of information is erased [1], and each erased bit is converted to $kT \ln 2$ joules of heat, where $k$ is the Boltzmann constant, $1.38065 \times 10^{-23}$ J/K, and $T$ is the environment temperature [2]. Today, all computers erase bits of information every time a logic operation is performed because of the irreversible computational systems used. As Moore's Law continues to hold, whereby the number of transistors in an integrated circuit doubles every 18 months [3], with current irreversible technologies the heat generated by each IC also doubles accordingly [4]. If this trend continues, Moore's Law will not remain valid after the year 2020, as the heat generated by the large number of transistors in an IC will reach the limit that the IC can bear.
An alternative way to overcome this problem is to use logic operations that do not erase information [5]. These types of logic operations are called reversible logic operations. Bennett [6] proved that this loss does not occur if a computation is carried out in a reversible way, since the amount of energy dissipated in a system bears a direct relationship to the number of bits erased during computation.
Reversible gates are logic gates that implement reversible logic operations; they do not erase information and dissipate very little heat. Research in reversible logic has received considerable attention in various areas such as low-power computing devices, optical computing [7], quantum computing [8] and nanotechnology [9].
The synthesis of reversible logic differs significantly from traditional irreversible logic synthesis due to the differences in their characteristics. A reversible gate is a logic cell that has the same number of inputs and outputs, with a bijective mapping between the input and output vectors. Direct fan-out from a gate output to multiple gate inputs, as well as feedback from a gate output directly to its inputs, is not allowed [10]. Because of these features, existing algorithms and tools for circuit synthesis and optimization using irreversible logic gates cannot be used for reversible logic [11]. Therefore, new methods have been developed to synthesize and optimize reversible logic.
In this survey paper, we focus on heuristic-based synthesis algorithms for reversible logic, covering most of the well-known algorithms proposed by different authors over the years.

Reversible Function
A logic function $f(x_1, x_2, \ldots, x_n)$ of $n$ Boolean variables is reversible if it maps each input assignment to a unique output assignment.
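The definition above can be checked mechanically. The following is a minimal sketch, assuming the function is given as a list of $2^n$ output values indexed by the input value; the name `is_reversible` and both example tables are illustrative and do not come from the cited works.

```python
def is_reversible(truth_table):
    """Return True if the outputs are a permutation of the inputs (a bijection)."""
    return sorted(truth_table) == list(range(len(truth_table)))

# Illustrative 2-bit examples, with (x1, x2) encoded as the integer 2*x1 + x2.
cnot = [0b00, 0b01, 0b11, 0b10]      # (x1, x2) -> (x1, x1 XOR x2)
and_copy = [0b00, 0b00, 0b10, 0b11]  # (x1, x2) -> (x1, x1 AND x2): output 00 occurs twice

print(is_reversible(cnot))      # True
print(is_reversible(and_copy))  # False
```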

1) Cycles
A cycle of length $k$ is denoted by $(d_1, d_2, \ldots, d_k)$, meaning that $d_1$ is mapped to $d_2$, $d_2$ to $d_3$, and so on, with $d_k$ mapped back to $d_1$; a permutation can be written as a product of disjoint cycles. This representation is natural for reversible logic because a reversible function is a permutation, i.e. a bijective function. The length of a cycle is the number of elements it contains [12]. A cycle of length two is called a transposition. When two cycles $c_1$ and $c_2$ are disjoint they commute, i.e. $c_1 c_2 = c_2 c_1$. In addition, a cycle may be written in different ways as a product of transpositions, using different numbers of transpositions. Cycles can be categorized as even or odd according to the number of transpositions in such a product [13]. Example: The truth table in Tab. 2 can be represented by (2, 3)(6, 7) because the corresponding function swaps 010 with 011 and 110 with 111.
Tab. 2: Reversible function and its permutation function.
Columns: Input (x, y, z) | Output (p, q, r) | Permutation Function
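As a minimal sketch of the cycle representation, the following code extracts the disjoint cycles of a permutation and its parity. The list `tab2` encodes the function described in the example (010 and 011 swapped, 110 and 111 swapped); the function names are illustrative.

```python
def disjoint_cycles(perm):
    """Decompose a permutation (perm[i] is the image of i) into non-trivial disjoint cycles."""
    seen, cycles = set(), []
    for start in range(len(perm)):
        if start in seen or perm[start] == start:
            continue  # skip fixed points and elements already placed in a cycle
        cycle, current = [], start
        while current not in seen:
            seen.add(current)
            cycle.append(current)
            current = perm[current]
        cycles.append(tuple(cycle))
    return cycles

def is_even(perm):
    """A k-cycle is a product of k - 1 transpositions, so parity follows from cycle lengths."""
    return sum(len(c) - 1 for c in disjoint_cycles(perm)) % 2 == 0

tab2 = [0, 1, 3, 2, 4, 5, 7, 6]
print(disjoint_cycles(tab2))  # [(2, 3), (6, 7)]
print(is_even(tab2))          # True
```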

Reversible Gates
A reversible gate realizes a reversible function. If a reversible gate has $k$ input and output wires, it is called a $k \times k$ gate, or a gate on $k$ wires [14]. Reversible gates have the same number of inputs and outputs, with a one-to-one mapping between the input and output vectors. Therefore, the input states can always be reconstructed from the output states. The commonly used reversible gates are described below.
The multiple-control Toffoli gate ($C^n$NOT) with $n$ control lines maps a Boolean pattern $(x_1, x_2, \ldots, x_{n+1})$ to $(x_1, x_2, \ldots, x_n, x_1 x_2 \cdots x_n \oplus x_{n+1})$ for $n \geq 1$. For $n = 0$, it has no control line and maps $(x_1)$ to $(x_1 \oplus 1)$. For $n = 0, 1, 2$ the gates are called the NOT, CNOT and Toffoli gate, respectively. These three gates compose the universal NCT library. The general structure of the NOT, CNOT and Toffoli gates is illustrated in Fig. 1.
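A minimal sketch of this gate family, assuming bit patterns are tuples and lines are addressed by 0-based index (the function name `mct` is illustrative): the target line is inverted exactly when every control line carries 1, so an empty control list gives NOT, one control gives CNOT and two controls give the Toffoli gate.

```python
def mct(bits, controls, target):
    """Multiple-control Toffoli: flip bits[target] when all control bits are 1."""
    out = list(bits)
    if all(bits[c] for c in controls):  # all([]) is True, so no controls means NOT
        out[target] ^= 1
    return tuple(out)

print(mct((1,), [], 0))            # NOT:     (0,)
print(mct((1, 0), [0], 1))         # CNOT:    (1, 1)
print(mct((1, 1, 0), [0, 1], 2))   # Toffoli: (1, 1, 1)
print(mct((1, 0, 0), [0, 1], 2))   # Toffoli: (1, 0, 0), controls not all 1
```

Applying the same gate twice restores the original pattern, which is why these gates are their own inverses.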

3) Peres Gate
The Peres gate [18] has three inputs and three outputs. The gate maps a Boolean pattern $(x_1, x_2, x_3)$ to $(x_1, x_1 \oplus x_2, x_1 x_2 \oplus x_3)$. The general structure of the Peres gate is illustrated in Fig. 3(a).

4) Inverse Peres Gate
The inverse Peres gate is also known as the TF gate in [19]. It is the Peres gate connected in reverse, with its outputs treated as inputs and its inputs as outputs. The gate maps a Boolean pattern $(x_1, x_2, x_3)$ to $(x_1, x_1 \oplus x_2, x_1 \bar{x}_2 \oplus x_3)$. The general structure of the inverse Peres gate is illustrated in Fig. 3(b).
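The relation between the two gates can be verified exhaustively. The sketch below assumes the commonly used definition of the Peres gate, $(x_1, x_2, x_3) \to (x_1, x_1 \oplus x_2, x_1 x_2 \oplus x_3)$, and derives the inverse mapping from it; the function names are illustrative.

```python
from itertools import product

def peres(x1, x2, x3):
    """Peres gate under the assumed standard definition."""
    return (x1, x1 ^ x2, (x1 & x2) ^ x3)

def inverse_peres(x1, x2, x3):
    """Inverse mapping: (x1, x1 XOR x2, x1 AND (NOT x2) XOR x3)."""
    return (x1, x1 ^ x2, (x1 & (1 - x2)) ^ x3)

# Cascading the two gates in either order gives the identity on all 8 patterns.
patterns = list(product((0, 1), repeat=3))
print(all(inverse_peres(*peres(*p)) == p for p in patterns))  # True
print(all(peres(*inverse_peres(*p)) == p for p in patterns))  # True
```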

Elementary Quantum Gate
All reversible gates on more than two bits are realized with a combination of several elementary quantum gates [20]. The widely used elementary gates are the NOT, the CNOT, the controlled-V and the controlled-V+ gates [16]. Unlike ordinary logic gates, elementary quantum gates operate on qubits rather than bits. A bit has two states, 0 or 1, whereas for a qubit the two basis states are $|0\rangle$ and $|1\rangle$, where the notation $|\cdot\rangle$ is called the Dirac notation [21]. The difference between bits and qubits is that a qubit can be in a state other than $|0\rangle$ or $|1\rangle$. It is also possible to form a linear combination $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, often called a superposition, where $\alpha$ and $\beta$ are complex numbers such that $|\alpha|^2 + |\beta|^2 = 1$.
The diagrams and operation of the controlled-V and controlled-V+ gates are illustrated in Fig. 4. The elementary quantum gates that construct the Toffoli, Peres, SWAP and Fredkin gates are illustrated in Fig. 5.
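The role of V and V+ can be illustrated numerically. The sketch below assumes the usual matrix $V = \frac{1+i}{2}\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix}$, which is not spelled out in the text above, and checks that V is a square root of NOT and that V+ (the conjugate transpose) undoes V. All helper names are illustrative.

```python
def matmul(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(m):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    """Entry-wise comparison with a small tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

s = (1 + 1j) / 2
V = [[s, -1j * s], [-1j * s, s]]       # assumed standard definition of V
V_PLUS = dagger(V)
NOT = [[0, 1], [1, 0]]
IDENTITY = [[1, 0], [0, 1]]

print(close(matmul(V, V), NOT))              # True: V * V = NOT
print(close(matmul(V, V_PLUS), IDENTITY))    # True: V+ is the inverse of V
print(close(matmul(V_PLUS, V_PLUS), NOT))    # True: V+ * V+ = NOT
```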

Quantum Cost
All reversible gates are associated with a cost called the quantum cost. Quantum cost denotes the effort required to transform a reversible circuit into a quantum circuit. It is measured as the number of elementary quantum gates needed to realize the gate [22]. Each elementary gate contributes a quantum cost of ∆1. In Fig. 5(a), the Toffoli gate has five elementary quantum gates, so its quantum cost is ∆5. In Fig. 5(d), the Fredkin gate has seven elementary quantum gates; however, its quantum cost is only ∆5, because two elementary quantum gates that share the same lines (the quantum gates bracketed in the box) are counted as one. The quantum costs of the SWAP, Peres and inverse Peres gates are ∆3, ∆4 and ∆4, respectively. The quantum cost of generalized Toffoli gates can be found in Tab. 3 [23]. For a Fredkin gate of size n ≥ 3, the quantum cost is the same as that of the Toffoli gate.
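Using only the costs stated above, the two metrics compared later in this survey (gate count and quantum cost) can be computed for a gate cascade as in the following sketch. The gate names, the dictionary and the example cascade are illustrative; costs of larger generalized Toffoli gates (Tab. 3) are not included.

```python
# Quantum costs of the 3-line-or-smaller gates as stated in the text.
QUANTUM_COST = {
    "NOT": 1, "CNOT": 1,               # elementary gates
    "SWAP": 3,
    "PERES": 4, "INVERSE_PERES": 4,
    "TOFFOLI": 5, "FREDKIN": 5,
}

def circuit_metrics(gates):
    """Return (gate count, quantum cost) of a cascade given as a list of gate names."""
    return len(gates), sum(QUANTUM_COST[g] for g in gates)

example = ["TOFFOLI", "CNOT", "PERES", "NOT"]  # hypothetical cascade
print(circuit_metrics(example))  # (4, 11)
```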

Reversible Circuits
Reversible circuits are logic circuits constructed using only a combination of reversible logic gates. In a reversible circuit, direct fan-out from a gate output to multiple gate inputs, as well as feedback from a gate output directly to its inputs, is not allowed [8], [10].

Garbage Output
An output of a reversible circuit that is not used as a functional output is called a garbage output. These outputs are required to keep the circuit reversible, with an equal number of inputs and outputs [24]. Figure 6 shows an example for the reversible function $f = x_1 x_2 \oplus x_3$; the two unused pins are the garbage outputs.

Ancilla Input
A constant-value input to a reversible circuit is called an ancilla input. In reversible circuit design, a circuit with fewer ancilla inputs is preferred. However, some functions cannot be realized without ancilla inputs; e.g. a reversible AND gate requires one ancilla, as shown in Fig. 7(a). An ancilla input can also be added to a CNOT gate so that it acts as a copying gate, making fan-out explicit, as shown in Fig. 7.
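Both uses of an ancilla can be sketched in a few lines, assuming a Toffoli gate for the AND and a CNOT for the copy, each with the ancilla fixed to 0. The function names are illustrative.

```python
def toffoli(a, b, c):
    """Toffoli gate: (a, b, c) -> (a, b, ab XOR c)."""
    return (a, b, (a & b) ^ c)

def cnot(a, b):
    """CNOT gate: (a, b) -> (a, a XOR b)."""
    return (a, a ^ b)

def reversible_and(a, b):
    """AND on the target line of a Toffoli gate whose third input is the ancilla 0."""
    _, _, result = toffoli(a, b, 0)
    return result

def copy_bit(a):
    """Explicit fan-out: a CNOT with ancilla 0 on the target line outputs (a, a)."""
    return cnot(a, 0)

print([reversible_and(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
print(copy_bit(0), copy_bit(1))                                # (0, 0) (1, 1)
```

In the AND example the first two outputs merely pass the inputs through; when they are not needed downstream they are garbage outputs.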

Representation Model
Reversible functions can be described in several ways as below:

1) Truth Table
A truth table is a straightforward representation of a Boolean function, but it becomes cumbersome for a large number of variables [25]. A reversible function of $n$ variables can be represented by a table with $n$ input columns, $n$ output columns and $2^n$ rows. Figure 8 and Tab. 4 show an example of a reversible circuit and its truth table representation.

2) Matrix Representation
A matrix-based representation better reflects the quantum state evolution and the properties of quantum computation; however, it also becomes cumbersome for reversible functions with a large number of variables. It represents the permutation of a reversible function as a 0-1 matrix in which exactly one 1 appears in each column. The example below shows the matrix representation of a CNOT gate whose truth table is defined in Tab. 5.
The specific definition of CNOT assumes an eigenbasis of
\[ |00\rangle = (1, 0, 0, 0)^T, \quad |01\rangle = (0, 1, 0, 0)^T, \quad |10\rangle = (0, 0, 1, 0)^T, \quad |11\rangle = (0, 0, 0, 1)^T. \]
Writing all the outputs as columns, we obtain
\[ \mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \tag{2} \]
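The construction of the 0-1 matrix from a truth table can be sketched as follows, assuming the CNOT truth table with the first bit as control and basis states ordered 00, 01, 10, 11. The function name is illustrative.

```python
def permutation_matrix(truth_table):
    """Build the 0-1 matrix whose column i has its single 1 in row truth_table[i]."""
    size = len(truth_table)
    return [[1 if truth_table[col] == row else 0 for col in range(size)]
            for row in range(size)]

cnot_table = [0b00, 0b01, 0b11, 0b10]  # 00->00, 01->01, 10->11, 11->10
for row in permutation_matrix(cnot_table):
    print(row)
# [1, 0, 0, 0]
# [0, 1, 0, 0]
# [0, 0, 0, 1]
# [0, 0, 1, 0]
```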

3) Binary Decision Diagram (BDD)
Any Boolean function can be represented graphically by different types of decision diagrams (DDs) [26], [27]. A BDD is a directed acyclic graph in which a Shannon decomposition ($f = \bar{x}_i f_{x_i=0} + x_i f_{x_i=1}$) is carried out at each non-terminal node. In general, the BDD of a function may require a large number of nodes, which becomes impractical for functions with many variables. Fig. 9 shows an example of a BDD.
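The Shannon decomposition carried out at each BDD node can be checked on a small function. The sketch below uses a 3-input majority function as a hypothetical example; `cofactor` and `shannon` are illustrative helper names.

```python
from itertools import product

def cofactor(f, index, value):
    """Return f with variable `index` fixed to `value` (the cofactor f_{x_i = value})."""
    def restricted(*bits):
        fixed = list(bits)
        fixed[index] = value
        return f(*fixed)
    return restricted

def shannon(f, index):
    """Rebuild f as (NOT x_i) * f_{x_i=0} + x_i * f_{x_i=1}."""
    f0, f1 = cofactor(f, index, 0), cofactor(f, index, 1)
    return lambda *bits: ((1 - bits[index]) & f0(*bits)) | (bits[index] & f1(*bits))

def majority(a, b, c):
    """Hypothetical example function: 1 when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)

# The decomposition reproduces the function for every choice of splitting variable.
print(all(shannon(majority, i)(*bits) == majority(*bits)
          for i in range(3) for bits in product((0, 1), repeat=3)))  # True
```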

4) Positive Polarity Reed-Muller (PPRM) Expansion
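Any Boolean function can be described using an EXOR sum-of-products (ESOP) expansion [28]. The PPRM expansion uses only uncomplemented variables and can be derived from the function's sum-of-products (SOP) expression. To replace a complemented variable with an uncomplemented one, the rule $\bar{a} = 1 \oplus a$ can be applied. The PPRM expansion has the form
\[ f(x_1, x_2, \ldots, x_n) = a_0 \oplus a_1 x_1 \oplus \cdots \oplus a_n x_n \oplus a_{12} x_1 x_2 \oplus \cdots \oplus a_{12 \ldots n} x_1 x_2 \cdots x_n \]
where $a_i \in \{0, 1\}$ and the $x_i$ are all uncomplemented (positive polarity).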
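As a minimal sketch, the PPRM coefficients can be computed directly from a truth table with the standard XOR butterfly (Reed-Muller) transform, rather than from an SOP expression as described above. The encoding is an assumption: bit $j$ of a table index is the value of variable $j$, and coefficient `coeffs[m]` belongs to the product of the variables whose bits are set in `m`.

```python
def pprm_coefficients(truth_table):
    """Return PPRM coefficients of a single-output function given as 2**n output bits."""
    coeffs = list(truth_table)
    n = (len(coeffs) - 1).bit_length()
    for bit in range(n):
        for index in range(len(coeffs)):
            if (index >> bit) & 1:
                # XOR in the entry that differs only by having this bit cleared
                coeffs[index] ^= coeffs[index ^ (1 << bit)]
    return coeffs

def evaluate_pprm(coeffs, assignment):
    """XOR of all coefficients whose product term is satisfied by the assignment."""
    value = 0
    for term, coefficient in enumerate(coeffs):
        if coefficient and (term & assignment) == term:
            value ^= 1
    return value

print(pprm_coefficients([1, 0]))        # [1, 1]        NOT a  = 1 XOR a
print(pprm_coefficients([0, 1, 1, 1]))  # [0, 1, 1, 1]  a OR b = a XOR b XOR ab
print(pprm_coefficients([0, 0, 0, 1]))  # [0, 0, 0, 1]  a AND b = ab
```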

Synthesis Algorithm
In this section, all heuristic based synthesis algorithms are described in the following subsections.

BDD-Based Synthesis Algorithm
A synthesis algorithm based on binary decision diagrams (BDDs) was first proposed by Kerntopf in [29]. The algorithm selects reversible gates, one at a time, based on the complexity of the remainder logic. In this method, decision diagrams are constructed for all the possible functions and the BDD with the minimal number of nodes is selected. In [30], the algorithm synthesizes a function by first constructing its BDD directly. Each node of the BDD is then substituted by a cascade of reversible gates, as seen in Fig. 10. BDDs may contain shared nodes, which result in fan-outs that are not allowed in reversible logic; therefore, additional circuit lines are needed to overcome this problem. The resulting circuit, composed of Toffoli or elementary quantum gates, is obtained in linear time and with memory linear in the size of the BDD. The algorithm is able to synthesize large functions with more than a hundred variables in a low running time. This algorithm leads to a good reduction in both quantum cost and run-time, but many constant and garbage lines are added, which makes the results impractical.
In [31], a post-synthesis optimization method is used to reduce the number of lines by merging some garbage output lines with appropriate constant input lines. The algorithm in [31] can therefore be applied to the circuits generated by [30] to reduce the number of constant and garbage lines.
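To illustrate why node substitution adds circuit lines, the sketch below realizes one BDD node, $f = \bar{x} f_{low} + x f_{high}$, on an extra constant-0 line using two Toffoli gates and two NOT gates. This is an illustrative cascade chosen for simplicity; it is not claimed to be the substitution used in [30] or shown in Fig. 10.

```python
from itertools import product

def realize_node(select, low, high):
    """Compute one BDD node on a fresh ancilla line.

    Lines: [select, low, high, ancilla]. The ancilla starts at 0 and ends as
    high if select == 1, else low. The other three lines are restored.
    """
    lines = [select, low, high, 0]
    lines[3] ^= lines[0] & lines[2]   # Toffoli(select, high -> ancilla)
    lines[0] ^= 1                     # NOT on select
    lines[3] ^= lines[0] & lines[1]   # Toffoli(NOT select, low -> ancilla)
    lines[0] ^= 1                     # NOT on select (restore)
    return tuple(lines)

for select, low, high in product((0, 1), repeat=3):
    out = realize_node(select, low, high)
    expected = high if select else low
    print((select, low, high), "->", out[3], "expected", expected)
```

Each node consumes one constant input line in this sketch, which is the kind of line overhead that the post-synthesis merging in [31] aims to reduce.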

Cycle-Based Synthesis Algorithm
Cycle-based synthesis methods can be described as in Fig. 11. The method works by separating the entire permutation into a set of cycles and synthesizing them separately. This divide-and-conquer method is effective for reversible functions that leave many inputs unchanged. In [12], the authors proposed a synthesis method that can be implemented without temporary storage channels using the NCT library. Each pair of disjoint transpositions is implemented by a synthesis algorithm, and the final circuit is constructed by cascading the individual circuits.
In [32], the method from [12] is extended to reduce the unnecessarily large number of cycles and the synthesis cost by applying NOT and CNOT gates instead of Toffoli gates in many situations.
In [33], the synthesis algorithm decomposes a given large cycle into a set of single 3-cycles, pairs of 3-cycles and pairs of 2-cycles and synthesizes the resulting cycles directly.
In [34], however, the authors developed a k-cycle-based synthesis method that uses a set of seven building blocks to synthesize a given permutation directly, reducing both quantum cost and average run-time. The seven building blocks include a pair of 2-cycles, a single 3-cycle, a pair of 3-cycles, a single 5-cycle, a pair of 5-cycles, a single 2-cycle (4-cycle) followed by a single 4-cycle (2-cycle) and a pair of 5-cycles. In [35], a more efficient decomposition algorithm was proposed. The algorithm produces all minimal and inequivalent factorizations, each of which contains the maximum number of disjoint cycles. A perfect-matching algorithm on graphs is then used to select the best possible matching pairs with the minimum cost.
In [36], the authors presented an implementation of an algorithm for finding the optimal gate count of any 4-bit reversible function. The algorithm relies on the fact that the set of all functions that have an optimal circuit of up to 9 gates can be stored effectively in present-day computers. For each equivalence class of reversible functions, the algorithm stores only the canonical form, which reduces memory consumption. The reversible function database used by the algorithm is stored as hash tables. Using the database, the gate count of the optimal circuit can be found quickly through a lookup of the canonical representative. For a reversible function whose optimal circuit requires more than 9 gates, additional processing is used: the algorithm partitions the function into two circuits such that f = g • r, where f is the function to be synthesized, g and r are the two partitioned circuits and • denotes cascading of the circuits. By using the database and partitioned synthesis, the algorithm achieves a short synthesis time.
In [37], the authors proposed an approach similar to [36]; however, their objective is to optimize the quantum cost for a given gate count. The authors further extended their work in [38], which improves the quantum cost results and is able to optimize circuits of more than 4 bits.
In [39], an extension of the method from [36] is presented. The algorithm removes all inverse functions from the circuit databases and adds several new functions to them. This improves the performance of the algorithm and allows more 4-bit reversible functions to be synthesized.
In [40], the algorithms in [36] and [39] are extended by combining them with a depth-first search for more effective pruning of the search tree. During synthesis, a reversible gate is selected at each step and appended to the end of the previously analysed gate cascade, and the result is checked to see whether it gives a circuit for the specified reversible function with the selected number of gates. The check is done by calculating the remaining reversible function to be constructed and the optimal gate count it requires. Once a solution is found, the algorithm backtracks and tries other possible reversible gates for the specified reversible function to obtain a better solution. In addition, the authors added polarity control to the gates of the NCT library. The results show improvements in terms of gate count and quantum cost.
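The database idea can be sketched at a much smaller scale. The code below is an illustration under simplifying assumptions, not the implementation of [36]: it uses 3-bit functions instead of 4-bit ones, stores every function rather than one canonical representative per equivalence class, and fills a Python dictionary (a hash table) by breadth-first search over the NCT library, so each stored value is the optimal gate count.

```python
from collections import deque
from itertools import combinations

def nct_gates(n):
    """All NOT, CNOT and Toffoli gates on n lines as (control_mask, target) pairs."""
    gates = []
    for target in range(n):
        others = [line for line in range(n) if line != target]
        for size in range(3):  # 0, 1 or 2 control lines
            for combo in combinations(others, size):
                gates.append((sum(1 << c for c in combo), target))
    return gates

def append_gate(func, gate):
    """Cascade `gate` after the circuit realizing `func` (func[i] is the output for input i)."""
    mask, target = gate
    return tuple(v ^ (1 << target) if (v & mask) == mask else v for v in func)

def build_database(n):
    """Map every reachable n-bit reversible function to its minimal NCT gate count."""
    identity = tuple(range(2 ** n))
    gates = nct_gates(n)
    database = {identity: 0}
    queue = deque([identity])
    while queue:
        func = queue.popleft()
        for gate in gates:
            successor = append_gate(func, gate)
            if successor not in database:  # first visit in BFS order is optimal
                database[successor] = database[func] + 1
                queue.append(successor)
    return database

database = build_database(3)
print(len(database))                          # 40320, i.e. all 8! functions on 3 bits
print(database[(0, 1, 3, 2, 4, 5, 7, 6)])     # 1: a single CNOT
```

For 4-bit functions the full table is far larger, which is why [36] stores only canonical forms and only circuits of up to 9 gates.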

Search-Based Synthesis Algorithm
Search-based synthesis methods can be described as in Fig. 12.
In [41], the authors' algorithm uses the positive-polarity Reed-Muller decomposition at each stage to synthesize the function using only CNOT and $C^2$NOT (Toffoli) gates. The primary objective of their algorithm is to minimize the number of gates (i.e. factors) needed to transform a PPRM expansion into the identity function. Their secondary objective is to minimize the size of the individual gates (i.e. the number of literals in each factor). In order to take advantage of shared functionality among multi-output functions, candidate factors are selected among common sub-expressions of the PPRM expansions. However, the method does not guarantee that the resulting PPRM expression contains fewer terms. In [42], the authors proposed a synthesis algorithm with a hybrid of depth-first search (DFS) and breadth-first search (BFS) behavior. Their algorithm is able to reduce the tree depth without decreasing the quality of results. The work in [43] improves the method of [41] by introducing Peres, inverse Peres and Fredkin gates into the search-based algorithm.

Non-Search-Based Synthesis Algorithm
In [11], the author proposed a non-search-based synthesis algorithm. The widely used search-based methods evaluate all possible gates at each step to find an implementation of the circuit, so they cannot be used to synthesize large functions. This is avoided in the non-search-based synthesis algorithm, as it is able to produce a solution for a given specification without evaluating all possible gates during each step. The synthesis algorithm is similar to [44], except that it uses multiple-control Toffoli gates with both positive and negative controls. The algorithm works on the truth table, transforming it into the identity function. The algorithm always converges and leads to a valid result much faster than search-based methods. The following example shows a reversible function being turned into its reversible circuit using the non-search-based method. Table 6 represents the truth table at each step of the transformation of the reversible function. During each step of the transformation, a Karnaugh map is used to decide which gate to use, as seen in Fig. 13 and Fig. 14.
Tab. 6: Truth table representation.
Columns: F | Step 1 | Step 2 | Step 3

Rule-Based Synthesis Algorithm
In [45], a rule-based optimization approach for reversible logic is introduced. The synthesis algorithm uses both positive- and negative-control Toffoli gates during optimization. A set of rules for removing NOT gates and optimizing sub-circuits with common-target gates is proposed. The algorithm can be broken into two steps. The first step applies the NOT-gate rules across a given reversible circuit to delete redundant NOT gates and improve the total circuit cost, as can be seen in Fig. 15.
The second step uses the Karnaugh map-based optimization introduced in [46] to optimize sub-circuits with common-target gates, as in the example in Fig. 16, where the reversible circuit is further optimized using the Karnaugh map.

Transformation-Based Synthesis Algorithm
In [47], a set of Toffoli-based network transformation rules is introduced. The algorithm mainly serves to bring a network to a canonical form. The transformation is based on six local transformation rules which are applied to a sequence of Toffoli-based gates. The disadvantage of the approach is that it produces a high number of garbage bits. The application rules were further extended in [44]. The synthesis algorithm synthesizes a reversible function in terms of $n \times n$ Toffoli gates and uses several transformation rules on a set of predefined patterns called templates. The circuit is constructed by a single pass through the specification with minimal look-ahead and no backtracking. Reduction rules are applied using simple template matching. The synthesis method works by comparing the inputs and outputs of the truth table. For a given input or output, reversible gates are applied to transform the function into the identity function. To select which gate to use, the Hamming distances between the input and output are used. The algorithm iterates through the rows of the truth table looking for differences between input and output and removes them with multiple-control Toffoli gates. The operation of the algorithm is illustrated in Fig. 17. Later, in [48], the synthesis of Toffoli networks was divided into two steps: the first step finds a network that realizes the desired function, and the second step transforms the network so that it uses fewer gates while realizing the same function. In [49], the authors further improved the template matching algorithm of [48] by replacing the Hamming distance method with Reed-Muller spectra. Using the Reed-Muller spectra, reversible functions are represented by their PPRM expansions, which can be easily substituted by reversible gates, improving the overall synthesis result.
In [50], a modification of [44] is presented which traverses the truth table according to a specially constructed ordering of rows. Explicit storage of truth tables is also avoided: the data for synthesis is represented implicitly, allowing the synthesis of very large functions.
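The row-by-row idea behind the transformation-based approach can be sketched as follows. This is a basic output-side variant written for illustration, without templates, Hamming-distance selection or the row ordering of [50]; the control choice shown is one that keeps already-fixed rows untouched and is an assumption rather than a transcription of [44]. Gates are (control mask, target) pairs on integer bit patterns.

```python
def apply_gate(value, controls, target):
    """Multiple-control Toffoli on an integer bit pattern."""
    if (value & controls) == controls:
        return value ^ (1 << target)
    return value

def synthesize(perm):
    """Return gates, in input-to-output order, that realize perm (perm[i] = output for input i)."""
    n = (len(perm) - 1).bit_length()
    f = list(perm)
    gates = []

    def emit(controls, target):
        nonlocal f
        gates.append((controls, target))
        f = [apply_gate(v, controls, target) for v in f]

    for i in range(len(f)):
        for j in range(n):   # set bits that are 1 in i but 0 in f[i]
            if (i >> j) & 1 and not (f[i] >> j) & 1:
                emit(f[i], j)    # controls: all 1-bits of the current output
        for j in range(n):   # clear bits that are 1 in f[i] but 0 in i
            if (f[i] >> j) & 1 and not (i >> j) & 1:
                emit(i, j)       # controls: all 1-bits of the row index
    # Gates were applied on the output side, so the circuit is their reverse.
    return gates[::-1]

def simulate(gates, value):
    """Run a bit pattern through the cascade."""
    for controls, target in gates:
        value = apply_gate(value, controls, target)
    return value

spec = [1, 0, 3, 2, 5, 7, 4, 6]  # hypothetical 3-bit specification
circuit = synthesize(spec)
print([simulate(circuit, x) for x in range(8)] == spec)  # True
```

Because rows below the current one already equal their index, every emitted gate has controls that no earlier row satisfies, so the pass never needs to backtrack.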

Finding, Discussion and Comparison
To analyze the effectiveness of the reversible logic synthesis algorithms, a set of benchmark circuits is used. The benchmark circuits are taken from [51] and [52]; these web pages offer widely used reversible benchmark functions and a list of the algorithms proposed over the years. All benchmarks are clearly listed and their currently best known circuits are presented. In Table 8, the key features of each synthesis algorithm are listed. This table has five columns: the 1st describes the synthesis methods proposed by researchers; the 2nd, the important features considered by the different approaches; the 3rd, the limitations of each algorithm; the 4th, the gate library used; and the last column indicates the metric. Table 9 compares most of the synthesis algorithms side by side on the benchmark functions in terms of gate count and quantum cost. Synthesis algorithms for which no results on the benchmark functions could be found are omitted. For the synthesis algorithms listed in Tab. 9, not all benchmark results could be obtained; the missing results are marked with the symbol "-" in the table.
From Tab. 9, we observe that, within the same approach, the newer algorithms have slightly improved the synthesis outcome. For [31], the line reduction is highly circuit-dependent, so the conditions for reduction are met in only a few cases. As a result, the outcome of their synthesis algorithm is only slightly improved over their previous one [30]. For benchmark functions with a small number of variables, such as 4mod5, hwb5 and sym6, which contain at most 6 variables, [43] and [49] perform well.