Design of elliptic curve cryptoprocessors over GF($2^{163}$) using the Gaussian normal basis

This paper presents an efficient hardware implementation of cryptoprocessors that perform the scalar multiplication kP over the finite field GF($2^{163}$) using two digit-level multipliers. The finite field arithmetic operations were implemented using the Gaussian normal basis (GNB) representation, and the scalar multiplication kP was implemented using the Lopez-Dahab algorithm, the 2-non-adjacent form (2-NAF) halve-and-add algorithm and the w-τNAF method for Koblitz curves. The processors were described in VHDL, synthesized on a Stratix-IV FPGA using Quartus II 12.0 and verified using SignalTap II and Matlab. The simulation results show that the cryptoprocessors achieve short computation times for the scalar multiplication kP: 13.37 µs, 16.90 µs and 5.05 µs using the Lopez-Dahab algorithm, the 2-NAF halve-and-add algorithm and the 16-τNAF method for Koblitz curves, respectively.


Introduction
The use of computer networks and the steady increase in the number of users of these systems have driven the need to improve security for the storage and transmission of information. There are many applications that must ensure the privacy, integrity or authentication of the information stored or transmitted. The security of these applications has been addressed by using different cryptographic algorithms, which are used in private-key or public-key cryptosystems.
The security of public-key cryptosystems is based on mathematical problems that are computationally difficult to solve, i.e., problems for which there are no known algorithms that solve them in a practical time. Because of the high volume of information processed, electronic systems are required to perform the encryption and decryption processes in the shortest time possible without compromising the security. In this regard, hardware implementations of cryptographic algorithms have advantages, such as high speed, high security levels and low cost.
1 Paulo Cesar Realpe Muñoz. B.Sc. in Physics Engineering, Universidad del Cauca, Colombia. M.Sc. in Electronics Engineering, Universidad del Valle, Colombia. Affiliation: Universidad del Valle, Colombia. E-mail: paulo.realpe@correounivalle.edu.co
2 Vladimir Trujillo Olaya. B.Sc. in Electronic Engineering, Universidad del Valle, Colombia. M.Sc. in Electronics Engineering, Universidad del Valle, Colombia. Affiliation: Universidad del Valle, Colombia. E-mail: vladimir.trujillo@correounivalle.edu.co
One of the most important cryptosystems is the elliptic curve cryptosystem (ECC), proposed independently by Koblitz (Koblitz, 1987) and Miller (Miller, 1986). There have been several investigations of the theory and practice of this cryptosystem. The results of these investigations demonstrated the ability of these systems to encrypt information and concluded that this cryptosystem offers better security, efficiency and memory usage. The hardware implementations of ECCs have many advantages and are used in equipment such as ATMs, smart cards, telephones and cell phones.
In elliptic curve cryptography, finding the discrete logarithm of a random elliptic curve element with respect to a publicly known base point, that is, the elliptic curve discrete logarithm problem (ECDLP), is known to be computationally hard. The entire security of the ECC depends on the ability to compute the scalar multiplication kP and the inability to recover the scalar k given the base point P and the product point kP. Furthermore, the size of the finite field over which the elliptic curve is defined determines the computational complexity of this problem.
Several works on scalar multiplication over a finite field GF($2^m$) have been proposed and implemented efficiently in hardware.
C. Rebeiro and D. Mukhopadhyay (Rebeiro and Mukhopadhyay, 2008) presented a cryptoprocessor with novel multiplication and inversion algorithms. J.Y. Lai, T.Y. Hung, K.H. Yang and C.T. Huang (Lai et al., 2010) proposed an architecture for elliptic curves along with the operation scheduling for the Montgomery scalar multiplication algorithm. B. Muthukumar and S. Jeevananthan (Muthukumar and Jeevananthan, 2010) implemented an elliptic curve coprocessor, which is a dual-field processor with projective coordinates. A.K. Rahuman and G. Athisha (Rahuman and Athisha, 2010) presented an architecture using the Lopez-Dahab algorithm for the elliptic curve point multiplication and the Gaussian normal basis (GNB) for field arithmetic over GF($2^{163}$). M. Amara and A. Siad (Amara and Siad, 2011) proposed an EC point multiplication processor intended for cryptographic applications such as digital signatures and key agreement protocols. X. Cui and J. Yang (Cui and Yang, 2012) implemented a processor that parallelizes the computations of the ECC at the bit level and gains a considerable speedup. The processor is fully implemented in hardware and supports key lengths of 113 bits, 163 bits and 193 bits.
In this context, this work presents efficient hardware implementations of cryptoprocessors over GF($2^{163}$) using a GNB representation and the Lopez-Dahab algorithm, the 2-NAF halve-and-add algorithm and the w-τNAF method for Koblitz curves (anomalous binary curves or ABC) with window sizes of 2, 4, 8 and 16 to perform the scalar multiplication kP.
The main contributions of this work are: (i) the hardware design of cryptoprocessors using the GNB over GF($2^{163}$) and three scalar multiplication algorithms (Lopez-Dahab, halve-and-add and the w-τNAF method for Koblitz curves) to determine the best cryptoprocessor for embedded cryptographic applications; and (ii) an efficient hardware implementation of cryptoprocessors based on the w-τNAF method with different window sizes for the Koblitz curves. The w-τNAF cryptoprocessors present the best trade-off between computation time and area, obtaining a higher performance than the other cryptoprocessors reported in the literature. Additionally, they are very suitable for hardware cryptosystems.

Mathematical background
GNB representation
ANSI X9.62 (ANSI, 1999) describes the detailed specifications of the ECC protocols and uses the GNB to represent the finite field elements (NIST, 2000). An element of GF($2^m$) represented in a normal basis has the computational advantage that squaring can be performed very efficiently. However, multiplying distinct elements can be cumbersome, so dedicated multiplication algorithms are used to make this operation both simpler and more efficient.
A normal basis over GF($2^m$) is a basis of the form
$$N = \{\beta, \beta^{2}, \beta^{2^2}, \ldots, \beta^{2^{m-1}}\},$$
where $\beta \in$ GF($2^m$), and any element $A \in$ GF($2^m$) can be written as
$$A = \sum_{i=0}^{m-1} a_i \beta^{2^i}, \qquad a_i \in \{0, 1\}.$$
The element A is represented by the binary string $(a_0 a_1 a_2 \ldots a_{m-1})$. In this case, the multiplicative identity element is represented by the bit string of all ones, and the additive identity element is represented by the bit string of all zeros.
The type T of a GNB is a positive integer that measures the complexity of the multiplication operation with respect to that basis. Generally, a smaller type T provides a more efficient multiplication. For a given m and T, the field GF($2^m$) can have at most one GNB of type T. A GNB exists whenever m is not divisible by 8. Let m and T be two positive integers. Then, a GNB of type T over GF($2^m$) exists if and only if p = Tm + 1 is prime and gcd(Tm/k, m) = 1, where k is the multiplicative order of 2 modulo p.
An important result for the GNB arithmetic is Fermat's Theorem. For all $\beta \in$ GF($2^m$),
$$\beta^{2^m} = \beta.$$
This theorem is important for performing the squaring of an element over GF($2^m$).
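The existence condition for a GNB of type T can be checked numerically. The following is a minimal Python sketch, assuming the condition as stated in IEEE Std 1363 (p = Tm + 1 prime and gcd(Tm/k, m) = 1, with k the multiplicative order of 2 modulo p); the function names `gnb_exists` and `smallest_gnb_type` are illustrative and are not taken from the paper.

```python
from math import gcd


def is_prime(n: int) -> bool:
    """Trial-division primality test (sufficient for the small p used here)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def multiplicative_order(base: int, p: int) -> int:
    """Smallest k >= 1 with base^k = 1 (mod p); p must be an odd prime."""
    k, x = 1, base % p
    while x != 1:
        x = (x * base) % p
        k += 1
    return k


def gnb_exists(m: int, T: int) -> bool:
    """True if a Gaussian normal basis of type T exists for GF(2^m)."""
    if m < 2 or T < 1:
        raise ValueError("expected m >= 2 and T >= 1")
    p = T * m + 1
    if not is_prime(p):
        return False
    k = multiplicative_order(2, p)  # k divides p - 1 = T*m
    return gcd(T * m // k, m) == 1


def smallest_gnb_type(m: int, max_type: int = 64):
    """Smallest type T <= max_type for which a GNB exists, or None."""
    for T in range(1, max_type + 1):
        if gnb_exists(m, T):
            return T
    return None


print(smallest_gnb_type(163))  # → 4
```

For m = 163 the smallest admissible type is T = 4, which is the type used for the multiplier described later in the paper.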

Finite field arithmetic operations
The following arithmetic operations can be performed over GF($2^m$) when using a normal basis of type T.
Addition: If $A = (a_0 a_1 \ldots a_{m-1})$ and $B = (b_0 b_1 \ldots b_{m-1})$, then A + B is the bitwise XOR of the two vector representations.
Squaring: If $A = (a_0 a_1 \ldots a_{m-1})$, then $A^2 = (a_{m-1} a_0 a_1 \ldots a_{m-2})$. In this case, squaring is a simple rotation of the vector representation.
Inversion: If A ≠ 0 and $A \in$ GF($2^m$), the inverse of A is $C \in$ GF($2^m$), and C is the only element of GF($2^m$) such that AC = 1, i.e., $C = A^{-1}$. The algorithm used to calculate the inversion is based on equation (7):
$$A^{-1} = A^{2^m - 2}. \qquad (7)$$
Itoh and Tsujii (Itoh and Tsujii, 1998) proposed a method that reduces the number of multiplications needed to calculate the inversion, and it is based on the following identity:
$$A^{-1} = \left(A^{2^{m-1} - 1}\right)^2.$$
Trace: If A is an element of GF($2^m$), the trace of A is:
$$\mathrm{Tr}(A) = A + A^{2} + A^{2^2} + \cdots + A^{2^{m-1}}.$$
If $A = (a_0 a_1 a_2 \ldots a_{m-1})$ is represented in a normal basis, then the trace can be computed efficiently as follows:
$$\mathrm{Tr}(A) = a_0 \oplus a_1 \oplus \cdots \oplus a_{m-1}.$$
The trace of the element A has two possible values (0 or 1).
Quadratic equation solving over GF($2^m$): If A is an element of GF($2^m$) represented in a normal basis, then the quadratic equation
$$z^2 + z = A$$
has 2 − 2T solutions over GF($2^m$), where T = Tr(A). Therefore, if T = 1, there is no solution, and if T = 0, there are two solutions. If z is one solution, then the other solution is z + 1. For example, if A = 0, the solutions are z = 0 and z = 1 (IEEE Std 1363, 2000). Algorithm 1 solves the quadratic equation over GF($2^m$) for a normal basis representation.

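Square root: Let $A = (a_0 a_1 \ldots a_{m-1})$ be an element of GF($2^m$) represented in a normal basis. Then $\sqrt{A} = (a_1 a_2 \ldots a_{m-1} a_0)$.
In this case, the square root in a normal basis is a simple rotation of the vector representation (IEEE Std 1363, 2000).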
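The rotation-based operations described above can be sketched on plain bit lists. The following Python sketch assumes the coordinate convention in which bit i is the coefficient of $\beta^{2^i}$, so that squaring is a right rotation and the square root a left rotation; the function names are illustrative, and the quadratic solver follows the standard normal-basis recurrence rather than the paper's Algorithm 1, which is not reproduced in the text.

```python
def nb_add(a, b):
    """Addition in GF(2^m): bitwise XOR of the coordinate vectors."""
    return [x ^ y for x, y in zip(a, b)]


def nb_square(a):
    """Squaring in a normal basis: right rotation by one position."""
    return a[-1:] + a[:-1]


def nb_sqrt(a):
    """Square root in a normal basis: left rotation by one position."""
    return a[1:] + a[:1]


def nb_trace(a):
    """Trace in a normal basis: XOR of all coordinates."""
    t = 0
    for bit in a:
        t ^= bit
    return t


def nb_solve_quadratic(a):
    """Return one z with z^2 + z = a, or None when Tr(a) = 1.

    With z_0 = 0, coordinate i of z^2 + z is z_{i-1} XOR z_i, which
    gives the recurrence z_i = z_{i-1} XOR a_i for i = 1..m-1.
    """
    if nb_trace(a) == 1:
        return None
    z = [0] * len(a)
    for i in range(1, len(a)):
        z[i] = z[i - 1] ^ a[i]
    return z


a = [1, 0, 1, 1, 0]
print(nb_square(a))  # → [0, 1, 0, 1, 1]
print(nb_sqrt(a))    # → [0, 1, 1, 0, 1]
```

The second solution of the quadratic equation is obtained by adding the all-ones vector (the multiplicative identity) to the returned z.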

Elliptic curve arithmetic
A non-supersingular elliptic curve E($\mathbb{F}_q$), with q = $2^m$, is defined as the set of points (x, y) ∈ GF($2^m$) × GF($2^m$) that satisfy the affine equation
$$y^2 + xy = x^3 + ax^2 + b, \qquad (13)$$
where a, b ∈ $\mathbb{F}_q$ are constants with b ≠ 0, together with the point at infinity denoted by O. The group operations for the elliptic curve arithmetic in affine coordinates are defined as follows. Let P = ($x_1$, $y_1$) and Q = ($x_2$, $y_2$) be two points that belong to the curve, and let the additive inverse of P be defined as −P = ($x_1$, $x_1 + y_1$). Then, if Q ≠ −P, the point P + Q = ($x_3$, $y_3$) can be computed as follows. If P ≠ Q,
$$x_3 = \lambda^2 + \lambda + x_1 + x_2 + a, \quad y_3 = \lambda(x_1 + x_3) + x_3 + y_1, \quad \lambda = \frac{y_1 + y_2}{x_1 + x_2}, \qquad (14)$$
and if P = Q,
$$x_3 = \lambda^2 + \lambda + a, \quad y_3 = x_1^2 + (\lambda + 1)x_3, \quad \lambda = x_1 + \frac{y_1}{x_1}. \qquad (15)$$
Using the group operations above, the elliptic curve scalar multiplication can be defined as follows. Let E be an elliptic curve over GF($2^m$), let Q and P ∈ E be two arbitrary elliptic points satisfying equation (13), and let k be an arbitrary positive integer. Then, the elliptic curve scalar multiplication Q = kP is defined as:
$$Q = kP = \underbrace{P + P + \cdots + P}_{k \text{ times}}.$$
Considering the group operations described in equations (14) and (15) using the finite field arithmetic in affine coordinates, three main elliptic curve operations can be defined: point addition, point doubling and point halving. In the group operations, the inversion is the most expensive arithmetic operation over GF($2^m$), and it can be avoided with a projective coordinate representation, in which the inversion is replaced by finite field multiplications.
A point P in projective coordinates is represented using three coordinates (X, Y and Z). For the Lopez-Dahab (LD) projective coordinates (Lopez and Dahab, 1999), the projective point (X : Y : Z) with Z ≠ 0 corresponds to the affine coordinates x = X/Z and y = Y/$Z^2$. Then, equation (13) can be mapped from the affine coordinates to the LD projective coordinates as:
$$Y^2 + XYZ = X^3 Z + aX^2 Z^2 + bZ^4.$$
The three group operations for the elliptic curve arithmetic in the projective and affine coordinates can be computed as follows (Menezes et al., 2003):
1. Point doubling Q = 2P, where Q = ($X_3$ : $Y_3$ : $Z_3$) and P = ($X_1$ : $Y_1$ : $Z_1$) in the projective coordinates, can be performed using 4 finite field multiplications:
$$Z_3 = X_1^2 Z_1^2, \quad X_3 = X_1^4 + bZ_1^4, \quad Y_3 = bZ_1^4 Z_3 + X_3\left(aZ_3 + Y_1^2 + bZ_1^4\right). \qquad (18)$$
2. Point addition Q + P = ($X_3$ : $Y_3$ : $Z_3$), where Q = ($X_1$ : $Y_1$ : $Z_1$) in the projective coordinates and P = ($x_2$, $y_2$) in the affine coordinates, can be performed using 8 finite field multiplications:
$$A = y_2 Z_1^2 + Y_1, \quad B = x_2 Z_1 + X_1, \quad C = Z_1 B, \quad D = B^2\left(C + aZ_1^2\right), \quad Z_3 = C^2,$$
$$E = AC, \quad X_3 = A^2 + D + E, \quad F = X_3 + x_2 Z_3, \quad G = (x_2 + y_2)Z_3^2, \quad Y_3 = (E + Z_3)F + G. \qquad (19)$$
3. Point halving Q/2 is the inverse operation of point doubling. Let P = ($x_1$, $y_1$) and Q = ($x_2$, $y_2$) be points on the curve (13) in the affine coordinates. The point halving operation is performed by computing P such that Q = 2P by solving the following equations:
$$\lambda^2 + \lambda = x_2 + a, \quad x_1^2 = y_2 + x_2(\lambda + 1), \quad y_1 = \lambda x_1 + x_1^2,$$
where $\lambda = x_1 + y_1/x_1$. If Q in the λ-representation ($x_2$, $\lambda_Q$), with $\lambda_Q = x_2 + y_2/x_2$, is the input of the point halving algorithm, then it is possible to compute point halving without using the affine coordinates. In scalar multiplication, repeated point halving operations can be performed directly on the λ-representation. However, when a point addition is required, a conversion to the affine coordinates must be performed. Algorithm 2 computes the point halving operation.
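The affine group law can be exercised on a toy field. The following Python sketch is an illustration only: it assumes the small field GF($2^4$) in a polynomial basis with the reduction polynomial $x^4 + x + 1$ (the paper works over GF($2^{163}$) in a GNB), uses the standard affine addition and doubling formulas for $y^2 + xy = x^3 + ax^2 + b$, and computes the inverse as $A^{2^m - 2}$. All function names are hypothetical.

```python
M = 4            # toy extension degree (assumption, not the paper's m = 163)
POLY = 0b10011   # x^4 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply two elements of GF(2^M) in polynomial basis."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r


def gf_inv(a):
    """Inverse via a^(2^M - 2); only meaningful for a != 0."""
    result = 1
    for _ in range(2 ** M - 2):
        result = gf_mul(result, a)
    return result


def on_curve(P, a, b):
    """Check y^2 + xy = x^3 + a x^2 + b for an affine point P."""
    x, y = P
    x2 = gf_mul(x, x)
    return gf_mul(y, y) ^ gf_mul(x, y) == gf_mul(x2, x) ^ gf_mul(a, x2) ^ b


def ec_add(P, Q, a):
    """Affine addition; None stands for the point at infinity O."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2:
        if y1 ^ y2 == x1:          # Q = -P (covers order-2 points, x1 = 0)
            return None
        lam = x1 ^ gf_mul(y1, gf_inv(x1))          # doubling slope
        x3 = gf_mul(lam, lam) ^ lam ^ a
        y3 = gf_mul(x1, x1) ^ gf_mul(lam ^ 1, x3)
        return (x3, y3)
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))         # addition slope
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ a
    y3 = gf_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)


def scalar_mult(k, P, a):
    """Left-to-right double-and-add computation of kP."""
    Q = None
    for bit in bin(k)[2:]:
        Q = ec_add(Q, Q, a)
        if bit == "1":
            Q = ec_add(Q, P, a)
    return Q
```

A quick consistency check is that the sum of any two curve points is again on the curve, and that multiplying any point by the group order gives the point at infinity.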

Koblitz Curves
Koblitz curves, or anomalous binary curves, are elliptic curves with coefficients in GF(2) that are considered over the extension field GF($2^m$). The main advantage of these curves is that the scalar multiplication operation can be performed without the use of point doubling operations.
An algorithm for scalar multiplication on Koblitz curves is presented by Solinas (Solinas, 2000). The Solinas algorithm, or τ-adic window method, computes a special τ-adic expansion of an integer in ℤ[τ]. For example, a special τ-adic expansion is the window τ-adic non-adjacent form (τNAF). The Koblitz curves are defined over GF($2^m$) by:
$$E_a: y^2 + xy = x^3 + ax^2 + 1,$$
where a ∈ {0, 1}, that is, the curves $E_0$ and $E_1$.
These curves present the following property: if P = (x, y) is a point on the curve $E_a$, then the point ($x^2$, $y^2$) is also a point on $E_a$. In addition, they satisfy ($x^4$, $y^4$) + 2(x, y) = µ($x^2$, $y^2$) for each point (x, y) on $E_a$, where µ = $(-1)^{1-a}$. In GF($2^m$), the Frobenius map τ is an endomorphism that squares every element, i.e., τ : x → $x^2$. The Frobenius endomorphism is therefore performed efficiently (essentially cost-free) when the elements of the finite field are represented in a normal basis (Cui and Yang, 2012). Koblitz shows that the point doubling operation can be replaced efficiently by the Frobenius endomorphism if the binary curve is defined over GF($2^m$) and a ∈ {0, 1}. The Frobenius map on the curve can then be defined as
$$\tau(x, y) = (x^2, y^2), \qquad \tau(O) = O,$$
and, by the property above, it satisfies $\tau^2 + 2 = \mu\tau$.
The τ-adic representation can be obtained by repeatedly dividing k by τ, where the remainders of each division step are the digits $u_i$. This procedure is analogous to the one used to obtain the binary NAF of the scalar k, in which k is repeatedly divided by 2. To decrease the number of point additions for the scalar multiplication, it is necessary to obtain a NAF representation of k that has a smaller number of nonzero digits. The scalar multiplication can be computed as:
$$kP = \sum_{i=0}^{l-1} u_i \tau^i(P).$$
The number of point additions corresponds to the Hamming weight of the τNAF. The density of nonzero digits of the τNAF is the same as that of the binary NAF, but the length of the τ-adic representation of k is approximately 2m, which is twice the length of the binary NAF representation. However, Solinas presents a method that reduces the length of the τ-adic representation to approximately m, which gives a Hamming weight of approximately ($\log_2 k$)/3. Thus, the arithmetic on Koblitz curves is based on the point addition and the Frobenius map τ.
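The repeated division by τ described above can be sketched in a few lines. The following Python sketch follows the standard τNAF digit rule attributed to Solinas (2000) for an element $r_0 + r_1\tau$ of ℤ[τ]; it does not include the partial reduction that shortens the expansion, and the function names `tnaf` and `evaluate_tau_expansion` are illustrative. The second function evaluates an expansion back in ℤ[τ] using $\tau^2 = \mu\tau - 2$, which provides a self-check.

```python
def tnaf(r0, r1, mu):
    """tau-adic NAF of r0 + r1*tau, least significant digit first.

    mu = (-1)^(1 - a) is +1 for the curve E_1 and -1 for E_0.
    """
    digits = []
    while r0 != 0 or r1 != 0:
        if r0 % 2 != 0:
            # Choose u in {+1, -1} so that the next digit is zero.
            u = 2 - ((r0 - 2 * r1) % 4)
            r0 -= u
        else:
            u = 0
        digits.append(u)
        # Exact division of (r0 + r1*tau) by tau, valid because r0 is even.
        r0, r1 = r1 + mu * (r0 // 2), -(r0 // 2)
    return digits


def evaluate_tau_expansion(digits, mu):
    """Evaluate sum(u_i * tau^i) as a pair (x, y) meaning x + y*tau."""
    x, y = 0, 0
    for u in reversed(digits):
        # Horner step: (x + y*tau) * tau + u, with tau^2 = mu*tau - 2.
        x, y = -2 * y + u, x + mu * y
    return x, y


print(tnaf(2, 0, 1))  # → [0, -1, 0, -1]
```

For k = 2 and µ = 1 the expansion is $-\tau - \tau^3$, which indeed equals 2 because $\tau^3 = -\tau - 2$. The example also shows that an unreduced expansion is longer than the binary representation of k.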

Hardware architectures for elliptic curve cryptoprocessors
In this section, we present the hardware architectures for elliptic curve cryptoprocessors over GF($2^{163}$) using a Gaussian normal basis. Each cryptoprocessor is designed using one algorithm for the scalar multiplication, namely, the Lopez-Dahab algorithm (Lopez and Dahab, 1999), the halve-and-add 2-NAF algorithm (Menezes et al., 2000) and the w-τNAF method for Koblitz curves with w = 2, 4, 8 and 16 (Solinas, 2000).

Digit-level multiplier
The finite field multiplication over GF($2^m$) is the most important operation for performing the scalar multiplication. Thus, this operation must be implemented efficiently in hardware. Several algorithms for performing the finite field multiplication are presented in Azarderakhsh and Masoleh (2010), Huang et al. (2011), Wang and Fan (2012) and Lee and Chiou (2012).
Azarderakhsh and Masoleh (Azarderakhsh and Masoleh, 2010) proposed a serial or parallel digit-level multiplier with a digit size d, where 1 ≤ d ≤ m. In this case, if d = m, the multiplier is parallel, and if d < m, it is serial and requires M = ⌈m/d⌉, 1 ≤ M ≤ m, clock cycles to generate all the m coefficients of C = AB = $(c_0 c_1 c_2 \ldots c_{m-1})$, where A = $(a_0 a_1 a_2 \ldots a_{m-1})$ and B = $(b_0 b_1 b_2 \ldots b_{m-1})$ are elements represented in a GNB over GF($2^m$). Figure 1 shows the digit-level GF($2^m$) multiplier for T = 4, where A, B and C are registers for storing the input and output elements. The block r is formed by the blocks r1 and r2, and its structure depends on the type T of the GNB, with T ≥ 2, and on the multiplication matrix R. The block J is a set of m two-input AND gates. The block CS is a d-fold cyclic shift, and the GF($2^{163}$) adder is a set of two-input XOR gates.
The block r1 is an optimal set of XOR gates that are obtained using equation (27), and r2 is a set of XOR gates that are obtained from the main matrix R. The time complexity of the digit-level multiplier is $T_A + (2 + \log_2 m)T_X$, where $T_X$ and $T_A$ are the delay times of a two-input XOR gate and a two-input AND gate, respectively. The area complexity of this multiplier is $m^2$ AND gates and ≤ $2m^2 - 2m$ XOR gates (Azarderakhsh and Masoleh, 2010).
To implement the digit-level multiplier in hardware with a digit size d = 55, that is, M = 3 clock cycles, a Matlab code is written to generate the equations of the blocks r1 and r2, which are then described in VHDL and synthesized.

Hardware architecture using the Lopez-Dahab algorithm
The scalar multiplication kP for non-supersingular elliptic curves over binary fields using the Lopez-Dahab algorithm is shown in Algorithm 3, which is a modified version of the Montgomery algorithm, where the same operations are performed during each iteration of the main loop (Hankerson et al., 2003).
To implement the above algorithm in hardware, we initially define three functions: Madd() performs the point addition, Mdouble() performs the point doubling and Mxy() performs the conversion from projective to affine coordinates. These functions are defined as follows:
$$\mathrm{Madd}(X_1, Z_1, X_2, Z_2): \quad Z_3 = (X_1 Z_2 + X_2 Z_1)^2, \quad X_3 = xZ_3 + (X_1 Z_2)(X_2 Z_1),$$
$$\mathrm{Mdouble}(X_1, Z_1): \quad X_3 = X_1^4 + bZ_1^4, \quad Z_3 = X_1^2 Z_1^2,$$
$$\mathrm{Mxy}(X_1, Z_1, X_2, Z_2): \quad x_3 = \frac{X_1}{Z_1}, \quad y_3 = \left(x + \frac{X_1}{Z_1}\right)\left[(X_1 + xZ_1)(X_2 + xZ_2) + (x^2 + y)(Z_1 Z_2)\right](xZ_1 Z_2)^{-1} + y,$$
where (x, y) and ($x_3$, $y_3$) are the coordinates of the points P and Q = kP, respectively.
Point addition and point doubling are implemented in hardware using the data dependence graph shown in Figure 2, and the conversion from the projective to affine coordinates is implemented using two digit-level multipliers for the data dependence graph shown in Figure 3. The inversion operation is implemented using the Itoh-Tsujii algorithm (Itoh and Tsujii, 1998). According to Figures 2 and 3, the latency for Madd and Mdouble is 3M and the latency for the projective to affine conversion is 15M + 1, where M is the latency of a finite field multiplication.
In step 4 of Figure 3, two multipliers are used, and one of them, together with the rotation block, performs the inversion of an element A ∈ GF($2^{163}$). In this case, the latency of the inversion is 10M because it needs 10 finite field multiplications for m = 163. In step 6, only one multiplier is used because the last operation of the coordinate conversion requires a single multiplication.
The architecture of the cryptoprocessor over GF($2^{163}$) using the Lopez-Dahab algorithm is shown in Figure 4. It uses two register files, two parallel digit-level multipliers, one inversion block, several squaring and adder blocks, a main control and an FSM to perform the point addition, point doubling and conversion from the projective to affine coordinates. The functional blocks that perform the finite field arithmetic operations over GF($2^{163}$) for the Lopez-Dahab cryptoprocessor are shown in Figure 5. It is important to mention that the performance of any cryptoprocessor depends on the efficient implementation of the hardware for the finite field arithmetic.
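The latencies quoted above can be combined into a rough cycle-count model. The following Python sketch is a back-of-the-envelope estimate only: it assumes M = ⌈m/d⌉ cycles per field multiplication, one joint Madd/Mdouble step of 3M cycles per scalar bit after the leading bit, and a final conversion of 15M + 1 cycles. Control overhead, I/O and initialization are ignored, so the result is not claimed to reproduce the reported timing figures; the function names are illustrative.

```python
def multiplier_cycles(m: int, d: int) -> int:
    """Clock cycles M = ceil(m / d) of a digit-level multiplier."""
    if not 1 <= d <= m:
        raise ValueError("digit size must satisfy 1 <= d <= m")
    return -(-m // d)  # integer ceiling division


def lopez_dahab_cycle_estimate(m: int, d: int, scalar_bits=None) -> int:
    """Rough cycle estimate for kP with the Lopez-Dahab ladder.

    Assumptions (not from the paper): the scalar has `scalar_bits`
    significant bits (default m) and the ladder runs once per bit
    below the leading one.
    """
    M = multiplier_cycles(m, d)
    bits = m if scalar_bits is None else scalar_bits
    ladder = (bits - 1) * 3 * M      # Madd and Mdouble share a 3M slot
    conversion = 15 * M + 1          # projective-to-affine conversion
    return ladder + conversion


print(multiplier_cycles(163, 55))           # → 3
print(lopez_dahab_cycle_estimate(163, 55))  # → 1504
```

With d = 55 the model gives M = 3, so each ladder iteration costs 9 cycles and the coordinate conversion 46 cycles under these assumptions.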

Figure 5. Functional blocks of the finite field arithmetic
The main control is an FSM that generates the control signals to perform the scalar multiplication, process the key, initialize the cryptoprocessor and control the I/O registers. The second FSM performs the point addition, point doubling and conversion from the projective to the affine coordinates.

Figure 6. ASM chart of the main control
Figure 6 shows the ASM chart of the main control, where the variables $X_1$, $Z_1$, $X_2$ and $Z_2$ are initialized and stored in the register files. Each bit of the scalar k is evaluated from left to right to perform the operations Madd and Mdouble using the data dependence graph shown in Figure 2. If the bit $k_i$ is '1', then Madd($X_1$, $Z_1$, $X_2$, $Z_2$) and Mdouble($X_2$, $Z_2$) are computed; otherwise, Madd($X_2$, $Z_2$, $X_1$, $Z_1$) and Mdouble($X_1$, $Z_1$) are computed. When all bits of the scalar k have been evaluated, the conversion from the projective to affine coordinates is executed using the data dependence graph shown in Figure 3, and kP in the affine coordinates is stored in the output register.
Algorithm 3 offers resistance against simple power analysis and timing attacks because the computation cost does not depend on the value of each bit of the scalar k: for each bit, one point addition and one point doubling are performed. The proposed scheme has two different execution paths depending on the current bit of the scalar k, but both execution paths have the same complexity and require the same number of clock cycles.
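The bit-by-bit control flow described above can be sketched in software. The following Python sketch is an illustration under stated assumptions: it uses the toy field GF($2^5$) in a polynomial basis with reduction polynomial $x^5 + x^2 + 1$ instead of GF($2^{163}$) in a GNB, implements the standard x-only Madd and Mdouble formulas of the Lopez-Dahab ladder, and returns only the x-coordinate of kP (the y-recovery step Mxy is omitted). The names `ladder_x`, `madd` and `mdouble` are hypothetical.

```python
M = 5             # toy extension degree (assumption)
POLY = 0b100101   # x^5 + x^2 + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply two elements of GF(2^M) in polynomial basis."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r


def gf_sqr(a):
    return gf_mul(a, a)


def gf_inv(a):
    """Inverse via a^(2^M - 2); only meaningful for a != 0."""
    result = 1
    for _ in range(2 ** M - 2):
        result = gf_mul(result, a)
    return result


def madd(X1, Z1, X2, Z2, x):
    """x-only addition of two points whose difference has x-coordinate x."""
    t1 = gf_mul(X1, Z2)
    t2 = gf_mul(X2, Z1)
    Z3 = gf_sqr(t1 ^ t2)
    X3 = gf_mul(x, Z3) ^ gf_mul(t1, t2)
    return X3, Z3


def mdouble(X, Z, b):
    """x-only doubling: X3 = X^4 + b Z^4, Z3 = X^2 Z^2."""
    X2, Z2 = gf_sqr(X), gf_sqr(Z)
    return gf_sqr(X2) ^ gf_mul(b, gf_sqr(Z2)), gf_mul(X2, Z2)


def ladder_x(k, x, b):
    """x-coordinate of kP for k >= 1 and x != 0; None means infinity."""
    X1, Z1 = x, 1                                # P
    X2, Z2 = gf_sqr(gf_sqr(x)) ^ b, gf_sqr(x)    # 2P
    for bit in bin(k)[3:]:                       # bits below the leading 1
        if bit == "1":
            X1, Z1 = madd(X1, Z1, X2, Z2, x)
            X2, Z2 = mdouble(X2, Z2, b)
        else:
            X2, Z2 = madd(X2, Z2, X1, Z1, x)
            X1, Z1 = mdouble(X1, Z1, b)
    if Z1 == 0:
        return None
    return gf_mul(X1, gf_inv(Z1))
```

Both branches execute one `madd` and one `mdouble`, which mirrors the two execution paths of equal cost in the ASM chart. A simple self-check is that `ladder_x(s, ladder_x(t, x, b), b)` agrees with `ladder_x(s * t, x, b)` whenever the intermediate point is neither the point at infinity nor a point with x = 0.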

Hardware architecture using the halve-and-add algorithm
Schroeppel (Schroeppel, 2000) and Knudsen (Knudsen, 1999) independently proposed the halve-and-add algorithm to accelerate the scalar multiplication on elliptic curves defined over binary extension fields. This algorithm uses an elliptic curve primitive called point halving, as shown in Algorithm 2.
Because, theoretically, the point halving operation is three times faster than the point doubling operation, it is possible to accelerate the scalar multiplication Q = kP by replacing the double-and-add algorithm with the halve-and-add algorithm, which uses an expansion of the scalar k in terms of negative powers of 2 (Mercurio et al., 2006).
In the halve-and-add algorithm, it is necessary to transform the integer k = $(k_{m-1}, \ldots, k_0)_2$. If k´ is defined by
$$k' = 2^{t-1} k \bmod n = \sum_{i=0}^{t} k'_i 2^i,$$
where n represents the order of the base point P and t denotes the bit length of n, then
$$k \equiv \sum_{i=0}^{t} \frac{k'_i}{2^{t-1-i}} \pmod{n}.$$
The NAF can be generalized to a window NAF. The $\mathrm{NAF}_w$ of a positive integer k with w ≥ 2 is represented by the expression $k = \sum_{i=0}^{l-1} k_i 2^i$, where each nonzero coefficient $k_i$ is odd and at most one of any w consecutive digits is nonzero. In this case, the $\mathrm{NAF}_w$ of k can be computed using Algorithm 4.
Algorithm 4: $\mathrm{NAF}_w$ of a positive integer
Input: Window width w, positive integer k.
Output: $\mathrm{NAF}_w(k)$.
1. i ← 0
2. While k ≥ 1 do
2.1 If k is odd then $k_i$ ← k mods $2^w$, k ← k − $k_i$
2.2 Else $k_i$ ← 0
2.3 k ← k/2, i ← i + 1
3. Return ($k_{i-1}$, …, $k_1$, $k_0$)
In this work, a Maple code is written to obtain the expansion coefficients $\mathrm{NAF}_w$ with w = 2, namely, the coefficients $\mathrm{NAF}_w(2^{t-1} k \bmod n)$, which are represented by 2 bits.
The halve-and-add algorithm is shown in Algorithm 5. If the digit $k'_i$ ≠ 0, step 3 of the algorithm performs the point addition $Q_i$ + P in the Lopez-Dahab mixed coordinates ($Q_i$ and P are represented in LD projective and affine coordinates, respectively) using equation (19), followed by the point halving P/2 in the affine coordinates or λ-representation; otherwise, only the point halving is computed. In this case, it is important to mention that if the results of the first two operations A and B of equation (19) are equal to zero, the point doubling 2P is performed in the LD projective coordinates using equation (18) with $X_1 = x_2$, $Y_1 = y_2$ and $Z_1 = 1$.
Algorithm 5: Halve-and-add w-NAF point multiplication
The point addition in the LD mixed coordinates and the point doubling in the LD projective coordinates are implemented in hardware using the data dependence graphs shown in Figure 7 and Figure 8, respectively. According to Figures 7 and 8, the latencies for the point addition and point doubling are 5M and M + 3, respectively.
The architecture of the cryptoprocessor over GF($2^{163}$) using the halve-and-add algorithm is shown in Figure 9. It uses two register files, two digit-level finite field multipliers, one quadratic-equation-solving block, one point halving block, several squaring and adder blocks, a main control and an FSM to perform the point addition, point doubling and point halving.
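The width-w NAF recoding mentioned above can be sketched as follows. This Python sketch follows the standard $\mathrm{NAF}_w$ computation (signed residue modulo $2^w$ for odd k, then halving); it is not the authors' Maple code, and the helper names `mods` and `naf_w` are illustrative. Digits are returned least significant first.

```python
def mods(k: int, w: int) -> int:
    """Signed residue u = k mod 2^w with -2^(w-1) <= u < 2^(w-1)."""
    u = k % (1 << w)
    if u >= (1 << (w - 1)):
        u -= (1 << w)
    return u


def naf_w(k: int, w: int):
    """Width-w NAF of a positive integer k, least significant digit first."""
    if k < 1 or w < 2:
        raise ValueError("expected k >= 1 and w >= 2")
    digits = []
    while k >= 1:
        if k & 1:
            ki = mods(k, w)   # odd digit; k - ki is divisible by 2^w
            k -= ki
        else:
            ki = 0
        digits.append(ki)
        k //= 2
    return digits


print(naf_w(7, 2))  # → [-1, 0, 0, 1]
```

For k = 7 and w = 2 the digits encode 7 = 8 − 1, so only two nonzero digits remain instead of the three ones of the binary representation.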

Figure 9. Elliptic curve cryptoprocessor using the halve-and-add algorithm
The functional blocks that perform the finite field arithmetic operations over GF($2^{163}$) for the halve-and-add cryptoprocessor are shown in Figure 10. In this case, the finite field arithmetic operations are the addition, squaring, square root, trace, half trace (quadratic equation solving in a normal basis) and multiplication.

Figure 10. Blocks of the finite field arithmetic
The main control is an FSM that generates the control signals to perform the scalar multiplication, process the key, initialize the cryptoprocessor and control the I/O registers. The second FSM performs the point addition, point doubling and point halving.

Figure 11. ASM chart of the main control
Figure 11 shows the ASM chart of the main control, where the processing sequence is as follows: initialize the coordinates of Q according to the sign of the digit $k'_{t-1}$; perform the point halving operation on P; evaluate the digit $k'_i$ for i > t − 1; compute the point addition in the LD mixed coordinates and the point halving on P if $k'_i$ ≠ 0, otherwise compute only the point halving; and perform the conversion of the point P from the λ-representation to the affine coordinates only when a point addition is required. Finally, Q = kP is obtained in the LD projective coordinates.
Algorithm 7 performs the rounding of a complex number $\lambda_0 + \lambda_1\tau$ with $\lambda_0$ and $\lambda_1$ ∈ ℚ to obtain an element of ℤ[τ].

Hardware architecture using the w-τNAF algorithm
The length of the τ-adic representation for k = $d_0 + d_1\tau$ ∈ ℤ[τ] is roughly twice $\log_2(\max(d_0, d_1))$. Solinas (Solinas, 2000) presents a method that reduces the length of the τ-adic representation. The objective is to find r ∈ ℤ[τ] of small norm with r ≡ k (mod δ), where δ = $(\tau^m - 1)/(\tau - 1)$, and to use τNAF(r) to calculate rP.
Algorithm 6 calculates an element r´ ≡ k (mod δ), which is also written as r´ = k partmod δ. Solinas proved that l(r) ≤ m + a and, if C ≥ 2, then l(r´) ≤ m + a + 3.
The w-τNAF expansion can be efficiently computed using Algorithm 8, which can be viewed as an approach similar to the general NAF algorithm. In this work, a Maple code is written to obtain the w-τNAF expansion of the scalar k with w = 2, 4 and 8, generating 8-bit expansion coefficients, and with w = 16, generating 16-bit expansion coefficients.
Algorithm 8: Computing a w-τNAF of an element in ℤ[τ]
Solinas proposed algorithms to compute kP using the window τNAF method for the scalar k, namely, kP is calculated using the w-τNAF method and Horner's rule (Solinas, 2000). An efficient scalar multiplication algorithm that uses the w-τNAF method is presented in Algorithm 9, where step 1 calculates the w-τNAF of the scalar k with the partial reduction modulo δ = $(\tau^m - 1)/(\tau - 1)$, namely, w-τNAF(r ≡ k mod δ), where r ≡ k mod δ is obtained from Algorithms 6 and 7; step 2 generates the multiples of the point P; and step 4.2 performs the point addition Q + $P_u$ when the digit $u_i$ ≠ 0, and the point doubling 2Q when the results of the first two operations A and B of equation (19) are equal to zero.
1. Compute w-τNAF(r) = $\sum_{i=0}^{l-1} u_i \tau^i$, with r ≡ k (mod δ)
2. Compute $P_u = \alpha_u P$, for u ∈ {1, 3, 5, …, $2^{w-1} - 1$}
3. Q ← ∞
4. For i from l − 1 downto 0 do
4.1 Q ← τQ
4.2 If $u_i$ ≠ 0 then let u be such that $\alpha_u = u_i$ or $\alpha_{-u} = -u_i$; if u > 0 then Q ← Q + $P_u$, else Q ← Q − $P_{-u}$
5. Return Q
The point addition in the LD mixed coordinates and the point doubling in the LD projective coordinates with b = 1 are implemented in hardware using the data dependence graphs shown in Figure 7 and Figure 8, respectively. The architecture of the cryptoprocessor over GF($2^{163}$) using the w-τNAF algorithm for Koblitz curves is shown in Figure 12. It uses two register files, two digit-level finite field multipliers, one Frobenius map block, one RAM that stores the w-τNAF expansion coefficients of the scalar k, two ROMs that store the pre-computed points $P_u$ in the affine coordinates (obtained from Matlab for w = 2, 4, 8 and 16), several squaring and adder blocks, a main control and an FSM to perform the point addition, point doubling and τQ.
The functional blocks that perform the finite field arithmetic operations over GF($2^{163}$) for the w-τNAF cryptoprocessor for Koblitz curves are shown in Figure 13. The main control is an FSM that generates the control signals to perform the scalar multiplication, process the key, initialize the cryptoprocessor and control the I/O registers. The second FSM performs the point addition, point doubling and τQ.
In Figure 14, the ASM chart of the main control is shown, where the processing sequence is as follows: initialize the Q coordinates according to the sign of the digit $u_i$ of the w-τNAF expansion; evaluate the digits $u_i$ for i > t − 1; and compute the point addition in the LD mixed coordinates and the Frobenius map τ on Q if $u_i$ ≠ 0; otherwise, compute only τQ. Finally, Q = kP is obtained in the LD projective coordinates. In Figure 14, the ASM of the FSM is shown. One important remark is that the Koblitz curves are resistant to simple power analysis and to all the known special attacks (Juhas, 2007).

Hardware verification and synthesis results
The Lopez-Dahab, halve-and-add and w-τNAF cryptoprocessors are described using generic structural VHDL, synthesized for a digit size of d = 55 on the Stratix-IV FPGA (EP4SGX180HF35C2) using the Altera Quartus II version 12 design software, and verified using SignalTap II and Matlab.

Synthesis results for the cryptoprocessors
The synthesis results of the cryptoprocessors over GF($2^{163}$) are shown in Table 1. Additionally, some of the data presented in Table 1 are plotted in Figure 18. From Figure 18, we can see that the w-τNAF cryptoprocessor with w = 16 performs the scalar multiplication in the shortest time (5.05 µs), and the halve-and-add processor with w = 2 uses fewer area resources than the other processors.

Comparison of the results with other works
To compare the performance of the designed cryptoprocessors with respect to the cryptoprocessors presented in the literature, Table 2 shows several design parameters and processing times, such as area resources, frequency, kP time and time-area product.However, it is important to mention that performing a fair comparison in hardware design is very difficult because there are other technical considerations, including the technologies, hardware platforms, software tools, scalar multiplication algorithms, finite field representations, and size of the fields.
From Table 2, it is possible to observe that the GF($2^{163}$) cryptoprocessor presented in Mahadizadeh et al. (2013) requires less time to perform the scalar multiplication than our processor based on the Lopez-Dahab algorithm because the first processor uses three digit-level multipliers, whereas our design uses two digit-level multipliers and its latency to compute Madd and Mdouble is 3M. However, the first processor requires more area than our processor. Mercurio et al. (2006) compute kP by using the halve-and-add algorithm with m = 163, a polynomial basis representation and one parallel multiplier. Our processor requires more area than that processor because it uses two digit-level multipliers, but our design requires less time to perform the scalar multiplication, and its latency to compute the point addition is 5M. Finally, our processor based on the Koblitz curves has a higher performance (area and time) than the processor presented in Azarderakhsh (2013) because our design has a latency of 5M to compute the point addition, and it uses two digit-level multipliers and a window method that reduces the number of point addition operations.

Conclusions
This work presents the design of elliptic curve cryptoprocessors to compute the scalar multiplication over GF($2^{163}$) using the GNB.
The Lopez-Dahab, halve-and-add and w-τNAF algorithms are used to design the cryptoprocessors, which are described using generic structural VHDL and synthesized on the Stratix IV FPGA (EP4SGX180HF35C2).
Considering the hardware verification results, the 16-τNAF cryptoprocessor performs the scalar multiplication in the shortest time (5.05 µs), and the 2-NAF halve-and-add cryptoprocessor uses fewer area resources than the other processors, in this case, 22670 ALUTs. All the cryptoprocessors use roughly 17% of the ALUTs of the FPGA.
Additionally, it is important to mention that the algorithms are synthesized on the same hardware platform using Quartus II, are simulated in ModelSim, and are verified using SignalTap and Matlab; the cryptoprocessors use two digit-level finite field multipliers over GF($2^{163}$) in the GNB; the expansion coefficients for the private key k are obtained using the software Maple; and the FSMs use a data dependence graph to perform kP with a minimal number of states.
Future work will be oriented toward increasing the performance of the designed cryptoprocessors and toward the hardware implementation of GF($2^{233}$) processors. Additionally, new cryptoprocessors will be designed based on elliptic curves that are not included in the National Institute of Standards and Technology (NIST) recommendations, such as the Hessian and Edwards curves, to perform the scalar multiplication kP.
Figure 1. Digit-level multiplier

Figure 2. Data dependence graph for Madd and Mdouble

Figure 4. Elliptic curve cryptoprocessor using the Lopez-Dahab algorithm

Algorithm 4: $\mathrm{NAF}_w$ of a positive integer. Input: window width w, positive integer k. The NAF can be generalized to a window NAF: the $\mathrm{NAF}_w$ of a positive integer k with w ≥ 2 is represented by the expression $k = \sum_{i=0}^{l-1} k_i 2^i$.

Figure 13. Blocks of the finite field arithmetic

Figure 14. ASM chart of the main FSM

Figure 18. (a) Area resources. (b) Frequency. (c) Register resources. (d) Time to perform the scalar multiplication of each cryptoprocessor.