Dual-Homomorphic Message Authentication Code Scheme for Network Coding-Enabled Wireless Sensor Networks

Network coding has shown a considerable improvement in terms of capacity and robustness compared to traditional store-and-forward transmission paradigm. However, since the intermediate nodes in network coding-enabled networks have the ability to change the packets en route, network coding-enabled networks are vulnerable to pollution attacks where a small number of polluted messages can corrupt bunches of legitimate messages. Recently, research effort has been put on schemes for protecting the transmitted messages against data pollution attacks. However, most of them cannot resist tag pollution attacks. This paper presents a new homomorphic MAC-based scheme, called Dual-Homomorphic MAC (Dual-HMAC), for network coding-enabled wireless sensor networks. The proposed scheme makes use of two types of tags (i.e., MACs and D-MACs) to provide resistance against data pollution attacks and partially tag pollution attacks. Furthermore, our proposed scheme presents low communication overhead and low computational complexity compared to other existing schemes.


Introduction
Wireless sensor networks (WSNs) have been recently used in a plethora of applications of many different areas, such as surveillance monitoring, environmental monitoring, traffic control, natural disaster prevention, and e-health.Due to their wide range of applications, they have already become so attractive to research community.However, WSNs are characterized by low communication bandwidth and power consumption constraints [1].Thus, network coding (NC) can be a promising solution for WSNs, since it is a technique that can provide network capacity improvement [2] and lower energy consumption [3].
NC was introduced by Ahlswede et al. [2].Nowadays, NC has been used in various applications over different networks, such as wireless mesh networks [4], wireless sensor networks [5], and peer-to-peer systems [6].In contrast to traditional and classical commodity flow, where information is only routed or replicated, information flow can also employ coding operations at intermediate nodes.In [7], random linear network coding (RLNC) was studied as a fully distributed method for performing network coding.Also, in [7], it is mentioned that there is a possibility for each node in the network to select a set of coefficients independently and randomly and use them to make linear combinations of the data symbols.
However, NC is more susceptible to data pollution attack than the traditional store-and-forward approach.If a data pollution attack is not detected at the forwarders, then the sink nodes will not be able to recover the source messages correctly.It is worthwhile to mention that even a small number of polluted messages can infect a large number of downstream nodes because the pollution propagates via recoding.Furthermore, NC-enabled networks are also vulnerable to tag pollution attack, which is a more sophisticated type of pollution attack.In tag pollution attack, the adversary targets at modifying (i.e., polluting) the tags carried by the messages rather than modifying the content of the messages.It is possible for a message with polluted tags to travel multiple hops until it is detected and discarded.Yet, this results in a waste of network bandwidth.

2
International Journal of Distributed Sensor Networks Thus far, several cryptographic schemes such as homomorphic hash functions, signatures, and MACs have been presented to secure network coding against data and tag pollution attacks (e.g., [8,9]).Furthermore, informationtheoretic schemes have been also proposed (e.g., [10,11]) to detect data pollution attacks.Although information-theoretic schemes are more efficient, in terms of computation, compared to the cryptographic schemes, they can only detect polluted packets at the sink nodes.Among all the above mentioned proposed schemes, Homomorphic MAC is considered as a low-complexity solution against data pollution [12,13].Specifically, a MAC or tag is a small piece of information appended to the end of the message packet.This piece of information is the output of a MAC function taking as inputs the message packet and a secret key.However, this solution is vulnerable to tag pollution attacks.
In this paper, we propose a new homomorphic MACbased scheme, called Dual-Homomorphic MAC (Dual-HMAC), for NC-enabled WSNs, in order to mitigate both data pollution attacks and partially tag pollution attacks.In our scheme, the source generates multiple MACs and Dual MACs (D-MACs) for each message.Each MAC ensures integrity of the transmitted message and each D-MAC ensures integrity of the MACs.In other words, by appending D-MACs, the proposed scheme mitigates partially tag pollution attacks.
The rest of the paper is outlined as follows.In Section 2, we give an overview of the related work on security schemes against data pollution attacks and tag pollution attacks in NC-enabled networks.In Section 3, the problem statement is discussed.In Section 4, our proposed Dual-HMAC scheme is presented.In Section 5, the security analysis of the proposed scheme takes place.In Section 6, its performance evaluation is given.In Section 7, a comparison among the communication and computation overhead of our work (i.e., Dual-HMAC) and the communication and computation overhead of the related works in [9,14] is provided.Finally, Section 8 concludes the paper.

Related Work
In this section, we present related work on security schemes against pollution attacks in NC-enabled networks.In addition, we present the key distribution mechanisms used by these security schemes.

Security Schemes against Pollution Attacks in NC-Enabled
Networks.Most of the research in the field of pollution attacks in network coding is focused on two types of security schemes: information-theoretic schemes [10,11] and cryptographic schemes [14,15].

Cryptographic
Schemes.These schemes rely on additional verification information which is added by the sources through cryptographic techniques.This allows intermediate nodes to verify the original messages and filter out the polluted ones.This category includes the homomorphic hashing schemes, the homomorphic signature schemes, and the homomorphic MACs.
(a) Homomorphic Hashing Schemes.Homomorphic hashing schemes are based on a homomorphic hash function which is applied by the source to the messages.These schemes rely also on additional secure communication channels in order to transmit the calculated hash values to the intermediate nodes.
A well-known homomorphic hashing scheme is proposed by Krohn et al. in [8] which enables the intermediate nodes to verify on-the-fly.The extension of this work was presented in [16] which is focused on network coding-enabled networks and further reduces the expensive computation cost of hash functions in each intermediate node by enabling cooperation verification.
(b) Homomorphic Signature Schemes.Homomorphic signature schemes were introduced for the first time by Charles et al. in [17] and have been based on Weil pairing over elliptic curves.The homomorphic property of the signatures in these schemes allows nodes to sign any linear combination of the incoming packets without contacting the signing entity.The calculation of signature covers the whole augmented message.In [18], Yu et al. proposed a homomorphic signature function which allows the relay nodes to verify the received messages by generating the signatures without contacting the signing entities.In Yu's scheme, the communication does not need any extra secure communication channel.Their experimental results show that its verification efficiency is improved up to 10 times compared to the other related work (i.e., CJLs scheme [17]).Moreover, in [10], an alternative lightweight scheme is proposed.It is much faster rather than Yu's scheme, but it is not as secure as the first one.
(c) Homomorphic MACs.Homomorphic MACs are based on appending some extra information to the original message.The basic definition of homomorphic MAC is defined by Agrawal and Boneh in [14] which allows the integrity checking of the network coded data.However, this idea is susceptible to tag pollution attacks.Kehdi and Baochun [19] presented a homomorphic MAC scheme to detect pollution attacks based on the subspace properties of random linear network coding, where null keys for verification are used.As a lot of null keys are used in each generation, a high bandwidth overhead can be incurred.The authors in [13] proposed an idea where the source attaches multiple MACs to each packet.This idea can be applied for XOR network coding networks; however it is vulnerable to tag pollution attacks.RIPPLE [20] was proposed to counteract against the tag pollution problem.This work is based on a symmetric key based scheme for network coding authentication.RIPPLE allows a node to efficiently detect corrupted packets and encode only the authenticated (i.e., verified) ones.However, the global synchronization among all nodes is the problem of this scheme.In [9], the authors presented a hybrid-key cryptography approach which could achieve authentication in the presence of both data and tag pollution attacks.To achieve this, a number of tags and a signature are appended to each transmitted message.However, the signature which is necessary for preventing tag pollution is a time consuming process.

Information-Theoretic Schemes.
The informationtheoretic schemes benefit from the reasonable computation performance in comparing to the cryptographic schemes, but they have two main drawbacks.First of all, they are not able to detect the polluted messages at the intermediate nodes of the network.For example, the work presented in [10], which extends the random network coding, relies on coding redundant information into the original messages allowing only the receivers to detect the polluted messages.Furthermore, the information-theoretic schemes are characterized by the limitation on the number of compromised nodes.These two drawbacks comprise the main motivation to interest to the cryptographic schemes.

Key Distribution Mechanisms in NC-Enabled Networks.
Nowadays, in emerging multicast communication systems (e.g., pay TV and video-conferencing services), the key distribution is a challenging issue.There are various proposed schemes for key distribution in multicast systems.Key distribution models can be categorized in trusted-server schemes [21], public-key schemes, and key predistribution schemes [22].The trusted-server and public-key schemes are infeasible to be used in WSNs as the sensors have limitations in terms of energy and computing processes.On the other hand, key predistribution schemes are feasible to be used in WSNs.In these schemes, the shared keys are distributed among the nodes of a group in such a way that every node in the group is able to compute individually a common key associated with that group.Two examples of key predistribution schemes are given as follows.
In MacSig which is an efficient subspace authentication scheme for NC and presented in [9], the double-random key distribution has been proposed.This proposed keys distribution model includes two random procedures.First of all, they assign each node with a random set of keys; next, the source node randomly selects a subset of keys from the key pools and uses these keys to generate the MACs.However, it is necessary to attach the indexes of these keys to each packet.Hence, a considerable bandwidth overhead is incurred.
In KEPTE (key predistribution-based tag encoding) scheme, a key distribution model for practical NC has been proposed in [23].According to this key distribution model, a Key Distribution Center (KDC) is responsible for the distribution of the keys.The KDC allocates  secret vectors at the source node to produce  tags.Moreover, it assigns two secret vectors to each node to check the correctness of its received packets.

Problem Statement
In this section, we provide our problem statement through a network coding-enabled WSN model and its corresponding adversary model.

Network Coding-Enabled WSN Model.
We consider a network coding-enabled WSN, as it is depicted in Figure 1, consisting of a source node , a number of intermediate nodes, and a set of sink nodes.Random linear network coding is exploited in this network.In order to model our network, we define a triple (, , ) which consists of the following components: (i) directed multigraph : we consider a pair of ,  as a directed acyclic graph where  and  are the node sets and edge sets of , respectively; (ii) source node : in our network model, we have a source  that wants to multicast its messages; (iii) non-source node : we define relay and sink nodes in a set of nodes which is defined as:  = {∀ ∈  − {}}.
Notations section summarizes the main notations used in this paper.
Prior to transmission, the source node partitions message packets into generations.Each generation consists of  messages packets denoted as u 1 , u 2 , . . ., u  .Each packet u  is represented as a vector of  symbols ( ,1 ,  ,2 , . . .,  , ) in the finite field F   .In [24], it is shown that  = 2 8 is usually sufficient for practical use and convenient for computation.Then, the source node generates an augmented packet u  by prefixing u  with the th unit vector of dimension  represented as After that, the source node transmits these augmented packets to its neighbour nodes.During transmission, an intermediate node buffers its received packets u  temporarily and creates an outgoing message  as linear combinations of packets u 1 , u 2 , . . ., u ℎ belonging to the same generation: where each   ∈ F  is a random coefficient chosen by each forwarder.A coded packet  is considered to be valid if it is in the linear subspace spanned by the original augmented packets, denoted as  ∈ Span{u 1 , . . ., u  }.In fact, when  is valid, these linear combination coefficients are the first Figure 2: A basic scenario of tag pollution.We consider a scenario which has a packet and 4 tags.We consider a node A as an adversary that pollutes the second tag  2 of the packet.In (a), we see that the next hop, node B, has two sets of keys ( 2 ,  3 ) and is able to detect tag pollution.As a result node B discards the packet.In (b), the next hop, node B does not have ( 2 ), so it allows the polluted packet to pass which results in wasted bandwidth; however, the hop, node C is able to detect the polluted packet.In the last scenario shown in (c), we demonstrate the case, where totally wasted bandwidth occurs since none of the nodes holds key  2 .symbols of the packet .Otherwise,  is invalid and it is denoted as  ∉ Span{u 1 , . . ., u  }, which may be caused by transmission errors or pollution attacks.After transmission, when a sink node has obtained  linearly independent coded packets, it can decode those using Gaussian eliminations, so as to recover the original messages [7].

Adversary Model.
For the above mentioned network coding-enabled model, we consider that the source is always trusted and there is no possibility to forge it, but the intermediate and sink nodes can be compromised.The adversary is able to wiretap all the data packets that are transmitted over a network.The adversary's goal is to achieve pollution attack.There are two types of pollution attack.

Data Pollution
Attack.An adversary can inject fake data packets into the network.The objective of data pollution attack is to pass the verification of other innocent nodes and to cause incorrect decoding at the sink node, as well as wasting of bandwidth.
Tag Pollution Attack.An adversary injects a corrupted packet consisting of correct data but modified tags to the network.The objective of tag pollution is to discard the correct data packets due to the corresponding corrupted tags.This results in the waste of bandwidth.A basic scenario of tag pollution attack is presented in Figure 2.

Proposed Scheme: Dual-Homomorphic MAC (Dual-HMAC)
In this section, we present our Dual-HMAC scheme, which mitigates data pollution and partially tags pollution attacks for a network coding-enabled WSN.The Dual-HMAC scheme's outline, the construction and the correctness of the scheme, and the key distribution model are provided in this section.
Our proposed solution is based on the homomorphic MAC solution, defined by Agrawal and Boneh [14], and the MacSig scheme proposed in [9].Additionally, our proposed scheme makes use of additional MACs, termed as D-MACs, in order to resist data pollution and partially tag pollution attacks.In contrast to the homomorphic MAC solution in [14], our proposed scheme is not susceptible to tag pollution attacks.Particularly, our proposed scheme can achieve resistance for the half of the tags against tag pollution attack with significant low computational complexity compared to the MacSig scheme, which uses a signature scheme characterized by a much considerable computational overhead.
Our scheme consists of three steps, as it is depicted in Figure 3, in order to generate the required MACs and D-MACs.In step 1, it computes a number of MACs (i.e., tags), for each message, according to the keys which are chosen randomly from the first pool of keys.Then, in step 2, it computes a number of D-MACs for the initial MACs, according to the keys which are chosen from the second pool of keys.Finally, in step 3, the computed MACs and D-MACs are appended to the message.More specifically, the main idea of our work is based on the orthogonal subspace property [9].

The Outline of the Dual-HMAC Scheme.
In this section, we provide description of the proposed Dual-HMAC scheme with a formal method.This scheme includes four steps: setup, tag generation, verification, and encoding detailed as follows.

International Journal of Distributed Sensor Networks
The basic idea of Dual-HMAC scheme which is related to the padding orthogonality property.

Setup
(a) Key Distribution Centre (KDC) distributes the following two sets of secrete key vectors to the source node : KDC also distributes the following two subsets of secrete vectors to all intermediate nodes and sink nodes: (c) Note that the subscriptions ,   , , and   represent the index of the key, which do not change in different subsets.In other words, if The detail of probabilistic key distribution algorithm will be described in Section 4.4, as well as how to choose the key set size. with its tags, this node checks the correctness of packet  using the algorithm Verify via its predistributed keys K   and K    .If the output is 1, then the received coded packet  is considered to be a correct one.Otherwise, this packet is discarded.

The Construction.
In this subsection, we provide the construction of the scheme.Particularly, we present the above-mentioned four algorithms, namely, MAC, D-MAC, Combine, and Verify, in detail.The algorithm MAC computes  tags for each data packet u  , and the algorithm D-MAC computes another   tags for the  tags.Moreover, the algorithm Combine generates new tags for RLNC encoded International Journal of Distributed Sensor Networks packet.Furthermore, the algorithm Verify checks the correctness of a data packet with all tags.Details of these four algorithms are described as follows.

𝑀𝐴𝐶(K
Theorem 1.The first verification is correct. Proof.According to the description of "Setup," since K   ⊂ K  , ∀k  ∈ K   , there is a key k  = k  and  = .As a result, by letting  = u  , k  = k  , and substituting  , =  u  , using (3), it is for sure that (6)   = 0, ∀k  ∈ K   .Similarly, since K    ⊂ K   , ∀k    ∈ K    , there is a key k   = k   and   =   .As a result, by letting  = u  , k   = k   , and substituting   ,  =   u  ,  using (4), it is for sure that ( 7) Theorem 2. The second verification is correct.
Proof.According to (3), the tags are created in a way that k  is orthogonal to the concatenation of u  and its  tags k  ⊥ (u  ‖  u  , ).This relationship is true for all packets in the same generation, which represents as Moreover, the algorithm Combine is a linear combination, which does not change the subspace.Hence, ( ‖  , ) ∈ Span{(u  ‖  u  , ) ℎ =1 }, k  ⊥ ( ‖  , ), and ( 6)   = 0, ∀k  ∈ K   .Similarly, we could derive     = 0, ∀k    ∈ K    .As a result, the output of Verify is 1, and the second condition is also correct.

Key Distribution Model.
In our work, we assume that a set of symmetric keys are distributed to all participant nodes in a secure and authenticated manner through a key International Journal of Distributed Sensor Networks

𝑁 − 𝑖
A representation of adversary behavior.Max hops shows the maximum hops which the polluted packet can travel through the network.We consider the worst case when the adversary node can have access to all the next  − 1 nodes keys (say, a chain of  compromised nodes; moreover, ( − ) is the maximum number of hops between the adversary and the sink node).
distribution scheme.In this section, we provide the model of the key distribution scheme that we have adopted.
A set system is a pair (, ), where  is a finite set of elements and  is an ordered set of subsets of .Definition A. A set system (, ) is called a  cover-free family (c-CFF) if, for any  sets  1 ,  2 , . . .,   ∈  and any other set  ∈ , one has We believe that if key assignment is done properly, no coalition of  nodes can fool another node.Our key predistribution scheme considers two key pools which are available at the source node.We use all these two sets for checking the message's integrity.It is shown in [25] that, in order to resist against less than  compromised nodes, the cardinality of the key size at the source must be ( + 1) times larger than those key sizes at the relay/sink nodes, which could be expressed as  ≥ ( + 1), and   ≥ ( + 1)  .More specifically, we claim that, by this model, we can be sure each verifier has some shared keys which can verify the integrity of messages.Our scheme assigns two sets keys.In addition,  is the security parameter and it stands as  = (N ⋅ (   )) −1 , where  is the total number of recipients (usually  = 10 −3 would be sufficient) and  is a fixed number of corrupted users.Every key k  or k    is included in a set of each verifier with probability of 1/2( + 1) (for simplicity, we consider the same length for both poll keys).

Security Analysis
In this section, we analyse the possibilities of an adversary launching pollution attack successfully.Hence, three possible pollution attack behaviours are analysed, and their security levels are quantified, respectively.We summarize our analysis in Table 1.Finally, the possible knowledge that an adversary can access is analysed.

The Resistance against Pollution
Attack.We consider the following three possible pollution attack scenarios.

1st Pollution Behaviour: Data Pollution
Attack.An adversary node intends to make changes in a message.This polluted message can be detected immediately by the next hop, or in a few hops later (maximum  − 1 hops later).It means the polluted message should travel some hops to be detected.The worst case happens whenever an adversary can do coalition with the other  − 1 compromised nodes.

2nd Pollution Behaviour: Tag (i.e., MAC) Pollution
Attack.An adversary node intends to make changes in the MACs.This attack can be detected immediately by the next hop if he cannot compromise the next hop and get the next hop keys, or in a few hops later (maximum  − 1 hops later).It is similar to the 1st pollution behaviour.[14] and [9].[14] MacSig [9] Dual-HMAC Proof.The probability of recovering the same shared key between two nodes is 1/2( + 1), so the probability of the key  from a user  located in next hop  + 1 is 1/2( + 1), and this probability would be (1/2( + 1)) 2 if this key is found in two hops later.In other words, if the total number of hops between an intermediate node and a sink node, named as downstream node, is ( − ), then the probability that a downstream user receives the same key as key  of user  is not less than [1 − 1/2( + 1)] − .

The Possible Knowledge of an Adversary.
As outlined in Section 3.2, we assume the source node is trustworthy and the process of key predistribution is secure.Hence, the secret key sets, K  and K   , assigned to the source node are considered secure.However, an adversary can wiretap all the data communication in a network and may compromise several relay or sink nodes.Hence, the adversary can get access to the received packets from the previous hops, the key information distributed to him by the KDC, and the key information stored at the compromised nodes.

Performance Evaluation
In this section, communication and computation overhead of the proposed Dual-HMAC scheme are presented.
6.1.Communication Overhead.In our scheme, each source message has  +  codewords and each codeword has log 2  bits.So, the bit-length of a source message is ( + )log 2 .The  MACs and the   D-MACs are appended to the source message.We consider log 2  symbols for each MAC and D-MAC.By fixing  +   = , our scheme has the following communication overhead: Hence, the Dual-HMAC scheme's overhead is equal to the scheme presented in [14].However this scheme is vulnerable to tag pollution.Additionally, the bandwidth overhead between our scheme and the scheme proposed in [9] is given by the following equation: Base on (11), the bandwidth overhead comparison between our scheme and the scheme described in [9], for different values of , is depicted in the graphs included in Figure 4.

Computation Overhead.
Recall that, to append  MACs and   D-MACs, ( +  + 1) and   ( + 1) multiplication operations for MACs and D-MACs are needed, respectively.In addition, for verification, we consider that all the relay nodes have shared keys and each node needs to verify the message.So, as shown in ( 6) and (7), our scheme needs (+  + 1) multiplications for MACs and   ( + 1) multiplications for D-MACs.

Discussion
The number of appended tags which is needed in [14] is .However, the total number of tags (homomorphic MACs and a signature) which is used in [9] is  + 1 + ( key indexes).
To conclude this section, consider the following.
(i) Dual-HMAC incurs a relatively low communication overhead which is almost equal to [14]; however, our scheme's overhead, according to (11), saves up to (1.25 + 1/) in comparing to MacSig approach [9].This means that our scheme provides at least 25% bandwidth saving.(ii) Dual-HMAC not only slightly increases the tag pollution resistance, but also decreases the computation overhead compared to the schemes in [9,14].
International Journal of Distributed Sensor Networks MacSig [9] Dual-HMAC MacSig [9] Dual-HMAC (iii) Dual-HMAC provides resistance for half of the tags against tag pollution attack.This results in reducing the possibility of tag pollution around 50% compared to the work presented in [14].
(iv) Dual-HMAC uses symmetric keys, in contrast to MacSig scheme using symmetric and public keys.

Conclusion
Network coding-enabled WSNs are susceptible to pollution attack where a small number of polluted messages can corrupt bunches of legitimate messages.This paper has presented a homomorphic MAC-based scheme, called Dual-HMAC, for network coding-enabled WSNs.The proposed scheme makes use of two types of tags to provide resistance against data pollution attacks and partially tag pollution attacks.The partial protection against tag pollution is related to the key predistribution model.Furthermore, our proposed scheme presents low communication overhead and low computational overhead compared to other existing schemes.Finally, as future work, we plan to propose a scheme that will mitigate not only partially tag pollution attacks but also tag pollution attacks.

𝑚:
The number of packets per generation u  : Th eth source message : The number of symbols of each packet : A finite field size (say, 2 8 ) : As e c u r i t yp a r a m e t e r( s a y ,10 −3 ) : Th en u m b e ro fM A C s   : Th en u m b e ro fD -M A C s : The number of source keys ( =  1 +  2 )   : Th eth MAC    : Th eth D-MAC : The number of verifiers : The number of compromised nodes K  , K   : The two polls of keys of source K   , K    : The two polls of keys of th node.

5. 1
.3.3rd Pollution Behaviour: Dual-Tag (i.e., D-MAC) Pollution Attack.An adversary  which is one of the  legitimate International Journal of Distributed Sensor Networks

Table 2 :
Communication and computation overhead of Dual-HMAC and the two related works in