Privacy-Preserving Identity Management System on Blockchain Using Zk-SNARK

Privacy plays a crucial role in the internet era, where many applications allow people to communicate and use their services through the internet. A privacy-preserving identity management (PPIdM) system is a scheme that helps manage users’ identities and protects users’ privacy by enabling users to authenticate themselves without disclosing their real identities. The PPIdM system also allows users to selectively reveal some minor identity attributes while keeping the others secret. However, anonymity also encourages malicious users to break the system’s policy and commit crimes since their real identities are hidden. Existing PPIdM systems use the identity provider (IP) as a medium to verify users’ identity attributes, record all users’ real identities, and ensure that malicious users’ identities are traceable. Therefore, users’ identities are hidden from all entities but the IP. However, the user’s privacy is vulnerable because there is nothing to guarantee that the IP is always honest and not curious about its users’ activities and private information. This paper proposes a PPIdM system on the blockchain that helps users manage their identity attributes and keeps their real identities secret from all entities, including the IP. Still, the system’s consensus can trace malicious users’ real identities if they violate the system’s policy. The PPIdM’s security requirements are analyzed and proved informally using the game-based proof scheme. The main idea of this study is to combine zk-SNARK, a type of zero-knowledge proof (ZKP), Shamir’s secret sharing (SSS), and several other cryptographic techniques.


I. INTRODUCTION
In the internet era, using online services from a service provider (SP) or communicating with other users in an online community sometimes requires users to have specific identity attributes (IAs), such as age, gender, and additional information. In these cases, users must prove that their IAs are valid, and an identity management (IdM) system [1], [2] is necessary. The IdM system is a system that helps users manage their IAs. There are two types of IAs: primary identity attributes (PIAs) and minor identity attributes (MIAs). A PIA is a unique attribute that can identify a user. Driver's license, citizen identity, and passport numbers are typical types of PIA. On the other hand, MIAs are common attributes that do not identify a specific user, such as date of birth, gender, workplace, or other additional information.
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Huang.
Using the IdM system, a user must register and ask a trusted IP to verify his IAs. After verifying the IAs, the IP will issue a certificate indicating the user's IAs are valid. Afterward, the user can manage his verified IAs and use his certificate to prove the validity of the IAs when using the SP's services.
Because each user is identified by a PIA and the PIA must be shown in the certificate, the SP can track all of its users' activities. Hence, this type of system lacks privacy. A PPIdM system does not include the user's PIA in the certificate. Instead, the IP issues each user a pseudonym (PS) and places this PS in the user's certificate. Because a PS does not reveal any information about the user's PIA and only the IP and the PS's owner know to whom the PS belongs, the SP and other entities cannot track users' activities and service history. Nevertheless, if the IP is malicious, it can collude with the SP to track all the user's activities and service history. Therefore, this system is ideal only under one assumption: the IP is always honest and is not curious about the user's private information and activities.
A solution to the above issue is to keep the PS away from the IP. In other words, the certificate still comprises the PS instead of the PIA, but the IP does not know who the PS owner is. Therefore, there must be a method for the IP to verify a user's IAs and issue the user a certificate without knowing the user's PS. Anonymous authentication using zk-SNARK [3] is a good solution to this problem. However, if only the user knows the PS, this PPIdM system will lack traceability. It will encourage malicious users to commit crimes because there is no way for the IP to trace their PIAs. In addition, the zk-SNARK scheme is vulnerable to collusion attacks, in which multiple users can simultaneously use a certificate as the zk-SNARK's witness. Collusion attacks enable unregistered users to join the system by reusing other users' certificates. Therefore, an ideal PPIdM system should provide anonymity for honest users and traceability for malicious users. It should also prevent a user from sharing his certificate with other users. With this idea, we introduce a novel PPIdM system with the following contributions.
• We employ the public blockchain, the ElGamal encryption, the hash-based commitment scheme, and the zk-SNARK scheme to provide IAs-selective disclosure, anonymity, and unforgeability and counter the collusion attacks. We use the Shamir Secret Sharing (SSS) scheme [4] and the blockchain's consensus to provide traceability.
• We analyze the security of the proposed system based on essential requirements for a PPIdM system.
• We calculate the time complexity and the gas cost of the proposed system.
ElGayyar et al. [5] introduced a robust automatic blockchain-based federated IdM. Smart contracts automatically generate identities and audits for users. Users can control their identities and store them on the blockchain. However, this system's anonymity is entirely based on the anonymity of the blockchain. In addition, because this system employs smart contracts to encrypt and decrypt users' data or identities, users' identities may be visible to the public because of the transparent property of the public blockchain.
The system of Gao et al. [6] also allows users to hide their real identities from the public. First, a user authenticates himself by bringing a smartphone with a biometric authentication mechanism and his real identity to the IP. The IP generates a pair of asymmetric keys and embeds the secret key in the smartphone. It issues a certificate comprising the user's information and the public key to the user. Afterward, the user uploads this certificate to the blockchain network using his PS. Any transactions from the user must be signed by the secret key embedded in the smartphone, which is unlocked only by the user's biometric. This system is similar to ElGayyar et al. [5]. Users' identities are stored on the blockchain and visible to other users or SPs. The system's anonymity is based on the anonymous addresses of the blockchain.
Xu et al. [7] have the same approach as Gao et al. [6] for building an IdM system for mobile devices. A user must first provide his identity and public key to the IP. After the IP authenticates his identity and gives him a verifiable claim, he sends his authenticated identity and the verifiable claim to the network operator. The network operator will verify this claim and upload his identity and public key to the blockchain network. Afterward, all the blockchain network members can verify this user by querying his public key on the blockchain. Unlike Gao et al. [6], their scheme employs the consortium blockchain instead of a public blockchain. Besides the above system, schemes [8] and [9] have the same overall architecture. However, these systems have a low privacy level as users' identities are visible to everyone because they are stored in the blockchain network. The following systems can solve these problems.
In the system of Kassem et al. [10], storing the user's IAs in a permissioned ledger can prevent unauthorized SPs from accessing the user's IAs without the user's consent. Each user has an Ethereum account that maps to their IAs, verified by specified smart contracts. The SPs must gain the user's consent to access his permissioned ledger to verify his IAs. In the system of Faber et al. [11], each user has a personal off-chain database to store their identities instead of keeping them in the blockchain. The hashes of these data are uploaded to the blockchain as the addresses of their IAs. Service providers must have the user's consent to access a user's personal database to verify the user's IAs. However, users must use the same account for consuming on-chain services and for receiving IAs and data from IPs. Moreover, the above studies share a common problem that decreases privacy. Although the users' real identities are hidden from the public, they must be visible to the IPs to trace the real identities of malicious users. In other words, the privacy of these systems strongly relies on the IPs. There is nothing to guarantee that the IPs will not collude with the service providers to link users' real identities with the corresponding private data from the service providers. Similarly, the scheme [12] also has this problem.
VOLUME 11, 2023
To eliminate this disadvantage, Zhuang et al. [13] introduced a blockchain-based PPIdM system that enables users to hide their IAs from both the public and the IP but still allows the IP to trace their real identities if the users violate the system's policy based on the blockchain's consensus and SSS. This system works similarly to the scheme of Xu et al. [7] to achieve this goal, except that it divides the user's PIA into k shares using SSS instead of directly storing it in the blockchain network. A valid user can use his user identity (UID) to communicate with service providers. The service providers can verify if the user is valid by querying his UID on the blockchain. Notably, this UID does not disclose information about the user's PIA. If the user violates the system policy, the system can reconstruct his PIA based on k shares using SSS. However, this system strongly assumes that the IPs must delete the user's PIA and the associated UID. Otherwise, SSS is meaningless because the IPs can always map the UID to the corresponding PIA.
Based on the above literature, we can see that these systems' privacy is low or moderate. In addition, none of them support selective disclosure. Therefore, we addressed their existing problems by designing a privacy-preserved IdM system on the blockchain that enables users to hide their IAs from the public and the IP. The proposed system ensures that no single party can trace a user's PIA. When malicious users violate the system policy, it requires the agreement of a pre-defined number of parties to open the user's PIA.

III. PRELIMINARIES
A. SHAMIR SECRET SHARING
Shamir Secret Sharing (SSS) [4] is an algorithm for protecting a secret in a distributed way based on polynomial interpolation over finite fields. Given a secret $s_0$, SSS is a perfect (k, n)-threshold scheme that satisfies the following two properties.
• The secret $s_0$ can be reconstructed from any k or more shares with probability 1.
• The secret $s_0$ is information-theoretically hidden from anyone who tries to reconstruct it using at most k − 1 shares.
Let G = (g, q) be a group of order q, where g is a generator of G. The (k, n)-threshold SSS in the group G can be presented with the following two functions.
• SharesGen(G, $P_{k-1}$, $s_0$) → $S_n$, where $P_{k-1}$ is a set of k − 1 numbers from $p_1$ to $p_{k-1}$ randomly chosen in $\mathbb{Z}_q$ and $s_0 \leftarrow g^{w}$ is the secret for a random $w \in \mathbb{Z}_q$. Then, $S_n$ is a set of n points $s_i = (x_i, y_i)$ for i = 1, . . . , n, which are generated as follows.
$$f(x) = w + p_1 x + p_2 x^2 + \dots + p_{k-1} x^{k-1} \bmod q,$$
$$s_1 = (1, g^{f(1)}),\quad s_2 = (2, g^{f(2)}),\quad \dots,\quad s_n = (n, g^{f(n)}).$$
• Reconstruct($S_k$) → $s_0$, where $S_k$ is a set of k shares. The secret $s_0$ is reconstructed as follows.
$$s_0 = g^{f(0)} = \prod_{(x_i, y_i) \in S_k} y_i^{\lambda_i}, \qquad \lambda_i = \prod_{(x_j, y_j) \in S_k,\ j \neq i} \frac{x_j}{x_j - x_i} \bmod q.$$
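The two SSS functions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a toy multiplicative group (the subgroup of prime order 1019 in Z*_2039, generated by 4) in place of an Edwards-curve group, and the names `shares_gen` and `reconstruct` are hypothetical.

```python
import secrets

# Toy group (assumption): subgroup of prime order Q in Z_P^*, where P = 2Q + 1.
P, Q, G = 2039, 1019, 4


def shares_gen(w, k, n):
    """Split the secret s_0 = G^w into n shares (i, G^f(i)) with threshold k."""
    # f(x) = w + p_1*x + ... + p_{k-1}*x^{k-1} mod Q with random coefficients p_j.
    coeffs = [w % Q] + [secrets.randbelow(Q) for _ in range(k - 1)]

    def f(x):
        return sum(c * pow(x, j, Q) for j, c in enumerate(coeffs)) % Q

    return [(i, pow(G, f(i), P)) for i in range(1, n + 1)]


def reconstruct(shares):
    """Recover s_0 = G^f(0) from at least k shares (Lagrange interpolation in the exponent)."""
    s0 = 1
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * xj % Q
                den = den * (xj - xi) % Q
        # Lagrange coefficient at x = 0; pow(den, -1, Q) needs Python 3.8+.
        lam = num * pow(den, -1, Q) % Q
        s0 = s0 * pow(yi, lam, P) % P
    return s0


if __name__ == "__main__":
    shares = shares_gen(w=123, k=3, n=5)
    print(reconstruct(shares[:3]) == pow(G, 123, P))
```

Any k of the n shares give the same result, because the Lagrange coefficients are recomputed for whichever share indices are supplied.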
B. ZK-SNARK
Our proposed system employs the zk-SNARK scheme of Parno et al. [14]. We briefly recall it as follows.
• KeyGen(C) → {PK, VK}: this function takes an arithmetic circuit C and creates a proving key PK and a verification key VK.
• Proof(w, x, PK) → π: this function creates a proof string π (that consists of eight elliptic curve points) by taking as input a public input x, a witness w, and the proving key PK.
• Verify(π, x, VK) → b: this function takes the proof string π, a public input x, and the verification key VK to check whether π is valid, and outputs a decision bit b.
The zk-SNARK scheme of Parno et al. also satisfies the following properties.
• Completeness: If the prover generates a correct proof string π using a witness w, the verifier always accepts π.
• Succinctness: The size of the proof π is short, regardless of the size of the witness w and the public input x, and also π is efficiently verified.
• Zero-knowledge: There exists an efficient simulator that takes a proving key PK and a public input x, and outputs a simulated proof that is indistinguishable from a real proof (generated with a witness w).
• Soundness: Given a proving key PK, a verification key VK, and a public input x (but not the associated witness w), it is infeasible to generate a proof string that passes verification.

C. DIGITAL SIGNATURE
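We recall the Edwards-curve digital signature algorithm (EdDSA) [15]. EdDSA is a variant of the Schnorr signature based on performance-optimized twisted Edwards curves. Let G = (g, q) be a group (over an Edwards curve) of order q, where g is a generator of G. Then, EdDSA consists of the following three algorithms.
• KeyGen$_{sig}$(λ, G) → {$SK_{sig}$, $PK_{sig}$}, where λ is a security parameter, $SK_{sig} \in \mathbb{Z}_q$ is the secret key, and $PK_{sig} \leftarrow g^{SK_{sig}} \in G$ is the public key.
• Sign$_{sig}$(m, $SK_{sig}$) → {R, s}, where m is a message to be signed. Given a hash function $H : \{0, 1\}^* \rightarrow \mathbb{Z}_q$ and a nonce $r \in \mathbb{Z}_q$ derived deterministically from secret key material and m, {R, s} is a signature that is calculated as follows.
$$R \leftarrow g^{r}, \qquad s \leftarrow r + H(R, PK_{sig}, m) \cdot SK_{sig} \bmod q.$$
• Verify$_{sig}$(m, {R, s}, $PK_{sig}$) → b, where b is the decision bit and is decided as follows.
$$b = \begin{cases} 1 & \text{if } g^{s} = R \cdot PK_{sig}^{H(R, PK_{sig}, m)}, \\ 0 & \text{otherwise.} \end{cases}$$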
Since EdDSA is a variant of the Schnorr signature, it is easily shown (and well-known) that EdDSA is also unforgeable against chosen-message attacks based on the hardness of a discrete logarithm problem in G, assuming that H is modeled as a random oracle.
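The Schnorr-style structure of the signature can be illustrated with a short Python sketch. It is a simplified model under explicit assumptions: a toy multiplicative group replaces the twisted Edwards curve, SHA-256 reduced modulo q stands in for H, the nonce is derived by hashing the secret key with the message, and encoding and cofactor details of real EdDSA are omitted. The function names are hypothetical.

```python
import hashlib
import secrets

# Toy group (assumption): subgroup of prime order Q in Z_P^*, where P = 2Q + 1.
P, Q, G = 2039, 1019, 4


def h_to_zq(*parts):
    """Hash arbitrary values to an element of Z_Q (stand-in for H)."""
    data = b"|".join(str(p).encode() for p in parts)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def keygen():
    sk = secrets.randbelow(Q - 1) + 1      # SK_sig in Z_Q, nonzero
    return sk, pow(G, sk, P)               # PK_sig = G^SK_sig


def sign(m, sk):
    pk = pow(G, sk, P)
    r = h_to_zq("nonce", sk, m)            # deterministic nonce (assumed derivation)
    R = pow(G, r, P)
    s = (r + h_to_zq(R, pk, m) * sk) % Q
    return R, s


def verify(m, sig, pk):
    R, s = sig
    # Accept iff G^s == R * PK^H(R, PK, m).
    return pow(G, s, P) == R * pow(pk, h_to_zq(R, pk, m), P) % P


if __name__ == "__main__":
    sk, pk = keygen()
    print(verify("root_i", sign("root_i", sk), pk))
```

In the proposed system, the IP would play the role of the signer and the signed message would be the hash value root_i; the group size here is far too small for any real security.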

D. ElGamal ENCRYPTION
We briefly describe the ElGamal encryption scheme as follows. Let G = (g, q) be a group (over an Edwards curve) of order q, where g is a generator of G. The ElGamal encryption consists of the following algorithms.
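• KeyGen(λ, G) → {SK, PK}, where λ is a security parameter. The secret key SK is randomly chosen in $\mathbb{Z}_q$, and $PK \leftarrow g^{SK} \in G$ is the corresponding public key.
• ENC(m, PK; r) → c, where m ∈ G is a message and r is a random number in $\mathbb{Z}_q$. The ciphertext $c = \{c_1, c_2\}$ is generated as follows:
$$c_1 \leftarrow g^{r}, \qquad c_2 \leftarrow m \cdot PK^{r}.$$
• DEC(c, SK) → m. The decryption process is as follows.
$$m \leftarrow c_2 \cdot \left(c_1^{SK}\right)^{-1}.$$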
The ElGamal encryption is known to be correct and secure against chosen-plaintext attacks, assuming the decisional Diffie-Hellman problem is hard in G.
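A minimal Python sketch of ElGamal encryption over a group of prime order is given below. It assumes a toy multiplicative group instead of an Edwards curve, and the lowercase function names are illustrative only.

```python
import secrets

# Toy group (assumption): subgroup of prime order Q in Z_P^*, where P = 2Q + 1.
P, Q, G = 2039, 1019, 4


def keygen():
    sk = secrets.randbelow(Q - 1) + 1      # SK in Z_Q, nonzero
    return sk, pow(G, sk, P)               # PK = G^SK


def enc(m, pk, r):
    """Encrypt a group element m with randomness r: (c1, c2) = (G^r, m * PK^r)."""
    return pow(G, r, P), m * pow(pk, r, P) % P


def dec(c, sk):
    """Recover m = c2 * (c1^SK)^(-1)."""
    c1, c2 = c
    return c2 * pow(pow(c1, sk, P), -1, P) % P   # modular inverse needs Python 3.8+


if __name__ == "__main__":
    sk, pk = keygen()
    m = pow(G, 42, P)                      # messages must be elements of the group
    r = secrets.randbelow(Q)
    print(dec(enc(m, pk, r), sk) == m)
```

The message must be a group element, which is why the system first maps the PIA to a curve point before sharing and encrypting it.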

E. HASH FUNCTION
A cryptographic hash function H should satisfy the following two properties.
• Onewayness. It is computationally infeasible to compute a preimage from a given hash value.
• Collision-resistance. Finding two distinct preimages mapped into the same hash value is computationally infeasible.

F. COMMITMENT
A commitment scheme allows a committer to commit a value to keep it hidden from others and reveal the committed value later. A commitment scheme for our system is a simple and well-known commitment scheme based on a hash function. Suppose that a committer A wants to commit a value x to B. The hash-based commitment scheme can be represented as follows.
• A computes the commitment $C \leftarrow H(x, q)$ and sends C to B, where x is the committed value, and q is chosen randomly from $\{0, 1\}^{l}$ for some integer l.
• A reveals his commitment by sending {x, q} to B. B then verifies A's commitment by checking whether C = H(x, q).
The above hash-based commitment scheme satisfies the following two properties.
• Hiding. The commitment C should give no information about the committed value x.
• Binding. The committer A cannot change the committed value x once the commitment C is sent to B.
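The commit and reveal steps can be sketched in Python as follows. The sketch assumes SHA-256 as the hash function H and a 32-byte random value q; the length-prefixed encoding of (x, q) and the function names are illustrative choices, not taken from the paper.

```python
import hashlib
import secrets


def H(x: bytes, q: bytes) -> bytes:
    # Length-prefix x so that the pair (x, q) has an unambiguous encoding.
    return hashlib.sha256(len(x).to_bytes(8, "big") + x + q).digest()


def commit(x: bytes, l_bytes: int = 32):
    """Committer A: pick random q and output the commitment C = H(x, q)."""
    q = secrets.token_bytes(l_bytes)
    return H(x, q), q


def verify_opening(C: bytes, x: bytes, q: bytes) -> bool:
    """Verifier B: check the revealed pair {x, q} against C."""
    return secrets.compare_digest(C, H(x, q))


if __name__ == "__main__":
    C, q = commit(b"date_of_birth=2000-01-01")
    print(verify_opening(C, b"date_of_birth=2000-01-01", q))
```

The random q is what provides hiding for low-entropy attributes such as gender or age, since without it an observer could simply hash every candidate value.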

G. BLOCKCHAIN
Blockchain is a decentralized data structure implemented in a peer-to-peer network. Instead of storing data in one centralized database, the system keeps the replicated data in numerous peers (nodes). Each time new data are added, all nodes in the system update their replicated data. In the blockchain, data are sent and received in transactions. Blockchain has the following three main properties.
• Immutability: Once transactions are added, no single entity can alter these transactions.
• Decentralization: Data are stored in numerous nodes instead of a single node.
• Consensus: Transactions must be verified by more than half the number of nodes to be added to the blockchain.
Basically, there are three types of blockchains as follows.
• Public blockchain: users are free to join and quit this system. Transactions in this system are transparent and visible to all nodes.
• Private blockchain: this system is managed by an entity. Users must have the manager's permission to participate. Transactions in this system are visible only to its members.
• Hybrid blockchain: this system combines public and private blockchains. Instead of one entity managing the system, all members manage the system and decide who can participate in the blockchain. They also determine which transactions are public or private.
Two widely used consensus algorithms are Proof of Work (PoW) [16] and Proof of Stake (PoS) [17]. PoW allows all nodes in the blockchain to generate and verify transactions. It reduces the risk of blocks generated by malicious nodes by asking block miners to solve a hash puzzle [18]. On the other hand, PoS only allows some specific nodes (validators) to create and verify transactions. PoS reduces the risk of malicious validators by asking them to deposit an amount of money as a stake before joining the validating process. Malicious validators may lose their stake if they are detected based on the system policy.
The consensus algorithm is crucial in the blockchain. It decides which block will be added to the main chain among several candidate blocks. A poorly designed consensus algorithm results in an insecure blockchain that is vulnerable to 51% attacks [19] and censorship attacks [20]. A consensus approach used in many blockchain networks is Byzantine Fault Tolerance (BFT) [21]. To date, many BFT variants have been proposed to improve and optimize its performance [22].
One of the most-used applications of the blockchain is smart contracts [23]. A smart contract (SC) is a program stored in the blockchain that runs when the predetermined conditions are met. Because the SC's programming language is Turing-complete in most blockchain networks, the SC can be used to implement various logical operations and applications. The proposed system uses an SC as the zk-SNARK's verifier and to control all the system's processes.

A. MAIN ENTITIES
There are six main entities in the proposed system: an identity provider (IP), validators, users, pseudonyms (PSs), an SC, and service providers (SPs). Fig. 2 briefly shows the relations of these entities in the proposed system. We describe these entities as follows.
IP. This entity can be a centralized or decentralized organization that issues user certificates. The IP is responsible for tracing the malicious users' PIAs.
SPs. These entities provide services for users. They can require users to have some specific IAs to use their services.
Validators. These entities are responsible for collecting, verifying, and wrapping transactions into a block and adding blocks to the blockchain. They are also responsible for keeping users' SSS shares and the PIA-opening process.
Users. The IP validates these entities and their IAs. Users use their PSs to manage their IAs, communicate anonymously in the proposed system, and use the services provided by the SPs.
PSs. They are public blockchain accounts (addresses) of users, validators, SPs, and the IP. These entities can use their PSs to sign and send transactions.
SC. The SC in the proposed system acts as a zk-SNARK verifier and helps store and query users' committed IAs.
There are two communication channels in the proposed system: the on-chain channel and the off-chain channel. We describe them as follows.
On-chain channel. This is the public blockchain network, where all messages are sent and received under the transaction form. In the proposed system, the public blockchain uses the PoS consensus algorithm. Validators collect transactions, verify them and add them to the blockchain. Communication on this channel is transparent and secure against several types of attacks.
Off-chain channel. This channel refers to all channels unrelated to the public blockchain network. In our construction, the off-chain channel covers the communications through which users initially register with the IP and later access the SPs.

B. ASSUMPTIONS
We assume the following statements to ensure that this paper is within our scope and focuses on our contributions.
• The off-chain channel is secure to the extent that the channel achieves its goal. For instance, attackers cannot gain information about users from communications between users and the IP.
• The public keys PK IP (for the IP), PK V n (for a set of validators V n ), and the zk-SNARK's proving key PK are public and shared in advance by all entities in the system.
• All entities use the same elliptic curve for zk-SNARK and the same group G = (g, q) based on a certain Edwards curve for the SSS, the ElGamal encryption, and the EdDSA signature schemes. These parameters are shared in advance.
• When working with the (k, n)-threshold SSS, it is assumed that at least k validators are honest and $k > n/2$.
• The SPs are audited entities. Their public keys and addresses on the blockchain are verified.

C. CIRCUITS AND SMART CONTRACT
This section describes arithmetic circuits, the SC, and the proposed system. We present all notations used in the proposed system in Table 2 for ease of reading.

1) ARITHMETIC CIRCUIT
The arithmetic circuit is the core of zk-SNARK. It describes the relationship between the prover's witnesses w and the verifier's pre-defined constraint x (as the arithmetic circuit's public inputs). We design two arithmetic circuits C and D. Circuit C is responsible for anonymous authentication, SSS distribution, and selective disclosure processes. The previous work [24] designed an anonymous authentication technique that supports a fixed number of users in their arithmetic circuit. In our proposed system, however, we enhance their technique by using EdDSA to support an unlimited number of users in a single arithmetic circuit. The details for the circuit C are described in Algorithm 1. Algorithm 2 represents circuit D for verifying the correctness of the ElGamal encryption, which is necessary for the PIA-opening process.
In an arithmetic circuit, the private inputs correspond to zk-SNARK's witnesses w. They are hidden, and the proof string π exposes no information about them. The set of public inputs x is visible to everyone. The verifier uses x and π to verify the relations between w and x declared in the arithmetic circuit.

Algorithm 1 Arithmetic Circuit C (private inputs $w_C$; public inputs $x_C$)
In circuit C, $m_i$ and r are two random numbers, $P_{k-1}$ is a set of k − 1 random numbers for the (k, n)-threshold SSS (line 6, Algorithm 1), and $Q_t$ is another set of t random numbers for computing the commitment of $X_t$ (line 8, Algorithm 1).
PS i , as a prover, needs to convince the verifier that the following statements are true.
• PS i is the owner of a certificate cert i provided by the IP. This statement is proven without revealing the identity of U i .
Because root i is a hash value, it is fixed-sized, and any changes in X t or h i will result in a different root i . Line 3 verifies whether cert i is the EdDSA signature of root i . PS i needs to pass this verification to prove that he is the owner of cert i .
• $m_i$ has never been used before. This statement aims to enhance the above anonymous authentication by removing the threat of collusion attacks where $U_i$ shares $m_i$ with other users. Each $\pi_C$ will be identified by $\hat{h}_i \leftarrow H(m_i + 1)$. The verifier records the list of $\hat{h}_i$. Any $\pi_C$ with a previously used $\hat{h}_i$ will be rejected. Line 4 implements this statement.
• U i correctly generates S n from his PIA and then encrypts S n using PK V n . This statement allows U i to convince the verifier that he distributes the correct SSS shares in ciphertext generated from his PIA. These ciphertexts can be decrypted using SK V n . Because the ElGamal encryption works with an Edwards curve, line 5 maps the PIA to an elliptic curve point e i . Line 6 uses SharesGen to generate S n from P k−1 and e i . Line 7 encrypts s i using the ElGamal encryption algorithm, ENC, and shows that the result is identical to c i , which will be issued to a validator v i for i = 1, . . . , n.
• The prover commits correct IAs which the IP verified. PS i must convince the verifier that Y t is the set of correct commitments generated from X t . Later, the public Y t values are used to selectively disclose some of IAs related to U i . Line 8 computes the commitment of {x i , q i } and compares it with y i for i = 1, . . . , t, which will be stored on the blockchain.
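The bookkeeping behind the reuse check can be sketched in Python as follows. This models only the verifier's record of used tags, assuming SHA-256 as H and an integer m_i; in the actual system, the link between the tag and the certified m_i is enforced inside circuit C and is not reproduced here. The class and function names are hypothetical.

```python
import hashlib


def H(n: int) -> str:
    """Stand-in for the hash function H, applied to an integer."""
    return hashlib.sha256(str(n).encode()).hexdigest()


class UsedTagRegistry:
    """Verifier-side record of tags h_hat_i that have already been accepted."""

    def __init__(self):
        self._used = set()

    def accept(self, h_hat: str) -> bool:
        if h_hat in self._used:
            return False          # proof reuses a tag: reject
        self._used.add(h_hat)
        return True


m_i = 918273645               # user's secret random number (example value)
h_i = H(m_i)                  # sent to the IP at registration
h_hat_i = H(m_i + 1)          # public tag attached to the proof pi_C

if __name__ == "__main__":
    registry = UsedTagRegistry()
    print(registry.accept(h_hat_i), registry.accept(h_hat_i))
```

Because the tag is derived from m_i + 1 rather than m_i, the public tag differs from the value h_i that the IP saw at registration, so the IP cannot link the two by simple comparison.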

b: CIRCUIT D
Because reconstructing the PIA requires $S_k$ from $V_k$ using the (k, n)-threshold SSS, there must be a way to prove that $v_i$ honestly sent its share $s_i$ without disclosing $SK_{v_i}$. We design circuit D to verify that validators honestly submit their SSS shares. As in Algorithm 2, circuit D requires $v_i$ to enter $SK_{v_i}$ as the private input ($w_D = SK_{v_i}$) and takes public inputs $x_D$ that include $c_i$ together with the values $s_i$ and $PK_{v_i}$ referred to in the statement below. Circuit D describes the following statement: $v_i$ owns the $SK_{v_i}$ that corresponds to $PK_{v_i}$ (line 1), and $s_i$ is derived by decrypting $c_i$ using $SK_{v_i}$ (line 2). If the above statements are true, then $b_D = 1$; otherwise, $b_D = 0$.

2) SMART CONTRACT
We design an SC to implement zk-SNARK's function Verify of circuits C and D and other functions for the proposed system. Algorithm 3 describes the details of the SC.

Algorithm 3 Smart Contract
...
3: Require $PK_{V_n}$ to be correct.
...
14: Require $c_i$ to be in Table 1.
15: Add $s_i$ to Table 3.
16: Function queryPIA($PS_i$):
17: Require $PS_i$ to be in Table 3.
18: Return $S_k$.
19: Function queryMIA($PS_i$, $y_i$):
20: Require $PS_i$ to be in Table 2.
21: If $PS_i$ has $y_i$: return true.
22: Else: return false.

As shown in Algorithm 3, the SC has three tables described in Fig. 3 and Fig. 8. The SC's functions are as follows.
• authentication($\pi_C$, $x_C$): this function takes as inputs $\pi_C$ and $x_C$. It requires that (1) $b_C = 1$, (2) $PK_{V_n}$ are correct, (3) the zk-SNARK's function Verify outputs 1, and (4) $\hat{h}_i$ has not been used before. If all these conditions hold, this function adds {$PS_i$, $\hat{h}_i$, $C_n$} to Table 1 and {$PS_i$, $Y_t$} to Table 2.
• queryPS($PS_i$): this function takes as input a PS and checks if it is authenticated. If the queried PS is in Table 1, it returns true; otherwise, it returns false.
• openPIA(): a validator calls this function to submit its share $s_i$ for a target PS. If the zk-SNARK's function Verify accepts the submitted proof for circuit D and $c_i$ is in Table 1, this function appends $s_i$ to Table 3 as the associated share with the target PS.
• queryPIA($PS_i$): this function takes as input a PS and returns the corresponding $S_k$ if the PS is in Table 3. The IP can call this function to reconstruct the PIA of the target PS.
• queryMIA($PS_i$, $y_i$): this function receives a PS and $y_i$. If the PS is in Table 2 and associated with $y_i$, it returns true; otherwise, it returns false.
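The table logic of the SC functions can be modeled off-chain with a small Python class. This is a sketch under several assumptions: the zk-SNARK verifiers are passed in as stub callables, the caller's PS is an explicit argument rather than the transaction sender, and the dictionary layout of `x_c` and `x_d` is hypothetical because the exact public-input lists are not reproduced in this text.

```python
class IdentityContract:
    """In-memory model of the SC's three tables (not an on-chain implementation)."""

    def __init__(self, verify_c, verify_d, pk_vn):
        self.verify_c = verify_c      # stub for Verify on circuit C
        self.verify_d = verify_d      # stub for Verify on circuit D
        self.pk_vn = pk_vn            # expected validator public keys
        self.table1 = {}              # PS -> (h_hat, C_n)
        self.table2 = {}              # PS -> set of commitments Y_t
        self.table3 = {}              # PS -> list of opened shares
        self.used_tags = set()

    def authentication(self, ps, proof_c, x_c):
        ok = (x_c["pk_vn"] == self.pk_vn
              and self.verify_c(proof_c, x_c)
              and x_c["h_hat"] not in self.used_tags)
        if not ok:
            return False
        self.used_tags.add(x_c["h_hat"])
        self.table1[ps] = (x_c["h_hat"], list(x_c["c_n"]))
        self.table2[ps] = set(x_c["y_t"])
        return True

    def query_ps(self, ps):
        return ps in self.table1

    def open_pia(self, ps, proof_d, x_d):
        if ps not in self.table1:
            return False
        _, c_n = self.table1[ps]
        if x_d["c_i"] not in c_n or not self.verify_d(proof_d, x_d):
            return False
        shares = self.table3.setdefault(ps, [])
        if x_d["s_i"] not in shares:
            shares.append(x_d["s_i"])
        return True

    def query_pia(self, ps):
        if ps not in self.table3:
            raise KeyError("no shares recorded for this PS")
        return list(self.table3[ps])

    def query_mia(self, ps, y_i):
        # Simplification: returns False (instead of reverting) for an unknown PS.
        return ps in self.table2 and y_i in self.table2[ps]
```

A deployed contract would replace the stub verifiers with pairing checks against the verification keys of circuits C and D, and would revert rather than return false where Algorithm 3 uses Require.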

D. SYSTEM DESCRIPTION
Step 0. System initialization. The IP initializes the system as follows.
In the off-chain channel:
• Create the circuits C and D described in Algorithms 1 and 2.
In the on-chain channel:
• Construct and deploy the SC described in Algorithm 3 to the public blockchain.
Step 1. User registration. In this step, the IP verifies that $U_i$'s IAs are valid and that the PIA identifies $U_i$. Afterward, the IP issues $U_i$ a certificate $cert_i$. Fig. 4 describes the details of this step.
In the off-chain channel:
• $U_i$ chooses a random number $m_i$ and computes $h_i \leftarrow H(m_i)$.
• U i sends {X t , h i } to the IP.
• The IP verifies {X t , h i } to ensure that X t is valid with respect to U i and h i has never been used before. Afterward, the IP computes root i and cert i .
• The IP computes $e_i \leftarrow g^{PIA} \in G$.
• The IP stores $e_i$ and the PIA in its data storage.
• The IP sends $cert_i$ to $U_i$.
Step 2. Anonymous authentication, distribution of SSS shares, and IAs commitment. In this step, $PS_i$ authenticates himself to the SC, distributes $C_n$ to $V_n$, and stores $Y_t$ in the blockchain storage. Fig. 5 describes the overview of this step.
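In the off-chain channel:
• $U_i$ chooses a random number r for the ElGamal encryption and two sets of random numbers $P_{k-1} = \{p_1, p_2, \dots, p_{k-1}\}$ for the (k, n)-threshold SSS and $Q_t = \{q_1, q_2, \dots, q_t\}$ for the commitment of $X_t$.
• $U_i$ computes $\hat{h}_i$, $S_n$, $C_n$, and $Y_t$ such that:
$$\hat{h}_i \leftarrow H(m_i + 1), \qquad S_n \leftarrow \mathrm{SharesGen}(G, P_{k-1}, e_i),$$
$$c_i \leftarrow \mathrm{ENC}(s_i, PK_{v_i}; r) \ \text{for } i = 1, \dots, n, \qquad y_i \leftarrow H(x_i, q_i) \ \text{for } i = 1, \dots, t.$$
• $U_i$ uses the circuit C and zk-SNARK's function Proof to generate $\pi_C$.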
In the on-chain channel: • PS i sends π C and x C to the SC by calling the SC's function authentication(π C , x C ).
• After verifying π C and x C , the SC adds {PS i ,ĥ i , C n } to Table 1 and {PS i , Y t } to Table 2 . All the PSs in Table 1 are meant to be authorized users. They can selectively disclose their committed IAs stored in Table 2 .
• $v_i$ takes $c_i$ from $x_C$ and decrypts $c_i$ to get $s_i$ in the off-chain channel as $s_i \leftarrow \mathrm{DEC}(c_i, SK_{v_i})$.
Step 3. Selective disclosure. If the SP requires an additional MIA from $PS_i$, $PS_i$ can disclose the MIA $x_i$ from $X_t$. Fig. 6 describes the overview of this step.
• In the off-chain channel, $PS_i$ reveals $\{x_i, q_i\}$ to the SP.
• The SP computes $y_i \leftarrow H(x_i, q_i)$.
• In the on-chain channel, the SP calls the SC's function queryMIA($PS_i$, $y_i$). If $PS_i$ has the commitment $y_i$, the function returns true, and the SP can verify that $PS_i$ has $x_i$.
Step 4. PIA-opening. In case $PS_i$ is malicious and violates the system's policy, the IP can ask $v_i$ to send $s_i$ and reconstruct $PS_i$'s $e_i$ to find the PIA associated with $PS_i$. Fig. 7 describes the overview of this step, and Fig. 8 describes the structure of the PIA-opening process.
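In the on-chain channel:
• Each requested validator $v_i$ uses the circuit D and zk-SNARK's function Proof to generate $\pi_D$, and then calls the SC's function openPIA(). The SC records $s_i$ for the target PS in Table 3 if the zk-SNARK's function Verify($\pi_D$, $x_D$) returns true.
• After openPIA() is called k times by $V_k$, the IP can call the function queryPIA($PS_i$) to get $S_k$.
In the off-chain channel:
• The IP reconstructs $e_i$ using $S_k$ and the SSS Reconstruct algorithm as $e_i \leftarrow \mathrm{Reconstruct}(S_k)$.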
• The IP finds the PIA in its data storage associated with e i .
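The data flow of the PIA-opening step can be traced end to end with the following Python sketch. It is a simplified model under stated assumptions: a toy multiplicative group replaces the Edwards curve, only the second coordinate of each share is encrypted while the index travels in the clear, the zk-SNARK proofs and the SC are omitted, and the function `open_pia_demo` and its parameters are hypothetical.

```python
import secrets

# Toy group (assumption): subgroup of prime order Q in Z_P^*, where P = 2Q + 1.
P, Q, G = 2039, 1019, 4


def shares_gen(w, k, n):
    coeffs = [w % Q] + [secrets.randbelow(Q) for _ in range(k - 1)]
    f = lambda x: sum(c * pow(x, j, Q) for j, c in enumerate(coeffs)) % Q
    return [(i, pow(G, f(i), P)) for i in range(1, n + 1)]


def reconstruct(shares):
    s0 = 1
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num, den = num * xj % Q, den * (xj - xi) % Q
        s0 = s0 * pow(yi, num * pow(den, -1, Q) % Q, P) % P
    return s0


def enc(m, pk, r):
    return pow(G, r, P), m * pow(pk, r, P) % P


def dec(c, sk):
    return c[1] * pow(pow(c[0], sk, P), -1, P) % P


def open_pia_demo(pia, k=3, n=5):
    # Step 1: the IP stores e_i = G^PIA next to the PIA.
    e_i = pow(G, pia, P)
    ip_storage = {e_i: pia}
    # Step 2: the user shares e_i and encrypts share i for validator v_i.
    validator_sks = [secrets.randbelow(Q - 1) + 1 for _ in range(n)]
    r = secrets.randbelow(Q)
    c_n = [(x, enc(y, pow(G, sk, P), r))
           for (x, y), sk in zip(shares_gen(pia, k, n), validator_sks)]
    # Step 4: k validators decrypt their shares; the IP reconstructs and looks up.
    s_k = [(x, dec(c, sk)) for (x, c), sk in list(zip(c_n, validator_sks))[:k]]
    return ip_storage.get(reconstruct(s_k))


if __name__ == "__main__":
    print(open_pia_demo(777))
```

The lookup works because the IP keeps the pair (e_i, PIA) from registration; the reconstructed point alone would otherwise require solving a discrete logarithm to recover the PIA.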

V. SECURITY
This section defines the security requirements for a PPIdM system and proves that the proposed system satisfies them.

A. SECURITY REQUIREMENTS
Referring to security models of group signatures [25], we define security requirements for a PPIdM system as follows.
Unforgeability. Unforgeability captures that the system should prevent attacker A from (1) changing the registered IAs, (2) passing the authentication process by using incorrect witnesses, or (3) reusing another user's witnesses.
Anonymity. Recall that U i 's identity is associated with his m i and PIA. Anonymity means that the proposed system should prevent A from finding information about m i and the PIA associated with U i .
Traceability. Traceability implies that A cannot (1) stop the PIA-opening process once triggered or (2) cheat the PIA-opening process to trace back another user instead of the target user.

B. SECURITY ANALYSIS
This section analyzes the security of the proposed system and proves that it satisfies the security requirements mentioned above.

1) UNFORGEABILITY
Theorem 1. If the hash function H is collision-resistant, EdDSA is unforgeable against chosen-message attacks, zk-SNARK satisfies the soundness property, the public blockchain is immutable, and the commitment scheme is binding, then the proposed system provides unforgeability.
Proof. We use the game-based proof strategy to create a series of games and prove that the real game is indistinguishable from the final game, where the probability that adversary A succeeds in breaking unforgeability becomes negligible.
• Game 0 is the real game where A tries to join the system without registering his IAs to the IP or masquerading as other users.
• Game 1 is identical to Game 0 , except that no collision on the hash function occurs. Because H is collision-resistant, Game 1 is indistinguishable from Game 0 .
• Game 2 is the same as Game 1 , except A can generate an EdDSA signature without SK IP . Because EdDSA is unforgeable against chosen-message attacks, the probability of forging an EdDSA signature is negligible. This means A cannot use an invalid EdDSA signature to convince the SC that the IP has verified him. Therefore, Game 2 is indistinguishable from Game 1 .
• Game 3 is identical to Game 2 , except that A succeeds at generating π C without satisfying lines 3, 4, 7, and 8 in Algorithm 1. Because zk-SNARK satisfies the soundness property, the probability of generating π C from invalid witnesses is negligible. Therefore, Game 3 is indistinguishable from Game 2 .
• Game 4 is identical to Game 3 , except that A succeeds at modifying the blockchain data. Because the public blockchain is immutable, A cannot change his registered IAs stored in the blockchain. Therefore, Game 4 is indistinguishable from Game 3 .
• Game F is identical to Game 4 , except that A succeeds at changing his committed value when verifying his commitment. Because the commitment scheme is binding, A cannot change his committed IAs when revealing them to the SP. Therefore, Game F is indistinguishable from Game 4 .
We can observe that, in Game F , A cannot use invalid witnesses or collude with other users to reuse their witnesses. Also, once A's IAs are on the blockchain, A cannot change them because the blockchain is immutable. Finally, the binding property of the commitment scheme does not allow A to change his committed IAs in the commitment-revealing process. Hence, under the assumptions of Theorem 1, the proposed system provides the unforgeability property.
[Figure caption, partial: ... Table 1, associated with the target PS. (2) If it is false, the process stops. (3) If it is true, s i will be associated with the target PS in Table 3. (4) If the target PS in Table 3 has S k , the IP will query for it and reconstruct e i from S k . (5) Afterward, it finds the corresponding PIA in its data storage.]

2) ANONYMITY
Theorem 2. If the hash function H is one-way, the public blockchain is anonymous, zk-SNARK satisfies the zero-knowledge property, the ElGamal encryption is secure against chosen-plaintext attacks, the commitment scheme is hiding, and the (k, n)-threshold SSS satisfies the correctness property, then the proposed system provides anonymity.
Proof. We create a series of games from Game 0 to Game F as follows.
• Game 0 is a real game where A tries to find the PIA associated with PS i .
• Game 1 is identical to Game 0 , except that A succeeds at finding the preimage of ĥ i ← H(m i + 1), which is public on the blockchain. Because H satisfies the one-wayness property, it is infeasible to discover the preimage of a given hash value. Hence, Game 1 is indistinguishable from Game 0 .
• Game 2 is identical to Game 1 , except that A succeeds at finding the owner of a PS by analyzing the PS on the public blockchain. Because the public blockchain is anonymous, generating a PS does not require information about the owner's identity. Therefore, A cannot gain information about the PS's owner, and thus Game 2 is indistinguishable from Game 1 .
• Game 3 is the same as Game 2 , except π C is simulated without the correct witnesses. Because zk-SNARK is zero-knowledge, π C only shows that it is associated with a PS without revealing information about its witnesses. Therefore, Game 3 is indistinguishable from Game 2 .
• Game 4 is identical to Game 3 , except that A succeeds at decrypting C k = {c 1 , . . . , c k } without the corresponding secret keys. However, because the ElGamal encryption is secure against chosen-plaintext attacks, it is infeasible for A to decrypt C n without the corresponding SK V n . Therefore, Game 4 is indistinguishable from Game 3 .
• Game 5 is identical to Game 4 , except that A succeeds at finding the committed value from the commitment. Because the commitment scheme is hiding, Game 5 is indistinguishable from Game 4 . In this case, A cannot find the committed x i from commitment y i , which is public in the blockchain and associated with the target PS.
• Game F is identical to Game 5 , except that A succeeds at reconstructing the secret generated by the (k, n)-threshold SSS using fewer than k shares. Because the (k, n)-threshold SSS satisfies the correctness property, A cannot reconstruct the secret e i ← g PIA using S l , where l < k. Therefore, Game F is indistinguishable from Game 5 .
We can see that Game F is the final game where A cannot gain information about the PS's PIA or m i even when A can obtain the PS's π C , ĥ i , C n , Y t , and S l , where l < k. Therefore, the proposed system provides the anonymity property.
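The sequence of games above can be read as a standard hybrid argument. As a hedged sketch, write S j for the event that A succeeds in Game j (with Game 6 standing for Game F) and ε j for the gap introduced by hop j; both symbols are our own notation, introduced only for illustration. The triangle inequality then bounds A's success probability in the real game:

```latex
% S_j: event that A succeeds in Game j (Game 6 stands for Game F).
% \epsilon_j: gap of hop j, assumed negligible under the corresponding
% assumption of Theorem 2 (illustrative notation, not from the paper).
\[
\bigl|\Pr[S_0] - \Pr[S_6]\bigr|
  \;\le\; \sum_{j=0}^{5} \bigl|\Pr[S_j] - \Pr[S_{j+1}]\bigr|
  \;\le\; \sum_{j=0}^{5} \epsilon_j ,
\qquad\text{hence}\qquad
\Pr[S_0] \;\le\; \Pr[S_6] + \sum_{j=0}^{5} \epsilon_j .
\]
```

Because the sum has a constant number of terms, it stays negligible whenever each ε j is negligible.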

3) TRACEABILITY
Theorem 3. If the system satisfies the unforgeability property, zk-SNARK is sound, the ElGamal encryption is correct, the public blockchain is immutable, and the (k, n)-threshold SSS is correct, the proposed system provides traceability.
Proof. We also create a series of games from Game 0 to Game F .
• Game 0 is a real game where A tries to prevent the IP and validators from finding the PS's PIA.
• Game 1 is identical to Game 0 , except that A succeeds at masquerading as another user or joining the system without registering A's IAs to the IP. Because the proposed system satisfies the unforgeability property, Game 1 is indistinguishable from Game 0 .
• Game 2 is the same as Game 1 , except that A succeeds at generating π C without correct witnesses. Because zk-SNARK satisfies the soundness property, Game 2 is indistinguishable from Game 1 . This means that A cannot omit C n when generating π C or calling the SC's function authentication(π C , x C ) in Step 2.
• Game 3 is identical to Game 2 , except that A succeeds at breaking the correctness property of the ElGamal encryption. Because the ElGamal encryption is correct, DEC(c i , SK v i ) always returns s i if c i is generated using ENC(s i , PK v i ; r). Therefore, Game 3 is indistinguishable from Game 2 .
• Game 4 is identical to Game 3 , except that A succeeds at deleting data on the blockchain. Because blockchain is immutable, Game 4 is indistinguishable from Game 3 . Therefore, A cannot delete its C n or Y t once they are stored in the blockchain.
• Game 5 is identical to Game 4 , except that A succeeds at generating π D without s i and SK v i in terms of validator v i . As before, because zk-SNARK satisfies the soundness property, Game 5 is indistinguishable from Game 4 . A cannot upload an incorrect s i to the SC because A cannot generate a valid π D for an incorrect s i .
• Game F is identical to Game 5 , except A succeeds at breaking the correctness property of the (k, n)-threshold SSS. Because the SSS is correct, Game F is indistinguishable from Game 5 .
We can see that, in Game F , A cannot break the PIA-opening process. As a validator, A cannot send an invalid s i to the SC to break the PIA-opening process. As a user, A cannot masquerade as another user or join the system with invalid witnesses because of the system's unforgeability property. The correctness property of the SSS and the ElGamal encryption ensures that A can neither generate nor distribute invalid SSS shares to validators in the authentication process. Hence, the only way for A to join the system is to honestly use his PIA to generate and distribute SSS shares. In addition, because the SC keeps his C n and Y t in the blockchain storage, A cannot delete them. Hence, A's secret is always reconstructable in Game F , and both requirements (1) and (2) of the traceability property are satisfied. Therefore, the proposed system provides traceability.
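The two correctness properties used in this proof can be checked on a toy instance. The following is a minimal sketch, not the paper's implementation: the prime P, the base G, the integer encoding of the secret, and all function names are assumptions chosen for illustration (the paper shares e i ← g PIA and uses its own group). It splits a secret with a (6, 10)-threshold SSS, encrypts each share under a validator's ElGamal key, decrypts six of them, and reconstructs the secret.

```python
import secrets

# Toy parameters (assumptions for illustration, not the paper's group or field).
P = 2**127 - 1   # Mersenne prime, used as both the SSS field and the ElGamal modulus
G = 3            # fixed base for the toy ElGamal scheme


def split(secret, k, n):
    """Split `secret` into n Shamir shares; any k of them reconstruct it."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(k - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):      # Horner evaluation of the polynomial
            acc = (acc * x + c) % P
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]


def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret


def keygen():
    """Return a toy ElGamal key pair (sk, pk) with pk = G^sk mod P."""
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)


def enc(m, pk):
    """ENC(m, pk; r) with fresh randomness r."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), m * pow(pk, r, P) % P


def dec(c, sk):
    """DEC(c, sk): divide out the shared mask c1^sk."""
    c1, c2 = c
    return c2 * pow(pow(c1, sk, P), -1, P) % P


secret = 123456789                                   # stand-in for the shared secret
shares = split(secret, k=6, n=10)                    # S n
keys = [keygen() for _ in shares]                    # one key pair per validator
ciphertexts = [(x, enc(y, pk)) for (x, y), (_, pk) in zip(shares, keys)]  # C n
opened = [(x, dec(c, sk)) for (x, c), (sk, _) in zip(ciphertexts[:6], keys[:6])]
recovered = reconstruct(opened)
print(recovered == secret)
```

Any other subset of six decrypted shares reconstructs the same value, which is the behavior the traceability argument relies on. The sketch requires Python 3.8 or later for modular inverses via pow(x, -1, P).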

VI. PERFORMANCE EVALUATION
This section shows the simulation results of the proposed system in terms of performance times and transaction costs. We simulate the proposed system according to the off-chain and on-chain tasks that each entity has to perform. Table 3 presents the parameters for the off-chain simulation, and Table 6 shows the parameters for the on-chain simulation.

A. OFF-CHAIN SIMULATION
1) SIMULATION TOOLS FOR ZK-SNARK
In this simulation, we employ ZoKrates [26], a tool that supports implementing zk-SNARK. Essentially, the three algorithms of zk-SNARK described in Section III can be represented by five ZoKrates tasks.
• Compile(C) → P, where C is the arithmetic circuit and P is a set of polynomials. This function uses the quadratic arithmetic program (QAP) to transform the circuit C into polynomials P.
• Setup(P) → (PK , VK ), where PK and VK are a proving key and a verification key, respectively. This function creates a pair of keys (PK , VK ) by implementing zk-SNARK's KeyGen.
• Compute-witness(P, a) → w, where a is a set of inputs to the circuit C (both private and public), and w is the witness.
• Generate-proof(w, PK ) → {π, x}, where π is the proof string and x is the set of public inputs. This function implements zk-SNARK's Proof.
• Verify(π, x) → b, where b is the decision bit returned by running zk-SNARK's Verify with π and x as inputs.
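As a hedged illustration of how the five tasks could be driven in order, the sketch below builds the command lines for the ZoKrates command-line tool without executing them. The mapping from the task names above to CLI subcommands is our assumption, and the helper name zokrates_pipeline, the circuit file name circuit_c.zok, and the input values are hypothetical.

```python
from typing import List, Sequence


def zokrates_pipeline(circuit: str, inputs: Sequence[int]) -> List[List[str]]:
    """Return the argv lists for the five tasks, in order; nothing is executed."""
    return [
        ["zokrates", "compile", "-i", circuit],                     # Compile(C) -> P
        ["zokrates", "setup"],                                      # Setup(P) -> (PK, VK)
        ["zokrates", "compute-witness", "-a", *map(str, inputs)],   # Compute-witness(P, a) -> w
        ["zokrates", "generate-proof"],                             # Generate-proof(w, PK) -> {pi, x}
        ["zokrates", "verify"],                                     # Verify(pi, x) -> b
    ]


# Hypothetical circuit file and inputs, for illustration only.
commands = zokrates_pipeline("circuit_c.zok", [3, 9])
for argv in commands:
    print(" ".join(argv))
```

Each list could be passed to subprocess.run(argv, check=True) on a machine where ZoKrates is installed; the sketch stops at building the commands so that it runs anywhere.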

2) SIMULATION RESULT
We divide the off-chain simulation into two parts. The first part is the simulation of circuit C, in which the anonymous authentication, SSS distribution, and selective disclosure processes are implemented. The second part is the simulation of circuit D, which implements the PIA-opening process. Table 4 presents the time complexity, space complexity, and frequency of the two parts' processes. We summarize the total results in Table 5. In Steps 1 and 2, we create {h i , ĥ i , Y t , S n , C n }, where t = 9 and n = 10 in terms of U i . Afterward, we run the two ZoKrates tasks Compute-witness and Generate-proof to generate π C . The total time complexity of these processes is time U i = 38.04. Because v i only needs to decrypt c i to get s i in Step 2, the time complexity for v i is time v i = 0.032m. Because generating π C requires PK C , the total space for generating π C is 1,536,047,032 + 4,528 = 1,536,051,560 bits, where 4,528 is π C 's size in the JSON format. Other parameters have the same 254-bit size. Notably, S n and C n require 2,540 bits because n = 10.
In Step 3, we generate y i ← H(x i , q i ) to verify x i and q i in terms of the SP. The time complexity for the SP is time SP = 0.01j, where 0.01 is the time for generating y i and j is the number of times the SP verifies an MIA. The size of a single commitment is 254 bits.
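The check performed by the SP in Step 3 can be sketched as a hash-based commitment. This is a minimal illustration under stated assumptions: SHA-256 stands in for the paper's H (whose outputs are 254 bits), the length-prefixed encoding of x i is our own choice, and the function names and the sample MIA value are hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(x: bytes, q: bytes) -> bytes:
    """Compute y = H(x, q); x is length-prefixed so the (x, q) split is unambiguous."""
    return hashlib.sha256(len(x).to_bytes(4, "big") + x + q).digest()


def verify_commitment(y: bytes, x: bytes, q: bytes) -> bool:
    """SP-side check: recompute the commitment from the revealed (x, q) and compare."""
    return hmac.compare_digest(commit(x, q), y)


x = b"gender=F"                  # hypothetical MIA value x_i
q = secrets.token_bytes(32)      # random blinding value q_i
y = commit(x, q)                 # commitment y_i, published on-chain in the paper's design

print(verify_commitment(y, x, q))             # the honest opening is accepted
print(verify_commitment(y, b"gender=M", q))   # a different value is rejected
```

The random q i is what keeps a low-entropy attribute hidden before it is revealed, and collision resistance of the hash is what prevents opening y i to a different value.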

b: CIRCUIT D AND PIA-OPENING PROCESS
In this simulation, we create circuit D according to Algorithm 2. In terms of the IP, we execute the two tasks Compile and Setup in Step 0 to generate PK D and VK D . Afterward, we reconstruct the secret from its SSS shares using function Reconstruct. Because opening a PS's PIA requires the IP to run function Reconstruct once, the time complexity for the IP is time IP = 2.83 + 0.55z, where 2.83 is the time for running Compile and Setup, 0.55 is the time for running function Reconstruct, and z is the number of times the IP implements the PIA-opening process. In terms of v i , we decrypt c i to get s i in Step 2. Afterward, we run Compute-witness and Generate-proof to generate π D and x D as in Step 4. Because each PIA opening also requires v i to send s i once, the time complexity for v i is time v i = 1.58z. The size of PK D is 31,169,120 bits (38.9 Mb) and that of VK D is 27,048 bits (3.4 Mb). Because generating π D requires PK D , the total space for generating π D is 31,169,120 + 4,528 = 31,173,648 bits.

TABLE 4. The time complexity, space complexity, and the frequency of processes in the off-chain simulation, where m is the total number of users, z is the number of times the IP implements the PIA-opening process, and j is the number of times the SP verifies an MIA.

Although the time complexity and space complexity are large, most resources are spent on generating {PK C , VK C }, {PK D , VK D }, π C , and π D . Because these processes are performed once (excluding π D ), the computational burden for the IP and users is acceptable. Generating π D is a computational burden (1.58 seconds and 38.9 Mb) for validators because they must do this process m times, where m is the number of malicious users. However, this burden can be shifted to malicious users, who must pay the PIA-opening fees after their PIA is opened.

B. ON-CHAIN SIMULATION
In this simulation, we evaluate the SC performance by interacting with its functions and showing the transaction cost. Our target is to decrease the transaction cost as much as possible.

1) SIMULATION RESULTS
Under the parameters given in Table 6, we construct the SC according to Algorithm 3, using the Solidity programming language. We deploy the SC to the Ropsten Testnet and interact with the SC using Remix IDE and MetaMask. Each function's interaction is implemented in the form of a transaction. The cost of calling the SC's functions is summarized in Table 7. The details of the simulation are as follows.
The IP is responsible for creating and deploying the SC to the blockchain network. This deployment is implemented once. In addition, when a user violates the system's policy, the IP needs to query the user's SSS shares and reconstruct the user's PIA. Therefore, the cost for the IP in this process is cost IP = 3,977,797 + 32,696d, where 3,977,797 is the SC's deployment cost, 32,696 is the cost of calling function queryPIA(), and d is the number of times the PIA-opening process is implemented.
U i needs to run function authentication() to authenticate its PS i , distribute C n to V n , and add Y t to the blockchain. Because the function is implemented once, the total cost for U i is cost U i = 1,852,354.
v i needs to call function openPIA() to send s i to the SC. The total cost for v i is cost v i = 961,035d, where 961,035 is function openPIA()'s cost and d is the number of times the PIA-opening process is implemented. Because we use the (k, n)-threshold SSS, where k = 6 and n = 10 in this simulation, the total cost for opening a PIA is 6 × 961,035 = 5,766,210.
In terms of the SP, we run the functions queryPS() and queryMIA() to check whether PS i is authenticated by the IP. The cost for the SP is cost SP = 26,008a + 51,941b, where 26,008 is the cost of function queryPS(), a is the number of queries the SP issues to check a PS, 51,941 is the cost of function queryMIA(), and b is the number of queries the SP issues to check a PS's MIA. Putting them all together, Table 8 shows the total cost of our on-chain simulation.
Fig. 9 compares the proposed system's transaction fees with Ethereum's basic and most used transaction (a token-transferring transaction), which costs 21,000 gas.
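The per-entity cost formulas reported above can be collected into a short script. This is only a sketch for re-deriving the arithmetic: the gas figures are those stated in this subsection, while the constant and function names are our own.

```python
# Gas figures reported in the on-chain simulation.
DEPLOY = 3_977_797          # SC deployment
QUERY_PIA = 32_696          # queryPIA()
AUTHENTICATION = 1_852_354  # authentication()
OPEN_PIA = 961_035          # openPIA()
QUERY_PS = 26_008           # SP query for a PS
QUERY_MIA = 51_941          # queryMIA()
TRANSFER = 21_000           # baseline token-transferring transaction


def cost_ip(d: int) -> int:
    """IP: one deployment plus d PIA-opening queries."""
    return DEPLOY + QUERY_PIA * d


def cost_validator(d: int) -> int:
    """A single validator: one openPIA() call per PIA opening."""
    return OPEN_PIA * d


def cost_opening(k: int) -> int:
    """All validators together for one PIA opening with threshold k."""
    return k * OPEN_PIA


def cost_sp(a: int, b: int) -> int:
    """SP: a PS checks and b MIA checks."""
    return QUERY_PS * a + QUERY_MIA * b


print(cost_ip(1))                            # 4010493
print(cost_opening(6))                       # 5766210
print(cost_sp(1, 1))                         # 77949
print(round(AUTHENTICATION / TRANSFER, 1))   # 88.2
```

Under these figures, a single authentication() call costs roughly 88 times the gas of a plain transfer, and one PIA opening with k = 6 costs 5,766,210 gas across the validators.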

VII. LIMITATIONS AND FUTURE WORKS
The first limitation is the computational power required for working with zk-SNARK. Our experiments show that resource-constrained devices (less than 4 GB of RAM) cannot run zk-SNARK for the (k, n)-threshold SSS, especially when n is bigger than 15. The second limitation is that a decreasing number of validators can prevent the (k, n)-threshold SSS scheme from working normally. An increasing number of validators does not affect the (k, n)-threshold SSS with respect to previously joined users. However, if so many validators leave the system that n < k, the (k, n)-threshold SSS, and thus the PIA-opening process, cannot work. The third limitation is that k or more malicious validators can collude to carry out the PIA-opening process off-chain, and such malicious off-chain actions cannot be detected.
Based on the above limitations, future works can follow the direction of designing an efficient system using zk-SNARK-friendly hash functions and operations in the arithmetic circuit. Another direction would be to consider a novel mechanism for detecting off-chain actions of malicious validators using zk-SNARK and blockchain techniques.

VIII. CONCLUSION
This paper combines several cryptographic techniques to introduce a novel PPIdM system based on blockchain. Users' activities and service history are entirely hidden from all external entities. The proposed scheme provides anonymity by allowing users to authenticate themselves using zk-SNARK anonymously. The system's identity traceability utilizes the blockchain's consensus and the SSS algorithm. Selective disclosure is provided by using zk-SNARK and the hash-based commitment scheme. We calculated the performance of the proposed system by measuring the time complexity and space complexity in the off-chain channel and the computational power (gas cost) in the on-chain channel to show that the proposed system is efficient and realistic.