An Externally Auditable Identity Management System Using the Bitcoin Blockchain

The Personal Identity Management System (PIMS) proposed by Augot et al. is based on the approach of reducing the disclosure of personal information as much as possible by means of zero-knowledge proofs and Bitcoin technologies. Even if these technologies are used, completely preventing the disclosure of personal information is practically impossible. This paper improves the PIMS in terms of auditability, such that the system makes on-chain histories traceable and identifies possible leakage sources. The paper also discusses some ideas for reducing the Bitcoin transaction cost, because writing data to the Bitcoin blockchain is expensive.


I. INTRODUCTION
Nowadays, the collection and leakage of personal information have been recognized as a major worldwide privacy issue [1]. These activities are performed not only by companies pursuing profit but also by public institutions and even government agencies [2], [3]. To protect personal information, the majority of recent research papers take the approach of making contracts about the handling of personal data such that individuals can gain maximum control over its usage [4]-[7]. In today's Internet society, however, it seems infeasible for any organization to guarantee nondisclosure of personal data. Therefore, in addition to this contract-based approach, the approach of reducing the disclosure of sensitive information as much as possible should be discussed in parallel. This paper pursues the latter possibility.
When individuals must show their personal information, they should not present unnecessarily detailed information. Let us consider the case where a person buys a ticket with a university student discount. The ticket sales company requires only the fact that the person is a student. However, if the person presents a certificate of student status, unnecessary private information, such as the university name and the field of study, may be revealed at the same time. Generally, by means of a zero-knowledge proof, it is possible to present only the fact of being a student.
Augot et al. [8], [9] proposed a Personal Identity Management System (PIMS) based on zero-knowledge proofs, in which a user (USR) shows a Service Provider (SP) the existence of her personal data, whose validity has been confirmed by an Identity Provider (IP) through a rigorous enrollment process. Most of the messages exchanged among these three actors are stored on the Bitcoin network, so that the zero-knowledge proofs proceed in a stable, tamper-proof, and publicly visible fashion. Their work, in particular, focuses on quick and efficient update and revocation of personal data.
Manuscript received June 13, 2018; revised August 21, 2018.
Even if zero-knowledge proofs are used, completely preventing the disclosure of personal information is practically impossible. If personal data leakage occurs, external auditors (AUDs) should investigate the incident. In this paper, we improve the PIMS in terms of auditability, such that the system makes on-chain histories traceable and identifies possible leakage sources. We also discuss some ideas for reducing the Bitcoin transaction cost, because writing data to the Bitcoin blockchain is expensive.
To clarify the scope of our paper, let us categorize personal data from the viewpoint of how they are disclosed and how they are used. Personal data are disclosed either directly by users, for example, through filling out registration forms, or by analyzing collected user activities such as browsing histories [10]. The former happens between a USR and an IP, and the latter between a USR and an SP. This paper focuses on the former. Meanwhile, personal data are used either just for identification or for providing services. This paper considers the former case because we discuss the model in which USRs do not reveal their data to SPs (by means of zero-knowledge proofs).
The remainder of the paper is organized as follows. In Section II, we overview the related work. In Section III, we discuss the technical background related to our proposal and briefly describe the work by Augot et al. In Sections IV and V, we detail our proposal. In Section VI, we describe some applications of the PIMS, and Section VII concludes the paper.

II. RELATED WORK
In 2016, the European Union (EU) adopted a new General Data Protection Regulation (GDPR) that imposes new obligations on service providers, reinforcing the protection of personal data [1], [6], [11]. Since then, many studies have presented a variety of ideas, many of which rely on blockchain technologies because of their decentralized, immutable, and publicly accessible nature. This section mainly focuses on recent identity management approaches that leverage blockchain technologies. For a survey on this topic, see [12].
Zyskind et al. [4] presented a personal data management system using the blockchain technology, which is regarded as an access control moderator. Users are made aware of the data collected about them by service providers and of how they are used. The Sora identity system in [13] uses key-value stores that are encrypted with a cryptographic key owned by the user, and hashes of the values of personal information are salted and put into a blockchain. Users can share their personal information of their own volition with institutions, such as banks or other companies, and those institutions can in turn cryptographically sign hashes of salted personal information, thus acting as a notary. In [11], Wirth et al. discuss some legal problems related to blockchain technologies under the GDPR that will have to be addressed. In [14], the authors extend Hyperledger Fabric to support private data using on-chain secure Multiparty Computation (MPC) protocols and describe their demo implementation.
All transactions on the Bitcoin blockchain are publicly visible, while, for the privacy protection of the users, Bitcoin addresses and their owners (individuals or groups in the real world) are not linked. The anonymity is further enhanced by mechanisms such as CoinJoin mixing techniques [10], [15]. To make on-chain activities auditable by third parties, some mechanism that makes on-chain activities traceable is needed. Suzuki et al. [7] apply the blockchain technology to client/server systems. The data sent by the client and the reply from the server are recorded in the blockchain, by which the request-reply sequences can be verified by anybody and the complete security of the data is ensured in an auditable manner. In [5], the blockchain is used for reducing the workload of auditors and for helping to minimize fraud and optimize the existing auditing processes. The authors of [6] propose a blockchain-based data usage auditing architecture, which combines hierarchical identity-based cryptographic mechanisms with blockchain infrastructures.

III. WORK BY AUGOT ET AL.
This section briefly introduces the Personal Identity Management System (PIMS) framework proposed by Augot et al. [8], [9] (we mainly focus on [9]). The system has three types of actors: an Identity Provider (IP), a Service Provider (SP), and a user (USR). Let us consider the case where a USR would like to receive service from an SP. First of all, the SP requires the USR to present some information on $X_1, \ldots, X_n$ for identification, where $X_1, \ldots, X_n$ are $n$ attributes of an identity of the USR (e.g., $X_1$ may be the USR's name, $X_2$ the USR's nationality, etc.). The USR registers the $n$ attributes $X_1, \ldots, X_n$ with an IP and answers the SP's request without revealing the $X_j$'s. The zero-knowledge proof scheme in the system is based on Brands' idea [16]. Briefly, let $q$ be a prime number and $G$ a group of order $q$. Group $G$ may be the Koblitz elliptic curve secp256k1, the same $G$ being used for the Bitcoin signature. Let $g_0, g_1, \ldots, g_n \in G$. In the following Definition 1, prover P corresponds to a USR and verifier V to an SP. An IP also calculates $h$ in Definition 1 for verification. $X_0$ is an auxiliary random number for security improvement.
Definition 1: Let $h = \prod_{j=0}^{n} g_j^{X_j}$. The tuple $(X_0, X_1, \ldots, X_n)$ is called a DL-representation (DLREP) of $h$ with respect to $(g_0, g_1, \ldots, g_n)$. To prove knowledge of a DLREP of $h$ to a verifier V, a prover P performs the following protocol steps.
1. P generates $n$ secret, random numbers $\{a_i\}$ and computes $c$ from them using $H$, where $H$ is a one-way hash function.
2. P computes $b_j$, $j = 0, 1, \ldots, n$, and sends them, as well as $c$, to V.
3. The verifier V checks that the verification equation (1) holds.
In the selective disclosure scheme of [16], the DLREP is shown to be useful for proving arbitrary satisfiable Boolean statements about the $X_j$'s. For example, a USR is a citizen of a country AND is under 18 OR over 65. Note that the scheme in Definition 1 relies on the one-way hash function and the hardness of the discrete logarithm problem.
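The message flow of Definition 1 can be illustrated with a small sketch. It is not the authors' implementation: it assumes a standard Fiat-Shamir style instantiation in which P commits to $A = \prod_j g_j^{a_j}$, sets $c = H(h, A)$ and $b_j = a_j - cX_j \bmod q$, and V accepts if $c = H(h, h^c \prod_j g_j^{b_j})$. These expressions, the toy group (a subgroup of order 1019 modulo 2039 instead of secp256k1), and all function names are assumptions made for illustration.

```python
import hashlib
import secrets

# Toy Schnorr group (an assumption for readability; far too small for real use).
P = 2039   # safe prime, P = 2*Q + 1
Q = 1019   # prime order of the subgroup of squares modulo P


def generators(count):
    """Return g_0..g_{count-1}: squares of small integers, each of order Q."""
    return [pow(k, 2, P) for k in range(2, 2 + count)]


def dl_product(gens, exps):
    """Compute prod_j g_j^{e_j} mod P."""
    result = 1
    for g, e in zip(gens, exps):
        result = result * pow(g, e, P) % P
    return result


def challenge(h, commitment):
    """c = H(h, A) as an integer (assumed form of the hash input)."""
    digest = hashlib.sha256(f"{h}|{commitment}".encode()).digest()
    return int.from_bytes(digest, "big")


def prove(gens, xs):
    """Prover P: return (h, {b_j}, c) for the secret values xs = (X_0..X_n)."""
    h = dl_product(gens, xs)
    a = [secrets.randbelow(Q) for _ in xs]      # fresh random a_j per proof
    c = challenge(h, dl_product(gens, a))
    b = [(a_j - c * x_j) % Q for a_j, x_j in zip(a, xs)]
    return h, b, c


def verify(gens, h, b, c):
    """Verifier V: accept iff c == H(h, h^c * prod_j g_j^{b_j})."""
    commitment = pow(h, c, P) * dl_product(gens, b) % P
    return challenge(h, commitment) == c


gens = generators(3)
xs = [secrets.randbelow(Q), 21, 392]   # X_0 random; X_1, X_2 hypothetical attributes
h, b, c = prove(gens, xs)
print(verify(gens, h, b, c))           # True
```

In this sketch only $h$, $\{b_j\}$, and $c$ leave the prover, which mirrors the values that the text places in PUBLISH TX and REQUEST TX.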
The system is built on the Bitcoin blockchain. Let $a_{USR}$, $a_{IP}$, and $a_{SP}$ be Bitcoin addresses of a USR, an IP, and an SP, respectively. They may have different addresses (e.g.,
Fig. 2 shows a sequence for updating the identity of a USR when the update leads to recalculations of $h$ that two providers IP1 and IP2 need to make. This may occur, for example, when the USR graduates from a school (IP1) and moves to a different city (IP2). Using each multisignature in the output of

IV. EXTERNAL AUDIT
In the previous work [8], [9], the registration of user identities is basically an arrangement only between a USR and an IP, and it is not in a form visible to third parties. As a result, it is difficult to inspect whether registration, revocation, updates, and service requests with respect to the registered data proceed adequately. Furthermore, even if a part of the registered information leaks, there is no way to confirm that the leakage actually occurred. We revise the previous work such that auditors (AUDs) are allowed to inspect such activities while minimizing the amount of on-chain data. This paper assumes that USRs and SPs consider IPs trustworthy (e.g., IPs are considered to be like certificate authorities (CAs) in the PKI model); however, their activities must also be externally verifiable.
Table I shows the usage of OP_RETURN fields in three transactions. After the registration, the IP issues transaction PUBLISH TX that contains Audit-ref in the OP_RETURN field as shown in Fig. 3 (in the previous work, an $h$ value obtained in Definition 1 is included), where Audit-ref is used to point to off-chain data. Throughout the paper, each off-chain reference is achieved by writing a URL and the hash value or by using content-addressable storage [18]. From Table I, Audit-ref points to a file that consists of two parts: ANY and AUD. The ANY part includes an $h$ value in Definition 1, and the AUD part has the information necessary for auditing, such as all Bitcoin addresses of the USR (who registered her identities) and the IP (who published the PUBLISH TX for the USR) that have been used on-chain until that time, the public key certificate of the IP issued by a CA, the contract between them, etc. The ANY part is visible to anyone, while the AUD part is encrypted and the decryption key is transferred to the AUD who is expected to inspect their activities. Note that both IPs and AUDs are considered basically reliable, but unnecessary information related to the USR should not be stored in the AUD part.
The AUD also verifies whether the personal information has leaked or not, and indicates the possible leakage sources, which include the IPs who hold the leaked data and the AUDs who were in a position to obtain the data through auditing. Let $X'_k$ be a piece of personal information that the public suspects to be equal to the true information $X_k$. The following procedure checks whether $X'_k = X_k$ using Definition 1.

The USR (or the IP) sends the AUD the values of Definition 1 needed for the check, including $c$, and all Bitcoin addresses of the USR and the IP used on-chain until now. If expression (3) equals $c$ even though $X'_k \neq X_k$, this would indicate that a collision occurs, which contradicts the assumption that $H$ is a one-way hash function. If expression (3) does not equal $c$, $X_k$ is not leaked. If multiple identities $X'_{k_1}, \ldots, X'_{k_m}$ need to be checked, the above procedure is executed one by one for each identity or once using (2). Note that different random numbers $\{a_i\}$ in Definition 1 should be used for each verification.
Journal of Advances in Information Technology Vol. 9, No. 3, August 2018
After the AUD recognizes that some personal information is leaked, the following procedure starts:
1. The AUD makes a revocation certificate, which includes at least the inspection date, the revocation reason (leakage), the information used in the inspection (e.g., $h$, $X'_k$, $k$, $\{b_j\}$, $c$), and all Bitcoin addresses of the USR and the IP. The revocation certificate, together with the AUD's digital signature for the certificate and the public key certificate of the AUD, is sent to the USR (or the IP).
2. The USR (or the IP) stores the received data off-chain. The revocation certificate should be encrypted for privacy protection.
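The off-chain references used in this section (Audit-ref, Reason-ref) can be sketched as follows for the "URL and hash value" variant. The payload layout (a SHA-256 digest followed by the URL), the 80-byte limit assumed for OP_RETURN data, the certificate field names, and the URL are illustrative assumptions rather than details fixed by the paper. Encryption and the AUD's signature are omitted.

```python
import hashlib
import json

OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C
MAX_PAYLOAD = 80  # assumed standard relay limit for OP_RETURN data, in bytes


def make_reference(url, off_chain_bytes):
    """Return the payload SHA-256(file) || URL for an OP_RETURN field."""
    payload = hashlib.sha256(off_chain_bytes).digest() + url.encode("ascii")
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("reference does not fit into one OP_RETURN output")
    return payload


def op_return_script(payload):
    """Serialise an OP_RETURN output script that carries the payload."""
    if len(payload) <= 75:
        push = bytes([len(payload)])                 # direct push opcode
    else:
        push = bytes([OP_PUSHDATA1, len(payload)])   # one-byte length prefix
    return bytes([OP_RETURN]) + push + payload


def check_reference(payload, fetched_bytes):
    """Check a fetched off-chain file against the hash stored on-chain."""
    return payload[:32] == hashlib.sha256(fetched_bytes).digest()


# Hypothetical revocation certificate kept off-chain (field names are illustrative).
certificate = json.dumps({
    "inspection_date": "2018-01-01",
    "reason": "leakage",
    "addresses": ["<USR address>", "<IP address>"],
}, sort_keys=True).encode()

reason_ref = make_reference("https://example.org/r/1", certificate)
script = op_return_script(reason_ref)
print(check_reference(reason_ref, certificate))
```

Because only the digest and the location go on-chain, the amount of on-chain data stays constant regardless of the size of the referenced file.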

A. Multi-Identities
In Fig. 2, the USR answers the SP's request using identities registered with IP1 and IP2. Therefore, transaction REQUEST DOUBLE TX has two multi-signatures MSIG1_2($a_{USR}^{(1)}$, $a_{IP1}$) and MSIG1_2($a_{USR}^{(2)}$, $a_{IP2}$), which are abbreviated for simplicity as MSIG1_2(1) and MSIG1_2(2), respectively. The two multi-signatures can be combined into one [19], [20], denoted as MSIG1_2(MSIG1_2(1), MSIG1_2(2)), which implies that we can expect to lower the cost incurred by the use of the Bitcoin blockchain.
Fig. 3 and Fig. 4 illustrate the details of five transactions and the revised transaction sequence for identity update after making this modification, respectively. As shown in Fig. 3, ... varies. We should also mention that, in comparison with Fig. 2, the sequence in Fig. 4 needs to perform revocation only once.

B. Transaction Modification
Let us see how

C. Cost Estimation
Let us first estimate the sizes of the transactions in detail. Tables II, III, and IV show the latest description of the transaction field sizes derived from [21], [22]. The fastest and cheapest transaction fee is currently 10 satoshis/byte, namely $0.1 \times 10^{-6}$ BTC/byte [15]. According to our proposal,
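The relation between transaction size and fee used in this estimation can be sketched as below. The rate of 10 satoshis/byte is the figure quoted in the text and 1 BTC = $10^8$ satoshis. The transaction sizes are hypothetical placeholders, since the actual sizes follow from Tables II to IV.

```python
from decimal import Decimal

SATOSHI_PER_BTC = 100_000_000
FEE_RATE = 10  # satoshis per byte, the rate quoted in the text


def fee_satoshi(size_bytes, rate=FEE_RATE):
    """Fee in satoshis for a transaction of the given size."""
    return size_bytes * rate


def fee_btc(size_bytes, rate=FEE_RATE):
    """Same fee expressed in BTC, using exact decimal arithmetic."""
    return Decimal(fee_satoshi(size_bytes, rate)) / SATOSHI_PER_BTC


# Hypothetical sizes in bytes, chosen only to show the calculation.
sizes = {"original": 400, "revised": 300}
saving = fee_btc(sizes["original"]) - fee_btc(sizes["revised"])
print(saving)
```

Since the fee is linear in the size, any reduction in the number of addresses or signatures carried by a transaction translates directly into a proportional fee reduction.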

VI. APPLICATION
The Personal Identity Management System (PIMS) in this paper could be used in many organizations, such as governments, companies, schools, and hospitals. Let us consider the case where a company provides its workers with identification certificates. These certificates contain various fields of user-related information, including their names, their status in the company, working records, etc. Individuals and companies can use these identities to certify themselves to various government agencies or other companies. In the PIMS framework, the workers correspond to USRs and the company to an IP. In this scenario, there are two cases in which identity revocation occurs. The first is when a worker (USR) leaves the company. In this case, the company (IP) issues
AUDs may be intelligent software agents that are incorporated into the PIMS. We can monitor the above-mentioned activities through the agents, and if an information leakage event occurs, we can quickly identify the possible leakage points through the agents.

VII. CONCLUSION
In this paper, we have enhanced a recently proposed personal identity management system (PIMS), which has the attractive property of reducing the disclosure of personal information as much as possible based on zero-knowledge proofs and Bitcoin technologies. To make the system more practical, we proposed an auditability mechanism, by which on-chain activities can be monitored and possible leakage sources are presented without greatly increasing the cost of using the Bitcoin blockchain.
To further reduce the disclosure of personal data, we need to consider the case where service providers need personal data for providing service. In this case, we cannot avoid presenting our data. However, new privacy-related services could appear in the future that play a role similar to that of PayPal Holdings Inc., in that shopping sites do not have to know credit card numbers if they can receive payment through PayPal.
the link between their transactions. Fig. 1 shows a typical Bitcoin transaction sequence during the interval from the time a USR registers her identity until the USR is accepted to receive service. Each transaction contains only the input and output address fields. After the USR successfully registers all $X_j$'s, the IP issues transaction PUBLISH TX that contains $h$ described in Definition 1. The USR then publishes REQUEST TX that has $\{b_j\}$ and $c$ in Definition 1 using multi-signature MSIG1_2($a_{USR}$, $a_{IP}$), which indicates that either the USR or the IP can spend the amount of bitcoin sent to the multi-signature [15], [17]. After the SP verifies that (1) holds, the SP sends ACCEPT TX and accepts the service request from the USR.
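As an illustration of the multi-signature MSIG1_2($a_{USR}$, $a_{IP}$), the sketch below assembles a 1-of-2 multisig script of the form `OP_1 <pk_usr> <pk_ip> OP_2 OP_CHECKMULTISIG`. That MSIG1_2 is realised with this standard script template is an assumption. The function name and the placeholder keys are hypothetical, and deriving an address from the script is left out.

```python
OP_1 = 0x51
OP_2 = 0x52
OP_CHECKMULTISIG = 0xAE


def msig_1_of_2(pubkey_usr, pubkey_ip):
    """Build the script OP_1 <pk_usr> <pk_ip> OP_2 OP_CHECKMULTISIG."""
    script = bytes([OP_1])                  # one signature is required
    for pk in (pubkey_usr, pubkey_ip):
        if len(pk) != 33:
            raise ValueError("expected a 33-byte compressed public key")
        script += bytes([len(pk)]) + pk     # direct push of the key
    return script + bytes([OP_2, OP_CHECKMULTISIG])


# Placeholder keys (not valid curve points), used only to show the layout.
pk_usr = b"\x02" + bytes(32)
pk_ip = b"\x03" + bytes(32)
script = msig_1_of_2(pk_usr, pk_ip)
print(len(script))   # 71
```

With such a script, a signature from either key holder is sufficient to spend the output, which matches the statement that either the USR or the IP can spend the amount sent to the multi-signature.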

Figure 1. A typical transaction sequence when a USR is accepted to receive service.
of bitcoin sent to each output address (see Section V). In Fig.

Figure 2. A transaction sequence for updating the identities of a USR when multiple identity providers are involved.

Figure 3. Revised input and output fields of five transactions. The upper two correspond to PUBLISH TX

3. The USR (or the IP) delivers REVOKE TX for revocation if needed. The OP_RETURN field of this transaction includes Reason-ref (see Fig. 3), which points to the location of the above certificates.

Figure 4. A revised identity update sequence. The combination of two multi-signatures reduces the number of REVOKE TX
to the multi-signature modification and the number of Bitcoin addresses in the input and output fields of REQUEST DOUBLE TX decreases, so that the size (and therefore the cost) of the transaction also decreases (Subsection C details the cost reduction). In other words, because the cost reduction implies a reduction of the transaction fee $f_{\text{REQUEST DOUBLE}}$, as shown in Fig. 3 the amount of bitcoin spent for request and accept transactions
3 are used and revised through the introduction of auditability and the multi-signature combination.
REVOKE TX: Before updating the identities, the USR or the IP needs to revoke the original identities first. As shown for REVOKE TX in Fig. 3, the input address is MSIG1_2(MSIG1_2(1), MSIG1_2(2)) derived from REQUEST DOUBLE TX. The amount sent to $a_{IP}$ is revised because REVOKE TX appears only once per identity update. The OP_RETURN field is used for Reason-ref in Table I.

UPDATE TX: After the identities have been revoked, each IP sends UPDATE TX that contains Audit-ref in the OP_RETURN field, which points to a new $h$ value, etc. The output address $a_{USR}$ is used just for indicating whose identity is updated. This transaction also uses multi-signature MSIG1_2(MSIG1_2(1), MSIG1_2(2)) for cost reduction. Fig. 3 demonstrates that both the input and output fields contain multi-signatures. This reduces the size of the transaction considerably. The OP_RETURN field includes Proof-ref, which points to the revised $\{b_j\}$ and $c$ for each IP.

The second is when a worker (USR) voluntarily requests a transfer to another department or the company reassigns the worker to another department. In this case, either the company (IP) or the worker (USR) issues a REVOKE TX and then the company (IP) transmits UPDATE TX.

TABLE I. USAGE OF THREE OP_RETURN FIELDS. DS AND PK INDICATE A DIGITAL SIGNATURE AND A PUBLIC KEY CERTIFICATE, RESPECTIVELY

TABLE II. STRUCTURE OF A BITCOIN TRANSACTION