Secure and Privacy Enhanced Gait Authentication on Smart Phone

The smart environments enabled by mobile technology have brought vast benefits to people. However, authentication mechanisms on portable smart devices, particularly conventional biometric approaches, still raise security and privacy concerns. These traditional systems are mostly based on pattern recognition and machine learning algorithms, wherein original biometric templates or extracted features are stored in unconcealed form so that they can be matched against a new biometric sample in the authentication phase. In this paper, we propose a novel gait based authentication scheme that uses a biometric cryptosystem to enhance system security and user privacy on the smart phone. Extracted gait features are used only to biometrically encrypt a cryptographic key, which acts as the authentication factor. Gait signals are acquired with the accelerometer, an inertial sensor built into the mobile device, and error correcting codes are adopted to deal with the natural variation of gait measurements. We evaluate the proposed system on a dataset consisting of gait samples from 34 volunteers. At a key length of 50 bits, we achieve our best false acceptance rate (FAR) and false rejection rate (FRR) of 3.92% and 11.76%, respectively.


Introduction
The smart environments enabled by mobile technology have brought vast benefits to people [1]. Nowadays, mobile devices are used not only for communication and entertainment but also for transactions [2], personal healthcare [3], and even emergency situations [4]. As a result, more and more personal data are collected and kept on the mobile device for analysis [5], which raises growing concerns about system security and user privacy. Security techniques for authentication and identification are commonly based on passwords (e.g., OTP [2]), tokens (e.g., ID cards), or biometric recognition (e.g., iris [6], fingerprint [7], face [8], and gait [9] recognition). From the end user's viewpoint, biometric authentication mechanisms are more convenient than passwords and tokens. However, the use of biometric authentication on mobile devices should be considered carefully. Because biometrics is unique and irrevocable but fuzzy, most conventional biometric authentication systems are built on pattern recognition and machine learning (PR-ML) algorithms to deal with the natural variation of biometric measurements [6]. Enrollment biometric templates or extracted features are stored in unconcealed form for matching against a new biometric sample to authenticate or identify users. This kind of approach can leave critical vulnerabilities in system security and user privacy, especially when it is implemented on mobile devices. These devices are easily lost, so an adversary could illegally access the mobile repository and obtain the original biometric templates. Since biometrics is tied to unique characteristics of an individual that can hardly be changed, a privacy leak means that an adversary could partly or fully determine the user's biometrics. From the viewpoint of system security, a compromise of biometric templates results in permanent forfeiture: an adversary could use the compromised templates to illegally gain access to sensitive services from then on.
In this paper, we introduce an authentication system based on a biometric cryptosystem (BCS) to enhance system security and user privacy on mobile devices. The biometric modality used in our system is human gait, which is collected using an accelerometer, an inertial sensor attached to the user's body. This type of sensor has recently been used in a number of smart phone applications [3]. To the best of our knowledge, this is the first BCS that uses gait biometrics captured from an accelerometer. We utilize a fuzzy commitment scheme [10] whereby the key, acting as the authentication factor, is biometrically encrypted by the user's gait. The gait sample is used only to retrieve the cryptographic key and is then always discarded, so that system security and user privacy are significantly enhanced. Moreover, the system requires little storage space and computation. It is therefore better suited to direct deployment on resource-limited mobile devices than other PR-ML based systems [9].
The rest of this paper is organized as follows. Section 2 presents the related works. Our proposed system is described in Section 3. Experimental evaluations are presented in Section 4. Finally, Section 5 draws our conclusions.

Related Works
To preserve the security and user privacy of biometric authentication systems, various modern approaches have been proposed [11], among which biometric cryptosystems (BCSs) have attracted much research in recent years. State-of-the-art BCSs mostly use physiological modalities such as iris [12], face [13], and fingerprint [14]. Some studies use behavioral biometrics such as signature [15] and voice [16]. Generally, BCSs can be classified into two types: key-binding and key-generation systems [11]. In key-binding systems, a random key string is generated and then bound with a biometric template, yielding helper data. These data are stored and later used to retrieve the key in the authentication phase. For example, Hao et al. [17] proposed an iris based BCS using a fuzzy commitment scheme. They used 2048-bit iris codes combined with concatenated codes and achieved a false acceptance rate (FAR) of 0% and a false rejection rate (FRR) of 0.47% with a key length of 140 bits. In key-generation systems, by contrast, the helper data are created directly from the biometric template alone. Such helper data are combined with a presented query that is sufficiently close to the original template to regenerate either the unique key string or the original template. Typical techniques of this kind are fuzzy extractors [18] and secure sketches [19]. Key-generation schemes have already been applied to iris [12] and voice [16]. Generally, approaches based on physiological modalities achieve better error rates and security levels than those based on behavioral biometrics. This is because physiological modalities such as iris and fingerprint are more robust than behavioral factors, which are significantly affected by various conditions.
For example, the human voice depends on the speaker's state of health, an individual's gait changes over time, and so forth.

The Proposed Method
Figure 1 sketches our gait based BCS, which uses a fuzzy commitment scheme [10]. In the enrollment phase, the gait signal of a user is acquired and preprocessed to reduce the influence of the acquisition environment. Feature vectors are extracted in both the time and frequency domains and then binarized. After that, a reliable binary feature vector $w$ is extracted by determining the reliable components. Concurrently, a cryptographic key $m$, which is generated randomly for each user, is encoded into a codeword $c$ using an error correcting code. The fuzzy commitment scheme computes the hash value of $m$ using a cryptographic hash algorithm $h$ and a secured value $\delta$ using a binding function. The helper data used to extract reliable binary feature vectors, together with the values $h(m)$ and $\delta$, are stored locally for later use in the authentication phase.
In the authentication phase, the user claiming the enrolled identity provides a new gait sample. It is also preprocessed to extract a feature vector, and a reliable binary vector $w'$ is extracted using the helper data stored in the enrollment phase. The decoding function computes the corrupted codeword $c'$ by binding $w'$ with $\delta$ and then retrieves a cryptographic key $m'$ from $c'$ using the corresponding error correcting code decoding algorithm. Finally, the hash value of $m'$ is matched against $h(m)$ to make the authentication decision.

Data Acquisition.
A Google Nexus One smart phone placed inside the front pocket is used to collect the user's gait signal (Figure 2). This discrete time signal is a sequence of combined values of gravity acceleration, ground reaction force, and inertial acceleration, captured by a built-in 3-dimensional accelerometer during walking. We represent the output of this accelerometer as 3-component vectors $a = (a_x, a_y, a_z)$, where $a_x$, $a_y$, and $a_z$ represent the magnitude of the acceleration acting in the three directions, respectively.

Data Interpolation.
Because the accelerometer integrated in mobile devices is power saving and simpler than standalone sensors, its sampling rate is not stable and depends entirely on the mobile OS. The time interval between two consecutive samples is not constant: the sensor outputs a value only when the acceleration in the 3 dimensions changes significantly. The sampling rate of the Google Nexus One used in our study is unstable and fluctuates around 27 ± 2 Hz. Therefore, the acquired signal is resampled to 32 Hz using linear interpolation so that the time interval between two sample points is fixed.
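The resampling step can be illustrated with a minimal sketch. It assumes timestamps in seconds that are strictly increasing; the function name `resample_linear` and its parameters are hypothetical and not part of the original implementation.

```python
from bisect import bisect_right


def resample_linear(times, values, rate_hz=32.0):
    """Linearly interpolate irregularly timed samples onto a uniform grid.

    times  -- strictly increasing timestamps in seconds (assumption)
    values -- one acceleration component sampled at those timestamps
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) samples")
    step = 1.0 / rate_hz
    # Number of grid points that fit between the first and last timestamp.
    count = int((times[-1] - times[0]) / step + 1e-9) + 1
    resampled = []
    for n in range(count):
        t = times[0] + n * step
        # Index of the last original sample at or before t.
        i = min(bisect_right(times, t) - 1, len(times) - 2)
        t0, t1 = times[i], times[i + 1]
        frac = (t - t0) / (t1 - t0)
        resampled.append(values[i] + frac * (values[i + 1] - values[i]))
    return resampled
```

Each of the three axes would be resampled independently against the same timestamp list.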

Noise Filtering.
When the accelerometer samples the movement of a walking user, some noise is inevitably collected. It may be caused by idle orientation shifts or bumps on the road during walking. Moreover, a mobile accelerometer produces more noise than standalone sensors since its functionality is fully governed by the mobile OS layer. Hence, we adopt a multilevel wavelet decomposition and reconstruction method, specifically the Daubechies orthogonal wavelet (Db6) at level 2, to filter the gait signal. At the 1st level, the original gait signal is decomposed into two separate parts containing coarse and detail coefficients. The coarse coefficients obtained at the 1st level are then used as the input signal to be decomposed at the next level. This process continues until the desired level is reached. To eliminate the impact of noise, at each level we set detail coefficients that are lower than a predefined threshold to 0. The noise-filtered signal is reconstructed by reversing the decomposition process: the coarse coefficients are combined with the new detail coefficients, starting from the lowest level, until the zero level is reached.
Because walking is a cyclic activity, we segment the noise-filtered gait signal into separate patterns, each consisting of consecutive gait cycles. A gait cycle is defined as the time interval between two successive occurrences of one of the repetitive events of walking.
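The decompose, threshold, and reconstruct procedure used for noise filtering can be sketched as follows. To stay self-contained, this sketch swaps in the Haar wavelet for the Db6 wavelet used in the paper, and the threshold value is an arbitrary assumption; the function names are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    coarse = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return coarse, detail


def haar_inverse_step(coarse, detail):
    """Invert one Haar level, interleaving the reconstructed samples."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(coarse, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def wavelet_denoise(signal, levels=2, threshold=0.5):
    """Zero small detail coefficients at every level, then reconstruct."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    coarse = list(signal)
    details = []
    for _ in range(levels):
        coarse, detail = haar_step(coarse)
        # Detail coefficients below the threshold are treated as noise.
        details.append([d if abs(d) >= threshold else 0.0 for d in detail])
    # Reconstruct from the deepest level back to the original resolution.
    for detail in reversed(details):
        coarse = haar_inverse_step(coarse, detail)
    return coarse
```

With a threshold of 0 the signal is reconstructed unchanged, which is a convenient sanity check before tuning the threshold.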
We observed that whenever the foot on the same side as the device touches the ground, the acceleration in the vertical dimension changes markedly, as illustrated by the red points in Figure 3. We locate these points by calculating the autocorrelation coefficients $A_m = \sum_{i=1}^{L-|m|} x_i x_{i+m}$ on the vertical dimension signal, where $x_i$ is the $i$th sample and $L$ is the signal length, and by filtering prominent peaks based on the mean and standard deviation. Based on these points, we segment the gait signal into separate patterns, in which each pattern consists of $n_{gc}$ ($n_{gc} = 4$ in our experiment) consecutive gait cycles in all 3 dimensions. Finally, a feature vector is extracted from each pattern in both the time and frequency domains. The features include the following, where $GC_i$ denotes the $i$th gait cycle of a pattern:
(i) Average maximum acceleration: $\mathrm{avg\_max} = \operatorname{mean}(\max(GC_i))_{i=1}^{n_{gc}}$.
(ii) Average minimum acceleration: $\mathrm{avg\_min} = \operatorname{mean}(\min(GC_i))_{i=1}^{n_{gc}}$.
(iii) Average absolute difference.
(iv) Root mean square.
(v) 10-bin histogram distribution.
(vi) Standard deviation.
(vii) Waveform length, defined in terms of the time length of a gait cycle.
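A minimal sketch of the autocorrelation coefficient and of the two average-extrema features follows. The cycle-length estimate simply takes the lag with the largest autocorrelation in a search window, which is a simplification of the mean and standard deviation based peak filtering described above; all function names are hypothetical.

```python
def autocorrelation(x, lag):
    """Unnormalized autocorrelation: sum of x[i] * x[i + |lag|]."""
    m = abs(lag)
    return sum(x[i] * x[i + m] for i in range(len(x) - m))


def estimate_cycle_length(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation.

    Simplifying assumption: the strongest peak in the window corresponds
    to one gait cycle.
    """
    return max(range(min_lag, max_lag + 1), key=lambda m: autocorrelation(x, m))


def average_extrema(cycles):
    """Return (avg_max, avg_min) over the gait cycles of one pattern."""
    maxima = [max(cycle) for cycle in cycles]
    minima = [min(cycle) for cycle in cycles]
    return sum(maxima) / len(maxima), sum(minima) / len(minima)
```

For a signal with a period of 4 samples, `estimate_cycle_length(x, 1, 8)` picks lag 4, since the full-period lag dominates both shorter lags and the double-period lag.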

Feature Vector Binarization.
We adopt a quantization method previously used in [13] for face template binarization. Let $N$ denote the number of users and $M$ the number of feature vectors extracted from each user. Let $\vec{f}_{i,j}$ ($i = 1, \ldots, N$; $j = 1, \ldots, M$) be the $j$th feature vector of user $i$. The mean over the intraclass variability $\vec{\mu}_i$ of user $i$ is calculated as

$$\vec{\mu}_i = \frac{1}{M} \sum_{j=1}^{M} \vec{f}_{i,j}.$$

The mean over all feature vectors $\vec{\mu}$ in the enrollment phase is calculated by

$$\vec{\mu} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \vec{f}_{i,j}.$$

The quantization method transforms the $t$th component into $\{0, 1\}$ by comparing the $t$th component of $\vec{\mu}_i$ with a threshold defined by the corresponding $t$th component of $\vec{\mu}$. For each user $i$, the binary feature vector $w_i$ is determined by

$$(w_i)_t = \begin{cases} 1, & (\vec{\mu}_i)_t \ge (\vec{\mu})_t, \\ 0, & \text{otherwise.} \end{cases}$$

In the enrollment phase, we use the enrollment feature vectors to estimate the value of $\vec{\mu}$. This $\vec{\mu}$ is stored as helper data and used as the threshold for binarizing real-valued feature vectors in the authentication phase.
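The quantization can be sketched as below, assuming that a bit is set to 1 when a component is at least as large as the corresponding component of the global mean (the direction of the comparison is an assumption of this sketch). The function names and the dictionary layout are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def binarize(vector, threshold):
    """Bit t is 1 when component t reaches the threshold component t."""
    return [1 if v >= th else 0 for v, th in zip(vector, threshold)]


def enroll_binary_templates(features_by_user):
    """Compute the global mean and one binary template per user.

    features_by_user maps a user id to that user's enrollment vectors.
    """
    all_vectors = [v for vectors in features_by_user.values() for v in vectors]
    global_mean = mean_vector(all_vectors)
    templates = {
        user: binarize(mean_vector(vectors), global_mean)
        for user, vectors in features_by_user.items()
    }
    return global_mean, templates
```

In the authentication phase, the stored global mean would be passed to `binarize` together with the newly extracted feature vector.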

Reliable Binary Feature Extraction.
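As the authors of [13] pointed out, when real-valued vectors are transformed into binary form by the statistical quantization method of the previous section, some components of the binary vector are significantly unstable when $\vec{\mu}_i$ and $\vec{\mu}$ are used to determine the output bit. For example, if the $t$th component of $\vec{\mu}_i$ is close to that of $\vec{\mu}$, the error probability in the next verification will be higher. Therefore, it is necessary to extract only the most robust and reliable bits of the binary vector. First, the variance $s_{i,t}^2$ of the $t$th component for user $i$ is calculated by

$$s_{i,t}^2 = \frac{1}{M-1} \sum_{j=1}^{M} \left( (\vec{f}_{i,j})_t - (\vec{\mu}_i)_t \right)^2.$$

Assume that the variability of the components is modeled as a Gaussian. Then the reliability of the $t$th bit of user $i$ is estimated with the standard error function as

$$\mathrm{rel}_{i,t} = \frac{1}{2} \left( 1 + \operatorname{erf}\left( \frac{\left| (\vec{\mu}_i)_t - (\vec{\mu})_t \right|}{\sqrt{2 s_{i,t}^2}} \right) \right).$$

The components are ranked by these reliability values (rel_val), and the indices of rel_val (called rel_idx) are also stored as helper data to extract reliable bits in the authentication phase.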
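A minimal sketch of reliable bit selection follows. It assumes the Gaussian reliability measure of [13], namely one half of one plus the error function of the distance between the user mean and the global mean divided by the square root of twice the sample variance; this exact form, the handling of zero variance, and the function names are assumptions of the sketch.

```python
import math


def bit_reliability(samples, global_mean):
    """Estimate a reliability in [0.5, 1] for every component of one user."""
    count = len(samples)
    if count < 2:
        raise ValueError("need at least two enrollment vectors")
    reliabilities = []
    for t, column in enumerate(zip(*samples)):
        mu = sum(column) / count
        variance = sum((v - mu) ** 2 for v in column) / (count - 1)
        gap = abs(mu - global_mean[t])
        if variance == 0.0:
            # Degenerate case (assumption): no spread observed at all.
            reliabilities.append(1.0 if gap > 0 else 0.5)
        else:
            ratio = gap / math.sqrt(2.0 * variance)
            reliabilities.append(0.5 * (1.0 + math.erf(ratio)))
    return reliabilities


def reliable_indices(reliabilities, n):
    """Indices of the n most reliable components, most reliable first."""
    order = sorted(range(len(reliabilities)),
                   key=lambda t: reliabilities[t], reverse=True)
    return order[:n]
```

The returned index list plays the role of rel_idx: it would be stored as helper data and applied to the binarized vector in both phases.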

Key Binding Scheme.
We adopt the BCH code [20] as the error correcting code to overcome the natural variation between biometric measurements. The advantage of the BCH code over other codes is that it can correct individual bit errors occurring at random positions, as happens in our extracted binary feature vectors. Moreover, the decoding process of the BCH code is designed to be simple. It therefore requires little computation and power, which makes our system lightweight enough to be deployed on mobile devices. Let $\mathrm{BCH}_2(n, k, e)$ be a binary BCH code, where $n$ is the code length in bits, $k$ is the key length in bits, and $e$ is the error correction capability. The binary cryptographic key $m$ of length $k$ is generated randomly for each user and is then encoded into the codeword $c$ of length $n$ using the $\mathrm{BCH}_2(n, k, e)$ encoding scheme [20]. After that, we conceal $c$ by binding it with the extracted binary feature vector $w$, yielding a secured value $\delta$, and then discard $w$. Since $c$ and $w$ are two binary strings, the exclusive-OR operator is used to bind them: $\delta = c \oplus w$.
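The binding and key retrieval can be illustrated with a small fuzzy commitment sketch. To stay self-contained, it swaps in a 3-fold repetition code for the BCH code (it corrects one flipped bit per group of three) and uses SHA-256 as the hash; the function names are hypothetical.

```python
import hashlib
import secrets

REPEAT = 3  # repetition factor of the stand-in error correcting code


def encode(key_bits):
    """Repeat every key bit REPEAT times to form the codeword."""
    return [bit for bit in key_bits for _ in range(REPEAT)]


def decode(code_bits):
    """Majority vote over each group of REPEAT bits."""
    return [1 if 2 * sum(code_bits[i:i + REPEAT]) > REPEAT else 0
            for i in range(0, len(code_bits), REPEAT)]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def digest(bits):
    return hashlib.sha256(bytes(bits)).hexdigest()


def enroll(feature_bits):
    """Return (secured value, key hash); key and features are not stored."""
    if len(feature_bits) % REPEAT != 0:
        raise ValueError("feature length must be a multiple of REPEAT")
    key = [secrets.randbits(1) for _ in range(len(feature_bits) // REPEAT)]
    return xor(encode(key), feature_bits), digest(key)


def authenticate(feature_bits, secured, key_hash):
    """Unbind with the new features, decode, and compare the key hashes."""
    corrupted_codeword = xor(secured, feature_bits)
    return digest(decode(corrupted_codeword)) == key_hash
```

A query that differs from the enrolled vector by at most one bit per group recovers the same key, while a query with two flipped bits in one group yields a different key and is rejected.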
In summary, we represent all of the necessary steps in both enrollment and authentication phases in our system as follows.

Enrollment Phase.
(i) Select a $\mathrm{BCH}_2(n, k, e)$ code by predefining its parameters, including the codeword length $n$ and the secret key length $k$.
(ii) For each user $i$, extract real-valued feature vectors $\vec{f} \in \mathbb{R}^d$.
(iii) Determine the mean over all feature vectors $\vec{\mu}$ and extract a binary vector $w \in \{0, 1\}^d$ using the quantization scheme. Then discard $\vec{f}$.
(iv) Determine the reliable bit indices rel_idx and reduce the length of $w$ to $n$ by selecting only the first $n$ bits of $w$ according to rel_idx.
(v) Store $\vec{\mu}$ and rel_idx as helper data for constructing new feature vectors in the authentication phase.
(vi) Randomly generate a binary secret key $m$ of length $k$.
(vii) Calculate the hash value of $m$ using a cryptographic hash function $h$ (e.g., SHA) and store $h(m)$.
(viii) Encode $m$ into a codeword $c$ of length $n$ using the BCH encoding scheme.
(ix) Bind $c$ with $w$ using the exclusive-OR operator, yielding $\delta$. Then discard $w$ and store $\delta$.
Authentication Phase.
(i) For each user $i$, extract feature vectors $\vec{f}' \in \mathbb{R}^d$ from a new biometric sample.
(ii) Extract the binary feature vector $w'$ of length $n$ with the help of $\vec{\mu}$ and rel_idx. Then discard $\vec{f}'$.
(iii) Bind $w'$ with the stored $\delta$ using the exclusive-OR operator to obtain a corrupted codeword $c'$.
(iv) Decode $c'$ using the BCH decoding scheme to obtain a key $m'$.
(v) Calculate the hash value $h(m')$ using the same cryptographic hash function (e.g., SHA) as in the enrollment phase and then discard $m'$.
(vi) Match $h(m')$ against $h(m)$; if $h(m') = h(m)$, the user is authenticated. Otherwise, the user is rejected.

Experiments
... on Android SDK. A total of 34 volunteers, 24 males and 10 females with an average age between 24 and 28, participated in our dataset construction. Each volunteer performed around 18 laps. To make the dataset more realistic, we collected gait signals regardless of footgear and clothes. Volunteers were asked to walk as naturally as possible and to change their footgear (e.g., sandals, shoes, or slippers) as well as their clothes (e.g., from short to long trousers) whenever they started a new lap. The only constraint was that the position and orientation of the phone in the pocket must not change while a volunteer is walking. To ensure this, we asked volunteers to wear trousers with a narrow pocket. In total, we accumulated the gait signals of 34 volunteers, each yielding at least 16 real-valued feature vectors extracted using the method in Section 3.2. In our experiment, each volunteer must have an equal number of extracted feature vectors, so we randomly select 16 vectors for users having more than 16.
Figure 4 shows the Euclidean distance distribution of the extracted real-valued feature vectors. Note that the operation of our BCS resembles a threshold-based classification in which the threshold, under an appropriate distance metric, is likely to be low. We can see that the overlap between the intraclass and interclass distances of the real-valued feature vectors is large. Thus, applying threshold-based classification to these vectors would lead to high error rates in terms of FAR and FRR. Fortunately, when such vectors are binarized using the method proposed in Section 3.3, the discrimination between the binary feature vectors of different users is likely to be higher and the Hamming distance of intraclass feature vectors becomes lower.
Figure 5 illustrates the Hamming distance of binary feature vectors of lengths 127 and 255, respectively. These lengths are chosen to fit both the design of the BCH code, which requires the codeword length to be equal to $2^q - 1$ with $q \in \mathbb{N}$, $q > 3$, and the maximum dimension $d_{\max}$ of the feature vector that can be extracted in this study ($d_{\max} = 289$). As already stated, the length of the binary feature vector must equal the length of the BCH codeword so that the two can be bound using an exclusive-OR operator. Hence, the reliable bit extraction process in Section 3.4 selects a number of reliable components identical to the codeword length. Figure 5 shows that the Hamming distance of intraclass feature vectors of length 127 is lower than for length 255. This is because the number of bits that are highly reliable according to (16) is only about half of the original feature vector dimension. Hence, to obtain a binary feature vector of length 255, even bits of low reliability must be selected. Figure 6 illustrates the error rates of our proposed gait based BCS using the fuzzy commitment scheme for the two codeword lengths of 127 and 255. In both cases, as the key length increases, which is equivalent to a decrease in the number of errors that can be corrected in the codeword, the FAR falls to 0 and the FRR increases exponentially. The best error rates of our proposed system are as follows: (1) for codeword length = 127, the FAR and FRR are approximately 3.921% and 11.76%, respectively, at a key length of 50 bits; (2) for codeword length = 255, we achieve FAR ≈ 1.4% and FRR ≈ 32.53% at a key length of 55 bits. These keys are sufficiently long to be secured by a cryptographic hash algorithm.
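The trade-off between FAR and FRR can be sketched with a small helper. It assumes an idealized bounded-distance decoder, so that a query is accepted exactly when its Hamming distance to the enrolled binary vector does not exceed the error correction capability; the function names and the pair-list inputs are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def error_rates(genuine_pairs, impostor_pairs, max_errors):
    """Return (FAR, FRR) for a given error correction capability.

    genuine_pairs  -- (enrolled, query) vectors from the same user
    impostor_pairs -- (enrolled, query) vectors from different users
    """
    rejected = sum(hamming(a, b) > max_errors for a, b in genuine_pairs)
    accepted = sum(hamming(a, b) <= max_errors for a, b in impostor_pairs)
    return accepted / len(impostor_pairs), rejected / len(genuine_pairs)
```

Lowering `max_errors`, which corresponds to a longer key for a fixed codeword length, can only reduce the FAR and raise the FRR, which matches the trend reported for Figure 6.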
The FRR for codeword length = 255 is significantly higher than for codeword length = 127 because, as already stated, selecting many bits of low reliability makes the binary feature vectors of length 255 more dissimilar. However, the achieved FAR is slightly better (1.4% compared with 3.921%). In both cases, the FRRs are rather high, which could decrease the user-friendliness of the system. However, the user's gait can be captured continuously and implicitly by an accelerometer, which does not annoy the user as other biometric modalities do (e.g., iris, fingerprint, face, and signature). Therefore, this issue is of limited concern.
Table 1 shows the performance of our proposed system compared with some other state-of-the-art BCSs using different behavioral modalities such as voice and signature. Note that all these works use different approaches and entirely different datasets, so the comparison is only relative. Through this study, we merely wish to illustrate that human gait captured from inertial sensors can be used to construct an effective BCS, as with other behavioral modalities. Moreover, because we adopt a quantization scheme similar to that of [13], we also compare our system with this face based BCS. Its authors achieved a key length of 58 bits, a FAR of approximately 0%, and FRRs of approximately 3.5% and 35% on the CALTECH and FERET datasets, respectively. Face is a physiological biometric and is more robust than human gait, which is a behavioral modality. Hence, the performance of their system in terms of key length, FAR, and FRR is slightly better.

Conclusion
In this paper, we introduced a gait based biometric cryptosystem using a fuzzy commitment scheme. The results show good potential for constructing an effective gait based BCS, especially on mobile devices. The drawback of our work is that the error rates in terms of FAR and FRR are still rather high. We aim to achieve a FAR of 0% to make the system more secure. Hence, our further work will focus on reducing the FAR and FRR by constructing more discriminative feature vectors using global feature transformations, as well as finding an optimal quantization scheme for binarization. Moreover, the system security should be analyzed in depth to ensure that a gait based biometric cryptosystem can fulfill the security requirements for real-world deployment. Finally, validating the proposed system on a larger public dataset is also part of our main further work.