Crypto Model of Real-Time Audio Streaming Across Paired Mobile Devices

The growing adoption of the Internet of Things (IoT) and the ubiquity of smartphones are drawing increasing attention to the use of smartphones for deploying and running sophisticated mobile applications. Modern smartphones have good connectivity and significant processing power, and are always with us, making them ideal candidates for applications that help ensure the safety and security of mankind. This research aims at developing a real-time audio encryption application between two or more handheld mobile devices. Communication between devices is secured by encrypting the audio sent in real time with the Advanced Encryption Standard (AES) using a 256-bit key, while channelling messages through a cloud-hosted database (Firebase) that works as a real-time database and performs implicit AES encryption and decryption on its data. The Geocoder from the Google Maps API library services is used to track the location of the audio sender.


Introduction
Communication through mobile devices has developed from the simple cell phone to the advanced smartphone, increasing functionality and user interactivity; owing to this efficiency and reliability, a mobile device is now within almost anyone's reach. In today's digital world, the importance, effect and presence of networks cannot be neglected. The constant use of digital data in real-life applications, and its significance, has created the need for new ways to ensure the safety of people through the electronic gadgets they possess. Information is power no matter how minute it is, and the rapid advances in computer technologies and the internet have made information security one of the most important factors in information technology and communication, hence the need for a secured information processing system.
The rate of abduction in the world, especially in Africa, is escalating. Recently in Nigeria, the recorded average rate of kidnapping and abduction is about three cases per week, while the number of individuals who become victims of these ugly incidents is above seven. In most recorded cases, the abductors first seize the victims' mobile phones and demand ransom from their relatives. In cases where the victim is injured, the abductors can easily access the victim's phone SIM card and retrieve the contacts of close relatives or the last phone number the victim called. Therefore, there is a need to focus on ways to reduce the threat posed by hostage takers. In instances of distress, the application could be initiated to capture audio of whatever is going on in the surroundings, so that a proper response to the distress call can be effected at an appropriate time.
With the help and advancement of the Internet of Things (IoT), communication and interconnection through internet-connected devices enable users to send and receive data. This serves as an integrated and helpful way of embedding features that allow the sharing of meaningful audio with selected contacts and designated receivers. If mobile devices can be used for other purposes, why can they not be used for the security of mankind, given that human security is essential in day-to-day living and that these devices are within people's reach wherever they find themselves? This idea comes predominantly from the fact that people all over the world are connected through mobile phones, which mainly use audio or video as their main medium of communication. The purpose of this research is to explore and design how a smartphone can be used for the safety and security of mankind alongside its usual daily usage. It covers all of the fundamental tools and resources needed for its development. Android-based mobile devices were deployed for testing the features of the application. The other sections of this paper contain related works, the system model and design, the system implementation, the conclusion and references.

Background and Related Works
Research has been focused on this area; a brief account of some related works is given below.
[1] implemented an encryption scheme for voice calls based on the RSA encryption standard that enables users to encrypt a voice call before transmitting it on the mobile network. However, using an intermediate server as software to encrypt calls between two parties was not always applicable, owing to the large cost and non-availability of the hardware needed.
[2] created a real-time voice encryption system in which a one-way communication setup is achieved by scrambling voice in real time with basic digital signal processing operations. However, scrambling the voice signal is not enough to guarantee voice security, as the intensity of the different frequencies stays the same even though they are shifted.
[3] developed an Android mobile application that allows real-time voice communication over short-range local wireless networks, mainly Bluetooth and Wi-Fi. They created a two-way radio transceiver using an Android device that allows peers to establish voice communication provided that the devices are in range. The challenge with this method was that the Android devices need to be discoverable at all times in order to pair.
[4] proposed a technique in which quaternion-based encryption and decryption of an audio signal is achieved using digital images as a variable key and as cover for the audio signal. The technique is composed of two processes: encryption at the transmitter and decryption at the receiver. Encryption and decryption are symmetric, and the proposed technique was implemented using the Matlab simulator.
[5] surveyed various techniques that can be used for the encryption of audio data in order to suggest a more secure method for audio encryption. They observed that selective encryption techniques were better than total encryption techniques, as they take less time with degradation of signal, and their time consumption on MP3 compression is less than that of total encryption. In the study, they were unable to modify existing algorithms to make audio data more secure.
[6] proposed a technique for encrypting an audio file using a combined approach of transformation and cryptography: the fast Fourier transform is used to convert the audio file from the time domain to the frequency domain, the frequency band is modified, and RSA is used for its encryption.
[7] proposed a lightweight encryption scheme for real-time multimedia transmission, without loss of security or media quality of service, using two block transpositions and an XOR operation. The first transformation is used to generate a key frame in order to seed the frame and improve compression; the second transformation, with the XOR operation, is the main encryption process. Their experimental results with various MPEG-4 movies show real-time transmission of the encrypted data without loss of quality of service, and state that the scheme encrypts faster than AES encryption of MPEG compressed data. However, the selective algorithm proposed for encryption provides lower usability than the naïve algorithms that encrypt all data.
[8] proposed a system for secured audio data transfer over the internet using steganography. Their idea was to develop an encryption process at the bit level of the secret speech signal to achieve great strength of encryption, and to embed the audio file, as well as a text message, inside a cover image using steganographic techniques. To provide additional security, noise is added to the audio file before it is hidden in the cover image: audio file encryption is carried out by adding high-frequency noise bits at the low-frequency components of the signal, using the least significant bit embedding method to hide the data, and XORing a secret key with the binary data. The same process is used to retrieve the speech file by a person with the correct secret key, as a new secret key is generated by the system every time.
[9] proposed five-level cryptography in speech processing with Matlab, using multi-hash and repositioning of speech elements to increase the security of audio data transferred over an insecure medium, by creating a cipher signal which is routed through an insecure line to the recipient. A microphone converts the sound wave into an electrical wave, and the electrical wave is converted into digital audio using pulse code modulation. The speech is sampled and kept in a loop with a key table substitution process that substitutes the speech elements with the key table elements, which helps in repositioning the speech elements. Repositioning is done in such a way that the end result is a cosine waveform, and a new file is generated to be sent across the internet. The decryption process follows the inverse of the encryption process.

EAI Endorsed Transactions on Mobile Communications and Applications
Online First

[10] implemented voice, video and text data transfer over wireless by creating a communication system that allows Android-based smartphone users to send and receive messages, voice calls and video calls within Wi-Fi range, requiring neither internet connectivity nor a messaging service from the mobile service providers. The basic idea behind their system was to unify voice and data onto a single network infrastructure by digitizing voice signals, converting them into IP packets and sending them through an IP network together with the information obtained. It reduced the cost of data transmission and communication within a fixed range, providing zero-cost communication through Wi-Fi.
[11] carried out research on secured data transmission using different security approaches together. Their proposed work encrypts the data using the AES, DES and Blowfish algorithms sequentially, and then uses steganography with the LSB algorithm to embed the encrypted data file into frames in order to distract the attention of the attacker.
[12] carried out research on the design and implementation of an encrypted call application on the Android system, using VoIP to transmit voice and convert it from an analog signal to a digital signal, RTP to transmit digital packets over the network, SRTP to add protection with the use of the AES algorithm, and the Zimmermann real-time transport protocol to generate a key for each call.
[13] conducted a study on the current state of audio encryption by demonstrating various techniques for different applications. The study found that all audio encryption techniques can be divided into either full or partial encryption, with partial encryption bringing a considerable increase in overhead and algorithmic complexity.
[14] worked on the encryption of an audio file on the lower frequency band for secure communication, taking the frequency domain of the WAV audio signal for encryption and decryption. A partial encryption approach with the RSA technique is used to encrypt the important portion of the audio, and the DFT is used to transform the time-domain audio signal into a frequency-domain audio signal.
[15] proposed a method to encrypt an audio data stream in a mobile handset by applying chaos, using a pair of one-dimensional logistic maps to generate a chaotic encryption sequence for audio transfers between phones.

System Model and Design
Reviewing these works brought about the idea of creating a real-time system that could be used in emergency scenarios where there is a need to track the location of an individual during a search and also to share recorded audio.
The tools required for developing the crypto-model audio streaming application are Android Studio and the Android SDK Tools.
The programming languages used are Java and JavaScript, and the backend server is implemented using Firebase.
Features incorporated into the application are:
- Authentication and token generation, achieved in the registration process for profile creation.
- Selecting contacts for sharing; contacts to be selected are saved in the user's mobile phonebook and registered on the mobile application.
- Recording of audio (default of 3-5 mins).

A user (sender) is connected to other users by registering. Once registered, the application provides the list of other active clients that can be chosen as designated receivers; receivers must have registered with the application and been saved to the contact phonebook. During recording, before the message reaches the database server, it is encrypted in real time as transmission starts, using AES-256 with the system entropy. As parts of the message reach the server side (database) progressively, the server pushes a response to the receiver side (designated user) almost immediately, in the form of a notification of an incoming message, and the message is decrypted. On the receivers' application user interface, the location of the sender is tracked in longitude and latitude and can be mapped online using the global positioning system for further information.

In the case where the sender is in danger, for example when his or her mobile phone is seized or even destroyed by the abductors during a kidnapping, the last recorded voice data and exact position are already saved in the memory of the paired devices and in the cloud database, which could aid an emergency rescue team or law enforcement agents in knowing the exact time and location of the incident for quick response. This is carried out by the Firebase server. At the receivers' end, the audio is decrypted and the received data is decoded using an audio decoder.

How Encryption is achieved
The Javax Crypto security key library was used to handle the encryption of each recorded audio file before it is uploaded in real time to the cloud over the network connection. This allows the secured sharing of each recorded audio file and its decryption by the selected contacts. The AES algorithm with a 256-bit key was used for the audio encryption; to generate the random passphrase keys for encryption and decryption, the system's entropy is used in line with the block size of the audio file. The Advanced Encryption Standard (AES) is a symmetric-key block cipher: it encrypts the message one block at a time, producing its output one block at a time. In AES, each round consists of 4 layers: byte substitution using a substitution table, row shifting of the state array by different offsets, column mixing of the data within each column of the state array, and key addition. In the last round, column mixing is absent. AES supports three key lengths (128, 192 and 256 bits), and the number of rounds depends on the key length, as illustrated in Table 1.
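As an illustration of the step described above, the following is a minimal sketch of AES-256 encryption and decryption of a recorded audio buffer with the standard javax.crypto classes. The class name AudioCrypto, the choice of GCM mode, the 12-byte IV and the layout of the output (IV followed by ciphertext) are assumptions made for the example; the paper does not state which cipher mode or message layout Smartrec uses.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper, not the Smartrec source: AES-256 in GCM mode (assumed).
final class AudioCrypto {
    private static final int IV_BYTES = 12;   // assumed IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom(); // system entropy

    // Generates a random 256-bit AES key from the system entropy source.
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256, RANDOM);
        return generator.generateKey();
    }

    // Encrypts one audio buffer; returns IV || ciphertext || tag.
    static byte[] encrypt(SecretKey key, byte[] audio) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // a fresh IV for every message
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] sealed = cipher.doFinal(audio);
        return ByteBuffer.allocate(iv.length + sealed.length).put(iv).put(sealed).array();
    }

    // Reverses encrypt(); throws if the key is wrong or the message was altered.
    static byte[] decrypt(SecretKey key, byte[] message) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
        return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        byte[] audio = new byte[4096]; // stand-in for recorded PCM samples
        RANDOM.nextBytes(audio);
        byte[] message = encrypt(key, audio);
        System.out.println("round trip ok: " + Arrays.equals(audio, decrypt(key, message)));
    }
}
```

In this sketch the same key must be available to the sender and to every designated receiver; how that key is distributed is outside the scope of the example.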

How Bit Manipulation Occurs
According to [16], all internal operations of AES are based on finite fields, for which a new number system is generated. "Finite" refers to a countable, limited number of elements, and a "field" is a mathematical set of entities closed under the operations of addition, subtraction, multiplication and inversion. One of the conditions that must hold for a finite field to exist is that it has P^m elements, where P is a prime number and m is a positive integer. A finite field with 256 elements is therefore represented as GF(2^8), which is the AES field.
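The arithmetic of this field can be sketched in a few lines. In the example below, bytes are treated as polynomials over GF(2): addition is a bitwise XOR, and multiplication is reduced modulo the AES irreducible polynomial x^8 + x^4 + x^3 + x + 1 (0x11B). The class and method names are illustrative only and are not taken from the application.

```java
// Illustrative GF(2^8) arithmetic as used inside AES (names are hypothetical).
final class GF256 {
    static final int AES_POLY = 0x11B; // x^8 + x^4 + x^3 + x + 1

    // Addition (and subtraction) in GF(2^8) is a bitwise XOR.
    static int add(int a, int b) {
        return (a ^ b) & 0xFF;
    }

    // Shift-and-add multiplication with reduction by the AES polynomial.
    static int mul(int a, int b) {
        int product = 0;
        a &= 0xFF;
        b &= 0xFF;
        while (b != 0) {
            if ((b & 1) != 0) product ^= a;       // add current multiple of a
            a <<= 1;                              // multiply a by x
            if ((a & 0x100) != 0) a ^= AES_POLY;  // reduce back to 8 bits
            b >>= 1;
        }
        return product;
    }

    // Multiplicative inverse via a^254, since a^255 = 1 for non-zero a.
    // By AES convention, 0 is mapped to 0.
    static int inverse(int a) {
        int result = 1;
        int base = a & 0xFF;
        for (int exp = 254; exp > 0; exp >>= 1) {
            if ((exp & 1) != 0) result = mul(result, base);
            base = mul(base, base);
        }
        return result;
    }

    public static void main(String[] args) {
        // Worked example from the AES specification: {57} * {83} = {c1}
        System.out.println(String.format("%02x", mul(0x57, 0x83)));
    }
}
```

Because every product is reduced modulo the polynomial, the result of any operation is again a single byte, which is what allows the cipher to work entirely on byte arrays.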

Figure 5. Round Transformation
At the start of the cipher, the incoming audio (AI) is broken down into bytes to fit into a state array: a two-dimensional array of bytes S(r,c) consisting of 4 rows, each containing a number of bytes equal to the block length divided by 32. The four bytes in each column of the state array form a 32-bit word. All the bytes are interpreted as finite field elements, with arithmetic reduced modulo the AES irreducible polynomial to ensure that every result can again be represented as a single byte.
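The mapping from a block of audio bytes to the state array can be sketched as follows. The column-by-column fill order and the relation "rounds = key words + 6" are taken from the AES standard rather than from this paper, and the class and method names are hypothetical.

```java
// Illustrative state-array helpers (names are hypothetical).
final class AesState {
    static final int ROWS = 4;

    // Number of state columns: block length in bits divided by 32.
    static int columns(int blockBits) {
        return blockBits / 32;
    }

    // Number of rounds for a 128-bit block: key words + 6 (10, 12 or 14).
    static int rounds(int keyBits) {
        return keyBits / 32 + 6;
    }

    // Fills the state column by column: byte i goes to row i % 4, column i / 4.
    static byte[][] toState(byte[] block) {
        if (block.length % ROWS != 0) {
            throw new IllegalArgumentException("block length must be a multiple of 4 bytes");
        }
        byte[][] state = new byte[ROWS][block.length / ROWS];
        for (int i = 0; i < block.length; i++) {
            state[i % ROWS][i / ROWS] = block[i];
        }
        return state;
    }

    // Inverse of toState(): reads the state back out column by column.
    static byte[] fromState(byte[][] state) {
        byte[] block = new byte[ROWS * state[0].length];
        for (int i = 0; i < block.length; i++) {
            block[i] = state[i % ROWS][i / ROWS];
        }
        return block;
    }

    public static void main(String[] args) {
        System.out.println("columns for a 128-bit block: " + columns(128));
        for (int keyBits : new int[] {128, 192, 256}) {
            System.out.println(keyBits + "-bit key: " + rounds(keyBits) + " rounds");
        }
    }
}
```

For the 256-bit key used in this work, the sketch gives a 4x4 state and 14 rounds.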
Byte substitution provides confusion by taking the multiplicative inverse of (AI) in GF(2^8) and then applying an affine mapping (matrix transformation) to it. When decrypting, the inverse byte substitution applies the inverse table, obtained by applying the inverse of the affine mapping and then taking the multiplicative inverse in GF(2^8). Row shifting is done to scramble the data obtained from BI(x). Here a byte permutation, or re-ordering of the bytes, is carried out: the rows are shifted over different numbers of bytes (offsets). In column mixing, a linear transformation is applied to the columns of the state instead of the rows, so that the data is spread across the data path. Each column is considered as a polynomial over GF(2^8); since each column has 4 rows, the matrix transformation is treated as 4x4. It is followed by the round-key addition, in which the round key is added to the state by a bitwise XOR operation. Row shifting and column mixing are done to provide diffusion.
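The four layers of a round can be sketched directly from this description. The example below computes the byte substitution from the multiplicative inverse and the affine mapping (constant 0x63) instead of using a lookup table, and uses the fixed 4x4 column-mixing matrix from the AES standard. It is an educational sketch with hypothetical names, not the code of the application, which relies on the platform cipher implementation.

```java
// Educational sketch of one AES round on an int[4][Nb] state (names are hypothetical).
final class AesRoundSketch {
    // GF(2^8) multiplication modulo x^8 + x^4 + x^3 + x + 1.
    static int mul(int a, int b) {
        int product = 0;
        a &= 0xFF;
        b &= 0xFF;
        while (b != 0) {
            if ((b & 1) != 0) product ^= a;
            a <<= 1;
            if ((a & 0x100) != 0) a ^= 0x11B;
            b >>= 1;
        }
        return product;
    }

    // Multiplicative inverse as a^254; 0 maps to 0.
    static int inverse(int a) {
        int result = 1;
        int base = a & 0xFF;
        for (int exp = 254; exp > 0; exp >>= 1) {
            if ((exp & 1) != 0) result = mul(result, base);
            base = mul(base, base);
        }
        return result;
    }

    static int rotl8(int v, int n) {
        return ((v << n) | (v >>> (8 - n))) & 0xFF;
    }

    // Byte substitution: inverse in GF(2^8) followed by the affine mapping.
    static int subByte(int x) {
        int b = inverse(x);
        return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63;
    }

    static void subBytes(int[][] state) {
        for (int[] row : state) {
            for (int c = 0; c < row.length; c++) row[c] = subByte(row[c]);
        }
    }

    // Row shifting: row r is rotated left by r byte positions.
    static void shiftRows(int[][] state) {
        for (int r = 1; r < 4; r++) {
            int n = state[r].length;
            int[] shifted = new int[n];
            for (int c = 0; c < n; c++) shifted[c] = state[r][(c + r) % n];
            state[r] = shifted;
        }
    }

    // Column mixing: each column is multiplied by the fixed matrix
    // [2 3 1 1; 1 2 3 1; 1 1 2 3; 3 1 1 2] over GF(2^8).
    static void mixColumns(int[][] state) {
        for (int c = 0; c < state[0].length; c++) {
            int a0 = state[0][c], a1 = state[1][c], a2 = state[2][c], a3 = state[3][c];
            state[0][c] = mul(2, a0) ^ mul(3, a1) ^ a2 ^ a3;
            state[1][c] = a0 ^ mul(2, a1) ^ mul(3, a2) ^ a3;
            state[2][c] = a0 ^ a1 ^ mul(2, a2) ^ mul(3, a3);
            state[3][c] = mul(3, a0) ^ a1 ^ a2 ^ mul(2, a3);
        }
    }

    // Key addition: bitwise XOR of the round key into the state.
    static void addRoundKey(int[][] state, int[][] roundKey) {
        for (int r = 0; r < 4; r++) {
            for (int c = 0; c < state[r].length; c++) state[r][c] ^= roundKey[r][c];
        }
    }

    // One full (non-final) round; the last round would skip mixColumns.
    static void round(int[][] state, int[][] roundKey) {
        subBytes(state);
        shiftRows(state);
        mixColumns(state);
        addRoundKey(state, roundKey);
    }

    public static void main(String[] args) {
        // Worked example from the AES specification: S-box({53}) = {ed}
        System.out.println(String.format("%02x", subByte(0x53)));
    }
}
```

The round keys themselves come from the key expansion, which is not shown here.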

Tracking Sender's Location
The Geocoder from the Google Maps API service library is integrated to track the location of the audio sender. The Geocoder is localized for a given locale, and the result it gives is a best guess. The Google location listener package is used for location updates and changes as audio recording happens. Real-time location updates are possible provided there is internet connectivity.

System Implementation
The Android application, called "Smartrec", is a mobile application with fast and helpful integrated features that allow the sharing of recorded audio among selected circle members.
In this study, the following devices were used for testing:
On launching the application, the user interface is displayed with three sections: recording and sending to selected contacts on launch, choosing designated receivers, and user registration, which is required to send and receive recordings.

Figure 9. Recording Interface
The MIC symbol is tapped to start recording. Recording can be stopped manually, which automatically begins the upload to the receiver, or left alone, in which case the upload begins after the given recording time has elapsed.
After the time given for audio recording has elapsed, or after an explicit tap on the MIC image to stop recording, the file is processed and automatically uploaded to the designated receivers. The mic image is tapped to decode and listen to the audio. The location beneath shows the location of the audio receiver, and the location on each audio item sent shows the location of the audio sender in longitude and latitude.

Conclusion
With the rapid spread of smartphones incorporating many features, mobile devices have become a very important aspect of human life, since they assist us and enable us to access a large variety of services. This study presented an overview of secured smartphone audio transmission among users. It provides suitable security for the audio by encrypting it with AES and then transmitting it through cloud-based storage in real time to designated receivers. Future research and development should be carried out so that improvements can be made and users can be assured of the security of their data. Developers working on an implementation should consider the various surveys and encryption techniques that can be combined with the specifics of the database server needed, in order to achieve a complete, functional and desired system. The system should be further enhanced by supporting other mobile devices running different operating systems.


- Encoding and decoding of recorded audio.
- Location listener for location updates as recording of audio happens.
- Generation of a URL for each encoded recorded audio, for identification and immediate upload to the cloud database.
- Sharing of the uploaded encoded recorded audio among selected contacts.
- Instant service notification on received encoded audio for selected contacts.

The figure below describes the system sequence of operation.

Figure 6. How bit substitution is performed

Figure 7. Generation of AES Cypher Key

Table 1. Key Block Round Combination for AES Algorithm