DIGITAL FORENSIC ANALYSIS OF TELEGRAM MESSENGER APP IN ANDROID VIRTUAL ENVIRONMENT

This paper provides an in-depth analysis of the artifacts generated by the Telegram Messenger application on Android OS, which provides secure communication between individuals, groups, and channels. Over the past few years, the application has gone through major changes and updates, and the artifacts of the latest version differ from those of previous ones. Our methodology is based on a set of experiments designed to generate artifacts from various use cases in a virtualized environment. The acquired artifacts, such as messages, their locations, and the data structures that relate them to one another, were studied and then compared with those of older versions. Correlating the artifacts of the newer version with the older ones shows how the application has been upgraded behind the scenes, and these results can give investigators a better understanding of specific evidence in a potential cybercrime case.


INTRODUCTION
Social media applications on the internet nowadays provide a variety of interesting features, from texting, group chatting, notifications, location, contacts, and file sharing to status updates, all free of cost [1]. This trend has accelerated with the rise of the 4G and 5G era, making the old Short Message Service (SMS) outdated [2].
According to [3], the internet has 3.97 billion users, almost half the world's population, with the highest usage in Asia. Applications are readily available at any user's convenience, offering services at various levels and across industries. This increasing usage, together with the rise of digital services, has also led to unlawful acts [4]. Any violation of law committed using computer technology is now identified as cybercrime [5][6][7]. The most serious of these crimes include fraud, impersonation, and blackmail.
In recent literature, smartphone forensics has been widely studied, mainly focusing on the Android and iOS platforms due to their pervasiveness [8]. As a result, a vast range of techniques and methodologies is available for extracting and analyzing evidence from smartphones [9]. We try to leverage this body of work for the extraction and analysis of the Telegram application.
Anglano et al. 2017 [10] conducted a forensic analysis of the Telegram application on Android. The authors reconstructed the contact list and the messages exchanged between users. Using the log files, they were able to map out the messages in chronological order, with details of who exchanged them, when they were exchanged, and when they were deleted. However, hash functions were not used in that work.
Mahajan [11] carried out a forensic analysis of WhatsApp and Viber, two of the most popular instant messaging applications on the platform. The analysis looked for any data in the internal memory of the device, i.e., messages, media files, etc. However, the paper does not focus on the details of the artifacts or on evidence of their location.
Agrawal et al. 2019 [12] performed a forensic analysis of the Facebook application in a virtualized Android environment. They acquired the artifacts, performed user actions, and monitored the resulting changes on the device and over the network. The analysis focused on only a limited number of user activities.
Arshad et al. [13] explored the artifacts left by the Tor browser on the latest versions of Windows 10 and Android OS. They utilized storage, ADB logs, RAM, and ZRAM on Android devices, the first such analysis of an Android swap file. The investigation showed that significant evidence extraction, approximately 60%, can be achieved with these techniques, making them a considerable alternative to RAM investigations, especially when privacy applications are in question.
Asmara et al. [14] proposed a network forensic strategy for the social messenger app Signal that identifies artifacts from encrypted network traffic. The proposed strategy can easily detect activities such as chatting, media messages, and audio and video calls by looking at payload patterns in the encrypted traffic.
Anwar et al. [15] investigated the location data collected by Google when GPS is disabled on Android devices. They employed live forensics on RAM and found no evidence of such data; however, they found some similarities in the data collected by the associated Google account, which collects GPS locations from cellular networks, sensors, and Wi-Fi networks with varying accuracy.
This paper focuses on the digital forensic investigation of social messaging applications available on Android OS smartphones [16]. These applications are available through the Google Play Store, and most of them are free of cost. The goal of this paper is to provide a better understanding of the forensic analysis of the Telegram application and of the process an investigator may go through during a cybercrime case.
The methodology of this paper can be summarized as follows:
1) Forensic methodology for an app running on Android smartphones.
2) Interpret the acquired artifacts from the analysis with predesigned user activities.
3) Furthermore, compare the results with the older version of the Telegram application.
In Section II, we map out the methodology used in the process. In Section III, we discuss the results of the forensic analysis, and finally, we present the conclusions of this work in Section IV.

METHODS
Given the objective of the forensic analysis, the analyst must be able to obtain the evidence generated by the app. The analysis methodology we propose can be divided into four parts.

A. Identification
We consider the role of the authors to be that of forensic investigators or analysts [17]. This part of the methodology deals with evidence identification: where the evidence is located or stored, what materials are being used as digital evidence, and what activities are being performed by the user on the application in question.

B. Preservation
In this part, we preserve the evidence acquired throughout the acquisition stage of the forensic analysis process. It is the most important part of the investigation: every artifact must be preserved the way it was originally found, as any modification, intentional or accidental, may cause inaccuracies in the analysis further down the case and can even lead to the dismissal of a well-established digital forensic case.
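One common way to demonstrate that acquired artifacts were not modified is to record a cryptographic digest of every file at acquisition time and compare it later. The following is a minimal sketch of that idea, not the procedure used in this study; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_manifest(evidence_dir: Path) -> Dict[str, str]:
    """Map each file under evidence_dir (relative path) to its digest."""
    return {
        path.relative_to(evidence_dir).as_posix(): sha256_of_file(path)
        for path in sorted(evidence_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """List files that were added, removed, or altered between two manifests."""
    names = before.keys() | after.keys()
    return sorted(name for name in names if before.get(name) != after.get(name))
```

Taking one manifest right after acquisition and another before analysis, an empty result from changed_files suggests the local copy of the evidence is unchanged.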

C. Analysis
After the digital evidence has been produced in the preceding preservation step, it must be read for analysis. The data cannot, however, be read directly. Tools were required for this, and with them we were able to carry out the analysis process. Our method was to compare the results of the previous analysis [10], which used an older version of Telegram, with those for the newest version available today. Fig. 1 shows the activities to be performed:
1) The initialization of the app: installation, signup, verification, launch.
2) CRUD operations on the contact: add, update, delete.
4) Furthermore, the uninstallation of the app.
5) Finally, the source code analysis and a one-to-one comparison of all the above-mentioned parts with the old version of the app.

D. Presentation and Documentation
When all the previously mentioned steps have been carried out, the final stage is to present the results and conclusions of the process. The artifacts generated are compared with those of a previous version of Telegram, i.e., v3.15, which came out in 2016. The final objective is to determine what changes, if any, have been made to the application, and to verify whether any major change is needed in how the forensic analysis of the app is carried out today, which helps forensic analysts prove a committed cybercrime in a relevant court.

RESULT AND DISCUSSIONS

A. Setup and Preparations
In this paper, the scope was to determine the residual data left on an Android device by Telegram Messenger using sound forensic techniques. A thorough investigation requires user activities to be performed on the app, such as: downloading and installing the application, signing up, and launching it for the first time; creating, updating, and deleting contacts; and exchanging messages, including sending a text message, receiving a message, sending an image, and sharing a contact and a file. The tests took place on a virtual Android smartphone configured with generic settings to eliminate hardware specificity. The virtual phone was customized to match an average Android smartphone configuration: 2 GB of RAM, a 2-core processor, and 16 GB of storage, running Android 8.1 (API level 27). The Telegram application version used was 7.7.2 (2293), which was the latest release at the time of the forensic analysis. Fig 2 shows the startup of, and sign-in to, the app in the virtual environment. For the acquisition of the artifacts, we used live as well as online tools while the activities were happening in real time. We used different tools to gain access to the rooted virtual device, including a file explorer and a command-line interface for analyzing the logs of the events performed. At the end, all the data generated during the activities were compiled for presentation, including the files, the files' contents, and their locations.

B. Investigation
1) File Structure
During the extraction of the data, we found different files and folders relating to the app directory in multiple locations, as shown in Fig 3. Finally, we have the location data/media/0/Telegram, where the app stores its files and media. These are the files downloaded from the message exchanges; they are stored either on the local storage or on an external storage, if present and selected as default. In the case of external storage, the path is <sd_card_name>/Telegram, where sd_card_name is the name of the external card (without the angle brackets), which depends on the model and make, as shown in Fig 4. In comparison to the unrooted device, we found that these artifacts are sometimes stored temporarily in the cache folder.
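Once a directory such as data/media/0/Telegram has been copied off the device, an inventory of its contents makes the file structure easier to document. The sketch below assumes the directory has already been pulled to a local folder; the record fields and the function name are illustrative choices, not part of the tooling used in this work.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import List, NamedTuple


class FileRecord(NamedTuple):
    relative_path: str   # path relative to the inventoried root
    size_bytes: int      # file size as reported by the local file system
    modified_utc: str    # last-modified time of the local copy, ISO 8601 in UTC


def inventory(root: Path) -> List[FileRecord]:
    """List every regular file below root with its size and modification time."""
    records = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            info = path.stat()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            records.append(
                FileRecord(
                    path.relative_to(root).as_posix(),
                    info.st_size,
                    modified.isoformat(),
                )
            )
    return records
```

Note that the modification times describe the local copy; whether they match the times on the device depends on how the files were transferred.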

2) Initialization
The initialization included the downloading and installation of the application. After the installation, the app was launched for the first time for the signup. The signup included OTP verification. We used ADB logcat for the online forensics and Genymotion's own virtual device log generation for the offline forensics.
Fig. 5 shows the entry in the users table of the cache4.db file for the user who just signed into the application. This provides evidence that the activity performed can be proved through this method. We signed up with a mobile number, which can clearly be seen unencrypted in the blob in the right section of the screen capture. The unique id of the user can also be seen, which is vital for further investigation. The logs were obtained through the adb logcat command, with further filtering by the type of log we wanted in the analysis, for example adb -e logcat org.telegram.messenger.org for filtering by the desired application processes and adb -e logcat *:D for displaying log messages at debug priority and above in the virtual mobile. All the logs for these activities are in Table 1. These logs can be useful as digital evidence in court later. Regarding the comparison with the older version of the app, the file locations and structure of the Telegram files and folders, including their names, remain the same.
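Saved logcat output can also be filtered offline by process, priority, and time window. The sketch below assumes the log was saved in logcat's threadtime layout (date, time, PID, TID, priority letter, tag, message); the sample lines in the usage note are synthetic and are not taken from Telegram.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# One line in the threadtime layout: "MM-DD HH:MM:SS.mmm  PID  TID P Tag: message"
LINE = re.compile(
    r"^(?P<stamp>\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<pid>\d+)\s+(?P<tid>\d+)\s+(?P<level>[VDIWEF])\s+"
    r"(?P<tag>.*?)\s*: (?P<message>.*)$"
)
LEVELS = "VDIWEF"  # verbose < debug < info < warning < error < fatal


class LogEntry(NamedTuple):
    stamp: str
    pid: int
    level: str
    tag: str
    message: str


def parse(lines: Iterable[str]) -> List[LogEntry]:
    """Parse threadtime lines, skipping anything that does not match."""
    entries = []
    for line in lines:
        match = LINE.match(line.rstrip("\n"))
        if match:
            entries.append(
                LogEntry(
                    match["stamp"],
                    int(match["pid"]),
                    match["level"],
                    match["tag"],
                    match["message"],
                )
            )
    return entries


def select(
    entries: Iterable[LogEntry],
    pid: Optional[int] = None,
    min_level: str = "V",
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> List[LogEntry]:
    """Keep entries for one PID, at or above min_level, inside [start, end].

    start and end use the same "MM-DD HH:MM:SS.mmm" form as the log, so plain
    string comparison orders them correctly within a single year.
    """
    floor = LEVELS.index(min_level)
    return [
        e
        for e in entries
        if (pid is None or e.pid == pid)
        and LEVELS.index(e.level) >= floor
        and (start is None or e.stamp >= start)
        and (end is None or e.stamp <= end)
    ]
```

For example, calling select(parse(lines), pid=1702, min_level="D", start="03-17 16:13:00.000", end="03-17 16:13:59.999") on a saved log would keep only the entries of the hypothetical process 1702 within that minute.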

3) Contacts
As with the previous user activity, the same method was used to obtain this data. The online log captured the relevant data while the contact-related activities were performed: the user first added a contact, then edited it, and deleted it afterwards. This could be used to strengthen a case in which a suspect might have performed these activities.
The same location in cache4.db where the signed-in user is stored is also where all the users and contacts can be seen. Each user has an id followed by a name. In Telegram, a name is a display name (or nickname) visible in the header of a message thread in the app, and multiple users can have the same display name, whereas a username is a unique handle distinguishing each user from the others. Both the display name and the unique handle are stored in the users table, separated by three semicolons, as shown in Fig 6. This is a crucial artifact that shows which user holds which name and the corresponding handle, which clearly distinguishes one user from another. In the case of deletion of a contact, the table user_contact_v7, with the fields userID, forename, and surname, is correlated with the table user_phones_v7, with the fields userID, phone, sphone, and deleted, which shows which once-saved contacts were removed. Table 2 shows the log acquired through online forensics during the above-mentioned user activities. Regarding the comparison with the older version, there are minor changes within the cache4.db database file, i.e., the table names user_phone_v3 and user_contact_v3 have changed to user_phones_v7 and user_contact_v7 respectively; the suffix seems to change along with the app's version number.
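The correlation between the two contact tables, and the splitting of the stored name on the three-semicolon separator, can be sketched as follows. The table and field names are those reported above; the column types, the assumption that deleted = 1 marks a removed contact, and all rows in the demonstration database are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def split_name(stored: str) -> Tuple[str, str]:
    """Split a stored name into (display name, handle) on the ';;;' separator."""
    display_name, _, handle = stored.partition(";;;")
    return display_name, handle


# Assumption: a value of 1 in the deleted field marks a removed contact.
DELETED_CONTACTS_SQL = """
SELECT c.userID, c.forename, c.surname, p.phone
FROM user_contact_v7 AS c
JOIN user_phones_v7 AS p ON p.userID = c.userID
WHERE p.deleted = 1
ORDER BY c.userID
"""


def deleted_contacts(conn: sqlite3.Connection) -> List[tuple]:
    """Return contacts whose phone entry is flagged as deleted."""
    return conn.execute(DELETED_CONTACTS_SQL).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for cache4.db with synthetic rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user_contact_v7 (userID INTEGER, forename TEXT, surname TEXT);
        CREATE TABLE user_phones_v7 (userID INTEGER, phone TEXT, sphone TEXT, deleted INTEGER);
        INSERT INTO user_contact_v7 VALUES (1, 'Alice', 'Example');
        INSERT INTO user_contact_v7 VALUES (2, 'Bob', 'Sample');
        INSERT INTO user_phones_v7 VALUES (1, '+10000000001', '0000001', 0);
        INSERT INTO user_phones_v7 VALUES (2, '+10000000002', '0000002', 1);
        """
    )
    return conn
```

When working on an acquired copy of the database, opening it read-only, for instance with sqlite3.connect("file:cache4.db?mode=ro", uri=True), avoids modifying the evidence.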

4) Chat Messages
In this scenario, several use cases were performed: first, a simple text message was sent to one of the contacts; secondly, an image was shared; after that, a media file was sent, followed by sharing a saved contact; finally, a location was shared. All the logs were accessed the same way as in the previous cases, through ADB and Genymotion's own log generation. Because the logs were massive in number, filtering them by timestamp was necessary. Fig 7 shows the messages table in the cache4.db file. Entry number 17 shows the uid 1188353915, identifying the user who sent the message, and the blob displaying the message itself unencrypted. This simple text is useful to the investigator for finding out the actual content of the message and other related information, such as the timestamp and whether the message had been sent or read by the time of acquisition. Although any plain text message is visible in the SQLite database browser, a media message is not readable. As described in [18], the message's media file name is visible in Telegram v3.4.2; this is not, however, the case in our analysis, as shown in Fig. 8. Note that any caption relating to a media message can be seen in the section above, and in some cases the snippet included in a URL is also saved in the database in the media_v2 table. The logs were captured using online forensics during the message exchanges, as can be seen in Table 3. Telegram uses TDS (Telegram Data Structure) to encode the information it stores. Most of the user information the application stores is in the shared preferences, the Android OS mechanism for saving application state after the application has been terminated, e.g., appLocked, loginTime, lastMyLocationShareTime, sharingMyLocationUntil, passcodeRetryInMs, etc. As discussed in the File Structure section, the first part of the investigation process, the shared preferences are stored in shared_prefs>userconfig.xml.
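Android stores shared preferences as an XML file whose root map element holds typed child elements (string, boolean, int, long, float, set). A minimal sketch of reading such a file into a dictionary is given below; the key names come from the list above, while their types and all values in the sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def parse_shared_prefs(xml_text: str) -> Dict[str, Any]:
    """Convert the text of a shared preferences XML file into a dictionary."""
    prefs: Dict[str, Any] = {}
    for node in ET.fromstring(xml_text):
        name = node.get("name")
        if node.tag == "string":
            prefs[name] = node.text or ""
        elif node.tag == "boolean":
            prefs[name] = node.get("value") == "true"
        elif node.tag in ("int", "long"):
            prefs[name] = int(node.get("value"))
        elif node.tag == "float":
            prefs[name] = float(node.get("value"))
        elif node.tag == "set":
            prefs[name] = {child.text or "" for child in node}
    return prefs


# Synthetic sample: the types and values below are placeholders, not acquired data.
SAMPLE = """
<map>
    <boolean name="appLocked" value="false" />
    <int name="loginTime" value="1600000000" />
    <long name="passcodeRetryInMs" value="0" />
    <string name="user">SERIALIZED_USER_PLACEHOLDER</string>
</map>
"""

prefs = parse_shared_prefs(SAMPLE)
print(sorted(prefs))
```

For a file pulled from the device, the same function can be applied to the file's text, or the loop can be run over ET.parse(path).getroot() instead.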
One of the most important pieces of information the app stores is the user information in a serialized form. As shown in Fig. 9, one of the entries is of the string data type and holds the user value for the user to whom the account belongs. Telegram allows more than one account to be added, in which case separate user configuration files are generated with the names userconfig1, userconfig2, and so on. These serialized strings can then be deserialized using the TDS extracted from the source code. The deserialization was not performed in our process, as it was not part of our proposed methodology. However, [19] provides the schema for the data structures and all the other variables the application uses inside the code. The source code provided in [20] is useful for correlating the Java classes with the TDS, which can then be used for the deserialization process. Regarding the comparison of the old and newer versions of the application, we found some changes to the TDS; however, in the case of the user's details, the parameters used to deserialize the string (id, first name, last name, phone, photo, status, username, as shown in Fig. 10) are unchanged according to [https://core.telegram.org/schema]. As for the Java classes, the source only provides the code up to v5, so we cannot compare it accurately. However, comparing the code analyzed in [10] with v3, the classes do seem to retain their names.
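Although deserialization was outside the scope of this work, the primitives it relies on are simple. The sketch below assumes the standard TL binary layout: 32-bit little-endian integers, and byte strings prefixed by a one-byte length (or by the marker 254 and a three-byte length for long strings) and padded to a multiple of four bytes. The constructor value and field order in the test data are hypothetical; the real layout of a user object must be taken from the schema in [19].

```python
import struct


class TLReader:
    """Sequential reader for TL-serialized primitives (illustrative sketch)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def read_int32(self) -> int:
        """Read a signed 32-bit little-endian integer."""
        (value,) = struct.unpack_from("<i", self.data, self.pos)
        self.pos += 4
        return value

    def read_bytes(self) -> bytes:
        """Read a length-prefixed byte string and skip its padding."""
        first = self.data[self.pos]
        if first < 254:
            length, header = first, 1
        else:
            length = int.from_bytes(self.data[self.pos + 1:self.pos + 4], "little")
            header = 4
        start = self.pos + header
        payload = self.data[start:start + length]
        consumed = header + length
        self.pos += consumed + (-consumed % 4)  # align to a 4-byte boundary
        return payload

    def read_string(self) -> str:
        """Read a byte string and decode it as UTF-8."""
        return self.read_bytes().decode("utf-8")


def tl_bytes(payload: bytes) -> bytes:
    """Encode a byte string in the same layout (used here to build test data)."""
    if len(payload) < 254:
        header = bytes([len(payload)])
    else:
        header = b"\xfe" + len(payload).to_bytes(3, "little")
    body = header + payload
    return body + b"\x00" * (-len(body) % 4)
```

A reader for a complete object would call these methods in the order given by the schema entry for that object's constructor.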

Fig 10. Name, their Data Type and Description for a User TDS
The analysis provided is a guide that can be used by investigators and forensic analysts and that could improve the knowledge of cyberlaw professionals in finding artifacts of the Telegram Messenger application. We obtained important data by performing pre-designed user activities on the app, which gave us insight into how the application behaves when certain actions are performed. Table 4 shows the precise comparisons with the work conducted on Telegram v3.15. The gathered data was then compared with the results of the forensic analysis of the old version of the application. We compared the locations of the app's artifacts, performed actions related to the addition, modification, and deletion of contacts, and exchanged different types of messages. Upon analysis, we found that some of the database's table names were different: user_phone_v3 and user_contact_v3 have changed to user_phones_v7 and user_contact_v7 respectively; the suffix seems to change along with the app's version number.
Furthermore, in the case of message exchanges involving media, the location where the media was stored on the phone was unreadable. In the case of the source code analysis, we were able to compare the v3 and v5 code and did not find any major changes, such as different Java class names or different locations in the Java package.
We present a brief guide to our findings at the end of our investigation. Table 5 shows two columns: the activities we performed and the locations affected by those activities. In short, the database file was updated by the activities involving signing in, contacts, and plain messages, whereas the location where all the media and files are stored was affected by the activities involving message exchanges with images, location, or contact sharing. This guide can give investigators insight when looking for specific evidence in a potential cybercrime case.

CONCLUSIONS
Based on the results obtained from the five major scenarios we conducted on Telegram Messenger v7.7.2 on Android OS 8.1, this paper lays out the details of the artifacts and, in addition to those, it