Identifying Game Processes Based on Private Working Sets

Fueled by the boom in online games, there is an increasing demand for monitoring online games in various settings. One application scenario is the monitoring of computer games in school computer labs, for which an intelligent game recognition method is required. In this paper, a method is introduced to identify game processes according to their private working sets (i.e., the amount of memory occupied by a process that cannot be shared with other processes). Results of the W test showed that the memory sizes occupied by legitimate processes (e.g., the processes of common native Windows applications) and by game processes followed normal distributions. Using the T-test, a significant difference was identified between legitimate processes and C/S-based computer games in terms of the means and variances of their private working sets. Subsequently, we derived the probability density functions of the private working sets of the considered game processes and of the legitimate processes. Given the private working set of a process and the derived probability density functions, the probability that the process is a legitimate process and the probability that it is a game process can be determined. Comparing the two probabilities then determines whether the process is a game process. In our tests, the recognition accuracy of this method for C/S-based computer games was approximately 90%.


Introduction
Online games are becoming increasingly predominant in our leisure activities. Many people, especially those with poor self-control, suffer from addiction to online games. Accordingly, the possibility of monitoring online games has attracted increasing attention. Although computer labs in schools are primarily used for learning, some students with weak self-control often play online games there. Intelligent game monitoring software would help prevent students from playing games in the computer labs of their schools.
The existing methods of game recognition are complex and exhibit relatively poor performance. In this paper, an effective method for identifying game processes by comparing memory footprints is proposed. Based on the results of the W test, the private working sets of legitimate processes and those of game processes were found to comply with normal distributions, while a significant difference was identified between the two. Subsequently, according to the sample data considered in this study, the probability density functions of the private working sets of game processes and those of legitimate processes were derived. Given the private working set of a process and the derived probability density functions, the probability $P_1$ that the process is a legitimate process and the probability $P_2$ that the process is a game process can be determined. If $P_1 > P_2$, the process is predicted to be a legitimate process; otherwise, it is predicted to be a game process. The proposed method is simple to apply and yields high recognition accuracy on client/server-based (C/S-based) computer games.
The rest of this paper is organized as follows. In Section 2, related work is discussed. In Section 3, the proposed method of game process recognition is presented. In Section 4, the accuracy of the proposed method is evaluated. Finally, Section 5 concludes the study.

Related work
Numerous researchers have investigated the identification of game and illegitimate processes. The main methods of game or illegitimate process identification are as follows:
A. Method of blacklisting
First, a blacklist of illegitimate processes is created, following which the process information of the local machine is scanned. Subsequently, the records of the blacklist are compared with the process information of the local machine. If a record and a process are found to be identical, the corresponding process is determined to be illegitimate. This method is simple, yet time-consuming, as all the blacklist information has to be kept up to date [Gao and Guan (2007); Li and Li (2006); Yeming, Ori and Claudia (2017)].
B. Method using keystroke features of users
The way players press keys and click the mouse differs from the way users of non-game software do [Zhang (2012); Li, Zhang, Yue et al. (2014); Shanmugapriya and Padmavathi (2011); Balagani, Phoha, Ray et al. (2011)]. For instance, a player frequently uses a specific set of keyboard keys, whereas a user who does not play games uses these keys less frequently. However, in many cases, the keyboard and mouse behavior of players differs only slightly from that of professional software users. For instance, professionals who use Photoshop to edit images often use certain keyboard shortcuts intensively and at high frequency.
C. Method of characteristic code detection
This method has been extensively applied in anti-virus software, and several experts and scholars have researched it [Zhong, Li, Tang et al. (2010); Zou, Zhang, Zhang et al. (2014)]. Based on a collection of virus or malware samples, a signature can be extracted from the malicious code. This hex code acts as an identifier of the virus or malware, and a signature database is built from such identifiers. A virus or malware scanner adopts a pattern matching algorithm (e.g., the Brute Force (BF) or the Knuth-Morris-Pratt (KMP) algorithm) for signature matching [Wu, Fan, Wang et al. (2012)]. The disadvantage is that this method is costly, as the signature database needs to be updated regularly.
D. Other methods
Some researchers have approached the problem using image processing, machine vision, and deep learning techniques [Luo, Qin, Xiang et al. (2019) (2019)]. However, usable outcomes have rarely been achieved in this field. Many of these techniques are fueled by improvements in computing power. Because they are closer to people's intuitive perception of game interfaces, they are also worth studying and are a direction for our future research.

Data collection of legitimate processes
In this paper, legitimate applications refer to those applications that are allowed to be used in computer laboratories. The legitimate sample software considered in this study is listed in Tab. 1. The sample software was launched on a target computer, and its process information was scanned by a process scanner implemented by our team and then saved to a database. The process information primarily comprises process names, average memory (private working set) sizes occupied by the processes, etc. For a better understanding, the term "private working set" is explained as follows. A working set refers to the physical memory occupied by a program (including the memory shared with other programs), and a private working set refers to the physical memory used exclusively by that particular program. In this paper, the terms "private working set" and "memory" are used interchangeably.
To be specific, the process information of the sample applications was read and saved every 3 s using a monitoring tool (e.g., a process scanner module). The average of the top 40 values of the memory occupancy of each process was then taken, and this average is used to represent the memory size occupied by the process. The memory footprints of the sampled processes were found to range primarily between 40,000 KB and 250,000 KB, as shown in Fig. 1. The information of other legitimate processes was acquired in the same way. Most other legitimate processes were found to individually take up less than 40,000 KB of memory, as shown in Fig. 2.
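The averaging step described above can be sketched as follows. This is a minimal illustration, assuming that "top 40 values" means the 40 largest sampled readings; the function name `representative_memory_kb` and the sample readings are hypothetical and do not come from the study.

```python
from statistics import mean


def representative_memory_kb(samples_kb, top_n=40):
    """Average the top_n largest private-working-set readings (in KB).

    Assumption: "top 40 values" is read as the 40 largest samples.
    If fewer than top_n samples exist, all of them are averaged.
    """
    if not samples_kb:
        raise ValueError("at least one sample is required")
    largest = sorted(samples_kb, reverse=True)[:top_n]
    return mean(largest)


# Hypothetical readings taken every 3 s for a single process.
readings = [52_000, 61_500, 58_200, 60_100, 59_900]
print(representative_memory_kb(readings, top_n=3))
```

Averaging only the largest readings makes the representative value less sensitive to the low-memory start-up phase of a process, which may be why the study does not average all samples.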

Game process data collection
Fifty popular online games were randomly selected as samples (as listed in Tab. 2). Following the same approach as for the legitimate processes, a monitoring tool was used to read and store the process information of the samples. The memory sizes occupied by the game processes were found to range primarily between 300,000 KB and 1,000,000 KB, as depicted in Fig. 3.

W test
The W test, also known as the Shapiro-Wilk test, is a correlation-based test of normality. It yields a statistic W that behaves like a correlation coefficient; the closer W is to 1, the better the data fit a normal distribution. It is generally considered that the W test can be adopted to verify whether samples comply with a normal distribution when the sample size n satisfies $3 \le n \le 50$ [Zhang and Dong (2015); He and Wang (2014)]. The W statistic can be expressed as follows:

$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}$$

where $x_{(i)}$ is the i-th smallest sample value, $\bar{x}$ is the sample mean, and $a_i$ are the tabulated Shapiro-Wilk coefficients. If $W < W_\alpha$, the normality hypothesis is rejected at the significance level α; if $W > W_\alpha$, the normality hypothesis is accepted. The values of the memory (i.e., the private working set) occupied by the legitimate processes were tested using SPSS. The results are presented in Tab. 3. As can be seen, the value of Sig is 0.106, which is greater than 0.05. Thus, the values of the memory sizes occupied by game processes are consistent with a normal distribution. Fig. 4 shows, in an intuitive manner, the differences between the probability distributions of the private working sets of legitimate processes and game processes.

Comparative analysis
The T-test is performed to infer whether there is a significant difference between the mean values of the two collections (the memory occupancy values of legitimate application processes and those of game processes). The T-test, also called Student's T-test, is primarily used for normally distributed data with a small sample size (e.g., n<30) and an unknown population standard deviation σ. It is divided into a single-population test and a double-population test. The double-population T-test checks whether the difference between the means of two samples, and hence of the populations they represent, is significant. It covers two cases: the independent-sample T-test and the paired-sample T-test. Here, the independent-sample T-test statistic is calculated from the means, variances, and sizes of the two samples. The value of F in Levene's test is 34.106, and the value of Sig is 0, which is less than the significance level (e.g., 0.05). Therefore, the variances of the two populations differ significantly. As can be observed from the results of the T-test, the value of Sig (2-tailed) is 0, which is lower than the significance level (e.g., 0.05); hence, the means of the two populations also differ significantly.
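As an illustration of the independent-sample comparison, the sketch below computes Welch's unequal-variance statistic, $t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$, together with its approximate degrees of freedom. Choosing Welch's form is an assumption motivated by the significant Levene's test; the text does not state which variant SPSS reported. The function name `welch_t` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test.

    Assumption: variances are not pooled, since Levene's test
    suggests the two populations have different variances.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the unbiased (n - 1) sample variance.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical memory sizes in KB, not the study's data.
legit_kb = [40_000, 90_000, 120_000, 180_000, 250_000]
game_kb = [300_000, 450_000, 600_000, 800_000, 1_000_000]
t_stat, dof = welch_t(legit_kb, game_kb)
print(t_stat, dof)
```

A strongly negative `t_stat` here would indicate that the legitimate sample mean lies well below the game sample mean; the p-value would then be read from a t distribution with `dof` degrees of freedom.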

Probability density functions
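The memory sizes used by the two types of processes comply with normal distributions; therefore, the probability density function can be expressed as follows:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\hat{\sigma}} \exp\left(-\frac{(x-\hat{\mu})^2}{2\hat{\sigma}^2}\right) \tag{3}$$

where $\hat{\mu}$ and $\hat{\sigma}^2$ can be computed using maximum likelihood estimation from the sampled memory sizes $x_1, \dots, x_n$. The equations are written as follows:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{4}$$

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \hat{\mu}\right)^2 \tag{5}$$

The probability density of the memory sizes of legitimate processes can be defined as follows:

$$f_1(x) = \frac{1}{\sqrt{2\pi}\,\hat{\sigma}_1} \exp\left(-\frac{(x-\hat{\mu}_1)^2}{2\hat{\sigma}_1^2}\right) \tag{6}$$

The probability density of the memory sizes of game processes can be defined as follows:

$$f_2(x) = \frac{1}{\sqrt{2\pi}\,\hat{\sigma}_2} \exp\left(-\frac{(x-\hat{\mu}_2)^2}{2\hat{\sigma}_2^2}\right) \tag{7}$$

where the subscripts 1 and 2 denote the estimates obtained from the legitimate-process samples and the game-process samples, respectively.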
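The maximum likelihood estimates and the resulting normal density can be sketched in a few lines. This is only an illustration under the normality assumption stated above; the function names `fit_normal_mle` and `normal_pdf` and the sample values are hypothetical.

```python
from math import exp, pi, sqrt


def fit_normal_mle(samples):
    """Maximum likelihood estimates (mu, sigma^2) of a normal distribution."""
    n = len(samples)
    mu = sum(samples) / n
    # The MLE of the variance divides by n, not by n - 1.
    sigma2 = sum((x - mu) ** 2 for x in samples) / n
    return mu, sigma2


def normal_pdf(x, mu, sigma2):
    """Normal probability density with mean mu and variance sigma2."""
    return exp(-((x - mu) ** 2) / (2 * sigma2)) / sqrt(2 * pi * sigma2)


# Hypothetical memory sizes in KB, not the study's data.
game_kb = [300_000, 450_000, 600_000, 800_000, 1_000_000]
mu_hat, sigma2_hat = fit_normal_mle(game_kb)
print(mu_hat, sigma2_hat, normal_pdf(600_000, mu_hat, sigma2_hat))
```

Running `fit_normal_mle` once on the legitimate samples and once on the game samples yields the two parameter pairs needed for the two densities.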

Game recognition
The basic idea of the game recognition algorithm is to 1) obtain the memory size of a process, 2) use the memory size to calculate the probability that the process is a legitimate process and the probability that it is a game process, and 3) predict whether the process is a game process. The probabilities are computed in three cases, a), b), and c), distinguished by the size of memory taken by the process:
a) In the first case, the probability that the process is a legitimate process is given by Eq. (8), and the probability that the process is a game process is given by Eq. (9).
b) In the second case, the probability that the process is a legitimate process is given by Eq. (10), and the probability that the process is a game process is given by Eq. (11).
c) In the third case, the probability that the process is a legitimate process is given by Eq. (12), and the probability that the process is a game process is given by Eq. (13).
Given the above analysis, the procedure for identifying computer game processes is as follows. First, the process memory information of the sample applications, the sample games, and all the other processes in the target computer is read. Subsequently, Eq. (4) is used to obtain the estimated means $\hat{\mu}_1$ and $\hat{\mu}_2$, and Eq. (5) is used to obtain the estimated variances $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$, of the legitimate processes and the game processes, respectively. Next, the probabilities that a process is a game process or a legitimate process are calculated from its memory size with the method described above. Finally, the process is classified as a legitimate process or a game process according to the calculated probabilities.
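For instance, let the memory size occupied by a process be $x_0$, and let $P_1$ and $P_2$ denote the probabilities that the process is a legitimate process and a game process, respectively.
a) In case a), we substitute $x_0$ into Eqs. (8) and (9), respectively, and then compare the two probabilities. If $P_1 > P_2$, the process is judged to be a legitimate process; otherwise, the process is considered a game process.
b) In case b), we substitute $x_0$ into Eqs. (10) and (11). If $P_1 > P_2$, the process is judged to be a legitimate process; otherwise, the process is considered a game process.
c) In case c), we substitute $x_0$ into Eqs. (12) and (13). If $P_1 > P_2$, the process is judged to be a legitimate process; otherwise, the process is considered a game process.
In Eqs. (8)-(13), each probability is calculated by converting the general normal distribution into a standard normal distribution. The result is then looked up in the standard normal distribution table.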
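The decision rule can be illustrated with a simplified sketch. It does not reproduce Eqs. (8)-(13); as a stand-in, it compares the two fitted normal densities at $x_0$ directly, which is an assumption made here for illustration. It also shows how standardizing with $z = (x_0 - \mu)/\sigma$ and evaluating the standard normal CDF can replace a table lookup. All parameter values and function names are hypothetical.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)


def standard_normal_cdf(z):
    """Phi(z), computed with erf instead of a printed table."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def classify(x0, legit_params, game_params):
    """Label a process by comparing the two densities at x0.

    Assumption: a plain density comparison stands in for the
    case-by-case probabilities of the original method.
    """
    p1 = normal_pdf(x0, *legit_params)  # legitimate process
    p2 = normal_pdf(x0, *game_params)   # game process
    return "legitimate" if p1 > p2 else "game"


# Hypothetical (mean, standard deviation) pairs in KB.
LEGIT = (100_000.0, 50_000.0)
GAME = (600_000.0, 200_000.0)

print(classify(50_000, LEGIT, GAME))
print(classify(600_000, LEGIT, GAME))
# Standardization: probability mass of the game model below 300,000 KB.
print(standard_normal_cdf((300_000 - GAME[0]) / GAME[1]))
```

In practice, the hypothetical parameters would be replaced by the estimates obtained from the sample data on the target machine.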

Performance evaluation
Twenty legitimate applications and 20 computer games were randomly selected to test the efficacy of the proposed method. The identification results for the legitimate application processes are listed in Tab. 6, and those for the game processes are listed in Tab. 7.
Definition 1: false positives rate = (number of legitimate processes misreported as game processes) / (total number of legitimate processes). According to the test results in Tab. 6, the false positives rate is 15%.
Definition 2: game process recognition rate = (number of game processes identified) / (total number of game processes). According to the test results in Tab. 7, the game process recognition rate is 95%.
Definition 3: accuracy of game process recognition = (1 - false positives rate + game process recognition rate) / 2. The accuracy of game process recognition was calculated to be 90%.
This method outperforms and is simpler to use than the methods mentioned in Section 2.
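The three definitions can be checked with a short calculation. The counts used below (3 misreported legitimate processes and 19 identified game processes) are not stated explicitly; they are inferred from the reported 15% and 95% over 20 samples each. The function names are hypothetical.

```python
def false_positive_rate(misreported_legit, total_legit):
    """Definition 1: share of legitimate processes flagged as games."""
    return misreported_legit / total_legit


def recognition_rate(identified_games, total_games):
    """Definition 2: share of game processes correctly identified."""
    return identified_games / total_games


def recognition_accuracy(fp_rate, rec_rate):
    """Definition 3: (1 - false positives rate + recognition rate) / 2."""
    return (1 - fp_rate + rec_rate) / 2


# Counts inferred from the reported percentages (assumption).
fpr = false_positive_rate(3, 20)
rec = recognition_rate(19, 20)
acc = recognition_accuracy(fpr, rec)
print(f"{fpr:.0%} {rec:.0%} {acc:.0%}")
```

Definition 3 is the mean of the per-class correct rates, so with equally sized test sets it coincides with the overall fraction of correctly labeled processes.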

Conclusion and future work
We proposed a game process recognition method based on the private working sets of computer software. The proposed method is simple and yields high recognition accuracy (approximately 90%) for computer games based on C/S architecture. However, since the process memory occupancy is tightly related to computer hardware and operating systems, the probability density function and its parameter values proposed in this paper cannot be directly applied to other computer environments. In other words, this method exhibits poor portability. Therefore, in future work, machine learning technologies will be adopted to improve the accuracy and portability of the proposed method.