A Survey of Android Malware Static Detection Technology Based on Machine Learning

With the rapid growth of Android devices and applications, the Android environment faces more security threats. Malicious applications that steal users' private information, send text messages to trigger deductions, or exploit privilege escalation to control the system cause significant harm to end users. To detect Android malware, researchers have proposed various techniques, among which the machine learning-based methods with static features of apps as input vectors have apparent advantages in code coverage, operational efficiency, and massive sample detection. In this paper, we investigated Android applications' structure, analysed various sources of static features, reviewed the machine learning methods for detecting Android malware, studied the advantages and limitations of these methods, and discussed the future directions in this field. Our work will help researchers better understand the current research state, the benefits and weaknesses of each approach, and future technology directions.


Introduction
The number of Android applications continues to grow. As of June 2020, there were more than 2.99 million apps in Google Play [1]. At the same time, the number of malicious apps is also increasing rapidly. Experts at the AV-TEST Institute counted around 1.8 million new malicious apps in the first half of 2020 [2]. These malicious apps steal personal information from end users, automatically call and send text messages to trigger deductions, and exploit root privileges to control the system, causing massive harm to end users [3]. Various detection techniques for Android malware have been proposed to cope with these security threats. Machine learning-based detection is one of the efficient approaches. This type of approach has several advantages: (i) Comprehensiveness of the detection: when detecting malware, traditional methods usually pay close attention to certain malicious functions, such as the framework proposed by Xu et al. [4] against privilege escalation and the technique offered by He et al. [5] for privacy leakage analysis. Most machine learning methods can detect various malicious behaviours at one time and classify samples into multiple families, so they can serve as a universal and comprehensive detection means. (ii) Accuracy of the detection: benefiting from the rapid development of machine learning algorithms, the accuracy of identifying malicious applications based on such methods is increasing. Detection frameworks based on SVM, Naïve Bayes, Perceptron, and deep neural network algorithms are continually being proposed. They perform better and better in malicious application identification, multiclassification, and malicious code location. (iii) Reduction of dependence on experts: traditional detection methods rely heavily on the rich experience of human experts. However, through feature extraction at many different levels, machine learning methods can identify more malicious application patterns with less expert knowledge, which is more conducive to discovering new malicious software and improving detection efficiency.
In the implementation of Android malware detection using machine learning, the two primary sources of features are static extraction and dynamic extraction [6]. Static features are extracted from the manifest, Dalvik bytecode, native code, sound, image, and other reversed APK files. Dynamic features are collected from the log records, code execution paths, variable value tracking, sensitive function calls, and other behaviours observed during application execution by running APK files in a monitored environment.
Although detection based on static features has some limitations compared with detection based on dynamic ones, such as the difficulty of combating code obfuscation, it also has distinct advantages: (i) Full code coverage: static feature extraction can cover all code and all resource files by scanning code or symbolic pseudoexecution. In contrast, dynamic feature extraction can hardly cover all code execution paths. Many applications require users to provide login credentials to use most of the features, making it difficult to exercise all the functions in dynamic execution and resulting in incomplete feature extraction. (ii) Reliable detection efficiency: static feature extraction completes the detection task in the expected time because it does not need to run the application. In contrast, dynamic feature extraction requires triggering various functions during code execution, which consumes a lot of time. While the application is running, it takes some time to simulate clicking through the interface, and the program may perform a very complex computation or enter an infinite loop. These conditions make it difficult for the detection task to be completed within the specified time frame. (iii) Unperceived by malicious code: static detection does not require the code's execution, so malicious applications cannot recognize that they are being checked. Although some malware attempts to make static analysis more challenging by inserting interference code, the added code may itself serve as an identifier that helps identify malicious applications. (iv) Easier to generate generic fingerprint identification: static malicious sample analysis is more inclined to extract features with invariance and universality. In contrast, dynamic analysis is very likely to be affected by the operating environment. Statically extracted features are suitable for fingerprinting and can be used for the rapid predetection of large-scale malicious applications.
Several surveys about Android malware detection have been published in the past few years. The authors in [7][8][9] analysed the Android security mechanism and typical malware detection methods. The authors in [10][11][12] focused on applying deep learning algorithms such as Restricted Boltzmann Machines, Convolutional Neural Network, Deep Belief Network, Recurrent Neural Network, and Deep Autoencoder to malware detection and analysed the advantages and results achieved. The authors in [13] were mainly concerned with the analysis of detection methods for Android malware variants. The authors in [14] investigated Android malware detection and protection technology based on data mining algorithms. The studies in [15,16] collected the literature research of the past few years, systematically analysed the static detection technology, and discussed datasets, features, algorithms, empirical experiments, and performance measures. The study in [17] introduced the Android architecture, security mechanism, malware classification, and entire detection process, including sample collection, data preprocessing, feature selection, machine learning model construction, and experimental evaluation. The studies in [18][19][20] comprehensively discussed static, dynamic, and hybrid detection techniques. The study in [21] mainly focused on mobile malware detection techniques and analysed signature-based detection, anomaly-based detection, and other traditional detection methods. The study in [22] discussed various threats and the current Android platform security state, introduced three attack types, explained the factors contributing to the increase of malware, and analysed defensive mechanisms of Android protection. The above surveys have done excellent work, but there are still some aspects that can be improved. For example, the sources of static features, the challenges that obfuscation technology poses to static analysis, and the deterioration issues of machine learning models are not investigated in detail.
Our work aims to provide a comprehensive survey of Android malware static detection based on machine learning technologies. To this end, we searched IEEE, ACM, Springer, Wiley, Hindawi, and other databases and used Google Scholar and DBLP to find the related papers. It is worth noting that we only use research papers indexed in DBLP since 2016 for the statistics of static feature types, machine learning algorithms, dataset usage, number of papers, and evaluation metrics. The reason is that DBLP has become an authoritative database, and the papers it indexes are of relatively high quality and relatively few in number, so we can manually check the detection technology, algorithm model, and evaluation method used in each article to form accurate statistical results. This survey is based on the papers we collected on static detection of malicious Android applications.
In our work, we analysed the Android application static features and the typical obfuscation methods, discussed machine learning algorithms suitable for Android malware detection, explained evaluation metrics of machine learning models and sustainability issues, investigated the technical route, advantages, and disadvantages of the existing research, and made an outlook on the possible future research directions in this field. The main contributions can be summarized as follows: (i) We carried out a comprehensive review of the various machine learning-based static detection methods for Android malware. The basic principles, feature sources, datasets, performance metrics, contributions, and limitations of the methods were compared vertically. (ii) We analysed the Android application composition, the source of static feature extraction, and the feature vector generation method in detail. (iii) The limitations of current methods were discussed, and future development directions were outlined.

An Android app is released in the APK package form, a zip archive mainly composed of assets, lib, res, manifest, Dalvik bytecode, and resource files [24]. These files are commonly used as static feature sources. The extraction method and representation of the feature vector differ according to each file's role. As shown in Figure 1, there are mainly the following types of features.

Permission Features.
The permissions that an app requires at runtime are declared in AndroidManifest.xml. Extracting the permissions listed in the file can help determine whether the app is malicious. The Android system provides about 250 kinds of permissions [25], so the feature vector typically takes the form of a binary vector about 250 bits long.
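As a minimal sketch of how such a binary permission vector might be built, the following Python snippet parses a decoded manifest and marks which permissions from a fixed vocabulary are requested. The vocabulary `PERMISSION_VOCAB`, the function name `permission_vector`, and the sample manifest are illustrative assumptions; the snippet also assumes the manifest has already been decoded from the binary XML stored inside an APK into plain text.

```python
import xml.etree.ElementTree as ET

# Namespace used by android:* attributes in a decoded AndroidManifest.xml.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical, heavily truncated vocabulary; a realistic one would list
# all of the roughly 250 permissions mentioned in the text.
PERMISSION_VOCAB = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.ACCESS_FINE_LOCATION",
]


def permission_vector(manifest_xml, vocab=PERMISSION_VOCAB):
    """Return a 0/1 list with one entry per permission in `vocab`."""
    root = ET.fromstring(manifest_xml)
    requested = {
        elem.get(ANDROID_NS + "name")
        for elem in root.iter("uses-permission")
    }
    return [1 if permission in requested else 0 for permission in vocab]


# Illustrative manifest, assumed to be already decoded to plain-text XML.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

print(permission_vector(SAMPLE_MANIFEST))  # [1, 1, 0, 0]
```

Fixing the vocabulary order in advance keeps vectors from different apps comparable position by position.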

Component Features.
From AndroidManifest.xml and classes.dex files, the four basic components (activities, services, broadcast receivers, and content providers) need to be registered in AndroidManifest.xml. They are declared and created by calling related system calls in the classes.dex file. The types and quantities of components used in the app are generally included in the resulting feature vectors.

Intent Features.
From AndroidManifest.xml and classes.dex files, intents are used to pass messages between components. When an intent is passed to a component, a predefined call-back function is executed to process this intent. In the formation of feature vectors, intents are often used with components because they can help analyse the association between two components.

Constant String Features.
From resource and dex files, the strings.xml resource file stores the developer-defined strings, and the dex file stores the string defined by the Smali code. Extracting the content and frequency of these strings from these files can reflect the appʼs characteristics. Considering that there are many types of strings and some of them are very long, the hash operation is generally carried out first before subsequent processing when forming feature vectors.
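The hashing step mentioned above can be illustrated with a small feature-hashing sketch. It maps arbitrary strings to a fixed number of buckets and counts how many strings fall into each. The function name `hashed_string_features`, the bucket count, and the choice of SHA-256 are assumptions made for illustration, not a method taken from the surveyed papers.

```python
import hashlib


def hashed_string_features(strings, n_buckets=16):
    """Count strings per hash bucket, giving a fixed-length feature vector."""
    vector = [0] * n_buckets
    for text in strings:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Use the first 8 bytes of the digest as an integer bucket index.
        bucket = int.from_bytes(digest[:8], "big") % n_buckets
        vector[bucket] += 1
    return vector


# Illustrative constant strings such as might be found in an app.
example_strings = ["http://example.com/update", "sms_body", "sms_body", "root"]
features = hashed_string_features(example_strings)
print(len(features), sum(features))
```

Hashing bounds the vector length regardless of how many distinct or how long the strings are, at the cost of occasional collisions between unrelated strings.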

Resource File Features.
From res and assets directories, including sounds, images, and layout files. Many repackaged or cloned malicious applications do not modify resource files, so such files can be used as static features.

Opcode and API Features.
From dex files, the frequency of Dalvik opcodes and API calls, which reflects developers' programming habits, is very suitable for generating detection features. In general, the occurrence counts of opcodes and APIs differ significantly between malware and benign apps. Feature vectors can be generated by measuring the frequency of sequences of N consecutive opcodes or API calls.
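Counting runs of N consecutive opcodes can be sketched in a few lines of Python. The function name `opcode_ngram_counts` and the short opcode sequence are illustrative assumptions. In practice the sequence would come from disassembled dex code.

```python
from collections import Counter


def opcode_ngram_counts(opcodes, n=2):
    """Count every run of n consecutive opcodes in the sequence."""
    grams = zip(*(opcodes[i:] for i in range(n)))
    return Counter(grams)


# Hypothetical opcode sequence from one disassembled method.
sequence = [
    "const/4", "invoke-virtual", "move-result",
    "invoke-virtual", "move-result", "return-void",
]
counts = opcode_ngram_counts(sequence, n=2)
print(counts[("invoke-virtual", "move-result")])  # 2
```

Dividing each count by the total number of n-grams would turn the counts into relative frequencies, which makes apps of different sizes comparable.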

Native Code Frequency Features.
From .so files, many malware samples use native instructions to perform malicious operations because the compiled code of these instructions is more difficult to decompile, which brings many obstacles to the detection work. Extracting the ARM opcode frequency and system call invocation frequency from the .so file can significantly help the detection work.

Control Flow Graph and Data Flow Graph Features.
From dex files, the control flow graph and data flow graph can be obtained by analysing the instruction invocation relationship and data flow direction in the code. Transforming these graphs into vector representations through graph embedding and other methods can distinguish malicious from benign behaviours well.

The above static features are used differently according to detection scenarios. We counted the research papers on machine learning-based static detection of Android malware in the DBLP database between Jan. 2019 and Nov. 2020. After eliminating repetitive and irrelevant articles, we obtained 118 papers. The statistics of static feature usage are shown in Figure 2. Across these papers, features were used 277 times in total, of which API features were the most frequently adopted and native opcodes the least. This situation illustrates that these features play different roles in identifying malicious apps.
In recent years, the academic community has paid great attention to the sustainability and deterioration issues faced by learning-based detection models [26,27]. A key to achieving high sustainability of a classifier lies in the underlying features being able to differentiate benign apps from malware for a long period. Zhang [28] built an API graph based on API features, using the similarity information among evolved Android malware to enhance malware classifier performance and slow down model aging. Xu [29] built and dynamically expanded a feature set based on APIs to train a variety of online learning models in a model pool, which can identify drifting samples and update aging models through weighted voting.

Android Malware Datasets with Static Features.
To facilitate the analysis and utilization of static features, some researchers provided Android malware datasets with static features.
Drebin dataset [30] contains 5560 apps from 179 different malware families in the period of August 2010 to October 2012, providing eight types of features including hardware features, requested permissions, app components, filtered intents, restricted API calls, used permissions, suspicious API calls, and network addresses.
MalGenome dataset [31] provided 1260 Android malware samples in 49 different malware families from Aug 2010 to Oct 2011 and analysed the characteristics of malicious applications during installation and activation, including some static features, such as used permissions.

Static Feature's Obfuscation Technology.
To make applications more difficult for reverse engineers to understand or for detection tools to check, malicious application developers obfuscate static features to a certain extent. The main methods are as follows.

Identifier Renaming.
The names of identifiers in code are usually meaningful, and developers generally follow similar naming rules. To make it difficult for reverse engineers to understand code logic and to reduce potential information leakage, malware developers may replace the identifiers' names with random strings [40]. This kind of renaming is more effective against human analysts than against machine analysis. Since identifier renaming cannot be applied to Android API functions, Ficco [41] bypassed this obstacle by comparing API call sequences.

String Confusion.
Encoding and encrypting the strings in the resource and code files and decrypting and reading them at runtime can achieve the effect of bypassing the detection [42]. In this case, the encryption function and encoding function should be included in the investigation scope and examined carefully. For binary programs, previous works [43,44] had proposed various approaches to identify cryptographic functions in a program, such as AES, DES, and RC4. For Android apps, Suarez-Tangil et al. [45] dealt with this kind of obfuscation by strengthening the checking of cryptographic system API calls.

Call Indirection.
The malicious developer modifies the original method's invocation entry, inserting a new randomly named method before invoking the original. Calling the original method through the newly inserted method adds many unrelated nodes to the control flow graph, rendering some detection methods based on such graphs ineffective. Garcia [46] constructed a detection framework that used sensitive API usage, data flows between APIs, intent action, and package API usage features to detect malicious apps that use various obfuscation techniques, including call indirections.

Junk Code Insertion.
Malicious developers often add junk instructions to the code files, such as NOP, jump, and register operation instructions, to increase the codeʼs complexity and eliminate the original static characteristics. Some detection methods based on opcode statistics [47,48] will be disturbed by this obfuscation technology. In comparison, the method based on the source-sink data flow [49] will normally work because the influence of the inserted junk instructions on the data flow is limited.

Dynamic Code Loading.
The Android app can load native code and additional Dalvik bytecode from local resource files, other apps, or remote networks. Malicious developers often load dynamic code through the Ldalvik/system/DexClassLoader class or use Ljava/lang/reflect/Method to make reflection calls, making it challenging to locate malicious code. The typical detection method is to check the app's sensitive functions to judge whether it is malicious. Poeplau [50] constructed the super control flow graph (sCFG) to check sensitive function calls and parameter passing to judge whether dynamic calls are vulnerable or malicious.
For the above obfuscation methods, various static detection techniques have different countermeasures. Generally speaking, it is difficult for malicious developers to obfuscate all static features. Static detection typically improves the detection accuracy by using multiple features comprehensively [45,46,51,52].

Machine Learning Algorithms and Related Performance Metrics
Using machine learning algorithms for Android malware classification requires considering both algorithm efficiency and accuracy. Generally speaking, shallow machine learning algorithms are used to construct simple classification models, with the advantages of simple implementation and fast running, but their precision is relatively low. Complex machine learning models achieve high detection accuracy, but their efficiency is less desirable. Many methods strike a compromise between the two. Logistic regression, Naïve Bayes, Support Vector Machine, k-Nearest Neighbour, Decision Tree, and Random Forest are shallow machine learning algorithms suitable for detecting Android malware. (1) Logistic regression [53] is a generalized linear regression analysis model for estimating the probability of a particular event. The purpose of logistic regression is to find a best-fit model that describes the relationship between the dependent variable and a set of independent variables. (2) Naïve Bayes [54] is based on Bayes' theorem and assumes that the features are conditionally independent given the class. The Bayesian network [55], also known as the belief network, is suitable for expressing and analysing uncertain and probabilistic events and can make inferences from incomplete, inaccurate, or uncertain knowledge. (3) The Support Vector Machine (SVM) [56] is a generalized linear classifier that classifies data in a supervised learning manner. Its decision boundary is the maximum-margin hyperplane solved from the learning samples. The SVM can perform nonlinear classification through the kernel method and is one of the common kernel learning methods. (4) The k-Nearest Neighbour (kNN) [57] algorithm finds the k nearest neighbour samples of a given sample and assigns the sample to the category to which most of those neighbours belong. (5) The Decision Tree [58] is a nonparametric supervised learning method that can summarize decision rules from a series of data with features and labels and use a tree structure to present these rules to solve classification and regression problems. (6) The Random Forest [59] is a classifier that contains multiple decision trees. Its output category is determined by the majority of the output categories of the individual decision trees.
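To make one of the shallow algorithms above concrete, here is a minimal kNN sketch over binary feature vectors (for example, permission bits). The Hamming distance, the function names, and the tiny training set are illustrative assumptions. A practical system would use a tuned library implementation and far more data.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3, distance=hamming):
    """Label `query` by majority vote among its k nearest training samples."""
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled feature vectors.
TRAIN = [
    ([1, 1, 0, 0], "malware"),
    ([1, 1, 1, 0], "malware"),
    ([0, 0, 0, 1], "benign"),
    ([0, 0, 1, 1], "benign"),
    ([1, 0, 0, 0], "benign"),
]

print(knn_predict(TRAIN, [1, 1, 0, 1], k=3))  # malware
```

The `distance` parameter is left pluggable because the choice of distance measure can noticeably change kNN results on this kind of data.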
Deep neural network models suitable for detecting Android malware mainly include Deep Belief Network, Convolutional Neural Network, Recurrent Neural Network, Generative Adversarial Network, Multimodal Machine Learning, Multiple Kernel Learning, Graph embedding, and Representation learning. (1) The Deep Belief Network (DBN) [60] is a probabilistic generative model that establishes a joint distribution between observation data and labels. It is composed of multiple restricted Boltzmann machines. Layer-by-layer training is used to solve the problem that traditional neural network training methods are not suitable for multilayer networks. (2) The Convolutional Neural Network (CNN) [61] is a feedforward neural network with convolutional computation and a deep structure. It has the ability of representation learning and can perform shift-invariant classification of input information according to its hierarchical structure. (3) The Recurrent Neural Network (RNN) [62] is a neural network that takes sequence data as input and recurses along the sequence's direction of evolution. It is Turing-complete and has advantages when learning nonlinear features of sequences. (4) The Generative Adversarial Network (GAN) [63] is one of the most promising methods for unsupervised learning on complex distributions. The model produces reasonably good output through adversarial game learning between the framework's generative and discriminative models. (5) Multimodal Machine Learning (MMML) [64] aims to process information from multiple modalities. It can perform various tasks, such as representation learning, collaborative learning, and modal conversion. (6) Multiple Kernel Learning [65] uses multiple kernel functions to map and combine various features so that the data can be more reasonably expressed in the new combined space. (7) Graph embedding [66] maps high-dimensional sparse graph data into low-dimensional dense vectors, which solves the problem that graph data is difficult to feed into machine learning algorithms efficiently. (8) Representation learning [67] is a technique for learning feature representations that transforms raw data into a form that machine learning can effectively exploit. It avoids the hassle of manually extracting features, allowing computers to learn how to extract features while also learning how to use them.
In the DBLP database, classifier algorithms are used 225 times in the 118 papers on static machine learning-based Android malware detection from Jan. 2019 to Nov. 2020. Figure 3 shows these algorithms' distribution. Shallow learning algorithms are used more often in the research. The SVM algorithm ranks first, being used 36 times. Among the deep neural network models, the CNN algorithm ranks first, being used 16 times.

To evaluate the performance of a detection model, the commonly used metrics include Accuracy, Precision, Recall, and F1-score, defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP},$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$

where TP is the number of true positive samples, FN is the number of false negative samples, FP is the number of false positive samples, and TN is the number of true negative samples. Another commonly used evaluation metric is AUC (Area Under Curve), which is defined as the area enclosed by the receiver operating characteristic curve and the coordinate axis. The higher the AUC value, the better the model performs.

ML-based Android malware detection models are prone to deteriorate because of the rapid emergence of new malicious apps. Researchers have introduced a series of metrics to evaluate the model's sustainability, including AUT (Area Under Time) [68], Stability [69], Algorithm Credibility [70], and Algorithm Confidence [70].
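As a brief illustration of how the confusion-matrix counts TP, FP, TN, and FN translate into metrics, the sketch below computes Accuracy, Precision, Recall, F1-score, and the false positive rate using their standard textbook definitions. The function name `classification_metrics` and the example counts are assumptions made for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common detection metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called TPR
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fpr,
    }


# Hypothetical counts: 110 malware samples and 890 benign samples.
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With imbalanced test sets such as this one, Accuracy can look high even when Recall is modest, which is why the surveyed papers usually report several metrics together.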
AUT is a metric proposed by Pendlebury [68], which defines the area under the performance curve over time to represent the model's sustainability, as shown in the following equation:

$$\mathrm{AUT}(f, N) = \frac{1}{N - 1} \sum_{k=1}^{N-1} \frac{f(k+1) + f(k)}{2},$$

where f is the performance metric (e.g., F1-score, Precision, Recall, etc.), N is the number of test slots, and f(k) is the performance metric evaluated at the time k. The closer the AUT is to 1, the more robust the classifier is to time decay; a perfect classifier has an AUT of 1.

The Stability metric proposed by Cai [69] indicates how stable a classifier is without retraining or any other model updates and is measured by a tuple ⟨e_s, n⟩, where e_s is the classification accuracy the classifier achieves in an average case when trained on apps of year x and tested on apps of year x + n, n ≥ 1.
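The AUT idea can be sketched numerically as a trapezoidal area over equally spaced test slots, normalized by N − 1 so that a constant score of 1 yields 1. This normalization and the function name `aut` are assumptions consistent with the description above rather than code from [68].

```python
def aut(scores):
    """Area under a performance-over-time curve, normalized to [0, 1].

    `scores` holds the metric f(k) for each of the N test slots, in time
    order. The area is approximated with the trapezoidal rule.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("at least two test slots are required")
    area = sum((scores[k] + scores[k + 1]) / 2 for k in range(n - 1))
    return area / (n - 1)


# Hypothetical F1-scores measured in three consecutive test periods.
print(aut([1.0, 1.0, 1.0]))  # 1.0
print(round(aut([0.9, 0.7, 0.5]), 4))  # 0.7
```

A model whose score decays quickly over the test slots ends up with a visibly lower AUT than one that starts at the same score but holds it.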
Jordaney [70] proposed the Algorithm Credibility and Algorithm Confidence metrics to assess the decisions of classifiers and identify aging classification models before the performance starts to degrade. They first defined a p value $p^{C}_{z^*}$ for an object $z^*$ in a set of objects K, which is the proportion of objects in K that are at least as dissimilar to the other objects in a set C as $z^*$. The p value is defined as the following equation:

$$\alpha_{z^*} = A_D(C, z^*), \qquad \alpha_i = A_D(C \setminus z_i, z_i) \;\; \forall i \in K, \qquad p^{C}_{z^*} = \frac{\left|\{\, j : \alpha_j \ge \alpha_{z^*} \,\}\right|}{|K|},$$

where $A_D(C \setminus z, z)$ tells how different an object z is from a set C. Based on the p value, they then defined Algorithm Credibility $A_{Cred}(z^*)$ and Algorithm Confidence $A_{Conf}(z^*)$ as the following equation:

$$A_{Cred}(z^*) = p^{C}_{z^*}, \qquad A_{Conf}(z^*) = 1 - \max\bigl(P(z^*) \setminus A_{Cred}(z^*)\bigr),$$

where $A_{Cred}(z^*)$ is the p value for the test object $z^*$ corresponding to the label chosen by the algorithm under analysis, $P(z^*)$ is the set of p values of $z^*$ over all candidate labels, and $A_{Conf}(z^*)$ is 1.0 minus the maximum p value among all p values except the one corresponding to the chosen label. Through these two metrics, it is possible to understand whether the choices made by an algorithm are supported with statistical evidence, making it easier to discover concept drift and model aging issues.
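The following sketch shows one way the p value, Algorithm Credibility, and Algorithm Confidence described above could be computed once nonconformity scores are available. It assumes the scores (how dissimilar each object is from a class) have already been produced by some measure. The function names and the numeric scores are hypothetical and are not taken from [70].

```python
def p_value(alpha_test, class_alphas):
    """Fraction of class objects at least as nonconforming as the test object."""
    return sum(a >= alpha_test for a in class_alphas) / len(class_alphas)


def credibility_and_confidence(p_values, predicted_label):
    """Return (credibility, confidence) for the label the classifier chose.

    Credibility is the p value of the chosen label. Confidence is 1 minus
    the largest p value among all the other labels.
    """
    credibility = p_values[predicted_label]
    others = [p for label, p in p_values.items() if label != predicted_label]
    confidence = 1.0 - max(others) if others else 1.0
    return credibility, confidence


# Hypothetical nonconformity scores of known objects in each class.
class_scores = {
    "malware": [0.2, 0.4, 0.6, 0.8],
    "benign": [0.1, 0.3, 0.5, 0.9],
}
# Hypothetical nonconformity of one test object with respect to each class.
test_scores = {"malware": 0.5, "benign": 0.95}

p_values = {
    label: p_value(test_scores[label], class_scores[label])
    for label in class_scores
}
cred, conf = credibility_and_confidence(p_values, "malware")
print(cred, conf)  # 0.5 1.0
```

A low credibility, or a low confidence caused by a competing label with a high p value, would flag the decision as weakly supported and therefore as a candidate sign of drift.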

Literature Review
It is worth noting that many papers use more than one machine learning algorithm. As the basis for classification, we examined in detail which algorithm the researchers worked on, discussed, or improved, or which algorithm achieved the best results in the authors' experiments.

Logistic Regression.
Tiwari and Shukla [71] proposed a method to detect Android malware using permissions and APIs. The authors divided the detection of malicious applications into four steps: reverse engineering, feature extraction, feature vector generation, and classification. They reversed the APK file with reverse engineering tools and obtained AndroidManifest.xml and Smali files. Permissions from AndroidManifest.xml and APIs from Smali files were extracted to generate combined feature vectors. They achieved 96.56% accuracy for combined features using the logistic regression algorithm.
Milosevic et al. [72] presented two machine learning-aided approaches for static analysis of Android malware. The first approach is based on permission features; it achieves a Precision of 0.823, Recall of 0.822, and F-score of 0.821 using the logistic regression model as a classifier. The other approach extracts features from code files. Android apps are first reversed into multiple Java files; then, a natural language processing method is used to generate feature vectors through the bag-of-words model. SVM with SMO, logistic regression, simple logistic regression, and AdaBoostM1 with SVM algorithms are integrated into the framework, and they achieved Precision, Recall, and F-score values of 0.958, 0.957, and 0.956, respectively.

Naïve Bayes and Bayesian Network.
Yerima [73] proposed an effective Bayesian classification method to deal with Android malware. The authors developed three tools: API call detectors, command detectors, and permission detectors. They extracted features from API calls, resources, assets, libraries, and permissions, respectively. Through experimental data analysis, the top-n attributes with the most discriminative ability are selected to form effective features. Finally, a Bayesian classifier is trained to make decisions. Their experimental dataset contains 1000 malware samples and 1000 benign apps. Using 20 attributes as classification features, the method reaches an Accuracy, Precision, and AUC of 0.921, 0.935, and 0.97223, respectively.
Sanz [74] proposed a method for categorizing Android apps through machine learning techniques. They extracted three different feature sets: the frequency of the printable strings, the various permissions of the app itself, and the app's permissions gathered from the Android market. They used Random Forest, J48, kNN, Bayesian Networks, Naïve Bayes, and SVM as classifiers to carry out experiments on 820 samples of seven different families and concluded that Bayes TAN was the best classifier, obtaining an AUC of 0.93.

Support Vector Machine.
Zhao [75] presented a Feature Extraction and Selection Tool (FEST) based on machine learning approaches for malware detection. According to the predefined rules, the authors first implemented a feature extraction tool named AppExtractor. Then, they proposed a feature selection algorithm named FrequenSel, which selects features by finding the difference in permission and API frequencies between malware and benign apps. In experiments, the authors tested various classification algorithms and found that the SVM algorithm performed best.
Nissim et al. [76] introduced a framework named ALDROID based on the active learning method. Their framework aimed to select only new informative applications (benign and especially malicious) to reduce security experts' labelling efforts. They first extracted permissions from manifest files, counted the number of activities, services, receivers, and content providers as features, and finally used the SVM as a classification algorithm. The highest performance achieved an Accuracy of 98.8%, a TPR of 90%, and an FPR of 0.0008.
Xu et al. [77] analysed intercomponent communication (ICC)-related characteristics and proposed a method of identifying malware called ICCDetector, which could capture interactions between components or across app boundaries. The ICCDetector outputs all ICC sources and sinks from the APK file. These sources and sinks and other ICC-related features can be extracted to form feature vectors. The authors experimented with a dataset of 5264 malware and 12,026 benign apps with the SVM algorithm as a classifier. They achieved an accuracy of 97.4%, with a low FPR of 0.67%. Furthermore, they discovered 43 new malicious apps in the benign dataset using the ICCDetector tool.

k-Nearest Neighbour.
Wu [78] developed a system called DroidMat, which considers static information, including permissions, component deployments, intent messages, and API calls, for characterizing the Android app's behaviour. Firstly, DroidMat extracts the information from the manifest file and regards components as entry points for drilling down to trace API calls. Next, it applies the k-means algorithm to enhance malware modelling capability. The number of clusters is decided by the Singular Value Decomposition (SVD) method on the low-rank approximation. Finally, it uses the kNN algorithm to classify the app as benign or malicious. The model achieved 97.87% Accuracy, 87.39% Recall, 96.74% Precision, and 91.83% F1-score on the dataset from the "Contagio mobile" site.
Baldini and Geneiatakis [79] investigated the performance of simple machine learning classifiers. The authors performed an extensive comparison using various well-known distance measures over the Drebin dataset. Results show that the proper choice of distance measure can provide a significant enhancement to the classification accuracy. Specifically, Hamming and CityBlock can boost the classifiers' performance in mobile malware detection. For instance, CityBlock can improve the kNN algorithm's false positive rate by up to 33% compared to the Euclidean distance.

Decision Tree and Random Forest.
Sanz [80] developed a method that extracts several features from the manifest file to build machine learning classifiers. These feature sets are the permissions required by the app and the features under the uses-features group. They generated an input vector for all possible permissions, then used Naïve Bayes, J48, Random Forest, and other classifiers to perform experiments, and achieved the best results with a Random Forest containing 100 trees, with an AUC of 98% and an Accuracy of 94.83%.
Canfora [47] used n-opcodes as features. The authors trained two classifiers, SVM and Random Forest, to perform binary classification. The results show that 97% accuracy can be obtained on average when 2-opcodes are used. Kang [48] did similar work. They also used n-opcodes as features to test the Naïve Bayes, SVM, Partial Decision Tree, and Random Forest classification algorithms. For N = 3 and N = 4, the SVM shows the best F1-score of 98%, and Random Forest shows the best performance in terms of both training and prediction speeds.
The authors of [83] proposed DeepClassifyDroid, which takes a three-step approach (feature extraction, feature embedding, and detection) to identify malware among Android apps based on a Convolutional Neural Network (CNN). They embedded permissions, intent-filters, API calls, and constant strings from the disassembled code in a unified joint-vector space. Then, they trained a CNN model with two convolutional layers, a pooling layer, and a fully connected layer to learn these vectors. Experiments show that the approach achieves an accuracy of 97.4% with few false alarms on a dataset of 5546 malware and 5224 benign apps.
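The n-opcode features used in [47] and [48] are contiguous runs of n opcodes. A minimal sketch of how such features can be counted is shown below; the opcode sequence is a made-up example, and `opcode_ngrams` is a hypothetical helper, not code from either paper.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Count every contiguous run of n opcodes in the sequence."""
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


# Made-up opcode sequence for a single method.
OPCODES = [
    "const", "invoke-virtual", "move-result",
    "invoke-virtual", "move-result", "return",
]

bigrams = opcode_ngrams(OPCODES, 2)
```

The resulting counts (or their normalized frequencies) form one feature per distinct n-gram, so the feature space grows quickly with n.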

Deep Belief
Nix and Zhang [84] investigated the effectiveness of CNN and LSTM for classifying Android apps using system API call sequences. They encoded each API call as a one-hot vector, so that each segment is encoded as a matrix of size n × m, which serves as the input to a CNN model. They compared their CNN model with LSTM and other n-gram-based methods. Both CNN and LSTM significantly outperformed the n-gram-based methods, with CNN performing best. The experiments report 99.4% Accuracy, 100% Precision, and 98.3% Recall on a dataset of 1016 APK files.
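The one-hot encoding step can be sketched as follows, under the assumption that n is the number of calls in a segment and m is the size of the API vocabulary. The vocabulary, the segment, and the helper `one_hot_segment` are illustrative and not taken from [84].

```python
def one_hot_segment(calls, vocabulary):
    """Encode a segment of API calls as a len(calls) x len(vocabulary) 0/1 matrix."""
    index = {name: i for i, name in enumerate(vocabulary)}
    matrix = []
    for call in calls:
        row = [0] * len(vocabulary)
        if call in index:  # calls outside the vocabulary stay all-zero
            row[index[call]] = 1
        matrix.append(row)
    return matrix


# Illustrative three-call vocabulary and a four-call segment.
VOCAB = ["getDeviceId", "sendTextMessage", "openConnection"]
SEGMENT = ["getDeviceId", "openConnection", "openConnection", "sendTextMessage"]

encoded = one_hot_segment(SEGMENT, VOCAB)
```

Each row has at most one non-zero entry, so the matrix is very sparse when the vocabulary is large.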
Ganesh [85] proposed a CNN-based deep learning model that can extract the patterns of malware. They demonstrate that CNN is appropriate for malware detection by using data transformation. The APK file is parsed and decompiled using Androguard and the Smali disassembler. Then, the extracted manifest file is converted into a 12 × 12 vector of permissions, which is fed into the trained CNN model. Their solution identifies malware with 93% accuracy on a dataset of 2500 Android apps, of which 2000 were malicious and 500 were benign.
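One way to picture the 12 × 12 input is as a binary grid with one cell per permission (144 cells in total). The sketch below assumes a fixed, ordered permission vocabulary; the ordering and layout actually used in [85] are not given here, so `vocabulary`, the placeholder permission names, and `permission_grid` are all hypothetical.

```python
def permission_grid(requested, vocabulary, side=12):
    """Lay out a 0/1 permission vector as a side x side grid, row by row."""
    if len(vocabulary) > side * side:
        raise ValueError("vocabulary does not fit into the grid")
    flat = [1 if name in requested else 0 for name in vocabulary]
    flat += [0] * (side * side - len(flat))  # pad unused cells with zeros
    return [flat[row * side:(row + 1) * side] for row in range(side)]


# Placeholder names standing in for a real, ordered permission list.
VOCABULARY = [f"PERM_{i}" for i in range(144)]
REQUESTED = {"PERM_0", "PERM_13", "PERM_143"}

grid = permission_grid(REQUESTED, VOCABULARY)
```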

Recurrent Neural Network.
Amin et al. [86] proposed an end-to-end deep learning architecture that detects and attributes Android malware via opcodes extracted from bytecode files. They confirmed that bidirectional long short-term memory (BiLSTM) neural networks can be effectively applied to detect the static behaviour of Android malware without using handcrafted features. Experimental results report an accuracy of 99.9% and an F1-score of 99.6% on a large dataset of more than 1.8 million Android applications.
Lee et al. [87] proposed a classification model based on stacked RNNs and CNNs for learning the generalized correlation between obfuscated strings from the package name and the certificate owner's name. The model uses an embedding method and GRU units to extract features and uses additional CNN units to optimize the extraction process. Their experiments demonstrate that the feature extraction process is robust to obfuscation and sufficiently lightweight for Android devices, and that the CNN-RNN method improved classification performance by 16% against n-gram features and reduced training time by 50% against an RNN model.
Ma [88] proposed Droidetec, a deep learning-based method for Android malware detection and malicious code localization that models an application program as a natural language sequence. Droidetec adopts a depth-first algorithm to extract API sequences from the Android app as features. On that basis, a BiLSTM network is used for malware detection. Each unit in the extracted behaviour sequence is represented as a vector, allowing Droidetec to automatically analyse the semantics of sequence segments and eventually discover the malicious code. Experiments with 9616 malicious and 11,982 benign programs show that Droidetec reaches an accuracy of 97.22% and an F1-score of 98.21%. Overall, Droidetec correctly locates malicious code segments with a hit rate of 91%.
The authors of [89] proposed a Generative Adversarial Network-based model to detect Android malware, inspired by the famous two-player game theory for rock-paper-scissors problems. Inside the discriminator and generator, they incorporated LSTM as the deep learning architecture to learn the opcode-based binary sequential data on a large and unlabelled dataset. The test data sequences are passed through a context window, which determines the bytecode sequences that differ from the previously recorded ones. If the sequences mismatch at one or more locations, this helps evaluate and characterize the behaviour of the APK. The technique achieved an F1-score of 99% with a receiver operating characteristic of 99%.
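The depth-first extraction of API sequences described for Droidetec can be pictured as a traversal of an app's call graph that records API calls in the order they are reached. The sketch below is not Droidetec's implementation; the call graph, the method names, and the `api_sequence` helper are hypothetical.

```python
def api_sequence(call_graph, entry, api_calls):
    """Walk the call graph depth-first from `entry`, collecting API calls in order."""
    sequence = []
    visited = set()

    def visit(method):
        if method in api_calls:
            sequence.append(method)  # record the API call, do not descend into it
            return
        if method in visited:
            return  # already expanded; also guards against recursive calls
        visited.add(method)
        for callee in call_graph.get(method, []):
            visit(callee)

    visit(entry)
    return sequence


# Hypothetical app: onCreate calls an API, a helper, then another API.
CALL_GRAPH = {
    "onCreate": ["getDeviceId", "helper", "sendTextMessage"],
    "helper": ["openConnection", "onCreate"],
}
API_CALLS = {"getDeviceId", "openConnection", "sendTextMessage"}

sequence = api_sequence(CALL_GRAPH, "onCreate", API_CALLS)
```

The resulting ordered list plays the role of a "sentence" whose tokens can then be embedded as vectors and passed to a sequence model.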

Multimodal Deep Learning and Multiple Kernel Learning. Kim [90] proposed a model based on Multimodal Deep Learning to detect Android malware. The model has five initial networks and a final network. The initial networks take five types of features as inputs and output intermediate vectors to train the final network. The features are refined using an existence-based or similarity-based extraction method, which can reflect the properties of Android apps from various aspects. The authors achieved 98% and 99% accuracy on the VirusShare and MalGenome datasets, respectively.
Narayanan et al. [91] proposed MKLDroid, a unified framework that systematically integrates multiple views of apps for performing comprehensive malware detection and malicious code localization. MKLDroid uses a graph kernel to capture structural and contextual information from apps' dependency graphs and identify malicious code patterns in each view. Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted combination of the views that yields the best detection accuracy. Besides multiview learning, MKLDroid can locate fine-grained malicious code portions in dependency graphs. On benchmark datasets, MKLDroid achieves more than 97% F-measure. In the malicious code localization experiments on a dataset of repackaged malware, MKLDroid identifies all the malicious classes with 94% average recall.
The authors of [92] used the API call graph to represent all possible execution paths that a malicious app can follow at runtime. The API call graph was embedded into a low-dimensional numeric vector feature set, which was fed into a deep neural network. The authors built a CNN model as the classifier, containing two convolution layers, one pooling layer, one flatten layer, and one dense layer, to decide whether a given app was malicious or benign. They evaluated four different graph embedding methods, namely, DeepWalk, Node2vec, Structural Deep Network Embedding (SDNE), and Higher-Order Proximity Preserved Embedding, and found that SDNE provides more discriminative features. The result reached 98.86% accuracy when the graph embedding size is equal to 32.
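The weighted combination of views in Multiple Kernel Learning can be written as K = w_1 K_1 + ... + w_V K_V, with one kernel matrix K_v per view and non-negative weights w_v. The sketch below only applies a given set of weights; learning the weights, which is the core of MKL, is left out. The kernel matrices, weights, and the `combine_kernels` helper are made-up illustrations rather than anything taken from [91].

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of equally sized square kernel matrices."""
    if len(kernels) != len(weights):
        raise ValueError("one weight is needed per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    size = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(size)]
        for i in range(size)
    ]


# Two hypothetical views of the same two apps (pairwise similarities).
K_VIEW_A = [[1.0, 0.5], [0.5, 1.0]]
K_VIEW_B = [[1.0, 0.25], [0.25, 1.0]]

combined = combine_kernels([K_VIEW_A, K_VIEW_B], [0.5, 0.5])
```

The combined matrix can be handed to any kernel classifier, such as an SVM with a precomputed kernel.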

Representation Learning. Narayanan [93] designed a semi-supervised representation learning framework named apk2vec to automatically generate a compact representation for a given app. Apk2vec is an integration technology that draws on the idea of doc2vec to embed an app into a vector. It can encompass information from multiple semantic views, use labels associated with apps, and combine representation learning (RL) and feature hashing to build apps' profiles efficiently. Evaluations with more than 42,000 apps demonstrated that apk2vec's app profiles perform well in malware detection, familial clustering, app clone detection, and app recommendation tasks.
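Feature hashing maps an unbounded set of feature strings into a vector of fixed size by hashing each feature to an index. The sketch below is a generic illustration of the trick, not apk2vec's implementation; the feature strings, the vector size, and the choice of MD5 as the hash function are assumptions made for the example.

```python
import hashlib


def hash_features(features, dim=16):
    """Map feature strings to a fixed-length count vector via hashing."""
    vector = [0] * dim
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # stable across runs
        vector[index] += 1  # distinct features may collide in one slot
    return vector


# Made-up features drawn from different views of one app.
FEATURES = [
    "perm:SEND_SMS",
    "perm:INTERNET",
    "api:getDeviceId",
    "api:sendTextMessage",
    "intent:BOOT_COMPLETED",
]

profile = hash_features(FEATURES, dim=8)
```

A cryptographic digest is used here instead of Python's built-in `hash` only because the latter is randomized per process for strings, which would make the vectors irreproducible.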
All the methods that we have analysed are vertically compared, as shown in Table 1.

Limitations of the Static Machine Learning-Based Detection Method.
We have discussed various algorithms used in past research works. These algorithms perform very well in some respects, but they also have some limitations, mainly as follows:
(1) Lack of standard benchmark datasets: 17 malware datasets were used to verify practical effectiveness in the 118 papers on static machine learning-based Android malware detection published from Jan. 2019 to Nov. 2020. The statistics of these datasets are shown in Figure 4. These datasets only provide malicious samples, so researchers must collect benign samples themselves from several app stores. The lack of standard benchmark datasets makes it challenging to judge fairly which detection method is better or worse.
(2) There is no guarantee that a classifier model built on an existing dataset will still detect new malicious applications well. Many algorithms may achieve good detection results on some datasets for a while. However, as time goes by, new malicious samples may no longer be suited to classification by the previously learned model, and the previously trained model then gives poor results.
(3) The ability to resist obfuscation and other targeted attacks is generally weak. The article [87] addresses obfuscation resistance, but only for the package name and the owner's name in the certificate; it does not cover obfuscation of Smali and native code. Obfuscation attacks conceal many of the original features, causing some static machine learning methods to work poorly.
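Limitation (2) can be probed with a time-aware evaluation: train on samples first seen before a cutoff date and test only on later ones, instead of splitting at random. A minimal sketch follows, assuming each sample carries a hypothetical `first_seen` date; the sample records and the `temporal_split` helper are illustrative and do not come from any of the surveyed papers.

```python
from datetime import date


def temporal_split(samples, cutoff):
    """Split samples into (train, test) by the date they were first seen."""
    train = [s for s in samples if s["first_seen"] < cutoff]
    test = [s for s in samples if s["first_seen"] >= cutoff]
    return train, test


# Made-up sample records.
SAMPLES = [
    {"name": "app_a", "label": "malware", "first_seen": date(2018, 3, 1)},
    {"name": "app_b", "label": "benign", "first_seen": date(2018, 9, 15)},
    {"name": "app_c", "label": "malware", "first_seen": date(2019, 6, 30)},
    {"name": "app_d", "label": "benign", "first_seen": date(2020, 1, 10)},
]

train, test = temporal_split(SAMPLES, cutoff=date(2019, 1, 1))
```

A large gap between the scores obtained with a random split and with a temporal split indicates that the model depends on features that drift over time.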

Future Directions.
According to the papers indexed in DBLP from Jan. 2016 to Nov. 2020, Android malware detection has always been a hot research direction. As shown in Figure 5, the number of papers on Android malware detection based on machine learning has remained roughly constant in recent years. Detection methods using static features have consistently held a clear advantage. It can be concluded that the static machine learning detection method will remain a hot spot in the foreseeable future. At the same time, new detection technologies are continually emerging, and they will be more lightweight, fast, stable, and robust.
(1) The performance of static deep learning methods will reach a higher level. Among the Android malware detection papers based on static machine learning in DBLP from Jan. 2019 to Nov. 2020, the Accuracy metric was used 88 times. As shown in Figure 6, most of the reported Accuracy values are over 90%. It can be predicted that detection methods will continue to improve in efficiency and speed while maintaining this crucial Accuracy metric.
(2) The detection method will be more inclined to support large-scale detection. Many algorithms in past research works were evaluated over small datasets. Although excellent performance metrics were achieved, the scalability of these algorithms was not verified on large datasets. With the increasing number of applications in the future, there will be a growing need for fast detection methods that support massive numbers of apps.
(3) The detection model needs to be able to identify zero-day attacks and new malware. Static detection based on machine learning has excellent classification ability for known malicious applications. However, it easily misclassifies new and unknown zero-day samples because the known features are weaker in new malware. Future detection technology will be developed towards improving zero-day detection ability.
(4) The anti-attack capabilities of the machine learning models used in Android malware detection will be further enhanced. Many machine learning algorithms are vulnerable to poisoning attacks, spoofing attacks, impersonation attacks, and inversion attacks [94]. The surveyed literature proposes no effective protection measures against such attacks, and this is expected to improve in the future.

Conclusion
With the continuous growth of Android devices and applications, the security of Android apps has attracted more and more attention. This paper studied Android app composition, analysed the sources of static features, reviewed Android malware static detection technology based on machine learning, and discussed future development directions. We analysed the algorithm models, core ideas, datasets, and performance metrics of the existing methods through vertical comparison and pointed out their advantages and limitations. Compared with other types of Android malware detection technology, the static detection method based on machine learning has advantages in comprehensiveness, accuracy, and reduced dependence on experts, although it also has some weaknesses. This paper may serve as a reference for Android application security researchers, helping them quickly grasp the various methods, identify the key issues, and understand the development trend of the technology.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.