cHybriDroid: A Machine Learning-Based Hybrid Technique for Securing the Edge Computing

Department of Computer Science, Capital University of Science and Technology, Islamabad 44000, Pakistan; Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Bergen 5063, Norway; National University of Computer and Emerging Sciences, Islamabad 44000, Pakistan; School of Information Science and Technology (SIST), Southwest Jiaotong University, Chengdu 611756, China


Introduction
The Internet of things (IoT), along with edge computing, has revolutionized industrial processes with the help of mobile devices such as tablets, smartphones, smartwatches, and PDAs. Nowadays, mobile devices can adequately render advanced functionalities for efficient, reliable, and scalable cloud services that exploit mobile edge computing (MEC). The extensive usage of Android mobile devices attracts a growing amount of malware that targets MEC services. An increasing number of security threats have emerged recently that are used to steal private user information and that lead to bank fraud and other socioeconomic crimes [1]. To limit the damage caused by such threats, different malware detection systems [2][3][4] have been presented. Android security solutions for vulnerability assessment and malware analysis can be divided into two main categories: (1) static and (2) dynamic analysis approaches. In the static technique, the application code is analysed without executing it. The dynamic technique analyses an application during execution and monitors its interaction with other system modules and networks [5][6][7]. However, the majority of the existing malware analysis techniques do not consider both permissions and intents when analysing Android malware.
In contrast to static analysis, most of the dynamic techniques [8,9] only focus on analysing system and API calls. Existing dynamic malware analysis techniques do not focus on important dynamic features, such as data leakages, network connection manipulation, and enforcing special permissions. Using multiple dynamic features could strengthen the run-time analysis to detect a variety of malicious activities and application security threats. A comprehensive dynamic approach can detect most of the vulnerabilities and security threats at the cost of execution overhead. To cope with these issues efficiently, a comprehensive malware analysis approach is needed that exploits lightweight static analysis for already known malware and a comprehensive dynamic approach for zero-day malware threats. In this work, we propose a comprehensive framework that incorporates both static and dynamic analysis, exploits permissions and intents, and considers important dynamic features such as data leakages, network connection manipulation, and enforcing special permissions. The major contributions of this research include the following: (1) a novel machine learning-based framework to analyse Android applications using a hierarchical approach (applying both the static and dynamic analysis) to detect known and zero-day malware, (2) a machine learning-based comprehensive static analysis model that incorporates both the application's permissions and intents, (3) a dynamic analysis model that involves the investigation of system calls (such as network activity, file access, SMS activity, and call activity), external DexClass usage, cryptographic activity, run-time permissions enforcement, and rehashing to detect known and zero-day malware, and (4) hyper-tuning malware classifiers using the tree-based pipeline optimization technique to improve the accuracy of malware detection.

Literature Review
This section presents a critical analysis of existing state-of-the-art approaches related to malware analysis, as summarized in Table 1.

Malware Detection Using Static Analysis.
Arora et al. [22] suggested a static approach to analyse permissions using the manifest file. A lightweight technique for malware detection was proposed, and its effectiveness was experimentally demonstrated using real Android malware samples. It extracted the permissions from the manifest file and compared them with a predefined keyword list. The designed model considered only one aspect of vulnerability and ignored other aspects such as intents and API calls. Another study [10] considered intents (both explicit and implicit) as semantically rich features that encode the malicious intentions of malware, especially when the intents are used in combination with permissions. The proposed system extracted and encoded explicit and implicit intents, intent filters, and permissions. Almin and Chatterjee [5] utilized the k-means clustering algorithm to classify applications that exploited permission authorization to perform malicious activity. A comparison with well-known antivirus solutions indicated that the proposed technique was able to detect malware that remains undetected by most antivirus software.
MalDozer [18] is a system relying on an artificial neural network that takes as input the raw sequences of API method calls, in the same order as they appear in the .dex file, for Android malware detection and family recognition. During training, MalDozer can automatically recognize malicious patterns using only the sequences of raw method calls in the assembly code. A framework [17] that uses several features reflecting multidimensional characteristics of Android applications useful for malware detection has also been proposed. The authors chose a multimodal deep neural network to select features with different characteristics. They focused on static features such as opcodes, API calls, permissions, components, and environmental and string features. Experiments were conducted using data sets from VirusShare and the Malgenome project.
The proposed system attained an accuracy of up to 98%. Though they studied many static features, the authors did not use the dynamic features that are useful for detecting zero-day and obfuscated malware. Wang et al. [16] recommended a deep learning-based hybrid model using a deep autoencoder (DAE) and a convolutional neural network (CNN) to improve the accuracy of malware detection. The multiple features of an Android application were reconstructed, and multiple CNNs were employed for effective malware detection. To boost feature extraction proficiency, several pretraining procedures were carried out, and a customized combination of the deep autoencoder and CNN model (DAE-CNN) was employed that learned various ranges of patterns in a short time. The empirical test was performed on a data set comprising 23,000 Android applications and attained 99.8% accuracy.

Malware Detection Using Dynamic Analysis.
The dynamic analysis technique is based on observing application behavior during execution. In 2012, Google introduced a dynamic analysis-based security infrastructure named Bouncer [16] for the Android platform. According to Google officials [16], every application uploaded to the Google Play store is first simulated on Google Cloud infrastructure (using Bouncer). Bouncer aims at guarding the Google Play store against malware threats.
Canfora et al. [8] introduced a detection method that identifies malware attacks by employing system calls. The authors assumed that malicious behaviors are implemented by a sequence of system calls. The study employed an SVM machine learning classifier [23] to identify the specific sequences of system calls associated with malware. The authors used these sequences to identify new malware families. Though this research work produced a promising accuracy of up to 97%, more features such as API calls and network statistics should be explored for a comprehensive dynamic analysis and a higher detection rate. A technique named IntelliDroid was introduced by Wong and Lie [4] to capture the malicious activities of an application at run time.
IntelliDroid recorded instances of specific API calls. The inputs generated by the proposed system triggered different events to monitor application behavior. In [11], the authors suggested a dynamic mechanism based on API sequence analysis. To monitor a new program, the hooking process (part of the implemented tool) tracks the API call sequences of the program. After the API call sequences are extracted, they are compared with an API call sequence reference database. If a match is found, an alert about the potential malware is generated.
Alzaylaee et al. [2] proposed a system named DynaLog to extract many features (such as logs of high-level behavior and API calls). The extracted features were further analysed to detect malicious applications.
DynaLog took advantage of existing open-source tools such as DroidBox [24] that can detect a wide range of Android malware. DynaLog is based on the Monkey tool [25] provided by Google for testing Android applications. Applications that were unable to run in the emulated environment remained unchecked by the proposed system. Moreover, DynaLog was incapable of recording events from the native code within Android applications.

Hybrid Malware Analysis Techniques.
A hybrid malware analysis technique combines features from both the static and dynamic approaches to detect a wide range of Android security threats. Zhao et al. [12] proposed a hybrid malware analysis technique named AMDetector that employs a modified attack tree model [26] for malware analysis. The static part of the proposed technique detects possible attacks and employs this knowledge to classify applications into benign and malware classes. The dynamic part analyses the application behavior triggered by different code components at run time. The organized rules (with attack trees) gave the prototype model good code coverage. The major drawbacks of the proposed system were the manual formation of rules and the time-costly dynamic analysis.
Bläsing et al. [27] suggested the Application Sandbox (Sandbox) system, which is capable of identifying malicious applications using hybrid analysis. The static part of the proposed analyzer extracted the classes (i.e., .dex files) and decompiled these files into a human-readable format. The code was then scanned for suspicious patterns. The proposed system recorded the low-level details of system interactions during application execution within the sandbox environment. The sandbox environment ensured the security of the analysing system and the safety of the data on the device under observation. The dynamic part of the proposed technique employs the Monkey tool [25] to observe the behavior of an application by producing random events. One limitation observed in the system was its inability to detect unknown or new types of malware. SAMADroid [19] is a hybrid malware detection model that combines the benefits of three different levels: (1) static and dynamic analysis, (2) local and remote hosts, and (3) machine learning. Static analysis was performed on a remote host, considering features that belong to hardware components, requested permissions, application components, and API calls. The dynamic analysis was performed on the local host using system calls, which helped in the detection of malware patterns. Experimental results show that SAMADroid achieves up to 98% malware detection accuracy. Thus, the inspection of the applications is statistical. However, the employed dynamic analysis covers only system calls. Dynamic analysis of system calls is already a well-worked area [10], and many malware samples easily bypass system call inspection [28] using code obfuscation techniques. Therefore, there is a need to check other dynamic features such as network activity, API calls, and executable code.
A technique that discovers all the flow paths of the most engaging APIs in a program using static analysis was proposed in [29]. The authors preferred static analysis because dynamic analysis is sometimes unable to extract all the important APIs. This technique is named DroidDomTree. The adopted strategy depends on studying the dominant API calls during the static analysis of an application. These dominant API calls are also known as semantic signatures, and mining the dominance tree of these semantic signatures is used to detect malware. Furthermore, the authors assigned weights to individual nodes in the dominance tree for effective feature selection. This weighting scheme helped to choose the important modules, which in turn supported feature selection and malware detection.
Another study proposed DL-Droid, a dynamic analysis-based Android malware detection scheme that uses deep learning to find malicious patterns in a specific application. The authors enhanced their technique through a state-based input generation method for improved code coverage. DL-Droid evaluated the performance of the stateful input generation method using random input generation as a comparative baseline. They obtained higher accuracies with the stateful approach.
This study highlighted the significance of enhanced input generation for Android malware detection systems during dynamic analysis. The authors conducted experiments using real devices and achieved a detection rate of 97.8% with dynamic features [24].
Chaulagain et al. [30] suggested a deep learning-based hybrid classifier for the safety screening of Android-based applications.
The proposed approach takes advantage of automated feature engineering and combines the benefits of static and dynamic analysis. This research collects different artifacts during static and dynamic analysis and trains deep learners to obtain independent models. These separate models are combined to create a hybrid classifier that helps in the vetting decision. The suggested vetting system proved efficient against imbalanced data and achieved 99% accuracy. Pektas and Acarman [21] presented a hybrid feature-based classification system that statically analysed the requested permissions and the hidden payload, while dynamic features such as API calls, installed services, and network connections were considered for malware detection. Several well-known machine learning algorithms were applied to evaluate the classification accuracy of different classifiers using a data set of 3,339 samples. The authors attained a testing accuracy of up to 92% on the employed Android applications.
Though the proposed static analysis technique exploits the permissions and payload features, it ignored the close relationship of intents with permissions. Most of the time, considering only permissions to identify malware is not adequate [10]. Table 1 summarizes the related work in terms of the methodology and the important features that most researchers employed for static or dynamic analysis. As shown in Table 1, most researchers have concentrated on either static or dynamic analysis and ignored an important aspect of application vulnerabilities that can be exploited in both static and dynamic analysis. A few researchers considered hybrid analysis. However, most of them ignored the relationship between intents and permissions, which is a crucial aspect of Android applications. Moreover, most researchers have not exploited important system calls (such as network activity, file access, SMS, and call activity), usage of external DexClass, data leaks, cryptographic activity, run-time permissions, and rehashing activity during the execution of applications. The critical analysis of the state-of-the-art approaches described above has led us to formulate the following research questions:
(i) Q1: which of the static features (e.g., permissions along with certain intent patterns) play a vital role in Android malware detection?
(ii) Q2: which combination of the dynamic features such as system calls (i.e., network activity, file access, SMS, and call activity), usage of external DexClass, data leaks, cryptographic activity, run-time permissions, and detection of rehashing activity is important for Android malware identification?
(iii) Q3: how can the malware detection rate be improved by employing hybrid analysis and machine learning-based classification?
To address these research questions, we propose a hybrid machine learning-based malware detection framework called HybriDroid for the Android platform.

Proposed Hybrid Malware Analysis
To analyse the impact of hybridization, we propose two machine learning-based hybrid malware analyzers, named HybriDroid and cHybriDroid. The HybriDroid framework exploits static as well as dynamic features for malware analysis using a hierarchical mechanism. First, the applications are analysed solely using the static features, and then the dynamic features are employed to examine the suspicious applications (those marked as clean by the static analysis). Moreover, to investigate the impact of combined analysis (using both the static and dynamic features), we propose the cHybriDroid framework.

HybriDroid Architecture.
This section describes the overall methodology of the proposed Android malware analysis framework, HybriDroid (shown in Figure 1). The proposed hybrid approach is a hierarchical system with two phases: (1) static and (2) dynamic. In the static analysis phase, the APK files of applications are first disassembled into XML and Java files. After that, the XML files are examined to extract the permissions and intents of the application.
These features are then supplied to the proposed machine learning-based static analyzer. Using the provided static features, the analyzer categorizes an application as malware or suspicious. The applications classified as suspicious are then passed to the dynamic analyzer, which examines their run-time behavior.
For dynamic analysis, each application is first executed in an emulated environment (using the DroidBox [24] emulation tool) to log the observed dynamic features (such as system calls, usage of external DexClass, data leaks, cryptographic activity, and rehashing activity). The dynamic features are then provided to the machine learning-based dynamic analyzer for classification.
The machine learning-based dynamic analyzer classifies these suspicious applications as benign or malware. The applications classified as malware are added to the malware data set, while the applications declared benign are added to the clean applications data set.
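The two-phase decision flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `hierarchical_verdict`, the callable-based classifier interface, and the two toy rules are all assumptions made for the example. The point it shows is that the costly emulated run is only triggered for applications the static analyzer does not already flag as malware.

```python
from typing import Callable, List

# Hypothetical aliases: a feature vector is a list of numbers, and a
# classifier is any callable that maps a feature vector to a label.
Features = List[float]
Classifier = Callable[[Features], str]


def hierarchical_verdict(
    static_features: Features,
    run_dynamic_analysis: Callable[[], Features],
    static_clf: Classifier,
    dynamic_clf: Classifier,
) -> str:
    """Return "malware" or "benign" using a static-then-dynamic scheme."""
    # Phase 1: the static analyzer labels the app "malware" or "suspicious".
    if static_clf(static_features) == "malware":
        return "malware"
    # Phase 2: only suspicious apps are executed to collect dynamic features.
    dynamic_features = run_dynamic_analysis()
    return dynamic_clf(dynamic_features)


# Toy stand-ins for trained models (assumed rules, for illustration only).
def toy_static_clf(features: Features) -> str:
    # Flag as malware when at least two static indicators are set.
    return "malware" if sum(features) >= 2 else "suspicious"


def toy_dynamic_clf(features: Features) -> str:
    # Flag as malware when the first dynamic indicator fired during the run.
    return "malware" if features[0] > 0 else "benign"
```

Passing the dynamic analysis as a callable rather than a precomputed vector keeps the emulation lazy, which mirrors the motivation for the hierarchy: the lightweight static check handles known malware, and the expensive run is reserved for the remaining applications.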

cHybriDroid Architecture.
To investigate the impact of combined analysis (using both the static and dynamic features), we propose the cHybriDroid framework (as shown in Figure 2). cHybriDroid examines Android applications using both the static and dynamic features simultaneously. For each Android application, both the static and dynamic features are extracted and provided to the machine learning-based analyzer for classification (as malware or benign). To extract the static features (i.e., intents and permissions), the application (APK) is disassembled to obtain the manifest file. Moreover, the application is executed in the virtual environment (interactively, by tapping and using sample inputs), and the dynamic features are logged. Afterwards, the static (i.e., intents and permissions) and dynamic (such as data leakage, network usage, and use of DexClass) features are provided to the developed cHybriDroid analyzer to analyse the application (as depicted in Figure 2).
Figure 3(a) depicts the complete training process of the proposed HybriDroid malware analyzer. The training data set comprises 50% benign (i.e., clean Android applications) and 50% malware samples (as mentioned in Table 2). Because the HybriDroid mechanism is based on a hierarchical model, the static and dynamic machine learning analyzers are trained separately. To train the static analyzer, an Android application is disassembled into Java and XML files (sample shown in Figure 4) in order to extract the feature vectors related to permissions and intents. The disassembled Java, XML, and manifest files are used to obtain static features such as intents and permissions. The full set of intents and permissions is then compared with each application in the data set. If an intent or permission matches one extracted from the application, the value of that intent or permission is set to 1; otherwise, it is set to 0. In this way, a feature vector of 407 distinct values is formed.
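The binary encoding of permissions and intents can be illustrated with a short Python sketch. It assumes the manifest has already been decoded to plain-text XML by a disassembler (inside an APK it is stored in a binary format), and the sample manifest, the four-entry vocabulary, and the helper names are hypothetical; the study's vocabulary has 407 entries.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Namespace bound to the "android:" attribute prefix in decoded manifests.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical decoded manifest, used only to exercise the functions below.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.READ_SMS"/>
  <application>
    <receiver android:name=".SmsReceiver">
      <intent-filter>
        <action android:name="android.provider.Telephony.SMS_RECEIVED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

# Hypothetical, shortened vocabulary of permissions and intent actions.
VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_SMS",
    "android.provider.Telephony.SMS_RECEIVED",
]


def extract_static_features(manifest_xml: str) -> Set[str]:
    """Collect requested permissions and intent-filter actions by name."""
    root = ET.fromstring(manifest_xml)
    names: Set[str] = set()
    for tag in ("uses-permission", "action"):
        for elem in root.iter(tag):
            names.add(elem.get(ANDROID_NS + "name", ""))
    names.discard("")  # ignore elements without an android:name attribute
    return names


def to_binary_vector(found: Set[str], vocabulary: List[str]) -> List[int]:
    """Set position i to 1 when vocabulary[i] was found in the application."""
    return [1 if name in found else 0 for name in vocabulary]
```

For the sample manifest, `to_binary_vector(extract_static_features(SAMPLE_MANIFEST), VOCABULARY)` marks the three declared entries and leaves `SEND_SMS` at 0.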
These feature vectors, along with the application category or label (i.e., malware or benign), are provided to the static machine learning analyzer. Similarly, to train the dynamic analyzer (in the HybriDroid framework), the training data set of 50% benign and 50% malware applications was executed in a virtual environment (i.e., DroidBox [24]). A total of 15 distinct dynamic features are collected and provided along with the application category (i.e., malware or benign) to the dynamic analyzer (HybriDroid). Additionally, K-fold cross-validation is used together with a grid search mechanism for hyperparameter tuning (as shown in Table 3). Figure 3(b) shows the training of cHybriDroid, which employs a single machine learning-based analyzer trained using both the static and dynamic features simultaneously. For each Android application, the static and the dynamic features are extracted and supplied along with the application category (i.e., benign or malware) to cHybriDroid's combined analyzer. The combined analyzer is trained using 422 distinct features (407 static and 15 dynamic) that cover the static and dynamic aspects of the application.
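To make the combination of K-fold cross-validation and grid search concrete, the following standard-library Python sketch scores every parameter combination by its mean validation accuracy over the folds. It is an assumption-laden illustration: the interleaved fold assignment, the `fit` interface, the toy threshold learner, and the toy data are all hypothetical and are not the classifiers or parameter grids of Table 3.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, validation) index pairs."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        splits.append((train, folds[i]))
    return splits


def grid_search(X: Sequence, y: Sequence, param_grid: Dict[str, list],
                fit: Callable, k: int = 5) -> Tuple[dict, float]:
    """Return the parameters with the best mean k-fold validation accuracy."""
    best_params, best_score = None, -1.0
    keys = sorted(param_grid)
    for values in product(*(param_grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = []
        for train, val in k_fold_indices(len(X), k):
            # fit() trains on the training fold and returns a predict function.
            predict = fit([X[i] for i in train], [y[i] for i in train], **params)
            correct = sum(predict(X[i]) == y[i] for i in val)
            scores.append(correct / len(val))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def fit_threshold(X_train, y_train, threshold):
    """Toy learner (assumed): flag an app when enough binary features are set."""
    return lambda x: int(sum(x) >= threshold)


# Toy data: label 1 (malware) when at least two of three features are set.
X_TOY = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 1, 0],
         [0, 1, 1], [0, 0, 1], [1, 0, 1], [0, 0, 0], [1, 1, 1]]
Y_TOY = [int(sum(x) >= 2) for x in X_TOY]
```

Calling `grid_search(X_TOY, Y_TOY, {"threshold": [1, 2, 3]}, fit_threshold, k=5)` selects the threshold that matches the labelling rule. In practice a library routine would replace this loop; the sketch only exposes the control flow.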

Experimental Result
The experiments are performed on a personal computer. Detailed specifications of the machine are given in Table 4. To evaluate the proposed frameworks, HybriDroid and cHybriDroid, we employed five machine learning classifiers: Random Forest (RF), K-star (K*), Naive Bayes (NB), Support Vector Machine (SVM), and the J48 decision tree [12-14, 27, 31]. Moreover, the TPOT [28] technique is also used, which chooses a suitable machine learning model and the best hyperparameters for that model.

Data Set.
The benign (clean) applications in the data set were collected from the Google Play store [16] and a third-party app store called Apkpure [16], as shown in Table 5. For malware samples, we acquired the benchmark Drebin [3] data set, which consists of 5,560 malware samples from 179 different families; some of them are shown in Table 6. Drebin is extensively used throughout research works on Android malware detection. The Drebin data set consists of malware applications obtained from various Android markets, different antivirus engines, malware forums, security blogs, and the Android Malgenome project [16].

Feature Selection.
Permissions are one of the important static features that must be examined carefully to safeguard against potential security threats. In addition to permissions, intents within Android applications are another important aspect requiring careful analysis. Intents are part of the complex messaging model of the Android system, which facilitates the execution of different applications, services, and operating system functions. Activities, broadcast receivers, and some services use intents for their activation and declare the types of intent they accept using intent filters in the manifest file. Some recent studies [10,32] have shown that intents and permissions are often exploited by malware (for example, through intent spoofing and permission collusion). Thus, their critical examination is necessary to detect malicious activities. Table 7 shows the features collected (using the DroidBox tool) during the dynamic analysis step of the proposed methodology. These features result from the events generated during the execution of applications (within a virtual environment). From Table 8, it is evident that the Internet permission is the most employed permission (i.e., 20%) among both malware and benign applications. Other frequently requested permissions in malware applications relate to sending and writing SMS, with a collective share of 14%. Moreover, access to approximate and exact locations through the ACCESS_COARSE_LOCATION and ACCESS_FINE_LOCATION permissions is requested by 11% of malware applications.

Feature Ranking.
The motivation behind using a reduced feature set (for the employed predictive models) is to eliminate redundant data, reduce overfitting, improve classification accuracy, and decrease the training time of the algorithm. The dynamic analysis results in a large number of features; therefore, it was necessary to use only the important features for the machine learning model. For this purpose, we employ the information gain method [16], which finds certain patterns of the features in the applications of the data set. Each feature is assigned a score that indicates its effectiveness in classification.
InfoGain is a well-known feature selection algorithm that records the change in the entropy of the class information before and after observing a feature [3]. The information gain of a feature F is measured as

$$\mathrm{InfoGain}(P, F) = \mathrm{Entropy}(P) - \sum_{v \in \mathrm{Values}(F)} \frac{|P_v|}{|P|}\,\mathrm{Entropy}(P_v),$$

where P indicates the set representing the pattern, |P| is the number of samples in P, v is a value of the feature F, Values(F) is the set of values taken by F, and P_v is the subset of P in which feature F has value v. Before the feature is observed, the entropy of the class is defined as

$$\mathrm{Entropy}(P) = -\sum_{c \in C} \frac{|P_c|}{|P|} \log_2 \frac{|P_c|}{|P|},$$

where C indicates the class set and P_c represents the subset of P belonging to class c. Information gain is a simple and fast ranking method that yields the most suitable features for identifying the application class (in our case, malware or benign). Using InfoGain, 172 important static features (comprising permissions and intents) out of a total of 407 features are selected. Among the top-ranked static features, the receiver is ranked highest in the intent category. The top-ranked dynamic features with their rank scores are shown in Table 10. As shown in Table 10, dataleaks is the top dynamic feature, with the highest potential to reveal the category of an Android application (i.e., malware or benign). The dataleaks dynamic feature represents information leakage via network, SMS, or any file-based activity. Cryptousage, sendsms, enfperm, and sendnet are the other top-ranked features that retain the most information (i.e., attained higher rank values), which shows the significance of these features for malware analysis. In this research, we use the top five (out of a total of 15) ranked dynamic features.
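The information gain ranking can be sketched in plain Python as follows. This is an illustrative sketch using the standard entropy-based definition of information gain; the function names, the feature names `f_informative` and `f_noise`, and the four-sample toy data are assumptions and do not come from the study's data set.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (base 2) of the class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def info_gain(feature_values: Sequence[int], labels: Sequence[str]) -> float:
    """Class entropy minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [c for f, c in zip(feature_values, labels) if f == v]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def rank_features(X: Sequence[Sequence[int]], labels: Sequence[str],
                  names: Sequence[str], top_k: int) -> List[Tuple[str, float]]:
    """Return the top_k (name, score) pairs, highest information gain first."""
    scored = [(name, info_gain([row[j] for row in X], labels))
              for j, name in enumerate(names)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data (assumed): column 0 separates the classes, column 1 does not.
X_TOY = [[1, 1], [1, 0], [0, 1], [0, 0]]
LABELS_TOY = ["malware", "malware", "benign", "benign"]
NAMES_TOY = ["f_informative", "f_noise"]
```

On the toy data, the first column removes all uncertainty about the class (gain of one bit) while the second leaves the class entropy unchanged (gain of zero), so `rank_features(X_TOY, LABELS_TOY, NAMES_TOY, top_k=1)` keeps only `f_informative`.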
The dataleaks dynamic feature retains the maximum information when InfoGain is applied (as shown in Table 10). This information is necessary for accurate malware classification. Similarly, READ_SMS from the permission category has the highest potential to accurately classify malware compared with the other employed features. The android.provider.Telephony.SMS_RECEIVED action in the intent category is among the top 10 highest-ranked features (as shown in Table 11). In this research, we selected the top 20 (out of 422) hybrid features. The full feature ranking and information gain values are available at https://bit.ly/2GduUEt.

Table 7: Dynamic features collected using DroidBox (rows 7 to 15).
No. | Feature | Description
7 | Data leaks | Detect leakage of information on the phone, including messages, e-mail, password, contacts, IMEI, GPS information, phone number, and so on
8 | Accessed files | File accesses
9 | Fdaccess | Read and write operations of file and directory
10 | Send sms | Send SMS
11 | Phone call | Phone calls made
12 | Cryptousage | Detect the cryptographic functions and what key is used when encrypting and decrypting data
13 | Recvaction | APK function invoked as a receiver
14 | Enfperm | Enforce special permission to activity, broadcast receiver, and service
15 | Hashes | The hash value of the APK file

Result Discussion.
In Table 12, results related to the cross-validation grid search experiment are presented. When feature selection is not performed, TPOT produces the highest F-measure of 0.91. However, Naive Bayes produces 0.98 precision, and TPOT produces 0.91 recall. In Table 12, when feature selection is performed, the TPOT F-measure decreases from 0.91 to 0.87. Random forest produced the best result of 0.88 F-measure and 0.88 precision. The small reduction indicates that the removed features have minimal impact on the performance of the classifiers.
In Table 12, the cross-validation grid search results based on dynamic features, with and without feature selection, are presented. When feature selection is not performed, TPOT produces the highest F-measure of 0.94, an improvement of 0.03 over the static features mentioned in Table 13. However, Naive Bayes produces 0.99 precision and the support vector machine produces 0.92 recall. In Table 12, when feature selection is employed, the TPOT F-measure decreases from 0.94 to 0.91. TPOT produced the best result of 0.91 F-measure and 0.88 precision while reducing the number of features from 15 to 5 (an F-measure drop of 0.03). Table 13 shows the best classifier for the dynamic feature-based analysis.
In Table 12, the cross-validation grid search experiment is conducted on the hybrid features with (20 selected features) and without (422 features in total) feature selection. When feature selection is not performed, Naive Bayes produces the highest F-measure (i.e., 0.99), which exceeds the best F-measures obtained with the static (0.91) and dynamic (0.94) features alone. Naive Bayes produces a precision of 1.00 and a recall of 0.99. In Table 14, when feature selection is performed, the TPOT F-measure is 0.97 and the Naive Bayes F-measure decreases from 0.99 to 0.96. TPOT produced the best results, that is, 0.97 F-measure and 1.00 precision.
Table 13 shows the per-fold results of TPOT (without feature selection) and random forest (with feature selection). Since random forest is trained on different samples of the data, which reduces variance, it obtained better performance. Moreover, random forest uses a random subset of features, which also helps to reduce overfitting. For the dynamic features, Table 13 depicts the best-performing classifiers found by the TPOT technique (with and without feature selection). The extra-trees classifier obtained improved results compared with the other classifiers because a random value is selected for feature consideration. The random splits of the extra trees help to create more diversified trees and fewer splitters. Table 13 shows the best classifier using the hybrid features. For the hybrid features, Naive Bayes was the best classifier without feature selection, and the TPOT-based technique was the best classifier with reduced features. Naive Bayes is a probabilistic classifier, so it does not require any parameter tuning. However, TPOT needed hyper-tuning, for which we used the evolutionary algorithm to optimize the parameters. The tuned pipeline for the TPOT model is StackingEstimator(estimator=LogisticRegression(C=0.1, dual=True, penalty="l2")) followed by GaussianNB(). The TPOT technique obtained improved results compared with the other classifiers because it uses a stacking technique to improve its performance.
The Gaussian classifier, acting as the metalearner, makes the final prediction. The results presented in Table 13 show that TPOT and Naive Bayes outperformed the other machine learning models and are more effective in malware detection. The attained F-measure value for the TPOT model indicates the notable performance of the model. It is evident that, for the TPOT model, the true positive rate is fairly high and the false positive rate is extremely low. Therefore, we employ the TPOT classification technique for our proposed cHybriDroid framework.
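The evaluation measures used in this discussion (precision, recall or true positive rate, false positive rate, and F-measure) follow directly from the confusion counts. The Python sketch below shows one way to compute them; the function name, the convention that label 1 means malware, and the eight-sample toy predictions are assumptions made for illustration and are unrelated to the reported results.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall (TPR), FPR, and F-measure; 1 means malware."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # true positive rate
    fpr = fp / (fp + tn) if (fp + tn) else 0.0          # false positive rate
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "fpr": fpr, "f_measure": f_measure}


# Toy predictions (assumed): one missed malware sample, one false alarm.
Y_TRUE_TOY = [1, 1, 1, 1, 0, 0, 0, 0]
Y_PRED_TOY = [1, 1, 1, 0, 0, 0, 0, 1]
```

Reporting the false positive rate alongside the F-measure matters for a malware detector, because a model can reach a high F-measure while still flagging an unacceptable share of benign applications.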

Prediction Model Overhead.
The cHybriDroid model is trained offline. The overhead of using the cHybriDroid predictor consists of selective feature extraction and making the predictions. The overhead of feature extraction is negligible (approximately 1 s in total) because features are extracted at compile time. The prediction model is trained once, so training is a one-time cost. The training and testing times for both models are given in Table 15. In summary, the overhead of the prediction model is negligible, that is, two seconds per application.
Using the hybrid analysis approach, we experimented with real malware and benign Android applications. Our study showed that using both the static and dynamic application features results in a commendable malware detection accuracy. With the feature ranking mechanism, we further optimized the two proposed hybrid methodologies in terms of performance and accuracy. Reducing the number of features and employing only the important ones results in good detection performance and accuracy. For hybrid malware analysis, we adopted two strategies: (1) HybriDroid and (2) cHybriDroid. The HybriDroid methodology was designed to perform a hybrid malware analysis (employing both the static and dynamic or run-time features) using a hierarchical mechanism. The cHybriDroid mechanism, in turn, was employed to analyse the effectiveness of malware detection when the static and dynamic features are analysed simultaneously. Our results exhibit a higher malware detection accuracy for HybriDroid, with a 97% F-measure, as shown in Table 16. We found that TPOT [28] was the top-performing machine learning model (for cHybriDroid) compared with the other employed models. To gain better insight into performance, we noted the False Positive Rate (FPR) and True Positive Rate (TPR) for the cHybriDroid classifier. The results revealed that the TPOT [28] machine learning model attained the highest performance, with up to 96% TPR. Similarly, the r² value for the TPOT machine learning model also indicates the potential of TPOT to detect malware. Overall, the malware detection accuracy of the hierarchical hybrid approach (i.e., cHybriDroid) was marginally better than the combined hybrid approach, that is, HybriDroid. Hybrid analysis increased the F-measure score by 5% with and without feature selection.

Table 13: The selected best model in terms of cross-validation score.
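For readers less familiar with the metrics quoted here, the following short sketch shows how TPR, FPR, and F-measure follow from a confusion matrix. The counts in the usage lines are invented for illustration only and are not the paper's results; the function name `detection_metrics` is likewise an assumption.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute TPR, FPR, precision, and F-measure from confusion-matrix counts."""
    tpr = tp / (tp + fn)            # true positive rate (recall)
    fpr = fp / (fp + tn)            # false positive rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "f_measure": f_measure}

# Illustrative counts only: 100 malware and 100 benign samples.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The F-measure is the harmonic mean of precision and TPR, so it only stays high when malware is caught (high TPR) and benign applications are rarely flagged (low FPR on a reasonably balanced set).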
4.6. Analysis. As seen from Table 13, the static, hybrid, and dynamic models all achieve a high F-measure score. We train the analysis tool on a comprehensive data set and use the optimized parameters for machine learning. Within the proposed security mechanism, we first perform the static analysis, which mainly covers the manifest file, including permission tags and application intents. Static analysis comes first because an application can be tested for malware when it is submitted, before it is executed. If the model probability is low, the mechanism applies the dynamic classification model and examines the application in a controlled environment. The dynamic method involves rigorous testing, so it costs execution time. If the model is again uncertain, the proposed method applies the hybrid model. In this way, we test the application with three different models. Table 14 answers the research question mentioned in Section 1. The method can be adapted to ransomware and adversarial attacks, and it can be applied to very large data sets. Such an ensemble machine learning analyzer can be trained on the discussed features to detect ransomware applications and classify them into families.
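The escalation logic described in this subsection (static first, then dynamic, then hybrid when the earlier models are uncertain) can be sketched as follows. This is an illustrative sketch under stated assumptions: each model is represented by a hypothetical callable that returns the probability that an application is malware, and the confidence threshold of 0.8 is an assumed value, since the text does not state one.

```python
def cascade_classify(app, static_model, dynamic_model, hybrid_model, threshold=0.8):
    """Return (stage, verdict, probability) from the first sufficiently confident model.

    Each model is a callable mapping an application to P(malware).
    The threshold is an assumed value chosen for illustration.
    """
    def verdict(p):
        return "malware" if p >= 0.5 else "benign"

    # Cheap static analysis runs first; costly dynamic analysis only if needed.
    for stage, model in (("static", static_model), ("dynamic", dynamic_model)):
        p = model(app)
        if max(p, 1.0 - p) >= threshold:   # confident either way
            return stage, verdict(p), p

    # Both earlier stages were uncertain: fall back to the hybrid model.
    p = hybrid_model(app)
    return "hybrid", verdict(p), p

# Usage with stub models standing in for trained classifiers.
result = cascade_classify(
    "example.apk",
    static_model=lambda app: 0.55,   # uncertain
    dynamic_model=lambda app: 0.45,  # uncertain
    hybrid_model=lambda app: 0.70,
)
print(result)
```

Ordering the stages by cost means most applications are decided by the lightweight static model, and the execution overhead of dynamic analysis is paid only for the uncertain cases.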

Conclusion and Future Work
Android is a renowned OS for mobile devices. Consequently, the Android platform attracts malware authors seeking large economic and social gains. To mitigate malware activities, different malware detection systems have been proposed. However, the deficiencies in these systems have led us to propose a novel machine learning-based hybrid malware detection framework that employs several important static and dynamic features. Furthermore, the study has also analysed the role of different machine learning classifiers in malware detection.
This study highlights that, in the development of a robust machine learning-based malware detection system, selecting features from the data set is one of the most significant steps. Feature selection depends upon the analysis method through which the features are extracted. It is the analysis technique that determines the compatibility of features with the classification algorithm. In the experiments, we attain a 97% F-measure, and the trained classifier performs well, with an r² value of 0.91. The TPR is also high, that is, 0.96, while the FPR is very low, that is, 0.04. In future work, we intend to incorporate code coverage, memory utilization, and network statistics of the executing applications (for dynamic analysis). Moreover, the classifiers will be trained to subclassify the malware into families.

Data Availability
The datasets used in the study were taken from previously published studies (Google Play Store [16]; Drebin [3]).

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.