A Novel Malware Classification Method Based on Crucial Behavior

Network and Information Center, Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China; National Engineering Laboratory for Mobile Network Security (No. [2013] 2685), Beijing University of Posts and Telecommunications, Beijing 100876, China; Inspur Electronic Information Industry Co., Ltd, Jinan 250000, China; State Key Laboratory of High-end Server and Storage Technology, Jinan 250000, China; Ernst and Young, Tokyo, Japan


Introduction
Malware refers to any software that aims at damaging or infiltrating computer systems [1]. The fast-growing malware variants pose a serious threat to malware detection. According to Symantec's 2018 Internet Security Threat Report (ISTR), the number of malware variants reached 669,947,865 in 2017, double that of 2016 [2]. Moreover, the advent of new technologies has contributed to the increasing complexity of malware. Facing numerous and sophisticated malware variants, malware detection is urgently needed (e.g., see [3][4][5]).
Existing malware detection methods are mainly divided into static and dynamic analysis methods [6]. Static analysis methods analyze instructions and structures to confirm program functions [7]. They do not need to run malware directly. Unfortunately, static analysis methods are sensitive to sophisticated obfuscation instructions and encryption techniques. To address these shortcomings, dynamic analysis methods have been proposed. The advantage of dynamic methods is that they observe behaviors by running programs in a virtual environment. Dynamic methods commonly address the threat of statically obfuscated malware [8] and encryption techniques. Obfuscated malware samples can change their code syntax while preserving their semantics [9]. Dynamic analysis is an effective way to recognize malware behavior. For example, when we want to analyze a keylogger, dynamic analysis can help us find the keylogger's log file and trace the information.
A program generally relies on Application Programming Interface (API) calls provided by the operating system to accomplish its functions. Hence, a program's execution trace can be obtained by monitoring the stream of API calls [10]. The API call graph is constructed by tracing API calls and their arguments, which is an effective method to indicate program behavior [11]. Considerable effort has been expended to identify malware by using the API call graph [12,13]. With the advent of sophisticated malware samples, the API call graph is becoming more and more complex [14,15]. The major issue facing malware detection is the computational complexity of graph matching [16,17]. Moreover, it is a great challenge to construct a graph that is general enough to classify malware.
Our method differs from previous methods in that it addresses this major shortcoming. We propose a novel method that divides the API call graph into fragment behaviors. Moreover, we extract crucial behaviors by applying a term frequency-inverse document frequency (TF-IDF)-like measure [18,19] and information gain (IG) [20,21]. Finally, we utilize Random Forest (RF) [22], Support Vector Machine (SVM) [23], Decision Tree (DT) [24,25], and K-Nearest Neighbor (KNN) [26] for malware classification. We aim to enhance the malware classification performance by constructing an appropriate behavioral representation of the malware family. The main contributions of this method are as follows:
(1) We propose a graph repartition algorithm to extract fragment behaviors from original API call graphs. The extracted fragment behavior is a graph-based API sequence that preserves the dependency of the API call graph.
(2) We extract crucial behaviors by improving the feature extraction measure. The improved extraction measure, which combines TF-IDF and IG, shows great advantages in malware classification.
(3) The proposed method achieves promising performance in both malware detection and classification. The experimental results demonstrate that the extracted crucial behaviors can accurately describe malware activities.
The rest of this paper is organized as follows: Section 2 reviews the related work. Section 3 introduces some basic notations. Section 4 presents the proposed method, which consists of the system overview, graph repartition, and feature extraction modules. Experiment and evaluation are described in Section 5. The limitations of the proposed method are discussed in Section 6, which is followed by the conclusion in Section 7.

Related Work
Programs generally perform various activities by taking advantage of different predefined API calls. API calls provide valuable information to identify potential exceptions and malicious activities. A considerable amount of research has been devoted to the API call sequence. Eskandari et al. [27] proposed a dynamic malware detection system that explores system calls via API calls. In addition, they extracted API calls from the log file and used the n-gram method to generate 4-gram API call sequences. Lee et al. [28] utilized the Cuckoo sandbox to execute programs dynamically. They extracted API behavior data and transformed API calls into sequences by using the n-gram method. Moreover, they calculated the frequency of sequences. After that, the cosine similarities of API sequences were calculated among different programs. Finally, the malware samples which are similar to each other were grouped.
Hansen et al. [29] presented a scalable dynamic analysis method by injecting programs into parallel virtual environments. The parallel virtual environment is implemented by developing the setup of the Cuckoo sandbox. They extracted labels and features from samples. The extracted features consist of API calls and their input arguments, which include registry and DLLs. After that, they proposed two representation methods for malware detection and classification.
As mentioned above, the common parts of API call sequences can be utilized to identify the similarities of malware samples. The sequence-based approach is a relatively simple way to describe malware behavior. However, sequence-based methods only preserve temporal information between API calls, which makes them vulnerable to reordered or irrelevant API calls. Some methods have been proposed to address the drawbacks of sequence-based methods, such as deep learning-based models and more comprehensive feature representation.
Amin et al. [30,31] explored bidirectional long short-term memory for building an antimalware system to detect static opcodes of malware. In addition, they designed a deep learning model of generative adversarial networks to detect Android malware. D'Angelo et al. [32] transformed API call sequences, which are invoked by apps during their execution, to API-images. They autonomously extracted the most representative and discriminating features by applying autoencoders. The deep learning-based model shows great advantages in malware detection.
On the other hand, the API call graph is proposed to capture comprehensive relations (such as argument dependency) between API calls [33]. Park et al. [34] monitored the execution of programs and then constructed weighted directed behavioral graphs that represent kernel objects, object attributes, and dependencies between kernel objects. In addition, they proposed a method to generate a common behavioral graph by clustering individual behavioral graphs.
Elhadi et al. [11] presented a static analysis system; the proposed system read samples and then extracted API calls and their parameters under a secure environment. They classified API call graphs based on sequence dependence, data dependence, declaration dependence, and API dependence. For each kind of dependence, they constructed an API call graph. Finally, they integrated four kinds of API call graphs into an integrating API call graph and calculated the similarity between graphs.
Nikolopoulos and Polenakis [35] proposed a graph-based model based on dynamic taint analysis. The proposed model is constructed by exploring main properties of system-call dependency graphs. They adopted the Euclidean distance-based Δ-similarity metric for malware detection and the SaMe-NP similarity metric for malware classification.
Programs generally accomplish tasks by executing similar behaviors or repeating a behavior multiple times. The more similar or repeated behaviors occur, the more duplicated nodes or subgraphs appear. The drawback arises with sophisticated behavior, which results in high dimensional features and brings more calculations [36]. Furthermore, a behavioral graph that is too specific is unsatisfactory, because it may ignore the minor changes in malware variants [37]. Likewise, a behavior graph that is not specific enough commonly leads to benign samples being judged as malware. A large body of work has concentrated on investigating accurate approximation methods for these problems.
Fredrikson et al. [37] mined significant behaviors from samples based on the data dependence graph. The mined significant behaviors are then utilized to synthesize an optimally discriminative specification based on concept analysis and the simulated annealing [38] algorithm. The focus of this proposed method is to reduce the size of the graph [39].
Alam et al. [40] put forward the "Annotated Control Flow Graph" and "Sliding Window of Difference and Control Flow Weight" to reduce the effects of obfuscations. The proposed Annotated Control Flow Graph provides a quick graph matching method by dividing itself into many smaller Annotated Control Flow Graphs. The proposed Sliding Window of Difference and Control Flow Weight captures the semantics of the control flow and helps in malware detection.
Ding et al. [41] constructed an API dependency graph by tracing taint data. After that, they proposed a dependency graph pruning algorithm for pruning a dependency graph. Finally, they constructed a common behavioral graph based on the pruned dependency graph. The proposed common behavior graph prunes similar and repeated behaviors.
We provide a comprehensive summary of malware detection and classification work in Table 1. To simplify the representation of the graph, we propose a novel graph repartition algorithm. The proposed algorithm constructs fragment behaviors that describe crucial activities of the malware. The novel and simplified representation of fragment behavior preserves the dependency of the API call graph and effectively avoids problems in graph matching. This novel behavioral representation is designed to provide a better malware classification performance.

Basic Notation
We explain some notations in this section: subgraph, N-order subgraph, crucial N-order subgraph, and TF-IDF.
An API call graph is commonly represented by a directed acyclic graph which consists of nodes and edges. If an API call A is associated with API call B, an edge is established from node A to node B. That is, edges represent dependencies among different types of nodes (e.g., network, registry, and file system).
An API call graph defines specific behaviors. We annotate root and leaf nodes with labels in each API call graph. After that, we extract the full execution paths from the root node to leaf nodes in an API call graph. These no-branching execution graphs extracted from the API call graph are represented as subgraphs in our system. An N-order subgraph is extracted from the subgraph by sliding a window of size N.
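The extraction of no-branching subgraphs (root-to-leaf execution paths) can be illustrated with a minimal Python sketch. It assumes the API call graph is stored as an adjacency dictionary; the function names `find_roots` and `root_to_leaf_paths` and the toy graph are hypothetical and not taken from the paper's implementation.

```python
from typing import Dict, List

def find_roots(graph: Dict[str, List[str]]) -> List[str]:
    """Return nodes with no incoming edge (the 'root' nodes)."""
    has_parent = {child for children in graph.values() for child in children}
    return [node for node in graph if node not in has_parent]

def root_to_leaf_paths(graph: Dict[str, List[str]], root: str) -> List[List[str]]:
    """Enumerate every path from `root` to a leaf of a directed acyclic graph."""
    paths = []
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        children = graph.get(node, [])
        if not children:          # a leaf: no outgoing edge
            paths.append(path)
            continue
        # Push in reverse so children are explored in their listed order.
        for child in reversed(children):
            stack.append((child, path + [child]))
    return paths

# Hypothetical toy graph with one branch point (illustration only).
toy_graph = {
    "RegOpenKey": ["RegQueryValue", "RegSetValue"],
    "RegQueryValue": ["RegCloseKey"],
    "RegSetValue": ["RegCloseKey"],
    "RegCloseKey": [],
}
roots = find_roots(toy_graph)
subgraphs = root_to_leaf_paths(toy_graph, roots[0])
```

Each returned path corresponds to one subgraph in the sense used here: a simple chain that starts at the root node and ends at a leaf node.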
Definition 1 (N-Order Subgraph (NSG)). NSG is a graph in which the maximum number of nodes does not exceed N.
NSGS stands for the NSG set:

$$NSGS = (NSG_1, NSG_2, \ldots, NSG_m),$$

where $m$ is the number of NSGs in NSGS. An NSG with crucial information is chosen as an indicator of malware. We call this a crucial NSG.
Definition 2 (Crucial N-Order Subgraph (CNSG)). CNSG is a subset of NSGS, and it contains the crucial information of NSGS. It can be described as follows: where TF-IDF is a numerical statistic in information retrieval. It reflects the importance of words in a document. TF refers to the number of times a given word appears in a document. IDF measures the general importance of words [42].

Definition 3. TF-IDF is the product of TF and IDF.
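Given a document $d_i$ and a document set $D$ which has $n$ documents, $D = (d_1, d_2, \ldots, d_n)$. A word in the dataset is represented as $w$. TF-IDF is calculated as follows:

$$\mathrm{TF\text{-}IDF}(w, d_i, D) = \mathrm{TF}(w, d_i) \times \mathrm{IDF}(w, D),$$
$$\mathrm{TF}(w, d_i) = \frac{f(w, d_i)}{|d_i|},$$
$$\mathrm{IDF}(w, D) = \log \frac{|D|}{|\{d_i \mid w \in d_i\}|},$$

where $\mathrm{TF}(w, d_i)$ represents the frequency of the word $w$ in $d_i$, $f(w, d_i)$ is the number of times the given word $w$ appears in a document $d_i$, and $|d_i|$ represents the dimension of the document $d_i$. $\mathrm{IDF}(w, D)$ reflects the inverse document frequency of the word $w$ in document set $D$, and $|\{d_i \mid w \in d_i\}|$ is the number of documents which contain $w$.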
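As a small illustration of Definition 3, the following Python sketch computes TF, IDF, and their product for tokenized documents. It assumes the standard unsmoothed form with a natural logarithm; the base of the logarithm and the toy documents are assumptions, not details given in the text.

```python
import math
from typing import List

def tf(word: str, doc: List[str]) -> float:
    """Term frequency: occurrences of `word` divided by the document length."""
    return doc.count(word) / len(doc)

def idf(word: str, docs: List[List[str]]) -> float:
    """Inverse document frequency (assumes `word` occurs in at least one document)."""
    containing = sum(1 for d in docs if word in d)
    return math.log(len(docs) / containing)

def tf_idf(word: str, doc: List[str], docs: List[List[str]]) -> float:
    """TF-IDF is the product of TF and IDF."""
    return tf(word, doc) * idf(word, docs)

# Hypothetical toy corpus of three "documents" (illustration only).
docs = [
    ["open", "read", "read", "close"],
    ["open", "write", "close"],
    ["open", "close"],
]
score_read = tf_idf("read", docs[0], docs)   # frequent in one document only
score_open = tf_idf("open", docs[0], docs)   # present in every document
```

A word that appears in every document, such as `open` here, receives an IDF of zero, so its TF-IDF score vanishes regardless of how often it occurs.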

Malware Classification System
Overview. Our method consists of three parts: graph repartition, feature extraction, and malware classification. The whole process of the proposed system is outlined in Figure 1. Graph repartition consists of two modules: the subgraph construction module and the NSG construction module. The subgraph construction module extracts subgraphs from API call graphs, which are constructed based on the registry, filesystem, process, services, network, and synchronization. The NSG construction module extracts NSGs from the subgraphs produced by the subgraph construction module. Our goal is to build the appropriate behavioral representation and extract CNSG by using the improved TF-IDF-like measure in feature extraction. In the last step, RF, SVM, DT, and KNN are used for malware classification. The steps of our proposed system are as follows.
Step 1. Extract subgraphs. We extract subgraphs from API call graphs of malware and benign samples. Icons with different shapes and colors represent various API calls in Figure 1. We can see that four API call graphs are listed in different rectangles. The subgraph construction module extracts five different subgraphs from four API call graphs.
Step 2. Build fragment behavior of NSG. NSG is obtained through an API call repartition algorithm based on the sliding window. We illustrate 3SG and 4SG in Figure 1. Icons that are not in the shadow refer to the parts that need to be discarded. NSG preserves more complex semantic information than API sequences, since it contains the dependencies of API call graphs.
Step 3. Extract crucial behavior of CNSG. We adopt the TF-IDF-like measure and IG to calculate the crucial coefficient of NSG. The NSG with the higher crucial coefficient is selected as the significant behavior (i.e., CNSG) in our method.
Step 4. Malware classification. For each program analyzed in the Cuckoo sandbox, we use some classifiers (e.g., RF, SVM, DT, and KNN) to identify whether the program is benign or malware. We obtain the appropriate CNSG in this process by comparing the performance of the experiments.

Graph Repartition.
In this section, we propose a graph repartition algorithm that reconstructs the API call graph into NSGs. The purpose of the proposed algorithm is to build the appropriate fragment behavior of malware families by pruning similar behaviors. Figure 2 shows the trace extracted from the log file generated through the Cuckoo sandbox. This is part of the input that the API call graph is built from. On line 1, one can see that the malware creates and opens a registry. After that, it repeatedly retrieves and sets the data. On lines 7 and 8, the malware creates a file and changes its information. On line 9, the program retrieves the information of the file. It closes the file on line 10. On lines 11 to 13, the program creates, retrieves, and closes another file.
In Figure 3, $G_1$, $G_2$, and $G_3$ are three API call graphs built from line 1 to line 6, line 7 to line 10, and line 11 to line 13 in Figure 2, respectively. As illustrated in Figure 3, each API call is used to construct a node of the graph, and the arguments are used to connect two API calls based on dependencies. For example, the value 0x0000044c of Handle is used to connect the RegQueryValue on line 2. The details of API call graph construction are described in our previous work [43].

Table 1: Summary of malware detection and classification work.

| Work | Feature | Remark |
| Eskandari et al. [27] | API call sequence | Simple, vulnerable to reorder or irrelevant API calls |
| Lee et al. [28] | API call sequence | Simple, vulnerable to reorder or irrelevant API calls |
| Hansen et al. [29] | API call sequence; arguments; frequency | Simple, vulnerable to reorder or irrelevant API calls |
| Amin et al. [30,31] | Opcode | End-to-end learning |
| D'Angelo et al. [32] | API call sequence-based image | End-to-end learning |
| Park et al. [34] | Behavioral graph | High dimensional features can bring more calculations |
| Elhadi et al. [11] | API call graph | High dimensional features can bring more calculations |
| Nikolopoulos and Polenakis [35] | System call dependency graph | High dimensional features can bring more calculations |
| Fredrikson et al. [37] | Optimally discriminative specification | Simplified representation of behavior graphs |
| Alam et al. [40] | Control flow graph-based feature | Simplified representation of behavior graphs |
| Ding et al. [41] | API dependency graph | Simplified representation of behavior graphs |

It is necessary to extract crucial behaviors from the API call graph for malware classification. For each API call graph, we first identify the root and leaf nodes. The root node is a node with no input information, and a leaf node is a node whose output is null in our system. We also need to extract subgraphs from the established API call graph. The extracted subgraphs are simple no-branching graphs that start at the root node and end with a leaf node. We obtained all subgraphs as follows.
When there is only one branch in the API call graph, the extracted subgraph is the same as the API call graph. We divide the subgraph into fragment behaviors through a sliding window. The behavior in a sliding window is a fragment behavior. The fragment behaviors are a set of behaviors that can accomplish a part or a certain function. The extracted fragment behavior is represented by NSG in our method. The size of the sliding window determines the maximum number (N) of nodes in an NSG.
We show the process of some NSGs extracted from the subgraph of $G_1$ in Figure 4. When $N = 3$ (3SG), SG1 in the first window is the first 3SG extracted from the subgraph. The sliding window of size 3 slides from top to bottom at intervals of 1. Different colors of the window reflect the moving trail of the sliding window. We can see that the fragment behavior in the fourth window is the same as the fragment behavior in the second window. Hence, three unique 3SGs, SG1, SG2, and SG3, are extracted from the subgraph. For $G_2$ and $G_3$, we notice that the number of nodes in each subgraph is smaller than 3 when we want to build 3SG. In this condition, the subgraph is regarded as a 3SG, which does not need to be divided. When all NSGs are extracted from all subgraphs of a program, we obtain NSGS. This set is used to represent the program behavior. The combination of NSGs contains the complete semantic information of a program's API call graph. Our goal is to describe malware with an appropriate fragment behavioral representation by searching for N.
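The sliding-window step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a subgraph is modelled as an ordered list of API names, the function name `extract_nsgs` is hypothetical, and the first API name in the example path (`RegOpenKey`) is a placeholder, since the text only says that the registry is created and opened before the data are repeatedly retrieved and set.

```python
from typing import List, Sequence, Tuple

def extract_nsgs(path: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Slide a window of size `n` (step 1) over a no-branching subgraph.

    Duplicate windows are dropped, keeping the first occurrence. A path with
    at most `n` nodes is returned unchanged as a single fragment.
    """
    if n < 1:
        raise ValueError("window size must be positive")
    if len(path) <= n:
        return [tuple(path)]
    seen = set()
    unique: List[Tuple[str, ...]] = []
    for start in range(len(path) - n + 1):
        window = tuple(path[start:start + n])
        if window not in seen:
            seen.add(window)
            unique.append(window)
    return unique

# Six-node path resembling the subgraph of G1 (first name is a placeholder).
g1_path = ["RegOpenKey", "RegQueryValue", "RegSetValue",
           "RegQueryValue", "RegSetValue", "RegQueryValue"]
g1_3sgs = extract_nsgs(g1_path, 3)

# A path shorter than the window is kept whole.
short_3sgs = extract_nsgs(["CreateFile", "CloseHandle"], 3)
```

With this path, four windows are visited and the fourth repeats the second, so three unique 3SGs remain, which matches the count described for Figure 4.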
Algorithm 1 describes our proposed API Call Repartition Algorithm. The input of this algorithm is the API call graph $(G_1, G_2, \ldots, G_r)$. For an API call graph $G_i$, we first search for the root and leaf nodes. Lines 3 and 4 describe the root node and leaf nodes in an API call graph. We extract subgraphs from an API call graph in line 5. The extracted subgraphs start from the root node and end with leaf nodes. It is worth mentioning that the simple paths from the root node to a certain leaf node in an API call graph may occur more than once. For each subgraph in line 8, if the order of a subgraph is smaller than N, NSG is the same as the subgraph. Otherwise, we should extract the appropriate NSG based on the sliding window. The output of this algorithm is NSG. In this algorithm, we transform the original API call graph into fragment behaviors NSG. The subgraph is a no-branching fragment behavior extracted from API call graphs. Hence, the number of subgraphs $n$ is no less than the number of API call graphs $r$ ($n \geq r$). In the same way, $m \geq n$. An important problem of this issue is the value of N. Our goal is to better describe malware with an appropriate N as small as possible.
The API call repartition algorithm removes two types of similar behaviors: internal similarity and external similarity. Internal similarity refers to the similarity of NSGs within a subgraph. External similarity represents the similarity of NSGs among different subgraphs. Internal similarity is generally caused by repeatedly executing API calls. For example, if a program repeatedly invokes RegQueryValue and RegSetValue in Figure 4,

Algorithm 1: API call repartition algorithm.
(3) Find the root node $V_0$ in $G_i$
(4) Find leaf nodes $(V_k, V_{k+1}, \ldots, V_{k+r})$ in $G_i$
(5) Extract subgraphs from $V_0$ to leaf nodes
(6) End
(7) Obtain all extracted subgraphs $(G'_1, G'_2, \ldots, G'_n)$
(8) For $i = 1$ to $n$ do // $i$ represents the $i$-th subgraph and $m \geq n$
(9) If the order $h$ in $G'_i$ is smaller than N: $h \leq N$
(10)

Feature Extraction.
The characteristic of the program is represented as the fragment behavior set NSGS by applying the API call repartition algorithm, which eliminates similar behaviors. To remove unimportant ones, we need to calculate the crucial coefficient of each NSG in NSGS. We propose a method that exploits the idea of TF-IDF and IG to evaluate the importance of an NSG.
We have four malware families and different types of benign samples in our proposed system. Different types of benign samples are defined as one benign family. Hence, we have five categories; the category set is represented as $C$, where $C = (C_0, C_1, C_2, C_3, C_4)$. Each family has $k$ samples (in our proposed system, $k = 880$).
TF-IDF's main idea is that a fragment behavior NSG is appropriate for selecting as a crucial behavior when it appears with a high frequency (TF) in a category and appears with a low frequency in other categories. For IDF, NSG is appropriate for selecting as a CNSG when a fragment behavior NSG i appears in a small number of categories.
We consider that a fragment behavior NSG appears p times in a category C j (C j ∈ C). In addition, it appears q times in other categories except for C j . Hence, fragment behavior appears (p + q) times altogether. NSG is a crucial behavior of C j when p is large enough, which means that the value of Cru(NSG i ) is very high. However, the value of IDF(NSG i ) is relatively small because of the large (p + q).
We present the improved TF-IDF-like measure by applying IG, which is described in our previous work [20]. IG is defined as how much information the feature brings to the system. The more information this feature brings to the system, the more important the feature is. The fragment behavior NSG appears $p$ times in the category $C_j$ and appears $q$ times in other categories except for $C_j$. When $p$ is large enough, the value of $IG(NSG_i)$ is sufficient to select NSG as a crucial behavior.
Based on the TF-IDF and IG, we derive a symbolic expression for calculating the coefficient of NSG as follows: where $\mathrm{TF\text{-}IDF}(NSG_i)$ represents the value of the TF-IDF-like measure and $IG(NSG_i)$ stands for the value calculated by IG. The improved TF-IDF-like method determines the effects of different factors of the TF-IDF-like measure and IG by finding an appropriate $\alpha$ ($0 < \alpha < 1$). $f(NSG_i, C_j)$ is the number of times $NSG_i$ appears in family $C_j$, $|C_j|$ is the dimension of $C_j$, and $|\{C_j \mid NSG_i \in C_j\}|$ is the number of samples which contain $NSG_i$.
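To make the roles of IG and $\alpha$ concrete, here is a hedged Python sketch. The information gain is the standard formulation for a binary presence feature. The combination rule in `crucial_coefficient` is purely an assumption made for illustration (a convex combination weighted by $\alpha$); the exact expression, including which term $\alpha$ weights, is not reproduced in the text above. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Set

def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of a label sequence; 0.0 for an empty one."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(samples: List[Set[str]], labels: List[str],
                     feature: str) -> float:
    """IG of the binary feature 'sample contains `feature`' w.r.t. the labels."""
    present = [y for s, y in zip(samples, labels) if feature in s]
    absent = [y for s, y in zip(samples, labels) if feature not in s]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional

def crucial_coefficient(tfidf_value: float, ig_value: float,
                        alpha: float) -> float:
    """ASSUMED form: alpha * TF-IDF-like value + (1 - alpha) * IG."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * tfidf_value + (1.0 - alpha) * ig_value

# Toy data: each sample is the set of fragment identifiers it contains.
samples = [{"a", "x"}, {"a", "x"}, {"b", "x"}, {"b", "x"}]
labels = ["Delf", "Delf", "benign", "benign"]
ig_a = information_gain(samples, labels, "a")   # separates the classes
ig_x = information_gain(samples, labels, "x")   # present everywhere
coef = crucial_coefficient(0.2, ig_a, alpha=0.7)
```

In this toy data, fragment `a` perfectly separates the two categories and therefore carries one full bit of information, while fragment `x` occurs in every sample and carries none.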

Results
This section describes the dataset and the evaluation method in Section 5.1. Section 5.2 shows the experiment and evaluation results.

Dataset and Evaluation Method.
To demonstrate that our method is effective in detecting malware, a set of malware classification experiments is presented in this section. To ensure the fairness and effectiveness of the experiment, we selected the same number of samples from each family, namely Delf, Small, Zlob, and OBfuscated. To prevent confusion with obfuscated malware, we use OBfuscated, which begins with two uppercase letters, to represent the Trojan.Win32.Obfuscated family. In addition, we downloaded 880 benign samples from different websites. More precisely, benign samples consist of Desk Widget, Facebook Messenger, Google Earth, Matlab, Minclock, and Quicktime player.
Ubuntu is selected as the operating system to run a standard Cuckoo sandbox. First, we process malware samples in bulk by further developing the Cuckoo sandbox. Each sample was executed several times. After a comprehensive analysis, the samples that performed malicious behaviors were selected for experimental analysis. Fileless malware can delete all the files it saves on the infected system disk, inject code into running processes, and use PowerShell, Windows Management Instrumentation, and other technologies to make detection and analysis difficult. This antianalysis method can bypass hooks deployed in automated analysis sandboxes (such as the Cuckoo sandbox). This article does not focus on fileless malware and other evasion circumstances. Second, to ensure the fairness and effectiveness of the experimental results, we select 880 samples for each malware family and for the benign family. Finally, we perform 10-fold cross-validation. In 10-fold cross-validation, we divide all dataset samples into ten parts. To guarantee the proportion of each family, we choose nine parts for training and the last part for testing each time. The experiments are repeated ten times and the accuracy is the average of the experimental results.
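The fold construction described here can be illustrated with a small pure-Python sketch that keeps the proportion of each family equal across the ten parts. The helper names `stratified_folds` and `train_test_splits`, the random seed, and the label strings are assumptions for illustration; the paper does not state which tooling was used for the split.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple

def stratified_folds(labels: Sequence[Hashable], k: int = 10,
                     seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, preserving per-class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of one class round-robin over the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists, using each fold once as the test part."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Five categories with 880 samples each, as in the described dataset.
families = ["Delf", "Small", "Zlob", "OBfuscated", "benign"]
labels = [name for name in families for _ in range(880)]
folds = stratified_folds(labels, k=10)
first_train, first_test = next(train_test_splits(folds))
```

With 880 samples per category, every fold receives 88 samples of each family, so each round trains on 3,960 samples and tests on 440.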
In our proposed malware classification method, TP, FN, FP, TN, TPR, and FPR in the formulas are defined in Table 2. This definition uses Delf, a malware family in our work, as an example.
TP represents the number of samples that belong to Delf and are correctly classified as Delf.
FN is the number of samples that belong to Delf but are not classified as Delf.
FP indicates the number of samples that do not belong to Delf but are classified as Delf.
TN indicates the number of samples that do not belong to Delf and are not classified as Delf.
The common performance measure of accuracy is defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
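The one-vs-rest counts for a single family, and the rates derived from them, can be sketched in Python as follows. The sketch assumes the standard definitions TPR = TP / (TP + FN), FPR = FP / (FP + TN), and accuracy = (TP + TN) / (TP + TN + FP + FN); the function names and the toy predictions are hypothetical.

```python
from typing import Dict, Sequence

def one_vs_rest_counts(y_true: Sequence[str], y_pred: Sequence[str],
                       family: str) -> Dict[str, int]:
    """Count TP, FN, FP, TN for `family` treated as the positive class."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == family and pred == family:
            counts["TP"] += 1
        elif truth == family:
            counts["FN"] += 1
        elif pred == family:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts

def rates(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive TPR, FPR and accuracy (assumes both classes are non-empty)."""
    tp, fn, fp, tn = (counts[key] for key in ("TP", "FN", "FP", "TN"))
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "ACC": (tp + tn) / (tp + fn + fp + tn),
    }

# Toy labels for illustration only; these are not experimental results.
y_true = ["Delf", "Delf", "Delf", "Small", "Zlob", "benign"]
y_pred = ["Delf", "Delf", "Small", "Delf", "Zlob", "benign"]
delf_counts = one_vs_rest_counts(y_true, y_pred, "Delf")
delf_rates = rates(delf_counts)
```

Repeating the same computation with each family in turn as the positive class gives the per-family figures that an ROC analysis is built on.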

Experiment and Evaluation Results.
RF, SVM, DT, and KNN are employed to evaluate the detection effectiveness of our method and to explore the impact of α in malware classification. We vary α from 0.1 to 0.9 to observe its effect on different classifiers. The effect of α on the accuracy of different CNSGs and classifiers is shown in Figure 5.
The horizontal axis of Figure 5 indicates the value of α. The vertical axis of Figure 5 is the average accuracy we obtained from the 10-fold cross-validation.
We can see from Figure 5 that RF has good performance in malware classification based on the behavioral fragment CNSG. In Figure 5(a), the average accuracy of RF is higher than that of the other classifiers for different values of α. The average accuracy of RF increases first and then decreases with the increase of α. The average accuracy of RF reaches the optimal value when α is equal to 0.7. In Figure 5(b), when the value of α is between 0.1 and 0.3, the average accuracy of SVM is optimal. When α is greater than 0.3, the average accuracy of RF is the best and is slowly increasing. The average accuracy of RF reaches the optimal value when α is equal to 0.9. In Figures 5(c) and 5(d), the average accuracy of RF is better than that of the other three classifiers, and the highest average accuracy is achieved when α is equal to 0.6 and 0.7, respectively. It can be seen from Figure 5 that, as α changes, the average accuracy of CNSG classified by different classifiers changes noticeably. In other words, exploring changes in α has a positive impact on malware classification. α is an indispensable factor in malware classification. We can conclude that IG can well compensate for the shortcomings of the TF-IDF-like measure when the optimal value of α is obtained.
To prove the validity of our improved TF-IDF-like measure, we compare the TF-IDF-like measure with our proposed method. Table 3 describes the average accuracy of the TF-IDF-like measure and the improved TF-IDF-like measure. We can see from Table 3 that, with different classifiers and CNSGs, the improved TF-IDF-like measure is better than the TF-IDF-like measure in most cases.
The experimental results also demonstrate that IG can make up for the deficiency of TF-IDF in malware classification. When α is 0.9, C4SG has the highest classification accuracy (when the classifier is RF), which is as high as 95.27%. Based on the experimental results, we select C4SG as the final fragment behavior.
For malware detection, we select the optimal value of α obtained in malware classification. We draw a Receiver Operating Characteristic (ROC) curve in Figure 6. The horizontal axis of Figure 6 represents FPR, and the vertical axis of Figure 6 is TPR. The ROC curve reflects the correlation between FPR and TPR. It can be calculated from Figure 6 that the accuracy is as high as 99.7% with an FPR of 1.2%. The experimental results show that C4SG is promising in malware detection.
For malware classification, an example of the ROC curve is depicted in Figure 7. It illustrates the classification performance of C4SG detected by RF. Four pictures with the detection performance of Delf, OBfuscated, Small, and Zlob are presented. Figure 7(a) describes the detection performance of Delf. In Figure 7(a), we compare the ROC curve of API sequence (4 gram), C4SG, and subgraph. We can see from Figure 7(a) that the performance of C4SG is better than the subgraph and API sequence and the performance of the subgraph is better than the API sequence. Figure 7(b) describes the detection performance of OBfuscated. We can see from Figure 7(b) that both C4SG and subgraph obtained better detection performance than API sequence and C4SG is better than the subgraph. Figures 7(c) and 7(d) represent the detection performance of Small and Zlob, separately. In Figures 7(c) and 7(d), C4SG has better detection performance than the subgraph and API sequence, and the detection performance of the subgraph is better than the API sequence.
Subgraph and C4SG contain many API calls and their dependencies. Hence, the semantics in subgraph and C4SG are more abundant than in the API sequence. C4SG achieves a better detection performance than the subgraph. This effectively proves that the C4SG we built is suitable for malware classification.
For malware detection and classification, we make a comparison with some related models, i.e., Fredrikson et al. [37], Alam et al. [40], and Ding et al. [41], in Table 4. Our malware detection result compares favorably with those of the related studies. For malware classification, the authors of [41] have surpassed our results; we note that Delf, Small, and Zlob in our experiment share some of the same malicious behavior, which may be an important cause of the reduction in classification accuracy.

Discussion
We summarize the limitations of the system in this section. In addition, possible solutions to these limitations are suggested.
The main premise of our proposed malware classification method is that we observe malicious activities by executing samples in the Cuckoo sandbox. Sandboxes are widely used for detecting malware in dynamic analysis. Nevertheless, certain malware samples can evade detection by analyzing the virtual environment to avoid executing malicious operations. In addition, malware writers can also use some

The proposed method is very promising for family classification, but there are mispredictions. In our experiments, some Delf samples are detected as Small and Zlob. The main reason for this misclassification is that they share some of the same malicious behavior. Delf generally downloads and runs files on a designated IP and port, causing the malware to run automatically on remote hosts. Small usually infects a computer and connects to remote servers to download malware. Zlob is a Trojan that provides unauthorized remote access to infected computers. That is to say, Delf, Small, and Zlob perform remote connection operations and have similar behaviors to each other. In addition, the graph-based sequence may have certain limitations. Therefore, in future work, we intend to explore the similarity of CNSG in the form of the graph.
To overcome the shortcomings of traditional detection models, we also need to explore some state-of-the-art models, i.e., Amin et al. [30], Amin et al. [31], and D'Angelo et al. [32]. We will explore deep learning-based methods to improve the detection rate of malware. The dataset for malware analysis is relatively small. Larger numbers of malware samples may yield better results. Therefore, more samples are needed to implement a large multiclassification. This is also the work we want to do in the future.

Conclusions
In this paper, we propose a dynamic malware analysis method that relies on novel feature representation and extraction for malware classification. The proposed feature representation measure transforms malware behavior into fragment behavior. Moreover, the improved feature extraction measure is utilized to extract crucial behaviors of malware families. The experimental results show that the proposed C4SG achieves a promising performance of 95.27% detected by RF in malware classification.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.