SHAP Interpretations of Tree and Neural Network DNS Classifiers for Analyzing DGA Family Characteristics

Domain Generation Algorithms (DGA's) have been employed by botnet orchestrators for controlling infected hosts (bots) while evading detection, by performing multiple DNS requests, mostly for non-existent domain names. With blacklists rendered ineffective, modern DGA filtering methods rely on Machine Learning (ML). Emerging needs for higher intrusion detection accuracy lead to complex, non-interpretable black-box classifiers, thus requiring eXplainable Artificial Intelligence (XAI) techniques. In this paper, we utilize SHapley Additive exPlanation (SHAP) to derive model-agnostic, post-hoc interpretations of DGA name classifiers. This method is applied to binary supervised tree-based classifiers (e.g. eXtreme Gradient Boosting - XGBoost) and deep neural networks (Multi-Layer Perceptron - MLP) to assess domain name feature importance. SHAP visualization tools (summary, dependence and force plots) are used to rank features, investigate their effect on model decisions and determine their interactions. Specific interpretations are detailed for identifying names belonging to common DGA families pertaining to arithmetic, wordlist, hash and permutation based schemes. Learning and interpretations are based on up-to-date datasets, such as Tranco for benign and DGArchive for malicious names. Domain name features are extracted directly from dataset instances, thus limiting time-consuming and privacy-invasive database operations on historical data. Our experimental results demonstrate that SHAP enables explanations of the XGBoost (the most accurate tree-based model) and MLP classifiers and indicates the characteristics of specific DGA schemes commonly employed in attacks. In conclusion, we envision that XAI methods will expedite ML deployment in networking environments where justifications for black-box model decisions are required.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Sotirios Goudos.

Machine Learning (ML) algorithms have been widely employed within the cybersecurity domain for effectively filtering massive amounts of data and classifying malignant traffic. Such algorithms have been commonly used in the field of botnet traffic detection and for classifying names originating from Domain Generation Algorithms (DGA's) [1]. Tree-based ML classifiers and deep neural networks are utilized to differentiate between legitimate and malicious Domain Name System (DNS) names with promising accuracy results.
Development of DGA name classifiers has been motivated by the desire for ML models of higher performance. Therefore, simple and intrinsically explainable ML classifiers have been replaced by complex, black-box models that are not interpretable. Thus, developers are incapable of understanding their models to debug them and assert their intended operation, while users cannot receive justifications on model decisions made on their data. Finally, regulators are unable to ensure that models deployed within critical infrastructures comply with General Data Protection Regulation (GDPR) [2] or equivalent legislations.
The aforementioned limitations led to investigations for eXplainable Artificial Intelligence (XAI) techniques [3] to provide interpretations (and possibly explanations) on ML model operation. As mentioned in [4], post-hoc and model-agnostic XAI algorithms are typically preferred. Post-hoc algorithms are applied to ML models after learning is completed; model-agnostic ones are independent of the selected ML models, e.g. tree classifiers and neural networks. Explanations may be (i) global, detailing model behavior on entire sets of sample points, or (ii) local, reporting how models make classification decisions for specific inputs. A promising post-hoc and model-agnostic approach is SHapley Additive exPlanation (SHAP) [5], [6], which is capable of global and local explainability.
Our work leverages XAI to analyze the operation of binary, supervised DGA name classifiers that distinguish between legitimate and malicious names, thus detecting botnet traffic abusing DNS. We train and evaluate various tree-based classifiers (Random Forests - RF's, Gradient Boosting - GB, eXtreme Gradient Boosting - XGBoost, Adaptive Boosting - AdaBoost, Extremely Randomized Trees - ExtraTrees) and a deep neural network (Multi-Layer Perceptron - MLP). SHAP is subsequently employed to determine and compare the classification criteria of XGBoost [7], which was the most accurate tree model, and the MLP deep neural network [8] in a post-hoc and model-agnostic manner. Our experimental analysis focuses on global and local model interpretations used to rank the impact of utilized features and indicate how their individual values contribute to classification decisions. Relying on multiple SHAP visualization tools (i.e. summary, dependence and force plots [3], [6]) we investigate how the developed models (i) differentiate between benign and malicious domain names and (ii) identify which features have the most significant contribution in classifications of names originating from well-known fundamental DGA generation schemes that produce malicious names [1]. Learning and interpretations are based on linguistic and statistical features, directly extracted from domain names included within up-to-date datasets of benign and malignant DNS names.
Our main contributions are summarized as follows:
• SHAP-based interpretations of DGA name classifiers based on deep neural networks (MLP's) and comparison of their decision-making criteria versus tree-based ML models (XGBoost). (Note: throughout our paper, DNS names are considered malicious if they are produced by DGA's; non-DGA names, even those related to malignant activities, e.g. malware propagation, are labeled as benign names in the training set.)
• Identification of dominant features utilized for malicious domain name detection pertaining to specific DGA generation schemes (arithmetic, wordlist, hash and permutation based).
• Extraction of linguistic and statistical features leading to accurate and real-time classification of DGA names with no reliance on time-consuming and privacy-sensitive external repository operations.
• Training and interpretations based on the most updated and inclusive dataset of DGA names, i.e. the DGArchive repository [1], [9] including 105 DGA families.
The remainder of this paper is structured as follows: Section II provides brief background and summarizes related work; Section III provides a high-level overview of our methods used for interpreting DGA name classifiers; Section IV elaborates on implementation details pertaining to our approach; Section V includes our experimental results and interpretations of DGA name classifiers based on XGBoost and MLP. Finally, in Section VI we conclude our work and discuss future steps.

II. BACKGROUND AND RELATED WORK
This section provides brief background on concepts used in our paper (subsection II-A), outlines related research approaches (subsection II-B) and details our key contributions (subsection II-C).
A. BACKGROUND
1) DOMAIN GENERATION ALGORITHMS (DGA's)
The seeding strategy, the number of domain names produced by a bot and their structure are determined by the DGA family. Although there are various families with diverse characteristics, DGA's are grouped into the following four generation schemes [1], based on the technique utilized to produce domain names:
• Arithmetic-based: These algorithms generate sequences of random values. DGA names are constructed by concatenating the ASCII representations corresponding to these values, or by using them to locate characters within lists that constitute the DGA alphabet.
• Wordlist-based: DGA names are generated by randomly concatenating dictionary words. Thus, domain name randomness is reduced, rendering malicious name detection more complicated.
• Hash-based: Domain names are constructed by hashing alphanumeric strings and returning their hexadecimal representation.
• Permutation-based: They generate at random a domain name, which is subsequently permuted several times to produce multiple DGA names.
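To make these four schemes concrete, the following Python sketch produces toy names per scheme; the alphabet, word list, lengths and seeding are illustrative assumptions and do not reproduce any real DGA family.

```python
import hashlib
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
WORDLIST = ["secure", "mail", "cloud", "update", "report"]  # toy dictionary

def arithmetic_dga(seed, length=12):
    """Arithmetic-based: index a pseudo-random sequence into the alphabet."""
    rng = random.Random(seed)
    return "".join(ALPHABET[rng.randrange(len(ALPHABET))] for _ in range(length))

def wordlist_dga(seed, words=2):
    """Wordlist-based: concatenate randomly chosen dictionary words."""
    rng = random.Random(seed)
    return "".join(rng.choice(WORDLIST) for _ in range(words))

def hash_dga(seed):
    """Hash-based: hexadecimal digest of a seed string (MD5 -> 32 hex digits)."""
    return hashlib.md5(str(seed).encode()).hexdigest()

def permutation_dga(seed, count=3):
    """Permutation-based: shuffle one random name several times."""
    rng = random.Random(seed)
    base = list(arithmetic_dga(seed))
    names = []
    for _ in range(count):
        rng.shuffle(base)
        names.append("".join(base))
    return names
```

For instance, hash_dga() always yields 32 hexadecimal digits, mirroring MD5-based families such as Bamital, while permutation_dga() returns several rearrangements of a single random name.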

2) SHAPLEY ADDITIVE EXPLANATION (SHAP)
SHAP is a model-agnostic, post-hoc XAI method related to cooperative game theory. In cooperative games, players collaborate to achieve a pay-off, which is subsequently split based on participant contributions. Accordingly, features are considered as participants that tune a classifier and subsequently SHAP determines feature importance by estimating the effect of specific features on classification decisions when these features are present and absent. SHAP delivers global and local explanations on ML model decisions, whereas various visualization tools facilitate interpretations, e.g. summary plots, dependence plots and force plots. Model-agnostic SHAP is typically based on the KernelExplainer [22] method; this approximates feature importance via a weighted linear regression model applied to input instances (sample points). SHAP time complexity mainly depends on the dataset size. Enabling execution within reasonable time frames may require clustering and/or subsampling a given dataset. This process extracts the eXplainability Background Instances (XBI's) used for tuning SHAP values and eXplainability Test Instances (XTI's) utilized for generalizing model interpretations.
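At its core, SHAP estimates the Shapley value of each feature: the weighted average of its marginal contribution over all feature coalitions, with absent features filled in from background data. A minimal exact computation is sketched below; it is exponential in the number of features (which is exactly why KernelExplainer approximates it), and the model `f`, instance and baseline in the usage are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at instance x.

    Features absent from a coalition take their baseline value, mirroring
    how SHAP fills in "missing" features from background instances.
    """
    n = len(x)
    idx = list(range(n))
    phi = [0.0] * n
    for i in idx:
        others = [j for j in idx if j != i]
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                # Shapley coalition weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in idx]
                without_i = [x[j] if j in S else baseline[j] for j in idx]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# Hypothetical linear "classifier": Shapley values reduce to w_i * (x_i - b_i).
f = lambda v: 2 * v[0] + 3 * v[1] - v[2]
phi = shapley_values(f, x=[1, 2, 3], baseline=[0, 0, 0])
```

By the efficiency property, the values sum to f(x) minus f(baseline), which is the additivity SHAP relies on when attributing a single prediction to its features.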
B. RELATED WORK
Interpreting DGA name classifiers has recently attracted significant interest. In [25] neural network classifiers are interpreted based on their weights. A system for result visualization is also presented to facilitate model comprehension. However, interpretations rely on model-specific XAI methods applicable exclusively to deep learning models, whilst the total features are limited for visualization purposes. In [26] multi-class DGA name classifiers are developed based on features directly extracted from domain names and feature importance is assessed using various statistical methods. Nevertheless, [26] is limited to global explainability of DGA classifiers, thus neglecting model interpretations on specific DNS names. Moreover, the effect of different DGA schemes on model decisions is not addressed.

In [27], [28], and [29] SHAP and/or equivalent XAI techniques (e.g. Local Interpretable Model-Agnostic Explanation -LIME [30] and Counterfactual Explanations [31]) are employed to provide global and local interpretations on binary DGA name classifiers. Although the aforementioned approaches deliver promising results, they are limited mainly to tree-based ML classifiers. These approaches focus on interpreting how names are classified as benign or malicious, therefore neglecting how the characteristics of different DGA families affect classification decisions. Furthermore, feature calculation in [27] and [29] requires resource-intensive operations on databases involving historical data, e.g. IP reputation lists, WHOIS lookups and Time To Live (TTL) values from DNS responses. These are usually time-consuming and may raise privacy concerns.

C. KEY CONTRIBUTIONS
Our approach relies on SHAP for model-agnostic (regardless of the selected models) and post-hoc (after the learning procedure is completed) validation of DGA name classifier operation. Our models are based on features extracted entirely from given names, hence resource-intensive operations on privacy-sensitive historical DNS data are not required. We compare interpretations derived from tree-based models (i.e. XGBoost) and neural networks (i.e. MLP's) using both global and local explanations. Notably, we extend related approaches by analyzing how binary classifier feature rankings perform when facing diverse DGA schemes, following testing methods used in use cases from other domains, such as radio communications and health systems [32], [33]. Finally, malicious DNS data used for training and interpreting our models are selected from DGArchive; we included 105 DGA families, a significantly higher number compared to [27], [28], and [29].

III. OVERVIEW
This section outlines the design principles of our analysis (subsection III-A) and provides a baseline description of our proposed schema for developing and interpreting DGA name classifiers (subsection III-B).

A. DESIGN PRINCIPLES
The main design principles of our approach are:
• Model-agnostic ML interpretations: We leverage the SHAP KernelExplainer [22] to interpret our DGA name classifiers independently of the underlying ML model. Therefore, we analyze the operation of tree-based and deep neural network classifiers in a unified manner.

B. PROPOSED SCHEMA
The Administrator initially selects the learning dataset that will be utilized for tuning DGA name classifiers (step 1). The selected data consist of benign and malicious (i.e. DGA generated) DNS names labeled for binary classification purposes. Malicious dataset labels include the DGA algorithm used for name construction; such information is typically available from reverse engineering efforts on DGA malware installed within infected hosts [34].
Details of the Learning Module operation are subsequently determined (step 2). The Administrator defines the model specifications required for tuning name classifiers, i.e. the ML algorithm, the model hyperparameters and the selected features. The learning dataset is then retrieved (step 3) and preprocessed (step 4) based on the selected features and ML model details. The DGA Classifier is subsequently trained and evaluated (step 5), while assessment results and tuned model parameters are returned to the Administrator (step 6).
Upon completion of the learning phase, the Administrator configures the Explainability Module by determining the reduced dataset instances required for SHAP execution (step 7). This step refers to the clustering and subsampling processes required for keeping the SHAP running time within feasible time periods. In steps 8 and 9 the Learning Module feeds the trained DGA Classifier, the selected features and the preprocessed dataset to the Explainability Module. This dataset is then clustered and subsampled (step 10) to derive the instances required for SHAP; the eXplainability Background Instances (XBI's) used in SHAP calculations for assessing feature importance and the eXplainability Test Instances (XTI's) consisting of the input sampling points used to eventually derive model interpretations. Note that, in our case XTI's were subsampled from the class of malignant DGA names since our purpose was to assess feature importance per DGA generation scheme.
After SHAP analysis is completed (step 11), the Explainability Module provides the Administrator with global and local model-agnostic interpretations of the trained classifiers (step 12). The Administrator gathers the Learning and Explainability module results to validate model operation (step 13). If the classifier accuracy and explanations are satisfactory, the Administrator deploys appropriate DGA filtering procedures within the Recursive DNS Server (step 14).
In step 15, ingress DNS requests from DNS Clients are inspected by the Recursive DNS Server (step 16). Malicious DNS requests are dropped, whereas legitimate ones are resolved by the DNS Software, e.g. BIND [35], installed within the Recursive DNS Server (steps 17 and 18).

IV. IMPLEMENTATION DETAILS
This section elaborates on feature selection (subsection IV-A), on the development and operations of the Learning Module (subsection IV-B) and on details pertaining to the Explainability Module (subsection IV-C).

A. SELECTED FEATURES
We leverage feature values that are directly extracted from given domain names and denote linguistic properties (e.g. values denoting the number of vowels) and statistical measures (e.g. entropy values). Such features facilitate real-time DNS traffic inspection and limit sensitive data exchanges by not requiring storage of privacy-sensitive information. As already stated, we do not employ historical data features (e.g. time-based patterns of DNS responses and IP reputation measures), which typically require excessive processing resources and whose storage may raise privacy concerns [17].
Prior to feature extraction, valid DNS suffixes (one or multiple zone namespaces, e.g. ''.com'' and ''.gov.uk'') are removed from domain names as in [17]. These are not generated by DGA's, hence they are not meaningful to the learning process. Identification of valid DNS suffixes is based on the Mozilla public suffix list [36]. Note that removing these suffixes mapped multiple distinct names to common prefixes within the learning dataset, e.g. ''google.com'' and ''google.fr'' were both reduced to ''google''. As a result, classifiers are tuned towards accurately recognizing frequently requested DNS names; their appearance frequency within the dataset reflects specific trends of DNS queries resolved by Recursive DNS Servers.
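A minimal sketch of this suffix removal, assuming a toy stand-in for the Mozilla public suffix list (the real list contains thousands of entries):

```python
# Toy stand-in for the Mozilla public suffix list (publicsuffix.org);
# only a handful of entries are included here for illustration.
PUBLIC_SUFFIXES = {"com", "net", "org", "fr", "gov.uk", "co.uk"}

def strip_suffix(name):
    """Remove the longest matching public suffix from a domain name."""
    labels = name.lower().split(".")
    for cut in range(1, len(labels)):  # smaller cut = longer candidate suffix
        suffix = ".".join(labels[cut:])
        if suffix in PUBLIC_SUFFIXES:
            return ".".join(labels[:cut])
    return name  # no known suffix found; keep the name unchanged
```

With this sketch, both ''google.com'' and ''google.fr'' reduce to ''google'', and multi-label suffixes such as ''gov.uk'' are removed as a whole.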
The features used for DGA name classification are outlined in Table 1; feature selection was based on approaches available from the literature, e.g. [14], [17], [37]. In the following, features 44, 47, 48 and 50 are further analyzed:
• Vowel_Freq (feature 44): Determines the number of vowels included within the domain name, i.e. letters a, e, i, o, u and y; considering y as a vowel typically increases classification accuracy, as reported in [20].
• Reputation (feature 47): Evaluates domain name Reputation defined as an indication of its legitimacy [38]; the higher the Reputation the more legitimate the name may appear. A method for measuring the reputation score of a domain name is the appearance frequency of N-grams (i.e. sequences of N consecutive characters) present in benign names and absent in malignant ones [39].
Estimating Reputation requires a preprocessing stage whereby a whitelist is constructed based on the N-grams derived from a set of legitimate DNS names (e.g. the Tranco list [40]). Reputation of a given domain name is evaluated by determining how many of its N-grams are included in the aforementioned whitelist. N values are selected between 3 and 7 characters as in [39]; unigrams (i.e. N = 1) and bigrams (i.e. N = 2) are excluded because most of them exist in both legitimate and malicious names, thus affecting the learning process and hindering feature importance.
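The Reputation computation can be sketched as follows; scoring a name by the fraction of its N-grams found in the whitelist is one plausible choice, assumed here for illustration:

```python
def ngrams(name, n_min=3, n_max=7):
    """All N-grams of a name for N between n_min and n_max (as in [39])."""
    return {name[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(name) - n + 1)}

def build_whitelist(benign_names):
    """Union of N-grams observed across a set of legitimate names."""
    whitelist = set()
    for name in benign_names:
        whitelist |= ngrams(name)
    return whitelist

def reputation(name, whitelist):
    """Fraction of the name's N-grams present in the whitelist."""
    grams = ngrams(name)
    return len(grams & whitelist) / len(grams) if grams else 0.0
```

A fully whitelisted name scores 1.0, while a random-looking name sharing no N-grams with the benign set scores 0.0, matching the intuition that higher Reputation suggests legitimacy.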
• Words_Freq (feature 48): Determines the number of meaningful words within given names. Words are extracted using the Wordninja Natural Language Processing (NLP) tool [41] similarly to [42]. Wordninja probabilistically splits strings into concatenated words based on the unigram frequency of words appearing within the English Wikipedia. As in [43], words shorter than 3 characters (e.g. pronouns and articles) are ignored as their effect to the learning process is not significant.
• Entropy (feature 50): Estimates domain name randomness using Shannon Entropy [17]. We used the standard definition of entropy:

H(X) = − Σ_{x∈X} p(x) · log2 p(x)

where X is the set of characters included within a DNS name and p(x) the frequency of character x ∈ X.
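The entropy feature follows directly from the character frequencies of a name:

```python
from collections import Counter
from math import log2

def shannon_entropy(name):
    """H(X) = -sum over x of p(x) * log2 p(x), over the name's characters."""
    counts = Counter(name)
    total = len(name)
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

A single repeated character gives entropy 0, while names drawing uniformly from many characters (typical of arithmetic- and hash-based DGA's) score higher.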

B. LEARNING MODULE
This module trains and evaluates supervised binary classifiers that differentiate between legitimate and DGA names. The labeled dataset comprising benign and malicious names is retrieved and the Learning Module proceeds with dataset preprocessing by performing feature extraction. Pairwise feature correlations are calculated using the Pearson's Correlation Coefficient (PCC) statistical measure [44] to detect redundant features not contributing significantly to the learning process. Upon detecting pairs with PCC's exceeding a predefined threshold, one feature of the pair is randomly selected and evicted from the dataset, eventually accelerating the learning process without significant performance degradation. The resulting dataset is randomly split into the training set (used for tuning the binary classifier) and the testing set (used for evaluating model generalization). Training and testing instances are scaled between 0 and 1 using Min-max normalization based on the minimum and maximum values of training instances, as in [45]. The Learning Module completes dataset preprocessing by balancing the number of benign and malicious class instances. Training set instances are oversampled using the Synthetic Minority Over-sampling Technique (SMOTE) [46], similarly to [45]. SMOTE synthetically generates instances following training set statistical properties to reduce imbalance between given classes.
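The PCC-based redundancy check can be sketched as follows; unlike the random eviction described above, this toy version deterministically keeps the first feature of each strongly correlated pair:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length value lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.9):
    """features: dict of name -> value list. For every pair with
    |PCC| > threshold, evict one feature (here: the later one)."""
    names = list(features)
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if abs(pearson(features[a], features[b])) > threshold:
                dropped.add(b)
    return [n for n in names if n not in dropped]
```

For illustration, a feature that is an exact multiple of another is evicted, while weakly correlated features survive.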
Finally, the Learning Module trains and evaluates DGA name classifiers. We trained tree-based classifiers (i.e. Random Forest - RF, Gradient Boosting - GB, eXtreme Gradient Boosting - XGBoost, Adaptive Boosting - AdaBoost, Extremely Randomized Trees - ExtraTrees) and a deep neural network (i.e. Multi-Layer Perceptron - MLP). Tree classifiers were developed using scikit-learn [47] and the XGBoost Python Package [48], whereas MLP's were developed with Keras [49]. Model hyperparameters were fine-tuned using Grid Search, which exhaustively explores a subset of the ML algorithm hyperparameter space and selects the best performing classifier [50].
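Grid Search amounts to exhaustively evaluating the Cartesian product of hyperparameter values; a minimal, model-independent sketch (the `train_and_score` callback and the grid values below are illustrative assumptions, not our actual configuration):

```python
from itertools import product

def grid_search(train_and_score, grid):
    """Evaluate every hyperparameter combination in the grid and return
    the best-scoring one (higher score is better)."""
    best_params, best_score = None, float("-inf")
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = train_and_score(params)  # e.g. cross-validated accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical scoring function peaking at n_estimators=200, max_depth=8.
score = lambda p: -((p["n_estimators"] - 200) ** 2 + (p["max_depth"] - 8) ** 2)
best, best_score = grid_search(score, {"n_estimators": [100, 200, 300],
                                       "max_depth": [4, 8, 16]})
```

In practice this role is played by scikit-learn's GridSearchCV, which additionally cross-validates each candidate configuration.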

C. EXPLAINABILITY MODULE
This module analyzes the operation of DGA name classifiers using SHAP, eventually delivering global and local modelagnostic post-hoc interpretations to the Administrator.
The preprocessed dataset, the trained model and the selected features are initially retrieved from the Learning Module. The preprocessed dataset is then clustered and subsampled to limit SHAP analysis within reasonable time constraints [4]. The eXplainability Background Instances (XBI's) are obtained as the centroids of K-means clustering on the training set, whereas eXplainability Test Instances (XTI's) are derived by randomly subsampling the testing set. XBI's are used to tune SHAP values and XTI's to interpret decisions made by the DGA name classifiers.
Subsequently, the SHAP KernelExplainer [22] is used to derive global and local interpretations by ranking features according to their contribution in classification decisions and determining interactions between them. SHAP offers various visualization tools to facilitate comprehension of interpretations [4], [6]. We relied on the following SHAP plots: summary plots, dependence plots and force plots.

V. EXPERIMENTAL RESULTS
A. DATASET
Malicious DNS names were retrieved from DGArchive [1], [9], a moderated repository continuously updated with DGA names resulting from reverse engineering efforts on DGA malware code. We retrieved roughly 200 million domain names corresponding to 105 distinct DGA families pertaining to all generation schemes (i.e. arithmetic, wordlist, hash and permutation based). The total repository size and the constraints of our experimental infrastructure rendered training of DGA name classifiers time-consuming and memory-intensive. Therefore, we sampled DGArchive and randomly extracted 10,000 DNS names from each DGA family, as in [52]; families involving less than 10,000 names were included without subsampling. Eventually, our dataset consisted of 600,775 DGA names, which were used to train, evaluate and interpret DGA name classifiers.
Legitimate DNS names were selected from Tranco [40], a public online service ranking domain names based on their popularity. Tranco merges data from various name ranking services, namely Alexa, Cisco Umbrella, Majestic and Farsight. Name rankings are calculated over long time periods (e.g. 30 days), thus mitigating the impact of abrupt daily fluctuations and/or list manipulation attempts. However, Tranco still contains a small percentage of DGA names that are frequently requested by large numbers of infected Internet devices (bots). Therefore, we filtered the Tranco dataset [53] by removing names included within DGArchive; these amounted to 0.57% of Tranco entries. We subsequently utilized the top-ranked 1 million entries from the remaining Tranco names similarly to [28]. Following [39] we used the first 100,000 to construct the whitelist pertaining to the Reputation feature (subsection IV-A); the remaining 900,000 were used to train and assess the DGA name classifiers.
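The Tranco preparation steps above reduce to simple list operations; a sketch with hypothetical inputs (the function name and toy sizes are illustrative):

```python
def prepare_benign(tranco_ranked, dgarchive, top=1_000_000, whitelist_share=100_000):
    """Drop Tranco entries also present in DGArchive, keep the top-ranked
    names, and split them into a whitelist-construction part (for the
    Reputation feature) and a training/evaluation part."""
    filtered = [name for name in tranco_ranked if name not in dgarchive]
    kept = filtered[:top]
    return kept[:whitelist_share], kept[whitelist_share:]

# Toy example: 4 ranked names, one of which also appears in DGArchive.
whitelist_part, training_part = prepare_benign(
    ["alpha", "beta", "gamma", "delta"], dgarchive={"beta"},
    top=3, whitelist_share=1)
```

With the real inputs, this corresponds to removing the 0.57% overlap, keeping the top 1 million names, and splitting them 100,000 / 900,000.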
The aforementioned name sets were labeled as benign and malignant without indicating specific families of malicious DGA names. Binary classifiers were selected instead of multi-class ones. Although multi-class classifiers may provide insight into specific DGA families, they are typically less accurate than binary ones in segregating benign and malignant names [11].

B. TESTBED OVERVIEW
Experiments were performed within our laboratory infrastructure. We utilized a Virtual Machine (VM) comprising 8 virtual cores and 24GB physical memory. The hypervisor was a Dell PE R730 with an Intel Xeon E5-2620 v3 2.4 GHz CPU. Training of neural networks was accelerated using an NVIDIA GeForce GTX 1050 Ti 4GB [54] graphics card.

C. LEARNING MODULE
The Learning Module was evaluated by assessing (i) the pairwise correlation among selected features and (ii) the performance of supervised binary DGA name classifiers. Assessments were performed using the dataset of benign and malicious names described in subsection V-A.
Pearson's Correlation Coefficient (PCC) was utilized to detect highly correlated features. PCC's were calculated for all feature pairs and those exceeding 0.9 (by absolute value) were considered strongly correlated [55]. In such feature pairs, a feature was selected at random and evicted from the dataset. In particular, Ratio_DeciDig was determined as strongly correlated to other features, hence it was removed from subsequent experiments.
We selected Random Forests (RF's), Gradient Boosting (GB), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost) and Extremely Randomized Trees (ExtraTrees) as indicative algorithms of tree-based classifiers; Multi-Layer Perceptrons (MLP) were selected as representative models of deep neural networks. Classifiers were trained and evaluated using the dataset described in subsection V-A. This dataset was randomly split into two parts using the train_test_split method of scikit-learn [10]; 80% was utilized as the training set and the remaining 20% as the testing set.
Grid Search was used to tune model hyperparameters. The number and maximum depth of RF, GB and XGBoost trees were varied as described in Table 2. The numbers of AdaBoost and ExtraTrees estimators were varied as described in the same table. Similarly, multiple MLP configurations were considered by varying the number of hidden layers, the neurons per layer, the batch size and the rate of the dropout regularization layers placed between the hidden layers to reduce overfitting. The considered MLP hyperparameters are described in Table 3.
Classifier performance was assessed based on model accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where True Positives (TP's) are the correctly classified DGA names, True Negatives (TN's) are the correctly categorized benign names, False Positives (FP's) are the incorrectly classified benign names and False Negatives (FN's) are the misclassified malicious names. Grid Search determined that among the RF, GB, XGBoost, AdaBoost, ExtraTrees and MLP classifiers the best accuracy scores on the testing set were 94.67%, 94.66%, 94.81%, 92.32%, 94.67% and 94.51% respectively, as shown in Table 4. Their configuration details are summarized in Tables 2 and 3.
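The accuracy measure above can be expressed directly in terms of the confusion-matrix counts:

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)
```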

D. EXPLAINABILITY MODULE
The Explainability Module was evaluated based on SHAP interpretations derived on the trained models (subsection V-C) for the dataset described in subsection V-A. We investigated (i) the features used to discern benign and malicious names derived from multiple DGA families and, (ii) the most influential features utilized to differentiate specific DGA schemes.
Interpretations were derived for 105 DGA families of the DGArchive repository and are available from our GitHub repository [10]. However, for illustration purposes, representative results are presented in this paper for 4 indicative DGA families pertaining to 4 diverse DGA schemes (see Section II-A1). Specifically, as in [59], results are presented for the following: (i) DirCrypt (arithmetic-based), (ii) Matsnu (wordlist-based), (iii) Bamital (hash-based) and (iv) VolatileCedar (permutation-based).

(Note: filtering repetitive name prefixes (see subsection IV-A) within the training and testing sets yielded comparable accuracy results, specifically 94.39% for XGBoost (best tree-based classifier) and 94.31% for the MLP neural network. Thus, we did not consider filtering them in our experiments pertaining to the Explainability Module.)
Similarly to [4], XBI's were selected as the cluster centroids resulting from K-means execution on the training set with K equal to 50. XTI's used for interpreting how name classifiers differentiate between benign and malicious names derived from all DGA families were obtained by randomly subsampling 250 DGA names from the testing set. Interpretations pertaining to specific DGA families were based on XTI's randomly subsampled from testing set entries of these specific families; families with less than 250 names were included without subsampling.
In our extensive experiments, greater numbers of XBI's and XTI's yielded insignificant interpretation improvements, while the SHAP running time increased dramatically [10]. Using the aforementioned parameters, the Learning Module and the Explainability Module required approximately 2 days to complete their operation.

The following subsections present SHAP interpretations for XGBoost (the most accurate tree-based model) and the MLP deep neural network model. Interpretations are based on multiple SHAP plots: (i) summary plots pertaining to 250 XTI's from all DGA families (subsection V-D1), (ii) summary plots involving XTI's from selected DGA families (subsection V-D2), (iii) dependence plots pertaining to 250 XTI's from all DGA families (subsection V-D3), (iv) dependence plots including XTI's from specific DGA families (subsection V-D4) and (v) force plots for selected domain names (subsection V-D5). Legitimate and malicious name classes are denoted with the numbers 0 and 1 respectively; thus, negative SHAP values contribute to benign name classifications, whereas positive values contribute to DGA name classifications.

1) XGBOOST AND MLP CLASSIFIER SUMMARY PLOTS FOR ALL DGA FAMILIES
In this subsection SHAP summary plots are used to explain the operation of binary DGA name classifiers. Fig. 2 demonstrates the XGBoost (Fig. 2a) and MLP (Fig. 2b) classification criteria for segregating malicious names from benign ones. Analysis was based on 250 XTI's from all DGA families, illustrated as colored dots along the horizontal dimension. In these summary plots blue color denotes low feature values, whereas red color denotes high feature values (see subsection IV-C). High Length and DeciDig_Freq values favor malicious name classifications. Such behavior is related to lengthy names and high decimal digit frequencies, typically employed by most DGA's to avoid coincidence with legitimate registered domain names. As expected, high Reputation and Words_Freq values mostly point to benign name categorizations, since the presence of many whitelisted N-grams and meaningful words is linked to legitimate names. The Max_DeciDig_Seq contribution is significantly smaller compared to the impact of the aforementioned features; it is ranked 12th in terms of contribution to classification decisions. Finally, high feature values of Words_Mean may inconclusively affect both benign and malicious name classifications.

Fig. 2b shows that the most influential features used by the MLP classifier are Reputation, Length, Max_DeciDig_Seq, Words_Mean and DeciDig_Freq. Similarly to XGBoost, the MLP relies predominantly on the Reputation and Length features. Max_DeciDig_Seq was the 3rd most important feature for the MLP, with higher values pointing to benign name classifications. Recall that for XGBoost, Max_DeciDig_Seq was ranked 12th, a much lower significance level (Fig. 2a). Likewise, the Vowel_Freq feature significantly affects MLP decisions, ranking as the 8th most influential feature, while XGBoost dependence on Vowel_Freq is not even among the 20 most significant features of Fig. 2a. This may be partially explained by the difference between XGBoost and MLP in modeling learning tasks.
The former mainly relies on splitting training set instances based on dominant feature deviations; following boosting methods, strong tree estimators are eventually constructed by iteratively improving weaker classifiers. The latter (MLP) tunes its weights during back-propagation towards directions that linearly combine feature values, forming induced local fields that are further subjected to non-linear activation functions (e.g. ReLU, Sigmoid). Thus, XGBoost mainly relies on boosting methods driven by significant feature deviations [60], while MLP relies on weighted feature combinations.

2) MLP CLASSIFIER SUMMARY PLOTS FOR SELECTED DGA FAMILIES
This subsection addresses explanations pertaining to binary MLP classifiers tested on XTI's derived from specific DGA families. In Fig. 3 we present summary plots for 4 DGA families selected from 4 different generation schemes: (a) DirCrypt (arithmetic-based), (b) Matsnu (wordlist-based), (c) Bamital (hash-based) and (d) VolatileCedar (permutation-based). In Table 5 we list four indicative malicious names pertaining to each of the aforementioned DGA families; note that typical suffixes, e.g. ''.com'' and ''.info'', are not included in the table. These schemes and their respective families have the following properties [1]:
• Arithmetic-based DGA's (e.g. DirCrypt): Domain names are generated by concatenating randomly selected characters. DirCrypt uses the 26 English alphabet letters to produce names between 8 and 20 characters long. Names typically contain long consonant sequences and are characterized by increased randomness compared to benign names.
• Wordlist-based DGA's (e.g. Matsnu): Random dictionary words are concatenated to generate malicious domain names resembling legitimate ones. Matsnu forms long names between 12 and 24 characters by joining multiple dictionary words of relatively short length [61].
• Hash-based DGA's (e.g. Bamital): They rely on the hexadecimal representation resulting from hashing domain names. Bamital is based on MD5 hash function to generate names consisting of 32 hexadecimal digits.
• Permutation-based DGA's (e.g. VolatileCedar): Multiple DGA names are produced by permuting a generated domain name that resembles legitimate names. Linguistic (e.g. number of vowels) and statistical (e.g. letter frequencies) properties of the initial malignant name are inherited by the derived names.
In the following we analyze specific feature contributions using summary plots derived by experimenting with malignant XTI's, randomly subsampled from the aforementioned DGA schemes. Fig. 4a and Fig. 4b show that Reputation significantly influences classifications. Namely, Reputation interacts with Length for XGBoost and with DeciDig_Freq for MLP. However, combined Reputation and interacting feature values do not clearly affect classification decisions because, as shown in Fig. 2, the impact of Reputation is significantly higher than that of Length and DeciDig_Freq.
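The per-name features referenced throughout this analysis can be computed directly from the name strings. Below is a minimal sketch assuming simple definitions for Length, DeciDig_Freq, Max_DeciDig_Seq, Vowel_Freq and Shannon entropy (the paper's actual extractor may differ; Reputation and Words_Freq are omitted since they require external whitelist and dictionary resources). The sample names are illustrative, not drawn from Table 5.

```python
# Hypothetical sketch of per-name feature extraction; feature definitions
# are assumptions, and the sample names below are invented for illustration.
import math
from collections import Counter

def extract_features(name: str) -> dict:
    name = name.split(".")[0].lower()   # strip suffixes such as ".com"
    counts = Counter(name)
    # Longest run of consecutive decimal digits (Max_DeciDig_Seq).
    max_seq = run = 0
    for c in name:
        run = run + 1 if c.isdigit() else 0
        max_seq = max(max_seq, run)
    # Shannon entropy of the character distribution.
    probs = [n / len(name) for n in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    return {
        "Length": len(name),
        "DeciDig_Freq": sum(c.isdigit() for c in name) / len(name),
        "Max_DeciDig_Seq": max_seq,
        "Vowel_Freq": sum(counts[v] for v in "aeiou") / len(name),
        "Entropy": round(entropy, 3),
    }

# Invented names roughly mimicking the four generation schemes:
for n in ["qwkjzrtplm.com", "carwindowshare.net", "0a1b2c3d4e5f.info"]:
    print(n, extract_features(n))
```

Arithmetic-style names yield high entropy and low vowel frequency, wordlist-style names the opposite, and hash-style names a high DeciDig_Freq, mirroring the family properties listed above.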
• Entropy Interactions: As expected from the summary plots of subsection V-D1, Fig. 4c and Fig. 4d depict Entropy interactions. Bamital XTI's in Fig. 5c and Fig. 5d show that DeciDig_Freq interacts with Spec_Char_Freq, while Max_DeciDig_Seq interacts with Length. Increasing DeciDig_Freq favors TP's, with its influence increasing (higher SHAP values) for higher values of the interacting feature (Spec_Char_Freq). This is expected because Bamital names consist of hexadecimal digits, of which decimal digits constitute the majority. Moreover, as shown in Fig. 5d, increased Max_DeciDig_Seq values favor FN's, since Bamital follows the statistical properties of the MD5 hash function, with hexadecimal digits uniformly distributed across domain names. Therefore, long decimal digit sequences typically favor benign name misclassifications.

5) MLP CLASSIFIER FORCE PLOTS FOR LOCAL EXPLAINABILITY
In this subsection, force plots are used to analyze the operation of binary MLP classifiers for specific inputs (local explainability). Force plots are particularly helpful for understanding False Positives (FP's) and False Negatives (FN's) in the classification of specific benign and DGA names. In these plots, the features dominantly influencing name classifications are depicted along with their values. Red denotes features favoring malicious name categorizations and blue those contributing to benign name classifications. A bold decimal value corresponds to the classifier output.
Fig. 6a shows that the name ''wawibox.de'' is perceived as DGA, mainly because of the high frequency of the letter W and the low Reputation value. For this particular name, Freq_W and the absence of many whitelisted N-grams override the effect of Length, which favors benign name classifications. Fig. 6b shows that the name ''rvwgm2wrld2.xyz'', which is frequently used for malware propagation [62] but is not produced by DGA's, is misclassified as DGA. This is attributed to the low Reputation value, the high frequency of the letter W and the low Words_Mean value, although the zero Vowel_Freq value might point to a non-DGA name classification.
Fig. 7 depicts force plots pertaining to MLP FN's, i.e. DGA names incorrectly classified as benign. Fig. 7a shows that Length values have a major effect on misclassifying the name ''nomodum.info'', generated by the Simda arithmetic-based DGA family, despite the high frequencies of the letters M and O that favor malicious name classifications. Fig. 7b shows that the name ''californiatransferable.ru'', originating from the Gozi wordlist-based DGA family, is classified as benign because Reputation values point to benign name classifications. This counterbalances the effect of the name length and the high presence of vowels, which point towards DGA names.

VI. CONCLUSION AND FUTURE WORK
We investigated XAI methods for interpreting DGA name classifiers that detect malicious DNS messages used by bots to communicate with Command & Control (C&C) servers. We addressed defense mechanisms based on ML classifiers and analyzed their operation via the SHapley Additive exPlanation (SHAP) algorithm that provides global and local interpretations in a model-agnostic, post-hoc manner.
To that end, we first configured tree-based and deep neural network binary classifiers for differentiating between benign DNS names and malicious names produced by DGA's. We trained and evaluated classifiers based on supervised ML algorithms, specifically Random Forests (RF's), Gradient Boosting (GB), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Extremely Randomized Trees (ExtraTrees) and Multi-Layer Perceptrons (MLP's). These relied on features directly extracted from domain name datasets, thus eliminating time-consuming and privacy-sensitive operations on repositories of historical data. Classifiers were trained using up-to-date and inclusive datasets. Legitimate names originated from Tranco, an online service ranking top Internet sites; we selected the 1 million most popular names. Malicious instances were sampled from the DGArchive repository, which reports 105 DGA families from 4 different generation schemes; we randomly selected 600,775 DGA names.
Our SHAP-based evaluation analyzed the features used by our trained XGBoost (determined as the most accurate tree-based model) and MLP deep neural network classifiers to segregate benign and DGA name instances. We investigated how DGA families and their different underlying algorithmic generation schemes (i.e. arithmetic, wordlist, hash or permutation based) affect the features that influence classification decisions. Relying on multiple SHAP visualization tools (summary, dependence and force plots), we provided global and local interpretations on sampled dataset instances. Specifically, we ranked feature importance, investigated the effect of feature values on model decisions and determined their interactions. Using up-to-date and extensive datasets, we conclude that our SHAP-based analysis enables interpretations of XGBoost and MLP name classifiers when targeted by well-known, diverse DGA schemes. Such methods may facilitate ML adoption within networking environments where interpretations for black-box schemes are required.
We plan to extend our SHAP-based interpretations to address additional deep neural network models. These include Convolutional Neural Networks (CNN's), Long Short-Term Memory (LSTM) networks and/or Bidirectional LSTM (BiLSTM) networks that may be employed for DGA name classification [13]. Alternative XAI approaches, e.g. LIME [30] and Counterfactual Explanation [31], will also be considered. The proposed scheme may be further adapted to unsupervised deep learning models, e.g. Autoencoders. Finally, the proposed approach will be extended to multi-domain infrastructures using Federated Learning [63] for collaborative DGA name detection, similarly to [64], [65]. Therefore, privacy-aware model interpretations will be derived without sharing attack and benign data.
NIKOS KOSTOPOULOS received the engineering degree from the National Technical University of Athens (NTUA), Athens, Greece, in 2017, where he is currently pursuing the Ph.D. degree. His research interests include the liaison of big data methods, programmable data planes, and machine learning algorithms for defending against attacks targeting (e.g. DDoS attacks) or abusing (e.g. DGA's) the normal operation of DNS.