Extracting automata from neural networks using active learning

Deep learning is one of the most advanced forms of machine learning. Most modern deep learning models are based on artificial neural networks, and benchmarking studies reveal that neural networks have produced results comparable to, and in some cases superior to, those of human experts. However, the generated neural networks are typically regarded as incomprehensible black-box models, which not only limits their applications but also hinders testing and verification. In this paper, we present an active learning framework to extract automata from neural network classifiers, which can help users understand the classifiers. In more detail, we use Angluin's L* algorithm as the learner and the neural network under learning as the oracle, employing abstract interpretation of the neural network to answer membership and equivalence queries. Our abstraction consists of value, symbol and word abstractions. The factors that may affect the abstraction are also discussed in the paper. We have implemented our approach in a prototype. To evaluate it, we have applied the prototype to an MNIST classifier and have identified that the abstraction with interval number 2 and block size 1 × 28 offers the best performance in terms of F1 score. We have also compared our extracted DFA against the DFAs learned via the passive learning algorithms provided in LearnLib, and the experimental results show that our DFA gives a better performance on the MNIST dataset.


INTRODUCTION
Deep learning is one of the most advanced forms of machine learning and has been applied to various fields, including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design and board game programs (Schmidhuber, 2014; Lecun, Bengio & Hinton, 2015). Most modern deep learning models are based on artificial neural networks, such as deep neural networks (DNN), deep belief networks, convolutional neural networks (CNN) and recurrent neural networks (RNN). Benchmarking studies reveal that neural networks have produced results comparable to, and in some cases superior to, those of human experts.
However, the generated neural networks are typically regarded as incomprehensible black-box models. In practice they are unlikely to generalise exactly to the concept being trained, and what they eventually learn is unclear (Omlin & Giles, 2000). The opaqueness of neural networks not only limits their applications, but also hinders testing and verification. [...] When a counterexample is found, it may not be that the hypothesis is incorrect, but rather that the abstract model is not precise enough and needs to be refined.
Finally, we have implemented our approach in Java, wherein we use the library LearnLib (Howar et al., 2012) to implement the active learning framework. To evaluate our approach, we conducted a series of experiments on a classifier for the MNIST dataset, a large database of handwritten digits that is commonly used for training various image processing systems. First, we measure the safety, the conflict, the size of the alphabet and the length of words for the MNIST classifier under abstractions with different interval numbers (i.e., the number of partitions) and block sizes, and identify some suitable abstractions. Second, we conduct experiments to learn DFAs from the MNIST classifier with the suggested abstractions. The results show that the abstraction with interval number 2 and block size 1 × 28 offers the best performance in terms of F1 score. Finally, we conduct experiments to compare our resulting DFA against the DFAs learned via the passive learning algorithms (see "Passive Learning") provided in LearnLib and against the MNIST classifier itself. Although worse than the classifier, our DFA performs better than the other DFAs in our experiments. Nevertheless, our approach still has some limitations.
In summary, our contributions are as follows:
We have proposed a MAT framework to extract automata from neural networks, employing abstract interpretation of the neural network to answer membership and equivalence queries.
We have conducted several experiments on an MNIST classifier. The experimental results show that our approach is viable, and that the resulting DFA performs better in terms of F1 score on the MNIST dataset than the DFAs learned via the passive learning algorithms provided in LearnLib.
The remainder of this paper is organised as follows. "Preliminary" gives the preliminaries of DFA and active learning. "Approach" describes our approach, followed by the experimental results in "Experiments". "Limitations" discusses some limitations of our approach. "Related Work" presents the related work, followed by some concluding remarks in "Conclusion".

PRELIMINARY
In this section, we introduce the notions of neural networks as functions, deterministic finite automata, active learning and passive learning.

Neural networks as functions
Neural network models can be viewed as mathematical models defining a possibly nonlinear function $N : X \to Y$, where X is the input and Y is the output. In more detail, N can be defined as a composition of other (layer) functions, which can further be decomposed into other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between functions. In this paper we assume that the internal structure of the neural network is unknown and only network-acceptors are considered. So the function we consider here is the one representing the whole network, $N : X \to \mathit{Bool}$, where X is a multi-dimensional array. We say an input X is positive if $N(X) = \mathit{true}$, and negative otherwise.

Deterministic finite automata
Definition 2.1 A deterministic finite automaton (DFA) is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite set of states, $\Sigma$ is a finite set of input symbols called the alphabet, $\delta : Q \times \Sigma \to Q$ is the transition function, $q_0 \in Q$ is the starting state, and $F \subseteq Q$ is the set of accepting states.
A word or string over an alphabet $\Sigma$ is a finite sequence of symbols from $\Sigma$. The length of a word is the number of symbols it contains. Note that a word can be empty: the empty word, denoted as ε, has length 0 and contains no symbols.
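As a concrete illustration of Definition 2.1, the following minimal Python sketch stores the five components of a DFA and reads a word symbol by symbol, following the transition function from the starting state and reporting whether the final state is accepting. The class name `DFA`, its field names and the example automaton `even_ones` are assumptions made for illustration only and do not come from the prototype described in this paper.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: frozenset
    alphabet: frozenset
    delta: dict          # maps (state, symbol) -> state
    start: object
    accepting: frozenset

    def accepts(self, word):
        """Follow delta from the start state; accept if the run ends in F."""
        state = self.start
        for symbol in word:
            if symbol not in self.alphabet:
                raise ValueError(f"symbol {symbol!r} is not in the alphabet")
            state = self.delta[(state, symbol)]
        return state in self.accepting


# Hypothetical example: words over {0, 1} with an even number of 1s.
even_ones = DFA(
    states=frozenset({"even", "odd"}),
    alphabet=frozenset({0, 1}),
    delta={("even", 0): "even", ("even", 1): "odd",
           ("odd", 0): "odd", ("odd", 1): "even"},
    start="even",
    accepting=frozenset({"even"}),
)
```

Here the empty word is accepted because the starting state is itself accepting.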

Definition 2.2
Let $M = (Q, \Sigma, \delta, q_0, F)$ be a DFA and $w = a_1 a_2 \ldots a_n$ be a word of length n over $\Sigma$. The automaton M accepts the word w if and only if there exists a sequence of states $r_0, r_1, \ldots, r_n$ satisfying the following conditions:
(1) $r_0 = q_0$;
(2) $r_{i+1} = \delta(r_i, a_{i+1})$ for $i = 0, \ldots, n-1$;
(3) $r_n \in F$.
The set of words recognised by a DFA M, called the language of M, is the following set:
$$L(M) = \{ w \in \Sigma^* \mid w \text{ is accepted by } M \}$$

Active learning framework
Angluin (1987) proposed the first active learning algorithm, the L* algorithm, to learn finite automata from a minimally adequate teacher (MAT), and the most efficient learning algorithms in use today follow Angluin's approach. In the following, we briefly introduce the MAT framework.
Finite automata can be learned precisely from a MAT, that is, an oracle capable of answering the so-called membership and equivalence queries:
Membership queries: the learner asks whether a given word is accepted by the automaton or not, and the teacher answers with the result.
Equivalence queries: the learner asks whether a given hypothesis automaton H is equal to the automaton model M held by the teacher. The teacher answers yes if this is the case. Otherwise she answers no and supplies a word, the so-called counterexample, on which the hypothesis automaton H and the automaton model M disagree.
The MAT framework is shown in Fig. 1. Initially, the learner knows the static interface of the system under learning (SUL), that is, the sets of inputs (i.e., multi-dimensional arrays for neural networks) and outputs (i.e., yes or no for recognizers). Then the learner asks a sequence of membership queries (MQs) and receives the corresponding responses from the teacher. After a "sufficient" number of queries, the learner builds a hypothesis H from the obtained information and sends an equivalence query (EQ). If the teacher answers yes, then the hypothesis H is returned. Otherwise, the learner refines its information with the returned counterexample and continues querying.
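The two kinds of queries can be sketched as a small teacher interface. The sketch below is only an illustration under stated assumptions: the class `BoundedTeacher` is hypothetical, the target language is given as a plain Python predicate, and the equivalence query is approximated by testing every word up to a fixed length, which is a simplification and not the equivalence check used in this paper.

```python
from itertools import product


class BoundedTeacher:
    """Answers MQs exactly; approximates EQs by testing all words up to max_len."""

    def __init__(self, target, alphabet, max_len):
        self.target = target              # callable: tuple of symbols -> bool
        self.alphabet = sorted(alphabet)
        self.max_len = max_len

    def membership_query(self, word):
        return self.target(tuple(word))

    def equivalence_query(self, hypothesis):
        """Return None for 'yes', otherwise a shortest counterexample found."""
        for length in range(self.max_len + 1):
            for word in product(self.alphabet, repeat=length):
                if hypothesis(word) != self.target(word):
                    return word
        return None


# Hypothetical target language: words with an even number of 1s.
teacher = BoundedTeacher(lambda w: w.count(1) % 2 == 0, {0, 1}, max_len=4)
```

A learner such as L* would alternate calls to `membership_query` while filling its observation table and a call to `equivalence_query` once a hypothesis is ready, using any returned word to refine the table.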

Passive learning
Unlike active learning, passive learning constructs automata directly from sets of examples. Many approaches in grammatical inference can be described as passive learning. In this paper, we consider the polynomial-time RPNI algorithms provided in the library LearnLib (Howar et al., 2012). Oncina & García (1992) proposed the Regular Positive and Negative Inference (RPNI) algorithm for DFA learning. RPNI starts with a prefix tree acceptor, a tree-like DFA built from the learning examples by taking all the prefixes of the examples as states, and then greedily merges states into clusters so as to obtain an automaton that remains consistent with the examples. Two heuristic strategies can be employed in state merging: Evidence Driven State Merging (EDSM) (Cicchello & Kremer, 2002) and Minimum Description Length (MDL) (Adriaans & Jacobs, 2006).
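The first step of RPNI, building the prefix tree acceptor, can be sketched as follows. This is a minimal illustration that covers only the tree construction and not the state merging; the function name `build_pta` and the representation of states as tuples of symbols are assumptions for this example and are unrelated to the LearnLib implementation.

```python
def build_pta(positive, negative):
    """Build a prefix tree acceptor; each state is a prefix (tuple of symbols)."""
    transitions = {}                 # (prefix, symbol) -> longer prefix
    labels = {(): None}              # prefix -> True / False / None (unknown)
    for words, label in ((positive, True), (negative, False)):
        for word in words:
            prefix = ()
            for symbol in word:
                nxt = prefix + (symbol,)
                transitions[(prefix, symbol)] = nxt
                labels.setdefault(nxt, None)
                prefix = nxt
            if labels[prefix] is not None and labels[prefix] != label:
                raise ValueError(f"word {word!r} is labelled both ways")
            labels[prefix] = label
    return transitions, labels


# Hypothetical learning examples.
transitions, labels = build_pta(positive=[(1,), (1, 0, 1)],
                                negative=[(0,), (1, 1)])
```

States whose label is `None` are prefixes that occur in the examples but are not examples themselves; state merging is free to make them accepting or rejecting as long as the result stays consistent.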

APPROACH
In this section, we present an active learning framework to extract automata from neural network classifiers. Our framework is shown in Fig. 2, which is a classic MAT framework with an abstraction. In a nutshell, we place an abstraction between the learner and the SUL. When membership queries are sent, the abstraction maps the abstract words into concrete ones (i.e., the up arrow in Fig. 2), which are then fed into the neural network under learning. When equivalence queries are sent, the abstraction does the opposite (i.e., the down arrow in Fig. 2), and an abstract representation is built from the abstract words for checking. In the following, we explain how to define an abstraction for neural networks and how to instantiate the active learning framework on neural networks.

Abstraction
For simplicity, we focus on network-acceptors, that is, there are only two outputs for the SUL, and thus we do not need to abstract them. In other words, only the inputs need to be abstracted. Generally, the inputs of neural network classifiers are multi-dimensional arrays. As mentioned in "Introduction", we aim to abstract an input as a word, rather than a symbol. So the aim of the abstraction is to convert multi-dimensional arrays into words and vice versa. Just like the serialization of multi-dimensional arrays, a naive and simple solution is to convert the input multi-dimensional array into a 1-dimensional array in row (or column) major order, and then concatenate the string representation of each value in the converted array in order, yielding a word. For example, Fig. 3 shows an array of size 4 × 4, on which the simple abstraction is applied. However, there are two problems with this solution: (1) the size of the alphabet may be too large, even infinite; (2) the abstracted word may be too long. Both can make automata learning too time-consuming.
So for practicality, we propose a three-layer abstraction, which consists of:
value abstraction: each value in an input array is mapped into an integer via partitioning, which helps reduce the size of the alphabet;
symbol abstraction: a block of a multi-dimensional integer array is abstracted as a symbol, which enables us to reduce the length of words;
word abstraction: the whole input array is encoded into a word, wherein value abstraction and symbol abstraction are applied.

Value abstraction
In order to reduce the size of the alphabet, inspired by the work of Omlin & Giles (1996), we first split the input space into n (equal) intervals and map each interval to an integer, namely the index of the interval. Formally, let the input space be I, split into n intervals $I_0, \ldots, I_{n-1}$. Then a value abstraction function $\alpha_v : I \to \{0, \ldots, n-1\}$ is defined as follows:
$$\alpha_v(d) = i \quad \text{if } d \in I_i$$
This value abstraction function maps a concrete value in the input to an abstract integer. Figure 4 shows an example of the value abstraction applied to the array given in Fig. 3, where the input space I is [0,1] and I is split into $I_0 = [0, 0.5)$ and $I_1 = [0.5, 1]$.
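The value abstraction with equal intervals can be sketched in a few lines of Python. The function names and the default input space [0, 1] are assumptions made for this illustration; the only design choice worth noting is that the upper bound of the input space is assigned to the last interval, which matches the closed interval $I_1 = [0.5, 1]$ in the example above.

```python
def value_abstraction(d, n, low=0.0, high=1.0):
    """Map a value d in [low, high] to the index of its interval among n equal ones."""
    if not low <= d <= high:
        raise ValueError(f"{d} lies outside [{low}, {high}]")
    index = int((d - low) / (high - low) * n)
    return min(index, n - 1)         # the last interval is closed on the right


def abstract_values(array, n, low=0.0, high=1.0):
    """Apply the value abstraction to every entry of a 2-dimensional array."""
    return [[value_abstraction(d, n, low, high) for d in row] for row in array]
```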
For concretization, an abstract integer is mapped to its corresponding interval, that is, the value concretization function $\gamma_v : \{0, \ldots, n-1\} \to 2^I$ is defined as:
$$\gamma_v(i) = I_i$$
Both $\alpha_v$ and $\gamma_v$ can be extended to sets of elements in a natural way:
$$\alpha_v(D) = \{ \alpha_v(d) \mid d \in D \}, \qquad \gamma_v(S) = \bigcup_{i \in S} \gamma_v(i)$$
It is easy to see that $\alpha_v(\gamma_v(i)) = i$ for each integer i, and $d \in \gamma_v(\alpha_v(d))$ for any given value d. Therefore, $(\alpha_v, \gamma_v)$ forms a Galois connection (Nielson, Nielson & Hankin, 1999). In practice, however, especially when we query a word, we are unable to test all the values in the interval for each integer. Instead, we randomly select at most $k_v$ (which can depend on the intervals) values to represent the corresponding interval. That is to say, we define a weak value concretization function:
$$\gamma_v^{\mathit{prac}}(i) = \{ d_j \mid d_j \in I_i \text{ and } 0 \le j < k_v \}$$
Obviously, the larger $k_v$ is, the closer $\gamma_v^{\mathit{prac}}(i)$ is to $\gamma_v(i)$. So concerning Galois connections, the larger $k_v$, the better.
But Galois connections alone are not enough here. We also need to consider the safety of neural networks (Huang et al., 2017), that is, a vibration of values should not flap the outputs, since different values may be abstracted into the same integer. In fact, the composition of the functions $\alpha_v$ and $\gamma_v$ can be viewed as a kind of manipulation (Huang et al., 2017). We say a k-value manipulation $\mathit{vm}_k$ with respect to $\alpha_v$ and $\gamma_v$ is a function such that for any input array in, $\mathit{vm}_k(\mathit{in})$ differs from in in at most k values, and each replaced value d is replaced by some $d' \in \gamma_v(\alpha_v(d))$. Intuitively, a k-value manipulation replaces (at most) k values of the input array by values that lie in the same intervals as the corresponding original values. Figure 5 shows an example of a 4-value manipulation applied to the input array in Fig. 4, where the input space I is [0,1] and it is split into $I_0 = [0, 0.5)$ and $I_1 = [0.5, 1]$. We say a network N is safe with respect to this value manipulation $\mathit{vm}_k$ if for every input array in
$$N(\mathit{vm}_k(\mathit{in})) = N(\mathit{in})$$
That is to say, performing this value manipulation should not result in a different classification. This requires that every interval $I_i$ be as small as possible, or equivalently that the number n of intervals be as large as possible. However, safety cannot easily be preserved in practice, unless the abstraction is an identity function or the network is robust enough. So instead, we use a weaker notion called σ-safety: we say a network N is σ-safe with respect to the k-value manipulation $\mathit{vm}_k$ under a given input set D if
$$\frac{|\{ \mathit{in} \in D \mid N(\mathit{vm}_k(\mathit{in})) \ne N(\mathit{in}) \}|}{|D|} \le \sigma$$
where $|D| \ge 1$.
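The k-value manipulation and an empirical estimate of the ratio appearing in σ-safety can be sketched as below. This is a simplified illustration under several assumptions: the input space is [0, 1] split into n equal intervals, the network is any Python callable returning a Boolean, only one manipulation is sampled per input, and the names `value_manipulation` and `flap_ratio` are hypothetical.

```python
import random


def interval(i, n, low=0.0, high=1.0):
    """Bounds of the i-th of n equal intervals of [low, high]."""
    width = (high - low) / n
    return low + i * width, low + (i + 1) * width


def value_manipulation(array, k, n, rng, low=0.0, high=1.0):
    """Replace up to k values by random values drawn from the same interval."""
    cells = [(r, c) for r, row in enumerate(array) for c in range(len(row))]
    result = [list(row) for row in array]
    for r, c in rng.sample(cells, min(k, len(cells))):
        i = min(int((array[r][c] - low) / (high - low) * n), n - 1)
        a, b = interval(i, n, low, high)
        result[r][c] = a + (b - a) * rng.random()
    return result


def flap_ratio(network, inputs, k, n, rng):
    """Fraction of inputs whose classification changes under one sampled manipulation."""
    flaps = sum(network(value_manipulation(x, k, n, rng)) != network(x)
                for x in inputs)
    return flaps / len(inputs)
```

A network is then σ-safe on the sampled manipulations if `flap_ratio` does not exceed σ. Because only one manipulation per input is drawn, the returned value is an estimate rather than a guarantee over all manipulations.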

Symbol abstraction
After the value abstraction, each integer can be used as a symbol. But this could yield words that are too long for the model to be learned. So for scalability, we add a symbol abstraction, which abstracts input arrays into symbols block by block. For simplicity, in this paper we consider 2-dimensional arrays of size iRow × iCol. We say a slice of an input array from the index (ri, ci) to the index (ri + oRow − 1, ci + oCol − 1) is a block, and the size of the block is oRow × oCol. A natural way to abstract blocks into symbols is to flatten each block into one dimension in row (or column) major order and then encode the flattened block as a base-n number (or as a string consisting of its integers). We denote this mapping as $\alpha_s^B$. Figure 6 gives an example of $\alpha_s^B$ applied to the array obtained by the value abstraction shown in Fig. 4, where the number n of intervals is 2 and the size of blocks is 2 × 2. Moreover, by decoding the base-n number (or the string), it is easy to obtain the inverse mapping $\gamma_s^B$. It is clear that $(\alpha_s^B, \gamma_s^B)$ forms a Galois connection. But a drawback of this solution is that the size of the alphabet is $n^{\mathit{oRow} \times \mathit{oCol}}$, which could be too large in practice. For example, the size of the alphabet in the example shown in Fig. 6 is 16.
In this paper we use an alternative: a block is represented by its sum. In more detail, we define a symbol abstraction function $\alpha_s^S$ that maps an integer block b of size oRow × oCol to the sum of its integers:
$$\alpha_s^S(b) = \sum_{i=0}^{\mathit{oRow}-1} \sum_{j=0}^{\mathit{oCol}-1} b[i][j]$$
It is easy to compute the set of possible block sums, that is, {0, …, oRow × oCol × (n − 1)}. So the size of the alphabet is oRow × oCol × (n − 1) + 1. Compared to the natural solution, the alphabet is much smaller (reduced from $n^{\mathit{oRow} \times \mathit{oCol}}$ to oRow × oCol × (n − 1) + 1). Take the input array in Fig. 6 for example. Under this sum abstraction, its abstraction is shown in Fig. 7 and the size of the alphabet is 5.
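The two symbol abstractions and their alphabet sizes can be compared with a short sketch. The function names are assumptions made for illustration, and blocks are represented as lists of rows of integers in {0, …, n − 1}.

```python
def symbol_abstraction_base_n(block, n):
    """Encode a block, read in row major order, as a base-n number."""
    code = 0
    for row in block:
        for value in row:
            code = code * n + value
    return code


def symbol_abstraction_sum(block):
    """Abstract a block as the sum of its integers."""
    return sum(sum(row) for row in block)


def alphabet_size_base_n(o_row, o_col, n):
    return n ** (o_row * o_col)


def alphabet_size_sum(o_row, o_col, n):
    return o_row * o_col * (n - 1) + 1
```

For n = 2 and 2 × 2 blocks, these functions give alphabet sizes of 16 and 5 respectively, in line with the examples above. The price of the smaller alphabet is that distinct blocks with the same sum are mapped to the same symbol.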
Unfortunately, this mapping is not bijective. So in order to form a Galois connection, similar to the value abstraction, we define the inverse mapping from symbols (i.e., sums) to sets of blocks of size oRow × oCol whose sum is exactly the symbol:
$$\gamma_s^S(\mathit{sum}) = \{ b \mid \alpha_s^S(b) = \mathit{sum} \}$$
Likewise, these two functions can be lifted to sets of elements in a natural way. It is easy to see that $\alpha_s^S(\gamma_s^S(\mathit{sum})) = \mathit{sum}$ for each symbol sum, and $b \in \gamma_s^S(\alpha_s^S(b))$ for any given block b of size oRow × oCol. Therefore, $(\alpha_s^S, \gamma_s^S)$ forms a Galois connection. For practicality, similar to the value concretization $\gamma_v$, we use a weak sum concretization function:
$$\gamma_s^{\mathit{Sprac}}(\mathit{sum}) = \{ b_j \mid b_j \in \gamma_s^S(\mathit{sum}) \text{ and } 0 \le j < k_s \}$$
That is, we select at most $k_s$ (which can depend on the block size and n) blocks to represent the corresponding sum. Clearly, the larger $k_s$ is, the closer $\gamma_s^{\mathit{Sprac}}(\mathit{sum})$ is to $\gamma_s^S(\mathit{sum})$. So concerning Galois connections, the larger $k_s$, the better.
In addition, the composition of $\alpha_s^S$ and $\gamma_s^S$ forms a manipulation as well, and the network N should be safe with respect to this manipulation. Formally, we say a k-shift manipulation $\mathit{sm}_k$ with respect to $\alpha_s^S$ and $\gamma_s^S$ is a function such that for any input array in, $\mathit{sm}_k(\mathit{in})$ is obtained from in by replacing at most k blocks $b_i$ with blocks $b'_i$, where $b_i$ is a block of size oRow × oCol belonging to in and $b'_i \in \gamma_s^S(\alpha_s^S(b_i))$. A network N is safe with respect to $\mathit{sm}_k$ if for every input array in
$$N(\mathit{sm}_k(\mathit{in})) = N(\mathit{in})$$
In a word, a k-shift manipulation replaces (at most) k blocks of the input array by blocks that have the same sums as the corresponding original blocks, and performing this k-shift manipulation should not result in a different classification. As with the value manipulation, safety cannot easily be preserved in practice. So we use a weaker notion called σ-safety: a network N is σ-safe with respect to $\mathit{sm}_k$ under a given input set D if
$$\frac{|\{ \mathit{in} \in D \mid N(\mathit{sm}_k(\mathit{in})) \ne N(\mathit{in}) \}|}{|D|} \le \sigma$$
where $|D| \ge 1$. Considering safety, the distance between two blocks of the same sum, or the size of the blocks, should be as small as possible.
Figure 8 shows an example of a 1-shift manipulation applied to the input array in Fig. 7, where n is 2 and the size of blocks is 2 × 2. Finally, let us consider the alphabet size. As discussed above, the alphabet size for the abstraction function $\alpha_s^S$ ($\alpha_s^B$ resp.) is linear (exponential resp.) in the block size and the number of intervals n. So concerning the alphabet size, the smaller the block size and the interval number n, the better.
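One simple way to realise a k-shift manipulation is to rearrange the values inside up to k blocks, since any rearrangement keeps the block sum unchanged. The sketch below follows this idea on an array of abstract integers. The function name `shift_manipulation`, the restriction to non-overlapping blocks aligned to the block grid, and the use of a random shuffle are assumptions made for illustration.

```python
import random


def shift_manipulation(array, k, o_row, o_col, rng):
    """Shuffle the values inside up to k grid-aligned blocks, preserving block sums."""
    rows, cols = len(array), len(array[0])
    corners = [(r, c)
               for r in range(0, rows - o_row + 1, o_row)
               for c in range(0, cols - o_col + 1, o_col)]
    result = [list(row) for row in array]
    for r, c in rng.sample(corners, min(k, len(corners))):
        cells = [(r + i, c + j) for i in range(o_row) for j in range(o_col)]
        values = [array[i][j] for i, j in cells]
        rng.shuffle(values)              # a permutation keeps the sum unchanged
        for (i, j), v in zip(cells, values):
            result[i][j] = v
    return result
```

Shuffling only reaches blocks that are permutations of the original one, so it explores a subset of the blocks sharing the same sum; a fuller concretization would also enumerate blocks with different value multisets but equal sums.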

Word abstraction
Finally, we split the input array into blocks and map them into a sequence of symbols (i.e., a word) in row (or column) major order. Algorithm 1 shows the details of the word abstraction function $\alpha_w$. The algorithm first invokes the value abstraction $\alpha_v$ to map the values in the input array into integers (Line 1). Then it slides over the integer array block by block (Lines 2-10) and maps each block into a symbol using the symbol abstraction $\alpha_s^S$ (Line 6). Note that here we use a narrow slide on the input array, that is, the blocks to be abstracted are fully contained in the input array. One can use a wide slide with zero-padding as well. Just like the convolution operation of a CNN, one can further set the stride sizes for each dimension. In addition, one can further encode the sequence of symbols into a final word in a more compact format, such as run-length encoding (RLE).
As mentioned above, the symbol abstraction aims to reduce the length of words. According to the word abstraction, we have that the larger the block size, the shorter the word. Figure 9 shows our abstraction applying on the input array given in Fig. 3, where the number n of intervals is 2 and the size of blocks is 2 × 2. Compared to the simple abstraction shown in Fig. 3, our abstraction yields words with smaller size of alphabet and shorter length.
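The word abstraction described above can be sketched in Python as follows. It is a minimal illustration under the assumptions that the input is a 2-dimensional list of values in [0, 1], that the intervals are equal, that a narrow slide with stride equal to the block size is used, and that words are represented as tuples of block sums; the function name `word_abstraction` is hypothetical.

```python
def word_abstraction(array, n, o_row, o_col, low=0.0, high=1.0):
    """Abstract a 2-D array into a word: value abstraction, then per-block sums."""
    ints = [[min(int((d - low) / (high - low) * n), n - 1) for d in row]
            for row in array]
    rows, cols = len(ints), len(ints[0])
    word = []
    ri = 0
    while ri + o_row <= rows:        # narrow slide: blocks lie fully inside the array
        ci = 0
        while ci + o_col <= cols:
            block_sum = sum(ints[r][c]
                            for r in range(ri, ri + o_row)
                            for c in range(ci, ci + o_col))
            word.append(block_sum)
            ci += o_col
        ri += o_row
    return tuple(word)
```

For a 4 × 4 input with n = 2, blocks of size 2 × 2 yield a word of length 4 over an alphabet of size 5, whereas blocks of size 1 × 4 yield a word of the same length with one symbol per row.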
The word concretization function $\gamma_w$, which is shown in Algorithm 2, does the opposite: it maps a sequence of symbols (i.e., a word) into a sequence of sets of blocks (Lines 5-7), and combines them into a set of arrays (Lines 8−10). Note that we require that the length of the word to be concretized conform to the size of the input (Lines 2−4). One can relax this length condition by zero-padding or by discarding the superfluous symbols. But this may break the Galois connections.

Algorithm 1 Word abstraction function $\alpha_w(\mathit{in})$.
Input: an input in
Output: a word w
1: in′ = $\alpha_v$(in)
2: w = ε, ri = 0
3: while ri + oRow ≤ iRow do
4: ci = 0
5: while ci + oCol ≤ iCol do
6: w = w · $\alpha_s^S$(in′[ri .. ri + oRow − 1][ci .. ci + oCol − 1])
7: ci = ci + oCol
8: end while
9: ri = ri + oRow
10: end while
11: return w
Theoretically, if $(\alpha_v, \gamma_v)$ and $(\alpha_s^S, \gamma_s^S)$ form Galois connections, then so does $(\alpha_w, \gamma_w)$. For practicality, we use the weak concretization functions $\gamma_v^{\mathit{prac}}$ and $\gamma_s^{\mathit{Sprac}}$, which yield a weak word concretization $\gamma_w^{\mathit{prac}}$, and the number of data in $\gamma_w^{\mathit{prac}}(w)$ depends on the word w as well as on $k_v$ and $k_s$. In particular, in our implementation we collect, from existing data, sets of inputs (including values and blocks) that are mapped into the same word, and then select inputs from the corresponding set.
Algorithm 2 Word concretization function $\gamma_w(w)$.
Input: a word w
Output: a set matrix_set of arrays

Finally, considering safety, if $(\alpha_v, \gamma_v)$ and $(\alpha_s^S, \gamma_s^S)$ do not cause flapping, then neither does $(\alpha_w, \gamma_w)$. But the safety of a network focuses on the vibrations of local parts of inputs (Huang et al., 2017). To evaluate whole inputs, we use another notion, conflict: inputs of different classifications should not be abstracted into the same word. Formally, we say a network N is non-conflict with respect to $\alpha_w$ and $\gamma_w$ if for every input array in and for every array $\mathit{in}' \in \gamma_w(\alpha_w(\mathit{in}))$
$$N(\mathit{in}') = N(\mathit{in})$$
In other words, the abstraction itself should not be over-approximated. We say a word w is a conflict word if there exist two inputs of different classifications that are abstracted into it. So to avoid over-approximation, the number of conflict words caused by the abstraction should be as small as possible. As with safety, non-conflict cannot easily be preserved in practice. We therefore evaluate the conflict words under a given dataset. In detail, we say a network N is σ-conflict with respect to $\alpha_w$ under a given set D if
$$\frac{|\{ \mathit{in} \mid \exists \mathit{in}'.\ N(\mathit{in}) \ne N(\mathit{in}') \wedge \alpha_w(\mathit{in}) = \alpha_w(\mathit{in}') \}|}{|D|} \le \sigma$$
where $|D| \ge 1$ and $\mathit{in}, \mathit{in}' \in D$.
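The ratio used in the definition of σ-conflict can be computed over a dataset by grouping the inputs by their abstract word. The sketch below assumes that the network and the word abstraction are given as Python callables and that abstract words are hashable; the name `conflict_ratio` is hypothetical.

```python
from collections import defaultdict


def conflict_ratio(network, inputs, abstract):
    """Fraction of inputs sharing an abstract word with a differently classified input."""
    labels_by_word = defaultdict(set)
    words = []
    for x in inputs:
        word = abstract(x)
        labels_by_word[word].add(network(x))
        words.append(word)
    # An input is in conflict when its word carries both classifications.
    conflicting = sum(len(labels_by_word[w]) > 1 for w in words)
    return conflicting / len(inputs)
```

A network is then σ-conflict under the dataset whenever the returned ratio does not exceed σ.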
To sum up, to obtain a suitable abstraction (e.g., scalable, safe and non-conflict), one needs to take into account the number n of intervals, the block size oRow and oCol, and the other factors.

Active learning
In this section, we present how to instantiate the active learning framework on neural networks, in particular the membership and equivalence queries.

Membership query
Membership queries can be answered by the neural network via the word concretization function. In our abstraction, we map a word into a set of data. As mentioned above, the abstraction may flap the results or yield some conflict words, that is, the classifications of different data in the set of the same word may not be the same. To address this, we count the numbers of different classifications of the data in the set and take the classification that gets the most votes as the result for the word.
Given a network N and a word concretization function $\gamma_w$, we say a word w is positive if
$$|\{ \mathit{in} \in \gamma_w(w) \mid N(\mathit{in}) = \mathit{true} \}| \ge |\{ \mathit{in} \in \gamma_w(w) \mid N(\mathit{in}) = \mathit{false} \}|$$
and negative otherwise. Intuitively, a word is positive (negative resp.) if more positive (negative resp.) input arrays than negative (positive resp.) ones are abstracted into it.
Algorithm 3 gives the procedure of membership query checking, where N denotes the neural network under learning. First, the algorithm concretises the word w being queried into a set matrix_set of possible data, using the word concretization function $\gamma_w$ (Line 1). If the set matrix_set is null, that is, the length of the word w does not conform to the size of the input data, then the algorithm returns false immediately. Otherwise, the algorithm feeds each datum into the neural network N under learning and counts the numbers of different classifications (Lines 5-12). Finally, it returns the classification that gets the most votes (Line 13).
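The majority-vote procedure described above can be sketched as follows. The sketch assumes that the network is a Python callable returning a Boolean and that `concretize` is a hypothetical stand-in for the (weak) word concretization, returning a collection of inputs or nothing when the word does not correspond to any input. Counting ties as positive follows the inequality in the definition of a positive word.

```python
def membership_query(network, concretize, word):
    """Answer a membership query by majority vote over the concretized inputs."""
    candidates = concretize(word)
    if not candidates:               # the word does not correspond to any input
        return False
    positives = sum(1 for x in candidates if network(x))
    negatives = len(candidates) - positives
    return positives >= negatives    # ties count as positive
```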

Equivalence query
As there is no finite interpretation for neural networks (Weiss, Goldberg & Yahav, 2018), equivalence queries are more challenging than membership queries. To address this, similar to the work of Weiss, Goldberg & Yahav (2018), we use an abstract representation of the neural network under learning. But unlike their work, we start with the automaton that is learned passively via the RPNI algorithm (Oncina & García, 1992) from some test queries, which are selected from the training dataset. Then we perform the equivalence query against this abstract model. As discussed in Weiss, Goldberg & Yahav (2018), when a counterexample is found, it may not be that the hypothesis is incorrect, but rather that the abstract model is not precise enough (i.e., it behaves differently from the neural network under learning) and needs to be refined.
The procedure of equivalence query checking is given in Algorithm 4. First, the algorithm tries to find a word that separates the hypothesis H and the abstract model M (Line 3). If such a word does not exist, then it returns null (Lines 4−6), which means the answer to the equivalence query is yes. Assume a word w is found. Then the algorithm checks whether this word is a true counterexample, that is, whether the classifications of the abstract model and the neural network under learning are the same (Line 7). If so, it returns this word as a counterexample to the learner (Line 8). Otherwise, it refines the abstract model with this word (Line 10): it adds the counterexample to the positive set or the negative set. After that, the algorithm continues the equivalence query against this refined model.
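The control flow of this equivalence check can be sketched as a loop that alternates between searching for a separating word and refining the abstract model. The sketch is an illustration under simplifying assumptions: the hypothesis, the abstract model and the network's majority-vote answer are plain Python callables on words, the separating word is found by a bounded exhaustive search, and refinement merely overrides the model on one word instead of re-learning it from the enlarged positive and negative sets. All names below are hypothetical.

```python
from itertools import product


def bounded_separator(alphabet, max_len):
    """Search all words up to max_len for one on which two acceptors differ."""
    def search(h, m):
        for length in range(max_len + 1):
            for word in product(alphabet, repeat=length):
                if h(word) != m(word):
                    return word
        return None
    return search


def override(model, word, label):
    """Stand-in refinement: force the model to answer `label` on `word`."""
    return lambda w: label if w == word else model(w)


def equivalence_query(hypothesis, abstract_model, membership, separate, refine):
    """Return (counterexample or None, possibly refined abstract model)."""
    while True:
        word = separate(hypothesis, abstract_model)
        if word is None:
            return None, abstract_model          # the answer is yes
        truth = membership(word)
        if abstract_model(word) == truth:
            return word, abstract_model          # true counterexample
        abstract_model = refine(abstract_model, word, truth)
```

The loop terminates in this sketch because each refinement fixes the abstract model on one more word of a finite search space.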

EXPERIMENTS
We have implemented our approach in a prototype in Java, wherein we use the library LearnLib (Howar et al., 2012) to implement the MAT learning framework and the RPNI algorithm. Moreover, to find true counterexamples faster, we use the Wp-method test (Fujiwara et al., 1991) in the equivalence query between the hypothesis and the abstract models. To evaluate our approach, we conduct a series of experiments on a classifier for the MNIST dataset, a large database of handwritten digits that is commonly used for training various image processing systems. First, we conduct experiments to measure the σ-safety, the σ-conflict, the size of the alphabet and the length of words for the MNIST classifier under abstractions with different interval numbers and block sizes. Second, we present experiments to learn DFAs from the MNIST classifier under different selected abstractions. Third, we conduct experiments to compare the resulting DFAs against the DFAs learned via the passive learning algorithms provided in LearnLib and against the MNIST classifier itself. The experiments were conducted on a workstation with an Intel i7-7820HQ processor (2.90 GHz) and 32 GB of memory.

MNIST classifier
The MNIST classifier under learning is a binary classification version of MnistClassifier from the tutorial examples of DeepLearning4J (https://github.com/deeplearning4j/dl4jexamples), which recognises the digit 1. It is built on a convolutional neural network, which consists of six layers, namely, a convolution layer, a pooling layer, another convolution layer, another pooling layer, a dense layer and an output layer. The training

Abstraction experiments
As discussed in "Abstraction", the interval number n and the block size affect the definition of the abstraction, especially the safety and the conflict of the neural network under learning, the size of alphabet and the length of words. For that, we present in this section some experiments to see these measures of the abstractions with different interval numbers and block sizes.

Safety
The first measure to test is safety. For that, we present experiments that test the flapping of the MNIST classifier on some selected inputs from the training set (i) by performing k-value manipulations $\mathit{vm}_k$ with different interval numbers and (ii) by performing k-shift manipulations $\mathit{sm}_k$ with different block sizes. First, in the experiments on the k-value manipulation $\mathit{vm}_k$, for a given interval number n, we randomly select k values from a selected input and replace each selected value by a random value that lies in the same interval. Then we feed the resulting data into the MNIST classifier and see whether the classifications are flapped. We select 59,838 inputs in total from the training set, all of which are classified correctly by the MNIST classifier. Table 1 shows the results, where Flaps denotes the number of inputs whose results are flapped by the manipulation, and Ratio denotes the percentage of flapped inputs among the selected inputs.
From the results we can see that the number of flapped inputs increases as the number k of selected values increases, since the larger the number k, the larger the vibration of the inputs. In contrast, as the number of intervals increases, the number of flapped inputs decreases, which indicates that the larger the interval number, the better. This conforms to the discussion in "Abstraction". Moreover, the results also show that the MNIST classifier is about 0.053%-safe with respect to the 100-value manipulation $\mathit{vm}_{100}$ with interval number 2. The 100-value manipulation means that 12.76% (100/784) of an input has been modified, so we believe 100 is enough for local vibration. Therefore, we suggest setting the interval number n to 2.
Next, in the experiments on the k-shift manipulation $\mathit{sm}_k$, for a given block size oRow × oCol, we randomly select k blocks from a selected input and rearrange the values in each selected block. Then we feed the resulting data into the MNIST classifier and see whether the classifications are flapped. As before, we select the 59,838 inputs from the training set that are classified correctly by the MNIST classifier. For simplicity and scalability, we consider the block sizes whose row sizes or column sizes are 28. The results are given in Table 2, where the notations are the same as in Table 1.
First, the results show that as the block size increases, the number of flapped inputs increases, which conforms to the discussion in "Abstraction". The results also show that the number of flapped inputs increases as the number k of selected blocks increases. This is because the larger the number k, the larger the vibration of the inputs. Moreover, we found that the MNIST classifier is safer under k-shift manipulations built on rows than under those built on columns. The reason may be that the digit 1 is more regular in row order than in column order. Finally, assume the size allowed for local vibration is about 100. Then the ratios of flapped inputs for the MNIST classifier with respect to the k-shift manipulation with block size 1 × 28, 2 × 28, 4 × 28, 28 × 1, 28 × 2 or 28 × 4 are all smaller than 1.8%. In particular, the MNIST classifier is about 0.055%-safe with respect to the 4-shift manipulation $\mathit{sm}_4$ with block size 1 × 28.

Conflict
The second measure to test is non-conflict, which indicates whether the abstraction with the given block size is over-approximated. In other words, we conduct experiments to test how many conflict words are generated on the training set by the abstractions with different block sizes. For that, we apply the abstractions with different block sizes to some selected inputs from the training set, and perform a statistical analysis of the abstracted words with respect to their classifications, where we take the interval number n to be the suggested value 2. In total, 59,840 test inputs are selected from the training set, with 6,700 positive inputs and 53,140 negative ones.
The statistical results are given in Table 3, where TW denotes the total number of words, PW (NW resp.) denotes the number of positive (negative resp.) words, CPW (CNW resp.) denotes the number of positive (negative resp.) words that have both positive and negative inputs and CPD (CND resp.) denotes the number of positive (negative resp.) inputs that are abstracted into a negative (positive resp.) word.
The results show that as the block size increases, the number of abstracted words decreases, which conforms to the discussion in "Abstraction". Thus it could be easier to extract the automaton for a larger block size. For example, when taking the whole input as a symbol, there are 239 words in total. However, both the number of conflict words and the number of conflict data increase as the block size increases, which indicates that an abstraction with a larger block size is prone to be an over-approximation. In particular, when taking the whole input as a symbol, 56.194% of the positive inputs are abstracted into negative words and 64.865% of the positive words are in conflict. Moreover, the results also show that all the σ-conflicts for the MNIST classifier with respect to the abstractions with the block size 1 × 28, 2 × 28, 28 × 1 or 28 × 2 are smaller than 0.015% (8/59,840). The abstractions with the block sizes 2 × 28 and 1 × 28 perform best, yielding neither conflict data nor conflict words.
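The conflict statistics can be computed by grouping the abstracted inputs by word, as in the following Python sketch. The function name `conflict_stats` is hypothetical, and one point is an assumption rather than something stated here: a word is labelled positive when more positive than negative inputs map to it, with ties counted as negative. The abstraction itself is taken as given, so each sample is already a (word, label) pair.

```python
from collections import Counter, defaultdict

def conflict_stats(samples):
    """Count TW, PW, NW, CPW, CNW, CPD and CND.

    samples: iterable of (word, is_positive) pairs, where `word` is a
    hashable sequence of symbols (e.g., a tuple).
    ASSUMPTION: a word is positive iff positives outnumber negatives.
    """
    per_word = defaultdict(Counter)
    for word, is_positive in samples:
        per_word[word][bool(is_positive)] += 1

    stats = dict.fromkeys(["TW", "PW", "NW", "CPW", "CNW", "CPD", "CND"], 0)
    for counts in per_word.values():
        pos, neg = counts[True], counts[False]
        mixed = pos > 0 and neg > 0       # word has inputs of both classes
        stats["TW"] += 1
        if pos > neg:                     # positive word
            stats["PW"] += 1
            stats["CPW"] += int(mixed)
            stats["CND"] += neg           # negative inputs in a positive word
        else:                             # negative word
            stats["NW"] += 1
            stats["CNW"] += int(mixed)
            stats["CPD"] += pos           # positive inputs in a negative word
    return stats
```

An abstraction with no conflict yields `CPW == CNW == CPD == CND == 0`.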

Word complexities
Finally, we also conduct experiments to examine the size of the alphabets and the length of the words, which we call word complexities. Table 4 shows the word complexities under the abstractions with different block sizes, where Size denotes the size of the alphabet, dSize denotes the number of symbols occurring in the selected inputs and Length denotes the length of the words.
The results show that the larger the block size, the larger the alphabet size and the shorter the word length, which conforms to the discussion in "Abstraction". Moreover, we found that the products of the alphabet size and the word length are almost the same, so for scalability any block size seems fine. However, if we consider the practical alphabet (i.e., the symbols occurring in the inputs), a larger block size could be better.
To sum up, based on the experiments above, we suggest using for the MNIST classifier the abstractions with the interval number n = 2 and the block size 1 × 28, 2 × 28, 28 × 1 or 28 × 2.
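The Length and dSize columns can be reproduced with two small helpers, sketched here in Python. The function names are hypothetical, and the sketch assumes that blocks tile the input without overlap, so that the word length equals the number of blocks (a 14 × 28 block on a 28 × 28 input gives words of length 2). The theoretical alphabet size depends on the symbol abstraction and is not computed here.

```python
def word_length(rows, cols, block_rows, block_cols):
    """Number of blocks, and hence symbols, per abstracted input.

    ASSUMPTION: blocks tile the input exactly and do not overlap.
    """
    if rows % block_rows or cols % block_cols:
        raise ValueError("block size must divide the input size")
    return (rows // block_rows) * (cols // block_cols)

def observed_alphabet(words):
    """Set of symbols that actually occur in the abstracted words (dSize is its size)."""
    return {symbol for word in words for symbol in word}
```

For a 28 × 28 input, `word_length(28, 28, 1, 28)` is 28 and `word_length(28, 28, 28, 28)` is 1, which matches the trade-off between word length and block size described above.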

Automata learning
In this section, we present the experiments to learn DFAs from the MNIST classifier under the suggested abstractions.
To quantitatively validate the models, we use the following performance measures. Accuracy is the most intuitive performance measure: it is simply the ratio of correctly predicted observations to the total number of observations. Precision is the ratio of correctly predicted positive observations to the total number of predicted positive observations, and Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. F1 score is the harmonic mean of Precision and Recall, that is, (2 · Precision · Recall)/(Precision + Recall). Moreover, there are two kinds of observations in our experiments, namely, the input arrays and the abstracted words (i.e., the abstractions of the input arrays), so we compute the measures above with respect to both kinds of observations. Intuitively, the higher the measures above, the better the model.
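The four measures can be computed from the confusion counts, as in the following Python sketch. The function name `binary_metrics` is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this illustration. The same function applies to either kind of observation (input arrays or abstracted words), given one predicted and one actual label per observation.

```python
def binary_metrics(predicted, actual):
    """Accuracy, Precision, Recall and F1 for boolean predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```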
The results show that all the learned DFAs perform well on the testing dataset, with F1 scores above 50%. In particular, the DFA learned under the abstraction with block size 1 × 28 performs best. The results also show that a DFA learned via an abstraction with a smaller block size obtains a higher F1 score. In detail, the F1 score of the DFA learned via the abstraction with block size 1 × 28 is higher than those with block size 2 × 28 or 28 × 2, and the F1 score with block size 28 × 1 is higher than the one with block size 28 × 2. This is because a smaller block size generates a more precise abstraction, which conforms to the discussion in "Abstraction". Moreover, the results show that the DFAs learned under the abstractions in rows perform better than the ones learned under the abstractions in columns in terms of all performance measures, with respect to both words and data. For example, the F1 score of the DFA learned via the abstraction with block size 1 × 28 is higher than the one with block size 28 × 1. The reason may be that the digit 1 is more regular in row order than in column order. In addition, we also apply the abstraction that maps a whole input to a symbol, as is done in the work of Weiss, Goldberg & Yahav (2018). Because this abstraction is over-approximated, the extracted DFA has the worst performance on the input data, although it performs better than the other models on the words, especially in Precision.

Learning complexities
During the experiments, we also record the learning times in seconds needed to learn the resulting DFAs and the sizes of the resulting DFAs. The results are given in Table 6, where aTime, xTime and iTime respectively denote the average, maximum and minimum learning times, and aState, xState and iState respectively denote the average, maximum and minimum numbers of states of the resulting DFAs.
The results show that learning a DFA via a smaller block size needs more time. As discussed before, an input array is abstracted into a longer word under an abstraction with a smaller block size, which thus requires more time to process. Concerning the size of the learned DFA, learning via a smaller block size can yield a larger DFA. For example, the number of states of the DFA learned under the abstraction with block size 28 × 1 is the largest one among the results. As with the learning time, the reason is that an abstraction with a smaller block size yields longer words, which could enlarge the learned DFA. In addition, both the learning times and the sizes of the learned DFAs under the abstractions in rows are larger than the ones under the abstractions in columns. The reason may be that more blocks are abstracted into 0 under the abstractions in columns than under the abstractions in rows.

Learned automata
Finally, we convert the learned DFAs into the format used in JFLAP (Rodger & Finley, 2006), which enables us to view the DFAs and to convert a DFA into a regular expression step by step. For simplicity, we consider a DFA learned via the abstraction with block size 14 × 28, which is given in Fig. 10. From this figure, we can see that (i) the learned DFA has a very clear structure: a starting node, an intermediate layer with several nodes, an accepting node and a trap node; and (ii) the learned DFA accepts words of length 2.
Next, consider a DFA learned via the abstraction with block size 7 × 28, which is shown in Fig. 11. Compared with the one in Fig. 10, this DFA has a more complex structure, but we can still identify some hierarchical structures in it. Unfortunately, we were not able to convert this DFA into a regular expression via JFLAP due to a runtime error.
In addition, we also present a DFA learned via the abstraction with block size 1 × 28, which has 949 states and is given in Fig. 12. It is rather complex, so we can only identify a rough hierarchical structure. We believe that this DFA can be understood better on closer inspection. The learned DFAs can also help to generate test cases for testing the networks, which is left as future work.

Comparison
To further evaluate the resulting DFA, we compare it against the MNIST classifier itself and against the DFAs learned via the passive learning algorithms provided in LearnLib, namely the RPNI algorithm, the RPNI-EDSM algorithm and the RPNI-MDL algorithm. In these experiments, we perform our abstraction on the training data and then learn a DFA via each passive learning algorithm provided in LearnLib. The abstraction we use is the one with interval number 2 and block size 1 × 28. All the positive arrays and only one ninth of the negative ones are selected for the RPNI-EDSM algorithm (to avoid memory overflow and excessive running time), whereas all the arrays are selected for the other two algorithms. Next, we evaluate all the models on the testing dataset. The results are given in Table 7, where the notations are the same as those of Table 5.
Compared to the RPNI one, our DFA performs better in all the performance measures, since the abstract model we use is the DFA learned via the RPNI algorithm from some inputs in the training dataset and is refined continually with respect to the classifier during learning. Compared to the RPNI-MDL and RPNI-EDSM ones, our DFA has a better Accuracy, Precision and F1 score, but a worse Recall. This is because these two DFAs take all the positive inputs in the training set into account, so that they can recognise more positive inputs in the testing dataset, while only part of the positive inputs are selected for our abstract model. The results also show that our DFA is still worse than the classifier. There are several reasons for this. The first is that, for efficiency and to avoid memory overflow, our implementation sets some bounds on the learning procedure (e.g., on the refining time for the abstract representation). The second is that the Wp-method test used in our experiments may miss some true counterexamples. The third is that the abstraction may be over-approximated and yield too many conflict words. In short, our approach still needs to be improved.

LIMITATIONS
Although our approach works for the MNIST classifier, it still has some limitations. Firstly, figuring out a suitable abstraction for the neural network under learning is not an easy task. As shown in Biggio et al. (2013), Szegedy et al. (2013) and Huang et al. (2017), several DNNs, including highly trained and smooth networks optimised for vision tasks, are unstable with respect to so-called adversarial perturbations. Hence, some neural networks may be too sensitive to the abstraction manipulation for a reasonable interval number to be found. Even if a reasonable interval number were found, one would need to make a compromise between the abstraction and the scalability to find a block size. Moreover, whether a Turing machine can simulate a natural neural network is an open question (Zenil & Quiroz, 2006). So in some sense, we cannot define an abstraction without conflict or flapping.
Secondly, scalability is another problem. Generally, the input size of a neural network is in the thousands. For such a neural network, either the alphabet may be too large (if a large block size is taken) or the words may be too long (if a small block size is taken) for us to extract the automaton. Taking the MNIST classifier as an example, extracting the automaton may take several hours, or even several days, for some abstractions (see the abstraction with block size 1 × 28 in Table 6). There are two possible reasons for this issue in our implementation: (i) we use the implementation of the RPNI algorithm from LearnLib, which does not support the incremental learning proposed in Dupont (1996) and could be improved with it; and (ii) there are too many queries for non-accepting words with invalid lengths.
Thirdly, our approach is dependent on the dataset. In "Experiments", we selected the interval number and the block size via an analysis of the training dataset. Different datasets may lead to different abstractions. To make things worse, an abstraction may be suitable for the training dataset but unsuitable for some other testing dataset. Moreover, our abstract model is built from some existing testing data. Different data yield different abstract models, which could affect the results, such as the learning time and the learned DFA.
Fourthly, the implementation of the equivalence query is a practical problem requiring attention. One may think that a precise equivalence check can be performed in polynomial time on the hypothesis automaton and the abstract automaton. However, the precise equivalence check could return so many false counterexamples that the learning takes too much time. This is because the precise equivalence check is prone to generate a short, false counterexample that is invalid with respect to the abstraction. Indeed, we have tried this precise equivalence check, but we only succeeded within 1 h on the abstraction with block size 28 × 28. So, for the sake of efficiency, we use the Wp-method test in the equivalence check, which enables us to find counterexamples whose lengths are in a given range.
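A precise equivalence check of this kind can be sketched in Python as a breadth-first search over the product of the two automata. This is a textbook construction given for illustration only; it is neither the Wp-method nor the check used in our prototype. The DFA encoding (a triple of initial state, accepting set and a total transition dictionary keyed by `(state, symbol)`) and the name `shortest_counterexample` are hypothetical.

```python
from collections import deque

def shortest_counterexample(dfa_a, dfa_b, alphabet):
    """Return a shortest word on which two complete DFAs disagree, or None.

    Each DFA is (initial_state, accepting_states, delta), where
    delta[(state, symbol)] is defined for every state and symbol.
    The search visits each pair of states at most once, so it is
    polynomial in the sizes of the two automata.
    """
    init_a, acc_a, delta_a = dfa_a
    init_b, acc_b, delta_b = dfa_b
    start = (init_a, init_b)
    queue = deque([(start, ())])
    seen = {start}
    while queue:
        (qa, qb), word = queue.popleft()
        if (qa in acc_a) != (qb in acc_b):
            return word  # first disagreement found in BFS order is shortest
        for symbol in alphabet:
            nxt = (delta_a[qa, symbol], delta_b[qb, symbol])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + (symbol,)))
    return None
```

Because the search is breadth-first, it always reports a shortest disagreeing word. This illustrates why such a check tends to return short words, which may have a length that no abstracted input can have.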
Fifthly, although our approach is black-box, the structures of neural networks may affect the performance of the networks themselves, and hence the performance of the learned DFAs. We have applied our approach with the abstraction 2 × 28 to MNIST classifiers whose numbers of hidden layers range over {1, 2, 5, 10}; these classifiers are binary classification versions of the tutorial examples of DeepLearning4J. The accuracies of all the classifiers are above 99%. Figure 13 shows the F1 scores with respect to words and input data of the learned DFAs. We can see that the F1 scores are quite close to each other, and that the DFA learned from a larger network does not necessarily achieve the best F1 score. Therefore, we present the experimental results on only one MNIST classifier in "Experiments".
Figure 13 The F1 scores with respect to words and input data via the abstraction 2 × 28.

RELATED WORK
In this section, we review some related work. Existing work on DFA extraction from neural networks targets RNNs, which was extensively explored in Jacobsson (2005) and Wang et al. (2017). Omlin & Giles (1996) proposed a global partitioning of the network state space into q equal intervals along every dimension, and then explored the network transitions in the partitioned space. Our value abstraction adopts this partitioning, but we work on the input space instead of the state space. Cechin, Simon & Stertz (2003) presented an approach to extract DFAs using k-means and fuzzy clustering. The key idea is to classify a large sample set of reachable network states using k-means. Hou & Zhou (2018) proposed another approach to extract DFAs from RNNs using two clustering algorithms, namely LISOR-k and LISOR-x, on hidden states. Several other works adopted cluster analysis on the state space, including k-means clustering (Zeng, Goodman & Smyth, 1993; Frasconi et al., 1996; Gori et al., 1998; Cohen et al., 2017), hierarchical clustering (Sanfeliu & Alquezar, 1994) and self-organizing maps (Tiňo & Šajda, 1995). These approaches have to access the state vectors, while our approach is a black-box one.
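The equal-interval partitioning can be illustrated on a single input value with the following Python sketch. The function name `interval_index`, the default value range [0, 255] (the usual range of MNIST pixel intensities) and the treatment of interval boundaries are assumptions made for this illustration.

```python
def interval_index(value, q, low=0.0, high=255.0):
    """Index (0 .. q-1) of the equal-width interval of [low, high] containing value.

    ASSUMPTION: intervals are closed on the left, and the upper bound
    `high` is placed in the last interval.
    """
    if not low <= value <= high:
        raise ValueError("value outside [low, high]")
    width = (high - low) / q
    return min(int((value - low) / width), q - 1)
```

With q = 2, the values 0 to 127 fall into interval 0 and the values 128 to 255 into interval 1.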
Recently, Weiss, Goldberg & Yahav (2018) adopted active learning to extract automata from RNNs. Our work is inspired by and similar to theirs, but differs in the following respects: (1) we target general neural networks, not only RNNs; (2) we treat an input as a word, rather than as a symbol; (3) we use a DFA that is inferred from some training data as an abstract model for equivalence queries.

CONCLUSION
In this work, we have proposed a MAT framework to extract automata from neural networks, employing abstraction interpretation of the neural networks for answering membership and equivalence queries. We have implemented our approach in a prototype and have carried out experiments on a MNIST classifier. Through the experiments, we have found that the DFA extracted from the MNIST classifier under the abstraction with the interval number 2 and the block size 1 × 28 performs best. The experiments also show that our resulting DFA performs better on the MNIST dataset than the DFAs learned via the passive learning algorithms provided in LearnLib.
As for future work, we may consider a better encoding, such as run-length encoding (RLE), to improve the approach. We can improve the RPNI algorithm with incremental learning to reduce the learning time. We can also perform experiments on other neural network classifiers. Other models that can be extracted from neural networks are also under consideration.