A Reinforcement Learning Based Grammatical Inference Algorithm Using Block-Based Delta Inverse Strategy

A resurgent interest in grammatical inference, also known as automaton learning, has emerged in several areas of computer science such as machine learning, software engineering, robotics and the Internet of Things. An automaton learning algorithm commonly uses queries to learn the regular grammar of a Deterministic Finite Automaton (DFA). These queries are posed by the learner (the learning algorithm) to a Minimal Adequate Teacher (MAT), which provides answers to the membership and equivalence queries the learner may pose. Learning algorithms fall into three main categories: incremental, sequential, and complete. In the presence of a MAT, the time complexity of existing DFA learning algorithms is polynomial; consequently, in some applications these algorithms may fail to learn the system. In this study, we reduce the time complexity of DFA learning from polynomial to logarithmic form. To this end, we propose an efficient complete DFA learning algorithm, Block-based DFA Learning through Inverse Queries (BDLIQ), which uses a block-based delta inverse strategy built on the idea of inverse queries that John Hopcroft introduced for state minimization of a DFA. The BDLIQ algorithm possesses $O(\vert \Sigma \vert N \log N)$ complexity when a MAT is available, and the MAT is also made capable of responding to inverse queries. We provide theoretical and empirical analyses of the proposed algorithm. Results show that our approach to complete learning, the BDLIQ algorithm, is more efficient than the ID algorithm in terms of time complexity.


I. INTRODUCTION
Automaton learning, also known as grammatical inference, is a field in which a system is inferred in the form of an automaton by providing a sequence of inputs (i_1, i_2, ..., i_n) and then synthesising the corresponding output sequence (o_1, o_2, ..., o_n). Two main concepts are used in automaton learning: the Learner and the Minimal Adequate Teacher (MAT) [4]. Depending on the setting the learning algorithm provides, the learner learns the regular set through queries and counterexamples. The learner poses queries to the MAT, and the MAT provides answers regarding the unidentified regular set. The MAT responds to two different kinds of question. A membership query is the first kind and consists of a string t ∈ Σ*. Depending on whether or not the string t is a part of the unidentified regular set, the teacher responds ''yes'' or ''no''. The second kind of query is a conjecture, which consists of a description of a regular set S. If S is behaviorally equivalent to the unknown language, the response is ''yes''; otherwise it is a string t in the symmetric difference between S and the unknown language. In the second case, the string t is referred to as a witness or a counterexample, because it serves to refute the conjectured set S.
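To make the two query types concrete, the learner/teacher protocol can be sketched as follows. The class and method names are illustrative assumptions, not an interface prescribed by the paper, and the equivalence check here is approximated by sampling test strings rather than a true language-equivalence decision:

```python
# Sketch of the learner/teacher (MAT) protocol described above.
# All names are illustrative; the paper does not prescribe this interface.
class MAT:
    def __init__(self, target_accepts):
        # target_accepts: predicate deciding membership in the unknown regular set
        self.target_accepts = target_accepts

    def membership_query(self, t):
        """Answer yes/no for a string t in Sigma*."""
        return self.target_accepts(t)

    def equivalence_query(self, hypothesis_accepts, test_strings):
        """Approximate conjecture check: return a counterexample from the
        symmetric difference of the two languages, or None if none is found."""
        for t in test_strings:
            if self.target_accepts(t) != hypothesis_accepts(t):
                return t  # witness refuting the conjectured set
        return None

# Usage: the unknown regular set is "even number of a's" over {a, b}
teacher = MAT(lambda t: t.count("a") % 2 == 0)
assert teacher.membership_query("abab") is True
assert teacher.equivalence_query(lambda t: True, ["a", "b"]) == "a"
```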
Automaton learning algorithms are designed to learn in the limit, producing an isomorphic representation of the desired Deterministic Finite Automaton (DFA). E. M. Gold [19] introduced the idea of ''learning in the limit'' in 1967. In his paper, he demonstrated how the regular language corresponding to some unidentified target DFA may be inferred through a finite number of queries or guesses using a grammatical inference (automaton learning) algorithm. Three main categories of automaton learning algorithms have been proposed in the literature [35], [36]: incremental learning algorithms, sequential learning algorithms and complete learning algorithms. In incremental learning, the system under learning (SUL) is learnt in a series of increments i = 1, 2, ..., n. At the conclusion of each increment, the learner formulates a hypothesis DFA M_i and asks the teacher an equivalence query. If the equivalence query receives a negative response, the teacher may or may not give a counterexample. If the teacher gives a counterexample, the learner expands its learning process based on the counterexample it received and carries the knowledge gained in the prior increment(s) into the new increments. Learning also proceeds in a number of increments in sequential learning, but the learner begins each increment from scratch and does not draw on the knowledge from earlier increments. Complete learning involves learning everything about the system needed to produce a hypothesis: the hypothesis DFA M is generated only once the learner has mastered the entire system (SUL).
The ability of grammatical inference [32], [39] to address a variety of real-world applications in machine learning [18], [20], [45], software engineering [1], [2], [37], robotics, artificial intelligence and big data [27] has led to its adoption in recent years by the computer science research community. In general, these applications employ the idea of inferring an automaton by creating a model of a system under learning (SUL) and examining it to determine whether its behaviour adheres to a specification.
In the presence of a MAT, the current automaton learning algorithms have polynomial time complexity (at least cubic). They require a great deal of time when inferring the model of the System Under Learning (SUL) and can occasionally fail to learn extremely complicated systems. In [40], the authors give real-world examples of six transition systems of CCS processes [13], [14], [15], such as buffers, schedulers, vending machines, and mutual exclusion protocols [34], [44], which they fail to learn due to learning algorithms that are inefficient in terms of time and a lack of storage space. In their study, the authors also emphasise the need for an automaton learning algorithm that is sufficiently efficient in terms of execution time and memory. According to the existing literature [7], even though significant progress has been made in the development of DFA learning algorithms, many researchers agree that more effective automaton learning algorithms must still be developed in order to address real-world learning problems and situations [42].
The notion of the minimal adequate teacher (MAT) was first introduced in the ID (Identification of Regular Languages) algorithm, and without a MAT the learning of a DFA is an NP-hard problem; therefore, in this study we propose a new efficient DFA learning algorithm called BDLIQ, based on the ID algorithm. In contrast to the DLIQ algorithm described in our earlier work [21], the BDLIQ algorithm does not employ a live complete set (as an input) for learning purposes, which is a limitation when learning the System Under Learning (SUL) in many practical applications. The BDLIQ algorithm substitutes for it a set of input alphabet symbols for block-based learning. This learning approach also reduces its complexity from polynomial to logarithmic form.
By partitioning the states of the System Under Learning (SUL) into blocks of final and non-final states, the BDLIQ algorithm identifies behaviorally equivalent and non-equivalent states by traversing back towards the initial state using inverse queries, which were first put forward by John Hopcroft for state minimization of DFAs.
The main contributions of this research work are as follows: 1) We develop a novel and efficient DFA learning algorithm that reduces the worst-case time complexity of the complete learning process to logarithmic form. 2) We introduce the concepts of delta inverse (δ⁻¹) and Inverse Queries (IQ) in DFA learning. 3) We improve the capabilities of the current MAT to make it capable of answering inverse queries.

The remainder of the paper is structured as follows: in Section II, we provide background information to help readers comprehend the proposed algorithm, and in Section III, we discuss relevant research in the field. The proposed BDLIQ algorithm is presented in Section IV. In Section V we analyse the BDLIQ algorithm's time complexity, its termination and correctness are proven in Section VI, and a working example of the algorithm is demonstrated in Section VII. We compare the effectiveness of our proposed BDLIQ algorithm and the ID algorithm in Section VIII. Finally, in Section IX, we present our conclusions.

II. PRELIMINARIES AND NOTATIONS
A Deterministic Finite Automaton (DFA) A consists of a five-tuple ⟨Q, Σ, δ, q_0, F⟩ where: Q denotes the finite set of states; Σ is a finite set of input symbols; the transition function δ : Q × Σ → Q determines the subsequent state when an input symbol is read from a certain state; q_0 ∈ Q is the start state; and F ⊆ Q is the collection of final states.

Definition 1: Let A be a DFA with transition function δ : Q × Σ → Q, also written δ(q_i, σ) = q_j, and let the iterated transition function δ* : Q × Σ* → Q be defined inductively by δ*(q, λ) = q, where λ is the empty string, and δ*(q, b_1 b_2 ... b_n) = δ(δ*(q, b_1 b_2 ... b_{n−1}), b_n). Likewise, we inductively define δ⁻¹* using the inverse transition relation δ⁻¹, where δ⁻¹ : Q × Σ → 2^Q, also written δ⁻¹(q_j, σ) ⊆ Q; here δ⁻¹ denotes an inverse transition that, by reading an element of Σ from a state in Q, gives its predecessor states. The inductive definition of δ⁻¹* now follows simply: δ⁻¹*(q, λ) = q if q is a start state; otherwise it returns the empty set. □

The Inverse Query (IQ) is a question asked by the learner to the MAT about the predecessor state(s) of a state q_j, reached by reading some string α ∈ Σ*, i.e., δ⁻¹*(q_j, α) = ? The teacher's response is either an empty set or a set of one or more states.
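As an illustration of Definition 1, the following sketch (with hypothetical names of our own) builds the inverse transition relation δ⁻¹ from a DFA's transition table and answers an inverse query δ⁻¹*(q, α) by reading α backwards:

```python
from collections import defaultdict

# Minimal sketch (names are assumptions): build the inverse transition
# relation from a DFA's transition table, then answer an inverse query
# delta_inverse*(q, alpha) -> set of predecessor states.
def build_delta_inverse(delta):
    """delta: dict mapping (state, symbol) -> state."""
    inv = defaultdict(set)
    for (q, a), q_next in delta.items():
        inv[(q_next, a)].add(q)  # q is a predecessor of q_next on symbol a
    return inv

def inverse_query(inv, q, alpha):
    """States p with delta*(p, alpha) = q, found by reading alpha backwards."""
    current = {q}
    for a in reversed(alpha):
        current = set().union(*(inv[(s, a)] for s in current)) if current else set()
    return current

# DFA over {a, b} accepting strings ending in 'a': states q0 (start), q1 (final)
delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q1", ("q1", "b"): "q0"}
inv = build_delta_inverse(delta)
assert inverse_query(inv, "q1", "a") == {"q0", "q1"}   # both states reach q1 on 'a'
assert inverse_query(inv, "q0", "ab") == {"q0", "q1"}
```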
A block is a collection of states denoted B(num), where num is the block number. Let B(k) stand for the kth block in the collection of blocks. The number of states in a block is denoted |B(k)|, which also serves as the block's size. LearningBlock is the collection of all blocks that have already been enqueued for learning purposes in BlockQueue, where BlockQueue is a queue whose elements are each represented as B_new. A collection of blocks called B_num is used to keep predecessor states in the corresponding block number, such as B(num″). The BlockSet is the collection of blocks that correspond to the states of the hypothesis DFA.
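A minimal sketch of this block bookkeeping, under the assumption that blocks are plain sets and BlockQueue is a FIFO queue (the concrete representation is ours, not the paper's):

```python
from collections import deque

# Illustrative sketch of the block bookkeeping described above (the names
# mirror the text, but the concrete representation is an assumption).
states = {"q0", "q1", "q2", "q3"}
finals = {"q1", "q3"}

# Initial partition: block 0 holds non-final states, block 1 holds final states.
BlockSet = {0: states - finals, 1: set(finals)}

# Blocks already enqueued for learning purposes, and the queue itself.
LearningBlock = [0, 1]
BlockQueue = deque(LearningBlock)

assert BlockSet[0] == {"q0", "q2"}
assert len(BlockSet[1]) == 2          # |B(1)| = 2, the block's size
assert BlockQueue.popleft() == 0      # blocks are dequeued in FIFO order
```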

III. RELATED WORK
Many studies have addressed automaton learning over the previous few decades. Some researchers have concentrated in particular on previously developed learning algorithms and have attempted to determine the effects of the alphabet size and the number of states on the required number of membership queries. Others have contributed by outlining the restrictions placed on equivalence queries [5]. Many have concentrated primarily on the kinds of queries and their importance in automaton learning [6], [8], [9], [10], [11], [22] and in learning regular expressions [26].
Some researchers have studied how to use grammatical inference in various contexts such as robotic planning [12], vehicle platooning [25], [31] and automated verification of distributed control programs [24]. Others have proposed algorithms to learn random DFAs [7], nondeterministic input-enabled labelled transition systems [43] and Büchi automata based on families of DFAs [28].
To the best of our knowledge, the existing complete learning algorithms in the literature, other than DLIQ, are L* [4], ID [3] and RPNI [16].
The L* learning algorithm is complete. Two different types of queries, membership and equivalence queries, are used to infer a regular language. It poses membership queries and stores the answers in a table known as an observation table (OT). Before making a hypothesis and posing an equivalence query, the OT must satisfy two fundamental criteria. According to [4], these properties are closure and consistency. A hypothesis can be created only when the OT is closed and consistent; until the OT becomes consistent and closed, L* continues to learn.
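The closure and consistency criteria can be sketched as predicates over a membership table T; the row encoding below is an illustrative simplification, not Angluin's exact data structure:

```python
# Sketch of the two L* observation-table criteria (hedged: the row/table
# encoding here is illustrative, not Angluin's exact data structure).
def row(T, suffixes, prefix):
    """The row of a prefix p is the tuple of membership answers T[p + e]."""
    return tuple(T[prefix + e] for e in suffixes)

def is_closed(T, prefixes, one_step, suffixes):
    """Closed: every one-step extension's row already appears among prefix rows."""
    rows = {row(T, suffixes, p) for p in prefixes}
    return all(row(T, suffixes, x) in rows for x in one_step)

def is_consistent(T, prefixes, alphabet, suffixes):
    """Consistent: prefixes with equal rows stay equal after any one symbol."""
    for p in prefixes:
        for q in prefixes:
            if row(T, suffixes, p) == row(T, suffixes, q):
                for a in alphabet:
                    if row(T, suffixes, p + a) != row(T, suffixes, q + a):
                        return False
    return True

# Example: target "even number of a's" over {a}; T maps string -> membership
T = {"": True, "a": False, "aa": True}
assert is_closed(T, prefixes=["", "a"], one_step=["a", "aa"], suffixes=[""])
assert is_consistent(T, prefixes=["", "a"], alphabet=["a"], suffixes=[""])
```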
According to [3], the ID algorithm is a complete learning algorithm. To learn the regular set, it poses membership queries to the teacher (MAT). This algorithm was the one that initially introduced the idea of the MAT. The terms live states, live complete set, distinguishing strings and dead state d_0 are used in this algorithm. The ID algorithm creates a table and searches it for the blocks of accepting and non-accepting states. When the first iteration is finished, the ID algorithm selects two strings from the live complete set that behave similarly but such that, when some σ ∈ Σ is concatenated with one of the strings, the other behaves differently, moving to the rejecting block. This provides a possible distinguishing string. Once it fails to find any further such pair of strings, the ID algorithm constructs the hypothesis DFA H, which is isomorphic to the target DFA.
According to Oncina and García in 1992, the RPNI algorithm is a passive learning algorithm. The information regarding the hypothesis is stored in a tree data structure rather than a table, and consistency is not maintained. It does not employ membership queries for learning purposes. It requires two input sets, S+ and S−, which stand for positive and negative examples, respectively. It creates the prefix tree PT(S+) from the set of positive examples and their prefixes after first writing the items of S+ and their prefixes in lexical order. Then, it divides the tree's branches into blocks in a recursive manner. Each element of PT(S+) initially forms a block of its own. These blocks are merged recursively by the RPNI algorithm until two final blocks remain: the first is the accepting-state block, and the second is the non-accepting-state block.
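The construction of the prefix tree PT(S+) can be sketched as follows; the dictionary encoding is our assumption, and RPNI's subsequent merge phase is omitted for brevity:

```python
# Sketch of the prefix tree PT(S+) construction described above (the dict
# encoding is an assumption; RPNI's merge step is not shown).
def prefix_tree(S_plus):
    """Each prefix of a positive example becomes a node; edges follow symbols."""
    nodes = {""}                      # the root is the empty prefix
    edges = {}                        # (prefix, symbol) -> longer prefix
    accepting = set(S_plus)           # full positive examples are accepting
    for w in sorted(S_plus):          # lexical order, as in the text
        for i in range(len(w)):
            p, a, pa = w[:i], w[i], w[:i + 1]
            nodes.add(pa)
            edges[(p, a)] = pa
    return nodes, edges, accepting

nodes, edges, accepting = prefix_tree({"a", "ab"})
assert nodes == {"", "a", "ab"}
assert edges == {("", "a"): "a", ("a", "b"): "ab"}
assert accepting == {"a", "ab"}
```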
If we examine the complexity of these complete learning algorithms, we find that, when an adequate teacher is present, their complexity is polynomial, as shown in Table 1. John Hopcroft established the idea of state minimization, and in his algorithm he employed the tactic of creating blocks of final and non-final states. He found the similar states needed to create the minimal target automaton using the δ⁻¹ transition approach, and he showed that his technique generates a minimal target automaton with O(n log n) complexity.
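Hopcroft's partition-refinement idea, which the block-based strategy in this paper builds on, can be sketched compactly. This is a simplified rendering using Python sets, not Hopcroft's original linked-list implementation, so it does not attain the O(n log n) constant factors:

```python
from collections import defaultdict

# Compact sketch of Hopcroft-style partition refinement: start from the
# {final, non-final} split and refine blocks using the inverse transition
# relation until no candidate splitter remains.
def hopcroft_partition(states, alphabet, delta, finals):
    inv = defaultdict(set)
    for (q, a), r in delta.items():
        inv[(r, a)].add(q)                 # predecessors of r on symbol a
    P = [b for b in (set(finals), set(states) - set(finals)) if b]
    W = [b.copy() for b in P]              # worklist of candidate splitters
    while W:
        A = W.pop()
        for a in alphabet:
            # X: states with an a-transition into the splitter A
            X = set().union(*(inv[(q, a)] for q in A))
            for B in P[:]:
                inter, diff = B & X, B - X
                if inter and diff:         # B is split by X
                    P.remove(B)
                    P.extend([inter, diff])
                    if B in W:
                        W.remove(B)
                        W.extend([inter, diff])
                    else:
                        W.append(min(inter, diff, key=len))
    return P

# DFA where A~B and C~D are behaviorally equivalent pairs
delta = {("A", "a"): "C", ("B", "a"): "D", ("C", "a"): "C", ("D", "a"): "D"}
blocks = hopcroft_partition({"A", "B", "C", "D"}, ["a"], delta, {"C", "D"})
assert sorted(sorted(b) for b in blocks) == [["A", "B"], ["C", "D"]]
```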
In this paper, building on the existing literature on automaton learning and state minimization, we present BDLIQ, a new efficient DFA learning algorithm based on the concepts of the ID algorithm together with the inverse transition strategy of John Hopcroft's algorithm for state minimization of DFAs. The current study aims to decrease the bounds for complete DFA learning by reducing the complexity of DFA learning from polynomial (cubic) to logarithmic form. A comparison of our proposed approach with existing (complete) DFA learning algorithms is provided in Table 1.

IV. THE PROPOSED BDLIQ ALGORITHM
The proposed BDLIQ algorithm is presented in Algorithms 1 and 2.

V. COMPLEXITY ANALYSIS
The minimum adequate teacher (MAT) has polynomial time complexity, but for query complexity analysis its complexity is taken as constant [4]. For a membership or δ⁻¹ query, the MAT has O(1) time complexity; however, for an equivalence query, it has polynomial time complexity. Table construction

VI. CORRECTNESS AND TERMINATION
The algorithm's correctness is established by the fact that it successfully creates a learned DFA that is consistent with the target DFA A.

Theorem 2 (Correctness Theorem):
The BDLIQ algorithm terminates on BlockQueue and the hypothesis automaton M is a canonical representation of A.
Proof: To learn the grammar of the target DFA A, the BDLIQ algorithm poses inverse and equivalence queries; note that BDLIQ learns explicitly through inverse queries. To establish this theorem, we rely on the following two premises: (a) The Termination Theorem 1 establishes that learning terminates when all elements of the set LearningBlock that are enqueued in BlockQueue are exhausted. (b) The BDLIQ algorithm combines all the states of the target automaton A that are behaviorally equivalent into a single state block, and the hypothesis automaton M is then built utilising all the blocks formed as states of the learnt hypothesis. As a result, the hypothesis automaton M is a unique minimal representation of A. Combining (a) and (b) proves the theorem. □

VII. AN EXAMPLE
We now provide an example to demonstrate how the BDLIQ algorithm functions. For this, we consider an example automaton taken from [41] given in Fig. 1.
The inputs of the algorithm consist of the target automaton A (given in Fig. 1). Each block represents a single state. By reading the input transitions from the δ⁻¹ table, the algorithm constructs the hypothesis automaton M (described in Fig. 2).

VIII. COMPARISON OF BDLIQ AND ID ALGORITHM
To compare the performance of BDLIQ with other relevant algorithms, we looked at the algorithms listed in Table 1, which lists all the complete learning algorithms currently available in the literature. Only three other learning algorithms, L*, ID, and RPNI, are complete learning algorithms like the BDLIQ algorithm proposed in this study. RPNI cannot be compared to BDLIQ since it is based on passive learning, while L*, ID, and BDLIQ are based on active learning. Therefore, only L* and/or ID can be used for comparison with BDLIQ. However, L*'s obligation to use a counterexample whenever the hypothesis it builds is not equivalent to the target automaton is a key distinction from the others; this characteristic, absent from both the ID and BDLIQ algorithms, makes its learning more directed in comparison. We were therefore left with only the ID algorithm for comparison with the BDLIQ algorithm, as the two share almost all comparison parameters, with the exception of the inverse query.
For the comparison of the BDLIQ and ID algorithms we have set up an evaluation framework (given in Fig. 3) consisting of the following modules: 1) A target DFA (A) 2) A random DFA generator 3) The BDLIQ and ID algorithms 4) A DFA equivalence checker
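Module 4, the DFA equivalence checker, can be sketched as a breadth-first product construction that either returns a shortest distinguishing string or reports equivalence; the tuple encoding of a DFA here is our assumption, not the framework's actual Java interface:

```python
from collections import deque

# Sketch of an equivalence checker in the spirit of module 4: a
# product-construction BFS that returns a distinguishing string, or None
# if the two complete DFAs accept the same language.
def find_counterexample(dfa1, dfa2, alphabet):
    """Each dfa is (start, delta, finals) with delta: (state, symbol) -> state."""
    (s1, d1, f1), (s2, d2, f2) = dfa1, dfa2
    seen = {(s1, s2)}
    queue = deque([(s1, s2, "")])
    while queue:
        p, q, w = queue.popleft()
        if (p in f1) != (q in f2):
            return w                      # shortest distinguishing string
        for a in alphabet:
            nxt = (d1[(p, a)], d2[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((*nxt, w + a))
    return None

# "even number of a's" vs "all strings" over {a}
dfa_even = ("e", {("e", "a"): "o", ("o", "a"): "e"}, {"e"})
dfa_all = ("s", {("s", "a"): "s"}, {"s"})
assert find_counterexample(dfa_even, dfa_all, ["a"]) == "a"
assert find_counterexample(dfa_even, dfa_even, ["a"]) is None
```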

A. EXPERIMENTAL SETUP
We used the Java language to implement the BDLIQ and ID algorithms, and we carried out the experiments on a computer with Windows 8.1 Pro, 16 GB of RAM, and an Intel Core i5-3470 processor. We ran numerous tests on both algorithms, varied according to two key parameters of the target DFA A: 1) the state size |Q| and 2) the input alphabet size |Σ|. We set up the tests with state size |Q| ranging over 10, 20, 30, ..., 100 and alphabet size |Σ| ranging over 2, 4, ..., 10 in order to analyse the performance of both algorithms. Randomly generated target DFAs covered every possible pairing of the two parameters, such as (|Q| = 10, |Σ| = 2), (|Q| = 10, |Σ| = 4), (|Q| = 10, |Σ| = 6), (|Q| = 10, |Σ| = 8), (|Q| = 10, |Σ| = 10). Additionally, we ran each parameter configuration ten times and compiled the data to determine the mean of the ten trials.

B. EMPIRICAL EVALUATION
For the comparison of these two algorithms (BDLIQ and ID), we have considered the following two parameters: 1) the number of queries (posed by the learner to the MAT for learning purposes) and 2) the learning time (ms) (the time taken by the equivalence checker is not included).

C. RESULTS AND ANALYSIS
The calculated results are provided in Tables 3, 4, 5 and 6. The results in Tables 3 and 4 show that, by learning through inverse queries instead of membership queries, the overall number of queries posed by the BDLIQ algorithm for learning purposes is far smaller than that of the ID algorithm; even the values for the parameter set |Σ| = 10 and number of states |Q| = 100 of BDLIQ are less than those for the parameter set |Σ| = 2 and number of states |Q| = 100 of the ID algorithm. Similarly, the results given in Tables 5 and 6 paint the same picture regarding learning times. Using the delta inverse strategy, the learning time of the BDLIQ algorithm is far lower than that of the ID algorithm, which uses the conventional delta strategy for learning. The maximum value against the parameter set |Σ| = 10 and number of states |Q| = 100 is more than 20 times less than the corresponding learning time of the ID algorithm, which clearly shows the efficiency of the BDLIQ algorithm over the ID algorithm with respect to learning time.
According to the comparative analysis presented in Fig. 4 and Fig. 5, due to the large number of queries posed by the ID algorithm for learning purposes, it experiences a significant increase in the number of queries and learning time as the number of states or the input alphabet size increases, whereas, due to the delta inverse strategy and inverse queries used in the BDLIQ algorithm, the graphs of the BDLIQ algorithm grow relatively slowly and show logarithmic behaviour as the number of states or input alphabet size increases. It follows that, in terms of time and the number of queries posed to the MAT, the BDLIQ algorithm is more efficient than the ID algorithm.

1) DISCUSSION
All the existing complete learning algorithms described in Table 1 learn the target DFA using the δ strategy and possess polynomial time complexity. Our proposed BDLIQ algorithm uses the δ⁻¹ strategy, due to which it learns the model from the final states to the start state, unlike other learning algorithms. The L* and ID algorithms are active in nature and use membership queries for learning purposes, whereas the RPNI algorithm is passive in nature and does not pose any kind of query. On the other hand, the BDLIQ algorithm uses IQs and successfully infers the model by posing a smaller number of queries than the other complete learning algorithms. Therefore, the learning time of the BDLIQ algorithm is small, and it possesses a worst-case time complexity of O(|Σ|N log N).

IX. CONCLUSION
The existing grammatical inference algorithms have at least polynomial time complexity, due to which they take a great deal of time to infer the required model; in some situations, they even fail to learn extremely complicated systems. In this paper, we have developed a novel complete learning algorithm called BDLIQ for learning deterministic finite automata (DFAs), which reduces the complexity of DFA learning from polynomial to N log N form, i.e., O(|Σ|N log N). To this end, we have used the delta inverse strategy along with inverse queries. Additionally, we have performed an empirical comparison of our proposed algorithm with the existing complete learning algorithm that shares similar characteristics with BDLIQ, namely the ID algorithm, using two parameters: the number of queries posed by the learning algorithm and the learning time (ms). The results demonstrated that BDLIQ outperforms the ID algorithm in terms of both the number of queries and the learning time. By utilising inverse queries, this technique enhances the entire learning process of DFAs. In future work, we intend to apply this learning approach to the design of incremental learning algorithms as well.