Runtime analysis of the (1+1) EA on computing unique input output sequences



Abstract
Computing unique input output (UIO) sequences is a fundamental and hard problem in conformance testing of finite state machines (FSMs). Previous experimental research has shown that evolutionary algorithms (EAs) can be applied successfully to find UIOs for some FSMs. However, before EAs can be recommended as a practical technique for computing UIOs, it is necessary to better understand the potential and limitations of these algorithms on this problem. In particular, more research is needed to determine for which instance classes of the problem EAs are feasible, and for which instance classes EAs are provably better than random search strategies.
This paper presents rigorous theoretical and numerical analyses of the runtime of the (1 + 1) EA and random search on several selected instance classes of this problem. The theoretical analysis shows, firstly, that there are instance classes where the EA is efficient, while random testing fails completely. Secondly, an instance class that is difficult for both random testing and the EA is presented. Finally, a parametrised instance class with tunable difficulty is presented. The numerical study estimates the constants in the asymptotic expressions obtained in the theoretical analysis, and the variability of the runtime. The numerical results fit well with the theoretical results, even for small problem instance sizes. Together, these results provide a first theoretical characterisation of the potential and limitations of the (1 + 1) EA on the problem of computing UIOs.

Introduction
As modern software systems grow larger and more complex, there is an increasing need to support the software engineer with tools for automating some of the software engineering tasks. The field of search based software engineering (SBSE) approaches this challenge in a novel way by reformulating software engineering problems into optimisation problems. Such a reformulation has allowed the automation of a wide range of software engineering tasks using evolutionary algorithms and other randomised search heuristics [2].
The increasing popularity of SBSE approaches is partly due to the relative ease with which search heuristics can be adapted to new problem domains. In principle, the only ingredients required in a search based approach are an encoding of candidate solutions and a way of comparing the quality of two candidate solutions. In contrast, developing problem-specific algorithms may require deep insight into the problem structure. The development of problem-specific algorithms is further complicated by the fact that many software engineering problems are NP-hard [3].
However, before SBSE approaches can be widely adopted in industry, some challenges must be addressed. In particular, it is hard to predict whether a search heuristic will be successful on a given optimisation problem. In some cases, search heuristics fail to find any solution to a problem within acceptable time. Such failures can happen for several reasons. Firstly, the search heuristic applied may not be the most appropriate one among the many search heuristics that have been developed. From a general point of view, the No Free Lunch (NFL) theorem limits the comparative advantage a given search heuristic can have over wide classes of problems [4]. Although the conditions of the NFL theorem do not always hold for practical problems [5], it is reasonable to assume that the effectiveness of a search heuristic depends on how well it is adapted to the problem. Hence, for a given SE problem, one search heuristic may fail, whereas another succeeds. Secondly, a search heuristic may fail if its parameters are not appropriately tuned to the problem. In the case of evolutionary algorithms, it is known that minor changes in parameters like the use of a crossover operator [6,7], population size [8,9], diversity mechanisms [10,11], and the mutation-selection balance [12] can have dramatic impacts on the runtime. A third reason for failure is related to the computational intractability of many software engineering problems. Any search heuristic applied to an NP-hard optimisation problem will fail to find optimal solutions in polynomial time on at least some instances of the problem, unless some widely held conjectures in computational complexity theory turn out to be false.
Unless these causes of failure are properly addressed, SBSE techniques will be associated with some degree of unreliability. We therefore argue that studies of search based approaches to a software engineering problem should consider the questions: Which of the many available search heuristics is best suited for the problem? How should the parameter settings be adjusted? Which instances of the problem are tractable for the search heuristic, and which instances are hard? We claim that such questions would be most rigorously answered by a theoretical analysis. However, except for a few studies [1,13–15], all previous research in SBSE has been experimental.
To answer the questions above rigorously, it is necessary to specify more clearly what it means for a search heuristic to be successful on a problem. For a given search heuristic and problem class, one can initially ask whether the heuristic will ever find a solution if it is allowed unlimited time. This type of question falls within the realm of convergence analysis, which is a well-developed area [16]. There exist simple conditions on the underlying Markov chain of a search heuristic that guarantee convergence in finite time. These conditions often hold for the popular search heuristics [16]. However, convergence itself gives very little information about whether a search heuristic is successful in practice, because no limits are put on the amount of resources the algorithm uses. If convergence can be guaranteed given unlimited time, the next question to ask is how much time the search heuristic needs to find the solution. This type of question falls within the realm of runtime analysis, where one tries to estimate the runtime as a function of the problem instance size. Similar to the case of classical algorithms, one can make the broad distinction between efficient algorithms that find the solution in polynomial time and inefficient algorithms that need exponential time to find the solution. Runtime analysis of search heuristics is very challenging, partly because of their randomised nature. Nevertheless, the techniques for analysing search heuristics have improved rapidly over the last decade, to the point where the runtime of search heuristics can now be analysed on classical problems in combinatorial optimisation [17]. We suggest that the theoretical methods that have been developed for analysing the runtime of search heuristics can be applied in theoretical studies in search based software engineering. Some of the possible avenues for this type of research have already been outlined in [18].
To initiate such a theoretical study, we therefore consider the domain of finite state machine (FSM) testing, which is an area with a long history in software engineering [19]. Techniques for testing finite state machines have traditionally been applied to test implementations of communication protocols [20]. However, FSM testing techniques have also been applied elsewhere, including traditional software testing domains [21]. Finite state machine testing has also been a popular research area within search based software engineering [22–29]. All previous research on FSM testing in search based software engineering has been empirical.
We will focus on the specific problem of computing unique input output (UIO) sequences for finite state machines [19]. UIO sequences are related to conformance testing of finite state machines, which consists of checking whether an implementation machine is equivalent to a specification machine. While one has full information about the specification machine, the implementation machine is given as a black box. To check the implementation machine for faults, one is restricted to inputting sequences of symbols and observing the outputs the machine produces. A fundamental problem one faces when constructing such checking sequences is the state verification problem, which is to assess whether the implementation machine starts in a given state [19]. One way of solving the state verification problem is by finding a unique input output sequence (UIO) for that state. A UIO for a state is an input sequence which, when started in this state, causes the FSM to produce an output sequence which is unique for that state.
Computing UIOs is hard. All known algorithms for this problem have exponential runtime with respect to the number of states. Lee and Yannakakis proved that the decision problem of determining whether a given state has a UIO or not is PSPACE-complete, and hence also NP-hard [30]. In the general case, it is therefore unlikely that there will ever be an efficient method for constructing UIOs and one cannot hope to do much better than random search or exhaustive enumeration. The application of evolutionary algorithms or any other randomised search heuristic cannot change this situation. However, the existence of hard instances does not rule out the possibility that there are many interesting instances that can be solved efficiently with the right choice of algorithm. On such ''easy'' instances, EAs can potentially be more efficient than exhaustive enumeration and random search.
Guo et al. reformulated the problem of computing UIOs into an optimisation problem to which they applied an EA [24]. When comparing this approach with random search for UIOs, they found that the two approaches have similar performance on a small FSM, while the evolutionary approach outperforms random search on a larger FSM. Derderian et al. presented an alternative evolutionary approach which also allows the specification machine to be partially specified [22]. Their approach was compared with random search on a set of real-world FSMs and on a set of randomly generated FSMs. Again, it was found that the evolutionary approach outperformed random search on large FSMs. Furthermore, the difference in performance increased with the size of the FSM. Although previous experimental research has shown that there are instances of the problem where the evolutionary approach is preferable to a simple random search strategy, more research is needed to get a deeper understanding of the potential of EAs for computing UIOs. Such a deeper insight can only be obtained if the experimental research is complemented with theoretical investigations.
Runtime analysis of EAs is difficult. When initiating the analysis in a new problem domain, it is an important first step to analyse a simple algorithm like the (1 + 1) EA. Without understanding the behaviour of such a simple algorithm in the new domain, it is difficult to understand the behaviour of more complex EAs, e.g. those EAs that use a population and crossover. Although the (1 + 1) EA is relatively simple compared to other evolutionary algorithms, recent research has shown that this algorithm is surprisingly efficient on a wide range of useful problems [17], including sorting [31], minimum spanning tree [32] and Eulerian cycle [33].
Our objective with the theoretical investigation is not to propose a new evolutionary approach to computing UIOs, and we do not claim that the evolutionary approach taken here is more efficient than previous approaches. Rather, we would like to consider a sufficiently simple scenario that allows a theoretical characterisation to be undertaken. We would like to analyse whether evolutionary algorithms can outperform random search in this problem domain, and we would also like to understand for what types of FSMs the EA can find UIOs efficiently, and for what types of FSMs the problem is hard for the EA.
This paper extends the preliminary conference version of the paper [1] in several ways. The theoretical study has been complemented with an extensive numerical study. The runtime analysis now considers a generalised variant of the (1 + 1) EA that operates on general input alphabets. Furthermore, the easy FSM problem instance class has been generalised to modulo-n counters over general input alphabets, and the hard FSM problem instance class is generalised to the wider class of sequence detection FSMs.
The rest of this paper is structured as follows. Section 2 recalls the basic definitions for FSMs and UIO sequences. Section 3 describes the evolutionary approach for computing UIOs that will be considered in this paper. In particular, the section describes how candidate solutions are encoded, the definition of the fitness function and how the (1 + 1) Evolutionary Algorithm has been adapted for the problem. In addition, Section 3 defines the expected runtime of an evolutionary algorithm. Section 4 contains the main theoretical runtime results from the paper. Section 5 describes the experimental methodology, and Section 6 presents the experimental results and compares these with the theoretical results obtained in Section 4. The paper is concluded with a discussion and conclusion in Sections 7 and 8.

Notation
The symbol $\epsilon$ denotes the empty string. The length of a string x is denoted $\ell(x)$. Concatenation of strings x and y is denoted $x \cdot y$, and $x^i$ denotes i concatenations of x. Standard notation (e.g., $O$, $\Omega$ and $\Theta$) for asymptotic growth of functions (see, e.g., [34]) is used in the analysis. At any point in time, an FSM M is in exactly one state s in S. When receiving an input a from I, the machine outputs symbol $\lambda(s, a)$ and goes to state $\delta(s, a)$. The domain of the state transition function $\delta$ and the output function $\lambda$ is generalised to nonempty strings over the input alphabet, i.e.
$$\delta(s, a_1 a_2 \cdots a_n) := \delta(\delta(s, a_1 a_2 \cdots a_{n-1}), a_n) \quad\text{and}\quad \lambda(s, a_1 a_2 \cdots a_n) := \lambda(s, a_1) \cdot \lambda(\delta(s, a_1), a_2 \cdots a_n).$$
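The recursive extension of $\delta$ and $\lambda$ to input strings can be evaluated in a single left-to-right pass over the input. The following Python sketch assumes a hypothetical dictionary encoding of an FSM, where `delta[(s, a)]` gives the next state and `lam[(s, a)]` the output symbol; the two-state machine in the usage lines is made up for illustration and does not appear in the paper.

```python
from typing import Dict, Tuple

# Hypothetical encoding of an FSM: (state, input symbol) -> next state / output symbol.
Delta = Dict[Tuple[str, str], str]
Lam = Dict[Tuple[str, str], str]


def delta_star(delta: Delta, s: str, x: str) -> str:
    """Extended transition function: the state reached from s after reading x."""
    for a in x:
        s = delta[(s, a)]
    return s


def lambda_star(delta: Delta, lam: Lam, s: str, x: str) -> Tuple[str, ...]:
    """Extended output function lambda(s, a) . lambda(delta(s, a), rest), unrolled."""
    out = []
    for a in x:
        out.append(lam[(s, a)])  # output of the current transition
        s = delta[(s, a)]        # then move to the next state
    return tuple(out)


# Illustrative two-state machine (made up for this example).
delta = {("p", "a"): "q", ("p", "b"): "p", ("q", "a"): "p", ("q", "b"): "q"}
lam = {("p", "a"): "0", ("p", "b"): "1", ("q", "a"): "1", ("q", "b"): "1"}

print(delta_star(delta, "p", "ab"))        # q
print(lambda_star(delta, lam, "p", "ab"))  # ('0', '1')
```

The loop is equivalent to the recursive definition: each iteration emits $\lambda(s, a_1)$ and then continues from $\delta(s, a_1)$ on the remaining suffix.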

Finite state machines
Definition 2 (Unique input output sequence [19]). A unique input output sequence (UIO) for a state s in an FSM M is a string x over the input alphabet of M such that $\lambda(s, x) \neq \lambda(t, x)$ for all states $t \neq s$.
Fig. 1. An edge $(s_i, s_j)$ labelled $i/o$ defines the transition $\delta(s_i, i) = s_j$ and the output $\lambda(s_i, i) = o$. The input sequence acac is a UIO for state $s_1$, because only when starting from state $s_1$ will the FSM output the sequence 0001. The single input symbol c is a UIO for state $s_2$, because only state $s_2$ has output 1 on input c. This FSM does not have any distinguishing sequence, because for every input symbol x in the FSM, there exist at least two states s and t such that $\lambda(s, x) = \lambda(t, x)$ and $\delta(s, x) = \delta(t, x)$.
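Definition 2 translates directly into a check that compares the output sequence produced from s with the output sequence produced from every other state. A minimal Python sketch, assuming a hypothetical dictionary encoding of $\delta$ and $\lambda$; the three-state machine below is invented for illustration and is not the FSM of Fig. 1.

```python
from typing import Dict, Iterable, Tuple


def outputs(delta: Dict, lam: Dict, s: str, x: str) -> Tuple[str, ...]:
    """lambda(s, x): the output sequence produced when x is read from state s."""
    out = []
    for a in x:
        out.append(lam[(s, a)])
        s = delta[(s, a)]
    return tuple(out)


def is_uio(states: Iterable[str], delta: Dict, lam: Dict, s: str, x: str) -> bool:
    """Definition 2: x is a UIO for s iff lambda(s, x) != lambda(t, x) for all t != s."""
    target = outputs(delta, lam, s, x)
    return all(outputs(delta, lam, t, x) != target for t in states if t != s)


# Hypothetical three-state machine with inputs a and b.
states = ["s1", "s2", "s3"]
delta = {("s1", "a"): "s2", ("s1", "b"): "s1",
         ("s2", "a"): "s3", ("s2", "b"): "s1",
         ("s3", "a"): "s1", ("s3", "b"): "s3"}
lam = {("s1", "a"): "0", ("s1", "b"): "0",
       ("s2", "a"): "0", ("s2", "b"): "1",
       ("s3", "a"): "0", ("s3", "b"): "0"}

print(is_uio(states, delta, lam, "s2", "b"))   # True: only s2 outputs 1 on b
print(is_uio(states, delta, lam, "s1", "b"))   # False: s1 and s3 both output 0
print(is_uio(states, delta, lam, "s1", "ab"))  # True: only s1 outputs 01
```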

Representation and fitness function
Previous research on computing UIOs with EAs has considered different types of representations and fitness functions [22,24]. The purpose of this paper is not to propose a new evolutionary approach to computing UIOs, but to analyse the runtime behaviour of an EA when using what we consider to be the most straightforward and simple representation and fitness function. We will therefore build on the existing approaches and, where we consider it natural, make some modifications. The research question of which representation and fitness function are the most appropriate when computing UIOs is left open for future research.
Following [24], candidate solutions are represented as strings over the input alphabet I of the FSM. The length of the input sequences among which the EA searches for UIOs has to be bounded in some way. It is known that there exist FSMs where the shortest UIOs have exponential length with respect to the number of states n [30]. However, in order to obtain a fitness function that is computationally feasible, it is necessary to bound the length $L(n)$ of input sequences by some small polynomial in n. As all the FSM classes studied here have UIOs of length n, we bound the input sequence length to $L(n) := n$.
The representation used here differs from previous representations in that we do not use the "don't care" symbols proposed by Guo et al. [24]. Such symbols do not cause any state transition in the FSM, and can therefore be removed from the solutions provided by the EA to obtain UIOs shorter than the representation length $L(n)$. We do not consider "don't care" symbols because shorter UIOs can still be obtained through a simple post-processing stage: for every UIO x of length $L(n)$ that has been obtained by the EA, it is easy to check in polynomial time whether any prefix $x_1 \cdots x_i$ of x, $1 < i < L(n)$, is also a UIO.
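One way to carry out the post-processing stage is incrementally: read x symbol by symbol, keep the set of start states whose outputs still agree with those of s, and stop as soon as that set is empty. The returned prefix is the shortest UIO prefix of x, found with $O(n \cdot L(n))$ table lookups. This is a sketch under the same hypothetical dictionary encoding as above, and the machine is again invented for illustration.

```python
from typing import Dict, List, Optional


def shortest_uio_prefix(states: List[str], delta: Dict, lam: Dict,
                        s: str, x: str) -> Optional[str]:
    """Return the shortest prefix of x that is a UIO for s, or None if there is none."""
    pos = {t: t for t in states}   # state currently reached from each start state
    alive = set(states) - {s}      # start states not yet distinguished from s
    for i, a in enumerate(x, start=1):
        out_s = lam[(pos[s], a)]
        # A start state stays alive only if it produces the same output as s here.
        alive = {t for t in alive if lam[(pos[t], a)] == out_s}
        if not alive:
            return x[:i]
        for t in alive | {s}:      # distinguished states no longer need tracking
            pos[t] = delta[(pos[t], a)]
    return None


# Hypothetical three-state machine with inputs a and b.
states = ["s1", "s2", "s3"]
delta = {("s1", "a"): "s2", ("s1", "b"): "s1",
         ("s2", "a"): "s3", ("s2", "b"): "s1",
         ("s3", "a"): "s1", ("s3", "b"): "s3"}
lam = {("s1", "a"): "0", ("s1", "b"): "0",
       ("s2", "a"): "0", ("s2", "b"): "1",
       ("s3", "a"): "0", ("s3", "b"): "0"}

print(shortest_uio_prefix(states, delta, lam, "s1", "abab"))  # ab
print(shortest_uio_prefix(states, delta, lam, "s3", "aaa"))   # None
```

Once a state has produced a different output from s, it stays distinguished for every longer prefix, which is why it can be dropped from the tracked set.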
Similarly to Guo et al. [24], we will define the fitness of an input sequence as a function of the state partition tree induced by the input sequence. Intuitively, the state partition tree of an input sequence represents how increasingly long prefixes of the input sequence partition the set of states according to the output they produce. Fig. 1 (right) gives an example of a state partition tree for input sequence acac on the FSM in Fig. 1 (left). The root node is the set of all states. On input symbols ac, states $s_1$ and $s_4$ output 00, while states $s_2$ and $s_3$ output 01. The two partitions $\{s_1, s_4\}$ and $\{s_2, s_3\}$ are refined on further inputs, ending in three partitions $\{s_1\}$, $\{s_4\}$ and $\{s_2, s_3\}$ on the full input acac. Each singleton $\{s_i\}$ in a state partition tree indicates that the corresponding input sequence is a UIO for that state $s_i$.
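Level i of the state partition tree groups the states by their output on the prefix of length i, and each level refines the previous one. The Python sketch below computes these levels. It is an illustration under a hypothetical dictionary encoding of the FSM, not the implementation used by Guo et al. [24], and the machine in the usage lines is made up.

```python
from typing import Dict, List


def partition_levels(states: List[str], delta: Dict, lam: Dict, x: str) -> List[List[List[str]]]:
    """Return the partition of states induced by each prefix x[:i], i = 0..len(x)."""
    pos = {t: t for t in states}        # current state reached from each start state
    outs = {t: () for t in states}      # outputs produced so far from each start state
    levels = [[sorted(states)]]         # root: the set of all states
    for a in x:
        for t in states:
            outs[t] += (lam[(pos[t], a)],)
            pos[t] = delta[(pos[t], a)]
        groups: Dict[tuple, List[str]] = {}
        for t in states:
            groups.setdefault(outs[t], []).append(t)
        levels.append(sorted(sorted(g) for g in groups.values()))
    return levels


# Hypothetical three-state machine with inputs a and b.
states = ["s1", "s2", "s3"]
delta = {("s1", "a"): "s2", ("s1", "b"): "s1",
         ("s2", "a"): "s3", ("s2", "b"): "s1",
         ("s3", "a"): "s1", ("s3", "b"): "s3"}
lam = {("s1", "a"): "0", ("s1", "b"): "0",
       ("s2", "a"): "0", ("s2", "b"): "1",
       ("s3", "a"): "0", ("s3", "b"): "0"}

levels = partition_levels(states, delta, lam, "ab")
print(levels[-1])  # [['s1'], ['s2', 's3']]: the singleton {s1} marks ab as a UIO for s1
```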
In previous work [22,24], the EA was given the task of obtaining UIOs for all states simultaneously. The fitness functions were defined to reward input sequences that are close to being UIOs, regardless of the state in the FSM. One could hope that, given a sufficiently diverse population, different individuals would encode UIOs for different states. However, as was found in previous research [24], it was hard to maintain such a diverse population, and the input sequences tended to converge towards a few similar input sequences. We would argue that computing UIOs for two states s and t constitutes two different objectives. In general, these objectives can be conflicting, as the UIO for state s can be significantly different from the UIO for state t. As an example, consider the FSM in Fig. 1. The shortest UIO for state $s_4$ is the sequence b; however, none of the other states in this FSM has a UIO beginning with symbol b, hence the objective of computing a UIO for state $s_4$ conflicts with the objective of computing UIOs for the other states. In the exceptional case that none of the objectives are conflicting, the FSM is likely to contain a distinguishing sequence (DS) [19], and there is no need to apply an EA.
Here, we will therefore consider what we think is a more natural approach: to search for a UIO for one state at a time. UIOs for all the states in the FSM can be obtained simply by re-running the EA, each time with a different fitness function corresponding to a new state. The fitness of an input sequence with respect to a state s is hence defined in terms of the cardinality of the leaf containing state s in the state partition tree. The instance size of a fitness function $f_{M,s}$ is defined as the number of states n in FSM M. The value of $c_M(s, x)$ is the number of states in the leaf node of the state partition tree containing state s, and lies in the interval from 1 to n. If the shortest UIO for state s in FSM M has length no more than n, then $f_{M,s}$ has an optimum of $n - 1$. As an example of Definition 3, consider the FSM in Fig. 1, for which the fitness function takes the values $f_{M,s_1}(acac) = 3$ and $f_{M,s_1}(aaaa) = 0$. In all the instances presented here, the objective is to find a UIO for state $s_1$. To simplify notation, $f_M$ will therefore be used instead of $f_{M,s}$, and $c(x)$ will be used instead of $c_M(s, x)$, where the FSM M is given by the context.
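Definition 3 itself is not reproduced in this section, but the properties stated here ($c_M(s, x)$ ranges from 1 to n, the optimum is $n - 1$, and $f_{M,s_1}(acac) = 3$ for the four-state FSM of Fig. 1, where acac is a UIO for $s_1$) are consistent with $f_{M,s}(x) = n - c_M(s, x)$. The Python sketch below assumes this form; the dictionary encoding and the three-state machine are hypothetical.

```python
from typing import Dict, List


def c_value(states: List[str], delta: Dict, lam: Dict, s: str, x: str) -> int:
    """c_M(s, x): size of the leaf containing s, i.e. #states t with lambda(t, x) == lambda(s, x)."""
    def out(t: str) -> tuple:
        o = []
        for a in x:
            o.append(lam[(t, a)])
            t = delta[(t, a)]
        return tuple(o)
    target = out(s)
    return sum(1 for t in states if out(t) == target)


def fitness(states: List[str], delta: Dict, lam: Dict, s: str, x: str) -> int:
    # Assumed form of Definition 3: f_{M,s}(x) = n - c_M(s, x).
    return len(states) - c_value(states, delta, lam, s, x)


# Hypothetical three-state machine with inputs a and b.
states = ["s1", "s2", "s3"]
delta = {("s1", "a"): "s2", ("s1", "b"): "s1",
         ("s2", "a"): "s3", ("s2", "b"): "s1",
         ("s3", "a"): "s1", ("s3", "b"): "s3"}
lam = {("s1", "a"): "0", ("s1", "b"): "0",
       ("s2", "a"): "0", ("s2", "b"): "1",
       ("s3", "a"): "0", ("s3", "b"): "0"}

print(fitness(states, delta, lam, "s1", "ab"))  # 2 = n - 1, so ab is a UIO for s1
print(fitness(states, delta, lam, "s1", "aa"))  # 0: no state is distinguished
```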

Evolutionary algorithms
Runtime analysis of evolutionary algorithms in new problem domains is often initiated with a simple evolutionary algorithm called the (1 + 1) EA [35,17]. The (1 + 1) EA keeps a single individual represented as a bitstring of length n, and mutates this individual by flipping each bit with probability 1/n. One way of applying the (1 + 1) EA directly to the UIO problem would be to encode input symbols in binary. However, such binary encodings are inconvenient for input alphabet cardinalities other than a power of two. Instead, we consider a generalised (1 + 1) EA which operates on any fixed input alphabet I. In this algorithm, the individual is represented as a string of length n over the input alphabet I. In the mutation step, each string position i in a string x is mutated independently with probability 1/n by setting $x_i$ to an input symbol r chosen uniformly at random from $I \setminus \{x_i\}$, i.e. different from the original symbol $x_i$. Clearly, with the binary input alphabet $I = \{0, 1\}$, the generalised (1 + 1) EA becomes identical to the classical (1 + 1) EA.
Generalised (1 + 1) EA
  Choose x uniformly at random from $I^n$.
  repeat
    $x' \leftarrow x$
    for i = 1 to n do
      if with probability 1/n then
        $x'_i \leftarrow r$, where r is sampled uniformly at random from $I \setminus \{x_i\}$
      end if
    end for
    if $f(x') \ge f(x)$ then
      $x \leftarrow x'$
    end if
  until termination condition met
We say that one step of the (1 + 1) EA is one iteration of the repeat-loop in the algorithm. In each step of the (1 + 1) EA, the fitness value $f(x')$ must be evaluated. We can assume that the fitness value $f(x)$ of the current search point x is stored in a local variable. Hence, after step t of the algorithm, the fitness function has been evaluated t times. In the black box scenario, the runtime complexity of a randomised search heuristic is measured in terms of the number of evaluations of the fitness function, and not in terms of the number of internal operations in the algorithm [36].
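The generalised (1 + 1) EA can be sketched in a few lines of Python. This is an illustration, not the authors' code: the termination condition is replaced by a hypothetical evaluation budget `max_evals` together with a hypothetical optimality test `is_optimal`, and the function counts fitness evaluations until an optimal point is evaluated for the first time, in line with the black box runtime measure. The usage lines run it on a OneMax-like function over a three-letter alphabet.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def generalised_one_plus_one_ea(
    f: Callable[[List[str]], float],
    alphabet: Sequence[str],
    n: int,
    is_optimal: Callable[[float], bool],
    max_evals: int,
    rng: random.Random,
) -> Tuple[List[str], Optional[int]]:
    """Return (search point, #evaluations until the optimum was first evaluated, or None)."""
    x = [rng.choice(alphabet) for _ in range(n)]  # uniform initial search point in I^n
    fx = f(x)
    evals = 1
    if is_optimal(fx):
        return x, evals
    while evals < max_evals:
        y = list(x)
        for i in range(n):
            if rng.random() < 1.0 / n:  # mutate position i with probability 1/n
                # replace x_i by a symbol drawn uniformly from I \ {x_i}
                y[i] = rng.choice([a for a in alphabet if a != x[i]])
        fy = f(y)  # one fitness evaluation per step; f(x) is cached in fx
        evals += 1
        if is_optimal(fy):
            return y, evals
        if fy >= fx:  # accept the offspring if it is not worse
            x, fx = y, fy
    return x, None


# Usage: OneMax-like fitness (number of "1" symbols) over a three-letter alphabet.
rng = random.Random(1)
n = 20
point, runtime = generalised_one_plus_one_ea(
    lambda z: z.count("1"), ["0", "1", "2"], n, lambda v: v == n, 10**6, rng)
print(runtime)
```

Accepting offspring of equal fitness (the `>=` test) mirrors the condition $f(x') \ge f(x)$ and lets the algorithm drift across plateaus.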
For a given function and randomised search heuristic, the expected runtime is defined as the mean number of fitness function evaluations until the optimum is evaluated for the first time. The runtime on a class of fitness functions is defined as the supremum of the expected runtimes of the functions in the class [35,37].
A function is considered easy for the (1 + 1) EA if the expected runtime is bounded from above by a polynomial in n. Conversely, a function is considered hard for the (1 + 1) EA if the expected runtime is bounded from below by an exponential function in n.

Runtime analysis
This section analyses the behaviour of the (1 + 1) EA on the problem of computing UIOs for different classes of FSMs. We would like to emphasise that our primary concern here is not with these FSM classes themselves. Rather, the FSM classes are studied to shed light on what distinguishes the tractable from the intractable classes of FSMs for the (1 + 1) EA on the UIO problem. Given the NP-hardness of the UIO problem, it is clear that EAs will fail on certain FSM classes, and we would like to distinguish those from the tractable FSM classes. For this reason, it is desirable to consider FSM classes that are neither too intricate nor too closely tied to a particular application, so that the reasons for failure or success of the (1 + 1) EA are more easily recognisable, and can possibly be used as guidelines when studying the behaviour of evolutionary algorithms in more application-oriented cases of the UIO problem.
Nevertheless, the first two FSM classes studied here turn out to play fundamental roles in practical applications of FSMs. The tractable class is the modulo-n counter FSMs [38]. Such FSMs are used widely where there is a need to report when a certain number of events of a given type has occurred. A simple example of such FSMs is the binary counter. The second class of FSMs, which is intractable for the (1 + 1) EA, is the sequence detector FSMs [38]. These FSMs listen to a stream of symbols, and emit a signal every time a specified sequence of symbols occurs in the stream. Sequence detector FSMs are also widely used, for example in the lexical analysis component of compilers, in electronic key locks, or in communication systems where one wants to recognise start and stop signals in a bitstream. Both of these FSM classes typically occur as components within a larger system.
A runtime analysis of the (1 + 1) EA for a given FSM M requires a certain level of information about the fitness landscape of the corresponding fitness function $f_M$. The runtime analyses are therefore carried out in two steps. The first step is a characterisation of the function values of the fitness function $f_M$ obtained via Definition 3. The second step is the runtime analysis of the search heuristics on fitness function $f_M$.
As described above, the fitness function is defined with respect to a single state. To obtain UIOs for all n states in an FSM, the EA should be re-run n times, once for each state. The runtime analysis will only focus on the time to find a UIO for a single state $s_1$ for the following reason. If $T_i$ is the runtime to obtain a UIO for state $s_i$, then the overall runtime T to find UIOs for all n states will be $T = \sum_{i=1}^{n} T_i$. Hence, the runtime is bounded from below and above by $\max_i T_i \le T \le n \cdot \max_i T_i$, so to characterise within a linear factor the asymptotic runtime to find the UIOs for all states, it suffices to analyse the runtime on the hardest state.

Easy FSM instance class
Our first aim is to construct a class of problem instances which is hard for random search, while being easy for the (1 + 1) EA. In order to be hard for random search, the length of the shortest UIO for state $s_1$ must be at least linear in n, and there must be few UIOs of this length. To keep the instance class easy for the (1 + 1) EA, the idea is to ensure that the resulting fitness function has few interactions among the variables. It is well known that the (1 + 1) EA optimises all linear functions efficiently [37,35].
Definition 4 (Modulo-n counter FSM class). For instance sizes $n \ge 2$, define an FSM E with input alphabet I of constant cardinality m having a special input symbol $1 \in I$, output alphabet $O := \{\text{const}, \text{incr}, \text{reset}\}$, and n states $S := \{s_1, s_2, \ldots, s_n\}$. For all states $s_i$, define the output function $\lambda$ as
$$\lambda(s_i, x) := \begin{cases} \text{reset} & \text{if } x = 1 \text{ and } i = n,\\ \text{incr} & \text{if } x = 1 \text{ and } i < n,\\ \text{const} & \text{otherwise.} \end{cases}$$
For all states $s_i$, define the state transition function $\delta$ as
$$\delta(s_i, x) := \begin{cases} s_{i+1} & \text{if } x = 1 \text{ and } i < n,\\ s_1 & \text{if } x = 1 \text{ and } i = n,\\ s_i & \text{otherwise.} \end{cases}$$
The objective is to find a UIO of length n for state $s_1$.
The instances in Definition 4 are illustrated in Fig. 2. This FSM is a modulo-n counter, which counts the number of 1-symbols received, and outputs the special symbol reset after n inputs of symbol 1. Note that $(s_n, s_1, 1/\text{reset})$ is the only state transition with a distinguishing input/output behaviour. Furthermore, the states never collapse, i.e. $\delta(s_i, x) \neq \delta(s_j, x)$ for any input sequence x and any pair of different states $s_i$ and $s_j$. It is easy to see that any sequence of length n where at most one symbol is different from 1 is a UIO for state $s_1$. We show that the easy instance class leads to a fitness function which is very similar to the well-known fitness function ONEMAX [39]. The proof is in the appendix.
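Simulating Definition 4 directly makes the OneMax-like shape of the landscape visible. The Python sketch below implements the output and transition functions (including the wrap-around transition $(s_n, s_1, 1/\text{reset})$), computes $f_E(x) = n - c(x)$ (assuming the form of Definition 3 implied by the optimum $n - 1$), and checks exhaustively for a small n that $f_E(x)$ equals the number of 1-symbols in x when that number is at most $n - 2$, and equals $n - 1$ otherwise. This is a numerical illustration for small n, not the proof in the appendix; the non-1 symbols "a" and "b" are placeholders.

```python
from itertools import product
from typing import Sequence, Tuple


def counter_step(i: int, a: str, n: int) -> Tuple[int, str]:
    """Definition 4: from state s_i on input a, return (index of next state, output)."""
    if a == "1":
        if i == n:
            return 1, "reset"   # the only distinguishing transition (s_n, s_1, 1/reset)
        return i + 1, "incr"
    return i, "const"


def f_E(x: Sequence[str], n: int) -> int:
    """f_E(x) = n - c(x), where c(x) counts states whose output on x equals that of s_1."""
    def out(i: int) -> Tuple[str, ...]:
        o = []
        for a in x:
            i, sym = counter_step(i, a, n)
            o.append(sym)
        return tuple(o)
    target = out(1)
    return n - sum(1 for i in range(1, n + 1) if out(i) == target)


n, alphabet = 5, ["1", "a", "b"]
for x in product(alphabet, repeat=n):
    ones = x.count("1")
    expected = ones if ones <= n - 2 else n - 1
    assert f_E(x, n) == expected
print("checked", len(alphabet) ** n, "sequences")  # checked 243 sequences
```

With k ones and $k \le n - 2$, exactly the states $s_1, \ldots, s_{n-k}$ never emit reset and so stay together with $s_1$, which gives $c(x) = n - k$.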
Theorem 1.
Using fitness function $f_E$, the number of iterations until random search finds a UIO for state $s_1$ is geometrically distributed with parameter $p = (n(m-1) + 1) \cdot m^{-n}$. Furthermore, the probability that random search finds a UIO for state $s_1$ in less than $e^{cn}$ iterations is exponentially small, $e^{-\Omega(n)}$, where c is a sufficiently small positive constant.
Proof. An optimal solution has at most one symbol different from 1, and there are $n(m-1) + 1$ such sequences in $I^n$. Hence, the probability that the uniformly sampled sequence in any iteration of random search is optimal is p. For $n > 5$ and $m \ge 2$, we have $p < e^{-n/4}$. By a union bound, the probability that random search finds an optimal solution within $e^{cn}$ steps, $n \ge 6$, is thus no more than $e^{cn} \cdot p < e^{-(1/4 - c)n} = e^{-\Omega(n)}$ for any constant $c < 1/4$. □
The runtime analysis of the (1 + 1) EA on the problem of computing a UIO for state $s_1$ is similar to the well-known analysis of the (1 + 1) EA on the ONEMAX problem [39]. However, because we consider a generalised search space $I^n$, where $|I| = m$, we need to consider a more general case.
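The counting step in the proof of Theorem 1 can be checked numerically. The Python sketch below counts the optimal sequences by brute force for a small case, evaluates $p = (n(m-1) + 1) \cdot m^{-n}$, and confirms $p < e^{-n/4}$ over a range of $n \ge 6$ and $m \ge 2$; the expected number of random search iterations is $1/p$. The placeholder symbols and the ranges checked are arbitrary choices for illustration.

```python
import math
from itertools import product


def n_optimal(n: int, m: int) -> int:
    """Number of sequences in I^n with at most one symbol different from 1."""
    return 1 + n * (m - 1)


def p_success(n: int, m: int) -> float:
    """Probability that one uniformly sampled sequence is a UIO for s_1."""
    return n_optimal(n, m) / m ** n


# Brute-force check of the count for a small case (symbol "1" plus m - 1 placeholders).
n, m = 4, 3
alphabet = ["1"] + [f"z{j}" for j in range(m - 1)]
brute = sum(1 for x in product(alphabet, repeat=n)
            if sum(a != "1" for a in x) <= 1)
assert brute == n_optimal(n, m) == 9

# The bound p < e^{-n/4} used in the proof, for n >= 6 and several alphabet sizes.
for n in range(6, 61):
    for m in range(2, 7):
        assert p_success(n, m) < math.exp(-n / 4)

print(1 / p_success(30, 2))  # expected random search iterations for n = 30, m = 2
```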

Theorem 2.
Using fitness function $f_E$, the (1 + 1) EA will find a UIO for state $s_1$ in the modulo-n counter FSM in expected time $O(nm \log n)$, where $m \ge 2$ is the size of the input symbol alphabet of the FSM.
Proof. By the values of fitness function $f_E$, for non-optimal search points x and y, $f_E(x) \ge f_E(y)$ if and only if search point x has at least as many 1-symbols as search point y. So in a given step of the (1 + 1) EA, the mutated search point $x'$ will only be accepted if it has at least as many 1-symbols as search point x. If $x'$ has more 1-symbols than x, we say that the step is successful. When x has i symbols different from 1, a step is successful if exactly one of these i positions is mutated, and it is mutated into symbol 1, while all other positions are left unchanged. Hence, the probability of a successful step is at least $\frac{i}{n(m-1)} \cdot (1 - 1/n)^{n-1} \ge \frac{i}{enm}$.
Search points with at least $n - 1$ 1-symbols are optimal, hence it suffices to wait for at most $n - 1$ successful steps to find the optimum. The expected runtime of the (1 + 1) EA is therefore bounded from above by $\sum_{i=2}^{n} \frac{enm}{i} = O(nm \log n)$. □
The result in Theorem 1 means that random search is a highly inefficient strategy for computing UIOs for modulo-n counter FSMs. As such FSMs are quite common in applications, this result means that one cannot rely on random search as a general strategy for computing UIOs. The problem with random search arises when the shortest UIOs have length at least linear in the number of states.
In contrast, the result in Theorem 2 implies that the (1 + 1) EA can compute UIOs efficiently for the modulo-n counter FSMs. Although the time to find the UIO increases with the size of the input alphabet, the runtime remains polynomial as long as the input alphabet is of polynomial size. The efficiency of the (1 + 1) EA can be attributed to the structure of the fitness landscape induced by the FSM class. For every input sequence that is not a UIO for state $s_1$, it suffices to increase the number of 1-symbols in the input sequence to distinguish $s_1$ from one more state. Hence, at any non-optimal point in the search space, there is a better neighbouring input sequence.
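To see Theorem 2 at work, the sketch below runs the generalised (1 + 1) EA on $f_E$, computed by simulating the modulo-n counter of Definition 4, and compares the average number of fitness evaluations with the upper bound $\sum_{i=2}^{n} enm/i$ from the proof. It is a self-contained illustration under stated assumptions: $f_E(x) = n - c(x)$, and n, m, the number of runs, the seed and the evaluation budget are arbitrary choices. The empirical mean is only an estimate and is not a reproduction of the numerical study in this paper.

```python
import math
import random
from statistics import mean
from typing import List, Tuple


def counter_step(i: int, a: str, n: int) -> Tuple[int, str]:
    """Modulo-n counter of Definition 4: (index of next state, output)."""
    if a == "1":
        return (1, "reset") if i == n else (i + 1, "incr")
    return i, "const"


def f_E(x: List[str], n: int) -> int:
    """f_E(x) = n - #states whose output on x coincides with that of s_1."""
    def out(i: int) -> Tuple[str, ...]:
        o = []
        for a in x:
            i, sym = counter_step(i, a, n)
            o.append(sym)
        return tuple(o)
    target = out(1)
    return n - sum(out(i) == target for i in range(1, n + 1))


def ea_runtime(n: int, alphabet: List[str], rng: random.Random, budget: int = 10**6) -> int:
    """Fitness evaluations until the (1 + 1) EA first evaluates a UIO for s_1 (f_E = n - 1)."""
    x = [rng.choice(alphabet) for _ in range(n)]
    fx, evals = f_E(x, n), 1
    while fx < n - 1:
        if evals >= budget:
            raise RuntimeError("evaluation budget exhausted")
        # Mutate each position with probability 1/n to a different symbol.
        y = [rng.choice([b for b in alphabet if b != a]) if rng.random() < 1 / n else a
             for a in x]
        fy = f_E(y, n)
        evals += 1
        if fy >= fx:  # an optimal offspring is always accepted and ends the loop
            x, fx = y, fy
    return evals


n, m, runs = 20, 3, 20
alphabet = ["1"] + [f"z{j}" for j in range(m - 1)]
rng = random.Random(2024)
avg = mean([ea_runtime(n, alphabet, rng) for _ in range(runs)])
bound = sum(math.e * n * m / i for i in range(2, n + 1))
print(f"mean evaluations {avg:.0f}, upper bound {bound:.0f}")
```

The bound from the proof is deliberately loose (it replaces $m - 1$ by m and assumes the worst starting point), so the empirical mean is expected to lie well below it.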
It has previously been asserted that evolutionary algorithms can outperform random search when computing UIOs. However, Theorems 1 and 2 provide the first formal proof that this assertion is indeed true. This result warrants further exploration of search based approaches for computing UIOs.

Hard FSM instance class
As explained above, the problem of computing UIOs is NP-hard. Hence, one should expect that any search based approach will fail to find UIOs in polynomial time for at least some classes of FSMs. In practical applications of search heuristics, it is necessary to know about these intractable cases so that other techniques or problem reformulations can be considered, avoiding an unnecessary waste of resources. Here, we describe a broad class of FSMs that is hard for both random search and the (1 + 1) EA.
The objective is to find a UIO of length $n$ for state $s_1$. The instances in Definition 5 are illustrated in Fig. 3 and correspond to a machine that recognises a fixed keyword $w$. In every step, the FSM receives a new letter of a word. The FSM acknowledges each input letter by outputting the symbol ack. If the FSM is given all the input letters of the keyword $w$, followed by a special input symbol EOL marking the "end of line", then the FSM responds with the message found.
The difficulty of computing a UIO for state $s_1$ in the different members of the Sequence Detector FSM class may vary depending on the particular keyword $w$ that the FSM detects. However, the worst-case runtime on a class of functions is determined by its hardest instance. In order to prove a lower bound on this class, it therefore suffices to show that there exists at least one sub-class of sequence detector FSMs for which it is hard to compute UIOs. The particular class of FSMs that we will consider has keyword alphabet $\Sigma = \{1\}$ and keyword $w = 1^{n-1}$. Proposition 2 shows that this instance class leads to a fitness function that takes the same low value on all except two input sequences. Hence, the fitness landscape is essentially a "needle in the haystack", which is hard for all EAs [36]. The proof is in the appendix.

Proposition 2.
The fitness function $f_H$ corresponding to the instance class in Definition 5 with keyword alphabet $\Sigma = \{1\}$ and keyword $w = 1^{n-1}$ takes the value $f_H(x) = 1$ for all input sequences $x \in I^n$, except on the input sequences $1^n$ and $1^{n-1} \cdot \mathrm{EOL}$, on which it takes the values $f_H(1^n) = 0$ and $f_H(1^{n-1} \cdot \mathrm{EOL}) = n - 1$.
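Proposition 2 can be sanity-checked by brute force for small $n$. Definition 5 is not reproduced here, so the Python sketch below uses a hypothetical machine built only from the informal description above: input 1 advances a match counter (wrapping back to the start once the keyword has been completed), and EOL answers found only when all $n-1$ letters have been matched, after which the machine returns to the start. It also assumes that the fitness equals $n$ minus the number of states whose output on $x$ coincides with that of $s_1$. Under these assumptions the value pattern of Proposition 2 is reproduced exactly; this illustrates the proposition but does not verify the actual Definition 5.

```python
from itertools import product


def step(state, symbol, n):
    """One transition of a HYPOTHETICAL detector for the keyword 1^(n-1).

    States are 0..n-1 (state j = j letters matched; state 0 plays s_1).
    Input '1' advances and wraps from n-1 back to 0; input 'E' (EOL) answers
    'found' only in state n-1 and always returns to state 0.
    """
    if symbol == "E":
        return 0, ("found" if state == n - 1 else "ack")
    return (state + 1) % n, "ack"


def outputs(state, x, n):
    """Output sequence produced when starting in `state` on input string x."""
    out = []
    for symbol in x:
        state, o = step(state, symbol, n)
        out.append(o)
    return tuple(out)


def f_hypothetical(x, n):
    """n minus the number of states whose outputs on x equal those of s_1."""
    reference = outputs(0, x, n)
    same = sum(1 for s in range(n) if outputs(s, x, n) == reference)
    return n - same


def matches_proposition_2(n):
    """Check the value pattern of Proposition 2 on every string in {1, E}^n."""
    for letters in product("1E", repeat=n):
        x = "".join(letters)
        if x == "1" * n:
            expected = 0
        elif x == "1" * (n - 1) + "E":
            expected = n - 1
        else:
            expected = 1
        if f_hypothetical(x, n) != expected:
            return False
    return True


print(all(matches_proposition_2(n) for n in range(3, 9)))  # prints True
```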
By noting that the shortest UIO for state $s_1$ in the special type of keyword FSM instance considered above has length $n$, the following theorem can be proved similarly to Theorem 1.
Theorem 3. The Sequence Detector FSM class contains instances such that the probability that random search will find a UIO for state $s_1$ in fewer than $e^{c \cdot n}$ iterations is exponentially small, $e^{-\Omega(n)}$, where $c$ is a small constant.
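One simple way to see why random search fails on this instance is a union bound. The sketch below assumes that input sequences are drawn uniformly from $\{1, \mathrm{EOL}\}^n$, so that each sample hits the unique optimum $1^{n-1} \cdot \mathrm{EOL}$ of Proposition 2 with probability $2^{-n}$. After $e^{cn}$ samples the success probability is then at most $e^{cn} 2^{-n} = e^{-(\ln 2 - c)n}$, which is exponentially small for any constant $c < \ln 2$. This is an illustration under that assumption, not necessarily the argument used in the proof.

```python
import math


def rs_hit_bound(n, c):
    """Union bound on Pr[random search samples the unique optimum within
    e^(c*n) iterations], assuming each uniform sample over {1, EOL}^n is
    optimal with probability 2^(-n). Computed in log space to avoid overflow.
    """
    log_bound = c * n - n * math.log(2)
    return 1.0 if log_bound >= 0 else math.exp(log_bound)


for n in (50, 100, 200):
    print(n, rs_hit_bound(n, c=0.5))
```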
To lower bound the expected runtime of the (1 + 1) EA, we apply drift analysis, which is a general technique for proving exponential lower bounds on first hitting times of Markov processes [37]. The following variant of the drift theorem is taken from [40].
Lemma 1 (Drift theorem). Let $X_0, X_1, X_2, \ldots$ be a Markov process over a set of states $S$, and $g : S \to \mathbb{R}_0^+$ a function that assigns to every state a non-negative real number. Pick two real numbers $a(n)$ and $b(n)$ which depend on a parameter $n \in \mathbb{R}^+$ such that $0 < a(n) < b(n)$ holds, and let the random variable $T$ denote the earliest point in time $t \geq 0$ where $g(X_t) \leq a(n)$ holds. If there are constants $\lambda > 0$ and $D \geq 1$ and a polynomial $p(n)$ taking only positive values, for which the following four conditions hold
1. $g(X_0) \geq b(n)$,
2. $b(n) - a(n) = \Omega(n)$,
3. $E\left[e^{-\lambda(g(X_{t+1}) - g(X_t))} \mid X_t,\ a(n) < g(X_t) < b(n)\right] \leq 1 - \frac{1}{p(n)}$ for all $t \geq 0$,
4. $E\left[e^{-\lambda(g(X_{t+1}) - b(n))} \mid X_t,\ b(n) \leq g(X_t)\right] \leq D$ for all $t \geq 0$,
then for all time bounds $B \geq 0$, the following upper bound on the probability holds for the random variable $T$:
$$\Pr[T \leq B] \leq e^{\lambda(a(n) - b(n))} \cdot B \cdot D \cdot p(n).$$
Theorem 4. The Sequence Detector FSM class contains instances where the probability that the (1 + 1) EA will find a UIO for state $s_1$ within $e^{c \cdot n}$ steps is exponentially small, $e^{-\Omega(n)}$, where $c$ is a small constant.
Proof. For any $n$, we consider the Sequence Detector FSM with keyword alphabet $\Sigma = \{1\}$ and keyword $w = 1^{n-1}$. We lower bound the time it takes until the current search point of the (1 + 1) EA contains at least $n-1$ 1-symbols for the first time. This time is clearly no longer than the time the algorithm needs to find the optimal search point $1^{n-1} \cdot \mathrm{EOL}$.
Let the random variables $Y_0, Y_1, Y_2, \ldots$ represent the stochastic behaviour of the (1 + 1) EA on fitness function $f_H$, where each variable $Y_t$ denotes the number of EOL-symbols in the search point in step $t$. Then $Y_0, Y_1, Y_2, \ldots$ is a Markov process. To simplify this Markov process, we introduce another Markov process $X_0, X_1, X_2, \ldots$ with $X_0 := Y_0$, and let the random variable $T$ denote the first point in time $t$ where $X_t \leq 1$. Intuitively, the simplified process corresponds to an "improved" algorithm which never gains more than one EOL-symbol (equivalently, never loses more than one 1-symbol) in each step, but otherwise behaves as the (1 + 1) EA. Clearly, the expected optimisation time $E[T]$ of the modified process is no more than the expected optimisation time of the original process.
The drift theorem is now applied to derive an exponential lower bound on the random variable $T$. Define $g(x) := x$ and parameters $a(n) := 1$ and $b(n) := cn$, where $c$ is a constant that will be determined later. With this setting of $a(n)$ and $b(n)$, the second condition of the drift theorem is satisfied, since $b(n) - a(n) = cn - 1 = \Omega(n)$.
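The following notation will be used:
$$p_j := \Pr[g(X_{t+1}) - g(X_t) = j \mid X_t,\ 1 < g(X_t) < cn], \qquad r_j := \Pr[g(X_{t+1}) - g(X_t) = j \mid X_t,\ cn \leq g(X_t)].$$
The expectation in the third condition can then be written as
$$E\left[e^{-\lambda(g(X_{t+1}) - g(X_t))} \mid X_t,\ 1 < g(X_t) < cn\right] = \sum_{j \leq 1} p_j \cdot e^{-\lambda j}. \qquad (1)$$
The terms in Eq. (1) can be divided into four parts according to the value of the index variable $j$. The term where $j = 1$ simplifies to $p_1 \cdot e^{-\lambda} \leq e^{-\lambda}$, the term where $j = 0$ simplifies to $p_0 \cdot e^0 = (1 - 1/n)^n \leq 1/e$, the term where $j = -1$ simplifies to $p_{-1} \cdot e^{\lambda} \leq e^{\lambda}\left(1 - \frac{1}{n}\right)^{n-1} \frac{1}{n} \cdot X_t \leq e^{\lambda} c$, and the remaining terms where $j \leq -2$ can be bounded as follows. Losing $|j|$ EOL-symbols requires flipping at least $|j|$ of the fewer than $cn$ EOL-symbols, so $p_j \leq \binom{X_t}{|j|} n^{-|j|} \leq c^{|j|}/|j|!$, and hence
$$\sum_{j \leq -2} p_j \cdot e^{-\lambda j} \leq \sum_{i \geq 2} \frac{(e^{\lambda} c)^i}{i!} = \exp(e^{\lambda} c) - 1 - e^{\lambda} c.$$
The sum in Eq. (1) can now be bounded from above as
$$E\left[e^{-\lambda(g(X_{t+1}) - g(X_t))} \mid X_t,\ 1 < g(X_t) < cn\right] \leq e^{-\lambda} + 1/e + e^{\lambda} c + \left(\exp(e^{\lambda} c) - 1 - e^{\lambda} c\right) = e^{-\lambda} + 1/e - 1 + \exp(e^{\lambda} c).$$
For appropriate values of $\lambda$ and $c$ (e.g. $\lambda = \ln 2$ and $c = 1/32$), the value of this expression is less than $1 - \delta$ for a constant $\delta > 0$. Hence, the third condition in the drift theorem is satisfied. It is straightforward to see that the fourth condition holds now that condition three holds:
$$E\left[e^{-\lambda(g(X_{t+1}) - cn)} \mid X_t,\ cn \leq g(X_t)\right] \leq E\left[e^{-\lambda(g(X_{t+1}) - g(X_t))} \mid X_t,\ cn \leq g(X_t)\right] = r_1 \cdot e^{-\lambda} + \sum_{j=0}^{n} r_{-j} \cdot e^{j\lambda}.$$
Using the same ideas as above, the expectation can be bounded from above by
$$E\left[e^{-\lambda(g(X_{t+1}) - cn)} \mid X_t,\ cn \leq g(X_t)\right] \leq e^{-\lambda} + \exp(e^{\lambda}).$$
When the parameter $c = 1/32$, using Chernoff bounds [41], the probability that the first search point has fewer than $cn$ EOL-symbols is $e^{-\Omega(n)}$. Hence we can assume with high probability that the first condition is satisfied as well.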
All four conditions of the drift theorem now hold. By setting $B = e^{c'n}$ for a sufficiently small constant $c' > 0$, one obtains the exponential lower bound $\Pr[T \leq e^{c'n}] = e^{-\Omega(n)}$. ∎
Theorems 3 and 4 mean that for some sequence detector FSMs, the probability that either random search or the (1 + 1) EA finds a UIO is very small, even when the search heuristics are allowed a number of iterations that is exponential in the number of states. Hence, these theorems imply that these search heuristics cannot be applied to find UIOs for such classes of FSMs with the existing representation and fitness function.
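The constants in the drift argument can be checked numerically. The Python sketch below evaluates the bound from condition 3 and the constant $D$ from condition 4 for $\lambda = \ln 2$ and $c = 1/32$, takes $p(n) = 1/\delta$ (a constant, hence a polynomial), and then evaluates the logarithm of the bound in Lemma 1 with $a(n) = 1$, $b(n) = cn$ and $B = e^{c'n}$. The choice $c' = 0.01$ is an illustrative assumption; any $0 < c' < \lambda c$ gives exponential decay.

```python
import math

lam = math.log(2)   # the exponent parameter lambda of Lemma 1
c = 1 / 32          # b(n) = c * n

# Condition 3: e^{-lam} + 1/e - 1 + exp(e^{lam} c) must be at most 1 - delta.
cond3 = math.exp(-lam) + 1 / math.e - 1 + math.exp(math.exp(lam) * c)
delta = 1 - cond3                               # about 0.068 > 0
# Condition 4: the constant D = e^{-lam} + exp(e^{lam}).
D = math.exp(-lam) + math.exp(math.exp(lam))    # about 7.89 >= 1
p = 1 / delta                                   # constant, hence polynomial, p(n)


def log_failure_bound(n, c_prime):
    """log of e^{lam(a-b)} * B * D * p with a = 1, b = c*n and B = e^{c' n}."""
    return lam * (1 - c * n) + c_prime * n + math.log(D) + math.log(p)


# Any 0 < c' < lam * c (about 0.0217) makes the bound decay exponentially in n.
for n in (1_000, 5_000, 10_000):
    print(n, log_failure_bound(n, c_prime=0.01))
```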
For designers of new search based approaches to computing UIOs, Theorem 4 points out an important class of FSMs which could serve as a benchmark for further studies. Given the reasons for failure described in the proofs above, one could seek novel fitness functions or representations which do not suffer from the same problems.
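For readers who want to use this instance as a benchmark, a minimal Python simulation harness is sketched below. It works directly with the closed form of $f_H$ from Proposition 2 rather than with an FSM, assumes binary search points over $\{1, \mathrm{EOL}\}$ (with True marking EOL) and standard bitwise mutation with rate $1/n$, and records the smallest number of EOL-symbols ever held by the current search point, i.e. the quantity $Y_t$ from the proof of Theorem 4. Because the landscape is flat, this count keeps drifting back towards $n/2$ and should stay far from the target value 1. Function names and the seed are hypothetical.

```python
import random


def f_H(x):
    """Closed form of f_H from Proposition 2; True marks an EOL-symbol."""
    eol = sum(x)
    if eol == 0:
        return 0                      # the sequence 1^n
    if eol == 1 and x[-1]:
        return len(x) - 1             # the optimum 1^(n-1) EOL
    return 1


def lowest_eol_count(n, steps, seed):
    """Run the (1+1) EA with bitwise mutation rate 1/n for `steps` iterations
    and return the smallest number of EOL-symbols held by the current point."""
    rng = random.Random(seed)
    x = [rng.random() < 0.5 for _ in range(n)]
    fx = f_H(x)
    lowest = sum(x)
    for _ in range(steps):
        y = [(not b) if rng.random() < 1.0 / n else b for b in x]
        fy = f_H(y)
        if fy >= fx:                  # (1+1) EA accepts equal or better
            x, fx = y, fy
            lowest = min(lowest, sum(x))
    return lowest


print(lowest_eol_count(60, 5_000, seed=3))
```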

k-Gap FSM instance class
The previous two subsections presented classes of FSMs for which it is either easy or hard to compute a UIO with the (1 + 1) EA. It is desirable to also study FSM classes of intermediate difficulty. Trivially, one could consider the modulo-n counter as an FSM class of intermediate difficulty because, as Theorem 2 shows, the runtime on this class increases with the size of the input alphabet $I$. However, we would like to understand cases where it is the structure of the FSM, rather than the size of the input alphabet, that determines the difficulty. We therefore consider a third class of FSMs that has binary input alphabet $I = \{0, 1\}$ and whose structure is parameterised by the value of a parameter $k$. It will be shown that the instance class is easy when the value of parameter $k$ is low, and that the problem becomes harder when parameter $k$ is increased.
One way of creating a problem with tunable difficulty is to make sure that the fitness function contains a "trap" which easily leads the EA into a local optimum at distance $k$ from the global optimum. By increasing the distance $k$ between the local and global optimum, the problem gets harder [35]. The "trap" in the FSM defined in Definition 6 consists of the $m$ states $q_1, \ldots, q_m$. By producing an input sequence with many leading 1-bits, the (1 + 1) EA easily makes the output from these states different from that of state $s_1$. However, as can be seen from Fig. 4, the UIO for state $s_1$ must contain $k$ 0-bits somewhere in the beginning of the input sequence. The proofs of the following two propositions are in the appendix.
Proposition 3.
$$f_{G(k)}(10^k1^{n-2k-2}z) = n - 1.$$
In other words, any search point of the form $10^k1^{n-2k-2}z$ is a UIO for state $s_1$. Let $i$ be any integer $0 \leq i < n$, and $z$ any string of length $n - i - 1$. If string $z$ does not contain the substring $1^{n-2k-2}$, then
$$f_{G(k)}(1^i0z) = \min(i,\ n - k - 1). \qquad (3)$$
Proposition 4. Let $i$ be any integer $0 \leq i \leq 2k + 2$, and $z$ any sequence of length $\ell(z) = n - i - 1$ containing the sequence $1^{n-2k-2}$. If the sequence $1^i0z$ is not optimal, then $f_{G(k)}(1^i0z) \leq 2k + 2$.
Analysing the (1 + 1) EA on the problem is easy if we can assume that the sequence $1^{n-2k-2}$ never occurs in the suffix. Proposition 5 shows that this assumption holds in most cases. The proof of this proposition is in the appendix.
Definition 7 (Typical run). A typical run of the (1 + 1) EA on $f_{G(k)}$ is a run where the current search point $x$ is never of the form $1^i0z$, $0 \leq i < 2k + 2$, where $z$ is a sequence of length $\ell(z) = n - i - 1$ containing the sequence $1^{n-2k-2}$.
A run of the (1 + 1) EA on $f_{G(k)}$ is divided into the following three phases.
Phase 1 is defined as the time interval in which the search point has fewer than $2k + 2$ leading 1-bits. If the current search point during this phase has a suffix containing the sequence $1^{n-2k-2}$, then we say that a failure has occurred. The event of failure will be denoted $F$. Phase 2 is defined as the time interval when the search point has between $2k + 2$ and $n - k - 1$ leading 1-bits. Phase 3 is defined as the time interval when the search point has at least $n - k - 1$ leading 1-bits, and this phase lasts until the search point is optimal for the first time.
Fig. 4. Finding a UIO for state $s_1$ with the (1 + 1) EA becomes harder when increasing parameter $k$.
Proposition 5. The probability of a failure during Phase 1 is bounded from above by $e^{-\Omega(n)}$.
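Definition 7 and the phase boundaries translate directly into code. The Python helpers below are a sketch that represents search points as strings of '0' and '1', assumes $n > 2k + 2$, and assigns a point with exactly $n - k - 1$ leading 1-bits to Phase 3 (the two phase definitions overlap at that value). `phase` is only meaningful for non-optimal points, since optimal points of the form $10^k1^{n-2k-2}z$ have a single leading 1-bit. All names are hypothetical.

```python
def leading_ones(x):
    """Number of leading 1-bits of a '0'/'1' string."""
    return len(x) - len(x.lstrip("1"))


def is_failure_point(x, k):
    """True if x = 1^i 0 z with i < 2k+2 and z containing 1^(n-2k-2).

    Such search points are excluded from typical runs (Definition 7).
    Assumes n = len(x) > 2k + 2, so that the block 1^(n-2k-2) is non-empty.
    """
    n = len(x)
    i = leading_ones(x)
    if i >= 2 * k + 2:
        return False
    z = x[i + 1:]                     # x[i] is the first 0-bit
    return "1" * (n - 2 * k - 2) in z


def phase(x, k):
    """Phase (1, 2 or 3) of a non-optimal search point x, by its leading 1-bits."""
    n = len(x)
    i = leading_ones(x)
    if i < 2 * k + 2:
        return 1
    if i < n - k - 1:
        return 2
    return 3


x = "10" + "1" * 14 + "0000"         # n = 20, k = 2
print(is_failure_point(x, 2), phase(x, 2))  # prints True 1
```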
Theorem 5. Let $k$ be any constant integer $k \geq 2$. The expected runtime of the (1 + 1) EA to find a UIO for state $s_1$ using $f_{G(k)}$ is $\Theta(n^k)$.
Proof. Given the probability of the failure event $F$, the expected runtime of the (1 + 1) EA can be calculated as
$$E[T] = (1 - \Pr[F]) \cdot E[T \mid \overline{F}] + \Pr[F] \cdot E[T \mid F].$$
To estimate an upper bound on the expected runtime, we use that $E[T] \leq E[T \mid \overline{F}] + \Pr[F] \cdot E[T \mid F]$. We will first find an upper bound on the runtime conditional on a typical run, $E[T \mid \overline{F}]$, and pessimistically assume that the optimal search point will not be found during Phase 1 or 2 of the run. We first upper bound the duration of Phases 1 and 2. Let $i$, $0 \leq i \leq n - k - 1$, be the number of leading 1-bits in the current search point. A step of the algorithm is called successful if the mutated search point $x'$ has more leading 1-bits than the current search point $x$. In typical runs, Proposition 3 guarantees that $x'$ will be accepted in a successful step. To reach the end of Phase 2, we have to wait for at most $n - k - 1$ successful steps. The probability of a successful step is at least $\frac{1}{n}\left(1 - \frac{1}{n}\right)^{n-1} \geq \frac{1}{en}$, so the expected duration of Phases 1 and 2 is $O(n^2)$. By Proposition 3, for Phase 3 to end, it is sufficient to flip $k$ consecutive 1-bits starting at position 2. The probability that this will happen in any step of Phase 3 is at least $\left(\frac{1}{n}\right)^k \left(1 - \frac{1}{n}\right)^{n-k} \geq \frac{1}{en^k}$. Hence, the expected duration of Phase 3 is bounded from above by $O(n^k)$. An upper bound on the expected runtime conditional on the event that the run is typical is therefore $E[T \mid \overline{F}] = O(n^k)$.
We now give an upper bound on the expected time $E[T \mid F]$ conditional on a failure. To keep the analysis simple, we give a pessimistic upper bound. At some time in such a run, the current search point has a suffix containing the sequence $1^{n-2k-2}$. We assume that this search point is not the optimal search point, and furthermore, we assume that in this situation, we will never accept an optimal search point during Phase 1. Clearly, this can only slow down the optimisation process. By Proposition 4, this search point has fitness at most $2k + 2$. To end Phase 1, Proposition 3 shows that it is sufficient to wait for a step in which all the 0-bits in the prefix of length $2k + 3$ of the search point are flipped into 1-bits. The probability of such a mutation is at least $\left(\frac{1}{n}\right)^{2k+3}\left(1 - \frac{1}{n}\right)^{n-2k-3} \geq \frac{1}{en^{2k+3}}$. So if a failure occurs, the duration of Phase 1 will be $O(n^{2k+3})$. Failures do not occur in Phases 2 or 3, so we reuse the upper bounds of $O(n^2)$ and $O(n^k)$ that were calculated for the typical runs, yielding an upper bound of $O(n^{2k+3})$ for the duration of runs with failures. Since $\Pr[F] \leq e^{-\Omega(n)}$ by Proposition 5, the failure term contributes only $e^{-\Omega(n)} \cdot O(n^{2k+3}) = o(1)$, and the unconditional expected runtime of the (1 + 1) EA is therefore $E[T] \leq E[T \mid \overline{F}] + \Pr[F] \cdot E[T \mid F] = O(n^k)$.
A lower bound on the expected runtime is estimated using the inequality $E[T] \geq (1 - \Pr[F]) \cdot E[T \mid \overline{F}]$. We need to estimate the expected runtime conditional on a typical run. Optimal search points contain the sequence $1^{n-2k-2}$ in their suffix, hence the optimal search point will not be found during Phase 1 of typical runs. By Propositions 3 and 4, only search points with at least $2k + 2$ leading 1-bits or an optimal search point will be accepted during Phase 2. Optimal search points must contain $10^k1^{n-2k-2}$. Hence, in order to find the optimum in the second phase, it is necessary to flip $k$ consecutive 1-bits into 0-bits, starting somewhere in the interval between position 2 and $k + 2$. The probability of this event in any given step is no more than $k/n^k$. Hence, the expected duration of Phases 2 and 3 is at least $n^k/k$ steps. The unconditional expected runtime can now be bounded from below by $E[T] \geq (1 - e^{-\Omega(n)}) \cdot n^k/k = \Omega(n^k)$. ∎
Theorem 5 relates the runtime of the (1 + 1) EA to a structural parameter $k$ in a class of FSMs. The theorem shows that small modifications in the structure of an FSM can have a strong impact on the runtime of the algorithm. Hence, one cannot generally infer that the runtime of the (1 + 1) EA will be similar for similarly structured FSMs.
The reason why the runtime of the (1 + 1) EA increases with parameter $k$ can be explained informally as follows. In order for state $s_1$ to reach the distinguishing transition from state $q_m$ to $q_1$, it is necessary to include $k$ consecutive 0-symbols early in the input sequence. However, on most input sequences containing early 0-symbols, the output when starting in state $s_1$ will be indistinguishable from the outputs when starting in most of the states labelled $q$. The majority of the states are $q$-states. In contrast, state $s_1$ is distinguished from the $q$-states on input sequences with many leading 1-bits. When the (1 + 1) EA has found an input sequence with sufficiently many leading 1-bits, it is necessary to flip $k$ 1-bits simultaneously to produce a better input sequence. The expected waiting time for such an event is of order $n^k$.
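The waiting time for the required simultaneous $k$-bit flip is geometrically distributed, which is why its mean grows as $n^k$. The sketch below computes the mean $1/p$ for one specific flip pattern under standard bitwise mutation with rate $1/n$, namely $p = n^{-k}(1 - 1/n)^{n-k}$, the same event used for the $O(n^k)$ bound on Phase 3 in the proof of Theorem 5. Since $(1 - 1/n)^{n-k} \geq 1/e$, the ratio of this mean to $n^k$ always lies between 1 and $e$. The function name is hypothetical.

```python
import math


def k_flip_wait(n, k):
    """Expected number of steps until one specific k-bit flip pattern occurs
    (with no other bit flipped): 1/p with p = n^(-k) * (1 - 1/n)^(n - k)."""
    p = n ** (-k) * (1 - 1 / n) ** (n - k)
    return 1 / p


for k in (2, 3, 4):
    n = 100
    print(k, k_flip_wait(n, k) / n ** k)   # ratio lies between 1 and e
```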

Objectives
The theoretical analysis in the previous section leaves some questions open. One open question is how the search heuristics behave on particular, possibly small, instance sizes. The theoretical analysis of runtime considered asymptotic behaviour, and the results were expressed using big-Oh notation. Hence, in principle, there is no guarantee that these theoretical results are relevant for "interesting" values of $n$, i.e. for the number of states in "typical" applications of the FSMs considered. A statement like $f(n) = O(g(n))$ only means that there exist constants $c$ and $n_0$ such that $f(n) < c \cdot g(n)$ for all $n > n_0$. If either of these constants is very large, the inequality is either very weak, or only holds for very large values of $n$. Two questions to consider are therefore whether the constants occurring in the runtime expressions are very large, and whether the runtime results seem to manifest themselves for small values of $n$.
A second issue is that the theoretical analysis mainly focused on the expectation of the runtime distribution. To get a clearer picture of the runtime distributions, it is necessary to investigate the variability of the distributions.

Strategy
To address these issues, the theoretical analysis is complemented with a numerical study. The modulo-n counter FSMs include a wide range of different FSMs that depend both on the number of states $n$ and on the input alphabet $I$. Similarly, the sequence detector FSMs contain a wide range of FSMs depending on the keyword alphabet $\Sigma$, the keyword $w$, and the number of states $n$. To keep the number of experiments within a feasible range, we therefore only consider the modulo-n counter FSMs with the binary input alphabet $I = \{0, 1\}$, and we will refer to these FSMs as the Easy FSM instance class.
Furthermore, we only consider the sequence detector FSMs with keyword alphabet $\Sigma = \{1\}$ and keyword $w = 1^{n-1}$. These FSMs correspond exactly to those FSMs considered in the proof of Theorem 4. They will in the following be referred to as the Hard FSM instance class. Additionally, the runtime of the (1 + 1) EA will be investigated on the k-gap FSM instance class for values of $k$ between 2 and 4. Based on the theoretical results on the expected runtime of the (1 + 1) EA on this instance class, it is deemed impractical to carry out experiments for values of $k$ higher than 4.
For each experimental setting, the search heuristic under consideration is started, and run until an optimal solution has been found. The runtime of one run is defined as the number of times the fitness function has been evaluated. Repeated runs with different random seeds are made to allow statistical analysis. Table 1 summarises the experimental settings. The experiments are numbered from 1 to 14, and divided into two groups.
The purpose of the first group of experiments numbered 1-7 is to study the relationship between instance size and runtime for different instance classes, and in particular to provide estimates for the constants in the asymptotic runtime expressions. The theoretical runtime analysis combined with estimates of the time to evaluate the fitness function were used to choose sets of instance sizes which were deemed feasible to carry out within reasonable time constraints.
For each combination of algorithm, FSM instance and instance size $n$, 100 experiments were run. The bootstrap percentile method with 400 bootstrap samples was used to calculate 95% confidence intervals of the mean of the true runtime distribution. These confidence intervals are reported as error bars. Following [42], for each combination of algorithm and problem instance class, we fitted different models to the observed runtimes using non-linear regression with the Gauss-Newton algorithm. Each model corresponds to a one-term expression $a \cdot t(n)$ of the runtime, where the model parameter $a$ corresponds to the constant to be estimated. The residual sum of squares (RSS) of each fitted model was calculated to identify the model which corresponds best with the observed runtimes. Additionally, for each instance size, a box-and-whisker plot is made, indicating the smallest observed runtime, the lower quartile, the median, the upper quartile and the largest observed runtime. Together, the box-and-whisker plots provide information on how the variability of runtime depends on the instance size.
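The statistical procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' scripts: the bootstrap uses the standard percentile method, and for a one-term model $a \cdot t(n)$ least squares has a closed-form solution that a Gauss-Newton iteration reaches in a single step, so the sketch uses that closed form directly. The logarithm is taken as natural, the function and model names are hypothetical, and the data in the usage example are synthetic values generated from $2n \ln n$, not measurements from the paper.

```python
import math
import random
import statistics


def bootstrap_mean_ci(samples, n_boot=400, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lower = means[int(tail * n_boot)]
    upper = means[min(n_boot - 1, int((1 - tail) * n_boot))]
    return lower, upper


def fit_one_term(ns, ys, t):
    """Least-squares fit of y = a * t(n); returns (a, residual sum of squares).

    The model is linear in its single parameter, so one Gauss-Newton step
    from any starting value lands on this closed-form solution.
    """
    ts = [t(n) for n in ns]
    a = sum(ti * yi for ti, yi in zip(ts, ys)) / sum(ti * ti for ti in ts)
    rss = sum((yi - a * ti) ** 2 for ti, yi in zip(ts, ys))
    return a, rss


MODELS = {
    "a*n": lambda n: n,
    "a*n*log(n)": lambda n: n * math.log(n),
    "a*n^2": lambda n: n ** 2,
}

# Synthetic illustration only (NOT data from the paper).
ns = [25, 50, 100, 200, 400]
ys = [2 * n * math.log(n) for n in ns]
fits = {name: fit_one_term(ns, ys, t) for name, t in MODELS.items()}
best = min(fits, key=lambda name: fits[name][1])
print(best, round(fits[best][0], 3))  # prints a*n*log(n) 2.0
print(bootstrap_mean_ci(list(range(1, 101))))
```

Selecting the model with the smallest RSS mirrors the model-selection step described above; with real runtimes the estimated $a$ would of course differ from this synthetic value.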
The purpose of the second group of experiments, numbered 8-14, is to look closer at the variability of the runtime for a fixed instance size. In these experiments, a larger number of repetitions were made. Again, the particular instance sizes were chosen based on the theoretical analysis and estimates of the time to evaluate the fitness function. The results from these experiments are plotted as histograms to visualise the variability of the observed runtime distributions.

Easy FSM instance class
The Easy FSM instance class was constructed to point out that there are instance classes where the (1 + 1) EA is highly efficient, whereas random search fails completely. The theoretical analysis shows that the expected runtime of the (1 + 1) EA on this instance class is $O(n \log n)$, and that there is an exponentially small probability that random search will find the optimal solution within $e^{cn}$ iterations, where $c$ is some constant. Three models were fitted to the observed runtimes of the (1 + 1) EA on the Easy FSM instance class. The models were chosen to be close to the theoretically obtained runtime bound on this instance class. The results summarised in Table 2 indicate that the model which fits the data best is $a \cdot n \log n$ with estimated parameter $a = 1.967$. This result corresponds well with the asymptotic runtime $O(n \log n)$ obtained in Theorem 2. The fitted models are plotted together with the mean of the observed runtimes in Fig. 5.
Table 2. Residual sum of squares of three models fitted to the observed runtime of (1 + 1) EA on the Easy FSM instance.
Fitted models (Easy FSM instance class): $12.35\,n$, $1.97\,n \log n$, $0.02\,n^2$.
As shown in Fig. 6, the runtime of random search on the Easy FSM instance class grows much faster than the runtime of the (1 + 1) EA. The log-plot indicates that the runtime grows exponentially with the number of states in the FSM.
Fig. 6. RS on Easy FSM instance class (log scale); x-axis: number of states in FSM (n); y-axis: iterations of RS.

Hard FSM instance class
The Hard FSM instance class was constructed to point out that the (1 + 1) EA is not successful on all instance classes of this problem. The theoretical analysis shows that on this instance class, there is an exponentially small probability that the algorithm will find the optimal solution within $e^{cn}$ iterations, for some constant $c$. This result also implies that the expected runtime of the (1 + 1) EA on this instance class is exponential. The plot in Fig. 7 shows the mean runtime of the (1 + 1) EA on the Hard FSM instance class with error bars indicating 95% confidence intervals. The log-plot indicates that the observed mean runtime grows exponentially with the number of states in the FSM. The mean observed runtime of Random Search on the same instance class is plotted in Fig. 8, showing a similar trend. These results correspond well with the theoretical results.

k-Gap FSM instance class
The k-gap FSM instance class was constructed to point out that the asymptotic runtime of the (1 + 1) EA on the UIO problem is not limited to being either very small or exponentially large, but can take a wide range of values depending on characteristics of the FSMs. Even small changes to the FSM can have a big impact on the runtime. The theoretical analysis shows that the expected runtime of the (1 + 1) EA on the k-gap FSM instance class is $\Theta(n^k)$ for any constant $k \geq 2$.
Three models were fitted to the observed runtimes of the (1 + 1) EA on the k-gap FSM instance class with $k = 2$, $k = 3$ and $k = 4$, using non-linear regression. The results are summarised in Tables 3-5. For $k = 2$, the model with best fit was $a \cdot n^2$ with estimated parameter $a = 2.518$; for $k = 3$, the model with best fit was $a \cdot n^3$ with estimated parameter $a = 1.722$; and for $k = 4$, the model with best fit was $a \cdot n^4$ with estimated parameter $a = 1.605$. These results correspond well with the theoretical result. The fitted models are plotted with the mean observed runtime in Figs. 9-11. The asymptotic behaviour predicted by the theoretical analysis seems evident in these plots even for small instance sizes. The bootstrapped confidence intervals of the mean are shown as error bars in the plots. The error bars widen with increasing instance size.

Fig. 8. RS on Hard FSM instance class (log scale); x-axis: number of states in FSM (n); y-axis: iterations of RS.

Variability of runtime
Box-and-whisker plots are made to show how the variability of the runtime depends on the instance size. The plots in Figs. 12 and 13 show that the interquartile range in the observed runtimes increases with increasing instance size. The increasing interquartile range is most evident in Fig. 13, showing the runtime of Random Search on the Easy FSM instance class.
The box-and-whisker plots for the Hard FSM instance class are not included here, but they show a similar tendency as the box-and-whisker plots for Random Search on the Easy FSM instance class.
The box-and-whisker plots for the k-gap FSM instance class also show that the interquartile range increases with the instance size. Here, we only include the plot for $k = 4$. For larger instance sizes, one can observe that the distribution is positively skewed, with the median closer to the lower than to the upper quartile.
To investigate the variability in runtime more closely, the experiments numbered 8-14 in Table 1 were conducted with a larger number of repetitions on a single experimental setting. Results from these experiments are summarised in histograms. The histograms in Figs. 14 and 15 show the observed runtimes from 2000 runs of the (1 + 1) EA on the Easy FSM instance class ($n = 200$). The histogram shows a slightly positively skewed distribution. The variance of this distribution is lower than the variance of the distribution shown in the histogram in Fig. 16, which summarises the observed runtimes from 2000 runs of Random Search on the same instance class ($n = 17$). The distribution shown by this histogram is highly positively skewed. Theorem 1 shows that the runtime distribution of random search on this instance class is exactly the geometric distribution with parameter $p = (n + 1) \cdot 2^{-n}$. The histogram corresponds well with the plot of the probability mass function of this distribution.
Fig. 9. (1+1) EA on k-gap FSM instance class ($k = 2$); fitted models $2.52\,n^2$, $0.06\,n^3$, $0\,n^4$; y-axis: iterations of (1+1) EA.
Fig. 12. Box-and-whisker plots from observed runtimes of (1 + 1) EA on the Easy FSM instance class.
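The geometric runtime distribution of random search is easy to reproduce. The Python sketch below assumes uniform sampling of binary strings with replacement, so that each iteration succeeds with probability $p = (n + 1) 2^{-n}$ (the $n + 1$ strings with at least $n - 1$ 1-symbols), and draws runtimes by inverse-transform sampling, a standard technique that is not taken from the paper. The seed and helper names are hypothetical.

```python
import math
import random


def rs_success_probability(n):
    """p = (n + 1) / 2^n: a uniform binary string has at least n-1 1-symbols."""
    return (n + 1) / 2 ** n


def sample_geometric(p, rng):
    """Trials up to and including the first success (inverse-transform sampling)."""
    u = 1.0 - rng.random()             # uniform on (0, 1]
    return max(1, math.ceil(math.log(u) / math.log1p(-p)))


n = 17
p = rs_success_probability(n)
rng = random.Random(42)
draws = [sample_geometric(p, rng) for _ in range(2000)]
print(1 / p, math.sqrt(1 - p) / p)     # theoretical mean and standard deviation
print(sum(draws) / len(draws))         # sample mean of 2000 simulated runs
```

For $n = 17$ the theoretical mean is $1/p = 2^{17}/18 \approx 7282$ iterations, and the standard deviation is almost as large, which fits the strong positive skew seen in the histogram.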
The histograms for the Hard FSM instance class are not included here, but they show similarly shaped distributions as the one of Random search on the Easy FSM instance class in Fig. 16.
The histograms for the runtime of the (1 + 1) EA on the k-gap FSM instance class are similar, and we only include the one for $k = 4$, which is shown in Fig. 17. The distribution in this histogram has a shape similar to that of the distribution of Random Search on the Easy FSM instance class. The distribution is highly positively skewed and has a large variance. In the proof of Theorem 5, it is shown that the dominating phase of a typical run of the (1 + 1) EA on this instance class is when the algorithm needs to flip $k$ consecutive bits in a single iteration. Hence, one can conjecture that for large $k$, the runtime distribution of the (1 + 1) EA on the k-gap FSM instance class will be close to a geometric distribution.

Discussion
Three classes of finite state machines have been constructed and studied in this paper. The first FSM class is the modulo-n counting FSMs, which are used widely in cases where it is necessary to detect when a certain number of events has happened. A simple example is a binary counter. The second FSM class is the sequence detector FSMs, which also occur frequently in applications, including in the lexical analysis component of compilers, in communication systems and in electronic key locks. FSMs of these two types often occur as sub-modules within larger systems. The third FSM class, which is parametrised, shares properties with both the easy and hard FSM classes. Such FSMs have many applications in software engineering, including modelling and testing of non-functional requirements. One example is security testing of automated teller machines (ATMs). The lock FSM could model requirements related to authentication by personal identification number (PIN), while the counter FSM could model requirements related to card retention after a specified number of failed authentication attempts.
The notions of easy and hard instances depend both on the EA used and on the way the problem of finding UIOs has been defined. This paper uses the terms hard and easy relative to the (1 + 1) EA, as described in Section 3.2. These terms should not be confused with the terms EA-hard and EA-easy, which are sometimes used in evolutionary computation to mean problems that are thought to be generally hard (respectively, easy) for all EAs. There are certainly functions that are hard in the sense of Section 3.2 for the (1 + 1) EA, but which are easy for other EAs. Furthermore, the hardness of finding UIOs is relative to the way the fitness function for this problem has been defined.
We believe the formulation in Definition 3 is quite natural; however, one could envisage other fitness function definitions which could potentially lead to different runtimes for the (1 + 1) EA.

Conclusion
Search based software engineering is a promising approach to automating software engineering tasks. Although a significant amount of research has been conducted in the area over recent years, there are still very few theoretical results. Theoretical research is needed to rigorously determine the potential and limitations of search heuristics in various software engineering domains. In particular, for many software engineering problems that are NP-hard [3], it is necessary to characterise the problem instances that are tractable for search heuristics. Only when the tractable class of problem instances has been accurately characterised can search heuristics be applied with predictable performance to a software engineering problem.
Here, we have initiated such a theoretical study by analysing the runtime of the (1 + 1) Evolutionary Algorithm on the problem of computing unique input output (UIO) sequences in finite state machines. The primary purpose of this theoretical study has been to give an initial description of which types of FSMs are tractable for the (1 + 1) EA, and which are intractable. As far as we know, this paper, along with a preliminary conference version [1], represents the first rigorously obtained result on the runtime of an evolutionary algorithm in the field of search based software engineering.
It is shown that on the class of modulo-n counter FSMs, the (1 + 1) EA is highly efficient, whereas random search fails completely. This result indicates that the (1 + 1) EA can be preferable to the sometimes proposed strategy of randomly searching for UIOs. Furthermore, it is shown that the (1 + 1) EA fails on some instances of the class of sequence detecting FSMs. On this particular instance class, the state partition tree gives little information about the UIO. The existence of such hard instances for the (1 + 1) EA is to be expected, since the general problem of finding UIOs is NP-hard. This result implies that alternative approaches should be considered when computing UIOs for such FSMs. Furthermore, the sequence detecting FSMs could be a useful benchmark for new search based approaches to the UIO problem. Finally, an instance class with tunable difficulty for the (1 + 1) EA is presented. This instance class highlights how specific, small changes to the structure of the FSM can make the problem of computing UIOs increasingly hard. This result implies that structurally similar FSMs are not necessarily equally hard for the (1 + 1) EA.
The theoretical analysis was complemented with an extensive numerical study to investigate the constants that are hidden by the big-O expressions, and to gain insight into the variability of the runtime. Constants were estimated using nonlinear regression, and variability was investigated using histograms. The numerical and theoretical results agree well. In all cases, the theoretically obtained asymptotic expression for the runtime fitted the observed runtimes best among a selection of similar models. The estimated constants in the asymptotic expressions were small, and the asymptotic behaviour predicted by the theoretical analysis is evident even for small instance sizes. The observed variability in runtime was generally large on all instance classes. The (1 + 1) EA on the Easy FSM instance class showed the least variability; on the other instance classes, both Random Search and the (1 + 1) EA showed much larger variability in runtime. The observed runtimes formed highly positively skewed distributions.
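As a rough illustration of the kind of experiment described above (not the experimental setup used in this study), the following Python sketch runs the (1 + 1) EA on the Easy (modulo-n counter) FSM and summarises the spread of the observed runtimes. It assumes binary input sequences of length n, the fitness $n - \gamma(x)$, and the closed form $\gamma(x) = \max(1, n - |x|_1)$ that follows from Proposition 1 in Appendix A. The function names, the seed, the instance size and the evaluation cap `max_evals` are illustrative choices, and no regression fit is attempted.

```python
import random
import statistics

def easy_fitness(x):
    """Fitness n - gamma(x) on the modulo-n counter FSM, assuming binary inputs
    and gamma(x) = max(1, n - |x|_1) as given by Proposition 1."""
    n = len(x)
    gamma = max(1, n - sum(x))
    return n - gamma

def one_plus_one_ea(n, rng, max_evals=10**6):
    """(1+1) EA with standard bit mutation (rate 1/n); returns the number of
    fitness evaluations until an optimal point (fitness n - 1) is found."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = easy_fitness(x)
    evals = 1
    while fx < n - 1 and evals < max_evals:
        # flip each bit independently with probability 1/n
        y = [1 - b if rng.random() < 1.0 / n else b for b in x]
        fy = easy_fitness(y)
        evals += 1
        if fy >= fx:  # accept offspring that are at least as good
            x, fx = y, fy
    return evals

def random_search_expected_evals(n):
    """Expected samples for uniform random search: n + 1 of the 2^n strings
    have at least n - 1 one-bits, so the success probability is (n+1)/2^n."""
    return 2 ** n / (n + 1)

def sample_skewness(data):
    """Population skewness; positive values indicate a long right tail."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum((d - mean) ** 3 for d in data) / (len(data) * sd ** 3)

rng = random.Random(2024)
runtimes = [one_plus_one_ea(30, rng) for _ in range(200)]
print("mean", statistics.fmean(runtimes), "median", statistics.median(runtimes))
print("skewness", sample_skewness(runtimes))
print("random search, expected samples:", random_search_expected_evals(30))
```

With this fitness, an accepted step can never decrease the number of 1-bits before the optimum is reached, so the EA makes steady progress, while random search has to hit one of only n + 1 optimal strings out of $2^n$.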
The stochastic behaviour of evolutionary algorithms and other search heuristics is often highly complex and therefore hard to predict. Only recently have results about the runtime of EAs started to appear for artificial functions and some combinatorial optimisation problems. It is a highly non-trivial task to estimate the success probability of an EA on an arbitrary problem instance. More theoretical research is still needed before such predictions can be made for any given class of FSMs on the UIO problem. However, we think that this and other theoretical studies will contribute to building the strong foundation that is needed for search based approaches to be reliably applied in the software engineering industry.

Appendix A. Proofs
Proof of Proposition 1. The case where $\sum_{i=1}^{n} |x_i|_1 \geq n - 1$ is easy. State $s_1$ is the only state which outputs $a$ on each of the first $n - 1$ inputs of symbol 1. Hence, for such input sequences, $\gamma(x) = 1$.
Before showing that the proposition also holds for the remaining input sequences, we first show that for any input sequence $x$ with $\gamma(x) > 1$ and any input symbol $p$, we have

$$\gamma(x) = \gamma(x \cdot p) + |p|_1. \tag{A.1}$$

Eq. (A.1) obviously holds when symbol $p$ is different from 1, because all states output the symbol const on input symbols different from 1, so it remains to show that $\gamma(x \cdot 1) = \gamma(x) - 1$ for all $x$ with $\gamma(x) > 1$.
By the definition of the transition function, there must be a state $t$ such that $\delta(t, x) = s_n$. Furthermore, we can show that states $s_1$ and $t$ produce the same output on input sequence $x$. Suppose not, i.e. $\lambda(s_1, x) \neq \lambda(t, x)$. This would imply that on input $x$, state $t$ must have reached the only distinguishing transition, from state $s_n$ to state $s_1$, i.e. sequence $x$ can be expressed in the form $x = y1z$ with $\delta(t, y) = s_n$. Since both $\delta(t, y)$ and $\delta(t, y1z)$ equal state $s_n$, we must have $\sum_{i=1}^{\ell(z)} |z_i|_1 \geq n - 1$. However, this is a contradiction, because by the first case the assumption $\gamma(x) > 1$ implies that $\sum_{i=1}^{n} |x_i|_1 < n - 1$. It is thus clear that $\lambda(s_1, x) = \lambda(t, x)$, and furthermore $\lambda(s_1, x \cdot 1) \neq \lambda(t, x \cdot 1)$. For all other states $s_i$ different from state $t$, $\lambda(\delta(s_i, x), 1) = \lambda(\delta(s_1, x), 1) = a$. So to conclude, if $\gamma(x) > 1$ then $\gamma(x) = \gamma(x \cdot 1) + 1$.
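Eq. (A.1), together with the closed form of Proposition 1, can be checked by brute force on small instances. The sketch below is a hypothetical encoding of the modulo-n counter FSM: the full definition is not part of this appendix, so it assumes that input 1 moves $s_i$ to $s_{i+1}$ (and $s_n$ to $s_1$) with output a, except on the distinguishing transition from $s_n$ to $s_1$, which is assumed to output b, and that input 0 leaves every state unchanged with output const. States are numbered from 0, so state 0 plays the role of $s_1$; all function names are illustrative.

```python
from itertools import product

def make_counter_fsm(n):
    """Hypothetical modulo-n counter FSM with states 0..n-1 (state 0 = s_1).
    Assumed: input 1 moves state i to (i + 1) mod n and outputs 'a', except
    the distinguishing transition s_n -> s_1, which outputs 'b'; input 0
    leaves every state unchanged and outputs 'const'."""
    def step(state, symbol):
        if symbol == 1:
            return (state + 1) % n, ("b" if state == n - 1 else "a")
        return state, "const"
    return step

def output_sequence(step, state, x):
    """Output sequence lambda(state, x) of the FSM on input sequence x."""
    out = []
    for symbol in x:
        state, o = step(state, symbol)
        out.append(o)
    return tuple(out)

def gamma(step, n, x):
    """Number of states (s_1 included) whose output on x equals that of s_1."""
    ref = output_sequence(step, 0, x)
    return sum(output_sequence(step, t, x) == ref for t in range(n))

def check_proposition_1(n):
    """Exhaustively check Eq. (A.1) and the closed form of Proposition 1
    for all binary input sequences of length at most n."""
    step = make_counter_fsm(n)
    for length in range(n + 1):
        for x in product((0, 1), repeat=length):
            g = gamma(step, n, x)
            # Proposition 1: gamma(x) = n - |x|_1, and 1 once |x|_1 >= n - 1
            assert g == max(1, n - sum(x)), (x, g)
            if g > 1:
                # Eq. (A.1): gamma(x) = gamma(x . p) + |p|_1
                for p in (0, 1):
                    assert g == gamma(step, n, x + (p,)) + int(p == 1), (x, p)
    return True

print(check_proposition_1(5))  # True
```

Such an exhaustive check only covers small n and one concrete encoding, so it complements rather than replaces the proof.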
We can now show that the proposition also holds for input sequences where $\sum_{i=1}^{n} |x_i|_1 < n - 1$. On such input sequences, state $s_2$ cannot reach the distinguishing state transition from $s_n$ to $s_1$. So states $s_1$ and $s_2$ are indistinguishable, and $\gamma(x) > 1$.
Obviously, the same also holds for all prefixes of input sequence $x$. Eq. (A.1) can now be applied recursively, and by noting the special case $\gamma(\epsilon) = n$ for the empty string $\epsilon$, we obtain the desired result:

$$\gamma(x) = \gamma(\epsilon) - \sum_{i=1}^{n} |x_i|_1 = n - \sum_{i=1}^{n} |x_i|_1. \qquad \square$$

Proof of Proposition 2. The two special cases $1^n$ and $1^{n-1} \cdot \mathrm{EOL}$ are simple. By the definition of the output function, $\lambda(s_i, 1^n) = \mathrm{ack}^n$ for any state $s_i$. Hence, $\gamma(1^n) = n$, so the value of the fitness function on the first special case is $f_H(1^n) = 0$.
On input sequence $1^{n-1} \cdot \mathrm{EOL}$, the output function gives $\lambda(s_1, 1^{n-1} \cdot \mathrm{EOL}) = \mathrm{ack}^{n-1} \cdot \mathrm{found}$, and for states $s_i$ different from $s_1$, it gives $\lambda(s_i, 1^{n-1} \cdot \mathrm{EOL}) = \mathrm{ack}^n$. Hence, the value of the fitness function on the second special case is $f_H(1^{n-1} \cdot \mathrm{EOL}) = n - 1$.
The remaining input sequences to consider are those that contain at least one EOL symbol but are different from the sequence $1^{n-1} \cdot \mathrm{EOL}$. Such strings are of the form $1^k \cdot \mathrm{EOL} \cdot z$, where $k$ is an integer with $0 \leq k < n - 1$, and $z$ can be any sequence of length $\ell(z) = n - k - 1$.
We now show that for the sequences of the form $1^k \cdot \mathrm{EOL} \cdot z$, there is exactly one state $s_i$ for which $\lambda(s_1, 1^k \cdot \mathrm{EOL} \cdot z) \neq \lambda(s_i, 1^k \cdot \mathrm{EOL} \cdot z)$. We have just proved that this inequality requires that $\lambda(s_1, 1^k \cdot \mathrm{EOL}) \neq \lambda(s_i, 1^k \cdot \mathrm{EOL})$. Because all states have the same output on input 1, it is necessary that $\lambda(\delta(s_1, 1^k), \mathrm{EOL}) \neq \lambda(\delta(s_i, 1^k), \mathrm{EOL})$, which implies that $\lambda(s_{1+k}, \mathrm{EOL}) \neq \lambda(s_{i+k}, \mathrm{EOL})$. The only way to satisfy this inequality is to let $i + k = n$. Hence, state $s_{n-k}$ is the only state that produces a different output than state $s_1$ on input sequences that contain at least one EOL symbol and are different from $1^{n-1} \cdot \mathrm{EOL}$. $\square$

Proof of Proposition 3. We first prove Eq. (2). On input sequence $10^k$, only state $s_1$ reaches state $q_{k+2}$, i.e. $\delta(s_1, 10^k) = q_{k+2}$, and for all other states $t$, $\delta(t, 10^k) = s_1$. Hence, state $s_1$ goes through the distinguishing transition on input $10^k1^{n-2k-2}$ while all other states are in transition between states $s_1$ and $r_1$, showing that state $s_1$ has a unique output. Therefore, search points of the form $10^k1^{n-2k-2}z$ are optimal. (There are other optimal search points, but knowing the structure of a few optimal search points will be sufficient in the analysis.) We now show that $\gamma(1^i0) = n - \min(i, m)$. Note that $(q_m, q_1, 1/b)$ is the only distinguishing state transition. For $i$ no larger than $m$, the $i$ states $q_{m-i+1}, \ldots, q_m$ reach this transition on input sequence $1^i$ and therefore produce different outputs than state $s_1$. For $i$ at least $m$, all $m$ states $q_1, \ldots, q_m$ reach the distinguishing transition. State $s_1$ and the $k$ states $r_1, \ldots, r_k$ do not reach the distinguishing transition on input sequence $1^i0$. Therefore, the number of states that produce different outputs than state $s_1$ on input sequence $1^i0$ is $\min(i, m)$.
Finally, we prove Eqs. (3) and (4) under the assumption that $z$ does not contain the substring $1^{n-2k-2}$. For all states $s$, either $\delta(s, 1^i0) = s_1$ or $\delta(s, 1^i0) = r_2$. All state transition paths from either state $s_1$ or state $r_2$ to the distinguishing state transition from state $q_m$ must go through the $n - 2k - 2$ state transitions between $q_{k+2}$ and $q_m$. Transitions along this path require an input sequence with $n - 2k - 2$ consecutive 1-bits, which is not possible with sequence $z$. Therefore, we have $\gamma(1^i0z) = \gamma(1^i0) = n - \min(i, m)$. This also proves Eq. (4), because $\gamma(1^i) = \gamma(1^i0) = n - \min(i, m)$. $\square$

Proof of Proposition 4. Assume first that $i = 0$, i.e. the search point begins with a 0-bit. In this case, all states $q_1, \ldots, q_m$ collapse with state $s_1$, and the suffix $z$ can at most distinguish $s_1$ from the $k$ states $r_1, \ldots, r_k$. Hence, in this case $\gamma(0z) \geq n - k$.
Assume now that $1 \leq i \leq k + 2$. After input $1^i0$, all states have moved to either state $s_1$ or state $r_2$. If $i$ is even, then state $s_1$ has collapsed with states $q_1, \ldots, q_m$. Hence, the suffix $z$ can at most distinguish the $k$ states $r_1, \ldots, r_k$ from state $s_1$, i.e. $\gamma(1^i0z) \geq n - i - k \geq n - (k + 2) - k$. If $i$ is odd, then states $r_1, \ldots, r_k$ have collapsed with states $q_1, \ldots, q_m$. So if $x$ is not optimal, then $\gamma(1^i0z) = n - i \geq n - k - 2$.
Finally, assume that $k + 2 < i \leq 2k + 2$. After input $1^i0$, no more states can reach the distinguishing transition, because moving from state $s_1$ or state $r_2$ to the distinguishing transition requires at least the subsequence $0^{k-1}1^{n-2k-2}$, which is longer than subsequence $z$. So in this case, we have $\gamma(1^i0z) = n - i \geq n - (2k + 2)$. $\square$

Proof of Proposition 5. The current search point $x$ of the (1 + 1) EA in Phase 1 is of the form $x = 1^i0z$ for some $i$ with $0 \leq i < 2k + 2$, and $z$ a string of length $\ell(z) = n - i - 1$. We call this substring $z$ occurring after the first 0-bit the suffix of the current search point.
We first show that as long as the run has been typical for the first $t$ steps, the suffix $z$ in step $t + 1$ is a random string. The initial search point is a random string, so its suffix is also a random string. Assume that the run has been typical until step $t$ and that the suffix $z$ is a random string. By Eq. (4) in Proposition 3, any bitflip of the suffix will be accepted. Randomly mutating a random string clearly produces a new random string, so the suffix in step $t + 1$ will also be a random string. The suffix $z$ of the new search point in step $t + 1$ can contain $1^{n-2k-2}$, i.e. we may have a failure in step $t + 1$. However, we show that this is unlikely. In a random string shorter than $n$, there are at most $2k + 2$ positions at which the substring $1^{n-2k-2}$ can start, and each such occurrence has probability $2^{-(n-2k-2)}$. Hence, the probability that $1^{n-2k-2}$ occurs is no more than $(2k + 2) \cdot 2^{-n+2k+2}$, which for large $n$ is less than $e^{-n/16}$. One way of increasing the number of leading 1-bits without having a failure is to flip the first 0-bit and no other bits. So the probability of increasing the number of leading 1-bits without having a failure in the following step is at least $(1/n) \cdot (1 - 1/n)^{n-1} \geq 1/(en)$.
Hence, for large $n$, the probability that the number of leading 1-bits increases before we have a failure is at least

$$\frac{1/(en)}{1/(en) + e^{-n/16}} \geq 1 - en \cdot e^{-n/16} \geq 1 - e^{-n/32}.$$

If a failure occurs, it must occur before the number of leading 1-bits has been increased more than $2k + 2$ times. So the failure probability $\Pr[F]$ is no more than

$$\sum_{i=0}^{2k+2} \left(1 - e^{-n/32}\right)^{i} \cdot e^{-n/32} \leq (2k + 3) \cdot e^{-n/32} = e^{-\Omega(n)}. \qquad \square$$
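The probability estimates in the proof of Proposition 5 can be sanity-checked numerically. The Python sketch below computes, with exact rational arithmetic, the probability that a uniformly random bit string contains a run of $r$ consecutive 1-bits (a standard dynamic program over the length of the current trailing run of 1-bits), and compares it with the union bound used in the proof, i.e. the number of possible start positions times $2^{-r}$. It also checks the mutation bound $(1/n)(1 - 1/n)^{n-1} \geq 1/(en)$ and the displayed chain of inequalities for one value of $n$. The function names and the parameter values ($n = 40$, $k = 3$, and $n = 256$ for the chain) are illustrative assumptions, not values taken from this study.

```python
import math
from fractions import Fraction

def prob_contains_run(length, r):
    """Exact probability that a uniformly random bit string of the given length
    contains r consecutive 1-bits. DP state: length of the current trailing
    run of 1-bits, as long as no run of length r has been seen yet."""
    if r <= 0:
        return Fraction(1)
    half = Fraction(1, 2)
    dist = [Fraction(0)] * r  # dist[j]: no run of r yet, trailing run == j
    dist[0] = Fraction(1)
    hit = Fraction(0)         # probability mass that has already seen the run
    for _ in range(length):
        new = [Fraction(0)] * r
        for j, p in enumerate(dist):
            new[0] += p * half              # next bit 0 resets the run
            if j + 1 == r:
                hit += p * half             # next bit 1 completes the run
            else:
                new[j + 1] += p * half      # next bit 1 extends the run
        dist = new
    return hit

def union_bound(length, r):
    """(number of possible start positions) * 2^-r."""
    return Fraction(max(0, length - r + 1), 2 ** r)

def mutation_bound_holds(n):
    """(1/n)(1 - 1/n)^(n-1) >= 1/(e n): flip one given bit and no other."""
    return (1 / n) * (1 - 1 / n) ** (n - 1) >= 1 / (math.e * n)

def increase_before_failure_bound_holds(n):
    """(1/(en)) / (1/(en) + e^(-n/16)) >= 1 - e^(-n/32)."""
    p_inc = 1 / (math.e * n)
    p_fail = math.exp(-n / 16)
    return p_inc / (p_inc + p_fail) >= 1 - math.exp(-n / 32)

n, k = 40, 3
r = n - 2 * k - 2          # length of the forbidden run 1^(n-2k-2)
longest_suffix = n - 1     # the suffix z is shorter than n
exact = prob_contains_run(longest_suffix, r)
bound = union_bound(longest_suffix, r)  # equals (2k + 2) * 2^-(n-2k-2) here
print(float(exact), float(bound), exact <= bound)
print(all(mutation_bound_holds(m) for m in range(1, 500)))
print(increase_before_failure_bound_holds(256))
```

The last inequality in the chain only holds once $n$ is large enough (roughly when $1 + \ln n \leq n/32$), which is why the proof states it "for large $n$".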