
1 Introduction

Traditional numerical algorithms for the verification of Markov chains may be computationally intensive or inapplicable when facing a large state space or limited knowledge about the chain. In such cases, statistical algorithms are used as a powerful alternative. Statistical model checking (SMC) typically refers to approaches where (i) finite paths of the Markov chain are sampled a finite number of times, (ii) the property of interest is verified for each sampled path (e.g. state r is reached), and (iii) hypothesis testing or statistical estimation is used to infer conclusions (e.g. state r is reached with probability at most 0.5) and give statistical guarantees (e.g. the conclusion is valid with \(99\,\%\) confidence). SMC approaches differ in (a) the class of properties they can verify (e.g. bounded or unbounded properties), (b) the strength of statistical guarantees they provide (e.g. confidence bounds, only asymptotic convergence of the method towards the correct value, or none), and (c) the amount of information they require about the Markov chain (e.g. the topology of the graph). In this paper, we provide an algorithm for SMC of unbounded properties, with confidence bounds, in the setting where only the minimum transition probability of the chain is known. Such an algorithm is particularly desirable in scenarios where the system is not known (“black box”), but also when it is too large to construct or fit into memory.

Most of the previous efforts in SMC have focused on the analysis of properties with bounded horizon [5, 12, 13, 21, 27, 28]. For bounded properties (e.g. state r is reached with probability at most 0.5 in the first 1000 steps), statistical guarantees can be obtained in a completely black-box setting, where execution runs of the Markov chain can be observed, but no other information about the chain is available. Unbounded properties (e.g. state r is reached with probability at most 0.5 in any number of steps) are significantly more difficult, as a stopping criterion is needed when generating a potentially infinite execution run, and some information about the Markov chain is necessary for providing statistical guarantees (for an overview, see Table 1). On the one hand, some approaches require the knowledge of the full topology in order to preprocess the Markov chain. On the other hand, when the topology is not accessible, there are approaches where the correctness of the statistics relies on information ranging from the second eigenvalue \(\lambda \) of the Markov chain to knowledge of both the number |S| of states and the minimum transition probability \(p_{\mathsf {min}}\).

Table 1. SMC approaches to Markov chain verification, organised by (i) the class of verifiable properties and (ii) the required information about the Markov chain, where \(p_{\mathsf {min}}\) is the minimum transition probability, |S| is the number of states, and \(\lambda \) is the second largest eigenvalue of the chain.

Our contribution is a new SMC algorithm for full linear temporal logic (LTL), as well as for unbounded quantitative properties (mean payoff), which provides strong guarantees in the form of confidence bounds. Our algorithm uses less information about the Markov chain than previous algorithms that provide confidence bounds for unbounded properties: we need to know only the minimum transition probability \(p_{\mathsf {min}}\) of the chain, and neither the number of states nor the topology. Yet, experimentally, our algorithm performs better in many cases than these previous approaches (see Sect. 5). Our main idea is to monitor each execution run on the fly in order to build statistical hypotheses about the structure of the Markov chain. In particular, if from observing the current prefix of an execution run we can stipulate that with high probability a bottom strongly connected component (BSCC) of the chain has been entered, then we can terminate the current execution run. The information obtained from execution prefixes allows us to terminate executions as soon as the property is decided with the required confidence, which is usually much earlier than any bounds that can be computed a priori. As far as we know, this is the first SMC algorithm that uses information obtained from execution prefixes.

Finding \(p_{\mathsf {min}}\) is a mild assumption in many realistic scenarios, and it often does not depend on the size of the chain. For example, bounds on the rates for reaction kinetics in chemical reaction systems are typically known, and from a Prism language model \(p_{\mathsf {min}}\) can easily be inferred without constructing the respective state space.

Example 1

Consider the property of reaching state r in the Markov chain depicted in Fig. 1. While the execution runs reaching r satisfy the property and can be stopped without ever entering any \(v_i\), the finite execution paths without r, such as stuttutuut, are inconclusive. In other words, observing this path does not rule out the existence of a transition from, e.g., u to r, which, if it exists, would eventually be taken with probability 1. This transition could have arbitrarily low probability, rendering its detection arbitrarily unlikely, yet its presence would change the probability of satisfying the property from 0.5 to 1. However, knowing that if there exists such a transition leaving the set, its transition probability is at least \(p_{\mathsf {min}}=0.01\), we can estimate the probability that the system is stuck in the set \(\{t,u\}\) of states. Indeed, if such a transition exists, it was missed at least four times, no matter whether it exits t or u. Consequently, the probability that there is no such transition and \(\{t,u\}\) is a BSCC is at least \(1-(1-p_{\mathsf {min}})^4\).

Fig. 1. A Markov chain.

This means that, in order to get \(99\,\%\) confidence that \(\{t,u\}\) is a BSCC, we only need to see both t and u around 500 times on a run (indeed, \((1-p_{\mathsf {min}})^{459}=0.99^{459}<0.01\)). This is in stark contrast to a priori bounds that provide the same level of confidence, such as the \((1/p_{\mathsf {min}})^{|S|}=100^{\mathcal {O}(m)}\) runs required by [4], which is infeasible for large m, whereas the performance of our method is independent of m. \(\triangle \)
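In fact, the required strength k for a target confidence follows directly from the bound \((1-p_{\mathsf {min}})^k\le \delta \); a minimal Python sketch of this arithmetic (the function name is ours):

```python
import math

def required_strength(p_min: float, delta: float) -> int:
    """Smallest k with (1 - p_min)^k <= delta, i.e. how often every state
    of a candidate must be observed before reporting it as a BSCC with
    confidence at least 1 - delta."""
    return math.ceil(math.log(delta) / math.log(1.0 - p_min))

print(required_strength(0.01, 0.01))  # -> 459, i.e. "around 500"
```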

Monitoring execution prefixes allows us to design an SMC algorithm for complex unbounded properties such as full LTL. More precisely, we present a new SMC algorithm for LTL over Markov chains, specified as follows:

  • Input: we can sample finite runs of arbitrary length from an unknown finite-state discrete-time Markov chain \(\mathcal {M}\) starting in the initial state, and we are given a lower bound \(p_{\mathsf {min}}>0\) on the transition probabilities in \(\mathcal {M}\), an LTL formula \(\varphi \), a threshold probability p, an indifference region \(\varepsilon >0\), and two error bounds \(\alpha ,\beta >0\),

  • Output: if \(\mathbb P[\mathcal {M}\models \varphi ]\ge p+\varepsilon \), return YES with probability at least \(1-\alpha \), and if \(\mathbb P[\mathcal {M}\models \varphi ]\le p-\varepsilon \), return NO with probability at least \(1-\beta \).

In addition, we present the first SMC algorithm for computing the mean payoff of Markov chains whose states are labelled with rewards.

Related Work. To the best of our knowledge, we present the first SMC algorithm that provides confidence bounds for unbounded qualitative properties with access to only the minimum probability of the chain \(p_{\mathsf {min}}\), and the first SMC algorithm for quantitative properties. For completeness, we briefly survey other related SMC approaches. SMC of unbounded properties, usually “unbounded until” properties, was first considered in [10], and the first approach was proposed in [22], but was shown to be incorrect in [9]. Notably, in [26] two approaches are described. The first approach proposes to terminate sampled paths at every step with some probability \(p_{\mathsf {term}}\). In order to guarantee the asymptotic convergence of this method, the second eigenvalue \(\lambda \) of the chain must be computed, which is as hard as the verification problem itself. Moreover, their method provides only asymptotic guarantees, since the width of the confidence interval converges to zero only in the limit. The correctness of [16] relies on the knowledge of the second eigenvalue \(\lambda \), too. The second approach of [26] requires the knowledge of the chain’s topology, which is used to transform the chain so that all potentially infinite paths are eliminated. In [9], a similar transformation is performed, again requiring knowledge of the topology. The (pre)processing of the state space required by the topology-aware methods, as well as by traditional numerical methods for Markov chain analysis, is a major practical hurdle for large (or unknown) state spaces. In [4], a priori bounds for the length of execution runs are calculated from the minimum transition probability and the number of states. However, without taking execution information into account, these bounds are exponential in the number of states and highly impractical, as illustrated in the example above. Another approach, limited to ergodic Markov chains, is taken in [20], based on coupling methods. There are also extensions of SMC to timed systems [7]. Our approach is also related to [8, 18], where the product of a non-deterministic system and a Büchi automaton is explored for accepting lassos. We are not aware of any method for detecting BSCCs by observing a single run that employs no directed search of the state space.

Experimental Evaluation. Our idea of inferring the structure of the Markov chain on the fly, while generating execution runs, allows for their early termination. In Sect. 5 we will see that for many chains arising in practice, such as the concurrent probabilistic protocols from the Prism benchmark suite [15], the BSCCs are reached quickly and, even more importantly, can be small even for very large systems. Consequently, many execution runs can be stopped quickly. Moreover, since the number of execution runs necessary for a required precision and confidence is independent of the size of the state space, it need not be very large even for highly confident results (a good analogy is that of opinion polls: the precision and confidence of an opinion poll are determined by the sample size and are independent of the size of the population). It is therefore not surprising that, experimentally, in most cases from the benchmark suite, our method outperforms previous methods (often even the numerical methods) despite requiring much less knowledge of the Markov chain, and despite providing strong guarantees in the form of confidence bounds. In Sect. 6, we also provide theoretical bounds on the running time of our algorithm for classes of Markov chains on which it performs particularly well.

Due to space constraints, the proofs and further details can be found in [2].

2 Preliminaries

Definition 1

(Markov chain). A Markov chain (MC) is a tuple \(\mathcal {M}= (S, \mathbf {P}, \mu )\), where

  • \(S\) is a finite set of states,

  • \(\mathbf {P} \;:\; S\times S\rightarrow [0,1]\) is the transition probability matrix, such that for every \(s\in S\) it holds \(\sum _{s'\in S} \mathbf {P}(s,s') = 1\),

  • \(\mu \) is a probability distribution over \(S\).

We let \(p_{\mathsf {min}}:=\min \{\mathbf {P}(s,s') \mid s,s'\in S,\ \mathbf {P}(s,s')>0\}\) denote the smallest positive transition probability in \(\mathcal {M}\). A run of \(\mathcal {M}\) is an infinite sequence \(\rho = s_0 s_1 \ldots \) of states, such that for all \(i\ge 0\), \(\mathbf {P}(s_i, s_{i+1}) > 0\); we let \(\rho [i]\) denote the state \(s_i\). A path in \(\mathcal {M}\) is a finite prefix of a run of \(\mathcal {M}\). We denote the empty path by \(\lambda \) and concatenation of paths \(\pi _1\) and \(\pi _2\) by \(\pi _1\,.\,\pi _2\). Each path \(\pi \) in \(\mathcal {M}\) determines the set of runs \(\mathsf {Cone}(\pi )\) consisting of all runs that start with \(\pi \). To \(\mathcal {M}\) we assign the probability space \( (\mathsf {Runs},\mathcal {F},\mathbb P)\), where \(\mathsf {Runs}\) is the set of all runs in \(\mathcal {M}\), \(\mathcal F\) is the \(\sigma \)-algebra generated by all \(\mathsf {Cone}(\pi )\), and \(\mathbb P\) is the unique probability measure such that \(\mathbb P[\mathsf {Cone}(s_0s_1\ldots s_k)] = \mu (s_0)\cdot \prod _{i=1}^{k} \mathbf {P}(s_{i-1},s_i)\), where the empty product equals 1. The respective expected value of a random variable \(f:\mathsf {Runs}\rightarrow \mathbb R\) is \(\mathbb {E}[f]=\int _\mathsf {Runs}f\ d\,\mathbb P\).

A non-empty set \(C\subseteq S\) of states is strongly connected (SC) if for every \(s,s'\in C\) there is a path from s to \(s'\). A set of states \(C\subseteq S\) is a bottom strongly connected component (BSCC) of \(\mathcal {M}\), if it is a maximal SC, and for each \(s\in C\) and \(s'\in S\setminus C\) we have \(\mathbf {P}(s,s')=0\). The sets of all SCs and BSCCs in \(\mathcal {M}\) are denoted by \(\mathsf {SC}\) and \(\mathsf {BSCC}\), respectively. Note that with probability 1, the set of states that appear infinitely many times on a run forms a BSCC. From now on, we use the standard notions of SC and BSCC for directed graphs as well.
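All algorithms below access the chain only by sampling successor states. A minimal sketch of this interface in Python, assuming the chain is stored as a dictionary of successor distributions (the class and method names are ours; in the true black-box setting \(p_{\mathsf {min}}\) is given, not computed):

```python
import random

class MarkovChain:
    """P[s][t] = transition probability from s to t; mu = initial distribution."""

    def __init__(self, P, mu):
        self.P, self.mu = P, mu

    def p_min(self):
        # smallest positive transition probability (white-box setting only)
        return min(p for row in self.P.values() for p in row.values() if p > 0)

    def initial_state(self):
        return random.choices(list(self.mu), weights=list(self.mu.values()))[0]

    def next_state(self, s):
        # the function NextState of Sect. 3: sample a successor of s
        return random.choices(list(self.P[s]), weights=list(self.P[s].values()))[0]

# A hypothetical two-state chain: from a, stay with 0.9 or move to the
# absorbing state b with 0.1.
mc = MarkovChain({"a": {"a": 0.9, "b": 0.1}, "b": {"b": 1.0}}, {"a": 1.0})
```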

3 Solution for Reachability

A fundamental problem in Markov chain verification is computing the probability that a certain set of goal states is reached. For the rest of the paper, let \(\mathcal {M}= (S,\mathbf {P},\mu )\) be a Markov chain and \(G \subseteq S\) be the set of the goal states in \(\mathcal {M}\). We let \( \Diamond G:= \{ \rho \in \mathsf {Runs}\mid \exists i\ge 0: \rho [i]\in G\} \) denote the event that “eventually a state in G is reached.” The event \(\Diamond G\) is measurable and its probability \(\mathbb P[\Diamond G]\) can be computed numerically or estimated using statistical algorithms. Since no bound on the number of steps for reaching G is given, the major difficulty for any statistical approach is to decide how long each sampled path should be. We can stop extending the path either when we reach G, or when no new states can be reached anyway. The latter happens if and only if we are in a BSCC and we have seen all of its states.

In this section, we first show how to monitor each simulation run on the fly, in order to detect quickly if a BSCC has been entered with high probability. Then, we show how to use hypothesis testing in order to estimate \(\mathbb P[\Diamond G]\).

3.1 BSCC Detection

We start with an example illustrating how to bound, from a single path observation, the probability that a BSCC has been reached.

Example 2

Recall Example 1 and Fig. 1. Now, consider an execution path stuttutu. Intuitively, does \(\{t,u\}\) look like a good “candidate” for being a BSCC of \(\mathcal {M}\)? We visited both t and u three times; we have taken a transition from each of t and u at least twice without leaving \(\{t,u\}\). By the same reasoning as in Example 1, we could have missed an outgoing transition with probability at most \((1-p_{\mathsf {min}})^2\). The structure of the system that can be deduced from this path is shown in Fig. 2 and is correct with probability at least \(1-(1-p_{\mathsf {min}})^2\). \(\triangle \)

Now we formalise our intuition. Given a finite or infinite sequence \(\rho =s_0s_1\ldots \), the support of \(\rho \) is the set \(\overline{\rho }=\{s_0,s_1,\ldots \}\). Further, the graph of \(\rho \) is given by vertices \(\overline{\rho }\) and edges \(\{(s_i,s_{i+1})\mid i=0,1,\ldots \}\).

Fig. 2. The graph of the path stuttutu.

Definition 2

(Candidate). If a path \(\pi \) has a suffix \(\kappa \) such that \(\overline{\kappa }\) is a BSCC of the graph of \(\pi \), we call \(\overline{\kappa }\) the candidate of \(\pi \). Moreover, for \(k\in \mathbb N\), we call it a k-candidate (of \(\pi \)) if each \(s\in \overline{\kappa }\) has at least k occurrences in \(\kappa \) and the last element of \(\kappa \) has at least \(k+1\) occurrences. A k-candidate of a run \(\rho \) is a k-candidate of some prefix of \(\rho \).

Note that for each path there is at most one candidate. Therefore, we write \(K(\pi )\) to denote the candidate of \(\pi \) if there is one, and \(K(\pi )=\bot \) otherwise. Observe that each \(K(\pi )\ne \bot \) is an SC in \(\mathcal {M}\).

Example 3

Consider the path \(\pi =stuttutu\); then \(K(\pi )=\{t,u\}\). Observe that \(\{t\}\) is not a candidate, as it is not maximal. Further, \(K(\pi )\) is a 2-candidate (and as such also a 1-candidate), but not a 3-candidate. Intuitively, the reason is that we only took a transition from u (to the candidate) twice, cf. Example 2. \(\triangle \)
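Definition 2 can be checked directly on the graph of a path: the candidate, if any, is the set of states reachable (in at least one step) from the last state, provided every state of that set can reach the last state back. A quadratic-time Python sketch of this check, together with k-candidacy (the helper names are ours; an actual implementation would be incremental):

```python
from collections import Counter

def _reach1(src, edges):
    """States reachable from src in one or more steps."""
    seen, stack = set(), [src]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def candidate(path):
    """K(path): the BSCC of the graph of `path` formed by a suffix,
    or None, encoding K(path) = ⊥ (Definition 2)."""
    if not path:
        return None
    edges, last = set(zip(path, path[1:])), path[-1]
    fwd = _reach1(last, edges)  # forward-closed, so no edge leaves it
    if last in fwd and all(last in _reach1(s, edges) for s in fwd):
        return frozenset(fwd)   # strongly connected and bottom
    return None

def is_k_candidate(path, K, k):
    """Each state of K occurs >= k times on the suffix with support K,
    and the last state occurs >= k + 1 times (Definition 2)."""
    cut = max((i for i, s in enumerate(path) if s not in K), default=-1)
    counts = Counter(path[cut + 1:])
    return all(counts[s] >= k for s in K) and counts[path[-1]] >= k + 1

pi = list("stuttutu")                              # Example 3
assert candidate(pi) == frozenset("tu")
assert is_k_candidate(pi, frozenset("tu"), 2)      # a 2-candidate ...
assert not is_k_candidate(pi, frozenset("tu"), 3)  # ... but not a 3-candidate
```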

Intuitively, the higher the k, the more it looks as if the k-candidate is indeed a BSCC. Denoting by \( Cand _{k}(K)\) the random predicate of K being a k-candidate on a run, the probability of “unluckily” detecting any specific non-BSCC set of states K as a k-candidate can be bounded as follows.

Lemma 1

For every \(K\subseteq S\) such that \(K\notin \mathsf {BSCC}\), and every \(s\in K\), \(k\in \mathbb N\),

$$ \mathbb P[ Cand _{k} (K) \mid \Diamond s]\le (1-p_{\mathsf {min}})^k. $$

Example 4

We illustrate how candidates “evolve over time” along a run. Consider a run \(\rho =s_0s_0s_1s_0\ldots \) of the Markov chain in Fig. 3. The empty and one-letter prefixes have no candidate, \(s_0s_0\) has the candidate \(\{s_0\}\), then again \(K(s_0s_0s_1)=\bot \), and \(K(s_0s_0s_1s_0)=\{s_0,s_1\}\). One can observe that subsequent candidates are either disjoint or contain some of the previous candidates. Consequently, there are at most \(2|S|-1\) candidates on every run, a bound that is unknown in our setting. \(\triangle \)

Fig. 3. A family (for \(n\in \mathbb N\)) of Markov chains with large eigenvalues.

While we have bounded the probability of detecting any specific non-BSCC set K as a k-candidate, we need to bound the overall error for detecting a candidate that is not a BSCC. Since there can be many false candidates on a run before the real BSCC (e.g. Fig. 3), we need to bound the error of reporting any of them.

In the following, we first formalise the process of discovering candidates along the run. Second, we bound the error that any of the non-BSCC candidates becomes a k-candidate. Third, we bound the overall error of not detecting the real BSCC by increasing k every time a different candidate is found.

We start with discovering the sequence of candidates on a run. For a run \(\rho =s_0s_1\ldots \), consider the sequence of random variables defined by \(K(s_0\ldots s_j)\) for \(j\ge 0\), and let \((K_i)_{i\ge 1}\) be the subsequence without undefined elements and with no repetition of consecutive elements. For example, for a run \(\varrho =s_0s_1s_1s_1s_0s_1s_2s_2\ldots \), we have \(K_1=\{s_1\}\), \(K_2=\{s_0,s_1\}\), \(K_3=\{s_2\}\), etc. Let \(K_j\) be the last element of this sequence, called the final candidate. Additionally, we define \(K_\ell :=K_j\) for all \(\ell >j\). We describe the lifetime of a candidate. Given a non-final \(K_i\), we write \(\rho =\alpha _i\beta _ib_i\gamma _id_i\delta _i\) so that \(\overline{\alpha _i}\cap K_i=\emptyset \), \(\overline{\beta _ib_i\gamma _i}=K_i\), \(d_i\notin K_i\), and \(K(\alpha _i\beta _i)\ne K_i\), \(K(\alpha _i\beta _ib_i)= K_i\). Intuitively, we start exploring \(K_i\) in \(\beta _i\); \(K_i\) becomes a candidate in \(b_i\), the birthday of the ith candidate; it remains the candidate until \(d_i\), the death of the ith candidate. For example, for the run \(\varrho =s_0s_1s_1s_1s_0s_1s_2s_2\ldots \) and \(i=1\), \(\alpha _1=s_0\), \(\beta _1=s_1\), \(b_1=s_1\), \(\gamma _1=s_1\), \(d_1=s_0\), \(\delta _1=s_1s_2s_2\varrho [8]\varrho [9]\ldots \). Note that the final candidate is almost surely a BSCC of \(\mathcal {M}\) and would thus have \(\gamma _j\) infinite.
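The sequence \((K_i)\) can be read off the prefixes of a path; a quadratic-time sketch reusing `candidate` from above:

```python
def candidate_sequence(path):
    """Yield K_1, K_2, ...: candidates of successive prefixes, dropping
    undefined values and consecutive repetitions (Sect. 3.1)."""
    prev = None
    for j in range(1, len(path) + 1):
        K = candidate(path[:j])
        if K is not None and K != prev:
            prev = K
            yield K

# Example 4: the prefix s0 s0 s1 s0 yields candidates {s0}, then {s0, s1}.
print(list(candidate_sequence(["s0", "s0", "s1", "s0"])))
```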

Now, we proceed to bounding errors for each candidate. Since there is an unknown number of candidates on a run, we will need a slightly stronger definition. First, observe that \( Cand _{k}(K_i)\) holds iff \(K_i\) is a k-candidate of \(\beta _ib_i\gamma _i\). We say \(K_i\) is a strong k-candidate, written \( SCand _{k}(K_i)\), if it is a k-candidate of \(b_i\gamma _i\). Intuitively, it becomes a k-candidate even without counting the discovery phase. As a result, even if we already assume there exists an ith candidate, its strong k-candidacy gives the same guarantee of being a BSCC as in Lemma 1.

Lemma 2

For every \(i,k\in \mathbb N\), we have

$$\mathbb P[ SCand _{k}(K_i) \mid K_i\notin \mathsf {BSCC}]\le (1-p_{\mathsf {min}})^k.$$

Since the number of candidates can only be bounded with some knowledge of the state space, e.g. its size, we assume no bounds and provide a method to bound the error even for an unbounded number of candidates on a run.

Lemma 3

For \((k_i)_{i=1}^\infty \in \mathbb N^\mathbb N\), let \(\mathcal {E}rr\) be the set of runs such that for some \(i\in \mathbb N\), we have \( SCand _{k_i}(K_i)\) despite \(K_i\notin \mathsf {BSCC}\). Then

$$\displaystyle \mathbb P[\mathcal {E}rr]<\sum _{i=1}^\infty (1-p_{\mathsf {min}})^{k_i}.$$

In Algorithm 1 we present a procedure for deciding whether the candidate inferred from a path \(\pi \) is indeed a BSCC with confidence greater than \(1-\delta \). We use the notation \(\textsc {SCand}_{k_i}(K,\pi )\) to denote the function deciding whether K is a strong \(k_i\)-candidate on \(\pi \). The overall error bound is obtained by setting \(k_i=\frac{i-\log \delta }{-\log (1-p_{\mathsf {min}})}\), for which \(\sum _{i=1}^\infty (1-p_{\mathsf {min}})^{k_i}=\sum _{i=1}^\infty \delta e^{-i}<\delta \), as required by Lemma 3.

Algorithm 1. Deciding whether the candidate of a path \(\pi \) is a BSCC with confidence \(1-\delta \).
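A Python sketch of the procedure, continuing the helpers above (the quadratic search for the birthday of a candidate is for exposition only; since the graph of a path only grows, a dead candidate can never become a candidate again, so the first matching prefix is its birthday):

```python
import math
from collections import Counter

def is_strong_k_candidate(path, K, k):
    """SCand_k(K, path): K is a k-candidate of the suffix starting at its
    birthday, i.e. the discovery phase of K is not counted (Sect. 3.1)."""
    birth = next(j for j in range(1, len(path) + 1)
                 if candidate(path[:j]) == K)
    counts = Counter(path[birth - 1:])
    return all(counts[s] >= k for s in K) and counts[path[-1]] >= k + 1

def reached_bscc(path, i, p_min, delta):
    """Algorithm 1 (sketch): True iff the current, i-th candidate of `path`
    is a strong k_i-candidate.  With k_i = (i - ln delta)/(-ln(1 - p_min))
    we get (1 - p_min)^{k_i} = delta * e^{-i}, so by Lemma 3 the total
    error over all candidates stays below delta."""
    K = candidate(path)
    if K is None:
        return False
    k_i = math.ceil((i - math.log(delta)) / -math.log(1.0 - p_min))
    return is_strong_k_candidate(path, K, k_i)
```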

Theorem 1

For every \(\delta >0\), Algorithm 1 is correct with error probability at most \(\delta \).

We have shown how to detect a BSCC of a single path with the desired confidence. In Algorithm 2, we show how to use our BSCC detection method to decide whether a given path reaches the set G, with confidence \(1-\delta \). The function \(\mathsf {NextState}(\pi )\) randomly picks a state according to \(\mu \) if the path is empty (\(\pi =\lambda \)); otherwise, if \(\ell \) is the last state of \(\pi \), it randomly chooses its successor according to \(\mathbf {P}(\ell ,\cdot )\). The algorithm returns \(\mathbf {Yes}\) when \(\pi \) reaches a state in G, and \(\mathbf {No}\) when for some i, the ith candidate is a strong \(k_i\)-candidate. In the latter case, with probability at least \(1-\delta \), \(\pi \) has reached a BSCC not containing G. Hence, with probability at most \(\delta \), the algorithm returns \(\mathbf {No}\) for a path that could still reach a goal state.

Algorithm 2. SinglePathReach\((G,p_{\mathsf {min}},\delta )\): deciding reachability of G on a single sampled path.
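A sketch of the sampling loop, using the interface of Sect. 2 (the per-step candidate recomputation is naive; the implementation described in Sect. 5 checks for candidates only every 1000 steps):

```python
def single_path_reach(mc, goal, p_min, delta):
    """Algorithm 2 (sketch): returns True ("Yes") when the sampled path
    hits `goal`, and False ("No") once the current candidate is a strong
    k_i-candidate, i.e. a BSCC avoiding `goal` has been reached with
    probability at least 1 - delta."""
    path = [mc.initial_state()]
    i, prev = 0, None                 # index and value of current candidate
    while True:
        if path[-1] in goal:
            return True
        K = candidate(path)
        if K is not None and K != prev:
            i, prev = i + 1, K        # a new candidate is born
        if K is not None and reached_bscc(path, i, p_min, delta):
            return False
        path.append(mc.next_state(path[-1]))
```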

3.2 Hypothesis Testing on a Bernoulli Variable Observed with Bounded Error

In the following, we show how to estimate the probability of reaching a set of goal states, by combining the BSCC detection and hypothesis testing. More specifically, we sample many paths of a Markov chain, decide for each whether it reaches the goal states (Algorithm 2), and then use hypothesis testing to estimate the event probability. The hypothesis testing is adapted to the fact that testing reachability on a single path may report false negatives.

Let \(X_\Diamond ^\delta \) be a Bernoulli random variable, such that \(X_\Diamond ^\delta =1\) if and only if SinglePathReach \((G,p_{\mathsf {min}},\delta )=\mathbf {Yes}\), describing the outcome of Algorithm 2. The following theorem establishes that \(X_\Diamond ^\delta \) estimates \(\mathbb P[\Diamond G]\) with a bias bounded by \(\delta \).

Theorem 2

For every \(\delta >0\), we have \(\mathbb P[\Diamond G]-\delta \le \mathbb {E}[X_\Diamond ^\delta ]\le \mathbb P[\Diamond G]\).

In order to conclude on the value \(\mathbb P[\Diamond G]\), the standard statistical model checking approach via hypothesis testing [28] decides between the hypotheses \( H_0: \mathbb P[\Diamond G]\ge p + \varepsilon \) and \(H_1: \mathbb P[\Diamond G]\le p - \varepsilon \), where \(\varepsilon \) is a desired indifference region. As we do not have precise observations of each path, we reduce this problem to hypothesis testing on the variable \(X_\Diamond ^\delta \) with a narrower indifference region: \( H_0': \mathbb {E}[X_\Diamond ^\delta ] \ge p+(\varepsilon - \delta )\) and \(H_1': \mathbb {E}[X_\Diamond ^\delta ]\le p - \varepsilon \), for some \(\delta <\varepsilon \).

We define the reduction simply as follows. Given a statistical test \(\mathcal {T'}\) for \(H'_0, H'_1\) we define a test \(\mathcal {T}\) that accepts \(H_0\) if \(\mathcal {T}'\) accepts \(H'_0\), and \(H_1\) otherwise. The following lemma shows that \(\mathcal {T}\) has the same strength as \(\mathcal {T}'\).

Lemma 4

Suppose the test \(\mathcal {T'}\) decides between \(H_0'\) and \(H_1'\) with strength \((\alpha , \beta )\). Then the test \(\mathcal {T}\) decides between \(H_0\) and \(H_1\) with strength \((\alpha , \beta )\).

Lemma 4 gives us the following algorithm to decide between \(H_0\) and \(H_1\). We generate samples \(x_0, x_1, \ldots , x_n \sim X_\Diamond ^\delta \) from SinglePathReach \((G,p_{\mathsf {min}},\delta )\), and apply a statistical test to decide between \(H'_0\) and \(H'_1\). Finally, we accept \(H_0\) if \(H'_0\) was accepted by the test, and \(H_1\) otherwise. In our implementation, we used the sequential probability ratio test (SPRT) [24, 25] for hypothesis testing.
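For reference, Wald's SPRT amounts to a random walk of the log-likelihood ratio between two thresholds; a minimal sketch for Bernoulli observations (the function and parameter names are ours):

```python
import math

def sprt(sample, p0, p1, alpha, beta):
    """Wald's SPRT deciding H0': E[X] >= p0 against H1': E[X] <= p1,
    for p1 < p0, with error bounds (alpha, beta).  `sample()` draws one
    Bernoulli observation, e.g. one outcome of Algorithm 2."""
    upper = math.log((1 - beta) / alpha)   # crossing it accepts H1'
    lower = math.log(beta / (1 - alpha))   # crossing it accepts H0'
    llr = 0.0                              # log-likelihood ratio of H1' vs H0'
    while lower < llr < upper:
        x = sample()
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
    return "H0" if llr <= lower else "H1"

# Reachability (Sect. 3.2): H0': E >= p + (eps - delta), H1': E <= p - eps,
# e.g. sprt(lambda: single_path_reach(mc, goal, p_min, delta),
#           p + (eps - delta), p - eps, alpha, beta)
```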

4 Extensions

In this section, we present how the on-the-fly BSCC detection can be used for verifying LTL and quantitative properties (mean payoff).

4.1 Linear Temporal Logic

We show how our method extends to properties expressible by linear temporal logic (LTL) [19] and, in the same manner, to all \(\omega \)-regular properties. Given a finite set Ap of atomic propositions, a labelled Markov chain (LMC) is a tuple \(\mathcal {M}= (S, \mathbf {P}, \mu , Ap, L)\), where \((S,\mathbf {P}, \mu )\) is a MC and \(L: S\rightarrow 2^{Ap}\) is a labelling function. The definition of LTL formulae is standard and, for the reader’s convenience, recalled in the full version [2], along with other standard details omitted in this section. Given a labelled Markov chain \(\mathcal {M}\) and an LTL formula \(\varphi \), we are interested in the measure \(\mathbb P[\mathcal {M}\models \varphi ] := \mathbb P[\{\rho \in \mathsf {Runs}\mid L(\rho ) \models \varphi \}],\) where \(L\) is naturally extended to runs by \(L(\rho )[i]=L(\rho [i])\) for all i.

For every LTL formula \(\varphi \), one can construct a deterministic Rabin automaton (DRA) \(\mathcal {A}= (Q, {2^{Ap}}, \gamma , q_o, Acc)\) that accepts all runs that satisfy \(\varphi \) [3]. Here \(Q\) is a finite set of states, \(\gamma : Q\times {2^{Ap}}\rightarrow Q\) is the transition function, \(q_o\in Q\) is the initial state, and \(Acc\subseteq 2^Q\times 2^Q\) is the acceptance condition. Further, the product of a MC \(\mathcal {M}\) and DRA \(\mathcal A\) is the Markov chain \(\mathcal {M}\otimes \mathcal {A}=(S\times Q, \mathbf {P}', \mu ')\), where \(\mathbf {P}'((s,q),(s',q')) = \mathbf {P}(s,s')\) if \(q'=\gamma (q,L(s'))\) and \(\mathbf {P}'((s,q),(s',q'))=0\) otherwise, and \(\mu '(s,q) = \mu (s)\) if \(\gamma (q_o, L(s))=q\) and \(\mu '(s,q)=0\) otherwise. Note that \(\mathcal {M}\otimes \mathcal {A}\) has the same smallest transition probability \(p_{\mathsf {min}}\) as \(\mathcal {M}\).
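During simulation the product need not be constructed explicitly; a sketch of an on-the-fly product, assuming the DRA is given by its transition function \(\gamma \) and the labelling by L (the class and parameter names are ours):

```python
class ProductChain:
    """On-the-fly product of a labelled MC and a DRA; states are pairs
    (s, q), and p_min is the same as in the underlying chain."""

    def __init__(self, mc, label, gamma, q0):
        self.mc, self.label, self.gamma, self.q0 = mc, label, gamma, q0

    def initial_state(self):
        s = self.mc.initial_state()
        return (s, self.gamma(self.q0, self.label(s)))

    def next_state(self, state):
        s, q = state
        t = self.mc.next_state(s)
        return (t, self.gamma(q, self.label(t)))
```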

The crux of LTL probabilistic model checking relies on the fact that the probability of satisfying an LTL property \(\varphi \) in a Markov chain \(\mathcal {M}\) equals the probability of reaching an accepting BSCC in the Markov chain \(\mathcal {M}\otimes \mathcal {A}_\varphi \). Formally, a BSCC C of \(\mathcal {M}\otimes \mathcal {A}_\varphi \) is accepting if for some \((E,F)\in Acc\) we have \(C\cap (S\times E)=\emptyset \) and \(C\cap (S\times F)\ne \emptyset \). Let \(\mathsf {AccBSCC}\) denote the union of all accepting BSCCs in \(\mathcal {M}\otimes \mathcal {A}_\varphi \). Then we obtain the following well-known fact [3]:

Lemma 5

For every labelled Markov chain \(\mathcal {M}\) and LTL formula \(\varphi \), we have \(\mathbb P[\mathcal {M}\models \varphi ] = \mathbb P[\Diamond \mathsf {AccBSCC}]\).

Algorithm 3. SinglePathLTL\((\mathcal {A}_\varphi ,p_{\mathsf {min}},\delta )\): deciding whether a single sampled run of \(\mathcal {M}\otimes \mathcal {A}_\varphi \) reaches an accepting BSCC.
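A sketch combining the product above with the BSCC detection of Sect. 3 (here acc is the Rabin condition, a list of pairs of sets of DRA states):

```python
def single_path_ltl(prod, acc, p_min, delta):
    """Algorithm 3 (sketch): simulate the product until the candidate is a
    strong k_i-candidate, then report whether that BSCC is accepting,
    i.e. some Rabin pair (E, F) has no E-state and some F-state among
    the automaton components of the candidate."""
    path = [prod.initial_state()]
    i, prev = 0, None
    while True:
        K = candidate(path)
        if K is not None and K != prev:
            i, prev = i + 1, K
        if K is not None and reached_bscc(path, i, p_min, delta):
            qs = {q for (_, q) in K}          # automaton components
            return any(not (qs & E) and (qs & F) for (E, F) in acc)
        path.append(prod.next_state(path[-1]))
```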

Since the input is a Rabin automaton, the method applies to all \(\omega \)-regular properties. Let \(X^\delta _\varphi \) be a Bernoulli random variable, such that \(X_\varphi ^\delta =1\) if and only if SinglePathLTL \((\mathcal {A}_\varphi ,p_{\mathsf {min}},\delta )=\mathbf {Yes}\). Since the BSCC must be reached and fully explored to classify it correctly, the error of the algorithm can now be two-sided.

Theorem 3

For every \(\delta >0\), \(\mathbb P[\mathcal {M}\models \varphi ] - \delta \le \mathbb {E}[X_\varphi ^\delta ] \le \mathbb P[\mathcal {M}\models \varphi ] + \delta \).

Further, as in Sect. 3.2, we can reduce the hypothesis testing problem for

$$\begin{aligned} H_0: \mathbb P[\mathcal {M}\models \varphi ]\ge p + \varepsilon \qquad \text {and} \qquad H_1: \mathbb P[\mathcal {M}\models \varphi ]\le p - \varepsilon \end{aligned}$$

for any \(\delta <\varepsilon \) to the following hypothesis testing problem on the observable \(X^\delta _\varphi \)

$$\begin{aligned} H'_0: \mathbb {E}[X^\delta _\varphi ]\ge p + (\varepsilon -\delta )\qquad \text {and} \qquad H'_1: \mathbb {E}[X^\delta _\varphi ]\le p - (\varepsilon - \delta ). \end{aligned}$$

4.2 Mean Payoff

We show that our method extends also to quantitative properties, such as mean payoff (also called long-run average reward). Let \(\mathcal {M}= (S,\mathbf {P},\mu )\) be a Markov chain and \(r:S\rightarrow [0,1]\) be a reward function. Denoting by \(S_i\) the random variable returning the i-th state on a run, the aim is to compute

$$\mathsf {MP}:=\lim _{n\rightarrow \infty } \mathbb {E}\left[ \frac{1}{n}\sum _{i=1}^n r(S_i)\right] .$$

This limit exists (see, e.g. [17]), and equals \(\sum _{C\in \mathsf {BSCC}}\mathbb P[\Diamond C]\cdot \mathsf {MP}_C,\) where \(\mathsf {MP}_C\) is the mean payoff of runs ending in C. Note that \(\mathsf {MP}_C\) can be computed from r and transition probabilities in C [17]. We have already shown how our method estimates \(\mathbb P[\Diamond C]\). Now we show how it extends to estimating transition probabilities in BSCCs and thus the mean payoff.

First, we focus on a single path \(\pi \) that has reached a BSCC \(C=K(\pi )\), and show how to estimate the transition probabilities \(\mathbf {P}(s,s')\) for each \(s,s'\in {C}\). Let \(X_{s,s'}\) be the random variable denoting the event that \(\mathsf {NextState}(s)=s'\). \(X_{s,s'}\) is a Bernoulli variable with parameter \(\mathbf {P}(s,s')\), so we use the obvious estimator \(\hat{\mathbf {P}}(s,s')=\#_{ss'}(\pi )/\#_{s}(\pi )\), where \(\#_{\alpha }(\pi )\) is the number of occurrences of \(\alpha \) in \(\pi \). If \(\pi \) is long enough so that \(\#_s(\pi )\) is large enough, the estimate is guaranteed to have the desired precision \(\xi \) with the desired confidence \(1-\delta _{s,s'}\). Indeed, using Hoeffding’s inequality, we obtain

$$\begin{aligned} \mathbb P[|\hat{\mathbf {P}}(s,s') - \mathbf {P}(s,s')|>\xi ] \le \delta _{s,s'}=2e^{-2 \#_s(\pi )\cdot \xi ^2}. \end{aligned}$$
(1)

Hence, we can extend the path \(\pi \) with candidate C until it is long enough so that we have a \(1-\delta _{C}\) confidence that all the transition probabilities in \(C\) are in the \(\xi \)-neighbourhood of our estimates, by ensuring that \(\sum _{s,s'\in C}\delta _{s,s'}<\delta _{C}\). These estimated transition probabilities \(\hat{\mathbf {P}}\) induce a mean payoff \(\hat{\mathsf {MP}}_{C}\). Moreover, \(\hat{\mathsf {MP}}_{C}\) estimates the real mean payoff \(\mathsf {MP}_{C}\). Indeed, by [6, 23],

$$\begin{aligned} |\hat{\mathsf {MP}}_{C}-\mathsf {MP}_{C}|\le \zeta :=\left( 1+\frac{\xi }{p_{\mathsf {min}}}\right) ^{2\cdot |{C}|}-1. \end{aligned}$$
(2)

Note that by Taylor’s expansion, for small \(\xi \), we have \(\zeta \approx 2|{C}|\xi /p_{\mathsf {min}}\).

Algorithm 4. SinglePathMP\((r,p_{\mathsf {min}},\zeta ,\delta )\): estimating the mean payoff on a single sampled path.

Algorithm 4 extends Algorithm 2 as follows. It splits the confidence parameter \(\delta \) into \(\delta _{BSCC}\) (used as in Algorithm 2 to detect the BSCC) and \(\delta _C\) (the total confidence for the estimates of transition probabilities). For simplicity, we set \(\delta _{BSCC}=\delta _{C}=\delta /2\). First, we compute the bound \(\xi \) required for \(\zeta \)-precision (by Eq. 2). Subsequently, we compute the required strength k of the candidate guaranteeing \(\delta _{C}\)-confidence on \(\hat{\mathbf {P}}\) (from Eq. 1). The path is prolonged until the candidate is strong enough; in such a case \(\hat{\mathsf {MP}}_{C}\) is \(\zeta \)-approximated with \(1-\delta _{C}\) confidence. If the candidate of the path changes, all values are computed from scratch for the new candidate.
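A sketch of these bounds and of the estimator in Python (the even split \(\delta _{s,s'}=\delta _C/|C|^2\) and the function names are our choices; any split with \(\sum _{s,s'\in C}\delta _{s,s'}<\delta _C\) works):

```python
import math
from collections import Counter

def mp_requirements(C_size, p_min, zeta, delta_C):
    """Bounds used in Algorithm 4 (sketch).  Inverting Eq. 2 gives the
    precision xi on transition probabilities needed for zeta-precision
    on MP_C; by Eq. 1 with delta_{s,s'} = delta_C / |C|^2 and a union
    bound, each state must then be visited `visits` times."""
    xi = p_min * ((1.0 + zeta) ** (1.0 / (2 * C_size)) - 1.0)
    visits = math.ceil(math.log(2 * C_size ** 2 / delta_C) / (2 * xi ** 2))
    return xi, visits

def estimate_transitions(path, C):
    """The estimator P_hat(s, s') = #_{ss'}(path) / #_s(path) inside C
    (here #_s counts the transitions taken from s)."""
    occ = Counter(path[:-1])
    pairs = Counter(zip(path, path[1:]))
    return {(s, t): n / occ[s] for (s, t), n in pairs.items()
            if s in C and t in C}
```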

Theorem 4

For every \(\delta >0\), Algorithm 4 terminates correctly with probability at least \(1-\delta \).

Let the random variable \(X_{\mathsf {MP}}^{\zeta ,\delta }\) denote the value SinglePathMP(\(r,p_{\mathsf {min}},\zeta ,\delta \)). The following theorem establishes the relation between the mean payoff \(\mathsf {MP}\) and the expected value of \(X_\mathsf {MP}^{\zeta ,\delta }\).

Theorem 5

For every \(\delta , \zeta > 0\), \(\mathsf {MP} - \zeta - \delta \le \mathbb {E}[X_\mathsf {MP}^{\zeta ,\delta }] \le \mathsf {MP} + \zeta + \delta \).

As a consequence of Theorem 5, if we establish that with \((1-\alpha )\) confidence \(X_\mathsf {MP}^{\zeta ,\delta }\) belongs to the interval [ab], then we can conclude with \((1-\alpha )\) confidence that \(\mathsf {MP}\) belongs to the interval \([a-\zeta - \delta , b+\zeta +\delta ]\). Standard statistical methods can be applied to find the confidence bound for \(X_\mathsf {MP}^{\zeta ,\delta }\).
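For instance, with a Hoeffding bound as the “standard statistical method” (this particular choice is ours; the rewards, and hence the samples, lie in [0, 1]):

```python
import math

def mp_confidence_interval(samples, zeta, delta, alpha):
    """A (1 - alpha)-confidence interval [a, b] for E[X_MP] via Hoeffding,
    widened by zeta + delta into an interval for MP (Theorem 5)."""
    n = len(samples)
    mean = sum(samples) / n
    half = math.sqrt(math.log(2.0 / alpha) / (2 * n))  # two-sided Hoeffding
    return mean - half - zeta - delta, mean + half + zeta + delta
```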

5 Experimental Evaluation

We implemented our algorithms in the probabilistic model checker Prism [14], and evaluated them on the DTMC examples from the Prism benchmark suite [15]. The benchmarks model communication and security protocols, distributed algorithms, and fault-tolerant systems. To demonstrate how our method performs depending on the topology of Markov chains, we also performed experiments on the generic DTMCs shown in Figs. 3 and 4, as well as on two CTMCs from the literature that have large BSCCs: “tandem” [11] and “gridworld” [27].

All benchmarks are parametrised by one or more values, which influence their size and complexity, e.g. the number of modelled components. We have made minor modifications to the benchmarks that could not be handled directly by the SMC component of Prism, by adding self-loops to deadlock states and fixing one initial state instead of multiple.

Our tool can be downloaded at [1]. Experiments were done on a Linux 64-bit machine running an AMD Opteron 6134 CPU, with a time limit of 15 min and a memory limit of 5 GB. To increase the performance of our tool, we check whether a candidate has been found only every 1000 steps; this optimization does not violate the correctness of our analysis. See the full version of this paper [2] for a discussion of this bound.

Fig. 4. A Markov chain with two transient parts consisting of N strongly connected singletons, leading to BSCCs with a ring topology of M states.

Reachability. The experimental results for unbounded reachability are shown in Table 2. The Prism benchmarks were checked against their standard properties, when available. We directly compare our method to another topology-agnostic method of [26] (\(\mathsf {SimTermination}\)), where at every step the sampled path is terminated with probability \(p_{\mathsf {term}}\). The approach of [4] with a priori bounds is not included, since it times out even on the smallest benchmarks. In addition, we performed experiments on two methods that are topology-aware: sampling with reachability analysis of [26] (\(\mathsf {SimAnalysis}\)) and the numerical model-checking algorithm of Prism (\(\textsf {MC}\)). The full version [2] contains detailed experimental evaluation of these methods.

The table shows the size of every example, its minimum probability, the number of BSCCs, and the size of the largest BSCC. Column “time” reports the total wall time for the respective algorithm, and “analysis” shows the time for symbolic reachability analysis in the \(\mathsf {SimAnalysis}\) method. Highlights show the best result among the topology-agnostic methods. All statistical methods were run with the SPRT test for choosing between the hypotheses, and their results were averaged over five runs.

Table 2. Experimental results for unbounded reachability. Simulation parameters: \(\alpha =\beta =\varepsilon =0.01\), \(\delta =0.001\), \(p_{\mathsf {term}}=0.0001\). \(\mathsf {TO}\) means time-out, and \(\mathsf {MO}\) means memory-out. Our approach is denoted by \(\mathsf {SimAdaptive}\) here. Highlights show the best result among the topology-agnostic methods.

Finding the optimal termination probability \(p_{\mathsf {term}}\) for the \(\mathsf {SimTermination}\) method is a non-trivial task. If the probability is too high, the method might never reach the target states and thus give an incorrect result; if the value is too low, it might sample unnecessarily long traces that never reach the target. For instance, to ensure a correct answer on the Markov chain in Fig. 3, \(p_{\mathsf {term}}\) has to decrease exponentially with the number of states. By experimenting, we found that the probability \(p_{\mathsf {term}}=0.0001\) is low enough to ensure correct results. See the full version [2] for experiments with other values of \(p_{\mathsf {term}}\).

On most examples, our method scales better than the \(\mathsf {SimTermination}\) method. Our method performs well even on examples with large BSCCs, such as “tandem” and “gridworld,” due to early termination when a goal state is reached. For instance, on the “gridworld” example, most BSCCs do not contain a goal state and thus have to be fully explored; however, the probability of reaching such a BSCC is low, so full BSCC exploration rarely occurs. The \(\mathsf {SimTermination}\) method performs well when the target states are unreachable or can be reached by short paths. When long paths are necessary to reach the target, the probability that an individual path reaches the target is small, hence many samples are necessary to estimate the real probability with high confidence.

Moreover, it turns out that our method compares well even with methods that have access to the topology of the system. In many cases, the running time of the numerical algorithm \(\textsf {MC}\) increases dramatically with the size of the system, whereas that of our method remains almost constant. The bottleneck of the \(\mathsf {SimAnalysis}\) algorithm is the reachability analysis of states that cannot reach the target, which in practice can be as difficult as numerical model checking.

LTL and Mean Payoff. In the second experiment, we compared our algorithm for checking LTL properties and estimating the mean payoff with the numerical methods of Prism; the results are shown in Table 3. We compare against Prism, since we are not aware of any SMC-based or topology-agnostic approach for mean payoff, or full LTL. For mean payoff, we computed a \(95\,\%\)-confidence bound of size 0.22 with parameters \(\delta =0.011, \zeta =0.08\); for LTL, we used the same parameters as for reachability. Due to space limitations, we report results only for some models of each type, where at least one of the methods did not time out. In general, our method scales better when the BSCCs are fairly small and are discovered quickly.

Table 3. Experimental results for LTL and mean-payoff properties. Simulation parameters for LTL: \(\alpha =\beta =\varepsilon =0.01\), \(\delta =0.001\), for mean-payoff we computed \(95\,\%\)-confidence bound of size 0.22 with \(\delta =0.011, \zeta =0.08\).

6 Discussion and Conclusion

As demonstrated by the experimental results, our method is fast on systems that (1) are shallow and (2) have small BSCCs. In such systems, the BSCC is reached quickly and the candidate is built up quickly. Further, recall that the BSCC is reported when a k-candidate is found, and that k is increased with each candidate along the path. Hence, when there are many strongly connected sets, and thus many candidates, the BSCC is detected by a k-candidate for a large k. However, since k grows only linearly in the number of candidates, the most important and limiting factor is the size of BSCCs.

We state the dependency on the depth of the system and BSCC sizes formally. We pick \(\delta :=\frac{\varepsilon }{2}\) and let

$$ sim = \frac{ -\log \frac{\beta }{1-\alpha }\log \frac{1-\beta }{\alpha } }{\log \frac{p-\varepsilon + \delta }{p + \varepsilon - \delta }\log \frac{1-p - \varepsilon + \delta }{1-p + \varepsilon - \delta }} \qquad \text {and} \qquad k_{i}=\frac{i-\log \delta }{-\log (1-p_{\mathsf {min}})} $$

denote the a priori upper bound on the number of simulations necessary for SPRT [24, 25] and the strength of candidates as in Algorithm 2, respectively.

Theorem 6

Let R denote the expected number of steps before reaching a BSCC and B the maximum size of a BSCC. Further, let \(T :=\max _{C\in \mathsf {BSCC};s,s'\in C}\mathbb {E}[time\,to\,reach\,s'\,from\,s]\). In particular, \(T\in \mathcal {O}(B/p_{\mathsf {min}}^{B})\). Then the expected running time of Algorithms 2 and 3 is at most

$$ \mathcal {O}(sim\cdot k_{R+B}\cdot B \cdot T). $$

Systems with large and deep BSCCs require a long time to achieve the required level of confidence. However, such systems are often difficult to handle also for other methods that are agnostic of the topology. For instance, the correctness of [26] on the example in Fig. 3 relies on the termination probability \(p_{\mathsf {term}}\) being at most \(1-\lambda \), which is less than \(2^{-n}\) here. Larger values lead to incorrect results, and smaller values to paths of exponential length. Nevertheless, our procedure usually runs faster than the bound suggests; for a detailed discussion see [2].

Conclusion. To the best of our knowledge, we propose the first statistical model-checking method that exploits the information provided by each simulation run on the fly, in order to quickly detect a potential BSCC and verify LTL properties with the desired confidence. This is also the first application of SMC to quantitative properties such as mean payoff. We note that for our method to work correctly, the precise value of \(p_{\mathsf {min}}\) is not necessary; a lower bound is sufficient. This lower bound can come from domain knowledge, or can be inferred directly from the description of white-box systems, such as the Prism benchmarks.

The approach we present is not meant to replace the other methods, but rather to be an addition to the repertoire of available approaches. Our method is particularly valuable for models that have small BSCCs and huge state space, such as many of the Prism benchmarks.

In future work, we plan to investigate the applicability of our method to Markov decision processes, and to deciding language equivalence between two Markov chains.