Nonparametric inference for stochastic feedforward networks based on cross-spectral analysis of point processes

: In this paper we study a nonparametric estimation problem for stochastic feedforward systems with nodes of type G/G/ ∞ . We assume that we have observations of the external arrival and external departure processes of customers of the system, but no information about the movements of the indistinguishable customers in the network. Our aim is the construction of estimators for the characteristic functions and the densities of the service time distributions at the nodes as well as for the routing prob- abilities. Since the system is only partly observed we have to clarify ﬁrst if the parameters are identiﬁable from the given data. The crucial point in our approach is the observation that in our stochastic networks under study the inﬂuence of the arrival processes on the departure processes can be described in a linear and time-invariant model. This makes it possible to apply cross-spectral techniques for multivariate point processes. The con- struction of the estimators is then based on smoothed periodograms. We prove asymptotic normality for the estimators. We present the statistical analysis for a tandem system of two nodes in full details and show after-wards how the results can be generalized to feedforward systems of three or more nodes and to systems with positive feedback probabilities at the nodes.


Introduction
Stochastic networks are complex interacting systems with the typical application fields computer and telecommunication networks, the internet, manufacturing facilities, as well as dynamical population models and models of the neurosciences. In all these fields there is an increasing interest in the statistical inference for system characteristics based on incomplete information of the working systems. For a recent example, have a look at the applied comprehensive modeling and statistical analysis of internet voice communication presented in [33]. In particular, knowledge of the distribution of the operating times at the nodes is essential for specifying the performance behavior and for guaranteeing reliability and quality standards of the systems. It is important to describe the speed and potential monotonicity of service at the nodes and to classify the shape of the service time distribution with respect to unimodality or multimodality. For example, a unimodal service time density shows a homogeneous service behavior whereas a bimodal density may indicate that there are two distinct customer populations or breakdowns of the server. There is interest in both, parametric and nonparametric statistical inference for service time distributions. However, most of the currently applied techniques for service demand estimation in queueing networks focus only on the first and second order moments of the service times obtained e.g. by linear regression of measured utilization values. Some efforts have been done in the recent past to extend queueing networks to more general models with arbitrary distributions and/or nonrenewal processes, but the service distribution estimation problem has remained unsolved up to now.
Important contributions to a parametric analysis of queueing systems are among others the following papers. In [23] the mean service and response times of an IT system are inferred from measurements of the end-to-end delays of the customers. The analysis is based on a mathematical programming formulation. In [22] the traffic intensity in G/G/1 queueing systems is estimated based on samples of the interarrival and service times of customers. The estimators are shown to be consistent and asymptotically normal and corresponding confidence intervals and tests are constructed. Moreover, a comprehensive literature overview on parameter estimation problems in queueing systems is presented. The construction of new confidence intervals for the mean sojourn times in M/G/1-FCFS systems is achieved in [14]. The analysis is based on four different bootstrap procedures. A thorough simulation study is presented. In [15] the analysis is extended to G/G/1-FCFS systems. In [30] traffic parameters like the means of service and waiting times are estimated for various single server systems as well as for queueing networks by sending and observing a test stream. The research is motivated by the need to decide if incoming traffic should be allowed to enter the system in ATM networks. [25] provides a review of general parameter estimation problems for single queues and discusses in particular maximum likelihood estimation, the method of moments and the problem of hypothesis testing in this context. In [21] finite source discrete time Geo/Geo/1/∞/N queues with parallel arrival streams of ordinary customers and of disasters which empty the system on their arrival are analyzed. Numerical schemes to compute steady state probabilities of the system and performance measures are derived. A numerical study illustrates the sensitivity of the performance measures due to changes of the system's parameters. In [13] a modification of the M/G/∞ system is studied as a model for data traffic generation where the transmission rates may also be random. The statistical analysis in particular concentrates on the influence of changes in the input parameters on the long-range dependence of the accumulated traffic. The connection between geometric coverage processes and queueing systems is exploited for the statistical analysis in [16] and [17](mainly Chapter 2). By interpreting service times in M/G/∞ systems as line segments, busy periods as clumps of segments and idle periods as spacings between clumps, the relation between one dimensional mosaic processes and M/G/∞ systems is established. Based on observations of the busy periods of M/G/∞ systems then in particular estimators of the arrival rate are studied and their efficiency is compared.
Next we give a brief literature overview on nonparametric estimation studies for queueing systems. [18] bases the nonparametric analysis of M/G/∞ systems on data of consecutive sequences of the busy and idle periods. The kernel-based deconvolution approach relies on the trick to expand the characteristic function of the busy period distribution before inversion. The main results are uniform strong consistency for the estimators of the distribution function and the density function of the service time distribution. In a series of papers Pitts and her coauthors contributed to the field of nonparametric inference for queues. They successfully applied a powerful functional approach based on infinitedimensional versions of the delta method to various settings. [29] studies the problem of estimating the stationary waiting time distribution function in the GI/G/1 queue given random samples from the service and interarrival time distributions. The results are strong consistency and asymptotic normality for the functional estimator. In [2] the arrival rate and the service time distribution function in M/G/ ∞ systems are estimated in a parametric as well as in a nonparametric setting. The analysis is based on data on the queue length process which is either completely observable or just in terms of the idle and busy periods. [3] is devoted to the estimation of the arrival rate and the service time distribution via its generating function in M/G/1 systems given observations of the busy and idle periods of the server. [19] studies nonparametric inference for the service time distribution from the workload process of M/G/1 systems. The analysis exploits the underlying regenerative structure of the sampled workload process and utilizes central limit theorems for regenerative processes and a decompounding approach for geometric random sums. In the early contribution [7] Brillinger studies the problem of nonparametric estimation of the service time distribution in a single G/G/∞ queue based on observations of the input and output processes. His results are asymptotically normal estimators. He applies in this paper the spectral theory of stationary processes. We refer to a thorough presentation to his book [9] written in the context of time series.
To the best of our knowledge up to now there are no results in the important case of nonparametric inference for stochastic networks with two or more nodes where dependencies between the components due to the moving customers must be taken into consideration. With this paper we contribute to close this gap and present a statistical analysis study of a feedforward network of nodes of type G/G/∞. The external arrival processes of customers to these networks are modelled as general point processes which may in particular show inner dependence. At each node there are infinitely many servers which act independently from each other according to a common general service time distribution. No waiting of customers occurs. Once having left a node customers cannot enter it again on their way through the network. Due to its general interarrival and service times the model is of wide practical interest. Particularly, if nothing is known about the structure of the arrival and service processes this model is often the first choice. If the number of servers is unknown in the application it suggests itself to assume infinitely many servers and thus, to equalize service and sojourn times of customers at the nodes. We assume that we observe over a finite stretch of time at each node the external arrival and external departure times of customers only. It is here not possible to follow individual customer paths through the system and directly assign the departures to the arrivals. Our aim is to provide nonparametric estimators for the service time distributions at the nodes as well as estimators for the unknown routing probabilities. In particular, we will estimate both, the characteristic functions and the densities of the service time distributions. The crucial point in our approach is the observation that in our tandem systems under study the influence of the arrival processes on the departure processes can be described in a linear and time-invariant model. Then the stochastic analysis of the system can be based on models for multivariate point processes and cross-spectral techniques for stationary processes can be applied. The underlying Fourier analysis for stochastic processes originated from the search for hidden periodicities in the time course. As we will see, spectral analysis is a powerful tool for studying inner dependency structures of multivariate processes as well as interrelations between two processes since these properties are well represented in the frequency domain. Our work is in particular a generalization of [7] to the multivariate case with a thorough detailed definition of the mathematical point process framework.
The paper is organized as follows. In Section 2 we summarize the notations used, in particular with respect to point processes. Section 3 describes the network model under consideration (a tandem system of two nodes of type G/G/∞) and its stochastical analysis. Section 4 is devoted to the study of general linear time-invariant models for multivariate point processes. We describe crossspectral methods for the estimation of characteristics of the model and prove the main results about asymptotic normality for the estimators of the so-called transfer function and the impulse-response function of the model. In Section 5 we show how the results from Section 4 can be applied in the setting of the queueing system presented in Section 3. In particular, we develop estimators for the characteristic functions and the (Lebesgue) densities of the service time distributions at the nodes as well as for the routing probabilities of the model and prove their respective asymptotic normality. Furthermore, we present the multiple coherence as an indicator if service times and routing behavior are deterministic. In Section 6 we sketch how our statistical approach can be generalized to feedforward systems of more than two nodes and to tandem systems with positive feedback probabilities at the nodes. In Section 7 we show and discuss results from a simulation study.

Notation
In the paper we consider point processes on R to be random counting measures, i.e. we call a measurable mapping D from an underlying probability space (Ω, F , P ) into the set of all nonnegative integer-valued boundedly finite measures on the Borel σ-algebra B(R) on R a point process. Equivalently, D can be characterized by the sequence of its jump points D = (σ D k ) k∈Z with the conven- The number of jumps of a point process D in a Borel time set B ∈ B(R) can then be calculated as We assume that all considered point processes D are simple, i.e. the jump points are distinct, σ D j = σ D j+1 for all j. Further, note that simple point processes do not have any accumulation points so that the sequence of jump points diverges and in finite time sets only a finite number of events occur.
We define n-variate point processes D on the space R as vectors whose components are point processes on R, i.e., D(B) = (D 1 (B), . . . , D n (B)) t for B ∈ B(R), where t denotes transposition. For further details on point processes we refer the reader to [12].
The conditional expectation E[D(B)|A] of the random variable D(B) relatively to the multivariate point process A is defined as the conditional expectation of D(B) relatively to the σ-field generated by the random jump points of all component processes of A. Note that there are at most countably many such jump points since all D i are simple.
dt. E n is the ndimensional identity matrix, O n×m the n × m matrix with all entries equal to zero and 1 n is the n-dimensional vector with all components equal to 1. N C n (a, C) indicates an n-variate complex normal distribution with mean vector a and variance-covariance matrix C. Moreover, for B ∈ B(R), let 1 B (·) be the indicator function defined by For B ∈ B(R) and u ∈ R we denote by B − u the set {b − u : b ∈ B}. As usual, we set empty sums equal to zero. Furthermore, we assume for all processes and random variables an underlying common probability space (Ω, F , P ) to be given.
Tandem network of two nodes.

Tandem models of queues
We present here our approach and results in the framework of a stationary tandem network of two G/G/∞ queues in series according to Figure 1. Generalizations of the model to tandem systems of more than two nodes and to systems with positive feedback probabilities at the nodes are given in Section 6. We assume that customers which enter the system to obtain service at the nodes are indistinguishable. Customers arrive at nodes 1 and 2 from the outside according to general stationary point processes A 1 and A 2 on the time axis R. We denote the sequences of jump points of the input processes A 1 and A 2 by (σ A1 j ) j∈Z and (σ A2 j ) j∈Z respectively with . .
. . referring to the arrival times of customers from the outside to node 1 and . . .
. . referring to the arrival times of customers from the outside to node 2 respectively. We assume that the point processes A 1 and A 2 are simple, i.e. each jump point σ Ai j of A i refers to the arrival of exactly one customer and the jump points are distinct, σ Ai j = σ Ai j+1 for all j and i = 1, 2. At each node there are infinitely many servers, no waiting occurs. Service at node i is provided for each customer according to a service time distribution with (Lebesgue) density g i (·) on R + (denoted in Figure 1 by "∼ g i (·)"), i = 1, 2. We assume mean service times to be finite, i.e. ∞ 0 t g j (t)dt < ∞ for j = 1, 2. After the completion of service at node 1 a customer decides with probability 1−p 1 to leave the system and with probability p 1 to move on to node 2 to obtain service there. After being served at node 2 all customers leave the system. We assume here that p 1 ∈ (0, 1]. For p 1 = 0 the model would reduce to the single node case considered in [7]. Note that for p 1 = 1 we have no observations of the departure process D 1 which is no problem though since then the matrix H(·) given in (7) shows a simple form which renders the estimation procedure facile.
We denote by D 1 = (σ D1 j ) j∈Z and D 2 = (σ D2 j ) j∈Z the external output processes from node 1 and node 2 respectively where (σ Di j ) j∈Z is the sequence of the successive time instants at which customers leave the system from node i, i = 1, 2. Clearly, D 1 and D 2 are point processes on R.
We assume that the arrival processes A 1 and A 2 are independent of each other and that all service times at nodes 1 and 2 are independent of each other and of the inter-jump times of the processes A 1 and A 2 . Furthermore, the routing decisions of the customers after their respective services at node 1 are taken independently of each other and of the past of the system.
We observe the incoming and outgoing customers of the system over the time interval (0, T ], i.e., our observations are part of the sequences of jump points We assume the system to be in steady state at time 0. However, the actual number of customers at the nodes at time 0 is not known. The times of movements of customers from node 1 to node 2 are not observable. Our aim is to construct nonparametric estimators of the service time distributions given by the densities g 1 (·) and g 2 (·) as well as an estimator of the routing probability p 1 . Note that we aim high here since we cannot assign the departures to the arrival points and directly measure the individual sojourn times of the customers in the system. If we could, the estimation of g 1 (·) and g 2 (·) would be straightforward. In particular, for a customer leaving the system via the departure process D 2 we cannot tell if he has entered the system via A 1 or via A 2 . Thus, at first it is not clear at all if the unknown parameters are identifiable from the given observations.
The underlying point process structure of the system leads to the following considerations: There is a jump of the output process D 1 in a time Borel set B ∈ B(R) if at least for one customer the sum of her arrival time σ A1 j to node 1 and her service time γ (1) j at node 1 is in B and she decided to leave the network after being served at node 1 instead of jumping on to node 2. Therefore, we deduce for the output process D 1 : with random Bernoulli variables Z (1−p1) j ∼ Bern(1 − p 1 ) and random variables γ (1) j ∼ g 1 (·) for all j ∈ Z. All these random variables are independent from each other. Here Z (1−p1) j corresponds to the customer's routing decision after the service at node 1 has expired. If Z (1−p1) j equals 1 she leaves the network, otherwise she jumps on to node 2.
Taking the conditional expectation of (1) conditioned on the arrival process A 1 we obtain with the theorem of monotone convergence: Analogously we derive for the output process D 2 : with random variables γ (12) j ∼ g 2 (·) and γ (2) j ∼ g 2 (·) for all j ∈ Z, all independent from each other. The reason behind is as follows. There are two different explanations for a jump of D 2 in a time set B ∈ B(R): Either for a customer the sum of her arrival time σ A1 j to node 1 and her consecutive service times γ (1) j and γ (12) j at nodes 1 and 2 is in B and she decided after being served at node 1 to jump on to node 2. Or for a customer the sum of her external arrival time σ A2 j to node 2 and her service time γ Taking the conditional expectation of (3) conditioned on the arrival processes A 1 and A 2 yields for B ∈ B(R): Summarizing (2) and (4) in a matrix notation gives for B ∈ B(R), For later purposes we compute the Fourier transform of the matrix h(·) componentby-component, where G j (λ) := ∞ 0 exp(−iλt)g j (t)dt for j = 1, 2. We recognize that the influence of the vector A := (A 1 , A 2 ) t of arrival processes on the vector D := (D 1 , D 2 ) t of departure processes is given in a linear time-invariant manner (for a precise definition see Section 4). We will study general linear time-invariant relations between multivariate point processes in the next section and then in Section 5 apply the results to the particular model given by (5) and (6) to solve our nonparametric estimation problem.

Linear time-invariant models for multivariate point processes
This section is devoted to the analysis of functional linear time-invariant relations between two multivariate point processes. These kinds of models were e.g. studied by Brillinger in the context of time series in [9], in the framework of one-dimensional processes with stationary increments in [7] and were sketched in the context of stationary interval functions in [6]. We provide here an analysis in the general framework of multivariate point processes.
Consider a functional relation R between two multivariate point processes A = (A 1 , . . . , A n ) t and D = (D 1 , . . . , D m ) t on the space R, . . .
Thus, the application of the relation R on the point process A on R yields a new point process D on R which may have a different dimension. The relation R is said to be linear if where A v denotes the shifted point process defined by A v (B) := A(B + v) for all B ∈ B(R) and v ∈ R. A linear, time-invariant operation R is also called an m × n linear filter. We will assume here that a linear filter does only describe the probability structure of the influence of a multidimensional process A on the process D up to some additional noise vector ε, which is independent from A, i.e., Taking conditional expectations conditioned on A on both sides of the equation The error process then can be expressed by We restrict ourselves here to the important class of m × n matrix linear filters taking the form where h is a nonnegative m × n-matrix with . . , n} and µ is an n-dimensional real vector. The vector µ plays the role of an additional linear trend with which in our model the number of expected points of D given A may be shifted by a constant. The integral in (9) is to be understood by components, i.e., It is obvious that the operation R is time-invariant, but only linear if µ is the zero vector. However, following Brillinger we call this model in all cases the linear time-invariant model. Altogether, our model takes the form with multivariate point processes A and D.
As in cross-spectral analysis of stationary processes the matrix h(·) in (9) is called the impulse-response function of the filter R. The Fourier transform of h(·) taken component-by-component, is called transfer function of R.
Our aim is to derive estimators for µ and H(·) and then for h(·) from observations of the point processes A and D in the time interval (0, T ]. The approach is as follows: After having introduced the relevant notations we derive a crucial relation between the matrix H(·) and spectral matrices of the point processes A and D (see (ii) of Proposition 4.4 below). We then apply smoothed periodograms as estimators for the spectra to plug-in estimators for H(·) given by equation (19) below. We show under assumptions that these estimators H (T ) (·) are asymptotically normal in Theorem 4.10. Then, we define estimators for h(·) based on the estimators H (T ) (·) and prove under additional assumptions also their asymptotic normality, see Theorem 4.11.
For the statistical analysis we make the following assumptions:  Stationarity of the process N implies that the cumulant measures of the process decouple into so-called reduced cumulant measures and Lebesgue components which only depend on the lag of the arguments (see e.g. [11] (Chapter 8) and [6]), i.e., for all k ≥ 1, a 1 , . . . , a k ∈ {1, . . . , n + m} and B 1 , . . . , B k ∈ B(R) we have . Note that the (reduced) cumulant measures are in general only signed measures. (12) directly implies for the first moment measures that there are constant vectors η N ∈ R n+m such that In particular, it follows that there are vectors η A ∈ R n and η D ∈ R m with Via the construction of measurable functions (13) can be extended in the usual way for nonnegative measurable functions f : R → R: Moreover, it follows from (12) that for the covariance measures there exist for a 1 , a 2 ∈ {1, 2, . . . , n + m} reduced covariance measuresC Na 1 Na 2 (dt) such that for nonnegative measurable functions f, g : (15) In order to define Fourier transforms of the reduced covariance measures we impose so-called mixing conditions on these measures which are kind of asymptotic independence conditions on the increments of the point process N.
Definition 4.3. The second-order spectra f Nj N k are defined as the Fourier transforms of the reduced covariance measures: For j = k, f Nj Nj is real and nonnegative and called power spectrum, for j = k, f NjN k is complex-valued and called cross spectrum. For λ ∈ R we summarize spectra of particular importance in the matrices Note that the error process ε given in (8) also defines a multivariate random measure on B(R). With the corresponding matrixC εε of reduced covariance measures we have for λ ∈ R, Important properties of the model (10) are now summarized in the next theorem.
We first note that for a nonnegative function h on R and a measureC on B(R) the convolution h * C is defined by Proof. The proof of (i) is straightforward: Choosing B = (0, 1] and taking the mean on both sides of (10) yields with (14): Note that the interchange of integral order in the third equation is valid due to the theorem of Tonelli since by assumption each component of the matrix h(s)ds is finite. For the proof of (ii) we have on the one hand for i = 1, . . . , m, j = 1, . . . , n and for all B ∈ B(R) due to (15): On the other hand we compute for i = 1, . . . , m, j = 1, . . . , n and for all B ∈ B(R): We conclude from (16) and (17) that the measuresC DiAj (·) and

1683
For the proof of (iii) we set for a function h, h(x) := h(−x) for x ∈ R. Fix i, j = 1, . . . , m and B 1 , B 2 ∈ B(R). We first note that due to (8), Next we calculate with (15): Similarly, it follows: Furthermore we have: Putting all together yields that the measuresC Dj Di (·) andC εj εi (·) + n k,l=1 (h jl * C A l A k * h ik )(·) are equal on B(R). This implies: T respectively, see Proposition 5.6 below for a discussion of these estimators. For the construction of estimators of the spectral matrices see below. Natural candidates for estimators of H(λ) and µ then are given by (b) In case of n = m = 1 the gain of the process D over the process A at frequency λ is given by: If the error-process ε is equal to zero then (ii) and (iii) yield (compare with the discussion of (37) below) This explains the term gain since in the model (10) the amplitudes of the input process A at particular frequencies are intensified by L.
The most common nonparametric estimators of second-order spectra are based on the so-called periodogram which results from taking the Fourier transform of the natural estimatorC (T ) Nj N k of the reduced covariance measureC Nj N k given by The periodogram is asymptotically unbiased, but unstable and not consistent. It is known that the periodogram ordinates at distinct frequencies near a frequency λ are asymptotically independent estimates of the spectrum (see e.g. [9], Theorem 5.2.6 for the case of time series). This leads to the fruitful idea of smoothing the periodogram in the neighborhood of λ by means of weighted averages of several periodograms: Definition 4.6. The smoothed periodogram at frequency λ ∈ R is given by with a bandwidth B T ≥ 0 fulfilling B T → 0 as T → ∞ and a weight function W which is of bounded variation, standardized, even and fulfills W (α) = 0 if |α| > 0.5.
As well-known examples for the weight function W we mention the rectangular kernel, the triangular kernel and the Epanechnikov kernel.
We recognize in Definition 4.6 the classical bias-variance trade-off of nonparametric statistics: The sum in (18) is taken over all s = 0 with 2πs . Thus, the larger B T is, the more periodograms are considered for the estimation which reduces the variance. However, the larger B T is, the more periodograms with frequencies 2πs T are considered which show a significant distance to λ. This can produce a higher bias. A way out is to let the bandwidth B T go to zero in a speed controlled manner by postulating that B T → 0 and The smoothed periodogram given by (18) is stable and consistent. For further details and proofs for periodogram based estimators we refer to [9] (in the context of time series). There are various other ways of estimating the spectrum. We point to [4] for references and for a new approach to construct a shrinkage estimator with a smaller L 2 -risk than the smoothed periodogram.
The next theorem shows that vectors of smoothed periodograms as defined in Definition 4.6 are asymptotically normal. For a proof based on the cumulant method see [7], Theorem 4.1.
Theorem 4.7. Let N be a multivariate point process on R satisfying Assumption 4.2. Assume that for all j, k, |f . . .
For B = (b ij ) (i,j)∈{1,...,m}×{1,...,n} and an s × t matrix V, the ms × nt Kronecker matrix-product B ⊗ V is defined as Useful properties of these two matrix notations are given in the following proposition, see [26].
To estimate the transfer function H of the model (10) at frequency λ ∈ R we use the plug-in estimator with second-order spectral estimators f (T ) NiNj (λ), i, j ∈ {1, 2, n + m}, constructed according to (18).
We proceed with a technical result which is essential for the proofs of the two main results in this section: Lemma 4.9. With the notations from above the following decomposition holds: Proof. Using Proposition 4.8 we have: We now turn to the first main result in this section. It shows the asymptotic behavior of the estimator for the transfer function of the linear time-invariant model (10). For an analogue in the framework of time series we refer to [9], Theorem 8.8.1. T T → 0 as T → ∞. Then, as T → ∞, we have for the estimator of the transfer function H given by (19), for λ ∈ R\{0}, Proof. We divide the proof into 3 steps.
First we show that all components of the last two summands converge to zero in probability: All components of the vectors vec 2 ) and f  (20) is a finite sum of asymptotically normally distributed variates multiplicated with variates which converge in probability to zero. Then, by the Lemma of Slutsky all components of the two summands converge to zero in probability.
The limiting distribution of √ B T T vec(H (T ) (λ) − H(λ)) therefore is completely described by the first two summands in (20). This shows the asymptotic normality of √ B T T vec(H (T ) (λ) − H(λ)) due to Theorem 4.7. The asymptotic mean clearly is the zero vector because of the unbiasedness of f Step 3: Asymptotic variance-covariance matrix. Using Proposition 4.8 we compute the asymptotic variance-covariance matrix of This yields the asymptotic covariance matrix given by with (due to Theorem 4.7) Hence, (22) is equivalent to applying Theorem 4.4(iii) in the last line.
In the linear time-invariant model it will often be desirable to have an estimate of the impulse-response function h(·) (time domain), rather than of the transfer function H(·) (frequency domain) only. This will e.g. be important for the statistical analysis of the stochastic tandem model in the next section. It follows from the definition of H(λ) in (11) that For estimation of h(u) we follow the suggestion of [7] in the univariate case and use for small C T and large Q T . The factor 1−cos(CT u) πCT u 2 is inserted there and here in the same manner as in [5]. C T plays the role of the bandwidth of the interpolation intervals. For the proof of the asymptotic normality of the estimator h (T ) (u) the crucial condition is the existence of a constant L 1 < ∞ such that sup λ∈R f AA (λ) −1 ∞ ≤ L 1 .
In the following let for a matrix B, B ∞ denote the maximum absolute row sum norm which is in particular sub-multiplicative and compatible with the vector maximum norm · ∞ . (10). Suppose that all assumptions of Theorem 4.10 are in force.

Theorem 4.11 (Asymptotic Normality for h (T ) (u)). Consider the linear timeinvariant model
Then, as T → ∞, we have for the estimator of the impulse-response function h given by (23), for u ∈ R, Note in particular that the asymptotic variance-covariance matrix is independent of the argument u.
Proof. We divide the proof into 4 steps.
Step 1: Decomposition of vec h (T ) (u) . It follows from Lemma 4.9 that for all u ∈ R: Step 2: Asymptotic behavior of term T 1 . We first show that Due to the assumptions we have for some r > 0 for each component j of vec(H(λ)): This implies with Result 2.2 in [5](page 120) (note that Bohman's term ϕ ′′ (0) here corresponds to − u 2 |vec(h(u)) j |du): Thus, we have that which, by assumption, converges to 0 as T → ∞.
Step 3: Asymptotic behavior of term T 3 .

We first show that sup
We note that under the assumptions given, we have for any ǫ > 0 as T → ∞ that sup QT q=−QT |f ). Using the Lemma of Dunford-Schwartz as in the proof of Theorem 4.10, see (21), we conclude: . Further, the mixing conditions (Assumption 4.2) imply that there is a constant K 1 < ∞ such that sup λ∈R f DA (λ) ∞ ≤ K 1 . Thus, it follows for any ǫ > 0: Further, this implies for T 3 : since (see [5] page 118) Choosing ǫ < 0.5 we conclude for the asymptotic behavior of T 3 : which converges to 0 as T → ∞ since by assumption then Q 3 Step 4: Asymptotic normality of term T 2 .
The term Γ T (u) determines the asymptotic behavior of vec(h (T ) (u)). It directly follows from Theorem 4.7 that Γ (T ) (u) is asymptotically normal of rate √ B T T with zero mean vector. Applying Theorem 4.7, Proposition 4.8 and the fact that f t AA (λ) = f AA (−λ) for all λ, the asymptotic variance-covariance matrix of Γ T (u) can be computed as follows. Note that in the computations below we have left out the term 2π W (α) 2 dα to reduce the length of the equations. Let asCov denote "asymptotic covariance".
Using again that 1−cos(CT u) πCT u 2 = O(C T ), the proof is complete.

Nonparametric estimation for the tandem model of two nodes
In this section we apply the results of the preceding section to the estimation problem in the tandem network model (Figure 1) given by (5) and (6) as discussed in Section 2. We will in particular construct estimators for the Fourier transforms G 1 and G 2 , for the routing probability p 1 , for the densities g 1 and g 2 and for the multiple coherence of the model. For all estimators we will prove their respective asymptotic normality under appropriate conditions. We see that the tandem model is represented by the linear time-invariant model stated in (10) with µ = 0, the matrix h given by (6), and A and D being two-dimensional stationary point processes. Due to Remark 4.5 the unknown parameters G 1 (λ), G 2 (λ) and p 1 are identifiable provided that the spectral matrix f AA (λ) is invertible, λ ∈ R.
Since by assumption the arrival processes A 1 and A 2 are stochastically independent of each other, f A1A2 (λ) = 0 for all λ ∈ R, which yields the spectral matrix f AA (λ), λ ∈ R, to be given by: Furthermore, the arrival process A 2 at the second node has no influence on the external departure process D 1 of the first node and vice versa, thus f D1A2 (λ) = 0 for all λ ∈ R. This implies: Estimation of the Fourier transforms G 1 (·) and G 2 (·) A comparison of (24) with (7) yields for the Fourier transform G 2 (·) of the service density at the second node and for the Fourier transform G 1 (·) of the service density at the first node Remark 5.1. Considering the spectrum as a measure of coherence between the respective processes the forms of G 1 (·) and G 2 (·) can be explained in an intuitive way: G 2 (λ) describes at frequency λ the coherence between the external input process A 2 and the external output process D 2 at node 2 after subtraction of the inner coherence of the arrival process A 2 . G 1 (·) can be interpreted similarly.
In analogy to the last section we construct estimators of G 1 (·) and G 2 (·) based on smoothed periodograms as estimators for the spectra according to (18): , λ ∈ R. 2 (λ) given by (27) and (28) for T → ∞ is then characterized by for a sequence of bandwidths (B T ) T ≥0 satisfying B T → 0, B T T → ∞ and B 5 T T → 0 as T → ∞. The asymptotic variance-covariance matrix is given by Proof. In analogy to the proof of Theorem 3.1 in [7] it follows that the point process N = (A, D) t on R fulfills Assumption 4.2 since the mean sojourn time in the network is finite. By the delta method (see e.g. [32], Theorem 3.1) asymptotic normality follows from Theorem 4.10 with the fact that The total derivative of the function φ at vec(H(λ)) is given by The asymptotic variance-covariance matrix can then be computed by which yields the stated terms.
Here u 1− α 2 denotes the (1 − α 2 )-quantile of the real-valued standard normal distribution. Clearly, the goodness of the approximation increases as T → ∞.
Example 5.4. Consider the case that the external arrival processes A 1 and A 2 are Poisson processes with constant intensities κ 1 and κ 2 , i.e., the tandem system under study consists of two nodes of type M/G/∞. It follows then for the reduced covariance measuresC AiAi (du) = κ i δ 0 (du) and thus the power spectra are given by f AiAi (λ) ≡ κi 2π , λ ∈ R, (see [11], Example 8.2(a)). Since the intensities κ i can directly be estimated by the observed arrival times of the processes A i , the estimation of the power spectra f AiAi (λ) is here straightforward without use of the periodograms, i = 1, 2. However, the estimation of the cross spectra (like f D1A1 (λ)) cannot be simplified, although the departure processes D 1 and D 2 are also Poisson processes with constant intensities (for a proof see [1], Theorem 1.3 in Chapter IV.1).
Estimation of the routing probability p 1 Now we turn to the estimation of the routing probability p 1 in the tandem network from Figure 1 in steady state. We first show how we can use the developed spectral theory to derive an asymptotically normal estimator. Equation (7) directly yields an equation for the routing probability p 1 in terms of the components of the transfer function H(·): It is interesting to see that the right hand side thus is independent of λ. However, an estimator of the form formally depends on the frequency λ under consideration. Therefore, we make a smoothing approach and define our estimator for p 1 by the minimum of the quadratic loss, i.e., p Taking the derivative of (30) with respect to p 1 yields: where the last equality is due to H 11 (λ). The second derivative of the quadratic loss (30) with respect to p 1 is positive and thus, the optimal p (T ) 1 is given by with G is by definition real valued. Since for all λ ∈ L indeed quantifies the underlying true routing behavior of customers in the tandem network.
Each summand of the numerator is presentable as function value of a differentiable mapping ϕ of a finite set of spectral estimates f (T ) DiAj (λ k ) with i, j ∈ {1, 2}, λ k ∈ R, k ∈ N, appropriate. Due to Theorem 4.7 the delta method yields that converges in distribution to a normal distribution with mean for each λ ∈ L. Thus, the numerator converges in distribution to a normal distribution with mean p 1 · λ∈L |G 1 (λ)| 2 . The asymptotic variance term s 2 can be computed based on the derivative ϕ ′ of the mapping ϕ involved following the delta method by ϕ ′ Σϕ ′t with Σ from Theorem 4.7. However, the term s 2 is too lengthy and complicated to be given here in explicit terms. By the lemma of Slutsky the proof is complete.
An alternative estimator for the routing probability p 1 based on observations of the tandem system in the time interval (0, T ] can be constructed as Intuitively, as T → ∞, the observed relative frequency of arrivals at node 1 which depart from the system after having been served at node 1 should tend to the true routing probability 1 − p 1 . Due to Theorem 4.4 we have Then we have for all k ≥ 1 and a 1 , . . . , a k ∈ {1, 2, . . . , m + n} as T → ∞: with matrix Γ given by the entries Γ aiaj = f Na i Na j (0) = 1 2πC Na i Na j (R).
Note that due to stationarity the mean E(q It follows an asymptotic result for the estimator p Proof. The proof relies on a Taylor expansion analogously to the proof of Theorem 4.10: We recognize for the asymptotic behavior of the estimator p (T ) 1 the optimal parametric rate √ T .
Estimation of the densities g 1 (·) and g 2 (·) We now turn to the estimation of the service time densities g 1 (·) and g 2 (·). It is clear from the structure of the impulse-response function h(u) defined in (6) that without knowing the routing probability p 1 the estimation of g 1 (·) and g 2 (·) is a hard problem due to the convolution involved. As we have seen this was not the case for the estimation of the Fourier transforms G 1 (·) and G 2 (·), a big well-known advantage of estimation in the frequency domain. In this subsection, we assume that p 1 is known, which is no crucial restriction given that from the previous subsection we have an estimate of p 1 with rate √ T at hand. Then, clearly, appropriate estimators for g 1 (u) and g 2 (u) are for u ∈ R with small C T and large Q T . As a direct corollary from Theorem 4.11 we obtain: 2 ). Consider the tandem model of queues given by (5) and (6) as described in Section 2. Suppose that all assumptions of Theorem 5.2 are in force. Moreover, assume for j = 1, 2 that inf λ∈R f Aj Aj (λ) ≥ 1 L1 for a constant L 1 < ∞. In addition, let |u| 2 h(u) ∞ du < ∞ and |λ| r H(λ) ∞ dλ < ∞ for some r > 0. Further, consider Then, as T → ∞, we have for the esti-mators of the service time densities g 1 (·) and g 2 (·) given by (34) and (35) for u ∈ R,

Estimation of the multiple coherence
In the rest of this section we will take a closer look at the role of the error process ε in our setting, The multiple coherence of the output process D i with the input processes A 1 and A 2 at frequency λ ∈ R is defined as compare with [9] p.296, for the time series case. The notion multiple coherence for processes is an analogy to the correlation of random variables and represents a measure of a model's goodness of fit. It expresses the extent to which the output process D i is determined by the input processes A 1 and A 2 through linear time-invariant operations. Theorem 4.4(iii) implies that for i = 1, 2 and λ ∈ R, Since power spectral matrices are non-negative definite we have 0 ≤ |R DiA (λ)| 2 ≤ 1 for all λ ∈ R, i = 1, 2. We construct an estimator for the quantity |R DiA (λ)| 2 in the obvious plug-in manner analogously to the estimator of the transfer function, with spectral estimators as defined in (18). We have the following asymptotic result: Theorem 5.9. If in addition to the conditions of Theorem 5.2 it holds f DiDi (λ) > 0 for λ ∈ R, then the asymptotic behavior of the estimator |R (T ) DiA (λ)| 2 for i = 1, 2 is as T → ∞ given by: Proof. The proof runs analogously to the proof of Theorem 4.10. The main step to show is: In case of |R DiA (λ)| 2 ≈ 0, relation (36) implies that the power spectrum of the error process ε i at frequency λ is nearly as large as the corresponding value of the power spectrum of the output process D i . This case is called incoherent, since the error variance is not reduced by the input processes A 1 and A 2 .
In case of |R DiA (λ)| 2 ≈ 1, called the perfectly coherent case, there is a strong linear relationship between the input processes A 1 and A 2 and the output process D i at frequency λ ∈ R and the spectrum of the error process ε i at λ is close to zero.
Thus, the multiple coherence |R DiA (λ)| 2 is seen to provide a measure of the degree of linear relationship between the process D i and the input processes A 1 and A 2 at frequency λ.
Recall the classical spectral representation theorem of Cramér for stationary processes, see e.g. [9] or [10]. In a modified version it says that the stationary process ε i can be written in the form, for B ∈ B(R), with |K(λ)| 2 = f εiεi (λ) for all λ ∈ R and a complex-valued stochastic process Z with orthonormal increments. Note that ε i ((0, t]) − lim s↑t s =t ε i ((0, s]) is the jump level of ε i at time point t. The spectral representation shows a decomposition of the process into periodic components (sine and cosine) of frequencies λ and random amplitudes with variance |K(λ)| 2 = f εiεi (λ). It enlightens the meaning of the name power spectrum: The value f εiεi (λ) gives the power of the component of frequency λ.
Assuming that |R DiA (λ)| 2 is close to 1 for almost all λ ∈ R, the error process ε i itself is reduced to a small quantity close to zero according to the spectral representation (37). By definition of ε i this implies the approximation, Thus, for all time instants t the number of external departures from node i is nearly equal to the expected number of external departures from node i upon complete knowledge of the arrival processes A 1 and A 2 . Equality would mean that the output process D i is entirely determined by the knowledge of A 1 and A 2 which would imply that the service times are deterministic, i.e. the service densities are Dirac measures, and that also the routing decisions are deterministic, i.e. p 1 equals 0 or 1. Clearly, the observed data of the external arrival and departure processes would also indicate this situation.

Generalizations of the tandem model of two nodes
We sketch here how our statistical approach of the tandem model studied so far can be extended to more general queueing models. We always assume that we are able to observe all external arrival and external departure processes at the G/G/∞ nodes of the networks, but have no information about the movements of the customers between the nodes in the system. The aim is to estimate the service time distributions at the nodes as well as the routing probabilities.

Feedforward models of more than two nodes
Let us consider a tandem system of three nodes according to Figure 2. The structure of customer movements leads to the following equations for Borel time sets B ∈ B(R): j,l + γ with Bernoulli variables Z p 10 p 10 +p 12 j ∼ Bern( p10 p10+p12 ) as well as random variables γ (1) j , (γ (11) j,l ) l=1,2,...,W p 11 j ∼ g 1 (·), and γ j are geometric random variables with success probability (1−p 11 ), i.e. P (W p11 j = N ) = p N 11 (1−p 11 ) for N ≥ 0, for all j ∈ Z. Analogously, W p22 j and W p22 j are geometric with success probability (1 − p 22 ) for all j ∈ Z. All random variables are independent from each other. In a matrix notation this yields The corresponding transfer function H(·) is then (note that |G 1 (λ)|p 11 < 1): This yields that for identification of the five unknown parameters p 10 , p 12 , p 20 , G 1 (λ), G 2 (λ), λ ∈ R, there are just three equations at hand. With respect to the routing probabilities we first observe that p12 p10 = H21(λ) H11(λ)H22(λ) , thus, if the feedback probability p 11 is unknown we can just determine the ratio between p 12 and p 10 , not their absolute values. Furthermore, it is technically and intuitively clear that we cannot separate p 11 and G 1 (λ) (and p 22 and G 2 (λ) respectively) here without any further information. Therefore, for tandem systems with positive feedback probabilities according to Figure 3 with our method we may just identify the total sojourn time distributions of customers at the nodes, given by Asymptotically normal estimators for the unknown parameters may then be constructed according to the relations , and .
The case of queueing systems of two nodes with positive routing probability p 21 for jumping back from node 2 to node 1 is much more involved. The statistical analysis of such stochastic networks of general topology is postponed to a following paper.

Simulations
We illustrate in this section the practicality of the estimators derived with a simulation study performed in R. We simulated the tandem network of two nodes given in Figure 1 under the following conditions. The external input processes A 1 and A 2 are assumed to be self-exciting processes with exponential decay (see [20] and [27]), i.e., they are characterized by their conditional intensities (with ν i ≥ 0, τ i < β i ), for t ∈ R, i = 1, 2, These processes belong to the class of Hawkes' processes which constitute the binary analogues of autoregressive time series. They have been extensively applied in various fields in science and engineering such as e.g. statistical seismology, auditory physiology, and electrical engineering, see [24]. With the chosen exponential decay the corresponding intensity is a Markov process and the covariances of the sequence of jump intervals are positive and decreasing, see [27]. In [28] an algorithm to simulate the process is proposed which we have followed. The advantage of this choice of arrival processes lies in the fact that we can easily show the sensitivity of our estimators with respect to the inner dependence of the arrival processes. If we increase β i , then the inner dependence of the process A i decreases (compare with [28], Figures 7 and 9) and intuitively the estimators should then behave in a better way. The service times are chosen to be exponentially distributed with parameter θ 1 = 0.8 (node 1) and θ 2 = 3 (node 2) respectively. The routing probability is set equal to p 1 = 0.6. The weight function W used in the calculations of the smoothed periodograms as estimators for the spectra according to (18) is assumed to be the triangular kernel given by W (x) := (2 − 4|x|) · 1 [−0.5,0.5] (x) for x ∈ R.
T is set to 20000. The bandwidth B T is taken of order O(T −0.25 ) fulfilling the conditions of the theorems. The characteristic function of the exponential distribution with parameter θ is explicitly computable. For the Fourier transform of the service time distribution at node j follows for j = 1, 2: G j (λ) = θj θj+iλ which yields for its real part Re(G j (λ)) = θ 2 j θ 2 j +λ 2 and for its imaginary part Im(G j (λ)) = −θjλ θ 2 j +λ 2 . In the plots for the estimation of the real and imaginary parts of the Fourier transforms G j (·) the black solid lines give the true functions and the dashed lines display the estimated functions. In addition we have plotted confidence intervals as grey solid lines. We have to stress here that these are just pointwise confidence intervals given for each λ which we have connected (compare with Remark 5.3). Up to now we have no uniform asymptotic normality results for our estimators at hand and thus no uniform confidence bands (over all λ) are available. Furthermore, we have estimated the entries of Σ which occur in the calculation of the confidence intervals which is the reason for these intervals sometimes being not sufficiently accurate. Nevertheless, the simulation results illustrate the good quality of our estimators.
We present here results for two different parameter settings for the arrival processes A 1 and A 2 . In the first case called "β large" the processes are chosen to show little inner dependence. Here τ 1 = 0.4, β 1 = 18, ν 1 = 1.6 and τ 2 = 0.2, β 2 = 15, ν 2 = 1.5. On the contrary in the second case called "β small" the processes show more inner dependence, the parameters are set to τ 1 = 1, β 1 = 1.2, ν 1 = 0.29 and τ 2 = 0.9, β 2 = 1, ν 2 = 0.16. In both cases in our simulated time interval approximately 30000 customers entered the network via A 1 and A 2 respectively so that the situations can be well compared. Figure 4 illustrates the different inner dependence structure of the processes in the cases "β large" (above) and "β small" (below). Here a cross indicates a customer having entered the system via the process A 1 and a circle indicates a customer having entered the system via A 2 . In case of "β small" the greater inner dependence is shown by obvious clusters in the set of arrival points of A 1 and A 2 , whereas in case of "β large" the occurrence of arrival points is more regular. Figures 5 and 7 present the estimators given in (27) and (28) for the real and imaginary parts of G 1 (·) (above) and G 2 (·) (below) in case of "β large" and "β small". We observe that as expected the behaviour of the estimators is better in case of "β large". We notice that for "β small" the estimators are in particular inaccurate in the frequency interval (0, 1), and since the asymptotic variance is also estimated the confidence intervals are here also not satisfactory. Furthermore, note that in the definition of the estimator for G 1 (λ) the estimator for the cross-spectrum f D2A2 (λ) appears in the denominator whose absolute value can become very small. Simulations have shown that every time the absolute value of f D2A2 (λ) is near to zero the estimator for G 1 (λ) behaves badly, which is no surprise. Thus, we have stabilized the estimator G (T ) 1 (·) in our simulations by adding the small term 10 −3 in the denominator of the second summand. Then the behaviour of G (T ) 1 (·) is very good as Figures 5 and 7 show. An even better estimation result for G 1 (·) can be achieved by computing the estimator p 1 (T ) from (33) and plugging it into the term suggested by the relation H 11 (λ) = G 1 (λ)(1−p 1 ), compare with (7). By standard computations the following asymptotic result can be proved for the estimator G 1 (T ) (λ): The corresponding plots are given in Figure 6 ("β large") and 8 ("β small"). Figure 9 is based on 200 Monte Carlo simulations for two different estimators of the routing probability p 1 . The parameters for the arrival processes are chosen as in case of "β small", but with T = 2000. Instead of using the optimal estimator p (T ) 1 defined in (31) we averaged the values of the estimator p 1 (T ) given in (29) over five different frequencies λ. This considerably reduced the computational complexity. The result is shown in Figure 9 (above). The plots below correspond to the intuitive estimator p (T ) 1 given in (33). The histograms and the q-q-plots show a very good performance for both estimators; as expected from the theoretical results the intuitive estimator has a smaller variance.