SAMPLING OF RANDOM DATA STREAMS

Modern telecommunication networks transmit common data streams in which data bursts, consisting of packets that in turn consist of individual bits, are multiplexed from various traffic sources. The more data is transmitted through a transmission medium (optical fibre), the more frequently bursts occur; the less data, the more rarely they follow one another. If the amount of data transmitted in a network branch is to be monitored in order to find out to what extent the branch is occupied, it is not necessary to capture each information unit (each packet or even each individual bit). It is sufficient to record at certain time intervals, that is, with a certain sampling frequency, whether or not a data burst is present in the transmission. The paper deals with these sampling intervals.


Introduction
Data bursts of many types, coming from various traffic sources, travel on a transmission route between two network nodes. The data burst length is considered a random variable. According to Fig. 1, the burst train in a common data stream can be described as a random process $X(t)$ consisting of a series of randomly occurring rectangular pulses with a random amplitude that takes only 2 values, $x_0 = 0$ and $x_1 = 1$, and with a random duration $\tau$ that takes the values $\tau_k$, $k = 1, 2, \dots$ The mean level or the average value of this random process also expresses the average traffic use $y$ of a communication channel, i.e. the part of the total, theoretically infinitely long time $T$ during which the channel was occupied by the transmitted data bursts. In other words, the value $y$ also expresses the probability that the communication channel will be found occupied at any sampling moment $t_v$.
Further, let $\lambda$ be the number of seizures of the communication channel by the transmitted data bursts and $\mu$ the number of releases of the communication channel during a time unit. The ratio
$$a = \frac{\lambda}{\mu} \tag{2}$$
represents the traffic offered to the communication channel. The inverse value of $\mu$ is the average burst length:
$$\bar{\tau} = \frac{1}{\mu}. \tag{3}$$
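The quantities introduced above can be estimated from a recorded burst train. The following is a minimal sketch, assuming the bursts are available as (start, end) time pairs inside an observation window of known length; the function names and the example values are hypothetical and do not come from the paper.

```python
def traffic_use(bursts, total_time):
    """Fraction of the observation time during which the channel is occupied (y)."""
    busy_time = sum(end - start for start, end in bursts)
    return busy_time / total_time


def mean_burst_length(bursts):
    """Average burst duration, i.e. an estimate of the inverse release rate."""
    return sum(end - start for start, end in bursts) / len(bursts)


# Hypothetical burst train: three bursts observed during 10 time units.
example_bursts = [(0.0, 1.0), (3.0, 5.0), (8.0, 9.0)]
y = traffic_use(example_bursts, total_time=10.0)
tau_mean = mean_burst_length(example_bursts)
print(f"traffic use y = {y:.3f}, mean burst length = {tau_mean:.3f}")
```

For a finite window the value is only an estimate of the mean level; the definition in the text refers to a theoretically infinite observation time.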

Theory
2.1 Random Process Characteristics
The occurrence of data bursts in the common data stream can be considered a Markov random process because the features of such a process are fulfilled:
- the occurrence of a data burst is fully random,
- bursts last for a random time,
- the probability that a data burst starts or ends during a very small time interval $\Delta t$ is proportional to the duration of this interval.
These features represent a hidden deterministic element in the random process. As a consequence, the basic characteristics of that random process, namely the correlation and covariance functions, the dispersion and the spectrum, can be derived explicitly [1], [2], [3], [4].
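The features listed above can be illustrated by a small simulation. This is a minimal sketch, assuming exponentially distributed idle and busy times (the memoryless choice that matches the Markov assumption); the function name, the rates and the seed are hypothetical. For such a two-state process the long-run occupied fraction is the standard value lambda / (lambda + mu), which the simulation should approach.

```python
import random


def simulate_occupancy(seizure_rate, release_rate, cycles, seed=0):
    """Simulate idle/busy cycles and return the occupied fraction of time."""
    rng = random.Random(seed)
    idle_time = 0.0
    busy_time = 0.0
    for _ in range(cycles):
        # Exponential sojourn times: memoryless, as the Markov model requires.
        idle_time += rng.expovariate(seizure_rate)
        busy_time += rng.expovariate(release_rate)
    return busy_time / (idle_time + busy_time)


estimated = simulate_occupancy(seizure_rate=1.0, release_rate=3.0, cycles=50_000)
expected = 1.0 / (1.0 + 3.0)  # lambda / (lambda + mu)
print(f"simulated occupancy {estimated:.3f}, steady-state value {expected:.3f}")
```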

1) Solution of Random Process
The random process takes only 2 states: $S_0$, the communication channel is free, and $S_1$, the communication channel is occupied.
The process stays in the zero state $S_0$ if no data burst occurs during $\Delta t$, the probability of which is:
$$p_{00} = 1 - \lambda\,\Delta t.$$
The process passes from the state $S_0$ to the state $S_1$ (the communication channel is seized) if a data burst occurs during $\Delta t$. The probability of this is:
$$p_{01} = \lambda\,\Delta t.$$
The process stays in the state $S_1$ if a data burst does not finish during $\Delta t$, the probability of which is:
$$p_{11} = 1 - \mu\,\Delta t.$$
The process passes from the state $S_1$ to the state $S_0$ (the communication channel is released) if a data burst finishes during $\Delta t$. The probability of that is:
$$p_{10} = \mu\,\Delta t.$$
The probability that more than 1 change occurs in the state of the channel during $\Delta t$ is taken to be zero.
The probabilities $p_{00}$, $p_{01}$, $p_{10}$, $p_{11}$ are arranged into the matrix of transition probabilities:
$$P = \begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix} = \begin{pmatrix} 1 - \lambda\,\Delta t & \lambda\,\Delta t \\ \mu\,\Delta t & 1 - \mu\,\Delta t \end{pmatrix} = J + A\,\Delta t, \qquad A = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix}.$$
The matrix $J$ is the unit matrix. The matrix $A$ is the matrix of transition intensities. In general, the elements of the matrix $A$ can be time dependent. But if the random process is supposed to be ergodic, i.e. the Markov process is homogeneous, there is no time dependence.

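By solving the basic equation of the Markov random process
$$\frac{d\mathbf{p}(t)}{dt} = \mathbf{p}(t)\,A, \qquad \mathbf{p}(t) = \big(p_0(t),\ p_1(t)\big),$$
it can be calculated with which probability $p_0(t)$ the communication channel will be free and with which probability $p_1(t)$ it will be occupied by a data burst at a time $t$. The equation is solved by means of the Laplace transform:
$$s\,\mathbf{p}(s) - \mathbf{p}(0) = \mathbf{p}(s)\,A.$$
Hence, when the transform is carried out,
$$\mathbf{p}(s) = \mathbf{p}(0)\,(sJ - A)^{-1}. \tag{13}$$
The sought vector $\mathbf{p}(t)$ is obtained by the inverse Laplace transform:
$$\mathbf{p}(t) = \mathcal{L}^{-1}\big\{\mathbf{p}(0)\,(sJ - A)^{-1}\big\}.$$
First, let $sJ - A$ be written out:
$$sJ - A = \begin{pmatrix} s + \lambda & -\lambda \\ -\mu & s + \mu \end{pmatrix}.$$
The inverse matrix:
$$(sJ - A)^{-1} = \frac{1}{s\,(s + \lambda + \mu)} \begin{pmatrix} s + \mu & \lambda \\ \mu & s + \lambda \end{pmatrix}.$$
To carry out the inverse Laplace transform, the elements of the inverse matrix are decomposed into sums of partial fractions. Then the basic system of equations (13) acquires the form:
$$p_0(s) = \frac{p_0(0)}{\lambda + \mu}\left[\frac{\mu}{s} + \frac{\lambda}{s + \lambda + \mu}\right] + \frac{\mu\,p_1(0)}{\lambda + \mu}\left[\frac{1}{s} - \frac{1}{s + \lambda + \mu}\right],$$
$$p_1(s) = \frac{\lambda\,p_0(0)}{\lambda + \mu}\left[\frac{1}{s} - \frac{1}{s + \lambda + \mu}\right] + \frac{p_1(0)}{\lambda + \mu}\left[\frac{\lambda}{s} + \frac{\mu}{s + \lambda + \mu}\right].$$
The result is obtained by the inverse Laplace transform, using $p_0(0) + p_1(0) = 1$:
$$p_0(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda\,p_0(0) - \mu\,p_1(0)}{\lambda + \mu}\,e^{-(\lambda + \mu)t}, \qquad p_1(t) = \frac{\lambda}{\lambda + \mu} - \frac{\lambda\,p_0(0) - \mu\,p_1(0)}{\lambda + \mu}\,e^{-(\lambda + \mu)t}. \tag{19}$$
These equations describe the Markov random process with 1 channel: they give the probability with which the random process will be in either of the two states at an arbitrary future time $t$.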
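The time evolution of the state probabilities can be cross-checked numerically. The sketch below assumes the standard closed-form solution of a two-state Markov process with seizure rate `lam` and release rate `mu`, and compares it with a step-by-step propagation through the transition matrix J + A*dt; all names and parameter values are hypothetical.

```python
import math


def state_probabilities(lam, mu, p1_start, t):
    """Closed-form (p0(t), p1(t)) of the two-state Markov process."""
    steady_p1 = lam / (lam + mu)
    p1 = steady_p1 + (p1_start - steady_p1) * math.exp(-(lam + mu) * t)
    return 1.0 - p1, p1


def step_probabilities(lam, mu, p1_start, t, dt=1e-4):
    """Propagate p <- p * (J + A*dt) in small steps as a numerical cross-check."""
    p0, p1 = 1.0 - p1_start, p1_start
    for _ in range(round(t / dt)):
        p0, p1 = (p0 * (1.0 - lam * dt) + p1 * mu * dt,
                  p0 * lam * dt + p1 * (1.0 - mu * dt))
    return p0, p1


# Channel occupied at t = 0, evaluated at t = 0.3 time units.
exact = state_probabilities(lam=2.0, mu=5.0, p1_start=1.0, t=0.3)
stepped = step_probabilities(lam=2.0, mu=5.0, p1_start=1.0, t=0.3)
print("closed form:", exact)
print("stepped    :", stepped)
```

The two results should agree up to the discretization error of the step size `dt`, and for large `t` both approach the steady-state values mu / (lam + mu) and lam / (lam + mu).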

2) Correlation and Covariance Function
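At a time $t$, the random process $X(t)$ takes the value of a random variable $X$ that can acquire only 2 values: $x_0 = 0$ if a data burst is not present or $x_1 = 1$ if a data burst is present. Thus, there are 4 combinations of products that determine the correlation function $R(\tau)$:
$$R(\tau) = \sum_{i=0}^{1} \sum_{j=0}^{1} x_i\,x_j\,P\{X(t) = x_i \cap X(t + \tau) = x_j\}.$$
Since $x_0 = 0$, only the last of the 4 terms ($i = j = 1$) has a non-zero value. As the random process is also stationary and ergodic, the result does not depend on the value $t$, so $t = 0$ can be chosen, too. Then:
$$R(\tau) = P\{X(0) = 1 \cap X(\tau) = 1\}.$$
The probability $P\{X(0) = 1 \cap X(\tau) = 1\}$ can be obtained from the formula for the conditional probability:
$$P\{X(0) = 1 \cap X(\tau) = 1\} = P\{X(0) = 1\}\;P\{X(\tau) = 1 \mid X(0) = 1\}. \tag{22}$$
Here, supposing stationarity of the random process, $P\{X(0) = 1\}$ is the probability that the communication channel is occupied at the beginning (at the time $t = 0$). The conditional probability $P\{X(\tau) = 1 \mid X(0) = 1\}$ that the communication channel, occupied at the beginning, is also occupied after a time $\tau \ge 0$ can be calculated from equation (19), in which $p_0(0) = 0$ and $p_1(0) = 1$:
$$P\{X(\tau) = 1 \mid X(0) = 1\} = \frac{\lambda}{\lambda + \mu} + \frac{\mu}{\lambda + \mu}\,e^{-(\lambda + \mu)\tau} = \frac{a}{1 + a} + \frac{1}{1 + a}\,e^{-(1 + a)\tau/\bar{\tau}}, \tag{24}$$
where the offered traffic $a = \lambda/\mu$ and the average burst length $\bar{\tau} = 1/\mu$ from equations (2) and (3) were used, too.
Then the correlation function $R(\tau)$, using (22) and (24), will be:
$$R(\tau) = P\{X(0) = 1\}\left[\frac{a}{1 + a} + \frac{1}{1 + a}\,e^{-(1 + a)\tau/\bar{\tau}}\right].$$
The following relationship holds between the traffic $a$ offered to a communication channel and its traffic use $y$:
$$y = a\,(1 - p_1), \tag{26}$$
where $p_1$ is the probability that the channel is occupied in the steady state:
$$p_1 = \frac{\lambda}{\lambda + \mu} = \frac{a}{1 + a}. \tag{27}$$
Hence $y = a/(1 + a) = p_1 = P\{X(0) = 1\}$ and $\lambda + \mu = (1 + a)/\bar{\tau} = 1/\big((1 - y)\,\bar{\tau}\big)$. Substituting into the correlation function, we obtain:
$$R(\tau) = m^2 + m\,(1 - m)\,e^{-(\lambda + \mu)|\tau|},$$
where $m = y$ is the mean value (mean level) of the random process. So the covariance function will be:
$$K(\tau) = R(\tau) - m^2 = y\,(1 - y)\,e^{-(\lambda + \mu)|\tau|}. \tag{32}$$
Dispersion is:
$$\sigma^2 = K(0) = y\,(1 - y). \tag{33}$$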
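The correlation and covariance functions of the 0/1 burst process can be evaluated directly. The sketch below assumes the two-state Markov model with seizure rate `lam` and release rate `mu`, for which the steady-state occupancy is y = lam / (lam + mu) and the covariance decays exponentially with the rate lam + mu; function names and parameter values are hypothetical.

```python
import math


def covariance(tau, lam, mu):
    """K(tau) = y(1-y) * exp(-(lam+mu)*|tau|)."""
    y = lam / (lam + mu)
    return y * (1.0 - y) * math.exp(-(lam + mu) * abs(tau))


def correlation(tau, lam, mu):
    """R(tau) = y^2 + K(tau)."""
    y = lam / (lam + mu)
    return y * y + covariance(tau, lam, mu)


lam, mu = 1.0, 3.0
y = lam / (lam + mu)
print("R(0)  =", correlation(0.0, lam, mu), "(equals y because X*X = X)")
print("K(0)  =", covariance(0.0, lam, mu), "(dispersion y*(1-y))")
print("R(10) =", correlation(10.0, lam, mu), "(approaches y^2)")
```

Two sanity checks follow from the 0/1 amplitude: R(0) equals the mean level y, and for large lags the samples decorrelate so that R tends to y squared.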

3) Spectrum
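The spectrum of a random process can be calculated by means of the Wiener-Khinchin theorem as the Fourier transform of the covariance function:
$$S(\omega) = \int_{-\infty}^{\infty} K(\tau)\,e^{-j\omega\tau}\,d\tau.$$
Let $\alpha = \lambda + \mu$ denote the decay rate of the exponential in equation (32), so that $K(\tau) = \sigma^2 e^{-\alpha|\tau|}$, where $\sigma^2$ is given by (33). Then
$$S(\omega) = \sigma^2 \int_{-\infty}^{\infty} e^{-\alpha|\tau|}\,e^{-j\omega\tau}\,d\tau = \frac{2\,\sigma^2\,\alpha}{\alpha^2 + \omega^2}.$$
Fig. 2: Shape, covariance function and spectrum of the random process.
The data burst train, the shape of the covariance function and the corresponding spectrum are drawn in Fig. 2 for comparison. The spectrum in Fig. 2 c) has the shape of a probability distribution, which can advantageously be used to determine the sampling frequency.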
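The shape of the spectrum can be verified numerically. The sketch below assumes an exponential covariance K(tau) = variance * exp(-alpha*|tau|) and the Fourier convention without a 1/(2*pi) prefactor, for which the transform is the Lorentzian 2*variance*alpha / (alpha^2 + omega^2); it compares this closed form with a trapezoidal evaluation of the Fourier integral. Names and parameter values are hypothetical.

```python
import math


def spectrum_closed_form(omega, variance, alpha):
    """Lorentzian spectrum of an exponentially decaying covariance."""
    return 2.0 * variance * alpha / (alpha ** 2 + omega ** 2)


def spectrum_numeric(omega, variance, alpha, steps=200_000):
    """Trapezoidal Fourier integral of K(tau) = variance * exp(-alpha*|tau|)."""
    upper = 40.0 / alpha  # the integrand is negligible beyond this lag
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        tau = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * variance * math.exp(-alpha * tau) * math.cos(omega * tau)
    # K is even, so the two-sided integral is twice the one-sided cosine integral.
    return 2.0 * h * total


variance, alpha, omega = 0.1875, 4.0, 3.0
print("closed form:", spectrum_closed_form(omega, variance, alpha))
print("numeric    :", spectrum_numeric(omega, variance, alpha))
```

A different Fourier convention only rescales the spectrum by a constant, which does not affect its shape or the normalized distribution derived from it.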

Sampling Frequency
If the spectrum $S(\omega)$ is to be regarded as a probability distribution, it must be normalized so that
$$\int_{-\infty}^{\infty} f(\omega)\,d\omega = 1.$$
So, with $\alpha = \lambda + \mu$, the distribution will have the shape:
$$f(\omega) = \frac{1}{\pi}\,\frac{\alpha}{\alpha^2 + \omega^2},$$
which is the Cauchy distribution, and it expresses the probability density of occurrence of amplitudes of particular spectral components. Then the elementary rectangle $f(\omega)\,\Delta\omega$ in Fig. 3 gives the probability with which the components in the interval from $\omega$ to $\omega + \Delta\omega$ occur in the spectrum. All spectral components in the range from $-\infty$ to $+\infty$ occur with the probability $P = 1$ because the integral of $f(\omega)$ over this range equals 1.
That means that if all changes in the course of the random process are to be caught with total certainty, the process must be sampled with an infinitely high frequency. Therefore, let the sampling frequency be limited by the frequency $\omega_v$; the probability $P$ with which each change in the random process is caught at this sampling frequency is then:
$$P = \int_{-\omega_v}^{\omega_v} f(\omega)\,d\omega = \frac{2}{\pi}\arctan\frac{\omega_v}{\alpha}.$$
The sampling frequency or the sampling period can be calculated from this equation for the given probability $P$:
$$\omega_v = \alpha\,\tan\frac{\pi P}{2}, \tag{41}$$
$$T_v = \frac{2\pi}{\omega_v} = \frac{2\pi}{\alpha\,\tan(\pi P/2)}. \tag{42}$$
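The relation between the limiting frequency and the capture probability can be sketched as follows. The code assumes a Cauchy-shaped spectral density with scale `alpha`, so the probability mass inside [-omega_v, omega_v] is (2/pi)*arctan(omega_v/alpha); the conversion of the angular frequency to a period as 2*pi/omega_v is an assumption, and all names and values are hypothetical.

```python
import math


def capture_probability(omega_v, alpha):
    """Cauchy probability mass inside [-omega_v, omega_v]."""
    return 2.0 / math.pi * math.atan(omega_v / alpha)


def sampling_angular_frequency(probability, alpha):
    """Invert the relation above; math.tan expects its argument in radians."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return alpha * math.tan(math.pi * probability / 2.0)


def sampling_period(probability, alpha):
    """Assumed conversion T_v = 2*pi / omega_v."""
    return 2.0 * math.pi / sampling_angular_frequency(probability, alpha)


alpha = 1000.0  # hypothetical scale parameter in 1/s
omega_v = sampling_angular_frequency(0.95, alpha)
print(f"omega_v = {omega_v:.1f} rad/s, T_v = {sampling_period(0.95, alpha) * 1e6:.1f} us")
print(f"round trip P = {capture_probability(omega_v, alpha):.4f}")
```

Because the tangent grows without bound as P approaches 1, the required sampling frequency rises very steeply for probabilities close to certainty.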

Analysis of Results
Equations (41) and (42) give the sampling frequency and the sampling interval, respectively, for the case that data bursts occur totally randomly and their length is totally random, too. This also covers the limit situations in which data bursts follow each other consecutively and the gap between them is infinitely small ($y \to 1$), or the average data burst length is infinitely short ($\bar{\tau} \to 0$), or both cases occur simultaneously.

Practical Application
An example where this theory could be applied is the gigabit passive optical network (GPON) [5], [6], [7], [8], [9]. The GPON is an access network where several subscribers (up to 128) share the same optical fibre. The upstream communication between an optical network termination (ONT) on the subscriber side and the broadband node, the optical line termination (OLT) on the network side, is carried by bursts of optical pulses called transmission containers (T-CONTs), which carry the data streams. It is necessary to prevent T-CONTs of simultaneously communicating ONTs from overlapping. Therefore the OLT assigns a time window to each T-CONT. This window defines the T-CONT length and the time instants at which an ONT may send T-CONTs with data. To determine these properly, the OLT either receives and decodes dynamic bandwidth assignment (DBA) reports coming from the ONTs, or monitors the traffic load on each branch in the optical network. The first method is called the status DBA and the second the non-status DBA.
The status DBA requires an exchange of various control information between an OLT and an ONT in the headers of data batches. The non-status DBA does not require any information exchange, but it is necessary to know how often the traffic status shall be sampled on a network branch. Here formula (41) or (42) can be helpful.
The GPON bit rate is 1 244,16 Mbit/s in the upstream. The shortest data unit is one 125 μs long frame, which consists of 19 440 bytes. The longest data unit in the upstream is $2^{16}$ = 65 536 bytes, which lasts 421,4 μs. Let us take the mean value of these 2 figures for the average burst length:
$$\bar{\tau} = \frac{125 + 421{,}4}{2}\ \mu\text{s} = 273{,}2\ \mu\text{s}.$$
Considering P = 0,95, the sampling interval is obtained from (42). The longest sampling interval $T_{v,max}$ occurs when there is no traffic load (y = 0). It is necessary to take into consideration that the maximum and minimum sampling intervals $T_{v,max}$ and $T_{v,min}$ are calculated on condition that the data bursts occur randomly on the optical fibre and their length is also random in the range from 125 to 421,4 μs.
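The burst durations quoted above follow directly from the upstream bit rate. A minimal sketch of this arithmetic, using only the figures stated in the text (the function and variable names are hypothetical):

```python
UPSTREAM_BIT_RATE = 1_244_160_000  # bit/s, GPON upstream rate from the text


def duration_us(num_bytes, bit_rate=UPSTREAM_BIT_RATE):
    """Transmission time of num_bytes in microseconds."""
    return num_bytes * 8 / bit_rate * 1e6


shortest = duration_us(19_440)   # one frame
longest = duration_us(2 ** 16)   # longest upstream data unit
mean_burst = (shortest + longest) / 2
print(f"shortest {shortest:.1f} us, longest {longest:.1f} us, mean {mean_burst:.1f} us")
```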
Even the minimum sampling interval of 4 μs is 5000 times longer than if each particular bit should be
If a is expressed by means of y from (26) and (27), we obtain:
$$a = \frac{y}{1 - y}.$$

Fig. 3: Distribution of amplitudes of spectral components in the random process.
To avoid a collision of 2 consecutive T-CONTs, a guard interval consisting of 32 bits is introduced in the upstream. It corresponds to a guard time of 32/1 244 160 000 s = 25,72 ns. During this time, the ONT emits no energy into the optical fibre. That means there are 4-byte gaps between the 65 536-byte bursts. That represents the traffic load of