TREE-ORIENTED DATA ACQUISITION ARCHITECTURE

A hierarchical system of network nodes is a suitable solution for collecting and pre-processing data from a large number of end nodes. In contrast to a flat (single-layer) architecture, it uses special intermediary nodes called summarization nodes. These nodes have to be placed suitably in the network to enable efficient data collection, and their number in the hierarchy is one of the key parameters of the architecture. The article deals with the design of the tree architecture, with its optimisation and with the problem of a limited number of summarization nodes.


Introduction
There are several ways to collect and process data in a network environment. The first model is called centralized: there is only one data centre, where all pieces of information are collected, processed and made available. The second is a hierarchical system formed by a tree of servers holding local data; using the links among them, the requested information can be found through the tree hierarchy. The third is a distributed model, where the pieces of information are distributed among equivalent data centres and the requested information can be obtained through one sophisticated directory service with a single account. Many applications use the plain centralized model. Data acquisition and processing cause no problem provided the number of data sources is fairly low and the data flows are weak and infrequent. When these conditions are not fulfilled, either the centre itself or the data links to it can be overloaded, or the permitted data transmission frequency becomes very low. When data acquisition is only an auxiliary procedure of the service, the bandwidth available for it is strictly limited and the situation becomes even worse. This is the case of applications such as IPTV, where the main procedure of the service is multimedia streaming using the RTP protocol and multicast transmission, while the collection of session quality parameters using the RTCP protocol is an optional, though useful, supplementary service [1], [2], [3]. The transmission capacity of RTCP is limited to 5 % of the total service bandwidth, which causes large delays in sending RTCP (feedback) data from each receiver in large-scale media streaming services based on Source-Specific Multicast (SSM) [7]. A similar problem arises in other data acquisition applications in large-scale systems.
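To make the effect of the 5 % limit concrete, the sketch below computes only the deterministic part of the RTCP report interval defined in RFC 3550 for a receiver, assuming senders make up at most 25 % of the members (so receivers share 75 % of the RTCP bandwidth) and ignoring randomization and timer reconsideration. The 4 Mbit/s session bandwidth is a hypothetical value; the 736-bit report length is the receiver report size from the example in [7]. The function name is illustrative.

```python
def receiver_report_interval(n_receivers, session_bw_bps, report_bits, t_min=5.0):
    """Deterministic RTCP interval (s) of one receiver, after RFC 3550:
    RTCP gets 5 % of the session bandwidth, receivers share 75 % of it,
    and the interval never drops below t_min. Randomization is ignored."""
    rtcp_bw = 0.05 * session_bw_bps       # total RTCP bandwidth (bit/s)
    receiver_bw = 0.75 * rtcp_bw          # share available to all receivers
    return max(t_min, n_receivers * report_bits / receiver_bw)

SESSION_BW = 4_000_000   # bit/s, hypothetical IPTV stream
REPORT_BITS = 736        # receiver report length from the example in [7]

for n in (100, 10_000, 1_000_000):
    td = receiver_report_interval(n, SESSION_BW, REPORT_BITS)
    print(f"{n:>9} receivers: one report every {td:8.1f} s")
```

With a million receivers each one may report only about every 82 minutes, which is the kind of feedback delay the hierarchical architecture is meant to avoid.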

Hierarchical Data Acquisition System
To combat this problem, a hierarchical system for data acquisition was proposed in [5], [6] and modified in [7]. In addition to the data centre and the data sources, such a tree contains special nodes called summarization nodes (see Fig. 1).
Data is periodically sent from the data sources (terminals or sensors) to the assigned summarization node. The summarization node aggregates data from a group of $n_B$ terminals and periodically sends the result to its assigned summarization node at the next higher level. The summarization nodes are also organized into groups of size $n_S$. The structure of the Receiver Summary Information (RSI) message was specified in [5]. The message includes sub-report blocks (SRB) that contain distribution information about particular features such as packet loss or jitter.
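What makes the hierarchy work is that summaries can themselves be summarized. The sketch below uses a hypothetical distribution summary of packet loss (a fixed-bin histogram plus count, minimum and maximum); it is not the RSI/SRB format of [5], and the bin edges, class and function names are assumptions. Merging the summaries of two terminal groups at a higher-level node gives the same result as summarizing all terminals at once.

```python
from dataclasses import dataclass, field

# Hypothetical bin edges for a packet-loss fraction; not taken from [5].
BIN_EDGES = (0.0, 0.01, 0.02, 0.05, 0.10, 1.0)

@dataclass
class LossSummary:
    """Simplified distribution summary of one feature (here packet loss)."""
    counts: list = field(default_factory=lambda: [0] * (len(BIN_EDGES) - 1))
    n: int = 0
    lo: float = float("inf")
    hi: float = float("-inf")

    def add(self, value):
        """Account for one terminal report; the last bin also takes 1.0."""
        last = len(self.counts) - 1
        i = next((k for k in range(last) if value < BIN_EDGES[k + 1]), last)
        self.counts[i] += 1
        self.n += 1
        self.lo, self.hi = min(self.lo, value), max(self.hi, value)
        return self

    def merge(self, other):
        """Summary of the union of both groups: counts add, extremes combine."""
        return LossSummary([a + b for a, b in zip(self.counts, other.counts)],
                           self.n + other.n,
                           min(self.lo, other.lo), max(self.hi, other.hi))

def summarize(values):
    """What a summarization node does with the reports of its own group."""
    s = LossSummary()
    for v in values:
        s.add(v)
    return s

group_a = [0.0, 0.005, 0.03]   # hypothetical loss fractions, group A
group_b = [0.12, 0.015]        # hypothetical loss fractions, group B

two_level = summarize(group_a).merge(summarize(group_b))  # via two nodes
flat = summarize(group_a + group_b)                       # all at once
print(two_level.counts, two_level == flat)  # [2, 1, 1, 0, 1] True
```

Only additive statistics merge exactly like this; a median, for example, cannot be recovered from two sub-medians, which is one reason distribution information is carried as bins.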
To enable efficient transmission of information about the session from the data centre to the terminals, an extension of the original RTCP specification in the form of an Extended Report (XR) message had to be adopted [3]. The XR RTCP summarization packet contains basic information for the terminals, mainly how to calculate the message transmission period. In the case of an SSM (Source-Specific Multicast) service the message is sent by multicast, so that together with the summarization method it decreases the overhead and saves bandwidth. Optionally, in addition to the summarization process (in which detailed information is lost), the summarization nodes can store the detailed information obtained from terminals or lower-level summarization nodes for some time period, which allows the data centre to obtain detailed information about a particular terminal or group of terminals when necessary.
Because the large group of terminals is divided into many smaller groups, the bandwidth restriction is not a problem and the message transmission period of the terminals remains fairly low even if the overall number of terminals rises. This is especially the case of multimedia multicast sessions, which can vary substantially in size. The overall delay, i.e. the interval between the instant when data is generated (or measured) in the terminal (sensor) and the instant when the data is received in the data centre, consists of the particular delays between the transmission instants of adjacent layers in the tree. The situation is depicted in Fig. 2. When the tree consists of $I$ layers, i.e. $(I-1)$ summarization layers and one terminal layer, the overall delay $T_R$ between data generation (measurement) and its reception in the data processing centre is

$$T_R = t_{MT} + \sum_{i=1}^{I-1} t_i \qquad (1)$$

provided the transport delay through the network is neglected. Variable $t_{MT}$ is the delay between the measurement (data generation) and transmission instants, and $t_i$ is the delay between the summarized message transmission instants at linked summarization nodes in adjacent layers. The worst case for the delay occurs when all summarization nodes at all levels of the tree and also the terminals (sensors) are synchronized, i.e. all of them transmit messages at almost the same time instants. Then each delay approaches the full transmission period of the respective group, and formula (1) converts to

$$T_R = T_{RR} + \sum_{i=1}^{I-1} T_{SR,i} \qquad (2)$$

Provided the transmission periods are the same throughout the whole tree, formula (2) changes into the form

$$T_R = T_{RR} + (I-1)\,T_{SR} \qquad (3)$$

where $T_{RR}$ is the transmission period of the group of terminals (it depends on the number $n_B$ of terminals in the group, the message length and the allocated bandwidth [7]), $I$ is the number of levels in the tree (it depends on the total number of terminals, on the number of terminals in a group and on the number of summarization nodes in a group) and $T_{SR}$ is the message transmission period of the summarization node group (it depends on the number $n_S$ of summarization nodes in the group, the summarization message length and the allocated bandwidth [7]).

Several problems now arise. The first group of problems is how to manage the tree when the number of terminals rises or declines, how to keep it balanced and how to minimize the total delay given by (3). In addition, the problem of the number of required summarization nodes should be addressed. At the beginning of our research we assumed that the summarization nodes are only terminals with special functionality [7]. It was found that managing such a tree would cause a lot of overhead, especially when the tree changes to a large extent, i.e. when terminals enter and leave the session frequently, as in multimedia streaming sessions. This functionality would also require additional computational power and energy, which is undesirable, especially for wireless terminals (sensors) with very limited computational power and energy. Therefore, in later research ([8], [10]) the summarization nodes are considered special nodes (or software modules) managed by the service provider. Such summarization nodes have higher computational power, larger storage capacity for temporary data and a fixed location. This last feature is very important when the tree structure is established according to the location of the terminals [10].
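The worst-case formulas (2) and (3) can be checked with a small simulation. The sketch assumes every node transmits strictly periodically (integer milliseconds, hypothetical periods and phases) and that a message arriving exactly at a transmission instant just misses it; network transport delay is neglected, as in (1). Function names are illustrative.

```python
import random

def next_tx(t, period, phase):
    """First transmission instant strictly after t of a node that transmits
    at phase, phase + period, phase + 2*period, ... (integer ms)."""
    return phase + ((t - phase) // period + 1) * period

def acquisition_delay(t_gen, periods, phases):
    """Delay from data generation to reception in the data centre, transport
    delay neglected. Entry 0 is the terminal, the following entries are the
    linked summarization nodes from level I-1 up to level 1."""
    t = t_gen
    for period, phase in zip(periods, phases):
        t = next_tx(t, period, phase)   # wait for the next transmission
    return t - t_gen

def worst_case_delay(t_rr, t_sr, levels):
    """Formula (3): T_R = T_RR + (I - 1) * T_SR."""
    return t_rr + (levels - 1) * t_sr

I, T_RR, T_SR = 4, 6000, 2000            # hypothetical: 4 levels, periods in ms
periods = [T_RR] + [T_SR] * (I - 1)
bound = worst_case_delay(T_RR, T_SR, I)  # 12000 ms

# Synchronized tree: all nodes transmit at multiples of their periods.
sync = acquisition_delay(0, periods, [0] * I)

# Random transmission phases and generation instants.
rng = random.Random(1)
samples = [acquisition_delay(rng.randrange(T_RR), periods,
                             [rng.randrange(p) for p in periods])
           for _ in range(10_000)]
print(sync, bound, max(samples) <= bound)   # 12000 12000 True
```

Each wait lies in (0, period], which is why the sum of the periods is an upper bound and why the synchronized tree attains it.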

Tree Optimization
When the service provider intends to implement a service based on the tree architecture described above, some initial conditions have to be considered before implementation: the bandwidth (or maximum data flow) $BW_A$ allocated for data acquisition (it is allocated to each group of terminals or summarization nodes), the expected number of data sources (terminals) $n_T$, the maximum period (or delay) of data collection $T_{Rmax}$, the length $PL_{RR}$ of plain messages generated by the terminals and the length $PL_{SR}$ of summarization packets generated by summarization nodes. Additional constraints can be the maximum overall number of available summarization nodes $N_{STmax}$, the minimum periods of message transmission in a group of terminals $T_{RRmin}$ and in a group of summarization nodes $T_{SRmin}$, and others. The goal is to find a tree that meets all of these conditions and restrictions. Equation (3) shows how to calculate the largest overall delay $T_R$ (and thus also the maximum time period of data acquisition) between data generation (measurement) in terminals (data sources) and its reception in the data processing centre. It can be written in a more detailed form:

$$T_R = n_B\,t_R + (I-1)\,n_S\,t_S \qquad (4)$$

where $I$ is the number of tree levels, $t_R$ is the time interval consumed by one message sent by a terminal, $t_S$ is the time interval consumed by one summarization message generated by a summarization node, $n_B$ is the number of terminals in one group of terminals and $n_S$ is the number of nodes in a group of summarization nodes (the remaining symbols are explained above).
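A direct transcription of the detailed delay formula, assuming each message occupies the group bandwidth for its length divided by $BW_A$, i.e. $t_R = PL_{RR}/BW_A$ and $t_S = PL_{SR}/BW_A$. The message lengths are those of the RTCP example in [7]; $BW_A$, the group sizes, the number of levels and the minimum periods are hypothetical, as are the function names. The helper `min_group_sizes` shows one way to turn $T_{RRmin}$ and $T_{SRmin}$ into minimum group sizes, since $T_{RR} = n_B t_R$ and $T_{SR} = n_S t_S$.

```python
import math

def message_time(length_bits, bw_bps):
    """Time one message occupies the allocated group bandwidth BW_A (s)."""
    return length_bits / bw_bps

def total_delay(n_b, n_s, levels, t_r, t_s):
    """T_R = n_B*t_R + (I - 1)*n_S*t_S, i.e. formula (3)
    with T_RR = n_B*t_R and T_SR = n_S*t_S."""
    return n_b * t_r + (levels - 1) * n_s * t_s

def min_group_sizes(t_rr_min, t_sr_min, t_r, t_s):
    """Smallest n_B, n_S whose group periods reach T_RRmin and T_SRmin."""
    return math.ceil(t_rr_min / t_r), math.ceil(t_sr_min / t_s)

PL_RR, PL_SR = 736, 11296   # bits, RTCP example of [7]
BW_A = 8_000                # bit/s per group, hypothetical

t_r = message_time(PL_RR, BW_A)   # 0.092 s per terminal report
t_s = message_time(PL_SR, BW_A)   # 1.412 s per summarization report
print(round(total_delay(n_b=100, n_s=4, levels=4, t_r=t_r, t_s=t_s), 3))  # 26.144
print(min_group_sizes(5.0, 5.0, t_r, t_s))                                # (55, 4)
```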
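The number of levels with summarization nodes in the tree, i.e. the value $(I-1)$, can be calculated from the condition

$$n_B\,n_S^{\,I-1} \ge n_T$$

whose smallest real solution is $I - 1 = \ln(n_T/n_B)/\ln n_S$. As $I$ is an integer, the nearest higher integer is taken:

$$I = \left\lceil \frac{\ln(n_T/n_B)}{\ln n_S} \right\rceil + 1$$

Writing this as $I = \ln(n_T/n_B)/\ln n_S + \eta_I$, the rounding term satisfies $1 \le \eta_I < 2$. Setting $\eta_I = 1$ turns (4) into a continuous function of $n_B$ and $n_S$:

$$T_R(n_B, n_S) = n_B\,t_R + \frac{\ln(n_T/n_B)}{\ln n_S}\,n_S\,t_S$$

The worst-case total delay values obtained from the optimization with the continuous function are quite close to, and always lower than, those obtained with the discontinuous function (because $\eta_I \ge 1$), see Fig. 4. The admissible region is bounded by the minimum group sizes, $n_B \ge n_{Bmin}$ and $n_S \ge n_{Smin}$. The goal of optimization is to find the global extreme (minimum) of the function in this region. The global extreme can be located either at a local extreme of the function or on the boundary of the domain. The function is continuous and smooth in the whole region, therefore its first and second derivatives can be calculated and its stationary points can be found:

$$\frac{\partial T_R}{\partial n_B} = t_R - \frac{n_S\,t_S}{n_B \ln n_S} \qquad (10)$$

$$\frac{\partial T_R}{\partial n_S} = t_S \ln\frac{n_T}{n_B}\cdot\frac{\ln n_S - 1}{(\ln n_S)^2} \qquad (11)$$

Stationary points are the candidates for local extremes; they are obtained by setting the first derivatives (10) and (11) equal to zero (for $n_T \ne n_B$):

$$n_S = e \qquad (12)$$

$$n_B = \frac{e\,t_S}{t_R} \qquad (13)$$

The second derivatives are

$$\frac{\partial^2 T_R}{\partial n_B^2} = \frac{n_S\,t_S}{n_B^2 \ln n_S},\qquad \frac{\partial^2 T_R}{\partial n_B\,\partial n_S} = -\frac{t_S(\ln n_S - 1)}{n_B(\ln n_S)^2},\qquad \frac{\partial^2 T_R}{\partial n_S^2} = \frac{t_S \ln(n_T/n_B)\,(2 - \ln n_S)}{n_S(\ln n_S)^3} \qquad (14)$$

When the results (12) and (13) are used in (14), the leading principal minors of the Hessian are:

$$D_1 = \frac{t_R^2}{e\,t_S},\qquad D_2 = \frac{t_R^2}{e^2}\,\ln\frac{n_T\,t_R}{e\,t_S} \qquad (15)$$

The inequality $D_1 > 0$ is always met, and the inequality $D_2 > 0$ is fulfilled when

$$n_T\,t_R > e\,t_S \qquad (16)$$

in which case the stationary point is a local minimum. In the RTCP example presented in [7], the length of the receiver report $PL_{RR}$ was 736 bits and the length of the summarization report $PL_{SR}$ was 11296 bits. When the same link bandwidths are assigned to both terminals and summarization nodes, $t_S/t_R = PL_{SR}/PL_{RR}$ (each message time is its length divided by the bandwidth), and (16) has the form

$$n_T > e\,\frac{PL_{SR}}{PL_{RR}} = e\,\frac{11296}{736} \approx 41.7 \qquad (17)$$

Then the absolute minimum will be reached for the smallest $n_S = n_{Smin}$ and for $n_B = n_{Bmin}$.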
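The derivation can be checked numerically. The sketch below computes the integer number of levels by repeated multiplication (avoiding floating-point rounding in the logarithm), compares the discrete delay with its continuous relaxation, confirms with central finite differences that the gradient vanishes at $n_S = e$, $n_B = e\,t_S/t_R$, tests that nearby points are not lower, and evaluates the threshold of (17). It assumes equal bandwidths for terminals and summarization nodes; $BW_A$, $n_T = 10\,000$ and the function names are hypothetical.

```python
import math

PL_RR, PL_SR = 736, 11296   # bits, RTCP example of [7]
BW_A = 8_000                # bit/s, hypothetical, same for both group types
t_r, t_s = PL_RR / BW_A, PL_SR / BW_A
N_T = 10_000                # hypothetical number of terminals

def levels(n_t, n_b, n_s):
    """Smallest integer I with n_B * n_S**(I-1) >= n_T."""
    i, capacity = 1, n_b
    while capacity < n_t:
        capacity *= n_s
        i += 1
    return i

def delay_discrete(n_b, n_s, n_t=N_T):
    """Formula (4) with the integer number of levels."""
    return n_b * t_r + (levels(n_t, n_b, n_s) - 1) * n_s * t_s

def delay_continuous(n_b, n_s, n_t=N_T):
    """Continuous relaxation: I - 1 = ln(n_T / n_B) / ln(n_S)."""
    return n_b * t_r + math.log(n_t / n_b) / math.log(n_s) * n_s * t_s

def gradient(f, x, y, h=1e-6):
    """Central finite-difference gradient of f(x, y)."""
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

# 1) The relaxation never exceeds the discrete delay (tiny float slack).
lower_bound = all(delay_continuous(b, s) <= delay_discrete(b, s) + 1e-9
                  for b in range(1, 500) for s in range(2, 20))

# 2) Stationary point (12), (13) and the gradient there.
n_s_star = math.e
n_b_star = math.e * t_s / t_r
g = gradient(delay_continuous, n_b_star, n_s_star)

# 3) Local minimum: nearby points are not lower (this n_T satisfies (16)).
t_star = delay_continuous(n_b_star, n_s_star)
local_min = all(delay_continuous(n_b_star + db, n_s_star + ds) >= t_star
                for db in (-1, 0, 1) for ds in (-0.1, 0, 0.1))

# 4) Threshold of (17) with equal bandwidths: e * PL_SR / PL_RR.
threshold = math.e * PL_SR / PL_RR

print(lower_bound, local_min, round(n_b_star, 2), round(threshold, 2))  # True True 41.72 41.72
print(g)   # both components close to zero
```

The threshold of (17) coincides with $n_B$ at the stationary point: condition (16) is equivalent to $n_T > e\,t_S/t_R$, i.e. to $\ln(n_T/n_B) > 0$ at that point.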

Summarization Nodes
The service provider does not have an unlimited number of summarization nodes available, therefore the overall number $N_{ST}$ of summarization nodes required in the tree hierarchy is also a very important parameter and should be optimized. The total number of summarization nodes can be calculated as follows (see Fig. 5):

$$N_{ST} = \sum_{i=1}^{I-1} N_{Si} = \sum_{i=1}^{I-2} n_S^{\,i} + N_{S(I-1)}$$

where $I$ is the number of levels in the tree, $N_{Si}$ is the number of summarization nodes at level $i$, $n_S$ is the number of summarization nodes in one group and $N_{S(I-1)}$ is the number of summarization nodes at level $I-1$.
The parameter $n_S$ is known, therefore the task is to calculate the variable $N_{S(I-1)}$. As shown in Fig. 5, the terminals (sensors) are connected to summarization nodes at two layers, $(I-2)$ and $(I-1)$. These summarization nodes can be called summarization endpoint nodes $N_{SE}$, and their total number $N_{SET}$ (T = total) can be expressed by the formula

$$N_{SET} = N_{SE(I-2)} + N_{SE(I-1)}$$

where $N_{SE(I-2)}$ and $N_{SE(I-1)}$ are the numbers of endpoint summarization nodes at levels $(I-2)$ and $(I-1)$, respectively. As the $(I-1)$ layer is the last layer of summarization nodes, clearly $N_{S(I-1)} = N_{SE(I-1)}$. The parameter $N_{SET}$ is an integer and can be calculated by the equation

$$N_{SET} = \left\lceil \frac{n_T}{n_{Bmax}} \right\rceil$$

where $n_{Bmax}$ is the maximum number of terminals in one group.
To obtain the total number of required summarization nodes $N_{ST}$, it is necessary to calculate $N_{SE(I-1)}$, and to get this parameter, $N_{SE(I-2)}$ is needed first. When new terminals are to be added and the current tree is not sufficient, an additional layer of summarization nodes has to be added. Some $x$ summarization nodes at level $(I-2)$ then hand their terminals over to next-layer summarization nodes, so that the new last layer can contain at most $x\,n_S$ summarization nodes and $N_{SE(I-2)} = n_S^{\,I-2} - x$. The appropriate number $x$ can be calculated from (23):

$$x = \frac{N_{SET} - n_S^{\,I-2}}{n_S - 1}$$

The closest larger integer is

$$x = \left\lceil \frac{N_{SET} - n_S^{\,I-2}}{n_S - 1} \right\rceil$$

Then the total number of required summarization nodes is:

$$N_{ST} = \sum_{i=1}^{I-2} n_S^{\,i} + N_{SET} - n_S^{\,I-2} + x \qquad (25)$$

When the required parameter $n_{Bmxm}$ obtained from (27) is smaller than the optimum parameter $n_{Bopt}$, the point $(n_{Bopt}, n_{Sopt})$ is used as the best value for the number of terminals in one group and for the number of summarization nodes in one group, respectively. Otherwise, a new optimum $n_{Bopt}$ larger than $n_{Bmxm}$, together with the corresponding $n_{Sopt}$, is searched for. The results are compared separately in Fig. 6 and Fig. 7.
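The node count and the selection rule can be sketched as below. This is our reading of the scheme, not code from the article: levels 1 to $I-2$ are assumed full ($N_{Si} = n_S^{\,i}$, the top group reporting to the data centre), $x$ level-$(I-2)$ nodes hand their terminals over to the new last layer, and $N_{S(I-1)}$ holds exactly the remaining endpoint nodes. Instead of the closed form (27), $n_{Bmxm}$ is found by a simple search, and the constrained optimum by brute force over integer group sizes, with the search ranges playing the role of $n_{Bmin}$ and $n_{Smin}$. $BW_A$, $n_T$, the ranges and all function names are hypothetical; $N_{STmax} = 1190$ is the value used in Fig. 6 and Fig. 7.

```python
import math

PL_RR, PL_SR, BW_A = 736, 11296, 8_000   # bits, bits, bit/s (BW_A hypothetical)
t_r, t_s = PL_RR / BW_A, PL_SR / BW_A

def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def summarization_layers(n_set, n_s):
    """Smallest I - 1 >= 1 with n_S**(I-1) >= N_SET (requires n_S >= 2)."""
    layers = 1
    while n_s ** layers < n_set:
        layers += 1
    return layers

def summarization_nodes(n_t, n_b_max, n_s):
    """N_ST under the stated assumptions (levels 1..I-2 full)."""
    n_set = ceil_div(n_t, n_b_max)            # endpoint nodes N_SET
    layers = summarization_layers(n_set, n_s)
    if layers == 1:
        return n_set                          # one group, all are endpoints
    full = n_s ** (layers - 1)                # N_S(I-2)
    x = ceil_div(n_set - full, n_s - 1)       # level-(I-2) nodes handing over
    n_last = n_set - (full - x)               # N_S(I-1) = N_SET - N_SE(I-2)
    return sum(n_s ** i for i in range(1, layers)) + n_last

def delay(n_t, n_b, n_s):
    """Worst-case delay (4) with the integer number of summarization layers."""
    layers = summarization_layers(ceil_div(n_t, n_b), n_s)
    return n_b * t_r + layers * n_s * t_s

def min_group_size(n_t, n_s, n_st_max):
    """n_Bmxm by search; N_ST never grows with n_B, so the first hit is it."""
    for n_b in range(1, n_t + 1):
        if summarization_nodes(n_t, n_b, n_s) <= n_st_max:
            return n_b
    return None

def best_tree(n_t, n_st_max, n_b_range, n_s_range):
    """Brute force: (T_R, n_B, n_S) with minimum delay and N_ST <= N_STmax."""
    candidates = [(delay(n_t, b, s), b, s)
                  for s in n_s_range for b in n_b_range
                  if summarization_nodes(n_t, b, s) <= n_st_max]
    return min(candidates, default=None)

N_T = 100_000        # hypothetical number of terminals
N_ST_MAX = 1190      # the limit used in Fig. 6 and Fig. 7
free = best_tree(N_T, math.inf, range(1, 1001), range(2, 11))
capped = best_tree(N_T, N_ST_MAX, range(1, 1001), range(2, 11))
print("n_Bmxm for n_S = 2:", min_group_size(N_T, 2, N_ST_MAX))
print("unconstrained (T_R, n_B, n_S):", free)
print("with N_STmax (T_R, n_B, n_S):", capped)
```

The brute-force search applies the rule from the text implicitly: if the unconstrained optimum already needs no more than $N_{STmax}$ nodes it is returned unchanged, otherwise the best point with a larger $n_B$ wins. A closed form such as (27) is cheaper, but the search makes no approximation of the rounding terms.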

Conclusion
This article dealt with the problem of hierarchical data acquisition. The tree design process was presented, and related problems were addressed, such as the minimization of the total acquisition delay and the limited number of summarization nodes. The delay optimum was found and the tree parameters were derived. The influence of a limited number of summarization nodes was considered and verified by simulations in the Matlab environment. A separate paper will address the organisation of end nodes (terminals) according to their locations.

Fig. 2: Time instants of data generation and message transmissions in hierarchical data acquisition system.

Fig. 5: Tree for data acquisition with highlighting of last three levels.
Formula (25) shows that the overall number of summarization nodes is mainly influenced by the total number of receivers $n_T$ and by the number of terminals in one group $n_{Bmax}$. Parameter $n_S$ does not have a big impact on $N_{ST}$ when $n_S \gg 1$. The rest of formula (25), i.e. the expression containing the rounding corrections $\eta_x$ and $\eta_E$ (of $x$ and $N_{SET}$ to the nearest higher integer), will always be smaller than 3, even when $n_S = 2$, which is the smallest value, and both $\eta_x \to 1$ and $\eta_E \to 1$. Then the upper bound (26) on $N_{ST}$ holds. When the service provider has $N_{STmax}$ summarization nodes available, the minimum number of terminals in one group $n_{Bmxm}$ can be derived from (26), which gives (27).

Fig. 6: The curves of the required number of summarization nodes for different numbers of terminals (nT) without and with respect to the demand NSTmax = 1190.

Fig. 7: The curves of the minimum delay for different numbers of terminals (nT) without and with respect to the demand NSTmax = 1190.
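When minimum periods of message transmission in a group of terminals $T_{RRmin}$ and in a group of summarization nodes $T_{SRmin}$ are required, the minimum values $n_{Bmin}$ and $n_{Smin}$ are set as $n_{Bmin} = \lceil T_{RRmin}/t_R \rceil$ and $n_{Smin} = \lceil T_{SRmin}/t_S \rceil$.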