PROCESSING DATA FOR TIME SERIES ANALYSIS AND INPUTS OF ALGORITHMS FOR AIRPORT SIMULATIONS

This article deals with flight-delay data from Košice airport and their preparation for further processing. The data were recorded from 2006 to 2009. The airport has not yet processed these delay data itself. Since the results of such processing are useful in planning and scheduling, we try to establish a methodology for analysing the data. We present the values of basic statistical parameters for different airlines and different types of flights, together with a short analysis and commentary. Using statistical tests, we also illustrate the difficulty of predicting how these delays develop at the airport.


INTRODUCTION
This article presents processed data from Košice airport (KEA) for the years 2006-2009. We show processed data representing the share of different types of flights and airlines in the total number of movements at Košice airport. The primary objective was to analyse the time deviation of arrivals and departures of flights from their scheduled values. These data can help us to detect approaching problems. Based on these results, the scheduling problem for arrivals and departures of flights at the airport can be solved better. In the section where we analyse airline data, we use the replacement names AIR-01, AIR-02, AIR-03, and AIR-04. The choice of these four airlines is justified by their share of the total traffic at Košice airport (see Table 1). At the end of the article we present the results of statistical tests that compare the time deviations for the years 2006-2009. These tests indicate that prediction based on the given data may be very difficult, or even impossible. Despite these negative results, the data were analysed using time series (see articles [4,5]). Confirming some of the conclusions would require a detailed analysis of the problem, for which there is unfortunately no space in this article.

WHAT DATA DO WE HAVE AVAILABLE?
The data processed in this article were obtained from Košice airport. They cover a period of 48 months, from January 2006 until December 2009. We received a rather large amount of data that had undergone initial processing. There are about 10 000 rows of data per year on average, that is, about 40 000 rows in total. Each row contains at least 20 items describing an aircraft's arrival or departure. The initial processing and formal treatment of the data for further analysis was done and partially published in articles [1][2][3][9][10][11]. These works describe the process of data preparation, which leads to the primary indicators of delays at Košice airport. They also publish the basic parameters of the flights operated at Košice airport. This initial analysis forms the basis of our research in this area: we build on the basic data already processed and provided, and on the already published basic parameters of these large data sets.
Based on this information, we had to decide which data would be analysed further and would need to undergo additional processing. The basic data set provided by the airport also contained data that were incorrect, ambiguous, or required additional modification. All such modifications were made using filtering and sorting. An example of ambiguity in the data is the value expressing the actual arrival or departure time of an aircraft at the airport. According to the flight schedule, an aircraft of airline AIR-01 was to land at the airport on 12. 12. 2008 at 23:35, but the flight was in fact delayed by 45 minutes and the aircraft landed at 00:20. The flight database contains only the arrival time 00:20. This time actually belongs to 13. 12. 2008 and not to 12. 12. 2008. From these data alone, it is not clear whether the flight was 45 minutes late or arrived 23 hours and 15 minutes earlier than scheduled. This article focuses only on data collection, processing, treatment, and analysis using basic statistical tests of hypotheses. This analysis reveals more about the data, and thanks to their treatment a partial comparison between the years 2006 to 2009 becomes possible.
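The midnight ambiguity described above can be illustrated with a minimal Python sketch. It is not the procedure used in the cited works; the function name `signed_deviation_minutes` is hypothetical, and the rule it applies (the true deviation is assumed to be shorter than 12 hours) is our assumption for illustration only.

```python
from datetime import datetime, timedelta


def signed_deviation_minutes(scheduled: datetime, recorded_clock: str) -> int:
    """Return recorded minus scheduled time in minutes (positive = delay).

    Assumption (not from the source data): the true deviation is shorter
    than 12 hours, so the recorded clock time is placed on the calendar day
    that brings it closest to the scheduled time.
    """
    hour, minute = map(int, recorded_clock.split(":"))
    same_day = scheduled.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidates = [same_day + timedelta(days=shift) for shift in (-1, 0, 1)]
    actual = min(candidates, key=lambda t: abs(t - scheduled))
    return round((actual - scheduled).total_seconds() / 60)


# The example from the text: scheduled 12. 12. 2008 at 23:35, recorded 00:20.
scheduled = datetime(2008, 12, 12, 23, 35)
same_day_reading = datetime(2008, 12, 12, 0, 20)
naive = round((same_day_reading - scheduled).total_seconds() / 60)
resolved = signed_deviation_minutes(scheduled, "00:20")
print(naive, resolved)  # -1395 45
```

The naive same-day reading gives −1395 minutes (23 hours and 15 minutes early), while the closest-day rule gives the 45-minute delay. A record whose true deviation exceeds 12 hours would be misread by this rule and would still need manual checking.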
Our work builds on articles [9], [10], and [11], which describe the basic characteristics and parameters of the recorded data, as well as the process of modifying the data and processing them further. Given the extent of this information, this article is limited to those parameters that are immediately needed for our analysis.
Similarly, we take basic information from [1][2][3], where the results of the processing referred to in [9] have already been published. These data serve as input for our analysis, and we can also use them for comparison with our own results.
At the end of this article we try to suggest the direction of further work in this field. The scope for research is quite large, because it is possible to examine whether the economic crisis has had an impact on the development of the parameters, and how to predict trends effectively from the actual trends of the monitored parameters (see [7]). An interesting analysis of these data using time series is presented in the work [6] and in articles [4,5]. The results of this work can support airport management decisions when planning flights and activities at the airport. The data obtained can be used as inputs for queuing theory or scheduling.

METHODS USED
During data processing we used the following software: MATLAB 2009b, MS Excel 2003, MS Excel 2010, and QtOctave. MATLAB and QtOctave were used mainly for analysing and testing the processed data, and MS Excel for processing and selecting the data suitable for analysis. The results published in this article also follow the output format of these program packages.
First, we describe the basic methods used in data processing and selection. Next, we describe the use of statistical hypothesis testing to assess the quality of the data for prediction.

Data processing and selection
As the basic data set we took the edited file mentioned in [1][2][3][9][10][11]. These data had already undergone initial treatment and had been checked for meaningless data or data that would lead to errors.
To process and select appropriate data we used filters. With these initial filters we found anomalies in the recorded data and tried to identify their causes. If an anomaly could be corrected, it was corrected immediately. If not, it was marked and later adapted so that it was usable in the calculations.
Similarly, we examined the data displayed in different forms in a PivotTable. These tables reveal the structure of the records and also show suitable candidates for further processing and analysis. We also used these tables as a specific filter, or to detect the frequency of discrepancies in each class; we looked for the causes of these discrepancies and tried to correct them. Where a recorded value could not be fixed, we deleted it from the list and the calculations were done without the deleted record. There were only a few such cases relative to the number of records.

Testing and data analysis
In this section we take as the population the file already edited using the methods described above. This file is significantly smaller and more specific, and therefore better suited to further analysis and testing.
For analysis and testing, the data were imported from an MS Excel worksheet into MATLAB (or QtOctave), and we used the statistical functions ttest and vartest2 of the MATLAB Statistics Toolbox to test and validate the parameters of the retrieved data.
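For readers without MATLAB, the quantities behind these two functions can be sketched in Python. This is only an illustration under stated assumptions: the sample values are hypothetical, the helper names `one_sample_t` and `variance_ratio` are ours, and the sketch stops at the test statistics because the Python standard library provides no Student t or F distribution for computing p-values.

```python
import math
import statistics


def one_sample_t(sample, mu0=0.0):
    """t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - mu0) / standard_error, n - 1


def variance_ratio(first, second):
    """F statistic and degrees of freedom for H0: equal variances."""
    f = statistics.variance(first) / statistics.variance(second)
    return f, len(first) - 1, len(second) - 1


# Hypothetical time deviations in minutes (not data from the airport).
arrivals = [0, 0, 5, -2, 12, 0, 3, 0]
departures = [0, 1, 0, 2, 0, 0, 4, 1]

t_stat, df = one_sample_t(arrivals)
f_stat, df1, df2 = variance_ratio(arrivals, departures)
print(f"t = {t_stat:.3f} (df = {df}), F = {f_stat:.3f} (df = {df1}, {df2})")
```

The statistics would then be compared with the critical values of the t and F distributions at the chosen significance level, which is the step that ttest and vartest2 perform internally.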

BASIC CHARACTERISTICS OF PROCESSED DATA
The data were restricted to the airlines listed in articles [1-3, 10, 11]. We used the labels AIR-01 up to AIR-04. Similarly, we labelled the types of flights: scheduled domestic flights (SDF), scheduled international flights (SIF), non-scheduled domestic flights (NSDF), and non-scheduled international flights (NSIF).
For a better understanding of the actual structure of the files we created Tables 1 and 2. The mean values and variances of the time deviations are shown in Tables 3 and 4. Table 3 describes the situation for arrivals and Table 4 for departures.
Fig. 3 shows the frequencies of the individual time deviations (departures) of airline AIR-02 in 2009. It is a good example of the structure of the data, and it confirms that the expected mean value is zero. It can also be seen that most of the values lie within the range from −10 to 16 minutes. The value zero alone represents nearly 50 percent of all deviations recorded for departures of airline AIR-02 during 2009. The regular shape of the histogram is confirmed by the numerical values of the statistical parameters, namely the mean value and the variance, which are listed in Table 4.
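A frequency table of this kind, together with the share of zero deviations, the mean, and the variance, can be computed as in the following Python sketch. The list of deviations is hypothetical and serves only to show the calculation; it is not the AIR-02 data.

```python
from collections import Counter
import statistics

# Hypothetical departure deviations in minutes (positive = delay).
deviations = [0, 0, -3, 5, 0, 12, 0, -10, 16, 0]

frequencies = Counter(deviations)            # how often each deviation occurs
zero_share = frequencies[0] / len(deviations)
mean_deviation = statistics.fmean(deviations)
variance = statistics.variance(deviations)   # sample variance (n - 1)

for value in sorted(frequencies):
    print(f"{value:>4} min: {frequencies[value]}")
print(f"share of zero: {zero_share:.0%}, mean: {mean_deviation:.1f}, "
      f"variance: {variance:.1f}")
```

Whether the sample variance (divisor n − 1) or the population variance (divisor n) was used for the tables is not stated in the text; the sketch assumes the former.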
The tables show the temporal evolution of the average deviations for arrivals and departures at Košice airport from 2006 until 2009. These are values for the years preceding the global economic crisis. The figures for scheduled and non-scheduled flights, and their comparison with the total number of flights at Košice airport, are interesting. Table 3 contains only three negative values: in the third row in 2006 and 2008 and in the first row in 2009. This indicates that most airlines landed (or departed) on time, and when they did not, they were more often late than early. Between 2006 and 2007 there was a significant decrease in the dispersion and in the mean value of the delay, and this trend continued in the following years. This means that since 2007 the airlines' discipline in complying with the flight plan has improved significantly.
Table 4, which shows the values for departures, contains no negative value except in the first row for 2009. This corresponds to the fact that an aircraft would not normally depart earlier than planned. The development of the average deviation from year to year, together with the evolution of the variances, is interesting. A more detailed analysis of the individual rows of the table would require much more space, so it is not given here. Nevertheless, it is easy to see that the values in the table of departures are better than those in the table of arrivals (the variances are significantly lower). This means that complying with the departure schedule is simpler and more easily achievable for the airlines.
Even more interesting are the values for flights divided into domestic and international and into scheduled and non-scheduled. The results are shown in Tables 5 and 6. These tables show quite a large difference between scheduled and non-scheduled flights. Similarly, we observed differences between arrivals and departures. The question is how to interpret these differences and where to find their causes. Non-scheduled flights differ significantly from scheduled ones because they are mostly charter flights during the holiday period, which often do not comply with the flight plan. There are also private flights whose compliance with the flight plan is more formal than real.

PROCESSED DATA TESTING
This section describes how we checked the behaviour of the data sets for the individual analysed years. First, it is necessary to test whether the data sets meet the conditions for using statistical tests of hypotheses, and then to describe the tests and their results. Given the limited space of this article, we describe the verification of individual results only briefly and focus mainly on the description and interpretation of the tests carried out for the individual data sets.
Of all the possible tests, we carried out one-sample tests for mean values and variances. Then we conducted a series of two-sample t-tests of the mean value. Finally, we performed a multi-dimensional test of equality of mean values. All these tests of statistical hypotheses were conducted at the 5% significance level (i.e., α = 0.05). All results are presented in overview tables.
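The decision rule of a two-sample test of equal means at α = 0.05 can be sketched in Python as follows. Note the substitution: instead of the two-sample t-test used in the article, the sketch uses a large-sample normal approximation, which is an assumption that is reasonable only for samples of hundreds or thousands of records. The function name `two_sample_mean_test` and the generated data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance

ALPHA = 0.05  # significance level used throughout the article


def two_sample_mean_test(first, second, alpha=ALPHA):
    """Two-sided test of H0: equal means, using a normal approximation.

    Returns the z statistic, the p-value, and True if H0 is rejected.
    """
    standard_error = math.sqrt(
        variance(first) / len(first) + variance(second) / len(second)
    )
    z = (fmean(first) - fmean(second)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical deviations for two years: same spread, means 0 and 2 minutes.
year_a = [i % 7 - 3 for i in range(700)]
year_b = [x + 2 for x in year_a]

z, p, reject = two_sample_mean_test(year_a, year_b)
print(f"z = {z:.2f}, p = {p:.3g}, reject H0: {reject}")
```

A rejected null hypothesis here corresponds to the negative answer discussed in the text: the mean deviations of the two years cannot be considered equal at the chosen significance level.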
Table 7 shows the results of the two-sample tests for mean values between the sets of time deviations of arrivals and departures for each year. All of the tests gave a negative answer to the question of whether, at a significance level of 0.05, the average time deviations of arrivals and departures can be considered comparable across the years 2006-2009. Given the fluctuations of the mean values from year to year and the results of the tests, it can be assumed that predicting the time deviations will be very difficult, or even impossible. This analysis is essential for studying the development of the time deviations of flights at the airport and for time series analysis, in which the data can be represented by months, weeks, or days of the week. Such data processing and results are well suited to the time analysis described in [6], [7], and [8]. Time series analysis should be the target of all the partial research results described in this article as well as in the aforementioned articles. The results of the analysis of the time deviations are a very important source of information for the management of Košice International Airport. This model of analysis is, of course, also applicable to other airports, and the results could be compared between different airports.
Even more interesting is the analysis of extended data starting from the global economic crisis in 2009. It is interesting to follow the development of the time deviations in 2009, compared with the years before the crisis (2006 to 2008), and to try to explain any differences on the basis of the results obtained (see [6,7]). On the other hand, the time evolution of these deviations should theoretically be affected not by the crisis but by weather or unforeseen events. We should note that this conclusion points to non-standard behaviour. More research in this area will be necessary to clarify the hidden patterns of temporal anomalies in arrivals and departures at the airport. These could later be used effectively in planning and managing the airport.
Further research in this area can therefore be divided into two main lines. The first should expand the basic set of statistics to more years, or extend the data set with data from other airports. The second should analyse these data in more detail and depth.
We can see the evolution of the mean deviations for departures of airline AIR-02 in the years 2006-2009 in Fig. 1. The real situation for flight deviations can be seen in Figs. 2 (departures) and 4 (arrivals).

Fig. 1 Mean values of deviations of departures for AIR-02 in the years 2006-2009

Tables 1 and 2 show the share of the selected companies in the overall movements at Košice airport and their mutual share of arrivals and departures during the years 2006-2009. Together they made 7 166 arrivals and departures in 2006, 9 329 in 2007, 10 340 in 2008, and 9 636 in 2009. The first column in each year is the relative percentage within the group AIR-01 up to AIR-04 (together 100%). The second column in each year represents the percentage of movements of these airlines with respect to all movements at the airport in the given year. The values in the last row (under the table) show why we chose these four airlines: they carried out the majority of movements at the airport. The basic characteristics relate to deviations from the planned arrival and departure times. A later departure or arrival (a delay) is represented by positive values and an earlier arrival or departure by negative values. The ideal situation is a zero expected value with minimal dispersion. The actual mean values in minutes and the corresponding variances are shown in Tables 3 and 4.

Table 1
Share of airlines AIR-01 to AIR-04 on mutual and total air traffic at Košice airport

Table 2
The share of different types of flights and the total air traffic at the Košice airport

Table 3
Basic characteristics for time deviation of arrivals at Košice airport (in minutes)

Table 4
Basic characteristics for time deviation of departures at Košice airport (in minutes)

Table 5
Basic characteristics for time deviation of arrivals at Košice airport (in minutes)

Table 6
Basic characteristics for time deviation of departures at Košice airport (in minutes)

Table 7
Two-sample tests of equality of mean value of time deviation of sets of arrivals and departures