Swedish civil air traffic control dataset

The Swedish Civil Air Traffic Control (SCAT) dataset consists of 13 weeks of data collected from area control in the Swedish flight information region. The dataset contains detailed data from almost 170,000 flights as well as airspace data and weather forecasts. The flight data includes system-updated flight plans, clearances from air traffic control, surveillance data and trajectory prediction data. Each week of data is continuous, but the 13 weeks are spread over one year to provide variations in weather and seasonal traffic patterns. The dataset includes only scheduled flights not involved in any incident reports. Sensitive data such as military and private flights have been removed. The SCAT dataset can be useful for any research related to air traffic control, e.g. analysis of transportation patterns, environmental impact, optimization and automation/AI.


Specifications
Subject: Engineering (Aerospace Engineering)
Specific subject area: The dataset consists of detailed data of almost 170,000 flights, weather forecasts and airspace data from the Swedish air traffic control system.
Type of data: JSON files; Tables (for the data format specification)
How the data were acquired: The data was acquired by extracting information from the Swedish air traffic control systems.
Data format: Transformed; Filtered
Description of data collection: The dataset includes surveillance data, air traffic controller input, flight planning, trajectory prediction, airspace and weather data from Swedish area control.
Data source location: Institution: LFV (Swedish air navigation service provider). Country: Sweden (Swedish Flight Information Region). All the data in this dataset originates from data recorded by the TopSky air traffic control system [1]; however, some of the data in TopSky originates from other sources:
• Surveillance data originates from the ARTAS system [2], a multi-sensor tracker that gets its data from radar stations and wide area multilateration sensors in Sweden.
• Weather forecasts originate from the World Meteorological Organization [3] in London.
• Flight planning data originates from the flight plans submitted by pilots/airline operators in their respective countries, sent via the aeronautical networks AFTN/AMHS to TopSky. Flight plan updates are also sent from neighboring flight information regions using the OLDI protocol.
• Airspace data mostly originates from Aeronautical Information Publications (AIP) made in the relevant countries, but alterations and additions may be made to adapt the information to TopSky.
Data that originates from these other systems may have been modified by TopSky before being recorded.
Data accessibility: Repository name: Mendeley Data. Data identification number: DOI: 10.17632/8yn985bwz5.1. Direct URL to data: https://data.mendeley.com/datasets/8yn985bwz5

Value of the Data
• A key challenge in air traffic control research is the lack of publicly available data [4, 5]. The main reason is that data in this domain is in many cases classified and/or relies on proprietary software systems and data formats.
• This dataset is unique in its kind. Other publicly available sources of air traffic related data exist, e.g. OpenSky Network [6] and ADS-B Exchange [7]; however, these sources are limited to ADS-B data and as such lack the comprehensive coverage of the full information related to each flight that is presented in this dataset.
• This dataset can be useful for any research related to air traffic and air traffic control, e.g. automation and support tools for air traffic control, environmental research and airspace optimization.

Objective
There is currently a lack of high-quality open datasets for research around air traffic control and air transportation. The main objective behind the SCAT dataset [8] is to enable in-depth analysis and research in the context of aviation. We foresee that SCAT can be used in the research and development of future AI and machine learning based tools for air traffic control.

Data Description
The Swedish Civil Air Traffic Control (SCAT) dataset [8] contains detailed data of almost 170,000 flights, weather forecasts and airspace data from the perspective of air traffic control. The data originates from the air traffic control systems at the two control centers, Malmö (ESMM) and Stockholm (ESOS), which provide upper area control in the Swedish flight information region (FIR). The data is organized in 13 compressed archives in ZIP format, each containing one week of continuous data, see Table 1. The data has been filtered and processed as described in the "Experimental design, materials and methods" section.
All files inside the archives are in the JavaScript Object Notation (JSON) format [9]. Time stamps are in UTC and represented as strings in ISO 8601 format without an explicit time zone (e.g. 2017-06-06T13:45:10.362). Properties without values may be null or left out, depending on the source of the data. To reduce the number of tables needed to document each object type, several object types may be represented in one table. In such cases, indentation and the • sign are used in front of the property name to indicate that it is a property of the object at the previous indentation level.
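The time stamp and missing-value conventions above can be handled with a small helper. The following is a minimal sketch, not part of the published tooling: the function names are hypothetical, and it assumes the fractional seconds always have three or six digits, as in the example time stamp.

```python
from datetime import datetime, timezone


def parse_scat_time(value):
    """Parse an ISO 8601 string without zone information and mark it as UTC.

    Returns None for a null value, since properties may be null.
    """
    if value is None:
        return None
    # The strings carry no zone designator, so the UTC zone is attached explicitly.
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)


def read_time(obj, key):
    """Read a time stamp property that may be null or left out entirely."""
    # dict.get covers the "left out" case, parse_scat_time covers the null case.
    return parse_scat_time(obj.get(key))


example = parse_scat_time("2017-06-06T13:45:10.362")
missing = read_time({}, "time_stamp")
```

Attaching the zone at parse time avoids mixing naive and aware datetime objects later, which Python refuses to compare.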
Each archive contains several flight files, which are named with the unique id number given to the flight during data extraction, e.g. 101234.json. Each file holds all the data related to a single flight, such as:
• The sequence of control centers controlling the flight.
• Data related to flight planning, coordination and clearances from the air traffic controllers.
• Surveillance data from the system tracker, which processes the information from multiple radar and wide area multilateration sources into a single traffic view.
• Data from the trajectory prediction subsystem in TopSky, which repeatedly makes updated predictions of the future flight trajectory.
The format of the top level object in the flight files is described in Table 2 and its contained types are described in Tables 3-11 . Fig. 1 shows three example visualizations where the trajectories of flights in Swedish airspace from different dates are illustrated.
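Given the archive layout described in this section (flight files named by id, plus airspace.json and grib_meteo.json), the flight files in an archive can be listed and loaded with the Python standard library. This is a sketch under those assumptions only; the helper names are hypothetical and the in-memory archive below is made up for illustration.

```python
import io
import json
import zipfile

# Per-archive files that are not flight files.
SPECIAL_FILES = {"airspace.json", "grib_meteo.json"}


def list_flight_files(archive):
    """Return the sorted names of the flight files in an open ZipFile."""
    names = []
    for name in archive.namelist():
        base = name.rsplit("/", 1)[-1]  # tolerate files stored in a subfolder
        if base.endswith(".json") and base not in SPECIAL_FILES:
            names.append(name)
    return sorted(names)


def load_json(archive, name):
    """Load one JSON member of the archive without unpacking it to disk."""
    with archive.open(name) as handle:
        return json.load(handle)


# Made-up archive content, used only to exercise the helpers.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("101234.json", json.dumps({"note": "made-up flight"}))
    demo.writestr("airspace.json", "[]")
    demo.writestr("grib_meteo.json", "[]")

with zipfile.ZipFile(buffer) as demo:
    flight_names = list_flight_files(demo)
    first_flight = load_json(demo, flight_names[0])
```

Reading members directly from the archive keeps disk usage low, since each week contains a large number of flight files.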
In each archive there is also one file named airspace.json that contains coordinates for all named points as well as the extents of the control sectors for each of the centers. The format of the airspace file is described in Table 12. The airspace is valid for the entire week of data since the dates were chosen such that there were no configuration changes.
Finally, each archive contains one file named grib_meteo.json. This file contains the wind and temperature predictions used by the air traffic control system. The data originates from the World Meteorological Organization (WMO) in London. Predictions are made every third hour for each cell in the grid. Each cell is 1.25° in size in both latitude and longitude and is divided into 13 height bands from flight level 50 (5000 ft) to flight level 530 (53,000 ft), see Table 13. The data is in the form of an array sorted by time, longitude, latitude and altitude.
Example code for using this dataset is available on GitHub [10]. At the time of writing there are three examples: one tool to index the flights contained in each zip archive and two tools to convert the data into Keyhole Markup Language (KML) for visualization.

Table 2
The properties of the top level object in the flight files.
Property name Type Description
The time when the control center is deemed the most relevant data source for the current flight.
Fpl object Flight plan related information, see Table 3.
Id number Integral number with the flight's unique id (corresponds to the file name).
Plots [object] Array of plots from the surveillance system, see Table 10.
predicted_trajectory [object] Trajectory predictions from the air traffic control system, see Table 11.

Table 3
The properties of the flight plan object.
Property name Type Description
fpl_arr [object] Array of flight arrival information sorted by time_stamp, see Table 4.
fpl_base [object] Array of basic flight plan information sorted by time_stamp, see Table 5.
fpl_clearance [object] Array of given clearances sorted by time_stamp, see Table 6.
fpl_dep [object] Array of flight departure information sorted by time_stamp, see Table 7.
fpl_holding [object] Array of holding information sorted by time_stamp, see Table 8.
fpl_plan_update [object] Array of flight plan updates sorted by time_stamp, see Table 9.

string Time stamp when the information was updated.

Table 7
The properties of a flight plan clearances object.
string Time stamp when the information was updated.

Table 10
The properties of a plot object. This data is converted from Asterix cat 62 [11], and the property names correspond to the names in the Asterix specification.
• ag_hdg number Magnetic heading in degrees.
• subitem6 object Selected altitude. From either the FMS, the Altitude Control Panel, or the current aircraft altitude.
• altitude number Selected altitude in feet.
• altitude number Selected altitude in feet.
• am bool Approach mode active.
• mv bool Managed vertical mode active.
• baro_vert_rate number Barometric rate of climb/descent in feet per minute (negative values indicate descent).
• ias number Indicated air speed in knots.
• mach number Mach number.

Table 11
The properties of a predicted trajectory object.
Property name Type Description

route [object] Array of predicted route points.
• eto string Estimated time over point.
• fix_kind string Fix kind is a short text from the system describing the type of point.
• fix_name string Name of fix, if it is a named point, or coordinates as a string in degrees and minutes.
• is_ato bool True if the aircraft has passed this point, false otherwise.
• point_to_be_used_as_cop bool True if the point is to be used as sector coordination point.
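As an illustration of how the route properties in Table 11 might be used, the sketch below finds the next point a flight is predicted to pass. It relies only on the route, fix_name, eto and is_ato properties listed above; the function name is hypothetical, the sample values are made up, and it assumes the route array is ordered along the flight path.

```python
def next_route_point(trajectory):
    """Return the first route point the aircraft has not yet passed, or None."""
    # "route" may be null or left out, so fall back to an empty list.
    for point in trajectory.get("route") or []:
        if not point.get("is_ato", False):
            return point
    return None


# Made-up predicted trajectory object for illustration.
example_trajectory = {
    "route": [
        {"fix_name": "AAA", "eto": "2017-06-06T13:40:00.000", "is_ato": True},
        {"fix_name": "BBB", "eto": "2017-06-06T13:52:30.000", "is_ato": False},
        {"fix_name": "CCC", "eto": "2017-06-06T14:05:00.000", "is_ato": False},
    ]
}

upcoming = next_route_point(example_trajectory)
```

Because each flight file holds repeated predictions, the same lookup can be applied to every entry of predicted_trajectory to see how the estimate for a given fix evolves over time.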

Table 12
The properties of the airspace file.
Property name Type Description
• points [object] Array of navigation point objects.
• name string Name of point.

• sectors [object] Array of air traffic control sectors.
• name string Name of sector.
• volumes [object] Array of the volumes the sector consists of.
• coordinates [object] Array of coordinates of the lateral boundary of the volume.
• max_alt number The maximum altitude of the volume's extent.
• min_alt number The minimum altitude of the volume's extent.
• lat number Latitude in WGS-84 coordinates.
• lon number Longitude in WGS-84 coordinates.
• temp number Temperature in degrees Celsius.
• time string Time stamp.
• wind_spd number Predicted wind speed in knots.
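The weather properties listed above (lat, lon, time, temp, wind_spd) are enough to sketch a simple nearest-cell lookup. The example below is an illustration only: the function name and the sample records are made up, the height band is ignored because its property name is not shown here, and distance is measured naively in degrees without correcting for the convergence of meridians.

```python
def nearest_forecast(records, time, lat, lon):
    """Return the record for the given time whose grid cell is closest to (lat, lon)."""
    candidates = [r for r in records if r.get("time") == time]
    if not candidates:
        return None
    # Squared difference in degrees; adequate for picking a neighbour on a regular grid.
    return min(
        candidates,
        key=lambda r: (r["lat"] - lat) ** 2 + (r["lon"] - lon) ** 2,
    )


# Made-up records on a 1.25 degree grid for illustration.
sample_records = [
    {"time": "2017-06-06T12:00:00", "lat": 57.5, "lon": 15.0, "temp": -20, "wind_spd": 35},
    {"time": "2017-06-06T12:00:00", "lat": 58.75, "lon": 15.0, "temp": -22, "wind_spd": 40},
    {"time": "2017-06-06T12:00:00", "lat": 58.75, "lon": 16.25, "temp": -23, "wind_spd": 42},
    {"time": "2017-06-06T15:00:00", "lat": 58.75, "lon": 15.0, "temp": -21, "wind_spd": 38},
]

closest = nearest_forecast(sample_records, "2017-06-06T12:00:00", 58.6, 15.2)
```

For real use, the lookup would also need to select the height band matching the flight level of interest and pick the forecast time closest to the time of the plot.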

Data Sources
The data in the SCAT dataset originates from the air traffic control system TopSky [1], used for area control in Swedish airspace. TopSky continuously records various system data and technical logs, and stores them for up to three months. To store data for longer periods LFV uses KOOPA, a system developed in-house, which collects and stores the data in its original proprietary raw format. KOOPA also parses and stores the most commonly used data in a database to make it more accessible. Most data in this dataset was extracted from this database, except for trajectory predictions, weather data and some additional fields in the surveillance data, which were extracted from the raw data. Since there are no standard formats suitable for the data in this dataset, it was transformed into JSON.

Data Selection
The data was extracted in continuous one-week time blocks to capture the variation between weekdays as well as variations due to the time of day. To capture seasonal variations in weather conditions and traffic flow, the extracted weeks were spread as evenly as possible over one year, see Table 1. The time periods were selected with the following limitations in mind:
• To get a representative traffic sample, we avoided sampling data from any time period with events that had a major impact on the traffic patterns, such as ash clouds from volcanoes or pandemics.
• To keep the data consistent, we avoided any year with an update of the air traffic control systems that changed the format and/or content of the data.
• Any weeks with system configuration changes or system downtime due to maintenance were avoided in order to get continuous data with a single airspace configuration.

Data Extraction and Processing
For each selected week, all flight plans and radar plots for public flights (see Data Filtration below) passing through Swedish airspace were extracted from the KOOPA database using the individual flight plan identity code (IFPLID) as a unique identifier. Since both centers (ESMM and ESOS) track information on flights outside their respective control areas, many flights were represented in the data from both centers. To avoid duplicating the information for each flight with various levels of completeness and correctness, only the data from the most relevant center was kept at each point in time. For flights controlled by both centers, a transition time was calculated as the average of the time when control was released by the first center and the time when control was assumed by the second center. Data with time stamps before the transition time was then extracted from the first center in control, and data with time stamps after the transition time from the second center. The average time was selected as a reasonable hand-over time stamp for flights not traveling directly from one center to the other, for example flights passing through uncontrolled or foreign airspace between the centers. Trajectory prediction data and additional surveillance data (I062/380 Aircraft derived data) for each flight were then extracted from the raw data in KOOPA. As a last step, airspace data and weather data were extracted.
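The transition-time rule can be written out as a short calculation. This is a minimal sketch of the rule as described, not the extraction software itself: the function names and the "time" key are hypothetical, and assigning a record stamped exactly at the transition time to the second center is an assumption, since the text only covers records before and after it.

```python
from datetime import datetime


def transition_time(released_by_first, assumed_by_second):
    """Midpoint between release by the first center and assumption by the second."""
    return released_by_first + (assumed_by_second - released_by_first) / 2


def merge_by_transition(first_records, second_records, transition):
    """Keep first-center records before the transition and second-center records after."""
    kept = [r for r in first_records if r["time"] < transition]
    # Assumption: a record exactly at the transition time is taken from the second center.
    kept += [r for r in second_records if r["time"] >= transition]
    return kept


released = datetime(2017, 6, 6, 12, 0, 0)
assumed = datetime(2017, 6, 6, 12, 10, 0)
handover = transition_time(released, assumed)

first = [{"time": datetime(2017, 6, 6, 12, 3), "src": "first"},
         {"time": datetime(2017, 6, 6, 12, 7), "src": "first"}]
second = [{"time": datetime(2017, 6, 6, 12, 4), "src": "second"},
          {"time": datetime(2017, 6, 6, 12, 8), "src": "second"}]
merged = merge_by_transition(first, second, handover)
```

With a release at 12:00 and an assumption at 12:10, the hand-over falls at 12:05, so the 12:03 record comes from the first center and the 12:08 record from the second.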

Data Filtration
Due to regulations, LFV may only publish data on scheduled commercial flights not involved in any investigation or emergency, and the data in this dataset has been filtered accordingly. For example, military and other state flights as well as general aviation (private flights) have been removed. Publication of surveillance data (radar plots) outside of Swedish airspace is also prohibited, and such data was therefore filtered out. A small number of the remaining flights were removed for other reasons. Flights missing an IFPLID were removed since this information is required to correlate flights between the two centers. Flights crossing the boundary between ESMM and ESOS more than once were removed since manual effort would be required to sort out the most relevant data for each data type. Finally, flights that had a radar track of less than 30 s in Swedish airspace were removed since they were not regarded as useful.
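The three technical removal criteria at the end of this section can be expressed as a simple predicate. The sketch below is illustrative only: the function and parameter names are hypothetical, and it does not cover the regulatory filtering of military, state, private or incident-related flights.

```python
def keep_flight(ifplid, boundary_crossings, track_seconds):
    """Return True if a flight passes the three technical criteria described above."""
    if not ifplid:
        # The IFPLID is needed to correlate the flight between the two centers.
        return False
    if boundary_crossings > 1:
        # More than one ESMM/ESOS boundary crossing would need manual sorting.
        return False
    if track_seconds < 30:
        # Radar tracks shorter than 30 s in Swedish airspace were not regarded as useful.
        return False
    return True


# Made-up identifier and values for illustration.
kept = keep_flight("AA12345678", boundary_crossings=1, track_seconds=1800)
dropped = keep_flight(None, boundary_crossings=0, track_seconds=1800)
```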

Data Validation
As a first step, this dataset was validated during the extraction by logging values and comparing them to the expected boundaries for applicable fields. Out-of-bounds values were manually compared with the content of the original data. After the extraction, the data was compared to the content of the KOOPA database using separate software that loaded each JSON file and compared its content with the database. Manual validation was performed on 100 randomly selected flights from each week of data, 1300 flights in total, by converting the data to KML and visually inspecting the content using Google Earth. As a final validation step, we developed a visualization tool in which we load the data and can visualize its different properties. Using this tool, the structure and integrity of the dataset have been validated by ocular inspection and filtering, such that different aspects of the data can be checked for inconsistencies and errors. The data collected by KOOPA is validated by LFV as part of the normal system maintenance.
Even though this dataset has been subjected to extensive validation, it is important to realize that the original raw data is not free from errors. For example, there are sometimes errors in the flight plans that are corrected by the air traffic controllers if and when they are detected. Air traffic controllers may make mistakes when entering values into the system, or use the system in such a way that a clearance does not correspond to what is actually happening. Pilots also make mistakes and do not always fly according to given clearances. No effort was made to identify, filter out or correct any such errors in this dataset, since they are a part of normal operation and their removal would impede the analysis of realistic scenarios.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethics Statements
Informed consent to extract and publish this dataset was obtained from LFV, which owns and operates the two air traffic control centers from which the data in this dataset originates. All sensitive data has been removed from this dataset, and the LFV data redistribution policies were complied with.