The Process of Data Validation and Formatting for an Event-Based Vision Dataset in Agricultural Environments

– In this paper, we describe our team’s data processing practice for an event-based camera dataset. In addition to the event-based camera data, the Agri-EBV dataset contains data from LIDAR, RGB, depth cameras, temperature, moisture, and atmospheric pressure sensors. We describe data transfer from a platform, automatic and manual validation of data quality, conversions to multiple formats, and structuring of the final data. Accurate time offset estimation between sensors achieved in the dataset uses IMU data generated by purposeful movements of the sensor platform. Therefore, we also outline partitioning of the data and time alignment calculation during post-processing.


I. INTRODUCTION
When it comes to visual sensing in robotics, conventional frame-based cameras have been an industry standard for decades [1]. However, as with many significant inventions, it is often beneficial to seek inspiration from nature. In recent years, a new camera technology inspired by the working principles of the retina has emerged, promising significant improvements in key computer vision weaknesses [2]. A dynamic vision sensor (DVS), also known as an event-based camera, generates singlepixel light intensity change events instead of full frames. Each event contains information about x and y coordinates of the pixel, a timestamp, and the light intensity change polarity. The asynchronous nature of the DVS allows it to achieve microsecond-level latency as an event is sent out almost instantly. Furthermore, event cameras have a high dynamic range, low power consumption, take up less bandwidth, and require less processing power than standard cameras. DVSs generate next to no events at a standstill and are less affected by motion blur than frame-based cameras [3].
Since event cameras are a novel technology, most modern algorithms have not yet been adapted for the asynchronous nature of the DVS sensors [4]. Event-based cameras are expensive or difficult to obtain. Therefore, several datasets have been created for researchers to develop and test event-based algorithms. Datasets are useful since they provide a reproducible environment, so the input data do not change between algorithm updates and iterations. Also, researchers do * Corresponding author's e-mail: Maris.Galauskis@rtu.lv not necessarily have to invest in expensive equipment to test out their ideas, allowing them to iterate quickly and effectively and reduce the potential cost of research in event-based vision.
There are multiple event-based and conventional computer vision datasets available online. Among the more popular ones, along with the general description of the datasets in their accompanying papers, researchers mainly concentrate on the sensor calibration process [1], [3]- [6], time synchronisation [5], [7], and ground truth generation [4], [8]. In contrast, the data preparation, processing, and validation processes are not so extensively discussed in research papers; the few available articles on the topic include [7] and [8].
Our team has created a dataset for researchers interested in agricultural environment event-based camera data or eventbased data in general [9]. The dataset includes DVS, LIDAR, RGB, and depth data recorded during remote-controlled wheeled platform movements in various agricultural scenes.
Creating such a dataset involves several tasks: designing a time synchronisation method between sensors, designing and building a recording equipment bundle, scheduling various locations for recording, and post-processing the obtained data to ensure their quality before making them available publicly.
In this paper, we describe our team's approach to dealing with the last step of dataset creation, i.e., data processing. The need for human input for actions like visual validation of the data and, specifically in our case, the time offset calculation between the sensors hinders a fully automated workflow. Therefore, we created a semi-automated solution involving both human input and automated software.

A. Description of the Dataset
The Agri-EBV-autumn dataset [10] contains data sequences acquired in various agricultural environments. Each of the data sequences includes 1-4 min of the actual recording of the environment with OS-1 LIDAR, DVS240 event-based camera, as well as Realsense (RS) RGB and depth cameras. A sample frame with data from each of the sensors is shown in Fig. 1. We also provide metadata with recording visualisations and supplementary data subsets that offer additional data for sensor calibration and time alignment. The dataset is intended primarily as a source of DVS event data with supplementary visual data from the RGB camera and depth information from the LIDAR and the depth camera. Additionally, basic information about the environment is recorded, such as temperature, humidity and air pressure. IMU data of the LIDAR, Realsense, and DVS are used for time alignment estimation between the sensors. All the recorded sequences use trajectories with loop-closure to allow for evaluation of SLAM algorithms. The initial version of the dataset consists of 21 data sequences recorded in autumn. The complete dataset will include sequences for all four seasons. We also provide all the recorded data in rosbag format for the convenience of ROS users.

A. Overview
The general outline of the data processing is illustrated in Fig. 2. We tried to automate and batch-process most of the operations since there are 21 recordings in the first Autumn section of the dataset alone, and we are going to add Winter, Spring and Summer sections as well.

B. Data Acquisition from Sensor Bundle
The data sequences were recorded using a custom robot platform with a sensor bundle mounted on top of it, as shown in Fig. 3. External hard drives are used to facilitate the copying process. Therefore, the sensor bundle was only used for data acquisition, and all the necessary data processing happened afterwards on another computer. To automate the process and thus minimise potential human errors, a python script is used for copying the data from the sensor bundle to a local computer. The software searches for log files generated by the recorder program to identify recordings.

C. Conversion from Binary to Readable Data
All the data from the sensors are recorded into separate raw binary files to eliminate conversion operations for the onboard computer. Therefore, the first step in data processing is to convert the raw binary data to commonly used individual data formats for each recorded sensor (see Table I). This process is summarised in the following graph (Fig. 4). A dedicated C++ converter program for each sensor type reads a single raw data file generated by the sensor and exports Applied Computer Systems _________________________________________________________________________________________________2021/26 175 the data to a respective standard file format. This process is automated by a python script that systematically launches those converters and captures their outputs to determine whether a conversion was successful. A log file is generated and updated after each step to keep track of the progress in case of the main python script failure.
Our typical recording contains four distinct parts, as shown in Fig. 5. These parts are separated during post-processing. There are two periods for time alignment -b and d, a sequence for LIDAR-RGB extrinsic calibration -a, and the main recording that contains data recorded during the platform motion in each recording -c.
For each C++ converter, it is possible to select a start and an end time to export only a particular part of the binary file. This mechanism is used to partition the recording into corresponding sections. We first estimate approximate time offsets between the sensors by comparing absolute time values of the DVS, Realsense and LIDAR. It is necessary because not all the sensors use absolute time -for example, the DVS starts its time count after the reset command at startup. Then, we manually examine the DVS IMU gyroscope data graphs and note the start and end times of both platform tilting/lowering procedures. After that, the time alignment script uses estimates of time offsets to match the corresponding periods from the LIDAR and RS IMU gyroscope data. Accurate time offset values are obtained by finding a time offset with maximum crosscorrelation value within a predetermined search window. LIDAR and RS gyroscope data are re-sampled to match the DVS gyroscope sample rate, which is 1 kHz. The time drift between these sensors is calculated by comparing the time between two controlled platform tilting events, the first one before the main recording and the second one shortly after.
To produce separate data for RGB -LIDAR extrinsic calibration, we manually browse RGB frames to visually find the start and end frame numbers of this part of the recording. The time alignment script correlates these frame numbers to their time values according to the LIDAR time conversion coefficients found by the time offset calculation.
The converter script is used again on the raw data when all recording periods are marked into a JSON file. This time, the converter uses the previously acquired time interval values to convert only the distinct four parts.

D. Preview Video Generation
For users to preview the data sequences quickly, we provide a video containing DVS event visualisation frames, RGB frames and depth images side by side. Since Realsense frame durations are not constant, there are occasional occurrences of longer frame times followed by a sequence of shorter frame periods. First, the Realsense RGB and depth images corresponding to the desired recording fragment start and end times are copied. Then, the frames of DVS events are generated by drawing all the positive and negative events of the corresponding RGB frame duration into a png image. Afterwards, all of the three sensor images are combined into a single frame. Finally, frames are encoded into a video using the FFmpeg 2 tool.

E. Reference Trajectory Generation
With each recording section of the Agri-EBV dataset, we provide a reference trajectory of the sensor platform estimated by the Cartographer SLAM tool [11]. The ROS version of the Cartographer is used. Therefore, we first create a rosbag file that contains both LIDAR point cloud and IMU data. Then, a script systematically launches the Cartographer for each of the recordings. The output trajectory points of the Cartographer are then captured by a custom ROS node, as there is no provision for writing these data directly to a file. The trajectory is provided in the regular dataset version as a list of points and orientations in a csv file. However, the trajectory is a separate rosbag file containing a single topic of the tf2 type in the ROS version. Afterwards, a Gnuplot 3 diagram tool is used to draw the graphs of XY coordinates of each trajectory for visual validation.

F. Validation
We validate data in multiple steps. An overview of data validation is shown in Table II. The first data validation is done after raw data files are uploaded to a computer. We check recording software log files and convert only those recordings showing up as completed in the log. This action is helpful because there may be recording attempts that have failed due to equipment or environmental issues. The main validation step is done after the raw data are converted to the standard formats. The IMU-based time alignment approach implies that the sensor platform is tilted and lowered before and after the main recording. Therefore, these moments have to be located visually in the IMU gyroscope data of the DVS sensor. The data graph at these moments has a distinct shape, so it is relatively easy to do by hand. The recording is processed further only if the platform tilting parts can be found in the IMU data so that time alignment can be calculated.
We provide data for postponed LIDAR to RGB extrinsic calibration that contains a sphere held in discrete positions, using the methodology from [12]. Therefore, we also check if the whole sphere is visible in the captured RGB frames. RGB frames are also used to visually find the start and the end of the main recording part.
Individual data converters provide information that is used to assess data quality. In particular, we consider the ratio of the number of positive and negative DVS events and the number of lost IMU samples. For example, some recordings may have missing data samples due to connection bandwidth limitations. Converter software compares the actual number of IMU samples with an estimate based on a known IMU sample rate and recording length. We provide this ratio of lost IMU samples in the metadata file of the dataset because it may also indicate possible missing data from other sensors.
After the preview videos are generated, we visually validate the content of RGB frames and frames generated by accumulating DVS events. In RGB images, we check if the desired scene and scenario are presented as intended and whether the images are sharp and clear. In DVS images, we check for excessive noise and sharp edges, as blurred edges may indicate issues with focal length adjustment of the DVS sensor.
The quality of LIDAR data is validated directly by visualisation of point cloud data and assessing trajectories obtained with the SLAM tool. We check if trajectories are closed and have a distinctive shape in the XY plane.

G. Conversion to Rosbag Format
The team also decided to publish the data in the rosbag file format as it provides a unified representation of all the different data types obtained from our sensor platform. All the data from various sensors written in the rosbag files can be simultaneously played back in the ROS environment and processed with built-in tools like Rviz or C++ and Python code API. We use twostep conversion: first, we convert each source file or a set of files to its ROS representation. This way, we obtain rosbag files that contain data from a single sensor, e.g., a different file for DVS events and DVS IMU, etc. Then, in the second step, we can combine the desired data of any of the sensors into a single merged file. We chose to publish four rosbag files of each recording. If needed, rosbag files can be combined further using rosbag API or ready-to-use ROS codebase from open-source packages.

H. Structure for Publishing
To create the final data structure for publishing (see Fig. 6), we use a python script that copies and renames the necessary data, compresses the data and generates a unique metadata file for each data sequence. The metadata contains information about the sensors used, data structure description, data about the time synchronisation and known issues. We publish three archives for each of the recordings: the main recording data, data for postponed calibration, and the main recording data in the rosbag format.  For internal purposes, we used recording names consisting of a date followed by a time, for example, "2020-09-25_11-21-18". It is a simple approach to generate a unique identifier for each of the recordings and a quick way to evaluate the time and place of the data recording session. For the end-user, this naming scheme would not make any sense. Therefore, when generating the final data structure, all the occurrences of the internal names are renamed to a recording number followed by the name of the recorded environment, e.g., "01_forest".
For hosting the dataset, IEEE DataPort was selected. It is a widely recognised unified open-access platform with a great variety of datasets. It offers good collaboration opportunities and allows updating datasets and getting feedback from other users.

III. CONCLUSION
In this paper, we have described procedures involved in processing data for the Agri-EBV dataset. Due to the requirements to achieve accurate time alignment of the sensors used, a necessity for postponed calibration data and conversion to the rosbag format, the whole procedure demands multiple stages of data conversion, formatting and validation.
The procedure we described could include more automation to reduce the human time required during dataset processing. We believe that finding loop closure and calibration balls in RGB frames could be done by automated computer vision software. Still, the development of such a system was not feasible for our number of recordings. Also, the IMU-based time alignment method could be improved to automatically find the time offset and drift values without the need for dedicated platform movement procedures.