dataset

This paper presents a novel, challenging dataset that offers a new landscape of testing material for mobile robotics, autonomous driving research, and forestry operations. In contrast to common urban structures, we explore an unregulated natural environment representative of suburban and forest settings. The sequences capture two seasons: places are visited in both summer and winter conditions. The vehicle used for recording is equipped with a sensor rig consisting of four RGB cameras, an Inertial Measurement Unit (IMU), and a Global Navigation Satellite System (GNSS) receiver. The sensors are synchronized using non-drifting timestamps. The dataset provides trajectories of varying complexity for testing state-of-the-art visual odometry and visual simultaneous localization and mapping (SLAM) algorithms. The full dataset and toolkits are available for download at: http://urn.fi/urn:nbn:fi:att:9b8157a7-1e0f-47c2-bd4e-a19a7e952c0d. Alternatively, the dataset can be found by searching for the article title at: http://etsin.fairdata.fi. © 2020 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
The intense competition to develop a safe, marketable self-driving car has motivated a large body of research on autonomous vehicles. Alongside the growing interest in putting self-driving cars on the road, companies are also introducing other forms of autonomous vehicles to automate industrial processes such as mining, shipping, agriculture, and forestry. Irrespective of the industry and target operations, the autonomy of any machine depends heavily on advances in a number of vision technologies, such as object detection [1], reconstruction [2], and scene perception [3]. However, the base capabilities of an autonomous vehicle that need the most attention remain visual odometry, relocalization, and mapping [4]. These require testing and validation in a simulated environment across all scenarios that a vehicle or machine may face. A variety of public datasets provide a good amount of data for testing in various conditions and locations. Below, we review some well-known datasets to gauge the scope of existing collections. Most of these datasets focus on urban environments (for example, [5][6][7][8][9]) in order to facilitate testing on public roads in urban areas. Among the earliest datasets recording urban environments are Ford Campus [10] and KITTI [11]. Being among the first public datasets in the field, they contributed significantly to testing and validation. Recent additions to the publicly available datasets are the Oxford RobotCar [12], KAIST Multi-Spectral Day/Night [13], and the Complex Urban LiDAR Data Set [14]. Combined, these datasets provide a significant amount of testing data for urban environments, with short and long trajectories at various speeds [10]. Moreover, they incorporate weather and seasonal changes [12], long-term changes in urban structure [12], and gradual or sudden illumination variations [13]. However, all these datasets target indoor or outdoor urban environments.
In contrast, some unique datasets target entirely different environments to assist the automation of other forms of vehicles. Among these are Aqualoc Underwater [15], Canoe [16], and the Underwater Caves SONAR and Vision Dataset [17]. These datasets comprise data acquired for underwater exploration and surface sailing. A few other public datasets target more domain-specific terrains. In [18], the authors recorded data in Chile's largest underground, production-active copper mine; approximately 2 km of data was recorded using lidar, radar, and stereo cameras fixed on a robotic platform. The Devon Island rover navigation dataset [19] provides data for testing rovers for planetary exploration. It was recorded on Devon Island in the Canadian High Arctic, which is considered analogous to Moon and Mars terrains due to the wide variety of geological features and microbiological attributes of the site.
The dataset most relevant to our work is the SFU Mountain dataset [20]. That study used a ground-based mobile robot to traverse walking trails on Burnaby Mountain, British Columbia, Canada. The dataset covers semi-structured woodland terrain under different illumination and weather conditions, with changing vegetation, infrastructure, and pedestrian traffic. It provides a good amount of data for visual odometry but offers few opportunities to test loop closure and relocalization.
In [21], the authors carried out brief experiments on the SFU Mountain dataset and on their own dataset, Hillwood, which consists of photorealistic rendered and real forest video scenes. However, the Hillwood dataset provides only video recordings, without any ground truth information. In their concluding remarks, the authors stressed the need for, and advantage of, a real forest dataset with complete, synchronized ground truth poses [21].
In this paper, we present a new dataset recorded in a real forest landscape on the outskirts of Tampere, Finland. The goal is to provide testing data that facilitate research toward increasing the autonomy of vehicles traversing rural areas and of heavy machines working in the forest. Unlike urban settings, such terrain offers fewer discriminative landmarks and more repetitive textures. Presumably, this benefits visual odometry to some extent but adversely affects relocalization algorithms. The dataset provides semi-structured forest routes under different conditions (i.e., lighting, weather, vegetation, and infrastructure) in a highly self-similar natural environment. Furthermore, the sequences include scenes that closely replicate the motions (i.e., stationary periods, sharp motion, bumps and potholes, slopes, and back-and-forth motion) and environments (i.e., log piles, close-ups of trees, and off-road routes) involved in actual forestry operations. The dataset includes unique trajectories to thoroughly test both visual simultaneous localization and mapping (visual SLAM) and visual odometry algorithms. Moreover, most paths are traversed in two different conditions, namely sunny summer and snowy winter. The dataset provides images from four cameras and ground truth poses for each sequence, obtained by fusing Global Navigation Satellite System (GNSS) and Inertial Measurement Unit (IMU) data. We provide processed, rectified images, calibration data, and ground truth at three sampling rates, i.e., 40, 13.33, and 8 Hz, except for two sequences, which are sampled at 20, 10, and 7 Hz. For brevity, we hereafter refer to 13.33 Hz as 13 Hz. Additionally, we publicly provide raw images (40 Hz) for most of the sequences, together with the calibration images.
To this end, we also provide development tools to process the raw data and evaluation tools to facilitate benchmarking against the state of the art.
We hope this dataset provides a useful reference for rural, forest, and general terrain environments and helps researchers address the challenges faced in this field of research.

Recording platform and sensor configuration
The recording platform and the sensor arrangement are illustrated in Fig. 1. The data were recorded using a sensor rig mounted on a vibration-damping platform, which was affixed to the vehicle (a Jeep) with strong suction cups. The rig houses all the sensors, as shown in Fig. 2.
The sensor and hardware specifications are as follows: For the sensors and their coordinate systems, we use the following notations. All cameras were connected to the embedded computer. The camera data were stored on the computer, while the IMU and GNSS data were recorded on the internal memory of the NovAtel module. To minimize storage write latency and prevent data loss, we used CAT7 cables and wrote the images to SSDs using parallel threads for the cameras.
To obtain high-quality images, it was essential to control the exposure time of the cameras during acquisition. To minimize motion blur, the exposure time was kept below 10 ms. Moreover, using images from four cameras for stereo analysis requires strict synchronization. Hence, to acquire synchronized feeds from the four cameras at 40 fps on a Windows-based platform, we used special-purpose triggering hardware, a Machine Vision Timing Controller. This timing controller sent synchronized trigger signals to all the cameras to enforce consistent real-time capture. Additionally, the triggering device sent one trigger signal to the NovAtel module, which was configured to store a timestamp in GPS time upon receiving it. Unlike the clock of the Windows platform, GPS time is more accurate and does not drift. This timestamp signal was sent with a delay of 1.5 ms; although its effect is negligible, we compensated for this delay during ground truth generation. The IMU and GNSS data are pre-synchronized by the NovAtel receiver. As a result, the camera, IMU, and GNSS data are precisely synchronized. The raw GNSS and IMU data were acquired at 20 Hz and 200 Hz, respectively, although these are not the limits of the system, whose maximum acquisition rates are 100 Hz and 1000 Hz, respectively. For this study, the 20 Hz GNSS data were combined with the IMU data and interpolated to 50 Hz during post-processing with the NovAtel Inertial Explorer software.
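The text does not detail how the 1.5 ms delay is removed or how the 50 Hz GNSS/IMU solution is resampled to the image timestamps. The Python sketch below shows one plausible approach, assuming that each recorded GPS-time stamp lags the camera exposure by exactly 1.5 ms and that positions may be linearly interpolated between solution epochs. The helpers `exposure_times` and `interpolate_positions` are hypothetical and not part of the released toolkit; orientations would need a rotation-aware scheme such as spherical linear interpolation, which is omitted here.

```python
import numpy as np

# Assumption: each GPS-time stamp is stored 1.5 ms after the camera trigger fires.
TRIGGER_DELAY_S = 1.5e-3


def exposure_times(stamps_s):
    """Shift recorded GPS-time stamps back by the trigger delay (hypothetical helper)."""
    return np.asarray(stamps_s, dtype=float) - TRIGGER_DELAY_S


def interpolate_positions(sol_times_s, sol_xyz, query_times_s):
    """Linearly interpolate an (N, 3) position track at the query times, axis by axis."""
    sol_xyz = np.asarray(sol_xyz, dtype=float)
    return np.column_stack(
        [np.interp(query_times_s, sol_times_s, sol_xyz[:, k]) for k in range(3)]
    )


# Usage: a 50 Hz solution for a vehicle moving east at 8 m/s,
# queried at sixteen 40 Hz image stamps starting at t = 0.1 s.
sol_t = np.arange(50) * 0.02
sol_p = np.column_stack([8.0 * sol_t, np.zeros(50), np.zeros(50)])
img_t = exposure_times(0.1 + np.arange(16) * 0.025)
img_p = interpolate_positions(sol_t, sol_p, img_t)
```

Linear interpolation is adequate here because the solution epochs are only 20 ms apart; whether the authors' post-processing does the same is not stated.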
The sensor arrangement is illustrated in Fig. 3. It consists of four cameras, a GNSS antenna, and an IMU. The middle cameras (C2 and C3) face forward, with the IMU housed between them. The outward-facing cameras (C1 and C4) are rotated by nearly the same angle away from the forward direction. This arrangement is motivated by the goal of testing the effects of various camera configurations on the accuracy of joint perception. In SLAM implementations, it is commonly observed that during forward motion, the view is dominated by consistently tracked regions of interest far from the camera, which degrades scale estimation in visual odometry. The effect is more apparent in monocular SLAM algorithms, which eventually fail because distant points do not exhibit enough disparity change; these methods survive only as long as the closest features are not lost to motion blur. If the camera is instead mounted at an angle to the forward direction, the effective image area in which feature points exhibit disparity change increases.
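To make this argument concrete, the Python sketch below projects a static point through an ideal pinhole camera before and after the vehicle moves one frame forward and reports the horizontal image shift. The focal length (1000 px), the 5° offset of the point from the optical axis, and the 60° yaw in the usage example are illustrative assumptions; the paper does not state the mounting angle of C1 and C4, and `pixel_shift` is a hypothetical helper.

```python
import numpy as np


def pixel_shift(yaw_deg, depth_m, step_m, f_px=1000.0, offset_deg=5.0):
    """Horizontal image shift (px) of a static point when the camera moves
    step_m straight ahead. The optical axis is yawed yaw_deg away from the
    direction of travel; the point lies depth_m away along a ray offset_deg
    from the optical axis. Ideal pinhole camera, no lens distortion."""
    yaw = np.radians(yaw_deg)
    ray = yaw + np.radians(offset_deg)
    # Top-down world frame: x = lateral (right), z = forward (travel direction).
    point = depth_m * np.array([np.sin(ray), np.cos(ray)])
    axis = np.array([np.sin(yaw), np.cos(yaw)])    # optical axis
    right = np.array([np.cos(yaw), -np.sin(yaw)])  # image x-axis

    def project(cam_xz):
        rel = point - cam_xz
        return f_px * rel.dot(right) / rel.dot(axis)

    before = project(np.array([0.0, 0.0]))
    after = project(np.array([0.0, step_m]))
    return abs(after - before)


# Hypothetical setup: 0.2 m per frame (about 30 km/h at 40 Hz), point 20 m away.
forward = pixel_shift(yaw_deg=0.0, depth_m=20.0, step_m=0.2)
angled = pixel_shift(yaw_deg=60.0, depth_m=20.0, step_m=0.2)
print(f"forward-facing: {forward:.2f} px, angled 60 deg: {angled:.2f} px")
```

With these numbers, the angled camera sees roughly ten times more image motion (about 9 px versus under 1 px) for a point at the same distance, and a point exactly on the forward camera's optical axis does not move at all.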

Data overview
Our primary contribution in publishing this dataset is to provide publicly accessible data recorded in a forest for research on advanced driver-assistance systems (ADAS) and autonomous work machines. In general, the dataset poses challenges by including sequences recorded at various times of day and in various weather conditions. Moreover, the sequences were recorded so that they present considerable challenges for both visual odometry and SLAM approaches. The area explored during the recording sessions is shown in Fig. 4. The dataset comprises unique trajectories, most of which were recorded in two seasonal conditions; in winter, part of one route was blocked by heavy snow and could not be re-recorded. An overview of the dataset is provided in Table 1. The dataset offers a total of 11 sequences. The original visual data were recorded at 40 Hz and later subsampled to facilitate testing; the subsampled versions are provided as compressed image packages and Rosbags. The number of frames at each sampling rate is listed next to the sequence name in Table 1. Three of the sequences offer loop closure opportunities, while the remaining sequences are aimed at testing visual odometry. We have also tabulated the distance covered on each path, which ranges from 1.3 km to 6.48 km, together with the season and illumination condition of each sequence. The season is also abbreviated in the name of each sequence for clarity. The dataset covers a variety of illumination conditions, such as overcast, direct sunlight, dusk, and night.
Fig. 6. Illustration of the drastic changes in appearance of the scene produced by different illumination and weather conditions.
However, the dataset does not offer sequences recorded in rain or fog, which would have provided further useful test conditions. Further details about the challenges of each sequence are provided in Section 7, Discussion. Fig. 5 presents a montage of selected images illustrating the range of appearances of the environment encountered across different seasons and routes. The left half of the montage shows the images from the left to right cameras in the summer dataset, while the right half shows the corresponding winter images of the same scenes from nearly the same vehicle locations. Fig. 6 illustrates the changes in the appearance of a scene seen from nearly the same camera perspective and location in both seasons, and the challenges these changes bring about. The dark overcast conditions in winter demand longer exposure times and slower vehicle motion to capture the details of the scene accurately. The summer season, in turn, presents challenges such as overexposure, rain, puddles, and lens flares. The last row of images shows night and dusk conditions with varying illumination.
The high resolution and frame rate of the recordings make it challenging to store the data in online repositories. To make the data convenient to use, we have split the dataset into individual sequences, each of which can be downloaded and used independently as a .zip package at three sampling rates. Since the parallel configuration is the most common choice for stereoscopic analysis, we provide processed images only for the forward-facing stereo pair, i.e., C2 and C3. Nonetheless, the raw images from all cameras are included in the dataset, along with a toolkit to easily extract and process them into a ready-to-use format. The MATLAB toolkit extracts the raw images into the stereo pairs C1-C2, C2-C3, and C3-C4.
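The paper states the subsampled rates but not how frames were selected. Assuming simple decimation, i.e., keeping every k-th frame of the 40 Hz stream, the rates 40, 13.33, and 8 Hz correspond to steps of 1, 3, and 5. The Python sketch below illustrates this; `subsample` and `subsampled_rate` are hypothetical helpers, not part of the released MATLAB toolkit, and the two sequences provided at 20, 10, and 7 Hz are not covered by this assumption.

```python
BASE_RATE_HZ = 40.0  # original camera frame rate stated in the text


def subsample(frames, step):
    """Keep every `step`-th frame: indices 0, step, 2*step, ..."""
    return frames[::step]


def subsampled_rate(step, base_rate_hz=BASE_RATE_HZ):
    """Effective frame rate after keeping every `step`-th frame."""
    return base_rate_hz / step


# Hypothetical decimation steps that reproduce the 40, 13.33 and 8 Hz versions.
for step in (1, 3, 5):
    print(f"step {step}: {subsampled_rate(step):.2f} Hz")

# A 4000-frame sequence keeps ceil(4000 / 3) = 1334 frames at step 3.
kept = subsample(list(range(4000)), 3)
```

Under the same assumption, the timestamp and ground truth files would have to be decimated with the same step to stay aligned with the images.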
The data structure of each sequence is illustrated in Fig. 7. The folder name indicates the type of data and its sampling rate. Each sequence is self-contained, and its supporting files are included in the compressed package. The package contains subfolders for the forward-facing stereo pair (C2 and C3); the Rosbag version contains a Rosbag file instead of the PNG images for these cameras. Additionally, the calibration files, timestamps, and ground truth poses for the rectified data are provided in the corresponding directories. The ground truth already corresponds directly to the provided images and needs no further matching or synchronization. Each row of the ground truth text file contains one pose, the 3 × 4 matrix [R|t] flattened in row-major order:

$$ r_{11}\; r_{12}\; r_{13}\; t_{1}\; r_{21}\; r_{22}\; r_{23}\; t_{2}\; r_{31}\; r_{32}\; r_{33}\; t_{3} $$
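Reading such a file amounts to reshaping each row of 12 values into the 3 × 4 matrix [R|t] described above. The Python sketch below does this and pads the result to a 4 × 4 homogeneous transform, which is convenient for composing poses. It assumes whitespace-separated values with one pose per line; `parse_pose_line` and `load_poses` are hypothetical helpers, not part of the released toolkit.

```python
import numpy as np


def parse_pose_line(line):
    """Parse one ground-truth row of 12 numbers ([R|t] flattened row-major)
    into a 4x4 homogeneous transform."""
    values = [float(v) for v in line.split()]
    if len(values) != 12:
        raise ValueError(f"expected 12 values, got {len(values)}")
    pose = np.eye(4)
    pose[:3, :4] = np.array(values).reshape(3, 4)  # r11 r12 r13 t1 / r21 ... / r31 ...
    return pose


def load_poses(path):
    """Read every non-empty line of a ground-truth text file into 4x4 poses."""
    with open(path) as fh:
        return [parse_pose_line(ln) for ln in fh if ln.strip()]


# Usage: identity rotation with translation (5, 6, 7).
example = parse_pose_line("1 0 0 5 0 1 0 6 0 0 1 7")
```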

Sensor calibration and ground truth
In this section, we will discuss two forms of calibration that are essential to use the data effectively.

Cam-to-cam calibration
The first calibration step is the camera-to-camera calibration, which is performed to compute the intrinsic parameters and extrinsic transformations for the cameras. In the dataset, we have included both the processed data (using the calibrations) and the raw data. The processed data from the cameras can be directly used with any SLAM pipeline using the provided calibration parameters. However, for researchers who wish to re-calibrate the cameras and process the raw data themselves, we have included the raw images along with the calibration images in the dataset.
The calibration images are provided as stereo pairs of neighboring cameras. Special attention was given to calibration: the cameras were recalibrated for each recording session. Although the sensor setup was not altered, small numerical differences between sessions are possible. We strongly recommend using the parameters from the calibration files rather than values read from the illustrations. The camera-to-camera calibrations are provided for the neighboring camera pairs, namely C1-C2, C2-C3, and C3-C4. These pairs were jointly calibrated for their intrinsic and extrinsic parameters using the MATLAB stereo calibration toolbox, based on the approach presented in [22]. The calibration information is provided in two forms: a MATLAB stereo-parameters object file and a text file containing excerpts of that object file.
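Because extrinsics are provided only for neighboring pairs, a transform between non-adjacent cameras (e.g., C1 and C3) can be obtained by chaining the pairwise ones. The Python sketch below assumes each extrinsic is written as a 4 × 4 rigid transform T_a_b that maps points from camera b's frame into camera a's frame; the helper names and the pure-translation values in the usage example are hypothetical. The convention used in the provided calibration files should be checked (and converted if necessary) before composing, and calibration errors accumulate along the chain.

```python
import numpy as np


def make_T(R, t):
    """4x4 rigid transform T_a_b that maps points from frame b into frame a."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R, dtype=float)
    T[:3, 3] = np.asarray(t, dtype=float)
    return T


def invert(T):
    """Closed-form inverse of a rigid transform: (R, t) -> (R^T, -R^T t)."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti


def chain(*transforms):
    """Compose T_a_b, T_b_c, ... into T_a_z by left-to-right multiplication."""
    out = np.eye(4)
    for T in transforms:
        out = out @ T
    return out


# Usage with hypothetical pure-translation extrinsics (metres):
T_c1_c2 = make_T(np.eye(3), [0.3, 0.0, 0.0])
T_c2_c3 = make_T(np.eye(3), [0.3, 0.0, 0.0])
T_c1_c3 = chain(T_c1_c2, T_c2_c3)  # C3 -> C1 via C2
T_c3_c1 = invert(T_c1_c3)
```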

Cam-to-IMU calibration
We calibrate the camera and the IMU to obtain the extrinsic transformation between them. For this purpose, a sequence was recorded in front of the calibration board, in which motion in all six degrees of freedom was excited by moving along and around each axis. Estimating the relation between the camera and the IMU is then analogous to the hand-eye calibration problem. We use the Kalibr toolkit [23], which estimates the spatial and temporal parameters of a camera system with respect to an intrinsically calibrated IMU. Since the images and the IMU/GNSS data are already accurately synchronized through the timestamps, we do not use the temporal offset estimated by the toolkit; only the spatial parameters, i.e., the extrinsic transformation between the camera and the IMU, are of interest in this work. We calibrate the IMU with camera C2, chosen for consistency with our ground truth coordinate system and with the common practice of using a forward-facing camera.
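Once the camera-IMU extrinsic is known, an IMU pose from the GNSS/IMU solution can be moved to camera C2 by a single matrix product. The Python sketch below assumes 4 × 4 transforms, with T_imu_cam mapping points from the C2 frame into the IMU frame. Re-expressing the trajectory relative to its first pose is shown as one common convention; whether the released ground truth is anchored this way is an assumption here, and the numbers in the usage example are hypothetical.

```python
import numpy as np


def camera_pose_from_imu(T_world_imu, T_imu_cam):
    """World pose of camera C2 from the IMU's world pose and the camera-IMU
    extrinsic T_imu_cam (maps points from the C2 frame into the IMU frame)."""
    return T_world_imu @ T_imu_cam


def relative_to_first(poses):
    """Re-express a pose sequence relative to its first pose, so that the
    trajectory starts at the identity in the first camera frame."""
    T0_inv = np.linalg.inv(poses[0])
    return [T0_inv @ T for T in poses]


# Usage with hypothetical values: the IMU sits 0.5 m from C2 along y.
T_imu_cam = np.eye(4)
T_imu_cam[1, 3] = 0.5
imu_poses = [np.eye(4) for _ in range(3)]
for k, T in enumerate(imu_poses):
    T[0, 3] = 10.0 * k  # IMU moves 10 m along x per sample
cam_poses = [camera_pose_from_imu(T, T_imu_cam) for T in imu_poses]
cam_traj = relative_to_first(cam_poses)
```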

Ground truth quality evaluation
Acquiring ground truth in an enclosed environment is challenging. The global accuracy of the ground truth solution depends on the availability of GNSS signals. In general, GNSS signals are strong and accurate in open areas, while poor signals are received in enclosed areas such as indoors, narrow city streets, and forests. Local accuracy, however, can be improved by fusing the GNSS information with information from local sensors such as an IMU, odometer, radar, lidar, or camera. As mentioned earlier, we use the NovAtel PwrPak7™ module to obtain a ground truth solution through a tightly coupled pose estimation framework that fuses GNSS and IMU information.
To assure readers of the quality of the ground truth, we provide the estimated position accuracy, in the form of standard deviations of the positions at every timestamp, for all sequences. To aid the reader, the range of each sequence is indicated in the figures. As Fig. 8(a-c) shows, the average standard deviation for the winter sequences (W01, W03-W07) is below 2 cm for East and North, with occasional larger deviations. The spikes in deviation occur where the vehicle traverses a narrow path densely surrounded by trees for a longer period. In all sequences, the errors are lowest along the East axis, followed by the North axis. The largest deviations are found in the elevation, which is typical of such a system.
The summer sequences (S01-S05), on the other hand, exhibit slightly larger standard deviations (see Fig. 8(d)). Except for S02, the errors of all summer sequences are below 15 cm in East and 20 cm in North. As before, the largest deviation is observed in the elevation, reaching 0.54 m in sequence S05. The deterioration of GNSS performance for the summer sequences is expected. Whereas the winter sequences were recorded in December 2018, the summer sequences were recorded in the spring, in May 2019. In spring, GNSS results can be affected by foliage, which can cause 24 to 35% attenuation at L-band [24]. This attenuation results from the combined absorption and scattering of the signals by tree canopies and trunks. In winter, the sparse foliage of the tree canopy leaves more non-attenuating space for the signals, an advantage that is lost in spring when the foliage is dense [25]. When the GNSS signal is unavailable, the ground truth pose estimation system relies more on the information provided by the IMU. Nonetheless, considering the task at hand, the results obtained for the summer sequences are good and provide a valid reference for experimentation.

Development and evaluation toolkit
The dataset is accompanied by a set of MATLAB scripts for processing the raw data and for evaluating the odometry obtained from a user's algorithm against the ground truth poses. Although the dataset includes ready-to-use processed data, each sequence is also accompanied by its raw images. The raw data is of interest to researchers who wish to re-calibrate the cameras from the provided calibration images using their own or alternative calibration algorithms; the new calibration can then be used to process the raw images. The MATLAB scripts readRaw_Debayer.m and readRaw_Rectify.m read the raw images from a folder and write the debayered and rectified images, respectively, to another directory. The debayered color images can then be used with the provided calibration data or with any newly computed calibration.
An evaluation script is also provided as part of the toolkit. The MATLAB script mainEvaluate.m evaluates the obtained visual odometry poses against the ground truth poses; before running it, the paths to the text files containing the ground truth poses and the estimated poses must be specified. The script computes the relative pose error (relative translation and rotation errors) and the absolute trajectory error (ATE) for each sequence, as well as the overall errors across all sequences. These metrics were chosen because the relative pose error captures the local accuracy of the trajectory over a fixed distance: comparing over fixed distances measures drift more effectively and is better suited to visual odometry. The ATE, on the other hand, provides a globally consistent comparison based on the absolute distances between corresponding ground truth and estimated poses.
The ATE is obtained by computing the absolute distances between corresponding positions of the estimated and ground truth trajectories. For global consistency, both trajectories must be expressed in the same reference coordinate system. If they are, the ATE can be computed directly; otherwise, the aligning transformation T can be computed in closed form using Umeyama's method [26]. Note that ATE considers only translational errors. The commonly used form of ATE is [27]

$$ \mathrm{ATE} = \left( \frac{1}{N} \sum_{i=1}^{N} \left\lVert \operatorname{trans}\!\left( \hat{p}_i^{-1}\, T\, p_i \right) \right\rVert^2 \right)^{1/2} $$

where N is the number of poses, trans(·) extracts the translational component, and T ∈ SE(3) transforms the trajectory p_i into the coordinate system of the ground truth poses p̂_i. Additionally, the mean, standard deviation, minimum, median, and maximum errors can be computed to analyze the performance from different perspectives.
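The Python sketch below computes the ATE as the root-mean-square of per-pose translational errors after a rigid alignment estimated with Umeyama's closed-form method (without the scale term, since T ∈ SE(3)). Only positions are needed: for rigid poses, the norm of the translational part of p̂_i⁻¹ T p_i equals the distance between the aligned estimated position and the ground truth position, because rotations preserve lengths. This is an independent sketch, not the code of mainEvaluate.m; the function names are hypothetical.

```python
import numpy as np


def umeyama_rigid(src, dst):
    """Closed-form R, t minimising sum ||dst_i - (R src_i + t)||^2
    (Umeyama's method without the scale term). src, dst: (N, 3) arrays."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    cov = (dst - mu_d).T @ (src - mu_s) / src.shape[0]
    U, _, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # enforce a proper rotation, not a reflection
    R = U @ S @ Vt
    t = mu_d - R @ mu_s
    return R, t


def ate_stats(est_xyz, gt_xyz, align=True):
    """Per-pose translational errors after optional rigid alignment,
    summarised as RMSE (the ATE) plus mean/std/min/median/max."""
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    if align:
        R, t = umeyama_rigid(est, gt)
        est = est @ R.T + t
    err = np.linalg.norm(gt - est, axis=1)
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mean": float(err.mean()),
        "std": float(err.std()),
        "min": float(err.min()),
        "median": float(np.median(err)),
        "max": float(err.max()),
    }


# Usage: the "ground truth" is a rotated and shifted copy of the estimate.
rng = np.random.default_rng(0)
est = rng.normal(size=(100, 3))
c, s = np.cos(0.5), np.sin(0.5)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
gt = est @ R_true.T + np.array([5.0, -2.0, 1.0])
aligned = ate_stats(est, gt)                 # near zero: the rigid motion is recovered
unaligned = ate_stats(est, gt, align=False)  # large: frames differ
```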
As mentioned earlier, relative errors provide more accurate local information about visual odometry errors. Kümmerle et al. [28] proposed computing relative errors over intervals and averaging them, with intervals selected based on a fixed distance. This is a good approach; however, the translation and rotation errors are combined into a single joint error metric. Geiger et al. [11] built on this concept and separated the rotation and translation parts, which allowed them to compute the rotation and translation errors independently. The resulting relative translation error (RTE) and relative rotation error (RRE) are defined as

$$ \mathrm{RTE}(\tau) = \frac{1}{|\tau|} \sum_{(i,j) \in \tau} \left\lVert \operatorname{trans}\!\left( (\hat{p}_j \ominus \hat{p}_i) \ominus (p_j \ominus p_i) \right) \right\rVert_2 $$

$$ \mathrm{RRE}(\tau) = \frac{1}{|\tau|} \sum_{(i,j) \in \tau} \angle\!\left[ (\hat{p}_j \ominus \hat{p}_i) \ominus (p_j \ominus p_i) \right] $$

where the interval τ corresponds to the set of image frame pairs (i, j) that cover a specific length of the trajectory, and p_i and p̂_i are the estimated and ground truth poses, respectively. The symbol ⊖ denotes the inverse compositional operator explained in [28], and ∠[·] is the rotation angle used for the rotation error.
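The relative metrics can be sketched in Python with 4 × 4 poses, writing the inverse compositional operator as p_j ⊖ p_i = p_i⁻¹ p_j. Pairs (i, j) are chosen so that frame j is the first frame at least a given distance along the ground truth path after frame i; the rotation angle ∠[·] is recovered from the trace of the residual rotation. This sketch reports raw means in metres and radians; the KITTI development kit additionally normalizes by segment length and averages over several segment lengths, and the helper names and segment length used here are hypothetical.

```python
import numpy as np


def path_distances(poses):
    """Cumulative distance travelled along a list of 4x4 poses."""
    xyz = np.array([T[:3, 3] for T in poses])
    steps = np.linalg.norm(np.diff(xyz, axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(steps)])


def frame_pairs(gt, length_m, stride=1):
    """Pairs (i, j) where j is the first frame at least length_m after i
    along the ground-truth path; together they form the set tau."""
    dist = path_distances(gt)
    pairs = []
    for i in range(0, len(gt), stride):
        j = int(np.searchsorted(dist, dist[i] + length_m))
        if j < len(gt):
            pairs.append((i, j))
    return pairs


def rotation_angle(R):
    """Angle of a rotation matrix, i.e. the angle operator in the RRE."""
    return float(np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)))


def relative_errors(est, gt, length_m, stride=1):
    """Mean RTE (metres) and RRE (radians) over all pairs in tau."""
    rte, rre = [], []
    for i, j in frame_pairs(gt, length_m, stride):
        d_gt = np.linalg.inv(gt[i]) @ gt[j]     # ground-truth motion i -> j
        d_est = np.linalg.inv(est[i]) @ est[j]  # estimated motion i -> j
        err = np.linalg.inv(d_est) @ d_gt       # residual relative motion
        rte.append(np.linalg.norm(err[:3, 3]))
        rre.append(rotation_angle(err[:3, :3]))
    if not rte:
        return float("nan"), float("nan")
    return float(np.mean(rte)), float(np.mean(rre))


# Usage: a straight 1 m-per-frame path estimated with a 10% scale error.
gt_poses, est_poses = [], []
for k in range(100):
    g = np.eye(4)
    g[0, 3] = float(k)
    e = np.eye(4)
    e[0, 3] = 1.1 * k
    gt_poses.append(g)
    est_poses.append(e)
rte_10m, rre_10m = relative_errors(est_poses, gt_poses, 10.0)
```

With a 10% scale error, every 10 m segment is overestimated by 1 m, so the mean RTE is 1 m while the RRE stays at zero.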

Benchmarking
In this section, we discuss the nature of the trajectories planned and traversed during the dataset recording. Furthermore, we provide experimental results of using state-of-the-art visual SLAM methods on the FinnForest dataset.
All sequences start and end at the same location. Each trajectory was recorded to address different conditions. The first route (W01 and S01), shown in Figs. 9(a-c) and 10(a-c), is a short, ellipse-shaped trajectory that offers two repeated loop closures while traveling in the same direction and a third loop closure from the opposite direction. The terrain is rather harsh and mimics the uneven ground traversed by work machines. The second sequence (S02), shown in Fig. 10(d-f), offers another loop-based trajectory for SLAM approaches. Unlike W01 and S01, this path is traveled only once, forming a single closed loop. As mentioned before, no winter recording is available for this trajectory due to the route blockage. The remaining sequences are oriented toward visual odometry. They offer no loop closures in the same direction of travel; however, the routes are also traversed in the opposite direction, offering an opportunity to explore relocalization from the opposite direction. The third, fourth, and fifth routes offer short, medium, and relatively long trajectories, respectively, for estimating visual odometry. The third route (W03 and S03), shown in Figs. 9(d-f) and 10(g-i), is the shortest of the visual odometry sequences and the simplest test case. The fourth route (W04 and S04), shown in Figs. 9(g-i) and 10(j-l), is more of an exploration-type trajectory, with back-and-forth driving that mimics the investigative movements of work machines. The fifth route (W05 and S05), shown in Figs. 9(j-l) and 10(m-o), is relatively long and provides a more challenging odometry test course. Two additional winter visual odometry sequences, W06 and W07 (see Fig. 9(m-o) and (p-r)), were recorded at night and at dusk, respectively.
In our view, sequences 3-7 are useful for improving the autonomy of heavy work vehicles in such environments, as they mimic the exploratory, task-focused movements of heavy machines.
Note that the displacement from the starting point was deliberately kept limited. Unlike urban infrastructure, forest-covered routes offer limited opportunities to record loop closures over large distances. Since long distances without loop closure do not suit visual SLAM approaches, we focused on short distances recorded with more information, in terms of frame rate, for improved accuracy. Moreover, at the given frame rate, the amount of data recorded is large relative to the length of the routes traversed.
Among the state-of-the-art visual SLAM implementations that rank high in the KITTI benchmarking suite [29], we chose ORB-SLAM2 [4] and Stereo Parallel Tracking and Mapping (S-PTAM) [30]. Both provide open-source implementations of stereo visual SLAM, which facilitates the testing phase of our study. Both ORB-SLAM2 and S-PTAM were run in standalone mode so that all frames are processed. In particular, S-PTAM could not process all incoming frames in its native ROS mode, which simulates a time-constrained real-time scenario: with the available computational resources, it could not keep up with the incoming frames, and some frames were dropped. To provide a fair and thorough comparison, we report the results of both methods in standalone mode with no time constraints on processing. In addition, S-PTAM was used without its loop closure capability due to compatibility issues between the implementation and newer versions of its dependencies. This affects only the sequences with loops; the visual odometry sequences, which form the majority, should not be affected.
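As a rough illustration of why a real-time mode can drop frames, a tracker that must keep pace with an incoming stream at rate f has a per-frame budget of Δt = 1/f, which for the provided sampling rates is

$$ \Delta t_{40\,\mathrm{Hz}} = 25\ \mathrm{ms}, \qquad \Delta t_{13.33\,\mathrm{Hz}} = 75\ \mathrm{ms}, \qquad \Delta t_{8\,\mathrm{Hz}} = 125\ \mathrm{ms}. $$

A method whose average per-frame cost exceeds Δt must either drop frames or fall increasingly behind the stream. This is a generic timing argument, not a measurement of S-PTAM's runtime, which is not reported here.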
ORB-SLAM2 and S-PTAM both yield excellent results in typical structured urban environments and have been extensively tested in urban and indoor settings on the KITTI, EuRoC, and Level 7 block-set datasets [4,30].
The results obtained with these implementations on the FinnForest dataset are plotted against the ground truth in Figs. 9 and 10 for all sequences recorded with the forward-facing stereo pair (C2-C3). Detailed quantitative results are given in Tables 2 and 3. All experiments were run on a standard laptop with an Intel Core i7 @ 1.90 GHz processor and 32 GB of RAM.
The primary aim of testing state-of-the-art methods on the dataset is to show readers the challenges it presents. Despite the short distances covered, large drift and scale errors are observed in the visual odometry sequences compared to the sequences with loops. We discuss the experimental results in more detail in the next section.

Discussion
In this section, we discuss the experimental results using the new dataset and state our observations. Our remarks are intended to aid further research and experimentation with the given data.

Feature tracking in FinnForest
The forest environment poses unique challenges for feature tracking. Due to self-similar and repetitive patterns, extracting correct matches and maintaining tracking with a low number of feature points is difficult. To avoid obvious obstacles to tracking, we recorded the data at low driving speeds of around 25-30 km/h and with short exposure times to avoid motion blur. Following the recommendation in [21], we include the skyline in the scene, which is expected to be useful for navigation and to provide reliable features for matching. Furthermore, the forest view near the skyline significantly improves rotation accuracy (especially yaw and pitch) by providing features far from the camera.
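The driving speed and the sampling rate together determine how far the vehicle moves between consecutive frames, which bounds how much the scene can change between them. The short Python calculation below assumes constant speed; `travel_per_interval` is a hypothetical helper. At 25-30 km/h, the vehicle moves roughly 0.17-0.21 m between frames at 40 Hz, roughly 0.5-0.6 m at 13.33 Hz, and roughly 0.87-1.04 m at 8 Hz, while a 10 ms exposure corresponds to under 0.1 m of travel.

```python
def travel_per_interval(speed_kmh, interval_s):
    """Distance (m) covered at a constant speed during one time interval."""
    return speed_kmh / 3.6 * interval_s


# Inter-frame travel for the recording speeds and the provided sampling rates.
for rate_hz in (40.0, 40.0 / 3.0, 8.0):
    lo = travel_per_interval(25.0, 1.0 / rate_hz)
    hi = travel_per_interval(30.0, 1.0 / rate_hz)
    print(f"{rate_hz:5.2f} Hz: {lo:.2f}-{hi:.2f} m between frames")

# Upper bound on travel during a single exposure of at most 10 ms.
blur_m = travel_per_interval(30.0, 0.010)
print(f"max travel during one exposure: {blur_m:.3f} m")
```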
In most test cases, we used 2000 feature points for tracking with ORB-SLAM2 and 1000 with S-PTAM. The number of feature points was a compromise between the image resolution and the memory management of the implementation. We observed that these settings were suitable for most sequences and provided sufficient candidates for matching between frames. S-PTAM required more parameter tuning than ORB-SLAM2, whose parameters were kept mostly unchanged across experiments. ORB-SLAM2 uses ORB features, which are both faster and more robust (owing to their rotation invariance) than the features used by S-PTAM. S-PTAM uses the GFTT (Good Features to Track) detector and BRIEF descriptors for matching. Since the BRIEF descriptor is not rotation invariant, the implementation requires parameter adjustments for various FinnForest sequences to maintain tracking on the parts of the route with harsh terrain.
As mentioned earlier, unlike urban routes, the path traversed while recording the dataset is rough terrain. The combined effect of erratic motion, speed, and data sampling introduces challenges for testing. The experimental results show that the highest errors occur in the visual odometry sequences, whereas the errors are reduced and distributed in the sequences where loop closure is achieved.
Effect of data sampling on tracking: The dataset is also sampled at lower rates to facilitate testing and to investigate a suitable data rate for real applications. Although lower frame rates (fps) are advantageous for testing, processing at lower fps can considerably compromise the visual odometry pipeline during real field operation, because the scene changes more between consecutive frames. To exemplify this behavior, we take the experimental results of ORB-SLAM2 on S02 at 8 Hz. ORB-SLAM2 loses track of the feature points when the vehicle hits a pothole and the camera undergoes a sharp motion. Notably, ORB-SLAM2 successfully completed the same sequence at higher frame rates (13 and 40 Hz). For further investigation, we substantially changed the parameters, increasing the number of feature points to 5000 and varying the FAST threshold between 4 and 18; however, the result remained the same. Surprisingly, S-PTAM successfully completed sequence S02 at 8 Hz when the number of feature points to detect was set to 1500.
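The effect of subsampling can be quantified by the distance the vehicle travels between consecutive frames. The sketch below assumes that the lower rates are obtained by decimating the 40 Hz stream (every third frame for roughly 13 Hz, every fifth for 8 Hz); this decimation scheme and the helper names are assumptions for illustration. At 30 km/h (about 8.33 m/s), the spacing is about 0.21 m at 40 Hz but about 1.04 m at 8 Hz, so a pothole-induced jolt is captured by far fewer frames.

```python
def frame_spacing_m(speed_kmh: float, rate_hz: float) -> float:
    """Distance travelled between two consecutive frames, in metres."""
    speed_ms = speed_kmh / 3.6  # km/h -> m/s
    return speed_ms / rate_hz


def decimate(frames, keep_every: int):
    """Keep every n-th frame to emulate a lower sampling rate (hypothetical helper)."""
    if keep_every < 1:
        raise ValueError("keep_every must be >= 1")
    return frames[::keep_every]


if __name__ == "__main__":
    for rate in (40.0, 40.0 / 3, 8.0):
        print(f"{rate:5.1f} Hz: {frame_spacing_m(30.0, rate):.2f} m between frames")
```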
Conversely, ORB-SLAM2 was able to handle a similar situation in W01 at 8 Hz with loosened parameters, while S-PTAM failed to continue tracking. However, neither implementation was able to complete sequences W05, W06, and S04 at 8 Hz. A similar effect was observed in W07 at 7 Hz, and the parameters were loosened again; this time ORB-SLAM2 processed the entire sequence successfully while S-PTAM failed.
Effect of motion on tracking: In some cases, the erratic motion caused by the terrain, combined with the scene content, is too much even at a higher frame rate. In W01, S-PTAM fails to complete the sequence at all sampling rates. The implementation fails during local adjustment of the poses in the segment where sharp movements occur. In contrast, ORB-SLAM2 processed the sequence with relative ease at 40 and 13 Hz without fine-tuning the parameters. At 8 Hz, however, even ORB-SLAM2 required more feature points and a lower feature threshold to maintain tracking.
A different cause is expected to affect S-PTAM while processing W05 at 8 Hz. The tracking failure occurs when the vehicle slows to a momentary standstill and then restarts its motion. We believe the source of the issue is the predictive feature search, which fails to find matches. Both S-PTAM and ORB-SLAM2 use a motion model to predict the position of the map points in the latest image frame and search for matches in a small neighborhood around the prediction. If no matches are found in this neighborhood, ORB-SLAM2 expands the search window as a fallback. S-PTAM, we believe, relies only on a decaying velocity model and does not expand its search neighborhood. As a result, a sudden change in velocity at a lower frame rate disrupts the tracking of feature points. Subsampling aggravates this phenomenon: S-PTAM handles the same behavior at 13 Hz but fails at 8 Hz, where the change between frames is more abrupt. Requesting more feature points in each new frame avoids the tracking failure altogether; however, the resulting poor matches with the map then cause convergence issues in the local bundle adjustment step of S-PTAM.
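The difference in fallback behavior can be illustrated in a simplified 2D setting. The sketch predicts a feature's image position with a constant-velocity model (optionally damped by a `decay` factor) and searches a fixed pixel radius around the prediction; when nothing is found, it optionally retries with a window twice as wide, loosely modelled on ORB-SLAM2's motion-model tracking. The function names, the 15-pixel radius, and the single-feature setup are hypothetical simplifications, not code from either implementation.

```python
import math


def predict(prev_xy, velocity_xy, decay=1.0):
    """Predict the next image position; decay < 1 damps the velocity."""
    return (prev_xy[0] + decay * velocity_xy[0],
            prev_xy[1] + decay * velocity_xy[1])


def search(predicted, candidates, radius):
    """Return the candidate positions within `radius` pixels of the prediction."""
    return [c for c in candidates if math.dist(predicted, c) <= radius]


def track(prev_xy, velocity_xy, candidates, radius=15.0, decay=1.0,
          min_matches=1, expand=True):
    """Predictive search with an optional wider-window fallback."""
    predicted = predict(prev_xy, velocity_xy, decay)
    found = search(predicted, candidates, radius)
    if len(found) < min_matches and expand:
        # Fallback: retry the search with a window twice as wide.
        found = search(predicted, candidates, 2 * radius)
    return found


if __name__ == "__main__":
    # Vehicle stops abruptly: the feature stays put, the model predicts motion.
    last, velocity, observed = (100.0, 100.0), (20.0, 0.0), [(100.0, 100.0)]
    print("with fallback:   ", track(last, velocity, observed))
    print("without fallback:", track(last, velocity, observed, expand=False))
```

In this example the model predicts 20 pixels of motion while the feature has not moved: the fixed 15-pixel window misses it, whereas the expanded 30-pixel window recovers it.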
Effect of illumination on tracking: The dataset includes several opportunities to test the robustness of visual SLAM implementations in tracking and pose estimation under varying illumination. The most notable illumination changes occur in W07, S04, and W06. W07 shows a gradual illumination change as it gets darker: the sequence was recorded at dusk, and the illumination differs drastically between its start and end. ORB-SLAM2 did not face any tracking issues in this sequence, whereas S-PTAM had considerable problems maintaining tracking at all sampling rates. S-PTAM also failed tracking on the data sampled at 13 Hz; we have nevertheless included those results because the failure point was close to the end of the sequence.
In contrast, sequence S04 shows a more rapid change in illumination due to direct sunlight. At sampling rates of 40 and 13 Hz, both ORB-SLAM2 and S-PTAM successfully process the sequence; at 8 Hz, however, they fail at different points. S-PTAM fails directly because of the overexposure and flare in the scene, whereas ORB-SLAM2 fails because of the fast, erratic motion that follows the overexposed part of the recording. The night sequence, W06, is especially challenging for both implementations. Neither could process it with the default parameter settings. ORB-SLAM2 processed the sequence at 40 Hz with relaxed parameters after the FAST threshold was reduced to 4 to avoid losing track of features. S-PTAM cannot process W06 at any sampling rate: even after the feature threshold is reduced, it fails to converge in local bundle adjustment. This is expected, since the visible scene is limited to a few meters of snow-covered road; as a result, the estimated poses do not agree over a longer duration and bundle adjustment fails to converge.
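Lowering the FAST threshold helps in dark, low-contrast scenes because the segment test compares the pixels on a circle around a candidate point against the centre intensity plus or minus the threshold. A minimal sketch of the FAST-9 segment test (the variant ORB builds on), assuming the 16 circle intensities have already been sampled in order; it omits the high-speed pre-test and the non-maximum suppression of the full detector, and the function name is hypothetical.

```python
def is_fast_corner(center, circle, threshold, n=9):
    """FAST segment test on the 16 pixels of a Bresenham circle.

    The point is a corner if at least `n` contiguous circle pixels are all
    brighter than center + threshold or all darker than center - threshold.
    """
    # Label each circle pixel: +1 brighter, -1 darker, 0 similar.
    labels = []
    for p in circle:
        if p > center + threshold:
            labels.append(1)
        elif p < center - threshold:
            labels.append(-1)
        else:
            labels.append(0)
    # Walk the circle twice so that runs wrapping around the start are counted.
    run, prev = 0, 0
    for label in labels + labels:
        if label != 0 and label == prev:
            run += 1
        else:
            run = 1 if label != 0 else 0
        prev = label
        if run >= n:
            return True
    return False


if __name__ == "__main__":
    dim_circle = [56] * 9 + [50] * 7  # low-contrast neighbourhood, centre = 50
    print(is_fast_corner(50, dim_circle, threshold=4))   # low threshold
    print(is_fast_corner(50, dim_circle, threshold=18))  # high threshold
```

With a centre intensity of 50 and nine contiguous circle pixels at 56, the point passes the test at threshold 4 but not at threshold 18, illustrating how a lower threshold can yield more features in a dim scene such as W06.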

Loop closure
The dataset provides three sequences with loop closure opportunities. Among these, S01 and W01 traverse the same route twice in one direction and a third time in the opposite direction. This means that ORB-SLAM2 can detect a loop closure opportunity at any point during the second lap of the drive. During experimentation, we observed that ORB-SLAM2 successfully closes the loop and distributes the errors for these sequences. In contrast, ORB-SLAM2 fails to close the loop for sequence S02, even though the start and end scenes overlap sufficiently. Oddly, ORB-SLAM2 relocalizes itself at the end of sequence S02 processed at 8 Hz after losing track of the feature points. The resulting closure can be seen in the trajectory estimated by ORB-SLAM2 in Fig. 10(f). We believe that the sparser keyframes formed at 8 Hz provided more decisive information than the same sequence at higher fps, where relocalization was not observed.
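To illustrate what distributing the error means, the sketch below spreads the gap between the first and last positions of a closed loop linearly along the trajectory. This is a deliberately naive stand-in: ORB-SLAM2 instead computes a loop-correcting transform and runs a pose-graph optimization over its keyframes, which also corrects rotation, whereas this 2D, position-only sketch does not. The function name is hypothetical.

```python
def distribute_loop_error(positions):
    """Linearly spread the loop-closure gap along a closed 2D trajectory.

    positions: list of (x, y) estimates whose last entry should coincide with
    the first. Pose i is shifted by i/n of the end-point error, so the first
    pose is unchanged and the last pose lands on the first.
    """
    n = len(positions) - 1
    if n < 1:
        return list(positions)
    ex = positions[-1][0] - positions[0][0]
    ey = positions[-1][1] - positions[0][1]
    return [(x - ex * i / n, y - ey * i / n) for i, (x, y) in enumerate(positions)]


if __name__ == "__main__":
    # A square loop whose estimated end point has drifted to (1.0, 0.4).
    loop = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.5, 10.0), (1.0, 0.4)]
    for before, after in zip(loop, distribute_loop_error(loop)):
        print(before, "->", after)
```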

Drift
Both ORB-SLAM2 and S-PTAM exhibit drift in scale and rotation in all of the visual odometry sequences. The drift grows stronger as the sampling rate drops from 40 to 8 Hz and is most apparent in S03 and W03 (see Fig. 9(d-f) and (g-i)).
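Scale drift can be made visible by comparing the length of the estimated path with that of the ground truth over consecutive segments. The sketch assumes time-associated positions of equal length expressed in a common frame; the segment length of 10 poses and the function names are arbitrary illustrative choices, and this diagnostic is not the evaluation protocol of this paper.

```python
import math


def path_length(points):
    """Sum of Euclidean distances between consecutive positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def scale_ratios(estimate, ground_truth, segment=10):
    """Ratio of estimated to ground-truth path length per segment.

    A ratio that wanders away from 1.0 along the trajectory indicates scale drift.
    """
    if len(estimate) != len(ground_truth):
        raise ValueError("trajectories must be time-associated and equally long")
    ratios = []
    for start in range(0, len(estimate) - segment, segment):
        end = start + segment
        gt = path_length(ground_truth[start:end + 1])
        if gt > 0:
            ratios.append(path_length(estimate[start:end + 1]) / gt)
    return ratios


if __name__ == "__main__":
    gt = [(float(x), 0.0) for x in range(21)]
    est = [(1.1 * x, 0.0) for x, _ in gt]  # estimate overstates scale by 10 %
    print(scale_ratios(est, gt))
```

A constant ratio, as in this example, indicates a global scale error; a ratio that changes from segment to segment indicates that the scale is drifting.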

Seasonal effect
Seasonal changes have an apparent effect on several aspects of this dataset. As discussed earlier, the ground truth accuracy is lower in summer than in winter because of the considerably denser foliage in summer, which degrades GNSS reception. An added challenge during recording was that, while traversing the forest, different parts of the forest provided different levels of shade from the sun depending on the local tree density. Because we used a fixed aperture, this made it difficult to avoid over- or underexposure of the scenes. These effects are most obvious in sequence S04.
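With a fixed aperture, the alternation of shade and sunlight described above can be flagged per frame by counting clipped pixels. A minimal sketch, assuming 8-bit grayscale frames given as nested lists; the function name and the clipping thresholds (5 and 250) are hypothetical and would need tuning for the FinnForest cameras.

```python
def exposure_report(gray, low=5, high=250):
    """Fractions of under- and over-exposed pixels in an 8-bit grayscale image.

    gray: iterable of rows of ints in [0, 255]. Thresholds are illustrative.
    Returns (underexposed_fraction, overexposed_fraction).
    """
    total = under = over = 0
    for row in gray:
        for v in row:
            total += 1
            if v <= low:
                under += 1
            elif v >= high:
                over += 1
    if total == 0:
        return 0.0, 0.0
    return under / total, over / total


if __name__ == "__main__":
    frame = [[0, 128], [255, 255]]  # toy frame with clipped shadows and highlights
    print(exposure_report(frame))
```

A frame in which the overexposed fraction jumps sharply, for example when direct sunlight enters the view, could be inspected as a likely trouble spot for tracking.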
The winter sequences, on the other hand, were adequately exposed, since most of them were recorded under overcast skies. In addition, tire tracks in the snow provided enough texture on the ground. ORB-SLAM2 handled tracking very well, with points evenly distributed on the snow-covered ground, whereas S-PTAM relied more on the obvious texture of the trees and discarded most feature points on the snow-covered road as false matches.

Effect of ground truth precision
Note that the experiments are independent of the precision of the ground truth position, since the tested algorithms did not use the IMU and GNSS information. However, the ground truth precision must be considered when comparing the experimental results against the ground truth poses. The ground truth precision for each sequence is shown in Fig. 8 and discussed in Section 4.3. For benchmarking, we are more confident in the comparisons in Tables 2 and 3 for the winter sequences (W01-W07) than for the summer sequences (S01-S05), since the ground truth positions of the winter sequences are comparatively more precise. Nonetheless, the ground truth precision is high enough in both cases for a valid analysis of visual odometry/SLAM algorithms.
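One common way to compare an estimated trajectory with the ground truth is the root-mean-square absolute trajectory error (ATE). The sketch below assumes the two trajectories are already time-associated and aligned in a common frame (the alignment step, for example with a similarity transform, is omitted), and the function name is hypothetical. When interpreting such a number, differences on the order of the ground truth precision cannot be meaningfully resolved.

```python
import math


def ate_rmse(estimate, ground_truth):
    """RMS position error between aligned, time-associated trajectories."""
    if len(estimate) != len(ground_truth) or not estimate:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [math.dist(e, g) ** 2 for e, g in zip(estimate, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    est = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.3, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(f"ATE RMSE: {ate_rmse(est, gt):.3f} m")
```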
Keep in mind that visual odometry/SLAM algorithms may respond differently to the same trajectory recorded under different conditions, as discussed throughout Section 7. Therefore, claiming that one result is better than another without comparing against the provided ground truth is not an objective conclusion.

Summary
In this paper, we have presented a novel dataset that offers a forest environment under various lighting and weather conditions for visual odometry and SLAM systems to process. The dataset provides synchronized and processed image frames from four cameras that can be used independently or as stereo pairs. Raw data is also provided to encourage further examination of the system. We believe this dataset will be highly useful for broadening the spectrum and diversity of testing data for autonomous vehicles, especially autonomous heavy work machines. We hope that it will provide new challenges and inspire the exploration of new possibilities for autonomous vehicles and machines.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ihtisham Ali received his B.Sc. in Mechatronics Engineering from the University of Engineering and Technology, Pakistan (2014) and his M.Sc. in Automation Engineering from Tampere University, Finland (2017). He is currently a doctoral researcher in the 3D Media Group at Tampere University. He has worked on several industrial projects on machine automation using visual cues. His research interests focus on computer vision and robotics, specifically object pose estimation, 3D reconstruction, and visual SLAM.