Ice thickness data in the Northern Sea Route (NSR) for the period 2006–2016

This data article presents the mean and the standard deviation of ice thickness in a set of sailing zones along a route that passes through the Northern Sea Route (NSR) between Murmansk and Pusan. The route lies between the longitudes 33° 45′ 0″ and 129° 3′ 60″ and the latitudes 69° 24′ 27″ and 35° 6′ 0″, which correspond to the ports of Murmansk (Russia) in the west and Pusan (South Korea) in the east, respectively. Within this area, the part between the longitudes 57° 0′ 0″ and −168° 58′ 0″ and the latitudes 70° 27′ 18″ and 69° 6′ 0″ corresponds to the NSR. The route has been divided into 49 subzones, and each subzone into squares with 12.5 km sides, following the data structure of the Copernicus database [1]. The detailed coordinates of the subzones (longitude and latitude) are provided in the article. The daily ice thickness for the period between January 1, 2006 and December 31, 2016 has been obtained for each of the 12.5 km squares. This data article provides the normality test outcomes and the corresponding p-values of the ice thickness data for each subzone on each calendar day, together with the mean and the standard deviation of the ice thickness in each subzone. The data can support research on weather conditions in the NSR zone and on shipping-related issues. For instance, they can be used to investigate the change in ice thickness in the NSR over the period 2006–2016 and to estimate future changes. Another potential application is estimating the need for icebreaker assistance and the possible ranges of vessel sailing speed, by vessel type, for any navigation day in any of the NSR zones. The data can also be used to estimate the risk of blockage for any vessel type because of ice conditions in the NSR zones, and to assess the economic viability of shipping through the NSR, since icebreaker assistance, speed and the risk of blockage all affect the profitability of the shipping lines that may use the NSR.


We examined the daily ice thickness of 49 cells over 11 years and computed its mean.

Experimental features
ARCTIC_REANALYSIS_PHYS_002_003 is a dataset that combines observed data with a numerical model that fills the gaps. The TOPAZ4 system was used as the numerical model.

Data source location
Copernicus database, longitudes 33° 45′ 0″ to 129° 3′ 60″ and latitudes 69° 24′ 27″ to 35° 6′ 0″, which correspond to the zone between the ports of Murmansk (Russia) in the west and Pusan (South Korea) in the east

Data accessibility
Data are within this article

Data
This article provides three datasets. The first contains the coordinates of the 49 subzones of the considered route. The second consists of two MS Excel files that give the outcome of the normality test of the daily ice thickness and the corresponding p-values for every calendar day and each of the 49 subzones, based on data from 2006 to 2016. The third consists of two further MS Excel files that give the mean and the standard deviation of the ice thickness for every calendar day and the same 49 subzones.

Data of the route coordinates
The selected sailing route runs from the port of Murmansk (Russia) in the west to the port of Pusan (South Korea) in the east. The route includes the NSR, which is divided into 7 administrative zones by the Russian Administration. To obtain more accurate data about ice thickness, the route has been further divided into 49 subzones. The coordinates of these subzones, as well as the distance travelled within each subzone, are provided in Table 1. Two figures showing the route on a map are available in the article.
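The coordinates in this article are written in degrees, minutes and seconds. The following minimal sketch shows how such values can be converted to decimal degrees, which is a convenient form for defining a bounding box. The function name dms_to_decimal and the hemisphere letters are illustrative choices and are not taken from the authors' scripts; only the two port coordinates come from the text.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees.

    hemisphere is one of "N", "S", "E", "W"; south and west are negative.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# End points of the route as quoted in the article (longitude, latitude).
murmansk = (dms_to_decimal(33, 45, 0, "E"), dms_to_decimal(69, 24, 27, "N"))
pusan = (dms_to_decimal(129, 3, 60, "E"), dms_to_decimal(35, 6, 0, "N"))

print(murmansk)
print(pusan)
```

Note that 3′ 60″ is numerically the same as 4′ 0″, so the conversion handles the value as printed without special treatment.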

Value of the data
Ice thickness depends on the weather conditions, which are random by nature, and on the climatic conditions, which vary over the years. As a result, ice thickness data exhibit random variability, which motivates applying a normality test to the datasets provided in this data paper. The p-values of the normality test for the ice thickness of the zone between Murmansk (Russia) and Pusan (South Korea) through the NSR are provided in this data paper. The normality test data are provided for 49 subzones and for each calendar day, based on the ice thickness data between 2006 and 2016. Researchers can use these data to check whether ice thickness is normally distributed in a given subzone on a given calendar day, for various applications. Accounting for the randomness of ice thickness may help researchers calculate the probability that a vessel is blocked in the NSR because of ice thickness. The datasets also include the mean and the standard deviation of the ice thickness for the 49 subzones and for any calendar day. Researchers may use these datasets for many applications related to the NSR. For instance, they can be used for weather-related investigations, in which the data provided in this paper are compared with similar future datasets to investigate the effect of global warming on ice thickness. The datasets can also be used for further analysis of the viability of the NSR as a shipping route, for example in optimization models whose objective is to find the optimal route for vessels on the NSR while minimizing the shipping cost or the risk of blockage. The MATLAB files provided in this data paper can be used by researchers to download and analyze ice thickness data from the Copernicus database [1].

Downloading the data of the ice thickness
Each subzone listed in Table 1 has been defined by its coordinates in the Copernicus database [1]. Matlab has then been used to call Python and download the data from the Copernicus server automatically for all coordinates and for all dates between 1 January 2006 and 31 December 2016, which corresponds to 11 years of data. Since the Copernicus database [1] divides the defined zone into squares with 12.5 km sides, the data downloaded for a given zone and date contain, for a given parameter, as many values as there are squares in the zone. The dataset used from Copernicus is ARCTIC_REANALYSIS_PHYS_002_003 and its subset dataset-ran-arc-day-myoceanv2-be, which requires specifying the coordinates of the zone for which ice thickness data are needed and the range of dates. It then allows the user to choose the parameters to download, including water temperature, salinity, ice thickness, depth, etc. We selected only the ice thickness data. A total of 196,882 files have been downloaded, corresponding to 11 years, 365 (or 366) days per year and 49 zones.
We used four computers in parallel to speed up the download, where each computer was responsible for downloading the data for part of the zones and for the 11 years. In addition, the processor cores of each computer were used in parallel to download different files at the same time using the "parfor" loop of Matlab. The download request through the website server took between 20 seconds and 2 minutes per file on every computer, with most cases at the low end of this range (close to 20 seconds per file), which corresponded to a total download time of around 11 days.
The downloaded files are in the NetCDF (.nc) format. They result from the TOPAZ system, which is based on an advanced sequential data assimilation method and the Hybrid Coordinate Ocean Model (HYCOM version 2.2). The dataset includes a 26-year Arctic reanalysis product covering the period 1991–2017. The variables delivered are all physical variables, including 3D currents, temperatures and salinities, 2D parameters for sea ice, mixed layer depth and sea surface heights. Sea surface temperature and sea surface heights are corrected for bias with an online bias correction algorithm. Table 2 provides the metadata of the dataset.
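The file count and the download time quoted above can be cross-checked with a short calculation. The sketch below enumerates one download task per zone and per day, and then gives a rough lower-bound estimate of the download time. The estimate assumes that each of the four computers handles its share one file at a time at 20 seconds per file, which ignores both the parfor parallelism and the slower requests; it is an order-of-magnitude check and not the authors' calculation. All names are illustrative.

```python
from datetime import date, timedelta


def daily_dates(start: date, end: date) -> list[date]:
    """Return every calendar date from start to end, both inclusive."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]


def sequential_download_days(n_files: int, seconds_per_file: float, n_computers: int) -> float:
    """Days needed if each computer downloads its share one file at a time."""
    return n_files * seconds_per_file / n_computers / 86_400


N_ZONES = 49  # number of subzones stated in the text
dates = daily_dates(date(2006, 1, 1), date(2016, 12, 31))

# One NetCDF file per (zone, date) pair.
download_tasks = [(zone, day) for zone in range(1, N_ZONES + 1) for day in dates]

# Assumption: 20 s per file and four computers, no parallelism within a computer.
estimated_days = sequential_download_days(len(download_tasks), 20, 4)

print(len(dates))           # 4018 days (2008, 2012 and 2016 are leap years)
print(len(download_tasks))  # 196882 files, matching the count in the text
print(round(estimated_days, 1))
```

Under these assumptions the estimate falls between 11 and 12 days, which is consistent with the reported total of around 11 days.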
It is worth noting that ARCTIC_REANALYSIS_PHYS_002_003 combines observed data with a numerical model that fills the gaps. The observed sea ice thickness (SIT) data are a merged product of weekly SIT measurements in the Arctic from the CryoSat-2 satellite altimeter and the Soil Moisture and Ocean Salinity (SMOS) satellite radiometer (referred to as CS2SMOS). This product is gridded with a resolution of approximately 25 km [2]. The database includes an estimate of the observation error, but it only accounts for the errors related to the merging and interpolation [3]. An observation error suitable for assimilating CS2SMOS data in the TOPAZ4 system has been evaluated in a short sensitivity experiment, in which an additional term was added to the CS2SMOS raw error estimate. The amplitude of this additional term was set to ε = min(0.5, 0.1 + 0.15*d), where d represents the merged SIT measurement. This maximal observation error, capped at a threshold value of 0.5 m, applies to the years 2014–2015 only. Afterwards, the additional observation error term has been tuned to ε = min(0.25, 0.1 + 0.075*d) for the winter 2016–2017.
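The two expressions for the additional observation error term can be written as one small function. The sketch below only restates the formulas given above; the function and parameter names are illustrative, and d is taken to be the merged SIT measurement in metres.

```python
def additional_obs_error(d: float, cap: float, slope: float, offset: float = 0.1) -> float:
    """Additional observation error: min(cap, offset + slope * d)."""
    return min(cap, offset + slope * d)


def error_2014_2015(d: float) -> float:
    """Term used for 2014-2015: min(0.5, 0.1 + 0.15 * d)."""
    return additional_obs_error(d, cap=0.5, slope=0.15)


def error_winter_2016_2017(d: float) -> float:
    """Tuned term for winter 2016-2017: min(0.25, 0.1 + 0.075 * d)."""
    return additional_obs_error(d, cap=0.25, slope=0.075)


for thickness in (0.0, 1.0, 4.0):
    print(thickness, error_2014_2015(thickness), error_winter_2016_2017(thickness))
```

For thin ice both terms start at 0.1 m, and for thick ice they saturate at their caps of 0.5 m and 0.25 m respectively, so the tuned term is never larger than the earlier one.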
The total size of the downloaded files is 24.79 GB. The detailed file sizes per zone are shown in Table 3.
Table 2
Metadata descriptors of the data available in the dataset "ARCTIC_REANALYSIS_PHYS_002_003".

Matlab has a built-in function to read NetCDF files. However, for the data files included in this data paper, this function read part of the data correctly but not the ice thickness data. Therefore, to avoid any risk of incorrect data, the NetCDF4Excel macro was used after modifying it so that it can be driven from Matlab, and the NetCDF files were converted into MS Excel files. To do so, we modified the Excel macro (NetCDF4Excel) so that it opens a NetCDF file without requiring the user to browse for it; the modified macro only requires the file name when it is called. Matlab has then been used to call the macro automatically and to pass it the file name, so that the file can be read in Matlab and then saved as an MS Excel file.
The file names have been generated automatically in Matlab, based on the zone number and the date, so that they match the file names given by the Copernicus database. Each MS Excel file is stored in the folder that belongs to its zone, and loops have been used to automate the complete process. The original NetCDF4Excel macro asks the user to browse for and select the .nc file and then to save the converted file manually in MS Excel (.xlsx) format. To automate this, the macro has been modified by replacing the "browse" command with a variable that contains the file name and path, and code has been added to save the converted file automatically as an .xlsx file. Excel is called using the command actxserver('Excel.Application'). This command creates a Microsoft Component Object Model (COM) server that can be used to control MS Excel from Matlab.
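The idea of deriving every file path from the zone number and the date can be sketched as follows. The folder and file naming pattern used here is hypothetical, since the article does not state the actual pattern used by the authors or by the Copernicus server; only the principle (one folder per zone, same base name for the .nc and .xlsx files) comes from the text.

```python
from datetime import date
from pathlib import PurePosixPath


def nc_path(root: str, zone: int, day: date) -> PurePosixPath:
    """Build the NetCDF path for one zone and one day.

    Hypothetical layout: <root>/zone_XX/ice_thickness_zoneXX_YYYYMMDD.nc
    """
    folder = PurePosixPath(root) / f"zone_{zone:02d}"
    return folder / f"ice_thickness_zone{zone:02d}_{day:%Y%m%d}.nc"


def xlsx_path(nc_file: PurePosixPath) -> PurePosixPath:
    """Converted file: same folder and base name, .xlsx extension."""
    return nc_file.with_suffix(".xlsx")


source = nc_path("data", 3, date(2006, 1, 1))
target = xlsx_path(source)
print(source)  # data/zone_03/ice_thickness_zone03_20060101.nc
print(target)  # data/zone_03/ice_thickness_zone03_20060101.xlsx
```

Because both paths are computed from the same two inputs, the download step and the conversion step can run independently and still agree on where each file lives.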

Table 6
Matlab files used for the download of the ice thickness data.

Download_From_Server_P1
This script takes the following inputs:
1. The path of the Python motu-client package
2. The Copernicus account details (username and password)
3. The Excel file that contains the zone coordinates
4. The database-specific details (Mercator_motu_web, service_id, and product_id)
The script prepares the Python commands to download the data files (NetCDF) for all zones and all dates and provides two output files (.mat):
1. The download commands as strings
2. The full path where the data file will be saved for each date and each zone

Download_From_Server_P2
This script loads the data file produced by Download_From_Server_P1, evaluates the Python commands (which were saved as strings) and starts the download. It uses a "parfor" loop, which relies on parallel computing to download more than one file at the same time. This script was run on different computers to speed up the process, where each computer was responsible for downloading part of the zones.

Find_Missing_Files
This script loads the data files produced by Download_From_Server_P1 and compares the downloaded files (Download_From_Server_P2) with the list of files to be downloaded (Download_From_Server_P1). If a file is identified as missing, the script downloads it. A file might be missing because of server issues during the download.

Convert_Nc_to_xlsx
This script requires the full paths generated by Download_From_Server_P1 and stored in the .mat file. In addition, it requires the path of the modified NetCDF4Excel macro. It calls and opens the modified macro, converts the NetCDF files to .xlsx files and stores them under the same file names as the NetCDF files.

Normality test and p-value of ice thickness
First, some of the downloaded ice thickness data had very small negative values (all of them close to zero). The downloaded data have therefore been processed to eliminate these values by setting all negative values to zero.
Given the variability of the ice thickness data, a normality test has been conducted. The Lilliefors test was performed using the Matlab function lillietest(), which returns the test decision (0 if the null hypothesis H0 that the data are normally distributed is not rejected, or 1 if it is rejected in favour of H1) and the corresponding p-value. To perform the test for a given subzone and a given date, the ice thickness values of all the 12.5 km squares in the subzone on that date have first been obtained and averaged. For example, for 1 January, the average ice thickness on 1/1/2006 (based on all the squares of the considered subzone), the average on 1/1/2007, on 1/1/2008, etc. have been calculated. Therefore, 11 means (one per year) have been obtained for each subzone and each calendar day. The Lilliefors test has then been applied to these 11 data points for each day and each subzone. The corresponding outcomes are two MS Excel files of 365 by 49 cells, which are provided in the supplementary materials of this data paper and are described in Table 4. It is worth noting that the null hypothesis was rejected in 5,253 of the 17,885 tests (about 29%). The null hypothesis was therefore not rejected in the majority of the tests, and the ice thickness data can be considered normally distributed for most subzones and calendar days.
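The preprocessing and the quantity behind the Lilliefors test can be sketched in a few lines. The code below sets negative values to zero, averages the squares of a subzone, and computes the Lilliefors statistic, that is, the largest distance between the empirical distribution of the sample and a normal distribution fitted with the sample mean and sample standard deviation. It does not compute the p-value, which requires tabulated or simulated critical values as handled by Matlab's lillietest. All names and all numbers in the example are illustrative and are not taken from the datasets.

```python
from statistics import NormalDist, mean, stdev


def subzone_daily_mean(square_values: list[float]) -> float:
    """Average thickness over the squares of a subzone, negatives set to zero."""
    return mean(max(0.0, v) for v in square_values)


def lilliefors_statistic(sample: list[float]) -> float:
    """Largest gap between the empirical CDF and the fitted normal CDF."""
    n = len(sample)
    sigma = stdev(sample)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        raise ValueError("all values are identical; the statistic is undefined")
    fitted = NormalDist(mean(sample), sigma)
    gap = 0.0
    for i, x in enumerate(sorted(sample), start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        gap = max(gap, i / n - cdf, cdf - (i - 1) / n)
    return gap


# Hypothetical yearly averages for one subzone on one calendar day (11 years).
yearly_means = [1.42, 1.31, 1.55, 1.20, 1.38, 1.27, 1.10, 1.33, 1.25, 1.05, 0.98]
print(subzone_daily_mean([-0.01, 0.5, 1.0]))
print(round(lilliefors_statistic(yearly_means), 3))
```

A subzone that is ice-free on a given calendar day in all 11 years yields identical values, for which the statistic is undefined; the sketch raises an error in that case rather than returning a misleading result.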

Ice thickness mean and standard deviation
Following the normality tests, the mean and the standard deviation of the 11 values described in section 2.2 have been calculated for each subzone and each calendar day. The resulting data are provided in the supplementary materials of this data paper and are described in Table 5.
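The grouping by calendar day can be sketched as follows. The code collects the daily subzone averages of all years under a (subzone, month, day) key and then computes the mean and the standard deviation of each group. Two points are assumptions rather than statements from the article: the sample standard deviation (n − 1 denominator) is used, and 29 February is skipped so that each subzone has 365 calendar days. The function name and the toy values are illustrative.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev


def calendar_day_stats(daily_means: dict[tuple[int, date], float]) -> dict:
    """Map (subzone, month, day) to (mean, standard deviation) across years."""
    groups = defaultdict(list)
    for (subzone, day), value in daily_means.items():
        if (day.month, day.day) == (2, 29):
            continue  # assumption: leap days are left out of the 365-day tables
        groups[(subzone, day.month, day.day)].append(value)
    # stdev needs at least two values, so single-year groups are dropped.
    return {key: (mean(vals), stdev(vals)) for key, vals in groups.items() if len(vals) >= 2}


# Toy input: subzone 1 on 1 January of three years, plus one leap day.
toy = {
    (1, date(2006, 1, 1)): 1.0,
    (1, date(2007, 1, 1)): 2.0,
    (1, date(2008, 1, 1)): 3.0,
    (1, date(2008, 2, 29)): 9.9,
}
stats = calendar_day_stats(toy)
print(stats)
```

With the full data, each group would hold 11 values, one per year from 2006 to 2016.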

Matlab scripts
Downloading the data with Python requires the motu-client package. Table 6 lists the Matlab script files and their descriptions. These are the scripts used to download the data.
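The two main ideas of these scripts, preparing one download command per file and re-checking for missing files afterwards, can be sketched in Python. This is not the authors' Matlab code. The command-line flags follow the motuclient interface as commonly documented and should be checked against the installed version; the server URL, service identifier, variable name, bounding box, credentials and file names are placeholders. The command is only built, never executed.

```python
import tempfile
from pathlib import Path


def build_motu_command(motu_url, service_id, product_id, bbox, day, variable,
                       out_file, user, password):
    """Return the argument list for one motuclient request (not executed here)."""
    lon_min, lon_max, lat_min, lat_max = bbox
    return [
        "python", "-m", "motuclient",
        "--motu", motu_url,
        "--service-id", service_id,
        "--product-id", product_id,
        "--longitude-min", str(lon_min), "--longitude-max", str(lon_max),
        "--latitude-min", str(lat_min), "--latitude-max", str(lat_max),
        "--date-min", day, "--date-max", day,
        "--variable", variable,
        "--out-dir", str(out_file.parent), "--out-name", out_file.name,
        "--user", user, "--pwd", password,
    ]


def find_missing(expected_files):
    """Files that are absent or empty and therefore need to be requested again."""
    return [p for p in expected_files if not p.is_file() or p.stat().st_size == 0]


with tempfile.TemporaryDirectory() as tmp:
    expected = [Path(tmp) / "zone01_20060101.nc", Path(tmp) / "zone01_20060102.nc"]
    expected[0].write_bytes(b"placeholder")  # pretend the first download succeeded
    missing_names = [p.name for p in find_missing(expected)]

command = build_motu_command(
    motu_url="https://example.invalid/motu-web/Motu",  # placeholder server
    service_id="SERVICE_ID_PLACEHOLDER",               # placeholder
    product_id="dataset-ran-arc-day-myoceanv2-be",     # subset named in the text
    bbox=(33.0, 34.0, 69.0, 70.0),                     # placeholder bounding box
    day="2006-01-01",
    variable="ICE_THICKNESS_VARIABLE",                 # placeholder variable name
    out_file=Path("zone_01") / "zone01_20060101.nc",
    user="USERNAME", password="PASSWORD",
)
print(missing_names)  # ['zone01_20060102.nc']
```

Keeping the command as a list of arguments rather than a single string avoids quoting problems if it is later passed to a process launcher.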

Transparency document
The transparency document associated with this article can be found in the online version at https://doi.org/10.1016/j.dib.2019.103925.