Ice and AIS : ship speed data and sea ice forecasts in the Baltic Sea

The Baltic Sea is a seasonally ice covered marginal sea located in a densely populated area in northern Europe. Severe sea ice conditions have the potential to hinder the intense ship traffic considerably. Thus, sea ice foreand nowcasts are regularly provided by the national weather services. 5 Typically, the forecast comprises several ice properties that are distributed as prognostic variables, but their actual usefulness is difficult to measure and the ship captains must determine their relative importance and relevance for optimal ship speed and safety ad hoc. 10 The present study provides a more objective approach by comparing the ship speeds, obtained by the Automatic Identification System (AIS), with the respective forecasted ice conditions. We find that, despite an unavoidable random component, this information is useful to constrain and rate foreand 15 nowcasts. More precisely, 62–67 % of ship speed variations can be explained by the forecasted ice properties when fitting a mixed effect model. This statistical fit is based on a test region in the Bothnian Sea during the severe winter 2011 and employs 15 to 25 min averages of ship speed. 20


Introduction
The Baltic Sea is a seasonally ice-covered marginal sea located in a densely populated area in northern Europe with important shipping routes crossing the regularly ice-covered regions.The ice season lasts up to 7 months (Vihma and Haapala, 2009).The maximum ice extent is typically reached in late February, showing large interannual variations between 12.5 and 100 % (Leppäranta and Myrberg, 2009).In regions with long wind fetch the ice cover is often broken up and the ice is forced into motion (Uotila, 2001).Thus, the ice coverage here is not uniform but consists of ice floes of variable sizes, leads and deformed ice patches (Leppäranta and Myrberg, 2009).Ships have to find their way through this "drift ice landscape".
Since sea ice potentially hinders winter navigation, detailed forecasts of the ice conditions are in demand and regularly provided by the local weather services.A typical ice forecast contains several prognostic variables, for instance ice concentration, thickness and prognosticated ice drift.Additional variables are occasionally included, e.g., ridged ice fraction, which refers to the most important deformed ice type.Ridges can form substantial obstacles to winter navigation and thus receive increasing attention from the research community (e.g., Haapala, 2000;Kankaanpää, 1988;Leppäranta and Hakala, 1992;Leppäranta et al., 1995;Löptien et al., 2013).The forecast of the Swedish Meteorological and Hydrological Institute (SMHI) provides additional information about convergence of the ice drift field (i.e., regions where the ice is compacting are marked).In regions with convergent ice motion, large ice stresses can occur, the ships might get stuck and, in the worst case, even damaged (e.g., Suominen and Kujala, 2014;Pärn et al., 2007).
Based on spatial maps of the sea ice properties described above, ship captains, supported by the national maritime administrations, must choose the supposedly best route.Rating the relative importance of the forecasted variables in terms of ship speed and safety depends on the expertise of each captain.Also, a typical forecast model has a horizontal resolution that ranges from 1 to 3 nm (nautical miles; 1 nm = 1852 m) and important processes acting on the ship scale (i.e., a scale of a few hundred meters) might not be resolved.
The present study provides an objective assessment of how a typical ice forecast (provided by SMHI) compares to the ship scale and how the various ice properties affect ship speed.The study focuses on a test region in the northern Bothnian Sea (62.8-63.6 • N and 19.8-21.0• E, Fig. 1), which is regularly passed by ships and known for its severe ice conditions.The region is located south of the so-called Kvark Strait (Green et al., 2006), a narrow passage with little space to circumnavigate problematic areas.The mean ice drift in the test region is generally directed towards the northeast (Fig. 2), but, in the presence of high ice concentrations in the Bothnian Bay, the northward flow is limited (or even blocked).As the ice concentrations in the Bothnian Bay decrease, e.g. in March and April, the transport through Kvark Strait accelerates.Still, the narrowness of the passage leads to an accumulation of sea ice in the test region.This accumulation of sea ice makes it impossible for ships to fully avoid severe ice conditions and makes the region particularly interesting as a test region.The corresponding ship speed observations are obtained by the automatic identification system (AIS).While the AIS comprises an unavoidable random component (e.g., ship captains might reduce speed due to reasons not related to sea ice), this large-scale comprehensive data set is available for research purposes without any extra costs.Due to the large amount of ships which have a tight schedule and aim to keep a relatively constant high speed, we anticipate that the noise might well be on a relatively low level and test the applicability of AIS-derived ship speeds for the evaluation of sea ice fore-and nowcasts.We explore to what extent observed ship speeds can be reconstructed based on the forecasted ice properties by fitting a mixed-effect model.This statistical model resembles a mul-tilinear regression but allows additionally for the inclusion of (construction-related) differences between individual ships.A detailed description of the underlying data as well as the statistical method is given in the following section.Section 3 shows the results of our data exploration and the statistical fit, followed by a conclusive summary in Sect. 4.

Methods
We compare ship speed observations to the corresponding (forecasted) ice properties.Both the ship speed observations and the ice forecast model, inclusive of evaluation, are described in this section (Sects.2.1 and 2.2, respectively).After a preceding data exploration, we fit a statistical model.This so-called mixed-effect model is described in detail in Sect.2.3.

Ship speed observations
The automatic identification system (AIS) was developed in the 1990s and is an automatic tracking system for identifying and locating ships.The system is based on an electronic exchange of data with other ships nearby, AIS base station and satellites.The major aim is to avoid collisions by supplementing ship radars (Berking, 2003;Harati-Mokhtari et al., 2007).Additionally, it enables maritime authorities to monitor vessel movements.The "International Maritime Organization's International Convention for the Safety of Life at Sea" requires AIS to be installed aboard international voyaging ships with a tonnage of 300 t and more, as well as on all passenger ships.AIS data contain, inter alia, a unique identification (MMSI number), position, course and speed of a vessel.Since the data coverage has increased considerably during the past 2 decades, the data set is increasingly used for scientific purposes (e.g., Montewka et al., 2010, assessed the collision risk of vessels; Jalkanen et al., 2009 andMiola et al., 2011 estimated the emissions of marine traffic).
The present analysis is based on a test data set, collected during the severe winter of 2011 (January-April).We focus on a test region in the Bothnian Sea (62.8-63.6 • N and 19.8-21.0• E, Fig. 1), with generally severe ice conditions and intense ship traffic.No harbors are included in this test area.Ship speed and direction are calculated from the ship locations every 5 min.All observations ±1 h around an ice forecast (which is provided four times daily) are analyzed.Ships close to icebreakers (within a rectangle of 0.2 nm (= 370.4 m)) as well as icebreakers themselves are excluded from the analysis (as they add an unforeseeable random component).Note that, nevertheless, icebreaker channels and icebreaker fragmentation of the ice pack might persist.The data table, without icebreakers, comprises of 16 407 entries.Since we could not detect any systematic drop of ship speeds at ice concentrations below 60 %, those data are not considered.We also exclude all ships that only remained in the test re- Ultimately, the analyzed data set consists of observations from 319 different ships, with an average duration of stay in the test region of 215 min.Overall, ∼ 14 000 observations were included into the statistical analysis.

Ice forecast model
The ice forecasts are based on the operational coupled ocean-ice forecast model HIROMB (High Resolution Operational Model for the Baltic) of SMHI.It includes a threedimensional, baroclinic ocean model, covering the Baltic Sea and North Sea (Funkvist and Kleine, 2007).The ocean model is coupled to a Hibler-type sea ice model (as described by Wilhelmsson, 2002; extensions by Kotovirta et al., 2009 andAxell, 2013).The horizontal resolution ranges from 3 nm (nautical miles; 3 nm = 5556 m) in the North Sea to 1 nm (= 1852 m) in the Skagerrak-Kattegat area.
The forecasts include data assimilation of salinity, temperature and various ice properties.These latter are provided by the operational ice service at SMHI and comprise ice concentration, level ice thickness and "degree of ridging" (which is used to approximate ridge density, following the approach of Lensu, 2003).The data are based on in situ measurements, estimates from voluntary ships and icebreakers as well as satellite observations.The degree of ridging is a num-ber describing how heavily ridged a region is (as perceived by the ice analyst).Based on the approach of Lensu ( 2003), this number is tentatively converted to the more common measure "ridge density" (= number of ridges per kilometer).Note, this number is approximate only.
Apart from the assimilated ice properties described above, the model output covers ice drift in u and v directions as well as divergence of the ice motion.Divergence is defined as the sum of the derivatives of the ice flow field in u and v directions.As such negative values stand for areas where the ice is compacting (i.e., convergent ice motion).Auxiliary classifications of the ice thickness are available but not included into the following analyses since they do not provide new independent information.
To evaluate the sea ice model independently of the AIS data, ideally large-scale observations, which are not already included in the data assimilation, are needed.Ice thickness and concentration are available as digitized ice charts, which are provided daily by the Finnish Meteorological Institute (FMI).The charts summarize the available ice information for shipping, based on the manual interpretation of satellite data and ground truth.The underlying observations are provided, e.g., by icebreakers, voluntary observing ships, ports and station observation stations of the Baltic ice services, and are in large parts independent of the Swedish observations (which are assimilated into HIROMB).The generally close match of observed and simulated ice thickness and concen- tration (Fig. 3) is not surprising, given the assimilation of these relatively well-observed variables.It is, however, difficult to find reliable observations of other ice properties.One recent attempt estimates ice drift based on synthetic aperture radar (SAR) images (Karvonen, 2012).The data are provided within the MyOcean Project.Despite the known uncertainties and sparseness of the provided data, the data set is still unique regarding its spatial coverage.According to Karvonen (2012), the ice drift direction is relatively well estimated, while the magnitude might often be biased.The data set consists of ice displacement (in meters), estimated from two successive SAR images over the same area.Two exemplary snapshots of the derived velocities at times with a relatively high data coverage are shown in Fig. 4. As ice drift is not directly assimilated, and given the uncertainty in the observations, it seems reasonable that the agreement between modeled and observed ice drift is not as close as between thickness and concentration.Particularly, the simulated sea ice is more mobile than implied by the SAR estimates, while the ice drift direction and the major patterns agree rather well.An overestimation of ice drift speed in coastal regions was expected as land-fast ice is not considered by the model.Nevertheless, one should bear in mind that, even though ice drift is not directly assimilated, this occurs to a certain degree indirectly as it can not evolve completely freely due to the constraints given by assimilating ice thickness and concentration.Unfortunately, the SAR estimates seem too patchy for reliable estimates of divergence.

Statistical analysis
After some preceding data exploration, we aim to test how well we can reconstruct the ship speed observations by the forecasted ice properties.For this purpose, we fit a mixedeffect model (e.g., Zuur et al., 2007).A mixed-effect model is an extension of a common multilinear regression, which accounts for the differences between individual ships (depending on ice class, shape and size of a vessel, engine power, etc.).A multilinear regression alone would not be able to capture these often substantial differences.In matrix notation a mixed-effect model can be written as Here, i = 1, . .., N indexes the MMSI numbers and y i denotes a vector of observations per ship (= dependent variable), which consists here of the square root of the speed of individual vessels during consecutive 5 min time steps.The square root is taken to bring the data closer to normality.The vector β stands for the "fixed effects" and has the same value for all ships.u is a vector of so-called "random effects" (with mean 0), by which entries are allowed to vary between individual ships (which are uniquely identified by the MMSI numbers).
X and Z denote matrices of regressors, relating the observations to β and u.When omitting the term Zu i , the formula corresponds to a common multilinear regression.Since generally not every single ship-dependent regression parameter in u is of interest but rather the overall properties are (e.g., variations and covariability), u is termed "random".The matrices X and Z may, or may not, contain the same explanatory variables.In the present study, we chose X to contain ice concentration, level ice thickness, ridge density, ice drift speed, convergence and the angle at which the ship is moving relative to the ice movement (factorized as explained in Sect.3.1).To keep the number of estimated parameters as low as possible and to avoid overfitting, we included only those variables in Z which showed, in a preceding data exploration, indications of large variations among the ships (cf.Sect.3.2) and merely ice concentration, level ice thickness and ridge density were considered here.Additionally, we allow for a ship-dependent intercept (i.e., points where the regression lines cross the y axis), accounting for the different mean speeds of individual vessels.As usual, i represents a random-noise component ( i ∼ N (0, i ), iid).

Data exploration
First, we explore the distribution of ship speeds for different ice concentrations, ice thicknesses and ridge densities (Fig. 5).To visualize the large amount of data, the ice properties are binned into several classes and subsequently the ship speed distributions are analyzed per ice property class.This analysis shows that the median ship speed per bin, as well as the upper quantiles, decreases strongly with increasing ice concentration (Fig. 5a).While the median speed is around 14 kn (knots; 14 kn = 7.2 m s −1 ) for ice concentrations between 60 and 65 %, this value decreases to 4-5 kn (≈ 2-2.6 m s −1 ) at ice concentrations between 95-100 %.For level ice thicknesses below 30 cm, a similar decrease in the median ship speed occurs with increasing ice thickness.Interestingly, no further systematic speed drop occurs for thicknesses above 30 cm (Fig. 5b).Also, it is interesting to note that the variability in observed ship speeds increases with both increasing ice concentration and thickness.We anticipate that the increased spread of ship speeds with decreasing median reflects the varying abilities of differing vessels to cope with the ice conditions.As ice concentration and thickness increase, small ships will in general experience very large speed drops, while big ships with strong engines are less affected.We conclude that all variables with a pronounced increase in the spread of ship speeds with decreasing median might strongly benefit from a random component when fitting the mixed-effect model.In addition to ice concentration and thickness, such a link between median ship speed and variability also exists for the amount of ridged ice. Figure 5c shows a considerable decrease in median ship speed in combination with an increase in variability as ridge density exceeds a value of 1 ridge/km (from ≈ 13 to ≈ 8 kn or ≈ 6.7 to ≈ 4 m s −1 ) but no clear drop as ridge density increases further.Note that the latter result might partly be due to the uncertainties in the precise values of the assimilated ridge densities.
Figure 6 shows a similar analysis as Fig. 5 but focuses on strong nonlinear and factorized relationships.Note that in contrast to the prognostic variables analyzed in Fig. 5, these factors are based on prognostic variables which are not assimilated into HIROMB.The first investigated factor covers convergence in the ice drift field.As in the released forecast product, we distinguish convergent from nonconvergent ice motion and do not consider the magnitude.Figure 6a illustrates that the ship speed distributions are surprisingly similar under convergent and nonconvergent ice motion.In contrast, simulated ice drift speed is influential while the impact is nonlinear (Fig. 6b).Particularly, very slow-moving, almost stationary ice is related to a considerable median speed drop, but fast-moving ice also seems to affect ship traffic.To factorize this nonlinear relationship for the following statistical fit (Sect.3.2), we distinguish four (nonequidistant) ice velocity classes: stationary ice (0-0.04 m s −1 ), slow-moving ice (0.04-0.1 m s −1 ), medium speed (0.1-0.3 m s −1 ) and fastmoving ice (> 0.3 m s −1 ).Another particularly problematic situation for ships is illustrated in Fig. 6c.Very slow-moving ice in combination with a drift angle close to 90 • relative to the ship movement is related to a reduction in median ship speed to values close to 0 (Fig. 6c).This finding is in line with the experiences of naval architects (K.Riska, consulting and engineering company ILS Oy (Finland), personal communication, 2012), who report that ship routes on which ice drifts towards the side of the ship generally result in closure of the ship channels as the ice might cause considerable pressure on a ship's hull on a large contact surface.At the same time, high ice pressure is generally related to high ice concentrations and, accordingly, slow ice drift.

Mixed-effect modeling
After the purely heuristic data exploration, fitting a statistical model allows us to investigate the relation between the various ice properties and ship speeds systematically.The aim is to test how well we can reconstruct the ship speed observations using the forecasted ice properties.The statistical significance is tested by using a t test.A good agreement between this ship speed reconstruction and observed ship speed implies that the noise level in the ship speed observations is sufficiently small to use the data for model evaluation.At the same time it illustrates the actual usefulness of the ice forecast to estimate delays in the time schedule of ships.As described above, we fit a mixed-effect model based on ice concentration, level ice thickness, ridge density, ice drift speed and the factor according to Figure 6c (cf.Sect.2.3).Divergence was excluded from the final statistical model since the impact of simulated convergent ice motion appeared, in agreement with the foregoing data analysis, not to be statistically significant at the 5 % level.Similarly, all interactions among the above variables could not score any remarkable improvement of the statistical fit (according to the Akaike information criterion (AIC)) and were not considered.Random intercept and slope are included for ice thickness, ice concentration and ridge density as the preceding data exploration of these variables revealed a pronounced increase in the variability of ship speeds with decreasing median.
The reconstruction of ship speed based on the mixed-effect model yields a remarkably close relation with the original observations: the correlation between the square root of observed ship speed and reconstruction is 0.7, which implies that ≈ 50 % of the variance in ship speed can be explained by the modeled ice properties.When smoothing the data with a running mean of 15 min, this correlation increases considerably to 0.79.For a running mean over 25 min, we obtain a correlation of 0.82, which refers to an explained variance of 67 %.Typical examples of the corresponding multilinear regressions for individual ships are shown in Fig. 7.In agreement with the foregoing data exploration, the impacts of ice concentration, level ice thickness and ridge density appear to be highly significant (as the p values in Table 1 are clearly below 0.05).The forecasted ice concentrations seem The reconstructions are based on a multilinear regression of forecasted ice concentration, level ice thickness, ridge density, ice speed and an additional factor which is based, inter alia, on the angle at which the ship is moving relative to the ice movement (parameterized as described in Fig. 3c).
to have the largest impact among the continuous variables as the estimated mean slope has the largest amplitude among the normalized continuous variables (−1.01; cf.column 1, Table 1).Ice drift speed appears significant as well, while the relation is nonlinear, in agreement with the foregoing data exploration.The strongest factor affecting ship speed is the relatively rare situation were the ice drift is very slow and the ice drift is directed towards the side of the ship (according to Table 1 the estimated impact of this factor is −0.63).Note that the effects of ridges and level ice thickness can not be fully separated since both quantities appear to be correlated at 0.53.This relatively high correlation is reasonable since thin-level ice will raft rather than form ridges when deformed.Additional weakly negative correlations occur between ice drift speed and ice thickness (−0.3) as well as with ice concentration (−0.14).These correlations do not impact the validity or forecast suitability of the statistical model but rather complicate the interpretation of the impact of the respective variables.That is to say, the impact of two highly correlated variables can not be distinguished.
Table 2 provides information about the random components.The standard deviations are listed in column 1 and amounts to 1.02 for the residuals.The random intercept has a standard deviation of 1.71, while the standard deviation ranges from 1.37 to 2.22 for the random slopes.The remaining columns in Table 2 refer to the correlations among the random slopes (columns 3-4) and the correlations of the random slopes with the random intercept (column 2).The correlation between the random intercept and the random slope related to ice concentration is −0.79, indicating that faster ships are generally less impacted by high ice concentrations.The same holds for ice thickness and ridge density -but here the relation is somewhat weaker (correlations with the random intercepts are −0.57and −0.43, respectively).

Conclusions
Our analysis illustrates that, for a test data set, a large part of observed ship speed variations can be well reconstructed by the corresponding forecasted ice properties (Fig. 4), and, on average, 62-67 % of the ship speed variations can be explained (when considering 15-25 min averages).These large explained variances have two major implications.First, the ship speed observations obtained from the AIS system appear to be useful for evaluating sea ice fore-and nowcasts -despite an unavoidable random component inherent in this data set.This finding might be of great interest, in particular as ship traffic in the Arctic, and thus the demand for sea ice forecasts, increases.Specifically, the need for forecasted ice properties exceeds information on ice concentration and thickness, which are difficult to evaluate otherwise.Note, however, that we regard our study as a pioneer study and the stability of the results for other regions, ship types, etc., remains to be tested in studies to come.
The second implication of the close fit is a proven usefulness of the respective ice forecast for shipping.Despite the fact that the regression parameters vary strongly from ship to ship (Tables 1 and 2), the good correspondence between ship speed observations and ice forecast is remarkable since the impact of nonresolved small-scale processes was not entirely clear.The impact of all provided prognostic variables, apart from convergence, appears to be significant.The surprisingly weak relation between ship speed and convergent ice drift might be related to shortcomings in the modeled ice drift, which amplify when deriving convergence.A well-known problem in this context, yet to be solved, is the often poor simulation of the ice drift related to the land-fast ice zone.As illustrated in Löptien and Dietze (2014) (their Fig. 6), our test region might well be affected by this problem.
Another somewhat surprising result is that median ship speeds level out at 30 cm, i.e., they do not decrease further with increasing ice thickness (Fig. 5b).We speculate that this finding is related to icebreakers: even though icebreakers are excluded from our analysis, they create (under nonconvergent ice motion) persistent channels which facilitate the progress for ships.We thus anticipate that our parameter estimates might change when the method is applied in regions outside the Baltic Sea, where icebreaker assistance is less frequent.An alternative explanation for the above finding is that some of the thickest ice occurs in the land-fast ice zone.Ships benefit from the fact that convergent ice motion or ice deformation is typically absent in such ice conditions.Thus another potential subject of future research is an extensive analysis of the land-fast ice zone.

Figure 1 .
Figure 1.The test region considered in this study is indicated by the black box.Blue shading refers to the average number of ships in the test region per day and 3 × 3 nm (= 5556 × 5556 m) model grid box in winter 2011 (January-April).Gray shading and contour lines depict the average ice concentrations in percent during that winter (SMHI forecast).Contour intervals are 10 %.

Figure 2 .
Figure 2. (a)-(d) Forecasted monthly mean ice concentration and ice drift in winter 2011.The squares mark the test region considered in this study.

Figure 3 .
Figure 3. Mean forecasted and observed (a, b) level ice thickness and (c, d) ice concentrations during winter 2011 (JFMA).The squares mark the test region considered in this study.(a) and (c) were generated by using MyOcean Products.

Figure 4 .
Figure 4. (a, c) Two exemplary SAR-based ice drift estimates compared to (b, d) the forecasted ice drift.The arrows depict ice drift, and the colors indicate the respective vector lengths.(a) refers to the SAR-based estimates of sea ice displacement during the time period between the 20 January, 20:17 UTC, and the 21 January, 16:12 UTC, interpolated on the model grid.(b) depicts an average of all 6-hourly forecast model outputs included in this period.Likewise, (c) refers to the SAR-based estimate during the time period between 1 February, 15:51 UTC, and the 3 February, 05:03 UTC.(d) refers to the corresponding average of 6-hourly model snapshots.Observe the different arrow lengths in (a) and (b) (resp., c and d).(a) and (c) were generated by using MyOcean Products.

Figure 5 .
Figure 5. (a)-(c) Observed ship speed distribution under several (binned) ice conditions, described by box plots.The bottom and top of the boxes are the first and third quartiles, while the thicker band inside the boxes depicts the median.Lines extending vertically from the boxes (whiskers) depict ship speeds within 1.5 times the interquartile range of the box.Outliers are plotted as individual points.(a) refers to ice concentration, (b) to ice thickness and (c) to ridge density.

Figure 6 .
Figure 6.(a)-(c) The distribution of observed ship speeds under various ice-related factors.As in Fig. 5 the respective distribution of ship speeds is depicted by box plots.(a) refers to convergent and nonconvergent ice motion, (b) explores various classes of ice drift speed and (c) refers to the specific situation where the ice is drifting very slowly (< 0.1 m s −1 ) and, additionally, the ice drift angle is close to 90 • relative to the ship course.(Naturally, only data sets with ship and ice speeds > 0 could be considered.)

Figure 7 .
Figure 7. (a)-(d) Observed (red lines) and reconstructed (black lines) ship speeds for typical vessels in the test region: (a) general cargo, 120 m; (b) oil tanker, 140 m; (c) cargo, 117 m; (d) Ro-Ro, 166 m.The reconstructions are based on a multilinear regression of forecasted ice concentration, level ice thickness, ridge density, ice speed and an additional factor which is based, inter alia, on the angle at which the ship is moving relative to the ice movement (parameterized as described in Fig.3c).

Table 1 .
Summary of the parameters obtained by fitting the mixed-effect model.

Table 2 .
Random components of the mixed-effect model.