Automatic water mixing event identification in the Koljö fjord observatory data

  • Applications
International Journal of Data Science and Analytics

Abstract

This study addresses the task of automatically identifying water mixing events in the multivariate time series of salinity, temperature and dissolved oxygen provided by the Koljö fjord observatory. The observatory, which has been collecting data since April 2011, is used to test new underwater sensor technology and to monitor water quality with respect to hypoxia and oxygenation in the fjord. When the fjord is affected by inflows of new water, originating either from the open sea or from rivers connected to the fjord system, its water properties change, manifesting as peaks or drops in dissolved oxygen, salinity and temperature. Acute oxygen depletion can cause permanent harm to wildlife and the ecosystem. The major challenge for the analysis is that these changes vary widely in peak strength and in how strongly the signals are correlated with each other. The proposed data-driven method extends existing univariate outlier detection approaches with clustering techniques to identify the water mixing events. It comprises three major steps: (1) smoothing of the input data to counter noise, (2) outlier detection within each variable separately, and (3) clustering of the results with the DBSCAN algorithm to determine the anomalous events. The proposed approach detects the water mixing events with an \(F_1\)-measure of 0.885, a precision of 0.931 (93.1% of the detected events are actual water mixing events) and a recall of 0.843 (84.3% of the actual events were detected). Using the proposed method, oceanographers can be informed automatically about the state of the fjord, without manual interaction or physical presence at the experiment site.
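
The three steps named in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the trailing median filter, the robust z-score (median/MAD) jump detector, and all parameter values (`window`, `threshold`, `eps`, `min_samples`) are hypothetical choices, and DBSCAN is reduced to one dimension by clustering the time indices of the flagged samples pooled across all variables. The helper names (`trailing_median`, `robust_jump_outliers`, `dbscan_1d`, `detect_events`, `ramp`) are illustrative only.

```python
from statistics import median


def trailing_median(values, window):
    """Step 1: causal median smoothing over the last `window` samples."""
    return [median(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def robust_jump_outliers(values, threshold=3.5):
    """Step 2 (assumed detector): indices i whose jump values[i] - values[i-1]
    has a robust z-score (median / MAD) above `threshold`."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if not diffs:
        return []
    med = median(diffs)
    scale = 1.4826 * median([abs(d - med) for d in diffs])
    if scale == 0:
        # Nearly all jumps are identical: any deviating jump counts as an outlier.
        return [i + 1 for i, d in enumerate(diffs) if d != med]
    return [i + 1 for i, d in enumerate(diffs) if abs(d - med) / scale > threshold]


def dbscan_1d(points, eps, min_samples):
    """Minimal DBSCAN on scalar positions; returns one label per point (-1 = noise)."""
    n = len(points)
    neighbours = [[j for j in range(n) if abs(points[i] - points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point -> border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:  # only core points expand
                queue.extend(neighbours[j])
    return labels


def detect_events(series, window=5, threshold=3.5, eps=3, min_samples=2):
    """Steps 1-3: smooth each variable, flag jumps, cluster the flagged time indices."""
    flagged = []
    for values in series.values():
        flagged.extend(robust_jump_outliers(trailing_median(values, window), threshold))
    flagged.sort()
    events = {}
    for t, label in zip(flagged, dbscan_1d(flagged, eps, min_samples)):
        if label != -1:
            start, end = events.get(label, (t, t))
            events[label] = (min(start, t), max(end, t))
    return sorted(events.values())


# Toy usage (synthetic data, not observatory measurements): "oxygen" rises by 50
# at t=20, "salinity" drops by 50 at t=21, "temperature" has a one-sample spike.
def ramp(n, jump_at, jump, sign=1):
    values, level = [], 0.0
    for t in range(n):
        if t > 0:
            level += sign * (1 + t % 3)  # small, varying background drift
        if t == jump_at:
            level += sign * jump         # the simulated mixing event
        values.append(level)
    return values


temperature = [10.0] * 40
temperature[5] = 15.0
series = {"oxygen": ramp(40, 20, 50), "salinity": ramp(40, 21, 50, sign=-1),
          "temperature": temperature}
print(detect_events(series))  # [(22, 23)]
```

In this toy run the detected interval starts two samples after the simulated onsets, which is the half-window lag of a trailing median with window 5; the isolated temperature spike is removed by the smoothing step. With `min_samples=2`, a jump flagged in only one variable is labelled noise, so this sketch requires agreement across variables; the abstract does not state whether the original method does the same.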

Figs. 1–10

Notes

  1. Median-smoothed curves reflect slope changes with a delay of half the filter window size.

  2. Estimated minimum duration of a typical water mixing event.
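
Note 1 can be checked on a synthetic step. The sketch below assumes the smoothing filter is causal (a trailing window, as a filter applied to streaming observatory data would have to be; the preview does not state which variant was used) and compares it with a centred window for contrast. The function names are illustrative.

```python
from statistics import median


def trailing_median(values, window):
    """Causal filter: median of the current sample and the previous window - 1."""
    return [median(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def centred_median(values, window):
    """Non-causal filter: median of a window centred on each sample (odd window)."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def onset(values, level=1.0):
    """First index at which the series reaches `level`."""
    return next(i for i, v in enumerate(values) if v >= level)


step = [0.0] * 10 + [1.0] * 10  # abrupt change at index 10
for window in (3, 5, 7):
    lag_causal = onset(trailing_median(step, window)) - 10
    lag_centred = onset(centred_median(step, window)) - 10
    print(window, lag_causal, lag_centred)
# 3 1 0
# 5 2 0
# 7 3 0
```

For odd window sizes the trailing filter shifts the step by `window // 2` samples, i.e. half the window rounded down, because the median only flips once the post-change samples form a majority of the window. The centred filter shows no lag, but it needs samples from the future and therefore cannot be evaluated in real time without the same delay.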

Acknowledgements

The installation of the Koljö fjord cabled observatory was carried out by the University of Gothenburg in collaboration with MARUM, University of Bremen, Germany, and funded by the European Commission projects ESONET-NoE (contract number 036851), HYPOX (grant agreement number 226213) and EMSO (grant agreement number 211816). This work is also supported by Aanderaa Data Instruments AS, which provided the Doppler Current Profiler instruments, other material and financial support for running the Koljö fjord observatory.

Author information

Corresponding author

Correspondence to Markus Götz.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

About this article

Cite this article

Götz, M., Kononets, M., Bodenstein, C. et al. Automatic water mixing event identification in the Koljö fjord observatory data. Int J Data Sci Anal 7, 67–79 (2019). https://doi.org/10.1007/s41060-018-0132-z

  • DOI: https://doi.org/10.1007/s41060-018-0132-z