Abstract
This study addresses the task of automatically identifying water mixing events in the multivariate time series of salinity, temperature and dissolved oxygen provided by the Koljö fjord observatory. The observatory, which has been collecting data since April 2011, is used to test new underwater sensor technology and to monitor water quality in the fjord with respect to hypoxia and oxygenation. When inflows of new water from the open sea or from rivers connected to the fjord system reach the fjord, its water properties change, which manifests as peaks or drops in dissolved oxygen, salinity and temperature. An acute state of oxygen depletion can cause permanent harm to wildlife and the ecosystem. The major challenge for the analysis is that these water property changes vary strongly in peak strength and in the degree of correlation between the signals. To identify the water mixing events, the proposed data-driven method extends existing univariate outlier detection approaches with clustering techniques. It consists of three major steps: (1) smoothing the input data to counter noise, (2) detecting outliers individually within each variable, and (3) clustering the results with the DBSCAN algorithm to determine the anomalous events. The proposed approach detects the water mixing events with an F1-measure of 0.885, a precision of 0.931 (93.1% of the detected events are actual water mixing events) and a recall of 0.843 (84.3% of the events that should have been found are actually detected). With the proposed method, oceanographers can be informed automatically about the status of the fjord, without manual interaction or physical presence at the experiment site.
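The three-step pipeline summarised above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The function names (rolling_median, flag_outliers, dbscan_1d, detect_events) are hypothetical. The univariate detector is a stand-in: a median-absolute-deviation (MAD) threshold with factor k, chosen only for illustration. Time is assumed to be an evenly spaced sample index, the flagged time indices of all variables are assumed to be pooled before clustering, and the DBSCAN parameters (eps=2 samples, min_samples=4) and the synthetic series are invented for the example.

```python
from statistics import median

def rolling_median(values, window):
    """Step 1 (smoothing): trailing median filter. Each output uses the current
    sample and up to window - 1 preceding samples."""
    return [median(values[max(0, i - window + 1):i + 1]) for i in range(len(values))]

def flag_outliers(series, window=5, k=3.0):
    """Step 2 (univariate detection), hypothetical stand-in: smooth the series,
    then flag samples whose smoothed value lies more than k robust standard
    deviations (1.4826 * MAD) away from the median of the smoothed series."""
    smooth = rolling_median(series, window)
    centre = median(smooth)
    # Note: if more than half the smoothed values equal the median, sigma is 0
    # and every non-zero deviation is flagged.
    sigma = 1.4826 * median([abs(v - centre) for v in smooth])
    return [abs(v - centre) > k * sigma for v in smooth]

def dbscan_1d(points, eps, min_samples):
    """Minimal DBSCAN on scalar points (here: time indices). A point is a core
    point if at least min_samples points, itself included, lie within eps.
    Returns one cluster label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        return [j for j in range(n) if abs(points[j] - points[i]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise, unless later reached as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_samples:  # j is a core point: keep expanding
                queue.extend(more)
    return labels

def detect_events(variables, window=5, k=3.0, eps=2, min_samples=4):
    """Step 3: pool the time indices flagged in every variable, cluster them
    with DBSCAN and report each cluster as one (start, end) event."""
    flagged = sorted(
        t
        for series in variables.values()
        for t, is_outlier in enumerate(flag_outliers(series, window, k))
        if is_outlier
    )
    events = {}
    for t, label in zip(flagged, dbscan_1d(flagged, eps, min_samples)):
        if label != -1:
            start, end = events.get(label, (t, t))
            events[label] = (min(start, t), max(end, t))
    return sorted(events.values())

def noise(i):
    return 0.05 if i % 2 else -0.05  # deterministic stand-in for sensor noise

n = 40
oxygen = [(7.0 if 20 <= i <= 27 or 9 <= i <= 11 else 5.0) + noise(i) for i in range(n)]
salinity = [(28.0 if 20 <= i <= 27 else 30.0) + noise(i) for i in range(n)]
temperature = [(10.0 if i == 5 else 8.0) + noise(i) for i in range(n)]
print(detect_events({"oxygen": oxygen, "salinity": salinity, "temperature": temperature}))
```

With these synthetic series the sketch prints `[(22, 29)]`: the joint oxygen rise and salinity drop at samples 20 to 27 is reported two samples late because of the trailing median, the one-sample temperature spike is removed by the smoothing, and the three oxygen samples flagged at indices 11 to 13 are too few to form a dense cluster and are labelled noise. As a consistency check on the reported scores, F1 = 2PR / (P + R) = 2 · 0.931 · 0.843 / (0.931 + 0.843) ≈ 0.885.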
Notes
Median-smoothed curves reflect slope changes with a delay of half the filter window size.
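A delay of this size arises with a trailing (causal) median filter, which only sees the current and past samples. The following sketch uses a synthetic step signal and a hypothetical median_filter helper written for illustration. It shows that a trailing window of odd size w shifts a sharp change by w // 2 samples, whereas a centred window of the same size does not shift it, at the cost of needing future samples.

```python
from statistics import median

def median_filter(values, window, center=False):
    """Median filter with an odd window size. center=False is trailing: each
    output uses the current sample and the window - 1 preceding ones.
    center=True uses window // 2 samples on either side. Windows are
    truncated at the series edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = (i - half, i + half + 1) if center else (i - window + 1, i + 1)
        out.append(median(values[max(0, lo):hi]))
    return out

def first_index_at_or_above(values, level):
    """Index of the first sample that reaches the given level."""
    return next(i for i, v in enumerate(values) if v >= level)

step = [0.0] * 20 + [1.0] * 20  # synthetic signal with a sharp change at index 20
for w in (5, 9, 15):
    trailing_delay = first_index_at_or_above(median_filter(step, w), 1.0) - 20
    centred_delay = first_index_at_or_above(median_filter(step, w, center=True), 1.0) - 20
    print(f"window={w}: trailing delay={trailing_delay}, centred delay={centred_delay}")
```

For windows of 5, 9 and 15 samples the trailing filter reports delays of 2, 4 and 7 samples, while the centred filter reports 0 in each case.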
Estimated minimum duration of a typical water mixing event.
Acknowledgements
The installation of the Koljö fjord cabled observatory was carried out by the University of Gothenburg in collaboration with MARUM, University of Bremen, Germany, and was funded by the European Commission projects ESONET-NoE (contract number 036851), HYPOX (grant agreement number 226213) and EMSO (grant agreement number 211816). Aanderaa Data Instruments AS also supports this work by providing the Doppler Current Profiler instruments as well as other material and financial support for running the Koljö fjord observatory.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Götz, M., Kononets, M., Bodenstein, C. et al. Automatic water mixing event identification in the Koljö fjord observatory data. Int J Data Sci Anal 7, 67–79 (2019). https://doi.org/10.1007/s41060-018-0132-z
DOI: https://doi.org/10.1007/s41060-018-0132-z