Anomalous variable-length subsequence detection in time series: mathematical formulation and a novel evolutionary algorithm based on clustering and swarm intelligence

Applied Intelligence

Abstract

Variable-length anomalous subsequence detection in time series has many important real-world applications, yet the methods presented in existing studies are computationally expensive because they rely mostly on brute-force search. In this work, we formalize the detection problem as a subsequence segmentation problem (SSP), an optimization task in which the time series is segmented by a set of cutting points into subsequences with minimized total distance to the representative motif. The anomalous subsequences can then be accurately located by reducing the dissimilarity among all subsequences, and this technique can reduce the number of comparisons required for search compared with existing techniques. We further introduce a new clustering-based and swarm intelligence-based evolutionary algorithm (CBSI) to solve the highly complex SSP efficiently. The proposed method balances exploration and exploitation under a local-global search strategy. CBSI clusters the solutions in the search space into groups, allowing frequent information sharing among solutions in the same cluster so that they exploit their own search regions. Furthermore, the best local solutions are promoted by the global-search strategy to explore the remaining search regions. Through a comparison with existing state-of-the-art techniques on both synthetic and real-world problems, we show that any optimization method applied to our proposed SSP brings significant computational savings and comparable search accuracy for the detection task. Our proposed CBSI also has the highest search capability among the existing and related optimization methods compared. The experimental results also highlight the scalability of our approach to longer time series, larger anomaly sizes and wider search ranges.
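
To make the SSP objective concrete, the following is a minimal Python sketch of how a candidate set of cutting points could be scored. It is an illustration under stated assumptions, not the formulation used in the article: resampling every subsequence to a common length, taking the pointwise median as the representative motif, and using the Euclidean distance are all choices made here for the example, and the names `segment`, `resample` and `ssp_cost` are hypothetical.

```python
import math
import statistics


def segment(series, cuts):
    """Split `series` at the cutting points (indices where a new subsequence starts).

    Assumes the cuts are distinct and lie strictly between 0 and len(series).
    """
    bounds = [0] + sorted(cuts) + [len(series)]
    return [series[a:b] for a, b in zip(bounds, bounds[1:])]


def resample(seg, m):
    """Linearly interpolate `seg` onto m >= 2 evenly spaced points."""
    if len(seg) == 1:
        return [float(seg[0])] * m
    out = []
    for j in range(m):
        pos = j * (len(seg) - 1) / (m - 1)
        lo = int(pos)
        hi = min(lo + 1, len(seg) - 1)
        frac = pos - lo
        out.append(seg[lo] * (1 - frac) + seg[hi] * frac)
    return out


def ssp_cost(series, cuts, m=16):
    """Return (total distance to the motif, per-subsequence distances).

    Assumption: the motif is the pointwise median of the resampled
    subsequences and the distance is Euclidean.
    """
    segs = [resample(s, m) for s in segment(series, cuts)]
    motif = [statistics.median(col) for col in zip(*segs)]
    dists = [math.dist(s, motif) for s in segs]
    return sum(dists), dists


# Toy series: a repeating pattern with one longer, unusual subsequence.
pattern = [0, 1, 2, 1]
series = pattern * 3 + [0, 5, 5, 5, 5, 0] + pattern * 2

aligned = [4, 8, 12, 18, 22]   # cuts that follow the true boundaries
shifted = [3, 7, 11, 17, 21]   # the same cuts moved one step to the left

cost_aligned, dists = ssp_cost(series, aligned)
cost_shifted, _ = ssp_cost(series, shifted)
most_anomalous = max(range(len(dists)), key=dists.__getitem__)
print(cost_aligned < cost_shifted, most_anomalous)
```

In this toy series the aligned cuts yield the lower total cost, and the subsequence with the largest distance to the motif (index 3) is the anomalous one. An optimizer such as CBSI would search over `cuts` to minimize a cost of this kind instead of comparing all candidate subsequences exhaustively.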

Data Availability Statement

All datasets used in this paper are available online. Specifically, all synthesized datasets in Section 4.2 are drawn from the UCR time series classification archive; see Reference [4] for details. The ECG time series data in Section 4.2.1 are taken from the MIT-BIH Arrhythmia Database; see Reference [11] for details. The traffic volume dataset is collected from the Freeway Bureau of Taiwan; see Reference [6] for details.

References

  1. Abdollahzadeh B, Gharehchopogh FS, Mirjalili S (2021) African vultures optimization algorithm: a new nature-inspired metaheuristic algorithm for global optimization problems. Comput Ind Eng 158:107408

  2. Ahmed M, Mahmood AN, Islam MR (2016) A survey of anomaly detection techniques in financial domain. Future Gener Comput Syst 55:278–288. https://doi.org/10.1016/j.future.2015.01.001

  3. Crawford B, Soto R, Astorga G et al (2017) Putting continuous metaheuristics to work in binary search spaces. Complexity 2017:1–19. https://doi.org/10.1155/2017/8404231

  4. Dau HA, Bagnall A, Kamgar K et al (2018) The UCR time series classification archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/

  5. Febrero M, Galeano P, González-Manteiga W (2007) Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels. Environmetrics 19(4):331–345. https://doi.org/10.1002/env.878

  6. Freeway Bureau (2022) The ministry of transportation and communications of Taiwan. https://tisvcloud.freeway.gov.tw

  7. Gálvez J, Cuevas E, Becerra H et al (2020) A hybrid optimization approach based on clustering and chaotic sequences. Int J Mach Learn Cybernet 11:359–401

  8. Gupta A, Datta S, Das S (2018) Fast automatic estimation of the number of clusters from the minimum inter-center distance for k-means clustering. Pattern Recognit Lett 116:72–79. https://doi.org/10.1016/j.patrec.2018.09.003

  9. Heidari AA, Mirjalili S, Faris H et al (2019) Harris hawks optimization: algorithm and applications. Future Gener Comput Syst 97:849–872

  10. Hu M, Feng X, Ji Z et al (2019) A novel computational approach for discord search with local recurrence rates in multivariate time series. Inf Sci 477:220–233. https://doi.org/10.1016/j.ins.2018.10.047

  11. Keogh E, Lin J, Fu A (2005) HOT SAX: efficiently finding the most unusual time series subsequence. In: The fifth IEEE international conference on data mining. IEEE Computer Society, 1106352, pp 226–233. https://doi.org/10.1109/icdm.2005.79. http://www.cs.cuhk.hk/~adafu/Pub/icdm05time.pdf

  12. Levine J, Ducatelle F (2004) Ant colony optimization and local search for bin packing and cutting stock problems. J Oper Res Soc 55:705–716

  13. Li S, Chen H, Wang M et al (2020) Slime mould algorithm: a new method for stochastic optimization. Future Gener Comput Syst 111:300–323

  14. Li Y, Lin J, Oates T (2012) Visualizing variable-length time series motifs. In: SDM. Society for industrial and applied mathematics, pp 895–906. https://doi.org/10.1137/1.9781611972825.77

  15. Lin J, Keogh E, Lonardi S et al (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Zaki MJ, Aggarwal CC (eds) Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery (DMKD ’03). Association for Computing Machinery, New York, NY, USA, pp 2–11. https://doi.org/10.1145/882082.882086

  16. Linardi M, Zhu Y, Palpanas T et al (2020) Matrix profile goes mad: variable-length motif and discord discovery in data series. Data Min Knowl Disc 34:1022–1071. https://doi.org/10.1007/s10618-020-00685-w. arXiv:2008.13447

  17. Lu Q, Wang Z, Chen M (2008) An ant colony optimization algorithm for the one-dimensional cutting stock problem with multiple stock lengths. 2008 Fourth Int Conf Nat Comput 7:475–479. https://doi.org/10.1109/icnc.2008.208

  18. Luo W, Gallagher M (2011) Faster and parameter-free discord search in quasi-periodic time series. In: Huang JZ, Cao L, Srivastava J (eds) Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, vol 6635. Springer-Verlag, Berlin Heidelberg, pp 135–148. https://doi.org/10.1007/978-3-642-20847-8_12

  19. Luo W, Gallagher M, Wiles J (2013) Parameter-free search of time-series discord. J Comput Sci Technol 28(2):300–310. https://doi.org/10.1007/s11390-013-1330-8

  20. Matsumoto K, Umetani S, Nagamochi H (2011) On the one-dimensional stock cutting problem in the paper tube industry. J Sched 14:281–290. https://doi.org/10.1007/s10951-010-0164-2

  21. Nguyen TPQ, Phuc PNK, Yang CL et al (2023) Time-series anomaly detection using dynamic programming based longest common subsequence on sensor data. Expert Syst Appl 213:118902

  22. Paparrizos J, Gravano L (2016) k-shape: efficient and accurate clustering of time series. SIGMOD Rec 45:69–76. https://doi.org/10.1145/2723372.2737793

  23. Phoa FKH (2017) A swarm intelligence based (sib) method for optimization in designs of experiments. Nat Comput 16(4):597–605. https://doi.org/10.1007/s11047-016-9555-4

  24. Phoa FKH, Chen RB, Wang W et al (2016) Optimizing two-level supersaturated designs using swarm intelligence techniques. Technometrics 58:43–49

  25. Rahmani A, Afra S, Zarour O et al (2014) Graph-based approach for outlier detection in sequential data and its application on stock market and weather data. Knowl-Based Syst 61:89–97. https://doi.org/10.1016/j.knosys.2014.02.008

  26. Rohlfshagen P, Bullinaria JA (2007) A genetic algorithm with exon shuffling crossover for hard bin packing problems. In: Lipson H (ed) GECCO ’07. ACM Press, pp 1365–1371. https://doi.org/10.1145/1276958.1277213. http://www.cs.bham.ac.uk/~jxb/PUBS/BPP.pdf

  27. Sanchez IAL, Mora-Vargas J, Santos CA et al (2018) Solving binary cutting stock with matheuristics using particle swarm optimization and simulated annealing. Soft Comput 22:6111–6119. https://doi.org/10.1007/s00500-017-2666-8

  28. Santhosh KK, Dogra DP, Roy PP et al (2021) Vehicular trajectory classification and traffic anomaly detection in videos using a hybrid CNN-VAE architecture. IEEE Transactions on Intelligent Transportation Systems, pp 1–12. https://doi.org/10.1109/tits.2021.3108504

  29. Senin P, Lin J, Wang X et al (2014) GrammarViz 2.0: a tool for grammar-based pattern discovery in time series. In: Calders T, Esposito F, Hüllermeier E et al (eds) ECML/PKDD, vol 8726. Springer Berlin Heidelberg, pp 468–472. https://doi.org/10.1007/978-3-662-44845-8_37

  30. Senin P, Lin J, Wang X et al (2018) GrammarViz 3.0: interactive discovery of variable-length time series patterns. ACM Trans Knowl Disc Data (TKDD) 12:1–28. https://doi.org/10.1145/3051126

  31. Wang J, Ma Y, Zhang L et al (2018) Deep learning for smart manufacturing: methods and applications. J Manuf Syst 48:144–156. https://doi.org/10.1016/j.jmsy.2018.01.003

  32. Yang CL, Sutrisno H (2020) A clustering-based symbiotic organisms search algorithm for high-dimensional optimization problems. Appl Soft Comput 97:106722. https://doi.org/10.1016/j.asoc.2020.106722

  33. Yang CL, Darwin F, Sutrisno H (2019) Local recurrence rates with automatic time windows for discord search in multivariate time series. Procedia Manuf 39:1783–1792. https://doi.org/10.1016/j.promfg.2020.01.261

  34. Yeh CCM, Zhu Y, Ulanova L et al (2016) Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. 2016 IEEE 16th International Conference on Data Mining (ICDM), pp 1317–1322. https://doi.org/10.1109/icdm.2016.0179

  35. Zhang L, Gao Y, Lin J (2020) Semantic discord: Finding unusual local patterns for time series. In: Demeniconi C, Chawla NV (eds) Proceedings of the 2020 SIAM International Conference on Data Mining, SIAM. Society for Industrial and Applied Mathematics, pp 136–144. https://doi.org/10.1137/1.9781611976236.16

  36. Zhang Y, Chen Y, Wang J et al (2021) Unsupervised deep anomaly detection for multi-sensor time-series signals. IEEE Trans Knowl Data Eng abs/2107.12626:1–1. https://doi.org/10.1109/TKDE.2021.3102110. arXiv:2107.12626

Acknowledgements

This project is partly supported by Academia Sinica Grant Nos. AS-TP-109-M07 and AS-IA-112-M03, and the National Science Council (Taiwan) Grant Nos. 107-2118-M-001-011-MY3 and 111-2118-M-001-007-MY2.

Author information

Corresponding author

Correspondence to Frederick Kin Hing Phoa.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sutrisno, H., Phoa, F.K.H. Anomalous variable-length subsequence detection in time series: mathematical formulation and a novel evolutionary algorithm based on clustering and swarm intelligence. Appl Intell 53, 29585–29603 (2023). https://doi.org/10.1007/s10489-023-05066-6

  • DOI: https://doi.org/10.1007/s10489-023-05066-6
