research-article

Elastic Data Binning: Time-Series Sketching for Time-Domain Astrophysics Analysis

Published: 19 July 2023

Abstract

Time-domain astrophysics analysis (TDAA) relies on observational surveys of celestial phenomena, and the resulting data may contain irrelevant information for several reasons, one of which is the sensitivity of the optical telescopes. Data binning is a standard technique in astrophysics analysis for removing inconsistencies and clarifying the main characteristics of the original data. It splits the data sequence into smaller bins of a fixed size and then sketches each bin into a new representation. In this study, we introduce a novel approach, called elastic data binning (EBinning), that automatically adjusts the size of each bin using two statistical metrics based on Student's t-test for linear regression and the Hoeffding inequality. EBinning outperforms well-known algorithms in TDAA at extracting the relevant characteristics of time-series data, called lightcurves. We demonstrate that EBinning successfully represents various characteristics of lightcurves gathered from the Kiso Schmidt telescope and that it is applicable to transient detection in TDAA.
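
The abstract names three standard ingredients: fixed-size binning, the t-test on a linear-regression slope, and the Hoeffding inequality. The Python sketch below illustrates each of them in isolation. It is not the EBinning algorithm, whose rule for combining these quantities the abstract does not specify. The function names, the use of the bin mean as the sketch, and the one-sided form of the Hoeffding bound are assumptions made for illustration only.

```python
import math
from statistics import mean


def fixed_size_binning(values, bin_size):
    """Split values into consecutive bins of bin_size points and
    sketch each bin by its mean (the last bin may be shorter)."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return [mean(values[i:i + bin_size])
            for i in range(0, len(values), bin_size)]


def slope_t_statistic(values):
    """t statistic for the null hypothesis 'slope = 0' in an ordinary
    least-squares fit of values against their index 0..n-1."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 points")
    x_bar = (n - 1) / 2.0
    y_bar = mean(values)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in enumerate(values))
    if sse == 0.0:
        # Perfect fit: the statistic is unbounded unless the line is flat.
        return 0.0 if slope == 0.0 else math.copysign(math.inf, slope)
    standard_error = math.sqrt(sse / (n - 2) / sxx)
    return slope / standard_error


def hoeffding_bound(value_range, n, delta):
    """One-sided Hoeffding bound: with probability at least 1 - delta,
    the mean of n independent values confined to an interval of width
    value_range exceeds its expectation by less than the returned epsilon."""
    if n <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0 and 0 < delta < 1")
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
```

An adaptive scheme could, for example, compare `slope_t_statistic` of the points in the current bin against a critical value, or compare the difference between adjacent bin means against `hoeffding_bound`, before deciding whether to grow or close a bin. How EBinning actually makes that decision is described in the full article, not here.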

Published in

ACM SIGAPP Applied Computing Review, Volume 23, Issue 2
June 2023
52 pages

Copyright © 2023. Copyright is held by the owner/author(s).

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States
