ABSTRACT
Researchers reuse data from past studies to avoid costly re-collection of experimental data. However, large-scale data reuse is challenging because research groups and disciplines lack consensus on metadata representations. Dataset File System (DFS) is a semi-structured data description format that promotes such consensus by standardizing the semantics of data description, storage, and retrieval. In this paper, we present analytic-streams, a specification for streaming data analytics with DFS, and streaming-hub, a visual programming toolkit built on DFS to simplify data analysis workflows. Analytic-streams facilitate higher-order data analysis with less computational overhead, while streaming-hub enables storage, retrieval, manipulation, and visualization of data and analytics. We discuss how they simplify data pre-processing, aggregation, and visualization, and their implications for data analysis workflows.