Abstract
Data visualization is by far the most commonly used mechanism to explore and extract insights from datasets, especially by novice data scientists. Yet current visual analytics tools are limited in their ability to operate on collections of visualizations (by composing, filtering, comparing, and sorting them) to find those that depict desired trends or patterns. As a result, visual data exploration remains a tedious process of trial and error. We propose zenvisage, a visual analytics platform for effortlessly finding desired visual patterns in large datasets. We introduce zenvisage's general-purpose visual exploration language, ZQL ("zee-quel"), for specifying the desired visual patterns, drawing on use cases from a variety of domains, including biology, mechanical engineering, climate science, and commerce. We formalize the expressiveness of ZQL via a visual exploration algebra (an algebra on collections of visualizations) and demonstrate that ZQL is as expressive as that algebra. zenvisage exposes an interactive front-end that supports issuing ZQL queries, as well as interactions that serve as "short-cuts" to certain commonly used ZQL queries. To execute these queries, zenvisage uses a novel ZQL graph-based query optimizer that leverages a suite of optimizations tailored to processing collections of visualizations in certain predefined ways. Lastly, a user survey and a user study demonstrate that data scientists are able to use zenvisage effectively to eliminate error-prone and tedious exploration and to directly identify desired visualizations.