research-article

Effortless data exploration with zenvisage: an expressive and interactive visual analytics system

Published: 01 November 2016

Abstract

Data visualization is by far the most commonly used mechanism to explore and extract insights from datasets, especially by novice data scientists. And yet, current visual analytics tools are rather limited in their ability to operate on collections of visualizations (by composing, filtering, comparing, and sorting them) to find those that depict desired trends or patterns. Visual data exploration remains a tedious process of trial and error. We propose zenvisage, a visual analytics platform for effortlessly finding desired visual patterns from large datasets. We introduce zenvisage's general-purpose visual exploration language, ZQL ("zee-quel"), for specifying the desired visual patterns, drawing from use-cases in a variety of domains, including biology, mechanical engineering, climate science, and commerce. We formalize the expressiveness of ZQL via a visual exploration algebra (an algebra on collections of visualizations) and demonstrate that ZQL is as expressive as that algebra. zenvisage exposes an interactive front-end that supports the issuing of ZQL queries, and also supports interactions that are "short-cuts" to certain commonly used ZQL queries. To execute these queries, zenvisage uses a novel ZQL graph-based query optimizer that leverages a suite of optimizations tailored to the goal of processing collections of visualizations in certain pre-defined ways. Lastly, a user survey and study demonstrate that data scientists are able to effectively use zenvisage to eliminate error-prone and tedious exploration and directly identify desired visualizations.

  • Published in

    Proceedings of the VLDB Endowment, Volume 10, Issue 4
    November 2016
    180 pages
    ISSN: 2150-8097

    Publisher

    VLDB Endowment
