Abstract
Today's large-scale services (e.g., video streaming platforms, data centers, sensor grids) need diverse real-time summary statistics across multiple subpopulations of multidimensional datasets. However, state-of-the-art frameworks do not offer general and accurate analytics in real time at reasonable cost. The root cause is the combinatorial explosion of data subpopulations and the diversity of summary statistics that must be monitored simultaneously. We present Hydra, an efficient framework for multidimensional analytics that combines two ideas: a "sketch of sketches," which avoids the overhead of monitoring exponentially many subpopulations, and universal sketching, which ensures accurate estimates for multiple statistics. We build Hydra as an Apache Spark plugin and address practical system challenges to minimize overheads at scale. Across multiple real-world and synthetic multidimensional datasets, we show that Hydra achieves robust error bounds and is an order of magnitude more efficient in operational cost and memory footprint than existing frameworks (e.g., Spark, Druid) while keeping estimation times interactive.
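To make the "sketch of sketches" idea concrete, the following is a minimal illustrative sketch, not Hydra's implementation. It assumes a fixed outer grid of small inner sketches, with each subpopulation key hashed to one cell per outer row, so memory does not grow with the number of subpopulations. A plain count-min sketch stands in for the universal sketch, and all names (`CountMin`, `SketchOfSketches`, `subpopulations`) and sizes are hypothetical.

```python
import hashlib
from itertools import combinations


def h(key, seed, mod):
    """Seeded hash of a string into [0, mod)."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(key.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % mod


class CountMin:
    """Inner sketch (assumption: a stand-in for a universal sketch)."""

    def __init__(self, rows=4, cols=64):
        self.rows, self.cols = rows, cols
        self.table = [[0] * cols for _ in range(rows)]

    def add(self, item, count=1):
        for r in range(self.rows):
            self.table[r][h(item, r, self.cols)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.table[r][h(item, r, self.cols)] for r in range(self.rows))


class SketchOfSketches:
    """Outer grid whose cells are inner sketches shared by subpopulations."""

    def __init__(self, rows=3, cols=32):
        self.rows, self.cols = rows, cols
        self.cells = [[CountMin() for _ in range(cols)] for _ in range(rows)]

    def _cell(self, subpop, r):
        # Outer seeds are offset so they differ from the inner hash seeds.
        return self.cells[r][h(subpop, 1000 + r, self.cols)]

    def update(self, subpop, item, count=1):
        for r in range(self.rows):
            self._cell(subpop, r).add(item, count)

    def estimate(self, subpop, item):
        return min(self._cell(subpop, r).estimate(item) for r in range(self.rows))


def subpopulations(record):
    """All non-empty combinations of dimension=value pairs (2^d - 1 keys)."""
    pairs = sorted(f"{k}={v}" for k, v in record.items())
    return ["&".join(c) for n in range(1, len(pairs) + 1)
            for c in combinations(pairs, n)]


# Hypothetical records: (dimensions, observed event).
records = [
    ({"city": "NYC", "isp": "A", "device": "tv"}, "buffering"),
    ({"city": "NYC", "isp": "B", "device": "phone"}, "buffering"),
    ({"city": "SF", "isp": "A", "device": "tv"}, "play"),
]

sos = SketchOfSketches()
for dims, event in records:
    for sp in subpopulations(dims):
        sos.update(sp, event)

# The estimate is never below the true count (2 here); collisions can only raise it.
print(sos.estimate("city=NYC", "buffering"))
```

Each record with d dimensions touches 2^d - 1 subpopulation keys, yet the structure always holds the same 3 x 32 inner sketches. In this toy version accuracy therefore depends on the grid size rather than on how many subpopulations exist.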