Abstract
We propose an FPGA-based accelerator for approximate query processing built on the Bag of Little Bootstraps (BLB) algorithm. BLB is a statistical approximation method that lends itself to efficient parallelization. We extend BLB with a streaming mode that avoids data storage and memory transfer overhead, which allows us to take full advantage of the hardware capabilities of FPGAs. Instead of resampling through multiple passes over the dataset, we use a method based on Poisson bootstrapping with resampling coefficients. Our implementation on a Xilinx Zynq-7000 FPGA clocked at 125 MHz outperforms an optimized, multithreaded CPU implementation on an Intel i7-6850K at 4 GHz by a factor of 4 when data transfer time is excluded and by a factor of 2 when it is included, for one million entries. This advantage grows with the amount of data to be processed. Implementing BLB on an FPGA therefore offers a promising approach to accelerating approximate query processing in databases.
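The streaming idea described above can be illustrated with a minimal software sketch. It is not the authors' hardware design. It assumes that consecutive blocks of the input stream serve as the BLB subsets (so the data should arrive in random order), that the query is a simple mean, and that the quality measure is the standard error. The names `poisson` and `blb_mean_stderr` and all parameters are hypothetical. Each item receives one Poisson-distributed resampling coefficient per resample, so every resample is updated in a single pass and no item has to be stored.

```python
import math
import random
from statistics import mean, stdev


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def blb_mean_stderr(stream, n, subset_size, num_resamples, seed=0):
    """Single-pass, BLB-style estimate of the standard error of the mean.

    Assumption: consecutive blocks of `subset_size` items act as the subsets.
    Each item gets a Poisson(n / subset_size) coefficient per resample, so the
    expected total weight of a resample is n, as in a size-n bootstrap resample.
    A trailing incomplete block is ignored.
    """
    rng = random.Random(seed)
    lam = n / subset_size
    weighted_sums = [0.0] * num_resamples   # per resample: sum of w * x
    weight_totals = [0] * num_resamples     # per resample: sum of w
    subset_errors = []
    seen = 0

    for x in stream:
        # Update every resample with its own coefficient; x is never stored.
        for j in range(num_resamples):
            w = poisson(lam, rng)
            weighted_sums[j] += w * x
            weight_totals[j] += w
        seen += 1

        if seen == subset_size:
            # One subset is complete: evaluate the estimator on each resample.
            estimates = [s / c for s, c in zip(weighted_sums, weight_totals) if c > 0]
            subset_errors.append(stdev(estimates))
            weighted_sums = [0.0] * num_resamples
            weight_totals = [0] * num_resamples
            seen = 0

    # BLB averages the per-subset quality measures.
    return mean(subset_errors)


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]
    se = blb_mean_stderr(data, n=len(data), subset_size=100, num_resamples=30)
    print(f"estimated standard error of the mean: {se:.4f}")
```

For unit-variance data the result should be close to 1/sqrt(n). The per-resample accumulators are independent of one another, which is the property that makes this formulation attractive for parallel hardware.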
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Burtsev, V. et al. (2023). FPGA-Integrated Bag of Little Bootstraps Accelerator for Approximate Database Query Processing. In: Palumbo, F., Keramidas, G., Voros, N., Diniz, P.C. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2023. Lecture Notes in Computer Science, vol 14251. Springer, Cham. https://doi.org/10.1007/978-3-031-42921-7_8
DOI: https://doi.org/10.1007/978-3-031-42921-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42920-0
Online ISBN: 978-3-031-42921-7
eBook Packages: Computer Science, Computer Science (R0)