ABSTRACT
Both forward and reverse mode automatic differentiation automatically compute the derivatives of a model function, as needed for gradient descent. Reverse mode calculates all derivatives in one run, whereas forward mode must be rerun once for every variable whose derivative is needed. To allow for in-database machine learning, we have integrated automatic differentiation as an SQL operator inside the Umbra database system. To benchmark code generation for GPUs, we implement both forward and reverse mode automatic differentiation. Inspecting the optimised LLVM code shows that nearly the same machine code is executed in both modes. Thus, both modes yield similar runtimes but different compilation times.
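The difference between the two modes can be illustrated with a minimal Python sketch. It is not the SQL operator or the generated LLVM code described above; the class names `Dual` and `Var` and the squared-error `loss` function are assumptions chosen purely for illustration. Forward mode is run once per weight, while reverse mode obtains both derivatives from a single backward pass.

```python
# Illustrative sketch only: plain Python, not the Umbra operator.

class Dual:
    """Forward mode: a value plus its derivative w.r.t. one seeded variable."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        return Dual(self.val + o.val, self.dot + o.dot)
    def __sub__(self, o):
        return Dual(self.val - o.val, self.dot - o.dot)
    def __mul__(self, o):
        # product rule
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)


class Var:
    """Reverse mode: record the expression graph, then push adjoints backwards."""
    def __init__(self, val, parents=()):
        self.val, self.parents, self.grad = val, parents, 0.0
    def __add__(self, o):
        return Var(self.val + o.val, ((self, 1.0), (o, 1.0)))
    def __sub__(self, o):
        return Var(self.val - o.val, ((self, 1.0), (o, -1.0)))
    def __mul__(self, o):
        return Var(self.val * o.val, ((self, o.val), (o, self.val)))
    def backward(self):
        # Build a topological order so each node is processed after its consumers.
        order, seen = [], set()
        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


def loss(a, b, x, y):
    """Hypothetical model function: squared error of a linear model a*x + b."""
    r = a * x + b - y
    return r * r


def forward_gradient(a, b, x, y):
    # One full evaluation per variable: seed a, then seed b.
    da = loss(Dual(a, 1.0), Dual(b), Dual(x), Dual(y)).dot
    db = loss(Dual(a), Dual(b, 1.0), Dual(x), Dual(y)).dot
    return da, db


def reverse_gradient(a, b, x, y):
    # One evaluation and one backward pass yield all derivatives.
    va, vb = Var(a), Var(b)
    loss(va, vb, Var(x), Var(y)).backward()
    return va.grad, vb.grad
```

Both functions return the same gradient for the same inputs; they differ only in how many evaluations of `loss` are needed, which mirrors the trade-off stated in the abstract.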