An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks

Pochelu, Pierrick; Petiton, Serge G.; Conche, Bruno

doi:10.1109/BigData52589.2021.9671725

arXiv:2208.14049 (cs)

[Submitted on 30 Aug 2022]

Abstract: Ensembles of Deep Neural Networks (DNNs) achieve high-quality predictions, but they are compute- and memory-intensive. Demand is therefore growing to make them serve heavy request workloads with the computational resources available. Unlike recent inference servers and inference frameworks, which focus on serving single DNNs, we propose a new software layer that serves ensembles of DNNs flexibly and efficiently.
Our inference system is designed with several technical innovations. First, we propose a novel procedure to find a good allocation matrix between devices (CPUs or GPUs) and DNN instances. It successively runs a worst-fit algorithm to place DNNs into device memory and a greedy algorithm to optimize the allocation settings and speed up the ensemble. Second, we design the inference system around multiple processes that run asynchronously (batching, prediction, and the combination rule), with an efficient internal communication scheme to avoid overhead.
Experiments show flexibility and efficiency under extreme scenarios: the system succeeds in serving an ensemble of 12 heavy DNNs on 4 GPUs and, at the opposite extreme, a single DNN multi-threaded across 16 GPUs. It also outperforms the simple baseline of optimizing the batch size of the DNNs, with a speedup of up to 2.7X on the image classification task.
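
The worst-fit placement step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Device` class, the `worst_fit` function, the memory figures, and the largest-model-first ordering are all assumptions made for illustration, and the subsequent greedy optimization of allocation settings is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical device record: a name and its remaining memory in MB."""
    name: str
    free_mem_mb: int
    models: list = field(default_factory=list)


def worst_fit(model_mem_mb, devices):
    """Place each model on the device that currently has the most free memory.

    Assumption: models are visited from largest to smallest footprint,
    a common heuristic that the abstract does not specify.
    """
    ordered = sorted(model_mem_mb.items(), key=lambda kv: kv[1], reverse=True)
    for model, need in ordered:
        # Worst-fit: pick the device with the largest remaining capacity.
        target = max(devices, key=lambda d: d.free_mem_mb)
        if target.free_mem_mb < need:
            raise MemoryError(f"{model} ({need} MB) does not fit on any device")
        target.models.append(model)
        target.free_mem_mb -= need
    return {d.name: list(d.models) for d in devices}


# Illustrative numbers only: 12 models of 4000 MB on 4 devices of 16000 MB.
devices = [Device(f"gpu{i}", 16000) for i in range(4)]
models = {f"dnn{i}": 4000 for i in range(12)}
allocation = worst_fit(models, devices)
```

Because worst-fit always chooses the emptiest device, it tends to spread models evenly across devices, which leaves headroom on each one for later tuning of settings such as batch size.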

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Cite as:	arXiv:2208.14049 [cs.DC]
	(or arXiv:2208.14049v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2208.14049
Journal reference:	Proceedings of IEEE International Conference on Big Data 2022
Related DOI:	https://doi.org/10.1109/BigData52589.2021.9671725

Submission history

From: Pierrick Pochelu PhD
[v1] Tue, 30 Aug 2022 08:05:43 UTC (223 KB)
