ABSTRACT
Research on Connected Component Labeling (CCL), although old, is still very active: several efficient algorithms for CPUs and GPUs have emerged in recent years and continue to improve performance. This article introduces a new SIMD run-based algorithm for CCL. We show how Run-Length Encoding (RLE) compression can be SIMDized and used to accelerate scalar run-based CCL algorithms. A benchmark on Intel, AMD and ARM processors shows that the new algorithm outperforms the state of the art by an average factor of ×1.7 on AVX2 machines and ×1.9 on Intel Xeon Skylake with AVX512.
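The sketch below makes the run-based idea concrete. It is a minimal scalar C version built on a few assumptions: the binary image is stored one byte per pixel, connectivity is 8, and runs are merged with a union-find. The names `rle_row`, `ccl_runs` and the `MAX_RUNS` bound are hypothetical. It shows only the scalar RLE step that the article proposes to SIMDize, not the article's SIMD encoding or its actual merging strategy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capacity: a row of width w holds at most (w + 1) / 2 runs. */
#define MAX_RUNS 4096

/* A run of foreground pixels in one row, half-open interval [start, end). */
typedef struct { int row, start, end; } Run;

/* Scalar Run-Length Encoding of one binary row (0 = background, 1 = foreground).
   Appends the runs found to runs[n..] and returns the new run count.
   This loop is the part a SIMD RLE would replace. */
static int rle_row(const uint8_t *px, int w, int row, Run *runs, int n) {
    int x = 0;
    while (x < w) {
        while (x < w && !px[x]) x++;   /* skip background pixels */
        if (x == w) break;
        int s = x;
        while (x < w && px[x]) x++;    /* extend the foreground run */
        runs[n++] = (Run){ row, s, x };
    }
    return n;
}

/* Union-find over run indices; the smaller index is always kept as root,
   so every parent pointer refers to an index <= its own (not thread-safe). */
static int parent[MAX_RUNS];

static int find(int i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];  /* path halving */
        i = parent[i];
    }
    return i;
}

static void unite(int a, int b) {
    a = find(a);
    b = find(b);
    if (a < b) parent[b] = a;
    else if (b < a) parent[a] = b;
}

/* Run-based CCL of an h x w binary image with 8-connectivity.
   Fills runs[] and label[] (one final label in 1..count per run),
   stores the run count in *nruns and returns the number of components. */
static int ccl_runs(const uint8_t *img, int h, int w,
                    Run *runs, int *label, int *nruns) {
    assert((long)h * ((w + 1) / 2) <= MAX_RUNS);
    int n = 0, prev_begin = 0, prev_end = 0;
    for (int y = 0; y < h; y++) {
        int cur_begin = n;
        n = rle_row(img + (long)y * w, w, y, runs, n);
        for (int i = cur_begin; i < n; i++) {
            parent[i] = i;
            /* Runs of adjacent rows are 8-connected when their column
               ranges overlap or touch diagonally. */
            for (int j = prev_begin; j < prev_end; j++)
                if (runs[j].start <= runs[i].end && runs[i].start <= runs[j].end)
                    unite(i, j);
        }
        prev_begin = cur_begin;
        prev_end = n;
    }
    /* Roots precede their members, so one forward pass assigns
       consecutive final labels. */
    int count = 0;
    for (int i = 0; i < n; i++) {
        int r = find(i);
        label[i] = (r == i) ? ++count : label[r];
    }
    *nruns = n;
    return count;
}
```

A caller passes the image plus `runs` and `label` arrays of `MAX_RUNS` entries, then reads one label per run. Working on runs means equivalences are recorded once per pair of overlapping runs rather than once per pixel. This is why a faster RLE step can speed up the whole labeling.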