Abstract
In this paper, we present the analysis and development of a cross-platform OpenCL implementation of the box-counting algorithm, one of the most widely used methods for estimating the Fractal Dimension. The Fractal Dimension is a relevant image analysis measure used in several disciplines, but computing it is, in general, a time-consuming process, especially when working with 3D images. Unlike parallel programming models that strictly depend on the hardware type and manufacturer, such as CUDA, OpenCL allows us to provide an implementation suitable for execution on both GPUs and multi-core CPUs, regardless of the hardware manufacturer. Sorting is a key part of the fast box-counting algorithm, and the final speedup is strongly conditioned by the efficiency of the sorting algorithm used. Our study reveals that current OpenCL implementations of sorting algorithms are clearly slower than both CUDA implementations for GPU and specific multi-core CPU implementations. Our OpenCL algorithm has been specifically optimized according to the type of the target device, and the results show average speedups of up to 7.46× on the GPU and 4× on the multi-core CPU, both compared with the single-threaded (sequential) CPU implementation.
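To illustrate why sorting dominates the cost of fast box counting, the following is a minimal sequential sketch in Python, not the OpenCL implementation described in the paper. It assumes integer voxel coordinates on a cubic grid whose side is a power of two; the function names `box_counts` and `fractal_dimension` and the key-packing scheme are illustrative assumptions. At each scale, every point is mapped to the integer key of the box containing it, the keys are sorted, and the occupied boxes are counted as the number of distinct adjacent keys.

```python
import math


def box_counts(points, grid_bits):
    """Return (box_side, occupied_boxes) pairs for integer 3D points.

    Assumes every coordinate lies in [0, 2**grid_bits). Illustrative sketch only.
    """
    counts = []
    for level in range(grid_bits):  # box side is 2**level voxels
        # Pack the three box indices of each point into one integer key.
        keys = sorted(
            ((x >> level) << (2 * grid_bits))
            | ((y >> level) << grid_bits)
            | (z >> level)
            for x, y, z in points
        )
        # After sorting, equal keys are adjacent, so counting key changes
        # gives the number of occupied boxes.
        occupied = 0
        if keys:
            occupied = 1 + sum(1 for a, b in zip(keys, keys[1:]) if a != b)
        counts.append((2 ** level, occupied))
    return counts


def fractal_dimension(counts):
    """Least-squares slope of log(occupied boxes) against log(1 / box side)."""
    xs = [math.log(1.0 / side) for side, _ in counts]
    ys = [math.log(n) for _, n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # A completely filled 8 x 8 x 8 volume should give a dimension close to 3.
    cube = [(x, y, z) for x in range(8) for y in range(8) for z in range(8)]
    print(fractal_dimension(box_counts(cube, grid_bits=3)))
```

In this sketch the per-scale `sorted` call is the only step that is not a simple linear pass, which is consistent with the observation above that the achievable speedup depends heavily on the sorting routine available on each device.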
Acknowledgements
This work has been partially supported by the University of Jaén, the Caja Rural de Jaén, the Andalusian Government and the European Union (via ERDF funds) through the research projects UJA2009/13/04 and PI10-TIC-5807.
Cite this article
Jiménez, J., Ruiz de Miras, J. Box-counting algorithm on GPU and multi-core CPU: an OpenCL cross-platform study. J Supercomput 65, 1327–1352 (2013). https://doi.org/10.1007/s11227-013-0885-z