Abstract
Medium and large clusters incorporating hybrid CPU/graphics processing unit (GPU) nodes are present in many datacenters today. They can accelerate many different kinds of applications and are well suited to applications that deal with a high volume of data. Similarity search is one such application: large databases must be managed, and hundreds or thousands of queries per second must be answered very quickly. However, the design and usage of heterogeneous computing platforms pose significant challenges, since system size, energy saving, task mapping and scheduling, among others, must be handled efficiently. In this paper we focus on the scheduling problem of distributing the incoming queries to all the processing components in the cluster nodes. Our algorithms exploit the available computational resources by processing queries simultaneously on the CPU cores and on the GPUs. Thus, we address the problem of how to distribute the queries over the whole system in order to obtain the best performance, under the assumption that a heuristic can be defined that provides this distribution automatically. Experimental results show the benefits, in terms of execution time and energy saving, of using an appropriate scheduling scheme.
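As an illustration of what a static distribution of queries over heterogeneous processing units can look like, the following minimal Python sketch splits a batch of queries in proportion to an assumed relative throughput per unit. It is not the heuristic proposed in the paper: the function name `split_queries`, the unit names, and the throughput values are all hypothetical and chosen only for the example.

```python
def split_queries(num_queries, throughputs):
    """Assign num_queries to units in proportion to their relative throughput.

    throughputs maps a unit name to an assumed relative speed (hypothetical
    values, e.g. obtained from a prior calibration run).
    """
    total = sum(throughputs.values())
    exact = {unit: num_queries * t / total for unit, t in throughputs.items()}
    # Start from the integer part of each proportional share.
    shares = {unit: int(x) for unit, x in exact.items()}
    leftover = num_queries - sum(shares.values())
    # Give the remaining queries to the units with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda u: exact[u] - shares[u], reverse=True)
    for unit in by_fraction[:leftover]:
        shares[unit] += 1
    return shares


# Hypothetical node: two CPU cores and one GPU assumed to be 6x faster per unit.
units = {"cpu_core_0": 1.0, "cpu_core_1": 1.0, "gpu_0": 6.0}
shares = split_queries(1000, units)
print(shares)
```

Because the split is computed once, before any query is processed, it is a static scheme: its quality depends entirely on how well the assumed throughputs reflect the real behavior of each unit.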
Acknowledgments
This work has been partially supported by the project Ref: TIN2009-14475-C04 and by CAPAP-H4 network (TIN2011-15734-E).
Cite this article
Uribe-Paredes, R., Cazorla, D., Arias, E. et al. Towards an efficient static scheduling scheme for delivering queries to heterogeneous clusters in the similarity search problem. J Supercomput 70, 527–540 (2014). https://doi.org/10.1007/s11227-013-1079-4
DOI: https://doi.org/10.1007/s11227-013-1079-4