Abstract
Ordinary programs contain many parallel loops that account for a significant portion of their completion time, and executing such loops in parallel can substantially improve performance on modern multi-core systems. We propose a new framework, Locality Aware Self-scheduling (LASS), for scheduling parallel loops on multi-core systems that boosts the performance of known self-scheduling algorithms under diverse execution conditions. LASS enforces data locality by assigning consecutive chunks of iterations to the same core, and favours load balancing through a work-stealing mechanism. We evaluate LASS on a set of kernels on a multi-core system with 16 cores under two execution scenarios: in the first, our application runs alone on top of the operating system; in the second, it runs alongside an interfering parallel job. The average speedup achieved by LASS is 11% in the first scenario and 31% in the second.
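To make the two ideas in the abstract concrete, the following C++ sketch illustrates how a locality-aware self-scheduler with work stealing could be structured. It is not the authors' implementation: the fixed chunk size, the mutex-protected per-worker queues, and the steal-from-the-back policy are illustrative assumptions. Each worker consumes consecutive chunks of its own contiguous block of iterations (preserving locality), and steals a chunk from another worker only when its own block is exhausted (restoring load balance).

#include <algorithm>
#include <cstdio>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

struct Chunk { long begin, end; };   // half-open range of loop iterations

struct Worker {
    std::deque<Chunk> chunks;        // consecutive chunks of this worker's block
    std::mutex lock;                 // guards chunks (simplified; a real
};                                   // runtime would use a lock-free deque)

// Run body(i) for i in [0, n). Worker k owns the contiguous block
// [k*n/P, (k+1)*n/P), split into fixed-size chunks, so the chunks it
// executes are consecutive; idle workers steal from a victim's back end.
template <class Body>
void lass_parallel_for(long n, int nworkers, long chunk, Body body) {
    std::vector<Worker> w(nworkers);
    for (int k = 0; k < nworkers; ++k) {
        long lo = n * k / nworkers, hi = n * (k + 1) / nworkers;
        for (long b = lo; b < hi; b += chunk)
            w[k].chunks.push_back({b, std::min(b + chunk, hi)});
    }
    auto run = [&](int self) {
        for (;;) {
            Chunk c{0, 0};
            bool have = false;
            {   // take the next consecutive chunk from our own queue
                std::lock_guard<std::mutex> g(w[self].lock);
                if (!w[self].chunks.empty()) {
                    c = w[self].chunks.front();
                    w[self].chunks.pop_front();
                    have = true;
                }
            }
            for (int v = 0; v < nworkers && !have; ++v) {
                if (v == self) continue;   // own queue empty: try to steal
                std::lock_guard<std::mutex> g(w[v].lock);
                if (!w[v].chunks.empty()) {
                    c = w[v].chunks.back();   // steal the chunk farthest from
                    w[v].chunks.pop_back();   // the victim's current locality
                    have = true;
                }
            }
            if (!have) return;               // nothing left anywhere
            for (long i = c.begin; i < c.end; ++i) body(i);
        }
    };
    std::vector<std::thread> ts;
    for (int k = 0; k < nworkers; ++k) ts.emplace_back(run, k);
    for (auto& t : ts) t.join();
}

int main() {
    std::vector<double> a(1 << 20, 1.0);
    lass_parallel_for((long)a.size(), 16, 4096, [&](long i) { a[i] *= 2.0; });
    double sum = 0;
    for (double x : a) sum += x;
    std::printf("checksum = %.0f\n", sum);   // expect 2 * 2^20 = 2097152
}

Pinning each worker thread to a distinct core (e.g. via pthread_setaffinity_np on Linux) would be needed for the locality argument to hold at the hardware level; the sketch omits this for portability.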
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China under grant NSFC-61300011.