Skip to main content

OpenMP Extension for Explicit Task Allocation on NUMA Architecture

  • Conference paper
  • First Online:
  • 1150 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 9903))

Abstract

Most modern HPC systems consist of a number of cores grouped into multiple NUMA nodes. The latest Intel processors have multiple NUMA nodes inside a chip. Task parallelism using OpenMP dependent tasks is a promising programming model for many-core architecture because it can exploit parallelism in irregular applications with fine-grain synchronization. However, the current specification lacks functionality to improve data locality in task parallelism. In this paper, we propose an extension for the OpenMP task construct to specify the location of tasks to exploit the locality in an explicit manner. The prototype compiler is implemented based on GCC. The performance evaluation using the KASTORS benchmark shows that our approach can reduce remote page access. The Jacobi kernel using our approach shows 3.6 times better performance than GCC when using 36 threads on a 36-core, 4-NUMA node machine.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Barcelona OpenMP Task Suite (BOTS). https://pm.bsc.es/projects/bots/

  2. Drebes, A., Heydemann, K., Drach, N., Pop, A., Cohen, A.: Topology-aware and dependence-aware scheduling and memory allocation for task-parallel languages. ACM Trans. Archit. Code Optim. 11(3), 30:1–30:25 (2014). http://doi.acm.org/10.1145/2641764

    Article  Google Scholar 

  3. Duran, A., Teruel, X., Ferrer, R., Martorell, X., Ayguade, E.: Barcelona OpenMP tasks suite: a set of benchmarks targeting the exploitation of task parallelism in OpenMP. In: Proceedings of the 2009 International Conference on Parallel Processing, ICPP 2009, pp. 124–131. IEEE Computer Society, Washington, DC (2009). doi:10.1109/ICPP.2009.64

  4. KASTORS Benchmark. https://gforge.inria.fr/projects/kastors/

  5. Muddukrishna, A., Jonsson, P.A., Vlassov, V., Brorsson, M.: Locality-aware task scheduling and data distribution on NUMA systems. In: Rendell, A.P., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2013. LNCS, vol. 8122, pp. 156–170. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40698-0_12

    Chapter  Google Scholar 

  6. Olivier, S.L., Porterfield, A.K., Wheeler, K.B., Spiegel, M., Prins, J.F.: OpenMP task scheduling strategies for multicore NUMA systems. Int. J. High Perform. Comput. Appl. 26(2), 110–124 (2012). doi:10.1177/1094342011434065

    Article  Google Scholar 

  7. Olivier, S.L., de Supinski, B.R., Schulz, M., Prins, J.F.: Characterizing and mitigating work time inflation in task parallel programs. In: Proceedings ofthe International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2012, pp. 65:1–65:12. IEEE Computer Society Press, Los Alamitos (2012). http://dl.acm.org/citation.cfm?id=2388996.2389085

  8. Tahan, O.: Towards efficient OpenMP strategies for non-uniform architectures. CoRR abs/1411.7131 (2014). http://arxiv.org/abs/1411.7131

  9. Vikranth, B., Wankar, R., Rao, C.R.: Topology aware task stealing for on-chip NUMA multi-core processors. Procedia Comput. Sci. 18, 379–388 (2013). 2013 International Conference on Computational Science. http://www.sciencedirect.com/science/article/pii/S187705091300344X

    Article  Google Scholar 

  10. Virouleau, P., Brunet, P., Broquedis, F., Furmento, N., Thibault, S., Aumage, O., Gautier, T.: Evaluation of OpenMP dependent tasks with the KASTORS benchmark suite. In: DeRose, L., de Supinski, B.R., Olivier, S.L., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2014. LNCS, vol. 8766, pp. 16–29. Springer, Heidelberg (2014). doi:10.1007/978-3-319-11454-5_2

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jinpil Lee .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Lee, J., Tsugane, K., Murai, H., Sato, M. (2016). OpenMP Extension for Explicit Task Allocation on NUMA Architecture. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science(), vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-45550-1_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45549-5

  • Online ISBN: 978-3-319-45550-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics