
Exploring and Evaluating Array Layout Restructuring for SIMDization

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8967)

Abstract

SIMD processor units have become ubiquitous. Using SIMD instructions is the key to performance for many applications. Modern compilers have made immense progress in generating efficient SIMD code. However, they may still fail to SIMDize, or SIMDize poorly, because of conservativeness, source complexity, or missing capabilities. When SIMDization fails, programmers are left with few clues about the root causes and the actions to take.

Our proposed guided SIMDization framework builds on the assembly-code quality assessment toolkit MAQAO to analyze binaries for possible SIMDization hindrances. It proposes improvement strategies and readily quantifies their impact, using in vivo evaluations of the suggested transformations. Thanks to our framework, programmers get clear directions and quantified expectations on how to improve the SIMDizability of their code. We show results of our technique on the TSVC benchmark.
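The abstract does not say which layout transformations the framework suggests. As an illustration only, the sketch below shows one common form of array layout restructuring: turning an array of structures (AoS) into a structure of arrays (SoA), so that a loop over a single field reads memory with unit stride. The type and function names (`point_aos`, `points_soa`, `aos_to_soa`, and so on) are hypothetical and are not taken from the paper or from MAQAO.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AoS layout: the three fields of each point are interleaved,
 * so successive x values are 3 floats apart in memory. */
typedef struct {
    float x, y, z;
} point_aos;

/* Hypothetical SoA layout: each field is stored in its own contiguous array. */
typedef struct {
    float *x, *y, *z;
} points_soa;

/* Kernel on the AoS layout: strided access to the x field. */
static void scale_x_aos(point_aos *p, size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        p[i].x *= k;
}

/* Restructuring step: copy n points from AoS into caller-allocated SoA arrays. */
static void aos_to_soa(const point_aos *in, points_soa *out, size_t n) {
    assert(in && out && out->x && out->y && out->z);
    for (size_t i = 0; i < n; i++) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
    }
}

/* Same kernel on the SoA layout: unit-stride access, the pattern that
 * vectorizing compilers generally handle best. */
static void scale_x_soa(points_soa *p, size_t n, float k) {
    float *x = p->x;
    for (size_t i = 0; i < n; i++)
        x[i] *= k;
}
```

Both kernels compute the same result. Whether the SoA version is actually vectorized, and whether it runs faster once the restructuring copy is counted, depends on the compiler and the target, which is the kind of question that measuring a transformation before committing to it is meant to answer.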

Acknowledgement

This work has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 610402 (Mont-Blanc 2).

Author information

Corresponding author

Correspondence to Olivier Aumage.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Haine, C., Aumage, O., Petit, E., Barthou, D. (2015). Exploring and Evaluating Array Layout Restructuring for SIMDization. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science, vol 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_23

  • DOI: https://doi.org/10.1007/978-3-319-17473-0_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17472-3

  • Online ISBN: 978-3-319-17473-0

  • eBook Packages: Computer Science (R0)
