Exploring and Evaluating Array Layout Restructuring for SIMDization

Haine, Christopher; Aumage, Olivier; Petit, Enguerrand; Barthou, Denis

doi:10.1007/978-3-319-17473-0_23


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8967)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing


Abstract

SIMD processor units have become ubiquitous. Using SIMD instructions is key to performance for many applications. Modern compilers have made immense progress in generating efficient SIMD code. However, they may still fail to SIMDize code, or SIMDize it poorly, because of conservativeness, source complexity, or missing capabilities. When SIMDization fails, programmers are left with few clues about the root causes and the actions to take.
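The conservativeness mentioned above can be made concrete with a small C illustration (ours, not taken from the chapter). When a compiler cannot prove that two pointer arguments never overlap, it must either leave the loop scalar or guard a vectorized version with a runtime overlap check. The C99 `restrict` qualifier is one standard way for the programmer to supply the missing guarantee. The function names `scale_generic`, `scale_restrict` and `ranges_overlap` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if the float ranges [a, a+n) and [b, b+n) share any byte. */
int ranges_overlap(const float *a, const float *b, size_t n) {
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t len = (uintptr_t)(n * sizeof(float));
    return n > 0 && pa < pb + len && pb < pa + len;
}

/* The compiler cannot prove that dst and src never overlap, so it has to be
   conservative: keep this loop scalar, or version it with a runtime overlap
   check in front of a vectorized path. */
void scale_generic(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}

/* Same loop, but 'restrict' promises that dst and src do not overlap, which
   removes this hindrance. The assert only checks the promise in debug builds;
   breaking it is undefined behavior. */
void scale_restrict(float *restrict dst, const float *restrict src,
                    float k, size_t n) {
    assert(!ranges_overlap(dst, src, n));
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}
```

Whether a particular compiler actually vectorizes either version depends on its release and flags; its vectorization report (for example GCC's `-fopt-info-vec`) is where to check.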

Our proposed guided SIMDization framework builds on the assembly-code quality assessment toolkit MAQAO to analyze binaries for possible SIMDization hindrances. It proposes improvement strategies and readily quantifies their impact, using in vivo evaluations of the suggested transformations. Thanks to our framework, programmers get clear directions and quantified expectations on how to improve the SIMDizability of their code. We show results of our technique on the TSVC benchmark suite.
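The chapter title refers to array layout restructuring. As a hedged illustration of what one such restructuring can look like at the source level (not the authors' MAQAO-based tooling), the sketch below copies an array of structures (AoS) into a structure of arrays (SoA), so that a loop over a single field turns from strided into unit-stride accesses. All type and function names (`point_aos`, `points_soa`, `aos_to_soa`, `shift_x_aos`, `shift_x_soa`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* AoS layout: consecutive x values are three floats apart (assuming no
   padding, the usual case), so a vectorized loop over x needs gathers or
   shuffles. */
typedef struct { float x, y, z; } point_aos;

/* SoA layout: each field is contiguous, so the same loop makes unit-stride
   accesses that map directly onto SIMD loads and stores. */
typedef struct { float *x, *y, *z; size_t n; } points_soa;

/* Copy n > 0 AoS points into a freshly allocated SoA.
   Returns 0 on success, -1 on allocation failure. */
int aos_to_soa(const point_aos *in, size_t n, points_soa *out) {
    assert(n > 0);
    out->n = n;
    out->x = malloc(n * sizeof(float));
    out->y = malloc(n * sizeof(float));
    out->z = malloc(n * sizeof(float));
    if (!out->x || !out->y || !out->z) {
        free(out->x); free(out->y); free(out->z);
        out->x = out->y = out->z = NULL;
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
    }
    return 0;
}

void soa_free(points_soa *p) { free(p->x); free(p->y); free(p->z); }

/* The same computation on both layouts: shift every point along x. */
void shift_x_aos(point_aos *p, size_t n, float dx) {
    for (size_t i = 0; i < n; i++) p[i].x += dx;      /* strided access */
}

void shift_x_soa(points_soa *p, float dx) {
    for (size_t i = 0; i < p->n; i++) p->x[i] += dx;  /* unit stride */
}
```

The conversion costs a full copy of the data (plus a copy back if the original layout must be restored), so it only pays off when the converted arrays are reused enough; timing both variants inside the real application, in the spirit of the in vivo evaluation described above, is what tells whether it does.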



Acknowledgement

This work has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 610402 (Mont-Blanc 2).

Author information

Authors and Affiliations

LaBRI/INRIA, University of Bordeaux, Bordeaux, France
Christopher Haine, Olivier Aumage, Enguerrand Petit & Denis Barthou


Corresponding author

Correspondence to Olivier Aumage.

Editor information

Editors and Affiliations

Intel Corporation, Santa Clara, California, USA
James Brodman
Intel Corporation, Santa Clara, California, USA
Peng Tu


About this paper

Cite this paper

Haine, C., Aumage, O., Petit, E., Barthou, D. (2015). Exploring and Evaluating Array Layout Restructuring for SIMDization. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science, vol 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_23


DOI: https://doi.org/10.1007/978-3-319-17473-0_23
Published: 01 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17472-3
Online ISBN: 978-3-319-17473-0
eBook Packages: Computer Science, Computer Science (R0)
