Abstract
SIMD processor units have become ubiquitous. Using SIMD instructions is the key for performance for many applications. Modern compilers have made immense progress in generating efficient SIMD code. However, they still may fail or SIMDize poorly, due to conservativeness, source complexity or missing capabilities. When SIMDization fails, programmers are left with little clues about the root causes and actions to be taken.
Our proposed guided SIMDization framework builds on the assembly-code quality assessment toolkit MAQAO to analyzes binaries for possible SIMDization hindrances. It proposes improvement strategies and readily quantifies their impact, using in vivo evaluations of suggested transformation. Thanks to our framework, the programmer gets clear directions and quantified expectations on how to improve his/her code SIMDizability. We show results of our technique on TSVC benchmark.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
von Hanxleden, R., Kennedy, K.: Relaxing SIMD control flow constraints using loop transformations. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (1992)
Krall, A., Lelait, S.: Compilation techniques for multimedia processors. Int. J. Parallel Program. 28(4), 347–361 (2000)
Larsen, S., Amarasinghe, S.: Exploiting superword level parallelism with multimedia instruction sets. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (2000)
Nuzman, D., Zaks, A.: Outer-loop vectorization: revisited for short simd architectures. In: ACM/IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT) (2008)
Bondhugula, U., Hartono, A., Ramanujam, J., Sadayappan, P.: A practical automatic polyhedral parallelizer and locality optimizer. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (2008)
Henretty, T., Stock, K., Pouchet, L.-N., Franchetti, F., Ramanujam, J., Sadayappan, P.: Data layout transformation for stencil computations on short-vector SIMD architectures. In: Knoop, J. (ed.) CC 2011. LNCS, vol. 6601, pp. 225–245. Springer, Heidelberg (2011)
Intel: Vtune (2014). http://software.intel.com/en-us/intel-vtune-amplifier-xe
Videau, B., Marangozova-Martin, V., Genovese, L., Deutsch, T.: Optimizing 3D convolutions for wavelet transforms on CPUs with SSE units and GPUs. In: Wolf, F., Mohr, B., an Mey, D. (eds.) Euro-Par 2013. LNCS, vol. 8097, pp. 826–837. Springer, Heidelberg (2013)
Kong, M., Veras, R., Stock, K., Franchetti, F., Pouchet, L.N., Sadayappan, P.: When polyhedral transformations meet SIMD code generation. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (2013)
Aumage, O., Barthou, D., Haine, C., Meunier, T.: Detecting SIMDization opportunities through static/dynamic dependence analysis. In: an Mey, D., Alexander, M., Bientinesi, P., Cannataro, M., Clauss, C., Costan, A., Kecskemeti, G., Morin, C., Ricci, L., Sahuquillo, J., Schulz, M., Scarano, V., Scott, S.L., Weidendorfer, J. (eds.) Euro-Par 2013. LNCS, vol. 8374, pp. 637–646. Springer, Heidelberg (2014)
Callahan, D., Dongarra, J., Levine, D.: Vectorizing compilers: a test suite and results. In: Conference on Supercomputing (1988)
Maleki, S., Gao, Y., Garzarn, M.J., Wong, T., Padua, D.A.: An evaluation of vectorization compilers. In: International Conference on Parallel Architectures and Compilation Techniques (PACT) (2011)
Barthou, D., Rubial, A.C., Jalby, W., Koliai, S., Valensi, C.: Performance tuning of x86 OpenMP codes with MAQAO. In: Müller, M.S., Resch, M.M., Schulz, A., Nagel, W.E. (eds.) Tools for High Performance Computing. Springer, Heidelberg (2010)
Charif-Rubial, A.S., Barthou, D., Valensi, C., Shende, S., Malony, A., Jalby, W.: Mil: A language to build program analysis tools through static binary instrumentation. In: IEEE International High Performance Computing Conference (HiPC), Hyberabad, India, December 2013, pp. 206–215 (2013)
Luk, C.K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V.J., Hazelwood, K.: Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (2005)
Ketterlin, A., Clauss, P.: Prediction and trace compression of data access addresses through nested loop recognition. In: ACM/IEEE International Conference on Code Generation and Optimization, pp. 94–103. ACM, New York (2008)
Lee, Y.-J., Hall, M.: A code isolator: isolating code fragments from large programs. In: Eigenmann, R., Li, Z., Midkiff, S.P. (eds.) LCPC 2004. LNCS, vol. 3602, pp. 164–178. Springer, Heidelberg (2005)
Hargrove, P.H., Duell, J.C.: Berkeley lab checkpoint/restart (BLCR) for linux clusters. J. Phys.: Conf. Ser. 46(1), 494 (2006)
Aumage, O., Barthou, D., Haine, C., Meunier, T.: Detecting SIMDization opportunities through static/dynamic dependence analysis. In: Workshop on Productivity and Performance (PROPER), Aachen, Germany, September 2013
Eichenberger, A.E., Wu, P., O’Brien, K.: Vectorization for SIMD architectures with alignment constraints. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (2004)
Shin, J., Hall, M., Chame, J.: Superword-level parallelism in the presence of control flow. In: ACM/IEEE International Conference on Code Generation and Optimization (2005)
Nuzman, D., Rosen, I., Zaks, A.: Auto-vectorization of interleaved data for SIMD. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (2006)
Ren, G., Wu, P., Padua, D.: Optimizing data permutations for SIMD devices. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (2006)
Ren, B., Agrawal, G., Larus, J.R., Mytkowicz, T., Poutanen, T., Schulte, W.: SIMD parallelization of applications that traverse irregular data structures. In: ACM/IEEE International Conference on Code Generation and Optimization (2013)
Krzikalla, O., Feldhoff, K., Müller-Pfefferkorn, R., Nagel, W.E.: Scout: a source-to-source transformator for SIMD-optimizations. In: Alexander, M., D’Ambra, P., Belloum, A., Bosilca, G., Cannataro, M., Danelutto, M., Di Martino, B., Gerndt, M., Jeannot, E., Namyst, R., Roman, J., Scott, S.L., Traff, J.L., Vallée, G., Weidendorfer, J. (eds.) Euro-Par 2011, Part II. LNCS, vol. 7156, pp. 137–145. Springer, Heidelberg (2012)
Evans, G.C., Abraham, S., Kuhn, B., Padua, D.A.: Vector seeker: a tool for finding vector potential. In: Workshop on Programming Models for SIMD/Vector Processing, pp. 41–48. ACM, New York (2014)
Jaeger, J., Barthou, D.: Automatic efficient data layout for multithreaded stencil codes on CPUs and GPUs. In: IEEE International High Performance Computing Conference, pp. 1–10. IEEE Computer Society, Pune, December 2012
Petit, E., Bodin, F., Papaure, G., Dru, F.: ASTEX: a hot path based thread extractor for distributed memory system on a chip. In: HiPEAC Industrial Workshop (2006)
Akel, C., Kashnikov, Y., de Oliveira Castro, P., Jalby, W.: Is source-code isolation viable for performance characterization? In: International Workshop on Parallel Software Tools and Tool Infrastructures (2013)
Acknowledgement
This work has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreements no 610402 Mont-Blanc 2.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Haine, C., Aumage, O., Petit, E., Barthou, D. (2015). Exploring and Evaluating Array Layout Restructuring for SIMDization. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science(), vol 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_23
Download citation
DOI: https://doi.org/10.1007/978-3-319-17473-0_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17472-3
Online ISBN: 978-3-319-17473-0
eBook Packages: Computer ScienceComputer Science (R0)