Abstract
Modern Android mobile devices are enabled by complex heterogeneous MPSoC platforms. To exploit the full potential of these hardware platforms, computationally intensive parts of applications have to be properly parallelized. However, current practice relies on several manual steps, which makes parallelization cumbersome for programmers. In this paper, we present an automated approach to extract multiple forms of parallelism from native C code within Android applications, targeting heterogeneous multicore devices. We show the effectiveness of our approach by parallelizing a set of benchmarks on a Nexus 7 tablet, which is based on a Snapdragon MPSoC that features a quad-core Krait CPU cluster and an Adreno 320 GPU.
Cite this article
Aguilar, M.A., Eusse, J.F., Ray, P. et al. Towards Parallelism Extraction for Heterogeneous Multicore Android Devices. Int J Parallel Prog 45, 1592–1624 (2017). https://doi.org/10.1007/s10766-016-0479-5
DOI: https://doi.org/10.1007/s10766-016-0479-5