Accelerating throughput-aware runtime mapping for heterogeneous MPSoCs

Authors:
Amit Kumar Singh
Nanyang Technological University and National University of Singapore, Singapore

Akash Kumar
National University of Singapore and Eindhoven University of Technology, Singapore

Thambipillai Srikanthan
Nanyang Technological University, Singapore

ACM Transactions on Design Automation of Electronic Systems, Volume 18, Issue 1, Article No. 9, pp. 1–29. https://doi.org/10.1145/2390191.2390200

Published: 16 January 2013

Abstract

Modern embedded systems need to support multiple time-constrained multimedia applications that often employ multiprocessor systems-on-chip (MPSoCs). Such systems need to be optimized for resource usage and energy consumption. It is well understood that a design-time approach cannot provide timing guarantees for all the applications due to its inability to cater for dynamism in applications. However, a runtime approach requires substantial computation at runtime and hence may not lend itself well to constraint-aware mapping.

In this article, we present a hybrid approach for efficient mapping of applications in such systems. For each application to be supported in the system, the approach performs extensive design-space exploration (DSE) at design time to derive multiple design points representing throughput and energy consumption at different resource combinations. One of these points is selected efficiently at runtime, depending upon the desired throughput, while optimizing for energy consumption and resource usage. While most of the existing DSE strategies consider a fixed multiprocessor platform architecture, our DSE considers a generic architecture, making DSE results applicable to any target platform. All the compute-intensive analysis is performed during DSE, which leaves minimal computation for runtime. The approach is capable of handling dynamism in applications by considering their runtime aspects and providing timing guarantees.
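
The runtime selection step described above can be illustrated with a small sketch. This is not the authors' algorithm: the `DesignPoint` type, the `select_design_point` function, the field names, and all numeric values are hypothetical, and the tie-breaking order (lowest energy first, then fewest processors) is an assumption, since the abstract only states that the selection optimizes for energy consumption and resource usage.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DesignPoint:
    """One design-time result for an application (all fields are illustrative)."""
    processors: int    # number of processing elements the mapping uses
    throughput: float  # throughput achieved by this mapping (arbitrary units)
    energy: float      # energy consumption of this mapping (arbitrary units)


def select_design_point(
    points: Sequence[DesignPoint],
    required_throughput: float,
    free_processors: int,
) -> Optional[DesignPoint]:
    """Return the feasible point with the lowest energy, then fewest processors."""
    feasible = [
        p for p in points
        if p.throughput >= required_throughput and p.processors <= free_processors
    ]
    if not feasible:
        return None  # no stored point meets the constraint with the free resources
    return min(feasible, key=lambda p: (p.energy, p.processors))


# Hypothetical design points for a single application, as DSE might store them.
POINTS = [
    DesignPoint(processors=1, throughput=10.0, energy=5.0),
    DesignPoint(processors=2, throughput=18.0, energy=6.5),
    DesignPoint(processors=3, throughput=25.0, energy=8.0),
    DesignPoint(processors=4, throughput=30.0, energy=10.5),
]

chosen = select_design_point(POINTS, required_throughput=15.0, free_processors=3)
print(chosen)
```

Because the expensive analysis has already produced the stored points, the runtime work in this sketch reduces to a filter and a minimum over a short list, which is the kind of lightweight lookup the hybrid approach aims for.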

The presented approach is used to carry out a DSE case study for models of real-life multimedia applications: H.263 decoder, H.263 encoder, MPEG-4 decoder, JPEG decoder, sample rate converter, and MP3 decoder. At runtime, the design points are used to map the applications on a heterogeneous MPSoC. Experimental results reveal that the proposed approach provides faster DSE, better design points, and efficient runtime mapping when compared to other approaches. In particular, we show that DSE is faster by 83% and runtime mapping is accelerated by 93% for some cases. Further, we study the scalability of the approach by considering applications with large numbers of tasks.

Published in

ACM Transactions on Design Automation of Electronic Systems Volume 18, Issue 1
Special section on adaptive power management for energy and temperature-aware computing systems
January 2013
319 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/2390191

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 16 January 2013
- Accepted: 1 August 2012
- Revised: 1 April 2012
- Received: 1 September 2011

Author Tags
Multiprocessor systems-on-chip
design-space exploration
embedded systems
energy consumption
multimedia applications
runtime mapping
synchronous data-flow graphs
throughput
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 72
- Total Downloads: 460
- Downloads (last 12 months): 9
- Downloads (last 6 weeks): 2