Abstract
Modern embedded systems must support multiple time-constrained multimedia applications, and they often employ multiprocessor systems-on-chip (MPSoCs) to do so. Such systems need to be optimized for resource usage and energy consumption. It is well understood that a purely design-time approach cannot provide timing guarantees for all applications, because it cannot cater for dynamism in applications. A purely runtime approach, however, incurs a large computational overhead at runtime and hence may not lend itself well to constraint-aware mapping.
In this article, we present a hybrid approach for efficient mapping of applications in such systems. For each application to be supported in the system, the approach performs extensive design-space exploration (DSE) at design time to derive multiple design points, each representing the throughput and energy consumption for a particular resource combination. One of these points is then selected efficiently at runtime, depending upon the desired throughput, while optimizing for energy consumption and resource usage. While most existing DSE strategies consider a fixed multiprocessor platform architecture, our DSE considers a generic architecture, making the DSE results applicable to any target platform. All the compute-intensive analysis is performed during DSE, which leaves minimal computation for runtime. The approach is capable of handling dynamism in applications by considering their runtime aspects and providing timing guarantees.
The presented approach is used to carry out a DSE case study for models of real-life multimedia applications: H.263 decoder, H.263 encoder, MPEG-4 decoder, JPEG decoder, sample rate converter, and MP3 decoder. At runtime, the design points are used to map the applications onto a heterogeneous MPSoC. Experimental results reveal that the proposed approach provides faster DSE, better design points, and more efficient runtime mapping than other approaches. In particular, we show that DSE is faster by 83% and runtime mapping is accelerated by 93% in some cases. Further, we study the scalability of the approach by considering applications with large numbers of tasks.
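The runtime selection step described in the abstract can be illustrated with a minimal sketch. It is not the article's algorithm: the `DesignPoint` fields, the `select_design_point` function, the tie-breaking rule (lowest energy first, then fewest resources), and the sample values are all assumptions made for illustration. The sketch only shows why runtime work stays small once design points have been precomputed: selection reduces to a filter and a minimum over a short list.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DesignPoint:
    """One precomputed design point (hypothetical fields and units)."""
    resources: int      # number of processing tiles used
    throughput: float   # achieved throughput, e.g. iterations per second
    energy: float       # energy consumption, e.g. per iteration


def select_design_point(
    points: Sequence[DesignPoint],
    required_throughput: float,
    available_resources: int,
) -> Optional[DesignPoint]:
    """Pick a feasible point with the lowest energy, then the fewest resources.

    A point is feasible if it meets the required throughput and fits in the
    resources that are currently free. Returns None if no point is feasible.
    """
    feasible = [
        p for p in points
        if p.throughput >= required_throughput
        and p.resources <= available_resources
    ]
    if not feasible:
        return None
    # Assumed ordering: minimize energy first, break ties on resource usage.
    return min(feasible, key=lambda p: (p.energy, p.resources))


# Illustrative design points for one application (made-up numbers).
example_points = [
    DesignPoint(resources=1, throughput=10.0, energy=5.0),
    DesignPoint(resources=2, throughput=18.0, energy=7.0),
    DesignPoint(resources=4, throughput=30.0, energy=12.0),
]

chosen = select_design_point(example_points, required_throughput=15.0,
                             available_resources=4)
print(chosen)
```

With these sample values, a required throughput of 15.0 rules out the single-tile point, and the two-tile point is chosen because it has the lower energy of the two remaining candidates. If no point satisfies both constraints, the function returns `None`, which a resource manager could treat as a rejected request.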
Index Terms
- Accelerating throughput-aware runtime mapping for heterogeneous MPSoCs