ABSTRACT
Multicore architectures provide the increased performance required by modern embedded real-time systems. Most of these platforms exhibit non-uniform memory access (NUMA), in which memory banks with different access times can be explicitly addressed. Such an architecture, however, challenges predictability, because the allocation of variables has a significant impact on execution times.
At the software level, real-world embedded applications (e.g., automotive) are composed of thousands of functions that often communicate through shared variables stored in memory, whose access time varies because of NUMA.
This paper addresses the mapping of complex embedded applications onto NUMA multicore architectures. The proposed problem formulation addresses the following problems: (i) allocating variables (called labels in the automotive context) to memories with different characteristics, (ii) mapping functionalities (called runnables) onto CPUs, (iii) creating OS tasks from runnables, and (iv) assigning priorities to tasks. Our implementation can handle an application composed of 1K+ runnables sharing 10K+ labels and finds a solution in at most 3 minutes on a standard laptop, enabling interactive design space exploration.
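To make problem (i) concrete, the following is a minimal sketch of how label placement changes memory access cost on a NUMA platform. It is not the formulation or implementation developed in the paper: the latency table, memory names, sizes, and the greedy heuristic are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Runnable:
    name: str
    cpu: int        # core the runnable is mapped to (assumed fixed here)
    accesses: dict  # label name -> accesses per activation


# Hypothetical access latencies in cycles: LATENCY[cpu][memory].
LATENCY = {
    0: {"local0": 1, "local1": 5, "global": 10},
    1: {"local0": 5, "local1": 1, "global": 10},
}


def label_cost(runnables, label, memory):
    """Cycles spent by all runnables accessing `label` if it lives in `memory`."""
    return sum(
        r.accesses.get(label, 0) * LATENCY[r.cpu][memory] for r in runnables
    )


def access_cost(runnables, allocation):
    """Total access cost in cycles for a label -> memory allocation."""
    return sum(label_cost(runnables, lb, mem) for lb, mem in allocation.items())


def greedy_allocation(runnables, label_sizes, capacity):
    """Illustrative greedy heuristic (an assumption, not the paper's method):
    place the most frequently accessed labels first, each in the cheapest
    memory that still has room."""
    free = dict(capacity)

    def total_accesses(label):
        return sum(r.accesses.get(label, 0) for r in runnables)

    allocation = {}
    for label in sorted(label_sizes, key=lambda lb: (-total_accesses(lb), lb)):
        fitting = [m for m in free if free[m] >= label_sizes[label]]
        if not fitting:
            raise ValueError(f"no memory can hold label {label!r}")
        best = min(fitting, key=lambda m: (label_cost(runnables, label, m), m))
        allocation[label] = best
        free[best] -= label_sizes[label]
    return allocation


# Toy instance: two runnables on two cores sharing three labels.
runnables = [
    Runnable("r1", cpu=0, accesses={"a": 10, "b": 2}),
    Runnable("r2", cpu=1, accesses={"b": 8, "c": 4}),
]
label_sizes = {"a": 4, "b": 4, "c": 4}                # bytes
capacity = {"local0": 4, "local1": 4, "global": 64}   # bytes

allocation = greedy_allocation(runnables, label_sizes, capacity)
all_global = {label: "global" for label in label_sizes}
print(allocation, access_cost(runnables, allocation))
print(all_global, access_cost(runnables, all_global))
```

In this toy instance the label placement alone changes the total access cost by more than a factor of three, which illustrates why allocation affects execution times. A full treatment would also have to decide the runnable-to-CPU mapping, task creation, and priorities, which this sketch takes as given.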