Abstract
Applications in embedded systems often need to meet specified timing constraints. It is advantageous not only to calculate the worst-case execution time (WCET) of an application, but also to perform transformations that reduce the WCET, since an application with a lower WCET is less likely to violate its timing constraints. Some processors incur a pipeline delay whenever an instruction transfers control to a target that is not the next sequential instruction. Code-positioning optimizations attempt to reduce these delays by ordering the basic blocks to minimize the number of unconditional jumps and taken conditional branches that occur. Traditional code-positioning algorithms use profile data to find the frequently executed edges between basic blocks, then minimize the transfers of control along these edges to reduce the average-case execution time (ACET). This paper introduces a WCET code-positioning optimization, driven by worst-case (WC) path information from a timing analyzer, that reduces the WCET instead of the ACET. This WCET optimization changes the layout of the code in memory to reduce the branch penalties along the WC paths. Unlike the edge frequencies used in traditional profile-driven code positioning, which remain fixed, the WC path may change after code-positioning decisions are made. Thus, WCET code positioning is inherently more challenging than ACET code positioning. The experimental results show that this optimization typically finds the optimal layout of the basic blocks, that is, the layout with the minimal WCET. The results also show that code positioning reduces the WCET by over 7%.
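The interaction between layout and WC path described above can be illustrated with a small sketch. It is not the paper's algorithm: it brute-forces every layout of a tiny, hypothetical acyclic control-flow graph and re-evaluates the WC path for each one. The block names, cycle counts, and the `PENALTY` value are all assumptions chosen for illustration, and the cost model is simplified so that every transfer to a block that is not laid out immediately after its predecessor costs the same fixed penalty.

```python
from itertools import permutations

PENALTY = 3  # hypothetical cycles lost per taken transfer of control

# Hypothetical acyclic CFG: block -> (cycles, successors)
CFG = {
    "A": (4, ["B", "C"]),
    "B": (10, ["D"]),
    "C": (9, ["D"]),
    "D": (2, []),
}

def paths(cfg, block):
    """Yield every path from block to an exit block."""
    succs = cfg[block][1]
    if not succs:
        yield [block]
    for s in succs:
        for rest in paths(cfg, s):
            yield [block] + rest

def path_time(cfg, layout, path):
    """Block cycles plus a penalty for each edge that is not a fall-through."""
    pos = {b: i for i, b in enumerate(layout)}
    cycles = sum(cfg[b][0] for b in path)
    taken = sum(1 for a, b in zip(path, path[1:]) if pos[b] != pos[a] + 1)
    return cycles + PENALTY * taken

def wcet(cfg, layout, entry):
    """Return (WCET, WC path) for one layout by checking every path."""
    return max((path_time(cfg, layout, p), p) for p in paths(cfg, entry))

def best_layout(cfg, entry):
    """Exhaustively search layouts that keep the entry block first."""
    others = [b for b in cfg if b != entry]
    candidates = ([entry] + list(p) for p in permutations(others))
    return min(candidates, key=lambda lay: wcet(cfg, lay, entry)[0])

if __name__ == "__main__":
    for layout in (["A", "B", "C", "D"], ["A", "B", "D", "C"]):
        print(layout, wcet(CFG, layout, "A"))
    print("best:", best_layout(CFG, "A"))
```

In this toy graph the initial WC path is A, B, D. Making that path entirely fall-through (layout A, B, D, C) raises the WCET from 19 to 21 cycles, because the path A, C, D now pays two penalties and becomes the new WC path. This is the effect that makes WCET-driven positioning harder than positioning from fixed profile frequencies. Exhaustive search is only feasible for a handful of blocks.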
Index Terms
- Improving WCET by applying a WC code-positioning optimization