Analysis of Execution Efficiency in the Microthreaded Processor UTLEON3

Sykora, Jaroslav; Kafka, Leos; Danek, Martin; Kohout, Lukas

doi:10.1007/978-3-642-19137-4_10

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6566)

Abstract

We analyse the impact of long-latency instructions, the family blocksize parameter, and the thread switch modifier on the execution efficiency of families of threads in a single-core configuration of the UTLEON3 processor, which implements the SVP microthreading model. The analysis is supported by code execution in an FPGA implementation of the processor.

By classifying long-latency operations as either pipelined (e.g. floating-point operations) or non-pipelined (e.g. cache faults), we show that the blocksize parameter, which controls resource utilization in the microthreaded processor, has a profound effect when the latency is pipelined: increasing the blocksize can improve performance. In the non-pipelined case, efficiency reaches its maximum even at a small blocksize and cannot improve beyond it, because an exclusive resource is occupied (memory bus congestion).
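
The contrast between the two latency classes can be illustrated with a toy analytical model. This is only a sketch under simplifying assumptions and is not the model or the measurement method used in the paper: every thread is assumed to run `work` cycles of useful instructions and then wait `latency` cycles for one long-latency operation, thread switching is assumed to be free, and the function and parameter names are hypothetical.

```python
def efficiency(blocksize, work, latency, pipelined):
    """Fraction of cycles spent on useful work in a toy single-core model.

    Assumptions (not taken from the paper): each thread alternates between
    `work` busy cycles and one long-latency operation of `latency` cycles,
    and switching between the `blocksize` resident threads costs nothing.
    """
    # Bound set by the threads themselves: each one is busy for `work`
    # cycles out of every `work + latency` cycles.
    thread_bound = blocksize * work / (work + latency)
    if pipelined:
        # Pipelined latency: operations from different threads overlap,
        # so only the number of resident threads limits efficiency.
        return min(1.0, thread_bound)
    # Non-pipelined latency: an exclusive resource serves one operation
    # at a time, so at most one thread completes every `latency` cycles.
    resource_bound = work / latency
    return min(1.0, thread_bound, resource_bound)


if __name__ == "__main__":
    for b in (1, 2, 4, 8):
        print(b, efficiency(b, 4, 16, True), efficiency(b, 4, 16, False))
```

With `work=4` and `latency=16`, the pipelined case keeps improving until five threads are resident, while the non-pipelined case is capped at `work / latency = 0.25` from a blocksize of 2 onwards. This reproduces the qualitative behaviour described above, not the paper's numbers.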

The conclusions drawn in this paper can be used to optimize code compilation for the microthreaded processor. As the compiler specifies the blocksize parameter for each family of threads individually, it can optimize the register file utilization of the processor.
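
One way a compiler might act on this is to pick, for each family, the smallest blocksize that already reaches the best achievable efficiency, which leaves the remaining registers free. The following is a hypothetical sketch rather than the compiler's actual algorithm: `efficiency_of` stands for any efficiency estimate (measured or modelled), and `max_blocksize` is an assumed upper limit imposed by the register file.

```python
def smallest_saturating_blocksize(efficiency_of, max_blocksize, tolerance=1e-9):
    """Return the smallest blocksize whose efficiency matches the best one.

    `efficiency_of` is a caller-supplied function mapping a blocksize to an
    efficiency estimate; `max_blocksize` is an assumed register-file limit.
    Both are illustrative names, not part of the UTLEON3 toolchain.
    """
    if max_blocksize < 1:
        raise ValueError("max_blocksize must be at least 1")
    candidates = range(1, max_blocksize + 1)
    best = max(efficiency_of(b) for b in candidates)
    for b in candidates:
        # The first blocksize within `tolerance` of the best efficiency
        # uses the fewest registers for the same performance.
        if efficiency_of(b) >= best - tolerance:
            return b


if __name__ == "__main__":
    # Illustrative efficiency curves (assumed, not measured data).
    pipelined = lambda b: min(1.0, b * 4 / 20)
    non_pipelined = lambda b: min(0.25, b * 4 / 20)
    print(smallest_saturating_blocksize(pipelined, 8))
    print(smallest_saturating_blocksize(non_pipelined, 8))
```

For the two assumed curves, the sketch selects a blocksize of 5 in the pipelined case and 2 in the non-pipelined case, so a family limited by an exclusive resource would reserve far fewer registers.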

Author information

Authors and Affiliations

Department of Signal Processing, Institute of Information Theory and Automation of the ASCR, Pod Vodarenskou vezi 4, Prague, Czech Republic
Jaroslav Sykora, Leos Kafka, Martin Danek & Lukas Kohout

Editor information

Editors and Affiliations

Institut für Datentechnik und Kommunikationsnetze, Hans-Sommer-Straße 66, 38106, Braunschweig, Germany
Mladen Berekovic
Dipartimento di elettronica e informazione, Via Ponzio 34/5, 20133, Milano, Italy
William Fornaciari & Cristina Silvano
Johann Wolfgang Goethe-Universität Frankfurt, Robert-Mayer-Straße 11-15, 60325, Frankfurt am Main, Germany
Uwe Brinkschulte

About this paper

Cite this paper

Sykora, J., Kafka, L., Danek, M., Kohout, L. (2011). Analysis of Execution Efficiency in the Microthreaded Processor UTLEON3. In: Berekovic, M., Fornaciari, W., Brinkschulte, U., Silvano, C. (eds) Architecture of Computing Systems - ARCS 2011. ARCS 2011. Lecture Notes in Computer Science, vol 6566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19137-4_10

DOI: https://doi.org/10.1007/978-3-642-19137-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19136-7
Online ISBN: 978-3-642-19137-4
eBook Packages: Computer Science (R0)
