Analysis of Execution Efficiency in the Microthreaded Processor UTLEON3

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6566)

Abstract

We analyse the impact of long-latency instructions, the family blocksize parameter, and the thread switch modifier on the execution efficiency of families of threads in a single-core configuration of the UTLEON3 processor, which implements the SVP microthreading model. The analysis is supported by code execution in an FPGA implementation of the processor.

By classifying long-latency operations as either pipelined (e.g. floating-point operations) or non-pipelined (e.g. cache faults), we show that the blocksize parameter, which controls resource utilization in the microthreaded processor, has a profound effect when the latency is pipelined: increasing the blocksize can improve performance. In the non-pipelined case, the efficiency reaches its maximum even at a small blocksize, beyond which it cannot improve because an exclusive resource is occupied (memory bus congestion).
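
The contrast between the two latency classes can be illustrated with a toy steady-state bound. This is a sketch under stated assumptions, not the model or the measurements used in the paper: each thread is assumed to execute `work` cycles of useful instructions followed by one long-latency operation of `latency` cycles, and `blocksize` threads are resident at once. The function name `efficiency` and all parameter values are hypothetical. In the non-pipelined case the exclusive resource is assumed to serve one operation at a time, which adds the bound `work / latency`.

```python
def efficiency(blocksize, work, latency, pipelined):
    """Upper bound on pipeline utilization in a toy steady-state model.

    Assumptions (not taken from the paper): every thread alternates
    `work` useful cycles with one long-latency operation of `latency`
    cycles, and `blocksize` threads are resident simultaneously.
    """
    if blocksize < 1 or work < 1 or latency < 0:
        raise ValueError("blocksize and work must be >= 1, latency >= 0")

    # Latency-hiding bound: each thread needs work + latency cycles per
    # iteration, and blocksize threads overlap with one another.
    limits = [1.0, blocksize * work / (work + latency)]

    if not pipelined and latency > 0:
        # Exclusive resource: at most one operation completes every
        # `latency` cycles, regardless of how many threads are waiting.
        limits.append(work / latency)

    return min(limits)


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    for b in (1, 2, 4, 8):
        print(b,
              round(efficiency(b, 4, 12, pipelined=True), 3),
              round(efficiency(b, 4, 12, pipelined=False), 3))
```

With these illustrative parameters the pipelined bound keeps rising with the blocksize until the pipeline is fully utilized, whereas the non-pipelined bound stops improving after a blocksize of 2, which mirrors the qualitative behaviour described above.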

The conclusions drawn in this paper can be used to optimize code compilation for the microthreaded processor. As the compiler specifies the blocksize parameter for each family of threads individually, it can optimize the register file utilization of the processor.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sykora, J., Kafka, L., Danek, M., Kohout, L. (2011). Analysis of Execution Efficiency in the Microthreaded Processor UTLEON3. In: Berekovic, M., Fornaciari, W., Brinkschulte, U., Silvano, C. (eds) Architecture of Computing Systems - ARCS 2011. ARCS 2011. Lecture Notes in Computer Science, vol 6566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19137-4_10

  • DOI: https://doi.org/10.1007/978-3-642-19137-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19136-7

  • Online ISBN: 978-3-642-19137-4

  • eBook Packages: Computer Science, Computer Science (R0)
