Abstract
We analyse the impact of long-latency instructions, the family blocksize parameter, and the thread switch modifier on the execution efficiency of families of threads in a single-core configuration of the UTLEON3 processor, which implements the SVP microthreading model. The analysis is supported by code execution in an FPGA implementation of the processor.
By classifying long-latency operations as either pipelined (e.g. floating-point operations) or non-pipelined (e.g. cache faults), we show that the blocksize parameter, which controls resource utilization in the microthreaded processor, has a profound effect when the latency is pipelined: increasing the blocksize can improve performance. In the non-pipelined case, efficiency reaches its maximum at a small blocksize and cannot improve beyond it, because an exclusive resource is occupied (memory bus congestion).
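The contrast between the two latency classes can be illustrated with a toy analytic model. This is only a sketch under simplifying assumptions and is not the measurement method used in the paper: each thread is assumed to do `work_cycles` cycles of useful work and then issue one long-latency operation of `latency` cycles, with `blocksize` threads in flight. The function name and all parameters are hypothetical.

```python
def efficiency(blocksize, work_cycles, latency, pipelined):
    """Toy estimate of the fraction of cycles spent on useful work.

    Assumed model (not taken from the paper): every thread performs
    `work_cycles` cycles of computation followed by one long-latency
    operation lasting `latency` cycles.
    """
    if blocksize < 1 or work_cycles < 1 or latency < 0:
        raise ValueError("blocksize and work_cycles must be >= 1, latency >= 0")

    # Latency hiding: while one thread waits, the remaining threads in the
    # block can execute, so useful work scales with the number of threads.
    overlap_bound = blocksize * work_cycles / (work_cycles + latency)

    if pipelined:
        # Pipelined unit: requests from different threads overlap freely.
        return min(1.0, overlap_bound)

    # Non-pipelined: the exclusive resource serves one request at a time,
    # so at most one thread completes per `latency` cycles.
    resource_bound = work_cycles / latency if latency > 0 else 1.0
    return min(1.0, overlap_bound, resource_bound)


if __name__ == "__main__":
    for b in (1, 2, 4, 8):
        print(b,
              round(efficiency(b, 10, 40, pipelined=True), 2),
              round(efficiency(b, 10, 40, pipelined=False), 2))
```

In this model the pipelined curve keeps rising with blocksize until the pipeline is fully occupied, while the non-pipelined curve flattens as soon as the exclusive resource becomes the bottleneck, which mirrors the qualitative behaviour described above.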
The conclusions drawn in this paper can be used to optimize code compilation for the microthreaded processor. As the compiler specifies the blocksize parameter for each family of threads individually, it can optimize the register file utilization of the processor.
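One way a compiler might act on this is sketched below: pick the smallest blocksize that reaches (close to) the best achievable efficiency, so that no registers are reserved without benefit. This is a hypothetical heuristic, not the compiler strategy from the paper; `choose_blocksize`, `efficiency_of`, `regs_per_thread`, `regfile_size`, and `tolerance` are all illustrative names, and the efficiency curve is assumed to be supplied by the caller (for example from profiling).

```python
def choose_blocksize(efficiency_of, regs_per_thread, regfile_size, tolerance=0.01):
    """Return the smallest blocksize within `tolerance` of the best efficiency.

    `efficiency_of` is a caller-supplied function mapping a blocksize to an
    estimated efficiency in [0, 1] (an assumption of this sketch).
    """
    if regs_per_thread < 1:
        raise ValueError("regs_per_thread must be >= 1")

    # The register file limits how many thread contexts fit at once.
    max_blocksize = regfile_size // regs_per_thread
    if max_blocksize < 1:
        raise ValueError("register file too small for a single thread")

    candidates = range(1, max_blocksize + 1)
    best = max(efficiency_of(b) for b in candidates)

    # Take the first (smallest) blocksize that is close enough to the best,
    # leaving the remaining registers free for other families.
    for b in candidates:
        if efficiency_of(b) >= best - tolerance:
            return b


if __name__ == "__main__":
    # Hypothetical curves: one saturating late, one saturating early.
    print(choose_blocksize(lambda b: min(1.0, 0.2 * b), 8, 1024))
    print(choose_blocksize(lambda b: min(0.25, 0.2 * b), 8, 1024))
```

The design choice here is to prefer the smallest adequate blocksize rather than the largest possible one, since a family whose efficiency saturates early gains nothing from additional register allocation.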
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sykora, J., Kafka, L., Danek, M., Kohout, L. (2011). Analysis of Execution Efficiency in the Microthreaded Processor UTLEON3. In: Berekovic, M., Fornaciari, W., Brinkschulte, U., Silvano, C. (eds) Architecture of Computing Systems - ARCS 2011. ARCS 2011. Lecture Notes in Computer Science, vol 6566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19137-4_10
DOI: https://doi.org/10.1007/978-3-642-19137-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19136-7
Online ISBN: 978-3-642-19137-4
eBook Packages: Computer Science, Computer Science (R0)