Abstract
Energy efficiency has become one of the most important topics in computing. To meet the ever-increasing demands of the mobile market, the next generation of processors will have to deliver high compute performance within an extremely limited energy budget. Wide single instruction, multiple data (SIMD) architectures provide a promising solution, as they have the potential to achieve high compute performance at a low energy cost. We propose a configurable wide SIMD architecture that uses explicit datapath techniques to further optimize energy efficiency without sacrificing computational performance. To demonstrate the efficiency of the proposed architecture, we implement multiple instantiations of it and of its automatic bypassing counterpart, as well as a baseline reduced instruction set computing (RISC) processor. Extensive experimental results show that the proposed architecture is efficient and scalable in terms of area, performance, and energy. Compared to the RISC processor, a 128-PE instance of the proposed SIMD architecture achieves an average speed-up of 206 times and reduces total energy dissipation by 48.3 % on average and by up to 94 %. Compared to the corresponding SIMD architecture with automatic bypassing, the 128-PE explicitly bypassed SIMD avoids an average of 64 % of all register file accesses, and reduces total energy dissipation by 27.5 % on average and by up to 43.0 %.
References
Abbo, A., Kleihorst, R., Choudhary, V., Sevat, L., Wielage, P., Mouy, S., Vermeulen, B., Heijligers, M. (2008). Xetal-II: A 107 GOPS, 600 mW massively parallel processor for video scene analysis. IEEE Journal of Solid-State Circuits (JSSC), 43(1), 192–201.
Amdahl, G.M. (2007). Validity of the single processor approach to achieving large scale computing capabilities. IEEE Solid-State Circuits Society Newsletter, 12(3), 19–20.
Balfour, J., Harting, R., Dally, W. (2009). Operand registers and explicit operand forwarding. IEEE Computer Architecture Letters, 8(2), 60–63.
Corporaal, H. (1998). Microprocessor architectures: from VLIW to TTA. Wiley.
Frijns, R., Fatemi, H., Mesman, B., Corporaal, H. (2008). DC-SIMD: dynamic communication for SIMD processors. Proceedings of international symposium on parallel and distributed processing (IPDPS) (pp. 1–10).
Goel, N., Kumar, A., Panda, P. (2007). Power reduction in VLIW processor with compiler driven bypass network. Proceedings of the 20th international conference on vlsi design (VLSID) (pp. 233–238).
Guan, X., & Fei, Y. (2008). Reducing power consumption of embedded processors through register file partitioning and compiler support. Proceedings of international conference on application-specific systems, architectures and processors (ASAP) (pp. 269–274).
Gustafson, J.L. (1988). Reevaluating Amdahl’s law. Communications of the ACM, 31(5), 532–533.
He, Y. (2013). Low power architectures for streaming applications. PhD Thesis.
He, Y., Pu, Y., Ye, Z., Londono, S., Kleihorst, R., Abbo, A., Corporaal, H. (2010). Xetal-Pro: An ultra-low energy and high throughput SIMD processor. Proceedings of the 47th design automation conference (DAC) (pp. 543–548).
He, Y., She, D., Mesman, B., Corporaal, H. (2011). MOVE-Pro: a low power and high code density TTA architecture. Proceedings of the 11th international conference on embedded computer systems: architectures, modeling, and simulation (SAMOS) (pp. 294–301).
He, Y., Ye, Z., She, D., Mesman, B., Corporaal, H. (2011). Feasibility analysis of ultra high frame rate visual servoing on FPGA and SIMD processor. Proceedings of advanced concepts for intelligent vision systems (ACIVS) (pp. 623–634).
He, Y., Ye, Z., She, D., Pieters, R., Mesman, B., Corporaal, H. (2010). 1000 fps visual servoing on the reconfigurable wide SIMD processor. Proceedings of the 16th annual conference of the advanced school for computing and imaging (ASCI) (pp. 302–309).
He, Y., Zivkovic, Z., Kleihorst, R., Danilin, A., Corporaal, H. (2008). Real-time implementations of hough transform on SIMD architecture. Proceedings of the ACM/IEEE international conference on distributed smart cameras (ICDSC) (pp. 1–8).
He, Y., Zivkovic, Z., Kleihorst, R., Danilin, A., Corporaal, H., Mesman, B. (2008). Real-time hough transform on 1-D SIMD processors: implementation and architecture exploration. Proceedings of the international conference advanced concepts for intelligent vision systems (ACIVS) (pp. 254–265).
Kapasi, U., Dally, W., Rixner, S., Owens, J., Khailany, B. (2002). The Imagine stream processor. Proceedings of international conference on computer design: vlsi in computers and processors (ICCD) (pp. 282–288).
Kyo, S., & Okazaki, S. (2008). IMAPCAR: A 100 GOPS in-vehicle vision processor based on 128 ring connected four-way VLIW processing elements. Journal of Signal Processing Systems, 1–12.
Otsu, N. (1975). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 11, 285–296.
Prengler, A., & Adi, K. (2009). A reconfigurable SIMD-MIMD processor architecture for embedded vision processing applications. SAE World Congress, (pp. 1–9).
CACTI: cacti 5.3, rev 174. http://quid.hpl.hp.com:9081/cacti/.
Delft University of Technology: MOVE project. http://ce.et.tudelft.nl/MOVE/.
Tampere University of Technology: TTA-based codesign environment (TCE). http://tce.cs.tut.fi/.
Pu, Y., He, Y., Ye, Z., Londono, S., Abbo, A., Kleihorst, R., Corporaal, H. (2011). From Xetal-II to Xetal-Pro: on the road toward an ultra low-energy and high-throughput SIMD processor. IEEE Transactions on Circuits and Systems for Video Technology (TCAS-VT), 21(4), 472–484.
Raghavan, P., Munaga, S., Ramos, E., Lambrechts, A., Jayapala, M., Catthoor, F., Verkest, D. (2007). A customized cross-bar for data-shuffling in domain-specific SIMD processors. Proceedings of architecture of computing systems (ARCS) (pp. 57–68).
Satpathy, S., Foo, Z., Giridhar, B., Dreslinski, R., Sylvester, D., Mudge, T., Blaauw, D. (2010). A 1.07 Tbit/s 128x128 swizzle network for SIMD processors. Proceedings of IEEE symposium on VLSI circuits (VLSIC) (pp. 81–82).
She, D., He, Y., Corporaal, H. (2012). Energy efficient special instruction support in an embedded processor with compact ISA. Proceedings of the international conference on compilers, architectures and synthesis for embedded systems (CASES) (pp. 131–140).
She, D., He, Y., Mesman, B., Corporaal, H. (2012). Scheduling for register file energy minimization in explicit datapath architectures. Proceedings of design, automation test in europe conference exhibition (DATE) (pp. 388–393).
She, D., He, Y., Waeijen, L., Corporaal, H. (2013). OpenCL code generation for low energy wide SIMD architectures with explicit datapath. Proceedings of international conference on embedded computer systems: architectures, modeling, and simulation (SAMOS) (pp. 322–329).
Waeijen, L., She, D., Corporaal, H., He, Y. (2013). SIMD made explicit. Proceedings of international conference on embedded computer systems: architectures, modeling, and simulation (SAMOS) (pp. 330–337).
Waeijen, L., She, D., Corporaal, H., He, Y. (2014). Reduction operator for Wide-SIMDs reconsidered. Proceedings of the 51st design automation conference (DAC) (pp. 1–6).
van de Waerdt, J., et al. (2005). The TM3270 media-processor. Proceedings of the 38th international symposium on microarchitecture (MICRO) (pp. 331–342).
Woh, M., et al. (2008). From SODA to scotch: The evolution of a wireless baseband processor. Proceedings of the 41st IEEE/ACM international symposium on microarchitecture (MICRO) (pp. 152–163).
Woh, M., Seo, S., Mahlke, S., Mudge, T., Chakrabarti, C., Flautner, K. (2010). AnySP: anytime anywhere anyway signal processing. IEEE Micro, 30(1), 81–91.
Yan, J., & Zhang, W. (2007). Virtual registers: Reducing register pressure without enlarging the register file. Proceedings of high performance embedded architectures and compilers (HiPEAC) (pp. 57–70).
Acknowledgments
This work is supported by the Ministry of Economic Affairs of the Netherlands, project EVA PID07121, and the Dutch Technology Foundation STW, project NEST 10346.
Additional information
Parts of this article were presented at the 13th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIII) [29].
Cite this article
Waeijen, L., She, D., Corporaal, H. et al. A Low-Energy Wide SIMD Architecture with Explicit Datapath. J Sign Process Syst 80, 65–86 (2015). https://doi.org/10.1007/s11265-014-0950-8
DOI: https://doi.org/10.1007/s11265-014-0950-8