A Low-Energy Wide SIMD Architecture with Explicit Datapath

Journal of Signal Processing Systems

Abstract

Energy efficiency has become one of the most important topics in computing. To meet the ever-increasing demands of the mobile market, the next generation of processors will have to deliver high compute performance within an extremely limited energy budget. Wide single instruction, multiple data (SIMD) architectures are a promising solution, as they have the potential to achieve high compute performance at a low energy cost. We propose a configurable wide SIMD architecture that uses explicit datapath techniques to further optimize energy efficiency without sacrificing computational performance. To demonstrate its efficiency, we implement multiple instantiations of the proposed wide SIMD architecture and of its automatic bypassing counterpart, as well as a baseline reduced instruction set computing (RISC) processor. Extensive experimental results show that the proposed architecture is efficient and scalable in terms of area, performance, and energy. Compared to the RISC processor, an instance of the proposed architecture with 128 processing elements (PEs) achieves an average speed-up of 206 times and reduces total energy dissipation by 48.3 % on average and by up to 94 %. Compared to the corresponding SIMD architecture with automatic bypassing, the 128-PE explicitly bypassed SIMD avoids an average of 64 % of all register file accesses and reduces total energy dissipation by 27.5 % on average and by up to 43.0 %.
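To give a feel for why explicit bypassing can avoid register file accesses, the following is a minimal sketch of a toy counting model. It is not the authors' architecture, compiler, or measurement method. The function name `count_rf_accesses`, the `window` parameter (how many instructions later a result is still assumed to be available on the bypass network), and the `live_out` set are all hypothetical, and the baseline is simplified to read every operand from, and write every result to, the register file.

```python
from typing import Dict, List, Sequence, Tuple

# One instruction: (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def count_rf_accesses(
    program: Sequence[Instr],
    window: int = 1,
    live_out: Sequence[str] = (),
) -> Tuple[int, int]:
    """Return (baseline, explicit) register file access counts.

    Toy assumptions (not taken from the article):
    - baseline: every source is an RF read, every result is an RF write;
    - explicit: a source produced at most `window` instructions earlier
      is taken from the bypass network, and a result is written to the
      RF only if a later instruction reads it from the RF or it is
      listed in `live_out`.
    """
    baseline = sum(1 + len(srcs) for _, srcs in program)

    last_def: Dict[str, int] = {}           # register -> index of latest producer
    needs_write: List[bool] = [False] * len(program)
    rf_reads = 0

    for i, (dst, srcs) in enumerate(program):
        for src in srcs:
            producer = last_def.get(src)
            if producer is not None and i - producer <= window:
                continue                    # operand comes from the bypass network
            rf_reads += 1                   # operand must be read from the RF
            if producer is not None:
                needs_write[producer] = True
        last_def[dst] = i

    for reg in live_out:                    # values needed after the program
        if reg in last_def:
            needs_write[last_def[reg]] = True

    return baseline, rf_reads + sum(needs_write)


# Hypothetical four-instruction kernel; a, b, c, d are inputs already in the RF.
program: List[Instr] = [
    ("r1", ("a", "b")),
    ("r2", ("r1", "c")),
    ("r3", ("r2", "r1")),
    ("r4", ("r3", "d")),
]

if __name__ == "__main__":
    base, expl = count_rf_accesses(program, window=1, live_out=("r4",))
    print(f"baseline={base} explicit={expl} avoided={1 - expl / base:.1%}")
```

In this toy model the saving comes from two places: operands consumed shortly after they are produced skip the register file read, and results whose consumers are all bypassed skip the write as well. The figures it prints depend entirely on the assumed program and window and say nothing about the 64 % reported above.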

References

  1. Abbo, A., Kleihorst, R., Choudhary, V., Sevat, L., Wielage, P., Mouy, S., Vermeulen, B., Heijligers, M. (2008). Xetal-II: A 107 GOPS, 600 mW massively parallel processor for video scene analysis. IEEE Journal of Solid-State Circuits (JSSC), 43(1), 192–201.

  2. Amdahl, G.M. (2007). Validity of the single processor approach to achieving large scale computing capabilities. IEEE Solid-State Circuits Society Newsletter, 12(3), 19–20.

  3. Balfour, J., Harting, R., Dally, W. (2009). Operand registers and explicit operand forwarding. IEEE Computer Architecture Letters, 8(2), 60–63.

  4. Corporaal, H. (1998). Microprocessor architectures: from VLIW to TTA. Wiley.

  5. Frijns, R., Fatemi, H., Mesman, B., Corporaal, H. (2008). DC-SIMD: dynamic communication for SIMD processors. Proceedings of international symposium on parallel and distributed processing (IPDPS) (pp. 1–10).

  6. Goel, N., Kumar, A., Panda, P. (2007). Power reduction in VLIW processor with compiler driven bypass network. Proceedings of the 20th international conference on VLSI design (VLSID) (pp. 233–238).

  7. Guan, X., & Fei, Y. (2008). Reducing power consumption of embedded processors through register file partitioning and compiler support. Proceedings of international conference on application-specific systems, architectures and processors (ASAP) (pp. 269–274).

  8. Gustafson, J.L. (1988). Reevaluating Amdahl’s law. Communications of the ACM, 31(5), 532–533.

  9. He, Y. (2013). Low power architectures for streaming applications. PhD Thesis.

  10. He, Y., Pu, Y., Ye, Z., Londono, S., Kleihorst, R., Abbo, A., Corporaal, H. (2010). Xetal-Pro: An ultra-low energy and high throughput SIMD processor. Proceedings of the 47th design automation conference (DAC) (pp. 543–548).

  11. He, Y., She, D., Mesman, B., Corporaal, H. (2011). MOVE-Pro: a low power and high code density TTA architecture. Proceedings of the 11th international conference on embedded computer systems: architectures, modeling, and simulation (SAMOS) (pp. 294–301).

  12. He, Y., Ye, Z., She, D., Mesman, B., Corporaal, H. (2011). Feasibility analysis of ultra high frame rate visual servoing on FPGA and SIMD processor. Proceedings of advanced concepts for intelligent vision systems (ACIVS) (pp. 623–634).

  13. He, Y., Ye, Z., She, D., Pieters, R., Mesman, B., Corporaal, H. (2010). 1000 fps visual servoing on the reconfigurable wide SIMD processor. Proceedings of the 16th annual conference of the advanced school for computing and imaging (ASCI) (pp. 302–309).

  14. He, Y., Zivkovic, Z., Kleihorst, R., Danilin, A., Corporaal, H. (2008). Real-time implementations of Hough transform on SIMD architecture. Proceedings of the ACM/IEEE international conference on distributed smart cameras (ICDSC) (pp. 1–8).

  15. He, Y., Zivkovic, Z., Kleihorst, R., Danilin, A., Corporaal, H., Mesman, B. (2008). Real-time Hough transform on 1-D SIMD processors: implementation and architecture exploration. Proceedings of the international conference on advanced concepts for intelligent vision systems (ACIVS) (pp. 254–265).

  16. Kapasi, U., Dally, W., Rixner, S., Owens, J., Khailany, B. (2002). The Imagine stream processor. Proceedings of international conference on computer design: VLSI in computers and processors (ICCD) (pp. 282–288).

  17. Kyo, S., & Okazaki, S. (2008). IMAPCAR: A 100 GOPS in-vehicle vision processor based on 128 ring connected four-way VLIW processing elements. Journal of Signal Processing Systems, 1–12.

  18. Otsu, N. (1975). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 11, 285–296.

  19. Prengler, A., & Adi, K. (2009). A reconfigurable SIMD-MIMD processor architecture for embedded vision processing applications. SAE World Congress (pp. 1–9).

  20. CACTI: cacti 5.3, rev 174. http://quid.hpl.hp.com:9081/cacti/.

  21. Delft University of Technology: MOVE project. http://ce.et.tudelft.nl/MOVE/.

  22. Tampere University of Technology: TTA-based codesign environment (TCE). http://tce.cs.tut.fi/.

  23. Pu, Y., He, Y., Ye, Z., Londono, S., Abbo, A., Kleihorst, R., Corporaal, H. (2011). From Xetal-II to Xetal-Pro: on the road toward an ultra low-energy and high-throughput SIMD processor. IEEE Transactions on Circuits and Systems for Video Technology (TCAS-VT), 21(4), 472–484.

  24. Raghavan, P., Munaga, S., Ramos, E., Lambrechts, A., Jayapala, M., Catthoor, F., Verkest, D. (2007). A customized cross-bar for data-shuffling in domain-specific SIMD processors. Proceedings of architecture of computing systems (ARCS) (pp. 57–68).

  25. Satpathy, S., Foo, Z., Giridhar, B., Dreslinski, R., Sylvester, D., Mudge, T., Blaauw, D. (2010). A 1.07 Tbit/s 128x128 swizzle network for SIMD processors. Proceedings of IEEE symposium on VLSI circuits (VLSIC) (pp. 81–82).

  26. She, D., He, Y., Corporaal, H. (2012). Energy efficient special instruction support in an embedded processor with compact ISA. Proceedings of the international conference on compilers, architectures and synthesis for embedded systems (CASES) (pp. 131–140).

  27. She, D., He, Y., Mesman, B., Corporaal, H. (2012). Scheduling for register file energy minimization in explicit datapath architectures. Proceedings of design, automation test in europe conference exhibition (DATE) (pp. 388–393).

  28. She, D., He, Y., Waeijen, L., Corporaal, H. (2013). OpenCL code generation for low energy wide SIMD architectures with explicit datapath. Proceedings of international conference on embedded computer systems: architectures, modeling, and simulation (SAMOS) (pp. 322–329).

  29. Waeijen, L., She, D., Corporaal, H., He, Y. (2013). SIMD made explicit. Proceedings of international conference on embedded computer systems: architectures, modeling, and simulation (SAMOS) (pp. 330–337).

  30. Waeijen, L., She, D., Corporaal, H., He, Y. (2014). Reduction operator for Wide-SIMDs reconsidered. Proceedings of the 51st design automation conference (DAC) (pp. 1–6).

  31. van de Waerdt, J., et al. (2005). The TM3270 media-processor. Proceedings of the 38th international symposium on microarchitecture (MICRO) (pp. 331–342).

  32. Woh, M., et al. (2008). From SODA to scotch: The evolution of a wireless baseband processor. Proceedings of the 41st IEEE/ACM international symposium on microarchitecture (MICRO) (pp. 152–163).

  33. Woh, M., Seo, S., Mahlke, S., Mudge, T., Chakrabarti, C., Flautner, K. (2010). AnySP: anytime anywhere anyway signal processing. IEEE Micro, 30(1), 81–91.

  34. Yan, J., & Zhang, W. (2007). Virtual registers: Reducing register pressure without enlarging the register file. Proceedings of high performance embedded architectures and compilers (HiPEAC) (pp. 57–70).

Acknowledgments

This work is supported by the Ministry of Economic Affairs of the Netherlands, project EVA PID07121, and the Dutch Technology Foundation STW, project NEST 10346.

Author information

Corresponding author

Correspondence to Yifan He.

Additional information

Part of this article was presented at the 13th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIII) [29].

About this article

Cite this article

Waeijen, L., She, D., Corporaal, H. et al. A Low-Energy Wide SIMD Architecture with Explicit Datapath. J Sign Process Syst 80, 65–86 (2015). https://doi.org/10.1007/s11265-014-0950-8

  • DOI: https://doi.org/10.1007/s11265-014-0950-8
