Abstract
Dataflow process networks (DPNs) provide a convenient model of computation that is often used to model system behavior in model-based designs. With fixed sets of nodes, they also serve compilers as dataflow graphs, an intermediate program representation that uncovers the instruction-level parallelism of sequential programs. Many recent processor architectures, while still von Neumann architectures, also use dataflow computing to better exploit instruction-level parallelism: they expose their datapaths so that the compiler can take care of the allocation of processing units (PUs), the execution schedules of instructions on the PUs, and the communication of intermediate values between PUs. If the communication paths are buffered, these architectures can be abstracted into a DPN architecture whose PUs and interconnection network are DPN nodes.
In this article, we introduce a DPN abstraction of hybrid dataflow/von Neumann architectures and consider how to map the nodes of a given dataflow graph to the PUs of such a DPN architecture so that mapping different nodes to the same PU causes no conflicts. We express the allocation and scheduling constraints in propositional logic, both for the original dataflow graph and for a modified version that simplifies the constraints: copy nodes are inserted to introduce levels, so that every node receives inputs only from nodes of the previous level. We also formulate equisatisfiable SMT constraints that use integer variables to reason directly about the parallel runtime. On this basis, we further present alternative SAT constraints that explicitly encode concurrency, and we discuss variants of the constraints to clarify their meaning.
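The leveling step mentioned above can be illustrated with a minimal sketch. It is not the article's algorithm: it assumes an acyclic dataflow graph given as a plain edge list, assigns each node its longest-path distance from the sources as its level, and inserts a separate chain of copy nodes for every edge that skips levels. The function name `level_graph` and the `copy_<src>_<dst>_<level>` naming are hypothetical.

```python
from collections import defaultdict


def level_graph(nodes, edges):
    """Level an acyclic dataflow graph by inserting copy nodes.

    nodes: iterable of node names; edges: list of (src, dst) pairs.
    Returns (level, leveled_edges), where every leveled edge connects a
    node at level k to a node at level k + 1.
    """
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    level = {}

    def longest_path_level(node):
        # Sources get level 0; any other node sits one level below
        # its deepest predecessor.
        if node not in level:
            level[node] = 1 + max(
                (longest_path_level(p) for p in preds[node]), default=-1
            )
        return level[node]

    for node in nodes:
        longest_path_level(node)

    leveled_edges = []
    for src, dst in edges:
        prev = src
        # One copy node per skipped level (assumption: no sharing of
        # copy nodes between edges that leave the same source).
        for k in range(level[src] + 1, level[dst]):
            copy = f"copy_{src}_{dst}_{k}"
            level[copy] = k
            leveled_edges.append((prev, copy))
            prev = copy
        leveled_edges.append((prev, dst))
    return level, leveled_edges


# Hypothetical example graph: a feeds b, c, and d directly.
level, leveled_edges = level_graph(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("a", "d")],
)
```

In this example the edges from `a` to `c` and from `a` to `d` skip levels, so the sketch adds three copy nodes in total; afterwards every edge connects adjacent levels, which is the property the simplified constraints rely on.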
Index Terms
- Consistency Constraints for Mapping Dataflow Graphs to Hybrid Dataflow/von Neumann Architectures