DOUBLE INPUT OPERATORS OF THE DF KPI SYSTEM

Dataflow architectures can be used advantageously for computation-oriented applications that exhibit fine-grain parallelism. The implementation of a dataflow computer architecture depends on the form of execution of the dataflow program instructions, which is implemented as a process of receiving, processing and transmitting data tokens. The architecture described in this paper belongs to the class of dynamic dataflow architectures with direct operand matching. The concept of direct operand matching represents the elimination of the costly process (in terms of computing time) related to associative searching of the operands. This process is associated with the processing units of the proposed system. Each processing unit is designed as a dynamic multifunction pipelined unit of five segments: Load, Fetch, Operate, Matching and Copy. These pipeline stages handle the operand matching of dataflow operators. Of the many types of operators, this paper describes the microprogram control for double input operators.


INTRODUCTION
With the demand for high performance, great attention was given in the 1960s to a specific class of parallel computers denoted as dataflow architectures. In dataflow architectures, the computing process is managed by the flow of operands, accessed on different levels, for executing the instructions of a dataflow program. The dataflow computational model uses a dataflow graph to describe a computation. This graph consists of nodes (vertices), which indicate operations, and arcs (edges) from one node to another, which indicate the flow of data between them. A nodal operation is executed when all required information has been received from the arcs entering the node. Typically, a nodal operation requires one, two or N (N ≥ 3) operands (for conditional operations, a Boolean input value) and produces one result. Hence one, two or N arcs enter a node and one arc leaves it. Once a node has been activated and the nodal operation performed (i.e. the node has fired), the result is passed along the output arc to the waiting node, or to several nodes if the result is copied. This process is repeated until all of the nodes have fired and the final result has been created. The program instructions can be executed sequentially, in a flow (pipelining), in parallel, or in various hybrid modes, depending on the dataflow computational model used. The fundamental idea behind the dataflow computational model is the mapping of tasks to the computing elements, which can increase the rate of parallelism.
Dataflow architectures can be used advantageously for computation-oriented applications that exhibit fine-grain parallelism. Examples of such applications are image processing, scene analysis, aerodynamic simulation, weather prediction, etc. The dataflow concept has been utilized in the design of various processors [1], [2], [3], [4], [5], [6], [7], [8], [9]. The renewed interest in dataflow architectures is in part sparked by the underlying elegance of the model, but is also motivated by the changes wrought by continued technology scaling [10]. The WaveScalar architecture [8] is an example.
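The firing rule described above (a node executes once all of its input arcs carry an operand, then passes its result along the output arc) can be illustrated with a minimal Python sketch. The `Node` class, the `run` function and the token tuple layout are hypothetical names chosen for illustration; they are not part of the DF-KPI system.

```python
from collections import deque


class Node:
    """A dataflow graph node (hypothetical illustration, not the DF-KPI format)."""

    def __init__(self, op, arity, targets):
        self.op = op            # function applied when the node fires
        self.arity = arity      # number of input arcs
        self.targets = targets  # output arcs: list of (node_name, input_port)
        self.operands = {}      # operands received so far, keyed by input port


def run(nodes, initial_tokens):
    """Process tokens (node_name, port, value) until no token is left."""
    queue = deque(initial_tokens)
    results = {}
    while queue:
        name, port, value = queue.popleft()
        node = nodes[name]
        node.operands[port] = value
        # Firing rule: the node fires only when all operands are present.
        if len(node.operands) == node.arity:
            args = [node.operands[p] for p in range(node.arity)]
            node.operands = {}
            result = node.op(*args)
            if not node.targets:
                results[name] = result  # node without output arcs: final result
            for target, target_port in node.targets:
                queue.append((target, target_port, result))
    return results


# Graph for (a + b) * (a - b) with a = 5, b = 3.
nodes = {
    "add": Node(lambda x, y: x + y, 2, [("mul", 0)]),
    "sub": Node(lambda x, y: x - y, 2, [("mul", 1)]),
    "mul": Node(lambda x, y: x * y, 2, []),
}
tokens = [("add", 0, 5), ("add", 1, 3), ("sub", 0, 5), ("sub", 1, 3)]
final = run(nodes, tokens)
```

In this sketch the tokens are processed sequentially from one queue; a dataflow machine would fire independent nodes such as `add` and `sub` in parallel.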

ARCHITECTURE OF THE DF-KPI SYSTEM
The task the computer designer faces is a complex one: determine what attributes are important for a new computer, then design a computer to maximize performance while staying within cost, power, and availability constraints. This task has many aspects, including instruction set design, functional organization, logic design, and implementation. The implementation may encompass integrated circuit design, packaging, power, and cooling. Optimizing the design requires familiarity with a very wide range of technologies, from compilers and operating systems to logic design and packaging [11]. The DF-KPI system [12], being developed at the Department of Computers and Informatics at the Faculty of Electrical Engineering and Informatics of the Technical University of Košice, has been designed as a dynamic system with direct operand matching. The architecture model of the DF-KPI computer is part of a complex dataflow system, which includes support components of the dataflow computing environment for the implementation of the defined application targets.
The structural organization (Fig. 1) of the DF-KPI computer architecture model consists of the following components:
Coordinating Processors (CP) are intended to manage, coordinate and process instructions of the dataflow program, based on the presence of their operands, which are enabled at the CP.DI input port of the coordinating processor, either from its own CP.DO output port, from the CP.DO output ports of other CPs through an interconnection network, or from the Data Queue Unit and the Frame Store. The structure of the CP is a dynamic pipelined multiple-function system.
The Data Queue Unit (DQU) is a unit designed to store the activation symbols (data tokens), which represent operands waiting for matching during program execution.
The Instruction Store (IS) is a memory of instructions of the dataflow program, in the form of a proper data flow graph.
The Frame Store (FS) is a memory of matching (pairing) vectors, by means of which the CP detects the presence of operands to perform the operation defined by the operator (node) in the data flow graph. The item format of the matching vector MV in the FS is, in short, <FS> ::= <AF><V>, where AF is a flag of the operand's presence (Affiliation Flag) and V is the value of the given operand.
Supporting components of the dataflow system are needed to create a realistic computing environment. In the given architecture they are formed by the following:
The Main computer (HOST) provides standard functions of the computer system during the dataflow computing process.
The Information Technology unit is a unit used to create dedicated application environments (virtual reality, diagnostics, e-learning).
The I/O processors provide fast direct inputs/outputs into the dataflow module (standard I/Os are implemented by the main computer).
The structure of the CP is a dynamic system with pipeline processing, composed of the Load, Fetch, Operate, Matching and Copy segments, which also indicate the states of the system.
The Load segment is used for loading the data token and preparing it for further processing in the Fetch segment. This segment is the first segment of the processor.
The Fetch segment reads the word DFI from the instruction memory IS. The word DFI defines the format of the dataflow instruction. This segment can be reached from the Load and Operate segments. If both the Operate and Load segments require access to the Fetch segment, a conflict occurs. A priority system decides which segment is preferred.
The Operate segment handles the processing of the data token based on the operation code OC stored in the word DFI. The execution units of the coordinating processor are accessed from this segment. The result of the operation is sent to the Load segment. If the result of the operation is intended for matching in the FS memory, it is sent to the Matching segment. If the result is intended for the Load segment and this segment is occupied, the token is sent to another processor through the interconnection network or saved into the DQU.
The Matching segment ensures the matching of operands on the basis of a flag in the DFI.
Instructions needed for copying the operation results are processed in the Copy segment.
The format of the dataflow instruction is as follows:
DFI ::= OC LI {DST, [IX]}^n
where OC is the operation code; LI is a literal (e.g. the number of copies of the result); DST represents the target address for the operation result; IX is a matching index for the operations.
The dataflow program instruction represented by a data token is stored in the Instruction Store at the address defined by the DST field. The data token has the following format:
DT ::= P T, V MVB {DST, [IX]}
where P is the priority of the data token; T represents the data type of the operand with value V; MVB defines the base address of the matching vector in the Frame Store.
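As a reading aid, the DFI and DT formats, together with the DST ::= MF IP ADR structure given with Fig. 1, can be written down as plain records. This is a minimal sketch: the field names follow the paper, while the Python types, the class layout and the helper `needs_matching` are assumptions made only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class MF(Enum):
    """Matching function label: M = matching, B = bypass."""
    M = "M"
    B = "B"


class IP(Enum):
    """Input port of the target operator: L(eft) or R(ight)."""
    L = "L"
    R = "R"


@dataclass(frozen=True)
class DST:
    mf: MF    # matching function
    ip: IP    # input port
    adr: int  # address of the operator or function (int is an assumed type)


@dataclass(frozen=True)
class DFI:
    oc: str   # operation code (str is an assumed type)
    li: int   # literal, e.g. number of copies of the result
    targets: Tuple[Tuple[DST, Optional[int]], ...]  # n pairs {DST, [IX]}


@dataclass(frozen=True)
class DT:
    p: int    # priority of the data token
    t: str    # data type of the operand
    v: float  # operand value
    mvb: int  # base address of the matching vector in the Frame Store
    dst: DST  # destination of the token
    ix: Optional[int] = None  # optional matching index


def needs_matching(token: DT) -> bool:
    """Hypothetical helper: a token with MF = M must be matched first."""
    return token.dst.mf is MF.M


left = DT(p=0, t="int", v=7, mvb=16, dst=DST(MF.M, IP.L, 3), ix=0)
bypass = DT(p=0, t="int", v=7, mvb=16, dst=DST(MF.B, IP.L, 4))
add = DFI(oc="ADD", li=1, targets=((DST(MF.B, IP.L, 4), None),))
```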

OPERAND MATCHING
One of the most important steps in the dynamic dataflow model is direct operand matching [12], [13]. The concept of direct operand matching represents the elimination of the costly process (in terms of computing time) related to associative searching of the operands. In this scheme, a matching vector is dynamically allocated in the Frame Store memory for each token generated during the execution of the data flow graph. The current location of a matching vector in the Frame Store is determined at compile time, while the Frame Store location is determined after the program starts. Each calculation can be described using an instruction address (ADR) and the pointer to the matching vector MVB in the Frame Store. The <MVB, ADR> value pair is part of the token. A typical action is the search for an operand pair in the Frame Store. The matching function searches for tokens marked identically. After an operand has arrived at the Coordinating Processor, the matching function detects whether the other operand entering the same operator is already present in the Frame Store. Detection is performed according to the matching index IX. If that operand is not there yet, the arriving operand is stored in the Frame Store, in the Matching Vector specified by the base address MVB of the operand, in the item specified by the index IX.
ISSN 1335-8243 (print), ISSN 1338-3957 (online), © 2011 FEI TUKE, www.aei.tuke.sk, www.versita.com/aei
The control of the operand matching process at the operator input is influenced by the process of matching, by instruction execution and by the result at the operator output. Using a compiler that produces DFG output with forward searching, which allows redundant computations to be detected and eliminated and the order of token processing to be changed, process control can be defined as the transition of activation symbols along the edges of the data flow graph (Fig. 2) between the "producer" (P) operator and the "consumer" (C) operator. This article describes the proposed operand matching control for the configuration shown in Fig. 2b.
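The direct matching step described in this section can be sketched in a few lines of Python. The sketch assumes that a Frame Store item is addressed as MVB + IX, that the input port is stored together with the value, and that the item is cleared once both operands have met; these details, like the names `FrameStore` and `match`, are illustrative assumptions rather than the DF-KPI implementation.

```python
class FrameStore:
    """Frame Store items of the form <AF><V> (sketch under stated assumptions)."""

    def __init__(self, size):
        self.af = [False] * size  # Affiliation Flag: is an operand present?
        self.v = [None] * size    # stored operand: (input port, value)

    def match(self, mvb, ix, port, value):
        """Return (left, right) if the partner is present, else store and return None."""
        slot = mvb + ix  # assumed addressing: base address plus matching index
        if not self.af[slot]:
            # Partner has not arrived yet: keep this operand in the matching vector.
            self.af[slot] = True
            self.v[slot] = (port, value)
            return None
        # Partner is present: no associative search is needed, just one indexed read.
        stored_port, stored_value = self.v[slot]
        self.af[slot] = False
        self.v[slot] = None
        if stored_port == "L":
            return (stored_value, value)
        return (value, stored_value)


fs = FrameStore(8)
first = fs.match(mvb=4, ix=1, port="R", value=3)    # stored, operator cannot fire
second = fs.match(mvb=4, ix=1, port="L", value=10)  # pair complete
```

The point of the design is visible in `match`: the operand pair is found by a single indexed access instead of a search over all waiting tokens.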

Process Control for P-double Operators
The binary information stored in the DF-KPI system can be classified as either data or control information. The main components of this system are the CPs, wherein the data is manipulated in a datapath by using microoperations (microops), implemented with register transfers. These operations are implemented with adder/subtractors, shifters, registers, multiplexers and buses. The control unit of the CP provides signals that activate the various microops within the datapath to perform the specified processing tasks. The control unit of the CP also determines the sequence in which the various actions are performed.
The control unit that generates the signals for sequencing the microop is a sequential circuit with states that dictate the control signals for the system (Fig. 3).

Fig. 3 State (Mealy) Machine
At any given time, the state of the sequential circuit activates a prescribed set of microops. Using status conditions and control inputs, the sequential control unit determines the next state. The digital circuit that acts as the control unit thus provides a sequence of signals for activating the microops and also determines its own next state.
The control unit of the CP allows transitions between different states denoted as Load, Fetch, Matching, Copy, and Operate. These states (Fig. 3) are represented by the segments (l - Load, f - Fetch, m - Matching, c - Copy, o - Operate) of the control unit, which may be passed in different orders. The segments work in an overlapped manner. Transitions between segments are controlled by the microprogram.
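A Mealy machine produces its output from the current state and the current input, which is how the control unit is characterized here. The following Python sketch shows the mechanism on a hypothetical transition table: the segment order l, m, f, o follows the register placement described for double input operators, while the single Boolean input ("next segment is free") and the output strings are assumptions made for illustration and do not reproduce the microprogram of the paper.

```python
# (state, next_segment_free) -> (next_state, output signal); hypothetical table.
TRANSITIONS = {
    ("l", True): ("m", "load LMR"),
    ("l", False): ("l", "wait"),
    ("m", True): ("f", "load MFR"),
    ("m", False): ("m", "wait"),
    ("f", True): ("o", "load FOR"),
    ("f", False): ("f", "wait"),
    ("o", True): ("l", "emit result"),
    ("o", False): ("o", "wait"),
}


def step(state, next_free):
    """One Mealy step: the output depends on both the state and the input."""
    return TRANSITIONS[(state, next_free)]


def run(inputs, state="l"):
    """Feed a sequence of inputs and collect the generated output signals."""
    outputs = []
    for next_free in inputs:
        state, out = step(state, next_free)
        outputs.append(out)
    return state, outputs


final_state, signals = run([True, False, True, True, True])
```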
Formal notation of the microprogram (which results from the functional system specification and its decomposition into the operational part (datapath) and the control unit) is represented by the program scheme. The program model scheme of the operational part is expressed as a sequence of marked pairs
$n_i : p_i$ (1)
where $n_i$ are labels and $p_i$ are instructions.
With respect to (1), the instructions $p_i$ represent different elementary instructions (microinstructions), which initiate the execution of different elementary operations (microoperations). Further, the basic command types (2)-(4) are defined over operations $X$ and predicates (conditions) $\alpha_i$. Command (2) is a microinstruction intended for the execution of a microop, after which a transition is made to the instruction marked with the label $n$. Commands (3) and (4) represent control instructions (microinstructions) intended for the execution of branches and conditional jumps in the program (microprogram). These commands test the condition defined by the predicate. If the condition $\alpha_i$ is valid, a transition is made to the instruction marked with the label $n_i$ (3), or an operation $X$ is simultaneously executed (4).
Let the execution of the operations $X_i$ be launched by the control word $R_i$, consisting of a sequence of control signals, and let the predicates $\alpha_p$ be represented by the status information word. The microprogram of the executed P-double input operators of the DFG (Fig. 2b) is then expressed through the microops of the individual stages of the multifunctional pipeline unit. The labels l, f, m, o represent the segments Load, Fetch, Matching and Operate. The microoperations and predicates of the individual control words and status words are listed in Tab. 2 and Tab. 3.
The function isFree(X) tests the busy state of segment X. Microoperations that can be executed in parallel are placed in a single command block (Tab. 1). Initialization of the coordinating processor is done by the signal Init = 1. The boot command of the dataflow program loads the data token from the DQU into the Load segment, sets the busy flag of the Load segment to 1 (i.e. the Load segment is occupied) and in the next step blocks the processing of the following tokens (GetDT = 0). If the next segment, the Matching segment, is free, the token is loaded into the Load/Matching register. After that, the microprogram releases the Load segment, activates the loading of further tokens into the coordinating processor, and performs operand matching for the defined double input operator.
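The Load-segment steps described above can be traced with a small Python sketch. The names `is_free`, `get_dt` and `lmr` mirror isFree, GetDT and the Load/Matching register; the class `LoadStage`, its two methods and the merging of "set busy flag" and "GetDT = 0" into one step are simplifying assumptions, not the microprogram itself.

```python
from collections import deque


class LoadStage:
    """Load segment and Load/Matching register of one CP (illustrative sketch)."""

    def __init__(self, dqu):
        self.dqu = dqu  # Data Queue Unit holding waiting tokens
        self.busy = {"Load": False, "Matching": False}
        self.get_dt = True    # GetDT = 1: loading of tokens is enabled
        self.load_reg = None  # token held in the Load segment
        self.lmr = None       # Load/Matching register

    def is_free(self, segment):
        return not self.busy[segment]

    def load_step(self):
        """Load a token from the DQU, mark Load as busy and block further tokens."""
        if self.get_dt and self.dqu and self.is_free("Load"):
            self.load_reg = self.dqu.popleft()
            self.busy["Load"] = True
            self.get_dt = False

    def forward_step(self):
        """If Matching is free, move the token to the LMR and release Load."""
        if self.busy["Load"] and self.is_free("Matching"):
            self.lmr = self.load_reg
            self.busy["Matching"] = True
            self.load_reg = None
            self.busy["Load"] = False
            self.get_dt = True  # loading of further tokens is enabled again


cp = LoadStage(deque(["t1", "t2"]))
cp.load_step()     # t1 enters Load, GetDT = 0
cp.load_step()     # blocked: t2 stays in the DQU
cp.forward_step()  # t1 moves to the LMR, Load is released
cp.load_step()     # t2 enters Load
cp.forward_step()  # Matching is still busy, so t2 waits in Load
```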
Fig. 4 Pipeline system for processing of double input operators
The control mechanism copies the content of the Load/Matching register to the Matching/Fetch register (MFR) and determines the address of the DF operator based on the MFR.DST.ADR address. In the next step, the operator is loaded from the Instruction Store (IS) into the Fetch/Operate register (FOR).
If the Operate segment is not busy (isFree(Operate) = true), the operator is fetched from the Fetch/Operate register and processed. In the next step, if the CP is not busy, the result of the operator processing is available for processing in the same CP. Otherwise, the result is propagated to another CP through the interconnection network. If all CPs are busy, the token is stored in the DQU.
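The routing of a result token described above (same CP first, then another CP, then the DQU) can be sketched as follows. The dictionary layout of a CP, the function name `route_result` and the choice of the first free CP are assumptions for illustration; the paper does not specify how the target CP is selected.

```python
def route_result(token, cps, home, dqu):
    """Deliver a result token; return the index of the receiving CP or None."""
    # 1. The producing CP is not busy: the result stays in the same CP.
    if not cps[home]["busy"]:
        cps[home]["inbox"].append(token)
        return home
    # 2. Otherwise propagate to another CP (assumed policy: first free one).
    for index, cp in enumerate(cps):
        if index != home and not cp["busy"]:
            cp["inbox"].append(token)
            return index
    # 3. All CPs are busy: the token waits in the Data Queue Unit.
    dqu.append(token)
    return None


cps = [{"busy": True, "inbox": []}, {"busy": False, "inbox": []}]
dqu = []
to_other = route_result("r1", cps, home=0, dqu=dqu)  # CP 0 busy, CP 1 free
cps[1]["busy"] = True
to_queue = route_result("r2", cps, home=0, dqu=dqu)  # every CP busy
cps[0]["busy"] = False
to_home = route_result("r3", cps, home=0, dqu=dqu)   # producing CP free again
```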
The proposed architecture of the operand matching control at the logical level is shown in Fig. 4. FIFO registers (LMR, MFR and FOR) have been inserted between the stages of the coordinating processor to increase the throughput coefficient.

Fig. 1 The DF-KPI System
DST specifies the destination address of the resulting data token DT. The structure of the DST field is as follows:
DST ::= MF IP ADR
where MF is a matching function with a defined set of labels {M, B}, in which M stands for matching (of two DTs) and B stands for bypass (without DT matching); IP defines an input port {L(eft), R(ight)}; ADR is the address of the operator or function. If the operands enter two-input or multi-input operators, operand matching occurs. The DF-KPI architecture uses the direct operand matching control mechanism. It is based on the allocation of a Matching Vector in the Frame Store according to the activation code (procedure, call). Allocated Matching Vectors are represented as matching records in the Frame Store. The format of the Matching Vector in the Frame Store is as follows:
FS[B_ACT + H + IX + 1] ::= RC, MVS B_OLD DST_RET D{[B_NEW]{D}}
where B_ACT is a pointer to the current top record; H is the size of the record header; MVS defines the size of a matching vector; RC is the reference counter; B_OLD is a pointer to the previous token; DST_RET specifies the return address; B_NEW defines the base address for a new matching record; and D represents an operand value. The RC field is set according to the size of the matching vector at compile time. After the function associated with the operator has fired, the value of RC is decremented. If RC = 0, the Matching Vector in the Frame Store is released.
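The life cycle of a matching record (RC set from the vector size, decremented on each firing, record released at RC = 0) can be illustrated with a short Python sketch. The record fields follow the format above; the dictionary used as Frame Store, the function names and the choice of initializing RC exactly to MVS are assumptions for illustration, since the text only states that RC is set "according to" the vector size.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MatchingRecord:
    rc: int                 # reference counter
    mvs: int                # size of the matching vector
    b_old: Optional[int]    # pointer to the previous record
    dst_ret: Optional[int]  # return address
    d: List[Optional[float]] = field(default_factory=list)  # operand values


def allocate(fs: Dict[int, MatchingRecord], base: int, mvs: int,
             b_old: Optional[int] = None, dst_ret: Optional[int] = None) -> None:
    """Allocate a matching record at the given base address (RC = MVS is assumed)."""
    fs[base] = MatchingRecord(rc=mvs, mvs=mvs, b_old=b_old,
                              dst_ret=dst_ret, d=[None] * mvs)


def operator_fired(fs: Dict[int, MatchingRecord], base: int) -> bool:
    """Decrement RC after a firing; release the record when RC reaches 0."""
    record = fs[base]
    record.rc -= 1
    if record.rc == 0:
        del fs[base]  # the Matching Vector is released
        return True
    return False


frame_store: Dict[int, MatchingRecord] = {}
allocate(frame_store, base=0, mvs=2)
released_first = operator_fired(frame_store, 0)
still_allocated = 0 in frame_store
released_second = operator_fired(frame_store, 0)
```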

Fig. 2 The operand matching (a - P-single input, b - P-double input, c - P-double input/C-single input, d - P-double input/C-double input, e - P-double input/C-u-single, v-double input)
FIFO registers between the stages of the coordinating processor:
Between the stages L and M: register LMR
Between the stages M and F: register MFR
Between the stages F and O: register FOR