Instant Feedback Rapid Prototyping for GPU-Accelerated Computation, Manipulation, and Visualization of Multidimensional Data

Objective: We have created an open-source application and framework for rapid GPU-accelerated prototyping, targeting image analysis, including volumetric images such as CT or MRI data.
Methods: A visual graph editor enables the design of processing pipelines without programming. Run-time compiled compute shaders enable prototyping of complex operations in a matter of minutes.
Results: GPU acceleration increases processing speed by at least an order of magnitude compared to traditional multithreaded CPU-based implementations, while retaining the flexibility of scripted implementations.
Conclusion: Our framework enables real-time, intuition-guided development of accelerated algorithms and methods, supported by built-in scriptable visualization.
Significance: This is, to our knowledge, the first tool for medical data analysis that provides both high performance and rapid prototyping. As such, it has the potential to act as a force multiplier for further research, enabling the handling of high-resolution datasets while providing quasi-instant feedback and visualization of results.


Introduction
This document provides an overview of the ways to extend xcv, our real-time rapid-prototyping framework. The most up-to-date documentation and source code can be found at https://bitbucket.org/maxmalek/xcv.
There are two ways to add extra functionality: Lua [1] scripting and C++ plugins. The first part of this document describes the more user-friendly Lua API. The second part explains the C++ extension API and plugin creation.
2 Extension via nodes, using the Lua API
Xcv performs data manipulation and most heavy computations via GLSL compute shaders. Lua 5.3 is used as a glue language for the user interface (UI), light computation, automatic resource management, and orchestration of internal components. GLSL shader code is typically embedded as a Lua string or generated on the fly, then compiled into a shader program and invoked as needed. Most operations can (and should) be performed using the Lua API, such as reading and writing data (file formats are detected automatically where applicable, e.g. for image files), interacting with the GPU (textures, buffers, shaders, etc.), user interaction, and automation. The rendering process is scripted as well but is outside the scope of this document.
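Before compilation, shader code is an ordinary string, so generating or adjusting it on the fly is plain string manipulation. The C++ sketch below is illustrative only and is not xcv's implementation. The helper injectDefines and the macro names TILE_SIZE and HAS_FEATURE_X are hypothetical. The helper inserts #define lines directly after the #version directive, because GLSL requires that directive to precede everything except comments and white space.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of xcv): insert "#define" lines into GLSL source.
// GLSL requires #version to precede everything except comments and white space,
// so the defines are placed on the line directly after it.
// Note: this naive search does not skip comments that mention "#version".
std::string injectDefines(const std::string& src, const std::vector<std::string>& defines) {
    std::string block;
    for (const auto& d : defines) block += "#define " + d + "\n";

    const std::size_t v = src.find("#version");
    if (v == std::string::npos) return block + src;  // no #version: simply prepend
    const std::size_t eol = src.find('\n', v);
    if (eol == std::string::npos) return src + "\n" + block;  // #version is the last line
    return src.substr(0, eol + 1) + block + src.substr(eol + 1);
}
```

A core that knows the hardware capabilities at run time could use such a step to enable features conditionally before handing the source to the driver.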
Resource management is integrated with the Lua garbage collector and handled automatically. A node has defined inputs and outputs and may perform operations on the data passing through, but it must not have global side effects. Data passing through the graph are immutable, i.e. nodes never change the data directly. This ensures that the same input can be passed to multiple nodes without risking unexpected changes to the data before all nodes are done processing. Keep this in mind when implementing a node: never modify the input data. Instead, a node typically creates its own output data, which is then passed to downstream nodes.
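The immutability rule can be illustrated with a minimal C++ sketch that assumes a hypothetical Image type. Inputs are shared as pointers to const data, and the node allocates a fresh output instead of editing its input. In xcv itself, nodes are Lua scripts, so this only models the contract and not the actual API.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for an image flowing through the graph.
struct Image {
    int width = 0;
    int height = 0;
    std::vector<float> pixels;
};

// Downstream nodes receive read-only shared data, so one node cannot alter
// what another node is about to read.
using ImageRef = std::shared_ptr<const Image>;

// A node that scales pixel values: it leaves `in` untouched and allocates its
// own output, which is then handed to downstream nodes.
ImageRef scaleNode(const ImageRef& in, float factor) {
    assert(in != nullptr);
    auto out = std::make_shared<Image>(*in);  // copy size and pixels
    for (float& p : out->pixels) p *= factor;
    return out;  // shared_ptr<Image> converts to shared_ptr<const Image>
}
```

Two nodes can consume the same `src` with different factors, and `src` remains unchanged afterwards.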

Node API
A node is a single Lua script that exposes functions called by the core scripts. The script file should be self-contained, i.e. it should not rely on third-party libraries or on anything beyond the core scripts. A node script should not write to the global Lua namespace, e.g. by registering global functions or variables. To prevent accidental access to globals, node scripts are sandboxed: they see the same global namespace as the rest of the application, but changes to the global namespace are restricted and, to some degree, contained within the node script. This feature exists solely to catch implementation errors; it is not a secure sandbox for containing, e.g., malicious scripts.
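The sandbox behavior resembles a layered lookup. Reads fall through to the shared globals, while writes stay in a private layer. In Lua, this is commonly achieved with an environment table whose __index metamethod points at _G. The C++ class below is a hypothetical model of that idea, restricted to integer values. It is not how xcv implements its sandbox, and it is no security boundary either.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical model of the node sandbox (not xcv's implementation).
// Reads fall through to the shared globals; writes stay in a private layer,
// so the shared table is never modified. Not a security boundary.
class SandboxEnv {
public:
    explicit SandboxEnv(const std::map<std::string, int>& globals) : globals_(globals) {}

    // Look up a name: private layer first, then the shared globals.
    std::optional<int> get(const std::string& key) const {
        if (auto it = local_.find(key); it != local_.end()) return it->second;
        if (auto it = globals_.find(key); it != globals_.end()) return it->second;
        return std::nullopt;
    }

    // "Assigning a global" only shadows it inside this environment.
    void set(const std::string& key, int value) { local_[key] = value; }

private:
    const std::map<std::string, int>& globals_;  // shared, read-only view
    std::map<std::string, int> local_;           // this node's private writes
};
```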
A node script typically ends by returning its definition table (Figure 1), which contains functions and data definitions. All entries in the definition table are optional; the recognized entries are listed in Subsection A.1. All node interface functions receive one or more parameters when called by the core scripts, and the first parameter is always the node itself. For nodes with a simple compute shader with "obvious" inputs and outputs but some missing interface functions, the core scripts inspect the definition table and try to provide the missing functions automatically. If this is not possible and the node does not satisfy all requirements, it fails to load and an error is displayed. The automatic steps are as follows:
• If src is specified, compile the shader code into node.shader.
• If file is specified, load the shader code from that file and use it as if it were src.
• Inspect the shader and deduce the connectors of the node:
  - Inputs start with in_
  - Outputs start with out_
  - Tunables/knobs start with u_
• If the shader has buffers as input or output, require user-implemented recalc() / makeOutput() functions.
• If the shader has a single image as output, allocate the output image based on the input, using the specified format and size (Subsection A.1) if supplied, otherwise the same as the input:
  - If there is a single input, use that.
  - If there are multiple inputs, use the outGen definition (Subsection A.1).
• Run the compute shader with the inputs and outputs.
• Return the generated output.
Whenever a node changes and must re-evaluate its inputs, node:onSomethingChanged() must be called to propagate the changes through the graph. The recalc() methods of dependent nodes are invoked automatically if required.
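The prefix rule for deducing connectors can be sketched as follows. In xcv, this logic lives in the Lua core scripts. The C++ functions classifyUniform and deduceConnectors below are hypothetical and only model the naming convention. Uniforms that match none of the prefixes get no connector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of connector deduction (not xcv's Lua implementation).
enum class ConnectorKind { Input, Output, Tunable, None };

// in_ -> input connector, out_ -> output connector, u_ -> tunable (knob).
ConnectorKind classifyUniform(const std::string& name) {
    auto startsWith = [&](const char* prefix) { return name.rfind(prefix, 0) == 0; };
    if (startsWith("in_")) return ConnectorKind::Input;
    if (startsWith("out_")) return ConnectorKind::Output;
    if (startsWith("u_")) return ConnectorKind::Tunable;
    return ConnectorKind::None;  // no connector and no default widget
}

struct Connectors {
    std::vector<std::string> inputs, outputs, tunables;
};

// Sort a shader's (active) uniform names into connector groups.
Connectors deduceConnectors(const std::vector<std::string>& uniformNames) {
    Connectors c;
    for (const auto& n : uniformNames) {
        switch (classifyUniform(n)) {
            case ConnectorKind::Input:   c.inputs.push_back(n); break;
            case ConnectorKind::Output:  c.outputs.push_back(n); break;
            case ConnectorKind::Tunable: c.tunables.push_back(n); break;
            case ConnectorKind::None:    break;
        }
    }
    return c;
}
```

For the uniforms in_Tex, out_Tex, u_exp, and u_add, this yields one input, one output, and two tunables.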
If a node script requires functions from a plugin, use require "libname" in its init() function.

User Interface
The user interface is built exclusively with dear imgui [2]. In contrast to typical retained-mode UIs, logic runs while widgets are drawn ("immediate-mode GUI"). A widget is simply a function (Figure 3). There is no separate UI setup phase and no widget objects to keep alive; everything is drawn on demand and thus recreated in every frame. This makes it very easy to modify the UI on the fly. Figure 2 is a more complete example with a graphical representation.
```lua
function drawUI()
  if imgui.Button("Press") then
    -- button was pressed
  end
end
```
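To make the immediate-mode idea concrete, here is a toy C++ model. It is not the dear imgui API. The hypothetical button() function both "draws" a widget and reports a click in the same call, and application state lives in ordinary variables owned by the caller. Because the UI is rebuilt on every call, a widget can be added or removed simply by not calling its function, as the showExtra flag does.

```cpp
#include <cassert>
#include <string>

// Toy model of immediate-mode UI (not the dear imgui API). The input for one
// frame is passed explicitly; clickedLabel is empty if nothing was clicked.
struct FrameInput {
    std::string clickedLabel;
};

// There is no widget object: the call "draws" the button for this frame and
// reports in the same call whether it was clicked.
bool button(const FrameInput& in, const std::string& label) {
    // A real library would emit draw commands for the button here.
    return in.clickedLabel == label;
}

// Called once per frame. Application state (pressCount) lives in ordinary
// variables owned by the caller; the UI itself is rebuilt on every call.
void drawUI(const FrameInput& in, int& pressCount, bool showExtra) {
    if (button(in, "Press")) ++pressCount;
    // A widget that is not drawn this frame simply does not exist.
    if (showExtra && button(in, "Reset")) pressCount = 0;
}
```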
Since nodes are implemented in Lua, the best place to look up the currently exported functionality is src/guibase/lua_imgui.cpp, which contains the implementation of the imgui Lua bindings.
The file bin/luacore/imgui.lua contains more useful functions implemented on top of the C++/Lua interface. Refer to the dear imgui repo at https://github.com/ocornut/imgui/ for further examples and more documentation.
```glsl
#version 430
layout(local_size_x=32, local_size_y=32) in;
uniform sampler2D in_Tex;
writeonly restrict uniform image2D out_Tex;
uniform float u_exp = 1.0, u_add = 0.0;
void main()
{
    // ...
}
```
Compute shaders are implemented exclusively in GLSL and compiled by the graphics driver. Before the shader code is passed to the driver, the framework modifies it to improve driver compatibility, to conditionally enable features based on hardware capabilities, and to provide macros for easier use. The most important macro is AUTOTILE(index, size), which implements automatic tiling of large input data. Automatic tiling ensures that the graphics driver does not reset the GPU for shaders that would otherwise take too long to execute. For each kernel invocation, AUTOTILE(index, size) generates a suitable work index, so that all size pixels/elements are covered. Both index and size must have the same type and can be vector types. Note that the maxsize parameter must be set in the node definition; otherwise, autotiling is disabled. See bin/shaders/comp.inc for the implementation.
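The following one-dimensional C++ sketch models the idea behind automatic tiling. It is not the AUTOTILE macro from bin/shaders/comp.inc, whose exact scheme may differ, and the real macro also supports vector-typed indices. A domain of size elements is split into dispatches of at most maxsize invocations. Each local invocation id is mapped to a global work index, and invocations past the end of the domain are skipped. The names Tile, makeTiles, and globalIndex are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of automatic tiling (not the actual AUTOTILE macro).
// The domain is processed in several smaller dispatches so that no single
// dispatch runs long enough to trigger a driver timeout.
struct Tile {
    std::size_t offset;  // first global index covered by this dispatch
    std::size_t count;   // number of valid invocations in this dispatch
};

std::vector<Tile> makeTiles(std::size_t size, std::size_t maxsize) {
    assert(maxsize > 0);
    std::vector<Tile> tiles;
    for (std::size_t offset = 0; offset < size; offset += maxsize) {
        const std::size_t remaining = size - offset;
        tiles.push_back({offset, remaining < maxsize ? remaining : maxsize});
    }
    return tiles;
}

// Inside one dispatch, map a local invocation id to a global work index.
// Returns false for invocations past the end of the domain (they must do nothing).
bool globalIndex(const Tile& t, std::size_t localId, std::size_t& out) {
    if (localId >= t.count) return false;
    out = t.offset + localId;
    return true;
}
```

For example, size = 10 and maxsize = 4 yield three tiles covering indices 0 to 3, 4 to 7, and 8 to 9, so every element is visited exactly once.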
Naming conventions: Uniforms should be prefixed with in_, out_, or u_ (for inputs, outputs, and tunables, respectively) so that they appear in the default UI (refer to Subsection 2.1). Add the readonly and writeonly keywords to inputs and outputs where applicable to catch misuse, and add the restrict keyword where appropriate: outputs can usually be safely marked as restrict.
Preferably use sampler uniforms instead of image uniforms for reading data, and texture(), textureLod() or texelFetch() instead of imageLoad(). Do not add layout() qualifiers to uniforms unless necessary: the framework handles internal OpenGL details such as binding points and uniform locations automatically if they are not specified. Note that the GLSL compiler optimizes away unused uniforms, so they will not appear in the UI if they are not used. Refer to Figure 4 for a small GLSL shader that follows the guidelines outlined in this section.

Plugins are implemented as dynamic libraries and can extend the core with functionality that would be too complicated or too slow to implement in Lua. Their intended purpose is to extend image format support and to add extra Lua functions by exporting a table of functions that is subsequently loaded by the core. Figure 5 shows a simplified plugin skeleton. API versioning and implementation details are handled automatically by the MAKE_PLUGIN_* and *_FillHeader macros. Compared to the Lua API, the C++ API is more limited and has no direct access to GPU functionality. The MAKE_PLUGIN_IMPORT macro may be used to retrieve a list of function pointers supplied by the core, comprising functions for error reporting, image format conversion, and interaction with the embedded Lua interpreter.
The number of available functions is currently very limited and will be extended in the future. The current main use case is saving images, optionally converting them to a desired format if the target file format requires it.
Refer to the included stb_image plugin for a comprehensive example: it implements both loading and saving of images and also interacts with the core API.
To compile a plugin, make sure the C++ headers in src/api/*.h are visible to the compiler. No other headers or libraries are required aside from the C standard library.
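As a generic illustration of the "export a table of functions" pattern, the C++ sketch below exposes a single C-linkage entry point. That entry point returns a versioned table of named function pointers, and the sketch also includes a host-side lookup. All names are hypothetical and none come from xcv. Real plugins should use the MAKE_PLUGIN_* and *_FillHeader macros from src/api/*.h. Functions exported to Lua would presumably use the Lua C API function signature rather than int(int). When built as a shared library, the entry point would also have to be exported from the library (e.g. with __declspec(dllexport) on Windows).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical illustration of exporting a table of functions from a plugin.
// None of these names come from xcv; real plugins use the MAKE_PLUGIN_* and
// *_FillHeader macros from src/api/*.h instead.
extern "C" {

typedef int (*PluginFn)(int);

struct PluginFunction {
    const char* name;  // name under which the host would register the function
    PluginFn fn;
};

struct PluginTable {
    unsigned apiVersion;  // lets the host reject plugins built against another API
    const PluginFunction* functions;
    unsigned count;
};

static int doubleIt(int x) { return 2 * x; }

static const PluginFunction kFunctions[] = {
    {"doubleIt", doubleIt},
};

// Single entry point that a host would look up after loading the library
// (e.g. with dlsym or GetProcAddress).
const PluginTable* example_plugin_get_table(void) {
    static const PluginTable table = {1u, kFunctions,
                                      sizeof(kFunctions) / sizeof(kFunctions[0])};
    return &table;
}

}  // extern "C"

// Host-side lookup of an exported function by name; nullptr if absent.
PluginFn findFunction(const PluginTable* table, const char* name) {
    for (unsigned i = 0; i < table->count; ++i)
        if (std::strcmp(table->functions[i].name, name) == 0) return table->functions[i].fn;
    return nullptr;
}
```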

A.1 Recognized definition table entries
• name: Human-readable name of the node. Appears in its title text area.
• category: Used as a prefix before the name in the "new node" menu (typically something like "compute", "render", "input", or "output", but anything can be used).
• desc: Longer description. Shown in the info panel.
• tags: Space-separated list of words used by the quick node search function. Words from name, desc, category, and author are already part of the search, but tags may be used to add keywords that improve searchability.
• author: Shown in the info panel.
• references: Single string or table of strings. Each string is treated as a URL and made clickable in the UI.
• src: GLSL source code.
• file: File name to load the GLSL source from.
• init(node): Function that is called whenever a node is initialized.
  - Expected to set default values.
  - Called when code is reloaded / the user presses F5.
  - Possibly called multiple times throughout a node's lifetime.
• makeOutput(node, inputs, name): Function that must return a new object for the given output variable name.
  - Automatically inferred when not present.
  - inputs is a table that maps the name of each input connector to its associated input object (e.g. texture, buffer).
  - Set to false if outputs are constructed in recalc() (saves memory for custom recalc() implementations).
• recalc(node, inputs): Function that is expected to fill/return the output objects (i.e. it invokes the compute shader).
  - inputs is the same as for makeOutput().
  - Automatically inferred when not present.
  - There are 3 ways to fill outputs:
    1. The function itself sets node.RESULTS[cname] = x, where cname is the name of the output connector. Nothing is returned from the function.
    2. If the node has a single output connector: return resultObj. The name of the connector is inferred.
    3. Return a table with results, e.g. {outTex1=x, outTex2=y, outBuf=z}, to populate the connectors outTex1, outTex2, and outBuf.
• serialize(node): Function called when serializing the node state. Can return a table, string, or number describing the node state. The returned value must be serializable.
• deserialize(node, t): Function called when the node state is to be restored, if this node has previously returned something from serialize(). t contains the value originally returned from serialize().
• drawUI(node): Called when a node is drawn.
  - Whatever is drawn in this function is drawn directly on the node in the graph. The node is resized to fit.
  - Automatically inferred when not present (default: draw widgets for uniforms prefixed with u_).
  - If a true value is returned, node:onSomethingChanged() is called.
• drawDetailUI(node): Called when a node is focused. Defines the content of the node's detail panel.
  - Whatever is drawn in this function is drawn into the detail panel when the node is focused.
  - Automatically inferred when not present (default: draw outputs, if present).
  - If a true value is returned, node:onSomethingChanged() is called.
• drawOutput(node, name, obj): Called when drawing the output name (usually into the detail panel).
  - Useful to specialize the way an output is drawn.
  - obj is the associated output object.
  - Call drawOutputDefault(node, name, obj) to forward drawing to the default handler if no specialization is required for a given obj.
• openContextMenu(node): Called when the user right-clicks the node, just before the context menu is opened.
  - Intended to prepare context menu entries before drawing.
• drawContextMenu(node): Called when the node's context menu is to be drawn.
  - If this function is present, the node gets a blue border to indicate that it has an extra context menu.
• update(node, dt): Called every frame.
  - dt is the time elapsed since the previous frame, in seconds.
  - Useful to run simulations or to apply changes to a node as time passes.
• outGen:
B: Backup texture to system RAM after data upload. In case of a recoverable GPU crash, the texture is automatically restored. This is an internal flag that should normally not be used. The backup is NOT updated automatically when shaders write to the texture.
The default is an empty string. Any unrecognized characters are silently ignored.
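The recalc() entry in Subsection A.1 lists three ways of filling outputs. These can be modeled with a small C++ sketch of how a caller might normalize recalc()'s result into node.RESULTS. Here, Obj stands in for a texture or buffer handle, and storeResults is hypothetical; the real handling lives in xcv's Lua core scripts. The text does not say what happens when a bare object is returned from a node with several output connectors, so raising an error in that case is an assumption.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical model of normalizing recalc() results (not xcv's implementation).
using Obj = int;                              // stand-in for a texture/buffer handle
using Results = std::map<std::string, Obj>;   // connector name -> output object

struct Nothing {};                            // way 1: recalc() returned nothing
using RecalcReturn = std::variant<Nothing, Obj, Results>;

void storeResults(Results& nodeResults, const std::vector<std::string>& outputConnectors,
                  const RecalcReturn& ret) {
    // 1. recalc() already filled node.RESULTS itself.
    if (std::holds_alternative<Nothing>(ret)) return;
    // 2. A single object: the connector name is inferred.
    if (const Obj* single = std::get_if<Obj>(&ret)) {
        // Assumption: a bare return value is an error unless there is exactly one output.
        if (outputConnectors.size() != 1)
            throw std::runtime_error("bare return value requires exactly one output connector");
        nodeResults[outputConnectors.front()] = *single;
        return;
    }
    // 3. A table of results keyed by connector name.
    for (const auto& [name, obj] : std::get<Results>(ret)) nodeResults[name] = obj;
}
```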

A.5 Buffer flags string
String that may contain the following characters:
r: Buffer content can be memory-mapped for reading.
w: Buffer content can be memory-mapped for writing.
u: Buffer content can be updated after initial creation (via buffer:uploadBytes()).
p: Buffer mapping is persistent. Must be used with r or w. Data written by the GPU can be seen by the CPU and vice versa.
B: Backup buffer contents to system RAM after data upload. In case of a recoverable GPU crash, the buffer is automatically restored. This is an internal flag that should normally not be used. The backup is not updated automatically when shaders write to the buffer or when the buffer is changed by writing to mapped memory. If this flag is present, the backup is updated when buffer:uploadBytes() is called or when the buffer is first created.
The default is an empty string. Any unrecognized characters are silently ignored.
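Here is a minimal C++ sketch of parsing the buffer flags string. It assumes a hypothetical BufferFlags struct whose field names are illustrative; only the flag letters and rules come from the list above. Unknown characters are skipped, as documented. Rejecting p without r or w by throwing an exception is an assumption about error handling. The rule itself mirrors OpenGL's glBufferStorage, which rejects GL_MAP_PERSISTENT_BIT unless GL_MAP_READ_BIT or GL_MAP_WRITE_BIT is also set.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical parser for the buffer flags string (not xcv's implementation).
struct BufferFlags {
    bool mapRead = false;     // r
    bool mapWrite = false;    // w
    bool updatable = false;   // u
    bool persistent = false;  // p
    bool backup = false;      // B
};

BufferFlags parseBufferFlags(const std::string& s) {
    BufferFlags f;
    for (char c : s) {
        switch (c) {
            case 'r': f.mapRead = true; break;
            case 'w': f.mapWrite = true; break;
            case 'u': f.updatable = true; break;
            case 'p': f.persistent = true; break;
            case 'B': f.backup = true; break;
            default: break;  // unrecognized characters are silently ignored
        }
    }
    // Assumption: report an invalid combination by throwing.
    if (f.persistent && !f.mapRead && !f.mapWrite)
        throw std::invalid_argument("buffer flag 'p' requires 'r' or 'w'");
    return f;
}
```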

B Hands-on Example
This section provides a hands-on example that explains the process of creating a node in detail. For better debugging support, enable Debug/Developer mode in the settings in the top panel. After enabling it, reload all scripts by pressing F5 to make the change fully effective.