Axum: Dataflow Networks

As per request on Twitter, this time we’ll explore what Axum’s dataflow networks can do for us.

Dataflow, in its simplest form, is the process of data passing through a set of points that act upon or transform the data. Points only run when data actually passes through, meaning that we can declaratively set up our networks before even using them.

The great thing about dataflow is that it is easily parallelizable: each point in the dataflow network can execute completely in parallel with the others, provided that the data we send through is either immutable or always cloned. In Axum, you can only send primitive types (passed by value), immutable types (such as string), and schemas (which are cloned). In addition, points in the network may not modify any shared state. The one exception is the unsafe keyword, whose name implies the level of caution that should be exercised when using it.

Before introducing the language syntax for dataflow, we need to introduce the concept of interaction points. There are two types of these: sources (IInteractionSource) and targets (IInteractionTarget). A source can pump out data, while a target can receive and act upon it. Most interaction points are both sources and targets (IInteractionSourceAndTarget).

Five main interaction points exist:

  • OrderedInteractionPoint: A simple buffering point; data flows in and out, no further ado. As the name implies, message order is preserved
  • ImmediateValue: A source point providing a predefined constant value
  • WriteOnceInteractionPoint: A target/source point that starts out empty, and allows a single write, after which its value can never change
  • Future: A variant of WriteOnceInteractionPoint which lazily provides a value (the value is computed once requested), and does not allow being written to
  • SingleItemInteractionPoint: Similar to OrderedInteractionPoint, but holds only one value at a time; if a value has not been consumed yet, and a value is being pushed to this point, it will be rejected until the currently held value has been consumed

An abstract base class, InteractionPoint, exists to ease implementation of target/source points. By the way, nothing prevents you from writing your own interaction points; Axum’s language syntax will fully support custom interaction points, as it relies solely on the aforementioned interfaces.

All of the interaction point types can be found in System.Concurrency.Messaging.
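To make the buffering rules above concrete, here is a minimal Python model of three of the points. It is not Axum code and not the System.Concurrency.Messaging API: the class names, the post/take methods, and the choice to signal a rejected write by returning False are all assumptions made for illustration.

```python
from collections import deque

_EMPTY = object()  # sentinel meaning "no value held"


class OrderedPoint:
    """Models OrderedInteractionPoint: an unbounded FIFO buffer."""

    def __init__(self):
        self._items = deque()

    def post(self, value):
        self._items.append(value)
        return True  # always accepted

    def take(self):
        return self._items.popleft()


class WriteOncePoint:
    """Models WriteOnceInteractionPoint: the first write wins."""

    def __init__(self):
        self._value = _EMPTY

    def post(self, value):
        if self._value is not _EMPTY:
            return False  # assumption: later writes are simply rejected
        self._value = value
        return True

    def take(self):
        if self._value is _EMPTY:
            raise LookupError("nothing written yet")
        return self._value  # assumption: reading does not consume the value


class SingleItemPoint:
    """Models SingleItemInteractionPoint: at most one unconsumed value."""

    def __init__(self):
        self._value = _EMPTY

    def post(self, value):
        if self._value is not _EMPTY:
            return False  # rejected until the held value is consumed
        self._value = value
        return True

    def take(self):
        if self._value is _EMPTY:
            raise LookupError("nothing to consume")
        value, self._value = self._value, _EMPTY
        return value


ordered = OrderedPoint()
for n in (1, 2, 3):
    ordered.post(n)
ordered_out = [ordered.take() for _ in range(3)]  # order is preserved

once = WriteOncePoint()
once_results = (once.post("first"), once.post("second"))

single = SingleItemPoint()
single_results = [single.post("a"), single.post("b")]  # "b" is rejected
single.take()                                          # consume "a"
single_results.append(single.post("b"))                # now "b" is accepted
```

The difference between the last two classes is only in take: the write-once model keeps its value forever, while the single-item model frees its slot as soon as the value is consumed.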

It is also important to mention that channel ports are interaction points. Suppose you have a channel type like so:

public channel CAddition
{
    input int Num1;
    input int Num2;
    output int Sum;

    Start: { Num1 -> GotNum1; }
    GotNum1: { Num2 -> GotNum2; }
    GotNum2: { Sum -> End; }
}

If you access CAddition::Num1 and CAddition::Num2, they will appear as target interaction points, while CAddition::Sum will appear as a source interaction point. This is how the outside world sees CAddition. However, to the implementor, i.e. the agent implementing the channel, this is reversed; Num1 and Num2 are source interaction points, and Sum is a target interaction point. The implementing end of CAddition can be reached through CAddition.implements. So in the case of the implementor, we’d be talking about CAddition.implements::Num1, CAddition.implements::Num2, and CAddition.implements::Sum.
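The reversal can be pictured with a small Python model in which each port is just a queue. This is a sketch rather than Axum: the AdditionChannel class and the two helper functions are hypothetical, and the model ignores the Start/GotNum1/GotNum2 protocol that the real channel declares.

```python
from collections import deque


class AdditionChannel:
    """Hypothetical stand-in for CAddition: one queue per port."""

    def __init__(self):
        self.num1 = deque()
        self.num2 = deque()
        self.sum = deque()


def use_inputs(chan, a, b):
    # Using side: Num1 and Num2 are targets, so we write to them.
    chan.num1.append(a)
    chan.num2.append(b)


def implement(chan):
    # Implementing side: Num1 and Num2 are sources, so we read from them,
    # and Sum is a target, so we write to it.
    a = chan.num1.popleft()
    b = chan.num2.popleft()
    chan.sum.append(a + b)


chan = AdditionChannel()
use_inputs(chan, 2, 3)
implement(chan)
result = chan.sum.popleft()  # using side again: Sum is a source, so we read
```

Each queue is written from one side and read from the other, which is all that "target on one end, source on the other" means here.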

Now for the cool stuff: Axum’s language support for dataflow. Six operators are currently implemented in the compiler:

  • ==>: The forward operator; forwards data from the left-hand source to the right-hand target
  • -->: The forward-once operator; forwards one piece of data from the left-hand source to the right-hand target, and then disconnects
  • >>-: The multiplex operator; forwards data from the left-hand collection of sources to the right-hand target
  • &>-: The combine/join operator; forwards an array of data, consisting of exactly one value from each source in the left-hand collection per message, to the right-hand target
  • -<<: The broadcast operator; forwards data from the left-hand source to the right-hand collection of targets
  • -<:: The alternate operator; forwards data from the left-hand source to one target in the right-hand collection per message, in round-robin order

Each of these operators evaluates to a disposable object; disposing it disconnects the network that the expression constructed. In most scenarios, you don't actually need to care about this, and you can safely ignore the result.
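The semantics of the six operators can be approximated in a few lines of Python. This is a model under stated assumptions, not Axum and not its runtime API: the Source class and the function names are hypothetical, targets are plain callables, delivery is synchronous, and the returned disconnect callable stands in for the disposable result.

```python
from collections import deque
from itertools import cycle


class Source:
    """Hypothetical push-based source; not an Axum type."""

    def __init__(self):
        self._handlers = []

    def link(self, handler):
        self._handlers.append(handler)

        def disconnect():  # plays the role of the disposable result
            if handler in self._handlers:
                self._handlers.remove(handler)

        return disconnect

    def post(self, value):
        for handler in list(self._handlers):
            handler(value)


def forward(source, target):        # models ==>
    return source.link(target)


def forward_once(source, target):   # models -->
    def handler(value):
        disconnect()                # drop the link after one message
        target(value)

    disconnect = source.link(handler)
    return disconnect


def multiplex(sources, target):     # models >>-
    links = [s.link(target) for s in sources]
    return lambda: [d() for d in links]


def join(sources, target):          # models &>-
    queues = [deque() for _ in sources]

    def handler_for(index):
        def handler(value):
            queues[index].append(value)
            if all(queues):         # one value is waiting from every source
                target([q.popleft() for q in queues])
        return handler

    links = [s.link(handler_for(i)) for i, s in enumerate(sources)]
    return lambda: [d() for d in links]


def broadcast(source, targets):     # models -<<
    return source.link(lambda value: [t(value) for t in targets])


def alternate(source, targets):     # models -<:
    ring = cycle(targets)           # round-robin over the targets
    return source.link(lambda value: next(ring)(value))


numbers = Source()
every, first = [], []
stop = forward(numbers, every.append)
forward_once(numbers, first.append)
for n in (1, 2, 3):
    numbers.post(n)
stop()            # dispose the ==> link
numbers.post(4)   # no longer delivered anywhere

letters = Source()
left, right = [], []
alternate(letters, [left.append, right.append])
for ch in "abcd":
    letters.post(ch)

a, b = Source(), Source()
pairs = []
join([a, b], pairs.append)
for value in (1, 2):
    a.post(value)
for value in (10, 20):
    b.post(value)
```

In this run, every receives 1, 2, 3 while first receives only 1; the letters are dealt alternately to left and right; and pairs collects one value from each of a and b per message.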

Furthermore, there is the , -->, >>-, and <–. These so happen to be the easier ones to remember – at least to me.

Let's return to channels for a bit: did the explanation earlier confuse you? Think about the previous Rand48 example. We used --> and ==> to map PrimaryChannel::Seed and PrimaryChannel::Next to functions. But those ports were input ports, and --> and ==> take a source as their left-hand operand and a target as their right-hand operand! In fact, we weren’t referring to the using side of the CRand48 channel (which is what the outside world, i.e. consumers, sees), but to the implementing side, where the direction of the ports is reversed; PrimaryChannel always refers to the implementing side. So we were actually telling Axum how to act upon incoming data using a (very simple) dataflow network!

On a slightly related note, the Signal type is often used for ‘parameterless’ channel ports. Signal is very much like the unit type in F#: It indicates the lack of any real value. The void type could obviously not be used in messages, and as such, Signal is used (to quote Niklas Gustafsson) as a poor man’s unit type. It is also useful to build dataflow networks that act solely on signals rather than data – a sort of event pipeline. Note that Signal is a struct with absolutely no contained data. As such, it is extremely cheap to pass around – even cheaper than an integer.

As vaguely mentioned above, Axum allows you to use functions as target/source interaction points in dataflow expressions, for example:

integers ==> ((int x) => x * x) ==> PrintInteger;

Of course, functions cannot be used as source-only interaction points, as there would be no meaningful way to pull data out of them (except for parameterless functions, but the usefulness of this is arguable). They can, however, be used as target/source and target-only points. If you forward to a target-only function which returns void, you obviously cannot continue the network flow. If you wish to continue the network, but without a real value, make your function return Signal.
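The squaring network above can be mimicked in Python by treating each function as a stage that passes its return value downstream. This is only a model: connect, Signal, SIGNAL, and the helper functions are hypothetical names chosen for the sketch, not Axum's types.

```python
class Signal:
    """Stand-in for Axum's Signal: a value with no data in it."""
    __slots__ = ()


SIGNAL = Signal()


def connect(stage, downstream):
    """Models `stage ==> downstream` for a function used as target/source."""
    return lambda value: downstream(stage(value))


printed = []


def print_integer(x):   # returns None, like a void target-only function
    printed.append(x)


# integers ==> ((int x) => x * x) ==> PrintInteger
network = connect(lambda x: x * x, print_integer)
for n in (1, 2, 3):
    network(n)

events = []


def store(x):           # returns a Signal so the network can continue
    events.append(("stored", x))
    return SIGNAL


def notify(signal):     # acts on the signal alone, not on any data
    events.append("notified")


pipeline = connect(store, notify)
pipeline(42)
```

The second pipeline shows the point of returning Signal: store has nothing meaningful to hand on, yet notify still fires once per message.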

Axum also has support for network declarations. These are reusable network types that you can instantiate and use much like channels, or as interaction points (if the network has only one input and output port). It is the aspiration of the Axum developers that people will use network declarations to build libraries, rather than classes.

I won’t cover network declarations here; they deserve more in-depth investigation, since they support single-base inheritance, generics, and multiple input/output ports, and at the same time behave much like regular classes.

By the way, two other dataflow operators are specified, but not currently implemented:

  • ,,,: The tuple operator; takes a left-hand target and a right-hand source and forms a target/source point out of them. The two are not linked, but the resulting expression can be used seamlessly in a dataflow expression
  • ???: The filter operator; creates a buffer point that only propagates data that passes the given predicate

The tuple operator is useful when you have an input and an output port that you want to use in a network:

source ==> (chan::In ,,, chan::Out) ==> target;

The filter operator is useful when searching:

strings ==> ??? ((string str) => str.StartsWith("Axum")) ==> target;

It’s currently unknown whether these two operators will make it into the Axum compiler, or if they’ll be dropped. Their usefulness, in any case, is apparent.
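Since the filter operator is not implemented, its intended behaviour can only be approximated. Below is a rough Python model of the filtering example; filter_point is a hypothetical helper, the input strings are made-up sample data, and the sketch forwards values immediately rather than buffering them.

```python
def filter_point(predicate, target):
    """Forward a value to target only when predicate(value) holds."""
    def point(value):
        if predicate(value):
            target(value)
    return point


matches = []

# strings ==> ??? ((string str) => str.StartsWith("Axum")) ==> target
point = filter_point(lambda s: s.startswith("Axum"), matches.append)
for s in ("Axum rocks", "C# too", "Axum: Dataflow Networks"):
    point(s)
```

Only the two strings that begin with "Axum" reach matches; the middle one is silently dropped.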

With all this knowledge, we can create powerful concurrent dataflow networks in a purely declarative fashion. As you’ve seen, dataflow operators already make handling of channel ports extremely elegant. You could even wire channel input ports into full-blown dataflow networks.
