Overloaded Functional Types

As some of you may know, I’m working on a compiler infrastructure written in D. While the framework itself is rather high-level (it’s very close to actually being half of a compiler front end), an actual compiler implementation is needed to see if the whole thing will work out as well as I hope.

Thus, Tony and I set out to design a simplistic functional programming language. It is heavily inspired by Haskell and OCaml, and has many similarities, though it does not have much in the way of OOP. In short, it’s a language with first-class functions, generics, records/discriminated unions, and finally, treats everything as a value (much like most FP languages). We call it Cesura.

So far, nothing of what I’ve mentioned is particularly new or innovative. In fact, very few FP languages don’t have the features I mentioned. However, one interesting feature that we thought up (and which, to our knowledge, hasn’t been used in any other language) is what we call overloaded functional types. I’ll spare you from any category theory and jump straight to the problem.

You’re given this code:

sqr x : Int -> Int = x * x
sqr x : Float -> Float = x * x
f = sqr

The problem is immediately obvious: Which of the two functions does f refer to?

You could perhaps solve the problem by doing:

f : Float -> Float = sqr

This, however, completely defeats type inference and has lots of corner cases. If you passed sqr directly to a function, what would you do then?

We came to the conclusion that in order to maintain type inference and still have overloaded functions, we’d need to carry the overloads directly in the type system. This means:

f = sqr // The type of f is now: Int -> Int | Float -> Float

Here, we introduce the concept of an overload set. The type of f is a function that has an overload set consisting of Int -> Int and Float -> Float.

There are several advantages in this sort of type system. One of them is the fact that any function value is implicitly convertible to any function type whose overload set is a subset of the overload set of the function value’s function type.

But that’s not all. You might ask: What happens if I try to call f (from above)? How will the compiler know which function in the overload set I intend to call? The answer to this question is actually very simple:

x = f 2
y = f 4.0

Given this code, look at what’s passed to f in the first call. Clearly, it’s an integer. Thus, we know that a call to the Int -> Int overload was intended. In the second call, we’re passing a floating-point value, and therefore we’re calling the Float -> Float overload. So, x is of type Int and y is of type Float.

You might ask why we didn’t just choose to incorporate type classes, considering I claim that we’re heavily inspired by Haskell. The answer to this boils down to simplicity: Cesura needs to be a simple language utilizing as many features of the MCI framework as possible. We’re creating a simple, practical language, not a language utilizing extremely complex (but certainly beautiful) type system techniques.

Lastly, we’d really like some input on this approach. We’re still not entirely sure that it is the right way to go about this, though it does seem to be the most promising solution that isn’t overly complicated to implement. Also, if you know of some other language that uses this technique, please let us know!

Three Reasons I Love Functional Programming

I realize a lot of people reading my blog are imperative programmers who haven’t looked into functional programming before. I figured I’d write a bit about why I find functional programming interesting and worthwhile.

You might think that I like functional programming because of first-class functions. You might think that I like it because of purity and immutability. Hell, you might even think that I like it because of natural recursion and tail calls. But no. Below are the three reasons I like the paradigm.

Everything is an expression: Typing return everywhere is a pain in the ass. The fact that the last expression in a function will be its result really cuts down on typing and reading. Additionally, this means that what you would usually call an if statement is actually an if expression. This implies that the last expression in the true/false paths are the result of the if expression. The same goes for match expressions, loop expressions, and so on. This makes it possible to write very clear code compared to the imperative equivalent.

Pattern matching: Basically the switch statement on steroids. Some (me being among them) will argue that this is what the switch statement always should have been, and that the switch statement in its current form is a language design mistake. Being able to match any value against any value makes for much more readable code, as compared to an if/else forest.

Discriminated unions: Ever wanted C unions but with less insanity and more safety? DUs allow you to use any type for your cases and with well-defined behavior. That alone is already great, but DUs truly shine when combined with fluent pattern matching syntax.

Most of the features I described here are present in all functional languages in one form or another, but they are particularly important in F# and Erlang which are the functional languages I take most interest in.

It might surprise some people that I didn’t list first-class functions and immutability. I like those concepts too, but I don’t think they’re what makes functional programming truly beautiful. This is ironic, because they’re probably the most basic traits of a functional programming language. But, in the time I’ve worked with functional languages, I haven’t found those two features to be the most appealing to me (don’t take that the wrong way; they are still great).

The Actor Model and Message-Passing

I’m sure people who have been following me and talking to me have noticed how I’ve been looking so much into actor/agent-based concurrency and message-passing – how I’ve been talking about Axum, TPL Dataflow, Erlang, and some of my homebrewn equivalents.

I believe the future of concurrency lies in message-passing. My reason for saying this actually boils down to OOP. In OOP, every method call is a message – both the request and the result.

For example, whenever you do:

var result = calc.Add(40, 2);

You are, in OOP terms, saying:

  1. Prepare a message to inform calc that I want to call its Add method
  2. Add the values 40 and 2 to the message
  3. Send the message to calc
  4. Wait for calc.Add to do its thing
  5. Receive the result through a reply message

Something should become obvious to you when I spell it out like this: Why should the thread be stalled waiting for the result to be sent back before going on with other tasks? Surely it could just go on doing something else while calc finishes doing its very heavy computation.

So the point I’m trying to get across is: Whenever you call a method and whenever you get the value of a property, you’re sending a message (or several). This very simple fact has been largely ignored by mainstream developers and frameworks up until recently. Previously, it was all about trying to parallelize algorithms and using locks to synchronize where you just couldn’t parallelize; this obviously doesn’t scale. There’s also asynchrony. For a long time, people have been writing asynchronous code in the form of lambda expressions or anonymous methods, or even worse, as separate methods, passed to some API that starts an asynchronous operation.

You probably saw this one coming: C# 5.0 pretty much solves this entire problem. With the new async and await keywords, we can easily express asynchronous code in its logical form. However, these two keywords alone don’t give us all the power we need. While they allow us to treat every method call as sending a message and retrieving the result, we have no real way to control how execution happens, at what degree of parallelism, in what synchronization context, etc. This is where the new API in .NET vNext, TPL Dataflow, comes in.

TPL Dataflow allows you to set up a bunch of so-called source and target blocks, and then link them together, so that whenever some data is posted to the initial block, the data will automatically be passed through this “dataflow network”. That’s all nice and fancy, but alone, it isn’t much different from any other message-passing framework. C# 5.0 augments TPL Dataflow with the async and await keywords, so that we can write code like this:

agent.Post(Tuple.Create(40, 2));
var result = await agent.Receive();

This looks a bit more awkward than the first code example above, but the advantage here is that we can set up our agents just the way we want to. We can specify the task scheduler they use, how many messages they process per task, what degree of parallelism they utilize, and so on, giving us absolute control over program flow. For more information, see the docs on the TC Labs page.

F# has solved the problem of asynchrony and concurrency differently, through asynchronous workflows and generic agents. For more about that, check Don Syme’s post on async and parallel design patterns in F#.

The cool thing about actors/agents and message-passing is that we get implicit concurrency without a whole lot of effort. We just write our code the way we naturally would, and watch it scale. I say naturally because in the real world, communication happens through messages – why wouldn’t it in programming, too?

Still, we’re not quite there yet when it comes to language support. Erlang/OTP is probably the single language/platform that has the best support for actors, but we have no means of interacting with Erlang/OTP from .NET currently. There’s Axum too, but given that the language effort has stopped, that’s a no-go.

It really is incredible that the most sane approach to concurrency has been ignored for so long. It seems as though, historically, we went from message-passing to parallel algorithms, and then back to message-passing again. How anyone could have believed that parallel algorithms were going to be more scalable is beyond me, but languages like Erlang and Axum have shown us just how important message-passing is.

A collection of links related to this and previous posts:

Checklist for Type Design

Please, please, please consider the following when you design your types:

  • Is it an internal type? Does it make sense to expose it to the public?
  • Does it make sense to inherit the type? Should it be open, abstract, or sealed?
  • Does it support value equality? Should you implement IEquatable<T>?
  • Does it support comparison? Should you implement IComparable<T>?
  • Should you implement operators for the above points?
  • If you do implement the above three points, implement GetHashCode and Equals!
  • Does the type have a textual representation? Should you implement ToString?
  • Is it a small piece of data? Should it be a value type?
  • Should it be immutable? (If it’s a value type, in 99% of cases, it should!)

These guidelines of course apply to all .NET languages; C#, F#, VB.NET, etc.

F# and XBuild (Debian)

Since some folks from ##fsharp on irc.freenode.net requsted it, I’m going to put up the stuff I actually remember about getting F# projects on Linux/Mono to compile under XBuild. This guide assumes that you have successfully built and installed the F# tooling from source.

Most of the stuff that needs to be done is hacking. A sane setup shouldn’t need the amount of hackery I’ve done to get this to work.

Also, I’d like to thank shana and mhutch @ irc.gnome.org for their help with getting this done. As far as I know, shana is also working on getting some better Debian packaging put together (so that the package on CodePlex isn’t totally broken).

The first hack we’re going to do is creating the directory where XBuild expects to find Microsoft.FSharp.targets. If your Mono installation is prefixed to /usr/local, this will be /usr/local/lib/mono/Microsoft F#/v4.0. So go ahead and create that.

Now we’re going to throw in the actual MSBuild file. We just grab it off the source repository, here:

https://github.com/fsharp/fsharp/raw/master/lib/bootstrap/4.0/Microsoft.FSharp.targets

With this file in place (in /usr/local/lib/mono/Microsoft F#/v4.0), XBuild will now have something usable for building F# projects.

You might get an error regarding _ComputeNonExistentFileProperty when trying to build a project. If so, simply locate it in /usr/local/lib/mono/4.0/Microsoft.Common.targets, and uncomment it. It does no harm.

Now, back to hacking: We need to create a series of symbolic links to the F# runtime libraries in /usr/local/lib/mono/Microsoft F#/v4.0. Ideally, this should be fixed in Microsoft.FSharp.targets at some point, but my knowledge about MSBuild is quite limited, so I haven’t dared to do an attempt.

Anyhow, in the aforementioned directory, ensure that you have a layout like so:

FSharp.Build.dll –> /usr/local/lib/mono/4.0/FSharp.Build.dll
FSharp.Core.dll –> /usr/local/lib/mono/4.0/FSharp.Core.dll
FSharp.Core.dll.optdata –> /usr/local/lib/mono/4.0/FSharp.Core.dll.optdata
FSharp.Core.dll.sigdata –> /usr/local/lib/mono/4.0/FSharp.Core.dll.sigdata

This should make XBuild happy. However, I haven’t actually tested this; I’m just recalling from memory – so, feedback is welcome!

Edit: Tim Robinson mentions that the instructions are the same on OS X, except that Mono will be installed in /Library/Frameworks/Mono.framework/Versions/Current instead of /usr/local.

F#: Inline IL

Finally a .NET language that lets me do this crazy stuff!

module Arithmetic

let inline add x y = (# "add.ovf" x y : int #)
let inline sub x y = (# "sub.ovf" x y : int #)
let inline mul x y = (# "mul.ovf" x y : int #)
let inline div x y = (# "div" x y : int #)

It is deprecated, though, so it might be removed in the future. Also, you have to disable warning 42 (was this number a coincidence?) if you don’t want the compiler yelling at you.

And, of course, it should be noted: Don’t do this. Seriously. It’s pointless. There are only very rare cases where there is a valid reason to use inline IL, such as optimizing very tricky managed code that the runtime isn’t allowed to make assumptions about during the JIT process. Even then, think twice; sacrificing readability for performance is almost always a bad idea.

F#: Y Combinator

Functional code!

let y f =
    let g = ref (fun _ -> null)
    g := f (fun a -> !g a)
    !g

Obviously, I’d like to not have the null constraint on the type parameter ‘b, though (implied by g). Need to figure out a workaround.

Update: And there we go! It’s not pretty, but it works:

let y<'a, 'b> (f : ('a -> 'b) -> 'a -> 'b) : 'a -> 'b =
    let g = ref null
    g := upcast f (fun a -> (!g :?> 'a -> 'b) a)
    !g :?> 'a -> 'b

.NET: Thoughts on Portability

I was recently having a discussion with a few folks over at #wcell on QuakeNet about portability of programming languages (specifically the classic C/C++ versus .NET languages like C# and F#).

One could argue that C/C++ is the most portable because it will compile anywhere as long as it adheres to the standard. But then, C# and .NET are standardized too, are they not? And the standards are open, are they not? Just like C/C++! Now, of course, Microsoft, who came up with the standards, did implement the C# compiler and the .NET runtime on Windows, which then became the “de facto” standard implementations.

But, what hinders anyone from implementing the .NET runtime on any platform? Think about it; the only reason C/C++ is cross-platform is because the standards are implemented on different platforms (albeit with slight deviations at some points).

In the end, you’re still relying on tools native to the platform for your code to work on it. The only difference is that .NET programs can be compiled on Windows and still execute on Linux (for instance), while C/C++ programs need to be compiled on the specific platform. And honestly, which process is the most convenient? I don’t need to point that out, do I? Personally, I absolutely hate having to cross-compile.

Just some food for thought.