Life: Status Update

I really don’t blog about my life often, but a lot of interesting stuff has been happening lately that I’d like to share.

First, I’ve just become an intern at Xamarin. I’ll be working on the documentation build system and integration into the website and the MonoDevelop IDE. I was hired after GSoC was over, following a recommendation from Miguel. I’m really looking forward to working with Xamarin; it’s an awesome team of skilled hackers.

Second, I’ve been accepted into university, starting February 1 (at the UCN). The Danish name for the course I’m taking is “datamatiker”. They call it computer science in English, but it’s actually much more practically oriented than that (i.e. actual software development rather than theory). I was accepted based on my ability in the software development field, as I don’t fulfill the formal requirements (yet). This means I’ll probably quit high school, as I’d rather pursue something software-related than natural sciences.

In other (less life-related) news, I’ve also been putting some time into writing a compiler infrastructure library. I’ll be posting more about the goals and directions of that later.

GSoC and the State of ILAsm

GSoC 2011 is now officially over, and I figure I should write some sort of status post.

First of all, the project is not completely done. I had anticipated this early on, and discussed with JB which parts I should focus on for the deadline. I did get a fair bit done for the deadline, but I wouldn’t quite say it’s ready for production use yet. I did, however, pass the final evaluation, as JB agreed to let me finish the project outside of GSoC (which I fully intend to do).

The project actually ended up being more involved than originally planned. The idea was to simply swap the PEAPI back end for Cecil, but I quickly realized that this was not as trivial as it had seemed. I ended up ripping out the old code generation completely, emptying the code of every last parser production, updating the parser to .NET 4.0, and finally, adding the Cecil code generation back end. You might think that all of this would have taken up most of the GSoC period, but a lot of time was spent on spec reading too, not to mention figuring out weird quirks and corner cases in the Microsoft implementation of ILAsm. It’s scary just how much of the ILAsm language is poorly documented or not documented at all.

I’d like to extend my thanks to everyone in #cecil @ irc.gnome.org for helping me understand ILAsm, CIL, and the CLI in general.

I’ve also made some progress on the ILDasm front. It’s still missing a lot of features, but it does disassemble types and methods decently at this point. I’ll work more on it later; ILAsm comes first.

I recently switched ILAsm to a deferred type resolution model. This was JB’s idea, as a way to solve the generic parameter issue I blogged about earlier, and it’s worked nicely so far. This was probably the last major issue in the new ILAsm, so I expect to be doing more ILDasm work soon (there are only a couple of known ILAsm issues left).

During GSoC I also wrote a command line interface to the Mono soft debugger (SDB). It’s available here. I wrote it primarily because using MonoDevelop (which was the only existing interface to SDB at the time) as a debugger for a command line application was not very convenient. This debugger also has a few cool features, such as decompiling code for which no source is available (through ICSharpCode.Decompiler).

Overall, it’s been a very fun and educational program for me. I now know a lot of things about the CLI that I didn’t have the slightest clue about before, and I finally managed to contribute something really significant to Mono. Working with JB and the Mono community has been an awesome experience, as everyone’s very helpful and easily approachable. I would definitely do this again if I get the chance.

I’ll probably make a final blog post whenever I get this entire project merged into mainline Mono.

D: Building DMD and Phobos on Linux

I’ve recently been having a deeper look into the D programming language, because writing a JIT on top of .NET turned out to be extremely hard without interfering with the .NET runtime. I didn’t quite want to use C, as that’s too low-level for the stuff I’m doing, and C++ is just broken. D seems to be both languages done right, and so I settled on that.

You can program D perfectly fine with the latest Digital Mars D (DMD) release, but I ended up having to contribute some patches to the standard libraries, and figured I might as well build everything from source.

First, you’re going to need Git, as everything is hosted on GitHub.

Next, do the following clones:
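A minimal sketch of those clones, assuming the four repositories (dmd, druntime, phobos, and tools) sit under the dlang organization on GitHub, which is an assumption on my part, and that they’re checked out under /usr/src to match the paths used in the rest of this post. The helper name d_clone_commands is made up for illustration; it only prints the commands so you can look them over before running anything:

```shell
# Hypothetical helper: prints one "git clone" command per repository.
# Assumptions: the GitHub organization URL and the /usr/src destination.
d_clone_commands() {
    base=${1:-https://github.com/dlang}
    dest=${2:-/usr/src}
    # The tools repository is checked out as "dtools" to match the path used later.
    for pair in dmd:dmd druntime:druntime phobos:phobos tools:dtools; do
        repo=${pair%%:*}
        dir=${pair##*:}
        echo "git clone $base/$repo.git $dest/$dir"
    done
}

d_clone_commands
```

Once the output looks right, pipe it to sh (d_clone_commands | sh). Keep in mind that newer checkouts may be laid out differently from what this post describes.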

In all of the makefiles, the MODEL variable specifies what bitness to build for. Here, I’m going to build for x86-64, so I’ll use 64. To build for x86, just use 32.
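If you’d rather not hard-code the value, here’s a rough sketch that derives MODEL from what uname reports. It assumes that anything other than x86_64/amd64 should be treated as 32-bit x86, which is a simplification and will be wrong on other architectures:

```shell
# Derive MODEL from the machine architecture reported by uname.
# Assumption: anything that is not x86_64/amd64 is treated as 32-bit x86.
case "$(uname -m)" in
    x86_64|amd64) MODEL=64 ;;
    *)            MODEL=32 ;;
esac
echo "MODEL=$MODEL"
```

You’d then pass it along to the makefiles as MODEL=$MODEL instead of typing 64 or 32 by hand.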

To build DMD, do:

cd /usr/src/dmd/src;
gmake -f posix.mak MODEL=64;

Everything should Just Work in the compile process. I’m using gmake because the makefiles are designed for GNU Make, and some systems don’t map make to that by default.

Unfortunately, DMD (and related repositories) don’t have any standard install target that we can use, so we’re just going to copy things into the file system. I’m using /usr/local/bin, /usr/local/lib, and /usr/local/include:

cp dmd /usr/local/bin;

Verify that you have a DMD matching your system’s bitness:

dmd;

This should print:

DMD64 D Compiler v2.055

DMD32 should be printed for a 32-bit installation.
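If you want to script that check, something along these lines should do. It’s only a sketch: dmd_bitness is a made-up helper that looks at the DMD64/DMD32 prefix of the banner line and nothing else.

```shell
# Hypothetical helper: map the first line of DMD's banner to a bitness.
dmd_bitness() {
    case "$1" in
        DMD64*) echo 64 ;;
        DMD32*) echo 32 ;;
        *)      echo unknown ;;
    esac
}

# Example using the banner shown above. With a real install you would pass
# the first line of dmd's own output instead, e.g. "$(dmd | head -n 1)".
dmd_bitness "DMD64 D Compiler v2.055"
```

For the banner above this prints 64; anything that doesn’t start with DMD64 or DMD32 comes back as unknown.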

You need to add a configuration file for DMD. Add the following to /etc/dmd.conf:

[Environment]

DFLAGS=-I/usr/local/include/d2 -L-L/usr/local/lib -L--no-warn-search-mismatch -L--export-dynamic

Now to build the core runtime libraries:

cd /usr/src/druntime;
gmake -f posix.mak MODEL=64 DMD=../dmd/src/dmd;

If the build succeeds, we can copy the resulting interface files to our file system:

mkdir -p /usr/local/include/d2;
cp -r import/* /usr/local/include/d2;

Next, we build Phobos, which is the standard library containing facilities for concurrency, regular expressions, I/O, signals, math, text manipulation, and so on:

cd /usr/src/phobos;
gmake -f posix.mak MODEL=64 DMD=../dmd/src/dmd;

Again, we copy library and interface files:

cp generated/linux/release/64/libphobos2.a /usr/local/lib;
cp -r std /usr/local/include/d2;
cp -r etc /usr/local/include/d2;

In the copy command for libphobos2.a, replace 64 with 32 if you built for x86.

With these in place, we should be able to build the additional D utilities:

cd /usr/src/dtools;
dmd ddemangle.d;
dmd rdmd.d;
cp ddemangle /usr/local/bin;
cp rdmd /usr/local/bin;

You should now have a fully functional installation of DMD! Feel free to post here if you run into trouble, or have some tip you’d like added to this tutorial.
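A quick way to convince yourself that the compiler, the runtime, and Phobos all found each other is to build a trivial program. This is just a sketch: the file name and scratch directory are arbitrary, and the build step is skipped if dmd isn’t on your PATH.

```shell
# Write a tiny D program to a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/hello.d" <<'EOF'
import std.stdio;

void main()
{
    writeln("Hello from D");
}
EOF

# Compile and run it only if dmd is actually on the PATH.
if command -v dmd >/dev/null 2>&1; then
    (cd "$tmp" && dmd hello.d && ./hello)
else
    echo "dmd not found on PATH; skipping the build"
fi
```

If the import of std.stdio fails, the -I path in /etc/dmd.conf is the first thing to check; if linking fails, look at the -L-L path and whether libphobos2.a was copied.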

I’d like to extend a thank you to everyone on #d @ irc.freenode.net for putting up with my questions. You guys are awesome. :)

ILAsm: The Mono Implementation

This time, I’ll write about some issues I faced during development of Mono’s ILAsm and how I got around them. I’ve had to make several changes that make Mono’s ILAsm stricter than MS.NET’s or outright incompatible.

First of all, we have generic parameters. In MS.NET’s ILAsm, !0 and !!0 are simply emitted as VAR 0 and MVAR 0 in the metadata, meaning that you could do something like:

.method public static !!0 Foo<T>(!!1 bar)
{
    ldarg bar
    ret
}

Even though !!1 refers to a generic parameter that does not exist, MS.NET’s ILAsm will gladly emit it, and not even warn you. Due to an API “limitation” in Cecil, Mono’s ILAsm cannot do this. All GenericParameter objects must have an owner (i.e. the type or method definition/reference) and must be contained in that owner’s GenericParameters collection at the correct index. Therefore, Mono’s ILAsm would throw an error for the above code, since !!1 is out of bounds. This is a slightly annoying incompatibility, but there’s not much I can do about it other than patching Cecil. That’s only going to happen, though, if it won’t cause API breaks for version 1.0.

Second, properties like .file alignment, .imagebase, and .stackreserve aren’t fully supported. Cecil currently has no way of setting these on a module. ILAsm will check the values and error if they’re invalid, though.

Third, several native and variant types are not supported in marshal signatures. These include variant, void, syschar, decimal, date, objectref, nested struct and null, void, int64, uint64, unsigned int64, lpstr, lpwstr, safearray, hresult, carray, userdefined, record, filetime, blob, stream, storage, streamed_object, stored_object, blob_object, cf, clsid, as well as pointers/references/vectors of these. The reason is that Cecil doesn’t expose any way to set these types. Either way, most of these are deprecated nowadays. All other native/variant types are supported.

Fourth, support for .data declarations is very limited (we actually only make a weak attempt at emulating them). Cecil has no way of emitting data constants, and they’re implementation-defined anyway. Currently, the only thing we do with them is copy them to the InitialValue property of field definitions where appropriate. This is far from correct, but it’s the best we can do. Either way, you should avoid using these declarations. They don’t make a whole lot of sense in managed land.

Fifth, we have no support for parsing System.Reflection-notation strings (yet). This means that things like custom marshalers aren’t supported (the syntax will simply be ignored). A type parsing API will supposedly be exposed in Cecil 1.0.

Sixth, support for declarative security syntax is limited. While we support the full syntax specified by the standard, we don’t support all of the syntax MS.NET’s ILAsm does (specifically, the syntax resembling verbal custom attribute initialization).

Seventh, exported types are unfinished because Cecil lacks a way to manipulate the File table directly, and because of the lack of a type parser.

Eighth, we can’t emit custom attributes on manifest resources and assembly references. This is a Cecil limitation which should be relatively easy to fix.

Lastly, the .vtable/.vtfixup/.vtentry directives are unsupported. Again, these will be implemented once Cecil has support for them.

There are some other incompatibilities, but they’re very subtle and you’re unlikely to ever encounter them. And even if you do, ILAsm will warn you.

Generally, the differences between the two implementations will only be encountered in obscure features that most users are highly unlikely to be using, or when attempting to use nonstandard syntax. Rule of thumb: Stay away from these things and you’re safe.

ReSharper and Code Contracts

I just ran into this today. It makes ReSharper shut up about possible null references when you use Code Contracts in your C#/VB projects.

The XML file provided there is not complete, however. I’ve added the ContractInvariantMethodAttribute and ContractClassForAttribute attributes to the annotations so that ReSharper won’t claim that your contract classes and invariant methods are unused.

The XML is here.

Just drop this into C:\Program Files (x86)\JetBrains\ReSharper\v5.1\Bin\ExternalAnnotations\mscorlib as Microsoft.Contracts.xml or so.

ILAsm: The Microsoft Implementation

So, we’re now 3 weeks into GSoC, and I figure I should start blogging about my work.

This post will be about the ILAsm standard (specified in ECMA 335) and the Microsoft implementation. There are several annoying differences and incompatibilities between the two. As part of figuring them out, I’ve been reading Expert .NET 2.0 IL Assembler and the Microsoft-annotated ECMA 335 standard.

The first difference is that MS.NET’s ILAsm does not respect the unsigned modifier on integer types. This means that an unsigned int32 will in effect be treated as an int32 (signed). I’ve chosen to break compatibility with Microsoft and correctly emit an unsigned integer, primarily because I think this is an outright stupid bug. Note that you can still get an unsigned integer by using uint32 with MS.NET. The latter form is preferred, anyway.

Second, MS.NET does not have the platformapi keyword. Instead, you have to use the winapi keyword, which makes your ILAsm source code unportable. In Mono’s ILAsm, we support both, in order to aid portability.

Third, MS.NET allows #line instead of .line for specifying source line information. This seems to be for compatibility reasons. Mono’s ILAsm supports both notations.

Fourth, MS.NET has a bunch of directives that just don’t exist in the standard, specifically .file alignment, .imagebase, .language, and .namespace. We support all of them, though the first two currently have no actual effect. Note that use of .namespace is considered bad practice.

Fifth, MS.NET’s ILAsm doesn’t use .culture for specifying culture information, but rather .locale. This is directly against the standard, and the assembler won’t even recognize .culture. Mono’s ILAsm supports both notations.

Sixth, the MS.NET ILAsm doesn’t require the .hash directive; if it’s not specified, the hash will automatically be computed.

Seventh, the MS.NET ILAsm allows using “value class” in place of “valuetype” for indicating value types. This is relatively easy to handle, and Mono’s ILAsm does so. It is, however, considered bad notation.

Lastly, the MS.NET ILAsm allows specifying things like calling convention, type attributes, method attributes, field attributes, parameter attributes, and so on using flags(int32) notation. I’m sure this was done for a reason, but it seems like a great way to give developers the opportunity to make their programs unportable or impossible to run at all.

There are a lot of other incompatibilities, but they’re less annoying and easier to work around/implement.
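To make the spelling differences above a bit more concrete, here’s a rough sketch that maps the MS.NET-only spellings (#line, value class, .locale, winapi) onto their standard counterparts. It’s a naive textual rewrite: it will happily mangle string literals and identifiers that contain these words, and normalize_il is a made-up name, not part of any toolchain.

```shell
# Hypothetical helper: rewrite MS.NET-only spellings to the standard ones.
# Reads IL source on stdin and writes the result to stdout.
normalize_il() {
    sed -e 's/^\([[:space:]]*\)#line/\1.line/' \
        -e 's/value[[:space:]][[:space:]]*class/valuetype/g' \
        -e 's/\.locale/.culture/g' \
        -e 's/winapi/platformapi/g'
}

# Example input using two of the spellings discussed above.
printf '%s\n' \
    '#line 12' \
    '.field public value class Point origin' \
| normalize_il
```

Since Mono’s ILAsm accepts both spellings in each case, something like this would only matter if you wanted source that sticks to the standard forms.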

That’s it for this post. Next time: Implementation details of Mono’s ILAsm.

.NET and Distributed Concurrency

This is just going to be a bit of a brain dump.

I’ve been talking with my friend from Microsoft for the last couple of days about .NET and concurrency. Me being the language geek that I am, I was arguing that Erlang is, and will continue to be, the better choice for highly concurrent systems. .NET currently does not offer any reasonable equivalent to Erlang’s concurrency model. Sure, we have message-passing frameworks, we have locks, we have concurrent collections, we have the TPL (and TDF), but it all boils down to the same thing: Threads. Threads are evil.

A thread is no better than an OS process. In order to create a thread, a 1 MB stack (on Windows, anyway) needs to be allocated, and the OS needs to schedule the thread like it would schedule a process. This is heavier than it might sound. With Erlang, you can have millions of processes running at almost no cost; with OS processes you’d die at a few thousand.

Well, this happens to be so because Erlang processes are designed to be small. A few hundred bytes is all it takes to run code in an Erlang process. This is a huge difference from OS threads/processes. There is one primary reason why Erlang is so economical with process resources: It uses the Actor model. If you’re an OO developer, you could think of actors as objects. In Erlang, to invoke a method on one of these objects, you’d send it a message. The objects process their messages completely isolated from each other, and thus, concurrently. In other words, by using messaging as the means of communication between objects, you get concurrency for free. This is quite different from how we view objects and threads from a language such as C#. In C#, we often have some sort of primary object that runs in a separate thread, or in the thread pool, which takes care of running a bunch of child objects, because it’s simply too expensive and inefficient to spawn too many threads or add too many callbacks to the thread pool. In Erlang, this is not the case; in fact, you’re encouraged to run thousands or even millions of small processes in order to be able to scale your application. There is a clear difference between Erlang and, for example, .NET here: Distributed concurrency is first-class in Erlang.
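To put the stack numbers above in perspective, here’s a back-of-the-envelope calculation for a million concurrent units of work. The 1 MB per thread is the figure mentioned earlier; the 300 bytes per Erlang process is just an assumed stand-in for “a few hundred bytes”, and everything other than that per-unit footprint is ignored.

```shell
# Figures: 1 MB of stack per thread is taken from the text above; 300 bytes
# per Erlang process is an assumed stand-in for "a few hundred bytes".
count=1000000
thread_stack=$((1024 * 1024))
erlang_proc=300

# Integer division, so both results are rounded down.
thread_gib=$((count * thread_stack / 1024 / 1024 / 1024))
erlang_mib=$((count * erlang_proc / 1024 / 1024))

echo "1M threads:          ~${thread_gib} GiB of stack"
echo "1M Erlang processes: ~${erlang_mib} MiB"
```

With these inputs that works out to roughly 976 GiB versus roughly 286 MiB, which is the gap the argument here rests on.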

.NET is a fine platform and I love to use it. But the problem with .NET is that concurrency is something that wasn’t really regarded as being important. Sure, .NET has threads, as well as high-level abstractions on top of them, such as TPL, TDF, PLINQ, and so on, but there is virtually no support in the runtime itself for distributed concurrency. We don’t have location-transparent processes, we don’t have load-balancing, we don’t have lightweight processes, we don’t have continuations, we don’t have code hot swapping, we don’t have fault-tolerance… The list goes on. And it doesn’t look like we’re getting it anytime soon, either. To Microsoft, this apparently seems too “academic” and “unrealistic”. Apparently, Ericsson, Facebook, RabbitMQ, T-Mobile, and Telia are not realistic companies.

I know exactly what the problem is, however. The problem is that Erlang is not popular. If it’s not popular, surely nobody needs it to solve their problems, and it’s just an academic toy. The usual excuse for not implementing something. You can probably guess what I’m getting at: Microsoft has recently done almost nothing innovative when it comes to development tools and frameworks. They spend their time mimicking what the open source community or other companies have provided for years. ORMs have existed for a long, long time, yet Microsoft is still playing the catch-up game with Entity Framework. Cloud services have existed for years, yet Microsoft had to push Azure forward. IoC and extension APIs have existed for almost a decade, yet Microsoft is working on MEF. MVC web frameworks have existed for years, yet Microsoft is working on ASP.NET MVC. The only Microsoft division I can say is truly rolling out new tech is Microsoft Research, with their work on F#, Code Contracts, Pex/Moles, Singularity, etc.

It’s sad. Microsoft doesn’t seem to dare encourage any sort of paradigm shift these days, and shamelessly dismisses new technologies as “academic” if they’re not popular and in wide use. What they don’t seem to realize is that all they’re doing is reinventing wheels and that innovation is what needs to happen. Now.

Mono Summer of Code

Around a month ago, I turned in a GSoC 2011 proposal to rewrite Mono’s ILAsm (IL assembler) to use Cecil as its code generation back end, as well as write a managed ILDasm (IL disassembler), as Mono’s current monodis is written in C and relies greatly on the runtime itself. Yesterday, I received word that the proposal has been accepted, and that I’m going to participate in GSoC!

My friend, Wolfgang Steffens, who’s the primary developer on SL#, also had a GSoC project accepted, which seeks to incorporate SL# in the Axiom game engine. This would be a great way for us to draw more attention to the library, and would also make shader development for Axiom users much easier.

There are a lot of other interesting projects going on with Mono this year – check them out here.

SL# and Mono

It turned out that SL# 1.4 did not run on Mono as we had expected. Due to a bug in Mono’s CancellationToken that caused a new instance to be canceled by default, ICSharpCode.Decompiler didn’t actually decompile methods. This issue was seen in both Mono 2.8 and 2.10.1. A bug has been filed, and we worked around the issue for now.

Another bug we encountered (which is only relevant for SL# master) was invalid IL generation, which has also been fixed. Curiously, Microsoft’s CLR actually executed the IL correctly, even though the local variable had an incompatible type… Mono was much more helpful here, in that it threw an InvalidProgramException. As said, the latter issue is only relevant for master, not 1.4.

We might release a 1.4.1 package of SL# in a few days that addresses the first issue. If you don’t want to wait, grab this commit and apply the CancellationToken workaround.

Edit: Oh, and this seems to be my 100th blog post! Yay!

SL# 1.4 Released

After less than a month with SL# 1.3, we’ve decided to release version 1.4.

This is probably the release with the most changes that we’ve done so far:

  • Full support for all GLSL data types (including matrices)
  • Support for in and inout through C#’s ref and out keywords
  • Switched to ICSharpCode.Decompiler instead of our homegrown one
  • Support for many more language constructs due to switching decompiler engine
  • Support for hundreds of GLSL functions that we didn’t have before
  • Shader.DebugMode now actually works as expected
  • Better type checking/verification when translating
  • We now expose a sane standalone API (GlslTransform) for GLSL translation, should you want to use it
  • We now only ship one assembly, IIS.SLSharp.dll
  • Lots of cleanups to the public API; we no longer expose any internals
  • Lots and lots of bug fixes

That’s a fairly impressive list of changes if you compare to our previous ones! We’ve been incredibly productive with SL# lately, and GitHub certainly has been a booster, with its new intuitive Issues 2.0.

As usual, the Git repository sits here (note that master is quite ahead of this release). Downloads can be found here, and the updated NuGet package has been published.

If you run into any issues, please report them on the issue tracker as always.

Lastly, the reason we chose to release 1.4 so quickly is that we’re doing heavy work on making SL# engine-agnostic. This means that 1.4 will be the last release that depends on OpenTK. Future versions (2.0 and on) will support various engines (OpenTK, Axiom, XNA, SlimDX) as well as several languages (GLSL, HLSL, Cg) through bindings. This new architecture should also make it easier to add more native-feeling support for F# (through function quotations). This of course means that we’ll be doing massive (breaking) changes and refactoring to our API and possibly even our syntax, hence the major version bump.

With that said, enjoy!