Zor's Blog
Seeing Sharp

Jul
19

This time, I’ll write about some issues I faced during development of Mono’s ILAsm and how I got around them. I’ve had to make several changes that make Mono’s ILAsm stricter than MS.NET’s or outright incompatible.

First of all, we have generic parameters. In MS.NET’s ILAsm, !0 and !!0 are simply emitted as VAR 0 and MVAR 0 in the metadata, meaning that you could do something like:

.method public static !!0 Foo<T>(!!1 bar)
{
    ldarg bar
    ret
}

Even though !!1 refers to a generic parameter that does not exist, MS.NET’s ILAsm will gladly emit it, and not even warn you. Due to an API “limitation” in Cecil, Mono’s ILAsm cannot do this. All GenericParameter objects must have an owner (i.e. the type or method definition/reference) and must be contained in that owner’s GenericParameters collection at the correct index. Therefore, Mono’s ILAsm would throw an error for the above code, since !!1 is out of bounds. This is a slightly annoying incompatibility, but there’s not much I can do about it other than patching Cecil. That’s only going to happen, though, if it won’t cause API breaks for version 1.0.
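To make the constraint concrete: with an owner-based model like Cecil's, a !n or !!n reference can only be resolved if its owner declares at least n + 1 generic parameters, so the assembler has to check the index instead of emitting a raw VAR/MVAR. Below is a minimal sketch of that bounds check. GenericOwner, resolve_generic_param, and ILAsmError are hypothetical names for illustration; this is not Cecil's API or Mono's actual code.

```python
from dataclasses import dataclass, field
from typing import List


class ILAsmError(Exception):
    """Raised for source the assembler refuses to emit."""


@dataclass
class GenericOwner:
    """Hypothetical stand-in for a type or method definition that owns generic parameters."""
    name: str
    generic_parameters: List[str] = field(default_factory=list)


def resolve_generic_param(owner: GenericOwner, index: int, is_method: bool) -> str:
    """Resolve !index (type parameter) or !!index (method parameter) against its owner."""
    sigil = "!!" if is_method else "!"
    count = len(owner.generic_parameters)
    if not 0 <= index < count:
        # MS.NET's ILAsm would emit VAR/MVAR <index> as-is; here we reject it instead.
        raise ILAsmError(
            f"{sigil}{index} is out of range: {owner.name} declares {count} generic parameter(s)"
        )
    return owner.generic_parameters[index]


foo = GenericOwner("Foo", ["T"])
print(resolve_generic_param(foo, 0, is_method=True))  # T
```

For a method declaring a single generic parameter T, !!0 resolves to T, while !!1 is rejected instead of being emitted as a dangling MVAR 1.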

Second, directives like .file alignment, .imagebase, and .stackreserve aren’t fully supported. Cecil currently has no way of setting these on a module. ILAsm will still check the values, though, and report an error if they’re invalid.

Third, several native and variant types are not supported in marshal signatures. These include variant, void, syschar, decimal, date, objectref, nested struct and null, void, int64, uint64, unsigned int64, lpstr, lpwstr, safearray, hresult, carray, userdefined, record, filetime, blob, stream, storage, streamed_object, stored_object, blob_object, cf, clsid, as well as pointers/references/vectors of these. The reason is that Cecil doesn’t expose any way to set these types. Either way, most of these are deprecated nowadays. All other native/variant types are supported.

Fourth, support for .data declarations is very limited (we only make a weak attempt at emulating them). Cecil has no way of emitting data constants, and they’re implementation-defined anyway. Currently, the only thing we do with them is copy them to the InitialValue property of field definitions where appropriate. This is far from correct, but it’s the best we can do. Either way, you should avoid using these declarations. They don’t make a whole lot of sense in managed land.
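Roughly, the emulation amounts to this: a field declared with "at LABEL" picks up the bytes of the matching ".data LABEL = ..." declaration as its initial value, mirroring the copy into InitialValue described above. This is a minimal sketch under that assumption; FieldDef and apply_data_declarations are hypothetical stand-ins, not Cecil types or Mono's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldDef:
    """Hypothetical field definition; data_label is the LABEL from `.field ... at LABEL`."""
    name: str
    data_label: Optional[str] = None
    initial_value: bytes = b""


def apply_data_declarations(data: Dict[str, bytes], fields: List[FieldDef]) -> List[str]:
    """Copy each `.data LABEL = ...` blob into the fields mapped at LABEL.

    Returns the labels that fields refer to but that were never declared.
    """
    missing = []
    for f in fields:
        if f.data_label is None:
            continue  # ordinary field, no mapped data
        blob = data.get(f.data_label)
        if blob is None:
            missing.append(f.data_label)
        else:
            f.initial_value = blob
    return missing


# `.data D_1 = bytearray (2A 00 00 00)` together with `.field static int32 Answer at D_1`
fields = [FieldDef("Answer", "D_1"), FieldDef("Plain"), FieldDef("Broken", "D_2")]
print(apply_data_declarations({"D_1": bytes([0x2A, 0, 0, 0])}, fields))  # ['D_2']
```

Everything else a .data declaration can express (its placement in the image, addresses of other data) is simply lost, which is why this is only a weak emulation.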

Fifth, we have no support for parsing System.Reflection-notation strings (yet). This means that things like custom marshalers aren’t supported (the syntax will simply be ignored). A type parsing API will supposedly be exposed in Cecil 1.0.
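System.Reflection-notation strings are assembly-qualified type names such as System.Collections.Generic.List`1[[System.Int32, mscorlib]], mscorlib. A first step in any parser for them is splitting the type name from the assembly name at the first comma that is not nested inside brackets. A minimal sketch of only that step (the function name is hypothetical; parsing generic arguments, arrays, and pointers is left out):

```python
from typing import Optional, Tuple


def split_assembly_qualified_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'Type, Assembly' at the first comma that is not inside [...] brackets.

    Generic arguments such as [[System.Int32, mscorlib]] contain commas of their own,
    so a plain str.split(',') would cut them apart. Backslash-escaped characters are skipped.
    """
    depth = 0
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            return name[:i].strip(), name[i + 1:].strip()
        i += 1
    return name.strip(), None  # no assembly part


print(split_assembly_qualified_name(
    "System.Collections.Generic.List`1[[System.Int32, mscorlib]], mscorlib"))
```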

Sixth, support for declarative security syntax is limited. While we support the full syntax specified by the standard, we don’t support all of the syntax MS.NET’s ILAsm does (specifically, the syntax resembling verbal custom attribute initialization).

Seventh, exported types are unfinished because Cecil lacks a way to manipulate the File table directly, and because of the lack of a type parser.

Eighth, we can’t emit custom attributes on manifest resources and assembly references. This is a Cecil limitation which should be relatively easy to fix.

Lastly, the .vtable/.vtfixup/.vtentry directives are unsupported. Again, these will be implemented once Cecil has support for them.

There are some other incompatibilities, but they’re very subtle and you’re unlikely to ever encounter them. And even if you do, ILAsm will warn you.

Generally, the differences between the two implementations will only be encountered in obscure features that most users are highly unlikely to be using, or when attempting to use nonstandard syntax. Rule of thumb: Stay away from these things and you’re safe.

Jun
26

I just ran into this today. It makes ReSharper shut up about possible null references when you use Code Contracts in your C#/VB projects.

The XML file provided there is not complete, however. I’ve added the ContractInvariantMethodAttribute and ContractClassForAttribute attributes to the annotations so that ReSharper won’t claim that your contract classes and invariant methods are unused.

The XML is here.

Just drop this into C:\Program Files (x86)\JetBrains\ReSharper\v5.1\Bin\ExternalAnnotations\mscorlib as Microsoft.Contracts.xml or so.

Jun
25

So these are some ideas I’ve been thinking about for a good while, and I’m finally going to do something with them.

A good while back, I blogged about how bytecode is evil. This post will explain what I intend to do with the AST idea.

The project I’m working on consists of five major components:

  • A new executable format consisting of binary-form ASTs
  • A managed JIT compiler
  • A new assembly-like intermediate language
  • An abstract bare-bones type system
  • An AST/assembly optimization layer

The new executable format will be similar to the CIL format, but without all the PE bloat, and with one major difference: Code is stored as binary-serialized ASTs rather than bytecode. This allows easy introspection of compiled code, opening the doors for many interesting scenarios, such as translation from one language to another at runtime. I’ve chosen to call the format CAST (Compiled Abstract Syntax Trees).

To make the AST format actually interesting, a new JIT compiler needs to be implemented. This JIT compiler will understand the AST and operate on it for things such as optimization. For compatibility reasons, the AST will have a sort of RawCodeNode that contains plain assembly language (something similar to CIL but with registers). This is useful when one wants to emit non-AST code such as CIL/ILAsm, Java bytecode, Erlang BEAM, etc. I’ve chosen to call this JIT compiler Mono.Jit.

The new assembly-like intermediate language will be similar in nature to CIL. However, unlike CIL, this language will be based on registers and a stack, rather than locals and a stack. While the CIL approach is generally much prettier here, it’s less abstract and in some cases it is impossible to translate code to CIL (such as x86 machine code). I’m not sure what to call this yet, so for now, I’ll refer to it as IAL (Intermediate Assembly Language).

For CAST, I’ll eventually need to implement a compiler/decompiler, but for now, plans are to create a fluent API to create CAST code from .NET land. For IAL, I’ll just make a simple assembler/disassembler.

Now, in order for the AST to have any sort of meaning, a type system is needed. The type system in Mono.Jit is extremely simple; it supports only fields and methods on types. The JIT doesn’t need to know anything else in order to emit code. I do, however, plan to make an interface that allows giving the JIT optimization hints (such as “this type is completely immutable”, “this method makes no modification to members”, etc.). The JIT’s type system currently supports generics (types and methods), but I’m strongly contemplating delegating this to code loaders unless there is a compelling reason for the JIT to be aware of them.

Lastly, an optimization layer needs to be written. This layer will primarily concern itself with the AST input to the JIT, because that’s the entire point of this project. As mentioned, IAL is just a fallback for when you don’t have an AST to feed to the JIT. I don’t have any particular plans for optimization of IAL other than simple stuff like constant folding/propagation.
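To illustrate how an optimizer can work on the AST while leaving raw code alone, here is a minimal sketch of constant folding over a toy tree in which RawCodeNode is an opaque leaf the optimizer never looks into. The node classes and the instruction text inside RawCodeNode are hypothetical stand-ins, not the actual CAST format.

```python
import operator
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str          # one of "+", "-", "*"
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class RawCodeNode:
    """Opaque low-level code embedded in the tree; the optimizer never looks inside it."""
    instructions: Tuple[str, ...]


Node = Union[Const, BinOp, RawCodeNode]

_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(node: Node) -> Node:
    """Bottom-up constant folding: replace operations on two constants by their result."""
    if isinstance(node, BinOp):
        left = fold_constants(node.left)
        right = fold_constants(node.right)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(_OPS[node.op](left.value, right.value))
        return BinOp(node.op, left, right)
    return node  # constants and raw code are returned unchanged


tree = BinOp("+", RawCodeNode(("mov r0, arg0",)), BinOp("*", Const(6), Const(7)))
print(fold_constants(tree))
```

The subtree 6 * 7 folds to 42, but the addition survives because one operand is raw code whose value the optimizer cannot know.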

That’s it for now. More details will come as I write more code.

Jun
09

So, we’re now 3 weeks into GSoC, and I figure I should start blogging about my work.

This post will be about the ILAsm standard (specified in ECMA 335) and the Microsoft implementation. There are several annoying differences and incompatibilities between the two. As part of figuring them out, I’ve been reading Expert .NET 2.0 IL Assembler and the Microsoft-annotated ECMA 335 standard.

The first difference is that MS.NET’s ILAsm does not respect the unsigned modifier on integer types. This means that an unsigned int32 will in effect be treated as an int32 (signed). I’ve chosen to break compatibility with Microsoft and correctly emit an unsigned integer, primarily because I think this is an outright stupid bug. Note that you can still get an unsigned integer by using uint32 with MS.NET. The latter form is preferred, anyway.
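At the metadata level, the difference is a single byte in the signature: ECMA-335 encodes int32 as ELEMENT_TYPE_I4 (0x08) and uint32 as ELEMENT_TYPE_U4 (0x09). A minimal sketch of a keyword-to-element-type mapping that treats "unsigned intN" as a synonym for "uintN" (the table and function are illustrative, not Mono's code):

```python
# ECMA-335 element type codes for the integer types (Partition II, signature element types).
ELEMENT_TYPE = {
    "int8": 0x04, "uint8": 0x05,
    "int16": 0x06, "uint16": 0x07,
    "int32": 0x08, "uint32": 0x09,
    "int64": 0x0A, "uint64": 0x0B,
}


def element_type_for(keyword: str) -> int:
    """Map an ILAsm integer type keyword to its signature element type code."""
    words = keyword.split()
    if len(words) == 2 and words[0] == "unsigned":
        # Honor the modifier: 'unsigned int32' is the same type as 'uint32'.
        name = "u" + words[1]
    else:
        name = " ".join(words)
    if name not in ELEMENT_TYPE:
        raise ValueError(f"not an integer type: {keyword!r}")
    return ELEMENT_TYPE[name]


print(hex(element_type_for("unsigned int32")))  # 0x9
```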

Second, MS.NET does not have the platformapi keyword. Instead, you have to use the winapi keyword, which makes your ILAsm source code unportable. In Mono’s ILAsm, we support both, in order to aid portability.

Third, MS.NET allows #line instead of .line for specifying source line information. This seems to be for compatibility reasons. Mono’s ILAsm supports both notations.

Fourth, MS.NET has a bunch of directives that just don’t exist in the standard. Specifically .file alignment, .imagebase, .language, and .namespace. We support all of them, though the first two currently have no actual effect. Note that use of .namespace is considered bad practice.

Fifth, MS.NET’s ILAsm doesn’t use .culture for specifying culture information, but rather .locale. This is directly against the standard, and the assembler won’t even recognize .culture. Mono’s ILAsm supports both notations.

Sixth, the MS.NET ILAsm doesn’t require the .hash directive; if it’s not specified, the hash will automatically be computed.

Seventh, the MS.NET ILAsm allows using “value class” in place of “valuetype” for indicating value types. This is relatively easy to handle, and Mono’s ILAsm does so. It is, however, considered bad notation.
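Several of these differences are pure spelling aliases, so an assembler can accept both forms by rewriting the MS.NET spelling to the standard one before parsing. A minimal token-level sketch, assuming the source has already been split into a list of tokens (the alias table covers winapi, .locale, and value class from above; #line is left out of this sketch, and all names are hypothetical):

```python
from typing import List

# MS.NET spelling -> standard spelling.
KEYWORD_ALIASES = {
    "winapi": "platformapi",
    ".locale": ".culture",
}


def normalize_tokens(tokens: List[str]) -> List[str]:
    """Rewrite MS.NET-only spellings into their standard equivalents."""
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "value" and i + 1 < len(tokens) and tokens[i + 1] == "class":
            out.append("valuetype")  # 'value class' is accepted but considered bad notation
            i += 2
            continue
        out.append(KEYWORD_ALIASES.get(tokens[i], tokens[i]))
        i += 1
    return out


print(normalize_tokens([".locale", "'en-US'"]))  # ['.culture', "'en-US'"]
```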

Lastly, the MS.NET ILAsm allows specifying things like calling convention, type attributes, method attributes, field attributes, parameter attributes, and so on using flags(int32) notation. I’m sure this was done for a reason, but it seems like a great way to give developers the opportunity to make their programs unportable or impossible to run at all.

There are a lot of other incompatibilities, but they’re less annoying and easier to work around/implement.

That’s it for this post. Next time: Implementation details of Mono’s ILAsm.

Jun
06

This is just going to be a bit of a brain dump.

I’ve been talking with my friend from Microsoft for the last couple of days about .NET and concurrency. Me being the language geek that I am, I was arguing that Erlang is, and will continue to be, the better choice for highly concurrent systems. .NET currently does not offer any reasonable equivalent to Erlang’s concurrency model. Sure, we have message-passing frameworks, we have locks, we have concurrent collections, we have the TPL (and TDF), but it all boils down to the same thing: Threads. Threads are evil.

A thread is no better than an OS process. In order to create a thread, a 1 MB stack (on Windows, anyway) needs to be allocated, and the OS needs to schedule the thread like it would schedule a process. This is heavier than it might sound. With Erlang, you can have millions of processes running at almost no cost; with OS processes you’d die at a few thousand.

Well, this happens to be so because Erlang processes are designed to be small. A few hundred bytes is all it takes to run code in an Erlang process. This is a huge difference from OS threads/processes. There is one primary reason why Erlang is so economical with process resources: It uses the Actor model. If you’re an OO developer, you could think of actors as objects. In Erlang, to invoke a method on one of these objects, you send it a message, and each object processes its messages in complete isolation from the others, and thus concurrently. In other words, by using messaging as the means of communication between objects, you get concurrency for free. This is quite different from how we view objects and threads from a language such as C#. In C#, we often have some sort of primary object that runs in a separate thread, or in the thread pool, which takes care of running a bunch of child objects, because it’s simply too expensive and inefficient to spawn too many threads or add too many callbacks to the thread pool. In Erlang, this is not the case; in fact, you’re encouraged to run thousands or even millions of small processes in order to be able to scale your application. There is a clear difference between Erlang and, for example, .NET here: Distributed concurrency is first-class in Erlang.
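The cost argument is easier to see in code. The following is a toy, single-threaded sketch of the idea, not a model of how the Erlang VM works internally: each actor is a Python generator with its own mailbox, a scheduler delivers one message per turn, and actors share no state, so an actor costs a small generator object plus a queue rather than an OS thread with its own stack. Unlike Erlang, it is cooperative rather than preemptive and has no distribution. Scheduler and counter are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Dict, Generator


class Scheduler:
    """Toy single-threaded scheduler for generator-based actors (messages arrive via `msg = yield`)."""

    def __init__(self) -> None:
        self.actors: Dict[int, Generator] = {}
        self.mailboxes: Dict[int, Deque] = {}
        self.ready: Deque[int] = deque()  # pids that have at least one pending message
        self.next_pid = 0

    def spawn(self, behaviour: Callable, *args) -> int:
        pid = self.next_pid
        self.next_pid += 1
        actor = behaviour(self, pid, *args)
        next(actor)  # run up to the first receive
        self.actors[pid] = actor
        self.mailboxes[pid] = deque()
        return pid

    def send(self, pid: int, msg) -> None:
        mailbox = self.mailboxes[pid]
        if not mailbox:
            self.ready.append(pid)  # actor becomes runnable
        mailbox.append(msg)

    def run(self) -> None:
        # Deliver one message per turn, round-robin, until no mailbox has pending messages.
        while self.ready:
            pid = self.ready.popleft()
            mailbox = self.mailboxes[pid]
            msg = mailbox.popleft()
            if mailbox:
                self.ready.append(pid)  # still has work: requeue before running
            self.actors[pid].send(msg)


def counter(sched: Scheduler, pid: int, results: Dict[int, int]):
    """An actor that sums the numbers it receives and reports the total on 'stop'."""
    total = 0
    while True:
        msg = yield
        if msg == "stop":
            results[pid] = total
        else:
            total += msg


sched = Scheduler()
results: Dict[int, int] = {}
pids = [sched.spawn(counter, results) for _ in range(100_000)]
for pid in pids:
    for msg in (40, 2, "stop"):
        sched.send(pid, msg)
sched.run()
print(len(results), set(results.values()))  # 100000 {42}
```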

.NET is a fine platform and I love to use it. But the problem with .NET is that concurrency is something that wasn’t really regarded as being important. Sure, .NET has threads, as well as high-level abstractions on top of them, such as TPL, TDF, PLINQ, and so on, but there is virtually no support in the runtime itself for distributed concurrency. We don’t have location-transparent processes, we don’t have load-balancing, we don’t have lightweight processes, we don’t have continuations, we don’t have code hot swapping, we don’t have fault-tolerance… The list goes on. And it doesn’t look like we’re getting it anytime soon, either. To Microsoft, this apparently seems too “academic” and “unrealistic”. Apparently, Ericsson, Facebook, RabbitMQ, T-Mobile, and Telia are not realistic companies.

I know exactly what the problem is, however. The problem is that Erlang is not popular. If it’s not popular, surely nobody needs it to solve their problems, and it’s just an academic toy. The usual excuse for not implementing something. You can probably guess what I’m getting at: Microsoft has recently done almost nothing innovative when it comes to development tools and frameworks. They spend their time mimicking what the open source community or other companies have provided for years. ORMs have existed for a long, long time, yet Microsoft is still playing the catch-up game with Entity Framework. Cloud services have existed for years, yet Microsoft had to push Azure forward. IoC and extension APIs have existed for almost a decade, yet Microsoft is working on MEF. MVC web frameworks have existed for years, yet Microsoft is working on ASP.NET MVC. The only Microsoft division I can say is truly rolling out new tech is Microsoft Research, with their work on F#, Code Contracts, Pex/Moles, Singularity, etc.

It’s sad. Microsoft doesn’t seem to dare encourage any sort of paradigm shift these days, and shamelessly dismisses new technologies as “academic” if they’re not popular and in wide use. What they don’t seem to realize is that all they’re doing is reinventing wheels and that innovation is what needs to happen. Now.

Apr
26

Around a month ago, I turned in a GSoC 2011 proposal to rewrite Mono’s ILASM (IL assembler) to use Cecil as its code generation back end, as well as write a managed ILDASM (IL disassembler), as Mono’s current monodis is written in C and relies greatly on the runtime itself. Yesterday, I received word that the proposal has been accepted, and that I’m going to participate in GSoC!

My friend, Wolfgang Steffens, who’s the primary developer on SL#, also had a GSoC project accepted, which seeks to incorporate SL# in the Axiom game engine. This would be a great way for us to draw more attention to the library, and also makes shader development for Axiom users much easier.

There are a lot of other interesting projects going on with Mono this year – check them out here.

Apr
24

It turned out that SL# 1.4 did not run on Mono as we had expected. Due to a bug in Mono’s CancellationToken that caused a new instance to be canceled by default, ICSharpCode.Decompiler didn’t actually decompile methods. This issue was seen in both Mono 2.8 and 2.10.1. A bug has been filed, and we worked around the issue for now.

Another bug we encountered (which is only relevant for SL# master) was invalid IL generation, which has also been fixed. Curiously, Microsoft’s CLR actually executed the IL correctly, even though the local variable had an incompatible type… Mono was much more helpful here, in that it threw an InvalidProgramException. As said, the latter issue is only relevant for master, not 1.4.

We might release a 1.4.1 package of SL# in a few days that addresses the first issue. If you don’t want to wait, grab this commit and apply the CancellationToken workaround.

Edit: Oh, and this seems to be my 100th blog post! Yay!

Apr
22

After less than a month with SL# 1.3, we’ve decided to release version 1.4.

This is probably the release with the most changes that we’ve done so far:

  • Full support for all GLSL data types (including matrices)
  • Support for in and inout through C#’s ref and out keywords
  • Switched to ICSharpCode.Decompiler instead of our homegrown one
  • Support for many more language constructs due to switching decompiler engine
  • Support for hundreds of GLSL functions that we didn’t have before
  • Shader.DebugMode now actually works as expected
  • Better type checking/verification when translating
  • We now expose a sane standalone API (GlslTransform) for GLSL translation, should you want to use it
  • We now only ship one assembly, IIS.SLSharp.dll
  • Lots of cleanups to the public API; we no longer expose any internals
  • Lots and lots of bug fixes

That’s a fairly impressive list of changes compared to our previous releases! We’ve been incredibly productive with SL# lately, and GitHub certainly has been a booster, with its new intuitive Issues 2.0.

As usual, the Git repository sits here (note that master is quite ahead of this release). Downloads can be found here, and the updated NuGet package has been published.

If you run into any issues, please report them on the issue tracker as always.

Lastly, the reason we chose to release 1.4 so quickly is because we’re doing heavy work on making SL# engine-agnostic. This means that 1.4 will be the last release that depends on OpenTK. Future versions (2.0 and on) will support various engines (OpenTK, Axiom, XNA, SlimDX) as well as several languages (GLSL, HLSL, Cg) through bindings. This new architecture should also make it easier to add more native-feeling support for F# (through function quotations). This of course means that we’ll be doing massive (breaking) changes and refactoring to our API and possibly even our syntax, hence the major version bump.

With that said, enjoy!

Apr
04

SL# 1.3 is now available on NuGet. This is the first time I’ve released anything on NuGet, so please let me know if I messed anything up!

Apr
04

I’m sure people who have been following me and talking to me have noticed how I’ve been looking so much into actor/agent-based concurrency and message-passing – how I’ve been talking about Axum, TPL Dataflow, Erlang, and some of my homebrewed equivalents.

I believe the future of concurrency lies in message-passing. My reason for saying this actually boils down to OOP. In OOP, every method call is a message – both the request and the result.

For example, whenever you do:

var result = calc.Add(40, 2);

You are, in OOP terms, saying:

  1. Prepare a message to inform calc that I want to call its Add method
  2. Add the values 40 and 2 to the message
  3. Send the message to calc
  4. Wait for calc.Add to do its thing
  5. Receive the result through a reply message

Something should become obvious to you when I spell it out like this: Why should the thread be stalled waiting for the result to be sent back before going on with other tasks? Surely it could just go on doing something else while calc finishes doing its very heavy computation.
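Spelled out as code, the five steps map onto a mailbox and a reply slot. The sketch below uses Python's asyncio as a stand-in for the C# example: the call is packaged as a message carrying its arguments and a future for the reply, and the caller only waits when it actually needs the result. CallMessage and calc_agent are hypothetical names.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class CallMessage:
    """A method call reified as a message: target method, arguments, and a reply slot."""
    method: str
    args: Tuple[Any, ...]
    reply: asyncio.Future


async def calc_agent(mailbox: asyncio.Queue) -> None:
    """Process call messages one at a time until a None sentinel arrives."""
    while True:
        msg: Optional[CallMessage] = await mailbox.get()
        if msg is None:
            return
        if msg.method == "Add":
            msg.reply.set_result(msg.args[0] + msg.args[1])
        else:
            msg.reply.set_exception(AttributeError(f"calc has no method {msg.method}"))


async def main() -> Tuple[int, int]:
    mailbox: asyncio.Queue = asyncio.Queue()
    agent = asyncio.create_task(calc_agent(mailbox))
    reply = asyncio.get_running_loop().create_future()

    # Steps 1-3: package the call and its arguments, then send it to calc.
    mailbox.put_nowait(CallMessage("Add", (40, 2), reply))
    # The caller is not stalled: it can do unrelated work while calc computes.
    other_work = sum(range(10))
    # Steps 4-5: wait for calc and receive the result through the reply.
    result = await reply

    mailbox.put_nowait(None)  # shut the agent down
    await agent
    return result, other_work


print(asyncio.run(main()))  # (42, 45)
```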

So the point I’m trying to get across is: Whenever you call a method and whenever you get the value of a property, you’re sending a message (or several). This very simple fact has been largely ignored by mainstream developers and frameworks up until recently. Previously, it was all about trying to parallelize algorithms and using locks to synchronize where you just couldn’t parallelize; this obviously doesn’t scale. There’s also asynchrony. For a long time, people have been writing asynchronous code in the form of lambda expressions or anonymous methods, or even worse, as separate methods, passed to some API that starts an asynchronous operation.

You probably saw this one coming: C# 5.0 pretty much solves this entire problem. With the new async and await keywords, we can easily express asynchronous code in its logical form. However, these two keywords alone don’t give us all the power we need. While they allow us to treat every method call as sending a message and retrieving the result, we have no real way to control how execution happens, at what degree of parallelism, in what synchronization context, etc. This is where the new API in .NET vNext, TPL Dataflow, comes in.
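The difference between the two styles is easiest to see side by side. In this sketch (Python's async/await standing in for C# 5.0's), read_length_cb is a hypothetical callback-based operation and read_length its awaitable counterpart; both versions compute the same sum, but only the callback version has to nest each step inside the previous step's callback.

```python
import asyncio
from typing import Callable


def read_length_cb(loop: asyncio.AbstractEventLoop, key: str, callback: Callable[[int], None]) -> None:
    """Hypothetical callback-style operation: reports len(key) later via callback."""
    loop.call_soon(callback, len(key))


def sum_lengths_cb(loop: asyncio.AbstractEventLoop, a: str, b: str, done: Callable[[int], None]) -> None:
    # Each step lives inside the previous step's callback.
    def got_a(len_a: int) -> None:
        def got_b(len_b: int) -> None:
            done(len_a + len_b)
        read_length_cb(loop, b, got_b)
    read_length_cb(loop, a, got_a)


async def read_length(key: str) -> int:
    """Awaitable counterpart of read_length_cb."""
    await asyncio.sleep(0)  # stands in for real asynchronous I/O
    return len(key)


async def sum_lengths(a: str, b: str) -> int:
    # Same logic, written in its sequential form.
    len_a = await read_length(a)
    len_b = await read_length(b)
    return len_a + len_b


async def compare() -> tuple:
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    sum_lengths_cb(loop, "forty", "two", done.set_result)
    return await done, await sum_lengths("forty", "two")


print(asyncio.run(compare()))  # (8, 8)
```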

TPL Dataflow allows you to set up a bunch of so-called source and target blocks, and then link them together, so that whenever some data is posted to the initial block, the data will automatically be passed through this “dataflow network”. That’s all nice and fancy, but alone, it isn’t much different from any other message-passing framework. C# 5.0 augments TPL Dataflow with the async and await keywords, so that we can write code like this:

agent.Post(Tuple.Create(40, 2));
var result = await agent.Receive();

This looks a bit more awkward than the first code example above, but the advantage here is that we can set up our agents just the way we want to. We can specify the task scheduler they use, how many messages they process per task, what degree of parallelism they utilize, and so on, giving us absolute control over program flow. For more information, see the docs on the TC Labs page.
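The blocks-and-links idea can be sketched without TPL Dataflow itself. The Block class below is hypothetical and only loosely modeled on the description above (posting data, linking blocks into a network, a configurable degree of parallelism); it is not the TPL Dataflow API.

```python
import asyncio
from typing import Any, Callable, List


class Block:
    """Hypothetical dataflow block: applies fn to each posted item and forwards the result.

    Must be created while an event loop is running (it starts worker tasks).
    """

    def __init__(self, fn: Callable[[Any], Any], max_parallelism: int = 1) -> None:
        self.fn = fn
        self.inbox: asyncio.Queue = asyncio.Queue()
        self.targets: List["Block"] = []
        self.workers = [asyncio.create_task(self._work()) for _ in range(max_parallelism)]

    def link_to(self, target: "Block") -> None:
        self.targets.append(target)

    def post(self, item: Any) -> None:
        self.inbox.put_nowait(item)

    async def _work(self) -> None:
        while True:
            item = await self.inbox.get()
            try:
                result = self.fn(item)
                for target in self.targets:
                    target.post(result)  # propagate through the network
            finally:
                self.inbox.task_done()

    async def drain(self) -> None:
        """Wait until every posted item has been processed, then stop the workers."""
        await self.inbox.join()
        for worker in self.workers:
            worker.cancel()


async def main() -> List[int]:
    results: List[int] = []
    add = Block(lambda pair: pair[0] + pair[1], max_parallelism=4)
    collect = Block(results.append)
    add.link_to(collect)
    for pair in [(40, 2), (1, 1), (3, 4)]:
        add.post(pair)
    await add.drain()  # everything has now been forwarded to collect
    await collect.drain()
    return sorted(results)


print(asyncio.run(main()))  # [2, 7, 42]
```

Because results are forwarded before an item is marked done, draining the first block guarantees that everything has reached the second one; with several workers the arrival order is not fixed, hence the sort.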

F# has solved the problem of asynchrony and concurrency differently, through asynchronous workflows and generic agents. For more about that, check Don Syme’s post on async and parallel design patterns in F#.

The cool thing about actors/agents and message-passing is that we get implicit concurrency without a whole lot of effort. We just write our code the way we naturally would, and watch it scale. I say naturally because in the real world, communication happens through messages – why wouldn’t it in programming, too?

Still, we’re not quite there yet when it comes to language support. Erlang/OTP is probably the single language/platform that has the best support for actors, but we have no means of interacting with Erlang/OTP from .NET currently. There’s Axum too, but given that the language effort has stopped, that’s a no-go.

It really is incredible that the most sane approach to concurrency has been ignored for so long. It seems as though, historically, we went from message-passing to parallel algorithms, and then back to message-passing again. How anyone could have believed that parallel algorithms were going to be more scalable is beyond me, but languages like Erlang and Axum have shown us just how important message-passing is.

A collection of links related to this and previous posts:
