This is just going to be a bit of a brain dump.
I’ve been talking with my friend from Microsoft for the last couple of days about .NET and concurrency. Me being the language geek that I am, I was arguing that Erlang is, and will continue to be, the better choice for highly concurrent systems. .NET currently does not offer any reasonable equivalent to Erlang’s concurrency model. Sure, we have message-passing frameworks, we have locks, we have concurrent collections, we have the TPL (and TDF), but it all boils down to the same thing: Threads. Threads are evil.
A thread is not much better than an OS process. To create a thread, 1 MB of stack space (on Windows, anyway) has to be reserved, and the OS has to schedule the thread just as it would schedule a process. This is heavier than it might sound. With Erlang, you can have millions of processes running at almost no cost; with OS threads or processes, you’d die at a few thousand.
This is possible because Erlang processes are designed to be small: a few hundred machine words of memory (a couple of kilobytes) is all it takes to run code in an Erlang process. That is a huge difference from OS threads and processes. There is one primary reason why Erlang is so economical with process resources: it uses the actor model. If you’re an OO developer, you can think of actors as objects. In Erlang, to invoke a method on one of these objects, you send it a message; each object processes its messages in complete isolation from the others, and thus concurrently. In other words, by using messaging as the means of communication between objects, you get concurrency for free. This is quite different from how we view objects and threads in a language such as C#. In C#, we often have some sort of primary object that runs in a separate thread, or in the thread pool, and takes care of running a bunch of child objects, because it’s simply too expensive and inefficient to spawn too many threads or queue too many callbacks on the thread pool. In Erlang, this is not the case; in fact, you’re encouraged to run thousands or even millions of small processes in order to scale your application. There is a clear difference between Erlang and, for example, .NET here: distributed concurrency is first-class in Erlang.
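To make the “objects that only talk through messages” idea concrete, here is a minimal sketch in Python’s asyncio (neither Erlang nor .NET; chosen only for brevity). Each actor is a coroutine with a private mailbox, and its state lives only inside that coroutine, so nothing is shared and nothing is locked. The counter actor, its message tuples, and the run_counters function are hypothetical names for illustration. One important caveat: asyncio tasks are far cheaper than OS threads, but they run cooperatively on a single thread, with no preemption, no multicore scheduling, and no distribution, so this shows the programming model, not Erlang’s runtime.

```python
import asyncio


async def counter_actor(mailbox: asyncio.Queue) -> None:
    """Hypothetical 'counter' actor: its state exists only inside this coroutine."""
    count = 0
    while True:
        msg = await mailbox.get()
        kind = msg[0]
        if kind == "inc":
            count += 1
        elif kind == "get":
            reply_to = msg[1]
            await reply_to.put(count)  # answer by sending a message back
        elif kind == "stop":
            return


async def run_counters(n_actors: int) -> int:
    """Spawn n_actors counters, increment each twice, and sum their replies."""
    mailboxes = [asyncio.Queue() for _ in range(n_actors)]
    tasks = [asyncio.create_task(counter_actor(mb)) for mb in mailboxes]
    for mb in mailboxes:
        await mb.put(("inc",))
        await mb.put(("inc",))
    reply = asyncio.Queue()
    total = 0
    for mb in mailboxes:
        await mb.put(("get", reply))
        total += await reply.get()  # each actor replies after its two increments
    for mb in mailboxes:
        await mb.put(("stop",))
    await asyncio.gather(*tasks)
    return total


if __name__ == "__main__":
    print(asyncio.run(run_counters(10_000)))  # 20000
```

Ten thousand of these actors start without trouble, which would be a stretch with one OS thread each.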
.NET is a fine platform and I love to use it. But the problem with .NET is that concurrency was never really regarded as important. Sure, .NET has threads, as well as high-level abstractions on top of them, such as TPL, TDF, PLINQ, and so on, but there is virtually no support in the runtime itself for distributed concurrency. We don’t have location-transparent processes, we don’t have load balancing, we don’t have lightweight processes, we don’t have continuations, we don’t have hot code swapping, we don’t have fault tolerance… The list goes on. And it doesn’t look like we’re getting it anytime soon, either. To Microsoft, this apparently seems too “academic” and “unrealistic”. Apparently, Ericsson, Facebook, RabbitMQ, T-Mobile, and Telia are not realistic companies.
I know exactly what the problem is, however. The problem is that Erlang is not popular. If it’s not popular, surely nobody needs it to solve their problems, and it’s just an academic toy. The usual excuse for not implementing something. You can probably guess what I’m getting at: Microsoft has recently done almost nothing innovative when it comes to development tools and frameworks. They spend their time mimicking what the open source community or other companies have provided for years. ORMs have existed for a long, long time, yet Microsoft is still playing the catch-up game with Entity Framework. Cloud services have existed for years, yet Microsoft had to push Azure forward. IoC and extension APIs have existed for almost a decade, yet Microsoft is working on MEF. MVC web frameworks have existed for years, yet Microsoft is working on ASP.NET MVC. The only Microsoft division I can say is truly rolling out new tech is Microsoft Research, with their work on F#, Code Contracts, Pex/Moles, Singularity, etc.
It’s sad. Microsoft doesn’t seem to dare to encourage any sort of paradigm shift these days, and shamelessly dismisses new technologies as “academic” if they’re not popular and in wide use. What they don’t seem to realize is that all they’re doing is reinventing wheels, and that innovation is what needs to happen. Now.
Erlang has been out there long enough, and despite all its features, it hasn’t become mainstream.
There’s a good reason for that: Erlang is full of different concepts that not everyone is willing to learn and understand.
Microsoft is a company, and one thing we know for sure is that a company cannot push a product its customers don’t want.
Look at F#: it’s an incredible language and fully supported, yet it has gained very little popularity compared to other languages. C# was widely accepted from version 1.
Until customers understand the advantages and start demanding this kind of language, no for-profit company will invest in it.
That’s the thing: Microsoft would have to evangelize new stuff if they want customers to pick it up; otherwise it’s never going to happen. Customers won’t pick up some new piece of tech or language if it isn’t made clear to them why it’s advantageous and how it could help reduce development time and friction. And that’s what I’m getting at: Microsoft doesn’t seem to dare to evangelize new tech anymore; it’s always frameworks and tools that in some way resemble technologies and techniques we’ve already seen from other projects, companies, and people.
Tasks aren’t threads… I don’t see the problem.
F# has support for task continuations, and Microsoft has adopted the model for C# 5, which, contrary to what you say, is coming “sometime soon”.
Besides, as far as message passing goes, the Reactive Extensions for .NET is an extremely well-designed and sophisticated API built on top of tasks, which lets you model problems in ways Erlang can only dream of.
“you get concurrency for free.”
No. You don’t. This is woolly thinking and a poor piece of evangelism.
You get lock-free data structures implemented for you behind the scenes (with considerable skill, one hopes) to handle the message passing, but these still use CAS (or morally equivalent) instructions, so they still have scaling limits and concurrency overheads (not to mention forcing copy semantics[1]). What they do is make it clear where your boundaries are, so if you have very little shared state (the norm), it becomes trivial to write code that doesn’t share state without having to worry about it. They also let the platform, rather than the programmer, ensure that unintentional sharing (cache-line collisions) and the like don’t happen.
Also, when you talk about things like hot-swappable code, do you have any idea how many systems actually *require* that behavior? Not very many (and they frequently have the sort of SLA requirements that merit the testing involved to ensure it actually works).
And guess what all the companies you cite as examples do as part of their core business? Messaging (and connectivity) for high-volume clients. What a surprise that they like a system designed around this. Why on earth should Microsoft try to push that model in that niche? Now, there are many things I’d like them to do (some of which I think might actually benefit them), and location-transparent processes (perhaps via app domains) is one of them, but I’m under no illusions: this is still relatively niche.
Plus, any app whose primary state is a database has simply pushed the hard locking to the database. (And STM seems not to have found much favour within Erlang, perhaps because the lack of throughput guarantees is not great for the target market.)
MS are doing some work on continuations (admittedly through CPS compiler rewriting); they have already done it with F# and are looking to bake it into C# (so pretty mainstream). Whether it merits it or not will be an interesting case study in 5-10 years, methinks.
(I’m no Microsoft apologist: their BCL is patchy, and some parts lag well behind the competition in both design and implementation.)
[1] Immutability by default helps here. I wish more non-functional languages put this at the forefront; I can’t imagine many modern languages where making immutability the default would cause much additional work for the initial programmer (and it should have considerable benefit for maintenance programmers). As far as I’m concerned, that, more than anything else, is the most useful thing.
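As a minimal sketch of what immutability by default buys in a message-passing design (in Python, assuming 3.9+, and not tied to Erlang or .NET), a frozen dataclass gives you a message that cannot be changed after it is created. Such a message can be handed to another actor by reference without a defensive copy, and “changing” it means building a new value. The Order type, its fields, and the demo function are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Order:
    """Hypothetical immutable message type."""
    order_id: int
    items: tuple[str, ...]  # a tuple, not a list: frozen is shallow, so fields must be immutable too


def demo() -> tuple[Order, Order, bool]:
    original = Order(order_id=1, items=("book",))
    try:
        original.order_id = 2  # type: ignore[misc]  # assignment on a frozen instance raises
        mutated = True
    except FrozenInstanceError:
        mutated = False
    # 'Modifying' a message produces a new object; the original stays safe to share.
    updated = replace(original, items=original.items + ("pen",))
    return original, updated, mutated


if __name__ == "__main__":
    print(demo())
```

Because the sender can never mutate a message it has already sent, receivers never need their own copy; that is the copy cost the footnote is pointing at.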
>No. You don’t. This is woolly thinking and a poor piece of evangelism.
I think you misunderstood what I meant here: I meant that you get concurrency without any additional effort to orchestrate it. Of course, under the hood there will be lock-free algorithms doing the message passing. On that note, I did some performance evaluation a few months back, and message passing in the ring test turned out to be almost a minute slower in .NET using lock-free queues than the Erlang version in [1].
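For readers who haven’t seen it, the ring test passes a single message around a ring of processes many times, so it mostly measures the cost of a send plus a context switch. Here is a minimal sketch in Python’s asyncio (not the .NET or Erlang code discussed here, and with no timings claimed); the ring_node and ring names and the hop-counting token are assumptions for illustration. Note that only one token is ever in flight, so each mailbox holds at most one item at a time.

```python
import asyncio


async def ring_node(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Forward each token to the next node, adding one hop; None means shut down."""
    while True:
        token = await inbox.get()
        if token is None:
            await outbox.put(None)  # pass the shutdown signal along, then exit
            return
        await outbox.put(token + 1)


async def ring(n_nodes: int, n_laps: int) -> int:
    """Send one token around a ring of n_nodes, n_laps times; return total hops."""
    if n_nodes < 2:
        raise ValueError("a ring needs at least two nodes")
    # queues[i] is node i's mailbox; node i forwards to queues[(i + 1) % n_nodes].
    queues = [asyncio.Queue() for _ in range(n_nodes)]
    # Node 0 is played by this coroutine; nodes 1..n_nodes-1 are separate tasks.
    tasks = [
        asyncio.create_task(ring_node(queues[i], queues[(i + 1) % n_nodes]))
        for i in range(1, n_nodes)
    ]
    token = 0
    for _ in range(n_laps):
        await queues[1].put(token + 1)  # hop from node 0 to node 1
        token = await queues[0].get()   # returns after n_nodes - 1 more hops
    await queues[1].put(None)  # shut the ring down
    await queues[0].get()      # wait for the shutdown signal to come back around
    await asyncio.gather(*tasks)
    return token


if __name__ == "__main__":
    print(asyncio.run(ring(n_nodes=1_000, n_laps=10)))  # 10000 hops
```

Wrapping the asyncio.run call with time.perf_counter gives a rough per-hop cost; comparing that across runtimes is the point of the benchmark.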
>Also when you talk about things like hot swappable code do you have any idea how many systems actually *require* that behavior?
I, for one, would not want to take a distributed service down if I could avoid it; I want to provide the best possible service to my customers.
>Plus any app whose primary state is a database has simply pushed the hard locking to the database (and STM seems to not have found much favour within Erlang, perhaps because the lack of throughput guarantees is not great for the target market)
In Erlang land, we have Mnesia, which is a scalable, distributed “object” database.
>MS are doing some work on continuations (admittedly through CPS compiler rewriting), have already done it with F# and are looking to bake it into C# (so pretty mainstream). Whether it merits it or not will be an interesting case study in 5-10 years methinks.
And as usual, it remains in-process. When it’s cross-process, they might start catching up.
[1] http://www.rodenas.org/blog/2007/08/27/erlang-ring-problem/
Right, if your problem decomposes well into message passing (and many do) this is true, and very useful.
If you need to spend the time to convert the problem into message passing, it’s not free in that sense either.
I’d be intrigued to see what you were using for the lock-free queues in the .NET example.
Were you disallowed from exploiting the fact that only one message is in flight at a time (so each queue only ever needs to hold one item)? And did you have a thread per “process”, or was it done using fibers? I admit doing it with fibers would be a real pain compared to doing it in Erlang.
The number of people writing distributed services is still low, especially for LOB applications (which are surely MSFT’s bread and butter). It is especially daft for MS to try to implement Erlang features when the people who want them already like Erlang. It’s hard to sell something to people as a replacement when it reduces their choices, unless it is much better (for the record, I believe C# as a language is much better[1] than Java).
I have a very low opinion of distributed object databases as anything but caching layers. Pretending they replace a database almost always ends with people tripping up because they assume ACID semantics when they are only getting D (and some value of C). I’m sure there are plenty of uses for them where you don’t need the A and I, but I have yet to find any use for them in my current work.
I agree that out-of-process work is interesting, but in this case it is fundamentally a distribution and security problem. MS and others have various applications of this in the MapReduce-like area (DryadLINQ, for example), but I would note that these are mainly trying to deal with the issue of getting data to the separate nodes, which is a far harder problem than distributing processing. Does Erlang have anything for spreading processes around to deal with multi-gigabyte/terabyte data sets? Genuine question; I’d love to know.
[1] In the sense that I can use it to deliver higher-quality, better-performing applications to my employer faster and with fewer bugs.