The D programming language has the concept of a slice. Slices are a neat way to do data processing with arrays but have a fundamental flaw, related to garbage collection, that I’ll explain here. I’ll also present a way that this flaw can be fixed.
Crash Course on Slices
I’m not going to explain slices in all their glory – see this instead. The short story is this: A slice is simply a length + pointer pair. The pointer points to some arbitrary memory and the length specifies how many contiguous elements are at that location in memory. D does not really have arrays in the traditional sense. Consider this code:
int xs = [1, 2, 3, 4, 5];
The xs variable is not actually an array. It’s a slice with a length of 5 and a pointer to somewhere in memory. We can now do this:
int ys = xs[1 .. 4];
This is what slicing means and where the name slice comes from. The ys variable is now a slice with a length of 3 and a pointer to the second element in xs. Slices are especially beautiful in string processing code since you can pick off parts of a string easily.
The Problem with Slices
First, a thing to note: Slices are passed by value. This means that they are effectively a two-word struct being passed around. This is of course for efficiency reasons, since allocating a heap object for every slice would be unreasonably wasteful.
Now, consider that the pointer part of a slice can point to GC-managed memory. This means that the garbage collector has to scan it. This is not bad in itself, but let’s review the code above:
int ys = xs[1 .. 4];
As said, ys’s pointer is now pointing inside xs’s memory block. We have an interior pointer. This forces the garbage collector to not only scan slices conservatively, but also to assume that their pointer part can be an interior pointer. And this is for every single array/slice in a D program.
This is clearly a huge problem for type-precise garbage collection.
The Proposed Solution
Given an arbitrary slice pointing to any non-null memory, we want to be able to immediately tell whether that block of memory is GC-managed without having to do a heap range check on the slice’s pointer.
There is fundamentally only one way to solve this problem (without introducing unacceptable overhead) with two variations. It boils down to storing a pointer to the original memory block in the slice. This means that slices will become 3 words long. This is a significant change because it means an extra word to copy around when slices are passed between functions. On the other hand, I think the advantages it gives in garbage collection complexity (or rather, reduction thereof) outweigh that.
So, previously, slices look like this (assuming a word size of 8):
------ ------------ Offset Value ------ ------------ 0 Length 8 Pointer ------ ------------
In my scheme, slices look like this:
------ ------------ Offset Value ------ ------------ 0 Length 8 Base Pointer 16 Pointer ------ ------------
The base pointer here is the pointer to the start of the memory block. This ensures that the GC can trivially figured out that the memory is live. The pointer is then the (possibly interior) pointer to the relevant part of the memory block as in the old slice representation.
The other variation looks like this:
------ ------------ Offset Value ------ ------------ 8 Base Pointer 16 Offset ------ ------------
Here, offset means the offset into the base pointer to get the actual region of memory that the slice points to. This does mean a little extra work when doing indexing. It also assumes that the location the base pointer points to stores the memory block’s length. This scheme allows slices to work with arrays in the Common Language Infrastructure and other virtual machines. It would not work for plain D that compiles to native code, however.
I hope one of these variations can be worked into the D language. The current situation is highly problematic for garbage collection — and also prevents a D port to the .NET ecosystem — and needs to be addressed.
Comments very welcome – if you spot a flaw in the proposed solution, do say so. If something about the new scheme seems confusing or unsound, speak up too – it probably indicates a flaw. Suggestions for a better solution are also appreciated!