DRAFT: dot operator blog post

17 Jul 2012

NOTE: The canonical version of this page is http://simblob.blogspot.com/2012/07/playing-with-dot-operator.html

Normally the “[]” operator is used for extracting an element out of a sequence. Sequence[Index] turns into *(Sequence+Index) in C. And since + is commutative, it’s also Index[Sequence]. Yes, you can write 3[a] and it’s the same as a[3]. Is that useful? Probably not. But it’s interesting.

I want to talk about the “.” operator. It’s taken for granted, like “[]”, but I think it’s interesting to question how it works.

Normally the “.” operator is used for extracting an element out of a record. In Python it’s two hash table lookups. Given a.x you first look up a in the current namespace (stack frame) to find a record, then look up x in the record. (See also: plists in Lisp.) In many languages one or both of these steps are optimized, but I’m going to explore the straightforward unoptimized implementation.

Field lookup

Let’s consider field lookups a.x, a.y, b.n, and b.y. (Note: for simplicity I’m conflating variable names and record ids in the diagrams and sample code but they’re really separate things.) Here’s the conventional view:

In code, we can think of it like this (Python syntax):

values = {}
values["a"] = {}
values["b"] = {}
values["a"]["x"] = 3.14
values["a"]["y"] = 2.71
values["b"]["n"] = "no"
values["b"]["y"] = "yes"

We think of the record as primary and the field as secondary. But does it have to be this way? One of the reasons this layout is preferred is optimization. In a language where record types are known, mapping from field name to offset in the record can be resolved at compile time (like this^[1]). This means we can use a simple dereference instead of a hash table lookup. For this blog post however I’m going to ignore optimizability and instead look at the alternatives.

Combined lookup

What if we put both the record id and the field name into a tuple, and made them siblings in the lookup process?

In code, we can think of it like this:

values = {}
values[("a", "x")] = 3.14
values[("a", "y")] = 2.71
values[("b", "n")] = "no"
values[("b", "y")] = "yes"

Are there any downsides to this, other than optimizability? It’s a little more work to enumerate the fields of a record but you can mostly do the same sorts of things with this implementation as you can with the conventional implementation — lookup, assignment, field insertion, field deletion. This approach may be useful in some situations but I didn’t pursue it.

Record lookup

What if we made the field name primary and the record id secondary? Back in the 1990s when I was into programming languages (I’ve since recovered), I played around with this idea.

In code, we can think of it like this:

values = {}
values["x"] = {}
values["y"] = {}
values["n"] = {}
values["x"]["a"] = 3.14
values["y"]["a"] = 2.71
values["n"]["b"] = "no"
values["y"]["b"] = "yes"

One interesting idea that came out of this experimental language was the idea that the field name could be local to a module or function. (See also: symbols in Lisp packages.) With this feature, you can attach fields to records without needing subclassing. (Ruby’s mixins are some like this except they only allow methods, not data.) Records become only IDs; all the actual data is attached from the outside. (See also: database normalization.)

Imagine being able to attach per-module fields to objects. Imagine being able to treat a “field” just as you might treat an “object” in a regular language: take fields as function parameters, return them from factory functions, import/export them with modules, store them inside objects, etc.!

Use in games

Is this useful in games? I think so. I think it might be related to what entity/component systems are trying to build without having direct language support. The entities are the records and the components are the modules and local data. Consider this data:

Object-oriented systems organize this by column. There’ll be a soldier module that defines a class with x, y, and health, and a beacon module that defines a class with x, y, and color. There will probably be a game object superclass that contains x, y.

When you treat fields as first class constructs in the language, you organize this differently — by row. You’d put the x and y fields into one module, health into another, and color into a third. Code that works with soldiers would import the position and health modules. Code that works with beacons would import the position and color modules. And code that works on either would only import the position module. You can mix and match features — for example, an object with both health and color takes no additional classes or code in this system, because health and color are in separate modules. You keep traditional syntax like soldier2.health; it’s just that health might be a local field instead of a global one, visible only in modules that have imported the health field. You can attach local fields to records at run time and detach them when no longer needed.

It’s quite possible this system wouldn’t work out in practice. I never finished my programming language so I can’t point to a concrete test of the principles described here. The ideas from the experiment have definitely influenced how I write code; that’s the outcome I hope for when exploring an idea. It’s fun to take something mundane like “.” and ponder how else it could work.

Field lookup

Combined lookup

Record lookup

Use in games

Endnotes