Some time ago I found a fascinating paper on the history of Erlang, written by one of its main creators:
It showed up on programming.reddit for about a day, and then disappeared right back into the deep waters – which is too bad, because the paper is illuminating.
Erlang is a language with strong built-in support for concurrency based on message passing. As the author describes it:
Erlang was designed for writing concurrent programs that “run forever.” Erlang uses concurrent processes to structure the program. These processes have no shared memory and communicate by asynchronous message passing. Erlang processes are lightweight and belong to the language, not the operating system.
Reading the paper, I had several delightful “aha!” moments, where the details of Erlang design just clicked together. And I don’t mean the mundane issues, like single-assignment variables (hello, Prolog!), or interactively modifiable code (hello, Lisp!). Rather, the history put some architectural bits in perspective.
At a certain level, I always knew that Erlang was designed for programming telephone exchanges. But this paper really conveys the feel for what that means: huge collections of loosely coupled finite-state machines, all running in parallel, doing little bits of protocol validation here and there, but mostly staying dormant.
Typically, the software for call control is modeled using finite state machines that undergo state transitions in response to protocol messages. From the software point of view, the system behaves as a very large collection of parallel processes. At any point in time, most of the processes are waiting for an event caused by the reception of a message or the triggering of a timer. When an event occurs, the process does a small amount of computation, changes state, possibly sends messages to other processes and then waits for the next event. The amount of computation involved is very small.
Which definitely explains a number of their implementation choices which, without this motivation, look highly unusual. Let me highlight just three of them.
First, there’s strong emphasis on easily tracking and manipulating the flow of control, rather than data. Everything is based on message passing. This makes Erlang perfect for control problems: monitoring the state of the world, dispatching events, tracking protocols. It also makes perfect sense, given the pragmatic need to support massive collections of concurrent FSMs.
But here’s the flip side: there is little support for processing large, shared data structures – unless they come from an external source, such as a database, which takes responsibility for locking and synchronization.
So when it comes to problems that really want to be done in a straightforward shared-data style (SIMD scientific computing comes to mind), that’s not as easy anymore – even if the problem itself is inherently parallelizable. One gets a sense that Erlang’s concurrency primitives are general-purpose, but they’re definitely oriented towards very specific applications of concurrency.
Second, I enjoyed the explanation of one particular detail of Erlang: in the message-passing implementation, individual processes don’t handle messages immediately. Instead, there’s a concept of “mailboxes” where messages get queued up, and handled only when the process is in the right state, and has all the messages it needs.
This seems unusual, especially coming from the traditional programming background, where messages are merely a vehicle for doing RPC. But that’s not how this language uses them.
Erlang’s finite state machines implement and validate message-oriented protocols. Immediate message handling would make this much more complicated:
[W]e observed that handling out-of-order messages in a finite state machine led to an explosion in the state space of the finite state machine. This happens more often than you think, in particular when processing remote procedure calls. Most of the telecommunications programs we write deal with message-oriented protocols.
This is a pretty nice idea. In my own previous attempts at building large collections of parallel FSMs, I encountered similar problems, caused by immediate input processing. I ended up having to split a single “conceptual FSM” into multiple concrete FSMs, and broadcast all inputs to all of them, just to be able to deal with out-of-order messages. But then coordinating them all turned into a major problem. In retrospect, a message-based approach with mailboxes and pattern matching over those mailboxes would have been a clear win.
Third, Erlang is somewhat infamous for its distributed error handling mechanism. Most languages are very specific about how errors propagage – Java and C# errors, for example, always travel up the call stack; one could imagine a similar scheme for a message-passing system.
Erlang, however, completely washes its hands of this responsibility. Failed processes post an error, but it’s up to the programmer to make sure some other process somewhere else is looking for those errors.
I was always curious what prompted such an unusual mechanism, and the paper illuminates that:
The idea of links and of the mechanism by which all processes in a set die was due to Mike Williams. Mike’s idea was inspired by the design of the release mechanism used in old analogue telephones and exchanges. In the analogue telephones and in the early electromechanical exchanges, three wires called A, B and C were connected to the phones. The C wire went back to the exchange and through all the electromechanical relays involved in setting up a call. If anything went wrong, or if either partner terminated the call, then the C wire was grounded. Grounding the C wire caused a knock-on effect in the exchange that freed all resources connected to the C line.
This is very powerful. You want errors to propagate via some unexpected route, to a master controller that shuts down all sorts of things, and then maybe attempts recovery? You can set that up fairly easily.
At the same time, it feels dangerous, since one is no longer relying on the runtime for error handling. Now it’s up to the programmers to design an error propagation network. But relying on convention doesn’t scale very well; and besides, isn’t this why we have compilers, to do the boring enforcement work for us?
All in all, a fascinating read!
Erlang-China » [荐]《对“A History of Erlang”的评论》 | 22-Jun-08 at 6:26 am | Permalink
[...] 原文在这里。貌似“功夫网”并未“河蟹”——既然能访问,俺就不盗版了。 [...]