On Clojure (and concurrency in general)

apr 15 2008

I just finished watching Clojure Concurrency, a presentation given by the ever-talented Rich Hickey. It’s almost two hours long, so it’s not something to put on just because you’re bored.

Now, Rich is a really smart guy. The talk is about a language he wrote called Clojure (pronounced ‘closure’), which is a Lisp-1 running on top of the JVM. I had spotted it some time ago on proggit, but it was still at a very early stage (pre-alpha), and I was more infatuated with Erlang at the time, reading Armstrong’s Programming Erlang: Software for a Concurrent World.

Concurrency is a very hot topic these days. The big shots say it’s a new paradigm, and I’m with them on that one. Well, sorta. In his talk, Rich says at one point that “we’ve been doing it wrong.” If you’ve ever tried writing a multithreaded application and ended up in a living hell of acquiring locks, with more mutexes than normal variables, you know the problem at hand.

The “new” proposed way of doing things ties in directly with the also-hyped functional languages (I’m sure you’ve seen Haskell or F# mentioned, no?). The gist of it is immutability. That’s a big word that just means “you can’t change it”: once you’ve assigned a value to something, it stays that way. There’s a section in the aforementioned video on “Persistent Data Structures” that offers some great insight into how this works beyond simple strings or integers, for stuff like vectors or hashes. Once you take the mutable part out of the equation, there’s no need to lock anything. You still need transactions every once in a while, but we’ve been using those in databases for years, so they’re not so scary anymore.
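To make the persistent data structure idea a bit more concrete, here is a minimal sketch in Python. It is not how Clojure implements its vectors or hashes (those use more elaborate trees); it only illustrates the principle on the simplest possible case, an immutable linked list. The names `Node`, `push` and `items` are made up for this example.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Node:
    """One immutable cell of a singly linked list (hypothetical name)."""
    value: object
    rest: Optional["Node"] = None


def push(lst: Optional[Node], value: object) -> Node:
    """Return a NEW list with value in front; the old list is untouched."""
    return Node(value, lst)


def items(lst: Optional[Node]) -> Iterator[object]:
    """Walk the list from front to back."""
    while lst is not None:
        yield lst.value
        lst = lst.rest


base = push(push(None, "b"), "a")   # the list a, b
extended = push(base, "z")          # the list z, a, b

# Both versions stay valid, and the new one reuses the old cells
# instead of copying them (structural sharing).
print(list(items(base)))
print(list(items(extended)))
print(extended.rest is base)
```

Because no cell can change after it is created, any number of threads could read `base` and `extended` at the same time without a lock. "Changing" the list just means building a new version that points at the old one.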

Clojure does not yet have a “platform” for seamlessly distributing computation over a network; its concurrency runs on JVM threads, which map to native threads managed by your OS. Erlang, on the other hand, offers “green threads”, that is, threads that are scheduled by the VM and not by the operating system. This makes creating threads and communicating between them extremely cheap. While it may be argued that distributed computing over a network right in the VM is pretty cool, it probably shouldn’t be the deciding factor when choosing your next programming language.
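As a rough illustration of what “scheduled by the VM and not by the operating system” means, here is a toy round-robin scheduler in Python built on generators. This is only a sketch of the general idea of user-space scheduling, not a description of how the Erlang VM works, and `worker` and `run` are names invented for the example.

```python
from collections import deque


def worker(name, steps, log):
    """A task that hands control back to the scheduler after each step."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # give up control; no OS thread is created or switched


def run(tasks):
    """Round-robin scheduler: every task runs on the one calling OS thread."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until its next yield
            ready.append(task)  # not finished, so queue it again
        except StopIteration:
            pass                # task finished, drop it


log = []
run([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

Spawning a task here costs about as much as creating a small object, which is why this style of thread is cheap. The flip side is that a scheduler like this one lives on a single OS thread, so on its own it never uses more than one core.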

If you want to read more about concurrency and the languages prevailing on the subject these days, I suggest the Wikipedia article Concurrency (computer science), which discusses the infamous “Dining Philosophers” problem, and definitely Joe Armstrong’s essay “Concurrency is easy”.


comments

Vetle, 7 months, 2 weeks ago:

I assume Clojure uses whatever the JVM uses… JVMs used to have green threads, but since green threads have some limitations (they cannot run on separate processors, for instance), they were ditched in favour of native threads. See http://en.wikipedia.org/wiki/Green_threads for more info; green threads do have some benefits as well.

Heh, I actually see now that, according to Wikipedia, Erlang has threads similar to Java’s original green threads.

Of course, if you hit a barrier with Erlang’s threads, you can just start a new Erlang process.

Jesper, 7 months, 2 weeks ago:

Vetle,

Erlang’s green threads are not the same as the ones the JVM used to have. I’m not sure about the implementation specifics, but the threads/processes in Erlang run on multicore CPUs just fine. You can read more about how Erlang deals with concurrency and distribution here. Also, Chapter 20 of the Erlang book, “How to make programs run efficiently on a multicore CPU”, simply states that to use several CPUs efficiently, all you have to do is use lots of processes. If you “outsource” your processing to “actors”, each part of your larger computation will run independently.

Perhaps the confusion is that Erlang does not use actual threads but processes, in the sense that they do not share state the way threads do. They behave much more like OS processes than anything else. Wikipedia has more information.

Vetle, 7 months, 2 weeks ago:

I am fully aware that Erlang can use multiple processes, but if you think green threads can do that, you need to read up on what green threads really are. You really don’t need to link me to the exact same page that I linked to. Notice the conflicting information there? First it says it’s green threads, then green processes?

Anyway… It is impossible for Erlang to run on multiple processors without creating OS processes or threads. The Erlang VM has to submit to the same rules as any user space process.

Jesper, 7 months, 2 weeks ago:

Vetle,

Erlang’s “processes” are commonly referred to as “green threads”, although (as the Wikipedia article states) this is the wrong terminology.

If you feel like sitting through this talk, one of the first things he mentions is that Erlang spawns a scheduler per physical CPU; that is how it handles multicore machines. I understand that any virtual machine is confined by the rules of user space. I never said that Erlang could do multicore without any magic behind it; I said I was unsure of the implementation details. I hope things are clearer now.
