Running C and Python Code on The Web

Last week, Scott Petersen from Adobe gave a talk at Mozilla on a toolchain he’s been creating—soon to be open-sourced—that allows C code to be targeted to the Tamarin virtual machine. Aside from being a really interesting piece of technology, I thought its implications for the web were pretty impressive.

Readers who aren’t familiar with Tamarin may want to first read Frank Hecker’s excellent post from 2006, “Adobe, Mozilla, and Tamarin,” for some background on its goals and why it’s relevant to Mozilla and the open-source community in general.

If I followed his presentation right, Petersen’s toolchain works something like this:

1. A special version of the GNU C Compiler (possibly llvm-gcc) compiles C code into instructions for the Low Level Virtual Machine.
2. The LLVM instructions are converted into opcodes for a custom Virtual Machine that runs in ActionScript, a variant of ECMAScript and sibling of JavaScript.
3. The ActionScript is automatically compiled into Tamarin bytecode by Adobe Flash, which may be further compiled into native machine language by Tamarin’s Just-in-Time (JIT) compiler.
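
To make the middle step a bit more concrete, here is a toy sketch of a virtual machine written in a dynamic language that executes low-level opcodes. I'm using Python as a stand-in for ActionScript, and the opcode names and the `run` function are invented for illustration; they are assumptions of mine, not Petersen's actual instruction set or VM design.

```python
# Toy stack machine: a hypothetical stand-in for a VM written in a dynamic
# language. The opcode names (PUSH, ADD, MUL, HALT) are invented for this sketch.

def run(program):
    """Execute a list of (opcode, operand) tuples and return the top of the stack."""
    stack = []
    pc = 0  # program counter: index of the next instruction to execute
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1]

# Roughly what a compiler might emit for the C expression (2 + 3) * 4.
program = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("ADD", None),
    ("PUSH", 4),
    ("MUL", None),
    ("HALT", None),
]
result = run(program)
```

The point is only that the C program never runs directly on the host: everything it does is mediated by an interpreter loop like this one, which the host's JIT can then speed up.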

The toolchain includes lots of other details, such as a custom POSIX system call API and a C multimedia library that provides access to Flash. There are also some things that Petersen had to add to Tamarin, such as a native byte array that maps directly to RAM, thereby allowing the VM’s “emulation” of memory to have only a minor overhead over the real thing.
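
As a rough illustration of what memory “emulation” on top of a byte array looks like, here is a minimal Python sketch. The `Memory` class and its `load32`/`store32` methods are hypothetical names of my own, and Python's `bytearray` merely stands in for the native byte array added to Tamarin; this is not Petersen's implementation.

```python
import struct

class Memory:
    """Hypothetical flat address space backed by one contiguous byte array."""

    def __init__(self, size):
        self.ram = bytearray(size)  # zero-filled "RAM" of the given size

    def store32(self, addr, value):
        # Write a 32-bit little-endian signed integer at the given address.
        struct.pack_into("<i", self.ram, addr, value)

    def load32(self, addr):
        # Read a 32-bit little-endian signed integer from the given address.
        return struct.unpack_from("<i", self.ram, addr)[0]

mem = Memory(64)
# Rough equivalent of the C statements:
#   int *p = (int *)8;  *p = 1234;  p[1] = *p + 1;
mem.store32(8, 1234)
mem.store32(12, mem.load32(8) + 1)
value = mem.load32(12)
```

In a scheme like this, a C pointer is just an integer index into the array, so a stray pointer can at worst corrupt the program's own emulated memory or trigger an out-of-range error; it cannot reach outside the array.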

The end result is the ability to run a wide variety of existing C code in Flash at acceptable speeds. Petersen demonstrated a version of Quake running in a Flash app, as well as a C-based Nintendo emulator running Zelda; both were eminently playable, and included sound effects and music.

So, once Petersen’s modifications to Tamarin make their way into the next version of Adobe Flash, we can expect to see older commercial games running in the browser. Even more impressive, though, is the sheer volume of existing code that can be made to run inside the browser: Petersen showed us the C-compiled versions of Lua, Ruby, Perl, and Python all running on the web in secure Flash sandboxes.

What this means for Python

The potential implications this has for Python are particularly interesting to me. The ability to run Python on the web is exciting, to say the least; also interesting is the fact that by sandboxing CPython in a virtual machine, we solve a lot of the security issues that currently face the language when it comes to running untrusted code.

Petersen’s work also resonates with a few goals of another project called PyPy. I’m going to try to explain the idea behind PyPy in a later post; for the time being, the slides from my April 2007 ChiPy presentation on PyPy may serve as a passable introduction.

In a nutshell, the difference in mindset between PyPy and Petersen’s work is that the former is radically innovative in scope and mission, while the latter is pragmatic. PyPy’s goal is essentially to move the canonical implementation of Python from C to Python itself, and then use a pluggable toolchain to translate that interpreter to any platform with a configurable set of language and implementation features. In one fell swoop, this modularizes the Python interpreter so that creating and maintaining ports and variants of Python such as IronPython, Jython, and Stackless no longer requires either rewriting the entire interpreter in a different language or branching the CPython source code and making pervasive changes to it.

Rather than focusing on innovation, Petersen’s work focuses on code reuse. Instead of moving a canonical interpreter implementation from C to a dynamic language, his strategy is to simply compile the existing C code to run in a virtual machine that’s implemented in a dynamic language. Both approaches aim to obviate the necessity of “ports” of interpreters to different platforms, and as such their purposes intersect at a common subset of functionality. But Petersen’s work can’t be used to facilitate the innovation of the Python language and its implementation, while PyPy offers few or no tools to reuse existing non-Python code. Perhaps it’s possible to combine the best of both worlds by taking PyPy’s generated C interpreter and using Petersen’s toolchain to allow it to be usable on the web and other places that Tamarin runs.

What this means for the Open Web

To be honest, I’m not quite sure where the dividing line is between the parts of Petersen’s work that are Flash-specific and the parts that can be reused to benefit the Open Web. Since ActionScript is a sibling language to JavaScript, it’s possible that the custom VM he created can be run in a browser with relatively few modifications, albeit much more slowly in Firefox for the time being, since SpiderMonkey-Tamarin integration is not yet complete. Once that’s further along, though, I imagine it should be possible to create C “libraries” that can be used in the toolchain to allow sandboxed C code to interact with web pages rather than Flash apps. Should this be feasible, I think it may prove to be the ultimate in a relatively recent string of next-generation JavaScript virtual machines that allow existing code to run safely in browsers.

Also, in the context of the web, download size is a significant concern because applications are essentially streamed to clients. While Petersen’s toolchain means that it’s possible to instantly inherit most of CPython’s benefits on the web, it also means that we get all of its flaws along with it, such as the fact that the standard CPython distribution is a few megabytes in size. But there are ways to get around this.

In any case, I’m really excited to see how both Petersen’s work and PyPy proceed. I just hope I haven’t misrepresented either one of them here due to a lack of understanding; I’ll try to correct this blog post as I become aware of my mistakes.

July 3, 2008
