Running C and Python Code on The Web

Last week, Scott Petersen from Adobe gave a talk at Mozilla on a toolchain he’s been creating, soon to be open-sourced, that allows C code to target the Tamarin virtual machine. Aside from being a really interesting piece of technology, it has implications for the web that I found pretty impressive.

Readers who aren’t familiar with Tamarin may want to read Frank Hecker’s excellent 2006 post “Adobe, Mozilla, and Tamarin” before this one, for some background on Tamarin’s goals and why it’s relevant to Mozilla and the open-source community in general.

If I followed his presentation right, Petersen’s toolchain works something like this:

  1. A special version of the GNU C Compiler (possibly llvm-gcc) compiles C code into instructions for the Low Level Virtual Machine (LLVM).
  2. The LLVM instructions are converted into opcodes for a custom virtual machine that runs in ActionScript, a variant of ECMAScript and sibling of JavaScript.
  3. Adobe Flash automatically compiles the ActionScript into Tamarin bytecode, which Tamarin’s Just-in-Time (JIT) compiler may further compile into native machine code.
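Step 2 is the least familiar part of this pipeline, so here is a minimal sketch of the general idea. It assumes nothing about Petersen’s actual design: the opcodes (`add`, `lt`, `br_if`, `jmp`, `ret`) are made up and only loosely LLVM-flavored, and Python stands in for ActionScript as the dynamic language the VM is written in. The instruction list is roughly what a compiler might emit for a small C loop.

```python
# Toy VM sketch: a dynamic-language interpreter for low-level, compiled
# instructions. The opcode set is hypothetical and far simpler than LLVM IR.

def run(program, args):
    """Execute (opcode, dest, a, b) tuples; return the value given to 'ret'.

    Operands are either int constants or register names (strings).
    """
    regs = dict(args)  # virtual registers, seeded with the arguments
    pc = 0             # program counter: index of the next instruction

    def val(operand):
        # Constants are used as-is; register names are looked up.
        return operand if isinstance(operand, int) else regs[operand]

    while True:
        op, dest, a, b = program[pc]
        pc += 1
        if op == "add":
            regs[dest] = val(a) + val(b)
        elif op == "lt":
            regs[dest] = 1 if val(a) < val(b) else 0
        elif op == "br_if":
            if val(a):  # conditional jump to instruction index b
                pc = b
        elif op == "jmp":
            pc = b
        elif op == "ret":
            return val(a)
        else:
            raise ValueError("unknown opcode: %r" % (op,))


# Roughly what a compiler might emit for this C function:
#   int sum_below(int n) {
#       int total = 0;
#       for (int i = 0; i < n; i++) total += i;
#       return total;
#   }
SUM_BELOW = [
    ("add", "total", 0, 0),          # 0: total = 0
    ("add", "i", 0, 0),              # 1: i = 0
    ("lt", "c", "i", "n"),           # 2: c = (i < n)
    ("br_if", None, "c", 5),         # 3: if c, go to the loop body
    ("ret", None, "total", None),    # 4: return total
    ("add", "total", "total", "i"),  # 5: total += i
    ("add", "i", "i", 1),            # 6: i += 1
    ("jmp", None, None, 2),          # 7: back to the loop test
]

print(run(SUM_BELOW, {"n": 5}))  # prints 10
```

A dispatch loop like this is the simplest possible design, and not necessarily what Petersen’s VM does; translating each instruction directly into ActionScript that Tamarin’s JIT can compile would probably be much faster.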

The toolchain includes lots of other details, such as a custom POSIX system call API and a C multimedia library that provides access to Flash. There are also some things that Petersen had to add to Tamarin, such as a native byte array that maps directly to RAM, which lets the VM’s “emulation” of memory run with only minor overhead compared to the real thing.
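The byte-array trick is easier to appreciate with a picture of what compiled C expects from memory: one flat, byte-addressable array in which a pointer is just an integer offset. The sketch below models that in Python with a `bytearray` and the standard `struct` module. The class name, the 32-bit word size, and the little-endian byte order are assumptions for illustration, not details from Petersen’s talk; the point of a native byte array in Tamarin is to make loads and stores like these nearly as cheap as real memory accesses.

```python
import struct


class LinearMemory:
    """Flat, byte-addressable memory as seen by compiled C code.

    Pointers are plain integer offsets into a single bytearray.
    Assumed for this sketch: 32-bit ints, little-endian byte order.
    """

    def __init__(self, size):
        self.data = bytearray(size)  # zero-initialized "RAM"

    def load_u8(self, addr):
        # Like reading *(unsigned char *)addr in C.
        return self.data[addr]

    def load_i32(self, addr):
        # Like reading *(int *)addr: 4 bytes, little-endian, signed.
        return struct.unpack_from("<i", self.data, addr)[0]

    def store_i32(self, addr, value):
        # Like *(int *)addr = value.
        struct.pack_into("<i", self.data, addr, value)


def c_strlen(mem, ptr):
    """Count bytes up to the NUL terminator, as C's strlen() does."""
    n = 0
    while mem.load_u8(ptr + n) != 0:
        n += 1
    return n


mem = LinearMemory(64)
mem.store_i32(8, 0x01020304)
print(mem.load_u8(8))         # prints 4: the low byte comes first
mem.data[32:38] = b"hello\0"  # a C string placed at address 32
print(c_strlen(mem, 32))      # prints 5
```

An `int` array at address `base` works the same way: element `i` lives at `base + 4 * i`, which is exactly the pointer arithmetic a C compiler emits.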

The end result is the ability to run a wide variety of existing C code in Flash at acceptable speeds. Petersen demonstrated a version of Quake running in a Flash app, as well as a C-based Nintendo emulator running Zelda; both were eminently playable, and included sound effects and music.

So, once Petersen’s modifications to Tamarin make their way into the next version of Adobe Flash, we can expect to see older commercial games running in the browser. Even more impressive, though, is the sheer volume of existing code that can be made to run inside the browser: Petersen showed us the C implementations of Lua, Ruby, Perl, and Python, compiled with his toolchain, all running on the web in secure Flash sandboxes.

What this means for Python

The implications for Python are particularly interesting to me. Being able to run Python on the web is exciting, to say the least; it’s also interesting that sandboxing CPython in a virtual machine solves many of the security problems the language currently has when it comes to running untrusted code.
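Part of why the sandboxing works is that compiled C code running inside the VM presumably never talks to the operating system directly; its system calls go through the custom POSIX layer the toolchain supplies. This post doesn’t describe how Petersen’s layer actually handles files, so the following is only a hypothetical sketch of the idea: `open`, `write`, `read`, and `close` are serviced from an in-memory table, so an untrusted program (a CPython interpreter included) can believe it is writing to disk without ever touching the host’s file system. The class and method names are made up.

```python
import errno


class SandboxedPosix:
    """Hypothetical POSIX-style file layer for code running inside a VM.

    Guest code calls open/read/write/close as if talking to a real OS,
    but every call is serviced from in-memory dictionaries; nothing here
    touches the host's disk.
    """

    def __init__(self):
        self.files = {}   # path -> bytearray: the guest's private "disk"
        self.fds = {}     # fd -> [path, offset]
        self.next_fd = 3  # 0-2 are conventionally stdin, stdout, stderr

    def open(self, path, create=False):
        if path not in self.files:
            if not create:
                raise OSError(errno.ENOENT, "no such file in sandbox", path)
            self.files[path] = bytearray()
        fd = self.next_fd
        self.next_fd += 1
        self.fds[fd] = [path, 0]
        return fd

    def _entry(self, fd):
        # Reject descriptors the guest never opened (or already closed).
        if fd not in self.fds:
            raise OSError(errno.EBADF, "bad file descriptor")
        return self.fds[fd]

    def write(self, fd, data):
        entry = self._entry(fd)
        path, offset = entry
        self.files[path][offset:offset + len(data)] = data
        entry[1] = offset + len(data)
        return len(data)

    def read(self, fd, count):
        entry = self._entry(fd)
        path, offset = entry
        chunk = bytes(self.files[path][offset:offset + count])
        entry[1] = offset + len(chunk)
        return chunk

    def close(self, fd):
        self._entry(fd)
        del self.fds[fd]


posix = SandboxedPosix()
fd = posix.open("/home/user/notes.txt", create=True)
posix.write(fd, b"hello")
posix.close(fd)
fd = posix.open("/home/user/notes.txt")
print(posix.read(fd, 100))  # prints b'hello'
```

A real sandbox might instead refuse file access outright or map it onto some browser-side storage; the essential property is only that the guest never gets a handle to the real file system.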

Petersen’s work also resonates with a few goals of another project called PyPy. I’m going to try to explain the idea behind PyPy in a later post; for the time being, the slides from my April 2007 ChiPy presentation on PyPy may serve as a passable introduction.

In a nutshell, the difference in mindset between PyPy and Petersen’s work is that the former is radically innovative in scope and mission, while the latter is pragmatic. PyPy’s goal is essentially to move the canonical implementation of Python from C to Python itself (more precisely, to RPython, a restricted subset of Python), and then use a pluggable toolchain to translate that interpreter to any platform, with a configurable set of language and implementation features. In one fell swoop, this modularizes the Python interpreter so that creating and maintaining ports and variants of Python like IronPython, Jython, and Stackless no longer requires either rewriting the entire interpreter in a different language or branching the CPython source code and making pervasive changes to it.
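The “pluggable toolchain” idea is what lets PyPy’s approach scale: the interpreter is described once, and interchangeable backends turn that one description into code for different platforms. The toy below shows only the shape of that idea, under heavy simplification: a tiny expression tree is emitted through either a C backend or a JavaScript backend. PyPy’s real translator works on flow graphs of RPython programs and is far more involved; every function name here is invented for illustration.

```python
# Toy "one description, many backends" translator. Expressions are
# nested tuples such as ("add", ("var", "x"), ("num", 1)).

def emit_expr(node):
    """Render an expression; the syntax used is valid in both C and JS."""
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind == "var":
        return node[1]
    if kind in ("add", "mul"):
        op = "+" if kind == "add" else "*"
        return "(%s %s %s)" % (emit_expr(node[1]), op, emit_expr(node[2]))
    raise ValueError("unknown node: %r" % (kind,))


def c_backend(name, params, body):
    # C needs explicit types; this sketch assumes everything is a long.
    args = ", ".join("long " + p for p in params)
    return "long %s(%s) { return %s; }" % (name, args, body)


def js_backend(name, params, body):
    # JavaScript is dynamically typed, so parameters need no types.
    return "function %s(%s) { return %s; }" % (name, ", ".join(params), body)


def translate(name, params, expr, backend):
    """Same source description, different target chosen by 'backend'."""
    return backend(name, params, emit_expr(expr))


square_plus_one = ("add", ("mul", ("var", "x"), ("var", "x")), ("num", 1))
print(translate("square_plus_one", ["x"], square_plus_one, c_backend))
# prints: long square_plus_one(long x) { return ((x * x) + 1); }
print(translate("square_plus_one", ["x"], square_plus_one, js_backend))
# prints: function square_plus_one(x) { return ((x * x) + 1); }
```

Supporting a new platform then means writing one more backend rather than another copy of the interpreter.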

Rather than focusing on innovation, Petersen’s work focuses on code reuse. Instead of moving a canonical interpreter implementation from C to a dynamic language, his strategy is simply to compile the existing C code to run in a virtual machine that’s implemented in a dynamic language. Both approaches aim to eliminate the need for “ports” of interpreters to different platforms, so their purposes overlap. But Petersen’s work can’t help anyone innovate on the Python language or its implementation, while PyPy offers few or no tools for reusing existing non-Python code. Perhaps it’s possible to get the best of both worlds by taking the C interpreter that PyPy generates and using Petersen’s toolchain to run it on the web and anywhere else Tamarin runs.

What this means for the Open Web

To be honest, I’m not quite sure where the line falls between the parts of Petersen’s work that are Flash-specific and the parts that can be reused to benefit the Open Web. Since ActionScript is a sibling language to JavaScript, it’s possible that the custom VM he created could run in a browser with relatively few modifications, albeit much more slowly in Firefox for the time being, since SpiderMonkey-Tamarin integration is not yet complete. Once that integration is further along, though, I imagine it should be possible to create C “libraries” for the toolchain that let sandboxed C code interact with web pages rather than Flash apps. If this is feasible, I think it could be the ultimate example of a recent string of next-generation JavaScript virtual machines that allow existing code to run safely in browsers.

Also, in the context of the web, download size is a significant concern because applications are essentially streamed to clients. While Petersen’s toolchain means that we can instantly inherit most of CPython’s benefits on the web, it also means that we get all of its flaws along with it, such as the fact that the standard CPython distribution is several megabytes in size. But there are ways to get around this.

In any case, I’m really excited to see how both Petersen’s work and PyPy proceed. I just hope I haven’t misrepresented either of them here due to a lack of understanding; I’ll try to correct this blog post as I become aware of my mistakes.

42 Replies to “Running C and Python Code on The Web”

  1. There are already some ideas about how to make PyPy work together with Tamarin: the plan is to write an ABC backend, which would give you a full Python interpreter running in Flash. The question is how fast that would be, but probably not too bad.

  2. Sounds cool. I am a Ruby guy, but I want a strong Python challenger so Ruby can play a chasing game. In the end that will benefit both worlds.

    I also hope that one day we can replace JavaScript with a language of our choice, even if it only runs on the few “candidate client machines” that have this language installed as well. In times of almost-free bytes and cheap CPU power, the easier languages should really gain more momentum, and JavaScript holds them back on the web.

  3. markus, I don’t think you get it: they are working towards porting C code to run on the Tamarin ECMAScript VM. This means you should obviously be able to recompile the Ruby C interpreter for Tamarin as well and run Ruby code in your browser.

    This is interesting, but it seems redundant to me. Is the ultimate goal of a web browser really to replace the desktop as a whole? “Oh, we can now run GNOME from within Firefox and open up a recompiled Firefox from there…” 😛

    OTOH, I can see the benefit of transparently accessing networked resources and running them right away, with no need for installation. This is certainly great for games, emulators, interactive tutorials, etc.

  4. But how will Tamarin stack up against Silverlight (IronPython?) in terms of speed? If it is an order of magnitude slower, then IronPython may be a better choice for Python in the browser.

  5. Wow, David! So that means we’re coming full circle by running compiled x86 code in an x86 Java emulator, which is then JITted back to x86 code? Yes, I understand it could be used to run on other CPU architectures, though I’m not sure most x86 code stops at basic x86 features; it most likely also accesses other peculiar PC features.

  6. @anonymous: No, this will not make it easier to create worms/viruses. At the moment we are only talking about code running in the flash sandbox, so it does not introduce any vulnerabilities not already present in flash applications. The only caveat is that if a new exploit was discovered in the Tamarin VM, the new contiguous byte-array type that is being introduced might make it easier to run arbitrary code, since it would be easier to manipulate and predict the structure and contents of certain blocks of memory.

  7. So the custom POSIX library does not allow the application to create files in arbitrary locations? I mean, an attacker could download some trojan from the web and save it to the user’s Desktop. So do POSIX file-related system calls also go through a security check?

  8. Yeaaaaaah, so we’re going to allow any website to download DLLs into a VM and have them execute… this will end well, won’t it 🙂

    Yay security industry, been around since the dawn of time, and thanks to Adobe and friends we’re still going strong!

  9. Let’s not forget that Python code can be run today in Firefox and Internet Explorer using Microsoft’s Silverlight plug-in.

    Another note for new scripting language authors: scripting languages that run in a browser need DOM access and full-featured runtime libraries. Getting a new browser-based VM up and running is only part of the battle.

  10. This is the problem with Flash: it started out as an animation tool and is trying to map a programming model onto it.

    This is just more ass-backwards thinking.

  11. Pingback: Fuzzy Logic

  12. I don’t quite get the point of the project. Implementing C in a Web browser can cause some serious security flaws (from remote file download to arbitrary code execution), performance problems (imagine KDE running in FF), compatibility issues (Flash Player behaves quite weirdly on Mac OS X, for example), and so on.

    IMHO creating a “virtual PC” in a browser is pointless. Wouldn’t it be better to write a sophisticated CMS in Java as a desktop application? Since I’ve written one, I can tell you that such an approach is way more flexible and faster than creating the app in a Web browser…

  13. foobar,

    You obviously have no idea where the Flash platform is in 2008. As of version 9, Flash Player has grown up into a fully fledged bytecode VM (like Java) with a 10x performance increase. Flash files (.swf) can be created either with Flash, for designers, or with Flex, for developers. There’s nothing about it that limits it to animations.

    I have 15 years of desktop dev experience and find the Flex developer experience to be very good and still improving steadily. If you ignore history and just look at the whole dev/runtime environment now, it’s like real programming, but executable on desktop or web, Mac, Windows, or Linux, virtually without issues.

  14. > In a nutshell, the difference in mindset between PyPy and Petersen’s work is that the former is radically innovative in scope and mission, while the latter is pragmatic.

    Not claiming that it was invented by Squeak, but that environment has been using a similar technique for a while now:

  15. I think a lot of people are misunderstanding what this is all about.
    It is not running C code in Flash Player.
    It does not allow DLLs or other machine code to be loaded and executed.

    It is really just a way of compiling C code to bytecode that the AVM reads (e.g. compiling C code to SWF).
    There will be limitations, sure (e.g. no file write access), but these are just limitations of the Flash Player.

    There are loads of really good things to do with this, especially porting stuff to the Flash Player. There is sooo much open-source C code out there; if this compiler works well, we will start seeing lots more stuff ported to the Flash Player. No more waiting for someone to reimplement zlib in AS3, for example: just compile it from its C code.

  16. Neat. You have to wonder what Adobe’s motivation is: do they just have smart compiler guys trying stuff out, or are they porting something specific, like Acrobat Reader?

    Flash is nearly everywhere, but Acrobat isn’t: Preview dominates on the Mac, and then there’s Evince on Linux. If you don’t use Acrobat, you don’t get the plugins that Adobe’s back-office tools enable (e.g. for measurement), or FDF support.

    So a Flash-based Acrobat in the browser might sell them more back-office tools, or an AIR-based Acrobat might help them make AIR more ubiquitous.

  17. Is it just me, or are the security implications of running C code inside a browser not massive?

  18. @Baz: Since the market shares of Apple and Linux are extremely low, especially in the business desktop arena, and almost all the business computers I have ever used have Acrobat and the Acrobat plug-in installed (which allows in-browser PDF viewing), I doubt this is their motivation.

  19. If only we could compile and execute Java code in the browser… Oh, hell, we can! We have been able to for at least 10 years! It’s absolutely awesome!

  20. anonymous: No, Squeak is quite different from PyPy. While there are superficial similarities, the core ideas are different. Squeak is written in Slang, a subset of Squeak. However, that subset is chosen in such a way that it maps directly to C. It doesn’t give you any high-level features (such as Garbage Collection or metaprogramming capabilities). It would be extremely hard to translate Slang to Java or .NET bytecode.

    PyPy, on the other hand, implements its interpreter in RPython, which is also a subset of Python, but a much higher-level one. It is very close to normal Python, but imposes some restrictions on how types are used. It is quite far from C, and therefore it is possible to do very different things with it than just translating it to C, like translating it to the JVM or .NET, auto-generating a JIT, etc.

  21. >Yeaaaaaah so we’re going to allow any website to download DLLs into a VM and have them execute…this will end well wont it

    You just described Silverlight…


  22. I don’t understand why running C programs in the browser as described is supposed to be a security problem. The C programs don’t have any more ways to interact with the OS than ordinary JavaScript programs do. So either the current model is already bad (which is possible), or the new things aren’t exactly worse than what we have now.

  23. What are the implications for ActionScript development? Is it possible that scripting in C would replace scripting in ActionScript within the Flash environment?

  24. Having a Python compiler for the Tamarin VM in Adobe Flash and Mozilla Firefox 4 is going to be amazing, and is just the tool Adobe needs to compete with Silverlight 3.
