September 11, 2009

Coming At You Like A Pydermonkey

Since learning JavaScript over a year ago, it’s become one of my favorite dynamic programming languages alongside Python. And as I’ve mentioned before, I think the two languages actually complement each other pretty well.

Python, at its heart, is a platform that’s built to be extended. The evidence for this is plentiful: there’s modules and packages out there that offer practically any functionality you want, from web servers to 3D game engines to natural language processing toolkits and more, all instantly accessible through a simple command or an installer download. Yet one of the costs of all this generativity has been the fact that Python doesn’t really have much of a security model to speak of: any Python program has as much access to the underlying system as the current user does, which, compared to the Web, is basically omnipotence. Creating programs that obey the principle of least privilege is pretty hard.

JavaScript, on the other hand, has many of the opposite problems. For one thing, it’s really built for embedding: until the very recent advent of CommonJS and Narwhal, for instance, the language has always lacked a general-purpose platform and standard library. A Pythonic way of saying this is that the language doesn’t come with “batteries included”, but this can actually be a good thing from a security standpoint: because the simplest possible embedding has no privileges and needs to be explicitly given all its capabilities by its embedder, it’s very easy to follow the principle of least privilege. Recent work on membranes and capability models puts JavaScript way ahead of many other languages in the security realm, yet the lack of a mature general-purpose platform has meant that anyone who’s wanted to leverage these strengths has always had to muck around in C/C++ to create the kind of embedding they wanted.

Well, to an extent. One of the many aspects of Java that I’ve frequently been envious of has been Rhino, a JavaScript engine written entirely in Java, which allows anyone who knows Java to create their own embedding solution that leverages Java’s strengths. But I prefer Python to Java, and moreover, the engine itself isn’t worked on with as much intensity as the JS engines that power real-world consumer products like V8 and SpiderMonkey—so new language features are slow to be implemented and performance isn’t great.

I’d briefly tried resurrecting John J. Lee’s Python-Spidermonkey last year but I soon discovered that it wasn’t really what I wanted. For instance, JS objects were copied into Python approximations as they crossed the language boundary, which resulted in a “lossy” transfer and prevented features like identity perseverance. It was essentially a high-level wrapper created to solve a specific problem, rather than a low-level tool intended to enable any kind of wrapping based on context (e.g., how trusted the JS code is).

Introductions

In part because of all this, and in part because I’d always wanted to write a Python C extension from scratch, I’ve decided to create a new Python-Spidermonkey bridge: Pydermonkey.

Pydermonkey’s mission is pretty simple and straightforward: it’s just meant to wrap Spidermonkey’s C API as faithfully as possible—including its debugging API—while enforcing the memory safety that Python is known for. This makes it awfully low-level for casual programmers, but thanks to Python’s awesome support for magic methods, it’s not hard to create high-level wrappers that provide much more convenient bridging between JavaScript and Python code.

Where It's At

Pydermonkey is currently at version 0.0.6; its API supports a decent subset of the Spidermonkey C API, but it’s still quite lacking in places: operation callbacks will allow you to run untrusted code that runs in infinite loops, throw hooks allow for full Python-esque stack tracebacks of JS code, yet property catchalls haven’t yet been implemented, which means that security is constrained to conventional sandboxing (membranes and object capabilities aren’t currently possible). There’s also the nasty problem of not being able to detect reference cycles that cross language boundaries, which means that such cycles need to be broken manually for now.

Getting It

Pydermonkey is available at the Python Package Index in source form, and as a precompiled binary for the few platforms that I happen to have access to at the moment.

You should be able to type easy_install pydermonkey at the command line and everything should “just work”: I’ve set up the Paver build script such that the Spidermonkey source code is automatically downloaded and built before the C extension if you’ve got the compiler toolchain on your system, though there are a few snags on Windows to circumnavigate. For more information, read the Pydermonkey documentation. And please feel free to file a bug if you run into one!

Where To Go From Here

If you’d like to see an example of a high-level wrapper, check out my Pydertron experiment. It provides a simple interface to expose untrusted JS functionality to Python code and also contains a CommonJS-compliant implementation of the SecurableModule standard. I’m also playing around with creating a Pydermonkey engine for Narwhal on github; contributions to any of these codebases are more than welcome, and there’s some low-hanging fruit in Pydermonkey that would be perfect for students or first-time contributors.

Finally, if you do anything interesting with Pydermonkey, I’d love to know about it.