Coming At You Like A Pydermonkey

Since learning JavaScript over a year ago, it’s become one of my favorite dynamic programming languages alongside Python. And as I’ve mentioned before, I think the two languages actually complement each other pretty well.

Python, at its heart, is a platform that’s built to be extended. The evidence for this is plentiful: there are modules and packages out there that offer practically any functionality you want, from web servers to 3D game engines to natural language processing toolkits and more, all instantly accessible through a simple command or an installer download. Yet one of the costs of all this generativity has been the fact that Python doesn’t really have much of a security model to speak of: any Python program has as much access to the underlying system as the current user does, which, compared to the Web, is basically omnipotence. Creating programs that obey the principle of least privilege is pretty hard.

JavaScript, on the other hand, has many of the opposite problems. For one thing, it’s really built for embedding: until the very recent advent of CommonJS and Narwhal, for instance, the language has always lacked a general-purpose platform and standard library. A Pythonic way of saying this is that the language doesn’t come with “batteries included”, but this can actually be a good thing from a security standpoint: because the simplest possible embedding has no privileges and needs to be explicitly given all its capabilities by its embedder, it’s very easy to follow the principle of least privilege. Recent work on membranes and capability models puts JavaScript way ahead of many other languages in the security realm, yet the lack of a mature general-purpose platform has meant that anyone who’s wanted to leverage these strengths has always had to muck around in C/C++ to create the kind of embedding they wanted.

Well, to an extent. One of the many aspects of Java that I’ve frequently been envious of has been Rhino, a JavaScript engine written entirely in Java, which allows anyone who knows Java to create their own embedding solution that leverages Java’s strengths. But I prefer Python to Java, and moreover, Rhino isn’t worked on with as much intensity as JS engines like V8 and SpiderMonkey, which power real-world consumer products. As a result, new language features are slow to be implemented and performance isn’t great.

I’d briefly tried resurrecting John J. Lee’s Python-Spidermonkey last year but I soon discovered that it wasn’t really what I wanted. For instance, JS objects were copied into Python approximations as they crossed the language boundary, which resulted in a “lossy” transfer and prevented features like identity preservation. It was essentially a high-level wrapper created to solve a specific problem, rather than a low-level tool intended to enable any kind of wrapping based on context (e.g., how trusted the JS code is).
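To make the difference between a “lossy” copy and identity preservation a bit more concrete, here’s a minimal pure-Python sketch. It doesn’t use Python-Spidermonkey or any real engine: JSHandle, copy_to_python, Wrapper and wrap are all hypothetical names I’ve made up for illustration, with JSHandle standing in for an object that lives on the JS side.

```python
import weakref


class JSHandle:
    """Hypothetical stand-in for an object that lives inside the JS engine."""

    def __init__(self, **props):
        self.props = dict(props)


def copy_to_python(handle):
    """Lossy approach: build a fresh Python dict on every boundary crossing."""
    return dict(handle.props)


class Wrapper:
    """Thin Python-side proxy that keeps pointing at the original JS object."""

    def __init__(self, handle):
        self.handle = handle


# Maps id(handle) -> live Wrapper. An entry disappears once its wrapper is
# garbage-collected; the wrapper keeps the handle alive, so ids can't be
# reused while an entry still exists.
_wrappers = weakref.WeakValueDictionary()


def wrap(handle):
    """Identity-preserving approach: hand out one wrapper per JS object."""
    key = id(handle)
    wrapper = _wrappers.get(key)
    if wrapper is None:
        wrapper = Wrapper(handle)
        _wrappers[key] = wrapper
    return wrapper
```

With the copying approach, two crossings of the same object yield two unrelated dicts, and changes made on the Python side never reach the original. With the wrapper cache, the same JS object always maps to the same Python object, so `is` comparisons and mutations behave the way you’d expect.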

Introductions

In part because of all this, and in part because I’d always wanted to write a Python C extension from scratch, I’ve decided to create a new Python-Spidermonkey bridge: Pydermonkey.

Pydermonkey’s mission is pretty simple and straightforward: it’s just meant to wrap Spidermonkey’s C API as faithfully as possible—including its debugging API—while enforcing the memory safety that Python is known for. This makes it awfully low-level for casual programmers, but thanks to Python’s awesome support for magic methods, it’s not hard to create high-level wrappers that provide much more convenient bridging between JavaScript and Python code.
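As a rough illustration of what such a high-level wrapper could look like, here’s a sketch built on Python’s magic methods. Everything in it is an assumption made for the example: LowLevelObject, get_property, set_property and call_function are hypothetical stand-ins for a low-level, function-style API and are not Pydermonkey’s actual names, and JSProxy is just one possible design.

```python
class LowLevelObject:
    """Hypothetical raw engine object; NOT Pydermonkey's real API."""

    def __init__(self, props=None, func=None):
        self.props = dict(props or {})
        self.func = func


# Hypothetical low-level, C-API-style functions operating on raw objects.
def get_property(obj, name):
    return obj.props[name]


def set_property(obj, name, value):
    obj.props[name] = value


def call_function(obj, args):
    return obj.func(*args)


class JSProxy:
    """High-level wrapper that maps Python syntax onto the low-level calls."""

    def __init__(self, raw):
        # Bypass our own __setattr__ so the raw handle lives on the proxy.
        object.__setattr__(self, "_raw", raw)

    def __getattr__(self, name):
        # Only invoked when normal lookup fails, so "_raw" never lands here.
        try:
            value = get_property(self._raw, name)
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested engine objects so attribute chains keep working.
        return JSProxy(value) if isinstance(value, LowLevelObject) else value

    def __setattr__(self, name, value):
        # Every attribute assignment is forwarded to the wrapped object.
        set_property(self._raw, name, value)

    def __call__(self, *args):
        return call_function(self._raw, args)
```

Given a raw object with an `add` function property, `JSProxy(raw).add(2, 3)` reads like ordinary Python even though every step goes through the low-level functions. A wrapper for untrusted code would presumably want to be much stricter about what it forwards, which is exactly the kind of choice a low-level bridge leaves up to the wrapper’s author.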

Where It’s At

Pydermonkey is currently at version 0.0.6. Its API supports a decent subset of the Spidermonkey C API, but it’s still quite lacking in places. On the plus side, operation callbacks let you safely run untrusted code that might loop forever, since the embedder gets a chance to interrupt it, and throw hooks allow for full Python-esque stack tracebacks of JS code. Property catchalls haven’t been implemented yet, though, which means that security is constrained to conventional sandboxing (membranes and object capabilities aren’t currently possible). There’s also the nasty problem of not being able to detect reference cycles that cross language boundaries, which means that such cycles need to be broken manually for now.

Getting It

Pydermonkey is available at the Python Package Index in source form, and as a precompiled binary for the few platforms that I happen to have access to at the moment.

You should be able to type easy_install pydermonkey at the command line and everything should “just work”: I’ve set up the Paver build script such that the Spidermonkey source code is automatically downloaded and built before the C extension if you’ve got the compiler toolchain on your system, though there are a few snags on Windows to work around. For more information, read the Pydermonkey documentation. And please feel free to file a bug if you run into one!

Where To Go From Here

If you’d like to see an example of a high-level wrapper, check out my Pydertron experiment. It provides a simple interface to expose untrusted JS functionality to Python code and also contains a CommonJS-compliant implementation of the SecurableModule standard. I’m also playing around with creating a Pydermonkey engine for Narwhal on github; contributions to any of these codebases are more than welcome, and there’s some low-hanging fruit in Pydermonkey that would be perfect for students or first-time contributors.

Finally, if you do anything interesting with Pydermonkey, I’d love to know about it.

9 Replies to “Coming At You Like A Pydermonkey”

  1. Yep–as I mention in the documentation, Spidermonkey’s C API relies a fair bit on preprocessor macros that are defined in its header files. Because Pyrex couldn’t directly import header files, this was actually a liability with Python-Spidermonkey too: if Python’s conception of how the macros worked were ever out-of-sync with SpiderMonkey’s, strange crashes could occur. This was further exacerbated by the fact that a number of structs and macros have conditional definitions depending on the specifics of the platform Spidermonkey was compiled for, and what flags were set when Spidermonkey was compiled, which meant that even if I didn’t have to compile a Python C extension for each platform, I’d still probably have to compile the Spidermonkey DLL for each platform, which is almost as much of a hassle.

    In other words, things I got “for free” from Spidermonkey’s build system when compiling in C were suddenly things I had to be responsible for when doing things via ctypes.

    Aside from that, in a lot of ways, the static type checking done by the C compiler was actually a huge aid in development–kind of like a really detailed contract system.

    I also wanted to have the freedom to be able to access Spidermonkey functionality that isn’t necessarily publicly exposed through the JSAPI. One example of this is the parser/AST functionality that JSHydra taps into: because it accesses C++ classes that can’t easily be exposed through a shared library, it’s actually impossible to do what JSHydra does without statically linking to Spidermonkey.

    And finally, I wanted full access to Python’s C API, as I had a hunch that, being a language-to-language bridge, I’d eventually need to tap into some of Python’s low-level functionality that may not be exposed to Python proper, such as the ability to provide custom object finalization, hold the GIL long enough to perform a series of operations atomically, etc. I’m not actually sure if this hunch proved correct or not, though.

  2. Unspeakable glee. Thanks for taking this on, and rigorously documenting and testing.

    On a side note, apart from Rhino, Default, Secure, and Browser, which are a sort of minimum set of engines to get Narwhal and Tusk up and running, we’ve engineered the bootstrapping process so that engines can be installed as packages with Tusk. I recommend making a narwhal-pydermonkey package, organized like http://github.com/tlrobinson/narwhal-v8/tree/master. Then, adding a NARWHAL_ENGINE_HOME to narwhal.conf, or using “tusk engine pydermonkey” will set up a project with that engine.

  3. @Kris: Thanks! I really like the bootstrapping process you’ve built into Narwhal. I’ll let you know when I’ve finished porting the Pydermonkey engine!

    @Patrick: Yeah, a standardized “PyJsApi” would be really slick. I guess it’d even be nice at the C/C++ level, to allow embedders to easily switch from one engine to another, but since different interpreters use different GC schemes that clients need to deal with separately … maybe that’s not workable. Anyhow, thanks for the ctypes code!

  4. Please, change the name of the program. For Russian-speaking users the first part of it (“pyder”) is similar to a rude Russian word meaning “homosexual”.

  5. I’m happy to see a bridge between the two great languages python & javascript.

    I want Pydermonkey to help me access objects & functions stored in a JavaScript library. Can you help me figure this out?

  6. I’m also looking to access JavaScript libraries using Pydermonkey. Are there any examples or better documentation on how to do this?

  7. @jpmalich and @selex: Probably the best place to look for examples of how to load JS files is the Pydertron library I linked to.

    Note, though, that this also depends on what your definition of “JavaScript library” is: the core JS language doesn’t actually define a lot of objects that are present in Web pages, like window, document, XMLHttpRequest, and the like, so if your JS libraries expect those to exist, they won’t work. You can try implementing some of those things in Python and exposing them to the JS sandbox, but it’s non-trivial.
