May 28, 2008

Mozilla Weave: A Bird's-eye View

For the past week at Labs, many of us have been sprinting to get Weave to release 0.2, which aims to let users sync all their information between multiple browsers. This is something that Google Browser Sync has been doing for a while, but Weave adds an interesting twist: because it aims to maximize user privacy, the data stored in the cloud is encrypted with a passphrase that only the user knows. As a result, there’s no way for even the owner of the cloud to mine that data. And because Weave also aims to allow users to optionally share parts of their data with other users and third parties, it effectively becomes a tool for all kinds of secure communications.

Because Weave is fully open-source and uses some pretty standard technologies to achieve what I think are some pretty cool things, I thought it’d be useful to write a bit about how it currently works. Many thanks to Dan Mills for explaining most of this to me.

The Weave server at services.mozilla.com is, for the most part, just a WebDAV server; this design was intentional, to allow for straightforward scalability and to let anyone easily set up their own Weave server if needed. Another part of the reason for such a “thin server” is necessity, though: since the server can’t understand what it’s storing (that’s part of the whole “maximizing user privacy” thing), virtually all of the syncing logic has to be done on the client side.
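To give a feel for what “just a WebDAV server” means in practice, here’s a minimal sketch in Python of the kind of request a client could use to list a user directory. PROPFIND with a Depth header is standard WebDAV; the host weave.example.com, the /user/ path, and the function name are all assumptions of mine for illustration, not Weave’s actual URL layout. The request is only built here, never sent.

```python
import urllib.request


def build_listing_request(base_url: str, username: str) -> urllib.request.Request:
    """Build (but do not send) a WebDAV PROPFIND request for a user directory.

    The URL layout is hypothetical; only the PROPFIND method and the
    Depth header are standard WebDAV.
    """
    url = f"{base_url.rstrip('/')}/{username}/"
    # "Depth: 1" asks for the directory itself plus its immediate children.
    return urllib.request.Request(url, method="PROPFIND", headers={"Depth": "1"})


request = build_listing_request("https://weave.example.com/user", "alice")
print(request.get_method(), request.full_url)
```

Since nothing here is Weave-specific, any stock WebDAV server could in principle answer such a request, which is the point of keeping the server thin.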

Snooping Around

After installing Weave, registering for an account on services.mozilla.com, and syncing my data for the first time, I snooped into my user directory. This kind of hackery was surprisingly easy: I just went to services.mozilla.com, clicked on the “Sign in to Weave” button, and entered my email address and password.

The contents of my user directory were as follows:

meta/
  version
private/
  privkey
public/
  pubkey
user-data/
  bookmarks/
    deltas.json
    keys.json
    snapshot.json
    status.json
  history/
    deltas.json
    keys.json
    snapshot.json
    status.json

The meta directory is just meant to contain metadata about the user account; right now it just contains the unencrypted file version, which contains a single number, 2. This is the version of the Weave directory schema that the user directory is stored in, and the Weave client checks it to make sure that it’s “on the same page” as the data store, so to speak.
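The version check is simple enough to sketch in a few lines of Python. This is my own illustration rather than the Weave client’s code: the constant and function names are made up, and what the real client does when the versions don’t match is not something I’m describing here.

```python
SUPPORTED_SCHEMA_VERSION = 2  # the number found in meta/version


def check_schema_version(version_file_text: str) -> bool:
    """Return True if the stored schema version matches the client's."""
    try:
        stored = int(version_file_text.strip())
    except ValueError:
        # The file did not contain a plain number.
        return False
    return stored == SUPPORTED_SCHEMA_VERSION


print(check_schema_version("2\n"))  # contents of meta/version as described above
```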

The private/privkey file contains the user’s encrypted private key, and the public/pubkey file contains the user’s unencrypted public key. This is where things get a little complicated.

Asymmetric Cryptography

In order for users to share some of their data with each other and third parties while maintaining maximal privacy and control over all their data, all of a user’s data can’t just be encrypted with their encryption passphrase; otherwise, someone who wanted to share their bookmarks with a friend would have to give them their encryption passphrase, which, to quote Eran Hammer-Lahav, is much like going to dinner and giving your ATM card and PIN code to the waiter when it’s time to pay.

To get around the issue of privilege delegation, Weave uses asymmetric cryptography, the same kind of cryptography that powers secure transactions on the Web. In short, others can encrypt a piece of data with your public key, and you can decrypt it with your private key, and vice versa. As the names suggest, the public key is something that is publicly available to everyone, and the private key is something that only you can access. And in Weave, you’re literally the only one who can access this private key, despite the fact that it’s stored on the Weave server at private/privkey, because your private key is itself encrypted using your encryption passphrase.
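Here’s a toy sketch in Python of both ideas: anyone can encrypt with the public key, and the private key can sit on a server safely because it is wrapped with the passphrase. Everything in it is an assumption for illustration: the tiny textbook RSA numbers, the XOR wrapping, the salt, and the function names are mine, and none of it is secure or taken from Weave, which relies on a real cryptography library.

```python
import hashlib

# Toy RSA key pair from two tiny primes (illustration only; real keys are
# thousands of bits long and use padding).
P, Q = 61, 53
N = P * Q                          # public modulus
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def encrypt_with_public_key(message: int) -> int:
    """Anyone who knows (E, N) can do this."""
    return pow(message, E, N)


def decrypt_with_private_key(ciphertext: int, d: int) -> int:
    """Only the holder of the private exponent can do this."""
    return pow(ciphertext, d, N)


def _keystream(passphrase: str, length: int) -> bytes:
    # Stretch the passphrase into `length` bytes of key material.
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), b"toy-salt", 1000, dklen=length
    )


def wrap_private_key(d: int, passphrase: str) -> bytes:
    """Toy passphrase wrapping: XOR the key bytes with the keystream."""
    raw = d.to_bytes(4, "big")
    return bytes(a ^ b for a, b in zip(raw, _keystream(passphrase, len(raw))))


def unwrap_private_key(blob: bytes, passphrase: str) -> int:
    raw = bytes(a ^ b for a, b in zip(blob, _keystream(passphrase, len(blob))))
    return int.from_bytes(raw, "big")


stored_privkey = wrap_private_key(D, "correct horse")  # what the server would hold
ciphertext = encrypt_with_public_key(65)               # done by someone else
recovered_d = unwrap_private_key(stored_privkey, "correct horse")
plaintext = decrypt_with_private_key(ciphertext, recovered_d)
print(plaintext)
```

The server only ever sees stored_privkey, so without the passphrase it has nothing it can decrypt with.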

The nice thing about Weave, though, is that it doesn’t expose any of this confusing public-private-key mumbo-jumbo to unsuspecting end-users who just want to share stuff with their friends; it’s purely an implementation detail that only developers and security experts need to be concerned with. I’ll address how public-key cryptography is actually used in delegation momentarily.

Data Types, Snapshots, and Deltas

The user-data directory is just an umbrella directory; each subdirectory contains the data needed for synchronizing and sharing a particular kind of data. The bookmarks folder, as you might guess, contains bookmark data, and history contains browsing history. More folders will be added as new data types become available to Weave, and users or organizations will even be able to make their own data types if they want.

The status.json file contained within each data type directory is an unencrypted JSON-formatted file that contains metadata about the data storage: the number of items stored, the schema version of the data type being used, and so forth.

The encrypted snapshot.json file in each data type directory contains the most recent “snapshot” of the data store for a data type. For instance, it may contain all of my bookmarks from two days ago, when I first started up Weave on one of my computers and synced with it. The encrypted companion file, deltas.json, contains a sequence of actions that can be performed on the snapshot to make it arrive at the current state. Continuing with my bookmark example, the deltas.json file may contain one “add” action for a bookmark that I added yesterday afternoon and another “remove” action for a bookmark that I removed this morning. The “bookmark” data type engine in the Weave client is responsible for examining the current state of the browser and these deltas, resolving any conflicts between them, and updating them as needed (asking the user for help if absolutely necessary).
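The snapshot-plus-deltas idea can be sketched in Python as a simple replay. The delta format below (an "action", an "id", and an "item") is hypothetical, as are the bookmark fields; I’m not reproducing Weave’s real JSON layout, and this sketch skips decryption and conflict resolution entirely.

```python
import copy


def apply_deltas(snapshot: dict, deltas: list) -> dict:
    """Replay a sequence of add/remove actions on top of a snapshot.

    Returns the resulting state and leaves the snapshot itself untouched.
    """
    state = copy.deepcopy(snapshot)
    for delta in deltas:
        action = delta["action"]
        if action == "add":
            state[delta["id"]] = delta["item"]
        elif action == "remove":
            # Removing an item that is already gone is treated as a no-op.
            state.pop(delta["id"], None)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return state


# Bookmarks as of two days ago (the snapshot).
snapshot = {
    "b1": {"title": "Mozilla", "url": "https://www.mozilla.org/"},
    "b2": {"title": "Labs", "url": "https://labs.mozilla.com/"},
}
# One bookmark added yesterday afternoon, one removed this morning.
deltas = [
    {"action": "add", "id": "b3",
     "item": {"title": "Weave", "url": "https://services.mozilla.com/"}},
    {"action": "remove", "id": "b1"},
]

current = apply_deltas(snapshot, deltas)
print(sorted(current))
```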

Choosing Encryption Carefully

The interesting thing is that snapshot.json and deltas.json aren’t actually encrypted using the user’s passphrase, because the user may want to share them with someone else; for the same reason, they’re not encrypted with the user’s public key. And they’re obviously not encrypted with the user’s private key, or else everyone in the world would be able to read them!

Instead, these files are encrypted using a brand-new, completely random shared key that is stored in the one file I haven’t yet discussed: the unencrypted keys.json file, contained within each data type directory. This file maps individual Weave users to encrypted versions of the same shared key—encrypted versions that are intended for their eyes only, because each one is encrypted using the public key of the user it’s mapped to.
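As a rough sketch of that arrangement in Python: one random shared key encrypts the data, and a keyring maps each user to a copy of that key encrypted with their public key. The toy RSA key pairs, the XOR cipher, the keyring layout, and all of the names are assumptions I’ve made for illustration; this is neither secure nor Weave’s actual keys.json format.

```python
import hashlib
import secrets


def make_toy_keypair(p: int, q: int, e: int = 17):
    """Textbook RSA from two tiny primes: returns (public, private)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # needs Python 3.8+
    return (e, n), (d, n)


def toy_cipher(data: bytes, shared_key: int) -> bytes:
    """XOR data with a SHA-256-derived keystream; applying it twice decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{shared_key}:{counter}".encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def build_keyring(shared_key: int, public_keys: dict) -> dict:
    """Map each user to the shared key encrypted with that user's public key."""
    return {user: pow(shared_key, e, n) for user, (e, n) in public_keys.items()}


def open_shared_key(keyring: dict, user: str, private_key) -> int:
    d, n = private_key
    return pow(keyring[user], d, n)


alice_pub, alice_priv = make_toy_keypair(61, 53)
bob_pub, bob_priv = make_toy_keypair(89, 97)

# The shared key must be smaller than every modulus in this toy scheme.
shared_key = secrets.randbelow(3231) + 2
keyring = build_keyring(shared_key, {"alice": alice_pub, "bob": bob_pub})

encrypted_deltas = toy_cipher(b'[{"action": "add"}]', shared_key)

# Bob recovers the shared key with his own private key, then reads the data.
bob_key = open_shared_key(keyring, "bob", bob_priv)
print(toy_cipher(encrypted_deltas, bob_key))
```

Sharing with one more person then means adding one more entry to the keyring, without re-encrypting the data or handing out anyone’s passphrase.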

Work in Progress

There are a lot of details here that I haven’t covered; for instance, you might notice that under this schema, while it’s possible to share one’s bookmarks with someone else without sharing one’s browsing history, it’s not possible to share some bookmarks but not others. That’s one of the things we’re currently working on, along with a bunch of other things: syncing more data types, moving from OpenSSL to NSS for cryptographic operations, and more. My description is also just a snapshot of what Weave looks like today, and because it’s still in alpha, it could change drastically over the next several months. If you have any questions or are interested in helping out, please feel free to leave a comment on this blog, post to the Weave Forum, or join #labs on irc.mozilla.org.

© Atul Varma 2017