Building Bridges Between GUIs and Code With Markup APIs

January 7th, 2013

Recently the Twitter Bootstrap documentation gave a name to something that I’ve been excited about for a pretty long time: Markup API.

Markup APIs give superpowers to HTML. Through the use of class attributes, data attributes, X-Tags, or other conventions they effectively extend the behavior of HTML, turning it into a kind of magic ink. Favorite examples of mine include Twitter Bootstrap, Wowhead Tooltips, and my own Instapoppin.
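
For instance, here’s roughly what a markup API looks like, in the style of Bootstrap’s data attributes (a sketch; exact attribute names vary across libraries and versions):

<!-- Clicking the button opens the #help dialog; the author writes no JavaScript. -->
<button data-toggle="modal" data-target="#help">Help</button>

<div id="help" class="modal">
  <p>Help text goes here.</p>
</div>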

The advantages of a markup API over a JavaScript API are numerous:

  • They mean that an author only needs to know HTML, whose syntax is easy to learn, rather than JavaScript, whose syntax is comparatively difficult.
  • Because the API is in HTML rather than JavaScript, it’s declarative rather than imperative. This makes it much easier for development tools to intuit what a user is trying to do—by virtue of a user specifying what they want rather than how to do it. And when a development tool has a clearer idea of what the user wants, it can offer more useful context-sensitive help or error messaging. (Compare the declarative sketch above with the imperative one after this list.)
  • Because of HTML’s simple and declarative structure, it’s easy for tools to modify hand-written HTML, especially with a library like Slowparse, which helps ensure that whitespace and other formatting is preserved. Doing the same with JavaScript, while possible with libraries like esprima, can be difficult because the language is so complex and dynamic.
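
For contrast, here’s the same behavior as the earlier example expressed imperatively, as a jQuery-flavored sketch (the element IDs are illustrative). A development tool can’t easily look at this and intuit that the button opens a dialog:

<script>
  // The behavior is buried in code rather than declared in markup.
  $("#help-button").click(function() {
    $("#help").modal("show");
  });
</script>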

These advantages make it possible to create GUI affordances atop hand-coded HTML that make it much easier to write. As an example of this, I hacked up a prototype slideshow demo and a physics demo in July of last year. Dragging an element with the class thimble-movable in the preview pane changes (or adds) CSS absolute positioning properties in the source code pane in real-time, and holding down the shift key modifies width and height. This allows users to size and position elements in a way that even a professional developer would find far preferable to the usual “guess a number and see how it looks” method. Yet this mechanism still places primacy on the original source code; the GUI is simply a humane interface for changing it.
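
In markup, the convention looks something like this (the image and the style values are just illustrative; the GUI rewrites the inline style as you drag):

<img src="ball.png" class="thimble-movable"
     style="position: absolute; left: 120px; top: 64px; width: 80px; height: 80px">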

This is the reverse of most authoring tools with an “export to HTML” feature, whereby an opaque internal data model is compiled into a blob of HTML, CSS, and JavaScript that can’t be re-imported into the authoring tool. Pedagogically, this is unfortunate because it means that there’s a high cost to ever leaving the authoring tool—effectively making the authoring tool its own kind of “walled garden”. Such applications could greatly facilitate the learning of HTML and CSS by defining a markup API for their content and allowing end-users to effortlessly switch between hand-coding HTML/CSS and using a graphical user interface that does it for them.

Building Experiences That Work Like The Web

December 5th, 2012

Much has been said about the greatness of the Web, yet most websites don’t actually work like the Web does. And some experiences that aren’t even on the web can still embody its spirit better than the average site.

Here are three webbish characteristics that I want to see in every site I use, and which I try my best to implement in anything I build.

  • “View Source” for every piece of user-generated content. Many sites that support user comments allow users to use some kind of markup language to format their responses. Flickr allows some HTML with shortcuts for embedding other photos, user avatars, and photo sets; GitHub permits a delicious smorgasbord of HTML and Markdown.

    The more powerful a site’s language for content creation, the more likely it is that one user will see another’s content and ask, “how did they do that?”. If sites like Flickr and GitHub added a tiny “view source” button next to every comment, it would become much easier for users to create great things and learn from one another.

    I should note that by “source” I don’t necessarily mean plain-text source code: content created by Popcorn Maker, for instance, supports a non-textual view-source by making it easy for any user to transition from viewing a video to deconstructing and remixing it.

  • Outbound Linkability. Every piece of user-generated content should be capable of “pointing at” other things in the world, preferably in a variety of ways that support multiple modes of expression. For instance, a commenting system should at the very least make it trivially easy to insert a clickable hyperlink into a comment; one step better is to allow a user to link particular words to a URL, as with the <a> tag in HTML. Even better is to allow users to embed the content directly into their own content, as with the <img> and <iframe> tags. (All three levels are sketched after this list.)

  • Inbound Linkability. Conversely, any piece of user-generated content should be capable of being “pointed at” from anywhere else in the world. At the very least, this means permalinks for every piece of content, such as a user comment. Even better is making every piece of content embeddable, so that other places in the world can frame your content in different contexts.
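
To make the three levels of outbound linkability concrete, here’s a sketch of each in plain HTML (the URLs are illustrative):

<!-- Level 1: a bare, clickable URL. -->
<a href="http://example.org/photos/42">http://example.org/photos/42</a>

<!-- Level 2: linking particular words. -->
I loved <a href="http://example.org/photos/42">this photo</a>.

<!-- Level 3: embedding the content itself. -->
<iframe src="http://example.org/photos/42/embed"></iframe>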

As far as I know, the primary reason most sites don’t implement some of these features is security concerns. For example, a naïve implementation of outbound linkability would leave itself open to link farming, while allowing anyone to embed any page on your site in an <iframe> could make you vulnerable to clickjacking. Most sites “play it safe” by simply disallowing such things; while this is perfectly understandable, it is also unfortunate, as they forfeit much of what makes the Web such a generative medium.
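
The standard mitigations are worth naming, though. Adding rel="nofollow" to user-supplied links tells search engines not to give them ranking weight, which blunts link farming; and the X-Frame-Options response header lets a site forbid framing on pages with sensitive actions while leaving everything else embeddable. A sketch (the URL is illustrative):

<!-- User-supplied links confer no search-engine weight: -->
<a href="http://example.org/" rel="nofollow">a link from a user comment</a>

<!-- Pages with sensitive actions are served with a response header like:
     X-Frame-Options: SAMEORIGIN -->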

I’ve learned a lot about how to mitigate some of these attacks while working through the security model for Thimble, and I’m beginning to think that it might be useful to document some of this thinking so it’s easier for people to create things that work more like the Web. If you think this is a good (or bad) idea, feel free to tweet @toolness.

Questions: Designing for Accessibility on the Web

July 7th, 2012

Marco Zehe recently wrote a good, sobering blog post comparing the accessibility of Web apps to that of native ones.

Much of what I’ve seen on supporting accessibility on the Web has to do with using the right standards: always providing alt attributes for images, for example, or adding semantic ARIA metadata to one’s markup.
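
Concretely, that advice looks something like this (a sketch; the attribute values are illustrative):

<img src="chart.png" alt="Sales rose 40% between 2010 and 2012.">

<nav role="navigation" aria-label="Main menu">
  <a href="/">Home</a>
</nav>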

As a designer, however, I don’t have much interest in these standards because they don’t seem to address human factors. What’s more interesting to me is understanding how screen readers present the user interface to vision-impaired people and how usable that interface is to them. This would parallel my own experience of designing for non-impaired users, where I use my understanding of human-computer interaction to create interfaces from first principles.

I’ve been meaning to actually get a screen reader and try browsing the Web for a few days to get a better idea of how to build usable interfaces, but I haven’t gotten around to it yet, and I’m also not sure if it’s the best way to empathize with vision-impaired users. In any case, though, my general concern is that there seems to be a distinct lack of material on “how to build truly usable web applications for the vision-impaired.” Instead, I only see articles on how to be ARIA standards-compliant, which tells me nothing about the actual human factors involved in designing for accessibility.

So, I’ll be spending some time looking for such resources, and trying to get a better idea of what it’s like to use the internet as someone who is vision-impaired. If you know of any good pointers, please feel free to tweet at me. Thanks!

Empathy For Software Conservatives

July 6th, 2012

Recently my friend Jono wrote an excellent blog post entitled Everybody hates Firefox updates. I agree with pretty much everything he says.

What most struck me during my time in Mozilla’s Mountain View office was a complete lack of empathy for people who might want to “stay where they were” and not upgrade to the latest version of our flagship product. Whenever someone asked a question like, “what if the user wants to stay on the old version of Firefox?”, the response was unequivocally that said user must be delusional: no one should ever want to stay on an old version of a product. It simply doesn’t make sense.

I’ve always chalked this bizarre line of thinking up to everything I dislike about the Silicon Valley bubble, though I don’t ultimately know if it’s endemic to the Valley or just to tech companies in general. I personally have always despised upgrading anything, not just Firefox, for exactly the reasons that Jono outlines in his post. Most people at Mozilla, and likely at other software companies, seem to think that it’s just the outliers who dislike such disruption. Maybe they are; I have no idea. But I and most people I know outside of the software industry view upgrades as potential attacks on our productivity, not as shiny new experiences. If there’s any desire at all within Mozilla to cater to such “software conservatives”, the first step is having actual empathy for folks who might want to stay right where they are, rather than treating them as an irrelevant minority.

Learning and Grammatical Forgiveness

April 26th, 2012

HTML is a very interesting machine language because, like human languages, most things that interpret it are very forgiving.

For instance, did you know that the following HTML is technically invalid?

<video>
  <source src="movie.mp4"></source>
</video>

It’s invalid because <source> is a so-called void element: since it can’t have any content inside it, it’s not allowed to have a closing tag at all. The <img> tag works the same way. The technically correct way to write the above HTML snippet is as follows:

<video>
  <source src="movie.mp4">
</video>

However, in practice, all Web browsers will interpret both of these snippets the exact same way. When a browser sees the closing </source> tag on the first snippet, it realizes the “mistake” the author has made, and simply pretends it isn’t there.

What’s interesting to me is the way this mirrors human languages, and what it means for teaching. For instance, the following sentence is grammatically incorrect:

The dog loves it's owner.

However, no one who knows English will actually be confused by the meaning of the statement.

When I was trained as an adult literacy tutor several years ago, one of the most important principles we were taught was that fostering a love for writing was vastly more important than grammatical correctness. The “red pen” commonly used by school teachers for correcting grammatical errors was seen as anathema to this: when we found a grammatical error in a novice writer’s work, we were encouraged to ignore it unless it actually made the piece confusing or ambiguous for readers in a way that the author didn’t intend. Otherwise, the novice writer would become quickly distracted and discouraged by all their “mistakes” and view writing as a minefield rather than a way to communicate their thoughts and ideas.

We’re running into similar issues in the design of the Webpage Maker. On the one hand, the fact that Web browsers are so forgiving when interpreting HTML enables us to follow a philosophy similar to that of progressive adult literacy tutors.

But sometimes, the forgiving nature of Web browsers backfires: they actually render a document that is vastly different from the author’s intent, which is just as frustrating as a pedantic nitpicker. We’ve created a library called Slowparse—soon to be renamed—which attempts to assist with this, providing the logic needed for a user interface to gently inform users of potential ways their HTML and CSS code might be misinterpreted by machines. A full specification of errors and warnings is also available, as is an interactive demo that uses the library to provide real-time feedback to users.
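
Using the library looks roughly like this (a sketch of its API as it stands today, which may well change):

<script src="slowparse.js"></script>
<script>
  // Parse some broken HTML; result.error is null when nothing looks wrong.
  var result = Slowparse.HTML(document, "<p>hello");
  if (result.error)
    console.log(result.error.type); // e.g. "UNCLOSED_TAG"
</script>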

It’s been interesting to see how different Slowparse is from an HTML/CSS validator, whose goal is not one of learning, but of ensuring conformance to a specification. From a learning perspective, a validator is like the pedantic teacher who loves their red pen: some of its feedback is quite useful, but the remainder is likely to confuse and intimidate a newcomer.

Partly as a result of its learning goals, Slowparse actually “warns” the user about things that are technically valid HTML/CSS but likely don’t reflect the author’s intent. One current example concerns the use of unquoted attributes in HTML5, though that particular warning is still subject to change.
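
For instance, both of the following are valid HTML5, but a stray space in an unquoted value silently changes the meaning of the second:

<p class=intro>Hi there.</p>

<!-- Parses as alt="My" plus a stray boolean attribute named "photo": -->
<img src=photo.jpg alt=My photo>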

At this point, I think the challenge will be to work with our learning team and user test our interface to the point that we achieve a good balance between being a pedantic nitpicker and providing useful feedback that helps users as quickly as possible. In my opinion, if we do things right, we’ll help people develop a love for HTML and CSS—even if what they write may technically be “grammatically incorrect.”

Prototyping Presentations

March 31st, 2012

Presentations take a long time to make. Particularly when I’m just conceptualizing my presentation, it takes a lot of work to record myself talking, use a tool to sync it with the proper visuals, and then repeat the recording and syncing process as I iterate on the content.

I recently made a simple tool called Quickpreso to make the process of “prototyping” a presentation quicker and more like writing a simple HTML page.

A presentation in Quickpreso is just an HTML file with a series of alternating lines for visuals and voice-overs, like this:

<img src="slide-one.jpg">

This text will be spoken for slide one.

<a href="http://mozilla.org/">I am slide two.</a>

This text will be spoken for slide two.

The visuals can contain any HTML markup. Each section of voice-over text is rendered by the OS X say command; they’re all concatenated together into an audio file by ffmpeg. Finally, the visuals are synced to the audio in a Web page using popcorn.js.
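
The syncing glue is conceptually tiny; it amounts to something like this sketch (the element ID, file name, and showSlide() helper are all illustrative):

<audio id="narration" src="voiceover.mp3"></audio>
<script>
  // Fire a cue when the narration reaches each slide's start time.
  var pop = Popcorn("#narration");
  pop.cue(0, function() { showSlide(0); });
  pop.cue(12.5, function() { showSlide(1); });
</script>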

Quick iteration is facilitated by a simple Python web server that regenerates the audio file when it detects changes to the voice-over text. The final product is all static content that can be served from any web server.

I used this tool to create a Webmaking for Knitters presentation in January. The result is quite robotic, obviously, though it can be made a little more natural-sounding if newer voices from OS X Lion are used (I’m still on Snow Leopard).

One particular advantage of this approach, however, is that you get subtitles/closed-captioning for free. There’s also nothing preventing you from re-recording the final audio in your own voice once you’re happy with your prototype.

The source code is available on GitHub at toolness/quickpreso. The code is in an alpha state, so your mileage may vary; fortunately, though, the source code is minuscule, so understanding and changing it shouldn’t be hard.

Coffee Machines And Community

March 28th, 2012

The Toronto and San Francisco Mozilla offices feature very different coffee makers.

The Toronto office has a Rancilio Epoca espresso machine. It has lots of knobs and switches, and one has to be taught how to use it. The first few drinks a newcomer makes are likely to taste very bad; a conscious effort must be made to learn from one’s mistakes and create better drinks.

The SF office has a Miele espresso machine with three buttons and an LCD display. It’s comparatively easy to use, and makes fine drinks at the push of a button—until something goes wrong in the opaque innards of the machine. The sight of an error dialog on its LCD display is extraordinarily common.

The machine probably has as many different kinds of error messages as a modern computer operating system. Like an operating system, it also offers little to no assistance on how to fix the problem that’s occurring.

Both of these coffee machines reflect very different philosophies about the nature of tools, and about the people who use them. For me, one of the most interesting aspects concerns the communities that have grown—or failed to grow—around them.

Actually, the very existence of the Toronto office’s espresso machine is due to community: as I understand it, a core group of Torontonian espresso aficionados decided to buy a small communal machine with their own money, and upgraded the machine later on. The result I’ve noticed, whenever I’ve visited the Toronto office, is that the kitchen area becomes a place for learning. People are constantly teaching each other how to make a better drink, asking questions, debating the proper pressure to use when tamping (“approximately the weight of a cat”, says one coworker), and so forth. In a growing workplace that is constantly in danger of being segregated by organizational boundaries, a community of practice that’s centered around something completely unrelated to work is an amazing asset.

In contrast, there isn’t much of a community that can form around the push-button Miele machine in the San Francisco office. Because there’s no way to go “under the hood” and customize one’s coffee-making experience in a positive way, there’s not really any technique to share with others. An interface that is so “self-serve” and “easy to use” that it prohibits nuance and customization is also one that doesn’t encourage people to talk about it.

This might be an acceptable trade-off, except for the fact that the Miele isn’t fully self-serve. Like any machine, it needs maintenance, and because there was never an incentive for anyone to understand how it worked, very few people know how to fix it—if anything, I’ve experienced a sense of resentment when I’m presented with an error, as though the machine has violated its contract of being mind-numbingly easy to use. Consequently, the only talk I’ve ever heard about this machine revolves around its inscrutable error messages, which only the office manager knows how to fix.

The same dynamics I’ve described here apply to all of our tools. No matter how easy and “self-serve” we try to make our software, a significant portion of its users will still run into problems and use cases that they don’t know how to get past on their own. I don’t know if it’s possible to foster communities of practice through tools designed for freedom and empowerment—even if we put the Rancilio in the San Francisco office, for instance, it may languish if no one there thinks a good espresso is worth learning how to make. But it’s certainly worth finding out.

Playfulness and Learning

November 29th, 2011

Michelle Levesque recently wrote a post about the importance of play in learning.

We need to change people’s mindsets to make them comfortable fooling around, making things, breaking things, and playing on the web.

I totally agree. This is one of the design goals of the Hackasaurus tools and events, actually—it’s a combination of stylistic touches and emotional design to help people feel that what they’re doing is fun, along with humane functionality that makes experimentation easier, such as infinite undoability.

This is something I feel that Apple has managed to do with their products, too: I see non-technical people like my father who are typically terrified of using their personal computer take joy in installing apps on their iPad and playing with them in a way that they never would have dared to do on their PC. Partly it’s due to concrete features, like the fact that it’s impossible for an app to impair the behavior of another app—but it’s also partly due to stylistic touches, like all the device’s glee-inspiring animations.

It’s ironic that tech-savvy folks like me berate the iPad for its lack of generativity and hackability, yet it manages to orient its users towards a sense of playfulness and empowerment in a way that other tools rarely have.

Hacking The Web With Interactive Stories

August 15th, 2011

I recently made The Parable of The Hackasaurus, which is a game-like attempt to make web-hacking easy to learn through a series of simple puzzles in the context of a story.

The parable is really more a proof-of-concept combining a bunch of different ideas than an actual attempt at interactive narrative, though. The puzzles don’t actually have anything to do with the story, for instance. But I wanted an excuse to do something fun with the vibrant art that Jessica Klein has made for the project, while also exploring possibilities for the Hack This Game sprint and giving self-directed learners a path to understanding how the Hackasaurus tools work.

If you know HTML and CSS, you’re welcome to remix this and make your own “web hacking puzzle” where the goggles are the controller. Just fork the repository on GitHub and modify index.html and bugs.js as you see fit. Almost all the puzzles, achievements, and hints are data-driven and explained in documentation, so you don’t actually need to know much JavaScript.

For more information on my motivations in creating the parable, check out the experiment’s README. Jess is also thinking about making an interactive comic that teaches web-hacking, which I’m looking forward to.

The Decline and Fall of The URL

August 8th, 2011

The URL is a very powerful concept; it represents a universal way to access any resource anywhere in the world. Here’s one of them, as it appears in Firefox 5’s address bar:

Address bar containing http://www.mozilla.org/

The first few letters before the colon are called the protocol, which tells the computer how to interpret the rest of the URL. The http protocol is the most common and specifies a resource on the World Wide Web, while the tel protocol specifies a telephone number, and https specifies a resource on the Web transferred over a secure channel that can’t be eavesdropped on. Those are just a few; there are lots of others.
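
A few examples side by side (the addresses themselves are illustrative):

http://www.mozilla.org/       a page on the Web
https://mail.example.org/     a Web resource, over a secure channel
tel:+1-555-555-0123           a telephone number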

Many user interface designers for browsers believe that most users don’t understand what a protocol is, which is probably accurate. Google Chrome’s solution is to hide the protocol when it’s http, but to display the protocol in all other cases. Firefox is now adopting the same behavior.

There are a number of things that trouble me about this approach. I’ve already written about the behavioral impacts, whereby user expectations of copy-paste are broken and a confusing mode is introduced. Furthermore, because protocol information is still displayed for any non-http resource, understanding how to read the address bar is (ironically) made more complex.

Aside from those concerns, however, there’s something else I’m worried about. The main argument I’ve heard against exposing protocol information to end-users is that, if we present it, we might as well present all kinds of other information about the TCP connection, CPU registers, and other obscure technical statistics.

Now, I know I’m biased because I’m on the Hackasaurus team and trying to teach people the basics of HTML and CSS, but browsers have historically been very friendly to learning web-making, in part because they keep protocol information in the address bar. My guess is that removing the http:// neither helps nor hinders someone using the basics of the web—but it definitely makes it harder to learn what hypertext is.

Understanding technology is relative. Someone can know the basics of writing an anchor tag without knowing what TCP/IP is, and it’s still quite empowering, in much the same way that it’s empowering to know how to grill vegetables without necessarily knowing everything about the chemical reactions taking place underneath.

Doing little things in the interface that promote transparency and help people move from being a web-user to a web-maker is important, so long as we don’t make things difficult for the people who just want to be users. I’ve never found my parents or other non-technical users to be confused by the presence of http://, which is part of why I don’t see much gain in removing it—especially given the behavioral shortcomings of this change. Far more exciting to me is the exact opposite approach: designing experiences to help users understand what the URL in their address bar means, and encouraging them to create things on the Web instead of just browsing it.