approaches to performance

[This post doesn't have links to anything, and it really should. I'm a bit pressed for time, but I'll try to come back later and fix that.]

Important: I no longer work for Mozilla, and I haven’t yet started working for Facebook, so what I write here shouldn’t be taken as the stance of either organization.

Platforms always have to get faster, either to move down the hardware stack to lesser devices, or up the application stack to more complex and intensive applications. The web is no exception, and a critical piece of web-stack performance is JavaScript, the web’s language of computation. This means that improvements in JS performance have been the basis of heated competition over the last several years, and — not coincidentally — an area of tremendous improvement.

There’s long been debate about how fast JS can be, and whether it can be fast enough for a given application. We saw it with the chess-playing demo at the Silverlight launch (for which I can’t find a link, sadly), we saw it with the darkroom demo at the Native Client launch, and I’m sure we’ll keep seeing it. Indeed, when those comparisons were made, they were accurate: web technology of the day wasn’t capable of running those things as quickly. There’s a case to be made for ergonomics and universality and multi-vendor support and all sorts of other benefits, but that doesn’t change the result of the specific performance comparison. So the web needs to get faster, and probably always will. A number of approaches to this are being mooted by various parties, specifically around computational performance.

One approach is to move computationally-intensive work off to a non-JS environment, such as Silverlight, Flash, or Google’s Native Client. This can be an appealing approach for a number of reasons. JS can be a hard language to optimize, because of its dynamism and some language features. In some cases there are existing pieces of non-web code that would be candidates for re-use in web-distributed apps. On the other hand, these approaches represent a lot of semantic complexity, which makes it very hard to get multiple interoperating implementations. (Silverlight and Moonlight may be a counter-example here; I’m not sure how much they stay in sync.) They also don’t benefit web developers unless those developers rewrite their code to the new environment.

Another approach is to directly replace JS with a language designed for better optimization. This is the direction proposed by Google’s Dart project. It shares some of the same tradeoffs as the technologies noted above (easier to optimize, but complex semantics and requires code to be rewritten), but is probably better in that interaction with existing JS code can be smoother, and it is being designed to work well with the DOM.

A third approach, which is the one that Mozilla has pursued, is to just make JS faster. This involves both implementation optimizations and new language features (like structured types and binary arrays) that permit more efficient representations. As I mentioned above, we’ve repeatedly seen JS improved to do what was claimed to be impossible in terms of performance, and there are many opportunities to make it faster still. This benefits not only new applications, but also existing libraries and apps that are on the web today, and the people who use them.
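
To make the language-feature point concrete, here’s a small sketch of the kind of code binary arrays enable; the workload and numbers are illustrative:

    // A plain Array holds boxed, dynamically-typed values; a Float32Array
    // (one of the binary array types) is a dense, untagged buffer that an
    // engine can compile into a tight native loop.
    var n = 1000000;
    var samples = new Float32Array(n);   // one million 32-bit floats
    for (var i = 0; i < n; i++)
      samples[i] = Math.sin(i / n);

    var sum = 0;
    for (var i = 0; i < n; i++)
      sum += samples[i];

The same loop over a regular Array forces the engine to check and unbox each element; the typed representation removes that work, at least in principle.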

Yesterday at the SPLASH conference, the esteemed Brendan Eich demonstrated that another interesting milestone has been reached: a JavaScript decoder for the H.264 video format. Video decoding is a very computationally-intensive process, which is one reason that phones often provide specialized hardware implementations. Being able to decode at 30 frames a second on laptop hardware is a Big Deal, and points to a new target for JS performance: comparable to tightly-written C code.

There’s lots of work left to do before JS is there in the general case: SIMD, perhaps structured types, better access to GPU resources, and many more in-engine optimizations that are underway in several places, I’m sure. JS will get faster still, and that means the web gets faster still, without ripping and replacing it with something shinier.

Aside: the demonstration that was shown at SPLASH was based on a C library converted to JS by a tool called “emscripten”. This points towards being able to reuse existing C libraries as well, which has been a selling point for Native Client thus far.
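
Roughly, the workflow looks like this. A sketch only: the file name is made up, and the exact flags and calling conventions are illustrative, not gospel:

    /* Suppose add.c contains an existing C function:
           int add(int a, int b) { return a + b; }
       emscripten compiles it to JavaScript with something like:
           emcc add.c -o add.js
       The generated add.js wraps the compiled code in a Module object;
       ccall is one way to invoke an exported function from script. */
    var sum = Module.ccall("add", "number", ["number", "number"], [2, 3]);
    print(sum);   // 5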

As Brendan would say, always bet on JS.

the birth of a faster monkey

Over the past year, JavaScript performance on the web has undergone a striking revolution. Virtually every browser has improved its engine to produce significant gains in execution speed; Firefox 3 is about 3 times faster than Firefox 2 in various JavaScript benchmarks, for example. But of course, developer and user demand for performance is insatiable, and at Mozilla we demand it ourselves, since our application itself is largely and increasingly written in JavaScript. In addition to improving the performance of web applications, our work on JS performance in Firefox 3 made our own application snappier and more responsive.

We’re not done. In addition to continuing to work on our existing JavaScript interpreter (some 20% improved over Firefox 3 already), we’re also looking farther into the future of JS performance, and have some early news to share. Permit me, if you will, to set the stage:

[Graph: core primitives improved by 20-40x]

These are the early results from a project we’ve been calling TraceMonkey, which adds native code compilation to Mozilla’s JavaScript engine (“SpiderMonkey”). Based on a technique developed at UC Irvine called “trace trees”, and building on code and ideas shared with the Tamarin Tracing project, a few of us have spent the last 2 months (and most of the last few nights) teaching SpiderMonkey some exciting new tricks.

The goal of the TraceMonkey project — which is still in its early stages — is to take JavaScript performance to another level, where instead of competing against other interpreters, we start to compete against native code. Even with this very, very early version we’re already seeing some promising results: a simple “for loop” is getting close to unoptimized gcc:

[Graph: for loop benchmark, gcc 1070ms vs. js 910ms]
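
For scale, the loop in question is about as simple as benchmarks get; something like this, timed in the JS shell (a reconstruction, not the exact harness):

    // A tight arithmetic loop: after a few iterations the tracer compiles
    // the loop body to native code, which is why the result lands near
    // unoptimized gcc.
    function tightLoop() {
      var sum = 0;
      for (var i = 0; i < 100000000; i++)
        sum += i;
      return sum;
    }

    var start = Date.now();
    tightLoop();
    print("elapsed: " + (Date.now() - start) + "ms");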

Yesterday we landed TraceMonkey in the Firefox 3.1 development tree, configured off by default. We have bugs to fix, and an enormous number of optimizations still to choose from, but we’re charging full speed ahead on the work we need to do for this to be a part of Firefox 3.1. Depending on the benchmarks you choose, you might see massive speed-up, minor speed-up, or maybe even some slowdown — those latter cases are definitely bugs, and reporting them through bugzilla will be a big help.

Here are the current speedups of some common and uncommon benchmarks, as compared to Firefox 3: Apple’s SunSpider; the SunSpider “ubench” tests added for SquirrelFish; an image manipulation demo; and a test of the Sylvester 3D JavaScript library doing matrix multiplication.

[Graph of speedups over Firefox 3: SunSpider 1.8x, ubench 22.4x, image manipulation 6.5x, Sylvester 6.2x]
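
The Sylvester test, for example, boils down to workloads shaped like this (a sketch of the shape of the work, not the benchmark itself):

    // Naive matrix multiplication over flat arrays: three nested loops of
    // monomorphic arithmetic and array indexing, exactly the pattern that
    // tracing handles well.
    function matmul(a, b, n) {
      var c = new Array(n * n);
      for (var i = 0; i < n; i++) {
        for (var j = 0; j < n; j++) {
          var sum = 0;
          for (var k = 0; k < n; k++)
            sum += a[i * n + k] * b[k * n + j];
          c[i * n + j] = sum;
        }
      }
      return c;
    }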

There are many wins left in each one of those benchmarks, and we’ll be working on those through Firefox 3.1 and beyond: better code generation, more efficient guards, improvements to some data structures, parallel compilation, use of specific processor features, new optimization passes, tracing more code patterns, and many more. Right now we write all values back to memory at the end of every loop, for example, so there are some easy wins available in avoiding that — perhaps 2-3x on tight loop performance.

There’s lots more to write about why we chose tracing as the path to future performance, what to expect in terms of future work, how these sorts of performance gains can translate into new web experiences and capabilities, and what we’ve learned along the way. (One example out of left field: the static analysis tools are written in JavaScript, and apparently they are immensely faster due to even the current JIT work.) Look for posts about that from me and other TraceMonkey hackers soon.

If you’re the sort of person who reads computer science papers, you may find the Hotpath paper to be of interest: it’s a great paper, and a great introduction to the art and science of tracing.

Thanks are especially due to Brendan, Andreas and David for making it fun to be the dumbest guy on the team; Andreas’ colleagues at UCI (Michael Franz, Mason Chang, Michael Bebenita, Gregor Wagner) for their advice and help; Ed Smith and the Adobe Tamarin team for their tech and wisdom; Rob Sayre, Vlad Vukicevic, Blake Kaplan, Boris Zbarsky and Bob Clary for testing and timely guidance; and the Mozilla developer community for letting us hold the tree closed for more than a day to get it landed.

Chris Pine from Opera on ES4 decimals

Chris Pine, a great guy who works at Opera on their JS engine (and who once lived with Sid Meier, I was awestruck to learn), has written an article about the decimal arithmetic available in the proposed ES4 draft. Decimals are pretty handy for applications where you need to…well, you should just go read the article; he does a better job than I would. I only got to meet Chris once, at a fateful ECMA meeting last spring, but from reading even this one article I’m sure you’ll understand why I’d love to meet up with him again.
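
If you want the one-line motivation before you click through: today’s JS numbers are binary doubles, and binary floating point can’t represent most base-10 fractions:

    0.1 + 0.2            // 0.30000000000000004
    0.1 + 0.2 === 0.3    // false

    // Under the ES4 decimal proposal, decimal values (written with an "m"
    // suffix in the drafts I've seen, e.g. 0.1m) would make this kind of
    // base-10 arithmetic come out exact.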

Also of interest to many will likely be this phrase: “JavaScript 2 (supported in future versions of Opera, but not right at the moment.)” Go Opera!