failure is not an option

There’s a great writeup over on Matasano (home of many a great writeup) about how a supergenius hacker was able to exploit a NULL pointer coming out of malloc failure to run arbitrary code in Flash. This is interesting to Mozilla in part because a lot of our users have Flash installed, but also especially interesting to me because we’re working with Adobe to converge on a common, high-performance scripting engine for both JavaScript-as-she-is-written-today and ECMAScript future. I’m actually in Boston tomorrow to work with some Mozillians to map out the next interim milestone on our way to JS2 and Tamarin.

As part of the same general effort, known as “Mozilla 2”, we’re also going to be changing how we do memory allocation, so that, just as Thomas recommends, out of memory is a hard-stop failure, rather than an opportunity for a clever (or, as in this case, hyper-clever) exploit to take hold.

Of course, in a system as large as ours, you don’t want to do it all by hand, so we’ll be using static analysis tools to identify and rewrite our code mechanically. This will give us better performance from less computer time spent checking allocation results, reduced code complexity from less human time spent reading through tedious failure-handling code, and protection against a large class of potential attacks. That’s a pretty nice set of things to get in one package.

4 comments to “failure is not an option”

  1. entered 17 April 2008 @ 3:40 am

    So have we given up on the idea of failing gracefully and trying to carry on when we hit an OOM condition? Of course, you can never hope to get to a state where you will recover from all OOM conditions, but there are cases where I think we really want to try. SVG filters jump out as a good example. With our new mobile effort I’d think handling OOM gracefully would be especially important.

    I’d also think that making OOM a hard failure will allow any malicious Web page to crash the browser trivially.

    You mention upsides of switching to making OOM a hard failure, but I think there are downsides too.

  2. entered 17 April 2008 @ 3:55 am

    We will be able to try to handle OOM gracefully, where it makes sense to do so and we can be very sure that we’re going to do the right thing, which is why the jemalloc_canfail API has been requested in bug 427109. OOM is basically a hard failure today, in that you’re virtually certain to crash after really hitting OOM, because not everything checks returns well enough. Whack-a-mole on those bugs is the wrong approach.

    For mobile, I think our strategy will more likely be “detect low mem/quota limit, purge caches, try again”, which can be done within an allocator wrapper. Resource limiting is likely to be more effective for us, and safer, than trying to make all of our allocation paths handle “unwind safely, hope enough gets freed, and resume” — especially if we want to be doing error reporting.

  3. entered 17 April 2008 @ 4:37 am

    OOM may basically be a hard failure today for small allocations, but those are less likely to fail. For the higher-risk, larger allocations, I think we probably do a good job of checking and are much less likely to hard fail. I’m glad to hear we’ll at least still be able to handle these points gracefully. ;-)

  4. VanillaMozilla
    entered 18 April 2008 @ 8:50 am

    Exploiting a null pointer? I really don’t understand how this can still happen in the 21st century. It’s really time for computer languages to be strongly typed, to make sure null pointers really are null, and never to dereference null pointers. This should NEVER be left up to the programmer.