fsyncers and curveballs

[Update: Hi, interwebs! Had to block my blog for a little bit to deploy the full power of wp-super-cache, everything should be fine now.]

A couple of articles have now been written about the unpleasant behaviour that people encounter with Firefox 3 in certain Linux configurations, related to the flushing of I/O and system lag that can see if there is a lot of other disk activity at the same time. There are a lot of moving parts to this issue, and so it’s not surprising that there’s a fair bit of misunderstanding, though some of it seems less well-meaning that others, which makes me a bit sad.

What’s Firefox doing?

Firefox uses a database called sqlite as the underpinning for many kinds of data storage in Firefox 3, including the browsing history and bookmarks data used to provide the Awesomebar’s awesomeness. sqlite is an excellent piece of software, written by people who take both data integrity and performance very seriously, which makes it a great place to put this sort of data. Lots of people use sqlite these days, and we’re proud to be founding members of the consortium that helps support sqlite development.

Databases, perhaps obviously, usually have complex file formats, and require that different parts of their files agree about things like how many records there are, whether a transaction completed successfully, or how indexes match up with the data to which they refer. This makes them more sensitive to data corruption than some simpler formats, like a basic text file. If you get a chunk of a text file corrupted, you can probably edit around it and salvage the rest of the file, but if you get a chunk of a database file corrupted you can often effectively lose all of the data that’s held there. This is one of the tradeoffs for being able to have efficient access to large sets of data, and it’s common to virtually all databases.

One of the things that sqlite does to ensure that the database is not corrupted in the case of a crash is call a function called fsync. fsync tells the operating system to ensure that this file has been safely written to disk, and waits until that’s complete. This provides what’s known as a “barrier”, and makes sure that we don’t get mismatched parts of transactions (groups of related database operations) when we look at the file after a crash. This is very effective: even if the operating system itself crashes or the computer loses power suddenly, we won’t see the database corrupted.

We don’t want to lose the user’s data, because that makes users sad, and we like to make users happy. So up through Firefox RC1, we set sqlite to its recommended setting of “FULL” synchronization. As release-end-game luck would have it, that made some users sad, because they would find Firefox and sometimes other parts of their system would pause unpleasantly, and it was tracked down to Firefox calling fsync. The bug in question is here, but I caution you to not read it piecemeal; there’s a lot of intertwined conversation there, and some comments are not as correct as they sound.

(There have been other bugs along the way that could cause this, ranging from performance problems with certain sqlite versions to bad interactions with the data set used for malware protection, but those were put behind us by RC1. This bug is the one that has people working weekends at this point.)

Why does that hurt so much?

On some rather common Linux configurations, especially using the ext3 filesystem in the “data=ordered” mode, calling fsync doesn’t just flush out the data for the file it’s called on, but rather on all the buffered data for that filesystem. Because writing to disk is so much slower than writing to memory, operating systems can buffer a lot of data, especially if you’re doing something that involves a lot of I/O, like unpacking a zip file or compiling software. It gets written out in the background, giving you vastly, vastly improved performance. It’s no exaggeration to say that without this sort of buffering your computer would be entirely unusable.

I think you can see where this is going: if there’s a lot of data waiting to be written to disk, and Firefox’s (sqlite’s) request to flush the data for one file actually sends all that data out, we could be waiting for a while. Worse, all the other applications that are writing data may end up waiting for it to complete as well. In artificial, but not entirely impossible, test conditions, those delays can be 30 seconds or more. That experience, to coin a phrase, kinda sucks. Does it suck as much as file corruption wiping out your bookmarks after your computer (not Firefox) crashes? As you might imagine, opinions vary.

This problem with ext3 is well-known to Linux kernel developers, and there’s great work underway as part of the “ext4″ project to remedy it. Other filesystems (like reiser4, I have heard) have similar problems, and I presume that their developers are also working on resolving them.

Why doesn’t other (non-sqlite) software do this?

Actually, a lot of software that’s concerned with data integrity does this, including the editors emacs and vim, and mail clients like mutt and Evolution, as well as bigger databases like MySQL and Postgres. In some cases, those programs are in fact adding more calls to fsync to protect user data better.

In fact, so many programs use fsync to ensure data integrity, and actually writing to disk is so expensive, that some operating systems make fsync not be a “real” fsync: the data is scheduled for (hopefully) immediate write-out, but the call doesn’t wait until it’s all the way to the disk, so it’s not really an effective barrier. This may be permitted by various standards like POSIX, but it’s certainly surprising for programs like Firefox that use it to protect against data corruption in the case of a crash!

Here’s what Apple has to say about fsync, sqlite, and data corruption:

fsync on Mac OS X: Since on Mac OS X the fsync command does not make the guarantee that bytes are written, SQLite sends a F_FULLFSYNC request to the kernel to ensures that the bytes are actually written through to the drive platter. This causes the kernel to flush all buffers to the drives and causes the drives to flush their track caches. Without this, there is a significantly large window of time within which data will reside in volatile memory—and in the event of system failure you risk data corruption.

Exciting!

If this is an operating system bug, why is Firefox being patched?

Because we want to make users happy. Whether Linux should have better fsync behaviour or not isn’t really going to matter to our users — we want to support Linux, which means Linux-as-she-is-shipped, not Linux-as-we-would-like-her-to-be. That means that we need to deal with X servers, with font-selection systems, and with filesystem behaviour as we find it, because that’s where our Linux users are. (It’s not like Windows and OS X don’t have their own annoying things to work around either, though they don’t seem to have this specific one!)

So is it always going to be like this?

No! (Yay!)

In the immediate term, there is a patch that controls how aggressively we sync, and defaults to a slightly less-aggressive state that is equivalently safe on modern operating systems. That patch will be in either Firefox 3.0.1 at the latest, and we’ve been in contact with Linux distributors to make sure they’re aware that the patch is fine to take in their builds — desirable, even. It might be in Firefox 3.0 proper, depending on what happens with an RC2, but either way the vast majority of affected users (Linux users, who usually get their Firefoxes from their distributors) will get the fix right away. This patch also lets users, who might decide that their systems are stable enough and their backups good enough that they don’t need the extra protection, turn off the data-integrity fsyncs almost entirely.

In the medium term, we’re going to be making our database use more asynchronous in Firefox, and batching transactions for things like history. You’ll likely see those effects in the major version of Firefox that follows 3 (might be called 3.1, might be called 3.5, might be called Firefox Magenta, who knows?). This will keep Firefox from pausing in these states, but may not entirely keep the fsync calls from affecting other applications on the system. The sqlite developers are also looking at adding support for a newer, Linux-only API that doesn’t have the system-wide effects as fsync, which could help as well.

Longer term, as I mentioned above, the Linux filesystem situation will improve in this respect, and the work is well-underway. It’s probably a year at least before most Linux users are running systems with fixed fsync behaviour, but at that point application developers won’t have to worry about their data-integrity needs causing pain for the whole system. I, for one, am looking forward to it.

I’d like to thank the sqlite developers for their help analyzing the effects of different fsync patterns, various Linux kernel developers (especially my former colleague Andreas Dilger), and the users who have helped test different settings and I/O loads. We’re going to ship a better Firefox 3 on Linux because of it, and we have even more ways to improve that experience on all operating systems in the future. I would not like to thank the people who have substituted vitriol and dogma for analysis and understanding of the release cycle, but we all get frustrated sometimes. I hope they enjoy Firefox 3 too.

year of the Gecko

Stuart put up a great post today describing the results of our intensive focus on memory use in Firefox 3 (and followed up, after many requests from commenters on his blog and elsewhere, with a graph including Safari and Opera). The memory gains are great, and they cover all sorts of improvements: leak fixes, allocator changes, new facilities to eliminate classes of troublesome entrainment, and better cache management.

It’s a time-honoured programming tradeoff that using more space speeds you up, but that’s not what happened here: our memory-reduction regimen actually made us faster in a lot of cases by making us more cache-friendly and by side-effects like using a better allocator. And we didn’t stop there, dropping the hammer on major performance gains in rendering and JavaScript as well, and leaving us as of today right at the top of tests like Apple’s SunSpider.

Productivity and feature wins in Firefox-the-application are really coming together as well, with the AwesomeBar leading many people’s lists of favourite new feature. It really has changed the way I use the web, and I feel like everything I’ve ever seen is right at my fingertips. Add to that the great strides in OS integration and theming for Mac and Linux and it really is shaping up to be the best browser the web has ever known.

I’m obviously excited; this feels like exactly the right sort of everything-coming-together that should be in the air on the cusp of the 10th anniversary of the original source release. It hasn’t been an easy ride, especially pre-Firefox, and nobody on the project takes our success so far for granted — which makes it all the more satisfying to see years of investment pay off in a fantastic product.

Other people are excited too, from users and journalists to extension developers and companies looking to add web tech to their products. In the mobile arena especially we’re seeing a ton of excitement about the gains in speed and size. A lot of people aren’t yet used to thinking of Mozilla as a source of mobile-grade technology, but they weren’t used to thinking of us as a major browser force either. It’s fun to break the model.

Fast, small, cross-platform, industry-leading stability, solid OS integration, excellent standards support, excellent web compatibility, great security, ridiculously extensible, a productive app platform, accessible, localized to heck and back, open source from top to bottom: it’s a great time to be building on top of Gecko, and Firefox 3 is just the beginning. Wait until you see what we have in store for the next release…

Tbeachball

One aspect of our product performance that I’m especially interested in knowing more about is responsiveness: when the user does something, how long does it take for us to act on that stimulus? (We might not complete it instantly, as when loading a new page, but we should endeavour to give feedback pretty much instantly.) One thing that sometimes interferes with our interface responsiveness is spending too long away from the main event loop, perhaps in layout or some other intensive computation. This leads to us not noticing — and thus, obviously, acting on — new events from the user such as mouse clicks or keypresses.

I’ve been talking for a while about wanting to measure our UI-thread event service latency as a proxy for that kind of responsiveness, and so this morning I finally just sat down and did it while I was waiting for Tyla to get out of bed. The patch (link fixed now, sorry) is pretty small, though it’s also rough: no control of buffer size or minimum reporting latency without recompiling, that sort of thing.

I plugged the samples from a short run into Numbers — which then beachballed a little while generating the graph, naturally — and this is the result:

Tbeachball graph

The taller portion at the right is from me going to an SVG statistics widget and dragging the sample-picker around like a bandit. On my laptop, we got a little above the 200 ms rule-of-thumb perception threshold for interactivity, and indeed I did start to notice that it was lagging a little behind my motion. I had other tabs open and stuff going on on my computer, etc., so it’s not Very Good Science, but it does give me hope that something like this monitoring will prove a decent proxy for one aspect of responsiveness. (Note that I just drop all samples that are less than 10 milliseconds, so these result sets will tend to give a negative impression of the application, by focusing only on events that actually take meaningful time to process.)