never test for an error you can’t handle

I work on recovery. This means that I’m responsible for getting the cluster (or at least the filesystem parts of it) back into mostly-serviceable shape automatically if (OK, when) something goes wrong. Could be a network failure, or one of the server or client computers crashing. Maybe a rack loses power, or we hit a fatal bug in our server software and it autoreboots. The combinatorial explosion of failure modes and their effects is really pretty impressive, and makes for some challenging analysis problems. (“OK, but what if the file was created on the other client, and then we have to replay the open on this client?” “Wait a sec, that transaction could make it to disk before we send the reply to the dead client.”)

Today, I embarked in earnest on a test suite to let us test the various combinations of, well, recovery things that we care about. (This is, for the record, the same test suite I told Coop yesterday not to write. After I told him to write it. Tee hee.) I’m very excited, because the quality of a project’s tests tends to correlate strongly with the quality of its code, and I think it’s pretty important that recovery, our “software safety net,” be robust.
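To give a feel for how fast those combinations pile up, here is a minimal Python sketch of enumerating a recovery test matrix. The failure list is taken from the failure modes mentioned above; the timing points and operations are hypothetical placeholders I made up for illustration, not Lustre’s actual test harness.

```python
from itertools import product

# Failure modes mentioned in the post (illustrative list).
FAILURES = ["network partition", "server crash", "client crash",
            "rack power loss", "server bug autoreboot"]

# Hypothetical points in an operation's lifetime at which the failure strikes.
TIMINGS = ["before request sent", "after request received",
           "after commit to disk, before reply", "after reply sent"]

# Hypothetical filesystem operations a recovery test might need to replay.
OPERATIONS = ["open", "create", "unlink", "rename"]


def recovery_cases():
    """Yield every (failure, timing, operation) combination to test."""
    return product(FAILURES, TIMINGS, OPERATIONS)


if __name__ == "__main__":
    cases = list(recovery_cases())
    print(len(cases))  # 5 failures * 4 timings * 4 operations
```

Even this toy matrix yields 80 single-failure cases, and it says nothing yet about two failures overlapping, which is where the really fun bugs live.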

And just in time! Phil and Peter have each put both fists — that’s four flying fists of hacker fury, in total — through the lock management and metadata parts of Lustre, and it’s going to break some parts of recovery like a drunken promise. I don’t mind, though, because it’s improved our stability so much.

When I wasn’t fretting over recovery today — I seem to do that a lot since I started this job, don’t I? — I was reading some pretty entertaining stuff on the interweb. Colby Cosh is a funny guy, and this bit from an article about the Davos conference registered as Officially Funny over here in this armchair that I’m calling an office this week:

For some, life begins at conception, for others at birth. “According to Jewish law,” deadpanned Yossi Vardi, an Israeli software entrepreneur, “life begins when the fetus becomes a lawyer.”

My main man Jacob has acquired hockey tickets for my next trip to Boston. It’s all starting to come together.
