Launching a career in bikeshed painting

In March I’ll pass six years since I first submitted a feature change to PostgreSQL.  Numbers and years shouldn’t mean that much to people, but try and tell that to anyone who’s ever forgotten about their anniversary.  The fifth anniversary is traditionally celebrated with gifts made of wood.  I think I’ll use mine to build a bikeshed, a tradition in software development going back to at least 1956.  My guess is that minutes after the first person built storage that held one bit of information, someone questioned not the logic they used to flip its state, but instead whether the soldering technique used to wire it on would last.

This particular anniversary of mine falls in the middle of a busy PostgreSQL CommitFest, and I’m still working on the same sort of features I started out hacking on.  That’s left me reflecting on what has changed and what has stayed the same in PostgreSQL development over that time, as I’ve tied myself increasingly to that work.  I think my story is a good example of how to get started in an open-source community.  I had a head start from interacting with other users and developers on the Progress database mailing lists in the mid-90s.  Grokbase remembers everything I’ve sent to the PostgreSQL mailing lists from the beginning, and their PostgreSQL badges give a quantity rating to everyone who writes a lot of e-mail there.

My first submission to the project was actually a bug fix in late 2006, just before the release of PostgreSQL 8.2 RC1.  I’d found a problem with the test_fsync utility and submitted an updated version of the entire source code file, since I didn’t know how to submit a patch yet.  I disclaimed the update as still being “fishy”, and Tom Lane quickly pointed out what I’d done wrong.  But Bruce Momjian missed my disclaimer and committed it anyway, was yelled at by Tom for messing with things just before the release candidate, and then reverted it.  That might sound like a bad process, and it’s not the sort of thing that would happen now.  But to put in perspective how little disruption it caused, all of that happened within 40 minutes.  So much for my first contribution attempt!  That’s a standard welcome to open source development though.  Bruce fixed test_fsync the right way a few days later.

Here in 2013, Bruce still hacks on the now renamed pg_test_fsync and argues with Tom.  Bruce used to manually keep a list of unreviewed mailing list messages that contained patch submissions.  Now they go through our CommitFest process.  New submitters can now find guides to Submitting a Patch.  I’ve so repented my early mistakes that I published a guide to Creating Clean Patches, which I think is useful for anyone new to version control and open-source submissions.  PostgreSQL’s switch from CVS to git has certainly made all that easier.

My first feature submission, the thing I’m basing my anniversary date on, was an initial attempt at adding checkpoint logging.  During the discussion around it, I got my first “prove that” from Tom.  I survived that because it turns out neither of us was completely wrong.  I was using terminology incorrectly due to a confusing part of the source code.  Being gracious about advice when someone has told you you’re wrong is an important trait for fitting into a development community.  Tom still asks people to prove performance claims made without data to back them up, and now I do that too.

A few weeks later I started chiming in about an existing bit of work eventually referred to as checkpoint spreading, and Magnus Hagander decided to pull the checkpoint information I wanted to collect into what became pg_stat_bgwriter.  Magnus and I still bounce ideas in this area off each other, which is a big help because he can commit things, too.

When you work on a feature submission, one possibility is that the original form you submitted will be rejected and replaced by a better approach.  Since people writing rejected patches don’t always get credited for that, this is sometimes a sore spot.  Large software projects need to prioritize the health of the project over hurt feelings though, and this sort of thing is certainly not unique to PostgreSQL.  The PostgreSQL CommitFest app has made it easier to credit reviewers and other people who made contributions to a patch during its lifetime.  That’s improved the accuracy of those credits while making them easier for committers to collect.

The first feature change I had committed was adding usage count statistics to pg_buffercache.  That was done only a week after submission, since it was complete, simple, and uncontroversial.  One thing I started doing there that I continue today is always including an example of how to use any feature, when practical to do so.  The initial experience someone gets when trying out new code, the software developer version of the out-of-box experience, is far easier when reviewers have an example to start with.

The last thing that marks the end of my early days was the first major project I took on.  The new monitoring information I’d helped add made it easier than ever to spy on some of the database internals.  At the time, the database background writer was practically untunable, and it was easy to run into performance issues from tuning it badly.  On top of all that, the default configuration was troublesome for larger current-day installations, ones far beyond the scale its original development tested against.

But the consulting job I’d been doing for 9 months, the one that initially gave me more insight into all of this, was over.  And there weren’t many people hiring PostgreSQL developers yet in 2007.  As I approached the summer, I was confident that I could rewrite the background writer to be less bad out of the box, and to be far easier to tune when the defaults weren’t good.  But it was going to take a few months of full-time work.  And no one was going to pay me for it.  I’d made enough on the consulting work to have a few months of savings in the bank.  The question I faced was whether I was willing to burn through all of it in order to finish this work.

This is not unusual.  Getting a foothold as a new contributor to an open-source project isn’t going to happen overnight.  If that’s something you want, you’re probably going to have to prove yourself to the existing contributors.  And there’s a circular dependency you’re facing.  To have credibility in a project, you’ll need to have a track record of committed work.  And your work is more likely to be committed if you have credibility.  It may take quite a bit of volunteer work to climb that chain.  I highly recommend starting with very small changes.  My first submission was a single-line patch, and I moved up in complexity slowly from there.  I also found another submission that I liked, and I worked on that for a while as a reviewer.  Reviewing other people’s work is also a good way to gain respect among project contributors.

Once I decided to start on savings-funded work, my time estimate was good.  The first useful work was submitted in early May, and the new feature was committed in late September.  By October I’d found a new full-time position working on a commercial PostgreSQL fork, and being able to point at this bit of work was a major help in getting hired for that.  I don’t think it ever would have happened had I not found a large block of time to volunteer on the project first.  Getting that feature done required interacting with many members of the PostgreSQL community.  The good faith I built up from that has proven just as important as the coding itself was.  Code contributions to PostgreSQL from someone unwilling to take on some community interaction are, frankly, worthless.  That fact is a sore spot with many new contributors.

Non-trivial changes to the database code always go through some review and feedback.  And the features always end up better for it.  Submitting something to PostgreSQL is about joining a process.  You don’t just fling code over the wall and expect to get credited with doing something useful.  As with most software, the integration work often takes longer than the original coding.  The project doesn’t really even want to work with people who are interested only in the coding part and expect other people to just take their prototype and run with it.  The upside for you is that you’ll almost certainly learn something useful in the process.  There are a lot of really sharp people involved in reviewing and committing new PostgreSQL code.  The feedback you’ll get may be a bit blunt, even tactless on a bad day.  Remember that it isn’t meant to be taken personally.  At the end of the day, the people who work hard on PostgreSQL are worried about the quality of the program.  If they fear your change might cause that quality to drop, you’ll be told exactly where that fear comes from.  Guiding the code progress of an important project is both a job and a responsibility.