Rapid releases: one webdev’s perspective

People still seem to be very confused about why Mozilla has moved to the new rapid release system. I thought I’d try and explain it from my perspective. I should point out that I am not any kind of official spokesperson, and should not be quoted as such. The following is just my own personal opinion.

Imagine, now, that you work on a team of web developers, and you only get to push new code to production once a year, or once every eighteen months. Your team has decided to wait until the chosen twenty new features are finished, and not to ship until those are totally done and have passed through a long staging and QA period. The hundreds of other bugs/tickets you closed out in those 12-18 months would have to wait too.

Seems totally foreign in these days of continuous deployment, doesn’t it?

When I first heard about rapid releases, back at our December All Hands, I had two thoughts. The first was that this was absolutely the right thing to do. When stuff is done we should give it to users. We shouldn’t make them wait, especially when other browsers don’t make them wait.

The second thought was that this was completely audacious and I didn’t know if we could pull it off. Amazingly, it happened, and now Mozilla releases use the train model and leave the station every six weeks.

So now users get features shortly after they are done (big win), but there’s been a lot of fallout. Some of the fallout has been around internal tools breaking – we just pushed out a total death sprint release of Socorro to alleviate some of this, for example. Most of the fallout, however, has been external. I see three main areas, and I’ll talk about each one in turn.

Version numbers

The first thing is pushback on version numbers. I see lots of things like:
  • “Why is Mozilla using marketing-driven version numbers now?”
  • “What are they trying to prove?”
  • “How will I know which versions my add-ons are compatible with?”
  • “How will I write code (JS/HTML/CSS) that works on a moving target?”

Version numbers are on the way to becoming much less visible in Firefox, as they are in webapps, or in Chrome, for that matter. (As I heard a Microsoft person say, “Nobody asks ‘Which version of Facebook are you running?’”) So to answer: it’s not marketing-driven. In fact, I think not having big versions full of those twenty new features has made it much, much harder for the Engagement (marketing) team to know how to market. I see a lot of rage around version numbers in the newsgroups and on tech news sites (HN, Slashdot, etc.), which tells me that we haven’t done a good job communicating this to users. I believe this is a communication issue rather than a sign that it’s a bad idea: nowhere do you see these criticisms leveled at Chrome, which uses the same method.

(This blog post is, in part, my way of trying to help with this.)

Add-on compatibility

The add-ons team has been working really hard to minimize add-on breakage. In realistic terms, most add-ons will continue to work with each new release; they just need a version bump. The team has a process for bumping the compatible versions of an add-on automatically, which solves this problem for add-ons hosted on addons.mozilla.org. Self-hosted add-ons will continue to need manual updating, and this has caused problems for people.

The goal, as I understand it, is for add-on authors to use the Add-on SDK wherever possible, since the SDK will have versions that are stable for a long time. (Read the Add-ons blog post on the roadmap for more information.)

Enterprise versions

The other thing that’s caused a lot of stress for people at large companies is the idea that versions won’t persist for a long time. Large enterprises tend not to upgrade desktop software frequently. (This is the sad reason why so many people are still on IE6.)

There is an Enterprise Working Group working on these problems: we are taking them seriously.

finally:

Overall, getting Firefox features to users faster is a good thing. Some of the fallout issues were understood well in advance and had a mitigation plan: add-on incompatibility, for example. Others we haven’t done such a good job with.

I truly believe that if we had continued to release Firefox at the rate of a major version every 18 months or so, we would have been on a road to nowhere. We had to get faster. It’s a somewhat risky strategy, but it’s better to take that risk than to quietly fade away.

At the end of the day we have to remember the Mozilla mission: to promote openness and innovation on the web. It’s hard to promote innovation within the browser if you ship orders of magnitude more slowly than your competitors.

Notice that I mention the mission: people often don’t know or tend to forget that Mozilla isn’t in this for the money. We’re a non-profit. We’re in it for the good of the web, and we want to do whatever’s needed to make the web a better and more open place. We do these things because we’re passionate about the web.

I’ve never worked anywhere else where the mission and the passion were so prominent. We may sometimes do things you don’t agree with, or communicate them in a less-than-optimal way, but I really want people who read this to understand that our intentions are positive and our goal is the same as it’s always been.

Capability For Continuous Deployment

Continuous deployment is very buzzword-y right now. I have some really strong opinions about deployment (just ask James Socol or Erik Kastner, who have heard me ranting about this). Here’s what I think, in a nutshell:

You should build the capability for continuous deployment even if you never intend to do continuous deployment. The machinery is more important than your deployment velocity.

Let me take a step back and talk about deployment maturity.

Immature deployment

At the immature end, a developer or team works on a system that has no staging environment. Code goes from the development environment straight to production. (I’m not even going to talk about the situation where the development environment is production. I am fully aware that these still exist, having asked conference audiences exactly that question.) I’m also assuming, in this era of github, that everybody is using version control.

(I want to point out that it’s the easy availability of services like github that has enabled even tiny, disorganized teams to use version control. VC is ubiquitous now, and that is a huge blessing.)

This sample scenario is very common: work on dev, make a commit, push it out to prod. Usually, no ops team is involved, or even exists. This can work really well in an early stage company or project, especially if you’re pre-launch.

This team likely has no automation, and a variable number of tests (0+). Even if they have tests, they may have no coverage numbers and no continuous integration.

When you hear book authors, conference speakers or tech bloggers talk about the wonders of continuous deployment, this scenario is not what they are describing.

The machinery of continuous deployment

Recently in the Mozilla webdev team, we’ve had a bunch of conversations about CD. When we talked about what was needed to do this, I had a revelation.

Although we were choosing not to do CD on my team, we already had every requirement in place:

  • Continuous integration with build-on-commit
  • Tests with good coverage, and a good feel for the holes in our coverage
  • A staging environment that reflects production – our stage environment is a scale model of production, with the same ratios between boxes
  • Managed configuration
  • Scripted deployment to a large number of machines (see the sketch after this list)

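None of this is exotic machinery. To make the last item concrete, here is a minimal sketch of a scripted fleet push in Python. It is illustrative only, not our actual tooling: the hostnames, package name, and remote install script are all hypothetical, and it assumes ssh access and a package already built by CI.

    #!/usr/bin/env python
    # Minimal sketch of a scripted push to a fleet of machines.
    # Illustrative only: the hostnames, package name, and remote
    # install script are hypothetical, not Socorro's real tooling.
    import subprocess
    import sys

    # In real life this list would come from managed configuration.
    PROD_HOSTS = ["web1.example.com", "web2.example.com", "web3.example.com"]
    PACKAGE = "myapp-1.7.2.tar.gz"

    def run(cmd):
        # Echo and run a command, raising if it fails.
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    def deploy(package, hosts):
        for host in hosts:
            # Copy the same tested package to every box...
            run(["scp", package, "%s:/tmp/%s" % (host, package)])
            # ...then unpack and restart via a remote install script.
            run(["ssh", host, "sudo /usr/local/bin/install-package /tmp/" + package])

    if __name__ == "__main__":
        try:
            deploy(PACKAGE, PROD_HOSTS)
        except subprocess.CalledProcessError:
            sys.exit("Push failed; investigate before pushing further.")

The point is not the particular commands; it’s that a push is one repeatable script rather than a sequence of remembered manual steps.
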
I realized then that the machinery for continuous deployment is different from the deployment velocity that you choose for your team. If we need to, we can make a commit and push it out inside of a couple of minutes, without breaking a sweat.

Why we don’t do continuous deployment on Socorro

We choose not to, except in urgent situations, for a few reasons:

  • We like to performance test our stuff, and we haven’t yet automated that
  • We like to have a human QA team test in addition to automated tests
  • We like to version our code and do “proper” releases because it’s open source and other people use our packages
  • A commit to one component of our system is often related to commits in other components, and the set makes more sense to ship as a bundle

Our process looks like this:

  • The “dev” environment runs trunk, and the “stage” environment runs a release branch.
  • On commit, Jenkins builds packages and deploys them to the appropriate environment.
  • To deploy to prod, we run a script that pulls the specified package from Jenkins and pushes it out to our production environment (a sketch follows this list). We also tag the revision that was packaged for others to use, and for future reference. Note we are pushing the same package to prod that we pushed to stage, and stage reflects production.
  • If we need to change configuration for a release, we change it first in Puppet on staging, and then update production the same way.

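As a hedged sketch of what that deploy-to-prod step might look like: the job name, URLs, version, revision, and tag format below are made up for illustration, and the hand-off at the end goes to the kind of fleet-push script sketched earlier. The only real conventions assumed are Jenkins’ standard artifact URLs and git tagging.

    #!/usr/bin/env python
    # Sketch of a deploy-to-prod step: fetch the built package from
    # Jenkins, tag the revision it was built from, then push it out.
    # All names here are hypothetical, not Socorro's real configuration.
    import subprocess
    import urllib.request

    JENKINS = "http://jenkins.example.com"
    JOB = "myapp-release"

    def fetch_package(build_number, filename):
        # Jenkins archives build artifacts and serves them over HTTP.
        url = "%s/job/%s/%s/artifact/%s" % (JENKINS, JOB, build_number, filename)
        urllib.request.urlretrieve(url, filename)
        return filename

    def tag_release(revision, version):
        # Tag the exact revision that was packaged, for others who use
        # our packages and for future reference.
        tag = "v" + version
        subprocess.check_call(["git", "tag", "-a", "-m",
                               "Release " + version, tag, revision])
        subprocess.check_call(["git", "push", "origin", tag])

    if __name__ == "__main__":
        package = fetch_package("42", "myapp-1.7.2.tar.gz")
        tag_release("abc1234", "1.7.2")
        # The same package that already went through stage goes to prod,
        # via a fleet-push script like the one sketched earlier.
        subprocess.check_call(["python", "deploy.py", package])

Because stage is a scale model of production running the same packages, the prod push repeats something that has already been exercised rather than doing anything new.
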
We do intend to increase our deployment velocity in the coming months, but for us that’s not about the machinery of deployment; it’s about increasing our delivery velocity.

Delivery velocity is a different problem, and one I’m wrestling with right now. We have a small team, and the work we’re trying to do tends to come not in small chunks but in big ones, like a new report, or a system-wide change to the way we aggregate data (the thing we’re working on at the moment). It’s not that changes sit in trunk, waiting for a release. It’s more that it takes us a while to get something to a deployable state. That is, deployment for us is not the blocker to getting features to our users faster.

finally:

It’s the same old theme you’ve seen ten times before on this blog: everybody’s environment is different, and continuous deployment may not be for everybody. On the other hand, the machinery for continuous deployment can be critical to making your life easier. Automating all this stuff certainly helps me sleep much better at night than I used to when we did manual pushes.

(Incidentally, I’d like to thank Rob Helmer and Justin Dow for automating our world: you couldn’t find better engineers to work with.)