relman – Lukas Blakk

Release Management Tooling: Past, Present, and Future

As I was interviewing a potential intern for the summer of 2015 I realized I had outlined all our major tools and what the next enhancement for each could be but that this wasn’t well documented anywhere else yet.

By coming to Release Management from my beginnings as a Release Engineer, I’ve been a part of seeing our overall release automation improve across the whole spectrum of what it takes to put out packaged software for multiple platforms and we’ve come a long way so this post is also intended to capture how the main tools we use have gotten to their current state as well as share where they are heading.

Ship-It

Past: Release Manager on point for a release sent an email to the Release-Drivers mailing list with an hg changeset, a version, build number, and this was the “go” to build for Release Engineering to take over and execute a combination of automated/manual steps (there was even a time when it was only said in IRC, email became the constant when Joduinn pushed for consistency and a traceable trail of events). Release Engineers would update a config files & locale changes, get them attached to a bug, approved, uplifted, then go reconfigure the build machines so they could kick off the release build automation.

Present: Ship-It is an app developed by Release Engineering (bhearsum) that allows a Release Manager to input the configurations needed (changeset, version, build number, partials to be created, l10n changesets) all in one place, and on submit the build automation picks up this change from a db, reconfigures the build machine, and triggers builds. When all goes well, there are zero human hands between the “go” and the availability of builds to QA.

Future: In two parts:
1. To have a simple app that can take a list of bug numbers and check them for landing to {branch} (where branch is Beta, Release, or ESR), once all the bug numbers listed have landed, check tree herder for green status on that last changeset, submit to Ship-It if builds are successful. Benefits: hands off even sooner, knowing that all the important fixes are on the branch in question, and that the tree is totally green prior to build (sometimes we “go” without all the results because of human timing needs).
2. Complete End-To-End Release Checklist, dynamically updated to show what stage a release job is at and who’s got the ball in their court. This should track from buglist added (for the final landings a RM is waiting on) all the way until the release notes are live and QA signs off on updates for the general release being in the wild.

Nucleus (aka Release Note App)

Past: Oh dear, you probably don’t even want to know how our release notes used to be made. It’s worse than sausage. There was a sqlite db file, a script that pulled from that db and generated html based on templates and then the Release Manager had to manually re-order the html to get the desired appearance on final pages, all this was then committed to SVN and with that comes the power to completely break mozilla.org properties. Fun stuff. Really. Also once Release Management was more than just one person we shared this sqlite db over Dropbox which had some fun quirks, like clobbering your changes if two people had the file open at the same time. Nowhere to go but up from here!

Present: Thanks to the web production team (jgmize, hoosteeno, craigcook, jbertsch) we got a new Django app in place that gives us a proper databse that’s redundant, production quality, and not in our hands. We add in release notes as well as releases and can publish notes to both staging and production without any more commits to SVN. There’s also an API that can be scripted to.

Future: The future’s so bright in this area, let me get my shades. We have a flag in Bugzilla for relnote-firefox where it can get set to ? when something is nominated and then when we decide to take on that bug as a release note we can set it to {versionNum}+. With a little tweaking on the Bugzilla side of things we could either have a dedicated field for “release-note text” or we could parse it out of a syntax in a comment (though that’s gonna be more prone to user error, so I prefer the former) and then automatically grab all the release notes for a version, create the release in Nucleus, add the notes, publish to staging, and email the link around for feedback without any manual interference. This also means we can dynamically adjust release notes using Bugzilla (and yes, this will need to be really cautiously done), and it makes sure that our recent convention of having every release note connect to a bug persist and become the standard.

Release Dash

Past: Our only way to visualize the work we were doing was a spreadsheet, and graphs generated from it, of how many crasher bugs were tracked for a version, how many bugs tracked/fixed over the course of 18 weeks for a version, and not much else. We also pay attention to the crash rate at ship time, whether we had to do a dot release or chemspill, and any other release-version-specific issues are sort of lost in the fray after we’re a couple of weeks out from a release. This means we don’t have a great sense of our own history, what we’re doing that works in generating a more stable/successful release, and whether a release is in fact ready to go out the door. It’s a gamble, and we take it every 6 weeks.

Present: We have in place a dashboard that is supposed to allow us to view the current crash data, select Talos (performance) data, custom bug queries, and be able to compare a current release coming down the pipe to previous releases. We do not use this dashboard yet because it’s been a side project for the past year and a half, primarily being created and improved upon by fabulous – yet short-term – interns at Mozilla. The dashboard relies on Elastic Search for Bugzilla data and the cluster it points to is not always up. The dash is written in php and that’s no one’s strong suit on our current team, our last intern did his work by creating a Python Flask app that would work into the current dash. The present situation is basically: we need to work on this.

Future: In the future, this dashboard will be robust, reliable, production-quality (and supported), and it will be able to go up on Mozilla office screens in the dashboard rotation where it will make clear to any viewer:
* Where we are in the current release cycle
* What blockers remain for releas
* How our stability is (over/under acceptable rates)
* If we’re meeting performance expectations
And hopefully more. We have to find more ways to get visibility into issues a release might hit once it’s with the larger population. I’d love to see us get more of our Beta user’s feedback by asking for it on specific features/fixes, get a broader Beta audience that is more reflective of our overall release population (by hardware, location, language, user types) and then grow their ability to report issues well. Then we can find ways to get that front and center too – including to developers because they are great at confirming if something unusual is happening.

What Else?

Well, we used to have an automated script that reminded teams of their open & tracked bugs on Beta/Aurora/Nightly in order to provide a priority order that was visible to devs & their managers. It’s a finicky script that breaks often. I’d like to see that replaced with something that’s not just a cronjob on my personal VPS. We’re also this close to not needed to update product-details (still in SVN) on every release. The fact that the Release Management team has the ability to accidentally take down all mozilla.org properties when a mistake is made submitting svn propedits is not desireable or necessary. We should get the heck away from that asap.

We’ll have more discussions of this in Portland, especially with the teams we work closely with and Sylvestre and I will be talking up our process & future goals at FOSDEM in 2015 as well as following it with a work week in Paris where we can put our heads down and code. Next summer we get an intern again and so we’ll have another set of skilled hands to put on tooling & web service improvements.

Always improving. Always automating. These are the things that make me excited for the next year of Release Management.

Thanks to the GNOME Outreach Program for Women, we’ve got ourselves an awesome January intern who will be doing her first Open Source contributions all the way from Australia.

Lianne Lee stood out as the strongest of several applicants to the Release Metrics Dashboard, which was one of the two Mozilla projects that Selena Deckelmann and I threw together in order to try and lure people to our devious schemes for moar metrics. Lianne’s application was thorough and used all the technologies I wanted our intern to have familiarity with (python, git, javascript, creating data visualizations)

She did a great job of showing one of the things release managers do over the six weeks a Firefox version is in Beta. The spikes in the above graph align with our constant triaging of tracking-firefox17? flags and how the number of bugs flagged for tracking decreases after the first few betas have shipped. When we get to beta 4 we’re starting to get more reserved about what we’re willing to track (it usually has to be pretty critical, or a low-risk fix to a many-user-facing issue).

This next graph shows us what we already know – but it’s very nice to SEE: our bugs tracked for a particular release continually go down over time, gradually. Remember, this is while new bugs are being added to tracking regularly, so the fact that the trend keeps going down helps us know we are staying on top of our work and that engineers are continuing to fix tracked bugs as we close in on a 6 week ship date.

Now that we know Lianne has got what it takes, we’re going to set her on a more ambitious project – to create an engineering dashboard both for individuals and for teams, that would give them this sort of info on demand. Want to see where you’re at (or where your team is at) on a particular version? The engineering dashboard could show you in priority sequence what should be top on your list and also what bugs your team has unassigned that are tracked and should be assigned pronto (or communicate to RelMan that the bug should not be tracked).

This will be a huge improvement over email nagging (don’t worry, that’s still going to be around for many more months) because it will give us some quick, visual cues about how we’re doing with Firefox priorities and then we can also keep these measurements over time to compare release-to-release what the temperature of a particular version was. We hope this will allow us to keep fine tuning and working towards more stable beta cycles as we move forward.

Lianne will be with us from January 2 to April 2, 2013 and in her first week she’ll be evaluating a bunch of existing dashboards we know about to see what the pros and cons of each are and to do reconn on the technologies and visualizations people use. We’ll use that to help us develop the v1.0 of this project’s deliverable and make sure it’s left in a state that RelMan intern 2.0 can pick up next summer.

Please comment if you have dashboards you like, you loathe, or you just want us to know about.

Tag: relman

Release Management Tooling: Past, Present, and Future

Release Management Tooling: Past, Present, and Future

Ship-It

Nucleus (aka Release Note App)

Release Dash

What Else?

Release Management gets an Intern!