Tagged: release management

Release Management Tooling: Past, Present, and Future


While interviewing a potential intern for the summer of 2015, I realized I had outlined all our major tools and the next enhancement each could use, but that none of this was well documented anywhere else yet.

Coming to Release Management from my beginnings as a Release Engineer, I've watched our release automation improve across the whole spectrum of what it takes to put out packaged software for multiple platforms. We've come a long way, so this post is also intended to capture how the main tools we use got to their current state, and to share where they're heading.

Ship-It

Past: The Release Manager on point for a release sent an email to the release-drivers mailing list with an hg changeset, a version, and a build number; this was the "go" to build, at which point Release Engineering took over and executed a combination of automated and manual steps. (There was even a time when the "go" was only given in IRC; email became the constant when joduinn pushed for consistency and a traceable trail of events.) Release Engineers would then update config files and locale changes, get them attached to a bug, approved, and uplifted, and finally reconfigure the build machines so they could kick off the release build automation.

Present: Ship-It is an app developed by Release Engineering (bhearsum) that lets a Release Manager input all the needed configuration (changeset, version, build number, partials to be created, l10n changesets) in one place; on submit, the build automation picks the change up from a database, reconfigures the build machines, and triggers builds. When all goes well, there are zero human hands between the "go" and the availability of builds to QA.

Future: In two parts:
1. A simple app that takes a list of bug numbers and checks whether each has landed on {branch} (where branch is Beta, Release, or ESR). Once all the listed bugs have landed, it checks Treeherder for green status on the last changeset and submits to Ship-It if the builds are successful. Benefits: hands-off even sooner, knowing that all the important fixes are on the branch in question and that the tree is totally green prior to building (sometimes we "go" without all the results in because of human timing needs).
2. A complete end-to-end release checklist, dynamically updated to show what stage a release is at and who has the ball in their court. It should track everything from the buglist being added (for the final landings an RM is waiting on) all the way to the release notes going live and QA signing off on updates for the general release being in the wild.
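A rough sketch of the go-when-ready logic from item 1. The Bugzilla and Treeherder lookups are stubbed out as plain data here; the real checks would query those services' APIs, whose exact calls aren't spelled out in this post.

```python
# Minimal "go when ready" checker: hold the Ship-It submission until
# every watched bug has landed on the target branch AND the tip is green.
# Both inputs are stand-ins for real Bugzilla/Treeherder lookups.

def all_landed(bug_statuses):
    """True once every watched bug has a landing on the target branch."""
    return all(bug_statuses.values())

def ready_to_ship(bug_statuses, tree_is_green):
    """Submit to Ship-It only when every bug landed and the tree is green."""
    return all_landed(bug_statuses) and tree_is_green

# Example: one fix still hasn't landed on Beta, so we hold the "go".
watched = {101: True, 102: True, 103: False}
print(ready_to_ship(watched, tree_is_green=True))  # False: 103 not landed
```

The point of separating the two checks is that each one maps to the two conditions called out above: "all the important fixes are on the branch" and "the tree is totally green prior to build".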

Nucleus (aka Release Note App)

Past: Oh dear, you probably don't even want to know how our release notes used to be made. It's worse than sausage. There was a SQLite db file, and a script that pulled from that db and generated HTML based on templates; the Release Manager then had to manually re-order the HTML to get the desired appearance on the final pages. All of this was committed to SVN, and with that came the power to completely break mozilla.org properties. Fun stuff. Really. Also, once Release Management was more than just one person, we shared this SQLite db over Dropbox, which had some fun quirks, like clobbering your changes if two people had the file open at the same time. Nowhere to go but up from here!

Present: Thanks to the web production team (jgmize, hoosteeno, craigcook, jbertsch), we got a new Django app in place that gives us a proper database that's redundant, production quality, and not in our hands. We add in release notes as well as releases and can publish notes to both staging and production without any more commits to SVN. There's also an API we can script against.

Future: The future's so bright in this area, let me get my shades. We have a relnote-firefox flag in Bugzilla that gets set to ? when something is nominated; when we decide to take a bug on as a release note, we set it to {versionNum}+. With a little tweaking on the Bugzilla side we could either add a dedicated "release-note text" field or parse the text out of a comment syntax (though that's more prone to user error, so I prefer the former), and then automatically grab all the release notes for a version, create the release in Nucleus, add the notes, publish to staging, and email the link around for feedback without any manual interference. This also means we could dynamically adjust release notes via Bugzilla (and yes, this would need to be done really cautiously), and it would make our recent convention of connecting every release note to a bug persist and become the standard.
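As a hedged sketch of that triage step: given bugs carrying a relnote-firefox flag ("?" for nominated, "{versionNum}+" for accepted), the approved ones for a version can be filtered out and shaped into payloads for a Nucleus-style API. The field names in the payload are illustrative, not Nucleus's actual schema.

```python
# Filter bugs whose relnote-firefox flag is "<version>+" and shape them
# into note entries. The "relnote_text" field stands in for whatever
# dedicated release-note field (or parsed comment) Bugzilla ends up with.

def approved_relnotes(bugs, version):
    """Return note payloads for bugs whose flag is '<version>+'."""
    wanted = "%d+" % version
    return [
        {"bug": b["id"], "note": b["relnote_text"]}
        for b in bugs
        if b.get("relnote_flag") == wanted
    ]

bugs = [
    {"id": 101, "relnote_flag": "?", "relnote_text": "Nominated, undecided"},
    {"id": 102, "relnote_flag": "36+", "relnote_text": "New search UI"},
    {"id": 103, "relnote_flag": "35+", "relnote_text": "Shipped last cycle"},
]
print(approved_relnotes(bugs, 36))  # only bug 102 qualifies
```

Everything after this filter (creating the release in Nucleus, publishing to staging, emailing the link) would hang off the Nucleus API the post mentions.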

Release Dash

Past: Our only way to visualize the work we were doing was a spreadsheet, and graphs generated from it: how many crasher bugs were tracked for a version, how many bugs were tracked/fixed over the course of a version's 18 weeks, and not much else. We also pay attention to the crash rate at ship time and to whether we had to do a dot release or chemspill, but any other release-version-specific issues get lost in the fray once we're a couple of weeks out from a release. This means we don't have a great sense of our own history, of what we're doing that generates a more stable/successful release, or of whether a release is in fact ready to go out the door. It's a gamble, and we take it every 6 weeks.

Present: We have in place a dashboard that is supposed to let us view current crash data, selected Talos (performance) data, and custom bug queries, and compare a release coming down the pipe to previous releases. We don't use this dashboard yet because it's been a side project for the past year and a half, primarily created and improved upon by fabulous – yet short-term – interns at Mozilla. The dashboard relies on Elastic Search for Bugzilla data, and the cluster it points to is not always up. The dash is written in PHP, which is no one's strong suit on our current team; our last intern did his work in a Python Flask app that plugs into the current dash. The present situation is basically: we need to work on this.

Future: In the future, this dashboard will be robust, reliable, production-quality (and supported), and it will be able to go up on Mozilla office screens in the dashboard rotation where it will make clear to any viewer:
* Where we are in the current release cycle
* What blockers remain for release
* How our stability is (over/under acceptable rates)
* If we’re meeting performance expectations
And hopefully more. We have to find more ways to get visibility into issues a release might hit once it's with the larger population. I'd love to see us gather more of our Beta users' feedback by asking for it on specific features/fixes, grow a broader Beta audience that better reflects our overall release population (by hardware, location, language, and user type), and then grow their ability to report issues well. Then we can find ways to get that feedback front and center too – including to developers, because they are great at confirming whether something unusual is happening.

What Else?

Well, we used to have an automated script that reminded teams of their open & tracked bugs on Beta/Aurora/Nightly, to provide a priority order visible to devs and their managers. It's a finicky script that breaks often, and I'd like to see it replaced with something that's not just a cronjob on my personal VPS. We're also this close to not needing to update product-details (still in SVN) on every release. The fact that the Release Management team can accidentally take down all mozilla.org properties with a mistaken svn propedit is neither desirable nor necessary. We should get the heck away from that ASAP.

We'll have more discussions of this in Portland, especially with the teams we work closely with. Sylvestre and I will also be talking up our process and future goals at FOSDEM in 2015, followed by a work week in Paris where we can put our heads down and code. Next summer we get an intern again, so we'll have another set of skilled hands to put on tooling and web service improvements.

Always improving. Always automating. These are the things that make me excited for the next year of Release Management.

Adding more Beta releases to the train

In March of 2011 we shipped Firefox 4 and moved to a rapid release cadence, with 6 weeks on each of the Nightly, Aurora, and Beta channels prior to shipping a new major version of Firefox Desktop and Mobile to our users. The Nightly and Aurora channels both got builds and updates nightly (breakage notwithstanding), while Beta builds were still a highly managed, hands-on release product that shipped once per week – 6 builds in all, unless additional last-minute landings (typically critical security or 3rd-party plugin/addon issues) required a beta 7 or, rarely, a beta 8 prior to building our release candidate for that version.

Go to build by or before Tuesday EOD Pacific time; builds would be pushed to the beta channel as soon as QA signed off, which could be Friday morning or sometimes Thursday afternoon if done early.

This is the model we followed up until Firefox 23. Starting with Firefox 15 we gained the ability to perform silent, background updates, which meant we could push more updates to a release without causing update fatigue. Release Management, Release Engineering, QA, Stability, and Support hashed out what it would take to move to a system where Beta builds are produced in a nightly, automated manner. We dubbed this the Rapid Beta model, and as work from all teams has progressed toward that goal, we've gotten a handle on where the bottlenecks are that impede fully automating the push of the most recent Beta code to our 10 million Beta users.

Getting more builds to Beta users is to our advantage because, with Beta at 1/10th of our general release population, the faster we can get fixes to those users (especially crash fixes or speculative fixes for compatibility and addon/plugin breakage), the sooner we can collect the much-needed data that verifies the quality of our impending final build. Under the previous model, a fix missing a beta train meant that much more risk added to the landing, and we typically throttled back all but the most serious security and usability patches after the 4th beta, so developers (and release managers) were sometimes forced into pressured decisions about whether something could make a release or would have to wait 8 more weeks for the next train.

QA pared down the manual testing needed for sign-off, and Release Engineering put together the fabulous Ship-It web interface that Release Management can use to request builds in a more hands-off way, making the processes around starting and monitoring a new beta build much less time intensive. Socorro work made it possible to match crash data to build IDs so that we could technically support nightly Beta builds and see stability data in useful ways. Once all this was in place, we took a leap of faith and started releasing twice as many Beta builds in weeks 2-5 of the Firefox 23 cycle.

    First and last week still have one beta, weeks 2-5 have two builds per week where one is built on Monday shipping by Wednesday and the other build starts Thursday and ships by end of day Friday.

This new model has now been through two full releases, Firefox 23 & 24, and the feedback so far has been quite positive. Release Engineering has been called upon only minimally, when the Ship-It interface hit glitches, and those are mostly ironed out now. QA is turning around their sign-off of Firefox Desktop within approximately 24 hours, and according to them their bug-fix verification rates are going up under this new model, in part because the smaller set of changes per Beta lets them focus more. (They've also had an intern, and their remote testers team gained additional resources, but the switch to more frequent Betas has apparently gone quite smoothly for them.) From a Release Management perspective, the tracking and landing of fixes on Beta is going much better, since we now have less panic and stress around landings at the beginning of each week. With one Beta kicked off on Mondays, we start the week with something to evaluate mid-week, and then we continue to pick up fixes as developers start their week, in order to get another build incorporating feedback gathered over the weekend.

We're moving away from spikes of landings near the end of the Beta cycle now that we have more Betas for people to land in.

Though the data is a little rough right now (I'm dreaming of a pushlog DB), the numbers so far suggest we're doing a good job of spreading landings out over the course of the cycle, still tapering off at the end:

Landings are more evenly spread out in a week.

At the same time, our overall tracking average remains stable, and our tracked-bugs-fixed rate has held above 90% per release for the past 3 releases:

Tracking bugs fixed over unfixed; tracked-to-fixed percentage.

Along with these improvements in getting features, regression fixes, and crash fixes to our users sooner through more automation and hands-off processes, we've been getting a lot out of now having full-time sheriffs for the tree. Ryan VanderMeulen and Ed Morley are doing a lot of the heavy lifting: keeping uplifts in order, landing patches frequently, and monitoring the trees for breakage. Having managed trees, as well as team trees for active development, is likely responsible for our tracking+/fixed ratio on mozilla-central improving over time.

Finally, what's most important from this experiment – and what we consider the biggest win so far – is that this new beta model helps release drivers across the whole cycle make decisions about uplifts with less concern about timing and more focus on overall risk to the product. Having more Beta builds means not having to make rash decisions because of scarcity. We will continue to collect data and monitor our progress, and work towards automated, nightly Beta builds, since that would give us crash feedback at a more granular level; for now, I see this progress as a huge step forward for the stability and quality of our releases. Neither of the last two releases had to be followed by a dot release for anything we could have prevented, and our Beta audience size holds strong, confirming that background updates are doing their job. Next up, we'll look at potentially moving to a slightly longer, overlapping Beta cycle while shortening time on Aurora – but that's another post for another time.


Contribution opportunity: Early Feedback Community Release Manager

I’ve been in Release Management for 1.8 years now and in that time we’ve grown from one overworked Release Manager to a team of 4 where we can start to split out responsibilities, cover more ground on a particular channel, and also…breathe a bit. With some of the team moving focus over to Firefox OS, we’ve opened up a great opportunity for a Mozillian to help Release Management drive Firefox Desktop & Mobile releases.

We're looking for someone committed to learning the deepest, darkest secrets of release management, with a few hours a week consistently available to work with us by helping gather early feedback on our Nightly channel (aka mozilla-central or 'trunk'). This very fabulous volunteer would get mentoring on tools and process, and build up the awareness of risk needed to ship software to 400 million users, starting at the earliest stage of development. On our Nightly/trunk channel there can be over 3000 changes in the 6-week development cycle, and you'd be the primary person calling out potentially critical issues so they're less likely to cause pain on the user-facing release channels with larger audiences.

A long time back, in a post about developing community IT positions, mrz recalled a post where I stated that successful integration of community volunteers with paid staff requires time dedicated to working with that community member, included in an employee's hours, so the experience can be positive for both parties. It can't just be "off the side of the desk" for the employee, because that creates a risk of burnout, which can lead to communication irregularities with the volunteer and make them feel unneeded. For this community release manager position, I'll be able to put my time where my mouth is and dedicate hours each week to actively shaping and guiding this community Release Manager, ensuring they get the skills they need while we get quality improvements in our product.

So here goes with an “official” call for help, come get in on the excitement with us.

You

  • Are familiar with and interested in the distributed development tools (version control, bug tracker) typically used in an open source project of size (remember when I said 400 million users? Ya, it's not a small code base)
  • Want to learn (or already know) how to identify critical issues in a pool of bugs filed against a code base that branches every 6 weeks
  • Have worked in open source, or are extremely enthusiastic about learning how to do things in the open with a very diverse, global community of passionate contributors
  • Can demonstrate facility with public communications (do you blog, tweet, have a presence online with an audience?)
  • Will be part of the team that drives what goes into final Firefox releases
  • Will learn to coordinate across functional teams (security, support, engineering, quality assurance, marketing, localization)
  • Will have the opportunity to develop tools, work with us to improve existing release processes, and build your portfolio/resume

We

  • Mentor and guide your learning in how to ship a massive, open source software project under a brand that’s comparable to major for-profit technology companies (read: we’re competitive but we’re doing it for different end goals)
  • Teach you how to triage bugs and work with engineers to uncover issues and develop your intuition and decision making skills when weighing security/stability concerns with what’s best for our users
  • Offer on-site time with Mozillians outside of Summits & work weeks – access to engineers, project managers, and other functional teams – and real-world experience in working cross-functionally
  • Invitations to local work weeks where you can learn how to take leadership on ways to improve pre-release quality and stability that improve our Firefox Desktop/Mobile releases
  • Provide references, t-shirts, and sometimes cupcakes :)

I'll be posting this around and looking to chat with people either in person (if you're in the Bay Area) or over Vidyo. The best part is you can be anywhere in the world – we'll figure out how to work with your schedule to ensure you get the guidance and mentoring you're looking for.

Look forward to hearing from you! Let’s roll up our sleeves and make Firefox even better for our users!


Release-Mgmt: My First Beta from the ‘other’ side

Hello and welcome to the continued documentation of my learning curve in Release Management, something I've now been working at for 6 weeks. Last time I was reeling from the new-to-me meeting/email/bugmail firehose. Now I've got that more under control, having created many more filters and folders in Thunderbird as well as having had a chance to do more work on the automated tracking-emails script.

To continue spreading release-management knowledge and tasks across more than just one Alex, last week I ran my first beta (Firefox 12 beta 4), and now I'm going to tell you what that involved, because the Bugzilla API is down.

Monday

Monday is about getting through the queries as much as possible. This means making sure anything with tracking? or approval-mozilla-{beta,aurora,esr10}? is triaged and nudged further toward its final destination on the trains. At this point we also need to watch out for riskier fixes: things that really need some bake time with actual users have to land for beta 4, since beta 5 is more for low-risk regression back-outs and security fixes that need to land just in time for beta 6 (which is usually re-built as the appropriately branded final release). At this point in the week there might be 80-90 bugs we need to get sorted, and at the end of the day only about 6 bugs were on the 'really want this in beta 4' list.

Tuesday

For this particular beta, we had been asked if it would be possible to give the 'go' to build earlier than usual: a fairly popular holiday weekend was coming up (Easter), and our QA lead for this release was in Canada, where Good Friday is a statutory holiday. The QA lead asked if we could go to build earlier so that a Thursday release of the beta would be possible. We confirmed with RelEng that this was doable and agreed to do our best to get the 'go' out at an earlier time. The plan was for everything to be landed by 2:30pm Pacific in order to have a changeset ready to fire off to Release Engineering by 5pm.

Going through the 'burn list' from Monday (6 bugs) mostly entailed tracking down people to land patches. There has to be a cutoff time for landings, since it takes about 4 hours for all the builds and tests on a push to the hg repo to report back completely. Note that improvements to build times are being worked on; case in point, faster Mac builds (newer hardware running OS X 10.7) take ~2.5 hours off the normal Mac build times.

For one of the bugs we couldn't reach the dev, and a volunteer committer found that the patch didn't apply cleanly, so that bug had to miss the train. The others got in, and I sent the 'go' to build Firefox 12 beta 4 at approximately 5:45pm (45 minutes later than desired). The results weren't all in yet for that changeset, but I wanted a shot at making the earlier-build request to QA, so I took the risk that the builds and tests would be OK and that we'd already be building when we confirmed it. Had the builds or tests *not* turned out, we'd have had to scrap the chance of moving up the release window, and one of our QA leads would have worked on a holiday to meet our 'normal' Friday release window. So I took the leap (and it ended up being fine, though I wouldn't do it again without good reason; it was a stressful call to make and not a good practice to get into).

Right after the 'go' email was sent, hg.mozilla.org went down and we lost 3 hours of build time. This is not a normal result of giving the 'go' to build; it was probably just that Hal was new to doing releases and I was new to running a beta, so at least one thing had to come along and shake our confidence.

Wednesday

There’s not much (beta-running) to do on Wednesday except wait for Desktop & Mobile QA to do their thing.

Thursday

Mobile & Desktop QA send their results out – either signing off on the builds/updates or calling out issues. This time QA signed off, so I could ask the Release Engineer to push the updates to the beta channel (and upload a new beta APK to the Google Play Store). An hour or so later, both QA leads signed off on the updates for the release, which was then live to our beta users. After that, there are just some product-details changes to make so our websites include beta 4. We don't do release notes per beta, which is good to know for when I run a beta 1.

Friday

Normally we’d be doing the push to beta channel on a Friday, so what would have been different was:

  • getting everything landed to mozilla-beta could have gone until later in the day
  • the 'go' to build email wouldn't have resulted in immediate builds; they could start on Wednesday at the beginning of the Release Engineer's day
  • QA needs two full nights of testing, so we’d get sign offs from QA on Friday morning instead – hence the push to beta channel (and store) on Friday afternoon

Nothing too crazy for my first beta. I think beta 4 is a good one to start on: it's not the "OMG last call!!1!" beta before a release, and it's not beta 1, where any fallout from our merge of mozilla-aurora -> mozilla-beta shakes loose. Having now done a beta release, I have a much more complete mental map of how the 6-week release cycle plays out for Release Management:

Multiply the above by 6, sprinkle extra bug & meeting communication cycles onto weeks 1 and 6, and throw in twelve channel meetings plus approximately 30 more iterations of various triage queries to keep our tracking lists up to date and to know what really needs attention vs. what's taking care of itself.

Add a ton of email/IRC/automated notifications, all with the goal of keeping tracked bugs moving forward, and you've got your 6-week result: a fabulous new Firefox release.

Thanks for reading, more about automatic emails & wiki updates soon.

My first three weeks in Release Management

Three weeks and four days ago a request was floated out to me: would I consider helping the release management team for a little bit? You see, we've had an unfortunate 'trend' in Release Management at Mozilla. Whenever a new hire was brought in to join in the fun of release management, the former team member would see their opening and, often quite quickly, head off to another gig. This 1-to-1-to-1-to-1 'team' size leads to a single release manager taking on an incredible load, and working alone to boot. Those two factors might very well lead someone to throw in the towel, no matter how awesome Mozilla and its mission are, and when you only have one release manager, that's a terrible edge to ride. So, back to the question: would I be willing to get in there and help out, attempt to lighten the load, learn how it's done, and let our 6-months-in release manager, Alex, perhaps start to ease off his 100-hour weeks?

Answer: Gladly. I can honestly say there is _nothing_ more motivating to me than being asked to help someone, especially if I think I can actually do it. I mean, I would certainly try to help someone beat off a streak of tigers but I would die pretty fast regardless of my high level of ‘motivation’, and being dead isn’t too helpful is it? In this case, being that I’m very familiar with how releases get out the door, I figured I had a good head start and could pick up the rest as we went along.

Three weeks ago I began a journey into learning how exactly the "other side" lives. Until then I had been blissfully ignorant of most things related to getting our product ready to ship. Sure, I remember before Firefox 4 went out, when Shaver was promising people he would squeeze oranges for them if squeezed oranges would help them _get_stuff_done_, but mostly I was unaware of what went on at all those meetings, how bugs got chosen for different releases, and how we got a product to the point where a Sam/Beltzner/LegNeato/Akeybl could say to Release Engineering "go to build on Firefox Version N, build #1".

No longer am I blind. Bonus? Now I know what the people I’ve known for 5 years _actually_do_.

Here’s what I have discovered so far:

  • Bug queries are my new best friends. We spend a LOT of time looking at various triangulations of bug data – approval required for {beta,aurora,esr10,1.9.2}? Are we tracking it for {11,12,13}+? Is it a security bug that needs to get into 3.6.next? Blocker? Topcrash? Regression? I now have at least 4 or 5 bug-list tabs open at any given time.
  • There's a lot of communication involved in getting a bug to the finish line. Comment in the bug, wait for a response. No response? Send an email. Email response? Great, now back into the bug with you for more history. Still no response? Ping in IRC. We will do whatever it takes to get a status update, find out if we should really be tracking something, discover blockers, ask about the security risks of landing, or otherwise get the information needed to make the most informed decision about including or excluding a patch from a particular release. In the past couple of weeks, when I wasn't in meetings or looking at bug lists, I built on top of Christian's bztools a couple of scripts that let us enter a Bugzilla query, sort the resulting bugs into buckets by manager, and then send a single email to each manager of the bugs' assignees asking them to take a look at what we're tracking for Firefox Next. Let's start those discussions early and keep the bugs alive with whatever steps are being taken next.
  • Meetings are much more plentiful now. Two Firefox-related meetings per week, two channel meetings, one ESR/1.9.2 triage, and a couple of other 1x1 meetings for RelMan planning. Add to that my 3x/week meetings with Marc to continue project-managing Autoland, and my week's windows for writing code are significantly diminished. We have been brainstorming ways of structuring meetings, calendaring, and automating – anything to reduce meetings and make them more effective. Watch for more changes to roll out as we continue to iterate on what's working and eliminate what's not. This seems to be the area we should focus on improving more than anything. It's also the area where I am weakest, having a background in feminist collective process but not much in the way of boardroom or software-production meeting styles. However, I am positive we can keep trying to improve one thing per meeting (like setting up a template or using the fabulous new Bugzilla extension for our wiki) and keep chipping our way to being a lean, action-focused meeting machine.
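The bucket-and-nag scripts mentioned above can be sketched roughly like this. The manager lookup and the mail transport are stand-ins here; the real scripts sit on top of Christian's bztools against live Bugzilla queries.

```python
# Group a Bugzilla query result by each assignee's manager, then draft
# one summary email body per manager. "manager_of" stands in for whatever
# org-chart lookup the real scripts use.

from collections import defaultdict

def bucket_by_manager(bugs, manager_of):
    """Group bug records under each assignee's manager."""
    buckets = defaultdict(list)
    for bug in bugs:
        buckets[manager_of[bug["assignee"]]].append(bug)
    return buckets

def draft_nags(buckets):
    """One email body per manager, listing their reports' tracked bugs."""
    return {
        mgr: "\n".join("bug %d: %s" % (b["id"], b["summary"]) for b in bugs)
        for mgr, bugs in buckets.items()
    }

bugs = [
    {"id": 1, "assignee": "ann", "summary": "topcrash on startup"},
    {"id": 2, "assignee": "bob", "summary": "regression in sync"},
]
managers = {"ann": "carol", "bob": "carol"}
print(draft_nags(bucket_by_manager(bugs, managers))["carol"])
```

The win of the single-email-per-manager shape is exactly what the bullet describes: one message covering all of a manager's reports, instead of a stream of per-bug nags.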

My teammate Aki and I were talking almost every evening during the first week; he'd be curious as to how it was going 'over there'. On the first day I reported back how crazy it seemed and how we spent so much time on bug queries and asking developers to talk with us about how things were going. He said, "It sounds like you need three people, one per release branch." A couple of nights later, I again filled him in on how things were going, and about the email nag tool and ESR, and he said, "Now it seems like a team of 7 would be more ideal: one per branch (now including ESR), two on tools, one on vacation – then they all rotate." I think it's hilarious, and it currently seems far-fetched that this team could ever get that big. It sure would be nice, though. There are some interesting problems to solve here. In fact, today in a meeting with Axel and Armen about l10n & releasing, we pretended engineering effort was not in short supply and imagined what a fully automated release could look like.

Total hand-waving of what we could do automatically: track bug queries of release blockers, land approved patches and watch for results, request l10n milestones for a release, send the 'go' to build when the last blocker has landed and the tree is green on that build, and request a push to mirrors once QA signs off.
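Purely as illustration of that hand-waving, the imagined pipeline could be modeled as an ordered list of steps where the release stalls at the first failure; every step name below is a stand-in for a real service call (Bugzilla, hg, Treeherder, Ship-It, mirrors).

```python
# An ordered, fail-fast sketch of the fully automated release imagined
# above. "do_step" is whatever callable actually talks to the services.

RELEASE_STEPS = [
    "track blocker queries",
    "land approved patches",
    "request l10n milestones",
    "send go-to-build on green tree",
    "push to mirrors after QA sign-off",
]

def run_release(do_step):
    """Run each step in order; stop and report the first failure."""
    for step in RELEASE_STEPS:
        if not do_step(step):
            return "stalled at: " + step
    return "released"

print(run_release(lambda step: True))  # -> released
```

Fail-fast ordering matters here: a release that can't land its approved patches should never reach the go-to-build step.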

There’s so much fun to be had here :)

My very first contribution in week one was to get us a 'release-mgmt@' email address so that we could start to act (and be perceived) as a team. It seems like a small thing, but as we continue to use it, I hope we'll keep building a sense that there is more than just one person nagging you to get your bug landed/nominated/tracked/backed out. I hope it will start to feel like there's a team encouraging you, trying to help you whenever possible, and working to keep the trains moving on time.