
Why isn’t Autoland working?

This question comes up enough that I figured a quick blog post/status update would be helpful.

What does Autoland do?

Autoland polls individual Bugzilla bugs for an autoland token. It currently uses whiteboard tags; the future Autoland has an extension and webservice that make polling the entirety of Bugzilla unnecessary. When an autoland request is found, the service does an automated landing to try of all non-obsolete patches attached to the bug, provided they can be landed cleanly on the tip of mozilla-central and either the patch author or a feedback provider has appropriate hg permissions. Otherwise it reports back to the bug what the issues were. Upon completion of the try run, a comment is left in the bug stating the results. If a final destination repo (or repos) was specified, and the hg permissions match up between requester/reviewer and the destination repo(s), the service can continue on to autolanding the patch(es) there. A comment would be left on the bug when the push to the final destination(s) is done. There would be no reporting back of final build results; that would be handed back over to human eyes on TBPL.
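To make that flow a bit more concrete, here is a minimal sketch in Python of the "can this request go to try?" check described above. This is not Autoland's actual code: the names Patch, evaluate_request and users_with_try_access are made up for illustration, and the real service gets its patch and permission data from Bugzilla and hg rather than from plain objects.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Patch:
    """Hypothetical stand-in for a patch attached to a bug."""
    author: str
    obsolete: bool = False
    applies_cleanly: bool = True  # assumed result of a test apply on tip of mozilla-central
    feedback_from: List[str] = field(default_factory=list)


def evaluate_request(
    patches: List[Patch], users_with_try_access: Set[str]
) -> Tuple[List[Patch], List[str]]:
    """Return (patches_to_land, problems) for one autoland request.

    If any problem is found nothing is landed, and the problems are
    what would be reported back to the bug.
    """
    # Only non-obsolete patches are considered at all.
    candidates = [p for p in patches if not p.obsolete]
    problems = []
    if not candidates:
        problems.append("no non-obsolete patches attached")
    for p in candidates:
        if not p.applies_cleanly:
            problems.append(f"patch by {p.author} does not apply cleanly")
        # Either the author or a feedback provider needs hg permissions.
        people = [p.author] + p.feedback_from
        if not any(person in users_with_try_access for person in people):
            problems.append(f"patch by {p.author}: nobody with hg permissions")
    if problems:
        return [], problems
    return candidates, []
```

A caller would push the returned patches to try when the problem list is empty, and otherwise comment on the bug with the problems.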

What is Autoland’s status now?

In April of 2012, right before Marc’s second internship ended, we launched a very experimental public-facing version of Autoland and announced it a bit so we could get more people testing it.  This had varying degrees of success.  We got more bugs ironed out but also discovered that Autoland’s daemons for hg pushing and Bugzilla polling tend to fall over a bit too often.  When we moved Autoland off its staging VM to a more permanent home we lost the status page that would tell us (and developers) what the modules were doing, and that made the workings of Autoland quite opaque.  After my switch over to help with Release Management in Feb 2012 I kept meeting regularly with Marc and driving the project to completion as much as possible, but I wasn’t able to pull my weight on coding for the last 3 months of our time together.  When Marc left at the end of April, we were left with an Autoland that stopped working and no one available to continue to massage it into being the robust system we needed it to be.  I took mention of Autoland out of our trychooser page and try server docs, and have generally tried to downplay its existence while still keeping a plan on the back burner for how we will resurrect it as soon as there is some time.

What does Autoland need to be publicly usable again?

  • A status page that can show what modules are running or down, display what’s in the queue, and give a quick visual to users if Autoland is up or down as a result.
  • Nagios alerts on the Autoland modules that let me (and other people interested in helping to maintain Autoland) know when things fall over.
  • At least one person, if not several, who can access the Autoland master VM and ‘kick’ it as needed.

This is what’s needed for a short term solution. I know that we have some bugs with our hg pusher module as well as some trickiness in our message queue that, once fixed, would make the overall system more robust.  We need people using the system to be able to catch more of those bugs though, so in the meantime getting as close as possible to one-click restoring of the system would be a huge win here.
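As a rough idea of what the status page from the first bullet could compute, here is a small Python sketch. It assumes each module records a "last seen" timestamp somewhere (a heartbeat); that mechanism, the module names, and the summarize function are all hypothetical and not part of the current Autoland code.

```python
import time
from typing import Dict, List, Optional


def summarize(
    heartbeats: Dict[str, float],
    queue: List[str],
    now: Optional[float] = None,
    max_age: float = 120.0,
) -> dict:
    """Build the data a status page would display.

    heartbeats maps a module name to the Unix time it last reported in
    (an assumed mechanism). A module counts as "down" if it has been
    silent for more than max_age seconds.
    """
    now = time.time() if now is None else now
    modules = {
        name: ("up" if now - last_seen <= max_age else "down")
        for name, last_seen in heartbeats.items()
    }
    # Autoland as a whole is only "up" if every known module is up.
    all_up = bool(modules) and all(state == "up" for state in modules.values())
    return {
        "overall": "up" if all_up else "down",
        "modules": modules,
        "queue_length": len(queue),
    }
```

The same summary could feed a Nagios check: anything other than an overall "up" would be the signal that someone needs to go kick the VM.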

What does Autoland need to be truly ‘production’ ready?

  • Security review on the code and the BMO extension so that we can move away from whiteboard tags and let people use the BMO extension instead. This gives Autoland much cleaner input about what is being requested.
  • More VMs to run hgpusher modules on so that Autoland can handle a larger load. Each VM can run 2 hgpushers max so we’d want to be able to grow our pushing farm as the usage of the system increases.
  • Being able to push to repos other than Try.

There is no clear plan to be able to get the system beyond Try landings, but I see automated try landings as still being a huge help so I’d be super happy just to see that part get back to a working state.  This project is no longer a RelEng priority now that I’ve permanently moved to Release Management and Marc has gone on to other internships and more schooling. I can’t promise anything time-wise, but I wanted to provide some clarity into what is needed and put out the “patches welcome” call.  I see Autoland as being a great option for a community-managed project and I want to keep working on it when time permits. If you are looking to become a Mozilla contributor and are interested in automation and web APIs – this might be a good starter project for you. Please get in touch.



Want to help? Encouraging community contributions

In a timely confluence with Mozilla’s new Steward initiative, I’m preparing to get some community contributors engaged with some of the projects we work on in Release Engineering.  A fair amount of our production infrastructure has to be locked behind VPN and sekrit passwords (we have 400+ million users to protect), but there are more and more RelEng side projects that can be worked on by any interested person in their own local test environment and then integrated into our /build repos. In these projects we provide tools to the larger developer community and solve interesting scalability challenges with our unique (and massive) automation systems. My personal goal is to try and get 2 or 3 regular community contributors to come work with us on tackling these.

In order to solicit contributions I have been working with David Boswell. We added Release Engineering to the mozilla.org/contribute ‘areas of interest’ page and I have created the beginnings of a RelEng-specific contribution page. The first two areas that I think would be a great introduction to working with RelEng code & tools are the TryChooser and our upcoming Autoland system.  For the latter, our intern Marc Jessome is sticking around this fall as a contributor to carry on the amazing work he put into this system over the summer.  He’ll be continuing to debug the code and improve the portability of it so that we can get it into a beta testing stage by the end of October.  As that work is being done we also need someone to help us write the API functionality that will allow sheriffs and developers to write tools that utilize this new hands-off landing queue.  We’d also be happy to have people work on the issues that come up when we take Autoland to the next level – auto-landing on a production branch.  To do this we’ll want some automated backing out, bisection, and the ability to wait on getting patches reviewed before continuing.

Another great area for someone interested in helping out Firefox developers is working on the TryChooser syntax and features.  There is a whole tracking bug dedicated to try_enhancements and most of those bugs are ones that can be worked on in a local staging environment.  It’s a chance to get your feet wet with buildbot and our custom scheduling setup. Some of these smaller bugs would be short on time commitment and high on developer appreciation if you fix them. That can be a winning combination for a new contributor. I speak from experience on that :)

So, if you’re reading this post and you or someone you know is interested in dipping their toes into becoming a Mozilla contributor, and these projects make you curious, then come find me. We’ll get you set up with a staging environment so that you can start fixing real-world tools and automation bugs in no time.