Category: all

Catchall for posts.

Tree Closing Downtime Notice – 4am – 8am PDT Thursday June 16, 2011

Trees will be closed for downtime so that we can land the following:

1. https://bugzilla.mozilla.org/show_bug.cgi?id=662396 — Fix time on dm-wwwbuild01

2. https://bugzilla.mozilla.org/show_bug.cgi?id=600980 – Set journal_mode = WAL for dirty places profiles. This means new performance numbers will start on Thursday morning, after the downtime.

3. https://bugzilla.mozilla.org/show_bug.cgi?id=649123 – Run ANALYZE on dirty places.sqlite files. This also means new performance numbers will start on Thursday morning, after the downtime.

4. https://bugzilla.mozilla.org/show_bug.cgi?id=663568 – Reboot the DNS and DHCP servers in scl1. Rebooting these servers has burned builds in the past, and the reboot requires a short (~5 min) outage for the updates to take effect.

5. https://bugzilla.mozilla.org/show_bug.cgi?id=663963 – Change LDAP to see if that speeds up Mercurial. This change should be entirely transparent. Hg processes that are running when the change is made will have already loaded the NSS LDAP module and will keep using it until they exit. The only issue to be aware of is that changes to hg access (group membership, or the creation of a new account) will no longer propagate automatically to the hg servers the way they do now. If any hg access changes need to be pushed urgently, we can do that manually.
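Items 2 and 3 boil down to two SQLite statements run against each profile’s places.sqlite before it is used for testing. Here is a minimal sketch using Python’s built-in sqlite3 module, assuming a hypothetical helper that is handed the path of one profile database; it is not the actual Talos profile-preparation code.

```python
import sqlite3


def prepare_dirty_profile(db_path: str) -> str:
    """Switch a places.sqlite-style database to WAL and refresh its statistics.

    Hypothetical helper for illustration; db_path is any SQLite file.
    Returns the journal mode SQLite reports after the change.
    """
    conn = sqlite3.connect(db_path)
    try:
        # WAL mode is persistent: it is recorded in the database file, so
        # later connections (e.g. the browser under test) also use WAL.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        # ANALYZE gathers index statistics (stored in sqlite_stat1) that the
        # query planner uses to choose better plans.
        conn.execute("ANALYZE")
        conn.commit()
        return mode
    finally:
        conn.close()
```

Because both changes alter how queries run against the profile, it makes sense that the performance baselines shift after the downtime.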

If anyone has a reason not to proceed with this downtime, please let me know.

Thoughts on cultivating an “Everyone is Remote” attitude

As I write this I am working from Paris and our team timezone spread looks like this:

  •  Rangiora, New Zealand: UTC+12
  •  Bucharest, Romania: UTC+3
  •  Istanbul, Turkey: UTC+3
  •  Paris, France: UTC+2 <--- ME!
  •  Ottawa, ON: UTC-4
  •  Toronto, ON: UTC-4
  •  Philadelphia, PA: UTC-4
  •  Clifton Park, NY: UTC-4
  •  Chicago, IL: UTC-5
  •  San Francisco, CA: UTC-7
  •  Mountain View, CA: UTC-7

I’m going to go out on a limb here and say this: Release Engineering does a good job of working remotely with each other. We are 15-16 people (with a few more contractors/FTEs on the way), and where you live doesn’t matter when it comes to working with us. Here we are in our meeting yesterday:

Releng Weekly Meeting – June 2011

Quite the impressive Brady Bunch layout, right?

Here’s what we do that I think works well for working remotely:

* We meet once per week as a whole group on Mondays. This starts the week off with a status update on our major projects and also a chance for individuals to speak up about anything they’re working on that they’d like people to be aware of.

* We are always having conversations in IRC amongst ourselves and with others in several channels. We use #mozbuild as a backchannel for our inter-team discussion, #build for access to a larger group of fellow Mozillians (like philor, Kairo, and ted for example, who often need to liaise with us), #developers is also a place we frequent and then there are some IT/mobile/QA/release-specific channels we hang out in as needed. I think this helps us have a presence in many areas of engineering/dev/IT and even with some of the non-technical teams at Mozilla where inter-team communication needs to happen. It keeps us in the loop on what various teams are up to and also provides the IRC equivalent of being able to overhear water-cooler chat and participate as well.

* We keep wiki pages for most everything. From “how-to” pages for our own release process, automation details, and project planning all the way to pages for outside-releng folks like the Try Syntax. While I find wikis frustrating the minute the information is out of date, the fact that I can update them and find them in my awesomebar quickly when I need them is very valuable to me.

* We email our group with important notices and changes to how things are done. It is rare for someone to say “Oh, I didn’t know about that” and hear the response “It came up in the hallway when I was talking with so-and-so”. More often than not, the person driving a particular upgrade or change to current practices will send an email to the group with details of: a) what the change is, b) what it means going forward, c) how the message has been disseminated to a wider audience (if needed), and finally d) where the wiki pages (and bugs, if needed for reference) can be found. This allows any of us to find the information N time units later, when the change actually comes up in our daily work and we’re wondering “What was I supposed to do when trying to use the new X again?”

* We all meet up face to face approximately once per quarter. Twice a year for Releng work weeks and twice a year at Mozilla all-hands/summit gatherings. We take these as opportunities to discuss larger topics with lots of brainstorming, whiteboard scribbling, and animated opinion-sharing. Notes from meetings like this turn into wiki pages (often during the meeting itself) and those can become specs for projects/bugs to carry the work that needs doing to the next level.

I think that gives a good idea of our team practices. Now here are some thoughts I’ve been having lately about working remotely in Mozilla as a whole. It helps that I’m currently working from Paris and am pretty much on the opposite side of the PDT work day, but some of this was on my mind even when I was in SF.

I think Mozilla has an amazing opportunity to set trends in how to work with distributed teams. We already have people in every time zone! Even with the incredible advancements we’ve made with our use of video/audio/irc tools (airmozilla/vidyo), there are some ways in which MV is still the eye of Mordor for the company.

I would like to see us shake that up so I think we should try:

* Not having meetings in large groups in MV (except at all-hands). Instead, put small groups of people in various rooms around the building so that “we are all remote” is a reality for everyone and the clarity of our communication channels is taken seriously. This means we all become just as invested as those outside MV in the quality of audio/video feeds, in using tools like Etherpad for public collaboration, and in advocating best practices for speakers/presenters. I bet we’d see an increase in contributions to new tools & meeting practices if we were all experiencing meetings remotely on a regular basis.

* Rotate the hosting of the Monday meeting so that, over a series of Mondays, it is run from various remote Mozilla offices. This means it moves in time (which could be scheduled in advance), but it also means that every office gets a chance to feel special and be the center of attention. We’ll have an opportunity to get to know our co-workers from other offices better as they present the meeting, and I can even imagine some friendly competition developing over who can run the most energetic and engaging meeting.

I’m really interested in trying that second one. The most MV-centric thing we do is have our 11am PDT Monday meeting be a locked-down time. What if it rotated around each week and just happened somewhere in the 9-5pm spectrum of your timezone? We could create a schedule for it so folks would have lots of notice for scheduling their other Monday things around it. Sometimes you might miss a Monday meeting because it’s just not at a good time for you, but that’s something some of our remote workers might say is just par for the course.
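A rotating schedule like this is easy to generate ahead of time. Below is a small sketch under some stated assumptions: a hypothetical three-office rotation (Mountain View, Toronto, Paris), an 11:00 local start in whichever office hosts, and the fixed summer UTC offsets from the list above rather than a real timezone database, so DST changes are ignored.

```python
from datetime import date, datetime, time, timedelta, timezone

# Hypothetical host offices with their June 2011 (summer-time) UTC offsets,
# taken from the timezone list above. Fixed offsets ignore DST on purpose.
OFFICES = {"Mountain View": -7, "Toronto": -4, "Paris": 2}
ROTATION = ["Mountain View", "Toronto", "Paris"]


def tz_for(office):
    """Fixed-offset tzinfo for an office in OFFICES."""
    return timezone(timedelta(hours=OFFICES[office]))


def rotation(first_monday, weeks, local_start=time(11, 0)):
    """Yield (date, host office, UTC start) for each Monday in the rotation."""
    for week in range(weeks):
        monday = first_monday + timedelta(weeks=week)
        host = ROTATION[week % len(ROTATION)]
        start = datetime.combine(monday, local_start, tzinfo=tz_for(host))
        yield monday, host, start.astimezone(timezone.utc)


def local_time(utc_start, office):
    """The same meeting start, expressed in another office's local time."""
    return utc_start.astimezone(tz_for(office))


for monday, host, utc in rotation(date(2011, 6, 20), 3):
    print(monday, host, utc.strftime("%H:%M UTC"),
          local_time(utc, "Mountain View").strftime("%H:%M in MV"))
```

With this rotation, the Paris-hosted week starts at 09:00 UTC, which is 02:00 in Mountain View: exactly the "sometimes you miss one" trade-off described above, now shifted onto the MV folks.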

I know the idea needs more work, but there’s the nugget of it. Curious to know what others think. I’ll be continuing to talk this up – maybe we can have a larger discussion at the all-hands in September. Eventually I’d like to see us get to a point where we all think of ourselves as remote since if you look at Mozilla as a whole there does not really need to be a “hub” where one would be “local” compared to everyone else – there’s just planning for timezones/meetings and then all the people we work with doing their amazing stuff.

Use Try? Read this.

Two updates to Try are about to go into effect: one lets you configure how much email you get with your results, and the other enforces asking for what you want using the try syntax. Read more below.

Bug 661409 – Now that this has landed, a push to try only generates email about a particular try builder’s results if it does not succeed. You can make this more verbose by adding -e/--all-emails to your try syntax if you miss getting all those emails, or you can shut off the emails completely with -n/--no-emails. Note that you must be using the “try: ” syntax for these email flags to be picked up, which leads quite handily to…

Bug 649402 – Try syntax use will become mandatory as soon as this bug is fixed and the hg hook is enabled on the try repo. We’re doing this to encourage developers who use try to take an extra moment and request only the resources they absolutely need on their push. This should reduce the test/talos load that has been increasing wait times across all branches during busy periods. One additional change, a psychological one, is that the “try: -a” syntax has been removed; to ask for a run matching mozilla-central you must be more explicit: “try: -b do -p all -u all -t all”. I’ve updated the docs to reflect this change, as well as the TryChooser syntax helper webpage. We’re really not trying to make your life harder with this change: approximately 50-60% of pushes to try already use the try syntax, and if you push to try without it you will get a helpful message pointing you to the docs and the syntax builder. Check with #developers for tips and tricks from the folks who’ve been using this since the beginning. I know they have many, including using the newly-minted Mozilla-Inbound repo, where a push will get the complete set of tests/talos if you’d like to let your patch bake for a bit after doing a selective try run.
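To make the two changes concrete, here is a rough sketch of the decision logic: reject a push whose commit message has no “try: ” syntax (or still uses the removed -a), and pick an email policy from -e/--all-emails or -n/--no-emails. This is an illustration only, not the real hg hook or try parser: the function names, rejection messages, and the rule that -n wins when both flags are given are all assumptions of this sketch, and wiring it into Mercurial as a hook is not shown.

```python
import argparse
from typing import List, Optional

MARKER = "try: "
# Hypothetical rejection messages; the real hook's wording may differ.
MISSING_SYNTAX = ("No try syntax found. Add e.g. 'try: -b do -p all -u all -t all' "
                  "to your commit message (see the try syntax docs / TryChooser).")
REMOVED_ALL = ("'try: -a' has been removed; ask explicitly with "
               "'try: -b do -p all -u all -t all'.")


def try_args(commit_message: str) -> Optional[List[str]]:
    """Tokens after 'try: ' on its line, or None if there is no try syntax."""
    idx = commit_message.find(MARKER)
    if idx == -1:
        return None
    return commit_message[idx + len(MARKER):].partition("\n")[0].split()


def check_push(commit_message: str) -> Optional[str]:
    """Return a rejection message for a try push, or None to accept it."""
    args = try_args(commit_message)
    if not args:
        return MISSING_SYNTAX
    if "-a" in args:
        return REMOVED_ALL
    return None


def email_policy(commit_message: str) -> str:
    """'failures' (new default), 'all' (-e), or 'none' (-n)."""
    args = try_args(commit_message) or []  # flags only count inside try syntax
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-e", "--all-emails", action="store_true")
    parser.add_argument("-n", "--no-emails", action="store_true")
    # -b/-p/-u/-t and their values are left in the unparsed remainder.
    opts, _other = parser.parse_known_args(args)
    if opts.no_emails:  # this sketch lets -n win if both flags are given
        return "none"
    if opts.all_emails:
        return "all"
    return "failures"
```

Using parse_known_args keeps the email flags independent of the build/platform/test options, so a selective run such as “try: -b o -p linux -u none -t none -n” still parses cleanly.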

Summer Travel Log 1 – California->Canada->Paris

Last Saturday started with a little last-minute cleaning of the apartment for our summer Airbnb renters before we hit the road and headed north. In the past week I have been in 3 States (CA, OR, WA), 3 Provinces (BC, AB, ON), and am now in Paris, France, in the heart of the city on the Île Saint-Louis.

It took us two days to drive to Victoria, and we lengthened the travel time a bit by taking hwy 101 for half of it to see the amazing redwood trees along the coast. On Jenny’s camera there are pictures of us driving through a huge tree, which is what you do when you take hwy 101 through the redwoods.

We spent a nice quiet week in the woods of Victoria, getting Hopey settled in and sleeping a lot. Finally on Friday the time came to start the 16 hour trip to Paris. I’d like to say that I love flying Air Canada. It’s been a while, and living in the States means often flying Delta, US Airways, Southwest, or United. Air Canada’s planes are so nice and clean and big. Unlike every flight I’ve taken in the last year, this flight had NO issues with people’s carry on baggage fitting into the upper storage areas. Also, the seats are a little wider and I couldn’t measure but I think I had more leg room too. Airline promo done, let’s arrive in Paris!

Our rental is in an old industrial building with a huge door that opens into an open-air courtyard, and then 4 flights of stairs up is our ‘apartment’. It’s quite small, but it’s quiet and we’re in such a great location for the next 3 weeks. Apparently it’s going to rain off and on all week, but today was a hot and sunny 28 degrees, so after a nap Jenny and I went out to explore the neighbourhood. Here’s what I’ve observed from our initial, jet-lagged wanderings:

  • Getting keys made is something done at the cobbler’s, not at hardware stores (we visited 2)
  • At 5 or 6 pm, there is nothing even slightly resembling dinner available in the local cafés, and the restaurants are not open yet. We will need to adjust to this.
  • There’s some great fashion here and then, in contrast, some people who seem to actively hate fashion :)
  • I love having opportunities to speak French!  Most importantly, everyone is speaking French back to me which is a pleasant surprise.
  • Everything here is REALLY EXPENSIVE!  It makes me kind of nervous.  Will need to get out of this neighbourhood and see what it’s like in less touristy areas.  We plan to get Velib bikes to explore the city tomorrow if it’s not raining.
  • I bought a SIM card at the airport for 19 euros. It came with 5 euros of credit, which I’ve already used up on data, yet there’s nothing in their rate sheet about data costs. So there’s some lack of communication here.
  • Also this:

That, my dear readers, is the DISCO TOILET in the Creperie where we ended up eating dinner. The tiny toilet booth is a totally different experience than the rest of the restaurant. So you’re eating your crepes and it’s all normal (top 40 UK dance hits quietly playing in the background) but if you get up to go to the washroom, watch out! The music is loud, the little green lights are dancing, and then…you just can’t help yourself…you’re dancing in a toilet :)

This bridge is covered in locks put up by people, mostly, it looks like, to commemorate their love for someone, and I’m excited to put one up there for Jenny and me at some point during our visit. It’s so beautiful here. The stone streets, the tiny cars, even this very touristy area has a nice mix of locals going about their business and visitors walking about. There are a lot of people riding bikes around, and it looks pretty safe to do that here. I don’t have a helmet with me because our rentals in Prague will include them. I’m trying to decide whether I should pick up one of these helmets while I’m here or wait until I’m back in SF.
There’s a huge thunderstorm rolling in as I write this and I wish I had the right kind of camera to capture the lightning. Time to try and get to sleep at a ‘normal’ hour so that I can get the most out of Paris, Day 2.

    Update on the Auto/Assisted Landing System

    Almost a week since the post introducing the design attempt for auto/assisted branch landings via Bugzilla and Try and guess what? We re-wrote everything!

    The details are in the wiki, bugs have been filed, and code is being written. We are working on making this system use a message queue and are also seeing whether we can work with mozillapulse to get information on bug changes from Bugzilla.

    I’d love to tell you more about it but you can read the wiki and I’m excited to get back to my SchedulerDBPoller component.
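The point of a message queue here is decoupling: whatever notices bug changes only has to enqueue work, and the landing worker picks it up at its own pace. The toy sketch below shows that shape with Python’s standard-library queue and a single worker thread. It does not use or imitate mozillapulse or any real AMQP API; run_pipeline is hypothetical, and the string it records per bug simply mirrors the autoland-$bugnumber build reason from the design.

```python
import queue
import threading
from typing import List


def run_pipeline(bug_ids: List[int]) -> List[str]:
    """Toy producer/consumer: a watcher enqueues bug numbers and a lander
    worker processes them independently. All names are hypothetical."""
    jobs: "queue.Queue" = queue.Queue()
    landed: List[str] = []

    def lander() -> None:
        while True:
            bug = jobs.get()
            if bug is None:  # sentinel: no more work
                jobs.task_done()
                break
            # Stand-in for: fetch patches, apply to tip, push, report back.
            landed.append(f"autoland-{bug}")
            jobs.task_done()

    worker = threading.Thread(target=lander)
    worker.start()
    for bug in bug_ids:  # stand-in for the Bugzilla/pulse watcher
        jobs.put(bug)
    jobs.put(None)
    worker.join()
    return landed
```

With a single worker the queue preserves arrival order. A real deployment would swap the in-process queue for a broker so the watcher and lander can run on different machines.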

    Assisted/Automated Landing – Designing the Systems

    Ehsan’s blog post wishing for assisted landings on mozilla-central started a lot of people talking about this being a very desirable and useful tool for developers, where they could set a flag in Bugzilla and then be free to do other work until the results of their push were posted back to the bug. As part of enhancing the Tryserver I was already working on a way for users to signify in their try-syntax that the results of the push should go to the bug and these two ideas started to fuse into a dreamworld where someone could attach a patch to Bugzilla and have it be tried and pushed to trunk all with some magical bot automation.

    After doing a very short survey of developers and their try usage I have observed that there are two very different stakeholders here and both of them need separate-but-related tools:

    Developers:
    • ease of use
    • better reporting (less email, anyone?)
    • option to post to bug(s) *after* a try run has indicated success

    The Bot (automation):
    • queuing of patches culled from a flag in Bugzilla
    • automatically apply to tip of repo
    • push, and report back with results

    After soaking in the survey feedback and a first attempt with a whiteboard yesterday, I woke up this morning with some clearer ideas on how to take a first run at creating this system.  It involves creating several new tools, one new database, and enhancing our existing buildapi.

    New tools for Developers:

    • Adding more Try syntax options:
      • include a list of the bug(s) you would like your try results posted to (covering however many builders make up a complete run of your push, from a single Linux build to a complete ~186-builder “try: -a” buildset)
      • turn off email notifications
    • Adding functionality to the self-serve api view for a revision (eg: https://build.mozilla.org/buildapi/self-serve/try/rev/de8ea75bc48e) that will better show your results for that push and provide a button which will post the patch(es) to a specified bug
    • Auto-landing from a bug in Bugzilla using the [autoland-try] whiteboard tag: any attached patches that are not obsolete and have no ‘r’ flag set are applied to the current tip of mozilla-central and pushed to try, and the results are returned as a comment in the bug
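Patch selection is the first thing both the developer-facing tool and the bot have to get right. The sketch below combines the try rule from the bullet above (non-obsolete patches with no ‘r’ flag) with the r+ rule for real branches described in the bot section of this post. The Attachment class is a hypothetical, simplified stand-in for Bugzilla attachment data, not the Bugzilla API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attachment:
    # Hypothetical, simplified view of a Bugzilla attachment.
    attach_id: int
    is_patch: bool
    is_obsolete: bool
    review: Optional[str]  # None if no 'r' flag, else "?", "+", or "-"


def select_patches(branch: str, attachments: List[Attachment]) -> List[Attachment]:
    """Pick the patches to apply for an [autoland-<branch>] request.

    Keeps the attachments' original order, so patches apply in the
    order they were listed on the bug.
    """
    candidates = [a for a in attachments if a.is_patch and not a.is_obsolete]
    if branch == "try":
        # Try rule: non-obsolete patches with nothing set for 'r'.
        return [a for a in candidates if a.review is None]
    # Any other branch: only non-obsolete patches that have r+.
    return [a for a in candidates if a.review == "+"]
```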

    New tools to Automate landings (bot or script):

    • Crawl Bugzilla for bugs where [autoland-$branchname] is in the whiteboard and automatically push to tip of named branch, get the results, and return them to a comment in the bug (stripping out the whiteboard tag on completion)
      • bot will grab all non-obsolete, r+ patches (if $branchname != ‘try’) 
      • interdependent bugs will not be handled in this first swipe at a working system
      • pushes will have autoland-$bugnumber as the reason for the build in schedulerdb so that the results can be watched for, aggregated, and reposted to the bug on completion
    • Watch results coming back for one or two oranges (we can set a threshold) and re-trigger those, watching for the second set of results, to attempt catching intermittent oranges
    • Back out patches where, even after a rebuild on an orange, orange results still remain
    • LDAP authentication checking for bugzilla patch author -> hg commit permissions and being able to ensure that only people with the right credentials can trigger automatic landings. This may mean checking the reviewer too before allowing a patch to be applied & pushed.
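The bot-side bullets (find the tag, tag the push with a reason, re-trigger a few oranges once, back out if they stay orange) can be sketched as a handful of small functions. Assumptions to flag: the tag regex, the "green"/"orange" result encoding, and the default threshold of 2 are illustrative choices, and backing out when there are more oranges than the threshold is this sketch’s own assumption (the design above only covers the one-or-two-oranges case). The LDAP permission check is left out.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical pattern for the [autoland-$branchname] whiteboard tag.
TAG_RE = re.compile(r"\[autoland-([A-Za-z0-9._-]+)\]")


def autoland_branch(whiteboard: str) -> Optional[str]:
    """Return the branch named in an [autoland-...] tag, or None."""
    match = TAG_RE.search(whiteboard)
    return match.group(1) if match else None


def strip_autoland_tag(whiteboard: str) -> str:
    """Remove the tag once results have been posted back to the bug."""
    return TAG_RE.sub("", whiteboard, count=1).strip()


def build_reason(bug_id: int) -> str:
    """Reason recorded in schedulerdb so the push's results can be found."""
    return f"autoland-{bug_id}"


def next_action(results: Dict[str, str], retriggered: bool,
                threshold: int = 2) -> Tuple[str, List[str]]:
    """Decide what to do with a finished push.

    results maps builder name -> "green" or "orange" (hypothetical encoding).
    Returns ("report" | "retrigger" | "backout", affected builders).
    """
    oranges = [name for name, status in results.items() if status == "orange"]
    if not oranges:
        return "report", []
    if not retriggered and len(oranges) <= threshold:
        # Possibly intermittent: rerun just those builders once.
        return "retrigger", oranges
    # Still orange after a rerun, or more oranges than the threshold
    # (treating the latter as real breakage is an assumption of this sketch).
    return "backout", oranges
```

Keeping next_action a pure function of the results makes the retrigger/backout policy easy to test on its own, separate from the code that polls schedulerdb.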

    The next step is to get this design organized into bugs so that we can parcel out the work involved and start testing/completing segments and features as we work towards the whole. We have a RelEng intern this summer, Marc Jessome (Another Canadian in RelEng!), who will be doing a lot of the work between now and the end of August. Stop by anytime to say “Hi” to Marc and to chat with either of us about the project – feedback is always appreciated.  I’m happy to say that 52 people filled out the Try Usage survey just from posting it on Yammer. That was super helpful, thank you.

    A PyStar Supernova in the Sky

    The first Bay Area PyStar event has come and gone. I’m finally getting a moment to regroup and ponder all the trial and error of being the organizer of this event as well as having time to look at some of the statistics we gathered. Just from an organizing perspective here are a few items I’d like to share about the process.

    Things to do differently next time:
    * When creating the Eventbrite event, add questions like “What level do you want to learn at?” “Meat or Vegetarian?” “Operating System?” to the registration so there’s no need to send out blanket emails to attendees to try and get that information after the sign up.
    * Do a one-day workshop only, instead of a Friday night installation session plus a Saturday workshop. I think that for many people the setup could have been done in the first hour and the rest of the day spent learning, instead of having a night session that is only needed by a handful of people.
    * Have the teachers/assistants already assigned to a particular level of instruction – prepare topics, tutorial materials, and class size ahead of time so that on the day of the workshop there might only be a handful of late arrivals to place and the other attendees will already be set up in the right learning level as requested in the sign up. 

    Things that really worked this time:
    * Eventbrite! They have amazing tools, stats, emailing options, charts, and also a way to see where your sign ups come from which showed us that we got a TON of views from Tweets which apparently was an impressive number (I am told by one of our attendees who is an Eventbrite employee)
    * Mozilla! By sponsoring the event – providing the space and food – being able to let people/groups spread out and work in our various conference rooms as well as having lunch on site was very much appreciated by attendees (and of course by me!)
    * CodeChix! This peninsula-based group of women coders accounted for 30% of our attendance and also netted some teachers/assistants for the workshop. CodeChix co-sponsored the event and helped get the word out as well.

    Attendance
    There was something odd happening with the Eventbrite signups. In a couple of short bursts, a ton of tickets were being snatched up by names that seemed slightly suspicious. Now the event has passed and I’ve checked in all the attendees as well as accounted for the no-shows (almost all of whom took a moment to send in their regrets so the tickets could be freed up for another person – very sweet!). It looks to me like about 40% of our attendees were fake accounts. Julie (who works at Eventbrite) and I took a look at the numbers and she’s kindly offered to look into it further to see if there is indeed something fishy happening. All that aside, we had 47 people! That feels like great attendance to a first workshop, on a Saturday, in Mountain View.

    Speaking of Mountain View – we had attendees come from all over Northern California. I love this view of how spread out geographically we all were:

    This graph is useful for seeing how successful my own promotion attempts were. The original spike of page views is obviously when I first announced the event link. CodeChix, Baypiggies, and Devchix were the mailing lists I sent emails to with the link. While that got the ball rolling, it was the tweets and emails sent out almost 3 weeks later, a week before the event, that got the event lots of attention. It probably helped that PyStar Minneapolis was happening then too, so #PyStar got lots of tweets (sorry to the person whose Twitter handle is @pystar).

    Can I just say that I am so thrilled with the amount of people who volunteered to teach/assist?  Seriously. Amazing. I love that there are people out there who really enjoy getting newbies involved, who can share their skills, and who will give their time to events that grow community.

    Finally, here’s a breakdown of where we got ticket “sales” from via Eventbrite. This is another reason they rock – they help you promote your event!  As you can see here the Twitter share link definitely got us the most eyes even though direct invitation resulted in more actual signups. For next time I would send the link to a few more mailing lists like SF Python Meetup, Systers, and also next time we’ll be able to invite the folks who came to the first one as well as those who couldn’t make it.

    In follow-up posts I will post and analyze some of the survey results of both the PyStar Bay Area and the PyStar Minneapolis. I need to go learn how to create charts from Google doc spreadsheets. Also we need to figure out how to set up our site and materials to be easily updated and adjusted by a distributed team without having to break off into separate sites.  Finally, the curriculum needs an overhaul. We kept an etherpad during the event to track issues so that I can go through post-workshop and take advantage of all the feedback to improve our offerings.

    What’s Next?
    The next PyStar I plan to organize will be in late July or early August and I’d like to do that one in SF.  Following that I’m going to plan one in Toronto for mid-to-late October.  What we did this past Saturday is only the beginning. I’ll be working with all the folks in the pystar group to get this program shaped up into a much more modular system for learning Python and Django in stages (badges) and also will be setting up sub-groups for things like hack nights, code-masters (think toastmasters but writing code in front of people), and I have this idea of taking the PyStar lessons into women’s prisons as a way to get marketable skills into the hands of people who need them badly.

    Anyway, first we’ll get more material prepared and digest/incorporate all the excellent feedback. Then we can take over the world :)

    I hope I’ll see you at future events. Thanks to everyone who helped make this a great day!

    Captain Destructo Breaks Everything

    Alternate title ideas: “It’s not all s/Tryserver/Try”  or “What I should have done, and didn’t”

    I bet you get the point by now. Today I caused a fairly lengthy, unnecessary downtime on Try. Now that I’m writing this, things are under control again, and there are a few small niggly bits left, but nothing that will keep me up at night.

    It all started with a bug about graphserver posts from tests not getting through. The posts referred to MozillaTry (the Tinderbox name for Try), but the graphserver only knew about Tryserver (the branch name for Try), and nothing was using plain Try (except the repo for Try), which is what everything ought to have been using in the first place.

    Since I’ve been adding a lot of project branches in a short amount of time, certain things have become more streamlined, so I felt that the best option was to go through and rename Tryserver/MozillaTry to Try everywhere, so that from the repo onward everything used the same name. This has been working extremely well for our project branches and helps make setup a snap.

    Here’s where it gets all broken. My quick swipe at this bug was superficial and ended up causing some preventable burning. I shall now list for you (and future me) what I did and what I should have done:

    Did:

    • hg rename on configs for desktop
    • branch configs for s/tryserver/try
    • updated graphserver branch name to Try
    • a quick downtime window from 10am – 12pm in order to prevent builds from getting split into two different upload dirs

    Should have done:

    • hg rename on configs for mobile
    • grep of buildbotcustom for “tryserver” as we have special casing for it in several files
    • log uploader and post_upload scripts to make sure everything about the try build was going to the right place
    • updated the dir permissions on ftp for the new upload location and ensured that the archive is on the NFS mount
    • edited cronjobs on staging to catch the new try builds
    • updated graphserver machines table for each try platform’s builder name
    • more notice for downtime, with a 4 hour window that would have allowed a test push to make sure everything was wired up correctly  
    • updated the treeclosure hook to include the new tinderbox page

    Some of the things I should have done didn’t have an impact on the burning/try closure but it’s fair to say that if I had done a staging round of all my plans first I would have caught more of the obvious things that I missed. I would have then planned the downtime better and been prepared to ensure the disturbance would have been minimal since this was, after all, a really low priority bug.

    Aki told me that he had a manager who said “you don’t learn til you break something”.  Well I broke everything try-related today and here’s hoping that I have learned something because the stress of this whole day is not something I want to experience often. It’s that feeling you get when you realize you’ve started something that you can’t back out of and there’s no way to go but forward, even though everything in front of you now appears hopeless and messy.

    So here are some lessons to take away:

    1. Staging is not to be underestimated even for just renaming things that are already working
    2. Taking the time to search with grep/mxr and find the terms you are replacing before starting the upgrade in production will help find wiring you might have overlooked in your preparations
    3. Prepare more thoroughly and have a clear idea of the env. you started in and what it will take to have that env. back when you’re done. Leaving dangly bits is not ideal.
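Lesson 2 is cheap to automate. Below is a minimal, grep-like audit in Python that lists every line mentioning the old names before a rename starts. It is a sketch only: the directory and search terms in the usage line are examples, and in practice `grep -rni` or an mxr search does the same job.

```python
import os
import re
from typing import List, Tuple


def find_references(root: str, terms: List[str]) -> List[Tuple[str, int, str]]:
    """Return (path, line number, line) for each line under root that mentions
    any of terms, case-insensitively; a rough stand-in for `grep -rni`."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata and walk in a stable order.
        dirnames[:] = sorted(d for d in dirnames if d not in (".hg", ".git"))
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if pattern.search(line):
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it rather than abort the audit
    return hits


# Example: everything that still says Tryserver/MozillaTry in a checkout.
# for path, lineno, line in find_references("buildbotcustom", ["tryserver", "MozillaTry"]):
#     print(f"{path}:{lineno}: {line}")
```

Searching for the specific old names, rather than for plain "try", keeps the list short enough to review by hand before the downtime.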

    Happy Friday.
    (and many thanks to Aki)

    Hey BBC would you like to know how releasing software works?

    Dear BBC,

    Today on the front page of your technology section you said that downloads for Firefox 4 have been lower than they were for Firefox 3 and that:

    The lower figure may be explained by the widespread availability of pre-release versions of Firefox 4 in the months ahead of its launch.

    First of all, you forgot that we’ve had 3.5 and 3.6 between those two and so we now have users spread out a bit across versions. Second, here’s an overview of how we’re organizing the release of Firefox 4:

    • We put out the RC and picked up users from outside of our usual beta testing pool in order to give our final candidate some solid tire kicking
    • Firefox 4 went live but our users on 3.5 and 3.6 are not offered the update automatically yet, they must “Check for updates” in order to be asked if they want to upgrade to Firefox 4 
    • Once we have more coverage of the new release for a couple of weeks and are even more confident that we’ve got an amazing browser out there we will turn on the Major Update notification which will offer our 400+ million users the chance to come on up and experience the next level of the web
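The staged rollout in these bullets amounts to one gate on the update offer. The sketch below expresses that policy as a tiny function. It is purely illustrative: the class, function name, and version parsing are hypothetical and are not how Mozilla’s update service is actually implemented, and only the 3.5 and 3.6 branches mentioned above are treated as eligible.

```python
from dataclasses import dataclass


@dataclass
class UpdateCheck:
    version: str          # installed version, e.g. "3.6.15"
    user_initiated: bool  # True when the user chose "Check for Updates"


# Hypothetical: branches whose users are eligible for the 4.0 major update.
ELIGIBLE = {("3", "5"), ("3", "6")}


def offer_firefox4(check: UpdateCheck, major_update_enabled: bool) -> bool:
    """Should this user be offered Firefox 4?

    Before the Major Update notification is turned on, only users who
    explicitly check for updates are offered it; afterwards, all eligible
    users are.
    """
    parts = check.version.split(".")
    if tuple(parts[:2]) not in ELIGIBLE:
        return False  # already on 4.0, or a branch this sketch ignores
    return major_update_enabled or check.user_initiated
```

Flipping major_update_enabled is the only change needed to move from the quiet launch to the broad offer, which is why download numbers are expected to jump once it happens.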

    According to W3Schools’ stats (which are measured by visits to their site), the browser distribution of their visitors looks like this:

    2011        Internet Explorer   Firefox   Chrome   Safari   Opera
    February    26.5%               42.4%     24.1%    4.1%     2.5%
    January     26.6%               42.8%     23.8%    4.0%     2.5%

    2011        Firefox total   FF 4.0   FF 3.6   FF 3.5   FF 3.0   Other FF
    February    42.4%           1.9%     35.8%    2.9%     1.5%     0.3%
    January     42.8%           1.5%     36.1%    3.1%     1.7%     0.4%

    What this says to me is that the more than 8 million downloads since yesterday morning PDT only show us how many people are paying attention to the fact that Firefox 4 has launched and is available for download. They are not representative of our 400+ million active daily user base (the people who just use the browser but perhaps don’t read your blog or mine). These people will soon learn about Firefox 4 through their browser’s update notification window. We’ll be seeing a spike in downloads in a couple of weeks, and I hope you’ll report on that.
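A quick bit of arithmetic on the February numbers in the table above makes the point about how spread out Firefox users are. The sketch below uses only those W3Schools figures. It checks that the version columns add up to the Firefox total, then expresses each version as a share of Firefox visitors rather than of all visitors.

```python
# W3Schools' February 2011 Firefox figures, as percent of all visitors.
FEBRUARY = {"4.0": 1.9, "3.6": 35.8, "3.5": 2.9, "3.0": 1.5, "other": 0.3}
FIREFOX_TOTAL = 42.4


def share_of_firefox(versions, total):
    """Each version's share of Firefox visitors, in percent (1 decimal place)."""
    if abs(sum(versions.values()) - total) > 0.05:
        raise ValueError("version columns do not add up to the Firefox total")
    return {v: round(100 * pct / total, 1) for v, pct in versions.items()}


shares = share_of_firefox(FEBRUARY, FIREFOX_TOTAL)
print(shares)
```

In February, only about 4.5% of these Firefox visitors were running a 4.0 pre-release, while about 84.4% were on 3.6. That is a very small slice to explain away a launch-day download gap, and a very large pool of users still waiting for the update offer.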