CIDuty/Meetings:2013-10-01

Release Engineering Buildduty Meeting

Status of buildduty period

https://releng.etherpad.mozilla.org/buildduty

Bugs filed

Previous action items

Agenda

  • (reminder, Callek not attending today, due to flight timing - notes on agenda inline)
    • [Callek] Can I ask someone to take an action item to send me a summary of outcomes of discussion/points here? <- coop will do this
  • (coop) Q4 goals?
    • slave pre-flight tasks (Q4)
    • report on disconnects as %age of overall builds
    • reduce %age or number of disconnects (stretch goal)
      • in-house masters for Windows (Q4)
    • improve loaner process
      • start with AWS loaners using slaveapi
      • coordinate with Kim
  • self-serve reboots for sheriffs (Q4)
    • slaveapi reboots from slave_health
  • improved slavealloc (postponed for now)
    • we currently don't have proper control over how to migrate instances from one coast to another or in-house
    • bug 907431 to handle Amazon issues better (Q4 2013 goal?)
    • slaveapi to allow migrating hosts from one place to another
      • allows for sheriffs to manage the pools
  • (armenzg) - triaging queries
    • can we remove the "build" group restriction on the queries? jlund could not see them
      • (Callek) We should/could add people to the 'build' bmo group as well (day 1 page update) -- but agreed anyway
    • releng query of bugs w/o dependencies should *only* be limited to problem tracking
      • (Callek) I don't necessarily agree, however I do think that until we have better tooling elsewhere this is likely necessary
    • see bug https://bugzilla.mozilla.org/show_bug.cgi?id=920453
    • it had a dep bug, which prevented us from seeing it in the query
    • also this bug https://bugzilla.mozilla.org/show_bug.cgi?id=919841
    • even though I see the dep bugs as resolved
  • (armenzg) buildduty report not showing the following bug
  • (armenzg) - improving file removals
  • (armenzg) - (rant warning) win64 post-reimaging steps are a complete pain
    • I still remember when I had managed to get to just a hostname change
    • I'm not going to do any hosts until we're fully switched to rev2 imaging
    • (Callek) a brief chat in IRC suggested we could/should forgo win64 rev1 bringup in favor of reimaging as rev2 GPO'd machines. Has a short-term pain of extra buildbot-config mess, but probably best for our own sanity.
  • (bhearsum) lowering nagios noise
    • two examples yesterday of important alerts being missed (builds-4hrs file age; buildbot master command queue)
    • possible simple improvement: send slave alerts to a different place than the rest
    • even if we lower the noise we still have the issue that if I'm busy fixing something else I don't get to look at #buildduty
      • (Callek) this is the exact reason I miss some tree-closing issues, get into doing something else and don't look at #buildduty in a timely manner.
    • should we create an IRC channel called #treeclosing and report issues there that we know close the tree? or have a bot that calls for "buildduty"?
      • (Callek) We could also add ourselves to an IT-esque pager-duty for specific "tree closing" alerts we know are worth getting paged on, e.g. builds-4hrs. I would expect "all" of us [buildduty] on weekends, the assigned buildduty person on weekdays (similar to IT on-call magic) and possibly always-joduinn and/or always-coop (as managers).
      • (Callek) Having SMS alerts sent to me would allow me to at least respond that I'm nearby/not-nearby/etc. and notice even faster if I am indeed on buildduty that week and builds-4hrs goes off.
  • (bhearsum) is buildduty on the hook for random new machine set-up? e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=919841
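
The alert-routing ideas from the nagios noise item could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Alert` type, the `route` function, the `is_slave` flag, the "pager" destination and the #buildduty-slaves channel name are all hypothetical, and #treeclosing is only a channel proposed in the discussion. The two check names are the ones mentioned as missed alerts.

```python
from dataclasses import dataclass
from typing import List

# Checks considered worth paging on. The two names come from the notes;
# treating exactly these as "tree closing" is an assumption.
TREE_CLOSING_CHECKS = {"builds-4hrs", "buildbot master command queue"}


@dataclass(frozen=True)
class Alert:
    check: str      # nagios check name, e.g. "builds-4hrs"
    host: str       # host the alert fired for
    is_slave: bool  # assumed to be known, e.g. from the host's group


def route(alert: Alert) -> List[str]:
    """Return the destinations an alert should be sent to."""
    destinations: List[str] = []
    if alert.check in TREE_CLOSING_CHECKS:
        # Known tree-closing alerts get a dedicated channel and a page.
        destinations += ["#treeclosing", "pager"]
    if alert.is_slave:
        # Slave alerts go somewhere other than the main channel.
        destinations.append("#buildduty-slaves")
    else:
        destinations.append("#buildduty")
    return destinations


if __name__ == "__main__":
    print(route(Alert("builds-4hrs", "hypothetical-master1", False)))
    print(route(Alert("disk space", "hypothetical-slave1", True)))
```

Keeping the tree-closing list explicit and short is the point of the sketch: only alerts known to be worth a page bypass the noisy channel.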

List of current projects

Action items

  • coop:
    • update nagios buildduty docs (carryover)
    • follow-up with IT re: JSON formatting of nagios output
    • add existing buildduty queries to wiki
    • notify Callek of meeting outcomes
    • talk to arr about turning off nagios slave checks in IRC
  • armenzg:
    • propose a query for non-problem tracking VS buildduty bugs
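
One possible shape for the proposed query, given the complaint that bugs with dependencies were hidden even when those dependencies were resolved, is to hide a bug only while it has an open dependency. The sketch below works on plain dictionaries rather than any real Bugzilla API; the field names (`id`, `status`, `depends_on`), the set of open statuses and the bug ids in the example are assumptions for illustration only.

```python
from typing import Dict, Iterable, List

# Assumption: statuses that count as "still open".
OPEN_STATUSES = {"UNCONFIRMED", "NEW", "ASSIGNED", "REOPENED"}


def has_open_dependency(bug: dict, bugs_by_id: Dict[int, dict]) -> bool:
    """True if any known dependency of the bug is still open."""
    return any(
        bugs_by_id[dep]["status"] in OPEN_STATUSES
        for dep in bug.get("depends_on", [])
        if dep in bugs_by_id
    )


def needs_triage(bugs: Iterable[dict]) -> List[int]:
    """Ids of open bugs that are not blocked on an open dependency."""
    bugs = list(bugs)
    bugs_by_id = {bug["id"]: bug for bug in bugs}
    return [
        bug["id"]
        for bug in bugs
        if bug["status"] in OPEN_STATUSES
        and not has_open_dependency(bug, bugs_by_id)
    ]


if __name__ == "__main__":
    example = [
        {"id": 1, "status": "NEW", "depends_on": [2]},       # dep resolved
        {"id": 2, "status": "RESOLVED", "depends_on": []},
        {"id": 3, "status": "NEW", "depends_on": [4]},       # dep still open
        {"id": 4, "status": "ASSIGNED", "depends_on": []},
    ]
    print(needs_triage(example))
```

With a filter like this, a bug whose dependencies are all resolved shows up again for triage instead of staying hidden.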