CIDuty/Meetings:2014-07-22

From MozillaWiki



Release Engineering Buildduty Meeting

Status of buildduty period

https://releng.etherpad.mozilla.org/buildduty

Previous Buildduty Period

(see agenda for the motivation for trying this)

Incomplete Work

Incidents of Note

  • https://bugzil.la/1040062 - B2G bumper bot problems: it hasn't made commits for a while; Gaia-feeding trees are closed until the bumper bot is fixed
    • The issue first surfaced through Nagios notifications about stale timestamped files on master66 and through complaints from sheriffs; bumper is currently not documented
  • https://bugzil.la/1040150 - High pending count for Windows test slaves
    • Many slaves required manual rebooting via the slave_health interface; Callek and coop highlighted the need for some changes to slaverebooter (see bug for details)
  • mac-signing1.srv.releng.scl3.mozilla.com:signing-server had issues during the second half of last week, and nthomas was often around to troubleshoot. Nobody in the EU has credentials for the signing servers. The issues are not solved, though, and the causes apparently go beyond what releng can fix; nick suggests sending the machine off for diagnostics: see https://bugzil.la/1039977

The week was rich in troubles:

  • https://bugzil.la/1039227 - Backlog of tst-emulator64-spot jobs kept growing even though all slaves were in working status; Kim solved this by increasing the number of slaves in the pool
  • https://bugzil.la/1039313 - Suspiciously high number of talos slaves are broken
    • slaveapi needed the updated passwords for cltbld and Administrator (thanks to jlund for finding this out), and coop restarted the remaining slaves that were not taking jobs
  • https://bugzil.la/1039170 - Integration trees closed due to Android test failures: "Automation Error: Unable to reboot panda-xyz via Relay Board."
    • This was ultimately caused by missing reconfigs on the foopies, so their configuration did not reflect the recent migration of pandas to scl3

Bugs filed

Owner wanted

Previous action items

  • coop to find/file bug for builder name mapping

Agenda

  • (coop) process for handling slaves returning to production
    • when do we close problem tracking bugs?
      • when the slave is re-enabled and rebooted? after it passes its first job?
    • intern project?
  • ability to terminate spot instances through slave_health
  • does rail have a script to get spot DNS entries?

Action items

  • Document bumper (see related incident above) (simone)
  • Give credentials for the mac signing servers to the EU team (see related incident above)