CIDuty/Meetings:2014-07-22
Release Engineering Buildduty Meeting
- Date: Tuesdays
- Time: 9:30am ET & 1:30pm ET (alternating weeks)
- Room: ReleaseEngineering Vidyo room
- Meeting notes: https://wiki.mozilla.org/ReleaseEngineering/Buildduty/Meetings
Status of buildduty period
https://releng.etherpad.mozilla.org/buildduty
Previous Buildduty Period
(see agenda for the motivation behind trying this)
Incomplete Work
Incidents of Note
- https://bugzil.la/1040062 - B2G bumper bot problems: the bot hasn't made commits for a while, and Gaia-feeding trees are closed until it is fixed
- The issue surfaced through nagios notifications about files with old timestamps on master66 and through complaints from sheriffs; bumper is currently undocumented
- https://bugzil.la/1040150 - High pending count for Windows test slaves
- Many slaves required manual rebooting via the slave_health interface; Callek and coop highlighted the need for some changes to slaverebooter (see bug for details)
- mac-signing1.srv.releng.scl3.mozilla.com:signing-server had issues during the second half of last week, and nthomas was often around to troubleshoot. Nobody in the EU has credentials for the signing servers. The issues are not solved yet, and the causes appear to be beyond what releng can fix; nick suggests sending it off for diagnostics, see https://bugzil.la/1039977
The week had plenty of other trouble:
- https://bugzil.la/1039227 - Backlog of tst-emulator64-spot jobs kept growing even though all slaves were in working status; Kim solved this by increasing the number of slaves in the pool
- https://bugzil.la/1039313 - Suspiciously high number of talos slaves are broken
- slaveapi needed the updated passwords for cltbld and Administrator (thanks jlund for finding out) and coop restarted the remaining slaves not taking jobs
- https://bugzil.la/1039170 - Integration trees closed due to Android test failures: "Automation Error: Unable to reboot panda-xyz via Relay Board."
- This was ultimately traced to missing reconfigs on foopies, so their configuration did not reflect the recent migration of pandas to scl3
Bugs filed
- https://bugzil.la/1039972 - Define a canonical way to run reconfigs
- https://bugzil.la/1039982 - end_to_end_reconfig.sh should not halt in case of foopies issues, and should report which foopies were problematic
- This inspired pete to file https://bugzil.la/1040013 - No manual reconfigs: buildbot masters and foopies to update themselves
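The behaviour requested in bug 1039982 (keep going when a foopy fails, then report which ones were problematic) could be sketched roughly as below. This is only an illustration in Python, not the actual end_to_end_reconfig.sh logic: the function names, the reconfig_one callback, and the host names in the usage note are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def reconfig_all(
    foopies: Iterable[str],
    reconfig_one: Callable[[str], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Run reconfig_one (hypothetical callback) on every foopy.

    A failure on one host is recorded and the loop continues,
    instead of halting the whole run.
    """
    succeeded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for host in foopies:
        try:
            reconfig_one(host)
        except Exception as exc:
            # Remember the host and the reason, then move on to the next one.
            failed.append((host, str(exc)))
        else:
            succeeded.append(host)
    return succeeded, failed


def format_report(failed: List[Tuple[str, str]]) -> str:
    """Summarise which foopies were problematic, one per line."""
    if not failed:
        return "All foopies reconfigured."
    lines = ["Problematic foopies:"]
    lines += [f"  {host}: {reason}" for host, reason in failed]
    return "\n".join(lines)
```

A caller would pass its own per-host reconfig function, for example reconfig_all(["foopy-a", "foopy-b"], my_reconfig), and print format_report on the second element of the result at the end of the run.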
Owner wanted
Previous action items
- coop to find/file bug for builder name mapping
- https://bugzil.la/913658 - Need buildername regular expressions and associated properties published to an API
- see also: https://bugzil.la/586664 - Normalize builder names
- intern project? contractor?
Agenda
- (coop) process for handling slaves returning to production
- when do we close problem tracking bugs?
- when the slave is re-enabled and rebooted? after it passes its first job?
- intern project?
- ability to terminate spot instances through slave_health
- does rail have script to get spot DNS entries?
Action items
- Document bumper (see related incident above) (simone)
- Give credentials for the mac signing servers to people in the EU (see related incident above)