CIDuty/Meetings:2013-08-27
Release Engineering Buildduty Meeting
- Date: 2013-08-27
- Time: 1:30pm EDT
- Room: ReleaseEngineering Vidyo room
- Meeting notes: https://wiki.mozilla.org/ReleaseEngineering/Buildduty/Meetings:2013-08-27
Status of buildduty period
Bugs filed
- https://bugzil.la/908354 Need a way to obtain production keys for win64 machines (or fix original approach)
- https://bugzil.la/908359 Machine without reboot history even after not taking jobs for almost two days
- https://bugzil.la/908670 We cannot file new bugs from slavealloc (using old components)
Previous action items
Agenda
- (Callek) put foopies in slave_health and slavealloc [as slaves]?
- Defer to when coop is here, is probably best.
- (coop) should go in slavealloc, but how? ideas:
1. new category (and associated db table) for foopies, where tegra/panda maps to foopy maps to master
2. foopies as masters
3. foopies as slaves
- for 2 and 3, still need a join table to do the one-to-many mapping
- (armenzg) Amazon issues last week
- https://bugzilla.mozilla.org/show_bug.cgi?id=907158
- I've requested it to be a Q4 goal
- (armenzg) nagios checks
- mozpool pandas (no nagios) VS regular pandas
- desktop machines (only complain after kittenherder has had its chance)
- Do you know if we report nagios issues for *desktop* and *tegra* hosts earlier than 6 hours after an incident? I ask because I believe we should not report anything before 6 hours, since we expect briar patch to act on it first.
- hwine says: Agreed -- if you are seeing that, please reopen https://bugzil.la/886637 which was supposed to have done that already.
- I assume the pandas that use mozpool should not report on nagios at all.
- hwine says: This may be a new request -- feel free to file a bug, just file as blocking https://bugzil.la/885560
- (armenzg) IPMI tends to hang, and our code should take that into account
- do not PING regularly or it will go down
- my convo with arr
- (bhearsum) IT escalation for slaveapi reboots - reboot vs. hardware diags
- (bhearsum) difficulty testing new tools due to default deny
- (bhearsum) ssh config changes on r3 w7 machines
- windows auto-reboot status
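Sketch for the foopy/slavealloc item above: a minimal illustration of idea 1 (a separate foopies table, with tegra/panda mapping to foopy mapping to master), using an in-memory SQLite database. This is not the real slavealloc schema; every table, column, and host name below is a hypothetical stand-in chosen for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical foopy mapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical tables: one master has many foopies,
    # and one foopy has many devices (tegras or pandas).
    conn.executescript(
        """
        CREATE TABLE masters (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE foopies (
            id        INTEGER PRIMARY KEY,
            name      TEXT UNIQUE NOT NULL,
            master_id INTEGER NOT NULL REFERENCES masters(id)
        );
        CREATE TABLE devices (
            id       INTEGER PRIMARY KEY,
            name     TEXT UNIQUE NOT NULL,
            kind     TEXT NOT NULL CHECK (kind IN ('tegra', 'panda')),
            foopy_id INTEGER NOT NULL REFERENCES foopies(id)
        );
        """
    )
    # Made-up sample rows, only to exercise the queries below.
    conn.execute("INSERT INTO masters (id, name) VALUES (1, 'master-a')")
    conn.execute(
        "INSERT INTO foopies (id, name, master_id) VALUES (1, 'foopy-1', 1)"
    )
    conn.executemany(
        "INSERT INTO devices (name, kind, foopy_id) VALUES (?, ?, ?)",
        [("tegra-001", "tegra", 1), ("panda-0001", "panda", 1)],
    )
    conn.commit()
    return conn


def devices_on_foopy(conn, foopy_name):
    """Return the names of all devices attached to a foopy, sorted."""
    rows = conn.execute(
        """
        SELECT d.name
        FROM devices AS d
        JOIN foopies AS f ON d.foopy_id = f.id
        WHERE f.name = ?
        ORDER BY d.name
        """,
        (foopy_name,),
    ).fetchall()
    return [row[0] for row in rows]


def master_for_device(conn, device_name):
    """Follow device -> foopy -> master; return None if the device is unknown."""
    row = conn.execute(
        """
        SELECT m.name
        FROM devices AS d
        JOIN foopies AS f ON d.foopy_id = f.id
        JOIN masters AS m ON f.master_id = m.id
        WHERE d.name = ?
        """,
        (device_name,),
    ).fetchone()
    return row[0] if row else None
```

In this layout the one-to-many mapping lives in the foreign key on the device row, so no separate join table is needed. Ideas 2 and 3 would reuse the existing master or slave tables and, as noted above, add a join table to carry the same relationship.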
List of current projects
Action items
- armenzg - to reach out to IT regarding nagios URL links documentation
- jhopkins to look at bug with no reboot history - https://bugzil.la/908359