CIDuty/Meetings:2013-12-03
Release Engineering Buildduty Meeting
- Date: 2013-12-03
- Time: 10:00am EST
- Room: ReleaseEngineering Vidyo room
- Meeting notes: https://wiki.mozilla.org/ReleaseEngineering/Buildduty/Meetings:2013-12-03
Status of buildduty period
https://releng.etherpad.mozilla.org/buildduty
Bugs filed
Previous action items
- (jhopkins) follow up with catlee on bug 793989
- zeller is working on this
- jhopkins and callek to discuss bug 927129
Agenda
- (bhearsum) sheriff access to slaveapi
- any reason not to give them access to all endpoints?
- (Callek) All current endpoints are fine, I can think of at least 1 or 2 future endpoints that may be "no"
- AWS Instance Creations
- Loaner Creation/Setup
- (Callek) acks getting lost in nagios due to parent->child relationships and "UNREACHABLE"
- http://bugzil.la/942969
- Affecting our panda hosts more than any other (in my observation)
- Solutions? Separate meeting with :ashish and IT to discuss options? Other ideas?
- Can we disable *all* individual nagios checks, at least on IRC, and leave only important hosts?
- already in progress, needs to be pinged: https://bugzilla.mozilla.org/show_bug.cgi?id=927941
- (coop) need owner for redis
- short-term: fix restart script
- supervisord?
- (coop) longer term: move to the webops redis cluster, which already runs a more recent version that presumably doesn't leak file handles
- bug #???
- (armenzg) what do we do about this?
10:02 nagios-releng: Tue 07:02:10 PST [4563] fw1.private.releng.scl3.mozilla.net:BGP usw2 vpn-0525f61b-2 is WARNING: SNMP WARNING - BGP sess vpn-0525f61b-2 (usw2/169.254.249.29) uptime *308* secs (http://m.allizom.org/BGP+usw2+vpn-0525f61b-2)
- XioNox is working diligently with AWS and has root access to coord on the ticket. Unsure if there is anything we can "do" at present other than watch that as an informational message
- bug triage Friday morning
- ownership of Releng:Other component
- pods
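For the sheriff access item above, one way to picture "all current endpoints are fine, a few future ones may be no" is a default-allow policy with a small deny list. This is only a minimal sketch: the role names, the endpoint names, and the `is_allowed` helper are hypothetical and do not reflect slaveapi's actual routes or code.

```python
# Hypothetical endpoint names standing in for the future endpoints
# mentioned in the notes (AWS instance creation, loaner creation/setup).
RESTRICTED_ENDPOINTS = {"aws_instance_create", "loaner_setup"}


def is_allowed(role: str, endpoint: str) -> bool:
    """Return True if `role` may call `endpoint`.

    Assumed policy: releng may call everything, sheriffs may call
    everything except the restricted set, anyone else is denied.
    """
    if role == "releng":
        return True
    if role == "sheriff":
        return endpoint not in RESTRICTED_ENDPOINTS
    return False
```

With this shape, giving sheriffs full access now means leaving the restricted set empty, and finer-grained access later only requires adding names to it.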
List of current projects
- https://github.com/bhearsum/slaveapi/blob/master/TODO
- https://bugzilla.mozilla.org/show_bug.cgi?id=914764
- in-house capacity: https://bugzilla.mozilla.org/show_bug.cgi?id=867593
Action items
- bhearsum to file bug on slaveapi finer-grained access + giving sheriffs full access for now
- Callek to follow up with :ashish/IT about 927941/942969
- rail to look into redis restart script
- coop to investigate move to existing webops redis cluster