Breakpad/Status Meetings/2017-02-15


Meeting Info

Breakpad status meetings occur on Wed at 10:30am Pacific Time.

NOTE: Meeting will start 30 minutes later than normal due to MoCo meeting.

Conference numbers:

   Vidyo: Stability 
   650-903-0800 x92 conf 98200#
   800-707-2533 (pin 369) conf 98200# 

IRC backchannel: #breakpad
Mountain View: Dancing Baby (3rd floor)

Operations Updates

  • antenna week
    • New Relic working on stage
    • same load balancer, one node with New Relic and one node without (for profiling)
    • antenna should do intelligent deploys shortly
      • scale down old stack after new stack is up and healthy
  • socorro alerts!!!!
    • pre-warning alerts went off before things broke for users
    • 2 GB per host memory jump in all ES nodes (nearly all of it is fielddata)
      • still not sure what the cause was
    • dropped most of the small indices, down to 24 weeks of retention
    • developed a reliable, repeatable process for adding data nodes on stage
    • unless otherwise directed, going ahead on prod today
  • secreeeet ansible playbooks
    • miles has been keeping ansible files around
    • mostly for dealing with Elasticsearch
    • wrote scripts instead of doing anything manually
    • been kept privately until now
    • but now we know
    • what dark magicks are contained within?
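
The "intelligent deploys" ordering noted above (scale down the old stack only after the new stack is up and healthy) can be sketched as follows. This is a minimal illustration, not Antenna's actual deploy tooling: the function name and all four callables are hypothetical placeholders for whatever the real infrastructure provides.

```python
import time
from typing import Callable


def deploy_then_retire(
    start_new: Callable[[], None],       # hypothetical: brings up the new stack
    new_is_healthy: Callable[[], bool],  # hypothetical: health check on the new stack
    scale_down_old: Callable[[], None],  # hypothetical: retires the old stack
    teardown_new: Callable[[], None],    # hypothetical: removes a failed new stack
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Start the new stack and retire the old one only once the new one is healthy.

    Returns True if the switch happened, False if the new stack never became
    healthy before the timeout (in which case the old stack is left untouched).
    """
    start_new()
    deadline = clock() + timeout_s
    while clock() < deadline:
        if new_is_healthy():
            # Only now is it safe to remove capacity from the old stack.
            scale_down_old()
            return True
        sleep(poll_s)
    # The new stack never became healthy: keep the old stack serving traffic.
    teardown_new()
    return False
```

The point of the ordering is that a failed deploy leaves the old stack serving traffic instead of leaving no healthy stack at all.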


Project Updates

Deployment Triage

PR Triage


Major Projects

Splitting out collector (Antenna)

  • (willkg) set up a "to load test" github project board to keep track of current status: https://github.com/mozilla/antenna/projects/2
    • why? because we've got some things in bugs and a lot of things not in bugs and it was hard to see where things were at.
    • how does it get updated? you update it! and Will will periodically ask people where things are at and update it.
  • (willkg) fixed a bug in Antenna's healthcheck endpoint where, if the IAM credentials weren't set up right, the error handling itself raised an error
  • (willkg) updated antenna-loadtests to use molotov master tip
  • (miles) set up New Relic on a single -stage node so we can trace execution
  • (miles) started setting up a -prod environment
  • (mbrandt) we have ailoads (the old way) and Molotov (the new way). Molotov is ready, waiting on a final r? from Tarek.
  • (mbrandt) there are some open questions about Molotov features: its reporting doesn't provide an easy way to line up server metrics with the Molotov reports
  • (mbrandt) at rpopa's recommendation, going to test a single node and increase molotov resources until it breaks, and then do some math
  • (mbrandt) ailoads is ready anytime, needs some responses before molotov goes off
  • (miles) will spin up a single node for mbrandt to test against after lunchtime
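
The "do some math" step in the single-node plan above could look roughly like this: take the request rate at which one node breaks under load, reserve some headroom, and divide the expected peak rate by what remains. The function name, the headroom fraction, the minimum node count, and every number in the usage line are assumptions for illustration; none of them come from the meeting.

```python
import math


def nodes_needed(
    peak_rps: float,             # expected peak crash-report rate (assumed input)
    single_node_max_rps: float,  # rate at which a single node broke in the load test
    headroom: float = 0.5,       # fraction of per-node capacity kept in reserve (assumption)
    min_nodes: int = 2,          # floor so one node failing is survivable (assumption)
) -> int:
    """Estimate how many nodes are needed to serve peak_rps with headroom."""
    if single_node_max_rps <= 0:
        raise ValueError("single_node_max_rps must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_rps = single_node_max_rps * (1 - headroom)
    return max(min_nodes, math.ceil(peak_rps / usable_rps))


# Hypothetical numbers only: a node that breaks at 400 req/s, a peak of 1000 req/s.
print(nodes_needed(peak_rps=1000, single_node_max_rps=400))
```

This assumes capacity scales linearly with node count, which a follow-up multi-node load test would need to confirm.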

Deprecation rampage

  • no updates

Processor rewrite

  • NO UPDATE

Upgrading elasticsearch

  • (Adrian) still blocked by the most annoying bug (but Will gave me some tips to solve it, working on it)
    • I suspect it might be a bug in an underlying library
  • (Adrian) mapping issue is solved, just need to import prod's Super Search Fields data locally and mess with it
    • ... once I have a Socorro that works with ES 5.1

Other Business

  • outage looks like background quantum aws weirdness and re-balancing

Travel, etc

  • Adrian switching Monday (out) and Friday (working) next week

Links