CrashKill/StabilityWeek2013/Actions
From MozillaWiki
Session | Person | Item | Bug | Status |
---|---|---|---|---|
Rethinking Socorro Storage | laura | Push on the ops/training/support troubles. Try and get some dedicated attention (a person, for instance) allocated to us. | ? | |
Rethinking Socorro Storage | lonnen | Backups: costed on Amazon Glacier | ? | |
Rethinking Socorro Storage | laura | Backups: waiting on bmoss to approve the expense | ? | |
Rethinking Socorro Storage | erik, rhelmer | Write a failing test against some kind of playground environment to expose Thrift failures. Once that's failing, we can do experiments to figure out what's wrong. | ? | |
Rethinking Socorro Storage | selena | Research alternative connection methods to Thrift | ? | |
Rethinking Socorro Storage | rhelmer | Rhelmer will look at CM4 and see what we can get out of that for monitoring, etc | ? | |
Rethinking Socorro Storage | selena | Go shopping for query tools - Cascading, Hive, Impala, etc - must be open source | ? | |
Rethinking Socorro Storage | selena | Research alternatives to HBase | ? | |
Socorro and FHR: Testing, the master plan or the mutating reality | mbrandt | Get Selenium suite polling for new Socorro commits so the tests get run more promptly and we have less trouble tracking down regressions. | ? | |
Socorro and FHR: Testing, the master plan or the mutating reality | selena | How about some tests for our stored procs? Write some. They'll call the procs through the middleware layer. | ? | |
Socorro and FHR: Testing, the master plan or the mutating reality | mbrandt | Stop running tests against dev. | ? | |
Socorro and FHR: Testing, the master plan or the mutating reality | selena, rhelmer, lonnen | Audit stage's config; make sure it's the same as prod. | ? | |
Socorro: Stats | lonnen | File a bug and do something about it. | bug 908334 | FIXED |
Socorro: future of dev | rhelmer | redo Vagrant-lite | ? | |
Socorro: future of dev | rhelmer | developer HBase polished and working (tmary bug 872810, rhelmer to follow up) | bug 907964 | ? |
Socorro: future of dev | lonnen | build new dev env on the VM dumitru gave us | ? | |
Socorro: future of dev | laura | nag IT for puppet write access for everybody | ? | |
Socorro: future of dev | lonnen | follow up on options for try | ? | |
Improving User Support after Crashing | KaiRo | Finish up bug on listing last 3 days of crashes in about:support | bug 765285 | poked, awaiting new patch |
Improving User Support after Crashing | gps | expose "disable addon with ID X" API from chrome to "healthreport" | bug 912815 | filed |
Improving User Support after Crashing | Arun | come back for ideas for design changes in the client (Arun's notes) | ? | |
Improving User Support after Crashing | bsmedberg | Support classification | ? | |
Improving User Support after Crashing | brandonsavage | API to say "tell me what classification for this crash is" [Magic 8 ball API] | bug 915667 | filed |
Improving User Support after Crashing | bsmedberg to follow up | Reduce the junk in the help menu. | ? | |
Improving User Support after Crashing | bsmedberg | Redesign email flow | ? | |
Improving User Support after Crashing | bsmedberg | Support article embedded in crash report? Need to investigate this further for implementation | ? | |
Improving User Support after Crashing | laura | Develop plan for processing 100% of crashes only for their support classification | ? | |
Improving User Support after Crashing | laura | Develop plan for API for Firefox client to query for crash classification | ? | |
New B2G Socorro Features | | link to per-device crash lists from per-vendor wiki pages | ? | |
New B2G Socorro Features | | Chase ADI data, verify | ? | |
New B2G Socorro Features | akeybl/khu | follow up with TAMs about getting a crash report per partner (soft blocked on crashme app, instructions to get crash ID) - see notes for details | ? | |
New B2G Socorro Features | bajaj | checklist item for channel to be release-* | ? | |
New B2G Socorro Features | KaiRo | poke catlee for proper channel naming internally | bug 918068 | DONE, bug filed |
B2G Stability Process | | communicate the device-specific crash report | ? | |
B2G Stability Process | akeybl | Propose creation of a confidential partner-specific Boot2Gecko::Partner Issues (just like POVB) | ? | |
B2G Stability Process | nhirata w/ robert wood | automated testing should enable crash metrics for TAMs to use (endurance testing) | DONE | |
B2G Stability Requirements | [Socorro] | expose system app URLs publicly | bug 915397 | filed |
B2G Stability Requirements | [Socorro] | reports per-device | bug 853455 | filed |
B2G Stability Requirements | [Socorro] | report by OS version 1.1.*, 1.1.2, etc. | bug 832362 | filed |
B2G Stability Requirements | | surface the buildID to understand geo-specific stability issues | NEEDS BUG [Socorro] | |
B2G Stability Requirements | bsmedberg | replace debuggerd | NEEDS BUG | |
B2G Stability Requirements | bsmedberg | need to check for process responsiveness from an external place (app pings), and also pull system logs when reporting | bug 908000 | filed |
B2G Stability Requirements | | way to report issues to Mozilla from settings, with crash reports (input? etc?) | bug 915409 | filed |
B2G Stability Requirements | KaiRo | KaiRo->Fabrice to discuss B2G version number-less reports | bug 910836 | filed |
B2G Stability Requirements | nhirata/laura | look at bug 895246 (duplicate reports) - client side bug as well, why are we getting dupes at all? - switch dupe detection to use minidump checksumming - bug 907499 | ? | |
B2G Stability Requirements | | Get activation time to be reported as install time | bug 915405 | filed |
B2G Stability Requirements | | about:crashes on the phone | bug 908896 | filed |
B2G Stability Requirements | | fg OOM kills (FHR? crashes?) | bug 915407 | filed |
Meet with B2G Engineering working on Crashes | akeybl/legal | We need access to source (Gecko only, or also Gaia? Both.) | ? | |
Meet with B2G Engineering working on Crashes | akeybl/branding/TAMs | find out about requiring that OEMs send us their device to get stability support | ? | |
Meet with B2G Engineering working on Crashes | | we need to document how to investigate issues (source, device access, partner contacts, etc.) | ? | |
Meet with B2G Engineering working on Crashes | laura | check with Anurag where we are in having Socorro consume ADI data | ? | |
Meet with B2G Engineering working on Crashes | KaiRo to file / jlebar | foreground applications that go out of memory should be found | bug 915407 | filed |
Meet with B2G Engineering working on Crashes | | we need a bug for uploading crash reports once on wifi and idle | ? | |
Meet with B2G Engineering working on Crashes | bsmedberg | we need a bug for comments and emails in B2G | bug 907998 | filed |
Metrics | KaiRo | drive project on long-term crash rate graph. | bug 915438 | filed |
High Level CrashKill Review | KaiRo | arm/flash top crash bugs | bug 918085 | filed |
High Level CrashKill Review | [socorro] | triage bugs that kairo has on file for custom crash reports | ? | |
High Level CrashKill Review | | longterm crashes, plugin crashes, hangs, plugin hangs | bug 915438 | filed |
High Level CrashKill Review | bsmedberg | annotate phase of startup as part of crash, along with actual time | bug 907994 | filed |
High Level CrashKill Review | | new crash severity rating | bug 918077 | filed |
Socorro Brainstorming | KaiRo | find items from notes to actually implement | ? | |
OOM/EMPTY crashes | gps | Annotation on "how many tabs were open?" | ? | |
OOM/EMPTY crashes | dmajor | Annotation on OS (maybe also build architecture?) so EMPTY dump reports still get that (and use it in Socorro) | bug 838061 | ? |
OOM/EMPTY crashes | bsmedberg to file | Annotate that the slow script dialog came up | bug 907993 | filed |
OOM/EMPTY crashes | JS team, laura to nag | get JS stacks on the toplevel for uncaught exceptions | bug 630464 | ? |
OOM/EMPTY crashes | KaiRo [not yet] | Talk to jjensen about what crash data we can put in FHR, bug is leading up to this | bug 875562 | gps has etherpad notes (see bug) |
OOM/EMPTY crashes | laura/kairo | summarized info on gfx chipset and driver, memory, hardware/cpu | bug 853468 | FIXED |
OOM/EMPTY crashes | kairo to file | Annotate/show events like Firefox or MS releases on the crash charts? SUMO already has the ability to note "events" on their graphs. They use it for releases as well. Maybe we can steal or learn. | bug 915438 | noted in bug |
OOM/EMPTY crashes | Kairo to follow up with privacy | Submit all URLs from all tabs with a crash report | contacted afowler | |
Automated Stability | QA | look at the output from the bug hunter tool given the current scenario | ? | |
Automated Stability | bsmedberg | [help with the action needed to resolve the issues] Disconnect between fuzzer bugs and the signature in crash-stats? | ? | |
Automated Stability | gkw | ensure decoder uploads stacks as well with the bug reports | ? | |
Automated Stability | Jesse | release part of fuzzers so nightly population can run it | ? | |
Automated Stability | KaiRo | escalate necessity of tests passing on various sanitizers so we can have tests and fuzzing find issues using those | DONE, tests PASS in ASan | |
Automated Stability | KaiRo | Ease routine jobs of putting ranks into bugs and updating topcrash keyword | bug snorkel | filed |
Automated Stability | KaiRo | Get a service on Socorro that will give current ranks of a signature on current releases | bug 915373 | filed; prototyped by bsmedberg |
bsmedberg hour | bsmedberg | JSON version of minidump stackwalk | ? | |
bsmedberg hour | bsmedberg | Low end computers need more focus | ? | |
bsmedberg hour | bsmedberg | Issues that happen to a lot of people a little, instead of a lot to a few people | ? | |
bsmedberg hour | bsmedberg | "far crashes" - happening far away from their cause, signatures pretty useless | ? | |
bsmedberg hour | bsmedberg | Firefox OS (also a priority) | ? | |
bsmedberg hour | bsmedberg | empty crash signatures | ? | |
bsmedberg hour | bsmedberg | improving stack walking (depends a lot on JSON minidump stackwalking) | ? | |
bsmedberg hour | bsmedberg | Make Upload Symbols very easy either encrypted or not | ? | |
bsmedberg hour | lars | JSON minidump_stackwalk getting deployed | ? | |
JS Engineering | bajaj | Create a bug# to have the total GC crashes on Nightly graphed | bug 915317 | ? |
JS Engineering | KaiRo | ask Scoobidiver for information on anything that is potentially automatable (Scoobidiver could get hit by a bus, God forbid!) - actually, it looks like he disappeared right before this stability week :( | Scoobidiver is out | |
JS Engineering | Laura | find list of what JS team wants to filter on, determine whether these are common between teams or we need a flexible solution | ? | |
JS Engineering | [Socorro] | Enable search results to be split by custom criteria (e.g. addresses) instead of signature | ? | |
JS Engineering | bsmedberg/nbp | Better way of classification for JIT crashes | bug 764223? | |
JS Engineering | | Do something (in Socorro UI) with "interesting" addresses (mark them?) | bug 918101 | filed |
JS Engineering | bsmedberg/nbp | Store where we are in JIT code in a page, dump it as part of the crash stack | ? | |
JS Engineering | Socorro | exploitable crashes could be marked differently, see notes for heuristics | ? | |
JS Engineering | bsmedberg | Make stackwalker know about (Ion) JIT frames | bug 951176 | filed |
Radeon Update & Driver Investigation | bsmedberg | learning kernel debugging for the radeon crash, from our new in-house expert | ? | |
Radeon Update & Driver Investigation | KaiRo/akeybl/bizdev? | following up on escalation with AMD, in preparation for blogging | pushed off as dmajor found out more details on what's behind this issue | |
Radeon Update & Driver Investigation | lsblakk/bsmedberg | highly correlated crashes in FF23.0, discussing in post-mortem | ? | |
Radeon Update & Driver Investigation | milan | follow up on bug for empty crash | bug 837835 | ? |
Radeon Update & Driver Investigation | KaiRo | MS support ticket on text corruption on Win7 | bug 812695 | DONE, sent email to jrmuizel/bas to check if we really think it's an MS issue |
GFX Engineering | Milan/Anthony/Marc | how to get access to gfx hardware, or access to user computers (remote access?) | ? | |
GFX Engineering | bsmedberg | flow chart for developers: how to get the data that you need, next steps in the investigation, etc. | ? | |
GFX Engineering | Kairo | Put things that are now in app notes into proper crash annotations instead | bug 918102 | filed |
GFX Engineering | brandon | Graphics vendors and devices - q3 goal to add this to signature summary | bug 853468 | ? |
GFX Engineering | Laura/bsmedberg | help get around the legalities involved with crash data access to contributors and external partners (and once those are resolved, the logistics) | ? | |