Firefox/Channels/Postmortem/44
Firefox 44 Post-mortem
- [lmandel] Postmortem on a beta regression (bug 1240559) from Mike Conley on firefox-dev
- Socorro issues during the 44 cycle
- One-week outage during Christmas week.
- Another outage within 48 hours of going live (a critical window for us) :(
- There seems to be a lack of confidence in the current stability data and crash rate, which is especially apparent for Android. We need to figure out what is needed so that we all trust the crash data.
- [Action item]: Ritu to discuss with Lonnen
- [Action item]: Kevin/KaiRo/Ritu to help reconcile Fennec crash rate and help relman understand how to self-serve that data while driving a release.
- NSS Updates
- We need to communicate a timeline to NSS devs: for a Beta cycle, we need fixes by Beta 3/4, and only extremely critical fixes after that.
- [syl] We discussed with the NSS team and we are probably going to treat NSS changes just like any other Firefox changes
- [lmandel] potentially move NSS in-tree
- [Action item]: Sylvestre to let us know when this can be put into place.
- Hotfix required for bug 1242176 to address password loss, discovered during the throttling period
- We need to improve the hotfix situation. Maybe force the hotfix ping before or after the update.
- Hotfix is not the right mechanism when we need to guarantee delivery of a fix before an update, as we needed with the places db in 44
- We should spec out a project for hotfix improvements
- support for delivering multiple payloads concurrently
- guaranteed delivery - force application of hotfix before applying an update
- improve data about uptake - can/should we do this with Telemetry?
- [Action item]: Ritu to communicate these items on hotfix improvements to hotfix team.
- Missed uplift of graphics crash fix in bug 1222171 that eventually ended up as a ride-along in 44.0.1 (top-crasher)
- Fennec 44 adjust not working.
- [Action item]: Getting adjust ping in Beta, Aurora, Nightly so we can catch these issues before we go live.
- Beta - https://bugzilla.mozilla.org/show_bug.cgi?id=1248066
- Aurora / Nightly - https://bugzilla.mozilla.org/show_bug.cgi?id=1248764
- Fennec spent most of the beta cycle chasing the crash rate. Disabling the dynamic tilecache made the biggest difference. Holidays did not help here: ~10 days of people on PTO.
- [syl] The new schedule is going to address the holiday issue. No more work during holidays \o/
- Good teamwork finding and fixing issues to improve Fennec stability during the late Beta 44 cycle.
- Cookiegate https://bugzilla.mozilla.org/show_bug.cgi?id=1244505#c21
- [syl] we had some concerns initially ...
- [syl] Hard to address as webdevs are using the web in different ways
- In future, we should consider using telemetry data to understand what sort of impact this sort of change would bring about.
- [lmandel] We spent a lot of time during the cycle trying to figure out whether we should slip, skip, or release on time. A big part of this time was spent because we don't have a mitigation plan or clear criteria in place. We have projects underway to develop these. I think the cycle reinforces the need for this work.
- [Action item]: RelMan is working on criteria.