Firefox/Channels/Postmortem/61
Notes for 61 post mortem
10:00am PDT Tuesday July 17 (after the Channel Meeting)
Vidyo channel: release coordination
IRC: #release-drivers
Some stats
- around 251 uplifts to beta61
- uplifts during beta: https://mzl.la/2JtfYxn
- uplifts during RC week: https://mzl.la/2Nmbx9X
- pretty much in line with most recent releases
What went well?
- Shortest post-Dawn Beta cycle (with an All Hands thrown in for good measure), and it went very smoothly!
- Good responsiveness from engineering teams on picking up unassigned bugs and getting patches landed before the last minute.
- Good feedback from Marketing on release notes
- We shipped RDL! \o/
What's new page
- Deployment issues meant that the live WNP wasn't ready for testing until 4am UTC on the 26th (i.e. release day)
- Confusion with QA about expected vs. actual results (compounded by the rollout issues)
- when should it be ready/deployed? Does it need to be on merge day after the merge or can it be tested earlier during RC week?
- Where's the checklist for WhatsNew? https://docs.google.com/document/d/120C5bQ-YmTANWkks4jhEfM50WI87KNp1SwLdULCh6Sw/edit
- during RC week on release-localtest
- We skipped localtest update testing to move faster (going directly to cdntest channel testing)
- can we better clarify the timeline and QA expectations?
- Let's take this back to the WNP team and figure out why the timelines are not being met
- (liz) I will check with them and add items to the RC week/release checklists and the milestones doc
Avast/AVG TLS 1.3 issue (bug 1468892)
- Reported against 61.0b14 on June 14 (12 days prior to release)
- Engineering involved June 18
- Tracking requested on June 26 by Philipp (!!)
- Normandy recipe disabling TLS 1.3 for all Win7 users and Avast/AVG users on Win8+ deployed June 28 (still active)
- Anti-virus software affects Firefox on Windows almost every release (in FF 60 it was Kaspersky), and it has been a constant pain over the last 9 years. Can Product Integrity proactively test AV software with RC builds and work with AV vendors somehow? <-- [roland] I know it's a hard problem; how can SUMO help?
- QA doing some testing now, complicated due to number of configurations
- Can we automate some of this?
Fennec AllocInfo::Get<T> topcrash (bug 1468541)
- First spiked in 61.0b12
- Quickly linked to Snapdragon 820/821 CPUs
- Difficult to reproduce and assign an owner to
- Temporarily throttled updates at 1% before eventually going to 99% (the throttle was blocking the Galaxy S8 crash fix)
- Still unclear what happens next; no signs of the crash on Beta 62 so far
- [marcia] We should be sure that the signature didn't morph into something else in 62
- New SDK work may affect this; signatures may have shifted (for 63)
- Getting affected device directly to the developer is key (in this case nchen)
Late uplifts
- Successful fix for the Galaxy S8 crashes (bug 1460989) \m/
- The uplift would have happened in time for b15, but issues uploading Nightly APKs to the Play Store went unescalated for a week, delaying Nightly testing results
- Last-minute disabling of an Android networking feature due to a download manager regression (bug 1467755)
- Bug was reported the Friday before the SF All Hands, and the All Hands delayed the investigation
- WebRTC sec bug (bug 1458048) needed late uplift and RC respin because fix landed upstream without coordination and uplift wasn't requested in a timely fashion
61.0.1 drivers
- Bug 1471375 - Reports about missing activity stream content on new tab page and about:preferences#home panel
- Reported on 6/26 (go-live day)
- The root cause of this was users with corrupted IndexedDB databases
- The patch that landed allowed AS to keep functioning, but doesn't fix the underlying IDB bug
- The AS team did their own post-mortem for this: Meeting Notes
- Bug 1472127 - Update to Firefox 61.0 killed all my bookmarks and also the backups are unreadable
- Reported on 6/29 (post go-live)
- Root cause was new migration code shipped in 61 removing bookmarks with wrong parents (which is an erroneous condition from the start)
- Can we talk to Mak/QA about doing more ongoing testing in later release cycles?
- This is an issue that has occurred in the past. Tom has offered to look into smoke testing this more thoroughly going forward, if possible.
- Examples from previous releases: 1388584 (in 55.0.1), 1206376 (41.0.1), 1206376 (44.0.2)
- Bug 1472137 - Crash in [@ IPCError-browser | ShutDownKill] in mozilla::mscom::Interceptor::~Interceptor()
- Reported on 6/29 (post go-live)
- Also manifested for Chinese users as an unusable browser (bug 1471824)
- Regressed by bug 1364624, which was uplifted to beta (along with a couple of other shutdown hangs; see below)
61.0.1 notable ride alongs
- Various fixes for crashes caused by the Windows SRWLOCK change (landed in 61.0b6, not noticed until after we shipped)
- Fix for Windows download issues exposed by sec uplift (bug 1465458)
- Twitch 1080p playback fix (bug 1469257)
- Filed during RC week, didn't have a verified patch ready in time for RC uplift
Normandy hotfix deployments/rollouts
- Avast TLS 1.3 issue (bug 1471672)
- Down to 50% now
- Technical problems with the recipe - should Developers get training on how to create recipes?
- TomGrab mentioned issues with QA testing due to recipe misconfigurations (e.g. how to target Windows versions)
- Pros/cons of default vs. user branch rollouts
- OSX 10.9 OMTP crashes (bug 1472308)
- Still troubleshooting issues with this one
- HTTP throttling v2 algorithm (bug 1462906)
- RDL was also rolled out to release via Normandy
- TLS 1.3 fallback
- (Ritu) We may need to refine, review, share the Normandy rollout process (where to file the bug, who creates the recipe, reviews it, intent to ship, QA testing etc)