Buildbot/OutageReports
From MozillaWiki
We started collecting Outage Reports for Tinderbox last year to determine which intermittent failures we were hitting on each platform. This let us track failure patterns over time and identify which fixes would deliver the most value.
Many of the errors are difficult to fix, or perhaps even unfixable (e.g. toolchain hangs on Windows), but a history of outage reports with sufficient diagnostic information allows others (e.g. IT) to restart a hung system without outside intervention.
Jan 2008
- 2008/01/28 - Unittest machines and Talos machines aborted mid-run
- 2008/01/21 - Talos machines not reporting any numbers to tinderbox
Nov 2007
- 2007/11/12 - qm-win2k3-01 Buildbot machine required full clobber
- 2007/11/11 - Windows Buildbot machines required full clobber
Oct 2007
- 2007/10/09 - Mozilla1.8 Talos: slave lost
- 2007/10/02 - Unit tests: 'stuck processes'
- 2007/10/01 - Unit tests: 'resource unavailable'
Sep 2007
- 2007/09/28 - Talos: SIGKILL failed to kill process
- 2007/09/28 - Unit tests: "no more space left on device"
- 2007/09/28 - Unit tests: 'resource unavailable'
- 2007/09/28 - Unit tests: stuck processes
- 2007/09/17 - Unit tests: stuck processes
- 2007/09/11 - Try server: SIGKILL failed to kill process
- 2007/09/09 - Try server: SIGKILL failed to kill process
- 2007/09/05 - Update to configuration files was not propagated internally via a reconfig request
- 2007/09/04 - Try server: SIGKILL failed to kill process
- 2007/09/04 - Try server: SIGKILL failed to kill process
Aug 2007
- 2007/08/30 - Unable to roll cvsco.log files