Buildbot/OutageReports

We started collecting Outage Reports for Tinderbox last year to determine which intermittent failures we were hitting on each platform. The reports let us track failure patterns over time and identify the highest-value fixes.

Many of the errors are difficult or even impossible to fix (e.g., toolchain hangs on Windows), but a history of outage reports with sufficient diagnostic information lets others (e.g., IT) restart a hung system without outside intervention.

Jan 2008

  • 2008/01/28 - Unittest machines and Talos machines aborted mid-run
  • 2008/01/21 - Talos machines not reporting any numbers to tinderbox

Nov 2007

  • 2007/11/12 - qm-win2k3-01 Buildbot machine required full clobber
  • 2007/11/11 - Windows Buildbot machines required full clobber

Oct 2007

Sep 2007

  • 2007/09/28 - Talos: SIGKILL failed to kill process
  • 2007/09/28 - Unit tests: "no more space left on device"
  • 2007/09/28 - Unit tests: "resource unavailable"
  • 2007/09/28 - Unit tests: stuck processes
  • 2007/09/17 - Unit tests: stuck processes
  • 2007/09/11 - Try server: SIGKILL failed to kill process
  • 2007/09/09 - Try server: SIGKILL failed to kill process
  • 2007/09/05 - Update to configuration files was not propagated internally via a reconfig request.
  • 2007/09/04 - Try server: SIGKILL failed to kill process
  • 2007/09/04 - Try server: SIGKILL failed to kill process

Aug 2007