TinderboxTLC
Congratulations! You may be the proud new user and/or sheriff of a Mozilla Firefox Tinderbox. Here are some important bits of info you should know.
Contents
- 1 There's a new sheriff in town
- 2 Uh oh, something broke
- 2.1 A unit test failed
- 2.2 Why isn't that box doing anything?
- 2.3 Failed to kill process
- 2.4 XUL Popup test failures
- 2.5 libpr0n reftest failures
- 2.6 make[6]: *** INTERNAL: readdir: Bad file number
- 2.7 Error: bloat test timed out after 1800 seconds.
- 2.8 bzip2: Data integrity error when decompressing.
- 2.9 Spurious RLk test failure on bm-xserve11
- 2.10 (add more common failures here)
There's a new sheriff in town
- Sheriffing schedules are posted at the top of the Firefox Tinderbox page
- The Sheriff_Duty page contains a general overview of what sheriff duty entails
- The password for tinderbox/bonsai administration is kept in the sheriffpass bug. If you've been deputized, an existing sheriff can CC you on the bug.
- How to back out changes
- How to clobber on Hg
Uh oh, something broke
The Tinderbox tends to be inflammable (what a country!). While code checkins can obviously cause build/test failures, a box may also fail in various ways unrelated to checkins. Some problems fix themselves in the next cycle; others require filing a server ops bug (in mozilla.org / Server Ops: Tinderbox Maintenance).
A unit test failed
If you see a unittest failure, please check the dependencies of bug 438871 to see if it's already filed. If it isn't, then you should file a bug! A failing test *is always a bug*. It's either a bug in the test or in the code, but it needs to be tracked.
Why isn't that box doing anything?
- Some boxes only start a build when there's a checkin. If a cycle fails for some reason, the box will remain red/orange until a checkin triggers a new build.
- Force a build to start by making a trivial checkin: for example, make a whitespace change or fix a spelling from the spelling bug.
Failed to kill process
- The Windows unit test boxes may fail with errors like:
buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process Failure: buildbot.slave.commands.TimeoutError: SIGKILL failed to kill process
- This might indicate a hang while running the test
- It might also mean that a test threw an exception that caused the test harness itself to fail, and the test running script timed out and tried to kill the browser
- This condition may not be caused by a specific checkin, and might fix itself upon the next cycle (when someone checks in).
XUL Popup test failures
- These are due to browser window focus problems on the test box, often because someone logged in to do maintenance and left something else focused.
- File a server ops bug to have the problem corrected
- Sometimes it's unclear why there's a failure, and the box may need to be rebooted.
libpr0n reftest failures
- Multiple failures on Windows in modules/libpr0n/test/reftest/ can be due to the box reverting to 16-bit color mode. See bug 414720 for history.
- Caused by someone connecting to the box with an RDP client in 16-bit color mode.
- File a server ops bug to have the problem corrected.
- REFTEST UNEXPECTED FAIL (LOADING)
- Fixed by bug 425987
make[6]: *** INTERNAL: readdir: Bad file number
- Happens occasionally on the Windows unit test box (e.g. qm-win2k3-01).
- This condition should fix itself upon the next cycle (when someone checks in).
Error: bloat test timed out after 1800 seconds.
- Seen sometimes on qm-win2k3-01 (other platforms too?)
- Should fix itself upon the next cycle (when someone checks in)
bzip2: Data integrity error when decompressing.
- Happens sporadically on Talos boxes.
- Should fix itself on the next cycle?
- Fixed by bug 427728, 15 May.
Spurious RLk test failure on bm-xserve11
- Error: Leak Test Failed: Number of leaks 1234 is greater than LeakFailureThreshold 0
- Infrequent, often missed due to fast cycle time of this box
- Tracking in bug 412545
(add more common failures here)
See also: http://wiki.mozilla.org/Buildbot/IT_Unittest_Support_Document
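
The error strings quoted on this page can be used to triage a build log quickly. The following is a minimal Python sketch, not an existing Tinderbox or Buildbot tool: the table name KNOWN_FAILURES and the function classify_log are hypothetical, and the patterns simply mirror the messages quoted in the sections above. The advice strings restate what this page says and are no substitute for checking the linked bugs.

```python
import re

# Hypothetical lookup table: (pattern, section on this page, advice).
# Patterns are copied from the error text quoted on this page; real logs
# may word these messages differently.
KNOWN_FAILURES = [
    (re.compile(r"SIGKILL failed to kill process"),
     "Failed to kill process",
     "May indicate a hang or a harness failure; might fix itself on the next cycle."),
    (re.compile(r"INTERNAL: readdir: Bad file number"),
     "readdir: Bad file number",
     "Should fix itself on the next cycle."),
    (re.compile(r"bloat test timed out after \d+ seconds"),
     "bloat test timed out",
     "Should fix itself on the next cycle."),
    (re.compile(r"bzip2: Data integrity error when decompressing"),
     "bzip2 data integrity error",
     "Sporadic on Talos boxes; see bug 427728."),
    (re.compile(r"Leak Test Failed: Number of leaks \d+ is greater than "
                r"LeakFailureThreshold \d+"),
     "Spurious RLk test failure",
     "Tracked in bug 412545."),
]


def classify_log(log_text):
    """Return (section, advice) pairs for every known signature in log_text."""
    matches = []
    for pattern, section, advice in KNOWN_FAILURES:
        if pattern.search(log_text):
            matches.append((section, advice))
    return matches


if __name__ == "__main__":
    sample = (
        "make[6]: *** INTERNAL: readdir: Bad file number\n"
        "Error: bloat test timed out after 1800 seconds.\n"
    )
    for section, advice in classify_log(sample):
        print(section + ": " + advice)
```

An empty result only means that none of the signatures listed here matched. A failing unit test with no known signature still needs a bug filed, as described under "A unit test failed".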