Mobile/Testing/07 25 12
Previous Action Items
- Jake to get an estimate for when the new tegras will be online - Response: https://bugzilla.mozilla.org/show_bug.cgi?id=767658#c4
- jmaher to figure out if turning off tests on m-c improved failure rate
- armen to send blassey a list of things on releng's plate
- blassey to put that list in priority order (may take more than a week)
- ateam and releng to figure out the reboots
Status reports
Dev team
- Bug 775227 OutOfMemory or "out of memory" during mochitest
Rel Eng
Linux Foopy
- WIP
- Status from last week:
- Created a new tegra-host-utils for Linux, based on a very recent m-c (cf. bug 742597 comment 32)
- Working out load-requirements with RelOps (Looking like disk i/o is our largest concern, which happens in spikes rather than continuous. Evaluating hardware vs VMs)
- In staging these can currently connect to buildbot and take a job, but xpcshell fails to run for unit tests, and Talos tests also fail to run properly.
- Will be working more on this today/this week to work out the remaining issues.
Tegra
- reboot logging landed
- reboot logging code backed out
- it was taking the pool down - this morning we were down to 49 running tegras
Panda
- Nothing new
- bug 725544 - (android_4.0_testing) [tracking bug] Android 4.0 testing
- bug 773517 - IT to verify PDU setup - completed
- bug 776728 - chassis acceptance testing - to be done this week (a-team on point)
- bug 769428 - provide usable image that can work (a-team)
- bug 725845 - run 10 panda boards through buildbot staging systems
Beagle
- Nothing new
- bug 767499 - (track-armv6-testing) Beagle Board Arm v6 Support Tracking
- blocked on working image (a-team)
Other
- x86 on hold for now bug 750366
IT
- Developing a remote re-imaging process for pandas bug 764534
- Updated boot.scr source for pxebooting in scl1 - https://wiki.mozilla.org/Auto-tools/Projects/Pandaboard_Setup/Using_Linaro_Prebuilt_Image
- Working on a higher density chassis bug 777393
- IT will be expanding in SCL1 Datacenter for mobile device housing bug 774477
- 81 new tegras will be put into production pool bug 767447
- Building 5 new panda chassis for scl1 bug 777359
A Team
- 28.46% failure rate - an improvement from last week!
- tests with 50% or greater failure rates: M8, rc, R3, rp, rck, rck2
- today deployed talos changes to add telemetry preference and actually fail on __FAIL
- a few tests disabled and turned back on
- panda update
- stability seems to be affected by screen resolution; adjusting to 1024x768 resolves almost all stability problems (see the sketch after this list)
- reftests are stable (except using --ignore-window-size)
- mochitests fail 100% of the time due to running in an iframe
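A minimal sketch of how one might verify the resolution fix across the pool, assuming the 1024x768 mode is set via the omapfb.mode kernel parameter (the usual pandaboard mechanism) and that the boards are reachable over adb; the serial numbers below are placeholders, not our actual configuration:

 import subprocess
 
 PANDAS = ["panda-0001", "panda-0002"]  # placeholder serials
 EXPECTED = "1024x768"                  # the mode we standardized on
 
 def cmdline(serial):
     # Read the kernel command line the board actually booted with.
     out = subprocess.check_output(
         ["adb", "-s", serial, "shell", "cat", "/proc/cmdline"])
     return out.decode()
 
 for serial in PANDAS:
     status = "ok" if EXPECTED in cmdline(serial) else "WRONG MODE"
     print("%s: %s" % (serial, status))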
AutoPhone / S1/S2 Automation
- Reran S1/S2 from June 3rd to around July 18 to verify regression (on nightly builds only).
- First run (http://mrcote.info:8118/#/org.mozilla.fennec/throbberstart/local-twitter/04_46_65_fd_2f_e1_samsung_gs2%7C78_d6_f0_cf_8d_11_nexus_s%7C78_d6_f0_cf_d2_17_nexus_s/2012-05-26/2012-07-25) showed regression in throbberstart
- However, we were not rebooting the phone in between runs.
- Second run (rebooting between runs) (http://mrcote.info:8119/#/org.mozilla.fennec/throbberstart/local-twitter/04_46_65_fd_2f_e1_samsung_gs2%7C78_d6_f0_cf_8d_11_nexus_s%7C78_d6_f0_cf_d2_17_nexus_s/2012-05-26/2012-07-25) showed a regression, but it stopped around June 12 (see the reboot sketch after this list).
- In both cases, there was no regression in throbberstop (second run: http://mrcote.info:8119/#/org.mozilla.fennec/throbberstop/local-twitter/04_46_65_fd_2f_e1_samsung_gs2%7C78_d6_f0_cf_8d_11_nexus_s%7C78_d6_f0_cf_d2_17_nexus_s/2012-05-26/2012-07-25). There may have even been an improvement around June 14/15, at least on the Nexus S.
- Finished a simple smoke test for AutoPhone. Nearly landed patch to switch all adb usage to SUT; should improve reliability.
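Since the two S1/S2 runs differed mainly in whether the phone was rebooted between runs, here is a minimal sketch of a reboot-between-runs loop using plain adb. This is illustrative only: AutoPhone is switching its adb usage to SUT, and the serial number and run_test() below are placeholders:

 import subprocess
 import time
 
 SERIAL = "placeholder-serial"
 
 def reboot_and_wait(serial, settle=60):
     # Reboot and block until adb sees the device again, then let it settle
     # so each timing run starts from a comparable cold state.
     subprocess.check_call(["adb", "-s", serial, "reboot"])
     subprocess.check_call(["adb", "-s", serial, "wait-for-device"])
     time.sleep(settle)
 
 def run_test(serial):
     pass  # placeholder for a throbberstart/throbberstop measurement
 
 for _ in range(10):
     reboot_and_wait(SERIAL)
     run_test(SERIAL)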
Eideticker
- Somewhat slowed down this week, as time was spent unwinding/debugging tegra and panda issues
- Misc. stability / documentation fixes
- Dashboard actually seems to be running fairly stably, except last night when we ran out of disk space (will fix the dashboard to save less historical image-analysis data, as it's clearly not of much value; see the pruning sketch at the end of this section)
- Found a potential regression in the CNN test via the dashboard, filed as bug 777357
- (patch to import backdated info still under development)
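A minimal pruning sketch for the disk-space fix mentioned above, assuming the image-analysis artifacts live as files under one directory; the path and the 14-day retention window are assumptions, not the dashboard's actual layout:

 import os
 import time
 
 DATA_DIR = "/path/to/dashboard/image-analysis"  # placeholder path
 RETENTION_DAYS = 14                             # assumed retention window
 cutoff = time.time() - RETENTION_DAYS * 86400
 
 for name in os.listdir(DATA_DIR):
     path = os.path.join(DATA_DIR, name)
     # Delete analysis artifacts older than the retention window.
     if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
         os.remove(path)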
Round Table
- (armenzg/jmaher) we would like to focus on stability of current setup and work towards panda as we get out of the forest
- The failures are killing reliability, causing changes to back up on try (due to retriggers both on Try and elsewhere), and no one is paying attention to the failures. What do we do about this?
- Hide the particular suites with a >20-30% failure rate? (see the threshold sketch at the end of this section)
- Disable more tests to get to a reasonable failure rate?
- Fix the reboot issue so that we can get to a reasonable failure rate?
- bug 748488 taras/glandium: Incremental decompression is the next big startup win. It needs releng infrastructure to gather profiles.
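To make the hide-by-threshold option concrete, a tiny sketch that lists which suites a given cutoff would hide. Only the six suite names reported at >=50% above are real; their exact rates, the healthy-suite rates, and the 30% cutoff are illustrative:

 # Suites reported at >=50% failure this week (0.50 used as a floor),
 # plus illustrative healthy suites.
 failure_rates = {
     "M8": 0.50, "rc": 0.50, "R3": 0.50, "rp": 0.50, "rck": 0.50, "rck2": 0.50,
     "M1": 0.12, "R1": 0.08,  # illustrative
 }
 THRESHOLD = 0.30  # one of the cutoffs discussed (20-30%)
 
 to_hide = sorted(s for s, rate in failure_rates.items() if rate > THRESHOLD)
 print("suites to hide: " + ", ".join(to_hide))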
Action Items
- clint to get list of top failures
- hal to get estimates for bug 748488