Performance/Status Meetings/2007-June-20
From MozillaWiki
Participants
joduinn, damon, alice, vlad, schrep, robcee, rhelmer, dbaron, stan
Action Item Update
- AI:Justin see if the perf machines are swapping or if they need more memory.
- Install more memory on some machines (bug 384314). Memory arriving tomorrow (Thurs).
- AI:robcee bug 383167 tracking problem getting buildID-in-a-file from Tinderbox.
- AI:rhelmer run performance tests with profiling on (rhelmer set up a machine for this, but jprof has problems on trunk). AI:rhelmer needs to change the tests to start/stop jprof as part of the test. The first implementation was lost in a hard disk failure; the new implementation will be better anyway and checked in. Done, publishing results to experimental. Discussions about Tier1 vs Tier2 support.
- vlad filed bug 364779. No longer Linux-specific; now platform-independent.
- people constrained, will do this as part of talos framework. Can use buildbot as other option also.
- Getting higher resolution timers for tests
- AI:Damon will meet with Boris about this. Different issues on different platforms.
- Graph server status
- Graph server for easy build-to-build comparisons
- Alice's latest changes are now checked into graphs.mozilla.org.
- AI:alice, justin Discussions with IT about having them maintain the machine, not just Alice. Justin & Alice to meet and set up staging & production machines. Justin to support the production machine, but not 24x7. Alice to work on the staging machines and push to production, like we do for a.m.o and other sites. Justin is setting up the production machine; estimate 3 days.
- AI:rhelmer/robcee XP machines frequently hang, freeze, run out of memory, etc. Change XP machines to clean-boot and auto-start everything: auto-login, start the VNC server, etc. rhelmer would like to do this for both build and perf machines. Run one perf machine that reboots every 24 hours and compare its results to perf machines that are not rebooted frequently. No change.
- Reducing test variance
- AI:schrep will try playing with existing TP2 logs from rhelmer, see if schrep can do math magic. Talked with stats guy, assembling raw data in csv file.
Agenda
- Generate reliable, relevant performance data (already underway as talos). Talos status update?
http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaTest
- Areas where help is needed
- expand the scope of performance testing beyond Ts/Tp/TXUL/TDHTML
- reduce noise in tests to ~1% (suggested by bz, not started)
- move perf tests to chrome, so we get more reliable results, and can test more than just content
- improve performance reporting and analyses:
- Better reports for sheriffs to easily spot perf regressions
- Tracking down specific performance issues
- stats change to track AUS usage by osversion.
- Priorities for infra:
- Generate historical baselines
- Generate profile data regularly on builds
- Getting the perf numbers more stable
- Developing the graph server to display time spent in each module
- New ideas
- Question: How are we tracking perf bugs, specifically, and are we doing this the same way we are triaging security bugs? Can we do it the same way if not? (damon)
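For the noise target listed above (about 1%), one way to put a number on run-to-run noise is the relative standard deviation of repeated runs of the same test on the same build. The sketch below is only an assumption about how "noise" might be measured, not a method agreed in the meeting; the function names and the 1% default are illustrative.

```typescript
// Hypothetical helpers: names and the default target are illustrative only.

// Relative standard deviation (sample std dev divided by mean) of repeated
// timings from the same test on the same build.
function relativeStdDev(samples: number[]): number {
  if (samples.length < 2) {
    throw new Error("need at least two samples");
  }
  const mean = samples.reduce((sum, v) => sum + v, 0) / samples.length;
  let sumSquares = 0;
  for (const v of samples) {
    const d = v - mean;
    sumSquares += d * d;
  }
  // Sample (n - 1) variance, since the runs are a sample of possible runs.
  const stdDev = Math.sqrt(sumSquares / (samples.length - 1));
  return stdDev / mean;
}

// True when the measured noise is at or below the target (1% by default).
function withinNoiseTarget(samples: number[], target: number = 0.01): boolean {
  return relativeStdDev(samples) <= target;
}
```

For example, three runs of 99.5, 100 and 100.5 ms have a relative standard deviation of 0.5%, which would meet the target; runs of 98, 100 and 102 ms (2%) would not.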
Gecko: Perf discussion
- jprof tinderbox
- jprof profile splitting
- need list of functions checked in
- just toplevel chunks (whatever you want separated out)
- need to go back to 1.8 and collect data from there
- synthetic tests
- put specific tests into talos framework
- poach mochitest performance tests
- individual reftest-like items
- need to get a bunch of people looking at profiles
- take up some of these meetings to sit down and look at profiles
- timer-based profiling is better (vtune/jprof/oprofile/etc., not quantify)
- TODO: vlad to generate profile for next week's meeting
- running without Fx chrome
- need dummy history impl
- that would get us a number that's just pure Gecko, without timing the Firefox UI
- rhelmer's extension to run Tp2 in chrome
- examining default theme for performance issues
- Tp vs. Tp2
- Tp2 avoids a lot of firefox UI stuff because it's in an iframe
- Tp needs server-side code, but it doesn't have to be the exact code that's there now
- probes
- has value, but needs to be maintained
- merge dtrace/nsProbes/etc. so that we can use whatever tool without reinstrumenting the code
- dbaron already maintains entrypoints into layout to assert that layout doesn't reenter incorrectly
- TODO: vlad/dbaron - connect with Sun guys, take probe stuff somewhere
Other Information
Followup on JS timing granularity: it turns out that it's not JS timing errors after all! Instead, it's the synthetic load I was using, which was to multiply a pair of numbers. If the load is changed to addition of numbers, the measured time is a clean linear function of the number of additions. So the remaining mystery is why multiplication in JS sometimes takes about 16ms longer than expected, but at least we're no longer suspicious of our measurement tools.
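A check like the one described above can be sketched as follows: time a loop for several operation counts, fit a straight line through the (count, time) points, and look at how far any point strays from the line. This is a minimal sketch, not the harness actually used; the names `timeLoop`, `fitLine` and `compareOps`, the constant operand, and the use of `Date.now()` are all assumptions.

```typescript
// Hypothetical harness; names and constants are illustrative only.
type Op = (a: number, b: number) => number;

interface Point { x: number; y: number; }
interface Fit { slope: number; intercept: number; maxResidual: number; }

// Time `count` applications of `op`, in milliseconds.
function timeLoop(op: Op, count: number): number {
  let acc = 1;
  const start = Date.now();
  for (let i = 0; i < count; i++) {
    acc = op(acc, 1.0000001);
  }
  const elapsed = Date.now() - start;
  // Touch the result so the loop cannot be discarded as dead code.
  if (isNaN(acc)) throw new Error("unexpected NaN");
  return elapsed;
}

// Ordinary least-squares line through the points, plus the largest
// absolute deviation of any point from that line.
function fitLine(points: Point[]): Fit {
  if (points.length < 2) throw new Error("need at least two points");
  const n = points.length;
  const meanX = points.reduce((s, p) => s + p.x, 0) / n;
  const meanY = points.reduce((s, p) => s + p.y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (const p of points) {
    sxy += (p.x - meanX) * (p.y - meanY);
    sxx += (p.x - meanX) * (p.x - meanX);
  }
  if (sxx === 0) throw new Error("x values must not all be equal");
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  let maxResidual = 0;
  for (const p of points) {
    const residual = Math.abs(p.y - (slope * p.x + intercept));
    if (residual > maxResidual) maxResidual = residual;
  }
  return { slope, intercept, maxResidual };
}

// Fit time-versus-count separately for addition and multiplication.
function compareOps(counts: number[]): { add: Fit; mul: Fit } {
  const measure = (op: Op): Fit =>
    fitLine(counts.map((n) => ({ x: n, y: timeLoop(op, n) })));
  return {
    add: measure((a, b) => a + b),
    mul: measure((a, b) => a * b),
  };
}
```

Calling `compareOps` with a range of counts gives one fit per operation. If the timing is a clean linear function of the count, `maxResidual` stays small; a point that lands well off the line (for instance by something like 16 ms) would show up as a large `maxResidual` for that operation.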