Webdev:Meetings:2009-09-01
From MozillaWiki
Open Items
- AMO team is seeking ideas: We generate CSVs from our statistics data for add-on authors. There are 3 date groupings and 8 different ways to plot the data. The problem is, the historical data continues to grow and we're running out of memory building these huge CSVs on the fly. Ideas:
- When the uninstall survey started dying on CSVs we used a cron job to build them and cache them to disk. If we do this for all our add-ons, that's well over 200,000 files and growing. Perhaps we can combine this with one of the other ideas.
- Provide less historical data. Right now it goes all the way back. Restricting that is weak sauce.
- Reduce the number of groupings/plots. What if we just provided CSVs for daily downloads with a couple of sets of columns? That's only ~15,000 files per set of columns. Still a lot.
- 1 row = 1 day, right? What if, beyond $x weeks of history, we only offered aggregated totals? E.g. data older than 6 months becomes 1 row = 1 week or 1 month.
- Pre-generate CSVs only for add-ons with more than $x weeks of history. Eventually every add-on crosses that threshold, though, and we're back to the first idea (a cached file per add-on).
- Write something way lighter weight to build CSVs on the fly. We can't scale this way forever though.
- Limit the number of rows returned, but provide paging params to view older ranges of data.
- +1 --wenzel
- Output the CSV as it is generated and bypass Cake views, thus avoiding the need to build huge arrays of data in memory.
- Each add-on ID gets its own set of tables, stats.addonid.*, and some data is only offered for the last year:
- *.downloads: date, version, n° of downloads
- *.usage_total: date, sum of update pings
- *.usage_apps: date, app, update pings
- *.usage_ly_versions: date, version, update pings (only for last year)
- *.usage_ly_apps_and_versions: date, version, app, appversion, update pings, userEnabled pings, incompatible pings (only for last year; is there a need for needsDependencies or blocklisted?)
- *.usage_ly_os: date, app, os, update pings (only for last year)
- Maybe a service to mail the developer the CSV data once per week/month
- Can metrics do this for us?
- Why get all the data in memory at once? We could have a little service that builds the CSV and streams it out to disk. Let that stay cached for however long is appropriate.
- add more ideas! thx
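To make the streaming ideas above a bit more concrete, here is a minimal sketch in Python (not the Cake/PHP code AMO actually runs). It writes each row to the output as soon as it is read, so memory use does not grow with the length of the history. The names `fetch_daily_downloads` and `stream_csv`, the column names, and the sample numbers are all hypothetical; the fetch function just stands in for an unbuffered database cursor.

```python
import csv
import io
from datetime import date, timedelta
from typing import Iterable, Iterator, Tuple

Row = Tuple[date, int]


def fetch_daily_downloads(start: date, days: int) -> Iterator[Row]:
    """Hypothetical stand-in for a database cursor: yields one row at a time."""
    for offset in range(days):
        yield start + timedelta(days=offset), 100 + offset


def stream_csv(rows: Iterable[Row], out) -> int:
    """Write rows to `out` one at a time; return the number of data rows written."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "downloads"])  # header row
    count = 0
    for day, downloads in rows:
        # Each row is written immediately; no full result array is ever built.
        writer.writerow([day.isoformat(), downloads])
        count += 1
    return count


def demo() -> str:
    """Stream three sample rows into an in-memory buffer and return the CSV text."""
    buffer = io.StringIO()
    stream_csv(fetch_daily_downloads(date(2009, 9, 1), 3), buffer)
    return buffer.getvalue()
```

The same `stream_csv` call works with an open file handle instead of the in-memory buffer, which would cover the "stream it out to disk and let it stay cached" variant; how long the cached file lives is a separate decision.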