Bugzilla Anthropology/2013-01-29
ElasticSearch Summary
ElasticSearch (ES) is a fast and scalable document store. Each Bugzilla bug is extracted, stripped of its comments and title, and inserted into ES as a series of JSON documents, each representing a point in the bug's history. This is done for all bugs, including security bugs.
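As an illustration of the "series of JSON documents" idea, here is a minimal sketch in Python. The field names (`bug_id`, `valid_from`, `valid_to`), the change-log layout and the `snapshots` helper are assumptions made for this example; they are not the actual ES schema or ETL code.

```python
import json

def snapshots(bug_id, created, initial, changes):
    """Turn one bug's change history into one document per point in its history.

    `changes` is a list of (timestamp, field, new_value) tuples sorted by time.
    All field names are hypothetical, chosen only for illustration.
    """
    docs = []
    state = dict(initial)
    start = created
    for ts, field, value in changes:
        # Close the current snapshot at the moment the bug changes
        docs.append({"bug_id": bug_id, "valid_from": start, "valid_to": ts, **state})
        state[field] = value
        start = ts
    # The latest snapshot is still current, so it has no end time
    docs.append({"bug_id": bug_id, "valid_from": start, "valid_to": None, **state})
    return docs

docs = snapshots(
    bug_id=1,
    created=100,
    initial={"status": "NEW", "product": "Core"},
    changes=[(200, "status", "ASSIGNED"), (350, "status", "RESOLVED")],
)
for d in docs:
    print(json.dumps(d))
```

Because each snapshot stands alone, a question such as "what was open at time T?" becomes a filter on `valid_from` and `valid_to` rather than a replay of the change log.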
Publicizing ES Data
Currently we are going through a security review to identify the changes required to publicize the ES data. There are two main reasons to do this:
- Academic Interest in Mozilla's rich bug repository
- Olga Baysal – University of Waterloo - Specifically looking at review times and how the rapid release cycle has changed them: [1]
- Sean O'Riordain – PhD into the statistics of bugs in software at Trinity College, Dublin, Ireland
- David Eaves – Has interest in what motivates/discourages volunteer contributions at Mozilla: [2]
- Working Conference on Mining Software Repositories (MSR) – May focus on the Mozilla codebase in 2014 [3]
- Increasing Mindshare - We assume the positive effect the BZ REST interface had on the number of dashboard tools can be amplified further given the speed of ES:
- David Bosewell maintains a [page of dashboards]
- [B2G Dashboard]
- [Firefox OS]
Current Work
Focus has been on dashboards for long-term planning: identifying trends that may need action, and measuring the results of those actions over time. These are meant for management and senior management, whose responsibilities span a long time scale.
- Please Note - None of the dashboards is project-specific: they are generalized so they can be cut by Product, Program and Team.
- Weekly updates are on [Kyle Lahnakoski's people page]
Review Queues
Initial work focused on summarizing the review queues. The majority of the work was overcoming the technical limitations of ES.
- Are reviews being done fast enough?
- Is my team doing reviews fast?
- Why are some reviews slow?
- Do reviewers have requester bias? Are certain individuals' outgoing reviews dealt with slowly?
- Are community reviews being done more slowly than MoCo?
Open Bug Counts
Counting open bugs by program/product/component and team. Again, significant work was required to keep it fast.
- What is the volume of work?
- How is work volume changing over time?
- Why are some bugs taking a long time to resolve?
- Is a project tracking to expected burndown?
Percentile Ages
Long-term Programs can benefit from looking at the percentiles of bug age over time. We can see the positive effects of focusing on security, and the negative effects of de-emphasizing Snappy.
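A minimal sketch of how percentile ages could be computed, in Python. The input layout (pairs of opened/closed day numbers), the nearest-rank percentile method and the function name `age_percentiles` are assumptions for illustration; the dashboards themselves may compute this differently.

```python
def age_percentiles(bugs, as_of, points=(50, 90)):
    """Percentile ages (in days) of the bugs that were open on day `as_of`.

    `bugs` is a list of (opened_day, closed_day_or_None) pairs (hypothetical layout).
    Uses the nearest-rank method: the p-th percentile is the value at rank ceil(p*n/100).
    """
    ages = sorted(
        as_of - opened
        for opened, closed in bugs
        if opened <= as_of and (closed is None or closed > as_of)
    )
    if not ages:
        return {}
    n = len(ages)
    result = {}
    for p in points:
        rank = max(1, (p * n + 99) // 100)  # integer ceiling of p*n/100
        result[p] = ages[rank - 1]
    return result

bugs = [(0, 10), (0, None), (5, 30), (20, None), (25, 28)]
# Evaluate at several dates to see how the age distribution shifts over time
for day in (10, 30):
    print(day, age_percentiles(bugs, day))
```

Plotting these values for a series of dates gives one line per percentile; a rising 90th percentile suggests that old bugs are being left behind.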
Operational Dashboards
There was little work done on daily operational dashboards:
- There are no known challenges implementing them on ES instead of BZ
- Operational dashboards already exist in various forms throughout Mozilla
- ES's superior response times improve the code/debug cycle
- ES's superior response times eliminate the need for client-side caching
One was made for B2G, and extended to all programs anyway:
Future Work
Project Dashboards
- Burndown Prediction - Use close rate data and expected scope to provide optimistic and pessimistic project completion dates.
- Metro
- Track Velocity
- Ratio of Velocity to Estimates
- Per iteration
- Rollup Work -> Story -> Epic
- Track Defects
- Track Carryover
- Track Unplanned Scope
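The burndown prediction idea above could be sketched as follows in Python. This is only one possible approach, under stated assumptions: the optimistic date uses the fastest recently observed weekly close rate and the pessimistic date uses the slowest. The function name `burndown_dates` and the sample numbers are hypothetical.

```python
import math
from datetime import date, timedelta

def burndown_dates(open_bugs, weekly_closed, today):
    """Return (optimistic, pessimistic) completion dates.

    open_bugs     -- remaining expected scope (number of bugs)
    weekly_closed -- recent history of bugs closed per week
    A date is None when the corresponding close rate is zero or negative.
    """
    def finish(rate):
        if rate <= 0:
            return None  # at this rate the project never completes
        weeks = math.ceil(open_bugs / rate)
        return today + timedelta(weeks=weeks)

    # Assumption: fastest week bounds the optimistic case, slowest the pessimistic
    return finish(max(weekly_closed)), finish(min(weekly_closed))

today = date(2013, 1, 29)
optimistic, pessimistic = burndown_dates(120, [10, 15, 20, 12], today)
print("optimistic:", optimistic)
print("pessimistic:", pessimistic)
```

Using the extremes of recent history is deliberately crude; percentiles of the weekly rate, or a rate net of newly filed bugs, would be natural refinements.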
Triage
I have not begun to look into the triage process, how it affects bug queues, or how to track "progress" over time.
Release Engineering
Chart the time to identify a tracking bug, the time to confirm, and the time to resolve, for trunk and for each release.
Tracking Regressions
Matching regression bugs back to their regressors.
More Data Sources
- Git – Diffuse nature of Mozilla
- Mercurial – Relating to size of patch, relating patches to each other
- Comment Metadata – Showing activity and liveliness of a bug
- PTO – Accounting for paid time off, plus statutory holidays and weekends, can reduce variance significantly
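To illustrate the PTO point, elapsed time can be measured in working days rather than calendar days. The sketch below is a minimal Python example; the `working_days` helper and the idea of passing days off as a set of dates are assumptions for illustration, not an existing data source.

```python
from datetime import date, timedelta

def working_days(start, end, days_off=frozenset()):
    """Count working days in the half-open range [start, end).

    Skips Saturdays, Sundays, and any date in `days_off`
    (statutory holidays or a reviewer's PTO, supplied by the caller).
    """
    count = 0
    day = start
    while day < end:
        if day.weekday() < 5 and day not in days_off:
            count += 1
        day += timedelta(days=1)
    return count

requested = date(2013, 1, 25)  # a Friday
completed = date(2013, 1, 29)  # the following Tuesday
print((completed - requested).days)        # calendar days
print(working_days(requested, completed))  # working days
```

A review requested on Friday and finished on Tuesday looks twice as slow in calendar days as in working days, which is the kind of variance this adjustment removes.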
Parallel Efforts
There is a distinct need for more tools to better manage the large number of issues BZ deals with. This can be seen in a number of small projects scattered throughout Mozilla:
- B-Team – Working directly on BZ to improve its dashboards
- David Bosewell – interest from the community perspective: needs to measure the effectiveness of the community programs
- Liz Henry – Looking into bug triage practices [Bugmasters!]
- Marco Mucci – Currently using [Scrumbugs] to track Metro, but needs better tools
- UX Team – Has settled on BZ for tracking bugs, but needs tools to manage their work
- Release Engineering – Has hired an intern to produce operational dashboards: to sort tracking bugs by component/priority and assignee.
ElasticSearch (Technical Summary)
Despite ES's technical limitations, the current JavaScript libraries give us both fast and expressive dashboarding capability: scanning 7 million documents in sub-second time.
- ES Highlights
- Reduce BZ Load - queries can be moved to ES
- Fast - Automatically indexed, and in memory.
- Scalable - Sharding assumes every document can stand alone
- Extensible - MVEL scripting language allows arbitrary code to be run on the server-side
- Limited Filtering - ES was designed for document search. BI queries require complicated filtering rules across multiple relations. ES' nested filters do not compare.
- Limited Grouping - ES is designed only for simple grouping and document counting
- Enhancements To Date
- JavaScript library to convert SQL-like queries to ES/MVEL queries
- JavaScript DB implementation to perform the joins and sophisticated calculations on the client
- Remaining Issues
- Poor Stability - Need more human resources to identify problems and tune the existing ES cluster
- Cluster Too Small - The hardware is 5 years old, and was never set up to run production queries
- Not Centralized - Other projects are using their own ES instances; spreading the work over one large cluster would tame the relative usage peaks.