Wormhole
Intro
Project Wormhole (previously known as Blackhole) was a collaborative effort to develop and maintain infrastructure for gathering and serving raw contribution data within Mozilla.
Over many years of contributions to Mozilla, we have not been able to get a holistic view of contribution data, one that would support contribution analysis and metrics and let us de-duplicate our contribution systems (Bugzilla, SUMO, AMO, MDN, hg, Reps, etc.).
Project Wormhole integrates with those systems, collects contribution data from all the different functional areas, and stores it as raw data, which in turn becomes available for querying.
Overview
Wormhole data architecture and flow can be outlined in the following diagram:
In brief, there's a central database into which unrelated systems push data describing contributions. This data can be queried very broadly (e.g. show me contributions by a certain user; show me contributions made during a certain period; show me a certain type of contribution), as well as very deeply (e.g. show me Bugzilla triage contributions for the Firefox product). Every stored record shares a small set of common fields describing where it came from, who contributed, when they did it, and how to find the original contribution. Any further information varies per type of contribution, so Wormhole users can query specifically for it if they're interested.
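To make the broad/deep distinction concrete, here is a minimal in-memory sketch of a contribution record and a filter over it. It is not the Wormhole schema (see Wormhole/Schema for that): the field names `contributor`, `when` and `link`, the `extra` keys `type` and `product`, and the `query` helper are all hypothetical, chosen only to mirror the four common properties described above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class Contribution:
    """One raw contribution; every field name except `source` is hypothetical."""
    source: str        # where it came from, e.g. "bugzilla" or "hg"
    contributor: str   # who contributed
    when: datetime     # when they did it
    link: str          # how to find the original contribution
    extra: Dict[str, Any] = field(default_factory=dict)  # per-type details


def query(records: Iterable[Contribution],
          contributor: Optional[str] = None,
          source: Optional[str] = None,
          since: Optional[datetime] = None,
          until: Optional[datetime] = None,
          **extra: Any) -> List[Contribution]:
    """Filter on the common fields (broad) and on type-specific keys (deep)."""
    results = []
    for record in records:
        if contributor is not None and record.contributor != contributor:
            continue
        if source is not None and record.source != source:
            continue
        if since is not None and record.when < since:
            continue
        if until is not None and record.when > until:
            continue
        # Deep filters: every requested extra key must be present and equal.
        if any(record.extra.get(key) != value for key, value in extra.items()):
            continue
        results.append(record)
    return results


# Made-up sample data, for illustration only.
SAMPLE = [
    Contribution("bugzilla", "alice@example.com", datetime(2013, 5, 1),
                 "https://example.com/bug/1",
                 {"type": "triage", "product": "Firefox"}),
    Contribution("bugzilla", "bob@example.com", datetime(2013, 6, 1),
                 "https://example.com/bug/2",
                 {"type": "patch", "product": "Firefox"}),
    Contribution("hg", "alice@example.com", datetime(2013, 7, 1),
                 "https://example.com/rev/abc"),
]
```

A broad query would be `query(SAMPLE, contributor="alice@example.com")`, while `query(SAMPLE, source="bugzilla", type="triage", product="Firefox")` is a deep one that reaches into the type-specific data.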
In depth, the preferred push infrastructure is built on pulse. A system accepting user contributions sends a pulse message when a contribution is made, with as much detail as is required to give the contribution context. A Wormhole pulse consumer observes the message and stores the contribution data in the database with very little processing. A frontend for the database allows complex queries to be executed via a RESTful web interface, so consumers can build tools that integrate with Wormhole by performing on-demand lookups.
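The "very little processing" step of the consumer could look roughly like the sketch below. It deliberately leaves out the pulse connection itself and uses a plain Python list in place of the database. The JSON message shape, the required keys (`source`, `contributor`, `link`), the `received` timestamp and the function name `store_contribution` are assumptions made for illustration, not the actual Wormhole message format.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List

# Hypothetical keys that every message is assumed to carry.
REQUIRED_KEYS = ("source", "contributor", "link")


def store_contribution(raw_message: str,
                       database: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Turn one message body into a stored record with minimal processing."""
    payload = json.loads(raw_message)

    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        raise ValueError("message is missing required keys: %s" % ", ".join(missing))

    # Lift the common fields to the top level of the record.
    record = {key: payload.pop(key) for key in REQUIRED_KEYS}
    # Stamp the record with the time the consumer saw the message (UTC).
    record["received"] = datetime.now(timezone.utc).isoformat()
    # Everything else is kept untouched as type-specific context.
    record["extra"] = payload

    database.append(record)  # stand-in for the real database insert
    return record
```

Keeping the consumer this thin means a new data source only has to agree on the handful of common keys; whatever else it sends is stored as-is and stays queryable later.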
Roadmap
- Create a prototype to determine viability
- Set up database instance (done)
- Create pushers for several high-volume data sources (source code):
  - Bugzilla (live update done, no historical data)
  - mozilla-central (historical data done, no live update)
  - SUMO (live update prototyped, no historical data)
  - MDN (not started)
  - Get Involved mailing list (done; monthly batch update)
  - GitHub (not started)
- Create prototypes of useful consumers:
- Analyze database space requirements and frontend responsiveness, in view of consumer requirements
- Assuming viability, find a permanent home for Wormhole and associated infrastructure
- Build pushing infrastructure for remaining automated systems
- Build tools to help less-automated teams/areas/communities hook in
- Publicize
Documentation
Database schema: Wormhole/Schema
Play around with the data live (wget/curl required for JSON output; default XML results often break in browsers): contributions and entrypoints
Sample queries:
Contributions by jdm
hg contributions
newcomers interested in coding
With curl:
curl -G -d "where=extra.values.assigned_to=='josh@joshmatthews.net'" -i http://tranquil-plateau-4519.herokuapp.com/contributions/
curl -G -d 'where={"volunteer":true,"source":"hg"}' -i http://tranquil-plateau-4519.herokuapp.com/contributions/
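For tools that would rather build these queries programmatically, the same two lookups can be expressed as URLs from Python's standard library. This is only a sketch: `build_query_url` is a hypothetical helper name, and the code constructs the URLs without contacting the demo host, which may not be reachable. Unlike `curl -G -d`, `urlencode` percent-encodes the `where` value.

```python
import json
from typing import Union
from urllib.parse import urlencode

# Demo frontend endpoint taken from the curl examples above.
BASE_URL = "http://tranquil-plateau-4519.herokuapp.com/contributions/"


def build_query_url(where: Union[str, dict], base_url: str = BASE_URL) -> str:
    """Return the GET URL for a `where` filter.

    `where` may be an expression string (as in the first curl example)
    or a dict, which is serialized to JSON (as in the second).
    """
    if isinstance(where, dict):
        where = json.dumps(where)
    return base_url + "?" + urlencode({"where": where})


# The two curl queries above, expressed as URLs.
expression_url = build_query_url(
    "extra.values.assigned_to=='josh@joshmatthews.net'")
filter_url = build_query_url({"volunteer": True, "source": "hg"})
```

Either URL can then be handed to any HTTP client, for example `urllib.request.urlopen`.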
Read the EVE demo readme and documentation for more information about how to use the database frontend.
See this SUMO pull request for an example of how easy pulse integration can be, and this commit for an example of the simple pulse message types created for SUMO.
People
Contact
Join the blackhole-dev mailing list, or drop by on IRC (#blackhole, irc.mozilla.org) to say hello, or hack with us!