CloudServices/Sync/FxSync/Syncorro
Socorro + Sync = Syncorro \o/
People
- Client engineering: Marina Samuel, Philipp von Weitershausen
- Server engineering: XXX
- Metrics: Daniel Einspanjer, Xavier Stevens
- Product: Jennifer Arguello
Goals
- Gather statistics on errors (to help with prioritization)
- Be able to correlate errors with maintenance windows, user profiles, etc.
- Simplify error reporting for users who file bugs or SUMO articles
- Detect the "long tail" of problems that are never filed
Features
- Each submitted report should be represented by a URL or at least an opaque token (e.g. UUID)
- Ability to query by application-, Sync-, and error-specific metadata
- Full-text search over submitted log data
- Ability to return instructions to client upon report submission (e.g. throttling, recovery, support messages for the user, etc.)
Roadmap
- Discuss goals and features with metrics (DONE)
- Discuss UI mockups with UX
- Add ability to upload Syncorro data to ElasticSearch (see bug 673318)
- Build add-on for the Services Beta Channel
Client UX
Submitting an error report
- When there's a Sync error, the usual error bar is shown (except we're not showing the dreadful Unknown Error message):
-------------------------------------------------------------------------
| We're sorry, Sync encountered a problem              [Details ...] (X)|
-------------------------------------------------------------------------
- Clicking on the Details button dismisses the bar and brings up a tab with a high-level explanation of the details of the error:
------------------------------------------------------------------------
| Details for Sync problem on Tuesday, May 1, 2011 5:59 pm             |
| ========================================================             |
|                                                                      |
| There was a problem saving the "BBC News - World" bookmark to your   |
| computer. Other data is not affected.                                |
|                                                                      |
| To help Mozilla improve Sync and prevent errors like this in the     |
| future, please submit this report. Your personal data will not be    |
| submitted.                                                           |
|                                                                      |
| [X] Automatically submit reports in the future.                      |
|                                                                      |
| [Submit report]                                                      |
|                                                                      |
| > Full report                                                        |
|                                                                      |
------------------------------------------------------------------------
- Pressing the Submit report button will submit the report. Once the report is submitted, a link to the report on the server is displayed:
------------------------------------------------------------------------
| Details for Sync problem on Tuesday, May 1, 2011 5:59 pm             |
| ========================================================             |
|                                                                      |
| There was a problem saving the "BBC News - World" bookmark to your   |
| computer. Other data is not affected.                                |
|                                                                      |
| Firefox submitted a report of the problem to Mozilla.                |
|                                                                      |
| [X] Automatically submit reports in the future.                      |
|                                                                      |
| > Full report                                                        |
|                                                                      |
------------------------------------------------------------------------
- If the Syncorro server finds a suitable support page, the details page will link to it:
------------------------------------------------------------------------
| Details for Sync problem on Tuesday, May 1, 2011 5:59 pm             |
| ========================================================             |
|                                                                      |
| There was a problem saving the "BBC News - World" bookmark to your   |
| computer. Other data is not affected.                                |
|                                                                      |
| Good news! Firefox submitted a report of the problem to Mozilla and  |
| a possible solution was found. _View_support_page_                   |
|                                                                      |
| [X] Automatically submit reports in the future.                      |
|                                                                      |
| > Full report                                                        |
|                                                                      |
------------------------------------------------------------------------
- Clicking the arrow next to Full report shows all the information that will be or was submitted. Since there's typically a lot of it, it is divided into separate collapsible sections:
------------------------------------------------------------------------
| Details for Sync problem on Tuesday, May 1, 2011 5:59 pm             |
| ========================================================             |
|                                                                      |
| There was a problem saving the "BBC News - World" bookmark to your   |
| computer. Other data is not affected.                                |
|                                                                      |
| Firefox submitted a report of the problem to Mozilla.                |
|                                                                      |
| [X] Automatically submit reports in the future.                      |
|                                                                      |
| \/ Full report                                                       |
|                                                                      |
|    Report ID: {UUID} [Copy to clipboard]                             |
|                                                                      |
|    > Application details                                             |
|    > Sync account info                                               |
|    > Error fingerprint                                               |
|    > Log                                                             |
|                                                                      |
------------------------------------------------------------------------
Looking up error reports
- Basically make about:sync-log look like about:crashes, linking to the details pages as described in the previous section.
Client Implementation
Note: This is only a draft that is being fleshed out.
- Uses the Metrics team's ElasticSearch system (also used for AMO stats and Socorro) at data.mozilla.org
- On error, Sync POSTs a payload to data.mozilla.org:
POST /XXX HTTP/1.1
Content-Type: application/json

{
  id: "{UUID}",
  app: {
    product: "{UUID}",
    version: "8.0a1",
    buildID: "...",
    locale: "en_US",
    addons: ["{UUID}", "{UUID}", "{UUID}", ...]
  },
  sync: {
    version: "1.10",
    account: "eisklclxuauemrjghidis",
    cluster: "https://phx-sync091.services.mozilla.com/",
    engines: ["bookmarks", "history", ...],
    numClients: 2,
    mobileClients: true
  },
  error: {
    localTimestamp: 13294938593,
    engine: "bookmarks",
    result: 489294595,  // the error constant if applicable
  },
  log: "..."
}
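As a rough illustration of the payload shape above, the following Python sketch assembles a report as a dictionary and serializes it to JSON. It is not Sync client code: the helper name `build_report` and its parameters are hypothetical, the field names are taken from the example payload, the values are placeholders, and the use of seconds for `localTimestamp` is an assumption.

```python
import json
import time
import uuid


def build_report(app, sync, engine, result, log):
    """Assemble a Syncorro-style report (hypothetical helper, not Sync code)."""
    return {
        "id": str(uuid.uuid4()),  # opaque token identifying this report
        "app": app,
        "sync": sync,
        "error": {
            # Assumption: seconds since the epoch; the draft does not fix a unit.
            "localTimestamp": int(time.time()),
            "engine": engine,
            "result": result,  # the error constant, if applicable
        },
        "log": log,
    }


report = build_report(
    app={"version": "8.0a1", "locale": "en_US", "addons": []},
    sync={"version": "1.10", "engines": ["bookmarks", "history"], "numClients": 2},
    engine="bookmarks",
    result=489294595,
    log="...",
)
body = json.dumps(report)  # this string would be sent as the POST body
```

Generating the ID on the client means the report can be referenced (for example in a bug) even before the upload succeeds.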
- Under normal conditions, the server returns HTTP 200 OK with optional hints for the client concerning throttling and help for the user:
HTTP/1.1 200 OK
Content-Type: application/json

{
  reportURL: "http://data.mozilla.org/syncorro/{UUID}",
  throttle: 10,  // only submit every 10th error
  infoURL: "http://support.mozilla.com/..."  // optional support page
}
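One possible way for a client to honor the `throttle` hint is sketched below in Python. The class `ReportThrottle` is hypothetical, and the choice to skip the first N-1 errors and submit the Nth is an assumption; the draft only says "only submit every 10th error".

```python
class ReportThrottle:
    """Submit only every Nth error, where N is the server's `throttle` hint."""

    def __init__(self):
        self.every = 1  # submit every error until the server says otherwise
        self.seen = 0   # errors seen since the last submission

    def update(self, response):
        # `throttle` is optional in the response; keep the old value if absent.
        self.every = max(1, int(response.get("throttle", self.every)))

    def should_submit(self):
        self.seen += 1
        if self.seen >= self.every:
            self.seen = 0
            return True
        return False


throttle = ReportThrottle()
throttle.update({"reportURL": "http://data.mozilla.org/syncorro/{UUID}", "throttle": 10})
decisions = [throttle.should_submit() for _ in range(20)]
# With throttle = 10, only the 10th and 20th errors are submitted.
```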
- The server can also return other status codes to indicate that the data wasn't accepted:
  - 500 Server Error
  - XXX throttled, try again later
  - XXX invalid data
- If the client fails to upload the report (e.g. because of network connectivity problems or similar), it will retry periodically using a backoff strategy. After some number of failures, the upload is marked as permanently failed and no further retries are attempted.
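The retry behavior described above could look roughly like the following Python sketch. The draft does not say which backoff strategy is used, so exponential backoff is an assumption here, as are all the numbers (base delay, factor, cap, attempt limit) and the function names. The `send` and `sleep` callables are injected so the logic can be exercised without a network or real waiting.

```python
def backoff_delays(base=60, factor=2, max_attempts=5, cap=3600):
    """Yield the wait in seconds before each retry; all numbers are placeholders."""
    delay = base
    for _ in range(max_attempts):
        yield min(delay, cap)
        delay *= factor


def upload_with_retry(send, sleep, **kwargs):
    """Call send() once, then retry after each delay; give up afterwards."""
    if send():
        return True
    for delay in backoff_delays(**kwargs):
        sleep(delay)
        if send():
            return True
    return False  # upload failed permanently; no further retries


# Usage with a fake uploader that succeeds on its third attempt.
attempts = []
waits = []


def flaky_send():
    attempts.append(1)
    return len(attempts) >= 3


ok = upload_with_retry(flaky_send, waits.append)
# ok is True and waits == [60, 120]
```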
Dashboard implementation
- Graph of number of reports over time (potentially being able to split by certain metadata, e.g. product version, Sync node, etc.)
- Query by metadata
- Full-text search over logs
- Define SUMO pages for percolator matches
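The first dashboard item, counting reports over time and splitting by a metadata field, can be illustrated with a small in-memory aggregation in Python. This is only a stand-in for whatever query the ElasticSearch backend would run; the function `count_reports` is hypothetical, the sample reports are made up, and treating `localTimestamp` as seconds since the epoch is an assumption.

```python
from collections import Counter
from datetime import datetime, timezone


def count_reports(reports, split_by="version"):
    """Count reports per (UTC day, app[split_by]) pair."""
    counts = Counter()
    for r in reports:
        ts = r["error"]["localTimestamp"]  # assumed to be seconds since epoch
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        counts[(day.isoformat(), r["app"].get(split_by))] += 1
    return counts


# Made-up sample data in the shape of the report payload.
sample = [
    {"app": {"version": "8.0a1"}, "error": {"localTimestamp": 1304211600}},
    {"app": {"version": "8.0a1"}, "error": {"localTimestamp": 1304215200}},
    {"app": {"version": "7.0"}, "error": {"localTimestamp": 1304300000}},
]
counts = count_reports(sample)
```

Each key in `counts` is one point on the graph: a day on the x-axis and one series per distinct value of the chosen metadata field.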
TODO details (talk to ddash, jbalogh)
Questions
- Reports will probably have to be non-public for now, though it would be nice if users could view their own submitted reports... can we do some sort of token-based auth there?
- Will this service require ToS changes?
- What do we do about users of custom servers?
- What do we do when the user has Trace logging enabled?
Discussion
Tentatively identified as not in scope for v1
- Ops paging/integration for events. A large spike in failures could be either a new client error or a server or operational issue, and that's info that we might want to leverage. Best to leave this until we know what we're doing.