Privacy/Reviews/TogetherJS
Document Overview
Feature/Product: | TogetherJS |
Projected Feature Freeze Date: | (tbd) |
Product Champions: | Ian Bicking, Aaron Druck, Simon Wex |
Privacy Champions: | Curtis Koenig |
Security Contact: | Mark Goodwin |
Document State: | [DONE] Public Comments |
Timeline:
Architectural Overview: | 2013.10.09 |
Recommendation Meeting: | 2013.10.09 |
Review Complete ETA: | 2013.10.17 |
Architecture
This section describes the product's architecture. It identifies the individual components or actors, the data each one stores (its "knowledge"), and the data flows between components and external entities.
The main objective of this feature is to enable real-time collaboration on websites.
Design Documents: technology overview
Components
Client
The client is a JavaScript library that site owners include on their website, either by linking directly to https://togetherjs.com/togetherjs-min.js or by hosting their own copy of the client. The client is generally activated by the user explicitly clicking a button, though the site owner can invoke it any way they want (for instance, our examples auto-start the client). The session is always accompanied by a panel of widgets on the side. The session persists across multiple page loads, as long as the user stays on the same site and in the same tab.
The client lets the user enter a name and choose an avatar. It sends information about any form elements (so form values can be synced across everyone in the session) and about every URL on the site that the user visits.
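To make concrete which data leaves the browser, here is a minimal TypeScript sketch of the two kinds of outgoing messages described above. The message type names `form-update` and `url-change`, their fields, and the helper functions are hypothetical; this review documents only the `hello` message, so the real wire format may differ.

```typescript
// Hypothetical message shapes for data the client sends to the hub.
// Type names and fields are illustrative assumptions, not the real wire format.
interface FormUpdateMessage {
  type: "form-update"; // hypothetical type name
  element: string;     // selector identifying the form element
  value: string;       // current value, shared with every session member
}

interface UrlChangeMessage {
  type: "url-change";  // hypothetical type name
  url: string;         // full URL the user is now viewing on the site
}

type ClientMessage = FormUpdateMessage | UrlChangeMessage;

// Build the message sent whenever a form field changes.
function formUpdate(element: string, value: string): FormUpdateMessage {
  return { type: "form-update", element, value };
}

// Build the message sent whenever the user visits a new page on the site.
function urlChange(url: string): UrlChangeMessage {
  return { type: "url-change", url };
}

// Messages are serialized as JSON text before going over the WebSocket.
function serialize(msg: ClientMessage): string {
  return JSON.stringify(msg);
}
```

Whatever the exact format, the `value` field carries whatever the user typed, which is the source of the Data Leakage risk discussed later in this review.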
Stored Data:
What | Where |
---|---|
Name and avatar | localStorage and sessionStorage |
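The sketch below shows one plausible way a client could keep the name and avatar in both stores. The storage key, the helper names, and the choice to prefer the sessionStorage copy are assumptions for illustration. sessionStorage is scoped to a single tab, which fits the session lasting across page loads only within the same tab; localStorage persists across visits. An in-memory stand-in replaces the browser stores so the sketch runs anywhere.

```typescript
// Minimal subset of the Web Storage API used here; in a browser,
// window.localStorage and window.sessionStorage satisfy this interface.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in so the sketch runs outside a browser.
class MemoryStorage implements StorageLike {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    const v = this.data.get(key);
    return v === undefined ? null : v;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

interface Identity {
  name: string;
  avatar: string; // e.g. a data: URL
}

// Hypothetical storage key; the real client's key names may differ.
const IDENTITY_KEY = "togetherjs.identity";

// Persist the identity in both stores: localStorage survives across visits,
// sessionStorage is scoped to the current tab.
function saveIdentity(id: Identity, local: StorageLike, session: StorageLike): void {
  const json = JSON.stringify(id);
  local.setItem(IDENTITY_KEY, json);
  session.setItem(IDENTITY_KEY, json);
}

// Assumed policy: prefer the tab-scoped copy, fall back to the long-lived one.
function loadIdentity(local: StorageLike, session: StorageLike): Identity | null {
  const raw = session.getItem(IDENTITY_KEY) ?? local.getItem(IDENTITY_KEY);
  return raw === null ? null : (JSON.parse(raw) as Identity);
}
```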
Question: what about the Google Analytics Metrics API? We could potentially use it (but aren't); opinions from a privacy perspective would be appreciated. (bug)
Answer: This sounds like an item for future discussion; if included, it would need to be vetted for privacy at that time.
Hub
The clients generally can't speak directly to each other, so we use a "hub": a server that accepts WebSocket connections grouped by "room" (session ID) and echoes any message it receives to everyone else in that room. The one exception is WebRTC audio, for which participants connect directly to one another. Website owners can deploy their own hub if they wish.
All people in the same session connect to the same server instance, and messages received are sent out immediately to the other clients. Messages are not saved, and all the transfer is done in memory. There is no backend server like Redis that coordinates between servers.
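The relay behaviour described above can be sketched as an in-memory map from room ID to connections. This is a minimal illustration, not the actual hub code: the `Hub` class and the `Connection` interface are hypothetical, and a real deployment would wrap actual WebSocket connections behind `send`.

```typescript
// Anything that can receive a message; a real hub would wrap a WebSocket here.
interface Connection {
  send(data: string): void;
}

// In-memory, room-based relay: nothing is persisted and there is no shared
// backend, so every member of a room must be connected to this same instance.
class Hub {
  private rooms = new Map<string, Set<Connection>>();

  join(roomId: string, conn: Connection): void {
    let members = this.rooms.get(roomId);
    if (!members) {
      members = new Set<Connection>();
      this.rooms.set(roomId, members);
    }
    members.add(conn);
  }

  leave(roomId: string, conn: Connection): void {
    const members = this.rooms.get(roomId);
    if (!members) return;
    members.delete(conn);
    // Drop empty rooms so no session state lingers in memory.
    if (members.size === 0) this.rooms.delete(roomId);
  }

  // Echo a message to everyone else in the sender's room; it is never stored.
  // Returns the number of connections the message was delivered to.
  relay(roomId: string, sender: Connection, data: string): number {
    const members = this.rooms.get(roomId);
    if (!members) return 0;
    let delivered = 0;
    members.forEach((member) => {
      if (member !== sender) {
        member.send(data);
        delivered++;
      }
    });
    return delivered;
  }
}
```

Keeping all state in one process is what makes "no backend server like Redis" possible, at the cost of requiring every participant in a session to reach the same server instance.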
The server does some logging. At the debugging log level, some message details are logged; in production we log only per-session statistics. We record the domain the session was on (or, in unusual cases, multiple domains), along with counts of pages, unique URLs, and activity. We do not record any identifying information, actual URLs, or detailed activity (only figures such as number of bytes, number of messages, and idle time). In effect, we perform the first half of the analysis in-process and output only already-aggregated data. Nothing is anonymized beyond this, because we do not consider the domain private and all other information is inherently anonymous.
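A minimal sketch of this in-process aggregation follows, assuming hypothetical class, method, and field names; it is not the real stats record format, and idle-time tracking is omitted for brevity. The point is that raw URLs exist only inside the counters and never appear in the emitted summary.

```typescript
// Per-session counters kept in memory. Raw URLs are used only to count
// distinct pages; they are never included in the summary.
class SessionStats {
  private domains = new Set<string>();
  private seenUrls = new Set<string>();
  private pageLoads = 0;
  private messages = 0;
  private bytes = 0;

  recordPage(url: string): void {
    this.pageLoads++;
    this.seenUrls.add(url);
    this.domains.add(new URL(url).hostname);
  }

  recordMessage(byteLength: number): void {
    this.messages++;
    this.bytes += byteLength;
  }

  // The only data written out: aggregate numbers plus the domain(s).
  summary(): { domains: string[]; pages: number; uniqueUrls: number; messages: number; bytes: number } {
    return {
      domains: Array.from(this.domains).sort(),
      pages: this.pageLoads,
      uniqueUrls: this.seenUrls.size,
      messages: this.messages,
      bytes: this.bytes,
    };
  }
}
```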
Stored Data:
What | Where |
---|---|
Aggregate Statistics: domain names, session times, user agent strings | S3 |
- Note: user agent strings are not yet being saved (bug)
Example of stats record being saved
Communication with Client
Direction | Message | Data | Notes |
---|---|---|---|
In/out (all communication is symmetric) | {type: "hello", name: "Me", avatar: "data:...", url: "http://example.com/page.html#location"} | Name, avatar, exact URLs, idle status | |
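The `hello` message in the table can be written as a TypeScript type, and a receiver might validate its shape before use. This is a sketch: the `isHello` guard is hypothetical, and the idle-status field is omitted because the table does not show its name. Because users are not authenticated, such a check confirms only the structure of the message, not who sent it.

```typescript
// Shape of the "hello" message shown in the communication table.
interface HelloMessage {
  type: "hello";
  name: string;   // user-chosen display name (not authenticated)
  avatar: string; // avatar image, e.g. a data: URL
  url: string;    // exact URL, including any #fragment
}

// Runtime check for untrusted input from other session members.
function isHello(value: unknown): value is HelloMessage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.type === "hello" &&
    typeof v.name === "string" &&
    typeof v.avatar === "string" &&
    typeof v.url === "string"
  );
}
```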
User Data Risk Minimization
In this section, the privacy champion will identify areas of user data risk and recommendations for minimizing the risk.
Eavesdropping
Risk: Anything passed to the hub could be monitored.
Requirement: For the Mozilla hub, communication channels to and from the hub must support HTTPS and use it by default.
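One way a client could enforce this default is to derive the WebSocket URL from the hub's base URL and refuse unencrypted schemes unless the caller explicitly opts in, for example for local development. The `hubWebSocketUrl` function and the `DEFAULT_HUB` address below are hypothetical placeholders, not TogetherJS configuration.

```typescript
// Hypothetical default hub location; not the real hub address.
const DEFAULT_HUB = "https://hub.example.org";

// Convert a hub base URL into a WebSocket URL. Secure schemes map to wss:;
// plain http:/ws: is refused unless allowInsecure is explicitly set.
function hubWebSocketUrl(base: string = DEFAULT_HUB, allowInsecure = false): string {
  const url = new URL(base);
  if (url.protocol === "https:" || url.protocol === "wss:") {
    url.protocol = "wss:";
  } else if (url.protocol === "http:" || url.protocol === "ws:") {
    if (!allowInsecure) {
      throw new Error(`Insecure hub URL refused: ${base}`);
    }
    url.protocol = "ws:";
  } else {
    throw new Error(`Unsupported hub protocol: ${url.protocol}`);
  }
  return url.toString();
}
```

Failing closed on `http:` means that a misconfigured self-hosted hub produces an error instead of silently falling back to an unencrypted channel.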
User Impersonation
Risk: Users are not authenticated and as such could impersonate another user.
Requirement: Users should be warned against disclosing sensitive information.
Data Leakage
Risk: Form fields are visible to all members of a session.
Requirement: Sites must be able to exclude individual fields, or sets of fields, from synchronization.
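The sketch below shows the kind of filter this requirement implies. The `data-no-sync` attribute, the `FieldInfo` type, and the rule that password fields are never synced are assumptions made for illustration. Field data is passed in as plain objects so the logic runs outside a browser. On a real page, a check such as `element.closest("[data-no-sync]")` could also cover the "sets of fields" case by marking a containing element.

```typescript
// Minimal description of a form field; in a browser this would be read from the DOM.
interface FieldInfo {
  name: string;
  type: string;                       // e.g. "text", "password"
  attributes: Record<string, string>; // other HTML attributes on the element
}

// Hypothetical opt-out attribute a site could add to a field.
const NO_SYNC_ATTR = "data-no-sync";

// Decide whether a field's value may be shared with the session.
// Assumed policy: passwords are never synced; sites can exclude others by name or attribute.
function shouldSync(field: FieldInfo, excludedNames: ReadonlySet<string>): boolean {
  if (field.type === "password") return false;
  if (excludedNames.has(field.name)) return false;
  return !(NO_SYNC_ATTR in field.attributes);
}
```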
Spoofing
Risk: A user can be directed to an off-site URL crafted to resemble the original URL.
Requirement: This should not be allowed; at the very least, a warning should appear when changing sites.
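A warning of this kind depends on reliably detecting a change of site. One reasonable approach, sketched here with hypothetical function names, compares URL origins (scheme, host, and port) rather than matching strings. This catches look-alike hosts such as `example.com.evil.test`. It is deliberately conservative: switching between subdomains or from HTTPS to HTTP also triggers a warning.

```typescript
// True when following `target` from `current` would leave the current site,
// i.e. the origins (scheme, host, port) differ.
function leavesSite(current: string, target: string): boolean {
  const from = new URL(current);
  const to = new URL(target, from); // resolves relative links against the current page
  return from.origin !== to.origin;
}

type NavigationDecision = "follow" | "warn";

// Decide how to react when another participant navigates to `target`.
function onPeerNavigation(current: string, target: string): NavigationDecision {
  return leavesSite(current, target) ? "warn" : "follow";
}
```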
Alignment with Privacy Operating Principles
In this section, the privacy champion will identify how the feature lines up with Mozilla's privacy operating principles.
See Also: Privacy/Roadmap_2011#Operating_Principles:
Principle: Transparency / No Surprises
(How the feature addresses this)
- When entering a new session on a site that uses TogetherJS, users receive a door-hanger notification that lets them decline to join the session.
- Users will be given appropriate warnings about disclosing personal information through the unauthenticated chat functions.
Recommendations: (what can be improved)
- see items above
Principle: Real Choice
- Users can choose not to use the feature.
Recommendations:
Principle: Sensible Defaults
- The system defaults to HTTPS (at least on Mozilla sites; for other consumers of TogetherJS we cannot enforce this, only recommend it).
Recommendations:
Principle: Limited Data
Recommendations:
Follow-up Tasks and tracking
What | Who | Bug | Details |
---|---|---|---|
[DONE] Initial Overview Discussion | Curtis Koenig, Ian Bicking, Aaron Druck, Mark Goodwin, Dan Veditz, David Chan | Github tracker bugs linked above | 2013.10.09 Security and Privacy Review |
[DONE] Public Comments closed | | | 2013.10.17 |