User:Gal/SyncDataModel
Data model
Assumptions
We assume that all changes to the local database occur in a mostly connected state. In other words, people don't change passwords or bookmarks without network connectivity, although brief network interruptions can occur. The service is assumed to be reasonably available (we need a few 9s, not many). We use push notifications to shorten the collision window where we care (passwords, bookmarks), and we tolerate collisions where we don't care.
Passwords
Each password is a doc. There are two kinds of password entries: one for forms and one for the WWW-Authenticate header. We expect dozens of passwords for most users, and hundreds in extreme cases.
Key: sha1([hostname, formSubmitURL, usernameField, passwordField].join("|")) or sha1([hostname, httpRealm].join("|"))
Value: JSON of { hostname, username, password, formSubmitURL, usernameField, passwordField } or { hostname, username, password, httpRealm }
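The key and value definitions above can be sketched as follows. This is a minimal illustration, not the actual client code: the function names are hypothetical, and the UTF-8 encoding and hex digest are assumptions, since the text only says sha1 over the "|"-joined fields.

```python
import hashlib
import json


def doc_key(*fields):
    """SHA-1 of the fields joined with '|', mirroring sha1([...].join("|")).

    Assumption: fields are encoded as UTF-8 and the digest is hex-encoded.
    """
    return hashlib.sha1("|".join(fields).encode("utf-8")).hexdigest()


def form_password_doc(hostname, username, password,
                      form_submit_url, username_field, password_field):
    """Build (key, JSON value) for a form login entry."""
    key = doc_key(hostname, form_submit_url, username_field, password_field)
    value = {
        "hostname": hostname,
        "username": username,
        "password": password,
        "formSubmitURL": form_submit_url,
        "usernameField": username_field,
        "passwordField": password_field,
    }
    return key, json.dumps(value)


def realm_password_doc(hostname, username, password, http_realm):
    """Build (key, JSON value) for a WWW-Authenticate (HTTP realm) entry."""
    key = doc_key(hostname, http_realm)
    value = {
        "hostname": hostname,
        "username": username,
        "password": password,
        "httpRealm": http_realm,
    }
    return key, json.dumps(value)
```

Note that the key as defined does not include the username, so in this sketch two logins for the same form on the same host map to the same doc.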
Conflict resolution
Last write wins. Future iterations might do merging at read time in the presence of _conflicts. Passwords are expected to change very rarely, and we immediately push updates to the server and notify other clients from there.
Initial import
Replicate from (overwrite local passwords with server values). Replicate to (add any passwords we only have locally to the server). Retry in case of a race.
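The two import steps can be illustrated with plain dictionaries standing in for the local and server databases. This is only a sketch under that assumption: real replication goes through CouchDB, and the retry on a race is not modeled here. The function name is hypothetical.

```python
def initial_password_import(local, server):
    """Return (new_local, new_server) after the initial import.

    Both arguments map doc key -> doc and stand in for real databases.
    """
    # Replicate from: server values overwrite local ones with the same key.
    new_local = dict(local)
    new_local.update(server)

    # Replicate to: passwords that only exist locally are added to the server.
    new_server = dict(server)
    for key, doc in local.items():
        if key not in server:
            new_server[key] = doc

    return new_local, new_server
```

After both steps the two sides hold the same set of docs, with the server's value winning wherever both had an entry for a key.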
Bookmarks
The entire bookmark tree is represented as one doc. The tree consists of folders ({ description, title, children }), separators ({}), and bookmarks ({ title, uri, description, loadInSidebar, tags, keyword }). We expect dozens of bookmarks, and many hundreds in the worst case.
Key: "bookmarks" for the desktop bookmark tree, "mobile" for the mobile bookmark folder
Value: JSON of tree as described above.
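As an illustration of the tree shape, the sketch below tells the three node kinds apart and counts bookmarks. It assumes that a separator is the empty object, that a folder is any node with a children list, and that everything else is a bookmark. The sample tree and helper names are made up for the example.

```python
def node_kind(node):
    """Classify a tree node by its shape (assumed convention, see above)."""
    if not node:
        return "separator"
    if "children" in node:
        return "folder"
    return "bookmark"


def count_bookmarks(node):
    """Count bookmark nodes in the subtree rooted at node."""
    kind = node_kind(node)
    if kind == "bookmark":
        return 1
    if kind == "folder":
        return sum(count_bookmarks(child) for child in node["children"])
    return 0


# Hypothetical sample document value for the "bookmarks" key.
SAMPLE_TREE = {
    "title": "Bookmarks Menu",
    "description": "",
    "children": [
        {"title": "Example", "uri": "https://example.com/", "description": "",
         "loadInSidebar": False, "tags": ["demo"], "keyword": "ex"},
        {},  # separator
        {"title": "News", "description": "", "children": [
            {"title": "Example News", "uri": "https://example.org/news",
             "description": "", "loadInSidebar": False, "tags": [], "keyword": ""},
        ]},
    ],
}
```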
Conflict resolution
Last write wins. Future iterations might do merging at read time in the presence of _conflicts. Bookmarks are expected to change rather rarely, and we immediately push updates to the server and notify other clients from there.
Initial import
Replicate from. For desktop, move all local bookmarks into a folder "Local Bookmarks" that is not replicated. The bookmarks we received from the server are now the bookmark tree. For mobile, add any bookmarks to the Mobile folder that don't already exist there. Replicate to. Retry in case of a race.
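A rough sketch of the two import paths follows. It assumes that "already exist" for mobile means a bookmark with the same uri, and it only handles a flat list of bookmarks in the Mobile folder. Both are simplifications for illustration, and the function names are hypothetical.

```python
def import_desktop(local_tree, server_tree):
    """Return (replicated_tree, local_only_folder).

    The server tree becomes the bookmark tree; everything that was local
    is parked in a "Local Bookmarks" folder that is kept out of replication.
    """
    local_only = {
        "title": "Local Bookmarks",
        "description": "",
        "children": list(local_tree["children"]),
    }
    return server_tree, local_only


def import_mobile(local_mobile, server_mobile):
    """Return the Mobile folder to replicate back to the server."""
    merged = list(server_mobile["children"])
    seen = {child["uri"] for child in merged if "uri" in child}
    for child in local_mobile["children"]:
        # Assumption: two bookmarks are the same if their uri matches.
        if "uri" in child and child["uri"] not in seen:
            merged.append(child)
            seen.add(child["uri"])
    result = dict(server_mobile)
    result["children"] = merged
    return result
```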
History
Each history entry is one doc. Long histories (thousands of URIs) are not uncommon.
Key: sha1(uri)
Value: JSON of { uri, title, visits }
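A minimal sketch of building a history doc is shown below. The shape of visits is not specified above, so a list of timestamps is assumed here purely for illustration. The UTF-8 encoding and hex digest are likewise assumptions, and the function name is hypothetical.

```python
import hashlib
import json


def history_doc(uri, title, visits):
    """Build (key, JSON value) for one history entry.

    Assumptions: the key is the hex SHA-1 of the UTF-8 encoded uri, and
    visits is a list of visit timestamps.
    """
    key = hashlib.sha1(uri.encode("utf-8")).hexdigest()
    value = {"uri": uri, "title": title, "visits": list(visits)}
    return key, json.dumps(value)
```

Because the key depends only on the uri, repeated visits to the same page update one doc rather than creating new ones.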
Conflict resolution
Last write wins. History is expected to change frequently, and we group updates with a timer to avoid paying network transaction overhead for every history update. As a result, conflicts are possible, but also fairly inconsequential.
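The timer-based grouping could look roughly like this. The class, its parameters, and the 60-second default are all hypothetical. The clock is injectable only so that the behavior can be exercised without waiting.

```python
import time


class HistoryBatcher:
    """Collect history updates and send them as one batch after a delay."""

    def __init__(self, send, delay_seconds=60.0, clock=time.monotonic):
        self._send = send            # callable taking {key: doc}
        self._delay = delay_seconds
        self._clock = clock
        self._pending = {}           # key -> latest doc for that key
        self._first_queued_at = None

    def record(self, key, doc):
        """Queue an update; a later update to the same key replaces it."""
        if not self._pending:
            self._first_queued_at = self._clock()
        self._pending[key] = doc

    def tick(self):
        """Call periodically. Flushes once the oldest pending update has
        waited at least delay_seconds. Returns True if a batch was sent."""
        if self._pending and self._clock() - self._first_queued_at >= self._delay:
            batch, self._pending = self._pending, {}
            self._send(batch)
            return True
        return False
```

Several visits to the same URI within the delay window collapse into one write, which is where the saving in network transactions comes from.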
Initial import
Replicate from. Add any entries that don't already exist on the server. Replicate to. Retry in case of a race.
Revision purging
Clients keep only the current revision of a doc in a shadow CouchDB. Since clients don't directly replicate with each other, there is no need for them to keep any more history than that. The server keeps revisions back to the one known to the client that is furthest behind in replication. This is needed in case we ever want to do any kind of merging of docs.
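One way to read the server-side rule is sketched below. It assumes the server tracks, per doc, an oldest-to-newest list of revision ids and the revision each client last replicated, and that it keeps everything from the oldest such revision onward. The data layout and function name are hypothetical.

```python
def revisions_to_keep(revisions, client_known_rev):
    """Return the revisions of one doc that the server should retain.

    revisions: list of revision ids, oldest first.
    client_known_rev: client id -> revision id that client last replicated.
    """
    if not client_known_rev:
        # No clients to wait for: only the current revision is needed.
        return revisions[-1:]
    oldest_needed = min(revisions.index(rev) for rev in client_known_rev.values())
    return revisions[oldest_needed:]
```

For example, with revisions 1-a through 4-d and the slowest client at 2-b, revision 1-a can be purged while 2-b, 3-c, and 4-d are kept.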
Notifications
Clients that have sync enabled should use our push notifications protocol to listen to server changes. For bookmark and password changes, we replicate to the server after a brief delay. For history updates, we group updates with a certain (not necessarily very short) time delay. When the server receives any changes, it will send a push notification to all clients that are currently registered and are known to be out of date. CouchDB maintains sufficient state as part of the replication protocol to find these clients. When clients start, they should also contact the server to receive updates (and register for push).
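The selection of clients to notify might be sketched as follows. It assumes the server can see, for each client, whether it is registered for push and the last sequence number it replicated, in the spirit of replication checkpoints. The field names are made up for the example.

```python
def clients_to_notify(current_seq, clients):
    """Return ids of clients that are registered and behind current_seq.

    clients: client id -> {"registered": bool, "last_seq": int}
    (hypothetical bookkeeping, not a real CouchDB structure).
    """
    return sorted(
        client_id
        for client_id, state in clients.items()
        if state["registered"] and state["last_seq"] < current_seq
    )
```

A client that is up to date, or that is not currently registered, is skipped. The latter catches up when it next starts and contacts the server.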
Optional CouchDB protocol modifications
jsondiff
The CouchDB protocol is not particularly compact on the wire. This is especially painful for bookmark updates, since we resend a lot of data that is unchanged from the last state. Instead of sending the whole document again, we should do a jsondiff and send only the diff. The server and the client both have the last _rev and can reconstruct the full version.
Once the initial version of sync is operational, we should consider a fairly simple addition to the CouchDB protocol to make it more compact on the wire. On the server side, this can be implemented by the web heads in front of the CouchDB server. For uploads, such a "/db/doc_diff" method can do a GET to the database server to fetch the last version, apply the diff that came over the network, and then do a PUT. For downloads, it can similarly GET the current version and the version known to the client and send back a diff between the two. This makes most sense as a generic extension to the protocol so that other data types can benefit from it.
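To make the idea concrete, here is a minimal diff and patch pair for nested JSON objects. It is a sketch rather than a proposed wire format: the patch layout (set, unset, nested) is invented for this example, and lists are replaced wholesale, so a real implementation for the bookmark tree would need list-aware diffing of children.

```python
def json_diff(old, new):
    """Return a patch that turns dict old into dict new."""
    patch = {"set": {}, "unset": [], "nested": {}}
    for key in old:
        if key not in new:
            patch["unset"].append(key)
    for key, value in new.items():
        if key not in old:
            patch["set"][key] = value
        elif isinstance(value, dict) and isinstance(old[key], dict):
            sub = json_diff(old[key], value)
            if sub["set"] or sub["unset"] or sub["nested"]:
                patch["nested"][key] = sub
        elif old[key] != value:
            patch["set"][key] = value
    return patch


def json_patch(old, patch):
    """Apply a patch produced by json_diff and return the new dict."""
    result = {k: v for k, v in old.items() if k not in patch["unset"]}
    result.update(patch["set"])
    for key, sub in patch["nested"].items():
        result[key] = json_patch(old[key], sub)
    return result
```

In the flow described above, the sender would compute json_diff against the last _rev both sides know, and the receiver would fetch that revision and call json_patch to rebuild the full document before storing it.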
SPDY
We should use SPDY to enable content compression.