CloudServices/SimplePushServer
Please refer to the Push Service docs.
SimplePush Server
Overview
Provide a service that allows third-party application servers to notify their web apps that an event has occurred and that action may be required, without requiring a web page to remain constantly open and connected to the third-party application server.
Project Contacts
Principal Point of Contact - Doug Turner dougt@mozilla
IRC - #push
Group Email - TBD
Goals
Provide a scalable, fast server for the SimplePush protocol as defined by https://wiki.mozilla.org/WebAPI/SimplePush.
In brief, SimplePush is a nearly data-free method of remotely waking a client application so that it can call "home" and determine what actions are needed. It addresses the power and wasted-bandwidth costs of dozens of applications each keeping a constant connection back to their servers when, most of the time, no action is required.
This will provide endpoints for both WebSocket clients and PUTs from third-party servers.
Use Cases
Use cases are defined here
Definitions
Requirements
- APP requests an ENDPOINT from the PUSH CLIENT and shall register two callback functions, one for receipt of the ENDPOINT, and a second for handling of a VERSION EVENT
- If not already present, PUSH CLIENT shall generate a unique UUID4 Identifier for the UserAgent (UAID)
- PUSH CLIENT shall generate a unique UUID4 Identifier for the APP (APPID)
- PUSH CLIENT shall send UAID, APPID and any additional information required for proprietary KICK to the PUSH SERVER
- PUSH SERVER shall create an ENDPOINT for the UAID and APPID and return it to the PUSH CLIENT.
- If a KICK driver is present, PUSH SERVER shall relay appropriate PUSH CLIENT provided information to the KICK driver.
- PUSH CLIENT tenders the ENDPOINT to APP via callback.
- APP sends ENDPOINT to the APP SERVER
- On VERSION EVENT, APP SERVER PUTs version value to ENDPOINT
- If a PUSH CLIENT is currently connected to the PUSH SERVER, the PUSH SERVER relays an UPDATE containing the currently pending VERSION EVENTS.
- If a PUSH CLIENT is NOT currently connected, an optional, proprietary KICK driver may be called to wake devices associated with the corresponding ENDPOINT UAID.
- If a PUSH SERVER is unable to immediately deliver a VERSION EVENT, the VERSION EVENT is logged to short term storage.
- PUSH CLIENT connects to the PUSH SERVER and shall identify a list of one or more UAIDs it is responsible for.
- If there are VERSION EVENTS pending for the requested UAIDs, PUSH SERVER sends an UPDATE packet (in this template, UAID, APPID and VERSION are placeholders for actual values):
{ UAID: { APPID: VERSION, ... }, ... }
- If no VERSION EVENTS are pending for the requested UAIDs, PUSH SERVER may return a status indicating that no data is available (for REST implementations) or simply return no content (for WebSocket).
- While an UPDATE is being transmitted, a PUSH SERVER may return a 503 (Service Unavailable) error to APP SERVERS for any VERSION EVENT associated with an in-progress UAID, to prevent potential race conditions.
- On receipt of UPDATE, PUSH CLIENT shall return an ACK to the PUSH SERVER.
- The ACK shall contain a list of UAIDs for which all APPIDs have been properly received.
- The PUSH SERVER shall then clear APPID version information from short term storage, and re-allow version updates for those UAIDs if currently blocked.
- The PUSH CLIENT shall then notify APPs of the VERSION EVENT using the appropriate callback, passing the VERSION.
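The server-side half of this flow (record a VERSION EVENT, send an UPDATE, clear on ACK, and refuse new versions for a UAID while its UPDATE is in flight) can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not pushgo code: the in-memory `pendingStore` type, its method names, and the use of `uint64` for VERSION are hypothetical, and a real deployment would keep this state in short-term storage such as memcached rather than in a Go map. The `newUUID4` helper shows how a UUID4 UAID or APPID can be generated with only the standard library.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"sync"
)

// newUUID4 returns a random (version 4) UUID string, such as the PUSH CLIENT
// would generate for a UAID or APPID.
func newUUID4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// pendingStore is a hypothetical in-memory stand-in for the PUSH SERVER's
// short-term storage: UAID -> APPID -> VERSION.
type pendingStore struct {
	mu      sync.Mutex
	pending map[string]map[string]uint64
	busy    map[string]bool // UAIDs with an UPDATE in flight
}

func newPendingStore() *pendingStore {
	return &pendingStore{
		pending: make(map[string]map[string]uint64),
		busy:    make(map[string]bool),
	}
}

// Put records a VERSION EVENT. It returns false while an UPDATE for the UAID
// is in progress; an HTTP front end could map that to 503.
func (s *pendingStore) Put(uaid, appid string, version uint64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.busy[uaid] {
		return false
	}
	if s.pending[uaid] == nil {
		s.pending[uaid] = make(map[string]uint64)
	}
	s.pending[uaid][appid] = version
	return true
}

// Update builds the UPDATE payload { UAID: { APPID: VERSION, ... }, ... } for
// the requested UAIDs and blocks further Puts for those UAIDs until Ack.
func (s *pendingStore) Update(uaids []string) map[string]map[string]uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make(map[string]map[string]uint64)
	for _, u := range uaids {
		apps := s.pending[u]
		if len(apps) == 0 {
			continue // nothing pending: no content for this UAID
		}
		cp := make(map[string]uint64, len(apps))
		for a, v := range apps {
			cp[a] = v
		}
		out[u] = cp
		s.busy[u] = true
	}
	return out
}

// Ack clears pending versions for the acknowledged UAIDs and re-allows Puts.
func (s *pendingStore) Ack(uaids []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, u := range uaids {
		delete(s.pending, u)
		delete(s.busy, u)
	}
}

func main() {
	uaid, _ := newUUID4()
	appid, _ := newUUID4()
	s := newPendingStore()
	s.Put(uaid, appid, 42)
	update := s.Update([]string{uaid})
	fmt.Println(update[uaid][appid])           // 42
	fmt.Println(s.Put(uaid, appid, 43))         // false: UPDATE in progress
	s.Ack([]string{uaid})
	fmt.Println(len(s.Update([]string{uaid}))) // 0: cleared by the ACK
}
```

Blocking Puts between Update and Ack is one simple way to honour the 503 rule: because nothing new can arrive for that UAID mid-UPDATE, the ACK can safely clear everything stored for it.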
NOTE: a PUSH RELAY may be created by combining the polling aspects of the PUSH CLIENT with the data management and KICK driver of the PUSH SERVER. This would allow a VERSION EVENT system to enter protected networks or use restricted means to communicate with USER AGENTs. It is important to note that once a PUSH SERVER has received an ACK for a given UAID, the PUSH SERVER is under no obligation to retain that data, and proper relay of the VERSION EVENT is the responsibility of the PUSH RELAY.
Get Involved
Call to action for folks who want to help.
Design
Points of Contact
Server Engineer - JR Conlin jrconlin@mozilla
The protocol is defined here
Platform Requirements
This system runs on Linux systems as a Go executable.
Go executables are mostly self-contained; however, the following external systems are strongly recommended:
- a memcached server cluster
- Heka logger
It should also be noted that Go's SSL implementation is surprisingly CPU-intensive as of Go 1.1.2. Because PUTs incur more setup and teardown than longer-lived WebSocket connections, we decided to use AWS ELB SSL termination to handle the secure PUTs. If peak user load is not expected to exceed roughly 100K, this may not be required.
Proprietary Ping Requirements
- GCM
- APNS
Code Repository
Previously (as referenced in other parts of this page), https://github.com/mozilla-services/pushgo/.
Currently, https://github.com/mozilla-services/autopush/ is used.
Release Schedule
1.4
- Target date: 20/11/2014
- Released: 20/11/2014
Common Changes:
- Fix for a critical etcd routing bug
- Fixes to Travis testing
- Convert to TOML configuration system
- Integrated smoke tests
- Various optimizations.
Loop Push
- Create system that does not store data
Simple Push
- No system specific changes made
1.4.2
Common Changes:
1.5
- Target Date: TBD
- Released: Unreleased
Common Changes:
- Include support for "data" to connected devices only
QA
Points of Contact
- Primary - kthiessen@
- Backup - rpappalardo@
Test Framework
There are several test frameworks in place. Most are standalone test suites so that they may be applied both to the current server and to any externally created system.
https://github.com/mozilla-services/simplepush-testpod - provides an end-to-end stress test of the system.
https://github.com/jrconlin/simplepush_test - provides a quick "smoke test" as well as a thorough API test using bad or malicious requests.
Notes
Security and Privacy
wiki page: https://wiki.mozilla.org/Security/Reviews/SimplePushSrv
Points of Contact
Review Status
Bugzilla Tracking # - https://bugzilla.mozilla.org/show_bug.cgi?id=897454
https://wiki.mozilla.org/Security/Reviews/SimplePushSrv
Issues and Resolutions
Operations
Points of Contact
The current Ops engineers are oremj@ and bwong@.
Deployment Architecture
Bugzilla Tracking # -
https://mana.mozilla.org/wiki/display/SVCOPS/SimplePush#SimplePush-Deployments
Deployment Request Template
Currently, deployments are created from Github Releases. Releases for Stable and Production must be signed off by QA before deployment.
Bugzilla Subject: Please deploy <Tag name> to <Product Platform>
Bug content:
Version: Release Title (e.g. Simple Push Server 1.4.2 Release Candidate 1)
Release URL (e.g. https://github.com/mozilla-services/pushgo/releases/tag/1.4.2rc1)
Product: What platform to deploy the target to (e.g. push-web, push-loop)
Config Changes:
+[handlers]: handlers section added.
~[default]: default section options changed.
-[propping]: propping section removed.
...
Deployment Notes: Please note any other modifications which may cause the server to fail to start.
Escalation Paths
Lifespan Support Plans
Logging and Metrics
Current logging and metrics are being filtered into the Heka system. Final logging and metrics are TBD, depending on what data needs to be detected.
Points of Contact
Tracking Element Definitions
Data Retention Plans
Dashboard URL
The following are mozilla private URLs.
- Kibana dashboard
- US-West-2 Push monitoring
- Cloudwatch URL (pending?)
Usage metrics are viewable at: https://graphite.shared.us-west-2.prod.mozaws.net/grafana/#/dashboard/file/default.json and use the following view template: https://dl.dropboxusercontent.com/u/361111/Push%20_%20SimplePush%20Stats-1421198257367
Customer Support
External Libraries & Users
Deployments
Loop
Hello (aka Loop) is a WebRTC-based video chat program that is available for desktop and mobile devices. It uses a specialized version of SimplePush that has no back-end storage, since there is no need to alert connections that are offline.
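Without back-end storage, delivery reduces to "hand the version to a live connection or drop it". A minimal Go sketch of that rule follows; `liveRouter`, its per-UAID buffered channels, and the queue size of 16 are illustrative assumptions, not the Loop server's actual design.

```go
package main

import (
	"fmt"
	"sync"
)

// liveRouter is a hypothetical storage-less router: a notification is queued
// for a connected client or dropped; nothing is kept for offline clients.
type liveRouter struct {
	mu    sync.RWMutex
	conns map[string]chan uint64 // UAID -> connection's outbound queue
}

func newLiveRouter() *liveRouter {
	return &liveRouter{conns: make(map[string]chan uint64)}
}

// Connect registers a live connection and returns its outbound queue.
func (r *liveRouter) Connect(uaid string) <-chan uint64 {
	ch := make(chan uint64, 16)
	r.mu.Lock()
	r.conns[uaid] = ch
	r.mu.Unlock()
	return ch
}

// Disconnect forgets the connection; any later versions for it are dropped.
func (r *liveRouter) Disconnect(uaid string) {
	r.mu.Lock()
	delete(r.conns, uaid)
	r.mu.Unlock()
}

// Deliver returns true if the version was queued for a connected client, and
// false if the client is offline or its queue is full (the version is dropped).
func (r *liveRouter) Deliver(uaid string, version uint64) bool {
	r.mu.RLock()
	ch, ok := r.conns[uaid]
	r.mu.RUnlock()
	if !ok {
		return false
	}
	select {
	case ch <- version:
		return true
	default:
		return false
	}
}

func main() {
	r := newLiveRouter()
	ch := r.Connect("uaid-1")
	fmt.Println(r.Deliver("uaid-1", 1)) // true
	fmt.Println(<-ch)                   // 1
	r.Disconnect("uaid-1")
	fmt.Println(r.Deliver("uaid-1", 2)) // false: offline, dropped
}
```

Channels are never closed here, so a Deliver racing with a Disconnect can at worst enqueue into an abandoned queue rather than panic.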
Configuration
All systems are deployed to AWS and are intended to autoscale within clusters. However, the clusters are not yet configured to scale; Ops is working to address this.
There are two non-production networks of systems:
- Stable
- This is a non-production environment that hosts a stable development version of SimplePush for development and integration tests. This version may be auto-updated from the "dev" branch of https://github.com/mozilla-services/pushgo.
- QA
- This is a non-production environment that hosts the stable, pre-release version of SimplePush for load and QA testing. This version updates from explicit releases generated by https://github.com/mozilla-services/pushgo.
It should be noted that effort is currently being made to ensure that the push service used by Hello is not significantly different (API-wise) from the standard Push service.
Deployment Architectures
The stable environment is currently configured for push-loop-dev.stage.mozaws.net (for long-lived socket connections) and updates-push-loop-dev.stage.mozaws.net (to receive the REST version PUTs).
While the client has the ability to retry a connection if a given machine is not responsive, it has been requested by Ops that Push Protocol Redirects be re-enabled and that a separate suite of machines be created to do connection load balancing. (so clients would first connect to the central server, and then be redirected to a machine with available resources.)
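The central server's redirect decision could be as simple as choosing the machine with the most free connection slots. The `node` type, the capacity figures, and the example.com URLs below are hypothetical, and the sketch deliberately leaves out how the Push Protocol would carry the redirect to the client.

```go
package main

import "fmt"

// node describes a hypothetical connection server known to the central
// redirecting server.
type node struct {
	URL         string
	Connections int // currently open client connections
	Capacity    int // connections the node is willing to hold
}

// pickNode returns the URL of the node with the most free connection slots,
// or ok=false if every node is at (or over) capacity.
func pickNode(nodes []node) (url string, ok bool) {
	bestFree := 0
	for _, n := range nodes {
		if free := n.Capacity - n.Connections; free > bestFree {
			url, bestFree = n.URL, free
		}
	}
	return url, bestFree > 0
}

func main() {
	nodes := []node{
		{URL: "wss://push-1.example.com/", Connections: 9000, Capacity: 10000},
		{URL: "wss://push-2.example.com/", Connections: 4000, Capacity: 10000},
		{URL: "wss://push-3.example.com/", Connections: 10000, Capacity: 10000},
	}
	url, ok := pickNode(nodes)
	fmt.Println(url, ok) // wss://push-2.example.com/ true
}
```

When `pickNode` reports no capacity, the central server could fall back on the client's existing retry behaviour rather than redirecting.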
Monitoring And Metrics
Push currently provides metrics using logstash-compatible reporting mechanisms (e.g. Stackdriver). In addition, logs are scraped and the information is displayed via Kibana. Monitoring and metrics are currently collected and displayed on two different systems; effort will be made to simplify this.
TODO: Dev & Ops need to identify a list of key health metrics to monitor for this system.
Points of Contact
When significant events occur, operations notifies members of the development teams that action is required.
TODO: dev will provide contact information as well as a "jiggle list" of actions which may alleviate issues.