TestEngineering/Services/TokenServerAndSyncLoadTesting
- NOTE: We currently have two Verifier stacks in Stage (and probably Production):
- The standalone BrowserID Verifier stack: see the Verifier sections below.
- A TokenServer+Verifier stack: see the TokenServer sections below.
Contents
- 1 Quick Verification Of Stage Deployments
- 2 Quick Verification Of Production Deployments
- 3 Load Test Tool Client/Host
- 4 Installing BrowserID-Verifier and the Loads tool on Localhost or AWS
- 5 Running the load test against the Verifier in Stage
- 6 Using the Loads V1 Services Cluster for the Verifier
- 7 Installing TokenServer+Verifier and the Loads tool on Localhost or AWS
- 8 Running the load test against TokenServer+Verifier in Stage
- 9 Using the Loads V1 Services Cluster for TokenServer+Verifier
- 10 Installing Sync and load testing on Localhost or AWS
- 11 Running the load test against Sync 1.5 in Stage
- 12 Using the Loads V1 Services Cluster for Sync 1.5 in Stage
- 13 Running a combined load test against TokenServer+Verifier and Sync 1.5 in Stage
- 14 Using the Loads V1 Services Cluster for a combined load test in Stage
- 15 Configuring The Load Tests
- 16 Test Coverage and Stats
- 17 Analyzing the Results
- 18 Debugging the Issues
- 19 Monitoring TS and Sync Stage
- 20 Performance Testing Information
- 21 Details on the Load Test tool
- 22 Known Bugs, Issues, and Tasks
- 23 References
Quick Verification Of Stage Deployments
- This is a quick sanity test of the environment before getting started on load tests.
- TokenServer+Verifier Stage environment:
From the browser: https://token.stage.mozaws.net
$ curl https://token.stage.mozaws.net
$ curl -I https://token.stage.mozaws.net
Use the simple "make test" command from an install of tokenserver on the localhost or AWS instance:
$ cd loadtest
$ make test SERVER_URL=https://token.stage.mozaws.net
Alternate method: use the test tool from https://github.com/edmoz/fxa-sync-client
Install it and check all collection types for a known account in Stage:
$ bin/sync-cli.js -e EMAIL -p PASSWORD --env stage -t COLLECTION
where -t is one of bookmarks, history, passwords, tabs, addons, prefs, forms
- Verifier Stage environment:
In the browser: https://verifier.stage.mozaws.net/
$ curl https://verifier.stage.mozaws.net
$ curl -I https://verifier.stage.mozaws.net
Use the simple "make test" command from an install of browserid-verifier on the localhost or AWS instance:
$ cd loadtest
$ make test SERVER_URL=https://verifier.stage.mozaws.net
- Sync Server Stage environment:
Install server-syncstorage on the localhost or AWS instance (see below).
$ cd server-syncstorage
Quick test against the TokenServer:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py --use-token-server <Stage TokenServer>
Current example:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py --use-token-server https://token.stage.mozaws.net/1.0/sync/1.5
Quick tests against the Sync nodes:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py <Stage Sync Node>#<Node Secret>
Current example:
$ ./local/bin/python ./syncstorage/tests/functional/test_storage.py https://sync-1-us-east-1.stage.mozaws.net#<Node Secret>
Get the Node Secret from OPs.
- Using TPS
- The TPS FxA/Sync automated tests can be used as well, but the following file will have to be edited to add Stage environment configuration parameters: https://github.com/mozilla/gecko-dev/blob/master/testing/tps/tps/testrunner.py
- See the following wiki page for more information: https://wiki.mozilla.org/User_Services/Sync/Run_TPS
- See also: https://bugzilla.mozilla.org/show_bug.cgi?id=1006675
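The curl checks listed above for the Stage endpoints can also be scripted. What follows is a minimal sketch using only the Python standard library. It sends a HEAD request to each endpoint, much like `curl -I`, and records either the status code or the connection error. The endpoint list comes from this section. The helper names `head_status` and `sanity_check` are hypothetical. The sketch does not decide which status codes count as healthy for each service.

```python
import urllib.error
import urllib.request

# Stage endpoints from the quick checks above.
STAGE_ENDPOINTS = [
    "https://token.stage.mozaws.net",
    "https://verifier.stage.mozaws.net",
]


def head_status(url, timeout=10):
    """Return the HTTP status code of a HEAD request (like `curl -I`)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 4xx/5xx reply still shows that the server is answering.
        return err.code


def sanity_check(urls, fetch=head_status):
    """Map each URL to its status code, or to an error string if unreachable."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as err:  # URLError, timeouts and DNS failures
            results[url] = "unreachable: %s" % err
    return results
```

Run `sanity_check(STAGE_ENDPOINTS)` from the AWS host before starting a load test. To reuse the same check after a deployment, pass the Production URLs from the next section instead.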
Quick Verification Of Production Deployments
- This is a quick sanity test of the environment after a new deployment.
- Tokenserver+Verifier Production Environment
In the browser: https://token.services.mozilla.com
$ curl https://token.services.mozilla.com
$ curl -I https://token.services.mozilla.com
Then use the test tool from https://github.com/edmoz/fxa-sync-client
Install it and check all collection types for a known account in Production:
$ bin/sync-cli.js -e PROD-EMAIL -p PASSWORD -t COLLECTION
where -t is one of bookmarks, history, passwords, tabs, addons, prefs, forms
- Verifier Production Environment
In the browser: https://verifier.accounts.firefox.com
$ curl https://verifier.accounts.firefox.com
$ curl -I https://verifier.accounts.firefox.com
Then use the simple "make test" command from an install of browserid-verifier on the localhost or AWS instance:
$ cd loadtest
$ make test SERVER_URL=https://verifier.accounts.firefox.com
- Sync Server Production environment
Sign in with a known FxA account and sync data with an existing Production account (sync node). Then create a new FxA account and set up Sync.
Load Test Tool Client/Host
- It is always best to configure an AWS instance as the host for all load testing.
- All load tests can now run on the localhost (the AWS instance) or against the new Loads Cluster. See the following links for more information:
Installing BrowserID-Verifier and the Loads tool on Localhost or AWS
- Installation:
$ git clone https://github.com/mozilla/browserid-verifier
$ cd browserid-verifier
Note: You may want to check out a specific branch for testing instead of defaulting to master.
$ npm install
$ npm test
$ cd loadtest
$ make build
Note: This should hit Stage by default: SERVER_URL=https://verifier.stage.mozaws.net
- Note: This will install a local copy of the Loads tool for use with the verifier.
Running the load test against the Verifier in Stage
- Stage environment:
$ make test
or
$ make test SERVER_URL=https://verifier.stage.mozaws.net
$ make bench
or
$ make bench SERVER_URL=https://verifier.stage.mozaws.net
Note: The current version of 'make bench' tends to use a lot of CPU and memory on the localhost. The recommendation is to use 'make test' and 'make megabench' instead (see below).
Note: The Stage Verifier hits the Stage mockmyid server.
- Production environment:
$ make test SERVER_URL=https://verifier.accounts.firefox.com
$ make bench SERVER_URL=https://verifier.accounts.firefox.com
Using the Loads V1 Services Cluster for the Verifier
- By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
- Changes were made to the Makefile and the load test so that they use the cluster, along with some associated config files (for test, bench, megabench).
- Stage environment:
$ make megabench SERVER_URL=https://verifier.stage.mozaws.net
- Dev environment:
$ make megabench SERVER_URL=TBD
- Production environment:
$ make megabench SERVER_URL=https://verifier.accounts.firefox.com
- REFs:
Installing TokenServer+Verifier and the Loads tool on Localhost or AWS
- Installation:
$ git clone https://github.com/mozilla-services/tokenserver
$ cd tokenserver
Note: You may want to check out a specific branch for testing instead of defaulting to master.
$ make build
$ make test
Note: This is for local testing only.
$ cd loadtest
$ make build
Note: This should hit Prod by default: SERVER_URL=https://token.services.mozilla.com
- Note: This will install a local copy of the Loads tool for use with TokenServer+Verifier.
Running the load test against TokenServer+Verifier in Stage
- Stage environment:
$ make test SERVER_URL=https://token.stage.mozaws.net
$ make bench SERVER_URL=https://token.stage.mozaws.net
Note: The current version of 'make bench' tends to use a lot of CPU and memory on the localhost. The recommendation is to use 'make test' and 'make megabench' instead (see below).
Note: This also hits the Stage Verifier, which in turn hits the Stage mockmyid server.
- And while we are at it...
- Dev environment:
$ make test SERVER_URL=https://token.dev.lcip.org
$ make bench SERVER_URL=https://token.dev.lcip.org
- Production environment:
$ make test SERVER_URL=https://token.services.mozilla.com
$ make bench SERVER_URL=https://token.services.mozilla.com
Using the Loads V1 Services Cluster for TokenServer+Verifier
- By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
- Changes were made to the Makefile and the load test so that they use the cluster, along with some associated config files (for test, bench, megabench).
- Stage environment:
$ make megabench SERVER_URL=https://token.stage.mozaws.net
- Dev environment:
$ make megabench SERVER_URL=https://token.dev.lcip.org
- Production environment:
$ make megabench SERVER_URL=https://token.services.mozilla.com
- REFs:
Installing Sync and load testing on Localhost or AWS
Installation:
$ git clone https://github.com/mozilla-services/syncstorage-loadtest/
$ cd syncstorage-loadtest
Note: You may want to check out a specific branch for testing instead of defaulting to master.
$ pip install -r requirements.txt
Running the load test against Sync 1.5 in Stage
- Loads against specific Sync nodes in Stage
$ export SERVER_URL=https://your.storagenode.here#SECRET
Sync Stage nodes:
https://sync-1-us-east-1.stage.mozaws.net
https://sync-2-us-east-1.stage.mozaws.net
...etc...
NOTE: The OPs team has the SECRET string for Stage. Get it from them before you start testing.
- Load testing with Molotov: https://molotov.readthedocs.io/en/stable/
$ bin/molotov [commands] loadtest.py
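When a run targets a specific Sync node, SERVER_URL carries both the node URL and the secret from OPs, joined by `#`. This page does not show how the load test itself parses that value. The sketch below is one way to check the value before a run and to log it without exposing the secret. `split_node_url`, `redact` and `node_from_env` are hypothetical helpers built on the standard `urllib.parse.urldefrag`.

```python
import os
from urllib.parse import urldefrag


def split_node_url(server_url):
    """Split 'https://node#SECRET' into ('https://node', 'SECRET')."""
    node, secret = urldefrag(server_url)
    if not secret:
        raise ValueError("SERVER_URL is missing the '#SECRET' part")
    return node, secret


def redact(server_url):
    """Return SERVER_URL with the secret masked, safe for logs and bugs."""
    node, _ = split_node_url(server_url)
    return node + "#****"


def node_from_env():
    """Read SERVER_URL (as exported above) and validate it."""
    return split_node_url(os.environ["SERVER_URL"])
```

Calling `node_from_env()` before starting molotov catches a missing secret early. Use `redact(...)` whenever the value has to go into a ticket or an IRC channel.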
Using the Loads V1 Services Cluster for Sync 1.5 in Stage
- Load testing from server-syncstorage has been deprecated; please use mozilla-services/syncstorage-loadtest instead.
Running a combined load test against TokenServer+Verifier and Sync 1.5 in Stage
- A combined load test against TokenServer and Sync 1.5 in Stage
- This is done via the server-syncstorage directory that was cloned and built above
$ cd server-syncstorage
$ cd loadtest
$ make test SERVER_URL=https://your.tokenserver.here
$ make bench SERVER_URL=https://your.tokenserver.here
Examples for Stage:
$ make test SERVER_URL=https://token.stage.mozaws.net
$ make bench SERVER_URL=https://token.stage.mozaws.net
See https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#TokenServer.2BVerifier_Stage_Environment
Note: The current version of 'make bench' tends to use a lot of CPU and memory on the localhost. The recommendation is to use 'make test' and 'make megabench' instead (see below).
Note: The Stage TokenServer hits the Stage Verifier, which, in turn, hits the mockmyid server.
- And while we are at it...
Dev environment examples:
$ make test SERVER_URL=https://token.dev.lcip.org
$ make bench SERVER_URL=https://token.dev.lcip.org
Prod environment examples:
$ make test SERVER_URL=https://token.services.mozilla.com
$ make bench SERVER_URL=https://token.services.mozilla.com
See https://wiki.mozilla.org/QA/Services/FxATestEnvironments#FxA.2C_TokenServer.2C_and_Sync_Production_Environments
and https://wiki.mozilla.org/QA/Services/TSVerifierSyncTestEnvironments#TokenServer_and_Sync_1.5_Dev_Environments
Using the Loads V1 Services Cluster for a combined load test in Stage
- By using the Loads Services Cluster, we can offload the broker/agents processes and save client-side CPU and memory.
- Changes were made to the Makefile and the load test so that they use the cluster, along with some associated config files (for test, bench, megabench).
- Stage environment:
$ make megabench SERVER_URL=https://token.stage.mozaws.net
- Dev environment:
$ make megabench SERVER_URL=https://token.dev.lcip.org
- Prod environment:
$ make megabench SERVER_URL=https://token.services.mozilla.com
- REFs:
Configuring The Load Tests
- Makefile
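- The SERVER_URL variable can be changed in the Makefile or overridden on the command line (e.g. make test SERVER_URL=https://token.stage.mozaws.net).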
- Config files
- For make test (BrowserID-Verifier, TokenServer, Sync, Combined):
- Number of hits
- Number of concurrent users
- For make bench (BrowserID-Verifier, TokenServer, Sync, Combined):
- Number of concurrent users
- Duration of test
- For make megabench (using the LoadsCluster with BrowserID-Verifier, TokenServer, Sync, Combined):
- Number of concurrent users
- Duration of test
- Include file (this is code dependent)
- Python dependencies (this is code dependent)
- Agents to use for testing (default is 5, max is currently 20, but depends on the number of concurrent load tests running)
- Detach mode (leave as defined for now to automatically detach from the load test once it starts on the localhost)
- Observer (this can be email or irc - the default is irc #services-dev channel)
- SSH (the user account needed to SSH into the loads cluster - the default is ubuntu)
- Tokenserver load test code
- The Tokenserver load test can be configured - see the following lines:
- Basic Settings: https://github.com/mozilla-services/tokenserver/blob/master/loadtest/loadtest.py
- MockMyID: https://github.com/mozilla-services/tokenserver/blob/master/loadtest/loadtest.py#L19-L36
- Percentages: https://github.com/mozilla-services/tokenserver/blob/master/loadtest/loadtest.py#L39-L51
- Verifier load test code
- The Verifier load test can be configured - see the following lines:
- Various settings: https://github.com/mozilla/browserid-verifier/blob/master/loadtest/loadtest.py#L13-L53
- Sync Server load test code
- The Sync Server load test can be configured - see the following lines:
- Setting MockMyID: https://github.com/mozilla-services/server-syncstorage/blob/master/loadtest/stress.py#L26-L45
- Setting test distributions: https://github.com/mozilla-services/server-syncstorage/blob/master/loadtest/stress.py#L48-L83
- REFs:
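The megabench settings depend on each other. Under Test Coverage and Stats, users is defined per agent, so the load on the target is users * agents. The sketch below checks a configuration against the agent limits quoted above (default 5, currently at most 20, fewer when other load tests share the cluster) before a run. The `MegabenchSettings` class is hypothetical and is not part of the Loads tool or its config file format.

```python
from dataclasses import dataclass

# Values quoted in the megabench notes above; treat them as current
# settings that may change, not as fixed properties of the cluster.
DEFAULT_AGENTS = 5
MAX_AGENTS = 20


@dataclass
class MegabenchSettings:
    users: int                    # concurrent users per agent
    duration: int                 # test length in seconds
    agents: int = DEFAULT_AGENTS  # agents requested from the cluster

    def validate(self, available_agents=MAX_AGENTS):
        """Reject settings the cluster cannot satisfy."""
        if self.users < 1 or self.duration < 1:
            raise ValueError("users and duration must be positive")
        if not 1 <= self.agents <= available_agents:
            raise ValueError(
                "agents=%d, but only %d are available"
                % (self.agents, available_agents)
            )

    @property
    def total_users(self):
        """Concurrent users across the whole cluster."""
        return self.users * self.agents
```

For example, users = 20 with the default 5 agents gives 100 concurrent users against the target. Lower `available_agents` when other load tests are already using part of the cluster.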
Test Coverage and Stats
- Basic tweakable values for all load tests
- users = number of concurrent users per agent
- agents = number of agents to use from the cluster (if more are requested than are available, the test errors out)
- duration = length of the test, in seconds
- hits = number of rounds/hits/iterations (1 or more)
- TokenServer
- File location: tokenserver/loadtest/loadtest.py
- Inside NodeAssignmentTest, test_realistic is the main load test; the other tests cover specific behaviors
- The test runs as follows:
95% ask for assertions on existing users (on a DB filled by test_single_token_exchange)
4% ask for an assertion on a new user
1% ask for a bad assertion
- A bug has been filed to get the following additional coverage for the load test:
- generation numbers in assertion
- client state string
- A bug has been filed to get some integration tests written:
- to cover the edge/error cases not in the load test
- to be pointed at a remote server
- Sync
- File location: server-syncstorage/loadtest/stress.py
- This is the Sync 2.0 load test that has been back-ported for Sync 1.5.
- The stress.py file is fully configurable for the following:
- client probability
- client distribution
- collections
- A bug has been filed to add support for load testing tabs
- The tabs collection uses memcache; we need to figure out a way to test it without overloading the server
- There are currently no constants to define how to select percentages per collection type
- Right now, we need to manually configure the collections list in stress.py:
- collections = ['bookmarks', 'forms', 'passwords', 'history', 'prefs']
- To weight a collection type more heavily, add more entries of that type to the list; the load test picks randomly from the list for each request (per user/agent/hit/pass).
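Both load tests described above use weighted random selection. TokenServer's test_realistic splits requests 95/4/1 across three behaviours. The Sync test picks uniformly from the collections list, so repeating an entry raises that collection's weight. The sketch below shows both ideas in plain Python. It is an illustration, not the code in loadtest.py or stress.py. The scenario names and the doubled 'bookmarks' entry are hypothetical.

```python
import random
from collections import Counter

# test_realistic mix described above (weights in percent).
TOKENSERVER_MIX = [
    ("existing_user_assertion", 95),
    ("new_user_assertion", 4),
    ("bad_assertion", 1),
]

# Hypothetical Sync list: 'bookmarks' appears twice, so random.choice()
# picks it twice as often as each other collection.
SYNC_COLLECTIONS = ["bookmarks", "bookmarks", "forms", "passwords",
                    "history", "prefs"]


def pick_scenario(mix, roll):
    """Map roll in [0, sum of weights) to a scenario by cumulative weight."""
    upto = 0
    for name, weight in mix:
        upto += weight
        if roll < upto:
            return name
    raise ValueError("roll %r is outside [0, %d)" % (roll, upto))


def random_scenario(mix, rng=random):
    """Draw one scenario; rng.random() is in [0, 1), so roll < total."""
    total = sum(weight for _, weight in mix)
    return pick_scenario(mix, rng.random() * total)


def collection_odds(collections):
    """Chance that random.choice(collections) returns each collection."""
    counts = Counter(collections)
    return {name: n / len(collections) for name, n in counts.items()}
```

With this list, `collection_odds(SYNC_COLLECTIONS)` gives bookmarks a 2/6 chance and each of the other four collections 1/6. This is the effect of adding entries to the collections list in stress.py.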
Analyzing the Results
- There are several methods and tools for analyzing the load test results.
- 1. Using the Loads Services Cluster dashboard
- All loads tests using this cluster generate a live report and a run report available on this site:
- You can quickly review the following here: Status, Configuration, Results, Custom Metrics, and Errors.
- Tokenserver Custom Metrics
- addFailure
- Verifier Custom Metrics
- addFailure
- Sync Custom Metrics
- addFailure
- NOTE: If you want more details on the dashboard, please file an issue here: https://github.com/mozilla-services/loads
Debugging the Issues
- There are several methods and tools for debugging the load test errors and other issues.
- 1. Important logs for TokenServer (per server)
- /media/ephemeral0/logs/
- /media/ephemeral0/nginx/logs/default.access.log
- /media/ephemeral0/nginx/logs/default.error.log
- /media/ephemeral0/nginx/logs/tokenserver.access.log
- /media/ephemeral0/nginx/logs/tokenserver.error.log
- /media/ephemeral0/logs/tokenserver/token.error.log
- /media/ephemeral0/logs/tokenserver/token.log.*
- /media/ephemeral0/logs/tokenserver/process_account_deletions.error.log
- /media/ephemeral0/logs/tokenserver/process_account_deletions.log
- /media/ephemeral0/logs/tokenserver/purge_old_records.log
- /media/ephemeral0/logs/tokenserver/purge_old_records.error.log
- /media/ephemeral0/fxa-browserid-verifier/verifier_err.log
- /media/ephemeral0/fxa-browserid-verifier/verifier_out.log
- /var/log/circus.log
- /var/log/hekad/tokenserver.stdout.log
- /var/log/hekad/tokenserver.stderr.log
- 2. Important logs for Verifier (per server)
- /media/ephemeral0/fxa-browserid-verifier/verifier_err.log
- /media/ephemeral0/fxa-browserid-verifier/verifier_out.log
- /media/ephemeral0/nginx/logs/fxa-browserid-verifier.access.log
- /media/ephemeral0/nginx/logs/default.access.log (not in use)
- /media/ephemeral0/nginx/logs/default.error.log (not in use)
- /media/ephemeral0/squid/access.log
- /var/log/circus.log
- /var/log/hekad/fxa-browserid_verifier.stderr.log
- /var/log/hekad/fxa-browserid_verifier.stdout.log
- 3. Important error logs for Sync (per Sync node)
- /media/ephemeral0/logs/
- /media/ephemeral0/nginx/access.log
- /media/ephemeral0/error.log
- /media/ephemeral0/sync/sync.err
- /media/ephemeral0/sync/sync.log
- Acceptable TokenServer errors:
1% - 2% failures of the following types:
token.log:
"name": "token.assertion.invalid_signature_error"
"name": "token.assertion.verify_failure"
nginx access.log: 401s
NOTE: These values can be tweaked here: https://github.com/mozilla-services/tokenserver/blob/master/loadtest/loadtest.py#L58-L60
The following types of errors are known:
/media/ephemeral0/logs/tokenserver/token.error.log
Exception KeyError: KeyError(49564400,) in <module 'threading'...
/media/ephemeral0/logs/tokenserver/token.log
..."Starting new HTTP connection (9): 127.0.0.1", "hostname": ...
{"error": "StopIteration()", "traceback": "Uncaught exception:\n File \"/data/tokenserver/local/lib/python2.6/site-packages/gunicorn/workers/async.py\"...
..."Connection pool is full, discarding connection: 127.0.0.1", "...
Also, any 499s are probably an artifact of the current (V1) load test.
REF: https://bugzilla.mozilla.org/show_bug.cgi?id=1040396
https://bugzilla.mozilla.org/show_bug.cgi?id=1040397
OLD: It may also be the case that the following errors are "acceptable" if TS Stage is larger than Verifier Stage:
/media/ephemeral0/logs/tokenserver/token.error.log
Verifier-related errors of these types:
"HttpConnectionPool is full, discarding connection: verifier.stage.mozaws.net"
"Resetting dropped connection: verifier.stage.mozaws.net"
"Starting new HTTPS connection (179): verifier.stage.mozaws.net"
- Acceptable Verifier errors:
The verifier_out.log will show errors of the following types:
result: 'failure',\n reason: 'untrusted issuer...'
result: 'failure',\n reason: 'expired'
result: 'failure',\n reason: 'algorithms do not match'
result: 'failure',\n reason: 'audience mismatch: scheme mismatch'
Also, any 499s in the nginx logs are probably an artifact of the current (V1) load test.
- Acceptable Sync node errors:
In the nginx access.log files:
We will see some percentage of 404s. Right now we see 14% 404s (relative to the total count of 200s) with the config set up as follows:
users = 20
duration = 1800
agents = 5
Ideally, the overall percentage of 404s should drop the longer the load test runs.
Usually, you will not see 304s, 400s, 412s, or 415s during a load test, although they may show up in the logs after running the remote integration tests.
Also, any 499s are probably an artifact of the current (V1) load test.
In /var/log/hekad/sync_1_5.stderr.log:
You may see some "Decoder 'Sync-1_5-SlowQuery-MySqlSlowQueryDecoder' error: Failed parsing" messages and a lot of BSO INSERTs.
In /media/ephemeral0/logs/sync/sync.err:
You should see expected skew and QueuePool messages and Deprecation warnings.
These are also known:
Exception SystemExit
Exception KeyError
This is probably https://bugzilla.mozilla.org/show_bug.cgi?id=1040397
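Checking these ratios by hand gets tedious on larger runs. Below is a minimal triage sketch. It assumes that the nginx logs use the standard "combined" format, where the quoted request line is followed by the status code, and that verifier_out.log contains `reason: '...'` fragments like the ones shown above. All function names are hypothetical.

```python
import re
from collections import Counter

# nginx "combined" format: ... "GET /path HTTP/1.1" 200 1234 "-" "agent"
# Captures the three-digit status that follows the quoted request line.
STATUS_RE = re.compile(r'"[A-Z]+ [^"]*" (\d{3}) ')

# Verifier failure reasons listed above as acceptable (matched as prefixes).
KNOWN_VERIFIER_REASONS = (
    "untrusted issuer",
    "expired",
    "algorithms do not match",
    "audience mismatch: scheme mismatch",
)
REASON_RE = re.compile(r"reason: '([^']*)'")


def status_counts(lines):
    """Count HTTP status codes in nginx access-log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def percent_of_total(counts, code):
    """Share of all counted requests that returned `code`, in percent."""
    total = sum(counts.values())
    return 100.0 * counts[code] / total if total else 0.0


def percent_of(counts, code, base_code):
    """`code` responses as a percentage of `base_code` responses."""
    base = counts[base_code]
    return 100.0 * counts[code] / base if base else 0.0


def unexpected_reasons(log_text):
    """Verifier failure reasons that are not on the acceptable list."""
    reasons = REASON_RE.findall(log_text)
    return sorted({r for r in reasons if not r.startswith(KNOWN_VERIFIER_REASONS)})
```

Feed an open log file straight in, e.g. `with open(path) as f: counts = status_counts(f)`. For TokenServer, `percent_of_total(counts, 401)` can be compared with the 1%-2% range quoted above, assuming that range is relative to all requests. For a Sync node, `percent_of(counts, 404, 200)` gives the figure to compare with the 14% baseline. 499s are counted as well, so look at them separately as likely load-test artifacts.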
Monitoring TS and Sync Stage
- Loads dashboard:
- Cluster status
- Check directly from the Loads Cluster dashboard:
Agent statuses
Launch a health check on all agents
- and also on StackDriver: https://app.stackdriver.com/groups/6664/stage-loads-cluster
- For all other monitoring, see the following section:
Performance Testing Information
- TBD
Details on the Load Test tool
- The documentation can be found here:
- The repositories are here:
- The Services cluster is here:
Known Bugs, Issues, and Tasks
- Tokenserver:
- BrowserID-Verifier:
- Repo: https://github.com/mozilla/browserid-verifier/issues
- Bugzilla: no specific category
- Sync:
- OPs and Infrastructure
- Loads Tool and Cluster
References
- Other URLs
- Repositories
- Documentation
- The QA Test Environments:
- Deploying the FxA Load Test environment for broker/agents usage:
- Sync 1.5 protocol, documentation, etc.
- https://github.com/mozilla-services/docs
- https://docs.services.mozilla.com/#how-to
- https://docs.services.mozilla.com/howtos/run-fxa.html
- https://docs.services.mozilla.com/token/apis.html
- https://docs.services.mozilla.com/storage/apis-1.5.html
- https://docs.services.mozilla.com/howtos/run-sync-1.5.html
- https://github.com/mozilla-services/syncserver
- OPs pages for stats collection, logging, monitoring
- TBD