Taskcluster/Monitoring/Services
From MozillaWiki
Service Tier Definitions
- Tier 1: Required for the TaskCluster Platform to function
- Tier 2: Provides insight into the operation of the TaskCluster Platform
- Tier 3: External infrastructure whose outages cause task failures
Services in each tier
Tier 1
- AWS (us-east-1, us-west-1/2)
- Heroku
- Tutum
- Azure
- DockerHub (moving to AWS S3 for primary automation)
- Pulse / CloudAMQP
- Mozilla LDAP (planned soon; not yet a dependency)
Tier 2
- Papertrail
- Influx
Tier 3
- Hg.mozilla.org
- git.mozilla.org
- github.com
- VPN -> Balrog
- AWS resources not covered by Tier 1
Project summary
- Make an API for events related to infrastructure status
- Emit Pulse messages for those events
- Have an out-of-band status page
- Consider pausing, stopping acceptance of new tasks, or stopping task scheduling when a Tier 1 service fails
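The goals above could be prototyped along these lines. This is a minimal sketch under several assumptions: the `StatusEvent` schema, its `state` values, and the dotted routing-key layout are all hypothetical, and of the options under consideration it picks "stop scheduling" as the response to a Tier 1 outage. It only builds the JSON message body and routing key; actually publishing to Pulse would need an AMQP client, which is not shown.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

STATES = ("up", "degraded", "down")  # hypothetical status values


@dataclass(frozen=True)
class StatusEvent:
    service: str    # hypothetical identifier, e.g. "pulse"
    tier: int       # 1, 2 or 3, per the tier definitions
    state: str      # one of STATES
    timestamp: str  # ISO 8601 time in UTC


def make_event(service: str, tier: int, state: str) -> StatusEvent:
    """Create a validated status event stamped with the current UTC time."""
    if tier not in (1, 2, 3):
        raise ValueError(f"invalid tier: {tier}")
    if state not in STATES:
        raise ValueError(f"invalid state: {state}")
    return StatusEvent(service, tier, state, datetime.now(timezone.utc).isoformat())


def to_message(event: StatusEvent) -> str:
    """Serialize an event as a JSON message body (hypothetical schema)."""
    return json.dumps(asdict(event), sort_keys=True)


def routing_key(event: StatusEvent) -> str:
    """Build a dotted routing key so consumers can filter by tier, service or state."""
    return f"tier{event.tier}.{event.service}.{event.state}"


def tier1_outages(events: list[StatusEvent]) -> list[str]:
    """Return Tier 1 services whose latest event (events in chronological order) is "down"."""
    latest = {}
    for event in events:
        latest[event.service] = event  # later events overwrite earlier ones
    return sorted(s for s, e in latest.items() if e.tier == 1 and e.state == "down")


def scheduling_action(events: list[StatusEvent]) -> str:
    """One candidate policy: stop scheduling tasks while any Tier 1 service is down."""
    return "stop_scheduling" if tier1_outages(events) else "continue"
```

Keeping only the latest event per service means a later "up" event for the same service lifts the stop automatically; a Tier 2 or Tier 3 outage never stops scheduling under this policy.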