CIDuty/How To/AWS Pending Test
What to do in case of high pending tests under an AWS worker pool
Sometimes an AWS worker pool gets overloaded with tests, or we simply don't have enough workers in a specific pool. If this happens, you will see an alert such as:
nagios1.private.releng.mdc1.mozilla.com:Pending tests is CRITICAL: CRITICAL Pending tests: 3589 on gecko-t-linux-xlarge.
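If you want to script around such alerts, the pending count and the pool name can be pulled out of the alert text. The following is a minimal sketch that assumes the message always follows the format of the example above; the function name `parse_pending_alert` and the regular expression are illustrative, not part of any existing tooling.

```python
import re

# Pattern derived only from the sample alert above; other alerts may differ.
ALERT_RE = re.compile(r"Pending tests: (?P<count>\d+) on (?P<pool>[\w-]+)")


def parse_pending_alert(message):
    """Return (pending_count, worker_pool), or None if the text does not match."""
    match = ALERT_RE.search(message)
    if match is None:
        return None
    return int(match.group("count")), match.group("pool")


alert = (
    "nagios1.private.releng.mdc1.mozilla.com:Pending tests is CRITICAL: "
    "CRITICAL Pending tests: 3589 on gecko-t-linux-xlarge."
)
print(parse_pending_alert(alert))  # (3589, 'gecko-t-linux-xlarge')
```

The pool name returned here is the one to look for in the checks that follow.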
When this happens, the first step is to check whether we are being outbid. You can see this here.
Look for the number of InsufficientInstanceCapacity instances belonging to the affected pool.
The next step is to check Papertrail.
You can filter the logs by worker type.
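If you have exported log lines to work on locally, the same filtering can be approximated in a few lines of Python. This is only a sketch: it assumes each log line is plain text that contains the worker type name, and the sample lines below are made-up placeholders, not real Papertrail output. The helper `filter_by_worker_type` is hypothetical.

```python
import re


def filter_by_worker_type(lines, worker_type):
    """Keep only the lines that mention exactly this worker type.

    The lookarounds stop a shorter name from matching inside a longer,
    hyphenated one.
    """
    pattern = re.compile(r"(?<![\w-])" + re.escape(worker_type) + r"(?![\w-])")
    return [line for line in lines if pattern.search(line)]


# Placeholder lines for illustration only; real log lines will look different.
sample_lines = [
    "sample-1 gecko-t-linux-xlarge: placeholder message",
    "sample-2 gecko-t-linux-large: placeholder message",
    "sample-3 gecko-t-linux-xlarge: placeholder message",
]

for line in filter_by_worker_type(sample_lines, "gecko-t-linux-xlarge"):
    print(line)
```

Matching on the full worker type name, rather than a plain substring, avoids mixing up pools whose names share a prefix.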
Escalation
Letting people know about the queue in #ci before starting with the steps above is always a good thing.
If we are simply short of workers, or the number of jobs keeps piling up, escalate to the sheriffs so they can close the trees until the queues go down, and notify #ci that the trees are closed because of InsufficientInstanceCapacity.
Sometimes the problem isn't easy to find, so pinging people on IRC/Slack is the next step:
For the EU time zone, we have pmoore.
For the US time zone, we have bstack and wcosta.
You can also check the escalation path here.