Pancake Thumbnailer Infrastructure
See the diagrams attached to https://bugzilla.mozilla.org/show_bug.cgi?id=731228
Pancake Thumbnailer API
What does it do
It implements an API to generate web site thumbnails (screenshots). You give it a list of links, and it returns a list of URLs pointing to images that contain a screenshot of each of those links.
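One plausible way for the API to hand back image URLs is to derive each one from a hash of the site URL, so the same link always maps to the same image. The sketch below assumes SHA-1 as the hash, a `.png` suffix, and a hypothetical base URL `https://thumbnails.example.com/`; none of these are specified by the design.

```python
import hashlib

# Hypothetical public base URL for thumbnail images; not part of the original design.
THUMBNAIL_BASE_URL = "https://thumbnails.example.com/"


def url_hash(site_url: str) -> str:
    """Stable key for a site; SHA-1 is an assumed choice of hash."""
    return hashlib.sha1(site_url.encode("utf-8")).hexdigest()


def thumbnail_urls(links: list[str]) -> list[str]:
    """Return one image URL per requested link, in the same order.

    In this sketch the URL depends only on the link, so it can be returned
    before the image has actually been rendered.
    """
    return [f"{THUMBNAIL_BASE_URL}{url_hash(link)}.png" for link in links]


if __name__ == "__main__":
    for url in thumbnail_urls(["https://www.mozilla.org/", "https://example.com/"]):
        print(url)
```

A client would then check the request status (kept in Redis) to learn when each image is ready.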
What does it store
It stores thumbnail jobs in RabbitMQ.
It stores the state of the thumbnail request in a Redis database. This data expires within a few minutes.
What does it talk to
It talks to a RabbitMQ server to store and distribute thumbnail jobs. This data is not persistent and is removed as soon as the job has been processed. Each job contains:
- Thumbnail Job ID
- Site URL
- Site URL Hash
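A minimal sketch of such a job message, assuming one message per site and a JSON body; the JSON wire format, the SHA-1 hash, and the field names `job_id`, `url`, and `url_hash` are assumptions for illustration.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ThumbnailJob:
    """One queue message: a single site belonging to a thumbnail request."""
    job_id: str    # Thumbnail Job ID, shared by all sites of one request
    url: str       # Site URL to render
    url_hash: str  # Site URL Hash, usable as the image key

    @classmethod
    def for_site(cls, job_id: str, url: str) -> "ThumbnailJob":
        # SHA-1 is an assumed choice; the design only names a "Site URL Hash".
        return cls(job_id, url, hashlib.sha1(url.encode("utf-8")).hexdigest())

    def to_message(self) -> bytes:
        """Serialize to a JSON message body (assumed wire format)."""
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_message(cls, body: bytes) -> "ThumbnailJob":
        return cls(**json.loads(body.decode("utf-8")))


if __name__ == "__main__":
    job = ThumbnailJob.for_site(uuid.uuid4().hex, "https://www.mozilla.org/")
    body = job.to_message()
    assert ThumbnailJob.from_message(body) == job
    print(body.decode("utf-8"))
```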
It talks to a Redis database to store the state for the thumbnail request. The following data is stored in Redis:
- Thumbnail Job ID
- All sites that are part of the job
- Status of the sites (processing, error, ready)
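The request state can be pictured as a mapping from job ID to per-site status that expires on its own. The sketch below is an in-memory stand-in for Redis; in Redis itself this could be one hash per job ID (`HSET`) with `EXPIRE` set on the key, but that layout and the 300-second TTL are assumptions.

```python
import time
from enum import Enum
from typing import Callable, Optional


class SiteStatus(str, Enum):
    PROCESSING = "processing"
    ERROR = "error"
    READY = "ready"


# The design says the state expires "within a few minutes"; 300 s is an assumption.
REQUEST_TTL_SECONDS = 300.0


class RequestStateStore:
    """In-memory stand-in for the per-request state kept in Redis.

    Each job ID maps to {site URL: status} plus an expiry time. Expired
    entries disappear on the next access, mimicking a Redis key TTL.
    """

    def __init__(self, ttl: float = REQUEST_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._data: dict[str, tuple[float, dict[str, SiteStatus]]] = {}

    def create(self, job_id: str, sites: list[str]) -> None:
        # Every site of a new request starts out as "processing".
        statuses = {site: SiteStatus.PROCESSING for site in sites}
        self._data[job_id] = (self._clock() + self._ttl, statuses)

    def _live(self, job_id: str) -> Optional[dict[str, SiteStatus]]:
        entry = self._data.get(job_id)
        if entry is None:
            return None
        expires_at, statuses = entry
        if self._clock() >= expires_at:
            del self._data[job_id]  # expired, like a Redis key past its TTL
            return None
        return statuses

    def set_status(self, job_id: str, site: str, status: SiteStatus) -> None:
        statuses = self._live(job_id)
        if statuses is None:
            raise KeyError(job_id)
        statuses[site] = status

    def get(self, job_id: str) -> Optional[dict[str, SiteStatus]]:
        statuses = self._live(job_id)
        return None if statuses is None else dict(statuses)
```

Injecting `clock` keeps the expiry behavior testable without waiting minutes.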
It talks to Amazon S3 to find out if thumbnails for a specific site already exist.
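The S3 lookup lets the API skip sites that already have a thumbnail and queue jobs only for the rest. A sketch under assumptions: the object key layout `thumbnails/<sha1>.png` is hypothetical, and the S3 check is injected as a function (in production it could wrap a HEAD request on the object).

```python
import hashlib
from typing import Callable, Iterable


def url_hash(site_url: str) -> str:
    # SHA-1 is an assumed hash; the design only names a "Site URL Hash".
    return hashlib.sha1(site_url.encode("utf-8")).hexdigest()


def image_key(site_url: str) -> str:
    # Hypothetical S3 object key layout.
    return f"thumbnails/{url_hash(site_url)}.png"


def split_by_existing(
    sites: Iterable[str], exists: Callable[[str], bool]
) -> tuple[list[str], list[str]]:
    """Return (already_thumbnailed, needs_job) for the requested sites.

    `exists` reports whether an object key is present in S3.
    """
    ready, pending = [], []
    for site in sites:
        (ready if exists(image_key(site)) else pending).append(site)
    return ready, pending


if __name__ == "__main__":
    stored = {image_key("https://www.mozilla.org/")}
    ready, pending = split_by_existing(
        ["https://www.mozilla.org/", "https://example.com/"], stored.__contains__
    )
    print("ready:", ready)
    print("pending:", pending)
```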
Pancake Thumbnailer Worker
What does it do
It processes a thumbnail job. It uses PhantomJS ('phantomjs') to render the site and create an image. The resulting image is then stored in Amazon S3, and the status of the thumbnail request is updated in Redis.
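The per-job steps can be sketched as render, upload, then record the outcome. This assumes the worker calls a PhantomJS rasterize script such as the `rasterize.js` example that ships with PhantomJS (`phantomjs rasterize.js URL filename`); the 60-second timeout and the `<hash>.png` key are also assumptions. S3 and Redis access are passed in as plain functions.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable


def phantomjs_command(url: str, output: Path, script: str = "rasterize.js") -> list[str]:
    """Build a PhantomJS command line that renders `url` into `output`."""
    return ["phantomjs", script, url, str(output)]


def run_phantomjs(cmd: list[str]) -> None:
    # Fail on a non-zero exit status or a hung render (timeout is an assumption).
    subprocess.run(cmd, check=True, timeout=60)


def process_job(
    job_id: str,
    url: str,
    url_hash: str,
    store_image: Callable[[str, bytes], None],
    set_status: Callable[[str, str, str], None],
    run: Callable[[list[str]], None] = run_phantomjs,
) -> None:
    """Render one site, upload the image, and record "ready" or "error"."""
    key = f"{url_hash}.png"  # hypothetical S3 key layout
    with tempfile.TemporaryDirectory() as tmp:
        output = Path(tmp) / key
        try:
            run(phantomjs_command(url, output))
            store_image(key, output.read_bytes())
        except Exception:
            # Any render or upload failure marks this site as errored.
            set_status(job_id, url, "error")
        else:
            set_status(job_id, url, "ready")
```

Because `store_image`, `set_status`, and `run` are parameters, the same function can be exercised with stubs instead of real S3, Redis, and PhantomJS.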
What does it talk to
It talks to a RabbitMQ server to poll for thumbnail jobs.
It talks to Redis to maintain the state of the thumbnail request.
It talks to Amazon S3 to store the resulting thumbnail images.
It talks to the site that is being thumbnailed.
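The polling side of the worker can be sketched as a consume loop. Here `queue.Queue` stands in for the RabbitMQ queue, and `task_done()` loosely plays the role of an acknowledgement sent only after the job has been handled, which is what lets the broker drop the message once it is processed. The JSON body and the idle timeout are assumptions; the timeout only exists so the sketch terminates, whereas a real worker would keep polling.

```python
import json
import queue
from typing import Callable


def poll_jobs(
    jobs: "queue.Queue[bytes]",
    handle: Callable[[dict], None],
    idle_timeout: float = 1.0,
) -> int:
    """Consume job messages until the queue stays empty for `idle_timeout` seconds.

    Returns the number of jobs handled.
    """
    processed = 0
    while True:
        try:
            body = jobs.get(timeout=idle_timeout)
        except queue.Empty:
            return processed
        try:
            handle(json.loads(body))
            processed += 1
        finally:
            # Mark the message finished only after handling, like a late ack.
            jobs.task_done()
```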
What does it store
It stores/updates the thumbnail request status in Redis. This data expires within a few minutes.
It stores the images in Amazon S3. They expire after 24 hours.
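The design does not say how the 24-hour expiry is enforced. One way to do it is an S3 lifecycle rule; the sketch below builds such a configuration in the shape accepted by the S3 PutBucketLifecycleConfiguration API (for example via boto3's `put_bucket_lifecycle_configuration`). The rule ID and empty prefix are hypothetical. Lifecycle expiration is given in whole days and S3 rounds the expiry up to the next midnight UTC, so objects live at least one day rather than exactly 24 hours.

```python
import json


def expire_after_days_rule(days: int = 1, prefix: str = "") -> dict:
    """Build an S3 lifecycle configuration that expires objects after `days` days."""
    if days < 1:
        raise ValueError("S3 lifecycle expiration is expressed in whole days >= 1")
    return {
        "Rules": [
            {
                "ID": "expire-thumbnails",   # hypothetical rule name
                "Filter": {"Prefix": prefix},  # empty prefix: whole bucket
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(expire_after_days_rule(), indent=2))
```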