ReleaseEngineering/Mozpool
Overview
Mozilla needs to run its applications on various mobile devices, such as Tegras, Pandas, and even full smartphones. These devices do not behave much like the servers that fill the rest of Mozilla's datacenters: they have limited resources, no redundancy, and are comparatively unreliable. With the advent of Firefox OS, Mozilla also needs the ability to automatically reinstall the entire OS on devices.
Mozpool is a system for managing these devices. Users (automated or human) who need a device matching certain specifications can request one from Mozpool, and Mozpool will find such a device, installing a new operating system if necessary. The middle layer of the system (Lifeguard) handles such reinstalls reliably, and also detects and investigates device failure, removing problematic devices from the pool. System administrators can examine these failed devices and repair them, returning them to the pool. The lowest level, Black Mobile Magic (BMM), handles low-level hardware details: automatic power control via IP-addressable power switches; a network-hosted Linux environment for performing software installations; and pinging, logging, and so forth.
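To make the division of labor concrete, here is a toy Python model of the top-level decision: given a request for a device with a particular image, prefer a free device that already runs that image, and otherwise hand out any free device that Lifeguard would first need to reimage. This is a sketch under stated assumptions, not Mozpool's actual selection code: the `Device` class, its `state` values ("free", "busy", "failed"), the device names, and `choose_device` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Device:
    """Hypothetical, simplified view of one device (illustrative fields only)."""
    name: str
    hardware: str          # e.g. "panda"
    image: Optional[str]   # image currently installed, if known
    state: str             # hypothetical states: "free", "busy", "failed"


def choose_device(devices: List[Device], hardware: str,
                  image: str) -> Optional[Tuple[Device, bool]]:
    """Pick a free device of the requested hardware type.

    Returns (device, needs_reimage), or None if no free device exists.
    Busy and failed devices are never handed out.
    """
    candidates = [d for d in devices
                  if d.hardware == hardware and d.state == "free"]
    # Prefer a device that already runs the requested image: no reinstall needed.
    for d in candidates:
        if d.image == image:
            return d, False
    # Otherwise any free device works, but Lifeguard must reimage it first.
    if candidates:
        return candidates[0], True
    return None


# Example pool (hypothetical device names and states).
pool = [
    Device("panda-0001", "panda", "android", "free"),
    Device("panda-0002", "panda", "panda-android-4.0.4_v3.3", "failed"),
    Device("panda-0003", "panda", "panda-android-4.0.4_v3.3", "free"),
]
choice = choose_device(pool, "panda", "panda-android-4.0.4_v3.3")
```

Failed devices are skipped because Lifeguard removes them from the pool until a system administrator repairs them.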
Because continued operation of this system is business-critical, it is designed to be resilient not only to the failure of individual devices but also to the failure of the servers running Mozpool itself.
Policies and Procedures
- ReleaseEngineering/Mozpool/Adding New Android Images to Mozpool
- ReleaseEngineering/Mozpool/Allocating Pandas Between Teams
- ReleaseEngineering/Mozpool/Handling Panda Failures
Available Device Images
- panda-android-4.0.4_v3.2
  - Added SUTAgent 1.20 to base image
- panda-android-4.0.4_v3.3
  - Added Adobe Flash 11.1.115.81 to base image
- panda-android-4.0.4_v3.1
  - todo
- android
  - todo
- repair-boot
  - todo
- b2g
  - obsolete
How-To's
- ReleaseEngineering/Mozpool/How To Create a Panda Android Image Suitable For Mozpool
- ReleaseEngineering/Mozpool/How To Interpret Device State in Mozpool
- ReleaseEngineering/Mozpool/How To Use the Mozpool Web UI, including such classic hits as:
  - How to request a device for a loan
  - How to manually re-image a device
  - How to control the power on a device
- ReleaseEngineering/Mozpool/How To Access the Mozpool API
Links
- repositories - https://github.com/mozilla/mozpool and http://hg.mozilla.org/build/mozpool (synchronized by hand by developers)
- http://hg.mozilla.org/build/mozpool/file/default/README.md (version-controlled documentation)
- http://hg.mozilla.org/build/mozpool/file/default/API.txt (API documentation)
- http://hg.mozilla.org/build/mozpool/file/default/sql/schema.sql (DB Schema)
- Auto-Tools project pages
- PuppetAgain modules (installation details)
- https://mana.mozilla.org/wiki/display/IT/Mozpool (employees only; IT-oriented details of the system implementation)
Architectural Description
See http://hg.mozilla.org/build/mozpool/file/default/README.md for the most up-to-date architectural description of the system.
Source
The source is at http://hg.mozilla.org/build/mozpool
User Interface
The Mozpool user interface is available through a web browser. The home page shows the three layers of the system (Mozpool, Lifeguard, and BMM); clicking on any of them opens a UI specific to that layer. The BMM UI allows direct control of device power, as well as manual PXE booting; this layer is of most interest to datacenter operations staff. The Lifeguard UI allows managed PXE boots and power cycles, as well as forced state transitions.
Deployment
Mozpool is a Python daemon that runs on multiple imaging servers. It uses a database backend and an HTTP API for communication between servers, and its frontend is a dynamic web application. The BMM services (TFTP servers, syslog daemons, and so on) run on the same systems.
Mozpool is designed to be deployed as multiple "pools" within Mozilla. The first, and likely the largest, is Release Engineering's.
Release Engineering
In the scl3 datacenter, we have an initial deployment of 10 racks of Pandaboards. Each rack holds about 80 Pandas, grouped in custom-built chassis, for a total of about 800 Pandas. Each rack also contains seven "foopies" (hosts that proxy between the Pandas and Buildbot) and one imaging server. Each rack has a dedicated VLAN, which keeps most network traffic local to the rack. The database backend is MySQL. See the PuppetAgain modules, linked above, for more details of the deployment.
At the BMM and Lifeguard levels, each imaging server is responsible for the pandas in its rack, as assigned in inventory. At the Mozpool level, each imaging server is responsible for all requests that were initiated locally. Mozpool uses HTTP to communicate with Lifeguard on other imaging servers when it needs to reserve a non-local device.
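The split of responsibilities can be sketched in Python as a lookup in the inventory's device-to-imaging-server mapping. The function name `plan_reservation`, the hostnames, and the return values are hypothetical; the real calls Mozpool makes between servers are documented in API.txt and are not reproduced here.

```python
from typing import Dict, Tuple


def plan_reservation(request_server: str, device: str,
                     inventory: Dict[str, str]) -> Tuple[str, str]:
    """Decide how the Mozpool instance on `request_server` reaches `device`.

    The request itself stays with the imaging server where it was made;
    only the device reservation may cross to another server's Lifeguard.
    Returns ("local", server) or ("http", server).
    """
    owner = inventory[device]  # imaging server running BMM/Lifeguard for this device
    if owner == request_server:
        return ("local", owner)
    return ("http", owner)


# Hypothetical inventory: each device is assigned to its rack's imaging server.
inventory = {
    "panda-0001": "imaging-rack1",
    "panda-0081": "imaging-rack2",
}
```

Because each rack has its own VLAN and imaging server, most reservations take the "local" branch; the HTTP branch is only needed when no suitable device is free in the requester's own rack.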
Mozpool Client
Release Engineering uses the mozpool client to request Panda boards from the Mozpool servers. We install the client's Python package inside a virtual environment. The package is stored in pypi.
To create a new packaged version, check out the mozpool repo and do the following:
- Make your code changes
- Update the version in setup.py
- Add a new line to CHANGES.txt with the new version, the date and what is changing
- cd mozpoolclient && python setup.py sdist
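The version bump in setup.py and the CHANGES.txt entry are easy to get out of sync. A small pre-release check, sketched here in Python, could confirm that the version in setup.py also appears in CHANGES.txt before running sdist. It assumes (hypothetically) that setup.py passes a literal `version='x.y.z'` string and that each CHANGES.txt entry starts with its version number; the helper names are illustrative and not part of the mozpool repository.

```python
import re


def setup_py_version(setup_py_text: str) -> str:
    """Extract a literal version='x.y.z' argument from setup.py source.

    Assumes the version is a plain string literal; a version computed
    at runtime will not be found.
    """
    m = re.search(r"""version\s*=\s*['"]([^'"]+)['"]""", setup_py_text)
    if m is None:
        raise ValueError("no literal version= found in setup.py")
    return m.group(1)


def changes_mentions(changes_text: str, version: str) -> bool:
    """True if some CHANGES.txt line begins with exactly this version.

    The lookahead stops "0.1.6" from matching "0.1.60" or "0.1.6.1".
    """
    pattern = re.compile(r"^\s*" + re.escape(version) + r"(?![\w.])", re.MULTILINE)
    return pattern.search(changes_text) is not None
```

To use it, read `mozpoolclient/setup.py` and the CHANGES.txt you edited (for example with `pathlib.Path(...).read_text()`), and only run `python setup.py sdist` once `changes_mentions(...)` returns True.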
To deploy to our pypi setup, follow these instructions.
There is also a "fork" of the client code that lives in the tools repo: http://hg.mozilla.org/build/tools/lib/python/vendor/mozpoolclient-0.1.6
To update this copy, run the following commands:
    OLD=0.1.5
    NEW=0.1.6
    cd tools/lib/python/vendor
    hg move mozpoolclient-${OLD} mozpoolclient-${NEW}
    # Assuming mozpool is checked out at the same level as your tools repo.
    # The trailing slashes make rsync mirror the directory contents, so --delete also removes stale top-level files.
    rsync --recursive --delete ../../../../mozpool/mozpoolclient/ mozpoolclient-${NEW}/
    # Record any files that rsync added or removed, so the commit includes them.
    hg addremove mozpoolclient-${NEW}
    # Bump the version in here: http://mxr.mozilla.org/build/source/tools/lib/python/vendorlibs.pth
    vi ../vendorlibs.pth
    hg commit -m "Bumping mozpool client vendor version from ${OLD} to ${NEW}"
    hg push
NOTE: if you're making API changes to the mozpool client, you'll need to update the consumers in the tools repo as well before committing.
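The `vi ../vendorlibs.pth` step is a single text substitution, so it can also be scripted. This Python sketch assumes the path file names the vendored directory as `mozpoolclient-<version>` on one of its lines; check the real vendorlibs.pth before relying on it. `bump_vendor_pth` is a hypothetical helper, not part of the tools repo.

```python
import re


def bump_vendor_pth(pth_text: str, old: str, new: str) -> str:
    """Replace mozpoolclient-<old> with mozpoolclient-<new> in a .pth file's text.

    Raises ValueError if the old entry is absent, so a mistyped OLD value
    fails loudly instead of silently leaving the path file unchanged.
    """
    old_entry = f"mozpoolclient-{old}"
    # The lookahead keeps "0.1.5" from also matching "0.1.50".
    pattern = re.compile(re.escape(old_entry) + r"(?![\w.])")
    new_text, count = pattern.subn(f"mozpoolclient-{new}", pth_text)
    if count == 0:
        raise ValueError(f"{old_entry} not found in path file")
    return new_text
```

Reading the file, passing its text through `bump_vendor_pth(text, OLD, NEW)`, and writing the result back would replace the manual edit; the `hg commit` and `hg push` steps stay the same.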
If you're the pypi package maintainer (armenzg or dustin), you can follow these [??? instructions].