CIDuty/Other Duties
Tree Maintenance
Repo Errors
If a dev reports a problem pushing to hg (either the m-c or try repo), do the following:
- File a bug (or have the dev file it) and then ping noahm in #ops.
- If he doesn't respond, escalate the bug and page on-call.
- Follow the steps below for "How do I close the tree"
How do I see problems in Treeherder?
All "infrastructure" (that's us!) problems should be purple at https://treeherder.mozilla.org. Some aren't, so keep your eyes open in IRC, but get on any purples quickly.
How do I close the tree?
See ReleaseEngineering/How_To/Close_or_Open_the_Tree
How do I claim a rentable project branch?
See ReleaseEngineering/DisposableProjectBranches#BOOKING_SCHEDULE
Re-run jobs
How to trigger Talos jobs
see ReleaseEngineering/How_To/Trigger_Talos_Jobs
How to re-trigger all Talos runs for a build (by using sendchange)
see ReleaseEngineering/How_To/Trigger_Talos_Jobs
How to re-run a build
Do not go to the page of the build you'd like to re-run and cook up a sendchange to try to re-create the change that caused it: changes without revlinks trigger releases, which is not what you want.
Instead, find the revision you want, find a builder page for the builder you want (preferably, but not necessarily, on the same master), and plug the revision, your name, and a comment into the "Force Build" form. Note that you MUST specify the branch so there are no null keys in builds-running.js; otherwise your build will not show up in self-serve or Treeherder.
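For reference, the Force Build form can also be driven with a plain HTTP POST to the master's web interface. A hedged sketch: the master host/port and builder name are placeholders, and the field names assume a standard buildbot 0.8-style force form, so check the form's HTML on your master first.
# hypothetical master and builder; branch is mandatory (see above)
curl -d "username=you@mozilla.com" \
     -d "comments=retrigger for bug NNNNNN" \
     -d "branch=mozilla-central" \
     -d "revision=abcdef123456" \
     "http://<buildbot-master>:<port>/builders/<builder-name>/force"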
Nightlies
How do I re-spin mozilla-central nightlies?
To build new nightlies
To build nightlies on all the platforms
Note: as of Jan 17, 2017, we build nightlies for Android and Linux on Taskcluster. As of June 21, 2017, we build Macosx nightlies on Taskcluster. As of July 26, 2017, we build Windows nightlies on Taskcluster. To retrigger these nightlies by hand, you need to use the Taskcluster tools.
Log in to the Taskcluster tools at https://tools.taskcluster.net and go to the relevant hook:
- To retrigger nightlies for all desktop platforms (there is no hook for just Linux nightlies): https://tools.taskcluster.net/hooks/#project-releng/nightly-desktop%252fmozilla-central
- To retrigger nightlies for Android: https://tools.taskcluster.net/hooks/#project-releng/nightly-fennec%252fmozilla-central
- To retrigger nightlies for Macosx: https://tools.taskcluster.net/hooks/project-releng/nightly-desktop-osx%2Fmozilla-central
- To retrigger nightlies for Windows (both win32 and win64): https://tools.taskcluster.net/hooks/project-releng/nightly-desktop-win%2Fmozilla-central
Select the green "trigger hook" button at the bottom of the page.
If you get an error about scopes, you might have to create a client id for this hook. See https://tools.taskcluster.net/auth/clients/ for an example, and enter "mozilla-ldap/kmoir@mozilla.com/" in the "Client ids beginning with" field.
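For reference, a hook can also be triggered outside the browser. A rough sketch, assuming the taskcluster CLI (https://github.com/taskcluster/taskcluster-cli) is installed, you have run `taskcluster signin`, and your client carries the matching hooks:trigger-hook:... scope; the exact subcommand layout varies by CLI version, so treat this as a starting point rather than a recipe:
# trigger the desktop nightly hook with an empty payload
echo '{}' | taskcluster api hooks triggerHook project-releng 'nightly-desktop/mozilla-central'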
Where are nightly and dependent artifacts stored?
With the move to taskcluster, build artifacts are no longer stored on archive.mozilla.org. Instead, they can be downloaded in two ways:
1. Treeherder. With buildbot, the build + signing + repackaging of Firefox into the correct format for each platform was a single job: the nightly build. With the move to taskcluster, the installable artifacts are created by jobs with different names. For Android and Linux, they are in the Ns (nightly signing) job; for Mac and Windows, they are in the Nr (nightly repackage) job. Here is a filter for mozilla-central that will display the job names you need in order to download the artifacts:
Platform   Nightly Job   Artifact to download
Android*   Ns            target.apk
Linux*     Ns            target.tar.bz2
Mac        Nr            target.dmg
Win*       Nr            installer.exe
For non-nightly builds, a useful filter for treeherder is here (mozilla-inbound as an example; update as appropriate). In this case, the job symbols are B for Android, Linux and Mac, and Bs for Windows (a signed on-push build for Windows, as required by some tests):
Platform   Dep Job Symbol   Artifact to download
Android*   B                target.apk
Linux*     B                target.tar.bz2
Mac        B                target.dmg
Win*       Bs               target.zip
2. Indexes. Taskcluster indexes identify the artifacts associated with each job. Look here for the latest desktop Taskcluster nightlies and here for mobile. Again, the jobs and artifacts associated with each nightly correspond to the chart above. Click on the appropriate job, then "Taskid" on the right, then "Run artifacts" on the right. For dep builds, look here for desktop or here for mobile.
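As a concrete sketch, the index can also be hit directly over HTTP to fetch the latest nightly artifact without clicking through the UI. The namespace route and artifact path below are assumptions based on the chart above, so verify them against the index UI first:
# download the latest mozilla-central linux64 nightly via the index's
# "latest artifact" endpoint (assumed route; check the index UI if this 404s)
curl -L -o target.tar.bz2 \
  "https://index.taskcluster.net/v1/task/gecko.v2.mozilla-central.nightly.latest.firefox.linux64-opt/artifacts/public/build/target.tar.bz2"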
Disable updates
See ReleaseEngineering/How_To/Shut_off_all_updates for global shutoff. We use Balrog now for nightly & aurora updates.
Freeze Updates
See ReleaseEngineering/How_To/Enable_or_Disable_Updates_on_Central if you simply need to freeze updates and not completely disable.
Talos
How to update the talos zips
We only need to do this for mobile requests.
This deployment is super safe; it's NPOTB (not part of the build).
# running this from cruncher is faster than downloading/uploading from your localhost
ssh -A cruncher
export URL=http://people.mozilla.org/~jmaher/taloszips/zips/talos.07322bbe0f7d.zip
export TALOS_ZIP=`basename $URL`
wget $URL
# relengwebadm has limited access to the internet - that is why we scp from another host
scp ${TALOS_ZIP} relengwebadm.private.scl3.mozilla.com:/mnt/netapp/relengweb/talos-bundles/zips
ssh relengwebadm.private.scl3.mozilla.com "chmod 644 /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}"
ssh relengwebadm.private.scl3.mozilla.com "sha1sum /mnt/netapp/relengweb/talos-bundles/zips/${TALOS_ZIP}"
curl -I http://talos-bundles.pvt.build.mozilla.org/zips/${TALOS_ZIP}
For talos.zip changes: once deployed, notify the A-Team and let them know that they can land at their own convenience.
- Please verify the shasum matches what is in the [comment]; we have had a few instances where the talos.zip was incorrect.
Update mobile talos webhosts
Keep track of which revision is being run. Copy/paste the output into the bug, and please update our maintenance page.
This could affect mobile talos numbers or break the jobs altogether, so please coordinate with the sheriffs.
NOTE: There's a great deal of data we cannot check into revision control for legal reasons, so there's an extensive .hgignore file. If you're adding new data to the tree that cannot be checked in, please ask a talos developer/reviewer or [file a bug] to request any [.hgignore] changes.
webapp cluster
ssh relengwebadm.private.scl3.mozilla.com
sudo su -
cd /data/releng/src/talos-remote/www/talos-repo
# NOTICE that we have uncommitted files
hg status
# Take note of the current revision to revert to (just in case)
hg id
# 488bc187a3ef tip
hg pull -u
# ..capture the output here; the remainder will be long and not that useful..
/data/releng/src/talos-remote/update
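If the pull turns out to be bad, roll back to the revision captured by `hg id` above and push that out to the web heads again; a minimal sketch, using the hypothetical revision from the transcript:
cd /data/releng/src/talos-remote/www/talos-repo
hg update -r 488bc187a3ef    # the revision noted before the pull
/data/releng/src/talos-remote/update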
Tp4 Zip
###
### NOTE UNTESTED AFTER BUG 1050769 -- Please remove this warning next use
###
ssh -A cruncher
export URL=http://people.mozilla.org/~jmaher/taloszips/zips/mobile_tp4.zip
export TALOS_ZIP=`basename $URL`
wget $URL
scp ${TALOS_ZIP} `whoami`@relengwebadm.private.scl3.mozilla.com:.
# Connect to root @ relengwebadm
ssh `whoami`@relengwebadm.private.scl3.mozilla.com
sudo su -
export ME=jwood
export ZIP=mobile_tp4.zip
chmod 644 /home/$ME/$ZIP
sha1sum /home/$ME/$ZIP
cd /data/releng/src/talos-remote/www/
unzip /home/$ME/$ZIP -d ./
# finally update the web heads
cd ../
./update
Tp5n.zip
export URL=http://people.mozilla.org/~jmaher/taloszips/zips/tp5n.zip
export TALOS_ZIP=`basename $URL`
wget $URL
scp ${TALOS_ZIP} `whoami`@relengwebadm.private.scl3.mozilla.com:.
# Connect to root @ relengwebadm
ssh `whoami`@relengwebadm.private.scl3.mozilla.com
sudo su -
export ME=jwood       ## BE SURE TO CHANGE
export ZIP=tp5n.zip
export BUG=1288135    ## BE SURE TO CHANGE
chmod 644 /home/$ME/$ZIP
sha1sum /home/$ME/$ZIP
cd /mnt/netapp/relengweb/talos-bundles/zips/
cp ./tp5n.zip ./tp5n.before_bug_$BUG.zip
mv /home/$ME/$ZIP ./    # YES to overwrite here
Ganglia
- If you see a host reporting to Ganglia incorrectly, it might just take this to fix it (e.g. bug 674233):
# as root
service gmond restart
Queue Directories
If you see this in #build:
<nagios-sjc1> [54] buildbot-master12.build.scl1:Command Queue is CRITICAL: 4 dead items
It means that there are items in the "dead" queue for the given master. Look at the logs and fix any underlying issue, then retry the command by moving *only* the .json file over to the "new" queue. See the Queue directories wiki page for details.
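A hedged sketch of the retry, assuming the command queue lives under /dev/shm/queue on the master; that path is an assumption, so check the Queue directories wiki page for the real location:
ssh buildbot-master12.build.scl1
cd /dev/shm/queue/commands
ls -l dead/                 # inspect the dead items; the logs explain why they failed
mv dead/<item>.json new/    # after fixing the cause, move ONLY the .json file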
Cruncher
If you get an alert about cruncher running out of space, it might be a sendmail issue (backed-up emails taking up too much space and not getting sent out):
<nagios-sjc1> [07] cruncher.build.sjc1:disk - / is WARNING: DISK WARNING - free space: / 384 MB (5% inode=93%):
As root:
du -s -h /var/spool/*
# confirm that mqueue or clientmqueue is the oversized culprit
# stop sendmail, clean out the queues, restart sendmail
/etc/init.d/sendmail stop
rm -rf /var/spool/clientmqueue/*
rm -rf /var/spool/mqueue/*
/etc/init.d/sendmail start
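Afterwards, it's worth confirming that the space actually came back (the warning above fired at 5% free on /):
df -h /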
hg<->git conversion
This is a production system RelEng built, but has not yet transitioned to full IT operation. As a production system, it is supported 24x7x365 - escalate to IT oncall (who can page) as needed.
We'll get problem reports from two sources:
- via email from the vcs2vcs user to release+vcs2vcs@m.c - see the email handling instructions for those.
- via a bug report for a customer-visible condition - this should only happen if there is a new error we aren't detecting ourselves. See the resources below and/or page hwine.
Documentation for this system:
- recent docs (troubleshooting)
- source code: http://hg.mozilla.org/users/hwine_mozilla.com/repo-sync-tools/
- config files: http://hg.mozilla.org/users/hwine_mozilla.com/repo-sync-configs/
All services run as user vcs2vcs on one of the following hosts (as of 2013-01-07): github-sync1-dev.dmz.scl3.mozilla.com, github-sync1.dmz.scl3.mozilla.com, github-sync2.dmz.scl3.mozilla.com, github-sync3.dmz.scl3.mozilla.com.
Handling alert_major_errors
# SSH as yourself to the hostname in the 'from' address of the alert_major_errors email.
$ ssh yourname@github-sync3.dmz.scl3.mozilla.com
$ sudo su - vcs2vcs
$ cd etc
# find the repo name that vcs2vcs is complaining about. For example:
$ grep releases-mozilla-central-no-cvs *
job02_cmds:# "hg:$HOME/repos/releases-mozilla-central-no-cvs" "github"
# discover where that job runs
$ grep job02 status
job02_cmds,github-sync3.dmz.scl3.mozilla.com,m-c w/o cvs as used by b2g
# connect to that host the same as we did above (if not already connected), then
$ cd logs/job02    # same job as above
$ show_update_errors update.log
# Note: the command exit code precedes the command itself
# eg. ...;255;hg --cwd...
Continue with instructions here.
disable/re-enable aurora updates
Taken care of by the person doing the final release, since merge day activities are on the Monday before the release.
Upload
Python packages
See https://hg.mozilla.org/build/braindump/file/default/utils/publish_package_our_pypi.sh
Download the tool above, and then run this from your local machine:
publish_package_our_pypi.sh <your_python_package.tar.gz>
How to upload to Tooltool
See ReleaseEngineering/Applications/Tooltool#How_to_upload_to_tooltool
How to enable a user to run Tooltool uploads
See ReleaseEngineering/Applications/Tooltool#How_to_enable_a_user_to_run_tooltool_uploads.
How to upload Talos ZIPs
See How to update the talos zips.
How to add NPM packages
See ReleaseEngineering/How_To/Mirror_NPM_Packages
How to upload new xre.zip files for B2G tests
- You can use the script at https://github.com/jonallengriffin/xregen/blob/master/xre_gen.sh to generate a new xre.zip for any OS, based on a gecko release version. If you need an xre.zip for which there are only nightly builds (but not release builds), you can use xre.zip as a guide for how to construct the package, but you'll need to do it manually.
- After you create the xre.zip's (currently needed for linux64 and macosx64), upload them to tooltool (a hedged upload sketch follows this list), and then update the relevant mozharness config files, currently:
- b2g/gaia_unit_production_config.py
- b2g/gaia_integration_config.py
- marionette/gaia_ui_test_prod_config.py
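As referenced above, a rough sketch of the tooltool upload step, assuming tooltool.py from https://github.com/mozilla/build-tooltool and a valid upload token; the file names and token path are placeholders, and the authoritative steps are in the Tooltool upload docs linked earlier:
# record the new files in a manifest (writes manifest.tt by default)
python tooltool.py add --visibility public xre_linux64.zip xre_macosx64.zip
# upload everything the manifest references
python tooltool.py upload --authentication-file ~/.tooltool-token \
    --message "Bug NNNNNN: new xre.zip files for B2G tests"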